FastHASH: A New Algorithm for Fast and Comprehensive Next-Generation Sequence Mapping Hongyi Xin† Farhad Hormozdiari ‡ Donghyuk Lee† Can Alkan § Onur Mutlu† † Carnegie Mellon University § University of Washington ‡ University of California Los Angeles Next-Generation DNA Sequencing (NGS) n DNA sequencing is important n Basic Approach: Read and map C—G T—A G—C T—G C—A A—C T—G G—T C—A C—A n Next-generation sequencing technologies produce a large number of short DNA fragments q q q n Chromosome VII G—123 A—124 C—125 G—126 A—127 C—128 G—129 T—130 A—131 A—132 More computationally intensive Harder to map the entire genome Especially when allowing polymorphism Goal: Design fast and comprehensive algorithms to analyze enormous amounts of NGS data 2 Challenge of Existing NGS Mapping Tools n We want Fast and Comprehensive A—T A—T T—122 C—G T—289 G—C C—G G—123 T—260 A—T T—A G—290 C—G C—G T—122 T—A A—124 G—361 C—G G—C T—732 T—A A—291 T—A C—G G—123 G—C C—125 T—1133 A—362 T—A T—G G—733 C—G C—292 G—C T—A A—124 T—G G—126 G—1134 C—363 G—C C—A A—734 T—A G—293 T—G G—C C—125 C—A C—735 A—127 A—1135 G—364 T—G A—C C—G A—294 C—A T—G G—126 A—C C—128 C—1136 A—365 C—A T—G T—G G—736 C—295 A—C C—A A—127 T—G A—C G—129 G—1137 C—366 G—T A—127 C—A G—296 T—G A—C C—128 G—T T—130 A—1138 G—367 T—G C—A C—738 A—C T—297 G—T T—G G—129 C—A A—131 C—1139 T—268 G—T C—A G—739 G—T A—298 C—A G—T T—130 C—A C—A A—132 G—1140 A—269 T—740 G—T A—299 C—A C—A A—131 T—1141 A—370A—741 C—A C—A C—A A—132A—1142 A—742 C—A A—1143 n Some tools are only fast q q n à need a fast tool à need a comprehensive tool BWA, SOAPv2, Bowtie Not comprehensive Some tools are only comprehensive q q mrFAST, mrsFAST Slow 3 FastHASH NGS Mapping Kernel n Goal q n Observation q n Some tools do unnecessary work to guarantee comprehensiveness à main cause of slowness Main Idea of FastHASH q n Best of both worlds: Fast and Comprehensive Cut down unnecessary work by taking advantage of full knowledge of reference genome Two main mechanisms q Adjacency Filtering and Cheap Segment Selection 4 Evaluation Runtime (s) 20000 18369 15000 mrFAST kernel 10000 5000 0 n FastHASH kernel 38x speedup compared to state-of-the-art alignment tool q n 478 No sacrifice of comprehensiveness Also implemented on GPU for further acceleration 5 Thanks n For more information, please come by our poster! 6 FastHASH: A New Algorithm for Fast and Comprehensive Next-Generation Sequence Mapping Hongyi Xin† Farhad Hormozdiari ‡ Donghyuk Lee† Can Alkan § Onur Mutlu† † Carnegie Mellon University § University of Washington ‡ University of California Los Angeles