Poster 118 FastHASH: A New Algorithm for Fast and Comprehensive Next-Generation Sequence Mapping Next-Generation DNA Sequencing (NGS) DNA sequencing is important Basic Approach: Read and map C—G T—A G—C T—G C—A A—C T—G G—T C—A C—A Next-generation sequencing technologies (NGS) produce many short DNA fragments Chromosome VII G—123 A—124 C—125 G—126 A—127 C—128 G—129 T—130 A—131 A—132 More computationally intensive Harder to map a fragment to entire genome Especially when allowing polymorphisms Goal: Design fast and comprehensive algorithms to analyze enormous amounts of NGS data 2 Challenge of Existing NGS Mapping Tools We want a tool that is both Fast and Comprehensive A—T A—T T—122 C—G G—C T—289 C—G A—T G—123 T—260 T—A G—290 C—G C—G T—122 T—A A—124 G—361 C—G T—A G—C T—732 A—291 T—A C—G G—123 G—C C—125 T—1133 A—362G—733 T—A T—G C—G C—292 G—C T—A A—124 T—G G—C G—126 G—1134 C—363 C—A T—G A—734 T—A G—293 G—C C—125 C—A A—127 A—1135 G—364 T—G C—G A—C C—735 A—294 C—A T—G G—126 A—C C—128 C—1136 A—365 C—A T—G G—736 T—G C—295 A—C C—A A—127 T—G G—129 G—1137 C—366 A—C C—A G—T A—127 G—296 T—G A—C C—128 G—T T—130 A—1138 G—367C—738 T—G C—A A—C T—297 G—T T—G G—129 C—A A—131 C—1139 T—268 G—T C—A G—739 G—T A—298 C—A G—T T—130 C—A A—132 G—1140 A—269T—740 C—A G—T A—299 C—A C—A A—131 T—1141 A—370A—741 C—A C—A C—A A—132A—1142 A—742 C—A A—1143 Some tools are only fast need a fast tool need a comprehensive tool Want to find all possible locations with some mutations allowed BWA, SOAPv2, Bowtie Lose mapping Some tools are only comprehensive mrFAST, mrsFAST Slow 3 FastHASH NGS Mapping Kernel Goal Observation Best of both worlds: Fast and Comprehensive Some tools do unnecessary work to guarantee comprehensiveness main cause of slowness Main Idea of FastHASH Cut down unnecessary work by exploiting knowledge of reference genome 4 Evaluation Runtime (s) 20000 18369 15000 mrFAST kernel 10000 5000 0 FastHASH kernel 38x speedup compared to state-of-the-art comprehensive tool 478 Without sacrificing comprehensiveness Also implemented on GPU for further acceleration 5 Poster 118 FastHASH: A New Algorithm for Fast and Comprehensive Next-Generation Sequence Mapping