Poster 118
FastHASH: A New Algorithm for Fast and Comprehensive Next-Generation Sequence Mapping
Next-Generation DNA Sequencing (NGS)
 
DNA sequencing is important
 
Basic Approach: Read and map
Next-generation sequencing (NGS) technologies produce many short DNA fragments
[Figure: a short fragment, GACGACGTAA, mapped to positions 123-132 of Chromosome VII]
Mapping short fragments is more computationally intensive
It is harder to map a fragment to the entire genome, especially when allowing polymorphisms
Goal: Design fast and comprehensive algorithms to analyze enormous amounts of NGS data
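To make the cost of mapping concrete, here is a minimal Python sketch of the brute-force approach: slide the fragment across the whole reference and keep every location that differs in at most a given number of bases. The function names, the toy reference string, and the use of substitutions only (no insertions or deletions) are assumptions for illustration, not something taken from the slides.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read_exhaustive(reference, read, max_mismatches):
    """Return every (start, mismatches) location where the read fits.

    Hypothetical helper: it compares the read against every window of
    the reference, so the work grows with len(reference) * len(read).
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = hamming_distance(window, read)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy data (made up): the read occurs exactly at index 2 and with one
# substitution at index 14. Indices are 0-based.
reference = "TTGACGACGTAATTGACGTCGTAACC"
read = "GACGACGTAA"
print(map_read_exhaustive(reference, read, max_mismatches=1))
# prints [(2, 0), (14, 1)]
```

Every window of the reference is inspected, which is why this approach does not scale to a whole genome and millions of fragments.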
Challenge of Existing NGS Mapping Tools
 
We want a tool that is both Fast and Comprehensive
[Figure: one read aligned against several candidate locations in the reference genome]
Need a fast tool
Need a comprehensive tool: want to find all possible locations, with some mutations allowed
Some tools are only fast: BWA, SOAPv2, Bowtie (lose mappings)
Some tools are only comprehensive: mrFAST, mrsFAST (slow)
FastHASH NGS Mapping Kernel
 
Goal: Best of both worlds, fast and comprehensive
Observation: Some tools do unnecessary work to guarantee comprehensiveness, and this is the main cause of slowness
Main Idea of FastHASH: Cut down unnecessary work by exploiting knowledge of the reference genome
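The slides do not spell out how FastHASH uses the reference genome, so the following Python sketch is not the FastHASH kernel. It only illustrates the general idea of trading preprocessing of the reference for less work per read, using a generic seed-and-verify scheme: index every k-mer of the reference once, look up the read's seeds, and verify only the candidate locations those seeds suggest. All names and the toy data are assumptions. The pigeonhole argument behind it is standard: if a read is split into more non-overlapping seeds than the number of allowed substitutions, at least one seed must match exactly at every valid location, so no mapping is lost.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read_seeded(reference, index, read, k, max_mismatches):
    """Find all (start, mismatches) hits by verifying only seeded candidates.

    Hypothetical helper; handles substitutions only, not insertions
    or deletions.
    """
    n_seeds = len(read) // k
    if n_seeds <= max_mismatches:
        raise ValueError("need more seeds than allowed mismatches")

    # Collect candidate start positions implied by exact seed matches.
    candidates = set()
    for s in range(n_seeds):
        offset = s * k
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verify each candidate against the full read.
    hits = []
    for start in sorted(candidates):
        mismatches = hamming_distance(reference[start:start + len(read)], read)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "TTGACGACGTAATTGACGTCGTAACC"  # toy data, made up
index = build_kmer_index(reference, k=5)
print(map_read_seeded(reference, index, "GACGACGTAA", k=5, max_mismatches=1))
# prints [(2, 0), (14, 1)]
```

On this toy input only two candidate locations are verified instead of all seventeen windows, yet both valid mappings are still reported.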
Evaluation
Runtime (s):
mrFAST kernel: 18369
FastHASH kernel: 478
38x speedup compared to the state-of-the-art comprehensive tool (18369 s / 478 s ≈ 38), without sacrificing comprehensiveness
Also implemented on GPU for further acceleration