Short read alignment, genome alignment, and high performance

Short read alignment
BNFO 601

Short read alignment
• Input:
  – Reads: short DNA sequences, usually up to 100 base pairs (bp), produced by a sequencing machine
    • Reads are fragments of a longer DNA sequence present in the sample given to the machine as input
    • Reads usually number in the millions
  – Genome sequence: a reference DNA sequence much longer than the read length

Short read alignment
• Applications
  – Genome assembly
  – RNA splicing studies
  – Gene expression studies
  – Discovery of new genes
  – Discovery of cancer-causing mutations

Short read alignment
• Two approaches
  – Hashing-based algorithms
    • BFAST
    • SHRIMP
    • MAQ
    • STAMPY (statistical alignment)
  – Burrows-Wheeler transform
    • Bowtie
    • BWA

BFAST overview
(figure; PLoS ONE 4(11): e7767)

BFAST algorithm
(figure; PLoS ONE 4(11): e7767)

BFAST masked keys

Short read alignment
Empirical performance:
• Simulated data:
  – Extract random fixed-length substrings of the reference and introduce random mutations and gaps
  – Realign the substrings back to the reference genome
• Real data:
  – Paired reads: the two ends of the same molecule
  – Count the number of paired reads that align within 500 to 10,000 bases of each other

Short read alignment
(figures courtesy of Genome Res. June 2011 21: 936-939)
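The hashing-based approach can be illustrated with a minimal seed-and-extend sketch. It assumes a toy random genome, exact k-mer seeds taken at non-overlapping offsets of the read, and ungapped verification by counting mismatches; it is not BFAST's masked-key scheme or the method of any specific tool. The helper names (`build_kmer_index`, `align_read`, `simulate_reads`) are hypothetical, and `simulate_reads` mirrors the simulated-data evaluation above but introduces substitutions only, not gaps.

```python
import random
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Hash every length-k substring (k-mer) of the genome to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, genome: str, index, k: int, max_mismatches: int = 2):
    """Seed with non-overlapping k-mers of the read, then verify each candidate
    start position over the full read length (no gaps allowed).
    Returns the best-scoring start position, or None if nothing qualifies."""
    best = None  # (mismatches, start position)
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset  # shift the seed hit back to the read's start
            if start < 0 or start + len(read) > len(genome):
                continue
            mm = hamming(read, genome[start:start + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[0]):
                best = (mm, start)
    return None if best is None else best[1]


def simulate_reads(genome: str, n: int, length: int, mutation_rate: float, rng):
    """Sample n fixed-length substrings and mutate bases at random.
    Assumption: substitutions only; the slides' simulation also adds gaps."""
    reads = []
    for _ in range(n):
        pos = rng.randrange(len(genome) - length + 1)
        bases = list(genome[pos:pos + length])
        for j in range(length):
            if rng.random() < mutation_rate:
                bases[j] = rng.choice([b for b in "ACGT" if b != bases[j]])
        reads.append((pos, "".join(bases)))
    return reads


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    k = 12
    index = build_kmer_index(genome, k)
    reads = simulate_reads(genome, 200, 50, 0.02, rng)
    correct = sum(
        align_read(read, genome, index, k, max_mismatches=4) == pos
        for pos, read in reads
    )
    print(f"{correct}/{len(reads)} reads placed at their true position")
```

Seeds are taken at non-overlapping offsets so that, by the pigeonhole principle, a read with fewer mismatches than seeds still has at least one exact seed hit; a larger k makes each hash lookup more selective but yields fewer seeds, so fewer mismatches are tolerated.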

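The Burrows-Wheeler approach used by Bowtie and BWA searches an FM-index built from the BWT of the reference. The sketch below is a simplified illustration under stated assumptions: it builds the BWT from a naively sorted suffix array, stores full (unsampled) occurrence tables, and runs backward search for exact matches only. Production aligners use compressed, sampled tables and extend the search to tolerate mismatches and gaps. The names `bwt_with_suffix_array` and `FMIndex` are hypothetical.

```python
def bwt_with_suffix_array(text: str) -> tuple[str, list[int]]:
    """Build the BWT via a naively sorted suffix array ('$' marks the end and
    sorts before A, C, G, T). Quadratic memory: fine only for toy inputs."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0
    return bwt, sa


class FMIndex:
    def __init__(self, genome: str):
        self.bwt, self.sa = bwt_with_suffix_array(genome)
        # C[ch]: number of BWT characters that sort strictly before ch
        counts: dict[str, int] = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        self.C: dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i]: occurrences of ch in bwt[:i] (fully materialized, no sampling)
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch in self.occ:
                self.occ[ch][i + 1] = self.occ[ch][i] + (ch == b)

    def find(self, pattern: str) -> list[int]:
        """Backward search: consume the pattern right to left, narrowing the
        suffix-array interval [lo, hi) of suffixes that start with the
        processed suffix of the pattern. Exact matches only."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return []
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])
```

For example, `FMIndex("ACGTACGT").find("ACG")` returns `[0, 4]`. Each pattern character costs two table lookups regardless of genome length, which is why BWT-based aligners scale well to large references once the index is built.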