Parallel Implementation of BWT
Under the guidance of: Prof. Kolin Paul
Presented by: Lalchand Gaurav Jain

Agenda
• Application domain & objective
• Use of BWT in sequence assembly
• BWT implementation on GPU
• BWT implementation for larger genome
• Comparative study

Application Domain & Objective
• Analyzing gene expression
• Mapping variations between individuals
• Mapping homologous proteins
• Assembling the genome of an organism
Objective: to present an efficient implementation of BWT for larger genomes.

Use of BWT in Sequence Assembly
Genome → Indexing → SGA → Contigs
Intermediate size: 10^18

Burrows-Wheeler Transform
Input:   A C G T A $
Indices: 0 1 2 3 4 5

Rotations (by starting index):
5  $ A C G T A
4  A $ A C G T
3  T A $ A C G
2  G T A $ A C
1  C G T A $ A
0  A C G T A $

Sorted rotations:
5  $ A C G T A
4  A $ A C G T
0  A C G T A $
1  C G T A $ A
2  G T A $ A C
3  T A $ A C G

SA indices: 5 4 0 1 2 3
BWT[i] = ref[SA[i] - 1]   (BWT[i] = $ when SA[i] = 0)
Output: A T $ A C G

Work Done
• Implemented BWT on GPU (bitonic sort)
• Implemented BWT for larger genomes in multipass (GPU and CPU)

Why Bitonic?
• A bitonic sequence is a concatenation of two subsequences sorted in opposite directions, or a cyclic shift of such a sequence
• Implemented by comparator networks
• Works in place
• No communication
• Naturally suitable for SIMD architectures
• Each thread executes the same code on different data
• O(log^2 n) time and O(n log^2 n) work

BWT Procedure for Larger Genome
Genome
→ Read & store (CPU)
→ 2*CHUNK
→ Bitonic_sort_step
→ Merge
→ Suffix array (CPU)
→ Calculate Gt array
→ Calculate Gap array
→ Suffix array (CPU)
→ Suffix array → BWT

Comparison between Parallel BWT (GPU) and Serial BWT (CPU)
GPU Statistics
SM              8
Frequency       1 MHz
No. of Blocks   2^26
Threads/Block   1024
Global Mem      2 GB
Serial BWT: does not work for large files.

Comparison between Parallel BWT (GPU) and Parallel BWT (CPU)
CPU Statistics
Cores           16
Frequency       1.5 GHz
L1 Cache        32 KB
L2 Cache        2 MB
Global Mem      64 GB
GPU Statistics: same configuration as on the previous slide.

Evaluation for Larger Genome
GPU Statistics: same configuration as on the previous slides.

References
• Paolo Ferragina, Travis Gagie, and Giovanni Manzini. Lightweight Data Indexing and Compression in External Memory.
• Hagen Peters. Fast in-place sorting with CUDA based on bitonic sort.
• Rohith K. Menon. Rapid Parallel Genome Indexing with MapReduce.
• M. Burrows and D. Wheeler. A Block-Sorting Lossless Data Compression Algorithm. Technical report.
• Yao Zhang. Parallel Lossless Data Compression on the GPU.

Thanks
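
Backup: BWT via bitonic sort, worked example
The rule BWT[i] = ref[SA[i] - 1] from the Burrows-Wheeler Transform slide, together with a bitonic comparator network, can be illustrated with a small CPU-only sketch. This is not the CUDA implementation presented in the talk: the function names (bitonic_sort, suffix_array, bwt) are hypothetical, suffixes are compared as whole strings for simplicity, and the input is padded to a power of two with placeholder entries that sort last. The inner loop over i stands in for what would be one GPU thread per element.

```python
def bitonic_sort(items, key):
    """Sort items in place with an iterative bitonic network.

    Assumption: len(items) is a power of two (pad beforehand if needed).
    """
    n = len(items)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within this merge stage
            for i in range(n):  # on a GPU, each i would be handled by one thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = key(items[i]) > key(items[partner])
                    if out_of_order == ascending:
                        items[i], items[partner] = items[partner], items[i]
            j //= 2
        k *= 2
    return items


def suffix_array(text):
    """Suffix array of text, built by bitonic-sorting suffix start indices."""
    n = len(text)
    size = 1
    while size < n:
        size *= 2
    # None marks a padding slot; its key sorts after every real suffix.
    indices = list(range(n)) + [None] * (size - n)

    def key(i):
        return (1, "") if i is None else (0, text[i:])

    bitonic_sort(indices, key)
    return indices[:n]


def bwt(text):
    """Return (SA, BWT) using BWT[i] = text[SA[i] - 1], and '$' when SA[i] == 0.

    Assumption: text ends with the sentinel '$', which sorts before A, C, G, T.
    """
    sa = suffix_array(text)
    out = "".join("$" if i == 0 else text[i - 1] for i in sa)
    return sa, out
```

Calling bwt("ACGTA$") reproduces the slide example: the suffix array is [5, 4, 0, 1, 2, 3] and the output is "AT$ACG".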