Parallel Implementation of BWT
Under the Guidance of: Prof. Kolin Paul
Presented By: Lalchand, Gaurav Jain
Agenda
•Application Domain & Objective
•Use of BWT in Sequence Assembly
•BWT Implementation on GPU
•BWT Implementation for Larger Genome
•Comparative Study
Application Domain & Objective
•Analyzing gene expression
•Mapping variations between individuals
•Mapping homologous proteins
•Assembling the genome of an organism
Objective: to present an efficient implementation of BWT for larger genomes.
Use of BWT in Sequence Assembly
[Figure: Genome -> Indexing -> SGA -> Contigs; intermediate size: 10^18]
Burrows-Wheeler Transform
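Input:   A C G T A $
indices: 0 1 2 3 4 5

Rotations (unsorted), labelled by starting index:
5   $ A C G T A
4   A $ A C G T
3   T A $ A C G
2   G T A $ A C
1   C G T A $ A
0   A C G T A $

Rotations (sorted); the labels form the suffix array SA and the last column is the BWT:
5   $ A C G T A
4   A $ A C G T
0   A C G T A $
1   C G T A $ A
2   G T A $ A C
3   T A $ A C G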
BWT[i] = ref[SA[i] - 1]
SA:     5 4 0 1 2 3
Output: A T $ A C G
(BWT[i] = $ when SA[i] = 0)
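The rule above can be checked with a few lines of Python. This is only a minimal sketch: the function names `suffix_array` and `bwt` are illustrative, the suffix array is built by naive sorting rather than by the GPU method of these slides, and the input is assumed to end with a `$` that sorts before A, C, G and T.

```python
def suffix_array(ref):
    # Naive construction: sort the suffix start positions by the suffix text.
    # Fine for a slide-sized example, far too slow for a real genome.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def bwt(ref):
    sa = suffix_array(ref)
    # BWT[i] = ref[SA[i] - 1], and '$' when SA[i] == 0.
    out = "".join(ref[i - 1] if i > 0 else "$" for i in sa)
    return sa, out


sa, out = bwt("ACGTA$")
print(sa)   # [5, 4, 0, 1, 2, 3]
print(out)  # AT$ACG
```

The printed values match the SA and Output rows of the worked example.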
Work Done
•Implemented BWT on GPU
  - Bitonic sort
•Implemented BWT for larger genome
  - In multipass (GPU and CPU)
Why Bitonic?
• A bitonic sequence is the concatenation of two sub-sequences sorted in
opposite directions, or a cyclic shift of such a sequence
• Implemented by comparator networks
• Works in place
• No communication
• Naturally suitable for SIMD architectures
• Each thread executes the same code on different data
• O(log^2 n) time and O(n log^2 n) work
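The comparator network can be sketched in Python as follows. This is a sequential simulation under stated assumptions, not the CUDA kernel used in this work: the name `bitonic_sort` is illustrative, the input length is assumed to be a power of two, and the inner loop over `i` stands in for the threads that would each handle one element on the GPU.

```python
def bitonic_sort(values):
    a = list(values)
    n = len(a)
    # The network is only defined for power-of-two sizes.
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            for i in range(n):           # on a GPU: one thread per i
                partner = i ^ j
                if partner > i:          # handle each pair once
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or \
                       (not ascending and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Every (k, j) step touches disjoint pairs, so the comparisons within a step are independent. There are O(log^2 n) such steps with n/2 comparisons each, which gives the time and work bounds stated above.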
BWT Procedure for Larger Genome
•Genome
•Read & store (CPU)
•2*CHUNK
•Bitonic_sort_step
•Merge suffix array (CPU)
•Calculate Gt array
•Calculate Gap array
•Suffix array (CPU)
•Suffix array -> BWT
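The multipass idea (sort each chunk separately, then merge on the CPU) can be illustrated with a small in-memory Python sketch. It is a simplification under stated assumptions: the helper names `chunk_suffix_arrays`, `merge_suffix_arrays` and `bwt_from_sa` are hypothetical, each chunk is sorted with Python's built-in sort instead of the GPU bitonic step, and the merge compares whole suffixes directly instead of using the Gt and Gap arrays listed above.

```python
import heapq


def chunk_suffix_arrays(text, chunk):
    # Pass 1: sort the suffix start positions of each chunk independently.
    # (In the slides this per-chunk sort is the GPU step.)
    return [
        sorted(range(start, min(start + chunk, len(text))),
               key=lambda i: text[i:])
        for start in range(0, len(text), chunk)
    ]


def merge_suffix_arrays(text, parts):
    # Pass 2: merge the sorted per-chunk lists into one suffix array.
    return list(heapq.merge(*parts, key=lambda i: text[i:]))


def bwt_from_sa(text, sa):
    # BWT[i] = text[SA[i] - 1], and '$' when SA[i] == 0.
    return "".join(text[i - 1] if i > 0 else "$" for i in sa)


text = "ACGTA$"
parts = chunk_suffix_arrays(text, chunk=2)
sa = merge_suffix_arrays(text, parts)
print(parts)                   # [[0, 1], [2, 3], [5, 4]]
print(sa)                      # [5, 4, 0, 1, 2, 3]
print(bwt_from_sa(text, sa))   # AT$ACG
```

For a genome that does not fit in memory, the direct suffix comparisons in the merge would have to be replaced by an external-memory scheme. That part is not shown here.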
Comparison between Parallel BWT (GPU) and Serial BWT (CPU)
GPU Statistics
SM               8
Frequency        1 MHz
No. of Blocks    2^26
Threads/Block    1024
Global Mem       2 GB
Serial BWT: does not work for large files
Comparison between Parallel BWT (GPU) and Parallel BWT (CPU)
CPU Statistics
Core             16
Freq.            1.5 GHz
L1 Cache         32 K
L2 Cache         2 MB
Global Mem       64 GB
GPU Statistics
SM               8
Frequency        1 MHz
No. of Blocks    2^26
Threads/Block    1024
Global Mem       2 GB
Evaluation for Larger Genome
GPU Statistics
SM               8
Frequency        1 MHz
No. of Blocks    2^26
Threads/Block    1024
Global Mem       2 GB
Thanks