Nile University, Bioinformatics Group

Cluster Computer for Bioinformatics Applications
Hisham Adel, 2008

Done by: Hisham Adel Hassan.
Supervised by: Dr. Mohamed Aboualhouda

Points
• Introduction.
• Cluster and Supercomputers.
• Cluster Types and Advantages.
• Our Cluster.
• Cluster Performance.
• Cluster Computer for Basic Problems.
• General Idea about Sequence Alignment.
• BLAST and Parallel BLAST Algorithm.
• Sequence Alignment and Parallel Sequence Alignment.
• Learned Skills.

Introduction

Cluster Definition
• A group of computers and servers, connected together, that acts like a single system.
• Each system is called a node.
• A node contains one or more processors, RAM, a hard disk, and a LAN card.
• Nodes work in parallel.
• Performance can be increased by adding more nodes.

Cluster Types
• Load-balancing cluster (Parallel BLAST).
• Computing cluster (parallel sequence alignment).
• High-availability (HA) cluster.

Cluster Types: Load-Balancing Cluster
[Diagram; label: Task]

Cluster Types: Computing Cluster
[Diagram; label: Task]

Cluster Types: High-Availability Clusters
[Diagram]

Cluster Advantages
• Performance.
• Scalability.
• Maintenance.
• Cost.
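
The cluster definition says that nodes work in parallel and that adding nodes increases performance. One way to picture this is an even split of independent work items (for example, queries) across the nodes. The sketch below is only an illustration under that assumption; the names `work_range` and `partition_range` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work items assigned to one node. */
typedef struct {
    size_t begin;
    size_t end;
} work_range;

/*
 * Hypothetical helper: split n_items independent work items as evenly as
 * possible across n_nodes nodes and return the share of node `rank`
 * (0-based). The first (n_items % n_nodes) nodes get one extra item.
 */
work_range partition_range(size_t n_items, size_t n_nodes, size_t rank)
{
    assert(n_nodes > 0 && rank < n_nodes);

    size_t base  = n_items / n_nodes;   /* items every node receives */
    size_t extra = n_items % n_nodes;   /* leftover items to spread out */
    work_range r;

    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end   = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

With this split, each node's share shrinks as nodes are added, which is the intuition behind the scalability point. Communication cost is not modeled here.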

Our Cluster
[Diagram labels: Node 1, Node 2, Node 3, Node 4, switch, Internet]

Our Cluster Specification
Communication: 5-port 10/100 Mbps switch.
Processor and RAM:
- Master node: dual-core processor, 1.86 GHz; 1 GB RAM.
- Node 1: Pentium 4; 1 GB RAM.
- Node 2: Pentium 4; 1 GB RAM.
- Node 3: Pentium 4; 512 MB RAM.

Our Cluster Specification (cont’)
Operating system: openSUSE 10.3, http://software.opensuse.org/
MPICH2: http://www.mcs.anl.gov/research/projects/mpich2/

Cluster Performance
The performance of the cluster is affected by:
1- Node speed.
2- The running program.

Running Program (Sequential)
[Animation frames; label: Working…]

Running Program (Parallel)
[Animation frames; labels: Data sent; Working…; Finished…; Get results…; Results]

Sequence Alignment
Used to:
1- Compare sequences.
2- Search databases.

How to Align Two Sequences
Suppose we have two sequences:
    A A A C G A
    A A T G A
Let match = 1, gap = -1, mismatch = 0.
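
The scoring scheme above can be sketched as a function that adds up the column scores of two rows that are already aligned. This is a minimal illustration, not code from the project: the name `alignment_score` is hypothetical, and gaps are assumed to be written as `_`.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: score two already-aligned rows of equal length.
 * '_' marks a gap. Each column contributes `gap` if either row has a gap,
 * `match` if the two letters are equal, and `mismatch` otherwise.
 */
int alignment_score(const char *row_a, const char *row_b,
                    int match, int mismatch, int gap)
{
    size_t n = strlen(row_a);
    int score = 0;

    assert(strlen(row_b) == n);  /* aligned rows must have equal length */

    for (size_t i = 0; i < n; i++) {
        if (row_a[i] == '_' || row_b[i] == '_')
            score += gap;
        else if (row_a[i] == row_b[i])
            score += match;
        else
            score += mismatch;
    }
    return score;
}
```

Calling it with match = 1, mismatch = 0, and gap = -1 applies the scheme stated above to any pair of aligned rows.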
The two sequences can be aligned as:

1-  A A A C G A
    | | | | | |      Score = 3
    A A T _ G A

2-  A A A C _ G A
    | | | | | | |    Score = 1
    A A _ _ T G A

BLAST (Basic Local Alignment Search Tool)
Searching databases

BLAST Algorithm (High-Scoring Pairs)

BLAST Search Types
• BLASTN: compares a nucleotide query sequence against a nucleotide sequence database.
• BLASTP: compares an amino acid query sequence against a protein sequence database.
• TBLASTN: compares a protein query sequence against a nucleotide sequence database.
• BLASTX: compares a nucleotide query sequence against a protein sequence database.

Why Do We Need BLAST to Be Parallelized?

Our Program: Parallel BLAST

Parallel BLAST (cont’)
Formatdb.c
• Nucleotide sequence database: "formatdb -i DATABASE -p F".
• Protein sequence database: "formatdb -i DATABASE -p T".

Parallel BLAST (cont’)
Linux_Cluster_BLASTALL.c
"blastall -p <BLAST search type> -d DATABASE -i <query file> -o out.txt"

Results
Average of running 1000 queries, 1000 times.
[Chart: Nucleotide-Nucleotide. Y axis: Time (s), 0 to 1.8. X axis: Database (size): drosoph.nt (118.6 MB), Yeastnt (3.2 MB), month.htgs (573 MB), igseqnt (67.5 MB), Pdbnt (1.7 MB), mito.nt (3.2 MB). Series: 1 node; 3 nodes, query time; 3 nodes, query and communication time.]

Results (cont’)
Average of running 1000 queries, 1000 times.
[Chart: Amino acid-Amino acid. Y axis: Time (s), 0 to 90. X axis: Database (size): env_nr (1.6 GB), nr (573 MB), SwissProt (160 MB), Pdbaa (20 MB), Yeast.aa (3.2 MB). Series: 1 node, query time; 3 nodes, query time; 3 nodes, query and communication time.]

Results (cont’)
Average of running 1000 queries, 1000 times.
[Chart: Amino acid-Nucleotide. Y axis: Time (s), 0 to 90. X axis: Database (size): env_nr (1.6 GB), SwissProt (160 MB), nr (84.7 MB), Pdbaa (20.4 MB), yeast.aa (3.2 MB). Series: 1 node, query time; 3 nodes, query time only; 3 nodes, query and communication time.]

Conclusion about Parallel BLAST
• Performance: better when using the cluster.
• Scalability: with more nodes, the time decreases.

Sequence Alignment
Comparing sequences

Sequence Alignment
• Introduction.
• Sequence Alignment Benefits.
• Sequence Alignment Types.

Needleman-Wunsch Algorithm

Why Do We Need Sequence Alignment to Be Parallelized?

Parallel Sequence Alignment Algorithm

Our Sequence Alignment Program
• Pairwise alignment.
• Built using the Needleman-Wunsch algorithm.

Learned Skills
• Using the Linux (SUSE 10.3) operating system.
• Programming in the C language.
• Cluster computers and how to build one.
• MPICH2 for message passing between nodes.
• LaTeX.
• Teamwork and helping each other.
• Presentation skills.

Thank you for your time.
Hisham Adel
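
As a supplement to the Needleman-Wunsch slides, here is a minimal sketch of how a global alignment score can be computed by dynamic programming. It assumes a linear gap penalty and the match, mismatch, and gap values used in the earlier two-sequence example. It is not the program described in the slides, whose source is not shown. The name `nw_score` is hypothetical, and the function returns only the optimal score, without the traceback that recovers the alignment itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * Hypothetical sketch: Needleman-Wunsch global alignment score with a
 * linear gap penalty. Keeps only two rows of the score matrix.
 */
int nw_score(const char *a, const char *b, int match, int mismatch, int gap)
{
    size_t n = strlen(a), m = strlen(b);
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);
    assert(prev != NULL && curr != NULL);

    /* Row 0: an empty prefix of a against b[0..j) costs j gaps. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = (int)j * gap;

    for (size_t i = 1; i <= n; i++) {
        curr[0] = (int)i * gap;  /* a[0..i) against an empty prefix of b */
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = prev[j] + gap;      /* a[i-1] aligned to a gap */
            int left = curr[j - 1] + gap;  /* b[j-1] aligned to a gap */
            curr[j] = max3(diag, up, left);
        }
        /* The row just filled becomes the previous row. */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    int score = prev[m];
    free(prev);
    free(curr);
    return score;
}
```

With match = 1, mismatch = 0, and gap = -1, `nw_score("AAACGA", "AATGA", 1, 0, -1)` evaluates to 3, the same score as the first alignment in the earlier example.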