Computer Cluster - NUBIOS

Nile University, Bioinformatics Group.
Cluster Computer for Bioinformatics Applications
Hisham Adel, 2008
Done by: Hisham Adel Hassan.
Supervised by: Dr. Mohamed Aboualhouda
Points
•Introduction.
•Cluster and Supercomputers.
•Cluster Types and Advantages.
•Our Cluster.
•Cluster Performance.
•Cluster Computer for Basic Problems.
•General Idea about Sequence Alignment.
•BLAST and Parallel BLAST Algorithm.
•Sequence Alignment and Parallel Sequence Alignment.
•Learned Skills.
Introduction
Cluster Definition
•A group of computers and servers, connected together, that acts like a single system.
•Each computer in the cluster is called a node.
•A node contains one or more processors, RAM, a hard disk, and a LAN card.
•Nodes work in parallel.
•Performance can be increased by adding more nodes.
Cluster types
•Load-balancing cluster (used for Parallel BLAST).
•Computing cluster (used for parallel sequence alignment).
•High-availability (HA) cluster.
Cluster types: Load-Balancing Cluster
[Figure: diagram labeled "Task"]
Cluster types: Computing Cluster
[Figure: diagram labeled "Task"]
Cluster types: High-Availability Cluster
Cluster advantages
•Performance.
•Scalability.
•Maintenance.
•Cost.
Our Cluster
[Figure: network diagram showing Node 1, Node 2, Node 3, and Node 4, a switch, and Internet links]
Our Cluster specification
Communication: 5-port 10/100 Mbps switch.
Processor and RAM:
-Master Node: dual-core processor, 1.86 GHz; 1 GB RAM.
-Node 1: Pentium 4; 1 GB RAM.
-Node 2: Pentium 4; 1 GB RAM.
-Node 3: Pentium 4; 512 MB RAM.
Our Cluster specification (cont'd)
Operating system: openSUSE 10.3 (http://software.opensuse.org/)
Message passing: MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/)
Cluster performance is affected by:
1-Node speed.
2-The running program (sequential or parallel).
Running Program (sequential)
One node does all of the work: Working… until the program is finished.
Running Program (parallel)
1-Data sent: the data is sent to the other nodes.
2-Working…: all nodes work at the same time.
3-Finished…: each node finishes, and the results are collected (Get results…).
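The three steps of the parallel run (data sent, working, get results) can be sketched in a few lines. This is only an illustration of the pattern, not the presenter's program: the cluster described here uses MPICH2 between separate machines, while this sketch uses Python threads on one machine as a stand-in. The function names `split`, `work`, `run_sequential`, and `run_parallel`, and the sum-of-squares workload, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_nodes):
    """Master step: cut the data into n_nodes nearly equal chunks."""
    size, extra = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        # The first `extra` nodes receive one additional item.
        end = start + size + (1 if node < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def work(chunk):
    """Node step: a hypothetical workload (sum of squares)."""
    return sum(x * x for x in chunk)


def run_sequential(data):
    """One node processes all of the data."""
    return work(data)


def run_parallel(data, n_nodes=4):
    """Split the data, let every 'node' work, then combine the results."""
    chunks = split(data, n_nodes)                   # Data sent
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial = list(pool.map(work, chunks))      # Working...
    return sum(partial)                             # Get results


if __name__ == "__main__":
    data = list(range(10))
    print(run_sequential(data), run_parallel(data))
```

Both functions return the same answer; on a real cluster the benefit is that each node only handles its own chunk, at the cost of the time spent sending data and collecting results.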
Sequence Alignment
Used to:
1-Compare sequences.
2-Search databases.
How to Align Two Sequences
Suppose we have two sequences:
A A A C G A
A A T G A
Let match = 1, gap = -1, mismatch = 0.
They can be aligned as:
1-
A A A C G A
| |     | |
A A T _ G A
Score = 1 + 1 + 0 - 1 + 1 + 1 = 3
2-
A A A C _ G A
| |       | |
A A _ _ T G A
Score = 1 + 1 - 1 - 1 - 1 + 1 + 1 = 1
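The scores of the two alignments can be checked with a short function. This is a minimal sketch that assumes the aligned sequences are written as strings with `_` marking a gap; the name `alignment_score` is hypothetical, and the default values are the match, mismatch, and gap scores from the slide.

```python
def alignment_score(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned sequences column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "_" or y == "_":
            total += gap        # a gap in either sequence
        elif x == y:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


if __name__ == "__main__":
    print(alignment_score("AAACGA", "AAT_GA"))    # alignment 1
    print(alignment_score("AAAC_GA", "AA__TGA"))  # alignment 2
```

The first alignment scores higher because it uses one gap instead of three.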
BLAST
(Basic Local Alignment Search Tool)
Searching databases
BLAST Algorithm
(High-scoring pairs)
BLAST search types
BLASTN - Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTP - Compares an amino acid query sequence against a protein sequence database.
TBLASTN - Compares a protein query sequence against a nucleotide sequence database translated in all six reading frames.
BLASTX - Compares a nucleotide query sequence, translated in all six reading frames, against a protein sequence database.
Why do we need BLAST to be parallelized?
Our Program: Parallel BLAST
Parallel BLAST (cont'd)
Formatdb.c
Nucleotide sequence database: "formatdb -i DATABASE -p F".
Protein sequence database: "formatdb -i DATABASE -p T".
Parallel BLAST (cont'd)
Linux_Cluster_BLASTALL.c
"blastall -p BLAST_SEARCH_TYPE -d DATABASE -i QUERY_FILE -o out.txt"
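One way to spread a batch of queries over the nodes is sketched below. The slides do not show how Linux_Cluster_BLASTALL.c divides the work, so the round-robin split, the function names, and the per-node file names are all assumptions made for illustration. Only the `blastall` flags (`-p`, `-d`, `-i`, `-o`) come from the slide.

```python
def split_queries(queries, n_nodes):
    """Deal the queries round-robin into one batch per node."""
    return [queries[i::n_nodes] for i in range(n_nodes)]


def blastall_command(program, database, query_file, out_file):
    """Build the blastall argument list shown on the slide."""
    return ["blastall", "-p", program, "-d", database,
            "-i", query_file, "-o", out_file]


def plan_jobs(queries, n_nodes, program, database):
    """Pair each node's batch with the command that node would run."""
    jobs = []
    for node, batch in enumerate(split_queries(queries, n_nodes)):
        # Hypothetical per-node file names.
        query_file = f"queries_node{node}.fasta"
        out_file = f"out_node{node}.txt"
        jobs.append((batch, blastall_command(program, database,
                                             query_file, out_file)))
    return jobs


if __name__ == "__main__":
    for batch, command in plan_jobs(["q1", "q2", "q3", "q4", "q5"], 3,
                                    "blastn", "drosoph.nt"):
        print(len(batch), " ".join(command))
```

Each node would write its batch to its own query file and run its own command; the master then collects the output files.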
Results
Average of running 1000 queries, 1000 times.
[Figure: Nucleotide-Nucleotide. Bar chart of time (s), axis from 0 to 1.8, against database (size): drosoph.nt (118.6 MB), Yeastnt (3.2 MB), month.htgs (573 MB), igseqnt (67.5 MB), Pdbnt (1.7 MB), mito.nt (3.2 MB). Series: 1 Node; 3 Nodes, query time; 3 Nodes, query and communication time.]
Results (cont'd)
Average of running 1000 queries, 1000 times.
[Figure: Amino acid-Amino acid. Bar chart of time (s), axis from 0 to 90, against database (size): env_nr (1.6 GB), nr (573 MB), SwissProt (160 MB), Pdbaa (20 MB), Yeast.aa (3.2 MB). Series: 1 Node, query time; 3 Nodes, query time; 3 Nodes, query and communication time.]
Results (cont'd)
Average of running 1000 queries, 1000 times.
[Figure: Amino acid-Nucleotide. Bar chart of time (s), axis from 0 to 90, against database (size): env_nr (1.6 GB), Swissprot (160 MB), nr (84.7 MB), Pdbaa (20.4 MB), yeast.aa (3.2 MB). Series: 1 Node, query time; 3 Nodes, query time only; 3 Nodes, query and communication time.]
Conclusion about Parallel BLAST
•Performance: better when using the cluster.
•Scalability: adding more nodes decreases the running time.
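The comparison behind these conclusions (one node against three nodes, with and without communication time) is usually summarized as speedup and efficiency. The sketch below uses the standard definitions; the function names and the example times are hypothetical and are not measurements from the charts.

```python
def speedup(t_one_node, t_query, t_comm=0.0):
    """Single-node time divided by the total time on the cluster."""
    return t_one_node / (t_query + t_comm)


def efficiency(t_one_node, t_query, t_comm, n_nodes):
    """Speedup per node; 1.0 would mean perfect scaling."""
    return speedup(t_one_node, t_query, t_comm) / n_nodes


if __name__ == "__main__":
    # Hypothetical times in seconds, for illustration only.
    print(speedup(9.0, 3.0))               # query time only
    print(speedup(9.0, 3.0, 1.5))          # query and communication time
    print(efficiency(9.0, 3.0, 1.5, 3))
```

Including the communication time always lowers the speedup, which is why the charts report both series for the three-node runs.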
Sequence Alignment
Comparing sequences
•Introduction.
•Sequence alignment benefits.
•Sequence alignment types.
Needleman-Wunsch Algorithm
Why do we need sequence alignment to be parallelized?
Parallel Sequence Alignment Algorithm
Our Sequence Alignment Program
•Pairwise alignment.
•Built using the Needleman-Wunsch algorithm.
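For reference, the scoring part of the Needleman-Wunsch algorithm can be sketched as follows. This is a sequential textbook version, not the presenter's C program and not the parallel version. The function name is hypothetical, and the default scores (match = 1, mismatch = 0, gap = -1) are assumed from the earlier alignment example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=-1):
    """Return the best global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two letters, or insert a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("AAACGA", "AATGA"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, so the work grows with the product of the two sequence lengths.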
Learned Skills.
•Using the Linux operating system (openSUSE 10.3).
•Programming in C.
•Cluster computers and how to build one.
•Using MPICH2 for message passing between nodes.
•LaTeX.
•Teamwork and helping each other.
•Presentation skills.
Thank you for your time.
Hisham Adel