Bioinformatics areas

Bioinformatics
2010-2011
Lecture 1
Introduction
Dr. Aladdin Hamwieh
Khalid Al-shamaa
Abdulqader Jighly
Aleppo University
Faculty of technical engineering
Department of Biotechnology
Main Lines
• Definition
• Bioinformatics areas
• Bioinformatics data
– Data types
– Applications for these data
• Next generation sequencing
• Bioinformatics algorithms
• Joint international programming initiatives
Definition
• Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.
• Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques.
• Bioinformatics applies principles of information science to make the vast, diverse, and complex life sciences data more understandable and useful.
Definition
• There are two extremes in bioinformatics work
– Tool users (biologists): know how to press the buttons and know the biology, but have no clue what happens inside the program
– Tool shapers (informaticians): know the algorithms and how the tool works, but have no clue about the biology
Bioinformatics areas
• Molecular sequence analysis
1. Sequence alignment
2. Sequence database searching
3. Motif discovery
4. Gene and promoter finding
5. Reconstruction of evolutionary relationships
6. Genome assembly and comparison
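As an illustration of item 1 (sequence alignment), here is a minimal sketch of a global pairwise alignment score computed by dynamic programming (the Needleman-Wunsch scheme). The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration, not values from the lecture.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b.

    The scoring values are illustrative assumptions; real tools use
    substitution matrices and more refined gap penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: align two symbols, or open a gap in a or b.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two sequence lengths.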
Bioinformatics areas
• Molecular structural analysis
1. Protein structure analysis
2. Nucleic acid structure analysis
3. Comparison
4. Classification
5. Prediction
Bioinformatics areas
• Molecular functional analysis
1. Gene expression profiling
2. Protein–protein interaction prediction
3. Protein sub-cellular localization prediction
4. Metabolic pathway reconstruction
5. Simulation
Bioinformatics data
There are different data types commonly used in bioinformatics.
The same data may be used in different areas.
Data types
• DNA sequences
• RNA sequences
• Expression (microarray) profile
• Proteome (x-ray, NMR) profile
• Metabolome profile
• Haplotype profile
• Phenotype profile
1- DNA Sequences
• Simple sequence analysis
– Database searching
– Pairwise and multiple analysis
• Regulatory regions
• Gene finding
• Whole genome annotation
• Comparative genomics
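To make "gene finding" concrete, here is a deliberately naive sketch that scans the three forward reading frames of a DNA string for open reading frames (ATG up to the first in-frame stop codon). The function name, the `min_codons` parameter, and the example sequence are hypothetical. Real gene finders also use the reverse strand, splice signals, and statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop included; `end` is exclusive. `min_codons` is an assumed
    length filter, counted in codons including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


sequence = "CCATGAAATGATAGCC"  # made-up example sequence
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```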
2- RNAs
• Splice variants
• Tissue specific expression
• 2D structure
• 3D structure
• Single gene analysis
• Microarray
2D and 3D structure of tRNA
2D and 3D structure of rRNA
Microarray
• 20,000 to 60,000 short DNA probes of specified sequences are tethered in an ordered grid on a small slide. Each probe corresponds to a particular short section of a gene.
Microarray
• DNA microarrays measure RNA abundance with either 1 channel (one color) or 2 channels (two colors).
• Stanford microarrays use competitive hybridization to measure the relative expression under a given condition (labeled with the red fluorescent dye Cy5) compared to its control (labeled with the green fluorescent dye Cy3) (two channels).
• Affymetrix GeneChip has 1 channel and uses either the red fluorescent dye Cy5 or the green fluorescent dye Cy3.
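For two-channel arrays, relative expression is commonly summarized as the log2 ratio of the Cy5 (condition) intensity to the Cy3 (control) intensity. The sketch below assumes background-corrected intensities. The function name, the pseudocount, and the spot values are hypothetical and only serve to show the arithmetic.

```python
import math


def log2_ratio(cy5, cy3, pseudocount=1.0):
    """log2 of condition (Cy5) over control (Cy3) intensity.

    The pseudocount is an assumption of this sketch; it avoids
    division by zero and log(0) for empty spots.
    """
    return math.log2((cy5 + pseudocount) / (cy3 + pseudocount))


# Hypothetical intensities per spot: gene -> (Cy5, Cy3)
spots = {"geneA": (1599, 399), "geneB": (499, 499), "geneC": (249, 999)}

for gene, (cy5, cy3) in spots.items():
    print(f"{gene}: {log2_ratio(cy5, cy3):+.2f}")
```

A positive value means higher expression under the condition than in the control, a negative value means lower, and zero means no change.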
3- Proteins
• Protein sequence analysis
– Database searching
– Pairwise and multiple analysis
• 2D structure
• 3D structure
• Classification of protein families
• Protein arrays
3D structure
Animation
4- Metabolome and molecular biology
• Metabolic pathways
• Regulatory networks
Helps to understand
systems biology
5- Haplotype
• Molecular Markers
– RFLP
– RAPD
– SSR
– ISSR
– AFLP
– DArT
– SNP
– ….
SNP
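A SNP (single nucleotide polymorphism) is a single-base difference between individuals at the same position. As a minimal sketch, assuming two sequences that are already aligned and of equal length, candidate SNP positions can be listed by comparing them column by column. The function name and the example sequences are hypothetical.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) for single-base differences.

    Assumes the two sequences are already aligned (same length);
    positions are 0-based. Columns containing a gap ('-') are skipped
    because they are insertions or deletions, not substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base and "-" not in (ref_base, alt_base)
    ]


# Made-up aligned sequences differing at one position
print(find_snps("ACGTACGT", "ACGTTCGT"))
```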
6- Phenotype
• Morphological data
• Physiological data
• Stress tolerance
• Pathogenic infections
• Disease resistance
• Cancer types
• …..
Haplotype & Phenotype
Next Generation Sequencing
Sequencing machine  Launched             Read length  Reads/run  Throughput per run  Cost/Mb
ABI 3730            2000                 800-1100     96         0.1 MB              High cost
Roche GS FLX        2004                 250-400      400K       100 MB              $84.39
Illumina Solexa     2006                 35-70        120M       6 GB                $5.97 k
AB SOLiD            2007                 25-35        170M       6 GB                $5.81 k
Helicos             2008                 28           85M        2 GB                NA
SMRT                Target release 2010  964          NA         NA                  NA
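The throughput figures above follow roughly from reads per run multiplied by read length. The sketch below checks this arithmetic using the read counts and read-length ranges from the table. It assumes that "MB" and "GB" in the table mean megabases and gigabases, and the function and variable names are hypothetical.

```python
def bases_per_run(reads_per_run, read_length):
    """Approximate bases sequenced in one run: number of reads x read length."""
    return reads_per_run * read_length


# (reads per run, (shortest read, longest read)) taken from the table;
# SMRT is omitted because its reads/run is listed as NA.
platforms = {
    "ABI 3730": (96, (800, 1100)),
    "Roche GS FLX": (400_000, (250, 400)),
    "Illumina Solexa": (120_000_000, (35, 70)),
    "AB SOLiD": (170_000_000, (25, 35)),
    "Helicos": (85_000_000, (28, 28)),
}

for name, (reads, (shortest, longest)) in platforms.items():
    low = bases_per_run(reads, shortest) / 1e6   # in megabases
    high = bases_per_run(reads, longest) / 1e6
    print(f"{name}: {low:,.2f} to {high:,.2f} Mb per run")
```

For example, 120M reads of about 50 bases each give about 6 gigabases, in line with the throughput listed for Illumina Solexa.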
Short reads assembly problems
Algorithms in bioinformatics
• String algorithms
• Dynamic programming
• Machine learning (NN, k-NN, SVM, GA, ..)
• Markov chain models
• Hidden Markov models
• Markov Chain Monte Carlo (MCMC) algorithms
• Stochastic context free grammars
• EM algorithms
• Gibbs sampling
• Clustering
• Tree algorithms (suffix trees)
• Graph algorithms
• Text analysis
• Hybrid/combinatorial techniques
• ….
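As one small example from this list, a first-order Markov chain model of a DNA sequence can be estimated by counting how often each base follows each other base. This is a minimal sketch: the function name and the toy sequence are hypothetical, and no smoothing is applied for transitions that never occur.

```python
from collections import defaultdict


def transition_probabilities(sequence):
    """Estimate first-order Markov transition probabilities P(next | previous).

    Counts each adjacent pair of symbols and normalizes per previous
    symbol. Unseen transitions are simply absent (no smoothing), which
    is a simplification of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for previous, following in zip(sequence, sequence[1:]):
        counts[previous][following] += 1

    probabilities = {}
    for previous, followers in counts.items():
        total = sum(followers.values())
        probabilities[previous] = {
            base: count / total for base, count in followers.items()
        }
    return probabilities


model = transition_probabilities("AACA")  # toy sequence
for previous, followers in sorted(model.items()):
    print(previous, followers)
```

Hidden Markov models extend this idea by letting the observed bases depend on unobserved states, such as coding versus non-coding regions.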
Joint international programming initiatives
• Bioperl
http://www.bioperl.org/wiki/Main_Page
• Biopython
http://www.biopython.org/
• BioTcl
http://wiki.tcl.tk/12367
• BioJava
www.biojava.org/wiki/Main_Page
Thank You