Algorithms in Computational Biology (236522) Spring 2006

Lecture: Monday 12:30-2:20, Taub 4
Tutorial: Tuesday 1:30-2:20, Taub 7
Lecturer: Golan Yona
Office hours: Wednesday or Thursday 2-3pm (Taub 632, Tel 4356)
TA: Itai Sharon
Office hours: Tuesday 2:30-3:20 (Taub 621, Tel 4946)
Computational Biology
Computational biology is the application of computational
tools and techniques to (primarily) molecular biology. It
enables new ways of studying the life sciences, allowing analytic
and predictive methodologies that support and enhance
laboratory work. It is a multidisciplinary area of study that
combines biology, computer science, and statistics.
Computational biology is also called bioinformatics, although
many practitioners define bioinformatics somewhat more
narrowly, restricting the field to molecular biology only.
Examples of Areas of Interest
• Understanding the structure of genomes (detecting
genes, regulatory elements, variations)
• Deciphering the structure and function of proteins
• Discovery of cellular “procedures” (pathways)
• Identifying disease-causing genes
• Building the tree of life
• More...
Exponential growth of biological information:
growth of sequences, structures, and literature.
Course goals
The focus of this course is the set of algorithms, tools and
models used today to analyse molecular biological data and to
recover and discover hidden information.
Course Prerequisites
Computer Science and Probability Background
• Data Structures 1 (cs234218)
• Algorithms 1 (cs234247)
• Probability (any course)
Or permission from instructor
Biology background
• Formally: none (to allow CS students to take this course)
• Recommended: Molecular Biology 1 (especially for those in
the Bioinformatics track), or a similar Biology course, and/or
a serious desire to complement your knowledge in Biology
by reading the appropriate material (see the course home).
Requirements & Grades
• 40% homework, in five or six assignments. Homework is
obligatory.
• 60% test. You must pass the test (grade of at least 55) for the homework grade to count
• Exam date: 17.7.06
Syllabus
• Introduction, biological background (0.5 weeks)
• Gene detection and function prediction
– Pairwise alignment (2 weeks)
– Multiple sequence alignment (2 weeks)
– Profile and Hidden Markov Models (2 weeks)
• Motif detection (1 week)
• Phylogenetic trees (2 weeks)
• Expression data analysis, pathways (2.5 weeks)
• Protein structure analysis (2 weeks)
Bibliography
• Biological Sequence Analysis, R. Durbin et al., Cambridge
University Press, 1998
• Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis,
PWS Publishing Company, 1997
• Misc. papers
• Some slides adapted from courses taught by Nir Friedman
(Hebrew U), Dan Geiger and Shlomo Moran
• Course home: webcourse.cs.technion.ac.il/~cs236522
Biological Background
First homework assignment: Read the first chapter (pages 1-30) of Setubal et al., 1997 (copies are available in the Taub
building library and in the central library). Answer the
questions of the first assignment on the course site.
Course starts..
Human Genome
Most human cells contain 46 chromosomes:
• 2 sex chromosomes (X, Y): XY in males, XX in females.
• 22 pairs of chromosomes named autosomes.
Source: Alberts et al
DNA Organization
Source: Alberts et al
The Double Helix
DNA Components
Four nucleotide types:
• Adenine
• Guanine
• Cytosine
• Thymine
Base pairs
Hydrogen bonds (electrostatic connection):
• A-T
• C-G
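Because each base pairs with exactly one partner, one strand determines the other. A minimal Python sketch of this, where the names `complement` and `reverse_complement` are illustrative and not taken from the course material:

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return strand.upper().translate(PAIRS)

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The reversal reflects the fact that the two strands of the helix run in opposite directions.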
Genome Sizes
• E. coli (bacteria): 4.6 x 10^6 bases
• Yeast (simple fungi): 15 x 10^6 bases
• Smallest human chromosome: 50 x 10^6 bases
• Entire human genome: 3 x 10^9 bases
Genetic Information
• Genome – the collection of genetic information.
• Chromosomes – storage units of genes.
• Gene – basic unit of genetic information. Genes determine
the inherited characters.
Genes
The DNA strings include:
• Coding regions (“genes”)
– E. coli has ~4,000 genes
– Yeast has ~6,000 genes
– C. elegans has ~13,000 genes
– Humans have ~32,000 genes
• Control regions
– These typically are adjacent to the genes
– They determine when a gene should be “expressed”
• “Junk” DNA (unknown function; ~90% of the DNA
in human chromosomes)
The cell
All cells of an organism contain the same DNA content
(and the same genes) yet there is a variety of cell types.
Example: Tissues in Stomach
How is this variety encoded and expressed?
Central Dogma
Gene → (transcription) → mRNA → (translation) → Protein
Cells express different subsets of their genes
in different tissues and under different conditions.
Central dogma
Transcription
Source: Mathews & van Holde
• Coding sequences can be transcribed to RNA
• RNA
– Similar to DNA, with slightly different nucleotides
(a different sugar in the backbone)
– Uracil (U) instead of Thymine (T)
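At the sequence level, transcription can be sketched as a simple substitution. The snippet below assumes the input is the coding (sense) strand, so the RNA has the same sequence with U in place of T; the function name `transcribe` is illustrative, and the cellular machinery that reads the opposite strand is not modeled.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA matching the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```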
Transcription: Junk DNA, RNA Editing, Alternative Splicing
1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists
Exons hold information and are more stable during evolution.
This process takes place in the nucleus. The mRNA molecules
then pass through the nuclear membrane into the cytoplasm.
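The intron removal and exon joining steps can be sketched on a string, assuming the intron positions are already known. The function name `splice`, the toy sequence, and the coordinates are hypothetical, chosen only for illustration; finding real intron boundaries is a separate problem.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given introns and join the remaining exons.

    Each intron is a half-open (start, end) index pair; the pairs are
    assumed to be non-overlapping and within the sequence.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)

# Toy transcript: exons in upper case, one intron in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCUUUUAA
```

Passing a different set of intron intervals for the same transcript gives a different mature mRNA, which is the idea behind alternative splicing.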
RNA roles
• Messenger RNA (mRNA)
– Encodes protein sequences. Every three nucleotides (a codon)
translate to one amino acid (the protein building block).
• Transfer RNA (tRNA)
– Decodes the mRNA molecules to amino acids. It binds
to the mRNA on one side and holds the appropriate
amino acid on its other side.
• Ribosomal RNA (rRNA)
– Part of the ribosome, a machine for translating mRNA to
proteins. It catalyzes (like enzymes) the reaction that
attaches the hanging amino acid from the tRNA to the
amino acid chain being created.
• ...
Central dogma
Proteins
Made of 20 amino acids
Translation
• Translation is mediated by the ribosome
• The ribosome is a complex of protein and rRNA
molecules
• The ribosome attaches to the mRNA at a
translation initiation site
• The ribosome then moves along the mRNA
sequence and in the process constructs a
sequence of amino acids (a polypeptide),
which is released and folds into a protein.
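The ribosome's codon-by-codon reading can be sketched as a table lookup. The snippet below builds the standard genetic code and, as a simplifying assumption, treats the first AUG as the initiation site and stops at the first stop codon; the name `translate` and the example sequence are illustrative, and the input is assumed to be upper-case RNA containing only A, C, G and U.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```

The table has 64 entries for 20 amino acids, so several codons map to the same amino acid.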
Helper molecules: tRNA
The Genetic Code
Protein Structure
• Proteins are polypeptides of 70-3000 amino acids
• A protein’s structure is (mostly) determined by the
sequence of amino acids that make up the protein
Protein structures
Various structures with different functions
• Structural framework (keratin, collagen)
• Transport and storage of small molecules (hemoglobin)
• Transmission of information (hormones, receptors)
• Antibodies
• Blood clotting factors
• Enzymes
Protein-Protein interactions
Pathways
Evolution
• Related organisms have similar DNA
– Similarity in sequences of proteins
– Similarity in organization of genes along the
chromosomes
• Evolution plays a major role in biology
– Many mechanisms are shared across a wide
range of organisms
– During the course of evolution existing
components are adapted for new functions
Evolution
Evolution of new organisms is driven by
• Diversity
– Different individuals carry different variants of
the same basic blueprint
• Mutations
– The DNA sequence can be changed due to
single base changes, deletion/insertion of
DNA segments, etc.
• Selection bias
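The kinds of mutation listed above can be sketched as string edits on a DNA sequence. The function names, positions and example sequence are hypothetical and only illustrate the three operations.

```python
def substitute(dna: str, position: int, base: str) -> str:
    """Single base change at the given 0-based position."""
    return dna[:position] + base + dna[position + 1:]

def insert(dna: str, position: int, segment: str) -> str:
    """Insert a segment before the given 0-based position."""
    return dna[:position] + segment + dna[position:]

def delete(dna: str, start: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position `start`."""
    return dna[:start] + dna[start + length:]

original = "ATGGCCTTT"
print(substitute(original, 4, "A"))  # ATGGACTTT
print(insert(original, 3, "CAT"))    # ATGCATGCCTTT
print(delete(original, 3, 3))        # ATGTTT
```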
Source: Alberts et al
The Tree of Life