Biology Tutorial
Aarti Balasubramani
Anusha Bharadwaj
Massa Shoura
Stefan Giovan
Viruses
A T4 bacteriophage injecting DNA into a cell.
Influenza A virus
Electron micrograph of HIV. Cone-shaped cores are sectioned in various orientations. Viral
genomic RNA is located in the electron-dense wide end of the core.
http://stc/istc.nsf/va_WebPages/InfluenzaEngPrint
http://pathmicro.med.sc.edu
Life Begins with Cells
All cells are Prokaryotic or Eukaryotic
http://course1.winona.edu/
Eukaryotic Cell
Endothelial cells under the microscope. Nuclei
are stained blue with DAPI, microtubules are
marked green by an antibody bound to FITC and
actin filaments are labeled red with phalloidin
bound to TRITC. Bovine pulmonary artery
endothelial cells
Cell Organelles
Nucleus = contains the genetic material
Mitochondrion = produces energy
Golgi complex = protein distribution
Endoplasmic Reticulum and Ribosomes = protein factory
Lysosome = degradation
http://microbewiki.kenyon.edu/
Plasma Membrane
DNA Replication
Base Pairing
A=T
CG
http://www.youtube.com/watch?v=teV62zrm2P0&feature=related
Life Cycle of a Cell
Cell division
RNA and protein synthesis
RNA and protein synthesis
Resting cells
DNA Replication
The Central Dogma of Biology
Replication
Transcription
http://www.youtube.com/watch?v=ztPkv7wc3yU
Translation
http://www.youtube.com/watch?v=-zb6r1MMTkc
Outline
• Cellular Biology
– Organelle Structure/Function
– Central Dogma
• Biochemistry
– Energy Storage/Utilization
– Macromolecules
• Bioinformatics
– Sequences and Databases
– Alignments, Tree Building, Modeling
Cells are Composed of a Molecular Hierarchy
• Small molecules
• Macromolecules
• Supramolecular complexes
BONDS, JUST BONDS
• Covalent – nuclei share common electrons
– STRONG!!
• Non-Covalent – No common electrons
– WEAK!!
• Ionic
• Non-Ionic
http://publications.nigms.nih.gov/chemhealth/images/ch1_bonds.gif
Macromolecular Structures are Stabilized by Weak Forces
Force | Strength, kJ mol^-1 | Distance Dependence | Effective Range, nm
Van der Waals interactions | 0.4 - 4 | r^-6 | 0.2
Hydrogen bonds | 4 - 48 | r^-3 | 0.3
Electrostatic interactions (unscreened) | 20 - 50 | r^-1 | 5 - 50
Hydrophobic interactions | <40 | ? | ?
Hydrophobic Interactions
Structures formed by amphipathic molecules in H2O
Vibrational frequencies of
O-H bond of H2O in ice,
liquid H2O and CCl4
van Holde, Johnson & Ho Principles of Physical Biochemistry Prentice Hall, Upper Saddle River, NJ (1998)
What Is DNA Made of?
5’
3’
DNA – The Double Helix
Levels of Chromatin Packing
The Human Genome
DNA to Amino Acids
Amino Acids – Protein Building Blocks
The Making of a Polypeptide Chain
The Four Levels of Protein Structure
• Primary: linear arrangement of monomeric units
• Secondary: local regular structure
• Tertiary: 3-dimensional folding of the molecule
• Quaternary: spatial arrangement of multiple subunits
Single Nucleotide Mutations
DNA Mutations
Experimental Techniques
Restriction Digestion
Use of Restriction Digestion to Identify
Mutations
(a) Wild-type and mutant DNA sequences
Gel Electrophoresis
Gel Electrophoresis-Visualizing DNA
The Polymerase Chain Reaction (PCR)
Cloning a human gene in a bacterial plasmid
Phenotype Tree Building
How Related are Organisms?
What do they eat? Where do they live? How do they divide? Move? Etc.
Qualitative
http://nai.arc.nasa.gov/seminars/68_Rivera/tree.jpg
Genotype Tree Building
How Related are Organisms?
How similar is their genome? Proteome?
MOLECULAR EVOLUTION
Quantitative
http://nai.arc.nasa.gov/seminars/68_Rivera/tree.jpg
Comparison of Genomes
• 1977- Φ-X174 genome sequenced
– Only about 5.4 kbp
• 1997- E. coli K-12 genome sequenced
– About 4.6 × 10^3 kbp
• 2007- Watson’s Genome sequenced!
– About 3 × 10^6 kbp!
• About 0.1% difference between human genomes and 1% difference between humans and chimps!
Bioinformatics is…
• Highly Interdisciplinary
– Proteomics and Genomics
– Structural and Computational Biology
– Systems Biology
– Computer Science, Probabilistic Modeling
• Computational Sequence Analysis
– What’s in a sequence?
Power of Prediction
• Can we …
– predict structural and functional properties of a protein given its sequence?
– predict the consequences of a mutation?
– design proteins or drugs with specific functions?
• Everything we need to know is at our fingertips; we just need a better understanding of the natural world
Protein Structure
• Structure adopted is completely determined by sequence of residues
• Compromise between comfort ($U$ or $H$) and freedom ($TS$)
$F = U - TS$
$G = H - TS$
http://www.news.cornell.edu/stories/Aug06/protein_folding.jpg
Secondary Structure Prediction
• 2° structures form beneficial H-bonds (lower E)
• α-helices, β-sheets
• Dihedral angles (φ, ψ)
Source: Wikipedia
Tertiary Structure Prediction
• Homology/Comparative Modeling
– BEST
– Structure of very related protein is known
• Fold Recognition/Threading
– OFTEN IS ENOUGH
– Similar folds available but no close relative
• Knowledge Based or A Priori Predictions
– ONLY POSSIBLE FOR VERY SHORT PROTEINS
– Fold prediction but without experimental quality
Sequence Alignments
• FASTA Text Format
>header – my sequence
THISISMYSEQ
>header – my thesis
THESISTHYSTING
• Alignment
THISIS-MYSE-Q-
THESISTHYSTING
• What can we learn from this?
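One quantity that can be read directly from an alignment like the one above is the percent identity. The following is a minimal sketch, assuming the two aligned rows have equal length and use "-" as the gap character; the function name `percent_identity` is illustrative and not part of any library.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both rows hold the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if the residues agree and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# The two aligned rows from the slide, with "-" marking gaps.
row1 = "THISIS-MYSE-Q-"
row2 = "THESISTHYSTING"
print(f"{percent_identity(row1, row2):.1f}% identity over {len(row1)} columns")
```

Normalizing by alignment length is only one convention; dividing by the shorter sequence length or by the number of ungapped columns gives different values.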
Alignments
• Pairwise
– Dot Plot
– Global(N-W) or Local(S-W)
• Simple Database Searches
– FASTA/BLAST
• Multiple Alignments
– CLUSTAL
• Advanced Strategies
– PSI/PHI-BLAST, HMM’s
[Figure: Dot plot of two subunits in Human Hemoglobin; axes: Alpha Chain, Beta Chain]
Databases
• Nucleotide Sequence Database Collaboration
– DDBJ, EMBL, GenBank at NCBI
• Amino Acid Databases
– UniProt, SWISS-PROT, TrEMBL
• Structural
– PDB, MMDB, MSD
• Very Many Derivations!
http://www.ncbi.nlm.nih.gov/Database/
Scoring Matrices
• PAM Matrix: Point Accepted Mutation
– PAM1 estimates the substitution rate if 1% of amino acids had changed. Standards: PAM30 and PAM60
• BLOSUM: BLOcks SUbstitution Matrix
– BLOSUM80 “blocks” together sequences with greater than 80% identity.
• Less Divergent: PAM1, BLOSUM80
• More Divergent: PAM250, BLOSUM45
FASTA and BLAST
• FASTA - FAST All, Rapid AA or NT Alignments
• BLAST – Basic Local Alignment Search Tool
• Scoring Alignments
– Raw and Bit Scores; $S' = \frac{\lambda S - \ln K}{\ln 2}$
– Significance of Local Alignment; $E = m n \, 2^{-S'}$
– Significance of Global Alignment; $Z = \frac{x - \mu}{\sigma}$
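The bit score and E-value relations on this slide can be evaluated directly. The sketch below assumes `lam` and `k` are the statistical parameters λ and K of the scoring system, and `m` and `n` are the query and database lengths; the numeric values used in the example are illustrative placeholders, not parameters taken from the slides.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, m: int, n: int) -> float:
    """Expected number of chance local alignments, E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative values only: lam and k depend on the scoring matrix and gap costs.
lam, k = 0.267, 0.041
bits = bit_score(raw_score=100, lam=lam, k=k)
print(f"bit score = {bits:.1f}, E = {e_value(bits, m=250, n=10**8):.3g}")
```

Because the bit score already absorbs λ and K, E-values computed from bit scores are comparable across scoring systems, which is the reason for normalizing the raw score.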
Nucleotide Sequence Distances
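• Jukes-Cantor, single parameter
$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$
– p: proportion of sites that differ between the two sequences
[Diagram: substitutions among A, G, C and T, all at a single rate]
• Kimura, 2 parameter
$d = \frac{1}{2} \ln\left(\frac{1}{1 - 2p - q}\right) + \frac{1}{4} \ln\left(\frac{1}{1 - 2q}\right)$
– p: proportion of sites with a transition, q: proportion of sites with a transversion
[Diagram: substitutions among A, G, C and T, with separate rates for transitions and transversions]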
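The two distance corrections can be sketched in a few lines. This assumes two aligned, gap-free nucleotide sequences of equal length; in the Kimura formula `p` is the fraction of sites showing a transition (A<->G, C<->T) and `q` the fraction showing a transversion. The helper names and example sequences are illustrative.

```python
import math

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def difference_fractions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (transition fraction, transversion fraction) for aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(seq1.upper(), seq2.upper()))
    ts = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS)
    return ts / len(pairs), tv / len(pairs)


def jukes_cantor(p: float) -> float:
    """d = -(3/4) ln(1 - (4/3) p), with p the total fraction of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(p: float, q: float) -> float:
    """d = (1/2) ln(1 / (1 - 2p - q)) + (1/4) ln(1 / (1 - 2q))."""
    return 0.5 * math.log(1.0 / (1.0 - 2.0 * p - q)) + 0.25 * math.log(
        1.0 / (1.0 - 2.0 * q)
    )


ts, tv = difference_fractions("ACGTACGTAC", "ACATACGTCC")
print("Jukes-Cantor:", jukes_cantor(ts + tv))
print("Kimura 2-parameter:", kimura_2p(ts, tv))
```

Both formulas are undefined once the argument of a logarithm reaches zero (for Jukes-Cantor, p >= 0.75), which corresponds to sequences that are saturated with substitutions.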
Distance Based Tree Building
• Tree Building => UPGMA
– Smallest distance element -> nearest neighbors
$t_1 = t_2 = 0.5\,d_{12}$
Distance matrix:
      1     2     3     4
2   0.1
3   0.8   0.8
4   0.8   1     0.3
5   0.9   0.9   0.3   0.2
[Tree: taxa 1 and 2 joined, branch length 0.05 each]
Distance Based Tree Building
• Tree Building => UPGMA
– Smallest distance element -> nearest neighbors
$t_4 = t_5 = 0.5\,d_{45}$
Distance matrix:
     6(1,2)   3     4
3    0.8
4    0.9     0.3
5    0.9     0.3   0.2
[Tree: node 6 joins taxa 1 and 2; taxa 4 and 5 joined, branch length 0.10 each]
Distance Based Tree Building
• Tree Building => UPGMA
– Smallest distance element -> nearest neighbors
$t_3 = 0.5\,d_{37}$
Distance matrix:
          6(1,2)   3
3         0.8
7(4,5)    0.9     0.3
[Tree: node 6 joins taxa 1 and 2; node 7 joins taxa 4 and 5; taxon 3 joins node 7 at height 0.15]
Distance Based Tree Building
• Tree Building => UPGMA
– Smallest distance element -> nearest neighbors
$t_6 = 0.5\,d_{68}$
Distance matrix:
            6(1,2)
8(3,4,5)    0.85
[Tree: root node 9 joins node 6 (1,2) and node 8 (3,4,5) at height 0.425]
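The clustering steps walked through on these slides can be sketched as a short loop. This is a minimal illustration, assuming distances are stored in a dictionary keyed by unordered pairs and clusters are represented as nested tuples; the function name `upgma` and the data layout are choices made here, not the deck's.

```python
from itertools import combinations


def upgma(dist, labels):
    """Cluster by repeatedly joining the closest pair.

    dist maps frozenset({a, b}) -> distance. Returns a list of
    (left, right, node_height) tuples in the order the joins were made.
    """
    d = dict(dist)
    sizes = {label: 1 for label in labels}
    active = list(labels)
    merges = []
    while len(active) > 1:
        # Smallest distance element -> nearest neighbors.
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        height = 0.5 * d[frozenset((a, b))]
        new = (a, b)
        merges.append((a, b, height))
        active = [x for x in active if x not in (a, b)]
        for x in active:
            # Average over all leaves: weight each child by its cluster size.
            d[frozenset((new, x))] = (
                sizes[a] * d[frozenset((a, x))] + sizes[b] * d[frozenset((b, x))]
            ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        active.append(new)
    return merges


# Lower-triangular distances for taxa 1..5, as read from the first UPGMA slide.
rows = {2: [0.1], 3: [0.8, 0.8], 4: [0.8, 1.0, 0.3], 5: [0.9, 0.9, 0.3, 0.2]}
dist = {frozenset((i, j + 1)): v for i, row in rows.items() for j, v in enumerate(row)}
for left, right, height in upgma(dist, [1, 2, 3, 4, 5]):
    print(left, right, round(height, 4))
```

The first three joins reproduce the slide values (heights 0.05, 0.10 and 0.15). For the last join, size-weighted averaging gives (0.8 + 2 × 0.9) / 3 ≈ 0.867 and a root height of about 0.433; the 0.85 and 0.425 shown on the slide correspond to the unweighted mean of 0.8 and 0.9, which is the WPGMA variant.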
Distance Based Tree Building
• UPGMA is efficient but makes non-biological
assumption that rate of substitution is constant
for all branches
– Useful in a variety of applications such as microarray
data processing
• Neighbor-Joining does not make this assumption
and is still efficient
– More accurate for use in phylogenetic analyses
• Also -> Maximum Parsimony, Maximum
Likelihood, Minimum Evolution, and Bayesian
methods
Energy Calculations
• Goal: Find Unique Arrangement of Atoms
which Maximizes Stability
• Experimental (usually X-ray or NMR)
• Monte Carlo
– Explore states ∝ $e^{-U / k_B T}$
– Let T->0 and discover low energy states (Simulated Annealing)
• Molecular Dynamics
– Newtonian mechanics to evolve the system
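The Monte Carlo bullet can be illustrated with a Metropolis sampler whose temperature is lowered over time. This is a sketch under stated assumptions: the one-dimensional double-well `potential`, the geometric cooling schedule, the step size, and units with k_B = 1 are all hypothetical choices made for illustration, not anything specified in the slides.

```python
import math
import random


def potential(x: float) -> float:
    """Hypothetical 1-D double-well energy surface (not from the slides)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(x0, t_start=2.0, t_end=0.01, steps=5000,
                        step_size=0.5, seed=0):
    """Metropolis sampling with T lowered toward zero; returns the best (x, U)."""
    rng = random.Random(seed)
    x, u = x0, potential(x0)
    best_x, best_u = x, u
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (k_B = 1).
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)
        u_new = potential(x_new)
        # Metropolis rule: accept with probability min(1, exp(-(U_new - U) / T)).
        if u_new <= u or rng.random() < math.exp(-(u_new - u) / t):
            x, u = x_new, u_new
            if u < best_u:
                best_x, best_u = x, u
    return best_x, best_u


best_x, best_u = simulated_annealing(x0=2.0)
print(f"lowest energy found: U({best_x:.3f}) = {best_u:.3f}")
```

At high temperature uphill moves are accepted often, so the walker can cross the barrier between wells; as T approaches zero it settles into whichever minimum it currently occupies.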
Molecular Mechanics
$E = K + V$
$V = \sum_i V_{i,\mathrm{bonding}} + V_{i,\mathrm{nonbond}}$
$K = \sum_i \frac{1}{2} m_i v_i^2 = \sum_i \frac{1}{2} \frac{p_i^2}{m_i}$
$F_i = -\nabla_i V = -\left( \frac{\partial V}{\partial x_i}, \frac{\partial V}{\partial y_i}, \frac{\partial V}{\partial z_i} \right)$
E : Total energy
K : Kinetic energy
V : Potential energy; sum of covalent and noncovalent interactions
v_i : Velocity of particle i
p_i : Momentum of particle i
F_i : Force acting on particle i (negative gradient of potential energy)
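To show how these quantities drive a molecular dynamics step, here is a small sketch that integrates Newton's equations for a single particle with the velocity Verlet scheme. The harmonic potential V = k x²/2, the unit mass, and the time step are assumptions chosen for illustration; a real force field would sum bonding and nonbonding terms over many particles.

```python
def force(x: float, k: float = 1.0) -> float:
    """F = -dV/dx for the illustrative harmonic potential V = 0.5 * k * x**2."""
    return -k * x


def total_energy(x: float, v: float, m: float = 1.0, k: float = 1.0) -> float:
    """E = K + V with K = 0.5 * m * v**2 and V = 0.5 * k * x**2."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, m=1.0, k=1.0, dt=0.01, steps=1000):
    """Advance position and velocity by `steps` time steps of size dt."""
    f = force(x, k)
    for _ in range(steps):
        v += 0.5 * dt * f / m   # half-step velocity update with the old force
        x += dt * v             # full-step position update
        f = force(x, k)         # force at the new position
        v += 0.5 * dt * f / m   # second half-step with the new force
    return x, v


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0)
print("energy before:", total_energy(x0, v0))
print("energy after: ", total_energy(x1, v1))
```

Velocity Verlet is a common choice because it is time-reversible and keeps the total energy E close to constant over long runs, which gives a simple check on the time step.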
Fold It!!
FOLD IT
http://fold.it/portal/info/science
Pairwise Alignment
• Dot Plot
– Visual and Qualitative
• Needleman-Wunsch Global Alignment
– Alignment over entire sequence
• Smith-Waterman Local Alignment
– Alignment over subsequences
[Figure: Dot plot of two subunits in Human Hemoglobin; axes: Alpha Chain, Beta Chain]
http://lectures.molgen.mpg.de/Pairwise/DotPlots/
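A dot plot is simple enough to sketch in text form: mark every cell where the residue of one sequence equals the residue of the other. The example below uses the two short peptides from the N-W slides in this deck rather than the hemoglobin chains; the function name `dot_plot` is illustrative.

```python
def dot_plot(seq_rows: str, seq_cols: str) -> list[str]:
    """One text row per residue of seq_rows; '*' marks identical residues."""
    return [
        "".join("*" if a == b else "." for b in seq_cols)
        for a in seq_rows
    ]


seq1, seq2 = "FKHMEDPLE", "FMDTPLNE"
print("  " + seq2)
for residue, line in zip(seq1, dot_plot(seq1, seq2)):
    print(residue, line)
```

Runs of marks along a diagonal indicate regions that align without gaps; shifts between diagonals suggest insertions or deletions. Practical dot plots usually score a sliding window rather than single residues to suppress noise.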
N-W Alignment
• Produces Optimal Global Alignment
– Without exhaustive pairwise comparison
• Scoring Matrix, S
    F  M  D  T  P  L  N  E
F   1
K
H
M      1
E                        1
D         1
P               1
L                  1
E                        1
• Simple scoring matrix for these
sequences
• Matches get a score of +1
• Mismatches (blank) get a score of -2
• One could also use BLOSUM or PAM
scoring matrix for example
N-W Alignment
• Produces Optimal Global Alignment
– Without exhaustive pairwise comparison
• Alignment Matrix, F
           F    M    D    T    P    L    N    E
      0   -2   -4   -6   -8  -10  -12  -14  -16
F    -2   +1
K    -4
H    -6
M    -8
E   -10
D   -12
P   -14
L   -16
E   -18
$F_{ij} = \max \{ F_{i-1,j-1} + S_{kl},\; F_{i-1,j} + \mathrm{gap},\; F_{i,j-1} + \mathrm{gap} \}$
Match always results in largest 𝐹𝑖𝑗,
else take the largest score from
• mismatch,
• gap in sequence 1 , or
• gap in sequence 2 .
N-W Alignment
• Produces Optimal Global Alignment
– Without exhaustive pairwise comparison
• Build Alignment Matrix, F
           F    M    D    T    P    L    N    E
      0   -2   -4   -6   -8  -10  -12  -14  -16
F    -2   +1   -1   -3   -5   -7   -9  -11  -13
K    -4   -1   -1
H    -6
M    -8
E   -10
D   -12
P   -14
L   -16
E   -18
$F_{ij} = \max \{ F_{i-1,j-1} + S_{kl},\; F_{i-1,j} + \mathrm{gap},\; F_{i,j-1} + \mathrm{gap} \}$
N-W Alignment
• Produces Optimal Global Alignment
– Without exhaustive pairwise comparison
• Build Alignment Matrix, F
           F    M    D    T    P    L    N    E
      0   -2   -4   -6   -8  -10  -12  -14  -16
F    -2   +1   -1   -3   -5   -7   -9  -11  -13
K    -4   -1   -1   -3   -5   -7   -9  -11  -13
H    -6   -3   -3   -3   -5   -7   -9  -11  -13
M    -8   -5   -2   -4   -5   -7   -9  -11  -13
E   -10   -7   -4   -4   -6   -7   -9  -11  -10
D   -12   -9   -6   -3   -5   -7   -9  -11  -12
P   -14  -11   -8   -5   -5   -4   -6   -8  -10
L   -16  -13  -10   -7   -7   -6   -3   -5   -7
E   -18  -15  -12   -9   -9   -8   -5   -5   -4
$F_{ij} = \max \{ F_{i-1,j-1} + S_{kl},\; F_{i-1,j} + \mathrm{gap},\; F_{i,j-1} + \mathrm{gap} \}$
Overall alignment score: -4 (bottom-right cell)
N-W Alignment
• Produces Optimal Global Alignment
– Without exhaustive pairwise comparison
• Trace Back to Determine Optimum Alignment
           F    M    D    T    P    L    N    E
      0   -2   -4   -6   -8  -10  -12  -14  -16
F    -2   +1   -1   -3   -5   -7   -9  -11  -13
K    -4   -1   -1   -3   -5   -7   -9  -11  -13
H    -6   -3   -3   -3   -5   -7   -9  -11  -13
M    -8   -5   -2   -4   -5   -7   -9  -11  -13
E   -10   -7   -4   -4   -6   -7   -9  -11  -10
D   -12   -9   -6   -3   -5   -7   -9  -11  -12
P   -14  -11   -8   -5   -5   -4   -6   -8  -10
L   -16  -13  -10   -7   -7   -6   -3   -5   -7
E   -18  -15  -12   -9   -9   -8   -5   -5   -4
Traceback arrows: Match or Mismatch; Gap in Sequence 1; Gap in Sequence 2
Seq1: F K H M E D - P L - E
Seq2: F - - M - D T P L N E
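The matrix fill and traceback shown on these slides can be reproduced with a short dynamic-programming sketch. It assumes the simple scheme stated earlier (match +1, mismatch -2) and a linear gap penalty of -2, which is the value implied by the first row and column of F; the function name and return layout are illustrative.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-2, gap=-2):
    """Global alignment; returns (F matrix, aligned seq1, aligned seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: leading gaps only
        f[i][0] = i * gap
    for j in range(1, cols):          # first row: leading gaps only
        f[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s,   # match or mismatch
                          f[i - 1][j] + gap,     # residue of seq1 against a gap
                          f[i][j - 1] + gap)     # residue of seq2 against a gap
    # Trace back from the bottom-right cell to the origin.
    a1, a2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + s:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return f, "".join(reversed(a1)), "".join(reversed(a2))


f, aligned1, aligned2 = needleman_wunsch("FKHMEDPLE", "FMDTPLNE")
print("score:", f[-1][-1])
print("Seq1:", aligned1)
print("Seq2:", aligned2)
```

Several alignments can share the optimal score, so the traceback may return a different but equally scoring alignment than the one drawn on the slide, depending on how ties between the three moves are broken.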
Smith-Waterman Alignment
• Local alignment, Similar in Nature to N-W
– F takes only non-negative values (negative scores are reset to 0)
– Highest value in matrix corresponds to end of alignment, need not be in corner
– No penalty for gaps at ends
• Most rigorous method of aligning nucleotide
or protein sequence domains
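The differences from N-W listed above amount to two small changes in the recursion and traceback. The sketch below assumes the same illustrative scoring as the N-W example (match +1, mismatch -2, gap -2); the function name is a choice made here.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-2, gap=-2):
    """Local alignment; returns (best score, aligned part of seq1, of seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    f = [[0] * cols for _ in range(rows)]   # first row and column stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # Same three moves as N-W, but never allowed to drop below zero.
            f[i][j] = max(0,
                          f[i - 1][j - 1] + s,
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap)
            if f[i][j] > best:
                best, best_pos = f[i][j], (i, j)
    # Trace back from the highest cell until a zero is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and f[i][j] > 0:
        s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if f[i][j] == f[i - 1][j - 1] + s:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif f[i][j] == f[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, local1, local2 = smith_waterman("FKHMEDPLE", "FMDTPLNE")
print(score, local1, local2)
```

With these harsh mismatch and gap penalties only the shared "PL" dipeptide survives as a local alignment; a substitution matrix such as BLOSUM or PAM with milder gap costs would typically extend it.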
Database Searches
• S-W produces the optimal pairwise alignment, but is inefficient for scanning databases
• Scan for likely matches before performing
more rigorous alignments
– FASTA, BLAST
• Scan for words scoring higher than some
threshold, extend alignment until score drops
Advanced Database Searches
• When BLAST falls short
– Detecting homology between distantly related
proteins
– Very long (>20kbp) genome sequences with highly
conserved regions and highly variable regions
• PSI-BLAST (Position-Specific Iterated)
– BLAST generates Position Specific Scoring Matrix
– PSSM used as query to re-search database
• Also, PHI-BLAST, HMMs…
Multiple Sequence Alignments
• Exact Approaches
– e.g. N-W alignments
– Prohibitive for many or long sequences
• Progressive Approaches
– e.g. CLUSTAL
• Iterative Approaches
• Consistency-Based Approaches
• Structure-Based Methods
Distance Between Sequences
• Based on theory of molecular evolution
differences → distances
• Simplest method, Hamming distance, $d = 100 \times p$
• Multiple substitutions at single site?
• Poisson correction, $d = -\ln(1 - p)$
– Assume: Probability of observing a change is small, but constant across all sites
– Rate of mutation is constant over time
– Mutations at different sites occur independently
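Both distances on this slide follow from the fraction p of differing sites. A minimal sketch, assuming two aligned sequences of equal length with no gaps; the function names and example peptides are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def hamming_percent(p: float) -> float:
    """Hamming distance expressed as a percentage, d = 100 * p."""
    return 100.0 * p


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, d = -ln(1 - p)."""
    return -math.log(1.0 - p)


p = p_distance("MKVLAAGIVG", "MKILAAGLVG")
print("p =", p)
print("Hamming (%):", hamming_percent(p))
print("Poisson-corrected:", poisson_distance(p))
```

The correction is always at least as large as p and grows faster as p increases, reflecting sites that have changed more than once but are observed as a single difference.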
James Watson, Francis Crick and
Rosalind Franklin