Biology Tutorial
Aarti Balasubramani, Anusha Bharadwaj, Massa Shoura, Stefan Giovan

Viruses
A T4 bacteriophage injecting DNA into a cell.
Influenza A virus.
Electron micrograph of HIV. Cone-shaped cores are sectioned in various orientations. Viral genomic RNA is located in the electron-dense wide end of the core.
http://stc/istc.nsf/va_WebPages/InfluenzaEngPrint
http://pathmicro.med.sc.edu

Life Begins with Cells

All cells are Prokaryotic or Eukaryotic
http://course1.winona.edu/

Eukaryotic Cell
Endothelial cells under the microscope. Nuclei are stained blue with DAPI, microtubules are marked green by an antibody bound to FITC, and actin filaments are labeled red with phalloidin bound to TRITC.
Bovine pulmonary artery endothelial cells

Cell Organelles
• Nucleus = contains the genetic material
• Mitochondrion = produces energy
• Golgi complex = protein distribution
• Endoplasmic Reticulum and Ribosomes = protein factory
• Lysosome = degradation
http://microbewiki.kenyon.edu/

Plasma Membrane

DNA Replication
Base Pairing: A=T, C≡G
http://www.youtube.com/watch?v=teV62zrm2P0&feature=related

Life Cycle of a Cell
Figure labels: cell division; RNA and protein synthesis; resting cells; DNA replication.

The Central Dogma of Biology
• Replication
• Transcription http://www.youtube.com/watch?v=ztPkv7wc3yU
• Translation http://www.youtube.com/watch?v=-zb6r1MMTkc

Outline
• Cellular Biology
  – Organelle Structure/Function
  – Central Dogma
• Biochemistry
  – Energy Storage/Utilization
  – Macromolecules
• Bioinformatics
  – Sequences and Databases
  – Alignments, Tree Building, Modeling

Cells are Composed of a Molecular Hierarchy
• Small molecules
• Macromolecules
• Supramolecular complexes

BONDS, JUST BONDS
• Covalent
  – Nuclei share common electrons
  – STRONG!!
• Non-Covalent
  – No common electrons
  – WEAK!!
• Ionic
• Non-Ionic
http://publications.nigms.nih.gov/chemhealth/images/ch1_bonds.gif

Macromolecular Structures are Stabilized by Weak Forces
  Force                                    | Strength, kJ mol^-1 | Distance Dependence | Effective Range, nm
  Van der Waals interactions               | 0.4 - 4             | $r^{-6}$            | 0.2
  Hydrogen bonds                           | 4 - 48              | $r^{-3}$            | 0.3
  Electrostatic interactions (unscreened)  | 20 - 50             | $r^{-1}$            | 5 - 50
  Hydrophobic interactions                 | <40                 | ?                   | ?

Hydrophobic Interactions
Structures formed by amphipathic molecules in H2O.
Vibrational frequencies of the O-H bond of H2O in ice, liquid H2O and CCl4.
van Holde, Johnson & Ho, Principles of Physical Biochemistry, Prentice Hall, Upper Saddle River, NJ (1998)

What Is DNA Made of?
Figure labels: 5’, 3’

DNA – The Double Helix

Levels of Chromatin Packing

The Human Genome

DNA to Amino Acids

Amino Acids – Proteins Building Blocks

The Making of a Polypeptide Chain

The Four Levels of Protein Structure
• Primary: linear arrangement of monomeric units
• Secondary: local regular structure
• Tertiary: 3-dimensional folding of the molecule
• Quaternary: spatial arrangement of multiple subunits

Single Nucleotide Mutations

DNA Mutations

Experimental Techniques

Restriction Digestion

Use of Restriction Digestion to Identify Mutations
(a) Wild-type and mutant DNA sequences

Gel Electrophoresis

Gel Electrophoresis-Visualizing DNA

The Polymerase Chain Reaction (PCR)

Cloning a human gene in a bacterial plasmid

Phenotype Tree Building
How Related are Organisms?
What do they eat? Where do they live? How do they divide? Move? Etc.
Qualitative
http://nai.arc.nasa.gov/seminars/68_Rivera/tree.jpg

Genotype Tree Building
How Related are Organisms?
How similar is their genome? Proteome?
MOLECULAR EVOLUTION
Quantitative
http://nai.arc.nasa.gov/seminars/68_Rivera/tree.jpg

Comparison of Genomes
• 1977- Φ-X174 genome sequenced
  – Only about 5.4 kbp
• 1997- E. coli K-12 genome sequenced
  – About $4.6 \times 10^{3}$ kbp
• 2007- Watson’s Genome sequenced!
  – About $3 \times 10^{6}$ kbp!
• About 0.1% difference between human genomes and 1% difference between humans and chimps!

Bioinformatics is…
• Highly Interdisciplinary
  – Proteomics and Genomics
  – Structural and Computational Biology
  – Systems Biology
  – Computer Science, Probabilistic Modeling
• Computational Sequence Analysis
  – What’s in a sequence?

Power of Prediction
• Can we …
  – predict structural and functional properties of a protein given its sequence?
  – predict the consequences of a mutation?
  – design proteins or drugs with specific functions?
• Everything we need to know is at our fingertips; we just need a better understanding of the natural world

Protein Structure
• Structure adopted is completely determined by the sequence of residues
• Compromise between comfort ($U$ or $H$) and freedom ($TS$)
  $F = U - TS$
  $G = H - TS$
http://www.news.cornell.edu/stories/Aug06/protein_folding.jpg

Secondary Structure Prediction
• 2° structures form beneficial H-bonds (lower E)
• α-helices, β-sheets
• Dihedral angles ($\phi$, $\psi$)
Source: Wikipedia

Tertiary Structure Prediction
• Homology/Comparative Modeling – BEST
  – Structure of a closely related protein is known
• Fold Recognition/Threading – OFTEN IS ENOUGH
  – Similar folds available but no close relative
• Knowledge Based or A Priori Predictions – ONLY POSSIBLE FOR VERY SHORT PROTEINS
  – Fold prediction but without experimental quality

Sequence Alignments
• FASTA Text Format
    >header – my sequence
    THISISMYSEQ
    >header – my thesis
    THESISTHYSTING
• Alignment
    THISIS-MYSE-Q-
    THESISTHYSTING
• What can we learn from this?
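One way to produce an alignment like the one above is global dynamic programming (the Needleman-Wunsch method covered on the N-W Alignment slides). The following is a minimal sketch, not the deck's own implementation: the function name `needleman_wunsch` is hypothetical, the scores (+1 match, -2 mismatch) follow the N-W Alignment slides, and the gap penalty of -2 is an assumption read off the first row of the alignment matrix shown there. Ties in the trace-back are broken arbitrarily, so the printed alignment may differ from a hand-drawn one with the same score.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-2, gap=-2):
    """Return (score, aligned_seq1, aligned_seq2) for a global alignment.

    Scoring values are assumptions taken from the N-W Alignment slides.
    """
    n, m = len(seq1), len(seq2)
    # F[i][j] = best score for aligning seq1[:i] with seq2[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # leading gaps in seq2
    for j in range(1, m + 1):
        F[0][j] = j * gap          # leading gaps in seq1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # match or mismatch
                          F[i - 1][j] + gap,    # gap in seq2
                          F[i][j - 1] + gap)    # gap in seq1

    # Trace back from the bottom-right corner to recover one optimal path.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-")
            i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a1)), "".join(reversed(a2))


# Toy sequences from the FASTA example above.
score, top, bottom = needleman_wunsch("THISISMYSEQ", "THESISTHYSTING")
print(score)
print(top)
print(bottom)
```

Swapping the fixed match/mismatch values for a lookup in a substitution matrix such as BLOSUM or PAM would only change how `s` is computed; the recurrence and trace-back stay the same.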
Alignments
• Pairwise
  – Dot Plot
  – Global (N-W) or Local (S-W)
• Simple Database Searches
  – FASTA/BLAST
• Multiple Alignments
  – CLUSTAL
• Advanced Strategies
  – PSI/PHI-BLAST, HMMs
Figure: Dot plot of two subunits in Human Hemoglobin (axes: Alpha Chain, Beta Chain)

Databases
• Nucleotide Sequence Database Collaboration
  – DDBJ, EMBL, GenBank at NCBI
• Amino Acid Databases
  – UniProt, SWISS-PROT, TrEMBL
• Structural
  – PDB, MMDB, MSD
• Very Many Derivations!
http://www.ncbi.nlm.nih.gov/Database/

Scoring Matrices
• PAM Matrix : Point Accepted Mutation
  – PAM1 estimates substitution rate if 1% of AA had changed. Standards: PAM30 and PAM60
• BLOSUM : BLOcks of Amino Acid SUbstitution Matrix
  – BLOSUM80 “blocks” together sequences with greater than 80% similarity.
• Less Divergent: PAM1, BLOSUM80
• More Divergent: PAM250, BLOSUM45

FASTA and BLAST
• FASTA - FAST All, Rapid AA or NT Alignments
• BLAST – Basic Local Alignment Search Tool
• Scoring Alignments
  – Raw and Bit Scores; $S' = \frac{\lambda S - \ln K}{\ln 2}$
  – Significance of Local Alignment; $E = m n \, 2^{-S'}$
  – Significance of Global Alignment; $Z = \frac{x - \mu}{\sigma}$

Nucleotide Sequence Distances
• Jukes-Cantor, single parameter
  $d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$
• Kimura, 2 parameter (with $p$ the fraction of transitional and $q$ the fraction of transversional differences)
  $d = \frac{1}{2} \ln\left(\frac{1}{1 - 2p - q}\right) + \frac{1}{4} \ln\left(\frac{1}{1 - 2q}\right)$
Figure: substitution diagrams among A, G, C, T for each model

Distance Based Tree Building
• Tree Building => UPGMA
  – Smallest distance element -> nearest neighbors

Step 1: the smallest element is $d_{12} = 0.1$
        1     2     3     4     5
  1     -
  2     0.1   -
  3     0.8   0.8   -
  4     0.8   1     0.3   -
  5     0.9   0.9   0.3   0.2   -
Join 1 and 2 into node 6: $t_1 = t_2 = 0.5\,d_{12} = 0.05$

Step 2: the smallest element is $d_{45} = 0.2$
          6(1,2)  3     4     5
  6(1,2)  -
  3       0.8     -
  4       0.9     0.3   -
  5       0.9     0.3   0.2   -
Join 4 and 5 into node 7: $t_4 = t_5 = 0.5\,d_{45} = 0.10$

Step 3: the smallest element is $d_{37} = 0.3$
          6(1,2)  3     7(4,5)
  6(1,2)  -
  3       0.8     -
  7(4,5)  0.9     0.3   -
Join 3 and 7 into node 8: $t_3 = 0.5\,d_{37} = 0.15$

Step 4: one element remains, $d_{68} = 0.85$
            6(1,2)  8(3,4,5)
  6(1,2)    -
  8(3,4,5)  0.85    -
Join 6 and 8 into root node 9: $t_6 = 0.5\,d_{68} = 0.425$

Distance Based Tree Building
• UPGMA is efficient but makes the non-biological assumption that the rate of substitution is constant for all branches
  – Useful in a variety of applications such as microarray data processing
• Neighbor-Joining does not make this assumption and is still efficient
  – More accurate for use in phylogenetic analyses
• Also -> Maximum Parsimony, Maximum Likelihood, Minimum Evolution, and Bayesian methods

Energy Calculations
• Goal: Find Unique Arrangement of Atoms which Maximizes Stability
• Experimental (usually X-ray or NMR)
• Monte Carlo
  – Explore states with probability ∝ $e^{-U / k_B T}$
  – Let T->0 and discover low energy states (Simulated Annealing)
• Molecular Dynamics
  – Newtonian mechanics to evolve the system

Molecular Mechanics
  $E = K + V$
  $V = \sum_i \left( V_{i,\mathrm{bonding}} + V_{i,\mathrm{nonbond}} \right)$
  $K = \sum_i \frac{p_i^2}{2 m_i} = \frac{1}{2} \sum_i m_i v_i^2$
  $F_i = -\nabla_i V = -\left( \frac{\partial V}{\partial x_i}, \frac{\partial V}{\partial y_i}, \frac{\partial V}{\partial z_i} \right)$
• $E$ : Total energy
• $K$ : Kinetic energy
• $V$ : Potential energy, the sum of covalent and noncovalent interactions
• $v_i$ : Velocity of particle i
• $p_i$ : Momentum of particle i
• $m_i$ : Mass of particle i
• $F_i$ : Force acting on particle i (negative gradient of potential energy)

Fold It!!
FOLD IT
http://fold.it/portal/info/science

Pairwise Alignment
• Dot Plot
  – Visual and Qualitative
• Needleman-Wunsch Global Alignment
  – Alignment over entire sequence
• Smith-Waterman Local Alignment
  – Alignment over subsequences
Figure: Dot plot of two subunits in Human Hemoglobin (axes: Alpha Chain, Beta Chain)
http://lectures.molgen.mpg.de/Pairwise/DotPlots/

N-W Alignment
• Produces Optimal Global Alignment
  – Without exhaustive pairwise comparison
• Scoring Matrix, S (sequence 1, FKHMEDPLE, down the rows; sequence 2, FMDTPLNE, across the columns)
        F   M   D   T   P   L   N   E
    F   1   .   .   .   .   .   .   .
    K   .   .   .   .   .   .   .   .
    H   .   .   .   .   .   .   .   .
    M   .   1   .   .   .   .   .   .
    E   .   .   .   .   .   .   .   1
    D   .   .   1   .   .   .   .   .
    P   .   .   .   .   1   .   .   .
    L   .   .   .   .   .   1   .   .
    E   .   .   .   .   .   .   .   1
• Simple scoring matrix for these sequences
• Matches get a score of +1
• Mismatches (blank on the slide, shown here as .) get a score of -2
• One could also use a BLOSUM or PAM scoring matrix, for example

N-W Alignment
• Alignment Matrix, F
  – The first row (0, -2, -4, …, -16) and first column (0, -2, -4, …, -18) hold cumulative gap penalties of -2 per position
  – The first interior cell (F aligned with F) scores +1
  $F_{ij} = \max \left\{ F_{i-1,j-1} + S_{kl},\; F_{i-1,j} + \mathrm{gap},\; F_{i,j-1} + \mathrm{gap} \right\}$
A match always results in the largest $F_{ij}$; otherwise take the largest score from
mismatch, gap in sequence 1, or gap in sequence 2.

N-W Alignment
• Build Scoring Matrix, F, row by row with
  $F_{ij} = \max \left\{ F_{i-1,j-1} + S_{kl},\; F_{i-1,j} + \mathrm{gap},\; F_{i,j-1} + \mathrm{gap} \right\}$
        -     F     M     D     T     P     L     N     E
    -   0    -2    -4    -6    -8   -10   -12   -14   -16
    F  -2    +1    -1    -3    -5    -7    -9   -11   -13
    K  -4    -1    -1    -3    -5    -7    -9   -11   -13
    H  -6    -3    -3    -3    -5    -7    -9   -11   -13
    M  -8    -5    -2    -4    -5    -7    -9   -11   -13
    E  -10   -7    -4    -4    -6    -7    -9   -11   -10
    D  -12   -9    -6    -3    -5    -7    -9   -11   -12
    P  -14   -11   -8    -5    -5    -4    -6    -8   -10
    L  -16   -13   -10   -7    -7    -6    -3    -5    -7
    E  -18   -15   -12   -9    -9    -8    -5    -5    -4
• Overall alignment score: the bottom-right cell, -4

N-W Alignment
• Trace Back to Determine Optimum Alignment
  – Start at the bottom-right cell and follow the moves that produced each score
  – Moves: Match or Mismatch, Gap in Sequence 1, Gap in Sequence 2
    Seq1: F K H M E D - P L - E
    Seq2: F - - M - D T P L N E

Smith-Waterman Alignment
• Local alignment, Similar in Nature to N-W
  – Alignment matrix takes only non-negative values (scores below zero are reset to 0)
  – Highest value in matrix corresponds to end of alignment, need not be in corner
  – No penalty for gaps at ends
• Most rigorous method of aligning nucleotide or protein sequence domains

Database Searches
• Optimal pairwise alignment produced by S-W, but too slow for scanning large databases
• Scan for likely matches before performing more rigorous alignments
  – FASTA, BLAST
• Scan for words scoring higher than some threshold, extend alignment until score drops

Advanced Database Searches
• When BLAST falls short
  – Detecting homology between distantly related proteins
  – Very long (>20 kbp) genome sequences with highly conserved regions and highly variable regions
• PSI-BLAST (Position-Specific Iterated)
  – BLAST generates a Position Specific Scoring Matrix (PSSM)
  – PSSM used as query to re-search database
• Also, PHI-BLAST, HMMs…

Multiple Sequence Alignments
• Exact Approaches
  – e.g. N-W alignments
  – Prohibitive for many or long sequences
• Progressive Approaches
  – e.g. CLUSTAL
• Iterative Approaches
• Consistency-Based Approaches
• Structure-Based Methods

Distance Between Sequences
• Based on theory of molecular evolution: differences -> distances
• Simplest method, Hamming distance, $d = 100 \times p$, where $p$ is the fraction of sites that differ
• Multiple substitutions at single site?
• Poisson correction, $d = -\ln(1 - p)$
  – Assume: Probability of observing a change is small, but constant across all sites
  – Rate of mutation is constant over time
  – Mutations at different sites occur independently

James Watson, Francis Crick and Rosalind Franklin
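
The distance measures named on the Distance Between Sequences and Nucleotide Sequence Distances slides can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the formulas are the standard textbook forms of the Poisson correction, $d = -\ln(1 - p)$, and the Jukes-Cantor distance, $d = -\frac{3}{4}\ln(1 - \frac{4}{3}p)$, and skipping gapped columns when counting differences is an assumption made here, not something the slides specify.

```python
import math


def p_distance(a, b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap ('-') are ignored.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance, d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance for nucleotides, d = -(3/4) ln(1 - (4/3) p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned nucleotide sequences, used only for illustration.
p = p_distance("ACGTACGT", "ACGTACGA")
print(100 * p)                    # Hamming distance as a percentage
print(poisson_distance(p))
print(jukes_cantor_distance(p))
```

Both corrections return values slightly larger than $p$, reflecting the possibility of multiple substitutions at a single site; the gap between raw and corrected distance grows as the sequences diverge.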