An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Molecular Biology Primer
Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly, Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael Sneddon, Hoa Troung, Jerry Wang, Che Fung Yung

All Life depends on 3 critical molecules
• DNAs
  • Hold information on how the cell works
• RNAs
  • Act to transfer short pieces of information to different parts of the cell
  • Provide templates from which proteins are synthesized
• Proteins
  • Form enzymes that send signals to other cells and regulate gene activity
  • Form the body’s major components (e.g. hair, skin, etc.)

DNA: The Basis of Life
• Humans have about 3 billion base pairs.
  • How do you package it into a cell?
  • How does the cell know where in the highly packed DNA to start transcription?
    • Special regulatory sequences
  • A larger DNA size does not mean a more complex organism
• Complexity of DNA
  • Eukaryotic genomes consist of variable amounts of DNA
    • Single Copy or Unique DNA
    • Highly Repetitive DNA

Discovery of DNA
• DNA Sequences
  • Chargaff and Vischer, 1949
    • DNA consisting of A, T, G, C
      • Adenine, Guanine, Cytosine, Thymine
  • Chargaff Rule
    • Noticing #A = #T and #G = #C
    • A “strange but possibly meaningless” phenomenon.
• Wow!! A Double Helix
  • Watson and Crick, Nature, April 25, 1953
    • 1 Biologist, 1 Physics Ph.D. Student
    • 900 words
    • Nobel Prize
  • Rich, 1973
    • Structural biologist at MIT.
    • DNA’s structure in atomic resolution.
Pictured: Crick, Watson

DNA
• Stores all information of life
• 4 “letters” (bases): A, G, T, C (adenine, guanine, thymine, cytosine), which pair A-T and C-G on complementary strands.
• DNA is a polymer
  • Sugar-Phosphate-Base
• Bases held together by H bonding to the opposite strand
http://www.lbl.gov/Education/HGP-images/dna-medium.gif

The Purines / The Pyrimidines

DNA - Deoxyribonucleic acid
Figure: structures of the four nucleotides (AMP, GMP, TMP, CMP), each labeled with its base, sugar (carbons 1´ to 5´) and phosphate.

DNA, continued
Sugar, Phosphate, Base (A, T, C or G)
http://www.bio.miami.edu/dana/104/DNA2.jpg

DNA, continued
• DNA has a double helix structure. However, it is not symmetric. It has a “forward” and “backward” direction. The ends are labeled 5’ and 3’ after the carbon atoms in the sugar component.
  5’ AATCGCAAT 3’
  3’ TTAGCGTTA 5’
• DNA sequences are always written 5’ to 3’, the direction in which new strands are synthesized during transcription and replication

Basic Structure Implications
• DNA is (-) charged due to phosphate: gel electrophoresis, DNA sequencing (Sanger method)
• H-bonds form between specific bases: hybridization – replication, transcription, translation
  • DNA microarrays, hybridization blots, PCR
  • C-G bound tighter than A-T due to triple H-bond
• DNA-protein interactions (via major & minor grooves): transcriptional regulation
• DNA polymerization: 5’ to 3’ – phosphodiester bond formed between 5’ phosphate and 3’ OH

Double helix of DNA
• The double helix of DNA has these features:
  • Concentration of adenine (A) is equal to thymine (T)
  • Concentration of cytosine (C) is equal to guanine (G).
  • Watson-Crick base-pairing: A will only base-pair with T, and C with G
  • Base-pairs of G and C contain three H-bonds
  • Base-pairs of A and T contain two H-bonds
  • G-C base-pairs are more stable than A-T base-pairs
  • Two polynucleotide strands wound around each other
  • The backbone of each consists of alternating deoxyribose and phosphate groups

DNA - replication
• DNA can replicate by splitting, and rebuilding each strand.
• Note that the rebuilding of each strand uses slightly different mechanisms due to the 5’-3’ asymmetry, but each daughter double helix is an exact replica of the original.
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAReplication.html

Superstructure
Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.

Superstructure Implications
• DNA in a living cell is in a highly compacted and structured state
• Transcription factors and RNA polymerase need ACCESS to do their work
• Transcription is dependent on the structural state – SEQUENCE alone does not tell the whole story

RNA
• RNA is similar to DNA chemically. It is usually only a single strand. T(hymine) is replaced by U(racil)
• Some forms of RNA can form secondary structures by “pairing up” with themselves. This can change their properties dramatically. DNA and RNA can pair with each other.
tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif

RNA, continued
• Several types exist, classified by function
• mRNA – this is what is usually being referred to when a bioinformatician says “RNA”.
This is used to carry a gene’s message out of the nucleus.
• tRNA – transfers genetic information from mRNA to an amino acid sequence
• rRNA – ribosomal RNA. Part of the ribosome, which is involved in translation.

Proteins
If there is a job to be done in the molecular world of our cells, usually that job is done by a protein.
• INSULIN: a protein hormone which helps to regulate your blood sugar levels
• CATALASE: an enzyme which removes hydrogen peroxide from your body so it does not become toxic
Source: http://courses.washington.edu/conj/protein/insulin2.gif http://www.biochem.ucl.ac.uk/bsm/pdbsum/1gwf/main.html

Proteins can be fibrous or globular
Let’s explore the diversity of protein structure and function by investigating some examples

Fibrous proteins have a structural role
• Collagen is the most abundant protein in vertebrates. Collagen fibers are a major portion of tendons, bone and skin. Three helical polypeptide chains of collagen wind into a triple helix structure, giving it tough and flexible properties.
• Fibroin fibers make the silk spun by spiders and silk worms stronger weight for weight than steel! The soft and flexible properties come from the beta structure.
• Keratin is a tough insoluble protein that makes up the quills of echidna, your hair and nails and the rattle of a rattlesnake. The structure comes from alpha helices that are cross-linked by disulfide bonds.
Source: http://www.prideofindia.net/images/nails.jpg http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true

The globular proteins
The globular proteins have a number of biologically important roles.
They include:
• Cell motility – proteins link together to form filaments which make movement possible.
• Organic catalysts in biochemical reactions – enzymes
• Regulatory proteins – hormones, transcription factors
• Membrane proteins – MHC markers, protein channels, gap junctions
• Defense against pathogens – poisons/toxins, antibodies, complement
• Transport and storage – hemoglobin and myoglobin

Proteins for cell motility
Above: Myosin (red) and actin filaments (green) in coordinated muscle contraction.
Right: Actin bound to the myosin binding site (groove in red part of myosin protein). Add energy (ATP) and myosin moves, moving actin with it.
Source: http://www.ebsa.org/npbsn41/maf_home.html http://sun0.mpimf-heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LANG=en

Proteins in the Cell Cytoskeleton
Eukaryote cells have a cytoskeleton made up of straight hollow cylinders called microtubules (bottom left). They help cells maintain their shape, they act like conveyor belts moving organelles around in the cytoplasm, and they participate in forming spindle fibres in cell division.
Microtubules are composed of filaments of the protein tubulin (top left). These filaments are compressed like springs, allowing microtubules to ‘stretch and contract’. 13 of these filaments attach side to side, a little like the slats in a barrel, to form a microtubule. This barrel shaped structure gives strength to the microtubule.
Figure label: Tubulin forms helical filaments
Source: http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/

Proteins speed up reactions – Enzymes
Catalase speeds up the breakdown of hydrogen peroxide (H2O2), a toxic by-product of metabolic reactions, to the harmless substances, water and oxygen.
The reaction is extremely rapid as the enzyme lowers the energy needed to kick-start the reaction (activation energy).
Figure: energy against progress of reaction, from substrate to product, showing the activation energy. No catalyst = input of 71 kJ energy required. With catalase = input of 8 kJ energy required.

Proteins can regulate metabolism – hormones
When your body detects an increase in the sugar content of blood after a meal, the hormone insulin is released from cells in the pancreas. Insulin binds to cell membranes and this triggers the cells to absorb glucose for use or for storage as glycogen in the liver.

Proteins span membranes – protein channels
The CFTR membrane protein is an ion channel that regulates the flow of chloride ions. Not enough of this protein gets inserted into the membranes of people suffering from cystic fibrosis. This causes secretions to become thick as they are not hydrated. The lungs and secretory ducts become blocked as a consequence.
Source: http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html http://www.cbp.pitt.edu/bradbury/projects.htm

Proteins defend us against pathogens – antibodies
Left: Antibodies like IgG, found in humans, recognise and bind to groups of molecules or epitopes found on foreign invaders.
Right: The binding site of an antibody (left) interacting with the epitope of a foreign antigen (green).
Source: http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm http://www.spilya.com/research/ http://www.umass.edu/microbio/chime/

Protein structure and human disease
In some cases, a single amino acid substitution can induce a dramatic change in protein structure.
For example, the ΔF508 mutation of CFTR alters the α-helical content of the protein, and disrupts intracellular trafficking. Other changes are subtle. The E6V mutation in the gene encoding hemoglobin beta causes sickle-cell anemia. The substitution introduces a hydrophobic patch on the protein surface, leading to clumping of hemoglobin molecules.
Page 311

Protein structure and human disease
Disease               Protein
Cystic fibrosis       CFTR
Sickle-cell anemia    hemoglobin beta
“mad cow” disease     prion protein
Alzheimer disease     amyloid precursor protein
Table 9.5, Page 312

Proteins
• Complex organic molecules made up of amino acid subunits
• There are 20 different types of amino acids.
• Different sequences of amino acids fold into different 3-D shapes.
• Proteins can range from fewer than 20 to more than 5000 amino acids in length.
• Each protein that an organism can produce is encoded in a piece of the DNA called a “gene”.
• The single-celled bacterium E. coli has about 4300 different genes.
• Humans are believed to have about 30,000 different genes (the exact number is as yet unresolved)

Amino acids
20 possible groups (R)
     R
     |
H2N--C--COOH
     |
     H

Dipeptide
     R                  R
     |                  |
H2N--C--COOH       H2N--C--COOH
     |                  |
     H                  H

Dipeptide
This is a peptide bond
     R  O      R
     |  ||     |
H2N--C--C--NH--C--COOH
     |         |
     H         H

Proteins – Amino Acids
Name                 1-letter code   Triplet
Glycine              G               GGT,GGC,GGA,GGG
Alanine              A               GCT,GCC,GCA,GCG
Valine               V               GTT,GTC,GTA,GTG
Leucine              L               TTG,TTA,CTT,CTC,CTA,CTG
Isoleucine           I               ATT,ATC,ATA
Histidine            H               CAT,CAC
Serine               S               TCT,TCC,TCA,TCG,AGT,AGC
Threonine            T               ACT,ACC,ACA,ACG
Cysteine             C               TGT,TGC
Methionine           M               ATG
Glutamic Acid        E               GAA,GAG
Aspartic Acid        D               GAT,GAC
Lysine               K               AAA,AAG
Arginine             R               CGT,CGC,CGA,CGG,AGA,AGG
Asparagine           N               AAT,AAC
Glutamine            Q               CAA,CAG
Phenylalanine        F               TTT,TTC
Tyrosine             Y               TAT,TAC
Tryptophan           W               TGG
Proline              P               CCT,CCC,CCA,CCG
Terminator (Stop)    *               TAA,TAG,TGA
Protein sequence (alphabet: ACDEFGHIKLMNPQRSTVWY)
• A typical human cell contains about 100 million proteins of about 10,000 types

Proteins – Amino Acids
Properties of amino acids:
• play a role in the construction of 3-D structures in proteins

Proteins - Summary
• DNA sequence determines protein sequence
• Protein sequence determines protein structure
• Protein structure determines protein function

Polypeptide v. Protein
• A protein is a polypeptide; however, understanding the function of a protein given only the polypeptide sequence is a very difficult problem.
• Protein folding is an open problem. The 3D structure depends on many variables.
• Current approaches often work by looking at the structure of homologous (similar) proteins.
• Improper folding of a protein is believed to be the cause of mad cow disease.
See http://www.sanger.ac.uk/Users/sgj/thesis/node2.html for more information on folding

Protein Folding
• Proteins tend to fold into the lowest free energy conformation.
• Proteins begin to fold while the peptide is still being translated.
• Proteins bury most of their hydrophobic residues in an interior core.
• Most proteins contain the secondary structures α helices and β sheets.
• Molecular chaperones, hsp60 and hsp70, work with other proteins to help fold newly synthesized proteins.
• Much of the protein modification and folding occurs in the endoplasmic reticulum and mitochondria.

Protein Folding
• Proteins are not linear structures, though they are built that way
• The amino acids have very different chemical properties; they interact with each other after the protein is built
  • This causes the protein to start folding and adopt its functional structure
  • Proteins may fold in reaction to some ions, and several separate chains of peptides may join together through their hydrophobic and hydrophilic amino acids to form a polymer

Protein Folding (cont’d)
• The structure that a protein adopts is vital to its chemistry
• Its structure determines which of its amino acids are exposed to carry out the protein’s function
• Its structure also determines what substrates it can react with
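
Worked example: from DNA sequence to protein sequence
The step "DNA sequence determines protein sequence" and the strand pairing shown on the DNA slides can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the names CODONS_BY_AMINO_ACID, reverse_complement and translate are invented for this example, the codon table is the standard genetic code written as DNA triplets (as in the amino acid table), and the input is assumed to be an upper- or lower-case string containing only A, C, G and T.

```python
# Illustrative sketch only: all names below are hypothetical and chosen for
# this example; they do not come from the slides or from any library.

# Standard genetic code as DNA triplets, keyed by one-letter amino acid code.
CODONS_BY_AMINO_ACID = {
    "G": "GGT GGC GGA GGG",
    "A": "GCT GCC GCA GCG",
    "V": "GTT GTC GTA GTG",
    "L": "TTG TTA CTT CTC CTA CTG",
    "I": "ATT ATC ATA",
    "H": "CAT CAC",
    "S": "TCT TCC TCA TCG AGT AGC",
    "T": "ACT ACC ACA ACG",
    "C": "TGT TGC",
    "M": "ATG",
    "E": "GAA GAG",
    "D": "GAT GAC",
    "K": "AAA AAG",
    "R": "CGT CGC CGA CGG AGA AGG",
    "N": "AAT AAC",
    "Q": "CAA CAG",
    "F": "TTT TTC",
    "Y": "TAT TAC",
    "W": "TGG",
    "P": "CCT CCC CCA CCG",
    "*": "TAA TAG TGA",  # terminator (stop)
}

# Invert the table: codon -> one-letter amino acid code (64 entries).
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}

# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at a stop codon.

    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TO_AMINO_ACID[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Strand pair from the DNA slide: 5' AATCGCAAT 3' / 3' TTAGCGTTA 5'
    print(reverse_complement("AATCGCAAT"))  # ATTGCGATT
    print(translate("ATGGCCTAA"))           # MA
```

Writing the table keyed by amino acid keeps it easy to compare against the amino acid table line by line; inverting it once gives the codon lookup that translation needs. Real coding sequences also require choosing a reading frame and strand, which this sketch does not attempt.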