Introduction to Molecular Biology
Raja Logananatharaj
Center for Advanced Computer Studies
University of Louisiana at Lafayette

Chromosome Structure
• Most plant and animal cells are diploid – they have two sets of chromosomes
• Most plant and animal gametes are haploid – they have one set of chromosomes
• In humans, there are 23 chromosomes per haploid set: 22 autosomes plus 1 sex chromosome (X or Y)
• Early estimates put the number of gene loci on these chromosomes at about 100,000; current estimates are closer to 20,000 protein-coding genes
[Figure: Human metaphase chromosomes]
[Figure: Spectral karyotype, metaphase]
[Figure: A single chromatid (long arm, centromere, short arm); a pair of homologous chromosomes]

Genes, Loci, Alleles and Chromosomes
• Each chromatid contains a single, very long DNA molecule
• Each gene is a small section of this DNA
• A gene locus is the position on a chromosome where a gene is located
• An allele is a particular variant of a gene; different alleles have different DNA sequences
[Figure: Human chromosome 6]

Macromolecules
• Large molecules made up of chains of smaller molecules
• Also called biopolymers
• Macromolecules of special interest
– Deoxyribonucleic acid (DNA)
– Ribonucleic acid (RNA)
– Polypeptides (including proteins)

Deoxyribonucleic Acid
• DNA is made from smaller molecules:
– phosphoric acid (also called "phosphate")
– deoxyribose (a sugar)
– nitrogenous bases of four types: adenine, thymine, cytosine, guanine
[Figure: Building blocks of DNA]

Nitrogenous Bases
• The bases are compounds known as purines and pyrimidines
• Purines have two rings
• Pyrimidines have one ring
• The rings are planar, which makes the bases very flat structures
[Figure: Purines and pyrimidines]
[Figure: Purines: adenine and guanine]
[Figure: Pyrimidines: thymine and cytosine]

The Small Molecules Form Larger Units
• Nucleosides – deoxyribose plus a nitrogenous base
– Deoxyadenosine
– Deoxycytidine
– Deoxyguanosine
– Deoxythymidine

Nucleotides
• Phosphate + sugar + base
• Because there are 4 types of bases, there are 4 types of nucleotides
• Nucleotides are the basic units of DNA and RNA structure
• Strings of nucleotides make up strands of DNA or RNA
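As a small illustration of the two base classes described above, the sketch below counts purines (A, G: two rings) and pyrimidines (C, T: one ring) in a DNA string. The helper name `base_class_counts` is hypothetical, and the sketch assumes an unambiguous DNA alphabet (A, C, G, T only).

```python
# Hypothetical helper: classify DNA bases as purines (two rings) or
# pyrimidines (one ring) and count each class in a sequence.
PURINES = {"A", "G"}       # adenine, guanine
PYRIMIDINES = {"C", "T"}   # cytosine, thymine


def base_class_counts(seq: str) -> dict[str, int]:
    """Count purines and pyrimidines in a DNA string (case-insensitive)."""
    counts = {"purine": 0, "pyrimidine": 0}
    for base in seq.upper():
        if base in PURINES:
            counts["purine"] += 1
        elif base in PYRIMIDINES:
            counts["pyrimidine"] += 1
        else:
            # Assumption: only the four DNA bases are allowed here.
            raise ValueError(f"not a DNA base: {base!r}")
    return counts


print(base_class_counts("ATCGGGC"))  # {'purine': 4, 'pyrimidine': 3}
```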
Nucleotide
• Phosphate + deoxyribose + one of the four bases
• Bases are attached to the side of the sugar-phosphate backbone
• Each base is bound through a nitrogen atom (N-9 in purines, N-1 in pyrimidines) to the C-1’ carbon of deoxyribose
[Figure: Deoxythymidylic acid]

Double Helix
• A DNA double helix has two strands that run in opposite directions (antiparallel)
• The two strands in the double helix are held together by hydrogen bonds between the bases
[Figure: Here 3 nucleotides are joined to make one strand of DNA]
[Figure: Nitrogenous bases highlighted]

Base Pairing Rules
• Each of the four bases must be paired with a specific complementary base in the opposite strand
• A (adenine) and T (thymine) are complements
• C (cytosine) and G (guanine) are complements
[Figure: Adenine-thymine nucleotide pair]
[Figure: Pairs of nitrogenous bases]
[Figure: Two strands of DNA are held together by hydrogen bonds between the bases]

Ribonucleic Acid (RNA) Structure
• Same overall structure as DNA
• In RNA:
– ribose instead of deoxyribose
– uracil instead of thymine
• DNA is "typically" double stranded (but not always!)
• RNA is "typically" single stranded (but not always!)
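The pairing rules (A with T, C with G), the antiparallel strands, and the thymine-to-uracil substitution in RNA can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names are hypothetical, and `dna_to_rna` assumes its input is the coding strand, so the RNA has the same sequence with U in place of T. The example strand is the one used in the slides' DNA/RNA example.

```python
# Minimal sketch of the base-pairing rules: A pairs with T, C pairs with G.
# Function names are illustrative, not taken from any particular library.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement in the same left-to-right order.

    Characters other than A, C, G, T are left unchanged.
    """
    return strand.upper().translate(DNA_COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Opposite strand written 5'->3' (the two strands are antiparallel)."""
    return complement(strand)[::-1]


def dna_to_rna(coding_strand: str) -> str:
    """RNA with the coding strand's sequence, uracil replacing thymine."""
    return coding_strand.upper().replace("T", "U")


top = "CCTAAAAGT"                   # written 5'->3'
print(complement(top))             # GGATTTTCA (bottom strand, read 3'->5')
print(reverse_complement(top))     # ACTTTTAGG (bottom strand, read 5'->3')
print(dna_to_rna(top))             # CCUAAAAGU
```

Reversing the complement is what makes the strand readable in the conventional 5’-to-3’ direction. Applying `reverse_complement` twice returns the original strand.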
Proteins

Amino Acids, Polypeptides and Proteins
• Amino acids
– the genetic code specifies 20 different amino acids
– each has an amino group, a carboxyl group, and a "side group"
• Polypeptides
– polymers of amino acids linked by peptide bonds, each joining the carboxyl group of one amino acid to the amino group of the next
– every polypeptide has an amino terminus (end) and a carboxyl terminus (end)
• Proteins
– assembled from one or more polypeptides
[Figure: Amino acids]

Amino Acid Abbreviations
• Standard 3-letter and 1-letter abbreviations have been devised
• Examples:
– Proline: 3-letter abbreviation Pro, 1-letter abbreviation P
– Arginine: 3-letter abbreviation Arg, 1-letter abbreviation R

[Figure: Peptide chains]
[Figure: Peptide chains fold into complex shapes]

Gene Expression
• Transcription
– DNA sequence is transcribed to RNA sequence
– messenger RNA (mRNA) is synthesized on a DNA template
• Translation
– mRNA sequence is translated to the amino acid sequence of a protein
– each amino acid is specified by a combination of three nucleotides (a codon)

Directionality of Transcription and Translation
• A DNA or RNA strand has a 5’ end and a 3’ end
• Polypeptides have an amino end and a carboxyl end
• For DNA replication and RNA transcription:
– nucleotides are added to the 3’ end
• For translation:
– amino acids are added to the carboxyl terminus

DNA          5’-CCTAAAAGT-3’
             3’-GGATTTTCA-5’
RNA          5’-CCUAAAAGU-3’
Polypeptide  amino-Pro-Lys-Ser-carboxyl

Messenger RNA
• Messenger RNA (mRNA)
– transcribed from DNA
– translated to amino acid sequences on ribosomes
• Coding vs. non-coding regions
– only part of the mRNA sequence is translated to protein (the coding region)
– biological systems must determine where the coding region begins and ends
– rules for finding the start of coding regions are difficult to define with precision

The Genetic Code
• Triplet codon – each amino acid is specified by 3 ribonucleotides
• Unambiguous – each triplet specifies only 1 amino acid
• Degenerate – more than one triplet may specify the same amino acid
• Ordered – degenerate codons for the same amino acid are similar, usually differing in the 3rd base
• Punctuation
– there is punctuation for the start and stop of a polypeptide
– there is no punctuation ("commas") between codons

Reading Frame
• The position of the first translated nucleotide determines the reading frame (where every codon starts)
• There are 3 possible reading frames for each strand (starting translation on the 4th nucleotide gives the same reading frame as starting on the 1st, the 5th the same as the 2nd, and so on)
• Frameshift mutations are insertions or deletions of nucleotides that shift the reading frame
[Figure: Reading frames]
[Figure: Frameshift]

Frameshift Mutations
• Frameshift mutations occur if the number of bases inserted or deleted is not a multiple of 3
• Translation past the frameshift produces either a "junk" polypeptide or early termination at a stop codon

Start and Stop Codons
• In addition to specifying amino acids, codons (triplets of nucleotides) mark the starting and ending points of translation
• Start and stop codons are not at the ends of the RNA sequence; whether a given triplet is read as a start or stop codon depends on the reading frame

Start Codon
• In bacteria:
– the start codon is AUG
– AUG normally codes for methionine (Met)
– the distinction between a start codon and an internal Met codon depends on the neighboring sequence
– sometimes GUG is used as the start codon
• In higher organisms (plants, animals, etc.):
– AUG is also the start codon
– AUG is also used for Met
– the distinction again depends on the neighboring sequence

Termination of Translation
• Three codons are "stop codons": UAA, UAG, UGA
• Sometimes called "nonsense" codons because they don’t code for any amino acid
• Nonsense mutation – changes a codon for an amino acid into a stop codon

Split Genes
• Protein-coding sequences are not always contiguous
• In the DNA sequence of a gene, the coding region may consist of multiple exons separated by introns
• Introns are removed from the RNA sequence (splicing) before translation
[Figure: Introns are removed by splicing]

The Almost Universal Genetic Code
• In general, viruses, prokaryotes and eukaryotes all use the same genetic code
• This suggests a single common ancestor for all forms of life
• There are a few minor variations, for example:
– in mitochondria (e.g., those of vertebrates)
• UGA is Trp instead of stop
• AUA is Met instead of Ile
– in some protozoa
• UAA (and sometimes UAG) is Gln instead of stop

Theory vs. Practice
Theory
• Simple rules
• Clean data
• Textbook examples
• Simple (elegant) programming approaches
• Focus on algorithms
Practice
• Rules are broken
• Errors in data
• Real-life examples
• Simple programming approaches may be dead ends
• Focus on robustness

Acknowledgements
• None of the material presented is original
• All of it is compiled from books, journal articles, and other PPT materials available on the Web

Example: DNA Sequences
Theory
• Sequences are fully characterized: ATCGGGC
• Start of translation known
• Contiguous coding region
• Universal genetic code
Practice
• Some nucleotides are ambiguous: AT(C or G?)GGGC
• Start of translation unknown
• Coding region split into exons
• Exceptions to the genetic code
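The translation rules from these slides (triplet codons read without "commas", three reading frames per strand, stop codons, and a nearly universal code) can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation. The names `translate`, `STANDARD_CODE` and `MITO_CODE` are hypothetical. Only the standard code plus the two mitochondrial exceptions listed above are modeled. Codons containing ambiguous bases (such as the "C or G?" case) are rendered as "X", a common convention for an unknown amino acid. The sketch does not handle splicing.

```python
# Sketch: translate mRNA in a chosen reading frame with the standard genetic
# code, optionally applying the two mitochondrial exceptions from the slides.
# All names here are illustrative assumptions, not a library API.
from itertools import product

BASES = "UCAG"
# One-letter amino acids for all 64 codons in UCAG order; "*" marks a stop.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Only the differences named in the text: UGA -> Trp, AUA -> Met.
# Real mitochondrial codes differ in further codons not modeled here.
MITO_CODE = {**STANDARD_CODE, "UGA": "W", "AUA": "M"}


def translate(mrna: str, frame: int = 0, code: dict[str, str] = STANDARD_CODE,
              to_stop: bool = True) -> str:
    """Translate mRNA starting at offset `frame` (0, 1 or 2).

    Codons with ambiguous or unknown bases become "X". If `to_stop` is True,
    translation ends at the first stop codon; otherwise stops appear as "*".
    The `code` table is only read, never modified.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = mrna.upper()
    protein = []
    # Step through complete codons only; leftover 1-2 bases are ignored.
    for i in range(frame, len(seq) - 2, 3):
        aa = code.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    rna = "CCUAAAAGU"  # the mRNA from the transcription example
    for f in range(3):
        # frame 0: PKS (Pro-Lys-Ser); frame 1: LK; frame 2: UAA stop at once
        print(f, translate(rna, frame=f))
    print(translate("AUAUGA"))                  # I (UGA is a stop)
    print(translate("AUAUGA", code=MITO_CODE))  # MW
```

Trying every frame (and the frames of the reverse-complement strand) is one simple way to cope with an unknown start of translation. Real gene finders use far richer signals than this.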