Coming Soon to a Lab Near You…
AST734
Andrew Boal
25 November 2004
Image credits: NASA, NPS, and Protein Data Bank

Extreme environments and astrobiology
- Numerous extreme terrestrial habitats are seen as potential analogs to life-bearing niches in the solar system
- Extreme environments are those that lie outside the conditions of a “mesophilic environment” (T ~30-40 °C, salt concentration <3%, etc.)
- Terrestrial examples include hot springs (high temperature), salt lakes (high salt), deep-sea vents (high pressure), and deserts (low water)
- These “extreme” environments might model conditions found on Mars, Europa, Titan, and elsewhere
Image credits: NASA, NPS

Extreme environments: microbes in residence
- Extremophiles are defined by the type of environment required for growth
- There is no overall consensus on the definition of an extreme environment
- Organisms that can survive in an extreme environment but do not require those conditions for growth are extremotolerant
- Mesophile: lives in an ambient environment
- Thermophile: temperature > ~45 °C
- Psychrophile: temperature < ~20 °C
- Barophile: high pressure
- Xerophile: low water content
- Halophile: salt content > 3-10%
- Acidophile: pH < 5
- Alkaliphile: pH > 9
- Radiophile: high amounts of radiation
Image credit: CDC

Biogeography
- Biogeography is the study of the environmental distribution of species
- Exploring several isolated, analogous extreme environments, between which microbes may not be transported, can give a better understanding of microbial evolution
Map credit: CIA World Factbook; image credits: NPS

But what about a deeper look?
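The threshold-based definitions on the "microbes in residence" slide can be written down as a small lookup. The sketch below is only an illustration: the `Environment` class and `classify` function are hypothetical names, the halophile cutoff is assumed to be the lower end (3%) of the 3-10% range quoted on the slide, and the pressure, water, and radiation classes are left out because the slide gives no numeric threshold for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Environment:
    """Conditions of a habitat; None means the quantity was not measured."""
    temperature_c: Optional[float] = None
    salt_percent: Optional[float] = None
    ph: Optional[float] = None


def classify(env: Environment) -> List[str]:
    """Return the organism classes whose slide thresholds this habitat meets."""
    labels = []
    if env.temperature_c is not None:
        if env.temperature_c > 45:      # thermophile: T > ~45 C
            labels.append("thermophile")
        elif env.temperature_c < 20:    # psychrophile: T < ~20 C
            labels.append("psychrophile")
    # Assumption: use 3% as the halophile cutoff (slide quotes 3-10%).
    if env.salt_percent is not None and env.salt_percent > 3:
        labels.append("halophile")
    if env.ph is not None:
        if env.ph < 5:                  # acidophile: pH < 5
            labels.append("acidophile")
        elif env.ph > 9:                # alkaliphile: pH > 9
            labels.append("alkaliphile")
    # No extreme threshold met: treat the habitat as mesophilic.
    return labels or ["mesophile"]
```

For example, `classify(Environment(temperature_c=80, ph=2))` returns `["thermophile", "acidophile"]`, which illustrates that one habitat can be extreme along more than one axis.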
Molecular components of cells
- The predominant molecular components of cells are lipids, nucleotides, and proteins
- Nucleotides: protein blueprints and fabrication
- Lipids: provide cell membranes
- Proteins: do the work of the cell
- The ability of these molecules to function is directly related to molecular shape, which is influenced by the environment, so…

Biomolecular structural endemism
The Big Questions:
- Are there molecular structures which are endemic to an environment?
- If so, how and why are those structures arrived at?
Photo credits: National Park Service web pages

What are biomolecules?

Biomolecular structure
- Biomolecular structure is determined by a combination of covalent and noncovalent bonds
- Covalent bonds are static entities which are little affected by the environment
- Noncovalent bonds (hydrophobic interactions, hydrogen bonding, and electrostatic attraction) exist in a dynamic equilibrium and thus can be attenuated by factors such as temperature, ion content, and pH
- Biomolecules must be both somewhat flexible and somewhat rigid to function properly, so the forces that hold the molecular shape must strike a balance with the environment
  - Too static: function is compromised
  - Balance: structure and function preserved
  - Too dynamic: structure is compromised

Lipid structure
- Lipids are made up of a hydrophilic (water-loving) head group and a hydrophobic (water-fearing) tail
- In cell membranes, lipids pack to form a bilayer so that the heads are in water and the tails are mixed together
[Figure: chemical structure of a lipid in water (H2O), with the hydrophilic head group and the hydrophobic tail labeled]

Lipids in thermal environments
- Lipids from thermophilic archaea have a dramatically different chemical structure
- Increased hydrocarbon branching leads to increased hydrophobicity
- Head-tail linkages are ethers, not esters, and are chemically more robust
- The backbones of both layers are chemically connected, again increasing stability
[Figure: mesophile lipid bilayer, thermophile archaea bilayer, and hyperthermophile archaea lipid]

DNA and RNA: chemistry
- DNA and RNA are polymers of nucleotides (oligo- or polynucleotides)
- Nucleotides are composed of nucleobases attached to a sugar
- Nucleobases are cyclic structures which are basic (like ammonia)
- Nucleobases: adenine, cytosine, guanine, thymine (DNA only), uracil (RNA only)
- Sugars: ribose is in RiboNucleic Acid (RNA); deoxyribose is in DeoxyriboNucleic Acid (DNA)
- The extra -OH (alcohol) in ribose makes RNA much less chemically stable
[Figure: structures of the nucleobases and sugars, with the backbone, sugar, and nucleobase of a nucleotide labeled]

DNA and RNA: polynucleotide structure
- DNA and RNA structure is based on hydrophobic interactions and hydrogen bonding
- Hydrogen bonding is a weak interaction where two electronegative elements “share” a hydrogen atom (note that carbon-hydrogen bonds do not take part in hydrogen bonding)
- The center of the duplex is hydrophobic
- The polynucleotide backbone has charged phosphate groups, which are hydrophilic
[Figure: thymine:adenine (T:A) and guanine:cytosine (G:C) base pairs; dashed lines indicate hydrogen bonds]

DNA: secondary structure
- Base pairing determines the nature of the secondary structure
- The basic elements of DNA secondary structure are the duplex (which is by far the most prevalent), the junction, and the hairpin
[Figure: example sequences folded into a duplex, a junction, and a hairpin]

DNA melting
- One of the easiest ways to measure DNA stability is to obtain a “melting curve”, which is a spectroscopic measurement of duplex unzipping
[Figure: example of a DNA melting curve obtained spectroscopically]
[Figure: representation of DNA melting by duplex unzipping or unwinding]
Figure taken from: Drukker, K., et al. J. Phys. Chem. B 2000, 104, 6108-6111

Stability of DNA in extreme environments
- The main determinant of DNA stability is the fraction of G:C base pairs in a given oligonucleotide sequence
- The primary difference between an A:T and a G:C base pair is that G:C has three hydrogen bonds (A:T has two) and is thus more stable
[Figure: melting temperature Tm (°C, axis from 40 to 90) versus DNA GC content fGC (0.1 to 0.9) at 69 mM, 220 mM, and 1020 mM NaCl, with the A:T and G:C base-pair structures inset]
Data taken from: Owczarzy, R., et al. Biochemistry 2004, 43, 3537-3554.
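The GC fraction that the stability slide treats as the main determinant of duplex stability is easy to compute from a sequence. The following is a minimal sketch, not part of the original talk: the function names are hypothetical, and the hydrogen-bond count assumes a fully paired Watson-Crick duplex with three bonds per G:C pair and two per A:T pair. It does not predict a melting temperature, which also depends on salt and other factors.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in one strand of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def duplex_hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds in a fully paired duplex formed by `seq` and its complement.

    Counts 3 bonds for every G:C pair and 2 for every A:T pair.
    Raises KeyError for any character other than A, C, G, or T.
    """
    bonds_per_pair = {"A": 2, "T": 2, "G": 3, "C": 3}
    return sum(bonds_per_pair[base] for base in seq.upper())
```

For instance, `gc_fraction("GATC")` returns 0.5 and `duplex_hydrogen_bonds("GATC")` returns 10, while the all-G:C strand `"GGCC"` of the same length gives 12 bonds.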
- Other factors that affect DNA stability include hydrophobicity and the interaction between salt and the DNA backbone

The many faces of RNA
- RNA is primarily involved in protein synthesis and comes in three major types:
  - Ribosomal RNA (rRNA) forms the skeleton of the ribosome, the machine which makes proteins
  - Messenger RNA (mRNA) is made by transcription of DNA and lists the amino acid sequence of a protein
  - Transfer RNA (tRNA) transports amino acids into the ribosome
[Figure: ribosome with the growing protein and an incoming amino acid labeled]

Structure of tRNA
- tRNA is a good molecule to explore for environmental studies
- tRNA molecules are usually fairly small (fewer than 100 nucleotides)
- tRNA has a relatively simple secondary structure
- tRNA usually exists as a free molecule in the cell
- Like DNA, RNA secondary structure has elements such as duplexes, loops, bulges, and hairpins
[Figure: secondary structure of a tRNA sequence]

Stability of tRNA
- The stability of tRNA can be measured spectroscopically, like that of DNA, but it can also be calculated
- The calculated free energy is obtained by factoring in the strength of noncovalent interactions in the folded and unfolded tRNA and is expressed as the free energy of complex formation, ∆Gf (NOTE: a lower ∆Gf value indicates increased stability; formation is more favorable)
- Initial ∆Gf values and the predicted secondary structure can be calculated from raw sequence data
Calculated ∆Gf values of the GGC codon tRNA from E. coli and T. acidophilum:
- E. coli: ∆Gf = -28.9 kcal/mol
- T. acidophilum: ∆Gf = -30 kcal/mol
Sequences:
- Thermoplasma acidophilum: GGGCCGGTAGATCAGAGGTAGATCGCTTCCTTGGCATGGAAGAGGcCAGGGGTTCAAATCCCCTCCGGTCCA
- E. coli: GGGGCTATAGCTCAGCTGGGAGAGCGCTTGCATGGCATGCAAGAGGtCAGCGGTTCGATCCCGCTTAGCTCCACCA

Proteins: amino acids and primary structure
- Primary structure is determined by covalent amide bonds between individual amino acids
- Each amino acid has amino and acid functionality plus a side chain (“R-group”), which defines the chemical and physical nature of the amino acid
- Examples of amino acids:
  - Lysine (K, Lys): hydrophilic, positive
  - Glutamic acid (E, Glu): hydrophilic, negative
  - Serine (S, Ser): hydrophilic
  - Alanine (A, Ala): slightly hydrophobic
  - Leucine (L, Leu): strongly hydrophobic
- A peptide or protein is a chain of 10-1000 amino acids
[Figure: general amino acid structure, the example amino acids, and a peptide chain]

Proteins: secondary structure
- Secondary structures (folds) are defined by hydrogen bonding and steric interactions of the side chains
- The α-helix is a coil of a peptide chain and has 3.6 residues per helical turn
  - Primary interactions are hydrogen bonding between residues along the helical axis and steric interactions between side chains
- The β-sheet is a linear arrangement of amino acids
  - Structure is defined by interstrand hydrogen bonds, less by sterics of side chains
  - Sheets can be parallel or anti-parallel, defined by the orientation of the backbone
- Other, but far less common, peptide folds include the coiled-coil, random coil, bulge, β-turn, 3₁₀ helix, 2₇ helix, π-helix, β-barrel, and so on…

Proteins: tertiary and quaternary structure
- Tertiary and quaternary structure is defined almost entirely by noncovalent interactions
- Tertiary structure: the overall shape of a folded protein
- Quaternary structure: the assembly of multiple protein units into a larger structure
[Figure: top view and side view of a multi-unit protein assembly]

Protein model systems: helices
- The α-helix is a common protein structural element which can be readily studied
- α-Helices are the secondary structural element most susceptible to sequence and environmental factors, and the stability of helices is related to the stability of the overall protein
- Like DNA melting, helix (and protein) stability is related to a structural denaturation
- Structural stability is measured by spectroscopically observing helix unfolding
Graph taken from: Whitington, S. J., et al. Biochemistry 2003, 42, 14690-14695.
- As for tRNA, ∆Gf can be calculated for helices, or it can be measured using circular dichroism spectroscopy by employing the relationship ∆Gf = -RT ln K, where K can be measured from the spectrum

Example of environment-related structural differences
- One example is the study of the α-helices of RecA
- RecA is a protein involved in DNA repair, cell division, and other processes and is found in all environments
- RecA sequences from 29 proteins were aligned with that of E. coli, allowing for the determination of helical fragments
- There are 10 helical regions in RecA
- The crystal structure of RecA from E. coli was used as a template
- ∆Gf values for these sequences were calculated and analyzed
[Figure: crystal structure of RecA from E. coli]
This work was published as: Petukov, M., et al. Proteins: Structure, Function, and Genetics 1997, 29, 309-320

Thermophile helices are more stable
- Calculated ∆Gf values indicated that helices of thermophile origin were more stable than mesophile helices
- Eight of the thermophile helices were found to be more stable; these helices are likely related to STRUCTURAL stability
- No change was found for two helices, both of which are directly involved in interactions with DNA and other proteins; these helices likely need to retain flexibility for FUNCTIONAL stability
- Interestingly, total helix stability was found to have the same value if the optimal temperature for protein activity is taken into account; this is again related to the need for molecular flexibility
[Figure: total helix ∆Gf at 20 °C, 37 °C, and 80 °C for T. thermophilus (80 °C), E. coli (37 °C), and P. aeruginosa (20 °C)]

Biomolecular structural endemism
The Big Questions:
- Are there molecular structures which are endemic to an environment?
- If so, how and why are those structures arrived at?

Study roadmap
Bioinformatics
- Develop a comprehensive listing of known protein/RNA sequences from public databases
- Search for environment-specific structural elements
Sample Collection
- Identify environments for study (Hawaii lakes; Chile: Andes and Patagonia?)
- Travel, sample collection, and data analysis
Model Studies
- Synthesis of short RNA and peptide sequences
- Study the structure of these molecules in lab-generated extreme (thermal/salt/pressure) environments
- Computer models of these systems

Environments to be explored
- Initial work will be carried out in Hawaiian lakes
- These include Lake Kauhako (Moloka’i), Lake Wai’ele’ele (Maui), and Green Lake and Lake Waiau (Hawai’i)
- These lakes are relatively accessible and will provide a ready data set that we will use to develop our sampling and analysis methodologies
- This data set will also establish part of the mesophile baseline

South American lake environments
- South America, specifically the Andes and Patagonia, has numerous extremophilic environments
- South American lakes are less well studied from the biogeographical viewpoint, so we will be able to describe new environments
- These environments are also geographically isolated from other extreme environments, which will allow for greater geographic variability
- Other possible environments include deep-sea trenches and subglacial lakes (UH collaborations)

What we will look at: “adaptive” proteins
- Proteins which serve a function adapted to the environment
- Antifreeze protein: inhibits ice crystal formation
- Mechanosensitive channel: responds to osmotic stress
- Potassium channel: transports K+ into the cell
- ATP sulfurylase: critical in sulfate-reducing bacteria

What we will look at: “conserved” proteins
- Conserved proteins are those which would be expected to be more similar across environments because their function is ubiquitous
- ATPase: synthesis of ATP, a cell energy source
- DNA gyrase: involved in DNA packaging
- Pyruvate kinase: involved in glycolysis
- Rhodopsin: light sensing and transduction

Planned methodologies: bioinformatics and sample collection
- Bioinformatics is the term used to describe the mining of biological sequence and structural databases
- The initial work here will be to develop a database of molecular sequences correlated with the organism of origin (which will tell us the nature of the environments they came from)
- These sequences will then be examined for environment-specific structural motifs
- This database will help to establish environmental targets and can be modified by biogeographical studies
- Data that will be collected in the environment:
  - Environmental DNA will be used to establish the biodiversity of a site as well as to provide information regarding molecular sequences
  - Physical factors will also be taken into account, including temperature, salinity, nutrient composition, etc.

Methodologies: model systems and computations
- Synthesis and physical or computational characterization of model and natural peptides or nucleotide sequences:
  - α-Helices
  - tRNA sequences
  - More complicated peptides such as the helix bundle (common in membrane proteins)
- These studies will provide us with a numerical quantity (∆Gf) for stability as well as molecular-level insights into the mechanism of stability
- Other variants of this work include the study of the folding of proteins isolated from the environment and the study of peptide-oligonucleotide interactions

Bringing it all together
- We will attempt to establish a relationship between the physical environment, biodiversity, and molecular structure
- One way this can be accomplished is to generate plots of stability vs. structural similarity for individual environments
[Figure: schematic plot with increasing structural similarity on one axis and increasing stability or protein activity on the other; the range along the stability axis indicates the stability window, and the range along the similarity axis indicates the variance of structures which are capable of surviving]
- A small stability range would indicate that there are rigorous energetic requirements
- A small structural similarity range would indicate environment-specific structures
- If both values are small, it may indicate that structures evolved to meet the specific requirements of that environment
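As a rough illustration of how the stability axis of such a plot might be filled in, the sketch below converts folding equilibrium constants into ∆Gf using the relationship ∆Gf = -RT ln K quoted earlier in the talk, then reports the stability window for each environment. Everything specific here is an assumption: the function names are hypothetical, the K values are invented for illustration and are not measurements, and the structural similarity axis is not modeled because the talk does not define a similarity metric.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_f(k_fold: float, temp_c: float) -> float:
    """Free energy of formation, dGf = -R*T*ln(K), in kcal/mol.

    k_fold is the folded/unfolded equilibrium constant; a more negative
    result means a more stable structure.
    """
    if k_fold <= 0:
        raise ValueError("K must be positive")
    return -R_KCAL * (temp_c + 273.15) * math.log(k_fold)


def stability_window(dg_values):
    """Return (lowest, highest, width) of the dGf values for one environment."""
    lowest, highest = min(dg_values), max(dg_values)
    return lowest, highest, highest - lowest


# Hypothetical folding constants for structures sampled from two environments.
samples = {
    "hot spring": {"temp_c": 80.0, "k_fold": [50.0, 65.0, 80.0]},
    "lake": {"temp_c": 20.0, "k_fold": [5.0, 60.0, 400.0]},
}

for name, data in samples.items():
    dgs = [delta_g_f(k, data["temp_c"]) for k in data["k_fold"]]
    lowest, highest, width = stability_window(dgs)
    print(f"{name}: dGf {lowest:.2f} to {highest:.2f} kcal/mol, window {width:.2f}")
```

In the language of the final slide, a narrow window for an environment would point to rigorous energetic requirements there; the invented numbers above exist only to exercise the calculation.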