Chapter 9 DNA-Based Information Technologies

9.1 Studying Genes and Their Products
Typical research problem: you have isolated a protein that you want to study
Want to isolate enough to crystallize for X-ray crystallography
Want to alter the amino acid (AA) sequence in the active site to see what happens
Want to see how the enzyme interacts with other proteins in the cell
Want to see how the protein is regulated in the cell
Can do all of the above if you can get the DNA for the gene encoding the enzyme
But the gene is a few thousand base pairs in a chromosome of hundreds of millions of base pairs
This section describes the tools to do the above
Genes can be isolated by DNA cloning
Clone - identical copy of something
Originally applied to a cell that was allowed to reproduce to make a colony of identical cells
In DNA it refers to making many identical copies of a gene sequence
The process of gene cloning involves 5 steps (Figure 9-1)
1. Cutting DNA at precise locations
Done by sequence-specific endonucleases
2. Selecting a small carrier DNA capable of self-replication
These are called cloning vectors
3. Joining the two DNA fragments covalently
Composite product called recombinant DNA
4. Moving the recombinant DNA from a test tube back into a host cell for replication
5. Finding and selecting the host cells that contain the cloning vector
The above process is called
Recombinant DNA technology
Genetic engineering
Most of the discussion will focus on E. coli methods, since that is the best-understood system
Restriction Endonucleases and DNA Ligases Yield Recombinant DNA
Step 1: cutting DNA at specific locations
Restriction endonucleases, or restriction enzymes
Recognize and cleave DNA at specific sequences
When used on a large piece of DNA, make smaller, well-defined pieces
DNA ligases then link the cleaved DNA into a cloning vector
Restriction endonucleases
Found in a wide range of bacteria
Used by bacteria to cleave foreign DNA
Do not cleave the bacterium's own DNA because it has been methylated at that site
By a specific methylase
Restriction endonuclease plus matching methylase is called a restriction-modification system
Three types of restriction endonucleases
Type I
Large multi-subunit protein
Cleaves at random sites that can be more than 1,000 bp from the recognition site
Uses ATP energy to move along DNA
Type II
Simpler structure
Does not move along DNA, so no need for ATP
Cleaves within the recognition sequence
This is the type we use for genetic recombination
Type III
Large multi-subunit protein
Cleaves DNA about 25 bp from the recognition site
Uses ATP energy to move along DNA
Thousands of type II enzymes discovered
>100 different DNA recognition sites
Recognition site usually 4-6 bp and palindromic
Table 9-2 shows a small sample
Some make staggered cuts
Leave 2-4 unpaired bases at each end
Called sticky ends
Can H-bond back to self or to another piece of DNA
Some make blunt-end cuts
Average size of DNA fragments produced depends on how often the sequence occurs in the DNA
This, in turn, depends on the size of the site
6 bp site
Random chance of 1 in 4^6 = 4,096
So the average fragment should be about 4,100 bp
4 bp site
Random chance of 1 in 4^4 = 256
So the average fragment is about 250 bp
Not entirely random
Sites occur less often than expected
Can get larger fragments if you simply stop the reaction before completion
This is called a partial digest
Can also get larger fragments if you use a homing endonuclease (Chapter 26!)
Recognition sequence 14-20 bp
Once DNA is cut, fragments can be purified
Agarose or polyacrylamide gel electrophoresis, or HPLC
If you have cleaved an entire chromosome, there are usually too many fragments
So construct a DNA library first (next major section)
Once purified, use ligase to attach fragments to a cloning vector that has been cleaved with the same restriction enzyme, so the sticky ends match
(Cloning vectors described in next minor section)
Ligase uses ATP to link DNA strands together
Most efficient with sticky-end restriction endonucleases
But can work with blunt ends as well
Can also link with synthetic DNA containing novel sequences
Called linkers
Or even polylinkers
Linkers contain other sequences that will be useful in the cloning process
Cloning Vectors
Now let's look at the DNA we are going to attach our chosen sequence to
Plasmids
Circular piece of DNA that replicates separately from the bacterial host chromosome
5,000-400,000 bp
Thought to be molecular parasites
Like virus DNA that can no longer make a viral capsid and infect other bacteria
Contain sequences that allow them to reproduce in the host bacterium using the bacterium's own enzymes
Often plasmids are more like symbionts than parasites
Either confer antibiotic resistance
Or confer a new property on the bacterium
Ti plasmid in Agrobacterium tumefaciens - allows bacteria to invade plant cells
Classic plasmid pBR322 - constructed in 1977 (Figure 9-3)
In its 4,361 bp sequence
Ori - origin of replication
Uses host bacterium's enzymes to start replicating at this point
Associated regulatory system limits copy number to 10-20 copies/cell
Genes that confer resistance to
Ampicillin
Tetracycline
Several sites for unique cleavage by restriction enzymes
PstI, EcoRI, BamHI, SalI, PvuII
Small size of plasmid makes it easier to get into host cell and to manipulate
Other plasmids
Different ori sequences give different copy numbers
1 to 1,000/cell
If two plasmids in a cell use the same ori
Will interfere with each other
Said to be incompatible
So if you need 2 plasmids at once, you need a different ori on each plasmid
Transformation
The process of putting a plasmid into a cell
In many bacteria, simple
Put bacteria and plasmid in a CaCl2 solution in a test tube at 0 °C
Rapidly bring to 37 °C or 43 °C (heat shock)
Don't know why, but it works!
Some bacteria are naturally 'competent' at DNA uptake
Do not need the above treatment
Other cells need electroporation
Subject cells to a high-voltage pulse
Allows the cell membrane to take up large DNA
Selection
Not all cells will have taken in the plasmid DNA
Now need to identify the cells containing the plasmid
Utilize genes in the plasmid called selectable markers
Selectable markers - 2 kinds
Allow cells containing the plasmid to grow under defined conditions
Called positive selection
Kill cells containing the plasmid under defined conditions
Called negative selection
pBR322: Figure 9-4 shows how to use both
Screenable markers
Make transformed cells have color or fluoresce
Plasmid has an extra gene for this trait
So you can visually see the transformed colonies
Transformation is less efficient with larger DNA
~15,000 bp is the largest you can do with a plasmid
Bacterial Artificial Chromosomes (BACs) (Figure 9-5)
Can do for DNA of 100,000 to 300,000 bp
Approaching size of host chromosome!
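The 4^n estimate for average restriction-fragment size (4,096 bp for a 6 bp site, 256 bp for a 4 bp site) can be checked with a quick simulation. This is a minimal sketch in Python, assuming an idealized random sequence (equal, independent A/C/G/T), a linear DNA molecule, and a complete digest. The helper names `find_sites`, `digest`, and `mean_fragment_length` are hypothetical, chosen for illustration. Only the top strand is modeled, so sticky-end overhangs are not represented. Real genomes are not random, so, as noted above, real sites occur less often than this predicts.

```python
import random
import re


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`.
    The lookahead lets overlapping matches be reported too."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Complete digest of a linear top strand.

    cut_offset = where the enzyme cuts, counted from the first base of the
    site (EcoRI cuts G^AATTC, so cut_offset=1).
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces (e.g., a site at the very start with cut_offset=0).
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def mean_fragment_length(site: str, genome_len: int = 1_000_000, seed: int = 1) -> float:
    """Average fragment size after digesting an idealized random sequence."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    # Where inside the site the cut falls does not change the average spacing.
    fragments = digest(genome, site, cut_offset=0)
    return genome_len / len(fragments)


if __name__ == "__main__":
    print(digest("AAGAATTCTT", "GAATTC", cut_offset=1))  # ['AAG', 'AATTCTT']
    for site in ("GAATTC", "GATC"):
        expected = 4 ** len(site)
        print(site, "expected ~", expected, "simulated", round(mean_fragment_length(site)))
```

The simulated averages should land near 4,096 and 256. Stopping the digest early (a partial digest) would correspond to using only a random subset of the cut positions, which raises the average fragment size.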
Simple plasmid at start
Low copy number in cell (1 or 2)
So do not see recombination events (next semester)
Also contains selectable and screenable markers
Yeast Artificial Chromosomes (YACs) (Figure 9-6)
Yeast genetics almost as well understood as E. coli
Easy to maintain
Easy to grow on an industrial scale
Eukaryotic, so will be processed differently
Plasmid vectors developed much like those for E. coli
Some systems with multiple origins for different organisms
Called shuttle vectors
YAC contains
Origin for yeast
Two selectable markers
Telomeres and centromere
Two telomeres (usually at the ends of eukaryotic chromosomes)
BamHI sites used to remove DNA between telomeres to make it linear
Can put in DNA chunks up to 2 × 10^6 bp
Stability of YAC increases with size, up to a limit
100,000 bp or less - slowly lost
150,000 bp - stable
Used to study eukaryotic chromosome metabolism
Expressing the Protein Product of Genes
Usually it is the product of the gene, not the gene itself, that you want, either for study or to make commercially
Trying to express a eukaryotic gene in bacteria has some issues
Sequences needed for transcription and regulation in the original host do not function in the bacterial cell
Promoters
Ribosome binding sites, etc.
So need to add sequences for bacterial control and transcription
Cloning vectors that contain all the signals for regulated expression are called expression vectors
Again, details on these sequences in second semester (Figure 9-7)
Many Different Systems Are Used to Express Recombinant Proteins
Lots of different systems
Each has advantages and disadvantages
Bacteria
Best understood and most common
Easy to store and grow in the lab
Also media are cheap
Can be grown on an industrial scale
But proteins may not be processed correctly
Do not fold right
Do not get proper covalent modifications
May need proteolytic cleavage for activation
Many eukaryotic proteins aggregate into insoluble cellular precipitates
Called inclusion bodies
Always developing new systems to get around these problems
Skip next two paragraphs
Yeast
Well understood and characterized
Also easy and cheap to grow on an industrial scale
Have tough cell walls, so harder to get DNA inside
That is why the shuttle vector is first made in bacteria
Since it is a eukaryotic system, control mechanisms for eukaryotic genes work better
But can still have folding and processing problems
Insects and insect viruses
Baculoviruses (Figure 9-9)
Insect viruses with double-stranded DNA genomes
Usually act as parasites
Kill host larvae while making more viruses
Late in infection, produce large amounts of two proteins for the virus
p10 and polyhedrin
Not needed in cultured insect cells
So replace them with your gene of interest
Can get up to 25% of total cellular protein to be your protein of interest
Baculovirus is the most common insect-cell protein expression system
Genome is 134,000 bp
Too large for direct cloning
Also, purifying the virus is difficult
Use bacmids instead
Large circular DNA
Contains baculovirus DNA
Plus sequences for replication in E. coli
You usually start your cloning in a smaller plasmid system
Then join with the bacmid to make the larger construct
Large number of systems commercially available
Protein modification is better
But still some failures
Mammalian cell culture
Easiest to introduce foreign DNA with viruses
A variety of engineered viruses are available commercially
Using some viruses, can get DNA permanently incorporated into the cell line
Very few problems with processing of the final protein
Biggest issue is that mammalian cell cultures are expensive to maintain
Alteration of Cloned Genes Produces Altered Proteins (Figure 9-10)
Once you have a cloned gene being expressed, you can use site-directed mutagenesis to change the sequence of the protein
Powerful approach to see how individual residues in a protein affect structure/function
If they exist, use restriction sites to remove a piece of DNA
And replace it with a new piece with a few small changes
Also can use oligonucleotide-directed mutagenesis
Make DNA with the desired base change, one for each strand
If the oligonucleotides are 30-40 nucleotides long, they will anneal to the original DNA despite the mismatch
Use them as PCR (later this chapter) primers to replicate the DNA
Process makes many copies of mutated DNA
Fewer copies of original DNA
If the original DNA came from wild-type E. coli, it will be methylated
Can use DpnI, which cleaves only methylated DNA, to destroy this DNA!
Can also fuse proteins or domains together, or remove and replace domains
Product of a fused gene is called a fusion protein
Terminal Tags as Handles
You learned back in Chapter 3 that the best way to purify a protein is with affinity chromatography
Many proteins do not have ligands that can be used for affinity chromatography
So, as long as we are modifying our protein and producing it on an industrial scale, let's add a tag to either end of the protein that can be captured by affinity chromatography, so purifying the protein is simplified
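The oligonucleotide-directed approach above can be sketched in Python as primer design for a single codon change. This is a minimal sketch, assuming a PCR with two complementary mutagenic primers (one per strand, as described above) and a coding-strand template whose first base starts codon 0. The helper names and the default of 15 unchanged bases on each side of the mismatch (33 nt total, inside the 30-40 nt range mentioned) are illustrative choices, not a published design rule; real primer design also weighs melting temperature and GC content.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, codon_index: int, new_codon: str,
                      flank: int = 15) -> tuple[str, str]:
    """Return (forward, reverse) primers that replace one codon.

    template    -- coding-strand sequence, 5'->3', starting at codon 0 (assumed)
    codon_index -- 0-based codon to change
    new_codon   -- replacement codon, e.g. "GCA" (Ala)
    flank       -- unchanged bases kept on each side of the mismatch
    """
    start = 3 * codon_index
    if start < 0 or start + 3 > len(template):
        raise ValueError("codon_index is outside the template")
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be 3 bases from A, C, G, T")
    left = template[max(0, start - flank):start]
    right = template[start + 3:start + 3 + flank]
    forward = left + new_codon + right  # same sense as coding strand; anneals to noncoding strand
    return forward, reverse_complement(forward)  # partner primer anneals to coding strand


if __name__ == "__main__":
    # Hypothetical gene fragment: Met, 10 x Ala (GCT), Lys (AAA), 10 x Ala
    gene = "ATG" + "GCT" * 10 + "AAA" + "GCT" * 10
    fwd, rev = mutagenic_primers(gene, codon_index=11, new_codon="GCA")  # Lys -> Ala
    print(len(fwd), fwd)
    print(len(rev), rev)
```

After PCR with such a pair, the methylated parental template would be removed with DpnI as described above, leaving mostly mutated copies.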
Common tags shown in Table 9-3
Diagram of process shown in Figure 9-11
GST - glutathione-S-transferase
Small protein (Mr 26,000)
Tightly binds glutathione
Another system
Add 6 or more His residues to one end
His binds to Ni2+
Use a chromatographic matrix that has bound Ni2+
Gene Sequence Amplification with PCR
PCR - polymerase chain reaction
Conceived by Kary Mullis in 1983
Used to amplify the number of copies of a piece of DNA
Relies on DNA polymerase
Needs dNTPs
Needs template
Only works in the 5'→3' direction
Start with 2 synthetic pieces of DNA (primers) (Figure 9-12)
Complementary to the ends of the target DNA sequence
1 for each strand
Step 1: heat DNA to separate strands
Step 2: add primers and cool to anneal
Step 3: add heat-stable Taq DNA polymerase + dNTPs
Will extend from primers
Repeat steps 1-3 25 or 30 times
DNA in question amplified more than 10^6-fold
Each cycle doubles the amount of DNA
20 cycles: 2^20 > 10^6
30 cycles: 2^30 > 10^9
Can include additional tricks
Have restriction endonuclease sites added to the ends of primers
If short, this extra segment won't anneal to the native DNA
But as the DNA amplifies, it gets incorporated into the new DNA
Can amplify a single copy of DNA into useful amounts
Used to amplify ancient DNA (up to ~40,000 years old), e.g., from mummies and woolly mammoths
Used in forensic analysis (see Box 9-1)
Since PCR amplifies any DNA, contamination is a serious problem
Variations
Reverse transcriptase PCR (RT-PCR)
Start with an RNA
Use a reverse transcriptase for the first cycle to make DNA
Use regular PCR to replicate the DNA
Quantitative PCR (qPCR) (Figure 9-13)
Have a probe which has a fluorophore and a quencher
With both on 1 molecule - no fluorescence
Amplify DNA
When probe binds to target, fluorophore and quencher are separated
Will begin to fluoresce
If target DNA is present in high amounts, will see fluorescence after fewer cycles of PCR
If target DNA is present in low amounts, will take more cycles

9.2 Using DNA-Based Methods to Understand Protein Function
Can describe protein function on 3 levels
1. Phenotypic function
Effect of protein on entire organism
Modify protein and see overall change in organism
2. Cellular function
Describe the network of interactions within the cell
3. Molecular function
Precise biochemical activity of protein
Wide variety of DNA-based techniques to study all levels
DNA Libraries
A collection of DNA clones
Used for genome sequencing, gene discovery, or determination of gene/protein function
Genomic library
Cleave entire genome of an organism into 1000s of fragments
All fragments cloned by insertion into vectors
Step 1: partial digestion of genome by restriction endonucleases
Make fragments of a limited size range
Trying to make sure all genes are in library clones
Remove large and small fragments with centrifugation or electrophoresis
Step 2: digest a BAC or YAC with the same endonuclease
Step 3: ligate genome and vector DNA together and transform into yeast or bacteria, so you have a library of cells containing genomic DNA
Complementary DNA (cDNA) library
Step 1: extract mRNA from an organism or a tissue
Step 2: reverse transcriptase PCR
Step 3: clone DNA into vectors to make cDNA library
Clones only those parts of the genome that are expressed
Gets rid of all introns
Sequence or Structural Relationships for Protein Function
'Comparative genomics'
Compare newly discovered gene to genes of known function
Genes with similar sequence or function in different species
Orthologs
Genes with similar sequence or function in the same species
Paralogs
Easiest to compare in similar species (human to mouse)
But orthologs observed between bacteria and humans
Conserved gene order on a chromosome observed in closely related species
Synteny
Sometimes see structural motifs that point to function
ATP-binding domains in ATPases
Zinc fingers in DNA-binding proteins
Fusion Proteins and Immunofluorescence to Localize Proteins in Cells
Location in cell can give clue to function
Green fluorescent protein (GFP)
Derived from jellyfish Aequorea victoria
Fluorophore in center of a β barrel
Needs only O2 (no other cofactor) to become fluorescent
Fuse onto protein of interest
Use microscope to see location in cell
Several proteins with different colors now available (Figure 9-16)
Alternate method
Fuse protein of interest with a short protein sequence that has well-characterized antibodies
Called an epitope tag
Kill and fix cells on microscope slide
Attach fluorophore to antibody
Allow antibody to find target
Target location now fluoresces (Figure 9-17)
Similar method, don't alter protein
But raise antibody to it
And have a fluorescent antibody that binds to the first antibody
Can do these things for an entire library!
Protein-Protein Interactions
Make protein with epitope tag
In cell, precipitate protein with antibody
See if any other proteins precipitate out because they are in a complex with the first protein
Many variations
Tandem affinity purification (Figure 9-20)
Yeast two-hybrid analysis (Figure 9-21)
Skip
DNA Microarrays
Short DNA segments (50-100 bp) from genes of known sequence, all amplified by PCR
Robotic device adds nanoliter drops of solution in a predesigned array onto a solid surface that binds DNA
Or simply synthesize DNA directly on the solid surface (Figure 9-22)
Resulting array called a chip
May include a sequence from every gene in an organism
Probe chip with mRNA or cDNA from an organism or tissue in a particular state
Provides researcher with a snapshot of all genes being expressed at that time (Figures 9-23, 9-24)

9.3 Genomics and the Human Story
2 (draft) human genome sequences published in 2001
Watson-Collins - publicly funded
Venter - privately funded
Reflects about 10 years of sequencing by the methods given so far
New Generation of DNA Sequencing
New technology - "next-gen" sequencers
Bacterial genome in a few hours
Human genome in a day or two
Step 1: shear genome to randomly generate fragments of a few hundred base pairs
Step 2: add synthetic DNA sequences to the ends of all fragments
This gives you a reference point
Step 3: immobilize DNA on a matrix
Step 4: PCR-amplify all DNA sequences
Now have a microarray of millions of DNA fragments
2 different ways to sequence all these fragments
Pyrosequencing (Figure 9-25)
Flush chip with each of the dNTPs in turn
And then use apyrase to degrade unreacted nucleotide
If the nucleotide could not be added to the DNA because it is not next in sequence, nothing happens
If it could be added because it is next in sequence, PPi is released
ATP sulfurylase converts PPi to ATP
Luciferase uses the ATP to oxidize luciferin, creating a flash of light
Watch chip to see where bases are added
Reversible terminator sequencing (Figure 9-26)
1. Add blocked, fluorescently labeled nucleotides
Use fluorescence to see which nucleotides were added where
2. Remove labels and blocking groups
3. Wash
Repeat from step 1
And I will quit there
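The pyrosequencing flush-and-detect cycle described above can be modeled as a "flowgram". This is an idealized sketch in Python, assuming the sequence of bases the polymerase will add is known (so the signal can be simulated), that the dNTP flush order `flow_order` is an arbitrary illustrative choice, and that the light signal is exactly proportional to the number of bases added in a flush. The function names are hypothetical. Note that a run of identical bases (a homopolymer) is added in a single flush, so its length can only be read from signal height.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", n_flows: int = 16) -> list[tuple[str, int]]:
    """Idealized pyrosequencing signal, one (dNTP, signal) pair per flush.

    Each flush adds the whole run of matching bases; the signal is modeled
    as the number of bases added (one PPi, hence one unit of light, per base).
    Non-matching flushes give 0 (the dNTP is simply degraded by apyrase).
    """
    if set(new_strand) - set(flow_order):
        raise ValueError("every base must appear in flow_order")
    pos = 0
    signals = []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            pos += 1
            run += 1
        signals.append((base, run))
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Read the sequence back: each flushed base repeated by its signal height."""
    return "".join(base * height for base, height in signals)


if __name__ == "__main__":
    fg = flowgram("TTACGGA", n_flows=8)
    print(fg)  # [('T', 2), ('A', 1), ('C', 1), ('G', 2), ('T', 0), ('A', 1), ('C', 0), ('G', 0)]
    print(call_bases(fg))  # TTACGGA
```

Real signals are noisy, so telling, say, 7 identical bases from 8 by light intensity alone is much harder than this exact model suggests. Reversible terminator sequencing sidesteps this by adding only one blocked nucleotide per cycle.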