FCH 532 Lecture 6
Chapter 5

Page 111 Figure 5-51 A degenerate oligonucleotide probe.
Page 113 Figure 5-52 Colony (in situ) hybridization.
Page 113 Figure 5-53 Chromosome walking.
Page 117 Figure 5-56 Construction of a recombinant DNA molecule by directional cloning.
Page 114 Figure 5-54 The polymerase chain reaction (PCR).
• Conceived by Kary Mullis in 1985.
• Amplifies DNA segments of up to 10 kb.
• Heat-denatured DNA is incubated with
– DNA polymerase
– dNTPs
– Two oligonucleotide primers
• Heat-stable polymerases are used
– Taq
– Pfu

PCR
• Amplified DNA can be used for RFLP analysis, Southern blotting, and sequencing.
• Can be used for rapid detection of diseases and mutations.
• Can be used to identify DNA from hair, sperm, or blood by amplifying short tandem repeats (STRs): segments of repeating DNA sequence (2-7 bp per repeat) such as (CA)n and (ATGC)n.
• STRs are genetically variable and can serve as markers of individuality: the number of tandem repeats at each STR varies from person to person.
• STRs are amplified using primers that anneal to the unique sequences flanking the tandem repeats.
• RNA can be amplified by PCR after first reverse transcribing it to DNA (cDNA) with reverse transcriptase.

Figure 5-57 Site-directed mutagenesis (page 118).
Allows for the “customization” of a protein.
• An oligonucleotide containing a short gene segment with the desired altered base sequence, corresponding to the new amino acid sequence, is used as a primer in the reaction.
• In this case DNA polymerase I is used.
• Can also use PCR to amplify a gene of interest, introducing the mutation through a primer.

Production of proteins
• Cloned structural genes can be inserted into an expression vector to produce recombinant protein.
• A relaxed-control plasmid with an efficient promoter can yield the product of the inserted structural gene at up to 30% of the total cellular protein.
• Inclusion bodies: large amounts of insoluble and denatured protein.
The protein must be extracted, dissolved in a chaotrope such as urea or guanidinium chloride, and then slowly renatured.

Page 116 Figure 5-55 Electron micrograph of an inclusion body of the protein prochymosin in an E. coli cell.

Production of proteins
• A signal sequence can be engineered to target the protein to the periplasmic space of the bacterium so that it folds properly.
• Genes for toxic proteins can be placed under an inducible promoter (the lac promoter) in a plasmid that also carries the gene for the lac repressor protein.
– Binding of the lac repressor prevents expression from the lac promoter.
– After the cells have grown to high density, an inducer (isopropylthiogalactoside, IPTG, a synthetic nonmetabolizable analog of allolactose) is added to release the lac repressor.

Reporter genes can be used to monitor transcription
• The rate at which a gene is expressed depends on its upstream control sequences.
• Replace the gene you want to monitor with a reporter gene.
• Reporter genes encode proteins that are easily detected by an assay. lacZ can be assayed with X-gal, which yields a blue color.
• Another reporter is the green fluorescent protein (GFP), which fluoresces green when irradiated with UV or blue (~400 nm) light.

Diagram: geneX → lacZ. Replace geneX with the reporter gene in the correct reading frame. In the presence of X-gal, expression produces the blue color.

Page 119 Figure 5-58 Use of green fluorescent protein (GFP) as a reporter gene.

Transgenic organisms
• Organisms expressing a foreign gene are considered transgenic.
• The foreign gene is referred to as a transgene.
• For the change to be permanent, the transgene must be stably integrated into the germ cells.
• Established in mice by microinjection of DNA into a pronucleus of a fertilized ovum.
• Can also be accomplished through an embryonic stem cell.

Page 119 Figure 5-59 Microinjection of DNA into the pronucleus of a fertilized mouse ovum.
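Worked example for the PCR slides of this chapter: because STR primers anneal to the unique sequence flanking the repeats, the length of the amplified product reports the repeat count. The sketch below is only an illustration under stated assumptions: the flanking sequences are made up (not real loci), `count_str_repeats` and `ideal_pcr_copies` are hypothetical helper names, and the copy number assumes perfect doubling in every cycle, which real reactions do not achieve.

```python
def ideal_pcr_copies(start_copies, cycles):
    """Upper bound on copies after `cycles` rounds, assuming perfect doubling."""
    return start_copies * 2 ** cycles


def count_str_repeats(template, left_flank, right_flank, unit):
    """Count tandem `unit` repeats between two unique flanking sequences.

    Returns None if a flank is missing or the region between the flanks
    is not made purely of whole repeats.
    """
    start = template.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = template.find(right_flank, start)
    if end == -1:
        return None
    region = template[start:end]
    if len(region) % len(unit) != 0:
        return None
    n = len(region) // len(unit)
    return n if region == unit * n else None


# Made-up flanking sequences, for illustration only.
LEFT = "GATTACAGGT"
RIGHT = "TTCGGATCCA"

# Two hypothetical alleles that differ only in the number of (CA) repeats.
allele_a = "AAAC" + LEFT + "CA" * 12 + RIGHT + "GGT"
allele_b = "AAAC" + LEFT + "CA" * 17 + RIGHT + "GGT"

for name, allele in (("A", allele_a), ("B", allele_b)):
    n = count_str_repeats(allele, LEFT, RIGHT, "CA")
    amplicon_bp = len(LEFT) + len("CA") * n + len(RIGHT)
    print(f"allele {name}: {n} repeats, amplicon {amplicon_bp} bp")

print("copies after 30 ideal cycles:", ideal_pcr_copies(1, 30))
```

The two alleles give amplicons of different length, which is what a gel or capillary separation would resolve.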
Nucleic acid sequencing
• The development of DNA sequencing techniques has produced a huge amount of DNA sequence data (>35 billion nucleotides in 2003 and growing!).
• Complete genomes determined for over 110 prokaryotes and over 11 eukaryotes.

Page 177 Table 7-3a Some Sequenced Genomes.
Page 177 Table 7-3b Some Sequenced Genomes.

Nucleic acid sequencing
• The chain-terminator method (aka dideoxy sequencing) is used to sequence long stretches of DNA.
• Utilizes DNA polymerase to synthesize single-stranded DNA.
• Assembles the four deoxynucleoside triphosphates (dNTPs) into a sequence complementary to the template.
• Initiates from a primer sequence.
• Synthesis of a chain terminates upon incorporation of a 2',3'-dideoxynucleoside triphosphate (ddNTP), which lacks the 3'-OH needed for further extension.

[Structure: 2',3'-dideoxynucleoside triphosphate (ddNTP).]

Page 178 Figure 7-14 Flow diagram of the chain-terminator (dideoxy) method of DNA sequencing.
Page 179 Figure 7-15 Autoradiograph of a sequencing gel.
[Structure: 35S-labeled nucleoside triphosphate. Gel lanes: A, G, C, T.]

Page 179
The chain-terminator method has been automated. Fluorescence labeling replaces radioactivity. Two types of system are used:
1. Four reactions/one gel system: the primers used in each of the four chain-extension reactions are 5'-linked to a differently fluorescing dye. The four reaction mixtures are combined and loaded into a single lane of a gel. As each fragment exits the gel, its fluorescence is detected.
2. One reaction/one gel system: each of the four ddNTPs used to terminate chain extension is linked to a different fluorescing dye. The extension is carried out in a single vessel and the mixture is loaded into a single lane.
Advanced systems use capillaries instead of slab gels.
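Worked example: the logic of reading a chain-terminator gel can be sketched in a few lines. This is an idealized model, not a simulation of real chemistry: it assumes that every position after the primer yields exactly one terminated fragment in the lane of the ddNTP incorporated there. The template sequence and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5):
    """Strand the polymerase builds (5'->3') on a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_lengths(template_3to5, primer_len):
    """Map each ddNTP lane (A, G, C, T) to the fragment lengths it contains.

    A fragment ending at position i of the new strand appears in the lane
    of the base incorporated at i. Lengths include the primer.
    """
    new = synthesized_strand(template_3to5)
    lanes = {"A": [], "G": [], "C": [], "T": []}
    for i in range(primer_len, len(new)):
        lanes[new[i]].append(i + 1)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest fragment to recover the sequence."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Hypothetical template, written 3'->5'; the primer pairs with its first 4 bases.
TEMPLATE = "TACGGCATTGCAAGT"
PRIMER_LEN = 4

lanes = terminated_lengths(TEMPLATE, PRIMER_LEN)
for base in "AGCT":
    print(f"dd{base} lane fragment lengths: {lanes[base]}")
print("sequence read from gel (5'->3'):", read_gel(lanes))
```

The sequence read from the gel is that of the newly synthesized strand beyond the primer, that is, the complement of the template.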
Genome sequencing
• To sequence entire genomes, sequenced segments must be assembled into contigs (contiguous blocks) to establish the correct order of the sequence.
• Chromosome walking is one way to do so, but it is prohibitively expensive.
• Two methods have been used recently:
1. Conventional genome sequencing
– Low-resolution maps are made by identifying “landmarks” in ~250 kb inserts in YACs.
– Landmarks are 200-300 bp segments, aka sequence-tagged sites (STSs). Two clones that contain the same STS overlap.
– STS-containing inserts are sheared randomly into ~40 kb segments and cloned into cosmid vectors, which are used to create high-resolution maps.
– The cosmid inserts are fragmented to smaller sizes and sequenced.
– Cosmid inserts are assembled using the STS sequence overlaps and cosmid walking.
– Cannot be used effectively with sequences containing large amounts of repetitive sequence. (Use expressed sequence tags (ESTs).)

Genome sequencing
2. Shotgun strategy
– The genome is randomly fragmented to make a library.
– A large number of the cloned fragments are sequenced.
– The genome is assembled by identifying overlaps between pairs of fragments.
• The probability that a given base is not sequenced is $e^{-c}$, where $c$ is the redundancy of coverage, $c = LN/G$,
– where $L$ is the average length of the cloned inserts in base pairs,
– $N$ is the number of inserts sequenced,
– and $G$ is the length of the genome in base pairs.
• The aggregate length of the gaps between contigs is $Ge^{-c}$ and the average gap size is $G/N$.
• Bacterial genomes: the shotgun strategy is straightforward. Gaps are filled in using synthesized PCR primers to finish the genome.
• Eukaryotic genomes: because of their larger size, sequencing must be carried out in stages, first cloning into BACs and then sequencing ~500 bp from each end of each insert to yield sequence-tagged connectors (STCs, or BAC ends). Overlapping STCs then allow assembly.

Page 180 Figure 7-17 Genome sequencing strategies.

Human genome
• 2.2 billion nucleotide sequence, ~90% complete; the remainder is missing largely because it is highly repetitive.
• About half of the human genome consists of various repeating sequences.
• Only ~28% of the genome is transcribed to RNA.
• Only 1.1% to 1.4% of the genome (~5% of the transcribed RNA) encodes protein.
• Only ~30,000 protein-encoding genes (open reading frames, or ORFs) were identified, whereas 50,000-140,000 ORFs had been predicted.
• Only a small fraction of human protein families are unique to vertebrates; most occur in other life forms.
• Two randomly selected human genomes differ, on average, by only 1 nucleotide per 1250; that is, any 2 people are likely to be >99.9% identical.

Chemical evolution
• Evolutionary aspects of amino acid sequences.
• Changes stem from random mutational events that alter a protein's primary structure.
• To persist, a mutational change must offer a selective advantage or at least not decrease fitness.
• Most mutations are deleterious, often lethal, and so are not propagated.
• Occasionally a mutation increases the fitness of the host in its natural environment.
• Example: sickle-cell anemia.

Page 183 Figure 7-18a Scanning electron micrographs of human erythrocytes. (a) Normal human erythrocytes revealing their biconcave disklike shape.
Page 183 Figure 7-18b Scanning electron micrographs of human erythrocytes. (b) Sickled erythrocytes from an individual with sickle-cell anemia.
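Worked example for the genome sequencing slides: the coverage relations quoted for the shotgun strategy (redundancy c = LN/G, probability e^{-c} that a base is missed, aggregate gap length G e^{-c}, average gap size G/N) are easy to check numerically. The genome size, insert length, and insert count below are assumed round numbers chosen for illustration, not values from the lecture, and the function names are hypothetical.

```python
import math


def coverage(L, N, G):
    """Redundancy of coverage, c = L*N/G."""
    return L * N / G


def fraction_unsequenced(c):
    """Probability that a given base is not sequenced, e^{-c}."""
    return math.exp(-c)


def aggregate_gap_length(G, c):
    """Expected total length of the gaps between contigs, G*e^{-c}."""
    return G * math.exp(-c)


def inserts_needed(L, G, max_fraction_missing):
    """Smallest N for which e^{-LN/G} is at most `max_fraction_missing`."""
    c_required = -math.log(max_fraction_missing)
    return math.ceil(c_required * G / L)


# Assumed example values (illustrative only).
G = 2_000_000   # genome length, bp
L = 1_000       # average insert length, bp
N = 20_000      # number of inserts sequenced

c = coverage(L, N, G)
print(f"coverage c = {c:.1f}")
print(f"fraction of bases missed = {fraction_unsequenced(c):.2e}")
print(f"aggregate gap length = {aggregate_gap_length(G, c):.1f} bp")
print(f"average gap size G/N = {G / N:.1f} bp")
print("inserts for <=1% missed:", inserts_needed(L, G, 0.01))
```

Because the missed fraction falls off exponentially with c, modest increases in redundancy close most gaps; the few that remain are the ones finished with targeted PCR primers.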
Page 184 Figure 7-20 A map indicating the regions of the world where malaria caused by P. falciparum was prevalent before 1930.

Chemical evolution
• Pauling and co-workers showed by electrophoresis that normal human hemoglobin (HbA) carries more negative charge than sickle-cell hemoglobin (HbS).
• Sickle-cell anemia is inherited according to the laws of Mendelian genetics.
• Homozygous for HbS: hemoglobin is almost all HbS; phenotype = sickle-cell anemia.
• Heterozygous for HbS: ~40% of hemoglobin is HbS; phenotype = sickle-cell trait.
• Homozygous for HbA: normal human hemoglobin.

Mutations in α- or β-globin genes can cause disease states
• Sickle-cell anemia: Glu 6 to Val 6 (E6 to V6) in the β chain.
• Val 6 binds to a hydrophobic pocket in deoxy-Hb.
• The hemoglobin polymerizes to form long filaments.
• The filaments cause sickling of the cells.
• Sickle-cell trait offers an advantage against malaria.
• Cells sickle under low-oxygen conditions and when infected with Plasmodium falciparum.
• This causes the preferential removal of infected erythrocytes from circulation.

Variations in homologous proteins
• Similar proteins from related species are likely derived from the same ancestor.
• Even a protein that is well adapted to its function continues to evolve.
• Neutral drift: mutational changes that accumulate in a protein over time without affecting its function.
• Homologous proteins: evolutionarily related proteins.
• Comparing the primary structures of homologous proteins can identify which residues are essential to function, which are of lesser significance, and which have little specific function.
• Invariant residue: the same side chain at a particular position in the amino acid sequences of related proteins.
• A residue that is invariant among related proteins is likely necessary to some essential function of the protein.
• Other positions have less stringent side chain requirements and may be conservatively substituted (replaced by an amino acid with similar properties).
• If many different amino acids are tolerated at a position, it is hypervariable.
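Worked example: the invariant / conservatively substituted / hypervariable distinction can be illustrated by classifying the columns of an alignment. This is a rough sketch under stated assumptions: the side-chain grouping is one coarse convention among several, the cutoff of four distinct residues for "hypervariable" is arbitrary, and the five aligned peptides are made up (they are not cytochrome c sequences).

```python
# Coarse side-chain classes; this grouping is an assumption for illustration.
CLASSES = {
    "nonpolar": set("GAVLIMPFW"),
    "polar": set("STNQCY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_class(residue):
    """Return the coarse class name for a one-letter amino acid code."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def classify_column(column, hypervariable_min=4):
    """Label one alignment column by how much variation it tolerates."""
    residues = set(column)
    if len(residues) == 1:
        return "invariant"
    if len({side_chain_class(r) for r in residues}) == 1:
        return "conservative"
    if len(residues) >= hypervariable_min:
        return "hypervariable"
    return "variable"


def classify_alignment(sequences):
    """Classify every column of a gap-free alignment of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [classify_column(col) for col in zip(*sequences)]


# Made-up aligned peptides (illustrative only).
TOY_ALIGNMENT = ["GKDLAH", "GKELSH", "GRDIDH", "GKEVKH", "GRDLTH"]

for position, label in enumerate(classify_alignment(TOY_ALIGNMENT), start=1):
    print(position, label)
```

In this toy alignment positions 1 and 6 come out invariant, positions 2 to 4 conservatively substituted, and position 5 hypervariable.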
Cytochrome c
• Cytochrome c is a nearly universal eukaryotic protein necessary for electron transport.
• Vertebrates: 103-104 residues; up to 8 more amino acids in other phyla.
• Similarities are observed in an alignment.
• 38 of 105 residues are invariant and most of the others are conservatively substituted.
• 8 positions are hypervariable.
• His 18 and Met 80 form bonds with the redox-active Fe of the heme group.

Page 184 Table 7-4a Amino Acid Sequences of Cytochromes c from 38 species.
Page 185

Cytochrome c
• Evolutionary differences between two homologous proteins are estimated by counting the amino acid differences between them.
• The order of differences parallels taxonomy and can be put into a table.
• These data can be used to construct a phylogenetic tree: a tree that indicates ancestral relationships among organisms and their proteins.

Page 186 Figure 7-21 Phylogenetic tree of cytochrome c.
Page 187
• Each branch point indicates a possible common ancestor to everything above it.
• Relative evolutionary distances between neighboring branch points are expressed as the number of amino acid differences per 100 residues of the protein (percentage of accepted point mutations, or PAM units).
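Worked example: the difference table that feeds a phylogenetic tree can be built by counting mismatches between each pair of aligned sequences and scaling to 100 residues. The sketch below is a simplification: raw percent difference ignores multiple substitutions at the same site, so it only approximates a PAM distance. The three sequences and the species labels are made up for illustration and are not real cytochrome c data.

```python
from itertools import combinations


def differences_per_100(seq_a, seq_b):
    """Amino acid differences between two aligned sequences, per 100 residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100 * diffs / len(seq_a)


def difference_table(named_sequences):
    """Pairwise differences per 100 residues for every pair of names."""
    return {
        (a, b): differences_per_100(named_sequences[a], named_sequences[b])
        for a, b in combinations(named_sequences, 2)
    }


# Made-up aligned sequences (illustrative only).
TOY_SEQUENCES = {
    "species_1": "MKTLLVAGSE",
    "species_2": "MKTLIVAGSE",
    "species_3": "MRTLIVSGAE",
}

table = difference_table(TOY_SEQUENCES)
for pair, distance in table.items():
    print(pair, distance)

# The pair with the fewest differences would be joined first when building a tree.
closest = min(table, key=table.get)
print("closest pair:", closest)
```

Here species_1 and species_2 differ least, so they would share the most recent branch point, with species_3 joining at a deeper one.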