ANALYSIS OF DNA AND GENOMES

1. Describe the properties of restriction nucleases and understand how they can be used to join DNA fragments.
2. Describe how gel electrophoresis is used to separate DNA molecules by size.
3. Understand how nucleic acid hybridization techniques are used to detect the amount of a particular DNA or RNA molecule present in a complex mixture. Understand how the reaction conditions can be adjusted to allow either hybridization of only identical sequences or also of related ones.
4. Describe how Southern blotting is used to detect a particular DNA molecule and Northern blotting is used to detect a particular RNA molecule present in a complex sample. Understand how Southern blotting is used to detect a mutation to a particular gene.
5. Understand how DNA cloning with plasmid cloning vectors that are introduced into bacteria is used to generate many copies of a specific DNA sequence.
6. Describe the sequence composition of a genomic DNA library and how this type of library is generated.
7. Describe the sequence composition of a cDNA library and how this type of library is generated.
8. Distinguish between a genomic DNA and a cDNA clone and describe different uses for each one.
9. Understand how the polymerase chain reaction is used to amplify a segment of DNA.
10. Describe how a cDNA corresponding to one particular mRNA present in a complex mixture can be amplified by the polymerase chain reaction.
11. Understand how PCR analysis of variable number of tandem repeat loci is useful in forensic investigation and paternity testing.
12. Understand how hybridization with allele-specific oligonucleotides can be used in genetic diagnosis.
13. Describe how microarrays can be used to simultaneously analyze the expression of many genes or to detect genetic variations.
14. Describe the dideoxy method and highly parallel methods for sequencing DNA.
15. Describe how a protein sequence is deduced from a cDNA sequence.
    Describe the different strategies used to locate protein coding sequences within genomes.
16. Describe how antibodies can be used to detect specific proteins, including their use in Western blotting.
17. Describe how introducing expression vectors into cells is used to produce large amounts of a desired protein.
18. Understand how the approximate location of a gene that causes a disease can be determined by linkage analysis with physical markers. Describe the types of markers that are used.
19. Describe the inheritance patterns for autosomal dominant, autosomal recessive, and X-linked recessive diseases. Understand how diseases can have complex patterns of inheritance.
20. Understand how genome-wide association studies can be used to identify genetic variation associated with increased risk of developing a complex genetic disease.
21. Describe how gene targeting in mice is used to analyze the function of particular genes.
22. Understand how RNA interference can be used to experimentally turn off expression of a particular gene and how CRISPR-Cas9 can be used for targeted genome manipulation.

ANALYSIS OF DNA AND GENOMES

1. Basic methods for isolating, manipulating, and analyzing DNA (recombinant DNA technology)
   a. Restriction nucleases
        Reproducibly cut DNA at specific sites that are recognized by a 4-8 bp sequence
        Many of these enzymes produce staggered cuts that leave single-stranded cohesive ends
        Cohesive ends of two different fragments that were produced by the same enzyme are complementary and can be readily joined to create recombinant DNA
   b. Gel electrophoresis
        DNA is negatively charged and will travel through electric field
        Gel matrix serves as molecular sieve that separates DNA molecules by size; larger molecules move slower through gel
        DNA is visualized by either staining with ethidium bromide, which fluoresces under ultraviolet light, or by prior incorporation of a radioisotope (usually 32P) into DNA, which is then detected by autoradiography
   c.
      Nucleic acid hybridization
        Double stranded DNA can be denatured into two single strands by heating or pH extremes; complementary DNA or RNA strands can renature (hybridize) under appropriate conditions
        Many hybridization procedures use a labeled single stranded DNA probe to detect presence of a particular DNA or RNA species from a complex mixture
        Common method for generating probe in vitro
          - Purified DNA fragment denatured and annealed to pool of short random primers
          - DNA polymerase then incorporates labeled nucleotides to create labeled DNA molecules
        Stringency of hybridization
          - Hybridization temperature determines whether only identical or also related sequences can hybridize
          - Temperatures slightly below melting temperature only permit hybridization of perfectly matched sequences (high stringency); detect one gene from entire genome
          - Lower temperatures allow related sequences with some mismatches to also hybridize (low stringency); detect related genes or homologous genes from another organism
2. Southern and Northern blotting
   a. Complex mixture of DNA (for Southern blotting) generated by restriction nuclease or RNA (for Northern blotting) separated by gel electrophoresis prior to hybridization
   b. Following electrophoresis, nucleic acids transferred to and immobilized on membrane; for Southern blotting, molecules in gel are denatured prior to transfer
   c. Membrane exposed to solution containing labeled probe; molecules that hybridize to probe are identified as discrete bands
   d. Southern blotting can detect a mutation in a particular gene caused by insertion or deletion of a segment of DNA
   e. Northern blotting can detect if expression of a particular gene is altered under a given condition, for example in a mutant organism
3. DNA cloning
   a. Describes making of many identical copies of a DNA molecule; also describes isolation of one segment of DNA such as a gene from the cell’s total DNA
   b.
      To clone using bacteria, segment of DNA is inserted into cloning vector, which is then propagated in host cells
   c. Cloning vectors are commonly bacterial plasmids (small circular DNA molecules); plasmid is cut with restriction nuclease and joined with DNA fragment to be cloned
   d. Recombinant plasmid DNA is introduced (transformed) into bacteria; propagation of bacteria in culture results in repeated replication of plasmid, which can be easily purified from bacteria
4. DNA library
   a. Collection of cloned DNA fragments, which are commonly contained in plasmids in bacterial host
   b. Set of recombinant plasmids is generated and transformed into bacteria; bacteria are grown on plates to form isolated colonies; bacteria in each colony contain plasmid with one particular cloned DNA fragment
   c. Genomic DNA library
        Contains the entire genome of a particular individual
        Total DNA is isolated from cells, cut with restriction nuclease, inserted into plasmids, and propagated in bacteria
        Fragments in library contain all genes from the individual; also includes abundance of noncoding DNA
   d. cDNA (complementary DNA) library
        Contains only DNA sequences that are transcribed into mRNA; DNA is generated that is complementary to mRNA
        mRNA isolated from cells is reverse transcribed into single stranded DNA with reverse transcriptase; DNA polymerase is used to make double stranded cDNA, which is inserted into plasmids and propagated in bacteria
        Clones in library represent all mRNAs expressed in a particular cell type; different libraries will be generated using different cell types from same organism
   e.
      Uses of genomic and cDNA libraries
        cDNA clones useful for deducing amino acid sequence of protein or for bulk production of protein in bacteria or other cells
        Genomic clones useful for obtaining gene regulatory sequences or for sequencing genomes
        Libraries can be screened to select clones with particular properties, such as based on sequence (by hybridization) or function of gene product
5. Polymerase chain reaction (PCR)
   a. Amplification of a segment of DNA from a complex mixture in vitro without using bacterial host
   b. Need sequence information of short segments at each end of sequence to be amplified
   c. Use two chemically synthesized oligonucleotides each complementary to one of the strands at opposite ends of sequence to be amplified; serve as primers for DNA polymerization reactions
   d. Each cycle consists of three different temperatures for denaturing DNA strands, annealing primers to template DNA, and DNA synthesis from primers; requires heat stable DNA polymerase
   e. Perform repeated cycles; DNA that is synthesized in one cycle serves as template in subsequent cycles, which results in exponential amplification of region between primers
   f. Can obtain genomic DNA and cDNA clones by PCR
        For genomic clones, DNA isolated from cells, and PCR used to obtain DNA located between primers
        For cDNA clones, mRNA isolated from cells and reverse transcribed with reverse transcriptase; PCR used to obtain cDNA located between primers
   g.
      Useful in forensic analysis and paternity testing
        Examine segments whose length is highly variable between individuals; often use microsatellites (runs of short repeated sequences) whose lengths vary, known as variable number of tandem repeats (VNTR)
        PCR using primers to non-variable segments that flank a particular VNTR; reactions analyzed by gel electrophoresis; each individual will usually have different products from paternal and maternal inherited alleles
        By examining five to ten different VNTRs, a very precise genetic fingerprint is established for an individual
   h. Useful for detecting presence of low levels of viral DNA indicative of viral infection; useful for detection of many genetic diseases, particularly those involving insertions and deletions
6. Genetic analysis using allele-specific oligonucleotides
   a. Can detect small differences between different alleles, even single base pair differences
   b. Isolate DNA from cells and PCR amplify segment of gene that includes region to be analyzed
   c. Separate reaction by gel electrophoresis and transfer to membrane
   d. Hybridize with labeled oligonucleotide probe (chemically synthesized single stranded DNA approximately 20 nucleotides in length) that recognizes a specific allele; use conditions that allow only perfect matches to hybridize
   e. Repeat procedure using oligonucleotide probe specific for another allele of same gene, for example to distinguish between normal and disease-causing alleles
7. Hybridization using microarrays
   a. Hybridization technique for simultaneously monitoring expression of thousands of genes or for detecting genetic variation
   b. Slide generated containing large, dense array of DNA probes each of known sequence and position on slide
   c. Isolate from cells genomic DNA sample or mRNA sample that is reverse transcribed to cDNA
   d. DNA sample is labeled with fluorescent dye and hybridized to slide
   e.
      Intensity of fluorescence at each spot indicates abundance of particular segment of DNA within sample
   f. Applications
        Analyze genetic variation including single nucleotide polymorphisms (SNPs)
        Determine gene expression patterns that underlie many cellular processes
        Distinguish among different types of cancer cells based upon characteristic gene expression patterns
8. Sequencing DNA
   a. Dideoxy (Sanger) method
        Synthesis of purified DNA in vitro using DNA polymerase, primer, and four deoxynucleotides
        Four reactions, each of which includes a small amount of one dideoxynucleoside triphosphate (ddNTP); rare incorporation of a ddNTP blocks further chain growth
        Reaction with a particular ddNTP generates set of fragments whose lengths indicate positions where the nucleotide is present
        Each reaction separated by gel electrophoresis in parallel lanes on gel; examining all fragments in order of size reveals sequence
   b. Automated sequencing
        One reaction includes small amounts of all four ddNTPs, each ddNTP labeled with a different color fluorescent dye
        Detector located at bottom of gel reads the color of the label in each fragment
   c. Sequencing genomes by shotgun sequencing method
        Generate several genomic libraries with different insert sizes
        Perform sequencing reactions on millions of different genomic DNA clones; will have segments of sequence overlap between different clones
        Complex algorithms for assembly of sequence data in appropriate order on chromosomes based on sequence overlaps
   d.
      Highly parallel sequencing
        Sample preparation
          DNA randomly sheared and ligated to adaptors; DNA molecules attached to solid surface
          Clonal PCR-based amplification from single DNA molecules on a solid surface; adaptor sequences serve as priming sites for PCR amplification
          Form high density array of clonal DNA clusters (up to millions of clusters)
        Sequencing by synthesis- repeated cycles using sequencing instrument
          Polymerase-catalyzed addition of nucleotide that is complementary to template strand
          Fluorescence or chemiluminescence imaging to detect nucleotide incorporation at each DNA cluster
          One technology uses four different types of fluorescent-labeled reversible dideoxy terminator nucleotides; cycles involve addition of nucleotide, imaging of fluorescence at each DNA cluster to identify which nucleotide was incorporated, removal of blocked terminus and fluorophore to allow subsequent cycle
          Other technology couples pyrophosphate release during nucleotide incorporation to reaction that produces chemiluminescence signal; cycles involve addition of one of four nucleotides and chemiluminescence imaging at each DNA cluster to identify if nucleotide was incorporated, each of four nucleotides added sequentially
9. Finding DNA sequences that encode proteins
   a. From cDNA sequence- six potential reading frames; usually only one is recognized as correct due to protein coding region that begins with ATG and ends with stop codon (open reading frame) and that is reasonably long; other frames usually contain frequent stop codons
   b.
      From genome sequence- more difficult because vast majority of sequence is noncoding
        Search for open reading frames; must account for long introns within open reading frames by searching for sequences that signal intron/exon boundaries; also searching for upstream regulatory sequences helps to find genes
        Sequence cDNAs; large collection of cDNA sequences (database) can be compared to genomic sequence to locate exons and introns in genes
        Compare sequences between species, for example human and mouse; conserved sequences usually indicative of exons that encode proteins
10. Use of antibodies to detect specific proteins
   a. Antibodies are proteins produced by immune system; produced in billions of different forms that bind to different targets known as antigens
   b. Can be generated by injecting animal with antigen and collecting specific antibodies produced by immune cells
   c. In typical application, primary antibody recognizes antigen of interest; detection occurs from secondary antibodies coupled to a marker (enzyme, fluorescence) that bind to primary antibody
   d. Western blotting- use antibody to detect specific protein from complex mixture that has been separated by polyacrylamide-gel electrophoresis; proteins from gel transferred to and immobilized on membrane prior to antibody detection
11. Producing proteins in large amounts
   a. For medically useful proteins, such as insulin, growth hormone, interferon, or for research use
   b. Expression vectors contain strong promoter to drive transcription of adjacent protein coding gene
   c. Protein coding sequence, usually from cDNA, inserted into expression vector; resultant vector introduced into cells in culture
   d. Different expression vectors designed for bacteria, yeast, insect, or mammalian cells
   e. High abundance of protein facilitates purification; can engineer gene to produce protein with molecular tag that facilitates affinity purification
12.
    Genetic approach for determining gene function and identifying disease-related genes
   a. Determine genotype of organism with particular phenotype
        Find genes responsible for genetic diseases in humans
        Find mutated genes in model organisms subjected to random mutagenesis that have interesting phenotypes
        Linkage analysis
          - The closer two loci are on the same chromosome, the greater chance that they will be passed on to offspring together
          - Useful physical markers have known locations in genome and have at least two different forms (polymorphic); VNTRs can be used as markers; also many SNPs identified in humans which can be detected by hybridization techniques
          - Examine relationship in families between many markers and disease (or phenotype)
          - If a particular marker is almost always inherited with disease (or phenotype), mutated gene is located near that marker
          - If genome is sequenced, can examine candidate genes located at that region
   b. Inheritance patterns
        Most diseases have genetic component; those with simple Mendelian inheritance easier to determine responsible gene because defect in single gene has an overwhelming effect
          - Autosomal dominant- one copy of defective gene causes disease; if one parent has disease, 50% chance of offspring being affected
          - Autosomal recessive- both copies of gene must be defective; if both parents are carriers, 25% chance of offspring being affected, 50% chance of offspring being carriers
          - X-linked recessive- if mother is carrier, 50% chance of son being affected, 50% chance of daughter being carrier; if father has disease, all daughters will be carriers
        Complex genetic diseases- no simple inheritance pattern; can inherit an increased risk; dependent upon multiple genes and environment; many common diseases such as hypertension, heart disease, diabetes, bipolar disorder
   c.
      Genome-wide association studies to study complex genetic diseases
        Compare frequency of each SNP in disease and control groups
        Most SNPs show no significant difference in frequency of each allele between two groups
        Any SNP in which one allele occurs at a greater frequency in disease group indicates SNP allele is marker for genetic risk factor; these SNPs mark general location of the genetic alteration contributing to disease risk
        By analyzing ~1000-2000 individuals for disease and control groups using DNA microarrays that type ~500,000 SNPs, have been able to identify specific genetic variations that increase risk by as little as ~1.2 times
13. Gene targeting in mice to determine gene function (reverse genetics)
   a. Introduce mutated, often inactivated, gene into embryonic stem (ES) cells in culture; in rare instances introduced gene replaces one copy of cellular counterpart by homologous recombination
   b. Identify colonies in which cellular gene has been replaced using Southern blotting or PCR
   c. Inject mutant ES cells into early embryo; implant into mouse; breed offspring that have mutated gene in germ cells
14. RNA interference
   a. Turn off expression of specific gene to study its function; potential therapeutic use
   b. Mechanism
        Small fragments (21-23 bp) of double-stranded RNA known as small interfering RNA (siRNA) that are complementary to particular mRNA sequence introduced into cells; alternatively, larger double stranded RNA can be converted to siRNA
        Multi-protein RNA-Induced Silencing Complex (RISC) acts with siRNA to specifically cleave appropriate mRNA
15. Genome Engineering using CRISPR (clustered regularly interspaced short palindromic repeat)/Cas9 (CRISPR-associated nuclease 9)
   a. Targeted editing of genome; can be used to study gene function; potential therapeutic use in correcting disease-causing mutations
   b.
      Mechanism
        Introduce into cells Cas9 and guide RNA (gRNA); particular gRNA contains ~20 nt sequence that matches desired genomic target
        gRNA complexes with Cas9 and guides it to genomic target, where Cas9 makes double-strand break
        Genomic target (called protospacer) requires adjacent protospacer adjacent motif (PAM) of NGG for recognition by Cas9
        Endogenous machinery repairs double-strand break; nonhomologous end-joining results in mutation (usually deletion); usually leads to inactive gene by disrupting downstream reading frame
        Homologous repair template with desired changes can be introduced to facilitate precise gene editing by homologous recombination
   c. Technique can be used for efficient generation of mutant organisms by microinjecting Cas9, gRNA, and repair template into one-cell embryos
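
The targeting rule in the CRISPR/Cas9 mechanism above (a ~20 nt protospacer immediately followed by an NGG PAM) can be illustrated with a short sequence scan. The Python below is only a minimal sketch under stated assumptions: the function names `reverse_complement` and `find_protospacers` and the example sequence are hypothetical, the input contains only A/C/G/T, both strands are scanned, and reported positions are 0-based on the strand being scanned. It does not consider off-target matches or any other guide design criteria.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, length: int = 20):
    """Yield (strand, start, protospacer, pam) for each site followed by an NGG PAM.

    'start' is the 0-based position of the protospacer on the strand being
    scanned ('+' is the sequence as given, '-' is its reverse complement).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            yield strand, match.start(), match.group(1), match.group(2)


# Made-up sequence for illustration: a 20 nt stretch followed by TGG.
example = "ACGTACGTACGTACGTACGTTGGA"
for strand, start, protospacer, pam in find_protospacers(example):
    print(strand, start, protospacer, pam)
```

Each reported protospacer corresponds to the ~20 nt sequence that a gRNA would need to match. The `length` parameter is exposed only because the outline gives the length as approximate.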