Chapter 16: Genome Analysis: DNA Typing, Genomics, and Beyond

Some scientists said there was no reason to do it [The Human Genome Project] over 15 years. Why not do it over 25? One important reason is that if you did it over 25 years, most of the experienced scientists involved in it might be dead, at least mentally, by the time it was finished… Most people like to do things where they can see the results.
James Watson, Genetics and Society (1993), p. 18.

16.1 Introduction
• Levels of genome analysis range from personal identification to comparative analysis of entire genomes.

16.2 DNA typing
• One of the most reliable and conclusive methods available for identification of an individual.
• Technique developed by Alec Jeffreys and coworkers in 1985.
• First called “DNA fingerprinting,” now called “DNA typing.”

Applications of DNA typing
• Establish paternity and other family relationships.
• Identify potential suspects whose DNA may match evidence left at a crime scene.
• Exonerate persons wrongly accused of crimes.
• Match organ donors with recipients in transplant programs.
• Identify catastrophe victims.
• Detect bacteria and other organisms that may pollute air, water, soil, and food.
• Determine whether a clone is genetically identical to the donor nucleus.
• Trace the source of different marijuana plants.
• Identify endangered and protected species as an aid to wildlife officials.

DNA profiles of marijuana
• DNA profiles generated by amplified fragment length polymorphism (AFLP).
• Used to trace the source of marijuana samples to growers.
• PCR amplification of restriction fragments to which adaptor oligomer sequences have been attached.
• PCR primers recognize the adaptors and amplify fluorescently tagged DNA fragments of different sizes.
• Fragments are detected with a DNA sequencer.

Nonhuman DNA typing
• DNA of protected whales found at Japanese markets.
• Incriminating pets:
– The case of the Rottweilers’ saliva.
– The case of the hair from Snowball the cat.
DNA polymorphisms: the basis of DNA typing
• Only about 0.1% of the human genome differs from one person to another.
• With the exception of the human leukocyte antigen (HLA) region, genetic variation is relatively limited in coding DNA.
• Less than 40% of the human genome is composed of genes and gene-related sequences.
• Intergenic DNA consists of unique or low copy number sequences and moderately to highly repetitive sequences.
• The majority of DNA typing systems used in forensic casework are based on genetic loci with minisatellites or short tandem repeats (STRs).
• Analyze multiple variable regions, called polymorphic markers.
• The power of DNA evidence lies in statistics.
• The aim is a marker profile so rare that only about one person in a quadrillion (10^15) would be expected to share it.

A variety of DNA technologies are used in forensic investigations:
• Minisatellite analysis
• PCR-based analysis
• STR analysis
• Mitochondrial DNA analysis
• Y chromosome analysis
• Random amplified polymorphic DNA (RAPD) analysis

Minisatellite analysis
• Minisatellites are a special class of RFLP in which the variable lengths of the DNA fragments result from a change in the number, not the base sequence, of minisatellite repeats.
• Also known as variable number tandem repeats (VNTRs).

Classic “DNA fingerprinting”: minisatellite analysis with a multilocus probe
• Unique biological identifier for each individual.
• Essentially constant for an individual, irrespective of the source of DNA.
• Simple Mendelian pattern of inheritance.
• Requires relatively large amounts of DNA.
• Does not work well with degraded samples.

Minisatellite analysis with a single-locus probe
• A single-locus probe allows the detection of a single minisatellite DNA locus on one chromosome.
• To increase the sensitivity, 3-5 single-locus probes are mixed in a single-locus “cocktail.”

Polymerase chain reaction-based analysis
• Sufficient DNA can be collected from saliva on a postage stamp or bones from skeletons.
• Even highly degraded DNA can be amplified, as long as the target sequence is intact.

Short tandem repeat analysis
• Currently the most widely used DNA typing procedure in forensic genetics.
• The variability in STRs mainly occurs by slippage during DNA replication, rather than by unequal crossing-over.

Multiplex analysis of STRs
• Simultaneous amplification of many targets of interest in one reaction by using more than one pair of primers.
• The FBI uses a standard set of 13 specific STR regions for CODIS (the Combined DNA Index System).
Example:
• 15 different STRs and a gender-specific marker amplified by PCR.
• One primer in each pair is labeled with a fluorescent tag for four-color detection.
• PCR amplification products are detected using an automated sequencer.
• Products are separated by size and detected by color after laser-induced excitation.

Mitochondrial DNA analysis
• Every cell has hundreds of mitochondria with several hundred mtDNA molecules.
• Older biological samples (e.g., strands of hair, solid bone, or teeth) often lack usable nuclear DNA but have abundant mtDNA.
• mtDNA has been successfully isolated from fossil bones.
• Analysis by PCR amplification and direct sequencing of two highly variable regions in the D loop region.
• Can only identify a person’s maternal lineage.

Y chromosome analysis
• Y chromosome-specific STRs.
• Paternity testing of male offspring.
• Analyzing biological evidence in criminal casework involving multiple male contributors.

Random amplified polymorphic DNA (RAPD) analysis
• No knowledge of an organism’s DNA sequence is required.
• PCR primers consist of random sequences.
• e.g., the case of the Palo Verde tree seed pods.
• e.g., differentiation between Bacillus species.
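The statistical argument behind DNA typing, that a multi-locus profile can be rare enough to be shared by roughly one person in a quadrillion, can be illustrated with the product rule. The following is a minimal sketch, assuming Hardy-Weinberg proportions at each locus and independence between loci. The allele frequencies and the function names are hypothetical and are not taken from any population database.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    p, q are allele frequencies. A homozygote (one allele given) has
    frequency p^2; a heterozygote (two alleles given) has frequency 2pq.
    """
    return p * p if q is None else 2 * p * q


def random_match_probability(profile):
    """Multiply per-locus genotype frequencies (the product rule).

    profile is a list of tuples, one per locus, each holding the
    frequency of one allele (homozygote) or two alleles (heterozygote).
    Assumes the loci are inherited independently.
    """
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical three-locus profile (frequencies invented for illustration).
small_profile = [(0.1, 0.2), (0.05,), (0.15, 0.1)]
small_rmp = random_match_probability(small_profile)

# Hypothetical 13-locus profile: heterozygous at every locus, with each
# genotype expected in about 7% of the population (2 * 0.35 * 0.1 = 0.07).
thirteen_loci = [(0.35, 0.1)] * 13
large_rmp = random_match_probability(thirteen_loci)

print(f"3 loci:  {small_rmp:.2e}")
print(f"13 loci: {large_rmp:.2e}")
```

With these invented numbers, three loci already give a match probability of a few in a million, and thirteen loci at about 7% each give a value on the order of 10^-15. That is why typing many independent markers is so discriminating.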
16.3 Genomics, proteomics, and beyond
• Whereas gene discovery once drove DNA sequencing, now the sequencing of entire genomes drives gene discovery.

What is bioinformatics?
• Area of computer science devoted to collecting, organizing, and analyzing DNA and protein sequences and all the data being generated by genomics and proteomics labs.

Tools of bioinformatics:
• Locate and align sequences.
• Assemble consensus sequences.
• Analyze properties of proteins.
• Analyze sequence patterns to locate restriction sites, promoters, DNA binding domains, etc.
• Phylogenetic analysis.
• Basic local alignment search tool (BLAST).
– The most commonly used genome tool.
– Example: Search for all the predicted protein sequences that are related to a “query sequence.”

Genomics
• The comprehensive study of whole sets of genes and their interactions rather than single genes.
• Comparative analysis of genomes based on the availability of complete genome sequences.

Proteomics
• The comprehensive study of the full set of proteins encoded by a genome, the “proteome.”
• Protein biochemistry on a “high-throughput” scale.

The age of “omics” and systems biology
• A whole set of related terms coined to describe the comparative study of databases.
– e.g., transcriptomics, metabolomics, kinomics, glycomics, lipidomics
• Interactomics: the study of macromolecular machines, mapping protein-protein interactions throughout a cell.
• Systems biology aims to make sense of all the data arising from the study of biomolecular networks.
• Uses both experimental and computational approaches to model these interactions.
• “Attempts to piece together everything.”

16.4 Whole genome sequencing

Major milestones in sequencing technology
• Development of automated DNA sequencing.
• Development of the BLAST algorithm.
• Development of bacterial artificial chromosome (BAC) vectors.
Two main genome sequencing methods
• Clone by clone genome assembly approach:
– Used by the publicly funded international sequencing consortium for the human genome.
• Whole-genome shotgun approach:
– Used by the privately funded Celera Genomics Corporation for the human genome.

Clone by clone genome assembly approach
• Restriction fragments of ~150 kb are cloned into BAC vectors.
• A physical map of the genome is produced.
• The BAC clones are broken up into smaller fragments, subcloned, and sequenced.
• The physical map places the sequences in order so they can be pieced together.
• Time consuming, but precise.

Whole-genome shotgun approach
• Plasmid clones with 2-10 kb inserts are prepared directly from fragmented genomic DNA.
• Clones are randomly selected for sequencing.
• Sequence is reassembled in order with the aid of a supercomputer.
• More rapid, but often results in gaps in the sequence.

Rough drafts versus finished sequences
• “Rough draft” of the human genome reported in 2001 by the publicly and privately funded groups.
• “Finished” sequence reported in 2004.
• The finished sequence is more accurate and complete, but still contains some gaps.
• Annotation of a sequenced genome is an in-depth analysis of all functional elements of the genome.
• Much of the emphasis is on the gene content, with the aim of characterizing all of the genes and their functions.

Comparative analysis of genomes
• Sequence and comparative analysis of nonmammalian genomes help to provide unique perspectives on the evolution of anatomy, physiology, development, and behavior.
• Prior to the draft sequence, the human genome was estimated to contain at least 100,000 genes.
• Current estimate of 20,000 to 25,000 protein-coding genes came as a surprise.
• What makes us uniquely human?
• The answer lies somewhere within the 35 million single-nucleotide substitutions, 5 million small insertions and deletions, local rearrangements, and a chromosomal fusion that distinguish us from the chimpanzee (Pan troglodytes).
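The reassembly step of the whole-genome shotgun approach described earlier in this section can be illustrated with a toy example. The following is a minimal sketch of greedy overlap merging, assuming error-free reads that all come from the same strand. Real assemblers deal with sequencing errors, repeats, both strands, and millions of reads, so this is not the method used by any actual sequencing project. The function names and the reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns a list of contigs. More than one contig means that no
    overlap joins them, i.e. there is a gap in the assembly.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads sampled from the toy sequence ATGCGTACGTTAGC.
assembled = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])

# Two reads with no overlap cannot be joined, which leaves a gap.
gapped = greedy_assemble(["ATGCGT", "TTTAAA"])

print(assembled)
print(gapped)
```

The second call shows in miniature why shotgun assemblies often contain gaps: regions that no read happens to span leave contigs that cannot be joined from the sequence data alone.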
Comparative analysis of genomes: insights from pufferfish and chickens
• Comparative genome analysis allows researchers to assess changes in gene structure and sequence that have occurred during evolution.
• Homologous sequences share a common evolutionary ancestry.
• Orthologs are genes in different species that are homologous because they are derived from a common ancestral gene.
• Paralogs are two genes in a genome that are similar because they arose from a gene duplication.
• Synteny is conservation in genetic linkage between the genes of distantly related organisms.
• Suggests that the conserved order of loci on a chromosome may be of importance for gene regulation.

Insights from the pufferfish genome
• Comparison of the genome sequence of the pufferfish with that of humans.
• Researchers have deduced that the extinct ancestor of ray-finned fish and lobe-finned fish had 12 pairs of chromosomes.

Insights from the chicken genome
• Potential for using comparative sequence analysis to map conserved regulatory elements present in the human genome.

What is a gene and how many are there in the human genome?
Three essential features of a gene:
• Expression of a product.
• Requirement that it be functional.
• Inclusion of both coding and regulatory regions.
• A gene is a complete chromosomal segment responsible for making a functional product.
• How many genes are there in the human genome?
• Gene “hunting” or gene prediction computer programs have become much more sophisticated, but no program predicts all genes correctly.
• Some recognize genes by detecting distinctive patterns in DNA sequences.
• Others detect new genes based on their similarity to known genes.
• The Encyclopedia of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome.
• This includes protein-coding genes, nonprotein-coding genes, transcriptional regulatory elements, and sequences that mediate chromosome structure and dynamics.
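Pattern-based gene prediction, mentioned above, can be illustrated with the simplest possible pattern: an open reading frame (ORF) that begins with a start codon and ends with an in-frame stop codon. The following is a minimal sketch, assuming a forward-strand DNA sequence without introns. Real gene prediction programs also model splice sites, promoters, codon usage, and the reverse strand, which is part of why no program predicts all genes correctly. The function name and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find open reading frames in the three forward reading frames.

    An ORF here runs from the first ATG encountered in a frame to the
    next in-frame stop codon. Returns (start, end, sequence) tuples,
    where start and end are 0-based, end-exclusive positions in seq and
    the sequence includes the stop codon. min_codons counts the codons
    before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # close the ORF and keep scanning
    return orfs


# Toy sequence with one short ORF (ATG AAA CCC GGG TAG) in frame 2.
orfs = find_orfs("CCATGAAACCCGGGTAGTT")
print(orfs)
```

Note that the TGA at positions 3 to 5 of the toy sequence is ignored because no ORF is open in that frame. Keeping track of the reading frame in this way is the core of pattern-based detection.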
16.5 High-throughput analysis of gene function
• Methods for genome and proteome analysis are called “high-throughput” because the activities of thousands of genes and their products are studied at the same time.

Some classic methods for high-throughput analysis of gene function
• DNA microarrays
• Protein arrays
• MALDI-TOF
• Tandem mass spectrometry

DNA microarrays
• Analysis of the transcriptional activity of thousands of genes simultaneously.
• Compare transcription programs of cells or organisms during specific physiological responses, developmental processes, or disease states.

Protein arrays
Two common types:
• Analytical protein array.
• Functional protein array.

Analytical protein array
• Monitor protein expression levels.
• Clinical diagnostics.

Functional protein array
• Analyze enzymatic activities, protein-protein interactions, post-translational modifications.
• Drug-target identification.
• Mapping biological pathways.

Mass spectrometry
Two popular strategies:
• Peptide mass fingerprinting using MALDI-TOF.
• Shotgun proteomics using MS/MS.

Peptide mass fingerprinting using MALDI-TOF
• Analysis of a single isolated protein.
• MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization-Time of Flight) mass spectrometry.
• Time of flight increases with mass and decreases with charge: it is proportional to the square root of the mass-to-charge ratio (m/z).
• Measurement of the number of ions at each m/z value.
• Computer database: identify the protein from which the peptides originated.

Shotgun proteomics using MS/MS
• “Interrogation” of an entire proteome.
• Tandem mass spectrometry (MS/MS).
• The process produces a collection of peptide ion fragments that differ in mass by a single amino acid.
• Measurement of mass-to-charge (m/z) ratios of the fragments allows the amino acid sequence to be read.

The nucleolar proteome
• Analysis of the nucleolar proteome by shotgun proteomics.
• Group of candidate novel nucleolar proteins identified by mass spectrometry.
• Isolate corresponding cDNAs.
• Subclone into YFP expression vectors.
• Observe localization in transfected cells by confocal microscopy.

16.6 Genome-wide association studies
• All human individuals share genome sequences that are approximately 99.9% the same.
• The remaining variable 0.1% is responsible for the genetic diversity between individuals.
• Most common human traits and diseases have a polygenic pattern of inheritance.
• This means that DNA sequence variants at many genetic loci influence the phenotype.
• Genome-wide association studies (GWAS) have identified more than 3000 variants associated with 150 human traits.
• Example: Hundreds of genetic variants in at least 180 loci influence adult height.
• Projects investigating cancer genomes and the genomes of people with diabetes, Alzheimer’s disease, Crohn’s disease, and other disorders are under way.
• This type of meta-analysis screens databases of single nucleotide polymorphisms, or copy number variants, to test for the association of a particular trait with each polymorphism.

Single nucleotide polymorphisms (SNPs)
• Two or more possible nucleotides occur at a specific mapped location in a genome.
• e.g., ATGCCTA or ATGCTTA
• For a variation to be considered a SNP, it must occur in at least 1% of the population.
• ~7 million SNPs with an allele frequency >5%.
• ~4 million with an allele frequency of 1 to 5%.
• A map of SNPs can be used to scan the human genome for haplotypes associated with common diseases.
– e.g., late-onset Alzheimer’s disease
• Haplotypes are patterns of sequence variation; i.e., stretches of DNA containing a distinctive set of alleles.

Mapping disease-associated SNPs: Alzheimer’s disease
• Two SNPs in the apolipoprotein E gene result in three possible alleles.
• An individual with at least one apoE ε4 (E4) allele has a greater chance of developing Alzheimer’s.

Copy number variants (CNVs)
• Variation in the genome in which entire genes or genomic regions are deleted, duplicated, or rearranged.
• CNVs can affect from one kilobase to several megabases of DNA.
• In some cases CNVs have been linked to disease.
• Example: Rare structural variations in genes that affect neuronal development and signaling.
• These CNVs may account for much of the heritability underlying autism.
• Each person with autism may carry a unique set of “autism loci,” but the biological pathways affected by these CNVs are likely to be similar.
• CNVs are also associated with neuroblastoma, Crohn’s disease, and schizophrenia.
• Will these associations stand the test of further research?

Gene polymorphisms and human behavior

Human behavior
• Personality
• Temperament
• Cognitive style
• Psychiatric disorders

Oversimplified model of human behavior
• Direct linear relationship between individual genes and behavior.

More accurate model
• Complex gene networks and multiple environmental factors affect brain development and function, which in turn will influence behavior.
• Heredity definitely plays some role in behavior, but DNA is not destiny.
• In general, be wary of announcements in the popular media about scientists finding “the gene” for an aspect of human behavior.

Why is there a lack of progress in finding “behavior” genes?
• Behavioral traits are polygenic.
• Gene-environment interactions.
• Behavioral traits tend to be inexactly defined.
• Sample bias.
• Inadequate sample size.

Aggressive, impulsive, and violent behavior
• Family, twin, and adoption studies have suggested heritability of 0% to >50% for a predisposition to violent behavior.
• The case of the “extra” Y chromosome.
• Polymorphism in the transcriptional control region of the monoamine oxidase A (MAOA) gene.

MAOA functional length polymorphism
• MAOA metabolizes several neurotransmitters in the brain, such as dopamine and serotonin.
• Prevents excess neurotransmitters from interfering with communication among neurons.
High activity alleles
• Alleles with 3.5 or 4 copies of the repeat sequence are transcribed more efficiently and produce more MAOA enzyme.

Low activity alleles
• Alleles with 3 or 5 copies of the repeat sequence are transcribed less efficiently and produce less MAOA enzyme.

High activity alleles (3.5 or 4 repeats)
• Less likely to develop antisocial behavior.

Low activity alleles (3 or 5 repeats)
• More likely to develop antisocial behavior if maltreated as children.

Schizophrenia susceptibility loci
• Severe psychiatric disorder that affects about 1% of the population.
• Twin and adoption studies show the risk of developing schizophrenia is increased among relatives of affected individuals.
• Multilocus model.
• A number of potential susceptibility loci and CNVs have been described.
• Most evidence for linkage is in loci that encode proteins involved in neurotransmission, axon guidance, and cell-cell signaling in the brain.

What remains unknown
• The disease risk conferred by each locus.
• The extent of genetic variability.
• The degree of interaction among loci and the environment.