Genomics
Chapter 18

Mapping Genomes
Maps of genomes can be divided into 2 types
-Genetic maps
  -Abstract maps that place the relative locations of genes on chromosomes based on recombination frequency
-Physical maps
  -Use landmarks within DNA sequences, ranging from restriction sites to the actual DNA sequence

Physical Maps
Distances between “landmarks” are measured in base pairs
-1000 base pairs (bp) = 1 kilobase (kb)
Knowledge of the DNA sequence is not necessary
There are three main types of physical maps
-Restriction maps
-Cytological maps
-Radiation hybrid maps

Physical Maps
Restriction maps
-The first physical maps
-Based on distances between restriction sites
-Overlap between smaller segments can be used to assemble them into a contig
  -A continuous segment of the genome

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
1. Multiple copies of a segment of DNA are cut with restriction enzymes.
2. The fragments produced by enzyme A only, by enzyme B only, and by enzymes A and B together are run side-by-side on a gel, which separates them according to size.
3. The fragments are arranged so that the smaller ones produced by the simultaneous cut can be grouped to generate the larger ones produced by the individual enzymes.
4. A physical map is constructed.
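The arrangement puzzle in step 3 can be sketched as a brute-force search. The minimal Python sketch below assumes a linear DNA segment and error-free fragment sizes in kb. It tries every order of the enzyme A fragments and every order of the enzyme B fragments, turns each order into cut positions, and keeps the combinations whose combined cuts reproduce the A+B fragment sizes. The function names and the small 10 kb example are hypothetical; they are not the data from the figure.

```python
from collections import Counter
from itertools import permutations


def cut_positions(fragment_order):
    """Turn an ordered list of fragment sizes into internal cut positions."""
    positions, total = [], 0
    for size in fragment_order[:-1]:  # the last fragment ends at the molecule's end
        total += size
        positions.append(total)
    return positions


def fragments_from_cuts(cuts, total_length):
    """Fragment sizes produced when a linear molecule is cut at every position in cuts."""
    points = [0] + sorted(set(cuts)) + [total_length]
    return [end - start for start, end in zip(points, points[1:])]


def solve_double_digest(a_frags, b_frags, ab_frags):
    """Return every (A cut sites, B cut sites) pair consistent with all three digests."""
    total = sum(a_frags)
    if total != sum(b_frags) or total != sum(ab_frags):
        raise ValueError("all three digests must add up to the same length")
    target = Counter(ab_frags)
    solutions = set()
    for a_order in set(permutations(a_frags)):
        a_cuts = cut_positions(a_order)
        for b_order in set(permutations(b_frags)):
            b_cuts = cut_positions(b_order)
            # Keep this arrangement if the combined cuts give the observed A+B fragments.
            if Counter(fragments_from_cuts(a_cuts + b_cuts, total)) == target:
                solutions.add((tuple(a_cuts), tuple(b_cuts)))
    return sorted(solutions)


# Hypothetical 10 kb segment: A gives 3 + 7 kb, B gives 6 + 4 kb, A+B gives 3 + 3 + 4 kb.
print(solve_double_digest([3, 7], [6, 4], [3, 3, 4]))
```

For this example the search returns two answers, which are the same map read from opposite ends; fragment sizes alone cannot tell the two orientations apart. Because the number of orders grows factorially, this exhaustive approach is only practical for a handful of fragments per enzyme.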
Physical Maps
Cytological maps
-Employ stains that generate reproducible patterns of bands on the chromosomes
-Divide chromosomes into subregions
-Provide a map of the whole genome, but at low resolution
-Cloned DNA is located on the map using fluorescent in situ hybridization (FISH)

Physical Maps
Radiation hybrid maps
-Use radiation to fragment chromosomes randomly
-Fragments are then recovered by fusing the irradiated cell to another cell
  -Usually a rodent cell
-Fragments can be identified based on banding patterns or FISH

Physical Maps
Sequence-tagged sites
-An STS is a small stretch of DNA that is unique in the genome
  -Only 200-500 bp
  -Its boundaries are defined by PCR primers
  -Can be identified by PCR using any DNA sample as a template
-STSs essentially provide a scaffold for assembling genome sequences

1. The location of 4 STSs in the genome is shown. PCR is used to amplify each STS from different clones in a library. Amplifying each STS by PCR generates a unique fragment that can be identified.
2. The products of the PCR reactions are separated by gel electrophoresis, producing a different size fragment for each STS.
3. The presence or absence of each STS in the clones identifies regions of overlap. The final result is a contiguous sequence (contig) of overlapping clones.

Genetic Maps
Genetic maps are measured in centimorgans (cM)
-1 cM = 1% recombination frequency
Linkage mapping can be done without knowing the DNA sequence of a gene
-Limitations:
  1. Genetic distance does not directly correspond to actual physical distance
  2. Not all genes have obvious phenotypes

Genetic Maps
The most common markers are short repeat sequences called short tandem repeats, or STR loci
-Differ in repeat length between individuals
-13 STR loci form the basis of the modern DNA fingerprinting system developed by the FBI
  -Cataloged in the CODIS database to identify criminal offenders

Genetic Maps
Genetic and physical maps can be correlated
-Any cloned gene can be placed within the genome and can also be mapped genetically

Genetic Maps
All of these different kinds of maps are stored in databases
-The National Center for Biotechnology Information (NCBI) serves as the US repository for these data and more
-Similar databases exist in Europe and Japan

Whole Genome Sequencing
The ultimate physical map is the base-pair sequence of the entire genome
-Requires the use of high-throughput automated sequencing and computer analysis

Whole Genome Sequencing
Sequencers provide accurate sequences for DNA segments up to 800 bp long
-To reduce errors, 5-10 copies of a genome are sequenced and compared
Vectors used to clone large pieces of DNA:
-Yeast artificial chromosomes (YACs)
-Bacterial artificial chromosomes (BACs)
-Human artificial chromosomes (HACs)
  -At present, HACs are circular

Whole Genome Sequencing
Clone-by-clone sequencing
-Overlapping regions between BAC clones are identified by restriction mapping or STS analysis
Shotgun sequencing
-DNA is randomly cut into smaller fragments, cloned, and then sequenced
-Computers assemble the fragments by finding their overlaps
-The resulting sequence is not tied to any other map information

a. Clone-by-clone method
1. Large DNA clones are first isolated. These are arranged into contiguous sequences based on overlapping tagged sites.
2. Large clones are fragmented into smaller clones for sequencing.
3. The entire sequence is assembled from the overlapping larger clones.
b. Shotgun method
1. Cut the DNA of an entire chromosome into small fragments and clone them.
2. Sequence each segment and arrange the segments based on overlapping nucleotide sequences.

The Human Genome Project
Begun in 1990 by the International Human Genome Sequencing Consortium
Craig Venter formed a private company and entered the “race” in May 1998
In 2001, both groups published a draft sequence
-Contained numerous gaps

The Human Genome Project
In 2004, the “finished” sequence was published in databases as the reference sequence (RefSeq)
-3.2 gigabase pairs (Gb)
  -1 Gb = 1 billion base pairs
-A 400-fold reduction in gaps
-Covers 99% of the euchromatic sequence
-Error rate = 1 per 100,000 bases

Characterizing Genomes
The Human Genome Project found fewer genes than expected
-The initial estimate was 100,000 genes
-The number now appears to be about 25,000!
In general, eukaryotic genomes are larger and have more genes than those of prokaryotes
-However, the complexity of an organism is not necessarily related to its gene number

Finding Genes
Genes are identified by open reading frames (ORFs)
-An ORF begins with a start codon and contains no stop codon for a distance long enough to encode a protein
Sequence annotation
-The addition of information, such as ORFs, to the basic sequence information

Finding Genes
BLAST
-A search algorithm used to search NCBI databases for homologous sequences
-Permits researchers to infer functions for isolated molecular clones
Bioinformatics
-The use of computer programs to search for genes, and to assemble and compare genomes

Genome Organization
Genomes consist of two main regions
-Coding DNA
  -Contains genes that encode proteins
-Noncoding DNA
  -Regions that do not encode proteins

Coding DNA in Eukaryotes
Four different classes are found:
-Single-copy genes: include most genes
-Segmental duplications: blocks of genes copied from one chromosome to another
-Multigene families: groups of related but distinctly different genes
-Tandem clusters: identical copies of genes occurring together in clusters
  -Include the rRNA genes

Noncoding DNA in Eukaryotes
Each cell in our bodies has about 6 feet of DNA stuffed into it
-However, less than one inch is devoted to genes!
Six major types of noncoding human DNA have been described

Noncoding DNA in Eukaryotes
Noncoding DNA within genes
-Protein-encoding exons are embedded within much larger noncoding introns
Structural DNA
-Called constitutive heterochromatin
-Localized to centromeres and telomeres
Simple sequence repeats (SSRs)
-One- to six-nucleotide sequences repeated thousands of times

Noncoding DNA in Eukaryotes
Segmental duplications
-Blocks of 10,000 to 300,000 bp that have been duplicated and moved
Pseudogenes
-Inactive genes

Noncoding DNA in Eukaryotes
Transposable elements (transposons)
-Mobile genetic elements
-Four types:
  -Long interspersed elements (LINEs)
  -Short interspersed elements (SINEs)
  -Long terminal repeats (LTRs)
  -Dead transposons

Expressed Sequence Tags
ESTs can identify genes that are expressed
-They are generated by sequencing the ends of randomly selected cDNAs
ESTs have identified 87,000 cDNAs in different human tissues
-But how can 25,000 human genes encode three to four times as many proteins?
  -Alternative splicing yields different proteins with different functions

Alternative Splicing
[Figure: A primary RNA transcript (5´ cap, exons 1-13 separated by introns, 3´ poly-A tail) is spliced differently in brain and in muscle, producing two different mature mRNAs]

Variation in the Human Genome
Single-nucleotide polymorphisms (SNPs) are sites where individuals differ by only one nucleotide
-To qualify as a SNP, the variant must be found in at least 1% of the population
Haplotypes are regions of a chromosome that are not exchanged by recombination
-The tendency for genes within a haplotype not to be randomized by recombination is called linkage disequilibrium
-Can be used to map genes

[Figure: (a) SNPs in the same region of four chromosomes; (b) haplotypes; (c) diagnostic SNPs that distinguish the haplotypes]

Genomics
Comparative genomics, the comparison of whole-genome maps of different organisms, has revealed similarities among them
-For example, over half of Drosophila genes have human counterparts
Synteny refers to the conserved arrangements of DNA segments in related genomes
-Allows comparisons of unsequenced genomes
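The idea of conserved segment order can be illustrated with a toy search. The Python sketch below assumes each genome is given as an ordered list of marker or gene names, each name occurring at most once per genome; the names used are hypothetical. It walks along genome A and reports runs of markers that are also adjacent in genome B, either in the same order or fully inverted. Real synteny tools also tolerate gaps, duplications, and sequence divergence, which this sketch ignores.

```python
def synteny_blocks(genome_a, genome_b, min_len=2):
    """Report runs of markers that are adjacent, in order, in both genomes.

    Each genome is an ordered list of marker names, each name used at most once.
    A block is 'same' if genome B reads it in the same direction and 'inverted'
    if genome B reads it backwards.
    """
    if min_len < 2:
        raise ValueError("a block needs at least two markers to have an order")
    pos_b = {marker: i for i, marker in enumerate(genome_b)}
    blocks = []
    i = 0
    while i < len(genome_a):
        if genome_a[i] not in pos_b:
            i += 1  # a marker absent from genome B cannot start a block
            continue
        j, step = i + 1, None
        while j < len(genome_a) and genome_a[j] in pos_b:
            d = pos_b[genome_a[j]] - pos_b[genome_a[j - 1]]
            # Neighbors in A must also be neighbors in B, always in one direction.
            if d not in (1, -1) or (step is not None and d != step):
                break
            step = d
            j += 1
        if j - i >= min_len:
            blocks.append((genome_a[i:j], "same" if step == 1 else "inverted"))
        i = j  # the marker that ended this run may start the next one
    return blocks


# Hypothetical marker orders for two related genomes.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g3", "g2", "g1", "g7", "g5", "g6"]
print(synteny_blocks(genome_a, genome_b))
```

Here g1-g3 form a conserved but inverted block, g5-g6 keep the same order, and g4 breaks the run because it has no counterpart in genome B.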
[Figure: Genomic alignment of rice, sugarcane, corn, and wheat (segment rearrangement)]

Genomics
Organellar genomes
-Mitochondria and chloroplasts are descendants of ancient endosymbiotic bacterial cells
-Over time, their genomes have exchanged genes with the nuclear genome
-Both organelles contain polypeptides encoded by the nucleus

Genomics
Functional genomics is the study of the function of genes and their products
DNA microarrays (“gene chips”) enable the analysis of gene expression at the whole-genome level
-DNA fragments are deposited on a slide
-The slide is probed with labeled cDNA made from the mRNA of different sources
-Active and inactive genes are identified

1. Unique, PCR-amplified Arabidopsis genome fragments (1, 2, 3, 4...) are contained in each well of a plate.
2. The DNA is printed onto a microscope slide by a robotic quill, forming a DNA microarray.
3. Samples of mRNA are obtained from two different tissues (flower and leaf). cDNA probes for each sample are prepared with reverse transcriptase, using a different fluorescent nucleotide for each sample.
4. The two probes are mixed and hybridized with the microarray. Fluorescent signals on the microarray are analyzed.
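Step 4 comes down to comparing the two fluorescent signals at each spot. One common way to express that comparison is the base-2 logarithm of the ratio of the two intensities. The minimal Python sketch below assumes background-corrected, already normalized intensities, a two-fold cutoff, and hypothetical gene names and values; none of these numbers come from the figure, and real analyses also normalize the two dyes against each other first.

```python
import math


def classify_spots(spots, fold_cutoff=2.0):
    """Compare flower (probe 1) and leaf (probe 2) intensities for each spot.

    spots maps a gene name to a (flower_intensity, leaf_intensity) pair.
    Returns gene -> (log2 ratio, call).
    """
    threshold = math.log2(fold_cutoff)
    calls = {}
    for gene, (flower, leaf) in spots.items():
        if flower <= 0 or leaf <= 0:
            raise ValueError(f"{gene}: intensities must be positive")
        log_ratio = math.log2(flower / leaf)  # > 0 means more signal from probe 1
        if log_ratio >= threshold:
            call = "stronger in flower (probe 1)"
        elif log_ratio <= -threshold:
            call = "stronger in leaf (probe 2)"
        else:
            call = "similar in both"
        calls[gene] = (log_ratio, call)
    return calls


# Hypothetical background-corrected intensities for three spots.
example = {"geneA": (800.0, 100.0), "geneB": (150.0, 600.0), "geneC": (400.0, 380.0)}
for gene, (log_ratio, call) in classify_spots(example).items():
    print(f"{gene}: log2 ratio = {log_ratio:+.2f} -> {call}")
```

Using log2 makes the scale symmetric: a two-fold excess of probe 1 gives +1 and a two-fold excess of probe 2 gives -1, matching the three kinds of spots (stronger probe 1, stronger probe 2, similar) seen on the slide.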
[Figure: each spot shows a stronger signal from probe 1, a stronger signal from probe 2, or similar signals from both probes]

Genomics
Transgenics is the creation of organisms containing genes from other species (transgenic organisms)
-Can be used to determine whether:
  -A gene identified by an annotation program is really functional in vivo
  -Homologous genes from different species have the same function

Proteomics
Proteomics is the study of the proteome
-All the proteins encoded by the genome
The transcriptome consists of all the RNA that is present in a cell or tissue

Proteomics
Proteins are much more difficult to study than DNA because of:
-Post-translational modifications
-Alternative splicing
However, databases containing the known protein structural motifs exist
-These can be searched to predict the structure and function of the proteins encoded by gene sequences

Proteomics
Protein microarrays are being used to study large numbers of proteins simultaneously
-Can be probed using:
  -Antibodies to specific proteins
  -Specific proteins
  -Small molecules
The yeast two-hybrid system has generated large-scale maps of interacting proteins

Applications of Genomics
The genomics revolution will have a lasting effect on how we think about living systems
The immediate impact of genomics is being seen in diagnostics
-Identifying genetic abnormalities
-Identifying victims by their remains
-Distinguishing between naturally occurring and intentional outbreaks of infections

Applications of Genomics
Genomics has also helped in agriculture
-Improvement in the yield and nutritional quality of rice
-Doubling of world grain production in the last 50 years, with only a 1% increase in cropland

Applications of Genomics
Genome science is also a source of ethical challenges and dilemmas
-Gene patents
  -Should gene sequences and their uses be freely available, or can they be patented?
-Privacy concerns
  -Could someone be discriminated against because their SNP profile indicates susceptibility to a disease?