Chapter 20: Genomics and Proteomics

In 1920, geneticists turned from the study of individual genes to focus on the entire genome of an
organism. The goal became to map out all of the genes found in an entire organism, and geneticists
developed a two-part approach:
1. Identify spontaneous mutations or collect mutants by using chemical or physical agents
2. Generate genetic maps by linkage analysis using the mutant strains
This method is the backbone of genetic analysis and is still used today. However, mutational analysis
has limits:
1. At least one mutation per gene must be present for mutational analysis to be used.
2. Obtaining mutations is time-consuming.
3. Some mutations are lethal and some have no clear phenotype, which makes it difficult to map the
mutated gene.
• Beginning in the 1980s, recombinant DNA technology was introduced as a way to map the human
genome. A new method, positional cloning, was developed and used to isolate and map genes one
at a time. By the mid-1980s, around 3500 genes had been identified and mapped.
• In 1977, Fred Sanger and colleagues began the study of genomes (genomics) by sequencing the
genome of a virus with a newly developed DNA sequencing method. Genomics includes several
subfields: structural genomics, functional genomics, and comparative genomics. Proteomics, the
study of the proteins a genome encodes, grew out of genomics.
• The Human Genome Project developed as an international effort to determine the sequence of the
3.2 billion base pairs making up the human genome and to identify all of the genes in it.
20.1 Genomics: sequencing is the basis for identifying and mapping all genes in a genome
• Clone-by-clone method: libraries of cloned large fragments are constructed that together cover the
entire DNA in an organism’s genome. The clones are assembled into genetic and physical maps
encompassing the entire genome, and the nucleotide sequence is then determined clone by clone
until the entire length is sequenced. This method depends on restriction maps and requires a large
number of clones to be sequenced, which makes it time-consuming.
• Shotgun method: two or three preparations of genomic DNA are made. One is cut into short
fragments, another into longer fragments, and the third into much larger fragments. A library is
made from each preparation, and clones are selected at random and sequenced. Software then
assembles long stretches of sequence from the overlapping fragments, using the sequences from
the larger clones as a framework. This method was developed by Craig Venter and colleagues in
1995.
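The assembly step of the shotgun method can be illustrated with a toy example. The sketch below is a simplified greedy overlap merge, not the software used by any sequencing project: it assumes error-free reads from a single strand, ignores the larger "framework" clones, and the names `overlap`, `greedy_assemble`, and `min_len` are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Toy assumption: reads are error-free and all come from the same strand.
    Returns the list of contigs left when no overlaps remain.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping reads taken from the made-up sequence ATGGCGTACGTTAGC
contigs = greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"])
print(contigs)
```

Real genomes contain repeated sequences, so overlaps alone are often ambiguous. That is one reason the method described above also sequences larger clones to serve as a framework.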
20.2 An overview of genomic analysis
Compiling the Sequence – the genome is sequenced more than once so that errors in individual reads
can be detected and corrected. The HGP sequenced the human genome 12 times using the shotgun
process. The privately run project examined a portion of the genome more than 35 times. A draft was
finished in 2001, with some parts unfinished. A final version was published in 2003.
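Sequencing a genome "12 times" refers to depth of coverage: the total number of bases read divided by the genome size. The short sketch below shows that arithmetic. The 500-base read length is an illustrative assumption rather than a figure from this chapter, and the function names are made up.

```python
GENOME_SIZE = 3_200_000_000  # base pairs in the human genome, as stated in this chapter


def coverage(num_reads, read_length, genome_size=GENOME_SIZE):
    """Average number of times each base is read: (reads x read length) / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size=GENOME_SIZE):
    """Smallest whole number of reads that reaches the target coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # ceiling division on integers


# Assumed read length of 500 bases (hypothetical value for illustration)
n = reads_needed(12, 500)
print(n, coverage(n, 500))
```

Coverage is an average: because reads are placed at random, some regions are read more often than others, and a few may not be read at all.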
Annotating the Sequence – Annotation is a process that identifies genes, their regulatory sequences,
and their functions. It also identifies non-protein-coding genes (such as those for rRNA, tRNA, and
small nuclear RNA). Protein-coding genes are located by analyzing the sequence with software.
Protein-coding genes contain open reading frames (ORFs), runs of nucleotide triplets that can be
translated into the amino acid sequence of a protein. Because a sequence is read three bases at a
time, it is not known in advance where reading should begin: each strand can be read in three
different frames, giving six possible reading frames for double-stranded DNA. Analysis therefore
begins at the first nucleotide and is repeated starting at the second and third nucleotides, on both
strands. One strategy for finding genes is to search for ORFs that start with an ATG and are followed,
in the same reading frame, by a termination codon (TAA, TAG, or TGA).
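The ORF search strategy described above can be sketched in a few lines. This is a minimal illustration, not the method used by any annotation program: it only looks for an ATG followed by an in-frame stop codon in each of the six reading frames, and it ignores introns, which interrupt most human genes. The names `find_orfs` and `min_codons` are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ATG ... stop-codon stretches.

    Returns tuples (strand, frame, start, end, orf_sequence). Coordinates are
    0-based positions on the strand being read; for the "-" strand they refer
    to the reverse complement, not to the original sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # codons before the stop, ATG included
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ATG in this frame
    return orfs


# Made-up 13-base sequence containing one short ORF (ATG GCT TAA) in frame 2
for orf in find_orfs("CCATGGCTTAAGG"):
    print(orf)
```

In practice a much larger `min_codons` would be used, since short ATG-to-stop stretches occur by chance throughout any genome.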
20.3 Functional genomics classifies genes and identifies their functions
After annotation, the next step is assigning functions to the genes. Some already have functions
assigned by classical methods, but many have none. One approach uses homology searches, in which
the new gene is compared with similar genes already isolated from other organisms. A close match to
a gene of known function suggests that the new gene has a related function.
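The idea behind a homology search can be sketched as scoring a new sequence against each entry in a database and reporting the best match. The example below uses a basic Smith-Waterman local alignment score with arbitrary scoring values. Production search tools such as BLAST use faster heuristics and substitution matrices, and the gene names and sequences here are invented for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, score only).

    The match, mismatch, and gap values are arbitrary illustrative choices.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0, so an alignment can start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, database):
    """Return the name of the database sequence that scores highest against query."""
    return max(database, key=lambda name: local_alignment_score(query, database[name]))


# Hypothetical database of previously characterized genes
database = {
    "geneA": "TTTTACGTACGTTTT",
    "geneB": "GGGGCCCCGGGG",
}
print(best_hit("ACGTACGT", database))
```

A high score alone does not establish function. It identifies candidate genes whose known roles can then guide further experiments.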
20.7 The Human Genome Project (HGP)
• The goal of the HGP was to determine the sequence of the human genome by using recombinant
DNA technology and DNA sequencing instead of mutational analysis. The HGP has produced a great
deal of information, much of which still requires interpretation. What is known, though, is that
humans and other organisms share a common set of genes essential for cellular function and
reproduction.
• In 1990, the Human Genome Project began under the direction of James Watson. It was designed to
sequence the entire DNA in the human genome, to identify and map the thousands of genes on the
chromosomes, and to establish the function of all genes. The HGP also set up the ELSI program
(Ethical, Legal, and Social Implications) to ensure that genetic information would be used properly.
Major features of the human genome – In February 2001, about 96% of the euchromatic region (areas
that contain most of the structural genes) of the DNA had been analyzed. The remaining work was
finished by 2003, and attention is now directed at analyzing the data.
The unfinished tasks in human genome sequences – Two types of gaps remain in the sequence. First,
324 gaps remain in the euchromatic portion, most of which contain duplicated regions that are
difficult to assemble. Second, major gaps remain in the heterochromatic regions (areas thought to
lack structural genes).
Major Features of the Human Genome
1. It contains more than 3 billion nucleotides, but protein-coding sequences make up only about
5% of the genome
2. It contains between 25,000 and 30,000 genes
3. More than 40% of the genes identified have no known molecular function
4. Genes are not uniformly distributed on the chromosomes. Gene-rich areas are separated by
gene-poor areas that account for 20% of the genome. Chromosome 19 has the highest gene
density, while chromosome 13 and the Y chromosome have the lowest density.
5. Human genes are larger and contain more introns than genes in invertebrates like Drosophila.
20.8 Comparative genomics is a versatile tool
Comparative genomics uses a variety of techniques and resources, including construction and use of
databases containing nucleic acid and amino acid sequences, gene mapping, and mutagenesis.
20.10 Proteomics identifies and analyzes the proteins in a cell
Proteomics defines the complete set of proteins encoded by a genome. The term can also describe the
set of proteins expressed in a cell at a given time. In most genomes sequenced to date, many newly
discovered genes have no known function. In the human genome, about 41% of genes are of unknown
function, and an estimated 40-60% of human genes produce more than one protein.