Chapter 16:
Genome Analysis: DNA
Typing, Genomics, and
Beyond
Some scientists said there was no reason to do it
[The Human Genome Project] over 15 years.
Why not do it over 25? One important reason is
that if you did it over 25 years, most of the
experienced scientists involved in it might be
dead, at least mentally, by the time it was
finished… Most people like to do things where
they can see the results.
James Watson, Genetics and Society (1993), p.
18.
16.1 Introduction
• Levels of genome analysis range from
personal identification to comparative
analysis of entire genomes.
16.2 DNA typing
• One of the most reliable and conclusive
methods available for identification of an
individual.
• Technique developed by Alec Jeffreys and
coworkers in 1985.
• First called “DNA fingerprinting,” now called
“DNA typing.”
Applications of DNA typing
• Establish paternity and other family
relationships.
• Identify potential suspects whose DNA may
match evidence left at a crime scene.
• Exonerate persons wrongly accused of crimes.
• Match organ donors with recipients in
transplant programs.
• Identify catastrophe victims.
• Detect bacteria and other organisms that may
pollute air, water, soil, and food.
• Determine whether a clone is genetically
identical to the donor nucleus.
• Trace the source of different marijuana plants.
• Identify endangered and protected species as
an aid to wildlife officials.
DNA profiles of marijuana
• DNA profiles generated by amplified
fragment length polymorphism (AFLP).
• Used to trace the source of marijuana
samples to growers.
• PCR amplification of restriction fragments
to which adaptor oligomer sequences
have been attached.
• PCR primers recognize adaptors and
bind to amplify different sized
fluorescently-tagged DNA fragments.
• Detected with a DNA sequencer.
Nonhuman DNA typing
• DNA of protected whales found at
Japanese markets
• Incriminating pets
– The case of the Rottweilers’ saliva.
– The case of the hair from Snowball the cat.
DNA polymorphisms: the basis of
DNA typing
• Only about 0.1% of the human genome
differs from one person to another.
• With the exception of the human
leukocyte antigen (HLA) region, genetic
variation is relatively limited in coding
DNA.
• Less than 40% of the human genome is
comprised of genes and gene-related
sequences.
• Intergenic DNA consists of unique or low
copy number sequences and moderately
to highly repetitive sequences.
• The majority of DNA typing systems used
in forensic casework are based on
genetic loci with minisatellites or short
tandem repeats (STRs).
• Analyze multiple variable regions, called
polymorphic markers.
• The power of DNA evidence lies in
statistics.
• Aim to calculate the probability that only
one person in a quadrillion (10^15) could
have the same profile of markers.
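The statistical argument can be sketched in a few lines of Python. This is a minimal illustration, not a forensic procedure: it assumes the loci are independent and that genotype frequencies follow Hardy-Weinberg proportions (p^2 for a homozygote, 2pq for a heterozygote). The allele frequencies and the function names are hypothetical.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    One allele frequency means a homozygote (p^2);
    two allele frequencies mean a heterozygote (2pq).
    """
    return p * p if q is None else 2 * p * q


def random_match_probability(profile):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*locus) for locus in profile)


# Hypothetical profile: 13 loci, each heterozygous with allele
# frequencies 0.35 and 0.10, so each genotype frequency is 0.07.
profile = [(0.35, 0.10)] * 13
print(f"Random match probability: {random_match_probability(profile):.2e}")
```

With 13 loci that each have a genotype frequency near 0.07, the product is on the order of 10^-15, which is how a small number of markers reaches the "one in a quadrillion" range.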
A variety of DNA technologies are used in
forensic investigations:
• Minisatellite analysis
• PCR-based analysis
• STR analysis
• Mitochondrial DNA analysis
• Y chromosome analysis
• Random amplified polymorphic DNA
(RAPD) analysis
Minisatellite analysis
• Minisatellites are a special class of RFLP
in which the variable lengths of the DNA
fragments result from a change in the
number, not the base sequence, of
minisatellite repeats.
• Also known as variable number tandem
repeats (VNTRs).
Classic “DNA fingerprinting”:
minisatellite analysis with
a multilocus probe
• Unique biological identifier for each
individual.
• Essentially constant for an individual,
irrespective of the source of DNA.
• Simple Mendelian pattern of inheritance.
• Requires relatively large amounts of
DNA.
• Does not work well with degraded
samples.
Minisatellite analysis with a single-locus
probe
• A single-locus probe allows the detection
of a single minisatellite DNA locus on one
chromosome.
• To increase the sensitivity, 3-5 single-locus
probes are mixed in a single-locus
“cocktail.”
Polymerase chain reaction-based
analysis
• Sufficient DNA can be collected from
saliva on a postage stamp or bones from
skeletons.
• Even highly degraded DNA can be
amplified, as long as the target sequence
is intact.
Short tandem repeat analysis
• Currently the most widely used DNA
typing procedure in forensic genetics.
• The variability in STRs mainly occurs by
slippage during DNA replication, rather
than by unequal crossing-over.
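An STR allele is defined by how many times the repeat unit occurs in tandem. The sketch below counts the longest run of back-to-back copies of a motif in a sequence. The GATA motif, the flanking sequences, and the function name are illustrative assumptions; real typing measures PCR fragment lengths rather than scanning sequence text.

```python
def count_tandem_repeats(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence) - step + 1):
        copies = 0
        pos = start
        # Walk forward one repeat unit at a time while the motif keeps matching.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += step
        best = max(best, copies)
    return best


# Hypothetical alleles at one locus: same flanks, different repeat counts.
allele_a = "TCAT" + "GATA" * 7 + "CCTG"
allele_b = "TCAT" + "GATA" * 9 + "CCTG"
genotype = (count_tandem_repeats(allele_a, "GATA"),
            count_tandem_repeats(allele_b, "GATA"))
print(genotype)
```

A person who carries these two alleles would be typed as 7,9 at this locus.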
Multiplex analysis of STRs
• Simultaneous amplification of many
targets of interest in one reaction by
using more than one pair of primers.
• The FBI uses a standard set of 13
specific STR regions for CODIS (The
Combined DNA Index System).
Example:
• 15 different STRs and a gender-specific
marker amplified by PCR.
• One primer in each pair is labeled with a
fluorescent tag for 4 color detection.
• Detect PCR amplification products using an
automated sequencer.
• Separated by size and detected by color after
laser-induced excitation.
Mitochondrial DNA analysis
• Every cell has hundreds of mitochondria with
several hundred mtDNA molecules.
• Older biological samples (e.g. strands of hair,
solid bone, or teeth) often lack usable nuclear
DNA but have abundant mtDNA.
• mtDNA has been successfully isolated from
fossil bones.
• Analysis by PCR amplification and direct
sequencing of two highly variable regions
in the D loop region.
• Can only identify a person’s maternal
lineage.
Y chromosome analysis
• Y chromosome-specific STRs.
• Paternity testing of male offspring
• Analyzing biological evidence in criminal
casework involving multiple male
contributors.
Randomly amplified polymorphic DNA
(RAPD) analysis
• No knowledge of an organism’s DNA sequence
is required.
• PCR primers consist of random sequences.
• e.g. The case of the Palo Verde tree seed
pods.
• e.g. Differentiation between Bacillus species.
16.3 Genomics, proteomics,
and beyond
• Whereas gene discovery once drove
DNA sequencing, now the sequencing of
entire genomes drives gene discovery.
What is bioinformatics?
• Area of computer science devoted to
collecting, organizing, and analyzing
DNA and protein sequences and all the
data being generated by genomics and
proteomics labs.
Tools of bioinformatics:
• Locate and align sequences.
• Assemble consensus sequences.
• Analyze properties of proteins.
• Analyze sequence patterns to locate
restriction sites, promoters, DNA binding
domains, etc.
• Phylogenetic analysis.
• Basic local alignment search tool
(BLAST).
• The most commonly used genome tool.
• Example: Search for all the predicted
protein sequences that are related to a
“query sequence.”
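As a small example of sequence pattern analysis, the sketch below locates restriction sites by exact string matching. It uses the EcoRI recognition sequence GAATTC. The input sequence and function name are made up for illustration, and only one strand is searched; this is not how BLAST works, which relies on heuristic alignment rather than exact matching.

```python
def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = sequence.find(site, start + 1)
    return positions


ECORI = "GAATTC"  # EcoRI recognition sequence
dna = "AA" + ECORI + "GG" + ECORI + "TT"  # hypothetical test sequence
print(find_sites(dna, ECORI))
```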
Genomics
• The comprehensive study of whole sets
of genes and their interactions rather
than single genes.
• Comparative analysis of genomes based
on the availability of complete genome
sequences.
Proteomics
• The comprehensive study of the full set
of proteins encoded by a genome—the
“proteome.”
• Protein biochemistry on a “high-throughput” scale.
The age of “omics” and
systems biology
• A whole set of related terms coined to describe
the comparative study of databases.
– e.g., transcriptomics, metabolomics,
kinomics, glycomics, lipidomics
• Interactomics: the study of macromolecular
machines, mapping protein-protein interactions
throughout a cell.
• Systems biology aims to make sense of all
the data arising from the study of
biomolecular networks.
• Uses both experimental and computational
approaches to model these interactions.
• “Attempts to piece together everything.”
16.4 Whole genome sequencing
Major milestones in sequencing technology
• Development of automated DNA sequencing.
• Development of the BLAST algorithm.
• Development of bacterial artificial chromosome
(BAC) vectors.
Two main genome sequencing methods
• Clone by clone genome assembly approach:
– Used by the publicly funded international
sequencing consortium for the human
genome.
• Whole-genome shotgun approach:
– Used by the privately funded Celera
Genomics Corporation for the human
genome.
Clone by clone genome
assembly approach
• Restriction fragments of ~150 kb are cloned
into BAC vectors.
• A physical map of the genome is produced.
• The BAC clones are broken up into smaller
fragments, subcloned, and sequenced.
• This places the sequences in order so they can
be pieced together.
• Time consuming, but precise.
Whole-genome
shotgun approach
• Plasmid clones with 2-10 kb inserts are
prepared directly from fragmented genomic
DNA.
• Clones are randomly selected for sequencing.
• Sequence is reassembled in order with the aid
of a supercomputer.
• More rapid, but often results in gaps in the
sequence.
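The reassembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. This is a simplified sketch under strong assumptions (error-free reads, one strand, no repeats longer than the minimum overlap); the reads and function names are hypothetical, and production assemblers use far more elaborate methods.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # remaining contigs do not overlap: gaps in the sequence
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


reads = ["CGTACGTT", "ATGGCGTA", "CGTTAGC"]  # hypothetical shotgun reads
print(greedy_assemble(reads))
```

When two groups of reads share no overlap, the function returns more than one contig, which mirrors the gaps that the shotgun approach often leaves in the sequence.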
Rough drafts versus finished
sequences
• “Rough draft” of the human genome
reported in 2001 by the publicly and
privately funded groups.
• “Finished” sequence reported in 2004.
• More accurate and complete, but still
contains some gaps.
• Annotation of a sequenced genome is an
in-depth analysis of all functional elements
of the genome.
• Much of the emphasis is on the gene
content, with the aim of characterizing all
of the genes and their functions.
Comparative analysis of genomes
• Sequence and comparative analysis of
nonmammalian genomes help to provide
unique perspectives on the evolution of
anatomy, physiology, development, and
behavior.
• Prior to the draft sequence, estimated
that the human genome contained at
least 100,000 genes.
• Current estimate of 20,000 to 25,000
protein-coding genes came as a surprise.
• What makes us uniquely human?
• The answer lies somewhere within the 35
million single-nucleotide substitutions, 5
million small insertions and deletions, local
rearrangements, and a chromosomal
fusion that distinguish us from the
chimpanzee (Pan troglodytes).
Comparative analysis of genomes:
insights from pufferfish
and chickens
• Comparative genome analysis allows
researchers to assess changes in gene
structure and sequence that have
occurred during evolution.
• Homologous sequences share a
common evolutionary ancestry.
• Orthologs are genes in different species
that are homologous because they are
derived from a common ancestral gene.
• Paralogs are two genes in a genome that
are similar because they arose from a
gene duplication.
• Synteny is conservation in genetic
linkage between the genes of distantly
related organisms.
• Suggests that the conserved order of loci
on a chromosome may be of importance
for gene regulation.
Insights from the pufferfish genome
• Comparison of the genome sequence of
the pufferfish with that of humans.
• Researchers have deduced that the
extinct ancestor of ray-finned fish and
lobe-finned fish had 12 pairs of
chromosomes.
Insights from the chicken genome
• Potential for using comparative sequence
analysis to map conserved regulatory
elements present in the human genome.
What is a gene and how many are
there in the human genome?
Three essential features of a gene:
• Expression of a product.
• Requirement that it be functional.
• Inclusion of both coding and regulatory
regions.
• A gene is a complete chromosomal
segment responsible for making a
functional product.
• How many genes are there in the human
genome?
• Gene “hunting” or gene prediction
computer programs have become much
more sophisticated, but no program
predicts all genes correctly.
• Some recognize genes by detecting
distinctive patterns in DNA sequences
• Others detect new genes based on their
similarity to known genes.
• The Encyclopedia of DNA Elements
(ENCODE) Project aims to identify all
functional elements in the human genome.
• This includes protein-coding genes, non-protein-coding genes, transcriptional
regulatory elements, and sequences that
mediate chromosome structure and
dynamics.
16.5 High-throughput analysis
of gene function
• Methods for genome and proteome
analysis are called “high-throughput”
because the activities of thousands of
genes and their products are studied at
the same time.
Some classic methods for high-throughput
analysis of gene function
• DNA microarrays
• Protein arrays
• MALDI-TOF
• Tandem mass spectrometry
DNA microarrays
• Analysis of the transcriptional activity of
thousands of genes simultaneously.
• Compare transcription programs of cells
or organisms during specific
physiological responses, developmental
processes, or disease states.
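Comparisons between two transcription programs are commonly summarized as a log2 ratio of signal intensities per gene. The sketch below assumes background-corrected, normalized intensities are already available; the gene names and values are invented for illustration.

```python
from math import log2


def log2_fold_changes(treated, control):
    """Return log2(treated / control) for every gene present in both samples."""
    return {gene: log2(treated[gene] / control[gene])
            for gene in treated if gene in control}


# Hypothetical normalized spot intensities for three genes.
control = {"geneA": 200.0, "geneB": 400.0, "geneC": 250.0}
treated = {"geneA": 800.0, "geneB": 100.0, "geneC": 250.0}

for gene, change in log2_fold_changes(treated, control).items():
    print(gene, round(change, 2))
```

A positive value means higher transcript levels in the treated sample, a negative value means lower, and zero means no change.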
Protein arrays
Two common types:
• Analytical protein array.
• Functional protein array.
Analytical protein array
• Monitor protein expression levels.
• Clinical diagnostics.
Functional protein array
• Analyze enzymatic activities, protein-protein
interactions, post-translational
modifications.
• Drug-target identification.
• Mapping biological pathways.
Mass spectrometry
Two popular strategies:
• Peptide mass fingerprinting using
MALDI-TOF.
• Shotgun proteomics using MS/MS
Peptide mass fingerprinting using
MALDI-TOF
• Analysis of a single isolated protein.
• MALDI-TOF (Matrix Assisted Laser
Desorption/Ionization-Time of Flight)
mass spectrometry.
• Time of flight increases with the square root of
the mass-to-charge ratio (m/z): heavier ions arrive
later, more highly charged ions arrive sooner.
• Measurement of the number of ions at each
m/z value (mass to charge ratio).
• Computer database: identify protein from which
peptides originated.
Shotgun proteomics using MS/MS
• “Interrogation” of an entire proteome.
• Tandem mass spectrometry (MS/MS).
• The process produces a collection of peptide
ion fragments that differ in mass by a single
amino acid.
• Measurement of mass to charge (m/z) ratios of
the fragments allows the amino acid sequence
to be read.
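Reading a sequence from a fragment ladder amounts to matching each mass difference to an amino acid residue mass. The sketch below uses rounded monoisotopic residue masses for a subset of amino acids. The ladder is synthetic, the starting mass is arbitrary, and the function names are hypothetical. Leucine and isoleucine have the same mass, so only L is listed.

```python
# Rounded monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "E": 129.04259, "F": 147.06841,
}


def make_ladder(peptide, start_mass):
    """Build a synthetic fragment ladder: each step adds one residue mass."""
    ladder = [start_mass]
    for residue in peptide:
        ladder.append(ladder[-1] + RESIDUE_MASS[residue])
    return ladder


def read_sequence(ladder, tolerance=0.01):
    """Infer residues from the differences between consecutive fragment masses."""
    sequence = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        diff = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - diff) <= tolerance]
        # "?" marks a step that matches no residue, or more than one.
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


ladder = make_ladder("GAVLF", start_mass=100.0)  # arbitrary starting mass
print(read_sequence(ladder))
```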
The nucleolar proteome
• Analysis of the nucleolar proteome by
shotgun proteomics.
• Group of candidate novel nucleolar
proteins identified by mass spectrometry.
• Isolate corresponding cDNAs.
• Subclone into YFP expression vectors.
• Observe localization in transfected cells
by confocal microscopy.
16.6 Genome-wide association
studies
• All human individuals share genome sequences
that are approximately 99.9% the same.
• The remaining variable 0.1% is responsible for
the genetic diversity between individuals.
• Most common human traits and diseases have a
polygenic pattern of inheritance.
• This means that DNA sequence variants at
many genetic loci influence the phenotype.
• Genome-wide association studies (GWAS) have
identified more than 3000 variants associated
with 150 human traits.
• Example: Hundreds of genetic variants in at
least 180 loci influence adult height.
• Projects investigating cancer genomes and the
genomes of people with diabetes, Alzheimer’s
disease, Crohn’s disease, and other disorders
are under way.
• This type of meta-analysis screens
databases of single nucleotide
polymorphisms, or copy number variants,
to test for the association of a particular
trait with each polymorphism.
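The association test for a single polymorphism can be sketched as a chi-square test on a 2x2 table of allele counts in cases and controls. The counts and function names below are hypothetical; the p-value uses the identity that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)). Real studies also correct for the very large number of polymorphisms tested.

```python
from math import erfc, sqrt


def allele_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic for a 2x2 allele count table (no correction)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2):
    """Tail probability of a chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Hypothetical allele counts: variant allele seen 30/100 times in cases
# and 10/100 times in controls.
chi2 = allele_chi_square(30, 70, 10, 90)
print(f"chi-square = {chi2:.2f}, p = {p_value_1df(chi2):.2e}")
```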
Single nucleotide polymorphisms
• Two or more possible nucleotides occur
at a specific mapped location in a
genome.
• e.g. ATGCCTA or ATGCTTA
• For a variation to be considered a SNP,
must occur in at least 1% of the
population.
• ~7 million SNPs with an allele frequency
>5%.
• ~4 million with an allele frequency of 1 to
5%.
• Map of SNPs can be used to scan the
human genome for haplotypes
associated with common diseases.
– e.g. Late onset Alzheimer’s disease
• Haplotypes are patterns of sequence
variation; i.e., stretches of DNA
containing a distinctive set of alleles.
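The SNP definition can be made concrete with the two example sequences from the slide. The sketch below finds the position where two aligned sequences of equal length differ and applies the 1% population frequency threshold. The function names and the example frequencies are illustrative assumptions.

```python
def find_differences(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, x, y)
            for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def qualifies_as_snp(minor_allele_frequency, threshold=0.01):
    """A variant counts as a SNP if it occurs in at least 1% of the population."""
    return minor_allele_frequency >= threshold


print(find_differences("ATGCCTA", "ATGCTTA"))
print(qualifies_as_snp(0.04), qualifies_as_snp(0.002))  # hypothetical frequencies
```

The two sequences differ only at the fifth position, where one carries C and the other carries T.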
Mapping disease-associated SNPs:
Alzheimer’s disease
• Two SNPs in the apolipoprotein E gene
result in three possible alleles.
• An individual with at least one apoε4 (E4)
allele has a greater chance of developing
Alzheimer’s.
Copy number variants (CNVs)
• Variation in the genome in which entire
genes or genomic regions are deleted,
duplicated, or rearranged.
• CNVs can affect from one kilobase to
several megabases of DNA.
• In some cases CNVs have been linked to
disease.
• Example: Rare structural variations in genes that
affect neuronal development and signaling.
• These CNVs may account for much of the
heritability underlying autism.
• Each person with autism may carry a unique set
of “autism loci,” but the biological pathways
affected by these CNVs are likely to be similar.
• CNVs are also associated with
neuroblastoma, Crohn’s disease, and
schizophrenia.
• Will these associations stand the test of
further research?
Gene polymorphisms and
human behavior
Human behavior
• Personality
• Temperament
• Cognitive style
• Psychiatric disorders
Oversimplified model of human behavior
• Direct linear relationship between individual
genes and behavior.
More accurate model
• Complex gene networks and multiple
environmental factors affect brain development
and function, which in turn will influence
behavior.
• Heredity definitely plays some role in
behavior but DNA is not destiny.
• In general, be wary of announcements in
the popular media about scientists finding
“the gene” for an aspect of human
behavior.
Why is there a lack of progress in finding
“behavior” genes?
• Behavioral traits are polygenic.
• Gene-environment interactions.
• Behavioral traits tend to be inexactly defined.
• Sample bias.
• Inadequate sample size.
Aggressive, impulsive,
and violent behavior
• Family, twin, and adoption studies have
suggested heritability of 0% to >50% for a
predisposition to violent behavior.
• The case of the “extra” Y chromosome.
• Polymorphism in the transcriptional control
region of the monoamine oxidase A (MAOA)
gene.
MAOA functional length polymorphism
• MAOA metabolizes several
neurotransmitters in the brain, such as
dopamine and serotonin.
• Prevents excess neurotransmitters from
interfering with communication among
neurons.
High activity alleles
• Alleles with 3.5 or 4 copies of the repeat
sequence are transcribed more efficiently and
produce more MAOA enzyme.
Low activity alleles
• Alleles with 3 or 5 copies of the repeat
sequence are transcribed less efficiently and
produce less MAOA enzyme.
High activity alleles (3.5 or 4 repeats)
• Less likely to develop antisocial behavior.
Low activity alleles (3 or 5 repeats)
• More likely to develop antisocial behavior
if maltreated as children.
Schizophrenia susceptibility loci
• Severe psychiatric disorder that affects
about 1% of the population.
• Twin and adoption studies show the risk
of developing schizophrenia is increased
among relatives of affected individuals.
• Multilocus model.
• A number of potential susceptibility loci
and CNVs have been described.
• Most evidence for linkage in loci that
encode proteins involved in
neurotransmission, axon guidance, and
cell-cell signaling in the brain.
What remains unknown
• The disease risk conferred by each locus.
• The extent of genetic variability.
• The degree of interaction among loci and
the environment.