Lecture

Nucleic Acids
DNA and RNA are nucleic acids, long, thread-like polymers
made up of a linear array of monomers called nucleotides
All nucleotides contain three components:
1. A nitrogen heterocyclic base
2. A pentose sugar
3. A phosphate residue
Chemical Structure of DNA vs RNA
Ribonucleotides have a 2’-OH
Deoxyribonucleotides have a 2’-H
Bases are classified as Pyrimidines or Purines
Structure of Nucleotide Bases
The nucleus contains the cell’s DNA (genome)
RNA is synthesized in the nucleus and exported
to the cytoplasm
Nucleus
Cytoplasm
replication
DNA
transcription
RNA (mRNA)
translation
Proteins
Deoxyribonucleotides found in DNA
dA
dG
dT
dC
Nucleotides are
linked by
phosphodiester
bonds
DNA is double stranded
Bases form a specific hydrogen bond pattern
Properties of a DNA
double helix
The strands of DNA are antiparallel
The strands are complementary
There are Hydrogen bond forces
There are base stacking interactions
There are 10 base pairs per turn
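The antiparallel, complementary arrangement listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `reverse_complement` and the example sequence are assumptions chosen for the demonstration.

```python
# Watson-Crick partners for the four DNA bases
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, written 5'->3'.

    Reversing accounts for the antiparallel orientation of the two strands;
    swapping each base for its partner accounts for complementarity.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


top = "GAATTCACAG"  # illustrative sequence, written 5'->3'
bottom = reverse_complement(top)

# Print the duplex the way it is usually drawn: top 5'->3', bottom 3'->5'
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")
```

Applying the function twice returns the original strand, which is a quick sanity check on any complement routine.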
DNA is a Double-Helix
Transcription of a DNA
molecule results in a mRNA
molecule that is single-stranded.
RNase P M1 RNA
RNA molecules do not have a
regular structure like DNA.
hairpin
Structures of RNA molecules
are complex and unique.
RNA molecules can base pair
with complementary DNA or
RNA sequences.
G pairs with C, A pairs with U,
and G pairs with U.
bulge
internal loop
Nucleic Acids in Acid and Base
The glycosidic bond of DNA and RNA is hydrolyzed by acids.
Order of stability: dA, dG < rA, rG < dC, dT < rC, rU
dA, dG hydrolyzed in boiling 0.1 M hydrochloric acid in 30 min
rA, rG hydrolyzed in boiling 1 M hydrochloric acid in 60 min
rC, rU hydrolyzed in boiling 12 M perchloric acid in 60 min
DNA is quite stable under basic conditions.
RNA is readily hydrolyzed by base.
RNA is hydrolyzed under alkaline (basic) conditions
Methylation of Nucleotide bases
Certain nucleotide bases in DNA molecules are methylated, catalyzed by enzymes.
Adenine and Cytosine are methylated more often than Guanine and Thymine.
Methylation is confined to specific regions of DNA and aids in biological processes.
E. coli DNA is methylated to distinguish its DNA from that of foreign invaders.
In eukaryotic cells about 5% of cytidines are methylated, producing 5-methylcytidine.
Spontaneous Alterations in Nucleic Acids
In a human cell, DNA undergoes spontaneous alterations in structure (mutations).
As a cell ages, the number of mutations increases, making it likely that a cell’s
normal processes may be altered.
There is a link between spontaneous mutation, aging, and carcinogenesis.
Depurination
Why does DNA contain thymine and not uracil?
Hypothesis:
If DNA contained uracil, during replication of DNA the
uracils would be base-paired with adenine.
Deaminated cytosines would also be base-paired with adenine.
This would decrease the number of G-C base pairs over time
and increase the number of A-U base pairs.
Eventually all the G-C base pairs could be lost.
The genetic code would not exist as we know it.
Ultraviolet light is damaging to DNA
Near-UV radiation (wavelengths of
200 – 400 nm) is a significant portion
of the solar spectrum.
Upon exposure to ultraviolet
radiation, two adjacent pyrimidine
bases can dimerize.
This happens most often between two
adjacent thymines.
Two products often form:
cyclobutane thymine dimer
6-4 photoproduct
Nucleic Acids
Where are they found in nature?
and
What do they look like?
Genomes
Source of DNA       Size (bases)     Type
Escherichia coli    9,200,000        Closed-circular double-stranded DNA
Bacillus subtilis   4,200,000        Closed-circular double-stranded DNA
F plasmid           95,000           Closed-circular double-stranded DNA
λ phage             48,500           Linear double-stranded DNA
T7 phage            40,000           Linear double-stranded DNA
M13 phage           6,400            Closed-circular single-stranded DNA
MS2 phage           3,600            Linear single-stranded RNA
Human               6,000,000,000    Linear double-stranded DNA
Fruit fly           270,000,000      Linear double-stranded DNA
HIV                 9,700            Linear single-stranded RNA
DNA molecules are packaged in the cell as structures called chromosomes.
Bacteria have a single chromosome. Eukaryotes have multiple chromosomes.
A single chromosome contains thousands of genes, each encoding a protein.
All of an organism’s chromosomes make up the genome.
Humans have 46 chromosomes.
The human genome has about 3 billion nucleotide base pairs.
The Human Genome
http://www.ncbi.nlm.nih.gov/genome/guide/human/
How is DNA packaged into a cell?
E. coli has a single double-stranded
DNA molecule as its genome.
There are 4,639,221 base pairs
in the E. coli genome.
The DNA is 1.7 mm long,
850 times the length of an E. coli cell.
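The length of the E. coli chromosome can be estimated from its base-pair count. The sketch below assumes a rise of 0.34 nm per base pair (the usual B-form value) and a cell length of about 2 micrometres; both numbers are assumptions that do not appear on the slide, and the function name is hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumed B-form rise per base pair, in nanometres


def contour_length_mm(base_pairs: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Approximate end-to-end length of a fully extended duplex, in millimetres."""
    return base_pairs * rise_nm * 1e-6  # 1 mm = 1e6 nm


ecoli_bp = 4_639_221          # base pairs in the E. coli genome (from the slide)
cell_length_um = 2.0          # assumed typical E. coli cell length, in micrometres

length_mm = contour_length_mm(ecoli_bp)
print(f"Extended DNA length: {length_mm:.2f} mm")
print(f"Ratio to cell length: {length_mm * 1000 / cell_length_um:.0f}")
```

Under these assumptions the estimate comes out at roughly 1.6 mm, several hundred times the length of the cell and close to the length quoted on the slide; the exact ratio depends on the cell length and rise per base pair assumed.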
plasmid
Large DNA molecules
are compacted in a cell
by supercoiling.
relaxed
supercoiled
DNA in eukaryotic cells is packaged into nucleosomes,
which contain proteins called histones.
DNA wrapped around a
histone core (side view)
Nucleosomes are packaged to form 30 nm fibers
Compaction of 30 nm
fibers uses nuclear
scaffolds
In eukaryotes,
genes contain exons (coding regions)
and introns (non-coding regions).
Prokaryotic genes do not
contain introns.
Telomeres
Telomeres are sequences at the end of eukaryotic
chromosomes that help stabilize the chromosome.
Telomeres are repeats of the following sequence:
5’-(TxGy)n
3’-(AxCy)n
x and y = 1 to 4
The TG strand is longer
5’-TTTGGTTTGGTTTGGTTTGGTTTGGTTTGG…
3’-AAACCAAACCAAACC…
Can be >10,000 nucleotides in mammals.
The ends of the chromosome are replicated by
the enzyme telomerase.
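The repeat pattern (TxGy)n can be generated programmatically. The sketch below is illustrative only; `telomere_repeat` is a hypothetical helper, and the values x = 3, y = 2 are taken from the example sequence shown above.

```python
def telomere_repeat(x: int, y: int, n: int) -> str:
    """Build the G-rich telomere strand (TxGy)n, written 5'->3'."""
    if not (1 <= x <= 4 and 1 <= y <= 4):
        raise ValueError("x and y are expected to lie between 1 and 4")
    return ("T" * x + "G" * y) * n


# Six copies of TTTGG reproduce the G-rich strand shown on the slide
g_strand = telomere_repeat(3, 2, 6)

# The paired bases on the other strand, written 3'->5' so they line up
c_strand = g_strand.translate(str.maketrans("TG", "AC"))

print("5'-" + g_strand)
print("3'-" + c_strand)
```

The code pairs every base for simplicity; in the cell the TG strand extends beyond its partner, so the AC strand would be shorter.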
Telomeres and aging
There appears to be a relationship between the length of
telomeres at the end of chromosomes and the age of
an individual.
The older you are, the shorter your telomeres are.
Germ-line cells (reproductive cells) contain telomerase activity.
Non-germ-line cells (somatic cells) do not contain telomerase
activity.
We have a certain length of telomeres that we are born with.
As we age, the telomeres get shorter.
Is our life-span pre-determined by the length of our telomeres?
Internet Resources
Nucleic Acids
National Center for Biotechnology Information (NCBI)
National Library of Medicine (NLM)
National Institutes of Health (NIH)
http://www.ncbi.nlm.nih.gov/
GenBank
GenBank® is the NIH genetic
sequence database, an annotated
collection of all publicly available
DNA sequences (Nucleic Acids
Research, 2011 Jan;39(Database
issue):D32-7). There are
approximately 126,551,501,141
bases in 135,440,924 sequence
records in the traditional GenBank
divisions and 191,401,393,188 bases
in 62,715,288 sequence records in
the WGS division as of April 2011.
BLAST SEARCH
What is BLAST?
BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs
designed to explore all of the available sequence databases regardless of whether the
query is protein or DNA. The scores assigned in a BLAST search have a well-defined
statistical interpretation, making real matches easier to distinguish from random
background hits. BLAST uses an algorithm which seeks local as opposed to global
alignments and is therefore able to detect relationships among sequences which share
only isolated regions of similarity.
The core of NCBI's BLAST services is BLAST 2.0 otherwise known as "Gapped
BLAST". This service is designed to take protein and nucleic acid sequences and
compare them against a selection of NCBI databases.
Instead of relying on global alignments (commonly seen in multiple sequence
alignment programs) BLAST emphasizes regions of local alignment to detect
relationships among sequences which share only isolated regions of similarity.
Therefore, BLAST is more than a tool to view sequences aligned with each other or to
calculate percent homology, but a program to locate regions of sequence similarity with
a view to comparing structure and function.
The BLAST search pages allow you to select from several different programs.
Below is a table of these programs.
Program
Description
blastp
Compares an amino acid query sequence against a protein sequence database.
blastn
Compares a nucleotide query sequence against a nucleotide sequence database.
blastx
Compares a nucleotide query sequence translated in all reading frames against a
protein sequence database. You could use this option to find potential translation
products of an unknown nucleotide sequence.
tblastn
Compares a protein query sequence against a nucleotide sequence database
dynamically translated in all reading frames.
tblastx
Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx
program cannot be used with the nr database on the BLAST Web page because it is
computationally intensive.
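The phrase "translated in all reading frames" in the blastx, tblastn, and tblastx descriptions refers to six-frame translation: three frames on the given strand and three on its reverse complement. The following is a minimal sketch of that idea using the standard genetic code; it is not BLAST's own implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq: str) -> dict:
    """Translate both strands in all three reading frames each."""
    seq = seq.upper()
    frames = {}
    for label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Because every query and every database entry must be translated six ways, a tblastx search does far more work than a plain nucleotide comparison, which is consistent with the note above about its computational cost.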
Nucleotide Databases
Database
Description
nr
All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).
month
All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest
Non-redundant database of GenBank+EMBL+DDBJ EST Divisions.
dbsts
Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.
mouse ests
The non-redundant Database of GenBank+EMBL+DDBJ EST Divisions limited to the organism mouse.
human ests
The Non-redundant Database of GenBank+EMBL+DDBJ EST Divisions limited to the organism human.
other ests
The non-redundant database of GenBank+EMBL+DDBJ EST Divisions all organisms except mouse and human.
yeast
Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all Yeast nucleotide
sequences, but the sequence fragments from the Yeast complete genome.
E. coli
E. coli (Escherichia coli) genomic nucleotide sequences.
pdb
Sequences derived from the 3-dimensional structure of proteins.
kabat [kabatnuc]
Kabat's database of sequences of immunological interest.
patents
Nucleotide sequences derived from the Patent division of GenBank.
vector
Vector subset of GenBank(R), NCBI
mito
Database of mitochondrial sequences
alu
Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at
epd
Eukaryotic Promoter Database, ISREC in Epalinges s/Lausanne (Switzerland).
gss
Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs
High Throughput Genomic Sequences.
Nucleic Acid Sequence
What does it encode?
CGTGATGAACGGCTTCGAGCGATACGAGGGAGTGCGTCACTGCCGCTATGTGGACGAGTTGCAG
ATCGTCCAGAATGCGCCATGGACTCTGTCCGATGAATTCATCGCCGACAACAAAATCGACTTTGT
GGCCCACGACGACATTCCGTATGTAACCGATGGCATGGACGACATCTATGCTCCTCTCAAGGCGC
GCGGCATGTTTGTGGCCACGGAGCGCACTGAGGGTGTGTCCACCTCGGACATCGTAGCCCGGAT
CGTCAAGGATTACGATCTGTATGTGCGTCGTAATCTGGCCAGAGGCTATTCGGCCAAGGAACTCA
ATGTGTCGTTCCTGTCCGAGAAGAAGTTCCGGCTGCAGAACAA
Problem
• As an employee of the Environmental Protection Agency
(EPA) you are charged with maintaining safe public
swimming in lakes.
• In a sample isolated from a lake used for swimming and
boating you discover the following nucleic acid molecule
that you believe is part of a larger gene sequence. You
suspect the organism from which the gene came may be
harmful to the public.
• 5’-CATCCAGGGAATCACCAAGCCCGCCATTCGCCGTCTGGCTCGCCG-3’
• Determine if you should shut down public access to the
lake, or if the lake is safe.
Problem #2
• In the middle of the swimming season you re-test the lake to
make sure it is safe for human use.
• In a sample isolated from the lake you discover the
following nucleic acid molecule that you again believe is
part of a larger gene sequence. You wonder whether the organism
from which the gene came may be harmful to the public.
• 5’-GTCGAAGCGCCACTCGAAGGAGAAGGACACGCTCGGGGGCATCAC-3’
• As before, determine if you should shut down public access
to the lake, or if the lake is safe.
Regulatory Proteins
DNA sequences recognized by regulatory proteins are often inverted
repeats of a short DNA sequence. These repeats form a
palindrome with two-fold symmetry about a central axis.
DNA binding proteins are often dimeric, with two
identical protein subunits.
Each subunit binds to one strand of the DNA.
5’-TACGGTACTGTGCTCGAGCACTGCTGTACT-3’
3’-ATGCCATGACACGAGCTCGTGACGACATGA-5’
central axis
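A sequence has this two-fold symmetry when it reads the same 5'->3' on both strands, that is, when it equals its own reverse complement. The check below is a small sketch; the function names are hypothetical, and the test sequences are the EcoRI site GAATTC and the twelve bases that flank the central axis of the 30-mer shown above.

```python
def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindromic(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


for site in ("GAATTC", "GTGCTCGAGCAC", "GAATTT"):
    print(site, is_palindromic(site))
```

Note that a DNA palindrome in this sense is different from a word palindrome: the sequence is compared with the opposite strand, not with its own reversal.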
Protein – DNA interaction
Proteins often bind to specific sequences of DNA.
Example: Restriction enzyme EcoRI binds to the DNA sequence
5’-GAATTC-3’
3’-CTTAAG-5’
Restriction Fragment Length Polymorphism (RFLP)
A variation in sizes of DNA seen after cutting with restriction enzymes.
Restriction enzymes cut DNA at a specific site. For example, the EcoRI restriction
enzyme cuts DNA wherever it finds the sequence GAATTC:
DNA before cutting by EcoRI:
5’-AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA-3’
3’-TTAGATCCCTTAAGTGTCGCTACGCTTAAGCGTTAAT-5’
DNA after cutting by EcoRI:
5’-AATCTAGGG AATTCACAGCGATGCG AATTCGCAATTA-3’
3’-TTAGATCCCTTAA GTGTCGCTACGCTTAA GCGTTAAT-5’
In this example, EcoRI has cut the 37-base-pair DNA into 3 smaller fragments of
DNA. If another person has slightly different DNA, EcoRI may cut the DNA into
pieces of different lengths. (For example: If the second GAATTC is GAATTT, EcoRI
will cut this other person's DNA in only one place, producing 2 smaller fragments of
DNA.)
The words "fragment length polymorphism" mean "DNA pieces of different lengths."
RFLPs are a quick way to see if two pieces of DNA are identical, without having to
look at the entire DNA sequence.
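The digestion described above can be sketched as a simple string operation. This is an illustration under simplifying assumptions: only the top strand is tracked, the cut is placed after the G of GAATTC, and the function and variable names are hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment; 1 places the cut between G and AATTC.
    """
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


person_a = "AATCTAGGGAATTCACAGCGATGCGAATTCGCAATTA"  # top strand from the example
person_b = "AATCTAGGGAATTCACAGCGATGCGAATTTGCAATTA"  # second site changed to GAATTT

print([len(f) for f in digest(person_a)])  # fragment lengths for person A
print([len(f) for f in digest(person_b)])  # fragment lengths for person B
```

Comparing the two lists of fragment lengths is the computational analogue of comparing band patterns on a gel: a single base change in a recognition site changes the number and size of the fragments.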
IS6110 Fingerprints of M. tuberculosis
DNA Profiling
Each person has a unique set of fingerprints. As with a person’s fingerprint no two
individuals share the same genetic makeup. This genetic makeup, which is the hereditary
blueprint imparted to us by our parents, is stored in the chemical deoxyribonucleic acid (DNA),
the basic molecule of life. Examination of DNA from individuals, other than identical twins, has
shown that variations exist and that a specific DNA pattern or profile could be associated with an
individual. These DNA profiles have revolutionized criminal investigations and have become
powerful tools in the identification of individuals in criminal and paternity cases.
The first widespread use of DNA tests involved RFLP (restriction fragment length
polymorphism) analysis, a test designed to detect variations in the DNA from different
individuals. In the RFLP method, DNA is isolated from a biological specimen (e.g., blood,
semen, vaginal swabs) and cut by an enzyme into restriction fragments. The DNA fragments are
separated by size into discrete bands in a gel (gel electrophoresis), transferred onto a membrane,
and identified using probes (known DNA sequences that are "tagged" with a chemical tracer).
The resulting DNA profile is visualized by exposing the membrane to a piece of x-ray film which
allows the scientist to determine which specific fragments the probe identified among the
thousands in a sample of human DNA. A "match" is made when similar DNA profiles are
observed between an evidentiary sample and those from a suspect’s DNA. A determination is
then made as to the probability that a person selected at random from a given population would
match the evidence sample as well as the suspect. The entire analysis may require from 6 to 10
weeks for completion.
restriction fragment length
polymorphism (= RFLP)
Technique, also known as DNA fingerprinting, that allows familial
relationships to be established by comparing the characteristic
polymorphic patterns that are obtained when certain regions of genomic
DNA are amplified (typically by PCR) and cut with certain restriction
enzymes. In principle, an individual can be identified unambiguously by
RFLP (hence the use of RFLP in forensic analysis of blood, hair or
semen). Similarly, if a polymorphism can be identified close to the locus
of a genetic defect, it provides a valuable marker for tracing the
inheritance of the defect.
The matching process for identifying DNA profile patterns which either
"exclude" or "include" a person as being the parent of a child is shown in
the figure below. In this instance man 1 is excluded from paternity and
man 2 is included as a possible father of the child.
Parentage Testing
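The exclusion logic can be expressed with sets: every band in the child's profile that does not come from the mother must be present in the father's profile. The sketch below uses made-up band sizes purely for illustration; they are hypothetical and are not read from the figure, and real casework relies on statistical analysis rather than a simple set comparison.

```python
def is_excluded(child: set, mother: set, candidate: set) -> bool:
    """A candidate is excluded if he cannot account for every non-maternal band."""
    non_maternal = child - mother
    return not non_maternal <= candidate


# Hypothetical band sizes in base pairs (illustrative values only)
mother = {2100, 4000, 6800, 8000}
child = {2100, 5200, 6800, 9300}
man_1 = {3000, 5200, 7700}
man_2 = {1500, 5200, 9300}

for name, profile in (("man 1", man_1), ("man 2", man_2)):
    verdict = "excluded" if is_excluded(child, mother, profile) else "included"
    print(name, verdict)
```

An "included" result only means the man cannot be ruled out; it does not by itself establish paternity.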
RNA and DNA
Viruses
Viruses
• disease-causing agents that can multiply only in
cells
• viruses are DNA or RNA enclosed by a
protective coat that enables them to move from
one cell to another.
• Viral-infected cells often break open (lyse) and
allow viruses access to nearby cells.
• A protein shell (capsid) surrounds the nucleic
acid of most viruses. In many viruses the protein
capsid is further enclosed by a lipid bilayer
membrane that contains proteins.
Viral capsid
The capsids of some viruses, all shown at the same scale.
(A) Tomato bushy stunt virus; (B) poliovirus; (C) simian virus 40
(SV40); (D) satellite tobacco necrosis virus.
Acquisition of a viral envelope
The Coats of Viruses
Bacteriophage
T4, a large
DNA-containing
virus that infects
E. coli.
Potato virus X, a
filamentous
plant virus that
contains an RNA
genome.
Adenovirus, a
DNA-containing
virus that can
infect human
cells. The protein
capsid forms the
outer surface of
this virus.
Influenza virus, a
large RNA-containing animal
virus whose
protein capsid is
enclosed in a
lipid envelope
with protruding
spikes of viral
glycoprotein
Several types of viral genomes
The smallest viruses contain only a few genes and can have an RNA
or a DNA genome; the largest viruses contain hundreds of genes and
have a double-stranded DNA genome.
T4 bacteriophage
chromosome
This schematic shows the positions
of the more than 30 genes involved
in T4 DNA replication. The
genome of bacteriophage T4
consists of 169,000 nucleotide pairs
and encodes about 300 different
proteins.
The life cycle of the Semliki forest virus
The virus parasitizes
the host cell for most
of its biosyntheses
The life cycle of a retrovirus
• The retrovirus genome consists of an RNA molecule of
about 8500 nucleotides; two such molecules are
packaged into each viral particle.
• The enzyme reverse transcriptase first makes a DNA
copy of the viral RNA molecule and then a second DNA
strand, generating a double-stranded DNA copy of the
RNA genome.
• The integration of this DNA double helix into the host
chromosome, catalyzed by the viral enzyme integrase, is
required for the synthesis of new viral RNA molecules
by the host-cell RNA polymerase.
reverse
transcription
messenger RNA (mRNA)
transfer RNA (tRNA)
ribosomal RNA (rRNA)
The life cycle of a retrovirus
The AIDS Virus Is a Retrovirus
• In 1982 physicians first became aware of a new sexually transmitted disease that
was associated with an unusual form of cancer (Kaposi's sarcoma) and a variety of
unusual infections. Because both of these problems reflect a severe deficiency in the
immune system - specifically in helper T lymphocytes - the disease was named
acquired immune deficiency syndrome (AIDS). By culturing lymphocytes from
patients with an early stage of the disease, a retrovirus was isolated that is now
known to be the causative agent of AIDS.
• The retrovirus, called human immunodeficiency virus (HIV), enters helper T
lymphocytes by first binding to a functionally important plasma membrane protein
called CD4. There are two features of HIV that make it especially deadly. First, it
eventually kills the helper T cells that it infects rather than living in symbiosis with
them, as do most other retroviruses, and helper T cells are vitally important in
defending us against infection. Second, the provirus tends to persist in a latent state
in the chromosomes of an infected cell without producing virus until it is activated
by an unknown rare event; this ability to hide greatly complicates any attempt to
treat the infection with antiviral drugs.
• Much current research on AIDS is aimed at understanding the life cycle of HIV. The
complete nucleotide sequence of the viral RNA, which encodes nine genes, has been
determined. This has made it possible to identify and study each of the proteins that
it encodes. The three-dimensional structure of its reverse transcriptase is being used
to help design new drugs that inhibit the enzyme.
A map of the HIV genome
The HIV genome is about 9000 nucleotides and contains nine genes. Three of
the genes (green) are common to all retroviruses: gag encodes capsid proteins,
env encodes envelope proteins, and pol encodes both the reverse transcriptase
and the integrase proteins. The HIV genome contains six small genes (in red)
plus the three (in green) that are normally required for the retrovirus life cycle.
1. Attachment
• CD4-gp120 Interaction
• Gp120-Chemokine Receptor Interaction
2. Viral Fusion/Uncoating
3. Reverse Transcription
4. RNaseH Degradation
5. Second Strand Synthesis
6. Migration to Nucleus
7. Integration
8. Latency
9. Early Transcription
10. Late Transcription
11. RNA Processing
12. Protein Synthesis
13. Protein Glycosylation
14. Assembly of Virion
15. Viral Budding
16. Virion Maturation
HIV binds to the CD4 receptor on the host cell. CD4 is present on
the surface of many lymphocytes, which are a critical part of the
body's immune system. A coreceptor, CXCR4 and/or CCR5, is
needed for HIV to enter the cell.
The HIV envelope fuses with the host cell membrane.
The viral capsid and its contents enter the host cell
The RNA HIV genome and the enzyme reverse
transcriptase are released into the host cell
Reverse transcriptase makes a DNA copy of the RNA
HIV genome. First, a single-stranded DNA is made
Reverse transcriptase then makes a double-stranded
DNA copy of the HIV genome
The enzyme integrase fuses the double-stranded copy of the
DNA genome with the host cell genome in the nucleus
mRNA is produced encoding HIV proteins
mRNA is translated to produce HIV-encoded
polypeptides, including HIV protease
HIV protease cleaves polypeptides and makes
functional HIV proteins
A new HIV particle is assembled at the cell
surface and buds off
The HIV virus particle leaves to infect other cells
Human immunodeficiency virus (HIV)
leaving an infected T lymphocyte
Preventing and treating AIDS
Vaccines?
Modern vaccines for viral infections often consist of one
or more coat proteins of the virus that are not
themselves infectious, but elicit an immune response
from the person receiving the vaccine.
HIV reverse transcriptase has an error rate of one nucleotide
per 2000. This means that the amino acid sequence
of the HIV coat proteins is constantly changing.
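The consequence of this error rate can be made concrete with a short calculation. The sketch assumes a genome of about 9,000 nucleotides (the approximate size given elsewhere in this lecture) and treats errors as independent events, which is a simplification; the function names are hypothetical.

```python
ERROR_RATE = 1 / 2000      # errors per nucleotide copied (from the slide)
GENOME_LENGTH = 9000       # approximate HIV genome size in nucleotides


def expected_errors(genome_length: int, error_rate: float = ERROR_RATE) -> float:
    """Average number of misincorporated nucleotides per genome copy."""
    return genome_length * error_rate


def prob_error_free(genome_length: int, error_rate: float = ERROR_RATE) -> float:
    """Probability that a whole genome is copied without a single error,
    assuming each position is copied independently."""
    return (1 - error_rate) ** genome_length


print(f"Expected errors per copy: {expected_errors(GENOME_LENGTH):.1f}")
print(f"Chance of a perfect copy: {prob_error_free(GENOME_LENGTH):.3f}")
```

Under these assumptions each reverse-transcribed genome carries several changes on average, and only about one copy in a hundred is error-free, which illustrates why the coat proteins vary so quickly.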
Preventing and treating AIDS
Drugs?
What should we target?
Anti-HIV chemotherapy
Antiretroviral Agents Currently Available (generic name/Trade name)
• zidovudine/Retrovir (AZT, ZDV)
• didanosine/Videx, Videx EC (ddI)
• zalcitabine/HIVID (ddC)
• stavudine/Zerit (d4T)
• lamivudine/Epivir (3TC)
• abacavir/Ziagen (ABC)
• nevirapine/Viramune (NVP)
• delavirdine/Rescriptor (DLV)
• efavirenz/Sustiva (EFV)
• tenofovir DF/Viread (TDF)
• indinavir/Crixivan
• ritonavir/Norvir
• saquinavir/Invirase, Fortovase
• nelfinavir/Viracept
• amprenavir/Agenerase
• lopinavir/ritonavir, Kaletra
• FUZEON (enfuvirtide, ENF or T-20)
• Anti-PDI antibodies
• T22 – [Tyr-5,12,Lys-7]polyphemusin II
Pharmacogenomics
The study of how an individual's genetic
inheritance affects the body's response to drugs.
Holds the promise that drugs might one day be
tailor-made for individuals and adapted to each
person's own genetic makeup.
Combines traditional pharmaceutical sciences such
as biochemistry with annotated knowledge of
genes, proteins, and single nucleotide
polymorphisms.
Wouldn’t it be wonderful…
Wouldn't it be wonderful if you knew exactly what
measures you could take to stave off, or even prevent,
the onset of disease?
Wouldn't it be a relief to know that you are not
allergic to the drugs your doctor just prescribed?
Wouldn't it be a comfort to know that the treatment
regimen you are undergoing has a good chance of
success because it was designed just for you?
With the recent harvest of millions of Single
Nucleotide Polymorphisms (SNPs) biomedical
researchers now believe that such exciting medical
advances are not that far away.
Sanger dideoxy nucleotide DNA sequencing uses
a DNA polymerase to determine the sequence of DNA
Sanger dideoxy sequencing incorporates dideoxy
nucleotides, preventing further synthesis of the DNA strand
Automated DNA Sequencing
Automated DNA sequencing uses
a mixture of unlabeled deoxy
nucleotides and
dideoxy nucleotides labeled with
a fluorescent dye. A computer
then determines the identity of the
labeled nucleotide as each DNA
fragment migrates through a
polyacrylamide gel.
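The read-out step of dideoxy sequencing can be mimicked in code: every terminated fragment is a prefix of the newly synthesized strand, its final base carries the dye, and sorting the fragments by length recovers the sequence. This is a simplified sketch that assumes one fragment per position; the function names and the example strand are hypothetical.

```python
import random


def sanger_fragments(new_strand: str, seed: int = 0) -> list:
    """Simulate chain termination: one fragment ending at each position.

    The fragments are shuffled to imitate the unordered mixture in the tube.
    """
    fragments = [new_strand[:i] for i in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)
    return fragments


def read_sequence(fragments: list) -> str:
    """Order fragments by size (as the gel does) and read each labeled 3' base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


mixture = sanger_fragments("GATTACAGGC")
print(read_sequence(mixture))
```

The sequence read this way is that of the newly synthesized strand; the template strand is its reverse complement.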
What are SNPs and How are They Found?
A Single Nucleotide Polymorphism, or SNP (pronounced "snip") is a small
genetic change, or variation, that can occur within a person's DNA sequence. A single
base change found in at least 1% of an ethnically diverse population is defined as a SNP.
An example of a SNP is the alteration of the DNA segment AAGGTTA to
ATGGTTA. Because only about 3 to 5 percent of a person's DNA sequence codes for
the production of proteins, most SNPs are found outside of "coding sequences." SNPs
found within a coding sequence are of particular interest to researchers as they are more
likely to alter the biological function of a protein. Due to recent advances in technology,
coupled with the unique ability of these genetic variations to facilitate gene
identification, there has been a recent flurry of SNP discovery and detection.
Although many SNPs do not produce physical changes in people, scientists
believe that other SNPs may predispose a person to disease and even influence their
response to a drug regimen.
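Detecting a single-base difference between two aligned sequences of equal length is a position-by-position comparison. The sketch below uses the AAGGTTA / ATGGTTA example from the text; `find_snps` is a hypothetical helper, and real SNP discovery additionally requires alignment and population frequency data.

```python
def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. The sequences must already be aligned and of
    equal length; insertions and deletions are not handled here.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


for position, ref_base, sample_base in find_snps("AAGGTTA", "ATGGTTA"):
    print(f"position {position}: {ref_base} -> {sample_base}")
```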
Needles in a Haystack
Finding single nucleotide changes in the human genome seems like a daunting
prospect, but, over the last 20 years, advances in DNA sequencing and recombinant
DNA technology have made it possible to do just that. Selected regions of a DNA
sequence obtained from multiple individuals who share a common trait are compared.
Many common diseases in humans are caused by genetic variation within
genes, some influenced by complex interactions among multiple genes. Therefore, a
person may have a genetic predisposition, or the potential to develop a disease based
on genes and hereditary factors.
Genetic factors may also determine the severity or progression of disease.
Since we do not yet know all of the factors involved in these intricate pathways,
researchers have found it difficult to develop screening tests for most diseases and
disorders. By studying stretches of DNA that have been found to harbor a SNP
associated with a disease trait, researchers may begin to reveal relevant genes associated
with a disease.
Why are SNPs important to
pharmaceutical companies?
Estimate: Most commonly used drugs will only be effective
in 30% to 60% of patients with the same disease. In
addition, a subset of these patients may suffer side effects.
Adverse drug reactions have been reported to be in the top five
leading causes of death in the United States, with an
economic impact up to $100 billion annually.
Severe adverse effects have led to the withdrawal of blockbuster
drugs Rezulin, Seldane, Redux, and Pondimin.
Bringing a new drug to market is estimated to cost as much as
$500 million. Being able to predict a population’s response
to a drug would be invaluable to the pharmaceutical industry.
SNPs and Drug Interactions
Drug
Absorption in
the breast
Drug in
breast tissue
Metabolism
in the liver
Transportation in
the blood
Drug in
bloodstream
Transporter
Drug becomes
inactive or toxic
Excretion in
the kidney
SNP Profiles and Response to
Drug Therapy
Breast Cancer Patients
Individual SNP Profiles Are Sorted
Responds to Standard Drug Treatment
Does Not Respond to Standard Drug Treatment
SNP profile A
SNP profile B
SNP profile E
SNP profile C
SNP profile D
Cancer is many diseases
"WE ALREADY know that if we sample tumor tissues from 100 different women,
those tissues would have a molecular makeup that would break up into different
categories.
In essence, those patients [each] have a different disease, but we just happen to be
calling it the same thing--breast cancer," Conway says.
"We think we're going to subdivide diseases. Once we get people with the right disease
diagnosis, the disease definitions are going to change from 'You have breast cancer' to
'You have molecular profile A, B, C, or D.' The treatments of those diseases are going to
be different."
SNPs and Cancer
SNPs May Be the Solution
SNPs A
SNPs B
SNPs C
SNPs D
What Is Variation in the Genome?
Common Sequence
Variations
Polymorphism
Deletions
Insertions
Chromosome
Translocations
Variations Causing Latent Changes
= Variations in DNA that cause latent effects
Many years later
Many years later
Age-dependent Frequency in SNPs
Sequenom's scientists are interested in changes in the frequency of SNPs as the
population ages. "We take advantage of the fact that most human diseases are late-onset. Age is a major risk factor," Cantor says. "If young people are carrying a harmful
variation, they're still well, whereas an old person carrying that same variation has a
very high chance that he's been made sick or killed by it. You make the prediction that
variations that are harmful to health should decline in frequency as a function of age in
the healthy population."
One percent of genes appears to show an age-dependent frequency in SNPs,
Cantor says. He suspects that only 200 to 400 genes will be involved in disorders that
affect a major population. Finding these genes in healthy people, however, gives no
indication of what the diseases actually are. "After we find them in the healthy
population, we have to go back and look at biochemically stratified populations or
clinically stratified populations," he explains. "The advantage is that instead of having
to do all the genes with these tricky populations, we only have to do 200 to 400. We can
pay a lot more attention to the details."
Laboratory Experiment
Isolation of genomic DNA from
the bacterium Escherichia coli
(E. coli)
E. coli has a single double-stranded
DNA molecule as its genome.
There are 4,639,221 base pairs
in the E. coli genome.
Promega Wizard® Genomic DNA
Purification Kit
The Wizard Genomic DNA Purification Kit is designed for isolation of DNA from
white blood cells, tissue culture cells and animal tissue, plant tissue, yeast, Gram-positive and Gram-negative bacteria.
1.
Lyse (break open) the cells and the nuclei. An RNase digestion step may be
included at this time. Depending on the DNA isolation method used, RNA will
be co-purified with genomic DNA. Spectrophotometric measurements do not
differentiate between DNA and RNA, so RNA contamination can lead to
overestimation of DNA concentration. Treatment with RNase A will remove
contaminating RNA; this can either be incorporated into the purification
procedure or performed after the DNA has been purified.
2.
Remove the cellular proteins by a salt-precipitation step, which precipitates the
proteins but leaves the high molecular weight genomic DNA in solution.
3.
Concentrate and desalt the genomic DNA by isopropanol
precipitation.
Laboratory Experiment
1. Determination of the molar absorptivity of adenosine 5’-monophosphate (AMP)
2. Determination of the concentration of AMP in an aqueous solution
3. Determination of the concentration of DNA in purified E. coli
genomic DNA solution
4. Determination of the concentration of DNA in oligonucleotide
solutions
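The calculations behind these determinations follow the Beer-Lambert law, A = ε c l. The sketch below is an illustration only: the absorbance readings are hypothetical, the function names are assumptions, and the conversion of 50 µg/mL per A260 unit is the commonly used approximation for double-stranded DNA (single-stranded oligonucleotides need a different factor).

```python
def molar_absorptivity(absorbance: float, conc_molar: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law solved for epsilon (M^-1 cm^-1): epsilon = A / (c * l)."""
    return absorbance / (conc_molar * path_cm)


def concentration_molar(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law solved for concentration (M): c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_cm)


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0, factor: float = 50.0) -> float:
    """Estimate double-stranded DNA concentration from A260.

    `factor` is the assumed conversion of 50 ug/mL per absorbance unit.
    """
    return a260 * factor * dilution_factor


# Hypothetical readings in a 1 cm cuvette
epsilon_amp = molar_absorptivity(absorbance=0.77, conc_molar=5.0e-5)
unknown_amp = concentration_molar(absorbance=0.385, epsilon=epsilon_amp)
genomic_dna = dsdna_ug_per_ml(a260=0.20, dilution_factor=10)

print(f"epsilon(AMP): {epsilon_amp:.0f} M^-1 cm^-1")
print(f"AMP concentration: {unknown_amp:.2e} M")
print(f"Genomic DNA: {genomic_dna:.0f} ug/mL")
```

Because RNA also absorbs at 260 nm, an estimate made this way is only as good as the purity of the DNA preparation.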