010 lec stu analysis genetic info 2

1. ANALYSIS OF GENETIC INFORMATION II
a. Three key events radically transformed the field of genetics:
i. 1860’s: Mendel’s fundamental principles of genetics
ii. 1953: Watson & Crick, DNA structure
iii. 1990: Start and continuation of the Human Genome Project
b. Goal of the Human Genome Project: to _________________and analyze the human genome in
conjunction with the genomes of several model organisms.
c. What would they look at? A haploid human genome consisting of _______ _______________
______________________________ that would represent 99.9% of the information contained
in a diploid set of chromosomes.
i. The 24 DNA molecules contain a total of: ______________________ nucleotides
ii. Each molecule ranges in size from: _______________________________________.
2. GENOMICS
a. GENOMICS: The study of whole genomes.
b. GENOME: the sum total of genetic information in a particular cell or organism.
c. GENOME PROJECTS: A large-scale, often multi-laboratory effort required to sequence a
complex genome.
d. 2001: First ________________________________of sequence of human genome.
Problems: 1._________________________________________________________
2.__________________________________________________________
e. 2003: An accurate sequence covering ________________ of genome completed 2 years ahead
of schedule.
f. 2006: Published finished human genome sequence with greater than 99% coverage, and 99.99%
accuracy.
g. 2009: Whole genome sequences had been completed for ____________ distinct species as well
as identification of 2359 ________________ causing different _________________ disease.
h. 2013: Whole genome sequences were completed for greater than _______________ distinct
species.
3. Creating Genomic Libraries
a. Each colony present on the agar plates in this image contains a different recombinant plasmid
with a different fragment of the human genome.
b. IMPORTANT NOTE: Small fragments of human genomic DNA cannot reproduce themselves
in a cell. A "vector" must be used to manage the DNA. Why?
i. Vector: a vehicle for introducing transgenes into living cells.
ii. The vector and an inserted piece of foreign DNA (DNA from two different origins) together
form a recombinant DNA molecule.
c. Why #1. DNA sequences that promote replication must be present. DNA fragments can lack the
regulatory sequences that tell the cell what to do with this piece of DNA.
d. Why #2. There must be a method by which the vector signals its presence in the cell by
conferring a detectable property on the host cell (example: blue or white colonies).
e. RECOMBINANT DNA MOLECULES:
i. Both the human DNA and the vector DNA (bacterial in this image) have been cut with
the same restriction enzyme.
Bio 110
010 student
Genetic Analysis II
Beavers
Page 1 of 9
ii. "Sticky ends" produced by using the same restriction enzyme allow for complementary
base pairing – regardless of the origin of each fragment of DNA.
iii. The simplest vector is the plasmid; it is more useful if it has been engineered to contain
recognition sites for several different restriction enzymes.
iv. Each plasmid should have an origin of replication (ORI) permitting the vector to replicate
independently from the bacterium's chromosomal DNA.
v. Each plasmid should include a gene for antibiotic resistance to allow for selection.
vi. The plasmid should be able to be purified from the bacteria for study of the DNA (we do
this with a mini-prep).
vii. Larger vectors include artificial chromosomes:
1. BAC (bacterial artificial chromosome) holds a DNA fragment (insert) up to 300
kb.
2. YAC (yeast artificial chromosome) holds a DNA fragment (insert) up to 2000 kb
(2 Mb).
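A minimal Python sketch of why ends cut by the same restriction enzyme can base pair regardless of where the DNA came from. EcoRI (site GAATTC, cut G^AATTC) and BamHI (site GGATCC, cut G^GATCC) are assumed example enzymes, and the short sequences are made up for illustration.

```python
# Sketch: compatible sticky ends from the same restriction enzyme.
# Assumptions: EcoRI (G^AATTC) and BamHI (G^GATCC) as example enzymes;
# the DNA sequences below are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def overhang(seq: str, site: str, cut: int) -> str:
    """Single-stranded 5' overhang left where `site` first occurs in `seq`.

    `cut` is where the top strand is cut inside the site. For a palindromic
    site the bottom strand is cut at the mirror position, so the overhang is
    the middle stretch of the site.
    """
    start = seq.find(site)
    if start == -1:
        raise ValueError(f"{site} not found")
    return seq[start + cut : start + len(site) - cut]


def can_pair(end_a: str, end_b: str) -> bool:
    """Two 5' overhangs base pair if one is the reverse complement of the other."""
    return end_a == reverse_complement(end_b)


human_end = overhang("TTGCGAATTCAGCAT", "GAATTC", 1)   # human DNA, EcoRI
vector_end = overhang("CCTAGAATTCGGTAC", "GAATTC", 1)  # plasmid, EcoRI
other_end = overhang("ACGGATCCTA", "GGATCC", 1)        # plasmid, BamHI

print(human_end, vector_end, can_pair(human_end, vector_end))  # AATT AATT True
print(other_end, can_pair(human_end, other_end))               # GATC False
```

Because the recognition site is a palindrome, the AATT overhang is its own reverse complement, so any two EcoRI ends can pair; an end from a different enzyme generally cannot.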
4. ONE MORE LOOK AT SANGER SEQUENCING
a. Clones (colonies) present in a genomic library on an agar plate represent different human DNA
fragments. Their arrangement on the plate gives no indication of their relative order in the
genome.
b. Each insert (DNA fragment) present in vectors must be SEQUENCED.
c. Sequencing of the human genome was accomplished using the original method developed by
Fred Sanger in the 1970's once it had been automated.
d. Sequencing is based on hybridization – the natural tendency of complementary single-stranded
molecules of DNA or RNA to base pair and form double helices.
e. Key requirements for sequencing:
i. DNA polymerase (enzyme) to catalyze DNA replication.
ii. A template (a single strand of DNA)
iii. Deoxyribonucleotide triphosphates (dATP, dCTP, dGTP and dTTP) as building blocks
for the new strand
iv. A primer (complementary to part of the template) that provides the free 3' end to which
DNA polymerase can attach new nucleotides. A primer can only be made if the sequence it pairs
with is known. Because the sequences of the fragments inserted in vectors usually are not
known, primers are made complementary to the known vector sequence next to the insert, which
solves the issue.
f. Mechanism of sequencing
i. DNA polymerase produces a series of single-stranded fragments, using the unknown fragment
as a template.
ii. Each produced fragment differs in length by a SINGLE NUCLEOTIDE.
iii. The graduated set of fragments is called a nested array.
iv. Each fragment is identified by relative length and one of four terminating nucleotides.
1. The fragments are produced using normal deoxyribonucleotide triphosphates and
four dideoxyribonucleotide triphosphates; ddATP, ddCTP, ddGTP and ddTTP.
2. These dideoxyribonucleotide triphosphates LACK A 3'-OH (hydroxyl group), so each one
terminates the growing strand as soon as it is added.
3. These dideoxyribonucleotide triphosphates are labeled with four different color
fluorescent dyes.
v. DNA polymerase adds nucleotides to the millions of replicating strands and continues
with each strand until a dideoxyribonucleotide triphosphate (ddNTP, where the "N" stands for
any of the four bases) terminates replication.
vi. The mixture of variable-sized fragments is run on a polyacrylamide gel under electrophoresis
conditions that separate fragments differing in length by one nucleotide.
vii. A detector transmits information about the fluorescent signals of the DNA fragments to a
computer, which interprets the signals as a series of different-colored peaks. Each peak
represents a nucleotide COMPLEMENTARY to the original template fragment; the resulting
sequence is referred to as a READ.
1. READ: in a single DNA sequencing run, a digital file of the sequence of As, Cs,
Gs and Ts comprising the newly synthesized DNA.
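The chain-termination logic above can be sketched in a few lines of Python. This is an idealized model under stated assumptions: synthesis is error-free, every possible termination position is present (as it is with millions of strands), fragment lengths are counted from the primer's 3' end, and the template is hypothetical.

```python
# Sketch of dideoxy (Sanger) chain termination.
# Assumptions: error-free synthesis, every termination position is
# represented, lengths counted from the primer's 3' end, hypothetical template.
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nested_array(template_3to5: str) -> list[tuple[int, str]]:
    """All terminated fragments as (length, dye-labeled terminal ddNTP base).

    The template is written 3'->5', so the new strand is built left to right
    (5'->3') by pairing with each template base in turn.
    """
    new_strand = "".join(PAIR[base] for base in template_3to5)
    # A strand that incorporates a ddNTP at position n stops there, so there is
    # one fragment for every length, each ending in a labeled base.
    return [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]


def read_peaks(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis does, and read the dye colors."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGATC"  # written 3'->5'
fragments = nested_array(template)
random.shuffle(fragments)  # the reaction holds an unordered mixture
read = read_peaks(fragments)
print(read)  # ATGCCTAG, complementary to the template
```

Sorting by length plays the role of the gel: each fragment is one nucleotide longer than the last, so reading the terminal colors in size order spells out the new strand.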
5. SANGER SEQUENCING AUTOMATED
6. HUMAN GENOME PROJECT - Maps
a. Started with a genome-wide linkage map, then a physical map, ending with a sequence map.
b. Linkage maps, which we have reviewed previously, depict the distances BETWEEN loci as
well as the order in which they occur on the chromosome. This technique can map a small number of
loci in a relatively small region of the genome. The terms linkage map and genetic map can be
used interchangeably when talking about maps produced through analyses of recombination
frequencies.
c. Researchers have expanded on techniques to produce physical and sequence maps.
d. Physical map: a map of locations of identifiable landmarks on DNA, for example, restriction
enzyme cutting sites, genes.
e. Human genome lowest resolution physical map: Banding pattern on the 24 different
chromosomes.
f. A physical map is a constellation of overlapping DNA fragments that are ordered and oriented
and span each of the chromosomes in a genome.
g. They are the molecular counterpart of linkage maps.
h. Unlike linkage maps, which use recombination frequencies to map, physical maps are based on
direct analysis of genomic DNA.
i. They chart the actual base pairs (bp), kilobases (kb) or megabases (Mb) that define or
separate a locus from its neighbor.
ii. Humans: on average, 1 cM (or m.u.) on a linkage map corresponds to approximately 1 Mb of
physical map distance.
iii. 1 cM = 1% recombination frequency
iv. 1 Mb = 1 million base pairs
i. Short-range physical maps, produced by digesting DNA with multiple restriction enzymes and
probing for genes or markers, are a smaller-scale version of how physical maps are produced for
much larger stretches of chromosomal DNA, averaging hundreds of thousands of base pairs.
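A worked version of the human rule of thumb from this section, as a small Python sketch. It assumes the notes' average of 1 cM to about 1 Mb; real recombination rates vary along chromosomes, and recombination frequency only tracks map distance well for closely linked loci.

```python
# Sketch: rough human conversion between linkage-map and physical distance.
# Assumption: the average from the notes, 1 cM ~ 1 Mb (varies by region).

BP_PER_MB = 1_000_000
MB_PER_CM = 1.0  # human average (approximate)


def recombination_to_cm(recombination_frequency: float) -> float:
    """1% recombination frequency = 1 cM."""
    return recombination_frequency * 100


def cm_to_bp(cm: float) -> float:
    """Approximate physical distance in base pairs for a given map distance."""
    return cm * MB_PER_CM * BP_PER_MB


cm = recombination_to_cm(0.05)  # two loci recombine in 5% of offspring
print(cm, cm_to_bp(cm))         # 5.0 5000000.0
```

So two loci that recombine 5% of the time are about 5 cM apart, which in humans suggests they are roughly 5 million base pairs apart.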
7. KARYOTYPE – Maps
a. KARYOTYPE: (lowest resolution physical map) the visual description of the complete set of
chromosomes in one cell of an organism. Idiogram: black and white diagram of the
chromosomes converted from the light and dark bands observed under the microscope.
b. REVIEW if needed:
c. We learned previously that chromosomes at metaphase of mitosis can be stained with a Giemsa
dye and viewed under a light microscope.
d. The regions of dark and light called bands and interbands are used by cytogeneticists as:
landmarks.
e. The landmarks: distinguish each homologous chromosome pair from the other pairs.
f. Karyotype analysis produces: low-resolution physical maps that locate where on the
chromosome you might find:
i. Cloned genes
ii. Markers: an identifiable physical location on a chromosome, whose inheritance can be
monitored. Markers can be expressed regions of DNA (genes) or any segment of DNA
with variant forms that can be followed.
g. This visual description is termed a karyotype.
h. Autosomes are numbered in order of descending length.
i. Each band is numbered starting at the centromere and moving out along each arm (p and q)
toward the telomere.
j. Banding resolution can increase as staining techniques improve.
k. See chromosome 7 at 3 levels of resolution.
i. Cells to be examined must be capable of growth and rapid division in culture – most
accessible – white blood cells, specifically T-lymphocytes.
ii. Higher levels of resolution are achieved by different staining methods and timing of
staining.
iii. G-banding or R-banding is applied to chromosomes at prophase or prometaphase, when
they are still in a relatively uncondensed state. This is ideal for detecting subtle structural
abnormalities in the chromosome.
iv. Standard banding: ~450 total bands
v. Prometaphase banding: ~ 550-850 bands or more
8. SPECTRAL KARYOTYPING (SKY) - maps
a. Specialized application of FISH (fluorescent in situ hybridization).
b. FISH: a physical mapping approach that uses fluorescent tags to detect hybridization of nucleic
acid probes with chromosomes.
9. FISH - maps: map making and site of interest detection.
a. FISH can show the location of a particular DNA sequence within the genome by hybridizing a
single fluorescent probe to a chromosome.
b. Some example uses: mapping, detecting deletions or additions, even extra whole chromosomes.
10. HIGH RESOLUTION PHYSICAL MAPPING
a. The ultimate goal of high-resolution physical mapping is the generation of one large contig for
each chromosome.
b. CONTIG: (from the word contiguous): is a set of 2 or more overlapping cloned DNA fragments
that together cover an UNINTERRUPTED stretch of the genome.
c. Why not just read the DNA from one end to the other?
d. SEQUENCE ASSEMBLY: (a real challenge to researchers) The compilation of THOUSANDS
or MILLIONS of independent DNA sequence reads (i.e. sequence data) into a set of contigs and
scaffolds.
e. How do they build up all of the individual segments of DNA into a consensus sequence?
f. CONSENSUS SEQUENCE: The nucleotide sequence of a segment of DNA that is in agreement
with most sequence reads of the same segment from different individuals.
g. Overlapping contigs can produce sequence maps: maps that show the order of nucleotides in a
cloned piece of DNA.
h. PROBLEMS:
i. The length of segments is not the only problem to overcome. As in all experimental
observation, automated sequencing machines do not always give perfectly accurate
sequence reads and the error rate is not constant.
ii. To ensure accuracy, genome projects obtain multiple independent sequence reads of each
base pair of a genome. Tenfold coverage (10x) ensures that chance errors in the reads do
not give a false reconstruction of the consensus sequence.
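To see why multiple independent reads protect the consensus, here is a Python sketch of majority voting at about 10x coverage. It assumes the reads are already aligned base-for-base over the same segment (real assemblers must find those alignments first), and the reads and their errors are made up.

```python
# Sketch: majority-vote consensus at ~10x coverage.
# Assumptions: reads are pre-aligned, equal length, and hypothetical.
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Most common base at each position across aligned reads."""
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be aligned and the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


reads = [
    "GATTACA", "GATTACA", "GCTTACA",  # error at position 2
    "GATTACA", "GATTGCA",             # error at position 5
    "GATTACA", "GATTACA", "GATTACT",  # error at position 7
    "GATTACA", "GATTACA",
]
print(consensus(reads))       # GATTACA
print(consensus(reads[2:3]))  # GCTTACA: one read alone keeps its error
```

With ten reads, each chance error is outvoted at its position; with a single read, the error passes straight into the "consensus".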
11. WHAT STRATEGIES WERE EMPLOYED TO GET SEQUENCE MAPS?
a. Two strategies:
1. Hierarchical Strategy
2. Whole Genome Shotgun Sequencing
12. HIERARCHICAL SHOTGUN SEQUENCING STRATEGY – Map first, sequence later.
a. SHOTGUN: sequencing approach in which the overlapping insert fragments to be sequenced
have been randomly generated in one of three ways:
i. SOURCE OF DNA: BAC's, genome sonication (shearing DNA with sound), restriction
digest
1. Produce a genomic BAC library. (Bacterial Artificial Chromosome based on the
F (fertility) plasmid in bacteria.)
2. Develop map of overlapping BAC clones.
a. Large clone contigs are screened for similarities: restriction enzyme
recognition sites, short tandem repeats, or STSs (sequence-tagged sites: one-of-a-kind
markers that tag positions along the DNA molecule). Organize a minimal
tiling path (minimally overlapping regions).
3. Produce shotgun clones.
a. Choose a BAC insert to be sequenced and shear into ~2 kb fragments.
4. Sequence.
13. WHOLE-GENOME SHOTGUN SEQUENCING STRATEGY – Sequence first, map later.
1. Shear DNA into fragments of 3 different sizes to produce 3 libraries.
2. Sequence.
3. Assemble into maps, using sequence reads to build contigs and contigs to build scaffolds.
4. Use unique sequence overlaps found in sequence reads to build CONTIGS.
5. Paired-end reads can be used to span gaps to order and orient CONTIGS into
SCAFFOLDS.
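The idea of using sequence overlaps to build contigs can be sketched in Python with greedy overlap merging. This is a simplified stand-in, not the algorithm used by the genome projects: it assumes error-free reads from one strand and no repeats, and the genome and reads are hypothetical.

```python
# Sketch: assembling shotgun reads into a contig by greedy overlap merging.
# Simplified stand-in; assumes error-free, single-strand reads and no repeats.


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATTAGACCTGCCGGAATAC"
reads = ["GCCGGAATAC", "ATTAGACCTG", "CCTGCCGGA"]  # shuffled, overlapping
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The `min_len` threshold keeps short chance matches from joining unrelated reads; repeated sequences longer than a read defeat this simple approach, which is one reason paired-end reads are needed to order and orient contigs.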
14. WHOLE-GENOME SHOTGUN SEQUENCING
15. Celera’s three different fragment sizes provided spatial information about the clones.
16. CHALLENGES
a. SEQUENCING ERROR – all machines make errors
b. DISTINGUISHING SEQUENCE ERROR FROM POLYMORPHISMS
i. Polymorphism: variant of a gene or any genomic DNA sequence that has two or more
alleles.
c. REPEATED SEQUENCES – where do they belong?
d. UNCLONABLE DNA CANNOT BE SEQUENCED (heterochromatic)
i. A high proportion of the DNA located in regions of constitutive heterochromatin consists
of long stretches of simple repetitive sequences like SSR's (simple sequence repeats). In
addition, heterochromatic regions are often repositories for many transposable elements.
17. Sequenced yes. Understood? Not completely.
a. Which DNA sequences once ordered correspond to genes? Centromeres? Telomeres?
Transposable elements?
b. Clues for the location of genes – ORF’s and locating transcribed regions:
1. Open reading frames (stretches of nucleotides that have a reading frame of triplets
uninterrupted by a stop codon).
2. Any double-stranded DNA sequence has 6 possible reading frames (3 on each strand).
3. Other information is necessary to verify.
c. All genes are transcribed into RNA – even if they will never be translated.
i. Works well for RNA's that are abundant – like rRNA's.
ii. Less effective for RNA's that are relatively rare in a cell – like mRNA's.
iii. Once either is obtained – copy into DNA to study (cDNA).
d. Annotation of the genome: analyzing which DNA sequences perform which tasks.
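A Python sketch of scanning all 6 reading frames for open stretches. It follows the notes' definition of an ORF (triplets uninterrupted by a stop codon); many gene-finding tools additionally require an ATG start codon. The sequence is hypothetical.

```python
# Sketch: scanning the 6 reading frames for stretches without a stop codon.
# Uses the notes' ORF definition; the input sequence is hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons in the 3 frames of each strand, keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = [
                s[i : i + 3] for i in range(offset, len(s) - 2, 3)
            ]
    return frames


def longest_open_run(codons: list[str]) -> int:
    """Length (in codons) of the longest run with no stop codon."""
    best = run = 0
    for codon in codons:
        run = 0 if codon in STOP_CODONS else run + 1
        best = max(best, run)
    return best


frames = six_frames("ATGAAACCCGGGTTTTAA")
for name, codons in frames.items():
    print(name, codons, longest_open_run(codons))
```

A long open run is only a clue; as the notes say, other information (such as evidence that the region is transcribed) is needed to verify a gene.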
18. GENETIC VARIATION
a. Genomes of Watson, Venter and an unknown Chinese man reveal in total more than 5.6 million
single nucleotide differences from the “standard” human genome.
b. No standard human genome length.
c. Until DNA could be evaluated on a molecular basis, all wild-type individuals were presumed to
have the same alleles.
d. It was found that wild-type individuals of the same species could produce variant forms of
proteins, encoded by variant alleles.
e. This is the origin of the term polymorphic:
f. Polymorphic: a locus with two or more distinct alleles in a population.
g. Genetic variants: describes alleles of a polymorphic locus.
h. Polymorphism: variant of a gene or noncoding region that has two or more alleles. Molecular
geneticists use this term to describe a variant of a locus within a population of organisms that has
two or more alleles. Population geneticists reserve the term for variants at a locus where two or
more alleles are present at a frequency of 1% or greater; for example, to describe the alternative
forms of a gene that has more than one wild-type allele.
i. Locus: any location (gene or not, single base pair or millions of base pairs) in the genome that is
defined by chromosomal coordinates, regardless of biological function.
j. In light of the broadened view of locus, we must recognize that: an allele of any locus is a
variation in the DNA sequence itself, even if it has no impact on the expression of any trait.
Whether an allele is functional or not makes no difference to the manner in which a locus is
transmitted from one generation to the next.
k. Researchers can use nonfunctional loci as genetic markers to identify, locate, isolate, and follow
the transmission of nearby genes.
l. Anonymous DNA polymorphisms: differences in genomic DNA sequence with no effect on gene
function.
19. SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs)
a. SNPs: the simplest and most useful class of genetic variant.
b. SNPs defined: single nucleotide polymorphisms. Particular base positions in the genome where
alternative letters of the DNA alphabet commonly distinguish some people from others.
c. SNPs arise from: mistakes in DNA replication, or exposure to mutagenic chemicals or radiation.
d. Account for most of the total variation that exists between human genomes.
e. Occur on average once every 1000 bases.
f. Derived allele: an allele that arises through mutation.
g. Ancestral allele: allele carried by last common ancestor of two species.
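A small Python sketch of finding single-base differences between two people's sequences, plus the rough expectation from the one-SNP-per-1000-bases average. It assumes the sequences are already aligned with no insertions or deletions and that every difference is real (telling SNPs from sequencing errors is a separate problem); the sequences are hypothetical.

```python
# Sketch: listing single-base differences between two aligned sequences.
# Assumptions: aligned, no InDels, no sequencing errors; hypothetical data.


def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """1-based positions where the two sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_snps(region_length: int, bases_per_snp: int = 1000) -> float:
    """Rough expectation using the average of one SNP per ~1000 bases."""
    return region_length / bases_per_snp


print(snp_positions("ACGTTGCAAC", "ACGTCGCAAT"))  # [5, 10]
print(expected_snps(50_000))                      # 50.0
```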
20. RESTRICTION SITE-ALTERING SNPs DETECTED BY SOUTHERN BLOT OR PCR
a. Because the alleles of a SNP locus are well-defined, single-base changes in DNA sequence, they
can be distinguished by a variety of molecular methods:
i. Restriction enzyme digest
ii. Gel Electrophoresis
iii. Southern Blotting
iv. PCR
v. Allele specific oligonucleotide hybridization: allows short hybridization probes to
distinguish single base mismatches under the correct experimental conditions.
vi. DNA Microarrays – which allow labs to detect SNP alleles at over 1 million locations
(loci) for a few hundred dollars.
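A Python sketch of how a SNP that destroys a restriction site changes the fragment sizes seen after digestion and gel electrophoresis. An EcoRI-like enzyme (G^AATTC) is assumed as the example, lengths are counted on the top strand, and the two alleles are hypothetical.

```python
# Sketch: how a SNP inside a restriction site changes fragment sizes.
# Assumptions: EcoRI-like enzyme (G^AATTC), top-strand lengths, hypothetical alleles.


def digest(seq: str, site: str = "GAATTC", cut: int = 1) -> list[int]:
    """Fragment lengths after cutting at every occurrence of `site`."""
    cut_points = []
    start = seq.find(site)
    while start != -1:
        cut_points.append(start + cut)
        start = seq.find(site, start + 1)
    bounds = [0] + cut_points + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


allele_1 = "CCTAGGAATTCTTGACGTAC"  # intact site
allele_2 = "CCTAGGGATTCTTGACGTAC"  # A -> G SNP destroys the site
print(digest(allele_1))  # [6, 14]
print(digest(allele_2))  # [20]
```

One allele gives two shorter fragments and the other a single longer one, so a heterozygote would show bands from both alleles on a blot probed across the site.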
21. InDels or DIPs (Deletion-Insertion polymorphisms)
a. Genetic variation can be caused by addition or loss of DNA.
b. This includes deletion or duplication of genetic material within chromosomes, or insertion of
new material into them.
c. It can be the loss or gain of one base pair all the way to the loss or gain of multiple megabases
(millions of bases).
d. The second most common form of genetic variation in the human genome is represented by
InDels or DIPs (deletion-insertion polymorphisms).
e. DIPs (Indels): short insertions or deletions of genetic material. Range in length from one base
pair to hundreds of base pairs.
f. Within a coding region, an InDel can cause a frameshift mutation if its length is not 3
nucleotides or a multiple of 3.
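A short Python sketch of why InDel length decides whether the reading frame shifts, using a hypothetical coding sequence.

```python
# Sketch: why InDel length decides whether the reading frame shifts.
# The coding sequence is hypothetical.


def is_frameshift(indel_length: int) -> bool:
    """An InDel shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete triplets."""
    return [seq[i : i + 3] for i in range(0, len(seq) - 2, 3)]


def delete(seq: str, start: int, length: int) -> str:
    return seq[:start] + seq[start + length :]


original = "ATGAAACCCGGGTTT"
print(codons(original))                # ['ATG', 'AAA', 'CCC', 'GGG', 'TTT']
print(codons(delete(original, 3, 1)))  # ['ATG', 'AAC', 'CCG', 'GGT'] (frameshift)
print(codons(delete(original, 3, 3)))  # ['ATG', 'CCC', 'GGG', 'TTT'] (in frame)
```

Deleting one base changes every codon downstream of the deletion, while deleting three removes one codon and leaves the rest intact.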
22. SSRs: Simple Sequence Repeats (Microsatellites)
a. SSRs (Simple Sequence Repeats) have more of a size differential and can be easily detected by
PCR and gel electrophoresis.
b. SSRs: sequences of one to a few bases that are REPEATED in tandem from fewer than 10 to more
than 100 times.
c. Also called - STRP: short tandem repeat polymorphism
d. Human genomes, as well as those of other complex organisms, are loaded with loci defined by
simple sequence repeats.
e. DETECTION BY PCR AND GEL ELECTROPHORESIS:
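A Python sketch of why SSR alleles are easy to tell apart by PCR and gel electrophoresis: the product size depends on the repeat number. It assumes the primers anneal to the unique flanking sequences at each end, so the PCR product is the whole stretch shown; the flanks and repeat counts are hypothetical.

```python
# Sketch: an SSR allele's PCR product size depends on its repeat number.
# Assumptions: primers sit at the ends of the unique flanks; hypothetical sequences.
import re


def repeat_count(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


LEFT_FLANK, RIGHT_FLANK = "GGTTT", "TTGAC"
allele_a = LEFT_FLANK + "CA" * 12 + RIGHT_FLANK
allele_b = LEFT_FLANK + "CA" * 15 + RIGHT_FLANK

for allele in (allele_a, allele_b):
    print(repeat_count(allele, "CA"), "repeats ->", len(allele), "bp PCR product")
```

The two alleles differ by 3 CA repeats, so their products differ by 6 bp, a size difference that gel electrophoresis can resolve.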
23. SSRs, PCR and DNA Fingerprinting
a. Using the product rule for independent assortment: the likelihood that any two random
individuals share exactly the same combination of two alleles at a particular SSR locus is 10%.
b. Same combination of alleles at a second SSR locus? 0.10 x 0.10 = 0.01 (1%)
c. How about at the 13 positions used for DNA fingerprinting? (0.10)^13 = 10^-13 = 1 chance in 10
trillion – right now the earth only has approximately 7 billion human beings.
d. The FBI maintains CODIS – a database of DNA fingerprints using the same 13 SSR loci.
e. All 50 states mandate collection of DNA fingerprint data from felons convicted of certain crimes
such as sexual offenders. Also includes missing persons.
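The product-rule arithmetic can be checked exactly with Python fractions. This sketch uses the 10%-per-locus figure and the 7 billion population given in the notes, and assumes the loci are unlinked so that matches at different loci are independent.

```python
# Sketch: the product rule applied to SSR matches.
# Assumptions: 10% per locus and 7 billion people (from the notes), unlinked loci.
from fractions import Fraction

P_MATCH_PER_LOCUS = Fraction(1, 10)
N_LOCI = 13
WORLD_POPULATION = 7_000_000_000

p_two_loci = P_MATCH_PER_LOCUS**2
p_all_loci = P_MATCH_PER_LOCUS**N_LOCI

print(p_two_loci)                         # 1/100
print(1 / p_all_loci == 10**13)           # True: 1 chance in 10 trillion
print(1 / p_all_loci > WORLD_POPULATION)  # True
```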
24. COPY NUMBER VARIANTS (CNVs), COPY NUMBER POLYMORPHISMS (CNPs)
a. CNVs or CNPs are a category of genetic variation arising from LARGE regions of duplication or
deletion; which term is used depends on the frequency of occurrence in a population.
b. Length of a CNV (a large block of genetic material with a variable repeat number) is 10 bp to 1
Mb per repeat.
c. People possess variation in the number and type of olfactory receptor genes they have.
d. Unequal crossing over is a potential cause of CNVs.
25. Minisatellites -DNA Fingerprinting with Restriction Enzymes
a. Many definitions for Minisatellites: one describes tandem repeat arrays with a total length in the
range of 500 bp to 20 kb.
b. Perhaps best generally defined as a VNTR (variable number tandem repeat): A type of DNA
polymorphism created by a tandem arrangement of multiple copies of short DNA sequences.
c. The POWER of Minisatellites: particular minisatellite sequences often occur at a small number
of different genomic loci. (Microsatellites are at thousands of locations in the genome.)
d. Using restriction enzyme digest, gel electrophoresis, and Southern blot hybridization, researchers
can look simultaneously at allelic variation at multiple UNLINKED loci.
26. PREIMPLANTATION GENETIC DIAGNOSIS
a. A technique that allows couples to establish the genotype of an embryo before it is implanted:
the woman's harvested eggs are fertilized in vitro with her partner's sperm. At approximately the
6-10 cell stage, one cell is removed from each viable embryo for testing.
27. DNA MICROARRAY
a. A DNA array is a large set of DNA fragments displayed on a solid support (like a glass chip the
size of a microscope slide coverslip).
b. The goal: to analyze THE LEVELS OF EXPRESSION of the individual genes that are
represented on the chip using the DNA samples being hybridized.
c. Possible uses: (not an exhaustive list!!)
d. Compare tissue- and cell type-specific gene expression.
e. Compare developmental stage-specific gene expression.
f. Compare gene expression during differentiation or even tumor genesis.
g. Analyze two different cDNA's taken from one kind of cell at DIFFERENT phases of the cell
cycle.
h. Analyze inducible gene expression (how cells respond to environmental changes: hormones, heat
shock, chemicals).
i. The list goes on and on…
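One common way to express a comparison between two samples, such as one cell type at two phases of the cell cycle, is a log2 ratio of spot intensities per gene. The Python sketch below assumes a two-sample (two-color) comparison with background-corrected, normalized intensities; the gene names and values are hypothetical.

```python
# Sketch: comparing expression between two samples hybridized to one array.
# Assumptions: two-sample comparison, normalized intensities, hypothetical genes.
import math


def log2_ratios(sample_a: dict[str, float], sample_b: dict[str, float]) -> dict[str, float]:
    """log2(A/B) for each gene: +1 means twice as much in A, -1 half as much."""
    return {gene: math.log2(sample_a[gene] / sample_b[gene]) for gene in sample_a}


phase_1 = {"geneX": 800.0, "geneY": 150.0, "geneZ": 500.0}
phase_2 = {"geneX": 200.0, "geneY": 600.0, "geneZ": 500.0}
print(log2_ratios(phase_1, phase_2))  # {'geneX': 2.0, 'geneY': -2.0, 'geneZ': 0.0}
```

The log scale makes a 4-fold increase and a 4-fold decrease symmetric (+2 and -2), and an unchanged gene sits at 0.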
28. What if it were abnormalities detected in ultrasound?
a. Signature genomics.
29. WGS vs. WES and Massively Parallel Sequencing
a. WGS: Whole genome sequencing
b. WES: whole-exome sequencing: sequencing of only genomic DNA corresponding to exons.
c. Exome: the portion of a genome corresponding to exons; in humans, the exome is approximately
2% of the genome.
d. Based on Sanger sequencing but with some new additions:
i. Individual DNA molecules being synthesized by DNA polymerase are anchored in one
place.
ii. These methods control the timing of base addition so that each base can be identified
before the next base is added.
iii. In some systems, the sensitivity of detection is so high that a single molecule of DNA can
be monitored without the need for cloning or PCR amplification steps.
iv. SEQUENCING MACHINES ARE ABLE TO RECORD THE SUCCESSIVE
ADDITION OF NUCLEOTIDES TO EACH OF THE MILLIONS OF GROWING DNA
MOLECULES IN REAL TIME.
v. How?
1. Millions of fragments of single-stranded genomic DNA, to which poly-A has
been enzymatically added at the 3' end, are hybridized to oligo-dT molecules
attached to the surface of a special microarray called a flowcell.
2. Using the genomic fragment as a template and the oligo-dT as a PRIMER,
DNA polymerase synthesizes new DNA from nucleotides carrying colored, base-specific
fluorescent tags. These nucleotides are also blocked at their 3' ends so that only
one nucleotide can be added at a time. This chemical block is reversible.
3. After a high-resolution camera photographs the fluorescence, chemicals applied
to the flowcell remove the tag and blocking group from the just-added nucleotide.
4. Each subsequent cycle begins by infusing the flowcell with a new dose of
tagged nucleotides and polymerase, and is followed by an iteration of step 3.
5. The sequencing machine takes about 100 pictures that record a sequence of
colored flashes at each of millions of spots on a flowcell where a single DNA
molecule is being synthesized.
6. The machine's computer rearranges the data into millions of short sequence reads
of about 100 nucleotides, and then assembles the genome sequence.
7. Wow.
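A Python sketch of how the per-cycle pictures become reads. It assumes a made-up dye-to-base assignment (real instruments use their own schemes), one base added per cycle because of the reversible 3' block, and no errors; the image data are hypothetical.

```python
# Sketch: turning per-cycle flowcell images into reads.
# Assumptions: hypothetical dye-to-base assignment, one base per cycle, no errors.

DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_reads(images: list[list[str]]) -> list[str]:
    """images[cycle][spot] is the color photographed at that spot in that cycle.

    Reading each spot's colors in cycle order gives the sequence of the newly
    synthesized strand, which is complementary to the template at that spot.
    """
    n_spots = len(images[0])
    return ["".join(DYE_TO_BASE[cycle[spot]] for cycle in images) for spot in range(n_spots)]


images = [
    ["green", "red"],     # cycle 1
    ["blue", "red"],      # cycle 2
    ["yellow", "green"],  # cycle 3
]
print(call_reads(images))  # ['ACG', 'TTA']
```

With about 100 cycles and millions of spots, the same bookkeeping yields millions of reads of about 100 nucleotides.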