Gene Discovery

MEDG520
Block 1
Gene Discovery
Concepts:
Describe the different modes of inheritance
X-linked diseases – what are their unique inheritance and mutation patterns,
and how can disease arise in a female?
Describe X chromosome inactivation
Describe Duchenne Muscular Dystrophy (DMD)
Structure of an 'average gene' – is DMD average?
How do you positionally clone a gene – circa 1985?
How do you positionally clone a gene – circa 2003?
Discuss how one would go about positionally cloning a gene, including
knowledge of how to use:
o Southern
o FISH
o DNA sequencing
o somatic cell hybrids
o PCR
o subtractive or competitive hybridizations (CGH)
o microarrays
o vectors for cloning DNA
Discuss the composition of the human genome including:
o gene structure
o polymorphisms
o repetitive DNA elements
Mutation frequency and human populations
Discuss the different types of mutations that can lead to human disease.
Discuss variables that may influence mutation frequency across the genome
Compare differences in mutation frequency between males and females
Techniques: PCR and the challenge of single sperm typing
Techniques: DNA sequencing
RNA interference
Chromatin Structure
Explain the differences between active and silent chromatin in the human
genome.
Explain how chromatin structure can result in position effects
LCRs
DNA methylation and gene silencing
o CpG Islands
Techniques: microarrays
Techniques: CGH and array CGH
Variations in DNA copy number in normal individuals, individuals with
genetic diseases or cancers




Variations in RNA expression levels in normal individuals, individuals with
genetic diseases or cancers
Discuss normal and disease-related alterations in gene copy number and gene
expression
Describe approaches to detect changes in gene copy number and expression.
Other Definitions
Describe the different modes of inheritance
Autosomal Dominant Inheritance
Mutation Location:
Autosomal Chromosome
Genetic transmission: Individuals possessing one copy of a mutation will be affected.
Examples:
Huntington's Disease
Charcot-Marie-Tooth Disease Type 1
Spinocerebellar Ataxia
Myotonic Dystrophy
Characteristics of Autosomal Dominant Inheritance:


The child of an affected parent has a 50% chance of inheriting the parent's mutated allele and thus
being affected with the disorder.
A mutation can be transmitted by either the mother or the father. All children, regardless of gender,
have an equal chance of inheriting the mutation.
Addendum to principle of Autosomal Dominant Inheritance:

Some autosomal dominant disorders may be characterized by reduced penetrance, i.e., an
individual may inherit a mutation and not manifest clinical symptoms. However, these individuals
may transmit the mutation and have affected offspring.
Example of an Autosomal Dominant Pedigree:
(Darkened circles and squares indicate
affected individuals.)
Autosomal Recessive Inheritance
Mutation Location:
Autosomal Chromosome
Genetic transmission: Individuals possessing two copies of a mutation will be affected.
Examples:
Spinal Muscular Atrophy (SMA)
Friedreich's Ataxia
Characteristics of Autosomal Recessive Inheritance:



An individual will be a "carrier" if they possess one mutated allele and one normal gene copy.
There is a 50% chance that a carrier will transmit a mutated gene to a child.
If two carrier parents have a child there is a:
o 25% chance that both will transmit the mutated gene; in this case, the child will inherit only
mutated copies of the gene from both the mother and the father and thus will be affected
with the disorder.
o 50% chance that one carrier parent will transmit the mutated gene and the other will
transmit the normal gene; in this case, the child will have one mutated gene and one
normal gene and will be a carrier of the disorder.
o 25% chance that both carrier parents will transmit the normal gene; in this case the child
will have only normal genes and will not be affected and will not be a carrier.
This Punnett square illustrates the possible genetic combinations for the child of
two carrier parents. A capital "A" indicates a normal gene copy. A lower case "a"
indicates the mutated gene. The carrier parents (Aa) are indicated outside the
square. The possible offspring combinations are denoted within the 4 squares: 1
in 4 (25%) possibility that the child will be affected (aa); 2 in 4 (50%) that the
child will be a carrier (Aa); 1 in 4 (25%) that the child will have only normal
genes (AA).
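The Punnett square described above can also be enumerated in a few lines of code. This is a minimal sketch, assuming each parent passes on either allele with probability 1/2; the function name punnett is chosen here for illustration and does not come from the course material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1: str, parent2: str) -> dict:
    """Return genotype -> probability for a single-gene cross.

    Each parent is a two-letter genotype such as "Aa"; every allele is
    assumed to be transmitted with probability 1/2.
    """
    # Pair every allele of parent 1 with every allele of parent 2 and
    # normalise the order so that "aA" and "Aa" count as one genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two carrier parents (Aa x Aa), as in the Punnett square above.
offspring = punnett("Aa", "Aa")
for genotype, prob in sorted(offspring.items()):
    print(genotype, prob)
```

The same function covers other crosses: punnett("aa", "AA") returns only Aa, i.e. every child of an affected parent and a non-carrier parent is a carrier.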


All children of an affected individual will be at least carriers of the disorder (each inherits one mutated copy).
A mutation can be transmitted by either the mother or the father. All children, regardless of gender,
have an equal chance of inheriting mutations.
Example of an Autosomal Recessive
Pedigree: (The half-filled circles and squares
represent carriers of an autosomal recessive
genetic disease. The fully blackened square
represents an affected individual.)
X-Linked Dominant Inheritance
Mutation Location:
X-Chromosome
Genetic transmission: Individuals possessing one copy of a mutation will be affected.
Examples:
Charcot-Marie-Tooth Disease Type X1
Characteristics of X-Linked Dominant Inheritance:




A male or female child of an affected mother has a 50% chance of inheriting the mutation and thus
being affected with the disorder.
All female children of an affected father will be affected (daughters possess their fathers' X chromosome).
No male children of an affected father will be affected (sons do not inherit their fathers' X chromosome).
If an X-linked dominant disease is lethal in males, it is observed only in females.
Example of an X-Linked Dominant Pedigree:
(Darkened circles and squares indicate
affected individuals.)
X-Linked Recessive Inheritance
Mutation Location:
X-Chromosome
Genetic transmission: Individuals possessing no normal gene copies will be affected; typically, only males
are affected.
Examples:
Duchenne/Becker Muscular Dystrophy
Norrie Disease
Spinal and Bulbar Muscular Atrophy (Kennedy's Disease)
Characteristics of X-Linked Recessive Inheritance:





Females possessing one X-linked recessive mutation are considered carriers and will generally not
manifest clinical symptoms of the disorder.
All males possessing an X-linked recessive mutation will be affected (males have a single X chromosome and therefore have only one copy of X-linked genes).
All offspring of a carrier female have a 50% chance of inheriting the mutation.
All female children of an affected father will be carriers (daughters possess their fathers' X chromosome).
No male children of an affected father will be affected (sons do not inherit their fathers' X chromosome).
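The X-linked recessive rules listed above can be checked by enumerating the four equally likely combinations of parental gametes. This is a rough sketch only: the allele labels ("X" normal, "x" mutated, "Y") and the function name x_linked_cross are conventions invented for this example, and skewed X-inactivation in carrier daughters is ignored.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def x_linked_cross(mother: tuple, father: tuple) -> dict:
    """Offspring (sex, status) probabilities for an X-linked recessive trait.

    Alleles: "X" = normal X, "x" = X carrying the mutation, "Y" = Y chromosome.
    mother is a pair of X alleles; father is (his single X allele, "Y").
    """
    outcomes = Counter()
    for from_mother, from_father in product(mother, father):
        if from_father == "Y":
            # Sons get their only X from the mother.
            sex = "son"
            status = "affected" if from_mother == "x" else "unaffected"
        else:
            # Daughters get one X from each parent.
            sex = "daughter"
            n_mutant = (from_mother, from_father).count("x")
            status = ("unaffected", "carrier", "affected")[n_mutant]
        outcomes[(sex, status)] += 1
    total = sum(outcomes.values())
    return {key: Fraction(n, total) for key, n in outcomes.items()}


carrier_mother = x_linked_cross(("X", "x"), ("X", "Y"))
affected_father = x_linked_cross(("X", "X"), ("x", "Y"))
print(carrier_mother)
print(affected_father)
```

For the affected father, the only outcomes are carrier daughters and unaffected sons, which matches the last two rules in the list.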
Example of an X-Linked Recessive Pedigree:
(Darkened squares indicate affected males;
dark circles within clear circles indicate carrier
females.)
Mitochondrial Inheritance
Mutation Location:
Mitochondrial DNA
Genetic transmission: Dependent on proportion of normal and mutated mitochondrial DNA (mtDNA).
Examples:
Kearns-Sayre Syndrome
MELAS - Mitochondrial Myopathy, Encephalopathy, Lactic Acidosis, and Stroke-Like Episodes
MERRF - Myoclonic Epilepsy with Ragged Red Fibers
Characteristics of Mitochondrial Inheritance:




Mitochondrial DNA is inherited from the mother only (maternal inheritance). Fathers do not
contribute mtDNA to their offspring.
All children of a mother with a mtDNA mutation are at risk to be either affected with the disorder or
asymptomatic carriers of the disorder.
An individual will be affected with a mitochondrial disorder if the percentage of mitochondria
possessing mutated mtDNA reaches a threshold value beyond which the normal mtDNA does not
compensate for the mutated mtDNA.
The mixture of mitochondria possessing mutated mtDNA and mitochondria with normal DNA is
referred to as heteroplasmy.
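The threshold idea can be illustrated with a toy simulation. Everything numeric here is an assumption made for the example: the 60% threshold, the sample of 20 mtDNA molecules per child, and the mother's 40% mutant load are not values from the notes, and real thresholds differ by mutation and tissue.

```python
import random


def offspring_heteroplasmy(mother_fraction, n_sampled=20, n_children=5, seed=0):
    """Simulate the mutant mtDNA fraction passed to each child.

    Each child receives n_sampled mtDNA molecules drawn at random from the
    mother (a crude sampling assumption); the returned value is the fraction
    of those molecules that carry the mutation.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_children):
        mutant = sum(rng.random() < mother_fraction for _ in range(n_sampled))
        fractions.append(mutant / n_sampled)
    return fractions


def is_affected(mutant_fraction, threshold=0.6):
    """Affected once the mutant fraction reaches the (assumed) threshold."""
    return mutant_fraction >= threshold


children = offspring_heteroplasmy(mother_fraction=0.4)
for fraction in children:
    label = "affected" if is_affected(fraction) else "below threshold"
    print(f"{fraction:.2f}", label)
```

The point of the sketch is only that siblings of the same mother can end up on different sides of the threshold, which is why all her children are described as at risk.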
Example of a Mitochondrial Inheritance
Pedigree: (Darkened circles and squares
indicate clinically affected individuals. Squares
and circles with gray shading indicate
presence of familial mitochondrial mutation in
varying proportions.)
X-linked diseases – what are their unique inheritance and mutation patterns, and
how can disease arise in a female?
 See modes of inheritance above
Describe X chromosome inactivation
 inactivation of X chromosome occurs early in embryonic life – soon after
fertilization
 in any female somatic cell either paternal or maternal X is inactive – matter of
chance
 once one X is inactive, all cells descended from that cell have the same inactive X
o Inactivation is random but permanent.
 X inactivation happens to compensate dosage – Lyon hypothesis
 But some regions of short arm and at least one region of long arm escape
inactivation
 Therefore two copies are expressed in females for genes in these regions
 X-linked genes that escape inactivation fall into 3 groups:
1. pseudoautosomal region – genes on very distal short and long arms; these are
matching to sequences on the Y chromosome and therefore males and females
have 2 copies of genes expressed
2. located outside pseudoautosomal region – have related copies on Y
chromosome and therefore have equal dosage in males and females
3. located outside pseudoautosomal region but no copies on Y chromosome and
therefore females have higher dosage; genes still expressed even though X is
inactive
 X-inactivation can vary (ie the inactive X can be predominantly normal and the
active predominantly mutant or vice versa)
 Therefore heterozygotes can vary with respect to symptoms of disease
 Manifesting heterozygote – deleterious allele is on active X and normal allele on
inactive X in most cells = skewed X-inactivation
Describe Duchenne Muscular Dystrophy (DMD)
 deletions in dystrophin gene – exons 45-48
 X-linked recessive
 Incidence of 1/3500 male births
 DMD mutations include:
o large deletions 50-70%
o large duplications 5-10%
o small deletions, small insertions and nucleotide changes 25-30%
 nucleotide changes occur throughout gene but predominantly at CpG
 de novo nucleotide changes arise during spermatogenesis
 age of onset and severity in females depends on degree of skewing of X-inactivation
 1/3 of mothers with a single affected son are not carriers of a mutation in the DMD
gene
 if a daughter is a carrier, she has a low risk of developing Duchenne, but her risk of
cardiac abnormalities is 50-60%
 if a mother is apparently not a carrier, she still has a 7% risk of having a boy with
Duchenne due to germline mosaicism
 DMD is a genetic lethal in males – affected males fail to reproduce
 DMD mutation rate is 10^-4
 some females have DMD due to an X-autosome translocation or having only one X chromosome (Turner Syndrome)
 DMD gene is huge – 2300 kb (1.5% of X chromosome)
 DMD gene has 79 exons, 7 tissue-specific promoters, differential splicing
 60% of patients have deletions in 2 spots – the 5' half or the central region
 mRNA length 13000 bp
 DMD has alternative promoters – for tissue specificity
 When DMD patients have mental retardation, it is due to disruption of alternative
promoters
 5' ends often have deletions
 deletions can shift the entire reading frame – once the message is read out of frame, each
codon has roughly a 1/20 chance (3 of 64 codons) of being a stop codon
 Becker muscular dystrophy (BMD) is a milder form
o gene still partially functional
o deletion but in frame (1 in 3 chance of being in frame)
o because DMD is such a huge gene, a small in-frame deletion is not as
severe
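The reading-frame argument reduces to simple arithmetic, sketched below. It assumes the deleted base pairs are all coding sequence and that out-of-frame codons behave like random triplets; the names is_in_frame, P_STOP and p_no_stop are invented for this illustration.

```python
from fractions import Fraction


def is_in_frame(deleted_bp: int) -> bool:
    """A coding deletion preserves the reading frame only if its length
    is a multiple of 3 (hence the 1 in 3 chance for a random deletion)."""
    return deleted_bp % 3 == 0


# 3 of the 64 possible codons (TAA, TAG, TGA) are stop codons, so a random
# out-of-frame codon is a stop with probability 3/64, roughly 1 in 20.
P_STOP = Fraction(3, 64)


def p_no_stop(n_codons: int) -> float:
    """Chance of reading n random codons without meeting a stop codon."""
    return float((1 - P_STOP) ** n_codons)


for size in (300, 301, 302):
    print(size, "in frame" if is_in_frame(size) else "frameshift")
print(f"P(no stop within 50 codons) = {p_no_stop(50):.3f}")
```

Under these assumptions a frameshift almost always truncates the protein within a few dozen codons, whereas an in-frame deletion can leave a shortened but partially functional protein, as in the Becker form.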
Structure of an 'average gene' – is DMD average?
 What is a gene?
o DNA sequence that carries information
o No ‘one’ definition
o genes contain exons, introns, enhancers, promoters, splice sites
o genes make a protein, protein made from ORF (open reading frame)
o when looking for a gene, want to know where things have gone wrong
o UTR (untranslated region)
o TATA box (consensus TATAAA) – promoter element located ~25-30 bp upstream of the
transcription start site; allows transcription to start
 Are there additional exons downstream of polyA signal?
o NO – nonsense mediated RNA decay?
o System in cell to eliminate extra exons that could lead to alternative splice
sites
o Recognizes stop and alternative splice sites after stop codon
Structure of an average gene:
 At genomic level, consists of exons interspersed with introns, demarcated by
splice sites.
 Is associated with upstream and downstream regions that contain important
features like transcription factor binding sites, promoters, enhancers, etc.
 Produces a transcript (mRNA) with a 5' cap, 5' UTR, start codon, CDS,
stop codon, 3' UTR, and polyA tail in that order.
 Translated into some protein.
 Average span: 27kb
 Average number of exons: 3 to 8 (depending on source)
 Many genes give rise to a few different transcripts as a result of alternative
splicing, transcription initiation, or transcription termination sites.
Is DMD average? Not at all!
 Very large: spans 2.4 Mb and produces a transcript of ~14 kb.
 79 exons
 Mutations are unusually common, and are usually translation-terminating mutations.
 Highly complex, containing at least 8 independent, tissue specific promoters and
two polyA addition sites.
 Has a larger than normal number of splice variants encoding numerous protein
isoforms.
 Large variability in intron size, from a few kb to 180 kb.
 Recombination rate 4x higher than predicted for gene of its size.
How do you positionally clone a gene – circa 1985?
See Use of competitive hybridization in positional cloning below for an example
How do you positionally clone a gene – circa 2003?
Definition: Positional cloning refers to identifying genes based on their chromosomal
location.
Since 1985 (when the DMD gene was first cloned) new techniques have emerged
including:
i. PCR (developed in 1985)
ii. Fluorescence in situ Hybridization (FISH)
iii. YACs and BACs
Discuss how one would go about positionally cloning a gene, including knowledge of
how to use:
Southern (taken from http://www.web-books.com/MoBio/Free/Ch9D.htm)
 Southern blotting is a technique for detecting specific DNA fragments in a
complex mixture. The technique was invented in the mid-1970s by Edwin
Southern. It has been applied to detect Restriction Fragment Length
Polymorphism (RFLP) and Variable Number of Tandem Repeat Polymorphism
(VNTR). The latter is the basis of DNA fingerprinting.
Figure 9-D-1. Southern blotting. (a) The DNA to be analyzed is digested with
restriction enzymes and then separated by agarose gel electrophoresis. (b) The DNA
fragments in the gel are denatured with alkaline solution and transferred onto a
nitrocellulose filter or nylon membrane by blotting, preserving the distribution of the
DNA fragments in the gel. (c) The nitrocellulose filter is incubated with a specific
probe. The location of the DNA fragment that hybridizes with the probe can be displayed
by autoradiography.
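Step (a) of the figure, together with the RFLP idea mentioned above, can be sketched in code. The example below is a simplification: it reports every fragment of a linear molecule, whereas a real Southern blot shows only fragments that hybridize with the probe. The two allele sequences are hypothetical; the EcoRI recognition site (GAATTC, cut after the G) is the one real detail used.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return fragment lengths after cutting a linear DNA at every site.

    The defaults model EcoRI, which recognizes GAATTC and cuts after the G.
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical alleles: allele_b differs by one base that destroys the
# middle EcoRI site, so two fragments are replaced by one larger fragment.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 15 + "GAATTC" + "G" * 5
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 15 + "GAATTC" + "G" * 5

print(digest(allele_a))
print(digest(allele_b))
```

The difference between the two fragment patterns is what a restriction fragment length polymorphism looks like on the blot.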
Northern blotting
 Northern blotting is used for detecting RNA fragments, instead of DNA
fragments. The technique is called "Northern" simply because it is similar to
"Southern", not because it was invented by a person named "Northern".
 In Southern blotting, DNA fragments are denatured with alkaline solution. In
Northern blotting, RNA fragments are treated with formaldehyde to ensure
linear conformation.
Western blotting
 Western blotting is used to detect a particular protein in a mixture. The probe
used is therefore not DNA or RNA, but antibodies. The technique is also called
"immunoblotting".
Fluorescence in situ Hybridization (FISH)
 FISH is a technique that can locate a gene at a particular site within an intact
chromosome.
 Once the gene of interest has been cloned, its DNA can be used as a probe to find
where the gene is located within a set of chromosomes.
 Since the gene of interest will only hybridize to its complementary sequence, this
is a good technique for accurately locating any cloned gene on its chromosome.
 Method:
1. Chromosomes are fixed on slides and their DNA is denatured to single
stranded form.
2. Probes are added (ie the gene of interest)
3. Probes hybridize to denatured DNA only at sites complementary to the probe.
4. Fluorescent labeled molecules that bind to the probe are added (these labels
are chemically treated in such a manner that they only bind to the probe).
5. Slide is observed under a fluorescence microscope.
 Use genomic DNA for the probe – it hybridizes better than cDNA, since cDNA has
introns removed
 10-40 kb is a reasonable size for probes
 choose a contig of sequence from a library for a 40 kb probe (plasmids)
 can PCR up to 10 kb to get a probe
DNA sequencing
See techniques: DNA sequencing below
somatic cell hybrids
 human-mouse hybrid
 way to separate 2 alleles – look at expression and mutations in both copies
 Somatic cell hybrids allow entire chromosomes from a donor cell to be isolated
in a recipient cell in culture. In a classic case, the donor cell is human and the
recipient cell is a rodent cell.
 Hybrid cells preferentially lose human chromosomes and can end up retaining just one, which allows the
separation of 2 homologous chromosomes, so that the genes and alleles
specific to one homologue can be examined in separate hybrid clones.
 In the past, somatic cell hybrids have been used to determine the chromosomal
location of genes.
Procedure:
1. Human and rodent cells growing in culture are mixed.
2. Agents are added to the media that promote the fusion of cell membranes.
3. The nuclei of fused cells remain separate. When a human and rodent cell fuse
together, yet two distinct nuclei remain, it is referred to as a heterokaryon. (If
two cells of the same species fuse it is referred to as a homokaryon.)
4. Following mitosis and cell division, the two nuclei unite and form a single
“hybrid” nucleus.
5. Over cell divisions, the hybrid cell loses variable numbers of chromosomes.
For unknown reasons, human chromosomes are preferentially lost, although it
is a random process with regards to which human chromosomes are lost.
6. After several cell divisions, different daughter hybrid cells contain different
numbers and combinations of human chromosomes. Cell division can be
allowed to continue until each daughter hybrid cell contains just one human
chromosome.
7. Each daughter hybrid cell can then be isolated and analyzed for which human
chromosome they have retained using a karyotype technique that
distinguishes between rodent and human chromosomes (through the use of
probes).
8. A panel of daughter hybrid cells can then be established that represent the
human genome (1 human chromosome per hybrid cell).
9. This panel can be used to map human genes to their location, utilizing
Southern blots and PCR.
The gene responsible for Tay-Sachs disease, HEXA, was mapped this way to
chromosome 15.
See http://www.gmpgenetics.com/html/core/index.html for more info
Advantages:
o Split maternal and paternal chromosomes into separate hybrids.
o Ability to examine alleles independently for mutations.
o Provides a clean template for genomic sequencing.
o Reduces the amount of sequencing, by permitting uni-directional
sequencing.
o Allows independent analysis of gene expression.
o Can show transfunctional defects of an allele in the absence of the wild
type allele.
o Indicates a mutation on an allele without the need for further screening.
o Powerful method for directly ascertaining haplotype.
o Reduces number of samples needed to provide definite data.
o More informative data is gained from family studies.
o Facilitates understanding of genetic effects on drug metabolism and
differing responses to drugs.
o Prioritizes samples for further study.
PCR (Explanation and figures taken from
http://allserv.rug.ac.be/~avierstr/principles/pcr.html)
The purpose of a PCR (Polymerase Chain Reaction) is to make a huge number of copies
of a gene. This is necessary to have enough starting template for sequencing.
1. The cycling reactions :
There are three major steps in a PCR, which are repeated for 30 or 40 cycles. This
is done on an automated cycler, which can heat and cool the tubes with the
reaction mixture in a very short time.
1. Denaturation at 94°C :
During the denaturation, the double strand melts open to single stranded
DNA, all enzymatic reactions stop (for example : the extension from a
previous cycle).
2. Annealing at 54°C :
The primers jiggle around because of Brownian motion.
Hydrogen bonds are constantly formed and broken between the single
stranded primer and the single stranded template. The more stable bonds
last a little bit longer (primers that fit exactly) and on that little piece of
double stranded DNA (template and primer), the polymerase can attach
and start copying the template. Once there are a few bases built in, the
hydrogen bonding between the template and the primer is so strong that it
does not break anymore.
3. Extension at 72°C :
This is the ideal working temperature for the polymerase. The primers,
where there are a few bases built in, have a stronger attraction to the
template, created by hydrogen bonds, than the forces breaking these
attractions. Primers that are on positions with no exact match, get loose
again (because of the higher temperature) and don't give an extension of
the fragment.
The bases (complementary to the template) are coupled to the primer on
the 3' side (the polymerase adds dNTP's from 5' to 3', reading the template
from 3' to 5' side, bases are added complementary to the template)
Figure 3 : The different steps in PCR.
Because both strands are copied during PCR, there is an exponential increase of
the number of copies of the gene. Suppose there is only one copy of the wanted
gene before the cycling starts, after one cycle, there will be 2 copies, after two
cycles, there will be 4 copies, three cycles will result in 8 copies and so on.
Figure 4 : The exponential amplification of the gene in PCR.
Figure 5 : The first 4 cycles of a PCR reaction in detail. In the 3rd cycle, two
double strands of the right length are copied (the forward and reverse strand are
the same in length). In the 4th cycle, 8 double strands of the right length are
copied.
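The copy numbers quoted above follow from two small formulas, sketched here under the idealised assumption that every cycle is perfectly efficient and that the reaction starts from a single long double-stranded template. The function names are chosen for this example only.

```python
def total_copies(cycles: int, start_copies: int = 1) -> int:
    """Double-stranded copies after a number of perfectly efficient cycles."""
    return start_copies * 2 ** cycles


def target_length_copies(cycles: int) -> int:
    """Double strands in which both strands have exactly the primer-to-primer
    length, starting from one long double-stranded template.

    After n cycles there are 2**n double strands; 2*n of them still contain
    an original strand or a long first-generation product.
    """
    if cycles < 1:
        return 0
    return 2 ** cycles - 2 * cycles


for n in (1, 2, 3, 4, 30):
    print(n, total_copies(n), target_length_copies(n))
```

For cycles 3 and 4 this gives 2 and 8 double strands of the right length, matching the description of Figure 5. In a real reaction the efficiency is below 100% and amplification plateaus, so these numbers are upper bounds.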
2. Is there a gene copied during PCR and is it the right size?
Before the PCR product is used in further applications, it has to be checked if :
1. There is a product formed.
Though biochemistry is an exact science, not every PCR is successful.
There is for example a possibility that the quality of the DNA is poor, that
one of the primers doesn't fit, or that there is too much starting template.
2. The product is of the right size.
It is possible that there is a product, for example a band of 500 bases, but
the expected gene should be 1800 bases long. In that case, one of the
primers probably fits on a part of the gene closer to the other primer. It is
also possible that both primers fit on a totally different gene.
3. Only one band is formed.
As in the description above, it is possible that the primers fit on the desired
locations, and also on other locations. In that case, you can have different
bands in one lane on a gel.
Figure 6 : Verification of the PCR product on gel.
The ladder is a mixture of fragments with known size to compare with the PCR
fragments. Notice that the distance between the different fragments of the ladder
is logarithmic. Lane 1 : PCR fragment is approximately 1850 bases long. Lane 2
and 4 : the fragments are approximately 800 bases long. Lane 3 : no product is
formed, so the PCR failed. Lane 5 : multiple bands are formed because one of the
primers fits on different places.
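Because the ladder spacing is logarithmic, the size of a PCR fragment can be estimated by interpolating on log10(size) between the two ladder bands that flank it. The ladder values below are hypothetical and only illustrate the calculation; estimate_size is a name made up for this sketch.

```python
import math


def estimate_size(distance, ladder):
    """Estimate fragment length (bases) from its migration distance.

    ladder is a list of (distance, size) pairs for the marker lane. The size
    is interpolated linearly in log10(size) between the two flanking bands.
    """
    points = sorted(ladder)
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            t = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + t * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the ladder range")


# Hypothetical ladder: (migration distance in mm, fragment size in bases).
ladder = [(10, 2000), (20, 1000), (30, 500), (40, 250)]
print(round(estimate_size(25, ladder)))
```

A band halfway between the 1000 and 500 base markers comes out near 707 bases rather than 750, which is the practical consequence of the logarithmic scale.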
subtractive or competitive hybridizations (CGH)
 used to measure the differences in copy number or dosage of a particular
chromosome segment between 2 different DNA samples
 It is very useful for finding changes in gene dosage in tissues that can serve as a
source of DNA to make a probe but that are not easily karyotyped (e.g. soft
tumors/tissues or cancer cells).
1. Total DNA from one sample is labeled with red fluorescent dye (e.g. the patient's
DNA)
2. Total DNA from a second sample is labeled with green fluorescent dye (e.g.
normal DNA)
3. The two samples are mixed in equal amounts and used as a painting probe for
FISH with normal human metaphase chromosomes
4. The ratio of red to green fluorescence emitted by the probe along each
chromosome is measured
5. If DNA from a particular region of a chromosome is represented equally in the 2
samples that make up the probe, the red:green ratio in the FISH signal will be 1:1
6. If the DNA labeled with green is from a normal cell line, and the DNA labeled with
red is from a patient with monosomy for a particular region, the red:green ratio
will be <1, and the region will appear green
7. If the patient is trisomic for a particular region, the red:green ratio will be >1, and
the region will appear red.
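Steps 5 to 7 amount to comparing copy numbers, which can be written out as a short sketch. The cutoffs of 0.75 and 1.25 are illustrative assumptions rather than values from the notes; real analyses choose thresholds from control hybridizations.

```python
def expected_ratio(patient_copies: int, reference_copies: int = 2) -> float:
    """Expected red:green ratio when patient DNA is red and normal DNA is green."""
    return patient_copies / reference_copies


def classify(ratio: float, loss_cutoff: float = 0.75, gain_cutoff: float = 1.25) -> str:
    """Call a region from its measured ratio (cutoffs are assumptions)."""
    if ratio < loss_cutoff:
        return "loss (appears green)"
    if ratio > gain_cutoff:
        return "gain (appears red)"
    return "balanced"


# Monosomy, normal disomy and trisomy for a region.
for copies in (1, 2, 3):
    ratio = expected_ratio(copies)
    print(copies, ratio, classify(ratio))
```

Monosomy gives an expected ratio of 0.5 and trisomy 1.5, so the ideal signals are well separated from the balanced value of 1.0.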
Use of competitive hybridization in positional cloning
 Example: cloning of DMD gene
1. Take DNA from DMD patient with deletion and mix with DNA from
XXXXY human lymphoid cell line. This cell line was used because the DMD
deletion was known to lie somewhere on the X chromosome.
2. Shear DMD DNA into many pieces, cleave XXXXY DNA with restriction
enzyme (eg. MboI), giving it ‘sticky’ ends.
3. Mix two sets of DNA and allow competitive hybridization with 200-fold
excess of DMD DNA.
I. Most of the MboI cleaved XXXXY DNA will hybridize with the
excessive sheared DNA from the DMD DNA, forming a double stranded
molecule with one ‘sticky’ end and one sheared end.
II. Some of the MboI cleaved XXXXY DNA will reassociate with itself,
forming a double stranded molecule with two sticky ends
III. DNA from the MboI cleaved XXXXY DNA that corresponds to the
deleted region in the DMD patients will have no choice but to
reassociate with itself, resulting in a double stranded molecule with two
‘sticky’ ends for the deletion region.
4. DNA from situation II and III can be cloned into vector with MboI sites.
Some of these clones will contain the deletion.
5. By using CGH, determine which clones bind to normal but not DMD
chromosomes. These clones will contain the DMD deletion and likely part of
the DMD gene.
6. You can then use this clone with the deletion region to probe your clone set by
Southerns to identify a clone containing the entire gene.
I. You could use the XXXXY human lymphoid cell line clone library
mentioned above. DNA from XXXXY human lymphoid cells was
extracted, digested with some enzyme (eg. MboI) and cloned into some
vector (eg. plasmid).
II. Separate the clones by gel electrophoresis as described above for
Southerns.
III. Label your DMD deletion clone with dye or radioactivity and probe the
southern blot to identify a clone that contains the deletion as well as
more DNA.
IV. Isolate your identified clone from the blot and sequence.
V. You could also use a cDNA clone library to identify the mRNA
transcripts for the DMD gene.
CGH using microarrays

Same idea as above, except that instead of using the probe on metaphase
chromosomes, you use the probe on microarrays that are composed of genomic
DNA lined up in chromosomal sequence. This allows for greater resolution, so
you can narrow down the region of the chromosomal abnormality.
microarrays
See techniques: microarrays below
vectors for cloning DNA
1. plasmids
- insert size 0-10 kb
- high copy number (each bacterial cell will carry many plasmid vectors)
- extrachromosomal circular dsDNA molecules, not a part of the bacterial genome
- important features include:
- MCS polylinker - multiple cloning site; ~30bp sequence with numerous unique
restriction sites where the insert can be cloned in
- ori - origin of replication; allows replication of the plasmid with the bacterial
replication machinery
- antibiotic resistance gene (ex: ampR for ampicillin resistance); allows for
selection of bacteria that have been transformed (ie. have taken up the plasmid)
- often have a selection system for recombinants; usually the MCS polylinker is
inserted in frame in an expressible gene, so that when an insert is cloned into the
MCS it disrupts the gene (ex: disruption of the LacZ gene)
- disadvantages: size limit of 10 kb, transformation efficiency not that great
2. Bacteriophages
- insert size 9-23 kb
- medium to high copy number
- viruses which infect bacterial cells; usually have dsDNA (can be circular or linear)
- can exist extracellularly with genome encased in protein coat which facilitates entry
into host cell
- most common example: lambda phage
- linear dsDNA in protein coat; ends have 5’ overhangs of 12 nucleotides called
the cos sequence because they are cohesive
- once inside the cell the cos sequences pair and a circular dsDNA results
- this DNA can enter the lytic cycle or the lysogenic cycle
- advantages: excellent transformation efficiency
3. Cosmids
- insert size 30 – 44 Kb
- plasmid vectors that contain cos sites and can therefore be packaged into a phage coat
- advantages: packaging into phage coat allows higher transformation efficiency than
regular plasmid
- disadvantages: less stable because recombinants are subject to rearrangement
4. BACs (Bacterial artificial chromosome)
- insert size up to 300Kb
- based on the E. coli F factor (fertility plasmids that determine the “sex” of a bacterium;
they maintain copy number at 1-2 /cell)
- advantage: maintaining low copy number allows stability (usually 1 per cell
therefore no recombination between multiple vectors in a cell)
- disadvantage: ~one copy/cell therefore low yield of cloned DNA/bacteria grown
- Compared to YAC, maintains better stability, can be manipulated easier and DNA
can be cloned with higher efficiency.
- basic structure:
- CmR is a selectable marker for chloramphenicol resistance
- oriS, repE, parA and parB are F factor genes for replication and regulation of copy
number
- cosN is the cos site from lambda phage
- HindIII and BamHI are cloning sites at which foreign DNA is inserted
- The two promoters are for transcribing the inserted fragments
- The NotI sites are used for cutting out the inserted fragment.
5. PACs (P1 derived artificial chromosome)
- insert size 130 -150 kb
- combines features of the BAC system (above) and the P1 bacteriophage (below)
- the P1 Bacteriophage:
- pac = the P1 packaging site
- two loxP sites, (sites recognized by the recombinase, the product of the cre gene)
- The vector is digested so as to generate two arms, a short arm and a long arm to which
85-100 kb size-selected foreign DNA fragments are ligated.
- Packaging of the recombinant DNA occurs in vitro using P1 packaging extracts: pacase
cleaves the recombinant DNA at the pac site and then works with other components to
insert the DNA (maximum of 115 Kb) into phage heads.
- Tail proteins are attached and the recombinant phage is allowed to adsorb to a cre+
strain of E. coli.
- The host cell cre product acts on the loxP sites so as to produce a circular plasmid which
is maintained at low copy number (by the plasmid replicon) but can be amplified by
inducing the P1 lytic replicon
6. YACs: Yeast artificial chromosomes
- insert size 200-2000kb
- most popular system for cloning very large DNA fragments
- vector construct starts out circular, but to use it you cleave it to make it linear
- advantages: can hold a lot. Also propagated in yeast: yeast are eukaryotic and don’t
have trouble replicating DNA from other eukaryotic organisms (prokaryotes have trouble
with some DNA repeat sequences)
- important features for replication in yeast:
- centromere
- two telomeres
- ARS sequence elements (autonomous replicating sequence) necessary
for replication in yeast
- 1 yeast origin of replication
- bacterial origin of replication
- unique cloning site recognized by 1 restriction enzyme.
Cloning in a YAC
- CEN1 - centromere sequence; TEL - telomere sequences; ARS1 - autonomous
replicating sequence; Amp - gene conferring ampicillin resistance; ori - origin of
replication for propagation in an E. coli host.
- selection system: The vector is used with a specialized yeast host cell, AB1380, which
is red colored because it carries an ochre mutation in a gene, ade-2, involved in adenine
metabolism, resulting in accumulation of a red pigment. However, the vector carries the
SUP4 gene, a suppressor tRNA gene which overcomes the effect of the ade-2 ochre
mutation and restores wild-type activity, resulting in colorless colonies. The host cells are
also designed to have recessive trp1 and ura3 alleles which can be complemented by the
corresponding TRP1 and URA3 alleles in the vector, providing a selection system for
identifying cells containing the YAC vector. Cloning of a foreign DNA fragment into the
SUP4 gene causes insertional inactivation of the suppressor gene function, restoring the
mutant (red color) phenotype
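The selection logic above can be summarised as a small truth table. The following is a minimal sketch in Python; the function name and the returned dictionary keys are hypothetical and only restate the rules described in these notes (TRP1/URA3 complementation for growth, SUP4 status for colony colour).

```python
def yac_colony(has_vector: bool, has_insert: bool) -> dict:
    """Predict the AB1380 colony phenotype from the rules in the notes above."""
    # TRP1 and URA3 on the vector complement the host's recessive trp1/ura3
    # alleles, so only vector-containing cells grow on selective medium.
    grows = has_vector
    # SUP4 suppresses the ade-2 ochre mutation (colourless colony) unless a
    # foreign insert has disrupted it (red colony, as in the bare host).
    sup4_active = has_vector and not has_insert
    colour = "colourless" if sup4_active else "red"
    return {"grows_on_selective_medium": grows, "colour": colour}


for vector, insert in [(False, False), (True, False), (True, True)]:
    print(vector, insert, yac_colony(vector, insert))
```

Recombinant YACs are therefore the colonies that both grow on the selective medium and are red.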
Discuss the composition of the human genome including:
gene structure
 Gene-rich regions often located near centromeres or telomeres, even though
they are more prone to be silenced here. Thought that these regions offer
protection against recombination.

polymorphisms
repetitive DNA elements
Mutation frequency and human population
 There are about 100 new mutations in the genome of each individual human.
 The average rate is 1 × 10^-8 per base per generation.
 The male rate is estimated to be twice the average rate, thus 2 × 10^-8 per base per
generation (the germline of a 30-year-old male has undergone about 400 cell divisions
prior to sperm formation!)
 An average of 4.2 amino acid-altering mutations per diploid genome per generation
have occurred in the human lineage since our separation from chimps!
Mutation Rate: # new mutations/locus/generation
 The direct way to measure it is to count the incidence of new sporadic cases of
autosomal dominant or X-linked diseases that are fully penetrant
 Median mutation rate: 1 × 10^-6 per locus per generation
 The rate for a given gene depends on gene size, the fraction of mutant alleles that
give a noticeable phenotype, the mutational mechanism, and the presence of mutational
hotspots (e.g. CpG dinucleotides)
 DMD and NF genes are very large, therefore high mutation rate
 Achondroplasia results from mutation at one hotspot
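The direct method can be written as a one-line calculation. This is a minimal sketch for a fully penetrant autosomal dominant disease, where each birth represents two copies of the locus, so the rate is the number of new sporadic cases divided by twice the number of births. The function name and the survey numbers are illustrative assumptions, not data from these notes.

```python
def direct_mutation_rate(new_cases: int, births: int) -> float:
    """Mutations per locus per generation for an autosomal dominant disease.

    Each child carries two alleles at the locus, hence the factor of 2.
    """
    if births <= 0:
        raise ValueError("births must be positive")
    return new_cases / (2 * births)


# Hypothetical survey: 7 affected children of unaffected parents in 250,000 births.
rate = direct_mutation_rate(new_cases=7, births=250_000)
print(f"{rate:.1e} mutations per locus per generation")
```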
Types of Mutations and Estimated Frequencies
Class of Mutation     Mechanism                   Frequency (approx)                   Examples
Genome mutation       Chromosome missegregation   10^-2 / cell division                Aneuploidy
Chromosome mutation   Chromosome rearrangement    6 × 10^-4 / cell division            Translocations
Gene mutation         Base-pair mutation          10^-5 to 10^-6 / locus / generation  Point mutations
 Non-coding regions are twice as likely to mutate as coding regions
 Half of all known disease-causing mutations are missense mutations
o 30% of these occur at hotspots (i.e. CpG dinucleotides)
o 12% premature truncations
o 10% RNA splicing
o 25% deletions & insertions
Discuss the different types of mutations that can lead to human disease.
 Gain of function versus loss of function mutations.
 Missense versus nonsense mutations
 Duplications, translocations, deletions.
 Mutations can be considered genomic, chromosomal, allelic, etc.
What is a dominant negative mutation?
 A mutation resulting in a mutant gene product that can inhibit function of the
wildtype gene product in heterozygotes.
 E.g. osteogenesis imperfecta: missense mutations in either the COL1A1 or
COL1A2 gene result in abnormal type I procollagen. The two genes encode the
pro-α1 and pro-α2 chains; two pro-α1 chains and one pro-α2 chain assemble into
the procollagen triple helix. Abnormal procollagen causes a heterogeneous genetic
disorder of connective tissue known as “brittle bone disease”. As with many other
dominant negative mutations, the effect of a missense mutation is worse than that
of a null mutation. Heterozygotes for a null mutation still have one correct copy
and still produce enough (or almost enough) normal procollagen. Heterozygotes for
a missense mutation also produce 50% correct subunits, but many of these are
incorporated into abnormal molecules alongside the mutant subunits. Instead of
being just a little short on procollagen, the structures that depend on it contain
a lot of abnormal procollagen, which leads to a severe phenotype.
Discuss variables that may influence mutation frequency across the genome
 Recombination hotspots could create regions of the genome more prone to
chromosomal mutations (eg. translocations, duplications, deletions).
o Repeat regions could also have similar effects.
 Male versus female (see below)
 Base composition. For example, CpG dinucleotides are more prone to mutation
than other base combinations.
 Environmental factors
 Gene size
Mutation rate
 Mutations per locus per generation
Differences in Mutation frequency between males and females.
Factors affecting male-biased mutation:
 Errors during DNA replication in the germ cells
o The male germline goes through many more rounds of replication
o Cell divisions at age 35: males ~540; females ~24
 Differential expression of genes encoding DNA repair enzymes is possible
 Methylation occurs at a higher rate in sperm, which increases the frequency of
paternal mutation (C→T, G→A). Cs that form part of a CpG dinucleotide are
methylated. DNA often undergoes deamination. Unmethylated Cs are deaminated to U,
which is readily recognized, since it is not a DNA base, and corrected. Methyl-C is
deaminated to T, which is not as easy to identify as a mismatch.
 Mutation rate increases with local recombination rate i.e. the repair of double
stranded breaks during recombination is mutagenic.
 Spermatogenesis is an ongoing process: spermatogonia keep dividing mitotically
throughout adult life before entering meiosis
 Approximate mutation frequency: 10^-10 replication errors per base pair per cell
division
 Each diploid spermatogonium contains 6 × 10^9 base pairs
 Therefore roughly one new mutation arises each time it replicates before meiosis
(not all are lethal or disease-causing)
 Calculated mutation rates suggest that approximately 1 in 10 sperm carries a new
deleterious mutation
 Therefore more paternal mutations are expected with increasing paternal age, based
on the rounds of replication
 However, evidence shows that new mutations are more frequently of paternal origin
but are not always associated with advanced paternal age
 An excess of gene mutations of paternal origin is observed in neurofibromatosis,
achondroplasia and hemophilia B
 Differences in mutation rates may be due to selection of particular mutations (i.e. a
C>G mutation arises less frequently but has a stronger selective advantage than a
C>T mutation)
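The arithmetic behind the replication-error argument can be checked directly. This is a rough sketch using only the figures quoted in these notes (10^-10 errors per base pair per division, 6 × 10^9 bp per diploid cell, ~540 versus ~24 germline divisions at age 35). It assumes a constant error rate per division and ignores repair differences and selection, so the results are order-of-magnitude estimates only.

```python
ERROR_RATE = 1e-10   # replication errors per base pair per cell division (from the notes)
DIPLOID_BP = 6e9     # base pairs in a diploid spermatogonium (from the notes)


def mutations_per_division(error_rate: float = ERROR_RATE,
                           genome_bp: float = DIPLOID_BP) -> float:
    """Expected new mutations each time the diploid genome is replicated."""
    return error_rate * genome_bp


def expected_germline_mutations(divisions: int) -> float:
    """Expected replication errors accumulated over a lineage of cell divisions."""
    return divisions * mutations_per_division()


print(f"per division: {mutations_per_division():.1f}")
for label, divisions in [("male germline, age 35", 540), ("female germline", 24)]:
    print(f"{label}: about {expected_germline_mutations(divisions):.0f}")
```

Under these assumptions the male-to-female ratio is simply the ratio of division counts, which is why replication errors alone predict a strong paternal bias for point mutations.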
Factors affecting female-biased mutation:
 Females have more chromosomal abnormalities, i.e. non-disjunction.
 mutations can occur at any point of meiotic division in oogenesis or
spermatogenesis
 oocytes arrested in meiosis I: speculation that the longer they spend in meiosis I,
the greater chance of error when the cells do complete the division
o help explain autosomal trisomies of chromosomes 13, 18 & 21 that occur
more frequently in female germ-line
o increase frequency with maternal age
Techniques: PCR and the challenge of single sperm typing
Single Sperm Typing
 isolate sperm for individual meiotic events/recombination events
 permits the linkage relationships among DNA polymorphisms to be determined
without pedigree analysis
 need gene location and sequence
 Might make use of a nested PCR strategy.
Techniques: DNA sequencing
Standard DNA Sequencing
The process of determining the order of the nucleotide bases along a DNA strand is
called sequencing. In 1977, 24 years after the discovery of the structure of DNA, two
separate methods for sequencing DNA were developed: the chain termination method
and the chemical degradation method. Both methods were equally popular to begin
with, but, for many reasons, the chain termination method is the method more commonly
used today. This method is based on the principle that single-stranded DNA molecules
that differ in length by just a single nucleotide can be separated from one another using
polyacrylamide gel electrophoresis, described earlier.
The DNA to be sequenced, called the template DNA, is first prepared as a single-stranded DNA. Next, a short oligonucleotide is annealed, or joined, to the same position
on each template strand. The oligonucleotide acts as a primer for the synthesis of a new
DNA strand that will be complementary to the template DNA. This technique requires
that four nucleotide-specific reactions—one each for G, A, C, and T—be performed on
four identical samples of DNA. The four sequencing reactions require the addition of all
the components necessary to synthesize and label new DNA, including:
 A DNA template
 A primer tagged with a mildly radioactive molecule or a light-emitting chemical
 DNA polymerase, an enzyme that drives the synthesis of DNA
 Four deoxynucleotides (G, A, C, and T)
 One dideoxynucleotide, either ddG, ddA, ddC, or ddT
After the first deoxynucleotide is added to the growing complementary sequence, DNA
polymerase moves along the template and continues to add base after base. The strand
synthesis reaction continues until a dideoxynucleotide is added, blocking further
elongation. This is because dideoxynucleotides lack the 3'-hydroxyl group needed to
form a bond with the next nucleotide. Only a small amount of a dideoxynucleotide is
added to each reaction, so synthesis proceeds for various lengths until, by chance,
DNA polymerase inserts a dideoxynucleotide, terminating the chain. The result is
therefore a set of new chains, all of different lengths.
To read the newly generated sequence, the four reactions are run side-by-side on a
polyacrylamide sequencing gel. The family of molecules generated in the presence of
ddATP is loaded into one lane of the gel, and the other three families, generated with
ddCTP, ddGTP, and ddTTP, are loaded into three adjacent lanes. After electrophoresis,
the DNA sequence can be read directly from the positions of the bands in the gel.
Figure 3. Chain termination DNA sequencing.
Chain termination sequencing involves the synthesis of new strands of DNA complementary to a single-stranded template (step I). The template DNA is supplied with a mixture of all four deoxynucleotides, four
dideoxynucleotides (each labeled with a different color fluorescent tag) and DNA polymerase (step II).
Because all four deoxynucleotides are present, chain elongation proceeds until, by chance, DNA
polymerase inserts a dideoxynucleotide. The result is a new set of DNA chains, all of different lengths (step
III). The fragments are then separated by size using gel electrophoresis (step IV). As each labeled DNA
fragment passes a detector at the bottom of the gel, the color is recorded. The DNA sequence is then
reconstructed from the pattern of colors representing each nucleotide sequence (step V).
Variations of this method have been developed for automated sequencing machines. In
one method, called cycle sequencing, the dideoxynucleotides, not the primers, are tagged
with different colored fluorescent dyes; thus, all four reactions occur in the same tube and
are separated in the same lane on the gel. As each labeled DNA fragment passes a
detector at the bottom of the gel, the color is recorded, and the sequence is reconstructed
from the pattern of colors representing each nucleotide in the sequence.
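The logic of reading a sequence from chain-terminated fragments can be illustrated with a toy simulation. This is an idealised sketch, not a model of the chemistry: it ignores the primer and assumes that termination occurs at every possible position, so each lane holds one fragment length for every occurrence of its base. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template: str) -> str:
    """New strand (written 5'->3') complementary to a template given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand: str) -> dict:
    """For each ddNTP reaction, the lengths of the chains ending in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length in range(1, len(new_strand) + 1):
        lanes[new_strand[length - 1]].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read bands from the shortest to the longest fragment across the four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "GATTACA"
lanes = terminated_fragments(synthesized_strand(template))
print(lanes)
print(read_gel(lanes))
```

Because each fragment length appears in exactly one lane, sorting the bands by size recovers the newly synthesized strand, which is the complement of the template.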
Pyrosequencing:
see pyrosequencing website.
http://www.pyrosequencing.com/
RNA interference
 double stranded RNA (dsRNA) causes degradation of homologous mRNAs
(RNAi in animals, and post-transcriptional silencing in plants)
Mechanism of RNAi
 dsRNA is injected or taken up by the animal (in worms, this is by injection,
soaking or feeding).
 Initiation step: a long dsRNA is processed into small interfering RNAs (siRNAs)
of about 21-23 nt. Cleavage requires ATP and is mediated by an RNase III-like
dsRNA-specific ribonuclease called Dicer.
 Effector step: ds siRNAs are incorporated into a multiprotein complex (RNA-induced
silencing complex = RISC). RISC undergoes an ATP-dependent activation step that
results in the unwinding of ds siRNAs. Activated RISC uses the ss siRNA as a guide
to identify complementary RNAs. An endoribonuclease cleaves the target RNA across
from the centre of the guide siRNA. The cleaved RNA is degraded by exoribonucleases.
 An unresolved issue is the subcellular location of siRNA production and target
RNA degradation; most of the evidence supports a cytosolic pathway.
 Amplification steps might be required for efficient RNA-mediated silencing in
several systems.
 There is a C. elegans model in which primary siRNAs might prime the synthesis
of additional dsRNA using target mRNA as a template → newly synthesized
dsRNA would then be cleaved by Dicer to generate secondary siRNAs at a
sufficient concentration to achieve efficient target mRNA degradation by RISC.
 In human cells, RNAi seems to be limited mostly to the cytoplasm.
 A fraction of the RISC complexes might be located at the nuclear pores and act as
gatekeepers, scanning the RNAs as they are being exported.
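The initiation and effector steps can be mimicked on sequence strings. This is a toy sketch under strong assumptions: Dicer is modelled as cutting the sense strand into consecutive 21-nt pieces, perfect complementarity is required, and the cleavage position is taken as the midpoint of the paired region. The sequences and function names are made up for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def dice(sense_strand: str, size: int = 21) -> list:
    """Initiation step: cut a long dsRNA (sense strand shown) into siRNA-sized pieces."""
    return [sense_strand[i:i + size] for i in range(0, len(sense_strand) - size + 1, size)]


def cleavage_sites(mrna: str, guides: list) -> list:
    """Effector step: positions in the mRNA across from the centre of each guide."""
    sites = []
    for guide in guides:
        target = reverse_complement(guide)  # the mRNA stretch this guide pairs with
        start = mrna.find(target)
        if start != -1:
            sites.append(start + len(target) // 2)
    return sites


# Hypothetical trigger dsRNA (sense strand) embedded in a hypothetical target mRNA.
trigger = "AUGGCUAGCUAGGAUCCGAUA" + "CGUUAGCCAUGGAUCGAUCGA"
mrna = "GGGAAA" + trigger + "CCCUUU"
guides = [reverse_complement(piece) for piece in dice(trigger)]  # antisense guide strands
print(cleavage_sites(mrna, guides))
```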
From Maquat Lab: http://dbb.urmc.rochester.edu/labs/maquat/maquat_lab.htm
Research in my lab focuses on studies of an RNA decay pathway that typifies all
eukaryotic cells that have been examined. This pathway is called nonsense-mediated
mRNA decay (NMD) or mRNA surveillance. NMD likely evolved to safeguard cells
from potentially deleterious proteins produced as a consequence of routine mistakes in
gene expression. In mammalian cells, these mistakes include inaccuracies in transcription
initiation or pre-mRNA splicing, ineffective somatic DNA rearrangements of the type
that characterize the immunoglobulin and T-cell receptor genes, and recognition of a
selenocysteine codon as nonsense. These mistakes often result in mRNAs having reading
frames upstream of the usual reading frame, frameshift mutations that generate nonsense
codons, or nonsense mutations.
Our studies of nonsense-containing transcripts together with results from a
survey we undertook indicate that transcripts from the majority of mammalian genes are
subject to NMD when they prematurely terminate translation more than 50-55 nt
upstream of the final exon-exon junction. Therefore, we now understand why disease-associated nonsense codons generally reduce mRNA abundance but normal termination
codons, which usually reside within the final exon, generally do not. One of the most
significant outcomes of our work has been the finding that nuclear pre-mRNA splicing
influences cytoplasmic mRNA translation by influencing mRNP structure. Data indicate
that a complex of proteins is deposited on mRNA immediately upstream of exon-exon
junctions as a consequence of pre-mRNA splicing. This complex recruits Upf3/3X, a
mostly nuclear shuttling protein involved in NMD. The Upf3/3X-bound complex is
exported in association with mRNA to the cytoplasm. There, Upf3/3X recruits Upf2, a
perinuclear protein also involved in NMD. If translation terminates more than 50-55 nt
upstream of an exon-exon junction marked by the Upf proteins, then Upf1 can interact
with Upf2 and elicit NMD. However, if translation terminates less than 50-55 nt
upstream of or downstream of the final exon-exon junction marked by the Upf proteins,
then translating ribosomes are thought to remove the mark and confer immunity to NMD.
Consistent with this model, we have found that the substrate for NMD is mRNA bound
by the CBP80/20 complex of cap binding proteins. Once CBP80/20 is replaced by the
eukaryotic initiation factor (eIF) 4E, the mRNA is immune to NMD. As would be
expected from our findings, CBP80/20-bound mRNA, but not eIF4E-bound mRNA, is
associated with Upf3/3X and Upf2. Furthermore, CBP80/20-bound mRNA is translated.
Therefore, another significant outcome of our work is the discovery of a new template for
translation that supports what we call the "pioneer" round of protein synthesis.
Approximately one third of genetic diseases are due to frameshift and
nonsense mutations that result in the premature termination of translation. Studies in
progress will significantly advance our understanding of the factors involved in splicing,
translation termination and mRNA decay that are required for NMD. Our results will be
useful when designing therapies that aim to abrogate NMD in order to abrogate the
severity of nonsense-generated diseases.
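The 50-55 nt rule described above can be expressed as a simple predicate. This is a minimal sketch: it takes the position where translation terminates and the exon lengths of the spliced mRNA, and applies a single threshold (50 nt here, an assumption within the 50-55 nt range quoted). The function name and coordinates are hypothetical, and real transcripts can be exceptions to the rule.

```python
def triggers_nmd(stop_position: int, exon_lengths: list, threshold: int = 50) -> bool:
    """True if translation stops more than `threshold` nt upstream of the
    final exon-exon junction of the spliced mRNA.

    stop_position: mRNA coordinate (nt from the 5' end) where translation terminates.
    exon_lengths:  lengths of the exons in 5'->3' order.
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so no mark is deposited
    final_junction = sum(exon_lengths[:-1])
    return final_junction - stop_position > threshold


exons = [200, 150, 300]            # final junction at nt 350
print(triggers_nmd(500, exons))    # normal stop codon in the final exon
print(triggers_nmd(120, exons))    # premature stop 230 nt upstream of the junction
print(triggers_nmd(320, exons))    # premature stop only 30 nt upstream of the junction
```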
Chromatin structure
Chromatin - The complex of double-stranded DNA and the proteins that comprise
chromosomes. Chromatin structure is involved in ‘DNA packing’.
Nucleosomes - The basic units of chromatin and the fundamental packaging structure of
DNA (Nucleosomes are found in all eukaryotic chromosomes).
 11nm in diameter.
 Each one is composed of a short length of DNA (146 bp) wrapped almost twice
around a core histone octamer. The octamer consists of two copies each of histones
H2A, H2B, H3 and H4.
 Histone 1 (H1) fastens the DNA to the nucleosome core.
 Linker DNA (8-100 bp depending on the species) connects nucleosomes together
giving the unfolded chromatin a ‘beads on a string’ appearance.
Chromatin fibre
This involves higher order organization of nucleosomes.
A zig-zag fibre of nucleosomes forming a rod like structure with a diameter of 30 nm.
This process is facilitated by H1 binding to linker DNA.
Note: Histone modification within chromatin correlates with increased transcriptional
activity, whereas the absence of any modification is a feature of transcriptionally inert
chromatin.
Explain the differences between active and silent chromatin in the human genome.
Chromatin states
 Heterochromatin: The condensed form of chromatin (highly conserved and may
be transcriptionally silent) 2 types:
1. Constitutive: specific DNA regions that are not expressed.
2. Facultative: condensed in some cells and not others, thus representing stable
differences in activity of genes in different cells (random X inactivation is an
example).
 Euchromatin: The unfolded form of chromatin (transcriptionally active DNA).
o Chromatin is the complex of DNA and chromosomal proteins called
histones.
o There are 5 major types of histones that play a critical role in the proper
packaging of the chromatin fiber which is essential for establishing an
environment to ensure proper gene expression.
o 2 copies of each of 4 histones (H2A, H2B, H3, H4) constitute an octamer around
which DNA winds twice to make a nucleosome.
o Nucleosomes are the basic unit of chromatin.
o Chromatin structure is condensed during metaphase of cell division, while
at other times, the chromatin is less condensed, with some regions in a
beads-on-a-string conformation (Fig below).
o The structure of chromatin reflects the distinction between
transcriptionally active and inactive regions of DNA.
o Transcriptionally inactive chromatin generally adopts a highly condensed
conformation and is associated with tight binding by the histone H1
molecule.
o Transcriptionally active chromatin adopts a more open conformation and
is marked by:
1. relatively weak binding by histone H1 molecule
2. extensive histone acetylation
3. absence of methylated cytosines.
o The open conformation of transcriptionally active chromatin domains
makes them more accessible to enzymes (cleavage by DNase I, transcription
by RNA polymerases, etc.)
What are the mechanisms for affecting chromatin structure?
 Topological changes in a closed loop of DNA that leads to formation of DNA
supercoils.
 Protein catalyzed alteration of chromatin.
 Gain or loss of protein coat alters chromatin structure.
Explain the Modification of Chromatin by Protein Complexes*:
1. Chromatin-Remodelling Complexes:
o 3 classes of ATP-dependent chromatin-remodelling proteins are present in
mammalian cells.
o All 3 classes use ATP hydrolysis to alter histone-DNA contacts and/or
nucleosome positioning, thereby making chromatin accessible to other
proteins, such as transcriptional factors.
2. Histone-Modifying Enzymes:
o Core histones may be modified by acetylation, phosphorylation,
methylation and ubiquitylation.
o The pattern of specific histone modifications determines gene activity.
3. DNA cytosine Methyltransferases:
o Methylation of C-residues in CG motifs is found in plants and mammals.
*Grummt et al. Epigenetic Silencing of RNA Pol I Transcription. Nat Rev Mol Cell Bio
4:641-649 (2003).
Figure. The condensed structure of chromatin.
(a) The 30 nm chromatin fiber is associated with scaffold proteins to form loops. Each
loop contains about 75 kb DNA. Scaffold proteins are attached to DNA at specific
regions called scaffold attachment regions (SARs), which are rich in adenine and
thymine.
(b) The chromatin fiber and associated scaffold proteins coil into a helical structure
which may be observed as a chromosome. G bands are rich in A-T nucleotide pairs
while R bands are rich in G-C nucleotide pairs.
Explain how chromatin structure can result in position effects
 Position effects are most often caused by differences in chromatin state (see
above).
o For example, if a transgene is inserted into heterochromatin, its expression
will be lower than if it is inserted into euchromatin.
 Deletions or translocations can move a gene into a new chromatin environment
changing the genes expression. If expression is reduced, we call this a negative
position effect. If expression is increased, we call it a positive position effect.
 LCRs, promoters, and enhancers can all play a role in position effects as well.
Eg. in Burkitt’s lymphoma, a translocation places the c-MYC gene under the
control of an immunoglobulin enhancer.
 Changes in chromatin structure can bring a gene into closer contact with the nuclear
membrane and promote expression. This may be mediated by S/MARs.
Chromatin structure and control of gene expression:
 Decondensation of chromatin allows access of gene regulatory proteins. Thus,
chromatin structure may represent an “upstream” control of expression before the
transcriptional control system
Chromatin structure is closely tied to DNA methylation and histone acetylation.
 Histones are composed of a structured, three-helix domain called the histone fold
and two unstructured tails.
 histone tails are susceptible to numerous modifications including acetylation,
methylation, phosphorylation, and ubiquitination. All of these may be involved in
regulating chromatin structure and thus gene expression.
 In general acetylation of core histone tails correlates to the opening of chromatin
structure to allow transcription.
 Acetylation is performed by histone acetyltransferases (HATs) and
deacetylation by histone deacetylases (HDACs).
 Methylation of DNA by DNA methyl transferases (DNMTs) can alter histone
acetylation and vice versa (possibly).
LCRs
 The change in chromatin structure depends on a DNA sequence called the Locus
Control Region (LCR) that lies far upstream of the gene cluster.
 LCRs are typically 50-60 kb upstream of the gene, contain enhancers and other
cis-acting elements, and are DNase I hypersensitive.
 E.g. deletions in the LCR for the β-globin gene can silence the gene even though the
gene's sequence is unaltered. The chromatin remains condensed and thus the gene
is silenced, leading to thalassemia.
DNA methylation and gene silencing
 DNA methylation represents one method of epigenetic control.
 The Cytosines of CpG dinucleotides are methylated and this affects the
expression of nearby genes.
 The mechanism is likely tied in to chromatin state through histone
acetylation/deacetylation. The exact mechanism is still unclear.

CpG islands
 Usually maintained in an unmethylated state. They are protected from
methylation by certain proteins so that important genes (eg. housekeeping) in their
vicinity will never be silenced.
 A CpG island is a region with a higher frequency of CpG dinucleotides than the
rest of the genome. Across the genome, CpG dinucleotides occur less often than
expected from base composition because of the tendency of C->T mutations. Any
region with a GC content of 0.5 or higher is usually defined as a CpG island.
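A sequence can be screened against this definition with a few string counts. This is a minimal sketch: the GC content cutoff of 0.5 comes from these notes, while the observed/expected CpG ratio of at least 0.6 and the minimum length of 200 bp are additional, commonly cited criteria that are assumed here rather than taken from the notes. The test sequences are invented.

```python
def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cpg_island(seq: str, min_gc: float = 0.5, min_ratio: float = 0.6,
                  min_length: int = 200) -> bool:
    return (len(seq) >= min_length
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)


island_like = "CGCGGCGCTACGCG" * 20   # invented CpG-rich sequence, 280 bp
bulk_like = "ATTGCATTAACTGA" * 20     # invented AT-rich, CpG-depleted sequence, 280 bp
print(is_cpg_island(island_like), is_cpg_island(bulk_like))
```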
CpG islands versus CpG dinucleotides
 The cytosines of CpG dinucleotides are more prone to mutation than other
bases. This is because it is the C that is methylated as part of the epigenetic
control mechanism of the genome.
 A methylated cytosine is converted to thymine when deaminated (C->T).
 In fact, methylated and unmethylated cytosines are equally likely to be deaminated.
But when an unmethylated cytosine is deaminated, it results in a uracil (C->U).
The DNA repair mechanisms recognize this and correct it much more easily than
a C->T mutation.
 Because CpG islands are much higher in CpG content, one would expect that
over evolutionary time many of the Cs would have been mutated to Ts and the
island destroyed. In fact, this is what has happened throughout much of the
genome, which is why CpG content is generally low. CpG islands have been
maintained because they are in front of important genes which must always be
kept “on”. Therefore, they are never methylated and therefore not converted to
thymine as described above.
Microarrays
What are microarrays useful for?
 Drug discovery, basic research, target discovery, biomarker determination,
pharmacology, toxicogenomics, target selectivity, development of prognostic
tests, and disease subclass determination.
What is a microarray?
 An artificially constructed grid of DNA, built so that each element of the grid
probes for a specific RNA sequence; that is, each holds a DNA sequence that is the
reverse complement of the RNA sequence transcribed from a gene of interest.
 DNA microarrays are small solid supports onto which the sequences from 1000’s
of different genes are attached at fixed locations.
 The solid support may be made of glass, plastic, silicon, gold or gel.
 Nucleotide arrays may be constructed using cDNA, RNA, genomic DNA or
plasmid libraries.
Methods of Preparing the Microarray
1. Photolithography for light-directed synthesis of oligonucleotides, which allows
base-by-base synthesis of 100,000’s of oligonucleotides, usually 20-25 nucleotides in
length, on an area of glass about 1.28 X 1.28 cm.
2. Robotic deposition that uses a single longer (1000 bp) double-stranded DNA for
each gene.
o In both cases probes are usually designed from sequences located closer to
the 3’ end of the gene, and different probes may be used for different exons.
When Using the Array….
 DNA samples of interest must be labeled with reporter molecules that can be
easily identified; currently, reporter molecules consist of red and green
fluorescent dyes.
 After hybridization of labeled samples (usually overnight), the arrays are scanned
using a laser. The fluorescent dyes emit detectable light which is captured by a
detector such as a confocal microscope, which records the intensity of the
emission.
 The intensity provided by the array image may be quantified using a software
package.
 The quantitative fluorescence image along with the known identity of the probes
(i.e. their address on the chip) is used to assess the presence/absence of a gene.
Advantages of Microarrays
 Detection of aberrations such as deletions, duplications, non-reciprocal
translocations, and gene amplifications that lead to developmental abnormalities
and cancer.
 It allows us to look at many genes at once.
 Well suited for comparing gene expression in different populations of cells (i.e. it
can reveal the genes which are preferentially expressed in specific tissues).
How does it work?
There are many protocols and variations but basically:
 Extract RNA from a biological sample in either the normal state or some experimental
state.
 RNA is copied and fluorescent nucleotides or some stainable tag is incorporated.
 Labelled RNA is hybridized to the microarray for a period of time, then the excess is
washed away and the slide is analyzed with a laser.
 4,000-50,000 measurements of gene expression are obtained for each sample.
These are normalized and then analyzed in various ways (often clustered).
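One simple form of the normalization step can be sketched as follows, for the case where an experimental sample is compared with a normal sample spot by spot. The sketch converts intensities to log2 ratios and subtracts the median, which assumes that most genes are unchanged between the two samples. That assumption, the function name and the intensity values are illustrative, and real pipelines use more elaborate methods.

```python
import math
from statistics import median


def normalized_log_ratios(experimental: list, control: list) -> list:
    """Per-spot log2(experimental / control), centred on the median ratio."""
    raw = [math.log2(e / c) for e, c in zip(experimental, control)]
    centre = median(raw)
    return [r - centre for r in raw]


# Invented intensities: the experimental channel is twice as bright overall
# (a labelling artefact), and only the fourth gene is genuinely induced.
control = [100, 200, 400, 800, 50]
experimental = [200, 400, 800, 6400, 100]
print(normalized_log_ratios(experimental, control))
```

After centring, unchanged genes sit near 0 and the induced gene stands out with a positive log ratio.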
Can you use microarrays to detect inherited disorders?
 Depends on what you are looking for
 e.g. a SNP can’t be identified
 if transcription levels are fine but there is a problem with the transcript itself, it
can’t be detected
 DNA copy # changes that aren’t pathogenic – duplications
What are the main types of microarrays and how are they different?
 Oligonucleotide microarrays: all probes are designed to be theoretically similar
with respect to hybridization temperature and binding affinity. Each microarray
measures a single sample and the absolute level for each RNA molecule is
determined. E.g. Affymetrix chips are probably the most famous oligo microarray
chips.
 cDNA microarrays: each probe has its own hybridization characteristics.
Therefore, each microarray measures two samples at once (a control and an
experimental sample) and the relative measurement level for each RNA is
determined. The dyes enable the amount of sample bound to a spot to be
measured by the level of fluorescence emitted when it is excited by a laser. If the
RNA from the sample in condition 1 is in abundance, the spot will be green; if the
RNA from the sample in condition 2 is in abundance, it will be red. If both are
equal, the spot will be yellow, while if neither is present it will not fluoresce and
will appear black. Thus, from the fluorescence intensities and colours for each spot,
the relative expression levels of the genes in both samples can be estimated.
 Serial analysis of gene expression (SAGE): not technically a microarray, but
used for the same kind of gene expression studies as microarrays and often
lumped in with the two above. Unique short sequence tags are extracted from the
sample enzymatically.
SAGE Protocol:
Step 1. The restriction enzyme Nla III, the "anchoring enzyme" (AE), cleaves at the sequence
5'-CATG, leaving a four-nucleotide 3' overhang. Biotinylated fragments are isolated
using streptavidin beads.
Step 2. Divide in half and ligate to linkers A or B. The linkers contain unique primer
binding sites, the recognition sequence (5'-GGGAC) for a tagging enzyme (TE), in this
case BsmFI, and an Nla III compatible sticky end.
Step 3. Cleave with the BsmFI tagging enzyme (TE) and fill in to make blunt ends. The enzyme
BsmFI cuts 10 and 14 bases in the 3' direction from its recognition site, thus adding the
"Tag" sequence to the linkers.
Step 4. Ligation and amplification using primers A and B.
Step 5. Restriction with anchoring enzyme, isolation of ditags, concatenate and clone.
X and O represent nucleotides from different transcripts.
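The outcome of the protocol, one short tag per transcript that is then counted, can be imitated in silico. This is a simplified sketch, not the wet-lab procedure: it assumes the tag is the 10 bases immediately 3' of the 3'-most CATG site in each cDNA (the choice of the 3'-most site and the exact tag length are assumptions), and the cDNA sequences are invented.

```python
from collections import Counter


def sage_tag(cdna: str, anchor: str = "CATG", tag_length: int = 10):
    """Return the tag next to the 3'-most anchoring-enzyme site, or None."""
    cdna = cdna.upper()
    site = cdna.rfind(anchor)
    if site == -1:
        return None  # no Nla III site, so this transcript yields no tag
    start = site + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Invented cDNA population: two copies of one transcript, one of another,
# and one transcript without any CATG site.
cdnas = [
    "GGGCATGAAACCCGGGTTTAAAA",
    "GGGCATGAAACCCGGGTTTAAAA",
    "TTCATGCCCATGGGATCCGATTACAAA",
    "GGGGGGAAAAAACCCCCC",
]
counts = Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)
print(counts)
```

The tag counts stand in for relative transcript abundance, which is how SAGE data are compared with microarray measurements.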
CGH:
 detects and maps changes in copy # of DNA sequences
 used for analysis of tumor genes and chromosomal aberrations
 take DNA from a test (ie tumor) and a reference genome (ie normal individual)
o differentially label (eg. Cy3 and Cy5)
o hybridize to a representative genome (originally a metaphase chromosome
spread)
o block hybridization of repetitive sequences (Cot-1 DNA)
o not good for telomeres/centromeres, since Cot-1 DNA will bind to these
areas, blocking hybridization
o fluorescence ratio of test and reference signals are determined at different
positions along the genome
o provides info on relative copy # of sequences of test genome compared to
normal diploid genome
array CGH
 the newer microarray-based CGH is preferred to chromosome-based CGH; it uses
large genomic clones (ligation-mediated PCR is used to increase the copy # of BACs
before spotting) and cDNAs
 array CGH has advantages over chromosome CGH
1. increase resolution and dynamic range
2. direct mapping of aberrations
3. higher throughput
4. automation – diagnostic devices
 the resolution limit of chromosome-based CGH is 10-20 Mb; anything smaller
won’t be detected, individual mutations can’t be narrowed down, and the method
depends on the preparation of metaphase chromosomes
 chromosomal aberrations are involved in developmental abnormalities and cancer
 CGH can detect deletions, duplications, non-reciprocal translocations and gene
amplification
 it is more difficult to detect single copy (low level) gains and losses
 detection becomes more difficult with heterogeneous cell populations
 BACs in array CGH give intense signals, so accurate measurements can be
obtained over a range in copy #
o single copy changes can be detected
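The relationship between copy number, sample heterogeneity and the test/reference ratio can be worked through numerically. This is a sketch under simple assumptions: the reference is diploid, signal is proportional to copy number, and a spot is called a gain or loss when its log2 ratio exceeds a threshold of ±0.3 (a hypothetical cutoff chosen for illustration).

```python
import math


def expected_log2_ratio(copy_number: int, tumour_fraction: float = 1.0) -> float:
    """Expected log2(test / reference) when only a fraction of the test cells
    carry the altered copy number and the rest are diploid."""
    mean_copies = tumour_fraction * copy_number + (1 - tumour_fraction) * 2
    return math.log2(mean_copies / 2)


def call(log2_ratio: float, threshold: float = 0.3) -> str:
    if log2_ratio > threshold:
        return "gain"
    if log2_ratio < -threshold:
        return "loss"
    return "no change"


for copies, fraction in [(3, 1.0), (1, 1.0), (2, 1.0), (3, 0.3)]:
    ratio = expected_log2_ratio(copies, fraction)
    print(copies, fraction, round(ratio, 2), call(ratio))
```

A single-copy gain in a pure sample gives a log2 ratio of about 0.58, but the same gain present in only 30% of cells falls to about 0.2, which illustrates why low-level changes in heterogeneous populations are hard to detect.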
What determines resolution of your CGH array?
 size of reference DNA
 chromosome CGH: bands on a chromosome are 10 Mb in size on average, so
resolution is limited by band size
 what if you want to cover the entire genome?
o need to tile to get full coverage
o cosmids (40kb) so more difficult
o BACs – need fewer clones to get overlapping segments but still need a lot
of spots
 If looking at X-linked mental retardation, just use 1 chromosome
 But most often you don’t know what gene you want or which chromosome so
need to do CGH on entire genome
 If you want 1 Mb resolution, use BACs evenly spaced throughout.
o can potentially miss what you are looking for.
o sensitivity based on coverage - # of spots, spacing of spots
 FISH probes five times higher resolution – available commercially now
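The trade-off between clone size, number of spots and coverage can be estimated with a back-of-the-envelope calculation. The 40 kb cosmid size comes from the notes above; the genome size of about 3,000 Mb and the BAC insert size of about 150 kb are assumptions made for this sketch, and the counts ignore the overlap that a real tiling path needs.

```python
import math

GENOME_MB = 3000      # assumed approximate haploid human genome size
COSMID_KB = 40        # from the notes
BAC_KB = 150          # assumed typical BAC insert size

def min_clones_for_tiling(genome_mb, insert_kb):
    """Lower bound on clones needed to cover the genome end to end."""
    return math.ceil(genome_mb * 1000 / insert_kb)

def clones_at_spacing(genome_mb, spacing_mb):
    """Number of clones if one clone is placed every spacing_mb."""
    return math.ceil(genome_mb / spacing_mb)

def fraction_covered(insert_kb, spacing_mb):
    """Fraction of the genome actually sitting under a spot."""
    return insert_kb / (spacing_mb * 1000)

cosmids_needed = min_clones_for_tiling(GENOME_MB, COSMID_KB)
bacs_needed = min_clones_for_tiling(GENOME_MB, BAC_KB)
bacs_at_1mb = clones_at_spacing(GENOME_MB, 1)
covered_at_1mb = fraction_covered(BAC_KB, 1)
print(cosmids_needed, bacs_needed, bacs_at_1mb, covered_at_1mb)
```

Under these assumptions, BACs spaced every 1 Mb need far fewer spots than a full tiling path but leave most of the genome unsampled, which is why a small aberration between spots can be missed.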
What is a UNIGENE?
 Composed primarily of ESTs, mRNAs – each cluster should correspond to known
or putative genes
 Represents cluster with some evidence that sequences belong together
 an attempt to bring order to a huge amount of data
 A Unigene is almost anything that could be a gene?
 EST – expressed sequence tag – sequenced some expressed sequence
o since EST is expressed, probably sequence from a gene – cDNA clone
o made libraries of ESTs – usually taken from the 3' end of a gene, since less
sequence is needed there to get a unique tag (the ends include the UTR, which is
untranslated and so less conserved; the center of the gene is more likely to be
conserved because of its hydrophobic and hydrophilic sites, and the open reading
frame has high sequence conservation within a gene family, so it may not
distinguish a gene from its relatives)
o need to be able to distinguish ESTs from each other and didn’t originally
have complete sequence
o could have different ESTs for a particular gene due to alternative splicing
o UNIGENE is trying to distinguish multiple ESTs that represent 1 gene
 UNIGENE evidence for a gene – tries to combine all sources of information
1. EST
2. microdeletion leading to phenotype
3. combine open reading frame info with promoter sequence (problem with
looking for promoters is that they aren’t very well defined or conserved)
4. conserved domains across species (mice)
5. combine open reading frame with exon/intron boundaries – flanked by
consensus splice junctions
6. know what 1 gene looks like – look for similar genes
7. takes known genes and combines all ESTs for each gene into 1 UNIGENE
 constantly changing as new data comes forward
 gives sequence assembly
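The idea of grouping several ESTs that represent one gene can be illustrated with a toy clustering. This is not the actual UNIGENE procedure: it simply puts two ESTs in the same cluster if they share an exact stretch of k bases, and the sequences, names and the value of k are all made up for the example.

```python
def shares_kmer(a, b, k=8):
    """True if sequences a and b have at least one identical k-base stretch."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))

def cluster_ests(ests, k=8):
    """Group EST names into clusters by single linkage on shared k-mers."""
    names = list(ests)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shares_kmer(ests[a], ests[b], k):
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical ESTs: est1 and est2 overlap, est3 is unrelated
ests = {
    "est1": "ATGGCCTTAGGCATCGATCG",
    "est2": "GGCATCGATCGTTAACCGGA",
    "est3": "TTTTCCCCAAAAGGGGTTTT",
}
clusters = cluster_ests(ests)
print(clusters)
```

Single linkage means two ESTs that do not overlap each other can still end up in one cluster if a third EST bridges them, which is how fragments from different parts of one transcript get joined.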
Variations in DNA copy number in normal individuals, individuals with genetic
diseases or cancers
 different tumors differ in the regions of DNA that are aberrant and also the types
of copy # aberrations that are present
o ie defective mismatch repair colon tumors have fewer copy # changes than
proficient mismatch repair colon tumors
 tumors also differ in histology, in inactivated genes, and response to therapy
o can be distinguished based on types of genetic instability and selection of
altered genes
 genetic background of cells shapes the copy # profiles of tumor cell genomes
 copy # profiles may be used to understand cellular control pathway deregulation
in solid tumors
Variations in RNA expression levels in normal individuals, individuals with genetic
diseases or cancers
 could be genomic amplification that doesn’t result in excess mRNA production
 Overexpression – what does this do? The impact?
 An initial rearrangement can provide an advantage
 if a checkpoint is skipped, mutant cells can replicate faster and don’t need to
expend energy on repair
 telomerase – why an advantage?
o Cell doesn’t die – chance to accumulate more mutations; a lot of these
mutations overcome checkpoints and can increase instability; missegregation
can increase mutations
o May influence expression of other genes – cells accumulate DNA and RNA
changes
Describe approaches to detect changes in gene copy number and expression.
 Use a cDNA microarray for both comparative genomic hybridization (array CGH)
and mRNA expression analysis.
 Arrange cDNAs on array according to genomic position.
 Hybridize array with either genomic DNA samples or mRNA samples.
 Look for concordance between regions of multiple DNA copy number changes
and expression level changes.
 A study using the above method found that on average, a 2-fold change in DNA
copy number is associated with a 1.4-1.5 fold increase in mRNA level.
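Looking for concordance can be sketched as a simple count: of the genes that show a copy number gain, how many also show raised expression? The data, cutoffs and function name below are invented for illustration. On a log2 scale a 2-fold copy number change is 1.0, and a 1.4-1.5 fold mRNA change is about 0.49 to 0.58.

```python
def concordance(dna_log2, rna_log2, dna_cut=0.5, rna_cut=0.4):
    """Fraction of genes with a copy number gain that are also overexpressed.

    Both inputs are log2(test/reference) ratios for the same genes in the
    same order. The cutoffs are hypothetical values for illustration.
    Returns None if no gene passes the copy number cutoff.
    """
    gained = [i for i, d in enumerate(dna_log2) if d >= dna_cut]
    if not gained:
        return None
    also_up = [i for i in gained if rna_log2[i] >= rna_cut]
    return len(also_up) / len(gained)

# Made-up ratios for six genes arranged by genomic position
dna = [0.0, 1.0, 1.1, 0.9, 0.1, -0.1]   # array CGH
rna = [0.1, 0.55, 0.5, 0.0, 0.6, -0.2]  # expression

result = concordance(dna, rna)
print(result)
```

In this toy data three genes are gained and two of them are overexpressed. The fifth gene is overexpressed without a copy number gain, a reminder that expression can change for reasons other than copy number.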
Other Definitions
Genomic imprinting:
 Whether a disease is expressed or not depends on whether mutant allele is
transmitted from mom or dad
 Results from alteration in chromatin that affects the expression of a gene but not
the DNA sequence
Germline mosaicism:
 parents are phenotypically normal but a mutation has arisen in a
germline cell, so a proportion of that parent’s gametes carry it
 14-15% of mothers with affected DMD offspring have germline mosaicism
Subtraction cloning
DNAse
 used to distinguish between heterochromatin and euchromatin. Dense packaging
prevents degradation by the enzyme.
chromosomal position effects
 gene expression is perturbed by moving a structurally normal gene to an
unfavorable chromatin environment as a result of a deletion, inversion or
chromosomal translocation.
Negative chromosomal position effects
 change in chromatin structure prevents expression of gene.
Positive chromosomal position effects
 change in chromatin structure brings gene into closer contact with nuclear
membrane and promotes expression. May be mediated by S/MARs.
Epigenetic modifications
 nuclear localization, pattern of replication, chromatin structure, pattern of DNA
methylation, i.e. any modification that affects gene expression independently of
the sequence.
S/MARs
 Scaffold/Matrix Attachment Regions. A region of a chromosome that binds
nuclear matrix proteins.
Missense Mutation
 A gene mutation in which a base-pair change in the DNA causes a change in an
mRNA codon (but not to a stop codon), with the result that a different amino acid
is inserted into the polypeptide in place of one specified by the wild-type codon.
Nonsense Mutation
 A gene mutation in which a base-pair change in the DNA causes a change in an
mRNA codon from an amino acid-coding codon to a chain-terminating
(nonsense) codon. As a result, polypeptide chain synthesis is terminated
prematurely and the polypeptide is therefore either nonfunctional or, at best,
partially functional. Often results in null mutation.
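The two definitions above can be checked mechanically with the standard genetic code. The sketch below is an illustration only: it uses DNA coding-strand codons (T in place of U), and it adds two labels that are not defined in these notes, "silent" for a change that keeps the same amino acid and "other" for a change away from a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(wild_codon, mutant_codon):
    """Classify a codon change as missense, nonsense, silent or other."""
    wild = CODON_TABLE[wild_codon]
    mutant = CODON_TABLE[mutant_codon]
    if wild == mutant:
        return "silent"      # same amino acid (label not from the notes)
    if mutant == "*":
        return "nonsense"    # amino acid codon -> chain-terminating codon
    if wild == "*":
        return "other"       # stop codon -> amino acid (not from the notes)
    return "missense"        # one amino acid replaced by another

print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("CAG", "TAG"))  # Gln -> stop
print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
```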