Lecture 1 (cont.): DNA, Genes, Gene Expression, transcription. 31/3/2005
Note: [figure numbers] refer to slides in ppt presentation
3 DIFFERENTIATION and GENE EXPRESSION
All cells of a multicellular organism arise by division from a single cell. [Central dogma]
All cells contain the same DNA and contain all genes - the full genome. However, different cell types synthesize different proteins. Yet for higher multicellular organisms one finds that the 2000 most abundant proteins (> 50,000 copies/cell) appear at about the same concentrations (within a factor of 5) in different cell types. Only a few % of the proteins are expressed in a tissue-specific way. A typical higher eucaryotic cell expresses 10,000 - 20,000 genes.
The presence or absence of a few hundred to 1000 of these induces tremendous differences.
The same cell can have different expression levels at different times, especially in response to
external signals (conditions) or internal signals. Knowledge of the concentration of all protein
species will yield information on the cell and its "biological state". Direct measurement of
the concentrations of all proteins with reasonable accuracy is very difficult, and one also
needs to know their phosphorylation state, bound complexes, etc. We will focus on gene
expression, as reflected in the concentration of mRNA in the cell.
Working assumption: the concentrations of the mRNA molecules in a cell define
its "biological state".
To understand the incompleteness of this assumption, note that the concentration of proteins
can be controlled in a variety of ways. [control]:
Transcriptional control
RNA transport control
RNA degradational control
RNA processing control (splicing)
Translational control
Protein activity control
Nevertheless, under our working assumption, knowledge of 25,000 - 30,000 numbers, the concentrations of 25,000 - 30,000 kinds of mRNA, defines the state of the cells of a human tissue.
The device that measures these mRNA concentrations is called a DNA chip or microarray.
The latest Affymetrix U133plus2.0 chip measures the expression levels of 55,000 "genes"
(probe sets); in a good experiment this is done for 10 - 100 samples taken from different
tissues, people, tumors. These expression levels are in the form of a large matrix or array
that contains 500,000 - 5,000,000 numbers. [excel]
Our aim is to represent this data, visualize it and extract biologically significant
meaning from it.
Visualization: color code [color code],[leukemia1],[leukemia2]
Lecture 2: DNA chips.
1 INTRODUCTION
The basic assumption: the ”state” of a cell <===> mRNA concentration of its genes.
Two main types of experiments:
(a) Ns ≈ 10 − 100 samples (e.g. tissue removed from tumors); for each sample, L ≈ 1 − 10 known
clinical labels and measured expression levels for Ng ≈ a few thousand to a few tens of thousands of genes.
(b) M ≈ 1 − 10 experimental conditions. Initialize some biological process at time t = 0 and
measure expression at ≈ 5 − 20 time points ti for Ng genes.
Aims:
(a) Identify different sub-classes of a disease (cancer) on the basis of gene expression. Use
for diagnosis, prognosis, design of therapy.
(b) Identify groups of genes with correlated expression levels over conditions to learn about
possible functions, networks and relationships.
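Aim (b) can be illustrated with a minimal sketch, assuming that "correlated expression levels" is measured by the Pearson correlation coefficient (one common choice; the notes do not fix a particular measure). The gene names and values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: one expression profile per gene, one entry per condition.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}

r_ab = pearson(expression["geneA"], expression["geneB"])  # close to +1
r_ac = pearson(expression["geneA"], expression["geneC"])  # close to -1
```

Genes whose profiles have a correlation near +1 over the conditions would be candidates for a common group; a value near -1 indicates opposite behaviour.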
DNA chips measure the mRNA concentration in a solution.
The method is based on a hybridization reaction:
Two matching strands of DNA are held together by hydrogen bonds, A-T and C-G [hybridization1]. Similarly, DNA and its complementary RNA are strongly bound by A-U and
C-G bonds. When the double stranded compound is heated (or subjected to changes in
the chemical environment), the two strands dissociate (denaturation). Upon reversal of
conditions, matching single strands meet and reunite (hybridization) [hybridization1]. An
oligonucleotide of, say, 20 bp will bind with the highest affinity to a perfectly matching sequence (PM) of a long single stranded polynucleotide, and with significantly lower affinity
to a segment with an imperfect match, i.e. a mismatch (MM). The stringency of the hybridization depends
on the temperature: the higher T, the more stringent. This is a non-equilibrium effect: at lower T
the strand stays bound to the much more abundant MM segments and does not dissociate, whereas dissociation is essential
for it to seek and find its PM [hybridization2].
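The PM/MM distinction can be sketched in a few lines, under the simplifying assumption that binding quality is judged only by the number of mismatched base pairs (real affinities also depend on temperature, sequence composition and kinetics). The target sequence, the 12-base probe length and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the strand that pairs perfectly (antiparallel) with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def mismatches(probe, segment):
    """Number of positions at which probe fails to base-pair with segment."""
    perfect = reverse_complement(segment)
    return sum(1 for p, q in zip(probe, perfect) if p != q)

def best_binding_site(probe, target):
    """Scan the long target strand; return (offset, mismatches) of the best site."""
    k = len(probe)
    best = None
    for offset in range(len(target) - k + 1):
        m = mismatches(probe, target[offset:offset + k])
        if best is None or m < best[1]:
            best = (offset, m)
    return best

# Hypothetical long single strand and a probe designed against positions 5..16.
target = "GGATCCGATTACAGGCTTAACGTAGC"
pm_probe = reverse_complement(target[5:17])

# Same probe with its central base replaced by the complementary base.
mid = len(pm_probe) // 2
mm_probe = pm_probe[:mid] + COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]

pm_site = best_binding_site(pm_probe, target)  # perfect match: 0 mismatches
mm_site = best_binding_site(mm_probe, target)  # same site, but 1 mismatch
```

In this toy picture, both probes find the same site on the target, and the MM probe pairs there with one base fewer than the PM probe.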
The basic idea of DNA chips: prepare a solid substrate (chip) divided into pixels. Attach
identical probes to each pixel: segments of DNA taken from one gene. Prepare a solution that
contains different targets - species of mRNA at different concentrations. Pour the solution
onto the chip. The mRNA molecules diffuse and if they find a matching probe, i.e. one
taken from the gene from which the mRNA was transcribed, they stick. Detect the amount
of targets that stuck to each pixel.
There are two basic families of implementations of this idea:
(a) cDNA microarrays (spotted microarrays)
(b) Oligonucleotide microarrays (Affymetrix chips).
2 QUICK OVERVIEW
2.1 cDNA spotted arrays - Simplified picture [cDNA expt].
(a) TARGET (sample) preparation:
Compare gene expression under two conditions: experiment and control.
Extract mRNA from both samples.
Use Reverse Transcription to produce cDNA from each species of mRNA. The cDNA is
fluorescently marked; experiment by RED, control by GREEN.
cells → mRNA −RT→ cDNA (label Red/Green)
(b) PROBE (chip) preparation: Prepare a library of (double stranded) DNA.
Print a spotted array - each spot contains 10^7 − 10^8 clones of DNA from one gene, 5,000 - 10,000
spots on chip.
(c) Hybridization: Incubate the solution of the two kinds of marked cDNA and hybridize to
probes.
(d) Detection: Wash away unbound targets and measure fluorescence from each spot;
RATIO=RED/GREEN
To control non-specific binding: Two kinds of targets (expt. and control) compete for probe
Campbell's animation:
http://www.bio.davidson.edu/courses/genomics/chip/chip.html [cDNA spots]
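The detection step of the spotted array can be sketched as follows. The RATIO is as defined above; taking its base-2 logarithm and applying an intensity floor to avoid division by zero are assumptions added here for illustration, not steps stated in these notes. Spot names and intensities are hypothetical.

```python
import math

def red_green_ratios(spots, floor=1.0):
    """Return {spot name: (RATIO, log2 RATIO)} for (name, red, green) tuples.

    floor is a hypothetical lower bound on the intensities that keeps the
    ratio finite when a channel reads zero.
    """
    results = {}
    for name, red, green in spots:
        r = max(red, floor)
        g = max(green, floor)
        ratio = r / g  # experiment (RED) relative to control (GREEN)
        results[name] = (ratio, math.log2(ratio))
    return results

# Hypothetical fluorescence readings: (spot, red intensity, green intensity).
spots = [
    ("spot_up", 2000.0, 1000.0),    # higher in experiment than in control
    ("spot_down", 500.0, 1000.0),   # lower in experiment than in control
    ("spot_same", 800.0, 800.0),    # unchanged
]

ratios = red_green_ratios(spots)
```

On the log scale, a two-fold increase and a two-fold decrease sit symmetrically around zero, which is one reason the logarithm is often preferred to the raw ratio.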
2.2 Oligonucleotide microarrays (Affymetrix) - simplified picture.
[Aff.design: Target]
(a) TARGET (sample) preparation: Only "experiment"; to compare with a control, run
a separate chip.
Extract mRNA from sample.
Reverse-transcribe to (double-strand) cDNA, and transcribe back to cRNA. The cRNA is
amplified, fluorescently marked and fragmented (cut into short oligonucleotide chains).
cells → mRNA −RT→ cDNA → cRNA (amplify, label, fragment)
(b) PROBE (chip) preparation: [Affy probe]
Synthesize in situ single strand 25-mer oligos [Probe litho]; basic feature - ≈ a few 10^6 copies
of a single oligo [Wafer], [PM/MM]:
PM: "perfect match" - the DNA of the gene to be studied
MM: "mismatch" - the same 25-mer with a central mistake
55,000 probe-sets
(c) Hybridization: Incubate the labeled cRNA solution, hybridize to chip, wash away unbound targets, stain [Affy expt design]
(d) Detection: Measure fluorescence from each spot; record
Difference = Perfect Match - MisMatch
To control non-specific binding: Two kinds of probes (PM/MM) compete for target
                  cDNA                                  Affymetrix
Target            Expt and Control, cDNA label          Expt. cRNA, label, fragment;
                  Red/Green; need 1 µg mRNA             0.2 - 2 µg mRNA
Probe             cDNA (2-stranded, 1000 bp) on         25-mer oligos, single-strand DNA;
                  spots of size 100 µ;                  11 PM/MM pairs, No. PS = 55K
                  No. spots < 10,000
Hybridize         Expt and Control to cDNA              Expt. to PM and MM
Detection         Red/Green laser fluorescent;          fluorescent dye;
                  record RATIO                          record PM-MM DIFFERENCE
nonspec. bind.    2 targets compete for probe           2 probes compete for target
3 DETAILED DESCRIPTION OF THE TECHNIQUES
3.1 Oligonucleotide Microarrays [AffED]
(a) Target preparation
Need 1 - 15 µg of total RNA (10^7 cells yield 200 µg).
Extraction: total RNA and from that separate the mRNA (about 2 %), using the poly-A
tail to pull out the mRNA.
Reverse Transcription (RT):
Transcription of RNA to DNA by Reverse Transcriptase, an unusual kind of DNA polymerase, an enzyme (complex) that uses an RNA strand as template to synthesize the matching complementary DNA. [4f5-74]. It is used by retroviruses:
Virus = genetic elements enclosed by a protective coat, size about 0.1 µ. It inserts its genetic material
(normally DNA) and enzymes into a cell and "takes a ride" to replicate itself (several
100 copies). The genome also encodes the special enzymes needed for its own replication.
The genetic material of retroviruses is RNA. The virus also contains reverse transcriptase [3F6-82]
Reverse transcriptase (like any DNA polymerase) adds the next nucleotide to a growing
strand only if
1. the incoming base matches that of the template and
2. the preceding base pair is also matched - a proofreading device to correct replication
error.
BUT this poses a question - how does the process start? In vivo - a short (10 bp) RNA
primer is assembled at the initiation site by DNA primase. In vitro - we introduce a particular short sequence that complements the desired start site and serves as primer to grow a
desired sequence of DNA.
In the RT process a TTTTT segment is used as primer to complement the poly-A. An
RNA-polymerase binding sequence XXXX is added. Then
1. RT produces single strand cDNA, which is
2. complemented to double stranded cDNA by DNA polymerase.
3. Transcription to cRNA (50-fold linear amplification!): Only the correct strand is transcribed (the XXXX determines which - transcription proceeds only toward the 5' end of the
template).
4. The bases that are used to build the RNA are biotinylated.
5. Fragment the cRNA (chemically, at random) to get bits of 50 - 200 bp.
(b) Probe (chip) preparation [Wafer]:
Cover a glass slide with covalent linker molecules, terminated with a protective group that
can be removed by light. Use photo-lithography to selectively expose patches, add the
appropriate chemical group, cover with a protective layer, and repeat. The basic feature
is (on the U133plus2 chip) an 11 µ square covered with several 10^6 identical single strand
oligonucleotides, 25 bp long, copied from a gene. [Photo,Synth]
For each gene to be represented - select 11 such oligos; pair each with a mismatched oligo [PerfectMatch].
Selection of the probe-sets is a major component of the process. They are biased towards
the 3' end (the last exon is usually not spliced out; also, if the RNA polymerase falls off too
early when the target is prepared, the complements of the 3'-biased probes are still present).
(c) Hybridization
At 45 °C for 16 hours; wash and stain (fluorescently marked avidin attaches to biotin). [Hyb,Shirley's].
(d) Detection
Illuminate and scan [Shirley's] - measure the intensity of emitted light at a resolution of 2.5 µ
[pixels]; calculate the average intensity of each basic feature (square).
(e) Output
for each gene on chip 11+11 numbers; the average of the pixels in each basic feature, PM_i
and MM_i, i = 1, 2, ..., 11. Calculate differences D_i and their weighted average for each
gene:
\[ D_i = PM_i - MM_i \]
\[ \mathrm{AvgDiff} = \frac{1}{N} \sum_{i=1}^{N} W_i D_i \]
The average is taken over N differences; outliers and suspect values are discarded.[excel]
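A minimal sketch of the AvgDiff calculation, assuming unit weights by default and assuming the probe pairs to discard are given explicitly (the notes do not specify how the weights are chosen or how outliers are detected). The function name, the `discard` parameter and the intensities are hypothetical.

```python
def avg_diff(pm, mm, weights=None, discard=()):
    """Weighted average of D_i = PM_i - MM_i over the N retained probe pairs.

    weights: W_i, defaulting to 1 for every pair (an assumption).
    discard: indices of probe pairs treated as outliers (hypothetical rule:
             the caller decides which pairs are suspect).
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM must have the same number of probes")
    if weights is None:
        weights = [1.0] * len(pm)
    terms = [
        w * (p - m)
        for i, (p, m, w) in enumerate(zip(pm, mm, weights))
        if i not in discard
    ]
    if not terms:
        raise ValueError("all probe pairs were discarded")
    return sum(terms) / len(terms)  # N = number of retained differences

# Hypothetical feature intensities for one gene (4 probe pairs instead of 11).
pm = [120.0, 150.0, 90.0, 300.0]
mm = [100.0, 110.0, 80.0, 100.0]

all_pairs = avg_diff(pm, mm)                  # uses D = 20, 40, 10, 200
without_last = avg_diff(pm, mm, discard={3})  # drops the suspect D = 200
```

The single large difference dominates the plain average, which illustrates why outliers and suspect values are removed before averaging.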
Absent/Present call.
Reproducibility, noise: [scatter] multiplicative noise, 90 % within 20-50 %