
CMMB 461 DNA MICROARRAY 1 2022 FOR D2L

DNA microarrays:
“procedure, fabrication, data processing and analysis”
CMMB 461
University of Calgary
Gordon Chua
1
Suggested Readings
1. Harrington, C.A. et al. (2000) Monitoring gene expression using DNA
microarrays. Curr. Opin. Microbiol. 3, 285–291.
2. Hughes, T.R. and Shoemaker, D.D. (2001) DNA microarrays for
expression profiling. Curr. Opin. Chem. Biol. 5, 21–25.
3. Quackenbush, J. (2002) Microarray data normalization and
transformation. Nat. Genet. 32 (Suppl.), 496–501.
4. Leung and Cavalieri (2003) Fundamentals of cDNA microarray data
analysis. Trends Genet. 19, 649–659.
5. Eisen, M.B. et al. (1998) Cluster analysis and display of genome-wide
expression patterns. PNAS 95, 14863–14868.
2
1. Background and Introduction to
Microarrays:
“What are they and what led to their development?”
3
Transcriptome
•Transcriptome: defined as a complete set of transcripts
encoded in the genome and their relative levels of expression in
a particular cell or tissue type under defined conditions.
•Characterizing the transcriptome can identify:
•Genes exhibiting cell and tissue-specific expression
•Genes aberrantly expressed in cell and tissue disease
(molecular basis behind the disease)
•Genes expressed in response to environmental toxins and
pharmaceutical compounds (mode of action and side effects)
•Genes expressed in response to pathogens (mode of infection
and virulence)
4
• Obtained blood transcriptomes of
104 ASD cases and 82 controls
(all males)
• Found 55 differentially regulated
genes as candidates for
diagnosing autism
• These 55 genes identified ASD
with 68% accuracy in males but
performed poorly in females
• A blood test to detect autism may
be possible.
Kong et al. 2012 PLoS ONE 7: e49475
5
Northern blotting
•Conventional method to detect RNA transcripts in a cell or tissue.
•To characterize the transcriptome of a human cell or tissue type,
you would have to run 25,000 northern blots and use 25,000 different
probes!
Wikipedia
6
DNA microarrays
•DNA/gene chip that contains single-stranded probes (25-70 nucleotides)
with sequence complementary to a specific gene/mRNA
•Each probe is present in many copies in a spot on the microarray
•Fluorescently labelled mRNAs or cDNAs are placed on the microarray to
hybridize (complementary base pairing) to the probes
•The intensity of the fluorescence is proportional to the abundance of
mRNA/cDNA that binds to the probe.
•Allows the simultaneous monitoring of the expression (mRNA) level of
every gene in an organism in response to genetic and environmental
perturbation
•In a single experiment (about two weeks), one can determine which genes in the
genome are transcriptionally turned on or off
7
Microarray probe design
[Figure: top view of the microarray spots and side view of the single-stranded probe sequences attached to the surface]
1. Specificity: unique for each gene, no cross-hybridization
2. Homogeneity: all probes bind to complementary DNA at the same Tm
3. Sensitivity: does not form secondary (2°) structures that interfere with
hybridization
8
Microarray procedure
[Figure: microarray procedure, following transcripts X, Y and Z]
1. Wild type (control) and mutant/drug (experimental) samples
2. Isolate total mRNA
3. Reverse transcribe and label cDNA with red (Cy5) and
green (Cy3) fluorescent dyes
4. Relative levels of X, Y and Z are scored as UP, DOWN, UNCHANGED or NOT PRESENT
9
2. Fabrication of Microarrays:
“How do they get oligonucleotide probes on a
matrix at such high densities?”
10
Ink-jet microarrays (Agilent)
Ink-jet print-head uniformly deposits small, accurate
volumes (picoliters) of nucleic acids, building the 60-mer
oligonucleotide probes one base at a time on a 1" × 3"
glass slide
[Figure: 4 × 44K expression microarray]
•Flexible, customizable
•Virtually all 60-mer probes are functional
•No need for expensive masks: cheaper
•Density: >1,000,000 spots/array
http://www.agilent.com/about/newsroom/lsca/background/2007/bg_microarrays.pdf
11
Photolithographic microarrays (Affymetrix)
•Oligonucleotide probe synthesis
on wafer using combination of
photolithography and chemistry
•Photomask: opaque plate with
holes that allow light to shine in
specific locations on the silicon
wafer
•Light removes the blocking
compound that prevents base
addition to the wafer
•Flood with a chemical base (e.g.
adenine), which attaches to the
unblocked area of the wafer
•Repeat this process with a new
blocking compound and
photomask.
http://www.affymetrix.com/estore/about_affymetrix/outreach/educator/microarray_img_resources.affx#
12
3. Labelling, Scanning and Image
Processing
“Getting colourful microarray images and
extracting the data”
13
Common ways to “label” nucleic acids
[Figure: four labelling schemes; * marks the label. Courtesy of Tim Hughes]
•Random priming of double-stranded DNA: reaction contains labelled nucleotides
•Direct labelling of mRNA with fluorescent molecules
•Poly-T primed cDNA synthesis (reverse transcription): reaction contains labelled nucleotides
•Amplification by transcription: cDNA synthesis primed with a poly-T-T7 promoter primer, “second strand” synthesis, then a T7 reaction that contains labelled nucleotides
14
Fluorescence dyes for labelling microarray samples (Cy3 and Cy5)
•Fluorescence: emission of light by a molecule
that has absorbed light/radiation (excitation)
•Cy3 and Cy5 are water-soluble fluorescent dyes of the cyanine family
•Cy5 dye is excited with a 635 nm red laser
and detected through an emission filter that passes
only 650-690 nm light
•Cy3 dye is excited with a 532 nm green laser
and detected through an emission filter that passes
only 557-592 nm light
•Fluorescence intensity is detected by a
photomultiplier tube
[Figure: excitation and emission wavelengths of the dyes]
http://www.answers.com/topic/cy3-cy5-dyes-gif-1
http://www.jireurope.com/technical/images/GRAPH1.gif
15
What does microarray data initially look like?
•For each microarray, acquire two 16-bit TIFF images:
one scanned in the Cy5 (red) channel and one in the Cy3 (green) channel
[Figure: Cy5 channel, Cy3 channel and merged images]
Red and Green Laser Scanner (Genepix)
https://www.youtube.com/watch?v=VNsThMNjKhM
16
Image Segmentation
Spatial segmentation
•Partition the image to determine
which pixels constitute signal or
background
•Use an inner circle to calculate
signal value and pixels outside the
outer circle as local background
•Problem: sometimes inner circle is
not small enough for tiny spots
•Intensity-based segmentation: rank intensity of pixels and take a
cut-off equivalent to the approximate area of the spot = signal
•Can use a combination of the two types of segmentation
•Background can also be estimated from blank spots or control spots
of exogenous DNA
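The two segmentation approaches above can be sketched in a few lines. This is only an illustration on a toy image: the function names, the spot centre and the two radii are assumptions made for the example, not the method of any particular scanner software.

```python
import math


def spatial_segment(image, cx, cy, r_inner, r_outer):
    """Signal = pixels inside the inner circle; background = pixels outside the outer circle."""
    signal, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            distance = math.hypot(x - cx, y - cy)
            if distance <= r_inner:
                signal.append(value)
            elif distance > r_outer:
                background.append(value)
            # Pixels between the two circles are ignored.
    return signal, background


def intensity_segment(image, spot_area):
    """Rank all pixels by intensity; the brightest `spot_area` pixels are the signal."""
    ranked = sorted((v for row in image for v in row), reverse=True)
    return ranked[:spot_area], ranked[spot_area:]


# Toy 7x7 image (hypothetical values): a bright 3x3 spot on a dim background.
image = [[100] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 1000

signal, background = spatial_segment(image, cx=3, cy=3, r_inner=1, r_outer=2.5)
ranked_signal, ranked_background = intensity_segment(image, spot_area=9)
```

In this toy case the inner circle covers only 5 of the 9 bright pixels, while the intensity-based cut-off recovers all 9, which is one reason the two approaches are often combined.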
17
Quantification of signal and background
•Mean, median, mode and total intensity of the segmented signal
(microarray spot) and background pixels are reported in a text
file (e.g. a .gpr file)
•Signal intensity = total spot intensity − background intensity
•Median is usually used because it is more robust to outliers
[Figure: example output file listing genes, spot location on the microarray, and Cy5 and Cy3 values]
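A small sketch of why the median is preferred. The pixel values are made up, and the helper `spot_signal` is a hypothetical name; it subtracts a per-pixel summary of the background from the same summary of the signal, which is one common way to apply the formula above.

```python
from statistics import mean, median


def spot_signal(signal_pixels, background_pixels, summary=median):
    """Background-corrected spot intensity using the chosen summary statistic."""
    return summary(signal_pixels) - summary(background_pixels)


# Hypothetical pixels: one very bright outlier (e.g. a dust speck) in the spot.
signal_pixels = [980, 1010, 1000, 995, 20000]
background_pixels = [100, 95, 105, 110, 90]

by_median = spot_signal(signal_pixels, background_pixels)               # 900
by_mean = spot_signal(signal_pixels, background_pixels, summary=mean)   # 4697
```

The single outlier pixel inflates the mean-based value about five-fold but leaves the median-based value unchanged.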
18
4. Microarray Data Pre-processing and
Normalization
“Correct the data first before spending all your
time analyzing it”
19
Log transformation of expression ratios
•When comparing relative abundance of gene expression between two
samples, take the ratio of Cy5/Cy3 values (R/G)
•Take the log of the expression ratios (log(R/G)): this makes the distribution
of the data symmetrical (upregulated and downregulated genes are treated equally)
[Figure: R/G compared with log (R/G)]
•A 2-fold change gives R/G = 2 or 0.5, while log2(R/G) = +1 or -1
http://www.bio.davidson.edu/people/macampbell/ACS_MAGIC/transform.html
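A quick check of the symmetry argument, as a minimal sketch with made-up intensities (the function name `log2_ratio` is just for illustration):

```python
import math


def log2_ratio(red, green):
    """log2 of the Cy5/Cy3 (R/G) expression ratio."""
    return math.log2(red / green)


up = log2_ratio(2000, 1000)         # R/G = 2   -> +1.0
down = log2_ratio(1000, 2000)       # R/G = 0.5 -> -1.0
unchanged = log2_ratio(1500, 1500)  # R/G = 1   ->  0.0
```

On the raw ratio scale, all downregulated genes are squeezed between 0 and 1 while upregulated genes range from 1 upward; after the log transformation both directions are equally far from 0.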
20
Microarray Data Normalization
Required to correct for variations caused by:
•Unequal amounts of cDNA
•Distinct dye properties (fluorescence/quenching)
•Differences in dye incorporation
•Differences in scanning
21
Within array/single experiment normalization
[Figure: scatterplot of log intensity Cy5 (experimental) versus log intensity Cy3 (control)]
•Assumption: most genes are not differentially regulated
•The graph makes it look as if the vast majority of genes (spots) are upregulated in the experiment
•It is more likely that more Cy5-labelled cDNA was used in the hybridization,
or that the dyes have non-linear properties
22
Global linear normalization
•Assume equal quantities of cDNA, and therefore equal total intensities of Cy3 and Cy5
•Normalization constant = Σ(Cy3)/Σ(Cy5) [e.g. 10,000/20,000 = 0.5]
•For each gene, multiply the Cy5 intensity by the normalization
constant (here 0.5), which brings the overall Cy5/Cy3 ratio to 1
•Only works partially because the relationship is not linear.
[Figure: two scatterplots of log intensity Cy5 versus log intensity Cy3]
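The global linear normalization can be sketched as follows, using totals that match the slide's 10,000/20,000 example. The per-gene intensities and the function name are made up for illustration.

```python
def global_linear_normalize(cy5, cy3):
    """Scale every Cy5 intensity by k = sum(Cy3) / sum(Cy5)."""
    k = sum(cy3) / sum(cy5)
    return k, [value * k for value in cy5]


# Hypothetical intensities: Cy5 is uniformly twice as bright as Cy3.
cy5 = [4000, 6000, 10000]   # total 20,000
cy3 = [2000, 3000, 5000]    # total 10,000

k, cy5_normalized = global_linear_normalize(cy5, cy3)
ratios = [r / g for r, g in zip(cy5_normalized, cy3)]
```

Here the constant is 0.5 and every normalized ratio becomes 1. A single constant like this cannot remove a bias that changes with spot intensity.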
23
Scatterplot versus M/A (R/I) plot
[Figure: scatterplot of log2(R) versus log2(G), next to an M/A plot of M = log2(R/G) versus A = (½)*log2(R*G)]
•A= intensity (brightness) of microarray spots
•M=log expression ratio
•M/A plot allows for detection of intensity-dependent effects
on log expression ratios.
•The plots above show that most of the greener spots are low
intensity spots
http://compbio.pbworks.com/w/page/16252907/Microarray%20Normalization%20and%20Gene%20Expression%20Index
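The M and A values for one spot follow directly from the two definitions above. A minimal sketch with made-up intensities (the function name is hypothetical):

```python
import math


def ma_values(red, green):
    """Return (M, A): the log2 expression ratio and the mean log2 intensity."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


# Hypothetical spot: R = 4096 (2^12), G = 1024 (2^10).
m, a = ma_values(4096, 1024)   # M = 2.0, A = 11.0
```

A is simply the average of log2(R) and log2(G), so the M/A plot is the scatterplot rotated by 45 degrees, which makes intensity-dependent trends easier to see.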
24
Global Lowess (locally weighted linear regression)
•Performs a series of local regressions in overlapping windows with a
weighted average of neighbouring spots (curve fitting and correction)
•The regressions are combined to make the smooth Lowess curve (weighted
average values: closer spots have greater weight than far-away spots)
[Figure: Lowess fit within a window; the fitted line is a function of
mean intensity]
25
Global Lowess (locally weighted linear regression)
•Normalized log(R/G) = log(R/G) − Lowess correction
•Lowess correction: the deviation (distance) of the Lowess curve from the
zero axis at each spot's intensity, which is subtracted from the log ratio of that spot
•The result is that log ratios at all intensities have a mean of 0
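A simplified sketch of the Lowess correction in plain Python, to show the idea rather than reproduce any package. It assumes tricube weights, a window defined as a fraction `frac` of the spots, and no robustness iterations; all names and parameter values are choices made for this example.

```python
def tricube(u):
    """Tricube weight: 1 at the centre of the window, 0 at its edge."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_curve(a_vals, m_vals, frac=0.5):
    """Fitted M at each spot's A from weighted local linear regressions."""
    n = len(a_vals)
    k = min(n, max(3, int(frac * n)))  # number of spots in each window
    fitted = []
    for a0 in a_vals:
        # Window half-width: distance to the k-th nearest spot (self included).
        h = sorted(abs(a - a0) for a in a_vals)[k - 1] or 1e-12
        w = [tricube((a - a0) / h) for a in a_vals]
        sw = sum(w)
        a_bar = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        m_bar = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - a_bar) * (m - m_bar)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted


def lowess_normalize(a_vals, m_vals, frac=0.5):
    """Normalized M = M minus the Lowess curve evaluated at the same A."""
    curve = lowess_curve(a_vals, m_vals, frac)
    return [m - c for m, c in zip(m_vals, curve)]


# Hypothetical data with a purely intensity-dependent dye bias and no real changes.
a_vals = [6 + 0.5 * i for i in range(20)]
m_vals = [0.2 * a - 1.0 for a in a_vals]
normalized = lowess_normalize(a_vals, m_vals)   # all values close to 0
```

Because the bias here depends only on intensity, the curve absorbs all of it and the normalized log ratios sit on the zero axis. Real data would leave genuinely regulated genes standing out from the curve.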
26
5. Spot and replicate filtering
“Improving the quality of data”
27
Filtering out low intensity spots
•The normalized log ratios at low intensity spots show greater
variation and are less reliable to identify differentially-expressed
genes
•Use some arbitrary cut-off for low intensity spots
Self hybridization experiment
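As a minimal sketch, low intensity filtering is a single threshold on A, the mean log2 intensity. The cut-off of 8.0 and the spot values below are arbitrary assumptions for illustration.

```python
def filter_low_intensity(spots, min_a=8.0):
    """Keep only spots whose mean log2 intensity A is at least `min_a`."""
    return [(gene, m, a) for gene, m, a in spots if a >= min_a]


# Hypothetical spots as (gene, normalized log ratio M, intensity A).
spots = [("geneA", 0.1, 12.3), ("geneB", 2.4, 6.1), ("geneC", -1.2, 10.8)]
kept = filter_low_intensity(spots)
kept_genes = [gene for gene, _, _ in kept]
```

In this example geneB has a large log ratio but is dropped, because at A = 6.1 that ratio is not considered reliable.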
28
Replicate filtering
•Plot the normalized log ratios from two replicate experiments
•Blue spots agree between the two replicates to within two standard deviations
and are kept, while brown spots (> 2 SD) are removed
Quackenbush (2002) Nature Genetics Suppl. 32
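One way to sketch replicate filtering: take the difference between the two replicate log ratios for each spot and drop spots whose difference lies more than 2 SD from the mean difference. Measuring agreement by the per-spot difference is an assumption of this example, and the values are made up.

```python
from statistics import mean, stdev


def replicate_filter(rep1, rep2, n_sd=2.0):
    """Return one True/False per spot: True if the replicates agree within n_sd SD."""
    diffs = [x - y for x, y in zip(rep1, rep2)]
    mu, sd = mean(diffs), stdev(diffs)
    return [abs(d - mu) <= n_sd * sd for d in diffs]


# Hypothetical normalized log ratios; the last spot disagrees between replicates.
rep1 = [0.1, -0.2, 1.0, 0.0, 0.3, -0.1, 0.2, 2.0]
rep2 = [0.1, -0.2, 1.0, 0.0, 0.3, -0.1, 0.2, -2.0]
keep = replicate_filter(rep1, rep2)
```

Only the last spot is flagged for removal; note that the third spot has a large log ratio in both replicates and is kept, because filtering is about agreement, not about the size of the ratio.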
29
Dye-reversal replicates
•Uneven incorporation of Cy3 and Cy5 dyes can cause false
positives for differentially-expressed genes.
•If a common mRNA sample is labelled with both Cy3 and Cy5 and
hybridized on a microarray, then all spots should have a ratio of 1 (log ratio of 0)
[Figure: dye bias (spots are not at 0)]
•Solution: dye reversal/swap experiment
1. Sample A-Cy3 vs. Sample B-Cy5
2. Sample B’-Cy3 vs. Sample A’-Cy5
•Merged normalized log ratios = [log2(A/B) + log2(A’/B’)]/2, i.e. the ratio from the second array is inverted before averaging so that the dye bias cancels
Dabney and Storey (2007) 8:129-138
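A worked sketch of why the swap removes dye bias. Each array reports a Cy3/Cy5 log ratio: log2(A/B) on the first and log2(B’/A’) on the second. Averaging log2(A/B) with the inverted second ratio, log2(A’/B’), cancels the bias, while averaging the two ratios as measured estimates the bias itself. The intensities and function name below are made up.

```python
import math


def dye_swap_merge(a_cy3, b_cy5, b_cy3, a_cy5):
    """Return (merged log2(A/B), estimated Cy3/Cy5 dye bias) for one gene."""
    forward = math.log2(a_cy3 / b_cy5)   # array 1: log2(A/B)
    swapped = math.log2(b_cy3 / a_cy5)   # array 2: log2(B'/A')
    merged = (forward - swapped) / 2     # same as [log2(A/B) + log2(A'/B')]/2
    bias = (forward + swapped) / 2       # true signal cancels, bias remains
    return merged, bias


# Hypothetical gene with no real change, but Cy5 reads twice as bright as Cy3.
merged, bias = dye_swap_merge(a_cy3=1000, b_cy5=2000, b_cy3=1000, a_cy5=2000)
```

A single array would report a log ratio of -1 for this gene (a false positive); the merged value is 0 and the -1 shows up as the dye bias instead.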
30
“Now that your data is normalized, what do
you do next?”
•Identify differentially-expressed genes
•Cluster analysis
31
Identifying differentially expressed genes
The most straightforward way is to use a fixed fold change
cut-off (usually two-fold)
[Figure: two plots of log ratio (-10 to +10) with the 2-fold cut-off marked]
The problem is that the variability of the log ratio is
greater at lower intensities.
At lower intensity spots, genes can be
misidentified as differentially expressed.
At higher intensity spots, differentially-expressed genes can be missed.
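The fixed cut-off rule can be written as a short sketch; the function name and labels are chosen for this example only.

```python
import math


def classify_fold_change(log2_ratio, fold=2.0):
    """Call a gene up/down if its log2 ratio passes a fixed fold change cut-off."""
    cutoff = math.log2(fold)   # a 2-fold cut-off corresponds to |log2 ratio| >= 1
    if log2_ratio >= cutoff:
        return "up"
    if log2_ratio <= -cutoff:
        return "down"
    return "unchanged"


calls = [classify_fold_change(m) for m in (1.0, -1.3, 0.4)]
```

The rule ignores spot intensity entirely, which is exactly the weakness described above: the same cut-off is applied to noisy low intensity spots and to reliable high intensity spots.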
32
Z-score transformation
•Measures the number of standard deviations a particular data
point is from the mean/median
•Using a sliding window, calculate the local mean and standard
deviation within a window surrounding each data point (e.g. within 0.25
log units of spot i, giving a window 0.5 log units wide)
•Zi = (ri − mi)/si, where ri is the log ratio of spot i and mi and si are the local mean and
standard deviation, respectively
•Spots with |Zi| > 1.96 are
differentially regulated at
the 95% confidence level
Quackenbush (2002) Nature Genetics Suppl. 32
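A minimal sketch of the sliding-window Z-score. It assumes the window is taken along the intensity axis A (± 0.25 log units around each spot) and that windows with fewer than 3 spots are left unscored; both are assumptions of this example, as are the names and data.

```python
from statistics import mean, stdev


def local_z_scores(a_vals, m_vals, half_window=0.25):
    """Z-score of each spot's log ratio against spots of similar intensity."""
    scores = []
    for a0, m0 in zip(a_vals, m_vals):
        local = [m for a, m in zip(a_vals, m_vals) if abs(a - a0) <= half_window]
        if len(local) < 3:
            scores.append(None)   # too few neighbours to estimate an SD
            continue
        mu, sd = mean(local), stdev(local)
        scores.append((m0 - mu) / sd if sd > 0 else 0.0)
    return scores


# Hypothetical spots at the same intensity: 20 with small noise, 1 with a large ratio.
a_vals = [10.0] * 21
m_vals = [0.1, -0.1] * 10 + [2.0]
z = local_z_scores(a_vals, m_vals)
flagged = [i for i, zi in enumerate(z) if zi is not None and abs(zi) > 1.96]
```

Only the last spot is flagged. Because the standard deviation is estimated locally, the same log ratio can be significant at high intensity, where the spread is small, and not significant at low intensity.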
33
Learning objectives: you should be able to…
•Explain how microarrays are used to study
transcriptomes
•Describe how RNA samples are prepared for
microarray experiments
•Describe the characteristics of microarray probes
•Explain why it is important to normalize microarray
data
•Describe the two approaches to identify differentially-expressed genes
34