X - STAT 115

Gene Expression Microarrays
Microarray Normalization
Xiaole Shirley Liu
STAT115, STAT215
Microarrays
• Grow cells under a given condition, collect the
mRNA population, and label it
• A microarray carries high-density, sequence-specific
probes at known locations for each gene/RNA
• Sample hybridizes to microarray probes by
DNA (A-T, G-C) base pairing; nonspecific binding is washed away
• Measure sample mRNA level by reading the
labeled signal at each probe location
Spotted cDNA Arrays
• Pat Brown Lab, Stanford
University
• Robotic spotting of cDNA
(mRNA converted back to
DNA, no introns)
• Several thousand probes /
array
• One long probe per gene
Spotted cDNA Arrays
• Competing hybridization
– Control
– Treatment
• Detection
– Green: high control
– Red: high treatment
– Yellow: equally high
– Black: equally low
Why Competing Hybridization?
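• DNA concentration differs between probes and probes are
not spotted evenly, so absolute intensity at a spot is unreliable
• Hybridizing control and treatment to the same spot lets the
treatment/control ratio cancel these spot-to-spot differences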
Oligonucleotide Arrays
• GeneChip® by Affymetrix
• Parallel synthesis of
oligonucleotide probes (25mer) on a slide using
photolithographic methods
• Millions of probes /
microarray
• Multiple probes per gene
• One-color arrays
Affymetrix GeneChip Probes
Labeled Samples Hybridize to DNA
Probes on GeneChip
Shining Laser Light Causes
Tagged Fragments to Glow
Perfect Match (PM) vs MisMatch (MM)
(control for cross hybridization)
Replicates & Normalization
• Replicates: always preferred
• Try to preserve biological variation and
minimize experimental variation, so
different experiments can be compared
• Assumption: most genes / probes don’t
change between two conditions
• Normalization can have larger effect on
analysis than downstream steps (e.g. group
comparisons)
Affymetrix Microarray Image Analysis
• Gridding: based on spike-in DNA
• Affymetrix GeneChip Operating System
(GCOS)
– cel file
X     Y     MEAN    STDV   NPIXELS
701   523   311.0   76.5   16
702   523   48.0    10.5   16
– cdf file
• Which probe at (X,Y) corresponds to which probe
sequence and targeted transcript
• The MM probe is always at (X, Y+1) relative to its PM probe
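The (X, Y+1) rule can be illustrated with a minimal sketch. The coordinates and intensities below are hypothetical, the dictionary layout is an assumption rather than the actual cel/cdf file format, and subtracting MM from PM is only one simple way to use the MM control.

```python
# Hypothetical cel-style records: mean intensity keyed by (X, Y) cell position.
# This dictionary is an illustration, not the real cel file layout.
cel = {
    (701, 523): 311.0,  # PM probe
    (701, 524): 48.0,   # its MM probe, at (X, Y+1)
}

def mm_position(pm_xy):
    """Return the MM cell position for a PM probe at (X, Y)."""
    x, y = pm_xy
    return (x, y + 1)

def pm_minus_mm(cel, pm_xy):
    """PM intensity minus the paired MM intensity (a simple cross-hybridization correction)."""
    return cel[pm_xy] - cel[mm_position(pm_xy)]

difference = pm_minus_mm(cel, (701, 523))
```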
Median Scaling
• Linear scaling
– Ensure the different arrays have the same
median value and same dynamic range
– X' = (X – c1) * c2
[Figure: array1 vs. array2 intensity plots]
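A minimal sketch of the linear scaling X' = (X – c1) * c2. The slide does not say how c1 and c2 are chosen; here it is assumed that "dynamic range" is measured by the interquartile range and that one array serves as the reference. The function names are illustrative.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range, used here as an assumed measure of dynamic range."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def scale_to_reference(x, ref):
    """Return X' = (X - c1) * c2 so that X' has the median and IQR of ref."""
    c2 = iqr(ref) / iqr(x)                 # match the spread
    c1 = median(x) - median(ref) / c2      # then match the median
    return [(v - c1) * c2 for v in x], c1, c2

# Hypothetical intensities: array2 is array1 stretched and shifted.
array1 = [100.0, 200.0, 300.0, 400.0, 500.0]
array2 = [2 * v + 50 for v in array1]
array2_scaled, c1, c2 = scale_to_reference(array2, array1)
```

Because the transform is linear, it cannot correct intensity-dependent differences between arrays.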
Loess
• LOcally WEighted Scatterplot Smoothing
• Fit a smooth curve
– Use robust local linear fits
– Effectively applies different scaling factors at
different intensity levels
– Y = f(X)
– Transform X to X' = f(X)
– Y and X' are comparable
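The transform X' = f(X) can be sketched with a simplified local linear smoother. This is an illustration under assumptions: it uses tricube weights over the nearest fraction of points and omits the robustness iterations of a full loess fit. The names `frac`, `local_linear_fit`, and `loess_normalize` are hypothetical.

```python
def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at |u| = 1."""
    u = min(abs(u), 1.0)
    return (1 - u ** 3) ** 3

def local_linear_fit(xs, ys, x0, frac=0.5):
    """Weighted least-squares line through the points nearest x0, evaluated at x0."""
    n = len(xs)
    k = max(2, min(n, round(frac * n)))            # neighbourhood size
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] * (1 + 1e-9) + 1e-12          # bandwidth just past the k-th nearest point
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def loess_normalize(x, y, frac=0.5):
    """Transform X to X' = f(X), where f is the smooth curve fitted to Y against X."""
    return [local_linear_fit(x, y, xi, frac) for xi in x]

# Hypothetical log intensities with an intensity-dependent difference between arrays.
x = [float(i) for i in range(1, 11)]
y = [xi + 0.05 * xi ** 2 for xi in x]
x_prime = loess_normalize(x, y)
```

A different local slope at each intensity level is what lets this apply different scaling factors across the intensity range.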
Quantile Normalization
• Bolstad et al., Bioinformatics 2003
– Currently considered the best normalization method
– Assumes most of the probes/genes don’t change between samples
• Sort each experiment, calculate the mean across experiments at each
quantile (rank), and reassign each probe the mean of its quantile
• No experiment retains its original values, but all experiments end up
with exactly the same distribution
[Figure: probes (rows) by experiments (columns) matrix, with the mean taken across experiments]
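The quantile normalization steps can be sketched in plain Python. This is a minimal illustration, not the reference implementation from Bolstad et al.; ties are broken by probe order here, which is a simplifying assumption, and the input values are made up.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of probe values per experiment."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean across experiments at each rank (quantile).
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, lowest to highest
        normalized = [0.0] * n
        for rank, probe in enumerate(order):
            normalized[probe] = rank_means[rank]       # reassign probe by its quantile mean
        result.append(normalized)
    return result

# Hypothetical data: three experiments, four probes each.
experiments = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(experiments)
```

After this step every experiment holds the same set of values, while each probe keeps its within-experiment rank.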
Dilution Series
• RNA sample in 5 different concentrations
• 5 replicates scanned on 5 different scanners
• Before and after quantile normalization
Normalization Quality Check
• log2R vs log2G
– Values should be on diagonal
• MvA Plot
– M = log2R - log2G
– A = (log2R + log2G)/2
– Values should scatter around 0
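The M and A values follow directly from the two definitions. A minimal sketch, assuming R and G are positive intensities for the same probes in two channels or arrays; the numbers are made up, and using the median of M as a quick check is an assumption.

```python
import math
from statistics import median

def ma_values(r, g):
    """Return (M, A): M = log2R - log2G, A = (log2R + log2G) / 2."""
    m = [math.log2(ri) - math.log2(gi) for ri, gi in zip(r, g)]
    a = [(math.log2(ri) + math.log2(gi)) / 2 for ri, gi in zip(r, g)]
    return m, a

# Hypothetical intensities for three probes.
r = [8.0, 16.0, 4.0]
g = [2.0, 16.0, 16.0]
m, a = ma_values(r, g)

# After good normalization M should scatter around 0 at every A;
# the median of M is one quick summary of that.
m_center = median(m)
```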
Before Normalization
• Pairwise MA plot for 5 arrays, probe (PM)
After Normalization
• Pairwise MA plot for 5 arrays, probe (PM)
When Might qnorm Fail?
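• Lovén et al., Cell 2012
– Quantile normalization assumes most genes don’t change; it can mask
a global shift in expression (e.g. transcriptional amplification)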
Public Microarray Databases
• GEO: Gene Expression Omnibus, an NCBI
repository for gene expression and
hybridization data, growing quickly.
• Oncomine: Cancer Microarray Database
– Published cancer related microarrays
– Raw data all processed, nice interface
• TCGA: The Cancer Genome Atlas
– https://cghub.ucsc.edu/
– http://www.cbioportal.org/public-portal/