Gene Expression Microarrays
Microarray Normalization
Xiaole Shirley Liu
STAT115, STAT215, BIO512, BIST298
Announcement: you can register for ANY of the above 4 courses now

Microarrays
• Grow cells under a certain condition, collect the mRNA population, and label it
• Microarray has high-density, sequence-specific probes with a known location for each gene/RNA
• Sample is hybridized to the microarray probes by DNA base pairing (A-T, G-C); nonspecific binding is washed away
• Measure sample mRNA abundance by reading the labeled signal at each probe location

Spotted cDNA Arrays
• Pat Brown Lab, Stanford University
• Robotic spotting of cDNA (mRNA converted back to DNA, no introns)
• Several thousand probes / array
• One long probe per gene

Spotted cDNA Arrays
• Competing hybridization
  – Control
  – Treatment
• Detection
  – Green: high control
  – Red: high treatment
  – Yellow: equally high
  – Black: equally low

Why Competing Hybridization?
• DNA concentration differs from probe to probe, and probes are not spotted evenly

Oligonucleotide Arrays
• Some Design Considerations
  – More or fewer probes / array?
  – Long or short oligos?
  – Same or different probe lengths?
  – How many probes / gene?
  – How are probes placed on the array?
  – One- or two-color assay?

Affymetrix Oligo Arrays
• GeneChip® by Affymetrix
• Parallel synthesis of oligonucleotide probes (25mer) on a slide using photolithographic methods
• Millions of probes / microarray
• Multiple probes per gene
• One-color arrays

Affymetrix GeneChip Probes

Labeled Samples Hybridize to DNA Probes on GeneChip

Shining Laser Light Causes Tagged Fragments to Glow

Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)

NimbleGen Oligo Arrays

Agilent Oligo Arrays

Why do we bother learning about microarrays now?
• RNA-seq is probably preferred in new expression experiments
• The amount of useful public data
• The data analysis techniques

Public Microarray Resources
• GEO: Gene Expression Omnibus, an NCBI repository for gene expression and hybridization data, growing quickly.
• TCGA: The Cancer Genome Atlas
  – http://www.cbioportal.org/public-portal/
  – https://cghub.ucsc.edu/
  – http://www.broadinstitute.org/cancer/cga/
• Oncomine: Cancer Microarray Database
  – Published cancer related microarrays
  – Raw data all processed, nice interface

Affymetrix Microarray Image Analysis
• Gridding: based on spike-in DNA
• Affymetrix GeneChip Operating System (GCOS)
  – cel file

        X    Y    MEAN   STDV  NPIXELS
        701  523  311.0  76.5  16
        702  523   48.0  10.5  16

  – cdf file
    • Which probe at (X,Y) corresponds to which probe sequence and targeted transcript
    • The MM probe is always at (X,Y+1) relative to its PM probe

Replicates
• Always preferred
• Biological replicates:
  – Different animals, tissues, etc.
• Technical replicates:
  – Repeated measures of the same sample
• In between:
  – Same cell line grown on different days

Normalization
• Try to preserve biological variation and minimize experimental variation, so that different experiments can be compared
• Assumption: most genes / probes don’t change between two conditions
• Normalization can have a larger effect on the analysis than downstream steps (e.g.
group comparisons)

Median Scaling
• Linear scaling
  – Ensure the different arrays have the same median value and the same dynamic range
  – X' = (X – c1) * c2
[Figure: scatter plots of array1 vs array2]

LOESS
• LOWESS: LOcally WEighted Scatterplot Smoothing; the more general form is LOESS
• Fit a smooth curve
  – Use robust local linear fits
  – Effectively applies different scaling factors at different intensity levels
  – Y = f(X)
  – Transform X to X' = f(X)
  – Y and X' are comparable

Quantile Normalization
• Bolstad et al, Bioinformatics 2003
  – Currently considered the best normalization method
  – Assumes most of the probes/genes don’t change between samples
• Calculate the mean for each quantile and reassign each probe the mean of its quantile
• No experiment retains its original values, but all experiments end up with exactly the same distribution
[Figure: probes (rows) by experiments (columns) matrix, with the per-quantile mean]

How to Visualize Microarray Normalization?

Dilution Series
• RNA sample in 5 different concentrations
• 5 replicates scanned on 5 different scanners
• Before and after quantile normalization

MvA Plot
• log2R vs log2G
  – Values should be on the diagonal
• M = log2R – log2G, A = (log2R + log2G)/2
  – Values should scatter around 0

Before Normalization
• Pairwise MA plot for 5 arrays, probe level (PM)
  – M = log2(PM_i / PM_j)
  – A = (1/2) log2(PM_i × PM_j)

After Normalization
• Pairwise MA plot for 5 arrays, probe level (PM)
  – M = log2(PM_i / PM_j)
  – A = (1/2) log2(PM_i × PM_j)

When Might qnorm Fail?
• Loven et al, Cell 2012

Summary
• Microarrays: Different oligo arrays
• Array normalization: Loess, qnorm
  – Assumptions
• Normalization visualization: MA plots
• We will cover batch effect removal after clustering analysis…
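
Sketch: quantile normalization and MA values
As a rough illustration of the quantile normalization and MA plot slides above, the following pure-Python sketch ranks the probes within each array, averages the values at each rank across arrays, and reassigns every probe the mean of its rank. It then computes M and A for a pair of arrays. The function names quantile_normalize and ma_values and the toy intensities are assumptions made for this example; they are not from the slides or from any particular package. Ties are broken by probe order here, a simplification compared with implementations that average tied ranks.

```python
import math


def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # For each array: probe indices ordered from lowest to highest intensity.
    # (Ties are broken by probe index; an assumption made for simplicity.)
    orders = [sorted(range(n_probes), key=lambda i, a=a: a[i]) for a in arrays]

    # Mean of the k-th smallest value across all arrays (the "quantile mean").
    quantile_means = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]

    # Reassign: each probe receives the quantile mean matching its rank.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = quantile_means[rank]
        normalized.append(out)
    return normalized


def ma_values(r, g):
    """Return (M, A) lists with M = log2 R - log2 G and A = (log2 R + log2 G) / 2."""
    m = [math.log2(x) - math.log2(y) for x, y in zip(r, g)]
    a = [(math.log2(x) + math.log2(y)) / 2 for x, y in zip(r, g)]
    return m, a


if __name__ == "__main__":
    # Toy intensities for two hypothetical arrays with three probes each.
    array1 = [2.0, 4.0, 8.0]
    array2 = [16.0, 1.0, 4.0]
    norm1, norm2 = quantile_normalize([array1, array2])
    print("normalized:", norm1, norm2)
    m, a = ma_values(norm1, norm2)
    print("M:", m)
    print("A:", a)
```

After normalization both arrays hold the same set of values, only in different probe order, which is the "exact same distribution" property stated on the quantile normalization slide. Plotting M against A for each pair of arrays, before and after, gives the pairwise MA plots.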