Affymetrix case study
Jesper Jørgensen
NsGene A/S
jrj@nsgene.dk

Overview
• Affymetrix GeneChip technology
• Data processing
  – Expression level
  – Normalisation
  – Fold change
  – Statistics
• Parkinson disease
• Ventral versus dorsal midbrain (case study)
• Verification of array data
  – Q-PCR
  – In situ hybridization
  – Immunohistochemistry

Expression profiling
• Expression profiling
  – Investigate mRNA expression profile.
  – Compare gene expression between two or more situations.
  – Case versus control.
• Profiling methods
  – Differential display.
  – SAGE (Serial Analysis of Gene Expression).
  – Microarray (custom spotted arrays / Affymetrix GeneChip).

Affymetrix GeneChip technology
[Figure: a gene (5' to 3') covered by multiple oligo probes, each present as a perfect match (PM) and a mismatch (MM) probe]
Figure adapted from: David Givol, Weizmann Institute of Science, http://www.weizmann.ac.il/home/ligivol/research_interests.html

Probe synthesis on the array

Probe set design
A probe set = 11-20 PM,MM pairs (Probe design is not optimized)

Preparation of samples for GeneChip
[Figure labels: Amplification (T7 RNA polymerase); U133A, U133B]
Figure modified from: Knudsen (2002), "A Biologist's Guide to Analysis of DNA Microarray Data", Wiley.

The hardware

Expression level (probe signal)
Li-Wong model
n: scaling factor obtained by fitting
Several other models exist. Irizarry et al.
(2002) use log-transformed PM values after carrying out a global background adjustment and across-array normalisation.
Irizarry et al. (2002) Biostatistics

qspline normalisation (M/A plot)
[Figure: M/A plots before and after normalisation]
Workman et al. (2002) Genome Biology, vol. 3, no. 9.
Assumption: Most genes are unchanged.
M/A plot: Raw chip data are used to plot, for each probe, the logarithm of the ratio between two chips versus the logarithm of the mean expression for the two chips.

Variation
[Figure: A/A and B/B comparisons. Two different amplifications of the same RNA applied to GeneChips]

Fold change (Log fold)
• Fold change = sample/control
• Log transformation makes the scale symmetric around 0
• All data log2 transformed
[Figure: log fold (base 2) plotted against fold change]

Statistical testing
Is the regulation significant?
• Student's and Welch's t-test
• ANOVA
• SAM
• Wilcoxon
• Kruskal-Wallis
• Westfall-Young
• ...

Bonferroni correction
At a P-value of 0.05 you expect:
• 5 false positives if you look at 100 genes
• 1200 false positives if you look at 24,000 genes
The more genes you test, the greater the likelihood of getting a significant result by chance alone.
If you accept at most a 25% chance of having one or more false positives in the list of regulated genes, you should only consider P-values more significant than the Bonferroni-corrected cutoff.
• 2.5 x 10^-3 (0.25/100) if you look at 100 genes
• 1.0 x 10^-5 (0.25/24,000) if you look at 24,000 genes

Parkinson's Disease (PD)
• A fairly common neurodegenerative disorder (app.
2 million in USA/Europe)
• Due to loss of the dopamine-producing neurons in the Substantia Nigra
• Cardinal motor symptoms: tremor, rigidity and bradykinesia
• Conventional treatment does not halt the progression of nerve cell loss

Fetal Transplantation for PD
• Cells from the developing midbrain (A)
  – are collected and dissociated (B)
  – and transplanted into the striatum (C)
• The cells will integrate with the host brain and produce dopamine.

Stem cells in Parkinson disease
Langston JW., J Clin Invest. 2005 Jan;115(1):23-5.

Aim
• In the human fetus, DA neurons can be found in the ventral part of the tegmentum (VT) from approximately 6 weeks.
• In contrast, no DA neurons can be found in the neighboring dorsal part (DT).
• We aim at finding genes associated with DA differentiation by using GeneChips to compare the expression profiles of VT and DT.
* TH IHC

High quality RNA from 8w GA human ventral midbrain
[Figure: RNA from 8wVT (A), 8wVT (B), 8wDT (A), 8wDT (B)]

Experimental setup
• Compare VT against DT (3x3)
• Affymetrix Human Genome U133 Chip Set
  – HG-U133A: Well substantiated genes
  – HG-U133B: Mostly ESTs
  – Total: 45,000 probes (genome)
[Figure: samples A VENTRAL, B VENTRAL, C VENTRAL, A DORSAL, B DORSAL, C DORSAL]

U133A data permutations and filter
• Red: VM versus DM: VM (A1 VENTRAL, A2 VENTRAL, B VENTRAL) DM (A1 DORSAL, A2 DORSAL, B DORSAL)
• Other colors: Permutations
• Low-stringency filter as dotted line:
  – Average expression > 50
  – P-value < 0.04
  – SLR (signal log ratio) > 0.5 (about 41% up-regulation in VM)
• Arrange with descending fold change.
[Figure axis: SLR]

Genes up-regulated in VM on U133A
Low-stringency filter: Average expression > 50, P-value < 0.04, SLR > 0.5, arranged with descending fold change. Total list 107 probes. Only SLR > 1 displayed.
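
The low-stringency filter above, together with the log2 fold change and the Bonferroni cutoff from the data-processing slides, can be sketched in Python. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the Probe class, its field names, the function names and all example values are hypothetical, and "average expression" is assumed here to mean the mean of the VM and DM group means.

```python
import math
from dataclasses import dataclass


@dataclass
class Probe:
    """One probe set summarised over the VM and DM replicates (hypothetical layout)."""
    name: str
    mean_vm: float   # mean expression level in ventral samples, linear scale
    mean_dm: float   # mean expression level in dorsal samples, linear scale
    p_value: float   # P-value from whichever statistical test was applied


def signal_log_ratio(probe: Probe) -> float:
    """log2(sample/control): 0 is unchanged, +1 is a doubling, -1 is a halving."""
    return math.log2(probe.mean_vm / probe.mean_dm)


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-gene P-value cutoff that keeps the chance of any false positive <= alpha."""
    return alpha / n_tests


def low_stringency_filter(probes, min_expr=50.0, max_p=0.04, min_slr=0.5):
    """Keep probes passing all three thresholds, sorted by descending SLR."""
    kept = [
        p for p in probes
        # Assumption: "average expression" = mean of the two group means.
        if (p.mean_vm + p.mean_dm) / 2 > min_expr
        and p.p_value < max_p
        and signal_log_ratio(p) > min_slr
    ]
    return sorted(kept, key=signal_log_ratio, reverse=True)


# Made-up example values, for illustration only.
probes = [
    Probe("probe_a", 400.0, 100.0, 0.001),  # passes all three thresholds
    Probe("probe_b", 300.0, 200.0, 0.010),  # passes all three thresholds
    Probe("probe_c", 60.0, 20.0, 0.020),    # average expression too low
    Probe("probe_d", 400.0, 100.0, 0.200),  # P-value too high
    Probe("probe_e", 120.0, 100.0, 0.001),  # SLR too low
]

if __name__ == "__main__":
    for p in low_stringency_filter(probes):
        print(p.name, round(signal_log_ratio(p), 2))
    print(bonferroni_cutoff(0.25, 100))
    print(bonferroni_cutoff(0.25, 24000))
```

Filtering on SLR rather than on the raw ratio treats up- and down-regulation symmetrically, which is the reason given in the fold change slide for log2-transforming all data.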

Literature verification
• ALDH1A
• DAT1
• VMAT2
• TH
• Calbindin, 28kDa
• HNF3a 3x
• Nurr1 2x
• IGF 4x
• SNCA 4x
• DRD2
• KCNJ6 (Girk2)
• Ret
• PITX3
• BDNF
• DLK1 (FA1)
• SLC17A6 (VGLUT2)
• EPHA5
• ERBB4

Verification of array data
Array Data (100 candidate genes)
Validation on array material (confirmation)
Validation on new samples (universality)

Desk work        RNA         Protein
Statistics       Q-PCR       IHC
Literature       ISH         ELISA
Bioinformatics   Northerns   Westerns

ALDH1A1 RT-PCR
[Figure: ALDH1A1 RT-PCR (299 bp product; panels labelled 30x and 35x) and fluorescence versus cycle for cDNA#253 (VM), cDNA#254 (DM), cDNA#244 (VM), cDNA#245 (DM), cDNA#256 (VM), cDNA#257 (DM)]

Q-PCR verification of genes regulated on U133A

TH Q-PCR on a developmental series of subdissected human embryonic and fetal brain material
OD260/280 was measured at 1.88 +/- 0.05 for all RNA samples

Q-PCR analysis and clustering

Fold change in a mixed population
• 1.5 fold up-regulation from no expression
• 1.5 fold up-regulation from some expression

Organization of ISH procedure

GeneChip verification with ISH
ISH from: Vernay et al., J Neurosci. 2005 May 11;25(19):4856-67.

GeneChip verification with IHC
Courtesy of Josephine Jensen

Conclusions
• Using arrays one will get a snapshot of the expression profile under the conditions investigated.
  – Careful experimental design
  – RNA quantity and quality are important
• Since a single array experiment generates thousands of data points, the primary challenge of the technique is to make sense of the data.
  – Calculations/Statistics (back and forth)
  – Literature mining
• Independent methods are needed for verification
  – Q-PCR
  – In situ hybridization (ISH)
  – Immunohistochemistry (IHC)

Acknowledgements
NsGene, Ballerup, Denmark (http://www.nsgene.com/)
• Lars Wahlberg
• Bengt Juliusson
• Teit Johansen
Neurotech, Huddinge University Hospital, Sweden
• Åke Seiger
Department of Medical Genetics, IMBG, Panum Institute, Denmark
• Claus Hansen
• Karen Friis
Wallenberg Neuroscience Center, Sweden
• Anders Björklund
• Josephine Jensen
• Elin Andersson
CBS, DTU, Denmark
• Søren Brunak
• Steen Knudsen
• Nikolaj Blom
• Thomas Nordahl Petersen