The Discovery of Myc Regulated Genes in Islet Tumours using Microarray Analysis
Stella Pelengaris
Sam Robson

Activation of MycERTAM Protein
• Inactive MycERTAM: HSP90 bound.
• + 4-hydroxytamoxifen
• Active MycERTAM: HSP90 unbound.
[Diagram labels: Myc, ERTAM, HSP90, Max]

Myc Activation Promotes cell Proliferation
[Figure: Ki67 staining. MycERTAM inactive vs. MycERTAM activated 24 hrs.]

Myc induces concomitant apoptosis
[Figure: TUNEL staining. MycERTAM inactive vs. MycERTAM activated 72 hrs.]

β-cells are almost totally ablated
[Figure: H&E staining. MycERTAM inactive vs. MycERTAM activated 6 days.]

Blocking Myc-induced apoptosis in islet β-cells
By over-expressing Bcl-XL (RIP-Bcl-XL transgenic mice from Doug Hanahan).

Blocking Myc-induced apoptosis: Ins-MycERTAM x RIP-Bcl-xL mice
[Figure: H&E staining. MycERTAM inactive vs. MycERTAM activated with TAM for 7-14 days.]

Loss of cell differentiation
Decreased insulin content (also loss of Pdx-1, GLUT 2); moreover, animals develop transient diabetes.

The tumour suppressor function of c-Myc / The oncogenic potential of c-Myc
[Diagram: adult pancreatic islet beta cells, c-Myc OFF and c-Myc ON. Tumour suppressor function: proliferation, apoptosis, islet involution. Oncogenic potential, +Bcl-xL (apoptosis blocked): hyperplasia, loss of differentiation, loss of cell-cell contact, local invasion, angiogenesis.]

Rapid tumour regression following Myc deactivation
[Figure: Ki-67 and E-cadherin staining at Day 0, Day 4, Day 10 and Day 38.]

Experimental Design
• Islet tumour reversal experiment.
• Timepoints:
– Time 0 (Transgenic untreated)
– 1 day ON (4-OHT administered for 1 day)
– 14 days ON (4-OHT administered for 14 days)
– 14 days ON 7 days OFF (4-OHT administered for 14 days, injections stopped for 7 days)
• 3 replicates for each timepoint.
• Randomization or litter mates?

Affymetrix GeneChips
• Each gene represented by 11-20 ‘probe pairs’.
• Probe pairs are 3’ biased.
• ‘Probe Pair’ consists of Perfect Match (PM) and MisMatch (MM) probes.
• MM has altered middle (13th) base. Designed to measure non-specific binding (NSB).

GeneChip Scanning
• RNA sample prepared, labelled and hybridised to chip.
• Chip fluorescently scanned.
Gives a raw pixelated image (.DAT file).
• Grid used to separate pixels related to individual probes.
• Pixel intensities averaged to give a single intensity for each probe (.CEL file).
• Probe-level intensities combined for each probe set to give a single intensity value for each gene (.CHP file).

Affymetrix MicroArray Suite (MAS) v5.0
• Current method employed by Affymetrix.
• Weighted mean using one-step Tukey Biweight Estimate:
$\text{signal} = \log^{-1}\left(\text{TukeyBiweight}\{\log(PM_j - CT_j)\}\right)$
• $CT_j$ is a quantity derived from $MM_j$ that is never larger than $PM_j$.
• Weights each probe intensity based on its distance from the median.
• Robust average (insensitive to small changes from any assumptions made).

Problems with Mis-Match Data
• MM intensity levels are greater than PM intensity levels in ~1/3 of all probes.
• Suggests that MM probes measure actual signal, and not just non-specific binding.
• Subtracting MM from PM gives negative signal values for these probes.
• Subtracting MM data will result in loss of interesting signal in many probes.
Several methods have been proposed using only PM data.

Robust Multiarray Average (RMA)
• Subtraction of MM data corrects for NSB, but introduces noise.
• Want a method that gives positive intensity values.
• Normalising at probe level avoids the loss of information.

Analysis Setup
• Use both RMA normalization and MAS5.0 normalization.
• Have higher confidence in genes that show differential expression by both methods.
• Set up the experiment with both RMA and MAS5.0 normalized data. Allows direct comparison.
• Be sure to apply per chip and per gene normalization steps on RMA and MAS5.0 normalized data separately.

Quality Control - Samples
• Measure how similar replicates are.
• Methods include condition trees, principal component analysis (PCA) and scatter plots.
• One time point shows more variation amongst its replicates than the others. This raises an interesting question regarding randomization.
• For now, remove these samples from analysis.
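As a rough illustration of "measure how similar replicates are", the sketch below scores each time point by the lowest pairwise Pearson correlation among its replicates. The choice of Pearson correlation, the function names and the sample values are all assumptions made for illustration; the slides use GeneSpring's condition trees, PCA and scatter plots instead.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def replicate_agreement(samples, groups):
    """Return {group: lowest pairwise correlation among its replicates}.

    samples: {sample_name: [log2 intensity per gene]}
    groups:  {group_name: [sample_name, ...]} with at least two samples each
    """
    result = {}
    for group, names in groups.items():
        pairs = combinations(names, 2)
        result[group] = min(pearson(samples[a], samples[b]) for a, b in pairs)
    return result


# Invented log2 intensities for five genes (hypothetical, not study data).
samples = {
    "t0_a": [8.0, 6.1, 9.5, 7.2, 5.0],
    "t0_b": [8.1, 6.0, 9.4, 7.3, 5.1],
    "d14_a": [9.0, 5.0, 9.9, 8.0, 4.0],
    "d14_b": [6.0, 7.5, 7.0, 5.5, 8.0],  # deliberately discordant replicate
}
groups = {"Time 0": ["t0_a", "t0_b"], "14 days ON": ["d14_a", "d14_b"]}
agreement = replicate_agreement(samples, groups)
```

A time point whose score falls well below the others would be a candidate for the temporary removal described above; where to draw that line is a judgment call and is not specified in the slides.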
Quality Control - Samples
[Figure: Condition tree (GeneSpring). Branch color parameter: Time (days). Colored by: Non QC'd - Genespring No... Gene List: test (920).]
[Figure: Principal Component Analysis, MAS5.0. Conditions (0d ON, 1d ON, 14d ON, 14d O...) colored by parameter Time; experiment: Non QC'd - Genespring No... X-axis: PCA component 1 (13.07% variance). Y-axis: PCA component 2 (11.76% variance). Z-axis: PCA component 3 (10.54% variance).]
[Figure: Principal Component Analysis, RMA. Conditions (0d ON, 1d ON, 14d ON, 14d O...) colored by parameter Time; experiment: Non QC'd - RMA Preprocessed Experiment, All Samples. X-axis: PCA component 1 (31.67% variance). Y-axis: PCA component 2 (29.82% variance). Z-axis: PCA component 3 (10.96% variance).]

Quality Control - Genes
• Remove all AFFX probe data (standard probes present on all Affymetrix GeneChips).
• Filter on flags (remove genes flagged ‘absent’ in all samples).
• Filter genes based on standard deviation of replicate data. “Interesting data lies within 1.4 SDs of mean”.
• Filter out non-changing genes (0.8 < fold change < 1.2) to leave differentially expressed genes. Be sure to use ratio mode with all fold change analyses.
• Use the Venn diagram tool to make a list of the intersection (∩) of RMA and MAS5.0 QC’d genes.
• Put back the samples removed during sample QC before further analyses.

Supervised vs. Unsupervised Analysis
• Unsupervised analysis uses iterative methods to cluster expression profiles.
• Useful to see coexpressed genes.
• Supervised analysis is probably better in this case, since we know what expression profiles to expect.
• Compare all genes to an expected expression profile.
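One way to picture "compare all genes to an expected expression profile" is to correlate each gene's time course with a template and keep the close matches. The sketch below is only that: the template shape, the 0.9 cut-off, the gene names and the values are hypothetical, and Pearson correlation is an assumed similarity measure rather than the one used in the original analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def match_profile(expression, template, min_r=0.9):
    """Return [(gene, r)] for genes with r >= min_r, best match first."""
    hits = []
    for gene, profile in expression.items():
        r = pearson(profile, template)
        if r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Time points: Time 0, 1 day ON, 14 days ON, 14 days ON 7 days OFF.
# Hypothetical template: up while Myc is ON, back down once it is OFF.
template = [0.0, 1.0, 1.0, 0.0]

# Invented normalized intensities for three placeholder genes.
expression = {
    "gene_A": [1.0, 2.9, 3.1, 1.2],  # follows the template
    "gene_B": [3.0, 1.1, 0.9, 2.8],  # mirror image of the template
    "gene_C": [2.0, 2.1, 1.9, 2.0],  # essentially flat
}
hits = match_profile(expression, template)
```

Correlation only compares shape, so a gene that follows the template with a tiny amplitude would also match; combining this with a fold change filter is one way to guard against that.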
Expected Expression Profile
[Figure: expected expression profile.]

Use of Gene Ontology
• Genes listed based on Molecular Function, Biological Process and Cellular Component.
• Comparison of gene lists with GO lists offers insight into gene function.
• Can split window to group genes with similar molecular functions.

Use of Gene Ontology
[Figure: Normalized Intensity (log scale) against Time for GC-RMA and MAS5.0, one panel per GO category. Panel titles as extracted: antioxidant activity (GO...; apoptosis regulator ac...; binding...20 in list; catalytic activity...; cell adhesion molecul...; chaperone activity (GO...; chaperone regulator a...; defense immunity prot...; enzyme regulator activi...; molecular_function un...; motor activity (GO:000...; protein stabilization ac...; signal transducer activ...; structural molecule act...; transcription regulator ...; translation regulator (...; transporter activity...]
[Figure, final panel: Unclassified...14 in list. Y-axis: Comparison of Normalization methods, Time course - Log mode. Split by: Gene Lists/Gene Ontology (GO SLIMS)/GO SLIMS Molecular Functio... Colored by: Time 0 GC-RMA. Gene List: All (64).]

Use of Gene Pathways
• GenMAPP and KEGG have pathways available to GeneSpring.
• Expression profiles can be shown directly on these pathways.

Use of Gene Pathways
[Figure: Selected Pathway: Adherens junction - Mus musculus. Colored by: Comparison of Normalization methods, Time course - Log mode. Gene List: all genes (45101).]
[Figure: Selected Pathway: Apoptosis - Mus musculus. Colored by: Comparison of Normalization methods, Time course - Log mode. Gene List: all genes (45101).]
[Figure: Selected Pathway: Cell Cycle. Colored by: Comparison of Normalization methods, Time course - Log mode. Gene List: all genes (45101).]

Differences seen between RMA and MAS5.0
• Every analysis step performed on both sets of data.
• Genes found with both methods are held with greater confidence.
• More genes are pulled out with MAS5.0 than with RMA.
• Is MAS5.0 producing false positives, or is RMA producing false negatives?
• RMA seen to ‘squash’ low expression genes compared to MAS5.0. These could be lost in the QC process.
• Probable that RMA loses interesting data.

Hand Curation of Genes
• First pull out genes with at least two-fold change between time points.
• Use GO lists to find genes that could be interesting.
• Use literature to find possible targets.
• Which genes are not interesting, and which are novel targets?
• Biologist intellectual input required.
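The first curation step, together with the earlier idea of trusting genes found by both normalization methods, can be sketched as a fold change filter followed by a set intersection. Everything named here is hypothetical: the function, the gene labels and the intensities are invented for illustration, and taking the ratio of the highest to the lowest time point is an assumed reading of "two-fold change between time points".

```python
def fold_change_hits(expression, min_fold=2.0):
    """Return the genes whose max/min ratio across time points is >= min_fold.

    expression: {gene: [linear-scale intensity per time point]}
    Genes with a non-positive minimum are skipped, since a ratio is undefined.
    """
    hits = set()
    for gene, values in expression.items():
        lowest, highest = min(values), max(values)
        if lowest > 0 and highest / lowest >= min_fold:
            hits.add(gene)
    return hits


# Invented intensities at Time 0, 1 day ON and 14 days ON (not study data).
rma = {
    "gene_A": [100.0, 250.0, 300.0],
    "gene_B": [80.0, 90.0, 100.0],
    "gene_C": [50.0, 60.0, 70.0],
}
mas5 = {
    "gene_A": [90.0, 260.0, 310.0],
    "gene_B": [20.0, 45.0, 60.0],
    "gene_C": [55.0, 58.0, 66.0],
}

rma_hits = fold_change_hits(rma)
mas5_hits = fold_change_hits(mas5)

both = rma_hits & mas5_hits       # higher-confidence candidates
mas5_only = mas5_hits - rma_hits  # need a closer look before trusting
```

Keeping the method-specific sets alongside the intersection makes it easier to revisit genes that only one normalization reports, which is where the false positive versus false negative question arises.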
Some Results
[Figures: Normalized Intensity (log scale, 0.01 to 100) at Time 0, 1 day ON and 14 days ON, for GC-RMA and MAS5.0. Y-axis: Comparison of Normalization methods, Time course - Log mode. Colored by: Time 0 GC-RMA. One plot per gene list:]
• Gene List: Mmp9 (2). Mmp9 – Matrix metalloproteinase 9
• Gene List: Mmp12 (1). Mmp12 – Matrix metalloproteinase 12
• Gene List: Ipf1 (2). Ipf1 – Insulin promoter factor 1
• Gene List: Insulin (2). Ins1 – Insulin I and II
• Gene List: Cad (2). Cad – Carbamoyl-phosphate synthetase 2, aspartate transcarbamylase and dihydroorotase
• Gene List: Caspase1 (1). Casp1 – Caspase 1
• Gene List: Caspase4 (1). Casp4 – Caspase 4
[Further plots on the same axes (Time 0, 1 day ON, 14 days ON; GC-RMA and MAS5.0), one per gene list:]
• Gene List: Caspase7 (2). Casp7 – Caspase 7
• Gene List: Cdk8 (1). Cdk8
• Gene List: p15 (1). p15
• Gene List: p57 (1). p57
• Gene List: Somatostatin (1). Somatostatin
• Gene List: Vegf (4). Vegf
• Gene List: Vimentin (1). Vimentin

Pathways

Problems seen in data
• Pancreatic tissue is notorious for producing low quality RNA (can be prevented with experience).
• Whole pancreatic tissue was used, yet we are only interested in changes in the islets. Exocrine tissue may mask important changes.
• Islet mass is not constant throughout the pancreas.
• Islet mass increases with 4-OHT administration, so later time points contain more islet tissue than earlier time points.
• Many suspected target genes show unexpected expression profiles.
Future work
• More in silico work before wet lab work?
• Wet lab work to confirm hypotheses:
– Real Time PCR
– Western Blots
– In situ PCR (not quantitative, but would allow us to concentrate on islet tissue)
– Micro Fluidic Cards?
• Run microarrays on pure exocrine tissue. Subtract from whole pancreas data to see islet changes.
• Laser microdissection to concentrate on islet tissue. Method requires work to prevent RNA degradation. Paradise FFPE kit?