Microarray Data Analysis
Illumina Gene Expression Data Analysis
Yun Lian

Introduction
The large volumes of microarray data generated by different technologies are too big to analyze by simple sorting in spreadsheets, manual comparison, or plotting graphs.
Each type and/or platform of microarray has its own unique analysis features.

General Procedures
Normalization: removes or reduces systematic (non-biological) variation between arrays or chips. The aim is to equalize the overall signal across the arrays/chips to be compared.
Examples of normalization: whole chip, per gene, quantile, Lowess, dye swap, RMA, ...

Comparative: compares gene expression across two or more samples to determine a list of significantly differentially expressed genes.
Example methods:
- t-test
- ANOVA
- Fold change
- Rank order (MAS 5 etc.)
- Permutation (SAM)

Clustering: identifies significant correlation in expression data across experiments/conditions.
Example methods:
- Hierarchical clustering
- k-means clustering
- Self-organizing maps
- ...

Biological overlay: identifies functions for given genes and functional clusters of genes; supports hypothesis generation.
Example methods:
- Multi-database access (Source)
- Functional grouping (Gene Ontology, KEGG, GenMAPP)
- PubMed correlations (PubGene)

GenomeStudio
A software tool for analyzing Illumina gene expression data from scanned microarray images collected by the Illumina BeadArray Reader. The resulting BeadStudio files can be used by third-party analysis programs.
The normalization uses quantiles of sample intensities to fit smoothing B-splines. It is a non-linear method: different scaling factors are applied to different parts of the population of genes.

Differential Expression Algorithm
This is used to compare a group of samples to a reference group.
- Illumina custom: assumes that signal intensity is normally distributed among replicates. The variation has three components: biological, non-biological, and technical error.
- Mann-Whitney: also called the Wilcoxon rank-sum test.
  It is a non-parametric test for assessing whether two samples of observations come from the same distribution.
- T-test:

Output Files of BeadStudio
- XXXXXX_gene_profile: Intensity data and various quality scores reported at the gene level. Signals from probes for the same gene are combined to give a single value for the gene.
- XXXXXX_qcinfo: Intensity data for categories of experimental control probes.
- XXXXXX_gene_diff: Intensity data for determining whether gene expression levels have changed between two experimental groups.

Cluster Analysis

System Controls
- Housekeeping Controls: monitor the intactness of the biological specimen.
- Biotin Control: successful secondary staining is indicated by a positive hybridization signal from these probes.
- Negative Controls: represent the measurement of background, non-specific binding, or cross-hybridization.
- Controls for Hybridization:
  - Cy3-Labeled Hyb Control
  - Low Stringency Hyb Control
  - High Stringency Hyb Control
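
To make the Mann-Whitney (Wilcoxon rank-sum) option under the Differential Expression Algorithm more concrete, here is a minimal Python sketch of the rank-based comparison for a single gene. It is an illustration only, not the GenomeStudio/BeadStudio implementation: the function names `average_ranks` and `mann_whitney_u` and the intensity values are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is only rough for the small replicate counts typical of microarray experiments.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for group_a, approximate p-value).
    Assumption: no tie correction and no continuity correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return u_a, p_value


# Hypothetical intensities for one gene: reference group vs. treated group.
reference = [210.5, 198.2, 225.1, 204.7]
treated = [310.9, 295.4, 330.2, 288.6]
u_stat, p = mann_whitney_u(reference, treated)
print(f"U = {u_stat}, approximate p = {p:.4f}")
```

Because the test works on ranks rather than raw intensities, it does not rely on the normality assumption that the Illumina custom model makes, at the cost of lower power when only a few replicates are available.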