Microarray Data Analysis

Microarray Data Analysis
Illumina Gene Expression Data Analysis
Yun Lian
Introduction

The volumes of microarray data generated by
different technologies are too large to analyze
by simple sorting in spreadsheets, manual
comparison, or plotting as graphs.

Each type and/or platform of microarray has
its own unique analysis features.
General Procedures

Normalization: remove/reduce systematic
(non-biological) variation from array to array
and chip to chip. Try to equalize overall
signals across the arrays/chips to be compared.
Examples of normalization: whole chip, per
gene, quantile, Lowess, dye swap, RMA,…
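As an illustration of one method from this list, quantile normalization can be sketched in a few lines. This is a minimal sketch, not any vendor's implementation: the function name quantile_normalize is hypothetical, each array is assumed to be a plain list of intensities over the same probes in the same order, and tied values are ranked in their original order rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of intensities (one list per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Each probe receives the mean intensity of its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step every array has the same set of values, so the overall signal distributions are identical; only the ordering of probes within each array differs.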
Cont.

Comparative: Compare gene expression across two
or more samples to determine a list of significantly
differentially expressed genes.
Example methods:
t-test, ANOVA
Fold change
Rank order (MAS 5 etc.)
Permutation (SAM)
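Two of the ideas above, fold change and permutation, can be sketched for a single gene as follows. This is a simplified illustration and not the SAM algorithm itself: the function names are hypothetical, the inputs are assumed to be positive, untransformed intensities, and the permutation test enumerates every relabeling of the samples, which is only practical for small groups.

```python
import math
from itertools import combinations
from statistics import mean

def log2_fold_change(group_a, group_b):
    """log2 ratio of mean intensity in group_b over group_a."""
    return math.log2(mean(group_b) / mean(group_a))

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))
    hits = total = 0
    # Try every way of assigning n_a of the pooled samples to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(b) - mean(a)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total
```

With three replicates per group there are only 20 possible relabelings, so the smallest achievable two-sided p-value is 2/20 = 0.1. This is one reason replicate number matters for permutation-based methods.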
Cont.

Clustering: Identifies significant correlation in
expression data across experiments/conditions.
Example methods:
Hierarchical clustering
k-means clustering
Self-organizing maps
…..
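Hierarchical clustering of expression profiles can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, distance is taken as 1 minus the Pearson correlation, clusters are merged by average linkage, and every profile is assumed to vary across conditions (a constant profile would make the correlation undefined).

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def hierarchical_clusters(profiles, n_clusters):
    """Merge genes bottom-up until n_clusters remain.

    profiles: dict mapping gene name -> list of expression values.
    """
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average linkage on correlation distance (1 - r).
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Merge the closest pair of clusters.
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Correlation distance groups genes whose expression rises and falls together across conditions, regardless of absolute signal level.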
Cont.
Biological overlay: Identify functions for given
genes; functional clusters of genes; hypothesis
generation
Example methods:
Multi-database access (Source)
Functional grouping (Gene Ontology, KEGG,
GenMAPP)
PubMed Correlations (PubGene)

GenomeStudio


A software tool for analyzing Illumina gene
expression data from scanned microarray
images collected from the Illumina BeadArray
Reader.
Resulting BeadStudio files can be used by
third-party analysis programs.
The normalization uses quantiles of sample intensities to fit smoothing
B-splines. It’s a non-linear method. Different scaling factors are
applied to different parts of the population of genes.
Differential Expression Algorithm
This is used to compare a group of samples to a
reference group.
• Illumina custom: assumes that signal intensity is
normally distributed among replicates. The variation
has 3 components: biological, non-biological, and
technical errors.
• Mann-Whitney: also called the Wilcoxon rank-sum test.
It’s a non-parametric test for assessing whether two
samples of observations come from the same
distribution.
• T-test:
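The rank-sum idea behind the Mann-Whitney test can be sketched as follows. This is an illustrative sketch only, not the GenomeStudio implementation: the function name mann_whitney_u is hypothetical, tied values receive the average of the ranks they span, and the code returns the U statistics without converting them to a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using midranks for ties."""
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Ranks i+1 .. j share their average rank.
        midrank = (i + 1 + j) / 2
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b
```

A U value of 0 for one sample means all of its observations rank below every observation in the other sample, which is the strongest possible separation for those sample sizes.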
Output Files of BeadStudio



XXXXXX_gene_profile: Intensity data and various
quality scores reported at the gene
level. Signals from probes for the same gene are
combined to give a single value for the gene.
XXXXXX_qcinfo: Intensity data for categories of
experimental control probes.
XXXXXX_gene_diff: Intensity data and results for
determining whether gene expression levels have
changed between two experimental groups.
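The gene-level values in the gene_profile file come from combining probe signals for the same gene. The slides do not say how the signals are combined, so the sketch below simply assumes an unweighted mean; the function name and the (gene, signal) input layout are hypothetical and do not reflect the actual file format.

```python
from collections import defaultdict
from statistics import mean

def combine_probes_by_gene(probe_signals):
    """Collapse (gene, signal) pairs into one mean signal per gene."""
    by_gene = defaultdict(list)
    for gene, signal in probe_signals:
        by_gene[gene].append(signal)
    # Assumption: an unweighted mean stands in for the real combination rule.
    return {gene: mean(values) for gene, values in by_gene.items()}
```

For example, two probes for one gene with signals 100.0 and 120.0 would be reported as a single gene-level value of 110.0 under this assumption.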
Cluster Analysis
System Controls



Housekeeping Controls: These monitor the
intactness of the biological specimen.
Biotin Control: Successful secondary staining
is indicated by a positive hybridization signal
from these probes.
Negative Controls: These represent a
measurement of background, non-specific
binding or cross-hybridization.
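One way negative controls can be used is to ask how often background alone reaches a given probe's signal. The sketch below is an assumption-laden illustration, not the vendor's detection algorithm: the function name is hypothetical and the score is simply the fraction of negative-control signals at or above the probe signal.

```python
def detection_p_value(signal, negative_control_signals):
    """Fraction of negative controls with signal >= the probe's signal."""
    n = len(negative_control_signals)
    if n == 0:
        raise ValueError("at least one negative control is required")
    exceed = sum(1 for neg in negative_control_signals if neg >= signal)
    return exceed / n
```

A value near 0 suggests the probe signal stands clearly above background, while a value near 1 suggests it is indistinguishable from non-specific binding.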
Cont.

Controls for Hybridization:
Cy3-Labeled Hyb control
Low Stringency Hyb Control
High Stringency Hyb Control