Microarrays

Outline
• Intro to the technology
• Sample QC
• Experimental design
• Gene expression microarrays
• SNP microarrays for SNP and CNV analyses
• Methylation microarrays
• Applications

Intro to Microarray
• A DNA microarray is a multiplex technology consisting of thousands of oligonucleotides that are attached to a solid surface and hybridized to a biological sample.
• Several commercial platforms are available: Affymetrix, Illumina, Agilent, etc.

Advantages:
- Off-the-shelf product
- Uniform and reproducible (QC by manufacturer)
Disadvantages:
- Not flexible
- Limited to known genes/transcripts
- Available only for specific species and organisms

The BeadArray Technology
• Silica beads randomly self-assemble in microwells.
• Each bead carries hundreds of thousands of copies of a specific oligo.
• ~30-fold redundancy for each bead type.

Applications
• SNP genotyping
• Gene expression
• CNV analysis
• DNA methylation
[Figure: scatter plot of 1975176103_A AVG_Signal vs 1975176103_B AVG_Signal, log-scale axes from 10^1 to 10^5]

DNA/RNA Quality

Automated Extraction and Purification
• DNA/RNA extraction and purification
• Spin-column kits
• Up to 12 samples

DNA/RNA Quality
• Automated electrophoresis by TapeStation, Agilent
• QC of DNA/RNA
• Sizing
• Quantitation
• Level of degradation
• 28S rRNA/18S rRNA ratio

DNA Quality

RNA Quality
• Purity: A260/A280 = 1.9-2.1
• Integrity: 28S rRNA/18S rRNA = 1.8

Experimental Design
• Define your biological question.
• Estimate the expected effect size and choose the sample size accordingly.
- For a genotyping (GT) experiment (GWAS): the number of cases and controls is determined by the study power.
- For a gene expression (GX) experiment: at least three biological (not technical) replicates, preferably four. Optional: pooling.

Gene Expression Microarrays

RNA Labeling and Amplification

Direct Hybridization Assay
• 750 ng of aRNA.
• Array hybridization, washing, blocking, and streptavidin-Cy3 staining.
• Quantitative detection of Cy3 fluorescence.

Gene Expression Data Analysis

Clustering Analysis
• % of variance component
• 3D PCA (Principal Component Analysis)
• Hierarchical clustering (heat map and dendrogram)

The Problem of Multiple Comparisons
• When running so many statistical tests, it is almost guaranteed that many results will be incorrectly called significant.
• At a threshold of p < 0.05, about 5 of every 100 tests with no real effect will be called significant by chance.
• With ~20,000 tests per experiment (as with microarrays and RNA-seq), a threshold of 0.05 means that about 1,000 genes (20,000 x 0.05) can be expected to be called significant by mistake.

The Solution for Multiple Comparisons
• Filter out transcripts with expression at background level, or transcripts with less than 5% variation between samples.
• Bonferroni correction: divide the threshold by the number of tests, so 0.05/20,000 = 2.5x10^-6. A result is then considered significant only if p < 2.5x10^-6.
• False Discovery Rate (FDR): the expected proportion of false positives among the positive results. At an adjusted p-value of 0.05, if you call 1,000 genes significantly changed, about 50 of them are expected to be false positives.
• The FDR-adjusted p-value is also called the q-value.

Analysis of Variance (ANOVA)
• ANOVA allows us to compare the means of ≥2 groups.
• The result is a list of genes that are significantly differentially expressed between groups (at an adjusted p-value of 0.05 and a difference between the means of at least 1).
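The arithmetic on the multiple-comparisons slides can be reproduced with a short Python sketch. It is illustrative only: the function names and the toy p-values are assumptions, not part of the original material. The FDR adjustment shown is the Benjamini-Hochberg procedure, one common way to compute adjusted p-values; the slides do not say which FDR method the analysis software uses.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


n_genes = 20_000
alpha = 0.05

# Expected number of chance "hits" if no gene truly changes.
expected_false_calls = alpha * n_genes

# Bonferroni: each test must pass alpha / n_genes.
threshold = bonferroni_threshold(alpha, n_genes)

# Hypothetical p-values for five genes, for illustration only.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_q = benjamini_hochberg(toy_p)

print(expected_false_calls, threshold, toy_q)
```

Bonferroni controls the chance of even one false positive and is therefore very strict for 20,000 tests, whereas the FDR approach tolerates a fixed expected fraction of false positives among the genes that are called.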
• Volcano plots: a threshold of -log10(p-value) = 2 corresponds to p-value = 0.01.

Venn Diagram

Pathway and Network Analysis

SNP Microarrays

Illumina (ILM) Whole-Genome SNP Genotyping
• Available for human and agricultural products
• Each array contains 700K-5M probes
• Off-the-shelf and custom arrays
• 200 ng gDNA (quantitation method: Qubit or PicoGreen)
• SNPs can be used in GWAS to identify susceptibility loci (large groups of cases and controls) and to analyze copy number variations (CNVs)

Infinium Assay Workflow
[Figure: genotype cluster plot for rs2179648, Norm R vs Norm Theta]

Data Analysis for SNPs
[Figure: genotype cluster plots for rs12191877 and rs2179648, Norm R vs Norm Theta]

CNV Analysis

Milestones in Cytogenetics
• 1960s: G-banding karyotype
• 1980s: FISH
• 1990s: CGH, aCGH
• 1999: SNP arrays
• 2005: NGS
[Figure: genotype cluster plot for rs2179648, Norm R vs Norm Theta]

Detection of Structural Variation

Data Analysis for CNVs
• B Allele Frequency (BAF): expresses the proportion of the intensity contributed by a given allele, which indicates the genotype.
BAF = number of B alleles / total number of alleles
• Log2 R Ratio (LRR): expresses the signal intensity of each probe, which indicates the copy number.
LRR = log2(R_observed / R_expected)

Data Analysis for CNVs
• Normal (diploid)
• Deletion (loss of one copy)
• Duplication (gain of one copy)
• Copy-neutral LOH (UPD)
[Figure: LRR (intensity) and BAF (genotype) tracks for each case]
Can also detect: amplification, unbalanced aberrations, aneuploidy, and mosaicism.

Data Analysis for CNVs: Normal (diploid)
• LRR = 0, CN = 2
• BAF: AA = 0/2 = 0, AB = 1/2 = 0.5, BB = 2/2 = 1
[Figure: LRR (intensity) and BAF (genotype) tracks]

Data Analysis for CNVs: Deletion (loss of one copy)
• LRR = -0.5, CN = 1
• BAF: A/- = 0/1 = 0, B/- = 1/1 = 1
[Figure: LRR (intensity) and BAF (genotype) tracks]

Data Analysis for CNVs: Duplication (gain of one copy)
• LRR = 0.5, CN = 3
• BAF: AAA = 0/3 = 0, AAB = 1/3 = 0.33, ABB = 2/3 = 0.67, BBB = 3/3 = 1
[Figure: LRR (intensity) and BAF (genotype) tracks]

Data Analysis for CNVs: Copy-neutral LOH (UPD)
• LRR = 0, CN = 2
• BAF: AA = 0/2 = 0, BB = 2/2 = 1, AB = not present
[Figure: LRR (intensity) and BAF (genotype) tracks]

Methylation Microarrays

Methylation
[Figure: CpG (C-G) diagram]

Illumina (ILM) Whole-Genome Methylation
• > 850K methylation sites that cover:
- 99% of RefSeq genes, ~20 CpG sites per gene distributed across the promoter, 5'UTR, first exon, gene body, and 3'UTR.
- 96% of CpG islands.
- CpG sites outside of CpG islands.
- Non-CpG methylated sites identified in human stem cells.
- Differentially methylated sites in tumors.
- Differentially methylated sites across tissues.
- CpG islands outside of coding regions.
- miRNA promoter regions.
• Experimental design should consider the effect of age, gender, and tissue on the methylation profile.

Methylation Assay

Summary
• Microarray technology is a powerful tool in research.
• Microarrays provide insight into basic mechanisms, disease subtypes, treatment adjustment, and disease risk.
• Data analysis of microarrays has been standardized and packaged into easy-to-use software tools for biologists.
• However, the success of an experiment depends on experimental design, collection of biological samples, and careful statistical analysis.
• Microarrays are expected to be replaced by NGS technology soon (this has already happened for gene expression).

Thank you! Questions?
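As a supplement to the CNV slides, the BAF and LRR definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `baf` and `lrr` are hypothetical, and BAF is computed from a known genotype string exactly as in the slide formula. On real arrays both values are derived from normalized probe intensities, not from known genotypes.

```python
import math


def baf(genotype):
    """B allele frequency as defined on the slides: B alleles / all alleles."""
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError("genotype must be a non-empty string of 'A' and 'B'")
    return genotype.count("B") / len(genotype)


def lrr(r_observed, r_expected):
    """Log2 R Ratio: log2 of observed over expected probe intensity."""
    return math.log2(r_observed / r_expected)


# Genotypes from the normal, deletion, and duplication slides.
for genotype in ["AA", "AB", "BB", "A", "B", "AAB", "ABB"]:
    print(genotype, round(baf(genotype), 2))

# An observed intensity equal to the expected intensity gives LRR = 0.
print(lrr(1.0, 1.0))
```

The heterozygous BAF values are what distinguish the cases: 0.5 for a normal diploid region, 0.33 and 0.67 for a duplication, and no intermediate value at all for a deletion or copy-neutral LOH.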