Uploaded by shani hadar

Talk3 Microarrays

Microarrays
Outline
• Intro to the technology
• Sample QC
• Experimental design
• Gene expression microarrays
• SNP microarrays for SNP and CNV analyses
• Methylation microarrays
• Applications
Intro to Microarray
• A DNA microarray is a multiplex technology consisting
of thousands of oligonucleotides that are attached to a
solid surface and hybridized to a biological sample.
• Several commercial platforms are available: Affymetrix,
Illumina, Agilent, etc.
Advantages:
• Off-the-shelf product
• Uniform and reproducible (QC by manufacturer)
Disadvantages:
• Not flexible
• Known genes/transcripts only
• For specific species and organisms
The BeadArray Technology
• Silica beads that randomly self-assemble
in microwells.
• Each bead carries hundreds of thousands of copies of a
specific oligo.
• ~30-fold redundancy for each bead type.
Applications
• SNP genotyping
• Gene expression
• CNV analysis
• DNA methylation
[Figure: scatter plot of 1975176103_A AVG_Signal vs 1975176103_B AVG_Signal, both axes from 10^1 to 10^5]
DNA/RNA Quality
Automated Extraction and Purification
• DNA/RNA extraction and purification
• Spin-column kits
• Up to 12 samples
DNA/RNA Quality
• Automated electrophoresis by TapeStation, Agilent
• QC of DNA/RNA
• Sizing
• Quantitation
• Level of degradation
• 28S rRNA/18S rRNA ratio
DNA Quality
DNA Quality
RNA Quality
• Purity: A260/A280 = 1.9-2.1
• Integrity: 28S rRNA/18S rRNA = 1.8
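The two RNA criteria above can be expressed as a simple pass/fail check. This is only an illustrative sketch: the function name, the choice to treat 1.8 as a minimum for the 28S/18S ratio, and the example measurements are assumptions, not part of the original slides.

```python
def rna_passes_qc(a260_a280, ratio_28s_18s,
                  purity_range=(1.9, 2.1), min_rrna_ratio=1.8):
    """Return True if an RNA sample meets both QC criteria.

    purity_range follows the slide (A260/A280 = 1.9-2.1).
    min_rrna_ratio treats the slide's 28S/18S value of 1.8 as a
    lower bound, which is an assumption made for this sketch.
    """
    low, high = purity_range
    pure = low <= a260_a280 <= high
    intact = ratio_28s_18s >= min_rrna_ratio
    return pure and intact


# Hypothetical measurements: (A260/A280, 28S/18S ratio)
samples = {
    "sample_1": (2.00, 1.9),
    "sample_2": (1.70, 2.0),  # A260/A280 outside the purity range
    "sample_3": (2.05, 1.2),  # low 28S/18S ratio
}
for name, (purity, ratio) in samples.items():
    print(name, "pass" if rna_passes_qc(purity, ratio) else "fail")
```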
RNA Quality
RNA Quality
Experimental Design
Experimental Design
• Define your biological question.
• Consider the expected effect size and choose the
sample size accordingly.
- For a genotyping (GT) experiment (GWAS): the number of
cases and controls is determined by the required study power.
- For a gene expression (GX) experiment: at least three
biological (not technical) replicates, preferably four.
Pooling is optional.
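To illustrate how study power drives the number of cases and controls, here is a rough sketch using the standard normal approximation for comparing two proportions (for example, an allele frequency in cases vs controls). The function name, the example frequencies, and the default alpha of 5e-8 (a commonly used genome-wide significance threshold) are assumptions for illustration; a real GWAS power calculation would also account for the genetic model, linkage disequilibrium, and other factors.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(p_controls, p_cases, alpha=5e-8, power=0.8):
    """Approximate per-group sample size for a two-sided
    two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    variance = p_controls * (1 - p_controls) + p_cases * (1 - p_cases)
    effect = p_cases - p_controls
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical frequencies: a smaller effect needs many more samples.
print(samples_per_group(0.20, 0.30))
print(samples_per_group(0.20, 0.22))
```

The smaller the expected difference between groups, or the stricter the significance threshold, the larger the required sample.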
Gene Expression Microarrays
RNA Labeling and Amplification
Direct Hybridization Assay
• 750 ng of aRNA.
• Array hybridization, washing, blocking and
streptavidin-Cy3 staining.
• Quantitative detection of Cy3 fluorescence.
Gene Expression Data Analysis
Clustering Analysis
• % of variance component
• 3D PCA (Principal Component Analysis)
• Hierarchical clustering (heat map
and dendrogram)
The problem of Multiple Comparisons
• When doing so many statistical tests, it is almost
guaranteed that many of your results will be
incorrectly considered significant.
• At a threshold of p<0.05, about 5 of every 100 tests
with no real effect are expected to be called
significant by mistake.
• But when doing ~20,000 tests per experiment (as
with microarrays and RNAseq), a threshold of 0.05 means
that up to ~1000 genes (20,000 x 0.05) are expected to be
called significant by mistake.
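The arithmetic behind these numbers can be sketched as follows. The function names are made up for this example, and the second function assumes the tests are independent, which is a simplification for gene expression data.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when no gene truly changes."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 100, 20_000):
    print(n,
          expected_false_positives(n),
          round(prob_at_least_one_false_positive(n), 3))
```

With 100 tests the chance of at least one false positive is already above 99%, which is why an uncorrected threshold of 0.05 is not usable at the scale of a microarray.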
The Solution for Multiple Comparisons
• Filter out transcripts with expression at background
level or transcripts with less than 5% variation between
samples.
• Bonferroni correction: divide the threshold by the number
of tests, so 0.05/20,000 = 2.5x10^-6. Thus, you should
consider a result significant only if p<2.5x10^-6.
• False Discovery Rate (FDR): the expected proportion of
false positives among the positive results. At an adjusted
p-value (q-value) of 0.05, if you call 1000 genes
significantly changed, about 50 of those are expected to be
false positives, and the rest are expected to be true positives.
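A minimal sketch of both corrections is given below. The slides do not say which FDR procedure is meant; the Benjamini-Hochberg procedure is used here as a common choice, and the function names and the example p-values are assumptions for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bonferroni_threshold(0.05, 20_000))

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values for four genes
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(p, round(q, 4))
```

Genes whose adjusted p-value is at or below the chosen FDR (for example 0.05) are called significant.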
Analysis of Variance (ANOVA)
• ANOVA allows us to compare means of ≥2 groups.
• You will get a list of significant differentially expressed
genes between groups (at an adjusted p-value of 0.05
and difference between the means of 1).
• Volcano plots: -log10(p-value) = 2 corresponds to p-value = 0.01.
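The volcano plot axes and cutoffs can be sketched as below. This example assumes that the "difference between the means of 1" is on a log2 scale (a 2-fold change), which the slides do not state explicitly; the function names and cutoff defaults are likewise illustrative.

```python
from math import log10


def volcano_y(p_value):
    """Y-axis value of a volcano plot: -log10 of the p-value."""
    return -log10(p_value)


def volcano_call(p_value, log2_difference, p_cutoff=0.01, diff_cutoff=1.0):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    diff_cutoff is interpreted as a log2 difference between group
    means, which is an assumption of this sketch.
    """
    if p_value > p_cutoff or abs(log2_difference) < diff_cutoff:
        return "ns"
    return "up" if log2_difference > 0 else "down"


# Hypothetical genes: (p-value, log2 difference between group means)
for p, d in [(0.001, 1.5), (0.001, -2.0), (0.2, 3.0), (0.001, 0.3)]:
    print(round(volcano_y(p), 2), d, volcano_call(p, d))
```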
Venn Diagram
Pathway and Network Analysis
SNP Microarrays
ILM Whole-Genome SNP Genotyping
• Available for human and agricultural products
• Each array contains 700K-5M probes
• Off-the-shelf and custom arrays
• 200 ng gDNA (quantitation method: Qubit or
PicoGreen)
• SNPs can be used in GWAS for identification of
susceptibility loci (large groups of cases and controls)
and for analysis of copy number variations (CNVs)
Infinium Assay Workflow
[Figure: genotype cluster plot for rs2179648 (Norm R vs Norm Theta)]
Data Analysis for SNPs
[Figure: genotype cluster plots for rs12191877 and rs2179648 (Norm R vs Norm Theta)]
CNV Analysis
Milestones in Cytogenetics
1960s G-banding karyotype
1980s FISH
1990s CGH
aCGH
1999 SNP arrays
[Figure: SNP array genotype cluster plot for rs2179648 (Norm R vs Norm Theta)]
2005 NGS
Detection of Structural Variation
Data Analysis for CNVs
• B Allele Frequency (BAF): expresses the proportion of the
intensity contributed by the B allele, which indicates the
genotype.
BAF = number of B alleles / total number of alleles
• Log2 R Ratio (LRR): expresses the signal intensity of each
probe, which indicates the copy number.
LRR = log2(R_observed / R_expected)
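Both quantities are simple to compute from their definitions. The sketch below uses genotype strings such as "AB" or "AAB" as a convenient stand-in for allele counts; the function names and the example intensities are assumptions for illustration.

```python
from math import log2


def baf(genotype):
    """B allele frequency: number of B alleles / total number of alleles."""
    return genotype.count("B") / len(genotype)


def lrr(r_observed, r_expected):
    """Log2 R Ratio: log2 of observed over expected probe intensity."""
    return log2(r_observed / r_expected)


for g in ("AA", "AB", "BB", "AAB", "ABB"):
    print(g, round(baf(g), 2))

# Hypothetical intensities: equal to, below, and above the expected value.
for r_obs in (1.0, 0.8, 1.3):
    print(r_obs, round(lrr(r_obs, 1.0), 2))
```

An LRR of 0 means the observed intensity matches the expected one; negative values suggest a loss and positive values a gain.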
Data Analysis for CNVs
[Figure: LRR (intensity) and BAF (genotype) plots for four states: normal (diploid), deletion (loss of one copy), duplication (gain of one copy), and copy-neutral LOH (UPD)]
Also can detect: amplification, unbalanced aberration,
aneuploidy, and mosaicism.
Data Analysis for CNVs
Normal (diploid)
• LRR = 0, CN = 2
• BAF: AA = 0/2 = 0, AB = 1/2 = 0.5, BB = 2/2 = 1
[Figure: LRR (intensity) and BAF (genotype) plots]
Data Analysis for CNVs
Deletion (loss of one copy)
• LRR = -0.5, CN = 1
• BAF: A/- = 0/1 = 0, B/- = 1/1 = 1
[Figure: LRR (intensity) and BAF (genotype) plots]
Data Analysis for CNVs
Duplication (gain of one copy)
• LRR = 0.5, CN = 3
• BAF: AAA = 0/3 = 0, AAB = 1/3 = 0.33, ABB = 2/3 = 0.67, BBB = 3/3 = 1
[Figure: LRR (intensity) and BAF (genotype) plots]
Data Analysis for CNVs
Copy-Neutral LOH (UPD)
• LRR = 0, CN = 2
• BAF: AA = 0/2 = 0, BB = 2/2 = 1, AB = not present
[Figure: LRR (intensity) and BAF (genotype) plots]
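The four patterns on these slides can be combined into a toy decision rule. This is a deliberately simplified sketch: the function name, the LRR tolerance of 0.25 (the midpoint between the slide values 0 and ±0.5), and the boolean summary of the BAF plot are all assumptions. Real CNV calling works on many noisy probes per segment and uses statistical models rather than fixed cutoffs.

```python
def classify_segment(mean_lrr, has_intermediate_baf, lrr_tol=0.25):
    """Assign one of the four states shown on the slides.

    mean_lrr: mean LRR of the probes in the segment.
    has_intermediate_baf: True if BAF values between 0 and 1 are
        present (0.5 for AB, or 0.33/0.67 for AAB/ABB).
    lrr_tol: hypothetical cutoff separating LRR = 0 from -0.5 / 0.5.
    """
    if mean_lrr <= -lrr_tol:
        return "deletion"
    if mean_lrr >= lrr_tol:
        return "duplication"
    # LRR near 0: two copies, so the BAF pattern decides.
    return "normal" if has_intermediate_baf else "copy-neutral LOH"


print(classify_segment(0.0, True))
print(classify_segment(-0.5, False))
print(classify_segment(0.5, True))
print(classify_segment(0.0, False))
```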
Methylation Microarrays
Methylation
C
G
ILM Whole-Genome Methylation
• > 850K methylation sites that cover:
- 99% of RefSeq genes, ~20 CpG sites per gene
distributed across the promoter, 5’UTR, first
exon, gene body, and 3’UTR.
- 96% of CpG islands.
- CpG sites outside of CpG islands.
- Non-CpG methylated sites identified in human stem cells.
- Differentially methylated sites in tumors.
- Differentially methylated sites across tissues.
- CpG islands outside of coding regions.
- miRNA promoter regions.
• Experimental design should consider the effects of age, gender,
and tissue on the methylation profile.
Methylation Assay
Summary
• Microarray technology is a powerful tool in
research.
• Microarrays provide insight into basic mechanisms,
disease subtypes, treatment adjustment, and
disease risk.
• Data analysis of microarrays has been standardized
and packaged into easy-to-use software tools for
biologists.
• However, the success of an experiment depends on
experimental design, biological sample collection,
and careful statistical analysis.
• Microarrays are expected to be completely replaced soon
by NGS technology (this has already happened for gene
expression).
Thank you!
Questions?