Introduction to DNA Microarrays: TECHNOLOGY

Data Type 1: Microarrays
• Reverse Genetics approach
• Genomics
• So we need to understand exactly what a microarray is
• DNA microarrays are small, solid supports (the size of two
side-by-side pinky fingers) onto which the sequences from
thousands of different genes are immobilized, or attached, at
fixed locations.
• So we need to KNOW the gene sequences to design the array.
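Knowing the sequence matters because each probe must be complementary to the target it is meant to capture. The Python sketch below tiles a known sequence into non-overlapping 25-base windows and uses the reverse complement of each window as a probe. The target sequence and function names are hypothetical, and it assumes the labeled target has the same sequence as the gene (protocols differ in which strand is labeled). Real probe design also screens candidates for uniqueness and cross-hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_probes(target: str, probe_length: int = 25) -> list[str]:
    """Tile the known target into non-overlapping windows, one probe per window.

    Each probe is the reverse complement of its window, so it can base-pair
    (hybridize) with that stretch of the labeled target.
    """
    target = target.upper()
    return [
        reverse_complement(target[start:start + probe_length])
        for start in range(0, len(target) - probe_length + 1, probe_length)
    ]


def hybridizes(probe: str, sample: str) -> bool:
    """Perfect-match test: does the sample contain the stretch this probe captures?"""
    return reverse_complement(probe) in sample.upper()


# Hypothetical 60-base target sequence (six 10-base chunks), for illustration only
target = (
    "ATGGCTAGCT" "TACGGATCCG" "ATCGTACGTT"
    "AGCATGCCGA" "TAGCTAGGCT" "TACGATCGAA"
)
probes = design_probes(target)
print(len(probes), [hybridizes(p, target) for p in probes])
```

Without the target sequence there is nothing to take the complement of, which is why the array cannot be designed for unknown genes.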
Definition of Microarray
A chip-based device that is used to detect the DNA makeup of
a cell.
• It contains hundreds of thousands of tiny squares, each designed
to bind (hybridize with) a sequence from a particular gene.
• Labeled nucleic acid extracted from the cells is washed over the
chip, binds to the matching squares, and is detected by a laser scanner.
• Microarrays are helping revolutionize medicine by pinpointing
a very specific disease or the susceptibility to it.
• Sometimes called "biochips," microarrays are commonly
known as "gene chips"; GeneChip is an Affymetrix trademark.
Microarrays have
• revealed new patterns of coordinated gene expression across
gene families,
• expanded the size of existing gene families,
• increased our understanding of how these genes coordinate, so that
precise knowledge of these inter-relationships has emerged,
• sped up the identification of genes involved in the development
of various diseases,
• aided the examination of the integration of gene expression and
function at the cell level,
• revealed how multiple gene products work together,
and will continue to do so.
A Microarray
Gene Expression Analysis
• Typical Northern blot: one gene per experiment, more than
one sample
– Fairly quantitative
– Time consuming
– Limited information
• Microarray and RNA-seq: thousands of genes per sample
– Fairly quantitative
– Less time
– Massive information
REMEMBER:
Central dogma of molecular biology
• Each gene is transcribed (at the appropriate time) from
DNA into mRNA, which then leaves the nucleus and is
translated into the required protein.
• This is the principle used for microarrays: the amount of a gene's
mRNA in a sample indicates how actively that gene is expressed.
Types of Microarrays
• Two-channel spotted arrays (now more historical than current):
– cDNA microarrays: probes are cDNAs 300-3000 base pairs in length,
PCR-amplified from custom libraries and spotted onto a microscope slide by a
robot.
– Long-oligo spotted arrays: shorter, uniform-length oligonucleotide
probes (60-90 nucleotides), usually synthesized and then spotted as with
cDNA.
– Commercial spotted arrays (e.g., Agilent) that deposit long-oligo probes
using inkjet technology.
• Single channel arrays:
– High-density short (e.g., 25-base) oligo arrays (Affymetrix, NimbleGen)
synthesized in situ.
– Some commercial arrays (e.g., Applied Biosystems AB1700)
– Some spotted single-channel cDNA arrays (uncommon, but they
exist)
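The practical difference between the two families is what each spot reports. On a two-channel array, two samples labeled with different fluorescent dyes (conventionally red Cy5 and green Cy3) are hybridized to the same spots, so each spot yields two intensities that are compared as a ratio; a single-channel array yields one intensity per sample. A minimal Python sketch of the usual log-ratio summary, using the common MA-plot naming (M, A) and hypothetical intensities:

```python
import math


def ma_values(red: float, green: float) -> tuple[float, float]:
    """Return (M, A) for one spot on a two-channel array.

    M = log2(red / green): log ratio between the two labeled samples.
    A = 0.5 * log2(red * green): average log2 intensity of the spot.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive (background-corrected)")
    return math.log2(red / green), 0.5 * math.log2(red * green)


# Hypothetical background-corrected (red, green) intensities for three spots
spots = {"geneA": (4000.0, 1000.0), "geneB": (800.0, 800.0), "geneC": (250.0, 1000.0)}
for gene, (red, green) in spots.items():
    m, a = ma_values(red, green)
    print(f"{gene}: M = {m:+.2f}, A = {a:.2f}")
```

M = +2 means four times more signal from the red-labeled sample, M = 0 means no difference, and M = -2 means four times more from the green-labeled sample.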
Single Channel AFFY chip
Affymetrix “Gene chip” system in 2007
• Uses 25-base oligos synthesized in place on a chip (20 pairs
of oligos for each gene)
• RNA is labeled and scanned in a single “color”
– one sample per chip
• Can have as many as 50,000 genes on a chip, and the number keeps
increasing
• Features get smaller every year, so each array holds more genes
• Chips are expensive
• Proprietary system: “black box” software, and only their own chips
can be used
• TOUTED AS THE CHIP WITHOUT ISSUES
Current Affymetrix
• Many products, with many more options for custom arrays.
• Some products can handle multiple samples at once.
• Design of the Affymetrix HG-U133, MG-430, and RG-230 arrays:
• Sequence data sources (e.g., RefSeq and UniGene) are used to design probes that interrogate 9 to 11 unique
sequences of each transcript.
• The unique 25-mer probes interrogate up to 275 bases per transcript.
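The 275-base figure follows from the probe count and length: 11 non-overlapping 25-mers cover 11 × 25 = 275 bases, and 9 probes cover 225. A minimal Python check, assuming the probes do not overlap (overlapping probes would cover fewer distinct bases); the function name is illustrative:

```python
PROBE_LENGTH = 25  # Affymetrix expression probes are 25-mers


def max_bases_interrogated(n_probes: int, probe_length: int = PROBE_LENGTH) -> int:
    """Upper bound on distinct transcript bases covered by n non-overlapping probes."""
    return n_probes * probe_length


for n in (9, 11):
    print(f"{n} probes x {PROBE_LENGTH} bases = {max_bases_interrogated(n)} bases")
```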
Affy-GWAS chips
• The new Affymetrix Genome-Wide Human SNP Array 6.0 features
1.8 million genetic markers, including more than 906,600 single
nucleotide polymorphisms (SNPs) and more than 946,000 probes for
the detection of copy number variation.
• The high price-performance value of the SNP Array 6.0 enables
researchers to design association studies with larger sample sizes in
the initial scan and replication phases, thereby significantly
increasing the overall genetic power of their studies.
Current Content (number of probe sets per array)
Probe set category                              HG-U133  HG-U219   MG-430   RG-230
NM – RefSeq coding transcript (curated)          38,026   43,134   34,325   17,277
XM – RefSeq coding transcript (predicted)         1,071      177    1,021    2,692
NR – RefSeq non-coding transcript (curated)       1,515      542      438       12
XR – RefSeq non-coding transcript (predicted)       476       79      108      227
RefSeq probe sets (total)                        41,090   43,932   35,892   20,208
UniGene probe sets                               10,377    5,388    2,819    8,842
Other probe sets                                  3,273      334    6,412    2,070
Total probe sets                                 54,700   49,411   45,123   31,120
Probe Design: Affymetrix
Photolithographic Synthesis
Scanning AFFY chips
• Light removes protecting groups at defined positions.
• A single type of nucleotide is washed over the chip and binds only
where the protecting group has been removed.
• Through successive steps, any sequence can be built up in any
position on the chip.
• The number of steps depends on the length of the oligo (at most four
per base position, one for each nucleotide), not on the number of probes,
so the number of genes can be increased without increasing the number of steps.
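The step counting in the last bullet can be simulated. This Python sketch assumes a simple fixed cycle (for each base position, one mask-and-wash step per nucleotide A, C, G, T), which gives exactly 4 × length steps; real mask schedules can be optimized to need fewer. The function name is illustrative.

```python
from itertools import product


def photolithographic_synthesis(probes: list[str]) -> tuple[list[str], int]:
    """Grow every probe in parallel, one base position at a time."""
    length = len(probes[0])
    if any(len(p) != length for p in probes):
        raise ValueError("this sketch assumes all probes have the same length")
    features = [""] * len(probes)  # oligo currently attached at each feature
    steps = 0
    for position in range(length):
        for base in "ACGT":
            # Mask step: light removes protecting groups only at features
            # whose next base should be `base`.
            exposed = [i for i, p in enumerate(probes) if p[position] == base]
            # Coupling step: the washed-over nucleotide binds only at exposed
            # features (and carries a new protecting group).
            for i in exposed:
                features[i] += base
            steps += 1
    return features, steps


few = ["ACGTA", "TTTTT", "GACCA"]
many = ["".join(bases) for bases in product("ACGT", repeat=5)]  # all 1,024 5-mers
print(photolithographic_synthesis(few)[1], photolithographic_synthesis(many)[1])
```

Both calls use 20 steps (4 × 5): growing 3 probes or 1,024 probes costs the same number of synthesis cycles.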
Analysis of expression level from probe sets
Pixel intensities are quantified and integrated for each
oligo feature (range 0-25,000)
Perfect Match (PM)
Mismatch (MM) control
PM - MM = difference score per probe pair
All significant difference scores in a probe set are averaged to
create the “average difference” = expression level of
the gene.
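The average-difference calculation can be sketched in Python. On Affymetrix arrays each MM probe is identical to its PM partner except for the middle (13th) base, so PM - MM estimates the specific signal of one probe pair. The slide does not say how "significant" differences are chosen; this sketch assumes a hypothetical outlier rule (keep pairs within `k` median absolute deviations of the median difference), which is not Affymetrix's published algorithm, and the intensities are made up.

```python
from statistics import mean, median


def average_difference(pm: list[float], mm: list[float], k: float = 3.0) -> float:
    """Average PM - MM over the probe pairs of one probe set, dropping outliers.

    The outlier rule (|d - median| <= k * MAD) is an illustrative assumption.
    """
    if not pm or len(pm) != len(mm):
        raise ValueError("PM and MM must be non-empty and the same length")
    diffs = [p - m for p, m in zip(pm, mm)]  # one difference score per probe pair
    center = median(diffs)
    mad = median([abs(d - center) for d in diffs])  # median absolute deviation
    kept = [d for d in diffs if abs(d - center) <= k * mad]
    return mean(kept)


# Hypothetical intensities for one five-pair probe set; the last pair is an outlier
pm = [1200, 1050, 1400, 1200, 9000]
mm = [200, 150, 300, 250, 100]
print(average_difference(pm, mm))
```

Here the outlying pair (difference 8,900) is dropped and the result is 987.5, whereas a plain mean of all five differences would be 2,570.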
Microarray Data Analysis
How to Handle Microarray Data?
Preprocessing:
• Signal generation from image
• Normalization
• Filtering
Analysis:
• Statistical tests for differential expression: t-tests, non-parametric
tests, ANOVA
• Clustering: hierarchical, non-hierarchical (e.g., k-means), SOM (self-organizing maps)
• Classification and dimension reduction: discriminant analysis, PCA
The Main Goal of Microarray Data Analysis is to Generate a
List Of ‘Interesting’ Genes
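The preprocessing-then-analysis flow can be sketched end to end in Python, starting from intensities already extracted from the scanned image. Every choice here is an illustrative assumption rather than a standard pipeline: a log2 transform, per-array median centering as the normalization, a minimum mean log2 intensity as the filter, and Welch's t statistic to rank genes. The data and function names are hypothetical; real studies use more replicates and correct for multiple testing.

```python
import math
from statistics import mean, median, variance


def median_center(log_columns: list[list[float]]) -> list[list[float]]:
    """Normalization: subtract each array's median log2 intensity."""
    centered = []
    for col in log_columns:
        m = median(col)
        centered.append([v - m for v in col])
    return centered


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic; needs at least 2 values per group."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def interesting_genes(genes, columns, groups, min_log2=5.0, top=2):
    """Return the `top` genes ranked by |t| between groups "A" and "B"."""
    logged = [[math.log2(v) for v in col] for col in columns]  # one column per array
    centered = median_center(logged)
    scored = []
    for g, name in enumerate(genes):
        if mean(col[g] for col in logged) < min_log2:  # filter near-background genes
            continue
        profile = [col[g] for col in centered]
        a = [v for v, grp in zip(profile, groups) if grp == "A"]
        b = [v for v, grp in zip(profile, groups) if grp == "B"]
        scored.append((name, welch_t(a, b)))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]


genes = ["g1", "g2", "g3", "g4", "g5"]
columns = [  # hypothetical background-corrected intensities, one list per array
    [4096, 256, 1024, 4, 512],   # A1
    [8192, 128, 2048, 8, 512],   # A2
    [256, 1024, 1024, 4, 512],   # B1
    [128, 2048, 2048, 8, 512],   # B2
]
groups = ["A", "A", "B", "B"]
for name, t in interesting_genes(genes, columns, groups):
    print(f"{name}: t = {t:+.2f}")
```

With these numbers g4 is filtered out as near-background, and g1 (higher in group A) and g2 (higher in group B) come out as the "interesting" genes.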