Data Type 1: Microarrays
• Reverse genetics approach
• Genomics
• So we need to understand what exactly a microarray is.
• DNA microarrays are small, solid supports (about the size of two side-by-side pinky fingers) onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.
• So we need to KNOW the sequence to design the array.

Definition of Microarray
A semiconductor device that is used to detect the DNA makeup of a cell.
• It contains hundreds of thousands of tiny squares, each designed to mate with a particular gene.
• The squares react to the liquefied cells poured over the device, and the reactions are detectable by a laser.
• Microarrays are helping revolutionize medicine by being able to pinpoint a very specific disease or the susceptibility to it.
• Sometimes called "biochips," microarrays are commonly known as "gene chips"; GeneChip is an Affymetrix trademark.

Microarrays have
• revealed new patterns of coordinated gene expression across gene families,
• expanded the size of existing gene families,
• increased our understanding of how these genes coordinate, so that precise knowledge of their inter-relationships has emerged,
• sped the identification of genes involved in the development of various diseases,
• aided the examination of the integration of gene expression and function at the cell level,
• revealed how multiple gene products work together, and will continue to do so.

Microarray Gene Expression Analysis
• Typical Northern blot: one gene per experiment, more than one sample
  – Fairly quantitative
  – Time consuming
  – Limited information
• Microarray and RNA-seq: thousands of genes per sample
  – Fairly quantitative
  – Less time
  – Massive information

REMEMBER: Central dogma of molecular biology
• Each gene is transcribed (at the appropriate time) from DNA into mRNA, which then leaves the nucleus and is translated into the required protein.
• This is the principle used for microarrays.

Types of Microarrays
• Two-channel spotted arrays (more historical than in current use):
  – cDNA microarrays: probes are cDNAs 300-3000 base pairs in length, PCR-amplified from custom libraries and spotted onto a microscope slide by a robot.
  – Long-oligo spotted arrays: shorter, uniform-length nucleotide probes (60-90 bp), usually synthesized and then spotted as with cDNA.
  – Commercial spotted arrays (e.g., Agilent): long-oligo probes spotted using inkjet technology.
• Single-channel arrays:
  – High-density short-oligo (e.g., 25 bp) arrays (Affymetrix, NimbleGen) synthesized in situ.
  – Some commercial arrays (e.g., Applied Biosystems AB1700).
  – Some spotted single-channel cDNA arrays (not common, but they exist).

Single Channel AFFY chip
Affymetrix “Gene chip” system in 2007
• Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene).
• RNA is labeled and scanned in a single “color”: one sample per chip.
• Can have as many as 50,000 genes on a chip, and the number keeps increasing.
• Array features get smaller every year (so more genes fit on a chip).
• Chips are expensive.
• Proprietary system: “black box” software; can only use their chips.
• TOUTED AS THE CHIP WITHOUT ISSUES

Current Affymetrix
• Many products, with many more options for custom arrays.
• Some products can handle multiple samples at once.
• Description for Affymetrix HG-U133, MG-430, and RG-230 arrays:
• These data sources are used to design probes that interrogate 9 to 11 unique sequences of each transcript.
• The unique 25-mer probes interrogate up to 275 bases per transcript.

Affy-GWAS chips
• The new Affymetrix Genome-Wide Human SNP Array 6.0 features 1.8 million genetic markers, including more than 906,600 single nucleotide polymorphisms (SNPs) and more than 946,000 probes for the detection of copy number variation.
• The high price-performance value of the SNP Array 6.0 enables researchers to design association studies with larger sample sizes in the initial scan and replication phases, thereby significantly increasing the overall genetic power of their studies.

Current Content
RefSeq probe sets                   HG-U133  HG-U219   MG-430   RG-230
NM: RefSeq coding transcript         38,026   43,134   34,325   17,277
XM: RefSeq coding transcript          1,071      177    1,021    2,692
NR: RefSeq non-coding transcript      1,515      542      438       12
XR: RefSeq non-coding transcript        476       79      108      227
RefSeq probe sets (total)            41,090   43,932   35,892   20,208
UniGene probe sets                   10,377    5,388    2,819    8,842
Other probe sets                      3,273      334    6,412    2,070
Total probe sets                     54,700   49,411   45,123   31,120

Probe Design: Affymetrix Photolithographic Synthesis
• Light removes protecting groups at defined positions.
• A single nucleotide is washed over the chip and binds wherever a protecting group has been removed.
• Through successive steps, any sequence can be built up at any position on the chip.
• The number of steps corresponds to the length of the oligo, so the number of genes can increase without increasing the number of steps.

Scanning AFFY chips

Analysis of expression level from probe sets
• Each pixel is quantitated and integrated for each oligo feature (range 0-25,000).
• Each probe pair consists of a Perfect Match (PM) probe and a Mismatch (MM) control probe.
• PM - MM = difference score for each probe pair in the probe set.
• All significant difference scores are averaged to create the “average difference” = expression level of the gene.

Microarray Data Analysis
How to Handle Microarray Data?
Preprocessing:
• Signal generation from the image
• Normalization
• Filtering
Analysis:
• Statistical tests for differential expression: t-tests, non-parametric tests, ANOVA
• Clustering: hierarchical, non-hierarchical, SOM
• Classification: discriminant analysis, PCA

The main goal of microarray data analysis is to generate a list of ‘interesting’ genes.
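
As a rough illustration of the PM - MM "average difference" score described under "Analysis of expression level from probe sets", the calculation for one probe set might look like the sketch below. The slides do not say how "significant" difference scores are chosen, so the `min_abs_diff` cutoff is a hypothetical stand-in, and the function name and the intensity values are made up for the example. This is not Affymetrix's actual algorithm.

```python
from statistics import mean


def average_difference(pm, mm, min_abs_diff=0.0):
    """Average the PM - MM differences over the probe pairs of one probe set.

    pm, mm: equal-length sequences of Perfect Match and Mismatch intensities.
    min_abs_diff: hypothetical cutoff (an assumption, not from the slides)
    standing in for the unspecified rule that picks "significant" differences.
    Returns None when no probe pair passes the cutoff.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain one value per probe pair")
    # One difference score per probe pair.
    diffs = [p - m for p, m in zip(pm, mm)]
    # Keep only the differences treated as significant under the assumed cutoff.
    kept = [d for d in diffs if abs(d) >= min_abs_diff]
    return mean(kept) if kept else None


# Made-up intensities for four probe pairs of a single, hypothetical gene.
pm = [1200.0, 1500.0, 900.0, 1100.0]
mm = [200.0, 500.0, 400.0, 100.0]

print(average_difference(pm, mm))                      # 875.0 (all four pairs)
print(average_difference(pm, mm, min_abs_diff=600.0))  # 1000.0 (500.0 difference dropped)
```

With the default cutoff of 0.0 every probe pair contributes, which is the plainest reading of "PM - MM, then average"; raising the cutoff shows how a significance filter would change the reported expression level.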