Microarrays
A snapshot that captures the activity pattern of thousands of genes at once.
•Custom spotted arrays
•Affymetrix GeneChip

Microarray Process

Practical Applications of Microarrays
•Gene Target Discovery: By allowing scientists to compare diseased cells with normal cells, arrays can be used to discover sets of genes that play key roles in diseases. Genes that are overexpressed or underexpressed in the diseased cells often present excellent targets for therapeutic drugs.
•Pharmacology and Toxicology: Arrays can provide a highly sensitive indicator of a drug’s activity (pharmacology) and toxicity (toxicology) in cell culture or test animals. This information can then be used to screen or optimize drug candidates before launching costly clinical trials.
•Diagnostics: Array technology can be used to diagnose clinical conditions by detecting gene expression patterns associated with disease states in either biopsy samples or peripheral blood cells.

Microarray Platforms
•Oligonucleotide-based arrays
  •25-mers synthesized in situ on a glass wafer (Affymetrix GeneChip arrays)
  •Custom spotted 50- to 80-mers generated from known sequences
•cDNA
  •Inserts from cDNA libraries
  •PCR products generated from gene-specific or universal primers

GeneChip® Instrument System
•Fluidics Station
•Scanner (made by Hewlett-Packard)
•Computer Workstation
•GeneChip® Probe Array

GeneChip® Probe Arrays
[Figure: a 1.28 cm GeneChip probe array with 24 µm probe cells; a hybridized probe cell with single-stranded, fluorescently labeled DNA target bound to the oligonucleotide probe; image of a hybridized probe array.]
•Each probe cell (feature) contains millions of copies of a specific oligonucleotide probe
•Over 250,000 different probes complementary to genetic information of interest

Synthesis of Ordered Oligonucleotide Arrays
[Figure: light-directed synthesis. Light passing through a mask deprotects selected sites on the substrate, a nucleotide (T, then C, and so on) is coupled at those sites, and the cycle is repeated with new masks until the full probes are built.]

Probe Tiling Strategy
[Figure: gene expression tiling strategy with 25-mer probes, shown for uninduced and induced samples.]
•40 separate hybridization events are involved in determining the presence or absence of a transcript
•80 separate hybridization events are involved in determining differential gene expression of a transcript between two samples

Starting Material for Microarrays
Platform          Poly(A)+ mRNA    Total RNA
Affymetrix        ~2 µg            ~10 µg
Spotted arrays    ~0.4-2 µg        10-100 µg

Experimental Design
•Cells: isolate poly(A)+ RNA or total RNA
•Synthesize cDNA
•In vitro transcription (IVT) with biotin-UTP and biotin-CTP yields biotin-labeled cRNA transcript
•Fragment with heat and Mg2+ to yield biotin-labeled cRNA fragments
•Add oligo B2 and staggered spike controls
•Hybridize (16 hours)
•Wash & stain (75 minutes)
•Scan (8 minutes)

Normalization and Scaling
Non-biological factors can contribute to the variability of data in many biological assays; it is therefore important to minimize non-biological differences.
Factors that may contribute to variation include:
•Amount and quality of target hybridized to the array
•Amount of stain applied
•Experimental variables
The data can be normalized from:
•a limited group of probe sets
•all probe sets
To normalize, the output of the experimental array is multiplied by a Normalization Factor (NF) so that its Average Intensity is equivalent to the Average Intensity of the baseline array.
The Average Intensity of an array is calculated by averaging the Average Difference values of every probe set on the array, excluding the highest 2% and lowest 2% of the values.

Data Processing
•Analytical spreadsheet can handle millions of rows or columns
•Scaling & normalization (e.g. standardize, log-scale, log & linear scale, power)
•Sort rows by value or by similarity to a prototype (find the genes most similar to a specified prototype)
•Missing data handling (e.g. analysis, casewise deletion, imputation)

Cluster and Tree View

Microarray Process

Products Used for Spotting
Easy-To-Spot™ Products (Incyte Genomics)
•Every clone is sequence-verified prior to PCR
•PCR products are purified to remove excess salts, unincorporated nucleotides, primers, and particulates
•Quality-controlled production process with a failure rate of less than 10%
•8,734 PCR products from sequence-verified clones from the UniGene database at NCBI; average length is greater than 500 nucleotides
•Between 1 and 3 µg of DNA per well, enough to fabricate 500 to 1,000 arrays
•Corresponding clones available for purchase for further research

Indirect Labeling
A simple, highly sensitive technique that requires less starting RNA and creates evenly labeled DNA without dye bias.
•Uniform incorporation of fluorescent dyes produces more reliable signals
•High sensitivity to detect low-copy signals
•Requires only 10 to 20 µg of total RNA or 0.4 to 1 µg of polyA RNA
•Clontech Atlas™ Glass Fluorescent Labeling Kit
•Stratagene FairPlay™ Microarray Labeling Kit

Array-Ready Oligo Set™ (Operon Technologies)
Complete Yeast Genome Oligo Set
•Optimized 70-mer oligonucleotides for each of the 6,307 open reading frames (ORFs) of Saccharomyces cerevisiae from the Saccharomyces Genome Database (SGD) at Stanford University
•The amount of sample provided with each set is sufficient to print between 2,000 and 6,000 slides, depending on the printing procedure used
Human Genome Oligo Set
•This Array-Ready Oligo Set™ contains arrayable 70-mers representing 13,971 well-characterized human genes from the UniGene database at the National Center for Biotechnology Information
•All 70-mer oligonucleotides in the Human Genome Oligo Set were designed from the representative sequences in the UniGene database, Hs build #119
•The set also contains 29 controls
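The normalization described on the Normalization and Scaling slides can be sketched in a few lines of Python. This is a minimal illustration, not the Affymetrix software's implementation: the function names and the example intensities are hypothetical, and it assumes each array is simply a list of Average Difference values, one per probe set.

```python
def trimmed_average_intensity(avg_diffs, trim_fraction=0.02):
    """Average the Average Difference values, excluding the highest and
    lowest trim_fraction of the values (2% at each end by default)."""
    if not avg_diffs:
        raise ValueError("need at least one Average Difference value")
    values = sorted(avg_diffs)
    k = int(len(values) * trim_fraction)  # values dropped at each end
    kept = values[k:len(values) - k] if k else values
    return sum(kept) / len(kept)


def normalization_factor(experiment, baseline, trim_fraction=0.02):
    """NF that makes the experiment's Average Intensity equal the baseline's."""
    return (trimmed_average_intensity(baseline, trim_fraction)
            / trimmed_average_intensity(experiment, trim_fraction))


def normalize(experiment, baseline, trim_fraction=0.02):
    """Multiply every value on the experiment array by NF."""
    nf = normalization_factor(experiment, baseline, trim_fraction)
    return [value * nf for value in experiment]


# Hypothetical example: the experiment array reads twice as bright as the baseline.
baseline = list(range(100, 200))           # 100 made-up probe-set values
experiment = [2 * value for value in baseline]
print(normalization_factor(experiment, baseline))  # NF is 0.5 here
```

Trimming the extremes before averaging keeps a handful of saturated or failed probe sets from dominating the factor applied to the whole array.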
GeneMachines OmniGrid Arrayer
•Printing pin

Axon GenePix 4000A Scanner
•10 µm pixel size
•Simultaneously scans array slides at two wavelengths
•User-selectable laser power
•User-selectable focus positions

GenePix Pro Features
•Auto Align (before Auto Align / after Auto Align)
•Feature Viewer
  P = pixel intensity
  F = feature intensity
  B = background intensity
  Rp = ratio of pixel intensities
  Rm = ratio of means
  mR = median of ratios
  rR = regression ratio
•Feature Pixel Plot
•Histogram
•Scatter Plot

Spotted Glass Slide Microarrays
Advantages
•Low cost per array
•Custom gene selection
•Any species
•Competitive hybridization
•Open architecture
Disadvantages
•Clone management
•Clone cost
•Quality control

Affymetrix GeneChip System
Advantages
•Streamlined production
•Large number of genes and ESTs per chip
•Several species available
Disadvantages
•System cost
•GeneChip cost
•Proprietary system
•Limits on customizing

GeneCards Database

Challenges in Analyzing Microarray Data
•Amount of DNA in a spot is not consistent
•Spot contamination
•Amount of cDNA may not be proportional to the amount of mRNA in the tissue
•Low hybridization quality
•Measurement errors
•Splice variants
•Outliers
•Data are high-dimensional and multivariate
•Biological signal may be subtle, complex, nonlinear, and buried in a cloud of noise
•Normalization
•Comparison across multiple arrays, time points, tissues, and treatments
•How do you reveal biological relationships among genes?
•How do you distinguish a real effect from an artifact?

Factors to Consider in Designing Microarray Experiments
•Do plenty of control experiments to validate the method
•Do replicate spotting, replicate chips, and reverse labeling for custom spotted chips
•Do pilot studies before doing “mega chip” experiments
•Don’t design an experiment without replication; nothing will be learned from a single failed experiment
•Design simple (one- or two-factor) experiments, e.g. treatment vs. no treatment
•Understand measurement errors
•When designing databases, remember that they are useful ONLY if the quality of the data is assured
•Involve statistical colleagues in the design stages of your studies

Once you have identified an interesting expression pattern, what comes next?
•With some arrays it is possible to purchase clones of interest for further experimentation.
•Confirm that the particular clone you now have in hand shows the expression pattern indicated by the array by quantitating the individual mRNA species:
  •Relative quantitative RT-PCR: uses an internal standard to monitor each reaction and to allow comparisons between different reactions
  •Competitive RT-PCR: a competition between a known amount of a template and an unknown amount of target
  •Northern analysis
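The ratio quantities listed for the GenePix Pro Feature Viewer (Rm, mR, rR) are easier to tell apart with a small numeric sketch. The Python below is a simplified reading of those three definitions and is not GenePix Pro's actual implementation: the function names are hypothetical, and it assumes that each feature is given as two equal-length lists of background-subtracted pixel intensities, one per wavelength.

```python
from statistics import mean, median


def ratio_of_means(ch1, ch2):
    """Rm: mean intensity at wavelength 1 divided by mean intensity at wavelength 2."""
    return mean(ch1) / mean(ch2)


def median_of_ratios(ch1, ch2):
    """mR: median of the pixel-by-pixel ratios (Rp), skipping zero denominators."""
    return median(p1 / p2 for p1, p2 in zip(ch1, ch2) if p2 != 0)


def regression_ratio(ch1, ch2):
    """rR: least-squares slope of wavelength-1 pixels regressed on wavelength-2 pixels."""
    mean_x, mean_y = mean(ch2), mean(ch1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ch2, ch1))
    sxx = sum((x - mean_x) ** 2 for x in ch2)
    return sxy / sxx


# Hypothetical feature in which every pixel is twice as bright at wavelength 1.
wavelength2 = [100, 200, 300, 400]
wavelength1 = [2 * p for p in wavelength2]
print(ratio_of_means(wavelength1, wavelength2),
      median_of_ratios(wavelength1, wavelength2),
      regression_ratio(wavelength1, wavelength2))
```

On clean data the three estimates agree; they diverge when a feature contains outlier pixels, which is one reason to compare them when checking spot quality.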