Summary from Chapters 1, 2, 3 of Data Analysis Tools for DNA Microarrays by Sorin Draghici

The need for microarrays
The genomes of many model organisms have been sequenced, so we can look at large-scale screening of gene expression levels under the influence of a particular factor
Understand the networks of bio-molecular interactions at a global scale
Compare various tissues with each other, or a tumor with the healthy tissues surrounding it
Study the effect of drugs or stressors by monitoring the gene expression levels
Gene expression can be used to understand the phenomena related to aging, fetal development, etc.
Screening tests for various conditions can be designed if those conditions are characterized by specific gene expression patterns

Microarrays
A DNA array is usually a substrate (nylon membrane, glass or plastic) on which one deposits single-stranded DNAs (ssDNA) with various sequences
Probe: ssDNA printed on the solid substrate
Target: a solution containing ssDNA generated from the biological sample under study (the reverse-transcribed mRNA) that is used to wash the array
After hybridization, the labeled target bound to the array is illuminated with an appropriate source of light to provide an image of the array of features (spots in cDNA arrays or sets of probes in Affymetrix GeneChips)
The intensity of each spot, or the average difference between matches and mismatches, can be related to the amount of mRNA present in the tissue and, in turn, to the amount of protein produced by the gene corresponding to the given feature

Fabrication of Microarrays
1. Deposition of DNA fragments (Spotted arrays: Agilent, Protogene, etc.)
Robots dip thin pins into solutions containing the desired DNA material, then touch the pins onto the surface of the arrays
Small quantities of DNA are deposited on the array in the form of spots
Spotted arrays can use small sequences, whole genes or even arbitrary PCR products
Uses DNA cloning, which involves selective amplification of a desired fragment of DNA
Another deposition method is the attachment of synthesized oligonucleotides to the solid support
2. In situ synthesis (Affymetrix arrays)
Probes are photochemically synthesized on the chip
No cloning, no spotting and no PCR are carried out
Elimination of these steps, which introduce a lot of noise in the cDNA system, is an advantage
Affymetrix arrays use photolithographic methods similar to the technology used for very large scale integrated circuits in modern computers
Synthetic linkers modified with photochemically removable protecting groups are attached to a glass surface
Light is directed through a photolithographic mask to a specific area on the surface to produce a localized photodeprotection
The first of a series of hydroxyl-protected deoxynucleosides is incubated on the surface
Light is then directed to another region of the substrate by a new mask, and the chemical cycle is repeated
One nucleotide after another is added until the desired chain is synthesized
The sequence of this chain corresponds to a part of the gene in the organism under investigation
Gene expression arrays have a match/mismatch probe strategy
Reference probes: probes that match the target sequence exactly
Mismatch probes: for each reference probe, there is a probe containing a nucleotide change at the central base position
The average difference between reference and mismatch represents the expression level of the gene
Other methods of in situ synthesis use ink jet technology or electrochemical synthesis

Comparing Spotted and Affymetrix Arrays
Spotted technology is more flexible
Affymetrix is more reliable and easier to use
Spotted has longer sequences and is easier to analyze with appropriate experimental design
Spotted arrays can use unknown sequences, while Affymetrix arrays use known sequences only

Sources of Variation
It is difficult to distinguish between the variation introduced by the different expression levels and the variation inherent to the laboratory process itself
Preparation of mRNA, even with kits from the same company and batch, will yield different results: tissues, kits and procedures vary
Target preparation: enzyme-mediated reverse transcription of mRNA and concomitant incorporation of fluorescently labeled nucleotides; there is inherent variation in the reaction, the type of enzymes used, the type of labeling and procedures, and the age of the labels
More specifically:
Slide preparation stages: pin type variation, surface chemistry, humidity, target volume, slide inhomogeneities and target fixation
Hybridization: hybridization parameters, non-specific spot hybridization and non-specific background hybridization
Pin geometry: different surfaces and properties due to random production errors and differences between pin types
Target volume: volumes deposited on microarrays are extremely small, so accuracy is difficult
Surface chemistry and humidity affect spot formation
Amplification (PCR protocol): PCR is difficult to quantify
Target fixation: the fraction of target cDNA from the droplet that is chemically linked to the slide surface is unknown
Hybridization parameters: influenced by temperature, time, buffering conditions and others
Slide inhomogeneities: slide production parameters, batch-to-batch variation
Non-specific hybridization: cDNA hybridizes to background or to sequences that are not its exact complement
Gain setting (PMT): shifts the distribution of pixel intensities
Dynamic range limitations: variability at the low end or saturation at the high end
Image alignment: images of the same array at various wavelengths corresponding to different channels are not aligned; different pixels are considered for the same spot
Grid placement: the centre of the spot is not located properly
Non-specific background: irregular spots are hard to segment from background
Spot shape: irregular spots are hard to segment from background
Segmentation: bright contaminants (e.g. dust) can seem like signal
Spot quantification: pixel mean, median, area, etc.

Image Processing
1. Deposition of DNA fragments (Spotted arrays)
Array localization: spot finding
Image segmentation: separating the pixels into signal, background and other
Quantification: computation of values representative of the signal and background levels of each spot; this measure of the expression level of a gene can be calculated as the total, mean, median or mode of the signal intensity, the volume of the signal intensity, or the intensity ratio between the channels
Spot quality assessment: computation of quality measures
2. In situ synthesis (Affymetrix arrays)
Produces several types of information, including calls and average differences
Calls are meant to provide qualitative information about genes: present, marginally present or absent
Average difference is calculated from the reference and mismatch values of all probes corresponding to a given gene; it is a quantitative measure of the expression level of the gene
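
To make the two quantitative measures from the image processing notes concrete, here is a minimal Python sketch. It is not the book's or Affymetrix's actual algorithm. It assumes a spot's pixels have already been segmented into signal and background lists, and that each gene has paired reference and mismatch probe intensities. The function names `quantify_spot` and `average_difference`, the choice of the background median as the background estimate, and all numbers are hypothetical.

```python
from statistics import mean, median


def quantify_spot(signal_pixels, background_pixels):
    """Summarize one segmented spot with several candidate measures."""
    # Assumption: the median of the background pixels is used as a
    # simple, outlier-resistant estimate of the local background level.
    background = median(background_pixels)
    signal_mean = mean(signal_pixels)
    return {
        "total": sum(signal_pixels),
        "mean": signal_mean,
        "median": median(signal_pixels),
        "background": background,
        "mean_minus_background": signal_mean - background,
    }


def average_difference(reference, mismatch):
    """Mean of (reference - mismatch) over all probe pairs of one gene."""
    if len(reference) != len(mismatch) or not reference:
        raise ValueError("need equally sized, non-empty probe lists")
    return mean(r - m for r, m in zip(reference, mismatch))


# Made-up intensities, for illustration only.
spot = quantify_spot([120, 130, 125, 135], [20, 22, 18, 20])
avg_diff = average_difference([500, 620, 580], [200, 260, 240])
print(spot)
print(avg_diff)
```

Production software may apply further steps before averaging, for example excluding outlier probe pairs; none of these are modeled here.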