Summary of Chapters 1, 2, and 3 of Data Analysis Tools for DNA Microarrays by Sorin Draghici
The Need for Microarrays
• The genomes of many model organisms have been sequenced, so gene expression levels can now be screened on a large scale under the influence of a particular factor
• Understand the networks of biomolecular interactions at a global scale
• Compare various tissues with each other, or a tumor with the healthy tissue surrounding it
• Study the effect of drugs or stressors by monitoring gene expression levels
• Use gene expression to understand phenomena related to aging, fetal development, etc.
• Design screening tests for conditions that are characterized by specific gene expression patterns
Microarrays
• A DNA array is usually a substrate (nylon membrane, glass, or plastic) on which single-stranded DNAs (ssDNA) with various sequences are deposited
• Probe: the ssDNA printed on the solid substrate
• Target: a solution containing ssDNA generated from the biological sample under study (the reverse-transcribed mRNA), which is used to wash the array
• The target bound to the array is illuminated with an appropriate light source to produce an image of the array of features (spots in cDNA arrays or sets of probes in Affymetrix GeneChips)
• The intensity of each spot, or the average difference between matches and mismatches, can be related to the amount of mRNA present in the tissue and, in turn, to the amount of protein produced by the gene corresponding to the given feature
Fabrication of Microarrays
1. Deposition of DNA fragments (Spotted arrays – Agilent, Protogene, etc.):
• Robots dip thin pins into solutions containing the desired DNA material, then touch the pins onto the surface of the arrays
• Small quantities of DNA are deposited on the array in the form of spots
• Spotted arrays can use small sequences, whole genes, or even arbitrary PCR products
• Uses DNA cloning, which involves selective amplification of the desired DNA fragment
• Another deposition method is the attachment of synthesized oligonucleotides to the solid support
2. In situ synthesis (Affymetrix arrays):
• Probes are photochemically synthesized on the chip
• No cloning, spotting, or PCR is carried out
• Eliminating these steps, which introduce a lot of noise in the cDNA system, is an advantage
• Affymetrix arrays use photolithographic methods similar to the technology used for very-large-scale integrated circuits in modern computers
• Synthetic linkers modified with photochemically removable protecting groups are attached to a glass surface
• Light is directed through a photolithographic mask onto a specific area of the surface to produce localized photodeprotection
• A series of hydroxyl-protected deoxynucleosides is incubated on the surface
• Light is then directed to another region of the substrate through a new mask, and the chemical cycle is repeated
• One nucleotide after another is added until the desired chain is synthesized
• The sequence of the resulting oligonucleotide corresponds to part of a gene in the organism under investigation
• Gene expression arrays use a match/mismatch probe strategy
• Reference probes: probes that match the target sequence exactly
• Mismatch probes: for each reference probe, there is a probe containing a nucleotide change at the central base position
• The average difference between reference and mismatch probes represents the expression level of the gene
• Other methods of in situ synthesis use ink-jet technology or electrochemical synthesis
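The match/mismatch strategy above can be illustrated with a short Python sketch. The function names, the example sequence, and the intensity values are hypothetical, and the sketch assumes that the "nucleotide change" swaps the central base for its complement; the notes do not say which substitution is used.

```python
from statistics import mean

# Complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_mismatch_probe(reference):
    """Return the reference probe with its central base complemented (assumed rule)."""
    if len(reference) % 2 == 0:
        raise ValueError("probe needs an odd length to have a central base")
    mid = len(reference) // 2
    return reference[:mid] + COMPLEMENT[reference[mid]] + reference[mid + 1:]

def average_difference(reference_signal, mismatch_signal):
    """Mean of (reference - mismatch) over all probe pairs of one gene."""
    if len(reference_signal) != len(mismatch_signal) or not reference_signal:
        raise ValueError("need the same non-zero number of reference and mismatch values")
    return mean(r - m for r, m in zip(reference_signal, mismatch_signal))

# Hypothetical 25-base reference probe (illustration only).
reference_probe = "ACGT" * 6 + "A"
mismatch_probe = make_mismatch_probe(reference_probe)

# Hypothetical intensities for four probe pairs of one gene.
reference_signal = [1200, 950, 1100, 400]
mismatch_signal = [300, 500, 250, 450]
avg_diff = average_difference(reference_signal, mismatch_signal)

print(reference_probe)
print(mismatch_probe)
print(avg_diff)
```

A probe pair whose mismatch is brighter than its reference contributes a negative term, so the average difference can be pulled down by non-specific hybridization.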
Comparing Spotted and Affymetrix Arrays
• Spotted technology is more flexible
• Affymetrix is more reliable and easier to use
• Spotted arrays have longer sequences and are easier to analyze with an appropriate experimental design
• Spotted arrays can carry unknown sequences, while Affymetrix arrays carry only known sequences
Sources of Variation
• It is difficult to distinguish between the variation introduced by different expression levels and the variation inherent in the laboratory process itself
• mRNA preparation: even with kits from the same company and batch, preparations yield different results because tissues, kits, and procedures vary
• Target preparation: enzyme-mediated reverse transcription of mRNA with concomitant incorporation of fluorescently labeled nucleotides; variation comes from the reaction itself, the type of enzymes used, the type of labeling, the procedures, and the age of the labels
More specifically:
• Slide preparation stages: pin type variation, surface chemistry, humidity, target volume, slide inhomogeneities, and target fixation
• Hybridization: hybridization parameters, non-specific spot hybridization, and non-specific background hybridization
• Pin geometry: pins have different surfaces and properties because of random production errors and differences between pin types
• The volumes deposited when printing microarrays are extremely small, so accurate deposition is difficult
• Surface chemistry and humidity affect spot formation
• Amplification (PCR protocol): PCR is difficult to quantify
• Target fixation: the fraction of target cDNA from the droplet that is chemically linked to the slide surface is unknown
• Hybridization parameters: influenced by temperature, time, buffering conditions, and others
• Slide inhomogeneities: slide production parameters, batch-to-batch variation
• Non-specific hybridization: cDNA hybridizes to the background or to sequences that are not its exact complement
• Gain setting (PMT): shifts the distribution of pixel intensities
• Dynamic range limitations: variability at the low end or saturation at the high end
• Image alignment: images of the same array at the various wavelengths corresponding to different channels are not aligned, so different pixels are considered for the same spot
• Grid placement: the center of the spot is not located properly
• Non-specific background: irregular spots are hard to segment from background
• Spot shape: irregular spots are hard to segment from background
• Segmentation: bright contaminants (e.g. dust) can seem like signal
• Spot quantification: pixel mean, median, area, etc.
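As a rough illustration of the dynamic range limitation listed above, the following Python sketch measures how many pixels of a spot sit at the scanner's maximum value. The 16-bit ceiling of 65535, the 10% flagging threshold, the function name, and the pixel values are all assumptions made for this example, not values from the book.

```python
# Assumed maximum pixel value for a 16-bit scanner image.
SATURATION_LEVEL = 65535

def saturated_fraction(pixels, level=SATURATION_LEVEL):
    """Fraction of a spot's pixels at or above the saturation level."""
    if not pixels:
        raise ValueError("spot has no pixels")
    return sum(p >= level for p in pixels) / len(pixels)

# Hypothetical pixel intensities for one spot.
spot_pixels = [65535, 65535, 61200, 58000]
fraction = saturated_fraction(spot_pixels)

# Flag the spot if more than 10% of its pixels are saturated (assumed threshold).
is_flagged = fraction > 0.10
print(fraction, is_flagged)
```

A flagged spot underestimates the true intensity, so any ratio computed from it is unreliable at the high end.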
Image Processing
1. Deposition of DNA fragments (Spotted arrays):
• Array localization: spot finding
• Image segmentation: separating the pixels into signal, background, and other
• Quantification: computation of values representative of the signal and background levels of each spot; this measure of the gene's expression level can be calculated as the total, mean, median, or mode of the signal intensity, the volume of the signal intensity, or the intensity ratio between the channels
• Spot quality assessment: computation of quality measures
2. In situ synthesis (Affymetrix arrays)
• Produces several types of information, including calls and average differences
• Calls provide qualitative information about genes: present, marginally present, or absent
• The average difference is calculated from the reference and mismatch probes of all probe pairs corresponding to a given gene; it is a quantitative measure of the gene's expression level
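The quantification step for spotted arrays can be sketched as follows. This is a minimal Python illustration, assuming the segmentation step has already produced a signal/background mask for each spot; the function names, the median-background subtraction, and the pixel values are assumptions for the example rather than the book's procedure.

```python
from statistics import mean, median

def quantify_spot(pixels, signal_mask):
    """Summarize one spot in one channel.

    pixels: list of pixel intensities for the spot region.
    signal_mask: list of booleans, True where segmentation marked signal.
    """
    signal = [p for p, is_signal in zip(pixels, signal_mask) if is_signal]
    background = [p for p, is_signal in zip(pixels, signal_mask) if not is_signal]
    if not signal or not background:
        raise ValueError("need both signal and background pixels")
    return {
        "total": sum(signal),
        "mean": mean(signal),
        "median": median(signal),
        "area": len(signal),
        "background_median": median(background),
    }

def channel_ratio(red, green):
    """Ratio of background-corrected mean intensities (assumed correction)."""
    r = red["mean"] - red["background_median"]
    g = green["mean"] - green["background_median"]
    if r <= 0 or g <= 0:
        return None  # ratio is not meaningful when a channel is at background
    return r / g

# Hypothetical pixels for the same spot in two channels.
mask = [False, False, True, True, True, False]
red = quantify_spot([100, 110, 900, 1000, 1100, 90], mask)
green = quantify_spot([50, 60, 500, 550, 600, 40], mask)
ratio = channel_ratio(red, green)
print(red, green, ratio)
```

The median is less sensitive than the mean to a few bright contaminant pixels, which is one reason several summary values are usually reported side by side.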