236632 – Topics in Microarray Data Analysis
Winter 2007-8, 26.12.2007
Lecture #9: Analysis of DNA Copy Numbers from Array-based Comparative Genomic Hybridization (aCGH)
Lecturer: Zohar Yakhini
Scribes: Anat Eck & Lena Kleyner

Genomic instability is common in cancer. One form of genomic instability is a change in the copy number of particular regions, such as deletions and amplifications.

Figure 1: Normal karyotype and colon carcinoma HT-29 karyotype. Comparing a normal karyotype to that of a cancer cell line reveals differences in copy number and in ploidy.

Some examples of changes in DNA copy number in cancer:

- Retinoblastoma is a cancer of the retina. In the hereditary form of the disease, all the cells of the body contain only one normal copy of the Rb gene. The malignant tumor appears as a result of a deletion in 13q, which leaves the cell without a normal copy of Rb. The deletion occurs with high probability, which the lecture attributes to the proximity of Rb to the telomere.

- A certain kind of breast cancer is caused by overexpression and amplification of the HER2 (human epidermal growth factor receptor 2) gene. Herceptin is a drug that blocks the HER2 receptors but has severe side effects. A CGH-based test is used to determine whether the patient is HER2 positive and will therefore benefit from Herceptin.

Hence, measurement of DNA copy number plays an essential role in clinical diagnosis and treatment. Modern medicine is on the verge of using genetic information to choose the appropriate treatment for each patient.

CGH - Comparative Genomic Hybridization

CGH is a method for detecting loss, gain and amplification of copy number at the chromosome level, across the whole genome. There are two kinds of CGH:

1. Classic CGH method: Two DNA samples are isolated and labeled. DNA from the subject tissue (usually cancer) is labeled in red and DNA from normal control tissue (the reference) is labeled in green.
The mix is hybridized to normal metaphase chromosomes, which serve as a template. The color ratio along the chromosomes is used to identify regions of DNA gain or loss in the subject sample. A region colored red indicates amplification in the subject tissue (cancer), whereas a region colored green indicates a deletion in the subject tissue.

2. aCGH (array-based CGH): As in the classic method, two DNA samples are isolated and labeled, but the hybridization is performed on a microarray. To infer the number of copies in the genome, the binding must be injective (one-to-one): each probe should hybridize to a single genomic location. The resolution of aCGH is higher than that of the classic method, and depends on the uniqueness of the probes and on the array size.

There are three kinds of aCGH, which differ in the content of the array:

- BAC (bacterial artificial chromosome): relatively long fragments of DNA that are cloned in bacteria. This technology is difficult to maintain.
- cDNA: has better resolution, but is limited to genes (since the cDNA is produced by reverse transcription from mRNA).
- Oligonucleotide (60-mers): the array size is larger (44K - 244K probes). Makes it possible to locate deletion breakpoints in the genome.

Figure 2: BT474 cell line, chromosome 17. Three panels: BAC array [1998], cDNA array [1999], oligo array [2004]. X axis: position on chromosome 17 (x10 Mbp); y axis: log ratio, ranging from about -2 to 4.

In the above figure, the x axis describes the location on chromosome 17 and the y axis describes the log ratio of red to green intensity, log(red/green). The resolution increases as the technique progresses. The amplification in chromosome 17 is also clearly visible.

The first aCGH experiments used gene expression arrays, but since these arrays were not designed for that purpose, performance was relatively low. The problems that arise when using gene expression arrays:

- Gene expression arrays are designed for RNA, whereas CGH measures DNA, so specificity suffers.
- Gene expression arrays are limited to genes, whereas CGH measures changes in copy number throughout the whole genome.
- In some cases the probe has no complementary sequence in the genomic DNA. One example (the probe marked in red in the original illustration of Exon1 and Exon2) is a probe that spans an exon-exon junction of the spliced transcript, whose sequence is interrupted by an intron in genomic DNA.
- Specificity suffers: a probe that is specific for cDNA may no longer be specific when hybridized against whole genomic DNA (e.g. because of repetitive sequences).

Comparison between expression array and CGH specific array

The histograms below compare the performance of CGH on a gene expression array with the performance of CGH on an array designed specifically for the purpose, in detecting varying numbers of copies of the X chromosome.

Figure 3 (Barrett et al., PNAS 101:17765-17770, Dec 2004):

Sample / reference     Theoretical Log2(Ratio)
XY / XX                -1.0
XX / XX                 0.0
XXX / XX                0.6
XXXX / XX               1.0
XXXXX / XX              1.3

A: Human 1A expression array, number of probes vs. measured Log2(Ratio).
B: Human 1A expression array, measured vs. theoretical Log2(Ratio), slope = 0.47.
C: Research prototype CGH array, number of probes vs. measured Log2(Ratio).
D: Research prototype CGH array, measured vs. theoretical Log2(Ratio), slope = 0.96.

The experiment included hybridizations of genomic DNA samples from a series of cell lines with variable copy numbers of the X chromosome, using 46,XX DNA as a reference. This was done in two ways: using an expression array (fig. A) and using a specifically designed array (fig. C). Measured mean and median fluorescence ratios of X-chromosome probes were plotted versus theoretical ratios for the expression array (fig. B) and the CGH array (fig. D).

A comparison of the median ratios for the X-chromosome probes from these hybridizations on the expression array (373 probes) and the CGH array (4,878 probes) revealed that the slope increased from 0.47 to 0.96, which is much closer to the theoretical value of 1. These data emphasize the usefulness of designing and selecting in situ synthesized oligonucleotide probes for CGH assays.
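The theoretical values in Figure 3 are the log2 of the number of X copies in the sample divided by the number of X copies in the 46,XX reference, and the reported slope compares measured to theoretical log ratios. The following is a minimal sketch of that arithmetic. It assumes an ordinary least-squares fit (the notes do not say how the slope was fitted), and the "measured" values are hypothetical, generated by compressing the theoretical ones; they are not data from the paper.

```python
import math


def theoretical_log2_ratio(sample_x_copies, reference_x_copies=2):
    """log2 of X copy number in the sample over X copy number in the reference."""
    return math.log2(sample_x_copies / reference_x_copies)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# X copy numbers of the samples: XY, XX, XXX, XXXX, XXXXX
x_copies = [1, 2, 3, 4, 5]
theoretical = [theoretical_log2_ratio(c) for c in x_copies]
rounded = [round(t, 1) for t in theoretical]  # the values listed in Figure 3

# HYPOTHETICAL measurements: an array that compresses every log ratio by half.
compressed = [0.5 * t for t in theoretical]
slope = least_squares_slope(theoretical, compressed)  # 0.5 up to rounding error
```

A slope below 1 means the array under-reports the true change in copy number; an ideal array would give a slope of 1.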
Copy Number Polymorphism

Figure 4: An aCGH profile of chromosome 1 from a normal sample. Beyond the expected noise, there are copy number variations (the extreme points) which are common in the normal population.

CGH Data Analysis

Given a raw data matrix, our goal is to transform it into a corresponding step function. This enables the identification of genomic regions in which deletion or amplification events have occurred (aberration calling).

Figure 5: An example of a step function fitted to given aCGH data.
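This part of the notes does not name a specific segmentation algorithm. As one illustration of what fitting a step function to a profile of log ratios could look like, the sketch below finds the least-squares piecewise-constant fit with a fixed number of segments by dynamic programming. The function name, the fixed segment count and the toy profile are all assumptions made for illustration; choosing the number of segments from the data is part of the real problem and is not addressed here.

```python
def fit_step_function(values, n_segments):
    """Least-squares piecewise-constant fit with exactly n_segments segments.

    Returns (bounds, fitted): bounds are segment boundaries [0, ..., len(values)],
    fitted holds the segment mean at every position.
    Requires 1 <= n_segments <= len(values).
    """
    n = len(values)
    # Prefix sums give the squared error of any segment in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations of values[i:j] from their mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimal error when values[:j] is split into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Trace the optimal boundaries back from the end of the profile.
    bounds = [n]
    j = n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        bounds.append(j)
    bounds.reverse()

    fitted = []
    for start, end in zip(bounds, bounds[1:]):
        mean = (prefix[end] - prefix[start]) / (end - start)
        fitted.extend([mean] * (end - start))
    return bounds, fitted


# Toy profile (made-up log ratios): baseline, an amplified region, baseline.
profile = [0.1, -0.1, 0.0, 0.1, 1.0, 1.1, 0.9, 1.0, 0.0, -0.1, 0.1, 0.0]
bounds, fitted = fit_step_function(profile, 3)
```

On this toy profile the boundaries come out as [0, 4, 8, 12], so the middle segment (probes 4 to 7, with a fitted level of about 1.0) would be the candidate amplification.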