ULS labeling versus traditional enzymatic labeling protocol Figure 1.

High resolution discovery and characterization of copy number variation using oligonucleotide DNA Microarrays
Anniek De Witte, Shane Giles, Anya Tsalenko, Jayati Ghosh, Dione Bailey, Doug Amorese
Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA 95051, USA
Abstract # 5137
Abstract
Comparison of ULS and enzymatic labeling methods
Copy Number Variants (CNVs) are structural variations in the genome (submicroscopic variants, including deletions,
insertions, duplications, and large-scale copy number variants) and have recently been shown to be a risk factor for diseases such as
cancer and mental retardation. A public database that accumulates CNVs from hundreds of healthy individuals is now available
(Database of Genomic Variants, http://projects.tcag.ca/variation). High-resolution comparative genomic hybridization (CGH) arrays are
specifically designed for detecting copy number changes and, with the appropriate content, are a very suitable tool for CNV
identification and characterization. To detect CNVs that encompass a wide range of sizes and locations, we have developed a
set of two 244K arrays with a median spacing of 960 bp in CNV regions from the Database of Genomic Variants while maintaining
genome-wide coverage at a median spacing of 25.6 kb. The probes were selected using an empirical model that utilizes scores for
homology, thermodynamics, secondary structure, and sequence complexity to predict their relative performance in the assay.
Two sample labeling methods were evaluated: the standard enzymatic labeling and a chemical labeling method known as the Universal
Linkage System (ULS). We have previously shown the success of the ULS method on degraded and FFPE samples. In this CNV
study, we used high-quality DNA isolated from blood and cell lines to compare the signal-to-noise ratio, ease of use, and throughput of the
two methods. We used a workflow for the analysis and visualization of CNV data that includes statistically robust approaches for
calling CNV intervals in individual samples as well as methods for grouping variants from multiple samples into CNV regions (CNVRs).
We evaluated the performance of the platform by profiling DNA from several individuals from the HapMap collection (e.g. NA10851 and
NA15510). Many regions were found to be smaller and many were found to be more variable than previously reported. Taken together,
the results show that highly reproducible, sensitive, and robust high resolution microarray methods can reveal new CNVs, clarify the
boundaries and structure of known CNVs, and uncover how CNVs vary across populations.
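The abstract mentions grouping variants from multiple samples into CNV regions (CNVRs) but does not describe the method. One simple way to picture such a grouping is a union of overlapping calls; the sketch below assumes exactly that and is not the workflow used on this poster. The function name `group_into_cnvrs` and the tuple layout are hypothetical.

```python
def group_into_cnvrs(calls):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    calls: iterable of (sample_id, chrom, start, end) tuples.
    Returns a list of (chrom, start, end, sorted_sample_ids).
    Assumption: two calls belong to the same CNVR when they overlap or touch.
    """
    regions = []
    # Sort by chromosome, then start, so overlapping calls become neighbours.
    for sample, chrom, start, end in sorted(calls, key=lambda c: (c[1], c[2], c[3])):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the current region and record the contributing sample.
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            regions.append([chrom, start, end, {sample}])
    return [(c, s, e, sorted(ids)) for c, s, e, ids in regions]
```

A stricter rule (for example, requiring a minimum reciprocal overlap) would produce smaller regions; the simple union tends to enlarge CNVRs when one sample carries a long call.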
Fine scale structural characterization of CNV regions in different HapMap samples
A. Confirmation of CNVs reported in other studies
[Figure annotation: DLRSD values of 0.12, 0.12, 0.14, and 0.15]
Oligonucleotide Microarrays. A set of two 244K arrays, consisting of 2 Agilent Human Genome CGH Microarrays (p/n G4423A), was used. This two-array set contains ~487,000 probes with a median spacing of 960 bp in CNV regions and in-dels from the Database of Genomic Variants (DGV, Oct 2007) as well as 8,061 regions of segmental duplication, while maintaining genome-wide coverage at a median spacing of 25.6 kb.
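As an illustration of what a median probe spacing figure summarizes, the sketch below computes the median distance between consecutive probe start positions on one chromosome. The poster does not state how spacing was measured, so the start-to-start convention and the function name `median_probe_spacing` are assumptions.

```python
import statistics

def median_probe_spacing(positions):
    """Median gap (bp) between consecutive probe start positions on one chromosome."""
    ordered = sorted(positions)
    # Gap between each probe and the next one along the chromosome.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.median(gaps)
```

Because the median ignores a few very large gaps, a design can report a tight spacing inside targeted regions even when isolated stretches are sparsely covered.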
B. Reduction in sizes of CNV regions and inference of genotypes
False positives, reproducibility and size distribution of calls
[Figure 3 panels. A: number of deletion and amplification calls per hybridization (self-self and NA10851-NA15510 dye-flip replicates). B: Comparison of called interval heights (NA10851-NA15510 Exp # 1 versus Exp # 2). C: Comparison of distribution of CNV sizes (x-axis: Size (kb); y-axis: Fraction of Regions).]
[Figure 4B, right panel: Mean log2 ratio of 15 probes for Samples # 1 to # 5 and the self-self hybridization, with levels marked for 1 copy and 0 copies.]
ULS Labeling. Per array, 1.5 µg of genomic sample and reference DNA was heat fragmented for 10 min at 95°C and chemically labeled with Cy5-ULS or Cy3-ULS dyes for 30 min at 85°C using the Agilent Genomic DNA ULS Labeling Kit (Agilent p/n 5190-0419). The Cy5-labeled and Cy3-labeled samples were purified using Agilent-KREApure columns supplied with the Agilent Genomic DNA ULS Labeling Kit.
Imaging and Data Analysis. Slides were scanned on an Agilent 2565BA microarray scanner. The images were analyzed using Feature Extraction
software (version 9.5, Agilent Technologies) and CGH Analytics software (beta version 4.0.54, Agilent Technologies).
Figure 2. DNA Analytics view of a 325 Kb region on Chromosome 7 comparing the ULS and Enzymatic labeling methods using
the widely used standard sample NA15510 versus reference sample NA10851. The genomic DNA was labeled with either the
Genomic DNA ULS Labeling kit (purple and red plot (dye-flip)) or the Genomic DNA Labeling kit PLUS (blue and orange plot
(dye-flip)). Log2 ratio values for all oligonucleotide probes are plotted as a function of their chromosomal position. The ADM-2
(threshold 5) algorithm identified the same CNV with both labeling methods. The Derivative Log2 Ratio Standard Deviation, a
measure for the probe-to-probe log ratio noise, was slightly lower for the ULS labeled samples. DLRSD values below 0.2 are
indicative of very high-quality reliable data. All other data presented on this poster was generated with the ULS labeling method.
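The DLRSD describes probe-to-probe noise by looking at differences between neighbouring probes rather than at the log ratios themselves, so a genuine copy number step contributes only one large difference. The sketch below is one common way to compute such a statistic: a robust standard deviation of consecutive differences (interquartile range divided by 1.349), scaled by the square root of 2. The exact estimator used by the analysis software is not given on this poster, so treat the formula and the name `dlrsd` as assumptions.

```python
import math
import statistics

def dlrsd(log_ratios):
    """Derivative log2 ratio spread for probes ordered by chromosomal position."""
    # Differences between each probe and its neighbour along the chromosome.
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    # Robust SD from the interquartile range (IQR / 1.349 for normal data).
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    robust_sd = (q3 - q1) / 1.349
    # A difference of two independent values has sqrt(2) times the single-probe SD.
    return robust_sd / math.sqrt(2)
```

For log ratios with independent noise of standard deviation 0.1, this returns a value close to 0.1, and inserting a deletion of several hundred probes barely changes it.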
Genomic DNA. The reference sample NA10851 (HapMap-CEPH/UTAH), the NA15510 sample and the other HapMap samples (CEPH/UTAH, JPT,
CHB and YRI) were obtained from Coriell.
Hybridization and Washing. The appropriate purified labeled samples were combined and mixed with human Cot-1 DNA (Invitrogen), Agilent 10x
Blocking Agent (supplied with Agilent’s Oligo aCGH Hybridization Kit p/n 5188-5220), and Agilent 2x Hybridization Buffer. Before hybridization to the
array, the hybridization mixtures were denatured at 95°C for 3 min and incubated at 37°C for 30 min. Agilent-CGHblock was added to the ULS labeled
samples and the samples were applied to the arrays. Hybridization was carried out for 40 h at 65°C. The arrays were washed in Agilent Oligo aCGH
Wash Buffer 1 for 5 min at room temperature, followed by 1 min at 37°C in Agilent Oligo aCGH Wash Buffer 2 (Agilent p/n 5188-5226).
Materials and Methods
Enzymatic Labeling. Per array, 1 µg of genomic sample and reference DNA was digested with AluI and RsaI for 2 h at 37°C. The digested samples
were labeled with Cy5-dUTP or Cy3-dUTP using the Agilent Genomic DNA Labeling Kit PLUS (Agilent p/n 5188-5309). The Cy5-labeled and Cy3-labeled
samples were purified using Microcon YM-30 columns (Millipore).
Detection of a ~183 Kb CNV on chromosome 4 in five different HapMap samples. This CNV is reported in the Database of Genomic Variants (indicated by the grey bar) and is covered by 72 probes.
ULS labeling versus traditional enzymatic labeling protocol
Figure 3A (left). We used self-self hybridizations (NA10851 versus NA10851) to estimate the number of false positives, and we
used replicate dye-flips (NA10851 versus NA15510) to estimate the reproducibility of our assay. Using the ADM-2 algorithm
(threshold 5), we observed only 8 aberration calls in the self-self hybridizations, indicating a low false positive rate, while we
observed an average of 372 ± 27 amplifications and 343 ± 7 deletions in the replicate dye-flips, illustrating the reproducibility.
Detection of a ~6 Kb (15 probes) CNV on chromosome 1 in five different HapMap samples. From this (limited) set of
samples, it seems that the CNV is actually much smaller than the CNV reported in the Database of Genomic Variants
(indicated by the grey bar). Mean log2 ratios of the probes within the CNV are provided in the right panel, likely reflecting
states of zero copies (Sample # 4) and one copy (Samples # 1, # 2, # 3 and # 5) for this DNA segment.
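The link between a mean log2 ratio and a copy state can be sketched as follows. In a two-color assay the ideal ratio is log2(sample copies / reference copies), so one copy against a two-copy reference gives -1, while zero copies gives a strongly negative value limited only by background signal. The sketch assumes the reference carries two copies at the locus and uses a hypothetical cutoff of -2.5 for the zero-copy state; neither value is taken from the poster.

```python
import math

def expected_log2_ratio(sample_copies, reference_copies=2):
    """Ideal log2 ratio for a given copy number (sample_copies must be > 0)."""
    return math.log2(sample_copies / reference_copies)

def call_copy_state(mean_log2, reference_copies=2, max_copies=4, zero_copy_cutoff=-2.5):
    """Assign the copy state whose ideal log2 ratio is closest to the observed mean.

    zero_copy_cutoff is a hypothetical threshold: below it the segment is
    treated as absent, because log2(0) has no finite expected value.
    """
    if mean_log2 < zero_copy_cutoff:
        return 0
    states = range(1, max_copies + 1)
    return min(states, key=lambda n: abs(mean_log2 - expected_log2_ratio(n, reference_copies)))
```

Observed ratios are usually compressed toward zero relative to these ideal values, so a real genotyping scheme would calibrate the expected levels on the data rather than rely on the theoretical ones.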
C. Architecturally complex CNV regions
Figure 3B (middle). This plot compares the average log2 ratios in the variant regions for the sample on the x-axis to the average
log2 ratios in the same regions for the replicate sample on the y-axis. The correlation between the replicates was 0.99 with a sigma
estimate (RMSE) from the least squares fit of 0.09.
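The two summary numbers in this caption can be reproduced from paired interval means with an ordinary least-squares fit. The sketch below returns the Pearson correlation and the residual standard error of the fitted line; dividing the residual sum of squares by n - 2 is an assumption, since the poster does not state which denominator was used. The name `replicate_agreement` is hypothetical.

```python
import math

def replicate_agreement(x, y):
    """Pearson correlation and least-squares RMSE for paired replicate values.

    x, y: mean log2 ratios of the same intervals in two replicates
    (needs at least 3 pairs and non-constant values).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Least-squares line y = intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    rmse = math.sqrt(sse / (n - 2))  # assumed denominator: n - 2
    return r, rmse
```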
Figure 3C (right). Representative example of the size distribution of CNVs detected in NA10851 versus NA15510 compared to
corresponding CNVs from DGV. In this study, the median size of detected CNVs was 4 Kb; 124 CNVs were smaller than 1 Kb,
most CNVs (371) were between 1 Kb and 10 Kb in size, and only 7 CNVs were larger than 1 Mb.
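A size summary of this kind can be produced from a list of called interval lengths. The sketch below reports the median and counts per size class; the class boundaries mirror the ones quoted in the caption, but how boundary values are assigned (for example, a CNV of exactly 10 Kb) is an assumption, and `summarize_cnv_sizes` is a hypothetical name.

```python
import statistics

def summarize_cnv_sizes(sizes_bp):
    """Return (median size in bp, counts per size class) for CNV lengths in bp."""
    bins = {"< 1 Kb": 0, "1 Kb to 10 Kb": 0, "10 Kb to 1 Mb": 0, "> 1 Mb": 0}
    for size in sizes_bp:
        if size < 1_000:
            bins["< 1 Kb"] += 1
        elif size <= 10_000:
            bins["1 Kb to 10 Kb"] += 1
        elif size <= 1_000_000:
            bins["10 Kb to 1 Mb"] += 1
        else:
            bins["> 1 Mb"] += 1
    return statistics.median(sizes_bp), bins
```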
Conclusions
• High-resolution CGH arrays specifically designed for detecting CNVs in
combination with the ULS labeling technology resulted in highly reproducible
CNV data with low false positive rates.
Figure 1. Comparison of the ULS labeling protocol (on right) versus the traditional enzymatic labeling protocol (on left). The traditional
labeling protocol utilizes an enzymatic methodology of random primers and Klenow to differentially label genomic DNA samples with
fluorescently labeled nucleotides. The ULS labeling protocol utilizes Kreatech’s Universal Linkage System (ULS™) to chemically bind
the label to genomic DNA at the N7 position of guanine. Compared to enzymatic labeling, it is independent of DNA fragment length,
four times faster, contains fewer steps, and is easier to automate.
• The size distribution of CNV regions detected in the different HapMap
samples was significantly smaller than the corresponding regions in the
Database of Genomic Variants.
• Many regions were found to be complex, as previously reported by, for
example, Perry et al. (AJHG 82, 685-695, March 2008).
Detection of a complex CNV region on chromosome 4 in five different HapMap samples. The CNV is larger in Sample # 4 than in Samples # 1, # 3 and # 5 and could not be detected in Sample # 2. A similar pattern for this region was reported by Perry et al. (AJHG 82, 685-695, March 2008).
Figure 4A, 4B and 4C. DNA Analytics views of 5 different HapMap samples hybridized against NA10851 and a self-self
(NA10851-NA10851) hybridization. Log2 ratio values for all oligonucleotide probes are plotted as a function of their chromosomal
position. CNVs were identified using the ADM-2 (threshold 5) algorithm.