Global identification of human transcribed sequences with genome

Materials and Methods
Human genome sequence processing
The template sequences used to design the microarrays were derived from the UCSC
hg13/NCBI Build 31 assembly of the human genome sequence (UCSC Human Genome Project
Working Draft, November 2002) (1). Each chromosome sequence was
screened for repetitive elements and low-complexity DNA with RepeatMasker (2) in
sensitive mode, in conjunction with the RepBase collection of repetitive sequence
elements (3, 4). Additional low-complexity sequence filtering was performed with the
NSEG program (5) using a minimum segment length of 21 nucleotides, trigger
complexity of 1.4, and extension complexity of 1.6.
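The two-threshold filtering can be illustrated with a simplified sketch. It assumes that the complexity values are Shannon entropies (in bits) computed over sliding 21-nt windows, in the manner of the SEG family of programs; the function names are hypothetical, and this is not the NSEG implementation itself. Windows at or below the trigger value seed a masked segment, which is then extended across adjacent windows at or below the extension value.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the base composition of one window."""
    n = len(window)
    counts = Counter(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq, window=21, trigger=1.4, extension=1.6):
    """Return a per-base list of booleans; True marks low-complexity sequence.

    Assumption: complexity is compared as entropy in bits per window.
    """
    n = len(seq)
    mask = [False] * n
    if n < window:
        return mask
    ent = [window_entropy(seq[i:i + window]) for i in range(n - window + 1)]
    i = 0
    while i < len(ent):
        if ent[i] <= trigger:
            # Seed found: extend left and right over contiguous windows
            # whose entropy stays at or below the extension threshold.
            lo = i
            while lo > 0 and ent[lo - 1] <= extension:
                lo -= 1
            hi = i
            while hi + 1 < len(ent) and ent[hi + 1] <= extension:
                hi += 1
            # Mask every base covered by windows lo..hi.
            for j in range(lo, hi + window):
                mask[j] = True
            i = hi + 1
        else:
            i += 1
    return mask
```

Under these assumptions a homopolymer run is masked together with its partially degenerate flanks, whereas a sequence with balanced base composition in every window is left untouched.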
Oligonucleotide probe selection
Following repeat sequence identification and low-complexity filtering, the remaining 1.5
Gb of nonrepetitive DNA was processed by the NASA Oligonucleotide Probe Selection
Algorithm (NOPSA), which is designed to identify optimal hybridization probes
according to several criteria: 1) nucleotide frequency information, calculated to determine
the uniqueness of every 36mer in the genome; this measure was used to select probes that
occur less than 5 times on average to reduce the potential for cross-hybridization; 2)
intra-oligo alignment scores, used to exclude sequences that could form a loop with a
stem greater than 7 bases; 3) other sequence-dependent factors such as length, extent of
complementarity and overall base composition. A total of 51,874,388 36mer
oligonucleotide probes were selected to represent both sense and antisense strands of the
nonrepetitive DNA at an average resolution of 46 nt (36-nt probes separated by gaps of 10 nt on
average). For DNA synthesis purposes, each selected probe was designed not to use more
than 75% of the maximum number of cycles required to synthesize an oligomer of that
length. For 36mer oligonucleotides (at most 4 × 36 = 144 cycles) this cutoff was 108 cycles. Virtual masks describing
the layout of each microarray design were developed with the ArrayScribe program
(NimbleGen Systems, Madison, WI).
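Two of the filters described above, the stem-length exclusion and the synthesis-cycle cutoff, can be sketched as follows. This is an illustration under stated assumptions and not the NOPSA code, which is not described in detail here: stems are detected as exact reverse-complement matches rather than by alignment scores, a minimum loop of 3 nt is assumed, and the synthesizer is assumed to deliver bases in a fixed repeating order (A, C, G, T), one base type per cycle. All function names and default parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_long_stem(probe, max_stem=7, min_loop=3):
    """True if the probe could fold into a hairpin with a stem > max_stem.

    Assumption: only perfectly complementary arms are considered.
    """
    k = max_stem + 1
    for i in range(len(probe) - 2 * k - min_loop + 1):
        arm = reverse_complement(probe[i:i + k])
        # Look for the complementary arm downstream of a loop.
        if probe.find(arm, i + k + min_loop) != -1:
            return True
    return False


def synthesis_cycles(probe, order="ACGT"):
    """Number of single-base cycles needed to build the probe.

    Assumption: cycles repeat through `order`; a probe is extended
    only in the cycle that delivers its next base.
    """
    cycles = 0
    next_slot = 0  # position in `order` of the upcoming cycle
    for base in probe:
        slot = order.index(base)
        cycles += (slot - next_slot) % len(order) + 1
        next_slot = (slot + 1) % len(order)
    return cycles


def passes_cycle_cutoff(probe, fraction=0.75):
    """True if the probe needs no more than `fraction` of the maximum
    cycle count (4 cycles per base) for its length."""
    return synthesis_cycles(probe) <= fraction * 4 * len(probe)
```

With this cycle model, a 36mer made of repeated ACGT needs only 36 cycles, whereas a poly-T 36mer needs all 144 and would be rejected by the 108-cycle cutoff.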
Microarray construction
High-density oligonucleotide arrays were fabricated as described (6-8). Briefly, DNA
synthesis reagents (Glen Research, Sterling, VA; Proligo, Boulder, CO; Amersham
Pharmacia, Piscataway, NJ; Applied Biosystems, Foster City, CA) were used in
conjunction with Expedite DNA synthesizers (Applied Biosystems). Maskless Array
Synthesis (MAS) units (NimbleGen Systems, Madison, WI) were connected to the DNA
synthesizers to manufacture custom arrays using photolabile phosphoramidites
(NPPOC-dAdenosine(N6-tac)-β-cyanoethylphosphoramidite,
NPPOC-dCytidine(N4-isobutyryl)-β-cyanoethylphosphoramidite,
NPPOC-dGuanosine(N2-ipac)-β-cyanoethylphosphoramidite, and
NPPOC-dThymidine-β-cyanoethylphosphoramidite) obtained
from Proligo (Boulder, CO). After synthesis on the MAS was completed, the
base-protecting groups were removed in a solution of ethylenediamine:ethanol (1:1 v/v)
(Aldrich, St. Louis, MO) for two hours. The arrays were rinsed with water, dried and
stored desiccated until use.
Sample preparation and labeling
Triple-selected human liver poly(A)+ RNA pooled from several individuals was
obtained from Ambion (Austin, TX). First-strand cDNA was generated with M-MLV
reverse transcriptase (RNase H-) using equimolar concentrations of oligo(dT) primers and
random decamers. The reactions were carried out at 42°C for 2 hours in the presence of
amino allyl-dUTP to allow subsequent coupling of an amine-reactive fluorescent
dye. Following reverse transcription the products were heated at 95°C for 5
minutes to denature the RNA:DNA hybrids and heat-inactivate the reverse transcriptase,
after which the RNA template was hydrolyzed via incubation with NaOH at 65°C for 15
minutes. Reverse transcription products from 20 separate reactions were produced in this
manner and pooled to reduce technical variability between samples. The cDNAs were
precipitated in ethanol:isopropanol (1:1 v/v) and resuspended in 0.1 M NaHCO3 to
facilitate coupling of Alexa Fluor 555 NHS esters (Molecular Probes, Eugene, OR) to the
reactive groups of the amino allyl-dUTPs. Following incubation at room temperature for
1 hour, the labeled cDNAs were purified with CyScribe GFX glass fiber spin columns
(Amersham Biosciences, Piscataway, NJ) and isopropanol-precipitated.
Microarray hybridization
Microarrays were hybridized with 2-3 µg labeled cDNA in 320 µL hybridization buffer
(50 mM MES, 0.5 M NaCl, 10 mM EDTA, and 0.005% (v/v) Tween-20) for 20 hours at
50°C. Hybridizations were performed in disposable adhesive chambers (Grace BioLabs,
Bend, OR) in a hybridization oven with constant agitation. After hybridization, the arrays
were washed on an orbital platform in non-stringent buffer (6× SSPE, 0.01% [v/v]
Tween-20) for 10 minutes at room temperature, then in stringent buffer (100 mM MES,
0.1 M NaCl, 0.01% Tween-20) for 30 minutes at 45°C. This was followed by a 5-minute
wash in non-stringent buffer and a 4-minute wash in 0.2× SSC. The arrays were dried
with compressed nitrogen. Images were acquired with an Axon 4000B laser scanner at
5 µm resolution, and intensity data were extracted with NimbleScan software (NimbleGen
Systems).
Validation of TAR sequences
RT-PCR was performed to validate 96 transcriptionally active regions (TARs) from
throughout the genome. PCR primer pairs were designed for each region and the
reactions were carried out with the RETROscript system (Ambion) with 0.25 µg
poly(A)+ human liver RNA for each sample tested. An identical aliquot of each reaction
mixture was used as a minus-reverse transcriptase control. PCR products were
electrophoresed on 2.5% fine-resolution agarose gels; 90 of the 96 reactions yielded a band of the
expected size with no detectable band in the negative control, and 10 of these 90 positive
amplicons produced additional bands alongside the targeted product.
References
1. E. S. Lander et al., Nature 409, 860 (2001).
2. A. F. A. Smit, P. Green, unpublished results.
3. J. Jurka, Trends Genet. 16, 418 (2000).
4. RepBase Update, volume 7, issue 2.
5. J. C. Wootton, S. Federhen, Methods Enzymol. 266, 554 (1996).
6. S. Singh-Gasson et al., Nat. Biotechnol. 17, 974 (1999).
7. E. F. Nuwaysir et al., Genome Res. 12, 1749 (2002).
8. T. J. Albert et al., Nucleic Acids Res. 31, e35 (2003).