Supplementary Methods
TaqMan® assays
Primer/Probe design. TaqMan® Gene Expression Assays are designed to transcripts
obtained from the NCBI Reference Sequence Project data base (RefSeq) using the
Applied Biosystems genome-aided primer and probe design pipeline (Applied
Biosystems, 2004). To identify the optimal region for primer and probe design,
ambiguous regions (repetitive and low complexity sequences, SNPs) in each transcript
are first masked, then the transcripts are aligned to the genome and exon-exon junctions
marked. Primers and probes are designed for each gene and, where possible, the assay is
designed such that the probe spans an exon-exon junction. Parameters such as %GC
content, Tm, secondary structure, and amplicon length are optimized to ensure high
amplification efficiency. To ensure specificity, each assay design undergoes an in silico
QC process against the public genome and transcripts database (NCBI) where we
determine the degree of homology through BLAST between the assay primer and probe
sequences and other closely related transcripts and homologous genes and pseudogenes.
Thus each assay is designed to a transcript that is gene specific. To address sequence
updates, the assays are periodically remapped to the most recent RefSeq versions of the
NCBI database.
Selection of Endogenous Control for Normalization. In QRT-PCR an endogenous control
gene is used to normalize data and control for variability between samples as well as
plate, instrument and pipetting differences. The ideal control should be sufficiently
abundant, have constant RNA transcription levels across samples, and be unaffected by
experimental treatments. To identify the best endogenous control for this study, we ran
11 samples containing titrations of UHR and Brain, ranging from 100% UHR to 100%
Brain on a TaqMan® Low Density Array Endogenous Control Panel (Applied
Biosystems), a 384-well low density array containing 16 common housekeeping genes
(Supplementary Fig. 5a). From these data we identified 4 candidate endogenous control
genes (Supplementary Fig. 5b) to use in the main MAQC study for normalization of the
1000 assays across samples A, B, C and D: 18S (Hs99999901_s1), UBC (Hs00824723_m1),
HPRT (Hs99999909_m1) and POLR2A (Hs00172187_m1). The
endogenous control genes were run in quadruplicate on each of 44 plates. Although all 4
genes showed very little change across samples, ANOVA analysis across 3 factors
(plates, instruments, samples) indicated that 18S and POLR2A showed the least variation
across the samples. We decided to use POLR2A because its CT value was within the
range of most of the genes in the study. Each replicate CT was normalized to the average
CT of POLR2A on a per-plate basis by subtracting the average CT of POLR2A from each
replicate to give the ΔCT, which is equivalent to the log2 difference between the endogenous
control and the target gene.
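The per-plate normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study; the function name `delta_ct` and the CT values are hypothetical.

```python
from statistics import mean

def delta_ct(target_cts, control_cts):
    """Subtract the plate-average control CT from each replicate CT."""
    control_mean = mean(control_cts)
    return [ct - control_mean for ct in target_cts]

# Hypothetical plate: four POLR2A control wells and three replicates of one target.
polr2a_cts = [24.0, 24.2, 23.8, 24.0]
target_cts = [27.1, 26.9, 27.0]
print(delta_ct(target_cts, polr2a_cts))
```

A positive ΔCT here means the target crossed threshold later than POLR2A, i.e. it is less abundant than the control on that plate.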
Relationship between SD and fold-change confidence for TaqMan® assays. An SD of CT of
0.167 provides 99.72% confidence for measuring 2-fold discrimination with a single
observation. A higher SD of CT may be the result of a low-expressing gene (stochastic
effects) or technical error (pipetting, mixing).
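One reading that reproduces the quoted confidence is sketched below. It assumes normally distributed CT error and treats a 2-fold change as a 1-CT gap, so a single observation discriminates it if it falls within 0.5 CT of its true value. Both assumptions, and the function name `two_sided_coverage`, are ours rather than stated in the methods.

```python
import math

def two_sided_coverage(half_width, sd):
    """Probability that a normal measurement falls within +/- half_width of its mean."""
    return math.erf(half_width / (sd * math.sqrt(2)))

SD_CT = 0.167          # SD of CT quoted in the text
HALF_WIDTH_CT = 0.5    # assumption: half of the 1-CT gap for a 2-fold change

coverage = two_sided_coverage(HALF_WIDTH_CT, SD_CT)
print(f"{coverage:.4f}")
```

Under these assumptions 0.5 CT is almost exactly 3 SD, and the coverage comes out close to the 99.72% given above.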
StaRT-PCR™
Detailed Procedure. Standardized reverse transcriptase polymerase chain reaction
(StaRT-PCR™) is a modification of the competitive template (CT) RT method described
by Gilliland et al. (1). StaRT-PCR™ enables numerical quantification of gene expression
at endpoint of PCR (35 cycles) for many genes simultaneously. An internal standard (IS)
CT is prepared for each gene and cloned to generate enough for >10^12 assays. Internal
standards for 96 genes are mixed together into a Standardized Mixture of Internal
Standards (SMIS™). In each measurement, variation in loading of cDNA is controlled by
reference to ACTB, which has relatively constant expression among different samples
(because the same SMIS™ is used in each measurement, the normalization may be
converted after data collection to any other measured gene or combination of genes
through a simple algebraic conversion factor). In each measurement, the native
template (NT) for both the target gene and ACTB is measured relative to a known
quantity of the respective internal standard.
The ratio of NT to IS must be greater than 1:10 and less than 10:1 for the
measurement to be within assay range. Initial calibration of each cDNA to a known
quantity of ACTB internal standard ensures that the ACTB NT/IS is within this range for
each subsequent measurement. Next, 2 µl of the calibrated cDNA sample and 2 µl of
SMIS™ are PCR-amplified in a 20 µl PCR reaction with primers specific to a different
gene in each reaction. As with the ACTB loading control gene, the target gene NT/IS
must be greater than 1:10 and less than 10:1. Because genes are expressed over more than
six orders of magnitude in human tissues, the target gene internal standards in each
96-gene SMIS™ are 10-fold serially diluted relative to the loading control gene (ACTB)
internal standard, in a System of six SMIS™, A–F. Thus, there are 600,000 transcript
molecules of ACTB IS in 1 µl of each SMIS™ (A–F), 6,000,000 transcript molecules of
each target gene IS in SMIS™ A, and 60 transcript molecules of each target gene IS in
SMIS™ F. For each System, a sufficient amount of A–F SMIS™ is prepared for more than
10^12 assays. Thus, the relative concentration of each IS within a SMIS™ is constant and
stable and, when used by any lab according to recommended methods, will yield the same
results when assessing the same samples. Thus far, Gene Express, Inc. has prepared
Systems 1-8, comprising reagents for nearly 800 genes.
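The dilution series and the assay-range rule can be illustrated as follows. This is a sketch under stated assumptions: the intermediate SMIS B–E values are inferred from the 10-fold steps between the quoted A and F values, the target-gene counts are assumed to be per the same volume as the ACTB figure, and all names are hypothetical.

```python
ACTB_IS_MOLECULES = 600_000        # ACTB internal standard, identical in SMIS A-F
TARGET_IS_IN_SMIS_A = 6_000_000    # target-gene internal standard in SMIS A
SMIS_LETTERS = "ABCDEF"

def target_is_molecules(smis):
    """Target-gene IS molecules in the given SMIS, assuming 10-fold steps from A to F."""
    return TARGET_IS_IN_SMIS_A // 10 ** SMIS_LETTERS.index(smis)

def within_assay_range(nt_over_is):
    """True when the NT/IS ratio is greater than 1:10 and less than 10:1."""
    return 0.1 < nt_over_is < 10

for letter in SMIS_LETTERS:
    target = target_is_molecules(letter)
    print(letter, target, target / ACTB_IS_MOLECULES)
```

A target whose NT/IS ratio falls outside the range with one SMIS would be re-measured with a neighbouring one, which is why the six mixtures together span more than six orders of magnitude.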
When preparing an internal standard for each transcript, quality control is ensured
through a 29-step GLP compliant protocol. Specificity of StaRT-PCR™ reagents to a
particular transcript is ensured by selection of primers through careful database analysis.
Optimal primer efficiency, including limit of detection (LOD) of less than 10 transcript
molecules, and 100% signal-to-analyte response are ensured through serial limiting
dilution of a known quantity of IS. In each measurement, quality control of gene
specificity is ensured through SOP preparation of primers, SMIS™ and PCR reaction
mixtures and examination of PCR products following electrophoretic separation to ensure
that PCR products of expected size are observed.
Quality Control. An internal standard peak was observed in every measurement and this
documents that no false negatives were observed. Absence of false positives was
documented through observation of no peaks in reaction mixtures with no cDNA or
SMIS™ in the reaction.
Specificity. Some StaRT-PCR™ reagents were intentionally designed to assess more than
one transcript at the request of customers. For these reagents, there was no increase in
discordance compared to TaqMan® assays (TAQ) or QuantiGene® (QGN). Each of the StaRT-PCR™ reagents assessed
transcripts representing a single gene. The design of the primers to cross introns ensured
that genomic DNA PCR products, if present, would be detected as longer PCR products.
cDNA Consumption. For each MAQC sample, 2 µl of calibrated cDNA sample
corresponded to 8 ng of original RNA. For each of the four samples (A, B, C, and D), 615
datapoints were obtained (205 genes in triplicate) and approximately 2.2
StaRT-PCR™ assays were done for each datapoint. In total, these assessments for cDNA
calibration, range finding, and quantification of 205 genes consumed between 2,700 and
3,000 µl of cDNA, corresponding to approximately 13 µg of RNA.
Although far less cDNA in each measurement could have been used in assessment
of genes expressed at higher levels, use of less cDNA to measure genes expressed at
lower levels would have led to increased variation in measurement and/or lack of
representation of those genes due to stochastic sampling error. Gene Express has SOP
guidelines for loading 10-fold more cDNA to obtain better reproducibility for genes
expressed at low levels, or as much as 100-fold less cDNA to conserve sample in the
measurement of highly expressed genes.
In summary, in each measurement a known quantity of SMIS™ is combined with
a cDNA sample. Loading of cDNA is controlled by reference to ACTB. Each target gene
and loading control gene is simultaneously measured relative to a known number of its
respective internal standard transcript molecules in the SMIS™. Thus, it is possible to
report each gene expression measurement as a numerical value in units of target gene
cDNA transcript molecules/10^6 reference gene cDNA transcript molecules. Calculation
of data in this format enables entry into a common databank, direct inter-experimental
comparison, and combination of values into interactive transcript abundance indices. All
data may be normalized to any gene measured other than ACTB through a simple
correction factor.
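The reporting unit can be illustrated with a short sketch. It assumes that the number of native template molecules is the measured NT/IS ratio multiplied by the known number of internal standard molecules, which is our reading of the summary above; the function names and example values are hypothetical.

```python
def molecules_per_million_reference(target_nt_is, target_is_molecules,
                                    ref_nt_is, ref_is_molecules):
    """Target cDNA molecules per 10^6 reference-gene cDNA molecules."""
    target_nt = target_nt_is * target_is_molecules  # native target molecules
    ref_nt = ref_nt_is * ref_is_molecules            # native reference molecules
    return target_nt * 1e6 / ref_nt

def renormalize(value_per_million_actb, new_ref_per_million_actb):
    """Re-express a value relative to another measured gene instead of ACTB."""
    return value_per_million_actb * 1e6 / new_ref_per_million_actb

# Hypothetical measurement: target NT/IS = 2.0 against 600 IS molecules,
# ACTB NT/IS = 0.5 against 600,000 IS molecules.
value = molecules_per_million_reference(2.0, 600, 0.5, 600_000)
print(value)
```

The `renormalize` helper shows the kind of simple correction factor mentioned above: dividing by the new reference gene's own value (also expressed per 10^6 ACTB molecules) changes the reference without re-measuring anything.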
StaRT-PCR™ Data Set 1 (Figs. 3-6). All data were normalized to ACTB as loading
control. It was assumed that mRNA/Total RNA was equal in Samples A and B. The
determination of difference in ACTB/mRNA in Sample A compared to Sample B was
based on identifying the A/B ratio of ACTB/mRNA that yielded optimal average R2 for
Samples A, B, C and D across all genes measured. Based on this, ACTB/mRNA was
calculated to be 2-fold higher in Sample A compared to Sample B. Data were corrected
for this difference. This normalization effectively normalized the GEX data set to all
other data sets, as is evident from the fold-difference graphs (Fig. 3 and Fig. 4e).
StaRT-PCR™ Data Set 2 (Fig. 2 and related Table 1 data, R2 data in Table 1). As with
Data Set 1, all data were normalized to ACTB as loading control. Based on MAQC
sample titration data from StaRT-PCR™ (MS-7) and from each microarray (MS-8), and
from spike-in data from MS-12, it was concluded that mRNA/Total RNA is higher in
Sample A compared to Sample B. To identify the optimal normalization with respect to
the Sample A to Sample B difference, using no a priori assumptions regarding either
mRNA/Total RNA or ACTB/mRNA, a 3-D surface plot was used to assess the effect of
each of these parameters on the linear correlation (R2) across the four MAQC samples
averaged across all genes measured. Two optimal R2 peaks were found.
One is achieved with the normalization assumptions used in Data Set 1 (i.e. A/B
mRNA/Total RNA = 1.0, A/B ACTB/mRNA = 0.5). However, the assumption of A/B
mRNA/Total RNA = 1.0 is not supported by multiple sets of data (MS-8, MS-12). Thus,
the second R2 optimum, achieved with A/B mRNA/Total RNA = 2.5 and A/B
ACTB/mRNA = 0.5, is considered to be the correct one. These are the assumptions
used in Data Set 2.
StaRT-PCR™ Data Set 3 (Fig. 1 and related Table 1 data). Data are reported as
the number of transcript molecules in the assay, with no normalization to ACTB or
any other gene.
QuantiGene®
Probe design. A probe set for a target gene consists of three types of oligonucleotide
probes, capture extenders (CE), label extenders (LE) and blocking probes (BL), covering
a contiguous region of the target. The probe set allows the capture
of target RNA to the surface of the plate well and hybridization with the branched DNA signal
amplification molecule. For each target sequence, the software algorithm identifies
regions that can serve as annealing templates for CEs (5-10 per gene), LEs (10-20 per
gene), or BLs to fill the remaining space. A more detailed description of the probe design
has been published previously (2).
Scaling of data for TaqMan® assays, StaRT-PCR™, and QuantiGene®
The data for the three platforms were transformed to be on the same X-axis scale in the
following manner. For StaRT-PCR™, 6000 transcript molecules was defined by a value
of 6000, or log2(6000) = 12.55. For TaqMan® assays, the CT values were first
transformed from a decreasing copy number scale to an increasing copy number scale.
This was accomplished by taking the absolute value of the difference between every TaqMan®
assay CT value and the CT value at the lowest end of the expression range (40). This rescaling
preserves the assay range measured by TaqMan® assays in the log2 space. Given that a
TaqMan® assay CT value of 35 corresponds to 5 transcript molecules, the extrapolated
CT equivalent for 6,000 transcript molecules is approximately 24.78. This value on the
transformed scale corresponds to |24.78 - 40| or 15.22. In order to scale this to the
StaRT-PCR™ value for 6,000 transcript molecules, a re-scaling value of 2.66025 was
subtracted from all values. This factor was calculated by taking the difference between the
pre-scaling value in TaqMan® assays that corresponds to 6000 transcript molecules (15.22)
and the value of StaRT-PCR™ that corresponds to 6000 transcript molecules (12.55).
An analogous transformation was applied to QuantiGene® values, resulting in a rescaling
factor of 13.55. This factor was generated with the estimation of 6000 transcript
molecules defined by 0.5 RLU or -1.0 on a log2 scale. These transformations result in all
platforms having a post-scaling value of approximately 12.55 on a log2 scale for an analyte value of
6000 transcript molecules.
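The three transformations can be written out as a short sketch. The offsets are the values quoted in the text. The function names are hypothetical, and the CT extrapolation assumes one doubling per PCR cycle, which the text does not state explicitly.

```python
import math

TAQMAN_MAX_CT = 40.0        # CT at the lowest end of the expression range
TAQMAN_OFFSET = 2.66025     # re-scaling value quoted in the text
QUANTIGENE_OFFSET = 13.55   # re-scaling factor quoted in the text

def scale_startpcr(molecules):
    """StaRT-PCR: log2 of the transcript molecule count."""
    return math.log2(molecules)

def scale_taqman(ct):
    """TaqMan: flip CT to an increasing scale, then shift onto the StaRT-PCR scale."""
    return abs(ct - TAQMAN_MAX_CT) - TAQMAN_OFFSET

def scale_quantigene(rlu):
    """QuantiGene: log2 of the RLU signal, shifted onto the StaRT-PCR scale."""
    return math.log2(rlu) + QUANTIGENE_OFFSET

def taqman_ct_for_molecules(molecules, ref_ct=35.0, ref_molecules=5.0):
    """Extrapolated CT, assuming one doubling per cycle from the CT 35 = 5 molecules anchor."""
    return ref_ct - math.log2(molecules / ref_molecules)

print(scale_startpcr(6000))
print(scale_taqman(24.78))
print(scale_quantigene(0.5))
```

All three calls land near 12.55, the common anchor for 6000 transcript molecules. The extrapolation helper gives a CT slightly below the rounded 24.78 used in the text, which accounts for small differences in the last decimal places.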
Supplementary References
1. Gilliland, G., Perrin, S., Blanchard, K. & Bunn, H.F. Analysis of cytokine
mRNA and DNA: detection and quantitation by competitive polymerase chain
reaction. Proc. Natl. Acad. Sci. USA 87, 2725–2729 (1990).
2. Bushnell, S. et al. ProbeDesigner: for the design of probesets for branched DNA (bDNA)
signal amplification assays. Bioinformatics 15, 348–355 (1999).