SUPPLEMENTARY MATERIAL

METHODS
Patients and Definitions
The clinical criteria for inclusion were: severe injuries of at least two body regions or three major
fractures; at least 18 years of age; an estimated Injury Severity Score (1) of 5 or more at baseline in
the emergency room and 12 points or more after complete assessment and diagnosis of injuries at
admission to the intensive care unit (ICU); less than 12 hours between accident and admission to the
ICU; and survival of more than 3 days. None of the patients underwent neuro- or cardiac
surgery.
The exclusion criteria were: severe intracranial head injuries, coagulation abnormalities (e.g.
coagulation factor deficiency <40%, prothrombin time <40%, partial thromboplastin time >120
secs, antithrombin III <40%, INR <2.0, platelet count <50,000/µL) known on the day of
admission to the ICU, acute renal failure (serum creatinine >3.0 mg/dL; serum urea >250 mg/dL;
urine output <20 mL/h despite intensive diuretic therapy with furosemide), liver failure, pregnancy,
malignant disease and hemofiltration in the patient’s history.
With reference to the criteria originally proposed by the members of the American College of Chest
Physicians and the Society of Critical Care Medicine (2), sepsis was assumed if all of the following
criteria were met within 24 hours of each other: (1) definitive clinical evidence to support a
presumptive diagnosis of sepsis; (2) hyperthermia (body temperature of ≥38.5°C) or hypothermia
(body temperature of ≤35.6°C) measured as core temperature; (3) tachycardia, defined as a heart rate
of ≥90 beats/min in the absence of a β-adrenergic receptor blocker; (4) requirement for
mechanical ventilation; (5) hypotension, defined as a systolic blood pressure of ≤90 mm Hg (or a
sustained drop in the systolic blood pressure of >40 mm Hg in the presence of an adequate fluid
challenge), or evidence of systemic toxicity or poor end-organ perfusion, defined by two or more of
the following criteria: (a) metabolic acidosis (arterial blood pH of ≤7.3 or base deficit of ≥5 mmol/L),
(b) arterial hypoxia (Po2/FIo2 of ≤250), (c) acute renal failure (urine output of <0.5 mL/kg/hr for at
least 1 hr), (d) coagulation abnormality (prothrombin time of ≥1.5 × control or partial
thromboplastin time of ≥1.5 × control), (e) unexplained decrease in the platelet count (≤100,000
thrombocytes/µL or decrease of at least 50% from baseline), (f) cardiac index of <4.0 L/min/m² with
systemic vascular resistance of <800 dyne·sec/cm⁵.
Septic shock was defined as severe sepsis with persistent hypotension (systolic blood pressure of <90
mm Hg or a sustained decrease in the systolic blood pressure of >40 mm Hg for at least 60 mins) despite
adequate volume load or the need for vasopressor drugs. Documented infection was defined as the
identification of positive bacterial cultures from normally sterile body fluids or bodily compartments and
clinical signs of infection. Homologous packed red blood cells were transfused to maintain the
hemoglobin level >9 g/dL. Plasma was only given to maintain hemostasis higher than the thresholds of
the coagulation abnormalities mentioned. The supportive additional therapy (antibiotics, parenteral
nutrition, volume therapy, adequate FIo2 adapted to the situation of the patient, catecholaminergic- as
well as pharmacological support) was carried out as detailed in a protocol of standard management
principles by Sibbald et al. in our ICU and calculated by the intensivist on duty blinded to the aim of the
study (3).
Data management
Patient data are documented prospectively in an electronic Patient Data Management System (PDMS; ICU-Data
by Imeso, Germany). Study data are documented in the Giessen-Research-Center-in-Infectious-Diseases database, based on an Oracle™ platform.
Microbiological screening
Daily microbiological screening included smear tests from wounds and skin. Catheters in central
veins and the urethra were examined when removed from the patient. A broncho-alveolar smear was taken
whenever the patient underwent bronchoscopy and when suction material looked suspicious
for infection. Blood cultures were drawn by additional venous puncture when the treating
intensivist had identified early signs of infection, such as fever. All bacteriological results were
documented in the patient's record for the time point at which the sample was drawn.
Multivariate analyses
In the first step, we investigated the impact of carriage of at least one mutation in a tumor necrosis
factor (TNF) polymorphism on TNF-α plasma level on day one. For univariate analyses, Mann-Whitney tests were calculated. To control for possible confounding variables, logistic regression
analyses were performed on the dichotomized TNF-α plasma level, predicting high (≥36.5) vs. low
(<36.5) TNF-α plasma level from age, weight, and trauma severity as continuous covariates, as well
as gender as a categorical covariate. To determine whether an association with TNF polymorphisms
might depend on previously determined factors for plasma levels, we investigated the association of
TNF polymorphisms while controlling for the risk factors that had been identified previously by
adding the polymorphisms to the logistic regression models thus obtained. Odds ratios (ORs) with
95% confidence intervals (CIs) were estimated. In addition, the effect of the estimated TNF
haplotypes on plasma level on day one was analyzed assuming an ordinally scaled phenotype,
utilizing a score test with simulated p-values from 10^6 replications (4).
In the second step, the effect of carrying at least one mutation in a TNF polymorphism on the
development of sepsis was investigated. For univariate analyses, Fisher’s exact test was performed
and ORs with 95% CIs were estimated. Again, to control for other factors, logistic regression
analyses were performed predicting development of sepsis from age, weight, trauma severity, and
TNF- plasma levels on day one as continuous covariates, and gender as a categorical covariate. An
independent association of TNF polymorphisms with sepsis was then analyzed by adding the
polymorphisms to the regression model and estimating the adjusted ORs with 95% CIs, thus
controlling for the identified factors. To include possible covariates in the analyses of estimated
TNF haplotypes with regard to development of sepsis, generalized linear models allowing for
ambiguous haplotypes were employed (5).
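The univariate step described above (Fisher's exact test plus an OR with 95% CI on a 2×2 table of carrier status against sepsis) can be sketched as follows. This is only an illustration: the text does not state how the confidence interval was computed, so the log-based (Woolf) interval used here is an assumption, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: carriers with sepsis      b: carriers without sepsis
    c: non-carriers with sepsis  d: non-carriers without sepsis
    The Woolf interval is an assumption; the source does not name a method.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf interval is undefined when a cell is zero")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of all table probabilities
    (with fixed margins) that do not exceed the observed table's probability."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def table_probability(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = table_probability(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(x_min, x_max + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts, for illustration only (not study data).
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 15, 30)
p_value = fisher_exact_two_sided(20, 10, 15, 30)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}; {ci_high:.2f}), p = {p_value:.4f}")
```

The adjusted ORs reported in the multivariate step would come from the fitted logistic regression coefficients instead, which this sketch does not cover.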
Finally, we utilized the same procedure to analyze the impact of carriage of at least one mutation in
a TNF polymorphism on mortality. Fisher’s exact test and ORs with 95% CIs were employed for
univariate analyses. For multivariate analyses, logistic regression analyses were performed to predict
mortality from age, weight, trauma severity, and TNF-α plasma level on day one as continuous
covariates, with gender and sepsis as categorical covariates. To determine an association of TNF
polymorphisms with mortality that is independent of previously determined factors, we added the
TNF polymorphisms to the logistic regression models previously obtained and estimated adjusted
ORs with 95% CIs. In a similar way, the prediction of outcome by estimated TNF haplotypes was
analyzed using generalized linear regression models including possible covariates.
In the development of all of the above regression models, the association between continuous
covariates and the respective dependent variable was modelled utilizing fractional polynomials (6).
As the number of events was 72 for sepsis and 32 for mortality, and the calculated events per
variable was about 15 for sepsis and less than 6 for mortality, only risk factors with univariate
p < .01 were considered for the multivariate model. All possible two-way interactions were analyzed
and a backward selection of covariates with elimination of covariates with p > .01 was conducted.
For internal validation of the final multivariate prediction models for sepsis and mortality, bootstrap
sampling was performed with 20,000 replicates. From this, the mean parameter estimates as well as
a 95% CI were determined.
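The events-per-variable figures quoted above can be checked with simple arithmetic. The sketch below assumes that the variable count equals the candidate covariates listed earlier for each model (five for sepsis, six for mortality); that counting is an inference from the text, not something the source states explicitly.

```python
def events_per_variable(n_events, n_candidate_variables):
    """Events per variable (EPV): outcome events divided by candidate covariates."""
    return n_events / n_candidate_variables

# Candidate covariates as listed in the model descriptions (assumed counting).
sepsis_covariates = ["age", "weight", "trauma severity", "TNF-alpha day 1", "gender"]
mortality_covariates = sepsis_covariates + ["sepsis"]

epv_sepsis = events_per_variable(72, len(sepsis_covariates))
epv_mortality = events_per_variable(32, len(mortality_covariates))

print(f"EPV sepsis:    {epv_sepsis:.1f}")     # 72 / 5
print(f"EPV mortality: {epv_mortality:.1f}")  # 32 / 6
```

Under this counting the sepsis model has an EPV of 14.4 (about 15) and the mortality model an EPV of about 5.3 (less than 6), matching the figures in the text.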
Gene expression profiling
A total of 28 patients were included in the gene expression analysis study. The PAXgene Blood
RNA System (PreAnalytiX, Heidelberg, Germany) was used to collect whole blood samples and to
isolate the RNA according to the manufacturer’s recommendations (PreAnalytiX). Total RNA was
quantified with Nanodrop (NanoDrop Technologies, Rockland DE, USA) and the quality of RNA
was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies GmbH,
Boeblingen, Germany). When the total RNA yield was >2 µg, the 260/280-ratio was >1.9 and the
electrophoretic profile showed clear and sharp ribosomal peaks, the RNA was subjected to cRNA
synthesis, cRNA fragmentation and finally hybridization on CodeLink UniSet Human 10 K
Bioarrays (GE Healthcare, Freiburg, Germany) using the CodeLink Expression Assay Kit (GE
Healthcare) according to manufacturer's instructions. Each patient sample was hybridized on at least
two bioarrays (technical replicates). Bioarrays were stained with Cy5™-streptavidin (GE
Healthcare) and scanned using the GenePix® 4000 B scanner and the GenePix Pro 4.0 Software
(Axon Instruments, Arlington, TX, USA). A total of 75 array images were subjected to data
analysis.
Spot signals of CodeLink bioarrays were quantified using CodeLink System Software consisting of
Batch Submission (V2.2.27) and Expression Analysis (V2.2.25) (GE Healthcare) as outlined in the
user's manual. CodeLink Expression Software 1.21 generated background corrected raw as well as
median centered intra-slide normalized data. The intra-slide normalized data were used for further
analysis. The software automatically calculated thresholds for intra-slide normalized intensities for
each array and flagged genes as TRUE when the gene intensity was higher than the threshold or
FALSE when the intensity was lower than the threshold. The present call of a microarray was given
as the ratio of genes flagged as TRUE / total number of genes on microarray. Microarrays subjected
to data analysis showed a mean present call of 81% indicating a high number of genes above
threshold, i.e. being flagged as TRUE. Furthermore, the software flagged each gene value as
GOOD, EMPTY, POOR, NEG or MSR defining different quality measures as outlined in the user's
manual. Only gene values flagged as GOOD or EMPTY were used in the following analysis
workflow:
1) Defining patient groups:
Patients and corresponding arrays were separated in two groups (dataset1):
group A) 16 patients (WT1-16; 42 arrays) without the TNF rs1800629 A variant
group B) 12 patients (MUT1-12;33 arrays) carrying the TNF rs1800629 A variant
2) Removal of genes with a high number of missing values or of values being flagged as FALSE:
Genes with missing values >= 50% of all arrays in a group were excluded from the dataset.
Genes that were flagged as FALSE in > 50% of arrays in each group were also excluded from
the dataset.
3) Imputation of remaining missing values:
Remaining missing values were imputed using sequential K-nearest neighbour (SKNN)
imputation (7) with k=5.
4) Normalization of imputed dataset:
The imputed dataset was normalized using quantile normalization in R (8) and log2-transformed.
5) Array outlier detection:
Dissimilarity matrices of the normalized dataset were generated in AVADIS-Pride (9) to
determine outlier arrays within the dataset (Figure e1). Arrays of patients WT13-16 (12 arrays)
and patients MUT11 and 12 (5 arrays) were identified as outliers and removed from the original
dataset 1 in step 1.
The analysis steps 1 - 4 were repeated with the reduced dataset consisting of:
group A) 12 patients (WT1-12; 30 arrays) without the TNF rs1800629 A variant and
group B) 10 patients (MUT1-10; 28 arrays) carrying the TNF rs1800629 A variant
and resulted in an imputed, normalized and logged dataset 2.
6) Statistical analysis of microarrays:
In dataset 2, for each gene, the mean value of all technical replicates of a patient was calculated
in dChip (10). To identify differentially regulated genes between group B (TNF rs1800629 A
variant) and group A (TNF wild type), the dataset was subjected to a novel two-class rank
statistic (Rank Products, RP) as described below (11, 12). For each gene, a false discovery rate
(FDR) < 0.25 was defined as the significance level.
7) Annotation of genes:
Significantly regulated genes were annotated using the web based annotation tools SOURCE
(13) and the Database for Annotation, Visualization and Integrated Discovery (DAVID) (14)
version 2.0 as described in the manuals.
8) Enriched functional categories:
Enriched functional categories within the differentially regulated genes were determined using
DAVID (14) version 2.0. DAVID is a platform that provides statistical methods (reported as an
Enrichment Score) to facilitate the biological interpretation of gene lists deriving from
microarray analysis. Enriched genes describe a class of genes that have similar functions
regardless of their expression level, and appear more often in a list of interest than would
normally be predicted by their distribution among all genes assayed. An Enrichment Score is
calculated for likelihood of enrichment of biological processes, molecular functions and cellular
component categories using the Gene Ontology public database.
9) Cluster analysis:
Hierarchical cluster analysis of the top 100 significant over- and under-expressed genes was
performed using the centroid linkage method and the distance matrix 1 − r in dChip (10).
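Step 4 of the workflow above (quantile normalization followed by a base-2 logarithm) can be illustrated with a minimal pure-Python sketch. The study performed this step in R (8); the code below is not that implementation, breaks ties by position rather than averaging them, and uses hypothetical intensities.

```python
import math

def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so that all arrays end up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n_genes = len(arrays[0])
    if any(len(array) != n_genes for array in arrays):
        raise ValueError("all arrays must contain the same number of genes")
    sorted_arrays = [sorted(array) for array in arrays]
    rank_means = [
        sum(sorted_array[rank] for sorted_array in sorted_arrays) / len(arrays)
        for rank in range(n_genes)
    ]
    normalized = []
    for array in arrays:
        order = sorted(range(n_genes), key=lambda gene: array[gene])
        result = [0.0] * n_genes
        for rank, gene in enumerate(order):
            result[gene] = rank_means[rank]
        normalized.append(result)
    return normalized

def log2_transform(arrays):
    """Take log2 of every intensity (requires strictly positive values)."""
    return [[math.log2(value) for value in array] for array in arrays]

# Hypothetical intensities: two arrays, three genes each.
raw = [[2.0, 4.0, 8.0], [16.0, 1.0, 4.0]]
normalized = quantile_normalize(raw)
logged = log2_transform(normalized)
print(normalized)
print(logged)
```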
Rank products
The Rank Products method (11, 12) was used for identifying differentially expressed genes in the
expression data. The method is based on the premise that a gene in an experiment examining n
genes in k replicates has a probability of $1/n^k$ of being ranked first (rank 1) in every replicate if the lists were entirely
random. Therefore, it is unlikely for a single gene to be in the top position in all replicates if this
gene was not differentially expressed, i.e., if all null hypotheses were true. More generally, for each
gene g in k replicates i, each examining $n_i$ genes, one can calculate the corresponding combined
probability as a rank product $RP_g^{up} = \prod_{i=1}^{k} (r_{i,g}^{up} / n_i)$, where $r_{i,g}^{up}$ is the position of gene g in the list of
genes in the ith replicate sorted by decreasing fold change (FC), i.e. $r^{up} = 1$ for the most strongly
upregulated gene, etc. The genes can then be sorted according to the likelihood of observing their
RP value at or above a certain position on the list. Analogously, $RP_g^{down}$ is calculated from the list of
genes sorted by increasing FC, i.e. $r^{down} = 1$ for the most strongly downregulated gene.
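As an illustration of the rank product defined above, the following sketch ranks genes within each replicate by decreasing fold change and multiplies the rank fractions across replicates. The fold changes are hypothetical, ties are broken by position, and how replicates are formed in a two-class comparison is not covered here.

```python
def ranks_descending(values):
    """Return 1-based ranks, with rank 1 assigned to the largest value."""
    order = sorted(range(len(values)), key=lambda gene: values[gene], reverse=True)
    ranks = [0] * len(values)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks

def rank_products_up(fold_changes):
    """Rank product per gene: product over replicates of (rank / n_i).

    fold_changes: one list per replicate, each holding one fold change per
    gene (same gene order in every replicate).
    """
    n_genes = len(fold_changes[0])
    products = [1.0] * n_genes
    for replicate in fold_changes:
        ranks = ranks_descending(replicate)
        n_i = len(replicate)
        for gene in range(n_genes):
            products[gene] *= ranks[gene] / n_i
    return products

def rank_products_down(fold_changes):
    """Same statistic for downregulation: rank by increasing fold change."""
    negated = [[-value for value in replicate] for replicate in fold_changes]
    return rank_products_up(negated)

# Hypothetical log fold changes: 2 replicates x 4 genes.
fold_changes = [
    [2.0, 0.5, -1.0, 0.1],
    [1.5, 0.2, -0.5, 0.4],
]
rp_up = rank_products_up(fold_changes)
rp_down = rank_products_down(fold_changes)
print(rp_up)
print(rp_down)
```

In this toy example the first gene is ranked first in both replicates, so its upregulation rank product is (1/4)(1/4), the smallest attainable value for n = 4 and k = 2.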
To assess how significant the changes are and how many of the selected genes are likely to be truly
differentially expressed, a simple permutation-based estimation procedure provides a
convenient way to determine how likely it is to observe a given RP value in a random experiment by
converting the RP value to an E value, in analogy to BLAST results (15). The RP value
distribution can be approximated in each case by calculating the RP values for a number z of random
“experiments” with the same number of replicates and “genes” as the real experiment. Each random
experiment consists of k random permutations of the numbers 1,…,n, and for these the RP values are
calculated as described above. The number of simulated RP values in the random experiments that
are smaller than or equal to a given experimental RP value, x(RP), is then used to calculate the
average expected value $E(RP) \approx x(RP)/z$.
Subsequently, for each gene g a conservative estimate of the percentage of false-positives (PFP) is
calculated: $q_g = E(RP_g)/rank(g)$. Here, rank(g) denotes the position of gene g in a list of all genes
sorted by increasing RP value, i.e., it is the number of genes accepted as significantly regulated. This
estimates the FDR [Storey 2003] and provides a flexible way to assign a significance level to each
gene. The FDR is accepted as a reasonable significance threshold in microarray studies (16). One
can now decide how large a PFP would be acceptable and extend the list of accepted genes up to the
gene with this $q_g$ value. The rank product method was chosen since it has been shown to outperform
classical t-statistics and moderated t-statistics when datasets have low numbers of samples or high
levels of noise (11, 17).
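The permutation-based E value and the PFP described above can be sketched as follows. To avoid floating-point ties, the sketch compares integer products of ranks; with the same n in every replicate, dividing by n^k does not change the ordering or the counts. The observed values, the number of random experiments and the seed are hypothetical.

```python
import bisect
import random

def simulate_rank_products(n_genes, k_replicates, z_experiments, rng):
    """Rank products (as integer products of ranks) from z random experiments,
    each built from k random permutations of the numbers 1..n."""
    simulated = []
    for _ in range(z_experiments):
        products = [1] * n_genes
        for _ in range(k_replicates):
            permutation = list(range(1, n_genes + 1))
            rng.shuffle(permutation)
            for gene in range(n_genes):
                products[gene] *= permutation[gene]
        simulated.extend(products)
    simulated.sort()
    return simulated

def expected_values(observed, simulated_sorted, z_experiments):
    """E(RP): number of simulated values <= observed, averaged over z experiments."""
    return [
        bisect.bisect_right(simulated_sorted, value) / z_experiments
        for value in observed
    ]

def percentage_false_positives(observed, e_values):
    """q_g = E(RP_g) / rank(g), with rank(g) taken from increasing RP order.
    Tied RP values receive consecutive ranks in input order (a simplification)."""
    order = sorted(range(len(observed)), key=lambda gene: observed[gene])
    q_values = [0.0] * len(observed)
    for position, gene in enumerate(order, start=1):
        q_values[gene] = e_values[gene] / position
    return q_values

# Hypothetical observed products of ranks for 4 genes in 2 replicates.
n_genes, k_replicates, z_experiments = 4, 2, 1000
observed = [1, 6, 16, 6]
rng = random.Random(1)
simulated = simulate_rank_products(n_genes, k_replicates, z_experiments, rng)
e_vals = expected_values(observed, simulated, z_experiments)
q_vals = percentage_false_positives(observed, e_vals)
print(e_vals)
print(q_vals)
```

A gene list would then be extended down the ranking until q_g exceeds the chosen threshold, which in this study was an FDR of 0.25.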
To investigate the difference in the peripheral blood transcriptome from patients with and without
the TNF rs1800629 A variant, bootstrap sampling was performed with 10,000 replicates. From this,
the 95% CI for mean differences in expression levels as well as robust two-sided p values were
estimated.
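A bootstrap of the mean difference between the two groups could look like the sketch below. The source gives only the number of replicates, so the percentile interval, the way the two-sided p value is formed, the seed, and the expression values are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_mean_difference(group_b, group_a, n_replicates=10_000, seed=0):
    """Percentile bootstrap for mean(group_b) - mean(group_a).

    Returns (lower, upper, p) where lower/upper bound a 95% percentile
    interval and p is twice the smaller tail proportion of resampled
    differences on either side of zero (an assumed definition).
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_replicates):
        resample_b = rng.choices(group_b, k=len(group_b))  # sample with replacement
        resample_a = rng.choices(group_a, k=len(group_a))
        differences.append(statistics.fmean(resample_b) - statistics.fmean(resample_a))
    differences.sort()
    lower = differences[int(0.025 * n_replicates)]
    upper = differences[int(0.975 * n_replicates) - 1]
    share_below = sum(d <= 0 for d in differences) / n_replicates
    share_above = sum(d >= 0 for d in differences) / n_replicates
    p_value = min(1.0, 2 * min(share_below, share_above))
    return lower, upper, p_value

# Hypothetical log2 expression values for one gene (not study data).
carriers = [9.1, 9.4, 8.8, 9.6, 9.0]      # group B
non_carriers = [8.0, 8.3, 7.9, 8.1, 8.4]  # group A
ci_lower, ci_upper, p_boot = bootstrap_mean_difference(carriers, non_carriers)
print(f"95% CI ({ci_lower:.3f}; {ci_upper:.3f}), p = {p_boot}")
```

A p value of exactly 0 from such a procedure only means that no resampled difference crossed zero, so it would normally be reported as smaller than 1 divided by the number of replicates.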
TaqMan Real-time Reverse Transcription-polymerase Chain Reaction
To validate the microarray data, TaqMan quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) was performed for 10 selected human genes (CASP8, ILR1,
ILR2, TNFRSF1A, SOCS3, IL18R1, CEBPD, TLR2, PRV1, TLR4).
Pre-optimized TaqMan primer/probe sets (Quantitect Primer Assays) of selected genes were
obtained from the Gene Globe Portal (Qiagen, Hilden, Germany). TaqMan probes were labeled with
6-carboxy-fluorescein (FAM) as a reporter dye and 6-carboxy-tetramethyl-rhodamine as a quencher
dye. Peptidylprolyl isomerase A (Quantitect Primer Assays) was simultaneously detected as an
internal control to normalize all the data. Prior to sample measurements, all primer pairs were
validated using a control total RNA pool derived from PAXgene samples. Standard curves of
gradual RNA dilutions were designed by plotting Ct values against the log-transformed input total
RNA (in ng). Amplification efficiencies for the target genes and the internal control were calculated
as $E = 10^{-1/S} - 1$, where S is the slope of the standard curve. The amplification efficiencies are given
in Table e1. For sample measurements, 400 ng PAXgene RNA of two groups of patients A and B
(group A: 7 patients without the TNF rs1800629 A variant (WT2, WT4-7, WT10-11); group B: 9
patients carrying the TNF rs1800629 A variant (MUT1-9)) were subjected to cDNA synthesis using
Superscript II Reverse Transcriptase (Invitrogen, Karlsruhe, Germany) following the manufacturer’s
protocol. Real-time RT-PCR was performed on the ABI PRISM® 7700 Sequence Detection System
(Applied Biosystems, Darmstadt, Germany) using the Quantitect SYBR Green PCR Kit (Qiagen)
with cDNA corresponding to 2 ng (0.5%) input total RNA. All reactions were run in duplicate. Ct
values of the tested genes were determined and compared with the respective standard curve. The
antilogarithm of the value at the intersection point with the standard curve corresponded with the
amount of human total RNA of the expressed target gene. The normalized expression of a target
gene Eg was given as the ratio between the total RNA amount of the target gene and the internal
control (peptidylprolyl isomerase A). Both normalized microarray intensities and RT-PCR gene
expression levels relative to the internal control of patients with the TNF rs1800629 A variant (group B)
were log2-transformed and expressed as log2 differences from patients without the TNF rs1800629 A
variant (group A).
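The standard-curve calculations described above can be sketched as follows: a least-squares line through Ct against log10 input RNA gives the slope S, the efficiency follows as 10^(-1/S) - 1, and a sample Ct is converted back to an RNA amount by inverting the line. The dilution series and Ct values are hypothetical, and ordinary least squares is an assumption about how the curve was fitted.

```python
import math

def fit_standard_curve(log10_input_ng, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(input) + intercept."""
    n = len(log10_input_ng)
    mean_x = sum(log10_input_ng) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input_ng)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input_ng, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_efficiency(slope):
    """E = 10^(-1/S) - 1; a slope near -3.32 corresponds to E near 1 (doubling per cycle)."""
    return 10 ** (-1 / slope) - 1

def rna_amount_from_ct(ct, slope, intercept):
    """Invert the standard curve: antilog of the log10 input at the measured Ct."""
    return 10 ** ((ct - intercept) / slope)

def normalized_expression(target_amount, control_amount):
    """Ratio of target gene amount to internal control amount."""
    return target_amount / control_amount

# Hypothetical dilution series: 100, 10, 1 and 0.1 ng input total RNA.
log10_input = [math.log10(ng) for ng in (100, 10, 1, 0.1)]
ct = [20.0, 23.32, 26.64, 29.96]

slope, intercept = fit_standard_curve(log10_input, ct)
efficiency = amplification_efficiency(slope)
amount = rna_amount_from_ct(23.32, slope, intercept)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.3f}, amount = {amount:.2f} ng")
```

The normalized expression of a target gene is then the ratio of its back-calculated amount to that of the internal control, each read from its own standard curve.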
RESULTS
Multivariate analysis of outcome
Multivariate analysis controlling for the variables TNF-α plasma concentration on day one, sepsis
syndrome, age, sex, ISS, and body weight revealed significant confounding by TNF-α plasma
concentration on day 1 after inclusion and sepsis syndrome. Specifically, the risk of succumbing
was higher when TNF-α plasma concentrations were high on day one and in patients with sepsis
syndrome. When these factors were considered by multivariate analysis, the association of
rs1800629 A allele and rs909253 G allele with outcome in severely injured patients remained stable.
Bootstrap estimates of the parameter values in the multivariate setting yielded mean values (95%
CI) of 0. (-0.; 1.) for rs909253 and 1.118 (-0.070; 2.331) for rs1800629, respectively.
Validation of microarray results
The validity of the microarray results was determined by using a TaqMan assay of 10 selected
genes from the microarray with a broad range of expression values. The overall correspondence
between gene expression levels by microarrays and by TaqMan was good, as indicated by a correlation
coefficient of 0.88 (Figure e3). Gene-to-gene variation exists and may be attributable to sequence-specific factors, e.g. the labelled cRNA may hybridize to a microarray element for a given gene that is
a few hundred base pairs from the corresponding TaqMan primers and probes. Nevertheless, our
results support the accuracy with which the CodeLink microarray represents gene expression.
REFERENCES
1. Baker SP, O'Neill B, Haddon W, Jr., et al: The injury severity score: A method for
describing patients with multiple injuries and evaluating emergency care. J Trauma 1974;
14:187-196
2. Bone RC, Balk RA, Cerra FB, et al: Definitions for sepsis and organ failure and guidelines
for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference
Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest
1992; 101:1644-1655
3. Sibbald WJ, Vincent JL: Round table conference on clinical trials for the treatment of sepsis.
Brussels, March 12-14, 1994. Intensive Care Med 1995; 21:184-189
4. Schaid DJ, Rowland CM, Tines DE, et al: Score tests for association between traits and
haplotypes when linkage phase is ambiguous. Am J Hum Genet 2002; 70:425-434
5. Lake SL, Lyon H, Tantisira K, et al: Estimation and tests of haplotype-environment
interaction when linkage phase is ambiguous. Hum Hered 2003; 55:56-65
6. Royston P: A strategy for modelling the effect of a continuous covariate in medicine and
epidemiology. Stat Med 2000; 19:1831-1847
7. Kim KY, Kim BJ, Yi GS: Reuse of imputed data in microarray analysis increases imputation
efficiency. BMC Bioinformatics 2004; 5:160
8. Bolstad BM, Irizarry RA, Astrand M, et al: A comparison of normalization methods for high
density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19:185-193
9. Gwadry FG, Sequeira A, Hoke G, et al: Molecular characterization of suicide by microarray
analysis. Am J Med Genet C Semin Med Genet 2005; 133:48-56
10. Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index
computation and outlier detection. Proc Natl Acad Sci U S A 2001; 98:31-36
11. Breitling R, Armengaud P, Amtmann A, et al: Rank products: A simple, yet powerful, new
method to detect differentially regulated genes in replicated microarray experiments. FEBS
Lett 2004; 573:83-92
12. Breitling R, Herzyk P: Rank-based methods as a non-parametric alternative of the T-statistic
for the analysis of biological microarray data. J Bioinform Comput Biol 2005; 3:1171-1189
13. Diehn M, Sherlock G, Binkley G, et al: SOURCE: a unified genomic resource of functional
annotations, ontologies, and gene expression data. Nucleic Acids Res 2003; 31:219-223
14. Dennis G, Jr., Sherman BT, Hosack DA, et al: DAVID: Database for Annotation,
Visualization, and Integrated Discovery. Genome Biol 2003; 4:3
15. Altschul SF, Gish W, Miller W, et al: Basic local alignment search tool. J Mol Biol 1990;
215:403-410
16. Storey JD, Tibshirani R: Statistical methods for identifying differentially expressed genes in
DNA microarrays. Methods Mol Biol 2003; 224:149-157
17. Jeffery IB, Higgins DG, Culhane AC: Comparison and evaluation of methods for generating
differentially expressed gene lists from microarray data. BMC Bioinformatics 2006; 7:359