METHODS

Patients and Definitions

The clinical criteria for inclusion were: severe injuries of at least two body regions or three major fractures; age of at least 18 years; an estimated Injury Severity Score (ISS) (1) of 5 or more at baseline in the emergency room and of 12 points or more after complete assessment and diagnosis of injuries at admission to the intensive care unit (ICU); less than 12 hours between accident and admission to the ICU; and survival of more than 3 days. None of the patients underwent neurosurgery or cardiac surgery. The exclusion criteria were: severe intracranial injuries, coagulation abnormalities known on the day of admission to the ICU (e.g., coagulation factor deficiency <40%, prothrombin time <40%, partial thromboplastin time >120 s, antithrombin III <40%, INR <2.0, platelet count <50,000/µL), acute renal failure (serum creatinine >3.0 mg/dL; serum urea >250 mg/dL; urine output <20 mL/h despite intensive diuretic therapy with furosemide), liver failure, pregnancy, malignant disease, and a history of hemofiltration.
With reference to the criteria originally proposed by the American College of Chest Physicians and the Society of Critical Care Medicine (2), sepsis was assumed if all of the following criteria were met within 24 hours of each other: (1) definitive clinical evidence supporting a presumptive diagnosis of sepsis; (2) hyperthermia (core temperature ≥38.5°C) or hypothermia (core temperature ≤35.6°C); (3) tachycardia, defined as a heart rate of ≥90 beats/min in patients not receiving a β-adrenergic receptor blocker; (4) need for mechanical ventilation; (5) hypotension, defined as a systolic blood pressure of ≤90 mm Hg (or a sustained drop in systolic blood pressure of >40 mm Hg in the presence of an adequate fluid challenge), or evidence of systemic toxicity or poor end-organ perfusion, defined by two or more of the following: (a) metabolic acidosis (arterial blood pH ≤7.3 or base deficit ≥5 mmol/L); (b) arterial hypoxia (PaO2/FIO2 ≤250); (c) acute renal failure (urine output <0.5 mL/kg/hr for at least 1 hr); (d) coagulation abnormality (prothrombin time ≥1.5 × control or partial thromboplastin time ≥1.5 × control); (e) unexplained decrease in the platelet count (≤100,000/µL or a decrease of at least 50% from baseline); (f) cardiac index <4.0 L/min/m2 with systemic vascular resistance <800 dyn·s/cm5. Septic shock was defined as severe sepsis with persistent hypotension (systolic blood pressure <90 mm Hg or a sustained decrease in systolic blood pressure of >40 mm Hg for at least 60 min) despite adequate volume loading, or the need for vasopressor drugs. Documented infection was defined as positive bacterial cultures from normally sterile body fluids or compartments together with clinical signs of infection. Homologous packed red blood cells were transfused to maintain a hemoglobin level >9 g/dL.
Plasma was given only to keep coagulation parameters above the thresholds for coagulation abnormalities listed above. Supportive therapy (antibiotics, parenteral nutrition, volume therapy, FIO2 adapted to the patient's condition, and catecholamine and other pharmacological support) followed our ICU's protocol of standard management principles according to Sibbald and Vincent (3) and was determined by the intensivist on duty, who was blinded to the aim of the study.

Data management

Patient data were documented prospectively and electronically in a Patient Data Management System (PDMS, ICU-Data, Imeso, Germany). Study data were documented in the Giessen Research Center in Infectious Diseases database, which is based on an Oracle™ platform.

Microbiological screening

Daily microbiological screening included swabs from wounds and skin. Central venous and urinary catheters were examined when removed from the patient. Bronchoalveolar samples were taken whenever the patient underwent bronchoscopy and whenever suction material looked suspicious for infection. Blood cultures were drawn by separate venipuncture when the treating intensivist identified early signs of infection, such as fever. All bacteriological results were recorded in the patient's record with the time point at which the sample was drawn.

Multivariate analyses

In the first step, we investigated the impact of carrying at least one mutation in a tumor necrosis factor (TNF) polymorphism on the TNF-α plasma level on day one. Univariate analyses used Mann-Whitney tests. To control for possible confounding variables, logistic regression analyses were performed on the dichotomized TNF-α plasma level, predicting high (≥36.5) vs. low (<36.5) TNF-α plasma level from age, weight, and trauma severity as continuous covariates and gender as a categorical covariate.
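The univariate comparison of day-one TNF-α plasma levels between carriers and non-carriers can be illustrated with a minimal pure-Python sketch of the Mann-Whitney U test. The function name `mann_whitney_u` is hypothetical, and the choice of mid-ranks for ties with a tie-corrected normal approximation (no continuity correction) is an assumption; the text does not state which variant or software was used, and exact p values are preferable for very small groups.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (illustrative sketch).

    Ties receive mid-ranks; the p value uses a tie-corrected normal
    approximation without continuity correction.
    Returns (U statistic for sample x, two-sided p value).
    """
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Toy values only: every value in x lies below every value in y, so U = 0.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Swapping the two samples gives U = n1·n2 − U with the same two-sided p value, which is a quick consistency check on any implementation.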
To determine whether an association with TNF polymorphisms depends on the previously identified determinants of plasma level, we added the polymorphisms to the logistic regression models thus obtained, thereby controlling for these risk factors. Odds ratios (ORs) with 95% confidence intervals (CIs) were estimated. In addition, the effect of the estimated TNF haplotypes on the plasma level on day one was analyzed, assuming an ordinally scaled phenotype, using a score test with simulated p-values from 10^6 replications (4). In the second step, the effect of carrying at least one mutation in a TNF polymorphism on the development of sepsis was investigated. For univariate analyses, Fisher's exact test was performed and ORs with 95% CIs were estimated. Again, to control for other factors, logistic regression analyses were performed predicting the development of sepsis from age, weight, trauma severity, and TNF-α plasma level on day one as continuous covariates and gender as a categorical covariate. An independent association of TNF polymorphisms with sepsis was then analyzed by adding the polymorphisms to the regression model and estimating adjusted ORs with 95% CIs, thus controlling for the identified factors. To include possible covariates in the analyses of estimated TNF haplotypes with regard to the development of sepsis, generalized linear models allowing for ambiguous haplotypes were used (5). Finally, we used the same procedure to analyze the impact of carrying at least one mutation in a TNF polymorphism on mortality. Fisher's exact test and ORs with 95% CIs were used for univariate analyses. For multivariate analyses, logistic regression analyses were performed to predict mortality from age, weight, trauma severity, and TNF-α plasma level on day one as continuous covariates, with gender and sepsis as categorical covariates.
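The univariate step for sepsis and mortality (Fisher's exact test plus an OR with 95% CI from a 2 × 2 table of carriers vs. non-carriers) can be sketched in pure Python as below. The function names are hypothetical, and the Woolf (log-scale) confidence interval is an assumption; the text does not say which CI method was used.

```python
import math


def _table_probability(a, row1, col1, n):
    """Hypergeometric probability of top-left cell a, given fixed margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = _table_probability(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_x = _table_probability(x, row1, col1, n)
        if p_x <= p_observed * (1 + 1e-7):  # tolerance for floating-point ties
            p += p_x
    return min(1.0, p)


def odds_ratio_woolf_ci(a, b, c, d, z=1.959964):
    """Sample odds ratio ad/bc with a Woolf (log-scale) 95% CI.

    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Toy counts. Rows: carriers / non-carriers; columns: sepsis / no sepsis.
p = fisher_exact_two_sided(8, 2, 1, 5)          # 280/8008, about 0.035
or_, lo, hi = odds_ratio_woolf_ci(8, 2, 1, 5)   # OR = 20.0
```

The adjusted ORs from the logistic regression models are a separate step and are not reproduced here.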
To determine an association of TNF polymorphisms with mortality that is independent of previously identified factors, we added the TNF polymorphisms to the logistic regression models previously obtained and estimated adjusted ORs with 95% CIs. Similarly, the prediction of outcome by estimated TNF haplotypes was analyzed using generalized linear regression models including possible covariates. In developing all of the above regression models, the association between continuous covariates and the respective dependent variable was modelled using fractional polynomials (6). Because the number of events was 72 for sepsis and 32 for mortality, giving about 15 events per variable for sepsis and fewer than 6 for mortality, only risk factors with a univariate p < .01 were considered for the multivariate models. All possible two-way interactions were analyzed, and backward selection was conducted, eliminating covariates with p > .01. For internal validation of the final multivariate prediction models for sepsis and mortality, bootstrap sampling was performed with 20,000 replicates, from which the mean parameter estimates and 95% CIs were determined.

Gene expression profiling

A total of 28 patients were included in the gene expression study. The PAXgene Blood RNA System (PreAnalytiX, Heidelberg, Germany) was used to collect whole blood samples and to isolate RNA according to the manufacturer's recommendations. Total RNA was quantified with a NanoDrop spectrophotometer (NanoDrop Technologies, Rockland, DE, USA), and RNA quality was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies GmbH, Boeblingen, Germany).
When the total RNA yield was >2 µg, the A260/A280 ratio was >1.9, and the electrophoretic profile showed clear, sharp ribosomal peaks, the RNA was subjected to cRNA synthesis, cRNA fragmentation, and finally hybridization on CodeLink UniSet Human 10K Bioarrays (GE Healthcare, Freiburg, Germany) using the CodeLink Expression Assay Kit (GE Healthcare) according to the manufacturer's instructions. Each patient sample was hybridized on at least two bioarrays (technical replicates). Bioarrays were stained with Cy5™-streptavidin (GE Healthcare) and scanned using the GenePix® 4000B scanner and GenePix Pro 4.0 software (Axon Instruments, Arlington, TX, USA). A total of 75 array images were analyzed. Spot signals were quantified using CodeLink System Software, consisting of Batch Submission (V2.2.27) and Expression Analysis (V2.2.25) (GE Healthcare), as outlined in the user's manual. CodeLink Expression Software 1.21 generated background-corrected raw data as well as median-centered intra-slide normalized data; the intra-slide normalized data were used for further analysis. The software automatically calculated an intensity threshold for each array and flagged a gene as TRUE when its intensity was above the threshold and as FALSE when it was below. The present call of a microarray was defined as the number of genes flagged TRUE divided by the total number of genes on the array. The microarrays analyzed showed a mean present call of 81%, indicating that most genes were above threshold. In addition, the software flagged each gene value as GOOD, EMPTY, POOR, NEG, or MSR, denoting different quality measures as outlined in the user's manual.
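The RNA acceptance rule and the present-call definition above are simple enough to state as code. This is a minimal sketch with hypothetical function names; in the study these values were produced by the CodeLink software, and the visual judgement of the electrophoretic profile is represented here only as a boolean input.

```python
def present_call(flags):
    """Present call of one array: genes flagged TRUE / all genes on the array."""
    return sum(1 for flag in flags if flag) / len(flags)


def mean_present_call(arrays):
    """Mean present call over several arrays (each a list of TRUE/FALSE flags)."""
    return sum(present_call(flags) for flags in arrays) / len(arrays)


def passes_rna_qc(total_rna_ug, a260_a280, sharp_ribosomal_peaks):
    """RNA acceptance rule from the text: yield > 2 µg, A260/A280 > 1.9, and a
    clean electrophoretic profile (judged visually, passed in as a bool)."""
    return total_rna_ug > 2.0 and a260_a280 > 1.9 and sharp_ribosomal_peaks


# Toy flags for two arrays with four genes each.
arrays = [[True, True, True, False], [True, False, True, True]]
print(mean_present_call(arrays))  # 0.75
```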
Only gene values flagged as GOOD or EMPTY were used in the following analysis workflow:
1) Defining patient groups: Patients and the corresponding arrays were separated into two groups (dataset 1): group A, 16 patients (WT1-16; 42 arrays) without the TNF rs1800629 A variant; group B, 12 patients (MUT1-12; 33 arrays) carrying the TNF rs1800629 A variant.
2) Removal of genes with many missing values or FALSE flags: Genes with missing values in ≥50% of the arrays in a group were excluded from the dataset. Genes flagged as FALSE in >50% of the arrays in each group were also excluded.
3) Imputation of remaining missing values: Remaining missing values were imputed using sequential K-nearest neighbour (SKNN) imputation (7) with k = 5.
4) Normalization of the imputed dataset: The imputed dataset was quantile normalized in R (8) and log2 transformed.
5) Array outlier detection: Dissimilarity matrices of the normalized dataset were generated in AVADIS-Pride (9) to identify outlier arrays (Figure e1). Arrays of patients WT13-16 (12 arrays) and of patients MUT11 and MUT12 (5 arrays) were identified as outliers and removed from dataset 1 as defined in step 1. Steps 1-4 were repeated with the reduced dataset, consisting of group A, 12 patients (WT1-12; 30 arrays) without the TNF rs1800629 A variant, and group B, 10 patients (MUT1-10; 28 arrays) carrying the TNF rs1800629 A variant, resulting in an imputed, normalized, and log-transformed dataset 2.
6) Statistical analysis of microarrays: In dataset 2, the mean of all technical replicates of a patient was calculated for each gene in dChip (10). To identify genes differentially regulated between group B (TNF rs1800629 A variant) and group A (TNF wild type), the dataset was analyzed with a two-class rank statistic (Rank Products, RP) as described below (11, 12).
For each gene, a false discovery rate (FDR) < 0.25 was defined as the significance level.
7) Annotation of genes: Significantly regulated genes were annotated using the web-based annotation tools SOURCE (13) and the Database for Annotation, Visualization and Integrated Discovery (DAVID) (14), version 2.0, as described in the respective manuals.
8) Enriched functional categories: Enriched functional categories among the differentially regulated genes were determined using DAVID (14) version 2.0. DAVID provides statistical methods (reported as an Enrichment Score) to facilitate the biological interpretation of gene lists derived from microarray analysis. An enriched category is a class of genes with similar functions, regardless of their expression level, that appears more often in a list of interest than would be expected from its distribution among all genes assayed. An Enrichment Score is calculated for the likelihood of enrichment of biological process, molecular function, and cellular component categories using the public Gene Ontology database.
9) Cluster analysis: Hierarchical cluster analysis of the top 100 significantly over- and under-expressed genes was performed in dChip (10) using centroid linkage and the distance 1 − r, where r is the correlation coefficient.

Rank products

The Rank Products method (11, 12) was used to identify differentially expressed genes in the expression data. The method is based on the premise that, in an experiment examining n genes in k replicates, a given gene has a probability of $1/n^k$ of being ranked first (rank 1) in all replicates if the lists were entirely random. Therefore, it is unlikely that a gene that is not differentially expressed would be in the top position in all replicates, i.e., if all null hypotheses were true.
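Returning briefly to step 4 of the preprocessing workflow listed above, quantile normalization followed by log2 transformation can be illustrated with a minimal pure-Python sketch. It shows the idea described by Bolstad et al. (8), not the R code used in the study; the function names are hypothetical, missing values are assumed to have been imputed already (step 3), and ties are broken by position rather than averaged.

```python
import math


def quantile_normalize(columns):
    """columns[j][g]: intensity of gene g on array j (no missing values).

    Each array's i-th smallest value is replaced by the mean of the i-th
    smallest values across all arrays, so every array ends up with the same
    empirical distribution. Ties are broken by position (a simplification).
    """
    n_genes = len(columns[0])
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    reference = [
        sum(col[order[i]] for col, order in zip(columns, orders)) / len(columns)
        for i in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for i, g in enumerate(order):
            out[g] = reference[i]
        normalized.append(out)
    return normalized


def log2_transform(columns):
    """Log2-transform every value (intensities must be positive)."""
    return [[math.log2(v) for v in col] for col in columns]


# Two toy arrays with three genes each.
print(quantile_normalize([[2.0, 4.0, 8.0], [4.0, 16.0, 8.0]]))
# [[3.0, 6.0, 12.0], [3.0, 12.0, 6.0]]
```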
More generally, for each gene g in k replicates i, each examining $n_i$ genes, one can calculate the corresponding combined probability as a rank product

$$RP_g^{up} = \prod_{i=1}^{k} \frac{r_{i,g}^{up}}{n_i},$$

where $r_{i,g}^{up}$ is the position of gene g in the list of genes in the ith replicate sorted by decreasing fold change (FC), i.e., $r^{up} = 1$ for the most strongly upregulated gene. The genes can then be sorted according to the likelihood of observing their RP value at or above a certain position on the list. Analogously, $RP_g^{down}$ is calculated from the list of genes sorted by increasing FC, i.e., $r^{down} = 1$ for the most strongly downregulated gene. To assess how significant the changes are and how many of the selected genes are likely to be truly differentially expressed, a simple permutation-based procedure determines how likely it is to observe a given RP value in a random experiment by converting the RP value into an E value, in analogy to BLAST results (15). The RP value distribution is approximated by calculating the RP values for z random "experiments" with the same numbers of replicates and "genes" as the real experiment. Each random experiment consists of k random permutations of the numbers 1, …, n, for which the RP values are calculated as described above. The number of simulated RP values that are smaller than or equal to a given experimental RP value, $x(RP)$, is then used to calculate the average expected value $E(RP) \approx x(RP)/z$. Subsequently, for each gene g a conservative estimate of the percentage of false positives (PFP) is calculated as $q_g = E(RP_g)/\mathrm{rank}(g)$, where rank(g) is the position of gene g in a list of all genes sorted by increasing RP value, i.e., the number of genes accepted as significantly regulated. This estimates the FDR (16) and provides a flexible way to assign a significance level to each gene.
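The rank product and its permutation-based PFP can be followed step by step in a minimal pure-Python sketch of the formulas above. Function names are hypothetical; the sketch assumes every replicate list contains the same n genes, uses z = 200 random experiments only to keep the toy example fast, and breaks tied fold changes by position. $RP_g^{down}$ would be obtained the same way by ranking in increasing order (for example, by passing negated values).

```python
import bisect
import random


def ranks_decreasing(values):
    """1-based ranks, rank 1 for the largest value (most strongly up-regulated)."""
    order = sorted(range(len(values)), key=lambda g: values[g], reverse=True)
    ranks = [0] * len(values)
    for position, g in enumerate(order, start=1):
        ranks[g] = position
    return ranks


def rank_products_up(fold_changes):
    """RP_g^up = prod_i r_{i,g}^up / n_i, with fold_changes[i][g] for replicate i, gene g."""
    rp = [1.0] * len(fold_changes[0])
    for replicate in fold_changes:
        n_i = len(replicate)
        for g, r in enumerate(ranks_decreasing(replicate)):
            rp[g] *= r / n_i
    return rp


def pfp(rp, k, z=200, seed=0):
    """Permutation estimate q_g = E(RP_g) / rank(g), with E(RP) ~ x(RP) / z.

    Assumes all k replicates examine the same n genes.
    """
    n = len(rp)
    rng = random.Random(seed)
    simulated = []
    for _ in range(z):  # z random experiments
        random_rp = [1.0] * n
        for _ in range(k):  # each consists of k random permutations of 1..n
            perm = list(range(1, n + 1))
            rng.shuffle(perm)
            for g in range(n):
                random_rp[g] *= perm[g] / n
        simulated.extend(random_rp)
    simulated.sort()
    q = [0.0] * n
    for rank_g, g in enumerate(sorted(range(n), key=lambda h: rp[h]), start=1):
        x_rp = bisect.bisect_right(simulated, rp[g])  # simulated RPs <= observed RP
        q[g] = (x_rp / z) / rank_g
    return q


# Toy fold changes: 3 replicates x 5 genes; gene 0 ranks first in every replicate.
fold_changes = [
    [5.0, 1.0, 0.5, 2.0, 1.5],
    [4.0, 1.2, 0.8, 1.9, 1.1],
    [6.0, 0.9, 1.0, 2.1, 1.3],
]
rp = rank_products_up(fold_changes)  # rp[0] == (1/5)**3 == 0.008
q = pfp(rp, k=len(fold_changes))
```

Genes are then accepted in order of increasing RP up to the last gene whose q value is below the chosen PFP cut-off (0.25 in this study).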
The FDR is accepted as a reasonable significance criterion in microarray studies (16). One can then decide how large a PFP is acceptable and extend the list of accepted genes up to the gene with this $q_g$ value. The rank product method was chosen because it has been shown to outperform the classical t-statistic and moderated t-statistics when datasets have small numbers of samples or high levels of noise (11, 17). To investigate the difference in the peripheral blood transcriptome between patients with and without the TNF rs1800629 A variant, bootstrap sampling was performed with 10,000 replicates, from which 95% CIs for mean differences in expression levels and robust two-sided p values were estimated.

TaqMan Real-time Reverse Transcription-Polymerase Chain Reaction

To validate the microarray data, TaqMan quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) was performed for 10 selected human genes (CASP8, ILR1, ILR2, TNFRSF1A, SOCS3, IL18R1, CEBPD, TLR2, PRV1, TLR4). Pre-optimized TaqMan primer/probe sets (QuantiTect Primer Assays) for the selected genes were obtained from the GeneGlobe portal (Qiagen, Hilden, Germany). TaqMan probes were labeled with 6-carboxyfluorescein (FAM) as the reporter dye and 6-carboxytetramethylrhodamine as the quencher dye. Peptidylprolyl isomerase A (QuantiTect Primer Assays) was detected simultaneously as an internal control to normalize all data. Before sample measurements, all primer pairs were validated using a control total RNA pool derived from PAXgene samples. Standard curves were generated from serial RNA dilutions by plotting Ct values against the log-transformed input total RNA (in ng). Amplification efficiencies for the target genes and the internal control were calculated as $E = 10^{-1/S} - 1$, where S is the slope of the standard curve. The amplification efficiencies are given in Table e1.
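The standard-curve arithmetic (slope S from a dilution series, efficiency $E = 10^{-1/S} - 1$, and reading a sample's input amount back off the curve, then dividing by the internal control) can be sketched as below. This is a minimal pure-Python illustration with hypothetical function names and toy Ct values; the instrument software used in the study may fit the curve differently.

```python
def fit_standard_curve(log10_input_ng, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(input ng) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_input_ng) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input_ng)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input_ng, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """E = 10^(-1/S) - 1; a slope near -3.32 gives E near 1.0 (about 100%)."""
    return 10 ** (-1 / slope) - 1


def input_from_ct(ct, slope, intercept):
    """Read a Ct off the standard curve and take the antilog (ng of input RNA)."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_ng, reference_ng):
    """Target amount relative to the internal control (peptidylprolyl isomerase A)."""
    return target_ng / reference_ng


# Toy dilution series: 1, 10 and 100 ng input (log10 = 0, 1, 2).
slope, intercept = fit_standard_curve([0.0, 1.0, 2.0], [30.0, 26.68, 23.36])
efficiency = amplification_efficiency(slope)        # close to 1.0
amount_ng = input_from_ct(26.68, slope, intercept)  # close to 10 ng
```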
For sample measurements, 400 ng of PAXgene RNA from two groups of patients (group A: 7 patients without the TNF rs1800629 A variant (WT2, WT4-7, WT10-11); group B: 9 patients carrying the TNF rs1800629 A variant (MUT1-9)) was subjected to cDNA synthesis using SuperScript II Reverse Transcriptase (Invitrogen, Karlsruhe, Germany) following the manufacturer's protocol. Real-time RT-PCR was performed on the ABI PRISM® 7700 Sequence Detection System (Applied Biosystems, Darmstadt, Germany) using the QuantiTect SYBR Green PCR Kit (Qiagen) with cDNA corresponding to 2 ng (0.5%) of input total RNA. All reactions were run in duplicate. The Ct values of the tested genes were determined and compared with the respective standard curve; the antilogarithm of the log input value at which a sample's Ct intersected the standard curve gave the amount of human total RNA for the expressed target gene. The normalized expression of a target gene, Eg, was given as the ratio of the total RNA amount of the target gene to that of the internal control (peptidylprolyl isomerase A). Both normalized microarray intensities and RT-PCR expression levels relative to the internal control in patients with the TNF rs1800629 A variant (group B) were log2 transformed and expressed as log2 differences from patients without the TNF rs1800629 A variant (group A).

Results

Multivariate analysis of outcome

Multivariate analysis controlling for TNF-α plasma concentration on day one, sepsis syndrome, age, sex, ISS, and body weight revealed significant confounding by TNF-α plasma concentration on day one after inclusion and by sepsis syndrome. Specifically, the risk of death was higher in patients with high TNF-α plasma concentrations on day one and in patients with sepsis syndrome. When these factors were considered in the multivariate analysis, the associations of the rs1800629 A allele and the rs909253 G allele with outcome in severely injured patients remained stable.
Bootstrap estimates of the parameter values in the multivariate setting yielded mean values (95% CI) of 0. (-0.; 1.) for rs909253 and 1.118 (-0.070; 2.331) for rs1800629, respectively.

Validation of microarray results

The validity of the microarray results was assessed by TaqMan assays of 10 genes selected from the microarray across a broad range of expression values. Overall agreement between gene expression levels measured by microarray and by TaqMan was good, with a correlation coefficient of 0.88 (Figure e3). Gene-to-gene variation exists and may be attributable to sequence-specific factors; for example, the labelled cRNA may hybridize to a microarray element for a given gene that lies a few hundred base pairs from the corresponding TaqMan primers and probes. Nevertheless, our results support the accuracy with which the CodeLink microarray represents gene expression.

REFERENCES

1. Baker SP, O'Neill B, Haddon W, Jr., et al: The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care. J Trauma 1974; 14:187-196
2. Bone RC, Balk RA, Cerra FB, et al: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest 1992; 101:1644-1655
3. Sibbald WJ, Vincent JL: Round table conference on clinical trials for the treatment of sepsis. Brussels, March 12-14, 1994. Intensive Care Med 1995; 21:184-189
4. Schaid DJ, Rowland CM, Tines DE, et al: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 2002; 70:425-434
5. Lake SL, Lyon H, Tantisira K, et al: Estimation and tests of haplotype-environment interaction when linkage phase is ambiguous. Hum Hered 2003; 55:56-65
6. Royston P: A strategy for modelling the effect of a continuous covariate in medicine and epidemiology. Stat Med 2000; 19:1831-1847
7. Kim KY, Kim BJ, Yi GS: Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinformatics 2004; 5:160
8. Bolstad BM, Irizarry RA, Astrand M, et al: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19:185-193
9. Gwadry FG, Sequeira A, Hoke G, et al: Molecular characterization of suicide by microarray analysis. Am J Med Genet C Semin Med Genet 2005; 133:48-56
10. Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci U S A 2001; 98:31-36
11. Breitling R, Armengaud P, Amtmann A, et al: Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 2004; 573:83-92
12. Breitling R, Herzyk P: Rank-based methods as a non-parametric alternative of the T-statistic for the analysis of biological microarray data. J Bioinform Comput Biol 2005; 3:1171-1189
13. Diehn M, Sherlock G, Binkley G, et al: SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 2003; 31:219-223
14. Dennis G, Jr., Sherman BT, Hosack DA, et al: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003; 4:3
15. Altschul SF, Gish W, Miller W, et al: Basic local alignment search tool. J Mol Biol 1990; 215:403-410
16. Storey JD, Tibshirani R: Statistical methods for identifying differentially expressed genes in DNA microarrays. Methods Mol Biol 2003; 224:149-157
17. Jeffery IB, Higgins DG, Culhane AC: Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 2006; 7:359