Genomic responses in mouse models greatly mimic human inflammatory diseases

Keizo Takao(a,b) and Tsuyoshi Miyakawa(a,b,c,1)

(a) Section of Behavior Patterns, Center for Genetic Analysis of Behavior, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan; (b) Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan; and (c) Division of Systems Medical Science, Institute for Comprehensive Medical Science, Fujita Health University, Toyoake, Aichi 470-1192, Japan

The use of mice as animal models has long been considered essential in modern biomedical research, but the role of mouse models in research was challenged by a recent report that genomic responses in mouse models poorly mimic human inflammatory diseases. Here we reevaluated the same gene expression datasets used in the previous study by focusing on genes whose expression levels were significantly changed in both humans and mice. Contrary to the previous findings, the gene expression levels in the mouse models showed extraordinarily significant correlations with those of the human conditions (Spearman’s rank correlation coefficient: 0.43–0.68; genes changed in the same direction: 77–93%; P = 6.5 × 10−11 to 1.2 × 10−35). Moreover, meta-analysis of those datasets revealed a number of pathways/biogroups commonly regulated by multiple conditions in humans and mice. These findings demonstrate that gene expression patterns in mouse models closely recapitulate those in human inflammatory conditions and strongly argue for the utility of mice as animal models of human disorders.

transcriptome analysis | inflammation | sepsis | burn | trauma

The use of mice as animal models of human disorders has long been considered essential for elucidating the underlying mechanisms of disease, as well as for translational research from bench to bedside. This notion was seriously challenged by the recent report by Seok et al.
(1) that genomic responses to different acute inflammatory stressors are highly similar in humans but very poorly reproduced in the corresponding mouse models. Seok et al. (1) investigated gene expression changes in individuals with trauma, burns, and endotoxemia and compared them with those in mouse models of these conditions. Surprisingly, in their study there were few correlations between the gene expression changes in the human conditions and those in mouse models, whereas the gene expression patterns were similar between humans with trauma, burns, and endotoxemia. Based on their findings, Seok et al. (1) concluded that higher priority should be focused on studying complex human conditions directly rather than relying on mouse models to study human inflammatory diseases. The study has drawn much attention from researchers (2–10) and mass media (11–14) and has been cited more than 360 times since its publication only 18 months ago. Altmetric analysis of the article indicates that it is among the top 1% of all publications tracked by the system (as of January 20, 2014). Most of the reactions were an expression of general concern regarding the utility of mouse models for biomedical research. In the present study, we reevaluated the same gene expression datasets analyzed in Seok et al. (1) using more conventional statistical methods. There are a few critical differences in the analysis methods used between the previous study and ours. First, we focused on genes whose expression levels were significantly changed in both humans and mice. The previous study analyzed sets of genes that were significantly changed in the human conditions regardless of the significance of the changes in mouse models for comparison, which is not a conventional method of comparing two gene expression datasets. 
Assuming that mouse models would mimic only partial aspects of human disorders, inclusion of genes that showed no significant response to the stimulus would generally decrease the sensitivity to detect the responses shared by the disorders and their models. For this reason, we excluded such genes from our analysis. Second, we compared each of the conditions in a single mouse study independently with the human reference conditions. Mouse studies, such as GSE7404 and GSE19668, included multiple conditions or gene sets. For example, GSE19668 contains multiple datasets, including those for two different mouse strains and multiple timecourse data points after infection. Because humans and mice are expected to be quite different, the optimal conditions/parameters that most closely mimic human conditions should be rigorously searched and considered the best model when trying to establish any animal model of a human disorder. Therefore, among such multiple conditions in a mouse study we chose the gene set with the highest similarity to the human reference condition and used this set for further analyses. Third, we mainly used Spearman’s rank correlation coefficient (or Spearman’s ρ), instead of Pearson’s correlation coefficient (R) or correlation coefficient of determination (R2), because there is no reason to assume linearity and normal distribution of fold changes or log-twofold changes of gene expression levels. Fourth, we used a bioinformatics tool, NextBio (15), to conduct more sophisticated nonbiased statistical analyses of the similarity between gene sets. NextBio employs a normalized ranking approach, which enables comparability across data from different studies, platforms, and analysis methods by removing dependence on absolute values of fold change and minimizing some of the effects of the normalization methods used and platform effects.
We also used NextBio to conduct meta-analyses of the human and mouse gene sets to determine the pathways/biogroups that were commonly changed in the human and mouse gene sets.

Significance

The role of mouse models in biomedical research was recently challenged by a report that genomic responses in mouse models poorly mimic human inflammatory diseases. Here we reevaluated the same gene expression datasets used in the previous study by focusing on genes whose expression levels were significantly changed in both humans and mice. Contrary to the previous findings, the gene expression patterns in the mouse models showed extraordinarily significant correlations with those of the human conditions. Moreover, many pathways were commonly regulated by multiple conditions in humans and mice. These findings demonstrate that gene expression patterns in mouse models closely recapitulate those in human inflammatory conditions and strongly argue for the utility of mice as animal models of human disorders.

Author contributions: T.M. designed research; K.T. and T.M. performed research; K.T. and T.M. analyzed data; and K.T. and T.M. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
1 To whom correspondence should be addressed. Email: miyakawa@fujita-hu.ac.jp.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1401965111/-/DCSupplemental.
PNAS Early Edition | MEDICAL SCIENCES

Edited by Ruslan Medzhitov, Yale University School of Medicine, New Haven, CT, and approved June 11, 2014 (received for review January 31, 2014)

[Figure 1: matrix of pairwise scatterplots for Human Burn (GSE37069), Human Trauma (GSE36809), Human Sepsis (GSE28750), Mouse Burn (GSE7404), Mouse Trauma (GSE7404), Mouse Sepsis (GSE19668), and Mouse Infection (GSE20524). Both axes of every panel show fold change from −15 to 15, and every panel reports P < 0.0001. Values printed in the panels, row by row:
Human Burn row: ρ = 0.88, 0.82, 0.68, 0.43, 0.57, 0.51 (N = 13225, 9149, 13223, 9145, 13222, 13222)
Human Trauma row: ρ = 0.72, 0.63, 0.50, 0.54, 0.51 (N = 12566, 13212, 13210, 13215, 13219)
Human Sepsis row: ρ = 0.56, 0.48, 0.48, 0.52 (N = 13223, 13216, 13224, 13215)
Mouse Burn row: ρ = 0.47, 0.54, 0.57 (N = 13224, 13224, 13212)
Mouse Trauma row: ρ = 0.50, 0.51 (N = 13224, 13211)
Mouse Sepsis row: ρ = 0.57 (N = 13210)]

Fig. 1. Correlations of gene changes among human burns, trauma, sepsis, and the corresponding mouse models. Scatterplots and Spearman’s rank correlations (ρ) of the fold changes of the genes responsive to both conditions for each pair of interest (P < 0.05; fold change >1.2). The vertical and horizontal bars of each panel represent the fold changes in the right and upper panels, respectively. Murine models were highly significantly correlated with human conditions with Spearman’s correlation coefficient (ρ = 0.43–0.68; P < 0.0001 for every comparison between human conditions and mouse models). The correlations between different mouse models were also significant (ρ = 0.47–0.57; P < 0.0001 for every comparison).

[Figure 2: bar graphs in five panels comparing Human Burn with (A) Mouse Burn (GSE7404), (B) Mouse Trauma (GSE7404), (C) Mouse Sepsis (GSE19668), (D) Mouse Sepsis (GSE26472), and (E) Mouse Infection (GSE20524). Vertical axis: −log (p-value). Each panel has four bars, labeled with gene counts, for Human↑ Mouse↑, Human↓ Mouse↓, Human↑ Mouse↓, and Human↓ Mouse↑.]

Fig. 2. Statistical comparison of the direction of the gene expression changes between human burns and mouse models. Vertical bar represents the significance of the overlap between gene sets. Genes whose expression levels were changed in human burns significantly overlapped with those in the condition of mouse burn (A, overlap P value = 3.9 × 10−34), mouse trauma (B, 6.3 × 10−13), mouse sepsis from GSE19668 (C, 1.2 × 10−35), mouse sepsis from GSE26472 (D, 6.5 × 10−11), and mouse infection (E, 3.4 × 10−35). Value is expressed as the –log 10 of the P value. Statistical significances regarding the directionality of the gene expression changes were derived from the nonparametric ranking method provided by the bioinformatics platform NextBio.

Results

Correlation of Gene Changes to Trauma, Burns, and Endotoxemia Between Human Subjects and Murine Models. The fold changes of gene expression between patients and healthy subjects for each dataset of human burn (GSE37069), trauma (GSE36809), and sepsis (GSE28750) conditions and between treated and control groups for the murine models (GSE7404 for burn and trauma, GSE19668 for sepsis, and GSE20524 for infection) were obtained from a bioinformatics platform, NextBio. Between any two datasets, the agreements of the gene fold changes (Spearman’s rank correlation between the two datasets for Fig. 1 and Pearson’s correlation for Fig. 3) as well as the directionality of the changes (P values derived from the nonparametric ranking method by NextBio for Fig. 2 and the percentages of genes changed in the same direction between the two datasets for Fig. 3) were compared. We conducted a Kolmogorov–Smirnov test to check for the normality of the distribution of gene expression data in human burn conditions, which were compared with mouse models as a reference dataset. The assumption of normality was rejected for both the fold changes and the log-twofold changes of the gene expression levels (P < 0.0001). Therefore, we mainly used the nonparametric Spearman’s correlation coefficient (ρ) for the correlation analyses. The criteria for the selection of the genes of interest were fold change >1.2 and P < 0.05 within each condition to be compared. The correlations of the gene changes as assessed by Spearman’s correlation coefficient indicated that there were highly significant similarities in gene responses between each of the human conditions and those of the mouse models (Fig. 1; ρ = 0.43–0.68, P < 0.0001 for every comparison between human conditions and the corresponding mouse models). There were also highly significant correlations among different mouse models (Fig. 1; ρ = 0.47–0.57, P < 0.0001 for every comparison between a pair of mouse models).
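The selection rule and rank correlation described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory layout (a dict mapping gene symbol to a signed fold change and a P value); the gene names and numbers are invented toy values, not data from the GEO series, and the helper names are illustrative rather than part of NextBio.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def is_responsive(fold_change, p_value, fc_cut=1.2, p_cut=0.05):
    # Signed convention: a fold change below 1 is stored as -1/(fold change).
    return p_value < p_cut and abs(fold_change) > fc_cut

def shared_responsive(human, mouse):
    """Genes that pass the cutoffs in BOTH datasets (hypothetical dict layout)."""
    return sorted(g for g in human.keys() & mouse.keys()
                  if is_responsive(*human[g]) and is_responsive(*mouse[g]))

# Invented toy values: gene -> (signed fold change, P value).
human = {"G1": (3.0, 0.001), "G2": (-2.0, 0.01), "G3": (1.5, 0.03),
         "G4": (1.1, 0.2), "G5": (-4.0, 0.001)}
mouse = {"G1": (2.0, 0.002), "G2": (-1.5, 0.02), "G3": (-2.5, 0.04),
         "G4": (5.0, 0.001), "G5": (-3.0, 0.01), "G6": (2.0, 0.01)}

genes = shared_responsive(human, mouse)
rho = spearman([human[g][0] for g in genes], [mouse[g][0] for g in genes])
print(genes, round(rho, 2))
```

In this toy case G4 is dropped because it is not responsive in the human set, and G6 because it has no human measurement, so the correlation is computed over the four remaining genes only.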
Nonparametric ranking analysis by NextBio rejected, with extraordinarily high confidence, the hypothesis that mouse models show only coincidental overlap of the directionality of gene changes with those in human burn conditions [Fig. 2; overlap P value = 3.9 × 10−34, 6.3 × 10−13, 1.2 × 10−35, 6.5 × 10−11, and 3.4 × 10−35 for mouse models of burn, trauma, sepsis (GSE19668), sepsis (GSE26472), and infection, respectively], demonstrating that the directionality of the changes in mouse models was highly similar to that in human burn conditions. Outputs of these analyses using NextBio are available from the URLs shown in Table S1. Although the data violate the assumption of a normal distribution, we created a scatterplot for the responses to inflammation from different etiologies in humans and murine models (Fig. 3) using Pearson’s correlation coefficient R and percentages of genes changed in the same direction to compare with figure 4 in Seok et al. (1). Contrary to the conclusion of Seok et al., there were significant correlations and similarities of the direction in the gene expression changes between human burn conditions and mouse models (Fig. 3; human: fold change >2.0; mouse model: fold change >1.2; R = 0.26–0.51; P < 0.0001 for all comparisons; percentage: 48.1–86.2). With more stringent criteria for gene selection (human: fold change >4.0; mouse model: fold change >1.2), R values and the percentages of genes changed in the same direction were generally greater [Fig. 3; R = 0.36–0.59; P < 0.0001 for all comparisons except for mouse sepsis (P = 0.0162); percentage: 74.4–93.2]. Detailed information on the analyses is available in Table S2.

Comparison of the Significantly Regulated Pathways/Biogroups Between Human Conditions and Mouse Models. Pathways/biogroups commonly altered across datasets of human diseases and mouse models were determined through a combination of rank-based enrichment statistics and biomedical ontologies using NextBio. In the human burn condition, significant enrichment of up-regulated and down-regulated gene expression was identified in 875 and 197 pathways/biogroups, respectively, from a total of 6,176 sets of pathways/biogroups registered in NextBio [1,454 sets from the Gene Ontology (GO) database, 4,722 sets from canonical pathways in Broad MSigDB]. As a selection criterion, a P value that was corrected by the Bonferroni correction for multiple comparisons (P < 8.1 × 10−6 = 0.05/6,176) was used for this identification. Among the 875 or 197 pathways/biogroups with significant enrichment of up-regulated or down-regulated genes in the human burn condition, the pathways/biogroups that demonstrated significant enrichment in other human and mouse conditions as well were determined with a P value corrected with the Bonferroni correction (P < 5.7 × 10−5 = 0.05/875 or P < 2.6 × 10−4 = 0.05/197, respectively). Of the 875 pathways/biogroups with significant enrichment of up-regulated genes in human burn conditions, 313, 163, 345, and 114 also showed significant enrichment in mouse models of burn, trauma, sepsis, and infection, respectively. Unexpectedly, only half of the pathways/biogroups up-regulated in human burn conditions were shared by the other human disease conditions (trauma 52.1%, sepsis 53.4%). Significant enrichment of up-regulated genes in six pathways/biogroups was detected across all conditions (three human conditions and four mouse models). Similarly, among the 197 pathways/biogroups with significant enrichment of down-regulated genes in human burn conditions, 85, 44, 60, and 130 also showed significant enrichment in mouse models of burn, trauma, sepsis, and infection, respectively. Significant enrichment of down-regulated genes in five pathways/biogroups was detected across all seven conditions. Pathways/biogroups that show enrichment with a P value less than 0.6 in at least one of the conditions are listed in Dataset S1.

Some of the pathways/biogroups with high overlap are shown in Fig. 4 A–D. The significance of the overlap between each condition and pathways/biogroups is also shown in the graph. There was significant overlap between genes annotated in GO as “innate immune response” and the genes up-regulated in the mouse models of burn (Fig. 4A, P = 6.3 × 10−24), sepsis (P = 4.6 × 10−33), and infection (P = 1.3 × 10−16), as well as in human burn (P = 5.0 × 10−51), trauma (P = 4.1 × 10−90), and sepsis conditions (P = 6.3 × 10−21). Significant overlap was also detected between “genes involved in cytokine signaling in immune system (canonical pathways, Broad MSigDB)” and genes up-regulated in the mouse models of sepsis (Fig. 4B, P = 4.0 × 10−36), burn (P = 1.6 × 10−11), and infection (P = 1.3 × 10−8), as well as in human burn (P = 5.5 × 10−28), trauma (P = 6.5 × 10−52), and sepsis conditions (P = 3.2 × 10−12). With regard to down-regulated pathways/biogroups, genes annotated “lymphocyte differentiation (GO)” significantly overlapped with genes down-regulated in the mouse models of burn (Fig. 4C, P = 2.2 × 10−8), trauma (P = 1.4 × 10−15), sepsis (P = 4.2 × 10−20), and infection (P = 1.3 × 10−18), as well as in human burn (P = 1.7 × 10−18), trauma (P = 5.6 × 10−12), and sepsis conditions (P = 3.3 × 10−30). There was also significant overlap between “genes involved in translocation of ZAP-70 to immunological synapse (canonical pathways, Broad MSigDB)” and genes down-regulated in all of the human disease conditions and mouse models of these conditions (Fig. 4D, human burn, P = 1.3 × 10−11; human trauma, P = 0.0003; human sepsis, P = 8.7 × 10−25; mouse burn, P = 0.0003; mouse trauma, P = 5.1 × 10−10; mouse sepsis, P = 1.3 × 10−7; and mouse infection, P = 1.8 × 10−9).

Fig. 3. Comparison of the genomic response to severe acute inflammation from different etiologies in human and murine models. The datasets that were used in Seok et al. (1) and are registered in NextBio were reevaluated using the criteria of Seok et al. (1) to select the genes of interest. We chose genes with significant responses in both the human burn dataset and in the mouse dataset for comparison, whereas Seok et al. chose the 4,918 genes with significant responses in the human datasets regardless of the significance of the changes in the genes in mice. The datasets used here are listed in Table S2. Shown are Pearson’s correlations (R; x axis) and directionality (%; y axis) of the gene response from multiple published datasets in GEO compared with human burns. Note that R2 data taken from Seok et al. were recalculated to obtain the R values shown here for comparison (blue symbols). Most of the mouse models showed a high directionality score (random chance is 50%), indicating that their gene expression patterns are similar to that in human burns, as long as stringent criteria are applied to the selection of the genes of interest.

[Figure 4: horizontal bar graphs in four panels, each with one bar per condition (Human Burn, Human Trauma, Human Sepsis, Mouse Burn, Mouse Trauma, Mouse Sepsis, Mouse Infection) and a legend distinguishing activated from suppressed. (A) Innate immune response; x axis: significance of overlaps with the GO biogroup (−log). (B) Genes involved in Cytokine Signaling in Immune system; x axis: significance of overlaps with the Broad MSigDB biogroup (−log). (C) Lymphocyte differentiation; x axis: significance of overlaps with the GO biogroup (−log). (D) Genes involved in Translocation of ZAP-70 to Immunological synapse; x axis: significance of overlaps with the Broad MSigDB biogroup (−log).]

Fig. 4. Representative pathways or biogroups shared by the human diseases and mouse models. (A–D) Shown are bar graphs of statistical significance (−log 10 P value) for the representative pathways with significant regulation in the human and murine models. (A) Genes annotated as “innate immune response (GO)” significantly overlapped with genes up-regulated in the mouse models as well as in human diseases. (B) Significant overlap was also detected between “genes involved in cytokine signaling in immune system (canonical pathways, Broad MSigDB)” and genes up-regulated in both the mouse models and human diseases. (C) Genes annotated “lymphocyte differentiation (GO)” and genes down-regulated in the mouse models and human diseases significantly overlapped. (D) There was also significant overlap between “genes involved in translocation of ZAP-70 to immunological synapse (canonical pathways, Broad MSigDB)” and genes down-regulated in the mouse models and human diseases. As can be seen, in the pathways/biogroups shown here, the mouse models are comparable to the human conditions in the significance of enrichment between each condition and pathways/biogroups. For a complete list of pathways/biogroups shared by the human diseases and mouse models see Dataset S1.

Discussion

Contrary to the findings reported by Seok et al. (1), the findings of the present study using the same datasets presented in their study demonstrated that the expression levels of the genes in mouse models of human disease conditions were highly significantly correlated with those of the human conditions.
The percentage of genes that changed in the same direction between human diseases and mouse models was greater than chance level and statistically highly significant (P = 6.5 × 10−11 to 1.2 × 10−35). Moreover, meta-analysis of those datasets revealed a number of pathways/biogroups commonly and significantly altered by multiple conditions in both humans and mice. Thus, our findings were quite different from those reported by Seok et al. (1) using virtually identical datasets, resulting in almost completely opposite conclusions. How can such discrepancies be explained? As a research group that uses mice as a main tool for studying human disorders, we applied commonly used methodologies to the analyses to effectively detect similarities between humans and mice. It is generally well known and accepted among researchers using animal models that mouse models mimic only partial aspects of human disorders. To identify the signals/signs of the similarities with higher sensitivity, genes that do not show significant responses to the experimental manipulations must be excluded from the analyses, because those nonresponsive genes would simply produce noise in the correlation analysis. For example, 13,586 and 3,116 genes are changed (P < 0.05 and fold change >1.2) in human burn conditions and mouse models of infection, respectively, and 1,992 among them are commonly changed in both humans and mice. We calculated the correlation coefficients and percentages of genes that changed in the same direction for the 1,992 overlapping genes. However, Seok et al. (1) used 4,918 genes for the calculation, which included all 3,250 genes that were changed (false discovery rate <0.001 and fold change >2.0) in human burn conditions, and the 1,668 genes with significant changes only in other human conditions, regardless of their responses in mouse models.
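The effect of the gene set on the directionality score can be illustrated with a small sketch. The fold-change pairs below are invented toy values chosen so that the outcome is easy to follow; the drop in the pooled percentage is therefore true by construction and is not evidence about the real datasets. The function name is hypothetical.

```python
def same_direction_percent(pairs):
    """Percentage of (human, mouse) signed fold-change pairs with the same sign."""
    same = sum(1 for h, m in pairs if h * m > 0)
    return 100.0 * same / len(pairs)

# Toy genes responsive in BOTH species: every pair moves in the same direction.
shared = [(3.0, 2.0), (-2.0, -1.5), (4.0, 3.0), (-5.0, -4.0)]

# Toy genes responsive only in humans: the mouse values sit near "no change"
# (about +/-1.0), so their sign is effectively arbitrary.
human_only = [(2.5, -1.05), (-3.0, 1.02), (2.2, 1.04), (-2.8, -1.03)]

pct_shared = same_direction_percent(shared)
pct_pooled = same_direction_percent(shared + human_only)
print(pct_shared, pct_pooled)
```

Under these assumptions the score falls from 100% on the shared genes to 75% once the human-only genes are pooled in, because those genes agree in direction only about as often as chance.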
With the calculation method used in the previous study, the biased inclusion of genes that showed a significant response only in the human conditions and not in mouse models greatly reduces the value of the overall correlation coefficient/percentage of the genes that changed in the same direction and misses the possible responses in the mouse model that recapitulate the human condition. The genes shared by humans and mice (i.e., the 1,992 genes in the example described above) are those that mainly represent the aspects of human disorders and biological pathways mimicked by the mouse model. It is not surprising that Seok et al. (1) found little correlation in each pathway, considering that genes showing no significant response in the mouse models were included in the calculation. To compare pathways/biogroups that are commonly changed between human diseases and mouse models, Seok et al. (1) used Pearson’s correlation coefficient of determination R2, which should not be used for detecting similarities between different species. The use of R2 is inadequate in this case for the following reasons. First, Pearson’s correlation assumes a linear relationship between normally distributed data, and there is no reason to assume a linear relationship among fold changes or log-twofold changes of gene expression levels. In this case, the gene expression level of each gene may follow a different function of the severity of inflammation. Some of the genes may have linearity but others may not, and such differences among genes could be more apparent between different species. Second, the normality of the distribution cannot be assumed regarding fold changes and log-twofold changes of gene expression levels. Indeed, in the present study, the assumption of a normal distribution for the datasets was formally rejected by the Kolmogorov–Smirnov test. Thus, the use of Pearson’s correlation in this analysis is not justified.
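The linearity point can be made concrete with a short sketch. It uses an invented monotone but nonlinear relationship (y = x^4 on six points) rather than any expression data, and a simple rank helper that assumes no tied values.

```python
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; this simplified helper assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented example: y rises strictly with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 4 for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(round(r, 3), round(rho, 3))
```

Because the ordering of the two variables agrees perfectly, the rank-based coefficient reaches 1, whereas the Pearson coefficient stays clearly below 1 for the same points.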
In addition, R2 provides the impression that the correlation is small when R is an intermediate value. Supposing that R is 0.14–0.28, which is usually considered an intermediate or moderate correlation, R2 would be 0.02–0.08, which seems small to those not familiar with the use of R2. In the present study, we used Spearman’s rank correlation, a nonparametric statistical method that does not assume linearity or normal distribution of the data. Another problem with the study by Seok et al. (1) is that they did not try to find the optimal condition in a mouse study that has multiple conditions in a dataset and that time-course data were used in aggregate. In GSE19668, for example, there are eight conditions that can be compared (2, 4, 6, or 12 h after infection in C57BL6J or AJ mice). The overlap P values with the human burn dataset vary from 8.7 × 10−8 (C57BL6J, 2 h after infection) to 1.2 × 10−35 (C57BL6J, 6 h after infection). In our analyses, we used the dataset that showed the highest similarity for further analyses. In general, it is preferable to use conditions that most highly mimic the human disorders when using animal models. In fact, when researchers develop an animal model of a human condition, they screen the models to determine which model most highly mimics the particular aspect of the human condition being studied. We also used the sophisticated statistical methods provided by NextBio to find potential similarities between human conditions and mouse models. Expression profiles vary considerably from study to study and from species to species. Different studies for different species can yield different dynamic ranges, distributions of fold changes, and P values that reflect the conditions used. To allow for interstudy comparisons, NextBio employs a nonparametric approach, in which ranks are assigned to each gene signature based on the magnitude of fold change. Ranks are then further normalized to eliminate any bias owing to varying platform size. 
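The rank-then-normalize idea can be sketched as follows. This is not NextBio’s implementation, whose details are not given here; it assumes, for illustration only, that genes are ranked by the absolute value of the fold change and that the rank is divided by the number of genes measured. The gene names and values are invented.

```python
def normalized_ranks(fold_changes):
    """Map gene -> rank by |fold change| (1 = largest), divided by gene count.

    Assumption for illustration: dividing by the number of genes stands in
    for the platform-size normalization described in the text.
    """
    ordered = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

# Two invented studies with different dynamic ranges and platform sizes.
study_a = {"A": 8.0, "B": -2.0, "C": 1.5, "D": -1.2}
study_b = {"A": 40.0, "B": -9.0, "C": 3.0, "D": -2.5,
           "E": 2.0, "F": 1.8, "G": -1.4, "H": 1.1}

ranks_a = normalized_ranks(study_a)
ranks_b = normalized_ranks(study_b)
print(ranks_a)
print(ranks_b["A"], ranks_b["D"])
```

After this transformation both studies place their genes on the same 0-to-1 scale, so the much larger absolute fold changes in the second toy study no longer dominate a comparison.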
This normalized ranking approach enables comparisons across data from different studies, platforms, and analysis methods by removing the dependence on absolute values of fold change and minimizing some of the effects of the differences of normalization methods, platform, and species used. Use of this method demonstrated highly significant similarities between human and mouse conditions (P value = 6.5 × 10−11 to 1.2 × 10−35). Evaluating similarities across microarray studies is an important topic for contemporary biomedical research. Simple Pearson’s correlations are widely used, but other sophisticated statistical measures, including that of NextBio, have been proposed to quantify the similarity of any two microarray studies (16). It would be interesting to assess the same datasets with other advanced analytical measures. The study by Seok et al. (1) has been interpreted by some researchers and mass media as robust and persuasive evidence that there are no correlations or similarities between mouse models and human conditions, provoking discussions of the Takao and Miyakawa Takao and Miyakawa heterogeneous populations in terms of the site or microbiology of infection, genetic makeup, age, comorbidity of the patients, and so forth. Such heterogeneity in human conditions could dilute the positive effect of the treatment, if any, in a subgroup of patients in a clinical setting. A review of studies using animal models of TNF neutralization in sepsis suggests that anti-TNF therapies are most efficacious in models of systemic endotoxemia or Gram-negative infection, ineffective in models of complex polymicrobial infections, and harmful when the challenge organism is Streptococcus pneumoniae or an opportunistic pathogen (31). This implies that heterogeneity of the efficacy of a treatment exists even among animal models with relatively homogeneous populations and that assessment of a treatment requires rigorous characterization of the heterogeneity of human patients. 
Third, the index for the outcome of clinical trials is focused on patient mortality. If a trial is evaluated based solely on mortality, it provides no information on the effect on the clinical or biological events that are not lethal but may be important to patients who survive their sepsis episodes (23). It is possible that one therapy cannot improve patient mortality but improves the prognosis of survivors, such as the incidence rate of amputations (23, 32) and level of organ dysfunction (32, 33). Taken together, it is not surprising that Seok et al. (1) found almost no correlation between the genomic responses in human disease conditions and those of mouse models, because they used methodologies with limited detection ability, which are rarely used by researchers who study animal models. With reasonably sophisticated and commonly used methods, we found highly significant correlations between human and mouse data. It should be noted that in our analyses we used the data from the exact same studies as those used in Seok et al. (1), and still we found highly significant similarities, demonstrating that their failure to detect correlations is purely a result of the inappropriately biased methodologies they used. Efforts to find more optimal age/ time-course/organs/strains/induction protocols that recapitulate human gene expression for each disease condition, might lead to better mouse models of these disorders than the ones evaluated here. Given that even these suboptimal models provide highly significant correlations, however, Seok et al. (1) might consider modifying their conclusions, which have drawn the attention of the popular media as well as researchers in broad areas of biomedical sciences. Materials and Methods Datasets for Human Diseases and Mouse Models. The datasets that we analyzed in the present study were the same as those used in the study by Seok et al. (1) and are registered in NextBio. 
In the present study, the following datasets were used for gene expression pattern analyses: “leukocytes of patients with severe burns on >20% of total body surface area vs. healthy controls” from GSE37069 is referred as “Human Burn”; “white blood cells of severe blunt trauma patients 14 d after injury vs. healthy subjects” from GSE36809 is referred as “Human Trauma”; “whole blood of sepsis patients with community-acquired infection vs. healthy subjects” from GSE28750 is referred as “Human Sepsis”; “WBC from blood at 7 d after burn injury vs. burn injury sham” from GSE7404 is referred to as “Mouse Burn”; “WBC from spleen at 3 d after trauma hemorrhage vs. trauma hemorrhage sham” from GSE7404 is referred to as “Mouse Trauma”; “Blood of C57BL6J mice 4 h after Staphylococcus aureus infection vs. uninfected” from GSE19668 is referred to as “Mouse Sepsis”; “Blood from 8-wk-old BALB-c mice 1 d after tail vein injection - Candida albicans vs. saline control” from GSE20524 is referred to as “Mouse Infection.” Only genes with a P value of 0.05 or less and an absolute fold change of 1.2 or greater were considered to be differentially expressed. Comparison of Gene Response Between Datasets. Fold changes of datasets used in the present study were calculated by dividing the value of human diseases/mouse models by that of healthy controls/normal control mice. Values were converted into the negative reciprocal, or −1/(fold change) if the fold change was less than 1. In Fig. 1, genes meeting the criteria of P < 0.05 and fold change >1.2 are plotted in the graph. The normality of each dataset was assessed using the Kolmogorov–Smirnov test, in which none of the datasets had a normal distribution (P < 0.0001). Because the distribution PNAS Early Edition | 5 of 6 MEDICAL SCIENCES validity of animal models as a tool for studying human disease and for translational research (3, 8). Commentaries in response to the previous study claim that the implications of the study by Seok et al. 
(1) may well go beyond mice and sepsis (10), and that better definitions of clinical phenotypes, especially in heterogeneous or overlapping conditions such as neurodevelopmental and psychiatric diseases, as well as more emphasis on rigorously defining molecular alterations in human patients, are needed (8). It should be noted that, even for schizophrenia, which could be considered a uniquely human disorder (17), mouse models have recently been developed, using an approach similar to that used in the present study, that closely recapitulate not only the clinical or behavioral phenotypes but also the molecular alterations in transcriptional and protein changes in the brain (18–21). The New York Times published an article attempting to translate the findings in the study by Seok et al. (1) for a general audience, noting, “But now, researchers report evidence that the mouse model has been totally misleading for at least three major killers—sepsis, burns, and trauma. As a result, years and billions of dollars have been wasted following false leads, they say” (11). In the present study, however, we provide strong evidence, by reanalysis of the same datasets that were used by Seok et al. (1), that mouse models do in fact closely recapitulate certain aspects of transcriptomic responses in some inflammatory conditions in humans and are useful for studying human disorders and conducting translational research.

In some commentaries (4, 6–8), the use of only a single strain (i.e., C57BL/6) has been criticized as a potential caveat of the study by Seok et al. (1). This may not be the case, at least for some datasets we analyzed, because the similarity to human data of the data derived from C57BL/6 is comparable to or even greater than that of data from other strains of mice, although our study does not exclude the possibility that some other mouse strain mimics the human disorders better.
Our results extend those criticisms about the use of only a single strain and indicate the importance of searching for optimal conditions in multiple dimensions, including time course, dose, age, induction protocol, organ, and so on, by using appropriate and nonbiased data-mining methods to establish animal models of human disorders.

Seok et al. (1) pointed out that all of the nearly 150 clinical trials testing candidate agents intended to block the inflammatory response in critically ill patients have failed. Why have these trials failed when such agents have proven effective in mouse models? Although it is beyond the scope of this study to discuss this serious issue in detail, we would like to suggest some possible reasons. First, it is possible that some of the failed drugs were targeting genes with distinctly different responses between humans and mice, although the targeted pathways were indeed enriched for differentially expressed genes in humans and mice. Genes that show altered expression with a P value less than 0.05 in at least three of the conditions are listed in Dataset S2. TNF-alpha, a major proinflammatory cytokine that activates the NF-κB pathway, is a therapeutic target for sepsis (22, 23). Candidate agents, such as the TNF antibody, are effective in mouse models (24) but failed to reduce mortality in clinical trials (25). In fact, the mRNA of TNF-alpha is significantly increased in the datasets from mouse models (fold change, 4.8; P = 0.002 in mouse sepsis and fold change, 1.39; P = 0.0023 in mouse infection) but not in the datasets of human conditions assessed in the present study (P > 0.05 in all of the four human conditions; TNF-alpha is not present in Dataset S2, because the table shows the genes significantly changed in three or more biosets), consistent with previous studies that showed little increase in the gene (26, 27) or its product (28, 29) in human sepsis.
Most experimental therapies for sepsis have focused on attenuating the proinflammatory response, but the idea that immunosuppression is a major contributor to the conditions has recently become more accepted (30). In this sense, the failure of the trials could be due to an inadequate strategy/target of the therapy. Second, compared with animal models, human sepsis patients comprise

of the datasets was not normal, Spearman’s rank correlation (ρ) was calculated between each pair of datasets. In nonparametric ranking analysis of the gene expression signature by NextBio (Fig. 2), genes with a P value of 0.05 or less and an absolute fold change of 1.2 or greater were used, which is the default criterion used in analyses in NextBio. Pearson correlation (R) between the human burn condition and other mouse models was calculated using genes with a P value of 0.05 or less and an absolute fold change greater than 4.0 (Fig. 3, solid red circles) or 2.0 (Fig. 3, open red circles) in the human burn dataset (as a reference) and genes with a P value of 0.05 or less and an absolute fold change greater than 1.2 in other conditions of mouse models (Fig. 3). The percentage of genes changed in the same direction as in the human burn condition was also calculated using genes with the same criteria. The five datasets of human disease and five datasets of mouse models that were used in Seok et al. (1) and are also registered in NextBio are plotted in Fig. 3 as well.

vein injection - C. albicans vs. saline control.” Pathways/biogroups from GO and canonical pathways of Broad MSigDB were included in this analysis. For identification of the pathways/biogroups showing significant enrichment of the genes changed in Human Burn, a pathway enrichment analysis was conducted among the 6,176 pathways/biogroups registered in GO and canonical pathways of Broad MSigDB, with a P value corrected by the Bonferroni correction for multiple comparisons (P = 8.1 × 10−6 = 0.05/6,176).
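The gene-level comparison described in these methods can be illustrated with a minimal sketch. It is not the NextBio implementation and does not reproduce any value reported in this study. It only shows, under stated assumptions, the signed fold-change conversion, the P ≤ 0.05 and |fold change| ≥ 1.2 filter, Spearman's ρ computed on average ranks, the percentage of genes changed in the same direction, and the Bonferroni threshold 0.05/6,176. All function names and the toy numbers in the usage section are hypothetical.

```python
from statistics import mean


def signed_fold_change(disease, control):
    """Return disease/control; ratios below 1 become -1/ratio, as in the text."""
    ratio = disease / control
    return ratio if ratio >= 1 else -1.0 / ratio


def is_differential(p_value, signed_fc, p_cut=0.05, fc_cut=1.2):
    """Apply the stated criteria: P <= 0.05 and absolute fold change >= 1.2."""
    return p_value <= p_cut and abs(signed_fc) >= fc_cut


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def percent_same_direction(x, y):
    """Percentage of paired signed fold changes that share a sign."""
    same = sum(1 for a, b in zip(x, y) if (a > 0) == (b > 0))
    return 100.0 * same / len(x)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical toy data: gene -> (P value, signed fold change).
    human = {"G1": (0.001, 3.2), "G2": (0.03, -1.8), "G3": (0.2, 1.1), "G4": (0.01, 1.5)}
    mouse = {"G1": (0.004, 2.1), "G2": (0.02, -1.4), "G3": (0.01, 2.0), "G4": (0.04, -1.3)}
    # Keep only genes that pass the filter in BOTH species.
    shared = [g for g in human if is_differential(*human[g]) and is_differential(*mouse[g])]
    h = [human[g][1] for g in shared]
    m = [mouse[g][1] for g in shared]
    print(shared, spearman_rho(h, m), percent_same_direction(h, m))
    print(bonferroni_alpha(0.05, 6176))  # about 8.1e-06, the threshold quoted in the text
```

Restricting the comparison to genes that pass the filter in both species mirrors the first methodological difference stated in the introduction. The pure-Python ranking is used only to keep the sketch free of dependencies; a statistics library would normally be preferred for real datasets.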
The pathways/biogroups showing significant enrichment of the genes changed in six other human and mouse conditions were also determined by pathway enrichment analyses using NextBio. Among the pathways/biogroups with significant enrichment in human burn conditions, those also showing significant enrichment in each condition were determined. Pathways/biogroups with P values less than the alpha-level corrected by the Bonferroni correction were considered significantly enriched.

Comparison of Pathways/Biogroups Altered in the Human Diseases and Mouse Models. Meta-analyses of the human and mouse gene sets using NextBio were conducted to find the pathways/biogroups commonly changed in the gene sets. Pathways/biogroups commonly altered across datasets of human diseases and mouse models were determined through a combination of rank-based enrichment statistics (see the following section) and biomedical ontologies using NextBio. The datasets used in the meta-analysis were as follows: Human Burn, “leukocytes of patients with severe burns on >20% of total body surface area vs. healthy controls”; Human Trauma, “white blood cells of severe blunt trauma patients 28 d after injury vs. healthy subjects”; Human Sepsis, “whole blood of sepsis patients with community-acquired infection vs. healthy subjects”; Mouse Burn, “WBC from blood at 7 d after burn injury vs. burn injury sham”; Mouse Trauma, “WBC from spleen at 3 d after trauma hemorrhage vs. trauma hemorrhage sham”; Mouse Sepsis, “blood of C57BL6J mice 4 h after Staphylococcus aureus infection vs. uninfected”; Mouse Infection, “blood from 8-wk-old BALB-c mice 1 d after tail

Computing Overlap P Values of Gene Expression Patterns from Different Datasets. NextBio was used to compare gene expression patterns (signatures) between experiments from human diseases and mouse models. NextBio was accessed on December 31, 2013, for Figs. 1–4, unless otherwise noted.
NextBio compares the signatures in publicly available microarray databases with a signature provided by the user using a “Running Fisher” algorithm, as previously described (15, 18). The overlap P value, that is, the direction of the correlation between two given gene signature sets, and the P values between subsets of gene signatures, is calculated with the algorithm. For the detailed method for calculation of overlap P values, see SI Materials and Methods.

ACKNOWLEDGMENTS. This work was supported by Grants-in-Aid for Scientific Research 25116526, 10301780, and 80420397 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan and by grants from Core Research for Evolutional Science and Technology of the Japan Science and Technology Agency.

1. Seok J, et al.; Inflammation and Host Response to Injury, Large Scale Collaborative Research Program (2013) Genomic responses in mouse models poorly mimic human inflammatory diseases. Proc Natl Acad Sci USA 110(9):3507–3512.
2. Cauwels A, Vandendriessche B, Brouckaert P (2013) Of mice, men, and inflammation. Proc Natl Acad Sci USA 110(34):E3150.
3. de Souza N (2013) Model organisms: Mouse models challenged. Nat Methods 10(4):288.
4. Drake AC (2013) Of mice and men: What rodent models don’t tell us. Cell Mol Immunol 10(4):284–285.
5. Leist M, Hartung T (2013) Inflammatory findings on species extrapolations: Humans are definitely no 70-kg mice. ALTEX 30(2):227–230.
6. Osterburg AR, et al. (2013) Concerns over interspecies transcriptional comparisons in mice and humans after trauma. Proc Natl Acad Sci USA 110(36):E3370.
7. Remick D (2013) Use of animal models for the study of human disease–a shock society debate. Shock 40(4):345–346.
8. (2013) Of men, not mice. Nat Med 19(4):379.
9. Schaff M (2013) Mouse study findings prove contentious among scientists. Pitt News. Available at www.pittnews.com/news/article_41cd755a-a081-577a-b752-2c83fc6628e2.html. Accessed January 17, 2014.
10. Collins DF (2013) Of mice, men, and medicine. NIH Director’s Blog. Available at http://directorsblog.nih.gov/2013/02/19/of-mice-men-and-medicine/. Accessed January 17, 2014.
11. Kolata G (2013) Health testing on mice is found misleading in some cases. NY Times. Available at www.nytimes.com/2013/02/12/science/testing-of-some-deadly-diseases-on-mice-mislead-report-says.html. Accessed January 17, 2014.
12. Engber D (2013) Septic shock. Slate. Available at www.slate.com/articles/health_and_science/the_mouse_trap/2013/02/lab_mouse_models_of_inflammation_for_burns_sepsis_trauma_animal_testing.html. Accessed January 17, 2014.
13. Cossins D (2013) Do mice make bad models? The Scientist. Available at www.the-scientist.com/?articles.view/articleNo/34346/title/Do-Mice-Make-Bad-Models-/. Accessed January 17, 2014.
14. Chu E (2013) This is why it’s a mistake to cure mice instead of humans. Popular Science. Available at www.popsci.com/science/article/2013-02/how-scientists-got-lost-curing-mice-instead-humans. Accessed January 17, 2014.
15. Kupershmidt I, et al. (2010) Ontology-based meta-analysis of global collections of high-throughput public data. PLoS ONE 5(9):e13066.
16. Tseng GC, Ghosh D, Feingold E (2012) Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res 40(9):3785–3799.
17. Powell CM, Miyakawa T (2006) Schizophrenia-relevant behavioral testing in rodent models: a uniquely human disorder? Biol Psychiatry 59(12):1198–1207.
18. Takao K, et al. (2013) Deficiency of schnurri-2, an MHC enhancer binding protein, induces mild chronic inflammation in the brain and confers molecular, neuronal, and behavioral phenotypes related to schizophrenia. Neuropsychopharmacology 38(8):1409–1425.
19. Ohira K, et al. (2013) Synaptosomal-associated protein 25 mutation induces immaturity of the dentate granule cells of adult mice. Mol Brain 6:12.
20. Walton NM, et al. (2012) Detection of an immature dentate gyrus feature in human schizophrenia/bipolar patients. Transl Psychiatr 2:e135.
21. Hagihara H, Ohira K, Takao K, Miyakawa T (2014) Transcriptomic evidence for immaturity of the prefrontal cortex in patients with schizophrenia. Mol Brain 7:41.
22. Wesche-Soldato DE, Swan RZ, Chung C-S, Ayala A (2007) The apoptotic pathway as a therapeutic target in sepsis. Curr Drug Targets 8(4):493–500.
23. Marshall JC (2014) Why have clinical trials in sepsis failed? Trends Mol Med 20(4):195–203.
24. Beutler B, Milsark IW, Cerami AC (1985) Passive immunization against cachectin/tumor necrosis factor protects mice from lethal effect of endotoxin. Science 229(4716):869–871.
25. Abraham E, et al.; NORASEPT II Study Group (1998) Double-blind randomised controlled trial of monoclonal antibody to human tumour necrosis factor in treatment of septic shock. Lancet 351(9107):929–933.
26. Wong HR, et al.; Genomics of Pediatric SIRS/Septic Shock Investigators (2009) Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum. Crit Care Med 37(5):1558–1566.
27. Wong HR (2013) Genome-wide expression profiling in pediatric septic shock. Pediatr Res 73(4 Pt 2):564–569.
28. Hotchkiss RS, Karl IE (2003) The pathophysiology and treatment of sepsis. N Engl J Med 348(2):138–150.
29. Oberholzer A, Oberholzer C, Moldawer LL (2000) Cytokine signaling—regulation of the immune response in normal and critically ill states. Crit Care Med 28(4, Suppl):N3–N12.
30. Hotchkiss RS, Coopersmith CM, McDunn JE, Ferguson TA (2009) The sepsis seesaw: Tilting toward immunosuppression. Nat Med 15(5):496–497.
31. Lorente JA, Marshall JC (2005) Neutralization of tumor necrosis factor in preclinical models of sepsis. Shock 24(Suppl 1):107–119.
32. Levin M, et al. (2000) Recombinant bactericidal/permeability-increasing protein (rBPI21) as adjunctive treatment for children with severe meningococcal sepsis: A randomised trial. rBPI21 Meningococcal Sepsis Study Group. Lancet 356(9234):961–967.
33. Marshall JC, et al. (1995) Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome. Crit Care Med 23(10):1638–1652.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1401965111 Takao and Miyakawa