Characterization of host-pathogen interactions through in vivo expression profiling of Plasmodium falciparum in malaria infected patients

R. Ordoñez1,3, J.P. Daily5,6, N. Pochet2,4, D. Scanfeld2, K. Le Roch7, D. Plouffe8, M. Kamal2, O. Sarr9, S. Mboup9, O. Ndir10, D. Wypij11, K. Levasseur5, E. Thomas2, P. Tamayo2, C. Dong5, Y. Zhou8, E.S. Lander2,3,13, D. Ndiaye10, D. Wirth5, E.A. Winzeler8,12, J.P. Mesirov2, A. Regev2,3

1. Biomedical Engineering, Florida International University, 10555 W. Flagler St., Miami, Florida, 33174, USA.
2. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, Massachusetts, 02142, USA.
3. Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts, 02140, USA.
4. FAS Center of Systems Biology, Harvard University, 7 Divinity Avenue, Cambridge, Massachusetts, 02138, USA.
5. Department of Immunology and Infectious Disease, Harvard School of Public Health, 665 Huntington Avenue, Boston, Massachusetts, 02115, USA.
6. Department of Medicine, Brigham and Women’s Hospital, 75 Francis Street, Boston, Massachusetts, 02115, USA.
7. Department of Cell Biology and Neuroscience, 900 University Avenue, University of California, Riverside, California, 92521, USA.
8. Genomics Institute of the Novartis Research Foundation, San Diego, California, 92121, USA.
9. Laboratory of Bacteriology and Virology, Dantec Hospital, Cheikh Anta Diop University, Senegal.
10. Department of Parasitology and Mycology, Dantec Hospital, Cheikh Anta Diop University, Senegal.
11. Department of Biostatistics, Harvard School of Public Health, 665 Huntington Avenue, Boston, Massachusetts, 02115, USA.
12. Department of Cell Biology, The Scripps Institute, 10550 Torrey Pines Road, La Jolla, California, 92037, USA.
13. The Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, Massachusetts, 02142, USA.

Millions of people are infected by P. falciparum every year, with clinical outcomes that vary dramatically. The molecular and genetic mechanisms underlying this diversity are poorly defined but likely involve both host and pathogen biology. To study the host-pathogen interaction and clinical outcomes, we analyzed whole-genome transcription profiles of parasites and human host cells in 43 blood samples taken directly from infected patients. We identified three P. falciparum in vivo transcriptional states, which closely resemble (i) glycolytic metabolism; (ii) a starvation response; and (iii) an environmental stress response (ESR). The glycolytic state is highly similar to the known profile of the ring stage in vitro, but the other states have not been observed in vitro. Here, we evaluate the matched clinical features (e.g. parasitemia, hematocrit, and cytokine levels) and transcriptional profiles of blood cells from these patients to identify host factors that are associated with parasite clusters. Clustering of the human expression profiles did not completely mirror the parasite clusters; however, host samples matched to the glycolytic (in vitro-like) state were generally distinct from host samples matched to the two novel parasite clusters, starvation and ESR. Furthermore, patients with a higher cytokine and inflammatory response were significantly associated with parasites in these novel clusters, particularly the cluster exhibiting the ESR. Gene Set Enrichment Analysis (GSEA) of the human transcriptional profiles identified an inflammatory response gene signature that was significantly associated with clinical phenotypes. Finally, when applying the same procedure to parasite expression profiles, we found enrichment of gene sets involving ribosomal proteins and virulence.
Interestingly, the expression profiles of these parasite genes grouped into the three distinct clusters previously seen in this study, which may implicate the human inflammatory response in the observed metabolic shifts. The results reveal a previously unknown physiological diversity in the in vivo biology of the malaria parasite. This novel approach of studying host and parasite interactions through transcriptional profiling may reveal disease mechanisms and allow for the identification of targets for intervention.

INTRODUCTION

Infection with the malaria parasite, P. falciparum, begins when the Anopheles gambiae mosquito, the primary vector of malaria in humans, takes a blood meal [1]. The mosquito injects sporozoites with its saliva, which are carried to the liver where they invade hepatocytes and, through asexual replication, develop into schizonts. The schizonts rupture, releasing merozoites into the bloodstream where they go on to invade erythrocytes. After invasion, the parasite, now called a trophozoite or “ring”, grows inside the erythrocyte, ingesting host cytoplasm and breaking down hemoglobin into amino acids. It is during the intraerythrocytic stage of the P. falciparum life cycle that the human host exhibits the majority of its symptoms.

Plasmodium falciparum infection affects children most severely, but the range of clinical outcomes varies from mild flu-like symptoms to coma and death. Only a small proportion of infected patients develop severe malaria, which affects several tissues and organs, even when marked manifestations may seem to involve a single organ such as the brain [2]. No simple one-to-one correlation between the clinical syndromes and the pathogenic response has been discovered. Moreover, the molecular and genetic mechanisms underlying this diversity are poorly understood.
Current animal and in vitro models inadequately represent the in vivo environment, and previous studies have found little variation between the expression profiles of different P. falciparum strains in vitro [3]. Thus, in vivo transcription profiles may reveal how variation in the host environment influences P. falciparum biology.

Several cytokines are known to be involved in the pathogenesis of malaria infection, including the pro-inflammatory cytokines TNF, IL6, and IFN and the anti-inflammatory cytokines IL10 and TGF. IL12p70 has been found to be inversely related to parasitemia and TNF. Fibrinogen, involved in clotting mechanisms, decreases parasite binding to ICAM1 under flow. ICAM1 has been implicated in cerebral malaria [4]. Examining the complex components of host and parasite interactions may identify targets that expand our medical arsenal.

RESULTS & DISCUSSION

We measured the parasite expression profiles directly from venous blood samples of forty-three infected patients residing in Senegal, with a diverse age range (8.3 ± 6.9 years) and illness severity (parasitemia 5.5% ± 6.2%, hematocrit 32.3 ± 6.8). The forty-three samples were hybridized to a custom P. falciparum (3D7 strain) chip, and, to study the host expression profiles, 28 of the samples were hybridized to the Affymetrix HG_U133A chip. Using a Non-Negative Matrix Factorization (NMF) algorithm [5], we clustered the expression profiles of the forty-three parasite samples (Figure 1). All of the parasites in the patient blood samples looked similar (early ring stage), and the Cluster 2 samples resemble the profiles of the early ring stage seen in the 3D7 strain grown in vitro [6-8] (Figure 2). However, the expression profiles of the samples in Clusters 1 and 3 contrasted with Cluster 2 in that they did not correlate with the profiles of early rings or late stages of the asexual parasite life cycle in vitro, thus representing novel transcriptional states.
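
As an illustration of the NMF clustering step described above, the following is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the Euclidean multiplicative update rules of Lee and Seung rather than whichever variant was actually used, it omits consensus clustering over many random restarts, and the function names (`nmf`, `assign_clusters`, `reconstruction_error`) and the toy matrix are hypothetical. Each sample is assigned to the metagene with the largest coefficient in its column of H.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative genes x samples matrix v into w (genes x k) and h (k x samples)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    # Strictly positive random starting point (an assumption of this sketch).
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius distance between v and the product w h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def assign_clusters(h):
    """Label each sample (column of h) with the index of its dominant metagene."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]


# Hypothetical toy data: 4 genes x 6 samples with two obvious sample groups.
toy = [
    [5.0, 5.0, 4.0, 0.1, 0.1, 0.2],
    [4.0, 5.0, 4.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 3.0, 3.0, 4.0],
    [0.2, 0.1, 0.1, 6.0, 5.0, 6.0],
]
toy_w, toy_h = nmf(toy, k=2)
toy_labels = assign_clusters(toy_h)
```

In practice the factorization would be repeated from many random starting points and the resulting assignments summarized in a consensus matrix, which this sketch leaves out for brevity.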
Clusters 1 and 2 are diametrically opposed to each other and reasonably uniform within each cluster, suggesting a global transcriptional shift. Cluster 3 is more heterogeneous, with some apparent substructure; furthermore, computational analysis indicates that Cluster 3 is not a mixture of populations in the Cluster 1 and Cluster 2 states.

Plasmodium falciparum has a complex life cycle with distinct expression profiles in erythrocytes, so we performed Gene Set Enrichment Analysis (GSEA) to investigate the possibility that the clusters represent different life cycle stages. GSEA identified gene sets differentially expressed between clusters, supporting the idea that a major metabolic shift occurs between Cluster 1 and Cluster 2, in which gene sets associated with a starvation response and with glycolytic metabolism, respectively, were induced (Figure 3).

Extensive knowledge, such as expression profiles and gene modules, is available on the responses of Saccharomyces cerevisiae, so we projected a large S. cerevisiae expression compendium onto the expression space defined by the three P. falciparum NMF clusters (Figure 4). For each of the parasite clusters, we identified a set of similar S. cerevisiae profiles and examined their biological annotations. Cluster 2 matched S. cerevisiae profiles associated with normal fermentative growth (168/287 experiments, P = 2.3 × 10^-23). Cluster 1 matched profiles associated with S. cerevisiae starvation responses (44/113, P = 1.5 × 10^-7), as well as mutations in the general transcription machinery (23/53 experiments, P = 2.8 × 10^-5). Cluster 3 was strongly associated with S. cerevisiae environmental stress experiments (278/438, P = 4.6 × 10^-22), consistent with the patients’ elevated levels of inflammation. These findings imply that Cluster 2 is consistent with glycolytic growth in vivo, while the starvation response seen in Cluster 1 suggests a metabolic shift in the asexual stage of P. falciparum.

Cluster 3 was strongly associated with S. cerevisiae profiles measured under environmental stress (e.g. heat shock, oxidative stress, osmotic stress) and showed clear correlation with the patients’ clinical phenotypes. In particular, these patients had higher temperature and inflammation and elevated levels of the cytokines IL6 and IL10, which have been associated with more severe outcomes [9]. It has been previously demonstrated that parasite biology can change in response to environmental cues [10,11], such as heat shock treatment in vitro resulting in increased virulence.

Thus, to study the clinical significance of these clusters, we analyzed the human host response. NMF consensus clustering of the human expression profiles did not match the parasite clusters, although three different clusters did emerge whose significance remains to be elucidated (Figure 5). GSEA of the human clusters revealed significant enrichment of many gene sets (FDR ≤ 0.05), e.g. DNA replication, RNA transcription, and DNA repair. Using the parasite clusters as the phenotype for GSEA, gene sets related to carbon sources (e.g. fatty acid metabolism, nitrogen metabolism, and glycolytic pathways) were not enriched, suggesting that the parasite response may not induce the host to seek alternate metabolic pathways to compensate for the parasite’s metabolism.

Using each of the clinical features (parasitemia, hematocrit, IL6, IL10, IL15, IL12p70, C-reactive protein, TNF, IFN, ICAM1, VCAM1, lymphotactin, fibrinogen, TGF, tissue factor, P-selectin, and glucose sera levels) as the phenotype, GSEA of both the host and parasite data returned enrichment of a variety of gene sets. Inflammatory response and oxidative phosphorylation gene signatures were enriched in the patient expression profiles, and in the parasite, gene sets related to ribosomal proteins, cell cycle, and virulence were enriched (FWER ≤ 0.05). The leading-edge subset is a set of genes that can be interpreted as the core that drives the overall enrichment signal.
Thus, for each of the analyses performed on the host and parasite data, the leading edges of both positively and negatively correlated gene sets were extracted and run through an overlap analysis, which extracts the union of these core enrichment genes. From this union, a matrix of gene sets vs. genes was then clustered in Genomica [12]. Only genes with five or more hits across gene sets were included in this clustering (Figure 6). NMF consensus clustering of the host expression profiles of the clinically correlated genes did not reveal clustering associated with the parasite clusters (Figure 7). However, NMF clustering of the parasite expression profiles using this gene set reflected the previously identified physiologically distinct clusters. It also appears that Cluster 3 may have more substructure. Interestingly, in the NMF clustering with k = 2, one cluster contained nearly all of parasite Cluster 2 and two samples of Cluster 3, and the second cluster contained all of parasite Cluster 1 and the remainder of Cluster 3. Patients in this latter cluster were associated with higher temperature, elevated markers of inflammation, and anemia.

CONCLUSIONS & FUTURE WORK

Pathogenesis studies in other systems have demonstrated that organisms have distinct biology in vivo as compared to in vitro models and that some of these differences relate to virulence [13]. Little is known about the biology of P. falciparum residing in the human circulation. Our results establish at least three distinct physiological states related to glycolytic growth, a starvation response, and a general stress response associated with clinical outcomes. In addition, these metabolic shifts may be driven by the host inflammatory response or by parasite genetics. It is important to note the induction of adherence and virulence genes in Cluster 1, and that both this cluster and Cluster 3 are significantly associated with host inflammation.
Finally, if the distinct profiles represent persistent physiological differences, they may identify novel drug targets for malaria or suggest alternative therapies. We can also apply these findings to update current in vitro experiments by varying carbon sources and exploring the response to cytokines.

METHODS SUMMARY

Patient population and sample handling. Venous blood samples from P. falciparum infected patients in Senegal were directly added to Tri-Reagent BD (Molecular Research Center, Cincinnati, OH). This cohort consisted of patients who presented to the district hospital in Velingara, Senegal with fever and symptoms suggestive of malaria. Enrollment required a ≥1% P. falciparum infection. RNA was isolated, and steady-state parasite mRNA levels of forty-three samples were determined using a custom-made Affymetrix chip based on the 3D7 genome, as previously reported [7].

Transcriptional Analysis. The patient-derived transcriptional profiles were normalized to each other and to previously published in vitro datasets [6-8]. Samples were clustered using a Non-Negative Matrix Factorization (NMF) [5] procedure that finds a small number of gene combinations (metagenes) that best capture the behavior of an expression data set. The number of clusters was determined using consensus clustering and maximizing the cophenetic correlation coefficient. To project yeast expression data onto our parasite data set, we first identified 1247 S. cerevisiae genes that have P. falciparum orthologues. We then used a Support Vector Machine predictor to project 1449 previously published S. cerevisiae expression profiles into the three-metagene NMF representation described above, with a confidence level determined by a Brier score [14]. Experiments scoring highly in a given factor were associated with the P. falciparum cluster represented by that factor. We then used a hypergeometric enrichment test to identify biological conditions enriched in the profiles associated with each cluster.
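
For readers unfamiliar with the hypergeometric enrichment test mentioned above, the sketch below shows one common way to compute it: the upper-tail probability of seeing at least a given overlap when a fixed number of profiles is drawn without replacement from an annotated population. This is an assumption about the form of the test, not a reproduction of the authors' code; the function name `hypergeom_enrichment_p` and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Return P(X >= overlap) for a hypergeometric variable X.

    population: total number of profiles considered
    annotated:  profiles in the population carrying the annotation of interest
    drawn:      profiles associated with the cluster being tested
    overlap:    profiles that are both associated and annotated
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass over every overlap at least as large as observed.
    tail = sum(
        comb(annotated, x) * comb(population - annotated, drawn - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 profiles, 5 annotated, 6 associated, 4 in common.
example_p = hypergeom_enrichment_p(population=20, annotated=5, drawn=6, overlap=4)
```

Exact integer binomial coefficients are used here so that very small tail probabilities are not lost to floating-point underflow before the final division.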
Gene sets that are differentially expressed between clusters were identified by GSEA [15], based on a weighted Kolmogorov-Smirnov-like statistic. GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes). When analyzing the host-pathogen interactions in GSEA, the phenotype profile was the profile of the clinical feature of interest; the selected clinical feature served as a continuous phenotype label. The family-wise error rate (FWER) is a conservatively estimated probability that the normalized enrichment score represents a false positive finding. The false discovery rate (FDR) is the estimated probability that the normalized enrichment score represents a false positive finding.

APPENDIX

FIGURE 1: Parasite clusters and associated patient clinical features
FIGURE 2: Parasite samples reflecting the ring stage present in all three clusters
FIGURE 3: GSEA of parasite Clusters 1 & 2
FIGURE 4: Yeast expression profiles projected onto parasite clusters
FIGURE 5: Hierarchical and NMF clustering of human expression profiles
FIGURE 6: Gene vs. Gene Set Matrices of clinical correlates – Host and Parasite
FIGURE 7: Clustering of clinically correlated genes in host and parasite samples

REFERENCES

1. Angulo, I. & Fresno, M. Cytokines in the pathogenesis of and protection against malaria. Clin Diagn Lab Immunol 9, 1145-52 (2002).
2. Miller, L.H., Baruch, D.I., Marsh, K. & Doumbo, O.K. The pathogenic basis of malaria. Nature 415, 673-9 (2002).
3. Llinas, M., Bozdech, Z., Wong, E.D., Adai, A.T. & DeRisi, J.L. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res 34, 1166-73 (2006).
4. Jason, J. et al. Cytokines and malaria parasitemia. Clin Immunol 100, 208-18 (2001).
5. Brunet, J.P., Tamayo, P., Golub, T.R. & Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 101, 4164-9 (2004).
6. Bozdech, Z. et al. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1, E5 (2003).
7. Le Roch, K.G. et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301, 1503-8 (2003).
8. Young, J.A. et al. The Plasmodium falciparum sexual development transcriptome: a microarray analysis using ontology-based pattern identification. Mol Biochem Parasitol 143, 67-79 (2005).
9. Lyke, K.E. et al. Serum levels of the proinflammatory cytokines interleukin-1 beta (IL-1beta), IL-6, IL-8, IL-10, tumor necrosis factor alpha, and IL-12(p70) in Malian children with severe Plasmodium falciparum malaria and matched uncomplicated malaria or healthy controls. Infect Immun 72, 5630-7 (2004).
10. Pavithra, S.R., Banumathy, G., Joy, O., Singh, V. & Tatu, U. Recurrent fever promotes Plasmodium falciparum development in human erythrocytes. J Biol Chem 279, 46692-9 (2004).
11. Udomsangpetch, R. et al. Febrile temperatures induce cytoadherence of ring-stage Plasmodium falciparum-infected erythrocytes. Proc Natl Acad Sci U S A 99, 11825-9 (2002).
12. Segal, E., Friedman, N., Kaminski, N., Regev, A. & Koller, D. From signatures to models: understanding cancer using microarrays. Nat Genet 37 Suppl, S38-45 (2005).
13. Mahan, M.J., Slauch, J.M. & Mekalanos, J.J. Selection of bacterial virulence genes that are specifically induced in host tissues. Science 259, 686-8 (1993).
14. Tamayo, P. et al. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proc Natl Acad Sci U S A 104, 5959-64 (2007).
15. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-50 (2005).