Gene Expression and Cell Identity
Alexander Diehl
ImmPort Science Talk, 3/20/14

Understanding the Nature of Entities in Reality Depends on What Parts We See

Reality is Often More Complex Than at First Glance

And Perspective is Important

What are Cells?
• Cells are physical entities that exist in reality.
• We understand aspects of cells based on the results of experimental assays.
• Our knowledge of cell types is necessarily incomplete even as we attempt to understand their nature.

How We Represent Cells in the Cell Ontology
• Morphology
• Surface marker expression, singly or in combination
• Transcription factor expression or expression of other internal proteins
• By lineage
• By function or capability

Types of Evidence Behind the Representation of Cells in CL
• Microscopy, with or without staining (histology)
• Immunofluorescence in situ or in vitro
• Flow cytometry or CyTOF
• Colony formation assays
• In vivo/in vitro lineage tracking
• Direct assays of cellular function, typically in vitro
• Indirect assays of cellular function in vivo
• And rarely, assays of gene expression

Experimental Data from Multiple Sources Is Synthesized into a Single Definition in CL

Challenges in Ontology Building
• We want to represent both general cell types and specific cell types.
• Many cell types are considered equivalent across species in their general characteristics, such as surface marker expression or function.
• Hematopoietic cell types in different species, such as mouse and human, are sometimes given the same name but are defined by different sets of surface markers.

Challenges in Ontology Building
• We need to provide unique representations via logical definitions for each cell type.
• We need to recognize that in some cases, different combinations of markers may identify the same cell type.

The HIPC Lyoplate Project
• Standardization of human PBMC immunophenotyping to enhance reproducibility across different facilities
• Use of eight-color flow cytometry with standardized antibody panels
• Use of standardized sample preparation
• Use of standardized instrument settings
• Use of standardized data analysis

HIPC-Defined Cell Types in CL

“effector CD8+ T cell”

A request to CL…
TEMRA = a memory T cell without CD45RO expression.

Are These the Same Cell Type?

But… “effector CD4+ cell”?

The Gene Expression Part
Questions:
Can we use the structure of the Cell Ontology as a framework for comparing gene expression data tied to specific cell types?
Can we use the CL framework to identify genes that distinguish one cell type from closely related cell types?

The Immunological Genome Project

Linking IGP data to CL cell types
• IGP provides gene expression data for sorted mouse immune cell types, generated according to standardized methods.
• We mapped IGP cell types to cell type terms in the Cell Ontology.
• The CL structure was used to guide comparisons between gene expression profiles of different cell types.

Linking IGP data to CL cell types
• 88 cell types were compared in a pairwise fashion.
• For each pairwise comparison, separate gene sets were created for genes whose expression was higher or lower by more than 1.5-fold, respectively.
• 7,656 gene sets resulted, consistent with 88 × 87: one up-regulated and one down-regulated set for each of the 3,828 pairs of cell types.
• An ontological framework was created to map these gene sets to Cell Ontology classes.

Workflow of CL-IGP Project
[Flowchart, in slide order: Map IGP samples to CL; Generate pairwise comparisons (PWC) from IGP dataset; Map PWCs to the CL (Figure 2); Choose a CL term; Does the CL term or a descendant have a mapped PWC? (yes / no); How many PWCs are shared with a neighbor or one of its descendants? (0 / 1 / >1); Find intersection (shared genes) of PWC matrix (Figure 3; genes from PWC #1, genes from PWC #2); Associate genes with CL term; DO NOT associate genes with CL term.]

A. Find pairwise comparisons:
“that have the name ‘germinal center B cell’ in ‘Parent’ that can be reached by a ‘has_up_regulated_genes’” and
“that have the name ‘marginal zone B cell’ in ‘Parent’ that can be reached by a ‘has_down_regulated_genes’”
OUTPUT = 1 gene set

B. Find pairwise comparisons:
“that have the name ‘germinal center B cell’ in ‘Parent’ that can be reached by a ‘has_up_regulated_genes’” and
“that have the name ‘mature B cell’ in ‘Ancestor’ that can be reached by a ‘has_down_regulated_genes’”
OUTPUT = 7 gene sets

C. Find pairwise comparisons:
“that have the name ‘mature B cell’ in ‘Ancestor’ that can be reached by a ‘has_up_regulated_genes’” and
“that have the name ‘mature lymphocyte’ in ‘Ancestor’ that can be reached by a ‘has_down_regulated_genes’” and
“that do NOT have the name ‘mature B cell’ in ‘Ancestor’ that can be reached by a ‘has_down_regulated_genes’”
OUTPUT = 256 gene sets

Searching for Pairwise Comparisons Mapped to CL
To find genes that distinguish mature B cells from other lymphocytes…
B cell genes ↑ compared to mature NK cells
B cell genes ↑ compared to mature T cells

…An R function takes all relevant pairwise comparisons (Fig. 2c) as input…

ID          Name  Definition
GS:9100825  825d  CD4+ NKT cell of liver vs marginal zone B cell
GS:9100826  826d  CD4+ NKT cell of spleen vs marginal zone B cell
GS:9100831  831d  Vg2- γδ T cell of spleen vs marginal zone B cell
…Etc. for 253 more pairwise comparisons

…finds the genes present in all comparisons…

825d               826d               831d
ID        logFC    ID        logFC    ID        logFC
10361292  -8.49    10405216  -5.13    10405216  -4.82
10405216  -5.01    10361292  -8.56    10361292  -8.01
10467508  -5.99    10567863  -6.21    10467508  -5.66
10567863  -6.57    10467508  -5.64    10420758  -5.49
10420758  -5.56    10420758  -5.50    10467258  -5.26
10467258  -5.35    10492983  -5.59    10567863  -6.11
10562132  -5.86    10368675  -6.67    10492983  -5.40
10495781  -5.09    10562132  -5.70    10384974  -4.82

…And then outputs genes ranked by mean fold change.

Summary of Upregulated and Downregulated Genes by Cell Type

Novel Genes Identified for Specific Cell Types and Confirmed by IGP
Scd1: Scd1 is an enzyme involved in the biosynthesis of monounsaturated fatty acids; its expression is restricted to mature B cell types in comparison to other immune cell types.
I830077J02Rik: I830077J02Rik, an otherwise uncharacterized single-pass transmembrane protein, is widely expressed among myeloid cells. In lymphocytes, expression of this protein is restricted to marginal zone B cells.
[Figure: expression plots for Scd1 and I830077J02Rik; y-axis: normalized gene expression]
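The intersect-and-rank step described on the "Searching for Pairwise Comparisons Mapped to CL" slide can be sketched roughly as follows. The talk says this was done with an R function; the Python below is only an illustrative stand-in, not that function. The name `shared_genes_ranked`, the choice to average logFC values, and the choice to rank by absolute mean are all assumptions. The input holds only the eight rows visible on the slide for each of the three comparisons, so the result reflects those excerpts rather than the full 256 gene sets.

```python
from statistics import mean

# Excerpts of three pairwise comparisons as shown on the slide
# (probe ID -> logFC). Only the rows visible on the slide are included.
pairwise_comparisons = {
    "825d": {
        10361292: -8.49, 10405216: -5.01, 10467508: -5.99, 10567863: -6.57,
        10420758: -5.56, 10467258: -5.35, 10562132: -5.86, 10495781: -5.09,
    },
    "826d": {
        10405216: -5.13, 10361292: -8.56, 10567863: -6.21, 10467508: -5.64,
        10420758: -5.50, 10492983: -5.59, 10368675: -6.67, 10562132: -5.70,
    },
    "831d": {
        10405216: -4.82, 10361292: -8.01, 10467508: -5.66, 10420758: -5.49,
        10467258: -5.26, 10567863: -6.11, 10492983: -5.40, 10384974: -4.82,
    },
}


def shared_genes_ranked(comparisons):
    """Hypothetical helper: keep probes present in every comparison and
    return (probe_id, mean_logFC) pairs ordered by decreasing |mean logFC|."""
    tables = list(comparisons.values())
    if not tables:
        return []
    # Intersection of probe IDs across all comparisons.
    shared = set(tables[0]).intersection(*tables[1:])
    # Mean logFC per shared probe (assumption: the mean is taken over logFC).
    means = {pid: mean(table[pid] for table in tables) for pid in shared}
    # Assumption: "ranked" means largest absolute change first.
    return sorted(means.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = shared_genes_ranked(pairwise_comparisons)
for probe_id, mean_logfc in ranked:
    print(probe_id, round(mean_logfc, 2))
```

On these excerpts, five of the probe IDs appear in all three comparisons; probes that appear in only one or two of them are dropped before ranking.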
[Figure, panels A and B: GC B cell ↑ vs activated NK cell; GC B cell ↑ vs MZ B cells; GC B cell ↑ vs activated CD8 T cell; 550 genes > 2-fold up]

C
Gene symbol  Fold Change  Stdv  Affy ID   MGI ID
Rgs13        102.5        6.3   10358399  MGI:2180585
Igj          86.2         22.9  10531126  MGI:96493
Aicda        74.3         13.2  10541507  MGI:1342279
Mybl1        62.2         50.9  10353010  MGI:99925
Ighv1-43     53.5         46.4  10403021  MGI:3704124
Gm600        51.8         5.4   10416006  MGI:2685446
Rasgrp3      48.6         43.6  10446965  MGI:3028579
Gcet2        47.3         2.0   10436024  MGI:102969
Rassf6       44.1         1.5   10531261  MGI:1920496
IgKv         30.0         25.1  10545190  *

Candidate Genes Involved in the Unique Functions of Germinal Center B cells

Conclusions
• Gene expression comparisons placed in an ontology framework can provide details about genes uniquely expressed in particular immune cell subtypes.
• Results of our approach have been validated against non-ontology-based analyses of IGP data, for instance for NK cells, and similar results are seen.

Acknowledgements
• Barry Smith
• Alan Ruttenberg
• Ryan Brinkman
• Raphael Gottardo
• Richard Scheuermann
• David Dougall
• Holden Maeckler
• Philip McCoy
• Terry Meehan
• Nicole Vasilevsky
• Chris Mungall
• Melissa Haendel
• Judy Blake
• And many other contributors to the CL