Ontologies for Immunology
Barry Smith

https://immport.niaid.nih.gov/

NIAID Division of Allergy, Immunology and Transplantation (DAIT)
Areas of Research
• Allergic Diseases
• Asthma
• Autoimmune Diseases
• Food Allergy
• Immune Tolerance
• Medical Countermeasures Against Radiological and Nuclear Threats
• Transplantation

DAIT-Funded Projects Depositing Data into ImmPort
• Immune Tolerance Network (ITN)
• Atopic Dermatitis and Vaccinia Network (ADVN)
• Population Genetics Analysis Program
• Immune Function and Biodefense in Children, Elderly, and Immunocompromised Populations
• HLA Region Genetics in Immune-Mediated Diseases
• Modeling Immunity for Biodefense

Goals of ImmPort
• Accelerate a more collaborative and coordinated research environment
• Create an integrated database that broadens the usefulness of scientific data
• Advance the pace and quality of scientific discovery while extending the value of scientific data in all areas of immunological research
• Integrate relevant data sets from participating laboratories, public and government databases, and private data sources
• Promote rapid availability of important findings
• Provide analysis tools to advance immunological research

Standards and ontologies are needed to advance this goal especially
• Integrate relevant data sets from participating laboratories, public and government databases, and private data sources

SDY 165: Characterization of in vitro Stimulated B Cells from Human Subjects shared to SemiPublic Workspace (SPW) Project
During the human B cell (Bc) recall response, rapid cell division results in multiple Bc subpopulations. RNA microarray and functional analyses showed that proliferating CD27lo cells are a transient pre-plasmablast population, expressing genes associated with Bc receptor editing.
Undivided cells had an active transcriptional program of non-ASC B cell functions, including cytokine secretion and costimulation, suggesting a link between innate and adaptive Bc responses. Transcriptome analysis suggested a gene regulatory network for CD27lo and CD27hi Bc differentiation.

Functionally Distinct Subpopulations of CpG-Activated Memory B Cells, Alicia D. Henn … & Martin S. Zand, Pubmed 2246822
Figure 5: The Fate of CD27lo cells

ImmPort Antibody Registry

BD Lyoplate Screening Panels: Human Surface Markers
http://pir.georgetown.edu/cgibin/pro/entry_pro?id=PR:000001963

Discoverability
• Find all ImmPort studies involving genes associated with B cell receptor editing
• Find all data in public and government databases relating to B cell receptor editing

GOPubMed: 179 documents
(B cell receptor editing Zand) AND ("Zand"[au])

SDY 165 Discoverability
During the human B cell (Bc) recall response, rapid cell division results in multiple Bc subpopulations. RNA microarray and functional analyses showed that proliferating CD27lo cells are a transient pre-plasmablast population, expressing genes associated with Bc receptor editing. Undivided cells had an active transcriptional program of non-ASC B cell functions, including cytokine secretion and costimulation, suggesting a link between innate and adaptive Bc responses. Transcriptome analysis suggested a gene regulatory network for CD27lo and CD27hi Bc differentiation.

NIAID Sample Data Sharing Plan (Last Reviewed February 12, 2013)
• Sharing of data generated by this project is an essential part of our proposed activities and will be carried out in several different ways.
• Presentations at national scientific meetings. … it is expected that approximately four presentations at national meetings would be appropriate. …
• Annual lectureship. A lectureship has brought to the University distinguished scientists and clinicians …
• Newsletter.
The [disease interest group] publishes a newsletter …
• Web site of the Interest Group. The [interest group] currently maintains a Web site where information [about the disease] is posted. Summaries of the scientific presentation from the [quarterly project] meetings will be posted on this Web site, written primarily for a general audience. [Link to Web site]
• Annual [Disease] Awareness week …
• SAGE Library Data. It is our explicit intention that these [Serial analysis of gene expression] data will be placed in a readily accessible public database. …

Plan addressing Key Elements for a Data Sharing Plan under NIH Extramural Support (Last Reviewed August 09, 2012)
What data that will be shared [sic]: I will share phenotypic data associated with the collected samples by depositing these data at _______ which is an NIH-funded repository. ... Additional data documentation and de-identified data will be deposited for sharing along with phenotypic data, which includes demographics, family history of XXXXXX disease, and diagnosis, consistent with applicable laws and regulations. … Meta-analysis data and associated phenotypic data, along with data content, format, and organization, will be available at ___. Submitted data will confirm [sic] with relevant data and terminology standards.
Who will have access to the data: …
Where will the data be available: …
When will the data be shared: …
How will researchers locate and access the data: I agree that I will identify where the data will be available and how to access the data in any publications and presentations that I author or co-author about these data … repository has policies and procedures in place that will provide data access to qualified researchers, fully consistent with NIH data sharing policies and applicable laws and regulations.
What should be required / recommended
• LOINC
• CDISC
• BRIDG
• SNOMED
• Minimal Information Checklists
• OBI
• Immunology Ontologies

Lab resources do not recommend use of LOINC
http://www.niaid.nih.gov/LabsAndResources/resources/DAIDSClinRsrch/Documents/gclp.pdf

Can some of these be built into
• LDMS
• LIS
• CTMS
• EHRs

Need to explore the relation of SDY data in ImmPort to CDISC

From the Summary specification: CDISC definitions
• A study arm represents one planned path through the study. The path is composed of a study cell for each epoch in the study.
• A study cell is the part of study design that describes what happens in a particular epoch for a particular arm. The cell describes how the purpose of its epoch is fulfilled for each arm.

CDISC Structural Elements
Building blocks of a study design: Epochs, Cells, Arms, Segments, Activities
• A study arm represents a path
• A path is composed of a cell
• A cell is a part of a study design
• A cell describes what happens in an epoch
• An Activity represents a point in a study at which a specific action is to be taken.

BRIDG
• http://bridgmodel.nci.nih.gov/files/BRIDG_Model_3.2_html/index.htm

BRIDG Adverse Event

Product Brief - Biomedical Research Integrated Domain Group (BRIDG)
Implementations / Case Studies (Actual Users)
• The BRIDG Project is a collaborative effort engaging stakeholders from four organizations:
  – Clinical Data Interchange Standards Consortium (CDISC)
  – HL7 Regulated Clinical Research Information Management Working Group (HL7 RCRIM WG)
  – National Cancer Institute (NCI), including the Cancer Biomedical Informatics Grid (caBIG™) project
  – Food and Drug Administration (FDA)

Does BRIDG have any users?
Ballot Cycle Info | Misc Notes | Product Type | Project Document Repository
2010 May Ballot Cycle Info: INFORMATIVE
Ballot results: Met basic vote requirements for approval.
24 Negative votes to reconcile
Document Name: HL7 Version 3 Domain Analysis Model: Biomedical Research Integrated Domain (BRIDG), Release 1
Ballot Code: V3DAM_BRIDG_R1_TBD
NIB Submitted By: Edward Tripp
May 2013: Published NE 2013; PMO archiving.
March 2013: Received ANSI approval for Technical Report of HL7 Version 3 Domain Analysis Model: Biomedical Research Integrated Domain (BRIDG), Release 1. PMO set status as Ready for Normative Edition Publication. Project Team does not need to do any further work.
Nov 2012: TSC approved a publication request for HL7 Version 3 Domain Analysis Model: Biomedical Research Integrated Domain (BRIDG), Release 1 as an informative document and registration with ANSI as a Technical Report, from TSC tracker # 2406.
As of 2012-09-01: Project restarted. BRIDG project team re-writing new NWIP for submission September 2012.
2011Dec: Per L. Laakso, the BRIDG project at PI 538 is an ISO/JIC project (ISO/CD 14199). PMO changed it to be an ISO/JIC project type.
2010Nov(LL): Updated dates per RCRIM three-year plan. Cyclical development indicative of informative ballot development Jan cycles, ballot May cycles, publication and renewed development in September cycles.
2010July: Added repository URL.
Sept 2009: PMO changed from 3-Year Plan item to Active Project.

Domain Analysis Model (DAM)
http://wiki.hl7.org/index.php?title=BRIDG_as_DAM

What should be required / recommended
• LOINC
• CDISC
• BRIDG
• SNOMED
• Minimal Information Checklists
• OBI
• Immunology Ontologies

MIFLOWCYT: Minimal Information for a Flow Cytometry Experiment

Minimal Information about a Cellular Assay
Project Header
Source: Contact details of researcher/person in charge of the project (name, affiliation/institution, department, address, Email, etc.).
Project: Description (text) of the project within a larger context (biological process that is addressed; description of measured effect, controls, etc.).
Application: Description (text) of the specific application of this project (abstract; reference to publication).
Array(s): Culturing and reaction container(s) that are used during the project (name; identifier; type (e.g. 384, 96, 24 well, flask, glass slide for cell arrays); vendor or manufacturer; order-number; surface area/feature size).
CellLine(s): Description of cell lines employed (name; identifier; ATCC number (if applicable) or details: Species, tissue, organ, contact-details (when from different lab); reference to publication; passage number; mycoplasma test (Y/N) and other validation; modifications (optional, if any made, e.g. stably transfected, induced resistance, etc.)).
Reagents: Media, supplements, kits, buffers, and solutions (name, identifier, vendor or manufacturer, order number, lot number).
Perturbator(s): Description of materials/conditions that are used in the project to perturb the cells (type - e.g. siRNA, cDNA, small chemical compound, name, external references, gene/protein identifiers/order numbers, sequences if applicable).
Instrument(s): Description of the data acquisition station and other instruments utilized in the project, e.g. for transfection (name, type, model, manufacturer).

Minimal Information about a Cellular Assay
Experimental modules
Treatment(s): Description of the conditions that are applied to the cells during culturing (name, identifier, time-stamp, materials used, volume, actual passage number of cells, seeding density, temperature, CO2-content, humidity).
Perturbation(s): Description of the perturbation (a special case of a treatment), which describes the application of ‘perturbator(s)’, e.g. transfection (siRNA, expression clone), treatment with small compound, temperature shift, etc.
PostTreatment(s): Description of the conditions that are applied after culturing and prior to data acquisition, i.e. lysis, fixation, staining, antibody incubation, etc.
DataAcquisition(s): Detection of the effect(s) induced by the perturbation (identifier, time stamp, reference to instrument above, instrument-settings, e.g. excitation and emission wavelengths with filter sets, lamp/Laser energy).
DataProcessing: Description of the processes applied to analyze the raw-data in order to generate a hit list. Reference to publication(s) describing the procedures and/or to software utilized (incl. version and settings). Links to the raw, the processed, and to the interpreted data.

Minimal Information about a T-Cell Assay
• Janetzki S, Britten CM, Kalos M, Levitsky HI, Maecker HT, Melief CJ, Old LJ, Romero P, Hoos A, Davis MM. 2009. "MIATA"-Minimal Information about T Cell Assays. Immunity. 31: 527-8
http://www.miataproject.org/

http://mibbi.sourceforge.net/portal.shtml
MIBBI = Minimal Information about a Biological or Biomedical Investigation
• How to make MIBBI checklists non-redundant, factorable, such as to support interoperability / sharing of data?
• Need to use the same words for the same things and events in each checklist
• OBI = Ontology for Biomedical Investigations

OBI representation of a neuroscience study
OBI: Vaccine Protection Investigation
OBI ontology terms (examples)
OBI Relations

Advantages of OBI
• Federal reporting (drug trials)
• Enhancement of plans for data sharing
• Supports retrieval and meta-analysis by allowing searches over protocols, metadata about experiments, imaging processes, statistical processes, sources of analytes, equipment vendors, …
• Supports comparison of runs performed by different labs on the same machines, using the same sorts of settings, stainings, samples …

ImmPort (recommended) Ontologies
Chemical Entities of Biological Interest (CHEBI)
The Protein Ontology (PRO)
The Gene Ontology (GO)
GO Annotation for the Immune System
The Cell Ontology (CL)
Beta Cell Genomics Ontology (BCGO)
The Immune Epitope Ontology (ONTIE)
The Infectious Disease Ontology (IDO)
Ontology for General Medical Science
(OGMS)

Desiderata
Allergy Ontology
Immunology Ontology
Autoimmune Disease Ontology

How to advance interoperability of immunology (clinical trial) data
How to connect
• experimental data
• ontology resources?
1. identify ways that submitters of data can benefit early on through use of ontologies
2. identify ways to make it easier for submitters of data to use ontologies
3. identify ways to annotate data using ontologies post-submission

1. identify ways that submitters of data can benefit early on through use of ontologies
a. help to satisfy NIH mandates – can we influence NIH mandates?
b. help to satisfy regulatory reporting requirements (FDA)
c. help to analyze their data (and to incorporate other sorts of data)
d. help them to do better science by imitating successes of others who have exploited ontologies

2. identify ways to make it easier for submitters of data to use ontologies
a. build them into EHRs – can we influence EHR vendors?
b. build them into Lab Information Systems
c. build them into Clinical Trial Management Systems

3. identify ways to annotate data using ontologies post-submission
a. NLP
   - New tools for classification and monitoring of autoimmune diseases. Maecker HT, Lindstrom TM, Robinson WH, Utz PJ, Hale M, Boyd SD, Shen-Orr SS, Fathman CG. Nat Rev Rheumatol. 2012 May 31;8(6):317-28
   - Towards a Cytokine-Cell Interaction Knowledgebase of the Adaptive Immune System. Shen-Orr SS, Goldberger O, Garten Y, Rosenberg-Hasson Y, Lovelace PA, Hirschberg DL, Altman RB, Davis MM, Butte AJ. Pacific Symposium on Biocomputing 2009:439-450
b. manual annotation of ImmPort summaries
c. ?

The Infectious Disease Ontology
with thanks to Albert Goldfain and Lindsay G.
Cowell

IDO-Core
• OBO Foundry ontology based on BFO and OGMS
• Contains general terms in the ID domain:
  • E.g., ‘colonization’, ‘pathogen’, ‘infection’
• Intended to represent information along several dimensions:
  • biological scale (gene, cell, organ, organism, population)
  • discipline (clinical, immunological, microbiological)
  • organisms involved (host, pathogen, and vector types)
• A hub for further extension ontologies
• A contract between IDO extension ontologies and the datasets that use them.

“Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease”

ICD 9: Catch-all Codes and Scattered Exclusions
• 041 Bacterial infection in conditions classified elsewhere and of unspecified site
  • Note: This category is provided to be used as an additional code to identify the bacterial agent in diseases classified elsewhere. This category will also be used to classify bacterial infections of unspecified nature or site.
  • Excludes: septicemia (038.0-038.9)
  • 041.1 Staphylococcus
    • 041.10 Staphylococcus, unspecified
    • 041.11 Methicillin susceptible Staphylococcus aureus
      • MSSA
      • Staphylococcus aureus NOS
    • 041.12 Methicillin resistant Staphylococcus aureus
      • Methicillin-resistant staphylococcus aureus (MRSA)
    • 041.19 Other Staphylococcus
• 038 Septicemia
  • 038.1 Staphylococcal septicemia
    • 038.10 Staphylococcal septicemia, unspecified
    • 038.11 Methicillin susceptible Staphylococcus aureus septicemia
      • MSSA septicemia
      • Staphylococcus aureus septicemia NOS
    • 038.12 Methicillin resistant Staphylococcus aureus septicemia
    • 038.19 Other staphylococcal septicemia

ICD 9: Catch-all Codes and Scattered Exclusions
• 041 Bacterial infection in conditions classified elsewhere and of unspecified site
  • Note: This category is provided to be used as an additional code to identify the bacterial agent in diseases classified elsewhere.
This category will also be used to classify bacterial infections of unspecified nature or site.
  • Excludes: septicemia (038.0-038.9)
  • 041.1 Staphylococcus
    • 041.10 Staphylococcus, unspecified
    • 041.11 Methicillin susceptible Staphylococcus aureus
      • MSSA
      • Staphylococcus aureus NOS
    • 041.12 Methicillin resistant Staphylococcus aureus
      • Methicillin-resistant staphylococcus aureus (MRSA)
    • 041.19 Other Staphylococcus
• 038 Septicemia
  • 038.1 Staphylococcal septicemia
    • 038.10 Staphylococcal septicemia, unspecified
    • 038.11 Methicillin susceptible Staphylococcus aureus septicemia
      • MSSA septicemia
      • Staphylococcus aureus septicemia NOS
    • 038.12 Methicillin resistant Staphylococcus aureus septicemia
    • 038.19 Other staphylococcal septicemia
[041.19] Other Staphylococcus
[041] Bacterial infection in conditions classified elsewhere and of unspecified site.

IDO: Core and Extensions Framework

A Lattice of Lightweight Application-Specific Ontologies

IDO-Staph: Introduction
• Initial Release Candidate: http://purl.obolibrary.org/obo/ido/sa.owl
• Google Code Page: http://code.google.com/p/ido-staph/
• Scope
  • Entities specific to Staphylococcus aureus (Sa) infectious diseases at multiple granularities
  • Biological and clinical terms describing host-Sa interactions
• An IDO extension ontology
  • Extends IDO-Core, OGMS
  • BFO as an upper ontology
  • Built on OBO Foundry principles
• Applications
  • Duke Staph aureus Bacteremia Group data annotation
  • Lattice of infectious diseases

Sa Organism: Parts and Products
• Molecular Entities: Toxins, Invasins, Adhesins
from Shetty, Tang, and Andrews, 2009
Source: http://textbookofbacteriology.net/themicrobialworld/staph.html

Toxic Shock Syndrome
• Staphylococcal TSS is a ido:‘infectious disease’
  • has_material_basis SOME (Sa infectious disorder AND (has_part SOME TSST))
• TSST is a pr:protein
  • has_disposition SOME ‘exotoxin disposition’ [INF: is an exotoxin]
• tstH is a so:gene
  • has_gene_product SOME TSST
  • part_of SOME
(SaPI2 OR SaPI3)
• SaPI2 is a so:‘pathogenic island’
• SaPI3 is a so:‘pathogenic island’

Sa Diseases: Asserted Hierarchy
• Primary classification of staphylococcal diseases
  • These are first and foremost infectious diseases
  • Use DOIDs for disease terms
  • Assert ido:‘infectious disease’ as a parent term for these diseases

Sa Diseases: Inferred Hierarchy
• Secondary classification as Sa Infectious Diseases

Ways of differentiating infectious diseases
• High-level types
  • By host type (species)
  • By anatomical site of infection
  • By signs and symptoms
  • By mode of transmission
  • By (sub-)species of pathogen
• Differentiation based on host features
  • Clinical phenotype
  • Strain (e.g. A/J)
  • Gene types (e.g. C5-deficient)
  • SNP alleles
• Differentiation based on pathogen features
  • By phenotype (e.g. drug resistance)
  • By genotype
    • By banding patterns (e.g. PFGE)
    • By typing of house-keeping genes (e.g. MLST)
    • By virulence factor typing (e.g. spa, SCCmec)
    • By whole genome?
“Methicillin-Susceptible Staphylococcus aureus Endocarditis Isolates Are Associated with Clonal Complex 30 Genotype and a Distinct Repertoire of Enterotoxins and Adhesins” Nienaber et al. 2011 J Infect Dis. 204(5):704-13.
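The differentiation facets listed on the slide above can be read as keys for grouping case records. The following is a minimal Python sketch under stated assumptions: the record layout, the field names (`host_species`, `site`, `pathogen_phenotype`) and all values are invented for illustration and do not come from ImmPort, IDO-Staph, or the Nienaber et al. study.

```python
from collections import defaultdict

# Hypothetical case records; every field name and value here is
# illustrative only and is not taken from any real data set.
CASES = [
    {"id": "case-1", "host_species": "human", "site": "skin",
     "pathogen_phenotype": "methicillin-resistant"},
    {"id": "case-2", "host_species": "human", "site": "bone/joint",
     "pathogen_phenotype": "methicillin-susceptible"},
    {"id": "case-3", "host_species": "mouse", "site": "skin",
     "pathogen_phenotype": "methicillin-resistant"},
]


def differentiate(cases, facets):
    """Group case ids by the tuple of values they take on the chosen facets.

    A case that lacks a facet is filed under "unspecified" rather than dropped.
    """
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case.get(facet, "unspecified") for facet in facets)
        groups[key].append(case["id"])
    return dict(groups)


if __name__ == "__main__":
    # One facet gives a coarse partition; adding facets refines it.
    print(differentiate(CASES, ["site"]))
    print(differentiate(CASES, ["host_species", "pathogen_phenotype"]))
```

Each additional facet can only split existing groups further, which is one informal way to picture how host-based and pathogen-based criteria combine into finer disease types.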
Ways of differentiating Staph aureus infectious diseases
• Sa Infectious Disease
  • By SCCmec type
  • By ccr type
  • By mec class
  • By spa type
• IWG-SCC
  • Maintains up-to-date SCCmec types
  • General guidelines for reporting novel SCCmec elements
http://www.sccmec.org/Pages/SCC_ClassificationEN.html

SCCmec (Staphylococcal Cassette Chromosome mec)
• A mobile genetic element in Staphylococcus aureus that carries the central determinant for broad-spectrum beta-lactam resistance encoded by the mecA gene and has the following features:
  • (1) carriage of mecA in a mec gene complex,
  • (2) carriage of ccr gene(s) (ccrAB or ccrC) in a ccr gene complex,
  • (3) integration at a specific site in the staphylococcal chromosome, referred to as the integration site sequence for SCC (ISS), which serves as a target for ccr-mediated recombination, and
  • (4) the presence of flanking direct repeat sequences containing the ISS.

Representing SCCmec: IDO-Staph + SO
[Figure: is_a / has_part diagram relating SCCmec Type IV, mec complex class B, ccr complex Type 2, IS1272, mecA, ccrA2 and ccrB2 to the terms gene group, pathogenic island, SCCmec, mec complex, ccr complex, insertion sequence and gene]

NARSA Isolate Data
• Isolate data from the Network on Antimicrobial Resistance in Staphylococcus aureus (NARSA)
• CDC Active Bacterial Core surveillance (ABCs) Isolates Subset
• Known Clinically Associated Strains
• 101 Sa isolates
• Isolate data
  • Culture source (e.g. bone/joint)
  • Antimicrobial profile (e.g. erythromycin resistant)
  • Virulence factors expressed (e.g. TSST-1+)
  • PFGE type (e.g. USA300)
  • Genomic typing (e.g.
MLST type 8, SCCmec type IV)

Building the Lattice
• For each NARSA Isolate we extract
  • SCCmec Type (IDO-STAPH)
  • TSST +/- (IDO-STAPH)
  • PVL +/- (IDO-STAPH)
  • Culture Source (FMA)
  • Antimicrobial Profile
    • Drug (CHEBI)
    • Minimum Inhibitory Concentration (OBI)
    • CLSI Interpretation of Resistance (IDO)
• Each particular isolate can be part of a particular Staph aureus infectious disorder.
• Each particular Staph aureus isolate can be the material basis for a particular Staph aureus infectious disease.

Resistance of NRS701 to clindamycin
• resistance_of_Iso_to_D instanceOf resistance_to_D
• Iso has_disposition ‘resistance_of_Iso_to_D’
• Iso_D_MIC instanceOf ‘MIC data item’
• Iso_D_MIC has_measurement_value M

Faceted Browser
• http://awqbi.com/LATTICE/narsa-complete.html
• http://purl.obolibrary.org/obo/ido/sa/narsa-isolates.owl

Conclusion
• Good web resources on Staph aureus exist…
  • IWG-SCC
  • NARSA
  • Comprehensive Antibiotic Resistance Database (CARD)
• …but they currently sit in information silos and flat HTML
• Disease-specific application ontologies can be induced from isolate data
• Each such application ontology
  • Has a well-defined place in the lattice beneath IDO-Core
  • Can be used to make Sa-specific genetic-phenotypic assertions.
• We believe an IDO-based lattice of application ontologies can contribute to a new taxonomy of (infectious) disease.

Acknowledgements
• This work was funded by the National Institutes of Health through Grant R01 AI 77706-01. Smith’s contributions were funded through the NIH Roadmap for Medical Research, Grant U54 HG004028 (National Center for Biomedical Ontology).
• Duke SABG
• IDO Consortium
• OBO Foundry
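The four-line pattern on the "Resistance of NRS701 to clindamycin" slide can be instantiated mechanically for any isolate and drug. The sketch below is only one possible reading of that pattern: the function name `resistance_triples`, the plain tuple representation, and the MIC value passed in the usage example are assumptions made for illustration, not part of IDO-Staph or the NARSA data.

```python
def resistance_triples(isolate, drug, mic_value):
    """Instantiate the slide's pattern with Iso = isolate, D = drug, M = mic_value.

    Returns (subject, predicate, object) tuples. The identifier naming
    scheme mirrors the slide; the tuple encoding itself is an assumption.
    """
    resistance = f"resistance_of_{isolate}_to_{drug}"
    mic_item = f"{isolate}_{drug}_MIC"
    return [
        (resistance, "instanceOf", f"resistance_to_{drug}"),
        (isolate, "has_disposition", resistance),
        (mic_item, "instanceOf", "MIC data item"),
        (mic_item, "has_measurement_value", mic_value),
    ]


if __name__ == "__main__":
    # The MIC value below is a placeholder, not a reported measurement.
    for triple in resistance_triples("NRS701", "clindamycin", "PLACEHOLDER_MIC"):
        print(triple)
```

Keeping the disposition (the resistance) and the measurement (the MIC data item) as separate individuals, as the slide does, lets the same isolate carry several drug-specific assertions without conflating what was measured with what is inferred.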