HL7_DIM_CG_R1_I1_2014SEP HL7 Domain Information Model: Clinical Genomics, Release 1 Clinical Genomics Statement Model WalkThrough Last update: 31July2014 September 2014 HL7 Informative Ballot Sponsored by: Clinical Genomics Work Group CGWG co-chairs: Amnon Shabo (Shvo), Mollie Ullman-Cullere, Bob Milius, Gil Alterovitz, Siew Lam Primary Editor: Amnon Shabo (Shvo), Standards of Health Co-editor: Robert R. Freimuth, Mayo Clinic Co-editor: Mollie Ullman-Cullere, Dana Farber Institute and Partners HealthCare With contribution of members of the HL7 Clinical Genomics Work Group Copyright © 2014 Health Level Seven International ® ALL RIGHTS RESERVED. The reproduction of this material in any form is strictly forbidden without the written permission of the publisher. HL7 and Health Level Seven are registered trademarks of Health Level Seven International. Reg. U.S. Pat & TM Off. Use of this material is governed by HL7's IP Compliance Policy. Introduction This document is part of an effort to develop a Clinical Genomics Domain Analysis Model (CG DAM). In the September 2014 cycle, two documents will be balloted separately: Clinical Sequencing DAM and Clinical Genomics Domain Information Model (DIM). These are balloted at Informative level. In a later release, the CG DIM and the Sequencing DAM will be merged into a broader CG DAM. The CG DIM project addresses the lack of alignment across the current HL7 Clinical Genomics specifications (published v3, v2 & CDA specs, and FHIR resources in development). It is presented to the reader in a 'standards-independent' way, i.e., that is not dependent on any specific HL7 jargon. The CG DIM is a set of conceptual models intended to serve as the agreed-upon semantics of clinical genomics data pertaining to individuals. For more information about the HL7 Clinical Genomics Domain Information (DIM) Model(s) Project (scope, need, criteria, requirements, etc.), please see here: http://wiki.hl7.org/index.php?title=Clinical_Genomics_Domain_Information_Model(s)_Project This document consists of a walk-through of one of the Clinical Genomics DIM models - the Clinical Genomics Statement (CGS) Model. The model itself can be found in the zip file containing this document and also in the HL7 project wiki site (see above). HL7 Clinical Genomics: Domain Information Model, Release 1 © Health Level Seven International. All rights reserved. Page 2 General comments The Clinical Genomics Statement (CGS) model represents a single 'genotype-phenotype' association. Several CGS instances can be populated within higher-level constructs, e.g., messages, documents, etc. where subject and other contextual data of the CGS are identified (as an example, see the Family Health History included in this ballot package). The CGS model is designed with the following underlying principles: Core principles: o Explicit representation of a 'genotype-phenotype' association o The association should not be pre-coordinated, that is, a phenotype is not represented as part of some genetic variation representation, e.g., a genetic biomarker is an explicit association of two distinct observations and there is no pre-coordination of the interpretation into a single biomarker code o The association is a many-to-many relationship, e.g., a genetic observation can be associated with several phenotypes and vice versa o It is possible to populate merely the genetic observation part, even if no phenotypic data is available Genetic observations: o Capture various types of data, from known mutations to sequencing-based variations (e.g., somatic mutations in tumor tissues) to structural variations (e.g., large deletions, cytogenetics) to gene expression and so forth o The core class GeneticObservation should contain data that is not the result of downstream analysis, i.e., that cannot be derived analytically from another observation available to the performer o Link to GeneticObservationComponent class (zero to many) enables the post-coordination of any core data item (e.g., HGVS string) to its unambiguous various primitives (e.g., variation position length, type, etc.) o Enabling encapsulation of key raw omics data that is the basis of creating this clinical genomics statement o The encapsulated data can include links back to the full-blown omics data (e.g., NGS sequencing data) Phenotypes: o Distinguish between interpretive phenotype and observed o Interpretation is represented as a distinct observation with its own attributes such as time, method, performer, etc. o Links to knowledge sources used for the interpretation Specimen o Molecular specimen and genomic source class o Anatomic pathology specimen o Both types of specimen should eventually comply with the HL7 new specimen model (under development) Other data types: o Optional indications representing the triggers for performing genetic testing or making genomic observations Page 3 HL7 Domain Information Model: Clinical Genomics, Release 1 © Health Level Seven International. All Rights Reserved. Model Walk-through: Recent changes (THIS PART CAN BE SKIPPED BY FIRST-TIME REVIEWERS) 30July2014 * Changed the scope of GeneticObservation class to hold merely observations that cannot be derived from other observations * Changed SupplementingGeneticData to DerivedGeneticData, in accordance with the previous change * Link KeyOmicsData class directly to the GeneticObservation class 29July2014 * Changed the class name GeneticVariation to GeneticObservation, in order to imply the scope of this place holder as not necessarily restricted to nucleotide variation data and when used for variants can also represent wild types * Changed the class name AssociatedOmicsData to SupplementingGeneticData, in order not to restrict this placeholder for omics data only and also in order to avoid using the term association except for the main concept in this model, i.e., the 'genotype-phenotype' association * Added a class GeneticObservationComponent linked to the GeneticObservation in an aggregation mode, in order to represent data items that are inherent part of the core genetic observation, in a post-coordinated manner; this class does not hold an independent observation and this has merely type-value attribute pair and has no attributes such id, time, etc. * Note that the link of GeneticObservation to SupplementingGeneticData was changed from aggregation to simple association * Changed class name AssociatedPhenotypicData to SupplementingPhenotypicData HL7 Clinical Genomics: Domain Information Model, Release 1 © Health Level Seven International. All rights reserved. Page 4 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class GeneticObservation Description: Representation of any observation that cannot be analytically derived from other observations made by the performer, including genetic loci in the size of a gene, e.g., genes, biomarkers, point DNA mutations, etc., as well as structural variations, e.g., large DNA deletions or duplications, etc. and other types of omics data. Note that omics data derived from this observation, e.g., RNA, proteins, expression, etc., should be represented using the linked DerivedGeneticData class. Attributes: id: a globally-unique id of this GeneticObservation type: the type of genetic observation, e.g., mutation, SNP, large deletion, HGVSdescribed variation, etc. Vocabularies for defining these types have not been fully standardized yet, but in any case they are orthogonal to the overall structure of the CGS, which is the gist of this modeling effort. Nevertheless, there are already some vocabularies available such as the LOINC value sets and panels used in the HL7 v2 message for genetic testing results and demonstrated in the HL7 CDA Genetic Testing Report (GTR) value: the actual variation, e.g., if type is "DNA Sequence Variation" then value can be "109G>A" using "HGVS nomenclature" (specified as part of the value attribute) performer: e.g., unique identifications of the genetic lab that made this observation method: code and/or description of the methods used to obtain the genetic variation observation observationTime: the effective time for the subject of this observation ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class GeneticObservationComponent Description: Representation of components of the core genetic observation in a post-coordinated manner, i.e., breaking it down to its primitives in order to disambiguate data held by the GeneticObservation class, e.g., an HGVS string. Attributes: type: the type of genetic observation component, e.g., variation location, variation length, etc. value: the actual component observation, e.g., if type is variation base length then value is the actual length Page 5 HL7 Domain Information Model: Clinical Genomics, Release 1 © Health Level Seven International. All Rights Reserved. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class DerivedGeneticData Description: Representing any downstream data derived from the core genetic observation (including omics data), e.g., RNA, proteins, expression, etc. Attributes: id: a globally-unique id of this observation type: the type of observation, e.g., amino-acid change, derived characteristics of the resulting protein value: the actual observation performer: e.g., unique identifications of the genetic lab that generated this data ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class KeyRawOmicsData Description: Representation of key data extracted from mass raw genomic data, e.g., an exon's sequence that is considered key to this clinical genomics statement, is extracted from some larger sequencing (up to whole exome sequencing) and encapsulated within this statement structure. Another example: certain lines of a VCF file (preferably its 'clinical-grade' version). The encapsulated data should contain references to the full-blown raw omics data. Attributes: encapsulatedData: a placeholder for the key raw omics data, e.g., DNA sequences, variant calls, expression data, etc. Could be any type of data encoding: binary, text, XML, etc. reference: a pointer to the repository containing the full-blown data sets out of which these key data items were extracted type: designating the notation used in the encapsulatedData attribute, e.g., VCF, FASTA, etc. schemaReference: a pointer to the schema governing the representation of the data in the encapsulatedData attribute ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class RawOmicsDataSource Description: Representing the repository where the full-blown data set could be found. HL7 Clinical Genomics: Domain Information Model, Release 1 © Health Level Seven International. All rights reserved. Page 6 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class Phenotype (abstract) Description: Representation of any data considered phenotype of the source genetic observation. For each genetic observation (e.g., variation), there could be multiple phenotypes, however, each phenotype object should represent a single phenotype. Defining a set of properties to be a 'single' phenotype might be determined by the intended utility of this phenotype, e.g., in patient care, clinical trials or earlier on in different phases of research. Note that this abstract class cannot be instantiated. Attributes: id: a globally-unique id of this observation type: the type of phenotypic observation, e.g., drug responsiveness value: the actual observation (should be drawn from dedicated ontologies, e.g., Human Phenotype Ontology, http://www.human-phenotype-ontology.org/) status : describes the 'clinical genomics' status of this phenotype, e.g.: - Established - Provisional - Putative - Preliminary ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class PhenotypeComponent Description: Representation of components of the phenotype in a post-coordinated manner, i.e., breaking it down to its primitives in order to disambiguate data held by the Phenotype class, e.g., a pre-coordinated SNOMED code. Attributes: type: the type of phenotype component, e.g., SMOMED code component value: the actual component Page 7 HL7 Domain Information Model: Clinical Genomics, Release 1 © Health Level Seven International. All Rights Reserved. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class SupplementingPhenotypeData Description: The SupplementingPhenotypeData class can add details to the core phenotype represented in the Phenotype class, e.g., drug being resisted and contextual data, e.g., age, etc. Attributes: id: a globally-unique id of this observation type: the type of associated phenotypic observation, e.g., age, antibiotics taken, etc. value: the actual observation ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class ObservedPhenotype (extending Phenotype) Description: This is a sub-class of Phenotype, representing a phenotype observed in the subject, e.g., if the somatic mutation is known to be the reason of positive responsiveness to some drug, then that responsiveness is an observed phenotype. For example, consider the following case where Kobayashi et al. (2005) reported a patient with advanced non-small cell lung cancer and an EGFR somatic mutation, who has been gefitinib-responsive resulting in 2 years of complete remission, but then had a relapse due to a second EGFR somatic mutation (see Kobayashi, S., Boggon, T. J., Dayaram, T., Janne, P. A., Kocher, O., Meyerson, M., Johnson, B. E., Eck, M. J., Tenen, D. G., Halmos, B. EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. New Eng. J. Med. 352: 786-792, 2005. [PubMed: 15728811]. At the end of those two years, the first EGFR mutation should have been associated with an observed phenotype while the second mutation should have been associated with an interpretive phenotype (see further below). Attributes: clinicalTime: the effective time for the subject of this observation ageOfOnset: use this attribute in cases where dates are not available (e.g., in clinical genomics statements pertaining to a relative of a patient in a family health history) referenceToHealthRecord: a pointer to an information system (e.g., EHR system) holding the broader contextual data of this observation (this could be implemented by guidance provided from health information exchange systems such as the specifications set out by the NwHIN in the US) HL7 Clinical Genomics: Domain Information Model, Release 1 © Health Level Seven International. All rights reserved. Page 8 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class InterpretivePhenotype (extending Phenotype) Description: This is a sub-class of Phenotype, representing a phenotype that is likely to be exprerssed in the subject, e.g., the somatic mutation is likely to cause resistance to some drug (see a use case in the description of observed phenotype above). Attributes: method: the method by which this interpretation was made (e.g., an identified algorithm, a rule engine, a clinical decision support application/service, etc.) interpretation Time: the time this interpretation was made available for this Clinical Genomics Statement instance) performer: unique id of the entity making this interpretation (e.g., clinical decision support application/service) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class InterpretationEvidenceSource Description: Representing the knowledge bases that served as source for making the interpretation. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class AnatomicPathologyFinding (extending ObservedPhenotype) Description: An anatomic pathology (AP) finding that is the phenotype of some genetic observation. Attributes: status: the status of this finding (using AP reporting vocabularies) referenceToAPReport: a pointer to an AP report consisting of this AP finding Page 9 HL7 Domain Information Model: Clinical Genomics, Release 1 © Health Level Seven International. All Rights Reserved. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class AnatomicPathologySpecimen Description: An optional reference to a detailed description of the specimen used for the finding described in the AnatomicPathologyFinding class. Note that this class and the MolecularSpecimen class should be harmonized and modeled so they can be used together. This should be done following the HL7 Specimen model, now under revision. Attributes: id: a globally-unique id of the AP specimen (e.g., using the HL7 Specimen ID) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class Indication Description: An indication to performing genetic testing, e.g., genetic-related hearing loss diagnostics. Note that this is mainly relevant in clinical environments. Comments: 1. Incidental findings could be represented by CGS instances that do not have indications; 2. If results of one test are used as indications for another test, they should be represented by the indication class. Future work – might need to add to the vocabulary of the type of indication a code denoting 'previous testing results'. Attributes: id: a globally-unique id of this indication type: the type of indication (e.g., diagnostic, carrier, etc.) value: the actual indication (preferably drawn from controlled vocabularies) HL7 Clinical Genomics: Domain Information Model, Release 1 © Health Level Seven International. All rights reserved. Page 10 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Class MolecularSpecimen Description: This class and the AnatomicPathologySpecimen class should be harmonized and modeled so they can be used together. For example, it is possible to obtain a single biospecimen that is split, one portion sent to path, another to a biobank, and a third to a genetic testing lab. Attributes: id: a globally-unique id of this specimen type: the type of molecular specimen (e.g., DNA) genomicSourceClass: the source of the specimen being analyzed, e.g., germline for inherited genome, somatic for cancer genome and prenatal for fetal genome ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Association Class 'Genotype Phenotype' Association Description: [This is a future work] An association class adding characteristics that cannot be represented through either side of this core association. Page 11 HL7 Domain Information Model: Clinical Genomics, Release 1 © Health Level Seven International. All Rights Reserved.