HL7 Domain Information Model: Clinical Genomics

HL7_DIM_CG_R1_I1_2014SEP
HL7 Domain Information Model:
Clinical Genomics, Release 1
Clinical Genomics Statement Model WalkThrough
Last update: 31July2014
September 2014
HL7 Informative Ballot
Sponsored by:
Clinical Genomics Work Group
CGWG co-chairs:
Amnon Shabo (Shvo), Mollie Ullman-Cullere, Bob Milius, Gil Alterovitz, Siew Lam
Primary Editor: Amnon Shabo (Shvo), Standards of Health
Co-editor: Robert R. Freimuth, Mayo Clinic
Co-editor: Mollie Ullman-Cullere, Dana Farber Institute and Partners HealthCare
With contribution of members of the HL7 Clinical Genomics Work Group
Copyright © 2014 Health Level Seven International ® ALL RIGHTS RESERVED. The reproduction of this
material in any form is strictly forbidden without the written permission of the publisher. HL7 and Health Level
Seven are registered trademarks of Health Level Seven International. Reg. U.S. Pat & TM Off.
Use of this material is governed by HL7's IP Compliance Policy.
Introduction
This document is part of an effort to develop a Clinical Genomics Domain Analysis
Model (CG DAM). In the September 2014 cycle, two documents will be balloted
separately: Clinical Sequencing DAM and Clinical Genomics Domain Information
Model (DIM). These are balloted at Informative level. In a later release, the CG DIM
and the Sequencing DAM will be merged into a broader CG DAM.
The CG DIM project addresses the lack of alignment across the current HL7 Clinical
Genomics specifications (published v3, v2 & CDA specs, and FHIR resources in
development). It is presented to the reader in a 'standards-independent' way, i.e.,
in a way that does not depend on any specific HL7 jargon. The CG DIM is a set of conceptual
models intended to serve as the agreed-upon semantics of clinical genomics data
pertaining to individuals.
For more information about the HL7 Clinical Genomics Domain Information (DIM)
Model(s) Project (scope, need, criteria, requirements, etc.), please see here:
http://wiki.hl7.org/index.php?title=Clinical_Genomics_Domain_Information_Model(s)_Project
This document consists of a walk-through of one of the Clinical Genomics DIM models - the
Clinical Genomics Statement (CGS) Model. The model itself can be found in the zip file
containing this document and also in the HL7 project wiki site (see above).
HL7 Clinical Genomics: Domain Information Model, Release 1
© Health Level Seven International. All rights reserved.
General comments
The Clinical Genomics Statement (CGS) model represents a single 'genotype-phenotype'
association. Several CGS instances can be populated within higher-level constructs, e.g.,
messages, documents, etc. where subject and other contextual data of the CGS are
identified (as an example, see the Family Health History included in this ballot package).
The CGS model is designed with the following underlying principles:
* Core principles:
  o Explicit representation of a 'genotype-phenotype' association
  o The association should not be pre-coordinated, that is, a phenotype is not
    represented as part of some genetic variation representation, e.g., a genetic
    biomarker is an explicit association of two distinct observations and there is
    no pre-coordination of the interpretation into a single biomarker code
  o The association is a many-to-many relationship, e.g., a genetic observation
    can be associated with several phenotypes and vice versa
  o It is possible to populate only the genetic observation part, even if no
    phenotypic data is available
* Genetic observations:
  o Capture various types of data, from known mutations to sequencing-based
    variations (e.g., somatic mutations in tumor tissues) to structural variations
    (e.g., large deletions, cytogenetics) to gene expression and so forth
  o The core class GeneticObservation should contain data that is not the result
    of downstream analysis, i.e., that cannot be derived analytically from
    another observation available to the performer
  o Link to GeneticObservationComponent class (zero to many) enables the
    post-coordination of any core data item (e.g., HGVS string) into its
    various unambiguous primitives (e.g., variation position, length, type, etc.)
  o Enabling encapsulation of key raw omics data that is the basis of creating
    this clinical genomics statement
  o The encapsulated data can include links back to the full-blown omics data
    (e.g., NGS sequencing data)
* Phenotypes:
  o Distinguish between interpretive and observed phenotypes
  o Interpretation is represented as a distinct observation with its own
    attributes such as time, method, performer, etc.
  o Links to knowledge sources used for the interpretation
* Specimen:
  o Molecular specimen and genomic source class
  o Anatomic pathology specimen
  o Both types of specimen should eventually comply with the new HL7
    specimen model (under development)
* Other data types:
  o Optional indications representing the triggers for performing genetic testing
    or making genomic observations
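As an informal illustration of the core principles listed above, the following Python sketch keeps the 'genotype-phenotype' association as explicit pairs of identifiers, so that neither side is pre-coordinated into the other and the relationship can be many-to-many. The class name AssociationIndex, its methods, and the ids used are assumptions made for this example only; they are not part of the CGS model.

```python
from collections import defaultdict
from typing import Dict, Set


class AssociationIndex:
    """Keeps genotype-phenotype links as explicit pairs of ids (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_observation: Dict[str, Set[str]] = defaultdict(set)
        self._by_phenotype: Dict[str, Set[str]] = defaultdict(set)

    def link(self, observation_id: str, phenotype_id: str) -> None:
        # Record the association in both directions.
        self._by_observation[observation_id].add(phenotype_id)
        self._by_phenotype[phenotype_id].add(observation_id)

    def phenotypes_of(self, observation_id: str) -> Set[str]:
        # An observation with no links yields an empty set: the genetic
        # part can be populated even when no phenotypic data is available.
        return set(self._by_observation.get(observation_id, ()))

    def observations_of(self, phenotype_id: str) -> Set[str]:
        return set(self._by_phenotype.get(phenotype_id, ()))


# Illustrative ids only.
index = AssociationIndex()
index.link("obs-1", "ph-1")
index.link("obs-1", "ph-2")  # one observation, several phenotypes
index.link("obs-2", "ph-1")  # one phenotype, several observations
```

Storing links separately from the observations is one possible design choice; it keeps each observation and each phenotype a distinct object, in line with the post-coordination principle.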
Model Walk-through:
Recent changes
(THIS PART CAN BE SKIPPED BY FIRST-TIME REVIEWERS)
30July2014
* Changed the scope of GeneticObservation class to hold merely observations that cannot
be derived from other observations
* Changed SupplementingGeneticData to DerivedGeneticData, in accordance with the
previous change
* Linked the KeyOmicsData class directly to the GeneticObservation class
29July2014
* Changed the class name GeneticVariation to GeneticObservation, to indicate that the
scope of this placeholder is not necessarily restricted to nucleotide variation data and that,
when used for variants, it can also represent wild types
* Changed the class name AssociatedOmicsData to SupplementingGeneticData, in order not
to restrict this placeholder for omics data only and also in order to avoid using the term
association except for the main concept in this model, i.e., the 'genotype-phenotype'
association
* Added a class GeneticObservationComponent linked to the GeneticObservation in an
aggregation mode, in order to represent data items that are an inherent part of the core
genetic observation, in a post-coordinated manner; this class does not hold an independent
observation: it has only a type-value attribute pair and no attributes such as id, time,
etc.
* Note that the link of GeneticObservation to SupplementingGeneticData was changed from
aggregation to simple association
* Changed class name AssociatedPhenotypicData to SupplementingPhenotypicData
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class GeneticObservation
Description:
Representation of any observation that cannot be analytically derived from other
observations made by the performer, including genetic loci in the size of a gene, e.g., genes,
biomarkers, point DNA mutations, etc., as well as structural variations, e.g., large DNA
deletions or duplications, etc. and other types of omics data.
Note that omics data derived from this observation, e.g., RNA, proteins, expression, etc.,
should be represented using the linked DerivedGeneticData class.
Attributes:

id: a globally-unique id of this GeneticObservation

type: the type of genetic observation, e.g., mutation, SNP, large deletion, HGVS-described
variation, etc. Vocabularies for defining these types have not been fully
standardized yet, but in any case they are orthogonal to the overall structure of the
CGS, which is the gist of this modeling effort. Nevertheless, there are already some
vocabularies available such as the LOINC value sets and panels used in the HL7 v2
message for genetic testing results and demonstrated in the HL7 CDA Genetic
Testing Report (GTR)

value: the actual variation, e.g., if type is "DNA Sequence Variation" then value can
be "109G>A" using "HGVS nomenclature" (specified as part of the value attribute)

performer: e.g., unique identifications of the genetic lab that made this observation

method: code and/or description of the methods used to obtain the genetic
variation observation

observationTime: the effective time for the subject of this observation
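To make the attribute list above more concrete, here is a minimal Python sketch of how a GeneticObservation instance might be held in memory. It is not a normative serialization. The helper type CodedValue, the use of a UUID for the globally-unique id, and the performer, method and time values are all assumptions made for illustration; only the type and value examples come from the text above.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CodedValue:
    """A value together with the notation it is written in (hypothetical helper)."""
    value: str
    nomenclature: Optional[str] = None


@dataclass
class GeneticObservation:
    id: str                                   # globally-unique id
    type: str                                 # e.g., "DNA Sequence Variation"
    value: CodedValue                         # the actual variation
    performer: Optional[str] = None           # e.g., id of the genetic lab
    method: Optional[str] = None              # code and/or description
    observation_time: Optional[datetime] = None


obs = GeneticObservation(
    id=str(uuid.uuid4()),                     # assumption: a UUID serves as the id
    type="DNA Sequence Variation",
    value=CodedValue("109G>A", nomenclature="HGVS"),
    performer="lab-0001",                     # hypothetical lab identifier
    method="sequencing",                      # hypothetical method description
    observation_time=datetime(2014, 7, 31),   # hypothetical time
)
```

Carrying the nomenclature inside the value reflects the note that the notation is "specified as part of the value attribute".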
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class GeneticObservationComponent
Description:
Representation of components of the core genetic observation in a post-coordinated
manner, i.e., breaking it down to its primitives in order to disambiguate data held by the
GeneticObservation class, e.g., an HGVS string.
Attributes:

type: the type of genetic observation component, e.g., variation location, variation
length, etc.

value: the actual component observation, e.g., if type is variation base length then
value is the actual length
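The post-coordination described above can be sketched as follows, assuming the simple substitution form used in the "109G>A" example. The function decompose_substitution and the component types "variation type", "reference allele" and "alternate allele" are hypothetical names chosen for this example. The regular expression covers only single-base substitutions and is not an HGVS parser.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GeneticObservationComponent:
    type: str   # e.g., "variation location"
    value: str  # the actual component observation


# Matches only simple substitutions such as "109G>A" (illustrative subset).
_SUBSTITUTION = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def decompose_substitution(fragment: str) -> List[GeneticObservationComponent]:
    """Break a simple substitution string into type-value components."""
    match = _SUBSTITUTION.match(fragment)
    if match is None:
        raise ValueError(f"not a simple substitution: {fragment!r}")
    position, reference, alternate = match.groups()
    return [
        GeneticObservationComponent("variation location", position),
        GeneticObservationComponent("variation length", "1"),
        GeneticObservationComponent("variation type", "substitution"),
        GeneticObservationComponent("reference allele", reference),
        GeneticObservationComponent("alternate allele", alternate),
    ]


components = decompose_substitution("109G>A")
```

Each component carries only a type-value pair, without id, time or performer, since it is not an independent observation.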
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class DerivedGeneticData
Description:
Representing any downstream data derived from the core genetic observation (including
omics data), e.g., RNA, proteins, expression, etc.
Attributes:

id: a globally-unique id of this observation

type: the type of observation, e.g., amino-acid change, derived characteristics of the
resulting protein

value: the actual observation

performer: e.g., unique identifications of the genetic lab that generated this data
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class KeyRawOmicsData
Description:
Representation of key data extracted from mass raw genomic data, e.g., an exon's sequence
that is considered key to this clinical genomics statement, is extracted from some larger
sequencing (up to whole exome sequencing) and encapsulated within this statement
structure. Another example: certain lines of a VCF file (preferably its 'clinical-grade' version).
The encapsulated data should contain references to the full-blown raw omics data.
Attributes:

encapsulatedData: a placeholder for the key raw omics data, e.g., DNA sequences,
variant calls, expression data, etc. Could be any type of data encoding: binary, text,
XML, etc.

reference: a pointer to the repository containing the full-blown data sets out of
which these key data items were extracted

type: designating the notation used in the encapsulatedData attribute, e.g., VCF,
FASTA, etc.

schemaReference: a pointer to the schema governing the representation of the data
in the encapsulatedData attribute
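As a rough sketch of the "certain lines of a VCF file" example, the Python code below keeps the header lines plus the records that fall in a region of interest, and wraps them together with a pointer back to the full data set. The function extract_vcf_lines, the VCF content, and the repository URL are made up for illustration; a real implementation would more likely rely on a dedicated VCF library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyRawOmicsData:
    encapsulated_data: str                  # the extracted key data
    type: str                               # notation, e.g., "VCF"
    reference: str                          # pointer to the full-blown data set
    schema_reference: Optional[str] = None  # pointer to the governing schema


def extract_vcf_lines(vcf_text: str, chrom: str, start: int, end: int) -> str:
    """Keep header lines plus records on `chrom` with POS in [start, end]."""
    kept = []
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == chrom and start <= int(fields[1]) <= end:
            kept.append(line)
    return "\n".join(kept)


# Made-up VCF content, for illustration only.
full_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t100\t.\tG\tA\t50\tPASS\t.",
    "7\t900\t.\tC\tT\t50\tPASS\t.",
    "12\t100\t.\tT\tC\t50\tPASS\t.",
])

key_data = KeyRawOmicsData(
    encapsulated_data=extract_vcf_lines(full_vcf, "7", 1, 500),
    type="VCF",
    reference="https://example.org/omics/full-data.vcf",  # hypothetical repository pointer
)
```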
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class RawOmicsDataSource
Description:
Representing the repository where the full-blown data set could be found.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class Phenotype (abstract)
Description:
Representation of any data considered phenotype of the source genetic observation. For
each genetic observation (e.g., variation), there could be multiple phenotypes; however,
each phenotype object should represent a single phenotype. Defining a set of properties to
be a 'single' phenotype might be determined by the intended utility of this phenotype, e.g.,
in patient care, clinical trials or earlier on in different phases of research.
Note that this abstract class cannot be instantiated.
Attributes:

id: a globally-unique id of this observation

type: the type of phenotypic observation, e.g., drug responsiveness

value: the actual observation (should be drawn from dedicated ontologies, e.g.,
Human Phenotype Ontology, http://www.human-phenotype-ontology.org/)

status: describes the 'clinical genomics' status of this phenotype, e.g.:
- Established
- Provisional
- Putative
- Preliminary
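One way to mirror the abstract nature of this class and its status values in code is sketched below. The abstract method kind() is an assumption introduced only so that Python refuses to instantiate the base class; the minimal ObservedPhenotype subclass shown here omits its own attributes, and the example values are illustrative.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class PhenotypeStatus(Enum):
    ESTABLISHED = "Established"
    PROVISIONAL = "Provisional"
    PUTATIVE = "Putative"
    PRELIMINARY = "Preliminary"


@dataclass
class Phenotype(ABC):
    id: str
    type: str
    value: str
    status: PhenotypeStatus

    @abstractmethod
    def kind(self) -> str:
        """Hypothetical hook: subclasses state what kind of phenotype they are."""


@dataclass
class ObservedPhenotype(Phenotype):
    def kind(self) -> str:
        return "observed"


# The abstract class cannot be instantiated; Python raises TypeError.
try:
    Phenotype("ph-0", "drug responsiveness", "responsive", PhenotypeStatus.PUTATIVE)
    instantiated = True
except TypeError:
    instantiated = False

observed = ObservedPhenotype(
    "ph-1", "drug responsiveness", "responsive", PhenotypeStatus.ESTABLISHED
)
```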
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class PhenotypeComponent
Description:
Representation of components of the phenotype in a post-coordinated manner, i.e.,
breaking it down to its primitives in order to disambiguate data held by the Phenotype class,
e.g., a pre-coordinated SNOMED code.
Attributes:

type: the type of phenotype component, e.g., SNOMED code component

value: the actual component
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class SupplementingPhenotypeData
Description:
The SupplementingPhenotypeData class can add details to the core phenotype represented
in the Phenotype class, e.g., drug being resisted and contextual data, e.g., age, etc.
Attributes:

id: a globally-unique id of this observation

type: the type of associated phenotypic observation, e.g., age, antibiotics taken, etc.

value: the actual observation
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class ObservedPhenotype (extending Phenotype)
Description:
This is a sub-class of Phenotype, representing a phenotype observed in the subject, e.g., if
the somatic mutation is known to be the reason for positive responsiveness to some drug,
then that responsiveness is an observed phenotype. For example, consider the following
case: Kobayashi et al. (2005) reported a patient with advanced non-small cell lung
cancer and an EGFR somatic mutation, who was gefitinib-responsive, resulting in 2 years
of complete remission, but then had a relapse due to a second EGFR somatic mutation (see
Kobayashi, S., Boggon, T. J., Dayaram, T., Janne, P. A., Kocher, O., Meyerson, M., Johnson, B.
E., Eck, M. J., Tenen, D. G., Halmos, B. EGFR mutation and resistance of non-small-cell lung
cancer to gefitinib. New Eng. J. Med. 352: 786-792, 2005. [PubMed: 15728811]).
At the end of those two years, the first EGFR mutation should have been associated with an
observed phenotype while the second mutation should have been associated with an
interpretive phenotype (see further below).
Attributes:

clinicalTime: the effective time for the subject of this observation

ageOfOnset: use this attribute in cases where dates are not available (e.g., in clinical
genomics statements pertaining to a relative of a patient in a family health history)

referenceToHealthRecord: a pointer to an information system (e.g., EHR system)
holding the broader contextual data of this observation (this could be implemented
by guidance provided from health information exchange systems such as the
specifications set out by the NwHIN in the US)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class InterpretivePhenotype (extending Phenotype)
Description:
This is a sub-class of Phenotype, representing a phenotype that is likely to be expressed in
the subject, e.g., the somatic mutation is likely to cause resistance to some drug (see a use
case in the description of observed phenotype above).
Attributes:

method: the method by which this interpretation was made (e.g., an identified
algorithm, a rule engine, a clinical decision support application/service, etc.)

interpretationTime: the time this interpretation was made available for this Clinical
Genomics Statement instance

performer: unique id of the entity making this interpretation (e.g., clinical decision
support application/service)
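The distinction between the two Phenotype sub-classes can be illustrated with the gefitinib use case from the description of ObservedPhenotype. In the Python sketch below, the first EGFR mutation is paired with an observed phenotype and the second with an interpretive one. All ids, the phenotype wording, the method, the performer and the interpretation date are hypothetical; the base class is simplified and left concrete for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Phenotype:
    id: str
    type: str
    value: str


@dataclass
class ObservedPhenotype(Phenotype):
    clinical_time: Optional[date] = None
    age_of_onset: Optional[int] = None          # used when dates are unavailable
    reference_to_health_record: Optional[str] = None


@dataclass
class InterpretivePhenotype(Phenotype):
    method: Optional[str] = None                # how the interpretation was made
    interpretation_time: Optional[date] = None
    performer: Optional[str] = None             # who or what made it


# First mutation: the response to the drug was actually seen in the patient.
first = ObservedPhenotype("ph-1", "drug responsiveness", "gefitinib-responsive")

# Second mutation: resistance is inferred, so the interpretation carries
# its own method, time and performer (all values here are hypothetical).
second = InterpretivePhenotype(
    "ph-2",
    "drug responsiveness",
    "likely gefitinib-resistant",
    method="rule engine",
    interpretation_time=date(2014, 7, 31),
    performer="cds-service-01",
)
```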
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class InterpretationEvidenceSource
Description:
Representing the knowledge bases that served as source for making the interpretation.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class AnatomicPathologyFinding (extending ObservedPhenotype)
Description:
An anatomic pathology (AP) finding that is the phenotype of some genetic observation.
Attributes:

status: the status of this finding (using AP reporting vocabularies)

referenceToAPReport: a pointer to an AP report containing this AP finding
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class AnatomicPathologySpecimen
Description:
An optional reference to a detailed description of the specimen used for the finding
described in the AnatomicPathologyFinding class. Note that this class and the
MolecularSpecimen class should be harmonized and modeled so they can be used together.
This should be done following the HL7 Specimen model, now under revision.
Attributes:

id: a globally-unique id of the AP specimen (e.g., using the HL7 Specimen ID)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class Indication
Description:
An indication for performing genetic testing, e.g., genetic-related hearing loss diagnostics.
Note that this is mainly relevant in clinical environments.
Comments:
1. Incidental findings could be represented by CGS instances that do not have indications;
2. If results of one test are used as indications for another test, they should be represented
by the Indication class. Future work: the vocabulary of indication types might need
a code denoting 'previous testing results'.
Attributes:

id: a globally-unique id of this indication

type: the type of indication (e.g., diagnostic, carrier, etc.)

value: the actual indication (preferably drawn from controlled vocabularies)
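Comment 1 above suggests that a CGS instance without indications could represent an incidental finding. A minimal Python sketch of that check follows; the ClinicalGenomicsStatement container, the function may_be_incidental and the example values are assumptions for illustration. Because indications are optional and mainly relevant in clinical environments, an empty list is treated only as a hint, not as proof.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Indication:
    id: str
    type: str   # e.g., "diagnostic", "carrier"
    value: str  # preferably drawn from a controlled vocabulary


@dataclass
class ClinicalGenomicsStatement:
    id: str
    indications: List[Indication] = field(default_factory=list)


def may_be_incidental(statement: ClinicalGenomicsStatement) -> bool:
    """True when no indication triggered the testing behind this statement."""
    return not statement.indications


# Illustrative instances only.
ordered = ClinicalGenomicsStatement(
    "cgs-1",
    [Indication("ind-1", "diagnostic", "genetic-related hearing loss diagnostics")],
)
unprompted = ClinicalGenomicsStatement("cgs-2")
```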
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Class MolecularSpecimen
Description:
This class and the AnatomicPathologySpecimen class should be harmonized and modeled so
they can be used together. For example, it is possible to obtain a single biospecimen that is
split, one portion sent to pathology, another to a biobank, and a third to a genetic testing lab.
Attributes:

id: a globally-unique id of this specimen

type: the type of molecular specimen (e.g., DNA)

genomicSourceClass: the source of the specimen being analyzed, e.g., germline for
inherited genome, somatic for cancer genome and prenatal for fetal genome
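The genomicSourceClass values named above lend themselves to an enumeration. The Python sketch below assumes that exactly these three values are allowed, which the text offers only as examples; the specimen id is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class GenomicSourceClass(Enum):
    GERMLINE = "germline"   # inherited genome
    SOMATIC = "somatic"     # cancer genome
    PRENATAL = "prenatal"   # fetal genome


@dataclass
class MolecularSpecimen:
    id: str                                   # globally-unique id
    type: str                                 # e.g., "DNA"
    genomic_source_class: GenomicSourceClass


specimen = MolecularSpecimen(
    id="spm-0001",                            # hypothetical specimen id
    type="DNA",
    genomic_source_class=GenomicSourceClass("somatic"),
)

# Unknown source classes are rejected by the enumeration.
try:
    GenomicSourceClass("unknown")
    rejected = False
except ValueError:
    rejected = True
```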
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Association Class 'Genotype Phenotype' Association
Description:
[This is future work]
An association class adding characteristics that cannot be represented through either side of
this core association.