RCRIM Vocabulary and
Controlled Terminology for
Clinical Research
Margaret Haber, RN, OCN
Co-Director
Enterprise Vocabulary Services
National Cancer Institute
Clinical Research Challenges
• Fundamental new capacities to
characterize and intervene in
biological systems and the
disease process
• Hampered by our inability to
integrate huge volumes of data
due to information fragmentation
• Many diverse research and
delivery platforms that are
disconnected due to a lack of
common, interoperable systems
and semantics
• The problem is international in
scope, with enormous
implications for our ability to
translate information into
knowledge
No Controlled Terminology?
No Interoperability
Systems cannot exchange or use
information if they use incompatible
codes or tokens to signify meaning
Terminology services provide tokens and
codes
Proper use of them assures consistent
meaning across and among enterprises
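A minimal sketch of the point above, assuming two hypothetical systems that record the same attribute with different local codes. The local codes, the mapping tables, and the shared concept identifiers (`C-0001`, `C-0002`) are invented for illustration and do not come from any real terminology.

```python
# Hypothetical local code systems: each site uses its own tokens.
SITE_A_TO_SHARED = {"M": "C-0001", "F": "C-0002"}  # site A records M / F
SITE_B_TO_SHARED = {"1": "C-0001", "2": "C-0002"}  # site B records 1 / 2


def to_shared(local_code, mapping):
    """Translate a local code to a shared concept code; fail loudly if unmapped."""
    try:
        return mapping[local_code]
    except KeyError:
        raise ValueError(f"no shared concept for local code {local_code!r}")


def same_meaning(code_a, code_b):
    """Compare a site A code and a site B code by shared concept, not by token."""
    return to_shared(code_a, SITE_A_TO_SHARED) == to_shared(code_b, SITE_B_TO_SHARED)
```

Comparing the raw tokens (`"M" == "1"`) says the two records differ; comparing through the shared codes shows that they carry the same meaning.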
The Pillars of Interoperability
Necessary but not sufficient
Common information models across
all domains of interest
A foundation of rigorously defined
data types (metadata)
A methodology for interfacing with
controlled vocabularies
Interoperability Keys
for Terminology
• Use of Industry Standards, where feasible
  • Must allow for extensions to core standards
  • Specialty terminology remains common
  • Mapping is therefore essential
• Conformance with Data Models
  • For process (logical models)
  • For data flow (messages)
  • For data at rest (database design)
Clinical Data Interchange
Standards Consortium
(CDISC)
CDISC is an open, multidisciplinary, non-profit
organization committed to the development of
worldwide industry standards to support the
electronic acquisition, exchange, submission and
archiving of clinical trials data and metadata for
medical and biopharmaceutical product
development.
HL7 (Health Level Seven)
HL7 is a volunteer, ANSI-accredited Standards Developing
Organization (SDO) that focuses on clinical and
administrative healthcare data.
Mission:
"To provide standards for the exchange, management and
integration of data that support clinical patient care and the
management, delivery and evaluation of healthcare services.
Specifically, to create flexible, cost effective approaches,
standards, guidelines, methodologies, and related services for
interoperability between healthcare information systems."
Bringing It All
Together: RCRIM
The HL7 “Regulated Clinical Research Information
Management” Technical Committee formed as a
collaboration of CDISC, FDA, and HL7
• To facilitate the development of common
standards for clinical research information
management across a variety of organizations,
including government agencies, private research
efforts, and sponsored research
• To develop standards for interchange of
regulated data that are interoperable with
general healthcare standards.
HL7 Vocabulary (including RCRIM)
Value sets associated with certain
domain portions of HL7 models
Most vocabulary domains are published
as informative references only
Those domains that have a formal
ballot status are shown in bold in the
HL7 vocabulary tables on their web site
There are current initiatives to map
these values to standard controlled
terminologies
HL7 Vocabulary - Access
HL7 publishes at
http://www.hl7.org/library/datamodel/RIM/C30204/vocabulary.htm
There are approximately 8,000 terms or
“concepts” in the current HL7 vocabulary
Scroll down to select a specific “table” or set
of terms
Also available through an NCI developed “HL7
SDK” (software development kit) application
tool
Conversion notes are included, see
“HL7_Design.pdf” on NCI’s website
What’s Happening Now?
CDISC, RCRIM and NCI
The CDISC terminology group has established an
independent working environment at NCI for
the specification and development of broad-based
clinical trials standard terminology,
based on CDISC models (SDTM)
The group uses the NCI Data Standards Repository
(caDSR), which draws controlled terminology
from NCI EVS systems, and leverages resources
including but not limited to NCI Thesaurus
for novel terminology development
Collaboration
These open standards, developed
in collaboration with FDA, NIH, HL7
and industry experts, can provide
the basis for a controlled
terminology set submitted to HL7
RCRIM as proposed standards for
adoption by the clinical trials
community
Where? NCI Enterprise
Vocabulary Services (EVS)
Services and resources that address the
controlled vocabulary needs of NCI and
its partners: http://evs.nci.nih.gov/
A collaboration
 NCI Office of Communications
 Physician Data Query (PDQ), Clinical Trials
Portal, Cancer Information Service and the
NCI web portal www.cancer.gov
 NCI Center for Bioinformatics
 Bioinformatics Core Infrastructure
(caCORE), including a metadata repository
(caDSR) and object models built using EVS
terminology for their core semantics
NCI EVS Goal –
Integration by Meaning
Clinical, translational, and basic research
terminologies have overlapping but
specialized needs; EVS therefore helps to:
 Integrate different conceptual frameworks
 Create terminological and taxonomic
conventions across systems
Vocabulary Products
 NCI Thesaurus – an ontology-like terminology
 NCI Metathesaurus – maps vocabularies
 External vocabularies maintained and served:
MedDRA, HL7, NDF-RT, LOINC, etc.
NCI Thesaurus (NCIt)
Reference Terminology for NCI, Partners
A Federal Standard Terminology
Broad coverage of the cancer, other
research, and clinical domain including
prevention and treatment trials
• Neoplastic and other Diseases
• Findings and Abnormalities
• Anatomy, Tissues, Subcellular Structures
• Agents, Drugs, Chemicals
• Genes, Gene Products, Biological Processes
• Animal Models – Mouse, other
• Research techniques and management,
apparatus, clinical trials, lab, radiology,
imagery
NCI Thesaurus (2)
Published Monthly
Public domain, open content
license
Available on-line and by download
(OWL, Ontylog XML, flat files)
55,000+ “Concepts” hierarchically
organized
Description-logic based
“Roles” establish machine
readable semantic relationships
between Concepts
NCI Thesaurus is Deployed:
http://nciterms.nci.nih.gov
http://www.nci.nih.gov/EVS
(full documentation)
• API: caCORE public access
• Fulfills NCI and collaborators’ needs
for controlled vocabulary
• Public domain, open content
license
Example Disease Concept
Gastric Mucosa-Associated Lymphoid Tissue Lymphoma
A low grade, indolent B-cell lymphoma, usually associated with Helicobacter pylori infection.
Morphologically it is characterized by a dense mucosal atypical lymphocytic (centrocyte-like
cell) infiltrate with often prominent lymphoepithelial lesions and plasmacytic differentiation.
Approximately 40% of gastric MALT lymphomas carry the t(11;18)(q21;q21). Such cases
are resistant to Helicobacter pylori therapy. -- 2003
Molecular abnormalities:
Disease_May_Have_Cytogenetic_Abnormality: Trisomy 3
Disease_May_Have_Cytogenetic_Abnormality: Trisomy 18
Role group 1:
Disease_May_Have_Cytogenetic_Abnormality: t(11;18)(q21;q21)
Disease_May_Have_Molecular_Abnormality: API2-MLT fusion protein expression
Histogenesis:
Disease_Has_Normal_Cell_Origin: Post-germinal center marginal zone B-lymphocyte
Pathology:
Disease_Has_Abnormal_Cell: Centrocyte-like cell
Disease_May_Have_Abnormal_Cell: Neoplastic monocytoid B-lymphocyte
Disease_May_Have_Abnormal_Cell: Neoplastic plasma cell
Disease_May_Have_Finding: Lymphoepithelial lesion
Anatomy:
Disease_Has_Primary_Anatomic_Site: Stomach
Disease_Has_Normal_Tissue_Origin: Gut associated lymphoid tissue
Clinical information:
Disease_May_Have_Finding: Indolent clinical course
Disease_May_Have_Associated_Disease: Hepatitis C
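The role listing above can be read as a set of (role, target) pairs attached to one concept. The following is a minimal sketch of that idea, not the EVS data model: the list layout and both helper functions are hypothetical, only a subset of the roles is copied in, and the split into "always" versus "sometimes" is an assumption based purely on the `Disease_Has_` and `Disease_May_Have_` prefixes in the role names.

```python
from collections import defaultdict

# A subset of the (role, target) pairs shown for the example concept.
ROLES = [
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 3"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 18"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "t(11;18)(q21;q21)"),
    ("Disease_Has_Abnormal_Cell", "Centrocyte-like cell"),
    ("Disease_May_Have_Finding", "Lymphoepithelial lesion"),
    ("Disease_Has_Primary_Anatomic_Site", "Stomach"),
    ("Disease_May_Have_Finding", "Indolent clinical course"),
]


def targets_by_role(roles):
    """Group targets under their role name, preserving listing order."""
    grouped = defaultdict(list)
    for role, target in roles:
        grouped[role].append(target)
    return dict(grouped)


def split_by_modality(roles):
    """Separate Disease_Has_* pairs from Disease_May_Have_* pairs (by name only)."""
    always = [pair for pair in roles if pair[0].startswith("Disease_Has_")]
    sometimes = [pair for pair in roles if pair[0].startswith("Disease_May_Have_")]
    return always, sometimes
```

Because the relationships are stored as data rather than free text, a program can answer questions such as "which findings may accompany this disease?" with a dictionary lookup.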
A holistic view of information exchange
also requires broader interoperability,
but where do we place the fences?
Clinical data, regulatory submissions,
discovery research?
Industry agreements, nationally
accredited, global standardization?
One answer is mapping:
Relating Terminologies for Effective
Data Exchange
Mapping: NCI Metathesaurus
A filtered version of the NLM UMLS
Metathesaurus, extended with additional
required vocabularies
 1,100,000 concepts, 2,200,000+ terms
and phrases with definitions
 Mappings among over 55 vocabularies
 Extensive synonymy: Over 40,000 terms
for neoplasms mapped to 7,000
concepts
Used as online dictionary and thesaurus,
for mapping and document indexing
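Synonymy of this kind, with many terms resolving to one concept, can be sketched as a normalized lookup table. This is an illustration only and not how the Metathesaurus is implemented: the `normalize`, `build_index`, and `lookup` helpers are hypothetical, `EXAMPLE-1` is a made-up identifier, and the `C42887` synonyms are borrowed from the Aerosol Dosage Form concept shown later in this deck.

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(term.lower().split())


# concept identifier -> terms that all carry the same meaning
SYNONYMS = {
    "C42887": ["AER", "Aerosol", "Aerosol Dosage Form", "Aerosol Dose Form"],
    "EXAMPLE-1": [  # hypothetical identifier, for illustration only
        "Gastric Mucosa-Associated Lymphoid Tissue Lymphoma",
        "Gastric MALT lymphoma",
    ],
}


def build_index(synonyms):
    """Invert the synonym table into a term -> concept lookup."""
    index = {}
    for concept_id, terms in synonyms.items():
        for term in terms:
            index[normalize(term)] = concept_id
    return index


INDEX = build_index(SYNONYMS)


def lookup(term):
    """Return the concept identifier for a term, or None if it is unknown."""
    return INDEX.get(normalize(term))
```

Indexing documents by concept identifier rather than by surface term means a search for one synonym also retrieves text that uses another.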
NCI Metathesaurus (2)
Minor releases monthly, major releases
two to three times a year
Provides a mapped overlap and partial interrelation of current versions of vocabularies
required by NCI and its partners, e.g., the ICDs,
MedDRA, SNOMED, MeSH (NLM Medical
Subject Headings), HCPCS (procedures),
LOINC (lab values), drug terminologies (VA
NDF-RT, AOD, RxNORM, Multum, NCI
Thesaurus drugs, etc.)
NCI Metathesaurus: Browser Example
1/12/2006 #22
EVS Products & Services
Are Open
NCI Thesaurus is Open Content
ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm
NCI Metathesaurus is Mostly Open Source
See Each Source’s License
http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet
NCI EVS Servers Are Freely Accessible
On the Web: http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov
Via API: http://ncicb.nci.nih.gov/core/caBIO
All Software Developed by NCI EVS is Public Open
Source and Free for the Asking: http://ncicb.nci.nih.gov/core
NCI builds on EVS via caCORE
Infrastructure
Enhanced
Information
integration
Cross-discipline
reasoning
capabilities
biomedical objects
common data elements
controlled vocabulary
Enterprise Vocabulary
NCI Metathesaurus
(Cross-mapped standard vocabularies,
e.g. ICD’s, MedDRA, SNOMED)
 Semantic integration, inter-vocabulary
mapping among 55+ vocabularies
 UMLS Metathesaurus extended with
numerous additional vocabularies
 1,100,000+ Concepts, 2,200,000
terms and phrases
NCI Thesaurus


Description logic-based
55,000+ “Concepts”
 Concept is the semantic unit
 One or more terms describe a
Concept – synonymy
 Semantic relationships between
Concepts
Freestanding terminologies

MedDRA, MGED, NDF-RT, GO, SNOMED, etc.
Common Data Elements
(caDSR)
Structured data
reporting elements
Precisely defined,
harmonized
questions and
answers


Standardized
questions for forms
Standard lists of
coded valid values for
answers
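A common data element as described here pairs a standardized question with a coded list of valid answers. The sketch below assumes a deliberately simplified structure and is not the caDSR or ISO 11179 model: the `CommonDataElement` class, the question text, and the codes `DF-1` to `DF-4` are hypothetical, while the answer meanings reuse the dosage form names that appear later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A standardized question with a closed list of coded answers."""
    question: str
    valid_values: dict  # coded value -> human-readable meaning

    def validate(self, answer):
        """Accept only answers that are codes from the valid value list."""
        return answer in self.valid_values

    def meaning(self, answer):
        """Return the meaning of a coded answer, or raise if it is invalid."""
        if not self.validate(answer):
            raise ValueError(f"{answer!r} is not a valid value for {self.question!r}")
        return self.valid_values[answer]


# Hypothetical element; the codes are placeholders, not real identifiers.
DOSAGE_FORM = CommonDataElement(
    question="Aerosol dosage form",
    valid_values={
        "DF-1": "Aerosol Foam Dosage Form",
        "DF-2": "Aerosol Spray Dosage Form",
        "DF-3": "Metered Aerosol Dosage Form",
        "DF-4": "Powder Aerosol Dosage Form",
    },
)
```

Rejecting free-text answers such as `"foam"` at data entry is what keeps the collected values comparable across forms and studies.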
Biomedical Information Objects
(caBIO)
UML object models
representing clinical
and research entities
such as genes,
sequences,
chromosomes,
pathways, etc.
Public access APIs
provide an information
interface independent
of back-end data
platforms
Controlled Terminology is integrated into
NCI’s standards supporting infrastructure
Enterprise Vocabulary Services (EVS)



Core Semantics for caCORE and many other applications
Public access browsers
APIs
cancer Data Standards Repository (caDSR)


ISO 11179 metadata repository
Common Data Elements (CDE’s) for multiple templates, such
as Case Report Forms, drawn from EVS terminology
cancer Bioinformatics Infrastructure
Objects (caBIO)


UML Models annotated with EVS concepts/terms, loadable
into caDSR
Public access APIs
EVS: Extending Interoperability
Beyond the Enterprise
Leverage Collaborations




Federal: FDA, VA, CDC, other NIH Institutes
Major Standards Organizations: HL7, CDISC,
W3C
Cancer Centers and Cooperative Groups
(caBIG, caGRID)
Many research collaborators such as the
Microarray Gene Expression Data Society
(MGED)
FDA-NCI MOU
Significance of MOU


NCI is leveraging its terminology-related resources to
address FDA needs
Avoids expenditure at FDA to replicate existing,
available resources at NCI, increases return on
investment for NIH/NCI
Leverages multiple efforts


FDA collaboration with NIH/NCI will result in
improved trial drug and related regulatory
terminology for the broader clinical trials community
FDA and NCI are to coordinate regarding terminology
standards efforts such as HL7 RCRIM (including
CDISC)
Example:
NCI EVS and FDA SPL
NCI EVS maintains and provides access to FDA
SPL Terminology
NCI Thesaurus will be a primary namespace
used
Also FDA standard terminology for the ICSR,
IND/NDA, device nomenclature, others

Access Via
 Download at ftp://ftp1.nci.nih.gov/pub/cacore/EVS/
 Public, open API http://ncicb.nci.nih.gov/core/caBIO
 Web Servlet at
http://nciterms.nci.nih.gov
Concept Details
URI: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C42887
Version: December 30, 2004 (04.12g)
Aerosol Dosage Form
Identifiers:
name: Aerosol_Dosage_Form
code: C42887
Information about this concept:
Preferred_Name: Aerosol Dosage Form
Semantic_Type: Manufactured Object
DEFINITION: FDA|A product that is packaged under pressure and contains therapeutically active ingredients that are released upon activation of an appropriate valve system; it is intended for topical application to the skin as well as local application into the nose (nasal aerosols), mouth (lingual aerosols), or lungs (inhalation aerosols).
Synonym with source data: AER|AB|FDA_CDER|246
Synonym with source data: Aerosol Dosage Form|PT|NCI
Synonym with source data: Aerosol|PT|FDA|246
Synonym: AER
Synonym: Aerosol
Synonym: Aerosol Dosage Form
Synonym: Aerosol Dose Form
Superconcepts:
Pharmaceutical Dosage Form
Subconcepts:
Aerosol Foam Dosage Form
Aerosol Spray Dosage Form
Metered Aerosol Dosage Form
Powder Aerosol Dosage Form
This indicates the concept is used in the FDA Structured Product Label (SPL)
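The concept report above carries two machine-usable pieces: a URI built from a dictionary name and a concept code, and pipe-delimited "Synonym with source data" values. The sketch below shows one way to handle both. It makes no network calls, the 2004-era URL may no longer resolve, and the field interpretation (term, term type, source, optional source code) is an assumption read off the three example values rather than a documented format.

```python
from urllib.parse import urlencode

# Base address copied from the concept report URI shown above.
BASE = "http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp"


def concept_report_url(code, dictionary="NCI_Thesaurus"):
    """Build a concept report URI from a dictionary name and concept code."""
    return BASE + "?" + urlencode({"dictionary": dictionary, "code": code})


def parse_synonym(raw):
    """Split 'term|type|source[|source_code]' into named fields (assumed layout)."""
    parts = raw.split("|")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 pipe-delimited fields: {raw!r}")
    term, term_type, source = parts[:3]
    source_code = parts[3] if len(parts) > 3 else None
    return {
        "term": term,
        "type": term_type,
        "source": source,
        "source_code": source_code,
    }


def terms_from_source(raw_synonyms, source_prefix):
    """List the terms whose source name starts with the given prefix."""
    parsed = [parse_synonym(raw) for raw in raw_synonyms]
    return [p["term"] for p in parsed if p["source"].startswith(source_prefix)]
```

For example, `terms_from_source([...], "FDA")` applied to the three values above selects the terms contributed by FDA sources, which is the kind of source tagging the slide's SPL note points to.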
A Vital Collaboration:
CDISC and NCI –
Shared models, metadata standards, and
core semantics drawn from standard
terminology
CDISC terminology group is working with NCI tools
through EVS for the specification and development of
broad based clinical trials standard terminology, based on
CDISC models
CDISC is using the NCI Data Standards Repository and
controlled terminology from NCI EVS, including but not
limited to NCI Thesaurus, for novel terminology
development
These open CDISC standards, developed in collaboration
with FDA, NIH, HL7 and others, can provide the basis for
a controlled terminology set able to be adopted across
the clinical trials community
NCIt Browser: CDISC Tagged Concept
NCI Thesaurus Concept: Race
Terminology concept for Race showing harmonization of different users, including CDISC, NCI, CDC, etc.
Benefits of Terminology Development in
a Common Environment
A Step Towards Semantic Interoperability
• Support and maintenance of terminologies in
NCI EVS provides access to and common usage
of standard terminologies
• Enables use of controlled terminology by
clinicians and researchers for data encoding,
retrieval, reporting, and aggregation
• Facilitates collaboration and information
exchange by increasing the ability to predictably
use information that is gathered
• Leverages the power of shared knowledge
You can collaborate
Joint Participation: In standards groups such as
HL7 RCRIM in order to inform relevant standards
decisions
Joint Development: Contributing to clinical trials
standard terminology development efforts, i.e.
through CDISC terminology group
Providing validation and testing: Content and
modeling developed with industry input is more
robust, better able to meet your needs, and you
can better plan/anticipate implementation/impacts
on your organization
Participate in HL7 RCRIM
The HL7 “Regulated Clinical Research Information
Management” Technical Committee, formed as a
collaboration of CDISC, FDA, and HL7
To facilitate the development of common
standards for clinical research information
management across a variety of organizations,
including government agencies, private research
efforts, and sponsored research
To develop standards for interchange of
regulated data that are interoperable with general
healthcare standards.
Contact:
Margaret W. Haber, RN, OCN
Co-Director
NCI Enterprise Vocabulary Services
NCI Office of the Director
mhaber@mail.nih.gov
http://evs.nci.nih.gov/