Clinical Coding: The Basics (MN Kamel Boulos)

Terminologies, Classifications
and Groupings
Dr M N Kamel Boulos
MIM Centre,
City University,
London
XaBpR (aka Dermatologist)
E-mail: dk708@city.ac.uk
2001
Why Use a Clinical Terminology?
Free-Text Search Weaknesses
• Brute force free-text search techniques cannot locate
relevant knowledge efficiently for three reasons:
– The sought page might be using a different term (or
synonym) that points to the same concept. Myocardial
infarction and coronary thrombosis cannot be matched,
although they are the same.
– Spelling mistakes and variants are considered as
different terms in a computer environment. For
example, psoriasis (correct spelling) and psoriaisis
(typographical error) cannot be matched. Similarly,
anaemia (correct UK spelling) and anemia (correct US
spelling) cannot be matched.
Why Use a Clinical Terminology?
Free-Text Search Weaknesses (Cont’d)
• Brute force free-text search techniques cannot locate
relevant knowledge efficiently for three reasons (Cont’d):
– In the bibliographic world, e.g., MEDLINE, search engines
cannot process HTML intelligently. For example, searching
for resources on ‘psoriasis’ will retrieve all the documents
containing this word, but many of these resources might not
be relevant, i.e., psoriasis was just mentioned in passing in
these documents and is not their actual topic. For example,
some documents might mention psoriasis within a
‘See also: psoriasis’ sentence, e.g., at the bottom of a page
covering another papulosquamous disease, or under the
differential diagnosis section of a page covering another
disease, e.g., Reiter’s disease or seborrhoeic dermatitis.
You might be doing a search on the term ‘stroke’ (cerebral
infarction) and end up with documents that teach you about the
workings of the two-stroke motorcycle engine. The
non-discriminatory free-text method of document retrieval
inevitably produces a number of irrelevant leads or noise
(Kiley, 1999).
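The synonym and spelling-variant problems above can be sketched in a few lines. This is only an illustration: the documents, the term table and the concept identifiers (C_MI, C_ANAEMIA) are invented and are not real Read codes. The sketch contrasts a literal string search with a lookup that first maps every known term to a shared concept.

```python
# Toy comparison of free-text search and concept-based lookup.
# The documents, concept identifiers and term table are hypothetical.

DOCUMENTS = {
    "doc1": "Management of myocardial infarction in primary care",
    "doc2": "Coronary thrombosis: early thrombolysis",
    "doc3": "Treating anemia in pregnancy",
}

# Every known term (synonym or spelling variant) points to one concept.
TERM_TO_CONCEPT = {
    "myocardial infarction": "C_MI",
    "coronary thrombosis": "C_MI",
    "heart attack": "C_MI",
    "anaemia": "C_ANAEMIA",
    "anemia": "C_ANAEMIA",
}


def free_text_search(query: str) -> list[str]:
    """Return documents whose text literally contains the query string."""
    q = query.lower()
    return sorted(d for d, text in DOCUMENTS.items() if q in text.lower())


def index_by_concept() -> dict[str, set[str]]:
    """Code each document with the concepts whose terms it mentions."""
    index: dict[str, set[str]] = {}
    for doc_id, text in DOCUMENTS.items():
        lowered = text.lower()
        for term, concept in TERM_TO_CONCEPT.items():
            if term in lowered:
                index.setdefault(concept, set()).add(doc_id)
    return index


def concept_search(query: str) -> list[str]:
    """Map the query to a concept, then return documents coded with it."""
    concept = TERM_TO_CONCEPT.get(query.lower())
    return sorted(index_by_concept().get(concept, set()))
```

Here free_text_search("anaemia") finds nothing because doc3 uses the US spelling, while concept_search("anaemia") returns doc3. The sketch does not address the third weakness (a term mentioned only in passing); that needs coding at the point of data entry.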
Solution
• Structured (headings), coded data entry.
• It is essential that healthcare professionals agree on
the nature and content of the component data sets of
the different EPRs (e.g., record structures and
headings related to the different medical, surgical and
nursing specialities), so that consistent basic models
of these records can be constructed and shared in a
reliable way. (The framework of headings Web site
http://www.nhsia.nhs.uk/headings/ is a good example
of well-organised standardisation efforts.)
Headings are very important. For example, the same term “schizophrenia” assumes
different meanings/implications depending on whether it appears under diagnosis or
present/past history or family history headings in an electronic patient record.
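A minimal sketch of why the heading matters, assuming a record is simply a list of (heading, concept) pairs. The class name, the heading strings and the concept labels are all invented for illustration and do not reflect any real EPR structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEntry:
    heading: str  # record heading the entry was filed under
    concept: str  # coded concept (hypothetical label, not a real code)


RECORD = [
    CodedEntry("family history", "SCHIZOPHRENIA"),
    CodedEntry("diagnosis", "PSORIASIS"),
]


def mentions(record, concept):
    """True if the concept appears anywhere in the record."""
    return any(e.concept == concept for e in record)


def has_diagnosis(record, concept):
    """True only if the concept was recorded under the diagnosis heading."""
    return any(e.heading == "diagnosis" and e.concept == concept for e in record)
```

For this record, mentions(RECORD, "SCHIZOPHRENIA") is true but has_diagnosis(RECORD, "SCHIZOPHRENIA") is false: the patient has a relative with the condition, not the condition itself.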
Introduction to Clinical Coding
• Clinical coding lies at the heart of successful implementation of the
EPR and integrated decision support modules.
• EPR coding is done using a fine granularity terminology (or
controlled healthcare vocabulary) like Read CTV3 (Read Clinical
Terms Version 3).
• A controlled healthcare vocabulary is a system of concepts to
populate electronic healthcare applications. Controlled healthcare
vocabularies are products of the electronic era, designed to support
computer-based functionality.
• Read CTV3 allows not only the coding of diagnoses and drugs
(treatment), but also the coding of symptoms and signs, and of
different tests and investigations. Moreover, Read CTV3 is a
compositional terminology, which means that concepts can be
constructed from primitive building blocks with rules controlling
different combinations.
• On the other hand, a clinical classification allows
categorisation of clinical data according to intrinsic rules.
Formal clinical classifications have existed for over 100
years, initially for mortality, but more recently for morbidity
and interventions. Classifications like the WHO's ICD-10 (The
International Statistical Classification of Diseases and
Related Health Problems, tenth revision) offer a coarser
granularity (1000s of entries vs. 100,000s of entries in clinical
terminologies) and only single parentage (so that an item
may not be counted twice under different headings), and are
therefore more suitable for statistical reporting (national
statistics and international comparisons) using aggregated
data.
• Groupings like HRGs (Healthcare Resource Groups)
have an even coarser granularity, lumping
together tens of different conditions in single groups
according to their resource consumption (100s of
groups). Grouping information from aggregated EPRs
helps in resource management, planning and budget
negotiations.
How Do Concept Codes Help in
Decision Support and Research?
All the concepts represented by the terms in Read CTV3 are arranged
in a hierarchy (multiple parentage allowed), i.e., they are semantically
defined within the clinical vocabulary. The hierarchy describes which
concepts are types of something else. Consider this example:
Myocardial infarction (and its synonyms)
is a type of
Ischaemic heart disease (and its synonyms)
which is a type of
Disorder of heart (and its synonyms)
which is a type of
Cardiovascular disorder (and its synonyms)
which is a type of
Disorders (or diseases)
which is a type of
Clinical findings
Now suppose that a doctor wishes to prescribe a drug that must not be
used by anyone having a heart disorder. Because the clinical
terminology knows every condition that is a type of heart disorder, it
can automatically check the patient’s record to see whether the patient
has any of these conditions.
This could not have been achieved with a free text patient record. For
instance, Angina is a type of heart disorder, but this could not have
been detected in a text-based patient record, where the best that can
be achieved is to search for the word "heart" in the text. Nor would it
have been possible to search for words "heart OR angina OR coronary
OR myocardial OR ... etc.", as there are over 1000 types of heart
disorder listed in the Read Codes for example.
Read Codes’ hierarchy of concepts allows all sorts of research
questions such as "list all my patients who have eczema, and list these
according to the type of eczema that they have." The possibilities are
endless...
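The contraindication check and the eczema query described above can be sketched as follows. The heart-disorder chain follows the example hierarchy given earlier; placing Angina under Ischaemic heart disease and the two eczema subtypes are assumptions made for illustration, and the function names are hypothetical.

```python
# Fragment of an is-a hierarchy (child -> list of parents).
# The eczema subtypes and the position of Angina are assumptions.
IS_A = {
    "Myocardial infarction": ["Ischaemic heart disease"],
    "Angina": ["Ischaemic heart disease"],
    "Ischaemic heart disease": ["Disorder of heart"],
    "Disorder of heart": ["Cardiovascular disorder"],
    "Cardiovascular disorder": ["Disorders"],
    "Atopic eczema": ["Eczema"],
    "Discoid eczema": ["Eczema"],
    "Eczema": ["Disorders"],
    "Disorders": ["Clinical findings"],
}


def ancestors(concept):
    """Collect every concept reachable by following is-a links upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_type_of(concept, ancestor):
    """True if concept is the ancestor itself or one of its descendants."""
    return concept == ancestor or ancestor in ancestors(concept)


def contraindicated(patient_concepts, forbidden):
    """Return the patient's coded conditions that are types of `forbidden`."""
    return sorted(c for c in patient_concepts if is_type_of(c, forbidden))


def group_by_subtype(patients, parent):
    """List patients under each recorded subtype of `parent`."""
    groups = {}
    for name, concepts in patients.items():
        for c in concepts:
            if c != parent and is_type_of(c, parent):
                groups.setdefault(c, []).append(name)
    return groups
```

A patient coded with Angina is flagged by contraindicated({"Angina"}, "Disorder of heart") even though the word "heart" never appears in the record.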
Controlled Clinical Terminology
Desirable Features (Cimino)
1 Concept based
2 Completeness (the compositional feature of a terminology
ensures completeness)
3 Synonymy (in this way the terminology is less restrictive and
richer; all synonyms of a concept point to it and are semantically
associated with it)
4 Hierarchical
5 Multiple classification and multiple parentage
6 Compositional
7 Semantic definition of concepts
8 Mapped to classifications (e.g., Read -> ICD10; can be one-to-one
or one-to-many maps; usually some detail is lost as
classifications have coarser granularity compared to
terminologies)
9 Language-independent model
No Ambiguity or Redundancy
• No duplicate concepts are allowed, i.e., cannot allow two
different ways of coding the same thing or concept, e.g.,
"Heart attack" and "Myocardial infarction" cannot be
considered two different concepts and given two concept
codes; they are just synonyms.
• "Paget disease" cannot be a concept’s preferred term or
label, because it is ambiguous; it can point to "Paget
disease of the breast" as well as "Paget disease of bone".
• Each concept has one unambiguous preferred term and
any number of synonyms. Synonyms may be shared with
other concepts, e.g., "Ventricle" is a synonym (but cannot
be the preferred term) of both "Cardiac ventricle" and
"Brain ventricle".
Concept and Term Codes
• "Plaque psoriasis" (concept label) has the Read
concept code ‘M1614’, while the code
(TermId) of the preferred term ‘Plaque
psoriasis’ (again) is ‘Y50HZ’ and those of the synonymous
terms are "Discoid psoriasis" – ‘Y50Ha’ and
"Nummular psoriasis" – ‘Y50Hc’, i.e., four
codes for this example: one concept code
and three TermIds.
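One way to picture the relationship between a concept code and its TermIds is the small data structure below. The four codes are those quoted in the example above; the class and function names are invented for illustration and do not describe how Read CTV3 files are actually organised.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    text: str
    preferred: bool = False


@dataclass
class Concept:
    code: str
    terms: list = field(default_factory=list)

    @property
    def preferred_term(self):
        """Return the single preferred term, or fail if there is not exactly one."""
        preferred = [t for t in self.terms if t.preferred]
        if len(preferred) != 1:
            raise ValueError("a concept needs exactly one preferred term")
        return preferred[0]

    def synonyms(self):
        """Texts of all non-preferred terms."""
        return [t.text for t in self.terms if not t.preferred]


plaque_psoriasis = Concept("M1614", [
    Term("Y50HZ", "Plaque psoriasis", preferred=True),
    Term("Y50Ha", "Discoid psoriasis"),
    Term("Y50Hc", "Nummular psoriasis"),
])


def find_concepts(concepts, text):
    """Return codes of all concepts having a term equal to `text` (case-insensitive)."""
    wanted = text.casefold()
    return [c.code for c in concepts
            if any(t.text.casefold() == wanted for t in c.terms)]
```

Because find_concepts returns a list, a shared synonym such as "Ventricle" could legitimately return more than one concept code.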
Arranging Concepts
• Concepts can be arranged
orthographically (by spelling, i.e., A to
Z), like a dictionary (e.g., Apple, Dog,
Orange, Zebra). However, arranging
concepts semantically (by meaning) like
a thesaurus is much more useful (e.g.,
Fruits [Apple, Orange], Animal [Dog,
Zebra]).
Directed Acyclic Graph (DAG)
• DAG allows multiple parentage and allows
concepts to be moved and reclassified as
medical knowledge changes (cf. the rigid code-dependent
hierarchy of ICD). With DAG,
unlimited hierarchy depths can be reached
(cf. only four levels in ICD), but all these
features of DAG come at the expense of
increased complexity for implementers.
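A minimal sketch of a DAG with multiple parentage, reclassification and an acyclicity check, using Python's standard graphlib module (Python 3.9 or later). The concept names and the helper functions are illustrative assumptions, not part of any real terminology.

```python
from graphlib import CycleError, TopologicalSorter

# child -> set of parents; concept names are illustrative only.
parents = {
    "Psoriatic arthritis": {"Joint disease", "Skin disease"},
    "Joint disease": {"Disorder"},
    "Skin disease": {"Disorder"},
    "Disorder": set(),
}


def is_acyclic(graph):
    """True if the parent links contain no cycle."""
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False


def reclassify(graph, concept, old_parent, new_parent):
    """Return a copy of graph with one parent link moved; reject cycles."""
    updated = {node: set(ps) for node, ps in graph.items()}
    updated[concept].discard(old_parent)
    updated[concept].add(new_parent)
    updated.setdefault(new_parent, set())
    if not is_acyclic(updated):
        raise ValueError("reclassification would create a cycle")
    return updated


def depth(graph, concept):
    """Length of the longest path from concept up to a root."""
    ps = graph.get(concept, set())
    return 0 if not ps else 1 + max(depth(graph, p) for p in ps)
```

The extra bookkeeping (copying the graph, checking for cycles on every change) is a small taste of the implementation complexity mentioned above.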
Enumerative Vs
Compositional Terminologies
• Enumerative (pre-coordinated) terminologies, where every
possible concept is listed explicitly, result in combinatorial
explosion and you can never be sure that you have listed all the
possibilities.
• A compositional (post-coordinated) terminology on the other
hand, like Read CTV3, seeks to construct concepts from
primitive building blocks, governed by validation rules.
• OAV (Object-Attribute-Value) triples constitute the description
logic scheme used in Read CTV3, and help achieve semantic
definition of concepts.
• In SNOMED (Systematised Nomenclature of Human and
Veterinary Medicine - College of American Pathologists), the
description logic is KRSS (Knowledge Representation System
Specification) while in GALEN it is GRAIL.
• N.B.: Read Codes are due to be merged with SNOMED by 2002, to create a
new worldwide standard clinical coding scheme. This will be called SNOMED
Clinical Terms (SNOMED-CT).
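The idea of building concepts from primitive blocks under validation rules can be sketched with Object-Attribute-Value triples. The rule table, attribute names and values below are invented for illustration; they are not the actual Read CTV3 qualifier rules.

```python
# Hypothetical validation rules: which attributes an object may take,
# and which values each attribute accepts.
SANCTIONED = {
    ("Fracture", "site"): {"Neck of femur", "Radius"},
    ("Fracture", "type"): {"Open", "Closed"},
}


def validate(obj, qualifiers):
    """Return a list of rule violations for a post-coordinated expression."""
    errors = []
    for attribute, value in qualifiers.items():
        allowed = SANCTIONED.get((obj, attribute))
        if allowed is None:
            errors.append(f"{attribute!r} is not an attribute of {obj!r}")
        elif value not in allowed:
            errors.append(f"{value!r} is not a valid {attribute} for {obj!r}")
    return errors


def compose(obj, **qualifiers):
    """Build an expression as a list of (object, attribute, value) triples."""
    errors = validate(obj, qualifiers)
    if errors:
        raise ValueError("; ".join(errors))
    return [(obj, a, v) for a, v in sorted(qualifiers.items())]
```

With two sites and two types, an enumerative terminology would have to list four fracture concepts explicitly; here the same four are composed on demand from five building blocks, and the gap widens quickly as attributes are added.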
Description Logics
• Description logics (DLs) lie at the heart
of any clinical terminology. DLs are
languages that allow reasoning about
information, in particular supporting the
classification of descriptions by working
out how concepts and their instances
relate to one another based on their
roles. They can thus infer knowledge
implied by an ontology.
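As a very rough illustration of how classification can be inferred rather than asserted, the toy below treats each concept as a set of defining (attribute, value) pairs and derives the hierarchy by set inclusion. Real description logics such as KRSS or GRAIL are far more expressive; the definitions and function names here are assumptions made only for this sketch.

```python
# Each concept is defined by a set of (attribute, value) pairs.
# These definitions are hypothetical.
DEFINITIONS = {
    "Disorder of heart": {("site", "Heart")},
    "Ischaemic heart disease": {("site", "Heart"), ("process", "Ischaemia")},
    "Myocardial infarction": {
        ("site", "Heart"), ("process", "Ischaemia"), ("morphology", "Infarct"),
    },
}


def subsumes(general, specific):
    """True if every defining pair of `general` also defines `specific`."""
    return DEFINITIONS[general] <= DEFINITIONS[specific]


def classify():
    """Infer direct parents: the most specific subsumers of each concept."""
    result = {}
    for c in DEFINITIONS:
        subsumers = {g for g in DEFINITIONS if g != c and subsumes(g, c)}
        result[c] = {
            g for g in subsumers
            if not any(o != g and subsumes(g, o) for o in subsumers)
        }
    return result
```

Nothing in DEFINITIONS states that Myocardial infarction is a type of Ischaemic heart disease; classify() works that out from the definitions alone.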
Terminology Servers
• Medical terminologies are foundational ontologies used by
many applications, and hence they should not be embedded
in client applications, but should be shared and reused as
distributed resources by implementing them as services
through terminology servers.
• A terminology server is a special type of ontology server
that allows retrieval of related concepts (parent, child,
sibling, cousin and uncle concepts) and synonyms, and
querying and cross-mapping multiple terminologies/
classifications at the same time. Ideally, it should also
support concept mapping, which involves processing free
text queries to identify corresponding terms from a controlled
vocabulary; this relieves users from any restrictions while
ensuring accurate results (contextual relevancy) and can
also support multiple languages.
Terminology Servers
(Cont’d)
• Chute et al. mention the following desiderata for a
clinical terminology server: word normalisation,
word completion, target terminology specification,
spelling correction, lexical matching, term
completion, semantic locality, term composition and
decomposition.
• Examples of terminology servers include Saphire
International
(http://www.ohsu.edu/cliniweb/saphint/) and jTerm
(http://www.jterm.org), a Java-based open source
terminology server.
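A few of the server functions named above (word normalisation, term completion, and retrieval of child, sibling and uncle concepts) can be sketched over a tiny in-memory hierarchy. The hierarchy is hypothetical and, for brevity, gives each concept a single parent; the function names are not taken from any real terminology server API.

```python
import re

# Hypothetical child -> parent links (single parentage for brevity).
PARENT = {
    "Plaque psoriasis": "Psoriasis",
    "Guttate psoriasis": "Psoriasis",
    "Psoriasis": "Papulosquamous disorder",
    "Lichen planus": "Papulosquamous disorder",
}


def normalise(text):
    """Lower-case, strip punctuation and sort words (crude word normalisation)."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))


def complete(prefix):
    """Term completion: known terms containing a word that starts with prefix."""
    p = prefix.lower()
    terms = set(PARENT) | set(PARENT.values())
    return sorted(t for t in terms
                  if any(w.startswith(p) for w in t.lower().split()))


def children(concept):
    return sorted(c for c, p in PARENT.items() if p == concept)


def siblings(concept):
    """Other children of the concept's parent."""
    parent = PARENT.get(concept)
    return [c for c in children(parent) if c != concept] if parent else []


def uncles(concept):
    """Siblings of the concept's parent."""
    parent = PARENT.get(concept)
    return siblings(parent) if parent else []
```

In a deployed server these functions would sit behind a network interface so that many client applications share one copy of the terminology.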
Classifications
• A classification is a system of categories to which
entities are assigned according to some
established criteria, e.g., anatomy, disease
process or pathology, aetiology, clinical (like
obstetrics), or a combination of these. Categories
are limited in number, all encompassing and
stable over time. Common and important entities
are assigned to specific categories, while
uncommon and less significant entities are
included within other categories.
Entities rules of
engagement in ICD
1 Index, e.g., 443.1 Buerger's disease (ICD9)
2 Inclusions (when an entity is less significant), e.g., 443.6 Other
(incl. Acrocyanosis, Diabetic peripheral angiopathy)
[excl. Chilblains, Frostbite, Immersion foot] (ICD9)
3 Exclusions (see above example; tells you not to count an entity under
this code, as it is listed elsewhere within the classification with another
code)
4 Otherwise specified categories (OS): include other specific but less
significant entities
5 Unspecified (NOS, Not Otherwise Specified), e.g., 443.9
Unspecified (incl. Intermittent claudication, Spasm
of artery) [excl. Spasm of cerebral artery (435)] (ICD9)
6 Extensions (5th digit), e.g., to differentiate between closed and open
fracture of the neck of femur, as an open fracture is much more liable to
infection and complications.
7 Dagger – asterisk, e.g., {Cause} 265.0 Beriberi {Effect}
425.7* Nutritional cardiomyopathies
Characteristics of a
Classification
• All concepts can find a single place in a suitable
classification (i.e., all-inclusive, mutually exclusive). A
single concept cannot be classified under two
different headings (i.e., no multiple parentage); this
prevents double counting of a condition, which is
essential for reliable statistics and central returns
(remember: statistics are the main raison d’être of
classifications).
• In classifications, you lose detail (related concepts
are aggregated and counted together; no distinction
between them is made at the code and statistics
levels).
• Classifications become less accurate with time, and will
eventually need revision, e.g., when new
diseases are discovered. (Where should these diseases go?
If we put them under existing categories, the meaning
of these categories will drift with time, making comparisons
with previous years' statistics compiled using the same
classification less accurate and reliable.) Updating a
classification also implies preparing an equivalence-mapping
table (to compare statistics compiled using different
versions of the classification).
• In addition to ICD (for primary diagnosis), OPCS-4 (a
surgical operations and procedure classification) is used in
the UK.
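The single-parentage and loss-of-detail points can be sketched as two many-to-one maps: fine-grained concepts map to classification categories, and categories map to resource groups. The category and group labels below are invented placeholders, not real ICD-10 or HRG entries.

```python
from collections import Counter

# Hypothetical maps. Many fine-grained concepts map to one category,
# and many categories map to one resource group.
CONCEPT_TO_CATEGORY = {
    "Plaque psoriasis": "Psoriasis",
    "Guttate psoriasis": "Psoriasis",
    "Atopic eczema": "Dermatitis and eczema",
    "Discoid eczema": "Dermatitis and eczema",
}
CATEGORY_TO_GROUP = {
    "Psoriasis": "Skin disorders, minor",
    "Dermatitis and eczema": "Skin disorders, minor",
}

EPISODES = ["Plaque psoriasis", "Guttate psoriasis",
            "Plaque psoriasis", "Atopic eczema"]


def classify_counts(episodes):
    """Count episodes per classification category."""
    return Counter(CONCEPT_TO_CATEGORY[c] for c in episodes)


def group_counts(episodes):
    """Count episodes per resource group."""
    return Counter(CATEGORY_TO_GROUP[CONCEPT_TO_CATEGORY[c]] for c in episodes)
```

Because each concept maps to exactly one category, the category counts always add up to the number of episodes (no double counting), but the distinction between plaque and guttate psoriasis is no longer visible in the statistics.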
The Pyramid
Mapping and Grouping Tools
CamsCoder presents us with a
review screen showing the
diagnostic and operative
procedures entered.
CamsCoder is a good example of a
mapping tool that translates Read
CTV3 terms into ICD-10 and
OPCS-4 terms and codes.
The coder can set the episode coding
to be complete when they are happy
with it.
The episode’s HRG is displayed here.
CamsCoder automatically validates
the information presented. If there are
any problems, a message is displayed
here explaining what the coder needs
to do.
If the coding were invalid, clicking on the
“Action” button would take the coder
through the process of making the
episode valid.
The coder now clicks on “OK” to finish
this coded episode.
CamsCoder allows the entered statements
(which may contain multiple terms and codes) to
be re-ordered/deleted using these buttons.
The LOINC Codes
• The LOINC database provides a set of universal names and ID
codes for identifying laboratory and clinical observations. The goal
of LOINC is to facilitate the exchange and pooling of clinical
laboratory results, such as blood haemoglobin or serum potassium,
for clinical care, outcomes management, and research.
• Currently, many laboratories are using HL7 or similar standards to
send laboratory results electronically from producer laboratories to
clinical care systems in hospitals. Most laboratories identify tests in
these messages by means of their internal (and idiosyncratic) code
values, so the receiving systems cannot fully "understand" the
results they receive unless they either adopt the producer's
laboratory codes (which is impossible if they receive results from
multiple source laboratories), or invest in work to map each
laboratory's code system to the receiver's internal code system.
You may download and use the LOINC database browser free of charge from
http://www.regenstrief.org/loinc/loinc.htm
• If laboratories all used the LOINC codes to
identify their results in data transmissions, this
problem would disappear. The receiving system
with LOINC codes in its master vocabulary file
would be able to understand and properly file
HL7 results messages that also use the LOINC
code. Similarly, government agencies would be
able to pool and analyse results for tests from
many sites if they were reported electronically
using the LOINC codes.
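The pooling argument can be sketched as follows. The local laboratory codes are invented; the two LOINC codes shown are believed to be those for blood haemoglobin (mass/volume) and serum or plasma potassium (moles/volume), but they should be checked against the LOINC database before any real use.

```python
# Local codes are hypothetical; verify the LOINC codes before real use.
LAB_A_TO_LOINC = {"HB": "718-7", "K": "2823-3"}
LAB_B_TO_LOINC = {"HGB01": "718-7", "POT": "2823-3"}


def to_loinc(results, local_map):
    """Re-key (local_code, value) results by LOINC code; reject unmapped codes."""
    out = []
    for local_code, value in results:
        if local_code not in local_map:
            raise KeyError(f"no LOINC mapping for local code {local_code!r}")
        out.append((local_map[local_code], value))
    return out


def pool(*result_sets):
    """Merge several LOINC-coded result sets into one dict of value lists."""
    pooled = {}
    for results in result_sets:
        for code, value in results:
            pooled.setdefault(code, []).append(value)
    return pooled
```

If both laboratories sent LOINC codes in the first place, the two mapping tables (and the work of maintaining them) would not be needed at all.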
2001 MeSH (Medical Subject
Headings)
• MeSH was originally developed by the United States’ National
Library of Medicine (NLM) to index the world medical literature
in MEDLINE (MeSH provides bibliographic headings for
indexing); the latest MeSH version is 2001 MeSH. MeSH also
forms an essential part of the NLM’s Unified Medical Language
System (UMLS).
• MeSH qualifiers or subheadings are used to better define a
topic, narrow retrieval, or express a certain aspect of a main
heading.
• It should be noted that MeSH is not an efficient indexing
language for tasks such as classifying episodes of patient care.
The more efficient clinical coding systems (e.g., Read
Codes/Clinical Terms Version 3) are more suited to coding the
Electronic Patient Record.
MeSH Descriptor Data for Psoriasis, a skin disease.
2001 MeSH Tree Structures
• The MeSH hierarchy allows broader
(parents or ancestors and siblings) and
narrower (children or successors)
concept relationships. Moreover, within
this hierarchy, a single concept may
appear as a narrower concept of more
than one broader concept, e.g.,
"Psoriatic Arthritis" appears under both
"Joint Diseases" and "Skin Diseases".
cf. ICD; remember each coding language or scheme is most suited to particular purpose(s).
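One way to picture a heading that sits in more than one tree is to give it several tree numbers and derive the broader heading from each number's prefix. The tree numbers below are made up for illustration and are not real MeSH tree numbers; the function names are likewise hypothetical.

```python
# Hypothetical tree numbers; real MeSH tree numbers differ.
TREE_NUMBERS = {
    "Joint Diseases": ["J01"],
    "Skin Diseases": ["S01"],
    "Psoriasis": ["S01.10"],
    "Psoriatic Arthritis": ["J01.20", "S01.30"],
}


def broader(heading):
    """Headings whose tree number is the immediate prefix of one of ours."""
    by_number = {n: h for h, nums in TREE_NUMBERS.items() for n in nums}
    result = []
    for number in TREE_NUMBERS[heading]:
        if "." in number:
            parent_number = number.rsplit(".", 1)[0]
            result.append(by_number[parent_number])
    return sorted(result)


def narrower(heading):
    """Headings that list `heading` among their broader headings."""
    return sorted(h for h in TREE_NUMBERS if heading in broader(h))
```

A search on "Skin Diseases" that is expanded to its narrower headings would therefore also retrieve articles indexed under "Psoriatic Arthritis".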
2001 MeSH Tree Structures
http://www.nlm.nih.gov/mesh/MBrowser.html
UMLS (Unified Medical
Language System)
• The UMLS project (http://umls.nlm.nih.gov/) is a long-term research and development project at the United
States' National Library of Medicine (NLM) whose goal is
to help health professionals and researchers to
intelligently retrieve and integrate information from a
wide range of disparate electronic biomedical information
sources. It can be used to overcome variations in the
way similar concepts are expressed in different sources.
This makes it easier for users to link information from
patient record systems, bibliographic databases, factual
databases, expert systems, etc. The UMLS Knowledge
Services can also assist in data creation and indexing
applications.
• The UMLS includes machine-readable "Knowledge
Sources" that can be used by a wide variety of
applications programs to compensate for
differences in the way concepts are expressed in
different machine-readable sources and by different
users, to identify the information sources most
relevant to a user inquiry.
• The Metathesaurus contains mappings to MeSH,
ICD-9-CM, SNOMED, CPT, and a number of other
coding systems.
• The UMLS is not itself a standard; it is a cross-referenced collection of standards and other data
and knowledge sources. It is a very valuable
resource for solving the most difficult problem in
exchanging healthcare information: the multiplicity
of coding systems in use today.
• One on-line use of the UMLS is the Medical World
Search site (http://www.mwsearch.com/). When a
user searches the Web for a medical concept,
Medical World Search uses UMLS to include
synonyms in the query.
The UMLS Project and its
Components
• The project is directed by a multidisciplinary team,
including clinicians, computer and information
scientists, and linguists, and involves collaboration with
many medical informatics research groups. The project
work has resulted in a set of knowledge sources and
accompanying programs that are updated and
distributed regularly on CD-ROM. Online access to the
UMLS knowledge sources is provided through the
Internet-based UMLS Knowledge Source Server, which
includes an application programming interface (API)
and a World Wide Web interface. The Web site requires
registration (http://umlsks.nlm.nih.gov/).
UMLS Metathesaurus
• The Metathesaurus contains information about
biomedical concepts and terms from a large number of
controlled terminologies and thesauri. The
Metathesaurus preserves the information encoded in
the source vocabularies, such as the hierarchical
contexts of the terms, their meanings and other
attributes. The Metathesaurus is organised by
concepts, which means that alternate names
(synonyms, lexical variants, and translations) for the
same meaning are all linked together as one concept.
The Metathesaurus adds information to the concepts,
including semantic types, definitions, and inter-concept
relationships.
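The concept-centred organisation described above can be sketched as a table of rows linking each source term to a shared concept identifier. All identifiers and codes below are placeholders invented for this sketch; they are not real Metathesaurus, MeSH, ICD-9-CM or Read identifiers, and the layout is not the actual distribution format.

```python
from collections import defaultdict

# (concept id, source vocabulary, source code, name); all placeholders.
ROWS = [
    ("CONCEPT-1", "MeSH", "MESH-0001", "Myocardial Infarction"),
    ("CONCEPT-1", "ICD-9-CM", "ICD-0001", "Myocardial infarction"),
    ("CONCEPT-1", "Read", "READ-0001", "Heart attack"),
    ("CONCEPT-2", "MeSH", "MESH-0002", "Psoriasis"),
    ("CONCEPT-2", "Read", "READ-0002", "Psoriasis"),
]


def build_indexes(rows):
    """Index rows by (source, code) and by concept."""
    by_source_code = {}
    by_concept = defaultdict(list)
    for concept_id, source, code, name in rows:
        by_source_code[(source, code)] = concept_id
        by_concept[concept_id].append((source, code, name))
    return by_source_code, by_concept


BY_SOURCE_CODE, BY_CONCEPT = build_indexes(ROWS)


def cross_map(source, code, target):
    """Codes in `target` vocabulary that share a concept with (source, code)."""
    concept_id = BY_SOURCE_CODE[(source, code)]
    return sorted(c for s, c, _ in BY_CONCEPT[concept_id] if s == target)


def names(concept_id):
    """All distinct names linked to one concept across sources."""
    return sorted({n for _, _, n in BY_CONCEPT[concept_id]})
```

Cross-mapping between two vocabularies then reduces to two lookups through the shared concept, rather than a separate table for every pair of coding systems.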
UMLS Metathesaurus (Cont’d)
• The Metathesaurus contains hundreds of thousands of concepts from
a broad range of vocabularies. These include, for example, all or
portions of the following terminologies:
– the Systematised Nomenclature of Medicine (SNOMED International),
– the Read Thesaurus,
– the International Classification of Diseases - Clinical Modification (ICD-9-CM),
– the Universal Medical Device Nomenclature System,
– the WHO Adverse Drug Reaction Terminology,
– the Classification of Nursing Diagnoses (NANDA),
– the Home Health Care Classification of Nursing Diagnoses and Interventions,
– the Physicians' Current Procedural Terminology (CPT),
– the Medical Subject Headings (MeSH),
– the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), and
– the Thesaurus of Psychological Index Terms.
• In addition, translations of some of the terminologies into languages
other than English are included.
UMLS Semantic Network
• The Semantic Network, through its high-level
semantic types, or categories, provides a consistent
categorisation of all concepts represented in the
Metathesaurus. The links between the semantic types
provide the structure for the Network and represent
important relationships in the biomedical domain.
There are semantic types for organisms, anatomical
structures, biologic function, chemicals, events,
physical objects, and concepts or ideas. The primary
relationship is the "is_a" link, and there are five major
categories of additional relationships: physical,
spatial, temporal, functional, and conceptual
relationships.
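A toy version of such a network: semantic types linked by "is_a", plus typed relationships that are inherited down the is_a links. The type names, the two relationships and their categories are simplified assumptions and do not reproduce the actual UMLS Semantic Network.

```python
# Hypothetical is_a links between semantic types (child -> parent).
IS_A = {"Drug": "Chemical", "Disease": "Biologic function"}

# Hypothetical relationships: (subject type, relation, object type) -> category.
RELATIONS = {
    ("Chemical", "affects", "Biologic function"): "functional",
    ("Anatomical structure", "location_of", "Biologic function"): "spatial",
}


def lineage(semantic_type):
    """The type itself followed by its is_a ancestors."""
    chain = [semantic_type]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def allowed(subject, relation, obj):
    """Return the relationship category if the link is licensed, else None."""
    for s in lineage(subject):
        for o in lineage(obj):
            category = RELATIONS.get((s, relation, o))
            if category:
                return category
    return None
```

The relationship is stated once, between the general types, and every more specific type inherits it: allowed("Drug", "affects", "Disease") succeeds without a dedicated entry.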
Software Implementations You
Can Experiment With
• Agora’s Web-based READ 3.1 browser that you can play with:
http://www.agora.co.uk:1080/read/gp.htm
Lightweight browser and search engine for the Read Codes clinical
thesaurus by Agora.
• CLUE (CIC Look Up Engine from The Clinical
Information Consultancy, UK) is a freeware clinical
coding solution that helps you add NHS Clinical
Terms Version 3 capabilities to clinical applications
in hours rather than months (you may use Visual
Basic for example to access CLUE's API). CLUE
also offers a ready-to-use Read Codes browser. You
may download the full CLUE package free of
charge, but beware that a terminology like Read
CTV3 with more than 200,000 concepts, nearly
300,000 terms and over a million access keys is not
a small download, so be prepared for this one (over
20MB).
http://www.clinical-info.co.uk/ClueDownload.htm
• CLUE (CIC Read CTV3 Look Up Engine from The Clinical
Information Consultancy, UK)
ICD9 CodeFinder lets you search and browse ICD9 categories and
codes. You may download CodeFinder free of charge (298 KB).
http://www.winsite.com/info/pc/win95/misc/cfind20.zip/
e-MDs Online ICD-9 Search: http://www.e-mds.com/icd9/index.html
Recommended Web Links
and Papers
• Bechhofer SK, Goble CA, Rector AL, Solomon WD, and Nowlan
WA. Terminologies and Terminology Servers for Information
Environments. In: Proceedings of STEP '97 Software Technology
and Engineering Practice, 1997. URI:
http://citeseer.nj.nec.com/354766.html
• Chute CG, Elkin PL, Sherertz DD and Tuttle MS. Desiderata for a
Clinical Terminology Server. In: Proceedings of AMIA'99 Annual
Symposium, 1999. URI: http://www.amia.org/pubs/symposia/D005782.PDF
• Rector AL. Clinical Terminology: Why Is it so Hard? Methods Inf
Med. 1999;38(4-5):239-52
• The British Association of Clinical Terminology Specialists:
http://www.bacts.org.uk/
• OpenGALEN: http://www.opengalen.org/
• Read Codes Engines: http://www.cams.co.uk/ and
http://www.visualread.com
• See also “Related Web Links” section at:
https://wwws.soi.city.ac.uk/intranet/students/courses/mim/mi/lect2_2.htm