SNOMED CT Data Quality and Data Repair

Dr Jeremy Rogers
IHTSDO Consultant Terminologist
Principal Terminology Specialists NHS HSCIC
IHTSDO ImpSIG, Amsterdam, October 27th 2014

Outline
• Data Quality: An old problem
• SNOMED CT: New ways to get it wrong
• SNOMED CT: New ways to prevent or fix it

Medicine needs useful formal ontologies, but formal ontologies that are simple to use are not useful, while useful ontologies appear to be too complex to be directly useable.
MD Thesis 2004

Old data quality problems…
Interrater Variability

ART & ARCHITECTURE THESAURUS (AAT)
• Domain: art, architecture, decorative arts, material culture
• Content: 125,000 terms
• Structure: 7 facets, 33 polyhierarchies
  – Associated concepts (beauty, freedom, socialism)
  – Physical attributes (red, round, waterlogged)
  – Style/Period (French, impressionist, surrealist)
  – Agents (printmaker, architect, jockey)
  – Activities (analysing, running, painting)
  – Materials (iron, clay, emulsifier)
  – Objects (gun, house, painting, statue, arm)
• Synonyms
• Links to ‘associated’ terms
• Access: lexical string match; hierarchical view

Data Quality
Untrained, time pressured users
[Figure: grid of X marks against candidate index terms: Headcloth, Cloth, Scarf, Model, Person, Woman, Adults, Standing, Background, Brown, Blue, Chemise, Dress, Tunics, Clothes, Suitcase, Luggage, Attache case, Brass Instrument, French Horn, Horn, Tuba]

Data Quality
Types of coding error
• Missed coding: no code, e.g. Table?
• Miscoding: wrong code, e.g. French Horn, Arms
• Undercoding: half the truth, e.g. Brass Instrument
• Overcoding: truth + lies, e.g.
Woman

Outline
• Data Quality: An old problem
• SNOMED CT: New ways to get it wrong
• SNOMED CT: New ways to fix it

SNOMED CT Miscodes
39 months in a busy UK A&E Department
• Setting: 408,831 coded ED episodes
  – One ‘reason’ code per completed visit
  – 39 months (Oct 2008 – Dec 2011)
  – 12,022 distinct SCT codes selected at least once
• Users: No training, time pressured
• Browser: string match on all of SNOMED, no hierarchy

SNOMED CT: New ways to get it wrong
‘Ontology-driven’ miscodes
11% of all data ‘obviously’ miscoded

SNOMED CT: New ways to get it wrong
‘Obvious’ miscode examples

Count  Term                     SNOMED CT concept
1097   Temperature              246508008|Temperature (attribute)|
145    High temperature         285717004|High temperature (physical force)|
118    FB (Foreign Body)        367409002|Followed by (attribute)|
24     TB (Tuberculosis)        60117003|Transmitted by (attribute)|
33     Spot                     23840004|Leiostomus xanthurus (organism)|
15     MI (Myocardial Infarct)  45169001|Without (attribute)|
17     Drug used                246488008|Drug used (attribute)|
8      Drugs                    228011000000101|Drugs (navigational concept)|
373    ETOH - Alcohol intake    160573003|Alcohol intake (observable entity)|
207    Alcohol                  53041004|Alcohol (substance)|
117    EtOH – Ethanol           419442005|Ethyl alcohol (substance)|
50     Ethanol                  419442005|Ethyl alcohol (substance)|
33     Lymph node               59441001|Structure of lymph node (body structure)|
136    Nasogastric tube         17102003|Nasogastric tube, device (physical object)|
82     Catheter                 19923001|Catheter, device (physical object)|
78     Dressing                 37898001|Dressing, device (physical object)|
396    Psychiatric              27296002|Psychiatric (qualifier value)|
230    Stabbing                 410706007|Stabbing sensation quality (qualifier value)|

SNOMED CT: New ways to get it wrong
Subtle miscode examples
• Coding foreign bodies…
• Disorder or treatment?

11% of all data ‘obviously’ miscoded?
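The ‘obvious’ miscodes listed above share one feature: the selected concept sits in a hierarchy (attribute, organism, substance, qualifier value and so on) that cannot sensibly be a reason for attendance. The slides do not say how the 11% figure was derived, so the following is only a minimal sketch of one possible screen, assuming a miscode is ‘obvious’ when the semantic tag at the end of the concept's name falls outside a chosen set. The set `PLAUSIBLE_TAGS` and the function names are illustrative assumptions, not part of the talk.

```python
import re
from typing import Optional, Tuple

# Assumption: semantic tags accepted for a 'reason for attendance' field.
# This set is illustrative only; a real deployment would need to agree its own.
PLAUSIBLE_TAGS = {"disorder", "finding", "situation", "event"}

# Matches the "id|term (tag)|" notation used on the slides.
_CODED = re.compile(r"^\s*(\d+)\s*\|\s*(.+?)\s*\|\s*$")
# Matches the trailing parenthesised semantic tag of a term.
_TAG = re.compile(r"\(([^()]+)\)$")


def parse_coded(text: str) -> Tuple[str, str]:
    """Split 'id|term (tag)|' into (id, term)."""
    match = _CODED.match(text)
    if match is None:
        raise ValueError(f"not in id|term| form: {text!r}")
    return match.group(1), match.group(2)


def semantic_tag(term: str) -> Optional[str]:
    """Return the trailing semantic tag in lower case, or None if absent."""
    match = _TAG.search(term.strip())
    return match.group(1).lower() if match else None


def is_obvious_miscode(text: str) -> bool:
    """Flag a coded entry whose semantic tag is outside PLAUSIBLE_TAGS."""
    _, term = parse_coded(text)
    tag = semantic_tag(term)
    return tag is not None and tag not in PLAUSIBLE_TAGS


# A few of the entries from the slide above.
SLIDE_EXAMPLES = [
    "246508008|Temperature (attribute)|",
    "285717004|High temperature (physical force)|",
    "23840004|Leiostomus xanthurus (organism)|",
    "53041004|Alcohol (substance)|",
    "410706007|Stabbing sensation quality (qualifier value)|",
]

flagged = [entry for entry in SLIDE_EXAMPLES if is_obvious_miscode(entry)]
```

A screen of this kind would catch only the ‘obvious’ cases; the subtle miscodes (for example disorder versus treatment) stay within plausible hierarchies and would pass unflagged.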
SNOMED CT: New ways to get it wrong
No ‘standard’ miscoding error rate
• 23% of 74 abdominal aortic aneurysms miscoded as a Drug Trade Family (9192101000001100 AAA (product)); AAA is the name of a pharmaceutical company that makes a spray to soothe sore throats
• 25% of 939 stabbing victims miscoded as a qualifier value (‘stabbing sensation quality’ = quality of pain experienced during heart attack)
• 33% of 3771 patients with some form of fever miscoded as ‘temperature’, either as an (attribute) or a (physical force)
• 38% of 1101 failed consultations (patient left the department, or did not attend an appointment) miscoded as either a laterality (left) or as deoxyribonucleic acid (DNA = Did Not Attend)
• 44% of 575 patients attending with a fish bone stuck in their throat miscoded as a food allergen (7661006|Fish bone (substance)|)
• 49% of 5,062 alcohol-related attendances miscoded as either the substance (alcohol, ethyl alcohol) or just the mood disorder of feeling elated (‘intoxicated’), which does not necessarily involve alcohol intake at all

Outline
• Data Quality: An old problem
• SNOMED CT: New ways to get it wrong
• SNOMED CT: New ways to prevent or fix it

Much of the complexity of formal ontologies arises from the consistent application of semantic patterns and choices. The cognitive load of using a complex formal ontology can be reduced if these patterns and choices are made explicit as a metamodel of the ontology, and where the metamodel is subsequently harnessed to guide user choices pre hoc and transform expressions post hoc to a preferred semantic form.
MD Thesis 2004

SNOMED CT: New ways to prevent it
Pre-hoc data capture
• (User training)
• Clinical search-and-browse
• Suppress non-sensical chapters
• Display concepts in hierarchy
• Data entry constraints/validation
• Speciality Subsets
• But beware: risk of non-interoperable sublanguages
• Structured Data Entry

Non-interoperable sublanguages…
with thanks to Malcolm Duncan
http://www.mrtablet.demon.co.uk/chocolate_teapot_lite.htm

SPECIALTY ONE
Crockery
---Teapot 5
------Brown teapot 2
------White teapot 1

SPECIALTY TWO
Crockery
---Teapot
------Brown teapot
------White teapot
------Blue teapot
------China teapot

SPECIALTY THREE
Crockery
---Teapot
------Brown teapot
------White teapot
---------White china teapot
------Blue teapot
---------Blue china teapot
------China teapot
---------White china teapot
---------Blue china teapot
------Pink teapot
------Aluminium teapot
------Chocolate teapot
------Wooden teapot
------Paper teapot

[Counts from the slide whose positions against the trees did not survive extraction: 2 2 1 2 1 1 0 0 0 1 1 0 0 1 1 0 1 1 1]

SNOMED CT: New ways to prevent it
Post-hoc data repair
• Query Table
• ‘Semantic redirection’
• Manual redirection

‘Semantic redirection’ ?

IF code IN <<442083009|Anatomical or acquired body structure| THEN
    SELECT CASE epr_context
        CASE = “diagnosis”
            code → disorder:findingSite=code
        CASE = “procedure”
            code → procedure:procedureSite=code
    END SELECT
END IF

Manual redirection
• ‘OD’ = overdose
• Hypoglycemia
• Dyspepsia? Dysuria?

Thank you
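The ‘semantic redirection’ rule shown earlier can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the talk: `TOY_IS_A` is a hypothetical, heavily simplified stand-in for the real SNOMED CT is-a relationships, the function names are invented, and the output strings simply reuse the slide's shorthand (`disorder:findingSite=code`) rather than full SNOMED CT expressions.

```python
from typing import Dict, Set

# Root concept named on the slide: 442083009|Anatomical or acquired body structure|
BODY_STRUCTURE_ROOT = "442083009"

# Hypothetical toy is-a table (child -> parents). The direct link below is a
# simplification for illustration; the real hierarchy has intermediate concepts.
TOY_IS_A: Dict[str, Set[str]] = {
    "59441001": {"442083009"},  # Structure of lymph node (body structure)
}


def is_descendant_or_self(code: str, ancestor: str, is_a: Dict[str, Set[str]]) -> bool:
    """Emulate the '<<' operator: true if code is ancestor or lies below it."""
    stack = [code]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, ()))
    return False


def redirect(code: str, epr_context: str, is_a: Dict[str, Set[str]] = TOY_IS_A) -> str:
    """Rewrite a bare body-structure code according to where it was recorded."""
    if not is_descendant_or_self(code, BODY_STRUCTURE_ROOT, is_a):
        return code  # rule does not apply; leave the code unchanged
    if epr_context == "diagnosis":
        return f"disorder:findingSite={code}"
    if epr_context == "procedure":
        return f"procedure:procedureSite={code}"
    return code  # contexts the slide does not cover are left unchanged


repaired = redirect("59441001", "diagnosis")
```

Leaving unmatched codes and unlisted contexts untouched is a design choice of this sketch; the slide itself specifies only the two cases shown.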