Pathology Informatics & MD Anderson Cancer Center
Mark Routbort, MD, PhD
University of Texas MD Anderson Cancer Center, Houston, Texas

MD Anderson Cancer Center
• Major tertiary care/referral center for cancer diagnostics, treatment, and research
• Over 200 active clinical trials
• No residency program, but over 30 new fellows annually distributed between
  – Surgical pathology
  – Hematopathology
  – Laboratory medicine
  – Molecular pathology
• 80+ pathology/lab medicine faculty
• Visiting rotations available in pathology informatics
Visiting pathology informatics fellowships

Informatics as part of a career in pathology
• As the primary or a major focus
  – Medical directorship with oversight of a technical group
  – There are currently 3 active faculty vacancies in the US and Canada fitting this description
• As a “hat” or role in a department or practice
  – For an individual lab, e.g. molecular pathology
  – Participation in processes (such as RFP) and committees
  – Engaging in search for solutions
• As a life-long interest
  – The “go-to” person

Why do we care?
• As providers of pathology & laboratory services, interpretation and accurate reporting of information are at the core of our clinical endeavors
• 70/70 rule-of-thumb for lab data
  – 70% of clinical data points emerge from laboratory data
  – Used in 70% of clinical decision making
• Pathology data
  – The definitive diagnosis of cancer
    • Diagnosis dictates chemotherapy/surgery and prognosis
    • Staging guides protocols & management
• Practical knowledge is practically useful

Principles driving informatics at MD Anderson
• Consider the “source of truth” of data
  – Seek to minimize the number of transformations, and especially human transcriptions, data goes through
  – This favors a query-based architecture of primary systems, and encourages stewardship of data by domain experts
• Use standards where possible and practicable
  – Don’t re-invent the wheel
  – But don’t let the pursuit of perfection prevent forward progress
  – Generally, it is most important to design well, and openly
• Leverage and develop clinician informaticians to work at the junction of clinical practice, data management, and research
  – Synergistic skills
  – This means recognizing informatics as both a clinical and research service to the institution

Pathology informatics faculty at MD Anderson
• 3 full-time faculty members with significant (>=50%) dedicated informatics time
• We are recruiting one more!
• Michael Riben, MD
  – Workflow & change management
  – Vocabularies/terminology
• Mary Edgerton, MD
  – Institutional tissue bank
  – Microarray data standards and computational modeling/analysis

First principles and practical applications
• Well covered by practical informatics course at this conference & John Sinard’s “bible” – Practical Pathology Informatics
• Data structures & web services
• Medical data transmission, HL7, and pathology reporting
• Workflow fundamentals
• Information retrieval and vocabularies

The need for relational models
• A model of laboratory blood draws using a simple 2-dimensional table rapidly fails

Relational databases enable an extensible model of the real world

Databases versus “Excel”
• Databases
  – Model the world as entities and relationships
  – Primary keys for identifying entities
  – Foreign keys for linking entities
  – Denormalization
  – Referential integrity
• The “Excel” view (two-dimensional tables) is not bad:
  – Needed for:
    • Statistical analysis
    • Graphing
  – But it can always be derived as a “view” from the database model

Microsoft Access®, distributed with the near-ubiquitous Microsoft Office suite, is an excellent model system for learning about relational databases

Relational lingua franca for the Web: XML
• eXtensible Markup Language
• W3C specification for data modeling
• Human and machine readable
• Self-describing

XML Schema
• Describes what conforming XML data should look like (a blueprint)
• Required and optional elements and attributes, cardinality, data types, complex structure
• Conformance to a schema represents a contract for exchange of information

Web services and WSDL
• Web services:
  – Software constructs designed to support interoperable machine-to-machine interaction over the World Wide Web
  – Service provider
  – Service requester
  – Optionally, service broker
  – Most commonly enabled by the SOAP (Simple Object Access Protocol) specification
• WSDL: Web services description language
  – An XML-based language that provides a model for describing Web services
  – Modern application development environments (Java, .NET) can consume WSDL directly to automatically create client (requester)-side code to consume the service

MD Anderson SPiDR
• Web services based “shared pathology data repository” for all clinical lab and pathology data

Web services and schemas can greatly facilitate connections to complex data sources
1. Instantiate the service proxy
2. Execute a service call to return a LabData object
3. Bind the grid to the LabData object
The data model/schema directly dictates the run-time appearance:

Information transfer: Health Level 7 (HL7)
• Messaging standard for health care inter-systems communication
• Founded 1987, versions 2.1, 2.2, 2.3 from 1990-1999, in wide use for communicating lab and pathology results (version 2.x)
• ANSI standard

CBC (Supergroup) result message examples - Partial result message
MSH|^~\&|ESI|LAB|INVISION_PMS|HIS|20050331155000-0600||ORU^R01|2980822|T|2.1
PID|1||000000000999999|00000|TEST^MICKEY^N||19400313|F||W|||||||UNK|000010501880256|428827901
PV1|1|O|DICT^DICT|||||||731||||HIS|||0000361^WALTERS, RONALD S. M|R||||||||||||||||||||||||||20050301144200-0600|20050402155000-0600
OBR|1|5500280|01014775200001550550028025032847925032847900000000101|5500312^CBC^COMPLETE BLOOD CNT/DIF/PLT|RT|20050331152000-0600|20050331154200-0600|||PCCGS^SO, CELIA G.||||20050331154300-0600||0000361^WALTERS, RONALD S. M||1||0000509003089|G|||LA|P||^^^200503311520^^RT
OBX|001|NM|5500009^WBC^WHITE BLOOD CELL COUNT|| 2.4|K/UL| 4.0-11.0|L|||F||00000000000000225200|20050331155000.0000-0600|IIM^INSTRUMENT PERFORMED ID|PCNDA^ACOSTA, NOEL D.
OBX|002|NM|5500018^RBC^RED BLOOD CELL COUNT|| 3.03|M/UL| 4.00-5.50|L|||F||00000000000000225200|20050331155000.0000-0600|IIM^INSTRUMENT PERFORMED ID|PCNDA^ACOSTA, NOEL D.

How is information commonly conveyed?
[Diagram: Pathologist -> pathologist, transcriptionist, or resident entry -> AP-LIS “native” pathology report (“DIAGNOSIS: Metastatic adenocarcinoma.”) -> format conversion to ASCII text -> HL7 -> interface engine -> HL7 -> HIS database/viewer -> custom display logic -> Clinician]

HL7 is not WYSIWYG (what you see is what you get)
[Figure: the same report as shown in the HIS viewer and in the pathology system]
The integrity of semantic content is at stake in any transformation

“Direct” electronic delivery of pathology reports
[Diagram: Pathologist -> self, transcriptionist, or resident entry -> Rich Text Format (RTF) “native” pathology report stored in PowerPath database -> web service based direct query for report -> HIS viewer with custom path report viewing control -> Clinician]

Current pathology reporting at MD Anderson
[Figure labels: Web service, Pathology system, EMR]

Workflow foundations from an informatics perspective
• Data model of the objects involved in your business process
• Defined transitions between states associated with rules and events
• Identify objects with machine readable technologies

Workflow foundations: asset identification
• Bar coding
  – 1D or 2D machine readable data encodings
  – Line of sight (laser or digital imaging detection)
  – Can represent
    • a machine readable form of human information, or
    • a unique identifier for the asset
• RFID (radio frequency identification)
  – Digital data encoded in an RFID tag is captured by a reader using radio waves
  – Non-line of sight
  – Can multiplex
  – Active, passive, and hybrid forms
  – Relatively expensive compared to bar codes

RFID vs. Barcode
• RFID
  – Does not require “line of sight”
  – No-contact
  – No operator
  – Simultaneous (parallel) identification
  – Data storage is greater (up to 30x)
  – Smaller tag size required
  – Read reliability – eliminates multiple scanning attempts
  – Harder to deploy
• Barcode
  – Line of sight required
  – Requires operator most of the time
  – Serial identification
  – Limited information (increased with 2D)
  – Requires larger tag size
  – May require multiple scan attempts
  – Cheap
  – Easily deployable

Workflow example at MD Anderson: Introducing new technology to the grossing lab
• Previous system: Telephony based with very limited dictation workflow control or metrics

Background
• “Free-flow” dictation style with batching of numerous dictations in a single session
• No connection between dictation system and AP-LIS
• Significant percentage of transcription time spent listening to dictation for case information, typing in accession numbers, and loading case into PowerPath (estimate 10-20% depending on case type)
• Routing and priority dependent on correct punching of numbers on touch pad
• “Paper-towel syndrome” provoked by profound distrust in system

Goals
• Replace dictation system with telephony independent solution
• Use non-proprietary dictation hardware if possible
• Needed solution that can tolerate conditions of grossing environment (fluids, biohazards)
• Bring clinical and ancillary information closer to grossing personnel
• Drive system with bar codes
  – Connect physical specimens to AP-LIS
  – NO numeric input by humans
  – Wanted to be focus-free (no keyboard wedges)
• Route priority and process according to known workflow rules based on case type

Hardware selection
Ergotron LX wall mount and Elo 1529 Touch panels
InSync Buddy Microphone
Symbol MS3207 Scanners
Kinesis Footpedals

Solution: Software
• Dictation module – WinScribe™
  – Client/server application capable of operating with numerous off-the-shelf dictation hardware solutions
  – Basic API for automation
  – Capable of basic rule & priority based routing
  – Attractive licensing model for our installation (concurrent transcriptionists)
• AP-LIS - PowerPath™
• Touch panel keyboard (for system login) – Click-N-Type freeware
• Workflow “wrapper” – PathStation
  – MD Anderson custom workflow application for Pathology
  – Supplements, not supplants, existing AP-LIS
  – Integrates bar coding with WinScribe, PowerPath, and EMR
  – Made it possible to create a fully hands-free dictation environment, except for initial login

Specimen arrival
• Institutional label
• MRN is bar coded
[Figure: MRN in Code 39]

Institutional bar code (MRN) drives accessioning
• If no recent specimen match, start a new case on the patient
• If recent specimens exist, system offers choice of new case or add on to existing case

Ready to gross with new labels that drive workflow

Scanning specimen at workstation:
1. Opens case in PowerPath
2. Starts a new dictation in the WinScribe system
3. Sets the patient & case context in PathStation

Real-time dictation job overview and review

Payoffs
• No human data entry of accessions or MRNs either by dictators or transcriptionists
• Dictations instantly and easily available for review at the case level
• No prioritization or case type flags entered by hand – simply scan specimen or requisition and dictate
• Better dictation turnaround times with new system
• Well-established framework for further enhancement – bar codes on case paperwork also drive pathologist workflow at signout
• Users have found novel uses of the system interconnects (patient and case context synchronization)

Information retrieval
“Those who cannot remember the past are condemned to repeat it.” George Santayana
• Effective retrieval and analysis from our laboratory and pathology information systems is key to knowledge development
  – Transactional queries: Where is this specimen right now?
  – Analytic queries: How many lab tests of each type are we doing monthly?
  – Identification queries: Can I find cases of metastatic rhabdomyosarcomas in our database?

Mechanics of search engines
• Spidering/crawling
• Corpus
• Document caching and indexing
• Query tools: keywords, phrases, conjunction, ranking

Searching the web vs searching pathology reports
• Web based searches:
  – Favor relevance (precision) over full retrieval (recall), thus
    • Ranking extremely important
• Pathology reports – predominantly case-finding for:
  – Examples (full recall not needed)
  – Cohort generation (need high recall and precision)
  – Data extraction (similar to cohort generation, but with added goal of automatically or semi-automatically harvesting granular data)

Foundations: inverted files (text indexes; concordances)
• The key to efficient retrieval

The power of indexes
• Library of Congress
  – ~ 20 million books
  – ~ 5 trillion words
  – If fully indexed, a binary search could find any word by looking at roughly 42 entries (log2 of 5 trillion is about 42.2)
  – At that point, you’d have a list of every single book of the 20 million which contained that word
• Indexes versus one-dimensional catalogs

Text index based retrieval
[Figure: full-text index (system) and results table over the report fields Diagnosis (dx), Gross description (gd), Clinical history (“None given.”), and SNOMED (sc); table index (SQL Server)]

Text index based retrieval tool at MD Anderson

Text-based retrievals versus data element capture (synoptic reporting)
• Text retrievals
  – With appropriate index engine, can be extremely fast
  – Simple to use
  – Do not require any modification of incoming data
  – Very good at retrieving rare diagnoses
  – Very poor at
    • Data extraction
    • Semantic retrievals (find all cases which have metastatic carcinoma, not those in which the phrase “no evidence of metastatic carcinoma” occurs)
    • Intraobserver retrieval
• Synoptic reporting/structured documentation
  – Elements in report are defined and maintained for downstream use
  – Separation of presentation of report from content/data
  – Data elements are “marked up” and remain searchable
    • What is the average size of largest lymph node metastasis in tumor X?
    • With clinical data correlation, can determine clinically relevant elements

Vocabularies/ontologies: SNOMED CT
[Figure: concept hierarchy rooted at Disease > Cancer, with nodes such as Adenocarcinoma]
• A systematized, hierarchical nomenclature (ontology) facilitating
  – Concept (not word) based retrieval
  – Aggregation and subsumption
  – Every concept has interrelationships with other concepts that provide logical computer readable definitions. These include hierarchical relationships and clinical attributes.
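Concept-based retrieval with subsumption can be sketched in a few lines. The example below is only an illustration: the is-a links are an assumed, simplified reading of the slide's hierarchy diagram, concept names stand in for real SNOMED CT identifiers, and the case identifiers are hypothetical.

```python
from collections import defaultdict

# Toy is-a hierarchy (child -> parent) using concept names from the slide.
# Assumption: these links are a simplified reading of the diagram, NOT
# actual SNOMED CT content; real concepts carry numeric identifiers.
IS_A = {
    "Cancer": "Disease",
    "Carcinoma": "Cancer",
    "Adenocarcinoma": "Carcinoma",
    "Squamous cell carcinoma": "Carcinoma",
    "Lymphoma": "Cancer",
    "Hodgkin lymphoma": "Lymphoma",
    "Classical Hodgkin lymphoma": "Hodgkin lymphoma",
    "Non-Hodgkin lymphoma": "Lymphoma",
    "Follicular lymphoma": "Non-Hodgkin lymphoma",
    "Mantle cell lymphoma": "Non-Hodgkin lymphoma",
}


def build_children(is_a):
    """Invert child -> parent links into parent -> set of children."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    return children


def subsumed_by(concept, children):
    """Return the concept itself plus all of its descendants."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_cases(cases, concept, children):
    """Return ids of cases coded with the concept or any descendant."""
    wanted = subsumed_by(concept, children)
    return sorted(cid for cid, coded in cases.items() if coded in wanted)


# Hypothetical coded cases: case id -> concept assigned at signout.
CASES = {
    "S-1": "Follicular lymphoma",
    "S-2": "Adenocarcinoma",
    "S-3": "Classical Hodgkin lymphoma",
    "S-4": "Mantle cell lymphoma",
}

CHILDREN = build_children(IS_A)
lymphoma_cases = find_cases(CASES, "Lymphoma", CHILDREN)
print(lymphoma_cases)  # ['S-1', 'S-3', 'S-4']
```

A query for "Lymphoma" returns the follicular, mantle cell, and Hodgkin cases even though none of them is coded with the word "Lymphoma" alone, which is the aggregation behavior a plain text search cannot guarantee.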
[Figure: SNOMED CT hierarchy excerpt. Node labels: Carcinoma; Squamous cell carcinoma; Lymphoma; Hodgkin lymphoma; Lymphocyte-predominant Hodgkin lymphoma; Classical Hodgkin lymphoma; Nodular sclerosis HD; Mixed cellularity HD; Lymphocyte rich HD; Non-Hodgkin’s lymphoma; Non-Hodgkin B-cell lymphoma; Mantle cell lymphoma; Follicular lymphoma]

Summing up – Major pathology informatics projects in the last 3 years at MD Anderson
• Completed
  – Creation of a real-time RTF viewer for pathology reports in our EMR to improve legibility and comprehensibility of reports
  – Importation of a large scale legacy Fortran store of lab data going back to the 1970s into a modern relational format
  – Creating a unified, structurally robust repository for all clinical lab and pathology data which is used in a federated architecture by our EMR over web services
  – Real-time document scanning of all paperwork associated with pathology cases
  – Creation of a “Workflow integration application” (PathStation) for pathologists which unifies multiple disparate applications under single sign-on and patient context, driving workflow through bar codes
  – Integration of the PathStation application in the grossing room to drive dictation and associated grossing workflow
• In progress
  – Pathology workflow optimization: Integrate bar coded identification of pathology blocks and slides with a real-time workflow model
  – Expansion of use of the shared pathology data repository to research users and non-transactional query models
  – Virtual slide implementation for outside consultation material
  – Implementation of SCC SoftLab LIS as well as specialized modules for
    • Cytogenetics
    • Flow cytometry
    • Molecular diagnostics
  – Establishment of enterprise vocabulary services for the institution (Mike Riben)
  – Microarray analysis and statistical modeling of breast carcinoma (Mary Edgerton)
  – Tissue banking enhancements (Mary Edgerton)

Summing up - Optimizing the pathology informatics cycle
[Figure: cycle with three vertices: Domain expertise, Information models and technologies, Workflow]
• Only imagination and time limit the scope of work to do
• There is a tremendous synergism available when someone possesses the skill and inclination to engage at all vertices
• Consider advanced training in informatics
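As a hands-on follow-up to the HL7 version 2.x result message shown earlier, the sketch below splits one OBX (observation result) segment into named fields using only string operations. The sample segment is modeled on the WBC line of that message with trailing fields omitted and the reference range assumed to read "4.0-11.0"; the function name and the dictionary keys are illustrative choices, not part of any HL7 library.

```python
# Sample OBX segment modeled on the CBC result message shown earlier.
# Assumptions: trailing fields are omitted and the reference range is
# written as "4.0-11.0".
OBX_SEGMENT = (
    "OBX|001|NM|5500009^WBC^WHITE BLOOD CELL COUNT|| 2.4|K/UL| 4.0-11.0|L|||F"
)


def parse_obx(segment, field_sep="|", component_sep="^"):
    """Split one OBX segment into a dict of commonly used result fields.

    The separators default to the usual HL7 v2 delimiters; a full parser
    would read them from the MSH segment instead of assuming them.
    """
    fields = segment.split(field_sep)
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    # Pad short segments so the positional lookups below cannot fail.
    fields += [""] * (12 - len(fields))
    # OBX-3 is a coded element; the three components are named here the
    # way this sample message appears to use them.
    parts = (fields[3].split(component_sep) + ["", "", ""])[:3]
    return {
        "set_id": fields[1],                   # OBX-1
        "value_type": fields[2],               # OBX-2 (NM = numeric)
        "test_code": parts[0],                 # OBX-3, component 1
        "test_name": parts[1],                 # OBX-3, component 2
        "test_description": parts[2],          # OBX-3, component 3
        "value": fields[5].strip(),            # OBX-5
        "units": fields[6],                    # OBX-6
        "reference_range": fields[7].strip(),  # OBX-7
        "abnormal_flag": fields[8],            # OBX-8 (L = below low normal)
        "result_status": fields[11],           # OBX-11 (F = final)
    }


result = parse_obx(OBX_SEGMENT)
print(result["test_name"], result["value"], result["units"], result["abnormal_flag"])
# WBC 2.4 K/UL L
```

Even this small example shows why HL7 is not WYSIWYG: the receiving system sees only delimited fields, and everything about how the result is displayed to the clinician is decided by the receiver's own display logic.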