Summary
This master's thesis is part of the PASTAs project. The main goal of PASTAs is to determine what happens
to chronically ill patients as they move between their hospital, their primary doctor and other
services offered by the local authorities.
This thesis covers some of PASTAs' sub-goals:
A1. Develop an automated tool that extracts the data related to each patient from Norwegian Patient
Register (NPR) data and converts the NPR data to the Event Stream (ES) format.
A2. Identify clusters of actual trajectories. The clustering will use the ICD-10 ontology, which helps
classify diagnosis-related events as chronic or non-chronic diseases.
A3. Develop a visualization of the trajectories of chronically ill patients. The visualization will
build on a tool (Patient Explorer) from a previous master's thesis project.
The main focus is on goals A1 and A2. Goal A3 will be addressed only if time permits.
Contents
1. Introduction
1.1 Motivation
1.2 Our approach to the problem
1.3 Case to be investigated
1.4 Available Data
1.5 Methodology
1.6 Outline
2. Background
2.1 Event Stream
2.2 Temporal Event
2.3 Related Works
2.3.1 Computer-based Patient Record (CPR)
2.3.2 Knave-II
2.3.3 Patient Explorer
2.4 Visualization
3. Proposed Design
3.1 Domain
3.2 Files
3.2.1 Medical codes
3.3 Recognizing chronic diseases
3.3.1 Manually
3.3.2 Automatically
3.4 Ontology
3.4.1 Jena
3.5 Database
3.6 Create episodes
1. Introduction
This master's thesis is part of the PAtientS TrAjectorieS (PAsTAs) project. In this project we are going to
develop a methodology for presenting data in the Electronic Medical Record (EMR) format. The
presentation should show patients' trajectories through the Norwegian health care system, which includes
hospitals, primary care and local organizations that offer health services.
1.1 Motivation
Until now, health care organizations such as hospitals, primary care and local care organizations have
kept patients' records separately. It is therefore hard for researchers to get an overview of a
patient's trajectory. A patient trajectory is a curve that presents the process and current status of a
patient during encounters with the health care system. It has two axes: the horizontal axis shows the
date and the vertical axis shows the event. A date can be a single day, such as a visit to primary
care, or a period of time, such as receiving a service or being hospitalized. An event can be a
diagnosis or a service that the patient receives on a specific date.
By visualizing the patient trajectory, researchers can observe how the treatment plan has worked for
the patient and whether the treatment is making progress [6].
1.2 Our approach to the problem
In this thesis we focus on events that are related to chronic disease. A chronic disease is a
disease of long duration that progresses slowly. Diseases considered chronic include heart disease,
cancer, HIV and diabetes [4].
According to Strauss and Corbin [8], a patient trajectory normally ends when the patient leaves the
hospital, but for chronic diseases it can continue with trajectory work in clinics and repeated visits
to the hospital and physicians' offices. It is therefore important to recognize which patients have
chronic diseases, so that their trajectories can be followed correctly without missing any related data.
To obtain a complete patient trajectory, we also need to analyze the events that are not related
to chronic diseases and classify them properly.
1.3 Case to be investigated
To recognize chronic diseases, we use the International Classification of Diseases
(ICD-10). A filter built on these international classification codes lets the system distinguish
chronic diseases from non-chronic ones.
The issue is that ICD-10 contains no classification of chronic diseases, so we need to find a way
to recognize chronic diseases automatically in the ICD-10 ontology.
The next step is to classify events into groups that we call episodes. An episode groups the
events that are related to each other. To build this classification we need to analyze the data and
find all possible episodes. For instance, a patient might have received a service as a result of a
diagnosis made in the hospital.
1.4 Available Data
Which organization has provided the data?
1.5 Methodology
Case study?
1.6 Outline
2. Background
This chapter discusses different methodologies and tools and evaluates them against our existing
dataset to see whether they can be used in our tool.
2.1 Event Stream
Each dataset file can be considered a data table consisting of rows (records) and columns
(fields). In Electronic Medical Record (EMR) systems, a data table holds information about one
specific aspect of the patient's care. The data tables are linked by an identifier column, which
makes it possible to query across them [5].
The information required for analysis can be extracted from the EMR by querying the data tables. The
extracted information is called an Event Stream (ES).
The simplest version of an ES has three columns. The first is an identifier, which in this case is the
patient's identification number. The second is the event, which tells what kind of event the row
describes. The third is the time.
In an event stream, time is a single point, expressed for example in milliseconds or in days. For
instance, patient 1 was hospitalized on day 0 and tested on day 23.
More columns can be added to an ES if required.
In our data each event covers a period of time, so it has a start date and an end date.
We therefore need a different event format.
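A minimal sketch of what such an interval-based event row could look like in Java. It is not the format used by the tool: the class and method names (EventStreamSketch, Event, onDay, streamFor) are hypothetical and chosen only for illustration. A single-day event is modelled as an interval whose start and end dates coincide, so the basic three-column stream remains a special case.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: an event stream whose rows carry a start and an end date. */
final class EventStreamSketch {

    /** One row: patient identifier, event type, and the period the event covers. */
    static final class Event {
        final long patientId;
        final String eventType;
        final LocalDate start;
        final LocalDate end;

        Event(long patientId, String eventType, LocalDate start, LocalDate end) {
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("end date precedes start date");
            }
            this.patientId = patientId;
            this.eventType = eventType;
            this.start = start;
            this.end = end;
        }

        /** A single-day event, as in the basic three-column event stream. */
        static Event onDay(long patientId, String eventType, LocalDate day) {
            return new Event(patientId, eventType, day, day);
        }

        /** Number of days between start and end; 0 for a single-day event. */
        long durationInDays() {
            return ChronoUnit.DAYS.between(start, end);
        }
    }

    /** Returns the rows of one patient, ordered by start date. */
    static List<Event> streamFor(long patientId, List<Event> rows) {
        List<Event> result = new ArrayList<>();
        for (Event row : rows) {
            if (row.patientId == patientId) {
                result.add(row);
            }
        }
        result.sort(Comparator.comparing((Event row) -> row.start));
        return result;
    }
}
```

Keeping both dates on every row means that hospital stays and one-day visits can share one table, at the cost of one extra column compared with the simplest ES.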
2.2 Temporal Event
In this approach, a patient's history is divided into a sequence of episodes:
$e_1, e_2, \ldots, e_n$
Each episode contains different events [7]:
$e_i = \langle PID_{i1}, PID_{i2}, \ldots, PID_{ik} \rangle$
2.3 Related Works
Write about previous works and what are their differences
2.3.1 Computer-based Patient Record (CPR)
2.3.2 Knave-II
2.3.3 Patient Explorer
2.4 Visualization
Explain how we are going to use Patient Explorer
3. Proposed Design
3.1 Domain
Here I will explain about the domain in general. What kommune and hospital files are.
3.2 Files
In detail explain different columns in kommune and hospital
3.2.1 Medical codes
ICD-10: ICD stands for the International Classification of Diseases. ICD-10 classifies medical
diagnoses with alphanumeric codes.
The codes are divided into the following categories [1] [2]:
1) Physical diseases, syndromes, disorders, problems and diagnoses that are known by
physicians, such as eye disease, metabolic disorders, pregnancy problems and disorders
involving the immune mechanism (A00 – E90, G00 – Q99).
2) Mental and behavioral disorders (F00 – F99)
3) External causes of injury or death
   • Injury, poisoning and certain other consequences of external causes (S00 – T98), such as
     the effects of a foreign body (for example a metal shard or an eyelash; in general any
     external object) entering through a natural orifice.
   • External causes of morbidity and mortality (V01 – Y98), such as accidents and intentional
     self-harm.
4) Symptoms with less certainty or unknown cause
   • Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
     ◦ Symptoms and signs (R00 – R69)
     ◦ Abnormal findings on examination (R70 – R99)
   • Codes for special purposes; these are subcategories for unknown diseases and their
     causes.
     ◦ Provisional assignment of new diseases of uncertain etiology (U00 – U49)
     ◦ Bacterial agents resistant to antibiotics (U80 – U89)
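The grouping above can be approximated by looking only at the leading letter of a code. The following Java sketch assumes that simplification: it ignores the numeric bounds of each range, the Group names are hypothetical labels for the four categories listed above, and letters the list does not mention are reported as UNKNOWN.

```java
import java.util.Locale;

/** Illustrative only: coarse grouping of ICD-10 codes by their leading letter. */
final class Icd10GroupSketch {

    /** Hypothetical labels for the four categories described in the text. */
    enum Group { PHYSICAL, MENTAL, EXTERNAL_CAUSE, SYMPTOM_OR_SPECIAL, UNKNOWN }

    static Group groupOf(String icd10Code) {
        if (icd10Code == null || icd10Code.trim().isEmpty()) {
            return Group.UNKNOWN;
        }
        char letter = icd10Code.trim().toUpperCase(Locale.ROOT).charAt(0);
        if (letter == 'F') {
            return Group.MENTAL;                 // F codes: mental and behavioral disorders
        }
        if ((letter >= 'A' && letter <= 'E') || (letter >= 'G' && letter <= 'Q')) {
            return Group.PHYSICAL;               // A-E and G-Q codes
        }
        if (letter == 'S' || letter == 'T' || (letter >= 'V' && letter <= 'Y')) {
            return Group.EXTERNAL_CAUSE;         // S-T and V-Y codes
        }
        if (letter == 'R' || letter == 'U') {
            return Group.SYMPTOM_OR_SPECIAL;     // R and U codes
        }
        return Group.UNKNOWN;                    // letters not covered by the list
    }
}
```

A range-aware version would also have to compare the numeric part of the code, for example to separate R00 – R69 from R70 – R99.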
ICPC2
NCMP
NCSP
3.3 Recognizing chronic diseases
One of the goals of this project is to recognize chronic diseases. To do this automatically,
I used International Classification of Diseases (ICD) codes. ICD is used to classify diseases and other
health problems [3]. However, this classification does not group chronic diseases.
3.3.1 Manually
To recognize chronic diseases I first tried to use only the Norwegian ICD10 ontology. In this ontology,
the ICD10 codes are classified by UMLS semantic type: an attribute, UmlsSemanticType, has been added
that classifies each code as disease or syndrome, finding, health care activity, congenital
abnormality, mental or behavioral dysfunction, injury or poisoning, pathologic function or neoplastic
process.
The Chronic Condition Indicator (CCI) classifies ICD9 codes as chronic or not. In its list, the
value 1 marks a chronic disease and 0 a non-chronic one [9]. The problem was that no CCI existed for
ICD10 codes. I therefore decided to convert the ICD9 codes to ICD10 codes and then manually add a
Chronic attribute to the ICD10 ontology.
This approach was time-consuming, and I could not be fully sure that the resulting codes were
correct.
3.3.2 Automatically
Later I obtained the International Classification of Primary Care, second edition (ICPC2) codes [10].
These codes are divided into acute, chronic and non-disease categories.
In the ICD10 ontology each code has corresponding ICPC2 codes, so I decided to use the ICPC2 codes to
recognize chronic diseases in ICD10.
Some codes that ICPC2 categorizes as acute are recognized as chronic in ICD10. On closer inspection, I
found that some acute diseases are considered chronic because they last more than 3 months. To handle
this, I decided to first query the ICD10 ontology for the chronic ICPC2 codes and then, to complete
the list, query for the codes whose name tag in the ICD10 ontology mentions chronic.
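The two-step selection can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the Jena-based implementation: the ontology is assumed to be already loaded into simple in-memory objects, and the names ChronicCodeSketch, Icd10Entry and chronicIcd10Codes are hypothetical. The keyword is passed as a parameter because the wording used in the name tag of the Norwegian ontology is not specified here.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: two-step selection of chronic ICD-10 codes. */
final class ChronicCodeSketch {

    /** Hypothetical in-memory view of one ICD-10 ontology entry. */
    static final class Icd10Entry {
        final String code;            // ICD-10 code
        final String name;            // content of the name tag
        final Set<String> icpc2Codes; // ICPC2 codes linked to this ICD-10 code

        Icd10Entry(String code, String name, Set<String> icpc2Codes) {
            this.code = code;
            this.name = name;
            this.icpc2Codes = icpc2Codes;
        }
    }

    static Set<String> chronicIcd10Codes(Collection<Icd10Entry> ontology,
                                         Set<String> chronicIcpc2Codes,
                                         String keyword) {
        Set<String> chronic = new TreeSet<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (Icd10Entry entry : ontology) {
            // Step 1: the entry is linked to at least one ICPC2 code marked chronic.
            for (String icpc2 : entry.icpc2Codes) {
                if (chronicIcpc2Codes.contains(icpc2)) {
                    chronic.add(entry.code);
                    break;
                }
            }
            // Step 2: the name tag itself mentions the keyword (case-insensitive).
            if (entry.name.toLowerCase(Locale.ROOT).contains(needle)) {
                chronic.add(entry.code);
            }
        }
        return chronic;
    }
}
```

Collecting the result in a set means a code found by both steps appears only once, and the sorted order makes the final list easy to review by hand.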
3.4 Ontology
3.4.1 Jena
To extract the information we need about chronic diseases, I use Jena. Jena is an Apache
framework for Semantic Web applications. It provides tools and Java libraries for developing Semantic
Web applications [11]. Jena supports several formats, such as RDF/XML, N3, Turtle and N-Triples. The
ICD10 Web Ontology Language (OWL) file is in RDF/XML format. For reasoning over and querying the
ontology I will use the SPARQL Protocol and RDF Query Language (SPARQL). SPARQL is an SQL-like language
for querying RDF data [12].
The ontology can be queried in two ways: through the SPARQL panel in Protégé, or from a Java
application using the Jena library. In this project I have decided to run the queries from a Java
application, because this makes it easier for the user to apply changes later (flexibility).
OWL has three sublanguages: OWL Lite, OWL DL and OWL Full. The ontology uses OWL Lite.
OWL Lite was designed primarily to support classification hierarchies. It supports cardinality
constraints, but only permits 0 or 1 as cardinality values.
3.5 Database
Explain about the database tables and their relations
3.6 Create episodes
Explain how we queried database and what the episodes are
References
[1] ICD-10-CM Official Guidelines for Coding and Reporting 2012, http://www.cdc.gov/nchs/data/icd10/10cmguidelines2012.pdf
[2] ICD-10 Version 2012, http://apps.who.int/classifications/icd10/browse/2010/en#/S05.0
[3] http://www.who.int/classifications/icd/en/
[4] Chronic disease, WHO, http://www.who.int/topics/chronic_diseases/en/, visited in January 2013
[5] Mikel Aickin, "Event-Stream Format", pp. 1-21, 2011
[6] Arild Faxvaag, Lillian Røstad, Inger A. Tøndel, Andreas R. Seim and Pieter J. Toussaint, "Visualizing Patient Trajectories on Wall-Mounted Boards – Information Security Challenges", IOS Press, 2009
[7] Svetla Boytcheva and Galia Angelova, "A Workbench for Temporal Event Information Extraction from Patient Records", AIMSA 2012, pp. 48-58
[8] Anselm Strauss and Juliet M. Corbin, "Grounded Theory in Practice", March 11, 1997, pp. 231
[9] http://www.hcup-us.ahrq.gov/toolssoftware/chronic/chronic.jsp
[10] http://www.who.int/classifications/icd/adaptations/icpc2/en/index.html
[11] http://jena.apache.org/
[12] http://opentox.org/data/documents/development/RDF%20files/JavaOnly/query-reasoning-withjena-and-sparql