Honours Projects – 2005
Jon Patrick

Language Technology Projects

1. Detecting the role of Prepositions in Language

Prepositions are an important part of the glue of meaning in language. No semantic analyser of language will be truly successful without effective detection of the meaning of a preposition in its usage. The aim of this project is to build a system that detects the manner and meaning in which a preposition is being used in each instance. This involves developing a system for describing the various meanings of prepositions and the various formal roles they can serve in their usage, and then developing algorithms for detecting those meanings and roles in a sentence.

2. Recognition of Multi-word Expressions (MWEs)

The task of recognising coherent clusters of words that form a single notion, such as the names of people, places, organisations and other types of named objects (films, ships, music groups), is notionally simple but quite complicated as a computational task. A variety of methods can be used for this task but we favour a machine learning approach. This involves two possible methods. The first uses existing corpora to learn the optimal rule sets for accurately detecting MWEs; however, such an approach produces a solution that is idiosyncratic to the training corpus. The second method is unsupervised learning and involves detecting MWEs without any prior knowledge. A further strategy combines both methods with some added intervention by manual processing. This project aims to set up a general framework for building a system for running MWE experiments and producing classifiers as a combination of a variety of Machine Learning methods.

3. Generalised canonicalisation of texts

Certain genres of texts, such as articles originating from newspapers or journals, or formal case reports written up by doctors on their patients, are generally expected to conform to conventions of structure and readability.
Yet other types of texts, such as notes on a patient written by a GP for later review, or emergency department reports entered by triage nurses as patients enter the ED, are marked by problems that tend to degrade the quality of any text processing we may wish to perform. Examples of problems arising from such texts are variance in the representation of core medical concepts (whether unconscious, such as typographical errors, or conscious, such as abbreviations), and the use of different notations to signify the same concept (for example, the many ways in which a doctor might denote a blood pressure reading). Still other problems are general to nearly all texts in the medical domain, such as the recognition of multiword expressions (for example, we would wish “arterial blood pressure” to be considered as a single unit, as opposed to three distinct and unrelated tokens).

The aim of this project is to design the architecture for text preprocessing and implement a strategy for making it operational. The pre-processed text is then passed into the usual processors, such as part-of-speech taggers or parsers, but in a reduced form that makes their processing task easier. The task is a combination of language technology and software engineering.

Health Informatics Projects

The Health Informatics program offers scholarships to Honours students for those projects supported by sponsors. Students need to apply for a generic scholarship on and will be assigned to a project that best suits their interests and skills in the context of all applicants.

Auto-Generation of Specialised Health Information Systems

There are a number of projects aimed at advancing the research into the process of generating operational information systems from a master system. This is the idea of having an information system generator that creates particular systems for medical specialities such as Intensive Care, Cardiology, Oncology, Obstetrics, etc.
The projects below aim to address different facets of this general problem.

1. Emergency Department System

Sponsors: Centre for Health Informatics R&D,

Background
The current ED information systems are considered to be antiquated technology that needs to be redeveloped. They have limitations in that they do not interact with other information systems within the especially X-ray and pathology systems. There have been efforts by many Area Health Services to encourage the Government to invest in creating a new generation solution, but to date that has not reached high enough up the priority list of activities.

Project Objectives
This project aims to produce a prototype of an ED system that tackles the implementation issues in the context of a generic solution to generating application specific information systems in the framework of a Health Information Management System.

Project Resources
Work in the Summer Research program has produced a systems analysis of the EDs at Westmead, Nepean, Auburn and Blacktown hospitals. There is also a historical set of specifications for ED systems, which will be made available from a group of doctors specializing in this area of medicine.

Deliverables
A methodology for generating IS from a master system which has as its major functions:
- Incorporation of medical terminology systems
- Real-time data capture of point of care information, e.g., text descriptions, bedside measurements (pulse, blood pressure, temperature, etc.)
- Interaction with other information systems for data requests and storage
- Storage of standardised electronic medical records (EMRs)
- Retrieval of EMRs
- Analytics of aggregated EMRs

Project Option – Patient and Staff Tracking
Research technologies useful for real-time data capture in emergency departments. Tracking the location of a patient and the time they remain in a particular area is essential in an emergency department: a patient may be in the waiting area, triage area, X-ray, bathroom, or in their bed.
This is particularly important if the patient is a risk to staff, other patients and/or to themselves. Another requirement that emergency departments must comply with is that clinicians have to report when treatment begins for an in-patient. Often this does not happen because they treat the patient first and then backtrack to enter the time, resulting in an inaccurate time stamp. Other times that may also need to be considered are when:
- the patient arrives off the ambulance
- the patient is physically in a bed
- the patient is triaged
- tests are requested and/or sent
- test results arrive
- consultants or specialists are notified to come to the emergency department
- consultants or specialists receive their notification
- consultants or specialists arrive at the emergency department
- the decision is made to send a patient to a ward
- the patient physically leaves the emergency department

There is a great need for accurate but effortless time stamping in emergency departments, as the payment government agencies make for a patient (and consequently their treatment) depends not on the performance of the emergency department but rather on the length of stay in the department. Right now, the most popular choice is Radio Frequency Identification (RFID), which places a chip in patient armbands. This also solves the problem of identifying patients when they arrive at labs, as identification is immediate. Biometric recognition devices are another technology that may support real-time data capture. This project will require the identification of similar devices and research into how they work and how they support the workflow of medical staff. A social and technological analysis of the issues surrounding real-time data capture of the aforementioned events would also be desirable.

2. Allied Health System

Sponsor: Sydney West Area Health Service

General background:
AHMIS is a PowerHouse VAX based system designed to assist the operation of allied health services.
The software was developed as a common system for allied health disciplines including Physiotherapy, Occupational Therapy, Speech Pathology, Social Work and Nutrition and Dietetics. It is also used at some sites by Audiology, Diversional Therapy, Orthoptics, Orthotics, Play Therapy, Podiatry, and Psychology. Interfaces exist to the patient administration system HOSPAS and clinical costing system Trendstar. The current Version 4.1 includes a new collection for non-inpatient data. Generic HL7 PAS interfaces are being developed. The AHMIS Consortium has nine (9) Area members.

The VAX based tool PowerHouse is now considered obsolete technology and the system has been in desperate need of redevelopment for some years. Delays in progressing the redevelopment have been due to a concern that the functionality would be more effectively addressed by the core ICIP systems, and that the batch delayed data entry and focus on recording activity do not fit current directions in IT systems in health. However, this system is widely used and accepted, and the viability of replacing it needs to be assessed. Currently this technology is deployed as indicated above in the majority of the facilities within Sydney West Area Health Service, except Lithgow and Portland.

Scope
Sydney West is keen to undertake a design, build and go-live for the Lithgow District Hospital, as they are currently running these services solely on paper. It is expected to be a cut down version of the eventual outcome of the above. The design of the end state will be used to retrofit this interim approach. The scope of the above is to include the functionality of the existing allied health system (patient based aspects), so that in the event there is a further delay of the end state design document, the interim solution developed can be rolled out to replace the existing system at all other facilities. All allied health services at Lithgow have already been moved onto outpatient scheduling to do electronic bookings.
There are ample PCs and printers for use of any electronic system. It is expected that Lithgow will be live by the completion of the student project.

Other participants in project (non users):
- Current vendor
- 1 x Cerner support staff member as required

Challenges
- Lithgow (apart from scheduling) is fully paper based in its processes, including statistics
- Lithgow provides outreach services to other facilities outside of AHS (consideration will need to be given to determine how this activity links into this approach as part of the project)
- Inter-relationship with the rest of the Allied Health Network within the AHS in respect to this project.

Deliverables
- Review of current services, i.e. physiotherapy, speech, Occupational Therapy, podiatry, etc., including inpatient/outpatient, number of patients, etc.
- Systems analysis of workflows
- System design of new user system (within the Cerner EMR suite, including interaction with the Scheduling system), reporting requirements, testing approach, testing, and end user training and documentation.
- System Build (does not require expert programming skills, as development is GUI driven)

3. Obstrix System

General background:
With over 85,000 births in NSW each year, obstetrics and neonatal services are key services in NSW Health. The Health Council Report specifically mentions issues relating to these services, and Maternal Health is one of the GMTT focus areas. Obstetrics differs fundamentally from many other hospital-based specialties due to its focus on a longitudinal medical record and its shared care aspects, which often cross AHS boundaries. The combination of antenatal screening information, patient and family history together with birth outcomes makes it an invaluable clinical resource which can directly influence patient care. Maternity services commence in the antenatal phase, well prior to admission, and knowledge about previous pregnancies is often critical.
The strategic relevance of this service is reflected in the fact that the Midwives Data Collection (MDC) is the longest running data collection database and is recognised as being more accurate and complete than other collections. OBSTET provides the majority of the MDC data, which is submitted electronically.

The OBSTET system has an exceptionally high level of user commitment. It is used extensively (9 AHSs) by clinicians and administration staff, including obstetricians and midwives who enter the data directly. Both private and public hospitals are currently consortium members and participation is growing. The OBSTET system stores maternal and neonatal health records on over 250,000 mothers and babies in NSW. Its combination of antenatal screening information, patient and family history together with birth outcomes forms a comprehensive longitudinal obstetric health record. At present 60% of all births in NSW (around 50,000 pa) are recorded on OBSTET, with whole families now being recorded and linked since 1992.

Implementation of the system has achieved significant benefits in the standardisation of work practices, data definitions and collection. The system is highly regarded within the profession. Indeed, despite its obsolete technology platform, other states and private hospitals have expressed keen interest. It also forms an epidemiological resource and has been used for research, e.g. by the Simpson Centre at Liverpool. The success of the OBSTET system is also reflected in the many demands that come from other business sectors to use the system as a vehicle for extended data collection, e.g. domestic violence and drug and alcohol. There is also a push to redevelop and extend the system to cover Integrated Perinatal requirements, which will further enhance the continuity of care.

OBSTET is a VAX based application which in 2005 was rewritten as a stand-alone application on a web based/SQL platform and called OBSTRIX.
Currently OBSTET is running in the hospitals from Auburn to Blacktown, and OBSTRIX is installed from Nepean to the Blue Mountains. Nothing is installed at Lithgow. A business case is currently being developed to replace OBSTET with OBSTRIX, which will be approved on the basis that it is an interim strategy which has an end of life and that the sites will move to an EMR based version within a specific period. All antenatal and postnatal services are being scheduled via a Scheduling system, and all statistics, reports and clinical documentation are undertaken within the Cerner Electronic Medical Record.

Staged Project
It is expected that the total requirements of the OBSTRIX application will be moved across to the proposed EMR system in one step. However, the rollout plan will restrict the application to Lithgow in the first instance, and then the user community will prioritise the other locations.

Scope
As a result of the above, Sydney West is keen to undertake a design, build and go-live for the Lithgow District Hospital, as they are currently running these services solely on paper. The scope of the system is to include the functionality of the existing OBSTRIX. It is expected that Lithgow will be live prior to the completion of the student's work.

Other participants in project (non users):
- Current vendor
- 2 x Cerner support staff members as required

Challenges
- Lithgow (apart from scheduling) is fully paper based in its processes, including statistics
- Inter-relationship with the rest of the Women’s and Children’s Health Network within the AHS in respect to this project.
- Inter-relationship with the Obstrix Consortium that has overall carriage and direction for the product.
- The EMR version will not be as easy to use as the current OBSTRIX version, so the benefits of using an EMR versus a standalone system will need to be part of an education program
- The current system links mother and baby.
A mechanism for how this could occur needs to be explored further within the EMR version.

Deliverables
- Review of current services, including inpatient/outpatient, number of patients, etc.
- Construct a description of workflows
- System design of new user system (within the Cerner EMR suite, including interaction with the Scheduling system), reporting requirements, testing approach, testing, and end user training documentation.
- System Build (does not require programming skills as development is GUI driven)

4. Terminology Server

Sponsor: National Centre for Classification in Health

A terminology server (TS) serves the role of delivering a large terminology vocabulary into an information system so that the terminology is available for describing data that is found in the real world. In the health sector, to achieve consistent use of a diverse vocabulary, one has to have available in the IS all possible natural language expressions of a given concept and allow the user to enter their own variant of the text that expresses that concept. A more elaborate system would check combinations of terms against a knowledge base (an ontology) to ensure the entered data is consistent with a given body of knowledge. For example, a well known system permits entry of the condition “suicide” with the adjectives “mild” or “severe”, which is the kind of inconsistency such a check should catch. The problem is more acute with the naming of drugs, where product names are readily confused with the active ingredients, and so at many sites it is impossible to do reliable data analyses of the usage of drugs.

A comprehensive terminology service encompasses 3 major functions: content creation, maintenance, and services. Content creation and maintenance are achieved by storing data in a database and providing access to the data for content matter experts who can review and edit the data. From this point the server has to deliver the specialised terminology to service applications running on remote systems.
In remote systems the terminology has to be delivered in a usable way, and the TS should deliver the working code for this functionality. In particular, in the case of an ontology the TS should deliver services for checking data entry against the ontology for consistency and supplying error functions for responses to problems. The ultimate goal of a health IS is to deliver sensible analytics in a speedy and easy to use form. The TS can assist in this function by making available the language in the ontology as an aggregating function for the analytics, therefore not requiring the development of separate programming functions.

Objective
The objective of this project is to build a terminology server for the SNOMED CT ontology and experiment with methods of delivering it to an Emergency Department IS and provide retrieval and analytics functions in an automatic way.
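
As an illustration only, the three services described above (matching a user's own wording to a concept, checking an entry against the ontology, and using the ontology as an aggregating function for analytics) can be sketched in a few lines of Python. Everything in the sketch is an assumption made for the example: the class and function names, the concept identifiers X1 to X3 (which are not real SNOMED CT codes), the synonym lists and the permitted qualifiers are all hypothetical, and a real server would load its content from the terminology database rather than from code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str                           # hypothetical id, NOT a real SNOMED CT code
    preferred_term: str
    parent: Optional[str] = None              # hypothetical "is-a" parent, used for aggregation
    allowed_qualifiers: frozenset = frozenset()  # qualifiers the toy ontology permits


class TerminologyServer:
    """Toy in-memory terminology server (illustrative sketch only)."""

    def __init__(self):
        self._concepts = {}   # concept_id -> Concept
        self._synonyms = {}   # normalised text -> concept_id

    @staticmethod
    def _normalise(text):
        # Lower-case and collapse whitespace so that variants of a term match.
        return " ".join(text.lower().split())

    def add_concept(self, concept, synonyms=()):
        self._concepts[concept.concept_id] = concept
        for text in (concept.preferred_term, *synonyms):
            self._synonyms[self._normalise(text)] = concept.concept_id

    def lookup(self, text):
        """Map a user's own wording to a concept, or None if unknown."""
        concept_id = self._synonyms.get(self._normalise(text))
        return None if concept_id is None else self._concepts[concept_id]

    def check_entry(self, qualifier, text):
        """Check a qualifier + term combination against the toy ontology."""
        concept = self.lookup(text)
        if concept is None:
            return False, f"unknown term: {text!r}"
        if self._normalise(qualifier) not in concept.allowed_qualifiers:
            return False, (f"{qualifier!r} is not a permitted qualifier "
                           f"for {concept.preferred_term!r}")
        return True, "ok"

    def aggregate(self, entries):
        """Count entries by parent concept (or by the concept itself if it has none)."""
        counts = Counter()
        for text in entries:
            concept = self.lookup(text)
            if concept is None:
                counts["(unrecognised)"] += 1
                continue
            key = concept.parent if concept.parent in self._concepts else concept.concept_id
            counts[self._concepts[key].preferred_term] += 1
        return counts


def build_demo_server():
    # All content below is made up for the example.
    ts = TerminologyServer()
    ts.add_concept(Concept("X1", "cardiovascular finding"))
    ts.add_concept(
        Concept("X2", "hypertension", parent="X1",
                allowed_qualifiers=frozenset({"mild", "severe"})),
        synonyms=["HTN", "raised BP", "high blood pressure"],
    )
    ts.add_concept(Concept("X3", "suicide"))  # no qualifiers permitted
    return ts


if __name__ == "__main__":
    ts = build_demo_server()
    print(ts.lookup("HTN"))
    print(ts.check_entry("mild", "suicide"))
    print(ts.aggregate(["HTN", "raised bp", "suicide"]))
```

In this sketch the consistency check is a simple set membership test and the aggregation walks one level up a single parent link; an ontology such as SNOMED CT has a far richer structure, so the example shows only the shape of the services, not a design for the project.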