Generic Access to Synapses EHCR Data

Jesús Bisbal, Gaye Stephens, Jane Grimson
Department of Computer Science, Trinity College Dublin, Ireland.

Abstract: The idea of collecting data from various sources and federating them in an open, generic and secure way to form a well-defined electronic healthcare record (EHCR) was the focus of the Synapses project. These data sources are referred to in Synapses terms as Feeder Systems. This paper reviews common types of Feeder Systems supported by current healthcare information systems, describes the data they contain, and presents a uniform mechanism for retrieving this data.

1. Introduction

Synapses [Grimson98] was an ambitious project funded under the fourth framework of EU projects. The project ran for three years and was completed in January 1999. The focus of the project was to collect data from a variety of heterogeneous data sources and federate them in an open, generic and secure way to form a well-defined Electronic Patient Record (EPR). In Synapses terms, the data sources from which the EPR takes its data are known as Feeder Systems. These Feeder Systems are autonomous and highly heterogeneous, e.g. laboratory instruments, Laboratory Information Systems, Digital Imaging Systems, Patient Monitors, and even other Synapses servers.

Two main concepts underpinning Synapses are the Synapses Object Model, or SynOM, and the Synapses Object Dictionary, or SynOD [Grimson98]. Together these provide the flexibility needed to represent data from various medical fields, along with the consistency and security required to ensure that patient records are transferred faithfully. The SynOM, rather than being a static model, is a collection of building blocks together with rules indicating how these blocks must be aggregated. Using these blocks and the aggregation rules, a healthcare organisation, department or professional can define their own healthcare record. It is this definition which comprises the SynOD.
The SynOD contains the following information:
• a definition of the structure of a particular record
• details of where to source data to populate the defined structure
• the type of access required to retrieve data from the appropriate Feeder System
• the query to be issued to retrieve the relevant data

It is important to note that a SynOD does not contain any patient data. The patient-specific data is retrieved from the Feeder Systems in response to appropriate queries from the server and is appended to the EPR at runtime. In summary, each site uses the same SynOM, while each SynOD is site specific.

In the Synapses project there were five sites, each representing different aspects of the medical domain and each building a Synapses server based on a common SynOM. The Synapses server referred to in this paper was designed and implemented by the site comprising institutions based in Dublin and Uppsala, and was concerned with a healthcare record for the Intensive Care Unit (ICU). The reference site was St. James’s Hospital, Dublin. The Server designed and implemented by this group is known as the Dublin Synapses Server.

As an additional part of the Synapses Server, a software tool, the Record Structure Builder [Grimson98], or RSB, was designed and implemented (see Figure 1). An end-user (e.g. a clinician) can use the RSB to specify the record components which comprise a SynOD. A data administrator can use it to specify the Feeder System(s) and queries to be used to populate the specified record components. It should be noted that the specification of the healthcare record and the queries can only reflect what the Feeder System returns. Feeder Systems are autonomous. They are also commonly legacy systems [Bisbal99], which means they are frequently inflexible, providing data to external systems in one single pre-defined format.
It is therefore neither feasible nor desirable for a Synapses Server to dictate new requirements or to expect the Feeder System to be re-engineered simply to ease co-operation with the Server.

[Figure 1. Dublin Synapses Server Context Diagram. Components shown: XML Client, Web Server, CGI Server Wrapper, Visual Basic Client, C++ Client, CORBA Server Wrapper, Synapses Server Kernel, Feeder Systems, SynOD, Record Structure Builder (RSB). Connections are labelled HTTP and IIOP.]

Figure 1 shows a context diagram of the Server's architecture: the Synapses server, the set of clients currently implemented, the Feeder Systems containing patient data, the Record Structure Builder, and the SynOD, together with the way the Server interacts with each of them.

The remainder of the paper is organised as follows. The next section outlines the problems which arise from the need to access different types of Feeder Systems. Section 3 describes the different types of data sources from which the Dublin Synapses Server is currently retrieving data. The tool that provides uniform access to this data, the Generic Adapter, is analysed in Section 4. The final section summarises the paper and gives a number of future directions for this research.

2. Accessing Disparate Feeder System Types

In hospitals and at healthcare providers’ sites there is an increasing trend to store patient data in electronic format. It is even possible nowadays to store this type of data in ambulatory situations or in palm-held devices. Healthcare data is inherently distributed and heterogeneous, varying, for example, in terms of content, storage format, and location. Integrating these disparate types of data would make it easier for healthcare providers to access the data, assisting the decision-making process and ultimately improving the service offered to end users.
In some instances integration engines [March90] have been used, and in other situations proprietary (ad-hoc) solutions have been applied. Integration engines provide an interim solution to the problem of facilitating the exchange of data. Ad-hoc solutions, based on proprietary interfaces, are usually feasible for small healthcare centres and where the number of different types of data to be integrated is small. However, neither approach is sufficiently generic or scaleable, and either would lead to considerable work and complexity when adding new Feeder Systems or adapting to changes in existing ones.

To deal with the storage format and location issues, Synapses has taken the following approach:
1. All Feeder Systems must provide their registered name and the access method(s) they support, e.g. ODBC (see Section 3). These details are entered into a Feeder Systems information database (see Section 4).
2. At runtime the Synapses server can use these details to determine the Feeder System location and the type of connection to make. As these details are not used until runtime, the location or the access method can be changed as long as the new details are entered into the Feeder Systems information database.

The third issue concerns the shape or organisation of the retrieved data. This is an important issue in Synapses, as the EPR and the queries that retrieve the data to populate the EPR are defined independently and can be redefined as required without changing the server. To address this issue, a generic data container (referred to in Section 4 as Generic Adapter) was designed to hold the data and its metadata. Using this data container the Synapses server can:
• determine what the Feeder System returns (data/metadata)
• populate the record by matching the metadata to the definition of record components provided by the end-user in the SynOD.
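The two ideas above, runtime lookup of Feeder System details and a container that carries data together with its metadata, can be illustrated with a minimal Python sketch. It is not the Synapses implementation: the class names FeederInfo and GenericResult, the registry dictionary, the field names and the sample values are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FeederInfo:
    """Details registered for one Feeder System (hypothetical fields)."""
    name: str
    access_method: str   # e.g. "ODBC"
    location: str        # e.g. a data source name or host


@dataclass
class GenericResult:
    """Generic container: rows of data plus the metadata describing them."""
    metadata: list[str]                      # column (or tag) names
    data: list[tuple] = field(default_factory=list)


# Hypothetical stand-in for the Feeder Systems information database.
FEEDER_REGISTRY = {
    "LIS": FeederInfo("LIS", "ODBC", "dsn=lab_results"),
}


def resolve_feeder(name: str) -> FeederInfo:
    """Look up location and access method at runtime, not at build time."""
    return FEEDER_REGISTRY[name]


def populate_record(components: list[str], result: GenericResult) -> list[dict]:
    """Match result metadata against the record components named in the SynOD."""
    index = {column: i for i, column in enumerate(result.metadata)}
    return [
        {c: row[index[c]] for c in components if c in index}
        for row in result.data
    ]


# Updating the registry entry relocates the feeder without changing server code.
FEEDER_REGISTRY["LIS"] = FeederInfo("LIS", "ODBC", "dsn=lab_results_new")

# Made-up sample values, for illustration only.
result = GenericResult(
    metadata=["patient_id", "test", "value"],
    data=[("p1", "creatinine", 88), ("p1", "urea", 5.1)],
)
record = populate_record(["test", "value", "units"], result)
print(resolve_feeder("LIS").location)
print(record)
```

In this sketch a record component with no matching metadata entry ("units") is simply left unpopulated; how the real server treats such a mismatch is not stated in the text.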
Before the means of accessing Feeder Systems is described, it is of interest to look at the types of Feeder Systems accessed by the Synapses server and the type of data they contain.

3. Feeder Systems Accessed by the Synapses Server

The medical domain of interest to the Dublin Synapses server during the project was the Intensive Care Unit (ICU), and as a result the majority of Feeder Systems accessed reflected the type of data produced, required and stored in this domain. Table 1 shows the Feeder System names and the type of medical data accessed in Synapses.

The Patient Management System (PMS) is an MS-Access application that is used in the ICU and comprises a database, a graphical user interface and data processing. Only the database part of the application is of concern in this paper. In the case of the Laboratory Information System (LIS), the hospital runs a batch program to download information from the Telepath LIS into an MS-Access database. The Hospital Information System (HIS) was an emulated Feeder System, as access to the hospital’s HIS was not possible. The Blood Gas Analyser and the Merlin Monitor downloaded their results to MS-Access databases which could then be used as Feeder Systems.

Table 1. Data Types Accessed in Synapses
Feeder System Name: Patient Management System; Hospital Information System; Laboratory Information System; Blood Gas Analyser; Merlin Monitor
Medical Data Retrieved: Admissions details; Prognosis details; Social details; Demographic details; Laboratory investigation results; Laboratory investigation results performed at near-bedside blood gas analyser; Vital signs collected from monitors attached to patient

As far as the Synapses Server is concerned, all five Feeder Systems are MS-Access databases, although the data may have originated from other databases. An MS-Access database uses the Relational Model [Codd70], where data are contained in tables and each table is described by columns and rows.
For all these Feeder Systems the type of medical data varies but the data format is the same. This means that the names of the columns, the type of data they could contain and the actual data values contained in the rows vary for each Feeder System. However, the fact that the data are stored in tables with rows and columns is the same for each, which reduces the level of syntactic heterogeneity and makes it possible to concentrate on the more difficult semantic issues surrounding interoperability between systems.

The access method used to retrieve data from these Feeder Systems is Open Database Connectivity (ODBC), a de facto standard for data access. For each ODBC retrieval from a Feeder System, the Synapses server determined the name of each column and the type of data it contained.

Since the end of the Synapses project, work has continued on the Synapses Server under a new project, SynEx [SynEx99, Ferrara99]. SynEx is concerned with integrating a selection of components which were developed in previous EU projects. As a result of this collaboration, the Server has been used to access two other types of Feeder Systems not directly linked with the ICU, namely Distributed Health Environment (DHE) [Gesi99] based Feeder Systems and eXtensible Markup Language (XML) [Connolly97] based Feeder Systems. These two types of Feeder Systems are described in the next sections.

3.1 DHE-Based Feeder Systems

The Distributed Health Environment (DHE) is an implementation of the Health Information System Architecture (HISA) [HISA97] standard and is currently being used in a number of European hospitals. DHE is a system architecture used to encapsulate healthcare information and the services provided on this data. This type of system could be used in a hospital department to encapsulate data, or it could be used to encapsulate the data of an entire hospital or group of hospitals. At present Synapses views DHE as a Feeder System, i.e. a source of data.
In the initial prototype, the Synapses server will support the retrieval of patient demographic data from DHE.

3.2 XML-Based Feeder Systems

The second Feeder System type to which an interface has been developed is eXtensible Markup Language (XML) [Connolly97] based Feeder Systems. From the Synapses point of view, an XML-based Feeder System is one which can provide data in XML format. It is unlikely that the Feeder Systems themselves will actually store the data as XML documents. Rather, in the same way as SQL interfaces have been provided to non-relational databases, there is increasing interest in providing tools which extract data from a variety of different databases and present it in the form of standard XML documents.

XML is becoming the de facto standard for a universal structured document format. An XML document contains both data and a description of its organisation, that is, its metadata. The organisation is achieved using tags, in much the same way as tags are used in HTML to generate web pages. In HTML, tags can be used to specify that a piece of text is to be presented in boldface, e.g. <b> bold text </b> where <b> and </b> are the tags. In HTML the tags are prespecified, whereas in XML new tags can be created and used. The definition of the tags and the rules governing their usage are entered into a file called a Document Type Definition, or DTD.

The XML-based Feeder System being used for the SynEx project prototype supplies renal laboratory investigation results.

In summary, the Synapses server can issue queries to ODBC, DHE and XML-based Feeder Systems. In each case the returned data is accompanied by its metadata, thereby ensuring that the data can be correctly interpreted by the Synapses Server. In this way the Feeder Systems can to some extent evolve independently of the Synapses server.
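The point that relational and XML sources both return data accompanied by metadata can be shown with a small Python sketch. It rests on several assumptions: the standard-library sqlite3 module stands in for an ODBC connection to an MS-Access database, the XML layout (one child element per row, one grandchild per field) is invented for illustration, and the sample values are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET


def relational_result(connection, sql):
    """Run a query and return (column names, rows)."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def xml_result(document):
    """Return (tag names, rows) for a document whose root holds one child per row."""
    root = ET.fromstring(document)
    rows = [[(item.tag, item.text) for item in row] for row in root]
    tags = [tag for tag, _ in rows[0]] if rows else []
    return tags, [tuple(text for _, text in row) for row in rows]


# In-memory relational source (assumption: stands in for an ODBC feeder).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (test TEXT, value REAL)")
conn.execute("INSERT INTO results VALUES ('urea', 5.1)")

# Hypothetical XML document carrying the same information.
doc = "<results><result><test>urea</test><value>5.1</value></result></results>"

print(relational_result(conn, "SELECT test, value FROM results"))
print(xml_result(doc))
```

Both functions yield the same metadata, the names "test" and "value", although the XML values arrive as text while the relational values keep their column types. This is one reason the metadata must travel with the data for the receiver to interpret it.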
The Synapses Server component which performs connections to these three types of Feeder Systems, retrieves data from them and passes it back to the Synapses Server is called the Generic Adapter. This component is described in the next section.

4. Generic Adapter - Uniform Access to Data Sources

As the functionality of the Dublin Synapses Server was extended, access to a wider variety of data sources was required. The initial architecture of the Server, shown in Figure 1 of Section 1, assumed that the Server itself would handle the connections to these Feeder Systems and would retrieve the appropriate data. This architecture therefore required the Synapses Server to implement a significant amount of functionality that was not directly related to creating EPRs. This approach was also inflexible, since adding support for a new Feeder System type involved making changes to the Server, leading to a significant amount of additional work and increased complexity.

For all these reasons, a new component has been incorporated into the architecture of the Dublin Synapses Server. This module, called the Generic Adapter, isolates the Server from the details of connecting to Feeder Systems and retrieving data. It also provides a well-defined interface through which the Server retrieves the required data, independently of the type of Feeder System. The ultimate goal of the Generic Adapter is to provide the Server with a uniform view of the data in the Feeder Systems, making the overall architecture more modular, flexible, extensible, and scaleable.

The resulting architecture is shown in Figure 2. This figure focuses on how the Synapses Server accesses Feeder System data. All clients shown in Figure 1 of Section 1, as well as the RSB, remain as before, although for simplicity they are not included in Figure 2. When the Server needs to retrieve data from any Feeder System it will send a Generic Query to the Generic Adapter.
This Generic Query contains information about the Feeder System to be queried and the information needed to retrieve the appropriate data. For example, in the case of a relational database this Generic Query would contain the name of the database and an SQL query. All this information is stored in the SynOD and can therefore be accessed directly by the Server.

[Figure 2. Dublin Synapses Server Architecture with the Generic Adapter. Components shown: CORBA Server Wrapper, Synapses Server Kernel, Feeder Interface, SynOD, Generic Adapter, Feeder Info, ODBC API, Relational Database, DHE API, DHE, DHE Data, Mediator, Data Source. Labels shown: IIOP, XML, Generic Query, Generic Result.]

The Generic Adapter stores the information required to connect to the specified data source (location, IP address, password, etc.) in a database called Feeder Info in Figure 2. It also stores information about the type of data source to which the query refers, so that it can use the appropriate query mechanism.

Figure 2 shows the three different types of data sources accessed by the current implementation of the Synapses server, as explained in Section 3. Relational data is accessed via the ODBC Application Program Interface (API); DHE provides its own API to be used to retrieve its data; and finally an interface to an XML-based data source, called Mediator [Xu99], is currently under development. The query submitted to the Mediator is wrapped in XML, and the result is also in XML format. The Mediator service can potentially access any type of data source, which would allow the Server to retrieve data from a wide range of additional Feeder Systems.

The Generic Adapter will handle the details of retrieving the data from each particular type of Feeder System, and will return this data structured as defined by a Generic Result type (see Figure 2). This type has been designed (see Section 4.1) so that it can represent all the different types of data sources currently under consideration, i.e. relational data, DHE data and XML documents.

From the point of view of the Synapses Server Kernel, therefore, there is only one type of query, Generic Query, and one type of result, Generic Result. This greatly simplifies the workings of the Server and leads to a more flexible and extensible architecture. Adding support for a new data source would now only require expanding the functionality of the Generic Adapter so that it can query this type of data source and then map the resulting data into the Generic Result type. The important benefit of this architecture is that the Server does not need to know about the incorporation of this new Feeder System type.

One last component of this architecture is what in Figure 2 is termed the Feeder Interface. This module, tightly coupled with the Synapses Server Kernel, is responsible for interpreting the Generic Results returned by the Generic Adapter. The Generic Adapter is used to populate the electronic healthcare records the clients request from the Server. However, it is a generic module, without knowledge of the healthcare domain or healthcare records. The results it returns must be interpreted in different ways, depending on which kind of Feeder System supplied the data. This interpretation is done by the Feeder Interface, which maps data from the Generic Result into Record format. This module is clearly separated from the Server, although tightly related to it. Using this module, all the Server needs to do is request that a particular part of a Record be populated. The Feeder Interface will request the data from the Generic Adapter, and then interpret the result appropriately to map it into Record format.

The combination of the Generic Adapter and the Feeder Interface provides uniform access to disparate data sources, and gives this data the appropriate semantics to be used to build an EPR.

4.1 Generic Adapter Design

The overall design of the Generic Adapter is shown in Figure 3.
This is a UML Class Diagram representing the fact that different kinds of queries (termed Statements) can be executed, namely queries to ODBC, DHE, or XML-based Feeder Systems. A connection to each of these Feeder Systems will therefore be required, which is represented by the different specialisations of the abstract class Connection in Figure 3.

It is likely that the Server will send multiple queries to the same Feeder System over time (to populate different records, for example). Each of these queries would require a Connection to the Feeder System. In order to avoid creating a new connection for each query, and therefore to reduce the overall time involved in connecting to Feeder Systems, another class, termed Connection Manager, has been introduced in Figure 3. This class will manage all existing connections, so that when a new query is executed it will re-use an existing connection if possible.

[Figure 3. Generic Adapter Design (Core) - UML Class Diagram. Classes shown: Source Manager, Connection Manager, Connection, ODBCConnection, DHEConnection, MediatorConnection, Statement, ODBCStatement, DHEStatement, XMLStatement, ResultSet (+data, +metadata), Element, Tuple, XMLDocument, Attribute, MetaElement, TupleMetadata, DTD, AttributeMetadata. Association labels shown: contains, produces, describes; multiplicities 0..*, 1..* and 1.]

As explained in the previous section there is a database, called FeederInfo, which is used to store details of the particular Feeder Systems available to the Server. A class, termed Source Manager in Figure 3, has been used to manage this database and to provide an easy interface to the Connection Manager when it requires information in order to connect to a particular Feeder System.

Finally, the Generic Result, referred to above, is shown in Figure 3 as a class termed ResultSet. It consists of a list of Elements (e.g. Tuple), each of them associated with its own description (termed MetaElement). This list of elements does not need to be homogeneous, i.e. the design allows for one single ResultSet to store tuples, XML documents, etc., as a single result if required. The design of this Generic Result type is believed to be highly flexible. Only experience, and the addition of support for new Feeder System types which have not yet been envisaged, will prove whether it really is sufficiently flexible.

The clearest design pattern in this UML Class Diagram is the heavy use of inheritance in order to keep the design as extensible as possible, abstracting the different types of queries, connections and results. For example, if a new Feeder System type were to be added, all that would be required would be new specialisations of the types Connection and Statement. Although not shown in Figure 3, these two classes define an Interface which all specialisations must implement if they are to inherit from them. Once this interface is implemented, newly developed classes can be seamlessly integrated into this design, providing access to a new Feeder System type.

The Generic Adapter was implemented using Visual C++ v5.0 on a Windows NT 4.0 platform. It was compiled into a Library (.LIB) which could be accessed by the Synapses server. The Library is a separate piece of compiled code from the calling application and thus has the advantage that it can be developed independently of the caller, so long as the interface to the methods remains unchanged. In the case of the XML-based and DHE-based Feeder Systems, the Common Object Request Broker Architecture (CORBA) [Orfali97] communication handler was used.

5. Conclusions

The Generic Adapter was designed and implemented as part of the Synapses server. It is capable of retrieving data from a number of disparate Feeder Systems. These Feeder Systems differ in the following ways:
1. Location.
2. Type of medical data they contain.
3. Storage format of the data.
4. Arrangement of the data.
Through the use of a common interface offered by the Generic Adapter, the Synapses server can issue queries to these types of Feeder Systems in a uniform fashion. The result of each query, which includes the data and the metadata, is transferred to the Synapses server in a uniform way. The Synapses server can use the metadata to interpret the result of the query.

As well as addressing access to different types of Feeder Systems, the Generic Adapter has focused on two other issues, namely performance and extensibility:
1. Through its policy of managing the Feeder System connections, the number of Open and Close operations issued to the Feeder System is minimised.
2. The Generic Adapter has been designed with a view to easing the incorporation of other Feeder System types.

The next type of Feeder System to be connected is one that contains images, a completely new data type, and this will provide a significant test of the genericity and flexibility of the Generic Adapter.

Finally, the Generic Adapter was designed and implemented as a component of the Synapses server for the medical domain. It could, however, be applied to any other domain in which integrated access to heterogeneous, autonomous, distributed data sources is required. It must be emphasised that the objective when developing the Generic Adapter was only to provide a flexible and efficient solution to the problem at hand, that is, accessing different data sources to provide data to the Synapses Server. The Generic Adapter does not claim to solve the problems tackled by more sophisticated approaches to dealing with distributed information, such as distributed and heterogeneous databases [Bell92][March90].

6. References

[Bell92] D. Bell and J. Grimson. ‘Distributed Database Systems’, Addison-Wesley Longman, Reading, Mass., 1992.
[Bisbal99] J. Bisbal, D. Lawless, B. Wu, and J. Grimson. ‘Legacy Information Systems: Issues and Directions’, IEEE Software, 16(5), pp. 103-111, Sep./Oct. 1999.
[Codd70] E.F. Codd. ‘A Relational Model of Data for Large Shared Data Banks’, Communications of the ACM, 13(6), pp. 377-387, 1970.
[Connolly97] D. Connolly, Editor. ‘XML: Principles, Tools and Techniques’, World Wide Web Journal, 2(4), Fall 1997.
[Ferrara99] F.M. Ferrara and B. Grimson. ‘The holistic architectural approach to integrating the healthcare record in the overall system’, Proceedings MIE99, IOS Press, pp. 847-852, 1999.
[Gesi99] http://www.gesi.it.
[Grimson98] J. Grimson, W. Grimson, D. Berry, G. Stephens, E. Felton, D. Kalra, P. Toussaint, and W. Weier. ‘A CORBA-based integration of distributed electronic healthcare records using the Synapses approach’, IEEE Trans. on Information Technology in Biomedicine, 2(3), pp. 124-138, Sep. 1998.
[HISA97] ‘Healthcare Information System Architecture’, CEN/TC251 prENV12967.
[March90] S.T. March, Editor. ‘Special Issue on Heterogeneous Databases’, ACM Computing Surveys, 22(3), 1990.
[Orfali97] R. Orfali, D. Harkey, and J. Edwards. ‘Instant CORBA’, Wiley, 1997.
[SynEx99] SynEx Homepage, http://www.gesi.it/synex/.
[Xu99] Y. Xu, D. Sauquet, E. Zapletal, and P.P. Degoulet. ‘Using XML in a Generic Model of Mediators’, Proceedings of XML Europe’99, pp. 697-705, April 1999, Granada, Spain.