Automatic Checking of the Correctness of Clinical Guidelines in GLARE

Paolo Terenziani (b), Luca Anselma (a), Alessio Bottrighi (b), Laura Giordano (b), Stefania Montani (b)

(a) DI, Università di Torino, Corso Svizzera 185, 10149 Torino, Italy, E-mail: anselma@di.unito.it
(b) DI, Univ. del Piemonte Orientale “Amedeo Avogadro”, Spalto Marengo 33, 15100 Alessandria, Italy

Abstract

Representing clinical guidelines is a very complex knowledge-representation task, requiring a lot of expertise and effort. Nevertheless, guideline representations often contain several kinds of errors. Therefore, checking the well-formedness and correctness of a guideline representation is an important task, which can be drastically improved with the adoption of computer programs. In this paper, we discuss the advanced facilities provided by the GLARE system to assist physicians in the production of correct representations of clinical guidelines.

Keywords: Artificial Intelligence, clinical guidelines, syntactic and semantic correctness

Introduction

Clinical guidelines are a means for specifying the “best” clinical procedures and for standardizing them. Although they are usually produced as a result of long-term cooperative efforts of large teams of experts, clinical guideline representations can nevertheless contain different forms of errors and/or inconsistencies. This seems to us a natural consequence of the intrinsic difficulty of organizing and representing huge amounts of both explicit and implicit knowledge. Guidelines are usually so large that an extensive human check of correctness is infeasible. On the other hand, advanced Artificial Intelligence techniques can be used to automate large parts of such a check.
In the rest of the paper we describe the advanced facilities provided by GLARE (Guideline Acquisition, Representation and Execution), a domain-independent prototypical system to acquire, represent and execute clinical guidelines, in order to check the correctness of guidelines being represented.

Background

In recent years, the medical community has started to recognize that computer-based systems dealing with clinical guidelines provide relevant advantages, since, e.g., they can be used to support physicians in the diagnosis and treatment of diseases, or for education, critical review and evaluation aims [5]. Thus, many different approaches and projects have been developed in recent years to create domain-independent computer-assisted tools for managing clinical guidelines (see e.g., Asbru [13], EON [9], GEM [14], GLIF [10, 11], GUIDE [12], PROforma [3], and also [5, 7, 8, 16]). Besides the above-mentioned advantages, we believe that computer-based tools might also play a crucial role in assisting experts in the extremely difficult task of producing correct guideline representations. In the following, we first sketch the main features of our system, and then we show how it has been extended in order to provide several forms of automatic checks, to enhance the production of terminologically, syntactically and semantically correct representations of guidelines.

The GLARE system

GLARE (Guideline Acquisition, Representation and Execution) is a domain-independent tool to acquire, represent and execute clinical guidelines [4, 15]. It has been built, starting from 1997, in a long-term cooperation between Dipartimento di Informatica of Università del Piemonte Orientale, Alessandria, Italy, and Azienda Ospedaliera S. Giovanni Battista, Torino, Italy, one of the largest hospitals in Italy.
In the rest of this section, we sketch some of the more interesting general features of GLARE’s approach, while in the following section we focus on GLARE’s advanced approach to checking the correctness of guideline representations.

Representation formalism. In order to guarantee usability of the program by physicians who are not experts in Computer Science, in GLARE we aimed at defining a limited set of clear representation primitives, covering most of the relevant aspects of a guideline [15]. We distinguish between atomic and composite actions (plans), where atomic actions represent simple steps in a guideline, and plans represent actions which can be defined in terms of their components via the has-part relation. The has-part relation supports top-down refinement: a guideline itself can be seen as a composite action. Control relations establish which actions can be executed next, and in what order. We distinguish between four different control relations: sequence, controlled, alternative and repetition. Four different types of atomic actions have been defined as well: work actions (actions to be performed at a certain step of the guideline), query actions (requests for information), decisions (selections among alternatives) and conclusions (explicit output of a decision process). Actions are described in terms of their attributes.

Acquisition and Execution tools. As in most approaches in the literature, GLARE distinguishes between the acquisition phase (when a guideline is introduced into the system, e.g., by a committee of expert physicians) and the execution phase (when a guideline is applied to a specific patient). Therefore, the system is composed of two main modules, the acquisition tool and the execution tool. The tools closely interact with a set of databases, including the terminological database (during acquisition) and the patient database (during execution).
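The representation primitives described in this section (atomic actions, plans, the has-part relation and the four control relations) can be pictured as plain data structures. The following is only an illustrative sketch, not GLARE's internal format: the class names, the field names and the example guideline are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(Enum):
    """The four types of atomic actions named in the text."""
    WORK = "work"
    QUERY = "query"
    DECISION = "decision"
    CONCLUSION = "conclusion"


class ControlRelation(Enum):
    """The four control relations named in the text."""
    SEQUENCE = "sequence"
    CONTROLLED = "controlled"
    ALTERNATIVE = "alternative"
    REPETITION = "repetition"


@dataclass
class AtomicAction:
    name: str
    kind: ActionType
    attributes: dict = field(default_factory=dict)  # hypothetical attribute store


@dataclass
class Plan:
    """A composite action: its parts model the has-part relation."""
    name: str
    parts: list = field(default_factory=list)
    # Control relations as (source name, relation, target name) triples.
    control: list = field(default_factory=list)


def atomic_actions(action):
    """Flatten the has-part hierarchy into its atomic actions, in order."""
    if isinstance(action, AtomicAction):
        return [action]
    result = []
    for part in action.parts:
        result.extend(atomic_actions(part))
    return result


# Hypothetical two-level guideline: a guideline is itself a composite action.
workup = Plan(
    name="diagnostic work-up",
    parts=[
        AtomicAction("collect history", ActionType.QUERY),
        AtomicAction("choose therapy", ActionType.DECISION),
    ],
    control=[("collect history", ControlRelation.SEQUENCE, "choose therapy")],
)
guideline = Plan(
    name="example guideline",
    parts=[workup, AtomicAction("administer therapy", ActionType.WORK)],
    control=[("diagnostic work-up", ControlRelation.SEQUENCE, "administer therapy")],
)

print([a.name for a in atomic_actions(guideline)])
```

Keeping plans and atomic actions as separate types mirrors the top-down refinement described above: a plan can be expanded into its parts without changing how the enclosing plan refers to it.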
The acquisition tool provides a graphical interface to acquire atomic actions, has-part relations and control relations between the components of plans. The guideline is depicted as a graph, where each action is represented by a node (different shapes and colours are used to distinguish among different types of actions), while control relations are represented by arcs. By clicking on the nodes in the graph, the user can open other windows to acquire the internal descriptions (attributes) of the nodes. The interface also shows the hierarchical structure of the guideline in the form of a tree, where plans can be seen as parents of their components (see Figure 1).

We have already tested our representation formalism and acquisition tool prototype. Several groups of expert physicians, following a few-hour training session, used GLARE to acquire algorithms concerning different clinical domains (e.g., bladder cancer, reflux esophagitis, and heart failure), with the help of a knowledge engineer. In all the tests, our representation formalism and acquisition tool proved expressive enough to cover the clinical algorithms, and the acquisition of a clinical guideline was reasonably fast (e.g., the acquisition of the guideline on heart failure, starting from a non-structured textual representation, required only 3 days).

Methods

Our acquisition module provides an “intelligent” interface to expert physicians, in the sense that it helps them in the task of acquiring a consistent guideline. To achieve this goal, our acquisition module supports different types of consistency checking, which operate automatically whenever the expert physician modifies a guideline (typically, by adding new nodes, i.e., actions; arcs, i.e., control relations; or descriptions of the attributes of a node). Other checks can be applied afterwards, when an entire guideline has been acquired, to verify its semantic consistency.
The execution tool is typically used “on-line”: a user physician applies a guideline with reference to a specific patient. This mode is used for integrating guidelines into clinical practice. Moreover, GLARE is available for “off-line” execution, i.e., for education, critical review and evaluation purposes. The execution tool also provides a decision support facility, which allows physicians to navigate through the guideline to see and compare alternative paths (stemming from decision actions).

Checking terminological correctness

This first type of consistency checking is automatically triggered whenever the expert physician introduces a new term or value within the description of an action in a guideline. The acquisition module closely interacts with the clinical vocabulary Database in order to provide expert physicians with a standard terminology, and with a standard range of values for clinical findings. We currently support two modalities for the execution of the acquisition tool: the “safe” and the “advanced” modalities. In the “safe” mode, the expert physician is only allowed to introduce in the guideline terms/values that have already been defined within the clinical vocabulary Database. To make this task easier, the acquisition tool provides a means of browsing the clinical vocabulary Database, on the basis of the hierarchical organization of the data it contains. For instance, in the decision “GERD differential diagnosis” one criterion regards the duration of heartburn. This datum can be found by browsing the database as follows: patient history (section), subjective symptoms (class), specific complaints (category), heartburn (datum), duration (attribute). The possible values of this attribute, as stored in the Clinical Database, are “null”, “less than 3 months”, “more than 3 months”. Thus, in the “safe” mode, the acquisition tool requires all data and values to be consistent with the dictionary provided by the Clinical Database.
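The “safe” mode check described above amounts to a lookup along the section/class/category/datum/attribute hierarchy. The sketch below is a minimal illustration under stated assumptions: the nested-dictionary layout and the function names are hypothetical and do not reflect the actual schema of the clinical vocabulary Database; only the heartburn path and its three values come from the text.

```python
# Hypothetical in-memory stand-in for the clinical vocabulary Database.
# Nesting follows: section, class, category, datum, attribute.
VOCABULARY = {
    "patient history": {
        "subjective symptoms": {
            "specific complaints": {
                "heartburn": {
                    "duration": [
                        "null",
                        "less than 3 months",
                        "more than 3 months",
                    ],
                },
            },
        },
    },
}

HEARTBURN_DURATION = (
    "patient history",
    "subjective symptoms",
    "specific complaints",
    "heartburn",
    "duration",
)


def allowed_values(vocabulary, path):
    """Return the list of values stored at `path`, or None if the path is unknown."""
    node = vocabulary
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, list) else None


def check_term_safe(vocabulary, path, value):
    """Safe mode: accept a value only if both the term and the value are defined."""
    values = allowed_values(vocabulary, path)
    return values is not None and value in values


print(check_term_safe(VOCABULARY, HEARTBURN_DURATION, "less than 3 months"))
print(check_term_safe(VOCABULARY, HEARTBURN_DURATION, "two weeks"))
```

In this sketch an unknown path and an unknown value are rejected in the same way; a real interface would probably report them differently so that the user knows which one to fix.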
In the “advanced” mode, we also allow expert physicians to introduce new terms/values that are not already contained within the Clinical Database. In such a case, a warning is shown to the expert physician. If s/he decides to go on, s/he is directly responsible for the correctness of the new term/value.

Figure 1: A window of GLARE’s acquisition tool graphical interface (concerning part of the gallbladder stones treatment guideline): on the left, the hierarchical structure of the guideline is displayed; on the right, the representation of control relations is shown in the form of a graph.

Checking syntactic consistency

The second type of consistency checking is automatically triggered whenever the expert physician introduces a node or arc within a guideline. The acquisition module checks whether the new element being introduced is consistent with several “logical design criteria” of guidelines that we want to enforce with our tool. For example, the acquisition module:
(i) checks that each alternative is preceded by a decision; whenever alternative arcs exit a node, the acquisition module checks that such a node is a decision node. This check ensures that whenever alternative ways of achieving goals are considered in the guideline, the guideline also contains an explicit way of discriminating between them. This property is especially important at execution time, since it lets the execution tool provide specific support to user physicians whenever they have to choose among alternatives;
(ii) checks that decision actions are preceded by query actions specifying all the data involved in the decision criteria. Thus, at execution time, a decision action is executable only after all the data necessary for that decision are available.
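The two design criteria above can be sketched as checks over a graph of nodes and labelled arcs. This is a simplified illustration, not GLARE's implementation: the dictionary encoding, the `data` and `needs` keys, and the example node names are assumptions. Check (ii) is approximated by requiring that some query action preceding the decision supplies each datum; it does not verify this along every individual path.

```python
def check_alternatives(nodes, arcs):
    """Criterion (i): nodes with outgoing 'alternative' arcs must be decisions.

    Returns the sorted names of the offending nodes.
    """
    return sorted({
        src for src, relation, _ in arcs
        if relation == "alternative" and nodes[src]["type"] != "decision"
    })


def ancestors(node, arcs):
    """All nodes from which `node` can be reached by following arcs."""
    predecessors = {}
    for src, _, dst in arcs:
        predecessors.setdefault(dst, set()).add(src)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for pred in predecessors.get(current, ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def check_decision_data(nodes, arcs):
    """Criterion (ii), approximated: map each decision to the data it needs
    that no preceding query action supplies."""
    missing = {}
    for name, info in nodes.items():
        if info["type"] != "decision":
            continue
        available = set()
        for anc in ancestors(name, arcs):
            if nodes[anc]["type"] == "query":
                available |= set(nodes[anc].get("data", ()))
        lacking = set(info.get("needs", ())) - available
        if lacking:
            missing[name] = sorted(lacking)
    return missing


# Hypothetical fragment; "other finding" and the option nodes are invented.
nodes = {
    "ask symptoms": {"type": "query", "data": ["heartburn duration"]},
    "GERD differential diagnosis": {
        "type": "decision",
        "needs": ["heartburn duration", "other finding"],
    },
    "option A": {"type": "work"},
    "option B": {"type": "work"},
    "option C": {"type": "work"},
}
arcs = [
    ("ask symptoms", "sequence", "GERD differential diagnosis"),
    ("GERD differential diagnosis", "alternative", "option A"),
    ("GERD differential diagnosis", "alternative", "option B"),
    ("option A", "alternative", "option B"),
    ("option A", "alternative", "option C"),
]

print(check_alternatives(nodes, arcs))
print(check_decision_data(nodes, arcs))
```

Both functions return the violations rather than raising an error, so an acquisition interface could show them to the expert physician as soon as the offending node or arc is added.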
Checking semantic consistency: temporal constraints

In most therapies, actions have to be performed according to a set of temporal constraints concerning their relative order, their duration, and the delays between them. Additionally, in many cases, actions must be repeated at regular (i.e., periodic) times. Furthermore, it is also necessary to carefully take into account the (implicit) temporal constraints derived from the hierarchical decomposition of actions into their components and from the control-flow of actions in the guideline. Checking the consistency of such a set of implicit and explicit constraints is a very hard task, which cannot realistically be performed manually, even by an expert. On the other hand, within the Artificial Intelligence community, several approaches have been developed to propagate temporal constraints automatically and to check their consistency [17]. Despite the large amount of valuable work, there still seems to be a gap between the range of phenomena covered by current AI temporal reasoning approaches and the needs arising from clinical guidelines management. In particular, in clinical guidelines,
(1) qualitative and quantitative constraints, as well as repeated/periodic events, need to be considered at the same time; all types of constraints may be imprecise and/or partially defined;
(2) a structured representation of complex events (in terms of part-of relations) must be supported, to deal with structured descriptions of the domain knowledge;
(3) the distinction between classes of actions (e.g., an action in a general guideline) and instances of such actions (e.g., the specific execution of an action in a guideline) has to be supported.
Obviously, the interplay between issues (1)-(3) needs to be dealt with, too. For example, the interaction between composite and periodic events might be complex to represent and manage.
In fact, in the case of a composite periodic event, the temporal pattern regards the components, which may, recursively, be composite and/or periodic events. For instance, consider Ex. 1. In Ex. 1, the instances of the melphalan treatment must respect the temporal pattern “twice a day, for 5 days”, but such a pattern must be repeated for six cycles, each one followed by a delay of 23 days, since the melphalan treatment is part of the general therapy for multiple myeloma.

(Ex. 1) The therapy for multiple myeloma consists of six cycles of 5-day treatment, each one followed by a delay of 23 days (for a total time of 24 weeks). Within each cycle of 5 days, 2 inner cycles can be distinguished: the melphalan treatment, to be provided twice a day, for each of the 5 days, and the prednisone treatment, to be provided once a day, for each of the 5 days. These two treatments must be performed in parallel.

Unfortunately, no current work in the AI or guideline literature proposes a comprehensive approach in which all the above phenomena can be represented, and in which correct, complete and tractable temporal reasoning can be performed. In GLARE, we define an approach addressing all the above-mentioned issues [1]. A complete automatic treatment of temporal constraints involves, besides the design of an expressive representation formalism, also the development of suitable temporal reasoning algorithms operating on them, to be applied both at acquisition and at execution time. However, subtle issues such as the trade-off between the expressiveness of the representation formalism and the tractability of correct and complete temporal reasoning algorithms have to be faced in order to deal with temporal constraints in a principled and well-founded way; few works in the area of computerized guidelines have analyzed this topic in depth so far.
As a starting point, we have chosen to rely as much as possible on STP (Simple Temporal Problem) [2], a well-known and consolidated Artificial Intelligence approach coping with different types of temporal constraints. However, we had to extend it in order to cope with the additional temporal issues mentioned above. Specifically, we have chosen to model the constraints regarding repeated actions in separate STPs, one for each repeated action. Thus, in our approach, the overall set of constraints between actions in the guideline is represented by a tree of STPs (STP-tree henceforth). The root of the tree (node N1 in the example in Fig. 2) is the STP which homogeneously represents the constraints (including the ones derived from the control-flow of actions in the guideline) between all the actions in the guideline (e.g., in N1, the fact that the duration of the chemotherapy is 168 days), except repeated actions. Each node in the tree is an STP, and has as many children as the number of repeated actions it contains. Each edge in the tree connects a pair of endpoints in an STP (the starting and ending point of a repeated action) to the STP containing the constraints between its subactions, and is labeled with the list of properties describing the temporal constraints on the repetitions. For example, in Fig. 2, we show the STP-tree representing the temporal constraints involved in Ex. 1.

Figure 2: STP-tree for the multiple myeloma chemotherapy guideline. Thin lines and arcs between nodes in an STP represent bounds on differences constraints. Arcs from a pair of nodes to a child STP represent repetitions. Arcs between any two nodes X and Y in an STP of the STP-tree are labeled by a pair [n,m] representing the minimum and maximum distance between X and Y.

In order to check the consistency of the STP-tree, it is not sufficient to check the consistency of each node separately. In such a case, in fact, we would neglect the repetition/periodicity information.
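For a single STP node, the standard consistency test from [2] can be sketched as follows: each constraint [n,m] between X and Y becomes two weighted edges of a distance graph, and the STP is consistent exactly when all-pairs shortest paths (Floyd-Warshall) reveal no negative cycle. The code below covers only this per-node check, not the STP-tree algorithm with repetitions; the function name and the encoding of time points as integers are assumptions. The example uses the 168-day chemotherapy duration of node N1, which matches six cycles of 5 + 23 days in Ex. 1.

```python
import math


def stp_consistent(num_points, constraints):
    """Check a Simple Temporal Problem for consistency.

    `constraints` is a list of (x, y, lo, hi) tuples, each meaning
    lo <= t_y - t_x <= hi for time points numbered 0..num_points-1.
    """
    # Distance graph: edge x->y weighs hi, edge y->x weighs -lo.
    d = [[0 if i == j else math.inf for j in range(num_points)]
         for i in range(num_points)]
    for x, y, lo, hi in constraints:
        d[x][y] = min(d[x][y], hi)
        d[y][x] = min(d[y][x], -lo)

    # Floyd-Warshall all-pairs shortest paths.
    for k in range(num_points):
        for i in range(num_points):
            for j in range(num_points):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # A negative entry on the diagonal signals a negative cycle.
    return all(d[i][i] >= 0 for i in range(num_points))


# Six cycles of (5-day treatment + 23-day delay), as in Ex. 1.
TOTAL_DAYS = 6 * (5 + 23)

START, END = 0, 1
# The chemotherapy lasts exactly TOTAL_DAYS days.
root = [(START, END, TOTAL_DAYS, TOTAL_DAYS)]
# Hypothetical extra constraint that contradicts the duration above.
conflicting = root + [(START, END, 0, 100)]

print(TOTAL_DAYS)
print(stp_consistent(2, root))
print(stp_consistent(2, conflicting))
```

The cubic cost of Floyd-Warshall is paid once per node here; how the constraints on the repetition arcs are merged into the child STPs is described in the text that follows and is not modelled in this sketch.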
Temporal consistency checking therefore proceeds in a top-down fashion, starting from the root of the STP-tree. Basically, the root contains a “standard” STP, so that the Floyd-Warshall algorithm can be applied to check its consistency (as shown in [2]). Thereafter, we proceed top-down towards the leaves of the tree. For each node in the tree, we first check that the constraints on the arcs, considered alone, are consistent. If so, we then merge such constraints with the constraints in the child node, and propagate the resulting constraints to verify their joint consistency. We also formally proved the following.

Property. Our algorithm to check the consistency of STP-trees is correct, complete, and operates in polynomial time.

Checking semantic “logical” consistency

Besides the property of being temporally consistent, several other semantic properties (e.g., logical consistency, safeness) should be checked on clinical guidelines. Specifically, we have identified four different classes of properties relevant in the clinical guideline context, and we have proposed a general and task-independent approach to check all of them.
(i) Properties concerning a guideline “per se”. One can check if the guideline contains a path of actions satisfying a given set of conditions (e.g., a path including actions X, Y and Z, or a path in which no action of type X is executed, or a path not requiring a given laboratory test, or a path requiring only a given set of resources, and so on).
(ii) Properties of a guideline in a given context. Specific contexts of execution may impose several limitations on the executable actions of guidelines, related, e.g., to the lack of certain resources (e.g., laboratory instruments). The consequences of such limitations may be automatically investigated. For instance, one can check whether or not there is a therapy for a patient affected by a given disease, when a specific set of resources is available (or not available).
(iii) Properties of a guideline when applied to a specific patient. For instance, the feasibility of a given action or path of actions on the specific patient can be checked.
(iv) Integrated proofs. Any combination of the above types of checks can be performed. For instance, one may ask whether, given a patient with a specific disease and set of symptoms, and given a hospital with a specific set of resources, there is a path in the guideline which applies to the patient and satisfies a given set of properties.

In our approach, we provide a general way of proving all the above properties by loosely coupling GLARE with the model checker SPIN [6]. Roughly speaking, we have devised a tool to map clinical guidelines acquired by the GLARE system into Promela, the language used by the SPIN model checker. Promela allows a high-level model of a distributed system to be defined by modelling each agent in an extended pseudo-C code, including synchronization primitives and message exchange primitives. Specifically, GLARE’s guidelines are translated into a set of agents (e.g., the agents representing the guideline actions, the agent representing the physician executing the guideline, and so on). Once we have the translation of GLARE’s guidelines into Promela code, we can use SPIN as a general-purpose engine to prove any property that can be expressed in the temporal logic LTL. In fact, SPIN translates each process (each agent) into a finite automaton, and the global behaviour of the system is obtained by computing an asynchronous interleaving product of automata. The resulting automaton represents the global state space of the system and can be built on-the-fly during the verification process. The property which has to be verified on the system is passed to the verifier through an interface, which maps it into a temporal formula, as required by SPIN.
SPIN converts the negation of the temporal formula into a Büchi automaton and computes its synchronous product with the system global state space. If the language of the resulting Büchi automaton is empty, then the property is true on all the possible executions of the system; otherwise the verifier provides a counterexample for the property (an execution path on which it is false). In such a way, we provide a general-purpose approach to automatically check all the types of properties discussed above.

Results

GLARE provides a set of facilities to help physicians during the acquisition of clinical guidelines, and to verify a posteriori the correctness of guidelines being represented. To the best of our knowledge, no other guideline system in the literature provides such a large set of facilities. GLARE’s facilities have proven to be quite effective in our testing activities. In particular, in the cases in which a textually written guideline had to be entered into the GLARE system, the adoption of GLARE facilities allowed us to detect different kinds of errors in the original guidelines. In certain cases, these errors were simply due to omissions (e.g., lack of a decision action to discriminate between alternative paths of actions). In other cases, they were due to the practical impossibility for humans of propagating constraints (and, in particular, temporal constraints) along long paths of actions. In all cases the expert physicians agreed that the errors detected by the GLARE system were “genuine” errors in the original guidelines, and agreed to correct them.

Acknowledgments

We acknowledge Prof. Gianpaolo Molino and Dr. Mauro Torchio of Azienda Ospedaliera S. Giovanni Battista, Turin, Italy for their cooperation in defining and testing the GLARE system. The research reported in this paper has been partially supported by a grant from Koine Sistemi, Torino, and by PRIN’06.

References

1. L. Anselma, P. Terenziani, S. Montani, A. Bottrighi.
Towards a Comprehensive Treatment of Repetitions, Periodicity and Temporal Constraints in Clinical Guidelines. Artificial Intelligence in Medicine, 2006; 38(2): 171-195.
2. R. Dechter, I. Meiri, J. Pearl, Temporal Constraint Networks, Artificial Intelligence, 1991; 49: 61-95.
3. J. Fox, N. Johns, A. Rahmanzadeh, R. Thomson, Disseminating medical knowledge: the PROforma approach, AI in Medicine, 1998; 14: 157-181.
4. L. Giordano, P. Terenziani, A. Bottrighi, S. Montani, L. Donzella, Model Checking for Clinical Guidelines: an Agent-Based Approach, Proc. AMIA 2006, Washington D.C.; November 2006. p. 289-293.
5. C. Gordon and J.P. Christensen, eds., Health Telematics for Clinical Guidelines and Protocols (IOS Press, Amsterdam); 1995.
6. G.J. Holzmann, The SPIN Model Checker. Primer and Reference Manual. Addison-Wesley; 2003.
7. JAMIA, Focus on Clinical Guidelines and Patient Preferences, JAMIA, 1998; 15(3).
8. Special Issue on Workflow Management and Clinical Guidelines, D.B. Fridsma (Guest ed.), JAMIA, 2001; 22(1):180.
9. M.A. Musen, S.W. Tu, A.K. Das, and Y. Shahar, EON: A component-based approach to automation of protocol-directed therapy. Journal of the American Medical Informatics Association, 1996; 3(6): 367-388.
10. L. Ohno-Machado, J.H. Gennari, S. Murphy, N.L. Jain, S.W. Tu, D.E. Oliver, et al., The GuideLine Interchange Format: A Model for Representing Guidelines, JAMIA, 1998; 5(4):357-372.
11. M. Peleg, A.A. Boxwala, et al., GLIF3: The evolution of a Guideline Representation Format, in: Proc. AMIA Annual Symposium; 2000.
12. S. Quaglini, M. Stefanelli, A. Cavallini, G. Miceli, C. Fassino, and C. Mossa, Guideline-based careflow systems, Artificial Intelligence in Medicine, 2000; 20(1):5-22.
13. Y. Shahar, S. Miksch, P. Johnson, The Asgaard Project: a Task-Specific Framework for the Application and Critiquing of Time-Oriented Clinical Guidelines, Artificial Intelligence in Medicine, 1998; 14:29-51.
14. R.N. Shiffman, B.T. Karras, A. Agrawal, R. Chen, L.
Menco, and S. Nath, GEM: a proposal for a more comprehensive guideline document model using XML, JAMIA, 2000; 7(5):488-498.
15. P. Terenziani, G. Molino, M. Torchio, A Modular Approach for Representing and Executing Clinical Guidelines, Artificial Intelligence in Medicine, 2001; 23: 249-276.
16. S.W. Tu, M.S. Mark, A. Musen, A Flexible Approach to Guideline Modeling, in: Proc. AMIA’99; 1999. p. 420-4.
17. L. Vila, A Survey on Temporal Reasoning in Artificial Intelligence, AI Communications, 1994; 7(1):4-28.

Please address all correspondence to Prof. Paolo Terenziani, Dipartimento di Informatica, Università del Piemonte Orientale “Amedeo Avogadro”, Via Bellini 25/g, 15100 Alessandria, Italy. Email: terenz@mfn.unipmn.it, Phone: +39 0131 360174