This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Similarity measure between patient traces for clinical pathway analysis: problem, method, and applications

Zhengxing Huang*, Wei Dong*, Huilong Duan**, Haomin Li**

Abstract—Clinical pathways leave traces, described as event sequences with regard to a mixture of various latent treatment behaviors. Measuring similarities between patient traces provides a basis for gaining insights into the pathways and complements existing techniques of clinical pathway analysis, which mainly look at aggregated data from an external perspective. Most existing methods measure similarities between patient traces by computing the relative distance between their event sequences. However, clinical pathways, as typical human-centered processes, always take place in an unstructured fashion, i.e., clinical events occur arbitrarily without a particular order. Imposing order on the chaos of clinical pathways may reduce the accuracy of the similarity measure between patient traces and may impair further analysis tasks. In this paper, we present a behavioral topic analysis approach to measure similarities between patient traces. More specifically, a probabilistic graphical model, i.e., Latent Dirichlet Allocation, is employed to discover latent treatment behaviors of patient traces for clinical pathways, such that similarities between pairs of patient traces can be measured based on their underlying behavioral topical features. The presented method provides a basis for further applications in clinical pathway analysis. In particular, three possible applications are introduced in this paper, i.e., patient trace retrieval, clustering, and anomaly detection.
The proposed approach and the presented applications are evaluated on a real-world data-set of several specific clinical pathways collected from a Chinese hospital.

Index Terms—Clinical pathway analysis, Similarity measure, Latent Dirichlet Allocation, Patient trace clustering, Patient trace retrieval, Anomaly detection

A preliminary version of this paper appeared in the 14th Conference on Artificial Intelligence in Medicine (AIME2013). Zhengxing Huang and Huilong Duan are with the College of Biomedical Engineering and Instrument Science of Zhejiang University, and the Key Laboratory of Biomedical Engineering, Ministry of Education, China. Wei Dong is with the Department of Cardiology, Chinese PLA General Hospital. Haomin Li is with the College of Computer Science of Zhejiang University.
*Both authors contributed equally to this work.
**Corresponding authors: duanhl@zju.edu.cn, haomin li@yahoo.com

I. INTRODUCTION

Clinical pathways define the essential component of the complex health-care process, with the objective of linking evidence to practice for specific health conditions and, therefore, optimizing patient outcomes and maximizing clinical efficiency [1–6]. They have been proposed to support the translation of clinical guidelines into local protocols and clinical practice [7], and as a strategy to optimize resource allocation in a climate of increasing health-care costs [8, 9].

Clinical pathway analysis (CPA) has received increased attention over the years due to its importance to health-care management in general and its usefulness for capturing the actionable knowledge and interesting insights needed to administrate, automate, and schedule the best practice for individual patients in clinical pathways [3, 10, 11]. For example, it is possible to discover a clinical pathway model from past clinical pathway instances (i.e.,
patient traces) [3], detect anomalies in clinical pathways [11], identify care-points where patient traces deviate from expected and/or normative medical behaviors [12], and enrich pathway models based on patient traces.

Predominant approaches to CPA take an external perspective of clinical pathways [3]. For example, Muluk et al. [13] evaluated the effects of the clinical pathway for nonurgent abdominal aortic aneurysm surgery, i.e., charges, length of stay, and mortality rate. Barbieri et al. [14] presented a meta-analysis method to evaluate the use of clinical pathways for hip and knee joint replacements by assessing the major outcomes of in-hospital hip and knee joint replacement processes: postoperative complications, number of patients discharged home, length of stay, and direct cost. Kul proposed a patient survival analysis for clinical pathways [15]. As valuable as these approaches are, they typically look at aggregated data in terms of measures such as length of stay, mortality, and infection rate [10], and thus restrict attention to an external perspective of CPA.

In clinical settings, pathways are evolving, and clinicians typically have an oversimplified and incorrect view of the actual clinical pathways. In this regard, health-care organizations need insights into clinical pathways and support for various types of analysis. In this study, we argue that a careful inspection of patient traces can help health-care organizations analyze and improve clinical pathways from an internal perspective. Patient traces properly group sets of consistent examples, representing frequent, similar modifications to instances of the same pathway model, and make it possible to extract generalized knowledge for clinical pathways. Measuring similarities between patient traces can be useful to health-care organizations for a number of reasons, including better overall clinical pathway management and maintenance [16].
For example, similar patient traces can be grouped to exploit specific knowledge or previously experienced situations, identify standardized and consolidated clinical pathways, and retrieve suggestions on how to improve and optimize clinical pathways.

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.

In order to measure similarities between patient traces, a common technique is to provide a measure of distance in the feature space, e.g., to compute similarity primarily by using the event sequences of patient traces. Traditional sequence similarity measures focus on direct matching between sequences, commonly applying classical distance concepts. They may not be appropriate for measuring similarities between patient traces for clinical pathways. Clinical pathways, as typical human-centric processes, always take place in an unstructured fashion, i.e., clinical events may occur arbitrarily without a particular order in the pathways. Bringing order to the chaos of clinical pathways probably requires similarity measure strategies different from the existing methods [17].

To this end, we employ a probabilistic graphical model, i.e., Latent Dirichlet Allocation (LDA) [18], to measure similarities between patient traces for clinical pathways. The assumption made is that the possible treatment behaviors of patient traces in clinical pathways may be represented by a relatively small number of simple and common behavioral topics, where each topic is characterized by a probability distribution over treatment behaviors, i.e., a set of specific clinical events performed on specific patients.
The derived treatment topics can be combined with the original patient traces to measure similarities between traces. Many further interesting applications, e.g., patient trace retrieval, clustering, and anomaly detection, can be performed based on this similarity to analyze clinical pathways.

The remainder of this paper is organized as follows. We present our similarity measure method in Section II. Section III experimentally evaluates our approach based on three typical applications, i.e., patient trace retrieval, clustering, and anomaly detection. We present the system prototype in Section IV. Finally, Section V concludes and discusses possible directions for future work.

II. METHOD

In this section, we first introduce some notation and terminology for the patient trace representation. This is followed by a description of the proposed similarity measure between patient traces for clinical pathways.

A. Patient trace representation

Clinical pathways leave traces, described as sequences of clinical events with regard to a mixture of various latent treatment behaviors. Typically, we assume that it is possible to sequentially record various kinds of clinical events in clinical pathways such that each event refers to a clinical activity (i.e., a well-defined step in clinical pathways) and is related to a particular patient (i.e., a patient trace). Furthermore, additional information may be recorded with each event, such as its time-stamp and patient data elements (e.g., age, sex, first diagnosis code, and care level). In general, hospital information systems record such information. To introduce the patient trace representation model and our similarity measure method, we first define the following concepts.

Definition 1: Let $E$ be the set of clinical events¹. A patient trace is a non-empty sequence of clinical events performed on a particular patient, i.e., $c = \langle e_1, e_2, \ldots, e_n \rangle$, where $e_i \in E$ ($1 \le i \le n$) is a particular clinical event.
For convenience, let $c(i)$ be the $i$th clinical event in the trace. A patient trace repository $R$ is a multi-set of patient traces.

For example, Table 1 shows a patient trace repository that consists of ten patient traces, i.e., $R = \{c_1, c_2, \cdots, c_{10}\}$. Each clinical event in the repository is linked to a particular trace and is globally unique, i.e., the same event cannot occur twice in a repository. For example, let e = (Adm, 1) be a specific clinical event, which indicates that the patient is admitted at time stamp 1. For the sake of simplicity, the time stamps of these example events are integer values, although they could also be given in a date format. A patient trace in a repository represents a particular clinical process instance, also referred to as a “case”, of the treatment of a patient. The trace contains a set of clinical events, which are spread over the observed time period of the patient’s length of stay. Table 2 lists the meaning of these event types.

B. Similarity measure between patient traces

As mentioned earlier, a patient trace is represented by a mixture of treatment behaviors with respect to specific categories of clinical events in clinical pathways. In this study, we employ a specific topic analysis approach, i.e., Latent Dirichlet Allocation, to mine the set of latent treatment behavioral topics from the patient trace repository. Based on the derived treatment behavioral topics, similarities among patient traces can then be measured efficiently.

LDA has been widely used to model the generative process of a text document corpus, where a document is summarized as a mixture of topics. In our study, patient traces are mixtures of latent treatment behaviors. Since treatment behaviors are recognized as sets of clinical events, we can use clinical event types as the “words” of the model, and the clinical events of a particular patient trace are combined to form a “document”.
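The trace-to-document mapping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `trace_to_document` and `build_corpus` are hypothetical, and the toy repository contains only the first few events of traces c1 and c3 from Table 1.

```python
from collections import Counter

# Toy repository (assumption): each trace is a list of (event_type, day) pairs,
# abbreviated to the first events of traces c1 and c3 in Table 1.
repository = {
    "c1": [("Adm", 1), ("EPT", 1), ("EKS", 1), ("EBT", 1), ("OxS", 1)],
    "c3": [("Adm", 1), ("Det", 1), ("Tum", 1), ("Oxs", 1), ("Hem", 1), ("Hem", 1)],
}


def trace_to_document(trace):
    """Drop the time stamps and keep the event types as the 'words' of a document."""
    return [event_type for event_type, _day in trace]


def build_corpus(repo):
    """Return the event-type vocabulary and every trace encoded as a list of word ids."""
    docs = {cid: trace_to_document(trace) for cid, trace in repo.items()}
    vocab = sorted({word for doc in docs.values() for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    encoded = {cid: [word_id[word] for word in doc] for cid, doc in docs.items()}
    return vocab, encoded


vocab, corpus = build_corpus(repository)
# Repeated events are kept, so the document preserves event-type frequencies.
counts_c3 = Counter(trace_to_document(repository["c3"]))
```

Labels are treated as case-sensitive here, so `OxS` and `Oxs` count as distinct event types, exactly as they are spelled in Table 1.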
All patient traces in the repository are thus converted into a collection of documents. In general, LDA helps to explain the behavioral similarity of patient traces by grouping clinical events into unobserved sets. As shown in Figure 1, a mixture of these sets then constitutes the observable patient trace.

The generative process of LDA is as follows. For each patient trace $c$, a topic proportion $\theta_c \sim \mathrm{Dir}(\alpha)$ is sampled from a Dirichlet distribution parameterized by the hyperparameter $\alpha$. Each clinical event $e$ in a trace is generated by first sampling a topic $t \sim \mathrm{Mult}(\theta_c)$ from a multinomial distribution, and then sampling $e \sim \mathrm{Mult}(\phi_t)$, also from a multinomial distribution. For each treatment behavioral topic $t$, $\phi_t \sim \mathrm{Dir}(\beta)$ is sampled from a Dirichlet distribution parameterized by $\beta$. In LDA, each patient trace $c$ is a mixture of topics represented by $\theta_c$, and each topic $t$ is a distribution over all events represented by $\phi_{t,e} = \Pr(e|t)$.

¹ Some clinical events might have a duration, i.e., they are conducted not at a specific time-stamp, but over a time period. However, such a clinical event can be assumed to consist of a pair of sub clinical activities, i.e., a start event and an end event, which mark the beginning and the end of the activity, respectively. In this study, we assume that clinical events are time point events, and intervals are represented by starting and ending time point events.

TABLE I. Example patient traces for the intracranial hemorrhage clinical pathway. The traces are simplified information extracted from patient records of Zhejiang Huzhou Central Hospital of China.

c1: ⟨(Adm, 1), (EPT, 1), (EKS, 1), (EBT, 1), (OxS, 1), (lso, 2), (Oxl, 2), (OxS, 2), (BT, 2), (Mic, 2), (Coa, 2), (Uri, 2), (Vei, 2), (EBT, 2), (EKS, 2), (Hig, 2), (Hep, 2), (Ele, 2), (Cat, 2), (Sto, 2), (Rep, 3), (Rep, 3), (IhH, 4), (OxS, 4), (Vei, 5), (Pos, 6), (Ind, 6), (Rep, 6), (Vei, 7), (Oxl, 7), (BT, 8), (Hep, 8), (Ele, 8), (Vei, 9), (Vei, 9), (Vei, 11), (Vei, 11), (KAR, 14), (Osm, 16), (Oxl, 16), (Vei, 16), (Hep, 16), (Ele, 16), (Vei, 18), (Dis, 18)⟩
c2: ⟨(Adm, 1), (Con, 1), (Sto, 1), (Cat, 1), (Ele, 1), (LKF, 1), (Sil, 1), (EPT, 1), (EKS, 1), (EBT, 1), (Thy, 1), (Uri, 1), (Coa, 1), (Vei, 1), (Sex, 1), (BT, 1), (OxS, 1), (lso, 1), (InH, 2), (Oxl, 3), (Vei, 3), (Pos, 5), (Rep, 5), (Lum, 6), (aCB, 6), (rCF, 6), (Lum, 7), (CFA, 7), (rCF, 7), (Lum, 8), (BT, 8), (CFA, 8), (rCF, 8), (CSF, 8), (aCC, 8), (Hep, 8), (Ele, 8), (Oxl, 11), (OxS, 11), (Ind, 11), (Rep, 11), (Dis, 11)⟩
c3: ⟨(Adm, 1), (Det, 1), (Tum, 1), (Oxs, 1), (Hem, 1), (Hem, 1), (Ane, 1), (Ane, 1), (Coa, 1), (Coa, 1), (Ana, 1), (Ana, 1), (Uri, 1), (Uri, 1), (Vei, 1), (Thy, 1), (Thy, 1), (EBT, 1), (EKS, 1), (EKS, 1), (EUS, 1), (EPT, 1), (EPT, 1), (LKG, 1), (LKG, 1), (Sto, 1), (Sto, 1), (Oxs, 2), (Vei, 2), (Con, 2), (Oxs, 3), (ERS, 5), (FAC, 7), (Oxs, 7), (Oxs, 8), (BTH, 10), (Ele, 10), (LKF, 10), (Hem, 10), (Det, 11), (CDR, 11), (Dis, 18)⟩
c4: ⟨(Adm, 1), (EBT, 1), (EPT, 1), (EE, 1), (Osm, 1), (Rep, 2), (Oxs, 2), (InH, 2), (Sto, 2), (Hig, 2), (Ele, 2), (Vei, 2), (Uri, 2), (Vei, 2), (Osm, 2), (Coa, 2), (BT, 2), (Cat, 2), (Hig, 2), (Con, 3), (Vei, 4), (EKS, 5), (Osm, 7), (Vei, 7), (Osm, 8), (aCB, 8), (rCSF, 8), (Bac, 8), (Hep, 8), (Ele, 8), (Vei, 8), (BT, 8), (Oxl, 9), (Ind, 9), (Pos, 10), (Vei, 10), (Rep, 10), (Rep, 10), (Vei, 12), (Vei, 14), (Vei, 15), (Ele, 18), (Osm, 18), (Hep, 18), (BT, 18), (Vei, 18), (Dis, 21)⟩
c5: ⟨(Adm, 1), (EPT, 1), (Con, 1), (OxS, 1), (Sto, 2), (Hep, 2), (Ele, 2), (Vei, 2), (Uri, 2), (Coa, 2), (Myo, 2), (Sex, 2), (BT, 2), (Lip, 2), (Mul, 2), (Ind, 3), (Hep, 3), (Ele, 3), (Vei, 3), (Vei, 3), (BT, 3), (Sex, 3), (Vei, 4), (Oxl, 4), (Rep, 6), (Ind, 6), (InH, 6), (Vei, 7), (Osm, 9), (Hep, 9), (BT, 9), (Vei, 9), (Con, 10), (Vei, 10), (Vei, 11), (CDR, 12), (Rep, 12), (Pos, 12), (Vei, 12), (Ele, 13), (Osm, 13), (OxS, 13), (BT, 13), (Vei, 13), (Vei, 13), (Hep, 13), (rCSF, 14), (Lum, 14), (aCB, 14), (Bac, 14), (Det, 16), (Vei, 16), (Vei, 18), (Vei, 18), (Vei, 19), (Dis, 21)⟩
c6: ⟨(Adm, 1), (Tum, 1), (ESR, 1), (Coa, 1), (Uri, 1), (Thy, 1), (EBT, 1), (EKS, 1), (EPT, 1), (LKG, 1), (Sto, 1), (EUS, 1), (Con, 2), (CDR, 2), (Hol, 6), (Inf, 7), (Hem, 7), (Ele, 7), (Hep, 7), (BTH, 7), (BTH, 13), (BTH, 13), (Hep, 13), (Ele, 13), (CA7, 13), (Dis, 14)⟩
c7: ⟨(Adm, 1), (Tum, 1), (Ser, 1), (Hem, 1), (Gly, 1), (Ane, 1), (Coa, 1), (Ana, 1), (Thy, 1), (Uri, 1), (BDH, 1), (EBT, 1), (EKS, 1), (EPT, 1), (LKG, 1), (Sto, 1), (EUS, 1), (Oxs, 2), (BDH, 2), (BDH, 2), (BDH, 2), (BDH, 2), (Con, 2), (BDH, 2), (Oxs, 3), (Spu, 3), (Spu, 3), (BDH, 3), (Oxs, 4), (Oxs, 5), (Vei, 7), (LFP, 7), (Hol, 7), (ESR, 8), (BTH, 8), (LKF, 8), (Ele, 8), (Vei, 14), (LPF, 15), (Dis, 15)⟩
c8: ⟨(Adm, 1), (Hem, 1), (Coa, 1), (Uri, 1), (Thy, 1), (EBT, 1), (EKS, 1), (Tum, 1), (LKG, 1), (Sto, 1), (EUS, 1), (Con, 4), (Ele, 5), (Ren, 5), (BTH, 5), (BTH, 12), (Hep, 12), (Ele, 12), (Hol, 13), (Dis, 14)⟩
c9: ⟨(Adm, 1), (Tum, 1), (Oxs, 1), (Hem, 1), (Coa, 1), (Uri, 1), (Vei, 1), (Thy, 1), (EBT, 1), (EKS, 1), (LKG, 1), (Sto, 1), (EUS, 1), (Oxs, 2), (Oxs, 3), (FAO, 4), (Oxs, 4), (Oxs, 5), (EKS, 5), (EBT, 5), (EUS, 5), (Con, 5), (Oxs, 6), (Hem, 6), (BTH, 6), (Glu, 6), (LKF, 6), (Ele, 6), (Oxs, 8), (BTH, 13), (LKG, 13), (Ele, 13), (Dis, 15)⟩
c10: ⟨(Adm, 1), (EBT, 1), (EBT, 1), (EE, 1), (EE, 1), (BT, 1), (EKS, 1), (EPT, 1), (Cra, 2), (Oxl, 2), (Ful, 2), (BT, 2), (Sex, 2), (Myo, 2), (Gas, 2), (Gas, 2), (Mic, 2), (Osm, 2), (Cor, 2), (Uri, 2), (Thy, 2), (HA, 2), (Hep, 2), (Ele, 2), (Sto, 2), (BNP, 2), (Con, 3), (rCF, 5), (GPC, 5), (Lum, 5), (BT, 5), (aCB, 5), (Bac, 5), (Osm, 5), (Hep, 5), (Ele, 5), (rCF, 6), (BT, 6), (aCB, 6), (Bac, 6), (Osm, 6), (Lum, 6), (Hep, 6), (Ele, 6), (Bac, 7), (Bac, 7), (aCB, 7), (aCB, 7), (Lum, 7), (rCF, 7), (rCF, 7), (Dis, 8)⟩

TABLE II. The meaning of the example alphabetic labels of clinical events shown in Table 1.

aCB: Acute Cerebrospinal fluid biochemical; aCC: Acute CSFRT cryptococcal; Adm: Admission
Ana: Analysis of urine microalbumin; Ane: Anemia (3 items); Ant: Anti-O rheumatoid
Bac: Bacteria and fungi were cultured and identified; BFG: (BFGF) topical bovine basic fibroblast growth; BT: Blood test
BTH: Blood test + Hypersensitive CRP; BNP: B-type natriuretic peptide; BDH: B-D Heparin cap
Cat: Catheterization; CA7: CA72-4; CDR: Color Doppler routine inspection
Coa: Coagulation + D-dimer; Con: Conventional ECG Exam; Cor: Cortisol
Cra: Craniotomy for intracranial decompression; CSF: CSF biochemical; CFA: Cerebrospinal fluid biochemical + ADA
Dis: Discharge; DLV: Determination of left ventricular function; DT3: Determination of tumor (3 items)
EBT: Emergency blood test; EE: Emergency Electrolyte; EKS: Emergency kidney, sugar
Ele: Electrolyte; EPT: Emergency PT; ERS: Emergency renal, sugar
ESR: ESR; EUS: Emergency ultra-sensitivity CRP; FAC: By the femoral artery catheter cerebral arteriography
Ful: Full set of Lipids (hospital); Gas: Gastrointestinal high nutrition therapy; Glu: Glucose
Gly: Glycosylated hemoglobin; GPC: General physical cooling; Hem: Hemorheology
HA: Hepatitis A antibody; Hep: The hepatorenal sugar (hospitalization); Hig: High-frequency oxygen / hour
HLA: HLA-B27; Hol: 24-hour Holter; IDF: Intracranial Doppler flow imaging (TCD)
Imu: Immune (5 items); In3: Inflammation (3 items); Ind: Indwelling catheter
InT: Infrared treatment; InH: Intracranial hematoma (including simple epidural); Iso: Isoflurane (live Ning)/1ml/ml
KAR: Kidneys and renal vascular color Doppler ultrasound; LFP: Low-frequency pulse power treatment; LKF: Liver and kidney function (hospitalization)
LKG: Liver, kidney, the glycolipid heart enzyme (hospitalization); Lip: Lipids (7 items, hospitalization); Lum: Lumbar puncture
Mic: Micro-jet atomization mask; Mul: Multiple intracranial hematoma; Myo: Myocardial enzymes
OxS: Oxygen saturation monitoring; OxI: Oxygen inhalation; Osm: Osmotic pressure
Pos: Postoperative drainage; rCF: CSF routine; Ren: Renal function (hospitalization)
Rep: Replacement of drainage; Ser: Serum troponin T assay; SE+: Stool examination + OB
Sex: Sex hormones; Sil: Silicone suction drainage; Spu: Sputum culture
Sto: Stool examination; Th7: Thyroid function (7 items); Thy: Thyroid (five items)
Tum: Tumors (10 items); Uri: Urine + sediment test; Vei: Vein catheterization

Using this generative model, the treatment behavioral topic assignments for clinical events can be calculated based on the current topic assignment of all the other clinical event positions. More specifically, the topic assignment is sampled from:

$$\Pr(t_i = t \mid \mathbf{t}_{\neg i}, c) = \frac{n^{\pi_a(e)}_{t,\neg i} + \beta}{\sum_{b \in A} n^{b}_{t} + \beta |A|} \cdot \frac{n^{t}_{c,\neg i} + \alpha}{\sum_{j \in K} n^{t_j}_{c} + \alpha K} \tag{1}$$

where $t_i = t$ represents the assignment of the $i$th occurrence to topic $t$, $\mathbf{t}_{\neg i}$ represents all treatment behavioral topic assignments not including the $i$th occurrence, $K$ is the number of topics, $|A|$ is the number of clinical event types, $n^{\pi_a(e)}_{t,\neg i}$ is the number of times the event type $\pi_a(e)$ is assigned to topic $t$, not including the current instance, and $n^{t}_{c,\neg i}$ is the number of times topic $t$ is assigned to the patient trace $c$, not including the current instance.

From these count matrices, we can estimate the topic-event distribution $\phi$ and the trace-topic distribution $\theta$ by

$$\phi_{t,e} = \frac{n^{\pi_a(e)}_{t} + \beta}{\sum_{b \in A} n^{b}_{t} + \beta |A|} \tag{2}$$

and

$$\theta_{c,t} = \frac{n^{t}_{c} + \alpha}{\sum_{t' \in T} n^{t'}_{c} + \alpha K} \tag{3}$$

Fig. 1. Graphical representation of LDA-based similarity measure between patient traces [4].

Exact inference in LDA is generally intractable. Therefore, we use Gibbs sampling to estimate the counts $n^{t}_{c}$ and $n^{e}_{t}$, from which we can determine the model parameters $\phi_{t,e}$ and $\theta_{c,t}$. The pseudo-code for Gibbs sampling is shown in Algorithm 1. By inspection, the complexity of Algorithm 1 scales linearly with the number of latent treatment topics $K$, the number of clinical events in the patient trace repository $R$, $N = \sum_{c \in R} |c|$, and the number of Gibbs samples $L$, giving an overall complexity of $O(L \cdot N \cdot K)$.

Algorithm 1 Gibbs sampling for LDA
Procedure: LDAGibbsSampling(R, α, β, L)
Input:
    R is a patient trace repository
    α, β are Dirichlet hyper-parameters
    L is the number of Gibbs samples
Output:
    T is the set of estimated treatment topics based on the probability Pr(e|t)
Steps:
    // Initialization
    Initialize the count parameters, n_c^t = 0, n_t^e = 0
    For each event e in R
        Sample a treatment topic t from t ∼ Mult(1/|T|)
        Let n_c^t = n_c^t + 1, n_t^e = n_t^e + 1
    End For
    // Run the chain
    For l = 1 to L do
        For each c ∈ R do
            For each e_i ∈ c do
                Let n_c^t = n_c^t − 1, n_t^e = n_t^e − 1
                Sample t_i^c according to Equation (1)
                Let n_c^t = n_c^t + 1, n_t^e = n_t^e + 1
            End For
        End For
        Store the l-th Gibbs sample
    End For
    Output: Estimated treatment topics based on Pr(e|t), using Equations (2) and (3)
End Procedure

Taking the traces shown in Table 1 as an example, clinical experts from the cooperating hospital have indicated that the two derived topics have specific clinical intentions, i.e., cerebral hemorrhage treatment (ICD-10: I61) and subdural hematoma treatment (ICD-10: I62.006), respectively. Thus, we set K = 2 for the example traces.
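Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `lda_gibbs`, its arguments, and the toy traces are hypothetical, event types are assumed to be pre-encoded as integer ids, and only the final Gibbs sample is used for the estimates (Algorithm 1 stores every sample). The factor of Equation (1) that does not depend on the candidate topic is dropped, since the weights are normalized when sampling.

```python
import random


def lda_gibbs(traces, n_types, K, alpha=0.2, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA over integer-encoded patient traces.

    traces  : list of traces, each a list of event-type ids in range(n_types)
    returns : (theta, phi), the trace-topic and topic-event distributions
    """
    rng = random.Random(seed)
    n_ct = [[0] * K for _ in traces]          # n_c^t: events of trace c assigned to topic t
    n_te = [[0] * n_types for _ in range(K)]  # n_t^e: occurrences of type e assigned to topic t
    n_t = [0] * K                             # total number of events assigned to topic t

    def update(c, e, t, delta):
        n_ct[c][t] += delta
        n_te[t][e] += delta
        n_t[t] += delta

    # Initialization: assign every event occurrence to a uniformly drawn topic.
    z = []
    for c, trace in enumerate(traces):
        z.append([rng.randrange(K) for _ in trace])
        for e, t in zip(trace, z[c]):
            update(c, e, t, +1)

    # Run the chain.
    for _ in range(n_iter):
        for c, trace in enumerate(traces):
            for i, e in enumerate(trace):
                update(c, e, z[c][i], -1)     # remove the current assignment
                # Unnormalized conditional of Equation (1) for each candidate topic.
                weights = [
                    (n_te[t][e] + beta) / (n_t[t] + beta * n_types) * (n_ct[c][t] + alpha)
                    for t in range(K)
                ]
                z[c][i] = rng.choices(range(K), weights=weights)[0]
                update(c, e, z[c][i], +1)     # add the new assignment

    # Point estimates from the final counts, following Equations (2) and (3).
    phi = [[(n_te[t][e] + beta) / (n_t[t] + beta * n_types) for e in range(n_types)]
           for t in range(K)]
    theta = [[(n_ct[c][t] + alpha) / (len(trace) + alpha * K) for t in range(K)]
             for c, trace in enumerate(traces)]
    return theta, phi


# Hypothetical toy repository: event types 0-1 and 2-3 tend to co-occur.
toy_traces = [[0, 1, 0, 1, 0], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 3, 2, 2]]
theta, phi = lda_gibbs(toy_traces, n_types=4, K=2)
```

Each sweep visits all N events and evaluates K weights per event, which matches the O(L · N · K) complexity stated above. Every row of `theta` and `phi` sums to one by construction.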
Note that the derived topics reflect a collaborative shared view of the medical behaviors contained in the traces, and the event types of the topics reflect a common vocabulary to describe the patient trace. Table 3 shows typical examples of event types (satisfying $p(e|t) \ge 0.01$) of the derived treatment behavioral topics. As can be seen, the topics group typically co-occurring events. For example, the clinical event types “Intracranial hematoma surgery (including simple epidural)” and “Postoperative drainage” are correlated with each other, and they have the same value of the event-topic distribution. The relationships between clinical event types via treatment behavioral topics can be used to provide a good classification of patient traces. Note that the derived latent topics are not necessarily disjoint. E.g., “ECG” occurs in the cerebral hemorrhage topic as well as in the subdural hematoma topic.

Once we have learned the model parameters, we can measure the similarity between patient traces. In particular, for a specific trace $c$ in the repository $R$, we obtain the topic distribution $\vec{\theta}_c = \{\hat{\theta}_{c,t_1}, \hat{\theta}_{c,t_2}, \cdots, \hat{\theta}_{c,t_K}\}$, where each $\hat{\theta}_{c,t_i}$ is the posterior estimate of $\theta_{c,t_i}$ for the treatment behavioral topic $t_i$ ($1 \le i \le K$). Based on this, we can calculate the similarity between two traces $c$ and $c^*$ ($c, c^* \in R$) as follows:

$$sim(c, c^*) = \frac{\sum_{t \in T} \hat{\theta}_{c,t} \times \hat{\theta}_{c^*,t}}{\sqrt{\sum_{t \in T} \hat{\theta}_{c,t}^{2}} \, \sqrt{\sum_{t \in T} \hat{\theta}_{c^*,t}^{2}}} \tag{4}$$

Taking the traces shown in Table 1 as examples, for patient trace $c_1$, the top 5 similar traces are $c_4$ ($sim(c_1, c_4) = 0.9999$), $c_5$ ($sim(c_1, c_5) = 0.9997$), $c_2$ ($sim(c_1, c_2) = 0.9834$), $c_{10}$ ($sim(c_1, c_{10}) = 0.9761$), and $c_6$ ($sim(c_1, c_6) = 0.9323$).

III. CASE STUDY

The presented similarity measure approach provides a basis for further CPA tasks. In this section, three possible CPA applications, i.e., patient trace retrieval, clustering, and anomaly detection, are presented as follows.
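As a concrete reference point for these applications, the cosine similarity of Equation (4) and a top-n ranking can be sketched as follows. The function names and the topic proportions below are hypothetical illustration values, not estimates from the paper's data.

```python
import math


def topic_similarity(theta_a, theta_b):
    """Cosine similarity between two trace-topic vectors, as in Equation (4)."""
    dot = sum(a * b for a, b in zip(theta_a, theta_b))
    norm_a = math.sqrt(sum(a * a for a in theta_a))
    norm_b = math.sqrt(sum(b * b for b in theta_b))
    return dot / (norm_a * norm_b)


def top_similar(query_id, thetas, n=5):
    """Rank all other traces in the repository by similarity to the query trace."""
    scores = {
        cid: topic_similarity(thetas[query_id], theta)
        for cid, theta in thetas.items()
        if cid != query_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical topic proportions for three traces under a K = 2 model.
thetas = {
    "trace_a": [0.10, 0.90],
    "trace_b": [0.12, 0.88],
    "trace_c": [0.95, 0.05],
}
ranking = top_similar("trace_a", thetas, n=2)
```

Because topic proportions are non-negative, the score always lies between 0 and 1, and traces dominated by the same topic score close to 1.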
To test the feasibility of the proposed method, experiments were performed on data-sets collected from Zhejiang Huzhou Central Hospital of China. The experimental setup and the obtained results are presented in the following.

A. Data set description

The experimental data-set was extracted from Zhejiang Huzhou Central Hospital of China. The application of information technology in this hospital is at a relatively high level, and the electronic medical records system has been gradually adopted since 2004. The system records many kinds of information about clinical pathways, e.g., examinations, lab tests, and surgeries. In the experiments, we build a specific patient trace repository from the system, covering clinical pathways of several specific types of cancer, i.e., bronchial lung cancer, colon cancer, rectal cancer, breast cancer, and gastric cancer. The collected data is from 2007/08 to 2009/09. In addition, we preprocessed the traces by removing incomplete traces (e.g., traces in which the patient died or was transferred during his or her length of stay (LOS)) from the repository. In detail, there are 258 traces and 11028 clinical events with 266 event types. The average LOS of these traces is 25.39 days, while some traces take a very short time, e.g., only 4 days in hospital, and other traces take much longer, e.g., 66 days in the hospital, which implicitly indicates the diversity of treatment behaviors in intracranial hemorrhage clinical pathway.

TABLE III. The typical event types for the derived treatment behavioral topics from the example traces shown in Table 1.

Topic 1: Admission; Oxygen saturation monitoring; Vein catheterization; Conventional ECG Exam; Blood test; Coagulation + D-dimer; Electrolyte; Emergency blood test; Emergency kidney and sugar; Emergency PT; Liver, kidney and the glycolipid heart enzyme (hospitalization); Urine + sediment test; The hepatorenal sugar (hospitalization); Thyroid function (7 items); Emergency ultra-sensitivity CRP; Stool examination + OB; Tumors (10 items); Blood test + Hypersensitive CRP; Hemorheology; Discharge
Topic 2: Oxygen saturation monitoring; High-frequency oxygen/hour; Vein catheterization; Intracranial hematoma surgery (including simple epidural); Postoperative drainage; Indwelling catheter; Replacement of drainage bag; Lumbar puncture; Blood test; Electrolyte; Stool examination; Oxygen inhalation; Cerebrospinal fluid biochemical; CSF routine; The hepatorenal sugar (hospitalization); Osmotic pressure; Bacteria and fungi were cultured and identified; Blood test + Hypersensitive CRP; Hemorheology; B-D Heparin cap

B. Similarity measure methods considered

In order to evaluate the performance of the presented similarity measure method, we compare the presented LDA-based similarity measure with the traditional edit-distance-based similarity measure and a classical simple term-vector-based method:
• With “edit distance”, the temporal similarity between a pair of patient traces $c$ and $c^*$ is implicitly measured by the penalty of transforming the trace $c$ into $c^*$, or vice versa, through a set of editing operations, i.e., “no change”, “substitution”, “deletion”, and “insertion”, applied to one of the traces iteratively. For more details about the “edit distance” approach, please refer to [19].
• Term vectors have been widely used for representing text documents. Adapted to our setting, the term vector of a particular patient trace $c$ has the following form: $\vec{w}(c) = \{w_1, \cdots, w_{|V|}\}$, where $V$ is the event type vocabulary of the patient trace repository. Note that the element $w_i$ in the vector, which corresponds to the $i$th term in $V$, is weighted by using some scheme such as TF×IDF. In this study, we use Equation (5) to calculate the similarity between two patient traces $c_i$ and $c_j$ based on their term vectors:

$$sim(c_i, c_j) = \frac{\sum_{v \in V} w_{i,v} \times w_{j,v}}{\sqrt{\sum_{v \in V} w_{i,v}^{2}} \, \sqrt{\sum_{v \in V} w_{j,v}^{2}}} \tag{5}$$

In the following experiments, we refer to the LDA-based similarity measure with a K-topic model (K = 1, 2, 3, · · · , 20) as LDA-K, the edit-distance-based similarity measure as ED, and the term-vector-based similarity measure as TV.

C. Experimental settings

Constructing the LDA model means fitting latent treatment behavioral topics to the patient trace repositories. In the experiments, we conducted topic analysis for the experimental repository using LDA with different numbers of treatment behavioral topics (K = 1, 2, 3, · · · , 20). The Dirichlet priors α and β of LDA are set to 0.2 and 0.1, which are common settings in the literature. The number of iterations of Gibbs sampling is set to 10000. Note that Gibbs sampling converges before 10000 iterations in the experiments. In addition, to expand the number of trials when we construct the LDA model, we adopt a fivefold cross-validation strategy. For each repository, we split it randomly into five mutually exclusive subsets of equal size.
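The random split into five subsets can be sketched as follows. This is only an illustration under assumptions: the function name `split_into_folds` and the fixed seed are hypothetical, and traces are represented by integer ids.

```python
import random


def split_into_folds(trace_ids, n_folds=5, seed=0):
    """Shuffle the trace ids and deal them into n_folds mutually exclusive subsets."""
    ids = list(trace_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]


folds = split_into_folds(range(258))  # 258 traces, as in the experimental repository
fold_sizes = sorted(len(fold) for fold in folds)
```

With 258 traces the subsets cannot be exactly equal, so in this sketch their sizes differ by at most one (three folds of 52 traces and two of 51).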
We then designate each subset in turn as the testing data set, which is used to compute the perplexity score, while the others serve as the training data set. To minimize potential biases that may result from the randomized folding process, we perform this fivefold cross-validation process five times and estimate the overall performance by averaging the performance estimates obtained from the 250 individual trials. The resulting topic models are used in the experiments hereafter.

Now that we have built the LDA model from the patient trace repository, several interesting applications can be performed based on the learned LDA model. As shown below, we evaluated the presented approach based on three specific applications, i.e., patient trace retrieval, clustering, and anomaly detection.

D. Patient trace retrieval

The first application based on the similarity measure is patient trace retrieval. Patient traces describe the knowledge acquired after solving specific problems [20]. When a clinician encounters problems in executing a patient trace, he/she may retrieve suggestions from past traces. Given a query, the patient traces with high similarities, i.e., those closer to the query in terms of behavioral similarity, are good candidates for recommendation.

Using Equation (4) to measure the treatment behavioral similarity between a pair of traces $c$ and $c^*$ rests on an assumption: both traces must be placed into the patient trace repository $R$ so that topic analysis can be performed and an LDA model can be learned from the traces in the repository. However, in most cases of retrieval, the queried trace is a new one outside the repository, and thus Equation (4) is not appropriate for measuring similarities between a trace in the repository and an external query. In this regard, we employed an LDA-based retrieval model [21] to measure similarities between patient traces.
The basic idea of the LDA-based retrieval model is to model the query generation process, where each trace c is scored by its likelihood of generating a query trace c∗, Pr(c|c∗). The similarity can thus be measured as sim_b(c, c∗) ∝ Pr(c|c∗). To calculate the query likelihood, we need to sum over the treatment topical variable for each clinical event type of the query trace c∗. Given the posterior estimates θ̂ and φ̂, the query likelihood of a particular trace c (c ∈ R) given c∗, Pr(c|c∗), can be calculated as

$$ \Pr(c \mid c^*) = \prod_{e \in c^*} \sum_{t \in T} \Pr(e \mid t, \hat{\phi}) \, \Pr(t \mid \hat{\theta}, c) \tag{6} $$

Taking Equation (6) in place of Equation (4) as the similarity measure, traces similar to a query can be retrieved from the repository R.

E. Evaluation metrics on patient trace retrieval

For the evaluation of patient trace retrieval, the metric “Precision” is calculated:

$$ \mathrm{Precision} = \frac{1}{|Q|} \times \sum_{i=1}^{5} \frac{rel(c_i)}{5} \tag{7} $$

where Q is the set of query traces from the repository, and the sum is evaluated for each query in Q. Specifically, 25 traces are randomly selected from the repository as query traces. i denotes the rank of a retrieved trace and runs from 1 to 5, i.e., given a query, we retrieved the top 5 similar traces. rel(c_i) denotes the relevance of the trace c_i to the query: if c_i is relevant to the query, rel(c_i) = 1; otherwise rel(c_i) = 0. This requires judging whether a retrieved trace is relevant to a particular query. For this purpose, a manual evaluation was conducted independently by three managers of medical services at the Zhejiang Huzhou Central Hospital, adopting majority voting.

F. Evaluation results on patient trace retrieval

Figure 2 shows detailed experimental results comparing LDA-K (K = 1, 2, 3, · · · , 20), ED, and TV on retrieval performance. We observe that the number of treatment topics K has only a weak impact on the retrieval performance of the proposed LDA method. As depicted in Figure 2, as K increases, the precision increases slowly at first and then remains stable. However, when K surpasses a certain threshold, the precision decreases slowly with further increases of K. The precision is best when K is around 11, while smaller values like K = 1 or larger values like K = 20 can degrade the performance. This indicates that the number of latent treatment topics used for analysis should match the topics actually present in the repository. The precision achieved by LDA-11 is 0.792, while the precisions achieved by ED and TV are 0.632 and 0.664, respectively, i.e., relative improvements of roughly 25% over ED and 19% over TV, which is quite remarkable. In fact, as shown in Figure 2, the presented LDA-K outperforms ED and TV regardless of the value of K. This indicates that LDA is more appropriate for patient trace retrieval than ED and TV.

G. Patient trace clustering

In clinical pathways, patients who have similar symptoms, chief complaints, pathology examination results, and other clinical features may have similar traces, and can be grouped into the same cluster. Patient trace clustering helps reveal the underlying characteristics and commonalities among a large collection of traces. The information extracted by clustering can also facilitate subsequent analysis, for instance, to extract common treatment patterns of execution in the traces, or to speed up trace indexing and anomaly detection. A reasonable similarity measure sim(c, c′) is critical for patient trace clustering.
The objective of clustering methods that work on a similarity measure function is to maximize the intra-cluster similarity and minimize the inter-cluster similarity [22, 23]. In this study, we adopted a hierarchical micro-clustering algorithm [24] to generate partitions of the patient traces in the repository. In Algorithm 2, we iteratively merge the two trace clusters with the largest similarity, where the similarity between two clusters is defined as the similarity between the farthest traces in the two clusters. The algorithm terminates when the maximum similarity between clusters becomes smaller than a user-specified threshold ε, and outputs a set of clusters of patient traces. It guarantees that the similarity between any pair of traces in the same cluster is at least ε.

Algorithm 2 Density-based k nearest neighbor clustering.
Procedure::DensityBasedKNNClustering(R, ε)
Input: R is a patient trace repository
       ε is the threshold of similarity
Output: Φ is the set of patient trace clusters
Steps:
  Let Φ ⇐ ∅
  For each trace c in R
    Let φ_c = {c}, Φ ⇐ Φ + φ_c
  End For
  For each pair of clusters φ_i and φ_j in Φ
    Let sim_ij = sim(c_i, c_j) be the similarity between φ_i and φ_j, where φ_i = {c_i}, φ_j = {c_j}
  End For
  Set the current maximum similarity sim = max(sim_ij)
  While (sim ≥ ε)
    Select sim_xy where (x, y) = argmax_{i,j} sim_ij
    Let φ_z = φ_x ∪ φ_y
    Let Φ ⇐ Φ ∪ {φ_z} − φ_x − φ_y
    For each φ_v ≠ φ_z
      Let sim_vz = min(sim(c_1, c_2)) where c_1 ∈ φ_v and c_2 ∈ φ_z
    End For
    Set sim = max(sim_ij) over the remaining pairs of clusters
  End While
  Output Φ
End Procedure

H. Evaluation metrics on patient trace clustering

In the experiments, we compare the generated clusters with benchmark clusters, which are identified from the experimental repository. In particular, we use the first diagnosis code to categorize patient traces.
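Returning briefly to the merging procedure of Algorithm 2, a minimal sketch is given below. It assumes complete linkage, i.e., the similarity between two clusters is that of their two farthest traces, as stated in the description of the algorithm; the function name `cluster_traces`, the toy similarity table, and the trace identifiers are hypothetical. It recomputes all linkages in every round for clarity rather than caching them.

```python
from itertools import combinations

def cluster_traces(trace_ids, sim, eps):
    """Merge clusters greedily while the best complete-linkage similarity is >= eps."""
    clusters = [frozenset([t]) for t in trace_ids]

    def linkage(a, b):
        # Similarity of the two farthest traces (complete linkage).
        return min(sim(x, y) for x in a for y in b)

    while len(clusters) > 1:
        best, a, b = max(
            ((linkage(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        if best < eps:
            break  # no remaining pair is similar enough to merge
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Hypothetical pairwise similarities; unlisted pairs default to 0.1.
pairwise = {frozenset("ab"): 0.9, frozenset("cd"): 0.8}

def toy_sim(x, y):
    return pairwise.get(frozenset((x, y)), 0.1)

print(cluster_traces(list("abcd"), toy_sim, eps=0.5))
```

Because a merge only happens when the smallest pairwise similarity across the two clusters reaches the threshold, every pair of traces inside an output cluster has similarity of at least `eps`.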
As mentioned above, five categories, i.e., bronchial lung cancer, colon cancer, rectal cancer, breast cancer, and gastric cancer, are extracted from the repository. Since the experimental repository contains these general categories, they can be used as benchmark clusters for evaluating the overall performance of clustering.

Fig. 2. The results of patient trace retrieval.

To evaluate patient trace clustering, we first calculate the accuracy of the system on a per-trace basis and then build a global score for all patient traces in the repository, i.e., for a patient trace c, the precision and recall with respect to that trace are calculated as follows:

$$ \mathrm{Precision}_c = \frac{|\phi_c \cap \varphi_c|}{|\phi_c|} \tag{8} $$

$$ \mathrm{Recall}_c = \frac{|\phi_c \cap \varphi_c|}{|\varphi_c|} \tag{9} $$

where φ_c is the generated cluster containing c, ϕ_c is the benchmark cluster containing c, and |φ_c ∩ ϕ_c| is the number of patient traces that appear in both φ_c and ϕ_c. The final precision and recall are calculated as follows:

$$ \mathrm{Precision} = \frac{1}{|R|} \sum_{c \in R} \mathrm{Precision}_c \tag{10} $$

$$ \mathrm{Recall} = \frac{1}{|R|} \sum_{c \in R} \mathrm{Recall}_c \tag{11} $$

Usually, precision and recall are not used separately, but combined into the F_β measure as follows:

$$ F_\beta = \frac{(1 + \beta^2) \times (\mathrm{Precision} \times \mathrm{Recall})}{\beta^2 \times \mathrm{Precision} + \mathrm{Recall}} \tag{12} $$

In the experiments, we set β = 0.5 to weight precision twice as much as recall. This is because we prefer average-size clusters with high precision over merging them into a large cluster with higher recall but low precision.

I. Evaluation results on patient trace clustering

Using the benchmark clusters, we can evaluate clustering performance on F0.5.
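The per-trace precision and recall of Equations (8) to (11) and the F_β measure of Equation (12) can be sketched as follows. This is an illustrative sketch under simple assumptions: clusters are given as Python sets of trace identifiers, every trace appears in exactly one generated and one benchmark cluster, and the function name and toy clusters are hypothetical.

```python
def f_beta_score(generated, benchmark, beta=0.5):
    """Return (precision, recall, F_beta) averaged over all traces, per Equations (8)-(12)."""
    # Map every trace to the generated / benchmark cluster that contains it.
    gen_of = {c: cluster for cluster in generated for c in cluster}
    ref_of = {c: cluster for cluster in benchmark for c in cluster}
    traces = list(gen_of)
    precision = sum(len(gen_of[c] & ref_of[c]) / len(gen_of[c]) for c in traces) / len(traces)
    recall = sum(len(gen_of[c] & ref_of[c]) / len(ref_of[c]) for c in traces) / len(traces)
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta

# Hypothetical example: the benchmark cluster {1, 2, 3} is split into {1, 2} and {3}.
generated = [{1, 2}, {3}, {4}]
benchmark = [{1, 2, 3}, {4}]
print(f_beta_score(generated, benchmark, beta=0.5))
```

In this toy case every generated cluster is pure, so precision is 1, while splitting the benchmark cluster lowers recall to 2/3; with β = 0.5 the resulting F0.5 of 10/11 stays close to the precision, which reflects the stated preference for precise clusters.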
In particular, by taking the maximum value of F0.5 (among different merging thresholds ε from 0.0 to 0.4), we compare the performance of ED, TV, and LDA-K (K = 1, 2, 3, · · · , 20). As shown in Figure 3, when the number of topics exceeds a particular value (K ≥ 8), F0.5 is quite stable. Hence, K ≈ 8 is probably a suitable number of topics for the experimental patient trace repository.

We now study the impact of the parameter ε on the experimental results, where ε is the merging threshold in the clustering step. We vary the value of ε from 0.0 to 0.4. Figure 4 shows the results of ED, TV, and LDA-8 (using the 8-topic model). From the figure, we can see that LDA-8 provides a significant improvement over ED and TV. The maximum value of F0.5 of LDA-8 is 0.6622, against 0.1044 for ED and 0.3565 for TV; measured relative to the LDA-8 score, these gaps amount to nearly 84% and 46%, respectively. Note that when the merging threshold is zero, each patient trace is assigned to its own cluster. That explains why the three curves in Figure 4 have the same starting value of F0.5. In addition, the inclusion of latent topics increases the similarity among patient traces. As a result, when the merging threshold is small, LDA-8 does not show an advantage over ED and TV. When the merging threshold increases, LDA-8 obtains better results on F0.5 than ED and TV, while TV increases slowly with ε, and ED remains stable regardless of the value of ε. In particular, LDA-8 provides the most significant improvement when ε is 0.15, after which F0.5 decreases slowly with further increases of ε. This means that a suitable value of ε is around 0.15 for the experimental repository. Note that LDA-8 always obtains better results than ED and TV except at ε = 0.
This indicates that treatment behavioral features have a much more significant influence on the similarity measure and subsequent analysis (e.g., patient trace clustering) than the sequential order of the clinical events in the traces. It supports our assumption that clinical pathways take place in an unstructured fashion, such that a traditional temporal similarity measure between patient traces does not achieve accurate results and may distort the subsequent tasks of CPA.

Fig. 3. Performance of clustering using ED, TV and LDA with different latent treatment behavioral topic models on the experiment repository. For each clustering setting (ED, TV or LDA with different topic models), we changed the merging threshold and obtained the maximum F0.5 for comparison.

Fig. 4. The comparison between ED, TV and LDA-8 on patient trace clustering.

J. Anomaly detection

Given the set of trace clusters discovered by the method presented in the previous section, it is possible to determine whether a particular patient trace c is normal or anomalous, since patient traces within a specific cluster have similar care journeys. We argue that when facing a new piece of information, humans first classify it into an existing information category [25], and then compare it to the previous members of the category to understand how it varies in relation to the general characteristics of that category. Once the “normality” has been roughly captured by the clusters discovered from a particular patient trace repository, one can look for those individual patient traces whose patient-care journey deviates from the normal one.

To this end, we assume that each discovered patient trace cluster φ represents a particular clinical pathway category, which is supported by a subset of patient traces in the trace repository R (φ ⊆ R). Traces of φ share a set of common properties that make them perceptually similar to each other, while also making them different from the traces of other clusters. If a particular patient trace c has features similar to those of the traces in φ, we say that c is regular with regard to φ; otherwise, c is an anomaly. Similarities between c and the traces of φ are therefore combined to reach a conclusion about c.

Based on the presented similarity measure between pairwise traces, we compute the similarity between a particular patient trace c and the previous members of each trace cluster by defining a function ∆φ(c) as:

$$ \Delta_\phi(c) = \sum_{c^* \in \phi} \omega_\phi(c^*) \cdot \mathrm{sim}(c, c^*) \tag{13} $$

where ωφ(c∗) is the weight of each member c∗ in the cluster φ, which indicates the participation of c∗ in φ:

$$ \omega_\phi(c^*) = \frac{1}{|\phi|} \sum_{c^{**} \in \phi} \mathrm{sim}(c^*, c^{**}) \tag{14} $$

∆φ(c) represents the weighted similarity between a particular patient trace c and the members of a cluster φ. The selected membership cluster φ∗ is found as:

$$ \phi^* = \operatorname*{argmax}_{\forall \phi} \Delta_\phi(c) \tag{15} $$

Once the membership decision for a new trace has been made, we can focus on deciding whether the new trace is normal or not. Intuitively, we decide the normality of a new trace based on its closeness to the previous members of its membership cluster, judged with respect to the average closeness among those previous members. In particular, we define a trace c as normal with respect to its membership cluster φ∗ if ∆φ∗(c) reaches a particular normality threshold µ, i.e., if ∆φ∗(c) ≥ µ, c is normal w.r.t. φ∗. Otherwise, it is an anomaly.

K. Evaluation metrics on anomaly detection

In this subsection, we evaluate the proposed anomaly detection method.
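Before turning to the evaluation protocol, the scoring rule of Equations (13) to (15) together with the normality test against µ can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the cluster labels, the trace identifiers, and the similarity values are hypothetical, and the argmax is taken over ∆φ(c) for the trace c being assessed.

```python
def membership_weight(member, cluster, sim):
    """Equation (14): average similarity of a member to all members of its cluster."""
    return sum(sim(member, other) for other in cluster) / len(cluster)

def delta(trace, cluster, sim):
    """Equation (13): weighted similarity between a trace and the members of a cluster."""
    return sum(membership_weight(m, cluster, sim) * sim(trace, m) for m in cluster)

def assess(trace, clusters, sim, mu):
    """Pick the membership cluster (Equation (15)) and test normality against mu."""
    scores = {name: delta(trace, members, sim) for name, members in clusters.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] >= mu

# Hypothetical similarities; a trace is fully similar to itself, unlisted pairs get 0.1.
pairwise = {
    frozenset(("a1", "a2")): 0.8, frozenset(("b1", "b2")): 0.6,
    frozenset(("q", "a1")): 0.7, frozenset(("q", "a2")): 0.5,
    frozenset(("q", "b2")): 0.2,
}

def toy_sim(x, y):
    return 1.0 if x == y else pairwise.get(frozenset((x, y)), 0.1)

clusters = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
print(assess("q", clusters, toy_sim, mu=0.85))   # trace close to cluster A
print(assess("z", clusters, toy_sim, mu=0.85))   # trace far from every cluster
```

Note that, as defined, ∆φ(c) is a weighted sum rather than a normalized average, so its scale grows with the cluster size; a suitable µ therefore depends on the clusters at hand.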
The experimental procedure involves three steps:

1) By applying the proposed approach, we evaluate the normality of each patient trace in the repository. In particular, we set up 10-fold cross-validation experiments, which means that the traces in the repository are split into ten partitions. Nine partitions serve as training data, and one partition serves as testing data. Based on the training data, the proposed anomaly detection model is built. Then, for the partition of testing data, the normality of each trace is calculated based on the learned model. In all, the set of anomalies extracted from the repository R is Anomalies = {c | c ∈ R ∧ ∆φ∗(c) < µ}, where µ is a particular normality threshold value. The calculation process and methods have been introduced in detail in the anomaly detection subsection above.

2) As for the benchmark (or ground truth) evaluation data, we asked three experienced physicians of the Zhejiang Huzhou Central Hospital to evaluate the discovered anomalies, adopting majority voting. Formally, we let b_c be the clinical experts' evaluation result for an anomalous trace c discovered by our method. If the clinical experts also take c as an anomaly, b_c = 1; otherwise, b_c = 0.

3) The last step is the comparison between the calculated results and the benchmark. In particular, the metric “Precision” is obtained as follows:

$$ \mathrm{Precision} = \frac{\sum_{c \in Anomalies} b_c}{|Anomalies|} \times 100\% \tag{16} $$

L. Evaluation results on anomaly detection

As mentioned above, LDA-K achieves the best clustering performance with K = 8. Thus, we investigated the performance of LDA-8 on anomaly detection.
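The precision of Equation (16) under majority voting can be sketched as follows. The function name and the vote records are hypothetical; the toy data merely reproduce the counts reported for µ = 0.9 (29 detected anomalies, 17 of them confirmed by the experts) and do not represent the actual expert votes.

```python
def anomaly_precision(expert_votes):
    """Equation (16): share of detected anomalies confirmed by a majority of experts, in percent."""
    # b_c = 1 when at least two of the three experts judge the trace anomalous.
    confirmed = sum(1 for votes in expert_votes.values() if sum(votes) >= 2)
    return 100.0 * confirmed / len(expert_votes)

# Hypothetical votes: 17 detected traces confirmed by a majority, 12 rejected.
votes = {f"confirmed_{i}": (True, True, False) for i in range(17)}
votes.update({f"rejected_{i}": (False, True, False) for i in range(12)})
print(round(anomaly_precision(votes), 1))
```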
Table IV shows the number of detected anomalies and the corresponding precision of LDA-8 for different values of µ, where µ is the normality threshold for patient traces. In the experiments, we vary the value of µ from 0.5 to 0.9 in steps of 0.05. The general trend of precision can be observed in Table IV. For example, when µ = 0.9, 29 anomalies are detected by LDA-8, and the precision is about 58.6% (i.e., 17 detected anomalies are evaluated as true and 12 as false by the clinical experts). When µ is reduced to a certain value, i.e., µ ≤ 0.8, the number of detected anomalies and the corresponding precision of LDA-8 remain stable with further decreases of µ, i.e., at most 3 detected anomalies and 100% precision. Although the precision achieved is remarkable when µ ≤ 0.8, at least 14 (17 − 3) anomalies recognized by the clinical experts are not detected by LDA-8. When µ = 0.85, LDA-8 is able to detect most of the anomalies in the repository with high precision. Thus, as a conservative estimate, the default value of the normality threshold µ is set to 0.85.

It has to be mentioned that, when ED is applied to anomaly detection, all 258 traces in the repository are judged as anomalies even when µ = 0.5. For TV, 133 out of 258 patient traces are recognized as anomalies when µ = 0.5, which is still a large number. This is because the similarities between patient traces measured using ED or TV are quite small. For example, using ED, the maximum ∆φ∗(c) for any trace c in the repository (over ε varying from 0.0 to 0.4 in the clustering step) is 0.382, which is far less than µ. Both ED and TV are therefore unsuitable for anomaly detection on the experimental repository. This also supports our assumption that clinical pathways are typically unstructured, such that measuring similarities between patient traces requires strategies different from traditional methods.

IV. PROOF-OF-CONCEPT PROTOTYPE

We have implemented and tested the proposed approach using Microsoft C#. Figure 5 depicts a screen-shot of our prototype. Based on the input trace repository extracted from the Zhejiang Huzhou Central Hospital of China, we can describe the details of each trace. For example, Figure 5 lists a set of patient traces of the intracranial hemorrhage pathway. The left part of Figure 5 presents basic information about the repository, e.g., number of traces, number of events, number of event types, minimum LOS, maximum LOS, average LOS, etc. In addition, all traces with their IDs and all event types existing in the trace repository are listed. Each event type is represented as a colored dot so that it can be distinguished clearly. The user can select the traces and the event types of interest to display.

For each patient trace, the right part of Figure 5 shows a time-line display, categories, and similar traces. The time-line display distributes all the events over the corresponding in-patient days, i.e., for each in-patient day, the events are sorted by time from the earliest to the latest. Categories show the treatment behavioral topics the trace belongs to. Sometimes the trace is a mixture of two or more categories, in which case it is assigned a probability for each category. Similar traces present the typical as well as similar traces found in the patient trace repository by using the methods presented in this work. They are also shown with a time-line display.

V. CONCLUSION

In this paper, we present a probabilistic approach to measuring the similarities between patient traces for CPA. The proposed approach can provide a basis for subsequent CPA (e.g., patient trace retrieval, clustering, and anomaly detection), and assist in getting better insights into clinical pathways. The advantages of the proposed approach have been pointed out in our proposal.
Note that all we need is to gather a patient trace repository and use it for analyzing and improving clinical pathways. Analysis of the patient trace repository is totally unsupervised, and requires little human effort for preprocessing the traces in the repository. This is particularly useful when dealing with clinical pathways lacking formal consensus models, where patient traces can still be compared based on their treatment behavioral similarities. As a result, the solution works well for CPA.

We believe that our approach is highly appealing in the field of CPA. Measuring similarities between patient traces can profitably be exploited as a basis for further tasks of CPA, not limited to the applications listed in this article. For example, critical or essential treatment behaviors can be detected, analyzed, and optimized based on the topic analysis presented in this study, and association rules between recognized anomalies and patient states can be derived. We will address these tasks by exploiting the potential of the proposed method and its applications, as a crucial advantage over traditional techniques for clinical pathway analysis and optimization.

TABLE IV
THE RESULTS OF ANOMALY DETECTION USING LDA-8 ON THE EXPERIMENTAL REPOSITORY.

µ       # of detected anomalies   # of benchmark anomalies   Precision
0.9     29                        17                         58.6%
0.85    13                        12                         92.3%
0.8     3                         3                          100%
0.75    2                         2                          100%
0.7     2                         2                          100%
0.65    2                         2                          100%
0.6     2                         2                          100%
0.55    1                         1                          100%
0.5     1                         1                          100%

Fig. 5. A screen-shot of the system prototype.

ACKNOWLEDGMENT

This work was supported by the National Nature Science Foundation of China under Grant No 81101126, and the National Hi-Tech R&D Plan of China under Grant No 2012AA02A601. The authors would like to give special thanks to all experts who cooperated in the evaluation of the proposed method.

REFERENCES

[1] D.A. Alexandrou, I.E. Skitsas, and G.N. Mentzas. A holistic environment for the design and execution of self-adaptive clinical pathways. Information Technology in Biomedicine, IEEE Transactions on, 15(1):108–118, Jan. 2011.
[2] L. Maruster and R.J. Jorna. From data to knowledge: a method for modeling hospital logistic processes. Information Technology in Biomedicine, IEEE Transactions on, 9(2):248–255, June 2005.
[3] Z. Huang, X. Lu, and H. Duan. On mining clinical pathway patterns from medical behaviors. Artificial Intelligence in Medicine, 56(1):35–50, 2012.
[4] Z. Huang, X. Lu, and H. Duan. Latent treatment topic discovery for clinical pathways. Journal of Medical Systems, 37(2):1–10, 2013.
[5] Z. Huang, X. Lu, H. Duan, and W. Fan. Summarizing clinical pathways from event logs. Journal of Biomedical Informatics, 46(1):111–127, 2013.
[6] Z. Huang, X. Lu, and H. Duan. Similarity measuring between patient traces for clinical pathway analysis. In Niels Peek, Roque Marín Morales, and Mor Peleg, editors, Artificial Intelligence in Medicine, volume 7885 of Lecture Notes in Computer Science, pages 268–272. Springer Berlin Heidelberg, 2013.
[7] H. Campbell, R. Hotchkiss, N. Bradshaw, and M. Porteous. Integrated care pathways. British Medical Journal, 316(7125):133–137, 1998.
[8] A. Dogac, Y. Kabak, T. Namli, and A. Okcan. Collaborative business process support in eHealth: Integrating IHE profiles through ebXML business process specification language. Information Technology in Biomedicine, IEEE Transactions on, 12(6):754–762, 2008.
[9] J. Kimberly, G. de Pouvourville, and T. d’Aunno. The globalization of managerial innovation in healthcare. Cambridge University Press, 2009.
[10] F. Lin, S. Chen, S. Pan, and Y. Chen. Mining time dependency patterns in clinical pathways. International Journal of Medical Informatics, 62(1):11–25, 2001.
[11] A. Rebuge and D.R. Ferreira. Business process analysis in healthcare environments: A methodology based on process mining. Information Systems, 37(2):99–116, 2012.
[12] J. van de Klundert, P. Gorissen, and S. Zeemering. Measuring clinical pathway adherence. Journal of Biomedical Informatics, 43(6):861–872, 2010.
[13] S.C. Muluk, L. Painter, S. Sile, R.Y. Rhee, M.S. Makaroun, D.L. Steed, and M.W. Webster. Utility of clinical pathway and prospective case management to achieve cost and hospital stay reduction for aortic aneurysm surgery at a tertiary care hospital. Journal of Vascular Surgery, 25(1):84–93, 1997.
[14] A. Barbieri, K. Vanhaecht, P. Van Herck, W. Sermeus, F. Faggiano, S. Marchisio, and M. Panella. Effects of clinical pathways in the joint replacement: a meta-analysis. BMC Medicine, 7(32):1–11, 2009.
[15] S. Kul. The use of survival analysis for clinical pathways. International Journal of Care Pathways, 14(1):23–26, 2010.
[16] M. Qiao, R. Akkiraju, and A. Rembert. Towards efficient business process clustering and retrieval: Combining language modeling and structure matching. In Stefanie Rinderle-Ma, Farouk Toumani, and Karsten Wolf, editors, Business Process Management, volume 6896 of Lecture Notes in Computer Science, pages 199–214. Springer Berlin / Heidelberg, 2011.
[17] S. Goedertier, J. De Weerdt, D. Martens, J. Vanthienen, and B. Baesens. Process discovery in event logs: An application in the telecom industry. Applied Soft Computing, 11(2):1697–1710, 2011.
[18] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, March 2003.
[19] D. Gusfield. Algorithms on strings, trees and sequences: Computer science and computational biology. Cambridge University Press, 1997.
[20] J.M. Juarez, M. Campos, J. Palma, and R. Marin. Tcare: temporal case retrieval system. Expert Systems, 28(4):324–338, 2011.
[21] X. Wei and W.B. Croft. LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’06, pages 178–185, New York, NY, USA, 2006. ACM.
[22] Y. Cheng. Mean shift, mode seeking, and clustering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 17(8):790–799, Aug. 1995.
[23] A.K. Jain, M.N. Murty, and P.J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264–323, 1999.
[24] L. Ertoz, M. Steinbach, and V. Kumar. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In Third SIAM International Conference on Data Mining (SDM), pages 47–58, 2003.
[25] E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8:382–439, 1976.

Zhengxing Huang received his B.S. in 2003, and Ph.D. in 2010 in the College of Biomedical Engineering and Instrument Science at Zhejiang University, P.R. China. At present he is an instructor of the College of Biomedical Engineering and Instrument Science at Zhejiang University, P.R. China. His research interests include computer-aided medical decision support, artificial intelligence in medicine, etc.

Wei Dong received her B.S. in Clinical Medicine from Taishan Medical College, China in 1993, M.S. in Cardiology from PLA General Hospital, China in 1999, and Ph.D. in Cardiology from PLA General Hospital, China in 2002.
At present, she is a deputy chief physician of the Cardiology department of the PLA General Hospital, a young faculty of Chinese Society of Cardiology, and a young faculty of PLA Society of Cardiology. Her research interests include coronary heart disease, the diagnosis and treatment of acute and chronic heart failure, and clinical decision support.

Huilong Duan received his B.S. in Medical Instrumentation from Zhejiang University, China in 1985, M.S. in Biomedical Engineering from Zhejiang University, P.R. China in 1988, and Ph.D. in Engineering (Evoked Potential) from Zhejiang University, P.R. China in 1991. He is currently a Professor in the Department of Biomedical Engineering, and Dean of College of Biomedical Engineering & Instrument Science, Zhejiang University. His research interests are in Medical Image Processing, Medical Information System and Biomedical Informatics. He has published over 100 scholarly research papers in the above research areas. He is Program Committee Member of Computer Aided Radiology and Surgery; Editorial Board of Space Medicine & Medical Engineering and Chinese Journal of Medical Instruments respectively; Editorial Board of Chinese Journal of Biomedical Engineering; Secretary-General of BME Education Steering Committee, Chinese Ministry of Education; and Member of The Brain-Bridge Program Committee, Philips, TU/e and ZJU.

Haomin Li is assistant professor of Biomedical Engineering, Zhejiang University, China. He holds a Ph.D. degree in Biomedical Engineering from Zhejiang University. He was formerly a postdoctoral research fellow at the NHLBI Proteomics Center, David Geffen School of Medicine at UCLA. His recent research interests focus on clinical Knowledge Translation and Decision Support.