Matheson Ramsey
ramml003

Using an ontology in place of flat data for Sequential Pattern Mining

A minor thesis for the degree of Bachelor of Computer Science (Honours)
School of Computer and Information Science
University of South Australia
25/10/2010

Supervisor: Jan Stanek

Table of Contents

Glossary
Abstract
1 Introduction
  1.1 Motivation
  1.2 Research Question
2 Literature Review
  2.1 Data Mining in Health Informatics
  2.2 Sequential Pattern Mining
  2.3 Drug Ontologies
  2.4 Electronic Health Records
3 Methodology
  3.1 Raw data
    3.1.1 Ethical Considerations
    3.1.2 Data summary
  3.2 Pre-processing
  3.3 Sequential Pattern Mining (SPM)
    3.3.1 Algorithm summary
    3.3.2 Sequential Pattern Mining process
4 Results
  4.1 Output Format
  4.2 Flat Data Pattern Mining Results
  4.3 Ontology Data Pattern Mining Results
5 Discussion
  5.1 Results Analysis
  5.2 Using ontologies
  5.3 Interestingness
  5.4 Further implications
6 Future Work
  6.1 Therapeutic pathways
  6.2 Changing Granularity dynamically
  6.3 Exploring Interestingness
  6.4 Applying to other fields
7 Conclusion
8 References
9 Project Timeline
Appendix A – Ethics Approval Application
Appendix B – Results

Table of Figures

Figure 1: a prescription pathway
Figure 2: the ATC drug classification model and an example for Propicillin
Figure 3: the program process flow
Figure 4: re-coding dosage information
Figure 5: using the WHOCC online ATC index to obtain ATC codes
Figure 6: preparing the prescription pathways for the sequential pattern mining
Figure 7: generating patterns by testing candidates against support thresholds
Figure 8: combining 2-item patterns into 3-item candidates, and then halting because no 3-item patterns are discovered
Figure 9: example output of pattern mining program
Figure 10: a summary for a stage of pattern mining
Figure 11: a summary of the patterns discovered for the flat data
Figure 12: a comparison of number of 2-length patterns found at each level of ATC ontology
Figure 13: the ATC level 1 patterns
Figure 14: an example of dilution
Figure 15: an example of contamination
Figure 16: an unusual pattern found from pattern mining
Figure 17: a prescription pathway and its equivalent therapeutic pathway

Glossary

Pathway: A sequence of drug prescriptions over time.
Clinical Pathway (CP): A pathway that has been designed by medical professionals to be followed in order to treat a certain condition or disease.
Prescription Pathway (PP): The pathway as seen by the prescriber. It shows what drugs were prescribed in what order; it does not involve which drugs are being consumed at the same time, or how long prescriptions last for.
Tuple: A single item in a dataset; one row of values.
Flat data: A set of data for which all attributes are numeric or categorical; no hierarchical objects, objects within objects, etc.
Node: A single item that links to others. If a node links to another node it is that node’s parent. If a node is linked to by another node it is that node’s child.
Hierarchy: A data structure that has multiple levels of nodes.
Ontology: A hierarchy with a strict “is-a” relationship between parent and child levels of nodes.
Granularity: The level of the hierarchy that is being referred to. A lower granularity implies working higher up in the ontology (less granular being less specific), while a high granularity implies working lower down in the ontology (more granular being more specific).
Contamination: Using a level of the hierarchy that is too high, so there are too many unrelated subgroups included, and the pattern loses its meaning.
Dilution: Using a level of the hierarchy that is too low, so information is too specific, and patterns may not emerge.
Pattern: A sequence of prescriptions that occurs often in the dataset.
Sequential Pattern Mining (SPM): The process of extracting patterns from the dataset.
Interestingness: The degree to which a pattern can be considered interesting or meaningful.

Abstract

This paper investigates the impact of using an ontology in place of flat data on sequential pattern mining. Prescription data is modelled in an ontology using the ATC drug classification. Sequential pattern mining is performed by a new algorithm developed for this study, based on AprioriAll. The algorithm searches for patterns across a sequential database, as well as within each sequence, and uses two support thresholds to ensure meaningful results are obtained. The research shows that introducing granularity by the use of an ontology does increase the number of patterns found, and the paper also explores some of the effects of using data abstraction, such as contamination and dilution. The paper concludes by observing that whilst there is an increase in pattern discovery as the prescription data abstracts further up the ontology, this does not necessarily reflect a gain in information from the database.

Field of thesis: Health informatics; Data Mining: Sequential Pattern Mining; Ontologies

1 Introduction

The use of electronic support systems in health care is increasingly important (Hillestad et al. 2005). Studies have shown that 90% of general practitioners (GPs) use a clinical software package, and 98% of these GPs use the clinical packages for prescribing (McInnes, Saltman & Kidd 2006).
This means there is an abundance of rich health data available on general practitioners’ computers. However, much of this data is stored as free text, and as a result it is not easily interpreted. Using computers for writing prescriptions offers several benefits for GPs, and prescription data is consequently one of the most complete and structured types of data in general practice (Hassey, Gerrett & Wilson 2001). The prescription data can be represented in a number of ways to aid GPs, such as tabular summaries (Wroe et al. 2000), or as a series of connected nodes in which each node represents a prescription, forming a prescription pathway (Stanek et al. 2005). A prescription pathway represents the sequence of prescriptions for a particular patient over a certain period of time as prescribed by the GP. Creating pathways can make the data more interpretable and easier to follow, and additional relationships between prescriptions may become apparent.

Figure 1: a prescription pathway

The concept of these pathways is based on the use of flat prescription data. However, drugs are by nature hierarchical, as they belong to a series of categories. Hence they can be modelled in an ontology such as the Anatomical Therapeutic Chemical (ATC) drug classification, where different levels of the ontology represent different groups of drugs (WHO 2010). The levels range from specific chemical substances to broad anatomical main groups (see figure 2). If an ontology like this is used with the prescription drugs in the pathways, it is possible to analyse the prescription pathway at different levels of granularity. For example, the pathway can be viewed at the most specific level (which can be seen as using the original flat data), or the drugs can be abstracted into chemical subgroups using the ATC drug classification, and then further into pharmacological subgroups, and so on.
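
As a rough illustration of viewing a pathway at different granularities, the sketch below truncates ATC codes to the prefix that identifies each level. It assumes the standard 7-character layout of a level 5 ATC code (prefix lengths 1, 3, 4, 5 and 7 for levels 1 to 5). The function names and the example pathway are hypothetical and are not taken from the program developed for this study.

```python
# Prefix length of an ATC code at each of the five levels (assumed standard layout).
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def abstract_atc(code: str, level: int) -> str:
    """Return the ancestor of a level 5 ATC code at the requested level."""
    if level not in ATC_LEVEL_LENGTH:
        raise ValueError(f"ATC level must be 1-5, got {level}")
    if len(code) != 7:
        raise ValueError(f"expected a 7-character level 5 ATC code, got {code!r}")
    return code[:ATC_LEVEL_LENGTH[level]]


def abstract_pathway(pathway, level):
    """View a whole prescription pathway at one level of granularity."""
    return [abstract_atc(code, level) for code in pathway]


# Hypothetical pathway: propicillin, phenoxymethylpenicillin, then simvastatin.
pathway = ["J01CE03", "J01CE02", "C10AA01"]
for level in range(5, 0, -1):
    print(level, abstract_pathway(pathway, level))
```

At level 5 the pathway is the original flat data; by level 3 the two penicillins share the code J01C, so the pathway becomes more general as the level number decreases.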

Different information exists at different levels of the ontology, which can be useful for different methods and applications. When observing a pathway, a low granularity implies a more general pathway (for example, level 1 or 2 of the ATC drug classification) and might give the observer a simpler understanding of what the pathway is trying to achieve. Meanwhile, a high granularity gives a more specific pathway (for example, ATC level 5), which gives a thorough understanding of the specific drugs in the pathway.

Figure 2: the ATC drug classification model and an example for Propicillin

Changing the granularity of the prescriptions can have precarious effects. For example, figure 2 shows a model for Propicillin. At level 3 it can be observed that it is part of the penicillins group. Some people have allergies to penicillins, so if data mining for this pattern, logically the best level to operate at is level 3. If the drugs are modelled at a more specific level (e.g. level 5, J01CE03 Propicillin), the pattern gets weaker, as it is spread across a large number of variables; hence the pattern is diluted. It may get weaker to the extent that it does not emerge as a pattern at all, resulting in a loss of information from the dataset. Conversely, if a lower granularity is used (e.g. level 2, J01 Antibacterials for systemic use), the pattern showing penicillin allergy may disappear, as the J01 group contains other groups of antibacterial drugs unrelated to the penicillin-type allergy. In this sense, the pattern becomes contaminated by unrelated drugs. The concepts of dilution and contamination are used throughout this paper as explained here. These notions highlight the potential importance of incorporating an ontology knowledge base for the discovery of patterns.

The traditional approach to discovering such patterns is sequential pattern mining (SPM).
Sequential pattern mining is the process of trying to find the relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences (Zhao & Bhowmick 2003). By performing sequential pattern mining on the prescription data and incorporating the knowledge of a drug ontology, there is the potential to find many new and interesting patterns that are not present in the mining of flat data. There are several algorithms to perform sequential pattern mining, which are explored in section 2.2.

1.1 Motivation

Traditional path mining algorithms work on flat data without taking into account hierarchies or ontologies. This may not be sufficient in some cases. Prescription data is by nature hierarchical, so there is an opportunity to explore the impact of an ontology on the sequential pattern mining in this project. There is little current indication as to what effect a change in granularity has on the usefulness of the sequential pattern mining. This research explores how changing the granularity affects the sequence pattern discovery process. It investigates the concepts of dilution (if the exact chemical components used in each prescription are always mined, the associations may be too weak) and contamination (if a low level of granularity is always selected, the pathways may be too ambiguous). This research has wider implications in the field of data mining, as this use of an ontology could benefit the approach to finding patterns in data, and help obtain more results from datasets that can be modelled in an ontology. For the health informatics domain, being able to ascertain which granularities are appropriate for which purposes will make the prescription data far more useful for GPs. Simply omitting all other granularities in favour of one results in the potential loss of important information.
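
The dilution effect discussed in this section can be illustrated with a toy calculation. The sketch below is a simplified stand-in and not the algorithm developed for this study: it only counts, across pathways, how many contain each pair of consecutive prescriptions, and compares the counts at two ATC levels. The pathways, the threshold value and all function names are invented for illustration.

```python
from collections import Counter

# Prefix length of an ATC code at each level (assumed standard layout).
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def consecutive_pairs(pathway):
    """Distinct adjacent (earlier, later) prescription pairs in one pathway."""
    return set(zip(pathway, pathway[1:]))


def pair_support(pathways, level):
    """Count, for each adjacent pair, how many pathways contain it at this level."""
    length = ATC_LEVEL_LENGTH[level]
    support = Counter()
    for pathway in pathways:
        abstracted = [code[:length] for code in pathway]
        support.update(consecutive_pairs(abstracted))
    return support


# Invented pathways: each patient receives a different J01CE code
# followed by a different R06AX code.
pathways = [
    ["J01CE01", "R06AX13"],
    ["J01CE02", "R06AX26"],
    ["J01CE03", "R06AX27"],
]

min_support = 2  # hypothetical threshold: a pattern must occur in at least 2 pathways
for level in (5, 3):
    frequent = {p: n for p, n in pair_support(pathways, level).items() if n >= min_support}
    print(level, frequent)
```

At level 5 every pair occurs in only one pathway, so nothing reaches the threshold and the pattern is diluted. At level 3 all three pathways collapse to the same pair (J01C followed by R06A), which then emerges as a pattern.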

This research is necessary to identify the effect of using an ontology on pattern discovery for general practice data.

1.2 Research Question

This project focuses on analysing the impact of applying an ontology when mining data that is ordinarily flat. The research question ‘what impact will the use of an ontology in place of flat data have on the success of sequential pattern mining?’ is explored.

2 Literature Review

This section focuses on previous research into several core aspects of this minor thesis. Some similar work involving data mining in health informatics is explored, as well as supplementary literature to support aspects of the project. This covers the data mining aspect, with some sequential pattern mining methods; the drug ontology, with some conceptual work on ontologies and some previous work with drug ontologies and granularity; and the nature of the raw data, with research regarding the use of electronic health records.

2.1 Data Mining in Health Informatics

There is a large amount of research in health informatics that uses data mining. Health informatics is the science of applying Information Age technology to serve the specialised needs of public health (Friede, Blum & McDonald 1995). Data mining has become a key contributor to the progress of the integration of health information systems into general practice.

This project draws on concepts proposed in the work by Stanek et al. (2005). Stanek et al. proposed a method that compares practice patterns to clinical pathways. The research was focused on patients with diabetes and hypertension. The ATC drug coding system was also used, but the researchers did not fully utilise the hierarchical nature of the drug ontology for the data mining, and instead used a set granularity for all experiments.
Stanek et al.’s methods are tractable for smaller domains; however, the intent of this project is to apply a similar methodology to a far broader area, where these methods quickly become problematic.

Some other related work involved the adaptation of Bayesian Networks for discovering temporal-state transition patterns, specifically in the hemodialysis process (Lin, Chiu & Wu 2002). Lin, Chiu & Wu’s research focused on learning clinical pathways, so that pathways for admitted patients could be predicted. Lin, Chiu & Wu used a rich set of data including more attributes, such as test results, and created a set of states, events, and actions. The research proved very successful, but did not incorporate knowledge of an ontology for the drugs. Also, Lin, Chiu & Wu’s research involved a broader data set with attributes that are not available for this research. It was also specific to hemodialysis, which limits its applications.

Bei et al. (2005) performed some related work with a system called Portal. Bei et al.’s research was focused on improving the quality of procedures by giving continuous support to physicians. They performed some rule extraction for the selection of pacemaker systems for new patients, and implemented a simple business logic flowchart system to automatically classify new patients. A flowchart-type implementation is not suitable for this project, due to the immense size of the flowchart required to model all possible prescription pathways. The researchers went on to identify the potential for data mining for long processes (such as long prescription patterns). Bei et al.’s research highlighted the need for optimised support systems to reduce costs and improve the quality of procedures.

Another associated piece of research was the use of the Hidden Markov Model (HMM) to learn clinical pathways (Lin, Hsieh & Pan 2005).
The researchers modelled the process of spontaneous delivery of patients, and developed a 4-state pathway that accurately encompasses normal spontaneous delivery. The model was trained with the patient data, and visualised in a manner that simplifies the pathway for doctors. The intent was to learn the clinical pathway, so the outcomes were defined; this is unlike the research proposed in this paper, which was poised to perform unconstrained mining of the data for patterns and observe the results. There was also no use of a drug ontology, and the data used was not strictly prescription data. The outcome of the research was a model that accurately recreated clinical pathways and could be used to predict possible paths for an admitted patient.

Work by Riou, Pouliquen & Beeux (1999) had the goal of predicting the best drug for a prescription based on the clinical background of a patient. The methodology was not related to mining for patterns; it instead involved analysing patients’ disorders, pathophysiological conditions, age, and other factors to determine the next step in the clinical process. A tool was developed from the premise that junior residents and medical students had difficulties selecting the most appropriate drug for a given scenario. Riou, Pouliquen & Beeux proposed the use of the ATC codes, but decided against it due to the limitations of maintaining only one use and not accommodating indicators or other properties, and instead opted to develop their own drug knowledge base. Some issues in regard to the use of the ATC drug classification for this project are explored in section 3.2.

Another piece of related research concerned mining time dependency patterns in clinical pathways (Lin et al. 2001). Lin et al. intended to find patterns of process execution sequences that showcased the dependent relation between activities. The researchers developed a method to discover the patterns of clinical pathways using patient records and clinical log data.
The research covered the broader domain of complete clinical pathways, so additional data was used in the process, whereas the research for this minor thesis focuses on the pathways relating to drug prescriptions only. Furthermore, there is no use of ontologies, which is intended for this research.

2.2 Sequential Pattern Mining

The concept of mining for patterns in sequences of data has been implemented and improved in many applications. It stems from the field of Data Mining, which is the process of extracting interesting information or patterns from information repositories (Chen, Han & Yu 1996). Sequential pattern mining (SPM) is the process of trying to find the relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences (Zhao & Bhowmick 2003). Many methods of SPM have been developed, some of which were explored and evaluated for this project.

One of the earliest and possibly the simplest algorithms developed for SPM is AprioriAll (Agrawal & Srikant 1995). It is based on the Apriori principle from data mining for association rules, and is a very rudimentary method for finding sequential patterns. It finds single frequently occurring items in the dataset and then attempts to find sequences of them. The straightforward nature of the algorithm may make it serviceable for this project. However, there are some limitations: the algorithm cannot detect frequent patterns within a sequence. For example, if one patient’s prescription history is used as a sequence, patterns within that prescription history are not identified. The sequence may be detected as a candidate, but if it is not frequent across all patients, it is not identified as a pattern. As patterns within each patient’s records are important, this is not sufficient.

PrefixSpan (Pei et al. 2004) is a more optimised alternative to AprioriAll. It provides the same functionality at a lower computational cost.
It works by recursively projecting the database into a set of smaller databases and growing the patterns by recombining the sets. The authors found it to address some of the shortcomings of the candidate generate-and-test methods, and in several experiments it was shown to be the best performer. Although the algorithm is more efficient, it possesses several undesired traits for this project. Similar to AprioriAll, it does not detect patterns within sequences. Furthermore, the algorithm finds transitive patterns, i.e. patterns with gaps. For example, for the sequence A -> B -> C, A -> C is as much a pattern as A -> B. This is detrimental to the project’s outcomes, as each drug in a prescription pathway is very important, and omitting a drug or series of drugs may produce misleading results.

SPIRIT (Sequential Pattern mining with Regular expression constraints) is an optimised algorithm designed to mine user-specified patterns (Garofalakis, Rastogi & Shim 1999). This proves interesting if searching for particular sequences of drug prescriptions, such as known clinical pathways. Dowsey et al. (1999) showed that the use of clinical pathways reduced the duration of admission for patients. By searching for parts of clinical pathways, general practitioners’ adherence to them could be evaluated. However, as this minor thesis is not directed at any particular set of drugs or clinical pathways, this is not within the scope of this project.

There are also methods involving multiple attributes, called multi-dimensional sequential pattern mining (Pinto et al. 2001). These are useful for adding new information such as age groups, dates, or complex attributes to patterns. Due to the nature of the data, no other information is guaranteed to be usable, so this was not implemented.

Another extension on conventional pattern mining is incremental mining, introduced in Parthasarathy et al.’s work (1999) and explored by Zhang et al. (2001).
Incremental mining is useful for datasets that continue to change over time. In this project the dataset is not changing, so it has not been implemented. It could prove an interesting extension to the sequential pattern mining in the future.

Periodic Pattern Analysis involves limiting pattern mining to certain periods, to specify when to check for recurring patterns (Han, Dong & Yin 1999). Naturally this is most effective if the data is collected over long periods of time. This could be used to explicitly find monthly or yearly patterns, but does not address the initial concern of testing the impact of ontologies on data mining.

Other research in the area includes optimising for linked objects in a distributed system (Chen, Park & Yu 1998), and creating hybrid combinations of other methods (Lenič & Kokol 2002). Many advanced and optimised methods have been developed for sequential pattern mining, but the requirement of this minor thesis is a simple method for a proof-of-concept, so many prove unsuitable.

2.3 Drug Ontologies

Ontologies are explicit formal specifications of the terms in the domain and relations among them (Gruber 1993). Noy & McGuinness (2001) say ontologies are used to share common understanding of the structure of information, to make domain assumptions explicit, to separate domain knowledge from the operational knowledge and to analyse domain knowledge. To relate these definitions to this project: the domain can be seen as the set of prescription drugs, along with a formal definition of the explicit “is-a” relationship between ontology levels and their parent levels (for example, each item at level 3 in the ontology is part of an item at level 2).

There are several implementations of ontologies for the domain of health informatics. As discussed earlier, the Anatomical Therapeutic Chemical (ATC) drug classification is one such drug ontology (WHO 2010).
This ontology provides a simple hierarchical breakdown of the field of prescribable drugs, and is easily accessible from the organisation’s website. Whilst the ATC classification does have some flaws, which are addressed in section 3.2, it is a widely used standard that provides a malleable ontology for the development of a proof-of-concept system in a simple and practical manner.

Rector et al. (1998) proposed some requirements for developing ontologies to be used in medicine. Rector et al. identify that an ontology should be treated as an “assembly language”, and that it should be viewed as a “pure tree in which the branches at each level are disjoint but nonexhaustive subconcepts of the parent concept” (Rector et al. 1998). These elements provide the basis of some further work into developing drug ontologies.

One such piece of research involved the development of Prodigy: a reusable and automatically classified ontology to describe the chemical composition of the drugs, as well as a dictionary of prescribable products, which includes more volatile information such as the pack sizes and preparations (Solomon et al. 1999). Whilst this does create a more robust and descriptive system, it complicates the knowledge base by incorporating data that is not useful or not available for this research into the drug ontology, so this method was not selected.

Wroe and colleagues used a description logic named Grail to implement an ontology based on existing pathology and physiology ontologies to create formal descriptions of a generic drug’s clinical properties. This was used to include indications, contraindications, side effects and other properties in the definition of the drugs (Wroe et al. 2000). This proved promising for sorting and grouping drugs, and possibly finding multi-dimensional patterns. However, as with Solomon et al.’s research, this served to complicate the knowledge base and made it intractable for this research.

One of the most influential pieces of work for this study was an investigation into the effects of data granularity on data mining (Andrusiewicz & Orlowska 1997). Whilst granularity is not directly linked to ontologies, the two serve the same purpose, as both involve abstracting data to more general cases to enhance data mining. In Andrusiewicz & Orlowska’s work, a formal definition of some data mining and database concepts is given, followed by an explanation of how changing data granularity could influence the data mining outcomes. Andrusiewicz & Orlowska found that decreasing the granularity (akin to moving up the ontology) can lead to the discovery of patterns that were not present earlier. This observation mirrors the aforementioned definition of dilution, in which patterns fail to emerge when the data is too specific. The research demonstrated the effects of introducing data granularity to a data set, much the same as this project’s introduction of an ontology.

2.4 Electronic Health Records

The data used for the program consists of electronic forms of patient records. Storing electronic health records (EHR) has become prevalent in general practice. Keeping digital copies of health data presents many opportunities as well as legal issues and complications, which are explored here.

Hillestad et al. (2005) discussed the estimated savings, costs, safety benefits and other health benefits, in order to show the potential profit that the use of electronic medical records can produce for the industry. Hillestad et al. compared the use of I.T. in health to many other sectors, such as telecommunications, securities trading and retail, to forecast the financial benefits of investing in health informatics. Hillestad et al.’s research showed the importance of dedicating resources to the development of electronic health records, and showed the likelihood that the trend of increasing adoption of EHR will continue, making it a valuable source of data.
There has been research involving extensions to EHR, such as the development of Virtual Medical Records (vMR) (Johnson et al. 2001). These are an abstraction of conventional medical records, stripped down to the elements necessary for modelling guidelines and protocols. This ongoing interest in digital health information stresses how prevalent it is becoming, and further identifies it as a valuable source of data. Replicating health records on computer systems presents many legal issues regarding privacy and ownership of information, as highlighted by Friedman (2006) and Hodge, Gostin & Jacobson (1999). Debate continues regarding the use of EHR for research, and this was the driving factor behind the need to de-identify data before analysing it, to avoid any ethical and privacy issues.

3 Methodology

In this minor thesis a proposal is given for a system to pre-process prescription data, construct prescription pathways from the data, and then execute sequential pattern mining on the prescription information. A process flow that outlines the running of the program can be seen in figure 3.

Figure 3: the program process flow

This section explains the nature of the raw data, the methods used to pre-process the data, and the sequential pattern mining method.

3.1 Raw data

3.1.1 Ethical Considerations

This research is based on real prescription data from general practices. The data was used in similar previous studies and was properly de-identified (i.e. it contains no identifiers or other data related to the identity of the patient). In no way can this information lead to re-identification of the persons by anyone besides the data providers. Ethics approval was granted to re-use the dataset for the current study. The data collected will need to be stored for seven years after the research is concluded, in line with standard university procedures.
An ethics approval form has been submitted and approved to ensure the utmost adherence to ethical research policies; it is included in Appendix A.

3.1.2 Data summary

The entire dataset contains 67,150 tuples for 2931 patients. A smaller dataset of 630 tuples, constituting 13 patients' prescription histories, was extracted prior to this research; these records were selected as being mostly complete and feasible for this proof-of-concept study. Each tuple of the dataset contains the information for one single drug prescription. Cumulative prescriptions for single patients are spread over multiple tuples. The data has already been stripped of any identifying attributes, as addressed in section 3.1.1. Each tuple has the following attributes:

Attribute        Meaning
patient_pkey     A string of characters unique to each patient
filenumber       Irrelevant
provider_pkey    A string of characters unique to each prescription provider
date             The date the prescription was given
script_number    Irrelevant
drug_name        The brand name of the drug prescribed in plain text
dosage           Notes on the dosage
dose             How much of the drug to take at one time
repeats          How many times to repeat the prescription (0 = only one prescription)
packsize         How many dosages in a pack
quantity         Duplicate of packsize
form             The form of the prescribed drug (tablet, ointment, etc)
formulary        Irrelevant
druggen          Chemical name of the drug prescribed
generic          Irrelevant
pbs              Irrelevant
use              The condition the prescribed drug is treating

Many of the attributes in the data are not required for this project. The patient_pkey is needed to link prescriptions for the same patient. The druggen contains the chemical name of the drug, and is used to get the ATC code for the prescription pathways. The date is used to ensure the prescription pathway proceeds from the earliest point in time to the latest (i.e., it is a record of what drugs were prescribed in what order). The dose, repeats and packsize can be combined to deduce how many days the prescription lasts for.
Together, this smaller set of attributes forms the dataset required for this research.

3.2 Pre-processing

The data requires several stages of cleaning and processing before it is suitable for sequential pattern mining. These stages are: addressing missing values; recoding dosages; mapping drug names to ATC codes; and constructing prescription pathways.

The first issue addressed is the possibility of missing values. Missing values exist in any dataset, and this one is no exception, even though prescription data is the most reliable available (Hassey, Gerrett & Wilson 2001). Any tuple with a missing prescription drug is treated as a data entry error and removed. Whilst this is not ideal, it is a far better option than trying to fabricate a drug prescription to overcome the data entry error. Missing values of packsize or dose are replaced with 1. Again, this is not optimal, but its only negative effect is an inaccurate prescription duration. Furthermore, this does not impact the data mining for this project, as the duration information is not used. The repeats, date, and patient_pkey values are complete for all records.

The next step of the pre-processing is to recode the forms of the dose. Different GPs tend to record dosage information differently. Entries such as '1 daily', 'one', '1 n', etc need to be recoded to a uniform representation. This is performed by searching for particular strings and modifying the dose number to reflect a dosage per day. For example, a base dosagePerDay is initialised to 1. If the substring 'per week' is found, the dosagePerDay is divided by 7. If '3' is found, the dosagePerDay is multiplied by 3, and so on. In this manner, doses such as '1/2 a tablet 5 times per week' are interpreted correctly, and exported as a standardised number of units per day.
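The recoding rule described above can be sketched in a few lines. The thesis software was written in MUMPS and its source is not reproduced in the text, so the Python below is only an illustration under stated assumptions: the function name `dosage_per_day`, the word list `WORD_NUMBERS` and the handling of 'weekly' are hypothetical, and only the start-at-1, multiply-by-numbers, divide-by-7 behaviour comes from the description.

```python
import re
from fractions import Fraction

# Hypothetical word list; the strings actually searched for are not given in the text.
WORD_NUMBERS = {"half": Fraction(1, 2), "one": 1, "two": 2, "twice": 2, "three": 3}


def dosage_per_day(text: str) -> float:
    """Recode a free-text dosage note into a number of units per day."""
    per_day = Fraction(1)  # base dosagePerDay is initialised to 1
    lowered = text.lower()
    # Multiply by every numeric quantity in the note, e.g. '1/2', '5' or '3'.
    for token in re.findall(r"\d+/\d+|\d+", lowered):
        per_day *= Fraction(token)
    # Multiply by quantities written as words (assumed vocabulary).
    for word, value in WORD_NUMBERS.items():
        if re.search(rf"\b{word}\b", lowered):
            per_day *= value
    # A weekly schedule is spread evenly over seven days.
    if "per week" in lowered or "weekly" in lowered:
        per_day /= 7
    return float(per_day)


example = dosage_per_day("1/2 a tablet 5 times per week")  # (1/2 * 5) / 7 units per day
```

Exact fractions are used internally so that a note such as '1/2' is not rounded before the weekly division. Malformed notes (for example a fraction with a zero denominator) are not handled in this sketch.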
Figure 4: re-coding dosage information

This information can then be combined with the packsize and repeats to calculate how many days a prescription lasts for. However, as this research only regards the prescription pathways, and hence has no notion of prescription durations, this is beyond the scope of this project. This aspect of the software is provided as a foundation for further research.

The next step in the pre-processing is to replace the drug names with their respective ATC codes. A Global is created in MUMPS (similar to an array in other programming languages) that maps all drug names to their respective codes. This is generated using the World Health Organisation Collaborating Centre (WHOCC) online ATC classification index. The WHOCC offers a service where a drug name can be entered and the respective ATC code is returned. This is used to generate a Global of ATC codes and drug names, which can then be used to map the drug names to ATC codes automatically.

Figure 5: using the WHOCC online ATC index to obtain ATC codes

There are also several features of the ATC classification that are not desired for this minor thesis, which will now be addressed. Firstly, the ATC system assigns unique codes to certain combination drugs. In this research, it is preferred that these combination drugs be represented by the codes for each of their components, as if multiple single drugs were prescribed. This can be achieved by simply modifying the Global to reflect these conversions. One other undesirable trait of the ATC drug classification in relation to this project is the handling of drugs that have multiple uses. Because the ATC drug classification does not implement a multiple hierarchy, a drug that has several uses or forms can exist at multiple points in the ontology, and hence have multiple codes.
As this is a large implementation issue, and does not necessarily affect the pattern mining process, it is sufficient for this project to omit multiple codes for drugs in favour of a single code.

The final step of the pre-processing is to generate a prescription pathway for each patient. The prescription pathway is constructed by linking all sequential prescriptions for a patient, which are all the tuples with the same patient_pkey. The prescription pathways are stored in a single multidimensional Global, where one dimension represents the patient number and the second dimension represents the prescription number. This ensures enough information is maintained to determine where patterns occur, i.e. in which patients and at what point in their prescription pathways.

3.3 Sequential Pattern Mining (SPM)

For the purpose of this study, a modified sequential pattern mining algorithm has been created to suit the data and requirements of the project. The flaws of comparable algorithms are discussed in relation to this project, as well as the benefits of the developed algorithm, followed by an overview of the sequential pattern mining process.

3.3.1 Algorithm summary

The sequential pattern mining was performed based on the ATC drug codes. Due to the nature of conventional SPM algorithms such as AprioriAll (Agrawal & Srikant 1995) and PrefixSpan (Pei et al. 2004), it would not be sufficient to simply feed in each patient's pathway as a sequence, as this would not yield meaningful results. This is because these algorithms simply test for the presence of an itemset in a sequence, not for how often it occurs within that sequence. For example, a sequence of Drug A -> Drug B -> Drug A -> Drug B -> Drug A -> Drug B that is only given to one patient may be significant, but would not be identified as a pattern. For this reason, a modified sequential pattern mining algorithm is developed that finds patterns within patients, as well as across the set of patients.
This is achieved by feeding each part of the pathway into the algorithm as a separate sequence, as shown below in figure 6. In this manner, any and all frequently occurring patterns within patients' pathways are discovered.

Figure 6: preparing the prescription pathways for the sequential pattern mining

This also addresses another issue of comparable SPM methods: transitive sequences. In a medical context, the exact order of the prescriptions can be very important. However, the SPM algorithms that were evaluated simply check for the presence of one item at some point after another item. For example, considering the pathway in figure 6, "A01AA02 -> B01AB02" would be considered as frequent as "A01AA02 -> C01CD02", even though C01CD02 does not directly follow A01AA02. Due to the length of the prescription pathways, many unrelated and misleading "patterns" may be discovered through this property. The current SPM algorithms could be modified to prevent this from occurring; however, by isolating each sequential prescription as an individual sequence, these problems are easily avoided.

To determine what is selected as a pattern from the frequently occurring sequences, two support thresholds are used. The first is a standard 'minSupport', which is the minimum number of times the sequence has to occur across all prescriptions. The second threshold is a 'minPatient', which is the minimum number of patient prescription pathways the sequence has to occur in. The sequence is considered to be a pattern if either of these criteria is met. Ignoring the first threshold would result in sequences that occur frequently within a limited number of patients being missed. Ignoring the second threshold would result in prescription patterns that are not re-prescribed to the same patients being ignored (i.e. the pattern is prescribed to many patients, but since it does not re-occur, it does not meet the minSupport).
Using this dual-threshold method, the most frequent prescription sequences across the whole dataset are discovered, as well as the prescription sequences that occur in the most patients.

3.3.2 Sequential Pattern Mining process

To run the experiments, different levels of granularity of the underlying data are created (using the ATC ontology to generate the tests). The level five "flat data" is analysed first (refer to figure 6). The frequently occurring 2-item sequences are found by generating the candidate 2-item sequences and counting how often each sequence occurs, similar to the AprioriAll method (Agrawal & Srikant 1995). Note that unlike AprioriAll, 1-item sequences are not generated, as they are not required. The candidates are then tested against the two support thresholds to obtain the patterns. This process is illustrated in figure 7.

Figure 7: generating patterns by testing candidates against support thresholds

From these 2-item patterns the 3-item candidates are generated by combining all frequent 2-item patterns that occur sequentially in the same patients, and then testing against the support thresholds again to deduce the patterns. This process continues until no candidates reach the support thresholds, at which point execution is halted. This is illustrated in figure 8. This process borrows from previous SPM methods, but is tuned to this project to obtain the most meaningful results achievable.

Figure 8: combining 2-item patterns into 3-item candidates, and then halting because no 3-item patterns are discovered

Further experiments are then generated by recoding the pathways into higher levels of the ATC ontology. For example, N05CD07 becomes N05CD, A10BA02 becomes A10BA, etc. After searching for patterns at this level, drugs are recoded further; for example, N05CD becomes N05C. This process iterates until pattern mining at all five levels of the ATC drug ontology has been completed.
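The mining process of section 3.3 can be approximated by the following Python sketch. It is not the thesis's MUMPS program: the names `recode`, `mine` and `toy_pathways` are hypothetical, the toy data is invented for illustration, and the use of "at least" for both thresholds is an assumption. Candidate growth is also simplified: a longer window is counted only if both its leading and trailing sub-sequences were patterns at the previous length, which stands in for the joining of 2-item patterns described above. The prefix lengths rely on the standard shape of ATC codes (for example A, A10, A10B, A10BA, A10BA02).

```python
from collections import defaultdict

# Characters kept at each ATC level: A / A10 / A10B / A10BA / A10BA02.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def recode(code, level):
    """Abstract an ATC code to the given level, e.g. N05CD07 -> N05CD at level 4."""
    return code[:ATC_PREFIX_LENGTH[level]]


def mine(pathways, level, min_support, min_patient):
    """Return {length: {sequence: [(patient, start_index), ...]}} for one ATC level."""
    recoded = {p: [recode(c, level) for c in codes] for p, codes in pathways.items()}
    results, previous, length = {}, None, 2
    while True:
        occurrences = defaultdict(list)
        for patient, codes in recoded.items():
            # Only directly consecutive prescriptions form a candidate window.
            for start in range(len(codes) - length + 1):
                window = tuple(codes[start:start + length])
                if previous is not None and (
                    window[:-1] not in previous or window[1:] not in previous
                ):
                    continue  # cannot be frequent if a sub-sequence was not
                occurrences[window].append((patient, start))
        patterns = {}
        for sequence, found in occurrences.items():
            patients = {patient for patient, _ in found}
            # Dual threshold: either criterion is enough (">=" is an assumption).
            if len(found) >= min_support or len(patients) >= min_patient:
                patterns[sequence] = found
        if not patterns:
            return results  # halt when no candidate reaches a threshold
        results[length] = patterns
        previous, length = patterns, length + 1


# Invented toy pathways, ordered by prescription date.
toy_pathways = {
    "p1": ["C09AA05", "C10AA05", "C09AA05", "C10AA05"],
    "p2": ["C09AA10", "C10AA05", "C09AA10", "C10AA05"],
    "p3": ["A10BA02", "A10BB09"],
}
level5 = mine(toy_pathways, 5, min_support=3, min_patient=3)
level4 = mine(toy_pathways, 4, min_support=3, min_patient=3)
```

With this toy data nothing reaches either threshold at level 5, whereas at level 4 the sequence C09AA -> C10AA occurs four times across two patients and is reported together with the patient and position of each occurrence.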
4 Results

The above methodology was implemented on the dataset containing 630 prescriptions across 13 patients. The program generated patterns and output them to file for manual interpretation. After some experimenting, a minSupport of 6 and a minPatient of 3 were selected as delivering the best results. This is equivalent to approximately 1% of all prescriptions (6 of 630) and roughly a quarter of patients (3 of 13, about 23%). In other words, for a sequence to be identified as a pattern, it must occur in about 1% of the total number of prescriptions, or in about a quarter of the patients.

4.1 Output Format

Some example output is given in figure 9, and the entire set of results can be found in Appendix B. The figure below shows how the results are delivered to the user. The drug names and codes are displayed, as well as the index of each occurrence (giving the patient and the prescription number for that patient). The total number of occurrences and the number of patients, which are tested against minSupport and minPatient respectively, are also displayed. This formatting provides all the required information in a neat and readable manner, which is necessary for manually inspecting patterns.

Figure 9: example output of pattern mining program

The program also provides summaries after each stage of the data mining, as seen in figure 10. This is beneficial for obtaining statistics about the overall progress and success of the pattern mining.

Figure 10: a summary for a stage of pattern mining

4.2 Flat Data Pattern Mining Results

Using the support thresholds of minSupport = 6 and minPatient = 3 with the original ATC level 5 data, the sequential pattern mining generated a total of 55 2-item patterns, of which 8 were unique. No 3-item sequences met either of the support thresholds, so no 3-item patterns were identified. These results represent a baseline: the limited set of outcomes produced by standard analysis of flat data. The patterns found are displayed in figure 11.
Note how the first two patterns only meet the minPatient threshold, while the 3rd-5th patterns only meet the minSupport. If only one of these thresholds were used, many of these patterns would be missed.

Figure 11: a summary of the patterns discovered for the flat data

4.3 Ontology Data Pattern Mining Results

More patterns started to emerge as the codes were modified and abstracted further up the ontology. Patterns consisting of more than two prescriptions also began to appear. The full set of results is attached in Appendix B, and the results are summarised in the table below.

Level   Pattern Length   Unique Patterns   Total Patterns
5       2                8                 55
4       2                11                73
3       2                15                109
3       3                1                 6
2       2                20                152
2       3                1                 6
1       2                31                430
1       3                25                211
1       4                11                73
1       5                2                 17
1       6                1                 7

The "level" indicates the stage of the ATC drug classification; for example, level 5 is the chemical substances, e.g. J01CE03, and level 1 is the anatomical main groups, e.g. J (see figure 2 for a summary of each level). The "pattern length" indicates how many items were in the pattern; for example, a 2-length pattern may be J01 -> B01, while a 3-length pattern may be J01 -> B01 -> C03.

The graph in figure 12 compares the number of 2-length patterns that are discovered at different levels of the ATC ontology. This gives an insight into how the success of the pattern mining is affected by the change in granularity.

Figure 12: a comparison of number of 2-length patterns found at each level of ATC ontology (TotalPatterns and UniquePatterns plotted as Number of Patterns against Ontology Level, from level 5 to level 1)

From these results, an increase in the number of patterns is observed as the prescription data moves further up the ontology. Of particular note is the drastic increase as the data moves to level 1 pattern mining; 430 total patterns are found.
This number is especially large considering there were only 630 total prescriptions in the dataset; almost 70% of the 2-item sequences are now frequent. The level 1 patterns also reach up to 6 items in length, far longer than at the previous levels. The level 1 results are visualised below.

Figure 13: the ATC level 1 patterns (TotalPatterns and UniquePatterns plotted as Number of Patterns against Number of items in pattern, from 2 to 6)

This graph illustrates the large difference between the total number of patterns found and the number of unique patterns. This is especially apparent when pattern mining at this higher level of the ontology. There is also a significant decline in patterns as the number of items in the patterns increases.

5 Discussion

In this section, the results are interpreted, providing some evidence of the existence of contamination and dilution in granular data. The impact of using ontologies in place of flat data is then discussed, followed by the interestingness of patterns discovered from using ontologies.

5.1 Results Analysis

The results show a large increase in emergent patterns (in particular, total patterns) as the prescriptions progress up the ATC ontology. This is because as the data progresses up the ontology, each item below is contained by the item above. This means that each pattern at level n will also be a pattern at level n-1, as the items at level n are contained in the level n-1 representation (along with a group of other drugs). Furthermore, as the prescriptions move up the ontology, the number of possible states (and hence the number of possible combinations of these states) decreases, so each remaining combination occurs more often and more patterns emerge.

Whilst there may be an increase in the number of patterns, this does not necessarily result in a gain in information from the data. The concepts of contamination and dilution are explored below, to investigate the effects of using an ontology on the results of the pattern mining.
After manually inspecting the results, three cases are presented for analysis. The first is an example of dilution:

Figure 14: an example of dilution

This pattern was found at level 4 of the ontology mining, and is the equal most frequent pattern at this level. As such it can be seen as one of the most important patterns. However, the pattern does not occur lower in the ontology at level 5, because at that point the added information spreads the pattern over multiple sequences, none of which meets the minimum support (in this case, C09AA05 -> C10AA05, which occurs 4 times in 1 patient, and C09AA10 -> C10AA05, which occurs 5 times in 1 patient). This is an example of dilution: performing the sequential pattern mining on the more specific data results in this potentially meaningful pattern being missed.

The second case is an example of contamination. Figure 15 shows a selection of results from the output that demonstrates this concept:

Figure 15: an example of contamination

The figure illustrates one pattern as it moves up the ontology. This can be seen from the similarity of the codes (A10BA02 -> A10BA -> A10B). At level 5, the sequence A10BA02 -> A10BB09 is already a frequently occurring pattern. As mining is performed at higher levels of the ontology, additional information is compounded into the pattern. For example, at level 4, the pattern now includes 'all Biguanides' instead of just Metformin, even though the pattern is actually only true for Metformin. By level 3, the pattern has collapsed to a single state (A10B -> A10B), which is essentially not a pattern at all, but rather a re-prescription. As such, the pattern has been made meaningless by the addition of unrelated groups of drugs. The recurring pattern of A10B is not as meaningful as A10BA02 -> A10BB09; the pattern has been contaminated.
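The two cases above can be reproduced with a small worked example. The Python below is a sketch, not the thesis program: the helper names `abstract` and `roll_up` are hypothetical, the occurrence counts are the two level 5 figures quoted above, and the level 4 total is only a lower bound because other level 5 sequences may also fold into the same level 4 pattern.

```python
from collections import Counter

# Characters kept at each ATC level: A / A10 / A10B / A10BA / A10BA02.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}
MIN_SUPPORT = 6  # the minSupport used in the experiments


def abstract(pattern, level):
    """Move every code in a pattern up to the given ATC level."""
    return tuple(code[:ATC_PREFIX_LENGTH[level]] for code in pattern)


def roll_up(counts, level):
    """Sum occurrence counts of patterns that coincide after abstraction."""
    merged = Counter()
    for pattern, occurrences in counts.items():
        merged[abstract(pattern, level)] += occurrences
    return dict(merged)


# Dilution: each level 5 sequence occurs fewer than MIN_SUPPORT times
# (and in a single patient), so neither is reported at level 5.
level5_counts = {
    ("C09AA05", "C10AA05"): 4,
    ("C09AA10", "C10AA05"): 5,
}
level4_counts = roll_up(level5_counts, 4)  # both fold into C09AA -> C10AA

# Contamination: by level 3 the Metformin pattern collapses to A10B -> A10B.
contaminated = abstract(("A10BA02", "A10BB09"), 3)
```

Summing is valid here because every pair of consecutive prescriptions at level 5 maps to exactly one pair at level 4, so the combined count of 9 exceeds the threshold even though neither component does.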
The third and final example is an unusual pattern:

Figure 16: an unusual pattern found from pattern mining

While this pattern is present at lower levels, it is far easier to identify and interpret at this more abstract level, especially for medical non-professionals. As a pattern that involves hypnotics and sedatives, it is potentially unusual and may indicate a problematic patient or an interesting prescription methodology. This use of the ontology to represent information at a more abstract level could be used by practitioners to more easily identify and address unusual prescription patterns. This ease of identification is another case for the use of ontologies like the ATC drug classification to further data mining, as it makes the analysis of emergent patterns more understandable.

These example patterns give an idea of the kind of additional information that is available when an ontology is incorporated into an ordinarily flat database. Discovery of new patterns, variance of pattern interpretation and conceptual abstraction are all expected to appear in any dataset to which this approach is applied. For prescription information, a large difference is observed between the mining of flat data and the mining of hierarchical data implemented in an ontology.

5.2 Using ontologies

The results of this work have shown promise in addressing the research question of what impact the use of an ontology in place of flat data has on the success of sequential pattern mining. This research has demonstrated that the ability to detect patterns in the dataset depends on the selection of the correct granularity. As the granularity is decreased, more patterns emerge, but due to the decline in specificity of the patterns, they may be less meaningful. There has been some similar research into the effects of granularity on pattern mining (Andrusiewicz & Orlowska 1997).
Andrusiewicz & Orlowska proposed using abstractions of the original data to strengthen the patterns found by data mining. This is the same principle as modelling the data in an ontology, as the abstraction of information serves to minimise the set of possible states and hence the combinations of these states. The researchers had similar findings: decreasing granularity could lead to the discovery of additional patterns, but it did not necessarily make them meaningful or interesting. For example, in Andrusiewicz & Orlowska's work it is observed that the pattern "people who buy bread also buy milk" may be quite interesting; however, the abstraction of the pattern to "people who buy bread also buy food" makes it meaningless. This is explored further in section 5.3.

The introduction of granularity to the dataset also realises the concepts of contamination and dilution, as demonstrated in section 5.1. These phenomena are not expected to be isolated to this project and the use of prescription data; rather, they will be present in other datasets. Dilution can occur whether the dataset is hierarchical or not, as it occurs when the granularity of the data is too high for important patterns to emerge. This is especially prevalent in datasets containing attributes with vast ranges of values. The more values there are available, the more combinations of these values are possible, and the less likely it is that any given combination will be frequent enough to be a pattern. Often the original data in a dataset is of a very high, specific granularity, and hence many patterns may be missed due to the vast number of values. Adding levels of abstraction to the data helps to identify patterns that may not have been apparent at more precise levels of granularity. Contamination is a concept that is introduced by the use of ontologies, and is not present in ordinary flat data.
As the data abstracts further up the ontology, and the concepts get more generic, there is a risk of patterns becoming too ambiguous and meaningless. If a strong pattern emerges, and the data is abstracted further, this contaminates it with unrelated concepts, and hence the pattern becomes less clear. This effect is particularly important in prescription data, but may have implications in other datasets.

5.3 Interestingness

As mentioned earlier, the use of ontologies does increase the discovery of patterns, but it does not necessarily mean the patterns are interesting. Ascertaining just what constitutes an interesting pattern is difficult. Ideally, a user study would be conducted with industry professionals to try to build a model for determining the interestingness of a pattern. However, due to time limitations, this is not possible in this project. Hence, interestingness is defined broadly for this project as a pattern's potential usefulness and meaningfulness to the industry professionals to whom the data relates (in this case, general practitioners).

As the data abstracts further up the ontology and the granularity decreases, the patterns become more conceptually generic. To some degree, this means a loss in meaning, as the specifics of the pattern are not contained in the abstracted sequence. For example, if a pattern is discovered at level 3 of the ATC drug classification involving J01C (Penicillins), this may be considered interesting. However, the exact penicillins that were involved are not known at this level, so there may be little opportunity to act on this pattern. It is apparent there is some trade-off between the additional discovery of potentially interesting patterns and the loss of specific information in those patterns. Andrusiewicz & Orlowska (1997) explore this notion, and argue that adding levels of granularity can increase interestingness to a point, after which the patterns become too general.
Andrusiewicz & Orlowska's argument extends to this project, where using an ontology does lead to the discovery of additional patterns which may be interesting. However, in this project, simply abstracting patterns does not make them more interesting in any case; it merely makes them easier to analyse and interpret. This may be isolated to the field of prescription data, as the degree of detail is required for the patterns to be interesting to general practitioners.

5.4 Further implications

This project used a limited set of data and operated in a very limited time frame. With more data and further refinement of methods, this research could ultimately show how much detail is required for prescription information to be meaningful and usable, and how much abstraction of the pathways is possible before the patterns become too ambiguous to be significant. This could have potential applications to GPs analysing prescription information in practice. Better delivery of prescription pathways to GPs could help identify anomalous cases, identify practitioners' trends in prescribing, and monitor adherence to clinical pathways. With practitioner support, a developed system could be integrated into the practice review process for GPs to reflect on and adapt their methods.

6 Future Work

There are several ways in which further work could be conducted for this project. These include implementing therapeutic pathways in place of prescription pathways; changing granularity dynamically; and further work in regard to interestingness. Additionally, this project may form the basis of further research in different fields to test the effects of ontologies on other domains and datasets.

6.1 Therapeutic pathways

One possible limitation of this project is the omission of time related information.
If several drugs are prescribed at the same time (for example, drugs B1 and B2), they are stored in the database as separate prescriptions, but have the same 'date' attribute. A prescription pathway does not account for the date of the prescription, and so interprets this as B1 followed by B2. There has been some previous research involving the development of therapeutic state-transition graphs (Gadzhanova et al. 2007), which will be referred to as therapeutic pathways. The therapeutic pathway is a combination of the drug prescriptions and the amounts prescribed, which gives an idea of what combination of drugs was being taken at any given time.

Figure 17: a prescription pathway and its equivalent therapeutic pathway

By utilising this approach, the discovered patterns might be more representative of the true implications of the data, rather than just the manner in which the data is stored. Whilst this minor thesis provides an insight into the general effects of an ontology on data mining, for the results to be more meaningful, therapeutic pathways may be implemented. This would prevent multi-drug prescriptions from emerging as patterns in the dataset, and so would be a welcome modification to make the software more applicable to general practice.

6.2 Changing Granularity dynamically

In both Andrusiewicz & Orlowska's work (1997) and in this project, the granularity (or level of ontology) is changed statically for all items at the same time (for example, in this project, all prescriptions are moved from one level of the ATC drug classification to the next level at the same time). There is an opportunity to explore the effects of dynamically changing the level of individual items independently of one another. Using this technique could lead to the discovery of patterns that fluctuate between levels of the ATC ontology, such as A10BA02 -> C10 -> B05BB.
Searching for patterns in this manner could help discover what types of data (in a prescription data context) require what level of granularity to produce meaningful patterns.

6.3 Exploring Interestingness

Although this project has discussed interestingness to some extent, there is still much work that can be conducted regarding this term. As mentioned, there is no formal definition of what constitutes an interesting pattern. Logically, a beneficial pattern is one that provides a gain in information. User studies with general practitioners could be used to deduce just what granularities of different types of prescription data provide useful information, which could be used to develop a better definition for the interestingness of a pattern. A set of heuristics could possibly be used to assess the interestingness of a pattern. With a tentative definition for interestingness, research into whether changing granularity results in a change to interestingness could be conducted.

Furthermore, this extended knowledge of interestingness could be used to pre-process related data. Rather than mapping a flat database into a complete ontology as was done in this project, the data could be mapped into the most appropriate or 'interesting' levels of the ontology initially, and then the data mining could be performed. This would mean that fewer candidate patterns would be generated and the algorithm would generally run more efficiently.

6.4 Applying to other fields

There is also an opportunity to apply this research to other fields. Currently this research is limited to the area of prescription data for health informatics. There are many domains where data is naturally hierarchical, such as store purchase data and chemical interactions, and could be modelled in an ontology. By mapping other datasets to ontology based databases, the effects of added granularity could be further explored, to ensure the results presented here are not isolated to prescription data.
Given the reason for the increase in pattern discovery (i.e. reducing the number of states and hence the possible combinations of these states), it is unlikely that other fields would produce drastically different results. However, research could also be conducted into different types of ontologies with different dependencies between nodes, such as “is-a-part-of” or “is-a-cause-of”, instead of the single “is-a” dependency used in this project.

7 Conclusion

In this project the effects of using an ontology in place of flat data for the purpose of performing sequential pattern mining on patient prescription data have been explored. A sequential pattern mining method has been developed that can search for patterns in drug prescriptions modelled at different levels of the ATC drug classification. The algorithm finds patterns within patients as well as across all patients, and uses two support thresholds to produce the most meaningful and interesting patterns.

The experiments have shown that there is an increase in pattern discovery at less granular levels of the ontology; however, it is difficult to ascertain whether these patterns are more interesting. There is also evidence of dilution, where using a low level of the ontology results in important patterns being omitted, and of contamination, where using a high level of the ontology results in important patterns becoming ambiguous or meaningless.

This research has provided a proof of concept of the effects of ontologies on sequential pattern mining. It has demonstrated the potential importance of implementing an ontology in place of flat data to further data mining efforts, and has shown the need for further research into this topic. This work has provided evidence that the use of ontologies can lead to the discovery of additional patterns, and has also addressed some of the issues that arise from the use of ontologies in data mining.
8 References

Agrawal, R & Srikant, R 1995, 'Mining Sequential Patterns', paper presented at the Eleventh International Conference on Data Engineering.
Andrusiewicz, A & Orlowska, ME 1997, On Data Granularity Factors that Affect Data Mining, 8th International Database Workshop Hong Kong, pp. 12-29.
Bei, A, Luca, SD, Ruscitti, G & Salamon, D 2005, 'Health-Mining: a Disease Management Support Service based on Data Mining and Rule Extraction', paper presented at the 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005).
Chen, M, Han, J & Yu, P 1996, 'Data mining: An overview from a database perspective', IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883.
Chen, M, Park, J & Yu, P 1998, 'Efficient data mining for path traversal patterns', IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221.
Dowsey, M, Kilgour, M, Santamaria, N & Choong, P 1999, 'Clinical pathways in hip and knee arthroplasty: a prospective randomised controlled study', Medical Journal of Australia, vol. 170, pp. 59-61.
Friede, A, Blum, H & McDonald, M 1995, 'Public health informatics: how information-age technology can strengthen public health', Annual Review of Public Health, vol. 16, no. 1, pp. 239-252.
Friedman, D 2006, 'HIPAA and research: how have the first two years gone?', American Journal of Ophthalmology, vol. 141, no. 3, p. 543.
Gadzhanova, S, Iankov, I, Warren, J, Stanek, J, Misan, G, Baig, Z & Ponte, L 2007, 'Developing high-specificity anti-hypertensive alerts by therapeutic state analysis of electronic prescribing records', Journal of the American Medical Informatics Association, vol. 14, no. 1, pp. 100-109.
Garofalakis, M, Rastogi, R & Shim, K 1999, 'SPIRIT: Sequential pattern mining with regular expression constraints', paper presented at the 25th International Conference on Very Large Data Bases.
Gruber, T 1993, 'A translation approach to portable ontology specifications', Knowledge Acquisition, vol. 5, pp. 199-220.
Han, J, Dong, G & Yin, Y 1999, 'Efficient mining of partial periodic patterns in time series database', paper presented at the International Conference on Data Engineering 1999.
Hassey, A, Gerrett, D & Wilson, A 2001, 'A survey of validity and utility of electronic patient records in a general practice', British Medical Journal, vol. 322, no. 7299, p. 1401.
Hillestad, R, Bigelow, J, Bower, A, Girosi, F, Meili, R, Scoville, R & Taylor, R 2005, 'Can electronic medical record systems transform health care? Potential health benefits, savings, and costs', Health Affairs, vol. 24, no. 5, p. 1103.
Hodge, J, Gostin, L & Jacobson, P 1999, 'Legal Issues Concerning Electronic Health Information: Privacy, Quality, and Liability', JAMA, vol. 282, no. 15, pp. 1466-1471.
Johnson, P, Tu, S, Musen, M & Purves, I 2001, 'A virtual medical record for guideline-based decision support', paper presented at the AMIA Annual Symposium 2001.
Lenič, M & Kokol, P 2002, 'Combining classifiers with multimethod approach', Soft computing systems: design, management and applications, p. 374.
Lin, F, Chiu, C & Wu, S 2002, 'Using Bayesian networks for discovering temporal-state transition patterns in Hemodialysis', paper presented at the 35th Annual Hawaii International Conference on System Sciences.
Lin, F, Chou, S, Pan, S & Chen, Y 2001, 'Mining time dependency patterns in clinical pathways', International Journal of Medical Informatics, vol. 62, pp. 11-25.
Lin, F, Hsieh, L & Pan, S 2005, 'Learning Clinical Pathway Patterns by Hidden Markov Model', paper presented at the 38th Annual Hawaii International Conference on System Sciences.
McInnes, D, Saltman, D & Kidd, M 2006, 'General practitioners' use of computers for prescribing and electronic health records: results from a national survey', Medical Journal of Australia, vol. 185, no. 2, p. 88.
Noy, N & McGuinness, D 2001, Ontology development 101: A guide to creating your first ontology, Citeseer.
Parthasarathy, S, Zaki, M, Ogihara, M & Dwarkadas, S 1999, 'Incremental and interactive sequence mining', paper presented at the Eighth International Conference on Information and Knowledge Management.
Pei, J, Han, J, Dayal, U, Mortazavi-Asl, B, Wang, J, Pinto, H, Chen, Q & Hsu, M 2004, 'Mining sequential patterns by pattern-growth: The PrefixSpan approach', IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11.
Pinto, H, Han, J, Pei, J, Wang, K, Chen, Q & Dayal, U 2001, 'Multi-dimensional sequential pattern mining', paper presented at the Tenth International Conference on Information and Knowledge Management.
Rector, A, Zanstra, P, Solomon, W, Rogers, J, Baud, R, Ceusters, W, Claassen, W, Kirby, J, Rodrigues, J & Mori, A 1998, 'Reconciling users’ needs and formal requirements: issues in developing a reusable ontology for medicine', IEEE Transactions on Information Technology in Biomedicine, vol. 2, no. 4, p. 229.
Riou, C, Pouliquen, B & Beeux, PL 1999, 'A computer-assisted drug prescription system: the model and its implementation in the ATM knowledge base', Meth Inform Med, vol. 38, pp. 25-30.
Solomon, W, Wroe, C, Rector, A, Rogers, J, Fistein, J & Johnson, P 1999, 'A reference terminology for drugs', paper presented at the AMIA Annual Symposium 1999.
Stanek, J, Iankov, I, Gadzhanova, S, Warren, J & Misan, G 2005, 'Guideline-based General Practice Data Mining', HIC 2005 and HINZ 2005: Proceedings, p. 254.
WHO 2010, 'World Health Organisation Collaborating Centre for Drug Statistics Methodology', http://www.whocc.no/.
Wroe, C, Solomon, W, Rector, A & Rogers, J 2000, 'DOPAMINE: a tool for visualizing clinical properties of generic drugs', paper presented at the 14th European Conference on Artificial Intelligence.
Zhang, M, Kao, B, Yip, C & Cheung, D 2001, 'A GSP-based efficient algorithm for mining frequent sequences', paper presented at the International Conference on Artificial Intelligence 2001.
Zhao, Q & Bhowmick, S 2003, 'Sequential pattern mining: A survey', Technical Report, CAIS, Nanyang Technological University, Singapore, pp. 1–26.

9 Project Timeline

Date
5th March 2010
12th March 2010
13th March – 15th March 2010
16th March - 31st March 2010
1st April – 14th April 2010
15th April 2010
16th April – 16th May 2010
17th May – 1st June 2010
3rd June – 8th June
11th June 2010
13th June 2010
14th June - 25th July 2010
26th July – 1st August 2010
2nd August – 22nd August 2010
23rd August – 30th September 2010
10th October 2010
24th October 2010
22nd November 2010
29th November 2010

Task
Choose Supervisor
Decide on field of thesis
Develop Project Plan
Research topic
Write annotated bibliography
Submit Annotated Bibliography
Finalise Research Question
Write Minor Thesis introduction, literature review
Work on ethics proposal for obtaining data
Write Minor Thesis ethical considerations
Prepare presentation slides
Write Minor Thesis methodology
Finalise presentation slides
Minor Thesis Proposal presentation
Submit Minor Thesis Proposal
Submit Ethics Proposal Form
Familiarising with Cache programming suite
Program Pre-processing, pathway output
Program testing
Implementing Sequential Pattern Mining methods
Evaluating Sequential Pattern Mining results
Write Minor Thesis results and discussion
Minor Thesis draft to supervisor
Submit Minor Thesis
Comments for corrections received, adjust Minor Thesis
Submit Final bound copies

Appendix A – Ethics Approval Application
University of South Australia Human Ethics Application Protocol Number : 0000020574 Application Title : Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways Date of Submission : N/A Primary Investigator : Mr Matheson Lee Ramsey Prior Assessment Non-UniSA HREC UniSA HREC Project details Research Ethics Advisor Project type 1.1 Has another Human Research Ethics Committee (other than UniSA) reviewed this research project before and does this clearance/approval accurately describe the project as it is to be conducted?* Yes No 2.1 Is this application a resubmission of an application that was considered by UniSA HREC and the decision was 'Not Approved: Resubmit', 'Not Approved' or "Approved subject to" and the status has expired (ie amendments were not made within the 6 month timeframe. Please note if your application is Approved subject to and 6 months has not lapsed then you should use the application you submitted to make the required changes. * Yes: Not approved: resubmit Yes: Not Approved Yes: Approved subject to and the status has expired No 3.1 Name of Research Ethics Adviser This question is not answered. 3.2 Has the Research Ethics Adviser conducted an ethics workshop in the last 12 months?* No 3.3 Have you attended human ethics training in the last 12 months?* Yes No 4.1 Main type of research (e.g. staff, PhD). * Honours Course Approval PhD Masters by Course work Masters by Research Professional Doctorate Undergraduate Graduate Diploma / Graduate Certificate Staff Other 4.2.1 Please note that, if you are a student applicant, your application will be forwarded to your principal supervisor once submitted for their approval. If they are satisfied with your application it will be forwarded to the relevant review group. If your supervisor requires changes to be made then your application will be returned to you to make the required changes. 4.3 Other type of research (e.g. staff, PhD). 
Please select all that apply* None Honours Course Approval PhD Masters by Course work Masters by Research Professional Doctorate Undergraduate Graduate Diploma / Graduate Certificate Staff Other

5.1 Title of research project* Impact of applying Hierarchical Structure to typically Flat Data for path mining prescription pathways

5.2 Plain English title* Testing the impact on path mining success of applying a hierarchical structure to data that is normally not stored in a hierarchy, such as prescription data

5.3 What are the aims of your research*
-evaluate the impact and usefulness of using a hierarchy to store typically flat data
-evaluate what levels in the hierarchy produce the strongest and most useful paths

5.4 List your research questions or hypotheses. Your protocol should clearly identify the questions which you want your research to answer.* What is the impact on path mining of applying a hierarchy structure to typically flat data?

5.5 Explain the need for, and value of, your research. Place the aims in the context of existing research or practice. (You must include a list of not more than 10 key references as an attachment to support your answer to this question. These are to be attached to the Attachment tab available from the Application Overview screen).* The research presents an opportunity to explore and understand how vague information can be before it is no longer useful (the higher up a hierarchy, the more vague the information is, as it encompasses more elements). It also presents an opportunity to discover the impact of using a hierarchy to elaborate typically flat data. If a marked improvement is found, this could lead to the adoption of hierarchical structures for other applications, which could lead to increased running speeds or more accurate path mining, depending on the domain.
There is a need to explore the impact of using a hierarchy as some flat data (such as prescription drug information) is too vast to perform complete path mining in feasible time frames. 5.6 Proposed commencement date* 05/07/2010 5.7 Proposed completion date* 01/11/2010 6.1 Have you applied for funding for this project (other than divisional funds)* Yes No 8.1 Detail who will own the data and the results of your research (student researchers normally own their own research and data unless there is a written agreement between the student and the University / third party; staff research and data is normally owned by UniSA). Please select all that apply.* UniSA Student researcher Other 8.2 Does the owner of the information or any other party have any right to impose limitations or conditions on the publication of the results of this project?* Yes No 8.3 Please note that it is the researcher's responsibility to ensure that, where required, an appropriate agreement is in place. If you are unsure whether this is needed, please consult the UniSA website . Do you require an agreement regarding ownership or do you currently have an agreement in place?* An agreement is required A signed agreement is in place An agreement is not required Please note that you must inform UniSA HREC once the agreement has been signed. Final ethics approval cannot be given until confirmation is received. 9.1 The information which will be stored at the completion of this project is of the following type(s). Please select all that apply.* Individually identifiable Re-identifiable Non-identifiable 9.2 Where will the data be stored (please be specific with the address e.g. If stored at UniSA please specify which campus and the office/room location)* the data will be stored at the Mawson Lakes campus of UniSA in D2-03 9.3 For how long will the information be stored after the completion of the project? 
Why has this period been chosen?* 5 years - to ensure any queries after the completion of the project can be answered, and to quell any later accusations of copying. 9.4 In what formats will the information be stored during the research project? (eg. paper copy, computer file on floppy disk or CD, audio tape, USB memory stick, videotape, film). * computer file 9.5 How will information, in all forms, be disposed after the retention time has lapsed? (Please refer to the Ownership and Retention of Data Policy. The Head of School (or equivalent) must be aware of this process.* deletion of computer file and any backups (on the single same machine) 26 9.6 Will any other individual(s), organisation(s) or researcher(s) (other than those listed on the Investigators tab) have authority to use or have access to the information? * Yes No 9.7 Specify the measures to be taken to ensure the security of information from misuse, loss, or unauthorised access while stored during the research project? (eg. will identifiers be removed and at what stage? Will the information be physically stored in a locked cabinet?)* the data will be immediately de-identified as this identifying information is of no use to the study. The data will be stored on a laptop computer using a strong user password to protect from misuse. 9.8 What arrangements are in place with regard to the storage of the information collected for, used in, or generated by this project in the event that the principal researcher / investigator ceases to be engaged at the current organisation? (Please refer to the Ownership and Retention of Data Policy.* If the principal researcher ceases to be engaged in the study the data will become the responsibility of the supervisor Jan Stanek 10.1 Please refer to the UniSA website : Do you require insurance cover for this project"* Yes No 11.1 Is the activity archival research? 
A large proportion of activity involving the analysis of documents, publicly available information, or previously collected data may be outside the scope of the University's human research ethics arrangements.* Yes No 11.2 Is the work being conducted only for UniSA administrative / service delivery purposes?* Yes No Scope Scope Research type and participants Research type Participant information 12.1 Should the work be characterised as quality assurance or an audit, rather than human research within the scope of the University's human research ethics arrangements?* Yes No 12.2 Is the work a practical exercise or test conducted for teaching purposes in a University administered facility? ( Please refer to Appendix 2 of Guidelines for Evaluation Activities Involving UniSA Students and Staff) * Yes No 13.1 Is the work a routine experiment or procedure conducted for teaching purposes in a University administered facility? * Yes No 13.2 Is the work / data collection conducted by a student only for teaching / learning purposes? * Yes No 13.2.1 Will the results be published / presented in any way other than a paper / product produced purely for assessment purposes ?* Yes No 14.1 This project involves: (Please select all that apply.)* Research using qualitative methods Research using quantitative methods, population level data or databanks, e.g survey research, epidemiological research None of the above 14.2 What research methodologies will you use? (Please select all that apply.) 
* Anonymous questionnaires Internet questionnaires Questionnaires requesting intimate personal, identifying, or sensitive information Other questionnaires Face to face interviews which do not request personal or sensitive information Face to face interviews which request personal or sensitive information Telephone survey which does not request personal or sensitive information Telephone survey which requests personal or sensitive information Focus groups Action Research Observation of participant's usual activities Observation of an activity set up for the purposes of the study Access to medical records (or records which contain intimate personal information, and are individually identifiable and are not publicly available) Experiment or testing of a procedure, drug or equipment Use of biological hazards, GMOs or pathogenic organisms Use of carcinogenic and/or toxic chemicals, including heavy metals Use of Radiation (Ionising and/or Non-ionising, but not Ultrasound) Other 14.2.1 Please describe what research methodology you will use.* none of these methodologies apply. We are only interested in obtaining the de-identified data. 27 14.3 Will you be audio-taping, video-taping, or taking photographs of participants during the course of the study? Please select all that apply.* Audio-taping Videotaping Photographs No Selection of participants Project start, end, location details Irregular consent process Limited disclosure / waive consent Covert observations 15.1 How many participant groups are involved in this research project? 
* 0 15.3 What is the expected total number of participants in this project at all sites?* 0 16.1 What process(es) will be used to identify potential participants?* there are no participants 16.2 Will potential participants be 'screened' or given a test/questionnaire to assess their suitability as a participant for the study?* Yes No 16.3 Describe how initial contact will be made with potential participants.* No contact 16.4 Is an advertisement, e-mail, website, letter or telephone call proposed as the form of initial contact with potential participants?* Yes No 16.5 List the selection and, if appropriate to your study, the exclusion criteria for participants.* there are no participants 16.6 If it became known that a person or participant group was recruited to, participated in, or was excluded from the research, would that knowledge expose the person to any disadvantage or risk?* Yes No Not Applicable 17.1 Will the research be undertaken in Australia?* Yes No 17.1.1 In which town(s)/city(ies)/State(s) of Australia will the research be undertaken in? * Adelaide, South Australia 17.1.2 In how many Australian organisations will the research be conducted? * 0 17.2 Will the research be undertaken overseas?* Yes No 17.3 Are there any time-critical aspects of the research project of which the review committee should be aware?* Yes No 18.1 Does the research involve limited disclosure to participants. Please refer to the National Statement. * Yes No 18.2 Are you asking the HREC / review body to waive the requirement of consent? Refer to the National Statement* Yes No 19.1 Does the research involve covert observation? Refer to the National Statement* Yes No Deception Project type Project type Participants Recruitment Risk to Participants Risk to participants Right to Privacy 20.1 Does the research involve deception. Refer to the National Statement* Yes No 21.1 Does the research involve any of the following? 
Please select all that apply.* Drugs, narcotics, poisons, placebo will be ingested / injected, or an invasive procedure will be administered Clinical trials Cellular therapy The collection and / or use of human samples. This includes tissue, blood or other body fluid collection / extraction Genetic testing and/or genetic research Human gametes or use or creation of human embryos A practice or intervention which is an alternative to a standard practice or intervention Investigating workplace practices which could possibly impact on workplace relationships Conducting the research overseas and recruiting participants None of the above 28 38.1 Who will you be recruiting as participants for this study? (If there is a high chance that you will be recruiting one of these groups, you should also select that participant group).* General public (over 18 years of age) Members of a collectivity People whose first language is not English People who are illiterate Pregnant women/human foetus Children People who are in a dependent or unequal relationship People who are highly dependent on medical care People with a cognitive impairment Aboriginal and/or Torres Strait Islander peoples People who may be involved in illegal activity Not recruiting participants Other 38.2 Does the research involve issues likely to be considered significant to Indigenous peoples?* Yes No Not Applicable 51.1 Please select all that apply. 
This research project:* Has the potential to expose participants to potential civil, criminal or other proceedings Makes it possible for third parties to identify participants Involves a risk of physical injury Involves human exposure to ionising and/or non-ionising radiation (including X-ray) Involves exposure to disease or infection Involves pain or significant discomfort Involves psychological or emotional stress Involves sensitive personal information Could expose participants to potential loss of professional reputation, market standing, or employability Could result in significant negative impact upon personal relations Offers an inducement which could be considered coercive Involves the participation of people who legally cannot provide voluntary and informed consent for their participation in research None of the above Collection method Collection method Participants Relationships Consent Consent process 66.1 Does IS42 or the Commonwealth Privacy Act apply to the research (eg access to identified personal data held by third parties subject to privacy regimes)? Refer to the Privacy law* Yes No 67.1 Will the source of the information about participants used in this research project be collected directly from the participant? (e.g. asking participants directly about their medical history)* Yes No 67.2 Will the source of the information about participants used in this research project be collected from another person about the participant? (e.g. asking participants' doctors about their patients medical history)* Yes No 67.3 Will the source of the information about participants involve the use or disclosure of information by an agency, authority or organization (other than UniSA)? (e.g. 
accessing participants' medical records)* Yes No 67.4 Will the source of the information about participants involve the use of information which you or your organisation Collected previously for a purpose other than this research project?* Yes No 67.5 Describe how information collected about participants will be used in this project.* data will be de-identified immediately as the identifying information poses no use for the study. 67.6 Indicate whichever of the following applies to this project: Please select all that apply.* Information collected for, used in, or generated by, this project will not be used for any other purpose. Information collected for, used in, or generated by, this project will/may be used for another purpose by the researcher for which ethical approval will be sought. Information collected for, used in, or generated by, this project is intended to be used for establishing a database/data collection/register for future use by the researcher for which ethical approval will be sought. Information collected for, used in, or generated by, this project will/may be made available to a third party for a subsequent use or which ethical approval will be sought Other 68.1 Is there an existing relationship or one likely to arise during the research, between the potential participants and any member of the research team or an organisation involved in the research?* Yes No 29 68.2 Does the researcher / investigator have another role in relation to the participant?* Yes No 68.3 Will the research impact upon, or change, an existing relationship between participants and researcher / investigator or organisations.?* Yes No 69.1 Will consent for participation in this research be sought from all participants? Refer to the National Statement* Yes No 69.1.1 Explain why consent will not be sought from all participants.* there are no participants. 
70.1 Describe the consent process, ie how participants or those deciding for them will be informed about, and choose whether or not to participate in, the project.* no participants

70.2 If a participant or person on behalf of a participant chooses not to participate, are there specific consequences of which they should be made aware, prior to making this decision?* Yes No

70.3 If a participant or person on behalf of a participant chooses to withdraw from the research, are there specific consequences of which they should be made aware, prior to giving consent?* Yes No

70.4 Can individual participants be identifiable by other members of their group? (e.g. co-workers, focus group members etc.)* Yes No

70.7 Will consent be specific or extended or unspecified? Refer to section 2.2.14-2.2.18 of the National Statement* Specific Extended Unspecific

Please note that when answering the following questions, only risks beyond those encountered in everyday life are relevant. Refer to the National Statement

71.1 Are there any risks to participants as a result of participation in this research project (eg physical, psychological, spiritual, emotional, legal, social, financial well-being, employability or professional relationships)?* Yes No

71.2 What expected benefits (if any) will this research have for the wider community?*
-provide an insight into how specific information must be to remain useful, i.e. in a hierarchical structure, how far up the hierarchy can we go before the paths that are mined are too vague?
-allow for future work which can autonomously detect the most useful level of information for a specific purpose. This could result in more accurate prescriptions, and more successful treatments.
71.3 What expected benefits (if any) will this research have for participants?*
Data being de-identified means there will be no personal benefit.

71.4 Are there any other risks involved in this research? e.g. to the research team, the organisation, others (e.g. physical, psychological, spiritual, emotional, legal, social, financial well-being, employability or professional relationships)*
Yes No

72.1 Is it anticipated that the research will lead to commercial benefit for the investigator(s) and or the research sponsor(s)?*
Yes No

72.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether their physical, psychological, spiritual, emotional, legal, social or financial well-being, or to their employability or professional relationships - or to their communities?*
Yes No

72.3 Describe how the researchers / investigators intend to monitor the conduct and progress of the research project?*
Data will be de-identified at first opportunity. There will be no opportunities for misconduct as long as the data is successfully de-identified.

72.4 It is mandatory for researchers to report suspected cases of child abuse/neglect, domestic violence, bullying, illegal activities, use of illicit substances, abuse of elderly persons, professional negligence etc.

72.4.1 Is it likely that this will be disclosed during the course?*
Yes No

73.1 List the relevant qualifications, experiences and /or skills of the research team which equip them to conduct this research*
3 years study at UniSA learning ethical conduct.
Minor experience in health informatics with research placement and ongoing work with minor thesis.

73.2 Do the researchers involved in this research project require any additional training in order to undertake this research?*
Yes No

74.1 Is it intended that results of the research that relate to a specific participant be reported to that participant?*
Yes No Not Applicable

74.2 Is the research likely to produce information of personal significance to individual participants?*
Yes No

74.3 Will individual participant's results be recorded with their personal records?*
Yes No Not Applicable

74.4 Is it intended that all or some of the results that relate to a specific participant be reported to anyone other than that participant?*
Yes No

74.5 Will research participants have the opportunity to receive a copy of your final report or summary of the findings if they wish?*
Yes No

74.5.2 Why will participants not be provided with a copy of the final report or summary of the findings?*
There are no participants.

75.1 Is the research likely to reveal a significant risk to the health or well being of persons other than the participant (e.g. family members, colleagues)?*
Yes No

75.2 Is there a risk that the dissemination of results could cause harm of any kind to individual participants - whether their physical, psychological, spiritual, emotional, social or financial well-being, or to their employability or professional relationships - or to their communities?*
Yes No

75.3 How is it intended to disseminate the results of the research? Please select all that apply.*
Thesis/dissertation; Journal article/s; Research paper; Conference presentation; Commissioned report; Other

75.4 Will the confidentiality of participants and their data be protected in the dissemination of research results?*
Yes No Not Applicable

75.4.1 Explain how confidentiality of participants and their data will be protected in the dissemination of research results*
De-identified data will be used, and so no confidential information will be revealed in the dissemination.

76.1 Provide details of the anticipated duration of the data collection / human research phase of the project.*
Simply obtain some data from previous researcher and/or database. Collection should take no longer than 1-2 days.

76.2 Has the research proposal, including design, methodology and evaluation undergone, or will it undergo, a peer review process?*
Yes No

Declaration

The Primary Contact for this project is responsible for the application that is submitted and must be the one to agree to the following statement.

"On behalf of the research team for this project, I confirm that all members of the research have read the current NHMRC National Statement on Ethical Conduct in Human Research.
The research team accepts responsibility for the ethical and appropriate conduct of the procedures detailed in this application, confirm that the research team will conduct this project in accordance with the principles described in the National Statement, and confirm that the research team will comply with any other condition laid down by the University of South Australia's Human Research Ethics Committee."*
I agree

Appendix B – Results

LEVEL 5 : 2 ITEM PATTERNS

PATTERN: Metformin (A10BA02) -> Metformin (A10BA02)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 63) (4, 16) (13, 9) (13, 15) (13, 30)

PATTERN: Metformin (A10BA02) -> Gliclazide (A10BB09)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 64) (4, 1) (4, 3) (13, 45) (13, 49)

PATTERN: Gliclazide (A10BB09) -> Metformin (A10BA02)
PATTERN OCCURS 7 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 2) (4, 4) (4, 9) (4, 12) (4, 22) (13, 25) (13, 34)

PATTERN: Prazosin (C02CA01) -> Verapamil (C08DA01)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (11, 7) (11, 48) (11, 54) (11, 66) (11, 79) (11, 89)

PATTERN: Verapamil (C08DA01) -> Metformin (A10BA02)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 39) (4, 41) (11, 13) (11, 19) (11, 24) (11, 36) (11, 42) (11, 55) (11, 67)

PATTERN: Atorvastatin (C10AA05) -> Atorvastatin (C10AA05)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 18) (10, 8) (10, 12) (10, 20) (10, 47) (11, 51)

PATTERN: Paracetamol (N02BE01) -> Paracetamol (N02BE01)
PATTERN OCCURS 9 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 20) (3, 38) (3, 41) (3, 42) (3, 43) (3, 53) (3, 59) (3, 68) (3, 96)

PATTERN: Temazepam (N05CD07) -> Paracetamol (N02BE01)
PATTERN OCCURS 8 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 49) (3, 52) (3, 56) (3, 58) (3, 65) (3, 67) (3, 80) (3, 100)

--total of 55 patterns found.
total of 8 unique patterns.
the largest single pattern occurred 9 times.
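The lower levels reported in the rest of this appendix (Level 4 down to Level 1) list the same drug items under coarser ATC codes. As a minimal sketch, assuming generalisation is a simple truncation of the ATC code to the standard prefix length for each level (1, 3, 4, 5 and 7 characters), the mapping could look like the following. The names `ATC_PREFIX_LENGTH` and `generalise` are hypothetical and are not taken from the thesis implementation; treating semicolon-joined codes part by part is also an assumption, suggested by items such as C;C in the Level 1 output.

```python
# Number of leading characters kept at each ATC level (level 5 is the full code).
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def generalise(code: str, level: int) -> str:
    """Truncate an ATC code, or a ';'-joined group of codes, to the given level."""
    length = ATC_PREFIX_LENGTH[level]
    # Assumption: each part of a multi-code item is truncated independently.
    return ";".join(part[:length] for part in code.split(";"))


# Level 5 codes taken from the patterns listed above.
sequence = ["A10BA02", "A10BB09", "C08DA01"]
for level in range(5, 0, -1):
    print(level, [generalise(code, level) for code in sequence])
```

Under this assumption Verapamil (C08DA01) becomes C08DA at Level 4, C08D at Level 3, C08 at Level 2 and C at Level 1, which agrees with the item codes that appear in the corresponding blocks of this appendix.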
---

LEVEL 4 : 2 ITEM PATTERNS

PATTERN: Biguanides (A10BA) -> Biguanides (A10BA)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 63) (4, 16) (13, 9) (13, 15) (13, 30)

PATTERN: Biguanides (A10BA) -> Sulfonamides, urea derivatives (A10BB)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 64) (4, 1) (4, 3) (13, 45) (13, 49)

PATTERN: Sulfonamides, urea derivatives (A10BB) -> Biguanides (A10BA)
PATTERN OCCURS 7 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 2) (4, 4) (4, 9) (4, 12) (4, 22) (13, 25) (13, 34)

PATTERN: Alpha-adrenoreceptor antagonists (C02CA) -> Phenylalkylamine derivatives (C08DA)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (11, 7) (11, 48) (11, 54) (11, 66) (11, 79) (11, 89)

PATTERN: Sulfonamides, plain (C03BA) -> ACE inhibitors, plain (C09AA)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 28) (9, 9) (9, 17) (10, 41)

PATTERN: Phenylalkylamine derivatives (C08DA) -> Biguanides (A10BA)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 39) (4, 41) (11, 13) (11, 19) (11, 24) (11, 36) (11, 42) (11, 55) (11, 67)

PATTERN: ACE inhibitors, plain (C09AA) -> ACE inhibitors, plain (C09AA)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 6) (4, 7) (9, 4) (10, 2) (10, 22)

PATTERN: ACE inhibitors, plain (C09AA) -> HMG CoA reductase inhibitors (C10AA)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (9, 5) (9, 7) (9, 10) (9, 18) (10, 7) (10, 11) (10, 19) (10, 23) (10, 29)

PATTERN: HMG CoA reductase inhibitors (C10AA) -> HMG CoA reductase inhibitors (C10AA)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 18) (10, 8) (10, 12) (10, 20) (10, 47) (11, 51)

PATTERN: Anilides (N02BE) -> Anilides (N02BE)
PATTERN OCCURS 9 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 20) (3, 38) (3, 41) (3, 42) (3, 43) (3, 53) (3, 59) (3, 68) (3, 96)

PATTERN: Benzodiazepine derivatives (N05CD) -> Anilides (N02BE)
PATTERN OCCURS 8 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 49) (3, 52) (3, 56) (3, 58) (3, 65) (3, 67) (3, 80) (3, 100)

--total of 73 patterns found.
total of 11 unique patterns.
the largest single pattern occurred 9 times.

---

LEVEL 3 : 2 ITEM PATTERNS

PATTERN: ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B)
PATTERN OCCURS 19 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 63) (1, 64) (4, 1) (4, 2) (4, 3) (4, 4) (4, 9) (4, 12) (4, 15) (4, 16) (4, 17) (4, 22) (13, 9) (13, 15) (13, 25) (13, 30) (13, 34) (13, 45) (13, 49)

PATTERN: ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> SELECTIVE CALCIUM CHANNEL BLOCKERS WITH MAINLY VASCULAR EFFECTS (C08C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (13, 1) (13, 23) (13, 26) (13, 35) (13, 42) (13, 46)

PATTERN: ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> ACE INHIBITORS, PLAIN (C09A)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (4, 5) (4, 10) (4, 13) (4, 23) (4, 29) (4, 31)

PATTERN: ANTIADRENERGIC AGENTS, PERIPHERALLY ACTING (C02C) -> SELECTIVE CALCIUM CHANNEL BLOCKERS WITH DIRECT CARDIAC EFFECTS (C08D)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (11, 7) (11, 48) (11, 54) (11, 66) (11, 79) (11, 89)

PATTERN: LOW-CEILING DIURETICS, EXCL. THIAZIDES (C03B) -> ACE INHIBITORS, PLAIN (C09A)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 28) (9, 9) (9, 17) (10, 41)

PATTERN: SELECTIVE CALCIUM CHANNEL BLOCKERS WITH DIRECT CARDIAC EFFECTS (C08D) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 39) (4, 41) (11, 13) (11, 19) (11, 24) (11, 36) (11, 42) (11, 55) (11, 67)

PATTERN: ACE INHIBITORS, PLAIN (C09A) -> ACE INHIBITORS, PLAIN (C09A)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 6) (4, 7) (9, 4) (10, 2) (10, 22)

PATTERN: ACE INHIBITORS, PLAIN (C09A) -> CHOLESTEROL AND TRIGLYCERIDE REDUCERS (C10A)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (9, 5) (9, 7) (9, 10) (9, 18) (10, 7) (10, 11) (10, 19) (10, 23) (10, 29)

PATTERN: CHOLESTEROL AND TRIGLYCERIDE REDUCERS (C10A) -> CHOLESTEROL AND TRIGLYCERIDE REDUCERS (C10A)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 18) (10, 8) (10, 12) (10, 20) (10, 47) (11, 51)

PATTERN: CHOLESTEROL AND TRIGLYCERIDE REDUCERS (C10A) -> ANTIGOUT PREPARATIONS (M04A)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (10, 9) (10, 13) (10, 24) (10, 30) (10, 39) (10, 48)

PATTERN: VIRAL VACCINES (J07B) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 67) (4, 21) (4, 28) (4, 35) (13, 29)

PATTERN: OTHER ANALGESICS AND ANTIPYRETICS (N02B) -> OTHER ANALGESICS AND ANTIPYRETICS (N02B)
PATTERN OCCURS 10 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 20) (3, 38) (3, 41) (3, 42) (3, 43) (3, 47) (3, 53) (3, 59) (3, 68) (3, 96)

PATTERN: OTHER ANALGESICS AND ANTIPYRETICS (N02B) -> HYPNOTICS AND SEDATIVES (N05C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 48) (3, 50) (3, 57) (3, 66) (3, 78) (3, 89)

PATTERN: HYPNOTICS AND SEDATIVES (N05C) -> OTHER ANALGESICS AND ANTIPYRETICS (N02B)
PATTERN OCCURS 9 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 49) (3, 52) (3, 56) (3, 58) (3, 65) (3, 67) (3, 80) (3, 90) (3, 100)

PATTERN: OTHER DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES, INHALANTS (R03B) -> ADRENERGICS, INHALANTS (R03A)
PATTERN OCCURS 3 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (7, 15) (10, 45) (11, 59)

--total of 109 patterns found.
total of 15 unique patterns.
the largest single pattern occurred 19 times.

---

LEVEL 3 : 3 ITEM PATTERNS

PATTERN: ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B) -> ORAL BLOOD GLUCOSE LOWERING DRUGS (A10B)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (1, 63) (4, 1) (4, 2) (4, 3) (4, 15) (4, 16)

--total of 6 patterns found.
total of 1 unique patterns.
the largest single pattern occurred 6 times.

---

LEVEL 2 : 2 ITEM PATTERNS

PATTERN: DRUGS USED IN DIABETES (A10) -> DRUGS USED IN DIABETES (A10)
PATTERN OCCURS 19 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 63) (1, 64) (4, 1) (4, 2) (4, 3) (4, 4) (4, 9) (4, 12) (4, 15) (4, 16) (4, 17) (4, 22) (13, 9) (13, 15) (13, 25) (13, 30) (13, 34) (13, 45) (13, 49)

PATTERN: DRUGS USED IN DIABETES (A10) -> CALCIUM CHANNEL BLOCKERS (C08)
PATTERN OCCURS 7 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 40) (13, 1) (13, 23) (13, 26) (13, 35) (13, 42) (13, 46)

PATTERN: DRUGS USED IN DIABETES (A10) -> AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09)
PATTERN OCCURS 7 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (4, 5) (4, 10) (4, 13) (4, 23) (4, 29) (4, 31) (4, 42)

PATTERN: DRUGS USED IN DIABETES (A10) -> VACCINES (J07)
PATTERN OCCURS 6 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (4, 27) (4, 34) (8, 8) (9, 20) (13, 12) (13, 38)

PATTERN: ANTIHYPERTENSIVES (C02) -> CALCIUM CHANNEL BLOCKERS (C08)
PATTERN OCCURS 10 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (11, 7) (11, 15) (11, 21) (11, 26) (11, 34) (11, 48) (11, 54) (11, 66) (11, 79) (11, 89)

PATTERN: DIURETICS (C03) -> AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 28) (9, 9) (9, 17) (10, 41)

PATTERN: CALCIUM CHANNEL BLOCKERS (C08) -> DRUGS USED IN DIABETES (A10)
PATTERN OCCURS 11 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 39) (4, 41) (11, 13) (11, 19) (11, 24) (11, 36) (11, 42) (11, 55) (11, 67) (13, 24) (13, 33)

PATTERN: CALCIUM CHANNEL BLOCKERS (C08) -> CALCIUM CHANNEL BLOCKERS (C08)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 2) (5, 1) (5, 2) (5, 5) (5, 12) (11, 35)

PATTERN: AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09) -> DRUGS USED IN DIABETES (A10)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (1, 49) (4, 8) (4, 11) (4, 14) (4, 26) (4, 30)

PATTERN: AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09) -> AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 6) (4, 7) (9, 4) (10, 2) (10, 22)

PATTERN: AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM (C09) -> SERUM LIPID REDUCING AGENTS (C10)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (9, 5) (9, 7) (9, 10) (9, 18) (10, 7) (10, 11) (10, 19) (10, 23) (10, 29)

PATTERN: SERUM LIPID REDUCING AGENTS (C10) -> SERUM LIPID REDUCING AGENTS (C10)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 18) (10, 8) (10, 12) (10, 20) (10, 47) (11, 51)

PATTERN: SERUM LIPID REDUCING AGENTS (C10) -> ANTIGOUT PREPARATIONS (M04)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (10, 9) (10, 13) (10, 24) (10, 30) (10, 39) (10, 48)

PATTERN: ANTIBACTERIALS FOR SYSTEMIC USE (J01) -> BETA BLOCKING AGENTS (C07)
PATTERN OCCURS 3 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 61) (3, 28) (6, 46)

PATTERN: ANTIBACTERIALS FOR SYSTEMIC USE (J01) -> ANALGESICS (N02)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 9) (3, 40) (5, 7) (11, 4) (11, 46)

PATTERN: VACCINES (J07) -> DRUGS USED IN DIABETES (A10)
PATTERN OCCURS 6 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 67) (4, 21) (4, 28) (4, 35) (8, 9) (13, 29)

PATTERN: ANALGESICS (N02) -> ANALGESICS (N02)
PATTERN OCCURS 11 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 20) (3, 38) (3, 41) (3, 42) (3, 43) (3, 47) (3, 53) (3, 59) (3, 68) (3, 69) (3, 96)

PATTERN: ANALGESICS (N02) -> PSYCHOLEPTICS (N05)
PATTERN OCCURS 8 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 4) (3, 13) (3, 48) (3, 50) (3, 57) (3, 66) (3, 78) (3, 89)

PATTERN: PSYCHOLEPTICS (N05) -> ANALGESICS (N02)
PATTERN OCCURS 10 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 49) (3, 52) (3, 56) (3, 58) (3, 65) (3, 67) (3, 80) (3, 88) (3, 90) (3, 100)

PATTERN: DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES (R03) -> DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES (R03)
PATTERN OCCURS 7 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (6, 21) (7, 7) (7, 15) (7, 16) (10, 44) (10, 45) (11, 59)

--total of 152 patterns found.
total of 20 unique patterns.
the largest single pattern occurred 19 times.

---

LEVEL 2 : 3 ITEM PATTERNS

PATTERN: DRUGS USED IN DIABETES (A10) -> DRUGS USED IN DIABETES (A10) -> DRUGS USED IN DIABETES (A10)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (1, 63) (4, 1) (4, 2) (4, 3) (4, 15) (4, 16)

--total of 6 patterns found.
total of 1 unique patterns.
the largest single pattern occurred 6 times.
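The Level 2 listings above suggest how a longer pattern relates to the shorter one it extends. For a pattern that repeats a single item, the 3 item occurrences appear to be exactly those 2 item occurrences whose next position is also a 2 item occurrence for the same patient. The sketch below checks this on the DRUGS USED IN DIABETES (A10) listings. It assumes each pair is (patient, starting position) and that pattern items are consecutive entries in a patient's sequence; both are inferences from the listings rather than statements from this appendix, and `extend_self_pattern` is a hypothetical helper name.

```python
# Occurrences of A10 -> A10, copied from the Level 2 : 2 item block above.
two_item = [
    (1, 63), (1, 64), (4, 1), (4, 2), (4, 3), (4, 4), (4, 9), (4, 12),
    (4, 15), (4, 16), (4, 17), (4, 22), (13, 9), (13, 15), (13, 25),
    (13, 30), (13, 34), (13, 45), (13, 49),
]

# Occurrences of A10 -> A10 -> A10, copied from the Level 2 : 3 item block above.
three_item_reported = [(1, 63), (4, 1), (4, 2), (4, 3), (4, 15), (4, 16)]


def extend_self_pattern(occurrences):
    """Keep each start whose following position also starts the same pattern."""
    seen = set(occurrences)
    return [
        (patient, position)
        for patient, position in occurrences
        if (patient, position + 1) in seen
    ]


three_item_derived = extend_self_pattern(two_item)
print(three_item_derived == three_item_reported)
```

The same check only applies to patterns made of one repeated item; mixed patterns would need the occurrences of two different 2 item patterns to be joined instead.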
---

LEVEL 1 : 2 ITEM PATTERNS

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 22 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 39) (1, 56) (1, 59) (1, 63) (1, 64) (4, 1) (4, 2) (4, 3) (4, 4) (4, 9) (4, 12) (4, 15) (4, 16) (4, 17) (4, 22) (13, 9) (13, 15) (13, 25) (13, 30) (13, 34) (13, 45) (13, 49)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 33 TIMES IN TOTAL OF 6 PATIENTS
OCCURENCES: (1, 18) (1, 26) (1, 34) (1, 40) (1, 57) (1, 68) (3, 82) (3, 86) (4, 5) (4, 10) (4, 13) (4, 23) (4, 29) (4, 31) (4, 40) (4, 42) (11, 6) (11, 14) (11, 20) (11, 25) (11, 77) (11, 87) (12, 6) (12, 45) (13, 1) (13, 10) (13, 16) (13, 19) (13, 23) (13, 26) (13, 35) (13, 42) (13, 46)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> C;C (C;C)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 18) (8, 10) (13, 21) (13, 31)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 8) (11, 68) (11, 72) (12, 10)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 12 TIMES IN TOTAL OF 7 PATIENTS
OCCURENCES: (1, 42) (1, 52) (1, 60) (4, 27) (4, 34) (4, 36) (8, 8) (9, 20) (11, 3) (12, 3) (13, 12) (13, 38)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 8 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (1, 11) (1, 22) (1, 50) (3, 71) (3, 74) (3, 77) (3, 84) (3, 99)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 34 TIMES IN TOTAL OF 8 PATIENTS
OCCURENCES: (1, 19) (1, 41) (1, 49) (1, 58) (1, 62) (3, 73) (3, 83) (3, 98) (4, 8) (4, 11) (4, 14) (4, 26) (4, 30) (4, 39) (4, 41) (7, 13) (9, 19) (11, 2) (11, 13) (11, 19) (11, 24) (11, 36) (11, 42) (11, 55) (11, 67) (12, 9) (12, 46) (13, 8) (13, 11) (13, 14) (13, 18) (13, 20) (13, 24) (13, 33)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 70 TIMES IN TOTAL OF 9 PATIENTS
OCCURENCES: (1, 13) (1, 27) (1, 28) (3, 2) (3, 22) (3, 25) (3, 31) (3, 32) (3, 33) (3, 34) (3, 35) (3, 36) (3, 45) (3, 94) (4, 6) (4, 7) (5, 1) (5, 2) (5, 5) (5, 12) (5, 17) (5, 18) (7, 9) (9, 3) (9, 4) (9, 5) (9, 6) (9, 7) (9, 8) (9, 9) (9, 10) (9, 14) (9, 17) (9, 18) (10, 2) (10, 7) (10, 8) (10, 11) (10, 12) (10, 19) (10, 20) (10, 21) (10, 22) (10, 23) (10, 28) (10, 29) (10, 37) (10, 38) (10, 41) (10, 47) (11, 1) (11, 7) (11, 8) (11, 15) (11, 21) (11, 22) (11, 23) (11, 26) (11, 34) (11, 35) (11, 48) (11, 51) (11, 54) (11, 66) (11, 78) (11, 79) (11, 88) (11, 89) (13, 7) (13, 17)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> C;C (C;C)
PATTERN OCCURS 8 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (11, 49) (11, 52) (11, 64) (13, 27) (13, 36) (13, 40) (13, 43) (13, 47)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H)
PATTERN OCCURS 5 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 14) (3, 61) (5, 13) (5, 19) (11, 90)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 12 TIMES IN TOTAL OF 5 PATIENTS
OCCURENCES: (4, 24) (4, 32) (4, 43) (5, 3) (5, 6) (5, 9) (5, 15) (6, 2) (6, 5) (6, 47) (9, 1) (12, 37)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> MUSCULO-SKELETAL SYSTEM (M)
PATTERN OCCURS 15 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (7, 4) (10, 5) (10, 9) (10, 13) (10, 17) (10, 24) (10, 30) (10, 35) (10, 39) (10, 42) (10, 48) (11, 27) (11, 32) (11, 80) (12, 15)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 19 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 29) (3, 3) (3, 6) (3, 12) (3, 15) (3, 17) (3, 19) (3, 23) (3, 26) (3, 29) (3, 37) (3, 46) (3, 64) (3, 87) (3, 92) (3, 95) (8, 2) (11, 9) (11, 16)

PATTERN: DERMATOLOGICALS (D) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 3 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 1) (10, 16) (12, 42)

PATTERN: GENITO URINARY SYSTEM AND SEX HORMONES (G) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (9, 16) (12, 8) (12, 14) (13, 6)

PATTERN: G;G (G;G) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 6 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (1, 10) (1, 17) (1, 21) (1, 25) (1, 33) (1, 55)

PATTERN: SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (6, 8) (11, 69) (11, 73) (12, 24)

PATTERN: SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 9 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 55) (3, 62) (6, 10) (6, 17) (6, 23) (6, 28) (6, 43) (12, 11) (12, 35)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 10 TIMES IN TOTAL OF 6 PATIENTS
OCCURENCES: (1, 67) (3, 76) (4, 21) (4, 28) (4, 33) (4, 35) (4, 44) (8, 9) (12, 27) (13, 29)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 13 TIMES IN TOTAL OF 7 PATIENTS
OCCURENCES: (1, 3) (1, 61) (3, 11) (3, 28) (4, 25) (4, 38) (5, 4) (5, 16) (6, 46) (9, 2) (9, 21) (13, 13) (13, 39)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H)
PATTERN OCCURS 7 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 10) (6, 3) (6, 7) (6, 9) (11, 29) (11, 70) (11, 74)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 7 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 31) (1, 53) (4, 20) (4, 37) (6, 6) (12, 25) (12, 26)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 10 TIMES IN TOTAL OF 5 PATIENTS
OCCURENCES: (1, 43) (3, 9) (3, 40) (5, 7) (6, 14) (6, 19) (6, 33) (11, 4) (11, 46) (11, 83)

PATTERN: MUSCULO-SKELETAL SYSTEM (M) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 15 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (7, 5) (7, 12) (10, 1) (10, 6) (10, 10) (10, 18) (10, 27) (10, 34) (10, 36) (10, 40) (10, 49) (11, 12) (11, 33) (12, 16) (12, 29)

PATTERN: MUSCULO-SKELETAL SYSTEM (M) -> MUSCULO-SKELETAL SYSTEM (M)
PATTERN OCCURS 10 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 6) (2, 1) (2, 2) (10, 14) (10, 25) (10, 26) (10, 31) (10, 32) (10, 33) (11, 11)

PATTERN: NERVOUS SYSTEM (N) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 7 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 38) (1, 44) (1, 51) (3, 70) (3, 81) (3, 85) (11, 5)

PATTERN: NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 22 TIMES IN TOTAL OF 7 PATIENTS
OCCURENCES: (1, 12) (1, 23) (3, 5) (3, 14) (3, 16) (3, 18) (3, 21) (3, 24) (3, 30) (3, 44) (3, 60) (3, 63) (3, 72) (3, 91) (3, 93) (3, 97) (5, 8) (6, 1) (8, 1) (8, 3) (11, 47) (12, 36)

PATTERN: NERVOUS SYSTEM (N) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 11 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 30) (3, 10) (3, 27) (3, 39) (3, 75) (6, 13) (6, 18) (6, 29) (6, 32) (6, 39) (6, 45)

PATTERN: NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 34 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 4) (3, 13) (3, 20) (3, 38) (3, 41) (3, 42) (3, 43) (3, 47) (3, 48) (3, 49) (3, 50) (3, 51) (3, 52) (3, 53) (3, 56) (3, 57) (3, 58) (3, 59) (3, 65) (3, 66) (3, 67) (3, 68) (3, 69) (3, 78) (3, 79) (3, 80) (3, 88) (3, 89) (3, 90) (3, 96) (3, 100) (6, 24) (6, 44) (12, 12)

PATTERN: RESPIRATORY SYSTEM (R) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (7, 1) (7, 3) (7, 8) (10, 46) (11, 41)

PATTERN: RESPIRATORY SYSTEM (R) -> RESPIRATORY SYSTEM (R)
PATTERN OCCURS 7 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (6, 21) (7, 7) (7, 15) (7, 16) (10, 44) (10, 45) (11, 59)

--total of 430 patterns found.
total of 31 unique patterns.
the largest single pattern occurred 70 times.

---

LEVEL 1 : 3 ITEM PATTERNS

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (1, 63) (4, 1) (4, 2) (4, 3) (4, 15) (4, 16)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 11 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 39) (1, 56) (4, 4) (4, 9) (4, 12) (4, 22) (13, 9) (13, 15) (13, 25) (13, 34) (13, 45)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 12 TIMES IN TOTAL OF 5 PATIENTS
OCCURENCES: (1, 18) (1, 40) (1, 57) (3, 82) (4, 10) (4, 13) (4, 29) (4, 40) (12, 45) (13, 10) (13, 19) (13, 23)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 9 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (1, 26) (4, 5) (11, 6) (11, 14) (11, 20) (11, 25) (11, 77) (11, 87) (13, 16)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ANTIINFECTIVES FOR SYSTEMIC USE (J) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 60) (9, 20) (13, 12) (13, 38)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 9 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 58) (1, 62) (4, 8) (4, 11) (4, 14) (13, 8) (13, 14) (13, 24) (13, 33)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 7 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 30) (4, 39) (4, 41) (11, 13) (11, 19) (11, 24) (13, 18)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 5 TIMES IN TOTAL OF 5 PATIENTS
OCCURENCES: (1, 41) (4, 26) (9, 19) (11, 2) (13, 11)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 9 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (4, 7) (9, 18) (11, 1) (11, 23) (11, 35) (11, 54) (11, 66) (13, 7) (13, 17)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 31 TIMES IN TOTAL OF 7 PATIENTS
OCCURENCES: (1, 27) (3, 31) (3, 32) (3, 33) (3, 34) (3, 35) (4, 6) (5, 1) (5, 17) (9, 3) (9, 4) (9, 5) (9, 6) (9, 7) (9, 8) (9, 9) (9, 17) (10, 7) (10, 11) (10, 19) (10, 20) (10, 21) (10, 22) (10, 28) (10, 37) (11, 7) (11, 21) (11, 22) (11, 34) (11, 78) (11, 88)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 13) (5, 12) (5, 18) (11, 89)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> MUSCULO-SKELETAL SYSTEM (M)
PATTERN OCCURS 9 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (10, 8) (10, 12) (10, 23) (10, 29) (10, 38) (10, 41) (10, 47) (11, 26) (11, 79)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 9 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 28) (3, 2) (3, 22) (3, 25) (3, 36) (3, 45) (3, 94) (11, 8) (11, 15)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ANTIINFECTIVES FOR SYSTEMIC USE (J) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 24) (5, 3) (5, 15) (9, 1)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> MUSCULO-SKELETAL SYSTEM (M) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 9 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (7, 4) (10, 5) (10, 9) (10, 17) (10, 35) (10, 39) (10, 48) (11, 32) (12, 15)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (3, 15) (3, 17) (3, 23) (3, 29) (3, 92) (8, 2)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 8 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 3) (3, 12) (3, 19) (3, 37) (3, 46) (3, 64) (3, 87) (3, 95)

PATTERN: SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (H) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 55) (6, 23) (6, 43) (12, 11)

PATTERN: ANTIINFECTIVES FOR SYSTEMIC USE (J) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 61) (4, 25) (4, 38) (13, 13)

PATTERN: MUSCULO-SKELETAL SYSTEM (M) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 8 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (10, 1) (10, 6) (10, 10) (10, 18) (10, 27) (10, 36) (10, 40) (11, 33)

PATTERN: NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 7 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 12) (3, 21) (3, 24) (3, 30) (3, 44) (3, 93) (11, 47)

PATTERN: NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C) -> ANTIINFECTIVES FOR SYSTEMIC USE (J)
PATTERN OCCURS 3 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (5, 8) (6, 1) (12, 36)

PATTERN: NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 7 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (3, 5) (3, 14) (3, 16) (3, 18) (3, 63) (3, 91) (8, 1)

PATTERN: NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 7 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 4) (3, 13) (3, 20) (3, 43) (3, 59) (3, 90) (3, 96)

PATTERN: NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 19 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 41) (3, 42) (3, 47) (3, 48) (3, 49) (3, 50) (3, 51) (3, 52) (3, 56) (3, 57) (3, 58) (3, 65) (3, 66) (3, 67) (3, 68) (3, 78) (3, 79) (3, 88) (3, 89)

--total of 211 patterns found.
total of 25 unique patterns.
the largest single pattern occurred 31 times.
---

LEVEL 1 : 4 ITEM PATTERNS

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 5 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 39) (1, 56) (4, 9) (4, 12) (13, 9)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 57) (4, 10) (4, 13) (13, 23)

PATTERN: ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 26) (4, 5) (11, 6) (11, 20) (11, 77) (11, 87)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A) -> ALIMENTARY TRACT AND METABOLISM (A) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (4, 8) (4, 11) (13, 8) (13, 14) (13, 24) (13, 33)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> ALIMENTARY TRACT AND METABOLISM (A)
PATTERN OCCURS 4 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (4, 6) (9, 17) (11, 22) (11, 34)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 14 TIMES IN TOTAL OF 4 PATIENTS
OCCURENCES: (3, 31) (3, 32) (3, 33) (3, 34) (9, 3) (9, 4) (9, 5) (9, 6) (9, 7) (9, 8) (10, 19) (10, 20) (10, 21) (11, 21)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> MUSCULO-SKELETAL SYSTEM (M)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (10, 7) (10, 11) (10, 22) (10, 28) (10, 37) (11, 78)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 3 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (1, 27) (3, 35) (11, 7)

PATTERN: CARDIOVASCULAR SYSTEM (C) -> MUSCULO-SKELETAL SYSTEM (M) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (10, 5) (10, 9) (10, 17) (10, 35) (10, 39) (11, 32)

PATTERN: MUSCULO-SKELETAL SYSTEM (M) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 6 TIMES IN TOTAL OF 2 PATIENTS
OCCURENCES: (10, 6) (10, 10) (10, 18) (10, 27) (10, 36) (11, 33)

PATTERN: NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 13 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 41) (3, 47) (3, 48) (3, 49) (3, 50) (3, 51) (3, 56) (3, 57) (3, 65) (3, 66) (3, 67) (3, 78) (3, 88)

--total of 73 patterns found.
total of 11 unique patterns.
the largest single pattern occurred 14 times.

---

LEVEL 1 : 5 ITEM PATTERNS

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 10 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 31) (3, 32) (3, 33) (9, 3) (9, 4) (9, 5) (9, 6) (9, 7) (10, 19) (10, 20)

PATTERN: NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N) -> NERVOUS SYSTEM (N)
PATTERN OCCURS 7 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 47) (3, 48) (3, 49) (3, 50) (3, 56) (3, 65) (3, 66)

--total of 17 patterns found.
total of 2 unique patterns.
the largest single pattern occurred 10 times.

---

LEVEL 1 : 6 ITEM PATTERNS

PATTERN: CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C) -> CARDIOVASCULAR SYSTEM (C)
PATTERN OCCURS 7 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 31) (3, 32) (9, 3) (9, 4) (9, 5) (9, 6) (10, 19)

--total of 7 patterns found.
total of 1 unique patterns.
the largest single pattern occurred 7 times.

---
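The three summary figures that close each block in this appendix can be recomputed from the block itself: the total is the sum of the per-pattern occurrence counts, the unique count is the number of patterns, and the largest is the maximum count. The sketch below parses the two listings of the LEVEL 1 : 5 ITEM PATTERNS block and rebuilds its summary. It is not the thesis code; the regular expressions and the `summarise` function are hypothetical, and reading the first number of each pair as the patient identifier is an assumption that the code checks against the stated patient totals.

```python
import re

# Two listings copied from the "LEVEL 1 : 5 ITEM PATTERNS" block
# (the "PATTERN:" name lines are omitted to keep the sample short).
SAMPLE = """\
PATTERN OCCURS 10 TIMES IN TOTAL OF 3 PATIENTS
OCCURENCES: (3, 31) (3, 32) (3, 33) (9, 3) (9, 4) (9, 5) (9, 6) (9, 7) (10, 19) (10, 20)
PATTERN OCCURS 7 TIMES IN TOTAL OF 1 PATIENTS
OCCURENCES: (3, 47) (3, 48) (3, 49) (3, 50) (3, 56) (3, 65) (3, 66)
"""

HEADER = re.compile(r"PATTERN OCCURS (\d+) TIMES IN TOTAL OF (\d+) PATIENTS")
LISTING = re.compile(r"OCCURENCES:((?: \(\d+, \d+\))+)")
PAIR = re.compile(r"\((\d+), (\d+)\)")


def summarise(block):
    """Recompute the summary figures of one block of mining output."""
    headers = HEADER.findall(block)
    listings = LISTING.findall(block)
    counts = []
    for (times, patients), listing in zip(headers, listings):
        pairs = PAIR.findall(listing)
        # The stated count should equal the number of listed pairs.
        if len(pairs) != int(times):
            raise ValueError("occurrence count does not match header")
        # Assumption: the first number of each pair identifies the patient.
        if len({patient for patient, _ in pairs}) != int(patients):
            raise ValueError("patient count does not match header")
        counts.append(len(pairs))
    return {"total": sum(counts), "unique": len(counts), "largest": max(counts)}


print(summarise(SAMPLE))
```

For this sample the function returns a total of 17, 2 unique patterns and a largest count of 10, the same figures as the summary lines of that block.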