Continuity Equations in Continuous Auditing: Detecting Anomalies in Business Processes

Jia Wu, Dept of Accounting and Finance, University of Massachusetts – Dartmouth, 285 Old Westport Road, North Dartmouth, MA 02747
Alex Kogan, Department of Accounting & Information Systems, Rutgers University, 180 University Ave, Newark, NJ 07102
Michael Alles, Department of Accounting & Information Systems, Rutgers University, 180 University Ave, Newark, NJ 07102
Miklos Vasarhelyi, Department of Accounting & Information Systems, Rutgers University, 180 University Ave, Newark, NJ 07102

October 2005

Abstract: This research discusses how Continuity Equations (CE) can be developed and implemented in Continuous Auditing (CA) for anomaly detection purposes. We use real-world data sets extracted from the supply chain of a large healthcare management firm in this study. Our first primary objective is to demonstrate how to develop CE models from a Business Process (BP) auditing approach. Two types of CE models are constructed in our study — the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). Our second primary objective is to design a set of online learning and error correction protocols for automatic model selection and updating. Our third primary objective is to evaluate the CE models through comparison. First, we compare the prediction accuracy of the CE models and the traditional analytical procedure (AP) model. Our results indicate that the CE models have relatively good prediction accuracy. Second, we compare the anomaly detection capability of AP models with error correction and models without error correction. We find that models with error correction perform better than models without error correction. Lastly, we examine the difference in detection capability between the CE models and the traditional AP model. Overall, we find that the CE models outperform the linear regression model in terms of anomaly detection.

Keywords: continuous auditing, analytical procedure, anomaly detection

Data availability: Proprietary data, not available to the public; contact the author for details.

Table of Contents
I. Introduction
II. Background, Literature Review and Research Questions
  2.1 Continuous Auditing
  2.2 Business Process Auditing Approach
  2.3 Continuity Equations
  2.4 Analytical Procedures
  2.5 Research Questions
III. Research Method
  3.1 Data Profile and Data Preprocessing
  3.2 Analytical Modeling
    3.2.1 Simultaneous Equation Model
    3.2.2 Multivariate Time Series Model
    3.2.3 Linear Regression Model
  3.3 Automatic Model Selection and Updating
  3.4 Prediction Accuracy Comparison
  3.5 Anomaly Detection Comparison
    3.5.1 Anomaly Detection Comparison of Models with Error Correction and without Error Correction
    3.5.2 Anomaly Detection Comparison of SEM, MTSM and Linear Regression
IV: Conclusion, Limitations and Future Research Directions
  4.1 Conclusion
  4.2 Limitations
  4.3 Future Research Directions
V: References
VI: Figures, Tables and Charts
VII: Appendix: Multivariate Time Series Model with All Parameter Estimates

I. Introduction

The CICA/AICPA Research Report defines CA as "a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors' reports issued simultaneously with, or a short period of time after, the occurrence of events underlying the subject matter." [Footnote 1: http://www.cica.ca/index.cfm/ci_id/989/la_id/1.htm] Generally speaking, audits in a CA environment are performed on a more frequent and timely basis relative to traditional auditing. CA is a great leap forward in both audit depth and audit breadth and is expected to improve audit quality. Thanks to fast advances in information technologies, the implementation of CA has become technologically feasible. In addition, the recent spate of corporate scandals and related auditing failures is driving the demand for audits of better quality. Additionally, new regulations such as the Sarbanes-Oxley Act require verifiable corporate internal controls and shorter reporting lags. All these taken together have created an amenable environment for CA development, since it is expected that CA can outperform traditional auditing in many aspects, including anomaly detection.

In the past few years CA has caught the attention of more and more academic researchers, auditing professionals, and software developers. The research on CA has been continuously flourishing. A number of papers discuss the enabling technologies of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy and Groomer 2004, etc.). Other papers, mostly normative ones, address CA from a variety of theoretical perspectives (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002). However, there is a dearth of empirical research on CA due to the lack of data availability. [Footnote 2: Compustat, CRSP, and other popular data sets for capital market researchers are usually not sufficient for empirical research in CA, for which very high frequency data sets are generally required.] This study extends the prior research by using real-world data sets to build analytical procedure models for CA.
This research proposes and demonstrates how a set of novel analytical procedure (AP) models, Continuity Equation models, can be developed and implemented in CA for anomaly detection purposes, which is considered one of the fortes of CA. Statement on Auditing Standards (SAS) No. 56 requires that analytical procedures be performed during the planning and review stages of an audit. It also recommends the use of analytical procedures in substantive tests. Effective and efficient AP can reduce the audit workload of substantive tests and cut audit costs because it helps auditors focus their attention on the most suspicious accounts. In applying analytical procedures, an auditor first relies on an AP expectation model, or an AP model, to predict the value of an important business metric (e.g. an account balance). Then, the auditor compares the predicted value with the actual value of the metric. Finally, if the variance between the two values exceeds a pre-established threshold, an alarm should be triggered. This would warrant the auditor's further investigation. The expectation models in AP therefore play an important role in helping auditors to identify anomalies.
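To make this comparison step concrete, the following is a minimal sketch of the threshold test described above (the function name, the relative-variance measure, and the 50% threshold are our illustrative choices, not the authors' implementation):

    def flag_anomaly(actual: float, predicted: float, threshold: float = 0.5) -> bool:
        """Trigger an alarm when the variance between the actual and the predicted
        value of a business metric exceeds a pre-established threshold."""
        if actual == 0:
            return predicted != 0  # avoid division by zero on empty periods
        variance = abs(actual - predicted) / abs(actual)
        return variance > threshold

    # Example: a daily voucher-quantity aggregate of 5000 against a prediction of 8200
    if flag_anomaly(actual=5000, predicted=8200):
        print("Alarm: variance exceeds the threshold; further investigation is warranted.")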
In comparison to traditional auditing, CA usually involves high frequency audit tests, highly disaggregate business process data, and continuous new data feeds. Moreover, any detected anomalies must be corrected in a timely fashion. Therefore, an expectation model in CA must be capable of processing high volumes of data, detecting anomalies at the business process level, self-updating using the new data feeds, and correcting errors immediately after detection. Besides, it is of vital importance for the expectation model in CA to detect anomalies in an accurate and timely manner. With these expectations in mind we define four requirements for AP models in CA. First, the analytical modeling process should be largely automated and the AP models should be self-adaptive, requiring as little human intervention as possible. The high frequency of audit tests makes it impossible for human auditors to select the best model on a continuous basis. On the other hand, new data are continuously fed into a CA system. A good AP model for CA should be able to assimilate the additional information contained in the new data feeds, adapting itself continuously. Second, the AP models should be able to generate accurate predictions. Auditors rely on expectation models to forecast business metric values, so it is very important for the expectation model to generate accurate forecasts. Third, the AP models should be able to detect errors effectively and efficiently. The ultimate objective for auditors applying AP is to detect anomalies and then to apply tests of details on these anomalies. Fourth, to improve error detection capability, the AP models should be able to correct any detected errors as soon as possible to ensure that new predictions are based on correct data as opposed to erroneous data.

In this study we construct the expectation models using the supply chain procurement cycle data provided by a large healthcare management firm. These models are built using the Business Process (BP) approach as opposed to the traditional transaction level approach. Three key business processes are identified in the procurement cycle: the ordering process, the receiving process, and the voucher payment process. Our CE models are constructed on the basis of these three BPs. Two types of CE models are proposed in this paper — the Simultaneous Equation Model and the Multivariate Time Series Model. We evaluate the two CE models through comparison with traditional AP models such as the linear regression model. First, we examine the prediction accuracy of these models. Our first findings suggest that the two CE models can produce relatively accurate forecasts. Second, we compare AP models with and without error correction. Our finding shows that AP models with error correction can outperform AP models without error correction. Lastly, we compare the two CE models with the traditional linear regression model in an error correction scenario. Our finding indicates that the Simultaneous Equation Model and the Multivariate Time Series Model outperform the linear regression model in terms of anomaly detection.

The remainder of this paper is organized as follows. Section II provides background and a literature review on CA and AP and states our research questions. Section III describes the data profile and data preprocessing steps, discusses the model construction procedures, and presents the findings of the study. The final section discusses the results, identifies the limitations of the study, and suggests future research directions.

II. Background, Literature Review and Research Questions

2.1 Continuous Auditing

Continuous auditing research came into being over a decade ago. The majority of the papers on continuous auditing are descriptive, focusing on the technical aspects of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy 2004; Murthy and Groomer 2004, etc.). Only a few papers discuss CA from other perspectives (e.g. economics, concepts, research directions, etc.) and most of these are normative research (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002; Searcy et al. 2004). Due to data unavailability, there is a lack of empirical studies on CA in general and on analytical procedures for CA in particular. This study enhances the prior CA literature by using empirical evidence to illustrate the prowess of CA in anomaly detection. Additionally, it extends prior CA research by discussing the implementation of analytical procedures in CA and proposing new models for it.

2.2 Business Process Auditing Approach

When Vasarhelyi and Halper (1991) introduced the concept of continuous auditing over a decade ago, they discussed the use of key operational metrics and analytics generated by the CPAS auditing system to help internal auditors monitor and control AT&T's billing system. Their study uses the operational process auditing approach and emphasizes the use of metrics and analytics in continuous auditing. Bell et al. (1997) also propose a holistic approach to auditing an organization: structurally dividing a business organization into various business processes (e.g. the revenue cycle, procurement cycle, payroll cycle, etc.) for auditing purposes. They suggest expanding the subject of auditing from business transactions to the routine activities associated with different business processes. Following these two prior studies, this paper also adopts the Business Process auditing approach in our AP model construction.
One advantage of the BP auditing approach is that anomalies can be detected in a more timely fashion. Anomalies can be detected at the transaction level as opposed to the account balance level. Traditionally, AP is applied at the account balance level after business transactions have been aggregated into account balances. This not only delays anomaly detection but also creates an additional layer of difficulty because transactions are consolidated into accounting numbers. The BP auditing approach can solve these problems.

2.3 Continuity Equations

We use Continuity Equations to model the different BPs in our sample firm. Continuity Equations are commonly used in physics as mathematical expressions of various conservation laws, such as the law of the conservation of mass: "For a control volume that has a single inlet and a single outlet, the principle of conservation of mass states that, for steady-state flow, the mass flow rate into the volume must equal the mass flow rate out." [Footnote 3: http://www.tpub.com/content/doe/h1012v3/css/h1012v3_33.htm] This paper borrows the concept of CE from the physical sciences and applies it in a business scenario. We consider each business process as a control volume made up of a variety of transaction flows, or business activities. If the transaction flows into and out of each BP are equal, the business process is in a steady state, free from anomalies. Otherwise, if spikes occur in the transaction flows, the steady state of the business process cannot be maintained, and auditors should initiate detailed investigations into the causes of these anomalies. We use Continuity Equations to model the relationships between different business processes.

2.4 Analytical Procedures

There are extensive research studies on analytical procedures in auditing. Many papers discuss the traditional analytical procedures (Hylas and Ashton 1982; Kinney 1987; Loebbecke and Steinbart 1987; Biggs et al. 1988; Wright and Ashton 1989). A few papers examine new analytical procedure models using disaggregate data, which are more relevant to this study. Dzeng (1994) introduces the vector autoregression (VAR) model, comparing 8 univariate and multivariate AP models using quarterly and monthly financial and non-financial data of a university. His study finds that less aggregate data can yield better precision in the time-series expectation models. Additionally, his study concludes that VAR is better than other modeling techniques in generating expectation models. Other studies also find that applying new AP models to high frequency data can improve analytical procedure effectiveness (Chen and Leitch 1998 and 1999; Leitch and Chen 2003). On the other hand, Allen et al. (1999) do not find any supporting evidence that geographically disaggregate data can improve analytical procedures. In this study we test the CE models' effectiveness using daily transaction data, which have a higher frequency than the data sets used by prior studies.

We propose two types of CE models for our study: the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). The SEM can model the interrelationships between different business processes simultaneously, while traditional expectation models such as the linear regression model can only model one relationship at a time. In SEM each interrelationship between two business processes is represented by an equation.
A SEM usually consists of a simultaneous system of two or more equations which represent a variety of business activities co-existing in a business organization. The use of SEM in analytical procedures has been examined by Leitch and Chen (2003). They use monthly financial statement data to compare the effectiveness of different AP models. Their finding indicates that SEM can generally outperform other AP models, including the Martingale and ARIMA models. In addition to SEM, this paper also proposes a novel AP model — the Multivariate Time Series Model. To the best of our knowledge, the MTSM has never been explored in the prior auditing literature, even though there are a limited number of studies on univariate time series models (Knechel 1988; Lorek et al. 1992; Chen and Leitch 1998; Leitch and Chen 2003). The computational complexity of the MTSM has hampered its application as an AP model. Prior researchers and practitioners were unable to apply this model because appropriate statistical tools were unavailable. However, with the recent development of statistical software applications, it is no longer difficult to compute this sophisticated model. Starting with version 8, SAS (Statistical Analysis System) allows users to make multivariate time series forecasts. The MTSM can not only model the interrelationships between BPs but also represent the time series properties of these BPs. Although the MTSM has never been discussed in the auditing literature, studies in other disciplines have either employed or discussed MTSM as a forecasting method (Swanson 1998; Pandher 2002; Corman and Mocan 2004).

2.5 Research Questions

Because the statistically sophisticated CE models can better represent business processes, we expect that the CE models can outperform the traditional AP models. We select the linear regression model for comparison purposes because it is considered the best traditional AP model (Stringer and Stewart 1986). Following the previous line of research on AP model comparison (Dzeng 1994; Allen et al. 1999; Chen and Leitch 1998 and 1999; Leitch and Chen 2003), this study compares the SEM and MTSM with the traditional linear regression model on two aspects. First, we compare the prediction accuracy of these models. A good expectation model is expected to generate predicted values close to actual values. Auditors can rely on these accurate predictions to identify anomalies. This leads to our first research question:

Question 1: Do Continuity Equation models have better prediction accuracy than the traditional linear regression model?

We use the Mean Absolute Percentage Error (MAPE) as the benchmark to measure the prediction accuracy of the expectation models. It first calculates the absolute variance between the predicted value and the actual value. Then it computes the percentage of the absolute variance over the actual value. A good expectation model is supposed to have better prediction accuracy and thereby a low MAPE.
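Written out in our notation, consistent with the description above and with the averaging over the hold-out sample described in Section 3.4, for n hold-out observations with actual values y_t and predicted values \hat{y}_t:

    \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\lvert y_t - \hat{y}_t \rvert}{y_t} \times 100\%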
Our primary interest in developing AP models is anomaly detection. To the best of our knowledge, previous auditing studies have not discussed how error correction can affect the detection capabilities of AP models. In this study we compare the anomaly detection capabilities of models with error correction and models without error correction. In a continuous auditing scenario involving high frequency audit tests, it may be necessary for an error to be corrected immediately after its detection, before subsequent audit tests. The AP models will then make subsequent predictions based on the correct value as opposed to the erroneous value. We expect that AP models with error correction can outperform AP models without error correction. This leads to our second research question:

Question 2: Do AP models with error correction have better anomaly detection capability than AP models without error correction?

The ultimate purpose of developing CE models is anomaly detection. We expect that the CE models can outperform traditional AP models in terms of anomaly detection. Hence our third research question is stated as follows:

Question 3: Do Continuity Equation models have better anomaly detection capability than the traditional linear regression AP model?

After the analysis of our second research question, we find that models with error correction generally outperform models without error correction. Therefore, when we analyze our third research question, we specify that both the CE models and the linear regression model have error correction capability. We use the false positive error rate and the false negative error rate as benchmarks to measure anomaly detection capability. [Footnote 4: See Section 3.5 for a detailed description of false negative errors and false positive errors.] A false positive error, also known as a false alarm or Type I error, is a non-anomaly mistakenly detected by the AP model as an anomaly. A false negative error, or Type II error, is an anomaly that the model fails to detect. An effective AP model is expected to have a low false positive error rate and a low false negative error rate. In summary, we expect that AP models in CA should be equipped with an error correction function for a better detection rate. We also expect that the CE models can outperform traditional linear regression models in a simulated CA environment.

III. Research Method

3.1 Data Profile and Data Preprocessing

The data sets are extracted from the data warehouse of a large healthcare management firm. At the current stage we are working with the supply chain procurement cycle data, which consist of 16 tables concerning a variety of business activities. The data sets include all procurement cycle daily transactions from Oct 1st, 2003 through June 30th, 2004. These transactions are performed by ten facilities of the firm, including one regional warehouse and nine hospitals and surgical centers. The data was first collected by the ten facilities and then transferred to the central data warehouse at the firm's headquarters. Even though the firm headquarters have implemented an ERP system, many of the ten facilities still rely on legacy systems. Not surprisingly, we have identified a number of data integrity issues which we believe are caused by the legacy systems. These problems should be resolved in the data preprocessing phase of our study.

Following the BP auditing approach, and also as a means to facilitate our research, our first step is to identify key business processes in the supply chain procurement cycle and focus our attention on them. The three key BPs we have identified are ordering, receiving, and voucher payment, which involve six tables in our data sets.

[Insert Figure 1 here]

At the second step we clean the data by removing the erroneous records in the six tables. Two categories of erroneous records are removed from our data sets: those that violate data integrity and those that violate referential integrity. Data integrity violations include but are not limited to invalid purchase quantities, receiving quantities, and check numbers. [Footnote 5: We found negative or zero numbers in these values, which could not always be justified by our data provider.] Referential integrity violations are largely caused by unmatched records among different business processes. For example, a receiving transaction cannot be matched with any related ordering transaction, or a payment for a purchase order cannot be matched with the related receiving transaction. Before we can build any analytical model, these erroneous records must be eliminated. We expect that in a real-world CA environment the data cleansing task can be completed automatically by the auditee's ERP systems.
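As an illustration only (the table and column names below are hypothetical, not the firm's actual schema), these two screening steps could be implemented along the following lines:

    import pandas as pd

    def screen_data_integrity(df: pd.DataFrame, qty_col: str) -> pd.DataFrame:
        """Drop records with invalid (zero or negative) quantities, a data
        integrity violation described in the text."""
        return df[df[qty_col] > 0]

    def unmatched_records(child: pd.DataFrame, parent: pd.DataFrame, keys: list) -> pd.DataFrame:
        """Return records in one business process (e.g. receiving) that cannot be
        matched to any record in the related process (e.g. ordering), i.e.
        referential integrity violations."""
        merged = child.merge(parent[keys].drop_duplicates(), on=keys,
                             how="left", indicator=True)
        return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

    # Example usage with hypothetical tables:
    # receipts = screen_data_integrity(receipts, "receive_qty")
    # bad_receipts = unmatched_records(receipts, orders, keys=["order_no", "item_no"])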
The third step in the data preprocessing is to identify records with complete transaction cycles. To facilitate our research, we exclude records with partial deliveries or partial payments. We specify that all the records in our sample must have undergone a complete transaction cycle. In other words, each record in one business process must have a matching record in a related business process with the same transaction quantity. [Footnote 6: For example, a purchase record of 1000 pairs of surgical gloves must have a matched receiving record and payment record with the same order number, item number, and the same transaction quantity, which is 1000.]

The fourth step in the data preprocessing phase is to delete non-business-day records. Though we find sporadic transactions on some weekends and holidays, the number of these transactions accounts for only a small fraction of that on a working day. However, if we leave these non-business-day records in our sample, they would inevitably trigger false alarms simply because of their low transaction volume.

The last step in the data preprocessing is to aggregate individual transactional records by day. Aggregation is a critical step before the construction of an AP model. It can reduce the variance among individual transactions. [Footnote 7: For example, the transaction quantity can differ a lot among individual transactions. The lag time between order and delivery, and between delivery and payment, can also vary. If we aggregate the individual transactions by day, the variance can be largely reduced.] The spikes among individual transactions can be somewhat smoothed out if we aggregate them by day, which can lead to the construction of a stable model. Otherwise, it would be impossible to derive a stable model based on data sets with enormous variances, because the model would either trigger too many alarms or lack detection power. On the other hand, if we aggregate individual transactions over a longer time period such as a week or a month, the model would fail to detect many abnormal transactions because the abnormality would be mostly smoothed out by the longer time interval.
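A minimal sketch of this aggregate-by-day step (our illustration, with hypothetical column names, not the authors' actual processing code):

    import pandas as pd

    def daily_aggregate(df: pd.DataFrame, date_col: str, qty_col: str) -> pd.Series:
        """Sum transaction quantities by calendar day and keep business days only."""
        daily = (
            df.assign(day=pd.to_datetime(df[date_col]).dt.normalize())
              .groupby("day")[qty_col]
              .sum()
        )
        # Drop Saturdays and Sundays; sporadic weekend records would otherwise
        # trigger false alarms simply because of their low volume. (A holiday
        # calendar would be needed to drop holidays as well.)
        return daily[daily.index.dayofweek < 5]

    # Example usage with hypothetical tables and columns:
    # order_qty = daily_aggregate(orders, "order_date", "order_qty")
    # receive_qty = daily_aggregate(receipts, "receive_date", "receive_qty")
    # voucher_qty = daily_aggregate(payments, "payment_date", "voucher_qty")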
Aggregation can be performed on other dimensions besides the time interval. For example, aggregation can be based on each facility (hospital or surgical center), each vendor, each purchase item, etc. Moreover, various metrics can be used for aggregation. At the current stage, we use transaction quantity as the primary metric for aggregation. Other metrics, including the dollar amount of each transaction or the number of transactions, can also be aggregated. Analytical procedures can be performed on these different metrics to monitor the transaction flows in the business organization. Auditing on different metrics plays an important role because it enables auditors to detect more suspicious transaction patterns. [Footnote 8: We need to perform audits on metrics other than financial numbers. For example, the Patriot Act requires that banks report the source of money for any deposit by a client larger than US$100,000. However, the client can bypass the mandatory reporting by dividing the deposit into several smaller deposits. Even though the deposit amount each time is under the limit, the total number of deposits is over the limit. Auditors can only catch such fraudulent activity by using the number of deposit transactions as an audit metric.] Summary statistics are presented in Table 1.

[Insert Table 1 here]

3.2 Analytical Modeling

3.2.1 Simultaneous Equation Model

Following the BP auditing approach, we have identified three key business processes for our sample firm: the ordering, receiving, and voucher payment processes. We model the interrelationships between these processes. We select the transaction quantity as our audit metric and use the individual working day as our level of aggregation. After completing these initial steps, we are able to estimate our first type of CE model — the Simultaneous Equation Model. We specify the daily aggregate of order quantity as the exogenous variable and the daily aggregates of receiving quantity and payment quantity as endogenous variables. Time stamps are added to the transaction flow among the three business processes. The transaction flow originates from the ordering process at time t. After a lag period Δ1, the transaction flow appears in the receiving process at time t + Δ1. After another lag period Δ2, the transaction flow re-appears in the voucher payment process at time t + Δ1 + Δ2. The basic SEM is:

(qty of receive)_t = α1 * (qty of order)_{t-Δ1} + ε1
(qty of vouchers)_t = α2 * (qty of receive)_{t-Δ2} + ε2

We select transaction quantity as the primary metric for testing, as opposed to dollar amounts, for two reasons. First, we want to illustrate that CA can work efficiently and effectively on operational (non-financial) data. Second, in our sample the dollar amounts contain noisy information, including sales discounts and tax. We aggregate the transaction quantities for the ordering, receiving, and voucher payment processes respectively. After excluding weekends and holidays, we have obtained 147 observations in our data sets for each business process.

Our next step in constructing the simultaneous equation model is to estimate the lags. Initially, we used the mode and the average of the individual transactions' lags as estimates for the lags between the BPs. The mode lag between the ordering process and the receiving process is 1 day. The mode lag between the receiving process and the payment process is also 1 day. The average lags are 3 and 6 days respectively. Later, we tried different combinations of lag estimates from 1 day to 7 days to test our model. Our results indicate that the mode estimate works best among all estimates for the simultaneous equation model. Therefore, we can express our estimated model as:

receive_t = α1 * order_{t-1} + ε1
voucher_t = α2 * receive_{t-1} + ε2

where
order = daily aggregate of transaction quantity for the purchase order process
receive = daily aggregate of transaction quantity for the receiving process
voucher = daily aggregate of transaction quantity for the voucher payment process
t = transaction time
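For illustration, the two equations can be estimated per equation by ordinary least squares on the lagged daily aggregates. This is a sketch under our assumptions (the authors' own estimation procedure may differ); the arguments are the daily aggregate series from the preprocessing step:

    import pandas as pd
    import statsmodels.api as sm

    def estimate_sem(order_qty: pd.Series, receive_qty: pd.Series, voucher_qty: pd.Series):
        """Estimate receive_t = a1*order_{t-1} + e1 and voucher_t = a2*receive_{t-1} + e2
        by per-equation OLS (no intercept) on the first 2/3 of the sample."""
        df = pd.concat(
            {"order": order_qty, "receive": receive_qty, "voucher": voucher_qty}, axis=1
        ).dropna()
        train = df.iloc[: int(len(df) * 2 / 3)]
        eq1 = sm.OLS(train["receive"].iloc[1:], train["order"].shift(1).iloc[1:]).fit()
        eq2 = sm.OLS(train["voucher"].iloc[1:], train["receive"].shift(1).iloc[1:]).fit()
        return eq1, eq2   # eq1.params, eq1.rsquared, etc. are then available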
We divide our data set into two parts. The first part, which accounts for 2/3 of the observations, is categorized as the training set and used to estimate the model. The second part, which accounts for 1/3 of the total observations, is categorized as the hold-out set and used to test our model. Our simultaneous equation model estimated on the training set is as follows:

receive_t = 0.8462 * order_{t-1} + e1
voucher_t = 0.8874 * receive_{t-1} + e2

The R-squares for the two equations are 0.73 and 0.79 respectively, which indicates a good fit of the data for the simultaneous equation model. However, we have also realized some limitations associated with the SEM. First, the lags have to be separately estimated, and such estimations are not only time-consuming but also prone to errors. Second, the SEM is a simplistic model: each variable can only depend on a single lagged value of the other variable. For example, voucher_t can only depend on receive_{t-1}, even though there may be a good chance that voucher_t also depends on other lagged values of the receive variable, or even on lagged values of the order variable. Due to these limitations, we need to develop a more flexible CE model.

3.2.2 Multivariate Time Series Model

We continue to follow the BP auditing approach and use daily aggregates of transaction quantity as the audit metric to develop the MTSM. However, unlike in the case of the SEM, no lag estimation is necessary. We only need to specify the maximum lag period; all possible lags within that period can be tested by the model. We specify 18 days as the maximum lag because 95% of the lags of all the individual transactions fall within this time frame. Our basic multivariate time series model is expressed as follows:

order_t = Φ_ro * M(receive) + Φ_vo * M(voucher) + ε_o
receive_t = Φ_or * M(order) + Φ_vr * M(voucher) + ε_r
voucher_t = Φ_ov * M(order) + Φ_rv * M(receive) + ε_v

where
M(order) = n*1 vector of lagged daily aggregates of order quantity
M(receive) = n*1 vector of lagged daily aggregates of receive quantity
M(voucher) = n*1 vector of lagged daily aggregates of voucher quantity
Φ = corresponding 1*n transition vectors

Again we split our data set into two subsets: the training set and the hold-out (test) set. We use the SAS VARMAX procedure to estimate the large MTSM (a 3x18x3 parameter matrix has been estimated; see Appendix). Despite the fact that this model is a good fit to our data sets, the predictions it generates for the hold-out (test) sample have large variances. [Footnote 9: We find that the MAPEs for the predictions of the Order, Receive, and Voucher variables are all over 54%.] In addition, a large number of the parameter estimates are not statistically significant. We believe the model suffers from an over-fitting problem. Therefore, we apply a step-wise procedure: in each step we restrict the insignificant parameter values to zero, retain only the significant parameters, and estimate the model again. If new insignificant parameters appear, we restrict them to zero and re-estimate the model. We repeat the step-wise procedure several times until no insignificant parameters appear in the model. One of our estimated multivariate time series models is expressed as:

order_t = 0.24*order_{t-4} + 0.25*order_{t-14} + 0.56*receive_{t-15} + e_o
receive_t = 0.26*order_{t-4} + 0.21*order_{t-6} + 0.60*voucher_{t-10} + e_r
voucher_t = 0.73*receive_{t-1} - 0.25*order_{t-7} + 0.22*order_{t-17} + 0.24*receive_{t-17} + e_v
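As an illustration of the unrestricted form of this model (our sketch only; the paper's estimation uses the SAS VARMAX procedure, whereas here each equation is simply regressed on lags 1 through 18 of all three daily aggregate series):

    import pandas as pd
    import statsmodels.api as sm

    MAX_LAG = 18  # 95% of individual transaction lags fall within 18 days

    def lag_matrix(df: pd.DataFrame, max_lag: int = MAX_LAG) -> pd.DataFrame:
        """Build a design matrix containing lags 1..max_lag of every column."""
        lagged = {
            f"{col}_lag{k}": df[col].shift(k)
            for col in df.columns
            for k in range(1, max_lag + 1)
        }
        return pd.DataFrame(lagged, index=df.index)

    def fit_mtsm(df: pd.DataFrame) -> dict:
        """Fit one lagged-regression equation per variable (order, receive, voucher)
        on the full, unrestricted set of lags; this is the over-parameterized model
        that the step-wise procedure of Section 3.3 subsequently prunes."""
        X = lag_matrix(df).dropna()
        return {col: sm.OLS(df[col].loc[X.index], X).fit() for col in df.columns}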
3.2.3 Linear Regression Model

We construct the linear regression model for comparison purposes. In our linear regression model we specify the lagged values of the daily aggregates of transaction quantity in the ordering process and the receiving process as the two independent variables, and the voucher payment quantity aggregate as the dependent variable. Again, we use the mode value of the lags in individual transactions as estimates for the lags in the model (i.e. a 2-day lag between the ordering and voucher payment processes, and a 1-day lag between the receiving and voucher payment processes). No intercept is used in our model because we cannot find any valid meaning for the intercept. Our OLS linear regression model is expressed as follows:

voucher_t = α*order_{t-2} + β*receive_{t-1} + ε

where
order = daily aggregate of transaction quantity for the ordering process
receive = daily aggregate of transaction quantity for the receiving process
voucher = daily aggregate of transaction quantity for the voucher payment process
t = transaction time

Again we use the first 2/3 of our data set as the training subset to estimate our model. The estimated linear regression model is:

voucher_t = 0.02*order_{t-2} + 0.81*receive_{t-1} + e

The α estimate is statistically insignificant (p>0.73) while the β estimate is significant at the 99% level (p<0.0001).

3.3 Automatic Model Selection and Updating

One distinctive feature of analytical modeling in CA is the automatic model selection and updating capability. Traditional analytical modeling is usually based on static archival data, and auditors generally apply one model to the entire audit data set. In comparison, analytical modeling in CA can be based on the continuous data streams dynamically flowing into the CA system. The analytical modeling in CA should be able to assimilate the new information contained in every segment of the data flows and adapt itself constantly. Each newly updated analytical model is used to generate a prediction only for one new segment of data. This model updating procedure is expected to improve prediction accuracy and anomaly detection capability.

[Insert Figure 2 here]

When we developed the multivariate time series models, we encountered a model over-fitting problem. Specifically, our initial model is a large and complex one including many parameters. Though this model fits the training data set very well, it suffers from a severe over-fitting problem, as indicated by its poor prediction accuracy. To improve the model, we have applied a step-wise procedure. First, we determine a p-value threshold for all the parameter estimates. Then, in each step, we retain only the parameter estimates under the pre-determined threshold, restrict those over the threshold to zero, and re-estimate the model. If we find new parameter estimates over the threshold, we apply the previous procedure again until all the parameter estimates are below the threshold. The step-wise procedure ensures that all the parameters are statistically significant and the over-fitting problem is largely eliminated.

[Insert Figure 3 here]

When we apply the step-wise procedure to the multivariate time series model, a set of different p-value thresholds is used. We choose thresholds at 5%, 10%, 15%, 20% and 30% and test the prediction accuracy for each variable in the MTSM. We find that with the 15% threshold the MTSM has the best overall prediction accuracy.
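The step-wise restriction just described can be sketched as follows for a single equation (our illustrative implementation, reusing the per-equation OLS form of the earlier sketches; the authors' own procedure may differ in detail):

    import pandas as pd
    import statsmodels.api as sm

    def stepwise_prune(y: pd.Series, X: pd.DataFrame, p_threshold: float = 0.15):
        """Re-estimate the equation repeatedly, restricting to zero (dropping) every
        parameter whose p-value exceeds the threshold, until only significant
        parameters remain."""
        kept = list(X.columns)
        while kept:
            fit = sm.OLS(y, X[kept]).fit()
            significant = [c for c in kept if fit.pvalues[c] <= p_threshold]
            if len(significant) == len(kept):
                return fit          # no parameter estimate exceeds the threshold
            kept = significant      # restrict the insignificant parameters to zero
        return None                 # nothing survived the threshold

    # Example usage with the lag matrix from the MTSM sketch:
    # pruned_voucher_eq = stepwise_prune(df["voucher"].loc[X.index], X, p_threshold=0.15)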
3.4 Prediction Accuracy Comparison

While performing analytical procedures, auditors use different methods to make predictions of account numbers. The methods they use include historical analysis, financial ratio analysis, reasonableness tests, and statistical AP models. One of the expectations for AP models is that auditors can rely on the models to make accurate predictions. Hence, it is important for AP models to make forecasts as close to actual values as possible. In this subsection we compare the prediction accuracy of the three AP models: the Simultaneous Equation Model, the Multivariate Time Series Model, and the Linear Regression Model. We use MAPE as the benchmark to measure prediction accuracy, expecting that a good model should have a small MAPE (i.e. the variance between the predicted value and the actual value is small). We first use the training sample to estimate each of the three models. Then, each estimated model is used to make one-step-ahead forecasts and to calculate the forecast variances, and the model is adapted according to the new data feeds in the hold-out (test) sample. Finally, all the variances are summed up and divided by the total number of observations in the hold-out sample to compute the MAPE. The results for the MAPE of the Voucher predictions are presented in Table 2.

[Insert Table 2 here]

We find that the MAPEs generated by the three AP models differ by less than 2%, which indicates that all three models have very similar prediction accuracy. The linear regression model has the lowest MAPE, followed by the MTSM, while the SEM has the highest MAPE. Therefore, H1 is not supported. Theoretically, the best AP model should have the lowest MAPE. The slightly higher MAPEs for the SEM and MTSM can possibly be attributed to the pollution in our data sets. As mentioned in a later section, our AP models detect 5 or 6 original anomalies in the hold-out (test) sample before we seed any errors. These outliers can inflate the MAPEs of AP models that would otherwise make accurate predictions. In summary, the CE models have prediction accuracy similar to that of the linear regression model. Compared with the prior literature, the predictions are relatively accurate for all of our AP models. [Footnote 10: In the Leitch and Chen (2003) study the MAPEs of the 4 AP models are 0.3915, 0.3944, 0.5964 and 0.5847. Other auditing literature sometimes reports MAPEs exceeding 100%.]

3.5 Anomaly Detection Comparison

The primary objective of AP models is to detect anomalies. A good AP model can detect anomalies in an effective and efficient fashion. To measure the anomaly detection capability of the AP models, we use two benchmarks: the number of false positive errors and the number of false negative errors. [Footnote 11: For presentation purposes, we also include tables and charts of the detection rate, which equals 1 minus the false negative error rate.] A false positive error, also called a false alarm or Type I error, is a non-anomaly mistakenly detected by the model as an anomaly. A false negative error, also called a Type II error, is an anomaly that the model fails to detect. While a false positive error can waste the auditor's time and thereby increase audit cost, a false negative error is usually more detrimental because of the material uncertainty associated with the undetected anomaly. An effective and efficient AP model should keep both the number of false positive errors and the number of false negative errors at a low level.

To compare the anomaly detection capabilities of the CE models and the linear regression model, we need to seed errors into our hold-out (test) sample. Our AP models have detected around 5 original anomalies even before we seed any errors. Therefore, we select observations other than the original anomalies to seed errors. Each time we randomly seed 8 errors into the hold-out sample. We also want to test how the error magnitude affects each AP model's anomaly detection capability. Therefore, we use 5 different magnitudes in every round of error seeding: 10%, 50%, 100%, 200% and 400% of the original actual value of the seeded observations. The entire error seeding procedure is repeated 10 times to reduce selection bias and ensure randomness.
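A sketch of this seeding design (our illustrative implementation; here the seeded error adds the stated percentage of the actual value to the observation, which is one plausible reading of the magnitudes above):

    import numpy as np
    import pandas as pd

    MAGNITUDES = [0.10, 0.50, 1.00, 2.00, 4.00]   # 10% ... 400% of the actual value
    N_ERRORS = 8
    N_ROUNDS = 10

    def seed_errors(holdout: pd.Series, eligible_idx, magnitude: float,
                    n_errors: int = N_ERRORS, rng=None):
        """Return a perturbed copy of the hold-out series and the seeded positions,
        drawing positions only from observations that are not original anomalies."""
        rng = rng or np.random.default_rng()
        positions = rng.choice(list(eligible_idx), size=n_errors, replace=False)
        seeded = holdout.copy()
        seeded.loc[positions] = seeded.loc[positions] * (1 + magnitude)
        return seeded, positions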
We use the confidence interval (CI) for the individual dependent variable, or the prediction interval, as the acceptable threshold of variance for anomaly detection. If the actual value exceeds the upper confidence limit or falls below the lower confidence limit of the prediction, then we mark the observation as an anomaly. [Footnote 12: We also tested other benchmarks for anomalies, such as using MAPE = 50% as a threshold. We have found that using the prediction interval generates the best performance for anomaly detection, resulting in the smallest number of false positive errors and false negative errors.] The selection of the prediction interval percentage is another issue to discuss. If we choose a high percentage for the prediction interval (e.g. 95%), the prediction interval would be too wide and thereby result in a low detection rate. On the other hand, if a low percentage prediction interval is selected, then the prediction interval would be too narrow and many normal observations would be categorized as anomalies. To solve this problem, we have tested a set of prediction interval percentages from 50% through 95%. We have found that the 97% prediction interval works best for the simultaneous equations, the 70% prediction interval works best for the multivariate time series model, and the 90% prediction interval works best for the linear regression model. The relatively low prediction interval percentage for the multivariate time series model is most probably due to the data pollution problem. [Footnote 13: Our data sets are not extracted from a relational database. As a result, there may exist non-trivial noise which can affect our test results.]

Leitch and Chen (2003) use both positive and negative approaches to evaluate the anomaly detection capability of various models. In the positive approach all the observations are treated as non-anomalies, and the model is used to detect the seeded errors. In contrast, the negative approach treats all observations as anomalies, and the model is used to find the non-anomalies. This study adopts only the positive approach because it fits the BP auditing scenario better.

3.5.1 Anomaly Detection Comparison of Models with Error Correction and without Error Correction

In a CA environment, when an anomaly is detected, the auditor will be notified immediately and a detailed investigation will be initiated. Ideally, the auditor will correct the error with the true value in a timely fashion, usually before the next round of audit starts. In other words, errors are detected and corrected in real time in a CA environment. We use an error correction model to simulate this scenario. Specifically, when the AP model detects a seeded error in the hold-out (test) sample, the seeded error is substituted with the original actual value before the model is used again to predict subsequent values.
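A simplified sketch of this detection-and-correction loop (our illustration, not the authors' SAS implementation), using the linear regression form voucher_t = β*receive_{t-1} as the expectation model, a 90% prediction interval, and assuming receive and the voucher series are aligned daily aggregates with n_train marking the start of the hold-out sample:

    import pandas as pd
    import statsmodels.api as sm

    def run_with_error_correction(receive: pd.Series, voucher_seeded: pd.Series,
                                  voucher_true: pd.Series, n_train: int,
                                  alpha: float = 0.10):
        """Roll through the hold-out sample one day at a time, flag observations that
        fall outside the prediction interval, and substitute the true value for a
        flagged observation before it is used in subsequent estimation."""
        corrected = voucher_seeded.copy()
        flagged = []
        for t in range(n_train, len(corrected)):
            # Re-estimate voucher_t = beta * receive_{t-1} on all data up to t-1
            y = corrected.iloc[1:t]
            x = receive.shift(1).iloc[1:t]
            fit = sm.OLS(y, x).fit()
            pred = fit.get_prediction([[receive.iloc[t - 1]]])
            lower, upper = pred.conf_int(obs=True, alpha=alpha)[0]
            if not (lower <= corrected.iloc[t] <= upper):
                flagged.append(corrected.index[t])
                corrected.iloc[t] = voucher_true.iloc[t]   # correct before the next step
        return flagged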
For comparison purposes, we also test how our CE models and the linear regression model work without error correction. Unlike in continuous auditing, anomalies in traditional auditing are detected but usually not corrected immediately. To simulate this scenario, we simply do not correct any errors we seeded in the hold-out (test) sample, even if the AP model detects them.

[Insert Tables 3A, 3B, 4, 5A, 5B, 6, 7A, 7B, 8 and Charts 1A, 1B, 2A, 2B, 3A, 3B here]

We find that the SEM with error correction consistently outperforms the SEM without error correction: it has a lower false negative error rate and a higher detection rate (Tables 3A and 3B, Charts 1A and 1B). Neither model generates any false positive errors (Table 4). The results also indicate that the MTSM with error correction generally has lower false negative error rates than the MTSM without error correction (Tables 5A and 5B, Charts 2A and 2B), which supports H2 that error correction models have better detection rates. In addition, the MTSM with error correction has no false positive errors, while the model without error correction occasionally has false positive errors (Table 6). Similar results are found for the linear regression model (Tables 7A and 7B, Charts 3A and 3B), except that there are no false positive errors in either the error correction or the no-correction model (Table 8). A further investigation reveals that some of the additional false negative errors are due to detection failures on the original anomalies, especially when the magnitude of the seeded errors increases. This indicates that AP models without error correction may fail to detect relatively small errors when large errors are present simultaneously. In general, the results are consistent with our expectation that error correction can significantly improve the anomaly detection capability of AP models. H2 is supported.

3.5.2 Anomaly Detection Comparison of SEM, MTSM and Linear Regression

It is of interest to know whether the CE models are better than the linear regression model in terms of anomaly detection. We know from the test results for H2 that error correction models generally have better anomaly detection capability than non-correction models. Hence, we compare the SEM, the MTSM and the linear regression model in an error correction scenario.

[Insert Tables 9A, 9B, 10 and Charts 4A, 4B here]

Table 9A and Chart 4A present the false negative error rates of the three AP models with error correction. Table 9B and Chart 4B present the detection success rates of the AP models, which is another way to represent anomaly detection capability. Although the results for the three models are mixed when the error magnitude is small (at the 10% and 50% levels), the multivariate time series model detects more anomalies than the linear regression model as the error magnitude increases. The difference is most pronounced when the error magnitude is at the 200% level. Besides, the simultaneous equation model also performs better than the linear regression model when the error magnitude is larger than 100%, although it is not as good as the multivariate time series model at the 200% level. Table 10 presents the false positive error rates of the AP models with error correction. None of the three models generates any false positive errors, indicating perfect performance in this respect. In summary, we believe that both the simultaneous equation model and the multivariate time series model perform better than the linear regression model in general, because it is more important for AP models to detect material errors than small errors. Our finding supports H3.
IV: Conclusion, Limitations and Future Research Directions

4.1 Conclusion

In this study we have explored how to implement analytical procedures to detect anomalies in a continuous auditing environment. Specifically, we have constructed two continuity equation models: a simultaneous equation model and a multivariate time series model. We compare the CE models with the linear regression model in terms of prediction accuracy and anomaly detection performance. We cannot find evidence to support our first hypothesis that CE models normally generate better prediction accuracy. We find evidence to support our second hypothesis that models with error correction are better than models without error correction in anomaly detection. The results from the empirical tests are also consistent with our third hypothesis that the CE models generally outperform the traditional linear regression model in terms of anomaly detection in a simulated CA environment in which high frequency data are available. This is the first study on the analytical procedures of continuous auditing. It is also the first attempt to use empirical data to compare different AP models in a CA context. We have also proposed a novel AP model in auditing research, the multivariate time series model, and examined the difference in detection capability between models with error correction and models without error correction.

4.2 Limitations

This study has a number of limitations. First, our data sets are extracted from a single firm, which may constitute a selection bias. Until we test our CE models using other firms' data sets, we will not have empirical evidence that our AP models are portable and can be applied to other firms. In addition, our data sets contain some noise. Since our data sets are extracted from a central data warehouse which accepts data from both ERP and legacy systems in the firm's subdivisions, it is inevitable that our data sets are contaminated by some errors and noise. The date truncation problem also produces some noise in our data sets. The appearance of the original anomalies is one indication of the presence of noise in our data sets.

4.3 Future Research Directions

Since this paper is devoted to a new research area, there are many future research directions that can fill the vacuum. For example, it would be very interesting to see whether our models are portable to other firms or to other audit dimensions, such as financial numbers, if data are available. It would also be of interest to compare the CE models with other innovative AP models, such as artificial intelligence models and other time series models, including the Martingale and X11 models. Moreover, our models do not include many independent variables and control variables, which can be included in CE models in future studies.

V: References

1. Allen, R.D., M.S. Beasley, and B.C. Branson. 1999. Improving Analytical Procedures: A Case of Using Disaggregate Multilocation Data. Auditing: A Journal of Practice and Theory 18 (Fall): 128-142.
2. Alles, M.G., A. Kogan, and M.A. Vasarhelyi. 2002. Feasibility and Economics of Continuous Assurance. Auditing: A Journal of Practice and Theory 21 (Spring): 125-138.
3. ____________. 2004. Restoring Auditor Credibility: Tertiary Monitoring and Logging of Continuous Assurance Systems. International Journal of Accounting Information Systems 5: 183-202.
4. ____________ and J. Wu. 2004.
Continuity Equations: Business Process Based Audit Benchmarks in Continuous Auditing. Proceedings of the American Accounting Association Annual Conference. Orlando, FL.
5. Bell, T., F.O. Marrs, I. Solomon, and H. Thomas. 1997. Monograph: Auditing Organizations Through a Strategic-Systems Lens. Montvale, NJ: KPMG Peat Marwick.
6. Chen, Y. and R.A. Leitch. 1998. The Error Detection of Structural Analytical Procedures: A Simulation Study. Auditing: A Journal of Practice and Theory 17 (Fall): 36-70.
7. ____________. 1999. An Analysis of the Relative Power Characteristics of Analytical Procedures. Auditing: A Journal of Practice and Theory 18 (Fall): 35-69.
8. Corman, H. and H.N. Mocan. 2004. A Time-Series Analysis of Crime, Deterrence and Drug Abuse in New York City. American Economic Review. Forthcoming.
9. Dzeng, S.C. 1994. A Comparison of Analytical Procedures Expectation Models Using Both Aggregate and Disaggregate Data. Auditing: A Journal of Practice and Theory 13 (Fall): 1-24.
10. Elliott, R.K. 2002. Twenty-First Century Assurance. Auditing: A Journal of Practice and Theory 21 (Spring): 129-146.
11. Groomer, S.M. and U.S. Murthy. 1989. Continuous Auditing of Database Applications: An Embedded Audit Module Approach. Journal of Information Systems 3 (2): 53-69.
12. Kogan, A., E.F. Sudit, and M.A. Vasarhelyi. 1999. Continuous Online Auditing: A Program of Research. Journal of Information Systems 13 (Fall): 87-103.
13. Koreisha, S. and Y. Fang. 2004. Updating ARMA Predictions for Temporal Aggregates. Journal of Forecasting 23: 275-396.
14. Leitch, R.A. and Y. Chen. 2003. The Effectiveness of Expectation Models in Recognizing Error Patterns and Eliminating Hypotheses While Conducting Analytical Procedures. Auditing: A Journal of Practice and Theory 22 (Fall): 147-206.
15. Murthy, U.S. 2004. An Analysis of the Effects of Continuous Monitoring Controls on e-Commerce System Performance. Journal of Information Systems 18 (Fall): 29-47.
16. ____________ and M.S. Groomer. 2004. A Continuous Auditing Web Services Model for XML-Based Accounting Systems. International Journal of Accounting Information Systems 5: 139-163.
17. Pandher, G.S. 2002. Forecasting Multivariate Time Series with Linear Restrictions Using Unconstrained Structural State-Space Models. Journal of Forecasting 21: 281-300.
18. Rezaee, Z., A. Sharbatoghlie, R. Elam, and P.L. McMickle. 2002. Continuous Auditing: Building Automated Auditing Capability. Auditing: A Journal of Practice and Theory 21 (Spring): 147-163.
19. Searcy, D.L., J.B. Woodroof, and B. Behn. 2003. Continuous Audit: The Motivations, Benefits, Problems, and Challenges Identified by Partners of a Big 4 Accounting Firm. Proceedings of the 36th Hawaii International Conference on System Sciences: 1-10.
20. Stringer, K. and T. Stewart. 1986. Statistical Techniques for Analytical Review in Auditing. New York: Wiley Publishing.
21. Swanson, N., E. Ghysels, and M. Callan. 1999. A Multivariate Time Series Analysis of the Data Revision Process for Industrial Production and the Composite Leading Indicator. In Cointegration, Causality, and Forecasting: Festschrift in Honour of Clive W.J. Granger, edited by R. Engle and H. White. Oxford: Oxford University Press.
22. Vasarhelyi, M.A. and F.B. Halper. 1991. The Continuous Audit of Online Systems. Auditing: A Journal of Practice and Theory 10 (Spring): 110-125.
23. ____________. 2002. Concepts in Continuous Assurance. Chapter 5 in Researching Accounting as an Information Systems Discipline, edited by S. Sutton and V. Arnold.
Sarasota, FL: AAA.
24. ____________, M.A. Alles, and A. Kogan. 2004. Principles of Analytic Monitoring for Continuous Assurance. Journal of Emerging Technologies in Accounting. Forthcoming.
25. Woodroof, J. and D. Searcy. 2001. Continuous Audit Implications of Internet Technology: Triggering Agents over the Web in the Domain of Debt Covenant Compliance. Proceedings of the 34th Hawaii International Conference on System Sciences.

VI: Figures, Tables and Charts

Figure 1: Business Process Transaction Flow Diagram
[Diagram: transaction flow from the Ordering Process to the Receiving Process to the Voucher Payment Process.]

Figure 2: Model Updating Protocol
[Diagram: the CA system estimates AP Model 1 on data segments 1 through 100 and uses it to predict segment 101; the updated CA system then estimates AP Model 2 on segments 1 through 101 to predict segment 102, AP Model 3 on segments 1 through 102 to predict segment 103, and so on.]

Figure 3: Multivariate Time Series Model Selection
[Flowchart: initial model estimation; determine the parameter p-value threshold; retain parameters below the threshold and restrict parameters over the threshold to zero; re-estimate the model; if new parameter estimates exceed the threshold, repeat the previous step; otherwise the final model is obtained.]

Table 1: Summary Statistics

Variable   N     Mean      Std Dev   Minimum   Maximum
Order      147   6613.37   3027.46   3240      30751
Receive    147   6488.29   3146.43   171       29599
Voucher    147   5909.71   3462.99   0         30264

The table presents the summary statistics for the transaction quantity daily aggregates for each business process. The low minimums for Receive and Voucher are due to the date cut-off problem: our data sets span 10/01/03 to 06/30/04, and many transactions related to the Receive and Voucher records of the first two days of our data set may have happened before 10/01/03.

Table 2: MAPE Comparison among SEM, MTSM, and Linear Regression Model
(Analysis variable: Voucher quantity variance; the MAPE is the Mean of each row.)

Model                         N    Mean        Std Dev     Minimum     Maximum
1. Simultaneous Equations     45   0.3805973   0.3490234   0.0089706   2.0227909
2. Multivariate Time Series   47   0.3766894   0.3292023   0.0147789   1.9099106
3. Linear Regression          45   0.3632158   0.3046678   0.0366894   1.7602224

Table 3A: False Negative Error Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude   SEM with Error Correction   SEM without Error Correction
10%               90%                         91.25%
50%               78.75%                      78.75%
100%              33.75%                      40%
200%              12.5%                       16.25%
400%              0                           10%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as: (total number of undetected errors) / 8 (the number of seeded errors) * 100%.

Table 3B: Detection Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude   SEM with Error Correction   SEM without Error Correction
10%               10%                         8.75%
50%               21.25%                      21.25%
100%              66.25%                      60%
200%              87.5%                       83.75%
400%              100.00%                     90%

The detection rate indicates the percentage of errors that have been successfully detected.
Chart 1A: Anomaly Detection Comparison between Simultaneous Equation Models (SEM) with and without Error Correction — False Negative Error Rate
[Chart of the false negative error rates in Table 3A for SEM_Error_Correction and SEM_No_Error_Correction across seeded error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 1B: Anomaly Detection Comparison between Simultaneous Equation Models (SEM) with and without Error Correction — Detection Rate
[Chart of the detection rates in Table 3B for the same two models and error magnitudes.]

Table 4: False Positive Error Rates of Simultaneous Equation Models with and without Error Correction

Error Magnitude   SEM with Error Correction   SEM without Error Correction
10%               0                           0
50%               0                           0
100%              0                           0
200%              0                           0
400%              0                           0

The false positive error rate indicates the percentage of non-errors that are reported by the AP model as errors. It is calculated as: (total number of reported non-errors) / 8 (which is the number of seeded errors) * 100%.

Table 5A: False Negative Error Rates of MTSM with and without Error Correction

Error Magnitude   MTSM with Error Correction   MTSM without Error Correction
10%               96.25%                       95%
50%               71.25%                       75%
100%              32.5%                        40%
200%              8.75%                        42.5%
400%              0                            37.5%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as: (total number of undetected errors) / 8 (which is the number of seeded errors) * 100%.

Table 5B: Detection Rates of MTSM with and without Error Correction

Error Magnitude   MTSM with Error Correction   MTSM without Error Correction
10%               3.75%                        5%
50%               28.75%                       25%
100%              67.50%                       60%
200%              91.25%                       57.50%
400%              100.00%                      62.50%

The detection rate indicates the percentage of errors that have been successfully detected. It is calculated as: 100% - false negative error rate.

Chart 2A: Anomaly Detection Comparison between MTSM with and without Error Correction — False Negative Error Rate
[Chart of the false negative error rates in Table 5A for MTSM_Error_Correction and MTSM_No_Error_Correction across seeded error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 2B: Anomaly Detection Comparison between MTSM with and without Error Correction — Detection Rate
[Chart of the detection rates in Table 5B for the same two models and error magnitudes.]

Table 6: False Positive Error Rates of MTSM with and without Error Correction

Error Magnitude   MTSM with Error Correction   MTSM without Error Correction
10%               0                            0
50%               0                            2.5%
100%              0                            2.5%
200%              0                            1.25%
400%              0                            0

The false positive error rate indicates the percentage of non-errors that are reported by the AP model as errors. It is calculated as: (total number of reported non-errors) / 8 (which is the number of seeded errors) * 100%.

Table 7A: False Negative Error Rates of Linear Regression Model with and without Error Correction

Error Magnitude   Linear Regression with Error Correction   Linear Regression without Error Correction
10%               95%                                       92.5%
50%               68.75%                                    76.25%
100%              33.75%                                    45%
200%              17.5%                                     28.75%
400%              2.5%                                      21.25%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as: (total number of undetected errors) / 8 (which is the number of seeded errors) * 100%.
Table 7B: Detection Rates of Linear Regression Model with and without Error Correction

Error Magnitude   Linear Regression with Error Correction   Linear Regression without Error Correction
10%               5%                                        7.50%
50%               31.25%                                    23.75%
100%              66.25%                                    55%
200%              82.50%                                    71.25%
400%              97.50%                                    78.75%

The detection rate indicates the percentage of errors that have been successfully detected. It is calculated as: 100% - false negative error rate.

Chart 3A: Anomaly Detection Comparison between Linear Regression Model with and without Error Correction — False Negative Error Rate
[Chart of the false negative error rates in Table 7A for Linear_Regression_Error_Correction and Linear_Regression_No_Error_Correction across seeded error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 3B: Anomaly Detection Comparison between Linear Regression Model with and without Error Correction — Detection Rate
[Chart of the detection rates in Table 7B for the same two models and error magnitudes.]

Table 8: False Positive Error Rates of Linear Regression with and without Error Correction

Error Magnitude   Linear Regression with Error Correction   Linear Regression without Error Correction
10%               0                                         0
50%               0                                         0
100%              0                                         0
200%              0                                         0
400%              0                                         0

The false positive error rate indicates the percentage of non-errors that are reported by the AP model as errors. It is calculated as: (total number of reported non-errors) / 8 (which is the number of seeded errors) * 100%.

Table 9A: False Negative Error Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               90.00%                   96.25%                     95%
50%               78.75%                   71.25%                     68.75%
100%              33.75%                   32.5%                      33.75%
200%              12.50%                   8.75%                      17.5%
400%              0                        0                          2.5%

The false negative error rate indicates the percentage of errors that are not detected by the AP model. It is calculated as: (total number of undetected errors) / 8 (which is the number of seeded errors) * 100%.

Table 9B: Detection Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               10.00%                   3.75%                      5%
50%               21.25%                   28.75%                     31.25%
100%              66.25%                   67.50%                     66.25%
200%              87.50%                   91.25%                     82.50%
400%              100.00%                  100.00%                    97.50%

The detection rate indicates the percentage of errors that have been successfully detected. It is calculated as: 100% - false negative error rate.

Chart 4A: Anomaly Detection Comparison of SEM, MTSM and Linear Regression — False Negative Error Rate
[Chart of the false negative error rates in Table 9A for SEM, MTSM and Linear Regression across seeded error magnitudes of 10%, 50%, 100%, 200%, and 400%.]

Chart 4B: Anomaly Detection Comparison of SEM, MTSM and Linear Regression — Detection Rate
[Chart of the detection rates in Table 9B for the same three models and error magnitudes.]

Table 10: False Positive Error Rates of SEM, MTSM, and Linear Regression

Error Magnitude   Simultaneous Equations   Multivariate Time Series   Linear Regression
10%               0                        0                          0
50%               0                        0                          0
100%              0                        0                          0
200%              0                        0                          0
400%              0                        0                          0

The false positive error rate indicates the percentage of non-errors that are reported by the AP model as errors. It is calculated as: (total number of reported non-errors) / 8 (which is the number of seeded errors) * 100%.
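For reference, the error-rate definitions repeated in the notes to Tables 3 through 10 amount to the following relations, written here with a generic denominator for the number of seeded errors:

```latex
\text{False negative rate} \;=\; \frac{\#\,\text{seeded errors not detected}}{\#\,\text{seeded errors}} \times 100\%,
\qquad
\text{Detection rate} \;=\; 100\% \;-\; \text{false negative rate},
\qquad
\text{False positive rate} \;=\; \frac{\#\,\text{non-errors flagged as errors}}{\#\,\text{seeded errors}} \times 100\%.
```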
51 VII: Appendix: Multivariate Time Series Model with All Parameter Estimates (No Restriction Model) Equation Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Parameter AR1_1_1 AR1_1_2 AR1_1_3 AR2_1_1 AR2_1_2 AR2_1_3 AR3_1_1 AR3_1_2 AR3_1_3 AR4_1_1 AR4_1_2 AR4_1_3 AR5_1_1 AR5_1_2 AR5_1_3 AR6_1_1 AR6_1_2 AR6_1_3 AR7_1_1 AR7_1_2 AR7_1_3 AR8_1_1 AR8_1_2 AR8_1_3 AR9_1_1 AR9_1_2 AR9_1_3 AR10_1_1 AR10_1_2 AR10_1_3 AR11_1_1 AR11_1_2 AR11_1_3 AR12_1_1 AR12_1_2 AR12_1_3 AR13_1_1 AR13_1_2 AR13_1_3 AR14_1_1 AR14_1_2 Estimate -0.16037 0.773021 0.056123 -0.03406 0.093277 -0.07466 0.005592 0.105725 -0.0933 -0.17319 0.084414 -0.17791 -0.14743 0.179332 -0.14668 -0.19199 0.104713 -0.02084 0.089424 0.105111 -0.22252 0.162881 -0.00173 0.092181 -0.01444 -0.2778 0.140415 -0.06404 0.215473 0.052242 0.137301 0.120841 0.070449 0.06304 -0.03973 -0.12179 -0.21532 0.134275 -0.06533 -0.15346 -0.2001 StdErr 0.186429 0.170199 0.161157 0.178949 0.203196 0.165114 0.171766 0.197241 0.161175 0.18603 0.188326 0.151337 0.194655 0.197046 0.138626 0.198044 0.201667 0.137556 0.183408 0.198981 0.142094 0.167633 0.188214 0.140709 0.197208 0.202377 0.136786 0.220088 0.284546 0.138868 0.25217 0.277529 0.149185 0.251674 0.277996 0.151897 0.228969 0.251694 0.149511 0.229446 0.250259 tValue -0.86023 4.541852 0.348252 -0.19033 0.459047 -0.45219 0.032554 0.536021 -0.57885 -0.93098 0.448234 -1.17561 -0.75738 0.9101 -1.0581 -0.96941 0.519238 -0.1515 0.48757 0.528246 -1.566 0.971649 -0.00917 0.65512 -0.07324 -1.37267 1.026532 -0.29099 0.757252 0.376195 0.544478 0.435417 0.472223 0.250484 -0.14291 -0.80181 -0.94037 0.533484 -0.43696 -0.66885 -0.79958 Probt 0.396966301 0.0001 0.730255749 0.85042111 0.649743948 0.654615021 0.974261525 0.596177159 0.567318487 0.35982232 0.65743388 0.249649354 0.45515182 0.370538137 0.299052293 0.340638372 0.607675379 0.880666117 0.629650204 0.601490467 0.128581577 0.339544558 0.992744711 0.517737052 0.942134933 0.180752008 0.313428089 0.773200909 0.455224862 0.709607308 0.590423095 0.666597994 0.640427632 0.804041662 0.887381603 0.429415981 0.355071398 0.597908337 0.665491493 0.50907024 0.430686579 Variable Voucher(t-1) Receive(t-1) Order(t-1) Voucher(t-2) Receive(t-2) Order(t-2) Voucher(t-3) Receive(t-3) Order(t-3) Voucher(t-4) Receive(t-4) Order(t-4) Voucher(t-5) Receive(t-5) Order(t-5) Voucher(t-6) Receive(t-6) Order(t-6) Voucher(t-7) Receive(t-7) Order(t-7) Voucher(t-8) Receive(t-8) Order(t-8) Voucher(t-9) Receive(t-9) Order(t-9) Voucher(t-10) Receive(t-10) Order(t-10) Voucher(t-11) Receive(t-11) Order(t-11) Voucher(t-12) Receive(t-12) Order(t-12) Voucher(t-13) Receive(t-13) Order(t-13) Voucher(t-14) Receive(t-14) 52 Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Voucher Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive AR14_1_3 AR15_1_1 AR15_1_2 AR15_1_3 AR16_1_1 AR16_1_2 AR16_1_3 AR17_1_1 AR17_1_2 AR17_1_3 AR18_1_1 AR18_1_2 AR18_1_3 AR1_2_1 AR1_2_2 AR1_2_3 AR2_2_1 AR2_2_2 AR2_2_3 AR3_2_1 AR3_2_2 AR3_2_3 AR4_2_1 AR4_2_2 AR4_2_3 AR5_2_1 AR5_2_2 AR5_2_3 
AR6_2_1 AR6_2_2 AR6_2_3 AR7_2_1 AR7_2_2 AR7_2_3 AR8_2_1 AR8_2_2 AR8_2_3 AR9_2_1 AR9_2_2 AR9_2_3 AR10_2_1 AR10_2_2 AR10_2_3 AR11_2_1 AR11_2_2 AR11_2_3 AR12_2_1 AR12_2_2 -0.05806 0.069123 0.187622 0.087902 -0.04247 0.068967 -0.15622 0.18883 0.394092 0.169976 -0.15459 0.03755 -0.03482 -0.03055 0.020737 0.320549 0.121545 -0.01138 0.118937 0.014457 0.163507 -0.09866 0.12031 0.082318 0.04232 -0.0029 -0.02093 -0.03364 0.065528 -0.20524 0.241198 -0.18737 -0.07976 0.128889 -0.01147 0.245291 -0.04621 0.209188 0.134783 -0.13692 0.713458 -0.222 -0.18318 -0.07424 0.270723 -0.00188 -0.04491 -0.31936 0.148166 0.230829 0.268598 0.147067 0.252551 0.273247 0.146218 0.249186 0.274718 0.146612 0.251826 0.294286 0.151271 0.217119 0.198218 0.187687 0.208408 0.236646 0.192296 0.200043 0.229711 0.187708 0.216655 0.219328 0.17625 0.2267 0.229484 0.161446 0.230646 0.234866 0.1602 0.213601 0.231737 0.165486 0.195229 0.219198 0.163872 0.229672 0.235692 0.159304 0.25632 0.331388 0.161729 0.293683 0.323216 0.173745 0.293105 0.32376 -0.39182 0.299456 0.698524 0.597699 -0.16817 0.252396 -1.06839 0.757787 1.43453 1.159358 -0.61386 0.127596 -0.23016 -0.14073 0.104617 1.707894 0.58321 -0.04809 0.618512 0.072267 0.711796 -0.5256 0.555308 0.37532 0.240113 -0.01278 -0.09123 -0.20834 0.284106 -0.87385 1.505604 -0.87721 -0.34419 0.778849 -0.05877 1.119039 -0.28199 0.91081 0.57186 -0.85951 2.783469 -0.66992 -1.13262 -0.25279 0.837592 -0.01082 -0.15321 -0.98642 0.698154055 0.76680342 0.490610739 0.554844733 0.867660209 0.802578559 0.294469019 0.454909542 0.162496664 0.256101901 0.544263909 0.899380593 0.819643158 0.889092916 0.917425712 0.098722011 0.564420534 0.961986335 0.541237292 0.942903001 0.482480094 0.603303003 0.583093628 0.710250662 0.811992104 0.98989428 0.927963093 0.836473545 0.778419671 0.389635351 0.14336769 0.387837709 0.73327743 0.442601876 0.953553137 0.272631917 0.780028486 0.370170555 0.571979672 0.397355911 0.009526644 0.508396592 0.266982012 0.802280928 0.409352563 0.99144303 0.87933249 0.332374044 Order(t-14) Voucher(t-15) Receive(t-15) Order(t-15) Voucher(t-16) Receive(t-16) Order(t-16) Voucher(t-17) Receive(t-17) Order(t-17) Voucher(t-18) Receive(t-18) Order(t-18) Voucher(t-1) Receive(t-1) Order(t-1) Voucher(t-2) Receive(t-2) Order(t-2) Voucher(t-3) Receive(t-3) Order(t-3) Voucher(t-4) Receive(t-4) Order(t-4) Voucher(t-5) Receive(t-5) Order(t-5) Voucher(t-6) Receive(t-6) Order(t-6) Voucher(t-7) Receive(t-7) Order(t-7) Voucher(t-8) Receive(t-8) Order(t-8) Voucher(t-9) Receive(t-9) Order(t-9) Voucher(t-10) Receive(t-10) Order(t-10) Voucher(t-11) Receive(t-11) Order(t-11) Voucher(t-12) Receive(t-12) 53 Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Receive Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order AR12_2_3 AR13_2_1 AR13_2_2 AR13_2_3 AR14_2_1 AR14_2_2 AR14_2_3 AR15_2_1 AR15_2_2 AR15_2_3 AR16_2_1 AR16_2_2 AR16_2_3 AR17_2_1 AR17_2_2 AR17_2_3 AR18_2_1 AR18_2_2 AR18_2_3 AR1_3_1 AR1_3_2 AR1_3_3 AR2_3_1 AR2_3_2 AR2_3_3 AR3_3_1 AR3_3_2 AR3_3_3 AR4_3_1 AR4_3_2 AR4_3_3 AR5_3_1 AR5_3_2 AR5_3_3 AR6_3_1 AR6_3_2 AR6_3_3 AR7_3_1 AR7_3_2 AR7_3_3 AR8_3_1 AR8_3_2 AR8_3_3 AR9_3_1 AR9_3_2 AR9_3_3 AR10_3_1 AR10_3_2 -0.00739 -0.17556 0.08359 0.124446 -0.00485 -0.15028 0.031932 -0.02061 -0.07347 -0.07068 0.162509 -0.403 0.095081 -0.20449 0.219422 -0.02093 0.03724 0.096999 0.019438 0.154733 -0.07571 -0.22663 
-0.00724 -0.15633 0.011352 0.152747 -0.1919 0.264969 0.232692 -0.20221 0.427858 -0.07093 -0.21575 0.116105 0.329587 -0.16738 0.356457 0.22186 -0.27512 0.241143 -0.13461 -0.36707 0.135778 -0.49843 0.065586 -0.16232 -0.47439 0.383478 0.176902 0.266663 0.293128 0.174124 0.267218 0.291458 0.172557 0.268828 0.312815 0.171278 0.294126 0.31823 0.170288 0.290208 0.319943 0.170748 0.293282 0.342732 0.176174 0.203475 0.185762 0.175892 0.195311 0.221775 0.180212 0.187472 0.215276 0.175912 0.20304 0.205545 0.165175 0.212454 0.215063 0.151301 0.216152 0.220107 0.150133 0.200178 0.217175 0.155087 0.182961 0.205423 0.153574 0.215239 0.220881 0.149293 0.240212 0.310563 -0.04176 -0.65836 0.285165 0.714697 -0.01816 -0.51563 0.185049 -0.07668 -0.23488 -0.41263 0.552515 -1.26638 0.558351 -0.70463 0.685814 -0.12259 0.126976 0.283016 0.110332 0.76045 -0.40756 -1.28846 -0.03705 -0.70491 0.062995 0.814774 -0.89143 1.506259 1.14604 -0.98378 2.590337 -0.33388 -1.00321 0.767376 1.524792 -0.76044 2.374273 1.108314 -1.26682 1.554892 -0.73572 -1.78689 0.884119 -2.31572 0.296931 -1.08723 -1.97487 1.234784 0.966987811 0.515684352 0.777616586 0.480713307 0.98563778 0.610159856 0.854524356 0.939424617 0.8160131 0.683017006 0.584979682 0.21581074 0.581042391 0.486859807 0.498469878 0.903307972 0.899866486 0.779246537 0.91293285 0.453341887 0.68669768 0.208130573 0.9707106 0.486686616 0.950218399 0.422077713 0.380297082 0.143200403 0.261478928 0.33364933 0.015051624 0.740960556 0.324351539 0.449281197 0.138527803 0.453345853 0.024676001 0.277155871 0.215657318 0.131203438 0.46802254 0.084787272 0.384161153 0.028116941 0.768710913 0.28620511 0.058217519 0.22717318 Order(t-12) Voucher(t-13) Receive(t-13) Order(t-13) Voucher(t-14) Receive(t-14) Order(t-14) Voucher(t-15) Receive(t-15) Order(t-15) Voucher(t-16) Receive(t-16) Order(t-16) Voucher(t-17) Receive(t-17) Order(t-17) Voucher(t-18) Receive(t-18) Order(t-18) Voucher(t-1) Receive(t-1) Order(t-1) Voucher(t-2) Receive(t-2) Order(t-2) Voucher(t-3) Receive(t-3) Order(t-3) Voucher(t-4) Receive(t-4) Order(t-4) Voucher(t-5) Receive(t-5) Order(t-5) Voucher(t-6) Receive(t-6) Order(t-6) Voucher(t-7) Receive(t-7) Order(t-7) Voucher(t-8) Receive(t-8) Order(t-8) Voucher(t-9) Receive(t-9) Order(t-9) Voucher(t-10) Receive(t-10) 54 Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order Order AR10_3_3 AR11_3_1 AR11_3_2 AR11_3_3 AR12_3_1 AR12_3_2 AR12_3_3 AR13_3_1 AR13_3_2 AR13_3_3 AR14_3_1 AR14_3_2 AR14_3_3 AR15_3_1 AR15_3_2 AR15_3_3 AR16_3_1 AR16_3_2 AR16_3_3 AR17_3_1 AR17_3_2 AR17_3_3 AR18_3_1 AR18_3_2 AR18_3_3 -0.16732 -0.02051 -0.20287 0.115627 0.224882 0.350439 -0.06855 0.460876 -0.3442 0.147679 0.076131 0.50425 0.387588 0.036202 0.693183 0.00421 0.643019 -0.11341 0.102113 -0.29129 -0.18914 -0.2682 -0.09114 -0.54341 -0.22331 0.151566 0.275228 0.302905 0.162826 0.274686 0.303415 0.165785 0.249905 0.274708 0.163182 0.250425 0.273142 0.161714 0.251934 0.293157 0.160514 0.275643 0.298232 0.159587 0.271971 0.299837 0.160018 0.274851 0.321194 0.165103 -1.10395 -0.07452 -0.66974 0.710127 0.818685 1.154983 -0.4135 1.844203 -1.25296 0.904999 0.304008 1.846109 2.396757 0.143697 2.364547 0.026226 2.332796 -0.38026 0.639856 -1.07104 -0.63081 -1.67606 -0.33158 -1.69183 -1.35253 0.279012969 0.941126086 0.508511703 0.483498415 0.419879493 0.257859377 0.682392815 0.07575913 0.22058352 0.373187325 0.763369746 0.075473617 0.023460166 0.886769198 0.02521952 0.979263435 0.02707079 0.706622645 0.527467718 0.293298539 
0.533277278 0.104859286 0.742673334 0.101780131 0.187027552 Order(t-10) Voucher(t-11) Receive(t-11) Order(t-11) Voucher(t-12) Receive(t-12) Order(t-12) Voucher(t-13) Receive(t-13) Order(t-13) Voucher(t-14) Receive(t-14) Order(t-14) Voucher(t-15) Receive(t-15) Order(t-15) Voucher(t-16) Receive(t-16) Order(t-16) Voucher(t-17) Receive(t-17) Order(t-17) Voucher(t-18) Receive(t-18) Order(t-18) 55
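The appendix lists, for each of the three equations (Voucher, Receive, and Order), the estimate, standard error, t-value, and p-value of every lag-1 through lag-18 coefficient of the unrestricted model. A table of this shape can be produced, in outline, by fitting a three-variable VAR(18) and tabulating its statistics. The sketch below uses statsmodels purely as an illustrative stand-in for the authors' estimation software, and `daily` is assumed to be the DataFrame of daily Order, Receive, and Voucher quantity aggregates.

```python
# Hypothetical reconstruction of the "no restriction" MTSM estimates table:
# a VAR(18) in Order, Receive, and Voucher, with estimates, standard errors,
# t-values, and p-values collected side by side.
import pandas as pd
from statsmodels.tsa.api import VAR

def unrestricted_var_table(daily: pd.DataFrame, lags: int = 18) -> pd.DataFrame:
    res = VAR(daily[["Order", "Receive", "Voucher"]]).fit(lags)
    # Rows are the constant plus lag terms such as 'L1.Order', ..., 'L18.Voucher';
    # the outer column level distinguishes Estimate/StdErr/tValue/Probt and the
    # inner level the equation (Order, Receive, or Voucher).
    return pd.concat({"Estimate": res.params, "StdErr": res.stderr,
                      "tValue": res.tvalues, "Probt": res.pvalues}, axis=1)
```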