Weka Tertius Explanation

October 19, 2010

1. Introduction

The Tertius association rule algorithm is presented in Flach & Lachiche (2001). That paper describes a statistically well-founded confirmation measure that captures both how strongly a rule is supported and how novel it is. It further discusses how this confirmation measure can be paired with a search algorithm to generate interesting association rules (both supervised and unsupervised) from datasets with nominal attributes.

The Tertius rule miner has previously been implemented for the Weka machine learning library (Deltour, 2001). This implementation makes it easy to use Tertius on any dataset in a Weka-compatible format (e.g., ARFF files). However, there is a disconnect between the Tertius algorithm discussed in Flach & Lachiche (2001) and the Weka implementation. Deltour (2001) provides only a user manual covering the command-line options for Tertius; it does not discuss the implementation itself. Weka is open source, so the source code is freely available at http://www.cs.waikato.ac.nz/ml/weka/. However, the source code has almost no documentation, which makes it very difficult to explain the results satisfactorily.

For example, we recently used this Tertius implementation on the EHC questionnaire data (???). The results looked anomalous: what we took to be the counter-example values were much higher than the confirmation values. We were unable to find any explanation for this in the Tertius paper. After extensive digging through the source code, we found that these values were not counter-examples but something completely different: true-positive values!

In this paper, we begin to discuss the Weka implementation of Tertius. In Section 2, we focus on the equations and parameters used to generate the three values associated with each association rule: (1) confirmation, (2) true-positive, and (3) false-positive.
We provide a running example on the simple, synthetic Balls dataset given in Table 1. This dataset consists of three nominal attributes: Size, Bounce, and Color. Color is used as the label attribute, which forms the head of the association rules. The other two attributes are used in the body of the rules.

Note: this discussion applies only to Tertius used to generate supervised association rules, which occurs when the setRocAnalysis flag for Weka Tertius is set to true. In the future, we may provide an equivalent section for Tertius used to generate unsupervised association rules. Armed with an understanding of the equations and parameters, in Section 3 we provide an explanation for the anomalous results on the EHC questionnaire dataset.

Table 1: Balls Dataset Used in Running Example.

Size   Bounce  Color
Small  Low     Red
Large  Low     Red
Small  High    Red
Large  High    Red
Small  Low     Blue
Large  Low     Blue
Small  High    Blue
Large  High    Blue
Small  Low     Red
Large  Low     Red

2. Tertius Equations and Parameters

As previously discussed, the Weka implementation of Tertius (referred to as wTertius from here on) outputs three values with each association rule: (1) the confirmation value, (2) the true-positive rate, and (3) the false-positive rate. Of these three values, the confirmation value is the most important because it measures how unusual or interesting the rule is. Interested readers should consult Flach & Lachiche (2001) for more details on confirmation. The other two values are included to measure the significance of the association rules. They are explained in more detail later in this section.

First, we look at the four parameters common to all the functions:

• body_counter = number of points where the body attributes match the rule
• head_counter = number of points where the head attribute does not match the rule
• m_counter = number of points where the body attributes match, but the head does not
• instances = number of points

The names for the first three are somewhat vague. The m_counter refers to the actual counterexample value discussed in Flach & Lachiche (2001). The body_counter seems to refer to the first row in the contingency table in that paper, whereas the head_counter refers to the second column. For wTertius purposes, all four parameters are involved in generating the output values.

An example of computing the four parameters on the Balls dataset is given in Table 2 for the rule Bounce = Low → Color = Red (that is, body → head). The table gives the following parameter values for this rule:

• body_counter = 6
• head_counter = 4
• m_counter = 2
• instances = 10

Table 2: Example of computing the four parameters for the rule Bounce = Low → Color = Red. B marks points counted in body_counter, H marks points counted in head_counter, and M marks points counted in m_counter.

Size   Bounce    Color
Small  Low (B)   Red
Large  Low (B)   Red
Small  High      Red
Large  High      Red
Small  Low (B)   Blue (H, M)
Large  Low (B)   Blue (H, M)
Small  High      Blue (H)
Large  High      Blue (H)
Small  Low (B)   Red
Large  Low (B)   Red

Second, we consider the confirmation value used to measure the novelty of the association rules. The confirmation value requires two more parameters: (1) the expected frequency and (2) the observed frequency. The equations for all three are given below. Both the expected and observed parameters range from 0 to 1. The constraints on expected prevent confirmation from dividing by zero. The expected parameter measures the novelty of the rules. It becomes larger when many points match the body of the rule and/or when many points do NOT match the head.
The observed parameter balances expected by penalizing the confirmation when the body is satisfied, but not the head.

$$\text{expected} = \frac{\text{body\_counter} \times \text{head\_counter}}{\text{instances}^2}$$
(roughly equal to the estimated number of counterexamples)

$$\text{observed} = \frac{\text{m\_counter}}{\text{instances}}$$
(roughly equal to the observed number of counterexamples)

$$\text{confirmation} = \begin{cases} 0 & \text{if } \text{expected} = 0 \text{ or } \text{expected} = 1 \\ \dfrac{\text{expected} - \text{observed}}{\sqrt{\text{expected}} - \text{expected}} & \text{otherwise} \end{cases}$$

An example of computing expected, observed, and confirmation on the Balls dataset is given in the bullets below for the rule Bounce = Low → Color = Red.

• expected = 6 × 4 / 100 = 0.24
• observed = 2 / 10 = 0.2
• confirmation = (0.24 − 0.2) / (√0.24 − 0.24) = 0.16

Third, we consider the true-positive and false-positive rates given as output by wTertius. The equations for both are given below. Both measure the quality of the association rule created by wTertius. The true-positive rate measures how often the entire rule is satisfied among the data points that satisfy the head, whereas the false-positive rate measures how often the body is satisfied among the data points that do not satisfy the head. Ideally, to find interesting rules, the confirmation should be higher than the true-positive and false-positive rates because we prefer novel rules over common rules, but a rule that never works (i.e., is never satisfied) on the data points is also of limited use.

$$\text{true-positive} = \frac{\text{body\_counter} - \text{m\_counter}}{\text{instances} - \text{head\_counter}}$$

$$\text{false-positive} = \frac{\text{m\_counter}}{\text{head\_counter}}$$

An example of computing true-positive and false-positive on the Balls dataset is given in the bullets below for the rule Bounce = Low → Color = Red.

• true-positive = (6 − 2) / (10 − 4) = 0.67
• false-positive = 2 / 4 = 0.5

To summarize, a good rule is one that has a high confirmation value, a high true-positive rate, and a low false-positive rate.

3. EHC Questionnaire Results

We have previously used Tertius to obtain good results on education data (Riley et al., 2009).
However, when we ran the same configuration on the EHC Questionnaire dataset (???), there were some anomalous results. Specifically, we found that the second value was 1.0 for many rules generated from the results on domain VII. Originally, we thought this value was the number of counter-instances, but we have subsequently found that it is the true-positive rate. Now that we understand all the wTertius parameters, we can provide an explanation for these results. Overall, wTertius is working as intended.

Recall that the dataset uses the Respond Parallel (RP) attribute as the label, with attributes combined from the previous three turns. Despite this merge, the number of data points with RP = 1 is extremely small: only 1 point out of 676. This lopsided label distribution is causing the high true-positive values. Take the following actual rule as an example:

• D* = 0 and πΈππΈπ = 1 → RP = 1
    o true-positive = (4 − 3) / (676 − 675) = 1
    o confirmation = 0.04 (1st)

For this rule, wTertius is working as intended. Recall that true-positive measures how often the entire rule is satisfied compared to the head. Only one data point has RP = 1, so only one data point can satisfy the entire rule, regardless of the number of points that satisfy the body. Therefore, if the number of body matches (body_counter) increases, we get a corresponding increase in m_counter without any change in the overall true-positive value (and vice versa). This explains the large number of rules with RP = 1 and true-positive = 1.

References

Deltour, A. (2001). Tertius Extension to Weka (Technical Report No. CSTR-01-001). United Kingdom: University of Bristol.

Flach, P. A., & Lachiche, N. (2001). Confirmation-Guided Discovery of First-Order Rules with Tertius. Machine Learning, 42(1-2), 61-95.

Riley, S., Miller, L. D., Soh, L., Samal, A., & Nugent, G. (2009). Intelligent Learning Object Guide (iLOG): A Framework for Automatic Empirically-Based Metadata Generation. In Artificial Intelligence in Education (pp. 515-522).
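
The quantities defined in Section 2 can be reproduced with a short script. The following is a minimal Python sketch, not the Weka source: the function names (count_parameters, confirmation, true_positive, false_positive) and the use of predicates for the rule body and head are assumptions made for illustration. It recomputes the Balls example for the rule Bounce = Low → Color = Red and the true-positive value of the EHC rule from the counts quoted in Section 3.

```python
import math

# Balls dataset from Table 1 as (Size, Bounce, Color) tuples.
BALLS = [
    ("Small", "Low", "Red"), ("Large", "Low", "Red"),
    ("Small", "High", "Red"), ("Large", "High", "Red"),
    ("Small", "Low", "Blue"), ("Large", "Low", "Blue"),
    ("Small", "High", "Blue"), ("Large", "High", "Blue"),
    ("Small", "Low", "Red"), ("Large", "Low", "Red"),
]


def count_parameters(rows, body, head):
    """Return (body_counter, head_counter, m_counter, instances).

    `body` and `head` are predicates over a row; they are hypothetical
    helpers for this sketch, not part of wTertius.
    """
    body_counter = sum(1 for r in rows if body(r))
    # head_counter counts the points that do NOT match the head.
    head_counter = sum(1 for r in rows if not head(r))
    m_counter = sum(1 for r in rows if body(r) and not head(r))
    return body_counter, head_counter, m_counter, len(rows)


def confirmation(body_counter, head_counter, m_counter, instances):
    expected = body_counter * head_counter / instances ** 2
    observed = m_counter / instances
    if expected in (0.0, 1.0):  # avoid dividing by zero
        return 0.0
    return (expected - observed) / (math.sqrt(expected) - expected)


def true_positive(body_counter, head_counter, m_counter, instances):
    return (body_counter - m_counter) / (instances - head_counter)


def false_positive(body_counter, head_counter, m_counter, instances):
    return m_counter / head_counter


# Rule: Bounce = Low -> Color = Red
params = count_parameters(
    BALLS,
    body=lambda r: r[1] == "Low",
    head=lambda r: r[2] == "Red",
)
print(params)                             # (6, 4, 2, 10)
print(round(confirmation(*params), 2))    # 0.16
print(round(true_positive(*params), 2))   # 0.67
print(round(false_positive(*params), 2))  # 0.5

# EHC rule from Section 3, using the counts quoted there:
# body_counter = 4, head_counter = 675, m_counter = 3, instances = 676.
print(true_positive(4, 675, 3, 676))      # 1.0
```

The sketch does not guard the true-positive and false-positive denominators, so it raises ZeroDivisionError when every point (or no point) matches the head. How wTertius handles those cases is not covered here.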