G54DMT – Data Mining Techniques and Applications
http://www.cs.nott.ac.uk/~jqb/G54DMT
Dr. Jaume Bacardit
jqb@cs.nott.ac.uk

Topic 3: Data Mining
Lecture 5: Regression and Association Rules

Some slides from chapter 5 of Data Mining: Concepts and Techniques by Han & Kamber

Outline of the lecture
• Regression
  – Definition
  – Representations
• Association rules
  – Definition
  – Methods
• Resources

Regression
• Regression problems are supervised problems where the output variable is continuous
• Many techniques with different names are included in this category
  – Regression
  – Function approximation
  – Modelling
  – Curve fitting
• Given an input vector X and a corresponding output y, we want to find a function f such that y' = f(X) is as close as possible to the true y

Evaluating regression
• Supervised learning: we know the true outputs, so we check how far the predictions are from them
  – Mean Absolute Error: MAE = (1/n) · Σᵢ |f(Xᵢ) − yᵢ|
  – Mean Squared Error: MSE = (1/n) · Σᵢ (f(Xᵢ) − yᵢ)²
  – Root Mean Squared Error: RMSE = √MSE

Linear Regression
• Most classic (and most widespread in statistics) type of regression
• f(X) is modelled as
  – y' = w0 + w1·x1 + w2·x2 + … + wn·xn
[Figure: linear regression fit – http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/400px-Linear_regression.png]

Linear regression
• Simple but limited in expressive power
  – The same linear model would fit each of the four datasets of Anscombe's quartet (http://en.wikipedia.org/wiki/Anscombe%27s_quartet)

Linear regression
• How do we find W?
  – Many mathematical methods available
    • Least squares (an illustrative code sketch follows the regression slides)
    • Ridge regression
    • Lasso
    • Etc.
  – We can also use some kind of metaheuristic (e.g. a Genetic Algorithm)

Polynomial regression
• More complex and sophisticated functions
  – y = w0 + w1·x + w2·x² + …
  – y = w0 + w1·x1 + w2·x2 + w3·x1·x2 + …
• Now the job is twofold
  – Choosing the correct function (human inspection may help)
  – Adjusting the weights of the model
• Still, would a single mathematical function fit any type of data?

Piece-wise regression
• The input space is partitioned into regions
• A local regression model is generated from the training examples that fall inside each region
  – [Figure: approximating a sine function with linear regressions (Butz, 2010)]

Piece-wise regression
• How to partition the input space
  – Using a series of rules
    • With a (hyper)rectangular condition (XCSF)
    • With a (hyper)ellipsoidal condition (XCSF, LWPR)
    • With a neural condition (XCSF)
  – Using a tree-like structure (CART, M5)
• How to perform the regression process for each local approximation
  – Pick any of the functions discussed before
  – Plus some truly non-linear methods (SVR)

Piece-wise approximation with hyperellipsoids
• Using XCSF (Wilson, 02) with hyperellipsoid conditions (Butz et al., 08)
  – [Figure: test function and XCSF's population (Stalph et al., 2010)]

Other regression methods
• Neural networks
  – An MLP is natively a regression method
    • Classification is done by discretising the output of the network
  – It has been proven that an MLP with enough hidden nodes can approximate any continuous function
• Support Vector Regression
  – As in SVM, depending on the kernel we get linear or non-linear regression
  – The margin specifies a tube around the approximated function. All points inside the tube have their errors ignored
  – Support Vectors are the points that lie outside the tube
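Illustrative code (not part of the original slides): a minimal Python/numpy sketch that fits a least-squares linear model y' = w0 + w1·x1 + w2·x2 and reports the MAE, MSE and RMSE defined in the "Evaluating regression" slide. The synthetic dataset, function names and use of numpy are assumptions for illustration only.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares: find w minimising ||Xw - y||^2.

    A column of ones is appended so that w[0] plays the role of the intercept w0.
    """
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

def predict(w, X):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return X1 @ w

def regression_errors(y_true, y_pred):
    """The three error measures from the 'Evaluating regression' slide."""
    mae = np.mean(np.abs(y_pred - y_true))
    mse = np.mean((y_pred - y_true) ** 2)
    rmse = np.sqrt(mse)
    return mae, mse, rmse

# Illustrative data: y depends linearly on two inputs plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

w = fit_linear_regression(X, y)
mae, mse, rmse = regression_errors(y, predict(w, X))
print("weights:", w)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
```

The same error measures apply unchanged to any of the other regression models on these slides (polynomial, piece-wise, SVR); only the fitting step differs.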
Association Rules
• Association rule mining looks for patterns of items that frequently appear together in the dataset
• It can use the class label but it does not have to, so we can consider it an unsupervised learning paradigm
• Two types of elements are generated
  – Association rules: they have an antecedent and a consequent
  – Frequent itemsets: they just have an antecedent
• Both antecedent and consequent are logic predicates (generally of conjunctive form)

Association rules mining
• Witten and Frank, 2005 (http://www.cs.waikato.ac.nz/~eibe/Slides2edRev2.zip)

Origin of Association Rules
• These methods were originally employed to analyse shopping carts
• The database is specified as a set of transactions, each of which includes one or more of a set of items
• A frequent itemset is a set of items that appears in many transactions
• These databases are extremely sparse

  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E

Beers and diapers
• An urban myth about association rules says that, when applied to analyse a very large volume of shopping carts, they discovered a very simple pattern
  – "Customers that buy beer also tend to buy diapers"
• This story has changed through time. You can find an article about it here
• It is a good example of data mining, as it was able to find an unexpected pattern

Why Is Freq. Pattern Mining Important?
• Discloses an intrinsic and important property of data sets
• Forms the foundation for many essential data mining tasks
  – Association, correlation, and causality analysis
  – Sequential, structural (e.g., sub-graph) patterns
  – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  – Classification: associative classification
  – Cluster analysis: frequent pattern-based clustering
  – Data warehousing: iceberg cube and cube-gradient
  – Semantic data compression: fascicles
  – Broad applications

Evaluation of association rules
• Support
  – Percentage of examples covered by the predicate in the antecedent
  – Applies to both association rules and frequent itemsets
• Confidence
  – Percentage of the examples matched by the antecedent that also match the consequent
  – Only applies to association rules
• Typically, the user specifies a minimum support and confidence and the algorithm finds all rules above the thresholds

Scalable Methods for Mining Frequent Patterns
• The downward closure property of frequent patterns
  – Any subset of a frequent itemset must be frequent
  – If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  – i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
• Scalable mining methods: three major approaches
  – Apriori (Agrawal & Srikant @ VLDB'94)
  – Frequent pattern growth (FP-growth – Han, Pei & Yin @ SIGMOD'00)
  – Vertical data format approach (CHARM – Zaki & Hsiao @ SDM'02)

Apriori: A Candidate Generation-and-Test Approach
• Apriori pruning principle: if there is any itemset which is infrequent, its supersets should not be generated/tested! (Agrawal & Srikant @ VLDB'94; Mannila et al. @ KDD'94)
• Method (an illustrative code sketch follows this slide):
  – Initially, scan the DB once to get the frequent 1-itemsets
  – Generate length (k+1) candidate itemsets from length k frequent itemsets
  – Test the candidates against the DB
  – Terminate when no frequent or candidate set can be generated
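Illustrative code (not part of the original slides): a minimal Python sketch of the candidate-generation-and-test loop described above, run on the toy transaction database used in the worked example on the next slide. The candidate step here simply joins items of frequent k-itemsets and prunes by downward closure, a simplified version of the classic join-and-prune step; all function and variable names are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: support count} for all frequent itemsets."""
    # 1st scan: count single items and keep the frequent 1-itemsets (L1)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Generate length (k+1) candidates from items in frequent k-itemsets,
        # pruning any candidate that has an infrequent k-subset (downward closure)
        items = sorted({i for s in frequent for i in s})
        candidates = set()
        for combo in combinations(items, k + 1):
            if all(frozenset(sub) in frequent for sub in combinations(combo, k)):
                candidates.add(frozenset(combo))
        # Scan the DB to count candidate support
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:          # candidate contained in the transaction
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_sup}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

# Toy transaction database from the example slide, with min support = 2
tdb = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
freq = apriori(tdb, min_sup=2)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), "support =", sup)

# Confidence of the rule {B, C} -> {E}: support({B,C,E}) / support({B,C})
conf = freq[frozenset("BCE")] / freq[frozenset("BC")]
print("confidence of {B,C} -> {E} =", conf)   # 2/2 = 1.0
```

Running it reproduces L1, L2 and L3 from the next slide, and the final lines show how support counts also give rule confidence as defined in the evaluation slide.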
The Apriori Algorithm – An Example (min support = 2)

Database TDB (1st scan):
  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E

C1 (candidate 1-itemsets): {A}:2  {B}:3  {C}:3  {D}:1  {E}:3
L1 (frequent 1-itemsets):  {A}:2  {B}:3  {C}:3  {E}:3

C2 (candidate 2-itemsets): {A,B}  {A,C}  {A,E}  {B,C}  {B,E}  {C,E}
2nd scan:                  {A,B}:1  {A,C}:2  {A,E}:1  {B,C}:2  {B,E}:3  {C,E}:2
L2 (frequent 2-itemsets):  {A,C}:2  {B,C}:2  {B,E}:3  {C,E}:2

C3 (candidate 3-itemsets): {B,C,E}
3rd scan, L3:              {B,C,E}:2

The Apriori Algorithm
• Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent 1-itemsets};
  for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
      increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
  end
  return ∪k Lk;

Resources
• "The Elements of Statistical Learning" by Hastie et al. contains a lot of detail about statistical regression
• List of regression and association rule methods in KEEL
• Weka also contains both kinds of methods
• Chapter 5 of the Han and Kamber book is all about association rules (Han created the FP-growth method)
• Review of evolutionary algorithms for association rule mining

Questions?