Data Mining in Forecasting

Chapter 9: DATA MINING
PAULA JENSEN
SDSM&T ENGM 745
McGraw-Hill/Irwin. Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

DATA MINING
- DATA-DATA
- Extracting useful information from large databases
- Tools of data mining
- Looking at where to find the data

TOOLS OF DATA MINING
- Prediction
- Classification
- Clustering
- Association

PREDICTION
- Predict the value of a numeric variable, such as a customer's expenditure
- Will they purchase?
- What are their interests?
- Do their interests predict a purchase?

CLASSIFICATION
- Classes of objects or actions
- Reliability of customer
- Income
- Location

CLUSTERING
- Analysis tools analyze objects viewed as a class
- Where is the cutoff for income or size?
- How do I group the information?

ASSOCIATION
- Patterns based on likes
- Netflix
- Facebook
- Google

CLASSIFICATION TECHNIQUES
- k-nearest neighbor
- Naïve Bayes
- Classification/regression trees
- Logistic regression

DATA MINING TERMINOLOGY
[Slides 9-10 to 9-12: terminology figures, not reproduced here]

K-NEAREST NEIGHBOR
- Use a subset of the total data, called the training data
- Select the closest neighbors using Euclidean distance (shown on the previous slide); other distance metrics are also available for defining neighbors
- Validation data is a separate set of data
- The test statistic that matters is the one on the validation data, not the one on the training data
- A split of 60% training data and 40% validation data is an acceptable mix

K-NEAREST NEIGHBOR ANALYSIS
- Multidimensional
- The program computes a distance associated with each attribute
- Continuous variables are measured on different scales
- Categorical attributes use a weighting mechanism
- Example: will a customer respond to marketing and take a loan?
- k = 3 means that 3 neighbors are used to classify each record
- Type 1 would take a loan; Type 0 would not take a loan

TERMS
- Lift: measures the change in concentration of a particular class when the model is used to select a group from the general population. The example shows significant lift.
- Decile-wise chart: if we pick the top 10% of records as ranked by our model, our selection would include approximately 7 times as many correct classifications as a random selection.

CLASSIFICATION TREES
- Advantages
  - Decision rules are easy
  - Easy to understand
- Disadvantages
  - Can overfit the data
  - Correlated attributes will cause multicollinearity

NAÏVE BAYES
- Statistical classification
- Bayes' theorem: predicts the probability of a prior event given that a certain subsequent event has taken place
- Called naïve because each attribute is assumed to be independent

BAYES' THEOREM
P(A|B) = P(B|A) * P(A) / P(B)
- P(A) is the prior probability of A
- P(A|B) is the conditional probability of A given B
- P(B|A) is the conditional probability of B given A
- P(B) is the prior probability of B

APPLYING BAYES' THEOREM

REGRESSION
- Logistic regression, or logit analysis
- The difference between logistic regression and ordinary regression is that the dependent variable in logistic regression is categorical, not continuous
- The dependent variable is dichotomous: either yes or no
- The dependent variable is limited to values between 0 and 1

WHERE DO I FIND THE DATA???
- Current customer activity
  - Collect in your database
  - Family names
  - Sales software
  - Forms from your website (Wufoo.com)
  - Track inquiries
- Current Facebook activity
- BUY IT!
  - Mailing lists
- How to use it???
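
The k-nearest neighbor slides describe splitting the data 60/40 into training and validation sets, measuring Euclidean distance, and classifying each record by its k = 3 nearest neighbors. The following is only a minimal sketch of that idea in plain Python. The records, the two attribute names (income and age), and the helper names are hypothetical, and the attributes are assumed to be already rescaled to a common 0 to 1 range.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training records.

    `train` is a list of (attributes, label) pairs.
    """
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def split_60_40(records, seed=0):
    """Shuffle and split records into 60% training and 40% validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (income, age), both rescaled to 0..1.
# Label 1 = would take a loan, label 0 = would not.
records = [
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.70), 1),
    ((0.95, 0.75), 1), ((0.70, 0.85), 1),
    ((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.30), 0),
    ((0.25, 0.20), 0), ((0.30, 0.15), 0),
]

train, valid = split_60_40(records)

# Accuracy is measured on the validation data, not the training data.
correct = sum(knn_classify(train, x, k=3) == y for x, y in valid)
validation_accuracy = correct / len(valid)
print(f"validation accuracy: {validation_accuracy:.2f}")
```

Using an odd k with two classes avoids tied votes. With real data the attributes would need rescaling first, since the slides note that continuous variables are measured on different scales.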
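
The Bayes' theorem formula on the slides can be checked with a small worked calculation. The sketch below uses made-up probabilities for illustration only: A is "customer takes a loan" and B is "customer responded to a mailing". None of these numbers come from the slides.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (assumptions, not taken from the slides).
p_a = 0.10              # P(A): prior probability a customer takes a loan
p_b_given_a = 0.60      # P(B|A): responded to the mailing, given a loan
p_b_given_not_a = 0.20  # P(B|not A): responded, given no loan

# P(B) from the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(B) = {p_b:.2f}, P(A|B) = {posterior:.2f}")
```

Under these assumed numbers, seeing a response raises the probability of a loan from the 10% prior to 25%.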
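
Lift in the top decile, as described under TERMS, can be computed by ranking records by model score and comparing the rate of the target class in the top 10% with its rate in the whole population. This is a sketch under assumptions: the function name, scores, and outcomes are hypothetical, and the result here is 5, not the roughly 7 times quoted for the slide example.

```python
def top_decile_lift(scores, actuals):
    """Lift of the top 10% of records (ranked by score) over the base rate.

    `scores` are model scores (higher means more likely class 1);
    `actuals` are the true 0/1 outcomes in the same order.
    """
    ranked = sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    overall_rate = sum(actuals) / len(actuals)
    if overall_rate == 0:
        raise ValueError("no positive outcomes; lift is undefined")
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical data: 20 records, 4 of which are class 1.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
          0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]
actuals = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lift = top_decile_lift(scores, actuals)
print(f"top-decile lift: {lift:.1f}")
```

Here the top decile holds 2 records, both class 1, against a base rate of 4 in 20, so selecting by model score concentrates the target class five-fold.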
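
The REGRESSION slide notes that in logistic regression the dependent variable is a yes/no outcome and that values are limited to the range 0 to 1. One way to see this is through the logistic function, which maps any linear score to a probability. The coefficients and the single income attribute below are hypothetical, chosen only to illustrate the shape; they are not fitted to any data.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(income, intercept=-4.0, slope=0.08):
    """Probability of 'yes' from a one-attribute logit model.

    `intercept` and `slope` are assumed values for illustration.
    """
    return logistic(intercept + slope * income)


for income in (10, 50, 100):
    p = predict_probability(income)
    answer = "yes" if p >= 0.5 else "no"
    print(f"income={income}: P(yes)={p:.3f} -> {answer}")
```

With these assumed coefficients an income of 50 gives a linear score of 0 and therefore a probability of exactly 0.5, the usual cutoff between the two classes.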
