Adult data set, 18, 176
Allele, 240-241
ANOVA table, 46
Back-propagation, 250
Balancing the data, 212, 298-299
Baseball data set, 68
Bayesian approach, 204-206
Bayesian belief networks, 227-234
Boltzmann selection, 246
California data set, 74
Case study: modeling response to direct mail marketing, 265-316
  case study: business understanding phase, 267-270
    building the cost/benefit table, 267-270
    direct mail marketing response problem, 267
    false negative, 269
    false positive, 269
    true negative, 268
    true positive, 268
  case study: data understanding and data preparation phases, 270-289
    clothing-store data set, 270
    deriving new variables, 277-278
    exploring the relationship between the predictors and the response, 278-286
    investigating the correlation structure among the predictors, 286-289
    Microvision life style clusters, 271
    product uniformity, 272
    standardization and flag variables, 276-277
    transformations to achieve normality or symmetry, 272-275
  case study: modeling and evaluation phases, 289-312
    balancing the training data set, 298-299
    cluster analysis: BIRCH clustering algorithm, 294-298
    cluster profiles, 294-297
    combining models using the mean response probabilities, 308-312
    combining models: voting, 304-306
    comprehensive listing of input variables, 291
    establishing baseline model performance, 299-300
    model collection A: using the principle components, 300-302
    model collection B: non-PCA models, 306-308
    model collections A and B, 298
    outline of modeling strategy, 289-290
    overbalancing as a surrogate for misclassification costs, 302-304
    partitioning the data set, 290
    principle component profiles, 292-293
    principle components analysis, 292
  CRISP-DM, 265-267
  summary of case study chapter, 312-315
Cereals data set, 34, 94
Chromosomes, 240-241
Churn data set, 163, 207
Clementine software, xiii
Clothing-store data set, 267
Cluster analysis: BIRCH clustering algorithm, 294-298
Coefficient of determination, 39-43
Combining models using the mean response probabilities, 308-312
Combining models: voting, 304-306
Conditional independence, 216
Cook's distance, 52
Correlation coefficient, 45
Cost/benefit table, 267-270
CRISP-DM, xiii, 265-267
Crossover, 240
Crossover operator, 241
Crossover point, 242
Crossover rate, 242
Crowding, 245
Data mining, what is, xi
Datasets
  adult, 18, 176
  baseball, 68
  California, 74
  cereals, 34, 94
  churn, 163, 207
  clothing-store, 267
  houses, 5
Deriving new variables, 277-278
Dimension reduction methods, 1-32
  factor analysis, 18-23
    Bartlett's test of sphericity, 19
    equimax rotation, 23
    factor analysis model, 18
    factor loadings, 18, 20
    factor rotation, 20-23
    Kaiser-Meyer-Olkin measure of sampling adequacy, 19
    oblique rotation, 23
    orthogonal rotation, 22
    principle axis factoring, 19
    quartimax rotation, 23
    varimax rotation, 21
  multicollinearity, 1, 115-123
  need for, 1-2
  principle components analysis (PCA), 2-17
    communalities, 15-17
    component matrix, 8
    component weights, 8
    components, 2
    correlation coefficient, 3
    correlation matrix, 4
    covariance, 3
    covariance matrix, 3
    eigenvalue criterion, 10
    eigenvalues, 4
    eigenvectors, 4
    how many components to extract, 9-12
    minimum communality criterion, 16
    orthogonality, 9
    partial correlation coefficient, 5
    principle component, 4
    profiling the principle components, 13-15
    proportion of variance explained criterion, 10
    scree plot criterion, 11
    standard deviation matrix, 3
    validation of the principle components, 17
  summary of dimension reduction methods chapter, 25-27
  user-defined composites, 23-25
    measurement error, 24
    summated scales, 24
Discovering Knowledge in Data: An Introduction to Data Mining (Daniel Larose, Wiley, 2005), xi, 1, 18, 33, 268, 294
Discrete crossover, 249
Elitism, 246
Empirical rule, 1
Estimated regression equation, 35
Estimation error, 36
False negative, 269
False positive, 269
Fitness, 241
Fitness function, 241
Fitness sharing function, 245
Generation, 242
Genes, 240-241
Genetic algorithms, 240-264
  basic framework of a genetic algorithm, 241-242
    crossover point, 242
    crossover rate, 242
    generation, 242
    mutation rate, 242
    roulette wheel method, 242
  genetic algorithms for real variables, 248-249
    discrete crossover, 249
    normally distributed mutation, 249
    simple arithmetic crossover, 248
    single arithmetic crossover, 248
    whole arithmetic crossover, 249
  introduction to genetic algorithms, 240-241
    allele, 240-241
    chromosomes, 240-241
    crossover, 240
    crossover operator, 241
    fitness, 241
    fitness function, 241
    genes, 240-241
    Holland, John, 240
    locus, 240
    mutation, 241
    mutation operator, 241
    population, 241
    selection operator, 241
  modifications and enhancements: crossover, 247
    multipoint crossover, 247
    positional bias, 247
    uniform crossover, 247
  modifications and enhancements: selection, 245-246
    Boltzmann selection, 246
    crowding, 245
    elitism, 246
    fitness sharing function, 245
    rank selection, 246
    selection pressure, 245
    sigma scaling, 246
    tournament ranking, 246
  simple example of a genetic algorithm at work, 243-245
  summary of genetic algorithm chapter, 261-262
  using genetic algorithms to train a neural network, 249-252
    back-propagation, 250
    modified discrete crossover, 252
    neural network, 249-250
  WEKA: hands-on analysis using genetic algorithms, 252-261
High leverage point, 49
Houses data set, 5
Inference in regression, 57-63
Influential observation, 51
Learning in a Bayesian network, 231
Least-squares estimates, 36-39
Leverage, 49
Likelihood function, 158
Locus, 240
Logistic regression, 155-203
  assumption of linearity, 174-177
  higher order terms to handle non-linearity, 183-189
  inference: are the predictors significant?, 161-162
    deviance, 160
    saturated model, 160
    Wald test, 161
  interpreting logistic regression model, 162-174
    for a continuous predictor, 170-174
    for a dichotomous predictor, 163-166
    for a polychotomous predictor, 166-170
    odds, 162
    odds ratio, 162
    reference cell coding, 166
    relative risk, 163
    standard error of the coefficients, 165-166
  interpreting logistic regression output, 159
  maximum likelihood estimation, 158
    likelihood function, 158
    log likelihood, 158
    maximum likelihood estimators, 158
  multiple logistic regression, 179-183
  simple example of logistic regression, 156-168
    conditional mean, 156-157
    logistic regression line, 156-157
    logit transformation, 158
    sigmoidal curves, 157
  summary of logistic regression chapter, 197-199
  validating the logistic regression model, 189-193
  WEKA: hands-on analysis using logistic regression, 194-197
  zero-cell problem, 177-179
Logistic regression line, 156-157
Logit transformation, 158
Mallows' Cp statistic, 131
Maximum a posteriori classification (MAP), 206-215
Maximum likelihood estimation, 158
Mean squared error (MSE), 43
Minitab software, xiii-xiv
MIT Technology Review, xi
Multicollinearity, 1, 115-123
Multiple logistic regression, 179-183
Multiple regression and model building, 93-154
  adjusting the coefficient of determination, 113
  estimated multiple regression equation, 94
  inference in multiple regression, 100
    confidence interval for a particular coefficient, 104
    f-test, 102
    t-test, 101
  interpretation of coefficients, 96
  Mallows' Cp statistic, 131
  multicollinearity, 116-123
    variance inflation factors, 118-119
  multiple coefficient of determination, 97
  multiple regression model, 99
  regression with categorical predictors, 105-116
    analysis of variance, 106
    dummy variable, 106
    indicator variable, 106
    reference category, 107
  sequential sums of squares, 115
  SSE, SSR, SST, 97
  summary of multiple regression and model building chapter, 147-149
  using the principle components as predictors, 142
  variable selection criteria, 135
  variable selection methods, 123-135
    all possible subsets procedure, 126
    application to cereals dataset, 127-135
    backward elimination procedure, 125
    best subsets procedure, 126
    forward selection procedure, 125
    partial f-test, 123
    stepwise procedure, 126
Multipoint crossover, 247
Mutation, 241
Mutation operator, 241
Mutation rate, 242
Naïve Bayes classification, 215-223
Naïve Bayes estimation and Bayesian networks, 204-239
  Bayesian approach, 204-206
    Bayes, Reverend Thomas, 205
    frequentist or classical approach, 204
    marginal distribution, 206
    maximum a posteriori method, 206
    non-informative prior, 205
    posterior distribution, 205
    prior distribution, 205
  Bayesian belief networks, 227-234
    conditional independence in Bayesian networks, 227
    directed acyclic graph, 227
    joint probability distribution, 231
    learning in a Bayesian network, 231
    parent node, descendant node, 227
    using the Bayesian network to find probabilities, 229-232
  maximum a posteriori classification (MAP), 206-215
    balancing the data, 212
    Bayes theorem, 207
    conditional probability, 207
    joint conditional probabilities, 209
    MAP estimate, 206-207
    posterior odds ratio, 210-211
  naïve Bayes classification, 215-223
    adjustment for zero frequency cells, 218-219
    conditional independence, 216
    log posterior odds ratio, 217-218
    numeric predictors, 219-223
    verifying the conditional independence assumption, 218
  summary of chapter on naïve Bayes estimation and Bayesian networks, 234-236
  WEKA: hands-on analysis using naïve Bayes, 223-226
  WEKA: hands-on analysis using the Bayes net classifier, 232-234
Neural network, 249-250
Non-informative prior, 205
Normally distributed mutation, 249
Odds, 162
Odds ratio, 162
Outlier, 48
Overbalancing as a surrogate for misclassification costs, 302-304
Partitioning the data set, 290
Population, 241
Positional bias, 247
Posterior distribution, 205
Posterior odds ratio, 210-211
Prediction error, 36
Principle components analysis (PCA), 2-17
Prior distribution, 205
Rank selection, 246
Regression coefficients, 35
Regression modeling, 33-92
  ANOVA table, 46
  coefficient of determination, 39-43
  correlation coefficient, 45
  estimated regression equation, 35
  estimation error, 36
  example of simple linear regression, 34
  inference in regression, 57-63
    confidence interval for the mean value of y given x, 60
    confidence interval for the slope, 60
    prediction interval, 61
    t-test for the relationship between x and y, 58
  least-squares estimates, 36-39
    error term, 36
    least-squares line, 36
    true or population regression equation, 36
  mean squared error (MSE), 43
  outliers, high leverage points, influential observations, 48-55
    Cook's distance, 52
    high leverage point, 49
    influential observation, 51
    leverage, 49
    outlier, 48
    standard error of the residual, 48
    standardized residual, 48
  prediction error, 36
  regression coefficients, 35
  regression model, 55-57
    assumptions, 55
  residual error, 36
  slope of the regression line, 35
  standard error of the estimate (s), 43-44
  sum of squares error (SSE), 40
  sum of squares regression (SSR), 41-42
  sum of squares total (SST), 41
  summary of regression modeling chapter, 84-86
  transformations to achieve linearity, 79-84
    Box-Cox transformations, 83
    bulging rule, 79, 81
    ladder of reexpressions, 79
    Scrabble, 79-84
  verifying the regression assumptions, 63-68
    Anderson-Darling test for normality, 65
    normal probability plot, 63
    patterns in the residual plot, 67
    quantile, 64
  y-intercept, 35
Regression with categorical predictors, 105-116
Relative risk, 163
Residual error, 36
Roulette wheel method, 242
Scrabble, 79-84
Selection operator, 241
Selection pressure, 245
Sigma scaling, 246
Simple arithmetic crossover, 248
Single arithmetic crossover, 248
Slope of the regression line, 35
Software
  Clementine software, xiii
  Minitab software, xiii-xiv
  SPSS software, xiii-xiv
  WEKA software, xiii-xiv
SPSS software, xiii-xiv
Standard error of the estimate (s), 43-44
Steck, James, xiv
Tournament ranking, 246
Transformations to achieve linearity, 79-84
Transformations to achieve normality or symmetry, 272-275
True negative, 268
True positive, 268
Uniform crossover, 247
User-defined composites, 23-25
Variable selection methods, 123-135
Variance inflation factors, 118-119
Website, companion, xii, xiv
WEKA software, xiii-xiv
WEKA: hands-on analysis
  Bayes net classifier, 232-234
  Genetic algorithms, 252-261
  Logistic regression, 194-197
  Naïve Bayes, 223-226
White-box approach, xi
Whole arithmetic crossover, 249
www.dataminingconsultant.com, xii
y-intercept, 35
ZDNET news, xi