Boosting and Bagging for Fun and Profit
Hal Elkins, David Lucus, Keith Walker

Ensemble Methods
- Improve the predictive performance of a given statistical model-fitting technique.
- Run a base procedure many times while changing the input data.
- The final estimate is a linear or nonlinear combination of the iteration estimates.
- Originally used in machine learning and in data and text mining.
- Attracting attention due to their relative simplicity and the popularity of bootstrapping.
- Are ensemble methods useful to academic researchers?

Bagging
- Bootstrap aggregating, for improving unstable estimation schemes (Breiman 1996).
- Reduces the variance of the base procedure (Bühlmann and Yu 2002).
- Bagging requires user-specified input models.
- Step 1: Construct a bootstrap sample, drawn with replacement.
- Step 2: Compute the estimator on that sample.
- Step 3: Repeat, then average the estimates.
- Cost: the base procedure's bias is increased.
- (A minimal code sketch appears in the appendix at the end of this deck.)

Boosting
- Proposed by Schapire (1990) and Freund (1995).
- Nonparametric optimization: useful when we have no prior idea of a model.
- Reduces the bias of the base procedure.
- Step 1: Initialize by applying the base procedure.
- Step 2: Compute the residuals and fit the base procedure to them.
- Step 3: Repeat.
- Cost: variance is increased.
- (A minimal code sketch appears in the appendix at the end of this deck.)

Enterprise Miner: Bagging & Boosting

Enterprise Miner: Bagging
- Control: the Ensemble node (under the Model menu).
- Inputs: the outputs of other models, e.g.
  - Regression
  - Decision Tree
  - Neural Network
- Settings: limited, and very much a black box.

Enterprise Miner: Bagging (viewing output)
- Connect the Ensemble output to a Regression node.
- Use a Model Comparison node to compare the bagged model with its input models.
- Note: the Ensemble model will only outperform its input models if there is large disagreement among them (AAEM 6.1 course notes).
- [Screenshots: Year 1 analysis; Year 2 analysis]

Enterprise Miner: Boosting
- Control: the Gradient Boosting node (under the Model menu).
- Input: a Data Partition node or a data set.
- Settings (many): assessment measures, tree size settings, number of iterations, etc.

Enterprise Miner: Boosting (outputs)
- Lists variable importance.
- Lists the number of decision rules containing each variable.
- Hook the output to a Regression node for more information.
- Compare with other models via the Model Comparison node.
- Example:
  - Gradient boosting regression AIC: -5868.21
  - Base regression AIC: -2866.43
  - Base neural network AIC: -2981.86

A Boosting Story (another data set)
- Prediction of graduation from TTU.
- Student data, 2004-2007: SAT, ACT, high school rank percentile, parent education, income level.
- Texas census-level data, matched on the student's high school county code (20+ variables).
- After boosting, one variable had predictive power for graduation from TTU.

Previous Use
- Manescu, C., Starica, C.: Do corporate social responsibility scores explain and predict firm profitability? A case study on the publishers of the Dow Jones Sustainability Indexes. Working paper, Gothenburg University (2009).
- Used bagging and boosting to determine whether CSR measures affect ROA.

Model Comparisons (data courtesy of Dr. Romi)
Dependent variable: (Change in CSR performance)_{i,t+n}

OLS:
= α + β1·[CSO]_{i,t} + β2·COMMITTEE_{i,t} + β3·ΔSIZE_{i,t+1} + β4·ΔROA_{i,t+1} + β5·ΔFIN_{i,t+1} + β6·ΔLEV_{i,t+1} + β7·GLOBAL_{i,t} + β8·CEOCHAIR_{i,t} + β9·HIER_{i,t} + β10·ESI_{i,t} + β11·LITIGATION_{i,t} + β12·EXPERT_{i,t} + ε

Boosting:
= α + β1·[CSO]_{i,5} + β2·COMMITTEE_{i,t} + β3·GLOBAL_{i,t} + β4·CEOCHAIR_{i,t} + β5·HIER_{i,t} + ε

Bagging (all):
= α + β1·[CSO]_{i,4} + β2·[CSO]_{i,5} + β3·COMMITTEE_{i,t} + β4·GLOBAL_{i,t} + β5·EXPERT_{i,t} + ε

Is Either Useful to Us?
- What we thought: both are useful in model selection and refinement.
- What we concluded:
  - Bagging for settling among different model possibilities.
  - Bagging helps gauge model disagreement for "black box" models.
  - Boosting for grounded theory.
  - Boosting as a starting point for a model.
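Appendix: Code Sketches

The deck demonstrates bagging through Enterprise Miner's GUI nodes; for readers who want the three bagging steps spelled out, here is a minimal sketch, assuming Python with scikit-learn rather than the Ensemble node. The toy data, the regression tree chosen as the unstable base procedure, and the 50 bootstrap iterations are all illustrative assumptions, not the models used in the analyses above.

```python
# Minimal bagging sketch (assumes Python + scikit-learn, not the
# Enterprise Miner Ensemble node). Toy data and settings are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                       # hypothetical inputs
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

n_boot = 50
fits = []
for _ in range(n_boot):
    # Step 1: construct a bootstrap sample, drawn with replacement.
    idx = rng.integers(0, len(y), size=len(y))
    # Step 2: compute the estimator (an unstable base procedure) on it.
    fits.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Step 3 (aggregation): the bagged estimate averages the iterations.
bagged_pred = np.mean([f.predict(X) for f in fits], axis=0)
print("bagged in-sample MSE:", np.mean((bagged_pred - y) ** 2))
```

Averaging over bootstrap fits is what reduces the variance of the unstable base procedure, which is why the deck pairs bagging with high-variance inputs such as trees and neural networks.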
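Likewise for boosting: the sketch below implements the initialize / fit-to-residuals / repeat loop from the Boosting slide as plain L2 (squared-error) boosting with shallow trees. The shrinkage value, tree depth, and toy data are assumptions for illustration; Enterprise Miner's Gradient Boosting node plays this role in the deck's examples.

```python
# Minimal L2 boosting sketch (assumes Python + scikit-learn; Enterprise
# Miner's Gradient Boosting node is the production equivalent in the deck).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # hypothetical inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

nu, n_iter = 0.1, 100                               # shrinkage, iterations
# Step 1: initialize by applying the base procedure (here a constant fit).
pred = np.full_like(y, y.mean())
for _ in range(n_iter):
    # Step 2: compute the residuals and fit the base procedure to them.
    resid = y - pred
    stump = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    # Step 3: repeat, adding a shrunken correction each round.
    pred += nu * stump.predict(X)

print("boosted in-sample MSE:", np.mean((pred - y) ** 2))
```

Each pass reduces bias by correcting what the current fit still gets wrong, at the cost of added variance (the trade-off noted on the Boosting slide).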
References
- Breiman, L.: Bagging predictors. Mach. Learn. 24, 123-140 (1996)
- Bühlmann, P., Yu, B.: Analyzing bagging. Ann. Stat. 30, 927-961 (2002)
- Freund, Y.: Boosting a weak learning algorithm by majority. Inform. Comput. 121, 256-285 (1995)
- Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5, 197-227 (1990)
- George, J., et al.: Applied Analytics Using SAS Enterprise Miner 6.1. Course notes (2009)
- Manescu, C., Starica, C.: Do corporate social responsibility scores explain and predict firm profitability? A case study on the publishers of the Dow Jones Sustainability Indexes. Working paper, Gothenburg University (2009)

Thank you! Questions?