B8114 Applied Regression Analysis
FALL 2014, 2nd Half

PROFESSOR
Peter Kolesar
Office Location: 314 Uris
Office Phone: 212 854 4105
Fax: 212 316 9180
E-mail: pjk4@columbia.edu
Class Meetings: Thursdays, 2:30 to 5:00 pm
Office Hours: Thursdays, noon to 2 pm, and by appointment. Skype conferences are welcome.

TEACHING ASSISTANTS
To be named

REQUIRED COURSE MATERIAL
See the course description for more information.
Recommended Textbook: Sampit Chatterjee, Ali S. Hadi and Bertram Price, Regression Analysis by Example, 4th edition (Wiley 2006), ISBN 978-0-471-74696-6
Recommended Computer Software: Minitab 17 (a free 30-day trial version may be downloaded from http://it.minitab.com/en-us/products/minitab/free-trial.aspx)

PREREQUISITE
This course will use, build on and extend concepts covered in the MBA core statistics course B6100 or its equivalent, that is, the basics of probability and statistical inference. Students who qualified out of the core statistics course will generally be accepted, but should consult with the professor before the start of term or in the first class.

COURSE DESCRIPTION
This course studies the family of statistical methods called regression analysis and is a logical successor to the core course B6100 Managerial Statistics. It is frequently taken by second-year MBA students who wish to solidify and extend their quantitative and statistical data analysis skills. Regression analysis is used to build statistical models of the relationships between variables; these models can deepen our understanding of the causes of a phenomenon and, when they work, predict future outcomes. In business, the ultimate goal of regression analysis is often to support better decision making.
In the contemporary world of ‘big data’, regression provides foundational methods and ideas for many of the techniques used in ‘data mining.’ Regressions have been used in financial analyses of investment opportunities, in marketing analyses of customer behavior, in human resources to test the fairness of employment policies, in operations to identify the determinants of product quality, and in strategic planning to create sales forecasts. Regression models are also widely used in the sciences, economics and engineering. Although contemporary computing hardware and statistical software have made it extraordinarily easy to produce regression analyses mechanically (Microsoft Excel, for example, has a powerful regression tool that can be used without any knowledge of the underlying concepts or theory), it is a challenge to create a regression model that is truly useful and reliable. The explicit goal of this course is to learn how, in a business context, to create reliable, valid and useful regressions, and to judge the validity and usefulness of regressions done by others.

The course premise is that successful applications of regression require an understanding of both the practical problem situation and the underlying statistical theory. The course blends theory and applications, avoiding the extremes of presenting unneeded theory in isolation or of giving application tools without the foundation needed for practical understanding. The course integrates three topics. The first and most basic is an approach to data and data analysis that is based on statistical theory, the scientific method and some pragmatic epistemology. The second is regression analysis mechanics and theory, including extensions of the basic linear regression model to logistic regression, non-linear models and multivariate methods. The third is forecasting of time series from historical data.
We will seek to introduce some elements of modern ‘big data/data mining’ as time permits. The title of our textbook describes our approach: Regression Analysis by Example. Concepts and procedures will generally be introduced by example. Moreover, we will emphasize applications in which the business context matters.

Computing: The course will be computationally hands-on from the very first lecture, and your laptop computer will be used for all data analysis. Some of the course work, at least at the outset, can be done in Excel, and we assume a basic familiarity with its data analysis tools and capabilities. However, a statistical software package offers advantages and conveniences, and several important regression procedures cannot be done in Excel, so we will supplement it with the Minitab statistical analysis system. Minitab provides professional statistical analysis capabilities while being inexpensive and very easy to learn and use. A further advantage of Minitab is the ease with which it interfaces with Microsoft Excel, Word and PowerPoint. Any version of Minitab, or indeed any other software that can do regression, stepwise regression and logistic regression, will be adequate. Students who are already familiar with, or have access to, another software package with these capabilities (e.g., STATA, BMDP, SAS, S4, JMP IN, R) are welcome to use it instead.

Conduct of the Course

Course Project: A major part of the course will be a term project consisting of a significant regression-oriented data analysis in a real business context. I will provide a standard ‘default’ project. However, I suggest that students with particular application interests propose their own project, as this can greatly increase the value you get out of the course. The term project can be done either individually or by a team of two. Specifications and timing for the final project report will be provided in class.
Workload and Grading: Students are expected to attend class regularly and participate fully in class discussions. Since many of these discussions will be based on our analytic homework assignments (mini-cases), it is important that assigned work be done thoroughly and on time. Most regular homework will be of the Business School’s “Type A” variety, but with the group size limited to a maximum of two people. Each group makes a single submission, and both members receive the same grade. You also have the option of doing these exercises individually. Some homework, specified in advance, will need to be done individually. There will be one short electronically administered exam. In class I expect professional comportment appropriate to a serious learning environment. At the same time, I intend that we all have fun while learning. The overall workload should be moderate, but as in any serious learning endeavor, you will benefit from the course in proportion to what you put in.

The final course grade will be composed of four components:
Exam 15%
Attendance and class participation 15%
Written Assignments 35%
Term Project 35%

Textbooks and Software
The course will follow the same general outline and notation as the textbook Regression Analysis by Example by Chatterjee, Hadi and Price, listed below. In assignments and lectures we will often refer to the book as RAE. RAE is a good resource and reference, and it goes into greater depth on some topics than we will have time for in class. The textbook strikes a balance between providing theoretical understanding and keeping a concrete focus on applications. While we may reference some of the book’s examples in class, we will frequently use different examples, so the book will offer complementary views and illustrations of some issues and procedures.
We strongly recommend purchasing it; however, it is possible to do very well in this course without owning the textbook, since it will be on reserve in the Business School Library. There are a good number of excellent books on regression, and if you already own another one, it may suffice. As an alternative, I recommend the excellent Introduction to Linear Regression Analysis by Montgomery, Peck and Vining, which is at a slightly higher mathematical level than RAE. As stated above, in addition to Excel we will use the Minitab statistical package, which comes with a helpful user’s manual.

Textbook: Sampit Chatterjee, Ali S. Hadi and Bertram Price, Regression Analysis by Example, 4th edition (Wiley 2006), ISBN 978-0-471-74696-6
Computer Software: Minitab 17 (a free 30-day trial version may be downloaded from www.minitab.com, http://www.minitab.com/products/minitab/demo/)

COURSE OBJECTIVES
The primary objective of the course is to enable you to carry out meaningful regression analyses in a business context and to be a knowledgeable consumer of such analyses done by others on your behalf or on behalf of your firm. Another goal is to provide a foundation for other data mining techniques such as regression trees and discriminant analysis. In the process, the course should greatly enhance your understanding of, and comfort with, variation, statistics and probability.

For further information, contact me, Professor Peter Kolesar, preferably first by email; I will follow up by phone or Skype.
pjk4@columbia.edu
212 854 4105 (office) or 845 557 6307 (alternative phone number)