B8831 Applied Regression Analysis
FALL 2013, 2nd Half

PROFESSOR
Peter Kolesar
Office Location: 314 Uris
Office Phone: 212 854 4105
Fax: 212 316 9180
E-mail: pjk4@columbia.edu
Office Hours: Tuesdays, noon to 2 pm, and by appointment. Skype conferences welcomed.

TEACHING ASSISTANTS
To be named

REQUIRED COURSE MATERIAL
See course description for more information.
Recommended Textbook: Samprit Chatterjee, Ali S. Hadi and Bertram Price, Regression Analysis by Example, 4th edition (Wiley 2006), ISBN 978-0-471-74696-6
Recommended Computer Software: Minitab 16 (a free 30-day trial version may be downloaded from www.minitab.com at http://www.minitab.com/products/minitab/demo/)

PREREQUISITE
The material in this course builds on and extends concepts covered in the MBA core statistics course B6014 or the equivalent, that is, the basics of probability and statistical inference. Students who qualified out of the core statistics course will generally be accepted, but should consult with the professor at the start of term.

COURSE DESCRIPTION
This course is about the family of statistical methods called regression analysis and is a logical successor to the core B6014 Managerial Statistics course. It is frequently taken by 2nd-year MBA students wishing to solidify and extend their quantitative and statistical data analysis skills.

Regression analysis is used to build statistical models of the relationships between variables. These models can improve understanding of the causes of a phenomenon and, when they work, predict future outcomes. In business the ultimate goal of regression analysis is often to support better decision making.
In the contemporary world of ‘big data’, regression provides foundational methods and ideas for many of the techniques used in ‘data mining.’

Regressions have been used in financial analyses of investment opportunities, in marketing analyses of customer behavior, in human resources to test the fairness of employment policies, in operations to identify the determinants of product quality, and in strategic planning to create sales forecasts. Regression models are also widely used in many other fields in the sciences, economics and engineering.

Although contemporary computing hardware and statistical software have made it extraordinarily easy to produce regression analyses mechanically (for example, Microsoft Excel has a powerful regression tool that is easy to use without knowledge of the underlying concepts or theory), it is a challenge to create a regression model that is really useful and reliable. The explicit goal of this course is to learn how, in a business context, to create reliable, valid and useful regressions, and to be able to judge the validity and usefulness of regressions done by others.

The course premise is that successful applications of regression require understanding of both the practical problem situation and the underlying statistical theory. The course blends theory and applications, avoiding the extremes of presenting unneeded theory in isolation or of giving application tools without the foundation needed for practical understanding.

The course integrates three topics. First, and most basic, is an approach to data and data analysis that is based on statistical theory, the scientific method and some pragmatic epistemology. Second is regression analysis mechanics and theory, including extensions of the basic linear regression model to logistic regressions, non-linear models and multivariate methods. Third is forecasting of time series from historical data.
We will seek to introduce some elements of modern ‘big data/data mining’ as time permits.

The title of our textbook is descriptive of our approach: Regression by Example. Concepts and procedures will generally be introduced by example. Moreover, we will emphasize applications in which the business context matters.

Computing: The course will be computationally hands-on from the very first lecture. Your laptop computer will be used for all data analysis. Much of the course work, at least at the outset, can be done in Excel, and we assume a basic familiarity with its data analysis tools and capabilities. However, there are advantages and conveniences to using a statistical software package. Several important regression procedures cannot be done in Excel, so we will supplement it with the Minitab statistical analysis system. Minitab gives us professional statistical analysis capabilities while being very easy to learn and use. Any version of Minitab, or indeed any other software that can do regression, stepwise regression and logistic regression, will be adequate. Students who are already familiar with another software package that has these capabilities (e.g. STATA, BMDP, SAS, S4, JMPIN) are welcome to use it instead.

Conduct of the Course

Course Project: A major part of the course will be a data analysis project consisting of a significant data analysis in a real business context. I will provide a standard ‘default’ project. However, I strongly suggest that students who have particular interests propose their own project, as this can greatly increase the value you get out of the course. The term project will be either an individual effort or the work of a team of two. Specifications for the final project report and its timing will be provided in class.

Workload and Grading: It is expected that students will attend class regularly and participate fully in class discussions.
Since many of these discussions will be based on our analytic assignments (mini-cases), it is important that assigned work be done thoroughly and on time. Regular homeworks in this class are of the Type A variety, but with the group size limited to 2 people per group, with one submission and one grade for both members of the group. You have the option of doing these individually as well.

I will generally expect professional comportment appropriate to a serious learning environment. On the other hand, I intend that we all will have fun while learning. The overall workload should be moderate, but as in any serious learning endeavor, you will benefit from the course in proportion to what you put in. Assignments can be done individually or in teams of two students.

The final course grade will be composed of three components:

Attendance and class participation    1/3
Written Assignments                   1/3
Term Project                          1/3

Textbooks and Software

The course will follow the same general outline as the text by Chatterjee, Hadi and Price listed below. This book strikes a balance between providing a theoretical understanding and keeping a concrete focus on applications. In class we generally use different examples than those in the text, so it offers a second and complementary view on most issues and procedures. We recommend purchasing it; however, it is possible to do very well in this course without owning the textbook. It is nonetheless a good resource and reference, and it goes into greater depth than we will have time for in class. There are a number of excellent books on regression, and if you already own another, it may suffice. In addition to Excel we will use the Minitab statistical package, the software for which comes with a helpful user’s manual.

Textbook: Samprit Chatterjee, Ali S. Hadi and Bertram Price, Regression Analysis by Example, 4th edition (Wiley 2006), ISBN 978-0-471-74696-6
Computer Software: Minitab 16 (a free 30-day trial version may be downloaded from www.minitab.com at http://www.minitab.com/products/minitab/demo/)

COURSE OBJECTIVES
The primary objective of the course is to enable you to carry out meaningful regression analyses in a business context and to be a knowledgeable consumer of such analyses done by others on your behalf or on behalf of your firm. In the process the course should greatly enhance your understanding of, and comfort with, variation, statistics and probability.

For further information contact me, Professor Peter Kolesar, preferably by email at first; I will follow up by phone or Skype.
E-mail: pjk4@columbia.edu
Phone: 212 854 4105 (office) or 845 557 6307 (summer phone number)