MGS 8040: Data Mining
Syllabus for Fall 2014

Instructor: Dr. Satish Nargundkar
Office: 827 College of Business
Office Hours: By appointment
Phone: (678) 644 6838
E-Mail: snargundkar@gmail.com
Website: www.nargund.com/gsu
CRN: 83254, Buckhead Center, Room 406, Thursday 4:30 – 7:00 PM

Prerequisites: MBA 7025 or equivalent, or permission of the instructor. You must already know basic statistics, including Regression Analysis, to succeed in this course.

Text:
1. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition, by Gordon Linoff and Michael Berry. Wiley. ISBN-10: 0470650931, ISBN-13: 978-0470650936.

The following optional books/sites may also be helpful:
2. Making Sense of Data II by Glenn Myatt & Wayne Johnson, John Wiley & Sons, 2009.
3. Multivariate Data Analysis by Hair, Anderson, Tatham, & Black, Prentice Hall.
4. http://statsoft.com/textbook/stathome.html
5. The Little SAS Book by Delwiche and Slaughter.

Course Catalog Description
This course covers analytical techniques for extracting managerial information from large data warehouses. It discusses a number of well-defined data mining tasks, such as classification, estimation, prediction, affinity grouping and clustering, and data visualization. It also addresses design and implementation issues for corporate data warehousing.

Detailed Course Description
Data mining supports decision making by detecting patterns, devising rules, identifying new decision alternatives, and making predictions. This course is organized around a number of well-defined data mining tasks: description, classification, estimation, prediction, and affinity grouping and clustering. Students will learn to use techniques such as Rule Induction (classification trees), Logistic Regression, Discriminant Analysis, and Neural Networks. Data visualization techniques will be used whenever possible to reveal patterns and relationships.
Students will use commercially available software tools to mine large databases and will work on team-based projects.

The course is organized into three broad areas:
1) Context: Decision Support for Strategic/Tactical Decision-making; Data/Information Organization; Data Warehouse Design.
2) Exploratory Analysis: Segmentation Techniques.
3) Forecasting/Segmentation: Modeling Techniques; Transforming Analysis into Actions.

Learning Outcomes/Course Objectives
Upon completion of the course, students will be able to:
1. Explain in their own words a general framework for decision support within organizations.
2. Discuss the sources of data, problems with data, and how to overcome them (Data Cleaning).
3. Understand business requirements, organizational structure, and how data mining projects may fit into a client's organization to meet its decision support needs.
4. Explain the data mining methodology and use it to analyze a dataset.
5. Use visual techniques to describe data.
6. Explain the assumptions of various techniques, such as Cluster Analysis, Multiple Regression, Discriminant Analysis, Logistic Regression, and Artificial Neural Networks.
7. Build multiple regression, discriminant analysis, and logistic regression models for forecasting.
8. Validate models using the Kolmogorov-Smirnov (K-S) test.
9. Compare and contrast Neural Networks with statistical techniques.
10. Interpret classification trees.
11. Use interaction detection methods, such as CART and CHAID, for classification.
12. Segment data using Cluster Analysis and interpret the output.
13. Identify underlying factors using Factor Analysis and interpret them.
14. Discuss issues in implementing the results of various techniques.
15. Develop methods to monitor the ongoing performance of implemented models.

Methods of Instruction:
The course will combine lectures and discussion with guest lectures from industry experts. The team-based project will be emphasized, and case studies will be discussed.
Grading:
Assignments: 20%
Tests (2): 50%
Team Project: 20%
Final Exam: 10%

Course Average / Grade
97+: A+
94-96: A
90-93: A-
87-89: B+
83-86: B
80-82: B-
77-79: C+
73-76: C
70-72: C-
60-69: D
Less than 60: F

Late work will receive only partial credit, with 10% deducted for each day of delay.

Software: Students are encouraged to do project work in SAS or R to develop a marketable skill. You may choose other software (SPSS is available at GSU) if you wish. SAS will be discussed in class.

General Policies:
1. Students are expected to attend each class (who knows, you may actually enjoy the class!), arrive on time, and participate in class discussions.
2. Turn off cell phones, pagers, stereos, TVs, etc. when in class. Treat the instructor and each other with courtesy.

Course Assessment: Your constructive assessment of this course plays an indispensable role in shaping education at Georgia State. Upon completing the course, please take the time to fill out the online course evaluation.

MGS 8040 Data Mining
Tentative Schedule – Fall 2014

Overview / Understanding Data
Week 1 (8/28): Introduction – DM Overview
Week 2 (9/4): Regression Review; Understanding Credit Data – Equifax / Experian / TransUnion
Week 3 (9/11): The Initial Client Meeting
Week 4 (9/18): Introduction to SAS; SAS Training at UCLA
  Readings/Assignments/Notes for Weeks 1-4: Notes; Notes – Simple Regression; Notes – Multiple Regression; Exercise; Review Regression Analysis Notes; Notes – Initial Client Meeting; Hair Chapter 2; Sample Design Exercise; Solution to Exercise; Data Cleaning; Notes – Basic SAS Analysis; Assignment 1: Application – Dep. Var, Outcome, Sample time frame; Assignment 2: SAS assignment; Folder Instructions; The Little SAS Book by Delwiche & Slaughter; Data1 subset in Excel
Week 5 (9/25): Guest Lecture: State of the Art of Analytics and Big Data. Bill Franks, Chief Analytics Officer, Teradata Corp.
Week 6 (10/2): Data Cleaning; Data Warehouse Introduction
  Readings/Assignments/Notes: Assignment 3: Crosstabs, Dummy; Dummy Variable Definition; Books by Edward Tufte; decisions; Gallery of Data Visualization; Class Handout; WHO visualization
Week 7 (10/9): Test 1

Modeling/Validation/Forecasting
Week 8 (10/16): Discriminant Analysis; Validation – KS Test
  Readings/Assignments/Notes: Hair, Chapter 4; Assignment 4: Discrim, KS; www.statsoft.com; SAS Programs for Regression/Scoring
Week 9 (10/23): Guest Lecture: Logistic Regression and Classification Trees. Gregg Weldon, Chief Analytics Officer, Analytics IQ Inc.
  Readings/Assignments/Notes: Intro to Logistic Regression; Logistic Regression; Classification Trees
Week 10 (10/30): Effectiveness of models – review of methods; Neural Networks
  Readings/Assignments/Notes: A Research Paper on Model Effectiveness; Excel file (demo of logic); www.statsoft.com

Segmentation
Week 11 (11/6): Segmentation
  Readings/Assignments/Notes: Hair, Cluster Analysis; Assignment 5: Clustering; Cluster Analysis; www.statsoft.com; SPSS Output (Cluster); Factor Analysis; Memory Based Reasoning; Clustering Paper; Project Progress Report (informal, oral)
Week 12 (11/13): Test 2
Week 13 (11/20): Project Presentations
11/27: Thanksgiving Break
Week 14 (12/4): Monitoring Reports; Review
  Project Reports Due [Guidelines]; Sample Final Project
Week 15 (12/11): Final Exam – Comprehensive – 4:15 – 6:45 PM

Appendix A
MSA Program Goals and Objectives

Goals: Students completing the MS in Analytics will:
G1: Understand organizational problems in general and associated analytical problems in particular.
G2: Be proficient in the management of data needed for decision-making.
G3: Be proficient in the methodological skills needed for data-driven decision-making.
G4: Understand the implementation issues that accompany analytical problem solving.
G5: Be able to demonstrate the positive impact of analytics on organizations.

Objectives/Learning Outcomes (LO): After finishing the program, students are expected to have mastered the knowledge and skills to carry out the following analytical tasks:
LO1 Frame Business Problems (G1): MSA students will properly frame a business problem.
LO2 Frame Analytical Problems (G1): MSA students will demonstrate the ability to properly solve analytical problems.
LO3 Data Management (G2): MSA students will effectively acquire, clean, and manage both structured and unstructured data.
LO4 Methodology (G3): MSA students will identify and apply the appropriate methodology for the business and analytical problem(s) identified.
LO5 Modeling (G3, G4): MSA students will build and deploy analytical models across organizations that fit the underlying organizational needs and the analytical problem(s) identified.
LO6 Programming (G4): MSA students will solve analytical problems using computer programming, both by employing available tools where possible and by developing customized solutions where necessary.
LO7 Life Cycle Management (G3, G4): MSA students will develop adaptable models that allow for continued organizational improvement in productivity and quality.
LO8 Organizational Impact (G5): MSA students will effectively communicate the positive, strategic impact of a model on the firm to which it is being applied.