MGS 8040: Data Mining - Specialized Master's Programs

MGS 8040: Data Mining
Syllabus for Fall 2014
Instructor: Dr. Satish Nargundkar
Office: 827 College of Business
Office Hours: By appointment
Phone: (678) 644 6838
E-Mail: snargundkar@gmail.com
Website: www.nargund.com/gsu
CRN: 83254, Buckhead Center, Room 406
Thursday 4:30 – 7:00 PM
Prerequisites: MBA 7025 or equivalent or permission of instructor. You must already have knowledge of
basic statistics, including Regression Analysis, to succeed in this course.
Text:
1. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management,
   3rd Edition, by Gordon Linoff and Michael Berry. Wiley. ISBN-10: 0470650931,
   ISBN-13: 978-0470650936.
The following optional books/sites may also be helpful.
2. Making Sense of Data II by Glenn Myatt & Wayne Johnson, John Wiley & Sons, 2009.
3. Multivariate Data Analysis by Hair, Anderson, Tatham, & Black, Prentice Hall.
4. http://statsoft.com/textbook/stathome.html
5. The Little SAS Book by Delwiche and Slaughter.
Course Catalog Description
This course covers various analytical techniques to extract managerial information from large data
warehouses. A number of well-defined data mining tasks such as classification, estimation, prediction,
affinity grouping and clustering, and data visualization are discussed. Design and implementation issues
for corporate data warehousing are also addressed.
Detailed Course Description
Data mining supports decision making by detecting patterns, devising rules, identifying new decision
alternatives and making predictions. This course is organized around a number of well-defined data
mining tasks: description, classification, estimation, prediction, and affinity grouping and clustering.
Students will learn to use techniques such as Rule Induction (classification trees), Logistic Regression,
Discriminant Analysis, and Neural Networks. Data visualization techniques will be used whenever
possible to reveal patterns and relationships. Students will use commercially available software tools to
mine large databases. Team-based projects will be conducted.
The course is organized into 3 broad areas as follows:
1) Context: Decision Support for Strategic/Tactical Decision-making. Data/Information Organization.
Data Warehouse Design.
2) Exploratory Analysis: Segmentation Techniques
3) Forecasting/Segmentation: Modeling Techniques, Transforming analysis into actions
Learning Outcomes/Course Objectives
Upon completion of the course, students will be able to:
1. Explain in their own words a general framework for decision support within organizations.
2. Discuss the sources of data, problems with data, and how to overcome them (Data Cleaning).
3. Understand business requirements, organization structure, and how data mining projects may fit
into a client’s organization to meet their decision support needs.
4. Explain the data mining methodology; use it to analyze a dataset.
5. Use visual techniques to describe data.
6. Explain the assumptions of various techniques such as Cluster Analysis, Multiple Regression,
Discriminant Analysis, Logistic Regression, and Artificial Neural Networks.
7. Build multiple regression, discriminant analysis, and logistic regression models for forecasting.
8. Validate models using the Kolmogorov-Smirnov (K-S) test.
9. Compare and contrast Neural Networks with statistical techniques.
10. Interpret classification trees.
11. Use interaction detection methods such as CART and CHAID for classification.
12. Segment data using Cluster Analysis, and interpret the output.
13. Identify underlying factors using Factor Analysis, and interpret the results.
14. Discuss issues of implementation of the results of various techniques.
15. Develop methods to monitor the ongoing performance of implemented models.
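As an illustration of outcome 8, one common way to apply the K-S idea to a scoring model is to measure the largest gap between the cumulative score distributions of the two outcome classes. The sketch below is only that: the function name `ks_statistic`, the 0/1 label coding, and the sample data are assumptions made for illustration, and Python is used here purely for convenience (the course itself discusses SAS).

```python
def ks_statistic(scores, labels):
    """Two-sample K-S statistic: the maximum absolute gap between the
    cumulative score distributions of the two classes.

    labels uses a hypothetical coding: 1 for the event class
    (e.g. "bad" accounts), 0 for everything else.
    """
    n_event = sum(labels)
    n_non_event = len(labels) - n_event
    if n_event == 0 or n_non_event == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels))
    cum_event = 0
    cum_non_event = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every record tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_event += 1
            else:
                cum_non_event += 1
            j += 1
        gap = abs(cum_event / n_event - cum_non_event / n_non_event)
        best = max(best, gap)
        i = j
    return best


# Made-up validation sample: model scores and observed outcomes.
sample_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
sample_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(sample_scores, sample_labels))  # 0.75
```

A value near 1 means the scores separate the two classes almost completely; a value near 0 means the score distributions of the two classes are nearly identical.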
Methods of Instruction:
The course will combine lectures and discussion, plus guest lectures from industry experts. The team-based project will be emphasized, and case studies will be discussed.
Grading:
Assignments      20%
Tests (2)        50%
Team Project     20%
Final Exam       10%

Course Average   Grade
97+              A+
94-96            A
90-93            A-
87-89            B+
83-86            B
80-82            B-
77-79            C+
73-76            C
70-72            C-
60-69            D
Less than 60     F
Late work will get partial credit only, with 10% less for each day of delay.
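For anyone who wants to check the arithmetic, here is a minimal sketch of how the weights, the grade scale, and the late-work rule could be combined. Several details are assumptions rather than stated policy: the scale is read as A- for 90-93, C- for 70-72, and D for 60-69; the lower bound of each band is used as the cutoff with no rounding; and the late penalty is taken as 10% of the earned score per day, floored at zero. The function names and sample scores are hypothetical.

```python
# Component weights from the Grading section.
WEIGHTS = {
    "assignments": 0.20,
    "tests": 0.50,
    "team_project": 0.20,
    "final_exam": 0.10,
}

# Lower bound of each grade band, highest first.
GRADE_BANDS = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def late_score(score, days_late):
    """Assumed reading of the late policy: lose 10% of the earned
    score for each day of delay, never dropping below zero."""
    return score * max(0.0, 1.0 - 0.10 * days_late)


def course_average(scores):
    """Weighted average of the four components, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Map a course average to a letter (assumption: no rounding)."""
    for lower_bound, letter in GRADE_BANDS:
        if average >= lower_bound:
            return letter
    return "F"


# Hypothetical component scores.
scores = {"assignments": 90, "tests": 85, "team_project": 95, "final_exam": 80}
average = course_average(scores)
print(average, letter_grade(average))  # 87.5 B+
```

Under these assumptions, an assignment that earned 80 and was turned in two days late would count as `late_score(80, 2)`, or 64.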
Software: Students are encouraged to do project work in SAS or R in order to develop a marketable skill.
You may choose other software (SPSS is available at GSU) if you wish. SAS will be discussed in class.
General Policies:
1. Students are expected to attend each class (who knows, you may actually enjoy the class!), arrive
on time and participate in class discussions.
2. Turn off cell phones, pagers, stereos, TVs, etc. when in class. Treat the instructor and each other
with courtesy.
Course Assessment:
Your constructive assessment of this course plays an indispensable role in shaping education at Georgia
State. Upon completing the course, please take the time to fill out the online course evaluation.
MGS 8040 Data Mining Tentative Schedule – Fall 2014
Topic / Date / Readings / Assignments
Overview / Understanding Data
Week 1: 8/28
Introduction – DM Overview
Week 2: 9/4
Regression Review
Understanding Credit Data – Equifax / Experian / TransUnion
Week 3: 9/11
The Initial Client Meeting
Week 4: 9/18
Introduction to SAS
SAS Training at UCLA
Readings and assignments, Weeks 1–4:
Notes
Notes – Simple Regression
Notes – Multiple Regression
Exercise
Review Regression Analysis Notes
Notes – Initial Client Meeting
Hair Chapter 2
Sample Design Exercise
Solution to Exercise
Data Cleaning
Notes – Basic SAS Analysis
1. Application – Dep. Var, Outcome, Sample time frame
2. SAS assignment
Folder Instructions
The Little SAS Book by Delwiche & Slaughter
Data1 subset in Excel
Week 5: 9/25
Guest Lecture: State of the art of Analytics and Big Data.
Bill Franks, Chief Analytics Officer, Teradata Corp.
Week 6: 10/2
Data Cleaning
Data Warehouse introduction
3. Crosstabs, Dummy decisions
Dummy Variable Definition
Books by Edward Tufte.
Gallery of Data Visualization
Class Handout
WHO visualization
Week 7: 10/9
Test 1
Modeling/Validation/Forecasting
Week 8: 10/16
Hair, Chapter 4
4. Discrim, KS
Discriminant Analysis
www.statsoft.com
Validation – KS Test
SAS Programs for Reg/Scoring
SAS Programs for Regression/Scoring
Week 9: 10/23
Guest Lecture: Logistic Regression and Classification Trees
Gregg Weldon, Chief Analytics Officer, Analytics IQ Inc.
Intro to Logistic Regression, Logistic Regression, Classification Trees
Week 10: 10/30
Effectiveness of models – review of methods
A Research Paper on Model Effectiveness
Neural Networks
Excel file (demo of logic)
www.statsoft.com
Segmentation
Week 11: 11/6
Segmentation
Hair, Cluster Analysis
5. Clustering
Cluster Analysis
www.statsoft.com
SPSS Output (Cluster)
Factor Analysis
Memory Based Reasoning
Clustering Paper
Project Progress Report (informal, oral)
Week 12: 11/13
Test 2
Week 13: 11/20
Project Presentations
11/27
Thanksgiving Break
Week 14: 12/4
Monitoring Reports Review
Project Reports Due
[Guidelines]
Sample Final Project
Week 15: 12/11
Final Exam – Comprehensive – 4:15 – 6:45 PM.
Appendix A
MSA Program Goals and Objectives
Goals:
Students completing the MS in Analytics will:
G1 Understand organizational problems in general and associated analytical problems in
   particular.
G2 Be proficient in the management of data needed for decision-making.
G3 Be proficient with the methodological skills needed for data-driven decision-making.
G4 Understand the implementation issues that accompany analytical problem solving.
G5 Be able to demonstrate the positive impact of analytics on organizations.
Objectives/Learning Outcomes (LO): After finishing the program students are expected to
have mastered the knowledge and skills to carry out the following analytical tasks:
LO1 Frame Business Problems (G1). MSA students will properly frame a business problem.
LO2 Frame Analytical Problems (G1). MSA students will demonstrate the ability to properly
    solve analytical problems.
LO3 Data Management (G2). MSA students will effectively acquire, clean, and manage both
    structured and unstructured data.
LO4 Methodology (G3). MSA students will identify and apply the appropriate methodology
    for the business and analytical problem(s) identified.
LO5 Modeling (G3, G4). MSA students will build and deploy analytical models across
    organizations that fit the underlying organizational needs and the analytical problem(s)
    identified.
LO6 Programming (G4). MSA students will solve analytical problems by utilizing computer
    programming, both by employing available tools where possible and by developing
    customized solutions where necessary.
LO7 Life Cycle Management (G3, G4). MSA students will develop adaptable models that
    allow for continued organizational improvement of productivity and quality.
LO8 Organizational Impact (G5). MSA students will effectively communicate the positive,
    strategic impact of a model on the firm to which it is being applied.