STAT 3880 CRN 2

Course Title: Data Mining / Predictive Analytics
Quarter/Year: Spring Quarter, 2013
Course Number, Section, CRN: STAT 3880 CRN 2016 Sect. 1 / INFO 4300 CRN 4865 Sect. 1
Prerequisites: STAT 1400 Statistics II, INFO 1010 Business Statistics & Analytics
or STAT 4610 – Business Statistics
Meeting Place and Time: DCB 240 MW 4:00pm-5:50pm
Name of Professor: Dr. Kellie Keeling
Office Hours: MW 1:30p-3:45p, Virtual Office Hours as Posted, and by Appointment
Discussion Board: The General Questions area should be used to ask questions that may be
relevant to all the students in the course. The instructor will log on to the discussion board nearly
every day.
Office Location: DCB 590
E-Mail Address: Moo@statsdairy.com
Phone Number: 303-871-2296 (but please e-mail to leave me a message, as my phone eats messages)
Class Web Presence: http://statsdairy.com and http://blackboard.du.edu/
Introduction: This is a blended course. That means we will meet face-to-face on Mondays and
Wednesdays, but a substantial portion of the course material will be delivered and
completed "online." Attendance at the face-to-face meetings and participation in all online activities
is required. You will find that the online and face-to-face elements of this course are interdependent
and integrated. Online participation is required every week – you will be expected to go online to
complete reading quizzes, watch lectures, and participate in discussion. You will be assigned to a
group, and completion of some activities will require group interaction. Some activities will be face-to-face and some will be online. If you miss a face-to-face class for a legitimate reason, you may
complete the in-class group assignment on your own and submit it at the following class.
ALL STUDENTS NEED TO FOLLOW THESE EXPECTATIONS:
University of Denver Honor Code
All students are expected to abide by the University of Denver Honor Code. These expectations
include the application of academic integrity and honesty in your class participation and
assignments. The Honor Code can be viewed in its entirety at this link:
http://www.du.edu/studentlife/studentconduct/index.html
All members of the University of Denver are expected to uphold the values of Integrity, Respect,
and Responsibility. These values embody the standards of conduct for students, faculty, staff, and
administrators as members of the University community.
In order to foster an environment of ethical conduct in the University community, all community
members are expected to take “constructive action,” that is, any effort to discuss or report any
behavior contrary to the Honor Code with a neutral party. Failure to do so constitutes a violation of
the DU Honor Code. Specifically, plagiarism and cheating constitute academic misconduct and can
result in both a grade penalty imposed by the instructor and disciplinary action including suspension
or expulsion. As part of their responsibility to uphold the Honor Code, instructors reserve the right
to have papers submitted through SafeAssign to check for plagiarism against a database of papers
submitted previously at DU, a national database of papers, and the Internet.
Official Communications
The standard method of communicating official information from the Daniels College of Business to
its students is through email. Students are provided a DU account using the protocol of
firstname.lastname@du.edu, but must set up a "preferred" off-campus email address. Emails sent
to the DU account will be forwarded to the preferred email account. DU accounts do not store
messages. More information is available at: http://www.du.edu/studentemail/.
Students with Disabilities
A student who qualifies for academic accommodations because of a disability must submit a
Faculty Letter to the instructor from the DU Disability Services Program (DSP) in a timely manner,
so that the needs of the student can be addressed. Accommodations will not be provided
retroactively, e.g., following an exam or after the due date of a project. DSP determines eligibility
for accommodations based on documented disabilities. DSP is located in Ruffatto Hall, 1999 E.
Evans Ave. (303-871-2278). http://www.du.edu/studentlife/disability/dsp/index.html
Performance Assessment
The Daniels College of Business may use assessment tools in this course and other courses for
evaluation. Educational Assessment is defined as the systematic collection, interpretation, and use
of information about student characteristics, educational environments, learning outcomes and
client satisfaction to improve program effectiveness, student performance and professional
success.
Conflicts of Interest, including Gifts from Students
The University of Denver requires all employees to avoid real or perceived conflicts of interest.
Because of possible perceptions of undue influence, it is not appropriate for a student to give a gift
to a faculty member while the student is still enrolled in that faculty member’s class, including
through the grading period.
Emergency Procedures
The College places great emphasis on the safety of its students. Please respect emergency
instructions, including fire alarms. For more information, go to
http://www.du.edu/emergency/whattodowhen/index.html.
Daniels Areas of Interdisciplinary Strength: This course will cover statistical ethics.
REQUIRED COURSE MATERIALS:
Course Description
This course is designed to prepare you for managerial data analysis and data mining. More
specifically, the course addresses the how, when, why, and where of data mining. The emphasis is
on understanding the application of a wide range of modern techniques to specific decision-making
situations, rather than on mastering the theoretical underpinnings of the techniques. Upon
successful completion of the course, you should be able to perform the computational processes
necessary to extract information from multi-dimensional data and transform it into knowledge that
can lead to improved business performance. The course covers methods that are aimed at
prediction, forecasting, classification, clustering, and association. Students will gain hands-on
experience in using computer software to mine business data sets.
Required Materials
• Clicker: Turning Point response system device. These are available new and used
from the bookstore.
• Textbooks
Free Online Book: Discovering Knowledge in Data: An Introduction to Data
Mining. Larose, John Wiley & Sons. 2005. http://0library.books24x7.com.bianca.penlib.du.edu/toc.asp?bookid=12378
(I will post pdf files from this book for you on Blackboard)
Supplemental Book (Not Required): Data Mining Techniques: For Marketing, Sales, and
Customer Relationship Management, 2nd Ed., Berry and Linoff, Wiley Publishing, 2004
• Software
Microsoft Excel (2007/2010/2013)
JMP 9.0 (disks will be passed around the first day)
Camtasia Relay - I will send you information about downloading
Other freely available products including R, Weka, and WebCrawler
Course Assessment
Performance will be evaluated on the items below. For this class, all assignments and exams
assume you are trainees for Stats Dairy. Your training score is only a measure of your performance
in this class and does not reflect my opinion of you as an individual or your worth as a person.
Module Exam 1: 15%
Module Exam 2: 20%
Module Exam 3: 20%
Group Mini-Example/Peer Feedback: 10%
Reading/Video Quizzes: 15%
Group Project: 10%
Mini Assignments/Attendance (Clickers): 10%
Total: 100%
Grades: Stats Dairy regularly hires more data mining trainees than it needs. By means of this
course we determine where to place the graduates of the program:
90% - 100%: A
Trainees who receive an A are considered on the "fast track" and will start out as data
mining analysts. Our studies show that most trainees who fall in this group reach an
executive position within 10 years.
80% - 89%: B
Trainees who receive a B will start out as assistant data mining analysts. This does
not mean that they cannot reach the executive level, but it will be more difficult since
they will not regularly be put into career-enhancing positions such as overseas
consulting assignments.
70% - 79%: C
Trainees who receive a C will be put into staff positions for further development.
60% - 69%: D
Trainees who receive a D will be offered non-management positions.
0% - 59%: F
Trainees who receive an F will be separated from Stats Dairy.
Grading scale: A: 93-100%; A-: 90-92.9%; B+: 87-89.9%; B: 83-86.9%; B-: 80-82.9%; etc.
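The weighted average behind these grades can be sketched as follows. This is a minimal Python illustration, not an official grade calculator: the component keys and function names are hypothetical, each component score is assumed to be a 0-100 percentage, and because the plus/minus scale above stops at "etc.", the sketch falls back to the broad A-F bands below 80% rather than guessing the remaining cutoffs.

```python
# Hypothetical component keys; weights (in percent) follow the Course Assessment list.
WEIGHTS = {
    "module_exam_1": 15,
    "module_exam_2": 20,
    "module_exam_3": 20,
    "mini_example_peer_feedback": 10,
    "reading_video_quizzes": 15,
    "group_project": 10,
    "mini_assignments_attendance": 10,
}

# Plus/minus bands stated in the syllabus; bands below B- are elided ("etc.").
PLUS_MINUS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]

# Broad bands from the Stats Dairy placement table (90/80/70/60 cutoffs).
BROAD = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def course_percentage(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / sum(WEIGHTS.values())


def letter_grade(pct: float) -> str:
    """Map a percentage to a letter; below 80 only the broad letter is known."""
    for cutoff, letter in PLUS_MINUS:
        if pct >= cutoff:
            return letter
    for cutoff, letter in BROAD:
        if pct >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "module_exam_1": 88, "module_exam_2": 82, "module_exam_3": 90,
        "mini_example_peer_feedback": 95, "reading_video_quizzes": 80,
        "group_project": 92, "mini_assignments_attendance": 100,
    }
    pct = course_percentage(example)
    print(pct, letter_grade(pct))  # 88.3 B+
```

Treating each cutoff as "at least this percentage" is an assumption about how scores that fall between the listed bands are handled.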
Course Assignments
Overall Description: This course is set up as a blended course. Therefore, in addition to meeting face-to-face, we will conduct course activities through Blackboard. Here is a general outline of what to expect for each
class:
1. Read/View the assigned reading material for class (posted on Blackboard).
• Complete the “check for understanding” multiple-choice reading quiz for Monday’s reading
before class.
2. Watch the lecture video for class (posted on Blackboard).
• Complete the quiz within the video.
3. Attend class.
• Complete the in-class group mini assignment.
• Participate using clickers during in-class quizzing.
• Watch the group mini-example presentation (starts week 2).
o Post your evaluation comments.
4. If your group is presenting a mini-example, prepare your presentation before class and
present it during your assigned class.
Details about each graded component
Exams: Exams will be completed in two parts: one on paper without notes and one on your computer with
your notes. Calculators will be permitted, but cell phones cannot be used as calculators. If you are going to
miss an exam for a legitimate conflict, you must receive permission from me BEFORE the exam in order to
reschedule; otherwise, you will receive a zero on the exam. No make-up exams will be given. For the
portion of the exam taken on your personal computer, you will be required to record what you are doing
using Camtasia Relay.
Reading Quizzes: For all class days except exam days, a reading quiz will be due before class. These
reading quizzes are multiple-choice and cover assigned reading materials. The lowest three scores will be
dropped. Late work will be accepted with a 31% penalty up to one week late. Note: the due dates for the
first week will be extended.
Video Quizzes: For all class days except exam days, short lecture videos will be posted for you to watch
before class. The videos will be posted 2 days before class (starting week 2). While watching the video,
you need to complete a quiz. Note: the due dates for the first week will be extended.
Group Mini-Examples: Each group is responsible for one mini-example presentation during class. You will
sign up for a date the first day of class. These will start during the second week of class.
Group Peer Feedback: Each person should comment on each group Mini-Example presentation through
the link on Blackboard.
Mini Assignments: During class time an assignment will be given that supplements the topics learned by
watching recorded lectures and reading supplied materials. These can be completed in small groups or
individually. These are due at the end of class or at the beginning of the following class. The lowest three
scores of these mini assignments will be dropped. In addition, participation in clicker quizzes will be a part of
this grade.
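Both the reading quizzes and the mini assignments drop the three lowest scores. A minimal Python sketch of that rule follows, assuming equally weighted items scored as percentages and counting a missed item as 0 (an assumption, not stated policy); the function name is hypothetical.

```python
def average_dropping_lowest(scores: list[float], drop: int = 3) -> float:
    """Mean of the scores after discarding the `drop` lowest ones.

    Assumes every item is a 0-100 percentage with equal weight and that a
    missed item has already been entered as 0.
    """
    if len(scores) <= drop:
        raise ValueError("need more scores than the number dropped")
    kept = sorted(scores)[drop:]  # sort ascending, then skip the lowest `drop`
    return sum(kept) / len(kept)


if __name__ == "__main__":
    quizzes = [100, 80, 0, 90, 70, 100, 60]  # the 0 is a missed quiz
    print(average_dropping_lowest(quizzes))  # drops 0, 60, 70 -> 92.5
```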
Group Project: There will be a group project assigned during the final module of the class. The projects
will be shared with the class on the final day of class.
Communication
If you are having difficulty with the course material, please see me at your earliest convenience. Do not wait
until the first exam to see me about difficulties you are experiencing in comprehending the course material.
Do not allow yourself to fall behind in covering the assigned material as this will most certainly result in a
poor course grade. Keep up with your assignments and the readings in the text!
Classroom Environment
The optimal learning environment may be impaired significantly when the class as a whole is distracted
from its intended focus by the actions of a few. Accordingly, classroom computers should be used only as
directed by the instructor. Also, in-class use of cell phones, beepers and other devices that potentially may
create classroom distractions is prohibited (e.g., cell phones must be set on silent). Further, the behavior of
each member in the class must be conducive to the learning of the class as individuals and as a whole.
Course Schedule – Details here: http://statsdairy.com/INFO4300/schedule.html
Class dates: M Mar 25, W Mar 27, M Apr 1, W Apr 3, M Apr 8, W Apr 10, M Apr 15, W Apr 17, M Apr 22,
W Apr 24, M Apr 29, W May 1, M May 6, W May 8, M May 13, W May 15, M May 20, W May 22, M May 27,
W May 29, M Jun 3

MODULE 1: PREDICTION
Principal content elements: Introduction to JMP; Data Preprocessing; Exploratory Data Analysis; Ethics in
Data Mining; Multiple Regression and Model Evaluation Techniques; General Linear Models; Module 1 Exam
Learning outcomes:
• Define Data Mining and its basic terminology.1
• Describe and perform data cleaning and preparation methods.2,3
• Identify and perform the steps in the data mining process.1,3
• Describe, summarize, and display information in a data set.3,4,5
• Discuss the ethics that are involved with data mining.2,6
• Explain and use performance metrics to evaluate data mining models.2,6
• Create and interpret multiple regression models.3,4

MODULE 2: PREDICTION/CLASSIFICATION
Principal content elements: Time series forecasting; Principal Components Analysis; Introduction to Data
Mining Classification Methods: k-nearest neighbor; Logistic Regression; Classification and Regression Trees
(CART) / Decision Trees; Module 2 Exam
Learning outcomes:
• Explain, apply, and interpret forecasting models.2,3,4
• Explain, apply, and interpret principal components analysis.2,3,4
• Explain, apply, and interpret three simple classification methods.2,3,4
• Explain, apply, and interpret logistic regression models.2,3,4
• Explain, apply, and interpret classification and regression trees (CART).2,3,4

MODULE 3: CLASSIFICATION, CLUSTERING, and ASSOCIATION
Principal content elements: Discriminant Analysis; Cluster Analysis; Neural Networks; Association Rules;
Text Mining; Module 3 Exam
Learning outcomes:
• Explain, apply, and interpret discriminant analysis models.2,3,4
• Explain, apply, and interpret cluster analysis models.2,3,4
• Explain, apply, and interpret association rules.2,3,4
• Explain, apply, and interpret neural network models.2,3,4

W May 22: Module 3 Exam
M May 27: Memorial Day
W May 29: Project Work Day
M Jun 3: Project Presentations
Learning Outcomes Classifications: 1. Knowledge, 2. Comprehension, 3. Application, 4. Analysis, 5. Synthesis, 6. Evaluation