STAT 3880/4880 Sect. 1, CRN 2176 / Sect. 1

Course Title: Data Mining I
Quarter/Year: Spring 2012
Course Number, Section and CRN: STAT 3880/4880 Sect. 1, CRN 2176 / Sect. 1, CRN 2177
Prerequisites: STAT 1400 – Statistics II or STAT 4610 – Quantitative Methods
Meeting Place and Time: DCB 130 4:00pm-5:50pm
Name of Professor: Dr. Kellie Keeling
Office Hours: TT 9:00-10:00, 2:00-4:00, Virtual Office Hours as Posted, and by Appointment
Discussion Board: The General Questions area should be used to ask questions that may be relevant to all
the students in the course. The instructor will log on to the discussion board nearly every day.
Office Location: DCB 590
E-Mail Address: DM@statsdairy.com
Phone Number: 303-871-2296
Class Web Presence: http://statsdairy.com and http://blackboard.du.edu/
Introduction: This is a blended course. That means we will meet face-to-face on most Tuesdays and
Thursdays, but a substantial portion of the course material will be delivered and completed
"online." Attendance at the face-to-face meetings and participation in all online activities are required. You
will find that the online and face-to-face elements of this course are interdependent and integrated. Online
participation is required every week – you will be expected to go online to continue discussion or complete
other activities. You will be assigned to a group and completion of some activities will require group
interaction. Some activities will be face-to-face and some will be online. If you miss a face-to-face class for a
legitimate reason, you may complete the in-class group assignment on your own and submit it at the following
class.
ALL STUDENTS NEED TO FOLLOW THESE EXPECTATIONS:
University of Denver Honor Code
All students are expected to abide by the University of Denver Honor Code. These expectations include the
application of academic integrity and honesty in your class participation and assignments. The Honor Code
can be viewed in its entirety at this link: http://www.du.edu/ccs/honorcode.html
All members of the University of Denver are expected to uphold the values of Integrity, Respect, and
Responsibility. These values embody the standards of conduct for students, faculty, staff, and administrators
as members of the University community.
In order to foster an environment of ethical conduct in the University community, all community members
are expected to take “constructive action,” that is, any effort to discuss or report any behavior contrary to
the Honor Code with a neutral party. Failure to do so constitutes a violation of the DU Honor Code.
Specifically, plagiarism and cheating constitute academic misconduct and can result in both a grade penalty
imposed by the instructor and disciplinary action including suspension or expulsion. As part of their
responsibility to uphold the Honor Code, instructors reserve the right to have papers submitted through
SafeAssign to check for plagiarism against a database of papers submitted previously at DU, a national
database of papers, and the Internet.
Official Communications
The standard method of communicating official information from the Daniels College of Business to its
students is through email. Students are provided a DU account using the protocol of
firstname.lastname@du.edu, but must set up a "preferred" off-campus email address. Emails sent to the DU
account will be forwarded to the preferred email account. DU accounts do not store messages. More
information is available at: http://www.du.edu/studentemail/.
Students with Disabilities
A student who qualifies for academic accommodations because of a disability must submit a Faculty Letter
to the instructor from the DU Disability Services Program (DSP) in a timely manner, so that the needs of the
student can be addressed. Accommodations will not be provided retroactively, e.g., following an exam or
after the due date of a project. DSP determines eligibility for accommodations based on documented
disabilities. DSP is located in Ruffatto Hall, 1999 E. Evans Ave. (303-871-2278).
http://www.du.edu/studentlife/disability/dsp/index.html
Performance Assessment
The Daniels College of Business may use assessment tools in this course and other courses for evaluation.
Educational Assessment is defined as the systematic collection, interpretation, and use of information about
student characteristics, educational environments, learning outcomes and client satisfaction to improve
program effectiveness, student performance and professional success.
Gifts from Students
Because of possible perceptions of undue influence, it is not appropriate for a student to give a gift to a
faculty member while the student is still enrolled in that faculty member’s class, including through the
grading period. As a general rule, Daniels discourages the giving of gifts between students and faculty.
Emergency Procedures
The College places great emphasis on the safety of its students. Please respect emergency instructions,
including fire alarms. For more information, go to
http://www.du.edu/emergency/whattodowhen/index.html
REQUIRED COURSE MATERIALS:
Course Description
This course is designed to prepare you for managerial data analysis and data mining. More specifically, the
course addresses the how, when, why, and where of data mining. The emphasis is on understanding the
application of a wide range of modern techniques to specific decision-making situations, rather than on
mastering the theoretical underpinnings of the techniques. Upon successful completion of the course, you
should be able to perform the computational processes necessary to extract information from multidimensional data and transform it into knowledge that can lead to improved business performance. The
course covers methods that are aimed at prediction, forecasting, classification, clustering, and association.
Students will gain hands-on experience in using computer software to mine business data sets.
Beyond Grey Pinstripes: In this course we will discuss issues around ethics in data mining as related to
Business and Society. Specific concerns are misleading statistics and graphs, and personally identifiable
information that could lead to discrimination, privacy abuse, and telemarketing abuse.
Required Materials
• Clicker response system device – These are available new and used from the bookstore.
• Textbooks
Free Online Book: Discovering Knowledge in Data: An Introduction to Data Mining.
Larose, John Wiley & Sons. 2005. http://0-library.books24x7.com.bianca.penlib.du.edu/toc.asp?bookid=12378
(I will post pdf files from this book for you on Blackboard)
Supplemental Book (Not Required): Data Mining Techniques: For Marketing, Sales, and Customer
Relationship Management, 2nd Ed., Berry and Linoff, Wiley Publishing, 2004
• Software
Microsoft Excel (2007/2010)
JMP 9.0 (disks will be passed around the first day)
Camtasia Relay - I will send you information about downloading it
Other freely available products we discover through the course, such as Weka and WebCrawler
Course Assessment
Performance will be evaluated on the items below. For this class, all assignments and exams assume you
are trainees for Stats Dairy. Your training score is only a measure of your performance in this class and does
not reflect my opinion of you as an individual or your worth as a person.
Module Exam 1              15%
Module Exam 2              20%
Module Exam 3              20%
Discussion participation   20%
Mini Assignments            5%
Reading Quizzes             5%
Group Project              10%
Real World Summary          5%
Total                     100%
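As a rough illustration of how the weights above combine into one training score, here is a minimal Python sketch. The component names and weights come from the assessment table; the function name `overall_score` and the sample scores are hypothetical, and this is not the official grade calculation.

```python
# Component weights from the Course Assessment table (fractions of the total).
WEIGHTS = {
    "Module Exam 1": 0.15,
    "Module Exam 2": 0.20,
    "Module Exam 3": 0.20,
    "Discussion participation": 0.20,
    "Mini Assignments": 0.05,
    "Reading Quizzes": 0.05,
    "Group Project": 0.10,
    "Real World Summary": 0.05,
}


def overall_score(scores):
    """Return the weighted overall percentage from per-component percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical trainee: 80% on everything except a perfect first exam.
example = {name: 80.0 for name in WEIGHTS}
example["Module Exam 1"] = 100.0
print(round(overall_score(example), 2))
```

Because the weights sum to 100%, a perfect score on a 15% component raises an otherwise flat 80% average by 3 percentage points.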
Grades: Stats Dairy regularly hires more data mining trainees than it needs. By means of this course we
determine where to place the graduates of the program:
90% - 100%  A  Trainees who receive an A are considered on the "fast track" and will start out as
               data mining analysts. Our studies show that most trainees who fall in this group
               reach an executive position within 10 years.
80% - 89%   B  Trainees who receive a B will start out as assistant data mining analysts. This does
               not mean that they cannot reach the executive level, but it will be more difficult
               since they will not regularly be put into career-enhancing positions such as overseas
               consulting assignments.
70% - 79%   C  Trainees who receive a C will be put into staff positions for further development.
60% - 69%   D  Trainees who receive a D will be offered non-management positions.
0% - 59%    F  Trainees who receive an F will be separated from Stats Dairy.
NOTE: + and – grades are given according to the DU scale
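The percentage bands above can be read as a simple lookup. The sketch below is an illustration only: the function name `letter_grade` is hypothetical, it assumes no rounding at the band edges (so 89.9% is treated as a B), and it does not model + and – grades because the DU scale is not reproduced in this syllabus.

```python
def letter_grade(percent):
    """Map an overall percentage to a base letter grade using the bands above."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Lower cutoffs for each band, checked from highest to lowest.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "F"


print(letter_grade(83.0))
```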
Course Assignments
Overall Description: This course is set up as a blended course. Therefore, in addition to meeting face-to-face, we will
meet online in Blackboard for a variety of activities. Here is a general outline of what to expect each week in the
course:
1. Read/View the assigned reading material for Tuesday (posted on Blackboard).
2. Complete the “check for understanding” multiple choice reading quiz for Tuesday’s reading before class.
3. Attend class on Tuesday. Complete the in-class group mini assignment.
4. Post comments about the Discussion Board Analysis for the week (due Wednesday night).
5. Read/View the assigned reading material for Thursday (posted on Blackboard).
6. Complete the “check for understanding” multiple choice reading quiz for Thursday’s reading before class.
7. Post a comment about the week’s real world summaries.
8. If you are the lead group, post your response to the Discussion Board Analysis by Monday night.
Details about each graded component
Exams: Exams will be completed in two parts: On paper without notes and on your computer with your notes.
Calculators will be permitted. Cell phones cannot be used as calculators. If you are going to miss an exam for a
legitimate conflict, you must receive permission from me BEFORE the exam in order to reschedule. Otherwise you
will receive a zero on the exam. No make-up exams will be given. For the portion of the exam taken on your
personal computer, you will be required to record what you are doing using Camtasia Relay.
Reading Quizzes: For all class days except exam days, a reading quiz will be due before class. These reading quizzes
are multiple-choice and cover assigned reading materials. The lowest three scores will be dropped.
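To make the drop rule concrete, here is a small Python sketch. It assumes the quiz component is the simple average of the remaining, equally weighted scores, which the syllabus does not state; the function name and sample scores are hypothetical.

```python
def average_after_drops(scores, drops=3):
    """Average the scores after discarding the `drops` lowest ones."""
    if len(scores) <= drops:
        raise ValueError("need more scores than drops")
    kept = sorted(scores)[drops:]  # ascending sort, so the lowest scores come first
    return sum(kept) / len(kept)


# Hypothetical quiz percentages, including two missed quizzes scored as 0.
quiz_scores = [100, 80, 0, 90, 60, 70, 0, 100]
print(average_after_drops(quiz_scores))
```

In this example the two zeros and the 60 are dropped, so missed quizzes do not count against the average until more than three are missed.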
Discussion Participation:
Discussion Board Analysis: Each week a group will be assigned to be the “lead group” to post an initial analysis for a
given problem. These must be posted by Monday night. Then the remainder of the groups should respond to the
analysis by Wednesday night. See the grading rubric on Blackboard under “Discussion Board Analysis Schedule.”
Depending on class size we may have 2 groups one week.
Real World Discussion: Each person should comment on each real world summary presented by their fellow
students. Comments can address either one thing you liked or learned from the presentation or one suggestion
to make the presentation better.
Group Project Comments: Each person should comment on one good thing about each presentation and one
suggestion for improvement or further analysis that could be completed.
Mini Assignments: During class time an assignment will be given that supplements the topics learned by watching
recorded lectures and reading supplied materials. These can be completed in small groups or individually. These are
due at the end of class or at the beginning of the following class. The lowest three scores of these mini assignments
will be dropped. In addition, participation in clicker quizzes will be a part of this grade.
Group Project: There will be a group project assigned during the final module of the class. 5-7 minute executive
summaries of the groups' projects will be presented the final day of class.
Real World Summaries: Each student will sign up for a day to post a real world example of the use of data mining.
These summaries will be presented as a 2-minute PowerPoint presentation with recorded audio using Camtasia
Relay. These can be accessed under Blackboard "Real World Summaries Schedule". The presentations will begin the
second week of classes.
Communication
If you are having difficulty with the course material, please see me at your earliest convenience. Do not wait until the
first exam to see me about difficulties you are experiencing in comprehending the course material. Do not allow
yourself to fall behind in covering the assigned material as this will most certainly result in a poor course grade. Keep
up with your assignments and the readings in the text!
Honor Code
You are expected to abide by the University's honor code on all assignments and exams. The Honor Code is meant to
foster and advance an environment of ethical conduct in our academic community. The Code of Student Conduct
contains information on the behavioral standards expected of all students at the University of Denver including the
areas of civility, community, integrity, and responsibility. Details can be found at: http://www.du.edu/ccs/
Classroom Environment
The optimal learning environment may be impaired significantly when the class as a whole is distracted from its
intended focus by the actions of a few. Accordingly, classroom computers should be used only as directed by the
instructor. Also, in-class use of cell phones, beepers and other devices that potentially may create classroom
distractions is prohibited (e.g., cell phones must be set on silent). Further, the behavior of each member in the class
must be conducive to the learning of the class as individuals and as a whole.
Students with Special Needs
If you need adaptations or accommodations because of a disability and are registered with the Disability Services
Program (DSP), if you have emergency medical information to share with me, if you need special arrangements in
case the building must be evacuated, or if you require rescheduling of an exam due to a religious holiday, please
make an appointment to see me during the first week of class.
Learning Outcomes Classifications
The following 6 levels of Bloom’s Taxonomy are used to classify the learning outcomes [Bloom B. S. (1956). Taxonomy of
Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.]
1. Knowledge: Recall data or information. Ex: defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls,
recognizes, reproduces, selects, states
2. Comprehension: Understand the meaning and interpretation of instructions and problems. State a problem in one's own
words. Ex: comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, summarizes,
translates
3. Application: Use a concept in a new situation. Ex: applies, changes, computes, constructs, demonstrates, discovers, modifies,
operates, predicts, relates, shows, solves, uses
4. Analysis: Separates material or concepts into component parts so that its organizational structure may be understood.
Distinguishes between facts and inferences. Ex: analyzes, breaks down, compares, contrasts, diagrams, differentiates,
identifies, illustrates, infers, outlines, relates, selects, separates
5. Synthesis: Builds a structure or pattern from diverse elements. Ex: categorizes, combines, compiles, composes, creates,
devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises,
rewrites, summarizes, tells, writes
6. Evaluation: Make judgments about the value of ideas or materials. Ex: appraises, compares, concludes, contrasts, criticizes,
critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports
Course Schedule – Details here: http://www.statsdairy.com/dm/dmsch.html
Date: Modules (Principal content elements)

MODULE 1: PREDICTION
Tue Mar 27: Introduction to Data Mining
Thu Mar 29: Introduction to JMP; Data Preprocessing
Tue Apr 3: Exploratory Data Analysis; Ethics in Data Mining
Thu Apr 5: Multiple Regression and Model Evaluation Techniques
Tue Apr 10: General Linear Models
Thu Apr 12: Module 1 Exam
Learning outcomes (Bloom's Taxonomy levels in parentheses):
• Define Data Mining and its basic terminology. (1)
• Identify and perform the steps in the data mining process. (1,3)
• Describe and perform data cleaning and preparation methods. (2,3)
• Describe, summarize, and display information in a data set. (3,4,5)
• Discuss the ethics that are involved with data mining. (2,6)
• Explain and use performance metrics to evaluate data mining models. (2,6)
• Create and interpret multiple regression models. (3,4)

MODULE 2: PREDICTION/CLASSIFICATION
Tue Apr 17: Time series forecasting
Thu Apr 19: Principal Components Analysis
Tue Apr 24: Classification Methods; k-nearest neighbor
Thu Apr 26: Logistic Regression
Tue May 1: Classification and Regression Trees (CART) / Decision Trees
Thu May 3: Module 2 Exam
Learning outcomes (Bloom's Taxonomy levels in parentheses):
• Explain, apply, and interpret forecasting models. (2,3,4)
• Explain, apply, and interpret principal components analysis. (2,3,4)
• Explain, apply, and interpret three simple classification methods. (2,3,4)
• Explain, apply, and interpret logistic regression models. (2,3,4)
• Explain, apply, and interpret classification and regression trees (CART). (2,3,4)

MODULE 3: CLASSIFICATION, CLUSTERING, and ASSOCIATION
Tue May 8: Discriminant Analysis
Thu May 10: Cluster Analysis
Tue May 15: Neural Networks
Thu May 17: Association Rules
Tue May 22: Text Mining
Thu May 24: Module 3 Exam
Tue May 29: Project Work Days
Thu May 31: Project Work Days
Tue Jun 5: Project Presentations
Learning outcomes (Bloom's Taxonomy levels in parentheses):
• Explain, apply, and interpret discriminant analysis models. (2,3,4)
• Explain, apply, and interpret cluster analysis models. (2,3,4)
• Explain, apply, and interpret neural network models. (2,3,4)
• Explain, apply, and interpret association rules. (2,3,4)