CPSC 6127 - Zanev - Columbus State University

Course Description and Objectives
Textbook
Software
Methods of Instruction
Evaluation
Student Responsibilities
Attendance Policy
Academic Dishonesty
ADA Accommodation Notice
Instructor: Dr. Vladimir Zanev
Office Location/Phone Number: CCT 442/ (706) 507-8182
Office Hours: Mon, Wed, Fri: 10:00 a.m.-12:00 noon; Tue, Thu: 2:00-4:00 p.m.
E-mail: CougarVIEW class e-mail or zanev_vladimir@colstate.edu
Website: http://colstate.view.usg.edu
http://csc.columbusstate.edu/zanev
This course is offered as an online class in the Spring semester 2012. Class meets 100%
online at http://colstate.view.usg.edu
Online Interface:
CougarVIEW (formerly WebCT Vista) will be the primary system and method of online
interaction in this course. Course materials (course outline, schedule, assignments, projects,
course notes, datasets, discussions, resources, and grading) will be available through
CougarVIEW. You can access CougarVIEW at:
( http://colstate.view.usg.edu )
At this page, click on the "Log-in" link to activate the CougarVIEW logon dialog box, which
will ask for your CougarVIEW username and password. Your CougarVIEW username and
password are:
Username: lastname_firstname
Password: DDMMYY
where DDMMYY is the student birth date. (Example - Birthday of Oct. 25, 1978 is 251078)
If you try the above and CougarVIEW will not let you in, please use the "Comments/Problems"
link at the bottom of the home page to request help. If you are still having problems gaining
access a day or so after the class begins, please e-mail me. Once you have clicked on the course's
name and accessed the course itself, you will find a home page with links to other sections and
tools, and a menu on the left-hand side. This course homepage and the left-hand menu will give
you access to all course materials.
Course Description and Objectives
Course Description:
Prerequisites: CPSC 5115 Algorithm Analysis and Design, CPSC 5138 Advanced DBMS.
These prerequisites are not in the Catalog and will not be enforced. Consider them
suggested background that will make this course much easier. You are not required
to have taken these courses. However, completing them and/or having a working
knowledge of the respective areas will greatly help you to succeed in this class.
This course is an introduction to data mining. Recent advances in database technology along with the
phenomenal growth of the Internet have resulted in an explosion of data collected, stored, and
disseminated by various organizations. Because of its massive size, it is difficult for analysts to sift
through the data even though it may contain useful information. Data mining holds great promise to
address this problem by providing efficient techniques to uncover useful information hidden in the large
data repositories. Data mining is a modern area of computer science concerned with the automated or
convenient extraction of patterns that represent previously unknown knowledge implicitly
stored in large databases, data warehouses, and other massive information repositories. In this
course we will approach the data mining problem from the standpoint of data mining algorithms,
database design, and programming. We will discuss suitable data models, data preparation, and
finally the different methods and algorithms one can implement to discover new knowledge from
raw data. We will also introduce data warehouse and OLAP technology, data cube
computation, and data generalization. The key objectives of this course are two-fold: (1) to teach the
fundamental concepts of data mining and (2) to provide extensive hands-on experience in applying the
concepts to real-world applications. The core topics to be covered in this course include:
- data and exploring/preprocessing data
- data warehouse and OLAP, data cubes and data generalization
- classification data mining algorithms and methods
- association analysis data mining algorithms and methods
- cluster data mining algorithms and methods
- WEKA data mining environment
- Data mining using data mining Add-Ins and Excel
- SQL Server 2008 data mining environment
Expected Outcomes
At the completion of this course, students will have an understanding and knowledge of:
- What is data mining?
- Data and exploring data: sampling, data cleaning, feature selection, and dimensionality reduction
- Data warehouse, OLAP technology, data cubes and data cube computation
- Classification: basic concepts, decision trees, model evaluation
- Classification: naive Bayes, time series, neural networks
- Association analysis: basic concepts and algorithms, Apriori algorithm
- Cluster analysis: basic concepts and algorithms, hierarchical clustering methods
- SQL Server 2008 environment, tools, and algorithms
- How to use SQL Server 2008 for data mining
Textbook
Textbooks - required
Title: Data Mining: Practical Machine Learning Tools and Techniques
Authors: Ian H. Witten, Eibe Frank, Mark Hall
Edition: 3rd, 2011
Publisher: Morgan Kaufmann Publishers
ISBN: 978-0-12-374856-0

Title: Data Mining with SQL Server 2008
Authors: Jamie MacLennan, ZhaoHui Tang, Bogdan Crivat
Edition: 2009
Publisher: Wiley Publishing Inc.
ISBN: 978-0-470-27774-4
Additional Resources (available online at the class Resources page)
- From the textbook Data Mining: Concepts and Techniques by J. Han and M. Kamber:
  Chapter 3. Data Warehouse and OLAP Technology
  Chapter 4. Data Cube Computation and Data Generalization
  Chapter 5. Mining Frequent Patterns, Associations, and Correlations
- Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
  by Jim Gray et al. (research paper)
- SS08 Analysis Services and Data Cube Tutorial (developed from the SQL Server Books Online
  and SQL Server Developer Center)
Software
To complete all lessons, the data mining project, assignments, discussions, and exams, you will
need a computer with:
- Windows XP/Vista/7, Internet Explorer, Adobe Acrobat Reader, and Word
- Access to the CSU CougarVIEW Web site
- SQL Server 2008 or R2 (see the Resources Web page for details on how to obtain SQL Server 2008)
- WEKA data mining environment (see the Resources Web page)
- SQL Server 2008 Add-Ins and Excel 2007
Methods of Instruction
Methods of Instruction:
- Online Study
- Forums
- Assignments
- Data Mining Projects
- Midterm Exam
- Final Exam
Online Study
Each student is expected to complete all readings from the textbooks and the additional resources
following the course schedule. Make your own notes. You can use your own notes during the
exams.
Assignments
Several assignments will be given that build upon the concepts covered in the textbooks and
have to be completed on your own time. Assignments will consist of problem solving about data
mining algorithms. Assignment deadlines are not flexible for any reason. Late assignments are not
accepted for credit. Assignment submissions are usually via WebCT Vista drop boxes.
Data Mining Projects
The purpose of the projects is to give you experience with Data Mining project development,
implementation, analysis, result interpretations, and conclusions. The data mining projects are an
opportunity to apply the data mining concepts, techniques, and tools studied in class on real data
sets. All projects are data mining projects developed individually. The objective is to study,
implement and run data mining algorithms analyzing real data sets. You have to use SQL Server
2008, WEKA, and the data mining Add-Ins as implementation tools. Late projects are not
accepted for credit. Project submissions are usually via WebCT Vista drop boxes.
Forums
Three special forums will be opened on the course WebCT site: Software Installation, Data Mining
Projects, and Data Mining Assignments. The forums are study tools, and your participation in
them is not graded. You can post any questions, answers, remarks, or essays in these forums.
For example, you can ask for help with an error message in a project, or for hints or directions
about part of an assignment or a project. However, you cannot ask for the solution to an entire
project or assignment, or to an essential part of one.
Exams
Your performance in this class will be measured by two online exams: a Midterm Exam and a Final
Exam. No make-up exams will be given unless an exam was missed due to a documented
emergency. The exams will be timed, problem-solving exams. The problems on the exams will be
about data mining algorithms.
Evaluation
The final grade will be obtained from the following:
Assignments     30%
Projects        30%
Midterm Exam    20%
Final Exam      20%
The letter grade will be assigned as follows:
Grade   Points
A       90-100
B       80-89
C       70-79
D       60-69
F       0-59
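As a worked illustration of the weights and grade scale above, the following Python sketch computes a weighted final score and maps it to a letter grade. The function names and the scores in `example` are hypothetical. The syllabus does not say how fractional scores are handled, so the sketch assumes no rounding: a score must reach the lower cutoff of a band (for example, 89.5 is treated as a B).

```python
# Weights from the evaluation table; each component score is on a 0-100 scale.
WEIGHTS = {"assignments": 0.30, "projects": 0.30, "midterm": 0.20, "final": 0.20}


def final_score(scores):
    """Return the weighted sum of the four component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(points):
    """Map a 0-100 score to a letter grade.

    Assumption (not stated in the syllabus): scores are not rounded, so a
    score must reach the lower cutoff of a band to earn that letter.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if points >= cutoff:
            return letter
    return "F"


# Made-up component scores, for illustration only.
example = {"assignments": 85, "projects": 92, "midterm": 78, "final": 88}
total = final_score(example)
print(round(total, 1), letter_grade(total))  # prints: 86.3 B
```

Here the total is 0.30 x 85 + 0.30 x 92 + 0.20 x 78 + 0.20 x 88 = 86.3, which falls in the 80-89 band.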
Student Responsibilities
- Each student is responsible for managing his/her time and maintaining the discipline required
  to meet the course requirements.
- Each student is responsible for reading, in the textbooks and the additional resources, all
  topics covered in the class.
- Each student is responsible for reading the forum messages and participating in the forums.
- Each student is responsible for carrying out the data mining projects.
- Each student is responsible for completing all assignments.
- Each student is responsible for adhering to all course deadlines.
- Each student is responsible for taking the exams as they are scheduled in the course
  schedule.
"I didn't know" is not an acceptable excuse for failing to meet the course requirements.
Students who fail to meet their responsibilities do so at their own risk.
Attendance Policy
Attendance at all classes and other activities (lecture periods, quizzes, examinations, or other
scheduled meetings) is required for every student at Columbus State University. The attendance
record begins with the first meeting of the class, and one who registers late is responsible for
class work missed. Class attendance is the responsibility of the student, and it is the student's
responsibility to independently cover any materials missed. Class attendance and participation
may also be used in determining grades. Students should note that the Computer Science Faculty
does not initiate "class drops". A student wishing to drop should complete the official procedure
before the deadline. Those who violate the attendance policy after that deadline may receive an
"F" at the discretion of the instructor. Refer to the CSU Catalog
(http://ace.columbusstate.edu/advising/a.php#AbsencePolicy ) for more information on class
attendance and withdrawal.
Academic Dishonesty
Academic Dishonesty: Academic dishonesty includes, but is not limited to, activities such as
cheating and plagiarism
(http://ace.columbusstate.edu/advising/a.php#AcademicDishonestyAcademicMisconduct). It is a
basis for disciplinary action. Any work turned in for individual credit must be entirely the work
of the student submitting the work. All work must be your own. You may share ideas but
submitting identical assignments (for example) will be considered cheating. You may discuss
the material in the course and help one another with debugging; however, any work you
hand in for a grade must be your own. A simple way to avoid inadvertent plagiarism is to talk
about the assignments, but don't read each other's work or write solutions together unless
otherwise directed. For your own protection, keep scratch paper and old versions of assignments
to establish ownership, until after the assignment has been graded and returned to you. If you
have any questions about this, please see me immediately. For assignments, access to notes,
the course textbooks, books and other publications is allowed. All work that is not your own
MUST be properly cited. This includes any material found on the Internet. Stealing or giving or
receiving any code, diagrams, drawings, text or designs from another person (CSU or non-CSU,
including the Internet) is not allowed. Having access to another person’s work on the computer
system or giving access to your work to another person is not allowed. It is your responsibility to
keep your work confidential.
No cheating in any form will be tolerated. Penalties for academic dishonesty may include:
- a zero grade on the assignment or exam/quiz
- a failing grade for the course
- suspension from the Computer Science program
- dismissal from the Computer Science program.
All instances of cheating will be documented in writing with a copy placed in the Department's
files. Students will be expected to discuss the academic misconduct with the faculty members
and the chairperson.
ADA Accommodation Notice
If you have a documented disability as described by the Rehabilitation Act of 1973 (P.L. 93-112
Section 504) and the Americans with Disabilities Act (ADA) and would like to request
academic and/or physical accommodations, please contact the Office of Disability Services in the
Center for Academic Support and Student Retention, Tucker Hall 100 or at (706) 568-2330, as soon as
possible. Course requirements will not be waived but reasonable accommodations may be
provided as appropriate.