Instructor: Dr. Vladimir Zanev
Office Location/Phone Number: CCT 442 / (706) 507-8182
Office Hours: Mon, Wed, Fri: 10:00 a.m.-12:00 noon; Tue, Thu: 2:00-4:00 p.m.
E-mail: CougarVIEW class e-mail or zanev_vladimir@colstate.edu
Website: http://colstate.view.usg.edu
         http://csc.columbusstate.edu/zanev

This course is offered as an online class in the Spring semester 2012. The class meets 100% online at (http://colstate.view.usg.edu).

Online Interface: CougarVIEW (formerly WebCT Vista) will be the primary system and method of online interaction in this course. Course materials (course outline, schedule, assignments, projects, course notes, datasets, discussions, resources, and grading) will be available through CougarVIEW. You can access CougarVIEW at (http://colstate.view.usg.edu). On that page, click the "Log-in" link to open the CougarVIEW logon dialog box, which will ask for your CougarVIEW username and password. Your CougarVIEW username and password are:

  Username: lastname_firstname
  Password: DDMMYY, where DDMMYY is the student's birth date (for example, a birthday of Oct. 25, 1978 gives 251078)

If you try the above and CougarVIEW will not let you in, please use the "Comments/Problems" link at the bottom of the home page to request help. If you are still having problems gaining access a day or so after the class begins, please e-mail me.

Once you have clicked on the course's name and entered the course itself, you will find a home page with links to other sections and tools, and a menu on the left-hand side. The course home page and the left-hand menu give you access to all course materials.

Course Description and Objectives

Course Description: Prerequisites - CPSC 5115 Algorithm Analysis and Design, CPSC 5138 Advanced DBMS. These prerequisites are not in the Catalog and will not be enforced.
Consider them a suggested background that will make this course much easier to pass. You are not required to have taken these courses. However, completing them and/or having a working knowledge of the respective areas will greatly help you succeed in this class.

This course is an introduction to data mining. Recent advances in database technology, along with the phenomenal growth of the Internet, have resulted in an explosion of data collected, stored, and disseminated by various organizations. Because of its massive size, it is difficult for analysts to sift through the data even though it may contain useful information. Data mining holds great promise for addressing this problem by providing efficient techniques to uncover useful information hidden in large data repositories. Data mining is a modern area of computer science concerned with the automated or convenient extraction of patterns that represent previously unknown knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.

In this course we will approach the data mining problem from the perspective of data mining algorithms, database design, and programming. We will discuss suitable data models, data preparation, and finally the different methods and algorithms one can implement to discover new knowledge from raw data. We will also introduce data warehouse and OLAP technology, data cube computation, and data generalization.

The key objectives of this course are twofold: (1) to teach the fundamental concepts of data mining and (2) to provide extensive hands-on experience in applying the concepts to real-world applications.
The core topics to be covered in this course include:
  Data and exploring/preprocessing data
  Data warehouse and OLAP, data cubes and data generalization
  Classification data mining algorithms and methods
  Association analysis data mining algorithms and methods
  Cluster data mining algorithms and methods
  WEKA data mining environment
  Data mining using data mining Add-Ins and Excel
  SQL Server 2008 data mining environment

Expected Outcomes

At the completion of this course, students will have an understanding and knowledge of:
  What is data mining?
  Data and exploring data: sampling, data cleaning, feature selection, and dimensionality reduction
  Data warehouse, OLAP technology, data cubes and data cube computation
  Classification: basic concepts, decision trees, model evaluation
  Classification: naive Bayes, time series, neural networks
  Association analysis: basic concepts and algorithms, Apriori algorithm
  Cluster analysis: basic concepts and algorithms, hierarchical clustering methods
  SQL Server 2008 environment, tools, and algorithms
  How to use SQL Server 2008 for data mining

Textbook

Textbooks - required

  Title: Data Mining. Practical Machine Learning Tools and Techniques
  Authors: Ian H. Witten, Eibe Frank, Mark Hall
  Edition: 3rd, 2011
  Publisher: Morgan Kaufmann Publishers
  ISBN: 978-0-12-374856-0

  Title: Data Mining with SQL Server 2008
  Authors: Jamie MacLennan, ZhaoHui Tang, Bogdan Crivat
  Edition: 2009
  Publisher: Wiley Publishing Inc.
  ISBN: 978-0-470-27774-4

Additional Resources (available online at the class Resources page)
  Chapter 3. Data Warehouse and OLAP Technology; Chapter 4. Data Cube Computation and Data Generalization; Chapter 5. Mining Frequent Patterns, Associations, and Correlations, from the textbook Data Mining. Concepts and Techniques by J. Han and M. Kamber
  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals by Jim Gray et al. (research paper)
  SS08 Analysis Services and Data Cube Tutorial (developed from the SQL Server Books Online and SQL Server Developer Center)

Software

To complete all lessons, the data mining project, assignments, discussions, and exams, you will need a computer with:
  Windows XP/Vista/7, Internet Explorer, Adobe Acrobat Reader, and Word
  Access to the CSU CougarVIEW Web site
  SQL Server 2008 or 2008 R2 (see the Resources Web page for details on how to obtain SQL Server 2008)
  WEKA data mining environment (see the Resources Web page)
  SQL Server 2008 Add-Ins and Excel 2007

Methods of Instruction

Methods of instruction:
  Online Study
  Forums
  Assignments
  Data Mining Projects
  Midterm Exam
  Final Exam

Online Study
Each student is expected to complete all readings from the textbooks and the additional resources, following the course schedule. Make your own notes; you can use them during the exams.

Assignments
Several assignments will be given that build upon the concepts covered in the textbooks and have to be completed on your own time. The assignments will be problem-solving exercises on data mining algorithms. Assignment deadlines are not flexible for any reason. Late assignments are not accepted for credit. Assignment submissions are usually via WebCT Vista drop boxes.

Data Mining Projects
The purpose of the projects is to give you experience with data mining project development, implementation, analysis, interpretation of results, and conclusions. The data mining projects are an opportunity to apply the data mining concepts, techniques, and tools studied in class to real data sets. All projects are data mining projects developed individually. The objective is to study, implement, and run data mining algorithms that analyze real data sets. You have to use SQL Server 2008, WEKA, and the data mining Add-Ins as implementation tools. Late projects are not accepted for credit.
Project submissions are usually via WebCT Vista drop boxes.

Forums
Three special forums will be opened on the course WebCT site. The first is the Software Installation forum, the second is the Data Mining Projects forum, and the third is the Data Mining Assignments forum. The forums are study tools, and your participation in them is not graded. You can post any questions, answers, remarks, or essays in these forums. You cannot ask for help on an entire project or assignment in these forums. For example, you can ask for help with error messages in a project, or for hints or directions about parts of an assignment or a project. However, you cannot ask for solutions to an entire project and/or assignment, or to essential parts of a project or an assignment.

Exams
Your performance in this class will be measured by two online exams - the Midterm Exam and the Final Exam. No make-up exams will be given unless an exam was missed due to a documented emergency. The exams will be timed, problem-solving exams. The problems on the exams will be about data mining algorithms.

Evaluation

The final grade will be obtained from the following:

  Component      Weight
  Assignments    30%
  Projects       30%
  Midterm Exam   20%
  Final Exam     20%

The letter grade will be assigned as follows:

  Grade   Points
  A       90-100
  B       80-89
  C       70-79
  D       60-69
  F       0-59

Student Responsibilities

  Each student is responsible for managing his/her time and maintaining the discipline required to meet the course requirements.
  Each student is responsible for reading, from the textbooks and the additional resources, all topics covered in the class.
  Each student is responsible for reading the forum messages and participating in the forums.
  Each student is responsible for carrying out the data mining projects.
  Each student is responsible for completing all assignments.
  Each student is responsible for adhering to all course deadlines.
  Each student is responsible for taking the exams as they are scheduled in the course schedule.

"I didn't know" is not an acceptable excuse for failing to meet the course requirements. Students who fail to meet their responsibilities do so at their own risk.

Attendance Policy

Attendance at all classes and other activities (lecture periods, quizzes, examinations, or other scheduled meetings) is required for every student at Columbus State University. The attendance record begins with the first meeting of the class, and a student who registers late is responsible for the class work missed. Class attendance is the responsibility of the student, and it is the student's responsibility to independently cover any materials missed. Class attendance and participation may also be used in determining grades.

Students should note that the Computer Science Faculty does not initiate "class drops". A student wishing to drop should complete the official procedure before the deadline. Those who violate the attendance policy after that deadline may receive an "F" at the discretion of the instructor. Refer to the CSU Catalog (http://ace.columbusstate.edu/advising/a.php#AbsencePolicy) for more information on class attendance and withdrawal.

Academic Dishonesty

Academic dishonesty includes, but is not limited to, activities such as cheating and plagiarism (http://ace.columbusstate.edu/advising/a.php#AcademicDishonestyAcademicMisconduct). It is a basis for disciplinary action. Any work turned in for individual credit must be entirely the work of the student submitting it. You may share ideas, but submitting identical assignments (for example) will be considered cheating. You may discuss the material in the course and help one another with debugging; however, any work you hand in for a grade must be your own. A simple way to avoid inadvertent plagiarism is to talk about the assignments but not to read each other's work or write solutions together unless otherwise directed.
For your own protection, keep scratch paper and old versions of assignments to establish ownership until after the assignment has been graded and returned to you. If you have any questions about this, please see me immediately.

For assignments, access to notes, the course textbooks, books, and other publications is allowed. All work that is not your own MUST be properly cited. This includes any material found on the Internet. Stealing, giving, or receiving any code, diagrams, drawings, text, or designs from another person (CSU or non-CSU, including the Internet) is not allowed. Having access to another person's work on the computer system, or giving another person access to your work, is not allowed. It is your responsibility to keep your work confidential.

No cheating in any form will be tolerated. Penalties for academic dishonesty may include:
  a zero grade on the assignment or exam/quiz
  a failing grade for the course
  suspension from the Computer Science program
  dismissal from the Computer Science program

All instances of cheating will be documented in writing, with a copy placed in the Department's files. Students will be expected to discuss the academic misconduct with the faculty members and the chairperson.

ADA Accommodation Notice

If you have a documented disability as described by the Rehabilitation Act of 1973 (P.L. 93-112 Section 504) and the Americans with Disabilities Act (ADA) and would like to request academic and/or physical accommodations, please contact the Office of Disability Services in the Center for Academic Support and Student Retention, Tucker Hall 100, or at (706) 568-2330, as soon as possible. Course requirements will not be waived, but reasonable accommodations may be provided as appropriate.