COURSE SYLLABUS

Semester: Fall 2013
Course Prefix/Number: CAP 6990
Course Title: Web Data Mining
Course Credit Hours: 3.0
Course Meeting Times/Places: Online
Prerequisites or Co-requisites: Data Mining (CAP5771)

Course Description:
The primary focus of this course is on Web usage mining and its applications to e-commerce and business intelligence. Specifically, we will consider techniques from machine learning, data mining, text mining, and databases to extract useful knowledge from Web data which could be used for site management, automatic personalization, recommendation, and user profiling. The first half of the course will be focused on a detailed overview of the data mining process and techniques, specifically those that are most relevant to Web mining. The second half will concentrate on the applications of these techniques to Web and e-commerce data, and their use in Web analytics, user profiling and personalization.

List of Topics:
The following issues and topics will be covered throughout the course. Many of these topics will be revisited several times during the course in a variety of contexts.

Data Mining and Knowledge Discovery
  o The KDD process and methodology
  o Data preparation for knowledge discovery
  o Overview of data mining techniques
  o Market basket analysis
  o Classification and prediction
  o Clustering
  o Memory-based reasoning
  o Evaluation and Interpretation

Web Usage Mining Process and Techniques
  o Data collection and sources of data
  o Data preparation for usage mining
  o Mining navigational patterns
  o Integrating e-commerce data
  o Leveraging site content and structure
  o User tracking and profiling
  o E-Metrics: measuring success in e-commerce
  o Privacy issues

Web Mining Applications and Other Topics
  o Data integration for e-commerce
  o Web personalization and recommender systems
  o Web content and structure mining
  o Web data warehousing
  o Review of tools, applications, and systems

Teaching materials

Required Textbook:
  o Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, by Bing Liu, Second Edition, Springer, July 2011, ISBN 978-3-642-19459-7.

References:
  o Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, ISBN 1-55860-489-8.
  o Principles of Data Mining, by David Hand, Heikki Mannila, and Padhraic Smyth, The MIT Press, ISBN 0-262-08290-X.
  o Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
  o Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7.

Data mining resource site: KDnuggets Directory

Topics (subject to change; slides may be changed too)

1. Introduction
2. Data pre-processing
  o Data cleaning
  o Data transformation
  o Data reduction
  o Discretization
3. Association rules and sequential patterns
  o Basic concepts
  o Apriori algorithm
  o Mining association rules with multiple minimum supports
  o Mining class association rules
  o Sequential pattern mining
  o Summary
4. Supervised learning (Classification)
  o Basic concepts
  o Decision trees
  o Classifier evaluation
  o Rule induction
  o Classification based on association rules
  o Naive-Bayesian learning
  o Naive-Bayesian learning for text classification
  o Support vector machines
  o K-nearest neighbor
  o Bagging and boosting
  o Summary
5. Unsupervised learning (Clustering)
  o Basic concepts
  o K-means algorithm
  o Representation of clusters
  o Hierarchical clustering
  o Distance functions
  o Data standardization
  o Handling mixed attributes
  o Which clustering algorithm to use?
  o Cluster evaluation
  o Discovering holes and data regions
  o Summary
6. Information retrieval and Web search
  o Basic text processing and representation
  o Cosine similarity
  o Relevance feedback and Rocchio algorithm
  o Opinion spam or fake review detection
7. Recommender systems and collaborative filtering
  o Content-based recommendation
  o Collaborative filtering based recommendation
      - K-nearest neighbor
      - Association rules
      - Matrix factorization
8. Web data extraction
  o Wrapper induction
  o Automated extraction

References:
Weka’s site

Grading Policy:
The final grade will be determined (tentatively) based on the following components:
  Assignments = 65%
  Final Project = 25%
  Final Exam = 10%

Assignments:
There will be 6-7 assignments during the semester involving the concepts and techniques discussed in class. The assignments may involve experimenting with various tools, as well as other written or problem-oriented exercises. Some assignments must be done individually.

Late Policy:
1. You are expected to complete work on schedule. Deadlines are part of the real-world environment you are being prepared for. Documentation of health or family problems may be required.
2. Late assignments will be penalized 25% per day (that is, an assignment will not be accepted four days after the due date).
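
As a rough illustration of how the grading weights and the late penalty above combine, here is a minimal Python sketch. It is not an official calculator. The function names are hypothetical, and it assumes that the 25% per day is deducted from the score earned on the assignment (the syllabus does not say whether the deduction applies to the earned or the maximum score) and that every component is scored on a 0-100 scale.

```python
# Component weights copied from the (tentative) Grading Policy.
WEIGHTS = {"assignments": 0.65, "final_project": 0.25, "final_exam": 0.10}

LATE_PENALTY_PER_DAY = 0.25  # 25% per day, from the Late Policy
MAX_LATE_DAYS = 3            # four days late: the assignment is not accepted


def late_adjusted_score(raw_score, days_late):
    """Return an assignment score (0-100) after the late penalty.

    Assumption: the penalty is a fraction of the earned score.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return 0.0  # no longer accepted
    return raw_score * (1.0 - LATE_PENALTY_PER_DAY * days_late)


def weighted_total(component_scores):
    """Combine per-component averages (0-100) into one course score."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # An assignment worth 80 points, handed in two days late.
    print(late_adjusted_score(80, 2))
    # Component averages of 90, 80, and 70.
    print(weighted_total(
        {"assignments": 90, "final_project": 80, "final_exam": 70}
    ))
```

Under these assumptions, an assignment handed in two days late keeps half of its earned score, and one handed in four or more days late counts as zero.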

Course Project:
For the class project, students can choose to do an implementation project, a data analysis project, or a research paper. Implementation projects may be done individually or in groups of 2 people (depending on the complexity and the type of the project). Research papers and data analysis projects must be done individually. Each group or individual will submit a specific project proposal to be approved. More details about the possible project options, as well as due dates for the proposal and the final submission, will be available later.

About this Course:
This course is delivered completely online. Students must have consistent access to the Internet. Learning at a distance may be a very different environment for many of you. You will set your own schedules, participate in class activities at your convenience, and work at your own pace. You may require some additional time online during the first few days while you become accustomed to the online format, and you may even feel overwhelmed at times. It will get better.

You should be prepared to spend 8 to 10 hours or more per week online completing lessons and activities and participating in class discussions.

DSS will provide the student with a letter for the instructor that will specify any recommended accommodations.

Other Course Policies:

Class material and due dates: Students are responsible for all announcements and all material presented. Students are expected to keep up with due dates and submit all assignments and other work to the elearning dropbox before the due date.

Communication: You are responsible for checking your e-mail and the elearning site regularly, preferably once a day, to keep up with important announcements, assignments, etc.

Re-grading Assignments: It is your responsibility to check graded assignments/tests when they are returned to you. I will gladly re-grade an assignment/test when a question or mistake is brought to my attention.
To ensure fairness, I reserve the right to re-grade the entire assignment/test. As a result, your grade may increase, decrease, or remain the same. Grades will not be changed more than one week after the date graded assignments/tests are returned to the class.

Grades: Final grades will be calculated using a standard grade distribution. The last day of the term for withdrawal from an individual course with an automatic grade of “W” is 3/24. Students requesting late withdrawal (W or WF) from class must have the approval of the advisor, instructor, and the department chairperson (in that order) and, finally, of the Academic Appeals committee. Requests for late withdrawal may be approved only for the following reasons (which must be documented):
1. A death in the immediate family.
2. Serious illness of the student or an immediate family member.
3. A situation deemed similar to categories 1 and 2 by all in the approval process.
4. Withdrawal due to Military Service (Florida Statute 1004.07).
5. National Guard Troops Ordered into Active Service (Florida Statute 250.482).
Requests without documentation will not be accepted. Requests for late withdrawal simply for not succeeding in a course do not meet the criteria for approval and will not be approved.

Applying for an incomplete or “I” grade will be considered only if: (1) there are extenuating circumstances to warrant it, AND (2) you have a passing grade and have completed at least 70% of the course work, AND (3) the department chair approves.

Participation and Feedback: I encourage active participation and regular feedback. I believe that effective communication between the instructor and students will make the course more useful, interesting, and productive. Please contact me if you have any questions, concerns, or suggestions!

Important Note: Any changes to the syllabus or schedule made during the semester take precedence over this version. Check the elearning site (or email) regularly for up-to-date information.

Overall Grading Scale:
  A : 92 - 100
  A-: 89 - 91
  B+: 87 - 88
  B : 82 - 86
  B-: 79 - 81
  C+: 77 - 78
  C : 72 - 76
  C-: 69 - 71
  D+: 67 - 68
  D : 59 - 66
  F : 0 - 58
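
For students who want to check where a course score falls on the scale above, the lookup can be sketched in a few lines of Python. The function name is hypothetical, and the treatment of fractional scores is an assumption: the syllabus lists whole-number ranges only, so this sketch places a score in the highest band whose lower bound it reaches, with no rounding.

```python
# (lower bound, letter) pairs copied from the Overall Grading Scale.
GRADE_FLOORS = [
    (92, "A"), (89, "A-"), (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (72, "C"), (69, "C-"), (67, "D+"), (59, "D"), (0, "F"),
]


def letter_grade(score):
    """Map a course score in [0, 100] to a letter grade.

    Assumption: no rounding; a score of 91.6 stays in the A- band.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"  # unreachable for valid input; kept as a safe default


if __name__ == "__main__":
    for s in (95, 91.6, 85.5, 59, 58):
        print(s, letter_grade(s))
```

Whether borderline scores are rounded up is the instructor's decision; the sketch only shows the bands as printed.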