COP2253 - University of West Florida

COURSE SYLLABUS
Semester: Fall 2011
Course Prefix/Number: CAP 6990
Course Title: Web Data Mining
Course Credit Hours: 3.0
Course Meeting Times/Places:
Online
Instructor and Contact Information:
Dr. Runa Bhaumik
E-mail: rbhaumik@uwf.edu
Course Web Site: http://elearning.uwf.edu/ (login and select Web Data Mining, CAP6990)
Prerequisites or Co-requisites: Data Mining (CAP5771).
Course Description:
The primary focus of this course is on Web usage mining and its applications to e-commerce and
business intelligence. Specifically, we will consider techniques from machine learning, data
mining, text mining, and databases to extract useful knowledge from Web data which could be
used for site management, automatic personalization, recommendation, and user profiling. The
first half of the course will be focused on a detailed overview of the data mining process and
techniques, specifically those that are most relevant to Web mining. The second half will
concentrate on the applications of these techniques to Web and e-commerce data, and their use in
Web analytics, user profiling and personalization.
List of Topics:
The following issues and topics will be covered throughout the course. Many of these
topics will be revisited several times during the course in a variety of contexts.
Data Mining and Knowledge Discovery
• The KDD process and methodology
• Data preparation for knowledge discovery
• Overview of data mining techniques
• Market basket analysis
• Classification and prediction
• Clustering
• Memory-based reasoning
• Evaluation and Interpretation
Web Usage Mining Process and Techniques
• Data collection and sources of data
• Data preparation for usage mining
• Mining navigational patterns
• Integrating e-commerce data
• Leveraging site content and structure
• User tracking and profiling
• E-Metrics: measuring success in e-commerce
• Privacy issues
Web Mining Applications and Other Topics
• Data integration for e-commerce
• Web personalization and recommender systems
• Web content and structure mining
• Web data warehousing
• Review of tools, applications, and systems
Textbooks and Reading Material:
• Data Mining Techniques for Marketing, Sales, and Customer Relationship Management,
Second Edition, by Michael Berry and Gordon Linoff, John Wiley, 2004.
• Various papers or online resources (provided in class or online).
• Recommended Books:
o Data Mining: Practical Machine Learning Tools and Techniques, by Ian Witten
and Eibe Frank, 2nd Ed., Morgan Kaufmann, 2005. [Note: this is the WEKA
book]
o Mining the Web: Transforming Customer Data into Customer Value, by Gordon
Linoff and Michael Berry, John Wiley & Sons, 2001.
o The Data Webhouse Toolkit, by Ralph Kimball and Richard Merz, John Wiley,
2000.
References:
Weka’s site: http://www.cs.waikato.ac.nz/~ml/weka/
Grading Policy:
The final grade will be determined (tentatively) based on the following components:
Assignments = 65%
Final Project = 35%
Assignments:
There will be 6-7 assignments during the semester involving the concepts and techniques
discussed in class. The assignments may involve experimenting with various tools, as
well as other written or problem-oriented exercises. You may work in groups of 2
students. You may do the analysis together, but each student must write a separate
report; do not copy text from each other. Some assignments must be done individually.
Late Policy:
1. You are expected to complete work on schedule. Deadlines are part of the real
world environment you are being prepared for.
2. Documentation of health or family problems may be required.
Late assignments will be penalized 25% per day (that is, an assignment will not be
accepted four days after the due date).
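As a rough illustration of the late-penalty arithmetic, the short Python sketch below applies 25% per day and stops accepting work at four days. The function name, the choice to count any part of a day as a full day, and the choice to deduct the penalty from the earned score are assumptions made for this example; they are not stated in the policy.

```python
import math
from typing import Optional

PENALTY_PER_DAY = 0.25  # 25% per day, from the late policy
CUTOFF_DAYS = 4         # four days after the due date, work is not accepted


def late_score(raw_score: float, days_late: float) -> Optional[float]:
    """Return the score after the late penalty, or None if not accepted.

    Assumptions (hypothetical, not from the syllabus): any part of a day
    counts as a full day, and the penalty is taken from the earned score.
    """
    if days_late <= 0:
        return raw_score  # on time, no penalty
    whole_days = math.ceil(days_late)
    if whole_days >= CUTOFF_DAYS:
        return None  # too late to be accepted
    return raw_score * (1 - PENALTY_PER_DAY * whole_days)
```

Under these assumptions, an assignment that earned 80 points and arrived two days late would be recorded as 40 points.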
Course Project:
For the class project, students can choose to do an implementation project, a data analysis
project, or a research paper. Implementation projects may be done individually or in
groups of 2 people (depending on the complexity and the type of the project). Research
papers and data analysis projects must be done individually. Each group or individual
will submit a specific project proposal to be approved. More details about the possible
project options, as well as due dates for the proposal and the final submission, will be
available later.
About this Course:
This course is delivered completely online. Students must have consistent access to the Internet.
Learning at a distance may be a very different environment for many of you. You will set your
own schedules, participate in class activities at your convenience, and work at your own pace.
You may require some additional time online during the first few days while you become
accustomed to the online format and you may even feel overwhelmed at times. It will get better.
You should be prepared to spend at least 8 to 10 hours per week online completing lessons
and activities and participating in class discussions. Finally, you may want to incorporate these tips
to help you get started:
• Set a time at least twice a week (schedule) to:
o Check elearning postings to determine your tasks.
o Check elearning frequently throughout the week for updates.
• Within the first week, become familiar with elearning and how to use it.
o It is a tool to help you learn!
• Ask questions when you need answers.
o If you have problems, contact your instructor early.
Technology Requirements:
Knowledge of the WEKA machine learning tool (in the Windows environment) will be
necessary for the project.
Expectations for Academic Conduct/Plagiarism Policy:
Academic Conduct Policy: (Web Format) | (PDF Format) | (RTF Format)
Plagiarism Policy: (Word Format) | (PDF Format) | (RTF Format)
Student Handbook: (PDF Format)
Assistance:
Students with special needs who require specific examination-related or other course-related
accommodations should contact Barbara Fitzpatrick, Director of Disabled Student Services
(DSS), dss@uwf.edu, (850) 474-2387. DSS will provide the student with a letter for the
instructor that will specify any recommended accommodations.
Other Course Policies:
Class material and due dates: Students are responsible for all announcements and all material
presented. Students are expected to keep up with due dates and submit all assignments and work
to the elearning dropbox before the due date.
Communication: You are responsible for checking your e-mail and the elearning site regularly,
preferably once a day, to keep up with important announcements, assignments, etc.
Re-grading Assignments: It is your responsibility to check graded assignments/tests
when they are returned to you. I will gladly re-grade an assignment/test when a question or
mistake is brought to my attention. To ensure fairness, I reserve the right to re-grade the entire
assignment/test. As a result, your grade may increase, decrease, or remain the same. Grades will
not be changed more than a week after the date graded assignments/tests are returned to the class.
Grades: Final grades will be calculated using a standard grade distribution. The last day of the
term for withdrawal from an individual course with an automatic grade of “W” is 3/24. Students
requesting late withdrawal (W or WF) from class must have the approval of the advisor,
instructor, and department chairperson (in that order), and finally of the Academic Appeals
Committee. Requests for late withdrawal may be approved only for the following reasons (which
must be documented):
1. A death in the immediate family.
2. Serious illness of the student or an immediate family member.
3. A situation deemed similar to categories 1 and 2 by all in the approval process.
4. Withdrawal due to Military Service (Florida Statute 1004.07)
5. National Guard Troops Ordered into Active Service (Florida Statute 250.482)
Requests without documentation will not be accepted. Requests for late withdrawal simply for
not succeeding in a course do not meet the criteria for approval and will not be approved.
A request for an incomplete or “I” grade will be considered only if: (1) there are extenuating
circumstances to warrant it, AND (2) you have a passing grade and have completed at least 70%
of the course work, AND (3) the department chair approves.
Participation and Feedback: I encourage active participation and regular feedback. I believe
that effective communication between the instructor and students will make the course more
useful, interesting, and productive. Please contact me if you have any questions, concerns, or
suggestions!
Important Note: Any changes to the syllabus or schedule made during the semester take
precedence over this version. Check the elearning site (or email) regularly for up-to-date
information.
Overall Grading Scale:
1. A: 92-100
2. A-: 89-91
3. B+: 87-88
4. B: 82-86
5. B-: 79-81
6. C+: 77-78
7. C: 72-76
8. C-: 69-71
9. D+: 67-68
10. D: 59-66
11. F: 0-58
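For students who want to estimate where they stand, the following Python sketch combines the tentative component weights from the Grading Policy (Assignments 65%, Final Project 35%) with the scale above. The function names are hypothetical, and rounding a fractional score half-up to a whole number before the lookup is an assumption, since the scale lists whole-number ranges only.

```python
import math

# Tentative weights from the Grading Policy, kept as whole percentages
ASSIGNMENT_WEIGHT = 65
PROJECT_WEIGHT = 35

# (minimum whole-number score, letter) pairs from the Overall Grading Scale
SCALE = [
    (92, "A"), (89, "A-"), (87, "B+"), (82, "B"), (79, "B-"),
    (77, "C+"), (72, "C"), (69, "C-"), (67, "D+"), (59, "D"), (0, "F"),
]


def final_score(assignment_avg: float, project: float) -> float:
    """Weighted course score on a 0-100 scale."""
    return (ASSIGNMENT_WEIGHT * assignment_avg + PROJECT_WEIGHT * project) / 100


def letter_grade(score: float) -> str:
    """Look up the letter for a score.

    Assumption (not stated in the syllabus): fractional scores are
    rounded half-up to the nearest whole number before the lookup.
    """
    rounded = math.floor(score + 0.5)
    for minimum, letter in SCALE:
        if rounded >= minimum:
            return letter
    return "F"
```

For example, an assignment average of 90 and a project score of 70 give a weighted score of 83, which falls in the B range. The instructor's final calculation takes precedence over any estimate made this way.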
Tentative Course Schedule:
Lectures & Course Material
Topics & Reading
Week 1/Week 2
Overview of Web Data Mining and E-Business Analytics
Reading:
• Chapters 1 and 2 of Berry and Linoff
• Web Analytics (Wikipedia)
• Web Mining (Wikipedia)
• Web Mining: Information and Pattern Discovery on
the World Wide Web, by Robert Cooley, Bamshad
Mobasher, and Jaideep Srivastava, ICTAI 1997.
Week 3/Week 4
Knowledge Discovery Process; Data Preparation for Mining
Reading:
• Chapters 3 and 17 of Berry and Linoff
• Data Mining Overview
• Driving e-Commerce Profitability From Online and
Offline Data, White paper from Torrent Systems.
• Web Usage Mining: Discovery and Applications of
Usage Patterns from Web Data, by Jaideep Srivastava,
et al., SIGKDD Explorations, January 2000.
• Read the Description of Porter's Stemming Algorithm
• An online version of the Porter stemmer algorithm:
http://qaa.ath.cx/porter_js_demo.html
• Porter stemmer implementations in many different programming
languages can be found here:
http://tartarus.org/~martin/PorterStemmer/
Week 5/Week 6
Data Mining Techniques: Mining Association Rules and
Sequential Patterns
Reading:
• Chapter 9 of Berry and Linoff
• Web Usage Mining for Web Site Evaluation, by Myra
Spiliopoulou, Communications of the ACM, August 2000.
• An Internet-enabled Knowledge Discovery Process, by
Alex Buchner, et al., MINEit Software Ltd., 1999.
Week 7/Week 8
Data Mining Techniques: Classification & Prediction, Neural
Networks
Reading:
• Chapter 6 of Berry and Linoff
• Modeling Web Robot Navigation Patterns, by Pang-Ning
Tan and Vipin Kumar, WebKDD Workshop at the
ACM SIGKDD Conference, 2000.
• Note: An additional description of the ID3 and C4.5
algorithms can be found in the document "Building
Classification Models: ID3 and C4.5" from the AI course
at Temple University.
Week 9/Week 10
Data Mining Techniques: Clustering; Memory-Based
Reasoning
Reading:
• Chapters 11 and 8 of Berry and Linoff
• Text-Learning and Related Intelligent Agents: A
Survey, by Dunja Mladenic, IEEE Intelligent Systems,
July/August 1999.
• Clustering Users of Large Web Sites into Communities,
by Georgios Paliouras, et al., ICML 2000.
Week 11/Week 12
Web Usage Mining: Data Preparation and Integration
Reading:
• Data Preparation for Mining World Wide Web
Browsing Patterns, by Robert Cooley, Bamshad
Mobasher, and Jaideep Srivastava, Knowledge and
Information Systems, Volume 1, No. 1, 1999.
Week 13/Week 14
Web Usage Mining: E-Metrics and E-Commerce Data
Analysis, Predictive Web Analytics
Reading:
• Chapters 14 and 4 of Berry and Linoff
• E-Commerce Intelligence: Measuring, Analyzing, and
Reporting on Merchandising Effectiveness of Online
Stores, by Stephen Gomory, et al., IBM T. J. Watson
Research Center.
• E-Metrics: Business Metrics For The New Economy,
White Paper from NetGenesis.
• Lessons and Challenges from Mining Retail E-Commerce
Data, by Ron Kohavi, et al., Journal of
Machine Learning, 2003.
• Analysis of Recommendation Algorithms for E-Commerce,
by Badrul Sarwar, et al., ACM Electronic
Commerce Conference, November 2000.
Week 15
Web Personalization and Recommender Systems
Reading:
• Automatic Personalization Based on Web Usage
Mining, by Bamshad Mobasher, Robert Cooley, and
Jaideep Srivastava, Communications of the ACM, August
2000.
• Integrating Web Usage and Content Mining for More
Effective Personalization, by Bamshad Mobasher et al.,
EC-Web 2000.
Important Note: Not all lecture notes are prepared from the textbook. This schedule is just a guideline
to the topics and a good source for solving homework problems. To get a better understanding
of the topics, you should read the related papers and the text from the book. If you find typos or
do not understand a question, please let me know as soon as possible. Do not wait until the last
moment.