New Jersey Institute of Technology College of Computing Sciences

IS-698
Web Mining
Course Syllabus- Fall 2011
Instructor: Min Song
Office: Room 4102 – GITC Building – 4th floor
Office Hours: Check my Web site for hours; other times by appointment
Web Site: http://web.njit.edu/~song/
Telephone: 973-596-5291 (email is much better if you have to leave a message)
E-mail: min.song@njit.edu
Course: IS 698
Where: FMH308
When: Tuesday, 6:00pm – 9:05pm
I am on campus extensively, but send me an e-mail to make sure that I am available in my office.
OVERVIEW
Web mining aims to discover useful information or knowledge from the Web hyperlink structure, page
content, and usage logs. It has quickly become one of the most popular areas in computing and
information systems because of its direct applications in e-commerce, Web analytics, information
retrieval/filtering, Web personalization, and recommender systems. Employees knowledgeable about
Web mining techniques and their applications are highly sought by major Web companies such as
Google, Amazon, Yahoo, MSN and others who need to understand user behavior and utilize discovered
patterns from terabytes of user profile data to design more intelligent applications.
The primary focus of this course is on Web usage mining and its applications to business intelligence and
biomedical domains. Specifically, we will consider techniques from machine learning, data mining, text
mining, and databases to extract useful knowledge from Web data which could be used for site
management, automatic personalization, recommendation, and user profiling.
Programming assignments give hands-on experience with web mining tasks. Programming experience is
required.
DISCUSSION:
Lectures based on the text listed below and possibly other sources
Semester-long project and paper.
PREREQUISITES
Knowledge of and experience with Java programming are required.
GRADING
Grades are assigned based on 3-4 assignments, a midterm, a final project, and class participation. The
grading breakdown is as follows:
* Assignments: 20%
* Midterm: 25%
* Final Project: 45%
* Class Participation: 10%
Late assignments will not be accepted.
COLLABORATION POLICY
For assignments, you are not allowed to discuss your answers with other students. Copying solutions
from other students is never allowed. For the group project, you will work in teams and hand in only one
written report.
TEXT
 Required Textbook:

Web Data Mining - Exploring Hyperlinks, Contents and Usage Data, by Bing Liu, Springer, ISBN 3-540-37881-2, Dec 2006. (It is available in the UIC bookstore.)
 References

Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann
Publishers, ISBN 1-55860-489-8.

Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The MIT Press, ISBN
0-262-08290-X.

Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar,
Pearson/Addison Wesley, ISBN 0-321-32136-7.

Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7
IS 698 Course Schedule – Fall 2011
Class
Topic
Reading
Assignment Due/Project Progress
#1 – Sep 6
Introduction and Project Idea
Chapter 1
Week 1
Data pre-processing and Natural Language Processing (NLP)
Chapter 1
Week 2
Project team
#2 – Sep 13

· Data cleaning
· Data transformation
· Data reduction
· Part-of-Speech
· Sentence Parsing
· Text Chunking
#3 – Sep 20
Association rules and sequential patterns
· Basic concepts
· Apriori Algorithm
· Sequential pattern mining
Chapter 2
Week 3
Chapter 3
Assignment 1
Week 4
Chapter 4
Project proposal
Proposal Outline
#4 – Sep 27
Supervised learning (Classification) and Unsupervised Learning (Clustering)
· Basic concepts
· Decision trees
· Classifier evaluation
· Rule induction
· Classification based on association rules
· Naive-Bayesian learning
#5 – Oct 4
Information retrieval and Web search
#6 – Oct 11
Question Answering
Chapter 5
Week 5
Class Exercise
Full-text Mining
Chapter 6
Assignment 2
Week 6
Chapter 7
Week 7
Project progress report
#7 – Oct 18
#8 – Oct 25
#8 –
Midterm – in-class exam
#9 – Nov 1
Partially supervised learning
· Word Sense Disambiguation (WSD)
#10 – Nov 8
Link analysis
Chapter 8
Week 8
· Social network analysis: centrality and prestige
· Citation analysis: co-citation and bibliographic coupling
· Mining communities on the Web
#11 – Nov 15
Data extraction and information integration
#12 – Nov 22
No Class – Thanksgiving Break
Chapter 9
Assignment 3
#13 – Nov 29
Opinion mining and summarization
Chapter 10
#14 – Dec 6
Future Direction of Web Mining
#15 – Dec 13
Project Presentation
Deadline for submission of final paper
Note: Each class meeting will be three hours long instead of one and a half hours.
Conference trips: Oct 25 and possibly Nov 18.