Dr. Christos Nikolopoulos
Office: BR 197
(309) 677-2456 chris@bradley.edu
Class web site: http://hilltop.bradley.edu/~chris/cs563.html
Required Textbook:
Witten I. and Frank E., DATA MINING: Practical Machine Learning Tools and
Techniques, Morgan Kaufmann Publishers.
Optional References:
1. Cios et al., DATA MINING: A Knowledge Discovery Approach, Springer, 2007.
2. Chris Nikolopoulos, Expert Systems: An Introduction to First, Second Generation and Hybrid Knowledge based Systems, Marcel Dekker, 1997.
Advances in Knowledge Discovery and Data Mining brings together the latest research in statistics, databases, machine learning, and artificial intelligence, which together contribute to the rapidly growing field of knowledge discovery and data mining. Topics covered include fundamental issues, knowledge representation, cleaning and preprocessing of data sets, classification and clustering, machine learning algorithms, comparing machine learning algorithms and models, and evaluating performance. The complementary topic of Data Warehousing and OLAP is covered in the class CS 572, Advanced Databases.
Tests will be online.
Midterm:
The midterm will be sent by email to each student by 10:00 a.m. on Monday, June 2nd, and she/he will have until 10:00 p.m. the next day (Tuesday, June 3rd) to complete it and email the answers back to the instructor. The answers can be typed in a Word file and emailed, or they can be handwritten, scanned, and emailed. (Times are Peoria (Central) time.)
Final:
The final project substitutes for the final exam, and the data set to be analyzed must be chosen by Friday 5/23. Send me an email to tell me which data set you chose. The project is due on Friday, June 6th, by 5:00 p.m. It has to be written in a research paper format (abstract, introduction, main sections, conclusions, bibliography) and is due back by email.
Homework:
Homework assignments are also due by email on Friday June 6th by 5:00 p.m.
The table below gives the reading assignments each day from the books and online sources.
| # Date | Topics for online discussion | Readings/Assignments |
|---|---|---|
| M day 1 | Introduction to Machine Learning tools and techniques | Witten Part I, Chapter 1, pp. 4-39. Check out the class website (http://hilltop.bradley.edu/~chris/cs563.html) for PowerPoint notes, homework assignments, etc. Download WEKA (see link on class web site). |
| T day 2 | Input: Concepts, instances and attributes | Witten Part I, Chapter 2, pp. 41-60 |
| W day 3 | Output: Knowledge representation | Witten Part I, Chapter 3, pp. 61-82 |
| TH day 4 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 83-111, sections 4.1-4.4. Watch video 1. |
| F day 5 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 112-139, sections 4.5-4.9. Watch video 2. |
| M day 6 | The WEKA machine learning workbench. Decide on a data set to use for Final Project/Exam (University of California Irvine Machine Learning Data set, http://archive.ics.uci.edu/ml/) and send email to instructor to notify him of which data set you chose. | Witten Part II, Chapter 9, pp. 365-368 and Chapter 10, pp. 369-401. Send me an email on which data set you chose for your project, by 5:00 p.m. |
| T day 7 | The WEKA machine learning workbench | Witten Part II, Chapter 10, pp. 401-423 |
| W day 8 | Evaluating what has been learned | Witten Part I, Chapter 5, pp. 143-160. Watch video 3. |
| TH day 9 | Evaluating what has been learned | Witten Part I, Chapter 5, pp. 160-183. Watch video 4. |
| F day 10 | Engineering the input and output, attribute selection, discretizing, automatic data cleansing | Witten Part I, Chapter 7, pp. 285-341 |
| M day 11 | MIDTERM EXAM mailed | The take-home test covers chapters 1, 2, 3, 4, 5 from Witten’s book. Test emailed to you by 10:00 a.m. Due back next day by 10:00 p.m. by email. |
| T day 12 | MIDTERM EXAM due date | Midterm answers due back by 10:00 p.m. by email. |
| W day 13 | Details on decision trees, classification rules, extending linear models, neural nets | Answers to test due back by 9:00 a.m. Witten Part I, Chapter 6, pp. 187-235, sections 6.1-6.3. Watch video 5. |
| TH day 14 | Instance-based learning, numeric prediction, clustering, Bayesian networks | Witten Part I, Chapter 6, pp. 235-283 |
| F day 15 | FINAL PROJECT REPORT DUE and HOMEWORK ASSIGNMENT DUE | By 5:00 p.m. both HW and the PROJECT are due, by email. |
100 Points Total
50% Midterm Exam
30% Final Data Mining project report (in place of final exam)
20% homework assignments
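As an illustration only (not an official grade calculator), the weighting above can be sketched in Python. The sketch assumes each component is scored as a percentage from 0 to 100; the names WEIGHTS and course_total are hypothetical and chosen just for this example.

```python
# Weights come from the grading breakdown above; the names and the
# percentage-based scoring are assumptions made for this sketch.
WEIGHTS = {"midterm": 0.50, "project": 0.30, "homework": 0.20}


def course_total(scores):
    """Return the weighted course total out of 100 points.

    `scores` maps each component name to a percentage in [0, 100].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical student: 80% midterm, 90% project, 100% homework.
    example = {"midterm": 80, "project": 90, "homework": 100}
    print(f"Course total: {course_total(example):.1f} / 100")
```

For the hypothetical scores shown, the midterm contributes 40 points, the project 27, and the homework 20.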
Some Videos on DM/KD to watch:
Video 1: IIT lecture 1: http://www.bing.com/videos/watch/video/lecture-34-data-miningand-knowledge-discovery/1d0668894dc732fe82b91d0668894dc732fe82b9-83872645560
Video 2: IIT lecture 2: http://www.bing.com/videos/watch/video/lecture-35-data-miningand-knowledge-discovery-part-ii/f2c1c8cfcc5e319417f6f2c1c8cfcc5e319417f6-29437526744
Video 3: DM and KD: http://videolectures.net/mps07_lavrac_dmkd/
Video 4: Data Mining at NASA: http://videolectures.net/kdd09_srivastava_dmnasata/
Video 5: SQL Know How Video: http://www.microsoft.com/showcase/en/us/details/38b7e057-42d2-4a8c-b4d2-3154bc35d87a
More: http://videolectures.net/Top/Computer_Science/Data_Mining/
Project:
The project can be done as a team project (teams of at most two members), but individual projects are also fine if you so choose. The project is open-ended and involves applying WEKA to analyze a data set. Which algorithms to use, which are the most appropriate, how to clean the data, etc. are entirely up to you. To find a good data set for your project, look at the machine learning repository at the University of California Irvine’s ML site: http://archive.ics.uci.edu/ml/ . The data mining software you will use for your project is WEKA (see Witten's book). Download the WEKA software from the link on my main class page.