CS831: Knowledge Discovery in Databases

CS490-001: Introduction to Data Mining
Winter 2016 (201610)
Instructor
Robert J. Hilderman
Office: CW308.23, 3rd Floor, College West
Voice: (306) 585-4061
Fax: (306) 585-4745
e-Mail: robert.hilderman@uregina.ca
WWW: http://www.cs.uregina.ca/~hilder
Office Hours
Location: CW308.23, 3rd Floor, College West
Time: MWF 10:30 – 11:30 AM (or by appointment)
Course Overview
This course will be a mix of self-directed study by the student on various core
data mining topics, a series of assignments to test the student’s understanding of
the core topics, and a term project based upon a research paper chosen by the
student from the recent data mining literature. There will be no exams.
Mark Distribution



Assignments (4) (written)                             40%
Term project proposal (written)                       10%
Term project final report and appendices (written)    50%
                                                    ------
Total                                                100%
Note: Your final mark must be at least 75% to pass the course.
Note: At the instructor’s discretion, the final mark may be adjusted +/-5%.
Choosing a Term Project Topic


The research paper upon which your term project is based must be a recent
publication; specifically, anything published in 2013, 2014, or 2015 is
eligible. It must be approved by the instructor.
The scope of your term project must include significant software
development, data mining, and results evaluation components.
Term Project Proposal Requirements
The term project proposal must contain the following sections (the minimum
requirement):







Statement of Problem: Provide a statement of the problem addressed in the
selected paper.
Examples: Provide one or more detailed examples of the problem that was
solved. These must be complete hand-derived examples different from those
contained in the paper.
Overview of the Proposed Solution: Provide an approximate form of the
proposed solution in writing. This must be a complete hand-derived example
of how the problem was solved, different from the one contained in the paper.
Proposed Software Solution: Provide a detailed description of the software
that you will develop to solve the problem. A pseudocode overview of the
details of your software would be appropriate.
Evaluation Criteria: Describe your plans for testing that the software actually
solves the problem correctly.
Experimental Results: Describe the datasets that you will use to generate your
experimental results.
References: Provide a complete, properly formatted list of the cited
references.
The project proposal must be 8 to 10 double-spaced typewritten pages in 12pt
font.
Term Project Final Report Requirements
The final project report should follow a format similar to that of typical
research papers you have read, such as the one on which you based your term
project. It must contain the following sections (the minimum requirement):






Introduction: Provide some background on the problem addressed by the
project, an overview of the proposed solution, and a description of the report
document (i.e., the organization of the report).
Statement of Problem and Examples: This section can be adapted from your
term project proposal.
Proposed Approach: Provide detailed descriptions of algorithms, data
structures, and/or theoretical results. This section can be adapted from your
term project proposal and the software you developed.
Experimental Results: Provide a description of sample/typical experimental
results, tabular/graphical comparisons of your results with other
published results, and a summary of your results (a detailed description of
your results will be in the appendix).
Brief Comparison to Related Work: Provide a detailed analysis and discussion
of your results in comparison to other related work.
Conclusions: Provide a summary of your results.

References: Provide a complete, properly formatted list of the cited
references.
The final project report must be 16 to 18 double-spaced typewritten pages in 12pt
font. This does not include appendices (see below).
Appendices


Source Code Listing: Provide a complete listing of well-formatted,
well-documented source code.
Experimental Results: Provide a complete listing of all experimental results
(i.e., both raw data and summary data).
Sources for Reference Materials
Books (many other books are available and those below may have newer editions)















Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (eds.),
Advances in Knowledge Discovery and Data Mining, AAAI Press / The MIT
Press, 1996.
Berry, M.J.A. and Linoff, G.S., Mastering Data Mining: The Art and Science
of Customer Relationship Management, Wiley, 2000.
Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan
Kaufmann, 2001.
Hand, D., Mannila, H., and Smyth, P., Principles of Data Mining, The MIT
Press, 2001.
Witten, I.H. and Frank, E., Data Mining: Practical Machine Learning Tools
and Techniques with Java Implementations, Morgan Kaufmann, 2000.
Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer, 2001.
Fayyad, U., Grinstein, G.G., and Wierse, A. (eds.), Information Visualization
in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2002.
Thuraisingham, B., Data Mining: Technologies, Techniques, Tools, and
Trends, CRC Press, 1999.
Pyle, D., Data Preparation for Data Mining, Morgan Kaufmann, 1999.
Dunham, M.H., Data Mining: Introductory and Advanced Topics, Prentice
Hall, 2003.
Russell, S. and Norvig, P., Artificial Intelligence: A Modern Approach,
Prentice Hall, 2003.
Mitchell, T.M., Machine Learning, McGraw-Hill, 1997.
Guillet, F. and Hamilton, H.J., Quality Measures in Data Mining, Springer,
2007.
Liu, B., Web Data Mining, Springer, 2007.
Wu, X. and Kumar, V., The Top Ten Algorithms in Data Mining, CRC Press,
2009.

Bramer, M., Principles of Data Mining, Springer, 2007.
Conference Proceedings (there are many others that have KDD tracks)







Proceedings of the International Conference on Knowledge Discovery and
Data Mining (KDD)
Proceedings of the European Conference on the Principles of Data Mining and
Knowledge Discovery (PKDD)
Proceedings of the Pacific-Asia Conference on Advances in Knowledge
Discovery and Data Mining (PAKDD)
Proceedings of the Data Warehousing and Knowledge Discovery Conference
(DaWaK)
Proceedings of the International Conference on Data Mining (ICDM)
Proceedings of the International Conference on Very Large Data Bases
(VLDB)
Proceedings of the International Conference on Management of Data
(SIGMOD)
Journals (these are just a few of many dealing with KDD)






IEEE Transactions on Knowledge and Data Engineering
Data Mining and Knowledge Discovery
Intelligent Data Analysis
Journal of Intelligent Information Systems
Knowledge and Information Systems
SIGKDD Explorations
Sources for Real World Datasets
To locate each of the sources shown below, use the terms given as keywords in a
web search engine.













UCI KDD Database Repository
UCI Machine Learning Repository
DELVE
FEDSTATS
FIMI Repository
Financial Data Finder
Grain Market Research
Investor Links
MIT Cancer Genomics Gene Expression Datasets
MLnet
National Space Science Data Center
PubGene Gene Database
Stanford Microarray Database







STATLOG Project Datasets
United States Census Bureau
DataCrunch
Reuters-21578 Text Categorization Collection
UCR Time Series Archive
DataWeb
WHO Statistical Information System