Module Descriptor 2012/13
School of Computer Science and Statistics.
Module Code: ST4003
Module Name: Data Analytics
Module Short Title: N/a
ECTS weighting: 10
Semester/term taught: Michaelmas term
Contact Hours
Lecture hours: 4
Lab hours: 1
Total hours: 55
Module Personnel: Professor Myra O’Regan
Learning Outcomes
To understand the theory and be able to apply the following techniques to a set
of data:
Classification trees
Neural networks
Association rules
Ensemble methods
Random Forests
RuleFit procedure (Jerome Friedman)
Support vector machines
Evaluation of models
Module Learning Aims
The aim of the course is to introduce students to a set of techniques including
classification trees, neural networks, ensemble methods and support vector
machines. Some techniques will be discussed in detail whilst a brief overview will be
given for others. Methods to evaluate models will also be discussed.
Module Content
Introduction
Overview
Handling Missing data
Detailed discussion of Classification Trees
Detailed discussion of Evaluation of Models
Overview of Association Rules
Overview of Neural Nets
Overview of Support vector machines
Ensemble methods
General Overview of Ensemble methods
Detailed discussion of Random Forests
Detailed discussion of RuleFit procedure
Recommended Reading List
Ayres, I. Super Crunchers: How Anything Can Be Predicted, John Murray, 2007.
Berry, M. J. A. & Linoff, G. Data Mining Techniques, 3rd Edition, John Wiley & Sons, 1997.
Bishop, C. Pattern Recognition and Machine Learning, Springer Science, 2006.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees, Chapman and Hall, 1984.
Davenport, T. H. & Harris, J. G. Competing on Analytics: The New Science of Winning, Harvard Business School Press, 2007.
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, 2nd Edition, Springer Series, 2009.
Ripley, B. D. Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
Tan, P.-N., Steinbach, M. & Kumar, V. Introduction to Data Mining, Pearson, 2006.
Webb, A. Statistical Pattern Recognition, 2nd Edition, Wiley, 2002.
Module Prerequisite: ST3007 – Multivariate Analysis and Applied Forecasting
Module Co-requisite:
Assessment Details
Students will be required to carry out a project employing the above techniques on
a set of data using R. The project will consist of a series of mini projects over the
term and will account for 40% of the total mark with an exam accounting for the
remaining 60%.
Module approval date: N/a
Approved By: N/a
Academic Start Year: N/a
Academic Year of Data: N/a