Module Descriptor 2012/13
School of Computer Science and Statistics

Module Code: ST4003
Module Name: Data Analytics
Module Short Title: N/a
ECTS weighting: 10
Semester/term taught: Michaelmas term
Contact Hours: Lecture hours: 4; Lab hours: 1; Total hours: 55
Module Personnel: Professor Myra O’Regan

Learning Outcomes
To understand the theory of the following techniques and be able to apply them to a set of data:
- Classification trees
- Neural networks
- Association rules
- Ensemble methods
- Random Forests
- RuleFit procedure (Jerome Friedman)
- Support vector machines
- Evaluation of models

Module Learning Aims
The aim of the course is to introduce students to a set of techniques including classification trees, neural networks, ensemble methods and support vector machines. Some techniques will be discussed in detail, whilst a brief overview will be given for others. Methods to evaluate models will also be discussed.

Module Content
- Introduction
    - Overview
    - Handling missing data
- Detailed discussion of classification trees
- Detailed discussion of evaluation of models
- Overview of association rules
- Overview of neural nets
- Overview of support vector machines
- Ensemble methods
    - General overview of ensemble methods
    - Detailed discussion of Random Forests
    - Detailed discussion of the RuleFit procedure

Recommended Reading List
- Ayres, I. Super Crunchers: How Anything Can Be Predicted. John Murray, 2007.
- Berry, M. J. A. & Linoff, G. Data Mining Techniques, 3rd Edition. John Wiley & Sons, 1997.
- Bishop, C. M. Pattern Recognition and Machine Learning. Springer Science, 2006.
- Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees. Chapman and Hall, 1984.
- Davenport, T. H. & Harris, J. G. Competing on Analytics: The New Science of Winning. Harvard Business School Press, 2007.
- Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, 2nd Edition. Springer Series, 2009.
- Ripley, B. D. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
- Tan, P.-N., Steinbach, M. & Kumar, V. Introduction to Data Mining. Pearson, 2006.
- Webb, A. Statistical Pattern Recognition, 2nd Edition. Wiley, 2002.

Module Pre Requisite: ST3007 – Multivariate Analysis and Applied Forecasting
Module Co Requisite:

Assessment Details
Students will be required to carry out a project applying the above techniques to a set of data using R. The project will consist of a series of mini-projects over the term and will account for 40% of the total mark, with an examination accounting for the remaining 60%.

Module approval date: N/a
Approved By: N/a
Academic Start Year: N/a
Academic Year of Data: N/a