Monday March 1, 2004
DSES 6180-01 DATA MINING AND KNOWLEDGE DISCOVERY
Instructor: Prof. Mark J. Embrechts (x 4009 or 371-4562)
Office hrs: CII 5217, Thursday 10:00-11:00
Class Time: Monday/Thursday 4:00-5:20 pm (Jonsson-Rowland Science Center 2C30)
Book: Margaret H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.

LECTURE #12: Direct Kernel Methods

This lecture will introduce the paradigm of direct kernel methods, which unifies neural networks, statistical multivariate regression, and support vector machines in a single framework. Direct kernel methods treat the kernel transform as a data preprocessing step rather than as an inherent part of the learning method. By applying a direct kernel transform, traditional linear methods such as Principal Component Analysis (PCA), Ridge Regression, Partial Least Squares (PLS), Independent Component Analysis (ICA), or simple one-layer neural networks can be turned into powerful nonlinear modeling and machine learning tools. The presentation will highlight several industrial applications of direct kernel methods, such as network intrusion detection, the detection of ischemia from magnetocardiograms, in-silico drug design, the electronic nose, gene expression arrays, and the detection of mixtures of chemical substances from spectral data.

Handouts
1. Lecture slides posted on the website
2. Mark J. Embrechts, "Direct Kernel Least-Squares Support Vector Machines with Heuristic Regularization," submitted for presentation at IJCNN 2004, Budapest, July 2004.

Quiz
1. Explain PLS in half a page or less.
2. Explain equation 8.

Deadlines:
January 22      HW #0 (Web browsing)
January 29      Project Proposal
February 16     HW #1
March 1         Quiz #1 on PLS paper by Svante Wold et al.
March 4         HW #2
March 18        HW #3
March 8 & 11    Spring Break
March 15        Progress Report
April 8         No Class
April 22/26     Final Presentations
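The central idea of Lecture #12, a kernel transform applied as a preprocessing step followed by an ordinary linear method, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the method from the handout paper: it assumes a Gaussian (RBF) kernel with a hand-picked width sigma, skips kernel centering, and solves the regularized least-squares system (K + lambda*I) w = y with a hand-picked ridge value instead of the paper's heuristic regularization. All function names (rbf_kernel, kernel_matrix, solve, fit_direct_kernel_ridge, predict) are hypothetical.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(rows, train, sigma):
    """Direct kernel transform: row i holds k(rows[i], train[j]) for every training point j."""
    return [[rbf_kernel(r, t, sigma) for t in train] for r in rows]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_direct_kernel_ridge(x_train, y_train, sigma=1.0, ridge=1e-3):
    """Kernel transform first, then regularized least squares on the kernel matrix.

    Assumption: no kernel centering and a fixed, hand-picked ridge value.
    """
    k = kernel_matrix(x_train, x_train, sigma)
    for i in range(len(k)):
        k[i][i] += ridge  # adds lambda on the diagonal: (K + lambda*I) w = y
    return solve(k, y_train)


def predict(x_new, x_train, weights, sigma=1.0):
    """Predictions are the new-vs-train kernel matrix multiplied by the learned weights."""
    k_new = kernel_matrix(x_new, x_train, sigma)
    return [sum(kij * wj for kij, wj in zip(row, weights)) for row in k_new]


if __name__ == "__main__":
    xs = [[0.5 * i] for i in range(7)]           # seven samples on [0, 3]
    ys = [math.sin(0.5 * i) for i in range(7)]   # nonlinear target
    w = fit_direct_kernel_ridge(xs, ys, sigma=0.5, ridge=1e-4)
    print(predict(xs, xs, w, sigma=0.5))
```

With sigma=0.5 and ridge=1e-4, the fitted model closely reproduces the seven sin(x) training targets, even though the model after the transform is linear in the kernel matrix. The design choice worth noting is that only the kernel_matrix step is nonlinear; replacing the ridge solve with PLS or PCA on the same kernel matrix is what the lecture means by turning those linear methods into direct kernel methods.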