Information Retrieval in High Dimensional Data
Winter semester 2011213
Prof. Dr. M. Kleinsteuber and Dipl. Math. M. Seibert, Geometric Optimization and Machine Learning Group, TU München

A test: Find this person in the audience:

How do we extract and store the picture's information?

Where would you go for a 12-month stay? Analyze the following data:
Dataset 1
Dataset 2
Dataset 3

Dataset 1 (Porto), Dataset 2 (Honolulu), Dataset 3 (Canberra)
How do we extract information?
Is it possible to divide climates simply into "good" and "bad"?
Is it possible to visualize the climate-relatedness of cities?

More examples:
Speech Recognition
Text Classification
Image Analysis
Recognize digits/faces
Sound Separation
Data Visualization

In this course: Get in touch with some of the tools!
No Support Vector Machines
No Regression
No Factor Analysis
No Random Projection
No Neural Networks
No Hidden Markov Models
No Bayes Classifier
No Self Organizing Maps
...
Reference: I. Fodor: A survey of dimension reduction techniques, Technical Report, Berkeley 2002.

Instead: Outline of the course:
1. Curse of Dimensionality
2. Statistical Decision Making
3. Principal Component Analysis
4. Linear Discriminant Analysis
5. Independent Component Analysis
6. Multidimensional Scaling
7. Isomap vs. Locally Linear Embedding
8. Christmas
9. Kernel PCA
10. Robust PCA
11. Sparsity and Morphological Component Analysis

Literature:
A. J. Izenman: Modern Multivariate Statistical Techniques, Springer 2008.
J. A. Lee, M. Verleysen: Nonlinear Dimensionality Reduction, Springer 2007.
T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning, Springer 2009.
Papers (will be provided when appropriate)

Data Analysis
Sources: Books/Papers/Internet...
Lecturer (mk) to students: Communicate contents
Students to lecturer (mk): Give feedback/Ask questions

Students: Accept methods. Be interested. Be independent. Ask questions. Give feedback.
Lecturer (mk): Choose methods. Choose topics. Address the questions. Accept feedback.

Structure of Course
Lecture 2 + Tutorials 2 (M. Seibert and I) (4 assignments + 1 poster session)
Lab course (Matlab programming/Discussion and reading group/Poster session/etc.) 3
Examination:
Assignments required (max. 5 x 20 pts): 33%
30-minute oral examination (up to two persons per exam): 66%

Questions?