Statistical Learning & Inference Lecturer: Liqing Zhang Dept. Computer Science & Engineering, Shanghai Jiao Tong University Books and References – Trevor Hastie Robert Tibshirani Jerome Friedman , The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2001, Springer-Verlag – V. Cherkassky & F. Mulier, Learning From Data, Wiley,1998 – Vladimir N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer, 2000 – M. Vidyasagar, Learning and generalization: with applications to neural networks, 2nd ed., Springer, 2003 – G. Casella & R. Berger, Statistical Inference, Thomson, 2002 – T. Cover & J. Thomas, Elements of Information Theory, Wiley 2015/4/13 Statistical Learning and Inference 2 Overview of the Course Introduction Overview of Supervised Learning Linear Method for Regression and Classification Basis Expansions and Regularization Kernel Methods Model Selections and Inference Support Vector Machine Bayesian Inference Unsupervised Learning 2015/4/13 Statistical Learning and Inference 3 Why Statistical Learning? 我门被信息淹没,但却缺乏知识。---- R. Roger 恬静的统计学家改变了我们的世界;不是通过发现新的事实或者 开发新技术,而是通过改变我们的推理、实验和观点的形成方式。 ---- I. Hacking 问题:为什么现在的计算机处理智能信息效率很低? – 图像、视频、音频 – 认知、交流 – 语言、语音、文本 – 生物、基因、蛋白 2015/4/13 Statistical Learning and Inference 4 Cloud Computing Cloud Computing Service Layers Services Services Application Focused Services – Complete business services such as PayPal, OpenID, OAuth, Google Maps, Alexa Application Application – Cloud based software that eliminates the need for local installation such as Google Apps, Microsoft Online Development Development – Software development platforms used to build custom cloud based applications (PAAS & SAAS) such as SalesForce Platform Infrastructure Focused Description Storage Hosting Platform – Cloud based platforms, typically provided using virtualization, such as Amazon ECC, Sun Grid Storage – Data storage or cloud based NAS such as CTERA, iDisk, CloudNAS Hosting – Physical data centers such as those run by IBM, HP, NaviSite, etc. 个人用户需要更多的功能: •疾病监护/心肺疾病 问题心电发 送 •康复训练 •健身指导等 心电采集和初 步诊断 调动社区医院空闲资源 反馈治疗建 议 自动诊断和辅助诊断 数据共享(远程医生) 咨询系统 社区医院 远程诊疗与监护 中心 人工诊断和治 疗建议 诊断结果反馈 数据发送 社区医院 医院医生也是远程种 新的用户 社区医院也需要更多的功能: •心电、呼吸、血压 •慢性病康复训练 •健身指导等 ML: SARS Risk Prediction Pre-Hospital Attributes 2015/4/13 RBC Count Albumin Blood pO2 White Count Chest X-Ray Age Gender Blood Pressure SARS Risk In-Hospital Attributes Statistical Learning and Inference 8 ML: Auto Vehicle Navigation Steering Direction 2015/4/13 Statistical Learning and Inference 9 Protein Folding 2015/4/13 Statistical Learning and Inference 10 The Scale of Biomedical Data 2015/4/13 Statistical Learning and Inference 11 General Procedure in SL Problem Definition Predictions Data Acquisition ML Procedure Model Training Feature Analysis EX. Pattern Classification Objective: To recognize horse in images Procedure: Feature => Classifier => Cross+Valivation 2015/4/13 Statistical Learning and Inference 13 Classifier Horse Non Horse 2015/4/13 Statistical Learning and Inference 14 Function Estimation Model The Function Estimation Model of learning examples: – Generator (G) generates observations x (typically in Rn), independently drawn from some fixed distribution F(x) – Supervisor (S) labels each input x with an output value y according to some fixed distribution F(y|x) – Learning Machine (LM) “learns” from an i.i.d. l-sample of (x,y)-pairs output from G and S, by choosing a function that best approximates S from a parameterised function class f(x,), where is in the parameter set 2015/4/13 Statistical Learning and Inference 15 Function Estimation Model G x S y LM ^y Key concepts: F(x,y), an i.i.d. k-sample on F, functions f(x,) and the equivalent representation of each f using its index 2015/4/13 Statistical Learning and Inference 16 The Problem of Risk Minimization The loss functional (L, Q) – the error of a given function on a given example L : x, y, f L y, f x, Q : z, Lz y , f z x , The risk functional (R) – the expected loss of a given function on an example drawn from F(x,y) – the (usual concept of) generalisation error of a given function R Q z , dF z 2015/4/13 Statistical Learning and Inference 17 The Problem of Risk Minimization Three Main Learning Problems – Pattern Recognition: y 0,1and L y, f x, 1y f x, – Regression Estimation: y and L y, f x, y f x, 2 – Density Estimation: y 0,1 and L px, log px, 2015/4/13 Statistical Learning and Inference 18 General Formulation The Goal of Learning – Given an i.i.d. k-sample z1,…, zk drawn from a fixed distribution F(z) – For a function class’ loss functionals Q (z ,), with in – We wish to minimise the risk, finding a function * * arg min R 2015/4/13 Statistical Learning and Inference 19 General Formulation The Empirical Risk Minimization (ERM) Inductive Principle – Define the empirical risk (sample/training error): 1 k Remp Qzi , k i 1 – Define the empirical risk minimiser: k arg min Remp – ERM approximates Q (z ,*) with Q (z ,k) the Remp minimiser…that is ERM approximates * with k – Least-squares and Maximum-likelihood are realisations of ERM 2015/4/13 Statistical Learning and Inference 20 4 Issues of Learning Theory 1. Theory of consistency of learning processes • What are (necessary and sufficient) conditions for consistency (convergence of Remp to R) of a learning process based on the ERM Principle? 2. Non-asymptotic theory of the rate of convergence of learning processes • How fast is the rate of convergence of a learning process? 3. Generalization ability of learning processes • How can one control the rate of convergence (the generalization ability) of a learning process? 4. Constructing learning algorithms (i.e. the SVM) • How can one construct algorithms that can control the generalization ability? 2015/4/13 Statistical Learning and Inference 21 Change in Scientific Methodology NEW TRADITIONAL Formulate hypothesis Design experiment Collect data Analyze results Review hypothesis Repeat/Publish 2015/4/13 Design large experiments Collect large data Put data in large database Formulate hypothesis Evaluate hypothesis on database Run limited experiments Review hypothesis Repeat/Publish Statistical Learning and Inference 22 Learning & Adaptation Any method that incorporates information from training samples in the design of a classifier employs learning. Due to complexity of classification problems, we cannot guess the best classification decision ahead of time, we need to learn it. Creating classifiers then involves positing some general form of model, or form of the classifier, and using examples to learn the complete classifier. 2015/4/13 Statistical Learning and Inference 23 Supervised learning In supervised learning, a teacher provides a category label for each pattern in a training set. These are then used to train a classifier which can thereafter solve similar classification problems by itself. – Such as Face Recognition, Text Classification, …… 2015/4/13 Statistical Learning and Inference 24 Unsupervised learning In unsupervised learning, or clustering, there is no explicit teacher or training data. The system forms natural clusters of input patterns and classifiers them based on clusters they belong to . – Data Clustering, Data Quantization, Dimensional Reduction, …… 2015/4/13 Statistical Learning and Inference 25 Reinforcement learning In reinforcement learning, a teacher only says to classifier whether it is right when suggesting a category for a pattern. The teacher does not tell what the correct category is. – Agent, Robot, …… 2015/4/13 Statistical Learning and Inference 26 Classification The task of the classifier component is to use the feature vector provided by the feature extractor to assign the object to a category. Classification is the main topic of this course. The abstraction provided by the feature vector representation of the input data enables the development of a largely domain-independent theory of classification. Essentially the classifier divides the feature space into regions corresponding to different categories. 2015/4/13 Statistical Learning and Inference 27 Classification The degree of difficulty of the classification problem depends on the variability in the feature values for objects in the same category relative to the feature value variation between the categories. Variability is natural or is due to noise. Variability can be described through statistics leading to statistical pattern recognition. 2015/4/13 Statistical Learning and Inference 28 Classification Question: How to design a classifier that can cope with the variability in feature values? What is the best possible performance? Object Representation in Feature Space S(x)>=0 Class A S(x)<0 Class B Noise and Biological Variations Cause Class Spread X2 (area) Classification error due to class overlap S(x)=0 Objects (perimeter) X1 2015/4/13 Statistical Learning and Inference 29 Examples User interfaces: modelling subjectivity and affect, intelligent agents, transduction (input from camera, microphone, or fish sensor) Recovering visual models: face recognition, modelbased video, avatars Dynamical systems: speech recognition, visual tracking, gesture recognition, virtual instruments Probabilistic modeling: image compression, low bandwidth teleconferencing, texture synthesis …… 2015/4/13 Statistical Learning and Inference 30 Course Web http://bcmi.sjtu.edu.cn/statLearnig/ Teaching Assistant: Liu Ye Email: lyyx000@yahoo.cn 2015/4/13 Statistical Learning and Inference 31 Assignment To write a report on the topic you are working on, including: – – – – Problem definition Model and method Key issues to be solved Outcome 2015/4/13 Statistical Learning and Inference 32