Lasso, Support Vector Machines, Generalized linear models
Kenneth D. Harris
20/5/15

Multiple linear regression
• What are you predicting? Data type: continuous; dimensionality: 1
• What are you predicting it from? Data type: continuous; dimensionality: p
• How many data points do you have? Enough
• What sort of prediction do you need? Single best guess
• What sort of relationship can you assume? Linear

Ridge regression
• What are you predicting? Data type: continuous; dimensionality: 1
• What are you predicting it from? Data type: continuous; dimensionality: p
• How many data points do you have? Not enough
• What sort of prediction do you need? Single best guess
• What sort of relationship can you assume? Linear

Regression as a probability model
• What are you predicting? Data type: continuous; dimensionality: 1
• What are you predicting it from? Data type: continuous; dimensionality: p
• How many data points do you have? Not enough
• What sort of prediction do you need? Probability distribution
• What sort of relationship can you assume? Linear

Different data types
• What are you predicting? Data type: discrete, integer, whatever; dimensionality: 1
• What are you predicting it from? Data type: continuous; dimensionality: p
• How many data points do you have? Not enough
• What sort of prediction do you need? Single best guess
• What sort of relationship can you assume? Linear – nonlinear

Ridge regression
Linear prediction: $\hat{y}_i = \mathbf{w} \cdot \mathbf{x}_i$
Loss function: $L = \sum_i \tfrac{1}{2}(\hat{y}_i - y_i)^2 + \tfrac{1}{2}\lambda\|\mathbf{w}\|_2^2$
The first term measures fit quality; the second is the penalty. Both the fit-quality term and the penalty can be changed.

"Regularization path" for ridge regression
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_path.html

Changing the penalty
• $\|\mathbf{w}\|_2 = \sqrt{\sum_i w_i^2}$ is called the "$L_2$ norm"
• $\|\mathbf{w}\|_1 = \sum_i |w_i|$ is called the "$L_1$ norm"
• In general, $\|\mathbf{w}\|_p = \left(\sum_i |w_i|^p\right)^{1/p}$ is called the "$L_p$ norm"

The LASSO
Loss function: $L = \sum_i \tfrac{1}{2}(\hat{y}_i - y_i)^2 + \tfrac{1}{2}\lambda\|\mathbf{w}\|_1$
As in ridge regression, the first term measures fit quality and the second is the penalty; only the norm used in the penalty has changed.

LASSO regularization path
• Most weights are exactly zero
• "Sparse solution": the LASSO selects a small number of explanatory variables
• This can help avoid overfitting when p ≫ N
• Models are easier to interpret – but remember there is no proof of causation.
• The path is piecewise-linear (see the code sketch below)
http://scikit-learn.org/0.11/auto_examples/linear_model/plot_lasso_lars.html

Elastic net
• $L = \sum_i \tfrac{1}{2}(\hat{y}_i - y_i)^2 + \tfrac{1}{2}\lambda_1\|\mathbf{w}\|_1 + \tfrac{1}{2}\lambda_2\|\mathbf{w}\|_2^2$
• Combines the $L_1$ and $L_2$ penalties.
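The two scikit-learn pages linked above show these regularization paths graphically. Below is a minimal sketch of how such a path can be computed, assuming scikit-learn is available; the synthetic data and variable names are illustrative, not from the lecture. It fits the LASSO over a decreasing sequence of penalties and counts the nonzero weights, showing the sparse solutions described above.

```python
# Minimal sketch: LASSO regularization path with scikit-learn.
# The data here are synthetic; in practice X would be your design
# matrix (N points x p predictors) and y the continuous target.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.RandomState(0)
N, p = 50, 20                      # deliberately few points relative to p
X = rng.randn(N, p)
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 1.0]      # only 3 variables actually matter
y = X @ true_w + 0.1 * rng.randn(N)

# lasso_path computes the whole path of solutions as the penalty
# (called alpha in scikit-learn, lambda in the lecture) decreases.
alphas, coefs, _ = lasso_path(X, y)

# coefs has shape (p, n_alphas): one row per weight, one column per alpha.
# For moderate-to-large alpha, most rows are exactly zero: the
# "sparse solution" that selects a few explanatory variables.
for a, w in zip(alphas[::10], coefs.T[::10]):
    print(f"alpha={a:.3f}  nonzero weights: {np.sum(w != 0)}")
```

Between the penalty values where variables enter or leave the model, each weight's trajectory is a straight line, which is the piecewise-linear property noted above; the linked scikit-learn example computes the same path with the LARS algorithm.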
Predicting other types of data
Linear prediction: $f_i = \mathbf{w} \cdot \mathbf{x}_i$
Loss function: $L = \sum_i E(f_i, y_i) + \tfrac{1}{2}\lambda\|\mathbf{w}\|_2^2$
The first term measures fit quality; the second is the penalty. For ridge regression, $E(f_i, y_i) = \tfrac{1}{2}(f_i - y_i)^2$. But it could be anything…

Support vector machine
• For predicting binary data
• "Hinge loss" function:
$$E(f_i, y_i) = \begin{cases} 0 & y_i = 1,\ f_i \ge 1 \\ 1 - f_i & y_i = 1,\ f_i < 1 \\ 0 & y_i = -1,\ f_i \le -1 \\ 1 + f_i & y_i = -1,\ f_i > -1 \end{cases}$$
• Equivalently, $E(f_i, y_i) = \max(0, 1 - y_i f_i)$.
[Figure: hinge loss $E$ as a function of $f$]

Errors vs. margins
• Margins are the places where $f_i = \pm 1$
• On the correct side of the margin: zero error.
• On the incorrect side: the error is the distance from the margin.
• The penalty term is higher when the margins are close together.
• The SVM balances classifying points correctly against having big margins.

Generalized linear models
• What are you predicting? Data type: discrete, integer, whatever; dimensionality: 1
• What are you predicting it from? Data type: continuous; dimensionality: p
• How many data points do you have? Not enough
• What sort of prediction do you need? Probability distribution
• What sort of relationship can you assume? Linear – nonlinear

Generalized linear models
Linear prediction: $f_i = \mathbf{w} \cdot \mathbf{x}_i$
Loss function: $L = \sum_i E(f_i, y_i) + \tfrac{1}{2}\lambda\|\mathbf{w}\|_2^2$
For ridge regression, $E(f_i, y_i) = \tfrac{1}{2}(f_i - y_i)^2 = \mathrm{const} - \log P(y_i; f_i)$ for a Gaussian distribution with mean $f_i$.

Generalized linear models
Linear prediction: $f_i = \mathbf{w} \cdot \mathbf{x}_i$
Loss function: $L = -\sum_i \log P(y_i; f_i) + \tfrac{1}{2}\lambda\|\mathbf{w}\|_2^2$
where $P(y_i; f_i)$ is a probability distribution for $y_i$ with parameter $f_i$.

Example: logistic regression
$$P(y_i; f_i) = \begin{cases} \dfrac{1}{1 + e^{-f_i}} & y_i = 1 \\[1ex] \dfrac{1}{1 + e^{f_i}} & y_i = -1 \end{cases}$$
[Figure: $P(y; f)$ as a function of $f$ for $y = \pm 1$]

Logistic regression loss function
• $E(f_i, y_i) = -\log P(y_i; f_i) = \log\!\left(1 + e^{-y_i f_i}\right)$

Poisson regression
• Used when $y_i$ is a non-negative integer (e.g. a spike count)
• The distribution for $y_i$ is Poisson with mean $g(f_i)$
• The "link function" $g$ must be positive. It is often the exponential function, but it doesn't have to be (and that's not always a good idea).

What to read; what software to use
glmnet for MATLAB: http://web.stanford.edu/~hastie/glmnet_matlab/
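To tie the section together, here is a small sketch (my own illustration, not code from the lecture or from glmnet) of the shared loss $L = \sum_i E(f_i, y_i) + \tfrac{1}{2}\lambda\|\mathbf{w}\|_2^2$ in Python, with the per-point term $E(f_i, y_i)$ swapped out to give ridge regression, the SVM, logistic regression, or Poisson regression:

```python
# Minimal sketch of the loss framework from the slides:
#   L = sum_i E(f_i, y_i) + (lambda/2) ||w||^2,  with  f_i = w . x_i.
# Swapping E changes the model; the linear prediction stays the same.
import numpy as np

def ridge_E(f, y):                       # Gaussian: ridge regression
    return 0.5 * (f - y) ** 2

def hinge_E(f, y):                       # SVM (y in {-1, +1})
    return np.maximum(0.0, 1.0 - y * f)

def logistic_E(f, y):                    # logistic regression (y in {-1, +1})
    return np.logaddexp(0.0, -y * f)     # stable log(1 + exp(-y*f)) = -log P(y; f)

def poisson_E(f, y):                     # Poisson with exponential link g(f) = e^f
    return np.exp(f) - y * f             # -log P(y; f) up to the constant log(y!)

def loss(w, X, y, E, lam):
    f = X @ w                            # linear prediction f_i = w . x_i
    return np.sum(E(f, y)) + 0.5 * lam * np.sum(w ** 2)

# Toy usage (names and data illustrative):
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = np.sign(X[:, 0] + 0.5 * rng.randn(100))
w = np.zeros(5)
print(loss(w, X, y, logistic_E, lam=1.0))   # = 100 * log(2) at w = 0
```

Minimizing this loss with a generic optimizer (e.g. scipy.optimize.minimize) recovers each model; in practice, packages such as glmnet (for the GLMs) or scikit-learn fit them far more efficiently, computing whole regularization paths at once.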