Evaluating Models

LECTURE 02:
EVALUATING MODELS
January 27, 2016
SDS 293
Machine Learning
Announcements / Questions
• Life Sciences and Technology Fair is tomorrow:
3:30-6pm in the Carroll Room
www.smith.edu/lazaruscenter/fairs_scitech.php
• Office hours: does anyone have a conflict?
Outline
• Evaluating Models
• Lab pt. 1 – Introduction to R:
- Basic Commands
- Graphics Overview
- Indexing Data
- Loading Data
- Additional Graphical/Numerical Summaries
• Lab pt. 2 - Exploring other datasets (time permitting)
Beyond LR
Stated goal of this course:
explore methods that go beyond
standard linear regression
One tool to rule them all…?
Question: why not just teach you the best one first?
Answer: it depends
• No single method dominates all others over all possible data sets
• On a particular data set, for a particular question, one
specific method may work well; on a related but not
identical dataset or question, another might be better
• Choosing the right approach is arguably the most
challenging aspect of doing statistics in practice
• So how do we do it?
Measuring “Quality of Fit”
• One question we might ask: how well do my model’s
predictions actually match the observations?
• What we need: a way to measure how close the
predicted response is to the true response
• Flashback to your stats training: what do we use in
regression?
Mean Squared Error
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}(x_i)\bigr)^2$$
where $y_i$ is the true response for the $i$th observation and $\hat{f}(x_i)$ is the prediction our model gives for the $i$th observation; we take the average of the squared differences over all observations.
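To make this concrete, here is a minimal R sketch of computing a training MSE for a simple linear model on simulated data (the data and all variable names are illustrative, not from the lecture):

```r
# Simulate a small training set (purely illustrative)
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 2)   # true linear relationship plus noise

# Fit a linear model and compute the training MSE:
# the average squared difference between the observed and fitted responses
fit <- lm(y ~ x)
train_mse <- mean((y - fitted(fit))^2)
train_mse
```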
“Training” MSE
• This version of MSE is computed using the training data
that was used to fit the model
• Reality check: is this what we care about?
Test MSE
• Better plan: see how well the model does on observations
we didn’t train on
• Given some never-before-seen examples, we can just
calculate the MSE on those using the same method
• But what if we don’t have any new observations to test?
- Can we just use the training MSE?
- Why or why not?
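If we do have data to spare, one hedged sketch of the idea in R is to hold some of it out as a stand-in for new observations, fit on the rest, and compute the MSE on the held-out part (the split and names below are illustrative):

```r
set.seed(2)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Hold out half the observations as a "never-before-seen" test set
test_idx <- sample(n, n / 2)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Fit only on the training half, then evaluate on both halves
fit <- lm(y ~ x, data = train)
train_mse <- mean((train$y - fitted(fit))^2)
test_mse  <- mean((test$y - predict(fit, newdata = test))^2)
c(train = train_mse, test = test_mse)
```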
Example
[Figure: simulated data fit at several levels of flexibility, with the test MSE and the average training MSE plotted against flexibility.]
Training vs. Test MSE
• As the flexibility of the
statistical learning method
increases, we observe:
- a monotone decrease in the
training MSE
- a U-shape in the test MSE
• Fun fact: occurs regardless
of the data set and statistical
method being used
• As flexibility increases,
training MSE will decrease,
but the test MSE may not
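A sketch of how this pattern can show up in practice, using polynomial degree as a stand-in for flexibility (simulated data; the setup is an assumption for illustration):

```r
set.seed(3)
n <- 100
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)
test_idx <- sample(n, n / 2)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Fit polynomials of increasing degree and record both MSEs
degrees <- 1:10
mse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(train = mean((train$y - fitted(fit))^2),
    test  = mean((test$y - predict(fit, newdata = test))^2))
})

# Training MSE can only (weakly) decrease with degree;
# test MSE typically bottoms out and then climbs back up
matplot(degrees, t(mse), type = "b", pch = 1:2, col = 1:2,
        xlab = "Flexibility (polynomial degree)", ylab = "MSE")
legend("topright", legend = rownames(mse), pch = 1:2, col = 1:2)
```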
Overfitting
Trade-off between bias and variance
• The U-shaped curve in the Test MSE is the result of two
competing properties: bias and variance
• Variance refers to the amount by which the model would
change if we estimated it using different training data
• Bias refers to the error that is introduced by
approximating a real-life problem (which may be
extremely complicated) using a much simpler model
Relationship between bias and variance
• In general, more flexible methods have higher variance
Relationship between bias and variance
• In general, more flexible methods have lower bias
Trade-off between bias and variance
• It is possible to show that the expected test MSE for a
given value can be decomposed into three terms:
$$E\Bigl[\bigl(y_0 - \hat{f}(x_0)\bigr)^2\Bigr] = \mathrm{Var}\bigl(\hat{f}(x_0)\bigr) + \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2 + \mathrm{Var}(\varepsilon)$$
i.e., the variance of our model on the test value, plus the squared bias of our model on the test value, plus the variance of the error terms.
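A hedged simulation sketch of this decomposition at a single test value x0: refit the same (deliberately too-simple, hence biased) model on many independent training sets, then compare the average prediction to the truth (bias) and the spread of the predictions (variance). The true function, noise level, and names are assumptions made for illustration:

```r
set.seed(4)
f <- function(x) sin(x)   # assumed "true" function
sigma <- 0.3              # sd of the irreducible error
x0 <- 5                   # the test value of interest
n <- 50
reps <- 500

# Prediction at x0 from a linear model refit on each new training set
preds <- replicate(reps, {
  x <- runif(n, 0, 10)
  y <- f(x) + rnorm(n, sd = sigma)
  predict(lm(y ~ x), newdata = data.frame(x = x0))
})

variance    <- var(preds)               # variance of our model at x0
bias_sq     <- (mean(preds) - f(x0))^2  # squared bias of our model at x0
irreducible <- sigma^2                  # variance of the error terms

# The expected test MSE at x0 is (approximately) the sum of the three pieces
c(variance = variance, bias_sq = bias_sq,
  irreducible = irreducible, total = variance + bias_sq + irreducible)
```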
Balancing bias and variance
• We know variance and squared bias are always
nonnegative (why?)
• There’s nothing we can do about the variance of the
irreducible error inherent in the model
• So we’re looking for a method that minimizes the sum of
the first two terms… which are (in some sense) competing
Balancing bias and variance
• It’s easy to build a model with
low variance but high bias (how?)
• Just as easy to build one with
low bias but high variance (how?)
• The challenge: finding a method for which both the
variance and the squared bias are low
• This trade-off is one of the most important recurring
themes in this course
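To make the two extremes above concrete, here is an illustrative sketch on simulated data: a constant model that ignores the predictor entirely (high bias, essentially no variance) versus a spline flexible enough to chase nearly every training point (low bias, high variance):

```r
set.seed(5)
x <- runif(50, 0, 10)
y <- sin(x) + rnorm(50, sd = 0.3)

# High bias, low variance: predict the same number (the sample mean) everywhere
# Low bias, high variance: a spline with almost one degree of freedom per point
wiggly_fit <- smooth.spline(x, y, df = 40)

plot(x, y)
abline(h = mean(y), col = "blue")                              # flat: misses the structure
lines(predict(wiggly_fit, seq(0, 10, by = 0.1)), col = "red")  # chases every point
```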
What about classification?
• So far, we’ve only talked about how to evaluate the
accuracy of a regression model
• The idea of a bias-variance trade-off also translates to the
classification setting, but we need some minor
modifications to deal with qualitative responses
• For example: we can’t really compute MSE without
numerical values, so what can we do instead?
Training error rate
• One common approach is to use the training error rate: the
proportion of times our model incorrectly classifies a training
data point:
$$\frac{1}{n}\sum_{i=1}^{n} I\bigl(y_i \neq \hat{y}_i\bigr)$$
where $\hat{y}_i$ is the model's classification for the $i$th training observation and $I(y_i \neq \hat{y}_i)$ is an indicator function that equals 1 whenever the predicted class is different from the actual class; we tally up all the times the model gets it wrong and take the average.
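A minimal R sketch of the calculation, using a logistic-regression classifier on simulated two-class data (all names and the 0.5 threshold are illustrative):

```r
set.seed(6)
n <- 200
x <- rnorm(n)
# Class depends on x, with some noise so the classes overlap
y <- factor(ifelse(x + rnorm(n, sd = 0.8) > 0, "Yes", "No"))

# Fit a classifier and predict the class of each training point
fit <- glm(y ~ x, family = binomial)
yhat <- ifelse(predict(fit, type = "response") > 0.5, "Yes", "No")

# Training error rate: proportion of training points the model misclassifies
train_error_rate <- mean(yhat != y)
train_error_rate
```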
Takeaways
• Choosing the “right” level of flexibility is critical for success
in both the regression and classification settings
• The bias-variance tradeoff can make this a difficult task
• In Chapter 5, we’ll return to this topic and explore various
methods for estimating test error rates
• We’ll then use these estimates to find the optimal level of
flexibility for a given ML method
Questions?
Lab pt. 1: Introduction to R
• Basic Commands
• Graphics
• Indexing data
• Loading external data
• Generating summaries
• Playing with real data
(time permitting!)
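As a quick preview, here are generic R commands of the kind the lab walks through (not the lab's exact code; the file name in the last step is hypothetical):

```r
# Basic commands: create and inspect a vector
x <- c(1, 3, 2, 5)
length(x)
summary(x)

# Graphics: a simple scatterplot
y <- rnorm(length(x))
plot(x, y, xlab = "x", ylab = "y", main = "A first plot")

# Indexing data: pick out pieces of a matrix
A <- matrix(1:16, nrow = 4)
A[2, 3]     # a single element
A[1:2, ]    # the first two rows

# Loading data and numerical summaries (hypothetical file name)
# dat <- read.csv("Auto.csv", na.strings = "?")
# summary(dat)
```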
Lab pt. 1: Introduction to R
• Today's walkthrough (and likely many others) will be run
using a notebook tool that lets me build "notebooks" which
run live R code (python, too!) in the browser
• Hint: this is also a nice way to format your homework!
Lab pt. 2: Exploring Other Datasets
• More datasets from the book - ISLR package
- Already installed on Smith Rstudio server
- Working locally? > install.packages("ISLR") (see the sketch after this list)
- Details available at: cran.r-project.org/web/packages/ISLR
- Dataset descriptions: www.inside-r.org/packages/cran/ISLR/docs
• Real world data:
- Olympic Athletes: goo.gl/1aUnJW
- World Bank Indicators: goo.gl/0QdN9U
- Airplane Bird Strikes: goo.gl/lFl5ld
- …and a whole bunch more: goo.gl/kcbqfc
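A short sketch of getting started with the ISLR package, as referenced in the first bullet above (Auto is one of the data frames it provides):

```r
# install.packages("ISLR")   # only needed when working locally
library(ISLR)

# The book's datasets come as ready-to-use data frames
dim(Auto)          # observations x variables
names(Auto)
summary(Auto$mpg)
```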
Coming Up
• Next class: Linear Regression 1: Simple and Multiple LR
• For planning purposes: Assignment 1 will be posted next
week, and will be due the following Weds (Feb. 10th)