Least Squares Fitting 1

WEEK #5, Lecture 3: Least-Squares Fitting
Function Fitting
So far in MATLAB, we have not dealt with data with error:
• Splines go exactly through given points.
• Equation solving methods assume we know all input values exactly.
Another important use of numerical tools is to go from (error-prone) data to
a mathematical model.
Data Files and MATLAB
The first step in working with data is getting it into memory.
Look up what the following commands do in MATLAB.
• dlmread -
• csvread -
• textread -
• xlsread -
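As a starting point for the lookup exercise, here is a minimal sketch of a round trip through a comma-separated file. The data values and the temporary file are made up for illustration; they are not from the course dataset.

```matlab
% Minimal sketch: write a small made-up table to a temporary CSV file,
% then read it back two ways. The numbers are illustrative only.
data = [1 7.5 62; 2 9.0 81; 3 6.0 55];   % hypothetical rows: id, quiz, exam
fname = [tempname() '.csv'];             % temporary file name (assumption)
csvwrite(fname, data);                   % write comma-separated numeric values

A = csvread(fname);                      % read a comma-separated numeric file
B = dlmread(fname, ',');                 % same file, delimiter given explicitly

disp(isequal(A, data));                  % 1 if the round trip preserved the values
disp(isequal(A, B));                     % 1 if both readers agree
delete(fname);                           % clean up the temporary file
```

Both readers return a plain numeric matrix, which is why they only suit files that contain numbers and nothing else.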
Week 5 – Interpolation
Example Dataset
Exercise:
Start a new script, W5_8.m, and have it read in QuizAndExamGrades.xls
• Can be done with xlsread.
• Can also use double-click in Directory listing.
What format is the data in now in MATLAB?
Histograms
Exercise: Look at the distribution of each variable separately using a histogram.
The MATLAB command for this is hist.
Arranging plots can be helpful. Exercise: In the script, use the commands
figure(1) and figure(2) for each separate plot.
Exercise: Look up the subplot command in the Help system. Modify the
script so it shows all the graphs in the same graph window using subplot.
How does the subplot command lay out the sub-windows?
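One possible layout for the histogram exercise is sketched below. The grade vectors are made up for illustration and stand in for the columns read from the spreadsheet. hist is the command named above; newer MATLAB releases also offer histogram.

```matlab
% Made-up grades for illustration only; replace with the columns read
% from the spreadsheet.
quiz = [6 7 7 8 5 9 10 4 8 7 6 9];            % hypothetical quiz grades
exam = [55 62 70 78 48 85 92 40 74 66 58 88]; % hypothetical exam grades

figure(1);
subplot(1, 2, 1);        % grid of 1 row and 2 columns, select panel 1
hist(quiz);              % default is 10 equally spaced bins
title('Quiz grades');

subplot(1, 2, 2);        % same grid, select panel 2
hist(exam);
title('Exam grades');
```

In subplot(m, n, p), the window is split into m rows and n columns, and the panels are numbered row by row.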
Relationships
Of more interest than each variable separately is how they are related.
Exercise: Generate a scatter plot of the exam vs. quiz grades.
Once you have the scatter plot, what might you want to do next?
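A minimal sketch of such a plot, again with made-up grade vectors standing in for the real columns. It uses plot with a marker-only style; scatter(quiz, exam) is an alternative.

```matlab
% Made-up grades for illustration only.
quiz = [6 7 7 8 5 9 10 4 8 7 6 9];            % hypothetical quiz grades
exam = [55 62 70 78 48 85 92 40 74 66 58 88]; % hypothetical exam grades

figure;
plot(quiz, exam, 'o');          % markers only, no connecting lines
xlabel('Quiz grade');
ylabel('Exam grade');
title('Exam vs. quiz grades');
```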
Fitting Curves to Data
Exercise:
In the MATLAB plot window, select Tools/Basic Fitting.
There is a ‘spline’ option: what happens when you try it? Explain the result.
Exercise:
Play around:
• Move legend out of the way
• Get formula for best fit linear and quadratic curves
• What does the big right-arrow button do?
Model Selection - Which Fit is “Best”?
MATLAB is supposedly finding the “best fit line” or “best fit curve”, i.e., of all
possible straight lines, the linear fit shown is the best one.
However, what does that mean if we want to compare the best straight line fit
to the best quadratic fit?
Guidelines
Which models keep closest to the actual data points?
• linear, quadratic, or higher order?
Which models match logic/intuition/practical constraints better?
• linear, quadratic, or higher order?
Always ask: Is the improvement in fit substantial enough to justify a higher-order
model?
We will study the question of “how high a degree should I use” in a more systematic
way next class.
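To make the first guideline concrete, the sketch below compares a linear and a quadratic fit by their sum of squared errors (SSE). The data is made up and roughly linear; polyfit and polyval are standard MATLAB functions.

```matlab
x = [1 2 3 4 5 6 7 8];                     % made-up inputs
y = [2.1 3.9 6.2 7.8 10.1 12.2 13.8 16.1]; % made-up, roughly linear outputs

p1 = polyfit(x, y, 1);                     % best-fit line coefficients
p2 = polyfit(x, y, 2);                     % best-fit quadratic coefficients

sse1 = sum((y - polyval(p1, x)).^2);       % sum of squared errors, linear
sse2 = sum((y - polyval(p2, x)).^2);       % sum of squared errors, quadratic

fprintf('linear SSE = %.4f, quadratic SSE = %.4f\n', sse1, sse2);
```

Every line is also a quadratic with a zero leading coefficient, so the quadratic SSE can never exceed the linear SSE. A smaller SSE alone therefore does not show that the higher-order model is the better choice.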
Defining the “Best Fit” Within One Model
For today, we will look at selecting
• the best linear fit, among all possible linear fits, or
• the best quadratic fit, among all possible quadratic fits, etc.
How is the best model within each family selected? Or in other words, what
makes the “best fit line” the best?
Mathematics of Least Squares
Our data is a set of $(x_i, y_i)$ pairs. Before we find the best curve, we select/limit
ourselves to one predictive family of functions, e.g.
• Linear: $\hat{y} = p_1 x + p_2$
• Quadratic: $\hat{y} = p_1 x^2 + p_2 x + p_3$
Definition: Finding the “Best fit” means “find values for $p_i$ that minimize the
squared error”:
$$\sum_i (y_i - \hat{y}_i)^2$$
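The definition can be checked numerically. The sketch below, on made-up data, finds the linear parameters with the backslash operator, which returns the least-squares solution for an overdetermined system, and compares the result with polyfit. The perturbed parameters p_other are an arbitrary choice used only to show that a different line has a larger squared error.

```matlab
x = [0 1 2 3 4 5]';                 % made-up data, column vectors
y = [1.2 2.9 5.1 7.2 8.8 11.1]';

A = [x, ones(size(x))];             % columns multiply p1 and p2 in yhat = p1*x + p2
p = A \ y;                          % least-squares solution for [p1; p2]
yhat = A * p;                       % fitted values
sse = sum((y - yhat).^2);           % minimized sum of squared errors

p_other = p + [0.1; -0.2];          % some other line (arbitrary perturbation)
sse_other = sum((y - A * p_other).^2);

p_pf = polyfit(x, y, 1);            % same fit via polyfit, returned as [p1 p2]

fprintf('SSE at best fit: %.4f, SSE for other line: %.4f\n', sse, sse_other);
```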
Graphically
[Figure: plot with axis ticks from 0 to 60 on one axis and from 0 to 18 on the other.]
Naming and Symbols
What symbols are traditionally used to describe the various components of function
fitting?
• Original Data
• Fitted function
• Fitted values
• Residuals
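The four components can be tied to variables in a short sketch. The variable names (x, y, p, yhat, r) are one common choice, not necessarily the symbols used in class, and the data is made up.

```matlab
x = [1 2 3 4 5 6];                 % original data, inputs (made up)
y = [3.1 4.8 7.2 8.9 11.1 12.8];   % original data, outputs (made up)

p = polyfit(x, y, 1);              % fitted function: yhat = p(1)*x + p(2)
yhat = polyval(p, x);              % fitted values at the data points
r = y - yhat;                      % residuals: data minus fitted values

disp(sum(r));                      % close to zero when the model has a constant term
```

For a least-squares fit that includes a constant term, the residuals sum to zero up to rounding error, which is a quick sanity check on a fit.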
Least-Squares Error
Least-squares error is the standard criterion by which we select the best fit. The
best fit is selected, from all possible curves in the same family, so as to
minimize the sum of the squared errors in y.
Are other definitions of “best fit” possible?
Why do we use the least-squares definition of “best” so often?
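As one possible answer to the first question, here are two other commonly used criteria next to the least-squares one. They are written with residuals $r_i = y_i - \hat{y}_i$; the symbol $r_i$ is introduced here for illustration.

```latex
\text{least squares: } \min_{p} \sum_i r_i^2
\qquad
\text{least absolute deviations: } \min_{p} \sum_i |r_i|
\qquad
\text{minimax: } \min_{p} \max_i |r_i|
```

One commonly cited reason for preferring least squares is that, for families that are linear in the parameters (such as the linear and quadratic fits above), the minimization reduces to solving a system of linear equations.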
Next class: identifying when least-squares fits are not “best”, and selecting between
multiple “best fit” models.