REGRESSION AND INTERPOLATION WITH MATLAB

Asst. Prof. Dr. Elif SERTEL
sertele@itu.edu.tr
Summary
Review of Basic Statistics
mean(x): Computes the mean (average value) of the elements of the vector x.
std(x): Computes the standard deviation of the values in x. The standard deviation σ is defined as the square root of the variance.
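For instance, a minimal MATLAB sketch (the data values here are chosen only for illustration):
x = [2 4 4 4 5 5 7 9];   % illustrative data vector
m = mean(x)              % average of the elements (here 5)
s = std(x)               % sample standard deviation, the square root of the variance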
Curve Fitting with Functions Other than Polynomials
Power function: y = b·x^m
Exponential function: y = b·e^(mx) or y = b·10^(mx)
Logarithmic function: y = m·ln(x) + b or y = m·log(x) + b
Reciprocal function: y = 1/(mx + b)
First rewrite the functions in a form that can be fitted with a first-degree polynomial (n = 1), y = mx + b:
Power function: ln(y) = m·ln(x) + ln(b)
Exponential function: ln(y) = mx + ln(b) or log(y) = mx + log(b)
Reciprocal function: 1/y = mx + b
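As an illustration of the exponential case, a minimal sketch using MATLAB's polyfit function (introduced below); the data vectors x and y are assumed to be given, with all-positive y values:
% Fit y = b*exp(m*x) by fitting a straight line to (x, ln(y))
p = polyfit(x, log(y), 1);   % p(1) = m, p(2) = ln(b)
m = p(1);
b = exp(p(2));
yfit = b*exp(m*x);           % fitted exponential curve at the data points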
Curve Fitting
Regression analysis is a process of fitting a function to a set of data points.
Curve fitting with polynomials is done with the polyfit function, which uses the least squares method.
polyfit finds the coefficients of a polynomial representing the data:
p=polyfit(x,y,n)
p is the vector of the coefficients of the polynomial that fits the data
x is a vector with the horizontal coordinates
y is a vector with the vertical coordinates
n is the degree of the polynomial
polyval uses those coefficients to find the values of y that correspond to given values of x.
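For example, a short sketch (the data values are illustrative, and the choice of degree n = 2 is an assumption made here):
x = [1 2 3 4 5];               % horizontal coordinates of the data
y = [2.1 3.9 9.2 15.8 25.3];   % vertical coordinates of the data
p = polyfit(x, y, 2);          % coefficients of the best-fit second-degree polynomial
ynew = polyval(p, x);          % values of the fitted polynomial at the known x
plot(x, y, 'o', x, ynew, '-')  % compare the data points with the fitted curve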
Other polyfit functions in MATLAB
Power function, y = b·x^m:  p = polyfit(log(x),log(y),1)
Exponential function, y = b·e^(mx) or y = b·10^(mx):  p = polyfit(x,log(y),1) or p = polyfit(x,log10(y),1)
Logarithmic function, y = m·ln(x)+b or y = m·log(x)+b:  p = polyfit(log(x),y,1) or p = polyfit(log10(x),y,1)
Reciprocal function, y = 1/(mx+b):  p = polyfit(x,1./y,1)
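As an example of the first row, a sketch of a power-function fit and of recovering its coefficients (x and y are assumed to be data vectors with positive values):
% ln(y) = m*ln(x) + ln(b), so fit a first-degree polynomial to the logs
p = polyfit(log(x), log(y), 1);
m = p(1);            % exponent of the power function
b = exp(p(2));       % coefficient; p(2) is ln(b)
yfit = b*x.^m;       % fitted power function y = b*x^m at the data points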
Other considerations when choosing a function are:
Exponential functions cannot pass through the origin.
Exponential functions can only fit data with all positive y’s or all negative y’s.
Logarithmic functions cannot model x = 0 or negative values of x.
For the power function, y = 0 when x = 0.
The reciprocal equation cannot model y = 0.
Least Squares Curve Fitting
Experimental data always has a finite amount of
error included in it, due to both accumulated
instrument inaccuracies and also imperfections in
the physical system being measured. Even data
describing a linear system won’t all fall on a single
straight line.
Least-squares curve fitting is a method to find
parameters that fit the error-laden data as best
we can.
Least squares minimizes the sum of the squares of the residuals.
Minimizing the residuals
(Figure: observations scattered about a fitted line; the error or “residual” is the vertical distance between each observation and its prediction on the line.)
Sum squared error
Finds the “best fit” straight line.
Minimizes the amount each point is away from the line (see the sketch below).
It is possible that none of the points will fall on the line.
Advantages
Positive errors do not cancel negative ones.
Differentiation of the sum of squares is easy.
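A minimal MATLAB sketch of these quantities, assuming p holds coefficients returned by polyfit for data vectors x and y:
yhat = polyval(p, x);        % predictions of the fitted line at the data points
r    = y - yhat;             % residuals: observed minus predicted
SSE  = sum(r.^2);            % sum of squared errors, the quantity being minimized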
Linear Regression
Linear regression analyzes the relationship between
two variables, X and Y. For each subject (or
experimental unit), you know both X and Y and you want
to find the best straight line through the data.
In general, the goal of linear regression is to find the
line that best predicts Y from X. Linear regression
does this by finding the line that minimizes the sum of
the squares of the vertical distances of the points
from the line.
Note that linear regression does not test whether your
data are linear (except via the runs test). It assumes
that your data are linear, and finds the slope and
intercept that make a straight line best fit your data.
Linear Regression
The goal of linear regression is to adjust the values of
slope and intercept to find the line that best predicts
Y from X.
More precisely, the goal of regression is to minimize
the sum of the squares of the vertical distances of the
points from the line.
The names of the variables on the X and Y
axes vary according to the field of
application.
Some of the more common usages are:
X-axis: independent, predictor, carrier, input
Y-axis: dependent, predicted, response, output
The data are pairs of independent and dependent
variables {(xi,yi): i=1,...,n}.
The fitted equation is written ŷ = β̂0 + β̂1·x, where ŷ is the predicted value of the response obtained by using the equation.
The residuals are the differences between the observed values y and the predicted values ŷ. They are always calculated as (observed − predicted).
Least Squares Estimation of β0, β1
β0 ≡ Mean response when x=0 (y-intercept)
β1 ≡ Change in mean response when x
increases by 1 unit (slope)
β0, β1 are unknown parameters (like µ)
β0+β1x ≡ Mean response when explanatory
variable takes on the value x
Goal: Choose values (estimates) that
minimize the sum of squared errors (SSE)
of observed values to the straight-line:
SSE = Σ (yi − ŷi)² = Σ [yi − (β̂0 + β̂1·xi)]²   (summed over i = 1, …, n)
where ŷ = β̂0 + β̂1·x is the fitted straight line.
Least Squares Computations
Sxx = Σ (x − xbar)²
Sxy = Σ (x − xbar)(y − ybar)
Syy = Σ (y − ybar)²
β̂1 = Sxy / Sxx
β̂0 = ybar − β̂1·xbar
s² = Σ (y − ŷ)² / (n − 2) = SSE / (n − 2)
where xbar and ybar denote the means of x and y, and ŷ the predicted values.
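A direct MATLAB translation of these formulas (x and y are assumed to be data vectors of equal length):
n   = length(x);
Sxx = sum((x - mean(x)).^2);
Sxy = sum((x - mean(x)).*(y - mean(y)));
Syy = sum((y - mean(y)).^2);
b1  = Sxy/Sxx;                        % slope estimate (beta1 hat)
b0  = mean(y) - b1*mean(x);           % intercept estimate (beta0 hat)
SSE = sum((y - (b0 + b1*x)).^2);
s2  = SSE/(n - 2);                    % estimate of the error variance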
Example - Pharmacodynamics of LSD
Score (y)   LSD Conc (x)   x − xbar   y − ybar   (x − xbar)²   (x − xbar)(y − ybar)   (y − ybar)²
78.93       1.17           -3.163     28.843     10.004569     -91.230409             831.918649
58.20       2.97           -1.363      8.113      1.857769     -11.058019              65.820769
67.47       3.26           -1.073     17.383      1.151329     -18.651959             302.168689
37.47       4.69            0.357    -12.617      0.127449      -4.504269             159.188689
45.65       5.83            1.497     -4.437      2.241009      -6.642189              19.686969
32.92       6.00            1.667    -17.167      2.778889     -28.617389             294.705889
29.97       6.41            2.077    -20.117      4.313929     -41.783009             404.693689
350.61      30.33          -0.001      0.001     22.474943    -202.487243            2078.183343
(Column totals given in bottom row of table)
ybar = 350.61/7 = 50.087        xbar = 30.33/7 = 4.333
β̂1 = Sxy/Sxx = -202.4872/22.4749 = -9.01
β̂0 = ybar − β̂1·xbar = 50.09 − (-9.01)(4.33) = 89.10
Fitted line: ŷ = 89.10 − 9.01x
s² = 50.72
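The same example can be checked in MATLAB with polyfit; a sketch using the data from the table above:
x = [1.17 2.97 3.26 4.69 5.83 6.00 6.41];          % LSD concentration
y = [78.93 58.20 67.47 37.47 45.65 32.92 29.97];   % score
p = polyfit(x, y, 1);        % p(1) is the slope (about -9.01), p(2) the intercept (about 89.1)
yhat = polyval(p, x);        % predicted scores
s2 = sum((y - yhat).^2)/(length(x) - 2);   % estimate of the error variance, compare with s² above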
Interpolation
Estimation of intermediate values between precise data points. The most common method is to fit a polynomial of the form:
f(x) = a0 + a1·x + a2·x^2 + … + an·x^n
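A minimal MATLAB sketch of this idea (the data values below are illustrative): choosing the degree as one less than the number of points makes the polynomial pass through every point.
x = [1 2 3 4];             % four data points
y = [2.0 4.5 3.1 6.8];
n = length(x) - 1;         % degree 3: the interpolating polynomial
p = polyfit(x, y, n);      % coefficients of the polynomial through all the points
yq = polyval(p, 2.5);      % estimate an intermediate value at x = 2.5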
interp1 is the MATLAB function for linear interpolation:
v = interp1(x,y,u)
The first two input arguments, x and y, are vectors of the same length that define the interpolating points.
The third input argument, u, is a vector of points where the function is to be evaluated.
The output v is the same length as u; its elements are the values of the interpolating function at the points of u.
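A minimal example of linear interpolation with interp1 (the data values are chosen for illustration):
x = 0:5;                        % known sample points
y = [0 20 60 68 77 110];        % known values at those points
u = [0.5 1.5 3.25];             % points where estimates are wanted
v = interp1(x, y, u)            % linearly interpolated values at u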
Methods
Nearest neighbor interpolation (method = 'nearest'). This
method sets the value of an interpolated point to the value
of the nearest existing data point.
Linear interpolation (method = 'linear'). This method fits
a different linear function between each pair of existing
data points, and returns the value of the relevant function
at the points specified by xi. This is the default method for
the interp1 function.
Cubic spline interpolation (method = 'spline'). This method
fits a different cubic function between each pair of existing
data points, and uses the spline function to perform cubic
spline interpolation at the data points.
Cubic interpolation (method = 'pchip' or 'cubic'). These
methods are identical. They use the pchip function to
perform piecewise cubic Hermite interpolation within the
vectors x and y.
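The method is selected with a fourth input argument to interp1; a short sketch comparing some of the methods (same illustrative data as above):
x = 0:5;  y = [0 20 60 68 77 110];  u = [0.5 1.5 3.25];
v_lin = interp1(x, y, u, 'linear');    % default: piecewise linear
v_spl = interp1(x, y, u, 'spline');    % smooth cubic spline through the data
v_nea = interp1(x, y, u, 'nearest');   % value of the nearest data point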
Cubic Spline
A cubic spline creates a smooth curve by using a third-degree polynomial between each pair of data points.
Tips
When the nearest and linear methods are used, the values of xi must be within the domain of x. If the spline or the pchip methods are used, xi can have values outside the domain of x, and the function interp1 performs extrapolation.
The spline method can also give large errors if the input data points are nonuniform, such that some points are much closer together than others.