Data Modeling and Least Squares Fitting
COS 323
Data Modeling
• Given: data points, functional form,
find constants in function
• Example: given (xi, yi), find line through them;
i.e., find a and b in y = ax+b
[Figure: scattered data points (x1,y1) … (x7,y7) with a fitted line y = ax + b]
Data Modeling
• You might do this because you actually care
about those numbers…
– Example: measure position of falling object,
fit parabola
[Figure: position vs. time for a falling object, fit to p = –1/2 g t^2; estimate g from the fit]
Data Modeling
• … or because some aspect of behavior is
unknown and you want to ignore it
– Example: measuring relative resonant frequency of two ions, want to ignore magnetic field drift
Least Squares
• Nearly universal formulation of fitting:
minimize squares of differences between
data and function
– Example: for fitting a line, minimize
$$\chi^2 = \sum_i \left[ y_i - (a x_i + b) \right]^2$$
with respect to a and b
– Most general solution technique: take derivatives
w.r.t. unknown variables, set equal to zero
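Before the linear-algebra machinery below, here is a minimal sketch of what "minimize χ² with respect to a and b" means, handing the objective to a general-purpose minimizer. This assumes SciPy is available; the data values are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (not from the slides).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

def chi2(params):
    a, b = params
    return np.sum((y - (a * x + b))**2)   # sum of squared differences

result = minimize(chi2, x0=[0.0, 0.0])    # generic numerical minimizer
print(result.x)                           # [a, b] of the best-fit line
```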
Least Squares
• Computational approaches:
– General numerical algorithms for function minimization
– Take partial derivatives; general numerical algorithms
for root finding
– Specialized numerical algorithms that take advantage of
form of function
– Important special case: linear least squares
Linear Least Squares
• General pattern:
$$y_i \approx a\, f(x_i) + b\, g(x_i) + c\, h(x_i) + \cdots$$
Given (x_i, y_i), solve for a, b, c, …
• Note that the dependence on the unknowns is linear; the function itself need not be!
Solving Linear Least Squares Problem
• Take partial derivatives:
$$\chi^2 = \sum_i \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right]^2$$
$$\frac{\partial \chi^2}{\partial a} = \sum_i -2 f(x_i) \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right] = 0
\;\;\Rightarrow\;\;
a \sum_i f(x_i) f(x_i) + b \sum_i f(x_i) g(x_i) + \cdots = \sum_i f(x_i)\, y_i$$
$$\frac{\partial \chi^2}{\partial b} = \sum_i -2 g(x_i) \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right] = 0
\;\;\Rightarrow\;\;
a \sum_i g(x_i) f(x_i) + b \sum_i g(x_i) g(x_i) + \cdots = \sum_i g(x_i)\, y_i$$
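A minimal sketch of building and solving this system of sums, assuming NumPy and, purely for illustration, the basis f(x) = x, g(x) = 1 (i.e., a line fit); the data values are made up.

```python
import numpy as np

# Assumed example basis: f(x) = x, g(x) = 1 (fitting y = a*x + b).
def f(x): return x
def g(x): return np.ones_like(x)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

F, G = f(x), g(x)
# Matrix of sums from the partial derivatives, and the right-hand side.
M = np.array([[np.sum(F * F), np.sum(F * G)],
              [np.sum(G * F), np.sum(G * G)]])
rhs = np.array([np.sum(F * y), np.sum(G * y)])

a, b = np.linalg.solve(M, rhs)
print(a, b)   # slope and intercept of the least-squares line
```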
Solving Linear Least Squares Problem
• For convenience, rewrite as a matrix equation:
$$\begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\begin{pmatrix} \sum_i f(x_i)\, y_i \\ \sum_i g(x_i)\, y_i \\ \vdots \end{pmatrix}$$
• Factor:
$$\left[ \sum_i
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}^{\!T}
\right]
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\sum_i y_i \begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}$$
Linear Least Squares
• There’s a different derivation of this:
overconstrained linear system
$$A\, x \cong b$$
• A has n rows and m < n columns: more equations than unknowns
Linear Least Squares
• Interpretation: find x that comes “closest” to
satisfying Ax=b
– i.e., minimize b – Ax
– i.e., minimize ‖b – Ax‖
– Equivalently, minimize ‖b – Ax‖² or (b – Ax)ᵀ(b – Ax)
$$\min_x\; (b - Ax)^T (b - Ax)$$
$$\frac{\partial}{\partial x} (b - Ax)^T (b - Ax) = -2\, A^T (b - Ax) = 0$$
$$\Rightarrow\quad A^T A\, x = A^T b$$
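A minimal sketch of solving an overconstrained system both ways, assuming NumPy; the matrix and vector values are arbitrary illustrative numbers.

```python
import numpy as np

# Overconstrained system: 5 equations, 2 unknowns (illustrative values).
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([0.9, 2.1, 2.9, 4.2, 4.8])

# Library least-squares solver (minimizes ||b - Ax||).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Normal equations: A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(x_lstsq, x_normal)   # the two solutions agree (up to round-off)
```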
Linear Least Squares
• If fitting data to linear function:
– Rows of A are functions of xi
– Entries in b are yi
– Minimizing sum of squared differences!
$$A = \begin{pmatrix}
f(x_1) & g(x_1) & \cdots \\
f(x_2) & g(x_2) & \cdots \\
\vdots & \vdots &
\end{pmatrix},
\qquad
b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$$
$$A^T A = \begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix},
\qquad
A^T b = \begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \\ \vdots \end{pmatrix}$$
Linear Least Squares
• Compare the two expressions we’ve derived – they are equal!
$$\begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \\ \vdots \end{pmatrix}$$
$$\left[ \sum_i
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}^{\!T}
\right]
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\sum_i y_i \begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}$$
Ways of Solving Linear Least Squares
• Option 1:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        store them in row i of A
        store yi in b
    compute (AᵀA)⁻¹ Aᵀ b
• (AᵀA)⁻¹ Aᵀ is known as the “pseudoinverse” of A
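A minimal sketch of Option 1, assuming NumPy and, purely for illustration, the line basis f(x) = x, g(x) = 1; the data values are made up.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Row i of A holds f(xi), g(xi) (here: xi and 1); b holds yi.
A = np.column_stack([x, np.ones_like(x)])
b = y

# np.linalg.pinv returns the pseudoinverse, which equals (A^T A)^-1 A^T
# when A has full column rank (internally it is computed via the SVD).
params = np.linalg.pinv(A) @ b
print(params)   # [a, b] for y = a*x + b
```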
Ways of Solving Linear Least Squares
• Option 2:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        store them in row i of A
        store yi in b
    compute AᵀA and Aᵀb
    solve AᵀA x = Aᵀb
• These are known as the “normal equations” of the least squares problem
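A sketch of Option 2 under the same assumptions as above; solve_normal_equations is just an illustrative helper name.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Option 2: form A^T A and A^T b explicitly, then solve the small m-by-m system."""
    return np.linalg.solve(A.T @ A, A.T @ b)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
A = np.column_stack([x, np.ones_like(x)])
print(solve_normal_equations(A, y))   # same [a, b] as the pseudoinverse route
```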
Ways of Solving Linear Least Squares
• These can be inefficient, since A is typically much larger than AᵀA and Aᵀb
• Option 3:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        accumulate the outer product in U
        accumulate the product with yi in v
    solve U x = v
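A sketch of Option 3, never forming A at all; the function name, the basis of lambdas, and the data values are all illustrative assumptions, not part of the slides.

```python
import numpy as np

def solve_accumulated(xs, ys, basis):
    """Option 3: accumulate the sum of outer products (U) and the RHS (v) point by point."""
    m = len(basis)
    U = np.zeros((m, m))
    v = np.zeros(m)
    for xi, yi in zip(xs, ys):
        row = np.array([fn(xi) for fn in basis])   # [f(xi), g(xi), ...]
        U += np.outer(row, row)
        v += yi * row
    return np.linalg.solve(U, v)

# Example with the (assumed) line basis f(x) = x, g(x) = 1:
params = solve_accumulated([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8],
                           [lambda x: x, lambda x: 1.0])
print(params)
```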
Special Case: Constant
• Let’s try to model a function of the form y = a
• In this case, f(xi) = 1 and we are solving
$$\Big(\sum_i 1\Big)\, a = \sum_i y_i
\quad\Rightarrow\quad
a = \frac{\sum_i y_i}{n}$$
• Punchline: mean is least-squares estimator for
best constant fit
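A quick numerical check of the punchline (a sketch, assuming NumPy; the data values are illustrative):

```python
import numpy as np

y = np.array([1.1, 2.9, 5.2, 6.8])

# Constant model y = a: the design matrix is a single column of ones.
A = np.ones((len(y), 1))
a_lsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(a_lsq[0], np.mean(y))   # identical: the mean is the least-squares constant
```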
Special Case: Line
• Fit to y=a+bx
1
i  x 1
 i
 n
( A A)  
xi
T
1
xi 
1
a 
    yi  
b  i  xi 
 xi 2  xi 


1
xi 
 yi 
n 
 xi
T

, A b

2
2
2
nxi  xi 
xi 

x
y
 i i
x yi  xi xi yi
n xi yi  xi yi
a i
,
b

2
2
2
2
nxi  xi 
nxi  xi 
2
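A sketch of these closed-form expressions, cross-checked against a library fit; assumes NumPy, and the data values are made up.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
n = len(x)

# Closed-form solution of the 2x2 normal equations for y = a + b*x.
d = n * np.sum(x**2) - np.sum(x)**2
a = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / d
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / d

# Cross-check against a library fit (polyfit returns [slope, intercept]).
slope, intercept = np.polyfit(x, y, 1)
print(a, b, intercept, slope)
```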
Weighted Least Squares
• Common case: the (xi,yi) have different
uncertainties associated with them
• Want to give more weight to measurements
of which you are more certain
• Weighted least squares minimization:
$$\min\; \chi^2 = \sum_i w_i \left[ y_i - f(x_i) \right]^2$$
• If the uncertainty of measurement i is σi, best to take wi = 1/σi²
Weighted Least Squares
• Define the weight matrix W as
$$W = \begin{pmatrix}
w_1 & & & & 0 \\
 & w_2 & & & \\
 & & w_3 & & \\
 & & & w_4 & \\
0 & & & & \ddots
\end{pmatrix}$$
• Then solve weighted least squares via
$$A^T W A\, x = A^T W b$$
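A minimal sketch of the weighted solve, assuming NumPy; the data and per-point uncertainties are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.1, 0.1, 0.5, 0.5])   # per-point uncertainties (illustrative)

A = np.column_stack([x, np.ones_like(x)])
W = np.diag(1.0 / sigma**2)              # w_i = 1 / sigma_i^2

# Weighted normal equations: A^T W A x = A^T W y.
params = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(params)
```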
Error Estimates from Linear Least Squares
• For many applications, finding values is useless
without estimate of their accuracy
• Residual is b – Ax
• Can compute χ² = (b – Ax)ᵀ(b – Ax)
• How do we tell whether the answer is good?
– Lots of measurements
– χ² is small
– χ² increases quickly with perturbations to x
Error Estimates from Linear Least Squares
• Let’s look at the increase in χ²:
$$x \rightarrow x + \Delta x$$
$$\left[ b - A(x + \Delta x) \right]^T \left[ b - A(x + \Delta x) \right]$$
$$= \left[ (b - Ax) - A\, \Delta x \right]^T \left[ (b - Ax) - A\, \Delta x \right]$$
$$= (b - Ax)^T (b - Ax) - 2\, \Delta x^T A^T (b - Ax) + \Delta x^T A^T A\, \Delta x$$
$$= \chi^2 - 2\, \Delta x^T (A^T b - A^T A x) + \Delta x^T A^T A\, \Delta x$$
Since AᵀAx = Aᵀb at the least-squares solution, the middle term vanishes, so
$$\chi^2_{\text{new}} = \chi^2 + \Delta x^T A^T A\, \Delta x$$
• So, the bigger AᵀA is, the faster the error increases as we move away from the current x
Error Estimates from Linear Least Squares
• C = (AᵀA)⁻¹ is called the covariance of the data
• The “standard variance” in our estimate of x is
$$\sigma_x^2 = \frac{\chi^2}{n - m}\, C$$
• This is a matrix:
– Diagonal entries give variance of estimates of
components of x
– Off-diagonal entries explain mutual dependence
• n–m is (# of samples) minus (# of degrees of
freedom in the fit): consult a statistician…
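A minimal sketch of turning the residual into parameter uncertainties, assuming NumPy; the data values are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8, 6.1])

A = np.column_stack([x, np.ones_like(x)])   # n samples, m = 2 parameters
params = np.linalg.solve(A.T @ A, A.T @ y)

r = y - A @ params                 # residual b - Ax
chi2 = r @ r                       # chi^2 = (b - Ax)^T (b - Ax)
n, m = A.shape
C = np.linalg.inv(A.T @ A)         # covariance of the data
var_x = chi2 / (n - m) * C         # "standard variance" of the estimates

print(np.sqrt(np.diag(var_x)))     # one-sigma uncertainties of slope and intercept
```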
Special Case: Constant
$$y = a, \qquad \chi^2 = \sum_i (y_i - a)^2$$
$$\Big(\sum_i 1\Big)\, a = \sum_i y_i
\quad\Rightarrow\quad
a = \frac{\sum_i y_i}{n}$$
$$\sigma_a^2 = \frac{\sum_i (y_i - a)^2}{n - 1} \cdot \frac{1}{n}$$
$$\sqrt{\frac{\sum_i (y_i - a)^2}{n - 1}} \quad \text{is the “standard deviation of samples”}$$
$$\sigma_a = \frac{1}{\sqrt{n}}\, \sqrt{\frac{\sum_i (y_i - a)^2}{n - 1}} \quad \text{is the “standard deviation of the mean”}$$
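A quick sketch of these two quantities, assuming NumPy; the data values are illustrative.

```python
import numpy as np

y = np.array([1.1, 2.9, 5.2, 6.8, 4.0, 3.5])
n = len(y)
a = np.mean(y)                                         # least-squares constant fit

std_samples = np.sqrt(np.sum((y - a)**2) / (n - 1))    # standard deviation of samples
std_mean = std_samples / np.sqrt(n)                    # standard deviation of the mean

print(a, std_samples, std_mean)
# Same as np.std(y, ddof=1) and np.std(y, ddof=1) / np.sqrt(n)
```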
Things to Keep in Mind
• In general, uncertainty in estimated parameters
goes down slowly: like 1/sqrt(# samples)
• Formulas for special cases (like fitting a line) are messy: simpler to think of the AᵀAx = Aᵀb form
• All of these minimize “vertical” squared distance
– Square not always appropriate
– Vertical distance not always appropriate