Data Modeling and Least Squares Fitting
COS 323
Data Modeling
• Given: data points, functional form,
find constants in function
• Example: given (xi, yi), find line through them;
i.e., find a and b in y = ax+b
[Figure: scattered data points (x1,y1) … (x7,y7) with a fitted line y = ax + b]
Data Modeling
• You might do this because you actually care
about those numbers…
– Example: measure position of falling object,
fit parabola
[Figure: position vs. time for a falling object, fit to p = –1/2 g t^2; estimate g from the fit]
Data Modeling
• … or because some aspect of behavior is
unknown and you want to ignore it
– Example: measuring relative resonant frequency of two ions, want to ignore magnetic field drift
Least Squares
• Nearly universal formulation of fitting:
minimize squares of differences between
data and function
– Example: for fitting a line, minimize
$$\chi^2 = \sum_i \left[ y_i - (a x_i + b) \right]^2$$
with respect to a and b
– Most general solution technique: take derivatives
w.r.t. unknown variables, set equal to zero
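Before the linear-algebra machinery below, here is a minimal sketch of what "minimize χ² with respect to a and b" means, handing the objective to a general-purpose minimizer. This assumes SciPy is available; the data values are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (not from the slides).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

def chi2(params):
    a, b = params
    return np.sum((y - (a * x + b))**2)   # sum of squared differences

result = minimize(chi2, x0=[0.0, 0.0])    # generic numerical minimizer
print(result.x)                           # [a, b] of the best-fit line
```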
Least Squares
• Computational approaches:
– General numerical algorithms for function minimization
– Take partial derivatives; general numerical algorithms
for root finding
– Specialized numerical algorithms that take advantage of
form of function
– Important special case: linear least squares
Linear Least Squares
• General pattern:
$$y_i \approx a\, f(x_i) + b\, g(x_i) + c\, h(x_i) + \cdots$$
Given (x_i, y_i), solve for a, b, c, …
• Note that the dependence on the unknowns is linear; the function itself need not be!
Solving Linear Least Squares Problem
• Take partial derivatives:
$$\chi^2 = \sum_i \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right]^2$$
$$\frac{\partial \chi^2}{\partial a} = \sum_i -2 f(x_i) \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right] = 0
\;\;\Rightarrow\;\;
a \sum_i f(x_i) f(x_i) + b \sum_i f(x_i) g(x_i) + \cdots = \sum_i f(x_i)\, y_i$$
$$\frac{\partial \chi^2}{\partial b} = \sum_i -2 g(x_i) \left[ y_i - a\, f(x_i) - b\, g(x_i) - \cdots \right] = 0
\;\;\Rightarrow\;\;
a \sum_i g(x_i) f(x_i) + b \sum_i g(x_i) g(x_i) + \cdots = \sum_i g(x_i)\, y_i$$
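A minimal sketch of building and solving this system of sums, assuming NumPy and, purely for illustration, the basis f(x) = x, g(x) = 1 (i.e., a line fit); the data values are made up.

```python
import numpy as np

# Assumed example basis: f(x) = x, g(x) = 1 (fitting y = a*x + b).
def f(x): return x
def g(x): return np.ones_like(x)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

F, G = f(x), g(x)
# Matrix of sums from the partial derivatives, and the right-hand side.
M = np.array([[np.sum(F * F), np.sum(F * G)],
              [np.sum(G * F), np.sum(G * G)]])
rhs = np.array([np.sum(F * y), np.sum(G * y)])

a, b = np.linalg.solve(M, rhs)
print(a, b)   # slope and intercept of the least-squares line
```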
Solving Linear Least Squares Problem
• For convenience, rewrite as a matrix equation:
$$\begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\begin{pmatrix} \sum_i f(x_i)\, y_i \\ \sum_i g(x_i)\, y_i \\ \vdots \end{pmatrix}$$
• Factor:
$$\left[ \sum_i
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}^{\!T}
\right]
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\sum_i y_i \begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}$$
Linear Least Squares
• There’s a different derivation of this:
overconstrained linear system
$$A\, x \cong b$$
• A has n rows and m < n columns: more equations than unknowns
Linear Least Squares
• Interpretation: find x that comes “closest” to
satisfying Ax=b
– i.e., minimize b – Ax
– i.e., minimize ‖b – Ax‖
– Equivalently, minimize ‖b – Ax‖² or (b – Ax)ᵀ(b – Ax)
$$\min_x\; (b - Ax)^T (b - Ax)$$
$$\frac{\partial}{\partial x} (b - Ax)^T (b - Ax) = -2\, A^T (b - Ax) = 0$$
$$\Rightarrow\quad A^T A\, x = A^T b$$
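A minimal sketch of solving an overconstrained system both ways, assuming NumPy; the matrix and vector values are arbitrary illustrative numbers.

```python
import numpy as np

# Overconstrained system: 5 equations, 2 unknowns (illustrative values).
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([0.9, 2.1, 2.9, 4.2, 4.8])

# Library least-squares solver (minimizes ||b - Ax||).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Normal equations: A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(x_lstsq, x_normal)   # the two solutions agree (up to round-off)
```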
Linear Least Squares
• If fitting data to linear function:
– Rows of A are functions of xi
– Entries in b are yi
– Minimizing sum of squared differences!
$$A = \begin{pmatrix}
f(x_1) & g(x_1) & \cdots \\
f(x_2) & g(x_2) & \cdots \\
\vdots & \vdots &
\end{pmatrix},
\qquad
b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$$
$$A^T A = \begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix},
\qquad
A^T b = \begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \\ \vdots \end{pmatrix}$$
Linear Least Squares
• Compare the two expressions we’ve derived – they are equal!
$$\begin{pmatrix}
\sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) & \cdots \\
\sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \\ \vdots \end{pmatrix}$$
$$\left[ \sum_i
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}
\begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}^{\!T}
\right]
\begin{pmatrix} a \\ b \\ \vdots \end{pmatrix}
=
\sum_i y_i \begin{pmatrix} f(x_i) \\ g(x_i) \\ \vdots \end{pmatrix}$$
Ways of Solving Linear Least Squares
• Option 1:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        store them in row i of A
        store yi in b
    compute (AᵀA)⁻¹ Aᵀ b
• (AᵀA)⁻¹ Aᵀ is known as the “pseudoinverse” of A
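A minimal sketch of Option 1, assuming NumPy and, purely for illustration, the line basis f(x) = x, g(x) = 1; the data values are made up.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Row i of A holds f(xi), g(xi) (here: xi and 1); b holds yi.
A = np.column_stack([x, np.ones_like(x)])
b = y

# np.linalg.pinv returns the pseudoinverse, which equals (A^T A)^-1 A^T
# when A has full column rank (internally it is computed via the SVD).
params = np.linalg.pinv(A) @ b
print(params)   # [a, b] for y = a*x + b
```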
Ways of Solving Linear Least Squares
• Option 2:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        store them in row i of A
        store yi in b
    compute AᵀA and Aᵀb
    solve AᵀA x = Aᵀb
• These are known as the “normal equations” of the least squares problem
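A sketch of Option 2 under the same assumptions as above; solve_normal_equations is just an illustrative helper name.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Option 2: form A^T A and A^T b explicitly, then solve the small m-by-m system."""
    return np.linalg.solve(A.T @ A, A.T @ b)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
A = np.column_stack([x, np.ones_like(x)])
print(solve_normal_equations(A, y))   # same [a, b] as the pseudoinverse route
```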
Ways of Solving Linear Least Squares
• These can be inefficient, since A is typically much larger than AᵀA and Aᵀb
• Option 3:
    for each xi, yi:
        compute f(xi), g(xi), etc.
        accumulate the outer product in U
        accumulate the product with yi in v
    solve U x = v
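A sketch of Option 3, never forming A at all; the function name, the basis of lambdas, and the data values are all illustrative assumptions, not part of the slides.

```python
import numpy as np

def solve_accumulated(xs, ys, basis):
    """Option 3: accumulate the sum of outer products (U) and the RHS (v) point by point."""
    m = len(basis)
    U = np.zeros((m, m))
    v = np.zeros(m)
    for xi, yi in zip(xs, ys):
        row = np.array([fn(xi) for fn in basis])   # [f(xi), g(xi), ...]
        U += np.outer(row, row)
        v += yi * row
    return np.linalg.solve(U, v)

# Example with the (assumed) line basis f(x) = x, g(x) = 1:
params = solve_accumulated([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8],
                           [lambda x: x, lambda x: 1.0])
print(params)
```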
Special Case: Constant
• Let’s try to model a function of the form y = a
• In this case, f(xi) = 1 and we are solving
$$\Big(\sum_i 1\Big)\, a = \sum_i y_i
\quad\Rightarrow\quad
a = \frac{\sum_i y_i}{n}$$
• Punchline: mean is least-squares estimator for
best constant fit
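A quick numerical check of the punchline (a sketch, assuming NumPy; the data values are illustrative):

```python
import numpy as np

y = np.array([1.1, 2.9, 5.2, 6.8])

# Constant model y = a: the design matrix is a single column of ones.
A = np.ones((len(y), 1))
a_lsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(a_lsq[0], np.mean(y))   # identical: the mean is the least-squares constant
```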
Special Case: Line
• Fit to y=a+bx
1
i  x 1
 i
 n
( A A)  
xi
T
1
xi 
1
a 
    yi  
b  i  xi 
 xi 2  xi 


1
xi 
 yi 
n 
 xi
T

, A b

2
2
2
nxi  xi 
xi 

x
y
 i i
x yi  xi xi yi
n xi yi  xi yi
a i
,
b

2
2
2
2
nxi  xi 
nxi  xi 
2
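A sketch of these closed-form expressions, cross-checked against a library fit; assumes NumPy, and the data values are made up.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
n = len(x)

# Closed-form solution of the 2x2 normal equations for y = a + b*x.
d = n * np.sum(x**2) - np.sum(x)**2
a = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / d
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / d

# Cross-check against a library fit (polyfit returns [slope, intercept]).
slope, intercept = np.polyfit(x, y, 1)
print(a, b, intercept, slope)
```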
Weighted Least Squares
• Common case: the (xi,yi) have different
uncertainties associated with them
• Want to give more weight to measurements
of which you are more certain
• Weighted least squares minimization:
$$\min\; \chi^2 = \sum_i w_i \left[ y_i - f(x_i) \right]^2$$
• If the uncertainty of measurement i is σi, best to take wi = 1/σi²
Weighted Least Squares
• Define the weight matrix W as
$$W = \begin{pmatrix}
w_1 & & & & 0 \\
 & w_2 & & & \\
 & & w_3 & & \\
 & & & w_4 & \\
0 & & & & \ddots
\end{pmatrix}$$
• Then solve weighted least squares via
$$A^T W A\, x = A^T W b$$
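A minimal sketch of the weighted solve, assuming NumPy; the data and per-point uncertainties are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.1, 0.1, 0.5, 0.5])   # per-point uncertainties (illustrative)

A = np.column_stack([x, np.ones_like(x)])
W = np.diag(1.0 / sigma**2)              # w_i = 1 / sigma_i^2

# Weighted normal equations: A^T W A x = A^T W y.
params = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(params)
```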
Error Estimates from Linear Least Squares
• For many applications, finding values is useless
without estimate of their accuracy
• Residual is b – Ax
• Can compute χ² = (b – Ax)ᵀ(b – Ax)
• How do we tell whether the answer is good?
– Lots of measurements
– χ² is small
– χ² increases quickly with perturbations to x
Error Estimates from Linear Least Squares
• Let’s look at the increase in χ²:
$$x \rightarrow x + \Delta x$$
$$\left[ b - A(x + \Delta x) \right]^T \left[ b - A(x + \Delta x) \right]$$
$$= \left[ (b - Ax) - A\, \Delta x \right]^T \left[ (b - Ax) - A\, \Delta x \right]$$
$$= (b - Ax)^T (b - Ax) - 2\, \Delta x^T A^T (b - Ax) + \Delta x^T A^T A\, \Delta x$$
$$= \chi^2 - 2\, \Delta x^T (A^T b - A^T A x) + \Delta x^T A^T A\, \Delta x$$
Since AᵀAx = Aᵀb at the least-squares solution, the middle term vanishes, so
$$\chi^2_{\text{new}} = \chi^2 + \Delta x^T A^T A\, \Delta x$$
• So, the bigger AᵀA is, the faster the error increases as we move away from the current x
Error Estimates from Linear Least Squares
• C = (AᵀA)⁻¹ is called the covariance of the data
• The “standard variance” in our estimate of x is
$$\sigma_x^2 = \frac{\chi^2}{n - m}\, C$$
• This is a matrix:
– Diagonal entries give variance of estimates of
components of x
– Off-diagonal entries explain mutual dependence
• n–m is (# of samples) minus (# of degrees of
freedom in the fit): consult a statistician…
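A minimal sketch of turning the residual into parameter uncertainties, assuming NumPy; the data values are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8, 6.1])

A = np.column_stack([x, np.ones_like(x)])   # n samples, m = 2 parameters
params = np.linalg.solve(A.T @ A, A.T @ y)

r = y - A @ params                 # residual b - Ax
chi2 = r @ r                       # chi^2 = (b - Ax)^T (b - Ax)
n, m = A.shape
C = np.linalg.inv(A.T @ A)         # covariance of the data
var_x = chi2 / (n - m) * C         # "standard variance" of the estimates

print(np.sqrt(np.diag(var_x)))     # one-sigma uncertainties of slope and intercept
```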
Special Case: Constant
$$y = a, \qquad \chi^2 = \sum_i (y_i - a)^2$$
$$\Big(\sum_i 1\Big)\, a = \sum_i y_i
\quad\Rightarrow\quad
a = \frac{\sum_i y_i}{n}$$
$$\sigma_a^2 = \frac{\sum_i (y_i - a)^2}{n - 1} \cdot \frac{1}{n}$$
$$\sqrt{\frac{\sum_i (y_i - a)^2}{n - 1}} \quad \text{is the “standard deviation of samples”}$$
$$\sigma_a = \frac{1}{\sqrt{n}}\, \sqrt{\frac{\sum_i (y_i - a)^2}{n - 1}} \quad \text{is the “standard deviation of the mean”}$$
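A quick sketch of these two quantities, assuming NumPy; the data values are illustrative.

```python
import numpy as np

y = np.array([1.1, 2.9, 5.2, 6.8, 4.0, 3.5])
n = len(y)
a = np.mean(y)                                         # least-squares constant fit

std_samples = np.sqrt(np.sum((y - a)**2) / (n - 1))    # standard deviation of samples
std_mean = std_samples / np.sqrt(n)                    # standard deviation of the mean

print(a, std_samples, std_mean)
# Same as np.std(y, ddof=1) and np.std(y, ddof=1) / np.sqrt(n)
```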
Things to Keep in Mind
• In general, uncertainty in estimated parameters
goes down slowly: like 1/sqrt(# samples)
• Formulas for special cases (like fitting a line) are messy: simpler to think of the AᵀAx = Aᵀb form
• All of these minimize “vertical” squared distance
– Square not always appropriate
– Vertical distance not always appropriate