Data Modeling and Least Squares Fitting
COS 323

Data Modeling

• Given: data points and a functional form; find the constants in the function
• Example: given points $(x_i, y_i)$, find the line through them, i.e., find $a$ and $b$ in $y = ax + b$

[Figure: scatter of data points $(x_1, y_1), \dots, (x_7, y_7)$ with the fitted line $y = ax + b$]

Data Modeling

• You might do this because you actually care about those numbers…
  – Example: measure the position of a falling object over time, fit a parabola $p = -\tfrac{1}{2} g t^2$, and estimate $g$ from the fit

[Figure: position vs. time for a falling object, with fitted parabola]

Data Modeling

• … or because some aspect of the behavior is unknown and you want to ignore it
  – Example: measuring the relative resonant frequency of two ions, where you want to ignore magnetic field drift

Least Squares

• Nearly universal formulation of fitting: minimize the squares of the differences between the data and the function
  – Example: to fit a line, minimize
$$\sum_i \bigl( y_i - (a x_i + b) \bigr)^2$$
    with respect to $a$ and $b$
  – Most general solution technique: take derivatives with respect to the unknown variables and set them equal to zero

Least Squares

• Computational approaches:
  – General numerical algorithms for function minimization
  – Take partial derivatives, then apply general numerical root-finding algorithms
  – Specialized numerical algorithms that take advantage of the form of the function
  – Important special case: linear least squares

Linear Least Squares

• General pattern:
$$y_i = a\, f(x_i) + b\, g(x_i) + c\, h(x_i) + \cdots$$
  Given $(x_i, y_i)$, solve for $a, b, c, \dots$
• Note that the dependence on the unknowns is linear; the functions $f$, $g$, $h$ themselves need not be!

Solving the Linear Least Squares Problem

• Take partial derivatives of $\chi^2 = \sum_i \bigl( y_i - a f(x_i) - b g(x_i) \bigr)^2$ and set them to zero:
$$\frac{\partial \chi^2}{\partial a} = \sum_i -2 f(x_i) \bigl( y_i - a f(x_i) - b g(x_i) \bigr) = 0
\;\Rightarrow\; a \sum_i f(x_i) f(x_i) + b \sum_i f(x_i) g(x_i) = \sum_i f(x_i)\, y_i$$
$$\frac{\partial \chi^2}{\partial b} = \sum_i -2 g(x_i) \bigl( y_i - a f(x_i) - b g(x_i) \bigr) = 0
\;\Rightarrow\; a \sum_i g(x_i) f(x_i) + b \sum_i g(x_i) g(x_i) = \sum_i g(x_i)\, y_i$$

Solving the Linear Least Squares Problem

• For convenience, rewrite in matrix form:
$$\begin{pmatrix} \sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) \\ \sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i f(x_i)\, y_i \\ \sum_i g(x_i)\, y_i \end{pmatrix}$$
• Factor:
$$\left[ \sum_i \begin{pmatrix} f(x_i) \\ g(x_i) \end{pmatrix} \begin{pmatrix} f(x_i) \\ g(x_i) \end{pmatrix}^{\!T} \right] \begin{pmatrix} a \\ b \end{pmatrix} = \sum_i y_i \begin{pmatrix} f(x_i) \\ g(x_i) \end{pmatrix}$$

Linear Least Squares

• There is a different derivation of this: an overconstrained linear system
$$A \mathbf{x} = \mathbf{b}$$
• $A$ has $n$ rows and $m < n$ columns: more equations than unknowns

Linear Least Squares

• Interpretation: find $\mathbf{x}$ that comes "closest" to satisfying $A\mathbf{x} = \mathbf{b}$
  – i.e., minimize $\mathbf{b} - A\mathbf{x}$
  – i.e., minimize $\lVert \mathbf{b} - A\mathbf{x} \rVert$
  – Equivalently, minimize $\lVert \mathbf{b} - A\mathbf{x} \rVert^2$ or $(\mathbf{b} - A\mathbf{x}) \cdot (\mathbf{b} - A\mathbf{x})$:
$$\min_{\mathbf{x}} \; (\mathbf{b} - A\mathbf{x})^T (\mathbf{b} - A\mathbf{x})$$
$$\nabla_{\mathbf{x}} \, (\mathbf{b} - A\mathbf{x})^T (\mathbf{b} - A\mathbf{x}) = -2 A^T (\mathbf{b} - A\mathbf{x}) = 0$$
$$\Rightarrow\; A^T A\, \mathbf{x} = A^T \mathbf{b}$$

Linear Least Squares

• If fitting data to a linear function:
  – Rows of $A$ are functions of the $x_i$
  – Entries of $\mathbf{b}$ are the $y_i$
  – Minimizing the sum of squared differences!
$$A = \begin{pmatrix} f(x_1) & g(x_1) \\ f(x_2) & g(x_2) \\ \vdots & \vdots \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$$
$$A^T A = \begin{pmatrix} \sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) \\ \sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) \end{pmatrix}, \qquad A^T \mathbf{b} = \begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \end{pmatrix}$$

Linear Least Squares

• Compare the two expressions we've derived: they are equal! Written out entry by entry, the normal equations $A^T A\, \mathbf{x} = A^T \mathbf{b}$ are exactly the system
$$\begin{pmatrix} \sum_i f(x_i) f(x_i) & \sum_i f(x_i) g(x_i) \\ \sum_i g(x_i) f(x_i) & \sum_i g(x_i) g(x_i) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i y_i f(x_i) \\ \sum_i y_i g(x_i) \end{pmatrix}$$
  obtained from the partial derivatives.

Ways of Solving Linear Least Squares

• Option 1:

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc. and store in row i of A
        store y_i in b
    compute (A^T A)^{-1} A^T b

• $(A^T A)^{-1} A^T$ is known as the "pseudoinverse" of $A$

Ways of Solving Linear Least Squares

• Option 2:

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc. and store in row i of A
        store y_i in b
    compute A^T A and A^T b
    solve A^T A x = A^T b

• These are known as the "normal equations" of the least squares problem; a code sketch follows below
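As a concrete illustration of Options 1 and 2, here is a minimal Python/NumPy sketch. The function name fit_linear_least_squares and the basis choice $f(x) = x$, $g(x) = 1$ (i.e., fitting $y = ax + b$) are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def fit_linear_least_squares(x, y):
    """Fit y = a*f(x) + b*g(x) with f(x) = x, g(x) = 1, i.e. y = a*x + b.

    A sketch of Option 2: build A, form the normal equations
    A^T A x = A^T b, and solve the small m-by-m system.
    """
    # Row i of A holds the basis functions evaluated at x_i.
    A = np.column_stack([x, np.ones_like(x)])
    AtA = A.T @ A        # m x m, small regardless of the number of samples
    Atb = A.T @ y        # m-vector
    return np.linalg.solve(AtA, Atb)   # [a, b]

# Usage on synthetic data: noisy samples of y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
a, b = fit_linear_least_squares(x, y)
print(a, b)   # should be close to 2 and 1
```

Library routines such as numpy.linalg.lstsq solve the same minimization via QR or SVD without explicitly forming $A^T A$, which is numerically safer: forming $A^T A$ squares the condition number of the problem.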
Ways of Solving Linear Least Squares

• Options 1 and 2 can be inefficient, since $A$ is typically much larger than $A^T A$ and $A^T \mathbf{b}$
• Option 3: never store $A$ at all; accumulate the small system directly

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc.
        accumulate the outer product into U
        accumulate the product with y_i into v
    solve U x = v

Special Case: Constant

• Let's try to model a function of the form $y = a$
• In this case, $f(x_i) = 1$ and we are solving
$$\left( \sum_i 1 \right) a = \sum_i y_i \quad\Rightarrow\quad a = \frac{\sum_i y_i}{n}$$
• Punchline: the mean is the least-squares estimator for the best constant fit

Special Case: Line

• Fit to $y = a + b x$:
$$A^T A = \begin{pmatrix} \sum_i 1 & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix} = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad A^T \mathbf{b} = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}$$
$$a = \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \qquad b = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2}$$

Weighted Least Squares

• Common case: the $(x_i, y_i)$ have different uncertainties associated with them
• Want to give more weight to measurements of which you are more certain
• Weighted least squares minimization:
$$\min \sum_i w_i \bigl( y_i - f(x_i) \bigr)^2$$
• If the uncertainty of measurement $i$ is $\sigma_i$, it is best to take $w_i = 1 / \sigma_i^2$

Weighted Least Squares

• Define the weight matrix $W$ as the diagonal matrix
$$W = \begin{pmatrix} w_1 & & & \\ & w_2 & & \\ & & w_3 & \\ & & & \ddots \end{pmatrix}$$
• Then solve the weighted least squares problem via
$$\left( A^T W A \right) \mathbf{x} = A^T W \mathbf{b}$$

Error Estimates from Linear Least Squares

• For many applications, finding values is useless without an estimate of their accuracy
• The residual is $\mathbf{b} - A\mathbf{x}$
• Can compute $\chi^2 = (\mathbf{b} - A\mathbf{x}) \cdot (\mathbf{b} - A\mathbf{x})$
• How do we tell whether the answer is good?
  – Lots of measurements
  – $\chi^2$ is small
  – $\chi^2$ increases quickly with perturbations to $\mathbf{x}$

Error Estimates from Linear Least Squares

• Let's look at the increase in $\chi^2$ when we perturb $\mathbf{x} \to \mathbf{x} + \Delta\mathbf{x}$:
$$\chi'^2 = \bigl( \mathbf{b} - A(\mathbf{x} + \Delta\mathbf{x}) \bigr)^T \bigl( \mathbf{b} - A(\mathbf{x} + \Delta\mathbf{x}) \bigr)$$
$$= (\mathbf{b} - A\mathbf{x})^T (\mathbf{b} - A\mathbf{x}) - 2\, \Delta\mathbf{x}^T A^T (\mathbf{b} - A\mathbf{x}) + \Delta\mathbf{x}^T A^T A\, \Delta\mathbf{x}$$
$$= \chi^2 - 2\, \Delta\mathbf{x}^T \bigl( A^T \mathbf{b} - A^T A\, \mathbf{x} \bigr) + \Delta\mathbf{x}^T A^T A\, \Delta\mathbf{x}$$
• At the solution $A^T A\, \mathbf{x} = A^T \mathbf{b}$, so the middle term vanishes and
$$\chi'^2 = \chi^2 + \Delta\mathbf{x}^T A^T A\, \Delta\mathbf{x}$$
• So, the bigger $A^T A$ is, the faster the error increases as we move away from the current $\mathbf{x}$

Error Estimates from Linear Least Squares

• $C = (A^T A)^{-1}$ is called the covariance of the data
• The "standard variance" in our estimate of $\mathbf{x}$ is the matrix
$$\sigma^2 = \frac{\chi^2}{n - m}\, C$$
  – Diagonal entries give the variance of the estimates of the components of $\mathbf{x}$
  – Off-diagonal entries explain mutual dependence
• $n - m$ is (# of samples) minus (# of degrees of freedom in the fit): consult a statistician…

Special Case: Constant

• For the constant fit $y = a$:
$$\chi^2 = \sum_i (y_i - a)^2, \qquad a = \frac{\sum_i y_i}{n}, \qquad C = (A^T A)^{-1} = \frac{1}{n}$$
$$\sigma_a^2 = \frac{\chi^2}{n - 1} \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{\sum_i (y_i - a)^2}{n - 1}$$
• $\sqrt{\sum_i (y_i - a)^2 / (n - 1)}$ is the "standard deviation of samples"
• $\sigma_a = \frac{1}{\sqrt{n}} \sqrt{\sum_i (y_i - a)^2 / (n - 1)}$ is the "standard deviation of the mean"

Things to Keep in Mind

• In general, uncertainty in the estimated parameters goes down slowly: like $1 / \sqrt{\#\text{samples}}$
• Formulas for special cases (like fitting a line) are messy: it is simpler to think in terms of the $A^T A\, \mathbf{x} = A^T \mathbf{b}$ form
• All of these methods minimize "vertical" squared distance
  – The square is not always appropriate
  – Vertical distance is not always appropriate
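To tie the last few slides together, here is a hedged sketch combining the weighted normal equations $(A^T W A)\,\mathbf{x} = A^T W \mathbf{b}$ with the covariance-based error estimate $\sigma^2 = \chi^2 / (n - m) \cdot C$ described above. The function name weighted_line_fit and the straight-line model are illustrative assumptions, not from the slides.

```python
import numpy as np

def weighted_line_fit(x, y, sigma):
    """Weighted least squares fit of y = a + b*x with per-point
    uncertainties sigma_i, plus covariance-based error estimates.

    A sketch following the slides: solve (A^T W A) x = A^T W b with
    w_i = 1/sigma_i^2, then scale C = (A^T W A)^{-1} by chi^2/(n - m).
    """
    A = np.column_stack([np.ones_like(x), x])   # columns: g(x)=1, f(x)=x
    W = np.diag(1.0 / sigma**2)                 # weight matrix, w_i = 1/sigma_i^2
    AtWA = A.T @ W @ A
    AtWb = A.T @ W @ y
    params = np.linalg.solve(AtWA, AtWb)        # [a, b]

    resid = y - A @ params
    chi2 = resid @ W @ resid                    # weighted chi^2
    n, m = A.shape
    cov = np.linalg.inv(AtWA) * chi2 / (n - m)  # slides' sigma^2 = chi^2/(n-m) * C
    return params, np.sqrt(np.diag(cov))        # values and their 1-sigma errors

# Usage: data with known per-point uncertainty 0.1 around y = 1 + 2x.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
sigma = np.full_like(x, 0.1)
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)
(a, b), (da, db) = weighted_line_fit(x, y, sigma)
print(f"a = {a:.3f} +/- {da:.3f}, b = {b:.3f} +/- {db:.3f}")
```

One design note: when the $\sigma_i$ are trusted absolute uncertainties, the scaling by $\chi^2 / (n - m)$ is sometimes omitted and $C = (A^T W A)^{-1}$ is used directly; the sketch follows the slides' recipe.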