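Oceanography 569: Oceanographic Data Analysis Laboratory
Kathie Kelly, Applied Physics Laboratory, 515 Ben Hall IR Bldg
Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/

Propagation of Errors

Example 1: linear function. Write a measurement as $x = x_t + \epsilon$, where $\epsilon$ is the error; $x_t$ is the true value of x if there is no bias in the error ($\langle \epsilon \rangle = 0$).

Example 2: mean. The sample mean averages the errors in the data:
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = x_t + \frac{1}{N}\sum_{i=1}^{N} \epsilon_i$

Mean Squared Error
How much is the error reduced by averaging? Examine the mean squared error, or error variance, $\langle (\bar{x} - x_t)^2 \rangle$, where $\langle\cdot\rangle$ denotes an ensemble average over many realizations. If the errors are random and similar (each with variance $\sigma^2$) with no bias, then
$\langle (\bar{x} - x_t)^2 \rangle = \sigma^2 / N$
That is, the error variance is reduced by a factor of N, provided the errors are uncorrelated.

Errors for Differences
Example 3: difference in time, $x(t_2) - x(t_1)$. If the errors are random (uncorrelated), differencing doubles the squared error: $\langle (\epsilon_2 - \epsilon_1)^2 \rangle = 2\sigma^2$. If the errors are correlated (bias), differencing reduces the error, because the common part cancels. Most errors are a combination of random and bias errors.

General Error Estimates
Given a quantity that is a function of several variables, F(x,y,z), a variation (or error) in F is related to variations in the variables by
$\delta F = \frac{\partial F}{\partial x}\delta x + \frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial z}\delta z$
or, in terms of the error variance, assuming the errors in x, y and z are uncorrelated,
$\langle \delta F^2 \rangle = \left(\frac{\partial F}{\partial x}\right)^2 \langle \delta x^2 \rangle + \left(\frac{\partial F}{\partial y}\right)^2 \langle \delta y^2 \rangle + \left(\frac{\partial F}{\partial z}\right)^2 \langle \delta z^2 \rangle$

Error Estimate Example (ρ and c_p are constant)
Write the squared error, factor out $F^2$, then take the ensemble average and define the relative error, $\langle \delta F^2 \rangle / F^2$.

Another Example
Wind stress magnitude, $\tau = \rho_a c_D s^2$, where $c_D$ is constant, $\rho_a$ is air density and s is wind speed. The wind speed error is given as a fraction r of the wind speed, so the relative error is $\langle \delta s^2 \rangle / s^2 = r^2$.
What is the relative error of the wind stress (magnitude)? What is the stress error if the wind speed error is 10%?

Another Example: Solution
Wind stress: $\tau = \rho_a c_D s^2$, where $c_D$ is constant. From the general formula, $\delta\tau = 2\rho_a c_D s\,\delta s$, so $\langle \delta\tau^2 \rangle / \tau^2 = 4\langle \delta s^2 \rangle / s^2 = 4r^2$, and the relative error of the stress is 2r. A 10% error in wind speed s gives a 20% error in stress.

Exercise 3: Error Estimates
Known errors for Q, T and H. Need error estimates for:
• Q/(ρ c_p H)
• dT/dt

Exercise 3: Are other terms significant?
1. Compute the LHS.
2. Estimate the total errors for the LHS.
Is the LHS difference larger than the estimated errors?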
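The claims that averaging N uncorrelated errors reduces the error variance by a factor of N, and that differencing uncorrelated errors doubles the squared error while a shared bias cancels, can be checked numerically. The sketch below is a Monte Carlo experiment assuming Gaussian errors and a made-up true value; it models "bias" as an error component shared by both times in a difference. The function names are hypothetical.

```python
import random
import statistics

def mean_error_variance(n, sigma, trials=10000, seed=1):
    """Monte Carlo estimate of <(xbar - x_t)^2> for the mean of n measurements
    whose errors are unbiased, uncorrelated and Gaussian with std dev sigma."""
    rng = random.Random(seed)
    x_true = 10.0  # hypothetical true value x_t
    sq_err = []
    for _ in range(trials):
        xbar = statistics.fmean(x_true + rng.gauss(0.0, sigma) for _ in range(n))
        sq_err.append((xbar - x_true) ** 2)
    return statistics.fmean(sq_err)

def difference_error_variance(sigma, bias_sigma=0.0, trials=10000, seed=2):
    """Monte Carlo estimate of <(e2 - e1)^2> for a difference x(t2) - x(t1).
    Each error = a bias shared by both times (std dev bias_sigma)
               + an independent random part (std dev sigma)."""
    rng = random.Random(seed)
    sq_err = []
    for _ in range(trials):
        bias = rng.gauss(0.0, bias_sigma)  # correlated part: identical at t1 and t2
        e1 = bias + rng.gauss(0.0, sigma)
        e2 = bias + rng.gauss(0.0, sigma)
        sq_err.append((e2 - e1) ** 2)
    return statistics.fmean(sq_err)

if __name__ == "__main__":
    print(mean_error_variance(25, 2.0))         # close to sigma^2/N = 0.16
    print(difference_error_variance(1.0))       # close to 2*sigma^2 = 2
    print(difference_error_variance(0.0, 3.0))  # pure bias cancels: 0.0
```

Changing bias_sigma while keeping sigma fixed leaves the difference error near 2σ², since only the uncorrelated part survives differencing.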
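The general error formula can also be evaluated numerically when the partial derivatives are tedious. This is a minimal sketch assuming uncorrelated errors and first-order (small-error) propagation, with the partial derivatives approximated by central differences. It assumes the bulk form τ = ρ_a c_D s² for wind stress; the drag coefficient, air density, seawater ρ and c_p, and the Q and H values and their errors are illustrative assumptions, not course values, and the function names are hypothetical.

```python
import math

def propagate_error(f, values, errors, rel_step=1e-6):
    """First-order error propagation for F(x, y, z, ...) with uncorrelated errors:
        <dF^2> = sum_i (dF/dx_i)^2 <dx_i^2>
    Partial derivatives are approximated by central differences.
    Returns the standard error of F, i.e. sqrt(<dF^2>)."""
    var_f = 0.0
    for i, (v, err) in enumerate(zip(values, errors)):
        step = rel_step * max(abs(v), 1.0)
        up, down = list(values), list(values)
        up[i] += step
        down[i] -= step
        dfdx = (f(*up) - f(*down)) / (2.0 * step)
        var_f += (dfdx * err) ** 2
    return math.sqrt(var_f)

# Wind stress tau = rho_a * c_D * s^2; both constants are assumed values.
RHO_AIR = 1.22   # air density, kg m^-3
C_D = 1.2e-3     # drag coefficient (dimensionless)

def wind_stress(s):
    return RHO_AIR * C_D * s ** 2

# Heating term F = Q / (rho c_p H), with rho and c_p constant (assumed values).
RHO_SEA = 1025.0  # kg m^-3
CP_SEA = 3990.0   # J kg^-1 K^-1

def heating_rate(q, h):
    return q / (RHO_SEA * CP_SEA * h)

if __name__ == "__main__":
    s = 10.0                                   # wind speed, m/s
    rel = propagate_error(wind_stress, [s], [0.1 * s]) / wind_stress(s)
    print(round(rel, 6))                       # 0.2: 10% speed error -> 20% stress error
    q, h = 100.0, 50.0                         # made-up Q (W m^-2) and H (m)
    rel_f = propagate_error(heating_rate, [q, h], [0.10 * q, 0.05 * h]) / heating_rate(q, h)
    print(round(rel_f, 4))                     # sqrt(0.10^2 + 0.05^2) = 0.1118
```

For a product or quotient such as Q/(ρ c_p H), the relative errors add in quadrature, which is what the second example reproduces.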
Notes: convert the relative error variance of x to an error variance using var(x); check that all units match.

Hypothesis Testing
To determine whether a relationship is significant, we formulate a null hypothesis: that the proposed relationship is NOT true. We then test whether the null hypothesis can be rejected at a given significance level, say α = 0.05 (5%), corresponding to a 95% confidence level. A significance test consists of finding the probability of a given result (the p-value) and comparing it with the test value α. If the p-value is less than α, the null hypothesis is rejected.

Test Example
Is the mean ⟨X⟩ of a subsample of X over N points significantly different from the known mean value μ? That depends on the standard deviation (error) of the mean estimate, $\sigma_m = \sigma/\sqrt{N}$ for N independent points with standard deviation σ. A measure of how large the difference is (how likely it is to be significant) is the Z-transform,
$Z = (\langle X \rangle - \mu)/\sigma_m$
The probability of this Z score (or lower) from a normal distribution N(0,1) is
p = normcdf(Z,0,1)
Or let Matlab do the work:
p = normcdf(Xbar, mu, sigma_m) [Matlab function], where Xbar = ⟨X⟩ and sigma_m = σ_m.

Analysis of Variance (ANOVA)
To test how well a dynamical or statistical model z fits observations d(t), we estimate the fraction of the variance of the observations described by the model. Two common types of model are:
(1) a known function, z = f(x,y)
(2) a linear estimator, z = ax + by + c, with coefficients found by regression
The ratio of the mean squared residual (or error), $\langle r^2 \rangle = \langle (d - z)^2 \rangle$, to the variance of the observations, $\sigma_d^2$, is the fraction of variance NOT explained by the model.

Time Series Analysis
The analysis of time series differs from that of independent objects (tossing dice, medical patient studies, etc.) in that the measurements generally have serial correlation, so a time series with N points does not contain N independent measurements.
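The test example can be sketched in Python, where statistics.NormalDist(0, 1).cdf(z) plays the role of Matlab's normcdf(Z,0,1). This sketch assumes independent samples and a known σ (falling back to the sample standard deviation when σ is omitted); the two-sided p-value is an addition for asking whether the mean differs in either direction, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_z_test(sample, mu, sigma=None):
    """Z-test of whether the mean of `sample` differs from a known mean mu.

    sigma_m = sigma / sqrt(N) assumes the N points are independent; for a
    serially correlated series, N should be replaced by the effective N*.
    If sigma is None, the sample standard deviation is used instead.
    Returns (Z, p_lower, p_two_sided)."""
    n = len(sample)
    if sigma is None:
        sigma = stdev(sample)
    sigma_m = sigma / math.sqrt(n)
    z = (fmean(sample) - mu) / sigma_m
    p_lower = NormalDist(0.0, 1.0).cdf(z)          # like Matlab normcdf(Z,0,1)
    p_two_sided = 2.0 * min(p_lower, 1.0 - p_lower)
    return z, p_lower, p_two_sided

if __name__ == "__main__":
    z, p_lower, p_two = mean_z_test([1.0, 2.0, 3.0, 4.0], mu=2.0, sigma=1.0)
    alpha = 0.05
    print(z, round(p_two, 4), p_two < alpha)       # 1.0 0.3173 False
```

Here the null hypothesis (no difference from μ) is not rejected, because the p-value exceeds α = 0.05.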
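The fraction-of-variance calculation can be sketched for both model types. This assumes the averages are plain sample means; for brevity the regression case uses a single predictor, z = ax + c, rather than z = ax + by + c. The numbers are made up and the function names are hypothetical.

```python
from statistics import fmean, pvariance

def unexplained_fraction(d, z):
    """Fraction of the variance of observations d NOT explained by model z:
    <(d - z)^2> / sigma_d^2."""
    resid_sq = fmean((di - zi) ** 2 for di, zi in zip(d, z))
    return resid_sq / pvariance(d)

def fit_line(x, d):
    """Least-squares coefficients (a, c) for the one-predictor model z = a*x + c."""
    xm, dm = fmean(x), fmean(d)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxd = sum((xi - xm) * (di - dm) for xi, di in zip(x, d))
    a = sxd / sxx
    return a, dm - a * xm

if __name__ == "__main__":
    # Type (1): a known function z = f(x), evaluated at the observation times.
    d = [1.0, 2.0, 3.0, 4.0]
    z = [1.0, 2.0, 3.0, 5.0]
    print(unexplained_fraction(d, z))              # 0.25 / 1.25 = 0.2
    # Type (2): a linear estimator with regression coefficients.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    obs = [1.1, 2.9, 5.2, 6.8, 9.0]
    a, c = fit_line(x, obs)
    model = [a * xi + c for xi in x]
    print(1.0 - unexplained_fraction(obs, model))  # explained fraction
```

For a least-squares fit with an intercept, one minus this fraction is the familiar R².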
The effective number of independent measurements (the degrees of freedom, N*) depends on how strongly successive measurements are correlated, i.e., on the autocorrelation of the time series.

Covariance and Correlation
For two time series x(t) and y(t), the covariance is defined as
$C_{xy}(\Delta t) = \langle (x(t) - \langle x \rangle)(y(t + \Delta t) - \langle y \rangle) \rangle$
where $\langle\cdot\rangle$ is the expected value and Δt is a time lag. The correlation is the covariance normalized by the standard deviations,
$\rho_{xy}(\Delta t) = C_{xy}(\Delta t) / (\sigma_x \sigma_y)$
with values between -1 and 1.
Notes:
1) This terminology differs from that in Matlab, but is common.
2) When applied to a single variable x, these are the autocovariance and autocorrelation.
3) These are time-lagged values, but we often use only the zero-lag value.
4) We generally remove the mean values (as shown).

Correlations
Some common types of correlations:
1) autocorrelation (to get a time scale for the data)
2) correlations between two variables
3) lagged correlations, to determine whether one variable leads or lags another
4) vector correlations (as opposed to scalar correlations)
To evaluate a correlation, we need an objective measure of significance.

Autocorrelation & Periodic Signals
The autocorrelation of a variable with a periodic signal mostly shows the periodicity. Remove the harmonics before computing (auto)correlations, for better interpretation and statistics.

Characteristic Time Scale
Is there a characteristic time scale for each variable? The first zero crossing of the autocorrelation? Or something more robust?

Integral Time Scale
A more robust method takes into account the shape of the function. The integral time scale τ is found by integrating the autocorrelation up to its first zero crossing, at lag $\Delta t_0$, to get the equivalent time for perfect correlation:
$\tau = \int_0^{\Delta t_0} \rho_{xx}(\Delta t)\, d(\Delta t)$
Integral time scales: 1 month for Qnet, 4 months for SSH. The integral time scales are shorter than the zero-crossing time scales.

Caution: Covariance from Observations
The autocovariance (or autocorrelation) from a single time series overestimates the actual function at zero lag, because the error is correlated with itself. It should be estimated from two different measurements of the same quantity at the same location.
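The lagged correlation, integral time scale and N* definitions can be sketched directly. This assumes evenly sampled, gap-free series; it uses the full-series means and standard deviations and divides lagged sums by N, which is one common estimator among several. The AR(1) generator only produces synthetic red-noise test data (it is not part of the course material), and all function names are hypothetical.

```python
import math
import random
from statistics import fmean

def lagged_correlation(x, y, lag):
    """rho_xy(lag): correlation of x(t) with y(t + lag), means removed.
    A peak at a positive lag means x leads y."""
    if lag < 0:
        return lagged_correlation(y, x, -lag)   # rho_xy(-L) = rho_yx(L)
    n = len(x)
    xm, ym = fmean(x), fmean(y)
    sx = math.sqrt(fmean([(v - xm) ** 2 for v in x]))
    sy = math.sqrt(fmean([(v - ym) ** 2 for v in y]))
    cov = sum((x[t] - xm) * (y[t + lag] - ym) for t in range(n - lag)) / n
    return cov / (sx * sy)

def integral_time_scale(x, dt=1.0):
    """Integrate the autocorrelation from lag 0 to its first zero crossing
    (trapezoidal rule, with linear interpolation to the crossing).
    Returns tau in the units of dt."""
    tau, rho_prev = 0.0, 1.0
    for lag in range(1, len(x)):
        rho = lagged_correlation(x, x, lag)
        if rho <= 0.0:
            frac = rho_prev / (rho_prev - rho)  # fraction of this step before rho = 0
            tau += 0.5 * rho_prev * frac * dt
            break
        tau += 0.5 * (rho_prev + rho) * dt
        rho_prev = rho
    return tau

def effective_dof(n_points, tau, dt=1.0):
    """N* = N / tau, with tau converted to sampling intervals."""
    return n_points * dt / tau

def ar1_series(n, phi, seed=0):
    """Synthetic red noise x_t = phi * x_(t-1) + noise (illustration only)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

if __name__ == "__main__":
    x = ar1_series(5000, 0.9, seed=0)
    tau = integral_time_scale(x)
    print(tau, effective_dof(len(x), tau))
```

For red noise with φ = 0.9 the theoretical autocorrelation is φ^k, so τ should come out near 9.5 samples and N* near N/9.5; sampling noise in the tail of the estimated autocorrelation shifts the result somewhat.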
If the errors have shorter time scales than the variable, the error can be estimated from the autocovariance at non-zero lags: extrapolate the autocorrelation to zero lag to estimate the correction at zero lag.
[Figure: autocorrelation of Qnet and SSH extrapolated to zero lag; the difference in correlation at zero lag comes from unresolved signal variance and actual errors (an upper bound on the error).]

Significance of a Correlation (degrees of freedom)
The integral time scale τ is used to define the number of degrees of freedom N* of a time series,
N* = N/τ
where N is the length of the series (in the same time units as τ). N* is needed to determine the statistical significance of a correlation.

Z-test for the significance of the correlation r, based on a random parent distribution of possible correlations ρ. Create a new variable (the Fisher z-transform)
$w = \frac{1}{2}\ln\frac{1+r}{1-r}$
The mean and standard deviation of w are
$\mu_w = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}, \quad \sigma_w = \frac{1}{\sqrt{N^* - 3}}$

Derivation of Significance Test (cont'd)
For the null hypothesis ρ = 0, so $\mu_w = 0$. Normalize using the Z transform,
$Z = (w - \mu_w)/\sigma_w = w\sqrt{N^* - 3}$
If Z is within the region containing the fraction (1 - α) of the distribution, the correlation is NOT significant. Alternatively, one can solve for the critical value of the correlation,
$r_c = \tanh\left(Z_c / \sqrt{N^* - 3}\right)$
where $\pm Z_c$ bound the central fraction (1 - α) of N(0,1). See Bendat & Piersol (2000), pp. 101-111, for the derivation.

Exercise 4: Lagged correlations
[Figure: SSH at two locations, lagged; SSH longitude-time plot]
Can you estimate the speed of the Rossby wave from the SSH?

Exercise 4: Vectors
Mean wind vectors from:
• KEO mooring
• ECMWF
• QuikSCAT
• NCEP2
Note: vector correlations do not include the means.

Vector Correlations
The complex correlation gives persistent direction errors and magnitude errors.
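The Fisher z-test for a correlation can be written in a few lines. This sketch assumes N* has already been computed from the integral time scale, uses a two-sided test, and takes σ_w = 1/√(N* - 3), the usual large-sample approximation with N* in place of N. The example numbers and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_significance(r, n_star, alpha=0.05):
    """Fisher z-test of the null hypothesis rho = 0 for a sample correlation r
    with n_star effective degrees of freedom (needs n_star > 3 and |r| < 1).
    Returns (Z, r_c, significant) for a two-sided test at level alpha."""
    w = 0.5 * math.log((1.0 + r) / (1.0 - r))     # Fisher transform, equals atanh(r)
    sigma_w = 1.0 / math.sqrt(n_star - 3.0)
    z = w / sigma_w                                # mu_w = 0 under the null hypothesis
    z_c = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # +/- z_c bound the central (1 - alpha)
    r_c = math.tanh(z_c * sigma_w)                 # critical correlation
    return z, r_c, abs(z) > z_c

if __name__ == "__main__":
    # Hypothetical: 20 years of monthly SSH (N = 240) with tau = 4 months.
    n_star = 240 / 4.0
    print(correlation_significance(0.3, n_star))
```

Returning r_c as well as Z lets one compare any number of correlations against a single threshold for a given N*.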
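Vector correlations can be computed by treating each vector as a complex number w = u + iv. The sketch below uses one common definition of the complex correlation, with means removed as the notes require. The phase of the result measures a persistent rotation (direction difference) between the two series, and the magnitude of the least-squares complex regression coefficient is included as one way to look at magnitude differences. The function name and the sign convention (positive phase means the second series is rotated counterclockwise from the first) are assumptions.

```python
import cmath
import math

def complex_vector_correlation(u1, v1, u2, v2):
    """Complex correlation of two vector series w = u + i v, means removed:
        rho = sum(conj(w1') * w2') / sqrt(sum|w1'|^2 * sum|w2'|^2)
    Returns (|rho|, phase of rho in degrees, gain), where gain = |b| for the
    least-squares fit w2' ~ b * w1'."""
    w1 = [complex(a, b) for a, b in zip(u1, v1)]
    w2 = [complex(a, b) for a, b in zip(u2, v2)]
    m1 = sum(w1) / len(w1)
    m2 = sum(w2) / len(w2)
    w1 = [w - m1 for w in w1]          # remove mean vectors
    w2 = [w - m2 for w in w2]
    num = sum(a.conjugate() * b for a, b in zip(w1, w2))
    s11 = sum(abs(a) ** 2 for a in w1)
    s22 = sum(abs(b) ** 2 for b in w2)
    rho = num / math.sqrt(s11 * s22)
    gain = abs(num) / s11
    return abs(rho), math.degrees(cmath.phase(rho)), gain
```

For example, if one wind product were systematically turned 30° counterclockwise and twice as strong as another, this would return |ρ| = 1, a phase of 30° and a gain of 2.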