Ocean569_3&4

Oceanography 569
Oceanographic Data Analysis Laboratory
Kathie Kelly
Applied Physics Laboratory
515 Ben Hall IR Bldg
class web site:
faculty.washington.edu/kellyapl/classes/ocean569_2014/
Propagation of Errors
Example 1: linear function
x_t is the true value of x
If there is no bias in the error, the mean error is zero.
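The slide's equation did not survive extraction. The sketch below assumes a hypothetical linear function y = a x + b with x = x_t + ε. The error in y is then a ε, so an unbiased ε gives zero mean error and the error standard deviation scales by |a|. It is a Monte Carlo check of that reasoning, not the slide's own derivation.

```python
import random
import statistics

def propagate_linear(a, b, x_true, sigma, n=100_000, seed=1):
    """Monte Carlo check of error propagation through y = a*x + b (hypothetical).

    Each measurement is x = x_true + eps, eps ~ N(0, sigma**2) (no bias).
    Returns the mean and standard deviation of the error y - y_true.
    """
    rng = random.Random(seed)
    y_true = a * x_true + b
    # error in y is a * eps for every realization
    errors = [a * (x_true + rng.gauss(0.0, sigma)) + b - y_true for _ in range(n)]
    mean_err = statistics.fmean(errors)
    std_err = statistics.fmean((e - mean_err) ** 2 for e in errors) ** 0.5
    return mean_err, std_err

mean_err, std_err = propagate_linear(a=3.0, b=2.0, x_true=10.0, sigma=0.5)
print(mean_err, std_err)  # mean near 0, std near |a| * sigma = 1.5
```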
Example 2: mean
The sample mean averages the errors in the data.
Mean Squared Error
How much is error reduced by averaging? Examine the mean
squared error or error variance.
The <~> indicates an ensemble average over N realizations.
If the errors are random, of similar size, and unbiased, the error
variance of the mean is reduced by a factor of N, provided the errors
are uncorrelated.
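A quick ensemble experiment can illustrate the factor-of-N reduction. This sketch assumes Gaussian, unbiased, uncorrelated errors; the parameter values (N = 25, σ = 2) are arbitrary choices for illustration.

```python
import random
import statistics

def mean_error_variance(n_points, sigma, realizations=20_000, seed=2):
    """Ensemble estimate of <e_m^2>, the error variance of a sample mean.

    Each realization averages n_points uncorrelated, unbiased errors
    drawn from N(0, sigma**2).
    """
    rng = random.Random(seed)
    mean_errors = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_points))
        for _ in range(realizations)
    ]
    # mean squared error over the ensemble (the true mean error is zero)
    return statistics.fmean(e * e for e in mean_errors)

sigma, n = 2.0, 25
v = mean_error_variance(n, sigma)
print(v, sigma**2 / n)  # both near 0.16
```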
Errors for Differences
Example 3: difference in time
If the errors are random (uncorrelated), differencing doubles the
squared error (error variance)
If the errors are correlated (e.g., a common bias), differencing reduces the error
Most errors are a combination of random error and bias
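Both cases follow from var(ε₁ − ε₂) = σ₁² + σ₂² − 2 cov(ε₁, ε₂). The sketch below models each error as a shared component (correlated between the two times, like a bias) plus an independent random component; this two-part error model is an assumption chosen to illustrate the slide, not taken from it.

```python
import random
import statistics

def pop_var(xs):
    """Population variance computed with fast float means."""
    m = statistics.fmean(xs)
    return statistics.fmean((v - m) ** 2 for v in xs)

def difference_error_variance(sigma_common, sigma_random, n=50_000, seed=3):
    """Return (error variance of one measurement, error variance of a difference)."""
    rng = random.Random(seed)
    single, diff = [], []
    for _ in range(n):
        common = rng.gauss(0.0, sigma_common)  # same for both measurements
        e1 = common + rng.gauss(0.0, sigma_random)
        e2 = common + rng.gauss(0.0, sigma_random)
        single.append(e1)
        diff.append(e1 - e2)  # the shared part cancels here
    return pop_var(single), pop_var(diff)

random_only = difference_error_variance(0.0, 1.0)    # near (1, 2): variance doubles
mostly_shared = difference_error_variance(2.0, 0.5)  # near (4.25, 0.5): shared part cancels
print(random_only, mostly_shared)
```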
General Error Estimates
Given a quantity F(x, y, z) that is a function of several variables,
a variation (or error) in F is related to variations in the variables,
or, in terms of the error variance,
assuming the errors in x, y and z are uncorrelated
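The slide's equations did not survive extraction. The standard first-order (linearized) forms consistent with the text are presumably the following; the cross terms such as 2 (∂F/∂x)(∂F/∂y)<δx δy> vanish because the errors are assumed uncorrelated.

```latex
\delta F = \frac{\partial F}{\partial x}\,\delta x
         + \frac{\partial F}{\partial y}\,\delta y
         + \frac{\partial F}{\partial z}\,\delta z

\langle \delta F^2 \rangle \approx
  \left(\frac{\partial F}{\partial x}\right)^2 \langle \delta x^2 \rangle
+ \left(\frac{\partial F}{\partial y}\right)^2 \langle \delta y^2 \rangle
+ \left(\frac{\partial F}{\partial z}\right)^2 \langle \delta z^2 \rangle
```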
Error Estimate Example
where ρ and c_p are constant
Square the error, factor out F², take the ensemble average,
and define the relative error
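When F is awkward to differentiate by hand, the same first-order propagation can be done numerically. This sketch estimates each partial derivative by central differences and reports the relative error as δF/|F| (square it for the variance form). The function `example_f` = xy/z and its values are hypothetical; for a product or quotient the result should match the root-sum-square of the relative errors.

```python
import math

def propagated_error(f, values, errors, rel_step=1e-6):
    """First-order error of f(*values), assuming uncorrelated errors.

    Partial derivatives come from central differences; the terms are
    combined as sqrt(sum((dF/dx_i * err_i)**2)).
    """
    total = 0.0
    for i, (v, e) in enumerate(zip(values, errors)):
        h = rel_step * abs(v) if v != 0 else rel_step
        up, down = list(values), list(values)
        up[i], down[i] = v + h, v - h
        dfdx = (f(*up) - f(*down)) / (2.0 * h)
        total += (dfdx * e) ** 2
    return math.sqrt(total)

def example_f(x, y, z):
    """Hypothetical function used only for illustration."""
    return x * y / z

values, errors = (2.0, 3.0, 4.0), (0.1, 0.3, 0.2)
rel = propagated_error(example_f, values, errors) / abs(example_f(*values))
print(rel, math.sqrt(0.05**2 + 0.10**2 + 0.05**2))  # should agree closely
```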
Another Example
Wind stress
where c_D is constant
The error is given as a fraction r of the wind speed s,
so the relative error is δs/s = r
What is the relative error of wind stress (magnitude)?
What is the stress error if the wind speed error is 10%?
Another Example Solution
Wind stress
General formula:
where c_D is constant
A 10% error in wind speed s gives a 20% error in stress
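The factor of 2 comes from the square in the bulk formula. Assuming τ = ρ c_D s² with ρ and c_D constant (the slide's equation was lost, so this form is inferred from the 10% to 20% result), a Monte Carlo run shows the linearized 2r is accurate for small r. The exact relative spread is sqrt(4r² + 2r⁴), slightly above 2r.

```python
import random
import statistics

def stress_relative_error(r, s=10.0, n=200_000, seed=4):
    """Compare linearized and Monte Carlo relative errors of tau = rho*cD*s**2.

    Wind speed errors are unbiased with standard deviation r*s.
    rho and cD are assumed constant, so they cancel in the relative error.
    """
    linear = 2.0 * r  # d(tau)/tau = 2 ds/s
    rng = random.Random(seed)
    tau_true = s ** 2  # constants rho*cD omitted because they cancel
    rel = [((s + rng.gauss(0.0, r * s)) ** 2 - tau_true) / tau_true for _ in range(n)]
    m = statistics.fmean(rel)
    monte_carlo = statistics.fmean((v - m) ** 2 for v in rel) ** 0.5
    return linear, monte_carlo

linear, monte_carlo = stress_relative_error(0.10)
print(linear, monte_carlo)  # 0.2 and a value slightly above 0.2
```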
Exercise 3: Error Estimates
Known errors for Q, T and H
Need error estimates for
• Q/(ρ c_p H)
• dT/dt
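One possible approach, sketched under stated assumptions: ρ and c_p are constants (typical seawater values are used here, not values from the course), errors in Q and H are uncorrelated, and dT/dt is estimated by a simple difference (T2 − T1)/Δt with uncorrelated errors in each T (so the variance doubles, as in Example 3). All input numbers are hypothetical.

```python
import math

RHO = 1025.0  # seawater density, kg m^-3 (assumed constant)
CP = 3990.0   # specific heat of seawater, J kg^-1 K^-1 (assumed constant)

def heating_term_error(q, dq, h, dh):
    """Value and error of Q/(rho*cp*H), assuming uncorrelated errors in Q and H."""
    value = q / (RHO * CP * h)
    rel = math.sqrt((dq / q) ** 2 + (dh / h) ** 2)  # relative errors add in quadrature
    return value, abs(value) * rel

def tendency_error(t1, t2, dt, d_temp):
    """Value and error of dT/dt ~ (T2 - T1)/dt for uncorrelated errors d_temp in each T."""
    value = (t2 - t1) / dt
    return value, math.sqrt(2.0) * d_temp / dt

# Hypothetical inputs: Q = 100 +/- 20 W m^-2, H = 50 +/- 10 m,
# T rising from 20.0 to 20.5 degC over 30 days, with 0.1 degC error in each T.
val, err = heating_term_error(100.0, 20.0, 50.0, 10.0)
tval, terr = tendency_error(20.0, 20.5, 30 * 86400.0, 0.1)
print(val, err, tval, terr)  # all in K/s; check that units match
```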
Exercise 3: Are other terms significant?
• compute the LHS
• estimate the total errors for the LHS
• is the LHS difference larger than the estimated errors?
Notes:
convert the relative error variance of x to an error variance using var(x)
check that all units match
Hypothesis Testing
To determine whether a relationship is significant, we formulate a null
hypothesis: that the proposed relationship is NOT true.
We test whether the null hypothesis can be rejected at a given
probability, say α = 0.05 (5%). (The level of confidence is then
95%.)
A significance test consists of finding the probability of a given result
(a p-value) and comparing it with the test value α. If the p-value
(probability) is less than α, the null hypothesis is rejected.
Test Example
Is the mean <X> of a subsample of X over N points significantly
different from the known mean value μ?
This depends on the standard deviation (error) of the mean estimate,
σ_m = σ/√N for N independent points.
A measure of how large the difference is (how likely it is to be
significant) is found from the Z-transform, Z = (<X> − μ)/σ_m.
The probability of this Z score (or lower) from a normal distribution N(0,1) is
p = normcdf(Z,0,1)
Or let Matlab do the work:
p = normcdf(<X>,μ,σ_m)
[Matlab function]
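For readers without Matlab, `statistics.NormalDist` in the Python standard library gives the same normal CDF. This sketch assumes independent samples (so σ_m = σ/√N) and a known σ; the sample values are hypothetical. It reports both the one-sided probability (as normcdf does) and a two-sided p-value, which is the usual choice when asking whether the means "differ" in either direction.

```python
from statistics import NormalDist, fmean

def z_test_mean(sample, mu, sigma):
    """Z-test of whether the mean of `sample` differs from a known mean mu.

    sigma is the std dev of individual values; samples are assumed
    independent, so the std dev of the mean is sigma / sqrt(N).
    Returns (Z, P(Z' <= Z), two-sided p).
    """
    n = len(sample)
    sigma_m = sigma / n ** 0.5
    z = (fmean(sample) - mu) / sigma_m
    p_lower = NormalDist(0.0, 1.0).cdf(z)  # like normcdf(Z,0,1)
    # NormalDist(mu, sigma_m).cdf(fmean(sample)) gives the same value,
    # like normcdf(<X>,mu,sigma_m)
    p_two = 2.0 * min(p_lower, 1.0 - p_lower)
    return z, p_lower, p_two

sample = [8.5, 12.5, 10.0, 11.0]  # hypothetical subsample, mean 10.5
z, p_lower, p_two = z_test_mean(sample, mu=10.0, sigma=1.0)
alpha = 0.05
print(z, p_lower, p_two, p_two < alpha)  # Z = 1.0; p_two about 0.32, not significant
```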
Analysis of Variance (ANOVA)
To test how well a dynamical or statistical model z fits the observations
d(t), we estimate the fraction of variance described by the model.
Two common types of models are
(1) known function: z = f(x, y)
(2) linear estimator (coefficients by regression): z = ax + by + c
The ratio of the mean squared residual (or error), <r²> = <(d − z)²>,
to the variance of the observations, σ_d², is the fraction of variance
not explained by the model.
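For the regression case, the sketch below fits z = ax + by + c by ordinary least squares (solving the 3×3 normal equations by Gaussian elimination) and returns <(d − z)²>/σ_d². The synthetic data and noise level are hypothetical; with noise variance 0.25 and var(d) ≈ 5.25, roughly 5% of the variance should remain unexplained.

```python
import random
import statistics

def solve3(m, v):
    """Solve the 3x3 system m @ s = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    s = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        s[r] = (a[r][3] - sum(a[r][k] * s[k] for k in range(r + 1, 3))) / a[r][r]
    return s

def fit_and_unexplained(d, x, y):
    """Least-squares fit z = a*x + b*y + c; return ((a, b, c), unexplained fraction)."""
    cols = [x, y, [1.0] * len(d)]
    m = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    v = [sum(p * q for p, q in zip(ci, d)) for ci in cols]
    a, b, c = solve3(m, v)
    resid = [di - (a * xi + b * yi + c) for di, xi, yi in zip(d, x, y)]
    frac = statistics.fmean(r * r for r in resid) / statistics.pvariance(d)
    return (a, b, c), frac

rng = random.Random(6)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [2.0 * xi - yi + 1.0 + rng.gauss(0.0, 0.5) for xi, yi in zip(x, y)]
(a, b, c), frac = fit_and_unexplained(d, x, y)
print(a, b, c, frac)
```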
Time Series Analysis
The analysis of time series differs from that of independent
objects (tossing dice, medical patient studies, etc.) in that the
measurements generally have serial correlation:
So a time series with N points does not have N independent
measurements.
The effective number of independent measurements (degrees
of freedom N*) depends on the degree of correlation of
successive measurements, the autocorrelation of the time
series:
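A simple way to see serial correlation is a first-order autoregressive ("red noise") series, x[t] = φ x[t−1] + e[t]. This AR(1) model is an illustrative assumption, not a model from the slides; its lag-1 autocorrelation is close to φ, so neighboring points are far from independent when φ is near 1.

```python
import random

def ar1_series(phi, n, seed=5):
    """Red-noise (AR(1)) series x[t] = phi*x[t-1] + e[t], e ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation with the mean removed."""
    n = len(x)
    m = sum(x) / n
    a = [v - m for v in x]
    c0 = sum(v * v for v in a) / n
    c1 = sum(a[t] * a[t + 1] for t in range(n - 1)) / n
    return c1 / c0

r = lag1_autocorrelation(ar1_series(0.8, 20000))
print(r)  # near 0.8
```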
Covariance and Correlation
For two time series x(t) and y(t), the covariance is defined as
where <~> is the expected value and Δt is a time lag
Correlation is the covariance normalized by the standard deviations
(values between -1 and 1)
Notes:
1) this terminology differs from that in Matlab, but is common
2) when applied to a single variable x, these are the autocovariance
and autocorrelation
3) these are time-lagged values, but we often use only the zero-lag value
4) we generally remove the mean values (as shown)
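The defining equation was lost in extraction; this sketch assumes the common form C_xy(Δt) = <(x(t) − <x>)(y(t + Δt) − <y>)>, with the lag counted in samples and a positive lag meaning y is taken later. It divides by the full length N (a biased estimator) so the correlation stays within [−1, 1]; both choices are assumptions, not necessarily the course's convention.

```python
import random

def lagged_covariance(x, y, lag):
    """C_xy(lag) = <(x(t) - <x>)(y(t + lag) - <y>)>, means removed.

    x and y must have equal length; sums run over overlapping samples
    but are divided by N, which keeps |correlation| <= 1.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    total = 0.0
    for t in range(max(0, -lag), min(n, n - lag)):
        total += (x[t] - mx) * (y[t + lag] - my)
    return total / n

def lagged_correlation(x, y, lag):
    """Covariance normalized by the standard deviations of x and y."""
    sx2 = lagged_covariance(x, x, 0)
    sy2 = lagged_covariance(y, y, 0)
    return lagged_covariance(x, y, lag) / (sx2 * sy2) ** 0.5

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [0.0] * 3 + x[:-3]  # y(t) = x(t - 3): y lags x by 3 samples
print(lagged_correlation(x, y, 3), lagged_correlation(x, y, 0))  # near 1, near 0
```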
Correlations
Some common types of correlations:
1) autocorrelation (to get a time scale for the data)
2) correlations between two variables
3) lagged correlations to determine if one variable leads or
lags another
4) vector correlations (as opposed to scalar correlations)
To evaluate a correlation, we need an objective measure of
significance
Autocorrelation & Periodic Signals
The autocorrelation of a variable with a periodic signal mostly shows the periodicity.
Remove harmonics before computing (auto)correlations for better interpretation and statistics.
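One way to remove harmonics is to subtract the mean and a least-squares fit of sines and cosines at a known period (e.g., 12 for an annual cycle in monthly data). This sketch assumes evenly spaced samples covering a whole number of periods, so the harmonics are orthogonal and each coefficient is a simple projection; with gaps or partial cycles a full regression would be needed instead.

```python
import math

def remove_harmonics(x, period, n_harmonics=2):
    """Remove the mean and the first n_harmonics of a known period.

    Assumes evenly spaced samples spanning a whole number of periods.
    """
    n = len(x)
    mean = sum(x) / n
    resid = [v - mean for v in x]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a = 2.0 / n * sum(r * math.cos(w * t) for t, r in enumerate(resid))
        b = 2.0 / n * sum(r * math.sin(w * t) for t, r in enumerate(resid))
        resid = [r - a * math.cos(w * t) - b * math.sin(w * t)
                 for t, r in enumerate(resid)]
    return resid

# Hypothetical monthly series: mean + annual + semiannual cycles, 10 years
x = [5.0 + 3.0 * math.cos(2 * math.pi * t / 12) + math.sin(2 * math.pi * 2 * t / 12)
     for t in range(120)]
resid = remove_harmonics(x, 12)
print(max(abs(r) for r in resid))  # essentially zero for this pure-harmonic input
```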
Characteristic Time Scale
Is there a characteristic time scale for each variable?
The first zero crossing? Or something more robust?
Integral Time Scale
A more robust method takes into account the shape of the function.
Integral time scale: integrate the correlation (to the first zero crossing) to get the equivalent time τ for perfect correlation.
Integral time scales: 1 month for Qnet, 4 months for SSH.
Integral time scales are shorter than the zero-crossing times.
[Figure: autocorrelations, with the integral time scale marked]
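The sketch below computes an autocorrelation, its first zero crossing, and the integral time scale. The trapezoidal rule and the lag-in-samples convention are assumptions; the slide does not specify a numerical method. For a triangular correlation that reaches zero at lag 10, the integral time scale is 5, shorter than the zero crossing as the slide notes.

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (mean removed, biased estimator)."""
    n = len(x)
    m = sum(x) / n
    a = [v - m for v in x]
    c0 = sum(v * v for v in a) / n
    return [sum(a[t] * a[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def first_zero_crossing(rho):
    """Index of the first lag where the autocorrelation is <= 0 (None if never)."""
    for k, r in enumerate(rho):
        if r <= 0.0:
            return k
    return None

def integral_time_scale(rho, dt=1.0):
    """Integrate rho from lag 0 to its first zero crossing (trapezoidal rule)."""
    k0 = first_zero_crossing(rho)
    end = k0 if k0 is not None else len(rho) - 1
    return dt * sum(0.5 * (rho[k] + rho[k + 1]) for k in range(end))

rho = [1.0 - k / 10 for k in range(15)]  # hypothetical triangular correlation
print(first_zero_crossing(rho), integral_time_scale(rho))  # 10 and 5.0
```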
Caution: Covariance from Observations
Autocovariance (or autocorrelation) from a single time
series is an overestimate of the actual function
because the error is correlated with itself.
It should be estimated from two different measurements
of the same quantity at the same location.
If the errors have shorter time scales than the variable,
then the error can be estimated from the autocovariance
at non-zero lags
Autocorrelation: estimate the correction for zero lag by extrapolating to zero lag.
[Figure panels: Qnet and SSH autocorrelations. The difference in correlation at zero lag comes from unresolved signal variance and actual errors (an upper bound on the error).]
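A minimal sketch of the zero-lag correction, assuming the errors are uncorrelated from sample to sample and the signal varies slowly. The error variance is estimated as C(0) minus the autocovariance extrapolated to lag 0; linear extrapolation from lags 1 and 2, C(0) ≈ 2C(1) − C(2), is an assumed choice. As the slide notes, any unresolved fast signal also ends up in this estimate, so it is an upper bound on the error. The synthetic sinusoid-plus-noise test series is hypothetical.

```python
import math
import random

def autocovariance(x, lag):
    """Sample autocovariance at a non-negative lag (mean removed, divided by N)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

def error_variance_by_extrapolation(x):
    """White-noise error variance: C(0) minus 2*C(1) - C(2), the lag-0 extrapolation."""
    c0, c1, c2 = (autocovariance(x, k) for k in range(3))
    return c0 - (2.0 * c1 - c2)

# Synthetic test: slow sinusoid (period 500 samples) plus noise of variance 0.25
rng = random.Random(8)
x = [math.sin(2 * math.pi * t / 500) + rng.gauss(0.0, 0.5) for t in range(20000)]
est = error_variance_by_extrapolation(x)
print(est)  # near 0.25
```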
Significance of a Correlation
(degrees of freedom)
The integral time scale τ is used to define the number of degrees of
freedom N* of a time series,
N* = N/τ
where N is the length of the series (in the same time units as τ).
N* is needed to determine the statistical significance of the correlation.
Z-test for the significance of the correlation r, based on a random parent
distribution ρ of possible correlations
Create a new variable w
The mean and standard deviation of w are
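The equations for w did not survive extraction. The standard choice for this test is Fisher's transformation; assuming that is what the slide uses, with N* in place of N, w is approximately normal with the mean and standard deviation below.

```latex
w = \frac{1}{2}\ln\frac{1+r}{1-r}, \qquad
\mu_w = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}, \qquad
\sigma_w = \frac{1}{\sqrt{N^* - 3}}
```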
Derivation of Significance Test (cont’d)
For the null hypothesis ρ = 0, so μ = 0. Normalize using the Z-transform
If Z is within the region containing a fraction (1 − α) of the distribution,
the correlation is NOT significant.
Alternatively, one can solve for the critical value of the correlation, r_c
See Bendat & Piersol (2000), pp. 101-111, for the derivation
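The test can be sketched as follows, assuming Fisher's transformation with σ_w = 1/√(N* − 3) and a two-sided region (the central 1 − α of the distribution). Under the null hypothesis Z = w√(N* − 3), and the critical correlation is r_c = tanh(z_{1−α/2}/√(N* − 3)). The record length of 120 months is hypothetical; τ = 4 months is the SSH value quoted above.

```python
import math
from statistics import NormalDist

def effective_dof(n, tau):
    """N* = N / tau, with N and tau in the same time units."""
    return n / tau

def correlation_z(r, n_eff):
    """Fisher-transform Z score of r under the null hypothesis rho = 0."""
    w = 0.5 * math.log((1.0 + r) / (1.0 - r))  # same as math.atanh(r)
    return w * math.sqrt(n_eff - 3.0)

def critical_correlation(n_eff, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided test)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n_eff - 3.0))

def is_significant(r, n_eff, alpha=0.05):
    """True if |Z| falls outside the central (1 - alpha) region."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(correlation_z(r, n_eff)) > z_crit

n_eff = effective_dof(120, 4)  # hypothetical 120-month record, tau = 4 months
print(n_eff, critical_correlation(n_eff))  # 30.0 and about 0.36
```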
Exercise 4: Lagged correlations
[Figure: SSH at two locations, showing the lag]
[Figure: SSH longitude-time plot]
Can you estimate the speed of the Rossby wave from the SSH?
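One way to approach the exercise: find the lag that maximizes the correlation between SSH at an eastern and a western location, then divide the separation by that lag. This sketch assumes westward propagation (the western site sees the signal later), weekly sampling, and a 500 km separation; the series are synthetic, not the course data.

```python
import math
import random

def lagged_corr(x, y, lag):
    """Pearson correlation of x(t) with y(t + lag) over the overlapping samples."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    n = len(pairs)
    mx = sum(p for p, _ in pairs) / n
    my = sum(q for _, q in pairs) / n
    sxy = sum((p - mx) * (q - my) for p, q in pairs)
    sxx = sum((p - mx) ** 2 for p, _ in pairs)
    syy = sum((q - my) ** 2 for _, q in pairs)
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Lag (in samples) in [-max_lag, max_lag] with the largest correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_corr(x, y, k))

def propagation_speed(distance_km, lag_samples, dt_days):
    """Speed in m/s of a signal taking lag_samples * dt_days to travel distance_km."""
    return distance_km * 1000.0 / (lag_samples * dt_days * 86400.0)

rng = random.Random(9)
ssh_east = [rng.gauss(0.0, 1.0) for _ in range(400)]
ssh_west = [0.0] * 6 + ssh_east[:-6]  # synthetic: west sees the signal 6 weeks later
lag = best_lag(ssh_east, ssh_west, 20)
speed = propagation_speed(500.0, lag, 7.0)
print(lag, speed)  # 6 samples, about 0.14 m/s westward
```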
Exercise 4: Vectors
Mean wind vectors
• KEO mooring
• ECMWF
• QuikSCAT
• NCEP2
Note: vector correlations do not include the means
Vector Correlations
The complex correlation gives persistent direction errors and magnitude errors
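A sketch of one common definition of the complex vector correlation, assuming vectors written as w = u + iv and means removed (per the note above): ρ = Σ conj(w₁′) w₂′ / sqrt(Σ|w₁′|² Σ|w₂′|²). Its phase is the average counterclockwise rotation of series 2 relative to series 1, i.e. a persistent direction offset. The test data are synthetic: series 2 is series 1 scaled by 2 and rotated by 30°, which gives |ρ| = 1 and a 30° angle.

```python
import cmath
import math
import random

def complex_correlation(u1, v1, u2, v2):
    """Complex correlation of two vector series, means removed.

    Returns (magnitude, angle in degrees); the angle is the average
    counterclockwise rotation of series 2 relative to series 1.
    """
    w1 = [complex(a, b) for a, b in zip(u1, v1)]
    w2 = [complex(a, b) for a, b in zip(u2, v2)]
    m1 = sum(w1) / len(w1)
    m2 = sum(w2) / len(w2)
    a1 = [w - m1 for w in w1]
    a2 = [w - m2 for w in w2]
    num = sum(p.conjugate() * q for p, q in zip(a1, a2))
    den = math.sqrt(sum(abs(p) ** 2 for p in a1) * sum(abs(q) ** 2 for q in a2))
    rho = num / den
    return abs(rho), math.degrees(cmath.phase(rho))

rng = random.Random(10)
u1 = [rng.gauss(5.0, 2.0) for _ in range(500)]
v1 = [rng.gauss(1.0, 2.0) for _ in range(500)]
rot = cmath.exp(1j * math.radians(30.0))
w2 = [2.0 * complex(a, b) * rot for a, b in zip(u1, v1)]
mag, ang = complex_correlation(u1, v1, [w.real for w in w2], [w.imag for w in w2])
print(mag, ang)  # magnitude 1 and angle 30 degrees, up to rounding
```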