Lecture 17

Environmental Data Analysis with MatLab
Lecture 17:
Covariance and Autocorrelation
SYLLABUS
Lecture 01  Using MatLab
Lecture 02  Looking At Data
Lecture 03  Probability and Measurement Error
Lecture 04  Multivariate Distributions
Lecture 05  Linear Models
Lecture 06  The Principle of Least Squares
Lecture 07  Prior Information
Lecture 08  Solving Generalized Least Squares Problems
Lecture 09  Fourier Series
Lecture 10  Complex Fourier Series
Lecture 11  Lessons Learned from the Fourier Transform
Lecture 12  Power Spectral Density
Lecture 13  Filter Theory
Lecture 14  Applications of Filters
Lecture 15  Factor Analysis
Lecture 16  Orthogonal functions
Lecture 17  Covariance and Autocorrelation
Lecture 18  Cross-correlation
Lecture 19  Smoothing, Correlation and Spectra
Lecture 20  Coherence; Tapering and Spectral Analysis
Lecture 21  Interpolation
Lecture 22  Hypothesis testing
Lecture 23  Hypothesis Testing continued; F-Tests
Lecture 24  Confidence Limits of Spectra, Bootstraps
purpose of the lecture
apply the idea of covariance
to time series
Part 1
correlations between random
variables
Atlantic Rock Dataset
Scatter plot of TiO2 and Na2O, R=0.730804
[Figure: scatter plot; horizontal axis TiO2 (0 to 6), vertical axis Na2O (0 to 5)]
idealization as a p.d.f.
[Figure: the scatter plot of TiO2 and Na2O (R=0.730804) next to its idealization as a p.d.f. with axes d1 and d2]
types of correlations
[Figure: three p.d.f.s in the (d1, d2) plane: positive correlation, negative correlation, uncorrelated]
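the covariance matrix
$$
C = \begin{bmatrix}
\sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & \cdots \\
\sigma_{1,2} & \sigma_2^2 & \sigma_{2,3} & \cdots \\
\sigma_{1,3} & \sigma_{2,3} & \sigma_3^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$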
recall the covariance matrix C
its off-diagonal element $\sigma_{1,2}$ is the covariance of d1 and d2
estimating covariance from data
divide the (di, dj) plane into bins
[Figure: (di, dj) plane divided into bins of size Δdi by Δdj; bin s contains Ns data]
estimate total probability in bin s from the data:
Ps = p(di, dj) Δdi Δdj ≈ Ns/N
approximate the integral with a sum
and use p(di, dj) Δdi Δdj ≈ Ns/N
shrink the bins so there is no more than one
data point in each bin
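The steps above can be written out as follows. This is a sketch, assuming the usual definition of covariance as an integral over the p.d.f.; the bin-center values $d_i^{(s)}$, the means $\bar d_i$, and the symbol $C_{ij}$ for the sample covariance are notation introduced here for illustration.

```latex
\begin{align*}
\sigma_{i,j}
  &= \iint (d_i - \bar d_i)(d_j - \bar d_j)\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j \\
  &\approx \sum_s \bigl(d_i^{(s)} - \bar d_i\bigr)\bigl(d_j^{(s)} - \bar d_j\bigr)\,
           p\bigl(d_i^{(s)}, d_j^{(s)}\bigr)\, \Delta d_i\, \Delta d_j \\
  &\approx \frac{1}{N} \sum_s \bigl(d_i^{(s)} - \bar d_i\bigr)\bigl(d_j^{(s)} - \bar d_j\bigr)\, N_s
\end{align*}
% with at most one data point per bin, N_s is 0 or 1 and the sum runs over the data:
\[
C_{ij} = \frac{1}{N} \sum_{k=1}^{N} \bigl(d_i^{(k)} - \bar d_i\bigr)\bigl(d_j^{(k)} - \bar d_j\bigr)
\]
```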
“sample” covariance
normalize to range ±1
“sample” correlation coefficient
Rij = -1: perfect negative correlation
Rij = 0: no correlation
Rij = 1: perfect positive correlation
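As a minimal sketch of the sample covariance and the correlation coefficient, the following uses plain Python rather than MatLab (a choice made for this example only). The function names and the five-point data set are hypothetical, and the normalization is assumed to be the covariance divided by the square root of the product of the two variances.

```python
import math

def sample_covariance(x, y):
    """Mean of the products of deviations from the means (1/N normalization)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def correlation_coefficient(x, y):
    """Covariance normalized by both standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# made-up, nearly linear data (not the Atlantic Rock Dataset)
d1 = [1.0, 2.0, 3.0, 4.0, 5.0]
d2 = [2.1, 3.9, 6.2, 8.0, 9.8]

r_pos = correlation_coefficient(d1, d2)                 # close to +1
r_neg = correlation_coefficient(d1, [-v for v in d2])   # close to -1
r_self = correlation_coefficient(d1, d1)                # 1, up to rounding
```

Flipping the sign of one variable flips the sign of R but not its magnitude, which is why a plot of |Rij| shows the strength of a correlation without its direction.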
|Rij| of the Atlantic Rock Dataset
[Figure: matrix of |Rij| for the oxides SiO2, TiO2, Al2O3, FeO, MgO, CaO, K2O, Na2O]
|Rij| of the Atlantic Rock Dataset
the highlighted element, 0.73, matches the scatter plot of TiO2 and Na2O (R=0.730804)
Part 2
correlations between samples
within a time series
Neuse River Hydrograph
[Figure: A) time series d(t), discharge in cfs (scale x 10^4) vs. time t in days, 0 to 4000; power spectral density in (cfs)^2 per cycle/day (scale x 10^9) vs. frequency in cycles per day, 0 to 0.05]
high degree of short-term correlation
whatever the river was doing yesterday, it's probably
doing today, too
because water takes time to drain away
low degree of intermediate-term correlation
whatever the river was doing last month, today it could
be doing something completely different
because storms are so unpredictable
moderate degree of long-term correlation
whatever the river was doing this time last year, it's
probably doing today, too
because seasons repeat
[Figure: three scatter plots of lagged discharge against discharge, for lags of 1 day, 3 days, and 30 days; all axes 0 to 2.5 x 10^4]
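The effect of lag on a scatter plot can be summarized by the correlation coefficient between a series and a lagged copy of itself. The sketch below uses a synthetic series, not the Neuse River data: each sample is 0.9 times the previous one plus random noise, an assumption chosen only to mimic short-term persistence. All names are hypothetical.

```python
import math
import random

def correlation_coefficient(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    cxx = sum((a - mean_x) ** 2 for a in x)
    cyy = sum((b - mean_y) ** 2 for b in y)
    return cxy / math.sqrt(cxx * cyy)

def lagged_correlation(d, lag):
    """Correlate d(t) with d(t + lag), using only the overlapping samples."""
    return correlation_coefficient(d[:-lag], d[lag:])

# synthetic persistent series (assumed model): d[i] = 0.9 * d[i-1] + noise
rng = random.Random(0)
d = [0.0]
for _ in range(3999):
    d.append(0.9 * d[-1] + rng.gauss(0.0, 1.0))

r = {lag: lagged_correlation(d, lag) for lag in (1, 3, 30)}
```

For a series like this the correlation is strongest at a lag of 1 sample, weaker at 3, and close to zero at 30, which is the same qualitative behavior as a scatter plot that widens from a tight line into a diffuse cloud as the lag grows.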
Let’s assume different samples in a time series are random variables
and calculate their covariance
usual formula for the covariance
assuming the time series has zero mean
now assume that the time series is stationary
(statistical properties don’t vary with time)
so that the covariance depends only on the
time lag between samples
for a time series of length N and a time lag of (k-1)Δt,
use the same approximation for the sample covariance as before
this defines the autocorrelation
at lag (k-1)Δt
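The steps above can be sketched in formulas. The notation is an assumption of this sketch: $d_i$ is the sample at time $t_i$, $a_k$ is the autocorrelation at lag $(k-1)\Delta t$ with $k \ge 1$, and the sample covariance is normalized by the number of overlapping pairs.

```latex
% covariance of two samples of a zero-mean time series
\[
\mathrm{cov}(d_i, d_j) = \iint d_i\, d_j\; p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j
\]
% stationarity: the covariance depends only on the lag (k-1)\Delta t,
% so every pair (d_i, d_{i+k-1}) is a sample of the same p.d.f.
\[
\mathrm{cov}(d_i, d_{i+k-1}) \approx \frac{1}{N-k+1} \sum_{i=1}^{N-k+1} d_i\, d_{i+k-1}
\]
% autocorrelation at lag (k-1)\Delta t: the sum without the normalization
\[
a_k = \sum_{i=1}^{N-k+1} d_i\, d_{i+k-1}
\]
```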
autocorrelation in MatLab
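A minimal sketch of the computation, written in plain Python rather than MatLab (a choice made for this example only; in MatLab itself, xcorr from the Signal Processing Toolbox returns the same unnormalized sums by default). The function name and the four-point series are hypothetical.

```python
def autocorrelation(d):
    """Unnormalized autocorrelation of d at all lags -(N-1) .. (N-1)."""
    n = len(d)
    lags = list(range(-(n - 1), n))
    a = [
        sum(d[i] * d[i + abs(lag)] for i in range(n - abs(lag)))
        for lag in lags
    ]
    return lags, a

# short zero-mean series, chosen so the sums are easy to check by hand
d = [1.0, -2.0, 3.0, -2.0]
lags, a = autocorrelation(d)
# lags == [-3, -2, -1, 0, 1, 2, 3]
# a    == [-2.0, 7.0, -14.0, 18.0, -14.0, 7.0, -2.0]
```

The result is symmetric about zero lag, and its largest value, the sum of squares of the series, sits at zero lag.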
Autocorrelation on Neuse River Hydrograph
[Figure: autocorrelation (scale x 10^6) vs. lag in days; one panel for lags -30 to 30, one panel for lags -3000 to 3000]
Autocorrelation on Neuse River Hydrograph: symmetric about zero
a point in a time series correlates
equally well with another in the
future and another in the past
Autocorrelation on Neuse River Hydrograph: peak at zero lag
a point in a time series is perfectly
correlated with itself
Autocorrelation on Neuse River Hydrograph: falls off rapidly in the first few days
lags of a few days are highly correlated
because the river drains the land over
the course of a few days
Autocorrelation on Neuse River Hydrograph: negative correlation at lag of 182 days
points separated by a half year are
negatively correlated
Autocorrelation on Neuse River Hydrograph: positive correlation at lag of 360 days
points separated by a year are
positively correlated
Autocorrelation on Neuse River Hydrograph: repeating pattern
the pattern of rainfall
approximately repeats annually
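The sign pattern of a periodic series (negative at a half period, positive at a full period) can be reproduced with a pure cosine. This is a sketch on synthetic data, not the hydrograph; the period of 20 samples and the function name are assumptions made for illustration.

```python
import math

def autocorrelation_at(d, lag):
    """Unnormalized autocorrelation of d at a single lag (in samples)."""
    lag = abs(lag)
    return sum(d[i] * d[i + lag] for i in range(len(d) - lag))

# ten full cycles of a cosine with a period of 20 samples (mean is zero)
period = 20
d = [math.cos(2.0 * math.pi * i / period) for i in range(200)]

a_zero = autocorrelation_at(d, 0)            # largest value: sum of squares
a_half = autocorrelation_at(d, period // 2)  # negative: peaks line up with troughs
a_full = autocorrelation_at(d, period)       # positive: peaks line up with peaks
```

The value at a full period is slightly smaller than at zero lag only because fewer samples overlap; the same half-period and full-period alternation continues at larger lags.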
autocorrelation is similar to convolution,
but note the difference in sign
Important Relation #1
autocorrelation is the convolution of a
time series with its time-reversed self
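This relation can be checked numerically on a short series. The sketch below is plain Python with hypothetical function names: it convolves a series with its time-reversed copy and compares the result with the autocorrelation computed directly from its defining sum.

```python
def autocorrelation(d):
    """Unnormalized autocorrelation at the non-negative lags 0 .. N-1."""
    n = len(d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) for k in range(n)]

def convolve(g, h):
    """Full discrete convolution of two sequences (length len(g)+len(h)-1)."""
    out = [0.0] * (len(g) + len(h) - 1)
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            out[i + j] += gi * hj
    return out

d = [1.0, -2.0, 3.0, -2.0]
n = len(d)

c = convolve(d[::-1], d)   # 2N-1 values, covering lags -(N-1) .. (N-1)
a = autocorrelation(d)     # lags 0 .. N-1

# the second half of the convolution holds the non-negative lags
matches = c[n - 1:] == a
```

Element N-1 of the convolution (counting from zero) corresponds to zero lag, so the convolution output is the autocorrelation with its negative lags listed first.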
integral form of
autocorrelation
change of
variables, t’=-t
write as
convolution
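The three steps can be sketched as follows. The lag symbol $\tau$ and the operator $*$ for convolution, $(f * g)(\tau) = \int f(t')\, g(\tau - t')\, \mathrm{d}t'$, are notation assumed for this sketch.

```latex
\begin{align*}
a(\tau) &= \int_{-\infty}^{+\infty} d(t)\, d(t + \tau)\, \mathrm{d}t
          && \text{integral form of autocorrelation} \\
        &= \int_{-\infty}^{+\infty} d(-t')\, d(\tau - t')\, \mathrm{d}t'
          && \text{change of variables, } t' = -t \\
        &= d(-\tau) * d(\tau)
          && \text{write as convolution}
\end{align*}
```

The only difference from an ordinary convolution of $d$ with itself is the sign of the argument of the first factor, which is the time reversal.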
Important Relationship #2
Fourier Transform of an autocorrelation
is proportional to the
Power Spectral Density of time series
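A numerical check of this relationship on a short series, as a sketch in plain Python. The naive discrete Fourier transform, the zero padding (used so that lagged copies do not wrap around onto the data), and all names are assumptions of this example.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, adequate for a handful of samples."""
    size = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * q / size) for j in range(size))
        for q in range(size)
    ]

d = [1.0, -2.0, 3.0, -2.0]
padded = d + [0.0] * len(d)   # zero-pad to twice the length
size = len(padded)

# circular autocorrelation of the padded series; negative lags wrap to the end
a = [
    sum(padded[i] * padded[(i + k) % size] for i in range(size))
    for k in range(size)
]

transform_of_a = dft(a)
power = [abs(v) ** 2 for v in dft(padded)]   # squared amplitude spectrum

max_err = max(abs(transform_of_a[q] - power[q]) for q in range(size))
```

The two agree to rounding error, and the transform of the autocorrelation has a negligible imaginary part because the autocorrelation is symmetric about zero lag.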
since autocorrelation is the convolution of a
time series with its time-reversed self
and since the Fourier Transform of a
convolution is the product of the
transforms
the Fourier Transform of the autocorrelation
is the product of the transforms of the time series
and of its time-reversed self
the transform of the time-reversed time series follows from
the Fourier Transform integral
a change of variables, t’=-t
the symmetry properties
of the Fourier Transform
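The argument can be sketched in formulas, assuming a real time series $d(t)$, the transform convention $\tilde d(\omega) = \int d(t)\, e^{-i\omega t}\, \mathrm{d}t$, and the lag symbol $\tau$ (all notation introduced for this sketch).

```latex
% autocorrelation as a convolution, then the convolution theorem
\[
a(\tau) = d(-\tau) * d(\tau)
\quad\Longrightarrow\quad
\tilde a(\omega) = \mathcal{F}\{d(-t)\}\; \tilde d(\omega)
\]
% transform of the time-reversed series: transform integral, then t' = -t
\[
\mathcal{F}\{d(-t)\}
  = \int_{-\infty}^{+\infty} d(-t)\, e^{-i\omega t}\, \mathrm{d}t
  = \int_{-\infty}^{+\infty} d(t')\, e^{+i\omega t'}\, \mathrm{d}t'
  = \tilde d(-\omega)
\]
% symmetry of the transform of a real function
\[
\tilde d(-\omega) = \tilde d^{*}(\omega)
\quad\Longrightarrow\quad
\tilde a(\omega) = \tilde d^{*}(\omega)\, \tilde d(\omega) = \lvert \tilde d(\omega) \rvert^{2}
\]
```

Since the power spectral density is proportional to $\lvert \tilde d(\omega) \rvert^{2}$, the transform of the autocorrelation is proportional to it as well.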
Summary
rapidly fluctuating time series: narrow autocorrelation function, wide spectrum
slowly fluctuating time series: wide autocorrelation function, narrow spectrum
[Figure: for each case, the time series vs. time, its autocorrelation function vs. lag (centered on 0), and its spectrum vs. frequency (starting at 0)]
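The autocorrelation half of this summary can be illustrated with synthetic data. In the sketch below, the rapidly fluctuating series is random noise and the slowly fluctuating one is a 20-point running mean of it. Both series, the half-height measure of width, and all names are assumptions made for illustration, and the spectra are not computed.

```python
import random

def autocorrelation(d, max_lag):
    """Unnormalized autocorrelation at lags 0 .. max_lag (in samples)."""
    n = len(d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def half_width(a):
    """First lag at which the autocorrelation drops below half its zero-lag value."""
    for k, value in enumerate(a):
        if value < 0.5 * a[0]:
            return k
    return len(a)

rng = random.Random(0)
rough = [rng.gauss(0.0, 1.0) for _ in range(500)]   # rapidly fluctuating

window = 20                                          # running-mean length
smooth = [                                           # slowly fluctuating
    sum(rough[i:i + window]) / window
    for i in range(len(rough) - window + 1)
]

w_rough = half_width(autocorrelation(rough, 60))
w_smooth = half_width(autocorrelation(smooth, 60))
```

The noise decorrelates within a sample or two, while the smoothed series stays correlated over a span comparable to the length of the running mean, so its autocorrelation function is the wider of the two.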