Environmental Data Analysis with MatLab, 2nd Edition
Lecture 17: Covariance and Autocorrelation

SYLLABUS
Lecture 01  Using MatLab
Lecture 02  Looking At Data
Lecture 03  Probability and Measurement Error
Lecture 04  Multivariate Distributions
Lecture 05  Linear Models
Lecture 06  The Principle of Least Squares
Lecture 07  Prior Information
Lecture 08  Solving Generalized Least Squares Problems
Lecture 09  Fourier Series
Lecture 10  Complex Fourier Series
Lecture 11  Lessons Learned from the Fourier Transform
Lecture 12  Power Spectra
Lecture 13  Filter Theory
Lecture 14  Applications of Filters
Lecture 15  Factor Analysis
Lecture 16  Orthogonal functions
Lecture 17  Covariance and Autocorrelation
Lecture 18  Cross-correlation
Lecture 19  Smoothing, Correlation and Spectra
Lecture 20  Coherence; Tapering and Spectral Analysis
Lecture 21  Interpolation
Lecture 22  Linear Approximations and Non Linear Least Squares
Lecture 23  Adaptable Approximations with Neural Networks
Lecture 24  Hypothesis testing
Lecture 25  Hypothesis Testing continued; F-Tests
Lecture 26  Confidence Limits of Spectra, Bootstraps

Goals of the lecture
apply the idea of covariance to time series

Part 1
correlations between random variables

Atlantic Rock Dataset
Scatter plot of TiO2 and Na2O, R = 0.730804
[Figure: scatter plot of Na2O (vertical axis, 0 to 5) against TiO2 (horizontal axis, 0 to 6)]

idealization as a p.d.f.
[Figure: the TiO2 and Na2O scatter plot (R = 0.730804) beside its idealization as a p.d.f. with axes d1 and d2]

types of correlations
[Figure: three p.d.f.s in the (d1, d2) plane: positive correlation, negative correlation, uncorrelated]

recall the covariance matrix

$$ C = \begin{bmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & \cdots \\ \sigma_{1,2} & \sigma_2^2 & \sigma_{2,3} & \cdots \\ \sigma_{1,3} & \sigma_{2,3} & \sigma_3^2 & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{bmatrix} $$

the off-diagonal element $\sigma_{1,2}$ is the covariance of d1 and d2

estimating covariance from data
divide the (di, dj) plane into bins
[Figure: (di, dj) plane with bin s of width Δdi and height Δdj, containing Ns data]

estimate total probability in bin from the data:

$$ P_s = p(d_i, d_j)\, \Delta d_i\, \Delta d_j \approx \frac{N_s}{N} $$

approximate integral with sum and use $p(d_i, d_j)\, \Delta d_i\, \Delta d_j \approx N_s/N$

$$ \sigma_{i,j} = \iint (d_i - \bar{d}_i)(d_j - \bar{d}_j)\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j \approx \sum_s \left(d_i^{(s)} - \bar{d}_i\right)\left(d_j^{(s)} - \bar{d}_j\right) \frac{N_s}{N} $$

shrink bins so no more than one data point in each bin

“sample” covariance

$$ \sigma_{i,j} \approx \frac{1}{N} \sum_{k=1}^{N} \left(d_i^{(k)} - \bar{d}_i\right)\left(d_j^{(k)} - \bar{d}_j\right) $$

normalize to range ±1

“sample” correlation coefficient

$$ R_{ij} = \frac{\sigma_{i,j}}{\sqrt{\sigma_i^2\, \sigma_j^2}} $$

Rij
-1  perfect negative correlation
 0  no correlation
 1  perfect positive correlation

|Rij| of the Atlantic Rock Dataset
[Figure: matrix of |Rij| for SiO2, TiO2, Al2O3, FeO, MgO, CaO, Na2O, K2O; the TiO2 and Na2O element, 0.73, is highlighted and corresponds to the scatter plot shown earlier]

Part 2
correlations between samples within a time series

Neuse River Hydrograph
[Figure: A) time series d(t), discharge in cfs (×10^4), against time t from 0 to 4000 days; lower panel: power spectral density in (cfs)^2 per cycle/day (×10^9) against frequency from 0 to 0.05 cycles per day]

high degree of short-term correlation
whatever the river was doing yesterday, it's probably doing today, too
because water takes time to drain away

low degree of
intermediate-term correlation
whatever the river was doing last month, today it could be doing something completely different
because storms are so unpredictable

moderate degree of long-term correlation
whatever the river was doing this time last year, it's probably doing today, too
because seasons repeat

[Figure: three scatter plots of lagged discharge against discharge, for lags of 1 day, 3 days, and 30 days; all axes run from 0 to 2.5 ×10^4 cfs]

Let’s assume different samples in the time series are random variables and calculate their covariance

usual formula for the covariance, assuming the time series has zero mean

$$ \mathrm{cov}(d_i, d_j) = \iint d_i\, d_j\, p(d_i, d_j)\, \mathrm{d}d_i\, \mathrm{d}d_j $$

now assume that the time series is stationary (statistical properties don’t vary with time), so that the covariance depends only on the time lag between samples

time series of length N
time lag of (k-1)Δt

using the same approximation for the sample covariance as before

$$ \mathrm{cov}(d_i, d_{i+k-1}) \approx \frac{1}{N} \sum_{i=1}^{N-k+1} d_i\, d_{i+k-1} = \frac{1}{N}\, a_k $$

autocorrelation at lag (k-1)Δt

$$ a_k = \sum_{i=1}^{N-k+1} d_i\, d_{i+k-1} $$

autocorrelation in MatLab

Autocorrelation on Neuse River Hydrograph
[Figure: A) autocorrelation (×10^6) against lag from -30 to 30 days; B) autocorrelation (×10^6) against lag from -3000 to 3000 days]

Autocorrelation on Neuse River Hydrograph: symmetric about zero
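The autocorrelation at lag (k-1)Δt is the sum over i of d(i) times d(i+k-1). A minimal MatLab sketch of that sum follows, assuming a short synthetic series rather than the Neuse River data; the variable names `a`, `afull`, and `lags` are illustrative choices, not taken from the lecture.

```matlab
% Minimal sketch: autocorrelation a(k) at lag (k-1)*Dt by direct summation.
% The series d is synthetic (an assumption), not the Neuse River data.
N = 8;
d = [1; 3; 2; -1; -3; -2; 1; 2];
d = d - mean(d);              % remove the mean, as the derivation assumes

a = zeros(N,1);               % a(k) holds the autocorrelation at lag (k-1)*Dt
for k = 1:N
    % sum of products of samples separated by k-1 time steps
    a(k) = sum( d(1:N-k+1) .* d(k:N) );
end

% two-sided version: values at negative lags mirror those at positive lags
afull = [flipud(a(2:N)); a];
lags  = (-(N-1):(N-1))';      % lag in units of Dt

disp([lags, afull]);          % zero lag sits in the middle row
```

The zero-lag value `a(1)` is the sum of squares of the series, so no other lag can exceed it. If the Signal Processing Toolbox is available, `xcorr(d)` should return the same two-sided sequence as `afull`.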
a point in a time series correlates equally well with another in the future and another in the past

Autocorrelation on Neuse River Hydrograph: peak at zero lag
a point in a time series is perfectly correlated with itself

Autocorrelation on Neuse River Hydrograph: falls off rapidly in the first few days
lags of a few days are highly correlated because the river drains the land over the course of a few days

Autocorrelation on Neuse River Hydrograph: negative correlation at lag of 182 days
points separated by a half year are negatively correlated

Autocorrelation on Neuse River Hydrograph: positive correlation at lag of 360 days
points separated by a year are positively correlated

Autocorrelation on Neuse River Hydrograph: repeating pattern
[Figure: A) autocorrelation (×10^6) against lag from -30 to 30 days; B) autocorrelation (×10^6) against lag from -3000 to 3000 days]
the pattern of rainfall approximately repeats annually

autocorrelation similar to convolution

$$ \text{autocorrelation:}\quad a_k = \sum_i d_i\, d_{k+i-1} \qquad\qquad \text{convolution:}\quad \{d * d\}_k = \sum_i d_i\, d_{k-i+1} $$

note difference in sign of the index i

Important Relation #1
autocorrelation is the convolution of a time series with its time-reversed self

integral form of autocorrelation

$$ a(\tau) = \int_{-\infty}^{+\infty} d(t)\, d(t+\tau)\, \mathrm{d}t $$

change of variables, t’ = -t

$$ a(\tau) = \int_{-\infty}^{+\infty} d(-t')\, d(\tau - t')\, \mathrm{d}t' $$

write as convolution

$$ a(\tau) = d(-\tau) * d(\tau) $$

Important Relationship #2
Fourier Transform of an autocorrelation is
proportional to the Power Spectral Density of the time series

since the autocorrelation is a convolution

$$ a(\tau) = d(-\tau) * d(\tau) $$

and the Fourier Transform of a convolution is the product of the transforms

$$ \tilde{a}(\omega) = \mathcal{F}\{d(-t)\}\; \tilde{d}(\omega) $$

Fourier Transform integral

$$ \mathcal{F}\{d(-t)\} = \int_{-\infty}^{+\infty} d(-t)\, e^{-i\omega t}\, \mathrm{d}t $$

transform of variables, t’ = -t

$$ \mathcal{F}\{d(-t)\} = \int_{-\infty}^{+\infty} d(t')\, e^{+i\omega t'}\, \mathrm{d}t' = \tilde{d}(-\omega) $$

symmetry properties of the Fourier Transform: for a real time series, $\tilde{d}(-\omega) = \tilde{d}^*(\omega)$

so

$$ \tilde{a}(\omega) = \tilde{d}^*(\omega)\, \tilde{d}(\omega) = |\tilde{d}(\omega)|^2 $$

Summary
rapidly fluctuating time series: narrow autocorrelation function, wide spectrum
slowly fluctuating time series: wide autocorrelation function, narrow spectrum
[Figure: for each case, the time series against time, the autocorrelation against lag (centered on 0), and the spectrum against frequency (starting at 0)]
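Both important relations can be checked numerically. The MatLab sketch below assumes a short synthetic series and uses only the built-in `conv` and `fft`; the names `ashift`, `A`, `P`, and `maxdiff` are illustrative. It is a sanity check on a discrete, finite series, not a substitute for the continuous derivation.

```matlab
% Sketch: check numerically that
%  (1) the autocorrelation is d convolved with its time-reversed self, and
%  (2) the Fourier transform of the autocorrelation equals |FT of d|^2.
% The series d is synthetic (an assumption), not data from the lecture.
N = 16;
t = (0:N-1)';
d = sin(2*pi*t/8) + 0.5*cos(2*pi*t/4);
d = d - mean(d);

% Relation #1: two-sided autocorrelation at lags -(N-1)..(N-1)
a = conv(d, flipud(d));          % length 2N-1, zero lag at index N

% Relation #2: move zero lag to the first element before transforming
M = 2*N-1;
ashift = [a(N:M); a(1:N-1)];     % circular shift so that lag 0 comes first
A = fft(ashift);                 % transform of the autocorrelation
P = abs(fft(d, M)).^2;           % squared amplitude spectrum of zero-padded d

maxdiff = max(abs(A - P));       % expected to be at round-off level
fprintf('max |FT(a) - |FT(d)|^2| = %g\n', maxdiff);
```

Zero-padding `d` to length 2N-1 inside `fft` keeps the circular correlation implied by the discrete transform from wrapping around, so it matches the linear autocorrelation produced by `conv`.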