Statistics for Water Engineering

1. Initial definitions: hydrological variables, processes and data
2. Descriptive statistics
   2.1. Presentation of data
   2.2. Statistical descriptors of data
3. Probability theory
   3.1. Elementary probability theory
   3.2. Probability distributions
   3.3. Estimation of parameters
   3.4. Testing statistical hypotheses
4. Extreme value analysis
5. Regression and correlation
6. Hydrological time series analysis
7. Introduction to stochastic and probabilistic modelling and risk analysis

Initial definitions

Statistics = the study of methods for:
- organising and summarising the data = DESCRIPTIVE STATISTICS
- inferring conclusions about the population on the basis of a data sample = INFERENTIAL STATISTICS (e.g. probability distributions, testing statistical hypotheses)

Study of probability = study of randomness and uncertainty

[Figure: a sample x1, x2, …, xn drawn from the population X]

Variables, processes and data

• Variable X: some physical entity whose value can vary (e.g.
in time), e.g. a discharge time series Q
  ⇒ specific values: x, q (measurements, observations, outcomes, realizations, …)
  ⇒ continuous or discrete
  ⇒ univariate or multivariate: vector of variables X = [X1, X2, …], e.g. [Q, H]
  ⇒ random or non-random; random: the results vary when the measurement is repeated under identical circumstances

• Series
  ⇒ continuous or discrete; equidistant or not
     special case: extreme value series:
     - annual series
     - partial duration series / POT series
  ⇒ stationary or non-stationary; stationary: the statistical properties do not change in time. Features to check:
     - trend
     - jump
     - persistence: dependency in time
  ⇒ deterministic or stochastic/random

• Processes
  process = mathematical description of the behaviour of a phenomenon (in time; in space; continuous; discrete; deterministic; stochastic)
  ⇒ stationary or non-stationary
  ⇒ ergodic or non-ergodic [Figure: example realizations of an ergodic process and of a non-ergodic process]
     ergodic: each realization of the process is a complete and
independent representation of all possible realizations of the process (all statistical properties of the process can be obtained from a single realization)
  ⇒ population: ensemble of processes

• Data: all observations in the sample, together with all other relevant information
• Errors in data ⇒ random or systematic

Descriptive statistics
Presentation of data: graphical presentations (of a sample)

• Histogram / frequency distribution / frequency density distribution
  Histogram ← number of data points in each interval
  Frequency distribution ← histogram counts ÷ total number of sample points
  Frequency density distribution ← frequencies ÷ interval length
  = EMPIRICAL DISTRIBUTION ⇒ compare with a theoretical distribution
  ⇒ look for:
     - symmetry
     - outliers

• Cumulative frequency distribution: cumulative frequency plotted against x, at the interval centres or at all sample points x1, …, xn

• Time series of a variable X, possibly after transformation (e.g.
ln(X), or a Box-Cox transformation BC(X) with parameter λ)
  [Figure: measured discharge time series and filtered baseflow; discharge [m3/s] against time [number of hours]]
  notation: x_t or x(t) for a continuous time series; x_i or x(i) for a discrete time series
  ⇒ time series in aggregated form: values aggregated over time intervals, overlapping or non-overlapping (disjoint), e.g. moving average
  ⇒ time series in ranked form: x plotted against rank number i = 1, 2, …, n (rank 1 = highest value)
     100 × rank number / sample size n = percentage of time the value of x is exceeded
  ⇒ ranked form using only independent values, e.g. only extremes: annual maxima or POT values

• Empirical quantile plot: empirical quantiles x_(i) against empirical cumulative probabilities p_i
  p = Pr[X ≤ x] and 1 − p = Pr[X ≥ x]; x_(i) is the empirical p_i × 100 % quantile and x_(i),th the corresponding theoretical quantile
  Hazen plotting position: $1 - p_i = \Pr[X \ge x_{(i)}] = \frac{i - 0.5}{n}$, so $p_i = \frac{n - i + 0.5}{n}$

• Q-Q plot: empirical quantiles x_(i) against theoretical quantiles x_(i),th; points on the 45° line indicate agreement between the sample and the theoretical distribution
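The ranked form and the Hazen plotting position above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function name `hazen_plotting_positions` and the discharge values are hypothetical, and rank 1 is assumed to be the highest value, as in the ranked time series.

```python
def hazen_plotting_positions(sample):
    """Return (x_(i), p_i) pairs for a sample ranked in descending order.

    Rank i = 1 is the highest value (assumption taken from the ranked
    time series form). p_i = (n - i + 0.5) / n estimates Pr[X <= x_(i)],
    so 1 - p_i = (i - 0.5) / n is the empirical exceedance probability.
    """
    ranked = sorted(sample, reverse=True)
    n = len(ranked)
    return [(x, (n - i + 0.5) / n) for i, x in enumerate(ranked, start=1)]


# Hypothetical discharge values in m3/s, for illustration only.
discharges = [3.2, 7.5, 1.1, 4.8, 2.6]
for x, p in hazen_plotting_positions(discharges):
    print(f"x = {x:4.1f}   p = {p:.2f}   exceedance = {1 - p:.2f}")
```

Plotting the returned x values against p gives the empirical quantile plot; replacing p by the theoretical quantile of a candidate distribution at p gives the Q-Q plot.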
Descriptive statistics
Statistical descriptors of data

• Measures of central tendency of the data
  ⇒ mean: $\bar{x} = \frac{x_1 + \dots + x_n}{n} = \sum_{i=1}^{n} \frac{x_i}{n}$
     for grouped data: $\bar{x} = f_1 x_1^* + \dots + f_k x_k^* = \sum_{i=1}^{k} f_i x_i^*$, with f_i the relative frequency of class i and x_i^* its class centre
  ⇒ root mean square: $RMS_X = \sqrt{\frac{x_1^2 + \dots + x_n^2}{n}} = \sqrt{\sum_{i=1}^{n} \frac{x_i^2}{n}}$
  ⇒ median: $\tilde{x} = x_{0.5}$
  ⇒ quantiles / percentiles x_α; quartiles x_0.25 and x_0.75
     Box-plot: x_min, x_0.25, x_0.5, x_0.75, x_max; interquartile range = x_0.75 − x_0.25

• Measures of dispersion of the data
  ⇒ variance: $s_X^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n-1}$
     for grouped data: $s_X^2 = \sum_{i=1}^{k} f_i (x_i^* - \bar{x})^2$
  ⇒ standard deviation: $s_X = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n-1}}$
  ⇒ mean deviation: $d_X = \sum_{i=1}^{n} \frac{|x_i - \bar{x}|}{n}$
  ⇒ coefficient of variation: $CV_X = \frac{s_X}{\bar{x}}$

• Moments
  $M_1 = \sum_{i=1}^{n} \frac{|x_i - \bar{x}|}{n}$ (↔ mean deviation)
  $M_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}$ (↔ s_X), with $s_X^2 = \frac{n}{n-1} M_2$
  $M_r = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^r}{n}$: the rth sample moment

• Shape
  ⇒ symmetry
  ⇒ skewness; coefficient of skewness: $CS_X = \frac{M_3(x)}{(M_2(x))^{3/2}}$
  ⇒ kurtosis (peakedness); coefficient of kurtosis: $CK_X = \frac{M_4(x)}{(M_2(x))^{2}}$

• After linear transformation: $u = a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
  $\bar{u} = a_1 \bar{x}_1 + a_2 \bar{x}_2 + \dots + a_n \bar{x}_n$
  $s_U^2 = a_1^2 s_{X_1}^2 + a_2^2 s_{X_2}^2 + \dots + a_n^2 s_{X_n}^2$ (valid for uncorrelated x_1, …, x_n)
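The descriptors above translate directly into code. The sketch below is an illustration under stated assumptions: the function name `sample_descriptors` and the data are hypothetical, the variance uses the n − 1 divisor, and the moments M_r use the divisor n, as in the formulas above.

```python
import math


def sample_descriptors(x):
    """Mean, variance, standard deviation, CV, skewness and kurtosis of a sample."""
    n = len(x)
    mean = sum(x) / n

    def central_moment(r):
        # M_r = sum((x_i - mean)^r) / n, the r-th sample moment about the mean
        return sum((xi - mean) ** r for xi in x) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    variance = n / (n - 1) * m2          # s_X^2 = n / (n - 1) * M_2
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,                # coefficient of variation
        "skewness": m3 / m2 ** 1.5,      # CS_X = M_3 / M_2^(3/2)
        "kurtosis": m4 / m2 ** 2,        # CK_X = M_4 / M_2^2
    }


# Hypothetical sample, for illustration only.
print(sample_descriptors([2, 4, 4, 4, 5, 5, 7, 9]))
```

A positive skewness, as for this sample, indicates a longer tail towards high values, which is the usual situation for discharge and rainfall data.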
Elementary probability theory
Probability laws

[Figure: events A and B within the sample space S]

• complementary event: A^C ('not A'); A and A^C are mutually exclusive: A ∩ A^C = ∅ and A ∪ A^C = S
• intersection: A ∩ B ('A and B')
• union: A ∪ B ('A or B')
• probability: Pr(A), the 'relative weight' of event A ('proportion in the full population', 'frequency density')

0 ≤ Pr(A) ≤ 1
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
conditional probabilities: Pr(A ∩ B) = Pr(A|B) · Pr(B) = Pr(B|A) · Pr(A)
Bayes' rule: $\Pr(A|B) = \frac{\Pr(B|A)\,\Pr(A)}{\Pr(B)}$
independence: Pr(A|B) = Pr(A), Pr(B|A) = Pr(B), Pr(A ∩ B) = Pr(A) · Pr(B)

Theorem of total probabilities: $\Pr(B) = \sum_{i=1}^{n} \Pr(B|A_i)\,\Pr(A_i)$,
where the events A_i (i = 1, …, n) are mutually exclusive and A1 ∪ A2 ∪ … ∪ An = S
[Figure: sample space S partitioned into A1, A2, A3, …, An]

Elementary probability theory
Probability functions

Discrete random variable:
- probability mass function: f_X(x) = Pr(X = x)
- (cumulative) distribution function: $F_X(x) = \Pr(X \le x) = \sum_{\forall x_i \le x} f_X(x_i)$

Continuous random variable:
- probability density function f_X(x): $\Pr(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$, with $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
- (cumulative) distribution function: $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$, so $f_X(x) = \frac{dF_X(x)}{dx}$, $F_X(-\infty) = 0$, $F_X(+\infty) = 1$

Elementary probability theory
Moments of distributions of random variables

⇒ mean or expected value: µ_X = E[X]
   for a probability mass function: $\mu_X = \sum_{i=1}^{n} x_i f_X(x_i)$
   for a probability density function: $\mu_X = \int_{-\infty}^{+\infty} x f_X(x)\,dx$
⇒ variance: Var[X] = σ_X²
   for a probability mass function: $\sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu_X)^2 f_X(x_i)$
   for a probability density function: $\sigma_X^2 = \int_{-\infty}^{+\infty} (x - \mu_X)^2 f_X(x)\,dx$
   σ_X: standard deviation
   V_X = σ_X / µ_X: coefficient of variation
⇒ higher order moments
   for a probability mass function: $\mu_X^{(r)} = \sum_{i=1}^{n} (x_i - \mu_X)^r f_X(x_i)$
   for a probability density function: $\mu_X^{(r)} = \int_{-\infty}^{+\infty} (x - \mu_X)^r f_X(x)\,dx$
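As a small worked example of the moment definitions for a probability mass function, the sketch below evaluates the mean, variance, coefficient of variation and third central moment of a discrete variable. The function names and the probability values are hypothetical and chosen only for illustration.

```python
import math


def pmf_moments(values, probs):
    """Moments of a discrete random variable with Pr(X = values[i]) = probs[i]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))

    def central_moment(r):
        # mu_X^(r) = sum((x_i - mu_X)^r * f_X(x_i))
        return sum((x - mean) ** r * p for x, p in zip(values, probs))

    variance = central_moment(2)
    return {
        "mean": mean,
        "variance": variance,
        "cv": math.sqrt(variance) / mean,   # V_X = sigma_X / mu_X
        "third_central": central_moment(3),
    }


def pmf_cdf(values, probs, x):
    """F_X(x) = Pr(X <= x): sum of f_X(x_i) over all x_i <= x."""
    return sum(p for v, p in zip(values, probs) if v <= x)


# Hypothetical pmf: number of threshold exceedances in a year.
values = [0, 1, 2, 3]
probs = [0.4, 0.3, 0.2, 0.1]
print(pmf_moments(values, probs))
print(pmf_cdf(values, probs, 1))
```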
Probability distributions
= models of distributions of random variables

Normal or Gaussian distribution
$f_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2\right)$ for −∞ < x < +∞
$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ → cfr. Table
notation: X ~ N(µ_X, σ_X²)
properties:
• symmetric around µ_X
• the sum of a large number of iid variables is approximately normally distributed → central limit theorem
• the distribution of a linear function of normally distributed variables is also normal
⇒⇒ model of sums

special case: standard normal distribution N(0,1)
Z ~ N(0,1), Φ(z) = Pr(Z ≤ z)
X = µ_X + Z σ_X ⇒ $F_X(x) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right)$

Lognormal distribution
The lognormal distribution equals the normal distribution after ln transformation:
X ~ LN(µ_X, σ_X²) ⇔ ln X ~ N(µ_lnX, σ²_lnX)
relation between (µ_X, σ_X) and (µ_lnX, σ_lnX):
$\mu_{\ln X} = \frac{1}{2} \ln\left(\frac{\mu_X^2}{\sigma_X^2/\mu_X^2 + 1}\right)$, $\sigma_{\ln X}^2 = \ln(V_X^2 + 1)$ with V_X = σ_X / µ_X

After power function transformation:
y = x1^a1 · x2^a2 · … · xn^an → ln y = a1 ln x1 + a2 ln x2 + … + an ln xn
X_i ~ LN(µ_lnXi, σ²_lnXi) ∀i → Y ~ LN(a1 µ_lnX1 + … + an µ_lnXn, a1² σ²_lnX1 + …
+ an² σ²_lnXn) (for independent X_i)
⇒⇒ model of products

Exponential distribution
f_X(x) = λ exp(−λx) and F_X(x) = 1 − exp(−λx) if x ≥ 0
f_X(x) = F_X(x) = 0 if x < 0
moments: µ_X = σ_X = 1/λ
⇒⇒ model of time between events

Poisson process: a special case of the occurrence of events in time [Figure: events along a time axis from t = 0 to time t]
properties:
• the probability of an event in a short interval of time [t, t+h] is approximately λh
• the probability of more than one event in a short interval of time is negligible
• the probability of an event is independent of time and of the events in any other interval (= memoryless property)
parameter: λ = average number of events per unit of time interval length

extension with threshold x_t: F_X(x) = 1 − exp(−λ(x − x_t))
extension to a higher order: Gamma distribution [Figure: density f_Tk(t) of the time to the kth event]
$f_X(x) = \frac{\lambda^k}{\Gamma(k)}\, x^{k-1} \exp(-\lambda x)$ if x ≥ 0
Gamma function: $\Gamma(k) = \int_0^{+\infty} t^{k-1} \exp(-t)\,dt$
moments: $\mu_X = \frac{k}{\lambda}$, $\sigma_X = \frac{\sqrt{k}}{\lambda}$
Gamma distribution with threshold: Pearson III distribution

Pareto distribution
$F_X(x) = 1 - x^{-\alpha}$; α: Pareto index

Weibull distribution
$F_X(x) = 1 - \exp\left(-\left(\frac{x}{\beta}\right)^{\tau}\right)$; τ: Weibull index
Two limiting cases:
- τ = 1: exponential distribution (λ = 1/β)
- τ = 0: Pareto

Uniform distribution
$f_X(x) = \frac{1}{x_{max} - x_{min}}$ for x_min ≤ x ≤ x_max
⇒⇒ equally likely model

Beta distribution
$f_X(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1} (1 - x)^{\beta-1}$
Beta function: $B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$
[Figure: Beta densities for several parameter combinations, e.g. α = 0.5, α+β = 1; α = 1, α+β = 2; α = 1 and α = 2, α+β = 3; α = 1, 2, 4, 6, 7 with α+β = 8]

Normal related or sampling distributions
- t-distribution: distribution of the sample mean of a population X
- Chi-square distribution: distribution of the sample variance of a population X
- F-distribution: distribution of the ratio of the sample variances of 2 populations X1 and X2

Normal related or sampling distributions: Chi-square
distribution: CH(n) = χ²_n
$f_X(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} \exp\left(-\frac{x}{2}\right)$ if x > 0; F_X(x) → cfr. Table
• degrees of freedom: n
• moments: µ_X = n, $\sigma_X = \sqrt{2n}$
• the distribution of the sum of squares of n iid standard normal variables: $Z_1^2 + Z_2^2 + \dots + Z_n^2 \sim \chi_n^2$
• n large: χ²_n is asymptotically N(n, 2n)
• sampling distribution of the variance: $(n-1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$

Distribution of sample variance
assume: X1, X2, …, Xn a random sample from N(µ_X, σ_X²)
sample variance: $S_X^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1}$
calculations:
  X_i ~ N(µ_X, σ_X²)
  $\frac{X_i - \mu_X}{\sigma_X} \sim N(0, 1)$
  $\sum_{i=1}^{n} \frac{(X_i - \mu_X)^2}{\sigma_X^2} \sim \chi_n^2$
  replacing µ_X by the sample mean costs one degree of freedom: $(n-1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$

t-distribution
$f_X(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{n\pi}} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$; F_X(x) → cfr. Table
• degrees of freedom: n (ν = n)
• sampling distribution of the mean: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$

Distribution of sample mean
assume: X1, X2, …, Xn a random sample from N(µ_X, σ_X²)
sample mean: $\bar{X} = \frac{X_1 + \dots + X_n}{n}$
calculations:
  $\mu_{\bar{X}} = E[\bar{X}] = \frac{E[X_1] + \dots + E[X_n]}{n} = E[X] = \mu_X$
  $\sigma_{\bar{X}}^2 = \frac{\sigma_{X_1}^2 + \dots + \sigma_{X_n}^2}{n^2} = \frac{\sigma_X^2}{n}$
  $\bar{X} \sim N(\mu_X, \sigma_X^2/n)$, so $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
  replacing σ_X by s_X: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$

Overview:
- sample mean, population variance σ_X² known: $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
- sample mean, population variance σ_X² not known: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$
- sample variance S_X²: $(n-1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$

F-distribution
Consider 2 populations X1, X2:
$\frac{S_{X_1}^2 / S_{X_2}^2}{\sigma_{X_1}^2 / \sigma_{X_2}^2} \sim F(n_1 - 1, n_2 - 1)$

Probability distributions
Modified distributions
- truncated distributions
- compound distributions

Modified distributions: truncated distributions, e.g.
truncated normal distribution f*_X(x), derived from f_X(x) by truncation at x0
boundary conditions: F*_X(x0) = 0 and F*_X(+∞) = F_X(+∞) = 1
⇒ $F_X^*(x) = \frac{F_X(x) - F_X(x_0)}{1 - F_X(x_0)}$
[Figure: f_X(x) and f*_X(x), and F_X(x) and F*_X(x), for truncation at x0 = 0]

Modified distributions: compound distributions
e.g. mixture of two populations X1 and X2:
$F_X(x) = p_1 F_{X_1}(x) + (1 - p_1) F_{X_2}(x)$, with p2 = 1 − p1

Probability distributions
Multivariate distributions
random vector [X, Y]:
$f_{X,Y}(x, y)\,dx\,dy = \Pr[(x \le X \le x + dx) \cap (y \le Y \le y + dy)]$
e.g. bivariate normal distribution:
$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right)$
Marginal distribution: $f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy$
Conditional distribution: $f_{X|Y}(x) = \frac{\int_{C} f_{X,Y}(x, y)\,dy}{\Pr[Y \in C]}$, with C the condition on Y

for independent random variables X and Y: f_{X,Y}(x, y) = f_X(x) f_Y(y)
for correlated random variables X and Y, after linear transformation Z = aX + bY:
  E[Z] = a E[X] + b E[Y]
  Var[Z] = a² Var[X] + b² Var[Y] + 2ab Cov[X, Y]
Covariance (sample estimate): $Cov[X, Y] = \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Correlation coefficient: $\rho_{X,Y} = \frac{Cov[X, Y]}{\sigma_X \sigma_Y}$
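The variance of a linear combination of correlated variables carries the cross term 2ab Cov[X, Y]. A quick numerical check, sketched below with hypothetical paired data, compares the variance of Z = aX + bY computed directly with the value from the formula; for sample statistics with the n − 1 divisor the two agree exactly. The function names are illustrative only.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor; sample_cov(x, x) is the variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
a, b = 2.0, -1.0

z = [a * xi + b * yi for xi, yi in zip(x, y)]
var_direct = sample_cov(z, z)
var_formula = (a ** 2 * sample_cov(x, x)
               + b ** 2 * sample_cov(y, y)
               + 2 * a * b * sample_cov(x, y))

print(var_direct, var_formula, correlation(x, y))
```

With a negative coefficient b and positively correlated variables, the cross term reduces the variance of Z, which the two printed variances confirm.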
Estimation of distribution parameters
Distribution parameters: θ
Estimators:
• based on quantities derived from the sample
• should be unbiased: $E[\hat{\theta}] = \theta$

Method of moments
Method: choose $\hat{\theta}$ so that the moments of the theoretical distribution equal the sample moments:
$\mu_X^{(1)}(\hat{\theta}) = M_1$, $\mu_X^{(2)}(\hat{\theta}) = M_2$, ...
Examples:
- Normal distribution X ~ N(µ_X, σ_X²): $\hat{\mu}_X = \bar{x}$, $\hat{\sigma}_X^2 = s_X^2$
- Exponential distribution: $\mu_X = \frac{1}{\lambda} = \bar{x}$

Method of maximum likelihood
Method: derivation of the 'most likely' parameter values by maximising the likelihood (the probability of occurrence) that the sample has been drawn from the distribution
Likelihood function: $L(\theta) = \prod_{i=1}^{n} f_X(x_i | \theta)$
Maximum likelihood method: $L(\hat{\theta}) = \max$, so $\frac{\partial L(\hat{\theta})}{\partial \theta} = 0$
or, using the log-likelihood function ln(L(θ)): $\sum_{i=1}^{n} \frac{\partial \ln f_X(x_i | \theta)}{\partial \theta} = 0$
e.g. exponential distribution: $\sum_{i=1}^{n} \frac{\partial \ln(\lambda \exp(-\lambda x_i))}{\partial \lambda} = 0$ ⇒ $\hat{\lambda} = \frac{1}{\bar{x}}$

Confidence intervals: intervals that contain the 'true' parameter value with high probability
e.g. (1 − α) 100 % confidence interval for a parameter θ after estimation by an estimator $\hat{\theta}$:
- two-sided: $[\hat{\theta}_1, \hat{\theta}_2]$, with α/2 × 100 % cumulative probability of $f_{\hat{\theta}}(\hat{\theta})$ in each tail
- one-sided: $[\hat{\theta}_1, +\infty)$, with α × 100 % cumulative probability of $f_{\hat{\theta}}(\hat{\theta})$ below $\hat{\theta}_1$

Confidence intervals: interpretation
NOT: "the true value θ lies with probability 1 − α within the interval"
BUT: "if an infinite number of random samples are taken, in (1 − α) 100 % of the cases the true value θ lies within the interval"
[Figure: intervals $[\hat{\theta}_1, \hat{\theta}_2]$ for samples 1, 2, 3, 4, 5, … compared with the true value θ]
For (1 − α) 100 % of the samples, the true value θ lies within the interval, while for α 100 % of the samples it lies outside the interval.

Confidence intervals
Example for sampling
distributions: θ = µ_X with $\hat{\theta} = \bar{X}$ (sample mean); θ = σ_X² with $\hat{\theta} = s_X^2$ (sample variance)
- sample mean, population variance σ_X² known: $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
- sample mean, population variance σ_X² not known: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$
- sample variance S_X²: $(n-1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$

Confidence intervals
Example for sampling distributions, e.g. sample mean, population variance known:
$\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
[Figure: standard normal density of $\frac{\bar{x} - \mu_X}{\sigma_X/\sqrt{n}}$ with probability α/2 in each tail, beyond −z_{α/2} and +z_{α/2}]
$\Phi^{-1}(\alpha/2) = -z_{\alpha/2}$, so the two-sided (1 − α) 100 % confidence interval for µ_X is $\bar{x} \pm z_{\alpha/2}\,\sigma_X/\sqrt{n}$

Extreme value analysis
= searching for the distribution in the tail of f_X(x) and F_X(x)
in the quantile plot: [Figure: POT values, Hill-type regression and optimal threshold; independent extreme values x (discharge [m3/s]) against −ln(1 − G(x)) = −ln("exceedance probability"), plotted at −ln(i/(m+1)); the extreme value distribution applies above a threshold x_t]

• Extraction of independent values from the time series: 'Peak-Over-Threshold (POT)' or 'Partial-Duration-Series (PDS)' method: consider independent POT-values x1 ≥ x2 ≥ ...
≥ x_m
• Pickands (1975): for x > x_t: F_{X|X ≥ x_t} → Generalized Pareto distribution (GPD)
  $G(x) = 1 - \left(1 + \gamma\,\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ for γ ≠ 0
  $G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$ for γ = 0
• Extreme value index γ: a measure of the tail-heaviness of the distribution
  - γ > 0: Pareto class; heavy tails
  - γ = 0: Gumbel/Exponential class; normal tails
  - γ < 0: finite right endpoint; light tails
  [Figure: probability density f_X(x) for a positive, zero and negative extreme value index]

Exponential quantile plot: [Figure: POT values, Hill-type regression and optimal threshold; discharge [m3/s] against −ln(1 − G(x)), plotted at −ln(i/(m+1)); above x_t the slope equals β]
$G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$

Pareto quantile plot: [Figure: ln(discharge x [m3/s]) against −ln(1 − G(x)), plotted at −ln(i/(m+1)); above x_t the slope equals γ]
$G(x) = 1 - \left(1 + \gamma\,\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ with β = γ x_t

Overview:
- extreme value index γ > 0: GEV distribution, GPD distribution → Pareto QQ-plot
- extreme value index γ = 0: Gumbel distribution, exponential distribution → exponential QQ-plot; Weibull distribution → Weibull QQ-plot

Examples, Dataset 1 (normal tail):
[Figure: exponential QQ-plot; x against −ln(1 − G(x))]
[Figure: slope in the exponential QQ-plot; slope and MSE against the number of observations above the threshold]
[Figure: Pareto QQ-plot; ln(x) against −ln(1 − G(x))]
[Figure: slope in the Pareto QQ-plot; slope and MSE against the number of observations above the threshold]
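The GPD and the exponential quantile plot described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and POT values are hypothetical, and β is estimated here as the mean excess above the threshold (the maximum likelihood estimate for the exponential case), which is a simpler stand-in for the Hill-type regression named in the figure legends.

```python
import math


def gpd_cdf(x, x_t, beta, gamma=0.0):
    """G(x) above the threshold x_t; gamma = 0 gives the exponential case."""
    z = (x - x_t) / beta
    if gamma == 0.0:
        return 1.0 - math.exp(-z)
    base = 1.0 + gamma * z
    if base <= 0.0:
        return 1.0  # beyond the finite right endpoint (only possible for gamma < 0)
    return 1.0 - base ** (-1.0 / gamma)


def exponential_qq_points(pot_values):
    """(-ln(i / (m + 1)), x_(i)) pairs for POT values ranked x_(1) >= ... >= x_(m)."""
    ranked = sorted(pot_values, reverse=True)
    m = len(ranked)
    return [(-math.log(i / (m + 1)), x) for i, x in enumerate(ranked, start=1)]


def fit_exponential_beta(pot_values, x_t):
    """Mean excess above x_t (assumed estimator, not the Hill-type regression)."""
    excesses = [x - x_t for x in pot_values if x > x_t]
    return sum(excesses) / len(excesses)


# Hypothetical POT discharges in m3/s and threshold, for illustration only.
pot = [12.0, 15.0, 11.0, 20.0, 13.0]
x_t = 10.0
beta = fit_exponential_beta(pot, x_t)
print(beta, gpd_cdf(15.0, x_t, beta))
for q, x in exponential_qq_points(pot):
    print(f"{q:.3f}  {x:.1f}")
```

If the points of the exponential quantile plot follow a straight line above the threshold, the exponential case (γ = 0) is plausible; systematic upward curvature suggests a heavy tail and the Pareto quantile plot.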
Examples, Dataset 2 (heavy tail):
[Figure: Pareto QQ-plot; ln(x) against −ln(1 − G(x))]
[Figure: slope in the Pareto QQ-plot; extreme value index and MSE against the number of observations above the threshold]

Extreme value analysis
Two methods for extraction of independent extremes from a time series:
• Annual (periodic) maxima method
• Peak-Over-Threshold (POT) or Partial-Duration-Series (PDS) method
[Figure: discharge time series with selected POT values; discharge [m3/s] against time, 1 Nov 1993 to 31 Dec 1993]

Annual (periodic) maxima method: Generalized Extreme Value (GEV) distribution
  $H(x) = \exp\left(-\left(1 + \gamma\,\frac{x - x_t}{\beta}\right)^{-1/\gamma}\right)$ if γ ≠ 0
  $H(x) = \exp\left(-\exp\left(-\frac{x - x_t}{\beta}\right)\right)$ if γ = 0
(the two methods are linked through the Poisson process)
Peak-Over-Threshold (POT) or Partial-Duration-Series (PDS) method: Generalized Pareto Distribution (GPD)
  $G(x) = 1 - \left(1 + \gamma\,\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ if γ ≠ 0
  $G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$ if γ = 0

Overview:
- method of periodic maxima: GEV distribution (γ > 0); Gumbel distribution (γ = 0)
- POT method: GPD distribution (γ > 0); exponential distribution (γ = 0)

Return period T [years]:
- for POT extremes: $T(x) = \frac{n}{t}\,\frac{1}{P[X > x \,|\, X > x_t]} = \frac{n}{t}\,\frac{1}{1 - G(x)}$
  with n the total number of years, t the number of exceedances of the threshold level x_t, and G(x) the extreme value distribution
- for annual maxima: $T = \frac{1}{1 - H(x)}$
- relation between both: $T_{AM} = \frac{1}{1 - \exp(-1/T_{POT})}$

Frequency factor K_T:
$T = \frac{n}{t\,(1 - G(x))}$ (POT) or $T = \frac{1}{1 - H(x)}$ (annual maxima), written as T = g(x), so that $x = g^{-1}(T) = \mu_X + K_T\,\sigma_X$
(K_T: frequency factor for return period T, multiplying the standard deviation σ_X)

Extreme value analysis
Confidence limits:
• empirical:
  - parametric bootstrap method
  - non-parametric bootstrap method
• analytical for the ML-method: the covariance matrix of the parameter estimates
  $\begin{bmatrix} Var(\theta_1) & Cov(\theta_1, \theta_2) & \dots \\ Cov(\theta_2, \theta_1) & Var(\theta_2) & \dots \\ \dots & \dots & \dots \end{bmatrix}$
  follows from the matrix of second derivatives of the likelihood function
  $\begin{bmatrix} \frac{\partial^2 L}{\partial \theta_1^2} & \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \dots \\ \frac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 L}{\partial \theta_2^2} & \dots \\ \dots & \dots & \dots \end{bmatrix}$
  (asymptotically, the covariance matrix is the inverse of the negative of this matrix, with L taken as the log-likelihood and evaluated at the ML estimates)

Extreme value analysis
Naturalization of data: abstraction discharges and other man-made influences may cause problems of dependency or non-randomness of the data ⇒ elimination of these influences
[Figure: urban drainage system and its interactions: untreated domestic sources, WWTP, sewer system ancillaries (SST), watercourses, industrial sources, rainfall-runoff, agricultural pollution, upstream discharges]

Other influences:
• influence of river flooding (floodplain or bank storage)
• inaccurate extrapolation of the rating curve
• outliers

Flooding influence
[Figure: extreme value analysis of a discharge time series; discharge [m3/s] against return period [years]; river discharge measurements, equivalent upstream rainfall-runoff discharges, the same discharges after correction for the extrapolation of the rating curve, and the calibrated extreme value distribution; flooding starts above the discharge q*]
[Figure: river discharge measurements and estimated equivalent upstream rainfall-runoff discharges (based on inverse river modelling); discharge [m3/s] against time [number of hours]; flooding starts above the discharge q*]

Influence of rating curve extrapolation
[Figure: original rating curve (Q-H relation) and rating curve corrected for the flooding influence (estimated flattening of the Q-H relation)]
(axes: water level [m] against discharge [m3/s])

Extreme value analysis
Influence of outliers
[Figure: simulation results of the rainfall-runoff model for the periods 1986-1996 and 1898-1997, with the extreme value distribution; discharge [m3/s] against return period [years]]

Extreme value analysis
For minima (e.g. low flow or drought frequency analysis)
• the extreme value analysis method for floods is still valid after the transformation x → −x or x → 1/x
• the lower limit for zero discharges has to be taken into account
  ⇒ the lower limit becomes an upper limit with the transformation −x (bounded case; light tail)
  ⇒ unbounded case with the transformation 1/x (normal or heavy tail)
• when zero discharges occur, a bi-modal probability distribution model has to be considered
  ⇒ separation of the zero flow and non-zero flow conditions (the extreme value analysis method is only valid for the latter conditions)
• for the POT method, low flow periods are considered independent when they are separated by a high flow period

Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. rainfall intensities
- IDF-curves [Figure: rainfall intensity [mm/h] against aggregation level [days]]
- design storms [Figure: rainfall intensity [mm/h] against time [h]]

Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g.
discharges
- QDF-curves
- synthetic hydrographs [Figure: historical event and composite hydrograph; discharge [m3/s] against time [number of hours]]

e.g. low flow discharges
Consideration of the duration of the low flow or drought period
⇒ discharge/duration/frequency relationships
⇒ considering different durations relevant for the several applications:
  - agricultural applications
  - irrigation
  - power plants
  - domestic supply
  - pollution
  - etc.

e.g. water levels
- HDF-curves
- synthetic limnigraphs

e.g. concentrations
- CDF-curves
- immission standards [Figure: DO concentration [mg/l] against return period [years]; classes A, B and C; water with fish powder, fishery with trout, fishery with carp; model results (1 h), VMM immission measurements, intermittent standards (DWPCC, 1985) and intermittent standards (FWR, 1998)]

Extreme value analysis
Ungauged locations - Regionalisation analysis:
Step 1: Identification of homogeneous subregions for the statistical properties, based on:
  - meteorological, geological, and geomorphological characteristics
  - rainfall-runoff model parameters
  - extreme value distribution parameters of peak discharges and/or rainfall intensities
  - more global statistics of the discharge series, such as the coefficients of variation, skewness and kurtosis (CV, CS, CK)
Step 2: Derivation of relationships for each homogeneous region between the distribution parameters and catchment characteristics (area, length, topography/slope, land use, soil type, …)
Most common approach:
  - Mean extreme value (e.g.
annual maximum method: mean annual maximum MAF): site-specific, related to the catchment characteristics
  - Growth curve = extreme value distribution for X/MAF: identical per region
  ⇒ per homogeneous region: regional growth curve based on the data of all stations
  ⇒ partly solves the data limitation problem

Extreme value analysis
Data problems:
• Gaps in the records (especially during the important periods with flood conditions)
• Difference in record length for different stations
• Shortage of data
• Non-homogeneous series due to morphological changes
• Systematic measurement errors (outliers)

Extreme value analysis
Filling of gaps:
• Calculation of the correlation between stations
  ⇒ based on individual values and on cumulative amounts
• Filling of the gaps
  ⇒ based on the measurements of neighbouring stations and the calculated correlation with these stations
  ⇒ larger gaps: use of the rainfall-runoff model

Selection of the type of distribution
Based on:
• Shape of the distribution
• Boundary conditions
• Knowledge and understanding of the physics and its influence on the distribution
• Distribution class

Selection of the type of distribution
Distribution class based on the distribution's tail (cfr.
extreme value analysis):
• Extreme value index γ > 0: Pareto distribution
• Extreme value index γ = 0: normal distribution, lognormal distribution, exponential distribution (τ = 1), Weibull distribution (τ > 0), Gamma distribution
• Extreme value index γ < 0: Beta distribution, uniform distribution
• GPD distribution and GEV distribution: cover all three classes (γ > 0, γ = 0, γ < 0)
(The original diagram also carries the label τ = 0, placed between the Pareto class and the γ = 0 class.)

Statistical hypothesis testing
H0: null hypothesis = a statement about a random variable
if the observations in a random sample are consistent with H0: ACCEPT H0
if not consistent with H0: REJECT H0
The decision is based on a STATISTICAL HYPOTHESIS TEST

Statistical hypothesis testing
Figure: density $f_T(t)$ of the test statistic T, which is known if H0 is true; a central region "accept H0" and two tail regions "reject H0", each with probability α/2.
α = SIGNIFICANCE LEVEL of the test = probability that H0 is rejected while H0 is true → type I error

Statistical hypothesis testing
Decision versus reality:
• Accept H0, H0 true: no error
• Reject H0, H0 true: Type I error
• Accept H0, H0 wrong: Type II error
• Reject H0, H0 wrong: no error
type II error = the error of accepting H0 when H0 is not true
β = probability of a type II error under an alternative hypothesis H1
1 − β = POWER of the test

Statistical hypothesis testing
Example: use of the sample mean $\bar{X}$ as a test statistic for a hypothesis about the population mean $\mu_X$
e.g.: H0: $\mu_X = \mu_0$
Alternative hypothesis H1: $\mu_X = \mu_1$
Test statistic: $\bar{X}$, with $\bar{X} \sim N(\mu_0, \sigma_X^2 / n)$ under H0
Figure: density $f_{\bar{X}}(x)$ centred on $\mu_0$ with a central region "accept H0" and two tail regions "reject H0" of probability α/2 each; β is the part of the density centred on $\mu_1$ that falls inside the acceptance region.

Examples of hypothesis tests
(for each test: H0, test statistic, distribution, name of the test)
• H0: $\mu_X = \mu_0$; test statistic: sample mean $\bar{X}$; distribution: $\dfrac{\bar{X} - \mu_X}{\sqrt{S_X^2 / n}} \sim t_{n-1}$ or $N(0,1)$; name of the test: t-test
• H0: $\mu_{X_1} = (a)\,\mu_{X_2}$; test statistic: difference of sample means $\bar{X}_1 - (a)\,\bar{X}_2$; distribution: $\dfrac{\bar{X}_1 - (a)\,\bar{X}_2 - (\mu_{X_1} - (a)\,\mu_{X_2})}{\sqrt{s_{X_1}^2 / n_1 + (a^2)\, s_{X_2}^2 / n_2}} \sim t_{n_1 + n_2 - 2}$ or $N(0,1)$; name of the test: t-test
• H0: $\sigma_X^2 = \sigma_0^2$; test statistic: sample variance $S_X^2$; distribution: $\dfrac{S_X^2 (n-1)}{\sigma_X^2} \sim \chi^2_{n-1}$; name of the test: χ2-test
• H0: $\sigma_{X_1}^2 = (a)\,\sigma_{X_2}^2$; test statistic: ratio of sample variances $S_{X_1}^2 / S_{X_2}^2$; distribution: $\dfrac{S_{X_1}^2}{(a)\, S_{X_2}^2} \sim F(n_1 - 1, n_2 - 1)$; name of the test: F-test
• H0: $p = p_0$; test statistic: sample proportion $\hat{P}$; distribution: $\dfrac{\hat{P} - p}{\sqrt{\hat{P}(1 - \hat{P}) / n}} \sim N(0,1)$ for n large
• H0: $\rho = \rho_0$; test statistic: sample correlation coefficient $R$; distribution: $\dfrac{R \sqrt{n-2}}{\sqrt{1 - R^2}} \sim t_{n-2}$ if ρ = 0
• H0: $\lambda = \lambda_0$; test statistic: λ-parameter estimate $\hat{\lambda}$ of a Poisson process; distribution: $\dfrac{\hat{\lambda} - \lambda}{\sqrt{\hat{\lambda} / n}} \sim N(0,1)$ for n large

Examples of hypothesis tests
• H0: serial correlation of order 0: $r = r_0$; test statistic: Wald-Wolfowitz statistic $R = \sum_{i=1}^{n} X_i X_{i+1}$; distribution: $\dfrac{R + a_2 / (n-1)}{a_2 / \sqrt{n-1}} \sim N(0,1)$ if no serial correlation, with $a_2 = \sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6}$; name of the test: Wald-Wolfowitz test
• H0: long-term trend in a time series; test statistic: Mann-Kendall statistic $T_k = \sum_{i=1}^{k} N_i$, with $N_i$ the number of sample points for which $X_j < X_i$ ($\forall j < i$); distribution: $\dfrac{T_k - k(k-1)/4}{\sqrt{k(k-1)(2k+5)/72}} \sim N(0,1)$ if no trend; name of the test: Mann-Kendall trend test

Examples of hypothesis tests
Tests for the validity of a specific theoretical probability distribution:
• test statistic: histogram statistic $T = \sum_{i=1}^{n} \dfrac{(N_i - e_i)^2}{e_i} \sim \chi^2_{n-1}$; name of the test: χ2 goodness-of-fit test
with $N_i$: the number of sample points in interval i of the histogram
$e_i$: the expected number of sample points, corresponding to the theoretical distribution $f_X(x)$: $e_i = n \int_{\text{interval } i} f_X(x)\,dx$
• test statistic: Kolmogorov-Smirnov statistic $D = \max_{i=1,\dots,n} |p_i - F_X(x_i)|$; distribution: Kolmogorov-Smirnov table; name of the test: Kolmogorov-Smirnov goodness-of-fit test
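The Mann-Kendall trend statistic listed among the hypothesis tests can be computed directly from its definition. The sketch below is a minimal illustration, not a reference implementation: the function name `mann_kendall_trend` and the sample series are hypothetical, ties are ignored, and the normal approximation uses the no-trend mean k(k−1)/4 and variance k(k−1)(2k+5)/72 of T_k.

```python
import math


def mann_kendall_trend(x):
    """Return (T_k, z, p) for the series x (hypothetical helper, ties ignored).

    T_k sums, over every point i, the number of earlier points j < i
    with x[j] < x[i]. Under H0 (no trend) the standardised value z is
    approximately N(0, 1).
    """
    k = len(x)
    t_k = sum(
        sum(1 for j in range(i) if x[j] < x[i])
        for i in range(k)
    )
    mean = k * (k - 1) / 4.0
    variance = k * (k - 1) * (2 * k + 5) / 72.0
    z = (t_k - mean) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return t_k, z, p


# Hypothetical example: a strictly increasing series of 10 values.
series = [1.0, 1.5, 2.1, 2.2, 3.0, 3.4, 4.1, 4.5, 5.2, 6.0]
t_k, z, p = mann_kendall_trend(series)
print(f"T_k = {t_k}, z = {z:.3f}, p = {p:.5f}")
```

For a strictly increasing series every earlier point is smaller, so T_k reaches its maximum k(k−1)/2 and H0 (no trend) is rejected at the usual significance levels.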
Regression and correlation
linear regression: $y = a + b\,x$
Figure: scatter plot with regression line $y = a + b\,x$; x: independent or explanatory variable; y: dependent variable; for an observation $(x_i, y_i)$ the error or residual $e_i$ is the vertical distance between $y_i$ and $a + b\,x_i$.

Regression and correlation
estimation of the parameters of the regression curve by the least squares method: minimization of the mean squared error MSE:
\[ MSE = \sum_{i=1}^{n} \frac{(y_i - (a + b x_i))^2}{n-2} = \sum_{i=1}^{n} \frac{e_i^2}{n-2} \]
\[ \frac{\partial MSE}{\partial a} = 0, \qquad \frac{\partial MSE}{\partial b} = 0 \]
\[ \hat{b} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} \]

Regression and correlation
ideal case: errors in y-direction independent of x (possibly transformation of y needed): the errors can then be represented by a single distribution $f_E(e)$, with
\[ s_E^2 = \sum_{i=1}^{n} \frac{e_i^2}{n-2} = MSE \]
Figure: density $f_E(e)$ centred on 0 with standard deviation $\sigma_E$.

Regression and correlation
Prediction of Y based on x:
Figure: regression line $y = a + b\,x$ with the band $\pm\sigma_E$; at $x = x_i$ the conditional density $f_{Y_i|x_i}(y_i)$ is centred on $\mu_{Y_i|x_i}$.

Regression and correlation
Assuming the model $y = a + b\,x$ is known:
\[ Y_i = a + b x_i + E, \qquad E \sim N(0, \sigma_E^2), \qquad Y_i \sim N(a + b x_i, \sigma_E^2) \]

Regression and correlation
Estimation uncertainty parameters:
\[ \hat{B} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad E[\hat{B}] = \frac{\sum_{i=1}^{n} (E[Y_i] - E[\bar{Y}])(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = b, \qquad \mathrm{Var}[\hat{B}] = \frac{\sigma_E^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
\[ \hat{A} = \bar{Y} - \hat{B}\,\bar{x}, \qquad E[\hat{A}] = a, \qquad \mathrm{Var}[\hat{A}] = \sigma_E^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \]

Regression and correlation
Estimation uncertainty parameters:
\[ \frac{\hat{B} - b}{\sigma_{\hat{B}}} \sim N(0,1), \qquad \frac{\hat{B} - b}{S_{\hat{B}}} \sim t_{n-2} \]
\[ \frac{\hat{A} - a}{\sigma_{\hat{A}}} \sim N(0,1), \qquad \frac{\hat{A} - a}{S_{\hat{A}}} \sim t_{n-2} \]

Regression and correlation
When the regression model $\hat{y} = \hat{a} + \hat{b}\,x$ is estimated:
\[ Y_i = \hat{A} + \hat{B} x_i + E = \hat{\mu}_{Y_i|x_i} + E \]
\[ E[\hat{\mu}_{Y_i|x_i}] = E[\hat{A}] + E[\hat{B}]\,x_i = a + b x_i \]
\[ \mathrm{Var}[\hat{\mu}_{Y_i|x_i}] = \mathrm{Var}[\hat{A}] + x_i^2\,\mathrm{Var}[\hat{B}] + 2 x_i\,\mathrm{Cov}[\hat{A}, \hat{B}] = \sigma_E^2 \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) \]

Regression and correlation
Uncertainty on
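regression curve, by parameter uncertainty:
Figure: regression line with its uncertainty band; at $x = x_i$ the density $f_{\mu_{Y_i|x_i}}(\mu_{Y_i})$ of the estimated mean $\mu_{Y_i|x_i}$.

Regression and correlation
When the regression model $\hat{y} = \hat{a} + \hat{b}\,x$ is estimated:
\[ Y_i = \hat{A} + \hat{B} x_i + E = \hat{\mu}_{Y_i|x_i} + E \]
\[ E[Y_i|x_i] = E[\hat{A}] + E[\hat{B}]\,x_i = a + b x_i \]
\[ \mathrm{Var}[Y_i|x_i] = \mathrm{Var}[\hat{A}] + x_i^2\,\mathrm{Var}[\hat{B}] + 2 x_i\,\mathrm{Cov}[\hat{A}, \hat{B}] + \sigma_E^2 = \sigma_E^2 \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) + \sigma_E^2 \]
\[ \frac{Y_i - E[Y_i]}{\sigma_{Y_i}} \sim t_{n-2} \quad \text{(with } \sigma_{Y_i} \text{ estimated from the sample)} \]

Regression and correlation
Different uncertainty-sources:
parameter uncertainties + model-structure uncertainties + input uncertainties
Figure: regression line with the combined uncertainty band at $x = x_i$.

Regression and correlation
Link with correlation:
Figure: scatter plot with the marginal densities $f_X(x)$ and $f_Y(y)$, the means $\bar{x}$ and $\bar{y}$, and the standard deviations $s_X$ and $s_Y$.

Regression and correlation
Figure: observation $y_i$, regression estimate $\hat{y}_i$ and mean $\bar{y}$ at $x = x_i$.

Regression and correlation
Variance decomposition:
\[ s_Y^2 = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{n-2} = \sum_{i=1}^{n} \frac{(\hat{y}_i - \bar{y})^2}{n-2} + \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{n-2} \]
\[ s_{tot}^2 = s_{mod}^2 + s_E^2 \]
$s_{mod}^2 / s_{tot}^2$: proportion of total variance explained by the regression = a measure of the ‘goodness-of-fit’ of the regression

Regression and correlation
Sensitivity analysis:
\[ s_{mod}^2 = \left( \frac{dy}{dx} \right)^2 s_x^2 \]
with $dy/dx$ the model sensitivity

Regression and correlation
Goodness-of-fit:
\[ \frac{s_{mod}^2}{s_{tot}^2} = \frac{(\mathrm{Cov}(X, Y))^2}{s_X^2\, s_Y^2} = R^2 \]
$R^2$: coefficient of determination; if X, Y bivariate normal: $= \rho^2$

Mathematical modelling considering uncertainties:
Linear model: $y = a + b\,x$, $Y = \hat{A} + \hat{B} X + E$
General model: $y = F(x, p)$, $Y = F(X, \hat{P}) + E$
Diagram: input x(t) → model $F_i$, i = 1, …, n, with parameters p → output Y(t).

Model uncertainty analysis ⇒ probabilistic modelling
Considering
• input uncertainties
• parameter uncertainties
• model-structure uncertainties
Diagram: input x(t) → model $F_i$, i = 1, …, n (n finite), with parameter uncertainties $f_P(p)$ → output Y(t), with additive error terms for the input and for the model structure.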
Diagram, continued: error terms $E_X(t)$ (input uncertainties) and $E_Y(t)$ (model-structure uncertainties), each sketched against time t.

Probabilistic modelling
Example: lumped conceptual rainfall-runoff model:
Diagram: rainfall input with rainfall-input uncertainty $E_X$; soil moisture storage with evapotranspiration; routing of surface runoff, interflow and baseflow with constants $k_{OF}$, $k_{IF}$ and $k_{BF}$; model-structure uncertainties $E_{s+i}$ (surface runoff and interflow) and $E_g$ (baseflow).

Probabilistic modelling
Example: lumped conceptual rainfall-runoff model:
Figure: measurements and probabilistic model results; x-axis: time [h], 9000 to 17000; y-axis: discharge [m3/s], 0 to 7.

Probabilistic modelling
Example: lumped conceptual rainfall-runoff model:
Figure: the same plot with an inset for time 8850 to 9200 h (discharge 0 to 10 m3/s).

Regression and correlation
Additional remarks:
• regression in y-direction might be different from a regression in x-direction
• possible hypothesis tests: H0: a = 0; H0: b = 0
• $(n-2)\,\dfrac{s_{mod}^2}{s_E^2} \sim F(1, n-2)$ if b = 0

Regression and correlation
Additional remarks:
• multivariate case: $X = [X_1, X_2, \dots]^T$, $Y = [Y_1, Y_2, \dots]^T$
\[ Y = A + B X, \qquad \hat{B} = (X^T X)^{-1} X^T Y, \qquad \hat{A} = \bar{Y} - \hat{B}\,\bar{X} \]
degrees of freedom: n − 2 → n − p
• model order identification: balance between $R^2$ and p or Var(P)
e.g. AIC (Akaike Information Criterion), YIC (Young’s Information Criterion)
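The least squares estimates and their uncertainty from the regression slides can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `fit_line` and the data values are hypothetical, and the variances use $s_E^2$ (the MSE with n − 2 degrees of freedom) as the estimate of $\sigma_E^2$.

```python
import math


def fit_line(x, y):
    """Least squares fit of y = a + b x (hypothetical helper).

    Returns the estimates, the residual variance s_E^2 = MSE, the
    estimated variances of the parameters, R^2 and the t statistic
    for H0: b = 0 (to be compared with a t distribution, n - 2 d.o.f.).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))

    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar

    fitted = [a_hat + b_hat * xi for xi in x]
    s_e2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)

    var_b = s_e2 / s_xx
    var_a = s_e2 * (1.0 / n + x_bar ** 2 / s_xx)

    # Variance decomposition: explained over total sum of squares.
    ss_mod = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)

    return {
        "a": a_hat,
        "b": b_hat,
        "s_e2": s_e2,
        "var_a": var_a,
        "var_b": var_b,
        "r2": ss_mod / ss_tot,
        "t_b": b_hat / math.sqrt(var_b),
    }


# Hypothetical data, roughly following y = 1 + 2 x.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [1.1, 2.9, 5.2, 6.8, 9.0]
result = fit_line(x_obs, y_obs)
print(result)
```

The common factor n − 2 cancels in the ratio of explained to total variance, so `r2` is computed from the sums of squares directly.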
Discharge splitting
Example:
Figure: measurements, filtered baseflow, filtered interflow and filtered total discharge; x-axis: time [number of hours], 500 to 800; y-axis: discharge [m3/s], 0 to 7.

Discharge splitting
Example baseflow filtering:
Figure: measurements and filtered baseflow; x-axis: time [number of hours], 0 to 8000; y-axis: discharge [m3/s], 0 to 8.

Discharge splitting
Baseflow filtering based on recession constant:
Figure: measurements, filtered baseflow and the slope given by the recession constant for baseflow; x-axis: time [number of hours], 0 to 8000; y-axis: discharge [m3/s], logarithmic, 0.01 to 10.

Discharge splitting
Linear reservoir model:
\[ b(t) = \exp\!\left(-\frac{1}{k}\right) b(t-1) + \left(1 - \exp\!\left(-\frac{1}{k}\right)\right) \frac{q(t-1) + q(t)}{2} \]
Figure: reservoir with inflow q(t), recession constant k and outflow b(t); recession of b(t) from b(0) over time t.

Discharge splitting
Linear reservoir model as ‘low-pass filter’:
frequency response function:
\[ |H(f)|^2 \approx \frac{1}{1 + (2 \pi f k)^2} \]
Figure: $|H(f)|^2$, 0 to 1, versus frequency f, 0 to 0.5.

Discharge splitting
Numerical filtering:
q(t): total stream flow (filter input)
b(t): slow flow (baseflow): filter result
f(t): quick flow: signal filtered out

Discharge splitting
Working principle of the ‘Extended Chapman filter’:
Diagram: the total flow q(t) is split into a part w·q(t) associated with the quick flow f(t) and a part (1−w)·q(t) associated with the baseflow b(t).

Discharge splitting
Extended Chapman filter:
\[ f(t) = a\,f(t-1) + b\,(q(t) - \alpha\,q(t-1)) \]
\[ b(t) = q(t) - f(t) = \alpha\,b(t-1) + c\,(1 - \alpha)(f(t) + f(t-1)) \]
\[ \alpha = \exp(-1/k), \qquad v = \frac{1-w}{w} \]
\[ a = \frac{(2+v)\,\alpha - v}{2 + v - v\alpha}, \qquad b = \frac{2}{2 + v - v\alpha}, \qquad c = 0.5\,v \]
(The filter coefficient b is distinct from the baseflow series b(t).)

Discharge splitting
Calibration k parameter:
Figure: measurements, filtered baseflow and the slope given by the recession constant k for baseflow; x-axis: time [number of hours], 0 to 8000; y-axis: discharge [m3/s], logarithmic, 0.01 to 10.

Discharge splitting
Calibration v parameter:
Figure legend: discharge measurements; filtered baseflow, calibrated
v parameter; filtered baseflow, over- or underestimated v parameter. Axes: time [number of hours from 01.01.93], 1100 to 2500; discharge [m3/s], logarithmic, 0.1 to 1.

Discharge splitting
Discharge splitting of baseflow and interflow (subsurface flow) from total flow in three steps:
• elimination of constant component
• baseflow separation from the total discharge
• interflow separation from the series ‘total discharge - baseflow’

Discharge splitting
Example baseflow:
Figure: measurements, filtered baseflow and the slope given by the recession constant for baseflow; x-axis: time [number of hours], 0 to 8000; y-axis: discharge [m3/s], logarithmic, 0.01 to 10.

Discharge splitting
Example interflow:
Figure: measurements after subtraction of the filtered baseflow, filtered interflow and the slope given by the recession constant for interflow; x-axis: time [number of hours], 500 to 950; y-axis: discharge [m3/s], logarithmic, 0.01 to 10.

Discharge splitting
Application using different steps:
• 1-step approach: 1 forward step
• 3-step approach: forward + backward + forward
Figure: discharge measurements; filtered baseflow, 1-step approach; filtered baseflow, 3-step approach; slope for calibration of the recession constant for baseflow; x-axis: time [number of hours], 6000 to 9000; y-axis: discharge [m3/s], logarithmic, 0.1 to 10.

POT extraction
Figure: measurements, POT values and baseflow; x-axis: time, 1-Nov-93 to 31-Dec-93; y-axis: discharge [m3/s], 0 to 8.

POT extraction
Method 1: based on baseflow
– based on max. ratio: $(q_{min} - q_{base}) / q_{max} < f$
– min. peak height: $q_{max} > q_{lim}$
Figure: sketch of successive peaks showing the peak value $q_{max}$, the period p, the intermediate minimum $q_{min}$ and the baseflow $q_{base}$.
Method 2: based on baseflow + interflow
– idem after replacement of $q_{base}$ by $q_{base} + q_{inter}$
Method 3: independent of subflows
– based on min. independence period: $p > k$
– based on max. ratio: $q_{min} / q_{max} < f$
– min.
peak height: $q_{max} > q_{lim}$

Risk analysis
Risk = Probability ⊗ Elements ⊗ Vulnerability ⊗ Value
Risk = Probability ⊗ Consequence
Economic consequence = Damage

Risk analysis
Example: risk-based design of a hydraulic structure for flood protection:
Total annual expected cost = total installation cost * capital recovery factor + annual expected damage cost

Risk analysis
Example: risk-based design of a hydraulic structure for flood protection:
Annual expected damage cost: E(D)
Damage occurs when Load > Resistance
Load: discharge Q in the river
Resistance: discharge capacity $Q_c$ of the hydraulic structure
Considering the distribution of Q:
\[ \text{Annual expected flood damage: } E(D) = \int_{q_c}^{+\infty} D(q, q_c)\, f_Q(q)\, dq \]
Considering the distribution of Q and $Q_c$:
\[ \text{Annual expected flood damage: } E(D) = \int_{0}^{+\infty} \left( \int_{q_c}^{+\infty} D(q, q_c)\, f_Q(q)\, dq \right) f_{Q_c}(q_c)\, dq_c \]
Figure: risk zone; risk = probability * damage, shown against return period and spatial dimension.

Risk analysis
Example: Flood risk mapping
River Dender case (Belgium) - area Geraardsbergen - Zandbergen:
Figure: flood maps for return periods of 1 year, 10 years and 100 years, with rivers and subcatchments.
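The annual expected flood damage integral for a fixed discharge capacity $q_c$, and the total annual expected cost built on it, can be approximated numerically. The sketch below rests entirely on illustrative assumptions that are not in the slides: an exponential density for Q with scale `BETA`, a damage function that grows linearly with the exceedance q − q_c, a truncated upper integration limit, and invented cost figures. These choices were made only because they allow an analytic check of the integral.

```python
import math


def expected_annual_damage(damage, pdf, q_c, q_upper, steps=20000):
    """Trapezoidal approximation of E(D) = int_{q_c}^{inf} D(q, q_c) f_Q(q) dq.

    The infinite upper limit is truncated at q_upper (an assumption).
    """
    h = (q_upper - q_c) / steps
    total = 0.0
    for i in range(steps + 1):
        q = q_c + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * damage(q, q_c) * pdf(q)
    return total * h


def capital_recovery_factor(rate, years):
    """Standard annuity factor i (1 + i)^n / ((1 + i)^n - 1)."""
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


# Hypothetical inputs, chosen only for illustration.
BETA = 10.0            # scale of the assumed exponential density [m3/s]
COST_PER_UNIT = 1000.0  # assumed damage per m3/s of exceedance


def pdf_exponential(q):
    return math.exp(-q / BETA) / BETA


def damage_linear(q, q_c):
    return COST_PER_UNIT * max(q - q_c, 0.0)


q_c = 30.0
e_d = expected_annual_damage(
    damage_linear, pdf_exponential, q_c, q_upper=q_c + 40.0 * BETA
)
# Closed form for this particular pair of assumptions.
analytic = COST_PER_UNIT * BETA * math.exp(-q_c / BETA)

installation_cost = 50000.0  # hypothetical
total_annual_cost = installation_cost * capital_recovery_factor(0.04, 50) + e_d
print(f"E(D) numeric = {e_d:.3f}, analytic = {analytic:.3f}")
print(f"total annual expected cost = {total_annual_cost:.2f}")
```

In a risk-based design this calculation would be repeated for a range of candidate capacities $q_c$, each with its own installation cost, to locate the capacity with the lowest total annual expected cost.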