
Theory sessions-print

Statistics for Water Engineering
1. Initial definitions
: Hydrological variables, processes and data
2. Descriptive statistics
2.1. Presentation of data
2.2. Statistical descriptors of data
3. Probability theory
3.1. Elementary probability theory
3.2. Probability distributions
3.3. Estimation of parameters
3.4. Testing statistical hypotheses
4. Extreme value analysis
5. Regression and correlation
6. Hydrological time series analysis
7. Introduction to stochastic and probabilistic modelling and risk analysis
Initial definitions
Statistics = the study of methods for:
- organising and summarising the data
- inferring conclusions about the population (on the basis of a data sample)
[Figure: a sample drawn from a population]
Organising / summarising the data = DESCRIPTIVE STATISTICS
Inferring conclusions about the population = INFERENTIAL STATISTICS
e.g.: probability distributions, testing statistical hypotheses
Initial definitions
Study of probability = study of randomness and uncertainty
Variables, processes and data
• Variable X: some physical entity whose value can vary (e.g. in time)
e.g.: a discharge time series Q
⇒ specific values: x, q (measurements, observations, outcomes, realizations, …)
[Figure: population X and sample x1, x2, …, xn]
⇒ continuous or discrete
⇒ univariate or multivariate: vector of variables X = [X1, X2, …], e.g.: [Q, H]
⇒ random or non-random
random: the results vary when the measurement is repeated under identical circumstances
Variables, processes and data
• Series
⇒ continuous or discrete
   equidistant or not
   special case: extreme value series: - annual series
                                      - partial duration series / peak-over-threshold (POT) series
⇒ stationary or non-stationary
   stationary: the statistical properties do not change in time
   non-stationarity or dependence caused by: - trend
                                              - jump
                                              - persistence: dependency in time
⇒ deterministic or stochastic/random
Variables, processes and data
• Processes
process = mathematical description of the behaviour of a phenomenon
(in time; in space; continuous; discrete; deterministic; stochastic)
⇒ stationary or non-stationary
⇒ ergodic or non-ergodic
ergodic: each single realization of the process is fully representative of all possible realizations of the process
(all statistical properties of the process can be obtained from a single realization, i.e. time averages equal ensemble averages)
Variables, processes and data
[Figure: realizations of an ergodic process and of a non-ergodic process]
⇒ population: the ensemble of all possible realizations of the process
Variables, processes and data
• Data
: all observations in the sample, together with all other relevant information
• Errors in data
⇒ random or systematic
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Histogram / frequency distribution / frequency density distribution
Histogram: number of data points in each interval
Frequency distribution: histogram ÷ total number of sample points
Frequency density distribution: frequency distribution ÷ interval length
= EMPIRICAL DISTRIBUTION
⇒ compare with a theoretical distribution
⇒ look for - symmetry
           - outliers
[Figure: histogram of x, with the interval length indicated]
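The three empirical distributions above differ only in their normalisation. The minimal Python sketch below assumes equal-width intervals that start at a chosen origin `x0`. The function name `frequency_tables` and the sample values are illustrative and do not come from the course material:

```python
import math

def frequency_tables(data, x0, width, n_bins):
    """Histogram, frequency distribution and frequency density distribution
    for equal-width intervals [x0 + j*width, x0 + (j+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        j = math.floor((x - x0) / width)
        if 0 <= j < n_bins:              # values outside the chosen range are ignored
            counts[j] += 1
    n = len(data)
    freq = [c / n for c in counts]       # ÷ total number of sample points
    density = [f / width for f in freq]  # ÷ interval length
    return counts, freq, density

sample = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.9]   # illustrative values
counts, freq, density = frequency_tables(sample, x0=0.0, width=2.0, n_bins=3)
print(counts, freq, density)
```

Points outside the chosen intervals are dropped. The densities therefore integrate to 1 only when every sample point falls inside the intervals.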
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Cumulative frequency distribution
[Figure: cumulative frequency versus x, plotted at the interval centres or at all sample points x1, …, xn]
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Time series
of a variable X, possibly after transformation (e.g. ln(X), or the Box-Cox transformation $BC(X) = \frac{X^{\lambda} - 1}{\lambda}$)
[Figure: measured discharge time series and filtered baseflow; discharge [m3/s] versus time [number of hours]]
xt or x(t) for a continuous time series
xi or x(i) for a discrete time series
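The sketch below implements the Box-Cox transformation mentioned above. It assumes the standard form $(x^{\lambda} - 1)/\lambda$, with $\ln(x)$ as the limit for $\lambda \to 0$. `box_cox` is a hypothetical helper name, and the discharges are made-up values:

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation (x**lam - 1) / lam, with ln(x) as the lam -> 0 limit.
    Only defined for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

discharges = [0.8, 1.5, 3.2, 7.4]                 # illustrative values [m3/s]
transformed = [box_cox(q, 0.5) for q in discharges]
```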
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Time series
⇒ time series in aggregated form
[Figure: aggregated values of x versus time t]
aggregation intervals: overlapping (e.g. moving average) or non-overlapping (disjoint)
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Time series
⇒ time series in ranked form
[Figure: values x ranked in descending order versus rank number i = 1, 2, …, n]
100 × (rank number i) / (sample size n) = percentage of time the value x(i) is equalled or exceeded
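The ranked form can be sketched as follows; it is essentially a flow duration curve. The sketch assumes descending ranking and reads $100\,i/n$ as the percentage of time the value is equalled or exceeded. The helper name and the data are illustrative:

```python
def exceedance_percentages(series):
    """Rank the values in descending order; the value with rank i is
    equalled or exceeded about 100*i/n percent of the time."""
    ranked = sorted(series, reverse=True)
    n = len(ranked)
    return [(x, 100.0 * i / n) for i, x in enumerate(ranked, start=1)]

hourly_q = [2.1, 5.4, 3.3, 1.0, 2.8]   # illustrative discharges [m3/s]
for q, pct in exceedance_percentages(hourly_q):
    print(f"{q:4.1f} m3/s  equalled or exceeded {pct:5.1f} % of the time")
```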
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Time series
⇒ ranked form, using only independent values :
e.g. only extremes: annual maxima or POT values
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
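• Empirical quantile plot
p = Pr[X ≤ x], 1 − p = Pr[X ≥ x]
[Figure: empirical quantiles x(i) versus empirical cumulative probabilities p, compared with the theoretical quantiles x(i),th]
with the sample ranked in descending order (x(1) ≥ x(2) ≥ … ≥ x(n)), x(i) is the empirical pi·100% quantile and x(i),th the corresponding theoretical quantile
Hazen plotting position: $1 - p_i = \Pr[X \ge x_{(i)}] = \frac{i - 0.5}{n}$, i.e. $p_i = \frac{n - i + 0.5}{n}$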
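The Hazen plotting positions for a descending ranking can be computed as below. This is a sketch with a hypothetical helper `hazen_positions`. It returns (value, $p_i$, $1 - p_i$) triples:

```python
def hazen_positions(sample):
    """Descending ranking x(1) >= ... >= x(n) with Hazen plotting positions:
    exceedance 1 - p_i = (i - 0.5)/n, non-exceedance p_i = (n - i + 0.5)/n."""
    ranked = sorted(sample, reverse=True)
    n = len(ranked)
    rows = []
    for i, x in enumerate(ranked, start=1):
        exceed = (i - 0.5) / n
        rows.append((x, 1.0 - exceed, exceed))
    return rows

rows = hazen_positions([3.0, 1.0, 2.0, 4.0])
```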
Descriptive statistics
Presentation of data
: graphical presentations (of a sample)
• Q-Q plot
[Figure: empirical quantiles x(i) versus theoretical quantiles x(i),th]
points close to the 45° line indicate that the theoretical distribution fits the data
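The sketch below computes the coordinates of a normal Q-Q plot. It assumes a normal distribution fitted by the method of moments (sample mean and standard deviation). It uses ascending order with $p_i = (i - 0.5)/n$, which is the same Hazen position as the descending formula above. `normal_qq_points` is a hypothetical name:

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """(theoretical quantile, empirical quantile) pairs for a normal Q-Q plot,
    ascending order, Hazen positions p_i = (i - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # normal fitted by moments
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

pts = normal_qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
```

Plotting the pairs and drawing the 45° line gives the Q-Q plot described above.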
Descriptive statistics
Statistical descriptors of data
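• Measures of central tendency of the data
⇒ mean
$\bar{x} = \frac{x_1 + \dots + x_n}{n} = \sum_{i=1}^{n} \frac{x_i}{n}$
for grouped data (k classes with class centres $x_1^*, \dots, x_k^*$ and relative frequencies $f_1, \dots, f_k$):
$\bar{x} = f_1 x_1^* + \dots + f_k x_k^* = \sum_{i=1}^{k} f_i x_i^*$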
Descriptive statistics
Statistical descriptors of data
• Measures of central tendency of the data
⇒ root mean square
$RMS_X = \sqrt{\frac{x_1^2 + \dots + x_n^2}{n}} = \sqrt{\sum_{i=1}^{n} \frac{x_i^2}{n}}$
⇒ median
$\tilde{x} = x_{0.5}$
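The measures of central tendency above, sketched in Python. The helper names and the data are illustrative. `statistics.median` is the standard-library median:

```python
import math
from statistics import median

def sample_mean(xs):
    return sum(xs) / len(xs)                               # x̄ = Σ x_i / n

def grouped_mean(centres, rel_freqs):
    return sum(f * c for f, c in zip(rel_freqs, centres))  # x̄ = Σ f_i x_i*

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))     # RMS_X

data = [2.0, 4.0, 4.0, 5.0, 10.0]   # illustrative values
summary = (sample_mean(data), rms(data), median(data))
```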
Descriptive statistics
Statistical descriptors of data
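⇒ quantiles / percentiles
$x_\alpha$: the value not exceeded by a fraction α of the data
quartiles: $x_{0.25}$, $x_{0.5}$, $x_{0.75}$
Box-plot: [Figure: box from $x_{0.25}$ to $x_{0.75}$ (the interquartile range) with the median $x_{0.5}$ and the mean $\bar{x}$ marked, and whiskers to $x_{min}$ and $x_{max}$]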
Descriptive statistics
Statistical descriptors of data
• Measures of dispersion of the data
⇒ variance
$s_X^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}$
for grouped data:
$s_X^2 = \sum_{i=1}^{k} f_i (x_i^* - \bar{x})^2$
Descriptive statistics
Statistical descriptors of data
• Measures of dispersion of the data
⇒ standard deviation
$s_X = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}}$
⇒ mean deviation
$d_X = \sum_{i=1}^{n} \frac{|x_i - \bar{x}|}{n}$
⇒ coefficient of variation
$CV_X = \frac{s_X}{\bar{x}}$
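A sketch of the dispersion measures above. As in the formulas, $s_X^2$ uses the denominator $n - 1$ and $d_X$ uses the denominator $n$. The helper name `dispersion` and the data are illustrative:

```python
import math

def dispersion(xs):
    """Variance s_X^2, standard deviation s_X, mean deviation d_X and CV_X of a sample."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)   # denominator n - 1
    sd = math.sqrt(var)
    mean_dev = sum(abs(x - m) for x in xs) / n      # denominator n
    cv = sd / m                                     # only meaningful for a positive mean
    return var, sd, mean_dev, cv

var, sd, mean_dev, cv = dispersion([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```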
Descriptive statistics
Statistical descriptors of data
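• Moments
$M_1 = \sum_{i=1}^{n} \frac{|x_i - \bar{x}|}{n}$ ( ↔ mean deviation )
$M_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}$ ( ↔ $s_X$: $s_X^2 = \frac{n}{n - 1} M_2$ )
$M_r = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^r}{n}$: the rth sample moment (about the mean)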
Descriptive statistics
Statistical descriptors of data
• Shape
⇒ symmetry
⇒ skewness
coefficient of skewness: $CS_X = \frac{M_3(x)}{(M_2(x))^{3/2}}$
⇒ kurtosis (peakedness)
coefficient of kurtosis: $CK_X = \frac{M_4(x)}{(M_2(x))^2}$
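The sample moments and shape coefficients can be sketched as below. These are the plain moment ratios defined above, without small-sample bias corrections. The helper names are illustrative. For reference, a normal distribution has $CK = 3$.

```python
def central_moment(xs, r):
    """M_r = Σ (x_i - x̄)^r / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5   # CS_X

def kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2     # CK_X

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 10.0]
```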
Descriptive statistics
Statistical descriptors of data
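• After linear transformation: $u = a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
$\bar{u} = a_1 \bar{x}_1 + a_2 \bar{x}_2 + \dots + a_n \bar{x}_n$
$s_U^2 = a_1^2 s_{X_1}^2 + a_2^2 s_{X_2}^2 + \dots + a_n^2 s_{X_n}^2$ (for uncorrelated variables; otherwise covariance terms are added)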
Elementary probability theory
Probability laws
[Figure: events A and B within the sample space S]
• complementary event: $A^C$ ('not A')
A and $A^C$ are mutually exclusive: $A \cap A^C = \emptyset$, $A \cup A^C = S$
• intersection: $A \cap B$ ('A and B')
• union: $A \cup B$ ('A or B')
• probability: Pr(A)
the 'relative weight' of event A
'proportion in the full population'
'frequency density'
Elementary probability theory
Probability laws
$0 \le \Pr(A) \le 1$
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
$\Pr(A \cap B) = \Pr(A|B)\,\Pr(B) = \Pr(B|A)\,\Pr(A)$ (conditional probabilities)
⇒ Bayes' rule: $\Pr(A|B) = \frac{\Pr(B|A)\,\Pr(A)}{\Pr(B)}$
independence: $\Pr(A|B) = \Pr(A)$, $\Pr(B|A) = \Pr(B)$, $\Pr(A \cap B) = \Pr(A)\,\Pr(B)$
Elementary probability theory
Probability laws
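Theorem of total probability:
$\Pr(B) = \sum_{i=1}^{n} \Pr(B|A_i)\,\Pr(A_i)$
$A_i$: mutually exclusive events with $A_1 \cup A_2 \cup \dots \cup A_n = S$
[Figure: sample space S partitioned into A1, A2, A3, …, An, overlapped by event B]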
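Here is a worked example of the theorem of total probability and Bayes' rule, using hypothetical numbers. B is "a flood occurs in a given year". $A_1, A_2, A_3$ are the mutually exclusive, exhaustive dominant storm types. All probabilities are invented for illustration:

```python
def total_probability(prior, likelihood):
    """Pr(B) = Σ Pr(B|A_i) Pr(A_i); prior holds Pr(A_i) for mutually exclusive A_i covering S."""
    if abs(sum(prior.values()) - 1.0) > 1e-9:
        raise ValueError("the A_i must cover the sample space")
    return sum(likelihood[a] * prior[a] for a in prior)

def bayes(prior, likelihood):
    """Posterior Pr(A_i|B) = Pr(B|A_i) Pr(A_i) / Pr(B)."""
    pb = total_probability(prior, likelihood)
    return {a: likelihood[a] * prior[a] / pb for a in prior}

# hypothetical numbers: A_i = dominant storm type, B = flood in a given year
prior = {"winter": 0.5, "summer": 0.3, "other": 0.2}          # Pr(A_i)
likelihood = {"winter": 0.10, "summer": 0.05, "other": 0.01}  # Pr(B|A_i)
p_flood = total_probability(prior, likelihood)
posterior = bayes(prior, likelihood)
```

With these numbers, Pr(B) = 0.05 + 0.015 + 0.002 = 0.067. Given that a flood occurred, the probability that it was a winter storm is 0.05/0.067 ≈ 0.75.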
Elementary probability theory
Probability functions
Discrete random variable:
probability mass function: $f_X(x) = \Pr(X = x)$
(cumulative) distribution function: $F_X(x) = \Pr(X \le x) = \sum_{\forall x_i \le x} f_X(x_i)$
Continuous random variable:
probability density function: $\Pr(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$, with $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
(cumulative) distribution function: $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$, $f_X(x) = \frac{dF_X(x)}{dx}$, $F_X(-\infty) = 0$, $F_X(+\infty) = 1$
Elementary probability theory
Moments of distributions of random variables
⇒ mean or expected value $\mu_X = E[X]$
for a probability mass function: $\mu_X = \sum_{i=1}^{n} x_i f_X(x_i)$
for a probability density function: $\mu_X = \int_{-\infty}^{+\infty} x f_X(x)\,dx$
⇒ variance $Var[X] = \sigma_X^2$
for a probability mass function: $\sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu_X)^2 f_X(x_i)$
for a probability density function: $\sigma_X^2 = \int_{-\infty}^{+\infty} (x - \mu_X)^2 f_X(x)\,dx$
Elementary probability theory
Moments of distributions of random variables
⇒ mean or expected value $\mu_X = E[X]$
⇒ variance $Var[X] = \sigma_X^2$
$\sigma_X$: standard deviation
$V_X = \frac{\sigma_X}{\mu_X}$: coefficient of variation
⇒ higher order moments
for a probability mass function: $\mu_X^{(r)} = \sum_{i=1}^{n} (x_i - \mu_X)^r f_X(x_i)$
for a probability density function: $\mu_X^{(r)} = \int_{-\infty}^{+\infty} (x - \mu_X)^r f_X(x)\,dx$
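The moments of a probability mass function, sketched with exact fractions for a fair die (an illustrative distribution, not a hydrological one):

```python
from fractions import Fraction

def pmf_moments(pmf):
    """Mean, variance and a function returning the r-th central moment of a
    probability mass function given as {x_i: f_X(x_i)}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())

    def central(r):
        return sum((x - mu) ** r * p for x, p in pmf.items())

    return mu, var, central

die = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die
mu, var, central = pmf_moments(die)
```

For the fair die this gives $\mu_X = 7/2$ and $\sigma_X^2 = 35/12$. The third central moment is 0 because the distribution is symmetric.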
Probability distributions
= Models of distributions of random variables
Normal or Gaussian distribution
$f_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2\right]$ for $-\infty \le x \le +\infty$
$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ → cf. Table
notation: $X \sim N(\mu_X, \sigma_X^2)$
properties:
• symmetric around $\mu_X$
• approximately the distribution of the sum of a large number of iid variables → central limit theorem
• a linear function of (jointly) normally distributed variables is also normally distributed
⇒⇒ model of sums
Probability distributions
Normal or Gaussian distribution
special case: standard normal distribution N(0, 1)
$Z \sim N(0, 1)$, $\Phi(z) = \Pr(Z \le z)$ → cf. Table
$X = \mu_X + Z\sigma_X$ ⇒ $F_X(x) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right)$
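This sketch shows standardisation in practice, using Python's `statistics.NormalDist` (standard library, Python 3.8+). It assumes a hypothetical variable $X \sim N(100, 20^2)$:

```python
from statistics import NormalDist

def normal_cdf_via_z(x, mu, sigma):
    """F_X(x) = Φ((x - µ_X) / σ_X), evaluated with the standard normal N(0, 1)."""
    z = (x - mu) / sigma
    return NormalDist(0.0, 1.0).cdf(z)

# hypothetical variable X ~ N(100, 20^2): probability of a value <= 120
p = normal_cdf_via_z(120.0, 100.0, 20.0)
```

Here `p` is about 0.841, because 120 lies one standard deviation above the mean.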
Probability distributions
Lognormal distribution
Lognormal distribution = normal distribution after ln transformation:
$X \sim LN(\mu_X, \sigma_X^2) \Leftrightarrow \ln X \sim N(\mu_{\ln X}, \sigma_{\ln X}^2)$
relation between $(\mu_X, \sigma_X)$ and $(\mu_{\ln X}, \sigma_{\ln X})$:
$\mu_{\ln X} = \frac{1}{2} \ln\left(\frac{\mu_X^2}{\sigma_X^2/\mu_X^2 + 1}\right)$
$\sigma_{\ln X}^2 = \ln(V_X^2 + 1)$
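The lognormal parameter relations above are implemented below, with the inverse relations added as a round-trip check. The helper names are hypothetical. The inverse formulas $\mu_X = \exp(\mu_{\ln X} + \sigma_{\ln X}^2/2)$ and $\sigma_X^2 = \mu_X^2(\exp(\sigma_{\ln X}^2) - 1)$ are the standard lognormal moments:

```python
import math

def lognormal_log_params(mu_x, sigma_x):
    """(µ_lnX, σ_lnX) from the mean and standard deviation of X."""
    v2 = (sigma_x / mu_x) ** 2                      # V_X^2
    s2_ln = math.log(v2 + 1.0)                      # σ_lnX^2 = ln(V_X^2 + 1)
    mu_ln = 0.5 * math.log(mu_x ** 2 / (v2 + 1.0))  # µ_lnX
    return mu_ln, math.sqrt(s2_ln)

def lognormal_moments(mu_ln, sigma_ln):
    """Inverse relations: µ_X and σ_X from (µ_lnX, σ_lnX)."""
    mu_x = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
    sigma_x = mu_x * math.sqrt(math.exp(sigma_ln ** 2) - 1.0)
    return mu_x, sigma_x

mu_ln, sigma_ln = lognormal_log_params(50.0, 30.0)   # illustrative µ_X = 50, σ_X = 30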
Probability distributions
Lognormal distribution
After power function transformation:
$y = x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \rightarrow \ln y = a_1 \ln x_1 + a_2 \ln x_2 + \dots + a_n \ln x_n$
$X_i \sim LN(\mu_{\ln X_i}, \sigma_{\ln X_i}^2)\ \forall i$, independent → $Y \sim LN(a_1\mu_{\ln X_1} + \dots + a_n\mu_{\ln X_n},\ a_1^2\sigma_{\ln X_1}^2 + \dots + a_n^2\sigma_{\ln X_n}^2)$
⇒⇒ model of products
Probability distributions
Exponential distribution
$f_X(x) = \lambda \exp(-\lambda x)$, $F_X(x) = 1 - \exp(-\lambda x)$ if $x \ge 0$
$f_X(x) = F_X(x) = 0$ if $x < 0$
moments: $\mu_X = \sigma_X = \frac{1}{\lambda}$
⇒⇒ model of time between events
Probability distributions
Exponential distribution
Poisson process: a special case of the occurrence of events in time
[Figure: events occurring along the time axis, starting at t = 0]
properties:
• the probability of an event in a short time interval [t, t+h] is approximately λh
• the probability of more than one event in a short time interval is negligible
• the probability is independent of time and of the events in any other (non-overlapping) interval (= memoryless property)
parameter: λ = average number of events per unit of time
⇒ the times between successive events are exponentially distributed with parameter λ
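A Poisson process can be simulated from exponential inter-event times with mean $1/\lambda$. The sketch below uses a hypothetical helper name and a fixed seed for reproducibility. The checks confirm that the event rate is close to λ:

```python
import random

def simulate_poisson_events(lam, t_end, seed=1):
    """Event times on [0, t_end] of a homogeneous Poisson process with rate lam,
    generated from exponential inter-event times with mean 1/lam."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > t_end:
            return times
        times.append(t)

events = simulate_poisson_events(2.0, 5000.0)
gaps = [b - a for a, b in zip(events, events[1:])]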
Probability distributions
Exponential distribution
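extension with threshold $x_t$: $F_X(x) = 1 - \exp(-\lambda(x - x_t))$ for $x \ge x_t$
extension to a higher order: Gamma distribution:
$f_X(x) = \frac{\lambda^k}{\Gamma(k)} x^{k-1} \exp(-\lambda x)$ if $x \ge 0$
(for integer k: the distribution $f_{T_k}(t)$ of the time $T_k$ until the kth event of a Poisson process)
Gamma function: $\Gamma(k) = \int_{0}^{+\infty} t^{k-1} \exp(-t)\,dt$
moments: $\mu_X = \frac{k}{\lambda}$, $\sigma_X = \frac{\sqrt{k}}{\lambda}$
Gamma distribution with threshold: Pearson III distribution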
Probability distributions
Pareto distribution
$F_X(x) = 1 - x^{-\alpha}$ (for $x \ge 1$)
α: Pareto index
Probability distributions
Weibull distribution
$F_X(x) = 1 - \exp\left[-\left(\frac{x}{\beta}\right)^{\tau}\right]$
τ: Weibull index
Two limiting cases:
τ = 1: exponential distribution ($\lambda = \frac{1}{\beta}$)
τ = 0: Pareto
Probability distributions
Uniform distribution
$f_X(x) = \frac{1}{x_{max} - x_{min}}$ for $x_{min} \le x \le x_{max}$ (0 elsewhere)
⇒⇒ equally likely model
Probability distributions
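Beta distribution
$f_X(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$ for $0 \le x \le 1$
Beta function: $B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$
[Figure: Beta densities for α = 2, α+β = 3; α = 1, α+β = 3; α = 1, α+β = 2; α = 0.5, α+β = 1; and α = 1, 2, 4, 6, 7 with α+β = 8]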
Probability distributions
Normal related or sampling distributions
t-distribution: distribution of the sample mean of a population X (standardised with the sample standard deviation)
Chi-square distribution: distribution of the sample variance of a population X
F-distribution: distribution of the ratio of the sample variances of 2 populations X1 and X2
Normal related or sampling distributions
Chi-square distribution
$CH(n) = \chi_n^2$
$f_X(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} \exp\left(-\frac{x}{2}\right)$ if $x > 0$
[Figure: $f_X(x)$ and $F_X(x)$] → cf. Table
• degrees of freedom: n
• moments: $\mu_X = n$, $\sigma_X = \sqrt{2n}$
• the distribution of the sum of squares of n iid standard normal variables: $Z_1^2 + Z_2^2 + \dots + Z_n^2 \sim \chi_n^2$
• n large: $\chi_n^2$ is asymptotically $N(n, 2n)$
• sampling distribution of the variance: $(n - 1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$
Normal related or sampling distributions
Chi-square distribution
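Distribution of sample variance:
assume: $X_1, X_2, \dots, X_n$ a random sample from $N(\mu_X, \sigma_X^2)$
sample variance: $S_X^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$
calculations:
$X_i \sim N(\mu_X, \sigma_X^2)$ ⇒ $\frac{X_i - \mu_X}{\sigma_X} \sim N(0, 1)$ ⇒ $\sum_{i=1}^{n} \frac{(X_i - \mu_X)^2}{\sigma_X^2} \sim \chi_n^2$
replacing $\mu_X$ by the sample mean $\bar{x}$ costs one degree of freedom ($\chi_n^2 \rightarrow \chi_{n-1}^2$):
$(n - 1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$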
Normal related or sampling distributions
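t-distribution
$f_X(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{n\pi}} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$
[Figure: $f_X(x)$ and $F_X(x)$ for ν = n] → cf. Table
• degrees of freedom: n
• sampling distribution of the mean: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$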
Normal related or sampling distributions
t-distribution
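Distribution of sample mean:
assume: $X_1, X_2, \dots, X_n$ a random sample from $N(\mu_X, \sigma_X^2)$
sample mean: $\bar{X} = \frac{X_1 + \dots + X_n}{n}$
calculations:
$\mu_{\bar{X}} = E[\bar{X}] = \frac{E[X_1] + \dots + E[X_n]}{n} = E[X] = \mu_X$
$\sigma_{\bar{X}}^2 = \frac{\sigma_{X_1}^2 + \dots + \sigma_{X_n}^2}{n^2} = \frac{\sigma_X^2}{n}$ (independent $X_i$)
⇒ $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ ⇒ $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
replacing $\sigma_X$ by $s_X$: $N(0, 1) \rightarrow t_{n-1}$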
Normal related or sampling distributions
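Overview:
• sample mean $\bar{X}$, population variance $\sigma_X^2$ known: $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
• sample mean $\bar{X}$, population variance $\sigma_X^2$ not known: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$
• sample variance $S_X^2$: $(n - 1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$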
Normal related or sampling distributions
F-distribution
Consider 2 populations X1, X2 (normal, with independent samples of sizes n1 and n2):
$\frac{S_{X_1}^2/\sigma_{X_1}^2}{S_{X_2}^2/\sigma_{X_2}^2} \sim F(n_1 - 1, n_2 - 1)$
Probability distributions
Modified distributions
• Truncated distributions
• Compound distributions
Modified distributions: Truncated distributions
e.g. truncated normal distribution $f_X^*(x)$:
[Figure: $f_X(x)$ and the truncated $f_X^*(x)$, and $F_X(x)$ and $F_X^*(x)$, truncated at $x_0 = 0$]
boundary conditions: $F_X^*(x_0) = 0$, $F_X^*(+\infty) = F_X(+\infty) = 1$
⇒ $F_X^*(x) = \frac{F_X(x) - F_X(x_0)}{1 - F_X(x_0)}$ for $x \ge x_0$
Modified distributions: Compound distributions
e.g. mixture of two populations X1 and X2:
$F_X(x) = p_1 F_{X_1}(x) + (1 - p_1) F_{X_2}(x)$, with $p_2 = 1 - p_1$
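Both modified distributions can be built from any base distribution that provides a `cdf` method. The sketch below uses `statistics.NormalDist` and assumes left truncation at $x_0$ and a two-component mixture. The helper names are illustrative:

```python
from statistics import NormalDist

def truncated_cdf(dist, x0):
    """F*(x) = (F(x) - F(x0)) / (1 - F(x0)) for x >= x0, and 0 below x0."""
    f0 = dist.cdf(x0)

    def cdf(x):
        if x < x0:
            return 0.0
        return (dist.cdf(x) - f0) / (1.0 - f0)

    return cdf

def mixture_cdf(p1, dist1, dist2):
    """F(x) = p1 F1(x) + (1 - p1) F2(x) for a mixture of two populations."""
    return lambda x: p1 * dist1.cdf(x) + (1.0 - p1) * dist2.cdf(x)

F_trunc = truncated_cdf(NormalDist(1.0, 2.0), 0.0)   # normal truncated at x0 = 0
F_mix = mixture_cdf(0.3, NormalDist(0.0, 1.0), NormalDist(5.0, 1.0))
```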
Probability distributions
Multivariate distributions
random vector: (X, Y)
$f_{X,Y}(x, y)\,dx\,dy = \Pr[(x \le X \le x + dx) \cap (y \le Y \le y + dy)]$
e.g. bivariate normal distribution:
$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\frac{(x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\}$
Marginal distribution: $f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy$
Conditional distribution: $f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$; for a condition $Y \in B$: $f_{X|Y \in B}(x) = \frac{\int_{B} f_{X,Y}(x, y)\,dy}{\Pr(Y \in B)}$
Probability distributions
Multivariate distributions
for independent random variables X and Y:
$f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$
for correlated random variables X and Y, after linear transformation $Z = aX + bY$:
$E[Z] = a\,E[X] + b\,E[Y]$
$Var[Z] = a^2\,Var[X] + b^2\,Var[Y] + 2ab\,Cov[X, Y]$
Covariance (sample estimate): $Cov[X, Y] = \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Correlation coefficient: $\rho_{X,Y} = \frac{Cov[X, Y]}{\sigma_X\sigma_Y}$
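The variance rule for $Z = aX + bY$ also holds exactly for sample variances and covariances computed with the same $n - 1$ denominator. This gives a quick numerical check. The paired data below are illustrative values only:

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with denominator n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Correlation coefficient Cov[X, Y] / (s_X s_Y)."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

q = [1.0, 2.0, 3.5, 4.0, 6.5]        # illustrative discharges [m3/s]
h = [0.40, 0.62, 0.80, 0.95, 1.30]   # illustrative water levels [m]
a, b = 2.0, -3.0
z = [a * x + b * y for x, y in zip(q, h)]
lhs = sample_cov(z, z)               # Var[Z]
rhs = a ** 2 * sample_cov(q, q) + b ** 2 * sample_cov(h, h) + 2 * a * b * sample_cov(q, h)
```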
Estimation of distribution parameters
Distribution parameters: θ
Estimators $\hat{\theta}$:
• based on quantities derived from the sample
• should be unbiased: $E[\hat{\theta}] = \theta$
Estimation of distribution parameters
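Method of moments
Method: choose $\hat{\theta}$ so that the moments of the theoretical distribution equal the sample moments:
$\mu_X^{(1)}(\hat{\theta}) = M_1$
$\mu_X^{(2)}(\hat{\theta}) = M_2$
…
Examples:
Normal distribution $X \sim N(\mu_X, \sigma_X^2)$: $\hat{\mu}_X = \bar{x}$, $\hat{\sigma}_X^2 = s_X^2$
Exponential distribution: $\mu_X = \frac{1}{\lambda} = \bar{x}$ ⇒ $\hat{\lambda} = \frac{1}{\bar{x}}$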
Estimation of distribution parameters
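Method of maximum likelihood
Method: derive the 'most likely' parameter values by maximising the likelihood (the probability of occurrence) that the sample has been drawn from the distribution
Likelihood function: $L(\theta) = \prod_{i=1}^{n} f_X(x_i | \theta)$
Maximum likelihood method: $L(\hat{\theta}) = \max$ ⇒ $\frac{\partial L(\hat{\theta})}{\partial \theta} = 0$
or, using the log-likelihood function $\ln(L(\theta))$: $\sum_{i=1}^{n} \frac{\partial \ln f_X(x_i | \theta)}{\partial \theta} = 0$
e.g. exponential distribution: $\sum_{i=1}^{n} \frac{\partial \ln(\lambda \exp(-\lambda x_i))}{\partial \lambda} = \sum_{i=1}^{n}\left(\frac{1}{\lambda} - x_i\right) = 0$ ⇒ $\hat{\lambda} = \frac{1}{\bar{x}}$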
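For the exponential distribution, the method of moments and maximum likelihood give the same estimate $\hat{\lambda} = 1/\bar{x}$. The sketch below also evaluates the log-likelihood to confirm the maximum. The data are illustrative, and the helper names are hypothetical:

```python
import math

def exp_log_likelihood(lam, xs):
    """ln L(λ) = Σ ln(λ exp(-λ x_i)) = n ln λ - λ Σ x_i."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def exp_ml_estimate(xs):
    """d ln L / dλ = n/λ - Σ x_i = 0  ⇒  λ̂ = n / Σ x_i = 1 / x̄."""
    return len(xs) / sum(xs)

inter_event_days = [3.0, 12.5, 7.0, 1.5, 9.0, 4.0]   # illustrative data
lam_hat = exp_ml_estimate(inter_event_days)
```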
Confidence intervals: intervals that contain the 'true' parameter value with high probability
e.g.: (1−α)·100% confidence interval for a parameter θ after estimation by an estimator $\hat{\theta}$:
[Figure: sampling density $f_{\hat{\theta}}(\hat{\theta})$ with (α/2)·100% cumulative probability below $\hat{\theta}_1$ and above $\hat{\theta}_2$]
$[\hat{\theta}_1, \hat{\theta}_2]$ = (1−α)·100% two-sided confidence interval
[Figure: sampling density $f_{\hat{\theta}}(\hat{\theta})$ with α·100% cumulative probability below $\hat{\theta}_1$]
$[\hat{\theta}_1, +\infty]$ = (1−α)·100% one-sided confidence interval
Confidence intervals
interpretation of the confidence interval:
NOT: "the true value θ lies with probability 1−α within the interval"
BUT: "if an infinite number of random samples are taken, in (1−α)·100% of the cases the true value θ lies within the interval"
[Figure: intervals $[\hat{\theta}_1, \hat{\theta}_2]$ from sample 1, sample 2, sample 3, sample 4, sample 5, … etc., compared with the fixed true value θ: for (1−α)·100% of the samples the true value θ lies within the interval, while for α·100% of the samples it lies outside the interval]
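Confidence intervals
Example for sampling distributions:
$\theta = \mu_X$, $\hat{\theta} = \bar{X}$ (sample mean):
• population variance $\sigma_X^2$ known: $\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
• population variance $\sigma_X^2$ not known: $\frac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$
$\theta = \sigma_X^2$, $\hat{\theta} = s_X^2$ (sample variance $S_X^2$): $(n - 1)\frac{S_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$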
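Confidence intervals
Example for sampling distributions; e.g. sample mean, population variance known:
$\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$
[Figure: standard normal density of $\frac{\bar{x} - \mu_X}{\sigma_X/\sqrt{n}}$ with α/2 in each tail, bounded by $\Phi^{-1}(\alpha/2) = -z_{\alpha/2}$ and $z_{\alpha/2}$]
⇒ $\Pr\left[-z_{\alpha/2} \le \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \le z_{\alpha/2}\right] = 1 - \alpha$
⇒ (1−α)·100% two-sided confidence interval for $\mu_X$: $\left[\bar{x} - z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}\right]$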
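The interval above, and its frequency interpretation, sketched in Python. `mean_ci_known_sigma` builds $\bar{x} \pm z_{\alpha/2}\sigma_X/\sqrt{n}$. `coverage` repeats the sampling and counts how often the interval contains the true $\mu_X$. Both names are hypothetical, and the normal samples are simulated with a fixed seed:

```python
import math
import random
from statistics import NormalDist

def mean_ci_known_sigma(xs, sigma, alpha=0.05):
    """Two-sided (1 - α)100% interval x̄ ± z_{α/2} σ_X / √n, with σ_X known."""
    n = len(xs)
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{α/2} = -Φ^{-1}(α/2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def coverage(mu, sigma, n, reps=2000, alpha=0.05, seed=7):
    """Fraction of repeated samples whose interval contains the true µ_X."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = mean_ci_known_sigma(xs, sigma, alpha)
        hits += lo <= mu <= hi
    return hits / reps
```

The coverage comes out close to 0.95. This illustrates the correct interpretation: the interval varies from sample to sample, while $\mu_X$ is fixed.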
Extreme value analysis
: searching for the distribution in the tail of fX(x) and FX(x)
in the quantile plot:
[Figure: independent extreme values (POT values) of discharge x [m3/s] versus −ln(1 − G(x)) = −ln("exceedance probability"), empirically −ln(i/(m+1)); above a threshold $x_t$ the points follow the extreme value distribution (Hill-type regression, optimal threshold)]
Extreme value analysis
• Extraction of independent values from the time series: 'Peak-Over-Threshold (POT)' or 'Partial-Duration-Series (PDS)' method: consider independent POT values $x_1 \ge x_2 \ge \dots \ge x_m$
• Pickands (1975): for $x_i > x_t$: $F_{X|X \ge x_t}$ → Generalized Pareto distribution (GPD)
$G(x) = 1 - \left(1 + \gamma\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ for γ ≠ 0
$G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$ for γ = 0
• Extreme value index γ: a measure of the tail heaviness of the distribution
Extreme value analysis
• γ > 0: Pareto class; heavy tails
• γ = 0: Gumbel/exponential class; normal tails
• γ < 0: finite right endpoint; light tails
[Figure: probability density fX(x) versus x for a positive, zero and negative extreme value index]
Extreme value analysis
Exponential quantile plot:
[Figure: POT discharge values [m3/s] versus −ln(1 − G(x)) = −ln("exceedance probability"), empirically −ln(i/(m+1)); Hill-type regression above the optimal threshold $x_t$, with slope β]
$G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$ ⇔ $x = x_t + \beta\,[-\ln(1 - G(x))]$: above $x_t$ the points follow a straight line with slope β
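The sketch below builds the exponential quantile plot and estimates its slope. It fits a plain least-squares line through all plotted POT values. This is a simplification, not the Hill-type regression with optimal threshold shown in the course figure. The helper names are hypothetical:

```python
import math

def exponential_qq(pot_values):
    """Points (-ln(i/(m+1)), x_i) with x_1 >= x_2 >= ... >= x_m."""
    xs = sorted(pot_values, reverse=True)
    m = len(xs)
    return [(-math.log(i / (m + 1)), x) for i, x in enumerate(xs, start=1)]

def ls_slope(points):
    """Least-squares slope of x on -ln(i/(m+1)); estimates β if the GPD with γ = 0 holds."""
    n = len(points)
    mu = sum(u for u, _ in points) / n
    mx = sum(x for _, x in points) / n
    sxy = sum((u - mu) * (x - mx) for u, x in points)
    sxx = sum((u - mu) ** 2 for u, _ in points)
    return sxy / sxx

# synthetic check: values placed exactly on the line x = x_t + β(-ln(i/(m+1)))
xt, beta, m = 5.0, 3.0, 20
values = [xt + beta * (-math.log(i / (m + 1))) for i in range(1, m + 1)]
```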
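Extreme value analysis
Pareto quantile plot:
[Figure: ln(discharge x [m3/s]) of the POT values versus −ln(1 − G(x)) = −ln("exceedance probability"), empirically −ln(i/(m+1)); Hill-type regression above the optimal threshold $x_t$, with slope γ]
$G(x) = 1 - \left(1 + \gamma\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ with $\beta = \gamma x_t$
⇔ $G(x) = 1 - \left(\frac{x}{x_t}\right)^{-1/\gamma}$ ⇔ $\ln x = \ln x_t + \gamma\,[-\ln(1 - G(x))]$: above $x_t$ the points follow a straight line with slope γ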
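The slope in the Pareto quantile plot is closely related to the classical Hill estimator of γ. The Hill estimator is the mean of $\ln x_{(i)} - \ln x_t$ over the k largest values, with $x_t$ the (k+1)th largest value. The sketch below implements it; it is not the course's Hill-type regression. It is checked on a synthetic strict Pareto sample with γ = 0.5:

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimate of the extreme value index γ from the k largest values;
    the threshold x_t is the (k+1)-th largest value."""
    xs = sorted(sample, reverse=True)
    log_xt = math.log(xs[k])
    return sum(math.log(x) - log_xt for x in xs[:k]) / k

# synthetic strict Pareto sample F(x) = 1 - x**(-1/γ), x >= 1, with γ = 0.5
rng = random.Random(3)
gamma_true = 0.5
sample = [(1.0 - rng.random()) ** (-gamma_true) for _ in range(20000)]
gamma_hat = hill_estimator(sample, k=2000)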
Extreme value analysis
Overview:
Extreme value index γ > 0: GEV distribution; GPD distribution → Pareto QQ-plot
Extreme value index γ = 0: Gumbel distribution; exponential distribution → exponential QQ-plot; Weibull distribution → Weibull QQ-plot
Extreme value analysis
Examples:
[Figure: exponential QQ-plot, dataset 1 (normal tail): x versus −ln(1 − G(x))]
[Figure: slope in the exponential QQ-plot, dataset 1 (normal tail): slope and MSE versus the number of observations above the threshold]
[Figure: Pareto QQ-plot, dataset 1 (normal tail): ln(x) versus −ln(1 − G(x))]
[Figure: slope in the Pareto QQ-plot, dataset 1 (normal tail): slope and MSE versus the number of observations above the threshold]
[Figure: Pareto QQ-plot, dataset 2 (heavy tail): ln(x) versus −ln(1 − G(x))]
[Figure: slope in the Pareto QQ-plot, dataset 2 (heavy tail): extreme value index and MSE versus the number of observations above the threshold]
Extreme value analysis
Two methods for extraction of independent extremes from a time series:
• Annual (periodic) maxima method
• Peak-Over-Threshold (POT) or Partial-Duration-Series (PDS) method
[Figure: measured discharge series with the selected POT values; discharge [m3/s] versus time, 1 Nov 1993 to 31 Dec 1993]
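The sketch below implements both extraction methods. The POT part uses simple runs declustering (one peak per run of consecutive values above the threshold) as an assumed independence criterion. Practical studies often add further criteria, such as a minimum time between peaks. The helper names and data are hypothetical:

```python
def annual_maxima(records):
    """records: iterable of (year, value) pairs; returns {year: annual maximum}."""
    maxima = {}
    for year, value in records:
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima

def pot_peaks(series, threshold):
    """One peak per run of consecutive values above the threshold (runs declustering)."""
    peaks, current = [], None
    for value in series:
        if value > threshold:
            current = value if current is None else max(current, value)
        elif current is not None:     # run ended: keep its maximum
            peaks.append(current)
            current = None
    if current is not None:           # run still open at the end of the series
        peaks.append(current)
    return peaks

peaks = pot_peaks([1, 3, 5, 4, 1, 2, 6, 7, 2, 1, 4], threshold=2.5)
```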
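Extreme value analysis
Annual (periodic) maxima method: Generalized Extreme Value (GEV) distribution
$H(x) = \exp\left(-\left(1 + \gamma\frac{x - x_t}{\beta}\right)^{-1/\gamma}\right)$ if γ ≠ 0
$H(x) = \exp\left(-\exp\left(-\frac{x - x_t}{\beta}\right)\right)$ if γ = 0
(link between both methods: Poisson process for the occurrence of threshold exceedances)
Peak-Over-Threshold (POT) or Partial-Duration-Series (PDS) method: Generalized Pareto Distribution (GPD)
$G(x) = 1 - \left(1 + \gamma\frac{x - x_t}{\beta}\right)^{-1/\gamma}$ if γ ≠ 0
$G(x) = 1 - \exp\left(-\frac{x - x_t}{\beta}\right)$ if γ = 0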
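The GEV and GPD distribution functions above are sketched below, with explicit handling of the γ = 0 case and of the finite endpoint when γ < 0. In the GEV, `xt` and `beta` act as location and scale parameters, following the notation above. The helper names are hypothetical:

```python
import math

def gpd_cdf(x, xt, beta, gamma):
    """G(x) for POT values above the threshold xt."""
    if x < xt:
        return 0.0
    y = (x - xt) / beta
    if gamma == 0.0:
        return 1.0 - math.exp(-y)
    base = 1.0 + gamma * y
    if base <= 0.0:               # beyond the finite right endpoint (γ < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / gamma)

def gev_cdf(x, xt, beta, gamma):
    """H(x) for annual (periodic) maxima."""
    y = (x - xt) / beta
    if gamma == 0.0:
        return math.exp(-math.exp(-y))
    base = 1.0 + gamma * y
    if base <= 0.0:               # outside the support: below it if γ > 0, above it if γ < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-base ** (-1.0 / gamma))
```

As γ → 0, both expressions tend to their γ = 0 forms. The checks below compare a very small γ with γ = 0.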
Extreme value analysis
Extreme value index γ > 0: method of periodic maxima → GEV distribution; POT method → GPD distribution
Extreme value index γ = 0: method of periodic maxima → Gumbel distribution; POT method → exponential distribution
Extreme value analysis
Return period:
for POT extremes:
$T(x) = \frac{n}{t}\,\frac{1}{P[X > x \mid X > x_t]}$, i.e. $T\,[\text{years}] = \frac{n}{t}\,\frac{1}{1 - G(x)}$
n: total number of years; t: number of exceedances of the threshold level $x_t$; G(x): extreme value distribution; T: return period [years]
Extreme value analysis
Return period:
for annual maxima:
$T\,[\text{years}] = \frac{1}{1 - H(x)}$
relation between both return periods: $\frac{1}{T_{AM}} = 1 - \exp\left(-\frac{1}{T_{POT}}\right)$
Extreme value analysis
Frequency factor $K_T$:
$T\,[\text{years}] = \frac{n}{t}\,\frac{1}{1 - G(x)} = \frac{1}{1 - H(x)} = g(x)$
$x = g^{-1}(T) = \mu_X + K_T\,\sigma_X$
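The return period relations and the frequency factor are sketched below in Python. The Gumbel frequency factor $K_T = -\frac{\sqrt{6}}{\pi}\left(0.5772 + \ln\ln\frac{T}{T-1}\right)$ is the standard textbook expression for a Gumbel distribution fitted by moments. It is added here as an example and is not taken from the slides. All names and numbers are illustrative:

```python
import math

def return_period_pot(n_years, n_exceedances, exceed_prob):
    """T = (n/t) / (1 - G(x)); exceed_prob = 1 - G(x) = P[X > x | X > x_t]."""
    return n_years / (n_exceedances * exceed_prob)

def return_period_am(h):
    """T = 1 / (1 - H(x)) for annual maxima, with h = H(x)."""
    return 1.0 / (1.0 - h)

def t_am_from_t_pot(t_pot):
    """1/T_AM = 1 - exp(-1/T_POT)."""
    return 1.0 / (1.0 - math.exp(-1.0 / t_pot))

def gumbel_frequency_factor(t):
    """K_T for a Gumbel distribution fitted by moments (T > 1 year)."""
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(t / (t - 1.0))))

# hypothetical annual-maximum statistics: mean 120 m3/s, standard deviation 40 m3/s
x_100 = 120.0 + gumbel_frequency_factor(100.0) * 40.0   # x_T = µ_X + K_T σ_X
```

Consider 60 threshold exceedances in 30 years with $1 - G(x) = 0.05$. Then `return_period_pot(30, 60, 0.05)` gives T = 10 years. The corresponding annual-maximum return period `t_am_from_t_pot(10.0)` is about 10.5 years.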
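Extreme value analysis
Confidence limits:
• empirical:
– parametric bootstrap method
– non-parametric bootstrap method
• analytical for the ML method: the covariance matrix of the parameter estimates follows (asymptotically) from the second derivatives of the log-likelihood L:
$$\begin{bmatrix} Var(\theta_1) & Cov(\theta_1, \theta_2) & \dots \\ Cov(\theta_2, \theta_1) & Var(\theta_2) & \dots \\ \dots & \dots & \dots \end{bmatrix} \approx -\begin{bmatrix} \frac{\partial^2 L}{\partial \theta_1^2} & \frac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \dots \\ \frac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 L}{\partial \theta_2^2} & \dots \\ \dots & \dots & \dots \end{bmatrix}^{-1}$$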
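The sketch below computes a non-parametric bootstrap percentile interval. It is applied to the ML estimate $\hat{\lambda} = 1/\bar{x}$ of an exponential distribution fitted to illustrative POT excesses. The values, the seed and the helper name are assumptions:

```python
import random

def bootstrap_ci(sample, estimator, n_boot=2000, alpha=0.10, seed=11):
    """Non-parametric bootstrap: resample with replacement, re-estimate,
    and return the empirical α/2 and 1 - α/2 quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(sample) for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def exp_ml(xs):
    return len(xs) / sum(xs)   # λ̂ = 1 / x̄

excesses = [0.4, 1.9, 0.7, 3.2, 0.1, 1.1, 2.6, 0.5, 0.9, 1.4, 4.8, 0.3]  # illustrative [m3/s]
lam_hat = exp_ml(excesses)
lower, upper = bootstrap_ci(excesses, exp_ml)
```

A parametric bootstrap follows the same loop. The difference is that each resample is drawn from the fitted distribution instead of from the data.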
Extreme value analysis
Naturalization of data:
abstractions, discharges and other man-made influences may cause dependency or non-randomness in the data
⇒ elimination of these influences
[Figure: influences on the river system: urban drainage system, untreated domestic sources, WWTP, sewer system ancillaries (SST), watercourses, industrial sources, rainfall-runoff, agricultural pollution, upstream discharges]
Extreme value analysis
Other influences:
• influence of river flooding (floodplain or bank storage)
• inaccurate extrapolation of the rating curve
• outliers
Extreme value analysis
Flooding influence
[Figure: extreme value analysis of a discharge time series: river discharge measurements, equivalent upstream rainfall-runoff discharges before and after correction of the rating curve extrapolation, and the calibrated extreme value distribution (level q* indicated); discharge [m3/s] versus return period [years]]
Extreme value analysis
Flooding influence
[Figure: river discharge measurements and equivalent upstream rainfall-runoff discharges (estimated by inverse river modelling), level q* indicated; discharge [m3/s] versus time [number of hours]]
Extreme value analysis
Influence of rating curve extrapolation
[Figure: original rating curve (Q-H relation) and rating curve corrected for the flooding influence (estimated flattening of the Q-H relation); water level [m] versus discharge [m3/s]]
Extreme value analysis
Influence of outliers
[Figure: simulation results of a rainfall-runoff model for the periods 1986-1996 and 1898-1997, with the extreme value distribution; discharge [m3/s] versus return period [years]]
Extreme value analysis
For minima (e.g. low flow or drought frequency analysis)
• the extreme value analysis method for floods remains valid after the transformation x → −x or x → 1/x
• the lower limit of zero discharge has to be taken into account
⇒ with the transformation −x the lower limit becomes an upper limit (bounded case; light tail)
⇒ with the transformation 1/x the case is unbounded (normal or heavy tail)
Extreme value analysis
For minima (e.g. low flow or drought frequency analysis)
• when zero discharges occur, a bi-modal probability distribution model has to be considered
⇒ separation of the zero-flow and non-zero-flow conditions (the extreme value analysis method is only valid for the latter)
• for the POT method, low flow periods are considered independent when they are separated by a high flow period
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. rainfall intensities → IDF-curves
[Figure: IDF-curves, rainfall intensity [mm/h] versus aggregation level [days]]
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. rainfall intensities → IDF-curves → design storms
[Figure: design storm, rainfall intensity [mm/h] versus time [h]]
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. discharges → QDF-curves → synthetic hydrographs
[Figure: historical event and composite hydrograph; discharge [m3/s] versus time [number of hours]]
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. low flow discharges
Consideration of the duration of the low flow or drought period
⇒ discharge/duration/frequency relationships
⇒ considering different durations relevant for the various applications:
– agricultural applications
– irrigation
– power plants
– domestic supply
– pollution
– etc.
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. water levels
HDF-curves
synthetic limnigraphs
Extreme value analysis
Consideration of the time duration (the aggregation level):
e.g. concentrations → CDF-curves → immission standards
[Figure: DO concentration [mg/l] versus return period [years]: model results (1 h), VMM immission measurements, intermittent standards (DWPCC, 1985; FWR, 1998), with classes A, B and C, 'Water with fish powder', fishery with trout and fishery with carp]
Extreme value analysis
Ungauged locations - Regionalisation analysis:
Step 1:
Identification of homogeneous subregions for the statistical properties, based on:
– meteorological, geological and geomorphological characteristics
– rainfall-runoff model parameters
– extreme value distribution parameters of peak discharges and/or rainfall intensities
– more global statistics of the discharge series, such as the coefficients of variation, skewness and kurtosis (CV, CS, CK)
Extreme value analysis
Ungauged locations - Regionalisation analysis:
Step 2:
Derivation of relationships for each homogeneous region between the
distribution parameters and catchment characteristics (area,
length, topography/slope, land use, soil type, …)
Most common approach:
– Mean extreme value (e.g. annual maximum method: mean annual
maximum MAF): site-specific, related to the catchment characteristics
– Growth curve = extreme value distribution of X/MAF: identical
within a region
⇒ Per homogeneous region: regional growth curve based on the data of all
stations
⇒ Partly solves the data limitation problem
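The growth-curve approach can be sketched as follows, under stated assumptions: each station's MAF is the mean of its annual maxima, all X/MAF values of the region are pooled, and a Gumbel distribution is fitted to the pooled values by the method of moments. The Gumbel choice, the station names and the function name `regional_growth_curve` are assumptions for illustration, not a prescribed method.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant (offset of the Gumbel mean)


def regional_growth_curve(stations):
    """stations: dict station name -> list of annual maximum discharges."""
    maf = {name: statistics.fmean(q) for name, q in stations.items()}
    # normalise each annual maximum by the MAF of its own station, then pool
    pooled = [x / maf[name] for name, q in stations.items() for x in q]
    # Gumbel fit to the pooled X/MAF values by the method of moments (assumption)
    beta = math.sqrt(6.0) / math.pi * statistics.stdev(pooled)
    u = statistics.fmean(pooled) - EULER_GAMMA * beta

    def growth_factor(T):
        """X/MAF for return period T years (non-exceedance probability 1 - 1/T)."""
        return u - beta * math.log(-math.log(1.0 - 1.0 / T))

    return maf, pooled, growth_factor


maf, pooled, growth = regional_growth_curve(
    {"A": [10.0, 20.0, 30.0], "B": [100.0, 200.0, 300.0]})  # hypothetical data
print(maf["A"] * growth(100))   # 100-year estimate at station A
```

For an ungauged site, the MAF would instead come from the relationship with catchment characteristics; the T-year estimate is then MAF times the regional growth factor.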
Extreme value analysis
Data problems:
• Gaps in the record (especially during the important periods with flood
conditions)
• Differences in record length between stations
• Shortage of data
• Non-homogeneous series due to morphological changes
• Systematic measurement errors (outliers)
Extreme value analysis
Filling of missing gaps:
• Calculation of the correlation between stations
⇒ based on individual values and on cumulative amounts
• Filling up of missing gaps
⇒ based on the measurements of neighbouring stations and the
calculated correlation with these stations
⇒ larger gaps: use of a rainfall-runoff model
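A minimal sketch of gap filling from one neighbouring station, assuming a linear relation between both series and using `None` to mark a missing value; the correlation is computed on individual values only, and the function name `fill_gaps` is hypothetical.

```python
import math


def fill_gaps(target, neighbour):
    """Fill None values in `target` from a linear relation with `neighbour`.

    Returns the filled series and the correlation coefficient r between both
    stations, computed on the time steps where both have data.
    """
    pairs = [(x, y) for x, y in zip(neighbour, target)
             if x is not None and y is not None]
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    syy = sum((y - y_mean) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx                      # regression of target on neighbour
    a = y_mean - b * x_mean
    # keep observed values; estimate a missing value only if the neighbour has data
    filled = [y if y is not None else (a + b * x if x is not None else None)
              for x, y in zip(neighbour, target)]
    return filled, r


filled, r = fill_gaps([3.0, 5.0, None, 9.0, 11.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(filled, r)
```

In practice a low r would argue for another neighbour station or, for larger gaps, a rainfall-runoff model.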
Selection of the type of distribution
Based on:
• Shape of the distribution
• Boundary conditions
• Knowledge and understanding of the physics and its influence on the
distribution
• Distribution class
Selection of the type of distribution
Distribution class based on the distribution's tail (cf. extreme value analysis):
• Extreme value index γ > 0 (heavy tail): Pareto distribution; GPD and GEV distributions with γ > 0
• Extreme value index γ = 0, subdivided according to τ:
– τ = 0: lognormal distribution
– τ = 1: exponential and Gamma distributions
– τ > 0: normal and Weibull distributions
– GPD and GEV distributions with γ = 0
• Extreme value index γ < 0 (bounded tail): Beta and uniform distributions; GPD and GEV distributions with γ < 0
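To illustrate how the extreme value index γ shapes the tail, a sketch of the GEV cumulative distribution function with location µ, scale σ and index γ, where γ = 0 is the Gumbel limit. The function and parameter names are illustrative.

```python
import math


def gev_cdf(x, mu, sigma, gamma):
    """GEV cumulative distribution function; gamma = extreme value index."""
    z = (x - mu) / sigma
    if gamma == 0.0:
        return math.exp(-math.exp(-z))          # Gumbel limit (gamma = 0)
    t = 1.0 + gamma * z
    if t <= 0.0:
        # outside the support: below the lower bound if gamma > 0,
        # above the upper bound mu - sigma/gamma if gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


# exceedance probability at x = 10 for a heavy, an exponential-type and a bounded tail
for g in (0.5, 0.0, -0.5):
    print(g, 1.0 - gev_cdf(10.0, 0.0, 1.0, g))
```

With γ = −0.5 the upper bound lies at µ − σ/γ = 2, so the exceedance probability at x = 10 is zero, while γ = 0.5 gives a far larger exceedance probability than the Gumbel case.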
Statistical hypothesis testing
H0 : null-hypothesis
= a statement about a random variable
if the observations in a random sample are consistent with H0
: ACCEPT H0
if not consistent with H0
: REJECT H0
The decision is based on a STATISTICAL HYPOTHESIS TEST
Statistical hypothesis testing
[Figure: known distribution f_T(t) of the test statistic T if H0 is true; H0 is rejected when T falls in either tail (probability α/2 each) and accepted in between (probability 1 − α)]
α = SIGNIFICANCE LEVEL of the test
= probability that H0 is rejected while H0 is true
→ type I error
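A minimal sketch of the two-sided decision rule, assuming the test statistic follows N(0,1) when H0 is true; for other test distributions (t, χ², F) only the quantile function changes. It uses the standard-library `statistics.NormalDist`; the function name `two_sided_test` is hypothetical.

```python
from statistics import NormalDist


def two_sided_test(t, alpha=0.05):
    """Decision for a test statistic T that follows N(0,1) if H0 is true."""
    # boundary of the upper rejection region, which carries probability alpha/2
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(t) > t_crit else "accept H0"


print(two_sided_test(2.3), two_sided_test(2.3, alpha=0.01))
```

Lowering α widens the acceptance region, which reduces the type I error probability at the cost of more type II errors.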
Statistical hypothesis testing
Decision versus reality:
                H0 true         H0 wrong
Accept H0       no error        Type II error
Reject H0       Type I error    no error
type II error = the error of accepting H0 when H0 is not true
β = probability of a type II error under an alternative hypothesis H1
1 − β = POWER of the test
Statistical hypothesis testing
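Example: use of the sample mean \(\bar{X}\) as a test statistic for a
hypothesis about the population mean \(\mu_X\)
e.g.: \(H_0: \mu_X = \mu_0\)
Alternative hypothesis \(H_1: \mu_X = \mu_1\)
Test statistic: \(\bar{X}\), with \(\bar{X} \sim N(\mu_0, \sigma_X^2/n)\) if H0 is true
[Figure: f_X̄(x) under H0 (centred at µ0) and under H1 (centred at µ1); H0 is rejected in the two tails of probability α/2 each; β is the probability under H1 of falling in the acceptance region]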
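The type II error probability of this example can be computed directly, assuming σ_X is known: H0 is accepted when X̄ lies within µ0 ± z·σ_X/√n, while under H1 X̄ ~ N(µ1, σ_X²/n). A sketch with the hypothetical function name `type_ii_error`:

```python
import math
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """beta = P(accept H0 | H1: mu_X = mu1) for the two-sided test on the mean."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)                 # acceptance region: mu0 ± z·sigma/√n
    delta = (mu1 - mu0) / (sigma / math.sqrt(n))   # shift of X̄ under H1, in standard errors
    return std.cdf(z - delta) - std.cdf(-z - delta)


for n in (10, 50):
    beta = type_ii_error(mu0=0.0, mu1=0.5, sigma=1.0, n=n)
    print(n, round(beta, 3), round(1 - beta, 3))   # beta and power 1 - beta
```

Increasing n shrinks σ_X²/n, so β decreases and the power 1 − β increases; for µ1 = µ0 the result reduces to 1 − α.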
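Examples of hypothesis tests
(per test: H0; test statistic; distribution of the test statistic if H0 is true; name of the test; (a) denotes an optional constant factor)
• \(H_0: \mu_X = \mu_0\); sample mean \(\bar{X}\); \(\dfrac{\bar{X} - \mu_X}{S_X/\sqrt{n}} \sim t_{n-1}\) or N(0,1); t-test
• \(H_0: \mu_{X_1} = (a)\,\mu_{X_2}\); difference of sample means \(\bar{X}_1 - (a)\bar{X}_2\); \(\dfrac{\bar{X}_1 - (a)\bar{X}_2 - (\mu_{X_1} - (a)\mu_{X_2})}{\sqrt{S_{X_1}^2/n_1 + (a^2)\,S_{X_2}^2/n_2}} \sim t_{n_1+n_2-2}\) or N(0,1); t-test
• \(H_0: \sigma_X^2 = \sigma_0^2\); sample variance \(S_X^2\); \((n-1)\dfrac{S_X^2}{\sigma_X^2} \sim \chi^2_{n-1}\); χ²-test
• \(H_0: \sigma_{X_1}^2 = (a)\,\sigma_{X_2}^2\); ratio of sample variances; \(\dfrac{S_{X_1}^2}{(a)\,S_{X_2}^2} \sim F(n_1 - 1, n_2 - 1)\); F-test
• \(H_0: p = p_0\); sample proportion \(\hat{P}\); \(\dfrac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})/n}} \sim N(0,1)\) for n large
• \(H_0: \rho = \rho_0\); sample correlation coefficient R; \(\dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}} \sim t_{n-2}\) if ρ = 0
• \(H_0: \lambda = \lambda_0\); λ-parameter estimate \(\hat{\lambda}\) of a Poisson process; \(\dfrac{\hat{\lambda} - \lambda}{\sqrt{\hat{\lambda}/n}} \sim N(0,1)\) for n large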
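Two of the test statistics above as a sketch (function names hypothetical); the resulting values still have to be compared with t_{n−1} or t_{n−2} quantiles from a table or a statistics library.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """(X̄ − µ0) / (S_X / √n); follows t_{n−1} if H0: µX = µ0 is true."""
    n = len(sample)
    s = statistics.stdev(sample)          # sample standard deviation (divisor n − 1)
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))


def correlation_t(x, y):
    """R √(n − 2) / √(1 − R²); follows t_{n−2} if ρ = 0."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)        # sample correlation coefficient R
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(one_sample_t([2.0, 4.0, 6.0], 2.0))
print(correlation_t([1, 2, 3, 4], [1, 3, 2, 4]))
```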
Examples of hypothesis tests
• H0: serial correlation of order 0: r = r0
test statistic: Wald-Wolfowitz statistic \(R = \sum_{i=1}^{n} X_i X_{i+1}\) (with \(X_{n+1} = X_1\))
distribution: \(\dfrac{R + a_2/(n-1)}{a_2/\sqrt{n-1}} \sim N(0,1)\) if there is no serial correlation, with \(a_2 = \sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6}\)
name of the test: Wald-Wolfowitz test
• H0: no long-term trend in the time series
test statistic: Mann-Kendall statistic \(T_k = \sum_{i=1}^{k} N_i\), with N_i the number of earlier sample points X_j (j < i) for which \(X_j < X_i\)
distribution: \(\dfrac{T_k - k(k-1)/4}{\sqrt{k(k-1)(2k+5)/72}} \sim N(0,1)\) if there is no trend
name of the test: Mann-Kendall trend test
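A sketch of the sequential Mann-Kendall statistic u(k) = (T_k − k(k−1)/4)/√(k(k−1)(2k+5)/72), computed for k = 2, …, n. Ties are simply not counted in N_i here, which is a simplifying assumption; the function name is hypothetical.

```python
import math


def sequential_mann_kendall(series):
    """u(k) = (T_k - k(k-1)/4) / sqrt(k(k-1)(2k+5)/72) for k = 2, ..., n."""
    u_values = []
    t_k = 0
    for i, x_i in enumerate(series):
        # N_i: number of earlier values X_j (j < i) smaller than X_i
        # (ties are not counted, a simplification)
        t_k += sum(1 for x_j in series[:i] if x_j < x_i)
        k = i + 1
        if k < 2:
            continue                        # variance is zero for k = 1
        mean = k * (k - 1) / 4
        variance = k * (k - 1) * (2 * k + 5) / 72
        u_values.append((t_k - mean) / math.sqrt(variance))
    return u_values


u = sequential_mann_kendall([1.2, 0.8, 1.5, 1.9, 1.7, 2.4, 2.2, 2.8])  # hypothetical series
print(u[-1])   # compare |u| with a N(0,1) quantile, e.g. 1.96 for alpha = 0.05
```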
Examples of hypothesis tests
Tests for the validity of a specific theoretical probability distribution:
• χ² goodness-of-fit test
test statistic: histogram statistic \(T = \sum_{i=1}^{k} \dfrac{(N_i - e_i)^2}{e_i} \sim \chi^2_{k-1}\)
with N_i: the number of sample points in interval i of the histogram (k intervals)
e_i: the expected number of sample points in interval i, corresponding to the theoretical distribution f_X(x): \(e_i = n \int_{\text{interval } i} f_X(x)\,dx\)
(each distribution parameter estimated from the sample reduces the degrees of freedom by one)
• Kolmogorov-Smirnov goodness-of-fit test
test statistic: Kolmogorov-Smirnov statistic \(D = \max_{i=1,\dots,n} |p_i - F_X(x_i)|\), with p_i the empirical probability of the i-th ranked sample value x_i
distribution: Kolmogorov-Smirnov table
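A sketch of both goodness-of-fit statistics, assuming the observed and expected class counts are already available and taking p_i = i/n as the empirical probability of the i-th ranked value (other plotting positions are possible). Critical values still come from the χ² and Kolmogorov-Smirnov tables; the function names are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """T = sum over the histogram intervals of (N_i - e_i)^2 / e_i."""
    return sum((n_i - e_i) ** 2 / e_i for n_i, e_i in zip(observed, expected))


def ks_statistic(sample, cdf):
    """D = max_i |p_i - F_X(x_i)| with p_i = i/n for the i-th ranked value."""
    xs = sorted(sample)
    n = len(xs)
    return max(abs((i + 1) / n - cdf(x)) for i, x in enumerate(xs))


def uniform_cdf(x):
    """Theoretical F_X(x) of a uniform distribution on [0, 1] (example)."""
    return min(max(x, 0.0), 1.0)


print(chi_square_statistic([10, 20], [15, 15]))
print(ks_statistic([0.3, 0.9, 0.1, 0.6], uniform_cdf))
```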
Regression and correlation
linear regression: \(y = a + b x\)
[Figure: data points (x_i, y_i) around the line y = a + b x; y: dependent variable; x: independent or explanatory variable; error or residual \(e_i = y_i - (a + b x_i)\)]
Regression and correlation
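estimation of the parameters of the regression curve
by the least squares method:
minimization of the mean squared error MSE:
\[ MSE = \sum_{i=1}^{n} \frac{e_i^2}{n-2} = \sum_{i=1}^{n} \frac{(y_i - (a + b x_i))^2}{n-2} \]
\[ \frac{\partial MSE}{\partial a} = 0, \qquad \frac{\partial MSE}{\partial b} = 0 \]
\[ \Rightarrow \quad \hat{b} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} \]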
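The closed-form least squares estimators can be sketched in plain Python (hypothetical function name); the MSE uses the n − 2 divisor of the definition above.

```python
def least_squares(x, y):
    """Return (a_hat, b_hat, MSE) for the regression line y = a + b x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b = (sum((yi - y_mean) * (xi - x_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
    a = y_mean - b * x_mean
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, mse


print(least_squares([0, 1, 2, 3], [1, 3, 2, 4]))   # hypothetical data
```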
Regression and correlation
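ideal case: errors in the y-direction independent of x
(a transformation of y may be needed):
the errors can then be represented by a single distribution f_E(e), with variance estimated by
\[ s_E^2 = \sum_{i=1}^{n} \frac{e_i^2}{n-2} = MSE \]
[Figure: distribution f_E(e) centred at 0, with standard deviation σ_E]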
Regression and correlation
Prediction of Y based on x:
[Figure: regression line y = a + b x with band ±σ_E; at x_i, Y_i has distribution f_{Y_i|x_i}(y_i) centred at µ_{Y_i|x_i}]
Regression and correlation
Assuming the model y = a + bx is known:
\[ Y_i = a + b x_i + E, \qquad E \sim N(0, \sigma_E^2) \]
\[ Y_i \sim N(a + b x_i, \sigma_E^2) \]
Regression and correlation
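Estimation uncertainty of the parameters:
\[ \hat{B} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
\[ E[\hat{B}] = \frac{\sum_{i=1}^{n} (E[Y_i] - E[\bar{Y}])(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = b, \qquad Var[\hat{B}] = \frac{\sigma_E^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
\[ \hat{A} = \bar{Y} - \hat{B}\,\bar{x} \]
\[ E[\hat{A}] = a, \qquad Var[\hat{A}] = \sigma_E^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \]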
Regression and correlation
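Estimation uncertainty of the parameters (with \(\hat{B}\) and \(\hat{A} = \bar{Y} - \hat{B}\,\bar{x}\) the least squares estimators):
\[ \frac{\hat{B} - b}{\sigma_{\hat{B}}} \sim N(0,1), \qquad \frac{\hat{B} - b}{S_{\hat{B}}} \sim t_{n-2} \]
\[ \frac{\hat{A} - a}{\sigma_{\hat{A}}} \sim N(0,1), \qquad \frac{\hat{A} - a}{S_{\hat{A}}} \sim t_{n-2} \]
where \(S_{\hat{B}}\) and \(S_{\hat{A}}\) follow from \(\sigma_{\hat{B}}\) and \(\sigma_{\hat{A}}\) by replacing σ_E with s_E.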
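A sketch of the estimated standard errors S_B̂ and S_Â, obtained by replacing σ_E with s_E in the variance formulas; the function name is hypothetical and the data are illustrative.

```python
import math


def parameter_uncertainty(x, y):
    """Return a_hat, b_hat and their estimated standard errors S_A, S_B."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((yi - y_mean) * (xi - x_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    s2_e = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b = math.sqrt(s2_e / sxx)                          # S_B from Var[B_hat]
    s_a = math.sqrt(s2_e * (1 / n + x_mean ** 2 / sxx))  # S_A from Var[A_hat]
    return a, b, s_a, s_b


a, b, s_a, s_b = parameter_uncertainty([0, 1, 2, 3], [1, 3, 2, 4])
t_b = (b - 0.0) / s_b    # statistic for H0: b = 0, to compare with t_{n-2}
print(a, b, s_a, s_b, t_b)
```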
Regression and correlation
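When the regression model \(\hat{y} = \hat{a} + \hat{b}x\) is estimated:
\[ Y_i = \hat{A} + \hat{B} x_i + E, \qquad \hat{\mu}_{Y_i|x_i} = \hat{A} + \hat{B} x_i \]
\[ E[\hat{\mu}_{Y_i|x_i}] = E[\hat{A}] + E[\hat{B}]\,x_i = a + b x_i \]
Writing \(\hat{\mu}_{Y_i|x_i} = \bar{Y} + \hat{B}(x_i - \bar{x})\), where \(\bar{Y}\) and \(\hat{B}\) are uncorrelated:
\[ Var[\hat{\mu}_{Y_i|x_i}] = Var[\bar{Y}] + (x_i - \bar{x})^2\,Var[\hat{B}] = \sigma_E^2 \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) \]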
Regression and correlation
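Uncertainty on the regression curve, due to parameter uncertainty:
[Figure: band around the estimated regression line; at x_i, the estimated mean has distribution f_{µ_Y|x_i}(µ_{Y_i}) centred at µ_{Y_i|x_i}]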
Regression and correlation
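When the regression model \(\hat{y} = \hat{a} + \hat{b}x\) is estimated, an individual value also includes the residual E:
\[ Y_i = \hat{A} + \hat{B} x_i + E = \hat{\mu}_{Y_i|x_i} + E \]
\[ E[Y_i] = E[\hat{A}] + E[\hat{B}]\,x_i = a + b x_i \]
\[ Var[Y_i] = Var[\hat{\mu}_{Y_i|x_i}] + \sigma_E^2 = \sigma_E^2 \left( 1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) \]
\[ \frac{Y_i - E[Y_i]}{S_{Y_i}} \sim t_{n-2} \]
with \(S_{Y_i}\) the estimate of \(\sigma_{Y_i}\) obtained by replacing σ_E with s_E.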
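A sketch comparing the variance of the estimated mean µ̂_{Y|x0} with the variance of an individual value Y at x0, both with σ_E replaced by s_E; the function name and data are hypothetical.

```python
def regression_variances(x, y, x0):
    """Estimated Var of the regression mean at x0 and of an individual Y at x0."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((yi - y_mean) * (xi - x_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    s2_e = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_mean = s2_e * (1 / n + (x0 - x_mean) ** 2 / sxx)   # parameter uncertainty only
    var_pred = var_mean + s2_e                              # plus the residual E
    return var_mean, var_pred


print(regression_variances([0, 1, 2, 3], [1, 3, 2, 4], x0=3.0))
```

Both variances are smallest at x0 = x̄ and grow with the distance to x̄; the residual term s_E² keeps the prediction band wider than the band on the regression line.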
Regression and correlation
Different uncertainty sources:
[Figure: uncertainty band around the regression line at x_i, built up from]
parameter uncertainties
+ model-structure uncertainties
+ input uncertainties
Regression and correlation
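Link with correlation:
[Figure: scatter of (x, y) with the marginal distributions f_X(x) (mean x̄, standard deviation s_X) and f_Y(y) (mean ȳ, standard deviation s_Y)]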
Regression and correlation
[Figure: observation (x_i, y_i), fitted value ŷ_i on the regression line, and the means x̄ and ȳ]
Regression and correlation
Variance decomposition:
\[ s_Y^2 = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{n-2} = \sum_{i=1}^{n} \frac{(\hat{y}_i - \bar{y})^2}{n-2} + \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{n-2} \]
\[ s_{tot}^2 = s_{mod}^2 + s_E^2 \]
\(s_{mod}^2 / s_{tot}^2\): proportion of total variance explained by the regression
= a measure of the 'goodness-of-fit' of the regression
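A sketch checking the decomposition on a small illustrative data set: the three sums of squares are divided by n − 2 as above, and R² = s_mod²/s_tot². The function name is hypothetical.

```python
def variance_decomposition(x, y):
    """Return s2_tot, s2_mod, s2_E of a fitted simple linear regression."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b = (sum((yi - y_mean) * (xi - x_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
    a = y_mean - b * x_mean
    y_hat = [a + b * xi for xi in x]
    s2_tot = sum((yi - y_mean) ** 2 for yi in y) / (n - 2)
    s2_mod = sum((yh - y_mean) ** 2 for yh in y_hat) / (n - 2)
    s2_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)
    return s2_tot, s2_mod, s2_e


s2_tot, s2_mod, s2_e = variance_decomposition([0, 1, 2, 3], [1, 3, 2, 4])
r2 = s2_mod / s2_tot     # coefficient of determination
print(s2_tot, s2_mod, s2_e, r2)
```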
Regression and correlation
Sensitivity analysis:
\[ s_{mod}^2 = \left( \frac{dy}{dx} \right)^2 s_x^2 \]
with dy/dx the model sensitivity
Regression and correlation
Goodness-of-fit:
\[ R^2 = \frac{s_{mod}^2}{s_{tot}^2} = \frac{(Cov(X,Y))^2}{s_X^2\, s_Y^2} \]
: coefficient of determination
if X, Y are bivariate normal, R² is an estimate of ρ²
Mathematical modelling
considering uncertainties:
Linear model: y = a + bx ⇒ \(Y = \hat{A} + \hat{B}X + E\)
General model: y = F(x, p) ⇒ \(Y = F(X, \hat{P}) + E\)
[Diagram: input x(t) → model F_i, i = 1, …, n, with parameters p → output Y(t)]
Model uncertainty analysis
⇒ probabilistic modelling
Considering
• input uncertainties
• parameter uncertainties
• model-structure uncertainties
[Diagram: input x(t) with input uncertainties E_X(t) → model F_i, i = 1, …, n (n finite), with parameter uncertainties f_P(p) → plus model-structure uncertainties E_Y(t) → output Y(t)]
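A Monte Carlo sketch of probabilistic modelling: in each run a parameter is drawn from f_P(p), an input error E_X and a model-structure error E_Y are added, and the spread of the outputs reflects the combined uncertainty. The model F(x, p) = p·x and all distributions below are hypothetical stand-ins, not a calibrated model.

```python
import random
import statistics


def model(x, p):
    """Hypothetical stand-in for the model F(x, p)."""
    return p * x


def monte_carlo(x_obs, n_runs=20_000, seed=1):
    rng = random.Random(seed)                # fixed seed for reproducibility
    outputs = []
    for _ in range(n_runs):
        p = rng.gauss(2.0, 0.1)              # parameter uncertainty f_P(p) (hypothetical)
        x = x_obs + rng.gauss(0.0, 0.1)      # input uncertainty E_X (hypothetical)
        e_y = rng.gauss(0.0, 0.2)            # model-structure uncertainty E_Y (hypothetical)
        outputs.append(model(x, p) + e_y)
    return outputs


y = monte_carlo(x_obs=3.0)
print(statistics.fmean(y), statistics.stdev(y))
```

Switching off one source at a time (e.g. setting its standard deviation to zero) shows how much each contributes to the output spread.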
Probabilistic modelling
Example: Lumped conceptual rainfall-runoff model:
[Diagram: the rainfall input (with rainfall-input uncertainty E_X) enters the soil moisture storage, which loses water by evapotranspiration and drains to surface runoff (k_OF), interflow (k_IF) and baseflow (k_BF); model-structure uncertainties are added to surface runoff and interflow (E_{s+i}) and to baseflow (E_g)]
Probabilistic modelling
Example: Lumped conceptual rainfall-runoff model:
[Figure: measured discharge and probabilistic model results, Discharge [m3/s] versus Time [h], 9000 to 17000 h]
Probabilistic modelling
Example: Lumped conceptual rainfall-runoff model:
[Figure: measured discharge and probabilistic model results, Discharge [m3/s] versus Time [h], 9000 to 17000 h, with a zoom on 8850 to 9200 h]
Regression and correlation
Additional remarks:
• regression in the y-direction may differ from regression
in the x-direction
• possible hypothesis tests:
H0: a = 0
H0: b = 0
• \((n-2)\,\dfrac{s_{mod}^2}{s_E^2} \sim F(1, n-2)\) if b = 0
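A sketch of the F-statistic for H0: b = 0 (hypothetical function name); because s_mod² and s_E² share the divisor n − 2, it reduces to (n − 2)·SS_mod/SS_E. In simple linear regression it equals the square of the t-statistic B̂/S_B̂.

```python
def f_statistic(x, y):
    """(n - 2) * s2_mod / s2_E for H0: b = 0; compare with F(1, n - 2)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b = (sum((yi - y_mean) * (xi - x_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
    a = y_mean - b * x_mean
    y_hat = [a + b * xi for xi in x]
    ss_mod = sum((yh - y_mean) ** 2 for yh in y_hat)        # explained sum of squares
    ss_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares
    return (n - 2) * ss_mod / ss_e


print(f_statistic([0, 1, 2, 3], [1, 3, 2, 4]))
```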
Regression and correlation
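Additional remarks:
• multivariate case:
\[ \mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \end{bmatrix}, \qquad \mathbf{Y} = \mathbf{A} + \mathbf{B}\,\mathbf{X} \]
\[ \hat{\mathbf{B}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}, \qquad \hat{\mathbf{A}} = \bar{\mathbf{Y}} - \hat{\mathbf{B}}\,\bar{\mathbf{X}} \]
n − 2 → n − p (p: number of parameters)
• model order identification:
balance between R² and p or Var(P)
e.g. AIC (Akaike Information Criterion)
YIC (Young's Information Criterion)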
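A sketch of the multivariate estimator on centred data, assuming NumPy is available: after subtracting the means, B̂ solves the normal equations (XᵀX)B̂ = XᵀY, and Â = Ȳ − B̂X̄. Solving the linear system instead of forming the inverse is a numerical choice made here, not something the slides prescribe; the function name is hypothetical.

```python
import numpy as np


def multiple_regression(X, y):
    """Least squares for Y = A + B·X with several explanatory variables.

    X: shape (n, p), one column per explanatory variable; y: shape (n,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean               # centred data
    B = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)    # normal equations (XᵀX) B = XᵀY
    A = y_mean - B @ x_mean                       # A_hat = Ȳ − B_hat·X̄
    return A, B


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]        # exactly y = 1 + 2·x1 − 3·x2 (illustrative)
A, B = multiple_regression(X, y)
print(A, B)
```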
Discharge splitting
Example:
[Figure: measurements with filtered baseflow, filtered interflow and filtered total discharge; Discharge [m3/s] versus Time [number of hours], 500 to 800]
Discharge splitting
Example baseflow filtering:
[Figure: discharge measurements and filtered baseflow; Discharge [m3/s] versus Time [number of hours], 0 to 8000]
Discharge splitting
Baseflow filtering based on the recession constant:
[Figure: measurements and filtered baseflow on a logarithmic axis (Discharge [m3/s], 0.01 to 10) versus Time [number of hours], 0 to 8000, with the slope of the recession constant for baseflow]
Discharge splitting
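Linear reservoir model:
\[ b(t) = \exp\left(-\frac{1}{k}\right) b(t-1) + \left(1 - \exp\left(-\frac{1}{k}\right)\right) \frac{q(t-1) + q(t)}{2} \]
[Figure: reservoir with inflow q(t), recession constant k and outflow b(t), starting from b(0)]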
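The linear reservoir recursion as a sketch; the starting value b(0) must be supplied, and the function name is hypothetical.

```python
import math


def linear_reservoir_baseflow(q, k, b0):
    """b(t) = exp(-1/k) b(t-1) + (1 - exp(-1/k)) (q(t-1) + q(t)) / 2."""
    alpha = math.exp(-1.0 / k)
    b = [b0]
    for t in range(1, len(q)):
        b.append(alpha * b[-1] + (1.0 - alpha) * (q[t - 1] + q[t]) / 2.0)
    return b


# usage: response to a discharge pulse, k = 5 time steps (hypothetical values)
print(linear_reservoir_baseflow([1.0, 1.0, 6.0, 3.0, 1.5, 1.0, 1.0], k=5.0, b0=1.0))
```

For a constant inflow the output converges to that inflow, and a larger k gives a slower, smoother response.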
Discharge splitting
Linear reservoir model as 'low-pass filter':
frequency response function:
\[ |H(f)|^2 \approx \frac{1}{1 + (2\pi f k)^2} \]
[Figure: |H(f)|² decreasing from 1 at f = 0 towards 0, for frequency f from 0 to 0.5]
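A sketch of the squared gain (hypothetical function name); it drops to 0.5 at f = 1/(2πk), so a larger recession constant k passes only lower frequencies.

```python
import math


def squared_gain(f, k):
    """|H(f)|^2 ≈ 1 / (1 + (2π f k)^2) of the linear reservoir filter."""
    return 1.0 / (1.0 + (2.0 * math.pi * f * k) ** 2)


k = 10.0
f_half = 1.0 / (2.0 * math.pi * k)   # frequency where the squared gain equals 0.5
print(squared_gain(0.0, k), squared_gain(f_half, k), squared_gain(0.5, k))
```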
Discharge splitting
Numerical filtering:
q(t): total stream flow → filter → b(t): slow flow (baseflow) and f(t): quick flow
[Figure legend: signal filtered out; filter result]
Discharge splitting
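Working principle of the 'Extended Chapman filter':
[Diagram: the total flow q(t) is split into a part w·q(t) going to the quick flow f(t) and a part (1 − w)·q(t) going to the baseflow b(t)]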
Discharge splitting
Extended Chapman filter:
\[ f(t) = a\,f(t-1) + b\,\big(q(t) - \alpha\,q(t-1)\big) \]
\[ b(t) = q(t) - f(t) = \alpha\,b(t-1) + c\,(1-\alpha)\big(f(t) + f(t-1)\big) \]
with
\[ \alpha = \exp(-1/k), \qquad v = \frac{1-w}{w}, \qquad a = \frac{(2+v)\alpha - v}{2 + v - v\alpha}, \qquad b = \frac{2}{2 + v - v\alpha}, \qquad c = 0.5\,v \]
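A sketch of the Extended Chapman filter recursion, assuming q is a regularly sampled total discharge series and an initial quick flow f(0) = 0 (an assumption; the slides give no initial condition). The function name is hypothetical. For a constant discharge the quick flow tends to w·q, consistent with the working principle.

```python
import math


def extended_chapman(q, k, w, f0=0.0):
    """Split total discharge q into quick flow f and baseflow b = q - f.

    k: recession constant (time steps); w: quick-flow fraction; f0: assumed f(0).
    """
    alpha = math.exp(-1.0 / k)
    v = (1.0 - w) / w
    a = ((2.0 + v) * alpha - v) / (2.0 + v - v * alpha)
    b_coef = 2.0 / (2.0 + v - v * alpha)   # coefficient b of the filter (not the baseflow)
    f = [f0]
    for t in range(1, len(q)):
        f.append(a * f[-1] + b_coef * (q[t] - alpha * q[t - 1]))
    baseflow = [q_t - f_t for q_t, f_t in zip(q, f)]
    return f, baseflow


f, bf = extended_chapman([1.0, 5.0, 3.0, 2.0, 8.0, 4.0, 1.0], k=10.0, w=0.5)
print(bf)
```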
Discharge splitting
Calibration of the k parameter:
[Figure: measurements and filtered baseflow on a logarithmic axis (Discharge [m3/s], 0.01 to 10) versus Time [number of hours], 0 to 8000; k follows from the slope of the recession constant for baseflow]
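A sketch of the k calibration, assuming an exponential recession q(t) = q(0)·exp(−t/k), as in the linear reservoir: on a logarithmic axis ln q decreases linearly with slope −1/k, so k follows from a least squares slope over a selected recession period. Selecting that period is left to the user; the function name is hypothetical.

```python
import math


def recession_constant(q_recession):
    """Estimate k from ln q(t) ≈ ln q(0) - t/k over a recession period."""
    n = len(q_recession)
    t = range(n)                                   # time steps 0, 1, ..., n-1
    ln_q = [math.log(v) for v in q_recession]
    t_mean, l_mean = (n - 1) / 2, sum(ln_q) / n
    slope = (sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, ln_q))
             / sum((ti - t_mean) ** 2 for ti in t))
    return -1.0 / slope


recession = [5.0 * math.exp(-t / 40.0) for t in range(100)]   # synthetic recession
print(recession_constant(recession))
```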
Discharge splitting
Calibration of the v parameter:
[Figure: discharge measurements and filtered baseflow with calibrated, underestimated and overestimated v parameter; Discharge [m3/s] (logarithmic) versus Time [number of hours from 01.01.93], 1100 to 2500]
Discharge splitting
Discharge splitting of baseflow and interflow (subsurface flow)
from total flow in three steps:
• elimination of constant component
• baseflow separation from the total discharge
• interflow separation from the series ‘total discharge - baseflow’
Discharge splitting
Example baseflow:
[Figure: measurements and filtered baseflow on a logarithmic axis (Discharge [m3/s], 0.01 to 10) versus Time [number of hours], 0 to 8000, with the slope of the recession constant for baseflow]
Discharge splitting
Example interflow:
[Figure: measurements after subtraction of the filtered baseflow, and filtered interflow, on a logarithmic axis (Discharge [m3/s], 0.01 to 10) versus Time [number of hours], 500 to 950, with the slope of the recession constant for interflow]
Discharge splitting
Application using different steps:
• 1-step approach: 1 forward step
• 3-step approach: forward + backward + forward
[Figure: discharge measurements and filtered baseflow for the 1-step and the 3-step approach, with the slope of the recession constant for baseflow; Discharge [m3/s] (logarithmic) versus Time [number of hours], 6000 to 9000]
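A sketch of the 3-step approach with the linear reservoir filter, assuming each pass filters the output of the previous pass and that the filter starts from the first value of its input; both are assumptions, and the function names are hypothetical.

```python
import math


def reservoir_filter(q, k):
    """Linear reservoir filter; assumption: b(0) = q(0)."""
    alpha = math.exp(-1.0 / k)
    b = [q[0]]
    for t in range(1, len(q)):
        b.append(alpha * b[-1] + (1.0 - alpha) * (q[t - 1] + q[t]) / 2.0)
    return b


def three_step(q, k):
    """Forward + backward + forward application of the filter."""
    forward = reservoir_filter(q, k)
    backward = reservoir_filter(forward[::-1], k)[::-1]   # filter the reversed series
    return reservoir_filter(backward, k)


q = [1.0, 1.0, 6.0, 3.0, 1.5, 1.0, 1.0, 1.0]   # hypothetical discharge series
print(reservoir_filter(q, 3.0))
print(three_step(q, 3.0))
```

The backward pass compensates the time shift introduced by a single forward pass.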
POT extraction
[Figure: discharge measurements, baseflow and selected POT values; Discharge [m3/s] versus Time, 1-Nov-93 to 31-Dec-93]
POT extraction
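Method 1: based on baseflow
• based on max. ratio \(\dfrac{q_{min} - q_{base}}{q_{max}} < f\)
• min. peak height \(q_{max} > q_{lim}\)
[Figure: successive peaks q_max separated by a period p, with the intermediate minimum q_min and the baseflow q_base]
Method 2: based on baseflow + interflow
idem, after replacement of q_base by q_base + q_inter
Method 3: independent of subflows
• based on min. independent period p > k
• based on max. ratio \(\dfrac{q_{min}}{q_{max}} < f\)
• min. peak height \(q_{max} > q_{lim}\)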
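A simplified sketch of Method 3, assuming peaks are local maxima above q_lim and that q_max in the ratio is the smaller of the two peaks being compared; both choices are assumptions. When two peaks are dependent, only the larger one is kept, and earlier selections are not re-checked after such a replacement. The function name is hypothetical.

```python
def extract_pot(q, k, f, q_lim):
    """Indices of independent peaks over threshold (simplified Method 3)."""
    # candidate peaks: local maxima higher than the minimum peak height q_lim
    candidates = [i for i in range(1, len(q) - 1)
                  if q[i] >= q[i - 1] and q[i] > q[i + 1] and q[i] > q_lim]
    selected = []
    for i in candidates:
        if not selected:
            selected.append(i)
            continue
        j = selected[-1]                   # last selected peak
        q_min = min(q[j:i + 1])            # lowest discharge between both peaks
        q_max = min(q[j], q[i])            # assumption: the smaller of both peaks
        if (i - j) > k and q_min / q_max < f:
            selected.append(i)             # independent: keep both peaks
        elif q[i] > q[j]:
            selected[-1] = i               # dependent: keep only the larger peak
    return selected


print(extract_pot([0, 5, 1, 0, 0, 0, 6, 1, 0], k=2, f=0.5, q_lim=2))
```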
Risk analysis
Risk = Probability ⊗ Elements ⊗ Vulnerability ⊗ Value
Risk = Probability ⊗ Consequence
Economic consequence = Damage
Risk analysis
Example: risk-based design of a hydraulic structure for flood protection:
Total annual expected cost
= total installation cost * capital recovery factor + annual expected damage cost
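A sketch of the total annual expected cost, using the standard capital recovery factor CRF = i(1+i)^n/((1+i)^n − 1) for interest rate i and lifetime n years; the function names and the cost figures in the usage line are hypothetical.

```python
def capital_recovery_factor(i, n_years):
    """Converts a one-off cost into an equivalent annual cost at interest rate i."""
    growth = (1.0 + i) ** n_years
    return i * growth / (growth - 1.0)


def total_annual_expected_cost(installation_cost, i, n_years, annual_expected_damage):
    return installation_cost * capital_recovery_factor(i, n_years) + annual_expected_damage


# hypothetical design alternative: 2.0e6 installation cost, 5 % interest, 20 years
print(total_annual_expected_cost(2.0e6, 0.05, 20, 30_000.0))
```

Evaluating this for several design capacities, each with its own annual expected damage, gives the curve whose minimum is the risk-based design.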
Risk analysis
Example: risk-based design of a hydraulic structure for flood protection:
Annual expected damage cost: E(D)
Damage when Load > Resistance:
discharge Q in the river > discharge capacity Q_c of the hydraulic structure
Considering the distribution of Q:
\[ E(D) = \int_{q_c}^{+\infty} D(q, q_c)\, f_Q(q)\, dq \]
Considering the distribution of Q and Q_c:
\[ E(D) = \int_{0}^{+\infty} \left( \int_{q_c}^{+\infty} D(q, q_c)\, f_Q(q)\, dq \right) f_{Q_c}(q_c)\, dq_c \]
[Figure: risk zone, with Risk = probability × damage, as a function of return period and spatial dimension]
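A numerical sketch of E(D) for a fixed capacity q_c, using the trapezoidal rule on a truncated integration range. The exponential discharge distribution and the linear damage function in the usage lines are hypothetical, chosen because the exact result c·exp(−λ q_c)/λ is known for them.

```python
import math


def expected_damage(damage, pdf, qc, q_upper, n_steps=100_000):
    """Trapezoidal estimate of E(D) = ∫_{qc}^{∞} D(q, qc) f_Q(q) dq, truncated at q_upper."""
    h = (q_upper - qc) / n_steps
    total = 0.5 * (damage(qc, qc) * pdf(qc) + damage(q_upper, qc) * pdf(q_upper))
    for j in range(1, n_steps):
        q = qc + j * h
        total += damage(q, qc) * pdf(q)
    return total * h


lam, c, qc = 0.01, 1000.0, 200.0          # hypothetical values


def pdf(q):
    return lam * math.exp(-lam * q)       # exponential f_Q(q)


def damage(q, q_cap):
    return c * (q - q_cap)                # damage grows linearly above the capacity


estimate = expected_damage(damage, pdf, qc, qc + 50.0 / lam)
exact = c * math.exp(-lam * qc) / lam
print(estimate, exact)
```

The outer integral over f_{Q_c}(q_c) could be approximated in the same way by evaluating this function on a grid of capacities.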
Risk analysis
Example: Flood risk mapping
River Dender case (Belgium), area Geraardsbergen - Zandbergen:
flood maps for rivers and subcatchments, for return periods of 1 year, 10 years and 100 years