GEOSTATISTICS - Laboratory for Remote Sensing Hydrology and

STOCHASTIC HYDROLOGY
Stochastic Simulation (I)
Univariate simulation
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
Stochastic Hydrology
• Hydrological processes exhibit variations in
both space and time. As hydrological models
are simplified versions of reality, they
produce predictions or estimates of
hydrological variables (e.g. runoff, hydraulic
head, concentration) that are inherently
erroneous.
• Stochastic hydrology is mainly concerned
with the assessment of uncertainty in
hydrological analysis, modeling and
forecasting.
3/14/2016
Lab for Remote Sensing Hydrology and Spatial Modeling
Dept of Bioenvironmental Systems Engineering, NTU
2
• In stochastic hydrology, the assessment of
uncertainty is an integral part of hydrological
analysis and modeling, being as important as
the predictions themselves.
• Assessment of uncertainties is achieved by
using “stochastic models”, which are models
consisting of random components. These
random components characterize the part of
reality that is not explained by the
deterministic components in our model.
• Errors in hydrological model prediction can
occur because
– the model concept is wrong (model uncertainty), or
– the parameters, boundary conditions or initial conditions are in error (parameter uncertainty).
• We may choose to ignore these errors and
accept our model predictions at face value.
• However, in operational hydrology, when
actual decisions have to be made based on
hydrological model predictions (sometimes
involving human life such as in flood control),
it is imperative that uncertainty is taken into
account in the decision-making process.
– Government policy decision-making is a
complicated process. (Holistic decision making)
– An inappropriate decision may result in
significant loss of life or in over- or under-investment in public infrastructure.
(Unintended consequences of government policy)
What does a prediction really represent?
• Example of a linear regression model
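One way to make the question concrete is to compare the interval for the fitted mean with the interval for a new observation. The sketch below is only an illustration: the data are synthetic and the names `fit`, `new_x`, `conf_int` and `pred_int` are hypothetical, not taken from the lecture.

```r
# Synthetic data (assumed for illustration): y = 2 + 0.5 x + noise
set.seed(1)
x <- seq(1, 20, length.out = 40)
y <- 2 + 0.5 * x + rnorm(length(x), sd = 1)

fit <- lm(y ~ x)                      # ordinary least-squares fit
new_x <- data.frame(x = c(5, 10, 15)) # points at which to predict

# Interval for the mean response and interval for a single new observation
conf_int <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# Interval widths: the prediction interval also carries the residual scatter
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
```

The point prediction (column `fit`) is the same in both cases; only the uncertainty attached to it differs.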
• Predictions of data-driven models.
• Predictions of deterministic models.
• Properties of the predictand variable
– Unbiased?
– Confidence interval?
– Efficient?
For parameter estimation, we are concerned about the
above properties of our estimators, but why not the
predictions?
• The objectives of this course are
– To demonstrate the stochastic nature of many
hydrological processes,
– To equip students with a stochastic
perspective on hydrological modeling and
forecasting, and
– To introduce techniques of stochastic simulation
which will enable students to explore a wide
range of applications.
Univariate Simulation
• Pseudo random number generation
• Probability integral transformation
• Rejection method
• Frequency-factor based generation
• Random number generation using R
Pseudo Random Number Generator (PRNG)
• Computer simulation of random variables is
the task of using computers to generate many
random numbers that are independent and
identically distributed (IID). It is also known
as random number generation (RNG).
• In fact, these computer-generated random
numbers form a deterministic sequence, and
the same list of numbers will be cycled over
and over again. This cycle can be made to be
so long that the lack of true independence is
unimportant.
• Therefore, such computer codes are often
termed pseudo-random number generators
(PRNG).
• There exist mathematical transformation
methods to obtain other distributions from
uniform variates. For this reason, most
PRNGs found in software libraries produce
uniform random numbers in the unit interval
(0, 1).
Linear Congruential Generator
• Generation of random samples from various
probability densities is based on random
samples of the uniform density U[0,1).
An algorithm for generating U[0,1) random
numbers is therefore essential. This
can be achieved by the Linear Congruential
Generator (LCG) described below.
• Let a sequence of numbers $x_n$ be defined by
$$x_n = (a\,x_{n-1} + c) \bmod m$$
where a and m are given positive integers and c is a given non-negative integer.
The above equation means that $a\,x_{n-1} + c$ is
divided by m and the remainder is taken as the
value of $x_n$. The quantity $x_n / m$ is then taken as an
approximation to the value of a U[0,1) random
variable. When c = 0, the algorithm is also called a
pure multiplicative generator.
• A guideline for selection of a and m (with c = 0) is that m
be chosen to be a large prime number that fits
the computer word size. For a 32-bit word
computer, $m = 2^{31} - 1$ and $a = 7^5$ result in desired
properties.
• For small computers without a random number
generator, the following a, c, and m are found to be
satisfactory when the LCG algorithm is used:
$a = 25173$, $c = 13849$ and $m = 65536$.
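The recurrence above can be sketched in a few lines of R. This is a minimal illustration rather than a production generator: the function name `lcg` and the seed values are assumptions, and R's double-precision arithmetic is exact here only because the intermediate products stay well below 2^53.

```r
# Linear congruential generator: x_n = (a * x_{n-1} + c) mod m
# Returns the first n integer states; divide by m to approximate U[0,1).
lcg <- function(n, seed, a, c, m) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state + c) %% m   # remainder after division by m
    x[i] <- state
  }
  x
}

# Pure multiplicative generator (c = 0) with m = 2^31 - 1 and a = 7^5
m_mult <- 2^31 - 1
x_mult <- lcg(5, seed = 1, a = 7^5, c = 0, m = m_mult)
u_mult <- x_mult / m_mult           # approximate U[0,1) variates

# Small-computer parameters quoted in the slide
x_small <- lcg(5, seed = 0, a = 25173, c = 13849, m = 65536)
u_small <- x_small / 65536
```

The same seed always reproduces the same sequence, which is exactly the deterministic behaviour that makes these numbers pseudo-random.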
PROBABILITY INTEGRAL TRANSFORMATION
• The PIT method is based on the property that a
random variable X with CDF $F_X(\cdot)$ can be
transformed into a random variable U with uniform
distribution over the interval (0,1) by defining
$$U = F_X(X)$$
• Conversely, if U is uniformly distributed over the
interval (0,1), then $X = F_X^{-1}(U)$ has cumulative
distribution function $F_X(\cdot)$.
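As an illustration of the converse statement, the exponential distribution has the closed-form inverse $F_X^{-1}(u) = -\ln(1-u)/\lambda$, so uniform variates can be mapped directly to exponential ones. The function names `exp_quantile` and `rexp_pit` below are hypothetical, and the rate value is chosen only for the example.

```r
# Inverse CDF of the exponential distribution with rate lambda
exp_quantile <- function(u, rate = 1) -log(1 - u) / rate

# Probability integral transformation: U ~ U(0,1)  ->  X = F^{-1}(U)
rexp_pit <- function(n, rate = 1) {
  u <- runif(n)            # uniform variates on (0, 1)
  exp_quantile(u, rate)    # transformed to exponential variates
}

set.seed(42)
x_exp <- rexp_pit(10000, rate = 2)   # theoretical mean is 1 / rate = 0.5
```

The result can be compared with R's built-in `qexp` and `rexp`, which serve the same purpose.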
• For random variables whose cumulative
distribution function cannot be expressed in
closed form, the probability integral
transformation technique cannot be applied
analytically to generate random numbers of
these random variables.
• The normal distribution is one such
distribution.
The Acceptance/Rejection Method
• This method uses an auxiliary density for
generation of random quantities from
another distribution. This method is
particularly useful for generating random
numbers of random variables whose
cumulative distribution functions cannot be
expressed in closed form.
• Suppose that we want to generate random
numbers of a random variable X with density
f(X).
• An auxiliary density g(X), from which we know
how to generate random samples, is chosen
such that cg(X) is nowhere less than f(X) for
some constant c, i.e.,
$f(x) \le c\,g(x)$ for all $x$.
[Figure: the target density f(X) lying beneath the envelope cg(X), plotted against X.]
• Generate a random number x from the density g(X).
• Generate a random number u from the density U[0, cg(x)).
• Reject x if u > f(x); otherwise, accept x as a random number from f(X).
• Repeat the above steps until the desired number of random numbers is obtained.
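The steps above can be sketched as follows. The target density (a Beta(2,2) density, f(x) = 6x(1-x) on (0,1)), the uniform auxiliary density and the constant c = 1.5 are all assumptions chosen for illustration; the names `f_target`, `g_aux`, `c_const` and `rejection_sample` are hypothetical.

```r
# Target density f: Beta(2, 2), f(x) = 6 x (1 - x) on (0, 1)  (illustrative choice)
f_target <- function(x) 6 * x * (1 - x)
# Auxiliary density g: U(0, 1), so g(x) = 1 on (0, 1)
g_aux <- function(x) rep(1, length(x))
# f attains its maximum 1.5 at x = 0.5, hence f(x) <= c * g(x) with c = 1.5
c_const <- 1.5

rejection_sample <- function(n) {
  accepted <- numeric(0)
  while (length(accepted) < n) {
    x <- runif(n)                                      # candidates from g
    u <- runif(n, min = 0, max = c_const * g_aux(x))   # u ~ U[0, c g(x))
    accepted <- c(accepted, x[u <= f_target(x)])       # keep x unless u > f(x)
  }
  accepted[seq_len(n)]                                 # return exactly n values
}

set.seed(123)
sample_ar <- rejection_sample(5000)   # Beta(2,2): mean 0.5, variance 0.05
```

On average a fraction 1/c of the candidates is accepted, so a tighter envelope (smaller c) wastes fewer draws.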
Frequency-factor-based generation
• An advantage of the method is that it does
not require CDF inversion: the frequency
factors of the five commonly used
distributions involve only standard
normal and uniform deviates.
GENERAL EQUATION FOR HYDROLOGICAL FREQUENCY ANALYSIS
A random variable X has cumulative distribution
function $F_X(\cdot)$ with mean $\mu$ and standard
deviation $\sigma$. The magnitude of X corresponding to
return period T, denoted by $x_T$, is defined as
$$P(X \ge x_T) = \frac{1}{T}$$
Chow (1951) proposed the following general
equation for hydrologic frequency analysis:
$$x_T = \mu + K_T\,\sigma$$
where $K_T$, the frequency factor, is a function of T
and is distribution-specific.
• Clearly, if X is normally distributed, the
frequency factor $K_T$ corresponds to the
standard normal deviate with exceedance
probability 1/T.
• Frequency factors of distributions commonly
used in hydrologic frequency analysis have
been developed (Kite, 1988).
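A minimal sketch of how the general equation can drive random number generation, for two of the distributions only. The normal case follows directly from the statement above; for EV1 the sketch assumes the standard Gumbel frequency factor $K_T = -(\sqrt{6}/\pi)\{\gamma + \ln[\ln(T/(T-1))]\}$ rewritten in terms of the non-exceedance probability $u = 1 - 1/T$. The function names and the values of `mu` and `sigma` are hypothetical.

```r
# Frequency factor of the normal distribution for a given return period
k_normal <- function(return_period) qnorm(1 - 1 / return_period)

# Frequency factor of the EV1 (Gumbel) distribution expressed through the
# non-exceedance probability u = 1 - 1/T (assumed standard form)
euler_gamma <- -digamma(1)                    # Euler's constant, about 0.5772
k_ev1 <- function(u) -(sqrt(6) / pi) * (euler_gamma + log(-log(u)))

# x = mu + K * sigma, with K driven by a standard normal or a uniform deviate
r_freq_factor <- function(n, mu, sigma, dist = c("normal", "ev1")) {
  dist <- match.arg(dist)
  k <- if (dist == "normal") rnorm(n) else k_ev1(runif(n))
  mu + k * sigma
}

mu <- 100
sigma <- 20
x_100yr <- mu + k_normal(100) * sigma         # 100-year magnitude, normal case
set.seed(99)
x_ev1 <- r_freq_factor(20000, mu, sigma, dist = "ev1")
```

No CDF inversion is needed beyond the elementary logarithms, and the generated EV1 sample should have mean and standard deviation close to `mu` and `sigma`.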
TEST AND VALIDATION
• In order to demonstrate the applicability of
the frequency-factor-based (FQFT) approach, random numbers of
normal, log-normal, extreme value type I
(EV1), Pearson type III (PT3) and Log-Pearson type III (LPT3) distributions were
generated and tested.
• For each type of distribution, N random
samples, each of size n, were generated and
used in subsequent analysis.
• In this study the sample size n was varied
from 50 to 500 in increments of 50, and the
number of random samples N was set to
1,000 and 10,000.
• Three checks were adopted to test the
validity of the generated random
numbers:
– Graphical comparison of the CDF and the
empirical CDF (ECDF) derived from
generated data,
– Properties of estimated parameters, and
– Type I error of the goodness-of-fit (GOF) test.
Graphical comparison of CDF and ECDF
• Figure 2 illustrates the closeness of the CDF
and the ECDF for sample sizes of 50 and 500.
Each ECDF in Figure 2 is based on a single
random sample of size 50 or 500, and it may change
when another random sample is used.
• It can be seen that even at a sample size of 50 the
ECDF is fairly close to the CDF of the designated
distribution. At a sample size of 500, all ECDFs
become almost indistinguishable from their
corresponding CDFs.
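The closeness of the ECDF to the CDF can also be quantified rather than judged by eye. The sketch below measures the largest vertical gap between the two for normal samples; the helper `ecdf_distance`, the normal parameters (mean 10, standard deviation 2) and the 200 replicates are assumptions made for illustration and are not the settings behind Figure 2.

```r
# Largest vertical distance between the ECDF of x and a theoretical CDF
ecdf_distance <- function(x, cdf, ...) {
  n <- length(x)
  f <- cdf(sort(x), ...)                    # theoretical CDF at ordered data
  max(c(seq_len(n) / n - f,                 # ECDF just after each jump
        f - (seq_len(n) - 1) / n))          # ECDF just before each jump
}

set.seed(2016)
# Average distance over 200 samples, for sample sizes 50 and 500
d50  <- mean(replicate(200, ecdf_distance(rnorm(50, 10, 2), pnorm, 10, 2)))
d500 <- mean(replicate(200, ecdf_distance(rnorm(500, 10, 2), pnorm, 10, 2)))
```

For a visual check in an interactive session, `plot(ecdf(x))` followed by `curve(pnorm(x, 10, 2), add = TRUE)` overlays the two curves.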
Properties of parameter estimators
• From each of the N generated random
samples, distribution parameters including
mean, standard deviation and coefficient of
skewness can be estimated.
• Furthermore, from a total of N random
samples, the sample mean and standard
deviation of the above estimated parameters
were calculated, with respect to sample size n
ranging from 50 to 500.
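A minimal sketch of this experiment for the normal distribution only. The design values (`mu = 100`, `sigma = 20`, `N = 1000`) and the function names are assumptions for illustration; the moment-based skewness estimator used here is one common choice and may differ from the one used in the study.

```r
# Moment-based sample coefficient of skewness (one common definition)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# For N samples of size n: mean and standard deviation of each estimator
estimator_summary <- function(n, N = 1000, mu = 100, sigma = 20) {
  est <- replicate(N, {
    x <- rnorm(n, mu, sigma)
    c(mean = mean(x), sd = sd(x), skew = sample_skewness(x))
  })                                        # 3 x N matrix of estimates
  rbind(mean = rowMeans(est), sd = apply(est, 1, sd))
}

set.seed(7)
s50  <- estimator_summary(50)    # rows: mean / sd of the estimators
s500 <- estimator_summary(500)
```

Comparing the `sd` rows of `s50` and `s500` shows how the spread of each estimator shrinks as n grows, while the `mean` rows stay near the design values.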
Uncertainty in estimation of mean reduces as sample size increases.
Uncertainty in estimation of standard deviation reduces as sample size n increases.
Uncertainty in estimation of skewness coefficient reduces as sample size n increases.
• With N = 1,000 or 10,000 random samples,
the sample means (the center
line) of the estimated parameters (including
mean, standard deviation and coefficient of
skewness) are very close to the theoretical
values designated for random number
generation.
• It is also clearly seen that the standard deviations
of all parameter estimators decrease as the
sample size n increases, indicating a reduction
of uncertainty in parameter estimation. Together
with sample means that stay close to the theoretical
values, which points to approximately unbiased
estimators, these characteristics
suggest the generated random samples are
indeed from the desired distributions.
Type I error of goodness-of-fit test
• Each random sample of size n is generated
from a theoretical distribution with
designated parameters, and a GOF test can then be
applied to test whether the random sample is
drawn from that theoretical distribution. The
widely applied Chi-square GOF test is
adopted in this study.
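If the generator is sound, a test at significance level α should reject the true distribution in roughly a fraction α of the samples. The sketch below estimates that rejection rate for standard normal samples. The settings (n = 100, k = 10 equiprobable classes, α = 0.05, N = 1000) and the name `gof_rejection_rate` are assumptions for illustration; because the parameters are designated rather than estimated, k - 1 degrees of freedom are used.

```r
# Empirical Type I error of the chi-square GOF test for N(0, 1) samples
gof_rejection_rate <- function(N = 1000, n = 100, k = 10, alpha = 0.05) {
  inner_breaks <- qnorm(seq_len(k - 1) / k)     # k equiprobable classes
  expected <- n / k                             # expected count per class
  critical <- qchisq(1 - alpha, df = k - 1)     # parameters are known
  rejected <- replicate(N, {
    x <- rnorm(n)
    class_id <- findInterval(x, inner_breaks) + 1          # values in 1..k
    observed <- tabulate(class_id, nbins = k)
    sum((observed - expected)^2 / expected) > critical     # TRUE = reject
  })
  mean(rejected)                                # fraction of rejections
}

set.seed(11)
type1_rate <- gof_rejection_rate()
```

A value of `type1_rate` near 0.05 is what one would expect when the samples really come from the hypothesized distribution.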
Generating random samples of the normal distribution
• The Box-Muller method
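The slide content for this method is not reproduced here, so the following is only a sketch of the standard Box-Muller transform, which maps two independent U(0,1) variates to two independent standard normal variates. The function name `box_muller` is hypothetical.

```r
# Box-Muller transform: (u1, u2) independent U(0,1) -> (z1, z2) independent N(0,1)
box_muller <- function(u1, u2) {
  r <- sqrt(-2 * log(u1))       # radius
  theta <- 2 * pi * u2          # angle
  cbind(z1 = r * cos(theta), z2 = r * sin(theta))
}

set.seed(5)
z_bm <- box_muller(runif(5000), runif(5000))   # 5000 x 2 matrix of N(0,1) values
```

A general normal variate follows as `mu + sigma * z` for any column of the result.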
• The Central Limit Theorem
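The slide content for this approach is not reproduced here. A common textbook construction, assumed in the sketch below, sums k independent U(0,1) variates (mean k/2, variance k/12) and standardizes the sum; with k = 12 the standardization reduces to subtracting 6. The function name `rnorm_clt` is hypothetical.

```r
# Approximate N(0,1) variates from the standardized sum of k uniform variates
rnorm_clt <- function(n, k = 12) {
  u <- matrix(runif(n * k), nrow = n)      # n rows of k uniform variates
  (rowSums(u) - k / 2) / sqrt(k / 12)      # subtract mean, divide by sd
}

set.seed(3)
z_clt <- rnorm_clt(10000)
```

The result is only approximately normal: with k = 12 the values can never fall outside [-6, 6], so the extreme tails are truncated.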
• Acceptance/Rejection Method
– The cumulative distribution function of the
exponential density and its inverse function can
be easily derived, and therefore, random samples
of the exponential distribution can be generated
with the probability integral transformation
method.
Random number generation in R
• R commands for stochastic simulation (for the
normal distribution)
– pnorm – cumulative probability
– qnorm – quantile function
– rnorm – generating a random sample of a specific
sample size
– dnorm – probability density function
For other distributions, simply change the distribution name.
For example, use (punif, qunif, runif, and dunif) for the uniform
distribution and (ppois, qpois, rpois, and dpois) for the Poisson
distribution.
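A short usage sketch of these four function families. The numerical arguments are arbitrary illustrative values.

```r
# Normal distribution
p_val <- pnorm(1.96)                   # P(Z <= 1.96)
q_val <- qnorm(0.975)                  # quantile with cumulative probability 0.975
d_val <- dnorm(0)                      # density at 0, equal to 1 / sqrt(2 * pi)
set.seed(1)
x_norm <- rnorm(5, mean = 10, sd = 2)  # random sample of size 5

# Same naming pattern for other distributions
p_unif <- punif(0.3, min = 0, max = 1) # uniform CDF at 0.3
x_pois <- rpois(5, lambda = 4)         # Poisson random sample
q_pois <- qpois(0.5, lambda = 4)       # Poisson median
```

The prefixes p, q, r and d keep the same meaning for every distribution R provides.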
Generating random numbers of discrete distributions using R
• Discrete uniform distribution
– R does not provide default functions for random
number generation for the discrete uniform
distribution.
– However, the following functions can be used for
discrete uniform distribution between 1 and k.
rdu <- function(n, k) sample(1:k, n, replace = TRUE)   # random numbers
ddu <- function(x, k) ifelse(x >= 1 & x <= k & round(x) == x, 1/k, 0)   # density
pdu <- function(x, k) ifelse(x < 1, 0, ifelse(x <= k, floor(x)/k, 1))   # CDF
qdu <- function(p, k) ifelse(p <= 0 | p > 1, NA, ceiling(p * k))   # quantile (NA where undefined)
– Similar, yet more flexible, functions are defined as
follows:
• dunifdisc <- function(x, min = 0, max = 1) ifelse(x >= min & x <= max &
round(x) == x, 1/(max - min + 1), 0)   # density
> dunifdisc(23, 21, 40)
> dunifdisc(c(0, 1))
• punifdisc <- function(q, min = 0, max = 1) ifelse(q < min, 0, ifelse(q > max, 1,
floor(q - min + 1)/(max - min + 1)))   # CDF
> punifdisc(0.2)
> punifdisc(5, 2, 19)
• qunifdisc <- function(p, min = 0, max = 1)
pmax(ceiling(p * (max - min + 1)) + min - 1, min)   # quantile, consistent with punifdisc
> qunifdisc(0.2222222, 2, 19)
> qunifdisc(0.2)
• runifdisc <- function(n, min = 0, max = 1) sample(min:max, n, replace = TRUE)   # random numbers
> runifdisc(30, 2, 19)
> runifdisc(30)
• Binomial distribution
• Poisson distribution