Hydrologic Statistics

04/04/2006
Reading: Chapter 11 in Applied Hydrology
Some slides by Venkatesh Merwade
Hydrologic Models
Classification based on randomness.
• Deterministic (e.g., rainfall-runoff analysis)
– Analysis of hydrological processes using deterministic
approaches
– Hydrological parameters are based on physical relations of
the various components of the hydrologic cycle.
– Do not consider randomness; a given input produces the
same output.
• Stochastic (e.g., flood frequency analysis)
– Probabilistic description and modeling of hydrologic
phenomena
– Statistical analysis of hydrologic data.
Probability
• A measure of how likely an event is to occur
• A number expressing the ratio of favorable
outcomes to all possible outcomes
• Probability is usually represented as P(.)
– P (getting a club from a deck of playing cards) = 13/52 = 0.25 = 25 %
– P (getting a 3 after rolling a die) = 1/6
Random Variable
• Random variable: a quantity used to represent
probabilistic uncertainty
– Incremental precipitation
– Instantaneous streamflow
– Wind velocity
• Random variable (X) is described by a probability
distribution
• Probability distribution is a set of probabilities
associated with the values in a random variable’s sample
space
Sampling terminology
• Sample: a finite set of observations x1, x2, …, xn of the random
variable
• A sample comes from a hypothetical infinite population
possessing constant statistical properties
• Sample space: set of possible samples that can be drawn from a
population
• Event: subset of a sample space

Example
• Population: streamflow
• Sample space: instantaneous streamflow, annual
maximum streamflow, daily average streamflow
• Sample: 100 observations of annual max. streamflow
• Event: daily average streamflow > 100 cfs
Types of sampling
• Random sampling: the likelihood of selection of each member of the
population is equal
– Pick any streamflow value from a population
• Stratified sampling: the population is divided into groups, and random
sampling is then applied within each group
– Pick a streamflow value from the annual maximum series
• Uniform sampling: data are selected such that the points are uniformly
spaced in time or space
– Pick streamflow values measured every Monday at midnight
• Convenience sampling: data are collected according to the convenience of
the experimenter
– Pick streamflow during summer
Summary statistics
• Also called descriptive statistics
– If x1, x2, …, xn is a sample, then

Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (μ for continuous data)
Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2$ (σ² for continuous data)
Standard deviation: $S = \sqrt{S^2}$ (σ for continuous data)
Coeff. of variation: $CV = \frac{S}{\bar{X}}$
Summary statistics also include the median, skewness, and correlation coefficient, among others.
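The summary statistics above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function name `summary_statistics` and the flow values are assumptions made for the example, not data from the slides.

```python
import math


def summary_statistics(x):
    """Return sample mean, variance (n - 1 denominator), standard deviation, and CV."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / n
    # Sample variance divides by n - 1, matching the formula for S^2
    variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    std_dev = math.sqrt(variance)
    cv = std_dev / mean  # coefficient of variation, dimensionless
    return mean, variance, std_dev, cv


# Hypothetical sample (illustrative values only, not real streamflow data)
flows = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, std_dev, cv = summary_statistics(flows)
print(f"mean={mean:.3f} variance={variance:.3f} std={std_dev:.3f} CV={cv:.3f}")
```

The coefficient of variation is only meaningful when the mean is not zero, which holds for flow data of this kind.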
Graphical display
• Time series plots
• Histograms/frequency distributions
• Cumulative distribution functions
• Flow duration curves
Time series plot
• Plot of variable versus time (bar/line/points)
• Example. Annual maximum flow series
[Figure: annual max flow (10^3 cfs) versus year, Colorado River near Austin]
Histogram
• Plots of bars whose height is the number ni, or fraction
(ni/N), of data falling into one of several intervals of
equal width
[Figure: three histograms of no. of occurrences versus annual max flow (10^3 cfs), with interval widths of 50,000 cfs, 25,000 cfs, and 10,000 cfs]
Dividing the number of occurrences by the total number of points gives the probability mass function.
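As a rough sketch of how interval counts become a probability mass function, the following Python example bins a small sample into equal-width intervals and divides each count by the total. The helper names and the flow values are hypothetical and chosen only for illustration.

```python
def histogram_counts(data, width, start=0.0):
    """Count observations in equal-width intervals [start + k*width, start + (k+1)*width).

    Assumes every value is greater than or equal to `start`.
    """
    counts = {}
    for value in data:
        k = int((value - start) // width)  # index of the interval containing value
        counts[k] = counts.get(k, 0) + 1
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]


def probability_mass(counts):
    """Divide each count by the total number of points."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical annual maximum flows in 10^3 cfs (illustrative values only)
flows = [12, 48, 55, 61, 99, 120, 130, 149, 151, 260]
counts = histogram_counts(flows, width=50)
pmf = probability_mass(counts)
print(counts)
print(pmf)
```

Changing `width` reproduces the effect shown in the histogram slide: narrower intervals give more bars with fewer points in each.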
Using Excel to plot histograms
1) Make sure the Analysis ToolPak is added in Tools.
This will add the Data Analysis command to Tools
2) Fill one column with the data, and another with
the intervals (e.g., for a 50 cfs interval, fill
0, 50, 100, …)
3) Go to Tools → Data Analysis → Histogram
4) Organize the plot in a presentable form
(change fonts, scale, color, etc.)
Probability density function
• The continuous form of the probability mass function is the probability
density function
[Figure: no. of occurrences and probability versus annual max flow (10^3 cfs)]
The pdf is the first derivative of the cumulative distribution function.
Cumulative distribution function
• Accumulate (integrate) the pdf to produce the cdf
• The cdf gives the probability that a random variable is less
than or equal to a specified value x
[Figure: cdf, probability versus annual max flow (10^3 cfs); annotated values: P (Q ≤ 25000) = 0.4 and P (Q ≤ 50000) = 0.8]
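For a finite sample, the cdf can be approximated by the fraction of observations at or below a given value. The sketch below shows this empirical version; the function name `empirical_cdf` and the flow values are assumptions for illustration and are not the data behind the figure.

```python
def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


# Hypothetical annual maximum flows in 10^3 cfs (illustrative values only)
flows = [12, 48, 55, 61, 99, 120, 130, 149, 151, 260]

p_100 = empirical_cdf(flows, 100)  # estimate of P(Q <= 100)
print(p_100)
```

The result is a step function that rises from 0 to 1 as x increases, which is the discrete counterpart of integrating the pdf.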
Flow duration curve
• A cumulative frequency curve that shows the percentage of
time that specified discharges are equaled or exceeded.

Steps
1) Arrange flows in chronological order
2) Find the number of records (N)
3) Sort the data from highest to lowest
4) Rank the data (m = 1 for the highest value and m = N for the lowest value)
5) Compute the exceedance probability for each value using the formula
$p = 100 \times \frac{m}{N + 1}$
6) Plot p on the x axis and Q (sorted) on the y axis
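The steps above can be sketched in Python as follows. The function name `flow_duration_curve` and the four flow values are hypothetical; the ranking and the formula p = 100 m / (N + 1) follow the steps as listed.

```python
def flow_duration_curve(flows):
    """Return (p, Q) pairs, with Q sorted from highest to lowest and
    p = 100 * m / (N + 1) the percentage of time Q is equaled or exceeded."""
    n = len(flows)                       # number of records N
    ranked = sorted(flows, reverse=True) # m = 1 is the highest flow
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Hypothetical flows in chronological order (illustrative values only)
flows = [30, 10, 50, 20]
curve = flow_duration_curve(flows)
for p, q in curve:
    print(f"{p:5.1f} %  Q = {q}")
```

Plotting the first element of each pair on the x axis and the second on the y axis gives the curve; the flow at p = 50 is the median flow.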
Flow duration curve in Excel
[Figure: flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]
Statistical analysis
• Regression analysis
• Mass curve analysis
• Flood frequency analysis
• Many more which are beyond the scope of
this class!
Linear Regression
• A technique to determine the relationship between two
random variables.
– Relationship between discharge and velocity in a stream
– Relationship between discharge and water quality constituents
A regression model is given by: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, 2, \ldots, n$
yi = ith observation of the response (dependent) variable
xi = ith observation of the explanatory (independent) variable
b0 = intercept
b1 = slope
ei = random error or residual for the ith observation
n = sample size
Least squares regression
• We have x1, x2, …, xn and y1, y2, …, yn
observations of independent and dependent
variables, respectively.
• Define a linear model for yi: $\hat{y}_i = b_0 + b_1 x_i$, $i = 1, 2, \ldots, n$
• Fit the model (find b0 and b1) such that the sum
of the squares of the vertical deviations is
minimized
– Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
Regression applet:
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
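For a single explanatory variable, minimizing the sum of squared vertical deviations has a well-known closed-form solution: b1 is the ratio of the sum of cross-deviations to the sum of squared x deviations, and b0 = ȳ − b1 x̄. The sketch below applies it; the function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Hypothetical observations (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

The fitted line always passes through the point (x̄, ȳ), which is a quick check on any hand or spreadsheet calculation.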
Linear Regression in Excel
• Steps:
– Prepare a scatter plot
– Fit a trend line
[Figure: scatter plot of TDS (mg/L) versus specific conductance (µS/cm) with fitted trend line. Data are for Brazos River near Highbank, TX. TDS = 0.5946 (sp. cond.) − 15.709, R² = 0.9903]

Alternatively, one can use Tools → Data Analysis → Regression
Coefficient of determination (R2)
• It is the proportion of observed y variation that can
be explained by the simple linear regression model
$R^2 = 1 - \frac{SSE}{SST}$
$SST = \sum (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of yi
$SSE = \sum (y_i - \hat{y}_i)^2$: error sum of squares
The higher the value of R2, the more successful the model is in explaining y
variation.
If R2 is small, search for an alternative model (nonlinear or multiple
regression model) that can more effectively explain y variation
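A minimal Python sketch of the R2 calculation follows, using the SSE and SST definitions above. The function name `r_squared` and the observed and predicted values are hypothetical, chosen only to illustrate the arithmetic.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # error sum of squares
    return 1.0 - sse / sst


# Hypothetical observations and model predictions (illustrative values only)
y = [3.1, 4.9, 7.2, 8.8]
y_hat = [3.09, 5.03, 6.97, 8.91]
r2 = r_squared(y, y_hat)
print(f"R2 = {r2:.4f}")
```

If every prediction equals its observation, SSE is zero and R2 is exactly 1; a model that predicts ȳ everywhere gives R2 of 0.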