04/04/2006
Hydrologic Statistics
Reading: Chapter 11, Sections 12-1 and 12-2 of Applied Hydrology

Probability
• A measure of how likely an event is to occur
• A number expressing the ratio of favorable outcomes to all possible outcomes
• Probability is usually represented as P(.)
– P(getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
– P(getting a 3 after rolling a die) = 1/6

Random Variable
• Random variable: a quantity used to represent probabilistic uncertainty
– Incremental precipitation
– Instantaneous streamflow
– Wind velocity
• A random variable (X) is described by a probability distribution
• A probability distribution is a set of probabilities associated with the values in a random variable's sample space

Sampling terminology
• Sample: a finite set of observations x1, x2, …, xn of the random variable
• A sample comes from a hypothetical infinite population possessing constant statistical properties
• Sample space: set of possible samples that can be drawn from a population
• Event: subset of a sample space

Example
• Population: streamflow
• Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
• Sample: 100 observations of annual max. streamflow
• Event: daily average streamflow > 100 cfs

Summary statistics
• Also called descriptive statistics
– If x1, x2, …, xn is a sample, then
Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the corresponding value for continuous data is $\mu$)
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^2$ (the corresponding value for continuous data is $\sigma^2$)
Standard deviation: $S = \sqrt{S^2}$ (the corresponding value for continuous data is $\sigma$)
Coeff. of variation: $CV = S/\bar{X}$
Summary statistics also include the median, skewness, and correlation coefficient.

Graphical display
• Time series plots
• Histograms/frequency distribution
• Cumulative distribution functions
• Flow duration curve

Time series plot
• Plot of a variable versus time (bars, lines, or points)
• Example:
[Figure: annual maximum flow series, Colorado River near Austin; y-axis: annual max flow (10^3 cfs), 0 to 600; x-axis: year]

Histogram
• Plot of bars whose height is the number $n_i$, or fraction $n_i/N$, of data falling into each of several intervals of equal width
[Figure: three histograms of annual max flow (10^3 cfs) against number of occurrences, with intervals of 50,000 cfs, 25,000 cfs, and 10,000 cfs]
Dividing the number of occurrences by the total number of points gives the probability mass function.

Using Excel to plot histograms
1) Make sure the Analysis ToolPak is added under Tools. This adds the Data Analysis command to the Tools menu.
2) Fill one column with the data and another with the interval boundaries (e.g., for a 50 cfs interval, fill 0, 50, 100, …)
3) Go to Tools → Data Analysis → Histogram
4) Organize the plot in a presentable form (change fonts, scale, color, etc.)

Probability density function
• The continuous form of the probability mass function is the probability density function (pdf)
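The step from bin counts to a probability mass function, and from there to a density, can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_pmf` and the flow values are hypothetical, bins are assumed to start at 0, and the data are assumed non-negative.

```python
def histogram_pmf(values, width):
    """Bin non-negative values into equal-width intervals starting at 0.

    Returns (counts, pmf, density):
      counts  - number of occurrences n_i per interval
      pmf     - fraction n_i / N per interval (sums to 1)
      density - pmf divided by the interval width (integrates to 1)
    """
    n_bins = int(max(values) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int(v // width)] += 1
    total = len(values)
    pmf = [c / total for c in counts]
    density = [p / width for p in pmf]
    return counts, pmf, density


# Hypothetical annual maximum flows in 10^3 cfs (not the Colorado River record)
flows = [12, 48, 55, 61, 97, 103, 140, 160, 210, 480]
counts, pmf, density = histogram_pmf(flows, 50)
print(counts)
print(pmf)
```

Dividing the mass in each bin by the bin width is what makes the bar areas, rather than the bar heights, sum to one, which is the sense in which the density is the continuous counterpart of the mass function.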
[Figure: histogram of annual max flow (10^3 cfs), with number of occurrences on one axis and probability on the other]
The pdf is the first derivative of the cumulative distribution function (cdf).

Cumulative distribution function
• Cumulate the pdf to produce a cdf
• The cdf gives the probability that a random variable is less than or equal to a specified value x
[Figure: cdf of annual max flow (10^3 cfs), probability from 0 to 1, annotated with P(Q ≤ 50,000) = 0.8 and P(Q ≤ 25,000) = 0.4]

Hydrologic extremes
• Extreme events
– Floods
– Droughts
• The magnitude of extreme events is related to their frequency of occurrence:
$\text{Magnitude} \propto \dfrac{1}{\text{Frequency of occurrence}}$
• The objective of frequency analysis is to relate the magnitude of events to their frequency of occurrence through a probability distribution
• It is assumed that the events (data) are independent and come from an identical distribution

Return Period
• Random variable: $X$
• Threshold level: $x_T$
• Extreme event occurs if: $X \ge x_T$
• Recurrence interval: $\tau$ = time between occurrences of $X \ge x_T$
• Return period: $E(\tau) = T$, the average recurrence interval between events equalling or exceeding a threshold
• If p is the probability of occurrence of an extreme event, then $E(\tau) = T = \dfrac{1}{p}$, or $P(X \ge x_T) = \dfrac{1}{T}$

More on return period
• If p is the probability of success, then (1 − p) is the probability of failure
• Find the probability that $X \ge x_T$ occurs at least once in N years.
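Treating each year as an independent trial with exceedance probability p = 1/T, the chance of at least one exceedance in N years is the complement of no exceedance in any of those years. A minimal Python sketch of that calculation follows; the function name `exceedance_risk` and the sample values are illustrative assumptions, not part of the lecture.

```python
def exceedance_risk(return_period, n_years):
    """P(X >= x_T at least once in n_years), assuming independent years."""
    if return_period < 1:
        raise ValueError("return period must be at least 1 year")
    p = 1.0 / return_period             # annual exceedance probability
    return 1.0 - (1.0 - p) ** n_years   # complement of 'no exceedance in any year'


# Hypothetical design question: a 100-year event over a 30-year project life
risk = exceedance_risk(100, 30)
print(f"{risk:.2f}")  # prints 0.26
```

Even a rare event becomes fairly likely over a long enough horizon, which is why design decisions are usually framed in terms of this risk rather than the return period alone.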
$p = P(X \ge x_T)$
$P(X < x_T) = 1 - p$
$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - P(X < x_T \text{ all } N \text{ years})$
$P(X \ge x_T \text{ at least once in } N \text{ years}) = 1 - (1 - p)^N = 1 - \left(1 - \dfrac{1}{T}\right)^N$

Hydrologic data series
• Complete duration series
– All the data available
• Partial duration series
– Magnitude greater than a base value
• Annual exceedance series
– Partial duration series with number of values = number of years
• Extreme value series
– Includes the largest or smallest values in equal intervals
• Annual series: interval = 1 year
• Annual maximum series: largest values
• Annual minimum series: smallest values

Return period example
• Dataset: annual maximum discharge for 106 years on the Colorado River near Austin
[Figure: annual max flow (10^3 cfs) versus year, 1905 to 1998]
• For $x_T$ = 200,000 cfs: no. of occurrences = 3, giving 2 recurrence intervals in 106 years, so T = 106/2 = 53 years
• For $x_T$ = 100,000 cfs: 7 recurrence intervals, so T = 106/7 ≈ 15.1 years
• P(X ≥ 100,000 cfs at least once in the next 5 years) = $1 - (1 - 1/15.1)^5 \approx 0.29$

Probability distributions
• Normal family
– Normal, lognormal, lognormal-III
• Generalized extreme value family
– EV1 (Gumbel), GEV, and EVIII (Weibull)
• Exponential/Pearson type family
– Exponential, Pearson type III, Log-Pearson type III

Normal distribution
• Central limit theorem: if X is the sum of n independent and identically distributed random variables with finite variance, then with increasing n the distribution of X approaches a normal distribution, regardless of the distribution of the individual random variables
• pdf for the normal distribution:
$f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^2\right]$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
Hydrologic variables such as annual precipitation, annual average streamflow, or annual average pollutant loadings tend to follow a normal distribution.

Standard Normal distribution
• A standard normal distribution is a normal distribution with mean ($\mu$) = 0 and standard deviation ($\sigma$) = 1
• A normal distribution is transformed to the standard normal distribution by using the
following formula:
$z = \dfrac{X - \mu}{\sigma}$
z is called the standard normal variable.

Lognormal distribution
• If the pdf of X is skewed, X is not normally distributed
• If Y = log(X) is normally distributed, then X is said to be lognormally distributed:
$f(x) = \dfrac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left[-\dfrac{(y-\mu_y)^2}{2\sigma_y^2}\right], \quad x > 0, \ y = \log x$
Hydraulic conductivity and the distribution of raindrop sizes in a storm tend to follow a lognormal distribution.

Extreme value (EV) distributions
• Extreme values: maximum or minimum values of sets of data
• Examples: annual maximum discharge, annual minimum discharge
• When the number of selected extreme values is large, the distribution converges to one of the three forms of EV distributions, called Type I, II, and III

EV type I distribution
• Let M1, M2, …, Mn be a set of daily rainfall or streamflow values, and let X = max(Mi) be the maximum for the year. If the Mi are independent and identically distributed, then for large n, X has an extreme value type I (Gumbel) distribution:
$f(x) = \dfrac{1}{\alpha}\exp\left[-\dfrac{x-u}{\alpha} - \exp\left(-\dfrac{x-u}{\alpha}\right)\right]$
$\alpha = \dfrac{\sqrt{6}\,s_x}{\pi}, \quad u = \bar{x} - 0.5772\,\alpha$
The distribution of annual maximum streamflow is commonly described by an EV1 distribution.

EV type III distribution
• If Wi are the minimum streamflows on different days of the year, let X = min(Wi) be the smallest. X can be described by the EV type III (Weibull) distribution:
$f(x) = \dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1}\exp\left[-\left(\dfrac{x}{\lambda}\right)^{k}\right], \quad x > 0; \ \lambda, k > 0$
The distribution of low flows (e.g., 7-day minimum flow) is commonly described by an EV3 distribution.

Exponential distribution
• Poisson process: a stochastic process in which the numbers of events occurring in two disjoint subintervals are independent random variables
• In hydrology, the interarrival time (time between stochastic hydrologic events) is described by the exponential distribution:
$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0; \ \lambda = \dfrac{1}{\bar{x}}$
Interarrival times of polluted runoffs, rainfall intensities, etc. are described by the exponential distribution.
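As a small illustration of the exponential model for interarrival times, the sketch below evaluates the density and the probability that the wait until the next event exceeds a given time, with the rate taken as the reciprocal of the mean interarrival time. The function names and the 4-day mean are hypothetical values chosen only for the example.

```python
import math


def exponential_pdf(x, mean_interarrival):
    """Exponential density f(x) = lam * exp(-lam * x) with lam = 1 / mean."""
    if x < 0:
        return 0.0
    lam = 1.0 / mean_interarrival
    return lam * math.exp(-lam * x)


def prob_wait_exceeds(t, mean_interarrival):
    """P(interarrival time > t) = exp(-lam * t), the complement of the cdf."""
    if t < 0:
        return 1.0
    return math.exp(-t / mean_interarrival)


# Hypothetical: storms arrive on average every 4 days
print(round(prob_wait_exceeds(7, 4), 3))  # chance the next dry spell lasts over a week
```

The survival form exp(-λt) is often more convenient than the density when the question is about the length of a gap between events.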
Gamma Distribution
• The time taken for a number of events ($\beta$) to occur in a Poisson process is described by the gamma distribution
• Gamma distribution: the distribution of the sum of $\beta$ independent and identical exponentially distributed random variables
$f(x) = \dfrac{\lambda^{\beta} x^{\beta-1} e^{-\lambda x}}{\Gamma(\beta)}, \quad x \ge 0; \ \Gamma = \text{gamma function}$
Skewed distributions (e.g., hydraulic conductivity) can be represented using the gamma distribution without a log transformation.

Pearson Type III
• Named after the statistician Pearson, it is also called the three-parameter gamma distribution. A lower bound is introduced through the third parameter ($\varepsilon$):
$f(x) = \dfrac{\lambda^{\beta} (x-\varepsilon)^{\beta-1} e^{-\lambda (x-\varepsilon)}}{\Gamma(\beta)}, \quad x \ge \varepsilon; \ \Gamma = \text{gamma function}$
It is also a skewed distribution, first applied in hydrology for describing the pdf of annual maximum flows.

Log-Pearson Type III
• If log X follows a Pearson Type III distribution, then X is said to have a log-Pearson Type III distribution:
$f(x) = \dfrac{\lambda^{\beta} (y-\varepsilon)^{\beta-1} e^{-\lambda (y-\varepsilon)}}{x\,\Gamma(\beta)}, \quad y = \log x \ge \varepsilon$
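The relationship between the Pearson Type III and log-Pearson Type III densities can be made concrete with a short Python sketch. It assumes the natural logarithm for y = log x (a base-10 log would add a constant factor), and the function names and parameter values are illustrative only.

```python
import math


def pearson3_pdf(x, lam, beta, eps):
    """Pearson Type III density: a gamma density shifted to the lower bound eps."""
    if x <= eps:
        # Outside the support; the boundary point is treated as zero
        # to avoid evaluating 0 ** (beta - 1) when beta < 1.
        return 0.0
    z = x - eps
    return lam ** beta * z ** (beta - 1) * math.exp(-lam * z) / math.gamma(beta)


def log_pearson3_pdf(x, lam, beta, eps):
    """Density of X when y = ln(X) is Pearson Type III (natural log assumed)."""
    if x <= 0:
        return 0.0
    # Change of variable y = ln(x) contributes the factor dy/dx = 1/x.
    return pearson3_pdf(math.log(x), lam, beta, eps) / x


# With beta = 1 and eps = 0 the Pearson Type III density reduces to the exponential
print(pearson3_pdf(1.0, 2.0, 1.0, 0.0))
print(2.0 * math.exp(-2.0))
```

Setting eps = 0 recovers the two-parameter gamma density, and additionally setting beta = 1 recovers the exponential density, which is a quick sanity check on any implementation.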