CSE 38300 Introduction to Analytical and Quantitative Methods for Civil Engineering
Lecture 5
Ir CL KWAN
Department of Civil and Environmental Engineering
The Hong Kong Polytechnic University

Sampling Distribution and Estimation

Sampling distribution of the mean
Let X be the number you get when rolling a die. The probability distribution of X is $P(x) = 1/6$ for $x = 1, 2, \dots, 6$.
[Figure: bar chart of P(x) = 1/6 for x = 1, ..., 6]
Population mean: $E(X) = \mu = \sum x P(x) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \dots + 6 \cdot \frac{1}{6} = 3.5$
Population variance: $V(X) = \sigma^2 = \sum (x - \mu)^2 P(x) = (1 - 3.5)^2 \cdot \frac{1}{6} + \dots + (6 - 3.5)^2 \cdot \frac{1}{6} = 2.92$
Population S.D.: $\sigma = \sqrt{\sigma^2} = \sqrt{2.92} = 1.71$

Sampling distribution of the mean
Now suppose you roll the die two times (sample size n = 2) and calculate the mean $\bar{x}$ of the two numbers. With a sample size of 2 there are only 36 combinations and only 11 different values of $\bar{x}$ (1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6), with $\bar{x} = 3.5$ the most likely to occur (6 of the 36 combinations).

Sampling distribution of the mean
We repeat the random experiment 1000 times and calculate the sample mean $\bar{X}$ each time.
[Figure: probability distribution of the sample mean, n = 2]
Note that:
1. The P.D. is centered on 3.5.
2. The P.D. is no longer flat.

Sampling distribution of the mean
We roll a die 12 times and calculate the sample mean $\bar{X}$. Then we repeat the random experiment 1000 times, calculating $\bar{X}$ each time.
[Figure: probability distribution of the sample mean, n = 12]
Note that:
1. The P.D. is centered on 3.5.
2. The P.D. is not flat.
3. The P.D. is narrower than when n = 2.

[Figure: P.D. of the population (flat, µ = 3.5) beside the P.D. of the sample mean $\bar{X}$ for n = 2 and n = 12]
As n gets larger, the P.D. of $\bar{X}$:
1. Is centered on 3.5
2. Gets narrower
3. Becomes more bell-shaped

Sampling distribution of the mean
As the sample size n increases, the probability distribution of the sample mean $\bar{X}$:
1. Becomes more like a normal distribution
2. Is centered on µ
3. Gets narrower

[Figure: normal, uniform and skewed original populations, with the distributions of their sample means for n = 5, 10 and 30, labelled "normal" or "~normal" (approximately normal)]

Central limit theorem
Let the random variable X of a population have mean µ and standard deviation σ, and let a sample of size n be drawn at random from the population. Whatever the probability distribution of X:
1. The probability distribution of the sample mean $\bar{X}$ approaches a normal distribution as the sample size increases.
2. The mean of the sample mean equals the population mean: $\mu_{\bar{X}} = \mu$.
3. The standard deviation of the sample mean, called the "standard error of the mean", is $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.

Central limit theorem
X: µ, σ. How good is $\bar{X}$ in estimating µ?
$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$
What would you expect the value of $\bar{X}$ to be? Before the data are collected, each $X_i$ is a random variable distributed as X, so $\bar{X}$ is also a random variable, with
$$E(\bar{X}) = \frac{1}{n} E(X_1 + X_2 + \dots + X_n) = \frac{1}{n}(\mu + \dots + \mu) = \frac{n\mu}{n} = \mu$$
$$Var(\bar{X}) = \frac{1}{n^2} Var(X_1 + X_2 + \dots + X_n) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}$$
(assuming statistical independence (s.i.) of the $X_i$, due to random sampling).
As n ↑ → $Var(\bar{X})$ ↓ (Theorem 6 revisited).

Central limit theorem
What is the distribution of $\bar{X}$?
If X is normal → $\bar{X}$ is normal: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
If X is not normal but n is large → $\bar{X}$ is approximately normal: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
[Figure: PDFs of $\bar{X}$ centered on µ for sample sizes n1 > n2; the larger sample size n1 gives the narrower curve]

Central limit theorem
For most populations, if the sample size is greater than 30, the Central Limit Theorem approximation is good.

Example 1
At a large university, µ = 22.3 years and σ = 4 years. A random sample of 64 students is drawn. Determine the probability that the average age of these students is greater than 23 years.
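The dice experiment and the Central limit theorem result can be checked numerically. The following is a minimal Python sketch using only the standard library (`random`, `statistics`); the helper names `simulate_dice_means` and `prob_mean_exceeds`, the seed and the number of trials are illustrative assumptions, not part of the notes. It repeats the n = 2 and n = 12 experiments 1000 times, compares the spread of $\bar{X}$ with $\sigma/\sqrt{n}$, and evaluates the CLT approximation for Example 1.

```python
import random
import statistics
from statistics import NormalDist

def simulate_dice_means(n, trials=1000, seed=0):
    """Roll a fair die n times and record the sample mean; repeat `trials` times."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [statistics.mean(rng.randint(1, 6) for _ in range(n))
            for _ in range(trials)]

def prob_mean_exceeds(mu, sigma, n, threshold):
    """CLT approximation: X-bar ~ N(mu, sigma^2/n); returns P(X-bar > threshold)."""
    standard_error = sigma / n ** 0.5
    return 1 - NormalDist(mu, standard_error).cdf(threshold)

# Population of one fair die
faces = range(1, 7)
mu = statistics.mean(faces)        # 3.5
sigma = statistics.pstdev(faces)   # about 1.71 (population S.D.)

for n in (2, 12):
    means = simulate_dice_means(n)
    print(f"n = {n:2d}: mean of x-bar = {statistics.mean(means):.3f}, "
          f"S.D. of x-bar = {statistics.stdev(means):.3f}, "
          f"sigma/sqrt(n) = {sigma / n ** 0.5:.3f}")

# Example 1: mu = 22.3 years, sigma = 4 years, n = 64, threshold = 23 years
p_example1 = prob_mean_exceeds(22.3, 4, 64, 23)
print(f"P(X-bar > 23) = {p_example1:.4f}")  # close to 0.0808
```

The simulated S.D. of $\bar{X}$ should land close to $\sigma/\sqrt{n}$ (about 1.21 for n = 2 and 0.49 for n = 12), which is the narrowing seen in the figures.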
Solution: µ = 22.3, σ² = 16, and n = 64 is large. By the Central limit theorem,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N(22.3, 0.25)$$
$$\therefore P(\bar{X} > 23) = P\left(\frac{\bar{X} - 22.3}{\sqrt{0.25}} > \frac{23 - 22.3}{\sqrt{0.25}}\right) = P(Z > 1.40) = 0.0808$$

Statistical Estimation of Parameters
Classical estimation of parameters consists of two types: (1) point estimation and (2) interval estimation.
Point estimation aims to calculate a single number, from a set of observational data, to represent the parameter of the underlying population.
Interval estimation determines an interval within which the true parameter lies, with a statement of "confidence" represented by the probability that the true parameter is in the interval.

Estimating Parameters From Observation Data
[Figure: Role of sampling in statistical inference. The real-world "population" (true characteristics unknown) is represented by a theoretical model: a random variable X on the real line, −∞ < x < ∞, with distribution $f_X(x)$. Sampling (experimental observations) gives a sample {x1, x2, …, xn}, which statistical estimation uses for inference on $f_X(x)$: mean $\mu \approx \bar{x} = \frac{1}{n}\sum x_i$ and variance $\sigma^2 \approx s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$.]

Point Estimation
Definition: If $E[U(X_1, X_2, \dots, X_n)] = \theta$, then $U(X_1, X_2, \dots, X_n)$ is called an unbiased estimator of θ. Otherwise, it is said to be biased. Here θ is an unknown parameter of the probability distribution, e.g. the parameter λ in $f(x) = \lambda e^{-\lambda x}$.

Example 2
Consider a random sample of size n drawn from a population. The sample mean is $\bar{X} = \frac{1}{n}X_1 + \dots + \frac{1}{n}X_n$.
$$E(\bar{X}) = E\left(\frac{1}{n}X_1 + \dots + \frac{1}{n}X_n\right) = \frac{1}{n}E(X_1) + \dots + \frac{1}{n}E(X_n) = \frac{1}{n}\mu + \dots + \frac{1}{n}\mu = \mu$$
∴ $\bar{X}$ is an unbiased estimator of µ. QED
Note that the error in the estimator $\bar{X}$ decreases as n increases, since $Var(\bar{X}) = \sigma^2/n$ (see the Central limit theorem slides).

Point Estimation
Example 3
Consider a random sample of size n drawn from a population. The sample variance is
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Prove that S² is an unbiased estimator of σ², that is, $E(S^2) = \sigma^2$. (Try to derive it.)

Example 4
Consider the Binomial distribution, where X is the number of successes in n trials.
An estimator $\hat{p} = X/n$ is unbiased:
$$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}(np) = p$$
∴ $\hat{p}$ is an unbiased estimator of p. QED

Estimation of Parameters, e.g. µ, σ², λ, ζ, etc.
a) Method of moments: equate the statistical moments (e.g. mean, variance, skewness, etc.) of the model to those of the sample.
e.g. for normal $X \sim N(\mu, \sigma)$: $\hat{\mu} = \bar{x}$, $\hat{\sigma}^2 = s^2$
for lognormal $X \sim LN(\lambda, \zeta)$:
$$E(X) = \mu = \exp\left(\lambda + \frac{1}{2}\zeta^2\right) \equiv \bar{x} \;\Rightarrow\; \lambda = \ln\bar{x} - \frac{1}{2}\zeta^2$$
$$Var(X) = \sigma^2 = \mu^2\left[e^{\zeta^2} - 1\right] \equiv s^2 \;\Rightarrow\; \zeta^2 = \ln\left(1 + \frac{s^2}{\bar{x}^2}\right) = \ln(1 + \delta^2)$$
where δ = σ/µ is the coefficient of variation.

Example 5
Data for the fatigue life of aluminium yielded $\bar{x}$ = 26.75 million cycles and s² = 360.0 (million cycles)². Estimate λ and ζ.
$$\zeta^2 = \ln\left(1 + \frac{360}{26.75^2}\right) = 0.4075 \;\Rightarrow\; \zeta = 0.638; \qquad \lambda = \ln(26.75) - \frac{1}{2}\zeta^2 = 3.083$$

Common Distributions and their Parameters
[Table: common distributions and their parameters (two slides)]

b) Method of maximum likelihood
Consider a random variable X with PDF f(x; θ), where θ is the parameter. The idea is to estimate θ by the value that makes the observed data x1, x2, …, xn most likely.
When a probability mass function or probability density function is considered as a function of the parameter, it is called a likelihood function. Assuming random sampling, the likelihood of obtaining a set of n independent observations is
$$L(\theta) = f_X(x_1; \theta)\, f_X(x_2; \theta) \cdots f_X(x_n; \theta)$$
where x1, x2, …, xn are the observed data.
The maximum likelihood estimator $\hat{\theta}$ is the value of θ that maximizes the likelihood function L(θ):
$$\frac{\partial L(\theta)}{\partial \theta} = 0 \quad \text{or, equivalently,} \quad \frac{\partial \log L(\theta)}{\partial \theta} = 0 \;\rightarrow\; \hat{\theta} \text{ (optimal estimate of } \theta)$$

Method of maximum likelihood (Cont'd)
Example 6: The times between successive occurrences of high-intensity earthquakes were observed to be 7.3, 6.2, 5.4, 9.3 and 8.3 years. Assuming an exponential distribution, estimate λ by maximum likelihood.
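Both estimation methods can be checked numerically. The sketch below (plain Python; the helper names are hypothetical) applies the lognormal method-of-moments relations to Example 5, and for the Example 6 data evaluates the closed-form exponential maximum-likelihood estimate $\hat\lambda = n/\sum x_i$ (derived in the Example 6 solution), cross-checking it with a crude grid search over the log-likelihood $n\log\lambda - \lambda\sum x_i$. As in the notes, an exponential model for the earthquake inter-arrival times is assumed.

```python
import math

def lognormal_method_of_moments(xbar, s2):
    """Method of moments for X ~ LN(lambda, zeta):
    zeta^2 = ln(1 + s^2 / xbar^2) and lambda = ln(xbar) - zeta^2 / 2."""
    zeta2 = math.log(1 + s2 / xbar ** 2)
    lam = math.log(xbar) - 0.5 * zeta2
    return lam, math.sqrt(zeta2)

def exponential_log_likelihood(lam, data):
    """log L(lambda) = n log(lambda) - lambda * sum(x_i) for f(x) = lambda e^(-lambda x)."""
    return len(data) * math.log(lam) - lam * sum(data)

def exponential_mle(data):
    """Closed-form maximizer of the log-likelihood: lambda_hat = n / sum(x_i)."""
    return len(data) / sum(data)

# Example 5: fatigue life, xbar = 26.75 million cycles, s^2 = 360.0 (million cycles)^2
lam, zeta = lognormal_method_of_moments(26.75, 360.0)
print(f"lambda = {lam:.3f}, zeta = {zeta:.3f}")  # lambda = 3.083, zeta = 0.638

# Example 6: times between high-intensity earthquakes (years)
times = [7.3, 6.2, 5.4, 9.3, 8.3]
lam_hat = exponential_mle(times)

# Cross-check: a crude grid search should peak at the same lambda
grid = [i / 1000 for i in range(1, 1001)]
lam_grid = max(grid, key=lambda g: exponential_log_likelihood(g, times))
print(f"lambda_hat = {lam_hat:.3f} (closed form), {lam_grid:.3f} (grid search)")
```

The grid search is only there to show that the closed form really maximizes log L(λ); for distributions without a closed-form estimator, a numerical optimizer would take its place.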
Solution: with $f_X(x) = \lambda e^{-\lambda x}$,
$$L(\lambda) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2}) \cdots (\lambda e^{-\lambda x_n}) = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$
$$\log L(\lambda) = n \log\lambda - \lambda \sum_{i=1}^{n} x_i$$
$$\frac{d \log L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i}$$
$$\therefore \hat{\lambda} = \frac{5}{36.5} = 0.137 \text{ quake/year}$$
[Figure: exponential PDFs $f_X(x)$ for λ = 0.2 and λ = 0.1, with two observations X1 < X2 marked on the x-axis]
Given X1, λ = 0.2 is more likely; similarly, given X2, λ = 0.1 is more likely.
∴ The likelihood of λ depends on $f_X(x_i)$ and the observed $x_i$'s.

c) Probability plots (Probability paper)
Scientists and engineers often work with data that can be thought of as a random sample from some population. In many cases, it is important to determine the probability distribution that approximately describes the population. More often than not, the only way to determine an appropriate distribution is to examine the sample and find a probability distribution that fits it.

Use of Probability Paper
1. Arrange the data in ascending order: $x_1, x_2, \dots, x_m, \dots, x_N$.
2. Let the cumulative probability (plotting position) of $x_m$ be $P_m = \frac{m}{N+1}$, m = 1, 2, …, N.
3. Plot $x_m$ vs s, where s is the standard normal variate with $\Phi(s) = P_m$.
4. See whether the points follow a straight line.

Use of Probability Paper
The normal (or Gaussian) probability paper is constructed on the basis of the standard normal CDF:
$$s = \frac{x - \mu}{\sigma} \;\Rightarrow\; x = \sigma s + \mu \quad (\text{slope } \sigma, \text{ y-intercept } \mu)$$
The lognormal probability paper can be obtained from the normal probability paper by simply changing the arithmetic scale for values of X to a logarithmic scale.
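The four probability-paper steps can also be carried out numerically instead of on printed paper. The following minimal sketch uses `statistics.NormalDist().inv_cdf` to obtain s from $P_m$ and applies it to the data of Examples 7 and 8. Fitting the straight line by least squares, and using the correlation coefficient r as a measure of straightness, are assumptions standing in for drawing the line by eye; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def probability_paper_points(data):
    """Steps 1-3: sort the data, assign P_m = m/(N+1), and find s with Phi(s) = P_m."""
    xs = sorted(data)
    n = len(xs)
    ps = [m / (n + 1) for m in range(1, n + 1)]
    ss = [NormalDist().inv_cdf(p) for p in ps]
    return xs, ps, ss

def fit_line(ss, ys):
    """Least-squares line y = slope * s + intercept, plus the correlation r.
    (Assumption: stands in for drawing the straight line by eye.)"""
    n = len(ss)
    s_bar, y_bar = sum(ss) / n, sum(ys) / n
    sxy = sum((s - s_bar) * (y - y_bar) for s, y in zip(ss, ys))
    sxx = sum((s - s_bar) ** 2 for s in ss)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, y_bar - slope * s_bar, sxy / math.sqrt(sxx * syy)

# Example 7 (normal paper): x = sigma * s + mu
times = [2.9, 3.5, 4.0, 2.5, 3.1]
xs, ps, ss = probability_paper_points(times)
sigma_hat, mu_hat, r_normal = fit_line(ss, xs)
print([round(s, 2) for s in ss])  # [-0.97, -0.43, 0.0, 0.43, 0.97]
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}, r = {r_normal:.3f}")

# Example 8 (lognormal paper): ln(tau) = zeta * s + lambda
tau = [0.35, 0.40, 0.41, 0.42, 0.43, 0.48, 0.49, 0.58, 0.68, 0.70, 0.75, 0.87, 0.96]
taus, _, s_tau = probability_paper_points(tau)
zeta_hat, lambda_hat, r_lognormal = fit_line(s_tau, [math.log(t) for t in taus])
print(f"lambda_hat = {lambda_hat:.2f}, zeta_hat = {zeta_hat:.2f}, r = {r_lognormal:.3f}")
```

For Example 7 this gives µ̂ ≈ 3.20 and σ̂ ≈ 0.76, and for Example 8 roughly λ ≈ −0.60 and ζ ≈ 0.37. These are close to the graphical estimates (3.2, 0.75) and (−0.58, 0.36); small differences are expected because the notes fit the line by eye.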
On lognormal probability paper:
$$s = \frac{\ln x - \lambda}{\zeta} \;\Rightarrow\; \ln x = \zeta s + \lambda \quad (\text{slope } \zeta, \text{ y-intercept } \lambda)$$

Example 7
Data: time spent by 5 randomly chosen vehicles passing through a junction: 2.9 s, 3.5 s, 4 s, 2.5 s, 3.1 s; N = 5.

| m | $x_m$ (s) | $P_m$ | $P_m$ (decimal) | s |
|---|---|---|---|---|
| 1 | 2.5 | 1/6 | 0.1667 | −0.97 |
| 2 | 2.9 | 2/6 | 0.3333 | −0.43 |
| 3 | 3.1 | 3/6 | 0.5000 | 0 |
| 4 | 3.5 | 4/6 | 0.6667 | 0.43 |
| 5 | 4.0 | 5/6 | 0.8333 | 0.97 |

It is found that the data points appear to fit the normal distribution.

[Figure: Example 7 data on normal probability paper, $x_m$ against s; s = −1, 0, 1 correspond to $P_m$ = 16%, 50%, 84%]
From the fitted straight line:
$\hat{\mu} = x_{0.5} = 3.2$
$\hat{\sigma} = x_{0.84} - x_{0.5} = 3.95 - 3.2 = 0.75$

Example 8: Shear strength of concrete
13 data for the shear strength of concrete were obtained:

| m | Shear strength τ | m/(N+1) | s | ln τ |
|---|---|---|---|---|
| 1 | 0.35 | 0.0714 | −1.47 | −1.05 |
| 2 | 0.40 | 0.1429 | −1.07 | −0.92 |
| 3 | 0.41 | 0.2143 | −0.79 | −0.89 |
| 4 | 0.42 | 0.2857 | −0.57 | −0.87 |
| 5 | 0.43 | 0.3571 | −0.37 | −0.84 |
| 6 | 0.48 | 0.4286 | −0.18 | −0.73 |
| 7 | 0.49 | 0.5000 | 0 | −0.71 |
| 8 | 0.58 | 0.5714 | 0.18 | −0.54 |
| 9 | 0.68 | 0.6429 | 0.37 | −0.39 |
| 10 | 0.70 | 0.7143 | 0.57 | −0.36 |
| 11 | 0.75 | 0.7857 | 0.79 | −0.29 |
| 12 | 0.87 | 0.8571 | 1.07 | −0.14 |
| 13 | 0.96 | 0.9286 | 1.47 | −0.04 |

It is found that the data points appear to fit the lognormal distribution.

[Figure: Example 8 data on normal probability paper, τ against s]
[Figure: Example 8 data on lognormal probability paper, ln τ against s; the fitted line passes through ln τ = −0.58 at s = 0 and ln τ = −0.22 at s = 1]
$$\lambda = -0.58, \qquad \zeta = \frac{-0.22 - (-0.58)}{1 - 0} = 0.36$$

Goodness-of-fit test of distribution
Even though the data points appear to fall on a straight line, how good is the fit? Would the model be accepted or rejected at a prescribed confidence level? If the data appear to fit several probability models, which one is better?
• Chi-square ($\chi^2$) test
• Kolmogorov-Smirnov (K-S) test

Procedures of Chi-Square ($\chi^2$) test
1. Draw a histogram from the data.
2. Draw the proposed distribution (frequency diagram), normalized by the number of occurrences so that it has the same area as the histogram.
3. Select appropriate intervals, k.
4. Determine $n_i$ = observed incidences per interval and $e_i$ = predicted incidences per interval based on the model.
5. Determine $\frac{(n_i - e_i)^2}{e_i}$ for each interval.
6. Determine $Z = \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i}$ over all intervals. Note: a larger Z means a poorer fit.
7. Compare Z with the standardized value $c_{1-\alpha, f}$, where 1 − α is the level of confidence and f = k − 1 − m is the number of degrees of freedom (d.o.f.), m being the number of parameters in the proposed distribution estimated from the data.
[Figure: $\chi^2_f$ distribution; the upper-tail area beyond $c_{1-\alpha, f}$ is P = α]
8. Check: if $Z < c_{1-\alpha, f}$, the probability model is substantiated at confidence level 1 − α. Otherwise, the model is not substantiated.
The validity of the method relies on k ≥ 5 and $e_i$ ≥ 5 (combine some intervals if necessary).

Example 9 – Crushing strength of 143 concrete cubes

| Interval (ksi) | Observed frequency $n_i$ | $e_i$ Normal | $e_i$ Lognormal | $(n_i - e_i)^2/e_i$ Normal | $(n_i - e_i)^2/e_i$ Lognormal |
|---|---|---|---|---|---|
| < 6.75 | 9 | 11.23 | 10.37 | 0.44 | 0.18 |
| 6.75 – 7.00 | 17 | 13.47 | 14.38 | 0.92 | 0.48 |
| 7.00 – 7.25 | 22 | 20.85 | 22.18 | 0.06 | 0.00 |
| 7.25 – 7.50 | 31 | 25.94 | 26.59 | 0.99 | 0.73 |
| 7.50 – 7.75 | 28 | 25.94 | 25.36 | 0.16 | 0.27 |
| 7.75 – 8.00 | 20 | 20.85 | 19.66 | 0.03 | 0.01 |
| 8.00 – 8.50 | 9 | 20.47 | 19.43 | 6.43 | 5.60 |
| > 8.50 | 7 | 4.23 | 5.04 | 1.81 | 0.76 |
| Σ | 143 | 143.00 | 143.00 | 10.85 | 8.03 |

$Z_N = 10.85$; $Z_{LN} = 8.03$; f = k − 1 − m = 8 − 1 − 2 = 5; $c_{0.95, 5} = 11.07$
As both $Z_N$ and $Z_{LN} \le c_{0.95, 5}$, both models are substantiated at the 95% confidence level (the lognormal is better than the normal, since $Z_{LN} < Z_N$).

Kolmogorov-Smirnov (K-S) Test
Arrange the data in ascending order: $x_1, x_2, \dots, x_k, \dots, x_n$.
Sample CDF:
$$S_n(x) = \begin{cases} 0 & x < x_1 \\ \dfrac{k}{n} & x_k \le x < x_{k+1} \\ 1 & x \ge x_n \end{cases}$$
Compare the sample CDF $S_n(x)$ with the CDF $F_X(x)$ of the proposed model, and identify the largest discrepancy $D_{max}$ between the two curves.
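Compare $D_{max}$ with a standardized (critical) value $D_n^{\alpha}$:
$D_{max} > D_n^{\alpha}$ → reject the model
$D_{max} < D_n^{\alpha}$ → model substantiated

Example 10 – Data for fracture toughness of 26 steel plates
[Figure: sample CDF of the 26 fracture toughness values against the CDF of the proposed model; the largest discrepancy occurs at the 17th data point, 7.9, where $S_{26} = 17/26 = 0.654$ and the model CDF is 0.5]
$D_{max} = 0.654 - 0.5 = 0.154$

Critical values $D_n^{\alpha}$ in the K-S test:

| n | α = 0.20 | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|---|
| 5 | 0.45 | 0.51 | 0.56 | 0.67 |
| 10 | 0.32 | 0.37 | 0.41 | 0.49 |
| 15 | 0.27 | 0.30 | 0.34 | 0.40 |
| 20 | 0.23 | 0.26 | 0.29 | 0.36 |
| 25 | 0.21 | 0.24 | 0.27 | 0.32 |
| 30 | 0.19 | 0.22 | 0.24 | 0.29 |

For α = 0.05, $D_{26}^{0.05} = 0.265$. Since $D_{max} = 0.154 < 0.265$, the proposed model is substantiated at the 5% significance level.

Confidence Interval (or Interval Estimation)
• A range (or an interval) of values likely to contain the true value of the population parameter, for example: Lower # < µ < Upper #
Definition – Degree of Confidence
• The probability 1 − α that the confidence interval contains the true value of the population parameter (often expressed as the equivalent percentage value), usually 95% (α = 5%) or 99% (α = 1%).

Confidence interval (Interval estimation) of µ
We would like to establish P(? < µ < ?) = 0.95. Assuming σ is known, the distribution of the sample mean gives $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so
$$y = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \text{ is } N(0, 1) \quad (\text{see E4.1})$$
$$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$
[Figure: standard normal PDF with central area 0.95 between ±1.96 and area 0.025 in each tail; $k_{0.025} = 1.96$]
From the left inequality: $-1.96\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu \;\Rightarrow\; \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
Similarly, from the right inequality: $\bar{x} - \mu \le 1.96\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \mu \ge \bar{x} - 1.96\frac{\sigma}{\sqrt{n}}$
$$\therefore P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Confidence interval of µ (Cont'd)
From data: n = 25, $\bar{x}$ = 5.6; assume σ = 0.65.
∴ P(5.345 ≤ µ < 5.855) = 0.95
Here µ is not a random variable; (5.345, 5.855) is a confidence interval.
In short,
$$\langle\mu\rangle_{0.95} = \bar{x} \pm k_{0.025}\frac{\sigma}{\sqrt{n}}, \qquad \langle\mu\rangle_{1-\alpha} = \bar{x} \pm k_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \quad \text{where } k_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$
[Figure: standard normal PDF with central area 1 − α and area α/2 in each tail beyond $\pm k_{\alpha/2}$]

Example 11
Daily dissolved oxygen (DO): n = 30 observations, $\bar{x}$ = 2.52 mg/l, s = 2.05 mg/l = σ (assumed). Determine the 99% confidence interval of µ.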
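The two-sided interval $\bar{x} \pm k_{\alpha/2}\sigma/\sqrt{n}$ derived above, and the meaning of "degree of confidence", can be sketched in Python as follows. This is an illustrative sketch, not part of the notes: `z_confidence_interval` and `coverage` are hypothetical helper names, $k_{\alpha/2}$ is computed with `statistics.NormalDist().inv_cdf` instead of being read from a table, and the coverage check assumes a normal population with µ = 2.91 (the "true" mean used in the notes' figure of intervals from different samples), an arbitrary seed and an arbitrary number of trials.

```python
import math
import random
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence):
    """Two-sided interval for mu with sigma known:
    xbar -/+ k_{alpha/2} * sigma / sqrt(n), with k_{alpha/2} = Phi^-1(1 - alpha/2)."""
    alpha = 1 - confidence
    k = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = k * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def coverage(mu, sigma, n, confidence, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mu.
    Assumes a normal population N(mu, sigma) purely for the simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lo, hi = z_confidence_interval(xbar, sigma, n, confidence)
        hits += lo <= mu <= hi  # True counts as 1
    return hits / trials

# Worked case from the notes: n = 25, xbar = 5.6, sigma = 0.65 (assumed known)
print(z_confidence_interval(5.6, 0.65, 25, 0.95))  # about (5.345, 5.855)

# Example 11: n = 30, xbar = 2.52 mg/l, sigma = 2.05 mg/l (assumed)
print(z_confidence_interval(2.52, 2.05, 30, 0.99))
print(z_confidence_interval(2.52, 2.05, 30, 0.95))

# Repeated sampling: roughly 95% of the intervals should contain mu
print(coverage(mu=2.91, sigma=2.05, n=30, confidence=0.95))
```

For Example 11 at 99%, the unrounded quantile 2.576 gives about (1.56, 3.48); rounding it to 2.58 moves the upper limit to 3.49. The simulated coverage should come out near 0.95, which is exactly the "probability 1 − α that the interval contains µ".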
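Solution: 1 − α = 0.99 → α = 0.01 → α/2 = 0.005
$k_{0.005} = \Phi^{-1}(1 - 0.005) = \Phi^{-1}(0.995) = 2.58$
$$\therefore \langle\mu\rangle_{0.99} = \bar{x} \pm k_{0.005}\frac{\sigma}{\sqrt{n}} = 2.52 \pm 2.58\frac{2.05}{\sqrt{30}} = (1.56; 3.49)$$
Similarly, $\langle\mu\rangle_{0.95} = (1.79; 3.25)$.
As the confidence level ↑ → the interval widens; as s ↑ → ⟨µ⟩ widens; as n ↑ → ⟨µ⟩ narrows.

Confidence Intervals from Different Samples
[Figure: confidence intervals computed from different samples, plotted on an axis from 0 to 5 against the true mean µ = 2.91 (but unknown to us); the interval from this sample, 1.56 to 3.49 around $\bar{x}$ = 2.52, contains µ, while one of the other intervals does not contain µ]

Confidence Interval of µ when σ is unknown
We need the distribution of
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
→ Student's t-distribution with parameter f = n − 1 (degrees of freedom). For large f it approaches N(0, 1); for small f it has heavier tails, so the interval is wider for the same confidence level.
[Figure: t PDFs for small f and for large f (≈ N(0, 1)), centered on 0]
Since $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is $T_{n-1}$,
$$\langle\mu\rangle_{1-\alpha} = \left(\bar{x} - t_{\alpha/2, n-1}\frac{s}{\sqrt{n}};\; \bar{x} + t_{\alpha/2, n-1}\frac{s}{\sqrt{n}}\right)$$
[Figure: t PDF with upper-tail area α/2 beyond $t_{\alpha/2, f}$, i.e. cumulative probability p = 1 − α/2 at $t_{\alpha/2, f}$]
Go to p383 Ex.5.6.

Example 11 (Cont.)
$\bar{x}$ = 2.52, s = 2.05, n − 1 = 29
99% → α/2 = 0.005 → p = 0.995
∴ $t_{\alpha/2, n-1} = t_{0.005, 29} = 2.756$
$$\langle\mu\rangle_{0.99} = 2.52 \pm 2.756\frac{2.05}{\sqrt{30}} = (1.49 \text{ to } 3.55)$$
which is wider than (1.56 to 3.49) for the known-σ case.

Example 12
Traffic survey on the speed of vehicles. Suppose we would like to determine the mean vehicle velocity to within ±1 kph with 99% confidence. How many vehicles should be observed? Assume σ = 3.58 kph from a previous study.
$$\langle\mu\rangle_{1-\alpha} = \bar{x} \pm k_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Setting the scatter equal to 1 kph:
$$k_{0.005}\frac{3.58}{\sqrt{n}} = 2.58\frac{3.58}{\sqrt{n}} = 1 \;\Rightarrow\; n = 85.3, \text{ i.e. } n = 86$$
What if σ is not known, but the sample standard deviation is expected to be s = 3.58 and the mean is desired to be within ±1? Then
$$t_{0.005, n-1}\frac{3.58}{\sqrt{n}} = 1 \;\Rightarrow\; \frac{t_{0.005, n-1}}{\sqrt{n}} = \frac{1}{3.58} = 0.279$$
By trial, with n = 89: LHS = $\frac{t_{0.005, 88}}{\sqrt{89}} = \frac{2.633}{\sqrt{89}} = 0.279$ = RHS
∴ n = 89, compared with n ≈ 86 for σ known.

One-sided confidence limit
For strength: $P\left(\mu \ge \bar{x} - k_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = 0.95$
For load: $P\left(\mu \le \bar{x} + k_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = 0.95$
Note: use $k_\alpha$, not $k_{\alpha/2}$. Start by writing
$$P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le k_{\alpha}\right) = 1 - \alpha$$
[Figure: standard normal PDF with area 1 − α below $k_\alpha$ and area α in the upper tail]
100(1 − α)% lower confidence limit: $\bar{x} - k_{\alpha}\frac{\sigma}{\sqrt{n}}$ (σ known); $\bar{x} - t_{\alpha, n-1}\frac{s}{\sqrt{n}}$ (σ unknown)
100(1 − α)% upper confidence limit: $\bar{x} + k_{\alpha}\frac{\sigma}{\sqrt{n}}$ (σ known); $\bar{x} + t_{\alpha, n-1}\frac{s}{\sqrt{n}}$ (σ unknown)

The End of the Session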
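The t-based interval of Example 11 (Cont.), the trial-and-error sample-size search of Example 12 and the one-sided lower limit can be sketched as below. The Python standard library has no Student's t quantile function, so this sketch assumes a standard asymptotic expansion of $t_{\alpha/2,f}$ in powers of 1/f around the normal quantile. It reproduces the table values 2.756 (f = 29) and 2.633 (f = 88) to about three decimals but is less accurate for very small f, where a t-table (or `scipy.stats.t.ppf`, if SciPy is available) should be used. All function names are hypothetical, and the one-sided example reuses the n = 25 data purely for illustration.

```python
import math
from statistics import NormalDist

def t_quantile(p, f):
    """Approximate t with P(T_f <= t) = p via an asymptotic expansion in 1/f
    around the standard normal quantile z (assumption: adequate for moderate
    to large f; use a t-table, or scipy.stats.t.ppf, for very small f)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / f + g2 / f**2 + g3 / f**3 + g4 / f**4

def t_confidence_interval(xbar, s, n, confidence):
    """Two-sided interval xbar -/+ t_{alpha/2, n-1} * s / sqrt(n) (sigma unknown)."""
    t = t_quantile(1 - (1 - confidence) / 2, n - 1)
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def sample_size_known_sigma(sigma, error, confidence):
    """Smallest n with k_{alpha/2} * sigma / sqrt(n) <= error."""
    k = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((k * sigma / error) ** 2)

def sample_size_unknown_sigma(s, error, confidence, n_max=10_000):
    """Smallest n with t_{alpha/2, n-1} * s / sqrt(n) <= error, found by trial."""
    p = 1 - (1 - confidence) / 2
    for n in range(2, n_max + 1):
        if t_quantile(p, n - 1) * s / math.sqrt(n) <= error:
            return n
    raise ValueError("no n up to n_max gives the required precision")

def lower_confidence_limit(xbar, sigma, n, confidence):
    """One-sided 100(1 - alpha)% lower limit xbar - k_alpha * sigma / sqrt(n), sigma known.
    Note k_alpha = Phi^-1(1 - alpha), not k_{alpha/2}."""
    k = NormalDist().inv_cdf(confidence)
    return xbar - k * sigma / math.sqrt(n)

# Example 11 (Cont.): xbar = 2.52, s = 2.05, n = 30, 99%
print(t_confidence_interval(2.52, 2.05, 30, 0.99))  # about (1.49, 3.55)
# Example 12: within +/-1 kph with 99% confidence, sigma (or s) = 3.58
print(sample_size_known_sigma(3.58, 1.0, 0.99))     # 86
print(sample_size_unknown_sigma(3.58, 1.0, 0.99))   # 89
# Illustration only: 95% lower limit for n = 25, xbar = 5.6, sigma = 0.65
print(lower_confidence_limit(5.6, 0.65, 25, 0.95))  # about 5.386
```

The search in `sample_size_unknown_sigma` mirrors the "by trial" step in Example 12: because $t_{0.005,n-1}$ itself depends on n, there is no closed form, so n is increased until the scatter drops to the required ±1 kph.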