CHAPTER 4
• 4.1 - Discrete Models
    General distributions
    Classical: Binomial, Poisson, etc.
• 4.2 - Continuous Models
    General distributions
    Classical: Normal, etc.

~ The Normal Distribution ~ (a.k.a. “The Bell Curve”)
Johann Carl Friedrich Gauss, 1777-1855

X ~ N(μ, σ), with mean μ and standard deviation σ
• Symmetric, unimodal
• Models many (but not all) natural systems
• Mathematical properties make it useful to work with

Standard Normal Distribution: Z ~ N(0, 1)
Density function: $\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, with Total Area = 1 under the curve.
The cumulative distribution function (cdf) is denoted by $\Phi(z)$. It is tabulated, and computable in R via the command pnorm.

Example: Find P(Z ≤ 1.2), where 1.2 is the “z-score”.
• Use the included table (Lecture Notes Appendix…).
• Use R:
    > pnorm(1.2)
    [1] 0.8849303
So P(Z ≤ 1.2) = 0.88493 and P(Z > 1.2) = 0.11507.
Note: Because this is a continuous distribution, P(Z = 1.2) = 0, so there is no difference between P(Z > 1.2) and P(Z ≥ 1.2), etc.

Standard Normal Distribution
Why be concerned about this, when most “bell curves” don’t have mean = 0, and standard deviation = 1?
Any normal distribution X ~ N(μ, σ) can be transformed to the standard normal distribution Z ~ N(0, 1) via a simple change of variable: $Z = \frac{X - \mu}{\sigma}$.

Example: POPULATION, Year 2010
Random Variable X = Age at first birth, X ~ N(25.4, 1.5), i.e., μ = 25.4 and σ = 1.5
Question: What proportion of the population had their first child before the age of 27.2 years old? P(X < 27.2) = ?
The x-score = 27.2 must first be transformed to a corresponding z-score.
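The change of variable for this example can be sketched in R as follows. This is only an illustration: the variable names (mu, sigma, x, z, p_via_z, p_direct) are chosen here and do not come from the slides.

```r
# Age at first birth: X ~ N(25.4, 1.5), as given in the example
mu    <- 25.4
sigma <- 1.5
x     <- 27.2

# Change of variable: z = (x - mu) / sigma
z <- (x - mu) / sigma

# P(X < 27.2) computed two ways:
p_via_z  <- pnorm(z)                          # standard normal cdf at the z-score
p_direct <- pnorm(x, mean = mu, sd = sigma)   # pnorm applied directly to X

print(z)
print(p_via_z)
print(p_direct)
```

Both routes should give the same probability, since pnorm with mean and sd arguments performs the same standardization internally.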
Example (continued): X = Age at first birth, X ~ N(25.4, 1.5), Year 2010
$Z = \frac{X - \mu}{\sigma} = \frac{27.2 - 25.4}{1.5} = 1.2$
P(X < 27.2) = P(Z < 1.2) = 0.88493
Using R:
    > pnorm(27.2, 25.4, 1.5)
    [1] 0.8849303

Standard Normal Distribution: Z ~ N(0, 1)
What symmetric interval about the mean 0 contains 95% of the population values? That is, which “.025 critical values” −z.025 and +z.025 leave area 0.95 between them and area 0.025 in each tail?
• Use the included table (Lecture Notes Appendix…).
• Use R:
    > qnorm(.025)
    [1] -1.959964
    > qnorm(.975)
    [1] 1.959964
So −z.025 = −1.96 and +z.025 = +1.96.

X ~ N(25.4, 1.5)
What symmetric interval about the mean age of 25.4 contains 95% of the population values?
$-1.96 \le Z \le 1.96$
$-1.96 \le \frac{X - 25.4}{1.5} \le 1.96$
$25.4 - (1.96)(1.5) \le X \le 25.4 + (1.96)(1.5)$
$25.4 - 2.94 \le X \le 25.4 + 2.94$
$22.46 \le X \le 28.34$ yrs
Using R:
    > areas = c(.025, .975)
    > qnorm(areas, 25.4, 1.5)
    [1] 22.46005 28.33995

Standard Normal Distribution: Z ~ N(0, 1)
Similarly… What symmetric interval about the mean 0 contains 90% of the population values? That is, find the “.05 critical values” −z.05 and +z.05, with area 0.90 between them and area 0.05 in each tail.
• Use the included table: 0.95 ≈ average of 0.94950 and 0.95053, the tabulated areas for z = 1.64 and z = 1.65, so average 1.64 and 1.65.
• Use R:
    > qnorm(.05)
    [1] -1.644854
    > qnorm(.95)
    [1] 1.644854
So −z.05 = −1.645 and +z.05 = +1.645.
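The 95% and 90% calculations above follow one pattern: split the leftover area equally between the two tails and ask qnorm for the two cut-offs. A minimal sketch, assuming a helper named central_interval (a hypothetical name introduced here, not part of R or of the slides):

```r
# Hypothetical helper: symmetric interval about the mean that contains
# the fraction `level` of a normal population with the given mean and sd.
central_interval <- function(level, mean = 0, sd = 1) {
  tail_area <- (1 - level) / 2                 # area left in each tail
  qnorm(c(tail_area, 1 - tail_area), mean = mean, sd = sd)
}

z95   <- central_interval(0.95)                # standard normal, 95%
z90   <- central_interval(0.90)                # standard normal, 90%
age95 <- central_interval(0.95, 25.4, 1.5)     # age at first birth, 95%

print(z95)
print(z90)
print(age95)
```

The three results should agree with the critical values ±1.96 and ±1.645 and with the interval from 22.46 to 28.34 years found above.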
Standard Normal Distribution: Z ~ N(0, 1)
In general… What symmetric interval about the mean 0 contains 100(1 − α)% of the population values? The “α/2 critical values” $-z_{\alpha/2}$ and $+z_{\alpha/2}$ leave area 1 − α between them and area α/2 in each tail.

Normal Approximation to the Binomial Distribution (discrete → continuous)
Suppose a certain outcome exists in a population, with constant probability π. We will select a random sample of n individuals, so that the binary “Success vs. Failure” outcome of any individual is independent of the binary outcome of any other individual, i.e., n Bernoulli trials (e.g., coin tosses).
Discrete random variable X = # Successes in sample (0, 1, 2, 3, …, n)
P(Success) = π, P(Failure) = 1 − π
Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability function”
$f(x) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.

For X ~ Bin(100, 0.2), P(X = 10) is the area of a single bar of the probability histogram:
    > dbinom(10, 100, .2)
    [1] 0.00336282
and P(X ≤ 10) is the cumulative area up to and including that bar:
    > pbinom(10, 100, .2)
    [1] 0.005696381

Therefore, if X ~ Bin(n, π) with nπ ≥ 15 and n(1 − π) ≥ 15, then
$X \approx N\left(n\pi,\ \sqrt{n\pi(1 - \pi)}\right)$.
That is, for the sample proportion $\hat{\pi} = X/n$,
$\hat{\pi} \approx N\left(\pi,\ \sqrt{\frac{\pi(1 - \pi)}{n}}\right)$, the “Sampling Distribution” of $\hat{\pi}$.

● Normal distribution
● Log-Normal ~ X is not normally distributed (e.g., skewed), but Y = “logarithm of X” is normally distributed
● Student’s t-distribution ~ Similar to the normal distribution, more flexible
● F-distribution ~ Used when comparing multiple group means
● Chi-squared distribution ~ Used extensively in categorical data analysis
● Others for specialized applications ~ Gamma, Beta, Weibull…
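Returning to the normal approximation to the binomial: the sketch below compares the exact binomial probability P(X ≤ 10) for n = 100 and success probability 0.2 with the value from the approximating normal curve. The variable names are chosen here for illustration (p stands for the success probability, since pi is a built-in constant in R), and no continuity correction is applied, so the two values are expected to be close but not identical.

```r
n <- 100
p <- 0.2    # success probability; named p because pi is a built-in constant in R

# Rule of thumb from the slides: both expected counts should be at least 15
rule_ok <- (n * p >= 15) && (n * (1 - p) >= 15)

# Exact binomial probability P(X <= 10)
exact <- pbinom(10, n, p)

# Normal approximation: X is approximately N(n p, sqrt(n p (1 - p)))
approx_mean <- n * p
approx_sd   <- sqrt(n * p * (1 - p))
approx      <- pnorm(10, mean = approx_mean, sd = approx_sd)

print(rule_ok)
print(exact)
print(approx)
```

Because x = 10 lies far out in the left tail, the relative difference between the two probabilities is noticeable even though the rule of thumb is satisfied.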