STP 420 SUMMER 2005

STP 420 INTRODUCTION TO APPLIED STATISTICS
NOTES
PART 2 – PROBABILITY AND INFERENCE
CHAPTER 5 FROM PROBABILITY TO INFERENCE

Introduction

The distribution of a statistic
A statistic from a random sample or randomized experiment is a random variable. The probability distribution of the statistic is its sampling distribution.

Population Distribution
The population distribution of a variable is the distribution of its values for all members of the population. The population distribution is also the probability distribution of the variable when we choose one individual from the population at random.

5.1 Sampling Distributions for Counts and Proportions

The random variable X is a count of the occurrences of some outcome in a fixed number of observations n.

Sample proportion: $\hat{p} = X/n$

The binomial distributions for sample counts

The binomial setting
1. There is a fixed number n of observations.
2. The n observations are all independent.
3. Each observation falls into one of just two categories, which for convenience we call “success” and “failure”.
4. The probability of a success, call it p, is the same for each observation.

Binomial distributions
The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n and p. The parameter n is the number of observations, and p is the probability of a success on any one observation. The possible values of X are the whole numbers from 0 to n. X is B(n, p).

Binomial distributions in statistical sampling

Sampling distribution of a count
When the population is much larger than the sample, the count X of successes in an SRS of size n has approximately the B(n, p) distribution if the population proportion of successes is p. (Rule of thumb: the population should be at least 10 times as large as the sample.)

Finding binomial probabilities: tables
The tables (Table C) give the probabilities P(X = k) for all possible values of k ($k \ge 0$) in n (n > 1) trials.
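When Table C is not at hand, the same probabilities can be computed directly. The following is a minimal sketch, not part of the course notes: it applies the binomial probability formula given under "Binomial formulas" in these notes, using Python's standard `math.comb`. The function names `binomial_pmf` and `binomial_table` and the values n = 10, p = 0.3 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a count X with the B(n, p) distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_table(n: int, p: float) -> list[float]:
    """One table column: P(X = k) for k = 0, 1, ..., n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]


if __name__ == "__main__":
    n, p = 10, 0.3  # illustrative values, not taken from the notes
    for k, prob in enumerate(binomial_table(n, p)):
        print(f"k = {k:2d}   P(X = k) = {prob:.4f}")
```

Cumulative probabilities such as $P(X \le 3)$ follow by summing entries, for example `sum(binomial_table(10, 0.3)[:4])`.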
Consider the experiment of rolling two dice and recording the sum of the two faces. X is the discrete random variable with values 2, 3, …, 12.

\[ P(X \le 3) = P(X = 3) + P(X = 2) \]
\[ P(X < 4) = P(X = 3) + P(X = 2) \]
\[ P(X \ge 7) = P(X = 7) + P(X = 8) + \dots + P(X = 12) \]

Binomial mean and standard deviation
If a count X has the binomial distribution B(n, p), then
\[ \mu_X = np \]
\[ \sigma_X = \sqrt{np(1 - p)} \]

Sample proportions
\[ \hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n} \]

Mean and Standard Deviation of a sample proportion
Let $\hat{p}$ be the sample proportion of successes in an SRS of size n drawn from a large population having population proportion p of successes. The mean and standard deviation of $\hat{p}$ are
\[ \mu_{\hat{p}} = p \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \]

Normal approximation for counts and proportions
Draw an SRS of size n from a large population having population proportion p of successes. Let X be the count of successes in the sample and $\hat{p} = X/n$ the sample proportion of successes. When n is large, the sampling distributions of these statistics are approximately normal:
\[ X \text{ is approximately } N\left(np, \sqrt{np(1 - p)}\right) \]
\[ \hat{p} \text{ is approximately } N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right) \]
We want n and p such that $np \ge 10$ and $n(1 - p) \ge 10$.

The continuity correction
Figure 5.5 shows how a normal curve approximates the binomial distribution; a continuity correction improves the accuracy of the approximation.
For a discrete distribution (binomial), P(X = 6) can be computed, but the histogram bar for X = 6 actually begins at 5.5 and stops at 6.5.
If a continuous distribution (normal) is used to approximate, P(X = 6) = 0, so we find the probability of an interval instead.
E.g., for an approximating normal distribution with mean 10 and standard deviation 3:
\[ P(X \le 9) \approx P(X \le 9.5) = P\left(\frac{X - 10}{3} \le \frac{9.5 - 10}{3}\right) = P(Z \le -0.17) = 0.4325 \]
Continuity correction – use the interval from 0.5 below a whole number to 0.5 above the whole number.

Binomial formulas

Binomial coefficient – number of ways of arranging k successes among n observations
\[ \binom{n}{k} = \frac{n!}{k!\,(n - k)!} \quad \text{for } k = 0, 1, 2, \dots, n \]
where
\[ n! = n \times (n - 1) \times (n - 2) \times \dots \times 3 \times 2 \times 1 \]

Binomial Probability
If X has the binomial distribution B(n, p) with n observations and probability p of success on each observation, the possible values of X are 0, 1, 2, …, n. If k is any one of these values, the binomial probability is
\[ P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k} \]

5.2 The Sampling Distribution of a Sample Mean

Discrete random variables – arise from counts and proportions
Continuous random variables – arise from measured data, summarized by the mean, percentiles, or standard deviation
Important: averages are less variable than individual observations, and their distribution is closer to normal.

The mean and standard deviation of $\bar{x}$
\[ \bar{x} = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \]
The mean of $\bar{x}$ is the same as the population mean:
\[ \mu_{\bar{x}} = \frac{1}{n}(\mu_{X_1} + \mu_{X_2} + \dots + \mu_{X_n}) = \frac{1}{n}(\mu + \mu + \dots + \mu) = \mu \]
For independent observations, the variance is
\[ \sigma^2_{\bar{x}} = \left(\frac{1}{n}\right)^2 (\sigma^2_{X_1} + \sigma^2_{X_2} + \dots + \sigma^2_{X_n}) = \left(\frac{1}{n}\right)^2 (\sigma^2 + \sigma^2 + \dots + \sigma^2) = \frac{\sigma^2}{n} \]

Let $\bar{x}$ be the mean of an SRS of size n from a population having mean $\mu$ and standard deviation $\sigma$.
The mean of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.
The standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

The sampling distribution of $\bar{x}$

Sampling Distribution of a Sample Mean
If a population has the $N(\mu, \sigma)$ distribution, then the sample mean $\bar{x}$ of n independent observations has the $N(\mu, \sigma/\sqrt{n})$ distribution.
The sample mean of an SRS from a normal population has a normal distribution.
In general, any linear combination of independent normal random variables is also normally distributed.

The central limit theorem
Draw an SRS of size n from a population with mean $\mu$ and finite standard deviation $\sigma$. When n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately normal:
\[ \bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]
The population does not have to be normally distributed, and the observations do not have to be strictly independent (some correlation is not a problem).

Exponential distribution
- strongly right skewed
- applicable to problems involving the time required to serve a customer or to repair a machine
- as the sample size n increases, the distribution of the sample mean of exponential observations looks more and more like a normal curve

Beyond the basics – Weibull distributions
- An appropriate distribution for experiments that measure how long something lasts before it fails.
- Engineers use it to study the reliability of products.
- e.g. Infant mortality (most products fail almost immediately or very early)
- e.g. Early failure (somewhat right skewed)
- e.g. Old-age wear-out (somewhat left skewed)
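
The claim in Section 5.2 that averages are less variable and closer to normal than individual observations can be checked by simulation, including for skewed lifetime distributions like those in this section. The following is a minimal sketch, not part of the course notes. It uses Python's standard `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape; with shape 1 the Weibull distribution is the exponential distribution. The function names `sample_means` and `skewness`, the seed, and the choices of n and number of repetitions are illustrative assumptions.

```python
import math
import random
import statistics


def sample_means(n, reps, scale=1.0, shape=1.0, seed=0):
    """Return `reps` sample means, each from n independent Weibull draws."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [
        statistics.fmean(rng.weibullvariate(scale, shape) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Moment-based skewness: near 0 for a symmetric (e.g. normal) shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


if __name__ == "__main__":
    sigma = 1.0  # sd of the exponential population with mean 1 (shape = 1)
    for n in (1, 10, 50):
        means = sample_means(n, reps=4000)
        print(
            f"n = {n:2d}  mean = {statistics.fmean(means):.3f}  "
            f"sd = {statistics.pstdev(means):.3f}  "
            f"sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}  "
            f"skewness = {skewness(means):.3f}"
        )
```

As n grows, the printed standard deviation of the sample means should track $\sigma/\sqrt{n}$ and the skewness should shrink toward 0. Changing `shape` lets the same check be repeated for other Weibull populations.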