Chapter 11
Sampling Distributions

Parameters and Statistics

Typically, a number is computed from the sample and used to make inferences about some unknown value that describes a characteristic of the population.

A numerical characteristic of a population is called a _____________ - an unknown but fixed value.
A numerical characteristic of a sample is called a _____________ - a known value based on the sample, but _____________.

Ex: A marketing research company wanted to estimate the percentage of Americans who have an unfavorable opinion about an airline. They conducted telephone interviews with a randomly selected national sample of 1,009 people. They report that 74% of the people in this sample have an unfavorable opinion about that airline.

Population:
Parameter:
Sample:
Statistic:

Ex: A carload of ball bearings has mean diameter 2.5003 cm. This is within the specifications for acceptance of the shipment by the purchaser. An inspector chooses a random sample of 100 bearings from this carload and finds their mean diameter to be 2.5009 cm. This is outside the specified limits, so the shipment is mistakenly rejected. Indicate whether 2.5003 is a parameter or a statistic. Do the same for 2.5009.

Statistical Estimation and the Law of Large Numbers

Unbiased Estimate

One of the desirable properties of a statistic is _____________. A statistic used to estimate a parameter is _____________ if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

To get an idea of this property, let's look at the following dartboard players:

[Figure: dartboards for players A and B]

Which player is unbiased?

Estimation of population mean

Many times we are interested in estimating the mean of some characteristic of a population. For example:
- mean financial aid that an SMU student receives
- mean amount of CO2 emitted by the Beetle cars
- mean number of people riding the DART train per weekend

We represent the mean value of a population by the symbol μ. So μ is a parameter / statistic. Our goal is to estimate the value of μ.
To do this we take a sample and use the information obtained from it.

What statistic would be useful for estimating the parameter μ?

We will be using the _____________ to draw inference about μ. In this section we will learn some of the properties of _____________. Since the value of this statistic varies from sample to sample, it can be viewed as a _____________.

Then why is it considered a good estimator of μ? One of the reasons is the ...

Law of Large Numbers

Draw observations at random from any population with finite mean μ. As the number of observations drawn increases, the sample mean x̄ of the observed values _____________.

Example: Consider a roulette wheel in a casino. It has 38 slots: 18 are black, 18 are red and 2 are green. When the wheel is spun, the ball is equally likely to come to rest in any of the slots. A bet of $1 on red returns $2 if the ball lands in a red slot. Otherwise the player loses his/her dollar.

Suppose you bet on red. What is your probability of winning?

The mean amount of money one gets from each bet on red is _____________. And the mean amount one spends on each bet is $1. So the mean amount of money one loses is _____________. This is the population mean.

According to the law of large numbers, the more you bet on red, the closer the mean amount you lose per bet gets to the above population mean. So, for example, if you bet 100,000 times, you would expect to lose _____________.

Do you want to gamble!!??

But keep in mind, this doesn't hold for a few plays, in which case your winnings (or losses) are quite unpredictable! For the casino it is very profitable, because it plays tens of thousands of times and hence its profit is predictable by the law of large numbers, which is _____________.

Issue: Although the law of large numbers guarantees that with a very, very large sample the sample mean x̄ will be close to the population mean μ, in real life we cannot always afford to take extremely large samples. What can we say about x̄ computed from a sample whose size is not very large, say of size 10?
For this, we ask the question "What would happen if we took many, many samples of size 10 each from the same population?"

The sample mean x̄ will _____________.

So x̄ is a _____________ and has its own distribution.

Recall the definition of the distribution of a random variable: it gives the possible values that the variable can take and how often it takes those values.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

To get an idea about this distribution:
1. Take a sample of size 10, say, from the population of interest and compute its sample mean x̄.
2. Repeat step 1 many, many times, taking a sample of the same size each time.
3. Now there are many, many different values of the sample mean. Make a histogram of these values of x̄. This shows the distribution of x̄.

Now let us look at the sampling distributions of x̄ based on sample sizes 10 and 1000 given on the previous page.

Q: In each case, where does the center (mean) of the distribution seem to be?

Q: Does x̄ seem to be unbiased or biased?

Q: What happens to the variability (spread) of the sampling distribution of x̄ when the sample size increases from 10 to 1000?

It makes sense because _____________.

What is more desirable: smaller variability or larger variability?

[Figure: panels labeled A and C]

Mean and Standard Deviation of a Sample Mean x̄

Suppose that x̄ is the mean of an SRS of size n drawn from any large population that has mean μ and standard deviation σ. The mean of the sampling distribution of x̄ is _____ and its standard deviation is ______.

Implications:
- Mean of the sampling distribution of x̄ equal to _____________
- Standard deviation of the sampling distribution of x̄ equal to _____________

Example: Response to brake light - Consider the time we take to react to the brake lights on a decelerating vehicle. This time is a random variable and is critical in helping to avoid rear-end collisions.
A study has shown that response time to a brake signal from standard brake lights is normally distributed with mean 1.25 sec and standard deviation 0.46 sec.

(a) Suppose we take an SRS of 50 drivers and find the value of their mean response time x̄. We keep repeating this process many times, each time getting a value of x̄. Plotting a histogram of all these values of x̄ gives us the sampling distribution of x̄. What are the mean and standard deviation of this distribution of x̄?

(b) Which of the following varies more - the response time of individual drivers or the mean response time of 50 drivers?

(c) Which of the following will vary more - the mean response time of 50 drivers or the mean response time of 100 drivers?

Sampling Distribution of a Sample Mean

If the population we are sampling from is Normal(μ, σ), then the sample mean x̄ based on n independent observations has a Normal(____, ____) distribution.

Ex: Response to brake light (cont'd)

(a) What is the probability that a randomly chosen driver takes more than 1.45 sec to react to the brake lights?

(b) What is the distribution of the mean response time of 50 randomly chosen drivers?

(c) What is the probability that a random sample of 50 drivers will have mean response time x̄ of 1.45 sec or higher? Will this probability be larger or smaller than the one calculated in (a)?

Fact: If the population we sample from is not normal, then the distribution of x̄ is not exactly normal.

Problem: In that case, how do we compute various probabilities regarding x̄, like the ones we did in the previous example?

We are rescued by a very famous and elegant result of probability theory: the CENTRAL LIMIT THEOREM!!

The Central Limit Theorem (CLT) states: Draw an SRS of size n from any population with mean μ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately _____________.
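The repeated-sampling recipe described earlier (take a sample, compute x̄, repeat many times) can be tried on a computer. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with μ = 1 and σ = 1 (a strongly skewed, non-normal population chosen for this example, not one from these notes), the sample size is 40, and the function name simulate_sample_means is hypothetical. It compares the center and spread of the simulated x̄ values with μ and σ/√n, and checks how many fall within two standard deviations of the center, which would be about 95% for a normal distribution.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(rate=1) population
    (assumed population: mean 1, standard deviation 1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

mu, sigma = 1.0, 1.0      # parameters of the assumed population
n, reps = 40, 5000        # sample size and number of repeated samples

xbars = simulate_sample_means(n, reps)
center = statistics.fmean(xbars)       # center of the simulated x-bar values
spread = statistics.stdev(xbars)       # spread of the simulated x-bar values
theory_sd = sigma / sqrt(n)            # sigma / sqrt(n)

# Share of simulated means within 2 theoretical standard deviations of mu
within_2sd = sum(abs(x - mu) <= 2 * theory_sd for x in xbars) / reps

print(f"center of x-bar values: {center:.3f} (population mean {mu})")
print(f"spread of x-bar values: {spread:.3f} (sigma/sqrt(n) = {theory_sd:.3f})")
print(f"share within 2 sd of mu: {within_2sd:.3f}")
```

Rerunning with a smaller n (say 5) should show a more skewed histogram of x̄ values, which is one way to explore how large n needs to be for a given population shape.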
Good news: We don't need to learn any new distributions even if the population we are dealing with is non-normal, as long as we are interested in _____________.

Q: How large is large enough?
A: It depends on the shape of the distribution we are sampling from. If the population distribution is close to normal, then _____________.

Rule of Thumb: CLT is usually applicable for _____________.

Example: Suppose the number of four-can packs of beer sold every day at a beer store varies with mean 105.7 and standard deviation 52.5. The manager records the number of such beer packs sold on 40 randomly chosen days and calculates the mean number of beer packs sold per day.

(a) Can the population distribution of the number of beer packs sold be normal?

(b) What is the approximate distribution of the mean number of beer packs sold (x̄)? Why?

(c) The manager is interested in finding the approximate probability that the mean number of beer packs sold is higher than 130. Calculate it.
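One possible way to carry out the calculation in part (c) is sketched below. It assumes that a sample of 40 days is large enough for the CLT approximation, so that x̄ is treated as approximately normal with mean μ = 105.7 and standard deviation σ/√n = 52.5/√40. The variable names are illustrative only, and the result is an approximation, not an exact probability.

```python
from math import sqrt
from statistics import NormalDist

mu = 105.7      # population mean of daily packs sold (from the example)
sigma = 52.5    # population standard deviation (from the example)
n = 40          # number of randomly chosen days

# Standard deviation of the sampling distribution of x-bar
sd_xbar = sigma / sqrt(n)

# z-score of the cutoff 130 on the x-bar scale
z = (130 - mu) / sd_xbar

# Approximate P(x-bar > 130), assuming the CLT normal approximation holds
p_above_130 = 1 - NormalDist(mu, sd_xbar).cdf(130)

print(f"sd of x-bar: {sd_xbar:.3f}")
print(f"z-score:     {z:.3f}")
print(f"P(x-bar > 130) is approximately {p_above_130:.4f}")
```

The same three steps (standard deviation of x̄, z-score, normal tail area) would apply to the brake-light example, where the population itself is stated to be normal and no approximation is involved.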