STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES
RECITATION 8 ANSWER KEY

Chapter 4

1. True or False: As the sample size increases, the standard error of the sampling distribution of \( \bar{y} \) increases. (Explain your answer.)

False. As the sample size increases, the standard error of the sampling distribution decreases, since \( se = \sigma / \sqrt{n} \). Intuitively, as n gets larger we come closer to sampling the entire population. If we sampled the entire population, the standard error would be 0 because our sample mean would equal the population mean.

2. The standard deviation of a discrete probability distribution is

\[ \sigma = \sqrt{\sum (y - \mu)^2 P(y)} \]

Suppose P(y=1) = π and P(y=0) = 1 − π, where π represents a number between 0 and 1. Find the standard deviation and the standard error of a sample proportion for a random sample of size n (in terms of π and n).

The first thing we need to notice is that

\[ \mu = E(Y) = 1 \times P(y = 1) + 0 \times P(y = 0) = \pi \]

Then,

\[
\begin{aligned}
\sigma &= \sqrt{(1 - \pi)^2 \times \pi + (0 - \pi)^2 \times (1 - \pi)} \\
       &= \sqrt{(1 - 2\pi + \pi^2) \times \pi + (0 - 2 \times 0 \times \pi + \pi^2) \times (1 - \pi)} \\
       &= \sqrt{\pi - 2\pi^2 + \pi^3 + \pi^2 - \pi^3} \\
       &= \sqrt{\pi - \pi^2} \\
       &= \sqrt{\pi (1 - \pi)}
\end{aligned}
\]

The standard error is

\[ se = \frac{\sqrt{\pi (1 - \pi)}}{\sqrt{n}} \]

3. Which of the following best explains the phenomenon that, while it may not be very surprising to get 8 Heads in 10 flips of a fair coin, it would be very surprising to get 8,000 Heads in 10,000 flips of the coin?
A) The frequencies of events with the same likelihood of occurrence even out, given enough trials or instances.
B) If we kept taking samples of 10 coin flips to record the number of Heads, then 95% of the time the sample mean would be contained in a given 95% confidence interval.
C) If we have a large enough sample (as n approaches infinity), our sample mean will approach the expected value of the random variable.
D) P(A|B) = P(A and B) / P(B)
E) Regardless of the shape of the population distribution, the shape of the sampling distribution tends to be normal for a large enough sample size.

4. Which of the following statements is false?
A) Two disjoint outcomes (of the same event) cannot occur at the same time.
B) Two independent events cannot occur at the same time.
C) Two mutually exclusive outcomes (of the same event) cannot occur at the same time.
D) Two complementary outcomes (of the same event) cannot occur at the same time.

5. A statistician is studying blood pressure levels of Italians in the age range 75-80. The following is some information about her study:
I. The researcher collected the data by recording the responses of Italians in the age range 75-80 who responded to an email in which the survey was included.
II. The sample observations only make up about 4% of the population.
III. The sample size is 2,047.
IV. The distribution of sample observations is skewed. The skew is easy to see, although not very extreme.
The researcher is ready to use the Central Limit Theorem (CLT) in the main part of her analysis. Which aspect of her study is most likely to prevent her from using the CLT?
a) IV and I
b) I
c) III
d) II and III
e) IV
f) All of the above

6. About 30% of human twins are identical, and the rest are fraternal. Identical twins are necessarily the same sex: half are male pairs and the other half are female pairs. One-quarter of fraternal twins are both male, one-quarter are both female, and one-half are mixed: one male, one female. You have just become a parent of twins and are told they are both girls. Given this information, what is the probability that they are identical?
A) 50%
B) 72%
C) 33%
D) 46%

Given:
P(I) = 0.3
P(~I) = 0.7
P(FF|I) = 0.5
P(MM|I) = 0.5
P(MM|~I) = 0.25
P(FF|~I) = 0.25
P(FM|~I) = 0.5

First find the overall probability of two girls:
P(FF) = P(FF|I)P(I) + P(FF|~I)P(~I) = (0.5)(0.3) + (0.25)(0.7) = 0.325

Then apply Bayes' rule:
P(I|FF) = P(I)P(FF|I) / P(FF) = (0.3)(0.5) / 0.325 ≈ 0.46

So the answer is D) 46%.

Chapter 5

7. How large a sample size is needed to estimate the mean annual income of Native Americans correct to within $1,000 with probability 0.99? We know that about 95% of their incomes are between $6,000 and $50,000 and that this distribution of incomes is approximately normal.
If the confidence level is 99%, then z = 2.58.

We need to approximate the population standard deviation. Because the distribution is approximately normal, the mean is the midpoint of the interval: (6000 + 50000)/2 = 28000. About 95% of a normal distribution lies within two standard deviations of the mean, so 50000 is two standard deviations away from 28000. Thus the standard deviation is (50000 − 28000)/2 = 11000.

Setting the margin of error equal to 1000 and solving for N:

\[
\begin{aligned}
1000 &= 2.58 \times \frac{11000}{\sqrt{N}} \\
\sqrt{N} &= 2.58 \times \frac{11000}{1000} \\
N &= \left( 2.58 \times \frac{11000}{1000} \right)^2 \approx 805.4
\end{aligned}
\]

Rounding up, N = 806.

8. Suppose we collected a sample of size n = 100 from some population and used the data to calculate a 95% confidence interval for the population mean. Now suppose we are going to increase the sample size to n = 300. Keeping all else constant, which of the following would we expect to occur as a result of increasing the sample size?
I. The standard error would decrease.
II. Width of the 95% confidence interval would increase.
III. The margin of error would decrease.
a) II and III
b) I and III
c) I and II
d) I, II, and III
e) None

9. Increasing the confidence level causes the width of a confidence interval to
a) increase
b) decrease
c) stay the same

10. Other things being equal, if we quadruple the sample size the width of a confidence interval
a) becomes one quarter as wide
b) halves
c) stays the same
d) doubles

11. Based on responses of 1467 subjects in General Social Surveys, a 95% confidence interval for the mean number of close friends equals (6.8, 8.0). Which of the following interpretations is (are) correct?
a) We can be 95% confident that \( \bar{y} \) is between 6.8 and 8.0.
b) We can be 95% confident that \( \mu \) is between 6.8 and 8.0.
c) Ninety-five percent of the values of y = number of close friends (for this sample) are between 6.8 and 8.0.
d) If random samples of size 1467 were repeatedly selected, then 95% of the time \( \bar{y} \) would fall between 6.8 and 8.0.
e) If random samples of size 1467 were repeatedly selected, then in the long run 95% of the confidence intervals formed would contain the true value of \( \mu \).
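
The sample-size calculation in question 7 and the width relationships in questions 8 to 10 can be checked numerically. The following is a minimal sketch, not part of the original key: the function names `required_sample_size` and `ci_width` are hypothetical helpers, and the z-scores 1.96 and 2.58 are the usual rounded values for 95% and 99% confidence.

```python
import math


def required_sample_size(sigma, margin, z):
    """Smallest n for which z * sigma / sqrt(n) is at most `margin`."""
    return math.ceil((z * sigma / margin) ** 2)


def ci_width(sigma, n, z):
    """Full width of the interval y_bar +/- z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)


sigma = 11000  # standard deviation approximated in question 7

# Question 7: margin of $1,000 at 99% confidence (z = 2.58).
print(required_sample_size(sigma, margin=1000, z=2.58))  # 806

# Question 8: going from n = 100 to n = 300 narrows the interval.
print(ci_width(sigma, 100, 1.96) > ci_width(sigma, 300, 1.96))  # True

# Question 9: raising the confidence level (larger z) widens the interval.
print(ci_width(sigma, 100, 2.58) > ci_width(sigma, 100, 1.96))  # True

# Question 10: quadrupling n makes the interval about half as wide.
print(ci_width(sigma, 100, 1.96) / ci_width(sigma, 400, 1.96))
```

Because the width is proportional to 1/sqrt(n), the last ratio comes out at about 2 regardless of the value chosen for sigma.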
12. A random sample of 50 records yields a 95% confidence interval for the mean age at first marriage of women in a certain county of 21.5 to 23.0. Explain what is wrong with each of the following interpretations of this interval and give the right interpretation.

a) If random samples of 50 records were repeatedly selected, then 95% of the time the sample mean age at first marriage for women would be between 21.5 and 23.0 years.

The interval is a statement about the population mean, not about sample means, and each new sample would produce a different interval. The correct statement would be: If random samples of 50 records were repeatedly selected and a 95% confidence interval were formed from each, then in the long run 95% of those intervals would contain the true mean age at first marriage for women.

b) Ninety-five percent of the ages at first marriage for women in the county are between 21.5 and 23.0 years.

This is simply not true. The interval describes the population mean, not individual ages. All we can say from the confidence interval is that we are 95% confident that the true mean age at first marriage is in our interval.

c) We can be 95% confident that \( \bar{y} \) is between 21.5 and 23.0.

No, we are 100% confident that \( \bar{y} \) is between 21.5 and 23.0. We use the sample mean to calculate the confidence interval, so it lies at the center of the interval.

d) If we repeatedly sampled the entire population, then 95% of the time the population mean would be between 21.5 and 23.0 years.

No, if we repeatedly sampled the entire population we would always get the true mean, which is a single fixed number.
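
The long-run interpretation used in questions 11 and 12 can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original key: the population is assumed to be normal, the values mu = 22.25 and sigma = 2.7 are invented purely for illustration, and `coverage_rate` is a hypothetical helper. It uses z = 1.96 rather than a t-score, so with n = 50 the observed coverage tends to land slightly under 95%.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated intervals y_bar +/- z * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        lower, upper = y_bar - z * se, y_bar + z * se
        # The interval changes from sample to sample; mu stays fixed.
        if lower <= mu <= upper:
            hits += 1
    return hits / reps


# Hypothetical population values, chosen only for illustration.
rate = coverage_rate(mu=22.25, sigma=2.7, n=50, reps=2000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

Each pass through the loop produces a different interval, while the population mean never moves. That is why the 95% describes the procedure for building intervals rather than any single interval such as (21.5, 23.0).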