STAT 301 – BUSINESS STATISTICS
FALL SEMESTER 2010
“Knowledge Festival” #1 – Part 2 (Conceptual)

This “knowledge festival” is intended to assess your mastery of the conceptual aspects of this course. This test is closed-book, closed-notes. No calculator is allowed (or needed). There are 100 points worth of problems on this exam; relative problem weights are given in brackets. Unless the problem specifically indicates otherwise, use the traditional confidence level of 95% and the traditional significance level of α = 0.05.

This “knowledge festival” is given under provisions of the Honor System of Stetson University. The word “pledged” written before your name at the top of this page is a symbol of your ongoing commitment to the Honor System. Enjoy!!

Question 0 [4 points]: Spell your name (correctly and legibly) at the top of the page.

Question 1 [16 points; 2 each part]: Indicate whether each of the following statements is TRUE or FALSE:

a) Income data are typically skewed left.
TRUE    FALSE

b) Employing the mode as a measure of “typicalness” in a data set is generally most useful when the data set is small (just a few numbers).
TRUE    FALSE

c) All else being equal, as the sample size increases, the variance of the sampling distribution for the sample mean increases.
TRUE    FALSE

d) If the data are skewed right, the mean will be greater than the median.
TRUE    FALSE

e) It is theoretically possible for the expected value to be negative.
TRUE    FALSE

f) All else being equal, a 90% confidence interval will be wider than a 95% confidence interval.
TRUE    FALSE

g) One advantage of a prospective over a retrospective observational study is that the data tend to be cheaper.
TRUE    FALSE

h) In hypothesis testing, a one-tailed test is used if, before we collect the data, we have reason to believe that any departure from the null hypothesis will occur in one particular direction.
TRUE    FALSE

Question 2 [6 points]: For a normal distribution, approximately __________ percent of the data lie within one standard deviation of the mean, and approximately __________ percent of the data lie within two standard deviations of the mean.

Question 3 [4 points]: Clyde Arthur Fazenbaker is conducting a hypothesis test to determine whether a coin is fair. He has computed a test statistic of 0. What does that imply?

_____ He got the same number of “heads” as “tails.”
_____ He made a computational mistake, since a test statistic cannot be 0.
_____ His null hypothesis is false.
_____ He made a Type I error.

Question 4 [6 points; 2 each part]: The Literary Digest was a prominent American magazine of news and public affairs that is most famous for conducting the most disastrous political poll in history. In 1936 their survey predicted that Alf Landon would defeat Franklin Roosevelt in the presidential election. In reality, Roosevelt won in a landslide. (The Literary Digest lost so much credibility as a result that they went bankrupt soon afterwards.) After-the-fact analysis of the survey process revealed several flaws. For each of them, indicate whether this is an example of sampling error or of nonsampling error.

a) While the Literary Digest sent out over 10 million surveys (by mail), they received only about 2.5 million back.
_____ sampling error    _____ nonsampling error

b) The survey was begun in September. Some of those surveyed changed their mind between then and the November election.
_____ sampling error    _____ nonsampling error

c) The Literary Digest obtained its lists of people to survey primarily from telephone books and motor vehicle registrations – at a time (mid-Depression) when relatively few people owned these items.
_____ sampling error    _____ nonsampling error

Question 5 [8 points; 4 each part]: Dr.
Rasp continues to claim that students who get a good night’s sleep before a “knowledge festival” tend to do better on the “festival.” He decides to conduct research to investigate this claim.

a) State (in words) Dr. Rasp’s null and alternative hypotheses.

b) Dr. Rasp asks students to write down, on their “exam,” the number of hours of sleep they got the previous night. He then analyzes the resulting data (sleep and “knowledge festival” grades). Dr. Rasp has …
_____ … a controlled experiment.
_____ … an observational study.

Question 6 [4 points]: Dr. Rasp maintained, in class, that “numbers are meaningless …”. What did he mean by this seemingly anti-statistical remark? Give an example to illustrate.

Question 7 [4 points]: Berengaria Naverre is the manager of StatsWorld, a popular new theme park. She is reading a report, prepared by the park’s accounting and marketing research staffs, which outlines various strategies for increasing park income. Included in the report are results from a study on how much customers spend on souvenirs while in the park. Berengaria reads about a “95% confidence interval for mean spending per customer” as being “$20 ± $5.” Which of these is a proper interpretation of this result? {PICK ONE}

_____ She’s 100% sure that 95% of the customers in the population spend between $15 and $25.
_____ She’s 95% sure that 100% of the customers in the population spend between $15 and $25.
_____ She’s 100% sure that 95% of the customers in the sample spend between $15 and $25.
_____ She’s 95% sure that 100% of the customers in the sample spend between $15 and $25.
_____ None of the above.

Question 8 [4 points]: While reading the report (in the previous question), Berengaria wonders why she has only a 95% confidence interval rather than a 100% confidence interval. Explain to Berengaria why a 100% confidence interval is infeasible.

Question 9 [6 points]: Alphonso Ferrabosco is conducting a hypothesis test, and computes a p-value of .42. What conclusion should he draw?

_____ Reject the null hypothesis.
_____ Don’t reject the null hypothesis.
_____ Reject the alternative hypothesis.
_____ Don’t reject the alternative hypothesis.

Question 10 [6 points]: Horatio Wajberlinski is testing

H0: beer does not cause cancer
HA: beer does cause cancer

He gets a “reject” result on his hypothesis test. What conclusion should he draw?

_____ There is enough evidence to believe that beer causes cancer.
_____ There is enough evidence to believe that beer does not cause cancer.
_____ There is not enough evidence to believe that beer causes cancer.
_____ There is not enough evidence to believe that beer does not cause cancer.

Question 11 [4 points]: We know that we divide by n-1 rather than by n in computing a sample (rather than population) variance or standard deviation. Why do we do so?

Question 12 [4 points]: Gracetta Squornshellous and Murgatroyd Applegarth, for their STAT 301 project, conduct a survey on whether Stetson should construct a parking garage on campus. During their presentation they tell Dr. Rasp that to obtain data they went through the Commons and (in their words) “just handed out surveys randomly.” Dr. Rasp points out that they really have a convenience sample rather than a random sample. What would they have needed to do, in order to obtain a random sample?

Question 13 [4 points]: “Placebin” is a newly developed pharmaceutical that is absolutely, completely, 100% ineffective at treating every disease, condition, or sickness in the universe. However, the manufacturers of placebin do not know that it is completely worthless. Hence, they conduct 100 different controlled experiments, to see whether placebin is effective in treating one of 100 different diseases.
Given that placebin actually has no effect on anything, on how many of these 100 experiments can placebin’s manufacturers expect (on average) to obtain a “reject the null hypothesis” result?

_____ on about 0 of the 100 experiments
_____ on about 50 of the 100 experiments
_____ on about 100 of the 100 experiments
_____ on about 5 of the 100 experiments
_____ on about 95 of the 100 experiments

Question 14 [4 points]: Balph Snerdwell, for his (decidedly weak) STAT 301 project, surveys three Stetson students and asks them how much sleep they got last night. The data were 4, 6, and 8 hours. What is the population (rather than sample) mean amount of sleep?

_____ (4 + 6 + 8)/3
_____ (4 + 6 + 8)/2
_____ we can’t tell from the information given

Question 15 [4 points]: What does the term “statistically significant” mean?

Question 16 [4 points]: Jubilation T. Cornpone is testing the null hypothesis that his “lucky” (Confederate) silver dollar is a fair coin, versus the alternative that it is not a fair coin. What is a Type I error, in this situation?

Question 17 [4 points]: According to the Law of Large Numbers and the Central Limit Theorem, what two things happen to the sampling distribution of the sample mean as the sample size is increased?

Question 18 [4 points]: Before he came to Stetson, Dr. Rasp taught for five years at the University of Alabama. One semester, he had 600 students in his Introduction to Business Statistics class. Let’s suppose that each one of those 600 individuals obtained data from a random sample of ten different Alabama students on the amount of money spent on textbooks that semester. Each one of Dr. Rasp’s 600 students computes a confidence interval for the true (unknown) population mean. Assume that each of the 600 students computes his/her interval correctly. On average, how many of those intervals will contain the true, unknown population mean? Why?

SOLUTIONS

1a) False
b) False
c) False
d) True
e) True
f) False
g) False
h) True

2) 68% (or two-thirds); 95%

3) He got the same number of “heads” as “tails.”

4a) nonsampling
b) nonsampling
c) nonsampling

5a) Null hypothesis: Sleep has no effect upon ‘festival’ grade. Alternative hypothesis: Sleep does affect (or: improves) ‘festival’ grade.
b) observational study

6) Numbers are meaningless without a frame of reference. We need some sort of notion of context, of what constitutes “big” or “small” in a particular situation. The primary example used in class was 14.9 million forested acres in a state – is that a lot or a little?

7) None of the above.

8) A 100% confidence interval means either (1) you have data on the entire population, or (2) your interval is “minus infinity to plus infinity”. The first isn’t very feasible; the second isn’t informative.

9) Don’t reject the null hypothesis.

10) There is enough evidence to believe that beer causes cancer.

11) A sample of data will tend to underestimate the variability in the entire population. The “n-1” is an adjustment for this – dividing by a smaller number makes the result larger.

12) Selection by a chance mechanism.

13) on about 5 of the 100 experiments

14) we can’t tell from the information given

15) We can reject the null hypothesis.

16) Reject the null hypothesis if it is true. Say that the coin is not fair, when really it is fair.

17) LLN: the variance of the sampling distribution decreases. CLT: the sampling distribution becomes (approximately) normal.

18) 95% of 600, or 570. Each correctly computed 95% confidence interval has a 95% chance of containing the true population mean.
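
Worked check for solutions 13 and 18: both counts come from multiplying the number of trials by a percentage (5% of 100 tests reject a true null; 95% of 600 intervals cover the true mean). The Python sketch below reproduces that arithmetic and then simulates the situation in Question 18. Everything in the simulation is an assumption made up for illustration: a normal population of textbook spending with mean $400 and standard deviation $100, the seed, and the function names. The value 2.262 is the standard t critical value for 95% confidence with 9 degrees of freedom.

```python
import random
import statistics

# t critical value for a 95% interval with n - 1 = 9 degrees of freedom
T_CRIT_DF9 = 2.262


def expected_count(n_trials, percent):
    """Expected number of 'hits' when each trial hits with the given percent chance."""
    return n_trials * percent / 100


def simulate_coverage(n_students=600, sample_size=10,
                      true_mean=400.0, true_sd=100.0, seed=301):
    """Count how many simulated 95% t-intervals contain the true mean.

    The population (normal, mean 400, sd 100) is a hypothetical choice,
    not something stated in the exam.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_students):
        # Each student draws a random sample of ten spending amounts.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        mean = statistics.mean(sample)
        # stdev() divides by n - 1, as discussed in Question 11.
        std_error = statistics.stdev(sample) / sample_size ** 0.5
        margin = T_CRIT_DF9 * std_error
        if mean - margin <= true_mean <= mean + margin:
            covered += 1
    return covered


if __name__ == "__main__":
    print("Q13 expected rejections:", expected_count(100, 5))
    print("Q18 expected covering intervals:", expected_count(600, 95))
    print("Q18 simulated covering intervals:", simulate_coverage())
```

The simulated count will usually land near 570 rather than exactly on it, which is the point of the phrase “on average” in Question 18.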