Statistics 101
Final Exam
Name: __________

INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show all your work in the space provided. Partial credit cannot be given if work is not shown. When asked to explain, describe, or comment, do so in the context of the problem.

1. (2 pts per answer) Multiple Choice. Please write the correct answer in CAPITAL LETTERS in the blanks provided.
(a) For data that is skewed to the right, the mean will be (A) less than (B) greater than (C) about the same as the median.
(b) As the sample size increases, the standard deviation of the sampling distribution of the sample mean will (A) stay the same (B) increase (C) decrease.
(c) As the sample size increases, the mean of the sampling distribution of the sample mean will (A) stay the same (B) increase (C) decrease.
(d) CNN posts a question on their web page concerning recent world events and invites users to respond to the question. This is an example of a (A) simple random sample (B) unbiased sample (C) convenience sample (D) voluntary response sample.
(e) Which of the following is NOT a type of bias? (A) Non-response (B) Sampling Variability (C) Response (D) Poor wording of questions.
(f) Everything else being equal, the width of a 95% confidence interval will be (A) less than (B) greater than (C) the same as the width of a 90% confidence interval.
(g) Everything else being equal, the margin of error for a confidence interval with a sample size of 30 will be (A) less than (B) greater than (C) the same as the margin of error for a confidence interval with a sample size of 15.
(h) The type of categorical display that must include all possible categories is a (A) bar graph (B) pie chart (C) scatter plot (D) two-way table.
(i) You have calculated the correlation coefficient between two variables as r = 0.
This would indicate (A) a strong positive linear relationship (B) a weak negative linear relationship (C) no relationship (D) no linear relationship between the two variables.
(j) In order to determine the effect of a new migraine drug on the length and severity of migraines, fifty subjects were randomly divided into two groups. One group took the new drug at the onset of a migraine while the other group took no drugs. At the end of the migraine, subjects were asked to rate the migraine by length and severity. This experiment does not take into account (A) control (B) randomization (C) the placebo effect (D) replication.
(k) A 95% confidence interval for the population mean is calculated to be (10.5, 11.6). Choose the correct interpretation of this confidence interval. (A) We are 95% confident that the sample mean is between 10.5 and 11.6. (B) 95% of the values from the population are between 10.5 and 11.6. (C) 95% of the values from the sample are between 10.5 and 11.6. (D) We are 95% confident that the population mean is between 10.5 and 11.6.
(l) The mean and standard deviation should be used to summarize data that (A) is skewed to the right (B) has no outliers and is symmetric (C) is skewed to the left (D) has outliers.
(m) An observation that does not fit the overall pattern of the data is called (A) an outlier (B) an experiment (C) a variable (D) a bias.
(n) In conducting a hypothesis test, the p-value is defined as the probability of (A) getting the observed value of the test statistic or more extreme when the null hypothesis is false (B) getting the observed value of the test statistic when the alternative hypothesis is true (C) getting exactly the observed value of the test statistic when the null hypothesis is true (D) getting the observed value of the test statistic or more extreme when the null hypothesis is true.
(o) The middle 50% of any data set is located between (A) the minimum and maximum (B) Q1 and Q3 (C) the range and interquartile range (D) the mean and median.
(p) If the equation for a regression line is ŷ = 25 − 5.25x and R² = 0.41, the value of the correlation between the two variables is (A) -0.64 (B) 0.64 (C) -0.1681 (D) 0.1681.
(q) The Central Limit Theorem states that as n increases, the sampling distribution of Ȳ will tend towards a (A) normal distribution with mean µ and standard deviation σ/n (B) normal distribution with mean µ and standard deviation σ (C) skewed distribution with mean µ and standard deviation σ/√n (D) normal distribution with mean µ and standard deviation σ/√n.

2. (4 pts) The four graphs below depict the sampling distribution for the sample mean from a sample of size n from a population that is normally distributed with mean 10 and standard deviation 10.

[Figure: four density curves in panels labeled (A), (B), (C), and (D); horizontal axes run from about -20 to 40.]

Which graph depicts the sampling distribution of the sample mean when
(a) n = 1?
(b) n = 10?
(c) n = 50?
(d) n = 100?

3. Short Answer I (4 pts per answer) For each problem, state the null and alternative hypotheses. DO NOT PERFORM THE ANALYSIS.
(a) The design of controls and instruments affects how easily people can use them. Twenty-five left-handed people were asked to turn a knob clockwise and counterclockwise. The time it took each subject to turn the knob each way was then recorded. Do left-handed people turn the knob faster counterclockwise?
(b) A previous study claimed the mean cellulose content of alfalfa hay is 142 mg/g. An agronomist believes the true value is different from 142 mg/g.
To test his belief, he takes a simple random sample of 15 cuttings from the population and finds the sample mean cellulose content is 145 mg/g with a sample standard deviation of 3 mg/g. Is this sufficient evidence that the mean cellulose content of alfalfa hay is different from 142 mg/g?
(c) In a randomized comparative experiment on the effect of dietary calcium on blood pressure, 54 healthy white males were divided at random into two groups. One group received calcium; the other, a placebo. Blood pressures for all 54 men were taken before and after the study. The researchers believe the men in the calcium group will have a greater mean reduction in systolic blood pressure than the men taking the placebo.
(d) In a survey, 75% of ISU undergrads surveyed would like to see VEISHA continued. Is this sufficient evidence that at least 70% of all ISU undergrads would like to see VEISHA continued?

4. (16 pts) Short Answer, II.
(a) (4 pts) In order to determine the effect of a new migraine drug on the length and severity of migraines, fifty subjects were divided into two groups based on the personal preference of the subject. One group took the new drug while the other group took the current migraine drug on the market. At the end of each migraine, subjects were asked to rate the migraine by length and severity. Name two things wrong with this experiment.
(b) (4 pts) Iowa State University would like to obtain information from its students about support for VEISHA across campus. A random sample of 500 students is taken and surveys are sent to the campus addresses of the 500 students. Name two sources of bias that could potentially affect the results of this sample.
(c) (4 pts) Of the four principles of experimentation (control, randomization, replication, and blocking), which one is not required for all experiments? Explain your answer.
(d) (4 pts) Of the four principles of experimentation (control, randomization, replication, and blocking), which one is the most important?
Explain your answer.

5. (14 pts) The height of men in Statistics 101 is normally distributed with a mean height of µ = 70.5 in and a standard deviation of σ = 3 in.
(a) (4 pts) What is the probability that a man selected at random from the class will have a height less than 68 inches?
(b) (3 pts) What is the sampling distribution of the sample mean height of 5 men selected at random from the class?
(c) (5 pts) What is the probability that the mean height of 5 men selected at random from the class will be less than 68 inches?
(d) (2 pts) Do you need the Central Limit Theorem to answer parts (b) and (c)? Explain your answer.

6. (15 pts) Many high school mathematics teachers believe the language and reading skills of a student are an important part of the ability to learn and understand mathematics. The JMP output attached contains a scatterplot of the ACT English and Math Scores for 100 students.
(a) (2 pts) Why was the ACT Math variable chosen to be the response variable?
(b) (2 pts) What is the least squares regression line for predicting ACT Math score from ACT English score?
(c) (4 pts) What is the value of the slope? Give its interpretation in the context of the problem.
(d) (4 pts) What is the value of R²? Give its interpretation in the context of the problem.
(e) (3 pts) Describe the residual plot and make note of any potential problems with the regression.

7. (10 pts) In the Gallup Poll from April 16 - 18, 2004, 520 out of the 1000 adults 18 and older surveyed approved of the job President Bush is doing as president.
(a) (2 pts) What is the population the Gallup Poll is surveying?
(b) (4 pts) Calculate a 95% confidence interval for the population proportion of Americans who approve of the President’s job performance. Assume all assumptions are met.
(c) (4 pts) Give an interpretation of the confidence interval you calculated in part (b).

8. (28 pts) A manufacturer of small appliances employs a market research firm to estimate retail sales of its products by gathering information from a sample of retail stores. This month a simple random sample of n = 30 stores in the Midwest sales region finds that these stores sold an average of 23.37 of the manufacturer’s hand mixers, with a sample standard deviation of 2.43. A histogram of the 30 data points from the sample is given below.

[Histogram: Number of Stores (vertical axis, 0 to 10) against Number of Mixers (horizontal axis, 18 to 28).]

(a) (3 pts) Describe the distribution of the number of mixers sold using the histogram. Make sure to mention shape, center, spread and any outliers.
(b) (4 pts) Find a 95% confidence interval for the mean number of mixers sold by all stores in the Midwest region.
(c) (4 pts) Give an interpretation of the confidence interval you calculated in part (b).
(d) (17 pts) Based on previous experience, company executives expect to sell a mean of µ = 22 hand mixers during this month. Based on the data, is there sufficient evidence to conclude the company had sales exceeding this expectation? Do the appropriate hypothesis test. Assume a significance level of α = 0.05.

9. (21 pts) In an experiment to study the effect of the spectrum of ambient light on the growth of plants, researchers assigned tobacco seedlings at random to two groups of eight plants each. The plants were grown in a greenhouse under identical conditions except for lighting. The experimental group was grown under blue light, while the control group was grown under natural light. Here are the data on stem growth in millimeters:

Control Group        4.0  3.5  3.9  3.7  4.0  3.6  3.8  3.7
Experimental Group   3.6  3.4  3.7  3.7  3.2  3.4  3.5  3.6

(a) (4 pts) Calculate the mean and median of the control group.
(b) (4 pts) The experimenter believes that the mean stem growth will be longer for the control group than for the experimental group. What are the null and alternative hypotheses for this hypothesis test?
(c) (5 pts) The sample standard deviation for the control group is 0.18 and the sample standard deviation for the experimental group is 0.17. Calculate the appropriate test statistic for this hypothesis test.
(d) (3 pts) Find the p-value for this significance test.
(e) (2 pts) What is your decision for this significance test? Explain your answer. Assume a significance level of α = 0.1.
(f) (3 pts) State your conclusion in the context of the problem.

10. (17 pts) A poll was conducted by the University of Montana in which the 202 respondents were classified according to the area of Montana that they live in and their political party affiliation. The results are in the table below.

              West   Northeast   Southeast   Total
Democrat        39          15          30      84
Republican      17          30          31      78
Independent     12          12          16      40
Total           68          57          77     202

(a) (4 pts) The pollster would like to test if there is a relationship between the area of Montana in which a person lives and their party affiliation. State the appropriate null and alternative hypotheses.
(b) (4 pts) Calculate the expected cell counts for the West - Democrat cell and for the Northeast - Independent cell.
(c) (4 pts) Calculate the contribution of the cells West - Democrat and Northeast - Independent to the χ² statistic.
(d) (2 pts) The total χ² statistic for the two-way table is 13.849 with a p-value of 0.0078. What is your decision about the hypothesis test if the significance level is α = 0.05? Explain your answer.
(e) (3 pts) What is the conclusion for the test in the context of the problem?

Formulas

Summary statistics:
$\bar{x} = \frac{\sum x_i}{n}$
$\bar{y} = \frac{\sum y_i}{n}$
$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
$s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$

Correlation and regression:
$r = \frac{1}{n-1} \sum \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$
$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$

Standardized values:
$z = \frac{y - \mu}{\sigma}$
$z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$

Proportions:
$\hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o (1 - p_o)}{n}}}$
$n = \frac{(z^\star)^2 (0.5)(0.5)}{(ME)^2}$

One mean:
$\bar{y} \pm t^\star \frac{s}{\sqrt{n}}$
$t = \frac{\bar{y} - \mu_o}{s / \sqrt{n}}$

Two means:
$(\bar{y}_1 - \bar{y}_2) \pm t^\star \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Two-way tables:
$\text{expected cell count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

Critical values:
C%    z⋆
90    1.645
95    1.960
98    2.326
99    2.576

JMP Output

Bivariate Fit of ACT Math Score By ACT English Score

[Scatterplot with linear fit: ACT Math Score (vertical axis, 15 to 35) against ACT English Score (horizontal axis, 15 to 35).]

Linear Fit
ACT Math Score = 11.727885 + 0.5483277 ACT English Score

Summary of Fit
RSquare                       0.40537
RSquare Adj                   0.399303
Root Mean Square Error        3.42642
Mean of Response              24.97
Observations (or Sum Wgts)    100

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        1         784.3554       784.355   66.8085   <.0001
Error       98        1150.5546        11.740
C. Total    99        1934.9100

Parameter Estimates
Term                 Estimate    Std Error   t Ratio   Prob>|t|
Intercept           11.727885     1.655936      7.08   <.0001
ACT English Score   0.5483277     0.067085      8.17   <.0001

[Residual plot: Residual (vertical axis, -10 to 10) against ACT English Score (horizontal axis, 15 to 35).]
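
Several formulas from the formula sheet can be sketched in Python using only the standard library. This is a minimal illustration, not part of the exam: the function names are hypothetical, the numbers in the usage lines are made up and are not taken from any exam question, and the critical value t⋆ is assumed to be looked up in a t table and passed in by the caller.

```python
import math


def proportion_ci(successes, n, z_star=1.960):
    """Interval p-hat +/- z* sqrt(p-hat (1 - p-hat) / n); z* defaults to the 95% value."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(y_bar, s, n, t_star):
    """Interval y-bar +/- t* s / sqrt(n); t_star comes from a t table (assumption)."""
    margin = t_star * s / math.sqrt(n)
    return y_bar - margin, y_bar + margin


def one_sample_t(y_bar, mu_0, s, n):
    """Test statistic t = (y-bar - mu_0) / (s / sqrt(n))."""
    return (y_bar - mu_0) / (s / math.sqrt(n))


def two_sample_t(y1_bar, y2_bar, s1, s2, n1, n2):
    """Test statistic t = (y1-bar - y2-bar) / sqrt(s1^2/n1 + s2^2/n2)."""
    return (y1_bar - y2_bar) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def sample_size_for_margin(margin, z_star=1.960):
    """n = (z*)^2 (0.5)(0.5) / ME^2, rounded up to a whole number of subjects."""
    return math.ceil(z_star**2 * 0.5 * 0.5 / margin**2)


def expected_count(row_total, column_total, table_total):
    """Expected cell count = row total * column total / table total."""
    return row_total * column_total / table_total


def chi_square_contribution(observed, expected):
    """One cell's contribution (observed - expected)^2 / expected to chi-square."""
    return (observed - expected) ** 2 / expected


# Usage with made-up numbers (not exam data).
print(proportion_ci(40, 100))
print(one_sample_t(52, 50, 4, 16))         # 2.0
print(expected_count(50, 40, 200))         # 10.0
print(chi_square_contribution(14, 10.0))   # 1.6
```

Passing z⋆ and t⋆ in as arguments mirrors how the sheet is used by hand: the formula is fixed, and the critical value is read from a table for the chosen confidence level and degrees of freedom.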