STAT 302 Final Exam Review

These practice questions and problems should help you prepare for the final exam. Please do not feel like you need to write out an answer for each of these questions. I recommend reading through the entire review sheet and making sure you have an understanding of all the topics covered. I encourage you to focus on the application questions on this review sheet. Additionally, I recommend you review the class notes and homework.

This review sheet only covers the new material (chapters 7-8). The final exam is cumulative. I recommend using the previous review sheets to review the old material (chapters 1-6).

Chapter 7: Inference for Numerical Data

1. What do we do when we don't know the population standard deviation, 𝜎? Why can this be a problem for our normal confidence interval and hypothesis test procedures and what can we do to correct it?
2. What is the general formula for a confidence interval for the mean when 𝜎 is unknown?
3. What is the t-distribution? How does the shape of the t-distribution change as we increase our sample size?
4. What are the conditions for using the t-distribution to calculate a confidence interval or conduct a hypothesis test for the mean when 𝜎 is unknown?
5. A company is trying to determine the average time of customer service calls. In a sample of 11 random calls, the average length of the call was 7 minutes, with a standard deviation of 4. Assume the data is normally distributed and there are no outliers.
   a. What are the df?
   b. What is the associated critical value for a 90% confidence level? What is the associated critical value for a 95% confidence level? What is the associated critical value for a 99% confidence level?
   c. Construct a 90% confidence interval for the average length of the customer service calls. Based on this interval, would you believe that the average length of calls is 10 minutes?
   d. Construct a 95% confidence interval for the average length of the customer service calls.
      Based on this interval, would you believe that the average length of calls is 10 minutes?
   e. Construct a 99% confidence interval for the average length of the customer service calls. Based on this interval, would you believe that the average length of calls is 10 minutes?
   f. Are the conditions met to create these confidence intervals?
6. A recent study showed that the average house price in the US is $272,000. Lauren lives in California and believes that the average house price in her state is higher than the national average. She takes a random sample of 32 households in California and finds that the average house price is $312,000, with a standard deviation of $95,000. Assume the data is close to normally distributed (not strongly skewed) and there are no extreme outliers. Conduct the appropriate hypothesis test using a 0.05 alpha level of significance.
   a. What is the null hypothesis?
   b. What is the alternative hypothesis?
   c. What is α?
   d. What is the test statistic?
   e. What are the df?
   f. What two values on the t-table bracket the absolute value of our test statistic? (If the test statistic is larger or smaller than the largest/smallest value in the row, just put the value it is larger or smaller than.)
   g. Between what two values does our p-value fall?
   h. What is the appropriate decision?
   i. What is the appropriate conclusion?
   j. Were the conditions met to conduct the hypothesis test?
7. What are the two study designs that allow us to compare the means of 2 groups? Briefly describe both of these.
8. If subjects are measured twice under 2 different treatment conditions or at two different times, what is the study design?
9. When is it appropriate to use a matched pairs test?
10. For matched pairs data, what are the hypotheses?
11. For matched pairs data, what is the form of the test statistic?
12. For matched pairs data, what are the conditions necessary to do a hypothesis test or calculate a confidence interval?
13. For matched pairs data, what is the general form of a confidence interval?
14. When is it appropriate to use an independent samples test?
15. For independent samples data, what are the hypotheses?
16. For independent samples data, what is the test statistic?
17. For independent samples data, what are the degrees of freedom?
18. For independent samples data, what are the conditions necessary to do a hypothesis test or calculate a confidence interval?
19. For independent samples, is it appropriate to calculate a confidence interval for one of the group means and see if the other sample mean is in that confidence interval? Why or why not?
20. For independent samples data, what is the general form of a confidence interval?
21. Label the following scenarios as a single sample, matched pairs, or independent samples. Write the appropriate hypotheses for each hypothesis test.
   i. You collect a random sample of 24 hydrological stations in California and obtain data on the acidity of rainwater collected at each station in 2000 and in 2010. Acidity is measured by pH on a scale of 0-14 (the pH of distilled water is 7.0). You want to determine whether the acidity levels between 2000 and 2010 are different (difference = 2000 acidity - 2010 acidity).
   ii. Agricultural pests can be controlled to some extent either by using pesticides to kill them or by introducing a large number of sterilized males to diminish the species' reproductive potential. Ten large corn fields are treated with pesticides and ten have sterilized males introduced. At harvest time, you compare the average corn yield for the two treatments. You want to determine if the pesticide treatment provides a larger yield than the sterilized male treatment (difference = sterilized male yield - pesticide yield).
   iii. To check a new analytical method, a chemist obtains a reference specimen of a known concentration from the National Institute of Standards and Technology.
      She then makes 20 measurements of the concentration of this specimen with the new analytic method. She wants to check and see if the new method is biased by comparing the mean result with the known concentration (c).
   iv. Another chemist is checking the same new method. He does not have access to a reference specimen, but a familiar analytic method is available. He wants to know if the new and the old methods agree. He takes a specimen of unknown concentration and measures the concentration 10 times with the new method and 10 times with the old method (difference = new method - old method).
22. Hutchinson-Gilford progeria syndrome is a rare genetic condition that produces rapid aging in children. As a result, cardiovascular disease is a common cause of death in the teenage years. A clinical study examined the effect of treatment with the drug lonafarnib on a number of physiological outcomes. Pulse wave velocity (PWV) is the standard measure of vascular stiffness, an important factor in cardiovascular health. Researchers studied 18 children with progeria to determine how lonafarnib affected their PWV. They measured the PWV values (in m/s) of each child at the beginning of the study ("untreated") and then again at the end, after taking lonafarnib for two years ("treated"). They conducted a hypothesis test at the 0.05 significance level to determine if there was evidence that the two-year treatment with lonafarnib lowers PWV in children with progeria. Consider the difference to be PWV treated - PWV untreated. Assume that the children were randomly selected (case-control study) and that the distribution of the differences is slightly right skewed with no extreme outliers. The 18 differences in the sample have an average value of -3.917 and a standard deviation of 2.644.
   a. What is the appropriate type of test?
   b. What are the hypotheses?
   c. What is the significance level?
   d. What is the test statistic?
   e. What are the degrees of freedom?
   f. Between what two values does the test statistic fall (on the t-table)? (If the test statistic is larger or smaller than the largest/smallest value in the row, just put the value it is larger or smaller than.)
   g. What is the p-value?
   h. What is the decision?
   i. What is the conclusion?
   j. Are the conditions met?
   k. Create a 98% confidence interval for the difference between the before and after values.
23. Avandia (rosiglitazone maleate) is an oral antidiabetic drug produced by the pharmaceutical company GlaxoSmithKline. Before a drug can be prescribed, we must know how the body absorbs and excretes it. Patients were randomly assigned to be given a single dose of either 1 mg or 2 mg of Avandia and the maximum plasma concentration of the drug (in ng/mL) was assessed. Is there significant evidence at the 0.01 significance level that the maximum plasma concentration is dependent on dose and that a higher dose results in higher concentrations (difference = 2 mg - 1 mg)? Assume there are no extreme outliers in either group and that the distribution of both groups is strongly skewed right. Use the table below, which gives the treatment summary statistics.

   Summary Statistics
   Treatment    n     x-bar    s
   1 mg         32    76       13
   2 mg         32    156      42

   a. What is the appropriate type of test?
   b. What are the hypotheses?
   c. What is the significance level?
   d. What is the test statistic?
   e. What are the degrees of freedom?
   f. Between what two values does the test statistic fall (on the t-table)? (If the test statistic is larger or smaller than the largest/smallest value in the row, just put the value it is larger or smaller than.)
   g. What is the p-value?
   h. What is the decision?
   i. What is the conclusion?
   j. Are the conditions met?
   k. Create a 90% confidence interval for the difference between the two doses.
24. What does ANOVA stand for? When do we use ANOVA?
25. Why can't we use multiple independent samples t-tests instead of using ANOVA?
26. For an ANOVA test, what are the hypotheses?
27. For an ANOVA test, how do we get the p-value?
28. For an ANOVA, what are the conditions necessary to do a hypothesis test?
29. A dental study evaluated the effect of tooth etch time on resin bonding strength. A total of 78 undamaged, recently extracted first molars (baby teeth) were randomly assigned to be etched with phosphoric acid gel for either 15, 30, or 60 seconds. Composite resin cylinders of identical size were then bonded to the tooth enamel. The researchers examined the bond strength after 24 hours by finding the failure load (in megapascals) for each bond. The summary data and the partial ANOVA table for this experiment are below. A QQ plot showed no deviations from normality. The researchers wanted to test at the 10% significance level whether there was an association between tooth etch time and resin bonding strength.

   Summary Statistics
   Etch Time     n     x-bar    s
   15 seconds    26    4.49     2.28
   30 seconds    26    6.98     3.15
   60 seconds    26    8.48     4.17

   JMP Output
   Source       DF      Sum of Squares    Mean Square    F-Ratio    Prob > F
   Etch time    ____    ____              ____           ____       0.0002
   Error        ____    812.745           ____
   Total        ____    1023.953

   a. What is the appropriate type of test?
   b. What are the hypotheses?
   c. What is the significance level?
   d. Fill in the missing spaces on the ANOVA table.
   e. What is the test statistic?
   f. What is the p-value?
   g. What is the decision?
   h. What is the conclusion?
   i. Are the conditions met?
   j. Based on the summary statistics, what two groups do you expect to be statistically significantly different?
30. If we were to complete pairwise/multiple comparisons, using the Bonferroni correction, what would we use as our modified significance level, α*?

Chapter 8: Introduction to Linear Regression

1. When do we use linear regression?
2. What is the first step in determining if there is a linear relationship between two numerical variables?
3. What does correlation, or r, measure?
4. When is it appropriate to use r?
5. What are the possible values of r?
6. What are the units of r?
7. Does r depend on the units a variable was measured in? Does r depend on which variable is defined as the response variable?
8. The correlation between IQ score and school GPA is r = 0.634. The correlation between wine consumption and heart disease is r = −0.645. Which of these two correlations indicates a stronger straight-line relationship? Explain your answer.
9. The following scatterplot shows the relationship between the weight of a car and the mileage it gets. How would you describe the association? If you were given a choice between -1.0, -0.79, -0.42, 0, 0.42, 0.79, and 1.0, which would you say is the best estimate of the correlation between these two variables?
10. What happens to r in the presence of outliers? For example, what would happen to r if an outlier was added that is in line with the rest of the data? What would happen to r if an outlier was added that was out of line with the rest of the data?
11. How do we interpret R-squared?
12. A study of class attendance and grades among first-year students at a state university showed that in general students who attended a higher percentage of their classes earned higher grades. Class attendance explained 25% of the variation in grade index among the students. What is the numerical value of the correlation between the percentage of classes attended and the grade index?
13. What is 𝛽1? What do we use to estimate 𝛽1?
14. How do we interpret the value of b1?
15. What is 𝛽0? What do we use to estimate 𝛽0?
16. A researcher wanted to see if the average SAT Mathematics score of each state's high school seniors could be predicted by the proportion of each state's seniors who took the exam. He found that the least-squares regression line for predicting average SAT Math score from proportion taking is: average Math SAT score = 580.0 − (109.7 × proportion taking). Interpret the slope and the intercept of this equation. In New York, the proportion of high school seniors who took the SAT was 0.89.
    Predict their average score.
17. The correlation between the average SAT Mathematics score in the states and the proportion of high school seniors who take the SAT is r = −0.843. The correlation is negative. What does that tell us? How well does proportion taking predict average score? (Use r² in your answer.)
18. What is an outlier?
19. What is a high leverage point?
20. What is an influential point?
21. What are the hypotheses we test when we are doing statistical inference about linear regression?
22. For linear regression, how do we calculate the test statistic?
23. For linear regression, how do we calculate the p-value?
24. What do we use to denote the predicted response of an individual?
25. What is a residual, with regard to linear regression?
26. What is interpolation? What is extrapolation? Which one of these should we avoid?
27. A study was done to determine whether or not social exclusion causes "real pain." Researchers looked at a random sample of 7 individuals and measured brain activity in an area of the brain that responds to physical pain to see if activity increases as distress from social exclusion increases (measured by social distress score). The social distress scores in the experiment ranged from 0 to 10. A scatterplot shows a moderately strong linear relationship. The table below shows the regression output. Researchers want to test whether or not there is an association between social exclusion and brain activity in this pain center at the 0.05 level.

                  Estimate    Standard Error
   (Intercept)    -0.1261     0.02465
   Distress       0.06078     0.009979

   a. What are the hypotheses?
   b. What is the significance level?
   c. What is the test statistic?
   d. What is the p-value?
   e. What is the decision?
   f. What is the conclusion?
   g. What is the regression equation?
   h. What is the predicted brain activity for a person with a social distress score of 2.0?
   i. A person has a social distress score of 2.0 and a brain activity level of 0.15. What is their residual?
   j. A scientist wants to use the above regression formula to estimate the brain activity for someone with a social distress score of 15. Is this a good idea? Why or why not?
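
If you would like to check your arithmetic on the application questions, the sketch below works through the numbers in Chapter 7, question 5 and Chapter 8, question 27. It is only an illustration and not part of the course materials: the function names are made up for this example, and the critical values for df = 10 are assumed to be read from a standard t-table. It checks calculations only; writing the hypotheses, decisions, and conclusions is still up to you.

```python
import math

# Critical values t* for df = 10, read from a standard t-table
# (assumption: the same kind of table used for the questions above).
T_STAR_DF10 = {90: 1.812, 95: 2.228, 99: 3.169}


def t_interval(xbar, s, n, t_star):
    """Confidence interval for a mean when sigma is unknown:
    xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


def slope_t_statistic(b1, se_b1):
    """Test statistic for H0: beta1 = 0, computed as b1 / SE(b1)."""
    return b1 / se_b1


def predict(b0, b1, x):
    """Predicted response: y-hat = b0 + b1 * x."""
    return b0 + b1 * x


def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y."""
    return observed_y - predicted_y


if __name__ == "__main__":
    # Chapter 7, question 5: n = 11 calls, mean 7 minutes, s = 4, so df = 10.
    for level, t_star in T_STAR_DF10.items():
        low, high = t_interval(7, 4, 11, t_star)
        inside = low <= 10 <= high
        print(f"{level}% CI: ({low:.3f}, {high:.3f}); contains 10? {inside}")

    # Chapter 8, question 27: estimates taken from the regression output.
    b0, b1, se_b1 = -0.1261, 0.06078, 0.009979
    y_hat = predict(b0, b1, 2.0)
    print(f"t = {slope_t_statistic(b1, se_b1):.3f}")
    print(f"predicted activity at distress 2.0: {y_hat:.5f}")
    print(f"residual for observed 0.15: {residual(0.15, y_hat):.5f}")
```

With these table values, only the 99% interval is wide enough to include 10 minutes. The regression functions say nothing about whether a prediction is sensible: a distress score of 15 is outside the observed range of 0 to 10, so the code will return a number even though the prediction is an extrapolation.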