Statistics 108, Final, Fall 2008        Name (print):_____________________
137 points.

I confirm that I am allowed only a calculator and one 3-by-5 inch note card of notes for this exam. I will not look at anybody else’s exam and I will make every necessary effort to prevent others from seeing my exam. I will use the provided probability tables to cover up my work and answers. Using additional test aids, copying from others, or allowing others to copy my work can result in an F grade for this class.

I have read and agree with the above statement. Signature: _____________________

(a.) Show all work to receive full credit.
(b.) Circle your final answer.
(c.) Ask the instructor for clarification if any questions are unclear.

(1) Suppose you are estimating the mean weight of adult fish of a particular species. You collect a sample of 100 fish. After calculating the mean and standard deviation of the 100 fish, you state that the 95% confidence interval for the mean weight of fish is (486, 522) grams.

(1a. 3pts) “You are 95% confident that the mean weight of _________________ is between 486 and 522 grams.” Which phrase, (i) or (ii), most correctly fills the blank?
    (i) the 100 sampled fish
    (ii) the fish in the population

(1b. 3pts) Expanding on question (1a), specifically explain what is meant by being “95% confident”. Use the idea of repeated samples or the movie “Groundhog Day”.

(1c. 3pts) The general structure of most confidence intervals is (fill in blank):
    Estimate ± Multiplier × _____________

(2. 2pts) Suppose insects of a particular species were sampled from two different environments. 40 insects were sampled from environment A and 39 insects from environment B. A two-sample t-test was performed to test the null hypothesis of equal mean lengths. The resulting test statistic was 2.39, giving a p-value of 0.02. Using a level of significance of 5%, we should ____________ the null hypothesis. (Circle: (i) keep, (ii) reject)

(3. 3pts) When performing a statistical test, we do our calculations assuming the __________ hypothesis is true. Circle the correct answer to fill in the blank:
    (a) alternative (b) null (c) power (d) third

(4. 2pts) In general, when performing a statistical test, we first calculate a test statistic. The test statistic is then compared to a probability distribution to obtain a _________________, which we compare to 0.05 (α) to determine our decision regarding the hypotheses. Circle the correct answer to fill in the blank:
    (a) beta value, (b) mean, (c) p-value, (d) standard deviation

(5. 4pts) Circle the statement that best describes a p-value.
    The p-value is the probability of the null hypothesis being true.
    The p-value is the probability of the alternative hypothesis being true.
    Assuming the alternative hypothesis is true, the p-value is the probability of getting a test statistic as extreme or more extreme than the test statistic you calculated from your sampled data.
    Assuming the null hypothesis is true, the p-value is the probability of getting a test statistic as extreme or more extreme than the test statistic you calculated from your sampled data.
    The p-value is the probability of committing a type 1 error.
    The p-value is the probability of committing a type 2 error.

(6. 3pts) Inspect the dotplots below. Which sample has the larger variance? Circle: (a) Sample 1, (b) Sample 2, (c) Same variance

    [Figure: Dotplot of Sample 1, Sample 2; horizontal axis “Data”, ticks from 36 to 162]

(7) A company claims that on average there are 16 ounces of cereal in their boxes of cereal. A suspicious statistician performed a 1-sample t-test to determine if the mean weight of the company’s cereal boxes was less than 16 ounces. The contents of 100 boxes were measured, resulting in a mean of 15.8 ounces and a standard deviation of 1.

(7a. 3pts) Complete the null and alternative hypotheses.
    𝐻0 : µ ____ 16
    𝐻𝐴 : µ ____ 16

(7b. 3pts) Calculate the t-statistic.
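
For checking the arithmetic in (7b), here is a minimal sketch in Python. It assumes the usual one-sample t-statistic, t = (sample mean − hypothesized mean) / (standard deviation / √n). The helper name `one_sample_t` is hypothetical and not part of the exam.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return the one-sample t-statistic (hypothetical helper for illustration)."""
    # The standard error of the mean shrinks with the square root of the sample size.
    standard_error = sample_sd / math.sqrt(n)
    # Distance of the sample mean from the claimed mean, in standard-error units.
    return (sample_mean - mu0) / standard_error


# Values taken from question 7: 100 boxes, mean 15.8 oz, sd 1, claimed mean 16 oz.
t_stat = one_sample_t(sample_mean=15.8, mu0=16.0, sample_sd=1.0, n=100)
print(round(t_stat, 3))
```

A negative value indicates that the sample mean falls below the claimed 16 ounces, which is the direction of the alternative hypothesis in this question.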
(8) Suppose you performed a t-test where 𝐻0 : 𝜇 = 𝜇0 vs. 𝐻𝐴 : 𝜇 ≠ 𝜇0 and your degrees of freedom were 10.

(8a. 3pts) Using α = 0.05, state the critical value(s) for this test.

(8b. 3pts) Suppose the t-statistic was 1.37. Calculate the p-value.

(9. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴̅.

(10. 4pts) Circle the one statement that best describes the key idea of the central limit theorem.
    As the sample size increases, the distribution of the sample becomes distributed like the normal distribution.
    As the sample size increases, the distribution of the sample means becomes distributed more like the normal distribution.
    As the sample size increases, the population variance decreases.
    As the sample size increases, the population becomes distributed more like the normal distribution.

(11. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴 ∩ 𝐵.

(12. 4pts) Fill in the blanks: A ______________ [choose: continuous or discrete] random variable’s distribution is described by a density curve. The height of the density curve will always be 0 or greater and the total area beneath the curve will be ______________ [hint: a number].

(13) Suppose the university performed a study to get an understanding of the finances of all students at Humboldt State University. To get this picture, the university randomly sampled 200 student ID numbers, contacted those students, and interviewed them.

(13a. 3pts) The researcher calculated the mean amount of debt of the 200 students. Is the mean amount of debt for these 200 students a parameter or statistic? Circle: (i) Parameter (ii) Statistic

(13b. 4pts) 80 of these 200 students had student loans. Calculate a 95% confidence interval for the proportion of HSU students that have student loans. Show the formula you used.

(14. 4pts) In simple linear regression, the best line for y = β0 + β1x is determined via “least-squares”.
Describe in detail how the line is determined. (Hint: Explain “least-squares”.)

(15. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴 ⋃ 𝐵̅.

(16. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴̅ ∩ 𝐵̅.

(17. 3pts) Morning weather conditions were recorded for 80 days of winter in Eureka. Suppose 20 mornings were rainy, 40 foggy, and 20 sunny. Sketch a pie chart of these data, making sure you label the rainy, foggy, and sunny sections. How many degrees should be assigned to the “foggy” slice?

(18.) The number of eggs in a nest is distributed according to the following fictitious probability distribution. A nest can contain as few as 1 egg and as many as 4 eggs. The probability distribution for the number of eggs (k) is given below, with the exception of the probability for 3 eggs.

    k     P(X=k)     cdf: P(X ≤ k)
    1     0.2        ???=
    2     0.3        ???=
    3     ?????      ???=
    4     0.1        ???=

(18a. 4pts) Fill in the table for P(X=3). Show your work.

(18b. 3pts) Fill in the table for the cumulative distribution function (cdf) column. Show your work.

(18c. 3pts) Calculate the expected value for X, E(X). Show your work.

(19.) Below is a scatter plot and simple linear regression using tree diameter to predict height. The regression equation is HEIGHT = 62.03 + 12.65 DIAMETER.

    [Figure: Fitted Line Plot, HEIGHT = 62.03 + 12.65 DIAMETER; S = 5.53846, R-Sq = 27.0%, R-Sq(adj) = 24.4%; vertical axis HEIGHT (60 to 90), horizontal axis DIAMETER (0.6 to 1.8)]

(19a. 4pts) If a randomly sampled tree had a diameter of 1.72 units, using the regression analysis, what would be its expected height? Show your calculations. Answer: ___________________

(19b. 2pts) Calculate the residual value for the height of the tree in problem 19a, assuming its actual height was 88. Show your work. Answer: ___________________________

(19c. 3pts) For each unit change in diameter, you would expect height to increase by how many units? Answer: __________________

(20. 5pts) Suppose the random variable X is distributed according to the normal distribution with mean=50 and sd=5. Furthermore, suppose a random sample of 25 values will be collected and their sample mean 𝑋̅ calculated. Calculate P(𝑋̅ > 51). Show your work. Answer: __________________

(21. 3pts) A fisheries student studied the weights of 2-year old rainbow trout raised in three different creeks. 35 fish were captured in creek A, 33 in creek B, and 35 in creek C. Below is Minitab ANOVA output for the data.

    One-way ANOVA: A, B, C

    Analysis of Variance
    Source    DF      SS      MS     F      P
    Factor     2     4.09    2.04   2.02   0.138
    Error    100   100.93    1.01
    Total    102   105.02

    Level     N    Mean    StDev
    A        35   2.503    0.997
    B        33   2.985    1.033
    C        35   2.656    0.985

    Pooled StDev = 1.005

    [Plot: Individual 95% CIs For Mean Based on Pooled StDev; axis ticks at 2.45, 2.80, 3.15]

Which conclusion is most appropriately inferred from the ANOVA output if using a level of significance of 5%?
    There is insignificant statistical evidence to suggest that the mean weight of 2-year old rainbow trout differs between creeks A, B, and C. (One-way ANOVA, P=0.138)
    There is statistically significant evidence that the mean weight of 2-year old rainbow trout differs between creeks A, B, and C. (One-way ANOVA, P=0.138)
    There is insignificant statistical evidence to suggest that the standard deviations of 2-year old rainbow trout differ between creeks A, B, and C. (One-way ANOVA, P=0.138)
    There is statistically significant evidence that the standard deviations of 2-year old rainbow trout differ between creeks A, B, and C. (One-way ANOVA, P=0.138)

(22. 3pts) Suppose a statistical test is performed using 𝛼 = 0.05. Also suppose the null hypothesis is true, but when performing the test you do not know that the null hypothesis is true. What is the probability of committing a Type 1 error for this test? Answer: _________________

(23. 5pts) Suppose the average length of 27 salamanders is 56mm with a standard deviation of 14mm. Calculate the 95% confidence interval for the mean salamander length. Show your work.

(24.) Suppose you were doing a psychological test on mice and wanted to know whether or not they favor a specific color. You placed food in red, black, and white bowls. 198 different times, a mouse (a different mouse each time) was observed to see which bowl it finished all the food from first. The data were: red=64, black=60, and white=74. Calculate the chi-square statistic for a goodness-of-fit test on these data to test the idea that the mice had no preference.

(24a. 3pts) Expected number of red = ________, black = __________, white = ______________. Show your work.

(24b. 3pts) Chi-square statistic = _________________________. Show your work.

(24c. 2pts) Determine the critical value: df = ____________, critical value = ________________

(24d. 2pts) True or False: Reject the null hypothesis.

(25. 4pts) Circle the correlation that best fits the graph below.
    (i) -1.34 (ii) -1 (iii) -0.79 (iv) -0.05 (v) 0 (vi) +0.05 (vii) +0.79 (viii) +1 (ix) +1.34

    [Figure: Scatterplot of X vs Y; both axes run from 90 to 115]

(26. 4pts) Suppose you toss 3 fair dice in a row. What is the probability of the first die being a 6, the second being a 1, and the third being a 6? Show your work.

(27.) The contingency table below consists of data describing the number of cycles between stopping birth control and a planned pregnancy. Women are categorized as smokers (1st row) and non-smokers (2nd row).

                   first cycle    2+ cycles    total
    Smokers             29            71        100
    non-smokers        198           288        486
    Total              227           359        586

(27a. 3pts) Estimate the probability: P(2+ cycles | smoker). Probability = _________________

(27b. 3pts) Assuming independence between smoking and number of cycles, how many women were expected to fall in the category of non-smoker & 2+ cycles? Show your work. Expected value = ____________________

(27c. 2pts) The chi-square statistic value for this chi-square test of independence analysis was 4.817. Using a level of significance of 5% (α=0.05), determine the appropriate critical value. Df = ____________ Critical value = ______________

(27d. 3pts) In words, state the null and alternative hypotheses for this test.
    Ho:
    Ha:

(27e. 2pts) True or False: The null hypothesis should be rejected.

(28. 3pts) A doctor wanted to find out if smoking affected the physical endurance of mice. The doctor forced 100 mice to breathe high daily doses of cigarette smoke and another 100 mice were not exposed to smoke. After 2 months the mice were forced to run on a hamster wheel until they tumbled from exhaustion. The length of time until they started to tumble was measured for each mouse. It was thought that the mean length of time for the smoking mice would be less than that of the non-smoking mice. (This was not a real study; no mice were harmed in the course of making this problem.) Circle which statistical test would be most appropriate for this dataset.
    Chi-square test for independence
    1-sample t-test
    2-sample t-test
    ANOVA
    Chi-square goodness-of-fit test
    paired t-test
    simple linear regression

(29. 3pts) People have a “dominant” and “non-dominant” eye. If you were to look through a small hole in a paper (or your hands) at arm’s length, you’d use your dominant eye. A scientist wanted to study whether people’s dominant eye is, on average, stronger than their non-dominant eye. (Perhaps the dominant eye is exercised more.) Using randomly selected people, she measured from how many feet away each person could read a sign with their dominant eye and again with their non-dominant eye. Circle which statistical test would be most appropriate for this dataset.
    Chi-square test for independence
    1-sample t-test
    2-sample t-test
    ANOVA
    Chi-square goodness-of-fit test
    paired t-test
    simple linear regression
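
For checking the arithmetic in question 27, here is a minimal sketch in Python. It assumes the standard expected-count formula for a test of independence (row total × column total / grand total) and the Pearson chi-square statistic. All variable names are hypothetical. The critical value is hardcoded from a standard chi-square table (df = 1, α = 0.05) rather than computed.

```python
# Observed counts from the question 27 contingency table.
observed = [
    [29, 71],    # smokers: first cycle, 2+ cycles
    [198, 288],  # non-smokers: first cycle, 2+ cycles
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell under independence:
# (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Degrees of freedom for an r-by-c table: (r - 1) * (c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: table value for the chi-square distribution, df = 1, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
reject_null = chi_square > CRITICAL_VALUE_DF1_ALPHA_05

print(f"expected non-smoker & 2+ cycles: {expected[1][1]:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_null}")
```

The computed statistic can be compared against the value of 4.817 quoted in (27c) as a consistency check on the table entries.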