Math 311 Practice Final Exam

Part I: Definitions and Brief Answers (Type your answers.)

1. Define Confidence Level.
A confidence level is a probability, based on a probability distribution, that a score will end up within a particular confidence interval.

2. Define Type I Error.
A Type I error is rejecting Ho when, in fact, Ho is true. Its probability is the significance level α.

3. Define Type II Error.
A Type II error is NOT rejecting Ho when, in fact, Ho is false. Its probability is denoted β.

4. Define Population Variance and state the formula.

5. Define Population Standard Deviation and state the formula.

6. State precisely how and why the formulas in (4.) and (5.) are different from those for samples.

7. There are many types of standard deviations. What is their common usage?

8. An anthropologist wishes to use a normal curve to make a statistical decision about population height on a combined group of Pygmy people (average height = 4’11”) and Watusi people (average height = 5’9”). Briefly comment on the validity of the procedure.

9. A quality control engineer tests two small samples of electronic switches for duration of service. Sample A switches all burn out after similar durations, but Sample B switches burn out after very different durations, lasting a few seconds to a few years. What type of statistic should the engineer use to test differences of life expectancies of the two types of switches?

Answer to 4: The population variance is the average of the squared differences from the mean of all the population's scores:
$\sigma^2 = \frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}$

Answer to 5: The population standard deviation is the square root of the population variance:
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}}$

Answer to 6: The sample variance and standard deviation differ from those of the population by using the sample mean in place of μ and dividing by n − 1 rather than n (a weighted average). This is done to control for bias: the scores in a relatively small sample tend to be close to the sample mean, making measures of spread too small, unless we “average” by dividing by a smaller number (n − 1).
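The effect of the two divisors can be seen numerically. The following Python sketch is only an illustration, not part of the exam: it borrows the seven-score data set from Part II, and the function names `population_variance` and `sample_variance` are made up for this example.

```python
import math
import statistics

# Data set borrowed from Part II of this exam (an illustrative choice).
data = [10, 15, 25, 23, 26, 36, 50]


def population_variance(xs):
    """Average squared difference from the mean, dividing by n."""
    mu = sum(xs) / len(xs)
    return sum((mu - x) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Same sum of squares, but dividing by n - 1 to offset the bias."""
    xbar = sum(xs) / len(xs)
    return sum((xbar - x) ** 2 for x in xs) / (len(xs) - 1)


pop_var = population_variance(data)
samp_var = sample_variance(data)

print(f"divide by n:     variance = {pop_var:.3f}, SD = {math.sqrt(pop_var):.3f}")
print(f"divide by n - 1: variance = {samp_var:.3f}, SD = {math.sqrt(samp_var):.3f}")

# The standard library offers the same two quantities directly.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n − 1 always gives the larger of the two values, and the gap shrinks as n grows, which is why the correction matters most for small samples.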
Answer to 7: Intuitively, the standard deviation is a scaling factor (yardstick) for calculating probability; for a normal distribution, roughly 68% of the scores lie within 1 SD of the mean, 95% lie within 2 SDs of the mean, and 99.7% lie within 3 SDs of the mean.

Answer to 8: The combined population of Pygmy and Watusi people is NOT normally distributed with respect to height. In fact, it is bimodal (two humps). Normal-curve statistics based on heights of this population will not be valid.

Answer to 9: The scenario suggests an independent (small-sample) t-test in which the (unequal) variances are not pooled.

Part II: Hand Calculations

Calculate the following using the data set: 10, 15, 25, 23, 26, 36, 50. SHOW YOUR WORK

1. Mean
The mean is the arithmetic average of the data: 185/7, approximately 26.42857143.

2. Median
The ordered data set is 10, 15, 23, 25, 26, 36, 50, and the median is the middle score: 25.

3. Assume the sample above is from a normally distributed population. Calculate the sample standard deviation. (Start with the formula and show the main steps.)
$s = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n-1}} \approx \sqrt{\frac{\sum_{i=1}^{7} (26.43 - x_i)^2}{6}} \approx 13.30234$

4. Assume the sample in the data set above is from a normally distributed population. Estimate a 68% confidence interval around the Sample Mean. (Show your reasoning!)
68% of the scores should lie within approximately 1 SD = 13.3 of the sample mean = 26.43. Therefore the confidence interval is approximately the sample mean plus or minus one sample standard deviation.

5. Mars Candy Company claims that 13% (by number) of their plain m&ms are brown. In a sample of 20 m&ms, what is the probability that 10 to 13 of them (inclusive) are brown? (Show your reasoning.)

6. Suppose μ = 55, σ = 3, $\bar{x}$ = 45, n = 30 for a normally distributed sample. Transform to the z-scale and demonstrate if $\bar{x}$ is inside or outside a 99% confidence interval.

7. Suppose a pre-test and post-test are given to 10 students, with the average of the post − pre scores being 3.1 and S = 5.06. A significant improvement was expected at the 95% confidence level. Was the improvement significant?
Calculate a confidence interval and test the proper hypothesis.

Answer to 4 (interval): (26.43 − 13.30, 26.43 + 13.30) = (13.13, 39.73)

Answer to 5: We can compute this probability exactly, using the binomial distribution:
$P = C(20,10)(0.13)^{10}(0.87)^{10} + C(20,11)(0.13)^{11}(0.87)^{9} + C(20,12)(0.13)^{12}(0.87)^{8} + C(20,13)(0.13)^{13}(0.87)^{7} \approx 0.0000729$

Answer to 6: Use a table or MS Excel to find the 99% CI on the z-scale: (−2.576, 2.576).
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{45 - 55}{3/\sqrt{30}} \approx -18.26$, which lies far outside the interval.

Answer to 7:
Ho: $\mu_{post} \le \mu_{pre}$
Ha: $\mu_{post} > \mu_{pre}$
The correct design is a paired (dependent) sample test with a one-sided 95% CI, which on the score-difference scale is $(-\infty,\ 0 + 1.83 \cdot 5.06/\sqrt{10}) = (-\infty,\ 2.93)$. Note that post − pre = 3.1 is outside of this interval, so the improvement was significant at the 95% confidence level.

Part III: Scenarios

Download Data Sets here. Type your answers.

Scenario 1
A random sample of voters from Math 311 indicated the voting choices shown in the data set. Can you be 95% certain that voting for a gubernatorial candidate was associated with voting for a presidential candidate?

Ho: All four observed proportions are the same as expected proportions.
Ha: At least one observed proportion is not equal to an expected proportion.

Paste 3-D Pivot Chart Here.
[3-D pivot chart: Presidential Choice vs Gubernatorial Choice]

Paste Pivot Table Here

Count of Gubernatorial Choice
                 Column Labels
Row Labels       O      R      Grand Total
I                16     1      17
M                       6      6
Grand Total      16     7      23

Paste Observed Value Matrix
[ 16    1 ]
[  0    6 ]

Paste Expected Value Matrix
[ 11.83   5.17 ]
[  4.17   1.83 ]

χ² = 18.55
Df = (n-1)(m-1) = 1
P = .0000165

Conclusion
Reject Ho. Choice of presidential candidate is associated with (dependent on) choice of governor (or vice versa). However, results are compromised by expected cell values less than 5.

Scenario 2
A mathematics instructor is wondering if his students performed significantly better than last year’s students on his final exam at the 95% confidence level. Last year’s average was 74.
This year’s data can be found in the data set.

Ho: μ ≤ 74
Ha: μ > 74
Name your technology method here: Calculator
Circle one: One-sided / Two-sided
Type of Test: 1 sample t-test: one-sided (easy with calculator; otherwise done by hand; Excel doesn’t do this.)
Note: I would accept a two-sided test, because the problem suggests only slight evidence for a one-sided test.
Appropriate t or z (circle one) value: t = -.203
Circle one: Retain Ho / Reject Ho
P-value: p = .58
Conclusion: This year’s students did NOT significantly outperform last year’s students at the 95% confidence level.

Scenario 3
A quality control engineer suspects an extruding machine loses accuracy when the shop gets warm in the afternoon. To test her conjecture, the engineer sets a 95% confidence level and randomly measures circular extruded lids from the machine in both morning and afternoon samples. The lids are paired by the order in which they are extruded during these shifts, and there is a 1-hour lunch break between samples. What can the engineer rightly conclude? (See the data.)

Here you must guide yourself through the complete statistical process, and show the way through it. You may use technology to help you, but indicate which technology you used. What I am looking for here is that you thoroughly consider the issue rather than follow a protocol. In essence, this would be “data mining” for a quality control engineer.

Here is a synopsis: This scenario suggests a paired-sample t-test, but losing accuracy is not well-defined, so a two-sided, paired-sample test is best. Homogeneity of variances is assumed here, and the differences of scores are skewed to the left. Importantly, the sample variances differ by a factor of 100, and are greater in the afternoon. Although the paired-sample t-test shows no significant differences at the 95% confidence level (p = .43), the great afternoon variance may explain accuracy loss in the heat of the afternoon.
Also, the tolerance level of the lid diameters was NOT stated, but the afternoon average lid diameter is almost .2 units larger than that in the morning. All of this suggests (WITHOUT SIGNIFICANCE) that there might be some effect that cannot be measured in diameter (only).

[Figure: chart of Differences]
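The hand calculations in Part II and the chi-square statistic in Scenario 1 can be double-checked with a few lines of Python. This is a minimal sketch using only the standard library, not the method the exam asks for. The variable names are illustrative, the critical value 1.833 (one-sided 95%, 9 degrees of freedom) is taken from a t-table, and the `erfc` shortcut for the chi-square p-value is an assumption that holds only for 1 degree of freedom.

```python
import math
import statistics

# Part II, questions 1-4: the seven-score data set.
data = [10, 15, 25, 23, 26, 36, 50]
mean = statistics.mean(data)
median = statistics.median(data)
s = statistics.stdev(data)          # divides by n - 1
ci_68 = (mean - s, mean + s)        # mean plus or minus one sample SD

# Part II, question 5: binomial probability of 10 to 13 brown out of 20.
p_brown = sum(
    math.comb(20, k) * 0.13**k * 0.87 ** (20 - k) for k in range(10, 14)
)

# Part II, question 6: z-score of the sample mean.
z = (45 - 55) / (3 / math.sqrt(30))

# Part II, question 7: upper end of the one-sided 95% interval.
t_crit = 1.833                      # table value for 9 df (assumed, not computed)
upper = t_crit * 5.06 / math.sqrt(10)

# Scenario 1: chi-square statistic for the 2 x 2 observed matrix.
observed = [[16, 1], [0, 6]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # upper tail, valid for 1 df only

print(f"mean = {mean:.4f}, median = {median}, s = {s:.5f}")
print(f"68% interval = ({ci_68[0]:.2f}, {ci_68[1]:.2f})")
print(f"P(10 to 13 brown) = {p_brown:.7f}")
print(f"z = {z:.2f}, upper bound = {upper:.2f}")
print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.7f}")
```

Scenarios 2 and 3 are left out because their data sets are not reproduced in this key.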