For a particular sample of 50 scores on a psychology exam, the following results were obtained. First quartile = 42 Third quartile = 74 Standard deviation = 8 Range = 56 Mean = 58 Median = 57 Mode = 68 Midrange = 61 Answer each of the following; show all work. What score was earned by more students than any other score? Why? – The score 68 was earned by more students than any other score. This is because the mode = 68. The mode is defined as the most frequently occurring item. What was the highest score earned on the exam? - What was the lowest score earned on the exam? – Range = 56 Midrange = 61 Highest score = Midrange + Lowest score = Midrange - 𝑟𝑎𝑛𝑔𝑒 2 𝑟𝑎𝑛𝑔𝑒 2 =61 + 56/2 = 61 + 28 = 89 =61 - 56/2 = 61 – 28 = 33 According to Chebyshev's Theorem, how many students scored between 42 and 74? – By chebyshev’s theorem 𝑃(|𝑋 − 𝜇| ≤ 𝑘) ≥ 1 − 𝜎 2 /𝑘 2 𝜇 = 𝑚𝑒𝑎𝑛 = 58, 𝜎 = 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 8 𝑃(42 ≤ 𝑋 ≤ 74) = 𝑃(42 − 58 ≤ 𝑋 − 58 ≤ 74 − 58) = 𝑃(−16 ≤ 𝑋 − 58 ≤ 16) 82 1 3 = 𝑃(|𝑋 − 𝜇| ≤ 16) ≥ 1 − 2 = 1 − = 16 4 4 So the minimum number of students scored between 42 and 74 is 50 × 3/4 = 37.5 ≅ 38 Assume that the distribution is normal. Based on the Empirical Rule, how many students scored between 42 and 74? 42 = 58-16 = 58-2× 8 = mean – 2 standard deviation 74 = 58+16 =58+2× 8 = mean + 2 standard deviation According to the rule of thumb, 95% of the observations will lie between mean ± 2 standard deviations. So the number is 50× 0.95 = 47.5≅ 48 2. Answer the following: - If the correlation coefficient is 0.91, what is the sign of the slope of the regression line? – The correlation coefficient and the regression coefficients have the same sign. So the regression coefficient is also positive. As the correlation coefficient decreases from -0.21 to -0.27, do the points of the scatter plot move toward the regression line, or away from it? As correlation coefficient decreases from -0.21 to -0.27, the magnitude of the correlation coefficient increases from 0.21 to 0.27. So the scatter plot move toward the regression line. 3. A math test was given; a sample of the scores yielded the following results: 87, 55, 55, 3, 79, 5, 77, 89, 93, 27, 99, 97, 77, 62, 55, 94 Find the range, standard deviation, and variance for the sample scores. Lowest observation =3 Highest observation = 99 Range = 99-3 =96 Sum of the observations = ∑ 𝑋 = 87 + 55 + 55 + 3 + 79 + 5 + 77 + 89 + 97 + 77 + 62 + 55 + 94 =1054 Sum of the squares of the observations ∑ 𝑋 2 = 7569 + 3025 + 3025 + 9 + 6241 + 25 + 5929 + 7921 + 8649 + 729 + 9801 + 9409 + 5929 + 3844 + 3025 + 8836 = 83966 Variance = ∑ 𝑋2− (∑ 𝑋)2 𝑛 𝑛−1 = 83966− 15 10542 16 = 83966−69432.25 15 = 14533.75 15 = 968.9167 Standard deviation= √𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = √968.9167 = 31.1274 4. In terms of the mean and standard deviation: - What does it mean to say that a particular value of x has a standard score of +4.4? – The standard score is calculated as Standard score = 𝑠𝑐𝑜𝑟𝑒 – 𝑚𝑒𝑎𝑛 𝑠𝑐𝑜𝑟𝑒 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑐𝑜𝑟𝑒𝑠 Standard score = 4.4 means that Score = mean score + 4.4 × 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑐𝑜𝑟𝑒𝑠. The score is 4.4 standard deviations more than the mean score. What does it mean to say that a particular value of x has a z-score of -1.6? 𝑍= 𝑋−𝜇 = −1.6 𝜎 𝑋 = 𝜇 − 1.6𝜎 The score is 1.6 standard deviations less than the mean score. 5. A student scored 10 percent on a test, and was in the 5th percentile. Explain these two numbers. The particular student got 10% of the total marks of the test. 5% of all the students got less than 10% marks. 95% of all the students got more than 10% marks. 6. An animal trainer obtained the following sample data (Table A) in a study of reaction times of dogs (in seconds) to a specific stimulus. He then selected another group of dogs that were much older than the first group and measure their reaction times to the same stimulus. The sample data is shown in Table B. Table A Table B Classes Frequency class frequency 2.3-2.9 3 2.3-2.9 4 3.0-3.6 28 3.0-3.6 11 3.7-4.3 2 3.7-4.3 7 4.4-5.0 21 4.4-5.0 11 5.1-5.7 0 5.1-5.7 27 5.8-6.4 8 5.8-6.4 1 Find the variance and standard deviation for the two distributions above. Compare the variation of the data sets. Decide if one data set is more variable than the other. midx 2.6 3.3 4 4.7 5.4 6.1 TABLE A f xf x^2f 3 7.8 20.28 28 92.4 304.92 2 8 32 21 98.7 463.89 0 0 0 8 48.8 297.68 62 255.7 1118.77 midx 2.6 3.3 4 4.7 5.4 6.1 Table A 𝑁 = ∑ 𝑓 = 62 Variance = ∑ 𝑥 2 𝑓− (∑ 𝑥𝑓)2 𝑁 𝑁−1 = 1118.77− 255.72 62 62−1 = 1.0527 Standard deviation = √1.0527 = 1.0260 Table A 𝑁 = ∑ 𝑓 = 61 Variance = ∑ 𝑥 2 𝑓− (∑ 𝑥𝑓)2 𝑁 𝑁−1 = 11326.77− 278.32 62 61−1 = 0.9444 Standard deviation = √0.9444 = 0.9718 TABLE B f xf x^2f 4 10.4 27.04 11 36.3 119.79 7 28 112 11 51.7 242.99 27 145.8 787.32 1 6.1 37.21 61 278.3 1326.35 The variance and standard deviation is more for the data in table A. Therefore, the variation in the data in table A is more than that of table B 7.You are given the following data. Number of Absences Final Grade 0 93 1 90 2 79 2 77 3 77 3 66 4 60 5 56 6 47 A. Make a scatter plot for the data; B. Find the correlation coefficient for the data. C. Find the equation for the regression line for the data, and predict the final grade of a student who misses 3.5 days. x y 0 1 2 2 3 3 4 5 6 93 90 79 77 77 66 60 56 47 y 100 90 80 70 60 50 40 30 20 10 0 y y = -7.9731x + 94.7 R² = 0.9545 0 2 4 Linear (y) 6 8 Correlation coefficient x y x^2 0 1 2 2 3 3 4 5 6 26 93 90 79 77 77 66 60 56 47 645 0 1 4 4 9 9 16 25 36 104 y^2 xy 8649 0 8100 90 6241 158 5929 154 5929 231 4356 198 3600 240 3136 280 2209 282 48149 1633 (∑ 𝑥)(∑ 𝑦) (26)(645) 1633 − 𝑛 9 𝑟= = = −0.9770 2 2 2 (∑ 𝑦) (26) (645)2 (∑ 𝑥) 2 2 √(104 − √(∑ 𝑥 − 9 ) (48149 − 9 ) 𝑛 ) (∑ 𝑦 − 𝑛 ) ∑ 𝑥𝑦 − Regression line 𝑦 = 𝑏0 + 𝑏1 𝑥 (∑ 𝑥)(∑ 𝑦) (26)(645) 1633 − 𝑛 9 𝑏1 = = = −7.9731 2 2 (∑ (26) 𝑥) (∑ 𝑥 2 − 𝑛 ) (104 − 9 ) ∑ 𝑥𝑦 − 𝑏0 = ∑𝑦 ∑ 𝑥 645 26 − 𝑏1 = − (−7.9731) = 94.7 𝑛 𝑛 9 9 The line is 𝑦 = 94.7 − 7.9731 𝑥 When 𝑥 = 3.5 𝑦 = 94.7 − 7.9731(3.5) = 66.7942 ≅ 67