Chapter 5: Probability Distributions

1. a. A random variable is a variable whose values occur at random, following a probability distribution.
   b. An observation is the actual realization of a random variable.
   c. x̄ is the average of the sample observations, while μ is the theoretical mean of a probability distribution.

2. The 50 woman-owned businesses do not constitute a random sample, since they are not randomly selected from the population of woman-owned businesses. They are the top 50 businesses owned by women (in terms of annual sales). Thus, we cannot make accurate inferences about woman-owned businesses based on this sample.

3. a. A Poisson distribution.
   b. Use the Poisson distribution with y = 0 and λ = 5. The probability of no births will be:
      \[ p(y) = \frac{\lambda^{y} e^{-\lambda}}{y!}, \qquad p(0) = \frac{5^{0} e^{-5}}{0!} = e^{-5} \approx 0.0067 \]
   c. A Normal distribution.

4. a. mean = 0.5, standard deviation = 0.5
   b. mean = 1/3, standard deviation = \(\sqrt{2}/3\) = 0.4714
   c. mean = 1/6, standard deviation = \(\sqrt{5}/6\) = 0.3727

5. a. mean = 12.5, standard deviation = 2.5
   b. mean = 25, standard deviation = 3.5355
   c. mean = 3.33, standard deviation = 1.49
   d. mean = 1.67, standard deviation = 1.1785

6. a. p(n = 10) = 0.1762, p(n ≤ 10) = 0.588
   b. p(n = 15) = 0.0916, p(n > 15) = 0.059
   c. p(n = 3) = 0.2601, p(n ≤ 3) = 0.5593
   d. p(n = 20) = 0.0679, p(n ≤ 20) = 0.8481
   e. p(n = 4, 5, 6) = 0.6563

7. a. λ = 10, standard deviation = \(\sqrt{10}\) ≈ 3.162
   b. average = 10, standard error = 0.6325

8. a. p(x = 2) = 0.2707, p(x ≤ 2) = 0.6767
   b. p(x = 3) = 0.1804, p(x ≤ 3) = 0.8571
   c. p(x = 4) = 0.0902, p(x ≤ 4) = 0.9473
   d. p(x = 5) = 0.0361, p(x ≤ 5) = 0.9834

9. a. 0.6915
   b. 0.8413
   c. 0.9505
   d. 0.9750

10. a. -1.6449
    b. -1.2816
    c. 0
    d. 1.2816
    e. 1.6449
    f. 1.96
    g. 2.3263

11. a. 0.975
    b. 0.99996
    c. 1.0000
    d. 0.025
    e. 0.5

12. a. 2.4369
    b. 3.3168
    c. 5.0000
    d. 7.5631
    e. 8.2897
    f. 9.6527

13. a. p = 0.0568
    b. 14 players batted at 0.300 or better. This is 5.32% of the sample, which is close to the prediction based on the Normal distribution.
    c.
The histogram of the salary data is:
       [Figure: "Baseball Salaries" histogram; x-axis: Salaries, y-axis: Counts]
    d. The Normal P-plot is:
       [Figure: "Salary P-Plot", Normal probability plot of the salary data]
       Based on the histogram and the P-plot, there is reason to doubt that the salary data follows a Normal distribution.
    e. The average salary is 541.48. The standard deviation of the salary data is 450.16. The largest Normal score = 2.824. Therefore, the predicted maximum salary = (2.824)(450.16) + 541.48, or 1812.73. The observed largest salary is 2460, so the Normal scores underestimate this value.
    f. The salary data is positively skewed.

14. a. The histogram of the price data appears as follows:
       [Figure: "Histogram of Prices"; x-axis: Prices, y-axis: Counts]
       The Normal P-plot is:
       [Figure: "Price P-Plot", Normal probability plot of the price data]
       The data does not appear to follow a Normal distribution.
    b. The average house price = $106,273.50. The standard deviation of the price data = $38,043.70 and the standard error = $3,517.14. Based on the Central Limit Theorem, we should be 95% confident that the true mean value falls in the range ($99,239.22, $113,307.79).
    c. The first few values of the log(Price) variable are:
       Price      LogPrice
       87,400     4.942
       110,900    5.045
       95,000     4.978
       87,000     4.940
       73,900     4.869
       77,000     4.886
       133,000    5.124
       116,000    5.064
       102,000    5.009
       94,000     4.973

       The histogram of the log(Price) variable is:
       [Figure: "Log(Price) Histogram"; x-axis: Log(Prices), y-axis: Counts]
       The Normal P-plot is:
       [Figure: "Log(Price) P-Plot", Normal probability plot of the log(Price) data]
       The transformed data follows the Normal distribution more closely than the raw price values.

15. a. Because the shots are selected from a population with a standard deviation of 0.2 (a good marksman), there is a 68% probability that a shot will fall within one standard deviation of the mean, i.e. within 0.2 units of 0. Similarly, 95% of the values should be within two standard deviations of the mean, or within 0.4 units of 0.
    b. To be 95% sure that her sample mean is within ±0.2, she would need a sample size of 100. The standard deviation of the sample mean would then be 0.1, so it is 95% likely that the sample mean is within 2 standard deviations (0.2) of the population mean, 0.
    c. A marksman with the highest accuracy could take 1 shot and have 95% confidence that the sample mean, the vertical displacement of the shot, was within 0.2 of the bull's eye, since 0.1 is the standard deviation of the sample mean.

16. a. Enter the formula =AVERAGE(A2:I2) into the first cell and then fill it down the rest of the column.
    b.
The two histograms will appear as follows (answers will vary):
       [Figure: "Column 1 Histogram"]
       [Figure: "Sample Average Histogram"]
       The sample average histogram looks more like a Normal distribution (aside from being discrete) than the values from the first column.
    c. The descriptive statistics are:

                        Count   Average   Standard Deviation
       Column 1         100     0.1600    0.4197
       Sample Average   100     0.2344    0.1464

       The sampling distribution of the sample average for the Poisson, where λ = 0.25 and n = 9, should follow a Normal distribution with mean = 0.25 and standard deviation = 0.167. This is reasonably close to the observed values for the sample average.

17. a. Enter the formula =AVERAGE(A2:I2) into the first cell and then fill it down the rest of the column.
    b. The two histograms will appear as follows (answers will vary):
       [Figure: "Column 1 Histogram"]
       [Figure: "Sample Average Histogram"]
       Both distributions follow a bell-shaped curve, though this is more pronounced with the sample average.
    c. The descriptive statistics are:

                        Count   Average   Standard Deviation
       Column 1         100     4.320     1.858
       Sample Average   100     4.020     0.599

       The Binomial distribution for n = 16 and p = 0.25 has a mean value of 4 and a standard deviation of 1.732. The mean of the sampling distribution is 4 and the standard error is 0.577. These values are closely matched in the random data.

18. a. Enter the formula =AVERAGE(A2:I2) into the first cell and then fill it down the rest of the column.
    b.
The two histograms will appear as follows (answers will vary):
       [Figure: "Column 1 Histogram"]
       [Figure: "Sample Average Histogram"]
       Column 1 contains dichotomous output consisting of zeroes and ones. The sample average, while still discrete, is starting to show some of the features of the bell-shaped curve.
    c. The descriptive statistics for this set of random data are:

                        Count   Average   Standard Deviation
       Column 1         100     0.170     0.378
       Sample Average   100     0.238     0.152

       For the Bernoulli distribution with p = 0.25, the mean is 0.25 and the standard deviation is \(\sqrt{0.25 \times 0.75}\) ≈ 0.433. For a sample with 9 observations, the mean of the sample average is 0.25 and the standard error is 0.433/3 ≈ 0.144.

19. a. Enter the formula =AVERAGE(A2:I2) into the first cell and then fill it down the rest of the column.
    b. The two histograms will appear as follows (answers will vary):
       [Figure: "Column 1 Histogram"]
       [Figure: "Sample Average Histogram"]
       Column 1 does not follow the Normal curve well, particularly in the tails. The sample average follows the bell curve reasonably well.
    c. The descriptive statistics for this set of random values are:

                        Count   Average   Standard Deviation
       Column 1         100     49.163    29.056
       Sample Average   100     47.263    9.401

       For the Uniform distribution over the range [0, 100], the mean is 50 and the standard deviation is 28.868. For a sample size of 9, the mean of the sample average is 50 and the standard error is 9.623.

20. False. The Central Limit Theorem states that the distribution of the sample means, not of the sample values, will approach the Normal distribution as the sample size increases.
The observations themselves will be distributed following the underlying probability distribution from which they were selected.

21. a. To have 95% confidence that the mean of a sample is within 2 units of the population mean, the standard error must be 1. Thus 100 observations are required.
    b. The standard deviation of the sample mean should be 1, so 100 observations are needed.
    c. 0.159.

22. a. The histogram of the reaction times is:
       [Figure: "Reaction Times Histogram"; x-axis: Reaction Times, y-axis: Counts]
       The value 0.1 does not appear on the histogram. The histogram would have to extend to the left in order to include the value 0.1.
    b. Mean reaction time = 0.1723. Standard deviation = 0.0206. The probability that a reaction time of 0.1 or less could be observed is 0.022%.
    c. The Normal P-plot shown below does not give a compelling reason to discount an assumption of Normality in the data.
       [Figure: "Reaction P-Plot", Normal probability plot of the reaction times]
    d. Based on these findings, it would appear unlikely for a sprinter to achieve a reaction time of 0.1 or less. The extremely low probability (0.022%) would place the odds against such an occurrence at about 5000 to 1. This does not conclusively prove that Christie must have anticipated the starter's gun. Christie, being the reigning world champion, may have an extremely quick reaction time, and it may be wrong to classify him along with other, lesser sprinters. Moreover, since these reaction times are taken from the first heat, in which the competition is less keen, it could be that the reaction times are not as fast. The greater sprinters might conserve their energy for the finals, where their reaction times might be quicker.
To further investigate this issue, it would be better to analyze the reaction times of 100-meter champions under controlled conditions, to see whether their reaction times are quicker, in which case a reaction time of 0.1 would not be as unlikely.
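
The 0.022% figure used in problem 22 can be checked with a short calculation. The sketch below is only an illustration, not the method used in the text: it assumes the reaction times are Normal with the stated mean (0.1723) and standard deviation (0.0206), and the function name normal_cdf is our own. It evaluates the lower-tail probability at 0.1 using the error function from Python's standard library.

```python
import math


def normal_cdf(x, mean, sd):
    """Return P(X <= x) for X ~ Normal(mean, sd), using the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values quoted in problem 22b (assumed to describe a Normal population).
mean_rt = 0.1723    # mean reaction time
sd_rt = 0.0206      # standard deviation of reaction times
threshold = 0.1     # reaction time in question

z_score = (threshold - mean_rt) / sd_rt
p_fast = normal_cdf(threshold, mean_rt, sd_rt)

print(f"z-score: {z_score:.3f}")
print(f"P(reaction time <= {threshold}): {p_fast:.6f} ({100 * p_fast:.3f}%)")
```

The z-score comes out near -3.5, which is why the tail probability is so small. The result is sensitive to the Normality assumption in the far tail, which is the same caveat raised in part d.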