2.14 Exercises – Solutions

1. Contrast, by real examples, random and nonrandom samples.
   A random sample of 50 trees in Oak Mountain State Park would be one taken such that every possible sample of 50 trees has the same chance of being selected. A nonrandom sample of 50 trees in Oak Mountain State Park would be one consisting of the first 50 trees identified after entering the park.

2. Describe a realistic situation in which a decision maker would prefer a stratified random sample to a simple random sample. Why?
   The population of undergraduate students at Samford University consists of 60% males and 40% females. A stratified random sample is preferable to a simple random sample because it guarantees that males and females are represented in the sample in the same proportions in which they occur in the population.

3. What kinds of situations would lend themselves to the use of systematic samples instead of simple random samples and why?
   A systematic sample would be indicated when a printed list of the population is available (as opposed to an electronic form of the list), because selecting every k-th entry from a printed list is far easier than numbering every entry and drawing random numbers.

4. Why should a decision maker be concerned with the size of the sample as well as its randomness?
   Size is related to the likelihood that a sample is representative; the larger the sample, the greater the chance that it reflects the make-up of the population.

5. Describe a realistic situation in which it would be advantageous to use a grouped data frequency distribution as opposed to a raw data frequency distribution.
   A decision maker has a sample of 1500 observations consisting of the salaries of mid-level managers in the IT field. With so many distinct salary values, grouping them into intervals gives a far more readable summary than listing every value separately.

6. What, if anything, does a histogram tell you that a frequency curve does not?
   A histogram may communicate the number of observations within specific intervals of the variable being displayed.

7.
Provide a realistic situation which would likely result in:
   a) a negatively skewed distribution: an easy finance exam
   b) a positively skewed distribution: an extremely difficult finance exam
   c) a unimodal symmetric distribution: a sample of the digits 0 through 9 taken from residential phone numbers
   d) a bimodal distribution: a capstone exam given to a random sample of freshman and senior business students

8. What, if anything, does a relative frequency curve tell the decision maker that a frequency curve does not?
   The percentage of sample observations that take a specific value.

9. Why does a cumulative relative frequency distribution go no higher than 1.0?
   Because the greatest cumulative proportion (percentage) of observations is 1.0 (100%).

10. What kind of distribution would produce a cumulative frequency curve which is a straight line?
   A uniform distribution.

11. In what kind of distributions would the mean, median, and mode be the same value?
   Unimodal symmetric distributions.

12. Give an example which shows the difference between:
   a) $\bar{X}$ and $\mu$: the mean age of a single classroom of Samford undergraduates versus the mean age of the population of all current undergraduates at Samford
   b) $S^2$ and $\sigma^2$: the variance of the ages of a single classroom of Samford undergraduates versus the variance of the ages in the population of all current undergraduates at Samford

13. Verify that $\sum (X - \bar{X}) = 0$.

    X     X - mean
    1     1 - 2 = -1
    2     2 - 2 = 0
    3     3 - 2 = 1

   $\bar{X} = 2$ and $\sum (X - \bar{X}) = (-1) + 0 + 1 = 0$, as required.

14. What situation(s) would guarantee that the $\bar{X}$ value calculated with raw data would be the same as $\bar{X}$ calculated with grouped data?
   If the raw data were the same as the midpoints of the corresponding groups.

15. When would it be desirable to utilize the median instead of the mean? Provide a real example.
   When a sample is skewed, such as a class of statistics students who generally do poorly on the first exam but where a few “curve busters” score well.

16. Why were absolute value signs used in the formula for mean deviation?
What would be the case if they were omitted?
   Absolute value signs were used in the formula for MD to eliminate negative deviation scores (i.e., negative values of $X - \bar{X}$). If the absolute value signs were omitted, the numerator would read $\sum (X - \bar{X})$, which equals zero for every data set (see exercise 13), so the mean deviation would always be zero.

17. What, if anything, does $S$ indicate that $S^2$ does not? Why use $S$ since $S^2$ must be calculated first?
   $S$ provides us with a measure of variability in the same units as the variable, whereas $S^2$ is expressed in squared units.

18. Why is $n-1$ used in equation (2.5) instead of $n$?
   To make $S^2$ an unbiased estimate of $\sigma^2$.

19. When would one of the calculating formulas for $S^2$ be more convenient than the definitional formula, (2.5)?
   When certain summary statistics are readily available.

20. What does the decision maker know for sure if (2.5), (2.7), and (2.8) produce different answers for the same data set?
   That a calculational error has been made.

21. In transforming scores why is it usually advisable for the decision maker to consider the standard deviation first when making the initial transformation?
   Because changing the standard deviation first will affect both the mean and the standard deviation of the sample; changing the mean second ensures that only the mean (and not the standard deviation) is changed.

22. Why can the transformation into standard unit scores disregard the advice in question 21 above?
   Because the target mean is zero: once the mean has been shifted to zero, rescaling the scores to change the standard deviation multiplies that zero by a constant and leaves the mean unchanged.

23. Why can skew be negative and positive whereas kurtosis can only be positive?
   It is a function of the mathematics. Skewness involves cubed deviations in the numerator (a positive deviation cubed remains positive and a negative deviation cubed remains negative), whereas kurtosis involves deviations raised to the fourth power (a positive deviation remains positive and a negative deviation becomes positive).

24.
Calculate $\bar{X}$, the median, mode, range, $S$ and $S^2$ using the following scores on an attitude questionnaire (calculate both manually and with Minitab).

    Attitude Score   Frequency
    0                3
    1                5
    2                2
    3                1
    4                0
    5                2

   $\bar{X} = \dfrac{0+0+0+1+1+1+1+1+2+2+3+5+5}{13} = \dfrac{22}{13} = 1.7$

   $S = \sqrt{\dfrac{(0-1.7)^2 + (0-1.7)^2 + (0-1.7)^2 + (1-1.7)^2 + (1-1.7)^2 + \cdots + (5-1.7)^2}{13-1}} = 1.7$

   $S^2 = (1.7)^2 = 2.9$

   median = 1 (the middle value of 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 5, 5)
   mode = 1
   range = 5 - 0 = 5

   Minitab: Descriptive Statistics: Attitude

    Variable   Total Count   Mean    StDev   Variance   Median   Range
    Attitude   13            1.692   1.702   2.897      1.000    5.000

   (Note: Minitab does not compute the mode.)

25. Calculate $\bar{X}$ using the following set of performance scores.

    Performance   Frequency   Midpoint
    21–25         1           23
    26–30         2           28
    31–35         3           33
    36–40         1           38

   $\bar{X} = \dfrac{1(23) + 2(28) + 3(33) + 1(38)}{7} = \dfrac{23 + 56 + 99 + 38}{7} = \dfrac{216}{7} = 30.9$

26. In what way does the size of $\bar{X}$ affect the size of $S$?
   Generally speaking, the sample mean has no effect on the size of the sample standard deviation.

27. Calculate, with Minitab, the mean, median, standard deviation and variance of the pretest scores for the Section II groups in Appendix D.

   Descriptive Statistics: Pretest

    Variable   Total Count   Mean     StDev   Variance   Median
    Pretest    20            14.050   3.103   9.629      14.000

28. Using the Lecture Methods data set and Minitab, create

   a) a simple bar chart for 'Grade.'
   [Figure: Minitab bar chart "Chart of Grade"; x-axis: Grade (A, B, C, D, F); y-axis: Count]

   b) a simple histogram for 'Final.' Manually draw a frequency polygon on the histogram generated in b).

   c) a cumulative distribution curve for 'Pretest.'
   [Figure: Minitab "Empirical CDF of Pretest" with a fitted Normal curve; x-axis: Pretest; y-axis: Percent; legend: Mean, StDev, N]

   d) a raw data frequency distribution table for 'Grade.'

   Tally for Discrete Variables: Grade

    Grade   Count   CumCnt   Percent   CumPct
    A       3       3        12.00     12.00
    B       7       10       28.00     40.00
    C       10      20       40.00     80.00
    D       3       23       12.00     92.00
    F       2       25       8.00      100.00
    N=      25

   e) a grouped data frequency distribution table for 'Final.'
   Tally for Discrete Variables: FinalGrp

    FinalGrp   Count   CumCnt   Percent   CumPct
    35-39      3       3        12.00     12.00
    40-44      5       8        20.00     32.00
    45-49      5       13       20.00     52.00
    50-54      4       17       16.00     68.00
    55-59      4       21       16.00     84.00
    60-64      2       23       8.00      92.00
    65-69      2       25       8.00      100.00
    N=         25

29. Transform the following sample of observations ($X$) to a new set of observations with a mean ($\bar{X}$) of 200 and a standard deviation ($s$) of 50.

   Step 1 multiplies each score by $50 / 2.3 \approx 21.7$ to set the standard deviation; step 2 adds $200 - 65.2 = 134.8$ to set the mean.

    X     x 21.7 ->   X'      + 134.8 ->   X"
    2                 43.5                 178.3
    1                 21.7                 156.5
    1                 21.7                 156.5
    6                 130.4                265.2
    5                 108.7                243.5

   $\bar{X} = 3$; $s = 2.3$
   $\bar{X}' = 65.2$; $s \cong 50$
   $\bar{X}'' \cong 200$; $s \cong 50$

30. Convert the sample of observations (in 29 above) into their corresponding z-scores.

    X     (X - mean) / s    z
    2     (2 - 3) / 2.3     -0.43
    1     (1 - 3) / 2.3     -0.87
    1     (1 - 3) / 2.3     -0.87
    6     (6 - 3) / 2.3     1.30
    5     (5 - 3) / 2.3     0.87

   $\bar{z} = 0.00$; $s \cong 1$
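
The hand calculations in exercises 24, 29 and 30 can be cross-checked in software. The solutions themselves use Minitab; the sketch below swaps in Python's standard `statistics` module instead, purely as an illustration. The helper names `rescale` and `z_scores` are hypothetical, chosen for this example, and are not part of the text or of any library.

```python
import statistics

# Exercise 24: expand the frequency table into the 13 raw attitude scores.
frequencies = {0: 3, 1: 5, 2: 2, 3: 1, 4: 0, 5: 2}
attitude = [score for score, count in frequencies.items() for _ in range(count)]

attitude_summary = {
    "mean": statistics.mean(attitude),
    "median": statistics.median(attitude),
    "mode": statistics.mode(attitude),
    "range": max(attitude) - min(attitude),
    "S": statistics.stdev(attitude),       # sample SD, n - 1 denominator
    "S2": statistics.variance(attitude),   # sample variance, n - 1 denominator
}


def rescale(xs, target_mean, target_sd):
    """Hypothetical helper: two-step linear transformation (exercise 29)."""
    # Step 1: multiply by target_sd / s to set the standard deviation.
    factor = target_sd / statistics.stdev(xs)
    scaled = [x * factor for x in xs]
    # Step 2: add a constant to set the mean; this leaves the SD untouched.
    shift = target_mean - statistics.mean(scaled)
    return [x + shift for x in scaled]


def z_scores(xs):
    """Hypothetical helper: standard unit scores (exercise 30)."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


scores = [2, 1, 1, 6, 5]
transformed = rescale(scores, target_mean=200, target_sd=50)
z = z_scores(scores)

for name, value in attitude_summary.items():
    print(f"{name}: {value:.3f}")
print("transformed:", [round(x, 1) for x in transformed])
print("z-scores:", [round(v, 2) for v in z])
```

Because the code carries the unrounded standard deviation (the square root of 5.5) through every step, its multiplier is about 21.3 rather than the 21.7 obtained from the rounded value s = 2.3, so the individual transformed scores and z-scores differ slightly from the hand-calculated tables. The resulting means and standard deviations still match the targets.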