Solutions to Exercises for Chapter 2

2.1.

a) Range = 54
   CI = 2 – leads to 27 intervals – bad choice
   CI = 3 – leads to 18 intervals – a possibility
   CI = 5 – leads to 10–11 intervals – also a possibility
   Choice: With only 57 scores, we would choose CI width = 5, giving 10–11 intervals. Lowest interval would be 20–24.

b) Range = 154
   CI = 5 – leads to 30–31 intervals – bad choice
   CI = 10 – leads to 15–16 intervals – OK
   CI = 15 – leads to 10–11 intervals – OK
   Choice: With only 32 scores, I would use CI = 15, yielding 10–11 intervals. Lowest interval would be 120–134.

c) Range = 29
   CI = 2 – leads to 14–15 intervals – OK
   CI = 3 – leads to 9–10 intervals – OK?
   Choice: With the relatively large number of scores (112), I would opt for CI = 2. Lowest interval would be 8–9.

d) Range = 120
   CI = 5 – leads to 24 intervals – not too good
   CI = 10 – leads to 12 intervals – OK
   CI = 15 – leads to 8 intervals – not too good
   Choice: CI = 10, giving about 12 intervals. Lowest interval would be 390–399.

In actual practice, if there are two possible solutions, you can construct both grouped distributions and then decide which solution best depicts the data, in your opinion.

2.2. The first step in constructing a frequency distribution is to identify the largest and smallest values in the data set. For these data, the largest score is 29 and the smallest score is 4. To construct the ungrouped distribution, you list all possible values between the largest and smallest scores and tally the scores, as is done below in Part A. The final step in constructing an ungrouped frequency distribution is to convert the “tallies” to numbers, as in Part B.

    Part A             Part B
    Y    Tally         Y    f(Y)
    29   |             29   1
    28                 28   0
    27                 27   0
    26                 26   0
    25   |             25   1
    24                 24   0
    23   |             23   1
    22   |||           22   3
    21   ||            21   2
    20                 20   0
    19   |             19   1
    18   ||            18   2
    17   |||           17   3
    16   ||            16   2
    15   ||            15   2
    14   |             14   1
    13   |||           13   3
    12   ||||          12   4
    11   ||||          11   4
    10   |||           10   3
     9   ||||| |        9   6
     8   |||            8   3
     7   |||||          7   5
     6   ||||| |        6   6
     5   |||||          5   5
     4   ||             4   2

    Part C
    Class Interval   Freq.
    28-29             1
    26-27             0
    24-25             1
    22-23             4
    20-21             2
    18-19             3
    16-17             5
    14-15             3
    12-13             7
    10-11             7
     8-9              9
     6-7             11
     4-5              7

The ungrouped distribution indicates that there are more scores in the low end than in the high end. Given that there are 26 values in the ungrouped distribution, one might wish to construct a grouped distribution. Applying the guidelines presented earlier in this chapter, a class interval width of two would yield about 13 intervals. An interval width of three would result in about eight or nine intervals. Therefore, we would opt for the solution using a width of two. However, one could argue that the class interval width should be three, given that √60 is between 7 and 8. The grouped frequency distribution is presented in Part C above. Note that the nominal lower limit of each interval is a multiple of the interval width.

In addition, we entered the data into an Excel spreadsheet. In the first cell (A1), we entered the name, “jobsat”, without the quotation marks. Immediately below, we entered the 60 values, all in column 1. Then, we saved the Excel file as a text file named chap2.ex2. Excel automatically appended the extension, .txt. We then started R and executed the following commands:

    # Read the text file; the first row holds the variable name
    chap2.ex2 <- read.table("c:/bookdatar/chap2.ex2.txt", header = TRUE)
    attach(chap2.ex2)   # make jobsat available by name
    names(chap2.ex2)    # list the variable names
    length(jobsat)      # number of scores
    table(jobsat)       # ungrouped frequency distribution

The output from R:

    > names(chap2.ex2)
    [1] "jobsat"
    > length(jobsat)
    [1] 60
    > table(jobsat)
    jobsat
     4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 21 22 23 25 29
     2  5  6  5  3  6  3  4  4  3  1  2  2  3  2  1  2  3  1  1  1

As you can see, R provides us with the ungrouped frequency distribution, although table() omits the values that never occur (20, 24, 26, 27, and 28). We can go from here to construct a grouped frequency distribution as before.

2.3. First, let’s use R to create a table of the distribution of these computer aptitude (comp.apt) scores.
We created an Excel file as we did in the previous exercise. The results of the table command:

    > table(comp.apt)
    comp.apt
    17 26 35 42 45 48 49 51 53 54 55 57 58 59 60 61 62 63 64 66 67 68 69 70 71 72
     1  1  1  1  2  1  2  3  1  2  1  2  2  2  1  1  1  1  2  3  1  1  3  2  2  3
    75 76 77 79 81 83 93 97
     1  1  1  1  1  2  1  1

In order to construct a grouped frequency distribution for these data, we identify the extreme values: the highest is 97 and the lowest is 17. Thus, the cases span about 80 values. If we were to use an interval width of 2, we would have about 40 intervals, not a satisfactory solution. An interval width of 3 would result in about 27 intervals, still unsatisfactory. Using an interval width of 5 would yield about 16 intervals, thus meeting our criterion of between 10 and 20 intervals. If we were to employ an interval width of 10, we would have about 8 intervals, too few to meet the criterion. However, one (again) could argue for a width of 10, given that √52 is approximately 7. For now, we select five as our interval width.

To construct the class intervals, we need to use multiples of 5 as the nominal lower limits, starting with the interval that will contain the smallest value, and proceeding up. Given that the smallest value is 17, we would start with the interval 15–19. The entire solution is shown below. The results of the “tallying” are shown in Part A. Converting the “tallies” to numbers results in Part B, the final solution.

    Part A                          Part B
    Class Interval   Tally          Class Interval   f(Y)
    95–99            |              95–99            1
    90–94            |              90–94            1
    85–89                           85–89            0
    80–84            |||            80–84            3
    75–79            ||||           75–79            4
    70–74            ||||| ||       70–74            7
    65–69            ||||| |||      65–69            8
    60–64            ||||| |        60–64            6
    55–59            ||||| ||       55–59            7
    50–54            ||||| |        50–54            6
    45–49            |||||          45–49            5
    40–44            |              40–44            1
    35–39            |              35–39            1
    30–34                           30–34            0
    25–29            |              25–29            1
    20–24                           20–24            0
    15–19            |              15–19            1

Nearly all the values fall between 45 and 84. There are two scores that are extremely high and a tail of extremely low scores.

2.4.
After rolling the die 60 times, we observed the number of times each side of the die appeared, as shown below. Your solutions will be different from ours.

    Y    f(Y)
    6    11
    5     9
    4    11
    3    13
    2     7
    1     9

2.5. In our solution to Exercise 2.2 above, we noted that we might have used a class interval width of either 2 or 3, although we used 2 in the solution. In this exercise, we are asked to draw a histogram of our grouped solution. We used R to examine solutions for both CI widths, first using a CI of 2, and then 3:

    detach(chap2.ex3)
    attach(chap2.ex2)
    # CI width = 2: breaks at 3.5, 5.5, ..., 29.5
    hist(jobsat, breaks = seq(3.5, 29.5, 2),
         xlab = "Job Satisfaction Scores for 60 Public School Teachers")
    rug(jitter(jobsat))   # mark the individual scores along the axis
    # CI width = 3: breaks at 2.5, 5.5, ..., 29.5
    hist(jobsat, breaks = seq(2.5, 29.5, 3),
         xlab = "Job Satisfaction Scores for 60 Public School Teachers")
    rug(jitter(jobsat))

[Figure: two histograms titled "Histogram of jobsat", one for each class interval width. Horizontal axis: Job Satisfaction Scores for 60 Public School Teachers. Vertical axis: Frequency.]

Take your pick!

2.6.

    detach(chap2.ex2)
    attach(chap2.ex3)
    # Histogram on the density scale, CI width = 5
    hist(comp.apt, prob = TRUE, breaks = seq(14.5, 99.5, 5),
         xlab = "Computer Aptitude Scores for 52 College Professors")
    lines(density(comp.apt))   # overlay a kernel density estimate
    rug(jitter(comp.apt))

[Figure: "Histogram of comp.apt" with a density curve overlaid. Horizontal axis: Computer Aptitude Scores for 52 College Professors. Vertical axis: Density.]

2.7. Using R with the data from Exercise 2.3, we have constructed two stem-and-leaf displays. The first, from the default call, has an interval width of 10. The second, requested with scale=2, has an interval width of 5 (corresponding to the histogram and frequency polygon).

    > stem(comp.apt)

      The decimal point is 1 digit(s) to the right of the |

      1 | 7
      2 | 6
      3 | 5
      4 | 255899
      5 | 1113445778899
      6 | 01234466678999
      7 | 00112225679
      8 | 133
      9 | 37

    > stem(comp.apt,scale=2)

      The decimal point is 1 digit(s) to the right of the |

      1 | 7
      2 |
      2 | 6
      3 |
      3 | 5
      4 | 2
      4 | 55899
      5 | 111344
      5 | 5778899
      6 | 012344
      6 | 66678999
      7 | 0011222
      7 | 5679
      8 | 133
      8 |
      9 | 3
      9 | 7
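
If you do not have the original data file, the stem-and-leaf displays can still be reproduced, because the table(comp.apt) output in Exercise 2.3 lists every score and its frequency. The following is a minimal sketch under that assumption. The object names values, counts, and grouped are ours and do not appear in the exercises. The last step is only a cross-check against the grouped distribution in Part B of Exercise 2.3.

```r
# Scores and frequencies copied from the table(comp.apt) output in Exercise 2.3
values <- c(17, 26, 35, 42, 45, 48, 49, 51, 53, 54, 55, 57, 58, 59, 60, 61,
            62, 63, 64, 66, 67, 68, 69, 70, 71, 72,
            75, 76, 77, 79, 81, 83, 93, 97)
counts <- c(1, 1, 1, 1, 2, 1, 2, 3, 1, 2, 1, 2, 2, 2, 1, 1,
            1, 1, 2, 3, 1, 1, 3, 2, 2, 3,
            1, 1, 1, 1, 1, 2, 1, 1)

# Repeat each score as many times as it occurs to rebuild the raw data
comp.apt <- rep(values, times = counts)
length(comp.apt)            # should be 52

# The two stem-and-leaf displays from Exercise 2.7
stem(comp.apt)              # default display
stem(comp.apt, scale = 2)   # each stem split in two

# Cross-check: group the scores with the breaks used for the histogram
grouped <- table(cut(comp.apt, breaks = seq(14.5, 99.5, 5)))
grouped
```

The counts in grouped run from the lowest interval (15–19) to the highest (95–99), so they appear in the reverse order of the Part B table.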