Stat1600 Midterm #1 Solution, Form A 1. The high temperature on Jan 30th (in ◦ F.) for the last 5 years in a northern U.S. city is as follows: 23,30,25,24,28 (a) (10 points) Calculate the mean, X of this data. ANSWER: 26 CALCULATION/REASON: (show your work) X= 130 23 + 30 + 25 + 24 + 28 = = 26. 5 5 (b) (10 points) Calculate the median of this data. ANSWER: 25 CALCULATION/REASON: (show your work) sorted list: 23, 24, 25, 28, 30 5+1 n+1 = = 3 =⇒ M ED = |{z} 25 = 3rd ordered value. 2 2 (c) (15 points) Let’s compare the summary statistics of temperature for two cities during the past year. Statistics City A City B Mean 62◦ F 60◦ F Standard Deviation 10◦ F 14◦ F Based on the information provided in the table, would you expect one of these cities to have more extremes in their temperatures than the other? ANSWER: Yes REASON: City B has more extremes due to larger SD. 1 2. Suppose the height of males in the United States follows a Normal Distribution with a mean of 69 inches and a standard deviation of 4 inches. Given this information, use the Z-Table provided to answer the following questions: (a) (10 points) What percentage of men are under 65 inches tall? The z-score of x = 65: z= x − Mean 65 − 69 = = −1.00. SD 4 Hence, the area to the left of 65 under N (M ean = 69, SD = 4) equals the area to the left of −1.00 under Z curve: = =1− −1.00 = 1 − .8413 = .1587 1.00 1.00 That is, 15.87%. On the other hand, you can apply the empirical rule: 65 inches is 1 SD below the mean (= 69 − 4) and hence the percentage is 32% 100% − 68% = = 16% 2 2 which is fairly close to 15.87%. (b) (15 points) What percentage of men are between 61 and 77 inches? The z-scores of 61 and 77 are, respectively, for 61 : 61 − 69 77 − 69 = −2.00; for 77 : = 2.00 4 4 and hence the percentage: =1− −2.00 2.00 z =2x −2.00 −1 = 2 x .9772 − 1 = .9544 2.00 z 2.00 z That is, 95.44% On the other hand, you can apply the empirical rule: 61 and 77, respectively, are M ean − 2SD and M ean + 2SD and hence the percentage is 95% which is fairly close to 95.44%. 2 (c) (10 points) A male basketball player is 77 inches tall. What percentile is he in? (In other words, what percentage of males are shorter than he is?) The z-score for 77 is 2.00 (from part (b)) and hence the percentage is 97.72% (area is .9772 from Z Table). You can also apply the empirical rule: 95% |{z} M ean±2SD + 100% − 95% = 97.5% 2 | {z } below Mean−2SD which is fairly close to 97.72%. 3. Fred is in a weekly golf league. His scores from the last n = 19 weeks are shown in the stem and leaf plot below: ------------Leaf Unit = 1 7 579 8 0123489 9 01255 10 012 11 1 ------------(So, for example, the first observation under stem 11 is read as 111 and the first observation under stem 7 is read as 75) (a) (7 points) In addition to the Stem-and-Leaf Plot, what other type(s) of data presentation would be most appropriate for this data? Explain your answer. Your choices are: (Bar Chart, Pie chart, Histogram, Dotplot, Box-and-Whisker Plot) Answer: Since this is a numerical data set, histogram, dotplot, and box-and-whisker plot are also appropriate the present the data. (b) (8 points) After examining the Stem-and-Leaf Plot, what seems to be the shape of this data? Answer: right skewed Reason: 3 Turn the stem-and-leaf plot 90◦ counter-clockwise and inspect the shape: long right tail. Hence, it’s right skewed. (c) (15 points) Give the five-number summary then construct a boxplot (with whiskers extending to the two extremes, MIN and MAX). (Hint: How do you construct a sorted list of the data from the stem-and-leaf plot above?) MIN and MAX: MIN=75, MAX=111 Median: (show your work) n+1 19 + 1 = = 10 =⇒ M ED = 2 2 89 |{z} 10th ordered value Quartiles, Q1 and Q3 : (show your work) 19 + 1 n+1 = =5 4 4 so, Q1 = 5th ordered value = 81 Q3 = 5th largest value = 95 boxplot: 75 80 85 90 95 100 105 110 4. (5 points) Extra Credit Problem Assume we have a data set that includes the Area Codes of our customers’ phone numbers. Some examples in the data set include 269, 626, 610, etc. Clearly these are numbers, but are considered nominal (categorical) data. Discuss or prove why these values are actually nominal (categorical) data and should not be analyzed as numerical data. Area codes are used as labels to determine from which part of the country people are calling. Operations cannot be performed on area codes—it would make no sense to add and substract, multiply and divide them—making them not numerical data. 4