Statistics, Take Home Test 1 Solution
Chapters 1, 2 & 3: Intro to Stats, Summarizing, Describing Data

1. True or False: The value of variance and standard deviation is never negative.
    True – both measure the variation of all values from the mean; they can be zero but never negative.
2. What kind of variable is “weights of bears”: quantitative or qualitative?
    Quantitative – “weights of bears” gives numbers that represent counts or measurements.
3. What kind of variable is “gender of bears”: quantitative or qualitative?
    Qualitative – “gender of bears” is distinguished by nonnumeric characteristics.
4. Define a population in statistics.
    A population is the complete collection of all elements (scores, people, measurements, etc.) to be studied.
5. The value of the middle term in a ranked data set is called the median.
6. Given any data, how do you find the mode?
    The mode is the value that appears with the greatest frequency among the data. A data set can have one mode, more than one mode, or no mode (when all values appear with equal frequency).
7. True or False: The “number of chairs” is considered to be a continuous variable.
    False – the number of chairs is discrete, not continuous. We cannot have ¼ of a chair.
8. On a Pareto chart, the frequencies should be represented on the vertical (or y) axis.

Given the frequency table, answer the following questions.

    Age group   Frequency
    11-20       5
    21-30       6
    31-40       9
    41-50       11
    51-60       4

9. The number of classes in the table is 5 (the number of age groups defined).
10. The class width is 10 (upper limit – lower limit + 1 unit, or the difference between two consecutive lower limits or upper limits, e.g. 21 – 11).
11. The midpoint of the 4th class is 45.5
    (41 + 50)/2 = 45.5
12. The lower boundary of the 5th class is 50.5
    (50 + 51)/2 = 50.5 (think of it as the midpoint between the upper limit of the 4th class and the lower limit of the 5th class)
13. The upper limit of the 1st class is 20
    The 1st class is 11-20, so its upper limit is 20.
14. The sample size is 35
    5 + 6 + 9 + 11 + 4 = 35
15.
The relative frequency of the 1st class is 1/7 ≈ 0.1429 (or 14.29%)
    relative frequency = f/n = 5/35 = 1/7 ≈ 0.1429

The following frequency table describes the speeds of drivers ticketed through a 30 mph speed zone.

    Speed (mph)   Frequency (number of drivers)
    42-45         25
    46-49         14
    50-53         7
    54-57         3
    58-61         1

16. Calculate the relative frequencies for all classes.
    n = 25 + 14 + 7 + 3 + 1 = 50
    first class: f/n = 25/50 = 0.5 (or 50%)
    second class: 14/50 = 0.28 (or 28%)
    third class: 7/50 = 0.14 (or 14%)
    fourth class: 3/50 = 0.06 (or 6%)
    fifth class: 1/50 = 0.02 (or 2%)
    ∑rf = 1 (or 100%)
17. What percentage represents the speed of 53 mph or less?
    A speed of 53 mph or less covers the first three classes.
    cumulative relative frequency = 0.5 + 0.28 + 0.14 = 0.92
    92% of the drivers had a speed of 53 mph or less.
18. What are the class boundaries?
    Each class boundary is the midpoint between the upper limit of one class and the lower limit of the next. For the two outer boundaries, the same amount (0.5) is subtracted from the first lower limit and added to the last upper limit.
    class boundaries: 41.5-45.5, 45.5-49.5, 49.5-53.5, 53.5-57.5, 57.5-61.5
19. Construct a histogram corresponding to the frequency distribution table.
    [Histogram: vertical axis is frequency (0 to 30); horizontal axis is marked at the class boundaries, starting at 41.5]
20. Prepare the cumulative frequency distribution. (see below)
21. Prepare the cumulative relative frequency distribution.

    Cumulative speed   Cumulative frequency   Cumulative relative frequency
    42-45              25                     25/50 = 0.5 (or 50%)
    42-49              25+14 = 39             39/50 = 0.78 (or 78%)
    42-53              25+14+7 = 46           46/50 = 0.92 (or 92%)
    42-57              25+14+7+3 = 49         49/50 = 0.98 (or 98%)
    42-61              25+14+7+3+1 = 50       50/50 = 1 (or 100%)

22. Draw an ogive of the cumulative percentage distribution.
    [Ogive: vertical axis is cumulative percentage (0 to 120); horizontal axis is marked at the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5]
23. Using the ogive, find the percentage of drivers who drove 47 mph or less.
    The ogive plots the cumulative percentages against the class boundaries (see #18 for the class boundaries and #20-21 for the cumulative percentage data).
    47 lies between 45.5 and 49.5, so the percentage lies between 50% and 78%.
    Approximately 60% of drivers drove 47 mph or less.
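The relative frequencies, cumulative percentages, and the ogive reading from #16 to #23 can be checked with a short script. This is only a sketch: the variable names and the helper `ogive_percent` are made up for illustration, and the straight-line reading of the ogive assumes that speeds are spread evenly within each class, which the test itself does not state.

```python
# Speed classes as (lower limit, upper limit, frequency), copied from the table above.
classes = [(42, 45, 25), (46, 49, 14), (50, 53, 7), (54, 57, 3), (58, 61, 1)]

n = sum(f for _, _, f in classes)            # sample size
rel_freq = [f / n for _, _, f in classes]    # relative frequency of each class

# Running totals give the cumulative frequency distribution.
cum_freq = []
running = 0
for _, _, f in classes:
    running += f
    cum_freq.append(running)
cum_pct = [100 * c / n for c in cum_freq]    # cumulative percentages

# Ogive points: 0% at the first lower boundary, then each upper boundary
# paired with its cumulative percentage.
ogive = [(classes[0][0] - 0.5, 0.0)]
ogive += [(upper + 0.5, pct) for (_, upper, _), pct in zip(classes, cum_pct)]


def ogive_percent(speed):
    """Read the ogive at `speed` by linear interpolation between plotted points.

    Assumption (not from the test): values are evenly spread inside each class.
    """
    for (x0, p0), (x1, p1) in zip(ogive, ogive[1:]):
        if x0 <= speed <= x1:
            return p0 + (speed - x0) / (x1 - x0) * (p1 - p0)
    raise ValueError("speed lies outside the range covered by the ogive")


print(rel_freq)
print(cum_freq)
print(ogive_percent(47))
```

Reading the ogive this way at 47 mph gives a value a little above 60%, in line with the "approximately 60%" estimate read off the graph.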
The following data give the number of hours that a few employees at the GM factory worked last week.
    17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25
(same data ranked in order)
    14, 16, 17, 18, 20, 21, 23, 24, 25, 27, 28, 30, 31, 34, 37, 38, 40, 42
    n = 18

24. Find the mean.
    x̄ = ∑x / n = (14+16+17+18+20+21+23+24+25+27+28+30+31+34+37+38+40+42)/18 = 485/18 ≈ 26.9444
25. Find the mode.
    There is no mode (each value appears only once).
26. Find the median.
    n is even, so the median is the average of the two middle values (the 9th and 10th): (25 + 27)/2 = 26
27. Find the midrange.
    minimum: 14, maximum: 42
    MR = (Min + Max)/2 = (14 + 42)/2 = 28
28. Find the range.
    R = Max – Min = 42 – 14 = 28
29. Find the variance.
    s² = ∑(x – x̄)² / (n – 1)
    ∑(x – x̄)² = ∑x² – (∑x)²/n = 14343 – 485²/18 ≈ 1274.9444
    s² = 1274.9444 / 17
    s² ≈ 74.9967
30. Find the standard deviation.
    s = √s² (using the value of s² found above)
    s ≈ 8.6601
31. Find the interquartile range (IQR).
    Q2 = median = 26
    Q1 = median of the values below Q2 = 20
    Q3 = median of the values above Q2 = 34
    interquartile range: Q3 – Q1 = 34 – 20 = 14

IQ scores have a mean of 100 and a standard deviation of 15.

32. Find the coefficient of variation.
    CV = (standard deviation / mean) × 100% = (15/100) × 100% = 15%
33. Use the range rule of thumb to establish the minimum and maximum “usual” IQ scores.
    usual minimum = mean – 2(standard deviation) = 100 – 2(15) = 70
    usual maximum = mean + 2(standard deviation) = 100 + 2(15) = 130
    The usual minimum is 70 and the usual maximum is 130.
34. Using Chebyshev’s Theorem, find the least percentage of those who will have an IQ score of 70 to 130.
    At least 1 – 1/K² of the values lie within K standard deviations of the mean.
    K = 2 (refer to #33; K is the number of standard deviations away from the mean)
    1 – 1/2² = 1 – ¼ = ¾
    At least 75% have an IQ score of 70 to 130.
35. Using the empirical rule, find the percentage of those who will have an IQ score of 70 to 130.
    About 95% will have an IQ score of 70 to 130 (70 and 130 are 2 standard deviations away from the mean).
36. Define a parameter and a statistic.
    parameter: a numerical measurement describing some characteristic of a population
    statistic: a numerical measurement describing some characteristic of a sample
37.
Define random sample and simple random sample.
    random sample: members of the population are selected in such a way that each individual member has an equal chance of being selected
    simple random sample (of size n): subjects are selected in such a way that every possible sample of the same size n has the same chance of being chosen
38. Define the following types of sampling: systematic, convenience, stratified, cluster.
    systematic sampling: select some starting point, and then select every kth element in the population
    convenience sampling: use results that are easy to get
    stratified sampling: subdivide the population into at least two different subgroups (strata) whose members share the same characteristics, then draw a sample from each subgroup (stratum)
    cluster sampling: divide the population into sections (or clusters), randomly select some of those clusters, then choose all members from the selected clusters
39. What are the different levels of measurement of data? Give examples.
    nominal level of measurement: qualitative data with no ordering
        ex) gender of subjects
    ordinal level of measurement: categories with some order (differences between data values either cannot be determined or are meaningless, but there is an order)
        ex) course grades A, B, C, D, F
    interval level of measurement: differences between data values are meaningful, but there is no natural starting point (the value 0 does not mean "none of the quantity")
        ex) years such as 1000, 2000, 1492, 1776
    ratio level of measurement: interval level modified to include a natural zero starting point
        ex) price of college textbooks ($0 means no cost)
40. What’s the difference between an observational study and an experiment? Give examples.
    observational study: observing and measuring specific characteristics without attempting to modify the subjects being studied
        ex) Charles Darwin’s observation of Darwinian finches at the Galapagos Islands
    experiment: apply some treatment and then observe its effects on the subjects
        ex) giving some type of medicine and seeing whether it cures a certain type of disease among the subjects
41. Given the following set of data: 32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, , 50
    a) Rank the data from smallest to largest.
    b) Prepare a box-and-whisker plot. [Box plot]
    c) Does this data set contain any outliers? [Make sure to show the lower and the upper fences on your graph]
    d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
42. Draw the box-and-whisker plot for the following data set: 77, 79, 80, 86, 87, 87, 94, 99
    Median: (86 + 87) ÷ 2 = 86.5 = Q2
    This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99.
    Since the halves of the data set each contain an even number of values, the sub-medians will be the average of the middle two values.
    Q1 = (79 + 80) ÷ 2 = 79.5
    Q3 = (87 + 94) ÷ 2 = 90.5
    Minimum = 77, Q1 = 79.5, Q2 = 86.5, Q3 = 90.5, Maximum = 99
    This set of five values has been given the name "the five-number summary".
    Box & Whisker Plot: [box from Q1 = 79.5 to Q3 = 90.5 with a line at the median 86.5; whiskers extend to 77 and 99]
    To find the outliers: IQR = Q3 – Q1 = 90.5 – 79.5 = 11.
    The values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable" values from the outlier values. Outliers lie outside the fences.
    The outliers will be any values below Q1 – 1.5×IQR = 79.5 – 1.5×11 = 79.5 – 16.5 = 63 or above Q3 + 1.5×IQR = 90.5 + 1.5×11 = 90.5 + 16.5 = 107.
    All eight values lie between 63 and 107, so this data set has no outliers.
    The extreme values (extreme outliers) will be those below Q1 – 3×IQR or above Q3 + 3×IQR.

Copyright © 2004-2011 All Rights Reserved
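The five-number summary and the fences in #42 can be reproduced with a few lines of Python. This is a sketch under one stated assumption: quartiles are taken as the medians of the lower and upper halves of the ranked data, which is the method the worked solution uses (other textbooks and libraries use slightly different quartile rules). The function names `five_number_summary` and `fences` are illustrative only.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    When the number of values is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [77, 79, 80, 86, 87, 87, 94, 99]   # data set from #42
minimum, q1, q2, q3, maximum = five_number_summary(data)
lower_fence, upper_fence = fences(q1, q3)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(minimum, q1, q2, q3, maximum)
print(lower_fence, upper_fence)
print(outliers)
```

With IQR = 90.5 – 79.5 = 11, the fences come out at 63 and 107, and the list of outliers is empty. Calling `fences(q1, q3, k=3)` gives the cut-offs for extreme values, and applying `five_number_summary` to the ranked hours data from #24-31 returns Q1 = 20 and Q3 = 34, matching #31.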