Statistics, Take Home Test 1 Solution
Chapter 1, 2 & 3: Intro to Stats, Summarizing, Describing Data
1. True or False: The value of variance and standard deviation is never negative.
True – both are nonnegative measures of how far the values vary from the mean
(they are zero only when all values are equal)
2. What kind of variable is “weights of bears”: quantitative or qualitative?
Quantitative – “weights of bears” takes numeric values that represent measurements
3. What kind of variable is “gender of bears”: quantitative or qualitative?
Qualitative – “gender of bears” is distinguished by a nonnumeric characteristic
4. Define a population in statistics.
A population is the complete collection of all elements (scores, people, measurements, etc.)
to be studied
5. The value of the middle term in a ranked data set is called the median
6. Given any data, how do you find the mode?
Mode is the value that appears with the greatest frequency among the data. A data set can
have one, more than one, or no mode (when all numbers appear with equal frequency).
7. True or False: The “number of chairs” is considered to be a continuous variable.
False – The number of chairs is discrete, not continuous. Chairs are counted in whole units, so we cannot have ¼ of a chair.
8. On a Pareto chart, the frequencies should be represented on the vertical (y) axis.
Given the frequency table, answer the following questions.
Age group    Frequency
11-20        5
21-30        6
31-40        9
41-50        11
51-60        4
9. The number of classes in the table is 5 [number of statistical age groups defined]
10. The class width is 10
(upper limit – lower limit + 1 unit, or the difference between two consecutive lower limits
or two consecutive upper limits, e.g. 21 – 11 = 10)
11. The midpoint of the 4th class is 45.5
(41+50)/2 = 45.5
12. The Lower Boundary of the 5th class is 50.5
(50+51)/2 = 50.5 (think of it as a midpoint between the upper limit of 4th class and the
lower limit of 5th class)
13. The Upper Limit of the 1st class is 20
The 1st class is 11-20, so its upper limit is 20
14. The sample size is 35
5+6+9+11+4 = 35
15. The relative frequency of the 1st class is 1/7 ≈ 0.1429
relative frequency: f/n
relative frequency of the 1st class = f/n = 5/35 = 1/7 ≈ 0.1429 (or 14.29 %)
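The answers to questions 9 to 15 can be checked with a short Python sketch. The variable names (classes, width, and so on) are illustrative and not part of the original test; the class limits and frequencies are copied from the age-group table.

```python
from fractions import Fraction

# (lower limit, upper limit, frequency) for each age group in the table
classes = [(11, 20, 5), (21, 30, 6), (31, 40, 9), (41, 50, 11), (51, 60, 4)]

num_classes = len(classes)                                  # question 9
width = classes[1][0] - classes[0][0]                       # question 10: consecutive lower limits
midpoint_4th = (classes[3][0] + classes[3][1]) / 2          # question 11
lower_boundary_5th = (classes[3][1] + classes[4][0]) / 2    # question 12
upper_limit_1st = classes[0][1]                             # question 13
n = sum(f for _, _, f in classes)                           # question 14
rel_freq_1st = Fraction(classes[0][2], n)                   # question 15, kept as an exact fraction

print(num_classes, width, midpoint_4th, lower_boundary_5th, upper_limit_1st, n)
print(rel_freq_1st, round(float(rel_freq_1st), 4))
```

Using Fraction keeps 5/35 in lowest terms (1/7) before it is rounded to a decimal.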
The following frequency table describes the speeds of drivers ticketed through a 30 mph speed
zone.
Speed (mph)    Frequency (number of drivers)
42-45          25
46-49          14
50-53          7
54-57          3
58-61          1
16. Calculate the relative frequencies for all classes.
n = 50
first class: f/n = 25/50 = 0.5 (or 50%)
second class: 14/50 = 0.28 (or 28%)
third class: 7/50 = 0.14 (or 14%)
fourth class: 3/50 = 0.06 (or 6%)
fifth class: 1/50 = 0.02 (or 2%)
∑rf = 1 (or 100%)
17. What percentage represents the speed of 53 mph or less?
“53 mph or less” covers the first three classes
cumulative relative frequency = 0.5 + 0.28 + 0.14 = 0.92
92% of the ticketed drivers drove 53 mph or less
18. What are the class boundaries?
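each class boundary is the midpoint between the upper limit of one class and the lower limit of the next
for the two outermost boundaries, the same half-gap (0.5) is subtracted from the first lower limit and added to the last upper limit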
class boundaries: 41.5-45.5, 45.5-49.5, 49.5-53.5, 53.5-57.5, 57.5-61.5
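The boundary rule can be sketched in Python as follows. This is only an illustration: the names limits and half_gap are assumptions, and the sketch assumes every pair of adjacent classes is separated by the same gap.

```python
# Class limits from the speed table (mph)
limits = [(42, 45), (46, 49), (50, 53), (54, 57), (58, 61)]

# Half the gap between one class's upper limit and the next class's lower limit
half_gap = (limits[1][0] - limits[0][1]) / 2

# Extend each class by half_gap on both sides so that adjacent classes touch
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]

for lo, hi in boundaries:
    print(f"{lo}-{hi}")
```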
19. Construct a histogram corresponding to the frequency distribution table.
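[Histogram: horizontal axis shows the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5 (speed in mph); vertical axis shows frequency from 0 to 30; bar heights are 25, 14, 7, 3, 1]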
20. Prepare the cumulative frequency distribution. (see below)
21. Prepare the cumulative relative frequency distribution.
Cumulative speed    Cumulative frequency    Cumulative relative frequency
42-45               25                      25/50 = 0.5 (or 50%)
42-49               25+14 = 39              39/50 = 0.78 (or 78%)
42-53               25+14+7 = 46            46/50 = 0.92 (or 92%)
42-57               25+14+7+3 = 49          49/50 = 0.98 (or 98%)
42-61               25+14+7+3+1 = 50        50/50 = 1 (or 100%)
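As a check on questions 16, 17, 20 and 21, the relative and cumulative distributions can be reproduced with a few lines of Python. The variable names are illustrative; only the five frequencies come from the table.

```python
from itertools import accumulate

freqs = [25, 14, 7, 3, 1]           # frequencies from the speed table
n = sum(freqs)                      # total number of ticketed drivers

rel = [f / n for f in freqs]        # relative frequencies (question 16)
cum = list(accumulate(freqs))       # cumulative frequencies (question 20)
cum_rel = [c / n for c in cum]      # cumulative relative frequencies (question 21)

print(n)
print(rel)
print(cum)
print(cum_rel)
print(cum_rel[2])                   # share driving 53 mph or less (question 17)
```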
22. Draw an ogive of the cumulative percentage distribution.
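[Ogive: horizontal axis shows the class boundaries 41.5 to 61.5 (speed in mph); vertical axis shows cumulative percentage from 0 to 120; the curve passes through (41.5, 0), (45.5, 50), (49.5, 78), (53.5, 92), (57.5, 98), (61.5, 100)]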
23. Using the ogive find the percentage of drivers who drove 47 mph or less.
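the ogive plots cumulative percentage against the upper class boundaries
check #18 for the class boundaries and #21 for the cumulative percentages
47 lies between the boundaries 45.5 and 49.5, so the percentage lies between 50% and 78%
interpolating along the ogive: 50 + (47 – 45.5)/(49.5 – 45.5) × (78 – 50) = 60.5
Approximately 60% of drivers drove 47 mph or less.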
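Reading a value off the ogive amounts to linear interpolation between two plotted points. The sketch below is one way to do that numerically; the function name ogive_percent is hypothetical, and the points are the class boundaries paired with the cumulative percentages.

```python
def ogive_percent(x, boundaries, cum_percent):
    """Linearly interpolate the cumulative percentage at x along the ogive."""
    points = list(zip(boundaries, cum_percent))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the range of the ogive")

boundaries = [41.5, 45.5, 49.5, 53.5, 57.5, 61.5]
cum_percent = [0, 50, 78, 92, 98, 100]   # the ogive starts at 0% at the first boundary

estimate = ogive_percent(47, boundaries, cum_percent)
print(estimate)  # 60.5
```

The estimate assumes speeds are spread evenly within each class, which is the same assumption made when reading the graph by eye.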
The following data gives the number of hours that a few employees at the GM factory worked
last week.
17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25
(same data ranked in order) 14, 16, 17, 18, 20, 21, 23, 24, 25, 27, 28, 30, 31, 34, 37, 38, 40, 42
n = 18
24. Find the mean
x̄ = ∑x / n = (14+16+17+18+20+21+23+24+25+27+28+30+31+34+37+38+40+42)/18 = 485/18 ≈ 26.9444
25. Find the mode
there is no mode (each value appears only once)
26. Find the median.
(25+27)/2 = 26
27. Find the midrange.
minimum: 14 maximum: 42
MR = (Min + Max)/2 = (14+42)/2 = 28
28. Find the range
R= max – Min = 42 – 14 = 28.
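Questions 24 to 28 can be verified with Python's standard statistics module. The has_mode check is an illustrative convention (no mode when every distinct value is equally frequent), matching the definition in question 6.

```python
import statistics

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25]

mean = statistics.mean(hours)
median = statistics.median(hours)            # averages the two middle values when n is even

# multimode returns every most-frequent value; if that is all distinct
# values, they all occur equally often and we say there is no mode
modes = statistics.multimode(hours)
has_mode = len(modes) < len(set(hours))

midrange = (min(hours) + max(hours)) / 2
data_range = max(hours) - min(hours)

print(round(mean, 4), median, has_mode, midrange, data_range)
```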
29. Find the variance.
s² = ∑(x – x̄)² / (n – 1)
∑(x – x̄)² ≈ 1274.9444
s² ≈ 1274.9444 / 17
s² ≈ 74.9967
30. Find the standard deviation.
s = √s² (using the variance found above)
s ≈ 8.6601
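The sample variance and standard deviation can be recomputed directly from the definition. The sketch below uses the n – 1 divisor for a sample and cross-checks the result against statistics.variance; the variable names are illustrative.

```python
import math
import statistics

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25]

n = len(hours)
mean = sum(hours) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in hours)

variance = ss / (n - 1)          # sample variance: divide by n - 1
std_dev = math.sqrt(variance)    # sample standard deviation

# Cross-check against the standard library's sample variance
assert math.isclose(variance, statistics.variance(hours))

print(round(ss, 4), round(variance, 4), round(std_dev, 4))
```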
31. Find the interquartile range (IQR).
Q2 = median = 26
Q1 = median of the lower half (14, 16, 17, 18, 20, 21, 23, 24, 25) = 20
Q3 = median of the upper half (27, 28, 30, 31, 34, 37, 38, 40, 42) = 34
interquartile range: Q3 – Q1 = 34 – 20 = 14
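The quartile method used here (the median of each half of the ranked data) can be sketched as below. The function name quartiles_by_halves is hypothetical. Other quartile conventions exist, and library routines such as statistics.quantiles may return slightly different values.

```python
import statistics

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two values. For an odd number of values the
    median itself is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25]
q1, q2, q3 = quartiles_by_halves(hours)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```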
IQ scores have a mean of 100 and a standard deviation of 15.
32. Find the coefficient of variation.
CV = (σ / μ) × 100% = (15 / 100) × 100% = 15%
33. Use the range rule of thumb to establish the minimum and maximum “usual” IQ
scores.
μ ± 2σ: 100 – 2(15) = 70 to 100 + 2(15) = 130
usual minimum is 70 and usual maximum is 130
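Both the coefficient of variation and the range rule of thumb are one-line formulas; a minimal Python check, with illustrative variable names, looks like this:

```python
mu, sigma = 100, 15                 # IQ mean and standard deviation

# Coefficient of variation: standard deviation as a percentage of the mean
cv_percent = 100 * sigma / mu

# Range rule of thumb: "usual" values lie within 2 standard deviations of the mean
usual_min = mu - 2 * sigma
usual_max = mu + 2 * sigma

print(cv_percent, usual_min, usual_max)
```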
34. Using Chebyshev’s theorem, find the least percentage of people who will
have an IQ score of 70 to 130.
1 – 1/K²
K = 2 (refer to #33; K is the number of standard deviations away from the mean)
1 – 1/2² = 1 – ¼ = ¾
At least 75% have an IQ score of 70 to 130.
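The Chebyshev bound can be computed for any K greater than 1. The helper name chebyshev_lower_bound below is hypothetical; exact fractions are used so the result prints as 3/4 rather than a rounded decimal.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Least proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / Fraction(k) ** 2

mu, sigma = 100, 15
k = Fraction(130 - mu, sigma)        # 130 is 2 standard deviations above the mean
bound = chebyshev_lower_bound(k)

print(k, bound, float(bound) * 100)
```

The bound holds for any distribution, which is why it is weaker than the empirical-rule figure.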
35. Using the empirical rule, find the percentage of those who will have an IQ score of 70 to
130.
About 95% will have an IQ score of 70 to 130, assuming the scores have a bell-shaped distribution.
(70 and 130 are each 2 standard deviations away from the mean)
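The empirical rule is a rounded statement about the normal distribution. Assuming IQ scores are exactly normal with mean 100 and standard deviation 15 (an assumption, since the test only states the mean and standard deviation), the proportion can be computed with statistics.NormalDist:

```python
from statistics import NormalDist

# Assumed model: IQ scores are normally distributed
iq = NormalDist(mu=100, sigma=15)

# Proportion of scores between 70 and 130 (within 2 standard deviations)
within_2sd = iq.cdf(130) - iq.cdf(70)

print(round(within_2sd * 100, 2))  # 95.45
```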
36. Define a parameter and a statistic.
parameter: a numerical measurement describing some characteristic of a population
statistic: a numerical measurement describing some characteristic of a sample
37. Define random sample and simple random sample.
random sample: members of the population are selected in such a way that each
individual member has an equal chance of being selected
simple random sample (of size n): subjects selected in such a way that every possible
sample of the same size n has the same chance of being chosen
38. Define the following types of sampling: systematic, convenience, stratified, cluster
systematic sampling: select some starting point, and then select every kth element in
the population
convenience sampling: use results that are easy to get
stratified sampling: subdivide the population into at least two different subgroups that
share the same characteristics, then draw a sample from each subgroup (stratum)
cluster sampling: divide the population into sections (or clusters), randomly select some of
those clusters, choose all members from selected clusters
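The sampling methods above can be illustrated on a toy population of the numbers 1 to 100. Everything in this sketch (the function names, the two strata, the ten clusters, the seed) is a made-up example, not part of the test.

```python
import random

def simple_random_sample(population, n, rng):
    # every possible sample of size n is equally likely
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # random starting point, then every kth element
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # draw a sample from each subgroup (stratum)
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    # randomly pick whole clusters, then keep all of their members
    chosen = rng.sample(list(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

rng = random.Random(0)                       # fixed seed so the run is repeatable
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}
clusters = {i: population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
syst = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
convenience = population[:10]                # whatever is easiest to reach

print(srs, syst, strat, clus, convenience, sep="\n")
```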
39. What are different levels of measurement of data? Give examples.
nominal level of measurement: qualitative data
ex) gender of subjects
ordinal level of measurement: categories with some order (differences between data
values either cannot be determined or are meaningless, but there is an order)
ex) course grades A, B, C, D, F
interval level of measurement: differences between data values are meaningful, but there
is no natural zero starting point (the value 0 does not mean that none of the quantity is present)
ex) years such as 1000, 2000, 1492, 1776
ratio level of measurement: interval level modified to include a natural zero starting point
ex) price of college textbooks ($0 means no cost)
40. What’s the difference between an observational study and an experiment? Give
examples.
observational study: observing and measuring specific characteristics without attempting
to modify the subjects being studied
ex) Charles Darwin’s observation of finches on the Galapagos Islands
experiment: apply some treatment and then observe its effects on the subjects
ex) giving subjects a medicine and seeing whether it cures a certain disease
41. Given the following set of data: 32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, , 50
a) Rank the data from smallest to largest.
b) Prepare a box-and-whisker plot. [Box plot]
c) Does this data set contain any outliers? [Make sure to show the lower and the upper fences
on your graph]
d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
42. Draw the box-and-whisker plot for the following data set:
77, 79, 80, 86, 87, 87, 94, 99
Median: (86 + 87) ÷ 2 = 86.5 = Q2
This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since the halves of the
data set each contain an even number of values, the sub-medians will be the average of the
middle two values.
Q1 = (79 + 80) ÷ 2 = 79.5
Q3 = (87 + 94) ÷ 2 = 90.5
Minimum = 77, Q1 = 79.5, Q2= 86.5, Q3= 90.5, Maximum = 99
Box & Whisker Plot:
This set of five values has been given the name "the five-number summary".
To find the outliers:
IQR = Q3 – Q1= 90.5 -79.5 = 11.
The values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable"
values from the outlier values. Outliers lie outside the fences.
The outliers will be any values below Q1 – 1.5×IQR = 79.5 – 1.5×11 = 79.5 – 16.5 = 63 or above
Q3 + 1.5×IQR = 90.5 + 1.5×11 = 90.5 + 16.5 = 107. All eight values lie between 63 and 107, so this data set has no outliers.
Extreme outliers are values below Q1 – 3×IQR or above Q3 + 3×IQR.
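The five-number summary and the fences for this data set can be recomputed with a short Python sketch. It uses the same median-of-halves quartile method as the worked solution; the variable names are illustrative.

```python
import statistics

data = [77, 79, 80, 86, 87, 87, 94, 99]
ordered = sorted(data)
half = len(ordered) // 2

# Quartiles as medians of the lower and upper halves of the ranked data
q1 = statistics.median(ordered[:half])
q2 = statistics.median(ordered)
q3 = statistics.median(ordered[-half:])
five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])

# Fences at 1.5 x IQR beyond the quartiles; values outside them are outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

print(five_number_summary)
print(iqr, lower_fence, upper_fence, outliers)
```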