Session Packet #2 - Iowa State University

Session Packet #2
Supplemental Instruction
Iowa State University
Leader: Carly
Course: Stat 226
Instructor:
Date: 1/29/13
Session Agenda:
Opening Activity: Quiz – Reviewing Material
Main Worksheet: Mean, Variance, St Dev
Closing Activity: Number Summaries
Questions: Ask Any Questions
Opening Activity: Quiz
For questions 1-13, answer true or false. If the statement is false, change it in a way that makes
it true. Try to complete this quiz on your own without using your notes.
___
1.) The numbers 6, 7, 8, 9 have the same standard deviation as 160, 170, 180, 190.
False. Corrected: The numbers 6, 7, 8, 9 do not have the same standard deviation as 160, 170,
180, 190. The numbers in the first set are not as spread out as the numbers in the second set.
If we were comparing {6, 7, 8, 9} to {106, 107, 108, 109}, the original statement would be true
because these sets have the same spread.
___
2.) The quartiles Q1, Q2, and Q3 are numbers that divide an ordered group of
observations into 3 equally sized groups.
False. Corrected: they divide the observations into 4 equally sized groups. “Quartile” means
“each of 4 equal groups.” If you have trouble remembering this, think about how there are
4 quarters in a dollar.
___
3.) Q1 is the median of all observations to the left of the median M, including M.
False. We exclude the median in finding Q1.
___
4.) 75% of all observations in a data set are of lesser value than Q3.
True.
___
5.) For a symmetric distribution, Q1 and Q3 are about equally apart from M.
True.
___
6.) For a left-skewed distribution, Q1 will be closer to M than Q3.
False. Corrected: For a left-skewed distribution, Q3 will be closer to M than Q1. You can also
correct this by changing left-skewed to right-skewed.
___
7.) For a right-skewed distribution, the minimum is closer to M than the maximum.
True.
___
8.) The IQR is equal to the middle 50% of the data.
True.
___
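9.) The sample variance corresponds to the mean of all deviations of each observation
from the sample mean.
True, provided the deviations are squared first: the sample variance is roughly the mean of
the squared deviations, with n − 1 rather than n in the denominator. The unsquared deviations
always average to zero.
1060 Hixson-Lied Student Success Center  515-294-6624  sistaff@iastate.edu  http://www.si.iastate.edu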
___
10.) The sample variance, s, is equal to the square root of the standard deviation, s^2.
False. Corrected: The standard deviation, s, is equal to the square root of the sample
variance, s^2.
___
11.) For the data set 7, 7, 7, 7, 7, 7, 7, 7, the standard deviation is equal to zero, but the
variance is equal to 1.
False. Corrected: The standard deviation is equal to zero, and the variance is also equal to
zero.
___
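12.) If a data set is reasonably symmetric and no outliers are present, then we should only
use the IQR and 5-number summary as measures of spread and center.
False. Corrected: If a data set is reasonably symmetric and no outliers are present, then we
could use the mean and standard deviation as measures of center and spread.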
___
13.) It is more advantageous to work with standard deviation than variance because
standard deviation is expressed in the same units as the observations in the data set.
True.
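Questions 1 and 11 can be checked numerically. The sketch below is not part of the original packet; it assumes Python's standard `statistics` module, and the variable names are illustrative. It shows that adding a constant to every value leaves the standard deviation unchanged, that values spaced 10 times farther apart have 10 times the standard deviation, and that a constant data set has no spread.

```python
import statistics

small = [6, 7, 8, 9]
shifted = [106, 107, 108, 109]   # small + 100: same spacing, same spread
wide = [160, 170, 180, 190]      # neighbors are 10 apart instead of 1
constant = [7] * 8               # question 11: every value is identical

# statistics.stdev is the sample standard deviation (divides by n - 1)
sd_small = statistics.stdev(small)
sd_shifted = statistics.stdev(shifted)
sd_wide = statistics.stdev(wide)

var_constant = statistics.variance(constant)
sd_constant = statistics.stdev(constant)

print(round(sd_small, 3), round(sd_shifted, 3), round(sd_wide, 3))
print(var_constant, sd_constant)
```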
Main Worksheet: Using the Mean, Variance, and Standard Deviation
Sample Mean:
$$\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Why: This is the formula for calculating the sample mean, a measure of central tendency.

When: The mean will be a fairly accurate measure of center when we are working with a
relatively symmetric distribution of data.
How:
$\bar{y}$ – the sample mean; measured in the same units as the observations
$y_1, y_2, y_3, \dots$ – the first, second, third (and so on) observations in a data set
$i$ – the index that denotes an observation’s position in the data set
$y_i$ – the observation at position $i$
$n$ – the sample size, i.e. the number of observations; it counts whatever units we are
sampling
$y_n$ – the last observation in the data set
$\sum_{i=1}^{n} y_i$ – the sum of all the data observations

Name 1 disadvantage of using the mean.
It is not robust against outliers.

What is the main difference between the mean and median?
The median uses only the positions of the observations to find the center of a data
distribution. The mean uses the actual data values and averages them to find the center of
a data distribution.
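The two answers above can be illustrated with a small sketch. It is not from the original packet; the data values are hypothetical and the code assumes Python's standard `statistics` module. Adding a single extreme observation pulls the mean far from the bulk of the data, while the median, which depends only on position, barely moves.

```python
import statistics

values = [3, 4, 5, 6, 7]           # hypothetical, roughly symmetric data
with_outlier = values + [100]      # same data plus one extreme observation

mean_before = statistics.mean(values)
median_before = statistics.median(values)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", round(mean_after, 3), median_after)
```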
Sample Variance:
$$s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n - 1} = \frac{1}{n - 1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$

Why: This is a formula for measuring sample variance, a measure of spread.

When: The variance will be a fairly accurate measure of spread when we are working
with a relatively symmetric distribution of data.

How:
$s^2$ – the sample variance; measured in the square of the units in which the response is
measured
$(y_i - \bar{y})$ – the difference between an observation and the sample mean
$\sum_{i=1}^{n} (y_i - \bar{y})^2$ – the sum of all squared deviations from the sample mean

Why do we square the deviations of the observations from the sample mean?
To avoid cancelling out the positive deviations from the sample mean with negative
deviations.

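Everything else equal, what happens to the variance as we increase the sample size?
If the sum of squared deviations is held fixed, increasing the sample size increases the
divisor n − 1, so the sample variance decreases. With real data, each new observation also adds
a term to that sum, so the sample variance does not shrink automatically as more data are
collected.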
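The reason for squaring the deviations can be seen by computing the variance step by step. This is a minimal sketch, not part of the original packet: the observations are hypothetical, and `statistics.variance` from Python's standard library is used only as a cross-check.

```python
import statistics

y = [2, 4, 4, 6, 9]                # hypothetical observations
n = len(y)
y_bar = sum(y) / n                 # sample mean

deviations = [yi - y_bar for yi in y]

# Positive and negative deviations cancel, so their sum is 0
sum_dev = sum(deviations)

# Squaring removes the signs, so the sum now reflects spread
sum_sq_dev = sum(d ** 2 for d in deviations)

s2 = sum_sq_dev / (n - 1)          # sample variance, divisor n - 1

print(sum_dev, sum_sq_dev, s2)
print(statistics.variance(y))      # library cross-check
```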
Standard Deviation:
$$s = \sqrt{s^2}$$

Why: This is a formula for calculating standard deviation, a measure of spread.

When: The standard deviation will be a fairly accurate measure of spread when we are
working with a relatively symmetric distribution of data.

How:
$s$ – the sample standard deviation; measured in the same units as the response
$s^2$ – the sample variance; measured in squared units

What is the advantage of using the standard deviation over the sample variance?
The advantage of using the standard deviation over the sample variance is that the
standard deviation is measured in the same units as the response.

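Everything else equal, what happens to the standard deviation as we increase the sample
size?
If the sum of squared deviations is held fixed, the standard deviation decreases as the sample
size increases, because we take the square root of a sum divided by a larger n − 1. With real
data, new observations also add to that sum, so the decrease is not automatic.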
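The sample-size question can be sketched numerically. This is an illustration only, not from the original packet. It reads "everything else equal" as holding the sum of squared deviations fixed, and the value 28.0 is hypothetical.

```python
import math

sum_sq_dev = 28.0                  # hypothetical, held fixed for every n

# s = sqrt(sum of squared deviations / (n - 1)) for several sample sizes
s_by_n = {n: math.sqrt(sum_sq_dev / (n - 1)) for n in (5, 10, 20)}

for n, s in s_by_n.items():
    print(n, round(s, 3))
```

With the numerator fixed, a larger n always gives a smaller s. In an actual sample the numerator grows as observations are added, so this pattern is a property of the formula under that assumption, not of data in general.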
Closing Activity: Number Summaries
For the questions below, use the given data set to compute the number summaries. It is a good
idea to show your work (for the ones that require work) in case you would like to refer back to
this activity later. To gauge what you know and what you need to review further, try to complete
this activity on your own, without the use of your notes. A calculator might be needed.
12, 2, 25, 5, 30, 3, 4, 7, 12, 28, 4, 6, 1, 11, 25, 1, 9, 3, 4, 25
Ordered: 1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 9, 11, 12, 12, 25, 25, 25, 28, 30
1.) Minimum = 1
2.) Maximum = 30
3.) Range = Maximum – Minimum = 30 – 1 = 29
4.) Median = (6+7)/2 = 6.5
5.) Q1 = (3+4)/2 = 3.5
6.) Q3 = (12+25)/2 = 18.5
7.) IQR = Q3 – Q1 = 18.5 – 3.5 = 15
8.) Use a calculator, Excel, or JMP to find the mean.
Mean = 10.850
9.) Use a calculator, Excel, or JMP to find the sample variance.
Sample Variance = 98.766
10.) Use a calculator, Excel, or JMP to find the standard deviation.
Standard deviation = 9.938
11.) Draw a box plot of this data. Don’t forget to label the minimum, Q1, median, Q3, and
maximum.
12.) What is the shape of the distribution? Are there any outliers?
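The distribution appears to be bimodal. If we look back at the data set, we see that we do have 2
modes: 4 and 25. It could be argued that the distribution is right-skewed, with the values at 25
and higher sitting far from the rest of the data. Under the common 1.5 × IQR rule, however, the
upper fence is 18.5 + 1.5(15) = 41, so none of these values would be flagged as an outlier.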
13.) Are Q1 and Q3 approximately the same distance to M? Why or why not?
No, from the box and whiskers plot we see that Q1 is much closer to M than Q3. This occurs
because the data is skewed to the right.
14.) The middle 50% of this data is between ____ and ____. (Give actual values.) (3.5 and 18.5)
15.) If you had to choose between using the IQR + 5-number summary or using the standard
deviation and mean, which would you choose? Why?
It would be best to use the IQR and 5 number summary in this case because the data is not
symmetrical. The mean and standard deviation are not robust against outlying data.
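The number summaries in this activity can be reproduced with a short script. It is a sketch rather than the packet's own method and assumes Python's standard `statistics` module. The quartiles follow the rule from the quiz (the median is excluded from each half), and the 1.5 × IQR outlier fences are an added assumption that the packet does not state.

```python
import statistics

data = [12, 2, 25, 5, 30, 3, 4, 7, 12, 28, 4, 6, 1, 11, 25, 1, 9, 3, 4, 25]
ordered = sorted(data)
n = len(ordered)

minimum, maximum = ordered[0], ordered[-1]
median = statistics.median(ordered)

# Split into halves; when n is odd, n % 2 skips the median itself
half = n // 2
lower = ordered[:half]
upper = ordered[half + n % 2:]
q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1

mean = statistics.mean(ordered)
s2 = statistics.variance(ordered)   # sample variance (divisor n - 1)
s = statistics.stdev(ordered)       # sample standard deviation

# Assumed 1.5 * IQR rule for flagging outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in ordered if y < lower_fence or y > upper_fence]

print(minimum, q1, median, q3, maximum, iqr)
print(round(mean, 3), round(s2, 3), round(s, 3))
print(lower_fence, upper_fence, outliers)
```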