Chapter 3: Numerical Summary Measures

http://anengineersaspect.blogspot.com/2013_05_01_archive.html
Numerical Summary Measures: Goals
• Describe the center of a distribution by:
– mean
– median
– mode
• Compare the mean and median
• Describe measures of spread:
– range
– variance and standard deviation
– quartiles
• Be able to determine which summary statistics are
appropriate for a given situation
• Empirical Rule and introduction to the normal distribution
• Describe a distribution by a boxplot (five-number summary and outliers)
Definition
Measures of central tendency indicate where the data are centered, that is,
where most of the observations are bunched or clustered.
Notation
• Lowercase letters such as x, y, z denote variables.
• x1, x2, x3, …, xn denote a set of n observed values of a variable.
• n is the number of observations in a data set, called the sample size.
Sample Mean
$$\bar{x} = \frac{\text{sum of observations}}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
μ = population mean
Sample --> Latin letters
Population --> Greek letters
Sample Mean: Example
The following data give the time in months from hire
to promotion to manager for a random sample of
20 software engineers from all software engineers
employed by a large telecommunications firm.
a) What is the mean time for this sample?
5, 7, 12, 14, 18, 14, 14, 22, 21, 25, 23, 24, 34, 37, 34, 49, 64, 47, 67, 69
b) Suppose that instead of x20 = 69, we had chosen
another engineer who took 483 months to be
promoted. What is the mean time for this new
sample?
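A short Python sketch (the slides contain no code, so the language is our choice) that applies the sample-mean formula to both parts of the example. The names `times`, `times_b` and `sample_mean` are illustrative, not from the slides.

```python
# Promotion times (months) for the 20 sampled engineers, in the order listed above.
times = [5, 7, 12, 14, 18, 14, 14, 22, 21, 25,
         23, 24, 34, 37, 34, 49, 64, 47, 67, 69]


def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


# b) Replace x20 = 69 (the last value) with an engineer who took 483 months.
times_b = times[:-1] + [483]

mean_a = sample_mean(times)
mean_b = sample_mean(times_b)
print(f"a) mean = {mean_a:.2f} months")  # a) mean = 30.00 months
print(f"b) mean = {mean_b:.2f} months")  # b) mean = 50.70 months
```

A single unusual value raises the mean by more than 20 months, which illustrates that the mean is not resistant to outliers.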
Sample Median, x̃
Procedure
1. Sort n observations from smallest to largest
2. If n is odd, x̃ is the center observation (position (n + 1)/2).
If n is even, x̃ is the average of the two center
observations (positions n/2 and n/2 + 1).
Sample Median: Example
The following data give the time in months from hire
to promotion to manager for a random sample of
20 software engineers from all software engineers
employed by a large telecommunications firm.
a) What is the median time for this sample?
5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69
b) Suppose that instead of x20 = 69, we had chosen
another engineer who took 483 months to be
promoted. What is the median time for this new
sample?
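The median procedure can be sketched in Python as follows; `sample_median` is a hypothetical helper name, and Python's 0-based indexing is converted from the 1-based positions used in the procedure.

```python
def sample_median(xs):
    """Median via the procedure: sort, then take the center value (n odd)
    or the average of the two center values (n even)."""
    s = sorted(xs)                      # step 1: sort smallest to largest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                      # n odd: 1-based position (n + 1)/2
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2    # n even: positions n/2 and n/2 + 1


times = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         24, 25, 34, 34, 37, 47, 49, 64, 67, 69]
times_b = times[:-1] + [483]            # b) x20 = 69 replaced by 483

print(sample_median(times))    # 23.5
print(sample_median(times_b))  # 23.5
```

The median is the same in both parts: replacing the largest value with an even larger one does not move the two center observations (23 and 24), so the median is resistant to outliers.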
Mean and Median
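[Figure: the mean and median marked on three distributions, from left skew to right skew.]
In a left-skewed distribution the long left tail pulls the mean below the median;
in a right-skewed distribution the mean is typically above the median.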
Mode, M
• The value with the greatest frequency.
Sample Mode: Example
The following data give the time in months
from hire to promotion to manager for a
random sample of 20 software engineers
from all software engineers employed by a
large telecommunications firm.
a) What is the mode for this sample?
5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69
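A minimal mode sketch using `collections.Counter`. Returning a list is a design choice of this sketch (the name `sample_modes` is hypothetical), since a data set can have several values tied for the greatest frequency; Python's standard library also offers `statistics.multimode` for the same purpose.

```python
from collections import Counter

times = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         24, 25, 34, 34, 37, 47, 49, 64, 67, 69]


def sample_modes(xs):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(xs)                 # value -> number of occurrences
    top = max(counts.values())           # the greatest frequency
    return sorted(v for v, c in counts.items() if c == top)


print(sample_modes(times))  # [14]
```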
Variability of Data
[Figure: three data sets (Set 1, Set 2, Set 3) with different amounts of variability.]
Measures of Variability
• Sample range
• Sample variance (sample standard deviation)
• Interquartile Range (IQR)
Sample Variance
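$$\text{variance} = s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$
$$\text{standard deviation} = s_x = \sqrt{s_x^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
σ² = population variance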
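A sketch checking that the definitional and computational forms of the sample variance agree. The data set is made up for illustration, and the function names are hypothetical; `statistics.variance` from the Python standard library also divides by n − 1, so it serves as a cross-check.

```python
import math
import statistics


def var_definition(xs):
    """Definitional form: sum of squared deviations from the mean, over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def var_shortcut(xs):
    """Computational form: (sum of x_i^2 - (sum of x_i)^2 / n), over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustration data

v1 = var_definition(data)
v2 = var_shortcut(data)
s = math.sqrt(v1)                                     # s is the square root of s^2
print(math.isclose(v1, v2))                           # True: both forms agree
print(math.isclose(v1, statistics.variance(data)))    # True: stdlib also uses n - 1
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread, so the definitional form is the safer choice in code.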
Comments on the Standard Deviation
• The variance (or standard deviation) is used to compare the spread
of data sets.
• s² = 0 means that all of the observations are
the same; otherwise s > 0.
• If n = 1, s is undefined, because the divisor n − 1 is 0.
• s is not resistant to outliers
• s has the same units of measurement as the
original observations
Sample Standard Deviation: Example
The following data give the time in months from hire
to promotion to manager for a random sample of
20 software engineers from all software engineers
employed by a large telecommunications firm.
a) What is the standard deviation for this sample?
5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69
b) Suppose that instead of x20 = 69, we had chosen
another engineer who took 483 months to be
promoted. What is the standard deviation for this
new sample?
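Both parts of the example can be checked with `statistics.stdev` from the Python standard library, which computes the sample standard deviation with divisor n − 1, matching the formula for s. The variable names are illustrative.

```python
import statistics

times = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         24, 25, 34, 34, 37, 47, 49, 64, 67, 69]
times_b = times[:-1] + [483]          # b) x20 = 69 replaced by 483

s_a = statistics.stdev(times)         # sample standard deviation (divisor n - 1)
s_b = statistics.stdev(times_b)
print(f"a) s = {s_a:.2f} months")     # a) s = 19.76 months
print(f"b) s = {s_b:.2f} months")     # b) s = 103.25 months
```

One extreme value raises s from about 19.8 to about 103.2 months, consistent with the comment that s is not resistant to outliers.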
Quartiles
[Figure: sorted data divided into four parts by the quartiles Q1, Q2 (the median) and Q3.]
Quartiles - Procedure
1. Sort the values from lowest to highest and locate
the median.
2. The first quartile, Q1, is the median of the lower half.
a. Compute d1 = n/4.
b. If d1 is an integer, then Q1 is the mean of the
observations at positions d1 and d1 + 1.
c. If d1 is not an integer, then Q1 is the observation at position
⌈d1⌉ (d1 rounded up to the next integer).
3. The third quartile, Q3, is the median of the upper
half.
a. Compute d2 = 3n/4.
b. Repeat steps 2b and 2c with d2 in place of d1.
Quartiles: Example
The following data give the time in months from
hire to promotion to manager for a random
sample of 19 software engineers from all
software engineers employed by a large
telecommunications firm.
24, 7, 12, 14, 14, 14, 18, 21, 22, 23, 25, 34, 34, 37, 47, 49, 64, 100, 150
a) Find the median and the quartiles.
b) What is the Interquartile Range?
c) Are there any outliers in this data set?
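A sketch of the quartile procedure applied to this example. It assumes, as the extraction-damaged symbol in step 2c appears to indicate, that a non-integer d is rounded up; all function names are hypothetical. Other conventions exist (Python's `statistics.quantiles`, for instance, interpolates by default), so software may report different quartiles for some data sets.

```python
import math


def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def quartile_at(sorted_xs, d):
    """If d is an integer, average the observations at 1-based positions d and
    d + 1; otherwise take the observation at position ceil(d) (assumed rule)."""
    if d == int(d):
        d = int(d)
        return (sorted_xs[d - 1] + sorted_xs[d]) / 2
    return sorted_xs[math.ceil(d) - 1]


def quartiles(xs):
    """Return (Q1, median, Q3) using d1 = n/4 and d2 = 3n/4."""
    s = sorted(xs)                     # step 1: sort the values
    n = len(s)
    return quartile_at(s, n / 4), median(s), quartile_at(s, 3 * n / 4)


times = [24, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         25, 34, 34, 37, 47, 49, 64, 100, 150]
q1, med, q3 = quartiles(times)
print(q1, med, q3, q3 - q1)   # 14 24 47 33
```

With n = 19, d1 = 4.75 and d2 = 14.25 are not integers, so Q1 and Q3 are the 5th and 15th sorted observations, and the IQR is 47 − 14 = 33.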
Outliers
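After finding the IQR = Q3 − Q1, find the two inner fences (low
and high) and the two outer fences (low and high):
IFL = Q1 − 1.5(IQR)    IFH = Q3 + 1.5(IQR)
OFL = Q1 − 3(IQR)      OFH = Q3 + 3(IQR)
An observation between an inner fence and the corresponding outer fence is a mild outlier;
an observation beyond an outer fence is an extreme outlier.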
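As a worked example, the fences for the 19-observation promotion data use Q1 = 14 and Q3 = 47, the values the quartile procedure gives when a non-integer d is rounded up:

```latex
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 = 47 - 14 = 33\\
\text{IFL} &= Q_1 - 1.5(\text{IQR}) = 14 - 49.5 = -35.5\\
\text{IFH} &= Q_3 + 1.5(\text{IQR}) = 47 + 49.5 = 96.5\\
\text{OFL} &= Q_1 - 3(\text{IQR}) = 14 - 99 = -85\\
\text{OFH} &= Q_3 + 3(\text{IQR}) = 47 + 99 = 146
\end{aligned}
```

No observation falls below IFL. The value 100 lies between IFH and OFH, so it is a mild outlier; 150 lies above OFH, so it is an extreme outlier.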
Boxplots
Procedure
1. Find Q1, Q3, median and IQR
2. Calculate IFL, IFH, OFL, OFH
3. Draw a central box from Q1 to Q3. Draw a line
across the box at the median.
4. Extend lines (whiskers) from the box to the
minimum and maximum values that are not
outliers.
5. Put in closed circles for mild outliers and
open circles for extreme outliers.
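The boxplot procedure can be sketched as a function that computes each drawn element rather than plotting it. The function name and its interface (Q1, median and Q3 passed in by the caller) are hypothetical, and the sketch assumes a value lying exactly on a fence is not an outlier. The values Q1 = 14, median = 24 and Q3 = 47 are those of the 19-observation promotion example.

```python
def boxplot_elements(xs, q1, med, q3):
    """Return the pieces of a boxplot built with the procedure above."""
    iqr = q3 - q1
    ifl, ifh = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # inner fences
    ofl, ofh = q1 - 3 * iqr, q3 + 3 * iqr        # outer fences
    inside = [x for x in xs if ifl <= x <= ifh]  # values that are not outliers
    return {
        "box": (q1, q3),                         # step 3: box from Q1 to Q3
        "median_line": med,                      # step 3: line at the median
        "whiskers": (min(inside), max(inside)),  # step 4: whisker ends
        # step 5: past an inner fence but within an outer fence -> mild (closed circle)
        "closed_circles": sorted(x for x in xs
                                 if ofl <= x < ifl or ifh < x <= ofh),
        # step 5: past an outer fence -> extreme (open circle)
        "open_circles": sorted(x for x in xs if x < ofl or x > ofh),
    }


times = [24, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         25, 34, 34, 37, 47, 49, 64, 100, 150]
print(boxplot_elements(times, q1=14, med=24, q3=47))
# {'box': (14, 47), 'median_line': 24, 'whiskers': (7, 64), 'closed_circles': [100], 'open_circles': [150]}
```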
Boxplot: Example
[Figure: Boxplot of Promotion; vertical axis: Promotion, from 0 to 160.]
Distributions and Boxplots
Side-by-side Boxplot: Example
Choosing Measures of Center and Spread
Choices
1. Mean and standard deviation (roughly symmetric data with no outliers)
2. Median and IQR (skewed data or data with outliers)
ALWAYS PLOT YOUR DATA!
http://freshspectrum.com/wp-content/uploads/2012/09/Hans-Rosling-Bubble-Plot-Cartoon.jpg
Empirical Rule
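68-95-99.7 Rule: for data with an approximately normal (bell-shaped) distribution,
about 68% of the observations fall within one standard deviation of the mean,
about 95% within two, and about 99.7% within three.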
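A small simulation sketch of the rule: it draws simulated normal data (the seed and sample size are arbitrary choices) and counts the share of observations within k sample standard deviations of the sample mean. The function name `share_within` is hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
sample = [random.gauss(0, 1) for _ in range(100_000)]  # simulated bell-shaped data
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)


def share_within(k):
    """Fraction of observations within k sample standard deviations of the mean."""
    return sum(abs(x - xbar) <= k * s for x in sample) / len(sample)


for k in (1, 2, 3):
    print(f"within {k} s: {share_within(k):.3f}")  # close to 0.68, 0.95, 0.997
```

The rule is only a guide for data that are not bell-shaped: for the 20 promotion times (x̄ = 30, s ≈ 19.76), 15 of the 20 observations (75%) lie within one standard deviation of the mean and all 20 lie within two.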
z-score
• $z_i = \dfrac{x_i - \bar{x}}{s}$
• The z-score is a measure of relative standing: it gives how many standard deviations xi lies above (positive) or below (negative) the mean.
• Given a set of n observations, the sum of the
z-scores is 0.
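A sketch computing z-scores for the 20 promotion times and checking that they sum to 0 (up to floating-point rounding). The helper name `z_scores` is hypothetical; `statistics.stdev` uses the n − 1 divisor, matching s.

```python
import statistics

times = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23,
         24, 25, 34, 34, 37, 47, 49, 64, 67, 69]


def z_scores(xs):
    """Standardize each observation: (x_i - mean) / s."""
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [(x - xbar) / s for x in xs]


z = z_scores(times)
print(round(z[-1], 2))       # 1.97: the value 69 lies about 2 s above the mean
print(abs(sum(z)) < 1e-9)    # True: the z-scores sum to 0 (up to rounding)
```

The sum is 0 because the deviations xi − x̄ always sum to 0, and dividing each by the same s does not change that.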