3680 Lecture 02

Math 3680
Lecture #2
Mean and Standard Deviation
Mean vs. Median
Example: In a certain class of 13 students,
10 showed up for the first exam, while 3 blew it
off. Here are the grades, in order:
0 0 0 55 68 78 79 81 84 87 93 94 98
(A) Calculate the class median.
(i) Include all students.
(ii) Ignore the students who slept in.
(B) Calculate the class mean (average).
(i) Include all students.
(ii) Ignore the students who slept in.
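Parts (A) and (B) can be checked with Python's standard `statistics` module. This is only a sketch for verifying hand calculations; the variable names are assumptions, not part of the lecture.

```python
from statistics import mean, median

# Exam grades from the example, including the three zeros.
grades = [0, 0, 0, 55, 68, 78, 79, 81, 84, 87, 93, 94, 98]

# Case (ii): drop the students who did not take the exam (score of 0).
took_exam = [g for g in grades if g > 0]

median_all = median(grades)       # middle (7th) of 13 sorted values
median_took = median(took_exam)   # average of the 5th and 6th of 10 values
mean_all = mean(grades)
mean_took = mean(took_exam)

print(f"median: all = {median_all}, exam takers = {median_took}")
print(f"mean:   all = {mean_all:.2f}, exam takers = {mean_took:.2f}")
```

Dropping the three zeros moves the median only a few points, but it moves the mean by almost 19 points.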
Definition: Sample mean. For a data set of size n,
the sample mean is
\[ \bar{x} = \sum_{i=1}^{n} \frac{x_i}{n} \]
Definition: Population mean. For a finite population
of size N, the population mean is
\[ \mu = \sum_{i=1}^{N} \frac{x_i}{N} \]
0 0 0 55 68 78 79 81 84 87 93 94 98
Example:
Suppose the student who got a 55 instead got a 15.
Would the median change? Would the mean?
Example:
Suppose the 98 is replaced by 980. Would the median
change? Would the mean? By how much?
Note: The mean is much more sensitive to wild
outliers than the median.
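The two replacement examples above can be tried numerically. The sketch below is one possible check; `replace_first` is a hypothetical helper introduced here for illustration.

```python
from statistics import mean, median

grades = [0, 0, 0, 55, 68, 78, 79, 81, 84, 87, 93, 94, 98]

def replace_first(data, old, new):
    """Return a copy of data with the first occurrence of old replaced by new."""
    out = list(data)
    out[out.index(old)] = new
    return out

# For each replacement, record (change in median, change in mean).
shifts = {}
for old, new in [(55, 15), (98, 980)]:
    changed = replace_first(grades, old, new)
    shifts[(old, new)] = (median(changed) - median(grades),
                          mean(changed) - mean(grades))
    d_median, d_mean = shifts[(old, new)]
    print(f"{old} -> {new}: median shift = {d_median}, mean shift = {d_mean:.2f}")
```

In both cases the median stays at 79, while the mean moves by the size of the change divided by 13.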
Exercise:
For registered students at universities in the U.S.,
which is larger: average age or median age?
Repeat for the heights of 12-year-olds.
Repeat for the weights of 12-year-olds.
Repeat for the scores on a college final exam.
Like the median, the mean only captures central
behavior and does not contain information about
the spread of the data.
Physical interpretation of the mean: a “balance.”
Physical interpretation of the median: half the area
lies on each side.
We have just explored the ideas of mean
(average), median and mode. These
measures of central tendency give succinct
numerical summaries of a data set.
Exercise:
Two different groups of 10 students are given identical
quizzes with the following results. Compute the mean,
median, and mode.
Group A
65 66 67 68 71 73 74 77 77 77
Group B
42 54 58 62 67 77 77 85 93 100
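One way to check the exercise is with the standard `statistics` module. This is a sketch; the dictionary layout and names are assumptions made for illustration.

```python
from statistics import mean, median, mode

groups = {
    "A": [65, 66, 67, 68, 71, 73, 74, 77, 77, 77],
    "B": [42, 54, 58, 62, 67, 77, 77, 85, 93, 100],
}

# For each group, store (mean, median, mode).
summary = {name: (mean(scores), median(scores), mode(scores))
           for name, scores in groups.items()}

for name, (avg, med, mod) in summary.items():
    print(f"Group {name}: mean = {avg}, median = {med}, mode = {mod}")
```

Both groups give the same three numbers, even though Group B is far more spread out. None of the three measures can tell the groups apart.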
Standard Deviation
Definition: Sample Standard Deviation. For a data
set of size n, the sample standard deviation is
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
1. Square all of the deviations from average.
2. Sum the squares, then divide by n - 1
   (the degrees of freedom).
3. Take the square root of the result of step 2.
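The three steps can be written out directly in Python. The function name `sample_sd` is an assumption for this sketch, and the sample list is chosen only for illustration.

```python
import math

def sample_sd(data):
    """Sample standard deviation, following the three steps literally."""
    n = len(data)
    avg = sum(data) / n
    squared = [(x - avg) ** 2 for x in data]  # step 1: squared deviations
    variance = sum(squared) / (n - 1)         # step 2: sum, divide by n - 1
    return math.sqrt(variance)                # step 3: square root

data = [1, 4, 6, 7, 8, 10]
s = sample_sd(data)
print(f"s = {s:.4f}")
```

For this list the average is 6, the squared deviations sum to 50, and 50 / 5 = 10, so s is the square root of 10.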
Intuition: The standard deviation gives a measure of
how “spread out” the data is.
Exercise: For each list below, find $\bar{x}$ and s:
(i) 1, 4, 6, 7, 8, 10
(ii) 5, 8, 10, 11, 12, 14
(iii) 3, 12, 18, 21, 24, 30
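To check the answers, note that list (ii) is list (i) with 4 added to every entry, and list (iii) is list (i) with every entry multiplied by 3. The sketch below uses `statistics.stdev`, which computes the sample SD; the dictionary and names are assumptions for illustration.

```python
from statistics import mean, stdev

lists = {
    "(i)": [1, 4, 6, 7, 8, 10],
    "(ii)": [5, 8, 10, 11, 12, 14],     # list (i) shifted up by 4
    "(iii)": [3, 12, 18, 21, 24, 30],   # list (i) scaled by 3
}

# For each list, store (sample mean, sample SD).
results = {label: (mean(values), stdev(values))
           for label, values in lists.items()}

for label, (avg, sd) in results.items():
    print(f"{label}: mean = {avg}, s = {sd:.4f}")
```

Adding a constant shifts the mean but leaves s unchanged. Multiplying by 3 multiplies both the mean and s by 3.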
Example: Each of the following lists has an
average of 50. For which one is the SD of the
numbers the biggest? Smallest?
0, 20, 40, 50, 60, 80, 100
0, 48, 49, 50, 51, 52, 100
0, 1, 2, 50, 98, 99, 100
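After making a guess by eye, the three lists can be compared numerically. This is a minimal sketch using the sample SD; the variable names are assumptions.

```python
from statistics import mean, stdev

candidates = [
    [0, 20, 40, 50, 60, 80, 100],
    [0, 48, 49, 50, 51, 52, 100],
    [0, 1, 2, 50, 98, 99, 100],
]

# Sample SD of each list, in the order given above.
sds = [stdev(values) for values in candidates]

for values, sd in zip(candidates, sds):
    print(f"{values}: mean = {mean(values)}, SD = {sd:.2f}")
```

The list whose entries cluster near 50 has the smallest SD, and the list whose entries sit near the extremes 0 and 100 has the largest.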
Example: For a list of positive numbers,
can the SD ever be larger than the average?
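One way to explore this question is to try a list with a single large value. The list below is a hypothetical example chosen for illustration, not one from the lecture.

```python
from statistics import mean, stdev

# Three small values and one large one, all positive.
values = [1, 1, 1, 97]

avg = mean(values)   # (1 + 1 + 1 + 97) / 4 = 25
sd = stdev(values)   # sqrt((3 * 24**2 + 72**2) / 3) = sqrt(2304) = 48

print(f"average = {avg}, SD = {sd}")
```

Here the SD (48) exceeds the average (25), so the answer is yes: one large outlier among small positive values is enough.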
For large data sets, Microsoft Excel can compute
the mean and standard deviation.
www.math.unt.edu/~allaart/3680/governors.xls
=AVERAGE(A1:E10)
 70000   94780  105000  121391  135000
 77028   95000  105194  122160  144416
 85000   95000  106078  124575  145000
 85000   95000  107482  124855  145132
 85506   95389  110000  125130  150000
 85776   96361  110298  126485  154800
 90000   98331  115345  127303  175000
 93089   98500  117000  131768  175000
 93600  100600  120087  132500  177000
 94532  102704  120303  133162  179000
=STDEV(A1:E10)
Average = $115,953.20
SD = $26,810.27
1) The SD says how far away numbers on a list are from their
average. Most entries on the list will be somewhere
around one SD away from the average. Very few will be
more than two or three SDs away.
2) Roughly 68% of the values will be within one SD of the
average, and 95% will be within two SDs.
(This is only a rule of thumb!)
Average - 2 SD = $62,332.66
Average - 1 SD = $89,142.93
Average        = $115,953.20
Average + 1 SD = $142,763.47
Average + 2 SD = $169,573.74
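As a rough check of the rule of thumb on this data set, the sketch below recomputes the average and sample SD in Python and counts how many of the 50 values fall within one and two SDs of the average. `statistics.stdev` divides by n - 1, as Excel's STDEV does. The helper `fraction_within` is a hypothetical name introduced for illustration.

```python
from statistics import mean, stdev

# The 50 values from the lecture's spreadsheet range A1:E10.
values = [
    70000, 77028, 85000, 85000, 85506, 85776, 90000, 93089, 93600, 94532,
    94780, 95000, 95000, 95000, 95389, 96361, 98331, 98500, 100600, 102704,
    105000, 105194, 106078, 107482, 110000, 110298, 115345, 117000, 120087,
    120303, 121391, 122160, 124575, 124855, 125130, 126485, 127303, 131768,
    132500, 133162, 135000, 144416, 145000, 145132, 150000, 154800, 175000,
    175000, 177000, 179000,
]

avg = mean(values)
sd = stdev(values)  # sample SD (divides by n - 1)

def fraction_within(data, center, radius):
    """Fraction of data lying within `radius` of `center`."""
    return sum(abs(x - center) <= radius for x in data) / len(data)

within_1 = fraction_within(values, avg, sd)
within_2 = fraction_within(values, avg, 2 * sd)

print(f"average = {avg:,.2f}, SD = {sd:,.2f}")
print(f"within 1 SD: {within_1:.0%}, within 2 SDs: {within_2:.0%}")
```

On this list 35 of the 50 values (70%) lie within one SD and 46 of 50 (92%) lie within two SDs, close to the 68% and 95% of the rule but not identical.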
Example: Estimate the mean of the high
temperatures recorded in Denton over the past 30
days. Then estimate the standard deviation.
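Definition: Population standard deviation. For a finite
population of size N, the population standard deviation is
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]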
This formula should be used on the (rare) occasion
that the entire population is known, rather than just a sample.
Definition: Sample variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Definition: Population variance:
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \]
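The sample and population versions differ only in the divisor (n - 1 versus N). Python's `statistics` module provides both, so the difference can be seen on a small list. This is a sketch; the data list is chosen only for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [1, 4, 6, 7, 8, 10]  # squared deviations from the mean sum to 50

s2 = variance(data)       # sample variance: 50 / (6 - 1)
sigma2 = pvariance(data)  # population variance: 50 / 6
s = stdev(data)           # square root of the sample variance
sigma = pstdev(data)      # square root of the population variance

print(f"sample:     variance = {s2:.4f}, SD = {s:.4f}")
print(f"population: variance = {sigma2:.4f}, SD = {sigma:.4f}")
```

The population versions are always slightly smaller for the same list, and the gap shrinks as the list gets longer.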
Grouped Data
Find $\mu$ and $\sigma$ for the age of the population
under 50% of the poverty threshold.
Q: Why aren't we finding $\bar{x}$ and s?
Q: Will our answer be exact?
To handle grouped data, we pretend that all members of each
class are located at the midpoint (called the mark).
Lower age   Upper age   Mark (midpoint)   Frequency
    0          18            9            5,561,000
   18          25           21.5          2,507,000
   25          35           30            2,155,000
   35          45           40            1,792,000
   45          55           50            1,540,000
   55          60           57.5            614,000
   60          65           62.5            537,000
   65          85           75              932,000
Now compute the mean and
standard deviation:
Definition: Grouped mean:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{m} f_i x_i \]
where m = number of groups
Definition: Grouped sample variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{m} f_i (x_i - \bar{x})^2 \]
Definition: Grouped Population variance:
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{m} f_i (x_i - \mu)^2 \]
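The grouped formulas can be applied to the poverty-threshold age data as follows. This is a sketch that treats the eight classes as the entire population, so it uses the population formulas; the tuple layout and variable names are assumptions for illustration.

```python
import math

# (lower age, upper age, frequency) for each class, from the lecture's data.
classes = [
    (0, 18, 5_561_000),
    (18, 25, 2_507_000),
    (25, 35, 2_155_000),
    (35, 45, 1_792_000),
    (45, 55, 1_540_000),
    (55, 60, 614_000),
    (60, 65, 537_000),
    (65, 85, 932_000),
]

# Pretend every member of a class sits at the class midpoint (the mark).
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

N = sum(freqs)
mu = sum(f * x for f, x in zip(freqs, marks)) / N
sigma2 = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks)) / N
sigma = math.sqrt(sigma2)

print(f"N = {N:,}, mu = {mu:.2f}, sigma = {sigma:.2f}")
```

This gives a mean age of roughly 29.2 years and a standard deviation of roughly 20.1 years. Both are approximations, since the true ages within each class are not all at the mark.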