Stat 101L: Lecture 5

Measure of Center
• Sample mean
$$\bar{y} = \frac{\text{Total}}{n} = \frac{\sum y_i}{n}$$

Sample Mean
• Total = 8669
• n = 24
$$\bar{y} = \frac{\text{Total}}{n} = \frac{8669}{24} = 361.2$$

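The arithmetic on this slide can be checked with a short Python sketch. The 24 weights below are read off the stem-and-leaf display on the Quartiles slide later in this lecture, assuming each stem gives the hundreds and tens digits and each leaf the ones digit, in grams. The variable names are illustrative only.

```python
# Can weights in grams, decoded from the lecture's stem-and-leaf display
# (assumed reading: stem "34" with leaf "9" means 349 grams).
weights = [
    349,
    351, 352, 353, 353, 354, 354, 355, 355, 355, 356, 357,
    362, 365, 366, 367, 368, 368, 369, 369,
    370, 370, 373, 378,
]

total = sum(weights)      # sum of all observations
n = len(weights)          # sample size
sample_mean = total / n   # y-bar = Total / n

print(total, n, round(sample_mean, 1))  # 8669 24 361.2
```

The decoded values reproduce the slide's Total and n, which supports that reading of the display.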
Mean or Median?
• The sample mean is the balance point of the distribution.
• The sample median divides the distribution into a lower and an upper half.
• For skewed data, the mean is pulled in the direction of the skew.

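A small sketch may make the last point concrete. Both data sets below are hypothetical, invented for illustration and not taken from the lecture; the second one replaces a typical value with a single large one to create a right skew.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration (not from the lecture).
symmetric = [2, 3, 3, 4, 5, 5, 6]
right_skewed = [2, 3, 3, 4, 4, 5, 28]   # one large value stretches the right tail

for label, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    # Format explicitly so the output does not depend on int/float return types.
    print(f"{label}: mean={mean(data):.1f} median={median(data):.1f}")
```

In the symmetric set the mean and median agree at 4. In the skewed set the median stays at 4 while the mean moves up to 7, toward the long tail.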
Numerical Summaries
• How much variation is there in the data?
• Look for the spread of the distribution.
• What do we mean by “spread”?

Measures of Spread
• Sample Range
– The distance between the minimum and the maximum.
Range = (378 – 349) = 29 grams
– The length of the interval that contains 100% of the data.
– Greatly affected by outliers.

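The sensitivity of the range to a single extreme value can be sketched as follows. The weights are decoded from the lecture's stem-and-leaf display (stem = hundreds and tens, leaf = ones, which is an assumption about how to read it); the 300-gram can is hypothetical and added only to show the effect.

```python
# Can weights in grams, decoded from the lecture's stem-and-leaf display.
weights = [
    349,
    351, 352, 353, 353, 354, 354, 355, 355, 355, 356, 357,
    362, 365, 366, 367, 368, 368, 369, 369,
    370, 370, 373, 378,
]

sample_range = max(weights) - min(weights)   # 378 - 349
print(sample_range)  # 29

# Hypothetical underfilled can, invented for illustration only.
with_outlier = weights + [300]
range_with_outlier = max(with_outlier) - min(with_outlier)   # 378 - 300
print(range_with_outlier)  # 78
```

One added observation out of 25 more than doubles the range, because the range depends only on the two most extreme values.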
Quartiles
• Medians of the lower and upper halves of the data.
• They split the data into fourths (quarters).

Quartiles
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Lower quartile = (354+354)/2 = 354 grams
Upper quartile = (368+369)/2 = 368.5 grams

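The "median of each half" rule used on this slide can be written out as a minimal Python sketch. The function name `quartiles` and the handling of odd sample sizes are assumptions made for illustration; the lecture's data have an even n, so that case does not arise here.

```python
from statistics import median

# Can weights in grams, decoded from the stem-and-leaf display
# (assumed reading: stem = hundreds and tens, leaf = ones).
weights = [
    349,
    351, 352, 353, 353, 354, 354, 355, 355, 355, 356, 357,
    362, 365, 366, 367, 368, 368, 369, 369,
    370, 370, 373, 378,
]

def quartiles(data):
    """Lower and upper quartiles as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    # ASSUMPTION: for an odd sample size the middle value is left out of
    # both halves; the lecture does not say which convention it uses.
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

lower_q, upper_q = quartiles(weights)
print(lower_q, upper_q)  # 354.0 368.5
```

Library quantile routines often use other interpolation conventions, so their results may differ slightly from this hand method.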
Measure of Spread
• Interquartile Range (IQR)
– The distance between the quartiles.
IQR = 368.5 – 354 = 14.5 grams
– The length of the interval that contains the central 50% of the data.

Five Number Summary
• Minimum: 349 grams
• Lower Quartile: 354 grams
• Median: 359.5 grams
• Upper Quartile: 368.5 grams
• Maximum: 378 grams

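The five values above, and the IQR from the previous slide, can be reproduced with a short sketch. It assumes the weights decoded from the stem-and-leaf display and the median-of-halves rule for quartiles; `five_number_summary` is a hypothetical helper name.

```python
from statistics import median

# Can weights in grams, decoded from the lecture's stem-and-leaf display.
weights = [
    349,
    351, 352, 353, 353, 354, 354, 355, 355, 355, 356, 357,
    362, 365, 366, 367, 368, 368, 369, 369,
    370, 370, 373, 378,
]

def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    ordered = sorted(data)
    half = len(ordered) // 2   # size of each half; middle value dropped if n is odd
    return (
        ordered[0],
        median(ordered[:half]),
        median(ordered),
        median(ordered[-half:]),
        ordered[-1],
    )

minimum, lower_q, med, upper_q, maximum = five_number_summary(weights)
print(minimum, lower_q, med, upper_q, maximum)  # 349 354.0 359.5 368.5 378

iqr = upper_q - lower_q   # distance between the quartiles
print(iqr)  # 14.5
```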
Box Plots
• Establish an axis with a scale.
• Draw a box that extends from the lower to the upper quartile.
• Draw a line from the lower quartile to the minimum and another line from the upper quartile to the maximum.

Outlier Box Plots
• Establishes boundaries on what are “usual” values based on the width of the box.
• Values outside the boundaries are flagged as potential outliers.

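The slide does not say how far the boundaries sit from the box. A common convention places them 1.5 box widths (1.5 × IQR) beyond each quartile; the sketch below assumes that multiplier, which may differ from the one used in this course. The function name `flag_outliers` and the 300-gram value are hypothetical.

```python
lower_q, upper_q = 354.0, 368.5   # quartiles from the lecture, in grams
iqr = upper_q - lower_q           # width of the box: 14.5 grams

# ASSUMPTION: 1.5 is the widely used convention; the lecture does not
# state which multiplier it uses.
multiplier = 1.5
lower_fence = lower_q - multiplier * iqr
upper_fence = upper_q + multiplier * iqr
print(lower_fence, upper_fence)  # 332.25 390.25

def flag_outliers(data):
    """Return the values that fall outside the two boundaries."""
    return [y for y in data if y < lower_fence or y > upper_fence]

print(flag_outliers([349, 378]))        # [] (the lecture's min and max are inside)
print(flag_outliers([349, 378, 300]))   # [300] (hypothetical value, flagged)
```

Under this assumption every can in the lecture's sample lies between 332.25 and 390.25 grams, so none would be flagged.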
Contents of Cans of Cola
[Figure: plot of the can contents; axis: Weight (grams), 345 to 385]

Measures of Spread
• Based on the deviation from the sample mean.
• Deviation
$$(y - \bar{y})$$

9-hole Golf Scores
46, 44, 50, 43, 47, 52
$$\bar{y} = \frac{282}{6} = 47 \text{ strokes}$$
[Figure: the scores plotted on an axis from 40 to 55]

Deviations
–4, +5, –3, +3, –1 (the score of 47 has deviation 0)
[Figure: the deviations marked on an axis from 40 to 55]

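The deviations for the golf scores can be computed directly, as in the sketch below (variable names are illustrative). It also shows why the deviations themselves cannot simply be averaged to measure spread: they always sum to zero.

```python
scores = [46, 44, 50, 43, 47, 52]   # 9-hole golf scores from the lecture

y_bar = sum(scores) / len(scores)          # 282 / 6 = 47.0 strokes
deviations = [y - y_bar for y in scores]   # (y - y_bar) for each score

print(deviations)       # [-1.0, -3.0, 3.0, -4.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
```

The positive and negative deviations cancel, which motivates squaring them before averaging.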
Sample Variance
Almost the average squared deviation
$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

Sample Variance
$$s^2 = \frac{16 + 9 + 1 + 25 + 9}{5} = \frac{60}{5} = 12 \text{ strokes}^2$$

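The calculation on this slide can be followed step by step in a short sketch, with a cross-check against `statistics.variance` from the Python standard library, which also divides by n − 1. The variable names are illustrative.

```python
from statistics import variance

scores = [46, 44, 50, 43, 47, 52]   # golf scores from the lecture

n = len(scores)
y_bar = sum(scores) / n                              # 47.0 strokes
squared_devs = [(y - y_bar) ** 2 for y in scores]    # includes 0.0 for the score of 47
s2 = sum(squared_devs) / (n - 1)                     # divide by n - 1, not n

print(sum(squared_devs), s2)    # 60.0 12.0
print(s2 == variance(scores))   # True
```

The score of 47 contributes a squared deviation of 0, which is why only five nonzero terms appear in the slide's sum.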
Sample Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
$$s = \sqrt{12} = 3.46 \text{ strokes}$$

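Taking the square root returns the measure to the original units (strokes rather than strokes squared). A minimal sketch, using the variance of 12 from the previous slide and `statistics.stdev` as a cross-check:

```python
import math
from statistics import stdev

scores = [46, 44, 50, 43, 47, 52]   # golf scores from the lecture

s2 = 12               # sample variance from the previous slide, in strokes^2
s = math.sqrt(s2)     # sample standard deviation, in strokes

print(round(s, 2))                      # 3.46
print(math.isclose(s, stdev(scores)))   # True
```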
Which summary is better?
• For symmetric distributions use the sample mean, $\bar{y}$, and sample standard deviation, $s$.
• For skewed distributions use the five number summary.

Why?
• For symmetric distributions the sample mean and sample median should be approximately equal so either would work.
• We will see in Chapter 6 why the sample standard deviation is best for symmetric distributions.

Why?
• For skewed distributions, the sample mean and standard deviation will be affected by the skew and/or potential outliers.
• The five number summary displays the skew, and its median and quartiles are largely unaffected by outliers.

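The contrast can be seen numerically with a rough sketch. It uses the lecture's golf scores and adds one hypothetical score of 80, invented for illustration; the helper `iqr` is also an assumption and uses the median-of-halves rule for quartiles, dropping the middle value when n is odd.

```python
from statistics import mean, median, stdev

def iqr(data):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

scores = [46, 44, 50, 43, 47, 52]   # golf scores from the lecture
with_outlier = scores + [80]        # 80 is hypothetical, not from the lecture

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(
        f"{label}: mean={mean(data):.2f} sd={stdev(data):.2f} "
        f"median={median(data):.1f} IQR={iqr(data):.1f}"
    )
```

With the extra score, the mean rises from 47 to about 51.7 and the standard deviation from about 3.46 to about 12.9, while the median moves only from 46.5 to 47 and the IQR from 6 to 8.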