Unit 3 Summary Statistics (Descriptive Statistics)

FPP Chapter 4
For one variable:
- Center of distribution: "central value", "typical value"
- Spread of distribution: how variable are the values in a set of data?
- Measure how many / what proportion of observations are above / below a given value.
3-1
Stats
W.01
Summary Statistics
Purposes:
- compact reporting
- easy comparison
Important considerations:
- interpretable
- stable
We will discuss:
• how the statistics are defined
• when each is (in)appropriate
• how to interpret them
• how to compute them
• "guesstimation" techniques
3-2
Example: Hospital Charges
Total charge (in dollars) of the hospital stay for 29 normal deliveries of babies
Charges
1,905 2,324 2,048 2,888 2,907 2,840
2,607 2,823 2,310 2,953 2,138 3,418
4,903 3,729 3,709 5,063 3,932 3,392
3,287 3,819 4,248 2,640 2,921 2,785
2,804 2,955 2,219 2,184 2,681
3-3
Definitions
[Figure: histogram of hospital charges (in dollars); horizontal axis from 1500 to 5500, vertical axis: frequency]
mode = most frequently occurring value = _______________
median = "middle value" = __________________
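mean = sum / # measurements in the data set = __________/___________ = _________
another way to compute the mean:
$$\text{mean} = \frac{1}{n}\sum_{i=1}^{k} y_i f_i$$
that is, (1 / # observations) × [sum of (each distinct observed value × its observed frequency)],
where y_1, ..., y_k are the k distinct observed values, f_i is the observed frequency of y_i, and n is the number of observations.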
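A minimal Python sketch of these definitions, applied to the 29 hospital charges typed in from the table (the variable names are illustrative). It also checks that the frequency-table formula, (1/n) × sum of y_i f_i, gives the same mean as sum / n.

```python
from collections import Counter
from statistics import median

# Total hospital charges (dollars) for 29 normal deliveries, copied from the table
charges = [
    1905, 2324, 2048, 2888, 2907, 2840,
    2607, 2823, 2310, 2953, 2138, 3418,
    4903, 3729, 3709, 5063, 3932, 3392,
    3287, 3819, 4248, 2640, 2921, 2785,
    2804, 2955, 2219, 2184, 2681,
]

n = len(charges)            # number of measurements
total = sum(charges)        # sum of all charges
mean = total / n            # mean = sum / # measurements
mid = median(charges)       # middle value of the sorted list (n is odd here)

# Mode: count how often each distinct value occurs
freq = Counter(charges)     # maps each distinct value y_i to its frequency f_i
top_count = max(freq.values())

# Alternative mean: (1/n) * sum over distinct values of y_i * f_i
mean_from_freq = sum(y * f for y, f in freq.items()) / n

print(f"n = {n}, sum = {total}, mean = {mean:.2f}")
print(f"median = {mid}")
print(f"largest frequency of any single value = {top_count}")
print(f"mean from frequency table = {mean_from_freq:.2f}")
```

In this data set every charge occurs exactly once, so the raw list has no single most frequent value; the mode is then usually read from the histogram as the class with the tallest bar.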
3-4
Locating These Summary Statistics on a Histogram
[Figure: histogram of hospital charges (in dollars); horizontal axis from 1500 to 5500, vertical axis: frequency]
mode:
median:
mean:
comparing mean & median:
For skewed histograms, the mean can be misleading as a "typical value": it is pulled toward the long tail, while the median is much less affected.
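To see the skewness point numerically, here is a sketch using the same 29 charges. The change to 50,630 is purely hypothetical (a pretend data-entry error), chosen only to show that one extreme value moves the mean a lot while leaving the median where it was.

```python
from statistics import mean, median

# Total hospital charges (dollars) for 29 normal deliveries, copied from the table
charges = [
    1905, 2324, 2048, 2888, 2907, 2840,
    2607, 2823, 2310, 2953, 2138, 3418,
    4903, 3729, 3709, 5063, 3932, 3392,
    3287, 3819, 4248, 2640, 2921, 2785,
    2804, 2955, 2219, 2184, 2681,
]

# The few large charges on the right pull the mean above the median
print(f"mean = {mean(charges):.2f}, median = {median(charges)}")

# Hypothetical: the largest charge (5,063) is mistyped as 50,630
distorted = [50630 if c == 5063 else c for c in charges]
print(f"after one extreme value: mean = {mean(distorted):.2f}, median = {median(distorted)}")
```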
3-5
3-6
Event Day Abnormal Returns
(ref. "Marketing Science", Fall 1987, vol 6, no 4, pages 320-335, "Does It Pay to Change Your Company's Name?")
 -1.84  -0.31   0.02   0.30   0.53   1.09
 -1.38  -0.24   0.06   0.34   0.55   1.12
 -1.00  -0.24   0.09   0.36   0.58   1.23
 -0.59  -0.20   0.10   0.39   0.78   1.43
 -0.57  -0.16   0.13   0.40   0.81   1.50
 -0.56  -0.06   0.21   0.41   0.96   1.60
 -0.51  -0.05   0.23   0.43   0.98   1.64
 -0.44  -0.02   0.24   0.45   0.99   1.79
 -0.39  -0.02   0.25   0.48   1.00
 -0.33  -0.01   0.29   0.50   1.03
3-7
mode = most frequently occurring value =______
median = "middle value" = __________
mean = "average" = (sum of values in list)/(# values in list) = _____ / _____ = _____
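A sketch computing the three statistics for the 58 abnormal returns, typed in from the table in increasing order. Two values, -0.24 and -0.02, each occur twice, so this list has no single mode; `statistics.multimode` (Python 3.8+) returns every tied value.

```python
from statistics import mean, median, multimode

# Event-day abnormal returns (58 values), listed in increasing order
returns = [
    -1.84, -1.38, -1.00, -0.59, -0.57, -0.56, -0.51, -0.44, -0.39, -0.33,
    -0.31, -0.24, -0.24, -0.20, -0.16, -0.06, -0.05, -0.02, -0.02, -0.01,
    0.02, 0.06, 0.09, 0.10, 0.13, 0.21, 0.23, 0.24, 0.25, 0.29,
    0.30, 0.34, 0.36, 0.39, 0.40, 0.41, 0.43, 0.45, 0.48, 0.50,
    0.53, 0.55, 0.58, 0.78, 0.81, 0.96, 0.98, 0.99, 1.00, 1.03,
    1.09, 1.12, 1.23, 1.43, 1.50, 1.60, 1.64, 1.79,
]

modes = multimode(returns)   # every value tied for the highest count
mid = median(returns)        # n is even: average of the two middle values
avg = mean(returns)          # sum of values / number of values

print(f"n = {len(returns)}, modes = {modes}, median = {mid:.2f}, mean = {avg:.4f}")
```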
pth percentile = the value with p percent of the list less than (or equal to) it and 100 - p percent greater than it
10th percentile = _____
25th percentile = _____
80th percentile = _____
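Percentile conventions differ when p% of n is not a whole number. The sketch below assumes the "nearest-rank" convention (take the smallest value that has at least p% of the list at or below it), which is one common reading of the definition above; spreadsheet and library functions often interpolate between neighboring values instead, so their answers can differ slightly. The `percentile` helper is illustrative.

```python
import math

def percentile(data, p):
    """Nearest-rank p-th percentile: smallest value with at least p% of the list <= it."""
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position in the sorted list
    return ordered[rank - 1]

# Event-day abnormal returns (58 values) from the table
returns = [
    -1.84, -1.38, -1.00, -0.59, -0.57, -0.56, -0.51, -0.44, -0.39, -0.33,
    -0.31, -0.24, -0.24, -0.20, -0.16, -0.06, -0.05, -0.02, -0.02, -0.01,
    0.02, 0.06, 0.09, 0.10, 0.13, 0.21, 0.23, 0.24, 0.25, 0.29,
    0.30, 0.34, 0.36, 0.39, 0.40, 0.41, 0.43, 0.45, 0.48, 0.50,
    0.53, 0.55, 0.58, 0.78, 0.81, 0.96, 0.98, 0.99, 1.00, 1.03,
    1.09, 1.12, 1.23, 1.43, 1.50, 1.60, 1.64, 1.79,
]

for p in (10, 25, 80):
    print(f"{p}th percentile = {percentile(returns, p)}")
```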
3-8
Histogram for Abnormal Returns
[Figure: histogram of the abnormal returns; horizontal axis: RETURNS, from -2.0 to 4.0; two vertical scales, 0.1 to 0.4 and 5 to 20]
3-9
Does This Statistic Make Sense?
Some summary statistics make sense only for certain types of data.
mean:
median:
mode:
3-10
Water Watch
3-11
From Aug 1-22, average consumption was 223.7 million gallons per day.
From Aug 1-25, average consumption was 224.4 million gallons per day.
Q1: Was average consumption higher during Aug 1-22 or during Aug 23-25?
Q2: What was the total amount of water consumed Aug 23-25?
Q3: What was the average daily consumption Aug 23-25?
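The key step in all three questions is that total = average × number of days. A short sketch of that arithmetic (the variable names are made up for illustration):

```python
# Averages from the article (million gallons per day)
avg_1_22, days_1_22 = 223.7, 22
avg_1_25, days_1_25 = 224.4, 25

total_1_22 = avg_1_22 * days_1_22        # total = average x number of days
total_1_25 = avg_1_25 * days_1_25
total_23_25 = total_1_25 - total_1_22    # water used on Aug 23-25 only
avg_23_25 = total_23_25 / 3              # three days: Aug 23, 24, 25

print(f"Aug 1-22 total:  {total_1_22:.1f} million gallons")
print(f"Aug 1-25 total:  {total_1_25:.1f} million gallons")
print(f"Aug 23-25 total: {total_23_25:.1f} million gallons, average {avg_23_25:.1f} per day")
```

Because adding Aug 23-25 pulled the overall average up from 223.7 to 224.4, those three days must have averaged more than 223.7.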
3-12
Baseball Batting Averages
Suppose batting average = (# hits / # at bats) x 1000.
Before the game starts, a player has a batting average of 250.
- first at bat: strikes out
- new batting average = 200
Q1: How many times has this batter been up?
Another player starts the game with a batting average of 500. After his first at bat, his new batting average is 524.
Q2: Did he get a hit?
Q3: How many times has this batter been up?
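One way to check answers here is a brute-force search. The sketch assumes averages are rounded to the nearest whole number after multiplying by 1000, and caps the search at a hypothetical 200 previous at bats; `consistent_records` is an illustrative helper, not a standard function.

```python
def consistent_records(avg_before, avg_after, hit, max_at_bats=200):
    """List (hits, at_bats) before the new at bat that match both rounded averages."""
    matches = []
    for at_bats in range(1, max_at_bats + 1):
        for hits in range(at_bats + 1):
            new_hits = hits + (1 if hit else 0)
            before = round(1000 * hits / at_bats)
            after = round(1000 * new_hits / (at_bats + 1))
            if before == avg_before and after == avg_after:
                matches.append((hits, at_bats))
    return matches

print(consistent_records(250, 200, hit=False))  # first player: strikeout
print(consistent_records(500, 524, hit=True))   # second player, assuming a hit
print(consistent_records(500, 524, hit=False))  # second player, assuming no hit
```

Under those assumptions, the first search finds only 1 hit in 4 at bats, so the first batter has now been up 5 times. For the second player the no-hit search comes back empty and the hit search finds only 10 hits in 20 at bats, so he got a hit and has now been up 21 times.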
3-13
3-14
Measures of Location & Spread of a Data Set
LOCATION
mean
median
mode
SPREAD
standard deviation (SD)
range
variance
3-15
Range
RANGE:
(largest measurement) - (smallest measurement)
example:
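For example, the range of the 29 hospital charges from the earlier table, as a short Python sketch:

```python
# Total hospital charges (dollars) for 29 normal deliveries, copied from the table
charges = [
    1905, 2324, 2048, 2888, 2907, 2840,
    2607, 2823, 2310, 2953, 2138, 3418,
    4903, 3729, 3709, 5063, 3932, 3392,
    3287, 3819, 4248, 2640, 2921, 2785,
    2804, 2955, 2219, 2184, 2681,
]

data_range = max(charges) - min(charges)  # named to avoid shadowing Python's built-in range()
print(f"range = {max(charges)} - {min(charges)} = {data_range} dollars")
```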
3-16
Deviation from Average
definition:
deviation from average = data value - average
note:
A deviation can be zero.
[Figure: number line marking the data values 1, 2, 5, 7, 10]
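A sketch using the data values 1, 2, 5, 7, 10 marked on the number line. The average is 5, so the value 5 has a deviation of zero; deviations from the average also always add up to zero.

```python
values = [1, 2, 5, 7, 10]                  # data values on the number line
average = sum(values) / len(values)        # (1 + 2 + 5 + 7 + 10) / 5
deviations = [v - average for v in values] # deviation = data value - average

print(f"average = {average}")
print(f"deviations = {deviations}, sum = {sum(deviations)}")
```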
3-17
Standard Deviation of a list of numbers
definition:
standard deviation = SD = rms size of the deviations from average
$$\text{SD} = \sqrt{\text{avg. of }(\text{deviations from avg.})^2}$$
3-18
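rms (root mean square) size of a list of numbers
root-mean-square (rms) operation: read the name right to left. Square each entry, take the mean of the squares, then take the square root.
[Figure: number line marking the data values 1, 2, 5, 7, 10 and their deviations from average]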
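The rms operation and the SD definition can be sketched in a few lines of Python (the `rms` helper is illustrative). `statistics.pstdev` divides by the number of entries, like the definition above, so it is used here as a cross-check on the same five values.

```python
import math
from statistics import pstdev

def rms(numbers):
    """Root-mean-square size: square each entry, average the squares, take the square root."""
    return math.sqrt(sum(x * x for x in numbers) / len(numbers))

values = [1, 2, 5, 7, 10]
average = sum(values) / len(values)
deviations = [v - average for v in values]

sd = rms(deviations)  # SD = rms size of the deviations from average
print(f"SD = {sd:.3f}")
print(f"statistics.pstdev: {pstdev(values):.3f}")
```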
3-19
Standard Deviation
Try another list of numbers. Find the standard deviation (rms size of the deviations from average) for this list of numbers:
2, -6, 12, 4, 6
I. Find the average of this list of numbers.
II. Find the deviation of each value from this average.
III. Find the rms size of the list of deviations.
[Figure: number line from -6 to 12 for plotting the data]
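The three steps, written out as a Python sketch for the list 2, -6, 12, 4, 6:

```python
import math

data = [2, -6, 12, 4, 6]

# I. average of the list
average = sum(data) / len(data)
# II. deviation of each value from the average
deviations = [x - average for x in data]
# III. rms size of the deviations (this is the SD)
sd = math.sqrt(sum(d ** 2 for d in deviations) / len(deviations))

print(f"average = {average}")
print(f"deviations = {[round(d, 1) for d in deviations]}")
print(f"SD = {sd:.2f}")
```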
3-20
Standard Deviation
The STANDARD DEVIATION (SD) OF A DATA SET measures how far away numbers are from their average.
Most entries on the list will be somewhere around one SD away from the average. Very few will be more than two or three SDs away.
3-21
Interpreting the Standard Deviation
* Roughly 68% of the entries on a list (roughly 2/3 of the entries) are within one SD of the average.
* The other 32% (approximately 1/3) are further away.
** Roughly 95% (19 out of 20) are within two SDs of the average.
** The other 5% are further away.
The 2/3 rule is true for most data sets. The 95% rule is true for many data sets, but not all.
3-22
Stats
Delivery Times Example
TIME IN DAYS
27 43 43 44 47 49 50 54 58 65
68 71 71 71 73 73 74 75 76 77
79 80 81 83 84 84 84 86 88 88
91 91 93 94 94 94 97 97 103 106
107 108 108 116 120 120 122 123 127 128
Class Limits   Tallies        Frequency
25-34          |              1
35-44          |||            3
45-54          ||||           4
55-64          |              1
65-74          ||||| |||      8
75-84          ||||| |||||    10
85-94          ||||| ||||     9
95-104         |||            3
105-114        ||||           4
115-124        |||||          5
125-134        ||             2
3-23
Delivery Times Continued
Days Elapsed Between Order Date and Delivery Date for 50 Orders
[Figure: relative frequency histogram; horizontal axis: Elapsed Time to Delivery (days), 25 to 125; vertical axis: rel. freq., .04 to .20]
average (mean) = ______ days
median = ______ days
SD = ______ days
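A sketch computing these blanks for the 50 delivery times. Python's `statistics` module offers two SDs: `pstdev` divides by n (the rms definition used earlier), while `stdev` divides by n - 1. Which one the slides use is an assumption to check; the next slide's interval of 59.4 to 108.0 days is 83.7 ± 24.3, which matches the n - 1 version.

```python
from statistics import mean, median, pstdev, stdev

# Days elapsed between order date and delivery date for 50 orders
days = [
    27, 43, 43, 44, 47, 49, 50, 54, 58, 65,
    68, 71, 71, 71, 73, 73, 74, 75, 76, 77,
    79, 80, 81, 83, 84, 84, 84, 86, 88, 88,
    91, 91, 93, 94, 94, 94, 97, 97, 103, 106,
    107, 108, 108, 116, 120, 120, 122, 123, 127, 128,
]

print(f"mean = {mean(days):.1f} days")
print(f"median = {median(days)} days")
print(f"SD, dividing by n     = {pstdev(days):.1f} days")
print(f"SD, dividing by n - 1 = {stdev(days):.1f} days")
```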
3-24
Delivery Times - 3
“The 2/3 Rule” says that roughly 2/3 or 68% of the entries on a list are within one SD of the average.
Actually, in this data set, 34 out of 50 deliveries took between 59.4 and 108.0 days.
34/50 = 0.68 = 68%
“The 95% Rule” says that roughly 95% of the entries on a list are within two SDs of the average.
Actually, 49 out of 50 deliveries took between 35.1 and 132.3 days.
49/50 = 0.98 = 98%
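A sketch that counts how many deliveries fall within one and two SDs of the average. It assumes the n - 1 SD (`statistics.stdev`), which appears to reproduce the interval endpoints above; `count_within` is an illustrative helper.

```python
from statistics import mean, stdev

# Days elapsed between order date and delivery date for 50 orders
days = [
    27, 43, 43, 44, 47, 49, 50, 54, 58, 65,
    68, 71, 71, 71, 73, 73, 74, 75, 76, 77,
    79, 80, 81, 83, 84, 84, 84, 86, 88, 88,
    91, 91, 93, 94, 94, 94, 97, 97, 103, 106,
    107, 108, 108, 116, 120, 120, 122, 123, 127, 128,
]

avg = mean(days)
sd = stdev(days)  # assumption: n - 1 divisor, matching the 59.4 to 108.0 interval

def count_within(data, center, spread, k):
    """Number of entries within k SDs of the average (endpoints included)."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= x <= high for x in data)

for k in (1, 2):
    c = count_within(days, avg, sd, k)
    print(f"within {k} SD: {c} of {len(days)} = {c / len(days):.0%}")
```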
3-25
3-26
Guesstimating the SD
Middle 2/3 Rule
1. Locate the middle 2/3 of the data.
2. The range of the middle 2/3 of the data is approximately 2 SDs. So, 1/2 of this range is approximately 1 SD.
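A sketch of the guesstimate for the delivery times. Where exactly to cut off the bottom 1/6 and top 1/6 is a judgment call; the cut positions below (ceil(n/6) and ceil(5n/6) in the sorted list) are an assumption.

```python
from statistics import stdev

# Days elapsed between order date and delivery date for 50 orders
days = [
    27, 43, 43, 44, 47, 49, 50, 54, 58, 65,
    68, 71, 71, 71, 73, 73, 74, 75, 76, 77,
    79, 80, 81, 83, 84, 84, 84, 86, 88, 88,
    91, 91, 93, 94, 94, 94, 97, 97, 103, 106,
    107, 108, 108, 116, 120, 120, 122, 123, 127, 128,
]

n = len(days)
ordered = sorted(days)
# Assumed cut-offs: drop roughly the bottom 1/6 and top 1/6, keep the middle 2/3
lo_rank = -(-n // 6)        # ceil(n / 6), 1-based position of the lower cut
hi_rank = -(-5 * n // 6)    # ceil(5n / 6), 1-based position of the upper cut
lower, upper = ordered[lo_rank - 1], ordered[hi_rank - 1]

guess = (upper - lower) / 2  # middle 2/3 spans about 2 SDs, so half of it is about 1 SD
print(f"middle 2/3 runs from {lower} to {upper} days; SD guess = {guess}")
print(f"computed SD = {stdev(days):.1f}")
```

Here the middle 2/3 runs from 58 to 108 days, so the guess is 25 days, close to the computed SD of about 24.3 days.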
3-27
Variance
The variance of a list of numbers is the SD squared. That is, the SD is the square root of the variance.
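A quick check of the relationship, using `statistics.pvariance` and `statistics.pstdev` (both divide by n) on the earlier list 2, -6, 12, 4, 6:

```python
import math
from statistics import pstdev, pvariance

data = [2, -6, 12, 4, 6]
var = pvariance(data)  # average squared deviation from the average
sd = pstdev(data)      # rms size of the deviations

print(f"variance = {var}, SD = {sd:.4f}")
print(math.isclose(sd ** 2, var), math.isclose(math.sqrt(var), sd))
```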
3-28
z-score
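The z-score says how many SDs above (+) or below (-) the average a value is.
The sample z-score for a measurement x is
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ is the sample average and $s$ is the sample SD.
The population z-score for a measurement x is
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ is the population average and $\sigma$ is the population SD.
example: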
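The sample and population versions differ only in which average and SD are plugged in. A small sketch, assuming the delivery-time summaries from the earlier slides (average 83.7 days, SD 24.3 days); `z_score` is an illustrative helper.

```python
def z_score(value, average, sd):
    """How many SDs a value lies above (+) or below (-) the average."""
    return (value - average) / sd

# Assumed delivery-time summaries: average 83.7 days, SD 24.3 days
for value in (27, 59.4, 83.7, 128):
    print(f"{value} days: z = {z_score(value, 83.7, 24.3):.2f}")
```

For example, the 27-day delivery has z of about -2.33, more than two SDs below the average.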
3-29
Interpreting z-scores
Interpretation of z-Scores for "Mound-Shaped" Distributions of Data
1. Approximately 68% of the measurements will have a z-score between -1 and +1.
2. Approximately 95% of the measurements will have a z-score between -2 and +2.
3. All or almost all of the measurements will have a z-score between -3 and +3.
3-30
Wonderlic Scores
3-31
USC had an average team score of 20.3. What is their z-score? Is this value extreme among NCAA Division I teams?
How about Michigan State, whose average team score is 16.6? Find their z-score and interpret it.
How about Stanford, whose average team score is 28.2? Find their z-score and interpret it.
3-32