Class 06: Descriptive Statistics

Random Variable                 Probability Distribution
X = Amt in next bottle          N(μ=10.2, σ=0.16)
X = #G in 4                     B(n=4, p=.5)
X = total of 2 tossed dice      [chart over the values 2 3 4 5 6 7 8 9 10 11 12]
We must admit that we cannot know exactly what value X will take…
…so that we can do the intelligent thing and talk about something we CAN know, the probability distribution of X.
Summary Characteristics
• Mean
• Median
• Mode
• Std dev
• Variance
• skew
There are summary characteristics of any probability distribution… But
knowing these summary measures does not replace our need to know the
probability distribution.
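As an illustration of "something we CAN know", the sketch below builds the probability distribution of X = total of 2 tossed dice by listing all 36 outcomes. It assumes the dice are fair and six-sided, which the slide does not state; the name two_dice_pmf is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Return {total: probability} for the total of two fair six-sided dice."""
    # Assumption: each of the 36 (die1, die2) outcomes is equally likely.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


pmf = two_dice_pmf()

# Summary characteristics computed from the distribution itself.
mean = sum(x * p for x, p in pmf.items())                    # probability-weighted average
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # expected squared distance from mean
mode = max(pmf, key=pmf.get)                                 # most likely value

for total, p in pmf.items():
    print(f"P(X={total:2d}) = {p}")
print("mean =", mean, " mode =", mode, " variance =", variance)
```

The three summary numbers fall out of the distribution, but the distribution cannot be rebuilt from them, which is the point of the slide.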
EMBS: 3.1, 3.2, first part of 3.3
Characteristics of
probability distributions
• Measures of Location
– Mean
– Median
– Mode
• Measures of Variability
– Standard Deviation
– Variance
• Measure of Shape
– skewness
Descriptive Statistics
(for numerical data)
• Measures of Location
– Sample Mean
– Sample Median
– Sample Mode
• Measures of Variability
– Sample StDev
– Sample Variance
• Measure of Shape
– Sample skewness
A positively-skewed pdf
• Mode is the most likely value
• P(X<median) = P(X>median) = 0.5
• Mean is the probability-weighted average
• Skewness > 0
http://dept.econ.yorku.ca/~jbsmith/ec2500_1998/lecture9/Lecture9.html
A negatively-skewed pdf
Skewness < 0
An exhibit at MOMA invites visitors to mark their heights on a wall. A normal
distribution results:
Well, not quite. The distribution is actually slightly negatively skewed by the
confounding presence of children, who are obviously shorter than adults - you
can see this in the great number of names well below the central band which
are not mirrored by names higher up. Rest assured, however, that the ex-children distribution is itself Gaussian.
http://www.thisisthegreenroom.com/2009/bell-curves-in-action/
The Normal pdf
Mean = μ
median = μ
mode = μ
Skewness = 0
http://www.comfsm.fm/~dleeling/statistics/notes06.html
Measures of Variability
σ = 0.7
σ = 1.0
σ = 1.5
http://www.google.com/imgres?q=standard+deviation+curve&hl=en&gbv=2&biw=1226&bih=866&tbm=isch&tbnid=pppxDi8aC37y8M:&imgrefurl=http://www.comfsm.fm/~dleeling/statistics/notes06.html&docid=Hu1RMsiu0MevM&imgurl=http://www.comfsm.fm/~dleeling/statistics/normal_curve_diff_sx.gif&w=401&h=322&ei=9qAqT8KXAcPptgfC3uX0Dw&zoom=1&iact=hc&vpx=748&vpy=508&dur=1013&hovh=201&hovw=251&tx=142&ty=111&sig=106136691078404837864&page=1&tbnh=149&tbnw=186&start=0&ndsp=20&ved=1t:429,r:13,s:0
Skewed pdfs can also have different
standard deviations
Which pdf has the largest σ?
Pdfs can have different means, but
identical standard deviations
Which pdf has the largest σ?
Which pdf has the largest μ?
Characteristics of
probability distributions
• Measures of Location
– Mean: probability-weighted average
– Median: 50% point
– Mode: most likely
• Measures of Variability
– Standard Deviation
– Variance: expected squared distance from mean
• Measure of Shape
– skewness: neg if skewed left, 0 if symmetric, pos if skewed right.
Descriptive Statistics
(for numerical data)
• Measures of Location
– Sample Mean
– Sample Median
– Sample Mode
• Measures of Variability
– Sample StDev
– Sample Variance
• Measure of Shape
– Sample skewness
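The definitions on the probability-distribution side can be tried out on B(n=4, p=.5). This is only a sketch: binomial_pmf and summarize are names made up here, the median is taken as the smallest value whose cumulative probability reaches 0.5 (one common convention for discrete distributions), and skewness is taken as the expected cubed standardized distance from the mean.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Return {k: P(X = k)} for a Binomial(n, p) random variable."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def summarize(pmf):
    """Summary characteristics of a discrete probability distribution."""
    mean = sum(x * p for x, p in pmf.items())                    # probability-weighted average
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # expected squared distance from mean
    sd = sqrt(variance)
    skewness = sum(((x - mean) / sd) ** 3 * p for x, p in pmf.items())
    mode = max(pmf, key=pmf.get)                                 # most likely value
    # Assumed convention: median = smallest x with cumulative probability >= 0.5.
    cumulative, median = 0.0, None
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= 0.5:
            median = x
            break
    return {"mean": mean, "median": median, "mode": mode,
            "sd": sd, "variance": variance, "skewness": skewness}


symmetric = summarize(binomial_pmf(4, 0.5))
print(symmetric)

# Hypothetical contrast, not from the slides: p = 0.2 pushes the mass toward 0,
# leaving a long right tail, so the skewness comes out positive.
right_skewed = summarize(binomial_pmf(4, 0.2))
print(right_skewed["skewness"])
```

For B(4, .5) the mean, median and mode all coincide at 2 and the skewness is 0, matching the "0 if symmetric" rule above.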
Descriptive Statistics
(for numerical data)
• Measures of Location
– Sample Mean: =average()
– Sample Median: =median()
– Sample Mode: =mode()
• Measures of Variability
– Sample StDev: =stdev()
– Sample Variance: =var()
• Measure of Shape
– Sample skewness: =skew()
GET THEM ALL USING DATA ANALYSIS, DESCRIPTIVE STATISTICS, SUMMARY STATISTICS
The output also includes COUNT and RANGE.
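For readers working outside Excel, here is a rough Python counterpart to the worksheet functions =average(), =median(), =mode(), =stdev(), =var() and =skew(). It is a sketch under assumptions: the helper names sample_mode and sample_skew are invented here, the skewness uses the bias-adjusted formula n/((n-1)(n-2)) · Σ((x - mean)/s)³, and the data are only the six Ht values that are visible in the class data set, not the full 69.

```python
import statistics
from collections import Counter


def sample_mode(data):
    """Most frequent value, or None when no value repeats (Excel shows #N/A)."""
    value, count = Counter(data).most_common(1)[0]
    return value if count > 1 else None


def sample_skew(data):
    """Bias-adjusted sample skewness; needs at least 3 points and s > 0."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


heights = [67, 63, 70, 76, 72, 64]  # the six Ht values shown on the "Our Data" slide

summary = {
    "average": statistics.mean(heights),
    "median": statistics.median(heights),
    "mode": sample_mode(heights),
    "stdev": statistics.stdev(heights),    # divides by n - 1
    "var": statistics.variance(heights),   # divides by n - 1
    "skew": sample_skew(heights),
    "count": len(heights),
    "range": max(heights) - min(heights),
}
for name, value in summary.items():
    print(f"{name:8s} {value}")
```

Note that statistics.stdev and statistics.variance are the sample versions (divisor n - 1); statistics.pstdev and statistics.pvariance would give the population versions instead.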
The sample standard deviation
Understanding sample standard deviation
[Dot plot: ten X marks on an axis running from 0 to 20]
        0      10
        4      14
       10      20
       16      26
       20      30
stdev   8.25   8.25
It measures variability about the mean.
        0      0      0
        4      6      8
       10     10     10
       16     14     12
       20     20     20
stdev   8.25   7.62   7.21
All the data contribute to the measure.
        0      0
        2     10
        4     16
       10     18
       20     20
stdev   8.07   8.07
It measures variability in either direction.
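The stdev figures on these three slides can be checked from the definition: the square root of the sum of squared distances from the sample mean, divided by n - 1. The sketch below does that; the function name sample_stdev and the labels given to the columns are made up for this example.

```python
from math import sqrt
from statistics import stdev


def sample_stdev(data):
    """Sample standard deviation computed directly from the definition."""
    n = len(data)
    mean = sum(data) / n
    squared_distances = [(x - mean) ** 2 for x in data]
    return sqrt(sum(squared_distances) / (n - 1))


columns = {
    # Slide 1: adding 10 to every value moves the mean but not the spread.
    "base": [0, 4, 10, 16, 20],
    "base + 10": [10, 14, 20, 26, 30],
    # Slide 2: pulling interior points toward the mean lowers the stdev.
    "pulled in": [0, 6, 10, 14, 20],
    "pulled in more": [0, 8, 10, 12, 20],
    # Slide 3: the second column is the first one reflected (x -> 20 - x).
    "low-heavy": [0, 2, 4, 10, 20],
    "high-heavy": [0, 10, 16, 18, 20],
}

results = {name: round(sample_stdev(data), 2) for name, data in columns.items()}
for name, value in results.items():
    print(f"{name:15s} stdev = {value}")

# The hand-rolled version should agree with the standard library.
agrees = all(abs(sample_stdev(d) - stdev(d)) < 1e-12 for d in columns.values())
print("matches statistics.stdev:", agrees)
```

Because every distance is squared, a point 5 below the mean counts exactly as much as a point 5 above it, which is why the reflected column has the same stdev.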
Our Data
Section  ND ID      Gender (M=1)  HS Stat?  Ht  Value
4        901526453  0             0         67  0
4        901533561  0             1         63  0.06
4        901536075  1             0         70  0
...      ...        ...           ...       ... ...
5        901636399  1             0         76  0
5        901643915  1             0         72  0.1
5        901643995  0             0         64  0
Data/DataAnalysis/DescriptiveStatistics
SummaryStatistics
                    Section  ND ID        Gender (M=1)  HS Stat?  Ht       Value
Mean                4.493    901589800.3  0.609         0.217     69.351   0.185
Standard Error      0.061    4992.821     0.059         0.050     0.477    0.056
Median              4        901596170    1             0         70       0
Mode                4        #N/A         1             0         71       0
Standard Deviation  0.504    41473.487    0.492         0.415     3.959    0.465
Sample Variance     0.254    1720050147   0.242         0.173     15.673   0.216
Kurtosis            -2.060   0.555        -1.847        -0.039    -0.793   6.385
Skewness            0.030    -0.581       -0.455        1.401     -0.307   2.706
Range               1        228090       1             1         16       2
Minimum             4        901444465    0             0         60       0
Maximum             5        901672555    1             1         76       2
Sum                 310      62209696222  42            15        4785.25  12.75
Count               69       69           69            69        69       69
Data/DataAnalysis/DescriptiveStatistics
SummaryStatistics
                    Section  ND ID
Mean                4.493    901589800.3
Standard Error      0.061    4992.821
Median              4        901596170
Mode                4        #N/A
Standard Deviation  0.504    41473.487
Sample Variance     0.254    1720050147
Kurtosis            -2.060   0.555
Skewness            0.030    -0.581
Range               1        228090
Minimum             4        901444465
Maximum             5        901672555
Sum                 310      62209696222
Count               69       69
Data/DataAnalysis/DescriptiveStatistics
SummaryStatistics
                    Gender (M=1)  HS Stat?
Mean                0.609         0.217
Standard Error      0.059         0.050
Median              1             0
Mode                1             0
Standard Deviation  0.492         0.415
Sample Variance     0.242         0.173
Kurtosis            -1.847        -0.039
Skewness            -0.455        1.401
Range               1             1
Minimum             0             0
Maximum             1             1
Sum                 42            15
Count               69            69
Data/DataAnalysis/DescriptiveStatistics
SummaryStatistics
                    Ht       Value
Mean                69.351   0.185
Standard Error      0.477    0.056
Median              70       0
Mode                71       0
Standard Deviation  3.959    0.465
Sample Variance     15.673   0.216
Kurtosis            -0.793   6.385
Skewness            -0.307   2.706
Range               16       2
Minimum             60       0
Maximum             76       2
Sum                 4785.25  12.75
Count               69       69
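The Gender (M=1) column is a 0/1 variable with Sum 42 and Count 69, so its values can be rebuilt as 42 ones and 27 zeros and the whole summary recomputed. The sketch below does that. Its assumptions: the function name describe is made up, Standard Error is taken as stdev/√n, and skewness and kurtosis use the bias-adjusted sample formulas commonly documented for Excel's SKEW and KURT.

```python
from collections import Counter
from math import sqrt


def describe(data):
    """Summary statistics in the layout of the slides (needs n >= 4, stdev > 0)."""
    n = len(data)
    total = sum(data)
    mean = total / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    sd = sqrt(variance)
    z = [(x - mean) / sd for x in data]  # standardized distances from the mean
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    ordered = sorted(data)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    value, count = Counter(data).most_common(1)[0]
    return {
        "Mean": mean,
        "Standard Error": sd / sqrt(n),
        "Median": median,
        "Mode": value if count > 1 else None,  # None plays the role of #N/A
        "Standard Deviation": sd,
        "Sample Variance": variance,
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": total,
        "Count": n,
    }


# Rebuilt from the slide: Sum = 42 and Count = 69 for a 0/1 column.
gender = [1] * 42 + [0] * 27
stats = describe(gender)
for label, value in stats.items():
    print(f"{label:20s} {value:.3f}" if isinstance(value, float) else f"{label:20s} {value}")
```

The other columns cannot be rebuilt this way because only six of their 69 rows are shown.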
Fill Test Data
Normal(10.2,0.16)?
EXHIBIT 2
LOREX PHARMACEUTICALS
Filling Line Test Results with Target = 10.2
9.89
10.17
10.29
10.00
10.04
10.35
10.05
10.19
10.21
9.79
10.53
10.47
10.24
10.17
10.15
10.06
10.13
10.04
10.20
10.37
10.24
10.03
10.01
10.20
10.41
10.17
10.35
10.06
10.19
10.17
10.07
10.27
10.13
10.24
10.14
9.84
10.36
10.11
10.42
10.16
10.21
10.01
10.25
10.44
10.44
10.44
10.13
10.29
10.53
10.32
10.16
10.21
10.09
10.02
10.32
10.14
10.11
10.20
10.35
9.96
10.30
10.33
10.36
10.17
10.15
10.14
10.07
10.37
10.40
10.25
10.24
10.03
10.20
10.04
10.16
10.22
10.12
10.36
10.24
10.07
10.40
10.29
10.21
10.10
10.23
10.19
10.19
10.29
10.25
10.18
10.42
9.85
10.45
10.37
10.22
10.19
10.23
10.48
10.17
9.76
10.06
10.17
10.04
10.41
10.27
10.00
10.23
10.11
10.19
9.97
10.05
10.12
10.33
10.18
10.54
9.91
10.28
10.23
9.98
9.99
10.15
10.11
10.19
10.22
10.10
9.99
10.40
10.76
10.20
10.31
10.16
10.23
10.17
10.00
10.11
10.30
10.64
10.10
10.23
10.45
10.17
10.19
9.98
10.13
Fill Test Data
Descriptive Statistics
Summary Statistics
                    Amount
Mean                10.198
Standard Error      0.014
Median              10.190
Mode                #N/A
Standard Deviation  0.163
Sample Variance     0.026
Kurtosis            0.771
Skewness            0.245
Range               0.997
Minimum             9.758
Maximum             10.756
Sum                 1468.542
Count               144
Fill Test Data
Histogram
Data / Data Analysis / Histogram / Check chart output
Bin      Frequency
9.758    1      (1 data point was ≤ 9.758, the minimum)
9.841    2      (2 data points were between 9.758 and 9.841)
9.925    3
10.008   10
10.091   17
10.174   33
10.257   36
10.340   14
10.423   16
10.506   7
10.590   3
10.673   1
More     1      (1 was above 10.673)
[Chart: Histogram, Frequency (axis 0 to 40) by Bin]
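The counting rule behind this table can be sketched in a few lines: each bin value is an upper bound, a data point is counted in the first bin whose value it does not exceed, and anything above the last bin goes to "More". The function name histogram_counts is made up here, and the sample is only the first twelve fill amounts as printed in Exhibit 2 (rounded to two decimals), so the counts below are not the slide's frequencies.

```python
from bisect import bisect_left

# Bin values as listed on the slide; "More" is the extra slot at the end.
BINS = [9.758, 9.841, 9.925, 10.008, 10.091, 10.174,
        10.257, 10.340, 10.423, 10.506, 10.590, 10.673]


def histogram_counts(data, bins):
    """Count data per bin, treating each bin value as an inclusive upper bound.

    Returns len(bins) + 1 counts; the last one is the "More" category.
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for x in data:
        # bisect_left gives the first index i with edges[i] >= x,
        # or len(edges) when x is larger than every bin value.
        counts[bisect_left(edges, x)] += 1
    return counts


# First twelve values from Exhibit 2, as printed (two decimals).
sample = [9.89, 10.17, 10.29, 10.00, 10.04, 10.35,
          10.05, 10.19, 10.21, 9.79, 10.53, 10.47]

counts = histogram_counts(sample, BINS)
for label, count in zip([*map(str, BINS), "More"], counts):
    print(f"{label:>7s}  {count}")
```

The slide's bin values are consistent with starting at the minimum and stepping by Range/12 (0.997/12 ≈ 0.083), which is one way to choose equal-width bins.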
Preview of Coming Attractions
• Class 07
– Find out how to use these counts to test H0: these data came from N(10.2,.16)
– Find out how to use the Denmark family counts to test H0: those data came from Binomial(4,.5)