Introduction to Descriptive Statistics

2/25/03
Population vs. Sample Notation
Population: Greek letters (μ, σ, β)
vs.
Sample: Roman letters (x̄, s, b)
Types of Variables
Nominal (qualitative)
Not nominal (quantitative)
  Ordinal
  Interval or ratio
Describing data
           Moment                          Non-mean based measure
Center     Mean                            Mode, median
Spread     Variance (standard deviation)   Range, interquartile range
Skew       Skewness                        --
Peaked     Kurtosis                        --
Mean
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Variance, Standard Deviation
\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \]
Variance, S.D. of a Sample
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
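To make the difference between dividing by n and dividing by n − 1 concrete, here is a minimal Python sketch. The data values are made up for illustration and do not come from the lecture; the variable names are likewise only illustrative.

```python
import math
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)

mean = sum(x) / n                            # x-bar = (sum of x_i) / n
ss = sum((xi - mean) ** 2 for xi in x)       # sum of squared deviations

pop_var = ss / n                             # sigma^2: divide by n
pop_sd = math.sqrt(pop_var)                  # sigma
samp_var = ss / (n - 1)                      # s^2: divide by n - 1
samp_sd = math.sqrt(samp_var)                # s

# The standard library offers both conventions under separate names.
assert math.isclose(pop_var, statistics.pvariance(x))
assert math.isclose(samp_var, statistics.variance(x))

print(mean, pop_var, pop_sd)
print(samp_var, samp_sd)
```

The sample versions are always a little larger than the population versions for the same data, and the gap shrinks as n grows.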
Coefficient of variation
\[ \text{c.v.} = \frac{\sigma}{\mu} \times 100 \]
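A short sketch of the coefficient of variation, read as the standard deviation expressed as a percentage of the mean. The data are hypothetical, and the population (divide-by-n) standard deviation is assumed here.

```python
import statistics

# Hypothetical data; not from the lecture.
x = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(x)        # mean
sigma = statistics.pstdev(x)    # population standard deviation (divides by n)

cv = sigma / mu * 100           # coefficient of variation, in percent
print(cv)
```

Because it divides by the mean, the measure is only meaningful for variables with a true zero and a mean well away from zero.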
Skewness
Symmetrical distribution
• IQ
• SAT
[Figure: frequency plotted against value]
Skewness
Asymmetrical distribution
• GPA of MIT students
[Figure: frequency plotted against value]
Skewness
(Asymmetrical distribution)
• Income
• Contribution to candidates
• Populations of countries
• “Residual vote” rates
[Figure: frequency plotted against value]
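Skewness
\[ \text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n} \cdot \frac{(n - 1)^{3/2}}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{3/2}} \]
\[ \frac{\text{mean} - \text{mode}}{s} \approx \frac{3\,(\text{mean} - \text{median})}{s} \]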
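The following Python sketch computes a moment-based skewness and the median-based rule of thumb. It assumes skewness is the mean cubed deviation divided by the cube of the sample standard deviation (the one that divides by n − 1); other packages use slightly different conventions. The function names and data are hypothetical.

```python
import statistics

def skewness(x):
    """Mean cubed deviation divided by s**3, where s divides by n - 1."""
    n = len(x)
    mean = sum(x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    s = statistics.stdev(x)
    return m3 / s ** 3

def pearson_median_skew(x):
    """Rule-of-thumb measure: 3 * (mean - median) / s."""
    return 3 * (statistics.mean(x) - statistics.median(x)) / statistics.stdev(x)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]    # one large value makes a long right tail

print(skewness(symmetric))            # 0.0: deviations cancel exactly
print(skewness(right_skewed))         # positive
print(pearson_median_skew(right_skewed))  # positive as well
```

Both measures agree in sign here: a long right tail pulls the mean above the median and gives positive skewness.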
Skewness
[Figure: frequency plotted against value]
Kurtosis
[Figure: frequency plotted against value for three curves: k > 3, k = 3, and k < 3]
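The slides give no formula for k, so this sketch assumes a common moment-based definition: the mean fourth-power deviation divided by the squared mean squared deviation, both dividing by n. Under that definition a normal curve has k = 3. The function name and data are hypothetical.

```python
def kurtosis(x):
    """Assumed definition: m4 / m2**2, with both moments dividing by n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5, 6, 7]      # evenly spread values
peaked = [4, 4, 4, 4, 4, 1, 7]    # most values at the center, two far out

print(kurtosis(flat))     # 1.75, below 3
print(kurtosis(peaked))   # about 3.5, above 3
```

Both lists have the same mean; moving mass to the center while keeping a few far-out values is what pushes k above 3.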
A few words about the normal curve
• Skewness = 0
• Kurtosis = 3
[Figure: normal curve, frequency plotted against value]
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2} \]
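More words about the normal curve
[Figure: normal curve centered at the mean, frequency plotted against value; 34% of the area lies within one standard deviation on each side of the mean, and 47.7% within two standard deviations on each side]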
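The 34% and 47.7% areas can be checked numerically. The sketch below writes out the normal density by hand, compares it with Python's `statistics.NormalDist`, and then measures the area between the mean and one or two standard deviations above it. The function name `normal_pdf` is just an illustrative choice.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and s.d. sigma."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

z = NormalDist(mu=0.0, sigma=1.0)    # standard normal

# The hand-written density agrees with the library's.
assert math.isclose(normal_pdf(1.3), z.pdf(1.3))

# Area between the mean and +1 s.d., and between the mean and +2 s.d.
one_sd_one_side = z.cdf(1.0) - z.cdf(0.0)
two_sd_one_side = z.cdf(2.0) - z.cdf(0.0)

print(round(100 * one_sd_one_side, 1))   # 34.1
print(round(100 * two_sd_one_side, 1))   # 47.7
```

By symmetry the same areas lie below the mean, so about 68% of the area falls within one standard deviation of the mean and about 95% within two.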
SEG example
The instructor and/or section leader:          Mean   s.d.   Skew    Kurt
Gives well-prepared, relevant presentations    6.0    0.69   -1.7    8.5
Explains clearly and answers questions well    5.9    0.68   -1.0    4.8
Uses visual aids well                          5.6    0.85   -1.8    8.9
Uses information technology effectively        5.5    0.91   -1.1    5.0
Speaks well                                    6.1    0.69   -1.5    6.8
Encourages questions & class participation     6.1    0.66   -0.88   3.7
Stimulates interest in the subject             5.9    0.76   -1.1    4.7
Is available outside of class for questions    5.9    0.68   -1.3    6.3
Overall rating of teaching                     5.9    0.67   -1.2    5.5
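Graph some SEG variables
The instructor and/or section leader:          Mean   s.d.   Skew    Kurt
Uses visual aids well                          5.6    0.85   -1.8    8.9
Encourages questions & class participation     6.1    0.66   -0.88   3.7
[Figure: histogram of (mean) q3, "Uses visual aids well"; x-axis 1 to 7, y-axis Fraction 0 to .6]
[Figure: histogram of (mean) q6, "Encourages questions & class participation"; x-axis 1 to 7, y-axis Fraction 0 to .6]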
Binary data
\[ \bar{x} = \Pr(x = 1) = \text{proportion of time } x = 1 \]
\[ s_x^2 = \bar{x}(1 - \bar{x}) \;\Rightarrow\; s_x = \sqrt{\bar{x}(1 - \bar{x})} \]
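A quick numerical check of the shortcut for 0/1 data, using made-up values. The sketch assumes the variance that divides by n, since that is the version the shortcut reproduces exactly.

```python
import statistics

# Hypothetical 0/1 data; not from the lecture.
x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
n = len(x)

p = sum(x) / n                         # x-bar = proportion of ones
var_shortcut = p * (1 - p)             # x-bar * (1 - x-bar)
var_direct = statistics.pvariance(x)   # sum of squared deviations / n

print(p, var_shortcut, var_direct)

# The n - 1 version is larger by the factor n / (n - 1).
var_sample = statistics.variance(x)
print(var_sample, var_shortcut * n / (n - 1))
```

For large n the two versions are nearly identical, so the shortcut is usually quoted without the correction.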
Commands in Stata for getting univariate statistics
• summarize
• summarize, detail
• graph, bin() normal
• graph, box
• tabulate [NB: compare to table]
Explore Q9: Overall teaching evaluation
subject    q9        n
3.371      6.4375    16
3.982      6.73333   15
3.14       6.46154   13
14.02D     5.66667    3
21W.803    5.66667   12
21M.480    5.69231   13
17.906     5.28571   14
2.51       5.88235   17
Graph Q9
. graph q9
[Figure: histogram of (mean) q9; x-axis 2.33333 to 7, y-axis Fraction 0 to .505495]
Divide into 7 “bins” and have them span 0..1, 1..2, 2..3, … 6..7
. graph q9,bin(7) xscale(0,7)
[Figure: histogram of (mean) q9; x-axis 0 to 7, y-axis Fraction 0 to .57326]
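To show what binning does to the bar heights, this Python sketch sorts values into equal-width bins over 0 to 7 and reports the fraction in each. It uses only the eight class means listed in the Q9 table, not the full data set behind the graphs, and the helper `bin_fractions` is hypothetical. How Stata assigns a value that falls exactly on a bin boundary is not covered here.

```python
# The eight q9 class means shown in the Q9 table.
q9 = [6.4375, 6.73333, 6.46154, 5.66667, 5.66667, 5.69231, 5.28571, 5.88235]

def bin_fractions(values, nbins, lo, hi):
    """Fraction of values in each of nbins equal-width bins on [lo, hi]."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        k = min(int((v - lo) / width), nbins - 1)   # v == hi goes in the last bin
        counts[k] += 1
    return [c / len(values) for c in counts]

seven = bin_fractions(q9, nbins=7, lo=0, hi=7)
fourteen = bin_fractions(q9, nbins=14, lo=0, hi=7)

print(seven)            # all mass in the 5..6 and 6..7 bins
print(max(seven))       # 0.625
print(max(fourteen))    # 0.5: finer bins give shorter bars
```

The tallest bar shrinks as the number of bins grows, the same pattern as the maximum Fraction values reported for bin(7), bin(14), and bin(28).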
Add ticks at each integer score
. graph q9,bin(7) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: histogram of (mean) q9; x-axis labeled 0 to 7, y-axis Fraction 0 to .57326]
Add a finer grain to the bars
. graph q9,bin(14) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: histogram of (mean) q9; x-axis labeled 0 to 7, y-axis Fraction 0 to .318681]
Even finer grain
. graph q9,bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: histogram of (mean) q9; x-axis labeled 0 to 7, y-axis Fraction 0 to .181319]
Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)
. graph q9,bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7) norm
[Figure: histogram of (mean) q9 with normal curve overlaid; x-axis labeled 0 to 7, y-axis Fraction 0 to .181319]
Do the previous graph with only larger classes (n > 20)
. graph q9 if n>20,bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
[Figure: histogram of (mean) q9 for classes with n > 20; x-axis labeled 0 to 7, y-axis Fraction 0 to .202532]
Draw the previous graph with a box plot
. graph q9 if n>20,box ylabel
[Figure: box plot of (mean) q9 for classes with n > 20; y-axis labeled 3 to 7]
Draw the box plots for small (0..20), medium (21..100), and large (over 100) classes
. gen size = 0 if n <=20
(237 missing values generated)
. replace size=1 if n > 20 & n <=100
(196 real changes made)
. replace size = 2 if n > 100
(41 real changes made)
. sort size
. graph q9 ,box ylabel by(size)
[Figure: box plots of (mean) q9 by size (0, 1, 2); y-axis labeled 2 to 8]
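The same three-way split can be sketched in Python. The cut points of 20 and 100 follow the gen/replace commands; the (q9, n) pairs below are invented for illustration, and `size_category` is a hypothetical helper. The group medians stand in for the center line of each box.

```python
from collections import defaultdict
from statistics import median

def size_category(n):
    """0 = small (n <= 20), 1 = medium (21..100), 2 = large (n > 100)."""
    if n <= 20:
        return 0
    if n <= 100:
        return 1
    return 2

# Hypothetical (q9, n) pairs; not the lecture's data.
classes = [(6.4, 16), (5.9, 45), (5.2, 250), (6.1, 20), (5.5, 100), (4.8, 101)]

by_size = defaultdict(list)
for q9, n in classes:
    by_size[size_category(n)].append(q9)

# Median q9 within each size group, in order 0, 1, 2.
medians = {size: median(vals) for size, vals in sorted(by_size.items())}
print(medians)
```

Checking the boundary values (20, 21, 100, 101) is worthwhile whenever categories are built from chained inequalities.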
A note about histograms with unnatural categories
From the Current Population Survey (2000), Voter and Registration Survey
How long (have you/has name) lived at this address?
-9   No Response
-3   Refused
-2   Don't know
-1   Not in universe
 1   Less than 1 month
 2   1-6 months
 3   7-11 months
 4   1-2 years
 5   3-4 years
 6   5 years or longer
Simple graph
[Figure: histogram of PES8; x-axis 1 to 6, y-axis Fraction 0 to .557134]
Solution, Step 1
Map artificial category onto “natural” midpoint
-9   No Response         → missing
-3   Refused             → missing
-2   Don't know          → missing
-1   Not in universe     → missing
 1   Less than 1 month   → 1/24 = 0.042
 2   1-6 months          → 3.5/12 = 0.29
 3   7-11 months         → 9/12 = 0.75
 4   1-2 years           → 1.5
 5   3-4 years           → 3.5
 6   5 years or longer   → 10 (arbitrary)
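The mapping above can be sketched as a lookup table in Python. The midpoints, in years, are the ones listed on the slide; the response codes in `responses` are invented, and `recode` is a hypothetical helper that returns `None` for the four missing codes.

```python
# Midpoint in years for each substantive PES8 code, as listed above.
MIDPOINT_YEARS = {
    1: 1 / 24,     # less than 1 month
    2: 3.5 / 12,   # 1-6 months
    3: 9 / 12,     # 7-11 months
    4: 1.5,        # 1-2 years
    5: 3.5,        # 3-4 years
    6: 10.0,       # 5 years or longer (arbitrary)
}

def recode(code):
    """Years at address, or None for the missing codes (-9, -3, -2, -1)."""
    return MIDPOINT_YEARS.get(code)

# Hypothetical survey responses; not actual CPS data.
responses = [6, 4, -1, 2, 6, -9, 1]
longevity = [recode(c) for c in responses]
valid = [v for v in longevity if v is not None]

print(longevity)
print(len(valid), "non-missing of", len(responses))
```

Keeping the missing codes out of the numeric variable matters: left in place, values such as -9 would be treated as lengths of residence and distort every summary statistic.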
Graph of recoded data
[Figure: histogram of longevity; x-axis labeled 0 to 10, y-axis Fraction 0 to .557134]
Density plot of data
[Figure: density plot of longevity; x-axis labeled 0 to 10 and 15]
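One way to turn categories of unequal width into a density plot is to divide each category's fraction by its width, so that bar areas rather than bar heights sum to 1. The slides do not say how their density plot was built, so this is only a sketch of that general idea. The category edges (in years, with the open-ended top category closed at 15) and the fractions are assumptions made for illustration.

```python
# Assumed category edges in years: <1 month, 1-6 months, 7-11 months,
# 1-2 years, 3-4 years, 5 years or longer (arbitrarily closed at 15).
edges = [0, 1 / 12, 7 / 12, 1, 3, 5, 15]

# Hypothetical fractions of respondents per category; they sum to 1.
fractions = [0.01, 0.07, 0.06, 0.16, 0.14, 0.56]

widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
densities = [f / w for f, w in zip(fractions, widths)]

# Each bar's area (density * width) is its fraction, so areas sum to 1.
total_area = sum(d * w for d, w in zip(densities, widths))

for f, w, d in zip(fractions, widths, densities):
    print(round(f, 2), round(w, 3), round(d, 3))
print(round(total_area, 6))
```

With these made-up numbers the category holding the largest fraction has the lowest density, because its respondents are spread over ten years rather than a few months.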