Measures of Variability

Basic Statistics
Measures of Variability
1. The Range
2. Deviation Score
3. The Variance
4. The Standard Deviation
STRUCTURE OF STATISTICS
Continuing with numerical approaches.
Statistics
  Descriptive: tabular, graphical, numerical
  Inferential: confidence intervals, tests of hypothesis
Numerical descriptive measures
  Central tendency
  Variability: range, variance, standard deviation
  Symmetry
We need the variability of IQs in the class!
You are an elementary school teacher who has been assigned a
class of fifth graders whose mean IQ is 115. Because children with
an IQ of 115 can handle more complex, abstract material, you plan
many sophisticated projects for the year.
Do you think your projects will succeed?
[Figure: IQ distribution in the general population, with the axis marked 55, 70, 85, 100, 115, 130, 145 and an area labeled 85%.]
Is the average salary enough? We need the variability!
Having graduated from college, you are considering two job offers,
one in sales and the other in management. The pay is about the
same for both. After checking the statistics for salespersons and
managers at the library, you find that those who have been working for 5
years in each type of job also have similar average pay.
Can you conclude that the pay for the two occupations is equal?
[Figure: pay distributions for sales and management with similar averages, annotated "Much more" and "Much less" relative to the average; $20,000 is marked on the axis.]
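[Figure: a group of scores (IQs of 100 students) summarized by a single score from a central tendency measure (mean IQ = 118); question marks ask how spread out the scores are and which group is more homogeneous.]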
Central Tendency Measures
Measures of central tendency do not tell you about the
differences that exist among the scores.
Same Mean, Different Variability
So What?
[Figure: distributions numbered 1 to 4 that share the same central tendency (60) but differ in spread, with the question "How many are out here?"]
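One way to see "same mean, different variability" is to compute a center and a spread for two groups. The sketch below uses Python's standard `statistics` module; the two groups of scores are hypothetical, chosen only so that both have a mean of 60.

```python
import statistics

# Hypothetical scores (not taken from the slides); both groups have a mean of 60.
narrow = [58, 59, 60, 61, 62]
wide = [40, 50, 60, 70, 80]

for name, scores in (("narrow", narrow), ("wide", wide)):
    centre = statistics.mean(scores)    # central tendency
    spread = statistics.pstdev(scores)  # variability (population standard deviation)
    print(f"{name}: mean = {centre}, SD = {spread:.2f}")
```

The means are identical, so only the spread measure tells the two groups apart.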
1. The Range
The range is the difference between the largest score (Xmax) and the smallest score (Xmin).
Scores: 25, 21, 22, 23, 28, 26, 24, 29
In order: 21, 22, 23, 24, 25, 26, 28, 29
Range = 29 - 21 = 8
A large range means there is a lot of variability in the data.
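The range calculation above as a small Python function. `score_range` is a hypothetical helper name; sorting mirrors the "arrange in order" step, although `max(scores) - min(scores)` gives the same answer.

```python
def score_range(scores):
    """Range = largest score (Xmax) minus smallest score (Xmin)."""
    ordered = sorted(scores)         # arrange the scores in order first
    return ordered[-1] - ordered[0]  # Xmax - Xmin

scores = [25, 21, 22, 23, 28, 26, 24, 29]
print(sorted(scores))       # [21, 22, 23, 24, 25, 26, 28, 29]
print(score_range(scores))  # 8
```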
Drawbacks of The Range
Scores: 10, 28, 26, 27, 29
In order: 10, 26, 27, 28, 29
Range = 29 - 10 = 19
Without the score of 10: Range = 29 - 26 = 3
The range depends on only the two extreme scores.
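A minimal sketch of how much a single extreme score moves the range, using the five scores from this example (the helper name `score_range` is hypothetical).

```python
def score_range(scores):
    # Only the two extreme scores enter the calculation.
    return max(scores) - min(scores)

with_outlier = [10, 28, 26, 27, 29]
without_outlier = [28, 26, 27, 29]

print(score_range(with_outlier))     # 19
print(score_range(without_outlier))  # 3
```

Removing one score changes the range from 19 to 3, even though the other four scores are untouched.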
Range and Extreme Observations
Because the range is determined by just two scores in the group,
it ignores the spread of all scores except the largest and
smallest.
One aberrant score, or outlier, can greatly increase the range.
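Range and Measurement Scales
Before you determine the range, all scores must be arranged in order.
Country code (1 = American, 2 = Asian, 3 = Mexican): Range = 3 - 1 = 2
SES (1 = Upper, 2 = Middle, 3 = Lower): Range = 3 - 1 = 2
Age: Range = 3 - 1 = 2
The arithmetic gives 2 every time, but the range is meaningful only when the numbers have a real order. Country codes are labels with no order, so their "range" says nothing about variability.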
3. The Variance
[Figure: a scattered group of scores between 27 and 95, used to ask how the differences among the scores can be measured.]
Total Variability = Sum of Individual Variability
How can you determine the variability
of each individual in a group?
[Figure: example scores 22, 72, 22, 55, 70, 12, 3, 67.]
The amount of individual difference depends entirely on the
criterion used for comparison.
Can you figure out how much each score
differs from the other scores?
Can you figure out the total amount of difference
among the scores?
You need a common criterion for computing total variability.
Use the mean as the reference score. For the scores below, Mean = 49, and each score's deviation from it is:
Score:             45  46  47  48  49  50  51  52  53
Deviation score:   -4  -3  -2  -1   0  +1  +2  +3  +4
A deviation score tells you how much, and in which direction, a particular score
deviates, or differs, from the mean.
DEVIATION SCORE = (Xi - Mean)
A score at a great distance from the mean
has a large deviation score.
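A short Python sketch of the deviation-score table, using the nine scores from the example and Python's standard `statistics.mean` for the reference score.

```python
import statistics

scores = [45, 46, 47, 48, 49, 50, 51, 52, 53]
mean = statistics.mean(scores)  # the reference score (49 here)

# Deviation score for each score: Xi - Mean
deviations = [x - mean for x in scores]
print(deviations)
```

Negative deviations belong to scores below the mean, positive ones to scores above it, and the size of each tells how far away the score lies.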
[Figure: scores A, B, C, D, E, F placed around the mean, with distances of 1, 2, and 3 marked.]
Sum of Deviation Scores
Total amount of variability = the sum of all the distances?
Conceptually, the idea makes sense, but mathematically it does not work:
if you compute the sum of the deviation scores, the sum always equals zero!
Sum of deviation scores =
(-4) + (-3) + (-2) + (-1) + 0 + (1) + (2) + (3) + (4) = 0
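The zero sum is not a coincidence of this example: since the mean is ΣXi / n, Σ(Xi - Mean) = ΣXi - n × Mean = 0 for any data. A small check in Python, using exact fractions so floating-point rounding cannot hide or fake the zero; the helper name `deviation_sum` is hypothetical.

```python
from fractions import Fraction

def deviation_sum(scores):
    """Sum of (Xi - Mean), computed with exact rational arithmetic."""
    values = [Fraction(x) for x in scores]
    mean = sum(values) / len(values)
    return sum(x - mean for x in values)

print(deviation_sum([45, 46, 47, 48, 49, 50, 51, 52, 53]))  # 0
print(deviation_sum([25, 21, 22, 23, 28, 26, 24, 29]))      # 0 (mean is 24.75)
```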
The Sum of Absolute Deviation Scores
Sum of absolute deviation scores = 4 + 3 + 2 + 1 + 0 + 1 + 2 + 3 + 4 = 20
The sum of absolute deviations is rarely used
as a measure of variability because the process
of taking absolute values does not provide
meaningful information for inferential statistics.
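For comparison with the zero sum of plain deviations, a minimal Python sketch of the absolute-deviation sum for the same nine scores.

```python
import statistics

scores = [45, 46, 47, 48, 49, 50, 51, 52, 53]
mean = statistics.mean(scores)

# Absolute values stop positive and negative deviations from cancelling out.
abs_sum = sum(abs(x - mean) for x in scores)
print(abs_sum)  # 20
```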
Sum of Squares of Deviation Scores, “SS”
Makes sense both conceptually and mathematically.
Instead of working with the absolute values of
deviation scores, it is preferable to
(1) square each deviation score and
(2) sum them to obtain a quantity known as the
Sum of Squares.
$$SS = (-4)^2 + (-3)^2 + (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 + 4^2 = 16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16 = 60$$
In general:
$$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$$
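The SS formula as a short Python function; `sum_of_squares` is a hypothetical helper name, applied to the nine example scores.

```python
def sum_of_squares(scores):
    """SS = sum of (Xi - Mean) ** 2."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

print(sum_of_squares([45, 46, 47, 48, 49, 50, 51, 52, 53]))  # 60.0
```

Squaring makes every term non-negative, so the deviations can no longer cancel.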
Now compare two groups:
Group of scores A: SS(A) = 30
Group of scores B: SS(B) = 40
Can you say that the variability of the
data in Group B is greater than that of the data
in Group A?
What happens to SS when we look at some data?
Group A: 3, 4 (Mean = 3.5)
Group B: 3, 4, 3, 4 (Mean = 3.5)
$$SS_A = (3 - 3.5)^2 + (4 - 3.5)^2 = 0.50$$
$$SS_B = (3 - 3.5)^2 + (4 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 = 1.00$$
Both groups contain the same values with the same spread, yet SS(B) is twice SS(A).
SS tends to increase as the number of scores (N) increases, so
SS is not appropriate for comparing variability
among groups with unequal sample sizes.
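A minimal Python sketch of the Group A / Group B comparison (the helper name `sum_of_squares` is hypothetical): duplicating the data doubles SS without changing the spread.

```python
def sum_of_squares(scores):
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

group_a = [3, 4]
group_b = [3, 4, 3, 4]  # the same two values, each appearing twice

print(sum_of_squares(group_a))  # 0.5
print(sum_of_squares(group_b))  # 1.0
```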
How can you overcome this limitation of SS?
Divide SS by N:
$$S^2 = \frac{SS}{N} = \frac{\sum (X_i - \bar{X})^2}{N}$$
The resulting value is the mean of the squared deviation scores (the Mean
Square), known as the
VARIANCE.
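Group A: 3, 4 (Mean = 3.5)
Group B: 3, 4, 3, 4 (Mean = 3.5)
$$V_A = \frac{(3 - 3.5)^2 + (4 - 3.5)^2}{2} = \frac{0.50}{2} = 0.25$$
$$V_B = \frac{(3 - 3.5)^2 + (4 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2}{4} = \frac{1.00}{4} = 0.25$$
Dividing by N removes the effect of group size: both groups now have the same variance.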
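The same division by N in Python. `variance_n` is a hypothetical helper; the standard library's `statistics.pvariance` also divides by N, so the two agree.

```python
import statistics

def variance_n(scores):
    """Variance as SS / N (the mean square)."""
    mean = sum(scores) / len(scores)
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / len(scores)

group_a = [3, 4]
group_b = [3, 4, 3, 4]

print(variance_n(group_a), variance_n(group_b))                      # 0.25 0.25
print(statistics.pvariance(group_a), statistics.pvariance(group_b))  # 0.25 0.25
```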
$$\text{Variance} = \frac{\sum (X_i - \bar{X})^2}{N}$$

POPULATION VARIANCE
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where σ² (sigma squared) is the population variance, X is an individual value, μ is the population mean, and N is the population size.

SAMPLE VARIANCE
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where s² is the sample variance, X is an individual value, X̄ is the sample mean, and n - 1 (sample size minus 1) is the degrees of freedom.

The sample variance (s²) is used to estimate the
population variance (σ²).
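In Python's standard `statistics` module, `pvariance` implements the population formula (divide by N) and `variance` implements the sample formula (divide by n - 1). The sketch applies both to the nine scores from the deviation-score example, whose SS is 60; treating those scores as a population or as a sample is an assumption made only for illustration.

```python
import statistics

scores = [45, 46, 47, 48, 49, 50, 51, 52, 53]  # SS = 60, n = 9

sigma_sq = statistics.pvariance(scores)  # population formula: SS / N = 60 / 9
s_sq = statistics.variance(scores)       # sample formula: SS / (n - 1) = 60 / 8

print(round(sigma_sq, 4))  # 6.6667
print(s_sq)                # 7.5
```

The sample formula always gives the larger value for the same data, because it divides the same SS by a smaller number.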
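Why n - 1 instead of n?
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
[Figure: a sample of n cases drawn from a population with μ = 100; because of sampling error, the sample mean may fall below or above 100.]
Ideally, a sample variance would be based on Σ(X - μ)². This is impossible, since μ is not known if one has only a sample of n cases, so μ is replaced by X̄.
The sum of squared deviations is smaller around X̄ than around any other value. Hence, in a sample, Σ(X - X̄)² is never larger than Σ(X - μ)², and dividing both by n gives
$$\frac{\sum (X - \bar{X})^2}{n} \le \frac{\sum (X - \mu)^2}{n}$$
where the right-hand side is the ideal sample variance. Dividing by n therefore tends to underestimate the population variance.
One could correct for this bias by dividing
by a factor somewhat less than n, namely n - 1.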
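The bias can be checked exactly, without random simulation, by listing every possible sample from a tiny population. The sketch assumes a hypothetical population of three scores, samples of size 2, and sampling with replacement (the setting in which the n - 1 formula is exactly unbiased); it uses exact fractions from Python's standard library.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2
divide_by_n = []
divide_by_n_minus_1 = []
for sample in product(population, repeat=n):  # every ordered sample, drawn with replacement
    xbar = Fraction(sum(sample), n)
    ss = sum((x - xbar) ** 2 for x in sample)
    divide_by_n.append(ss / n)
    divide_by_n_minus_1.append(ss / (n - 1))

def average(values):
    return sum(values) / len(values)

print(sigma2)                        # 2/3
print(average(divide_by_n))          # 1/3, too small on average
print(average(divide_by_n_minus_1))  # 2/3, equal to sigma2 on average
```

Averaged over all possible samples, dividing by n underestimates σ², while dividing by n - 1 hits it exactly.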
Degrees of freedom
Sample: n = 5
$$\bar{X} = \frac{\sum X}{n} = 5, \quad \text{so} \quad \sum X = 25$$
If we know that the mean is equal to 5, and the first 4
scores add to 18, then the
last score MUST equal 25 - 18 = 7.
Only n - 1 scores are free to vary; n - 1 is the number of degrees of freedom.
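A minimal sketch of the reasoning above: once the mean and n - 1 scores are fixed, the last score is determined. The helper name `last_score` and the four individual scores are hypothetical; only their total of 18 comes from the example.

```python
def last_score(mean, n, other_scores):
    """Return the one score that is no longer free to vary once the mean is fixed."""
    if len(other_scores) != n - 1:
        raise ValueError("expected n - 1 known scores")
    total = mean * n                  # the scores must add up to n * mean
    return total - sum(other_scores)  # whatever is left over

# Hypothetical first four scores; they add to 18 as in the example.
first_four = [2, 4, 5, 7]
print(last_score(5, 5, first_four))  # 7
```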
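4. Standard Deviation (SD)
The standard deviation is the positive square root of the variance.
Population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Sample:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$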
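In Python's standard `statistics` module, `pstdev` and `stdev` are the population and sample standard deviations. The sketch checks them against the square roots of SS / N and SS / (n - 1) for the nine example scores (SS = 60).

```python
import math
import statistics

scores = [45, 46, 47, 48, 49, 50, 51, 52, 53]

sigma = statistics.pstdev(scores)  # population SD: square root of SS / N
s = statistics.stdev(scores)       # sample SD: square root of SS / (n - 1)

print(round(sigma, 4), round(math.sqrt(60 / 9), 4))
print(round(s, 4), round(math.sqrt(60 / 8), 4))
```

Taking the square root returns the measure to the original units of the scores, which the variance (in squared units) does not.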
The Standard Deviation and the Mean with the Normal Distribution
[Figure: normal distribution centered on μ, with the axis marked from -3σ to +3σ (relationship between μ and σ).]
[Figure: normal distribution with about 68%, 95%, and 99.7% of the area within ±1S, ±2S, and ±3S of the mean (relationship between X̄ and S).]
EMPIRICAL RULE
• For any symmetrical, bell-shaped
distribution, approximately 68% of the
observations will lie within ±1 standard
deviation of the mean; approximately
95% within ±2 standard deviations of
the mean; and approximately 99.7%
within ±3 standard deviations of the
mean.
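The three percentages come from the normal curve. A short check with `statistics.NormalDist` (Python 3.8 or later) computes the area between -k and +k standard deviations of a standard normal distribution; for other bell-shaped distributions the figures are only approximate.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area under the normal curve between mean - k*SD and mean + k*SD
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```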
You can approximately reproduce
your data!
If a set of data has Mean = 50 and SD = 10, then about 68% of the scores lie between 40 and 60, about 95% between 30 and 70, and about 99.7% between 20 and 80.
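A minimal sketch that turns a mean and SD into the empirical-rule intervals, assuming the data are roughly bell-shaped; `empirical_intervals` is a hypothetical helper name.

```python
def empirical_intervals(mean, sd):
    """Intervals mean ± k*SD for k = 1, 2, 3."""
    return [(mean - k * sd, mean + k * sd) for k in (1, 2, 3)]

labels = ("about 68%", "about 95%", "about 99.7%")
for label, (low, high) in zip(labels, empirical_intervals(50, 10)):
    print(f"{label} of scores between {low} and {high}")
```

With Mean = 50 and SD = 10 this prints the intervals 40 to 60, 30 to 70, and 20 to 80.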