Session 02: Descriptive Numerical Measures. Course: A0392-Statistik Ekonomi (Economic Statistics)

Course: A0392-Statistik Ekonomi (Economic Statistics)
Year: 2006
Session 02
Descriptive Numerical Measures
1
Outline:
• Measures of Central Tendency
• Measures of Variation
• Measures of Position (Location)
2
Basic Business Statistics
Numerical Descriptive Measures
3
Chapter Topics
• Measures of Central Tendency
– Mean, Median, Mode, Geometric Mean
• Quartile
• Measure of Variation
– Range, Interquartile Range, Variance and
Standard Deviation, Coefficient of Variation
• Shape
– Symmetric, Skewed, Using Box-and-Whisker
Plots
4
Chapter Topics
(continued)
• The Empirical Rule and the Bienaymé-Chebyshev Rule
• Coefficient of Correlation
• Pitfalls in Numerical Descriptive Measures
and Ethical Issues
5
Summary Measures
• Central Tendency: Mean, Median, Mode, Geometric Mean
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
6
Measures of Central Tendency
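• Mean
– Sample:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
– Population:
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]
• Median
• Mode
• Geometric Mean
\[ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \]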
7
Mean (Arithmetic Mean)
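• Mean (Arithmetic Mean) of Data Values
– Sample mean (n = sample size):
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
– Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]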
8
Mean (Arithmetic Mean)
(continued)
• The Most Common Measure of Central
Tendency
• Affected by Extreme Values (Outliers)
[Figure: two dot plots. On an axis from 0 to 10: Mean = 5. With an extreme value, on an axis from 0 to 14: Mean = 6]
9
Mean (Arithmetic Mean)
(continued)
• Approximating the Arithmetic Mean
– Used when raw data are not available
\[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} \]
n = sample size
c = number of classes in the frequency distribution
m_j = midpoint of the jth class
f_j = frequency of the jth class
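The grouped-data approximation can be sketched in a few lines of Python. The frequency distribution below is hypothetical (it does not come from the slides); each pair holds a class midpoint m_j and its frequency f_j.

```python
# Hypothetical frequency distribution: (class midpoint m_j, frequency f_j).
classes = [(15, 3), (25, 6), (35, 5), (45, 4), (55, 2)]

# n is the sample size, i.e. the sum of the class frequencies.
n = sum(f for _, f in classes)

# Approximate mean: sum of m_j * f_j over the c classes, divided by n.
grouped_mean = sum(m * f for m, f in classes) / n

print(n, grouped_mean)  # 20 33.0
```

The result is only an approximation of the raw-data mean, because every observation in a class is treated as if it sat at the class midpoint.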
10
Median
• Robust Measure of Central Tendency
• Not Affected by Extreme Values
[Figure: two dot plots. On an axis from 0 to 10: Median = 5. With an extreme value, on an axis from 0 to 14: Median = 5]
• In an Ordered Array, the Median is the ‘Middle’ Number
– If n or N is odd, the median is the middle number
– If n or N is even, the median is the average of the two middle numbers
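A short Python sketch of how an extreme value moves the mean but not the median. The two data sets are illustrative assumptions, not the values behind the slide figures; `mean` and `median` come from the standard `statistics` module.

```python
from statistics import mean, median

# Illustrative data (assumed, not read from the slide figures).
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # largest value replaced by an extreme one

print(mean(base), median(base))                  # mean 5, median 5
print(mean(with_outlier), median(with_outlier))  # mean 6, median 5

# With an even number of observations, statistics.median averages
# the two middle values of the ordered data.
even_median = median([1, 3, 5, 7])
print(even_median)  # 4.0
```

The mean shifts from 5 to 6 when one value becomes extreme, while the median stays at 5.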
11
Mode
• A Measure of Central Tendency
• Value that Occurs Most Often
• Not Affected by Extreme Values
• There May Not Be a Mode
• There May Be Several Modes
• Used for Either Numerical or Categorical Data
[Figure: two dot plots. On an axis from 0 to 14: Mode = 9. On an axis from 0 to 6: No Mode]
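The cases listed above (one mode, several modes, no mode, categorical data) can be sketched with `statistics.multimode` (Python 3.8+). All data sets here are hypothetical, and the "no mode" check is one possible convention: no value occurs more than once.

```python
from collections import Counter
from statistics import multimode

# Hypothetical data sets, one per case.
one_mode = multimode([0, 1, 1, 3, 4, 9, 9, 9, 12])
several_modes = multimode([2, 2, 5, 7, 7])
categorical_mode = multimode(["red", "blue", "red", "green"])

print(one_mode)          # [9]
print(several_modes)     # [2, 7]
print(categorical_mode)  # ['red']

# multimode returns every value when all occur once, so "no mode"
# is detected here by checking that the highest count is 1.
values = [0, 1, 2, 3, 4, 5, 6]
has_mode = max(Counter(values).values()) > 1
print(has_mode)  # False
```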
12
Geometric Mean
• Useful in the Measure of Rate of Change of a Variable Over Time
\[ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \]
• Geometric Mean Rate of Return
– Measures the status of an investment over time
\[ \bar{R}_G = \left[ (1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n) \right]^{1/n} - 1 \]
13
Example
An investment of $100,000 declined to $50,000 at the end of year one and rebounded back to $100,000 at the end of year two:
\[ R_1 = -0.5 \ (\text{or } -50\%), \quad R_2 = 1 \ (\text{or } 100\%) \]
Average rate of return:
\[ \bar{R} = \frac{(-0.5) + (1)}{2} = 0.25 \ (\text{or } 25\%) \]
Geometric rate of return:
\[ \bar{R}_G = \left[ (1 - 0.5) \times (1 + 1) \right]^{1/2} - 1 = \left[ (0.5) \times (2) \right]^{1/2} - 1 = 1^{1/2} - 1 = 0 \ (\text{or } 0\%) \]
14
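The investment example (a 50% loss followed by a 100% gain) can be checked with a minimal Python sketch. The function names are hypothetical helpers introduced for illustration; `math.prod` requires Python 3.8+.

```python
import math

def arithmetic_mean_return(returns):
    """Plain average of the per-period rates of return."""
    return sum(returns) / len(returns)

def geometric_mean_return(returns):
    """[(1 + R_1) * ... * (1 + R_n)]^(1/n) - 1; assumes every R_i > -1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

returns = [-0.5, 1.0]  # year one: -50%, year two: +100%

print(arithmetic_mean_return(returns))  # 0.25
print(geometric_mean_return(returns))   # 0.0
```

The arithmetic average suggests a 25% yearly gain even though the investment ends where it started. The geometric rate of 0% matches the actual outcome.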
Quartiles
• Split Ordered Data into 4 Quarters
[Diagram: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%]
• Position of i-th Quartile
\[ \text{Position of } Q_i = \frac{i(n + 1)}{4} \]
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
\[ \text{Position of } Q_1 = \frac{1(9 + 1)}{4} = 2.5, \qquad Q_1 = \frac{12 + 13}{2} = 12.5 \]
• Q1 and Q3 are Measures of Noncentral Location
• Q2 = Median, a Measure of Central Tendency
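A minimal Python sketch of the position rule i(n + 1)/4, applied to the ordered array on this slide. The helper `quartile` is hypothetical. It interpolates linearly between neighbouring values when the position is not a whole number, which is an assumption: the slide only shows the halfway case (position 2.5).

```python
def quartile(sorted_data, i):
    """i-th quartile at 1-based position i*(n+1)/4, linearly interpolated."""
    n = len(sorted_data)
    pos = i * (n + 1) / 4
    lower = int(pos)        # 1-based index of the value at or below pos
    frac = pos - lower      # how far pos lies toward the next value
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)  # 12.5 16.0 19.5
```

Q1 reproduces the slide's 12.5, and Q2 equals the median of the array (16).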
15
Measures of Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
16
Range
• Measure of Variation
• Difference between the Largest and the
Smallest Observations:
\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]
• Ignores How Data are Distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed data; in both, Range = 12 - 7 = 5]
17
Interquartile Range
• Measure of Variation
• Also Known as Midspread
– Spread in the middle 50%
• Difference between the First and Third
Quartiles
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
\[ \text{Interquartile Range} = Q_3 - Q_1 = 17.5 - 12.5 = 5 \]
• Not Affected by Extreme Values
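The interquartile range for this ordered array can be reproduced with `statistics.quantiles`, whose `"exclusive"` method places the i-th quartile at position i(n + 1)/4. The second data set, with its extreme value of 100, is a hypothetical variation added to contrast the IQR with the range.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

data_range = max(data) - min(data)
q1, _, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 10 12.5 17.5 5.0

# Hypothetical variation: replace the largest value with an extreme one.
extreme = [11, 12, 13, 16, 16, 17, 17, 18, 100]
extreme_range = max(extreme) - min(extreme)
eq1, _, eq3 = quantiles(extreme, n=4, method="exclusive")
extreme_iqr = eq3 - eq1
print(extreme_range, extreme_iqr)  # 89 5.0
```

The range jumps from 10 to 89, while the interquartile range stays at 5.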
18
Variance
• Important Measure of Variation
• Shows Variation about the Mean
– Sample Variance:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
– Population Variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
19
Standard Deviation
• Most Important Measure of Variation
• Shows Variation about the Mean
• Has the Same Units as the Original Data
– Sample Standard Deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
– Population Standard Deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
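A short Python sketch contrasting the sample (divide by n - 1) and population (divide by N) versions of the variance and standard deviation. The data set is hypothetical. The hand-written formulas are compared against the standard `statistics` functions as a sanity check.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
mean = sum(data) / n

# Sum of squared deviations about the mean.
ss = sum((x - mean) ** 2 for x in data)

sample_var = ss / (n - 1)   # S^2: treats the data as a sample
pop_var = ss / n            # sigma^2: treats the data as the whole population
sample_sd = sqrt(sample_var)
pop_sd = sqrt(pop_var)

print(pop_var, pop_sd)                             # 4.0 2.0
print(round(sample_var, 4), round(sample_sd, 4))   # 4.5714 2.138

# The standard library gives the same results.
print(variance(data), stdev(data), pvariance(data), pstdev(data))
```

Both standard deviations are in the same units as the data, whereas the variances are in squared units.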
20
Standard Deviation
• Approximating the Standard Deviation
– Used when the raw data are not available and the only source of data is a frequency distribution
\[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \]
n = sample size
c = number of classes in the frequency distribution
m_j = midpoint of the jth class
f_j = frequency of the jth class
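The grouped-data standard deviation can be sketched as follows. The frequency distribution is hypothetical, and the mean used inside the formula is itself the grouped approximation (sum of m_j f_j divided by n).

```python
from math import sqrt

# Hypothetical frequency distribution: (class midpoint m_j, frequency f_j).
classes = [(15, 3), (25, 6), (35, 5), (45, 4), (55, 2)]

n = sum(f for _, f in classes)
grouped_mean = sum(m * f for m, f in classes) / n

# Squared deviation of each midpoint from the mean, weighted by frequency.
weighted_ss = sum((m - grouped_mean) ** 2 * f for m, f in classes)

grouped_sd = sqrt(weighted_ss / (n - 1))
print(grouped_mean, weighted_ss, round(grouped_sd, 3))  # 33.0 2920.0 12.397
```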
21
Comparing Standard Deviations
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
22
Coefficient of Variation
• Measure of Relative Variation
• Always in Percentage (%)
• Shows Variation Relative to the Mean
• Used to Compare Two or More Sets of
Data Measured in Different Units
\[ CV = \left( \frac{S}{\bar{X}} \right) 100\% \]
• Sensitive to Outliers
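Because the coefficient of variation is unit free, it can compare variables measured in different units. A minimal sketch with two hypothetical samples (heights in centimetres, weights in kilograms); `coefficient_of_variation` is an illustrative helper, not a standard library function.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, in percent."""
    return stdev(values) / mean(values) * 100

# Hypothetical samples in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 2), round(cv_weight, 2))  # 4.65 22.59
```

In this made-up example the weights vary far more relative to their mean than the heights do, a comparison the raw standard deviations (in cm and kg) cannot support directly.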
23
Shape of a Distribution
• Describe How Data are Distributed
• Measures of Shape
– Symmetric or skewed
– Left-Skewed: Mean < Median < Mode
– Symmetric: Mean = Median = Mode
– Right-Skewed: Mode < Median < Mean
24
Exploratory Data Analysis
• Box-and-Whisker
– Graphical display of data using 5-number
summary
[Figure: box-and-whisker plot marking X_smallest, Q1, Median (Q2), Q3, and X_largest on an axis from 4 to 12]
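The five-number summary behind a box-and-whisker plot can be computed as in the sketch below. The data set is hypothetical, `five_number_summary` is an illustrative helper, and the quartiles use the `"exclusive"` method of `statistics.quantiles` (positions i(n + 1)/4).

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (X_smallest, Q1, median, Q3, X_largest)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [4, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12]  # hypothetical observations
summary = five_number_summary(data)
print(summary)  # (4, 6.0, 8.0, 10.0, 12)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values. In this example the two halves of the box are equally wide, which is what a symmetric distribution would show.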
25
Distribution Shape & Box-and-Whisker
[Figure: box-and-whisker plots for Left-Skewed, Symmetric, and Right-Skewed distributions, each marking Q1, Q2, and Q3]
26
The Empirical Rule
• For Bell-Shaped (Approximately Normal) Data Sets, Roughly 68% of the
Observations Fall Within 1 Standard
Deviation Around the Mean
• Roughly 95% of the Observations Fall
Within 2 Standard Deviations Around the
Mean
• Roughly 99.7% of the Observations Fall
Within 3 Standard Deviations Around the
Mean
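The 68%, 95%, and 99.7% figures can be reproduced from the normal distribution, which is the model the empirical rule assumes. The sketch uses `statistics.NormalDist` (Python 3.8+). Real data sets will match these shares only to the extent that they are bell-shaped.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of observations within k standard deviations of the mean.
coverage = {k: (z.cdf(k) - z.cdf(-k)) * 100 for k in (1, 2, 3)}

for k, pct in coverage.items():
    print(k, round(pct, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```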
27
The Bienaymé-Chebyshev Rule
• The Percentage of Observations Contained
Within Distances of k Standard Deviations
Around the Mean Must Be at Least
\[ \left( 1 - \frac{1}{k^2} \right) \times 100\% \quad \text{for } k > 1 \]
– Applies regardless of the shape of the data set
– At least 75% of the observations must be
contained within distances of 2 standard
deviations around the mean
– At least 88.89% of the observations must be
contained within distances of 3 standard
deviations around the mean
– At least 93.75% of the observations must be
contained within distances of 4 standard
deviations around the mean
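The bound (1 - 1/k^2) × 100% is easy to evaluate directly. The helper name `chebyshev_lower_bound` is hypothetical, and the check for k > 1 reflects the fact that the bound says nothing useful at or below k = 1.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return (1 - 1 / k**2) * 100

bounds = {k: round(chebyshev_lower_bound(k), 2) for k in (2, 3, 4)}
print(bounds)  # {2: 75.0, 3: 88.89, 4: 93.75}
```

These are guaranteed minimums for any data set. For bell-shaped data the actual percentages are considerably higher.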
Coefficient of Correlation
• Measures the Strength of the Linear
Relationship between 2 Quantitative
Variables
\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]
29
Features of Correlation Coefficient
• Unit Free
• Ranges between –1 and 1
• The Closer to –1, the Stronger the
Negative Linear Relationship
• The Closer to 1, the Stronger the Positive
Linear Relationship
• The Closer to 0, the Weaker Any Linear
Relationship
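A minimal Python sketch of the sample coefficient of correlation, written out from the sums of deviation products and squared deviations. The function name `pearson_r` and all data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample coefficient of correlation between paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))

x = [1, 2, 3, 4, 5]
r_perfect_pos = pearson_r(x, [2, 4, 6, 8, 10])   # exact increasing line
r_perfect_neg = pearson_r(x, [10, 8, 6, 4, 2])   # exact decreasing line
r_moderate = pearson_r(x, [2, 1, 4, 3, 5])       # noisy increasing pattern

print(round(r_perfect_pos, 4), round(r_perfect_neg, 4), round(r_moderate, 4))
# 1.0 -1.0 0.8
```

Rescaling either variable (for example, converting units) leaves r unchanged, which is what "unit free" means here.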
30
Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = .6, and r = 1]
31