X 2 - Binus Repository

Course: A0392 - Statistik Ekonomi (Economic Statistics)
Year: 2010
Session 13
Time Series Data and Simple Linear Regression and Correlation Analysis
Outline:
• Time Series Data
• Simple Linear Regression Analysis
• Correlation Coefficient and Tests of Dependence between Random Variables
INTRODUCTION
• Time series data is a set of data recorded over a given period of time.
• Time series analysis is used to forecast future conditions.
• Forecasts of future conditions support planning in production, marketing, finance and other areas.
COMPONENTS OF TIME SERIES DATA
Trend; Seasonal Variation; Cyclical Variation; and Irregular Variation
TREND
A long-term upward or downward movement, obtained from the average change over time, whose values are fairly smooth.
[Figure: positive trend and negative trend, plotting Y against Year (X)]
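Least Squares Method for a Linear Trend
Finds the trend line with the smallest sum of squared differences between the original data and the values on the trend line.
$Y' = a + bX$
$a = \frac{\sum Y}{N}, \qquad b = \frac{\sum XY}{\sum X^2}$
These short formulas hold because the years are coded so that $\sum X = 0$ (for five years, X = -2, -1, 0, 1, 2).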
[Figure: Trend of PT. Telkom subscribers (millions), 1997 to 2001, showing the actual data Y and the trend values Y']
EXAMPLE OF THE LEAST SQUARES METHOD
Year    Subscribers (Y, millions)   Coded year (X)   Y·X      X²
1997    5.0                         -2               -10.0    4
1998    5.6                         -1               -5.6     1
1999    6.1                         0                0        0
2000    6.7                         1                6.7      1
2001    7.2                         2                14.4     4
Total   ΣY = 30.6                                    ΣYX = 5.5   ΣX² = 10

a = 30.6/5 = 6.12
b = 5.5/10 = 0.55
The trend equation is therefore Y' = 6.12 + 0.55X.
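The arithmetic above can be reproduced with a short Python sketch. It assumes an odd number of consecutive periods, so that the coded X runs symmetrically around 0 as on the slide; `coded_x` and `linear_trend` are illustrative names, not part of the course material, and the 2002 value is only an extrapolation of the fitted line.

```python
def coded_x(n):
    """Coded time index centred on 0, e.g. n=5 -> [-2, -1, 0, 1, 2].

    Assumes an odd number of periods so that sum(X) == 0.
    """
    mid = n // 2
    return [i - mid for i in range(n)]


def linear_trend(y):
    """Least squares trend Y' = a + bX using the coded-X shortcut formulas."""
    x = coded_x(len(y))
    a = sum(y) / len(y)                                   # a = sum(Y) / N
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b = sum_xy / sum(xi * xi for xi in x)                 # b = sum(XY) / sum(X^2)
    return a, b


subscribers = [5.0, 5.6, 6.1, 6.7, 7.2]  # PT. Telkom, 1997-2001, millions
a, b = linear_trend(subscribers)
print(f"Y' = {a:.2f} + {b:.2f}X")        # Y' = 6.12 + 0.55X

# Illustrative extrapolation: 2002 corresponds to X = 3 in this coding.
print(f"Trend value for 2002: {a + b * 3:.2f}")  # Trend value for 2002: 7.77
```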
QUADRATIC TREND ANALYSIS
Over a short period the trend may not be linear. The quadratic method is one example of a nonlinear method:
$Y = a + bX + cX^2$
[Figure: Quadratic trend of the number of subscribers (millions), 1997 to 2001]
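The coefficients a, b and c are found with the following formulas, which assume X is coded so that $\sum X = \sum X^3 = 0$:
$a = \frac{(\sum Y)(\sum X^4) - (\sum X^2 Y)(\sum X^2)}{n\sum X^4 - (\sum X^2)^2}$
$b = \frac{\sum XY}{\sum X^2}$
$c = \frac{n(\sum X^2 Y) - (\sum X^2)(\sum Y)}{n\sum X^4 - (\sum X^2)^2}$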
EXAMPLE OF A QUADRATIC TREND
Year    Y       X    XY       X²      X²Y     X⁴
1997    5.0     -2   -10.00   4.00    20.00   16.00
1998    5.6     -1   -5.60    1.00    5.60    1.00
1999    6.1     0    0.00     0.00    0.00    0.00
2000    6.7     1    6.70     1.00    6.70    1.00
2001    7.2     2    14.40    4.00    28.80   16.00
Total   30.60        5.50     10.00   61.10   34.00

$a = \frac{(\sum Y)(\sum X^4) - (\sum X^2 Y)(\sum X^2)}{n\sum X^4 - (\sum X^2)^2} = \frac{(30.6)(34) - (61.1)(10)}{(5)(34) - (10)^2} = 6.13$
$b = \frac{\sum XY}{\sum X^2} = \frac{5.5}{10} = 0.55$
$c = \frac{n(\sum X^2 Y) - (\sum X^2)(\sum Y)}{n\sum X^4 - (\sum X^2)^2} = \frac{(5)(61.1) - (10)(30.6)}{(5)(34) - (10)^2} = -0.0071$
The quadratic trend equation is therefore $Y' = 6.13 + 0.55X - 0.0071X^2$.
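A minimal Python sketch of the same calculation, assuming X is coded symmetrically so that ΣX = ΣX³ = 0 (the condition under which the closed-form formulas for a and c apply). The function name `quadratic_trend` is hypothetical.

```python
def quadratic_trend(x, y):
    """Return (a, b, c) for Y' = a + bX + cX^2.

    Uses the slide's closed-form formulas, which assume sum(X) == sum(X^3) == 0.
    """
    n = len(y)
    s_y = sum(y)
    s_x2 = sum(xi ** 2 for xi in x)
    s_x4 = sum(xi ** 4 for xi in x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    denom = n * s_x4 - s_x2 ** 2          # shared denominator of a and c
    a = (s_y * s_x4 - s_x2y * s_x2) / denom
    b = s_xy / s_x2
    c = (n * s_x2y - s_x2 * s_y) / denom
    return a, b, c


x = [-2, -1, 0, 1, 2]
y = [5.0, 5.6, 6.1, 6.7, 7.2]
a, b, c = quadratic_trend(x, y)
print(f"Y' = {a:.2f} + {b:.2f}X + ({c:.4f})X^2")  # Y' = 6.13 + 0.55X + (-0.0071)X^2
```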
EXPONENTIAL TREND ANALYSIS
In the exponential equation the time variable (X) appears as an exponent. The values of a and b are obtained from the Y and X data with the following formulas:
$Y' = a(1 + b)^X$
$\ln Y' = \ln a + X \ln(1 + b)$
so that
$a = \operatorname{antiln}\left(\frac{\sum \ln Y}{n}\right), \qquad b = \operatorname{antiln}\left(\frac{\sum X \ln Y}{\sum X^2}\right) - 1$
[Figure: Exponential trend $Y' = a(1+b)^X$ of the number of subscribers (millions), 1997 to 2001]
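EXAMPLE OF AN EXPONENTIAL TREND
Year    Y     X    ln Y   X²      X ln Y
1997    5.0   -2   1.6    4.00    -3.2
1998    5.6   -1   1.7    1.00    -1.7
1999    6.1   0    1.8    0.00    0.0
2000    6.7   1    1.9    1.00    1.9
2001    7.2   2    2.0    4.00    3.9
Total              9.0    10.00   0.9
(ln Y is shown to one decimal place; the products X ln Y use the unrounded logarithms.)

The values of a and b are:
$a = \operatorname{antiln}\left(\frac{\sum \ln Y}{n}\right) = \operatorname{antiln}\left(\frac{9}{5}\right) = 6.049$
$b = \operatorname{antiln}\left(\frac{\sum X \ln Y}{\sum X^2}\right) - 1 = \operatorname{antiln}\left(\frac{0.9}{10}\right) - 1 = 0.094$
The exponential equation is therefore $Y' = 6.049(1 + 0.094)^X$.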
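The sketch below (illustrative function names) shows the calculation in Python. Fed the slide's rounded column totals (Σ ln Y = 9.0, Σ X ln Y = 0.9), it reproduces b = 0.094 and gives a = 6.0496, which the slide shows as 6.049. With unrounded logarithms the coefficients shift slightly, to about a = 6.070 and b = 0.095, so rounding ln Y to one decimal matters at the third decimal.

```python
import math


def exponential_from_sums(sum_ln_y, sum_x_ln_y, sum_x2, n):
    """a = antiln(sum(lnY)/n), b = antiln(sum(X lnY)/sum(X^2)) - 1."""
    a = math.exp(sum_ln_y / n)
    b = math.exp(sum_x_ln_y / sum_x2) - 1
    return a, b


def exponential_trend(x, y):
    """Fit Y' = a(1 + b)^X from raw data (assumes coded X with sum(X) == 0)."""
    ln_y = [math.log(v) for v in y]
    return exponential_from_sums(
        sum(ln_y),
        sum(xi * li for xi, li in zip(x, ln_y)),
        sum(xi * xi for xi in x),
        len(y),
    )


# The slide's rounded column totals.
a_slide, b_slide = exponential_from_sums(9.0, 0.9, 10, 5)
print(f"slide totals:   a = {a_slide:.3f}, b = {b_slide:.3f}")  # a = 6.050, b = 0.094

# Full-precision logarithms.
a_full, b_full = exponential_trend([-2, -1, 0, 1, 2], [5.0, 5.6, 6.1, 6.7, 7.2])
print(f"full precision: a = {a_full:.3f}, b = {b_full:.3f}")    # a = 6.070, b = 0.095
```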
SEASONAL VARIATION
Seasonal variation is the change or fluctuation tied to particular seasons or months within a year.
[Figure: three examples of seasonal variation. "Seasonal Variation of Agricultural Products": Rice Production per Season (000 tons) by period I to III of each year; "Monthly Inflation Variation": Inflation Movement 2002 (%) by month; "Daily Stock Price Variation": PT. Astra Agro Lestari Stock Index, March 2003, by date]
SEASONAL VARIATION WITH THE SIMPLE AVERAGE METHOD
Seasonal index = (average for the season / overall average) × 100
With one year of monthly data, each month's average is simply its value:
Seasonal index = (value for the month / average value) × 100

Month       Revenue   Calculation        Seasonal index
January     88        (88/95) × 100      93
February    82        (82/95) × 100      86
March       106       (106/95) × 100     112
April       98        (98/95) × 100      103
May         112       (112/95) × 100     118
June        92        (92/95) × 100      97
July        102       (102/95) × 100     107
August      96        (96/95) × 100      101
September   105       (105/95) × 100     111
October     85        (85/95) × 100      89
November    102       (102/95) × 100     107
December    76        (76/95) × 100      80
Average     95 (1144/12 = 95.3, rounded to 95)
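A short Python sketch of the simple average method, using the revenue figures from the table. `seasonal_index` is a hypothetical helper; passing `average=95` mimics the slide's rounded average, while the default uses the exact mean, for which the twelve indices average exactly 100.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
revenue = [88, 82, 106, 98, 112, 92, 102, 96, 105, 85, 102, 76]


def seasonal_index(values, average=None):
    """Simple average method: (value / average) * 100 for each period.

    If no average is given, the exact mean of `values` is used.
    """
    if average is None:
        average = sum(values) / len(values)
    return [v / average * 100 for v in values]


# The slide rounds the average (1144 / 12 = 95.33) to 95.
slide_index = [round(i) for i in seasonal_index(revenue, average=95)]
for month, idx in zip(MONTHS, slide_index):
    print(f"{month:<10} {idx}")

# With the exact average the twelve indices average exactly 100.
exact_index = seasonal_index(revenue)
print(round(sum(exact_index) / len(exact_index), 6))  # 100.0
```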
AVERAGE METHOD WITH TREND
• In the average method with trend, the seasonal index is the ratio of the original data value to its trend value (× 100).
• The trend value Y' must therefore be known first, from the equation Y' = a + bX.
AVERAGE METHOD WITH TREND
Month       Y     Y'      Calculation          Seasonal index
January     88    97.41   (88/97.41) × 100     90.3
February    82    97.09   (82/97.09) × 100     84.5
March       106   96.77   (106/96.77) × 100    109.5
April       98    96.13   (98/96.13) × 100     101.9
May         112   95.81   (112/95.81) × 100    116.9
June        92    95.49   (92/95.49) × 100     96.3
July        102   95.17   (102/95.17) × 100    107.2
August      96    94.85   (96/94.85) × 100     101.2
September   105   94.53   (105/94.53) × 100    111.1
October     85    93.89   (85/93.89) × 100     90.5
November    102   93.57   (102/93.57) × 100    109.0
December    76    93.25   (76/93.25) × 100     81.5
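The ratio-to-trend calculation in Python, as a minimal sketch. The trend values Y' are copied from the table rather than recomputed, because the slides do not give the coefficients of the trend equation used; `ratio_to_trend_index` is an illustrative name.

```python
revenue = [88, 82, 106, 98, 112, 92, 102, 96, 105, 85, 102, 76]
# Trend values Y' copied from the slide (not recomputed here).
trend = [97.41, 97.09, 96.77, 96.13, 95.81, 95.49,
         95.17, 94.85, 94.53, 93.89, 93.57, 93.25]


def ratio_to_trend_index(values, trend_values):
    """Seasonal index = (original value / trend value) * 100, one decimal."""
    return [round(v / t * 100, 1) for v, t in zip(values, trend_values)]


index = ratio_to_trend_index(revenue, trend)
print(index)
# [90.3, 84.5, 109.5, 101.9, 116.9, 96.3, 107.2, 101.2, 111.1, 90.5, 109.0, 81.5]
```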
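CYCLICAL VARIATION
[Figure: Cycle of the Composite Stock Price Index (IHSG), 1994 to 2002]
Recall: $Y = T \times S \times C \times I$
Then $TCI = Y/S$ and $CI = TCI/T$, where CI is the cyclical index (it still contains the irregular component I).
In the example below S and CI are expressed as percentages, so $TCI = Y/(S/100)$ and $CI = (TCI/T) \times 100$.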
EXAMPLE OF A CYCLE
Year   Period   Y    T      S     TCI = Y/S   CI = TCI/T   C
1998   I        22   17.5
1998   II       14   17.2   95    14.7        86
1998   III      8    16.8   51    15.7        93           92
1999   I        25   16.5   156   16.0        97           97
1999   II       15   16.1   94    16.0        99           100
1999   III      8    15.8   49    16.3        103          102
2000   I        26   15.4   163   16.0        104          104
2000   II       14   15.1   88    15.9        105          105
2000   III      8    14.7   52    15.4        105          106
2001   I        24   14.3   157   15.3        107          108
2001   II       14   14.0   89    15.7        112
2001   III      9    13.6
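A Python sketch of the decomposition in the table, with S treated as a percentage. The slides do not say how the C column was obtained; our assumption, which reproduces every C value in the table, is a centred three-period moving average of the unrounded CI values. Function names are hypothetical.

```python
# (period, Y, T, S%) from the cycle example; S is None where the slide leaves it blank.
DATA = [
    ("1998-I", 22, 17.5, None),
    ("1998-II", 14, 17.2, 95),
    ("1998-III", 8, 16.8, 51),
    ("1999-I", 25, 16.5, 156),
    ("1999-II", 15, 16.1, 94),
    ("1999-III", 8, 15.8, 49),
    ("2000-I", 26, 15.4, 163),
    ("2000-II", 14, 15.1, 88),
    ("2000-III", 8, 14.7, 52),
    ("2001-I", 24, 14.3, 157),
    ("2001-II", 14, 14.0, 89),
    ("2001-III", 9, 13.6, None),
]


def cyclical_irregular(data):
    """Return {period: CI in %} using TCI = Y / (S/100) and CI = TCI / T * 100."""
    result = {}
    for period, y, t, s in data:
        if s is None:
            continue  # no seasonal index given for this period
        tci = y / (s / 100)
        result[period] = tci / t * 100
    return result


def centred_moving_average(values, window=3):
    """Centred moving average; the first and last window // 2 points get no value."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


ci = cyclical_irregular(DATA)
ci_values = list(ci.values())
c_values = centred_moving_average(ci_values)  # assumed method for the C column
print([round(v) for v in ci_values])  # [86, 93, 97, 99, 103, 104, 105, 105, 107, 112]
print([round(v) for v in c_values])   # [92, 97, 100, 102, 104, 105, 106, 108]
```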
IRREGULAR MOVEMENT
Recall: $Y = T \times S \times C \times I$, so
$TCI = Y/S$, $CI = TCI/T$, $I = CI/C$
IRREGULAR MOVEMENT
Year   Period   CI = TCI/T   C     I = (CI/C) × 100
1998   I
1998   II       86
1998   III      93           92    101
1999   I        97           97    100
1999   II       99           100   99
1999   III      103          102   101
2000   I        104          104   100
2000   II       105          105   100
2000   III      105          106   99
2001   I        107          108   99
2001   II       112
2001   III
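The irregular component follows directly from the rounded CI and C columns above. This minimal sketch uses those rounded table values; computing I from unrounded CI and C would change some entries by one point (for example, 1999 I would give 101). `irregular_index` is an illustrative name.

```python
# Rounded CI and C values from the table, 1998-III to 2001-I.
periods = ["1998-III", "1999-I", "1999-II", "1999-III",
           "2000-I", "2000-II", "2000-III", "2001-I"]
ci = [93, 97, 99, 103, 104, 105, 105, 107]
c = [92, 97, 100, 102, 104, 105, 106, 108]


def irregular_index(ci_values, c_values):
    """I = (CI / C) * 100, rounded to a whole percent as in the table."""
    return [round(a / b * 100) for a, b in zip(ci_values, c_values)]


for period, value in zip(periods, irregular_index(ci, c)):
    print(period, value)
# I column: 101, 100, 99, 101, 100, 100, 99, 99
```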
TESTING REGRESSION COEFFICIENTS WITH ANALYSIS OF VARIANCE
Measures of Variation: The Sum of Squares
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability
SST = Total Sum of Squares
SSR = Regression Sum of Squares
SSE = Error Sum of Squares
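Measures of Variation: The Sum of Squares
$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
[Figure: at a point $X_i$, the deviations of $Y_i$ from $\bar{Y}$ and from the regression line]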
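To see the decomposition SST = SSR + SSE numerically, here is a minimal Python sketch using the seven produce stores that appear later in these slides (square feet as X, annual sales in $000 as Y). `sums_of_squares` is an illustrative name; the results should agree with the Excel ANOVA output up to rounding.

```python
def sums_of_squares(x, y):
    """Fit Y-hat = b0 + b1 X by least squares and return (SST, SSR, SSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    return sst, ssr, sse


# Produce store data (square feet, annual sales in $000).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
sst, ssr, sse = sums_of_squares(square_feet, sales)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(abs(sst - (ssr + sse)) < 1e-6 * sst)  # True: SST = SSR + SSE
```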
Venn Diagrams and Explanatory Power of Regression
[Figure: overlapping circles for Sales and Sizes]
• Variations in store Sizes not used in explaining variation in Sales.
• Variations in Sales explained by the error term, or unexplained by Sizes (SSE).
• Variations in Sales explained by Sizes, or variations in Sizes used in explaining variation in Sales (SSR).
The ANOVA Table in Excel
ANOVA        df          SS     MS                     F         Significance F
Regression   k           SSR    MSR = SSR/k            MSR/MSE   P-value of the F Test
Residuals    n - k - 1   SSE    MSE = SSE/(n - k - 1)
Total        n - 1       SST
Measures of Variation
The Sum of Squares: Example
Excel Output for Produce Stores
ANOVA        df   SS            MS          F          Significance F
Regression   1    30380456.12   30380456    81.17909   0.000281201
Residual     5    1871199.595   374239.92
Total        6    32251655.71
The df column gives the regression (explained) df, the error (residual) df and the total df; the SS column gives SSR, SSE and SST.
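The remaining ANOVA columns follow from SSR and SSE alone. This sketch (hypothetical helper `anova_simple_regression`) fills in df, MS and F for the produce store example; the Significance F p-value needs the F distribution, which Python's standard library does not provide, so it is not computed here.

```python
def anova_simple_regression(ssr, sse, n, k=1):
    """Build the ANOVA quantities from SSR and SSE (k = number of X variables)."""
    df_reg, df_res, df_tot = k, n - k - 1, n - 1
    msr = ssr / df_reg          # MSR = SSR / k
    mse = sse / df_res          # MSE = SSE / (n - k - 1)
    return {
        "df": (df_reg, df_res, df_tot),
        "SST": ssr + sse,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,         # F = MSR / MSE
    }


table = anova_simple_regression(ssr=30380456.12, sse=1871199.595, n=7)
print(table["df"])                   # (1, 5, 6)
print(f"MSE = {table['MSE']:.3f}")   # MSE = 374239.919
print(f"F = {table['F']:.3f}")       # F = 81.179
```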
Venn Diagrams and Explanatory Power of Regression
[Figure: overlapping circles for Sales and Sizes; the overlap is SSR]
$r^2 = \frac{SSR}{SSR + SSE}$
Standard Error of Estimate
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
• Measures the standard deviation (variation) of the Y values around the regression equation
Measures of Variation: Produce Store Example
Excel Output for Produce Stores
Regression Statistics
Multiple R          0.9705572
R Square            0.94198129   (r² = .94)
Adjusted R Square   0.93037754
Standard Error      611.751517   (S_YX)
Observations        7            (n)
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
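The regression statistics in the Excel output can be recomputed from SSR and SSE. A minimal Python sketch, assuming one explanatory variable (so the residual df is n - 2); the adjusted R² formula used is the standard 1 - (1 - r²)(n - 1)/(n - 2), which these slides do not derive. `regression_statistics` is an illustrative name.

```python
import math


def regression_statistics(ssr, sse, n):
    """Summary statistics for a simple (one-X) linear regression."""
    sst = ssr + sse
    r2 = ssr / sst                                  # r^2 = SSR / SST
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)       # adjusted for one predictor
    s_yx = math.sqrt(sse / (n - 2))                 # standard error of estimate
    return {"Multiple R": math.sqrt(r2), "R Square": r2,
            "Adjusted R Square": adj_r2, "Standard Error": s_yx}


stats = regression_statistics(ssr=30380456.12, sse=1871199.595, n=7)
for name, value in stats.items():
    print(f"{name:<18} {value:.6f}")
```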
Linear Regression Assumptions
• Normality
– Y values are normally distributed for each X
– Probability distribution of error is normal
• Homoscedasticity (Constant Variance)
• Independence of Errors
Consequences of Violation of the Assumptions
• Violation of the Assumptions
– Non-normality (error not normally distributed)
– Heteroscedasticity (variance not constant)
• Usually happens in cross-sectional data
– Autocorrelation (errors are not independent)
• Usually happens in time-series data
• Consequences of Any Violation of the Assumptions
– Predictions and estimations obtained from the
sample regression line will not be accurate
– Hypothesis testing results will not be reliable
• It is Important to Verify the Assumptions
Variation of Errors Around the Regression Line
• Y values are normally distributed around the regression line.
• For each X value, the “spread” or variance around the regression line is the same.
[Figure: normal error densities f(e) drawn at X1 and X2 around the sample regression line]
Inference about the Slope: t Test
• t Test for a Population Slope
– Is there a linear dependency of Y on X?
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• Test Statistic
– $t = \frac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
– d.f. = n - 2
Example: Produce Store
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Estimated Regression Equation:
$\hat{Y}_i = 1636.415 + 1.487X_i$
The slope of this model is 1.487.
Does square footage affect annual sales?
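The estimated equation can be checked with a short least squares sketch in Python (illustrative function name `least_squares_line`); it should reproduce the coefficients reported by Excel.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for Y-hat = b0 + b1 X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                     # slope = Sxy / Sxx
    return y_bar - b1 * x_bar, b1      # intercept = Y-bar - b1 * X-bar


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # $000
b0, b1 = least_squares_line(square_feet, sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f}X")   # Y-hat = 1636.415 + 1.487X
```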
Inferences about the Slope: t Test Example
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
α = .05
df = 7 - 2 = 5
Critical Value(s): ±2.5706 (rejection region of .025 in each tail)
From the Excel printout:
            Coefficients   Standard Error   t Stat   P-value
Intercept   1636.4147      451.4953         3.6244   0.01515
Footage     1.4866         0.1650           9.0099   0.00028
Test Statistic: $t = \frac{b_1}{S_{b_1}} = 9.0099$ (p-value 0.00028)
Decision: Reject H0.
Conclusion: There is evidence that square footage affects annual sales.
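A minimal Python sketch of this t test, recomputing b1, S_b1 and t from the raw store data. The critical value 2.5706 is taken from the slide's t table rather than computed, since the standard library has no t distribution; `slope_t_test` is a hypothetical name.

```python
import math

T_CRIT = 2.5706  # two-tailed 5% critical value for df = 5, from the t table


def slope_t_test(x, y):
    """Return (b1, S_b1, t) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))        # standard error of estimate
    s_b1 = s_yx / math.sqrt(sxx)           # standard error of the slope
    return b1, s_b1, b1 / s_b1


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
b1, s_b1, t = slope_t_test(square_feet, sales)
print(f"b1 = {b1:.4f}, S_b1 = {s_b1:.4f}, t = {t:.4f}")  # b1 = 1.4866, S_b1 = 0.1650, t = 9.0099
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")  # Reject H0
```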
Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope:
$b_1 \pm t_{n-2} S_{b_1}$
Excel Printout for Produce Stores
            Lower 95%     Upper 95%
Intercept   475.810926    2797.01853
Footage     1.06249037    1.91077694
At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
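Using the coefficients and standard errors from the Excel printout, the interval b1 ± t S_b1 can be sketched as follows. The small differences from Excel's endpoints come from rounding t to 2.5706 and from the rounded printout inputs; `coefficient_ci` is an illustrative helper.

```python
T_CRIT = 2.5706  # t value for df = n - 2 = 5 at 95% confidence, from the t table


def coefficient_ci(estimate, std_error, t_crit=T_CRIT):
    """Confidence interval: estimate +/- t * standard error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Coefficients and standard errors from the Excel printout.
slope_ci = coefficient_ci(1.4866, 0.1650)
intercept_ci = coefficient_ci(1636.4147, 451.4953)
print(f"slope:     ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")   # slope:     (1.062, 1.911)
print(f"intercept: ({intercept_ci[0]:.1f}, {intercept_ci[1]:.1f})")
print("0 inside slope CI:", slope_ci[0] <= 0 <= slope_ci[1])  # 0 inside slope CI: False
```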
Inferences about the Slope: F Test
• F Test for a Population Slope
– Is there a linear dependency of Y on X?
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• Test Statistic
– $F = \frac{SSR/1}{SSE/(n-2)}$, which follows $F_{1,\,n-2}$ under H0
Relationship between a t Test and an F Test
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• $(t_{n-2})^2 = F_{1,\,n-2}$
• The p-value of a t Test and the p-value of an F Test are Exactly the Same
• The Rejection Region of an F Test is Always in the Upper Tail
Inferences about the Slope: F Test Example
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
α = .05
Numerator df = 1
Denominator df = 7 - 2 = 5
Critical value: $F_{1,\,n-2} = F_{1,5} = 6.61$ (reject H0 if F > 6.61; α = .05 in the upper tail)
From the Excel printout:
ANOVA        df   SS            MS            F        Significance F
Regression   1    30380456.12   30380456.12   81.179   0.000281
Residual     5    1871199.595   374239.919
Total        6    32251655.71
Test Statistic: F = 81.179 (p-value 0.000281)
Decision: Reject H0.
Conclusion: There is evidence that square footage affects annual sales.
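The identity (t_{n-2})² = F_{1,n-2} can be checked numerically on the produce store data. In this sketch (hypothetical `slope_t_and_f`) t and F are computed separately from the fitted line; the 6.61 critical value is taken from the slide rather than computed.

```python
import math

F_CRIT = 6.61  # F(1, 5) upper-tail critical value at alpha = .05, from the F table


def slope_t_and_f(x, y):
    """Return (t, F) for H0: beta1 = 0; in simple regression F equals t squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = b1 * sxy                                       # SSR = b1 * Sxy
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    t = b1 / math.sqrt(mse / sxx)                        # t = b1 / S_b1
    f = (ssr / 1) / mse                                  # F = MSR / MSE
    return t, f


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
t, f = slope_t_and_f(square_feet, sales)
print(f"t^2 = {t * t:.3f}, F = {f:.3f}")  # t^2 = 81.179, F = 81.179
print("Reject H0" if f > F_CRIT else "Do not reject H0")  # Reject H0
```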
Purpose of Correlation Analysis
• Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical Variables
– Only strength of the relationship is concerned
– No causal effect is implied
Purpose of Correlation Analysis
• Population Correlation Coefficient ρ (Rho) is Used to Measure the Strength between the Variables
$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Purpose of Correlation Analysis (continued)
• Sample Correlation Coefficient r is an Estimate of ρ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
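A direct Python translation of the sample correlation formula (illustrative name `sample_correlation`), applied to the produce store data; it should match the Multiple R reported by Excel.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
r = sample_correlation(square_feet, sales)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # r = 0.9706, r^2 = 0.9420
```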
Sample Observations from Various r Values
[Figure: five scatter plots of Y against X for r = -1, r = -.6, r = 0, r = .6 and r = 1]
Features of ρ and r
• Unit Free
• Range between -1 and 1
• The Closer to -1, the Stronger the Negative Linear Relationship
• The Closer to 1, the Stronger the Positive Linear Relationship
• The Closer to 0, the Weaker the Linear Relationship
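t Test for Correlation
• Hypotheses
– H0: ρ = 0 (no correlation)
– H1: ρ ≠ 0 (correlation)
• Test Statistic
– $t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}}$, with d.f. = n - 2, where
$r = \pm\sqrt{r^2}$ (taking the sign of $b_1$) $= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$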
Example: Produce Stores
Is there any evidence of a linear relationship between the annual sales of a store and its square footage at the .05 level of significance?
H0: ρ = 0 (no association)
H1: ρ ≠ 0 (association)
α = .05
From the Excel printout:
Regression Statistics
Multiple R          0.9705572    (r)
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
Example: Produce Stores
Solution
$t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.9706}{\sqrt{\frac{1 - .9420}{5}}} = 9.0099$
Critical Value(s): ±2.5706 (rejection region of .025 in each tail)
Decision: Reject H0.
Conclusion: There is evidence of a linear relationship at the 5% level of significance.
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.
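The test statistic can be sketched in Python from r alone. This assumes H0: ρ = 0, as on the slide, and takes the critical value 2.5706 from the t table; `correlation_t` is a hypothetical helper.

```python
import math

T_CRIT = 2.5706  # two-tailed 5% critical value, df = n - 2 = 5


def correlation_t(r, n):
    """t = (r - 0) / sqrt((1 - r^2) / (n - 2)) under H0: rho = 0, df = n - 2."""
    return r / math.sqrt((1 - r * r) / (n - 2))


t = correlation_t(r=0.9705572, n=7)   # r ("Multiple R") from the Excel printout
print(f"t = {t:.4f}")                  # t = 9.0099
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")  # Reject H0
```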
Estimation of Mean Values
Confidence Interval Estimate for $\mu_{Y|X=X_i}$: the Mean of Y Given a Particular $X_i$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
• $S_{YX}$: standard error of the estimate
• $t_{n-2}$: t value from the table with df = n - 2
• The size of the interval varies according to the distance of $X_i$ from the mean $\bar{X}$
Prediction of Individual Values
Prediction Interval for an Individual Response $Y_i$ at a Particular $X_i$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
The addition of 1 under the square root makes this interval wider than the interval for the mean of Y.
Interval Estimates for Different Values of X
[Figure: around the regression line, the confidence interval for the mean of Y and the wider prediction interval for an individual $Y_i$, both narrowest at $\bar{X}$, shown at a given X]
Example: Produce Stores
Data for 7 Stores: the same square footage and annual sales ($000) as in the earlier produce store example.
Consider a store with 2,000 square feet.
Regression Model Obtained:
$\hat{Y}_i = 1636.415 + 1.487X_i$
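Estimation of Mean Values: Example
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted sales: $\hat{Y}_i = 1636.415 + 1.487X_i \approx 4610$ ($000); with the unrounded Excel coefficients, $\hat{Y}_i = 4609.68$, which is the centre of the interval below.
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4609.68 \pm 612.66$
$3997.02 \le \mu_{Y|X=X_i} \le 5222.34$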
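A minimal Python sketch of this confidence interval, refitting the line from the raw data so the unrounded coefficients are used (which is why the predicted value comes out as about 4609.68). t = 2.5706 is the slide's table value; `mean_response_ci` is an illustrative name.

```python
import math

T_CRIT = 2.5706  # t_5 from the t table, as on the slide


def mean_response_ci(x, y, x_new, t_crit=T_CRIT):
    """Return (y_hat, half_width) of the CI for the mean of Y at x_new."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))              # standard error of estimate
    y_hat = b0 + b1 * x_new
    half = t_crit * s_yx * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, half


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
y_hat, half = mean_response_ci(square_feet, sales, 2000)
print(f"{y_hat:.2f} +/- {half:.2f}")                 # 4609.68 +/- 612.66
print(f"({y_hat - half:.2f}, {y_hat + half:.2f})")  # (3997.02, 5222.34)
```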
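Prediction Interval for Y: Example
Prediction Interval for an Individual $Y_{X=X_i}$
Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.
Predicted sales: $\hat{Y}_i = 1636.415 + 1.487X_i \approx 4610$ ($000); with the unrounded Excel coefficients, $\hat{Y}_i = 4609.68$, which is the centre of the interval below.
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4609.68 \pm 1687.68$
$2922.00 \le Y_{X=X_i} \le 6297.37$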
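The prediction interval differs from the previous interval only by the extra 1 under the square root. This sketch (hypothetical `intervals_at`) computes both half-widths for X = 2,000 so the difference is visible; with t rounded to 2.5706 the endpoints land within about 0.02 of the slide's values.

```python
import math

T_CRIT = 2.5706  # t_5 from the t table, as on the slide


def intervals_at(x, y, x_new, t_crit=T_CRIT):
    """Return (y_hat, ci_half, pi_half) at x_new.

    ci_half: half-width of the confidence interval for the mean of Y.
    pi_half: half-width of the prediction interval for one individual Y.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = t_crit * s_yx * math.sqrt(leverage)
    pi_half = t_crit * s_yx * math.sqrt(1 + leverage)  # the added 1 widens the interval
    return b0 + b1 * x_new, ci_half, pi_half


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
y_hat, ci_half, pi_half = intervals_at(square_feet, sales, 2000)
print(f"prediction interval: ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
print(f"PI half-width {pi_half:.2f} vs CI half-width {ci_half:.2f}")
```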
USING MS EXCEL FOR REGRESSION
• Enter the Y data and the X data in an MS Excel sheet, for example the Y data in column A and the X data in column B, from row 1 to row 5.
• Open the Tools menu, choose 'Data Analysis', and select 'Regression'. (In newer versions of Excel, Data Analysis is on the Data tab and requires the Analysis ToolPak add-in.)
• In the dialog box, for the Input Y Range select the Y data by highlighting column A, i.e. A1:A5; for the Input X Range select the X data by highlighting column B, i.e. B1:B5.
• Click OK and the output appears. For Y' = a + bX, a is reported as Intercept and b as X Variable 1 in the Coefficients column.
HAPPY STUDYING AND BEST OF LUCK