Course: A0392 - Statistik Ekonomi (Economic Statistics)
Year: 2010

Session 13
Time Series Data and Simple Linear Regression and Correlation Analysis

Outline:
• Time series data
• Simple linear regression analysis
• Correlation coefficient and test of dependence between random variables

INTRODUCTION
• Time series data are a set of observations recorded over a given period of time.
• The benefit of analyzing time series data is insight into future conditions.
• Forecasts of future conditions are useful for planning production, marketing, finance, and other areas.

COMPONENTS OF TIME SERIES DATA
Trend; seasonal variation; cyclical variation; and irregular variation.

TREND
A trend is a long-run tendency to move upward or downward, obtained from the average change over time, and its values are fairly smooth.

[Figure: two panels plotting Y against year (X): positive trend and negative trend]

Least Squares Method for a Linear Trend
The trend line is the line with the smallest sum of squared differences between the original data and the values on the trend line.

$Y' = a + bX$, with $a = \sum Y / N$ and $b = \sum XY / \sum X^2$, where the years are coded so that $\sum X = 0$.

[Figure: PT. Telkom subscriber trend; subscribers (millions) by year, 1997 to 2001, showing the data Y and the trend values Y']

EXAMPLE OF THE LEAST SQUARES METHOD

Year     Subscribers (Y)   Coded year (X)   Y.X      X^2
1997     5.0               -2               -10.0    4
1998     5.6               -1               -5.6     1
1999     6.1               0                0        0
2000     6.7               1                6.7      1
2001     7.2               2                14.4     4
Total    30.6                               5.5      10

$a = 30.6 / 5 = 6.12$
$b = 5.5 / 10 = 0.55$
So the trend equation is $Y' = 6.12 + 0.55X$.

QUADRATIC TREND ANALYSIS
Over a short period, the trend may not be linear. The quadratic method is one example of a nonlinear method.

$Y = a + bX + cX^2$

[Figure: quadratic trend of the number of subscribers (millions), 1997 to 2001]

The coefficients a, b, and c are found with the following formulas:

$a = \dfrac{(\sum Y)(\sum X^4) - (\sum X^2 Y)(\sum X^2)}{n(\sum X^4) - (\sum X^2)^2}$

$b = \dfrac{\sum XY}{\sum X^2}$

$c = \dfrac{n(\sum X^2 Y) - (\sum X^2)(\sum Y)}{n(\sum X^4) - (\sum X^2)^2}$

EXAMPLE OF A QUADRATIC TREND

Year     Y       X     XY       X^2     X^2.Y    X^4
1997     5.0     -2    -10.00   4.00    20.00    16.00
1998     5.6     -1    -5.60    1.00    5.60     1.00
1999     6.1     0     0.00     0.00    0.00     0.00
2000     6.7     1     6.70     1.00    6.70     1.00
2001     7.2     2     14.40    4.00    28.80    16.00
Total    30.60         5.50     10.00   61.10    34.00

$a = \dfrac{(30.6)(34) - (61.1)(10)}{(5)(34) - (10)^2} = 6.13$

$b = 5.5 / 10 = 0.55$

$c = \dfrac{(5)(61.1) - (10)(30.6)}{(5)(34) - (10)^2} = -0.0071$

So the quadratic equation is $Y = 6.13 + 0.55X - 0.0071X^2$.

EXPONENTIAL TREND ANALYSIS
In an exponential trend equation, the time variable (X) appears as the exponent. The values of a and b are found from the data Y and X with the following formulas:

$Y' = a(1 + b)^X$
$\ln Y' = \ln a + X \ln(1 + b)$

so that

$a = \operatorname{antiln}\left(\dfrac{\sum \ln Y}{n}\right)$

$b = \operatorname{antiln}\left(\dfrac{\sum (X \ln Y)}{\sum X^2}\right) - 1$

[Figure: exponential trend of the number of subscribers (millions), 1997 to 2001, $Y = a(1 + b)^X$]

EXAMPLE OF AN EXPONENTIAL TREND

Year     Y      X     Ln Y    X^2     X.Ln Y
1997     5.0    -2    1.6     4.00    -3.2
1998     5.6    -1    1.7     1.00    -1.7
1999     6.1    0     1.8     0.00    0.0
2000     6.7    1     1.9     1.00    1.9
2001     7.2    2     2.0     4.00    3.9
Total                 9.0     10.00   0.9

The values of a and b are:

$a = \operatorname{antiln}(9 / 5) = 6.049$
$b = \operatorname{antiln}(0.9 / 10) - 1 = 0.094$

So the exponential equation is $Y = 6.049(1 + 0.094)^X$.

SEASONAL VARIATION
Seasonal variation refers to changes or fluctuations in particular seasons or months within one year.
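The three trend fits for the PT. Telkom subscriber data (linear, quadratic, and exponential) can be cross-checked with a short script. This is a minimal sketch that applies the coded-year formulas from the slides directly; the variable names are illustrative, and the logarithms are kept at full precision rather than rounded to one decimal as in the worked table.

```python
import math

# PT. Telkom subscribers (millions), 1997-2001, as in the worked examples.
y = [5.0, 5.6, 6.1, 6.7, 7.2]
x = [-2, -1, 0, 1, 2]  # coded years; the formulas below rely on sum(x) == 0
n = len(y)

sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_x4 = sum(xi ** 4 for xi in x)
sum_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))

# Linear trend: Y' = a + bX
a_lin = sum_y / n
b_lin = sum_xy / sum_x2

# Quadratic trend: Y = a + bX + cX^2
denom = n * sum_x4 - sum_x2 ** 2
a_quad = (sum_y * sum_x4 - sum_x2y * sum_x2) / denom
b_quad = sum_xy / sum_x2
c_quad = (n * sum_x2y - sum_x2 * sum_y) / denom

# Exponential trend: Y' = a(1 + b)^X, fitted on ln Y
sum_ln_y = sum(math.log(yi) for yi in y)
sum_x_ln_y = sum(xi * math.log(yi) for xi, yi in zip(x, y))
a_exp = math.exp(sum_ln_y / n)
b_exp = math.exp(sum_x_ln_y / sum_x2) - 1

print(f"linear:      a={a_lin:.2f}, b={b_lin:.2f}")
print(f"quadratic:   a={a_quad:.2f}, b={b_quad:.2f}, c={c_quad:.4f}")
print(f"exponential: a={a_exp:.3f}, b={b_exp:.3f}")
```

The linear and quadratic coefficients agree with the worked examples. The exponential coefficients come out slightly higher (about 6.07 and 0.095) because the script does not round Ln Y to one decimal place.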
[Figure: three panels. (1) Rice production per season, production (000 tons) by quarter, 1998 to 2001: seasonal variation of agricultural products. (2) Inflation movement in 2002, inflation (%) by month: monthly variation of inflation. (3) Stock index of PT. Astra Agro Lestari, March 2003, index by date: daily variation of the stock price.]

SEASONAL VARIATION WITH THE SIMPLE AVERAGE METHOD
Seasonal index = (average per quarter / overall average) × 100

Month        Income   Formula: (value for the month / average value) × 100   Seasonal index
January      88       (88/95) × 100                                          93
February     82       (82/95) × 100                                          86
March        106      (106/95) × 100                                         112
April        98       (98/95) × 100                                          103
May          112      (112/95) × 100                                         118
June         92       (92/95) × 100                                          97
July         102      (102/95) × 100                                         107
August       96       (96/95) × 100                                          101
September    105      (105/95) × 100                                         111
October      85       (85/95) × 100                                          89
November     102      (102/95) × 100                                         107
December     76       (76/95) × 100                                          80
Average      95

METHOD OF AVERAGES WITH TREND
• In the method of averages with trend, the seasonal index is obtained by dividing each original data value by the corresponding trend value.
• The trend value Y' must therefore be known, using the equation $Y' = a + bX$.
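The simple-average seasonal indices in the monthly income table can be reproduced with a few lines of code. This is a minimal sketch; it assumes, as the table appears to do, that the monthly mean (about 95.33) is rounded to 95 before dividing, and the variable names are illustrative.

```python
# Monthly income values from the simple-average example.
income = {
    "January": 88, "February": 82, "March": 106, "April": 98,
    "May": 112, "June": 92, "July": 102, "August": 96,
    "September": 105, "October": 85, "November": 102, "December": 76,
}

# Assumption: the table rounds the mean to a whole number (95.33 -> 95).
mean_income = round(sum(income.values()) / len(income))

# Seasonal index = (value for the month / average value) x 100, rounded.
seasonal_index = {
    month: round(100 * value / mean_income) for month, value in income.items()
}

for month, index in seasonal_index.items():
    print(f"{month:<10} {income[month]:>4} {index:>4}")
```

An index above 100 marks a month whose income is above the yearly average, and an index below 100 marks a month below it.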
METHOD OF AVERAGES WITH TREND

Month        Y      Y'       Calculation            Seasonal index
January      88     97.41    (88/97.41) × 100       90.3
February     82     97.09    (82/97.09) × 100       84.5
March        106    96.77    (106/96.77) × 100      109.5
April        98     96.13    (98/96.13) × 100       101.9
May          112    95.81    (112/95.81) × 100      116.9
June         92     95.49    (92/95.49) × 100       96.3
July         102    95.17    (102/95.17) × 100      107.2
August       96     94.85    (96/94.85) × 100       101.2
September    105    94.53    (105/94.53) × 100      111.1
October      85     93.89    (85/93.89) × 100       90.5
November     102    93.57    (102/93.57) × 100      109.0
December     76     93.25    (76/93.25) × 100       81.5

CYCLICAL VARIATION
Recall that $Y = T \times S \times C \times I$ (trend, seasonal, cyclical, and irregular components). Then
$TCI = Y / S$
$CI = TCI / T$
where CI is the cycle index.

[Figure: cycle of the composite stock index (IHSG), cycle values by year, 1994 to 2002]

EXAMPLE OF A CYCLE

Year    Quarter   Y     T       S      TCI = Y/S   CI = TCI/T   C
1998    I         22    17.5
1998    II        14    17.2    95     14.7        86
1998    III       8     16.8    51     15.7        93           92
1999    I         25    16.5    156    16.0        97           97
1999    II        15    16.1    94     16.0        99           100
1999    III       8     15.8    49     16.3        103          102
2000    I         26    15.4    163    16.0        104          104
2000    II        14    15.1    88     15.9        105          105
2000    III       8     14.7    52     15.4        105          106
2001    I         24    14.3    157    15.3        107          108
2001    II        14    14.0    89     15.7        112
2001    III       9     13.6

IRREGULAR MOVEMENT
Recall that $Y = T \times S \times C \times I$, so
$TCI = Y / S$
$CI = TCI / T$
$I = CI / C$

IRREGULAR MOVEMENT

Year    Quarter   CI = TCI/T   C      I = (CI/C) × 100
1998    I
1998    II        86
1998    III       93           92     101
1999    I         97           97     100
1999    II        99           100    99
1999    III       103          102    101
2000    I         104          104    100
2000    II        105          105    100
2000    III       105          106    99
2001    I         107          108    99
2001    II        112
2001    III

TESTING REGRESSION COEFFICIENTS WITH ANALYSIS OF VARIANCE

Measures of Variation: The Sum of Squares
SST = SSR + SSE
(Total Sample Variability = Explained Variability + Unexplained Variability)
• SST = Total Sum of Squares
• SSR = Regression Sum of Squares
• SSE = Error Sum of Squares

Measures of Variation: The Sum of Squares
$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$

[Figure: Y against X with the regression line, showing at a point $X_i$ the deviations that make up SST, SSR, and SSE relative to $\bar{Y}$]

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
• Variations in store Sizes not used in explaining variation in Sales
• Variations in Sales explained by the error term or unexplained by Sizes (SSE)
• Variations in Sales explained by Sizes, or variations in Sizes used in explaining variation in Sales (SSR)

The ANOVA Table in Excel

ANOVA         df         SS     MS                      F          Significance F
Regression    k          SSR    MSR = SSR/k             MSR/MSE    P-value of the F Test
Residuals     n-k-1      SSE    MSE = SSE/(n-k-1)
Total         n-1        SST

Measures of Variation, The Sum of Squares: Example
Excel Output for Produce Stores

ANOVA         df    SS             MS           F           Significance F
Regression    1     30380456.12    30380456     81.17909    0.000281201
Residual      5     1871199.595    374239.92
Total         6     32251655.71

The df column gives the degrees of freedom: regression (explained) df, error (residual) df, and total df. The SS column gives SSR, SSE, and SST, in that order.

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes, with the overlap labeled SSR]

$r^2 = \dfrac{SSR}{SST} = \dfrac{SSR}{SSR + SSE}$

Standard Error of Estimate
• $S_{YX} = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$
• Measures the standard deviation (variation) of the Y values around the regression equation

Measures of Variation: Produce Store Example
Excel Output for Produce Stores

Regression Statistics
Multiple R           0.9705572
R Square             0.94198129   ($r^2 = .94$)
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7

94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
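The sums of squares, $r^2$, the standard error of estimate, and the F statistic in the Excel output can be recomputed from the raw data. This is a minimal sketch, assuming the seven-store Produce Store data (square feet and annual sales in $000) that the printout summarizes; the variable names are illustrative and only the standard library is used.

```python
import math

# Produce store data: square feet (X) and annual sales in $000 (Y).
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(sqft)

x_bar = sum(sqft) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept.
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

# Decomposition of variation: SST = SSR + SSE.
sst = sum((y - y_bar) ** 2 for y in sales)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))

r_squared = ssr / sst                   # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))         # standard error of estimate
f_stat = (ssr / 1) / (sse / (n - 2))    # MSR / MSE with k = 1 regressor

print(f"SSR={ssr:.2f} SSE={sse:.2f} SST={sst:.2f}")
print(f"r^2={r_squared:.4f} S_YX={s_yx:.2f} F={f_stat:.3f}")
```

The printed values should agree with the ANOVA and Regression Statistics tables up to rounding.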
In the Excel output for the produce stores, the Standard Error row is $S_{YX}$.

Linear Regression Assumptions
• Normality
  – Y values are normally distributed for each X
  – Probability distribution of error is normal
• Homoscedasticity (Constant Variance)
• Independence of Errors

Consequences of Violation of the Assumptions
• Violation of the Assumptions
  – Non-normality (error not normally distributed)
  – Heteroscedasticity (variance not constant)
    • Usually happens in cross-sectional data
  – Autocorrelation (errors are not independent)
    • Usually happens in time-series data
• Consequences of Any Violation of the Assumptions
  – Predictions and estimations obtained from the sample regression line will not be accurate
  – Hypothesis testing results will not be reliable
• It is Important to Verify the Assumptions

Variation of Errors Around the Regression Line
• Y values are normally distributed around the regression line.
• For each X value, the "spread" or variance around the regression line is the same.

[Figure: sample regression line of Y on X with the error density f(e) drawn at two values, X1 and X2]

Inference about the Slope: t Test
• t Test for a Population Slope
  – Is there a linear dependency of Y on X?
• Null and Alternative Hypotheses
  – $H_0: \beta_1 = 0$ (no linear dependency)
  – $H_1: \beta_1 \neq 0$ (linear dependency)
• Test Statistic
  – $t = \dfrac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
  – d.f. = n − 2

Example: Produce Store
Data for 7 Stores:

Store    Square Feet    Annual Sales ($000)
1        1,726          3,681
2        1,542          3,395
3        2,816          6,653
4        5,555          9,543
5        1,292          3,318
6        2,208          5,563
7        1,313          3,760

Estimated Regression Equation: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487. Does square footage affect annual sales?

Inferences about the Slope: t Test Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$\alpha = .05$
df = 7 − 2 = 5
Critical Value(s): −2.5706 and 2.5706 (reject $H_0$ in either tail, area .025 each)

From Excel Printout:

             Coefficients    Standard Error    t Stat    P-value
Intercept    1636.4147       451.4953          3.6244    0.01515
Footage      1.4866          0.1650            9.0099    0.00028

Test Statistic: $t = b_1 / S_{b_1} = 1.4866 / 0.1650 = 9.0099$ (the Footage row gives $b_1$, $S_{b_1}$, t, and the p-value)
Decision: Reject $H_0$.
Conclusion: There is evidence that square footage affects annual sales.
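The t test for the slope can be traced step by step from the produce store data. This is a minimal sketch: the critical value 2.5706 is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math

# Produce store data: square feet (X) and annual sales in $000 (Y).
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(sqft)

x_bar = sum(sqft) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))

b1 = sxy / sxx             # sample slope
b0 = y_bar - b1 * x_bar    # sample intercept

# Standard error of estimate, then standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(sxx)

# Test statistic under H0: beta_1 = 0, with n - 2 degrees of freedom.
t_stat = (b1 - 0) / s_b1

T_CRITICAL = 2.5706  # two-tailed, alpha = .05, df = 5 (value taken from the slide)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"b1={b1:.4f} S_b1={s_b1:.4f} t={t_stat:.4f} reject H0: {reject_h0}")
```

Because the computed t is far beyond the critical value, the sketch reaches the same decision as the slide.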
Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope: $b_1 \pm t_{n-2} S_{b_1}$

Excel Printout for Produce Stores:

             Lower 95%      Upper 95%
Intercept    475.810926     2797.01853
Footage      1.06249037     1.91077694

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.

Inferences about the Slope: F Test
• F Test for a Population Slope
  – Is there a linear dependency of Y on X?
• Null and Alternative Hypotheses
  – $H_0: \beta_1 = 0$ (no linear dependency)
  – $H_1: \beta_1 \neq 0$ (linear dependency)
• Test Statistic
  – $F = \dfrac{SSR / 1}{SSE / (n - 2)}$

Relationship between a t Test and an F Test
• Null and Alternative Hypotheses
  – $H_0: \beta_1 = 0$ (no linear dependency)
  – $H_1: \beta_1 \neq 0$ (linear dependency)
• $(t_{n-2})^2 = F_{1,\,n-2}$
• The p-value of a t Test and the p-value of an F Test are Exactly the Same
• The Rejection Region of an F Test is Always in the Upper Tail

Inferences about the Slope: F Test Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$\alpha = .05$
numerator df = 1
denominator df = 7 − 2 = 5
Critical value: $F_{1,\,n-2} = 6.61$ (reject $H_0$ in the upper tail, area .05)

Test Statistic, From Excel Printout:

ANOVA         df    SS             MS             F         Significance F
Regression    1     30380456.12    30380456.12    81.179    0.000281 (p-value)
Residual      5     1871199.595    374239.919
Total         6     32251655.71

Decision: Reject $H_0$.
Conclusion: There is evidence that square footage affects annual sales.
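The confidence interval for the slope and the link between the t and F statistics can be checked directly from the printed Excel values. This is a minimal sketch: every input is copied from the slides (rounded as printed), so small differences in the last digits are expected, and the variable names are illustrative.

```python
# Values copied from the Excel printouts for the produce stores.
b1 = 1.4866            # slope coefficient (Footage)
s_b1 = 0.1650          # standard error of the slope
t_stat = 9.0099        # t Stat for the slope
ssr = 30380456.12      # regression sum of squares
sse = 1871199.595      # error sum of squares
n = 7

T_CRITICAL = 2.5706    # t with 5 df, two-tailed alpha = .05 (from the slide)
F_CRITICAL = 6.61      # F with (1, 5) df, alpha = .05 (from the slide)

# Confidence interval for the slope: b1 +/- t * S_b1.
ci_lower = b1 - T_CRITICAL * s_b1
ci_upper = b1 + T_CRITICAL * s_b1
slope_significant = not (ci_lower <= 0 <= ci_upper)

# F statistic: (SSR / 1) / (SSE / (n - 2)).
f_stat = (ssr / 1) / (sse / (n - 2))
reject_h0 = f_stat > F_CRITICAL

print(f"95% CI for slope: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"F={f_stat:.3f}, t^2={t_stat ** 2:.3f}, reject H0: {reject_h0}")
```

With one explanatory variable, the squared t statistic and the F statistic agree up to the rounding of the printed inputs, which is why both tests lead to the same decision.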
Purpose of Correlation Analysis
• Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical Variables
  – Only strength of the relationship is concerned
  – No causal effect is implied

Purpose of Correlation Analysis (continued)
• Population Correlation Coefficient $\rho$ (Rho) is Used to Measure the Strength between the Variables

  $\rho = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$

Purpose of Correlation Analysis (continued)
• Sample Correlation Coefficient r is an Estimate of $\rho$ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations

  $r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$

Sample Observations from Various r Values
[Figure: five scatter plots of Y against X, for r = -1, r = -.6, r = 0, r = .6, and r = 1]

Features of $\rho$ and r
• Unit Free
• Range between -1 and 1
• The Closer to -1, the Stronger the Negative Linear Relationship
• The Closer to 1, the Stronger the Positive Linear Relationship
• The Closer to 0, the Weaker the Linear Relationship

t Test for Correlation
• Hypotheses
  – $H_0: \rho = 0$ (no correlation)
  – $H_1: \rho \neq 0$ (correlation)
• Test Statistic
  – $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$, where $r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$

Example: Produce Stores
Is there any evidence of linear relationship between annual sales of a store and its square footage at .05 level of significance?

From Excel Printout:

Regression Statistics
Multiple R           0.9705572   (r)
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7

$H_0: \rho = 0$ (no association)
$H_1: \rho \neq 0$ (association)
$\alpha = .05$
df = 7 − 2 = 5

Example: Produce Stores Solution

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{.9706}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099$

Critical Value(s): −2.5706 and 2.5706 (reject $H_0$ in either tail, area .025 each)
Decision: Reject $H_0$.
Conclusion: There is evidence of a linear relationship at 5% level of significance.
The value of the t statistic is exactly the same as the t statistic value for test on the slope coefficient.
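The sample correlation coefficient and its t statistic can be computed from the same seven stores. This is a minimal sketch using only the standard library; the critical value 2.5706 is copied from the slide, and the variable names are illustrative.

```python
import math

# Produce store data: square feet (X) and annual sales in $000 (Y).
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(sqft)

x_bar = sum(sqft) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in sqft)
syy = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))

# Sample correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

T_CRITICAL = 2.5706  # two-tailed, alpha = .05, df = 5 (value taken from the slide)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"r={r:.4f} r^2={r ** 2:.4f} t={t_stat:.4f} reject H0: {reject_h0}")
```

The t value obtained this way matches the t statistic of the slope test, as the solution slide points out.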
Estimation of Mean Values
Confidence Interval Estimate for $\mu_{Y|X=X_i}$: The Mean of Y Given a Particular $X_i$

$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

• $S_{YX}$ is the standard error of the estimate
• $t_{n-2}$ is the t value from the table with df = n − 2
• The size of the interval varies according to the distance away from the mean, $\bar{X}$

Prediction of Individual Values
Prediction Interval for Individual Response $Y_i$ at a Particular $X_i$

$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

• The addition of 1 increases the width of the interval from that for the mean of Y

Interval Estimates for Different Values of X
[Figure: Y against X, showing the confidence interval for the mean of Y and the prediction interval for an individual $Y_i$ around the regression line, with $\bar{X}$ and a given X marked on the X axis]

Example: Produce Stores
Data for 7 Stores:

Store    Square Feet    Annual Sales ($000)
1        1,726          3,681
2        1,542          3,395
3        2,816          6,653
4        5,555          9,543
5        1,292          3,318
6        2,208          5,563
7        1,313          3,760

Consider a store with 2000 square feet.
Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$

Estimation of Mean Values: Example
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.

Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$

$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$

$3997.02 \le \mu_{Y|X=X_i} \le 5222.34$

Prediction Interval for Y: Example
Prediction Interval for Individual $Y_{X=X_i}$
Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.

Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$

$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$

$2922.00 \le Y_{X=X_i} \le 6297.37$

USING MS EXCEL FOR REGRESSION
• Enter the Y data and the X data in an MS Excel sheet, for example Y in column A and X in column B, rows 1 to 5.
• Click the Tools icon, choose 'data analysis', then choose 'simple linear regression'.
• In the dialog box, under Y variable cell range, enter the Y data by selecting column A, or A1:A5. Under X variable cell range, enter the X data by selecting column B, or B1:B5.
• Click OK and the results appear. For $Y' = a + bX$, a is reported as Intercept and b as X Variable 1 in the Coefficients column.

HAPPY STUDYING, AND MAY YOU ALWAYS SUCCEED
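As a cross-check of the interval estimates for the 2,000-square-foot store, the confidence interval for the mean and the prediction interval for an individual store can be recomputed from the produce store data. This is a minimal sketch: the t value 2.5706 is copied from the slides, the coefficients are left unrounded, and the variable names are illustrative.

```python
import math

# Produce store data: square feet (X) and annual sales in $000 (Y).
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(sqft)

x_bar = sum(sqft) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = math.sqrt(sse / (n - 2))

T_CRITICAL = 2.5706  # t with n - 2 = 5 df, 95% level (value taken from the slides)
x_new = 2000         # store size of interest, in square feet

y_hat = b0 + b1 * x_new
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx

# Half-widths: mean response vs. individual response (extra "1 +" term).
ci_half = T_CRITICAL * s_yx * math.sqrt(leverage)
pi_half = T_CRITICAL * s_yx * math.sqrt(1 + leverage)

print(f"predicted sales: {y_hat:.2f}")
print(f"95% CI for mean: ({y_hat - ci_half:.2f}, {y_hat + ci_half:.2f})")
print(f"95% PI for one store: ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
```

With unrounded coefficients the prediction is about 4609.68, which is the center of the interval bounds shown on the slides; the slide's 4610.45 appears to come from rounded coefficients. The half-widths agree with the slide values of 612.66 and 1687.68 up to rounding.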