Ekonometrika



• Some of the course materials can be downloaded from ariefyulianto.wordpress.com

• The software can be downloaded from uap.unnes.ac.id

• Concepts and Applications of Economic Theory through a Quantitative Approach

References

1. Damodar N. Gujarati. Basic Econometrics. Fourth edition.

2. Damodar N. Gujarati. 2006. Dasar-Dasar Ekonometrika. Jakarta: Penerbit Erlangga.

3. Rainer Winkelmann. 2008. Econometric Analysis of Count Data. Fifth edition. Berlin Heidelberg: Springer-Verlag.

4. Sarwoko. 2008. Dasar-Dasar Ekonometrika. Yogyakarta: Penerbit Andi.

5. Badi H. Baltagi. 2008. Econometrics. Berlin Heidelberg: Springer-Verlag.

Course Contract (1)

Teaching Method

To achieve optimal learning outcomes, this course uses a combination of lectures and discussion in class together with independent observation outside the classroom (in the field).

Grading System

Students' success in following and understanding the course material is assessed from their performance during the semester and from their exam scores, with the following weights: a. individual/group assignments and attendance, weight 1; b. mid-term test, weight 2; c. final exam, weight 3.

Course Contract (2)

Assignments

Assignments in this course may be individual or group assignments, and they are set by the lecturer during class sessions. Late submission of assignments is not tolerated unless there is a justifiable reason.

Requirements for Attending Lectures

In accordance with the Class Attendance Rules set by UNNES.

Students must have read, and must bring, at least the main reference book to every class session.

Other rules:

The lateness tolerance for both lecturer and students is 30 minutes from the scheduled time, and the last person to enter the classroom should be the lecturer.

Students' communication devices must be switched off during class.

1. WHAT IS ECONOMETRICS

• econometrics means "economic measurement"

• . . . econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference

• Econometrics is concerned with the empirical determination of economic laws.

WHY A SEPARATE DISCIPLINE?

Econometrics is an amalgam of:

• economic theory, which makes statements or hypotheses that are mostly qualitative in nature;

• mathematical economics, which expresses economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory;

• economic statistics, which is concerned with collecting, processing, and presenting economic data in the form of charts and tables; and

• mathematical statistics, which provides many of the tools used in the trade, although the econometrician often needs special methods because of the unique nature of most economic data, namely that the data are not generated as the result of a controlled experiment.

METHODOLOGY OF

ECONOMETRICS

1. Statement of theory or hypothesis.

2. Specification of the mathematical model of the theory

3. Specification of the statistical, or econometric, model

4. Obtaining the data

5. Estimation of the parameters of the econometric model

6. Hypothesis testing

7. Forecasting or prediction

8. Using the model for control or policy purposes

To illustrate the preceding steps

1.Statement of Theory or Hypothesis

The fundamental psychological law . . . is that men [women] are disposed, as a rule and on average, to increase their consumption as their income increases, but not as much as the increase in their income

• marginal propensity to consume (MPC)

2. Specification of the Mathematical Model of

Consumption

• Y = β1 + β2X,  0 < β2 < 1  (I.3.1), where Y = consumption expenditure and X = income, and where β1 and β2, known as the parameters of the model, are, respectively, the intercept and slope coefficients.

3. Specification of the Econometric Model of

Consumption

• The mathematical model specifies an exact or deterministic relationship between consumption and income.

But relationships between economic variables are generally inexact.

• Y = β1 + β2X + u  (I.3.2), where u, known as the disturbance, or error, term, is a random (stochastic) variable that has well-defined probabilistic properties.

4. Obtaining Data

• To estimate the econometric model given in (I.3.2), that is, to obtain the numerical values of β 1 and β 2, we need data

5. Estimation of the Econometric Model

• For now, note that the statistical technique of regression analysis is the main tool used to obtain the estimates

• Ŷi = −184.08 + 0.7064 Xi

• The hat on the Y indicates that it is an estimate. This is the estimated consumption function (i.e., the regression line).

6. Hypothesis Testing

• Statistical inference (hypothesis testing).

7. Forecasting or Prediction

• To illustrate, suppose we want to predict the mean consumption expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars.

• Ŷ1997 = −184.0779 + 0.7064 (7269.8) = 4951.3167
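As a quick check of the arithmetic, the point forecast can be reproduced directly from the intercept and slope quoted above; this is only a sketch using the coefficient values reported in the text.

```python
# Point forecast of mean consumption for 1997 from the fitted line
# Yhat = -184.0779 + 0.7064 * X, with X = 1997 GDP (billions of dollars)
b1, b2 = -184.0779, 0.7064
gdp_1997 = 7269.8

y_hat_1997 = b1 + b2 * gdp_1997
print(round(y_hat_1997, 2))  # approximately 4951.3 billion dollars, as reported above
```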

8. Use of the Model for Control or Policy

Purposes

The Eight Components of

Integrated Service Management

1. Product Elements

2. Place, Cyberspace, and Time

3. Process

4. Productivity and Quality

5. People

6. Promotion and Education

7. Physical Evidence

8. Price and Other User Outlays

Principles of Service Marketing and Management (Lovelock & Wright)

Marketing Management (Philip Kotler, twelfth edition)

• Product is the first and most important element of the marketing mix. Product strategy calls for making coordinated decisions on product mixes, product lines, brands, and packaging and labeling.

Initial public offering

• Emiten (issuer)

• Underwriter

• Auditor

• Size

• Age

2. THE NATURE OF

REGRESSION ANALYSIS

Anatomy of econometric modeling

THE MODERN INTERPRETATION

OF REGRESSION

Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.

Example: how the average height of sons changes, given the fathers' height; the distribution, in a hypothetical population, of heights of boys measured at fixed ages.

Measurement Scales of Variables

• Ratio Scale: for a variable X taking two values, X1 and X2, both the ratio X1/X2 and the distance (X2 − X1) are meaningful quantities.

• Interval Scale: the distance between two time periods, say (2000 − 1995), is meaningful, but the ratio of two time periods (2000/1995) is not.

• Ordinal Scale Examples are grading systems

(A, B, C grades) or income class (upper, middle, lower).

• Nominal Scale Variables such as gender (male, female) and marital status (married, unmarried, divorced, separated) simply denote categories

TWO-VARIABLE REGRESSION

ANALYSIS:SOME BASIC IDEAS

the simplest possible regression analysis, namely, the bivariate, or two-variable, regression in which the dependent variable (the regressand) is related to a single explanatory variable (the regressor)

A HYPOTHETICAL EXAMPLE

The data in the table refer to a total population of 60 families in a hypothetical community and their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60 families are divided into 10 income groups (from $80 to $260) and the weekly expenditures of each family in the various groups are as shown in the table.

E(Y | Xi) = β1 + β2Xi where β 1 and β 2 are unknown but fixed parameters known as the regression coefficients; β 1 and β 2 are also known as intercept and slope coefficients, respectively. Equation (2.2.1) itself is known as the linear population regression function. Some alternative expressions used in the literature are linear population regression model or simply linear population regression

THE MEANING OF THE TERM LINEAR

• Linearity in the Variables: a regression function such as E(Y | Xi) = β1 + β2Xi² is not a linear function of the variables, because the variable X appears with a power (index) of 2.

• Linearity in the Parameters: E(Y | Xi) = β1 + β2Xi² is a linear-in-the-parameters regression model, whereas a function such as E(Y | Xi) = β1 + β2²Xi is nonlinear in the parameter β2.

STOCHASTIC SPECIFICATION OF THE POPULATION REGRESSION FUNCTION (PRF)

As family income increases, family consumption expenditure on average increases too. But what is the relationship between an individual family's consumption expenditure and a given level of income? We write Yi = E(Y | Xi) + ui,

where the deviation ui is an unobservable random variable taking positive or negative values. Technically, ui is known as the stochastic disturbance or stochastic error term.

THE SIGNIFICANCE OF THE STOCHASTIC

DISTURBANCE TERM (1)

1. Vagueness of theory: the theory, if any, determining the behavior of Y may be, and often is, incomplete.

2. Unavailability of data: we might like to include family wealth as an explanatory variable in addition to income to explain family consumption expenditure, but information on family wealth generally is not available.

3. Core variables versus peripheral variables: assume in our consumption-income example that besides income X1, the number of children per family X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure.

4. Intrinsic randomness in human behavior.

5. Poor proxy variables: the disturbance term u may in this case also represent errors of measurement.

THE SIGNIFICANCE OF THE STOCHASTIC

DISTURBANCE TERM (2)

6. Principle of parsimony: we would like to keep our regression model as simple as possible.

7. Wrong functional form: we do not know the form of the functional relationship between the regressand (the dependent variable) and the regressors (the independent variables).

THE SAMPLE REGRESSION FUNCTION (SRF)

3. TWO-VARIABLE REGRESSION

MODEL: THE PROBLEM OF

ESTIMATION

TWO-VARIABLE REGRESSION MODEL: THE

PROBLEM OF ESTIMATION (ordinary least squares)

• the method of least squares has some very attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis
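As a sketch of how these least-squares estimates are computed in the two-variable case, the closed-form formulas β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and β̂1 = Ȳ − β̂2 X̄ can be coded directly; the numbers below are made up purely for illustration and are not data from the text.

```python
import numpy as np

# Made-up sample of X (income) and Y (consumption) values, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

x = X - X.mean()                      # deviations of X from its mean
y = Y - Y.mean()                      # deviations of Y from its mean
b2 = (x * y).sum() / (x * x).sum()    # slope estimate: sum(xi*yi) / sum(xi^2)
b1 = Y.mean() - b2 * X.mean()         # intercept estimate: Ybar - b2*Xbar

residuals = Y - (b1 + b2 * X)         # u-hat: actual minus fitted values
print(b1, b2)                         # the least-squares estimates
```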

Often encountered in cross-sectional data

Often encountered in time-series data

THE COEFFICIENT OF DETERMINATION r 2 :

A MEASURE OF “GOODNESS OF FIT”

• The coefficient of determination r² (two-variable case) or R² (multiple regression) is a summary measure that tells how well the sample regression line fits the data.

The fundamental psychological law . . . is that men [women] are disposed, as a rule and on average, to increase their consumption as their income increases, but not by as much as the increase in their income,” that is, the marginal propensity to consume (MPC) is greater than zero but less than one

Variables Entered/Removed (b)

Model 1 — Variables Entered: Pendapatan; Variables Removed: none; Method: Enter
a. All requested variables entered.
b. Dependent Variable: Konsumsi

Model Summary

Model 1: R = .981(a)   R Square = .962   Adjusted R Square = .957   Std. Error of the Estimate = 6.4930
Change Statistics: R Square Change = .962   F Change = 202.868   df1 = 1   df2 = 8   Sig. F Change = .000
a. Predictors: (Constant), Pendapatan

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    8552.727          1   8552.727      202.868   .000(a)
Residual       337.273          8     42.159
Total         8890.000          9
a. Predictors: (Constant), Pendapatan
b. Dependent Variable: Konsumsi

Coefficients (a)

Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    24.455              6.414                            3.813   .005
Pendapatan      .509               .036         .981              14.243   .000
a. Dependent Variable: Konsumsi
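The output above can be reproduced in Python. The ten income/consumption observations used below are an assumption on my part: they are the hypothetical weekly family income (Pendapatan) and consumption (Konsumsi) sample from Gujarati's textbook, and they match the sums of squares reported in the ANOVA table (total 8890, regression 8552.727).

```python
import numpy as np
import statsmodels.api as sm

# Assumed data: 10 families' weekly income (Pendapatan) and consumption (Konsumsi), in dollars
pendapatan = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
konsumsi   = np.array([70,  65,  90,  95, 110, 115, 120, 140, 155, 150], dtype=float)

X = sm.add_constant(pendapatan)          # adds the intercept column
model = sm.OLS(konsumsi, X).fit()

print(model.params)     # approx. [24.455, 0.509] -> Konsumsi = 24.455 + 0.509 * Pendapatan
print(model.rsquared)   # approx. 0.962, as in the Model Summary above
print(model.fvalue)     # approx. 202.87, as in the ANOVA table above
```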

CONSUMPTION –INCOME RELATIONSHIP

IN THE UNITED STATES, 1982 –1996

THE RELATIONSHIP BETWEEN EARNINGS

AND EDUCATION

Notes

• The reason for using the adjusted R² is that R² is biased: it increases with every additional independent variable, whether or not that variable has a significant effect.

• The reason for using the standardized beta is that it removes differences in the units of measurement of the independent variables (e.g., items versus head counts); however, it does not reveal multicollinearity (correlation among the independent variables), and the beta values cannot be interpreted in the original units.
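The adjustment mentioned in the note above penalizes extra regressors. A minimal sketch of the standard formula, R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), applied to the R² = 0.962, n = 10, k = 1 values from the output above:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalizes R-squared for the number of regressors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the SPSS output above: R^2 = 0.962 with n = 10 observations, k = 1 regressor
print(adjusted_r2(0.962, 10, 1))   # approximately 0.957, as reported in the Model Summary
```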

TWO-VARIABLE REGRESSION MODEL:

THE PROBLEM OF ESTIMATION

Recall the two-variable PRF, Yi = β1 + β2Xi + ui. The sample regression function (SRF) is Ŷi = β̂1 + β̂2Xi, where Ŷi is the estimated (conditional mean) value of Yi.

The residuals are ûi = Yi − Ŷi, which shows that the ûi are simply the differences between the actual and estimated Y values.

CLASSICAL NORMAL LINEAR REGRESSION

MODEL (CNLRM)

• Using the method of OLS we were able to estimate the parameters β1, β2, and σ². Under the assumptions of the classical linear regression model (CLRM), we were able to show that the estimators of these parameters, β̂1, β̂2, and σ̂², possess several desirable statistical properties, such as unbiasedness and minimum variance.

TWO-VARIABLE REGRESSION: INTERVAL

ESTIMATION AND HYPOTHESIS TESTING

Classical Assumptions

• The regression model is linear, correctly specified, and has an additive error term.

• The expected (mean) value of the disturbance term is zero.

• The covariance between the disturbance and X is zero.

• The variance of the disturbance is constant (homoscedasticity).

• There is no autocorrelation among the disturbances.

• There is no perfect correlation among the independent variables.

• The error term is normally distributed.

HYPOTHESIS TESTING: GENERAL

COMMENTS

• HYPOTHESIS TESTING: GENERAL COMMENTS ( Is a given observation or finding compatible with some stated hypothesis or not?)

• In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol H 0. The null hypothesis is usually tested against an alternative hypothesis (also known as maintained hypothesis) denoted by H 1

• reject or not reject the null hypothesis

• There are two mutually complementary approaches for devising such rules, namely, the confidence interval and the test of significance.

Decision on H0      If H0 is true        If H0 is false

Accept H0           Correct decision     Type II error

Reject H0           Type I error         Correct decision

HYPOTHESIS TESTING:

THE CONFIDENCE-INTERVAL APPROACH

• Two-Sided or Two-Tail Test: to illustrate the confidence-interval approach, once again we revert to the consumption–income example. As we know, the estimated marginal propensity to consume (MPC), β̂2, is 0.5091. Suppose we postulate that H0: β2 = 0.3 against H1: β2 ≠ 0.3.

• Very often such a two-sided alternative hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about the direction in which the alternative hypothesis should move from the null hypothesis.

HYPOTHESIS TESTING:

THE CONFIDENCE-INTERVAL APPROACH

• One-Sided or One-Tail Test: sometimes we have a strong a priori or theoretical expectation (or expectations based on previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided, as just discussed. Thus, for our consumption–income example, one could postulate that H0: β2 ≤ 0.3 and H1: β2 > 0.3.

• HYPOTHESIS TESTING: THE TEST-OF-SIGNIFICANCE APPROACH

• Testing the Significance of Regression Coefficients: The t Test

• which gives the interval in which ˆ β 2 will fall with 1 − α probability, given β 2 = β *2. In the language of hypothesis testing, the 100(1 − α )% confidence interval established in (5.7.2) is known as the region of acceptance (of the null hypothesis) and the region(s) outside the confidence interval is (are) called the region(s) of rejection (of H 0) or the critical region(s). As noted previously, the confidence limits, the endpoints of the confidence interval, are also called critical values
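As a sketch of the test-of-significance approach for the consumption–income example, the t statistic t = (β̂2 − β2*)/se(β̂2) can be computed from the estimates quoted above (β̂2 = 0.5091, se = 0.0357, n − 2 = 8 degrees of freedom); scipy is assumed here only to supply the critical value and p-value.

```python
from scipy import stats

b2_hat, se_b2, df = 0.5091, 0.0357, 8       # estimates from the consumption-income regression above
beta2_null = 0.3                            # H0: beta2 = 0.3 against H1: beta2 != 0.3

t_stat = (b2_hat - beta2_null) / se_b2      # t = (estimate - hypothesized value) / standard error
t_crit = stats.t.ppf(0.975, df)             # two-tailed critical value at the 5% level
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value

print(t_stat, t_crit, p_value)              # t is about 5.9 > 2.306, so H0: beta2 = 0.3 is rejected
```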

MULTICOLLINEARITY:

WHAT HAPPENS IF

THE REGRESSORS

ARE CORRELATED?

What is the nature of multicollinearity?

• In a good regression model there should be no correlation among the independent variables.

• If they are correlated, the variables are not orthogonal (orthogonal variables have zero correlation with one another).

Symptoms of Multicollinearity (Ghozali, 2005)

• The estimated regression model has a high R square, yet individually many of the independent variables are not significant in explaining the dependent variable.

• The correlations among the independent variables exceed 0.9.

• Each independent variable is largely explained by the other independent variables: the tolerance values are low (< 0.10) or VIF > 10.

THE NATURE OF

MULTICOLLINEARITY

• it meant the existence of a “perfect,” or exact, linear relationship among some or all explanatory variables of a regression model

• Yi = β0 + β1X1i + β2X2i + β3X3i + ui

multicollinearity may be due to the following factors

• The data collection method employed, for example, sampling over a limited range of the values taken by the regressors in the population

• Constraints on the model or in the population being sampled

• Model specification

• An overdetermined model. This happens when the model has more explanatory variables than the number of observations.

Remedies for multicollinearity

1. Combine cross-sectional and time-series data (pooling).

2. Drop one or more of the independent variables that have a high correlation (e.g., 0.94).

3. Transform the variables.

4. Use the model for prediction rather than for interpretation.

5. Use centered data in the analysis (raw data minus the mean).

Source: Imam Ghozali, 2006
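To see the tolerance/VIF symptom check in practice, a minimal sketch using statsmodels follows; the regressors X1 and X2 are made up (X2 is deliberately almost a multiple of X1), and only the variance_inflation_factor helper from statsmodels is assumed.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressors; X2 is constructed to be nearly collinear with X1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)
X = sm.add_constant(pd.DataFrame({"X1": x1, "X2": x2}))

for i, name in enumerate(X.columns):
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", round(vif, 2), "tolerance =", round(1 / vif, 3))
# A VIF above 10 (tolerance below 0.10) for a regressor signals problematic multicollinearity
```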

AUTOCORRELATION:

WHAT HAPPENS IF

THE ERROR TERMS ARE

CORRELATED?

three types of data

(1) cross section

(2) time series

(3) combination of cross section and time series

• correlation between members of series of observations ordered in time [as in time series data] or space [as in cross-sectional data]

• autocorrelation as “lag correlation of a given series with itself, lagged by a number of time units,’’ whereas he reserves the term serial correlation to “lag correlation between two different series.

• where u and v are two different time series, is called serial correlation

Typical patterns of residuals plotted against time: a cyclical pattern; an upward or downward linear trend in the disturbances; both linear and quadratic trend terms present in the disturbances; or no systematic pattern, indicating nonautocorrelation.

DETECTING

AUTOCORRELATION

• Graphical Method

• In linear regression, autocorrelation means that the error terms are correlated across the time ordering of the observations (in time-series data) or across their spatial ordering (in cross-sectional data).

• An example of time-series data (which have a time ordering) is the effect of advertising expenditure on sales from January through December. Cross-sectional data have no time ordering, for example the effect of the concentration of a substance X on the reaction rate of a chemical compound.

• To detect autocorrelation, the Durbin-Watson test statistic can be used. If the D-W value is around 2, the regression model is free of autocorrelation.

Remedying autocorrelation

• The test statistics most often used are the Durbin-Watson test and the Run Test; if there are more than 100 observations, the Lagrange Multiplier test is preferable. Autocorrelation can be remedied by transforming the data, by rewriting the regression in the form of a generalized difference equation, or by including a lag of the dependent variable as one of the regressors, which reduces the number of observations by 1.
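A minimal sketch of the Durbin-Watson check in Python (statsmodels provides durbin_watson; the monthly advertising/sales series below is hypothetical, in the spirit of the example above):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical monthly advertising spend (X) and sales (Y), January-December
X = np.arange(1, 13, dtype=float)
Y = np.array([10, 12, 11, 15, 14, 18, 17, 21, 20, 24, 23, 27], dtype=float)

results = sm.OLS(Y, sm.add_constant(X)).fit()
dw = durbin_watson(results.resid)     # d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
print(dw)                             # values near 2 suggest no first-order autocorrelation
```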

Correlation

• The correlation between x(t) and y(t) is called the cross-correlation, defined as

  Cxy(t) = x(t) ⋆ y(t) = ∫ x(τ) y(t + τ) dτ

  or, equivalently,

  Cxy(t) = x(t) ⋆ y(t) = ∫ x(τ − t) y(τ) dτ

Autocorrelation

• The correlation of x(t) with itself is called the autocorrelation:

  Cxx(t) = x(t) ⋆ x(t) = ∫ x(τ) x(t + τ) dτ

Correlation: worked example

• Example: let x(t) be a pulse of height 1 on 0 ≤ t ≤ 1 and h(t) a pulse of height 1 on 1.5 ≤ t ≤ 2.5. The cross-correlation

  Cxh(p) = ∫ x(t) h(t − p) dt

  is evaluated piecewise as the shifted pulse h(t − p), whose support is [1.5 + p, 2.5 + p], slides across x(t):

  1. For 1.5 + p > 1, i.e. p > −0.5 (no overlap): Cxh(p) = 0.

  2. For 0 < 1.5 + p < 1, i.e. −1.5 < p < −0.5: Cxh(p) = ∫ from 1.5 + p to 1 of 1·1 dt = 1 − (1.5 + p) = −0.5 − p.

  3. For 1.5 + p < 0 and 0 < 2.5 + p < 1, i.e. −2.5 < p < −1.5: Cxh(p) = ∫ from 0 to 2.5 + p of 1·1 dt = 2.5 + p.

  4. For 2.5 + p < 0, i.e. p < −2.5 (no overlap): Cxh(p) = 0.

  The result y(p) = Cxh(p) is therefore a triangle on −2.5 ≤ p ≤ −0.5: it rises as 2.5 + p, peaks at 1 at p = −1.5, and falls as −0.5 − p.

Correlation (continued)

• In the other order, Chx(p) = ∫ h(t) x(t − p) dt, with the shifted pulse x(t − p) supported on [p, 1 + p]:

  1. For 1 + p < 1.5, i.e. p < 0.5 (no overlap): Chx(p) = 0.

  2. For 1.5 < 1 + p < 2.5, i.e. 0.5 < p < 1.5: Chx(p) = ∫ from 1.5 to 1 + p of 1·1 dt = (1 + p) − 1.5 = p − 0.5.

  3. For p < 2.5 and 1 + p > 2.5, i.e. 1.5 < p < 2.5: Chx(p) = ∫ from p to 2.5 of 1·1 dt = 2.5 − p.

  4. For p > 2.5 (no overlap): Chx(p) = 0.

  The result y(p) = Chx(p) is a triangle on 0.5 ≤ p ≤ 2.5: it rises as p − 0.5, peaks at 1 at p = 1.5, and falls as 2.5 − p.

Autocorrelation: worked example

• The autocorrelation of the pulse h(t) with itself, Chh(p) = ∫ h(t) h(t − p) dt:

  1. For 0 < p < 1: Chh(p) = 1 − p.

  2. For −1 < p < 0 (the pulse is shifted to the left, since p is negative): Chh(p) = 1 + p.

  3. For p > 1 or p < −1: Chh(p) = 0.

  The result y(p) = Chh(p) = 1 − |p| for |p| ≤ 1 is a triangle of height 1 centered at p = 0, with vertices at p = −1 and p = +1.

Correlation: properties

  Cxx(t) = ∫ x(τ) x(t + τ) dτ

  Cxy(t) = ∫ x(τ − t) y(τ) dτ        Cyx(t) = ∫ y(τ − t) x(τ) dτ

  Cxx(t) ≤ Cxx(0)        Cxx(−t) = Cxx(t)

  Cxy(t) = Cyx(−t)
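The rectangular-pulse results above can be checked numerically: sampling the two pulses on a fine grid and using np.correlate (whose convention matches Cxh(p) = ∫ x(t + p) h(t) dt, i.e. the same triangle as above) reproduces the triangular shape. This is a numerical sketch, not part of the original slides.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 4, dt)
x = ((t >= 0.0) & (t <= 1.0)).astype(float)    # unit pulse on [0, 1]
h = ((t >= 1.5) & (t <= 2.5)).astype(float)    # unit pulse on [1.5, 2.5]

c = np.correlate(x, h, mode="full") * dt       # discrete approximation of the correlation integral
lags = (np.arange(len(c)) - (len(t) - 1)) * dt # lag axis in the same time units

print(round(c.max(), 2))                       # approximately 1, the peak of the triangle
print(round(lags[c.argmax()], 2))              # approximately -1.5, where the pulses fully overlap
```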

REGRESSION ANALYSIS ILLUSTRATION

Do the entrance test score and high-school class rank affect the grade point average (NMR) of first-year students?

Dependent variable: NMR (Y)

Independent variables: test score / Skor Tes (X1) and class rank / Peringkat (X2)

REGRESSION ANALYSIS ILLUSTRATION

NMR     Skor Tes   Peringkat
1.93    565.00     3.00
2.55    525.00     2.00
1.72    477.00     1.00
2.48    555.00     1.00
2.87    502.00     1.00
1.87    469.00     3.00
1.34    517.00     4.00
3.03    555.00     1.00
2.54    576.00     2.00
2.34    559.00     2.00
1.40    574.00     8.00
1.45    578.00     4.00
1.72    548.00     8.00
3.80    656.00     1.00
2.13    688.00     5.00
1.81    465.00     6.00
2.33    661.00     1.00
2.53    477.00     1.00
2.04    490.00     2.00
3.20    524.00     2.00

STEPS

• Enter the data in the SPSS Data Editor.

• Choose Analyze > Regression > Linear.

1. Select the dependent variable.

2. Select the independent variables.

3. Under Statistics, tick Collinearity Diagnostics and Durbin-Watson, then click Continue.

4. Under Plots, tick Normal Probability Plot, then click Continue.

5. Under Save, tick Unstandardized under Predicted Values and Studentized under Residuals, then click Continue.

6. Click OK.
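A sketch of the same analysis in Python, using the 20 observations listed in the data table above (statsmodels is assumed; the SPSS variable names SKORTES and PERINGKA are kept). The printed values should be close to those reported in the SPSS output that follows.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

nmr      = [1.93, 2.55, 1.72, 2.48, 2.87, 1.87, 1.34, 3.03, 2.54, 2.34,
            1.40, 1.45, 1.72, 3.80, 2.13, 1.81, 2.33, 2.53, 2.04, 3.20]
skortes  = [565, 525, 477, 555, 502, 469, 517, 555, 576, 559,
            574, 578, 548, 656, 688, 465, 661, 477, 490, 524]
peringka = [3, 2, 1, 1, 1, 3, 4, 1, 2, 2, 8, 4, 8, 1, 5, 6, 1, 1, 2, 2]

X = sm.add_constant(pd.DataFrame({"SKORTES": skortes, "PERINGKA": peringka}, dtype=float))
results = sm.OLS(np.array(nmr), X).fit()

print(results.params)                 # reported values: intercept 1.269, SKORTES 0.00277, PERINGKA -0.184
print(results.rsquared)               # reported value: 0.478
print(durbin_watson(results.resid))   # reported value: 2.254
for i, name in enumerate(X.columns[1:], start=1):
    print(name, "VIF =", variance_inflation_factor(X.values, i))   # reported value: 1.002 for both
```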

ANALYSIS RESULTS

• Regression

Model Summary (b)

Model 1: R = .691(a)   R Square = .478   Adjusted R Square = .417   Std. Error of the Estimate = .4915   Durbin-Watson = 2.254
a. Predictors: (Constant), PERINGKA, SKORTES
b. Dependent Variable: NMR

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    3.762             2   1.881         7.786   .004(a)
Residual      4.107            17    .242
Total         7.869            19
a. Predictors: (Constant), PERINGKA, SKORTES
b. Dependent Variable: NMR

Coefficients (a)

Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.   Tolerance   VIF
(Constant)    1.269               .978                             1.298   .212
SKORTES       2.769E-03           .002          .275               1.568   .135   .998        1.002
PERINGKA      -.184               .050         -.648              -3.692   .002   .998        1.002
a. Dependent Variable: NMR

ASSUMPTION CHECKS

1. NORMALITY OF THE ERRORS

The P-P plot shows points lying close to a straight 45° line, so the normality assumption for the residuals is satisfied.

[Normal P-P plot of the residuals; Dependent Variable: NMR; Observed Cum Prob on the horizontal axis]

ASSUMPTION CHECKS

2. AUTOCORRELATION

Model Summary (b)

Model 1: R = .691(a)   R Square = .478   Adjusted R Square = .417   Std. Error of the Estimate = .4915   Durbin-Watson = 2.254
a. Predictors: (Constant), PERINGKA, SKORTES

Durbin-Watson decision rule: conclude that there is no autocorrelation when du < d < 4 − du; the value of du is read from the Durbin-Watson table.

With n = 20 and k (the number of regressors) = 2, du = 1.54 and 4 − du = 4 − 1.54 = 2.46.

Since du = 1.54 < d = 2.254 < 4 − du = 2.46, the assumption of no autocorrelation is accepted.

ASSUMPTION CHECKS

3. MULTICOLLINEARITY

Coefficients (a)

Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.   Tolerance   VIF
(Constant)    1.269               .978                             1.298   .212
SKORTES       2.769E-03           .002          .275               1.568   .135   .998        1.002
PERINGKA      -.184               .050         -.648              -3.692   .002   .998        1.002
a. Dependent Variable: NMR

Collinearity Diagnostics (a)

Dimension   Eigenvalue   Condition Index   Variance Proportions (Constant, SKORTES, PERINGKA)
1           2.725         1.000            .00, .00, .04
2            .269         3.185            .01, .01, .96
3           6.397E-03    20.639            .99, .99, .00
a. Dependent Variable: NMR

Condition Index = 20.639 < 30.

VIF for SKORTES = 1.002 < 10; VIF for PERINGKA = 1.002 < 10.

Therefore there is no multicollinearity.

ASSUMPTION CHECKS

4. HETEROSCEDASTICITY

Plot the studentized residuals against the predicted values:

a. Choose Graphs > Scatter > Simple.

b. Click Define, select Studentized Residual as the Y axis and Unstandardized Predicted Value as the X axis, then click OK.

[Scatter plot of studentized residuals against unstandardized predicted values]

The plot of the studentized residuals against the predicted values shows a random pattern, so the homoscedasticity assumption is satisfied.

INTERPRETATION

MODEL VALIDATION

Coefficient of determination (R²) = 0.478. This means that the test score and class rank together account for 47.8% of the variation in NMR; the rest is explained by variables not yet included in the model.

If we predict NMR from the test score and class rank, the accuracy of the prediction is 47.8%.

The F test from the regression ANOVA gives p = 0.004, so the regression coefficients are jointly significant.

The t tests give p = 0.135 for the test score and p = 0.002 for class rank, so only class rank has a significant effect on NMR.

INTERPRETATION

Estimated regression model:

NMR = 1.269 + 0.002769 Skor Tes − 0.184 Peringkat

1. Explaining the phenomenon

The variable with a significant effect is class rank, with a regression coefficient of −0.184. This means that the smaller (better) the rank, the higher the NMR. Holding the test score constant, if the rank increases by one position, NMR falls by 0.184.

INTERPRETATION

2. Prediction

Suppose a student has a test score of 550 and a class rank of 4; what is the predicted NMR?

NMR = 1.269 + 0.002769 (550) − 0.184 (4) = 2.05

The predicted NMR is 2.05. The accuracy of this prediction is 47.8% (relatively low), but the model is generalizable, since the p value of the ANOVA F test is 0.004.

INTERPRETATION

3. Determinant factors

Z_NMR = 0.275 Z_SkorTes − 0.648 Z_Peringkat

The variable with the strongest influence on NMR is class rank, followed by the test score. (The largest standardized Beta coefficient indicates the strongest influence, provided all variables are significant.) In this example only class rank is significant, so only class rank has a meaningful effect on NMR.


HETEROSCEDASTICITY

WHAT HAPPENS IF THE

ERROR VARIANCE IS

NONCONSTANT?

THE CLASSICAL LINEAR

REGRESSION MODEL

PRF: Yi = β 1 + β 2 Xi + ui . It shows that Yi depends on both Xi and ui . Therefore, unless we are specific about how Xi and ui are created or generated, there is no way we can make any statistical inference about the Yi and also, as we shall see, about β 1 and β 2. Thus, the assumptions made about the Xi variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates

There are several reasons why the variances of ui may be variable, some of which are as follows

• Following the error-learning models

• As incomes grow, people have more discretionary income 2 and hence more scope for choice about the disposition of their income.

Hence, σ 2 i is likely to increase with income

• As data collecting techniques improve, σ 2 i is likely to decrease

• Heteroscedasticity can also arise as a result of the presence of outliers

• specification errors, i.e., the regression model is not correctly specified: for example, a demand function for a commodity that omits the prices of commodities complementary to or competing with the commodity in question (omitted-variable bias)

• Another source of heteroscedasticity is skewness in the distribution of one or more regressors included in the model. Examples are economic variables such as income, wealth, and education. It is well known that the distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth being owned by a few at the top.

• Heteroscedasticity can also arise because of (1) incorrect data transformation (e.g., ratio or first difference transformations) and (2) incorrect functional form (e.g., linear versus log –linear models)

(To see the effect of an outlier, consider what happens to the regression results if the observations for Chile are dropped from the analysis.)

• the problem of heteroscedasticity is likely to be more common in cross-sectional than in time series data. In cross-sectional data, one usually deals with members of a population at a given point in time, such as individual consumers or their families, firms, industries, or geographical subdivisions such as state, country, city, etc

DETECTION OF

HETEROSCEDASTICITY

• As in the case of multicollinearity, there are no hard-and-fast rules for detecting heteroscedasticity, only a few rules of thumb (this is unavoidable in most economic investigations; in this respect the econometrician differs from scientists in fields such as agriculture and biology, where researchers have a good deal of control over their subjects).

Park Test

Glejser Test

Spearman's Rank Correlation Test
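A minimal sketch of the Glejser test (regress the absolute residuals on the suspected regressor and check whether its coefficient is significant); the income/spending data are made up and statsmodels is assumed.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data in which the error spread grows with income (heteroscedasticity)
rng = np.random.default_rng(1)
income = np.linspace(10, 100, 50)
spending = 5 + 0.6 * income + rng.normal(scale=0.05 * income)   # error variance rises with income

ols = sm.OLS(spending, sm.add_constant(income)).fit()
abs_resid = np.abs(ols.resid)

# Glejser test: regress |residuals| on the suspected regressor
glejser = sm.OLS(abs_resid, sm.add_constant(income)).fit()
print(glejser.pvalues[1])   # a small p-value for income suggests heteroscedasticity
```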

DUMMY VARIABLE

REGRESSION MODELS

model is based on several simplifying assumptions, which are as follows

• The regression model is linear in the parameters

• The values of the regressors, the X ’s, are fixed in repeated sampling.

• For given X ’s, the mean value of the disturbance ui is zero

• For given X ’s, there is no autocorrelation in the disturbances

• If the X's are stochastic, the disturbance term and the (stochastic) X's are independent or at least uncorrelated.

• The number of observations must be greater than the number of regressors

• There must be sufficient variability in the values taken by the regressors.

• The regression model is correctly specified

• There is no exact linear relationship (i.e., multicollinearity) in the regressors.

• The stochastic (disturbance) term ui is normally distributed.

four types of variables

• ratio scale, interval scale, ordinal scale, and nominal scale

• known as indicator variables, categorical variables, qualitative variables, or dummy variables

THE NATURE OF DUMMY

VARIABLES

• In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale variables (e.g., income, output, prices, costs, height, temperature)

• qualitative,or nominal scale, in nature, such as sex, race, color, religion, nationality, geographical region, political upheavals, and party affiliation

• As a matter of fact, a regression model may contain regressors that are all exclusively dummy, or qualitative, in nature. Such models are called Analysis of Variance (ANOVA) models.

Dummy Variables

• Dummy variables refers to the technique of using a dichotomous variable (coded 0 or 1) to represent the separate categories of a nominal level measure.

• The term “dummy” appears to refer to the fact that the presence of the trait indicated by the code of 1 represents a factor or collection of factors that are not measurable by any better means within the context of the analysis.

Coding of dummy Variables

• Take for instance the race of the respondent in a study of voter preferences

– Race coded white(0) or black(1)

• There are a whole set of factors that are possibly different, or even likely to be different, between voters of different races

– Income, socialization, experience of racial discrimination, attitudes toward a variety of social issues, feelings of political efficacy, etc

– Since we cannot measure all of those differences within the confines of the study we are doing, we use a dummy variable to capture these effects.

Multiple categories

• Now picture race coded white(0), black(1),

Hispanic(2), Asian(3) and Native American(4)

• If we put the variable race into a regression equation, the results will be nonsense since the coding implicitly required in regression assumes at least ordinal level data – with approximately equal differences between ordinal categories.

• Regression using a 3 (or more) category nominal variable yields un-interpretable and meaningless results.

Creating Dummy variables

• The simple case of race is already coded correctly

– Race : coded 0 for white and 1 for black

• Note the coding can be reversed and leads only to changes in sign and direction of interpretation.

• The complex nominal version turns into 5 variables:

– White ; coded 1 for whites and 0 for non-whites

– Black ; coded 1 for blacks and 0 for non-blacks

– Hispanic ; coded 1 for Hispanics and 0 for non- Hispanics

– Asian ; coded 1 for Asians and 0 for non- Asians

– AmInd ; coded 1 for native Americans and 0 for non-native Americans
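A sketch of this coding scheme with pandas (get_dummies builds the 0/1 indicator columns; drop_first omits one category, which then serves as the reference):

```python
import pandas as pd

df = pd.DataFrame({"race": ["White", "Black", "Hispanic", "Asian", "AmInd", "White", "Black"]})

# One 0/1 column per category; drop_first=True omits one category
# (the alphabetically first), which becomes the reference category
dummies = pd.get_dummies(df["race"], prefix="race", drop_first=True)
print(dummies)
```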

Regression with Dummy Variables

• The dummy variable is then added the regression model

Y_i = a + B1*X_i + B2*Race_i + e_i

• Interpretation of the dummy variable is usually quite straightforward.

– The intercept term represents the intercept for the omitted category

– The slope coefficient for the dummy variable represents the change in the intercept for the category coded 1 (blacks)

Regression with only a dummy

• When we regress a variable on only the dummy variable, we obtain the estimates for the means of the depended variable.

Y_i = a + B1*Race_i + e_i

• a is the mean of Y for Whites, and a + B1 is the mean of Y for Blacks.

Omitting a category

• When we have a single dummy variable, we have information for both categories in the model

• Also note that

White = 1 – Black

• Thus having both a dummy for White and one for Blacks is redundant.

• As a result of this, we always omit one category, whose intercept is the model’s intercept.

• This omitted category is called the reference category

– In the dichotomous case, the reference category is simply the category coded 0

– When we have a series of dummies, the reference category is the one whose dummy variable is omitted from the model.

Suggestions for selecting the reference category

• Make it a well defined group – other is usually a poor choice.

• If there is some underlying ordinality in the categories, select the highest or lowest category as the reference. (e.g. blue-collar, white-collar, professional)

• It should have ample number of cases. The modal category is often a good choice.

Multiple dummy Variables

• The model for the full dummy variable scheme for race is:

Y_i = a + B1*X_i + B2*Black_i + B3*Hispanic_i + B4*Asian_i + B5*AmInd_i + e_i

• Note that the dummy for White has been omitted, and the intercept a is the intercept for Whites.

Tests of Significance

• With dummy variables, the t tests test whether the coefficient is different from the reference category, not whether it is different from 0.

• Thus if a = 50 and B1 = −45, the coefficient for Blacks might not be significantly different from 0, while Whites are significantly different from 0.

Interaction terms

• When the research hypothesizes that different categories may have different responses on other independent variables, we need to use interaction terms

• For example, race and income interact with each other so that the relationship between income and ideology is different (stronger or weaker) for Whites than for Blacks.

Creating Interaction terms

• To create an interaction term is easy

– Multiply the category * the independent variable

– The full model is thus:

Y_i = a + B1*Race_i + B2*Income_i + B3*(Race_i * Income_i) + e_i

• (a + B1) is the intercept for Blacks;

• B2 is the slope for Whites, and (B2 + B3) is the slope for Blacks;

• the t-tests for B1 and B3 test whether they differ from a and B2, respectively.
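A minimal sketch of building the interaction term and fitting the full model with statsmodels; the variable names (race, income, ideology) and the data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
race = rng.integers(0, 2, size=n)                 # 0 = White (reference), 1 = Black
income = rng.normal(50, 10, size=n)
ideology = 10 + 0.3 * income - 2 * race - 0.1 * race * income + rng.normal(size=n)

df = pd.DataFrame({"race": race, "income": income})
df["race_x_income"] = df["race"] * df["income"]   # interaction term: category * independent variable

X = sm.add_constant(df[["race", "income", "race_x_income"]])
fit = sm.OLS(ideology, X).fit()
print(fit.params)   # intercept and income slope for Whites; race and interaction shift them for Blacks
```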

Non-Linear Models

• Tractable non-linearity

– Equation may be transformed to a linear model.

• Intractable non-linearity

– No linear transform exists

Tractable Non-Linear Models

• Several general Types

– Polynomial

– Power Functions

– Exponential Functions

– Logarithmic Functions

– Trigonometric Functions

Polynomial Models

• Linear:    Y_i = a + b*X_i + e_i

• Parabolic: Y_i = a + b1*X_i + b2*X_i² + e_i

• Cubic & higher order polynomials cube, etc. the independent variable.

Power Functions

• Simple exponents of the independent variable:

  Y_i = a * X_i^b + e_i

• Estimated, after a log transformation, with

  LogY_i = a + b*LogX_i + e_i

Exponential and Logarithmic

Functions

• Common growth-curve formula:

  Y_i = a * e^(b*X_i) + e_i

• Estimated with

  LogY_i = a + b*X_i + e_i

• Note that the error terms are now no longer normally distributed!

Logarithmic Functions

Trigonometric Functions

• Sine/Cosine functions

• Fourier series

Intractable Non-linearity

• Occasionally we have models that we cannot transform to linear ones.

• For instance a logit model

P(y) = 1 / (1 + e^(−XB))

Intractable Non-linearity

• Models such as these must be estimated by other means.

• We do, however, keep the criteria of minimizing the squared error as our means of determining the best model

Estimating Non-linear models

• All methods of non-linear estimation require an iterative search for the best fitting parameter values.

• They differ in how they modify and search for those values that minimize the SSE.
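A minimal sketch of such an iterative fit using scipy.optimize.curve_fit, which searches for the parameter values that minimize the sum of squared errors for a logistic-style model like the one above (the data are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, b0, b1):
    """P(y) = 1 / (1 + exp(-(b0 + b1*x))), a model with no exact linear transform."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

rng = np.random.default_rng(4)
x = np.linspace(-4, 4, 80)
y = logistic(x, 0.5, 1.2) + rng.normal(scale=0.05, size=x.size)   # noisy observations

# Iterative search (Levenberg-Marquardt by default) starting from an initial guess
params, _ = curve_fit(logistic, x, y, p0=[0.0, 1.0])
print(params)   # estimates close to (0.5, 1.2)
```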
