Mid-term Exam
Basic Econometrics
April 15, 2025
Student #: S10920618
Name: Joy Anjora Ginting
Answer all of the questions and SHOW your work.
1. What is the difference between a population and a sample? To estimate a regression
model, which one will you use for your data analysis?
A population is the entire group of people that falls within a defined scope, while a
sample is a subset of that population, selected according to what we want to study.
To estimate a regression model we use sample data, because the whole population is
rarely observed. In SPSS, click Analyze, then Regression, then Linear.
2. To estimate a simple OLS regression model from a dataset, how can we estimate a,
the intercept, and b, the coefficient of X, where Y = a + bX? Write the formulas to
estimate a and b.
$Y = a + bX + e$
$b = \dfrac{\overline{XY} - \bar{X}\,\bar{Y}}{\overline{X^2} - (\bar{X})^2} = \dfrac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$
$a = \bar{Y} - b\,\bar{X}$
3. Estimate a simple OLS regression model using the dataset below. Use the formulas
for the intercept and the coefficient of X from question 2 to answer this question.
X       Y
8       190
7       180
7.5     150
4       130
1.5     55
1.3     45
2       80
11      350
9.5     255
10      299
Mean of X = 6.18
Mean of Y = 173.40
Slope b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 3272.38 / 122.52 = 26.71
Intercept a = Ȳ − b X̄ = 173.40 − (26.71 × 6.18) = 8.33
Estimated model: Y = 8.33 + 26.71X
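The hand calculation above can be cross-checked with a short script. This is only a minimal sketch in plain Python: the function name `ols_fit` is an illustrative choice, not part of the exam, and the data are copied from the table in question 3.

```python
# Data copied from the question 3 table (10 observations).
x = [8, 7, 7.5, 4, 1.5, 1.3, 2, 11, 9.5, 10]
y = [190, 180, 150, 130, 55, 45, 80, 350, 255, 299]


def ols_fit(x, y):
    """Return (intercept a, slope b) of the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = ols_fit(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")
```

Rounded to two decimals, the script should agree with the slope and intercept obtained by hand.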
4. We get an estimated multiple OLS regression model,
Y= -30 + 0.3X1 + 6.2 X2 – 2.5X3
where:
Y: university entrance score, where full mark is 300
X1: mid-term score of the grade 12 (last high school year), where full mark is 300
points
X2: GPA score, where full mark is 45
X3: whether being a club member or not, if member is 1, if not is 0
Assuming all coefficients of the X variables are statistically significant at p = 0.05:
1) Interpret the estimated model. You need to interpret the relationships between all
independent variables and the dependent variable.
2) Estimate a university entrance score using the information below. A student’s mid-term
score is 250, his GPA score is 35, and he is a member of a club.
3) Estimate a university entrance score using the information below. A student’s mid-term
score is 185, his GPA score is 25, and he is not a member of a club.
1)
The intercept is -30: this is the predicted entrance score when the mid-term score and
GPA are both zero and the student is not a club member.
Each additional point on the mid-term score increases the predicted university entrance
score by 0.3 points, holding the other variables constant.
Each additional point of GPA increases the predicted university entrance score by
6.2 points, holding the other variables constant.
A club member is predicted to score 2.5 points lower on the university entrance exam
than a non-member with the same mid-term score and GPA.
2) Y = -30 + 0.3(250) + 6.2 (35) – 2.5 (1)
= -30 + 75 + 217 - 2.5 = 259.5
3) Y = -30 + 0.3(185) + 6.2(25) – 2.5 (0)
= -30 + 55.5 + 155 = 180.5
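The two predictions can be reproduced with a small helper. This is a sketch only: the function name `predict_entrance_score` and its argument names are assumptions made for illustration, while the coefficients come from the estimated model in question 4.

```python
def predict_entrance_score(midterm, gpa, club_member):
    """Predicted score from Y = -30 + 0.3*X1 + 6.2*X2 - 2.5*X3."""
    club = 1 if club_member else 0  # X3 dummy: 1 = club member, 0 = not
    return -30 + 0.3 * midterm + 6.2 * gpa - 2.5 * club


# Student in part 2): mid-term 250, GPA 35, club member.
score_member = predict_entrance_score(250, 35, True)
# Student in part 3): mid-term 185, GPA 25, not a club member.
score_non_member = predict_entrance_score(185, 25, False)

print(f"{score_member:.1f}")
print(f"{score_non_member:.1f}")
```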
5. Write the important assumptions required for the OLS regression model to be BLUE
(there are 3 assumptions based on the lecture). First, what is BLUE? Then write
the three assumptions.
BLUE stands for Best Linear Unbiased Estimator.
1) Linearity: the relationship between the independent and dependent variables is
linear.
2) Unbiasedness: if the sampling is repeated many times, the average of the estimates
equals the true parameter value.
3) Minimum variance: among all linear unbiased estimators, OLS has the smallest
variance.
6. We try to estimate a model explaining college GPA (colGPA), with the average
number of lectures missed per week (skipped), ACT score to enter the college
(ACT) and high school GPA (hsGPA) as explanatory variables. The estimated model
is
colGPA = 1.39   + 0.412 hsGPA + 0.015 ACT – 0.083 skipped
        (0.33)   (0.094)       (0.011)     (0.026)
N = 141, R² = 0.85
The number in parentheses under each explanatory variable is the p-value of the
coefficient of that explanatory variable.
1) Test whether all coefficients of the explanatory variables are statistically significant
at the 5% level. You need to write a null hypothesis and show whether or not you reject
the null hypothesis for each explanatory variable.
2) What is the meaning of R² in this model?
1)
H0: the coefficient = 0
H1: the coefficient ≠ 0
hsGPA: p = 0.094 > 0.05, so we cannot reject H0; the coefficient is not significant.
ACT: p = 0.011 < 0.05, so we reject H0; the coefficient is significant.
skipped (lectures missed per week): p = 0.026 < 0.05, so we reject H0; the coefficient
is significant.
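The decision rule used here (reject H0 when the p-value is below 0.05) can be written out as a short sketch. The names `ALPHA`, `p_values`, and `decide` are illustrative assumptions; the p-values are the ones given in parentheses under the estimated model.

```python
ALPHA = 0.05  # significance level stated in the question

# p-values reported under each explanatory variable in question 6.
p_values = {"hsGPA": 0.094, "ACT": 0.011, "skipped": 0.026}


def decide(p, alpha=ALPHA):
    """Reject H0 (coefficient = 0) when the p-value is below alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]} -> {decision}")
```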
2) R² = 0.85 means that 85% of the variation in colGPA is explained by hsGPA, ACT,
and skipped (lectures missed per week), so the model fits the data well.
7. Use the SPSS data set called “workprog.sav” and answer below questions.
1) How many respondents are in the dataset?
1000
2) How many variables are in the dataset?
8
3) What is the average age of all respondents, male respondents, and female
respondents? You need to create a new gender variable using the SPSS function “Recode
into Different Variables”, coding 1 for male and 0 for female respondents. Please
provide a screenshot of the output table as your evidence.
Average age of male respondents: 18.47
Average age of female respondents: 18.49
4) Draw a bar chart of respondents’ gender.
5) Is there a statistically significant difference between genders in “income before
program”? Answer the question with an appropriate explanation and provide a screenshot
of the output table to support your answer.
Since the significance value (p = 0.517) is greater than 0.05, the difference between
male and female respondents in income before program is not statistically significant.
6) Is there a statistically significant difference between genders in “Marital status”?
Answer the question with an appropriate explanation and provide a screenshot of the
output table to support your answer.
Since the significance value (p = 0.621) is greater than 0.05, the difference between
male and female respondents in marital status is not statistically significant.
7) Try to estimate “income before program” using an OLS model based on the following
explanatory variables: “age” and “number of people in household”. Please write the model
equation with coefficients and interpret the model. You need to provide a screenshot
of the output table to support your answer.
Income before program = -2.710 + 0.636 (age) – 0.050 (number of people in household)
Increasing age by 1 increases income before program by 0.636, holding the number of
people in household constant.
Increasing the number of people in household by 1 decreases income before program
by 0.050, holding age constant.
Only age is statistically significant. The R² of 52.7% means the model explains 52.7%
of the variation in income before program, which is not a strong fit but makes sense
since only 2 explanatory variables are used. Because number of people in household is
not significant, it may not need to be included the next time income before program
is estimated.
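As a rough illustration of how the fitted equation is used, the sketch below plugs values into the model from part 7. The function name `predict_income` and the example respondent (age 20, 3 people in the household) are hypothetical and are not taken from workprog.sav.

```python
def predict_income(age, household_size):
    """Income before program = -2.710 + 0.636*age - 0.050*household size."""
    return -2.710 + 0.636 * age - 0.050 * household_size


# Hypothetical respondent, used only to show the arithmetic.
example = predict_income(age=20, household_size=3)
print(f"{example:.2f}")

# One more unit of age raises the prediction by the age coefficient.
age_effect = predict_income(21, 3) - predict_income(20, 3)
```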
8) Try to estimate “income before program” using an OLS model based on the following
explanatory variables: “age”, “number of people in household”, “gender” and “level of
education”. To create dummy variables, please set up “female” and “did not complete
high school” as the reference categories. Please write the model equation with
coefficients and interpret the model. You need to provide a screenshot of the output
table to support your answer.
Income before program = 1.882 + 0.321 (age) – 0.028 (number of people in household)
– 0.12 (male) + 1.624 (HSD) + 3.301 (SC)
Increasing age by 1 increases income before program by 0.321, holding the other
variables constant.
Increasing the number of people in household by 1 decreases income before program
by 0.028.
Being male, compared with female (the reference group), decreases income before
program by 0.012.
Having a high school degree (HSD), compared with not completing high school (the
reference group), increases income before program by 1.624.
Having some college (SC), compared with not completing high school, increases income
before program by 3.301.
Only age, high school degree, and some college are statistically significant.
With an R² of 89.4%, the model explains 89.4% of the variation in income before
program, which is a very good fit.
Number of people in household and gender (male) are not statistically significant, so
they could be left out of the next estimation.
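The dummy coding behind this model, with “female” and “did not complete high school” as reference categories, can be sketched as follows. The category labels and the names `EDUCATION_DUMMIES` and `encode_respondent` are assumptions for illustration; the actual value labels in workprog.sav may differ.

```python
# Reference category gets all zeros; each other level gets its own dummy.
# Labels are hypothetical stand-ins for the SPSS value labels.
EDUCATION_DUMMIES = {
    "did not complete high school": (0, 0),  # reference category
    "high school degree": (1, 0),            # HSD = 1
    "some college": (0, 1),                  # SC = 1
}


def encode_respondent(gender, education):
    """Return the dummy variables (male, HSD, SC) for one respondent."""
    male = 1 if gender == "male" else 0  # female is the reference (0)
    hsd, sc = EDUCATION_DUMMIES[education]
    return {"male": male, "HSD": hsd, "SC": sc}


reference = encode_respondent("female", "did not complete high school")
college_man = encode_respondent("male", "some college")
print(reference)
print(college_man)
```

A respondent in both reference categories has all dummies equal to zero, so the prediction for that respondent depends only on the intercept, age, and number of people in household.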