UNIVERSITY OF PARMA
Faculty of Economy
December, 20 2011 EXAMINATIONS
Economic Statistics
Duration – 1.20 hours
Examination Aids: Calculator, Tables and Formulas
In this test you may use only pocket calculators with, at most,
basic statistics functions. You cannot use programmable calculators.
You may use statistical tables (Standard Normal, Student t and F) and the
SUMMARY OF UNIVARIATE AND BIVARIATE AND MULTIPLE REGRESSION
FORMULAS attached to this part.
**Unless otherwise specified, use the conventional 5 percent significance
level**
Part 1: 3 written questions worth a total of 30 points
Part 2: 3 multiple choice questions worth 1 point each for a total of 3 points
Show your work and answer clearly, concisely, and completely.
Questions can be clarified, but no hints will be provided.
You may begin. Good luck.
Part 1
EXERCISE 1 [10 points]
A large food company conducts a survey in a sample of 64 countries in order to identify the
factors that influence the sales of a new energy bar.
The dependent variable Y is the monthly sales (thousands of euro), while the
independent variables (regressors) are X2: television promotion expenditures (thousands
of euro), X3: radio promotion expenditures (thousands of euro), X4: price (euro).
We fit a multiple linear regression model to the data. Below are: i) the mean
and standard deviation of each variable, ii) the estimates of the model parameters obtained
with the Gretl OLS procedure.
Summary Statistics, using the observations 1 - 64
Variable       Mean        Std. Dev.
Y (SALES)      44410.92    1583.45
X2 (TV)        17175.43    886.64
X3 (RADIO)     6055.35     805.32
X4 (PRICE)     2.27        1.06
Model 1: OLS, using observations 1-64
Dependent variable: SALES
               Coefficient   Std. Error   t-ratio
const          -9615.08      7654.85      -1.256077
X2 (TV)        1.53          0.11         13.909091
X3 (RADIO)     1.47          1.26         1.1666667
X4 (PRICE)     -175.52       901.42       -0.194715

Mean dependent var   44410.92          S.D. dependent var   1583.45
Sum squared resid    (not reported)    S.E. of regression   (not reported)
R-squared            0.782             Adjusted R-squared   (not reported)
F                    71.81             P-value(F)           7.74766e-020
a) [point 1] State the multiple regression equation in conventional terms and interpret
the meaning of the slopes b2, b3 and b4 in this problem.
b) [point 1] Predict the Sales for an expenditure on TV advertising of 30000
thousand euro and on RADIO advertising of 5000 thousand euro, with a
price of 3 euro.
c) [points 2] Which type of advertising is more effective? Explain
d) [points 2] Determine whether there is a significant relationship between Sales and
the three independent variables (TV, RADIO and PRICE) at the 0.05 level of
significance. Interpret the meaning of the p-value.
e) [points 2] At the 0.05 level of significance, determine whether each independent
variable makes a significant contribution to the regression model (stating clearly
the null hypothesis). On the basis of these results, indicate the independent
variables to include in this model.
f) [points 2] You decide to fit a reduced model where Y depends only on X2 (TV). The
coefficient of determination for this simple linear model is R2 = 0.776. Compare by
means of a suitable test (at 0.01) the reduced model with the complete model (in
which all variables appear) and specify clearly the null hypothesis.
SOLUTION
a) Y-hat = -9615.08 + 1.53*X2 + 1.47*X3 - 175.52*X4
           (7654.85)  (0.11)    (1.26)    (901.42)
n = 64, R-squared = 0.782
(standard errors in parentheses)
In this model, the regression coefficients are interpreted as follows:
1) Holding constant the spending on Radio advertising and the Price, for each increase
of 1.0 thousand euro in TV advertising, Sales are estimated to increase by 1.53
thousand euro (i.e., 1530 euro).
2) Holding constant the spending on TV advertising and the Price, for each increase of
1.0 thousand euro in Radio advertising, Sales are estimated to increase by 1.47
thousand euro (i.e., 1470 euro).
3) Holding constant the spending on TV and Radio advertising, for each increase of 1.0 euro
in Price, Sales are estimated to decrease by 175.52 thousand euro (i.e., 175520 euro).
4) The sample Y intercept (b1 = -9615.08) estimates the value of Sales when no money is
spent on Radio and TV advertising and the Price is equal to zero. Because
these values of promotion and price are outside the range of RADIO, TV
and Price used in this market study, and are nonsensical, the value of b1 has no practical
interpretation.
b) Predict the Sales for an expenditure in TV Advertising of 30000 Thousands euro and in
RADIO advertising of 5000 Thousands euro and with a price of 3 euro
Sales_hat = -9615.08 + 1.53*30000+ 1.47*5000 -175.52*3 = 43108.36 thousands euro
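The arithmetic of this prediction can be checked with a short Python sketch. The function name `predict_sales` and the dictionary layout are illustrative choices, not part of the exam; the coefficients are those reported for Model 1.

```python
# Estimated coefficients of Model 1 (sales and advertising in thousands of euro).
COEF = {"const": -9615.08, "TV": 1.53, "RADIO": 1.47, "PRICE": -175.52}


def predict_sales(tv, radio, price):
    """Return predicted monthly sales (thousands of euro) from the fitted equation."""
    return (COEF["const"]
            + COEF["TV"] * tv
            + COEF["RADIO"] * radio
            + COEF["PRICE"] * price)


# Values requested in question b).
sales_hat = predict_sales(tv=30000, radio=5000, price=3)
print(round(sales_hat, 2))
```

Note that a TV expenditure of 30000 lies far above the sample mean of 17175.43, so the prediction is an extrapolation and should be read with caution.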
c) [points 2] Which type of advertising is more effective? Explain
Holding the other independent variables constant, TV advertising seems to be more
effective because its slope is greater. However, when the mean and the
variability of the two independent variables are different, standardized versions of
the regression coefficients provide more meaningful comparisons. In our case the
standard deviations of TV and RADIO differ (886.64 versus 805.32), so it is better to
compute the standardized partial coefficients.
$$\mathrm{beta}_2 = b_2 \frac{s_{X_2}}{s_Y} = 1.53 \times \frac{886.64}{1583.45} = 0.857$$
$$\mathrm{beta}_3 = b_3 \frac{s_{X_3}}{s_Y} = 1.47 \times \frac{805.32}{1583.45} = 0.748$$
The more effective type of advertising is TV advertising, because its standardized
coefficient is larger.
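As a quick check of the standardized coefficients, the following Python sketch multiplies each slope by the ratio of the regressor's standard deviation to that of Y. The variable names are illustrative; the numbers come from the summary statistics and Model 1.

```python
# Standard deviations from the summary statistics table.
S_Y = 1583.45
S_X = {"TV": 886.64, "RADIO": 805.32}

# Unstandardized slopes from Model 1.
SLOPES = {"TV": 1.53, "RADIO": 1.47}

# Standardized coefficient: slope * (std. dev. of X) / (std. dev. of Y).
std_beta = {name: SLOPES[name] * S_X[name] / S_Y for name in SLOPES}

for name, value in std_beta.items():
    print(name, round(value, 3))

# The regressor with the largest absolute standardized coefficient.
most_effective = max(std_beta, key=lambda name: abs(std_beta[name]))
print("Most effective:", most_effective)
```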
d) Determine whether there is a significant relationship between Sales and the three
independent variables (TV, RADIO and PRICE) at the 0.05 level of significance. Interpret
the meaning of the p-value.
Our next task is to test the "significance" of this model based on that F-ratio using
the standard five step hypothesis testing procedure.
Hypotheses: H0: all slope coefficients are zero (β2 = β3 = β4 = 0)
H1: at least one slope coefficient is different from 0
Critical value: an F-value based on (k-1) numerator df and (n - k) denominator df
gives us F(3, 60) at 0.05 = 2.758
Calculated Value:
$$F(k-1,\, n-k) = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.782/3}{(1-0.782)/60} = \frac{0.2607}{0.00363} \approx 71.81$$
From above the F-calc is 71.81
Compare: F-calc > F-crit and thus we reject H0.
Conclusion: This model has explanatory power with respect to Y. In other words the
set of X variables in this model help us explain or predict the Y variable. This model
is SIGNIFICANT.
The p-value associated with F-calc is 7.74766e-020, which is much less than α = 0.05. It is
the probability of obtaining an F statistic at least as large as 71.81 if H0 were true. So,
put another way, the value of F-calc falls in the rejection region of the null
hypothesis.
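The overall F statistic can be reproduced from R-squared alone. The sketch below is a minimal illustration: the function name `overall_f` is hypothetical, and the critical value 2.758 is taken from the F-table as quoted in the solution rather than computed. Because R-squared is rounded to three decimals, the computed value (about 71.7) differs slightly from the 71.81 in the Gretl output.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all slopes are zero, with k parameters (intercept included)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


N_OBS = 64      # number of observations
K_PARAMS = 4    # intercept + 3 slopes
R2 = 0.782      # rounded R-squared from the output

f_calc = overall_f(R2, N_OBS, K_PARAMS)
F_CRIT_05 = 2.758  # F(3, 60) at the 0.05 level, from the F-table

reject_h0 = f_calc > F_CRIT_05
print(round(f_calc, 2), reject_h0)
```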
e) [points 2] At the 0.05 level of significance, determine whether each independent
variable makes a significant contribution to the regression model ( Stating clearly the null
hypothesis). On the basis of these result, indicate the independent variables to include in
this model.
Our next step is to test the significance of the individual coefficients in the equation. We
will conduct a t-test for each b associated with an X variable. Mechanically, the
test statistic is the value of b2 (or b3, ..., bi) over SEb2 (or SEb3, ..., SEbi),
compared with a t-critical value with (n - k) df (the Error df from the ANOVA table). Alternatively,
we can use the p-values to decide whether or not to reject H0. The H0 being
tested is βi = 0, which means that this variable is not related to Y. We
consider each variable separately and thus must conduct as many t-tests as there
are X variables.
What NULL are we considering?
Hypotheses: we are testing H0: βi = 0 (this variable is unrelated to the dependent
variable) against H1: βi ≠ 0, at alpha = 0.05.
With the actual values of the b's and the SEb's, we obtain the t-value (one for each
X variable):
tTV = 1.53/0.11 = 13.909091
tRADIO = 1.47/1.26 = 1.1666667
tPRICE = -175.52/901.42 = -0.194715
and compare them with the t-critical value (it is the same for each t-test within a single
model) to determine whether or not to reject the H0 associated with each X.
tcritical = 2.0003 with 60 df
At the 0.05 significance level, reject H0 if t ≥ 2.0003 or t ≤ -2.0003. Do not reject H0 if
-2.0003 < t < 2.0003.
Comparing the t statistics (13.909091, 1.1666667 and -0.194715) with the critical value, X2 is
a significant independent variable, while X3 and X4 are not significant independent
variables.
Conclusion: Variable X2 (TV) is significant and contributes to the model’s
explanatory power, while X3 (Radio) and X4 (Price) are not significant. Only X2 (TV)
should be included in the model.
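The three t-tests can be run in a loop, as in the following sketch. The names are illustrative, and the critical value 2.0003 is copied from the t-table (60 df, two-sided 0.05) rather than computed.

```python
# (coefficient, standard error) pairs from Model 1.
ESTIMATES = {
    "TV": (1.53, 0.11),
    "RADIO": (1.47, 1.26),
    "PRICE": (-175.52, 901.42),
}

T_CRIT = 2.0003  # two-sided critical value, 60 df, alpha = 0.05 (from the t-table)

results = {}
for name, (b, se) in ESTIMATES.items():
    t_value = b / se
    # Two-sided test: reject H0 (beta = 0) when |t| reaches the critical value.
    results[name] = (t_value, abs(t_value) >= T_CRIT)

for name, (t_value, significant) in results.items():
    print(f"{name}: t = {t_value:.4f}, significant = {significant}")
```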
f) [points 2] You decide to fit a reduced model where Y depends only on X2 (TV). The
coefficient of determination for this simple linear model is R2 = 0.776. Compare by means
of a suitable test (at 0.01) the reduced model with the complete model (in which all
variables appear) and specify clearly the null hypothesis.
To test this, we consider two separate regressions:
(Restricted)
$$Y = \beta_1 + \beta_2 X_2 + u$$
(Complete or Unrestricted)
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + u$$
Do the X3 and X4 variables have a significant impact?
We perform an F test comparing the RSS when the X3 and X4 variables are included
(RSS2 = RSSc) with the RSS when they are not (RSS1 = RSSr).
The hypotheses are
H0: β3 = 0 and β4 = 0
H1: β3 ≠ 0 or β4 ≠ 0 (at least one of the two is different from zero)
How can we test this hypothesis? The test statistic is defined in the following way:
$$F(m-k,\, n-m) = \frac{(RSS_r - RSS_c)/df_1}{RSS_c/df_2} = \frac{(R_c^2 - R_r^2)/df_1}{(1 - R_c^2)/df_2} = \frac{(R_c^2 - R_r^2)\,(n-m)}{(1 - R_c^2)\,(m-k)}$$
RSSr = RSS1 = sum of squared residuals of the reduced model
RSSc = RSS2 = sum of squared residuals of the complete (unrestricted) model
df1 = m - k = number of extra parameters, df2 = n - m (complete model)
k = number of parameters of the reduced model, m = number of parameters of the complete
(unrestricted) model
$$F(2,\, 60) = \frac{(R_c^2 - R_r^2)\,(n-m)}{(1 - R_c^2)\,(m-k)} = \frac{(0.782 - 0.776)\,(64 - 4)}{(1 - 0.782)\,(4 - 2)} = 0.826$$
Decision Rule
From the F-table, F(0.01, 2, 60) = 4.98. The decision rule is to reject H0 if F > 4.98
and not to reject H0 if F ≤ 4.98.
The test statistic is F = 0.826, which falls in the non-rejection region. Do not reject H0
and conclude that the introduction in the model of X3 and X4 does not provide a
significant improvement in the explanation of Y.
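The comparison of the reduced and complete models can be sketched in Python using the R-squared form of the statistic. The function name `partial_f` is hypothetical, and the critical value 4.98 is the tabulated F(0.01, 2, 60) quoted in the solution.

```python
def partial_f(r2_complete, r2_reduced, n, m, k):
    """F statistic for H0: the (m - k) extra coefficients are all zero.

    n = observations, m = parameters in the complete model,
    k = parameters in the reduced model (intercepts included).
    """
    numerator = (r2_complete - r2_reduced) / (m - k)
    denominator = (1 - r2_complete) / (n - m)
    return numerator / denominator


f_stat = partial_f(r2_complete=0.782, r2_reduced=0.776, n=64, m=4, k=2)
F_CRIT_01 = 4.98  # F(2, 60) at the 0.01 level, from the F-table

reject_h0 = f_stat > F_CRIT_01
print(round(f_stat, 3), reject_h0)
```

Since the statistic is well below the critical value, H0 is not rejected and the reduced model with TV alone is retained.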
EXERCISE 2
Use the information in the table below to answer the following questions.
Country (currency)        Big Mac Price   Exchange Rate (June 4, 1998)
United States (dollar)    $2.53           –
South Korea (won)         W 2,600         1,475 W/$
Israel (shekel)           sh 12.50        3.70 sh/$
Poland (zloty)            zl 5.30         3.46 zl/$
a. Calculate whether the won, the shekel, and the zloty are overvalued or undervalued
with respect to the U.S. dollar in terms of Big Macs purchases. Explain what it means to be
overvalued or undervalued.
Answer: One way to answer this is to calculate the dollar price of a Big Mac in
South Korea, Israel and Poland using current exchange rates. If the dollar price is
less than the price of a Big Mac in the US then the country’s currency is
undervalued. If otherwise, then the currency is overvalued.
In South Korea: W2600 / 1475W/$ = $1.76. This is less than $2.53, the US price,
therefore the South Korean won is undervalued.
In Israel: sh12.50 / 3.7 sh/$ = $3.38. This is greater than $2.53, therefore the Israeli
shekel is overvalued.
In Poland: zl5.30 / 3.46 zl/$ = $1.53 This is less than $2.53 therefore the Polish zloty
is undervalued.
Answer: A second way to answer this (solution proposed by the student DANNI
Andrea) is to calculate the purchasing power parity (PPP) exchange rate using the Big Mac
price from each country:
In South Korea: USPPPSK = PW / P$ = W2600 / $2.53 = 1027.67 W/$
In Israel: USPPPIS = Psh / P$ = sh12.50 / $2.53 = 4.94 sh/$
In Poland: USPPPPL = Pzl / P$ = zl 5.30 / $2.53 = 2.09 zl/$
If the PPP rate is less than the exchange rate, then the country’s currency is
undervalued. Otherwise, the currency is overvalued.
For example, for South Korea we have: (USPPPSK – USEXRSK) / USEXRSK = (1027.67 – 1475) / 1475
= -0.303. That is, the won is undervalued by 30.3% against the US dollar.
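Both approaches can be combined in a short Python sketch. The function name `big_mac_valuation` and the dictionary layout are illustrative; prices and exchange rates are those in the table.

```python
US_PRICE = 2.53  # Big Mac price in the United States, in dollars

# country: (local Big Mac price, exchange rate in local currency per dollar)
COUNTRIES = {
    "South Korea": (2600.0, 1475.0),
    "Israel": (12.50, 3.70),
    "Poland": (5.30, 3.46),
}


def big_mac_valuation(local_price, exchange_rate, us_price=US_PRICE):
    """Return (dollar price, PPP rate, relative misvaluation, verdict)."""
    dollar_price = local_price / exchange_rate   # first approach
    ppp_rate = local_price / us_price            # second approach
    misvaluation = (ppp_rate - exchange_rate) / exchange_rate
    if misvaluation > 0:
        verdict = "overvalued"
    elif misvaluation < 0:
        verdict = "undervalued"
    else:
        verdict = "at parity"
    return dollar_price, ppp_rate, misvaluation, verdict


valuations = {name: big_mac_valuation(*values) for name, values in COUNTRIES.items()}
for name, (usd, ppp, gap, verdict) in valuations.items():
    print(f"{name}: ${usd:.2f}, PPP {ppp:.2f}, {gap:+.1%}, {verdict}")
```

The two approaches always agree, because the dollar price is below the US price exactly when the PPP rate is below the market exchange rate.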
b. What would the exchange rates have to be in order to equalize Big Mac prices
between South Korea and the United States, Israel and the United States, and Poland
and the United States?
Answer: Here you can simply apply the purchasing power parity formula using the
Big Mac price from each country,
In South Korea: USPPPSK = PW / P$ = W2600 / $2.53 = 1027.67 W/$
In Israel: USPPPIS = Psh / P$ = sh12.50 / $2.53 = 4.94 sh/$
In Poland: USPPPPL = Pzl / P$ = zl 5.30 / $2.53 = 2.09 zl/$
These are the PPP exchange rates based on Big Mac prices.
c. If in the long run the exchange rate moves to satisfy Big Mac purchasing power
parity (PPP), will the won, shekel, and zloty, appreciate or depreciate in terms of
dollars? Explain the logic.
Answer: In order to reach the PPP exchange rate the won would have to change
from 1475 W/$ to 1027.67 W/$ . Since this exchange rate is the value of the $ ($s in
the denominator) the dollar would need to depreciate, therefore the won would
appreciate. This means also that if the won is undervalued the won would need to
appreciate to reach its PPP value.
Similarly, the shekel exchange rate would have to change from 3.70 sh/$ to 4.94 sh/$,
representing a $ appreciation, or a shekel depreciation.
For the zloty, the exchange would need to change from 3.46 zl/$ to 2.09 zl/$,
meaning the $ would have to depreciate or the zloty appreciate.
EXERCISE 3 [10 points]
Dummy variables [5 points]
Role of Categorical (dummy) Variables in the Linear Regression Model (here you find
some hints):
1. What is a Dummy variable? Types of Dummy variables. [Analysis & 1 – 2 sentences]
2. How many dummy variables are needed? [Analysis & 1 – 2 sentences]
3. How to Interpret Dummy Variables. [Analysis & 1 – 2 sentences]
4. How do you add an interaction to a regression? [Analysis & 1 – 2 sentences]
What is a Dummy variable?
A Dummy variable or Indicator Variable is an artificial variable created to represent
an attribute with two or more distinct categories/levels.
Things to keep in mind about dummy variables
Dummy variables assign the numbers ‘0’ and ‘1’ to indicate membership in any
mutually exclusive and exhaustive category.
1. The number of dummy variables necessary to represent a single attribute variable
is equal to the number of levels (categories) in that variable minus one.
2. For a given attribute variable, none of the dummy variables constructed can be
redundant. That is, one dummy variable cannot be a constant multiple or a simple
linear relation of another.
3. The interaction of two attribute variables (e.g. Gender and Marital Status) is
represented by a third dummy variable which is simply the product of the two
individual dummy variables.
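The interaction described in point 3 can be illustrated with a small Python sketch. The coding (male = 1, married = 1) and the four example records are hypothetical and serve only to show the product rule.

```python
def interaction(d1, d2):
    """Interaction of two 0/1 dummy variables: their product."""
    return d1 * d2


# Hypothetical records: male = 1 if male, married = 1 if married.
people = [
    {"male": 1, "married": 1},
    {"male": 1, "married": 0},
    {"male": 0, "married": 1},
    {"male": 0, "married": 0},
]

for person in people:
    # The interaction dummy equals 1 only for married males.
    person["male_x_married"] = interaction(person["male"], person["married"])

print([person["male_x_married"] for person in people])
```

The resulting column is added to the regression alongside the two original dummies, so its coefficient measures the additional effect of belonging to both categories at once.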
How many dummy variables are needed?
In a multiple regression there are times we want to include a categorical variable in
our model. Examples might include gender or education level. Unfortunately we
cannot just enter them directly because they are not continuously measured
variables. However, they can be represented by dummy variables.
The answer to "how many?" is easy. It is r-1, where r = the number of categories in
the categorical variable. Thus for gender (male - female) we would need only one
dummy variable, with a coding scheme of Xi=1 when the individual is male and 0
when female. Thus female becomes the base case, and the bi associated with Xi
becomes the amount of change in Y when the individual is male versus female. For
the education level example, if we have a question on "highest level completed"
with categories (1) grammar school, (2) high school, (3) undergrad, (4) graduate, we
would have 4 categories and would need 3 dummy variables (4-1). Thus we would
create 3 X variables and insert them in our regression equation. We decide on our
base case: in this example it will be grammar school. This category will not have an
X variable but instead will be represented by the other 3 dummy variables all being
equal to zero. We can make X1 = 1 for high school, X2 = 1 for undergrad and X3 = 1
for graduate.
For each of these we are comparing the category in question to the grammar school
category (our base case). The best way to lay this out is to build a little table to
organize that coding, see below:
category/variable   X1   X2   X3
Grammar School      0    0    0
High School         1    0    0
Undergraduate       0    1    0
Graduate            0    0    1
Thus no matter how many other variables are in the model, in order to include
education level in your model you will have to add 3 new dummy variables (X's) to
the model.
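The coding table can be generated programmatically. The sketch below is a minimal illustration in plain Python; the function name `dummy_code` and the level labels are illustrative, and the first level in the list is treated as the base case.

```python
LEVELS = ["grammar school", "high school", "undergraduate", "graduate"]


def dummy_code(value, levels=LEVELS):
    """Return the r - 1 dummy variables for `value`; the first level is the base case."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    non_base_levels = levels[1:]
    # One dummy per non-base level; all zeros identifies the base case.
    return [1 if value == level else 0 for level in non_base_levels]


for level in LEVELS:
    print(f"{level:15s}", dummy_code(level))
```

With four categories the function always returns three dummies (X1, X2, X3), matching the r - 1 rule.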
How to Interpret Dummy Variables.
When a Multiple Regression equation is calculated by the computer you will get a b
value associated with each X variable, whether they are dummy variables or not.
The significance of the model and of each individual coefficient is tested in the same way as
before. Concluding that a dummy variable is significant (rejecting the null and
concluding that this variable does contribute to the model's explanatory power)
means that knowing which category a person falls in helps us explain
more variance in Y. So for instance, in the example above with education level, if we
test the b associated with X1 and determine it to be "significant", then that tells us
that X1 (high school vs. grammar school) does contribute to the model's
explanatory power. Thus knowing whether a person has a high school education
(versus only a grammar school education) helps us explain more of whatever the Y
variable is. This process is repeated for each dummy variable, just as it is for each X
variable in general.
Location Quotient [5 points]
1) For what reasons do you calculate the Location Quotient in an analysis of your local
economy? [Analysis & 1 – 2 sentences]
2) Write the basic formula for calculating the Location Quotient when comparing the
regional economy to the national economy, highlighting the key inputs required to
calculate the Location Quotient. [Analysis & 1 – 2 sentences]
3) How do you define “the basic export employment”? [Analysis & 1 – 2 sentences]
The Location Quotient Technique is the most commonly utilized economic base
analysis method. It was developed in part to offer a slightly more complex model to
the variety of analytical tools available to economic base analysts. This technique
compares the local economy to a reference economy, in the process attempting to
identify specializations in the local economy. The location quotient technique is
based upon a calculated ratio between the local economy and the economy of some
reference unit. This ratio, called an industry "location quotient", gives this technique
its name.
Location Quotient Calculation
To calculate any location quotient the following formula is applied. Note that in this
formula we are comparing the Regional Economy (often a county) to the National
Economy. Location quotients may also be calculated that compare the county to a
state.
$$\text{Location Quotient} = \frac{\text{Regional Employment in Industry } k \,/\, \text{Total Regional Employment}}{\text{National Employment in Industry } k \,/\, \text{Total National Employment}}$$
Examining this formula more closely, we see that to allocate employment to the
basic and non-basic sectors, location quotients are calculated for each industry.
Simply stated, the location quotient method compares Local Employment to
National Employment. The LQ provides evidence for the existence of basic
employment in a given industry.
Interpreting Calculated Location Quotients
Interpreting the Location Quotient is very simple. Only three general outcomes are
possible when calculating location quotients. These outcomes are as follows:
LQ < 1.0
LQ = 1.0
LQ > 1.0
LQ < 1.0 = All Employment is Non-Basic
A LQ that is less than 1.0 suggests that local employment is less than was
expected for a given industry. Therefore, that industry is not even meeting local
demand for a given good or service, and all of this employment is considered
non-basic by definition.
LQ = 1.0 = All Employment is Non-Basic
A LQ that is equal to 1.0 suggests that the local employment is exactly sufficient to
meet the local demand for a given good or service. Therefore, all of this employment
is also considered non-basic because none of these goods or services are exported
to non-local areas.
LQ > 1.0 = Some Employment is Basic
A LQ that is greater than 1.0 provides evidence of basic employment for a given
industry. When an LQ > 1.0, the analyst concludes that local employment is greater
than expected and it is therefore assumed that this "extra" employment is basic.
These extra jobs then must export their goods and services to non-local areas
which, by definition, makes them Basic sector employment.
Calculating the Level of Basic Employment
When the LQ is calculated to be greater than 1.0, it has been determined that some
of that industry's employment is Basic. However, it must be emphasized that a LQ
> 1.0 does not mean that all that industry's employment is basic in nature. Recall
that it is assumed that any employment "below" an LQ of 1.0 is Non-Basic; those
jobs serve local demand. Only those jobs over and above what was expected for the
region can be identified as Basic sector jobs.
Because of the assumptions of the Location Quotient approach, a second formula
must be applied to determine the number of Basic sector jobs when the LQ is
greater than 1.0. This formula is as follows:
$$\text{Basic Sector Employment} = \left( \frac{\text{Regional Employment Industry } k}{\text{National Employment Industry } k} - \frac{\text{Total Regional Employment}}{\text{Total National Employment}} \right) \times \text{National Employment Industry } k$$
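The two formulas can be sketched in Python as follows. All employment figures in the example are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative. Returning zero basic employment when the LQ does not exceed 1.0 is a choice made here to mirror the interpretation rules above.

```python
def location_quotient(reg_k, reg_total, nat_k, nat_total):
    """Regional share of industry k divided by the national share of industry k."""
    return (reg_k / reg_total) / (nat_k / nat_total)


def basic_employment(reg_k, reg_total, nat_k, nat_total):
    """Basic sector jobs in industry k; zero when the LQ is not above 1.0."""
    extra = (reg_k / nat_k - reg_total / nat_total) * nat_k
    return max(extra, 0.0)


# Hypothetical figures (not from the exam).
REG_K, REG_TOTAL = 3_000, 50_000           # regional industry and total employment
NAT_K, NAT_TOTAL = 1_000_000, 25_000_000   # national industry and total employment

lq = location_quotient(REG_K, REG_TOTAL, NAT_K, NAT_TOTAL)
basic_jobs = basic_employment(REG_K, REG_TOTAL, NAT_K, NAT_TOTAL)
print(round(lq, 2), round(basic_jobs))
```

In this example the region would need 2,000 jobs in industry k to match the national share, so the remaining 1,000 of its 3,000 jobs are counted as basic.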
Part 2
1) [1 point]
Consider the following data related to US GDP:
GDP: $12 Trillion
Consumption: $9.2 Trillion
Government Purchases: $1.8 Trillion
US investment abroad: $0.4 Trillion
Imports: $1.5 Trillion
Domestic Investment: $1.6 Trillion
Private Savings: $2.2 Trillion
1a) What is the value of US exports?
(A) 0.5; (B) -0.9; (C) 1.2; (D) 0.0; (E) 0.9.
1b) Is the US running a trade deficit or surplus?
Answer: NX = GDP − C − I − G = 12 − 9.2 − 1.6 − 1.8 = −0.6. This is negative, so the US is
running a trade deficit. NX = EX − IM, so EX = NX + IM = −0.6 + 1.5 = 0.9, which is option (E).
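The national accounts identity used in this answer can be verified with a few lines of Python. The variable names are illustrative; all values are in trillions of dollars as given in the question.

```python
# Data from the question, in trillions of dollars.
gdp = 12.0
consumption = 9.2
investment = 1.6      # domestic investment
government = 1.8
imports = 1.5

# GDP = C + I + G + NX, so NX = GDP - C - I - G.
net_exports = gdp - consumption - investment - government

# NX = EX - IM, so EX = NX + IM.
exports = net_exports + imports

trade_balance = "deficit" if net_exports < 0 else "surplus or balance"
print(round(net_exports, 1), round(exports, 1), trade_balance)
```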
2) [1 point] A student obtains the following results in several different regression
problems. In which cases could you be certain that an error has been committed?
(hint: R2Y.234 denotes the coefficient of multiple determination between Y and the set of
independent variables X2, X3 and X4, and R2Y.2345 denotes the coefficient of multiple
determination between Y and the set of independent variables X2, X3, X4 and X5)
a) R2Y.234 = 0.89
R2Y.2345 = 0.86
b) Adjusted R2Y.234 = 0.86
Adjusted R2Y.2345 = 0.82
Answer: statement a) is wrong, because when a further
variable is introduced in the regression model the coefficient of determination cannot
decrease. Statement b) is possible, because the adjusted coefficient of determination
can decrease when a variable is added.
3) Suppose you calculate $\bar{X} = 12$, $\bar{Y} = 24$, sX = 2, sY = 4, and sXY = -12. How do you know
you must have made a mistake in calculating these statistics?
Answer: if we compute the correlation coefficient the result is:
rxy = sXY / (sX * sY) = -12 / (2 * 4) = -1.5
This is impossible because rxy must lie between -1 and +1.
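The same consistency check can be written as a small Python function. The names `correlation` and `is_valid` are illustrative.

```python
def correlation(s_xy, s_x, s_y):
    """Correlation coefficient from the covariance and the two standard deviations."""
    return s_xy / (s_x * s_y)


r_xy = correlation(s_xy=-12, s_x=2, s_y=4)

# A correlation coefficient must lie in the interval [-1, +1].
is_valid = -1.0 <= r_xy <= 1.0
print(r_xy, is_valid)
```

Any covariance larger in absolute value than sX * sY = 8 would fail this check.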