Uploaded by Abdullah Al Haddabi

Past Practice Exam

This PDF contains exactly the same questions as the Mock eExam and is provided
to give you easy access to the practice and past exam questions.
Please also use the Mock eExam. The sole purpose of the Mock eExam is to
familiarise you with the eExam format. Mock eExam link:
https://student-eassessment.monash.edu/course/view.php?id=4262
Past and Practice Exam Questions
ETF2121-ETF5912 Data Analysis in Business
Semester 2, 2021
The formulae sheet and statistical tables are available on Moodle.
Multiple Choice Questions
Question 1
A Type I error occurs when we:
(a) reject a false null hypothesis.
(b) reject a true null hypothesis.
(c) do not reject a false null hypothesis.
(d) do not reject a true null hypothesis.
(e) none of the above.
Question 2
If an estimated regression line has a y-intercept of 10 and a slope of 4, then when
x = 2, the actual value of y is:
(a) 18
(b) 15
(c) 14
(d) 13
(e) unknown
Question 3
In the estimated probit model:
$\widehat{\Pr}(Y = 1 \mid X) = \Phi(\hat{\beta}_0 + \hat{\beta}_1 X)$
if $\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 2$ and $X = 2$, then the predicted probability that $Y = 1$ is:
(a) 0.5
(b) 0.9972
(c) 0.4772
(d) 0.0228
(e) none of the above
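In a probit model the predicted probability is the standard normal CDF evaluated at the fitted index. The following is a minimal Python sketch of that calculation using the values stated in Question 3; the variable names are illustrative and not part of the exam.

```python
from statistics import NormalDist

# Values taken from the question text; names are illustrative only.
beta0_hat = -6.0
beta1_hat = 2.0
x = 2.0

# Fitted index: beta0_hat + beta1_hat * x
index = beta0_hat + beta1_hat * x

# Probit link: standard normal CDF evaluated at the index
prob = NormalDist().cdf(index)
print(f"index = {index}, predicted probability = {prob:.4f}")
```

The same value can be read from the standard normal table on the formulae sheet by looking up the index.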
Question 4
The residual is defined as the difference between:
(a) the actual value of y and the predicted value of y.
(b) the actual value of x and the predicted value of x.
(c) the actual value of y and the predicted value of x.
(d) the actual value of x and the predicted value of y.
(e) none of the above.
Question 5
In a sign test, the following information is given: number of zero differences = 3,
number of positive differences = 20, and number of negative differences = 5. The value
of the test statistic is:
(a) 5
(b) 4
(c) 3
(d) 2
(e) none of the above
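One common textbook form of the sign test statistic is the large-sample normal approximation z = (x - 0.5n) / (0.5 * sqrt(n)), where x is the number of positive differences and n is the number of non-zero differences. The sketch below assumes that form; the course formulae sheet may state the statistic differently, so treat this as an illustration only.

```python
import math

# Counts from the question text
n_zero = 3        # zero differences are discarded
n_positive = 20
n_negative = 5

# Effective sample size: non-zero differences only
n = n_positive + n_negative

# Assumed form: normal approximation to the binomial with p = 0.5
z = (n_positive - 0.5 * n) / (0.5 * math.sqrt(n))
print(z)
```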
Question 6
A non-parametric method to compare two populations, when the samples are
independent and the data are ranked, is the:
(a) Wilcoxon signed rank sum test
(b) Sign test
(c) Wilcoxon rank sum test
(d) Matched pairs t-test
(e) none of the above
Question 7
Suppose you want to conduct a survey of small business owners in Victoria. The state
Directory of Businesses gives a list of 1000 registered business owners, and hence the
size of the population is 1000. You want to obtain a sample size of 30 business owners.
There are two sampling techniques that can be used. Indicate which of these sampling
techniques are described below.
(7.a) Group the businesses according to 5 business types (retailers, agriculture,
manufacturing, financial services and advertising) and then randomly select a sample
of 6 business owners from each business type.
(7.b) Assign a number to each registered business in the state Directory of Businesses,
and then use a random number generator to select the business owners to be included
in the sample of size 30.
Question 8
When conducting an analysis of the equality of means of more than two populations (k
> 2), what is the major advantage of conducting a parametric F-statistic (ANOVA)
instead of doing multiple t-tests of each population mean against all the other population
means separately? Briefly explain your answer.
Question 9
A regression analysis was performed to study the relationship between a dependent
variable and five independent variables. The following information was obtained from
the regression analysis:
SSE = 2400, SSR = 9600, n = 40
Determine the F-statistic.
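The overall F-statistic in multiple regression is the ratio of the mean square for regression to the mean square for error. The sketch below assumes the first quantity in Question 9 is SSE (it cannot be SST, because SSR would then exceed the total), and that k = 5 independent variables as stated; the variable names are illustrative.

```python
# Quantities from the question (first value read as SSE; see the assumption above)
sse = 2400.0
ssr = 9600.0
n = 40
k = 5  # number of independent variables

msr = ssr / k              # mean square for regression, df = k
mse = sse / (n - k - 1)    # mean square for error, df = n - k - 1
f_stat = msr / mse
print(f"MSR = {msr:.2f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
```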
Question 10
In a simple regression model, when every observation is on the regression line, the sum
of squares for error SSE = 0, the standard error of estimate $s_\varepsilon = 0$, and the coefficient
of determination $R^2 = 1$. True or false? Give a brief explanation.
Question 11
Specify the test statistic and the decision rule for each of the following Wilcoxon Rank
Sum tests.
(11.a)
H0 : The two population locations are the same
HA : The location of population A is to the left of the location of population B
nA = 4
nB = 6
$\alpha = 0.025$
(11.b)
H0 : The two population locations are the same
HA : The location of population A is different from the location of population B
nA = 20
nB = 25
$\alpha = 0.05$
Question 12
What is one important potential benefit of using a matched pairs experiment to test the
difference between two populations rather than independent samples? Briefly explain
your answer with an example.
Question 13
A Gallup Organisation poll of randomly selected American adults in July 2002 found
that 55% of those surveyed felt that their weight was about right. The sampling error
for the survey was given as 3%.
(13.a) Find a 95% confidence interval estimate of the percentage of American adults
who think their weight is about right.
(13.b) Based on the interval computed in part (13.a), explain whether it is reasonable to
say that more than 50% of American adults think their weight is about right.
Question 14
An importer is considering whether to import and sell a new hair care product here in
Australia. To begin with, he wants to try to sell the product in one large metropolitan
market. He wants to choose the market where people spend the most on hair care
products currently. To find this out, he randomly samples 210 adults who live in
Melbourne and another 250 adults who live in Sydney and asks each one about their
total spending in dollars on hair care products over the past year. The unknown
population variances are assumed to be equal.
Using this information, answer the following questions.
Begin this question 14 on a new page. Clearly label the question number on the
new page.
(14.a) What is the name of the test statistic that you may be able to use to test whether
Melbourne or Sydney people spend more, on average, on hair care products using the
sample data? Briefly explain why this test statistic may be appropriate.
(14.b) If you had all the data on total spending on hair care products for each individual
in the two samples, what could you do to determine what the distribution of the data is?
Explain your answer.
(14.c) Write down the appropriate null and alternative hypotheses for the test statistic
in part (14.a) if you want to test whether spending on hair care products is higher in
Sydney (population 2 or B) than in Melbourne (population 1 or A).
(14.d) What are the steps involved in constructing the test statistic in part (14.a)?
Use the following information to answer Questions 15 and 16.
The government is concerned about the rate of smoking in the population. It is
considering whether to raise the tax on cigarettes to reduce the number of people
smoking. To see if such a policy will have an effect, the government undertakes a study
of the determinants of smoking in the community. The government's chief statistical
analyst conducts a random survey of 807 adults in the country's population (from all
states) and collects the following variables.
EDUC     = years of schooling of the person
CIGPRIC  = average price of cigarettes including taxes, in dollars
           per pack, in the state where the person lives
AGE      = age of the person in years
RESTAURN = 1 if the state where the person lives has government-imposed
             restaurant smoking restrictions
           0 otherwise
Using this information, the analyst constructed a binary variable SMOKE of whether
the individual smoked or not; that is
SMOKE    = 1 if an individual smokes
           0 otherwise
The analyst wanted to identify the main determinants of the probability of an individual
person smoking using this information. To do this, the analyst estimated the Logit
model. The results of this estimation, using EViews, are provided below.
Dependent Variable: SMOKE
Method: ML - Binary Logit (Quadratic hill climbing / EViews legacy)
Sample: 1 807
Included observations: 807
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives
Variable      Coefficient   Std. Error   z-Statistic   Prob.
C                1.656162     1.001650      1.653434   0.0982
AGE             -0.016164     0.004514     -3.581026   0.0003
EDUC            -0.111039     0.026820     -4.140104   0.0000
CIGPRIC         -0.003519     0.015581     -0.225849   0.8213
RESTAURN        -0.465569     0.180173     -2.584019   0.0098

Mean dependent var       0.384139    S.D. dependent var       0.486693
S.E. of regression       0.478753    Akaike info criterion    1.305724
Sum squared resid        183.5927    Schwarz criterion        1.340619
Log likelihood          -520.8597    Hannan-Quinn criter.     1.319124
Restr. log likelihood   -537.5055    Avg. log likelihood     -0.645427
LR statistic             33.29162
Obs with Dep=0           497         Total obs                807
Obs with Dep=1           310
Question 15
Begin this question 15 on a new page. Clearly label the question number on the
new page.
(15.a) Write down the estimated logit model. Report the results to 3 decimal places.
(15.b) Interpret the sign of the estimated coefficients on the AGE and EDUC variables.
(15.c) Interpret the sign of the estimated coefficient on CIGPRIC. Test if the coefficient
on the CIGPRIC variable is less than zero at $\alpha = 0.05$. Do the six steps of the test. What
do you conclude from this test about the potential effectiveness of a policy of increasing
the tax on cigarettes to reduce smoking?
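The numerical core of a one-sided test on a single logit coefficient is a z-statistic compared with a lower-tail standard normal critical value. The sketch below assumes the hypotheses H0: beta = 0 against HA: beta < 0 and uses the CIGPRIC coefficient and standard error from the EViews output; it covers only the arithmetic, not the full six-step write-up, and the variable names are illustrative.

```python
from statistics import NormalDist

# CIGPRIC estimates from the EViews output
coef = -0.003519
std_err = 0.015581
alpha = 0.05

# z-statistic under H0: beta = 0
z_stat = coef / std_err

# Lower-tail critical value for a one-sided test (HA: beta < 0)
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z_stat < z_crit
print(f"z = {z_stat:.4f}, critical value = {z_crit:.4f}, reject H0: {reject_h0}")
```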
Question 16
Begin this question 16 on a new page. Clearly label the question number on the
new page.
(16.a) Suppose Ms. A has the following characteristics:
EDUC    = 13 years
CIGPRIC = $60
AGE     = 40 years old
Using the estimated logit model, and assuming that the state where Ms. A lives has
government-imposed restaurant smoking restrictions, calculate the probability that Ms.
A smokes.
(16.b) Suppose Ms. B has the following characteristics:
EDUC    = 13 years
CIGPRIC = $60
AGE     = 40 years old
Using the estimated logit model, and assuming that the state where Ms. B lives has
no government-imposed restaurant smoking restrictions, calculate the probability that
Ms. B smokes.
(16.c) Based on the answers to (16.a) and (16.b), do you think that by restricting
smoking in restaurants we can reduce the probability of smoking?
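In a logit model the predicted probability is the logistic function 1 / (1 + exp(-index)) applied to the fitted linear index. The sketch below applies it to the two profiles in Question 16 using the unrounded EViews coefficients; an answer built from coefficients rounded to 3 decimal places, as in (15.a), may differ slightly in the later digits. The helper name `smoke_probability` is hypothetical.

```python
import math

# Coefficients copied from the EViews logit output (unrounded)
COEF = {
    "C": 1.656162,
    "AGE": -0.016164,
    "EDUC": -0.111039,
    "CIGPRIC": -0.003519,
    "RESTAURN": -0.465569,
}

def smoke_probability(age, educ, cigpric, restaurn):
    """Hypothetical helper: logistic function of the fitted linear index."""
    index = (COEF["C"] + COEF["AGE"] * age + COEF["EDUC"] * educ
             + COEF["CIGPRIC"] * cigpric + COEF["RESTAURN"] * restaurn)
    return 1.0 / (1.0 + math.exp(-index))

# Ms. A: restaurant smoking restrictions in place (RESTAURN = 1)
p_a = smoke_probability(age=40, educ=13, cigpric=60, restaurn=1)
# Ms. B: no restrictions (RESTAURN = 0)
p_b = smoke_probability(age=40, educ=13, cigpric=60, restaurn=0)

print(f"P(smoke | Ms. A) = {p_a:.4f}")
print(f"P(smoke | Ms. B) = {p_b:.4f}")
```

Because the two profiles differ only in RESTAURN, the gap between the two probabilities isolates the estimated effect of the restriction for this particular individual.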
Question 17
The quarterly household spending on clothing, denoted $y_t$ (in millions of dollars), for
the first quarter of 1975 to the first quarter of 2005 is depicted in the line graph below.
An excerpt of these data is shown in the following table.
Year   y_t    t     Q1   Q2   Q3   Q4
1975   4818   1     1    0    0    0
       4800   2     0    1    0    0
       4866   3     0    0    1    0
       5139   4     0    0    0    1
1976   4810   5     1    0    0    0
..     ..     ..    ..   ..   ..   ..
..     ..     ..    ..   ..   ..   ..
..     ..     ..    ..   ..   ..   ..
2005   9138   121   1    0    0    0
Note:
(i) The time variable t equals 1 in the first quarter of 1975;
(ii) Q1, Q2, Q3 and Q4 are the dummy variables defined, respectively, as
Q1 = 1 if 1st quarter (January to March)
0 otherwise
Q2 = 1 if 2nd quarter (April to June)
0 otherwise
Q3 = 1 if 3rd quarter (July to September)
0 otherwise
Q4 = 1 if 4th quarter (October to December)
0 otherwise
Begin this question 17 on a new page. Clearly label the question number on the
new page.
(17.a) Consider the following estimated linear trend model with dummy variables.
Dependent Variable: Y
Method: Least Squares
Sample: 1975Q1 2005Q1
Included observations: 121
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                4713.070     114.3054      41.23227   0.0000
T                37.09995     1.221092      30.38259   0.0000
Q1              -587.1990     120.1375     -4.887724   0.0000
Q2              -582.2334     121.1366     -4.806420   0.0000
Q3              -364.4667     121.1181     -3.009183   0.0032

R-squared                0.892263    Mean dependent var       6591.008
Adjusted R-squared       0.888548    S.D. dependent var       1405.043
S.E. of regression       469.0647    Akaike info criterion    15.17980
Sum squared resid        25522519    Schwarz criterion        15.29533
Log likelihood          -913.3781    Hannan-Quinn criter.     15.22672
F-statistic              240.1751    Durbin-Watson stat       0.122454
Prob(F-statistic)        0.000000
Construct a forecast value of $y_t$ for the second quarter of 2005 based on the estimated
linear trend model with dummy variables. Show your workings.
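A forecast from a linear trend model with seasonal dummies is obtained by substituting the future time index and the relevant dummy values into the fitted equation. The sketch below assumes that the second quarter of 2005 corresponds to t = 122 (one period after t = 121 in the first quarter of 2005) with Q2 = 1 and all other dummies zero; the variable names are illustrative.

```python
# Coefficients copied from the EViews least squares output
intercept = 4713.070
b_trend = 37.09995
b_q1 = -587.1990
b_q2 = -582.2334
b_q3 = -364.4667

# Assumed regressor values for 2005 Q2: t = 122, Q2 = 1, other dummies 0
t = 122
q1, q2, q3 = 0, 1, 0

forecast = intercept + b_trend * t + b_q1 * q1 + b_q2 * q2 + b_q3 * q3
print(f"Forecast for 2005 Q2: {forecast:.2f} (millions of dollars)")
```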
(17.b) Consider the estimated AR(1) model for $\Delta y_t$, which is the change in $y_t$:
$\widehat{\Delta y}_t = 48.04 - 0.26\,\Delta y_{t-1}$
where $\Delta y_t = y_t - y_{t-1}$. Construct a forecast value of $y_t$ for the second quarter of 2005
based on the estimated AR(1) model. If you think there is not enough information given
to answer this question, you may write “this question cannot be answered on the basis
of the information given” as your answer and write down what information you need
to answer this question.
Question 18
A real estate agent thinks that the street number is one of the factors that determine the
sale price of houses in the area. Her theory stems from her observation that many
individuals who live in the area believe that street numbers containing an “8” will bring
good luck, while those containing a “4” will bring bad luck, to those people living in
the house. In order to investigate this matter, the agent randomly selected 28 recently
sold houses in the area and collected the following information:
PRICE – house sale price in thousands of dollars
LAND_SIZE – size of the block of land on which the house is built in metres squared
BEDROOMS – number of bedrooms in the house
NUMBER – the street number of the house
She then constructed two dummy variables as follows:
SN4 = 1 if the house's street number contains the number “4”
    = 0 otherwise
SN8 = 1 if the house's street number contains the number “8”
    = 0 otherwise
Provided below are the results of an Ordinary Least Squares regression model.
Dependent Variable: PRICE
Method: Least Squares
Sample: 1 28
Included observations: 28
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C               -155.6170     42.43146     -3.667491   0.0013
LAND_SIZE        0.111849     0.043058      2.597625   0.0161
BEDROOMS         86.38813     6.651671      12.98743   0.0000
SN4             -50.48589     18.44864     -2.736564   0.0118
SN8              38.97373     25.89519      1.505057   0.1459

R-squared                0.904364    Mean dependent var       334.8214
Adjusted R-squared       0.887732    S.D. dependent var       122.6570
S.E. of regression       41.09795    Akaike info criterion    10.43023
Sum squared resid        38847.96    Schwarz criterion        10.66812
Log likelihood          -141.0232    Hannan-Quinn criter.     10.50295
F-statistic              54.37405    Durbin-Watson stat       1.767468
Prob(F-statistic)        0.000000
Begin this question 18 on a new page. Clearly label the question number on the
new page.
(18.a) Interpret the estimated coefficient on LAND_SIZE.
(18.b) Interpret the estimated coefficient on SN4. Is the sign of the coefficient what you
would expect given the agent's theory? Briefly explain why or why not.
(18.c) Test whether the sales agent's theory about the sale price of houses with street
numbers containing the number “8” is supported by the data at $\alpha = 0.05$.