Uploaded by Aria Mahmoud

2011MT-1

March 2011
Midterm Examination
Advanced Business Statistics
MGSC 272
March 11, 9:00 – 11:00 a.m.
Examiner:
Prof. Brian Smith
Student Name:
McGill ID:
INSTRUCTIONS:

This is a CLOSED BOOK examination.

Only one hand-written or typed double-sided CRIB SHEET is permitted.

SPACE IS PROVIDED on the examination to answer all questions.

You are permitted translation or regular dictionaries.

Regular calculators are permitted. However, calculators that can store text are not permitted.

The marks for each question appear next to the question number.

The exam consists of a total of 23 pages, including this cover page and 2 pages for rough work at the
end of the exam.

This examination paper MUST BE RETURNED
For Marker Only
Question      Grade
Part 1        /30
Part 2, Q1    /18
Part 2, Q2    /12
Part 2, Q3    /25
Part 2, Q4    /15
Part 1 consists of 15 questions worth 2 marks each for a total of 30 marks.
Part 2 consists of four questions for a total of 70 marks.
MGSC 272
Page 1/23
Part 1 (30 marks)
1. For a multiple regression model, when testing a hypothesis for a regression coefficient, consider the
following statements:
I. If the model contains multicollinearity, a t-test is more likely to erroneously conclude H1.
II. If the model contains no multicollinearity, the F-test and t-tests are equally effective.
III. The test statistic is calculated by dividing the coefficient by its standard error.
Which of the above statements are true:
A. I and II only
B. II and III only
C. I only
D. III only
E. All of the above statements are true.
Answer: B
The following information pertains to the next two questions:
Below you are given a partial Minitab output based on a sample of 20 observations.
            Coefficient   Standard Error
Constant    145.321       48.682
X1           25.625        9.150
X2           -5.720        3.575
X3            0.823        0.183
X4            1.238        0.421

2. Refer to the information above. At the 1% level of significance, which of the following statements
is true?
A. Variables X1 and X2 are significant, but variable X3 is not significant.
B. Variables X1 and X3 are significant, but variable X2 is not significant.
C. Only variable X1 is significant.
D. Only variable X3 is significant.
E. None of the variables are significant.
Answer: t.005 = 2.947 => D is the correct answer.
3. A 99% confidence interval estimate for β2 is:
A. -15.02 to 3.58
B. -16.26 to 4.82
C. -16.16 to 4.72
D. -14.04 to 2.60
E. None of the above.
Answer: B
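As a check on this answer, the interval is the coefficient plus or minus the critical value times the standard error. The sketch below uses the X2 row of the output and the critical value t.005;15 = 2.947 from the previous question; the function name is hypothetical.

```python
def confidence_interval(coef, se, t_crit):
    """Two-sided interval: coef +/- t_crit * se."""
    margin = t_crit * se
    return coef - margin, coef + margin


# X2 row of the output: coefficient -5.720, standard error 3.575.
low, high = confidence_interval(-5.720, 3.575, 2.947)
print(f"{low:.2f} to {high:.2f}")
```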
A multiple regression model relating Y (dependent variable) and three independent variables X1,
X2 and X3 is based on a sample of 40 observations.
4. The error degrees of freedom is equal to ______.
Answer: n – (k + 1) = 40 – (3 + 1) = 36
5. In question 4, SSR = 1200, SSE = 880. Compute the adjusted R2 (R2a).
Answer: $R^2_a = 1 - \frac{40 - 1}{40 - (3 + 1)} \cdot \frac{880}{2080} = 1 - .4583 = .5417$
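A minimal Python sketch of the same calculation, assuming SSTO = SSR + SSE; the function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(sse, ssto, n, k):
    """Adjusted R-squared: 1 - [(n - 1) / (n - (k + 1))] * (SSE / SSTO)."""
    return 1 - ((n - 1) / (n - (k + 1))) * (sse / ssto)


ssr, sse = 1200, 880          # values given in question 5
ssto = ssr + sse              # total sum of squares = 2080
r2_adj = adjusted_r_squared(sse, ssto, n=40, k=3)
print(f"Adjusted R-squared = {r2_adj:.4f}")
```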
6. Consider the following statements regarding maximum likelihood estimates (MLE).
I. MLEs are sometimes biased.
II. MLEs are preferred to least squares estimates when the error terms in a regression model are
normally distributed and exhibit homoscedasticity.
III. MLEs use more information than least squares estimates about the parameters being
estimated.
Which of the above statements are true:
A. I only
B. I and III only
C. II and III only
D. III only
E. All of the above statements are true
Ans: B
7. A marketing consultant wants to find a maximum likelihood estimate for the probability that a
consumer will recommend a new product. A random sample of eight consumers is selected and each
consumer may give one of two possible replies, “Recommend” or “Don’t Recommend”. This eight
trial experiment is repeated 12 times, and a total of 42 consumers reply “Recommend”. The
maximum likelihood estimate for the probability that a consumer recommends the product is:
_________________
$\text{MLE} = \frac{\sum x}{nk} = \frac{42}{(8)(12)} = .4375$
Ans: 0.4375
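The closed-form estimate can be compared with a brute-force maximization of the binomial log-likelihood. This is only an illustrative sketch: the grid step of 0.0001 and the function name are assumptions, and the binomial coefficients are dropped because they do not depend on p.

```python
import math


def log_likelihood(p, successes, trials):
    """Binomial log-likelihood in p, omitting terms that are constant in p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


successes = 42
trials = 8 * 12               # 12 repetitions of an 8-trial experiment

closed_form = successes / trials

# Grid search over p in (0, 1) as a numerical cross-check.
grid = [i / 10000 for i in range(1, 10000)]
grid_max = max(grid, key=lambda p: log_likelihood(p, successes, trials))

print(closed_form, grid_max)
```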
8. A statistician wants to estimate the mean value of a normal distribution with a standard deviation of 5.
Based on a random sample of three values, Y1 = 27, Y2 = 42, and Y3 = 36, she has performed the
following analysis in Excel. The values in cells C6 – E7 are obtained using the normal distribution
function with σ = 8 and the specified value of μ.
Which of the following statements are true?
I. The value of the likelihood function at μ = 32 is closer to the maximum likelihood estimate than the
value of the likelihood function at μ = 36.
II. The maximum likelihood estimate has to be some value between 27 and 36.
III. The likelihood function for this example is obtained by multiplying the cumulative probabilities
associated with the three sample values of the normal distributions for the specified values of μ.
IV. The graph of the likelihood function will have a maximum value for a unique value of μ between 27
and 42.
V. More than one of the statements is true.
Answer: D
9. An economist is studying a lognormal distribution X. The variable ln(X) has a normal distribution with
a mean of 3 and a standard deviation of 1.5. The lognormal distribution density function for variable X
is given by:
f(x) = _________________.
Answer: $f(x) = \frac{1}{\sqrt{2\pi}\, x\, (1.5)}\, e^{-\frac{1}{2}\left(\frac{\ln x - 3}{1.5}\right)^2}$
10. An initial stock price is $20. The expected return is 12% per annum, and the volatility is estimated to
be 8% per annum. A 95% confidence interval for the value of the stock after 9 months is:
$\text{Mean of } \ln(\text{Price}) = \ln(20) + \left(.12 - \frac{.08^2}{2}\right)(.75) = 3.0833$
$\text{Std. Dev. of } \ln(\text{Price}) = \sigma\sqrt{T} = 0.08\sqrt{.75} = 0.06928$
$3.0833 \pm 1.96(.06928) = 3.0833 \pm .1358$
$2.9475 \le \ln S_T \le 3.2191$
$\$19.06 \le S_T \le \$25.01$
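The steps of this answer can be reproduced numerically. The sketch below assumes the usual lognormal price model, in which ln(S_T) is normal with mean ln(S0) + (μ − σ²/2)T and standard deviation σ√T; the function name and the z value of 1.96 as a default argument are illustrative choices.

```python
import math


def lognormal_price_interval(s0, mu, sigma, years, z=1.96):
    """Return (mean of ln S_T, sd of ln S_T, lower price, upper price)."""
    mean_log = math.log(s0) + (mu - sigma ** 2 / 2) * years
    sd_log = sigma * math.sqrt(years)
    lower = math.exp(mean_log - z * sd_log)
    upper = math.exp(mean_log + z * sd_log)
    return mean_log, sd_log, lower, upper


# $20 initial price, 12% expected return, 8% volatility, 9 months = 0.75 years.
mean_log, sd_log, lower, upper = lognormal_price_interval(20, 0.12, 0.08, 0.75)
print(f"mean = {mean_log:.4f}, sd = {sd_log:.5f}")
print(f"${lower:.2f} <= S_T <= ${upper:.2f}")
```

Exponentiating the log-scale limits 2.9475 and 3.2191 gives roughly $19.06 and $25.01.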
11. For a multiple regression model, consider the following statements:
I. One use of transformations in regression is to transform what appears to be a nonlinear model into
a linear model.
II. A nested F-test may be used to measure the significance of the coefficient of partial determination
when a new variable is added to a model.
III. Significant interaction between two variables implies that as the value of one of the variables
increases the rate of change of the Y value with respect to the other variable also increases.
Which of the above statements are true:
A. I and II only
B. II and III only
C. II only
D. III only
E. All of the above statements are true.
Ans: A
The following information pertains to the next three questions:
The CEO of a chain of 38 sporting goods stores wishes to determine a relationship between Monthly Sales
(Y) and the demographic variables Age (X1= Median Age of customer base), Income (X2 = Median
Income), and HS (X3 = percentage of customer base with a high school diploma). A partial computer output
is shown below.
Regression Analysis: Sales versus Age, Income, HS
The regression equation is
Sales = - 2063712 - 30100 Age + 0.8 Income + 60212 HS
Predictor   Coef        SE Coef    T       P
Constant    -2063712    3445731    -0.60   0.553
Age         -30100      87328      -0.34   0.732
Income      0.82        27.89      0.03    0.977
HS          60212       31223      1.93    0.062

S = 823650   R-Sq = 24.3%   R-Sq(adj) = 17.7%

12. Find the value of R2 for this model.
$R^2 = 1 - \frac{n - (k + 1)}{n - 1}\,(1 - R^2_{adj}) = 1 - \frac{38 - 4}{38 - 1}\,(1 - .177) = 0.2437$
13. One store in the sample was located in a suburb that had a median age of 36 years, a median income
of $48,000, and 80% of the population had a high school diploma. This store recorded sales of $1.5
million. What is the residual for this store?
Sales = -2063712 -30100(36) + 0.8(48000) +60212(80) = 1708048
Residual = 1,500,000 - 1,708,048 = - $208,048
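A short Python sketch of the residual calculation, using the rounded coefficients from the printed regression equation (0.8 for Income, as in the answer above). The function name `predicted_sales` is illustrative.

```python
def predicted_sales(age, income, hs):
    """Fitted value from the printed equation: Sales = -2063712 - 30100 Age + 0.8 Income + 60212 HS."""
    return -2063712 - 30100 * age + 0.8 * income + 60212 * hs


observed = 1_500_000
fitted = predicted_sales(age=36, income=48000, hs=80)
residual = observed - fitted   # residual = observed - fitted

print(f"Fitted = {fitted:,.0f}, residual = {residual:,.0f}")
```

A negative residual means the store sold less than the model predicts for its demographics.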
14. For a logistic regression model, consider the following statements:
I. The logit function represents the natural logarithm of the probability that a binary variable will
assume the value 1.
II. In a logistic regression the dependent variable assesses the likelihood that a particular outcome will
occur.
III. In logistic regression models, the test statistics for goodness of fit tests follow a Wald distribution.
Which of the above statements are true?
A. I and II only
B. II and III only
C. II only
D. III only
E. All of the above statements are true.
Answer: C
15. Consider the following statements:
I. Mallows' Cp is a good criterion for selecting the variables in a multiple regression model because
it simultaneously increases the adjusted R2 while reducing the number of variables in the model.
II. Stepwise regression generally selects a model that reduces multicollinearity.
III. The variance inflation factor is a measure of how strongly the Y variable is correlated with all of
the X variables combined.
Which of the above statements are true?
A. I only
B. I and II only
C. II and III only
D. III only
E. All of the above statements are true.
Ans: B
PART 2 (70 marks)
QUESTION 1 (16 marks)
The manager of a large supermarket chain would like to determine the effect of shelf space, and whether the
product was placed at the front or back of the aisle, on the sales of pet food. A random sample of 12 equal
sized stores is selected with the following results:
The variable Space represents shelf space measured in meters, and the variable Place = 1 if the item is
placed in the front of the aisle and 0 if it is placed in the back. The dependent variable Sales represents
weekly sales in thousands of dollars. For example the first observation involves sales of $1,600 when 5 m of
space is allocated and the pet food is placed at the back of the aisle.
The following output has been obtained from Minitab:
(a) Interpret the coefficient of Space in the first model above, Sales vs Space. The store manager
currently devotes 15 meters of shelf space to pet food and is thinking about doubling the space. As
the manager’s statistical consultant, you are asked to express your opinion on this decision in the
space below. Show calculation to justify your conclusion. [4 points]
Write your opinion here
Coeff of Space = .074. This means that when Space is increased by 1 unit (1 meter), Sales
increases, on average, by $74.
If Space = 15 then Sales = 2.56 ($2,560)
If Space = 30 then Sales = 3.67 ($3,670)
WARNING: Extrapolation problem, since Space = 30 is outside of the range of observed
data of 5 to 20 meters.
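The two fitted values quoted above differ by exactly 15 meters times the slope, which the sketch below makes explicit. It uses only numbers from the answer (slope .074, fitted sales of 2.56 at 15 meters, observed range 5 to 20 meters); the intercept is never needed, and the function names are hypothetical.

```python
SLOPE = 0.074             # coefficient of Space (thousands of dollars per meter)
SALES_AT_15 = 2.56        # fitted sales at Space = 15, from the answer
OBSERVED_RANGE = (5, 20)  # range of Space in the sample, from the answer


def predicted_sales(space):
    """Fitted sales, obtained by moving along the line from the point at 15 m."""
    return SALES_AT_15 + SLOPE * (space - 15)


def is_extrapolation(space):
    """True when the requested Space lies outside the observed data range."""
    low, high = OBSERVED_RANGE
    return not (low <= space <= high)


print(f"Sales at 30 m: {predicted_sales(30):.2f} (extrapolation: {is_extrapolation(30)})")
```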
(b) Interpret the coefficient of Place in the second model above, Sales vs Place. Based on this model
would you conclude that, at the 5% level of significance, sales value is significantly higher when the
pet food is placed at the front of the aisle? Show your work. [4 points]
The coefficient of Place is 0.45. This means that, on average, when Place = 1 (front of aisle) sales are
higher by 1000 × 0.45 = $450.
H0: β1 ≤ 0
H1: β1 > 0
TS: b1 = .45 => t = .45/.1305 = 3.45
CV: t0.05;10 = 1.812
Conclusion: Since 3.45 > 1.812, reject H0 => Sales are higher at the front of the aisle.
(c) Consider the regression model of Sales vs Space and Place. [4 points]
(i) For this model, estimate R2.
s = .213177 => s2 = .0454 = MSE
SSE = [n-(k+1)]MSE = [9](.0454) = 0.409
R2 = 1 – SSE/SSTO = 1 - .409/3.0025 = 1 - .1362 = 0.8638 => 86.4%
(ii) Discuss the question of multicollinearity in the model. [2 points]
No multicollinearity, because the coefficient of Place does not change when Space is added to the
model.
(iii) Construct a 98% confidence interval for the coefficient of Space in this model. Interpret the
confidence interval. [2 points]
b1 ± t.01;9 sb1 = .074 ± (2.821)(.01101) = .074 ± .031 => .043 ≤ β1 ≤ .105
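Parts (i) and (iii) can be checked with a few lines of Python. The inputs (s = .213177, SSTO = 3.0025, n = 12, k = 2, b1 = .074, standard error .01101, t.01;9 = 2.821) are taken from the answer; the variable names are illustrative.

```python
n, k = 12, 2                 # 12 stores, two predictors (Space and Place)
s = 0.213177                 # standard error of the estimate from the output
ssto = 3.0025                # total sum of squares used in the answer

mse = s ** 2                          # MSE = s^2
sse = (n - (k + 1)) * mse             # SSE = error df * MSE
r_squared = 1 - sse / ssto

b1, se_b1, t_crit = 0.074, 0.01101, 2.821   # t.01 with 9 df
margin = t_crit * se_b1
ci_low, ci_high = b1 - margin, b1 + margin

print(f"SSE = {sse:.3f}, R2 = {r_squared:.4f}")
print(f"98% CI for the Space coefficient: {ci_low:.3f} to {ci_high:.3f}")
```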
QUESTION 2 (20 marks)
The Minister of Education in a province wants to investigate factors affecting the percentage of students
who pass a reading proficiency exam in Grade 10. She has conducted a study of 47 school districts and has
collected the following data:
%Passing (Y): Percentage of Students passing the proficiency exam
%Attendance (X1): Daily average of percentage of students attending class
Salary (X2): Average teacher salary (dollars)
Spending (X3): Instructional spending per student (dollars)
An extract of the data is shown below:
Some relevant computer output is shown below:
Answer the following questions:
a) Which variables are most likely to cause a multicollinearity problem? Justify your answer. [2 points]
Salary and Spending are highly correlated, so including both will cause multicollinearity.
b) Determine the coefficient of partial determination when Spending is added to the model containing
only %Attendance. Interpret this value. [2 points]
R2y2.1 = (5035.9 – 4702)/5035.9 = .0663 => 6.63%
Adding Spending explains an extra 6.63% of the variability unexplained by %Attendance.
c) Perform a nested F-test to determine whether it is worth adding Spending to the model containing
only %Attendance. [4 points]
Nested F-Test
Ho: β1 = 0
H1: β1 ≠ 0
TS: $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(5035.9 - 4702)/(2 - 1)}{4702/(47 - 3)} = 3.1245$
CV: F.05; 1,44 ≈ 4.08
Conclusion: Since 3.1245 < 4.08, do not reject Ho, i.e. it is not worth adding Spending to the model containing %Attendance.
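The coefficient of partial determination from part (b) and the nested F statistic from part (c) use the same two error sums of squares, so they can be computed together. This is a sketch with the values from the answers (SSE of 5035.9 for the reduced model, 4702 for the complete model, n = 47, k = 2, g = 1); the function names are hypothetical.

```python
def partial_determination(sse_reduced, sse_complete):
    """Share of the reduced model's unexplained variation removed by the added variable."""
    return (sse_reduced - sse_complete) / sse_reduced


def nested_f(sse_reduced, sse_complete, n, k, g):
    """Nested F statistic: k predictors in the complete model, g in the reduced model."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


partial_r2 = partial_determination(5035.9, 4702)
f_stat = nested_f(5035.9, 4702, n=47, k=2, g=1)
print(f"Partial R2 = {partial_r2:.4f}, F = {f_stat:.4f}")
```

The statistic is then compared with the tabulated F critical value for 1 and 44 degrees of freedom.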
d) Consider the regression analysis of %Passing on %Attending and Spending. For this model,
calculate the standardized regression coefficients for %Attending and Spending. Interpret the results.
[4 points]
%Attending: $b^* = \sqrt{\frac{2.121}{275.37}}\,(8.5) = .7460$
Spending: $b^* = \sqrt{\frac{209705.1}{275.37}}\,(.00598) = .1650$
e) Consider the regression analysis of %Passing on %Attending and Spending. Perform the appropriate
t-tests and recommend whether or not to keep each of the variables in the model. [4 points]
CV: t.025;44 ≈ 1.96
%Attending: t = 8.501/1.065 = 7.982 > 1.96 => Reject Ho
Spending: t = .005984/.003385 = 1.768 < 1.96 => Do not reject Ho
Conclusion: keep %Attending, drop Spending
f) The regression model of %Passing on %Attendance has been expanded to include a quadratic term
(%Attend^2) with the following results:
You have been asked to recommend a model relating %Passing to %Attendance. Would you
recommend the linear model or the quadratic model? [1 point]
Quadratic.
If you choose the wrong model your estimate of the percentage of students who pass the proficiency
exam will be incorrect. Estimate the percentage error associated with the wrong choice. Will this be
an overestimate or an underestimate? [3 points]
%Error = (64.8 – 52.2)/52.2 = 0.24 => 24% error. Overestimate.
QUESTION 3 (14 marks)
An auto club rates 50 cars for mileage per gallon as a function of the cars’ horsepower and weight (in
pounds).
Regression analyses are run with and without an interaction term as shown below:
a) In the model without interaction, interpret the coefficients of the variables HP and Weight. [2 points]
For each extra unit of horsepower, MPG decreases by .118 miles per gallon on average, holding weight constant.
For each extra pound of weight, MPG decreases by .00687 miles per gallon on average, holding horsepower constant.
If car A weighs 2100 pounds and car B weighs 3300 pounds, estimate the difference in miles per
gallon obtained by the two cars, assuming both cars have the same horsepower. [2 points]
Weight = 2100: Yhat = 58.2 - .118HP - .00687(2100) = 58.2 - .118HP – 14.427
Weight = 3300: Yhat = 58.2 - .118HP - .00687(3300) = 58.2 - .118HP – 22.671
Difference = 22.671 – 14.427 = 8.244
Conclusion: Difference in MPG = 8.244 mpg, in favour of the lighter car A.
Estimate the mileage per gallon obtained by a car with a horsepower rating of 120 and a weight of
2500 pounds. [2 points]
MPG = 58.2 - .118(120) - .00687(2500) = 26.865
b) In the model with interaction, perform a test of hypothesis, showing the null and alternative
hypotheses, with α = .05 to determine if the interaction is significant. [2 points]
H0: β3 = 0
H1: β3 ≠ 0
TS: t = .00010047/.00002971 = 3.38
CV: t.025;46 ≈ 1.96
Conclusion: Since 3.38 > 1.96, reject Ho => The interaction is significant.
Interpret the interaction term in this model. [2 points]
The response of MPG to horsepower differs for cars of different weights.
The response of MPG to weight differs for cars of different horsepowers.
In this model, estimate the mileage per gallon obtained by a car with a horsepower rating of 120 and
a weight of 2500 pounds. [2 points]
MPG = 85.1 - .451(120) - .0152(2500) + .0001(120)(2500) = 22.98
c) Which of the two estimates obtained in parts (a) and (b) for cars with a horsepower rating of 120 and
a weight of 2500 pounds would you convey to a friend who has just bought such a car? Explain the
consequences of choosing the wrong model. [2 points]
The result of part (b) is more accurate because it takes interaction into account.
The wrong model would result in overestimating the mileage per gallon by 3.9 miles per gallon.
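The estimates from parts (a) and (b) and the gap between them can be reproduced as follows. The coefficients are the rounded values used in the answers above; the function names `mpg_additive` and `mpg_interaction` are illustrative.

```python
def mpg_additive(hp, weight):
    """Model without interaction: 58.2 - .118 HP - .00687 Weight."""
    return 58.2 - 0.118 * hp - 0.00687 * weight


def mpg_interaction(hp, weight):
    """Model with interaction: 85.1 - .451 HP - .0152 Weight + .0001 HP * Weight."""
    return 85.1 - 0.451 * hp - 0.0152 * weight + 0.0001 * hp * weight


estimate_a = mpg_additive(120, 2500)
estimate_b = mpg_interaction(120, 2500)
overestimate = estimate_a - estimate_b

# Part (a): same horsepower, weights of 2100 and 3300 pounds.
weight_gap = mpg_additive(100, 2100) - mpg_additive(100, 3300)

print(f"(a) {estimate_a:.3f} mpg, (b) {estimate_b:.2f} mpg, gap {overestimate:.3f} mpg")
print(f"Difference between the 2100 lb and 3300 lb cars: {weight_gap:.3f} mpg")
```

In the additive model the weight difference does not depend on the horsepower value chosen, which is why any common value (100 here, an arbitrary choice) gives the same gap.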
QUESTION 4 (20 marks)
A beer company executive is tracking sales of several brands of beer and is attempting to establish a
relationship between price and the independent variables alcohol content (expressed as a percentage),
country of origin, and type of beer.
Price is measured in dollars; country of origin is equal to 1 for domestic beers and 0 for imported beers.
Beer is classified by 5 types: Lager, Ale, Red, Stout, and Lite.
A sample of 62 beers is selected and the regression analysis of Price on %Alcohol Content and Country of
Origin is:
a) For this model interpret the coefficients of the two independent variables. [4 points]
A one percentage point increase in alcohol content increases price by $0.76 on average, assuming country of
origin is fixed; i.e., the increase applies equally to domestic and imported beers.
Price is $1.20 lower on average for domestic beers (Country of Origin = 1) than for imported beers,
assuming alcohol content is constant.
The next regression model shows Price versus Country and Type, where Type = 5 is the reference type.
Analysis of Variance
Source            DF    SS        MS       F       P
Regression         5    86.193    17.239   24.23   0.000
Residual Error    56    39.848     0.712
Total             61   126.042
b) Which type of beer is the most expensive? Which is the least expensive? [2 points]
Type 2 is the most expensive.
Type 4 is the least expensive.
Beers Type 3 and 4 are not significant in the model. Should we drop the variables Type_3 and
Type_4 from the model? Justify your answer. [1 points]
You cannot drop a subset of the dummy variables.
Calculate the value of the standard error of the estimate for this regression model. [3 points]
SSE = 39.848; df = 62 – (5 + 1) = 56 => MSE = 39.848/56 = .7116
S = √MSE = √(.7116) = 0.8436
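The same calculation in Python, using the Residual Error row of the ANOVA table (SS = 39.848 on 56 degrees of freedom); the variable names are illustrative.

```python
import math

n, k = 62, 5                 # 62 beers; Country plus four Type dummies
sse = 39.848                 # Residual Error sum of squares from the ANOVA table

error_df = n - (k + 1)
mse = sse / error_df
s = math.sqrt(mse)           # standard error of the estimate

print(f"df = {error_df}, MSE = {mse:.4f}, S = {s:.4f}")
```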
The following output shows the full regression model, involving all of the predictor variables.
c) A beer costs $4.85. It is a domestic beer that has 150 calories, and is classified as Type 1. What
is the expected alcohol content of this beer? [4 points]
4.85 = 6.08 +.0006(150) -.139AC - 1.59(1) +.958(1)
AC = (6.08 – 4.85 +.0006×150 -1.59 +.958)/.139 = .688/.139 = 4.95% alcohol
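Solving the fitted equation for alcohol content is a one-line rearrangement, sketched below with the coefficients as they appear in the answer (6.08, .0006, -.139, -1.59, .958). The function name and argument names are assumptions made for illustration.

```python
def alcohol_content(price, calories, domestic, type_1):
    """Solve price = 6.08 + .0006 Cal - .139 AC - 1.59 Domestic + .958 Type_1 for AC."""
    other_terms = 6.08 + 0.0006 * calories - 1.59 * domestic + 0.958 * type_1
    return (other_terms - price) / 0.139


ac = alcohol_content(price=4.85, calories=150, domestic=1, type_1=1)
print(f"Expected alcohol content = {ac:.2f}%")
```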
d) Construct a 99% confidence interval for the difference in price between domestic and imported
beers. Interpret the confidence interval in the context of this model. [2 points]
-1.59 ± 2.576(.5274) => -2.95 to -0.23
Some additional Minitab output is shown below:
Explain why the best subsets model with the minimum value of Cp excludes the variable %Alcohol Content.
[2 points]
1. Parsimony: Cp penalizes additional variables, so it favors models with fewer predictors.
2. %Alcohol Content and Calories are highly correlated (r = .712). Therefore, if one of them is already
in the model, the other will not add much to the value of R2adj.
Estimate the value of R2adj for the model with the minimum Cp value. [2 points]
$R^2_{adj} = 1 - \frac{n - 1}{n - (k + 1)} \cdot \frac{SSE}{SSTO} = 1 - \frac{61}{59}\,(1 - R^2) = 1 - \frac{61}{59}\,(1 - .256) = .2308$
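A small sketch of the conversion from R2 to adjusted R2, using n = 62, k = 2 (so that n – (k + 1) = 59) and R2 = .256 as in the answer; the function name is hypothetical.

```python
def adjusted_from_r_squared(r_squared, n, k):
    """Adjusted R-squared from R-squared: 1 - [(n - 1) / (n - (k + 1))] * (1 - R2)."""
    return 1 - ((n - 1) / (n - (k + 1))) * (1 - r_squared)


r2_adj = adjusted_from_r_squared(0.256, n=62, k=2)
print(f"Adjusted R-squared = {r2_adj:.4f}")
```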
End of Exam