Exam 2008 - City University of Hong Kong

CITY UNIVERSITY OF HONG KONG
Course code & title : MS4225: Business Research Modelling
Session             : Semester B, 2007-2008
Time allowed        : 2 hours
This paper has 12 pages (including this page)
Instructions to candidates:
1. Answer ALL THREE questions
2. Show sufficient work for each question
3. This question paper is NOT to be taken away
Materials, aids and instruments permitted during examination:
Approved calculator
Question 1 (35 marks)
Table 1.1 classifies 30,292 Democratic voters of the 2008 U.S. Presidential Primary in the state of
Texas according to gender (male, female), ethnic origin (White, Hispanic, African) and their
choice of Democratic nominee (Clinton, Obama).
Table 1.1
                     Male                          Female
           White   Hispanic   African     White   Hispanic   African
Clinton    3,343    1,615      1,104      4,517    2,990      1,913
Obama      3,615    1,017      4,012      2,012      540      3,614
Let GENDER = 0 for female, 1 for male; RACE = 1 for White, 2 for Hispanic and 3 for
African. Our goal is to estimate a logit model for the dependence of the choice of Democratic
nominee on gender and race. To do this, we run the SAS program:
DATA QUESTION1;
INPUT GENDER RACE CLINTON OBAMA;
TOTAL = CLINTON + OBAMA;
DATALINES;
1 1 3343 3615
1 2 1615 1017
1 3 1104 4012
0 1 4517 2012
0 2 2990 540
0 3 1913 3614
;
PROC GENMOD DATA=QUESTION1;
CLASS RACE;
MODEL CLINTON/TOTAL=GENDER RACE/D=B COVB;
RUN;
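As a cross-check on the events/trials layout used in the MODEL statement, the following Python sketch (not part of the original paper) rebuilds TOTAL for each row of the DATALINES block and reports the observed Clinton share and empirical log odds per cell. The function and variable names are illustrative assumptions; only the counts come from Table 1.1.

```python
import math

# (GENDER, RACE, CLINTON, OBAMA) rows copied from the DATALINES block.
cells = [
    (1, 1, 3343, 3615),
    (1, 2, 1615, 1017),
    (1, 3, 1104, 4012),
    (0, 1, 4517, 2012),
    (0, 2, 2990, 540),
    (0, 3, 1913, 3614),
]


def summarise(rows):
    """Return (gender, race, total, share, log_odds) for each cell."""
    out = []
    for gender, race, clinton, obama in rows:
        total = clinton + obama               # mirrors TOTAL = CLINTON + OBAMA
        share = clinton / total               # observed proportion choosing Clinton
        log_odds = math.log(clinton / obama)  # empirical logit for the cell
        out.append((gender, race, total, share, log_odds))
    return out


summary = summarise(cells)
events = sum(row[2] for row in cells)    # total Clinton voters
trials = sum(row[2] for row in summary)  # total voters across all cells

for gender, race, total, share, log_odds in summary:
    print(f"GENDER={gender} RACE={race} TOTAL={total} "
          f"share={share:.4f} logit={log_odds:+.4f}")
print("events:", events, "trials:", trials)
```

The two totals should match the "Number of Events" and "Number of Trials" lines reported by PROC GENMOD in Appendix 1.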
i) Explain the purpose of the statement “MODEL CLINTON/TOTAL=GENDER RACE/D=B;” in the above SAS program. (3 marks)
ii) Explain the purpose of the “CLASS RACE;” statement in the above SAS program. (2 marks)
iii) What is the advantage of putting RACE in the CLASS statement, and what will happen if this statement is removed? (2 marks)
iv) Appendix 1 gives the parameter estimates and associated output. What are RACE 1, RACE 2 and RACE 3? Calculate and interpret the odds ratio estimates for GENDER, RACE 1 and RACE 2. (6 marks)
v) Comment on the “overall” quality of the estimated model in terms of the Deviance test. Why is the number of degrees of freedom for the test equal to 2? (4 marks)
vi) Using the estimated model, calculate the probabilities of a) a Hispanic female and b) a Black male choosing Clinton as the Democratic nominee. (6 marks)
vii) Suppose the predicted probabilities of a White female and a Black female voting for Obama are 0.309572 and 0.622819 respectively, and the predicted probabilities of a White male and a Hispanic male voting for Clinton are 0.4817759 and 0.6522784 respectively. Use this information and the results of part vi) to work out the predicted frequency of each entry in the contingency table. (4 marks)
viii) Using the results in vii), verify the Pearson Chi-Square value of 62.2001 in the SAS output. Provide an interpretation of what this value tells us. (8 marks)
Question 2 (35 marks)
The marketing manager of Biotherm Homme, a major manufacturer of men’s skin-care products, is
trying to determine whether or not to advertise in a magazine read mostly by young male
professionals. He collected data on the age and occupation of customers purchasing products by
Biotherm Homme and its two closest competitors, Clarins and Clinique, over the past three
months. The data (in number of customers) are as follows:
Age      Occupational Status              Biotherm Homme   Clarins   Clinique
25-35    Professional                           36            27        29
25-35    White collar non-professional          52            19        26
25-35    Blue collar                             8             5         5
>35      Professional                           24            41        32
>35      White collar non-professional          22            26        27
>35      Blue collar                             2             3         3

Let AGE = 0 if 25 ≤ Age ≤ 35 and AGE = 1 if Age > 35;
OS = 1 for Blue Collar, 2 for White collar Non-professional and 3 for Professional, and
Product consumed (Y) is coded 1 for Biotherm Homme, 2 for Clarins and 3 for Clinique.
A multinomial logit model with PROC CATMOD has been fitted with Y as the dependent
variable and AGE and OS as explanatory variables. The SAS program is shown below and the
results are shown in Appendix 2a.
DATA QUESTION2;
INPUT AGE OS Y FREQ;
DATALINES;
0 3 1 36
0 3 2 27
0 3 3 29
0 2 1 52
0 2 2 19
.
.
.
1 2 3 27
1 1 1 2
1 1 2 3
1 1 3 3
;
PROC CATMOD DATA=QUESTION2;
WEIGHT FREQ;
DIRECT AGE OS;
MODEL Y = AGE OS/NOITER;
RUN;
i) Explain the rationale of treating the Y categories as unordered. (2 marks)
ii) Is there evidence of an age effect and of an occupational status effect? Answer this question using information from the ANOVA table of the output. (4 marks)
iii) Write down the estimated equations of the log odds for Biotherm Homme vs. Clarins, Biotherm Homme vs. Clinique, and Clarins vs. Clinique. (7 marks)
iv) Obtain the odds ratio estimate of AGE in the model for Biotherm Homme vs. Clinique. Give an interpretation of this odds ratio estimate. (3 marks)
v) What is the purpose of the DIRECT statement in the SAS program? (2 marks)
vi) Do you agree that Biotherm Homme is a popular product among young professional men? (2 marks)
vii) Discuss the meaning of the assumption of “Independence of Irrelevant Alternatives (IIA)”. (3 marks)
viii) Using results from Appendices 2a and 2b, conduct the Hausman-McFadden test for the IIA assumption. You may find the following information useful: (12 marks)

\[
\begin{bmatrix}
0.1248682 \times 10^{-1} & 0.4930000 \times 10^{-5} & -0.5222100 \times 10^{-2} \\
0.4930000 \times 10^{-5} & 0.3772000 \times 10^{-4} & -0.8540000 \times 10^{-5} \\
-0.5222100 \times 10^{-2} & -0.8540000 \times 10^{-5} & 0.2728538 \times 10^{-2}
\end{bmatrix}^{-1}
=
\begin{bmatrix}
401.7872 & 121.6721 & 769.3541 \\
121.6721 & 26566.78 & 316.0170 \\
769.3541 & 316.0170 & 1839.939
\end{bmatrix}
\]
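The arithmetic behind a Hausman-McFadden statistic can be sketched as a quadratic form, (b_r − b_f)' [V_r − V_f]^{-1} (b_r − b_f), where b_r and V_r come from the restricted fit (Appendix 2b) and b_f and V_f from the matching equation of the full fit (Appendix 2a). The Python sketch below is illustrative only and rests on two assumptions not stated in the paper: that Function 1 in Appendix 2a is the Biotherm Homme vs. Clinique equation (last response level as reference), and that the inverse matrix printed above is the inverse of V_r − V_f. All names are hypothetical.

```python
# Function 1 estimates (Intercept, AGE, OS) copied from Appendix 2a.
# Assumption: Function 1 compares Y = 1 with the reference level Y = 3.
b_full = [0.8745, -0.7051, -0.1728]

# Estimates after deleting observations with Y = 2, copied from Appendix 2b.
b_restricted = [0.8948, -0.7045, -0.1814]

# Inverse matrix exactly as printed in the question paper.
# Assumption: this is the inverse of (V_restricted - V_full).
v_diff_inv = [
    [401.7872, 121.6721, 769.3541],
    [121.6721, 26566.78, 316.0170],
    [769.3541, 316.0170, 1839.939],
]


def quadratic_form(d, m):
    """Compute d' M d for a vector d and a square matrix M."""
    n = len(d)
    return sum(d[i] * m[i][j] * d[j] for i in range(n) for j in range(n))


diff = [r - f for r, f in zip(b_restricted, b_full)]
hm_stat = quadratic_form(diff, v_diff_inv)

# Upper 5% point of the chi-square distribution with 3 degrees of freedom.
CHI2_CRIT_3DF_5PCT = 7.815

print(f"difference vector: {diff}")
print(f"Hausman-McFadden statistic: {hm_stat:.4f}")
print("exceeds 5% critical value:", hm_stat > CHI2_CRIT_3DF_5PCT)
```

Under these assumptions the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of coefficients in the restricted equation (three here).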
Question 3 (30 marks)
A company did a survey of employees between 55 and 65 years of age who were eligible for
retirement. The dependent variable (R) was the response to the question of whether the employee
would retire in the next twelve months, for which the employee could answer 0 for no, 1 for
undecided, and 2 for yes. The explanatory variables were A, the age of the employee, N the
number of years employed, and S, the monthly salary at the present time. The analyst wanted to
take into account the ordering of the dependent variable R when estimating the model.
i) Discuss the representation of the Ordered Logit model starting off with an unobserved (latent) variable. (6 marks)
ii) In an Ordered Logit model, would the “marginal effects” always take the same sign as the corresponding coefficients? Why or why not? (6 marks)
iii) The results reported in Appendix 3 have been obtained with SAS. Comment on the “overall quality” of this estimated model based on the Likelihood Ratio test. (5 marks)
iv) What is the purpose of the “Score test for the proportional odds assumption”? Comment on the result of the test. (5 marks)
v) Calculate the probability that an employee with the following characteristics will make no decision as to whether he will retire in the next twelve months: A = 64, N = 36 and S = 65,000. (5 marks)
vi) How will the estimation results differ if the DESCENDING option is used in the PROC LOGISTIC statement? (3 marks)
Appendix 1
The SAS System
The GENMOD Procedure

Model Information
Data Set                       WORK.QUESTION1
Distribution                   Binomial
Link Function                  Logit
Response Variable (Events)     clinton
Response Variable (Trials)     total

Number of Observations Read    6
Number of Observations Used    6
Number of Events               15482
Number of Trials               30292

Class Level Information
Class    Levels    Values
race     3         1 2 3

Criteria For Assessing Goodness Of Fit
Criterion             DF    Value          Value/DF
Deviance               2    62.4794        31.2397
Scaled Deviance        2    62.4794        31.2397
Pearson Chi-Square     2    62.2001        31.1001
Scaled Pearson X2      2    62.2001        31.1001
Log Likelihood              -18380.5253

Algorithm converged.

Analysis Of Parameter Estimates
                              Standard    Wald 95% Confidence    Chi-
Parameter     DF   Estimate   Error       Limits                 Square     Pr > ChiSq
Intercept      1   -0.5485    0.0240      -0.5956   -0.5015       522.84    <.0001
gender         1   -0.8750    0.0254      -0.9249   -0.8252      1183.82    <.0001
race 1         1    1.3507    0.0286       1.2946    1.4067      2230.07    <.0001
race 2         1    2.0527    0.0372       1.9797    2.1256      3041.31    <.0001
race 3         0    0.0000    0.0000       0.0000    0.0000       .         .
Scale          0    1.0000    0.0000       1.0000    1.0000
NOTE: The scale parameter was held fixed.
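To show how estimates of this kind translate into odds ratios and fitted probabilities, here is a minimal Python sketch. It assumes the usual logit form P(Clinton) = 1 / (1 + exp(−η)) with η = Intercept + gender·GENDER + race effect, and that race 3 is the reference level (its estimate is printed as 0.0000). The function names are hypothetical; the coefficients are the rounded values from the table above.

```python
import math

# Rounded estimates copied from "Analysis Of Parameter Estimates".
INTERCEPT = -0.5485
GENDER = -0.8750                          # GENDER = 1 for male, 0 for female
RACE = {1: 1.3507, 2: 2.0527, 3: 0.0000}  # race 3 is the reference level


def odds_ratio(coef):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(coef)


def prob_clinton(gender, race):
    """Fitted probability of choosing Clinton for a gender/race cell."""
    eta = INTERCEPT + GENDER * gender + RACE[race]
    return 1.0 / (1.0 + math.exp(-eta))


or_gender = odds_ratio(GENDER)
p_white_male = prob_clinton(gender=1, race=1)
p_hispanic_male = prob_clinton(gender=1, race=2)

print(f"odds ratio, male vs. female: {or_gender:.4f}")
print(f"P(Clinton | White male):    {p_white_male:.4f}")
print(f"P(Clinton | Hispanic male): {p_hispanic_male:.4f}")
```

The two fitted probabilities should agree with the values quoted in Question 1 vii) (0.4817759 and 0.6522784) to roughly four decimal places; the small gap comes from using coefficients rounded to four decimals.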
Appendix 2a
The CATMOD Procedure

Data Summary
Response             Y            Response Levels      3
Weight Variable      FREQ         Populations          6
Data Set             QUESTION2    Total Frequency    387
Frequency Missing    0            Observations        18

Population Profiles
Sample    AGE    OS    Sample Size
----------------------------------
   1       0      1         18
   2       0      2         97
   3       0      3         92
   4       1      1          8
   5       1      2         75
   6       1      3         97

Response Profiles
Response    Y
-------------
   1        1
   2        2
   3        3

Maximum Likelihood Analysis
Maximum likelihood computations converged.

Maximum Likelihood Analysis of Variance
Source              DF    Chi-Square    Pr > ChiSq
--------------------------------------------------
Intercept            2        7.51        0.0234
AGE                  2       15.48        0.0004
OS                   2        2.44        0.2949
Likelihood Ratio     6        3.11        0.7955

Analysis of Maximum Likelihood Estimates
              Function                Standard    Chi-
Parameter     Number      Estimate    Error       Square    Pr > ChiSq
-------------------------------------------------------------------
Intercept        1          0.8745     0.5016      3.04      0.0813
                 2         -0.5253     0.5534      0.90      0.3425
AGE              1         -0.7051     0.2543      7.69      0.0056
                 2          0.2671     0.2595      1.06      0.3033
OS               1         -0.1728     0.2018      0.73      0.3917
                 2          0.1508     0.2158      0.49      0.4846

Covariance Matrix of the Maximum Likelihood Estimates
Row  Parameter      Col1         Col2         Col3         Col4         Col5         Col6
-----------------------------------------------------------------------------------------
 1   Intercept 1    0.25163279   0.13980855   -.01643422   -.00998375   -.09560186   -.05199956
 2   Intercept 2    0.13980855   0.30624572   -.01030705   -.02414533   -.05188285   -.11212132
 3   AGE 1          -.01643422   -.01030705   0.06467743   0.03303173   -.00455919   -.00265971
 4   AGE 2          -.00998375   -.02414533   0.03303173   0.06732694   -.00280056   -.00506059
 5   OS 1           -.09560186   -.05188285   -.00455919   -.00280056   0.04071492   0.02190519
 6   OS 2           -.05199956   -.11212132   -.00265971   -.00506059   0.02190519   0.04657614
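The two sets of estimates above can be turned into fitted choice probabilities. The Python sketch below is illustrative and assumes that Function 1 is log(P(Y=1)/P(Y=3)) and Function 2 is log(P(Y=2)/P(Y=3)), that is, generalized logits with the last response level (Clinique) as the reference; the paper does not state this explicitly. It also treats AGE and OS as numeric codes, as the DIRECT statement does. Names are hypothetical.

```python
import math

# Estimates by function number, copied from the table of ML estimates.
# Assumption: function k models log(P(Y=k) / P(Y=3)) for k = 1, 2.
COEF = {
    1: {"intercept": 0.8745, "age": -0.7051, "os": -0.1728},
    2: {"intercept": -0.5253, "age": 0.2671, "os": 0.1508},
}


def response_probs(age, os_code):
    """Fitted P(Y=1), P(Y=2), P(Y=3) for given numeric AGE and OS codes."""
    logits = {
        k: c["intercept"] + c["age"] * age + c["os"] * os_code
        for k, c in COEF.items()
    }
    denom = 1.0 + sum(math.exp(v) for v in logits.values())
    probs = {k: math.exp(v) / denom for k, v in logits.items()}
    probs[3] = 1.0 / denom  # reference category
    return probs


# Young professional: AGE = 0 (25-35), OS = 3 (Professional).
young_prof = response_probs(age=0, os_code=3)
for y, p in sorted(young_prof.items()):
    print(f"P(Y={y}) = {p:.4f}")
```

For comparison, the observed shares in that population (sample 3, size 92) are 36/92, 27/92 and 29/92.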
Appendix 2b
The following gives the Maximum Likelihood estimates and the covariance matrix of the estimates
after deleting observations for Y =2:
The CATMOD Procedure

Analysis of Maximum Likelihood Estimates
                          Standard    Chi-
Parameter     Estimate    Error       Square    Pr > ChiSq
----------------------------------------------------------
Intercept      0.8948      0.5139      3.03      0.0817
AGE           -0.7045      0.2544      7.67      0.0056
OS            -0.1814      0.2071      0.77      0.3811

Covariance Matrix of the Maximum Likelihood Estimates
Row    Parameter    Col1          Col2          Col3
------------------------------------------------------------------
 1     Intercept    0.26411961    -.01642929    -.10082396
 2     AGE          -.01642929    0.06471515    -.00456773
 3     OS           -.10082396    -.00456773    0.04290003
Appendix 3
The SAS System
The LOGISTIC Procedure

Model Information
Data Set                     WORK.Q3
Response Variable            R
Number of Response Levels    3
Number of Observations       22
Model                        cumulative logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered          Total
Value      R     Frequency
  1        0         6
  2        1         7
  3        2         9

Probabilities modeled are cumulated over the lower Ordered Values.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption
Chi-Square    DF    Pr > ChiSq
  1.7989       3      0.6152

Model Fit Statistics
              Intercept    Intercept and
Criterion     Only         Covariates
AIC           51.712       31.313
SC            53.894       36.768
-2 Log L      47.712       21.313

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio     26.3987       3     <.0001
Score                14.2666       3     0.0026
Wald                  6.6446       3     0.0841

Analysis of Maximum Likelihood Estimates
                                Standard     Wald
Parameter      DF   Estimate    Error        Chi-Square    Pr > ChiSq
Intercept 0     1   44.4240     20.6158      4.6434        0.0312
Intercept 1     1   48.0236     21.3889      5.0412        0.0248
A               1   -0.5914      0.2991      3.9106        0.0480
N               1   -0.0307      0.0637      0.2328        0.6295
S               1   -0.00022     0.000086    6.3513        0.0117
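Because the output states that probabilities are cumulated over the lower ordered values, the fitted model can be read as P(R ≤ j) = 1 / (1 + exp(−(α_j + β_A·A + β_N·N + β_S·S))) for j = 0, 1, and category probabilities follow by differencing. The Python sketch below illustrates that arithmetic for the profile given in Question 3 v). It is a sketch under assumptions: it uses the rounded estimates above (the S coefficient is printed to only two significant digits, so results are approximate), it assumes S enters the model in the units stated in the question, and its names are hypothetical.

```python
import math

# Rounded estimates copied from "Analysis of Maximum Likelihood Estimates".
ALPHA = {0: 44.4240, 1: 48.0236}  # Intercept 0 and Intercept 1
BETA = {"A": -0.5914, "N": -0.0307, "S": -0.00022}


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(a, n, s):
    """Return fitted P(R=0), P(R=1), P(R=2) under the cumulative logit form."""
    xb = BETA["A"] * a + BETA["N"] * n + BETA["S"] * s
    cum0 = logistic(ALPHA[0] + xb)  # P(R <= 0)
    cum1 = logistic(ALPHA[1] + xb)  # P(R <= 1)
    return {0: cum0, 1: cum1 - cum0, 2: 1.0 - cum1}


# Profile from Question 3 v): A = 64, N = 36, S = 65,000.
probs = category_probs(a=64, n=36, s=65000)
for r, p in sorted(probs.items()):
    print(f"P(R={r}) = {p:.6f}")
```

Differencing the cumulative probabilities guarantees the three category probabilities sum to one, and it makes clear why the middle category depends on both intercepts.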