Stat-285 – Assignment 9 – 2007 Fall Term
1. Women and children first.
Have you ever watched a movie, or read a book, about a ship in trouble
and when the words “women and children first!” are shouted out, you
know that inevitably those words mean that the ship is doomed to sink?
You can find the source of this gallant tradition at
http://ne.essortment.com/shiptraditionw_rrqb.htm.
This question deals with the sinking of the Titanic and examines the
probability of survival in this tragedy as a function of age, sex, and
class of passage.
Visit http://www.statsci.org/data/general/titanic.html to get a
list of the passengers aboard the Titanic. Download the datafile and
import it into JMP. The file contains 5 variables: the passenger name; the
class of passage; the age; the sex; and an indicator variable for survival
status.
(a) Several of the ages are missing. These could likely be reconstructed
from the original sources. We will assume that the age values are
MCAR. What does this mean, and what implications will this have
for the analysis?
(b) Use the Analyze->Fit Y-by-X platform to look at the breakdown of sex
by class of passage. What does the mosaic plot show you? Confirm
this by looking at a suitable contingency table with the appropriate
percentages.
(c) Use the Analyze->Fit Y-by-X platform to investigate the survival
rates of the two sexes for each separate class of passage. [Hint: Use
the By button.] Complete the following table, where S denotes survival:1
1 If you use a By variable, you cannot save predictions directly to the data table as in
previous assignments. However, saved columns are still accessible by using the Red-Triangle
→ Script → Data Table Window. This will show a “hidden” data table that is created for each
value of the By variable. You will have to do this for each value of the By variable.
Here is the official FAQ from SAS:
When by variables are used, JMP creates a new intermediate table for each level of
the by variable. Statistics such as predicted values are saved to these intermediate
tables rather than the original data table. To see the intermediate table you will
need to click on the red triangle next to Generalized Linear Model Fit and choose
              Males              Females        Odds-ratio of S   c.i. for
Class    P(S)    ODDS(S)     P(S)    ODDS(S)        F vs M        odds-ratio
1st
2nd
3rd
So what do you conclude about “women and children first”?
(d) The above analysis ignored the age of the passengers. For each combination of sex and passenger class, fit a logistic regression to predict
survival as a function of age. Complete the following table for predicting the SURVIVAL rates of passengers as a function of age
[Hint: think carefully about what JMP produces. Is it predicting survival
or death?]:
Class   Sex        Coefficient of age    SE    p-value
1st     Males
1st     Females
2nd     Males
2nd     Females
3rd     Males
3rd     Females
So what do you conclude about the adage of “women and children
first”?
In more advanced classes (e.g. Stat-302 or Stat-402), you would have
learned how to fit one model for the combined data over all sexes and
classes of passage, and looked at the effect of age upon survival after
adjusting for the sex and class of passage.
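For the table in part (c) of this question, the odds and the odds ratio can also be checked by hand from the counts in each 2x2 table. The following is a minimal Python sketch, assuming hypothetical counts (they are not taken from the Titanic data) and the usual large-sample (Wald) interval computed on the log scale. The function names are illustrative only and are not part of JMP.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) with an approximate 95% Wald interval.

    a, b = number survived / died in group 1 (e.g. females)
    c, d = number survived / died in group 2 (e.g. males)
    """
    or_hat = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # se of log(odds ratio)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Hypothetical counts, for illustration only
or_hat, lo, hi = odds_ratio_ci(a=80, b=20, c=30, d=70)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```

The interval is built on the log scale and then back-transformed, which is why it is not symmetric about the estimated odds ratio.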
2. Never underestimate the p-o-w-e-r of the Orange side
Many people find it annoying when a cell phone goes off at the exact
climax of a film.2
When I was visiting England in September 2005, I happened to go to a
movie and noticed a series of ads that played before the movie started
asking patrons to turn off their cell phones. The premise of these advertisements is a series of pitches by various celebrities to the Orange Film Funding
Board, a fictitious agency, for films they would like to produce. The ads
were sponsored by the Orange Cell Phone company, one of the largest
mobile phone companies in the United Kingdom.3
1 (continued) Script->Data Table Window. You will have to do this for each level of the by
variable. The new data table that appears will be for that specific level of the
by variable and will contain the statistics such as predicted values that you have
chosen.
2 See http://www.cnn.com/2005/TECH/10/17/wireless.manners/index.html or
http://www.boundless.org/2005/articles/a0001207.cfm or
http://www.mobiledia.com/news/41645.html.
3 More details at http://www.orange.com/
© 2007 Carl James Schwarz
You can view some of the advertisements at (don’t forget to press the Play
button beneath each ad):
(a) http://www.visit4info.com/details.cfm?adid=22035 - my favorite
(b) http://www.visit4info.com/details.cfm?adid=20298
(c) http://www.visit4info.com/details.cfm?adid=24647 - my second favorite
(d) http://www.visit4info.com/details.cfm?adid=24648
These advertisements have made it into Wikipedia at
http://en.wikipedia.org/wiki/Orange_UK.
But do these commercials actually work?
(a) Describe how you would perform an experiment as a completely randomized design. The four ads are to be compared (with a control of
no ads). There are 10 screens, five showings per day (morning, early
afternoon, late afternoon, early evening, and late evening, identified
by the numbers 1 to 5), seven days per week (1=Sunday, 2=Monday,
etc.), and a 4-week test period.
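One way to carry out the randomization in part (a) is to list every showing in the test period and shuffle a balanced set of treatment labels across them. The following is a minimal Python sketch, assuming equal replication (10 x 5 x 7 x 4 = 1400 showings, so 280 per treatment) and using the ad codes that appear in the table for this question. The seed and the variable names are arbitrary choices for illustration.

```python
import itertools
import random

treatments = ["None", "dh", "dv", "jc", "ss"]  # control plus the four ads

# Every experimental unit: (week, day, showing, screen)
units = list(itertools.product(range(1, 5), range(1, 8), range(1, 6), range(1, 11)))

# Balanced set of labels, then a complete (unrestricted) randomization
labels = treatments * (len(units) // len(treatments))
rng = random.Random(285)  # fixed seed so that the plan can be reproduced
rng.shuffle(labels)

plan = dict(zip(units, labels))
print(len(plan), labels.count("dv"))
```

Because the labels are shuffled over all showings at once, no restriction is placed on week, day, showing, or screen, which is what distinguishes a completely randomized design from a blocked one.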
You can download some data from
http://www.stat.sfu.ca/~cschwarz/Stat-285/Assignments/cellphone.txt. The variables in the dataset are
the week, day, showing, screen, ad used, number of tickets sold, and the
number of cell phones that went off.
Convert the number of cell phones that went off to a simple yes/no variable.
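The conversion amounts to recording "yes" whenever at least one phone went off during a showing. In JMP this would be done with a formula column; the logic is sketched below in Python, with hypothetical counts and an illustrative function name.

```python
def interrupted(n_phones):
    """Return 'yes' if at least one cell phone went off, otherwise 'no'."""
    return "yes" if n_phones > 0 else "no"

counts = [0, 2, 0, 1]  # hypothetical numbers of phones that went off
flags = [interrupted(n) for n in counts]
print(flags)
```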
(b) Test the hypothesis that the probability of a cell phone interruption
is the same for all ads (including the control).
(c) Estimate the probability of a cell phone interrupting the movie for
each ad and complete the following table:
Ad      Estimate    se    95% ci
None
dh
dv
jc
ss
(d) Draw a suitable graph (possibly by hand) showing the results from
the previous table. What does this graph show? Which ad seems to
be the most effective?
(e) Estimate the difference in the log-odds between cases with no ads
and the Darth Vader ad along with an se and an approximate
95% confidence interval. Convert this to an odds ratio along with
a 95% confidence interval. Interpret this odds-ratio. What do you
conclude?
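The entries in the table for part (c) can be checked by hand using the usual large-sample formulas for a binomial proportion. The following is a minimal Python sketch, assuming a hypothetical count of interrupted showings (not taken from cellphone.txt) and an approximate 95% interval of the form estimate ± 1.96 se; the function name is illustrative.

```python
import math

def proportion_ci(x, n, z=1.96):
    """Estimate, se, and approximate 95% interval for a binomial proportion."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical: 70 of 280 showings with a given ad had an interruption
p, se, (lo, hi) = proportion_ci(x=70, n=280)
print(round(p, 3), round(se, 4), round(lo, 3), round(hi, 3))
```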
In more advanced classes (e.g. Stat-302 and Stat-402) you will learn how
to use the actual number of cell phone calls as the response variable and
how to adjust it for the number of tickets sold for that showing.
Common errors made on this assignment – check your work!
• Many students just attached all output and did not provide the table and
conclusions.
There are NO jobs for people who just bash numbers through a statistical
package and provide "computer diarrhea" as a report! It is vitally important that you understand what output is produced and that you are able
to write a coherent report. In many cases, output is badly labelled and
the results are not obvious.
• In the experimental design, some students did not consider the control
group (no ad).
• Some students just stated the null hypothesis.
• Many students did not notice that the models estimate the probability of
No interruption.
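If the software models the probability of No interruption (or of death rather than survival), the fitted log-odds and every coefficient must be negated to describe the event of interest, and the probability becomes one minus the reported value. The following is a small Python sketch of that conversion, using a hypothetical fitted log-odds value.

```python
import math

def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-eta))

eta_no = 1.2                 # hypothetical log-odds of NO interruption
p_no = inv_logit(eta_no)     # P(no interruption)
p_yes = inv_logit(-eta_no)   # P(interruption): negate the log-odds
print(round(p_no, 3), round(p_yes, 3))
```

The same reasoning applies to an odds ratio: the odds ratio for the event of interest is the reciprocal of the odds ratio for its complement.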