Stat-285 – Assignment 9 – 2007 Fall Term
© 2007 Carl James Schwarz

1. Women and children first.

Have you ever watched a movie, or read a book, about a ship in trouble where, once the words “women and children first!” are shouted out, you know that the ship is inevitably doomed to sink? You can find the source of this gallant tradition at http://ne.essortment.com/shiptraditionw_rrqb.htm.

This question deals with the sinking of the Titanic and examines the probability of survival as a function of the age, sex, and class of passage of the passengers.

Visit http://www.statsci.org/data/general/titanic.html to get a list of the passengers aboard the Titanic. Download the datafile and import it into JMP. The file contains 5 variables: the passenger name; the class of passage; the age; the sex; and an indicator variable for survival status.

(a) Several of the ages are missing. These could likely be reconstructed from the original sources. We will assume that the age values are MCAR. What does this mean, and what implications will this have for the analysis?

(b) Use the Analyze->Fit Y-by-X platform to look at the breakdown of sex by class of passage. What does the mosaic plot show you? Confirm this by looking at a suitable contingency table with the appropriate percentages.

(c) Use the Analyze->Fit Y-by-X platform to investigate the survival rates of the two sexes for each separate class of passage. [Hint: Use the By button.] Complete the following table (note that S is survival) [1]:

                Males               Females          Odds-ratio of S    c.i. for
    Class    P(S)   ODDS(S)      P(S)   ODDS(S)         F vs M          odds-ratio
    1st
    2nd
    3rd

So what do you conclude about “women and children first”?

(d) The above analysis ignored the age of the passengers. For each combination of sex and passenger class, fit a logistic regression to predict survival as a function of age. Complete the following table for predicting the SURVIVAL rates of passengers as a function of age [Hint: think carefully about what JMP produces. Is it predicting survival or death?]:

                         Coefficient
    Class    Sex         of age         SE      p-value
    1st      Males
    1st      Females
    2nd      Males
    2nd      Females
    3rd      Males
    3rd      Females

So what do you conclude about the adage of “women and children first”?

In more advanced classes (e.g. Stat-302 or Stat-402), you would learn how to fit one model to the combined data over all sexes and classes of passage, and how to look at the effect of age upon survival after adjusting for sex and class of passage.

[1] If you use a By variable, you cannot save predictions directly to the data table as in previous assignments. However, saved columns are still accessible by using the Red-Triangle -> Script -> Data Table Window. This will show a “hidden” data table that is created for each value of the By variable. You will have to do this for each value of the By variable. Here is the official FAQ from SAS: When by variables are used, JMP creates a new intermediate table for each level of the by variable. Statistics such as predicted values are saved to these intermediate tables rather than the original data table. To see the intermediate table you will need to click on the red triangle next to Generalized Linear Model Fit and choose Script->Data Table Window. You will have to do this for each level of the by variable. The new data table that appears will be for that specific level of the by variable and will contain the statistics such as predicted values that you have chosen.

2. Never underestimate the p-o-w-e-r of the Orange side

Many people find it annoying when a cell phone goes off at the exact climax of a film. [2] When I was visiting England in September 2005, I happened to go to a movie and noticed a series of ads, played before the movie started, asking patrons to turn off their cell phones. The premise of these advertisements is that various celebrities pitch films they would like to produce to the Orange Film Funding Board, a fictitious agency. The ads were sponsored by the Orange cell phone company, one of the largest mobile phone companies in the United Kingdom. [3]

You can view some of the advertisements at (don’t forget to press the Play button beneath each ad):
    (a) http://www.visit4info.com/details.cfm?adid=22035 - my favorite
    (b) http://www.visit4info.com/details.cfm?adid=20298
    (c) http://www.visit4info.com/details.cfm?adid=24647 - my second favorite
    (d) http://www.visit4info.com/details.cfm?adid=24648
These advertisements have made it into Wikipedia at http://en.wikipedia.org/wiki/Orange_UK.

But do these commercials actually work?

(a) Describe how you would perform an experiment as a completely randomized design. The four ads are to be compared (with a control of no ads). There are 10 screens, five showings per day (morning, early afternoon, late afternoon, early evening, and late evening, identified by the numbers 1 to 5), seven days per week (1=Sunday, 2=Monday, etc.), and a 4 week test period.

You can download some data from http://www.stat.sfu.ca/~cschwarz/Stat-285/Assignments/cellphone.txt. The variables in the dataset are the week, day, showing, screen, ad used, number of tickets sold, and the number of cell phones that went off. Convert the number of cell phones that went off to a simple yes/no variable.

(b) Test the hypothesis that the probability of a cell phone interruption is the same for all ads (including the control).

(c) Estimate the probability of a cell phone interrupting the movie for each ad and complete the following table:

    Ad       Estimate     se      95% ci
    None
    dh
    dv
    jc
    ss

(d) Draw a suitable graph (possibly by hand) showing the results from the previous table. What does this graph show? Which ad seems to be the most effective?

(e) Estimate the difference in the log-odds between cases with no ads and the Darth Vader ad, along with an se and an approximate 95% confidence interval. Convert this to an odds ratio along with a 95% confidence interval. Interpret this odds-ratio. What do you conclude?

In more advanced classes (e.g. Stat-302 and Stat-402) you will learn how to use the actual number of cell phone calls as the response variable and how to adjust it for the number of tickets sold for that showing.

[2] See http://www.cnn.com/2005/TECH/10/17/wireless.manners/index.html or http://www.boundless.org/2005/articles/a0001207.cfm or http://www.mobiledia.com/news/41645.html.
[3] More details at http://www.orange.com/

Common errors made on this assignment – check your work!

• Many students just attached all output and did not provide the table and conclusions. There are NO jobs for people who just bash numbers through a statistical package and provide "computer diarrhea" as a report! It is vitally important that you understand what output is produced and that you are able to write a coherent report. In many cases, output is badly labelled and the results are not obvious.
• In the experimental design, some students did not consider the control group (no ad).
• Some students just stated the null hypothesis.
• Many students did not notice that the models estimate the probability of No interruption.
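
The odds, odds-ratio, and confidence-interval quantities requested in questions 1(c) and 2(e) can be illustrated outside JMP with a minimal Python sketch. The counts used here are invented for illustration and are not taken from the Titanic or cell phone data, and the function names are hypothetical. The interval uses the usual large-sample (Wald) approximation on the log-odds scale, so it may differ slightly from what JMP reports.

```python
import math


def odds(p):
    """Convert a probability p (0 < p < 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio of group 1 vs group 2 from a 2x2 table of counts.

    a = group 1 events,  b = group 1 non-events
    c = group 2 events,  d = group 2 non-events
    Returns (odds ratio, se of the log odds ratio, lower limit, upper limit).
    """
    log_or = math.log((a * d) / (b * c))
    # Large-sample standard error of the log odds ratio
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    # Build the interval on the log scale, then back-transform
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), se, lower, upper


# Invented counts (NOT the assignment data): 20 of 100 showings interrupted
# with an ad, 40 of 100 showings interrupted with no ad.
or_hat, se_log_or, lower, upper = odds_ratio_ci(20, 80, 40, 60)
print(f"odds ratio = {or_hat:.3f}, se(log OR) = {se_log_or:.3f}")
print(f"approximate 95% ci = ({lower:.3f}, {upper:.3f})")
```

If the software models the probability of the other outcome (for example, No interruption rather than interruption), the log odds ratio changes sign and the odds ratio becomes its reciprocal, which corresponds to swapping the two groups in the call above.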