Homework

We have seen that splits in decision trees are related to Chi-square tests, so naturally we want to be comfortable with these. Under the demos link on our course home page, you will find a SAS program, Titanic.sas. It contains counts of alive and dead from the famous Titanic shipwreck, along with the URL from which the data came (several reports of survival data are available, and their counts do not all match exactly).

For your writeup, suppose you were working for the White Star Line shortly after the Titanic sank. Perhaps some initial analyses were run by a consultant, and your boss, an intelligent person with some statistical knowledge, wants you to check the results and explain a few concerns he has (the questions below). He has asked you to write up your thoughts. He is particularly concerned about the potential for being sued for class prejudice in terms of who got saved and who didn’t.

My main point on this and other homeworks is for you to work on your exposition as well as your analysis. It is fine to organize your report by question, but please use complete sentences and make it interesting to read.

(1) Look over the program and describe the two forms of data sets that it creates. Also describe what the variable X represents in the second data set and what the first.status notation does.

(2) Record the correlation r between X and the 0-1 survival variable and give its p-value. Square the correlation and multiply by the total number of passengers. Record your result (you will see why later).

(3) Compute the usual (Pearson) Chi-square test, using PROC FREQ on either TITANIC data set, for testing the null hypothesis of independence between class status (crew, first class, etc.) and fate. Remind your readers what a p-value is and what it is used for. What is the p-value for the usual (Pearson) Chi-square test? What is the logworth of the 4-way split implied by this table? For a table like this, the degrees of freedom are (#rows-1)(#cols-1), where #rows is the number of rows, etc.

(4) We see one p-value that is way out of line with the others, and it is based on different degrees of freedom. Why? (Hint: compare the test statistic to your result from the correlation in (2).) Does that unusual p-value indicate that there was class bias in getting people into the lifeboats, or does it not? Perhaps the company can use this to defend against accusations of prejudicial saving of lives based on status (??)

(5) Does the order listed for the status categories in the Chi-square table affect any of the p-values? If so, is the order listed the most sensible one? If not, take the appropriate action (I should not have to tell you this!). Which, if any, of the Chi-squares and p-values changed? Why? In describing what you’ve learned here in the report, you might distinguish linear association from general association.

(6) Comparing parts 2 and 4, why was it important to run that PROC SORT before creating the second data set and its X variable?

Note: Within SAS you can find help on any procedure (like PROC FREQ) by clicking on the purple book with a yellow “?” on it (menu bar at top, rightmost icon). For example, you can click on the contents tab at the top of the help window, then SAS PRODUCTS, then SAS/STAT, then PROC FREQ, then SYNTAX.

If you are unfamiliar with SAS or Chi-square calculations, you can fill in these blanks to do question 3:

proc freq data=titanic;
  tables _________/chisq;
  weight ________;
run;
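
As a reminder for question 3, the quantities involved can be written out as below. This is only a sketch in standard textbook notation and is not taken from Titanic.sas. The symbols are introduced here for illustration: O_ij and E_ij are the observed and expected counts in row i and column j, r_i and c_j are the row and column totals, n is the grand total, and R and C are the numbers of rows and columns. The last line assumes the usual definition of logworth as the negative base-10 logarithm of the p-value; check that this matches the definition used in class.

```latex
% Pearson Chi-square statistic for an R x C table of counts
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{r_i \, c_j}{n}

% Degrees of freedom, i.e. (#rows-1)(#cols-1)
\mathrm{df} = (R - 1)(C - 1)

% Logworth of the split (assumed definition)
\mathrm{logworth} = -\log_{10}(p\text{-value})
```

Under this definition, a p-value of 0.001 corresponds to a logworth of 3, so a larger logworth means stronger evidence against independence.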