Data Analysis131 Logistic Regression Assignment 6 You use logistic regression when you have a binary dependent variable and either interval/ratio scale or binary independent variables. While chi-square statistics allow you to use a nominal scale dependent variable, logistic regression has much greater flexibility in the use of control variables than chi-square. Also, in logistic regression, you can determine the exact relationship between the independent variable and the dependent variable (much like OLS regression analysis). To use the logistical regression model, get into your data, TO ACCESS THE DATA FROM SPSS 1) Click the "Start" button visible in the bottom lefthand corner of your computer screen. 2) Click on My Computer 3) Double Click on ‘Local Disk C’ 4) Double click on the folder ‘Data’ 5) Double click on PSID929903.sav, Adolescent.sav, or CDS2002.sav (depending on which data you are working with). then click on Analyze, then on Regression , then on Binary Logistic. Remember to weight the data. After you've clicked onto your data set, go to Data, then to the bottom of that list to Weight Cases. Next, click on weight cases by and click on Weight. PSID929903 Data In this assignment, you will examine which factors help explain why some people get cancer. You'll use the variable Cancer as your dependent variable, which has a value of 1 when the head of household has cancer and a value of 0 when the person does not have cancer. Click on Cancer , as your dependent variable. For your independent and control variables, or covariates, click on ER13010 (age of the head, interval/ratio), Whthd99 (race of white, dummy variable), Gupoor (did this person grow up poor, dummy variable), hlhdguex (was the health status of this person excellent from ages 0 to 16, dummy variable), Er15654 ( whether the person ever smoked, dummy variable) and Nummarr (number of marriages, interval/ratio). Your research question is: Is the occurance of cancer related to such factors as the age of the head of household, whether the person is white (relative to all other races), whether the person grew up in a poor household (relative to all other households), whether the person had excellent health growing up (relative to those who had very good, good, fair, or poor health growing up), whether the person ever smoked (relative to those who never smoked), and the number of marriages for the person. Each of the coefficient estimates is determined while controlling for the effects of the other variables within the model. Child/Adolescent Data You will examine which factors affect the childhood/adolescent factors affect the likelihood of becoming a high school dropout. You'll use the variable AdDrop as your dependent variable, which has a value of 1 if the person is a high school dropout and a value of 0 if the person has at least a high school degree. Click on AdDrop as your dependent variable. For your independent and control variables, or covariates, click on move (number of household residential moves, interval/ratio), Permar (percentage of years the person's parents were married, interval/ratio), lowbw (was the person born low birthweight versus not being born low birthweight, dummy variable), female (person's gender is female versus male, dummy variable), and NevMarr (has this person never been married, dummy variable). Your research question is: Is dropping out of high school related to such factors as the number of times this person's family moves when he/she is between ages 10-14, the percentage of years this person's parents are married, whether this person was born with weight under 5 1/2 pounds, gender, and if this person has never been married. Each of the coefficient estimates is determined while controlling for the effects of the other variables within the model. Child Development Data You will examine which factors affect whether or not the child feels that she/he does things as well as most people. You'll use the variable aswellas as your dependent variable, which has a value of 1 if the child feels that she/he does things as well as most people and a value of 0 if the child does not feel she/he does things as well as most people. Click on AswellAs as your dependent variable. For your independent variables, or covariates, click on pdepress (does the primary care giver feel depressed, dummy variable), toohard (is taking care of your child more work than pleasure, dummy variable), criticiz (do family members criticize each other a lot, dummy variable), Numspank (how many times in the past week has the primary caregiver spanked the child, interval/ratio scale variable), and Expect1 (the primary caregivers expects the child to do no better than finish high school, dummy variable). Your research question is: Are children's feelings that they can do things as well as most people related to such factors as depression for the primary care giver, whether the priimary caregiver feels that taking care of his/her children is more work than pleasure, family members criticize each other a lot, the number of times per week the child is spanked, and the educational expectations of the child by the primary caregiver. Each of the coefficient estimates is determined while controlling for the effects of the other variables within the model. Interpreting Your Results To determine the significance levels for the coefficient estimates for the variables, look at the bottom table (labeled variables in the equation) and examine the column of your output labeled Sig. These are your two-tailed significance levels. If this column indicates a level under .05, you can reject your null hypothesis at the 5% level of significance for a two-tailed test. A value of .6160 indicates that this coefficient estimate is significant at the 61.07% level -- higher than the 5% level. For all variables that are above the .05 level, accept the null hypothesis (you can adjust the significance levels in the Sig column of your output for one tailed tests). A Sig value of .0000 indicates that the coefficient is significant at the 0% level (or is in the far tail portion of the distribution), and is thus highly significant). Questions: 1. Which of the independent variables have a significant relationship with the depend variable? Which do not have a significant relationship with the dependent variable? Reject the null hypotheses at the .05 level for a two-tailed test? 2. Use the Exp(B) or the odds ratio column to interpret the likelihood of having a value of 1 for your dependent variable. For dummy variables (independent variables), you are examining the likelihood of being in the 1 category versus being in the 0 category. Thus, if you are examining the effects of being depress (independent variable) on the likelihood of suicide (dependent variable) and get an odds ratio of 1.5, this indicates that those who are depressed are 1.5 times as likely or 50% more likely to commit suicide than those who are not depressed. If you are examining the effects of the number of times you spank the child affects the child's likelihood of having poor health and get an odds ratio of 1.03, this indicates that for every time the child is spanked, the likelihood of having poor health increases by 3 percent. 3. How much of the variation of the dependent variable is explained by the set of independent variables? 4. Is your model significant? How do you know this. Hand in your output and the answers to these questions. ******************************** If the dependent variable in your paper is a nominal scale variable, you will be using logistic regression analysis for this final paper. If you would like, go to the data set you are using for your paper, and use the dependent variable that you’ll be using for your paper. Use some of the control and independent variables you will be using for your paper. What do you find?