Data Analysis131 - Bryn Mawr College

advertisement
Data Analysis131
Logistic Regression
Assignment 6
You use logistic regression when you have a binary dependent variable and either
interval/ratio scale or binary independent variables. While chi-square statistics allow you
to use a nominal scale dependent variable, logistic regression has much greater flexibility
in the use of control variables than chi-square. Also, in logistic regression, you can
determine the exact relationship between the independent variable and the dependent
variable (much like OLS regression analysis). To use the logistical regression model, get
into your data,
TO ACCESS THE DATA FROM SPSS
1) Click the "Start" button visible in the bottom lefthand corner of your computer
screen.
2) Click on My Computer
3) Double Click on ‘Local Disk C’
4) Double click on the folder ‘Data’
5) Double click on PSID929903.sav, Adolescent.sav, or CDS2002.sav (depending on
which data you are working with).
then click on Analyze, then on Regression , then on Binary Logistic.
Remember to weight the data. After you've clicked onto your data set, go to Data, then
to the bottom of that list to Weight Cases. Next, click on weight cases by and click on
Weight.
PSID929903 Data
In this assignment, you will examine which factors help explain why some people get
cancer. You'll use the variable Cancer as your dependent variable, which has a value of 1
when the head of household has cancer and a value of 0 when the person does not have
cancer. Click on Cancer , as your dependent variable. For your independent and control
variables, or covariates, click on ER13010 (age of the head, interval/ratio), Whthd99
(race of white, dummy variable), Gupoor (did this person grow up poor, dummy
variable), hlhdguex (was the health status of this person excellent from ages 0 to 16,
dummy variable), Er15654 ( whether the person ever smoked, dummy variable) and
Nummarr (number of marriages, interval/ratio).
Your research question is: Is the occurance of cancer related to such factors as the age of
the head of household, whether the person is white (relative to all other races), whether
the person grew up in a poor household (relative to all other households), whether the
person had excellent health growing up (relative to those who had very good, good, fair,
or poor health growing up), whether the person ever smoked (relative to those who never
smoked), and the number of marriages for the person. Each of the coefficient estimates is
determined while controlling for the effects of the other variables within the model.
Child/Adolescent Data
You will examine which factors affect the childhood/adolescent factors affect the
likelihood of becoming a high school dropout. You'll use the variable AdDrop as your
dependent variable, which has a value of 1 if the person is a high school dropout and a
value of 0 if the person has at least a high school degree. Click on AdDrop as your
dependent variable. For your independent and control variables, or covariates, click on
move (number of household residential moves, interval/ratio), Permar (percentage of
years the person's parents were married, interval/ratio), lowbw (was the person born low
birthweight versus not being born low birthweight, dummy variable), female (person's
gender is female versus male, dummy variable), and NevMarr (has this person never
been married, dummy variable).
Your research question is: Is dropping out of high school related to such factors as the
number of times this person's family moves when he/she is between ages 10-14, the
percentage of years this person's parents are married, whether this person was born with
weight under 5 1/2 pounds, gender, and if this person has never been married. Each of
the coefficient estimates is determined while controlling for the effects of the other
variables within the model.
Child Development Data
You will examine which factors affect whether or not the child feels that she/he does
things as well as most people. You'll use the variable aswellas as your dependent
variable, which has a value of 1 if the child feels that she/he does things as well as most
people and a value of 0 if the child does not feel she/he does things as well as most
people. Click on AswellAs as your dependent variable. For your independent variables,
or covariates, click on pdepress (does the primary care giver feel depressed, dummy
variable), toohard (is taking care of your child more work than pleasure, dummy
variable), criticiz (do family members criticize each other a lot, dummy variable),
Numspank (how many times in the past week has the primary caregiver spanked the
child, interval/ratio scale variable), and Expect1 (the primary caregivers expects the child
to do no better than finish high school, dummy variable).
Your research question is: Are children's feelings that they can do things as well as most
people related to such factors as depression for the primary care giver, whether the
priimary caregiver feels that taking care of his/her children is more work than pleasure,
family members criticize each other a lot, the number of times per week the child is
spanked, and the educational expectations of the child by the primary caregiver. Each of
the coefficient estimates is determined while controlling for the effects of the other
variables within the model.
Interpreting Your Results
To determine the significance levels for the coefficient estimates for the variables, look at
the bottom table (labeled variables in the equation) and examine the column of your
output labeled Sig. These are your two-tailed significance levels. If this column indicates
a level under .05, you can reject your null hypothesis at the 5% level of significance for a
two-tailed test. A value of .6160 indicates that this coefficient estimate is significant at
the 61.07% level -- higher than the 5% level. For all variables that are above the .05 level,
accept the null hypothesis (you can adjust the significance levels in the Sig column of
your output for one tailed tests). A Sig value of .0000 indicates that the coefficient is
significant at the 0% level (or is in the far tail portion of the distribution), and is thus
highly significant).
Questions:
1. Which of the independent variables have a significant relationship with the depend
variable? Which do not have a significant relationship with the dependent variable?
Reject the null hypotheses at the .05 level for a two-tailed test?
2. Use the Exp(B) or the odds ratio column to interpret the likelihood of having a value of
1 for your dependent variable. For dummy variables (independent variables), you are
examining the likelihood of being in the 1 category versus being in the 0 category. Thus,
if you are examining the effects of being depress (independent variable) on the likelihood
of suicide (dependent variable) and get an odds ratio of 1.5, this indicates that those who
are depressed are 1.5 times as likely or 50% more likely to commit suicide than those
who are not depressed. If you are examining the effects of the number of times you
spank the child affects the child's likelihood of having poor health and get an odds ratio
of 1.03, this indicates that for every time the child is spanked, the likelihood of having
poor health increases by 3 percent.
3. How much of the variation of the dependent variable is explained by the set of
independent variables?
4. Is your model significant? How do you know this.
Hand in your output and the answers to these questions.
********************************
If the dependent variable in your paper is a nominal scale variable, you will be using
logistic regression analysis for this final paper. If you would like, go to the data set you
are using for your paper, and use the dependent variable that you’ll be using for your
paper. Use some of the control and independent variables you will be using for your
paper. What do you find?
Download