Using Survival Analysis to Analyze Degree Completion

USING SURVIVAL ANALYSIS TO ANALYZE
DEGREE COMPLETION
Janice Love
University of California, Los Angeles
Office of Academic Planning & Budget
CAIR 2014
AGENDA
Survival Analysis History & Background
Overview
Survival Analysis example using SPSS
Results of Survival Analysis
SURVIVAL ANALYSIS BACKGROUND
Definition
• A statistical method for studying the time to an
event. The term “survival” suggests that the event of
interest is death but the technique is useful for other
types of events.
Alternative terminology
• Event analysis, Time-to-event analysis
• Survival analysis – studies involving time to death
(biomedical sciences)
• Reliability theory / Reliability analysis (engineering)
• Duration analysis / Duration modeling (economics)
• Event history analysis (Sociology)
Uses
• Clinical trials
• Cohort studies
Example of Survival Probability Graph
http://wpfau.blogspot.com/2011/08/safe-withdrawal-rates-and-life.html
Example of Survival Probability Graph
http://faculty.tamucc.edu/sfriday/wordpress/?p=1358
Example of Survival Probability Graph
http://www.statcan.gc.ca/daily-quotidien/000216/dq000216b-eng.htm
SURVIVAL ANALYSIS HISTORY
• Exact origin unknown; the methods have been around for a few hundred years
• Techniques developed in medical / biological sciences
• World War II – military vehicles (reliability and failure
time analysis)
• The Kaplan-Meier Estimator was introduced with the
publication of NONPARAMETRIC ESTIMATION FROM
INCOMPLETE OBSERVATIONS – E. L. Kaplan / Paul
Meier, 1958
• Cited 34,000 times as of 2011
http://articles.chicagotribune.com/2011-08-18/news/ct-met-meier-obit-20110818_1_clinical-trials-research-experimental-treatment
SURVIVAL ANALYSIS - OVERVIEW


A set of statistical methods where the outcome variable is the
time until the occurrence of an event of interest

Follows cohort over specified time period with focus on an event

Useful when the rate of the occurrence of the event varies over time

Differs from other statistical methods: handles censored data (the
withdrawal of individuals from the study)
Censored observations:
• An observation with incomplete information
• Individuals who have not experienced “the event” by the end of the study
• Right censoring
  o Study participant can’t be located
  o or lives beyond the end of the study
  o or drops out before the study is completed
  o or is still enrolled
• Don’t have to handle these individuals as “missing”
• Do have to follow rules with respect to censored data
  o # of censored should be small relative to non-censored
  o Censored and non-censored population should be similar (Kaplan-Meier)
SURVIVAL ANALYSIS - CENSORING
             terms enrolled   Graduation_status
Student 1          5                 0
Student 2          9                 1
Student 3         14                 0
Student 4          7                 1
Student 5          8                 1

Graduation_status: 1 = event (graduated), 0 = censored
[Figure: one timeline per student; x-axis = Time in Terms]
Student 1: Dropped out after 5 terms
Student 3: "Survived" - still enrolled at the end of the study period

Outcome data
Censored   2
Event      3
Total      5
SURVIVAL ANALYSIS - CENSORING
Consequences of mishandling or ignoring censored data:
Example
Student cohort, N = 50, event of interest = Graduation
Still enrolled at the end of the study, N = 6
No longer enrolled but did not graduate, N = 4
Options:
Code all 10 as missing
Code 4 as missing, 6 as graduated as of study end
Consequences:
Mean time to degree is over- or understated
Risk of selection bias
Ignoring censored records completely or arbitrarily assigning
event dates introduces bias into the results
Including the censored data produces less bias.
Newell & Hyun, 2011
SURVIVAL ANALYSIS – HANDLING
CENSORED DATA
Two methods to produce the
cumulative probability of survival that
the survival graph is based upon:
1. SPSS Life Table: (Each time period)
the effective size of the cohort is
reduced by ½ of the censored group
2. Kaplan-Meier Survival Table: The
survival probability estimate for
each time period, except the first, is
a compound conditional probability
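As a rough illustration of the second method, the sketch below applies a product-limit (Kaplan-Meier style) calculation to the five-student example from the censoring slide, and compares it with simply dropping the censored students. The function and variable names are illustrative assumptions; this is not SPSS's implementation.

```python
# Five-student example from the censoring slide: (terms_enrolled, graduated).
# graduated = 1 means the event occurred; 0 means the record is censored.
records = [(5, 0), (9, 1), (14, 0), (7, 1), (8, 1)]


def kaplan_meier(records):
    """Return [(term, survival)] for each term in which a graduation occurred."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for term in sorted({t for t, _ in records}):
        events = sum(1 for t, e in records if t == term and e == 1)
        censored = sum(1 for t, e in records if t == term and e == 0)
        if events:
            # Conditional probability of "surviving" this term, compounded
            # with the survival probability carried over from earlier terms.
            survival *= (at_risk - events) / at_risk
            curve.append((term, survival))
        # Graduates and censored students both leave the risk set.
        at_risk -= events + censored
    return curve


km_curve = kaplan_meier(records)

# Naive alternative: code the censored students as missing and look only
# at graduates. Share of that group still enrolled after term 9:
graduates = [t for t, e in records if e == 1]
naive_survival_after_9 = sum(1 for t in graduates if t > 9) / len(graduates)

for term, survival in km_curve:
    print(f"term {term}: estimated proportion still enrolled = {survival:.2f}")
print(f"naive estimate after term 9 = {naive_survival_after_9:.2f}")
```

On this toy data the product-limit estimate after term 9 is 0.25, while the naive approach reports 0.00, because it discards the student who was still enrolled at term 14. That gap is the kind of bias the censoring slides warn about.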
SURVIVAL ANALYSIS - OVERVIEW

Data required for analysis:
• Clearly defined event (death, onset of illness, recovery from
illness, marriage, birth, mechanical failure, success, job loss,
employment, graduation)
  o Terminal event
• Event status (1 = event occurred, 0 = event did not occur)
• Time variable = Time measured from the entry of a subject into
the study until the defined event. Months, terms, days, years,
seconds.
• Covariates:
  o To determine if different groups have different survival
  times
  o Gender, age, ethnicity, GPA, treatment, intervention
  o Regression models
SURVIVAL ANALYSIS – SPSS DATA LAYOUT
Basic student data
• Time variable – terms enrolled
• Event status – graduation status (censored indicator)
• Covariates – binary or dummy variables (gender), or grouped into categories (1st_term_gpa)

             terms_enrolled   graduate_status   gender   1st_term_gpa
Student 1          5                 0             1          3.4
Student 2          9                 1             0          4.0
Student 3         14                 0             1          2.9
Student 4          7                 1             1          3.9
Student 5          8                 1             0          3.1
Cohort Description
• Undergraduates, one division
• Fall 2006, Fall 2007 entering freshmen, N = 884
• Respondents to 2008 UCUES* survey
• Freshmen admits (transfers excluded)
• 1st term GPA >= 3.0
• Censored = 10 or 1.1%
• Explanatory variables available: gender, URM status,
domestic-foreign status, Pell Grant recipient status, hours
worked (survey), double/triple major
* UCUES = University of California Undergraduate Experience Survey
SURVIVAL ANALYSIS – SPSS
SPSS
• Analyze
• Survival
• Life Tables
SAMPLE DATA – WORKING IN SPSS
SPSS
• Analyze
• Survival
• Life Tables
SURVIVAL ANALYSIS – LIFE TABLE PRODUCED
BY SPSS
Primary output of the survival analysis procedure
• Intervals = terms; the count is from the admit term
• # entering interval = count of still enrolled students at start of term
• # withdrawing during interval = censored
• # exposed to risk = # entering interval minus ½ censored
• # terminal events = # graduated
• Proportion terminating = # terminal events ÷ # exposed to risk;
example, Term 10 = 38 ÷ 829.5 = .05
• Proportion surviving = 1 – proportion terminating
• Cumul. surviving = cumulative proportion of the cohort surviving at the
end of the interval (the product of the proportions surviving in this and
all earlier intervals). With so few censored students, Term 10 is
approximately (829.5 - 38) ÷ 884 = 0.90
• Probability density = estimated probability of graduating in the interval
• Hazard rate = estimated rate of graduating per term among students who
had not graduated at the start of the interval. It is a rate rather than a
probability, so it can exceed 1.
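The column definitions above can be checked with a few lines of arithmetic. The sketch below recomputes the Term 10 row. The entering count (830) and censored count (1) are not stated on the slide; they are inferred from the 829.5 "exposed to risk" figure and from the by-gender counts later in the deck (545 + 285), so treat them as assumptions. The function name is illustrative.

```python
def life_table_row(entering, censored, events):
    """Recompute three life-table columns for one interval (actuarial method)."""
    # Censored students are assumed to be at risk for half the interval.
    exposed = entering - censored / 2
    proportion_terminating = events / exposed
    proportion_surviving = 1 - proportion_terminating
    return {
        "exposed": exposed,
        "proportion_terminating": proportion_terminating,
        "proportion_surviving": proportion_surviving,
    }


# Term 10: 38 graduations; entering = 830 and censored = 1 are inferred values.
term10 = life_table_row(entering=830, censored=1, events=38)

for column, value in term10.items():
    print(f"{column}: {value:.4f}")
```

The cumulative proportion surviving would then be the running product of `proportion_surviving` across intervals, which requires every earlier row and is left out here.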
SURVIVAL FUNCTION GRAPH PRODUCED BY
SPSS
The proportion of the cohort that has survived (still enrolled) at
any term
Each step of the
curve represents an
event
There is a 90%
probability of
surviving to the end
of 10th term.
Surviving =
remaining enrolled!
FUNCTION & ONE MINUS A FUNCTION
y = x^2
y = 1 - x^2
y = x + 1
y = 1 - (x + 1)
ONE MINUS SURVIVAL FUNCTION
There is a 10%
probability of not surviving to the end of
10th term.
Not surviving =
graduating!!
SURVIVAL ANALYSIS: SPSS, WITH COVARIATE
FACTOR = GENDER
SPSS
• Analyze
• Survival
• Life Tables
SURVIVAL TABLE=Terms_enrolled BY Gender(1 2)
  /INTERVAL=THRU 15 BY 1
  /STATUS=graduated(1)
  /PRINT=TABLE
  /PLOTS (SURVIVAL OMS)=Terms_enrolled BY Gender.
SURVIVAL ANALYSIS – SPSS,
LIFE TABLE BY GENDER
Hazard Rate = estimated rate of graduating per term among students
who had not graduated at the start of the interval. It is a rate
rather than a probability, so it can exceed 1.
Median Survival Time = Time at
which 50% of the original cohort
has not survived (graduated)
SURVIVAL ANALYSIS: HAZARD RATIO
Life Table - Hazard Rate Column

First-order   Interval     Number Entering   Number of         Hazard
Controls      Start Time   Interval          Terminal Events   Rate
Gender
Female            0             586                 0            .00
                  1             586                 0            .00
                  2             586                 0            .00
                  3             586                 0            .00
                  4             585                 0            .00
                  5             584                 0            .00
                  6             584                 0            .00
                  7             583                 0            .00
                  8             583                 0            .00
                  9             583                38            .07
                 10             545                22            .04
                 11             523                73            .15
                 12             450               404           1.63
                 13              46                15            .41
                 14              28                11            .49
                 15              17                17            .00
Male              0             298                 0            .00
                  1             298                 0            .00
                  2             298                 0            .00
                  3             298                 0            .00
                  4             298                 0            .00
                  5             298                 0            .00
                  6             298                 1            .00
                  7             296                 0            .00
                  8             296                 1            .00
                  9             295                10            .03
                 10             285                16            .06
                 11             268                46            .19
                 12             222               183           1.41
                 13              38                18            .62
                 14              20                 6            .36
                 15              13                13            .00



Hazard Ratio = ratio of the
hazard rates.
At 12th term, Hazard ratio =
1.63 / 1.41 = 1.16, females
are 16% more likely to
graduate in the 12th term
than males
At 13th term, Hazard ratio =
.41 / .62 = .66, females are
34% less likely to graduate
in the 13th term than males
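The hazard rates and ratios quoted above can be reproduced from the table counts. The sketch below assumes the actuarial hazard estimate, events ÷ (exposed − ½ events) for one-term intervals, which matches the table's Hazard Rate column for these rows. The censored counts are not shown on the slide; they are inferred as (number entering) − (terminal events) − (number entering the next interval). All names are illustrative.

```python
def hazard_rate(entering, censored, events, width=1.0):
    """Actuarial hazard estimate for one life-table interval."""
    exposed = entering - censored / 2
    return events / (width * (exposed - events / 2))


# (entering, censored, events) by term; censored counts are inferred values.
female = {12: (450, 0, 404), 13: (46, 3, 15)}
male = {12: (222, 1, 183), 13: (38, 0, 18)}

rates = {}
ratios = {}
for term in (12, 13):
    rates[("female", term)] = hazard_rate(*female[term])
    rates[("male", term)] = hazard_rate(*male[term])
    ratios[term] = rates[("female", term)] / rates[("male", term)]
    print(
        f"term {term}: female {rates[('female', term)]:.2f}, "
        f"male {rates[('male', term)]:.2f}, ratio {ratios[term]:.2f}"
    )
```

With unrounded rates the term 13 ratio comes out near 0.65 rather than 0.66; the slide's figure divides the rounded table values (.41 / .62).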
SURVIVAL FUNCTIONS - SPSS
FACTOR = GENDER
Survival Pattern: SPSS will produce a different colored line for each of the
factor’s values
SURVIVAL ANALYSIS: KAPLAN-MEIER METHOD
Assumptions
• Censored individual – student who has not
experienced the event (graduated) by the end of
the study, e.g. they are no longer enrolled
• Check for differences between censored and non-censored groups
• Cohorts should behave similarly – groups
entering at different times should be similar
• Avoid “selection bias” in data
SURVIVAL FUNCTIONS –
SPSS, KAPLAN-MEIER
FACTOR = GENDER
KM Terms_enrolled BY Gender
  /STATUS=graduated(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
KAPLAN-MEIER SURVIVAL TABLE
This is an example of the survival table
produced by the Kaplan-Meier
procedure.
Kaplan-Meier Survival Probability
Estimate calculation example:
Interval 4: Cumulative Proportion Surviving =
# remaining ÷ # at risk =
[# at start of interval – (# censored + # of events)]
÷ [# at start of interval – # censored] =
[46 – (2 + 1)] ÷ (46 – 2) = 43 ÷ 44 = 0.977
Interval 5: Proportion surviving the interval =
[43 – (2 + 2)] ÷ (43 – 2) = 39 ÷ 41 = 0.951
Cumulative Proportion Surviving = (39 ÷ 41) x (43 ÷ 44)
= 0.930
Kaplan-Meier Survival Table: The
survival probability estimate for
each time period, except the first, is
a compound conditional probability
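The compounding in the worked example can be sketched in a short loop. The counts come from the slide (46 and 43 students at the start of intervals 4 and 5); the starting value of 1.0 assumes no events before interval 4, which is how the slide treats it. The variable names are illustrative.

```python
# Interval counts from the worked example on the slide.
intervals = [
    {"interval": 4, "at_start": 46, "censored": 2, "events": 1},
    {"interval": 5, "at_start": 43, "censored": 2, "events": 2},
]

cumulative = 1.0  # assumption: no graduations before interval 4
results = {}
for row in intervals:
    # Censored students are removed from the risk set before the ratio is taken.
    at_risk = row["at_start"] - row["censored"]
    conditional = (at_risk - row["events"]) / at_risk
    # Each estimate compounds the conditional probabilities so far.
    cumulative *= conditional
    results[row["interval"]] = (conditional, cumulative)

for interval, (conditional, cum) in results.items():
    print(f"interval {interval}: conditional {conditional:.3f}, cumulative {cum:.3f}")
```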
In this way the fudging
is kept conceptual,
systematic, and
automatic.
Kaplan & Meier, 1958
Kaplan-Meier Results – Gender
Null Hypothesis: Female Curve = Male Curve
KAPLAN-MEIER
OUTPUT
Log Rank weights
all graduations
equally
Breslow gives
more weight to
earlier
graduations
Tarone-Ware is a
mixture of the two
Kaplan-Meier Results – Gender
Null Hypothesis: Female Curve = Male Curve
Curves not
significantly
different at p < .05
COX REGRESSION (PROPORTIONAL HAZARDS)
• Measures influence of explanatory variables
• Most used Survival analysis method
• Only time independent variables are appropriate
• Assumptions: Hazards are proportional
COX REGRESSION, CHECKING PROPORTIONAL
HAZARDS ASSUMPTION
Repeat for
each factor!
SPSS
• Analyze
• Survival
• Cox Regression
COX REGRESSION:
USE LOG MINUS LOG FUNCTION TO CHECK
PROPORTIONAL HAZARDS ASSUMPTION
Do not use Cox
Regression if the
curves cross. This
means the hazards are
not proportional.
COX REGRESSION MODEL – EXAMPLE,
GENDER

SPSS
• Analyze
• Survival
• Cox Regression
• (move gender to Covariates box)
COX REGRESSION MODEL RESULTS:
EXAMPLE, GENDER
Interpretation of SPSS Cox
Regression Results:
• The reference category is
female because I made that
choice for this model
• The difference between the
female and male survival
curves is not statistically
significant at p < 0.05
Exp(B) = Hazard
ratio: Female vs. Male
The null hypothesis is
that this ratio = 1.
Hazard Ratio = e^B = e^(-0.04) = 0.961
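The Exp(B) column is just the exponential of the coefficient. The short sketch below recomputes it from the rounded coefficient shown on the slide (B = -0.04) and expresses the result as a percent difference in hazard relative to the reference category. The variable names are illustrative.

```python
import math

b = -0.04  # Cox regression coefficient for gender, as rounded on the slide

hazard_ratio = math.exp(b)
# Percent difference in the hazard of graduating versus the reference category.
percent_difference = (hazard_ratio - 1) * 100

print(f"hazard ratio = {hazard_ratio:.3f}")
print(f"percent difference in hazard = {percent_difference:.1f}%")
```

A ratio this close to 1 is consistent with the slide's conclusion that the gender difference is not statistically significant, although significance itself comes from the test reported in the SPSS output, not from this arithmetic.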
COX REGRESSION MODEL RESULTS: PELL
GRANT RECIPIENTS VS. NON-PELL GRANT
RECIPIENT
Tip: To edit the
default chart,
click on the
chart until the
“Chart Editor”
opens
Per Kaplan-Meier
estimation, the
Pell Grant
student curve
is not equal to the
non-Pell Grant
student curve;
the difference is
highly
significant at p
< .001
COX REGRESSION MODEL RESULTS: PELL
GRANT RECIPIENTS VS. NON-PELL GRANT
RECIPIENT
Pell Grant Recipients
1. Work more hours than non-Pell Grant Recipients
2. Pell Grant Recipients with similar GPAs to non-Pell
Grant Recipients have attempted 10 more units
SUMMARY
Survival Analysis provides the following:
• Handles both censored data and a time variable
• Life table
• Graphical representation of trends
• Kaplan-Meier survival function estimator
• Survival comparison between 2 or more groups (a p value is
produced that indicates whether the difference between curves
is significant)
• Regression models – relationships between variables and
survival times
Descriptive power of survival analysis :
Terms Enrolled by 1st Term GPA – Using Survival Graph (K-M) to
display data
At end of 12th term:
~ 34% probability of
continued enrollment
~ 9% probability of
continued enrollment
REFERENCES
Dunn, S. (2002). Kaplan-Meier Survival Probability Estimates. Retrieved from http://vassarstats.net/survival.html
Harris, S. (2009). Additional Regression techniques, October 2009. Retrieved from http://www.edshare.soton.ac.uk/id/document/9437
Newell, J. & Hyun, S. (2011). Survival Probabilities With and Without the Use of Censored Failure Times. Retrieved from https://www.uscupstate.edu/uploadedFiles/Academics/Undergraduate_Research/Reseach_Journal/2011_007_ARTICLE_NEWELL_HYUN.pdf
Singh, R., Mukhopadhyay, K. (2011). Survival analysis in clinical trials: Basics and must know areas. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3227332/t
Wiorkowski, J., Moses, A., & Redlinger, L. (2014). The Use of Survival Analysis to Compare Student Cohort Data. Presented at the 2014 Conference of the Association of Institutional Research
Contact Info: jlove@ponet.ucla.edu
Thank you!