Woods 1
Senior Project
An Exploration of Hazard Functions:
Applied to the Analysis of the Displacement Spells of Older Workers
Royce Woods
Fall 2011
Abstract
This study explores duration analysis and applies it to an economic question. More
specifically, through the use of hazard functions, the displacement spells of older and younger
workers can be effectively juxtaposed. Models similar to those employed in Chan and
Stevens (2001) and Johnson and Mommaerts (2010) are used here. However, instead of the
Health and Retirement Study (HRS) or the Survey of Income and Program Participation (SIPP),
this study uses the Displaced Worker Supplement (DWS) Census data set. Also, instead of
estimating several probability functions, it implements two hazard models. As a result of
these changes, its results diverge from those of previous literature. The most
striking disparity is that age seems to have no effect on the hazard rate of reemployment.
Table of Contents
Introduction
The Hazard Function
Application
Literature Review
Theoretical Framework
Model Specification and Data
Data
Model Interpretation
Conclusion
Works Cited
Appendix
Introduction
Unlike other econometric methods, duration analysis enables one to examine the time
elapsed until an event occurs. Several methods fall under this type of analysis, and this
study focuses primarily on hazard functions. A hazard function gives the probability that an
event occurs in a given interval of time, given that it has not occurred before that interval.
The hazard function offers a direct way to examine state-change probabilities over various
intervals of time. Perhaps the only alternative would be to estimate a separate probability
function for each time period since the initial state, which is not efficient. For that reason,
the hazard function is particularly powerful.
It is also powerful because of its wide range of applications. Hazard functions can be applied
in any setting where the time elapsed until an event of interest occurs is important, such as
estimating the failure rates of products or the death rates of mice in an experiment.
Clearly the applications are vast.
Here reemployment is examined using hazard functions. More specifically, this
study focuses on the probability that individuals are reemployed over a certain spell of time,
given that they are displaced, and it seeks the hazard rate of reemployment over consecutive
intervals of time. For a specific interval, dividing the probability of reemployment in that
interval by the probability that an observation remained in the initial state until that interval
gives the probability of reemployment during that interval. The hazard function, in its entirety,
is the collection of these points. This study primarily explores the effect age has on the
duration of displacement, and in a duration analysis of this sort, hazard functions are a prime
method of estimation.
The Hazard Function
The hazard function is defined as the ratio of the probability density function to the
survival function of a random variable. The probability density function describes the chance
that an event occurs at a given time. The survival function, on the other hand, is formally
defined as one minus the cumulative distribution function; in other words, it is the probability
that an observation survives until the interval of interest. The cumulative distribution function
is essentially the sum (for a continuous variable, the integral) of the probability density function
up to the time in question, that is, the chance that the event occurs prior to the interval of
interest. Dividing the probability density function by the survival function yields the hazard
function (Alemi 2007).
For example, in discrete time, where “t” indexes the intervals of the analysis and “X” (a
random variable) is the interval in which an observation leaves the initial state, the hazard
function is (Alemi 2007):

$$h(t) = \frac{P(X = t)}{P(X \geq t)}$$
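As a minimal sketch of this discrete-time formula, the Python function below converts a hypothetical probability mass function for the interval in which an observation leaves the initial state into hazard rates. It divides each $P(X = t)$ by the running probability of still being in the initial state, $P(X \geq t)$. The function name and input values are illustrative only and are not estimates from any data set.

```python
def discrete_hazard(pmf):
    """Return h(t) = P(X = t) / P(X >= t) for each interval t.

    pmf[k] is the (hypothetical) probability that an observation leaves the
    initial state in interval k + 1; the entries should sum to one.
    """
    hazards = []
    at_risk = 1.0  # P(X >= t): still in the initial state at the start of interval t
    for p in pmf:
        hazards.append(p / at_risk if at_risk > 0 else float("nan"))
        at_risk -= p  # those who left during interval t are no longer at risk
    return hazards


# Half leave in the first interval, a quarter in each of the next two.
print(discrete_hazard([0.5, 0.25, 0.25]))  # [0.5, 0.5, 1.0]
```

The last hazard is 1.0 even though the unconditional probability for that interval is only 0.25: everyone still at risk at the start of the third interval leaves during it.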
One of the methods used to implement the hazard function is the Kaplan-Meier estimator.
It calculates the probability an observation does not leave the initial condition during a given
interval of time. Essentially, it is a ratio of those surviving the interval of time in question, over
all the observations that are at risk. It is formally defined as:
$$S(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)$$
Here $t_i$ is the $i$-th time at which transitions are observed, $d_i$ is the number of
observations that transition out of the initial condition at $t_i$, and $n_i$ is the number of
observations still at risk just before $t_i$. Each factor is the probability that an observation
does not leave the initial state during the interval in question, given that it is in the initial
state at the start of the interval (Frankin 2006). The cumulative hazard then follows from the
survival estimate (Stats Direct 2011):

$$H(t) = -\ln S(t)$$
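The product-limit calculation and the $-\ln S(t)$ transformation can be sketched in a few lines of Python. This is an illustrative implementation written for this example, not the routine of any particular package; the spell data are hypothetical, and an observation censored at the same time as an event is counted as still at risk at that time, which is a common convention.

```python
import math


def kaplan_meier(durations, events):
    """Kaplan-Meier survival S(t) and cumulative hazard H(t) = -ln S(t).

    durations: time each observation was followed (e.g. weeks without work)
    events:    1 if the observation left the initial state at that time, 0 if censored
    Returns a list of (t_i, S(t_i), H(t_i)) at each distinct event time t_i.
    """
    pairs = list(zip(durations, events))
    event_times = sorted({t for t, e in pairs if e == 1})
    survival = 1.0
    table = []
    for t in event_times:
        n_i = sum(1 for ti, _ in pairs if ti >= t)             # at risk just before t
        d_i = sum(1 for ti, e in pairs if ti == t and e == 1)  # transitions at t
        survival *= 1 - d_i / n_i
        cum_hazard = -math.log(survival) if survival > 0 else math.inf
        table.append((t, survival, cum_hazard))
    return table


# Hypothetical spells: five workers, two of them censored (still displaced).
for row in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(row)
```

For these five spells the survival estimates are 0.8, 0.6, and 0.3 at weeks 1, 2, and 3. The censored spells reduce the number at risk without ever counting as transitions.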
The primary drawback of this method is that it is not suited to continuous predictors,
because it produces a separate estimate for each level of the predictor in question. For the
same reason, it is limited in the number of regressors that can be added: a different survival
curve is estimated for each covariate group. It is also nonparametric, meaning that, since no
distribution is fitted, it is not as precise as its parametric and semiparametric counterparts.
For those reasons, other methods must be explored (Introduction to Survival Analysis).
There are other methods that are variants of the accelerated life model (“Accelerated
Life Models”). The primary difference between these methods is the distribution assumed for
the time to the event of interest and for the error terms. The applicable distributions for the
time elapsed until the event of interest are the Weibull, log-logistic, exponential, and
log-normal, and each variant of the accelerated failure time model carries different assumptions.
For instance, the log-logistic variation has assumptions that set it apart from the other
versions of this model. It presumes that the survival time of an observation is log-logistically
distributed (equivalently, that the error term of the log-time equation is logistically
distributed), and it allows the hazard to vary non-monotonically over time. In other words, the
hazard need not move in one direction throughout. The hazard rate for the log-logistic hazard
model is represented as (“Parametric Models”):

$$h_i(t, X) = \frac{\lambda_i^{1/\gamma}\, t^{(1/\gamma) - 1}}{\gamma\left[1 + (\lambda_i t)^{1/\gamma}\right]}$$

where

$$\lambda_i = e^{-X_i \beta}$$
As illustrated above, the log-logistic hazard model has two parameters, “λ” and “γ,”
which represent its location and shape, respectively. More specifically, “λ” is the rate at which
observations leave the initial state during interval “t,” given that they are in the initial state at
the beginning of the interval, while “γ” determines whether the hazard is rising or falling. To be
precise, if γ < 1, the hazard rate first rises over time and then falls; if γ ≥ 1, the hazard is
monotonically declining. The log-logistic hazard model can never be monotonically increasing
(“Parametric Models”).
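The two shapes can be checked numerically with a short Python sketch of the log-logistic formula. It assumes a single regressor $x$ with a hypothetical coefficient, so that $\lambda_i = e^{-x\beta}$; with $x = 0$, $\lambda_i = 1$. The function name is illustrative.

```python
import math


def loglogistic_hazard(t, x, beta, gamma):
    """Log-logistic hazard lam**(1/g) * t**(1/g - 1) / (g * (1 + (lam*t)**(1/g))).

    lam = exp(-x * beta); x and beta are single, hypothetical values here.
    """
    lam = math.exp(-x * beta)
    k = 1.0 / gamma
    return lam ** k * t ** (k - 1) / (gamma * (1 + (lam * t) ** k))


# gamma < 1: the hazard rises, then falls.  gamma >= 1: it only declines.
for gamma in (0.5, 1.0):
    print(gamma, [round(loglogistic_hazard(t, 0.0, 0.0, gamma), 3) for t in (0.5, 1, 3)])
```

With γ = 0.5 the hazard peaks at t = 1 in this example, while with γ = 1 it reduces to $\lambda/(1 + \lambda t)$, which falls steadily.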
The parameter “σ” in the log-normal version of this model plays a role analogous to that of
“γ” in the log-logistic version. However, the log-normal specification differs because it
assumes a log-normal distribution for the survival time, that is, a normal distribution for the
error terms of the log-time equation. It includes “Φ,” the standard normal cumulative
distribution function, and $\mu = X\beta$ (“Parametric Models”):

$$h(t) = \frac{\dfrac{1}{t\sigma\sqrt{2\pi}} \exp\left[-\dfrac{\{\ln(t) - \mu\}^2}{2\sigma^2}\right]}{1 - \Phi\left\{\dfrac{\ln(t) - \mu}{\sigma}\right\}}$$
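A Python sketch of the log-normal hazard needs only the standard library: the numerator is the log-normal density, and $1 - \Phi(z)$ can be computed as $\tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$ with `math.erfc`. The values of $\mu$ and $\sigma$ below are hypothetical, and the function name is illustrative.

```python
import math


def lognormal_hazard(t, mu, sigma):
    """Log-normal density at t divided by 1 - Phi((ln t - mu) / sigma)."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return density / survival


# mu = X * beta; a hypothetical mu = 0 and sigma = 1.
print([round(lognormal_hazard(t, 0.0, 1.0), 3) for t in (0.1, 1.0, 10.0)])
```

Using `erfc` rather than `1 - Phi` avoids losing precision in the upper tail, where the survival probability is small.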
Also, there is the exponential case to consider. This is the simplest case, in which the
conditional probability of an event does not change over time. It can be expressed as
(“Parametric Models”):

$$h(X) = \lambda_i = e^{X_i \beta}$$
The exponential distribution is a special case of the Weibull variant of this method: when the
Weibull hazard rate is constant, it reduces to the exponential variant. Generally speaking, the
Weibull variant assumes that the hazard function varies monotonically over time; in simpler
terms, it moves in only one direction. Whether the function is monotonically increasing or
decreasing depends on whether “p” is greater than or less than one, respectively. If “p” is
equal to one, the hazard is constant. Its representation is also clearly different, as shown
below (“Parametric Models”).

$$h_i(t, X) = \lambda p (\lambda t)^{p-1}$$

where

$$\lambda = e^{X_i \beta}$$
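A short Python sketch makes the role of “p” concrete. It evaluates the Weibull hazard above for a hypothetical regressor value and coefficient. With p = 1 the output is the constant exponential hazard $\lambda = e^{X_i\beta}$ at every duration. The function name is illustrative.

```python
import math


def weibull_hazard(t, x, beta, p):
    """Weibull hazard lam * p * (lam * t)**(p - 1) with lam = exp(x * beta)."""
    lam = math.exp(x * beta)
    return lam * p * (lam * t) ** (p - 1)


# Hypothetical single regressor x = 1 with beta = 0.3.
for p in (0.5, 1.0, 2.0):
    print(p, [round(weibull_hazard(t, 1.0, 0.3, p), 3) for t in (1, 2, 4)])
```

The rows show a falling hazard for p = 0.5, a flat one for p = 1, and a rising one for p = 2.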
In spite of the clear differences among the accelerated life model's variants, they are
predicated on the same idea: each can be interpreted in a similar fashion to a standard semilog
model. This notion is illustrated in the function below (“Parametric Models”).

$$\ln(T) = X\beta + \varepsilon$$

Interpretation of the accelerated failure time model centers on the acceleration factor.
Exponentiating the equation above gives (“Parametric Models”):

$$T = e^{X\beta} e^{\varepsilon}$$

Therefore, if a variable $X_k$ is altered by $\delta$, the survival time ratio becomes (“Parametric
Models”):

$$\frac{T(X_k + \delta)}{T(X_k)} = e^{[(X_k + \delta) - X_k]\beta_k} = e^{\beta_k \delta}$$
Here $e^{\beta_k \delta}$ is the acceleration factor, or time ratio, and it captures the effect of a
variable on the duration that a subject remains in the initial state (“Accelerated Life Models”).
In other words, it conveys the effect a variable has on survival time (“Parametric Models”).
When the non-exponentiated coefficient is negative, the variable shortens survival time, and the
opposite occurs when it is positive. From there, the coefficients can be exponentiated to
produce the time ratio, and subtracting one from the time ratio gives the proportional change in
expected survival time, which becomes a percentage change once multiplied by 100 (“Parametric
Models”). The main drawback of this method is that unless the Weibull or exponential variant is
used, the hazard ratio cannot be found, which makes it impossible to determine the effect a
regressor has on the hazard rate.
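The arithmetic of the time ratio can be sketched in Python. All numbers are hypothetical: a coefficient of 0.05 on $X_k$, plus fixed values for the other terms of $X\beta$ and for the error draw. The sketch shows that $T(X_k+1)/T(X_k)$ equals $e^{\beta_k}$ regardless of those fixed values.

```python
import math


def time_ratio(beta_k, delta=1.0):
    """Acceleration factor T(X_k + delta) / T(X_k) = exp(beta_k * delta)."""
    return math.exp(beta_k * delta)


def pct_change_survival_time(beta_k, delta=1.0):
    """Percentage change in expected survival time: 100 * (time ratio - 1)."""
    return 100.0 * (time_ratio(beta_k, delta) - 1.0)


def survival_time(x_k, beta_k, other_terms=1.3, eps=0.7):
    """T = e^{X beta} e^{eps}; other_terms and eps are hypothetical fixed values."""
    return math.exp(other_terms + x_k * beta_k) * math.exp(eps)


beta_k = 0.05  # hypothetical coefficient on X_k
# Holding the other terms and the error fixed, the ratio equals the acceleration factor.
print(survival_time(41.0, beta_k) / survival_time(40.0, beta_k), time_ratio(beta_k))
print(pct_change_survival_time(beta_k))  # roughly +5.1 percent per one-unit increase
```

A negative coefficient gives a time ratio below one and therefore a negative percentage change, matching the sign rule in the text.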
The proportional hazard model’s strength is its focus on the effect of the regressors on
the hazard rate, which can be obscured in the accelerated life models. Instead of measuring the
effect of a regressor by multiplying the predicted event time by an acceleration factor,
proportional hazard models measure the effect of a regressor through the hazard ratio,
which is much more straightforward. The coefficients produced by the Weibull accelerated time
model can be converted into hazard ratios by negating the parameter estimates, dividing them by
the scale parameter (equivalently, multiplying them by the shape parameter “p”), and then
exponentiating the result (Crumer 2011).
However, this transformation is only applicable to the Weibull and exponential variants
of this model, because these variants, despite being accelerated time models, also satisfy the
proportionality assumption. This assumption means that the hazard function differs from the
baseline hazard by a constant factor. For that reason, hazard ratios cannot be derived from the
vast majority of accelerated failure time models (“Parametric Models”).
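The Weibull conversion can be verified numerically in Python. The sketch assumes the standard Weibull accelerated failure time form $\ln T = x\beta + \sigma W$, where $W$ is a standard extreme-value error. In that form the shape is $p = 1/\sigma$, and in the accelerated time metric $\lambda = e^{-x\beta}$, which reverses the sign used in the proportional hazard form of the Weibull formula earlier. The coefficient and scale values are hypothetical. The ratio of hazards for $x = 1$ and $x = 0$ is the same at every duration and equals $e^{-\beta/\sigma}$.

```python
import math


def weibull_aft_hazard(t, x, beta, sigma):
    """Hazard implied by the Weibull AFT model ln T = x*beta + sigma*W.

    Written as lam * p * (lam * t)**(p - 1) with shape p = 1 / sigma and
    lam = exp(-x * beta) (minus sign because of the AFT parameterization).
    """
    p = 1.0 / sigma
    lam = math.exp(-x * beta)
    return lam * p * (lam * t) ** (p - 1)


def aft_to_hazard_ratio(beta, sigma):
    """Hazard ratio for a one-unit increase in x: exp(-beta / sigma) = exp(-p * beta)."""
    return math.exp(-beta / sigma)


beta, sigma = 0.2, 0.8  # hypothetical AFT coefficient and scale
for t in (0.5, 1.0, 4.0):
    ratio = weibull_aft_hazard(t, 1.0, beta, sigma) / weibull_aft_hazard(t, 0.0, beta, sigma)
    print(t, round(ratio, 6), round(aft_to_hazard_ratio(beta, sigma), 6))
```

A positive AFT coefficient lengthens survival time, so the resulting hazard ratio is below one.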
Also, the Cox proportional hazard model does not rely on probability density functions
from parametric distributions. Accelerated failure time models, on the other hand, rely on a
number of assumptions regarding the distribution of the durations and error terms, which, if
violated, can undermine the validity of the results. The Cox model works with the probability
that an observation leaves the initial state given that it is at risk and leaves the baseline
hazard unspecified, which frees it of the burdensome assumptions present in accelerated time
models (Monogan 2010).
For those reasons, it is prudent to implement the hazard function in this study with the
Cox proportional hazard model. It is called proportional because the hazard function is
proportional to the baseline hazard. The baseline hazard is the hazard function without the
explanatory variables; in other words, it is the hazard at any point in time when the
regressors are all zero. As a result, for the assumption of proportionality to be met, the
effects of the explanatory variables must be constant through time. If this is not the case,
the offending variables must be stratified in the model to accommodate that (Mason 2005).
The basic form of the Cox proportional hazard model (Mason 2005):

$$h(t) = h_0(t)\, e^{X_i \beta}$$

It can also be expressed as (Fox 2002):

$$\log h(t) = \alpha(t) + X_i \beta$$

The baseline hazard is the bolded portion of the equation below (Mason 2005):

$$h(t) = \boldsymbol{h_0(t)}\, e^{X_i \beta}$$

The proportionality condition asserts that the basic form of the Cox model is valid. In
other words the expression below must be true (Mason 2005).

$$\frac{h(t)}{h_0(t)} = e^{X_i \beta}$$
Stratification allows the form of the hazard function to vary across the levels of the
stratified variables. For instance, suppose that there is a variable that does not meet the
proportionality condition but has significance in the model. Stratifying on it adjusts the model
for that variable without explicitly estimating its effect on the outcome; in other words, no
coefficient is fitted for it. Suppose the variable is “Z,” with its levels indexed by the
subscript “j.” The model would look like this (“Cox Proportional Hazards Model” 2004):

$$h(t \mid X, Z = j) = h_j(t)\, e^{X_i \beta}$$
The Cox model is thus a form of nonlinear regression, used to explain a nonlinear
relationship between variables. This type of estimation is useful for distinguishing the
contributions of the various regressors in the context of a hazard function, and for that reason
it is implemented here.
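A rough illustration of how a stratified Cox model is fit appears below. The NumPy sketch maximizes the partial log-likelihood for a single covariate by Newton-Raphson, summing over strata so that each stratum keeps its own unspecified baseline hazard $h_j(t)$. It is not the estimation routine used in this study. The data are simulated, the covariate `x` merely stands in for a regressor such as Age, and `cox_newton` and `simulate_spells` are hypothetical names. The sketch also assumes no tied durations. Weekly spell data such as weeks without work contain ties, which call for a correction such as Breslow's or Efron's.

```python
import numpy as np


def cox_newton(time, event, x, strata=None, iters=50, tol=1e-10):
    """Fit h(t | x, Z=j) = h_j(t) * exp(beta * x) for one covariate by Newton-Raphson.

    Maximizes the Cox partial log-likelihood, summed over strata when `strata`
    is given (one unspecified baseline h_j(t) per stratum). Assumes no tied times.
    """
    time, event, x = (np.asarray(a, dtype=float) for a in (time, event, x))
    strata = np.zeros(len(time)) if strata is None else np.asarray(strata)
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for s in np.unique(strata):
            m = strata == s
            order = np.argsort(-time[m])   # longest spells first
            d, z = event[m][order], x[m][order]
            w = np.exp(beta * z)
            s0 = np.cumsum(w)              # sums over the risk set {j: t_j >= t_i}
            s1 = np.cumsum(w * z)
            s2 = np.cumsum(w * z * z)
            zbar = s1 / s0                 # risk-set weighted mean of x
            score += np.sum(d * (z - zbar))
            info += np.sum(d * (s2 / s0 - zbar ** 2))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def simulate_spells(n, beta, seed=0):
    """Hypothetical data: two strata with different constant baselines, shared beta."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    stratum = rng.integers(0, 2, size=n)
    base = np.where(stratum == 0, 0.1, 0.4)          # h_j(t), constant for simplicity
    t_event = rng.exponential(scale=1.0 / (base * np.exp(beta * x)))
    t_censor = rng.uniform(0.0, 30.0, size=n)        # e.g. end of the observation window
    time = np.minimum(t_event, t_censor)
    event = (t_event <= t_censor).astype(float)
    return time, event, x, stratum


time, event, x, stratum = simulate_spells(2000, beta=0.5)
print("stratified estimate:", round(cov := cox_newton(time, event, x, strata=stratum), 3))
```

The estimate should land near the simulated value of 0.5. Exponentiating it gives the hazard ratio for a one-unit increase in `x`, which is the quantity the text interprets.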
Application
The hazard function is an appropriate tool for examining the reemployment of older displaced
workers. It allows one to estimate the probability of a worker gaining reemployment in a certain
interval of time, given that he is still displaced at the start of that interval. This is a powerful
tool that will aid in examining the consequences of displacement for older workers. Due to the
nature of the procedure, it can help in determining the reasons behind the disparity between the
unemployment spells of older and younger displaced workers.
In exploring this idea, it is necessary to cover the known facts, beginning with a
definition of displacement. A worker is displaced if he is unemployed because his firm closed,
downsized, or moved away. It is important to isolate displaced workers, since non-displaced
workers may not be seeking reemployment as vigorously. For instance, a worker is more likely
to leave the labor force if he is unemployed and non-displaced. This concern only increases with
age due to the increasing possibility of retirement.
Also, it is evident that as one ages, one's probability of working decreases. This is
especially clear in older workers' likelihood of being reemployed. According to the Urban
Institute, workers aged 50 to 61 who lost their jobs between 2008 and 2009 were 33 percent less
likely to be reemployed than their younger counterparts (Johnson and Park 2011).
Furthermore, those older workers who managed to obtain reemployment sustained deep cuts in
compensation (Johnson and Park 2011). In fact, the new median wage for older reemployed
workers was 36 percent below the old wage (Johnson and Park 2011). It could be the case that
older workers have a more difficult time in the labor market than younger workers.
There are various reasons why older labor force participants may spend longer
periods of time displaced. It is possible that this fast-paced, technology-dominated world has
alienated many older workers. In addition, their reservation wages tend to be higher than those
of their younger counterparts, due to the perceived increased value of their skills. For these
reasons, employers may not wish to hire workers of an advanced age.
The idea that the probability of reemployment plummets as age increases is prevalent in
the media. Schoen, a contributor to MSNBC, noted that the ‘mass layoffs’ often reported consist
largely of older workers (Schoen 2011). These displaced older workers are relatively vocal
about their strife and diminished opportunities. On the other hand, some older workers may
choose to spend a period of time displaced or out of the work force due to their lower perceived
job search cost and higher propensity to retire.
It may be the case that reemployment tends to be more difficult for older workers. If so,
that could be considered problematic, and there are many possible causes to consider. For that
reason, it would be interesting to test whether older workers have a more difficult time gaining
reemployment by examining their displacement spells with the hazard function.
Literature Review
Because the hazard function is integral to this study, further exploration of its uses is
worthwhile. In addition to matters of employment such as reemployment and various other
economic questions, it can be applied to a myriad of areas: hazard functions are used in quality
engineering, biology, and practically every other area where failure rates need to be examined.
For instance, hazard functions can be used to analyze the completion rates of older,
non-traditional students in higher education. In recent times, older students have made up a
larger portion of total undergraduates in the United States, according to the United States
Department of Education’s 2003 study. Various studies have shown that older students have a
lower probability of completing their degree or certification, for reasons that generally revolve
around the idea that the opportunity cost of further education increases as one ages. However,
Calcagno et al. (2007) consider the possibility that older students' increased dropout rate results
from their skills rusting after spending significant amounts of time away from the educational
system.
To test whether older students’ hazard rate of degree completion in a given term is lower
than that of traditional students because of diminished skills, Calcagno et al. (2007) employ a
discrete-time hazard model, controlling for cognitive mathematics ability. Once the longitudinal
dependent variable, which indicates completion, is satisfied, the student in question is deleted
from the sample in future time periods. Students who stop out are also eliminated from the
sample. In general, their hazard model calculates the probability that a student completes his
degree or certification in a given term, given that he has not done so in the past.
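The data layout behind this kind of discrete-time hazard model can be sketched in Python. This is a minimal sketch with hypothetical students, not Calcagno et al.'s (2007) data or code. Each spell is expanded into one row per term at risk, and a student leaves the data after completing or stopping out. A discrete-time hazard model, for example a logit of the outcome on period indicators and covariates, would then be fit to these rows; that regression step is not shown.

```python
from collections import defaultdict


def person_period(spells):
    """Expand (id, periods_observed, completed) spells into one row per period at risk.

    The outcome y is 1 only in the period the event occurs; the unit leaves the
    data after that period, or after its last observed period if censored.
    """
    rows = []
    for unit, periods, completed in spells:
        for period in range(1, periods + 1):
            y = 1 if completed and period == periods else 0
            rows.append((unit, period, y))
    return rows


def empirical_hazard(rows):
    """Share of units still at risk in each period that experience the event."""
    at_risk, events = defaultdict(int), defaultdict(int)
    for _, period, y in rows:
        at_risk[period] += 1
        events[period] += y
    return {p: events[p] / at_risk[p] for p in sorted(at_risk)}


# Hypothetical students: A completes in term 3, B stops out after term 2.
rows = person_period([("A", 3, True), ("B", 2, False)])
print(rows)
print(empirical_hazard(rows))
```

Student B contributes rows only for the terms in which he was at risk, which mirrors the rule of deleting students from later periods once they stop out.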
Calcagno et al. (2007) found that the erosion of older students' skills is the primary
cause of their lower probability of completing their degree or certificate in a given term.
Previous studies had already indicated that older students have a lower probability of
completing their degrees or certifications. However, when cognitive mathematics ability is
controlled for, older students have a higher hazard rate of completion. This supports the notion
that factors other than age itself are responsible for the lower completion rates of older students.
This is just one other area where hazard functions can aid in analysis.
Hazard functions can even handle subjects like war. Collier et al. (2004)
examine the duration of civil wars. They note that civil wars tend to last longer than other
types of warfare; in fact, they tend to last over six times longer than international conflicts.
Clearly, the length of civil wars is a problem, and to understand how it can be remedied, it is
necessary to understand its causes. In other words, Collier et al. (2004) seek to examine the
determinants of the duration of civil wars.
To do this, they implement a hazard function that determines the probability that a war
ends in a given month. Collier et al. (2004) include numerous variables to control for
demographics, among other factors. With the information yielded from this study, they were
able to derive sound conclusions that could likely lead to solutions.
Through their study, Collier et al. (2004) identify the primary determinants of
prolonged civil war: high income inequality, low per capita income, and ethnic division. On the
other hand, the factors that shorten civil wars are a decline in the prices of exports and external
military intervention. In other words, the key to quelling such a conflict is to intervene; without
such intervention, civil wars can persist, in some cases, for generations. This is yet another
interesting finding resulting from the implementation of hazard functions.
Price (2009) uses hazard functions to analyze the relationship between obesity and crime.
Obese individuals are viewed as disadvantaged and, due to the constraints resulting from
obesity, are hypothesized to be more likely to commit crimes at a younger age. Compounding
this, obese people tend to have lower wages and less overall occupational attainment, and
individuals with those characteristics are generally more likely to commit crimes.
To test this idea, Price (2009) implements a hazard function, essentially calculating the
probability that an obese individual transitions from legitimate labor market activities to a first
conviction within a given year. This hazard is the product of the probability that an obese
individual is presented with the opportunity to commit a crime and the probability that the
returns from that crime are high enough relative to his reservation wage. To analyze this
econometrically, Price (2009) uses a Cox proportional hazard model.
His results are consistent with his hypothesis: obese individuals have a higher probability
of committing a crime in a given year. The various measures of obesity he uses show a positive
relationship with the probability of conviction in his hazard function.
To develop a basis for this study, it is necessary to consult relevant articles on the
subject. For that reason, several articles were gathered to illustrate both the application of
relevant economic theory to the subject matter and the various uses of hazard functions. All of
the articles gathered use hazard functions, though some of those applications do not strictly
pertain to the matter of reemployment. The most relevant articles, however, deal primarily with
the economic theory and statistical methods associated with the hazard function.
For instance, the appropriate theory and relevant statistical methods are more than
adequately addressed in an article by Chan and Stevens (2001), who explore the employment
consequences of being displaced at an advanced age. Given that job loss rates among older
workers rose considerably between 1981 and 1993 (Farber 1997), their prospects for
reemployment are of the utmost importance. Prior to this paper, many studies had excluded older
workers from post-displacement reemployment studies, likely because the looming possibility of
retirement complicates matters: an older worker may retire solely as a result of job loss.
Chan and Stevens (2001) deal with the issue of retirement by estimating hazard models for
displaced and non-displaced unemployed workers, comparing the groups to highlight the effects
of displacement on reemployment. If a worker is displaced, the chances are greater that leaving
the job was not his choice, meaning he is more likely to be seeking reemployment. On the other
hand, the non-displaced unemployed may not necessarily be seeking reemployment. Chan and
Stevens (2001) specify their hazard functions as probit models giving the probability of an
individual returning to work in a given month, for displaced and non-displaced workers, to
contrast the two groups. This model allows them to examine the effects of displacement on
reemployment over time while controlling for various worker characteristics. They use a sample
of 9,668 observations from the Health and Retirement Study (HRS), 1,668 of which are
displaced workers. The workers' ages range from 50 to 66+.
Chan and Stevens (2001) found a vast disparity between the probabilities of displaced and
non-displaced older workers gaining reemployment. For instance, among 55-year-old men, just
60% of the displaced were reemployed two years after losing their jobs, while more than 80% of
the non-displaced unemployed were. This gap in reemployment remains through four years of
unemployment. Generally speaking, therefore, displacement is a detriment to older workers
gaining reemployment, according to their study.
Furthermore, Chan and Stevens (2001) rationalize these results theoretically. In essence,
they infer that older displaced workers can be forced out of the labor force as a result of their
displacement; in other words, they are much more likely to retire. This is because older workers
face diminished utility from continuing to search for employment or work, since their
compensation upon reemployment is generally substantially lower than at their previous
positions. As a result, older workers' incentive to work falls accordingly, and retirement
becomes more appealing.
The unemployment spells of displaced older and younger workers are compared directly
in Johnson and Mommaerts (2010), who examine the determinants of reemployment for displaced
workers. Citing the Bureau of Labor Statistics’ 2010 study, they note that displaced older
workers tend to remain unemployed longer than their younger counterparts. Given that older
workers make up a growing portion of the labor force, Johnson and Mommaerts (2010) seek to
further explore the duration of unemployment spells for displaced older workers.
To explore how reemployment varies by age, Johnson and Mommaerts (2010) employ a
hazard model using the Survey of Income and Program Participation (SIPP) Census panel data
set. With these data, they observe respondents from the time they are displaced until they find
employment, leave the labor force, leave the survey, or the survey ends. Observations that leave
the labor force, leave the survey, or have not found reemployment by the end of the survey are
censored. From there, they implement the hazard function by estimating a logit model for the
log odds of becoming reemployed, controlling for various demographic and financial factors.
Using these methods, Johnson and Mommaerts (2010) conclude that older displaced workers
have a lower probability of finding reemployment in a given month than their younger
counterparts. According to their findings, men aged 50 to 61 are 39 percent less likely than
those aged 25 to 34 to gain reemployment within six months of job loss, and similar results were
found in the estimation for women. However, the authors note that the reality could be much
worse for older workers, since their sample excludes displaced workers once they cease
searching for employment.
Johnson and Mommaerts's (2010) findings are consistent with those of Chan and
Stevens (2001): comparing the unemployment durations of older and younger workers, they find
that older displaced workers have a more difficult time gaining reemployment. This could be
rationalized as the result of older workers having higher reservation wages. These findings
clearly demonstrate the disparity between older and younger workers in the difficulty of finding
reemployment, which is central to this study.
This study builds on these works. Here the duration of displacement is examined with
the hazard function. However, unlike previous studies, this analysis uses a different data set, the
Displaced Worker Supplement (DWS), and employs the Cox proportional hazard model, as in
Price (2009). For the sake of comparison, an accelerated failure time model is also estimated.
Theoretical Framework
For further insight regarding the effect age has on reemployment and its various
implications, it is necessary to examine the relevant theory. With that in mind, the job search
and retirement models are implemented to aid in the explanation of the reasons behind the
econometric findings.
Generally speaking, the retirement and job search models aid in describing occurrences
in the labor market. The job search model deals primarily with the strategy used when seeking
employment. Essentially, the strategy one employs in the quest for employment depends upon
the reservation wage: the wage at which a worker chooses to take a job because the opportunity
cost of continuing the job search has become too high. Some of the relevant variables that alter
one's reservation wage are the availability of unemployment insurance, the distribution of job
offers, and the minimum wage rate. These variables, among others, all alter one's job search
strategy.
Furthermore, as a worker ages, retirement becomes more and more likely, so a
retirement model is also relevant to this research. Essentially, as one ages, the opportunity cost
of working becomes higher. This is especially true for workers of an advanced age, who largely
face deep pay cuts upon reemployment and may have partial Social Security benefits. This
clearly has an impact on their reemployment prospects.
Model Specification and Data
Because of its relevance to this study, the approach of Johnson and Mommaerts (2010) is
best suited for examining the reemployment of older workers. The use of hazard functions
appears to be prudent in estimating the probability of reemployment over a given period of
time. However, this study diverges from Johnson and Mommaerts (2010) in that it uses the
Displaced Worker Supplement (DWS) Census data set instead of the Survey of Income and
Program Participation (SIPP) Census data set. Also, instead of implementing the hazard function
with a logit model of the log odds of reemployment, a Cox proportional hazard model is
implemented here. This enables the juxtaposition of the two approaches.
The DWS data set should suffice for estimating the model, although it may not be possible
to control for all of the factors included in Johnson and Mommaerts's (2010) more extensive
study. Still, the DWS data set is relatively extensive and has all the variables that are absolutely
necessary for this estimation using a hazard function.
Another difference is that the SIPP Census data set used for the model in question covers
the years 1996-2007, while the DWS data set is more recent, extending to 2009. For the most
part, this difference appears to be negligible, with the exception of the atypical economic
environment. On the whole, the changes seem likely to be trivial and not to affect the previous
results.
In the end, a myriad of procedures could be used to examine the probability of reemployment of older workers. However, a hazard function best suits the needs of this study: it offers a relatively straightforward approach to examining the unemployment spells of displaced older workers. With the aid of the related literature, this study may contribute to the pool of knowledge regarding the employment prospects of older workers.
This study's model estimates

Reemployment Hazard (measured with Weeks without Work) = f(Age, Race, Education, Marital Status, Receiving Unemployment Benefits, Household Income, Job Tenure in Years, Year, Union Member),

where
Weeks without Work = The amount of time, in weeks, a worker has been unemployed.
Age = The age of an individual.
Race = A set of dummy variables indicating the race of an individual, defined as black, Hispanic, or other (white is the reference category).
Education = A set of dummy variables indicating the highest level of educational attainment a worker has achieved (less than high school is the reference category).
Marital Status = A dummy variable indicating whether a worker is married.
Receiving Unemployment Benefits = A dummy variable indicating whether a respondent is receiving unemployment benefits.
Household Income = A continuous variable representing the income a household receives.
Job Tenure = The amount of time, in years, a worker held his or her previous job.
Year = The year of displacement, defined as a set of dummy variables (2007 is the reference category).
Union Member = A dummy variable indicating whether an individual belongs to a union.
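In the Cox version of this specification, the hazard for worker i in week t is assumed to take the form h(t | x_i) = h0(t) exp(x_i'β), where x_i collects the variables listed above and the baseline hazard h0(t) is left unspecified. The coefficients β are estimated by maximizing the partial likelihood, which compares each worker who becomes reemployed in week t with everyone still without work at that point. The Python sketch below is illustrative only: it uses the Breslow convention for tied durations, a crude grid search instead of Newton-Raphson, and hypothetical toy data; it is not the routine that produced Table A-2.

```python
import math
from typing import Sequence


def cox_partial_loglik(beta: Sequence[float], times: Sequence[float],
                       events: Sequence[int], X: Sequence[Sequence[float]]) -> float:
    """Cox partial log-likelihood, Breslow convention for ties.

    times  -- spell length in weeks (e.g. WKSWO)
    events -- 1 if the spell ended in reemployment, 0 if censored
    X      -- one row of covariates per worker
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear index x_i'beta
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored spells contribute only through the risk sets
        # Risk set: workers whose spells had not ended before t_i (ties included)
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(denom)
    return loglik


# Hypothetical toy data: weeks without work, reemployed (1) or censored (0), one dummy covariate
times = [2, 5, 8, 12, 20]
events = [1, 1, 1, 1, 0]
X = [[0], [0], [1], [0], [1]]

grid = [k / 100 for k in range(-300, 301)]
b_hat = max(grid, key=lambda b: cox_partial_loglik([b], times, events, X))
print(b_hat, math.exp(b_hat))  # coefficient and implied hazard ratio
```

With real data one would rely on a survival package (for example SAS PROC PHREG, R's survival::coxph, or Python's lifelines) rather than a grid search; the sketch only shows what is being maximized.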
Based on the borrowed models and the theory applied to this subject matter, the variables in this model are summarized in the equation above. All of them are relevant to assessing one's probability of reemployment over a given interval of time.
The “Age” variable is the primary variable of interest. It is the main independent variable, since the primary focus of this study is testing whether the probability of being reemployed over a given interval of time differs among age groups. Chan and Stevens (2001) and Lahey (2008) have attributed the difference in reemployment prospects among age groups to statistical discrimination. In other words, for a given reservation wage, older workers are less attractive to employers than their younger counterparts. At the same time, “Age” is associated with a higher reservation wage, because older workers perceive their accumulated skills as more valuable. This creates a gap between the wages older workers are willing to accept and the wages they are offered. As a result, older workers are likely to face difficulty when seeking employment. For that reason, from a theoretical perspective, “Age” is expected to have a negative relationship with the probability of reemployment.
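The reservation-wage reasoning used for “Age” here, and for several of the variables below, follows the standard job-search logic: the weekly reemployment hazard equals the probability that an offer arrives times the probability that the offer exceeds the worker's reservation wage. The Python sketch below assumes log-normally distributed wage offers and purely hypothetical numbers; nothing in it is estimated from the DWS data.

```python
import math
from statistics import NormalDist


def reemployment_hazard(offer_rate: float, reservation_wage: float,
                        mu: float, sigma: float) -> float:
    """Weekly hazard = P(offer arrives) * P(offer wage >= reservation wage).

    Assumes log wage offers are normally distributed: log(w) ~ N(mu, sigma**2).
    """
    p_accept = 1.0 - NormalDist(mu, sigma).cdf(math.log(reservation_wage))
    return offer_rate * p_accept


# Hypothetical numbers: same offer distribution, higher reservation wage for the older worker
offers = dict(offer_rate=0.3, mu=math.log(800), sigma=0.4)  # weekly wage offers centered near $800
young = reemployment_hazard(reservation_wage=700, **offers)
old = reemployment_hazard(reservation_wage=950, **offers)
print(round(young, 3), round(old, 3))
```

Raising the reservation wage lowers the acceptance probability and therefore the weekly hazard, which is the channel through which the age, education, tenure, and benefit arguments in this section operate.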
The “Race” variable's relationship with the probability of reemployment over a given interval of time depends upon the race in question. For the purposes of this study, “Race” is a set of dummy variables for black, Hispanic, and other, with white as the reference category. Different races may have different reservation wages. For instance, Holzer (1986) found that young black males have higher reservation wages than their white counterparts. This study assumes that the finding extends to older workers as well, which implies that different races may employ different job search strategies.
The “Education” variable's relationship to reemployment probability is rather straightforward. Generally, it is assumed that the more human capital a worker has, the more marketable he is, and this has a direct effect on his job search strategy. Because a better educated worker is likely to place a higher value on his abilities, he is also likely to have a higher reservation wage. For that reason, the “Education” variable is expected to have a negative relationship with the likelihood of being reemployed.
Likewise, “Receiving Unemployment Benefits” is expected to be negatively correlated with reemployment. Unemployment benefits decrease the opportunity cost of continuing a job search. If the incentive to gain employment swiftly is diminished, a worker is likely to remain unemployed longer. Benefits therefore change a worker's job search strategy, and receiving unemployment insurance (UI) benefits is expected to reduce one's chances of being reemployed.
Similarly, “Household Income” is expected to be negatively related to the probability of a worker becoming reemployed. The greater one's household income, the longer one's job search is likely to be: if a displaced individual's household income is high, their job search is likely to be longer (Alexopoulos and Gladden 2006). Depending upon the circumstances, such a worker might also consider retirement, especially if they are old enough to receive partial social security benefits. As a result, household income is likely inversely related to the likelihood of becoming reemployed.
Another factor altering job search strategy is “Job Tenure.” The longer a worker is on the job, the more human capital he acquires. As a result, the worker in question should have a higher reservation wage and be more willing to reject job offers (Mortensen 1988). In this event, the worker may choose to stay unemployed for a longer period of time, reducing his reemployment prospects. This makes his job search strategy differ from that of a worker with less experience.
One's job search strategy could also be altered by “Marital Status.” It is assumed that a married individual has more of an incentive to seek reemployment, since being married implies a greater need for income. As a result, his reservation wage is likely to be lower than that of someone without a spouse (Franz 1980), and his job search strategy would differ from that of his unmarried counterparts.
Job search strategies would also differ between union members and non-union members. According to Johnson and Mommaerts (2010), theory suggests that a union member would have a lower reservation wage due to the various other protections offered by the union. This could be a result of the more advantageous bargaining position a union affords its members. As a result, a union member is likely to spend less time unemployed than his non-union counterparts.
Lastly, the “Year” variable controls for differences in job search strategy that arise over time. For instance, different years may fall in different phases of the business cycle, and someone in a recessionary period would employ a vastly different job search strategy than someone in an expansionary period. As a result, the parameter estimates for the year dummies will depend on the year: they are expected to be negative for recessionary years and positive for expansionary years. Here they are likely to be negative because the data set covers a recessionary period.
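The year, race, and education variables all enter the model the same way: as a set of 0/1 dummies with one category omitted, so that each estimate is measured relative to that reference category (Table A-2 uses 2007, white, and less than high school). A minimal Python sketch of the encoding, using hypothetical category labels rather than the actual DWS codes:

```python
from typing import Dict, List, Sequence


def make_dummies(values: Sequence[str], categories: Sequence[str],
                 reference: str) -> List[Dict[str, int]]:
    """Encode a categorical variable as 0/1 dummies, omitting a reference category.

    A full set of dummies always sums to one, which would be perfectly collinear with
    the AFT intercept (and absorbed by the Cox baseline hazard), so one is dropped.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    kept = [c for c in categories if c != reference]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append({c: int(v == c) for c in kept})
    return rows


# Year of displacement with 2007 as the reference category
print(make_dummies(["2007", "2008", "2009"], ["2007", "2008", "2009"], reference="2007"))
```

A worker displaced in the reference year has zeros in every dummy, so the coefficient on, say, the 2009 dummy compares 2009 with 2007.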
Data
As previously noted, this study uses the Displaced Worker Supplement (DWS) from the Census. The iteration of the DWS used here is the most current and covers the years 2007-2009, so this data set is more current than those used in other studies of this topic.
The DWS has the necessary variables to conduct a study of the duration of displacement. The most important variable for such a study is the duration of unemployment, which is captured in the weeks without work variable (WKSWO). It represents the amount of time, in weeks, a worker was without work. Since individuals who were not displaced were eliminated from the sample, this variable captures the duration of displacement. In this sample, the mean time without work is roughly 14.2 weeks, while the standard deviation, maximum, and minimum are 18.9 weeks, 160 weeks, and 0 weeks, respectively.
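Because WKSWO records spell length in whole weeks, a simple nonparametric picture of the weekly reemployment hazard is the life-table ratio of spells ending in week t to spells that lasted at least t weeks. The Python sketch below assumes every recorded spell ended in reemployment; if some workers were still without work when surveyed, those censored spells would need separate handling (as in the Kaplan-Meier estimator). The input values are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def weekly_hazard(weeks: Sequence[int]) -> Dict[int, float]:
    """Life-table hazard h(t) = (# spells ending at week t) / (# spells lasting >= t weeks).

    Assumes every spell is complete, i.e. no censoring.
    """
    ends = Counter(weeks)
    at_risk = len(weeks)
    hazard = {}
    for t in sorted(ends):
        hazard[t] = ends[t] / at_risk
        at_risk -= ends[t]  # spells ending at t leave the risk set
    return hazard


print(weekly_hazard([0, 1, 1, 3, 3, 3, 10]))
```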
Another variable in this model is “AGE,” the age of an individual. Here the mean, standard deviation, minimum, and maximum of age are 40.8, 11.8, 20, and 65, respectively. The maximum age is capped at 65 years to minimize the effect of retirement.
The remaining variables are demographic and control variables. They deal with union membership, unemployment benefits, marriage, job tenure, family income, race, and the year. The only notable manipulation among these variables concerns family income, which was originally reported in income bands; to make it more manageable, the bands were converted to a single continuous variable. Otherwise, these variables are unmodified, and their descriptive statistics and explanations are available in table A-1 of the appendix.
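The text does not specify how the income bands were converted. One common approach, assumed here purely for illustration, assigns each closed band its midpoint and the open-ended top band a multiple of its lower bound. The band codes and dollar limits in the Python sketch are hypothetical placeholders, not the actual DWS family income categories.

```python
from typing import Dict, Optional, Tuple

# Hypothetical band code -> (lower, upper) limits in dollars; None marks an open-ended top band.
INCOME_BANDS: Dict[int, Tuple[float, Optional[float]]] = {
    1: (0, 10_000),
    2: (10_000, 25_000),
    3: (25_000, 50_000),
    4: (50_000, 75_000),
    5: (75_000, 150_000),
    6: (150_000, None),
}


def band_to_income(code: int, top_multiplier: float = 1.5) -> float:
    """Convert a banded income code to a single dollar value."""
    lower, upper = INCOME_BANDS[code]
    if upper is None:
        return lower * top_multiplier  # assumed value for the open-ended top band
    return (lower + upper) / 2.0       # midpoint of a closed band


print([band_to_income(c) for c in sorted(INCOME_BANDS)])
```

The choice of top-band value matters most for high-income households, so any such conversion adds some measurement error to the income variable.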
Model Interpretation
In order to interpret the models effectively, it is prudent to understand their constituent parts. For the Cox model, exponentiating a parameter estimate yields the hazard ratio: the ratio of the hazard after a one-unit increase in a variable to the hazard before it, holding the other variables constant. To obtain the proportional change in the hazard rate caused by a unit change in a variable, one must be subtracted from the hazard ratio. The accelerated failure time (AFT) model is a bit more complicated. Here, the exponentiated parameter estimates are time ratios. To obtain the hazard ratio, each parameter estimate must be multiplied by the negative of the Weibull shape parameter and then exponentiated. With this in mind, the results yielded by these models can be interpreted.
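The two conversions just described can be checked against Table A-2. The Python sketch below applies them to the unemployment-benefits coefficients (-.82323 in the Cox model; .2104 in the accelerated failure time (AFT) model, with Weibull shape 2.7561); it reproduces the reported hazard ratios of 0.439 and 0.559 to within about 0.001, the gap reflecting rounding of the published coefficients. The function names are illustrative.

```python
import math


def cox_hazard_ratio(beta: float) -> float:
    """Cox model: hazard ratio for a one-unit increase in a covariate, exp(beta)."""
    return math.exp(beta)


def aft_hazard_ratio(beta: float, shape: float) -> float:
    """Weibull AFT model: hazard ratio exp(-beta * shape); exp(beta) alone is a time ratio."""
    return math.exp(-beta * shape)


def percent_change(hazard_ratio: float) -> float:
    """Percent change in the weekly reemployment hazard: (HR - 1) * 100."""
    return (hazard_ratio - 1.0) * 100.0


WEIBULL_SHAPE = 2.7561  # Table A-2

hr_cox = cox_hazard_ratio(-0.82323)               # UIBENS, Cox model
hr_aft = aft_hazard_ratio(0.2104, WEIBULL_SHAPE)  # UIBENS, AFT model
print(round(hr_cox, 3), round(percent_change(hr_cox), 1))
print(round(hr_aft, 3), round(percent_change(hr_aft), 1))
```

Note the sign flip in the AFT case: a positive time-scale coefficient lengthens spells and therefore lowers the hazard.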
The results yielded by this model are interesting, largely because many variables that would be expected to have a significant impact on the reemployment hazard are not significant in either the accelerated failure time or the Cox version of the model. This is a stark deviation from the results found in the related literature. For those reasons, the results, reported in table A-2, are somewhat doubtful, but not completely outside of reason.
This is especially true of the “Age” variable's parameter estimate. In both models the “Age” variable is not significant, given its high p-values; the Cox model's p-value (.5777) is somewhat lower than that of the accelerated failure time version (.7242). From a statistical perspective, then, neither model finds a significant effect of age on the probability that a displaced worker gains reemployment in a given week.
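Table A-2 reports only coefficients and p-values, but if the p-values come from two-sided Wald tests (the usual output of survival software, though the table does not say so), the standard error and an approximate 95% confidence interval for the Cox hazard ratio on age can be backed out. The Python sketch below does this with the standard library; it is a rough check under that assumption, not a re-estimation.

```python
import math
from statistics import NormalDist


def implied_se(beta: float, p_value: float) -> float:
    """SE implied by a two-sided Wald p-value: |beta| / z, with z = Phi^-1(1 - p/2)."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(beta) / z


def hazard_ratio_ci(beta: float, se: float, level: float = 0.95):
    """Wald confidence interval for the hazard ratio exp(beta)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(beta - z * se), math.exp(beta + z * se)


# AGE in the Cox model (Table A-2): beta = .00220, p = .5777
se_age = implied_se(0.00220, 0.5777)
low, high = hazard_ratio_ci(0.00220, se_age)
print(f"SE = {se_age:.5f}, 95% CI for the hazard ratio: ({low:.4f}, {high:.4f})")
```

Under these assumptions the interval runs from roughly 0.99 to 1.01 per year of age and includes 1, which is another way of stating that age has no detectable effect in this sample.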
However, the Cox proportional hazards model shows that having spent some time in college has a significant effect on the reemployment hazard. Relative to workers with less than a high school education, spending some time in college increases one's chances of reemployment in a given week by 48.1%. This differs from the accelerated failure time version of the model, in which this variable is not significant. The remaining variables also show disparities between the two models.
In both models, receiving unemployment benefits is strongly significant: the accelerated failure time model and the Cox model both produce low p-values for this variable. If one is receiving unemployment benefits, one's chances of reemployment in a given week decrease by 56.1% in the Cox model and by 44.1% in the accelerated failure time model. In other words, unemployment benefits appear to be a detriment to reemployment.
This non-uniform effect carries over to the variable indicating union membership. According to the Cox model, union members have a 34% greater chance of gaining reemployment in a given week, whereas union membership is not significant in the accelerated failure time model. Even so, it is one of the few significant variables in the Cox model.
Another significant variable is the one indicating whether an individual is black. In the Cox proportional hazards model, being black decreases one's chances of reemployment in a given week by 28.6% relative to white workers. According to the accelerated failure time model, on the other hand, this variable is not significant.
However, the two models do not deviate vastly from one another overall. In fact, the year 2009 has a positive, significant effect relative to 2007 in both models. Perhaps there was some sort of mild recovery during that year. The differences between the models can most likely be attributed to the assumption the accelerated failure time model makes about the distribution of survival times. Whether the actual distribution of survival times is consistent with, or approaches, the Weibull distribution determines which method is more accurate in this case. The more surprising result is that most of the variables are not significant. This could be a result of the powerful effect of the unemployment benefits variable on the model, or of some multicollinearity.
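The Weibull assumption also constrains the shape of the baseline hazard, which the Cox model leaves unrestricted. Assuming the “Weibull Shape” reported in Table A-2 is the usual shape parameter p, the Weibull hazard h(t) = (p/λ)(t/λ)^(p-1) rises with duration when p > 1, is constant when p = 1, and falls when p < 1, so a shape of 2.7561 implies a reemployment hazard that increases the longer a spell lasts. The scale λ in the Python sketch below is hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


SHAPE = 2.7561  # Weibull shape reported in Table A-2
SCALE = 15.0    # hypothetical scale in weeks, for illustration only

for week in (1, 5, 10, 20):
    print(week, round(weibull_hazard(week, SHAPE, SCALE), 4))
```

If the true hazard does not rise steadily with duration, this built-in restriction is one way the two models' estimates could diverge.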
Conclusion
The results of this study are interesting for various reasons. Perhaps the most notable is that most of the variables expected to have a major effect on the model are not significant. These results also diverge rather starkly from the previous studies cited earlier. There are, however, plausible explanations for these outcomes.
For instance, the “Age” variable was not significant in either model. This is likely a result of the influence of the other variables in the models: it is plausible that a markedly significant variable, such as unemployment benefits, overpowered it.
It is also not implausible that a variable is significant in one model but not in the other. Each model carries different assumptions, which could alter the results depending on how well those assumptions hold in this case, and there is a distinct possibility that one model is better suited to this study than the other. Both models are widely applicable, but they have different specifications.
Not surprisingly, receiving unemployment benefits has a negative impact on an individual's probability of reemployment in a given week in both models. This supports the hypothesis that unemployment benefits reduce the probability of reemployment. These results were expected and are not particularly troubling.
The fact that most of the education variables are insignificant is the most troubling aspect of these results. Education would be expected either to increase or to decrease one's probability of reemployment: those with more human capital would either have a higher reservation wage, lengthening their job search, or be in high demand, shortening it. The latter appears to be the case, since the Cox model shows that having some college experience increases the probability of gaining reemployment in a given week.
On the whole, these results could be due to the data covering only the years 2007-2009. The atypical economic environment of those years could have skewed the results. If this study were conducted again, it would be prudent to control for the business cycle. Otherwise, this study was conducted successfully.
Works Cited
"Accelerated Life Models." Web. 1 Nov 2011. <http://www.mas.ncl.ac.uk/~nmf16/teaching/mas3311/handout9.pdf>.
Alemi, Farrokh. "Hazard Functions for Combination of Causes." YouTube. George Mason University. Online. 2007. <http://youtu.be/TT7mdmMAmPg>.
Alexopoulos, Michelle, and Tricia Gladden. "Wealth, Reservation Wages, and Labor Market Transitions in the US." University of Toronto. University of Toronto, 01 Jan 2006. Web. 8 Nov 2011. <http://www.iza.org/iza/en/papers/transatlantic/1_alexopoulos.pdf>.
"BIOST 515." Cox Proportional Hazards Models. N.p., 04 Mar 2004. Web. 1 Nov 2011. <http://courses.washington.edu/b515/l17.pdf>.
Bureau of Labor Statistics. 2010. "Unemployed persons by Age, Sex, Race, Hispanic or Latino Ethnicity, Marital Status, and Duration of Unemployment." Washington, DC: U.S. Department of Labor. ftp://ftp.bls.gov/pub/special.requests/lf/aat31.txt.
Calcagno, Juan, Peter Crosta, Thomas Bailey, and Davis Jenkins. "Does Age of Entrance Affect Community College Completion Probabilities? Evidence from a Discrete-Time Hazard Model." Educational Evaluation and Policy Analysis 29.3 (2007): 218-35. Print.
Chan, Sewin. "Job Loss and Employment Patterns of Older Workers." Journal of Labor Economics (2001): 484-521. Web. 11 Jan 2011.
Collier, Paul, Anke Hoeffler, and Måns Söderbom. "On the Duration of Civil War." Journal of Peace Research 41.3 (2004): 253-73. Print.
Crumer, Angela. "Comparison between Weibull and Cox Proportional Hazards Models." Kansas State University, May 2011. Web. 18 Nov 2011. <http://krex.k-state.edu/dspace/bitstream/2097/8787/3/AngelaCrumer2011.pdf>.
Farber, Henry S. "The Changing Face of Job Loss in the United States, 1981-1995." Brookings Papers on Economic Activity: Microeconomics (1997): 55-128.
Fox, John. "Cox Proportional-Hazards Regression for Survival Data." Fox Companion, Feb 2002. Web. 1 Nov 2011. <http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf>.
Franklin, David. "How to Build the Kaplan-Meier Curve from the Ground Up." The Programmers Cabin. N.p., 08 Aug 2006. Web. 1 Nov 2011. <http://www.theprogrammerscabin.com/OT060830.pdf>.
Franz, Wolfgang. United States. National Bureau of Economic Research. Reservation Wage of Unemployed Persons in the Federal Republic of Germany: Theory and Empirical Tests. Cambridge: NBER, 1980. Print. <http://www.nber.org/papers/w0578>.
Holzer, Harry. "Reservation Wages and their Labor Market Effects for Black and White Male Youth." Journal of Human Resources (1986): 157-177. Print.
Introduction to Survival Analysis. UCLA: Academic Technology Services, Statistical Consulting Group. From http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/default.htm (accessed November 1, 2011).
Johnson, Richard, and Corina Mommaerts. "Age Differences in Job Loss, Job Search, and Reemployment." Urban Institute (2010).
Johnson, Richard, and Janice Park. "Can Unemployed Older Workers Find Work?" Urban Institute 25.1 (2011). Web. 15 Mar 2011. <http://www.urban.org/url.cfm?ID=412283>.
"Kaplan-Meier Survival Estimates." StatsDirect. StatsDirect, 2011. Web. 23 Nov 2011. <http://www.statsdirect.com/help/survival_analysis/kaplan.htm>.
Lahey, Joanna. 2008. "Age, Women, and Hiring: An Experimental Study." Journal of Human Resources 43(1): 30–56.
Mason, Carl. "Cox Proportional Hazard Models." UC Berkeley, 05 Dec 2005. Web. 1 Nov 2011. <http://www.demog.berkeley.edu/213/Week14/welcome.pdf>.
Monogan, Jamie. "The Cox Proportional Hazards Model." Lecture. Washington University in St. Louis. St. Louis. April 6, 2010. <http://monogan.myweb.uga.edu/teaching/pd/16duration2.pdf>.
Mortensen, Dale. "Wages, Separations, and Job Tenure: On-the-job Specific Training or Matching." Journal of Labor Economics 6.4 (1988): 445-471.
"Parametric Models." New York University, n.d. Web. 1 Nov 2011. <https://files.nyu.edu/mrg217/public/parametric.pdf>.
Price, Gregory. "Obesity and Crime: Is There a Relationship?" Economics Letters (2009): 149-52.
Schoen, John. "How Are Older, Laid-off Workers Faring?" MSNBC 2011. Web. 15 Mar 2011. <http://www.msnbc.msn.com/id/15537917/ns/business-answer_desk/>.
U.S. Department of Education, National Center for Education Statistics. (2003). Integrated Postsecondary Education Data System - Fall Enrollment Survey, 2002 [Data File]. Washington, DC.
Appendix
Table A-1
Variables

Variable   Mean     Std      Min     Max   Definition
AGE        40.8     11.77    20      65    The age of an individual.
FEMALE     .396     .49      0       1     A dummy variable indicating gender.
LJTEN      4.8      6.2      .00274  41    An observation's tenure at their last job, in years.
MARRIED    .55      .497     0       1     A dummy variable indicating whether an individual is married.
UIBENS     .43      .495     0       1     A dummy variable indicating whether an individual has received unemployment benefits.
UNION      .100     .300     0       1     A dummy variable indicating whether an individual is a member of a union.
LTHS       .075     .264     0       1     A dummy variable indicating whether an individual's highest level of educational attainment is less than high school.
HS         .31      .46      0       1     A dummy variable indicating whether an individual's highest level of educational attainment is a high school diploma.
SCOL       .33      .47      0       1     A dummy variable indicating whether an individual's highest level of educational attainment is some college.
COL        .206     .405     0       1     A dummy variable indicating whether an individual's highest level of educational attainment is a college degree.
ADV        .077     .266     0       1     A dummy variable indicating whether an individual's highest level of educational attainment is an advanced degree.
WHITE      .725     .446     0       1     A dummy variable indicating whether an individual is white.
BLACK      .087     .281     0       1     A dummy variable indicating whether an individual is black.
HISP       .133     .341     0       1     A dummy variable indicating whether an individual is Hispanic.
OTHERS     .0535    .23      0       1     A dummy variable indicating whether an individual is not black, white, or Hispanic.
YearOne    .231     .421     0       1     A dummy variable indicating whether an individual was displaced in 2007.
YearTwo    .361     .48      0       1     A dummy variable indicating whether an individual was displaced in 2008.
YearThree  .41      .492     0       1     A dummy variable indicating whether an individual was displaced in 2009.
FAMILYINC  58,426   40,461   -       -     A variable representing the income of a family.
WKSWO      14.17    18.9     0       160   A variable representing the amount of time in weeks a worker has been without work.
All of the data used in this analysis were collected from:
http://www.ceprdata.org/cps/dws_data.php
Table A-2
Regression Table
Dependent Variable: WKSWO

                 (1) Cox Proportional            (2) AFT
Variable         Estimate (p-value)      HR      Estimate (p-value)   HR
Intercept        None (Semi-Parametric)          .9272 (<.0001)
AGE              .00220 (.5777)          1.002   .0006 (.7242)
FEMALE           .01174 (.8917)          1.012   -.0158 (.6527)
LJTEN            .0008603 (.9043)        1.001   .0016 (.5934)
MARRIED          -.0004816 (.9958)       1.000   .0194 (.5996)
UIBENS           -.82323 (<.0001)*       0.439   .2104 (<.0001)*      0.559
UNION            .29569 (.0361)*         1.344   -.0709 (.2367)
LTHS             Reference                       Reference
HS               .17976 (.3305)          1.197   .00398 (.5854)
SCOL             .39300 (.0362)*         1.481   -.0944 (.2056)
COL              .23436 (.2435)          1.264   .0495 (.5351)
ADV              .42494 (.0631)          1.529   -.1131 (.2201)
WHITE            Reference                       Reference
BLACK            -.33748 (.0222)*        0.714   .0954 (.0921)
HISP             -.22052 (.0918)         0.802   .0742 (.1542)
OTHERS           -.10798 (.5933)         0.898   .1068 (.2303)
YearOne          Reference                       Reference
YearTwo          .00271 (.9804)          1.003   -.0125 (.7767)
YearThree        .36295 (.0008)*         1.438   -.1264 (.0038)*      1.417
FAMILYINC        2.26342E-7 (.8428)      1.000   <-.0001 (.7515)
Generalized R²   .999140877                      Not Applicable
Weibull Shape    Not Applicable                  2.7561

P-values in parentheses. An asterisk denotes significant values. Hazard ratios (HR) are to the right of the estimates.