ANSI-C IMPLEMENTATION OF THE BAILEY-MAKEHAM WORKSTATION
William H. Rogers, Ph.D.
April 27, 1991
Work supported by contract 500-90-0048,
Health Care Financing Agency,
and placed in the public domain.
This report describes a set of computer programs written to estimate the
Bailey-Makeham survival model. It is designed to operate on
workstations delivered to the Professional Review Organizations
alongside, and integrated with, the STATA statistical package produced
by Computing Resource Center.
In addition, the computer programs are designed to work in a standalone
environment using ASCII files.
This report comes in three parts:
1. The Bailey-Makeham model and the numerical strategy for estimation.
2. STATA user's guide for the Bailey-Makeham model.
3. ASCII user's guide for the Bailey-Makeham model.
THE BAILEY-MAKEHAM MODEL AND ITS ESTIMATION
@The Bailey-Makeham Model@
The Bailey-Makeham model is designed to estimate failure time, or
"survival" variables in which the time to failure follows a hazard
that decreases exponentially to a baseline rate. This model is
typically used to describe time to rehospitalization or death,
following a hospitalization.
Intuitively, these bad outcomes have two causes: underlying chronic
illness (represented by the baseline hazard rate) or simply the
fragility of old age, and acute illness or risk that is coincident with
the hospitalization. The initial excess hazard
represents the severity of the illness, and the rate of exponential
decay describes the recovery time.
Mathematically, the model was described by Bailey(1988):
r(t) = $\alpha$ exp(-$\gamma$t) + $\delta$
where r(t) describes the incremental hazard for failure per unit time.
The parameters $\alpha$, $\gamma$, and $\delta$ are called the structural
parameters, since
they describe the deterministic information about each individual. The
hazard function describes the stochastic behaviour of that individual
subject to the structural parameters.
The parameter $\delta$ represents the long-term baseline risk, while the
parameter $\alpha$ describes the excess short-term risk, and $\gamma$ is
the decay rate (in units of 1/time). Thus, a large value of $\gamma$
means that the process decays quickly, and a small value of $\gamma$
means that it decays very slowly.
The structural, or "natural," parameters $\alpha$, $\gamma$, and $\delta$
are functions of covariates:
$\alpha$ = exp ($\alpha$`0 + $\alpha$`1x`1 + ... + $\alpha$`kx`k)
$\gamma$ = exp ($\gamma$`0 + $\gamma$`1x`1 + ... + $\gamma$`kx`k)
$\delta$ = exp ($\delta$`0 + $\delta$`1x`1 + ... + $\delta$`kx`k)
The choice of covariates is formally identical across all three
parameters. However, covariates may be effectively removed from the
model by fixing (constraining) their parameter values at zero. They
may also be fixed at any other value.
Let R(t) be the survivor function:
log R(t) = - $\int_0^t$ ($\alpha$ exp(-$\gamma$s) + $\delta$) ds
         = -($\alpha$/$\gamma$)(1-exp(-$\gamma$t)) - $\delta$t
Censored observations (nonfailures) have log likelihood log R(t).
Observed failures are assumed to occur in some interval D subsequent to
the observed survival time T. That is, the individual survived to time
T but was known to fail by T+D. Their likelihood is R(T) - R(T+D). The
log likelihood is log(R(T)-R(T+D)), which is approximately:
log r(T) + log R(T) + log(D)
Thus, the numerical value of the log likelihood is substantially
affected by the choice of the interval length D, as well as the units
for T. The software permits the interval length to vary according to
the subject. For example, if patient A is observed on a daily basis and
patient B on a weekly basis, D would be 1 day for A and one week for B.
@Solution Algorithm@
The solution algorithm is based loosely on the modified Marquardt
maximization procedure programmed by Jim Summe. Briefly, this method
produces a compromise between steepest descent methods and Newton-
Raphson methods, depending on how close to quadratic the function
appears to be. The program emphasizes the steepest descent method if it
runs into numerical difficulties, such as non-positive definite second
derivatives.
For more information on this algorithm, see Stewart(1973)
and Bard(1974). The algorithm was modified by Jim Summe and has
been further modified here.
This algorithm has the following parameters that control convergence:
ssexp -- The relative stepsize for parameter values, as a power of 10,
default -5. The algorithm will not converge if changes in parameter
values are greater than this amount.
lambda -- The Marquardt parameter. Intuitively, this controls how much
to mix a steepest descent step with a Newton step. This is changed by
the program from step to step, depending on convergence experience, so
that if the likelihood function appears to be quadratic, the Newton
solution will be emphasized. The initial default is 1.0.
rhomax -- The maximum stepsize in the parameter values allowed at any
one step.
tolerance -- The minimum change in log likelihood that will be accepted
as convergence.
Unfortunately, the nicely interpretable Bailey-Makeham model appears to
have a nasty likelihood function. A nice problem like the logistic
maximum likelihood has a convex likelihood surface with a single
maximum, but this does not exist here. The likelihood function may have
flat spots, saddle points, and local maxima that are easily confused
with the global maximum, even in seemingly well-defined problems. The
local maxima appear to fit individual points or clumps of points with
early failure times. A dataset that demonstrates the type of problem
that can occur is discussed in Appendix A.
Fortunately, these problems seem to arise only rarely in the kinds of
datasets HCFA analyzes. For example, Jim Summe's substantial experience
has been "I have seen no evidence of this. However, the model is very
richly parameterized. This richness is both a blessing and a curse. A
curse because models can easily be specified which include parameters
that cannot be readily distinguished in the outcome space. Of course,
collinearity among the variables being modelled will cause some
problems but even if the variables are not collinear, parameters
associated with them will not be distinguishable."
As with just about every other kind of likelihood method, there is a
problem in the Bailey-Makeham model with likelihood functions that
maximize at infinite values of the parameters. Provisions are made for
dealing with this situation automatically.
We can also show cases where the Bailey-Makeham model elucidates
structure that was not obvious in simpler survival models. A
success of this kind is worth many failures.
The following modifications were made to the maximization procedure in
order to improve its convergence properties:
1. A reference model is fit first, consisting only of the constant
term for each of the three structural parameters. Remaining parameters
are released only after the reference model has converged.
2. Starting values are estimated for the reference model from the data.
3. Covariates (other than the constant) are scaled by their standard
deviations in certain computations.
4. Additional likelihood evaluations are performed when certain
conditions apply, and the step length is reduced if an unprofitable
step is contemplated.
5. An option has been added to assess whether parameters are tending
to infinity, and to fix them at appropriate values if it appears that
they are.
6. An option has been added to quit iterations on the basis of a
change in the log likelihood rather than a change in parameter values.
@Ancillary programs@
Programs are also supplied for the following purposes:
1. To evaluate predictions of the survival probability at specific
points in time.
2. To estimate values of the structural parameters and their standard
errors for individual observations.
3. To perform an evaluation of the surface of the likelihood function
at the point of convergence.
4. To test sets of structural parameters for significance, using the
estimated covariance matrix. (Note: this is not supplied directly as a
part of this contract, but will be provided at a future date.)
2. ANSI INTERFACE: ARGUMENTS TO THE PROGRAM
A. The Bailey-Makeham Model
The program commands are communicated to the program through a
command file. This section describes the syntax of that file.
Note: if you use the program from STATA, this syntax is not of
direct concern to you because the commands will be given directly
in STATA.
The ANSI syntax of this program is frankly old-fashioned. It is
designed this way because this program is intended to be used as a back
end to STATA. However, it is possible to invoke the command language
directly.
The syntax consists of a series of statements. Each statement consists
of a keyword followed by sets of tokens, ending with a semicolon.
Consistent with C style conventions, everything is lower case.
There are 5 types of statement:
Type 1: How the data are acquired. There are 2 methods: from STATA
datasets and from ASCII datasets. The relevant keywords are: stata,
ascii, read, format.
Type 2: Program options. Keywords: options, logfile, output.
Type 3: Independent and dependent variables. Two lists: depend and
varlist.
Type 4: Starting values. Three lists: sv_alpha, sv_gamma, and sv_delta.
Type 5: Fixed parameters. Again, three lists: fx_alpha, fx_gamma, and
fx_delta.
Using your text editor, create an ASCII file with the necessary
commands. The following example conducts an analysis of some
data that are described in the October 19, 1990 issue of JAMA
on the PPS medicare intervention:
options trace interval=1 autofix=.005;
logfile "chf3.log";
stata "chf.dta";
depend Survive Dead;
varlist Acute Chronic age NurseHm female Post1984;
output "chf3.out"; /* this is the binary output */
This 6-line command file was stored in a dataset named chf3.cmd. In
the same directory was a STATA dataset chf.dta with the 8 variables
Acute ... Post1984, Survive, and Dead. There is no importance in
any of the dataset names. However, STATA generally recognizes the
"dta" suffix as its internal datasets. The variable names can
be any combination of letters and numbers, upper or lower case
(case matters), starting with a letter.
The option "interval" defines the interval D in which all failures
occur, defined in the introduction. D may vary in the dataset,
if there is uneven followup, for example. In this example, it is
1 unit--the units happen to be days. The trace option requests
more detail on the progress of the maximization. The autofix
option specifies the aggressiveness with which the program will
attempt to solve "infinite parameter" problems.
Two datasets are created. First, chf3.log is an ASCII dataset that will
contain details on the maximization process. Second, chf3.out is a
binary dataset that will contain the values of the covariance matrix. A
utility program is provided to create an ASCII printout of the
covariance matrix from this binary dataset. The purpose of the binary
dataset is to communicate all important information in the problem to
other programs in the set.
Note that comments may be enclosed by "/*" and "*/".
While the program is running, the user sees the following on the screen:
Iteration  0: Log Likelihood = -14666.9614  (1)
Iteration  1: Log Likelihood = -14384.0103  (2)
Iteration  2: Log Likelihood = -14344.7678  (2)
Iteration  3: Log Likelihood = -14337.8540  (2)
Iteration  4: Log Likelihood = -14337.8475  (2)
Iteration  5: Log Likelihood = -14337.8475  (2)
Iteration  0: Log Likelihood = -14337.8475  (1)
Iteration  1: Log Likelihood = -14209.0564  (2)
Iteration  2: Log Likelihood = -14121.4334  (2)
Iteration  3: Log Likelihood = -14029.5159  (2)
Iteration  4: Log Likelihood = -14002.5517  (2)
Iteration  5: Log Likelihood = -14001.2944  (2)
Iteration  6: Log Likelihood = -14001.2860  (2)
Iteration  7: Log Likelihood = -14001.2860  (2)
Stopping with argument 0.
Stopping with argument 0 means that the maximization finished;
the user did not press Ctrl+Break or Command+. on the Macintosh.
(Other values are system specific and refer to the type of kill
signal received by the program. Kills are always acknowledged at
the end of an iteration, and an attempt is made to write an output
file showing progress so far.)
The output from the log consists of details on the above (not shown)
and the following summary:
(convergence achieved)

Bailey-Makeham Survival Model 0.4

Log Likelihood (C) =  -14337.848          Number of observations =     2590
Log Likelihood     =  -14001.286          Chi2( 18)              =  673.123
                                          Prob>chi2              =   0.0000

Structural        |
parameter   Var.  |       Coef.   Std. Err.        t     Sig.         Mean
------------------+--------------------------------------------------------
alpha      Acute  |    0.131239   0.0195267    6.721   0.0000      32.3816
gamma      Acute  |    0.193007    0.028999    6.656   0.0000      32.3816
delta      Acute  |  0.00999832   0.0123248    0.811   0.4173      32.3816
alpha    Chronic  |  -0.0153143   0.0132135   -1.159   0.2466      38.3352
gamma    Chronic  |   -0.112104   0.0219194   -5.114   0.0000      38.3352
delta    Chronic  |   0.0208428  0.00886573    2.351   0.0188      38.3352
alpha        age  |   0.0145094   0.0100164    1.449   0.1476      78.3613
gamma        age  |  0.00547471   0.0151997    0.360   0.7187      78.3613
delta        age  |    0.013165  0.00466809    2.820   0.0048      78.3613
alpha    NurseHm  |    0.452992    0.217983    2.078   0.0378    0.0976834
gamma    NurseHm  |    0.855248    0.297866    2.871   0.0041    0.0976834
delta    NurseHm  |    0.107426    0.123731    0.868   0.3854    0.0976834
alpha     female  |   -0.196235     0.16327   -1.202   0.2295      0.55251
gamma     female  |   -0.427492    0.259656   -1.646   0.0998      0.55251
delta     female  |    -0.22033   0.0781749   -2.818   0.0049      0.55251
alpha   Post1984  |   -0.540054    0.158425   -3.409   0.0007     0.528185
gamma   Post1984  |   -0.624134    0.263733   -2.367   0.0180     0.528185
delta   Post1984  |    0.057761   0.0887166    0.651   0.5151     0.528185
alpha      _cons  |     -10.233    0.858745  -11.916   0.0000            1
gamma      _cons  |    -6.30148     1.29032   -4.884   0.0000            1
delta      _cons  |    -9.12453    0.356974  -25.561   0.0000            1
------------------+--------------------------------------------------------
The interpretation of this output, first, is that the log likelihood of
the full model is -14001.286, and the log likelihood of the reference
model (constants only) is -14337.848. The Chi-square with 18 degrees of
freedom for the likelihood ratio test is 673, which is of course highly
significant. This means that the variables, taken as a set, make a
difference in the fit. Note that the likelihood is probably not
comparable to the likelihood from any other kind of model due to the
importance of the interval length.
One might then want to search through the output for the reference model.
These parameters were:
Variable |      alpha       gamma       delta
---------+------------------------------------
   _cons |   -5.30972    -3.93415     -7.1072
The base hazard is about exp(-7.11), which implies an average survival
time of just over 1200 days. The short term hazard is exp(-5.31), which
must be added to the base hazard. Together, these suggest an average
expected survival of 173 days, if the initial hazard prevailed. However,
the initial hazard moves toward the long term hazard with an exponential
decay with time constant 1/exp(-3.93), or 51 days. After 51 days, the
hazard is about 1/500.
Usually, the full model represents a perturbation around the parameters
of the reference model. But sometimes the algorithm will find another
solution. This can be checked by examining the average values of the
predictions of the log parameters. In this case, the means are -5.77,
-4.39, and -7.05, which are close to the above. A similar solution was
found.
Turning to the effects of the individual variables, one can view the
effect of each parameter as a modification of the overall mean.
For example, if a person's acute illness score were 12.4 instead of 32.4,
their alpha value would be -5.77 -2.62 or -8.39, their gamma would be
-4.39 -3.86 or -8.25, and their delta would be -7.05 -0.20 or -7.25.
So their long term risk would be slightly lower, and their short
term risk would only be about 25% higher. In addition, the decay
period would be much longer.
The effect of Chronic sickness is mostly to push out the time constant,
thereby extending the acute risk, whatever its level. The effect of
Post1984 is negative on the short term risk, but the time constant
is pushed out. This is consistent with the idea that improvements
in care at the time of hospitalization are responsible for mortality
improvements around the time of the PPS intervention.
The specific program statements are:
Type 1: Acquisition of datasets
ascii "dataset_name";
read var1 var2 var3 var4 ... vark;
format "c_scanf format";
or
stata "stata_dataset_name";
If ascii input is specified, then a read list and a format may be
included. If a read list is not included, then the option labheads
should be specified, and the variable names should be included,
white- space delimited, as the first record of the dataset itself.
If the format list is not included, the format is considered to be
white-space delimited, meaning that the values are separated by tabs
or spaces.
Some advice: Don't use formats. For one thing, the Think-C
Macintosh compiler misreads them under certain conditions, giving
plausible looking but incorrect answers.
Variable names may include any keyword or option names (this bug
is fixed).
Type 2: Program options.
All options begin with the options keyword and end in a semicolon.
There are many options:
interval=number. The value of the interval D, applied across
the entire sample. (Interval may also be specified as a
dependent variable.)
weight=identifier; names a variable that will be used as a
weight. Currently, only one interpretation of weights is
supported--frequency weighting.
rescale: Causes the weights to be rescaled. (unimplemented)
debug: Causes debugging output to be printed.
yydebug: Causes yacc debugging output to be printed.
maxiter=number. Sets the maximum number of iterations allowed.
The default is 100.
trace: Causes maximization details to be printed in the log.
The default is to print only the log likelihoods and the
summary table.
insight: Causes the dataset _checkpt.spm to be written after
each point, to be used by other potential addon programs. For
example, in DESQVIEW a program might look for such a dataset to
indicate recent progress of the maximizer. It is written in
the same format as the output dataset. The default is not to
write this dataset.
onestep: the reference model is not fit. Instead, all
parameters are released simultaneously. The default
is to fit the reference model first.
autofix=number. The algorithm is given license to fix
(constrain, or "stop") all non-constant parameters that
appear headed to infinite values. This only takes place
when successive changes of the likelihood are less than
(nobs/200) * number. The number is scaled in this way
for numerical accuracy reasons. The default is that this
option is turned off.
rhomax=number. This is the maximum Marquardt rho. The
value of number should be greater than 1. It represents
the aggressiveness with which the program will extrapolate
the parabolic solution, if it appears to be profitable.
The default is 5.
lambda=number. This is the Marquardt mixing parameter.
A small positive value says to trust the Gauss-Newton
solution. A large positive value says to trust steepest
descent. The default is 1; values such as 0.1 or 0.01 are
common.
nsf=number. Number of significant figures. This option
affected printing in the mainframe version, but has no effect
in this version due to a different printing design.
ssexp=number. This is a parameter that tells the program
how much the absolute and relative stepsize can be for
the program to consider the answer as converged. It
is interpreted as a power of 10. The default is -5,
meaning that the stepsize needs to be less than 10^-5,
in both absolute and relative terms.
tolerance=number. The minimum change in log likelihood that
will be considered as evidence of convergence. The default is
zero, meaning that this criterion is not considered.
altquote: This option changes the string quote character to "#".
It is used to facilitate command file setup by languages where
the literal double quote is problematic.
labheads: This option says that the raw dataset has label
headings in the first row.
missing="string". This option says that missing values in the
raw dataset are denoted by the character sequence "string".
The default is "M" (a single capital M).
There are also two separate statements that specify program
options. logfile "dataset name"; specifies the name of
a file which receives program output. output "dataset name";
receives binary output including the covariance matrix and
parameter values.
Type 3: Independent and Dependent Variables
varlist: following this keyword is the list of independent
variables to be applied to all three parts of the problem.
Order is unimportant.
depend: following this keyword is a list of all the dependent
variables. This list is ordered. The first variable is the
survival time. The second variable is an indicator of whether
the case is a death at the time specified by dependent variable
1. The third variable, the length of the observation interval
in which death would have occurred, is optional. If present, a
value must be specified. If this variable is not specified,
then the option interval is used for all cases. If that option
is not specified either, the default is 1 unit.
Type 4: Starting values. Three lists: sv_alpha, sv_gamma, and sv_delta.
Each of these keywords is followed by a series of expressions
variable = value with space between the sets. For example:
sv_alpha _cons=4 x=2;
The default starting values are 0, except for the constant
terms, which are estimated from the data.
Type 5: Fixed parameters. Again, three lists: fx_alpha, fx_gamma, and
fx_delta.
Each of these keywords is followed by a list of variables whose
coefficients are to be fixed in the maximization. Note: to
drop a variable from consideration, fix it. Since the starting
value is zero, it effectively drops out of the equation. For
example:
fx_alpha zz;
Example of a complete command file:
options autofix=0.001;
stata "small.dta";
depend psurv rd;
varlist x;
This command file tells the program to fit a Bailey-Makeham survival
model with one independent variable (x) and a constant term. Each of
these variables applies to each of the structural parameters. The
input data is a STATA data set. The time to failure or censoring is
psurv, and the indicator for failures is rd.
Running the program.
On PCs, type at the command line:
C> bailey commandfile
where commandfile is the name of the ASCII dataset with the commands
described immediately above. On Macs, just invoke the program from
the finder. It will request the name of the command dataset, which
must be in the same directory as the program.
@B. Prediction program@
The predict program enables the user to calculate a vector of
failure probabilities, one for each individual, at a sequence of
times. It also enables the user to calculate parameter values.
The predict program is run by giving the command
C> predict commandfile
where commandfile has a format similar to the command file for
the main Bailey-Makeham model.
The statements for acquisition of raw and stata datasets with the
independent variables for prediction are identical to the main
program:
ascii "dataset_name";
read var1 var2 var3 var4 ... vark;
format "c_scanf format";
or
stata "stata_dataset_name";
2. Option statements:
options outascii altquote debug yydebug labheads missing="value";
Only one of these options is new. outascii
says that the output file is to be ascii instead
of a stata dataset.
parmdsn "datatset_name";
This should be the same as the output statement from
the Bailey-Makeham model. It specifies the binary
dataset which contains the covariance matrix and other
statistics.
output "dataset_name";
This specifies where the output values are to be written.
3. Prediction specifications:
ptime varname1(time1) varname2(time2) ... varnamek(timek);
This specifies the variable names for the failure
probability prediction, and the times at which those
failure times will be calculated.
pparm varnamea(alpha) varnameg(gamma) varnamed(delta)
varnamaa(var(alpha)) varnamag(cov(alpha,gamma))
varnamgg(var(gamma)) varnamgd(cov(gamma,delta))
varnamdd(var(delta)) varnamad(cov(alpha,delta));
Any or all of these choices may be omitted. The
syntax in the parentheses is fixed, but the variable
names are subject to choice.
Example:
stata "small.dta";
parmdsn "final.mta";
ptime day3(3) day5(5);
pparm a(alpha) g(gamma) d(delta);
output "predict.dta";
will produce predicted failure probabilities for 3 units of time
and 5 units of time as well as values of alpha, gamma, and delta.
@C. Surface Program@
The surface program is invoked by the command
C> surface parmdsn
Parmdsn is the output dataset produced by the Bailey-Makeham model.
The surface program computes the likelihood at one and two standard
errors out from the likelihood maximum, for each coefficient. The
program prints the decrement in log likelihood and the ratio of
that decrement to the expected value.
The standard error reflects the possible distance of the coefficient
from its true value, holding all of the variables fixed. If two
variables are collinear, the standard error will be large. However,
the likelihood decrement depends only on the 2nd derivative of the
variable in question. Thus, one standard error may be substantial
in likelihood terms.
@D. Covariance Matrix@
A printed version of the covariance matrix may be obtained by typing
C> covar parmdsn
where parmdsn is the binary output dataset produced by the
Bailey-Makeham model.
3. BAILEY-MAKEHAM MODEL STATA INTERFACE
The syntax of the "bailey" command is:
bailey outcome varlist [if exp] [in range] [=exp], [dead(varname)]
interval(# or varname) trace autofix(#) nocons rescale rhomax(#)
lambda(#)
fxalpha(varlist) fxgamma(varlist) fxdelta(varlist)
svalpha(svlist) svgamma(svlist) svdelta(svlist)
See Bailey (1988) for a complete description of this model. The
model is a generalization of the exponential distribution and has
hazard function
alpha e^{-gamma*t} + delta
where $\alpha$, $\gamma$, and $\delta$ are all functions of $X$,
the vector of covariates described in the varlist. The
quantities $\alpha$, $\gamma$, and $\delta$ are termed
"structural parameters" since all of the effect of the
covariates is represented by them. The functional form of these
structural parameters is:
alpha = e^{Sigma_i alpha_i x_i}
gamma = e^{Sigma_i gamma_i x_i}
delta = e^{Sigma_i delta_i x_i}
The data may be actual failures or censored observations. The time from
the start of observation to failure or censoring is given by the
variable outcome. If there are any censored observations, you must
specify the option "dead()" with a variable that is 1 if the case is a
death and 0 if it is censored. If the observation is a death, it is
assumed to fail in the interval starting at time $T$: (T, T+interval()).
The value of interval() may be a number or a variable name. It
represents the graininess of the observation in the units of the
outcome. The default is 1 unit. For example, if the survival
times are measured in days, the graininess is assumed to be 1 day.
In other words, you know the survival time to one day, but not
closer than that. Although there is little difference between a
fine-grained measurement and a continuous measurement, the
likelihood calculated using the fine-grained approximation is
simpler. If the option is specified as a number, the program
assumes that this number applies to each case. If it is a
variable, then the value of the variable applies to each case.
One important special case is interval(0.00274), which is 1/365
of a year. In other words, this would be saying that the data are
specified in years, but derived from data measured in days. Note
that the value of interval() and the units of measurement affect
the likelihood value, but do not change the nature of the
solution.
Each of the variables in varlist applies to each structural
parameter unless otherwise specified. This process is called
``fixing.'' Variables are fixed for a structural parameter by
including them on its fixed list. The list of variables specified
in fxalpha() is fixed for $\alpha$, the list in fxgamma() is
fixed for $\gamma$, and the list in fxdelta() is fixed for
$\delta$. If a variable is not explicitly fixed, it is assumed to
be free, or available to be maximized.
Starting values are defined by specifying the options
svalpha(svlist), svgamma(svlist), or svdelta(svlist). The
argument svlist is a series of statements of the form
variable=#. If starting values are not specified, they default
to 0 (constant terms are estimated from the data).
Other options are as follows:
trace: Includes a trace of the parameter values at each maximization
step.
nocons: Excludes the automatic constant term from the equation.
Useful if there is no constant or if a mutually exclusive set
of dummy variables is employed.
rescale: Rescales the likelihood by the sum of the weights
(specified by the =exp option). This option is not advised,
but is included for compatibility with other vendors' software.
If this option is not used, the weights are treated like
frequency weights. That is, a weight of 2 means there were two
observations like this that were represented by a single line
of the data. A third meaning of weights is that they represent
inverse sampling probabilities. If this interpretation is
used, the standard errors produced by this program will not be
correct. However, it is better to rescale than not.
maxiter(#): Maximum number of iterations allowed.
autofix(#): If this option is specified, then the program will
automatically fix (stop) a coefficient that is rapidly
changing, but not having an appreciable effect on the
likelihood function. The purpose of this option is to
gracefully deal with collinear and infinite coefficients. This
option will kick in only if the change in the likelihood
function is smaller than the given value. If this option is
not specified, the program will make recommendations but will
not take any action.
rhomax(#): This parameter describes the extent to which we are
willing to extrapolate in the modified Marquardt maximization.
In the Marquardt method, a Newton step (or a ridge-like
approximation thereto) is attempted first. The actual (log)
likelihood gain is computed. This actual gain is combined with
the gradient to calculate an optimal step via a quadratic
approximation. This optimal stepsize is constrained to be at
most rhomax times as large as the original Newton step. The
default is 5; rhomax should always be greater than 1. A larger
value is more aggressive.
lambda(#): This parameter describes the (starting) height
of the ridge that is used if needed in the modified Marquardt
method whenever the 2nd derivative is non-positive definite.
The value is modified during the iterations in light of
experience. The default is 1.0.
Example:
In this dataset of 200 hospitalized cancer patients, we measure the
survival time as a function of the Karnofsky Performance Scale:
. summ survive karnofsk

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+------------------------------------------------------
 survive |     200       46.15     61.3805         1        345
karnofsk |     200        49.6    27.79755        10        100
. cox survive karnofsk, dead(dead)

Iteration 0: Log Likelihood = -865.73173
Iteration 1: Log Likelihood = -852.31805
Iteration 2: Log Likelihood = -852.31781

Cox regression                            Number of obs =       200
                                          chi2(1)       =     26.83
Log Likelihood = -852.31781               Prob > chi2   =    0.0000

Variable | Coefficient   Std. Error        t   Prob > |t|      Mean
---------+----------------------------------------------------------
 survive |                                                     46.15
    dead |                                                         1
---------+----------------------------------------------------------
karnofsk |    -.015111     .0029536   -5.116        0.000       49.6
---------+----------------------------------------------------------

. bailey survive karnofsk, dead(dead) autofix(.01)
Iteration 0:  Log Likelihood =  -959.1589
Iteration 1:  Log Likelihood =  -955.2354
Iteration 2:  Log Likelihood =  -954.4966
Iteration 3:  Log Likelihood =  -954.4889
Iteration 4:  Log Likelihood =  -954.4889
Iteration 0:  Log Likelihood =  -954.4889
Iteration 1:  Log Likelihood =  -946.3064
Iteration 2:  Log Likelihood =  -941.3947
Iteration 3:  Log Likelihood =  -937.1823
Iteration 4:  Log Likelihood =  -936.9542
Iteration 5:  Log Likelihood =  -936.9427
Iteration 6:  Log Likelihood =  -936.9426
Iteration 7:  Log Likelihood =  -936.9426
(convergence achieved)

Bailey-Makeham Survival Model 0.4
Log Likelihood (C) =  -954.489       Number of observations =      200
Chi2( 3)           =    35.093       Log Likelihood         = -936.943

Structural        |
parameter    Var. |      Coef.     Std. Err.       t      Sig.     Mean
------------------+---------------------------------------------------------
alpha    karnofsk | -0.0287193    0.00898029   -3.198   0.0016     49.6
gamma    karnofsk |  0.0338422    0.0177919     1.902   0.0586     49.6
delta    karnofsk |  0.00799231   0.0107872     0.741   0.4596     49.6
alpha       _cons | -2.47772      0.25908      -9.564   0.0000        1
gamma       _cons | -5.52475      1.19964      -4.605   0.0000        1
delta       _cons | -4.98935      0.893983     -5.581   0.0000        1
------------------+---------------------------------------------------------

(Note:  the iterations restart because the optimization procedure
first fits a reference model consisting of a constant term for each
of the structural parameters.  Only after that model has converged
are the other parameters freed for optimization.)
In the coefficient report, note that the log likelihood of the
reference model and the chi-square of the final model with respect
to the reference model are printed on the left-hand side of the
header.
We conclude from this analysis that patients with a high Karnofsky
rating are less likely to die quickly, and that their hazard decays
to its long-term value sooner.  However, patients with a high
Karnofsky rating do not have any better long-term survival rates.
Predictions
There are two types of predictions available for individuals in the
dataset:  failure probabilities at specific points in time, and
predictions of the structural parameters and their standard errors.
The names of the variables to receive these quantities are specified
as part of the options to the mpredict command, unlike most other
Stata commands.  The syntax of mpredict is
mpredict, pparm(parmlist) ptime(faillist)
where parmlist has one or more elements of the form:
varname[alpha] | varname[gamma] | varname[delta] |
varname[var[alpha]] | varname[var[gamma]] | varname[var[delta]] |
varname[cov[alpha,gamma]] | varname[cov[alpha,delta]] |
varname[cov[gamma,delta]]
and faillist has one or more elements of the form varname[time].
Note that what is typed in the square brackets indicates what is
being predicted; what is in front of the square brackets is the
name of a new variable to be created containing that prediction.
Multiple elements are separated by spaces. In the above Karnofsky
example, we might give the command
. mpredict, pparm(alpha1[alpha] gamma1[gamma] delta1[delta])
ptime(p6[180])
to create four new variables, alpha1, gamma1, delta1, and p6.
These are the estimates of the structural parameters for each
observation and the estimated 180-day failure probability.
Surface Analysis
A surface analysis is performed by typing the msurface command,
which takes no arguments.  A surface analysis examines the
likelihood surface as each parameter is moved 1 and 2 standard
errors above and below its estimated value, holding all other
parameters fixed.  The change in log likelihood is printed, along
with a number in parentheses giving the ratio of the actual change
to the change predicted by the quadratic approximation based on the
second derivative matrix.
If the numbers in parentheses are larger than 2 or below 0.5, the
quadratic approximation at the maximum is definitely poor. This
points to problems with the estimated standard errors and possible
problems with multiple or infinite solutions.
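The number in parentheses can be sketched as follows.  The helper
surface_ratio() is hypothetical, and it assumes the quadratic
prediction for a move of d in one parameter is 0.5 * h_ii * d^2,
where h_ii is the corresponding diagonal element of the second
derivative matrix (negative at a maximum):

```c
/* Sketch of the surface-analysis ratio described above.
   ll_perturbed : log likelihood with one parameter moved by d
                  (d = +/-1 or +/-2 standard errors)
   ll_max       : log likelihood at the maximum
   h_ii         : diagonal element of the second derivative matrix
                  for that parameter (h_ii < 0 at a maximum)        */
double surface_ratio(double ll_perturbed, double ll_max,
                     double h_ii, double d)
{
    double actual    = ll_perturbed - ll_max;  /* observed change (<= 0) */
    double predicted = 0.5 * h_ii * d * d;     /* quadratic prediction   */
    return actual / predicted;
}
```

On an exactly quadratic surface the ratio is 1; values far from 1 in
either direction flag the approximation problems discussed above.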
References:
Bailey RC (1977): Moments for a modified Makeham Law of mortality.
Naval Medical Research Institute Technical Report, Bethesda MD.
Bailey RC, Homer LD, Summe JP (1977): A proposal for the analysis
of kidney graft survival. Transplantation 24: 309-315.
Bailey RC (1988): The Makeham Model for Analysis of Survival Data.
Bard (1974): Nonlinear Parameter Estimation. Academic Press.
Gould WW (1990): Stata 2.1 Reference Manual and Update. Computing
Resource Center, Santa Monica, CA. 213-393-9893.
Krakauer HK, Bailey RC (1991): Epidemiologic Oversight of the Medical
Care Provided to Medicare Beneficiaries. Statistics in Medicine
(forthcoming).
Press WH, Teukolsky SA, Flannery BP, Vetterling WT (1990): Numerical
Recipes in C: The Art of Scientific Computing. Cambridge.
Rogers WH, Draper D, Kahn KL, Keeler EB, Rubenstein LV, Kosekoff J,
Brook RH (1990): Quality of care before and after implementation of
the DRG-based prospective payment system. JAMA October 17, 1990;
264(15):1989-1994.
Stewart GW (1973): Introduction to Matrix Computations. Academic
Press.
Summe JP (198?): Bailey Makeham Instructions for SAS mainframe.
Private Communication.
.ls 2
.pb
@Appendix A:  Bailey-Makeham Estimation Problem with 2 local maxima.@
1.  The Dataset

psurv    rd
 .002     1
 .003     1
 .004     1
 1.5      1
 1.7      1
 2.5      1
 6        1
 1        1
 2        1
 3        1
 20       1
 30       1
 40       1
 50       1
 60       1
 80       1
 125      1

Solution 1:
(convergence achieved)

Bailey-Makeham Survival Model 0.4
Log Likelihood (C) =  -139.766       Number of observations =       17
Chi2( 0)           =     0.000       Log Likelihood         = -139.766
Prob>chi2          =     NANFF

Structural        |
parameter    Var. |      Coef.     Std. Err.       t      Sig.     Mean
------------------+---------------------------------------------------------
alpha       _cons | -0.932204     0.500671     -1.862   0.0823        1
gamma       _cons | -0.599606     0.482264     -1.243   0.2328        1
delta       _cons | -3.90168      0.374063    -10.431   0.0000        1
------------------+---------------------------------------------------------

Solution 2:
(convergence achieved)

Bailey-Makeham Survival Model 0.4
Number of observations =       17
Log Likelihood         = -136.256

Structural        |
parameter    Var. |      Coef.     Std. Err.       t      Sig.     Mean
------------------+---------------------------------------------------------
alpha       _cons |  3.26155      0.84484       3.861   0.0017        1
gamma       _cons |  4.92238      0.649651      7.577   0.0000        1
delta       _cons | -3.40707      0.267261    -12.748   0.0000        1
------------------+---------------------------------------------------------