THE APPLICATION OF HAZARD FUNCTION IN THE CASE STUDY OF... USE OF AN IUD FOR WOMEN NOR AFIAH BINTI NOR HANAFI

THE APPLICATION OF HAZARD FUNCTION IN THE CASE STUDY OF THE
USE OF AN IUD FOR WOMEN
NOR AFIAH BINTI NOR HANAFI
UNIVERSITI TEKNOLOGI MALAYSIA
THE APPLICATION OF HAZARD FUNCTION IN THE CASE STUDY OF THE
USE OF AN IUD FOR WOMEN
NOR AFIAH BINTI NOR HANAFI
A report submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Science (Mathematics)
Faculty of Science
Universiti Teknologi Malaysia
NOVEMBER 2009
To my beloved mother and father
ACKNOWLEDGEMENTS
First and foremost, all praise be to Allah S.W.T., the Almighty, the Benevolent, for His blessing and guidance in giving me the inspiration to complete this dissertation.
Many thanks to Dr. Zarina Bt Mohd Khalid for her guidance and support throughout the semester as I completed this dissertation.
Furthermore, I would like to express my appreciation to my friends for their motivation throughout this semester. Without their endless support, this dissertation would not have been completed as presented here.
Last but not least, I extend my appreciation to all my colleagues and others who provided assistance on various occasions. Their views and tips were a useful aid in completing this project.
ABSTRACT
This dissertation presents the application of the hazard function in a case study of the use of an IUD by women. The data in this case study are secondary data retrieved from Modelling Survival Data in Medical Research, Second Edition, by Collet, 1952. The kernel nonparametric method and two parametric Weibull hazard estimation methods, Maximum Likelihood and Least Square, are compared with the Kaplan-Meier estimate of the hazard function. The comparisons are made graphically and through an analysis of standard error. The analysis is carried out using Mathcad and Minitab.
ABSTRAK
This dissertation presents the application of the hazard function in a case study of the use of an IUD among women. The data in this case study are secondary data taken from Modelling Survival Data in Medical Research, Second Edition, by Collet, 1952. The kernel nonparametric technique and the parametric Weibull techniques, namely maximum likelihood and least squares, for estimating the hazard function are compared with the Kaplan-Meier technique for estimating the hazard function. The comparison is made through graphs and an analysis of error. The analysis is carried out using Mathcad and Minitab.
TABLE OF CONTENTS
CHAPTER TITLE
SUPERVISOR AUTHENTICATION
TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF APPENDICES
1 INTRODUCTION
1.1 Background of Problem
1.2 Statement of Problem
1.3 Objectives of Study
1.4 Scope of the Study
1.5 Significance of the Study
1.6 Organization of the Report
2 SURVIVAL ANALYSIS
2.1 Introduction
2.1.1 The Survival Time
2.1.2 The Hazard Function
2.1.3 Weibull Hazard Function
2.2 The Nonparametric Hazard Function Estimation
2.2.1 The Kaplan-Meier Estimate of Hazard Function
2.2.2 The Kernel Estimate of Hazard Function
2.3 The Parametric Weibull Hazard Function Estimation
2.3.1 The Maximum Likelihood Estimation of Weibull Hazard Function
2.3.2 The Least Square Estimate of Weibull Hazard Function
3 THE ESTIMATION OF HAZARD FUNCTION PROCEDURES
3.1 Introduction
3.2 The Data of Time to Discontinuation of the Use of an IUD for 18 Women
3.3 The Nonparametric Estimation of the Hazard Function
3.3.1 The Kaplan-Meier Estimate of Hazard Function
3.3.2 Kernel Estimate of Hazard Function
3.4 The Parametric Estimation of Weibull Hazard Function
3.4.1 The Maximum Likelihood Estimate of Weibull Hazard Function
3.4.2 The Least Square Estimate of Weibull Hazard Function
4 RESULTS AND DISCUSSION
4.1 Introduction
4.2 The Nonparametric Estimation of Hazard Function for the Time to Discontinuation of the Use of an IUD for 18 Women
4.2.1 The Kaplan-Meier Estimate of Hazard Function
4.2.2 The Kernel Estimate of Hazard Function
4.2.3 The Comparison of Nonparametric Approaches in Estimating the Hazard Function
4.3 The Parametric Estimate of Weibull Hazard Function for the Time to Discontinuation of the Use of an IUD for 18 Women
4.3.1 The Maximum Likelihood Estimate of Weibull Hazard Function
4.3.2 The Least Square Estimate of Weibull Hazard Function
4.3.3 The Comparison of the Standard Error of Estimated Weibull Hazard Function
4.4 The Comparison of the Nonparametric and Parametric Approaches in Estimating the Hazard Function
5 CONCLUSIONS AND SUGGESTIONS
5.1 Conclusions
5.2 Suggestions
5.2.1 Suggestions Based on Findings
5.2.2 Suggestions for Future Research
REFERENCES
Appendices A - E
LIST OF TABLES
TABLE NO. TITLE
3.1 Time in weeks to discontinuation of the use of an IUD
4.1 Kaplan-Meier type estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.2 Kernel estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.3 Maximum Likelihood estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.4 Least Square estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.5 The comparison of standard error of estimated Weibull hazard function
4.6 The standard error (RMSE) of nonparametric and parametric estimation methods of hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
LIST OF FIGURES
FIGURE NO. TITLE
2.1 The form of the Weibull hazard function, $h(t) = \lambda k t^{k-1}$, for different values of k
2.2 Carl Friedrich Gauss
4.1 Kaplan-Meier type estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.2 Epanechnikov Kernel type estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.3 Biweight Kernel type estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.4 Maximum Likelihood estimate of Weibull hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.5 Least Square estimate of Weibull hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
4.6 The graphical comparison of the nonparametric and parametric approaches in estimating the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women
LIST OF SYMBOLS
$b$ - Bandwidth
$c$ - Censoring time
$d_j$ - Number of deaths at the jth death time
$F_T(\cdot)$ - Cumulative distribution function
$f_T(\cdot)$ - Probability density function
$h_T(t)$ - Failure rate at time t
$H(\cdot)$ - Integrated hazard function
$h(t)$, $\lambda(t)$ - Hazard function at time t
$\hat{h}(t)$, $h^*(t)$, $\hat{\lambda}(t)$ - Estimated hazard function
$K(x)$ - Kernel function of variable x
$k$ - Shape parameter of Weibull hazard function
$L(\theta)$ - Likelihood function of parameter $\theta$
$L_n$ - Modified empirical distribution of L
$n$ - Number of observations
$n_j$ - Number of observations at risk at the jth death time
$S(t)$ - Survival function at time t
$\hat{S}(t)$ - Estimated survival function
$se(\theta)$ - Standard error of $\theta$
$t_{(j)}$ - jth death time
$W(\lambda, k)$ - Weibull distribution
$\lambda$ - Scale parameter of Weibull hazard function
$\hat{\sigma}^2$ - Estimate of variance
LIST OF APPENDICES
APPENDIX TITLE
A The application of Minitab in estimating the hazard function
B Estimated hazard function
C Standard error of Kernel estimate of hazard function
D The application of Mathcad in Kernel estimate of the hazard function
E The Newton-Raphson procedure
CHAPTER 1
INTRODUCTION
1.1 Background of the Problem
The beginning of survival analysis may be traced back to early work on mortality in the seventeenth century, when Graunt published the first Weekly Bill of Mortality in London and Halley published the first life table. Since then, the life table method has been used regularly by actuaries, statisticians, and biomedical researchers in governmental and private agencies. During World War II, the reliability of military equipment became a significant issue. This led to the study of the durability, or "lifetime", of industrial devices.
After the war, the techniques used to analyze the reliability of industrial devices were extended and applied to the study of the survival time of cancer patients. The phrase "lifetime analysis" used by industrial reliability engineers was changed to "survival analysis" by cancer researchers. Over the past four decades, survival analysis has become one of the most frequently used techniques for analyzing data in disciplines ranging from medicine, epidemiology, and environmental health to criminology, marketing, and astronomy.
The primary variable in survival analysis is survival time, which is no longer limited to mean the time to death. The term "survival time" is used loosely for the period from a starting time point to the occurrence of a certain event. Examples of survival time are the time to the development of diabetic retinopathy from the time of diagnosis of diabetes, the time to parole of a prisoner, the duration of a first marriage, workman's compensation or other insurance claims, and the lifetime of electronic devices and computer components.
In summarizing survival data, two functions are of central interest: the survival function and the hazard function. There are many ways to estimate these functions. This dissertation focuses on estimating the hazard function rather than the survival function. The hazard function is applied in the case study of the use of an IUD by women. The kernel estimate and the parametric Weibull hazard function estimates, obtained by the Maximum Likelihood and Least Square methods, are compared with the Kaplan-Meier estimate in order to find the most suitable approach for estimating the hazard function.
1.2 Statement of the Problem
Two nonparametric estimates of the hazard function are discussed in this dissertation: the Kaplan-Meier estimate and the kernel estimate. For the parametric estimation of the Weibull hazard function, the Maximum Likelihood and Least Square methods are considered. Among these approaches, one will be the most suitable for estimating the hazard function of a finite data set. The problem is to determine which one. To do so, all of these approaches must be compared, which raises three further questions: how to estimate the hazard function using the Kaplan-Meier estimate, the kernel method, the Maximum Likelihood Estimate and the Least Square Method; how to find the standard error of the estimated hazard function; and how to identify the most appropriate of the nonparametric and parametric approaches for estimating the hazard function.
1.3 Objectives of the Study
The objectives of this study are:
1.3.1 To study the basic concepts of the Kaplan-Meier estimate, the kernel method, the Maximum Likelihood Estimate and the Least Square Method in order to estimate the hazard function of the time to discontinuation of the use of an IUD for 18 women.
1.3.2 To compare the nonparametric and parametric estimates obtained with the Kaplan-Meier estimate, the kernel method, the Maximum Likelihood Estimate and the Least Square Method in order to find their advantages and disadvantages.
1.3.3 To determine the most suitable approach for estimating the hazard function through graphical comparison and the analysis of error.
1.4 Scope of the Study
This study discusses the application of two types of approach to the hazard function, nonparametric and parametric, in the case study of the use of an IUD by women.
For the nonparametric approaches, the Kaplan-Meier estimate and the kernel estimate are discussed. For the parametric approaches, Maximum Likelihood Estimation and the Least Square Method are discussed.
In addition, this study determines the most precise approach for estimating the hazard function.
1.5 Significance of the Study
The outcomes of this study will benefit both the statistical and the medical fields. In statistics, this study offers further knowledge on the use of suitable approaches for analyzing non-negative random variables, for instance the time to a certain event. In medicine, this research will improve medical practitioners' knowledge of the risk a person faces at particular times after being given some kind of treatment, so that they can improve the treatment of their patients.
1.6 Organization of the Report
This report consists of five chapters: Introduction, Survival Analysis, The Estimation of Hazard Function Procedures, Results and Discussion, and Conclusions and Suggestions.
Chapter 1 introduces the study. It contains six subtopics: background of the problem, statement of the problem, objectives of the study, scope of the study, significance of the study and organization of the report.
Chapter 2 discusses survival analysis. The chapter introduces survival analysis through the survival time and the hazard function. It then discusses nonparametric hazard function estimation and parametric Weibull hazard function estimation.
Chapter 3 explains the procedures for estimating the hazard function. It contains four subtopics: introduction, the medical data, the nonparametric estimation of the hazard function and the parametric estimation of the Weibull hazard function.
Chapter 4 presents the results and discussion. Its main topics are the introduction, the nonparametric estimation of the hazard function for the time to discontinuation of the use of an IUD for 18 women, the parametric estimate of the Weibull hazard function for the same data, and the graphical comparison of the nonparametric and parametric approaches in estimating the hazard function.
Lastly, Chapter 5 presents the conclusions and suggestions. The suggestions are based on the findings and serve as preparation for future research.
CHAPTER 2
SURVIVAL ANALYSIS
2.1 Introduction
Survival analysis is a technique widely used in a variety of disciplines to study the properties of durations between certain events. Typical examples of durations are unemployment spells, survival times of patients, and durations between subsequent transactions in a financial security.
2.1.1 The Survival Time
The main variable in survival analysis is survival time, which is no longer restricted to mean the time to death. The expression "survival time" is used loosely for the period from a starting time point to the occurrence of a certain event. Examples of survival time are the time to the development of diabetic retinopathy from the time of diagnosis of diabetes, the time to parole of a prisoner, the duration of a first marriage, workman's compensation or other insurance claims, and the lifetime of electronic devices and computer components.
Survival time is a nonnegative random variable measuring the time interval from a starting point to the occurrence of a given event. Because of its two special features, survival time is not amenable to standard statistical methods. First, the exact survival time may be longer than the duration of the study (or observation time) and is consequently unknown.
For example, in a ten-year longitudinal study of diabetic retinopathy, many patients may not develop retinopathy in the ten-year period and, as a result, their exact retinopathy-free time is unknown. If a patient with no retinopathy enters the study at the beginning of year 2 and is still retinopathy-free at the end of the study, his retinopathy-free time is at least nine years (usually denoted by 9+ years). In an insurance claim, the total amount of money an insurance company pays in the case of a severe automobile accident may not be known exactly at a given time. It may be known only that the total amount is at least $5000 (or $5000+).
In many clinical and epidemiological studies, participants may leave the study before it ends and so become lost to follow-up. They may die of a cause unrelated to the disease under study or simply refuse to continue their participation. For example, in a three-year study of mortality due to lung cancer, a participant may decline to continue after two years. His survival time is 2+ years. These incomplete observations are known as "censored" observations and, in particular, "right censored" observations, as opposed to "left censored" observations, in which the exact survival time is unknown but is less than the observation time.
Most survival analysis methods focus on right censoring, as it occurs far more often than left censoring. The incompleteness of the data makes conventional statistical methods inappropriate.
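The retinopathy example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Observation`, the helper `observe` and the two subjects are hypothetical and do not come from the data analysed in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    time: float   # observed follow-up time, in years
    event: bool   # True if the event was seen, False if right censored

    def label(self) -> str:
        # Right-censored times are written with a trailing "+".
        return f"{self.time:g}" if self.event else f"{self.time:g}+"


def observe(entry: float, event_at: Optional[float], study_end: float) -> Observation:
    """Follow a subject from `entry` until the event or the end of the study.

    `event_at` is the calendar time of the event, or None if no event
    occurs within the period that can be observed.
    """
    if event_at is not None and event_at <= study_end:
        return Observation(event_at - entry, True)
    return Observation(study_end - entry, False)


# Hypothetical subjects in a ten-year study (calendar time in years).
late_entrant = observe(entry=1.0, event_at=None, study_end=10.0)
early_event = observe(entry=0.0, event_at=4.5, study_end=10.0)
print(late_entrant.label())
print(early_event.label())
```

The first subject enters at the beginning of year 2 and is still event-free at the end of the study, so only a lower bound on the survival time is recorded.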
2.1.2 The Hazard Function
The hazard function is a particularly important characteristic of a lifetime distribution. It indicates the way the risk of failure varies with age or time, and this is of interest in most applications. Prior information about the shape of the hazard function can help guide model selection. Finally, if factors affecting an individual's lifetime vary over time, it is often essential to approach modeling through the hazard function.
The functions $\bar{F}_T(\cdot)$ and $f_T(\cdot)$, where $\bar{F}_T(t) = \Pr(T \ge t)$ is the survivor function and $f_T$ is the density, provide two mathematically equivalent ways of specifying the distribution of a continuous nonnegative random variable, and there are many other equivalent functions. One of particular value in the present context is the hazard function, or age-specific failure rate, defined by
$$h_T(t) = \lim_{\Delta \to 0^+} \frac{\Pr(t \le T < t + \Delta \mid t \le T)}{\Delta}. \tag{2.1}$$
By the definition of conditional probability, and dropping the suffix T, it follows that
$$h(t) = f(t) / \bar{F}(t). \tag{2.2}$$
If there is an atom $f_j$ of probability at time $a_j$, $h(t)$ contains a component $h_j \delta(t - a_j)$, where
$$h_j = f_j / \bar{F}(a_j), \tag{2.3}$$
and for a purely discrete distribution with atoms $\{f_j\}$ at points $\{a_j\}$, $a_1 < a_2 < \dots$,
$$h(t) = \sum_j h_j \delta(t - a_j),$$
where
$$h_j = f_j / \bar{F}(a_j) = f_j / (f_j + f_{j+1} + \dots). \tag{2.4}$$
For a continuous distribution, the probability density function is
$$f_T(t) = -\bar{F}_T'(t) = \lim_{\Delta \to 0^+} \frac{\Pr(t \le T < t + \Delta)}{\Delta}. \tag{2.5}$$
By (2.2) and (2.5),
$$h(t) = -\bar{F}'(t) / \bar{F}(t) = -\frac{d \log \bar{F}(t)}{dt},$$
so that, because $\bar{F}(0) = 1$,
$$\bar{F}(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp[-H(t)], \tag{2.6}$$
say, where $H(\cdot)$ is called the integrated hazard. In addition,
$$f(t) = h(t) \exp[-H(t)]. \tag{2.7}$$
If and only if $h(\cdot)$ is constant, with value $\rho$ say, the distribution is exponential,
$$\bar{F}(t) = e^{-\rho t}, \qquad f(t) = \rho e^{-\rho t}. \tag{2.8}$$
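As a numerical check of the relation between the hazard, the integrated hazard and the survivor function, the following Python sketch integrates a hazard with the trapezoidal rule and compares the result with the exponential closed form in (2.8). The function name `survivor_from_hazard` and the value of `rho` are assumptions made for illustration only.

```python
import math


def survivor_from_hazard(hazard, t, steps=10_000):
    """Approximate exp(-H(t)), where H(t) is the integral of hazard over [0, t].

    The integral is computed with the composite trapezoidal rule.
    """
    if t == 0:
        return 1.0
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return math.exp(-total * width)


rho = 0.3  # hypothetical constant hazard


def constant_hazard(u):
    return rho


t = 2.0
numeric = survivor_from_hazard(constant_hazard, t)
closed_form = math.exp(-rho * t)         # survivor function in (2.8)
density = constant_hazard(t) * numeric   # f(t) = h(t) exp[-H(t)], as in (2.7)
print(numeric, closed_form, density)
```

Any other nonnegative hazard can be passed in place of `constant_hazard`; the constant case is used here only because its closed form is available for comparison.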
For a discrete distribution, it follows on applying (2.4) recursively, or by a direct application of the product law of probabilities, that the survivor function is
$$\bar{F}(t) = \prod_{a_j < t} (1 - h_j); \tag{2.9}$$
to have $T \ge t$ it is necessary and sufficient to survive all points of support before t.
To define an integrated hazard in the discrete case, the most useful convention is to take
$$H(t) = -\sum_{a_j < t} \log(1 - h_j), \tag{2.10}$$
so that (2.6) still holds:
$$\bar{F}(t) = \exp[-H(t)].$$
If the $h_j$ are small,
$$H(t) \approx \sum_{a_j < t} h_j, \tag{2.11}$$
and the right-hand side could be taken as an alternative definition.
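The discrete formulas can be illustrated with a small Python sketch. The atoms and support points below are hypothetical, and the integrated hazard is taken with the sign that makes the survivor function equal to exp[-H(t)].

```python
import math

# Hypothetical discrete lifetime distribution: atoms f_j at points a_j.
points = [1.0, 2.0, 3.0, 4.0]
atoms = [0.1, 0.2, 0.3, 0.4]   # probabilities, summing to 1


def discrete_hazards(atoms):
    """h_j = f_j / (f_j + f_{j+1} + ...), as in (2.4)."""
    return [f / sum(atoms[j:]) for j, f in enumerate(atoms)]


def survivor(t, points, hazards):
    """Product of (1 - h_j) over the support points a_j < t, as in (2.9)."""
    result = 1.0
    for a, h in zip(points, hazards):
        if a < t:
            result *= 1.0 - h
    return result


def integrated_hazard(t, points, hazards):
    """Minus the sum of log(1 - h_j) over the support points a_j < t."""
    return -sum(math.log(1.0 - h) for a, h in zip(points, hazards) if a < t)


hazards = discrete_hazards(atoms)
print(hazards)
print(survivor(3.5, points, hazards))
print(math.exp(-integrated_hazard(3.5, points, hazards)))
```

The last two printed values agree up to rounding, and both equal the probability mass remaining after the first three atoms.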
Cox and Oakes (1984) stated that there are several reasons why consideration of the hazard function may be a good idea:
(i) It may be physically enlightening to consider the immediate 'risk' attaching to an individual known to be alive at age t;
(ii) Comparisons of groups of individuals are sometimes most incisively made via the hazard;
(iii) Hazard-based models are often convenient when there is censoring or there are several types of failure;
(iv) Comparison with an exponential distribution is particularly simple in terms of the hazard;
(v) The hazard is the special form for the 'single failure' system of the complete intensity function for more elaborate point processes, that is, systems in which several point events can occur for each individual.
Lee and Go (1997) stated that the hazard function gives the conditional failure rate and is defined as
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T < t + \Delta t \mid T > t)}{\Delta t}.$$
The hazard function is also known as the instantaneous failure rate, age-specific failure rate, or conditional mortality rate. It is a measure of the proneness to failure as a function of the age of the individual, in the sense that the quantity $\Delta t\, h(t)$ is the expected proportion of age t individuals who will fail in the short interval t to $t + \Delta t$. It is not strictly a probability, as its value can be larger than one. The hazard function plays an important role in survival analysis.
The hazard is a particularly important measure in survival analysis. Suppose that the distribution and density functions of T, the "failure time" random variable, are denoted by F and f respectively. Subramanian and Bean (2008) defined the hazard function, denoted by $\lambda(t)$, as the limit of $P(t \le T < t + h \mid T \ge t)/h$ as h tends to 0, which takes the form $f(t)/\bar{F}(t)$, where $\bar{F}(t) = 1 - F(t)$. When data are right censored, the observed data are n independent and identically distributed replicates of $(X, \delta)$, where $X = \min(T, C) = T \wedge C$ and $\delta = I(X = T)$, and C is an independent censoring time. This is the classical random censorship (CRC) model, for which estimation of $\lambda(t)$ has been well studied and which serves as a basis of reference for a variety of hazard estimation methods.
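A minimal simulation of the random censorship model may help fix the notation. In the Python sketch below, the exponential distributions for T and C, the rates and the function name `simulate_crc` are all assumptions chosen for brevity; the model itself does not require them.

```python
import random


def simulate_crc(n, event_rate, censor_rate, seed=0):
    """Draw n replicates of (X, delta) under random censorship.

    T and C are independent exponential variables here; that choice is an
    assumption made only to keep the sketch short.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)    # latent failure time T
        c = rng.expovariate(censor_rate)   # latent censoring time C
        x = min(t, c)                      # observed time X = min(T, C)
        delta = 1 if t <= c else 0         # delta = I(X = T)
        data.append((x, delta))
    return data


sample = simulate_crc(n=1000, event_rate=0.5, censor_rate=0.25)
observed_fraction = sum(delta for _, delta in sample) / len(sample)
print(sample[:3])
print(observed_fraction)  # expected to be near 0.5 / (0.5 + 0.25) = 2/3
```

Only the pairs (X, delta) would be available to an analyst; the latent T and C are discarded, which is what makes the observations right censored.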
In addition, Horová et al. (2009) stated that the survival process can also be described by the hazard function $\lambda = \lambda(x)$, that is, the probability that an individual dies at time x, conditional on he or she having survived to that time. If the lifetime distribution F has a density f, then for $\bar{F}(x) > 0$ the hazard function is defined by
$$\lambda(x) = \frac{f(x)}{\bar{F}(x)}.$$
Since $\bar{F}(0) = 1$, the survival function can be expressed by the formula
$$\bar{F}(x) = \exp\left(-\int_0^x \lambda(t)\,dt\right).$$
Let a cohort of initial size $N_0$ die out with the time-dependent death rate $\mu = \mu(x)$, so that the size of the cohort $N = N(x)$ at time x evolves according to the differential equation $N'(x) = -\mu(x) N(x)$, $N(0) = N_0$, whose solution is
$$N(x) = N_0 \exp\left(-\int_0^x \mu(t)\,dt\right).$$
In this context the survival function $\bar{F}$ is defined as
$$\bar{F}(x) = \frac{N(x)}{N_0}.$$
Hence the death rate $\mu$ equals the hazard function $\lambda$. Consequently
$$\lambda(x) = -\frac{N'(x)}{N(x)}.$$
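The step from the cohort differential equation to its solution is a standard separation of variables. It is written out here for completeness, using only the symbols $N$, $N_0$ and $\mu$ introduced above:
$$\frac{N'(x)}{N(x)} = -\mu(x) \;\Longrightarrow\; \log N(x) - \log N_0 = -\int_0^x \mu(t)\,dt \;\Longrightarrow\; \frac{N(x)}{N_0} = \exp\left(-\int_0^x \mu(t)\,dt\right).$$
Comparing the right-hand side with the expression of the survival function in terms of $\lambda$ shows why the death rate $\mu$ and the hazard function $\lambda$ coincide.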
2.1.3 Weibull Hazard Function
A probability distribution that plays a central role in the analysis of survival data is the Weibull distribution, introduced by W. Weibull in 1951 in the context of industrial reliability testing. Indeed, this distribution is as central to the parametric analysis of survival data as the normal distribution is in linear modeling.
In practice, the assumption of a constant hazard function, or equivalently of exponentially distributed survival times, is rarely tenable. Collet (1952a) stated that a more general form of hazard function is
$$h(t) = \lambda k t^{k-1}, \tag{2.12}$$
for $0 \le t < \infty$, a function that depends on two parameters k and $\lambda$, both of which are greater than zero. In the particular case where k = 1, the hazard function takes the constant value $\lambda$, and the survival times have an exponential distribution. For other values of k, the hazard function increases (k > 1) or decreases (k < 1) monotonically, that is, it does not change direction. The shape of the hazard function depends critically on the value of k, and so k is known as the shape parameter, while the parameter $\lambda$ is a scale parameter. The general form of this hazard function for different values of k is shown in Figure 2.1.
Figure 2.1: The form of the Weibull hazard function, $h(t) = \lambda k t^{k-1}$, for different values of k.
For this particular choice of hazard function, the survival function is
$$S(t) = \exp\left\{-\int_0^t \lambda k u^{k-1}\,du\right\} = \exp(-\lambda t^k).$$
The corresponding probability density function is then
$$f(t) = \lambda k t^{k-1} \exp(-\lambda t^k), \tag{2.13}$$
for $0 \le t < \infty$, which is the density of a random variable that has a Weibull distribution with scale parameter $\lambda$ and shape parameter k. The parameter $\alpha$, defined through $k = \alpha^{-1}$, is frequently used in place of k. This distribution will be denoted $W(\lambda, k)$. The right-hand tail of this distribution is longer than the left-hand one, and thus the distribution is positively skewed.
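The three Weibull functions translate directly into code. The following Python sketch evaluates the hazard for a shape parameter below, equal to and above one; the value of `lam` and the evaluation times are hypothetical and are not estimates from the IUD data.

```python
import math


def weibull_hazard(t, lam, k):
    """h(t) = lam * k * t**(k - 1), as in (2.12)."""
    return lam * k * t ** (k - 1)


def weibull_survival(t, lam, k):
    """S(t) = exp(-lam * t**k)."""
    return math.exp(-lam * t ** k)


def weibull_density(t, lam, k):
    """f(t) = h(t) * S(t), as in (2.13)."""
    return weibull_hazard(t, lam, k) * weibull_survival(t, lam, k)


lam = 0.02  # hypothetical scale parameter
for k in (0.5, 1.0, 2.0):
    values = [weibull_hazard(t, lam, k) for t in (1.0, 5.0, 10.0)]
    print(k, values)
```

With k = 0.5 the printed hazards decrease with time, with k = 1 they stay at `lam`, and with k = 2 they increase, which matches the monotone behaviour described for the shape parameter.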
Franklin (2005) stated that the exponential distribution is not very flexible, because it imposes the characteristic that the hazard function is constant. Substantively, this means that the chance of failure is the same at all survival times. Whether the case has survived 10 days, 10 months or 10 years, it has the same chance of failure in the next moment. Many find this property highly restrictive. The Weibull distribution has a much more flexible form and includes the exponential as a special case, which makes it more attractive than the exponential.
2.2 The Nonparametric Hazard Function Estimation
2.2.1 The Kaplan-Meier Estimate of Hazard Function
The estimation of the hazard function and of the survival function when the data are censored has been a prominent topic in survival analysis. In the analysis of survival data, the hazard function is estimated from the observed survival times.
Estimation of the hazard function has drawn far less attention than that of the survival function. For the survival function, the Kaplan-Meier estimator is the most widely used. This estimator is self-consistent, and its asymptotic properties, such as strong consistency and asymptotic normality, are well established. Furthermore, the Kaplan-Meier estimate can also be used to estimate the hazard function.
Collet (1952b) stated that a single sample of survival data may also be summarized through the hazard function, which shows the dependence of the instantaneous risk of death on time, and that the Kaplan-Meier approach is one way of estimating the hazard function.
The Kaplan-Meier type estimate is a natural way of estimating the hazard function for survival data. It takes the ratio of the number of deaths at a given death time to the number of individuals at risk at that time.
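The ratio just described can be computed as follows. This Python sketch uses a small hypothetical sample, not the IUD data set analysed in this report, and the function name `km_type_hazard` is an assumption for illustration.

```python
def km_type_hazard(times, events):
    """Return (death_time, d_j, n_j, d_j / n_j) for each distinct death time.

    times  : observed times, whether censored or not
    events : 1 if the time is a death (event) time, 0 if it is right censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    rows = []
    for tj in death_times:
        n_j = sum(1 for t in times if t >= tj)   # at risk just before tj
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        rows.append((tj, d_j, n_j, d_j / n_j))
    return rows


# Hypothetical data, not the IUD data set analysed in this report.
times = [2, 3, 3, 5, 6, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 1]
for row in km_type_hazard(times, events):
    print(row)
```

Censored individuals contribute to the risk set up to their censoring time but never to the death count. Some presentations additionally divide each ratio by the length of the interval to the next death time to obtain a rate per unit time; that step is omitted in this sketch.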
2.2.2 The Kernel Estimate of Hazard Function
A kernel is a weighting function used in nonparametric estimation techniques. Kernels are used in kernel density estimation to estimate the density functions of random variables, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time series analysis, where the periodogram is smoothed to estimate the spectral density. A further use is in the estimation of a time-varying intensity for a point process.
A kernel is a non-negative, real-valued, integrable function K satisfying the following two requirements:
$$\int_{-\infty}^{+\infty} K(u)\,du = 1 \quad \text{and} \quad K(-u) = K(u) \text{ for all values of } u.$$
The first requirement ensures that kernel density estimation results in a probability density function, and the second ensures that the mean of the corresponding distribution is equal to that of the sample used.
If K is a kernel, then so is the function K* defined by $K^*(u) = \lambda^{-1} K(\lambda^{-1} u)$, where $\lambda > 0$. This can be used to select a scale that is appropriate for the data.
Huang (2005) stated that common choices for the kernel are the Epanechnikov kernel, with
$$K(u) = 0.75(1 - u^2) \quad \text{for } -1 \le u \le 1,$$
and the biweight kernel, with
$$K(u) = \frac{15}{16}(1 - u^2)^2 \quad \text{for } -1 \le u \le 1.$$
By definition, the Epanechnikov and biweight kernels are the functions $(3/4)(1 - u^2)$ and $(15/16)(1 - u^2)^2$, respectively, for $-1 < u < 1$, and zero for u outside that range. Here u equals $(x - x_i)/h$, where h is the bandwidth (written b in the List of Symbols), $x_i$ are the values of the independent variable in the data, and x is the value of the scalar independent variable at which an estimate is sought.
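The two kernels and the scaled argument $(x - x_i)/b$ can be sketched in Python as below. The helper `kernel_smooth`, which weights each data point by a supplied increment, is one common way of turning hazard increments into a smooth curve and is an assumption here, as are the death times, the increments and the bandwidth; it is not necessarily the exact estimator used later in this report.

```python
def epanechnikov(u):
    """K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def biweight(u):
    """K(u) = (15/16) * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def integrate(func, lo, hi, steps=20_000):
    """Composite midpoint rule, used only to check the kernel conditions."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


def kernel_smooth(x, points, weights, bandwidth, kernel):
    """Kernel-weighted sum at x: (1/b) * sum_i w_i * K((x - x_i) / b)."""
    return sum(w * kernel((x - xi) / bandwidth)
               for xi, w in zip(points, weights)) / bandwidth


print(integrate(epanechnikov, -1.0, 1.0))   # close to 1
print(integrate(biweight, -1.0, 1.0))       # close to 1

# Hypothetical death times, with hazard increments d_j / n_j as weights.
death_times = [2.0, 3.0, 5.0, 8.0, 9.0]
increments = [1 / 8, 1 / 7, 1 / 5, 2 / 3, 1.0]
smoothed = kernel_smooth(5.0, death_times, increments,
                         bandwidth=3.0, kernel=epanechnikov)
print(smoothed)
```

Only the death times within one bandwidth of the evaluation point contribute, so a larger bandwidth gives a smoother but less local estimate.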
2.3 The Parametric Weibull Hazard Function Estimation
2.3.1 The Maximum Likelihood Estimation of Weibull Hazard Function
Maximum likelihood estimates of the two parameters of the Weibull distribution are, in large samples, the most efficient estimates (lowest variances), but they are numerically complex and are usually calculated with a computer program.
Lawless (2003) stated that to obtain a likelihood function, $L(\theta) \propto \Pr(\text{Data}; \theta)$, or the properties of statistical procedures based on censored data, it is necessary to consider the process by which both lifetimes and censoring times arise. To do this, a probability model is apparently needed for the censoring mechanism. Interestingly, it turns out that the observed likelihood function for lifetime parameters takes the same form under a broad range of mechanisms.
First, some notation for censored data is set up. Assume that n individuals have lifetimes represented by random variables $T_1, T_2, \dots, T_n$. Instead of the observed value of each lifetime, however, there is a known time $t_i$ which is either the lifetime or a censoring time. A variable $\delta_i = I(T_i = t_i)$ is defined as
$$\delta_i = \begin{cases} 1, & T_i = t_i \\ 0, & T_i > t_i. \end{cases}$$
The observed data then consist of $(t_i, \delta_i)$, $i = 1, \dots, n$. With this notation, $t_i$ stands for either a random variable or its realized value. This departs from the convention in which capital letters denote random variables and lowercase letters denote realized values, but no confusion should arise. The most important result is that, for a variety of censoring mechanisms, the observed likelihood function equals
$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i)^{1 - \delta_i}.$$
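For the Weibull distribution $W(\lambda, k)$ of Section 2.1.3, the censored likelihood above can be evaluated on the log scale as in the following Python sketch. The sample is hypothetical and the function name `weibull_log_likelihood` is an assumption; the sketch only evaluates the log-likelihood and does not perform the numerical maximisation.

```python
import math


def weibull_log_likelihood(lam, k, times, deltas):
    """log L = sum_i [ delta_i * log f(t_i) + (1 - delta_i) * log S(t_i) ].

    Uses f(t) = lam * k * t**(k - 1) * exp(-lam * t**k) and
    S(t) = exp(-lam * t**k), so that log f(t) = log h(t) + log S(t).
    """
    total = 0.0
    for t, delta in zip(times, deltas):
        log_survival = -lam * t ** k
        log_hazard = math.log(lam) + math.log(k) + (k - 1) * math.log(t)
        total += delta * log_hazard + log_survival
    return total


# Hypothetical right-censored sample (delta = 0 marks a censored time).
times = [2.0, 3.0, 3.0, 5.0, 6.0, 8.0, 8.0, 9.0]
deltas = [1, 1, 0, 1, 0, 1, 1, 1]

# With k = 1 (the exponential case) the maximiser has the closed form
# lam_hat = (number of events) / (total follow-up time).
lam_hat = sum(deltas) / sum(times)
print(lam_hat, weibull_log_likelihood(lam_hat, 1.0, times, deltas))
```

Every observation contributes its log survival term, while only uncensored observations add the log hazard term. When k is also unknown there is no closed form, which is why an iterative procedure is needed in practice.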
Perhaps the most important property of the maximum likelihood procedure is that it produces an estimate of the variance of the distribution of the estimated quantities. This estimated variance also accounts for the incomplete information from the censored observations. As mentioned, the influence of censored values is rarely a concern once it is accounted for in the estimation process.
Consider the hazard function, or escape rate,
$$h(y \mid x, \theta) = \lim_{\Delta \to 0} \frac{\Pr(y \le Y < y + \Delta \mid y \le Y, x)}{\Delta} = \frac{f(y \mid x, \theta)}{1 - F(y \mid x, \theta)}.$$
The hazard function is just another way of characterizing a distribution, like the density function, the distribution function, the survivor function, the moment generating function or the characteristic function. It is simply a particularly convenient and interpretable way of describing a distribution of durations. Given the hazard, the distribution function can be calculated as
$$F(y \mid x, \theta) = 1 - \exp\left(-\int_0^y h(s \mid x, \theta)\,ds\right),$$
and hence the density function. The exponential model implies that the hazard function remains constant over the duration of the spell, equal to $\exp(x'\beta)$ in the previous specification. To illustrate, take a person and consider their chances of finding a job on the first day of being unemployed. These chances are the same as the chances that this same person finds a job on the fiftieth day, given that he or she has failed to find work in the first forty-nine days. This may be reasonable, but it might also be something that one does not wish to impose from the outset.
Therefore, an addition allowing the hazard function to either increase, stay
constant, or decrease over time is considered. This addition is known as the Weibull
distribution:
h( y | x, β , α ) = (α + 1) y α exp( x' β ) .
Note that this reduces to the exponential distribution if α = 0. The implied density
function for the Weibull distribution is
f ( y | x, β , α ) = (α + 1) y α exp( x' β ) exp(− y α +1 exp( x' β )) .
The moments of this distribution are
$$E[y^k \mid X] = \exp\left(-\frac{k\, x'\beta}{\alpha + 1}\right) \Gamma\left(1 + \frac{k}{\alpha + 1}\right).$$
(Note that for the case with $\alpha = 0$ this reduces to the exponential case with $E(y^k \mid X) = \exp(-k x'\beta)\,\Gamma(1 + k)$, and thus with $k = 1$ the mean of the exponential distribution is $E[y \mid X] = \exp(-x'\beta)$.)
The log likelihood function for this model is
$$L(\alpha, \beta) = \sum_{t=1}^{N} \left( \ln(\alpha + 1) + \alpha \ln y_t + x_t'\beta - y_t^{\alpha+1} \exp(x_t'\beta) \right).$$
2.3.2 The Least Square Estimate of Weibull Hazard Function
Least squares estimation is applied so routinely in engineering and mathematics that it is often not thought of as an estimation problem at all. It is a time-honored estimation procedure that has been in use since the early nineteenth century, and it is possibly the most extensively used technique in geophysical data analysis.
The technique of least squares developed in the fields of astronomy and geodesy as scientists and mathematicians sought solutions to the challenges of navigating the Earth's oceans during the Age of Exploration. An accurate description of the behavior of celestial bodies was key to allowing ships to navigate in open seas, where sailors had previously depended on land sightings to determine the positions of their ships.
The method was the culmination of several advances that took place during the course of the eighteenth century.
(i) The combination of different observations taken under the same conditions, as opposed to simply trying one's best to observe and record a single observation accurately. This approach was used by Tobias Mayer while studying the librations of the moon.
(ii) The combination of different observations as being the best estimate of the true value; errors decrease with aggregation rather than increase, an idea perhaps first expressed by Roger Cotes.
(iii) The combination of different observations taken under different conditions, as performed by Roger Joseph Boscovich in his study of the shape of the earth and by Pierre-Simon Laplace in his work explaining the differences in motion of Jupiter and Saturn.
(iv) The development of a criterion that can be evaluated to determine when the solution with the minimum error has been attained, developed by Laplace in his Method of Situation.
Figure 2.2: Carl Friedrich Gauss
Carl Friedrich Gauss is credited with developing the fundamentals of least-squares analysis in 1795, at the age of eighteen. However, Legendre was the first to publish the method.
An early demonstration of the strength of Gauss's method came when it was used to predict the future location of the newly discovered asteroid Ceres. On January 1, 1801, the Italian astronomer Giuseppe Piazzi discovered Ceres and was able to track its path for 40 days before it was lost in the glare of the sun. Based on these data, astronomers wished to determine the location of Ceres after it emerged from behind the sun without solving Kepler's complicated nonlinear equations of planetary motion. The only predictions that successfully allowed the Hungarian astronomer Franz Xaver von Zach to relocate Ceres were those made by the 24-year-old Gauss using least-squares analysis.
Gauss did not publish the method until 1809, when it appeared in volume two of his work on celestial mechanics, Theoria Motus Corporum Coelestium in sectionibus conicis solem ambientium. In 1829, Gauss was able to state that the least-squares approach to regression analysis is optimal in the sense that, in a linear model where the errors have a mean of zero, are uncorrelated, and have equal variances, the best linear unbiased estimator of the coefficients is the least-squares estimator. This result is known as the Gauss–Markov theorem.
The idea of least-squares analysis was also independently formulated by the Frenchman Adrien-Marie Legendre in 1805 and the American Robert Adrain in 1808.
Agnew and Constable (2004) stated that, unlike maximum likelihood, which can be applied to any problem for which the general form of the joint probability density function is known, in least squares the parameters to be estimated must appear in expressions for the means of the observations. When the parameters appear linearly in these expressions, the least squares estimation problem can be solved in closed form, and it is relatively simple to derive the statistical properties of the resulting parameter estimates.
One very simple example, which will be treated in some detail in order to illustrate the more general problem, is that of fitting a straight line to a collection of pairs of observations $(x_i, y_i)$, where $i = 1, 2, \ldots, n$. A model of the form
$$y = \beta_0 + \beta_1 x \tag{2.14}$$
is assumed, and a mechanism for determining $\beta_0$ and $\beta_1$ is needed. This is a particular case of the more general problem of fitting a polynomial of order $p$, for which one would need to find $p + 1$ coefficients.
The most frequently used method for fitting such a model is least squares estimation. It is supposed that $x$ is an independent (or predictor) variable which is known exactly, while $y$ is a dependent (or response) variable. The least squares (LS) estimates of $\beta_0$ and $\beta_1$ are those for which the fitted line minimizes the sum of the squared deviations from the observations.
The problem is to find the values of $\beta_0$, $\beta_1$ that minimize the residual sum of squares
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2. \tag{2.15}$$
Note that this involves the minimization of vertical deviations from the line (not
the perpendicular distance) and is therefore not symmetric in y and x. In addition, if x is
treated as the dependent variable instead of y one might well expect a different result.
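To obtain the minimizing values of $\beta_i$ in (2.15), the equations resulting from setting
$$\frac{\partial S}{\partial \beta_0} = 0, \qquad \frac{\partial S}{\partial \beta_1} = 0 \tag{2.16}$$
are solved, namely
$$\sum_i y_i = n\hat\beta_0 + \hat\beta_1 \sum_i x_i, \qquad \sum_i x_i y_i = \hat\beta_0 \sum_i x_i + \hat\beta_1 \sum_i x_i^2. \tag{2.17}$$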
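Solving for the $\hat\beta_i$ yields the least squares parameter estimates:
$$\hat\beta_0 = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \hat\beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{2.18}$$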
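The closed-form estimates in (2.18) can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the small data set are hypothetical and do not come from the case study.

```python
def fit_line(xs, ys):
    """Closed-form least squares estimates (b0, b1) from equation (2.18)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2  # zero only if all x values coincide
    if denom == 0:
        raise ValueError("x values must not all be equal")
    b0 = (sxx * sy - sx * sxy) / denom
    b1 = (n * sxy - sx * sy) / denom
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
```

Because the illustrative points lie exactly on a line, the fit recovers the intercept and slope used to generate them.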
where the sums are implicitly taken from $i = 1$ to $n$ in each case. Having generated these estimates, it is usual to ask how much confidence can be placed in the values of $\hat\beta_0$ and $\hat\beta_1$, and whether the fit to the data is reasonable. Possibly a different functional form would offer a more appropriate fit to the observations, for example one involving several independent variables, so that
$$y \approx \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \tag{2.19}$$
or decay curves
$$f(t) = A e^{-\alpha t} + B e^{-\beta t} \tag{2.20}$$
or periodic functions
$$f(t) = A \cos \omega_1 t + B \sin \omega_1 t + C \cos \omega_2 t + D \sin \omega_2 t. \tag{2.21}$$
In equations (2.20) and (2.21) the functions $f(t)$ are linear in $A$, $B$, $C$ and $D$, but nonlinear in the other parameters $\alpha$, $\beta$, $\omega_1$ and $\omega_2$. When the function to be fitted is linear in the parameters, the partial derivatives of $S$ with respect to them yield equations that can be solved in closed form. Normally, nonlinear least squares problems do not have a closed-form solution and one must resort to an iterative procedure. However, it is sometimes possible to convert the nonlinear function to be fitted into a linear form. As an example, the Arrhenius equation models the rate of a chemical reaction as a function of temperature via a two-parameter model with an unknown constant frequency factor $C$ and activation energy $E_A$, so that
$$\alpha(T) = C e^{-E_A / kT}. \tag{2.22}$$
Boltzmann's constant, $k$, is known a priori. If one estimates $\alpha$ at various values of $T$, then $C$ and $E_A$ can be obtained by a linear least squares fit to the transformed variables, $\log \alpha$ and $1/T$:
$$\log \alpha(T) = \log C - \frac{E_A}{kT}. \tag{2.23}$$
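The linearization in (2.23) can be illustrated with a short Python sketch. The function name `fit_arrhenius`, the temperatures, and the values $C = 2000$ and $E_A = 0.5$ eV are hypothetical and chosen only so that the fit can be checked; Boltzmann's constant is taken in eV/K.

```python
import math

K_B = 8.617333262e-5  # Boltzmann's constant in eV/K


def fit_arrhenius(temps, rates):
    """Fit log(rate) = log C - E_A / (k T) by linear least squares in 1/T."""
    xs = [1.0 / T for T in temps]
    ys = [math.log(a) for a in rates]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # equals -E_A / k
    intercept = (sy - slope * sx) / n                  # equals log C
    return math.exp(intercept), -slope * K_B           # (C, E_A in eV)


# Hypothetical noise-free rates generated with C = 2000 and E_A = 0.5 eV.
temps = [300.0, 350.0, 400.0, 450.0]
rates = [2000.0 * math.exp(-0.5 / (K_B * T)) for T in temps]
C_hat, EA_hat = fit_arrhenius(temps, rates)
```

With real measurements the rates would contain noise, and the fit on the log scale weights the observations differently from a nonlinear fit on the original scale.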
CHAPTER 3
THE ESTIMATION OF HAZARD FUNCTION PROCEDURES
3.1 Introduction
This chapter describes the procedures used to estimate the hazard function of the time to discontinuation of the use of an IUD for 18 women. The general purpose of this dissertation is to determine which of the parametric and nonparametric approaches is most suitable for estimating the hazard function in this case study. The parametric estimates to be compared are the maximum likelihood estimate and the least square estimate. To assess and evaluate the most appropriate estimate, all of these methods are applied to the same data on the time to discontinuation of the use of an IUD for 18 women.
3.2 The Data of Time to Discontinuation of the Use of an IUD for 18 Women
The data on the time to discontinuation of the use of an IUD for 18 women are secondary data taken from “Modeling Survival Data in Medical Research (Second Edition)” by Collet (1952). The data originate from research conducted by the World Health Organisation (WHO).
In trials involving contraceptives, prevention of pregnancy is an obvious criterion for acceptability. However, modern contraceptives have very low failure rates, and so the occurrence of bleeding disturbances, such as amenorrhoea (the prolonged absence of bleeding) and irregular or prolonged bleeding, becomes important in the evaluation of a particular method of contraception.
To promote research into methods of analyzing menstrual bleeding data for
women in contraceptive trials, the World Health Organisation has made available data
from clinical trials involving a number of different types of contraceptive. Part of this
data set relates to the time from which a woman commences use of a particular method
until discontinuation, with the discontinuation reason being recorded when known.
The data in Table 3.1 refer to the number of weeks from the commencement of use of a particular type of intrauterine device (IUD), known as the Multiload 250, until discontinuation because of menstrual bleeding problems. Data are given for 18 women, all of whom were aged between 18 and 35 years and had experienced two previous pregnancies. Discontinuation times that are censored are labeled with an asterisk.
Table 3.1: Time in weeks to discontinuation of the use of an IUD.
10     13*    18*    19     23*    30     36     38*    54*
56*    59     75     93     97     104*   107    107*   107*
In these data, the time origin corresponds to the first day on which a woman uses the IUD, and the end point is discontinuation because of bleeding problems. Some women in the study ceased using the IUD because of the desire for pregnancy, or because they had no further need for a contraceptive, while others were simply lost to follow up. These reasons account for the censored discontinuation times of 13, 18, 23, 38, 54 and 56 weeks.
The study protocol called for the menstrual bleeding experience of each woman to be documented for a period of two years from the time origin. For practical reasons, each woman could not be examined exactly two years after recruitment to determine whether she was still using the IUD, and this is why there are three right-censored discontinuation times at or beyond 104 weeks.
This data was originally analyzed to summarize the distribution of
discontinuation times and to estimate the median time to discontinuation of the IUD, or
the probability that a woman will stop using the device after a given period of time. But,
for this dissertation, this data is used to assess and evaluate the most appropriate
estimation method in estimating the hazard function.
3.3 The Nonparametric Estimation of the Hazard Function
3.3.1 The Kaplan-Meier Estimate of Hazard Function
For the Kaplan-Meier estimate, a natural way of estimating the hazard function for ungrouped survival data is to take the ratio of the number of deaths at a given death time to the number of individuals at risk at that time. If the hazard function is assumed to be constant between successive death times, the hazard per unit time can be obtained by further dividing by the length of the time interval. Hence, if there are $d_j$ deaths at the $j$th death time, $t_{(j)}$, $j = 1, 2, \ldots, r$, and $n_j$ individuals at risk at time $t_{(j)}$, the hazard function can be estimated by
$$\hat h(t) = \frac{d_j}{n_j \tau_j}, \tag{3.1}$$
for $t_{(j)} \le t < t_{(j+1)}$, where $\tau_j = t_{(j+1)} - t_{(j)}$. Note that equation (3.1) cannot be used to estimate the hazard in the interval that begins at the final death time, since this interval is open-ended.
The estimate in equation (3.1) is known as a Kaplan-Meier type estimate, since the estimated survival function obtained from it is the Kaplan-Meier estimate. This holds since $\hat h(t)$, $t_{(j)} \le t < t_{(j+1)}$, is an estimate of the risk of death per unit time in the $j$th interval. The probability of death in that interval is $\hat h(t)\,\tau_j$, that is $d_j / n_j$. Hence, an estimate of the corresponding survival probability in that interval is $1 - (d_j / n_j)$, and the estimated survival function is given by
$$\hat S(t) = \prod_{j=1}^{k} \left(1 - \frac{d_j}{n_j}\right), \tag{3.2}$$
for $t_{(k)} \le t < t_{(k+1)}$.
The approximate standard error of $\hat h(t)$ can be found from the variance of $d_j$, which may be assumed to have a binomial distribution with parameters $n_j$ and $p_j$, where $p_j$ is the probability of death in the interval of length $\tau_j$. Consequently, $\mathrm{var}(d_j) = n_j p_j (1 - p_j)$, and estimating $p_j$ by $d_j / n_j$ gives
$$\mathrm{se}\{\hat h(t)\} = \hat h(t) \sqrt{\frac{n_j - d_j}{n_j d_j}}.$$
However, when $d_j$ is small, confidence intervals constructed using this standard error will be too wide to be of practical use.
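Equation (3.1) and the standard error above can be applied directly to the data of Table 3.1. The following Python sketch is only an illustration: the names `IUD` and `km_type_hazard` are hypothetical, and censored times are encoded with an event indicator of 0.

```python
import math

# Table 3.1: (time in weeks, event indicator); 1 = discontinuation, 0 = censored.
IUD = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1), (38, 0),
       (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1), (104, 0),
       (107, 1), (107, 0), (107, 0)]


def km_type_hazard(data):
    """Rows (t_j, tau_j, n_j, d_j, hazard, se) for each interval
    [t_(j), t_(j+1)) between successive event times, following (3.1)."""
    event_times = sorted({t for t, d in data if d == 1})
    rows = []
    # The interval starting at the final event time is open-ended and skipped.
    for t_j, t_next in zip(event_times, event_times[1:]):
        n_j = sum(1 for t, _ in data if t >= t_j)             # at risk at t_j
        d_j = sum(1 for t, d in data if t == t_j and d == 1)  # events at t_j
        tau_j = t_next - t_j
        hazard = d_j / (n_j * tau_j)
        se = hazard * math.sqrt((n_j - d_j) / (n_j * d_j))
        rows.append((t_j, tau_j, n_j, d_j, hazard, se))
    return rows


rows = km_type_hazard(IUD)
for t_j, tau_j, n_j, d_j, hazard, se in rows:
    print(f"{t_j:>3}-  tau={tau_j:>2}  n={n_j:>2}  d={d_j}  "
          f"h={hazard:.4f}  se={se:.4f}")
```

The sketch produces one row per interval between successive discontinuation times, so the interval before the first event (where the estimate is zero) is not listed.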
3.3.2 Kernel Estimate of Hazard Function
The general kernel estimator $h^*(t)$ of the failure hazard at the point $t$ is given by
$$h^*(t) = \sum_{i=1}^{n} \frac{d_i}{n - i + 1} K_\theta(t - t_{(i)}).$$
When $\theta = m$, that is $K_\theta(z) = (1/m)K(z/m)$, we will refer to this estimator as the one-parameter estimator. The estimate $h^*(t)$ can be regarded as a convolution smoothing of the formal derivative of the empirical cumulative hazard $\hat H(t) = \sum_{t_i \le t} a_i$, where
$$a_i = \frac{d_i}{\text{number of items at risk at time } t_i} = \frac{d_i}{N - \text{rank of } t_i + 1}.$$
Let $T_1, T_2, \ldots, T_n$ be independent and identically distributed lifetimes with distribution function $F$. Let $C_1, C_2, \ldots, C_n$ be independent and identically distributed censoring times with distribution function $G$, which are usually assumed to be independent of the lifetimes. In the random censorship model we observe pairs $(t_i, d_i)$, $i = 1, 2, \ldots, n$, where $t_i = \min(T_i, C_i)$ and $d_i = I\{t_i = T_i\}$ indicates whether the observation is uncensored ($d_i = 1$) or censored ($d_i = 0$). It follows that the $\{t_i\}$ are independent and identically distributed with distribution function $L$ satisfying $\bar L(t) = \bar F(t)\,\bar G(t)$, where $\bar E = 1 - E$ denotes the survival function of any distribution function $E$.
The survival function $\bar F(t)$ is the probability that an individual survives for a time greater than or equal to $t$. Kaplan and Meier proposed the product limit estimate of $\bar F$:
$$\hat{\bar F}(t) = \prod_{\{j:\, t_{(j)} < t\}} \left(\frac{n - j}{n - j + 1}\right)^{d_{(j)}} \tag{3.3}$$
where $t_{(j)}$ denotes the $j$-th order statistic of $t_1, t_2, \ldots, t_n$ and $d_{(j)}$ the corresponding indicator of the censoring status.
The hazard function $h(t)$ is the instantaneous rate at which an individual dies at time $t$, conditional on he or she having survived to that time. If the life distribution $F$ has a density $f$, then for $\bar F(t) > 0$ the hazard function is defined by
$$h(t) = \frac{f(t)}{\bar F(t)} \tag{3.4}$$
and the cumulative hazard function by
$$H(t) = -\log \bar F(t). \tag{3.5}$$
Nelson proposed to estimate the cumulative hazard function $H$ by
$$H_n(t) = \sum_{t_{(i)} \le t} \frac{d_{(i)}}{n - i + 1}. \tag{3.6}$$
Let $[0, T]$, $T > 0$, be an interval for which $L(T) < 1$. First, let us make some assumptions:
(1) $h \in C^{k_0}[0, T]$, $k_0 \ge 2$.
(2) Let $\nu$, $k$ be nonnegative integers satisfying $0 \le \nu \le k - 2$, $2 \le k \le k_0$.
(3) Let $K$ be a real valued function on $\mathbb{R}$ satisfying the conditions
(i) $\mathrm{support}(K) = [-1, 1]$, $K(-1) = K(1) = 0$.
(ii) $K \in \mathrm{Lip}[-1, 1]$.
(iii) $$\int_{-1}^{1} x^j K(x)\, dx = \begin{cases} 0, & 0 \le j < k,\ j \ne \nu \\ (-1)^{\nu}\, \nu!, & j = \nu \\ \beta_k \ne 0, & j = k \end{cases}$$
Such a function is called a kernel of order $k$ and the class of these kernels is denoted by $S_{\nu k}$.
(4) Let $\{b(n)\}$ be a non-random sequence of positive numbers satisfying
$$\lim_{n \to \infty} b(n) = 0, \qquad \lim_{n \to \infty} b(n)^{2\nu + 1}\, n = \infty.$$
These numbers are called bandwidths or smoothing parameters.
The definition of the kernel given above is well suited to the considerations that follow; moreover, it is reasonable to assume that $\nu$ and $k$ have the same parity. This fact enables us to choose an optimal kernel.
The kernel estimate of the $\nu$th derivative of the hazard function $h$ is the following convolution of the kernel $K$ with the Nelson estimator $H_n$:
$$h^{*(\nu)}_{b,K}(t) = \frac{1}{b^{\nu+1}} \int K\left(\frac{t - u}{b}\right) dH_n(u) = \frac{1}{b^{\nu+1}} \sum_{i=1}^{n} K\left(\frac{t - t_{(i)}}{b}\right) \frac{d_{(i)}}{n - i + 1}, \qquad K \in S_{\nu k}. \tag{3.7}$$
Small values of $b$ lead to very spiky estimates (not much smoothing), while larger values of $b$ lead to oversmoothing. Spikiness can be alleviated by increasing the bandwidth of the kernel to a larger value, such as 0.5.
The variance $\sigma^2$ can be estimated by
$$\hat\sigma^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (t_i - \bar t)^2. \tag{3.8}$$
The asymptotic $(1 - \alpha)$ confidence interval for $h^{*(\nu)}_{b,K}(t)$ is given by
$$h^{*(\nu)}_{b,K}(t) \pm \left\{ \frac{\hat h_{b,K}(t)\, V(K)}{(1 - L_n(t))\, n\, b^{2\nu+1}} \right\}^{1/2} \Phi^{-1}(1 - \alpha/2) \tag{3.9}$$
where $V(K) = \int_{-1}^{1} K^2(x)\, dx$, $\Phi$ is the normal cumulative distribution function and $L_n$ is the modified empirical distribution function of $L$,
$$L_n(t) = \frac{1}{n + 1} \sum_{i=1}^{n} 1_{\{t_i \le t\}}.$$
When estimating near 0 or $T$, boundary effects can occur because the “effective support” $[t - b, t + b]$ of the kernel $K$ is not contained in $[0, T]$. This can lead to negative estimates of the hazard function near the endpoints. The same can happen if kernels of higher order are used in the interior. In such cases it may be reasonable to truncate the estimate, replacing $h^*_{b,K}(t)$ by $\max(h^*_{b,K}(t), 0)$. Similar considerations apply to the confidence intervals. The boundary effects can be avoided by using kernels with asymmetric supports.
3.4 The Parametric Estimation of Weibull Hazard Function
3.4.1 The Maximum Likelihood Estimate of Weibull Hazard Function
The Weibull probability distribution of survival times is defined by two parameters, the scale parameter $\lambda$ and the shape parameter $k$. The two parameters of the Weibull distribution provide additional flexibility that potentially increases the accuracy of the description of collected survival data.
The maximum likelihood method will be used to estimate the shape and scale parameters of the Weibull probability density function. Once obtained, these estimates are substituted into the Weibull hazard function, and the graph of the maximum likelihood estimate of the hazard function is plotted.
The survival times of $n$ individuals are now taken to be a censored sample from a Weibull distribution with scale parameter $\lambda$ and shape parameter $k$. Suppose that there are $r$ deaths among the $n$ individuals and $n - r$ right censored survival times. By using the equation
$$\prod_{i=1}^{n} \{f(t_i)\}^{\delta_i} \{S(t_i)\}^{1 - \delta_i}, \tag{3.10}$$
the likelihood of the sample data can be obtained. The probability density, survival and hazard functions of a $W(\lambda, k)$ distribution are given by
$$f(t) = \lambda k t^{k-1} \exp(-\lambda t^k), \qquad S(t) = \exp(-\lambda t^k), \qquad h(t) = \lambda k t^{k-1}.$$
Note that a scale parameter $\alpha$, with $\lambda = \alpha^{-1}$, is often used in place of $\lambda$. So, from expression (3.10), the likelihood of the $n$ survival times is
$$\prod_{i=1}^{n} \{\lambda k t_i^{k-1} \exp(-\lambda t_i^k)\}^{\delta_i} \{\exp(-\lambda t_i^k)\}^{1 - \delta_i},$$
where $\delta_i$ is zero if the $i$th survival time is censored and unity otherwise. Equivalently, by the equation
$$\prod_{i=1}^{n} \left\{\frac{f(t_i)}{S(t_i)}\right\}^{\delta_i} S(t_i),$$
the likelihood function is
$$\prod_{i=1}^{n} \{\lambda k t_i^{k-1}\}^{\delta_i} \exp(-\lambda t_i^k).$$
This is regarded as a function of $\lambda$ and $k$, the unknown parameters in the Weibull distribution, and so can be written $L(\lambda, k)$. The corresponding log-likelihood function is given by
$$\log L(\lambda, k) = \sum_{i=1}^{n} \delta_i \log(\lambda k) + (k - 1) \sum_{i=1}^{n} \delta_i \log t_i - \lambda \sum_{i=1}^{n} t_i^k,$$
and noting that $\sum_{i=1}^{n} \delta_i = r$, the log-likelihood becomes
$$\log L(\lambda, k) = r \log(\lambda k) + (k - 1) \sum_{i=1}^{n} \delta_i \log t_i - \lambda \sum_{i=1}^{n} t_i^k.$$
The maximum likelihood estimates of $\lambda$ and $k$ are found by differentiating this function with respect to $\lambda$ and $k$, equating the derivatives to zero, and evaluating them at $\hat\lambda$ and $\hat k$. The resulting equations are
$$\frac{r}{\hat\lambda} - \sum_{i=1}^{n} t_i^{\hat k} = 0, \tag{3.11}$$
and
$$\frac{r}{\hat k} + \sum_{i=1}^{n} \delta_i \log t_i - \hat\lambda \sum_{i=1}^{n} t_i^{\hat k} \log t_i = 0. \tag{3.12}$$
From equation (3.11),
$$\hat\lambda = r \Big/ \sum_{i=1}^{n} t_i^{\hat k}, \tag{3.13}$$
and on substituting for $\hat\lambda$ in equation (3.12), the following equation is obtained
$$\frac{r}{\hat k} + \sum_{i=1}^{n} \delta_i \log t_i - \frac{r}{\sum_{i=1}^{n} t_i^{\hat k}} \sum_{i=1}^{n} t_i^{\hat k} \log t_i = 0. \tag{3.14}$$
This is a non-linear equation in $\hat k$, which can only be solved using an iterative numerical procedure. Once the estimate $\hat k$ that satisfies equation (3.14) has been found, equation (3.13) can be used to obtain $\hat\lambda$.
In practice, a numerical procedure, such as the Newton-Raphson algorithm, is used to find the values $\hat\lambda$ and $\hat k$ which maximize the likelihood function simultaneously. This procedure is described in APPENDIX E. As noted in that appendix, an important by-product of the Newton-Raphson procedure is an approximation to the variance-covariance matrix of the parameter estimates, from which their standard errors can be obtained.
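As a rough illustration of the two-step route through equations (3.14) and (3.13), the Python sketch below solves (3.14) for $\hat k$ on the data of Table 3.1. It deliberately uses simple bisection rather than the Newton-Raphson algorithm, so it yields no variance-covariance matrix. The names `IUD`, `profile_score` and `weibull_mle`, and the bracketing interval for $k$, are assumptions made for this sketch.

```python
import math

# Table 3.1: (time in weeks, delta); delta = 1 for an observed discontinuation.
IUD = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1), (38, 0),
       (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1), (104, 0),
       (107, 1), (107, 0), (107, 0)]


def profile_score(k, data):
    """Left-hand side of equation (3.14) as a function of the shape k."""
    r = sum(d for _, d in data)
    s_log = sum(d * math.log(t) for t, d in data)
    s_tk = sum(t ** k for t, _ in data)
    s_tk_log = sum(t ** k * math.log(t) for t, _ in data)
    return r / k + s_log - r * s_tk_log / s_tk


def weibull_mle(data, lo=0.05, hi=20.0, tol=1e-10):
    """Solve (3.14) for k by bisection, then obtain lambda from (3.13)."""
    if profile_score(lo, data) <= 0 or profile_score(hi, data) >= 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    k_hat = 0.5 * (lo + hi)
    r = sum(d for _, d in data)
    lam_hat = r / sum(t ** k_hat for t, _ in data)
    return lam_hat, k_hat


lam_hat, k_hat = weibull_mle(IUD)
# Scale on the time axis, i.e. the value s with S(t) = exp(-(t / s) ** k).
scale = lam_hat ** (-1.0 / k_hat)
```

Bisection is used here only because it is easy to verify by hand; it relies on the left-hand side of (3.14) changing sign once over the bracketing interval.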
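3.4.2 The Least Square Estimate of Weibull Hazard Function
For the Weibull distribution function $F(x) = 1 - \exp\{-(x/\alpha)^{\beta}\}$, the linear equation is as follows:
$$\ln \ln \left[\frac{1}{1 - F(x)}\right] = \beta \ln x - \beta \ln \alpha.$$
Then, replacing $F(x_i)$ for the $i$th ordered observation by $i/(n+1)$,
$$\bar x = \frac{1}{n} \sum_{i=1}^{n} \ln\left\{\ln\left[\frac{1}{1 - \frac{i}{n+1}}\right]\right\}, \qquad \bar y = \frac{1}{n} \sum_{i=1}^{n} \ln x_i,$$
$$\hat\beta = \frac{n \sum_{i=1}^{n} (\ln x_i) \ln\left\{\ln\left[\frac{1}{1 - \frac{i}{n+1}}\right]\right\} - \sum_{i=1}^{n} \ln\left\{\ln\left[\frac{1}{1 - \frac{i}{n+1}}\right]\right\} \sum_{i=1}^{n} \ln x_i}{n \sum_{i=1}^{n} (\ln x_i)^2 - \left(\sum_{i=1}^{n} \ln x_i\right)^2},$$
and
$$\hat\alpha = \exp\left(\bar y - \frac{\bar x}{\hat\beta}\right).$$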
To compare the maximum likelihood estimate and the least square estimate, the standard error is used. It is calculated as
$$\mathrm{se}(\text{estimate}) = \sqrt{\mathrm{Variance}[\log(\text{estimate})]},$$
with 95% confidence interval
$$\log(\text{estimate}) \pm 1.96 \cdot \mathrm{se}(\text{estimate}).$$
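The least square estimates of Section 3.4.2 can be sketched in Python as a regression of $\ln\ln[1/(1 - i/(n+1))]$ on $\ln x_i$. The sketch assumes a complete (uncensored) sample, which is a simplification relative to the IUD data, and the function name `weibull_least_squares` and the check values $\alpha = 100$, $\beta = 1.5$ are hypothetical.

```python
import math


def weibull_least_squares(sample):
    """Least squares estimates (alpha_hat, beta_hat) for a complete
    (uncensored) sample, using F(x_(i)) ~ i / (n + 1)."""
    xs = sorted(sample)
    n = len(xs)
    ln_x = [math.log(x) for x in xs]
    # Transformed plotting positions ln ln[1 / (1 - i/(n+1))], i = 1..n.
    w = [math.log(math.log(1.0 / (1.0 - i / (n + 1)))) for i in range(1, n + 1)]
    s_l = sum(ln_x)
    s_w = sum(w)
    s_lw = sum(a * b for a, b in zip(ln_x, w))
    s_ll = sum(a * a for a in ln_x)
    beta_hat = (n * s_lw - s_w * s_l) / (n * s_ll - s_l ** 2)
    x_bar = s_w / n  # mean of the transformed plotting positions
    y_bar = s_l / n  # mean of ln x_i
    alpha_hat = math.exp(y_bar - x_bar / beta_hat)
    return alpha_hat, beta_hat


# Hypothetical check: exact Weibull quantiles for alpha = 100, beta = 1.5.
n = 9
quantiles = [100.0 * (-math.log(1.0 - i / (n + 1))) ** (1.0 / 1.5)
             for i in range(1, n + 1)]
alpha_hat, beta_hat = weibull_least_squares(quantiles)
```

Because the check sample consists of exact quantiles, the transformed points fall on a straight line and the fit returns the generating parameters; real data would scatter around the line.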
CHAPTER 4
RESULTS AND DISCUSSION
4.1 Introduction
This chapter discusses the results of the research on the application of the hazard function in the case study of the use of an IUD for women. The findings are divided into the two topics discussed in the earlier chapter, namely the parametric and nonparametric estimation of the hazard function.
4.2 The Nonparametric Estimation of Hazard Function for the Time to Discontinuation of the Use of an IUD for 18 Women
4.2.1 The Kaplan-Meier Estimate of Hazard Function
The first nonparametric method of estimating the hazard function is the Kaplan-Meier method. The estimated hazard function for $t_{(j)} \le t < t_{(j+1)}$, $j = 1, 2, \ldots, r$, where $t_{(j)}$ is the $j$th ordered time to discontinuation of the use of an IUD in weeks and $r$ is the number of distinct discontinuation times, is obtained by
$$\hat h(t) = \frac{d_j}{n_j \tau_j}.$$
Note that $d_j$ is the number of discontinuations of the use of an IUD at the $j$th discontinuation time, $n_j$ is the number of women at risk of discontinuation at time $t_{(j)}$, and $\tau_j = t_{(j+1)} - t_{(j)}$.
Next, the standard error of this hazard function is obtained by using the following equation,
$$\mathrm{se}\{\hat h(t)\} = \hat h(t) \sqrt{\frac{n_j - d_j}{n_j d_j}},$$
while the confidence interval is given by
$$\hat h(t) \pm z_{\alpha/2}\, \mathrm{se}\{\hat h(t)\}.$$
Table 4.1: Kaplan-Meier type estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women.
Time Interval   $\tau_j$   $n_j$   $d_j$   $\hat h(t)$   se{$\hat h(t)$}   95% Confidence Interval
0-              10         18      0       0.0000        -                 -
10-             9          18      1       0.0062        0.0060            (-0.0056, 0.0180)
19-             11         15      1       0.0061        0.0059            (-0.0055, 0.0177)
30-             6          13      1       0.0128        0.0123            (-0.0113, 0.0370)
36-             23         12      1       0.0036        0.0035            (-0.0033, 0.0105)
59-             16         8       1       0.0078        0.0073            (-0.0065, 0.0221)
75-             18         7       1       0.0079        0.0073            (-0.0064, 0.0222)
93-             4          6       1       0.0417        0.0380            (-0.0328, 0.1162)
97-             10         5       1       0.0200        0.0179            (-0.0151, 0.0551)
Table 4.1 shows the estimated hazard function together with its standard error and 95% confidence interval.
From Table 4.1, the highest value of the estimated hazard function $\hat h(t)$ is 0.0417 and the smallest value is 0.0000. For the standard error se{$\hat h(t)$}, the highest value is 0.0380, with 95% confidence interval (-0.0328, 0.1162), and the smallest value is 0.0035, with 95% confidence interval (-0.0033, 0.0105).
The estimated hazard function $\hat h(t)$ increases between the intervals beginning at 0 and 10 weeks, at 19 and 30 weeks, and at 36 through 93 weeks; for the remaining intervals it decreases. This irregular pattern arises because $\hat h(t) = d_j / (n_j \tau_j)$ depends on both the number at risk $n_j$, which falls over time, and the interval length $\tau_j$, which varies from one interval to the next.
The standard error follows the same pattern. As the estimated hazard function $\hat h(t)$ increases or decreases, so does its standard error, since the standard error depends on the value of the estimated hazard function.
After calculating the estimated hazard function, the graph of hazard curve is
plotted. Figure 4.1 shows a plot of the estimated hazard function, hˆ(t ) by using the
Kaplan-Meier estimate.
Figure 4.1: Kaplan-Meier type estimate of the hazard function for the data of the time to
discontinuation of the use of an IUD for 18 women.
The solid line shows the estimated hazard curve for the number of weeks from the commencement of use of the intrauterine device (IUD) until discontinuation because of menstrual bleeding problems for the 18 women.
The curve steps up or down each time a discontinuation occurs. The largest value of the estimated hazard function occurs in the interval from 93 to 97 weeks (Table 4.1). The estimated hazard function starts at 0.0000 at 0 weeks and ends at 0.0200 in the interval ending at 107 weeks. Beyond 107 weeks, the final discontinuation time, the interval is open-ended and equation (3.1) does not apply, so the curve simply remains at its last value of 0.0200.
4.2.2 The Kernel Estimate of Hazard Function
For the kernel estimate of the hazard function, two kernel functions $K$ were used, the Epanechnikov and the Biweight kernels:
$$\text{Epanechnikov kernel: } K(h) = 0.75\,(1 - h^2) \tag{4.1}$$
$$\text{Biweight kernel: } K(h) = \frac{15}{16}\,(1 - h^2)^2 \tag{4.2}$$
where $h = (t - t_{(j)})/b$ is the kernel argument (not the hazard), for $t_{(j)} \le t < t_{(j+1)}$, based on the $r$ ordered discontinuation times $t_{(1)}, t_{(2)}, \ldots, t_{(r)}$, and $b$ is the bandwidth. The larger the value of $b$, the greater the degree of smoothing.
The kernel estimate of the hazard function, $\tilde h_{b,K}(t)$, is the following convolution of the kernel $K$,
$$\tilde h_{b,K}(t) = \frac{1}{b} \sum_{i=1}^{n} K\left(\frac{t - t_{(i)}}{b}\right) \frac{d_{(i)}}{n - i + 1}. \tag{4.3}$$
By using the above equations, the following estimates of the hazard function are obtained,
$$\tilde h_{b,K}(t) = \frac{1}{b} \sum_{i=1}^{n} 0.75 \left(1 - \left(\frac{t - t_{(i)}}{b}\right)^2\right) \frac{d_{(i)}}{n - i + 1}$$
for the Epanechnikov kernel, and
$$\tilde h_{b,K}(t) = \frac{1}{b} \sum_{i=1}^{n} \frac{15}{16} \left(1 - \left(\frac{t - t_{(i)}}{b}\right)^2\right)^2 \frac{d_{(i)}}{n - i + 1}$$
for the Biweight kernel, where $d_{(i)}$ is the indicator that shows whether the observation is censored or not and $n$ is the number of observations. For this case study, censored discontinuation times occur when women in the study ceased using the IUD because of the desire for pregnancy, or because they had no further need for a contraceptive, while others were simply lost to follow up.
For this study, the bandwidth $b$ is set equal to 110, since $b$ needs to be greater than $|t - t_{(i)}|$ in equation (4.3) to prevent the estimated hazard function $\tilde h_{b,K}(t)$ from becoming negative, and $n$ is equal to 18 observations.
The estimated hazard function $\tilde h_{b,K}(t)$ for the Biweight and Epanechnikov kernels is given in the appendix.
After calculating the estimated hazard function, it is plotted against time $t$ over the interval from 0 to 107 weeks, based on the $r$ ordered discontinuation times $t_{(1)}, t_{(2)}, \ldots, t_{(r)}$, where $t_{(r)}$ is the greatest discontinuation time and $b = 110$ is the bandwidth.
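Equation (4.3) with the two kernels can be sketched in Python as follows. This is an illustrative implementation only: the names `IUD`, `nelson_increments` and `kernel_hazard` are hypothetical, tied observations are ordered with observed discontinuations before censored times (an assumption), and the values it produces are not guaranteed to reproduce the figures or the appendix tables exactly.

```python
# Table 3.1: (time in weeks, d); d = 1 for an observed discontinuation.
IUD = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1), (38, 0),
       (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1), (104, 0),
       (107, 1), (107, 0), (107, 0)]


def epanechnikov(u):
    """Equation (4.1), set to zero outside [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def biweight(u):
    """Equation (4.2), set to zero outside [-1, 1]."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nelson_increments(data):
    """Pairs (t_(i), d_(i) / (n - i + 1)) for the ordered sample.
    Ties are ordered with observed events first (an assumption)."""
    ordered = sorted(data, key=lambda td: (td[0], -td[1]))
    n = len(ordered)
    return [(t, d / (n - i + 1)) for i, (t, d) in enumerate(ordered, start=1)]


def kernel_hazard(t, data, kernel, b):
    """Equation (4.3): kernel-smoothed Nelson increments."""
    return sum(kernel((t - t_i) / b) * inc
               for t_i, inc in nelson_increments(data)) / b


b = 110.0
grid = range(0, 108)
ek = [kernel_hazard(t, IUD, epanechnikov, b) for t in grid]
bk = [kernel_hazard(t, IUD, biweight, b) for t in grid]
```

With $b = 110$ every kernel argument on the grid lies inside $(-1, 1)$, so all kernel weights are positive and the truncation to zero outside $[-1, 1]$ is never triggered.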
[Plot: Epanechnikov kernel estimate of the hazard function; vertical axis h(t), horizontal axis weeks t; series EK (b = 110).]
Figure 4.2: Epanechnikov Kernel type estimate of the hazard function for the data of the
time to discontinuation of the use of an IUD for 18 women.
Figure 4.2 shows the Epanechnikov kernel estimate of the hazard function for the time from commencement of IUD use until discontinuation for the 18 women.
The estimated hazard function starts at 0 weeks and ends at 107 weeks. According to Figure 4.2, the lowest value occurs at 0 weeks while the highest value occurs at 74 weeks.
The estimated hazard function increases over the interval from 0 to 74 weeks. This implies that the risk of discontinuing the IUD rises during the first 74 weeks of use.
After that, over the interval from 74 to 107 weeks, the estimated hazard function decreases. This means that the risk of discontinuation declines between 74 and 107 weeks after commencement of use.
[Plot: Biweight kernel estimate of the hazard function; vertical axis h(t), horizontal axis weeks t; series BK (b = 110).]
Figure 4.3: Biweight Kernel estimate of hazard function for the data of the time to
discontinuation of the use of an IUD for 18 women.
Figure 4.3 shows the Biweight kernel estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women.
The estimated hazard function starts at 0 weeks and ends at 107 weeks. According to Figure 4.3, the lowest value occurs at 0 weeks while the highest value occurs at 75 weeks.
The estimated hazard function increases over the interval from 0 to 75 weeks. This implies that the risk of discontinuing the IUD rises during the first 75 weeks of use.
After that, over the interval from 75 to 107 weeks, the estimated hazard function decreases. This means that the risk of discontinuation declines between 75 and 107 weeks after commencement of use.
4.2.3 The Comparison of Nonparametric Approaches in Estimating the Hazard Function
The square of the standard error of the estimated hazard function, sse{$\tilde h_{b,K}(t)$}, is calculated by using the equation
$$\mathrm{sse}\{\tilde h_{b,K}(t)\} \approx \frac{\tilde h_{b,K}(t)\, V(K)}{(1 - L_n(t))\, n\, b}$$
while the 95% confidence interval of the estimated hazard function is obtained by using the equation
$$\tilde h_{b,K}(t) \pm \{\mathrm{sse}(\tilde h_{b,K}(t))\}^{1/2}\, \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$
where
$$V(K) = \int_{-1}^{1} K^2(t)\, dt,$$
$L_n$ is the modified empirical distribution function of $L$,
$$L_n(t) = \frac{1}{n + 1} \sum_{i=1}^{n} 1_{\{t_i \le t\}},$$
and $\Phi$ is the normal cumulative distribution function.
Table 4.2: Kernel estimate of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women.
                                                                95% Confidence Interval
Kernel          Mean of the Estimate   Mean of Standard Error   Lower Bound   Upper Bound
Epanechnikov    0.0058                 0.0019                   0.0035        0.0081
Biweight        0.0062                 0.0022                   0.0036        0.0088
Table 4.2 shows the kernel estimates of the hazard function for the data of the time to discontinuation of the use of an IUD for 18 women.
The mean of the Epanechnikov kernel estimate of the hazard function is 0.0058, with standard error 0.0019 and 95% confidence interval (0.0035, 0.0081). For the Biweight kernel, the mean of the estimated hazard function is 0.0062, with standard error 0.0022 and 95% confidence interval (0.0036, 0.0088).
Since the mean standard error of the Epanechnikov kernel is smaller than that of the Biweight kernel, the Epanechnikov kernel estimate is the more precise nonparametric approach for estimating the hazard function for the data of time to discontinuation of the use of an IUD for 18 women.
4.3 The Parametric Estimate of Weibull Hazard Function for the Data of Time to Discontinuation of the Use of an IUD for 18 Women
4.3.1 The Maximum Likelihood Estimate of Weibull Hazard Function
The survival times of the n women are taken to be a censored sample from a Weibull
distribution with shape parameter k and scale parameter λ. Suppose there are r observed
events (discontinuations) among the n individuals and n − r right-censored survival times.
The hazard function of a Weibull, W(k, λ), distribution is given by
$$h(t; k, \lambda) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}. \qquad (4.4)$$
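By using the Maximum Likelihood estimate, the following results hold:
$$\hat{k} = \frac{r}{\sum_{i=1}^{n} t_i^{\hat{\lambda}}} \qquad (4.5)$$
and
$$\frac{r}{\hat{\lambda}} + \sum_{i=1}^{n} \delta_i \log t_i - \frac{r}{\sum_{i=1}^{n} t_i^{\hat{\lambda}}} \sum_{i=1}^{n} t_i^{\hat{\lambda}} \log t_i = 0, \qquad (4.6)$$
where $\delta_i$ is 1 if the i-th time is an observed event and 0 if it is right-censored.
Equations (4.5) and (4.6) are the likelihood equations for the Weibull hazard written in
the form $h(t) = k\lambda t^{\lambda-1}$. In these two equations $\hat{\lambda}$ is therefore the exponent
(shape-type) parameter and $\hat{k}$ the multiplicative parameter, and the scale of equation (4.4)
corresponds to $\hat{k}^{-1/\hat{\lambda}}$.
Equation (4.6) is a non-linear equation in $\hat{\lambda}$, which can be solved by a
numerical procedure. Once the estimate $\hat{\lambda}$ which satisfies equation (4.6) has been
found, equation (4.5) can be used to obtain $\hat{k}$.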
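The numerical solution of equation (4.6) can be sketched in Python as below. This is a minimal illustration and not the MINITAB procedure: it uses simple bisection, the function names are hypothetical, and the lists `TIMES` and `EVENTS` are assumed to hold the 18 IUD times in weeks and their event indicators (1 = discontinuation observed, 0 = right-censored).

```python
import math

# Assumption: IUD discontinuation times (weeks) and event indicators,
# transcribed for this sketch; check them against the data table in the text.
TIMES = [10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107]
EVENTS = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0]


def score(exponent, times, events):
    """Left-hand side of equation (4.6) as a function of the exponent."""
    r = sum(events)
    s0 = sum(t ** exponent for t in times)
    s1 = sum(t ** exponent * math.log(t) for t in times)
    s_log = sum(d * math.log(t) for t, d in zip(times, events))
    return r / exponent + s_log - r * s1 / s0


def fit_weibull(times, events, lo=0.05, hi=20.0, tol=1e-10):
    """Solve (4.6) by bisection, then apply (4.5)."""
    if score(lo, times, events) <= 0 or score(hi, times, events) >= 0:
        raise ValueError("root of (4.6) is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, times, events) > 0:
            lo = mid
        else:
            hi = mid
    exponent = 0.5 * (lo + hi)
    multiplier = sum(events) / sum(t ** exponent for t in times)  # equation (4.5)
    scale = multiplier ** (-1.0 / exponent)  # scale in the form of equation (4.4)
    return exponent, multiplier, scale


exponent, multiplier, scale = fit_weibull(TIMES, EVENTS)
print(exponent, multiplier, scale)
```

Bisection is chosen here only because it is easy to check by hand; any root-finding method, such as the Newton-Raphson procedure of Appendix E, could be used instead.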
The Maximum Likelihood estimate of k̂ and λ̂ , the standard error and the 95%
confidence interval are obtained by fitting the Weibull distribution to the observed data.
This can be done by using MINITAB software.
Table 4.3: Maximum likelihood estimate of the hazard function parameter for the data of
the time to discontinuation of the use of an IUD for 18 women.

Parameter    Estimate    Standard Error   95% Confidence Interval
                                          Lower Bound   Upper Bound
Shape, k̂     1.6764      0.4604           0.9786        2.8717
Scale, λ̂     98.6440     20.1601          66.0859       147.2420
Table 4.3 shows the maximum likelihood estimate of the hazard function
parameter for the data of the time to discontinuation of the use of an IUD for 18 women.
The estimated shape parameter, k̂, is 1.6764 with a standard error of 0.4604
and a 95% confidence interval of (0.9786, 2.8717). For the scale parameter, λ̂, the
estimated value is 98.6440 with a standard error of 20.1601 and a 95% confidence
interval of (66.0859, 147.2420).
By substituting the values of the estimated shape and scale parameters into equation
(4.4), the following figure is obtained.
[Figure: plot titled "MLE of Weibull Hazard Function"; vertical axis h(t), horizontal axis Weeks, t; series: MLE]
Figure 4.4: Maximum likelihood estimate of Weibull hazard function for the data of the
time to discontinuation of the use of an IUD for 18 women.
Figure 4.4 shows the maximum likelihood estimate of Weibull hazard function
for the data of time to discontinuation of the use of an IUD for 18 women.
The solid line shows the Weibull hazard curve from 0 weeks to 107 weeks. The
lowest value of the Weibull hazard curve is 0 at week 0 and the highest value is
0.018 at week 107.
The values of the Weibull hazard function increase as the time to
discontinuation of the use of an IUD increases. This occurs because the estimated
shape parameter, k̂ = 1.6764, is greater than 1.
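The Weibull hazard curve described above can be evaluated with a short Python sketch of equation (4.4). The function name is hypothetical, and the parameter values are the maximum likelihood estimates reported in Table 4.3.

```python
def weibull_hazard(t, shape, scale):
    """Equation (4.4): h(t) = (k / lambda) * (t / lambda) ** (k - 1), for t >= 0 and k > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


# Maximum likelihood estimates from Table 4.3
K_MLE, LAMBDA_MLE = 1.6764, 98.6440

for week in (0, 50, 107):
    print(week, weibull_hazard(week, K_MLE, LAMBDA_MLE))
```

Because the shape estimate exceeds 1, the printed values increase with the week, which is the behaviour described for Figure 4.4.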
4.3.2 The Least Square Estimate of Weibull Hazard Function
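For the least square estimate of the Weibull, W(k, λ), shape and scale parameters, the
distribution function is written as the following linear equation:
$$\ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\} = \lambda \ln t - \lambda \ln k$$
where F(t) is the cumulative distribution function of the Weibull distribution. In this
linear form $F(t) = 1 - \exp\{-(t/k)^{\lambda}\}$, so that λ is the slope of the line and k enters
through the intercept.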
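Then, with
$$\bar{t} = \frac{1}{n}\sum_{i=1}^{n} \ln\left\{\ln\left[\frac{1}{1 - \frac{i}{n+1}}\right]\right\}, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} \ln t_i,$$
the least square estimate of the slope is
$$\hat{\lambda} = \frac{n\sum_{i=1}^{n} (\ln t_i)\left(\ln\left[\ln\left\{\frac{1}{1 - \frac{i}{n+1}}\right\}\right]\right) - \left\{\sum_{i=1}^{n} \ln\left[\ln\left\{\frac{1}{1 - \frac{i}{n+1}}\right\}\right]\right\}\left\{\sum_{i=1}^{n} \ln t_i\right\}}{n\sum_{i=1}^{n} (\ln t_i)^{2} - \left(\sum_{i=1}^{n} \ln t_i\right)^{2}} \qquad (4.7)$$
and, since the fitted intercept $\bar{t} - \hat{\lambda}\bar{y}$ estimates $-\lambda \ln k$,
$$\hat{k} = \exp\left(-\frac{\bar{t} - \hat{\lambda}\,\bar{y}}{\hat{\lambda}}\right). \qquad (4.8)$$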
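The least squares fit of the linear equation above can be sketched in Python as follows. This is a hedged illustration for a complete (uncensored) sample only: it uses the ordinary least-squares slope and intercept with plotting positions i/(n + 1), it does not reproduce MINITAB's handling of censored observations, and the function name and the synthetic check are hypothetical.

```python
import math


def weibull_least_squares(times):
    """Regress Y_i = ln ln[1 / (1 - i/(n+1))] on X_i = ln t_(i).

    Returns (slope, k) for the line Y = slope * ln t - slope * ln k.
    Assumption: complete sample, no censoring.
    """
    ordered = sorted(times)
    n = len(ordered)
    x = [math.log(t) for t in ordered]
    y = [math.log(math.log(1.0 / (1.0 - i / (n + 1)))) for i in range(1, n + 1)]
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    k = math.exp(-intercept / slope)  # because intercept = -slope * ln k
    return slope, k


# Synthetic check: times placed exactly on the line with slope 1.5 and k = 100
n_points = 10
exact_times = [100.0 * (-math.log(1.0 - i / (n_points + 1))) ** (1.0 / 1.5)
               for i in range(1, n_points + 1)]
print(weibull_least_squares(exact_times))
```

The synthetic times lie exactly on the line, so the fit should recover the slope and k used to generate them up to rounding error.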
The least square estimate of k̂ and λ̂ , the standard error and the 95% confidence
interval are also obtained by fitting the Weibull distribution to the observed data. This
can be done by using MINITAB software.
Table 4.4: Least square estimate of the hazard function parameter for the data of the time
to discontinuation of the use of an IUD for 18 women.

Parameter    Estimate    Standard Error   95% Confidence Interval
                                          Lower Bound   Upper Bound
Shape, k̂     1.3379      0.5016           0.6416        2.7897
Scale, λ̂     109.9520    33.0248          61.0293       198.0920
Table 4.4 shows the least square estimate of the hazard function parameter for
the data of the time to discontinuation of the use of an IUD for 18 women.
The value of estimated shape parameter, k̂, is 1.3379 with 0.5016 standard error
and (0.6416, 2.7897) 95% confidence interval. For the scale parameter, λ̂, the estimated
value is 109.9520 with 33.0248 standard error and (61.0293, 198.0920) 95% confidence
interval.
By substituting the values of estimated shape and scale parameter into equation
(4.4), the following figure is obtained.
[Figure: plot titled "Least Square Estimate of Weibull Hazard Function"; vertical axis h(t), horizontal axis Weeks, t; series: LSE]
Figure 4.5: Least square estimate of Weibull hazard function for the data of the time to
discontinuation of the use of an IUD for 18 women.
Figure 4.5 shows the least square estimate of Weibull hazard function for the
data of time to discontinuation of the use of an IUD for 18 women.
The solid line shows the Weibull hazard curve from 0 weeks to 107 weeks. The
lowest value of the Weibull hazard curve is 0 at week 0 and the highest value is
0.0121 at week 107.
The values of the Weibull hazard function increase as the time to
discontinuation of the use of an IUD increases. This occurs because the estimated
shape parameter, k̂ = 1.3379, is greater than 1.
4.3.3 The Comparison of Standard Error of Estimated Weibull Hazard Function
Standard errors based on each of the estimated Weibull parameters are again more
accurately constructed using a logarithmic transformation. The standard error of the
log-transformed estimate is
$$\mathrm{se}[\log(\text{estimate})] = \sqrt{\mathrm{Variance}[\log(\text{estimate})]}.$$
An approximate 95% confidence interval based on the normal distribution and
the transformed log-estimates is
$$\log(\text{estimate}) \pm 1.96 \times \mathrm{se}[\log(\text{estimate})],$$
and the bounds for the parameter itself are obtained by exponentiating these two limits.
The log-transformation improves the normal distribution approximation by
creating a more symmetric distribution.
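The log-scale interval can be sketched in Python as follows. The sketch assumes the usual delta-method approximation se[log(estimate)] ≈ se(estimate) / estimate, which is not stated in the text; the function name is hypothetical. Under that assumption the intervals of Tables 4.3 and 4.4 are reproduced closely.

```python
import math
from statistics import NormalDist


def log_scale_interval(estimate, std_error, level=0.95):
    """Confidence interval built on the log scale and transformed back.

    Assumption: se[log(estimate)] is approximated by std_error / estimate.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * std_error / estimate
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


# Estimates and standard errors from Table 4.3 (maximum likelihood)
print(log_scale_interval(1.6764, 0.4604))
print(log_scale_interval(98.6440, 20.1601))
```

Because the interval is symmetric on the log scale, the back-transformed bounds are not symmetric about the estimate, which is why the upper bounds in the tables lie further from the estimates than the lower bounds.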
Table 4.5: The Comparison of Standard Error of Estimated Weibull Hazard Function.

Estimation Method             Parameter        Estimate    Standard Error   Lower Bound   Upper Bound
Maximum Likelihood Estimate   Shape, k̂_MLE     1.6764      0.4604           0.9786        2.8717
                              Scale, λ̂_MLE     98.6440     20.1601          66.0859       147.2420
Least Square Estimate         Shape, k̂_LSE     1.3379      0.5016           0.6416        2.7897
                              Scale, λ̂_LSE     109.9520    33.0248          61.0293       198.0920
Table 4.5 shows the comparison of standard error of estimated shape and scale
parameter of Weibull hazard function for maximum likelihood estimate and least square
estimate.
The Standard Error column of Table 4.5 shows the standard error of the respective
parameters. From Table 4.5, the maximum likelihood estimation method has lower standard
errors for both the shape and scale parameters than the least square estimation method:
se(k̂_MLE) = 0.4604 is less than se(k̂_LSE) = 0.5016, and se(λ̂_MLE) = 20.1601 is less
than se(λ̂_LSE) = 33.0248.
Thus, the maximum likelihood estimation method is more precise than the least square
estimation method in estimating the shape and scale parameters of the Weibull hazard
function. This result is expected since the maximum likelihood estimation process takes
the censored observations into account, which reduces the bias of the estimates.
4.4 The Comparison of the Nonparametric and Parametric Approaches in
Estimating the Hazard Function
The comparison of the nonparametric and parametric approaches in estimating
the hazard function of the use of an IUD for 18 women is done by comparing their
standard errors, measured as root mean square errors (RMSE).
The RMSE is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_{t_i}^{2}}$$
where $e_{t_i} = h(t_i) - \hat{h}(t_i)$. Note that $h(t)$ is the Kaplan-Meier estimate of the hazard
function and $\hat{h}(t)$ is, in turn, the best nonparametric and the best parametric estimate of
the hazard function chosen earlier.
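The RMSE calculation can be sketched in Python as below. The function name is hypothetical, and the three values used in the example are the Kaplan-Meier and maximum likelihood entries for weeks 10 to 12 from Appendix B, so the printed number is only an illustration and not the value reported in Table 4.6.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally long sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(h - g) ** 2 for h, g in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


# Weeks 10 to 12 from Appendix B: Kaplan-Meier versus maximum likelihood
km = [0.0062, 0.0062, 0.0062]
mle = [0.00361341, 0.00385403, 0.00408767]
print(rmse(km, mle))
```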
Table 4.6: The standard error (RMSE) of nonparametric and parametric estimation
methods of hazard function for the data of the time to discontinuation of the use of an
IUD for 18 women.

Estimation Method                    Standard Error (RMSE)
Maximum Likelihood Estimate (MLE)    0.0070
Epanechnikov Kernel (EK)             0.0085
Table 4.6 shows the standard error (RMSE) of nonparametric and parametric
approaches in estimating the hazard function for the data of the time to discontinuation
of the use of an IUD for 18 women.
From Table 4.6, the Maximum Likelihood estimate of the hazard function has the
smaller standard error, 0.0070, compared with 0.0085 for the Epanechnikov Kernel.
Therefore, the Maximum Likelihood estimate of the hazard function is the most
appropriate approach to estimate the hazard function for the data of the time to
discontinuation of the use of an IUD for 18 women.
Figure 4.6: The Graphical Comparison of the Nonparametric and Parametric Approaches
in Estimating the Hazard Function for the data of the time to discontinuation of the use
of an IUD for 18 women.
Figure 4.6 shows the estimated hazard function for the data of the time to
discontinuation of the use of an IUD for 18 women by using Kaplan-Meier,
Epanechnikov Kernel and Maximum Likelihood estimation method.
The solid line shows the estimated hazard function obtained by the Kaplan-Meier
estimation method. This line is compared with two dotted lines, which are the
Epanechnikov Kernel and Maximum Likelihood estimates of the hazard function.
From week 0 until week 36, the Epanechnikov Kernel and Maximum Likelihood
estimates of the hazard function appear to fit the Kaplan-Meier estimate of the hazard
function equally well.
Then, from week 36 until week 94, the Maximum Likelihood estimate
overestimates the Kaplan-Meier hazard function estimate when compared with the
Epanechnikov Kernel estimate.
Next, from week 94 until week 107, the Epanechnikov Kernel estimate
underestimates the Kaplan-Meier hazard function estimate.
In conclusion, the Maximum Likelihood estimate of the hazard function fits the
Kaplan-Meier hazard function estimate better than the Kernel estimate of the hazard
function does. This holds since the Maximum Likelihood estimate follows the overall
shape of the Kaplan-Meier estimate of the hazard function.
Furthermore, this result is also expected since the bandwidth chosen for the
Epanechnikov Kernel estimate of the hazard function introduces bias, whereas the Maximum
Likelihood estimate takes the censored observations into account, which reduces the
bias of the estimates.
CHAPTER 5
CONCLUSIONS AND SUGGESTIONS
5.1 Conclusions
This dissertation has compared two nonparametric approaches which are
Biweight Kernel and Epanechnikov Kernel and two parametric approaches which are
Maximum Likelihood and Least Square estimate of hazard function with Kaplan-Meier
estimate of hazard function. As stated in the introduction, the objective of this
dissertation was to identify the most suitable approach that can be used to estimate the
hazard function through the analysis of error and graphical comparison.
Furthermore, this dissertation was also conducted to study the basic concepts of the
Kaplan-Meier, Kernel, Maximum Likelihood and Least Square methods
in order to estimate the hazard function of the time to discontinuation of the use of an
IUD for 18 women and to find their advantages and disadvantages.
The findings of this dissertation indicate that, overall, the most suitable
estimate for analysing the data of the time to discontinuation of the use of
an IUD for 18 women is the Maximum Likelihood estimate of the hazard function. This is
because it had the smallest standard error (Root Mean Square Error) and fitted
the Kaplan-Meier hazard function estimate well compared with the other estimates of the
hazard function. Furthermore, the Maximum Likelihood estimate of the
hazard function takes the censored observations into account, which reduces the bias of
the estimates.
The results of this study imply that the bias of a given estimate of the hazard
function influences which approach is found to be most appropriate for estimating the
hazard function. However, these results hold only for the data of the use of an IUD
for 18 women. The same study needs to be carried out for other medical data in order to
see whether the same approach is again the most appropriate for estimating the hazard
function.
5.2 Suggestions
5.2.1 Suggestions Based on Findings
Based on the findings and the conclusions of the study, the following are several
suggestions to be considered:
1. Other parametric distributions, such as the exponential, log-logistic and gamma
distributions, can be used to fit the estimated hazard function.
2. Other choices of bandwidth can be used in the Kernel function estimate to get more
precise results.
3. Other survival data can be applied to the estimated hazard function.
5.2.2 Suggestions for Future Research
Since this research focused only on the nonparametric and parametric
approaches to estimating the hazard function, it is suggested that further studies be
carried out on semiparametric approaches to estimating the hazard function.
Future studies could also be carried out on survival data in which supplementary
information, referred to as explanatory variables, is recorded on each individual,
since these variables may all have an impact on the time that patients survive.
REFERENCES
Agnew, D.C., Constable, C. (2004). Least Square Estimation.
Collet, D. (1952). “Modeling Survival Data in Medical Research”. Boca Raton, FL:
Chapman & Hall.
Collet D (1994). “Modeling Survival Data in Medical Research”. Chapman & Hall.
Commenges, D., Huber, C. Nikulin, M.S., (2003). “Probability,Statistics and Modeling
In Public Health”. New York, NY: Springer Science + Business Media.
Cook, A. (2008). Survival and Hazard Functions. Introduction to Survival Analysis,
National University of Singapore.
Cox, D.R., Hinkley, D.V., Reid, N., Snell, E.J. (1991). “Statistical Theory and
Modeling: In Honour of Sir David Cox, FRS. Edited By D.V. Hinkley, N. Reid
and E.J. Snell”. London, New York : Chapman & Hall.
Cox, D. R., Oakes, D. (1984). “Analysis of Survival Data”. London, New York, Tokyo,
Melbourne, Madras: Chapman & Hall.
Franklin, C. H. (2003). Maximum Likelihood Estimation. Duration Models: Exponential
and Weibull Likelihoods, University of Wisconsin-Madison.
Grimshaw, S. D., McDonald, J., McQueen, G. R., Thorley, S. (2003). Estimating
Hazard Functions for Discrete Lifetimes. Radical Eye Software.
Horová, I., Pospísil, Z., Zelinka, J. (2009). Hazard Function for Cancer Patients and
Cancer Cell Dynamics. Journal of Theoretical Biology. 258, 437-443.
Horová, I., Zelinka, J. (2006). Kernel Estimates of Hazard Functions for Biomedical
Data Sets.
Huang, B. (2005). Nonparametric Estimation of Hazard Function from Censored Data
by Kernel Method.
Ives, M., Funk, R., Dennis, M. (2000). LI Analysis Training Series. Survival Analysis /
Life Tables, Chesnut Health System.
Kim, C., Bae, W., Park, B.U. (2005). Nonparametric Hazard Function Estimation Using
the Kaplan-Meier Estimate. Nonparametric Statistics, Taylor and Francis
Group. 17(8), 937 - 948.
Lee, E. T., Go, O. T. (1997). Survival Analysis in Public Health. Annual Review Public
Health. 18, 105-134.
Müller, H. G., Wang, J. L. (2007). Density and Failure Rate Estimation with Application
to Reliability. Encyclopedia of Statistics in Quality and Reliability.
Nochai, T., Bodhisuwan, W. (2000). Statistical Reliability Analysis of Some Types of
Two-Parameter Life Time Distributions. Proceedings of the 2nd IMT-GT
Regional Conference on Mathematics, Statistics and Applications. June, 13-15.
Universiti Sains Malaysia, Penang.
Pérez, G. E., Cimadevila, H. L., Río, A. Q. D. (2002). Nonparametric Analysis of the
Time Structure of Seismicity in a Geographic Region. Annals of Geophysics.
Vol. 45(3/4).
Razali, A. M., Salih, A. A., Mahdi, A. A. (2009). Estimation Accuracy of Weibull
Distribution Parameters. Journal of Applied Sciences Research, INSInet
Publication. 5(7), 790-795.
Rodríguez, G. (2005). Nonparametric Estimation in Survival Models. Princeton.
Subramanian, S., Bean, D. (2008). Hazard Function Estimation from Homogeneous
Right Censored Data with Missing Censoring Indicators. Statistical
Methodology, ELSEVIER, Science Direct. 5, 515-527.
Tutz, G., Pritscher, L. (1996). Nonparametric Estimation of Discrete Hazard Functions.
Lifetime Data Analysis, Kluwer Academic Publishers. 2, 291-308.
Vaal, V. A., Koshkin, G. M. (1999). Kernel Nonparametric Estimation of the Hazard
Rate Function and Its Derivatives. KORUS’99, Mathematics, IEEE Xplore. 496-500.
Wang, J. L. (2003). Smoothing Hazard Rates. Encyclopedia of Biostatistics.
Wang, Q. H. (2008). Some Bounds for the Error of an Estimator of the Hazard Function
With Censored Data. Statistics and Probability Letters. 44, 319-326.
Xie, Z., Yan, J. (2008). Kernel Density Estimation of Traffic Accidents in a Network
Space. Geography / Geology Faculty Publications, Western Kentucky
University.
APPENDICES
APPENDIX A
The Application of Minitab in Estimating the Hazard Function
In this dissertation, Minitab is used to estimate the Weibull hazard function of
multiple myeloma patients by using the Maximum Likelihood Estimation and Least Square
methods of estimation.
The following steps show how to estimate the hazard function.
Step 1: Key in all the data in the columns “Patients”, “Times” and “Status”.
Step 2: Then, click “Stat” and choose “Reliability/Survival”. Next, click “Distribution
Analysis (Right Censoring)” and “Parametric Distribution Analysis”.
Step 3: Then, insert “Times” in “Variables” and choose “Weibull” in “Assumed
distributions”.
Step 4: Next, click “Censor”, click “Use censoring columns” and select the column
“Status”. Insert 0 in “Censoring value” and click “OK”.
Step 5: Select “Estimate” and choose “Least Squares (Failure Time (X) on rank (Y))”
for the Least Square estimate or “Maximum Likelihood” for the Maximum Likelihood
estimate, and click “OK”.
APPENDIX B
Estimated Hazard Function
t    KM      EK(b=110)  BK(b=110)  LSM        MLE
0    0       0.003306   0.002948   0          0
1    0       0.003394   0.003018   0.0024865  0.00076123
2    0       0.003481   0.00309    0.0031426  0.00121656
3    0       0.003567   0.003164   0.003604   0.00160045
4    0       0.003652   0.003239   0.0039718  0.00194424
5    0       0.003735   0.003316   0.0042828  0.002261
6    0       0.003817   0.003394   0.0045549  0.00255776
7    0       0.003898   0.003473   0.0047984  0.00283885
8    0       0.003978   0.003554   0.0050199  0.00310719
9    0       0.004057   0.003635   0.0052237  0.00336486
10   0.0062  0.004134   0.003718   0.0054129  0.00361341
11   0.0062  0.00421    0.003801   0.0055901  0.00385403
12   0.0062  0.004286   0.003885   0.0057569  0.00408767
13   0.0062  0.004359   0.00397    0.0059147  0.00431508
14   0.0062  0.004432   0.004055   0.0060646  0.00453689
15   0.0062  0.004503   0.004141   0.0062076  0.00475363
16   0.0062  0.004574   0.004227   0.0063445  0.00496574
17   0.0062  0.004643   0.004313   0.0064758  0.0051736
18   0.0062  0.004711   0.0044     0.006602   0.00537754
19   0.0061  0.004777   0.004486   0.0067237  0.00557785
20   0.0061  0.004843   0.004573   0.0068413  0.00577476
21   0.0061  0.004907   0.00466    0.006955   0.00596852
22   0.0061  0.00497    0.004747   0.0070651  0.00615931
23   0.0061  0.005032   0.004833   0.0071721  0.00634732
24   0.0061  0.005093   0.004919   0.0072759  0.0065327
25   0.0061  0.005152   0.005005   0.007377   0.00671559
26   0.0061  0.005211   0.00509    0.0074754  0.00689613
27   0.0061  0.005268   0.005175   0.0075713  0.00707444
28   0.0061  0.005324   0.00526    0.0076649  0.00725062
29   0.0061  0.005378   0.005343   0.0077563  0.00742478
30   0.0128  0.005432   0.005426   0.0078457  0.007597
31   0.0128  0.005484   0.005509   0.0079331  0.00776738
32   0.0128  0.005535   0.00559    0.0080186  0.00793599
33   0.0128  0.005585   0.005671   0.0081024  0.0081029
34   0.0128  0.005634   0.00575    0.0081845  0.00826818
35   0.0128  0.005682   0.005829   0.0082651  0.0084319
36   0.0036  0.005728   0.005906   0.0083441  0.0085941
37   0.0036  0.005773   0.005983   0.0084217  0.00875486
38   0.0036  0.005817   0.006058   0.0084979  0.00891422
39   0.0036  0.00586    0.006132   0.0085728  0.00907222
40   0.0036  0.005902   0.006204   0.0086465  0.00922892
41   0.0036  0.005942   0.006276   0.0087189  0.00938436
42   0.0036  0.005981   0.006345   0.0087902  0.00953857
43   0.0036  0.006019   0.006414   0.0088604  0.0096916
44   0.0036  0.006056   0.006481   0.0089294  0.00984349
45   0.0036  0.006092   0.006546   0.0089975  0.00999426
46   0.0036  0.006126   0.00661    0.0090646  0.01014395
47   0.0036  0.00616    0.006672   0.0091307  0.01029259
48   0.0036  0.006192   0.006733   0.0091958  0.01044021
49   0.0036  0.006223   0.006791   0.0092601  0.01058684
50   0.0036  0.006252   0.006848   0.0093235  0.0107325
51   0.0036  0.006281   0.006904   0.0093861  0.01087722
52   0.0036  0.006308   0.006957   0.0094479  0.01102103
53   0.0036  0.006334   0.007009   0.0095089  0.01116395
54   0.0036  0.006359   0.007059   0.0095691  0.01130599
55   0.0036  0.006383   0.007106   0.0096286  0.01144719
56   0.0036  0.006405   0.007152   0.0096874  0.01158756
57   0.0036  0.006427   0.007196   0.0097455  0.01172712
58   0.0036  0.006447   0.007238   0.009803   0.01186589
59   0.0078  0.006466   0.007278   0.0098598  0.01200389
60   0.0078  0.006483   0.007316   0.0099159  0.01214113
61   0.0078  0.0065     0.007351   0.0099714  0.01227763
62   0.0078  0.006515   0.007385   0.0100264  0.01241342
63   0.0078  0.006529   0.007417   0.0100807  0.01254849
64   0.0078  0.006542   0.007446   0.0101345  0.01268287
65   0.0078  0.006554   0.007473   0.0101877  0.01281658
66   0.0078  0.006565   0.007498   0.0102404  0.01294962
67   0.0078  0.006574   0.007521   0.0102926  0.01308201
68   0.0078  0.006582   0.007542   0.0103442  0.01321377
69   0.0078  0.006589   0.007561   0.0103953  0.01334489
70   0.0078  0.006595   0.007577   0.010446   0.01347541
71   0.0078  0.0066     0.007591   0.0104962  0.01360532
72   0.0078  0.006603   0.007603   0.0105459  0.01373464
73   0.0078  0.006605   0.007613   0.0105952  0.01386338
74   0.0078  0.006606   0.007621   0.010644   0.01399155
75   0.0079  0.006606   0.007626   0.0106924  0.01411916
76   0.0079  0.006605   0.007629   0.0107403  0.01424623
77   0.0079  0.006602   0.00763    0.0107878  0.01437275
78   0.0079  0.006599   0.007628   0.010835   0.01449874
79   0.0079  0.006594   0.007625   0.0108817  0.01462421
80   0.0079  0.006588   0.007619   0.0109281  0.01474917
81   0.0079  0.00658    0.007611   0.010974   0.01487363
82   0.0079  0.006572   0.007601   0.0110196  0.01499758
83   0.0079  0.006562   0.007589   0.0110648  0.01512105
84   0.0079  0.006551   0.007574   0.0111097  0.01524404
85   0.0079  0.006539   0.007557   0.0111542  0.01536656
86   0.0079  0.006526   0.007539   0.0111984  0.01548861
87   0.0079  0.006511   0.007518   0.0112422  0.0156102
88   0.0079  0.006496   0.007495   0.0112857  0.01573134
89   0.0079  0.006479   0.00747    0.0113288  0.01585203
90   0.0079  0.006461   0.007442   0.0113717  0.01597229
91   0.0079  0.006441   0.007413   0.0114142  0.01609212
92   0.0079  0.006421   0.007382   0.0114564  0.01621152
93   0.0417  0.006399   0.007349   0.0114984  0.0163305
94   0.0417  0.006376   0.007314   0.01154    0.01644907
95   0.0417  0.006352   0.007276   0.0115813  0.01656723
96   0.0417  0.006327   0.007237   0.0116224  0.01668499
97   0.02    0.006301   0.007196   0.0116631  0.01680235
98   0.02    0.006273   0.007154   0.0117036  0.01691932
99   0.02    0.006244   0.007109   0.0117438  0.01703591
100  0.02    0.006214   0.007063   0.0117838  0.01715211
101  0.02    0.006183   0.007015   0.0118234  0.01726794
102  0.02    0.006151   0.006965   0.0118629  0.0173834
103  0.02    0.006117   0.006914   0.011902   0.01749849
104  0.02    0.006082   0.006861   0.0119409  0.01761323
105  0.02    0.006046   0.006806   0.0119796  0.0177276
106  0.02    0.006009   0.00675    0.012018   0.01784163
107  0.02    0.005971   0.006692   0.0120562  0.0179553
APPENDIX C
Standard Error of Kernel Estimate of Hazard Function
t        SE(EK)    SE(BK)
0        0.001001  0.001031
1        0.001014  0.001043
2        0.001027  0.001056
3        0.00104   0.001068
4        0.001052  0.001081
5        0.001064  0.001094
6        0.001075  0.001106
7        0.001087  0.001119
8        0.001098  0.001132
9        0.001109  0.001145
10       0.00115   0.00119
11       0.00116   0.001203
12       0.001171  0.001216
13       0.001215  0.001265
14       0.001225  0.001278
15       0.001235  0.001292
16       0.001245  0.001305
17       0.001254  0.001318
18       0.001302  0.001373
19       0.001354  0.001431
20       0.001363  0.001445
21       0.001372  0.001459
22       0.001381  0.001473
23       0.001439  0.001538
24       0.001447  0.001552
25       0.001456  0.001565
26       0.001464  0.001578
27       0.001472  0.001591
28       0.00148   0.001604
29       0.001487  0.001617
30       0.001551  0.001691
31       0.001558  0.001704
32       0.001566  0.001716
33       0.001573  0.001729
34       0.00158   0.001741
35       0.001586  0.001753
36       0.001658  0.001836
37       0.001664  0.001848
38       0.001745  0.001943
39       0.001751  0.001954
40       0.001758  0.001966
41       0.001764  0.001977
42       0.001769  0.001988
43       0.001775  0.001999
44       0.00178   0.002009
45       0.001786  0.002019
46       0.001791  0.002029
47       0.001796  0.002039
48       0.0018    0.002048
49       0.001805  0.002057
50       0.001809  0.002065
51       0.001813  0.002074
52       0.001817  0.002082
53       0.001821  0.002089
54       0.001913  0.002199
55       0.001917  0.002207
56       0.002024  0.002333
57       0.002028  0.002341
58       0.002031  0.002347
59       0.002157  0.002497
60       0.00216   0.002503
61       0.002163  0.002509
62       0.002165  0.002515
63       0.002168  0.00252
64       0.00217   0.002525
65       0.002172  0.00253
66       0.002174  0.002534
67       0.002175  0.002538
68       0.002176  0.002542
69       0.002178  0.002545
70       0.002179  0.002547
71       0.002179  0.00255
72       0.00218   0.002552
73       0.00218   0.002553
74       0.00218   0.002555
75       0.002331  0.002732
76       0.002331  0.002733
77       0.00233   0.002733
78       0.00233   0.002732
79       0.002329  0.002732
80       0.002328  0.002731
81       0.002326  0.002729
82       0.002325  0.002728
83       0.002323  0.002725
84       0.002321  0.002723
85       0.002319  0.00272
86       0.002317  0.002716
87       0.002314  0.002713
88       0.002311  0.002709
89       0.002308  0.002704
90       0.002305  0.002699
91       0.002302  0.002694
92       0.002298  0.002688
93       0.002478  0.002897
94       0.002474  0.00289
95       0.002469  0.002882
96       0.002464  0.002875
97       0.002694  0.00314
98       0.002688  0.003131
99       0.002681  0.003121
100      0.002675  0.003111
101      0.002668  0.0031
102      0.002661  0.003089
103      0.002654  0.003078
104      0.002959  0.003428
105      0.00295   0.003414
106      0.002941  0.0034
107      0.005863  0.006771
AVERAGE  0.001929  0.002192
APPENDIX D
The Application of Mathcad in Kernel Estimate of the Hazard Function
t := (1, 1, 1, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 8, 10, 10, 10, 10, 10, 11, 12, 12, 13, 14,
      15, 15, 16, 16, 17, 18, 18, 18, 18, 23, 24, 36, 40, 40, 40, 50, 51, 52, 56, 65,
      66, 76, 88, 91)^T

d := (1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
      0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
      1, 0, 1, 1)^T

h := 94.5
k := 2
i := 1 .. 48
n := 48
x := 0 .. 90
Epanechnikov Kernel
$$\lambda_E(x) := \left(\frac{1}{h}\right) \cdot \sum_{i=0}^{47} \left[ 0.75 \cdot \left[ 1 - \left( \frac{x - t_i}{h} \right)^{2} \right] \cdot \frac{d_i}{n - i + 1} \right]$$
Standard Error
Standard Error of Epanechnikov Kernel
$$v_E(K) := \int_{-1}^{1} \left[ 0.75 \left( 1 - x^{2} \right) \right]^{2} dx$$
$$v_E(K) = 0.6$$
Biweight Kernel
$$\lambda_B(x) := \left(\frac{1}{h}\right) \cdot \sum_{i=0}^{47} \left[ \left(\frac{15}{16}\right) \cdot \left[ 1 - \left( \frac{x - t_i}{h} \right)^{2} \right]^{2} \cdot \frac{d_i}{n - i + 1} \right]$$
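The worksheet formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Mathcad worksheet itself: the function names are hypothetical, the data are assumed to be the 18 IUD times and event indicators (sorted, with the observed event listed before censored ties), and the bandwidth b = 110 is taken from the column headings of Appendix B rather than the h := 94.5 of the worksheet. The index i is zero-based as in the worksheet sums, so the weight is d_i / (n - i + 1) with i = 0, ..., n - 1.

```python
# Assumption: sorted IUD times (weeks) and event indicators (1 = event, 0 = censored)
TIMES = [10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107]
EVENTS = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0]


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def biweight(u):
    """Biweight kernel on [-1, 1]."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def kernel_hazard(x, times, events, bandwidth, kernel):
    """(1 / b) * sum over i of K((x - t_i) / b) * d_i / (n - i + 1), i zero-based."""
    n = len(times)
    total = 0.0
    for i, (t, d) in enumerate(zip(times, events)):
        total += kernel((x - t) / bandwidth) * d / (n - i + 1)
    return total / bandwidth


for week in (0, 50, 107):
    print(week,
          kernel_hazard(week, TIMES, EVENTS, 110, epanechnikov),
          kernel_hazard(week, TIMES, EVENTS, 110, biweight))
```

With these assumptions the values at weeks 0 and 107 agree with the EK and BK columns of Appendix B to the reported precision. With one-based ranks the same weights would be written d_(i) / (n - i + 2).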
APPENDIX E
The Newton-Raphson Procedure
Models for censored survival data are usually fitted by using the Newton-Raphson
procedure to maximize the partial likelihood function, and so the procedure is
outlined in this section.
Let $\mathbf{u}(\boldsymbol{\beta})$ be the $p \times 1$ vector of first derivatives of the log-likelihood function
in equation
$$\log L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \delta_i \left\{ \boldsymbol{\beta}' \mathbf{x}_i - \log \sum_{l \in R(t_i)} \exp(\boldsymbol{\beta}' \mathbf{x}_l) \right\}$$
with respect to the β-parameters. This quantity is known as the vector of the efficient
scores. Also, let $\mathbf{I}(\boldsymbol{\beta})$ be the $p \times p$ matrix of negative second derivatives of the
log-likelihood, so that the (j, k)th element of $\mathbf{I}(\boldsymbol{\beta})$ is
$$-\frac{\partial^{2} \log L(\boldsymbol{\beta})}{\partial \beta_j \, \partial \beta_k}.$$
The matrix $\mathbf{I}(\boldsymbol{\beta})$ is known as the observed information matrix.
According to the Newton-Raphson procedure, an estimate of the vector of β-parameters
at the (s + 1)th cycle of the iterative procedure, $\hat{\boldsymbol{\beta}}_{s+1}$, is
$$\hat{\boldsymbol{\beta}}_{s+1} = \hat{\boldsymbol{\beta}}_{s} + \mathbf{I}^{-1}(\hat{\boldsymbol{\beta}}_{s})\,\mathbf{u}(\hat{\boldsymbol{\beta}}_{s}),$$
for s = 0, 1, 2, ..., where $\mathbf{u}(\hat{\boldsymbol{\beta}}_{s})$ is the vector of efficient scores and $\mathbf{I}^{-1}(\hat{\boldsymbol{\beta}}_{s})$ is the inverse
of the information matrix, both evaluated at $\hat{\boldsymbol{\beta}}_{s}$. The procedure can be started by taking
$\hat{\boldsymbol{\beta}}_{0} = \mathbf{0}$. The process is terminated when the change in the log-likelihood function is
sufficiently small.
When the iterative procedure has converged, the variance-covariance matrix of
the parameter estimates can be approximated by the inverse of the information matrix,
evaluated at $\hat{\boldsymbol{\beta}}$, that is $\mathbf{I}^{-1}(\hat{\boldsymbol{\beta}})$. The square roots of the diagonal elements of this matrix
are then the standard errors of the estimated values of $\beta_1, \beta_2, \ldots, \beta_p$.
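The update rule above can be illustrated with a one-parameter Python sketch. It is not the partial likelihood fit described in this appendix: as a simplifying assumption it maximizes the log-likelihood of an exponential model with log-rate β, log L(β) = rβ − exp(β)·T, where r is the number of events and T the total observed time. The function name and the input values 9 and 1046 are illustrative.

```python
import math


def newton_raphson_exponential(r, total_time, beta0=0.0, tol=1e-10, max_iter=100):
    """Maximize log L(beta) = r * beta - exp(beta) * total_time by Newton-Raphson.

    Returns the estimate of beta and its standard error, 1 / sqrt(information).
    """
    def log_lik(b):
        return r * b - math.exp(b) * total_time

    beta = beta0
    current = log_lik(beta)
    for _ in range(max_iter):
        score = r - math.exp(beta) * total_time       # efficient score u(beta)
        information = math.exp(beta) * total_time     # observed information I(beta)
        beta = beta + score / information             # beta_{s+1} = beta_s + I^{-1} u
        updated = log_lik(beta)
        converged = abs(updated - current) < tol      # stop on a small change in log L
        current = updated
        if converged:
            break
    information = math.exp(beta) * total_time
    return beta, math.sqrt(1.0 / information)


beta_hat, se_beta = newton_raphson_exponential(9, 1046.0)
print(math.exp(beta_hat), se_beta)
```

In this scalar case the iteration starts at β = 0, stops when the change in the log-likelihood is small, and the standard error is the square root of the inverse information, mirroring the steps described above.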