Assignment #9 - Winona State University

STAT 405 – Biostatistics ~ Assignment #9
(102 points)
1. Mayo Clinic Lung Cancer Data
Description: Survival in patients with lung cancer at Mayo Clinic. Karnofsky
performance scores rate how well the patient can perform usual daily activities.
Variable Names:
inst:       Institution code (DO NOT USE IN YOUR ANALYSIS)
time:       Survival time in days
status:     Censoring status 0=censored, 1=dead
age:        Age in years
sex:        Male=1 Female=2
ph.ecog:    ECOG performance score (0=good 5=dead)
ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician
pat.karno:  Karnofsky performance score rated by patient
meal.cal:   Calories consumed at meals
wt.loss:    Weight loss in last six months
Download Lung.txt from the course website and then source it into R. Then do
the following to make sex a factor variable and fix the censoring variable so
censor = 0 and dead = 1.
> source(file.choose())   # locate the Lung.txt file in your directory
> Lung$sex = as.factor(Lung$sex)
> Lung$status = Lung$status - 1
a) Perform a test to see if the survival experience differs across gender. Obtain and plot
the Kaplan-Meier estimates for both genders. Discuss. (4 pts.)
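One possible starting point for part (a) is sketched below. It assumes the `lung` data shipped with the survival package (complete cases only) is an acceptable stand-in for the course's Lung.txt; that substitution is an assumption, not something the assignment states. `survdiff` gives the log-rank test and `survfit` the Kaplan-Meier estimates.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$sex <- as.factor(Lung$sex)
Lung$status <- Lung$status - 1          # 0 = censored, 1 = dead

# Log-rank test comparing the survival experience of the two sexes
lr <- survdiff(Surv(time, status) ~ sex, data = Lung)
print(lr)

# Kaplan-Meier estimates, one curve per sex
km <- survfit(Surv(time, status) ~ sex, data = Lung)
plot(km, lty = 1:2, xlab = "Days", ylab = "Estimated survival probability")
legend("topright", legend = c("Male (sex = 1)", "Female (sex = 2)"), lty = 1:2)
```

The printed `survdiff` output reports the chi-square statistic and p-value to discuss alongside the plot.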
b) Fit a model using age, sex, ph.ecog, ph.karno, pat.karno, meal.cal,
and wt.loss. Briefly summarize the model (don't compute HRs). Which variables
appear to be significant? (4 pts.)
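A minimal sketch of the fit in part (b), assuming a Cox proportional hazards model is intended (the later reference to cox.zph suggests so) and again using `survival::lung` as a stand-in for Lung.txt. The object name `fit.full` is illustrative only.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$sex <- as.factor(Lung$sex)
Lung$status <- Lung$status - 1          # 0 = censored, 1 = dead

# Cox model with all seven candidate predictors
fit.full <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno +
                    pat.karno + meal.cal + wt.loss, data = Lung)
print(summary(fit.full))                # Wald tests for each coefficient
```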
c) Test proportional hazards assumptions for the significant covariates (p < .10). Which
variables appear to have a violation of proportional hazards assumption? Include plots
from the cox.zph function in support of the p-values from the PH test. (4 pts.)
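The proportional hazards check in part (c) might look like the sketch below. It refits the full model on the stand-in data (`survival::lung`, an assumption) so that it runs on its own; `fit.full` and `zph` are illustrative names.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$sex <- as.factor(Lung$sex)
Lung$status <- Lung$status - 1

fit.full <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno +
                    pat.karno + meal.cal + wt.loss, data = Lung)

# Test of proportional hazards for each term plus a global test
zph <- cox.zph(fit.full)
print(zph)

# Smoothed scaled Schoenfeld residuals against time, one panel per term;
# a clear trend away from a horizontal line suggests non-proportionality
par(mfrow = c(2, 4))
plot(zph)
```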
Notice: There is only one patient classified as having an ECOG performance score of 3 or
more. We might want to collapse categories so ECOG would be coded: 0, 1, 2+. This can
be done using the following commands.
> table(Lung$ph.ecog)
 0  1  2  3
47 81 38  1
> Lung$ph.ecog[Lung$ph.ecog==3] = 2
> table(Lung$ph.ecog)
 0  1  2
47 81 39
We could then treat ph.ecog as a factor variable with levels - 0, 1, 2 (which is 2+ now).
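Converting the recoded score to a factor could be done as sketched here. The "2+" label is a suggestion rather than part of the assignment, and `survival::lung` is again assumed as a stand-in for Lung.txt.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)

# Collapse ECOG 3 into 2, then treat the score as a three-level factor
Lung$ph.ecog[Lung$ph.ecog == 3] <- 2
Lung$ph.ecog <- factor(Lung$ph.ecog, levels = c(0, 1, 2),
                       labels = c("0", "1", "2+"))
print(table(Lung$ph.ecog))
```

With this coding, a Cox model reports two coefficients for ph.ecog, each comparing a level to the reference level 0.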
d) Using sex as a stratification variable refit the model from part (b) with the recoded
ph.ecog instead of the original one. Remove any predictors from your model with p-value > .20.
i) Confirm that your reduced model is OK using a general Chi-square test. (3 pts.)
ii) Assess proportional hazards (cox.zph) for your reduced model. Discuss. (3 pts.)
iii) Examine martingale residual plots for the variables that are treated as numeric. Do
you see any evidence of the need to transform any of the predictors? (3 pts.)
iv) Examine DFBETAS for all estimated coefficients in your final model. Do you see any
problems with undue influence? Discuss. (4 pts.)
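The steps in part (d) can be sketched as follows. The reduced model shown (ph.ecog, ph.karno, wt.loss) is purely hypothetical; which predictors you drop depends on your own output. `survival::lung` is assumed as a stand-in for Lung.txt, and all object names are illustrative.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$status <- Lung$status - 1
Lung$ph.ecog[Lung$ph.ecog == 3] <- 2
Lung$ph.ecog <- factor(Lung$ph.ecog)

# Stratify on sex instead of estimating a coefficient for it
fit.big <- coxph(Surv(time, status) ~ age + ph.ecog + ph.karno + pat.karno +
                   meal.cal + wt.loss + strata(sex), data = Lung)
# HYPOTHETICAL reduced model, for illustration only
fit.red <- coxph(Surv(time, status) ~ ph.ecog + ph.karno + wt.loss +
                   strata(sex), data = Lung)

# (i) likelihood ratio chi-square test of reduced versus larger model
print(anova(fit.red, fit.big))

# (ii) proportional hazards check for the reduced model
print(cox.zph(fit.red))

# (iii) martingale residuals against a numeric predictor, with a lowess smooth
mres <- residuals(fit.red, type = "martingale")
plot(Lung$wt.loss, mres, xlab = "wt.loss", ylab = "Martingale residual")
lines(lowess(Lung$wt.loss, mres))

# (iv) DFBETAS: one column per coefficient, one row per patient
dfb <- residuals(fit.red, type = "dfbetas")
for (j in seq_len(ncol(dfb))) {
  plot(dfb[, j], type = "h", xlab = "Observation",
       ylab = paste("DFBETAS:", names(coef(fit.red))[j]))
}
```

A non-significant chi-square in step (i) indicates the dropped terms are not needed; a curved lowess line in step (iii) hints that a transformation may help.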
e) Interpret the coefficients in your final model from part (d). If the LCL for the HR is
below one and UCL is not, discuss how large the HR could be by focusing on the UCL.
(4 pts.)
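Hazard ratios and their confidence limits for part (e) can be read from the model summary, as in this sketch. The model shown is a hypothetical final model on the stand-in data (`survival::lung`, an assumption), not the answer to part (d).

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$status <- Lung$status - 1

# HYPOTHETICAL final model, for illustration only
fit <- coxph(Surv(time, status) ~ ph.karno + wt.loss + strata(sex), data = Lung)

# Columns: exp(coef) = HR, exp(-coef), lower .95 = LCL, upper .95 = UCL
ci <- summary(fit)$conf.int
print(ci)

# HR for a 10-unit increase in a numeric predictor is exp(10 * coefficient)
hr10 <- exp(10 * coef(fit)["ph.karno"])
print(hr10)
```

Rescaling to a clinically meaningful increment, as in the last two lines, often makes the interpretation of a numeric predictor easier to state.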
f) Construct plots of the survival curves for different cohorts of patients. Choose these
curves so that they illustrate the effect of the continuous covariates in your final model.
Label the plots so the cohorts are identifiable from the plots. For any particular set of
values for the predictors in your model you will get two curves, one for males and one
for females (i.e. one for each stratum), and thus you will always get two cohorts
determined by gender automatically. Discuss each of your plots. (8 pts.)
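One way to draw cohort curves for part (f) is sketched below. The model and the covariate values (ph.karno of 60 versus 90 at wt.loss of 10) are hypothetical choices for illustration, and `survival::lung` is assumed as a stand-in for Lung.txt. Because `newdata` omits the stratification variable, each call to `survfit` returns one curve per stratum.

```r
library(survival)

# ASSUMPTION: survival::lung (complete cases) stands in for Lung.txt
Lung <- na.omit(survival::lung)
Lung$status <- Lung$status - 1

# HYPOTHETICAL final model, for illustration only
fit <- coxph(Surv(time, status) ~ ph.karno + wt.loss + strata(sex), data = Lung)

# Two hypothetical cohorts that differ only in ph.karno
sf.lo <- survfit(fit, newdata = data.frame(ph.karno = 60, wt.loss = 10))
sf.hi <- survfit(fit, newdata = data.frame(ph.karno = 90, wt.loss = 10))

# Line type identifies the stratum (sex); colour identifies the cohort
plot(sf.lo, lty = 1:2, col = "red", conf.int = FALSE,
     xlab = "Days", ylab = "Estimated survival probability")
lines(sf.hi, lty = 1:2, col = "blue", conf.int = FALSE)
legend("topright", lty = c(1, 2, 1, 2), col = c("red", "red", "blue", "blue"),
       legend = c("ph.karno = 60, male", "ph.karno = 60, female",
                  "ph.karno = 90, male", "ph.karno = 90, female"))
```

Holding the other predictor fixed while varying one isolates that predictor's effect on the curves.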
2. A Clinical Trial in the Treatment of Carcinoma of the Oropharynx
Datafile: Pharynx.R
SIZE: 192 observations, 13 variables.
Use Source R Code… to read it into R.
DESCRIPTIVE ABSTRACT:
The Pharynx.R file gives the data for a part of a large clinical trial carried out by the
Radiation Therapy Oncology Group in the United States. The full study included
patients with squamous carcinoma of 15 sites in the mouth and throat, with 16
participating institutions, though only data on three sites in the oropharynx reported by
the six largest institutions are considered here. Patients entering the study were
randomly assigned to one of two treatment groups, radiation therapy alone or radiation
therapy together with a chemotherapeutic agent. One objective of the study was to
compare the two treatment policies with respect to patient survival.
LIST OF VARIABLES:
Variable    Description
_______________________________________________________________________
CASE        Case Number (DO NOT USE AS A COVARIATE!!!)
INST        Participating Institution (DO NOT USE AS A COVARIATE!!)
SEX         1=male, 2=female
TX          Treatment: 1=standard, 2=test
GRADE       1=well differentiated, 2=moderately differentiated,
            3=poorly differentiated, 9=missing (missing cases deleted!)
AGE         In years at time of diagnosis (yrs.)
COND        Condition: 1=no disability, 2=restricted work, 3=requires assistance
            with self care, 4=bed confined, 9=missing (missing cases deleted!)
SITE        1=faucial arch, 2=tonsillar fossa, 4=pharyngeal tongue
T_STAGE     1=primary tumor measuring 2 cm or less in largest diameter,
            2=primary tumor measuring 2 cm to 4 cm in largest diameter with
            minimal infiltration in depth, 3=primary tumor measuring more
            than 4 cm, 4=massive invasive tumor
N_STAGE     0=no clinical evidence of node metastases, 1=single positive
            node 3 cm or less in diameter, not fixed, 2=single positive
            node more than 3 cm in diameter, not fixed, 3=multiple
            positive nodes or fixed positive nodes
ENTRY_DT    Date of study entry: Day of year and year, dddyy
STATUS      0=censored, 1=dead
TIME        Survival time in days from day of diagnosis
_______________________________________________________________________
STORY BEHIND THE DATA:
Approximately 30% of the survival times are censored owing primarily to
patients surviving to the time of analysis. Some patients were lost to follow-up because
the patient moved or transferred to an institution not participating in the study, though
these cases were relatively rare. From a statistical point of view, an important feature of
these data is the considerable lack of homogeneity between individuals being studied.
Of course, as part of the study design, certain criteria for patient eligibility had to be met
which eliminated extremes in the extent of disease, but still many factors are not
controlled.
This study included measurements of many covariates which would be expected
to relate to survival experience. Six such variables are given in the data (sex, T staging, N
staging, age, general condition, and grade). The site of the primary tumor and possible
differences between participating institutions require consideration as well.
The T,N staging classification gives a measure of the extent of the tumor at the primary
site and at regional lymph nodes. T=1, refers to a small primary tumor, 2 centimeters or
less in largest diameter, whereas T=4 is a massive tumor with extension to adjoining
tissue. T=2 and T=3 refer to intermediate cases. N=0 refers to there being no clinical
evidence of a lymph node metastasis and N=1, N=2, N=3 indicate, in increasing
magnitude, the extent of existing lymph node involvement. Patients with classifications
T=1,N=0; T=1,N=1; T=2,N=0; or T=2,N=1, or with distant metastases were excluded
from study.
The variable general condition gives a measure of the functional capacity of
the patient at the time of diagnosis (1 refers to no disability whereas 4 denotes bed
confinement; 2 and 3 measure intermediate levels). The variable grade is a measure of
the degree of differentiation of the tumor (the degree to which the tumor cell resembles
the host cell) from 1 (well differentiated) to 3 (poorly differentiated).
In addition to the primary question of whether the combined treatment mode is
preferable to the conventional radiation therapy, it is of considerable interest to
determine the extent to which the several covariates relate to subsequent survival. It is
also imperative in answering the primary question to adjust the survivals for possible
imbalance that may be present in the study with regard to the other covariates. Such
problems are similar to those encountered in the classical theory of linear regression and
the analysis of covariance. Again, the need to accommodate censoring is an important
distinguishing point. In many situations it is also important to develop nonparametric
and robust procedures since there is frequently little empirical or theoretical work to
support a particular family of failure time distributions.
Analyze these data and summarize your findings. Make sure to address whether the
treatment is related to patient survival, which means you DO NOT remove Tx from
the model! (15 pts.)
3. AIDS Survival
Conduct an analysis of time to AIDS diagnosis or death (time) with censor indicator (censor).
The data are contained in the file AidsIDV.R which you can read into R using “Source R code…”
from the File pull-down menu. Do NOT use the variables id, time_d, censor_d,
txgrp, cd4, or cd4ind in your analysis. Summarize your findings. Interest centers on whether
or not the addition of IDV (tx) to their treatment regimen increases “survival”. (20 pts.)
Nonparametric Methods Problems
4. Hypertension (Rosner pg. 347) – Problems 9.19 – 9.23 (1, 2, 1, 2, 2 pts.) Enter these
data yourself!
5. Health Promotion (Rosner pg. 348) – Problems 9.28 – 9.32 (Smoke.JMP) (2 pts. each)
6. Table 6.12
Conduct an appropriate test to answer the question of interest. If you conclude there are
differences between these groups in terms of typical protoporphyrin level then use
multiple comparisons to determine which groups significantly differ. (6 pts.)
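Since this problem falls under nonparametric methods, one plausible approach is a Kruskal-Wallis test followed by pairwise Wilcoxon rank-sum tests with a multiplicity adjustment. The sketch below uses simulated placeholder values, because Table 6.12 is not reproduced here; the group labels, sample sizes, and numbers are all made up, and the choice of test is an assumption about what "appropriate" means for these data.

```r
# SIMULATED placeholder data; replace with the values from Table 6.12
set.seed(1)
ppp <- data.frame(
  level = c(rnorm(6, 20, 5), rnorm(6, 25, 5), rnorm(6, 40, 5)),
  group = factor(rep(c("A", "B", "C"), each = 6))   # hypothetical group labels
)

# Overall test for a difference in typical level across groups
kw <- kruskal.test(level ~ group, data = ppp)
print(kw)

# Pairwise rank-sum comparisons with Holm-adjusted p-values
pw <- pairwise.wilcox.test(ppp$level, ppp$group, p.adjust.method = "holm")
print(pw)
```

The pairwise step is only needed when the overall test indicates a difference between groups.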
7. Table 7.12
Conduct an appropriate test to answer the question of interest. If you conclude there are
differences between the peak ft/lb readings across treatment group then use multiple
comparisons to determine which groups significantly differ. (6 pts.)