End of Life Resuscitation Patterns:
A Socio-Demographic Study
of
Intensive Care Unit Patients
By
Sharon L. Lojun, MD
SUBMITTED TO THE DIVISION OF HEALTH SCIENCES AND TECHNOLOGY IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE IN BIOMEDICAL INFORMATICS
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
MAY 17, 2010
©2010 Sharon L. Lojun. All rights reserved.
The author hereby grants to MIT permission to reproduce
and to distribute publicly paper and electronic
copies of this thesis document in whole or in part
in any medium now known or hereafter created.
Signature of Author:
Division of Health Sciences and Technology
May, 2010
Certified by:
Regina Barzilay, PhD
Associate Professor of Electrical Engineering and
Computer Science
Accepted by:
Ram Sasisekharan, PhD
Director, Harvard-MIT Division of Health Sciences and
Technology; Edward Hood Taplin Professor of Health
Sciences & Technology and Biological Engineering
Table of Contents
I. Table of Contents
II. Dedication
III. Abstract
IV. Background and Introduction
a. Motivation
b. Research Questions
c. Methods
d. Key Findings
e. Contributions
V. Related Work
VI. Methods
a. Data
i. Database
ii. Dataset
iii. ICU Text
iv. Hospital Text
b. Nursing Notes as resuscitation code classification
i. Text Preprocessing
ii. Medical Metrics
iii. Demographics
iv. BoosTexter classification
v. Ablation Algorithm
vi. Medical Condition/Text sub-analysis
vii. Statistical Analysis
c. N-gram Analysis
i. Text Preprocessing
ii. Pattern Recognition
d. Classification Model Creation
i. Univariate analysis
ii. Multivariate analysis
iii. Interaction terms
iv. Model performance
v. Evaluation of gender
vi. Evaluation of age
e. Physician Notes as resuscitation code classification
i. Text Preprocessing
ii. BoosTexter classification
iii. Ablation Algorithm
f. Physician Notes annotation analysis
i. Univariate classification analysis
ii. Gender Analysis
VII. Results
a. Nursing Notes as resuscitation code classification
i. Comparative Impact of Nursing Social Text and Medical
Metrics
ii. Individual Feature Prediction
iii. Most Predictive Models
b. N-gram Analysis
i. Gender differences
ii. Visitation differences
c. Classification Model Creation
i. Feature Prediction
ii. Model
iii. Model Performance
iv. Gender and Age Effect
d. Physician Notes as resuscitation code classification
i. Individual Feature Prediction
ii. Comparison with Nursing and medico-demographic
classification
iii. Most Predictive Models
e. Physician Notes annotation analysis
i. Annotation Error
ii. Annotation predictions
1. Children
2. Living Situation
3. Marital Status
4. Employment Situation
iii. Gender Effect
iv. Age Effect
VIII. Discussion
IX. Summary
X. References
XI. Acknowledgments
XII. Appendix
Dedication
Devoted to the memory of my mother, Joyce Ann Fleming Lojun; and in
honor of my father, Edward Charles Lojun, Sr, in appreciation for their
guidance and love; to the memory of G. Tom Shires II, MD, who taught me
to aspire to be a surgeon, scientist, physician, and teacher; to Philip Barie,
MD, who inspired me technically and intellectually; to Cornell University
Medical College; to Mitchell Medow who is a special teacher and special
person; to my brother, Edward; to my sister Teresa; to Paulie Pena; to my
friends, Christy Sauper and Amy Lapidow; to Isaac Schiff, MD, David
Grimes, MD, Wayne Cohen, MD inspirational leaders, and special friends; to
Robert Friedman MD, who believed in me and guided me; to Alexa McCray, a
woman of inspirational ability and integrity; to Pete Szolovits, wonderful
teacher, kind person and firm proponent of collaboration; to Regina Barzilay
beloved teacher; and to MIT, BU and all my friends at BIRT.
Abstract
This study investigates the effect of age, gender, medical condition, and
daily free text input on classification accuracy for resuscitation code status.
Data were extracted from the MIMIC II database. Natural language processing
(NLP) was used to evaluate the social section of the nurses' progress notes.
BoosTexter was used to predict the code-status using text, age, gender, and
SAPS scoring. The relative impact of features was analyzed by feature
ablation. Social text was the greatest single indicator of code status. The
addition of text to medical condition features increased classification
accuracy significantly (p < 0.001). N-gram frequency was analyzed. Gender
differences were noted across all code statuses, with female terms always more
frequent than their male counterparts (e.g., wife > husband). Visitors and contact were more common in
the less aggressive resuscitation codes. Logistic regression on medical, age,
and gender features was used to determine gender bias or ageism.
Evidence of bias was found; both females (OR=1.47) and patients over age
70 (OR=3.72) were more likely to be DNR. Feature ablation was also
applied to the social section of physician discharge summaries, as well as to
annotated features. The addition of annotated features increased
classification accuracy, but the nursing social text remained the most
individually predictive. The annotated features included: children; living
situation; marital status; and working status. Having zero to one child;
living alone or in an institution; being divorced or widowed; and
working, working in a white collar job, or being retired, were all associated
with higher rates of DNR status, and lower rates of FC status. Contrarily,
living with family; being married; and being unemployed, were all associated
with lower rates of DNR status, and higher rates of FC status. Some of
these findings were gender and/or age dependent.
Introduction and Background
Motivation: Critical Care is the costliest and most invasive of medical care.
The sickest 5% of the U.S. population consumes nearly 30% of the health
care costs. This care can lead to remarkable recovery; however, in some
cases it may lead to prolonged invasive care without benefit. The challenge
is to identify the best candidates for ICU care, avoiding needless patient and
family suffering, and the waste of trillions of dollars in medical resources.
Applying the best resuscitation rules based on the patient's wishes and
medical prognosis helps solve this dilemma, especially if applied expediently.
Currently, determining resuscitation status during ICU stays is generally left
to family and physicians. Resuscitation code assignment in the ICU is complex
and poorly understood, and this study is a preliminary attempt to address that need.
It is vital to understand the factors which influence code assignment in order
to ensure ethical treatment of all patients, and provide treatment in
harmony with patients' wishes. Knowledge of specific driving factors of
resuscitation code status is limited. The purpose of this study is to evaluate
a large ICU database for family and social characteristics associated with
code status, including the relationship to corresponding medical measures
and demographic attributes.
For the purpose of this paper: Full Code, all resuscitation measures, will be
designated FC; Do Not Resuscitate or Do Not Intubate, limited resuscitation
in the case of cardiac or pulmonary arrest, will be designated DNR; and
Comfort Measures Only will be designated CMO.
Research Questions: This study investigates nursing social text (largely a
catalogue of family and loved ones' visits, feelings and understanding, and
physician meetings), age, gender, and medical condition as predictors of
code status assignment. Specifically, what are the driving socio-demographic
features? Do these factors match the medical condition? Is
there evidence for ageism or gender bias? What are the most frequent
unigrams, bigrams, and trigrams? Are there any patterns associated with
the frequencies? Can the non-text features be modeled to discern the
presence of gender bias or ageism?
Physician social text of the hospital discharge summary (largely a catalogue
of children's involvement, living situation, marital status, and employment
situation) is also investigated as a predictor of code status. What are the
most predictive features? Are they age or gender specific?
Methods: The machine learning algorithm BoosTexter was applied, using a
training set and a test set, to the social sections of the nursing ICU notes and
the physician hospital discharge summaries to measure code classification accuracy.
An n-gram study of the nursing notes was performed to identify any meaningful
trends.
Annotation of the physician notes was done by hand. The corpus was
classified according to the annotation features. After annotation, BoosTexter
was used to classify code status.
Logistic regression was used to create a model of non-text features. All
possible interaction terms were evaluated. Social features of the physician
hospital summaries were evaluated for their association with code
assignment.
Key Findings: Code status seems to be more a reflection of family and
physician sentiment and assessment than of unprejudiced medical measures.
It is clear from these analyses that medical condition and prognosis alone
are not likely the leading driving factor of code assignment; in fact, these
findings mirror those associated with ICU resource allocation itself.
This is the first study known to the author to evaluate prediction of code
status. The nursing notes alone proved to be a better indicator of code
status than the available medical statistics. Age and gender were also highly
predictive. When combined with medical features, nursing social text
improves classification accuracy significantly (p < 0.001) compared to classification on
medical metrics alone.
Initially, it was noted that the physician notes were not very useful as free
text alone; therefore, an annotation was performed. The annotation
features were found to be less predictive individually than nursing text, age,
or gender; however, they were more predictive than the SAPSI score. In
combination with other features, the annotation features reduced the
classification error even further.
However, the lowest prediction error was achieved with the logistic regression
model. The model was tested for interaction terms, and for performance.
The discrimination of the model was good, with an AUC (C index) of 0.784.
Calibration, however, was poor. Using the model, an assessment of gender
and age effect was made using the odds ratios. Women had 1.47 times
the odds of being DNR rather than FC compared to men. Those aged 70 or
greater had 3.72 times the odds of being DNR rather than FC compared to
those younger than age 70. There was clear gender bias
favoring men, and ageism favoring the young.
The n-gram study of the nursing social text revealed interesting specific
differences in gender involvement, with the female counterpart always more
frequent than the male counterpart, regardless of code state. It cannot
readily be concluded that the female gender (wife, daughter, sister) has a
more decisive role compared with the male gender (husband, son, brother);
but it is reasonable to conclude that there is more daily female support.
There does not seem to be bias in this regard to any specific code group, as
the relationship is consistent across all code states. The findings of the
bigram study are not surprising. There are more visitors, and more contact,
as code status progresses from FC to DNR to CMO. It is likely that family
and loved ones are more involved when death seems imminent or more
likely.
The annotated features of the physician hospital discharge summaries
revealed vulnerable groups. For children: having zero children (females
only) or one child was associated with decreased rates of FC status, and
increased rates of DNR status; having many children did not differ from the
baseline rates in the full corpus. For living situation: being institutionalized
(nursing home, rehabilitation facility, assisted living, group home, and
others) was associated with decreased rates of FC and increased rates of
DNR statuses; the same trend was observed for living alone (for older
females only), so this is likely a simple age effect; living with family was
associated with increased FC rates and decreased DNR rates (no age or
gender effect). For marital status: being a widow or widower was associated
with decreased rates of FC, and increased rates of DNR, with divorce status
following this trend; being married and female had the reverse observation;
married males and single people did not differ from the underlying corpus.
For employment status: the values working or white collar were both
associated with decreased FC, and increased DNR; retired followed this
pattern for men only; unemployed females were observed to follow the
opposite direction, increased FC, and decreased DNR; blue collar, disabled,
and volunteer did not differ from the corpus.
Contributions: The contributions of this work lie largely in finding that the
driving force in resuscitation code assignment is not medical condition, but
perhaps family sentiment; that women are far more likely to be involved in
the care of ICU patients, regardless of the code status; that number of
children, living situation, marital status, and employment status weigh
heavily on the prediction of code status; and that modeled ageism and
gender bias are very marked. Finally, there is no evidence known to the
author that machine learning techniques and logistic regression modeling
have been used primarily in pursuit of this information (however, data
analysis using logistic regression modeling was done by Philip Barie's group,
upon finding gender bias by surprise). (1)
The concept of advance directives (AD), or living wills, has sought to help
in making a patient's wishes known and followed; however, they are
sometimes vague and unable to predict all possible clinical scenarios.
Additionally, ADs have not been very successful in the United States in
comparison to, for example, Japan. (2) Cultural differences and many other reasons
are cited as the causes. The likelihood of having an AD depends on
advanced age and on increased income. (3) The elderly are interested in
discussing CPR, but do not necessarily want their wishes committed to
paper. (4) Joos found, in a self-administered questionnaire of general
medical patients, that 72% had knowledge of AD, 53% had discussed it with family,
and only 14% had discussed it with their physician. (5) Half of the patients
felt the terminology should be simplified. (5) So, it may be possible in many
instances that the family has the best understanding of the patient's wishes.
Of note, the majority of geriatricians do not establish AD. (6)
Most patients come to ICU care after a sudden change in health, rather than
by a foreseen episode. Code status is generally assigned as Full Code (FC),
until it is possible to sort out the likely prognosis, and obtain information
about a patient's wishes. Even in the case of AD, it is often difficult to
predict whether stabilization will occur with brief critical care interventions,
and therefore difficult for the physician to interpret the AD in all situations.
As a result, patients are defaulted to Full Code (FC) status; less aggressive
code assignment, such as do-not-resuscitate (DNR), do-not-intubate (DNI),
and comfort-measures-only (CMO), usually does not occur until after entry
into the ICU. In many cases, the patient is unable to communicate due to
treatment or illness. The assignment of code status would then be made by
the closest family relative in conjunction with the medical staff.
Related Work
Eachempati prospectively studied 723 patients undergoing emergency
surgery. (1) The outcome measures were age, sex, admission diagnosis,
age adjusted APACHE III scores (medical metric), issuance of DNR order,
morbidity, and mortality. The patients were stratified as >75, and younger.
Statistical analysis and model formation was performed.
Logistic regression for new DNR order was performed using sex + MOD
(Multiple Organ Dysfunction - medical metric) + Age + aAIII (age adjusted
APACHE III). The model had a discrimination of 88.9; and goodness of fit of
3.876 (p=.868), implying good calibration. The OR for sex = 2.512, MOD =
1.410, and age = 1.054. DNR order was predominantly predicted by gender
and to a lesser extent by MOD and age.
Eachempati criticized their own findings, citing limitations in their data that
prevented a better explanation of the gender and age bias. For example, they lacked information about
advance directives and other factors such as family status and culture.
Their gender and age biases are similar to the findings of this dissertation.
However, MOD score (a medical metric) was more predictive than age;
medical metrics were not more predictive in this study. This may be due to
a better medical metric, especially in the age group studied. Interestingly,
this thesis shows that family components, for example, marriage and
numbers of children are predictive of code status.
The closest family relative is most often the spouse, and several studies
have suggested that marital status may have substantial impact upon health
care received, and even on outcomes. Iwashyna et al. found that married
patients visit higher quality hospitals and may receive better out-patient
care, but receive similar quality of care as that of widows and widowers once
admitted. (7) Caberera-Alonso et al. reported that the expenditures of the
married far outweigh the expenditures of the unmarried, with no differences
in the number or types of visits. (8) Married women were found to have
earlier breast cancer diagnosis, better treatment, and better survival,
independent of any socio-economic or cultural effect. (9) These three
studies suggest that there is a more aggressive approach, perhaps more
procedural approach for married patients. Iwashyna surmised that this was
the result of the improved advocacy of the spouse over the health care
worker; but this does not consider the wishes and feelings of the patient
him/herself, which may be different in married, compared with unmarried
life statuses.
In addition, it does not consider the impact of children and
extended family.
Similarly, gender differences may be extrinsic or intrinsic. Valentin et al.
identified gender bias in ICU resource and invasive procedure allocation; but
this does not account for gender differences in sentiment about invasive
care. (10)
Contrary to these findings are those of de Rooij. (11) De Rooij used
recursive partitioning to demonstrate that medical metrics are successful in
predicting mortality in the ICU, with age as a feature itself, non-significant.
A total of 6,867 consecutive patients 80 years and older from 21 Dutch ICUs
were analyzed. Medical metrics included: Glasgow Coma Scale, Acute
Physiology and Chronic Health Evaluation II, Simplified Acute Physiology
Score II (SAPS II), and Mortality Probability Models II Scores.
A recursive partitioning model using all of the medical metrics except SAPS
II was developed. The performance of the model was measured by the AUC
of the receiver operating characteristic (ROC) curve. The tree identified most patients with high
risk of mortality (9.2% of patients using the tree, versus 8.9% using the
original SAPS II score, had a risk of mortality of 80% or more; for the age
adjusted SAPS II score, 5.9% had a risk of mortality of 80% or more). Using
80% as the cut-off point, the positive predictive values were 0.88, 0.83, and
0.87 for the tree, SAPS II, and recalibrated SAPS II, respectively.
Other than Eachempati's work, evidence of ageism in the ICU seemed to be
absent in the literature. Hubbard et al. performed a cross-sectional study of
4058 patients in South Wales, in which they concluded that ageism in access to
critical care does not exist. Sick patients in five hospitals were studied every
12th day for one calendar year. Demographic, clinical and physiologic data
were collected. Ten members of the Welsh Intensive Care Society studied
each case, while blinded to the patient's age. Decisions were made by
consensus. Medical conditions included use of the APACHE II Score.
The intensivist group determined that 53% of ward-based patients were
better suited for ICU care, and 12.4% of ICU patients were better suited
for ward care. The proportions of those considered to be in inappropriate
care settings differed little by age grouping. (12)
Methods
Classification Algorithm:
BoosTexter Classification - BoosTexter is a freely available machine
learning classification package. (13) It uses a boosting algorithm to classify
text and feature attributes. Specifically, at each iteration, the algorithm selects
the most predictive feature when used in combination with the features already selected,
and reports classification errors for the resulting model. It begins by
creating a prediction model in which all training examples are weighted equally. Then, misclassified
examples are identified and increasing weights are applied to them.
The algorithm continues for the specified number of iterations. In addition,
when classifying text, the use of n-grams may be selected, such that in the
case of bigrams, both unigrams and bigrams are evaluated, and so forth.
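The reweighting idea described above can be illustrated with a minimal sketch. This is a simplified two-class AdaBoost over word-presence rules, written for illustration only; it is not BoosTexter's implementation, which handles multiple classes and differs in detail. The function names and the toy labels (+1 and -1) are assumptions introduced here.

```python
import math

def train_boosted_stumps(docs, labels, rounds):
    """Simplified two-class AdaBoost over word-presence rules.

    docs   : list of sets of tokens (one set per note)
    labels : list of +1 / -1 class labels (hypothetical encoding)
    Returns a list of (word, polarity, alpha) weak rules.
    """
    n = len(docs)
    weights = [1.0 / n] * n                 # every example starts with equal weight
    vocab = sorted(set().union(*docs))
    model = []
    for _ in range(rounds):
        best = None
        for word in vocab:
            for polarity in (+1, -1):
                # Rule: predict `polarity` if the word is present, else the opposite.
                err = sum(w for d, y, w in zip(docs, labels, weights)
                          if (polarity if word in d else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, word, polarity)
        err, word, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((word, polarity, alpha))
        # Increase the weight of misclassified examples, decrease the others.
        for i, (d, y) in enumerate(zip(docs, labels)):
            pred = polarity if word in d else -polarity
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, doc):
    """Weighted vote of all weak rules."""
    score = sum(alpha * (pol if word in doc else -pol)
                for word, pol, alpha in model)
    return 1 if score >= 0 else -1
```

Each round picks the single word rule with the lowest weighted error, which is why examples that earlier rules got wrong have more influence on later choices.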
Features:
Daily ICU nursing social sections; physician hospital discharge summaries;
annotated features of physician notes (number of children, living situation,
marital status, and employment status); age; gender; medical metrics
Medical Metrics - SAPSI(1) (Simplified Acute Physiology Score) is, by
definition, calculated on the first day of ICU admission. In order to augment
the medical measures, SAPSI(2) was calculated for day two, and the
difference was calculated as the Delta (D) between the two SAPS scores.
These three measures were used to quantify the patient's overall medical
condition. If the data for SAPS calculation were not available, e.g., in the
case of CMO status, then the entry was null.
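As a small illustration of the Delta feature, the sketch below computes the difference while propagating null entries. The sign convention (day 2 minus day 1) is an assumption; it is chosen because the reported mean Delta is negative while the mean score falls from day 1 to day 2. The function name is hypothetical.

```python
from typing import Optional

def saps_delta(saps_day1: Optional[int], saps_day2: Optional[int]) -> Optional[int]:
    """Delta between the two SAPS scores; None if either score is missing.

    Assumed convention: day 2 minus day 1, so a negative value means
    the score decreased between the first and second ICU day.
    """
    if saps_day1 is None or saps_day2 is None:
        return None
    return saps_day2 - saps_day1
```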
Data Set:
Database - The MIMIC II Database was used. It is the product of an ongoing NIH-sponsored Bioengineering
Research Partnership (NIBIB BRP 5R01EB001659) including investigators at
MIT, Philips Medical Systems, and Boston's Beth Israel Deaconess Medical
Center. The database is a repository of information from
multiple critical care units. It includes ICU information (observations,
measurements, interventions, and ICU daily notes from all services except
physicians), and hospital medical information (laboratory results,
medications prescribed, and hospital discharge summaries). The data are
de-identified and reformatted. The database contains information from over
30,000 patient admissions (from over 26,000 unique patients).
Dataset - Data extraction included adults (age greater than 15 years) from
all critical care areas, and was stratified according to code status. For
patients who transitioned from Full Code (FC) status to do-not-resuscitate or
do-not-intubate (DNR), or to comfort-measures-only (CMO), the last
recorded code status was used. For the purpose of analysis, do-not-intubate
(DNI) status was included with DNR status. It is assumed that no significant
transition occurred in the reverse direction. The total number of ICU admissions
included 17,548 (FC); 2,060 (DNR); and 784 (CMO). Gender, age, and medical
condition were measured.
Demographics - Gender was recorded. Age was collected; values greater
than ninety are, by de-identification convention, recoded as greater than
200, and these values were all analyzed as equal to exactly 90.
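The age handling described above amounts to capping at 90. A minimal sketch, with a hypothetical function name:

```python
def normalize_age(age: float, cap: float = 90.0) -> float:
    """Treat every age above the cap as exactly the cap.

    This covers the de-identified placeholder values (recoded as greater
    than 200), since they are also above 90.
    """
    return cap if age > cap else age
```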
ICU Text - Free text input from the ICU included daily notes from all
services except physicians. Text use was limited to the social sections of
the nursing progress notes. By convention, this section catalogues family
visits; meetings with physicians; and overall understanding of and feelings
about the patient's condition. Text entries from all social sections of a single
admission were tied to the respective code status and demographic
information. Some typical excerpts include: "very supportive family has
been in to visit today, wife and children," "family all in agreement that they
want him to be extubated and not to be re-intubated, palliation will be main
goal if he fails," "family meeting planned," "no family contact this shift."
Hospital Text - Free text input from the hospital stay (including the ICU
stay) included physician discharge summaries. Text use was limited to the
social sections. By convention, this section catalogues tobacco, ethanol, and
illicit drug history. In addition, it includes information regarding the patient's support
structure, living circumstances, working situation, family involvement, and
any other information relevant to the psychosocial functioning affecting
illness and recovery. Text entries from all social sections of a single
admission were tied to the respective code status and demographic
information. Some typical excerpts include: "Denies alcohol or tobacco use.
She lives alone. Her son is supportive and lives nearby. She is widowed. He
reports having someone who comes by to help with cleaning and being very
involved in her care. He contacts her several times a day and takes her
shopping. He does her books for her. Although the son does not feel she has
significant cognitive difficulties at baseline it is unclear if he has a realistic
assessment of her abilities", "Denies EtOH(ethyl alcohol), Tobacco or IDU
(intra-venous drug use)", "Denies tobacco or ETOH use. Lives wth husband",
"Divorced. Lives with significant other. Drinks 3-4 glasses of wine per
week. Works as a physician", "Drank 1.5L of wine per day for 10-15 years;
has been abstinent for about one month now; denies tobacoo or drug use;
no h/o transfusions; no tattoos; no h/o incarceration or homelessness; no
IVDU (intra-venous drug use)".
Studies
Multiple study analyses were conducted to describe the socio-demographic
features of resuscitation patterns: prediction of code class using BoosTexter
and nursing social text with data including age, gender, SAPS scores, and
the delta; n-gram frequency analysis; and logistic regression on all non-text
features. Further analysis included features obtained through annotation of
the hospital discharge summaries for: children, living situation, marital
status, and employment situation. These features were added to the first
set of features in the ablation algorithm, to evaluate the overall code
classification rate. As well, the features were analyzed individually for their
impact on code status.
1. Nursing Notes as resuscitation code classification
Since the nursing social notes contain information largely about visitors,
physician/family meetings, and family understanding and sentiment, the
notes were evaluated to determine the relative role they
play in resuscitation code classification. This role was
especially of interest in comparison to the medical condition and prognosis.
Nursing Text Preprocessing - The text entries of the daily notes were
preprocessed in the following way: First, the social component was isolated.
Text was converted to lower case, and punctuation and de-identification
placeholders were removed. Stop words ("and", "the", etc.), rare words
(those appearing fewer than 5 times in the entire corpus), and words
directly indicating code status ("DNR", "full code", "comfort measures", etc.)
were then removed. The Porter stemming algorithm was used to stem the
remaining words (converting "sons" to "son", etc.). Finally, commonly used
abbreviations were added to the stem (e.g., dtr (daughter) was added to
daughter, and dr was added to physician).
Data were randomized into a training set (80%) and a test set (20%).
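The preprocessing steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in the study: the stop word and code-term lists are small illustrative subsets, `crude_stem` is a stand-in that only strips a plural "s" (the study used the Porter stemmer), removal of de-identification placeholders is omitted because their format is not given here, and all function names are hypothetical.

```python
import random
import re
from collections import Counter

STOP_WORDS = {"and", "the", "a", "to", "of", "in", "is"}      # illustrative subset
CODE_TERM_RE = re.compile(                                    # illustrative subset
    r"\b(?:full code|comfort measures|dnr|dni|cmo)\b")
ABBREVIATIONS = {"dtr": "daughter", "dr": "physician"}        # examples from the text

def crude_stem(word):
    """Stand-in for the Porter stemmer: strips a plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def clean(note):
    """Lower-case, drop code-status terms, punctuation, and stop words."""
    text = note.lower()
    text = CODE_TERM_RE.sub(" ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [t for t in text.split() if t not in STOP_WORDS]

def preprocess(notes, min_count=5):
    """Remove rare words, then stem and map abbreviations."""
    cleaned = [clean(n) for n in notes]
    counts = Counter(t for doc in cleaned for t in doc)
    return [[ABBREVIATIONS.get(t, crude_stem(t))
             for t in doc if counts[t] >= min_count]
            for doc in cleaned]

def split(items, train_fraction=0.8, seed=0):
    """Randomize, then cut into training and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]
```

A fixed seed is used in `split` only so that the sketch is reproducible.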
Algorithm - To determine the relative contribution of the social text, a feature
ablation study was performed on the data. (14) For each combination of medical
features (SAPS scores and delta), BoosTexter was run with and without the social
text as a feature.
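The set of runs implied by this design can be enumerated as in the sketch below. The feature labels and the function name are assumptions for illustration; the classifier call itself is not shown.

```python
from itertools import combinations

MEDICAL_FEATURES = ["SAPSI(1)", "SAPSI(2)", "Delta"]

def ablation_runs(medical_features=MEDICAL_FEATURES, text_feature="social_text"):
    """Pair every non-empty subset of medical features with the same
    subset plus the social text, giving (without_text, with_text) runs."""
    runs = []
    for size in range(1, len(medical_features) + 1):
        for subset in combinations(medical_features, size):
            runs.append((list(subset), list(subset) + [text_feature]))
    return runs
```

With three medical features this yields seven paired runs, one pair per combination.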
Statistical Analysis - Statistical significance was calculated using
McNemar's test on differences in classification error for each combination of
features with and without social text. E.g., significance was tested between
SAPSI(1) and delta without text and SAPSI(1) and delta with text; and
likewise, for all other combinations of medical metrics.
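For reference, McNemar's test compares two classifiers on the same test cases using only the discordant cases. The sketch below uses the chi-square approximation with one degree of freedom; whether a continuity correction was applied in the study is not stated, so it is left as an option. The function name and the counts in the usage note are hypothetical.

```python
import math

def mcnemar(only_a_correct, only_b_correct, continuity=True):
    """McNemar's test on discordant pairs.

    only_a_correct : cases classifier A got right and B got wrong
    only_b_correct : cases classifier B got right and A got wrong
    Returns (statistic, p_value) using the chi-square approximation, 1 df.
    """
    b, c = only_a_correct, only_b_correct
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    diff = max(diff, 0)
    stat = diff * diff / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `mcnemar(5, 25)` gives a p-value below 0.001, while equal discordant counts give a statistic of zero.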
2. N-Gram Analysis of Nursing Social Text - This study was performed
to evaluate the most common words and phrases within the nursing text
corpus. The processed text was used; however, it was divided into 3
groups, one for each code status. In this format, the social text was a
combination of the individual social text strings, akin to a "bag of strings,"
for each code status. N-gram frequency was calculated for
increasing sizes of n. The n-gram count was expressed as a ratio of the
count of a particular n-gram to the count of all n-grams of that size (%
of corpus). Recognized patterns were further evaluated. Negation
algorithms were not used to help discern the meaning of visit, visitors, no
visitors, etc. When patterns were noted in the most frequent n-grams, the
% of corpus counts were plotted for each code status.
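The percent-of-corpus measure can be sketched as below. This assumes n-grams are counted within each individual note string rather than across note boundaries, which is an assumption; the function name is hypothetical.

```python
from collections import Counter

def ngram_percent(notes, n):
    """Percent of corpus for each n-gram of size n.

    notes : list of token lists (one per social text string) for one code status
    Returns {ngram_tuple: percent of all n-grams of that size}.
    """
    counts = Counter()
    for tokens in notes:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * c / total for gram, c in counts.items()}
```

Running this once per code status group (FC, DNR, CMO) gives values that can be compared across groups, since each is normalized by its own n-gram total.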
3. Model Formation - The third study is a logistic regression calculation of
the non-textual attributes using the R statistical framework. The reason for
this analysis was to evaluate the features of gender and age in resuscitation
code assignment. Each attribute was tested using univariate analysis. A
multivariate model was developed by using the features of univariate
statistical significance in combination. All possible interaction terms were
evaluated. The model with the most possible interaction terms was
compared with the main effects model for deviance residuals. The model
performance was evaluated using the regression intercept and coefficients
obtained on the training set (80%) and applied to the test set (20%). A
confusion matrix was generated. Age was considered as a continuous
variable, as well as a binary variable at the elderly ranges. The odds ratios,
confidence intervals and p-values were calculated.
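Two of the quantities named above follow directly from a fitted logistic regression. The sketch below shows how an odds ratio with a Wald confidence interval is obtained from a coefficient and its standard error, and how the AUC (C index) can be computed from predicted scores. The study itself used R; this Python sketch is only illustrative, and the function names and any standard error values passed in are hypothetical.

```python
import math

def odds_ratio_ci(coef, std_err, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient.

    Returns (odds_ratio, lower, upper); z = 1.96 gives a 95% interval.
    """
    return (math.exp(coef),
            math.exp(coef - z * std_err),
            math.exp(coef + z * std_err))

def c_index(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher predicted score; ties count one half."""
    concordant = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(scores_positive) * len(scores_negative))
```

For instance, a coefficient of ln(1.47) corresponds to an odds ratio of 1.47.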
4. Physician Notes as resuscitation code classification
Hospital text preprocessing - The text entries of the discharge summaries
were processed as follows: the text was converted to lower case, the
punctuation was removed, and the de-identifying placeholders were left intact for
easier reading during annotation; finally, the social component was isolated.
Annotation was performed on 500 entries. The dataset was divided into a
training (80%) and test (20%) set. The BoosTexter algorithm used the
annotation data to train classifiers on the training set (80%), and the error rate was
noted on the test set (20%). The full corpus was then labeled automatically
using the learned classifiers. Classification was done first for the annotation
features, followed by classification for code status.
The fourth study is an analysis of classification accuracy of resuscitation
code using BoosTexter and feature ablation, as in the first study. The full
set of features was used, including the nursing text, the medico-demographic
features, the physician text, and the annotated features. The individual
feature performance was analyzed, as well as features of the top performing
groups of features.
5. Physician Notes annotation analysis
The fifth study evaluates the individual annotation features in a univariate
analysis, allowing comparative contributions from each feature. The
annotation error rates are noted. Annotation evaluation is by chi-square
comparison of the code distribution for each annotated feature with the
distribution of code status in the entire corpus. A gender sub-analysis was
performed, followed by an age/gender sub-analysis.
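One plausible reading of this comparison is a chi-square goodness-of-fit test of a subgroup's code counts against the corpus proportions. The sketch below follows that reading, which is an assumption; the exact test configuration used in the study is not specified here. The corpus counts are those reported for the dataset; the function name and the subgroup counts in the usage note are hypothetical.

```python
import math

CORPUS_COUNTS = {"FC": 17548, "DNR": 2060, "CMO": 784}

def chi_square_vs_corpus(observed, corpus=CORPUS_COUNTS):
    """Goodness-of-fit of a subgroup's code counts to corpus proportions.

    observed : {"FC": n, "DNR": n, "CMO": n} for one annotated feature value
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    assert len(corpus) == 3, "closed-form p-value below assumes 3 categories"
    corpus_total = sum(corpus.values())
    n = sum(observed.values())
    stat = 0.0
    for code, corpus_count in corpus.items():
        expected = n * corpus_count / corpus_total
        stat += (observed.get(code, 0) - expected) ** 2 / expected
    # For 2 degrees of freedom, P(chi2 > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

A hypothetical subgroup such as `{"FC": 70, "DNR": 25, "CMO": 5}` has a much higher DNR share than the corpus and yields a p-value below 0.001.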
Results
Data Distribution - The data were distributed as follows: code status (FC =
17,548; DNR = 2,060; CMO = 784); gender (males = 11,508; females =
8,884); age (range = 15 - >90; mean = 63.46; median = 65.43);
SAPSI (day 1) (range = 0 - 37; mean 13.45; median 13); SAPSI (day 2)
(range = 0 - 41; mean 11.86; median 12); Delta (range = -19 - 25; mean
-2.33; median -2).
1. Nursing Notes as resuscitation code classification
BoosTexter Classification - Figure 1 represents code status classification
error as computed by BoosTexter for all ablations of medical metrics with
and without social text as a feature. Statistical significance at all medical
metrics was demonstrated with p < 0.001.
In each case, the notes had a profound effect on classification accuracy.
This may imply that the text contains more information relevant to
determining code status than medical condition does. Since the text
primarily consists of a record of social visits and meetings of the physician
with the family, there may be a correlation between the number or type of
visitors and the code status. This could possibly represent an effect of
family or physician sentiment.
[Figure 1: bar chart of BoosTexter error rate (y-axis, 0.13 to 0.142) for the
metric combinations SAPSI(1,2), D; SAPSI(1,2); SAPSI(1), D; SAPSI(2), D;
SAPSI(1); SAPSI(2); Delta, each shown without text and with text.]
Figure 1: BoosTexter error rate with varying combinations of medical metrics;
impact of text demonstrated. Difference at each combination is significant with p <
0.001.
Figure 2 demonstrates the classification errors for each feature individually.
The lowest univariate error rate is found using the nursing notes' social
text. Surprisingly, social text, gender, and age are all more informative for
classification than the medical metrics provided by the SAPS scores.
[Figure 2: bar chart of BoosTexter error rate (y-axis, 0.13 to 0.17) for the
single features notes, age, gender, SAPS day 1, SAPS day 2, and delta.]
Figure 2: BoosTexter error rate with single metrics; non-text and nursing notes
data set.
The lowest classification error rate is shown in Figure 3 - trigram, 500 iterations
(Appendix). Note that in this overall feature ablation study, the top feature
combinations all include the feature "n" (nursing notes). This further supports the
role of the nursing social notes as the most important feature. The relatively low
training error may reflect some overtraining.
2. N-Gram Analysis - Figure 4 demonstrates the frequency of gender-specific
words for spouse, parent, and child in the social text as a ratio of
word count / total words in the corpus. Comparisons were made within code
groups and between groups. There was a marked gender difference in each
case across all code statuses. This suggests that there is more daily support
from female relatives while the patient is in the ICU.
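The ratio of word count to total words in the corpus can be sketched for unigrams and bigrams as below. This is an illustrative sketch only; the function name is hypothetical and the input is assumed to be a list of already preprocessed (and, for Table 1, stemmed) tokens for one code group.

```python
from collections import Counter


def ngram_fractions(tokens, n):
    """Return {ngram: count / total tokens} for a list of tokens."""
    total = len(tokens)
    # Slide n staggered copies of the token list against each other.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return {gram: count / total for gram, count in counts.items()}
```

For example, `ngram_fractions(tokens, 1)["wife"]` gives the corpus fraction of a single word, and `ngram_fractions(tokens, 2)["no contact"]` gives the fraction for a bigram.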
Table 1 - Top Unigrams without stop words

             Full Code             DNR                   CMO
Unigram      Count    % Corpus     Count    % Corpus     Count    % Corpus
famili       23477    2.471536     6086     2.824392     3815     3.084773
visit        23357    2.458903     3655     1.696213     2070     1.673782
wife         13724    1.444791     1919     0.89057      1208     0.976777
daughter     11689    1.230557     3059     1.419621     1419     1.14739
son          8176     0.860727     2230     1.034899     1062     0.858723
husband      5876     0.618595     705      0.327177     645      0.521541
sister       4794     0.504687     937      0.434843     511      0.41319
mother       4597     0.483948     555      0.257565     293      0.236917
question     3954     0.416257     612      0.284017     402      0.325053
doctor       3350     0.352671     1144     0.530908     755      0.610486
friend       2666     0.280663     356      0.165213     228      0.184359
brother      2646     0.278557     423      0.196306     333      0.269261
children     2137     0.224972     336      0.155931     233      0.188402
visitor      2100     0.221077     299      0.13876      138      0.111585
father       1666     0.175388     231      0.107203     91       0.073582
parent       1345     0.141595     52       0.024132     66       0.053367
niece        396      0.041689     149      0.069148     X        X
nephew       355      0.037373     141      0.065435     X        X
[Figure 4: bar chart of fraction of corpus (y-axis, 0 to 0.03) for the words
wife, husband, daughter, son, mother, and father, grouped by code status
(CMO, FC, DNR/DNI).]
Figure 4: Most frequent visitors by percentage of total corpus, stratified by gender
counterparts for each code status of patient.
Appendix (Figures 5, 6, and 7: Daughter more frequent than Son across all code
statuses; Wife more frequent than Husband across all code statuses; Mother more
frequent than Father across all code statuses)
[Figure 8: bar chart (y-axis, 0 to 0.4) of the bigrams "no contact" and
"no visitor" by code status.]
Figure 8: Bigram study, less contact, less visitors as resuscitation
status increases
Figure 8 demonstrates the % corpus of the two bigrams, "no contact" and "no
visitor." No contact and no visitors were observed in the dataset most frequently
for the FC group, next for the DNR group, and least for the CMO group. Fewer
visitors at FC status may be a reflection of better general health in the FC group,
compared with the DNR and CMO groups. Family and friends may be more inclined
to visit when they know a prognosis may be grave. Alternatively, or additionally,
there may be something intrinsically different about groups in which their loved
one is classified at a lesser resuscitation status.
3. Model Formation - Table 2 shows the univariate analysis of the features
considered for model formation. SAPS1(day1) was the only medical metric used,
since it is a standard metric utilized in many ICUs, and since it was the most
predictive medical metric when using BoosTexter classification. All features
were highly significant upon univariate analysis. Age was converted to
binary groups with cutoffs beginning at age 65 and continuing to age 80 (the
maximum age in the corpus is 90), since the elderly are considered separately
in the literature.
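The conversion of age to binary groups can be sketched as a set of threshold indicators. The cutoffs 65, 70, 75 and 80 are taken from the rows of Table 2; the function and constant names are hypothetical.

```python
# Cutoffs as they appear in the AGE>=65 ... AGE>=80 rows of Table 2.
AGE_CUTOFFS = (65, 70, 75, 80)


def age_indicators(age):
    """Return one 0/1 indicator per cutoff, keyed like the Table 2 rows."""
    return {"AGE>=%d" % cutoff: int(age >= cutoff) for cutoff in AGE_CUTOFFS}
```

A 72-year-old patient is therefore coded 1 for AGE>=65 and AGE>=70, and 0 for AGE>=75 and AGE>=80.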
Table 2
Independent Variable    AIC      CHI SQUARE    p value
GENDER MALE             10492    135.6576      p<.001
AGE                     11765                  p<.001
SAPS1                   12018                  p<.001
AGE>=65                          744.5943      p<.001
AGE>=70                          933.0086      p<.001
AGE>=75                          1177.781      p<.001
AGE>=80                          1368.453      p<.001
Table 3 illustrates the process of model formation for prediction of code
status (FC vs DNR), using the main effects of Age, Gender, and SAPS1.
All interactions, three-way and two-way, are included in the analysis. Two
two-way interactions were noted to be significant with a non-zero
coefficient: AGE:GENDER and GENDER:SAPS1. The three-way interaction
was not significant. The model including the main effects and the two
interactions was compared with the model using the main effects only.
There was a clear effect upon the coefficients of the main effects. Therefore,
an analysis of deviance residuals was performed (data in Tables 4, 5,
Appendix).
lreg(CODE~AGE*GENDER*SAPS1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          9.6      14722.26    3523.08    61521.43    2.22E-16
AGE                  -0.1     0.91        0.89       0.92        2.22E-16
GENDERM              -0.4     0.67        0.09       4.82        0.69004
SAPS1                -0.23    0.8         0.73       0.87        7.24E-07
AGE:GENDERM          0.02     1.02        0.99       1.04        0.21551
AGE:SAP1             0        1           1          1           9.62E-05
GENDERM:SAP1         -0.02    0.98        0.87       1.1         0.7189
AGE:GENDERM:SAP1     0        1           1          1           0.90175

lreg(CODE~AGE+GENDER+SAPS1+AGE:GENDER+AGE:SAP1+GENDER:SAP1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          9.54     13867.43    4768.09    40331.79    2.22E-16
AGE                  -0.1     0.91        0.9        0.92        2.22E-16
GENDERM              -0.29    0.75        0.36       1.57        0.44800021
SAPS1                -0.22    0.8         0.75       0.85        6.60E-12
AGE:GENDERM          0.02     1.02        1.01       1.02        0.00075665
AGE:SAP1             0        1           1          1           4.84E-08
GENDERM:SAP1         -0.03    0.97        0.95       0.99        0.00749844

lreg(CODE~AGE+GENDER+SAPS1+AGE:GENDER+GENDER:SAP1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          7.05     1158.5      696.36     1927.35     2.22E-16
AGE                  -0.06    0.94        0.93       0.94        2.22E-16
GENDERM              -0.24    0.78        0.39       1.58        0.4968832
SAPS1                -0.05    0.95        0.94       0.97        8.94E-11
AGE:GENDERM          0.02     1.02        1.01       1.03        0.0002529
GENDERM:SAP1         -0.04    0.96        0.94       0.98        0.00044485

lreg(CODE~AGE+GENDER+SAPS1+GENDER:SAP1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          6.5      667.22      451.76     985.44      2.22E-16
AGE                  -0.06    0.95        0.94       0.95        2.22E-16
GENDERM              0.89     2.45        1.72       3.47        5.82E-07
SAPS1                -0.05    0.95        0.93       0.96        3.52E-12
GENDERM:SAP1         -0.03    0.97        0.95       0.99        0.0026528

lreg(CODE~AGE+GENDER+SAPS1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          6.75     851.8       594.25     1220.97     2.22E-16
AGE                  -0.06    0.95        0.94       0.95        2.22E-16
GENDERM              0.38     1.47        1.32       1.64        7.86E-12
SAPS1                -0.07    0.93        0.92       0.94        2.22E-16

lreg(CODE~GENDER+SAPS1,data)
                     Coef     OR          low.95     high.95     p-val
(Intercept)          3.28     26.69       22.52      31.62       2.22E-16
GENDERM              0.53     1.69        1.52       1.88        2.22E-16
SAPS1                -0.1     0.9         0.9        0.91        2.22E-16
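The OR, low.95 and high.95 columns follow from the coefficient and its standard error by exponentiation. The sketch below assumes a Wald interval with z = 1.96 and uses the GENDERM estimate and standard error printed in the main-effects model summary in the Appendix; the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(coef, std_error, z=1.96):
    """Convert a logistic-regression coefficient to an odds ratio and Wald interval."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_error),
        math.exp(coef + z * std_error),
    )


# GENDERM row of the main-effects model: estimate 0.384418, std. error 0.056192.
odds_ratio, low, high = odds_ratio_with_ci(0.384418, 0.056192)
```

Rounded to two decimals this gives 1.47 (1.32 to 1.64), matching the GENDERM row of the main-effects model.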
(Appendix: Figures 9, 10, and 11: The ROC Curve for Training Set, AUC =
0.758; The ROC Curve for Test Set, AUC = 0.784; Error Rate based on
Confusion Matrix of Model)
The AUC on the test set (0.784) was marginally higher than on the training
set (0.758). This is not terribly meaningful, since the margin of difference
is very small. However, in general the AUC is usually highest on the training
set. A value greater than 0.7 is generally considered good for health
outcomes.
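As a reminder of what the AUC measures, it can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is a generic pairwise implementation, not the routine used in the thesis; the function name and the 0/1 label coding are assumptions.

```python
def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation of the two classes.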
The Akaike Information Criterion (AIC) is a goodness-of-fit measure which
penalizes superfluous parameters in the model. The higher the AIC, the
poorer the fit associated with the model. Although the model has an AIC of
8811.2 (residual deviance 8803.2; Table 3, Appendix), which is considered a
high number, all three of the model components in univariate analysis have
AIC values which are even higher (Gender = 10492, Age = 11765, SAPS1 =
12018). This would indicate that gender, continuous Age, and SAPS1 do not
give a good fit individually. However, in chi-square analysis, binary age
cutoffs at the median of the data set (65) and higher are associated with
increasingly higher chi-square scores.
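The relation between residual deviance and AIC can be checked against the model summaries in the Appendix. For a binomial model fitted to individual 0/1 outcomes, the AIC equals the residual deviance plus twice the number of estimated coefficients; the sketch below assumes that setting, and the function name is hypothetical.

```python
def aic_from_deviance(residual_deviance, n_coefficients):
    """AIC for a binary-outcome logistic model: deviance + 2 * coefficients."""
    return residual_deviance + 2 * n_coefficients


# Main-effects model: intercept + AGE + SAP1 + GENDERM = 4 coefficients.
aic_main = aic_from_deviance(8803.2, 4)
# Model with two interactions: 6 coefficients.
aic_interactions = aic_from_deviance(8780.8, 6)
```

These reproduce the AIC values of 8811.2 and 8792.8 printed in the Appendix summaries for the two models.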
Calibration by the Hosmer-Lemeshow test is poor, with p = 0. This may be
due to the profound differences seen in DNR distribution as age changes.
Despite poor calibration, the discrimination is very good, and the error rate
of classification is 0.103. This is substantially lower than that of any of the
BoosTexter analyses.
Table 7: Odds ratios of female and older patients based on the logistic regression
model, computed separately for FC vs. DNR (CMO ignored) and FC vs. DNR/CMO

                                 Odds Ratio    Conf. Intervals    p-value
Gender = Female
  FC vs. DNR                     1.47          1.31-1.60          < 0.001
  FC vs. DNR/CMO                 1.35          1.24-1.47          < 0.001
Age >= 70
  FC vs. DNR                     3.72          3.35-4.13          < 0.001
  FC vs. DNR/CMO                 2.91          2.66-3.18          < 0.001
Age >= 70 and Gender = Female
  FC vs. DNR                     4.19          3.62-4.87          < 0.001
  FC vs. DNR/CMO                 3.36          2.95-3.82          < 0.001
Age >= 70 and Gender = Male
  FC vs. DNR                     3.08          2.65-3.58          < 0.001
  FC vs. DNR/CMO                 2.40          2.12-2.72          < 0.001
Upon model formation and testing, the logistic regression model was used to
investigate the role of gender and age in code assignment (Table 6.) Odds ratios
are more marked when comparing FC to DNR than when comparing FC to DNR/CMO.
There is evidence of statistically and clinically relevant gender bias and even
greater ageism.
4. Hospital Discharge Summary BoosTexter Classification
[Figure 13: bar chart of BoosTexter error rate (y-axis, 0.13 to 0.17) for single
features.]
Figure 13: BoosTexter error rate with single metrics; non-text and nursing notes
(black), physician notes and annotated physician note data set (grey).
Compared with the first feature ablation study, the physician social text is less
predictive than all but SAPS1 day 2 and delta. The annotated features are more
predictive than even SAPS1, which is one of the main effects used in the model to
control for medical condition. So, although the physician discharge summaries are
not particularly helpful in univariate analysis, the annotated features are. No one
annotated feature group is more predictive than another.
However, in combination, the added features do contribute significantly to code
classification rates, resulting in the lowest overall error produced by feature
ablation (0.1299338).
Figure 14 - Classification error using feature ablation, 500 iterations. Black
= inclusive of physician notes and annotated features. Light Grey = not
inclusive of physician notes or annotated features. (g=gender, a=age,
1=SAPSI(day1), 2=SAPSI(day2), d=delta, n=nursing notes, p=physician
notes, c=children, l=living situation, m=marital status, w=working status)
The training error is similarly low, as in the first ablation study; however,
with 250 iterations, the training error increases without adding much to the
classification error. Nursing notes remain the most consistent feature,
being present in the first several tens of the top-ranked combinations.
5. Hospital Discharge Summary Annotation

Children             0 = 664; 1 = 1121; M = 2104; U = 16503
Living Situation     A = 723; F = 5039; I = 1096; U = 13534
Marital Status       D = 583; M = 4837; S = 391; U = 14039
Working Situation    R = 1502; W = 1244; WC = 487; BC = 425; D = 361;
                     Other = 563; U = 15810
Table 8 - distribution of annotation
FEATURE              ERROR
Children #           0.1235955
Marital Status       0.0449438
Living Situation     0.1348315
Working Situation    0.1235955
Table 9 - annotation feature classification error
The text was successfully annotated, with the least classification error for
marital status, 0.0449438. The feature annotation rate ranged from 19 to
33.6% of the corpus. Annotation beyond 500 cases would not likely improve
this rate, as many of the notes were missing, and at least half dealt only with
alcohol, tobacco, and illicit drug use.
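The annotation rate range can be reproduced from the unknown ("U") counts in Table 8 and the corpus size of 20,392 patients (FC + DNR + CMO). In the sketch below, the assignment of each U count to a feature is a reading of the flattened table and should be treated as an assumption; the names are hypothetical.

```python
TOTAL_PATIENTS = 20392  # 17,548 FC + 2,060 DNR + 784 CMO

# "U" (unknown) counts per annotated feature, as read from Table 8.
UNKNOWN = {"children": 16503, "living": 13534, "marital": 14039, "working": 15810}


def annotation_rates(unknown, total):
    """Fraction of patients for whom each feature received a known value."""
    return {feature: 1.0 - count / total for feature, count in unknown.items()}


rates = annotation_rates(UNKNOWN, TOTAL_PATIENTS)
```

Under this reading the lowest rate is for children (about 19%) and the highest for living situation (about 33.6%).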
Despite this, much information was learned from the annotation.
Number of Children - (Tables 10, 11, 12)
Having zero or one child was found to be statistically significantly (p<.001)
associated with lower rates of FC status, and higher rates of DNR status.
Children = Many did not differ from the corpus distribution of code status.
Children by Gender -
When evaluating the effects of children on the parent's code status by gender,
the findings are the same, except that the zero-child effect holds only for women.
Children by Gender and Age >= 70 or Age < 70 -
The zero-child impact on code status for women holds true when looking at
the age effect.
Living Situations - (Tables 13, 14, 15, Appendix)
Living Alone (p=.004) or in "Institution" (p<.001) were both associated with
less FC, and more DNR statuses. (Institution living includes nursing home,
assisted living arrangements, rehabilitation facilities, and other similar
situations.) However, living with Family was statistically significantly
(p=.044) associated with increased FC and decreased DNR statuses.
Living Situation by Gender -
Gender differences were evident in Living Alone, with only the female group
significant.
Living Situation by Gender and Age -
An age effect was noted in Living Alone, with only the older group statistically
significant. Family and Institution findings were not gender or age dependent.
Marital Status - (Tables 16, 17, 18, Appendix)
Marital Status = Widow or Widower is statistically significantly (p<.001)
associated with less FC and more DNR. Although not statistically
significant (p=.174), Divorced status follows the same trend. However,
Married status is associated with more FC and less DNR. Single status is not
significantly different from the corpus.
Marital Status by Gender -
Only the Marital Status value Married differed by gender. The female group
was statistically significant, the male group was not.
Marital Status by Gender and Age -
Numbers too small to evaluate.
Working Situation - (Tables 19, 20, 21, Appendix)
Working Situation = Working has lower FC rates, and higher DNR rates
(p<.001). Although not statistically significant, Working = White Collar
(p<.088) and Working = Retired (p=.082) have similar trends.
Unemployed status is associated with more FC and less DNR (p=.026).
Working = Blue Collar, Disabled, or Volunteer are not significantly different
from the rates in the whole corpus.
Working by Gender -
A gender effect is noted for the value Retired, significant for males only
(p=.004). A gender effect is also noted for Unemployed, significant for
females only (p=.019).
Working by Gender and Age -
An age difference is noted for Retired, for young males only (p=.031).
Discussion
Understanding resuscitation code assignment in the ICU is complex, and this
study is a preliminary attempt to address that need. It is vital to understand
the factors which influence code assignment in order to ensure ethical
treatment of all patients, and provide treatment in harmony with patients'
wishes.
Code status seems to be more a reflection of family and physician sentiment
and assessment than of unprejudiced medical measure. This, unfortunately,
can leave room for human mistakes in interpreting medical condition,
prognosis, and individuals' wishes. Certainly, as mentioned in the
introduction, it is not easy to simply encourage advance directives and
carry them out. There is no way to anticipate every medical scenario.
The main concern is that in this process, individual caretakers and family
may introduce unwanted or un-indicated bias. Some perceived bias may
indeed derive from the patients' own wishes, which may themselves vary
socio-demographically. Discerning this component is impossible, as patients
are most often too sick to communicate for an interview, and retrospective
analysis is often flawed.
It is, however, clear from these analyses that medical condition and
prognosis alone are not likely the leading driving factor of code
assignment; in fact, these findings mirror those associated with ICU
resource allocation itself. That is to say, gender and age bias, major
influencing factors in code assignment found in this study, are also major
influencing factors in ICU bed assignment. Valentin et al. found
overwhelming evidence that women were less likely than men to be
admitted to the ICU when severity of illness was considered. (10) In
addition, the women were much less likely to undergo invasive procedures.
This finding is contrary to that of Perkins et al., who found that women are
more open to invasive treatments. (15) These conflicting findings may be
further evidence that there is gender bias when treating women in the ICU,
if in fact women are more open than men to undergoing invasive
treatments and, by extension, more open to more aggressive code status.
However, Covinsky et al. (16), as part of the SUPPORT project, found
women less likely to want CPR; and Raine et al. found gender bias toward
men and women, depending on the diagnosis. (17)
Race has in the past been shown to affect code status, although this is not
clear. Bardach et al. found that women and Hispanics were more likely to
have a DNR order, and that adjusting for DNR orders reversed the apparent
hospital mortality advantage for Hispanics. (18) Contrarily, Shepardson et al.
found differences in rates of the DNR order between African American and
Caucasian patients, with rates higher in Caucasian patients. (19)
Unfortunately, this study was not able to extract information about race.
This is the first study, known to the author, to evaluate prediction of code
status. This study included comparative classification using feature ablation
of nursing social notes (largely, a log of family visits, feelings, and
family/physician meetings), age, gender, and SAPSI score. The nursing
notes alone proved to be a better indicator of code status than the available
medical statistics. Age and gender were also highly predictive. When
combined with medical features, social text improves classification accuracy
remarkably.
Additionally, the social section of physician hospital discharge summaries
(largely a log of marital status, children's involvement in care, living and
employment situations, as well as history of alcohol, tobacco, and illicit drug
use) was used for comparative classification. Initially, it was noted that the
physician notes were not terribly useful as free text alone; therefore, an
annotation was performed. The annotation features (children, living
situation, marital status, and employment status) were found to be less
predictive individually than nursing text, age, or gender; however, they were
more predictive than the SAPSI score. In combination with other features,
the annotation features reduced the classification error even further.
However, the least prediction error was achieved with the logistic regression
model. The model was tested for interaction terms, and for performance.
The discrimination of the model was good, with an AUC (C index) of 0.784.
Calibration, however, was poor. Using the model, an assessment of gender
(OR=1.47) and age (OR=3.72) treatment was made, using the odds ratios
of likelihood to be of less aggressive code status. There was notably clear
gender bias favoring men, and ageism favoring the young.
The gender bias findings are consistent with those of Eachempati et al., who
found a gender bias in DNR assignment for elderly patients undergoing
emergency surgery. (1) The gender difference is especially concerning,
given the study by Zettel-Watson et al. (20) In that study, wives were
found to be more accurate than husbands regarding their spouse's
wishes. It seems equally plausible, therefore, that the gender difference may
be a reflection of the cultural devaluation of women compared with men.
The remarkable age differences are inconsistent with the lack of ICU ageism
reported by Hubbard et al. (12)
The n-gram study of the nursing social text revealed interesting specific
differences in gender involvement. It cannot readily be concluded that the
female gender (wife, daughter, sister) has a more decisive role compared
with the male gender (husband, son, brother); but it is reasonable to
conclude that there is more daily female support. There does not seem to
be bias in this regard to any specific code group, as the relationship is
consistent across all code states. The findings of the bigram study are not
surprising. There are more visitors, and more contact, as code status
progresses from FC to DNR to CMO. It is likely that family and loved ones
are more involved when death seems imminent or more likely.
The annotated features of the physician hospital discharge summaries
revealed vulnerable groups. For children: having zero children (females
only) or one child was associated with decreased rates of FC status, and
increased rates of DNR status; having many children did not differ from the
baseline rates in the full corpus. For living situation: being institutionalized
(nursing home, rehabilitation facility, assisted living, and others) was
associated with decreased rates of FC and increased rates of DNR statuses;
the same trend was observed for living alone (for older females only), so this
is likely a simple age effect; living with family was associated with increased
FC rates and decreased DNR rates (no age or gender effect). For marital
status: being a widow or widower was associated with decreased rates of FC,
and increased rates of DNR, with divorce status following this trend; being
married and female had the reverse observation; married males and single
people did not differ from the underlying corpus. For employment status:
the values working or white collar were both associated with decreased FC,
and increased DNR; retired followed this pattern for men only; unemployed
females were observed to follow the opposite direction, increased FC, and
decreased DNR; blue collar, disabled, and volunteer did not differ from the
corpus.
A significant limitation of the BoosTexter classification and feature ablation
studies was the reliance on the SAPSI score alone to define medical condition
and prognosis. SAPSI may not be the most robust medical metric for all
medical conditions, especially those related to multiple organ failure. SAPSII
(21) (which includes a parameter for ICU indication for admission), the
APACHE scores, and multiple organ failure scores are established; but the
MIMIC database does not support all the parameters. For this reason, the
parameters SAPSI (day 2) and Delta were created; however, they did not
prove to be as predictive as the standard SAPSI score. The lack of alternative
medical metrics was probably the most significant limitation of the
classification studies, including the logistic regression model. In fact, the
Eachempati study used both the MOD and APACHE III scores, which may have
been helpful in creating a more calibrated model than that of this thesis. The
MOD score was also more predictive, in that study, than age in determining
code status. (1)
Another limitation to the classification studies was the overwhelming FC
class in the corpus. The training set was 86.213449% FC, and the test set
was 85.43761% FC. This makes classification errors in the 13 - 14% range
difficult to interpret.
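The difficulty can be made concrete by computing the error of a trivial classifier that always predicts the majority class (FC). The sketch below uses the FC percentages stated above; the function name is hypothetical.

```python
def majority_baseline_error(majority_fraction):
    """Error rate of a classifier that always predicts the majority class."""
    return 1.0 - majority_fraction


train_baseline = majority_baseline_error(0.86213449)
test_baseline = majority_baseline_error(0.8543761)
```

The baseline error is about 0.138 on the training set and about 0.146 on the test set, so a classification error of 0.13 to 0.14 is only a modest improvement over always predicting FC.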
The annotated features were limited by numbers, in some cases. Increased
annotation would not likely correct this problem, since many notes were
limited to alcohol, tobacco, and illicit drug use.
Improvements and future study will include: annotation of the nursing notes,
modifiers to the logistic regression model to help increase the calibration;
inclusion of more medical metrics, as the MIMIC database allows; and the
addition of racial, and more socio-demographic information when available.
Summary
These findings highlight several points. First, there is a need for improved
communication between health care providers and family members. Their
involvement may clarify the patient's potential to respond to further therapy,
thereby helping accurate code status to be applied more quickly. Second,
there is decidedly more daily involvement from family members of female
gender for patients of any code status; however, the significance of this is
unclear. Third, there is a need for more support and advocacy for some of the
most vulnerable patients (those with one child or none; living alone
or institutionalized; widows, widowers, and divorced; retired; former or
present white collar workers; and working), and of course women and the
elderly. Finally, the gender and age differences in less aggressive code
statuses warrant in-depth further study.
References
1. Eachempati SR, et al. Sex differences in creation of do-not-resuscitate
orders for critically ill elderly patients following emergency surgery.
Journal of Trauma-Injury Infection & Critical Care. 2006; 1:193-7.
2. Matsui M, et al. Perspectives of elderly people on advance directives in
Japan. Journal of Nursing Scholarship. 2007; 2:172-6.
3. Rosnick CB, et al. Thinking ahead: factors associated with executing
advance directives. Journal of Aging & Health. 2003; 2:409-29.
4. Watson DR, et al. The effect of hospital admission on the opinions and
knowledge of elderly patients regarding cardiopulmonary resuscitation.
Age & Ageing. 2007; 6:429-34.
5. Joos SK, et al. Outpatients' attitudes and understanding regarding living
wills. Journal of General Internal Medicine. 1993; 5:259-63.
6. Lester PE, et al. Do Geriatricians Practice What they Preach?:
Geriatricians' personal establishment of advance directives. Gerontology
& Geriatrics Education. 2009; 1:61-74.
7. Iwashyna TJ, et al. Marriage, widowhood, and health-care use. Social
Science & Medicine. 2003; 57:2137-2147.
8. Cabrera-Alonso J, et al. Marital Status and Health Care Expenditures
Among the Elderly in a Managed Care Organization. Health Care Manager.
2003; 22:249-255.
9. Osborne C, et al. The influence of marital status on the stage at
diagnosis, treatment, and survival of older women with breast cancer.
Breast Cancer Research and Treatment. 2005; 93:41-47.
10. Valentin A, et al. Gender-related differences in intensive care: A
multiple-center cohort study of therapeutic interventions and outcome in
critically ill patients. Crit Care Med. 2003; 31:1901-1907.
11. de Rooij, et al. Identification of high-risk subgroups in very elderly
intensive care unit patients. Critical Care. 2007; 11:1-9.
12. Hubbard RE, et al. Absence of ageism in access to critical care: a
cross-sectional study. Age Ageing. 2003; 32:382-7.
13. Schapire RE, et al. BoosTexter: A Boosting-based System for Text
Categorization. Machine Learning. 2000; 39:135-168.
14. Walker MA, et al. Empirical Studies in Discourse. Association for
Computational Linguistics. 1997; 23:1-12.
15. Perkins HS, et al. Advance care planning: does patient gender make a
difference? American Journal of the Medical Sciences. 2004; 1:25-32.
16. Covinsky KE, et al. Communication and decision-making in seriously ill
patients: findings of the SUPPORT project. The Study to Understand
Prognoses and Preferences for Outcomes and Risks of Treatments. Journal
of the American Geriatrics Society. 2000; (5 Suppl):S187-93.
17. Raine R, et al. Influence of patient gender on admission to intensive
care. J Epidemiol Community Health; 56:418-423.
18. Bardach N, et al. Adjustment for do-not-resuscitate orders reverses the
apparent in-hospital mortality advantage for minorities. American Journal
of Medicine. 2005; 4:400-8.
19. Shepardson LB, et al. Racial Variation in the Use of Do-Not-Resuscitate
Orders. J Gen Intern Med; 14:15-20.
20. Zettel-Watson, et al. Actual and perceived gender differences in the
accuracy of surrogate decisions about life-sustaining medical treatment
among older spouses. Death Studies. 2008; 3:273-90.
21. LeGall JR, et al. A new simplified acute physiology score (SAPSII) based
on a European/North American multicenter study. JAMA. 1993;
270:2957-2963.
Biographical Note and Acknowledgement
Regina Barzilay, Ph.D.
Mitchell Medow, M.D., Ph.D.
Robert Friedman, M.D.
Alexa McCray, Ph.D.
Roger G. Mark, M.D., Ph.D.
William J. Long, Ph.D.
Christina J. Sauper, S.M.
Mauricio Villarreol, Ph.D.
Daniel Scott, Ph.D.
Michael Craig, Ph.D.
Biographical Note - I received my BS degree in biochemistry from the George H
Cook College four year honors program, and my MD from Cornell
University Medical College. I trained in general surgery under G. Tom Shires
II, MD. I completed my Obstetrics and Gynecologic Oncology training at
Harvard's Brigham and Women's and Massachusetts General Hospitals.
Finally, my clinical training was completed with a fellowship in Gynecologic
Oncology at The James Graham Brown Cancer Center.
Appendix
Figures (3, 5, 6, 7, 9, 10, 11)
[Figure 3: test error and training error (y-axis, 0 to 0.16) across feature
combinations.]
Figure 3
[Figure 5: fraction of corpus (y-axis, 0 to 0.03) for "daughter" and "son" by
code status (CMO, DNR/DNI, FC).]
Figure 5
[Figure 6: fraction of corpus (y-axis, 0 to 0.03) for "wife" and "husband" by
code status (CMO, DNR/DNI, FC).]
Figure 6
[Figure 7: fraction of corpus (y-axis, 0 to 0.012) for "mother" and "father" by
code status (CMO, DNR/DNI, FC).]
Figure 7
[Figure 9: ROC curve for the training set; x-axis, false positive rate (0.0 to
1.0).]
Figure 9
[Figure 10: ROC curve for the test set; x-axis, false positive rate.]
Figure 10
[Figure 11: error rate by cutoff; x-axis, cutoff (0 to 6).]
Figure 11
Tables (3, 4, 9 - 20)
> model1=glm(CODE~AGE+SAP1+GENDER,binomial)
> summary(model1)

Call:
glm(formula = CODE ~ AGE + SAP1 + GENDER, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2278   0.2027   0.3449   0.5302   1.3922

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  6.747351   0.183697  36.731  < 2e-16 ***
AGE         -0.055159   0.002265 -24.351  < 2e-16 ***
SAP1        -0.068940   0.005571 -12.374  < 2e-16 ***
GENDERM      0.384418   0.056192   6.841 7.86e-12 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 10063.0 on 14799 degrees of freedom
Residual deviance:  8803.2 on 14796 degrees of freedom
  (886 observations deleted due to missingness)
AIC: 8811.2

Number of Fisher Scoring iterations: 6
Table 3

Model 2 - Table 4
> model2<-glm(CODE~AGE+SAP1+GENDER+AGE:GENDER+SAP1:GENDER,binomial)
> summary(model2)

Call:
glm(formula = CODE ~ AGE + SAP1 + GENDER + AGE:GENDER + SAP1:GENDER,
    family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1550   0.2039   0.3453   0.5221   1.3136

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   7.054885   0.259703  27.165  < 2e-16 ***
AGE          -0.062976   0.003211 -19.612  < 2e-16 ***
SAP1         -0.050393   0.007772  -6.484 8.94e-11 ***
GENDERM      -0.244020   0.359169  -0.679 0.496883
AGE:GENDERM   0.016639   0.004547   3.659 0.000253 ***
SAP1:GENDERM -0.039157   0.011150  -3.512 0.000445 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 10063.0 on 14799 degrees of freedom
Residual deviance:  8780.8 on 14794 degrees of freedom
  (886 observations deleted due to missingness)
AIC: 8792.8

Number of Fisher Scoring iterations: 6
Table 4
Significance for feature children, value 0:
data: contingency table
1
2
3
A
B
31 784 815
94 2060 2154
539 17548 18087
664 20392 21056
expected: contingency table
1
2
3
A
25.7
67.9
570.
B
789.
2.086E+03
1.752E+04
chi-square = 13.2
degrees of freedom
probability = 0.001
=
2
Significance for feature children, value 1:
data: contingency table
1
2
3
A
B
57 784 841
176 2060 2236
888 17548 18436
1121 20392 21513
expected: contingency table
1
2
3
A
43.8
117.
961.
B
797.
2.119E+03
1.748E+04
chi-square = 42.0
degrees of freedom
probability = 0.000
=
2
Significance for feature children, value Many:
data: contingency table
        A       B
1      79     784     863
2     214    2060    2274
3    1811   17548   19359
     2104   20392   22496
expected: contingency table
           A          B
1       80.7       782.
2       213.  2.061E+03
3  1.811E+03  1.755E+04
chi-square = 0.493E-01
degrees of freedom = 2
probability = 0.976
Table 9
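Each "Significance for feature" block above pairs an observed 3 x 2 table with its expected counts and a Pearson chi-square statistic. The sketch below reproduces that calculation in plain Python for the "children, value 0" table as a way of checking the printed numbers. The function name is illustrative, rows 1-3 and columns A and B are used exactly as labeled in the output, and the probability uses the closed-form upper tail exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math


def pearson_chi_square(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, dof


# Observed counts for "feature children, value 0": rows 1-3, columns A and B.
observed = [[31, 784], [94, 2060], [539, 17548]]
expected, chi2, dof = pearson_chi_square(observed)

# Closed-form chi-square upper tail, valid only for 2 degrees of freedom.
p_value = math.exp(-chi2 / 2.0)

print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.1f}, df = {dof}, probability = {p_value:.3f}")
```

The same function can be applied to any of the other tables by replacing `observed` with the corresponding counts.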
Significance for feature children, value 0 and Male:
data: contingency table
        A       B
1       9     407     416
2      31     917     948
3     278   10184   10462
      318   11508   11826
expected: contingency table
           A          B
1       11.2       405.
2       25.5       923.
3       281.  1.018E+04
chi-square = 1.70
degrees of freedom = 2
probability = 0.427
Significance for feature children, value 0 and Female:
data: contingency table
        A       B
1      22     377     399
2      63    1143    1206
3     261    7364    7625
      346    8884    9230
expected: contingency table
           A          B
1       15.0       384.
2       45.2  1.161E+03
3       286.  7.339E+03
chi-square = 13.0
degrees of freedom = 2
probability = 0.002
Significance for feature children, value 1 and Male:
data: contingency table
        A       B
1      20     407     427
2      46     917     963
3     352   10184   10536
      418   11508   11926
expected: contingency table
           A          B
1       15.0       412.
2       33.8       929.
3       369.  1.017E+04
chi-square = 7.20
degrees of freedom = 2
probability = 0.027
Significance for feature children, value 1 and Female:
data: contingency table
        A       B
1      37     377     414
2     130    1143    1273
3     536    7364    7900
      703    8884    9587
expected: contingency table
           A          B
1       30.4       384.
2       93.3  1.180E+03
3       579.  7.321E+03
chi-square = 20.6
degrees of freedom = 2
probability = 0.000
Significance for feature children, value Many and Male:
data: contingency table
        A       B
1      34     407     441
2     105     917    1022
3     938   10184   11122
     1077   11508   12585
expected: contingency table
           A          B
1       37.7       403.
2       87.5       935.
3       952.  1.017E+04
chi-square = 4.47
degrees of freedom = 2
probability = 0.107
Significance for feature children, value Many and Female:
data: contingency table
        A       B
1      45     377     422
2     109    1143    1252
3     873    7364    8237
     1027    8884    9911
expected: contingency table
           A          B
1       43.7       378.
2       130.  1.122E+03
3       854.  7.383E+03
chi-square = 4.23
degrees of freedom = 2
probability = 0.120
Table 10
Table 11
Significance for feature living, value Alone:
data: contingency table
        A       B
1      32     784     816
2      99    2060    2159
3     592   17548   18140
      723   20392   21115
expected: contingency table
           A          B
1       27.9       788.
2       73.9  2.085E+03
3       621.  1.752E+04
chi-square = 10.8
degrees of freedom = 2
probability = 0.004
Significance for feature living, value Family:
data: contingency table
        A       B
1     184     784     968
2     453    2060    2513
3    4402   17548   21950
     5039   20392   25431
expected: contingency table
           A          B
1       192.       776.
2       498.  2.015E+03
3  4.349E+03  1.760E+04
chi-square = 6.25
degrees of freedom = 2
probability = 0.044
Significance for feature living, value Institution:
data: contingency table
        A       B
1      61     784     845
2     250    2060    2310
3     785   17548   18333
     1096   20392   21488
expected: contingency table
           A          B
1       43.1       802.
2       118.  2.192E+03
3       935.  1.740E+04
chi-square = 189.
degrees of freedom = 2
probability = 0.000
Table 12
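A large chi-square value such as the one reported for "living, value Institution" does not by itself show which rows drive the departure from independence. One common way to look at this is through signed Pearson residuals, (observed - expected) / sqrt(expected), whose squares sum to the chi-square statistic. The sketch below is an illustrative addition in Python, not part of the original analysis; the function name is hypothetical and the rows and columns are used as labeled in the output.

```python
import math


def pearson_residuals(table):
    """Signed per-cell residuals (observed - expected) / sqrt(expected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(obs - r * c / n) / math.sqrt(r * c / n) for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


# Observed counts for "feature living, value Institution": rows 1-3, columns A and B.
observed = [[61, 784], [250, 2060], [785, 17548]]
residuals = pearson_residuals(observed)

# The squared residuals add up to the Pearson chi-square statistic.
chi2 = sum(res ** 2 for row in residuals for res in row)

for label, row in zip((1, 2, 3), residuals):
    print(label, [round(res, 2) for res in row])
print(f"chi-square = {chi2:.0f}")
```

Cells with the largest absolute residuals contribute most to the statistic; a positive sign means more observations than expected under independence, a negative sign fewer.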
Significance for feature living, value Alone and Male:
data: contingency table
        A       B
1      11     407     418
2      33     917     950
3     292   10184   10476
      336   11508   11844
expected: contingency table
           A          B
1       11.9       406.
2       27.0       923.
3       297.  1.018E+04
chi-square = 1.55
degrees of freedom = 2
probability = 0.460
Significance for feature living, value Alone and Female:
data: contingency table
        A       B
1      21     377     398
2      66    1143    1209
3     300    7364    7664
      387    8884    9271
expected: contingency table
           A          B
1       16.6       381.
2       50.5  1.159E+03
3       320.  7.344E+03
chi-square = 7.49
degrees of freedom = 2
probability = 0.024
Significance for feature living, value Family and Male:
data: contingency table
        A       B
1     120     407     527
2     286     917    1203
3    3112   10184   13296
     3518   11508   15026
expected: contingency table
           A          B
1       123.       404.
2       282.       921.
3  3.113E+03  1.018E+04
chi-square = 0.209
degrees of freedom = 2
probability = 0.901
Significance for feature living, value Family and Female:
data: contingency table
        A       B
1      64     377     441
2     167    1143    1310
3    1290    7364    8654
     1521    8884   10405
expected: contingency table
           A          B
1       64.5       377.
2       191.  1.119E+03
3  1.265E+03  7.389E+03
chi-square = 4.25
degrees of freedom = 2
probability = 0.119
Significance for feature living, value Institution and Male:
data: contingency table
        A       B
1      26     407     433
2      82     917     999
3     384   10184   10568
      492   11508   12000
expected: contingency table
           A          B
1       17.8       415.
2       41.0       958.
3       433.  1.013E+04
chi-square = 52.7
degrees of freedom = 2
probability = 0.000
Significance for feature living, value Institution and Female:
data: contingency table
        A       B
1      35     377     412
2     168    1143    1311
3     401    7364    7765
      604    8884    9488
expected: contingency table
           A          B
1       26.2       386.
2       83.5  1.228E+03
3       494.  7.271E+03
chi-square = 113.
degrees of freedom = 2
probability = 0.000
Table 13
Table 14
Significance for feature married, value Divorced:
data: contingency table
        A       B
1      28     784     812
2      69    2060    2129
3     486   17548   18034
      583   20392   20975
expected: contingency table
           A          B
1       22.6       789.
2       59.2  2.070E+03
3       501.  1.753E+04
chi-square = 3.50
degrees of freedom = 2
probability = 0.174
Significance for feature married, value Married:
data: contingency table
        A       B
1     169     784     953
2     376    2060    2436
3    4292   17548   21840
     4837   20392   25229
expected: contingency table
           A          B
1       183.       770.
2       467.  1.969E+03
3  4.187E+03  1.765E+04
chi-square = 26.5
degrees of freedom = 2
probability = 0.000
Significance for feature married, value Single:
data: contingency table
        A       B
1      20     784     804
2      37    2060    2097
3     334   17548   17882
      391   20392   20783
expected: contingency table
           A          B
1       15.1       789.
2       39.5  2.058E+03
3       336.  1.755E+04
chi-square = 1.77
degrees of freedom = 2
probability = 0.412
Significance for feature married, value Widow/er:
data: contingency table
        A       B
1      24     784     808
2     106    2060    2166
3     412   17548   17960
      542   20392   20934
expected: contingency table
           A          B
1       20.9       787.
2       56.1  2.110E+03
3       465.  1.749E+04
chi-square = 52.3
degrees of freedom = 2
probability = 0.000
Table 15
Significance for feature married, value Divorced and Male:
data: contingency table
        A       B
1      10     407     417
2      18     917     935
3     178   10184   10362
      206   11508   11714
expected: contingency table
           A          B
1       7.33       410.
2       16.4       919.
3       182.  1.018E+04
chi-square = 1.24
degrees of freedom = 2
probability = 0.539
Significance for feature married, value Divorced and Female:
data: contingency table
        A       B
1      18     377     395
2      51    1143    1194
3     308    7364    7672
      377    8884    9261
expected: contingency table
           A          B
1       16.1       379.
2       48.6  1.145E+03
3       312.  7.360E+03
chi-square = 0.424
degrees of freedom = 2
probability = 0.809
Significance for feature married, value Married and Male:
data: contingency table
        A       B
1     115     407     522
2     251     917    1168
3    2959   10184   13143
     3325   11508   14833
expected: contingency table
           A          B
1       117.       405.
2       262.       906.
3  2.946E+03  1.020E+04
chi-square = 0.693
degrees of freedom = 2
probability = 0.707
Significance for feature married, value Married and Female:
data: contingency table
        A       B
1      54     377     431
2     125    1143    1268
3    1333    7364    8697
     1512    8884   10396
expected: contingency table
           A          B
1       62.7       368.
2       184.  1.084E+03
3  1.265E+03  7.432E+03
chi-square = 28.1
degrees of freedom = 2
probability = 0.000
Significance for feature married, value Single and Male:
data: contingency table
        A       B
1      13     407     420
2      29     917     946
3     229   10184   10413
      271   11508   11779
expected: contingency table
           A          B
1       9.66       410.
2       21.8       924.
3       240.  1.017E+04
chi-square = 4.12
degrees of freedom = 2
probability = 0.128
Significance for feature married, value Single and Female:
data: contingency table
        A       B
1       7     377     384
2       8    1143    1151
3     105    7364    7469
      120    8884    9004
expected: contingency table
           A          B
1       5.12       379.
2       15.3  1.136E+03
3       99.5  7.369E+03
chi-square = 4.56
degrees of freedom = 2
probability = 0.102
Significance for feature married, value Widower and Male:
data: contingency table
        A       B
1       7     407     414
2      30     917     947
3     144   10184   10328
      181   11508   11689
expected: contingency table
           A          B
1       6.41       408.
2       14.7       932.
3       160.  1.017E+04
chi-square = 18.0
degrees of freedom = 2
probability = 0.000
Significance for feature married, value Widow and Female:
data: contingency table
        A       B
1      17     377     394
2      76    1143    1219
3     268    7364    7632
      361    8884    9245
expected: contingency table
           A          B
1       15.4       379.
2       47.6  1.171E+03
3       298.  7.334E+03
chi-square = 21.0
degrees of freedom = 2
probability = 0.000
Table 16
Table 17
Significance for feature working, value Blue Collar:
data: contingency table
        A       B
1      13     784     797
2      36    2060    2096
3     376   17548   17924
      425   20392   20817
expected: contingency table
           A          B
1       16.3       781.
2       42.8  2.053E+03
3       366.  1.756E+04
chi-square = 2.05
degrees of freedom = 2
probability = 0.358
Significance for feature working, value Disabled:
data: contingency table
        A       B
1      16     784     800
2      33    2060    2093
3     312   17548   17860
      361   20392   20753
expected: contingency table
           A          B
1       13.9       786.
2       36.4  2.057E+03
3       311.  1.755E+04
chi-square = 0.648
degrees of freedom = 2
probability = 0.723
Significance for feature working, value Retired:
data: contingency table
        A       B
1      47     784     831
2     174    2060    2234
3    1281   17548   18829
     1502   20392   21894
expected: contingency table
           A          B
1       57.0       774.
2       153.  2.081E+03
3  1.292E+03  1.754E+04
chi-square = 5.00
degrees of freedom = 2
probability = 0.082
Significance for feature working, value Unemployed:
data: contingency table
        A       B
1      12     784     796
2      17    2060    2077
3     282   17548   17830
      311   20392   20703
expected: contingency table
           A          B
1       12.0       784.
2       31.2  2.046E+03
3       268.  1.756E+04
chi-square = 7.32
degrees of freedom = 2
probability = 0.026
Significance for feature working, value Volunteer:
data: contingency table
        A       B
1       7     784     791
2      19    2060    2079
3     226   17548   17774
      252   20392   20644
expected: contingency table
           A          B
1       9.66       781.
2       25.4  2.054E+03
3       217.  1.756E+04
chi-square = 2.74
degrees of freedom = 2
probability = 0.254
Significance for feature working, value Working:
data: contingency table
        A       B
1      26     784     810
2      23    2060    2083
3      55   17548   17603
      104   20392   20496
expected: contingency table
           A          B
1       4.11       806.
2       10.6  2.072E+03
3       89.3  1.751E+04
chi-square = 145.
degrees of freedom = 2
probability = 0.000
Significance for feature working, value White Collar:
data: contingency table
        A       B
1      23     784     807
2      62    2060    2122
3     402   17548   17950
      487   20392   20879
expected: contingency table
           A          B
1       18.8       788.
2       49.5  2.073E+03
3       419.  1.753E+04
chi-square = 4.86
degrees of freedom = 2
probability = 0.088
Table 18
Significance for feature working, value Blue Collar and Male:
data: contingency table
        A       B
1       7     407     414
2      23     917     940
3     260   10184   10444
      290   11508   11798
expected: contingency table
           A          B
1       10.2       404.
2       23.1       917.
3       257.  1.019E+04
chi-square = 1.06
degrees of freedom = 2
probability = 0.589
Significance for feature working, value Blue Collar and Female:
data: contingency table
        A       B
1       6     377     383
2      13    1143    1156
3     116    7364    7480
      135    8884    9019
expected: contingency table
           A          B
1       5.73       377.
2       17.3  1.139E+03
3       112.  7.368E+03
chi-square = 1.25
degrees of freedom = 2
probability = 0.536
Significance for feature working, value Disabled and Male:
data: contingency table
        A       B
1       9     407     416
2      13     917     930
3     180   10184   10364
      202   11508   11710
expected: contingency table
           A          B
1       7.18       409.
2       16.0       914.
3       179.  1.019E+04
chi-square = 1.07
degrees of freedom = 2
probability = 0.586
Significance for feature working, value Disabled and Female:
data: contingency table
        A       B
1       7     377     384
2      20    1143    1163
3     132    7364    7496
      159    8884    9043
expected: contingency table
           A          B
1       6.75       377.
2       20.4  1.143E+03
3       132.  7.364E+03
chi-square = 0.196E-01
degrees of freedom = 2
probability = 0.990
Significance for feature working, value Retired and Male:
data: contingency table
        A       B
1      36     407     443
2     122     917    1039
3     971   10184   11155
     1129   11508   12637
expected: contingency table
           A          B
1       39.6       403.
2       92.8       946.
3       997.  1.016E+04
chi-square = 11.1
degrees of freedom = 2
probability = 0.004
Significance for feature working, value Retired and Female:
data: contingency table
        A       B
1      11     377     388
2      52    1143    1195
3     310    7364    7674
      373    8884    9257
expected: contingency table
           A          B
1       15.6       372.
2       48.2  1.147E+03
3       309.  7.365E+03
chi-square = 1.75
degrees of freedom = 2
probability = 0.416
Significance for feature working, value Unemployed and Male:
data: contingency table
        A       B
1       8     407     415
2      10     917     927
3     158   10184   10342
      176   11508   11684
expected: contingency table
           A          B
1       6.25       409.
2       14.0       913.
3       156.  1.019E+04
chi-square = 1.67
degrees of freedom = 2
probability = 0.434
Significance for feature working, value Unemployed and Female:
data: contingency table
        A       B
1       4     377     381
2       7    1143    1150
3     124    7364    7488
      135    8884    9019
expected: contingency table
           A          B
1       5.70       375.
2       17.2  1.133E+03
3       112.  7.376E+03
chi-square = 7.95
degrees of freedom = 2
probability = 0.019
Significance for feature working, value Volunteer and Male:
data: contingency table
        A       B
1       4     407     411
2      10     917     927
3     164   10184   10348
      178   11508   11686
expected: contingency table
           A          B
1       6.26       405.
2       14.1       913.
3       158.  1.019E+04
chi-square = 2.31
degrees of freedom = 2
probability = 0.315
Significance for feature working, value Volunteer and Female:
data: contingency table
        A       B
1       3     377     380
2       9    1143    1152
3      62    7364    7426
       74    8884    8958
expected: contingency table
           A          B
1       3.14       377.
2       9.52  1.142E+03
3       61.3  7.365E+03
chi-square = 0.415E-01
degrees of freedom = 2
probability = 0.979
Significance for feature working, value Working and Male:
data: contingency table
        A       B
1      18     407     425
2       6     917     923
3      38   10184   10222
       62   11508   11570
expected: contingency table
           A          B
1       2.28       423.
2       4.95       918.
3       54.8  1.017E+04
chi-square = 115.
degrees of freedom = 2
probability = 0.000
Significance for feature working, value Working and Female:
data: contingency table
        A       B
1       8     377     385
2      17    1143    1160
3      17    7364    7381
       42    8884    8926
expected: contingency table
           A          B
1       1.81       383.
2       5.46  1.155E+03
3       34.7  7.346E+03
chi-square = 54.9
degrees of freedom = 2
probability = 0.000
Significance for feature working, value White Collar and Male:
data: contingency table
        A       B
1       6     407     413
2      23     917     940
3     179   10184   10363
      208   11508   11716
expected: contingency table
           A          B
1       7.33       406.
2       16.7       923.
3       184.  1.018E+04
chi-square = 2.81
degrees of freedom = 2
probability = 0.245
Significance for feature working, value White Collar and Female:
data: contingency table
        A       B
1      17     377     394
2      39    1143    1182
3     223    7364    7587
      279    8884    9163
expected: contingency table
           A          B
1       12.0       382.
2       36.0  1.146E+03
3       231.  7.356E+03
chi-square = 2.70
degrees of freedom = 2
probability = 0.259
Table 19
Table 20