Reliability of The Global Assessment of Functioning Scale
A Thesis
Submitted to the Faculty
of
Drexel University
By
Sarah Beth Woldoff, M.A.
in partial fulfillment of the
requirement for the degree
of
Doctor of Philosophy
July 2004
Dedication
I would like to dedicate my dissertation to my family, who have always put my
needs before their own. Their love and encouragement have helped me to attain all of
my goals, and I appreciate all they have taught me.
Acknowledgments
I would like to thank my committee members for the time and effort they put into
their participation in my defense. I would also like to extend a special thank you to
my chair, Dr. James D. Herbert, for his countless e-mails and feedback sessions and for
being a true mentor in every sense of the word. Lastly, I want to thank Keith Davis
for his support and encouragement and for always being my greatest fan.
Table of Contents
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1. BACKGROUND AND LITERATURE SURVEY
1.1 History of the GAF
1.2 Limitation of the GAF
1.3 New Measures of Social Functioning
1.3.1 The SOFAS, GARF, and the K-AXIS
1.4 Reliability of the GAF
1.5 Predictive Validity
1.6 GAF and Treatment Outcome
1.7 The GAF and Allocation of Services
1.8 Future Directions of the GAF
1.9 Rationale for the Study
2. APPARATUS AND TEST PROCEDURE
2.1 Instrument
2.2 Research Design
2.3 Participants
2.4 Materials
2.5 Procedure
2.6 Hypotheses and Statistical Analyses
2.7 Power Analysis
3. RESULTS
4. DISCUSSION
4.1 Psychometric Properties of the Computerized GAF
4.2 Clinical Versus Actuarial Decision Making
4.3 Clinical Implications
4.4 Strengths and Limitations
4.5 Conclusions and Future Directions
LIST OF REFERENCES
APPENDIX A: TRADITIONAL GAF SCALE
APPENDIX B: COMPUTERIZED GAF
APPENDIX C: VIGNETTE A/HIGH INFORMATION
APPENDIX D: VIGNETTE C/LOW INFORMATION
APPENDIX E: VIGNETTE B/HIGH INFORMATION
APPENDIX F: VIGNETTE D/LOW INFORMATION
VITA
List of Tables
1. GAF Minimum, Maximum, Means, and Standard Deviations
2. Test-Retest Reliability of GAF Ratings
List of Figures
1. GAF Ratings Compared to Gold Standard Ratings
2. Standard Error of GAF Ratings
3. Mean Difference Between GAF Ratings and Gold Standard Ratings
4. Mean GAF Scores by Method and Information Across Time
Abstract
Global Assessment of Functioning Scale
Sarah Beth Woldoff, M.A.
James D. Herbert, Ph.D.
Since the Global Assessment of Functioning Scale (GAF) was introduced in the
revised third edition of the Diagnostic and Statistical Manual of Mental Disorders
(DSM-III-R; American Psychiatric Association, 1987), its use in clinical settings
has grown considerably. However, there is little research on the reliability and
validity of the scale. Thirty-six psychologists with experience administering the GAF
scored one high-information and one low-information vignette using two
methods of determining the GAF score. Method A was the traditional paper-and-pencil version, in which the rater determines the client’s GAF score based on the
individual’s psychological, social, and occupational functioning. Method B
was a computer-assisted GAF (First & Multi-Health Systems Staff, 1997), in
which assessment questions related to psychological, social, and occupational
functioning are presented to the clinician in a yes/no format. Results indicated that
raters could score both methods of GAF administration reliably. Consistent
with predictions, the results revealed a significant interaction between method and
information level. Specifically, in the high-information condition, the computer-assisted method yielded scores closer to “gold standard” ratings determined by expert
diagnosticians than did the paper-and-pencil method. These findings are promising
with respect to the clinical utility of the computer-assisted GAF procedure.
Limitations of the study and directions for future research are discussed.
CHAPTER 1: BACKGROUND AND LITERATURE SURVEY
1.1. History of the GAF
The Global Assessment of Functioning Scale (GAF) was introduced as Axis V of the
revised third edition of the Diagnostic and Statistical Manual of Mental Disorders
(DSM-III-R; American Psychiatric Association, 1987) to rate overall psychiatric
disturbance. Since its introduction, its use in clinical settings has expanded because
of the need for a quick and easily administered measure of the severity of mental
illness. It is currently the most commonly used global assessment
instrument for psychiatric patients (Bodlund, Kullgren, Ekselius, Lindstrom, & von
Knorring, 1994; Piersma & Boes, 1997).
The GAF is derived from its predecessor, the Global Assessment Scale (GAS;
Endicott, Spitzer, & Fleiss, 1976), a measure used to assess a patient’s overall level of
functioning for a specified time period (American Psychiatric Association, 1987).
The GAF is based on the assumption that the level of current functioning in
psychiatric populations holds crucial information for treatment planning and
treatment outcome. Furthermore, the GAF can be used in a variety of settings to
follow changes in an individual over time without the need or expense of extensive
training.
The GAF resembles the GAS in its criteria and shares the same
interval design: a value range from 0 (most severe) to 100 (least severe) with 10
anchor points at equal intervals (Hall, 1995). However, unlike the GAS, each interval
of the GAF is accompanied by a behavioral descriptor ranging from “superior
functioning in a wide range of activities…no symptoms” to “persistent danger of
severely hurting self or others…persistent inability to maintain minimal personal
hygiene.” Therefore, the interviewer must first determine the descriptor that
summarizes the client’s current difficulties and then indicate the severity of
impairment within a nine-point range. The scale does not, however, define the number
of criteria required to meet a particular interval. Furthermore, the rater must make a single
rating based upon the patient’s overall level of psychological, social, and
occupational functioning. Additionally, impairment in functioning due to physical
limitations or psychosocial stressors is excluded from GAF ratings. Numerous
researchers have cited this exclusion as a major flaw (Bodlund et al.,
1994; Hall, 1995; Piersma & Boes, 1997), arguing that one’s general
medical condition may have a major impact upon one’s social and occupational
functioning.
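The two-step rating procedure described above (choose the interval whose descriptor fits, then choose a value inside it) can be sketched as a simple lookup. This is an illustration only, not part of the GAF itself: it assumes integer scores from 1 to 100 grouped into ten bands of ten points (1-10, 11-20, ..., 91-100), as on the DSM-IV form, leaves 0 out, and uses hypothetical function names.

```python
def gaf_interval(score: int) -> tuple[int, int]:
    """Return the (lower, upper) bounds of the 10-point GAF band containing score.

    Assumes integer scores from 1 to 100 grouped as 1-10, 11-20, ..., 91-100.
    """
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    lower = (score - 1) // 10 * 10 + 1
    return lower, lower + 9


def position_in_interval(score: int) -> int:
    """Offset of score within its band: 0 (band floor) to 9 (band ceiling)."""
    lower, _ = gaf_interval(score)
    return score - lower


# Step 1: the rater picks the band whose descriptor best fits the client.
# Step 2: the rater picks a value inside that band to reflect severity.
print(gaf_interval(45))          # (41, 50)
print(position_in_interval(45))  # 4
```

The offset spans nine points from the floor of a band to its ceiling, which is the within-interval severity judgment described above. Nothing in the sketch resolves the ambiguity the text raises, namely how many descriptor criteria must be met to place a client in a given band.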
1.2. Limitation of the GAF
A major difficulty with the GAF is that it integrates three different dimensions
of functioning (i.e., social, occupational, and psychological symptoms) that do not
necessarily covary. In fact, several studies have found that psychological functioning
does not covary with social and occupational functioning (Bodlund, 1996; Byrne et
al., 1994; Calvoressi, Libman, Vegso, McDougle, & Price, 1996). For example,
individuals can experience mild psychological distress with severe impairment in
their daily functioning, and vice versa. However, the scoring methodology of the
GAF does not allow for this differentiation and instead forces the rater to focus more
heavily upon either psychological symptoms or functioning.
Additionally, the DSM serves as a multiaxial system that involves assessment
on several axes, each of which refers to a different domain of information that may
assist the clinician in both treatment planning and predicting treatment outcome. The
GAF (Axis V) appears to have considerable overlap with Axis I (clinical disorders
and other conditions that may be the focus of clinical attention) and Axis II
(personality disorders and mental retardation). Assessment of functioning beyond
psychological symptoms is needed to develop a comprehensive diagnosis and
treatment plan. A separation of Axes I and II from the GAF scale is also important if
a distinct measure of global functioning is to be developed.
Many researchers (e.g., Goldman, Skodol, & Lave, 1992) have suggested that
the GAF scale should be divided into two separate scales, one to measure global
symptomatology and psychological functioning and the other to measure social and
occupational functioning. In an effort to address these concerns, two new
experimental global functioning scales were included in the DSM-IV, the Global
Assessment of Relational Functioning Scale (GARF) and the Social and Occupational
Functioning Assessment Scale (SOFAS).
1.3. New Measures of Social Functioning
1.3.1. The SOFAS, GARF, and the K-AXIS
The SOFAS is intended to assess one’s level of social and occupational
functioning without the influence of psychological symptoms. Unlike the GAF, this
scale considers the impact of general medical conditions and physical disabilities
upon one’s level of social and occupational functioning. The GARF is used to
evaluate one’s level of functioning in relationships with family, friends, and
significant others. This scale uses the three content areas of problem solving,
organization, and emotion to measure the degree of relational functioning in
psychiatric patients with a ranking from optimal to disrupted. These two
experimental scales distinguish psychiatric symptoms from occupational and social
functioning and are thought to provide clinical information beyond that provided by
the GAF (Hilsenroth et al., 2000).
Patterson and Shin Lee (1995) evaluated the construct validity of a modified
GAF scale, later to become the SOFAS, using the social, occupational, and clinical
data of 196 psychiatric patients receiving outpatient mental health services. The
results supported the convergent and discriminant validity of the
modified GAF scale and suggested that it captures
multidimensional information about functioning including social support, medication
compliance, current living situation, and current potential for violence. In addition,
there was no significant difference in scores using the modified GAF scale across
participants with four Axis I diagnoses, suggesting that this measure of functioning
was not unduly affected by the nature of participants’ specific psychiatric symptoms.
Similarly, Byrne, Dagadakis, Unutzer, and Ries (1996) examined the validity
of a modified version of the GAF. This modified GAF was similar to the SOFAS in
that it was designed to consider only social and occupational functioning and not
symptom severity. The results indicated that the revised GAF was nevertheless
strongly related to patients’ psychiatric symptoms rather than to social and occupational
functioning. Even when ratings were performed by board-certified psychiatrists, the
assessment of psychiatric symptom severity was the primary predictor of GAF scores.
Byrne and colleagues concluded that their revised version functioned similarly to the
original GAS, which has been found to correlate best with psychiatric symptom
severity, duration of hospitalization, and independent living and self-care skills, and
only weakly with social and occupational functioning.
Hilsenroth and colleagues (2000) completed the only published study
investigating the reliability and convergent/discriminant validity of the GAF, the
GARF, and the SOFAS. All three scales exhibited high levels of inter-rater
reliability. However, the GARF and SOFAS were each more related to the GAF than
to each other. The authors concluded that although all three scales can be scored
reliably, the SOFAS and the GARF appear to tap somewhat different
constructs than the GAF and provide additional information regarding global
functioning.
A second study by Hay and colleagues (2003) evaluated the validity of the
GAF, the GARF, and the SOFAS in a two-year follow-up of adult psychiatric
patients. Results demonstrated that the SOFAS and the GAF scores on admission
were significantly and negatively correlated with duration of hospital admission. In
addition, the SOFAS ratings on discharge were significantly and negatively correlated
with overall psychiatric outcome at the two-year follow-up. The authors concluded that
the SOFAS had better predictive and concurrent validity than the GAF or the GARF
and may be a more useful measure of adaptive functioning.
However, Kennedy and Foti (2003) criticized this study, noting that although
the GAF collapses functioning and symptoms into a single measure, the SOFAS
likewise merges social and occupational functioning into a single measure and thus does
not allow the rater to specify which factor is influencing the overall rating. They suggested
that future research should focus on their own scale, the Kennedy Axis V (K Axis), which
breaks symptoms and functioning into distinct areas, including psychological
impairment, social skills, violence, activities of daily living, and occupational skills,
so that no information is lost, as it is in scales that use a single global score. In addition,
the K Axis can generate a total score equivalent to that of the GAF. The authors state
that the K Axis allows for the measurement of symptoms and functioning in each of
the major clinical domains, eliminating the need to use multiple instruments to
measure these areas separately, which is both inefficient and expensive. The authors
further claim that the K Axis categorizes clinical information in a way that could
simplify treatment planning.
In conclusion, it appears that the SOFAS and the GARF identify different
elements of information important in identifying stressors and potential focal points
of treatment planning. However, it appears that these two scales may have similar
limitations to those identified for the GAF. Although the research on these new
scales is limited, they may add information to the diagnostic
model within the DSM-IV, independent of psychiatric symptom severity.
Additionally, the K Axis has been proposed as a better approach to the assessment of
global functioning and symptomatology, but no independent research has yet been
completed on this proposed axis. More studies are needed to address the overall
assessment of functioning and its relationship to psychiatric symptoms, treatment, and
outcome measures.
1.4. Reliability of the GAF
Interviewer-rated scales such as the GAF are prone to low inter-rater reliability
because they are used by assessors with different levels of training
and experience (Bodlund et al., 1994; Piersma & Boes, 1997). This is a major
problem: for scores to be comparable and meaningful across research
paradigms, inter-rater reliability needs to be consistent from one study to another.
Some raters, for instance, may tend to make high ratings, whereas others may
tend to make low ratings. Training and clinical experience may also
affect GAF ratings. For example, raters such as nursing staff may be more interested
in functioning and less interested in symptoms than psychiatrists because of their training
and area of expertise. Moreover, variation in adherence to GAF guidelines and the
heterogeneity of patient illness severity have been identified as the two most influential
factors limiting inter-rater reliability (e.g., Dworkin, Friedman, & Telschow, 1990; Jones,
Thornicroft, Coffey, & Dunn, 1995).
Although there have been a small number of publications on the psychometric
qualities of the GAS (e.g., Dworkin et al., 1990), few reports address these issues in
reference to the GAF. Data on the basic reliability and validity of the GAF were not
provided in the DSM-III-R (APA, 1987). Therefore, much of the research discussed
in this study will focus on its predecessor, the GAS.
Endicott et al. (1976) performed the first series of test-retest reliability studies
on the GAS and found intraclass correlation coefficients (ICCs) ranging from .61 to
.91, with standard errors of measurement ranging from 5.0 to 8.0. Most of these
ratings were completed by a limited number of well-trained interviewers, raising
questions about the degree to which the results generalize to typical clinical
settings (Luborsky, 1962; Luborsky & Bachrach, 1974). Even with this potential bias,
one study within the series obtained an ICC of approximately .60, suggesting that the
scale is only moderately reliable at best. Endicott et al. (1976) reported that patients’
psychiatric symptoms, such as cognitive disorganization, hallucinations, delusions,
suspiciousness, and inappropriate appearance, were associated with lower GAS
ratings, demonstrating the association between the severity of psychological
symptoms and clinicians’ ratings of global impairment. Dworkin and colleagues
(1990) had multiple clinicians use the GAS to rate 108 chronically mentally ill
outpatients over a period of 18 months. Raters consisted of 17 psychiatrists, 17
residents, and 17 psychologists, all of whom participated in 90-minute training
sessions. Each training session produced relatively strong inter-rater reliability,
ranging from .66 to .92. The lowest value (.66) occurred at the last training session, when
there was pressure to complete the study and trainer fatigue appeared to lead to abbreviated
vignette discussion. Although the GAS was designed to be used with minimal
training, the authors concluded that training is necessary to maximize inter-rater
reliability in situations in which individual patients may be rated sequentially by
multiple interviewers, because training helps ensure that any change in
the GAS score reflects actual change in the patient’s global functioning rather than
measurement error.
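To make the reliability statistics above concrete, the sketch below computes one common intraclass correlation, the two-way random-effects, single-rater form often labeled ICC(2,1), from a subjects-by-raters matrix, together with the standard error of measurement, SEM = SD × sqrt(1 − r). The studies reviewed here do not necessarily use this ICC form; the function names and the ratings are hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, stdev


def icc_2_1(ratings: list[list[float]]) -> float:
    """Two-way random-effects, single-rater ICC, often labeled ICC(2,1).

    ratings[i][j] is the score rater j gave subject i; every subject must be
    rated by every rater (no missing cells).
    """
    n = len(ratings)     # subjects (rows)
    k = len(ratings[0])  # raters (columns)
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical GAF ratings: 4 patients (rows) x 3 raters (columns).
ratings = [
    [45, 50, 48],
    [62, 60, 65],
    [30, 38, 35],
    [71, 70, 75],
]
r = icc_2_1(ratings)
all_scores = [x for row in ratings for x in row]
print(round(r, 2), round(sem(stdev(all_scores), r), 1))
print(sem(10, 0.75))  # 5.0
```

Other ICC forms (for example, one-way or consistency versions) give different values on the same data, so reported ICCs are comparable across studies only when the form is known. The SEM line shows why modest reliability matters on a 100-point scale: with an SD of 10 and a reliability of .75, a single score carries a standard error of 5 points, half of one GAF interval.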
Given the restricted diversity of clinicians, patients, information, and time
frames in these studies, the inter-rater reliability of the GAF has not yet been well
established. Although specialized training improved reliability, such training
does not reflect the GAF’s intended use, and training is not possible in all settings
because of time constraints and limited resources. In short, there is a paucity of data on the
inter-rater reliability of Axis V of the DSM-IV, especially among clinicians without
specialized training in the GAF. This is a major concern because scores are
comparable and meaningful across raters only if inter-rater reliability is high.
Although specific training appears to attenuate this problem, requiring such training
defeats one of the principal purposes of the GAF, which is promoted as an assessment
tool that is easily and quickly administered with minimal training. Because research has
shown that specialized training is key to the GAF’s accurate use,
further research should systematically evaluate this concern using the current version
of the GAF instead of its predecessor.
1.5. Predictive Validity
Several studies have reported diagnostic group differences when using the
GAF with diverse patient populations (Bodlund et al., 1994; Byrne et al., 1996;
Patterson & Shin-Lee, 1995). Skodol, Link, Shrout, and Horwath (1988) found
significant variation in GAF ratings among 10 diagnostic groups of both psychiatric
inpatients and outpatients. In particular, patients with schizophrenia received the lowest
GAF ratings, and individuals with anxiety disorders, adjustment disorders, or major
depressive disorder, as well as those without an Axis I diagnosis, received the highest
ratings, even though their level of functioning was stated to be equivalent to that of the
patients with schizophrenia. Based on these findings, Skodol et al. (1988) concluded that
psychiatric symptoms have a greater impact on GAF ratings than variables associated
with social and occupational functioning. A second study of
more than 10,000 psychiatric patients found that depressed patients had higher GAF
ratings than nondepressed patients even though level of functioning was not
consistently different between the two groups (Mezzich, Evanczuk, Mathias, &
Coffman, 1985).
Overall, these studies support the claim that individuals with major mental
illnesses usually obtain lower GAF ratings than those diagnosed with less severe
forms of psychopathology. This conclusion is of great importance as it is assumed
that one’s functioning level holds critical information for treatment planning and
treatment outcome. Furthermore, managed medical care tends to allocate more
resources to psychological disorders associated with significant disabilities in
functioning (Phelan, Wykes, & Goldman, 1994).
Jones and colleagues (1995) used the GAF to assess severity of disturbance in
a sample of 103 chronically mentally ill patients over a 6-month period. GAF scores
were obtained in combination with other ratings of psychological symptoms and
disability. Following the suggestions of Goldman and colleagues (1992), the GAF
was administered as both an overall scale and as two separate measures, one
designated for symptoms (GAFSYM) and the other for disability (GAFDIS). These
ratings were then compared with changes in medication and need for support. In all
cases, a lower composite GAF score was associated with an increase in clinically
identified support needs of the client. The authors concluded that the GAF is a valid
measure of disturbance of psychological functioning among long-term mentally ill
patients and can be readily used by multidisciplinary raters without extensive training.
A second problem with the GAF is its lack of structure and concrete directions
for its use. Specifically, it is unclear how many criteria a patient must meet in order
to fall into a particular range, and how to determine an exact score within that range.
Additionally, directions in the DSM-IV state that ratings are to be based upon
limitations of functioning due to mental impairments alone, but it is not clear how the
clinician is to distinguish these from functional impairments that are not related to
psychiatric disturbance. Based upon these shortcomings, Goldman et al. (1992) suggested the
use of a modified GAF that separated relational functioning from psychological
functioning and symptoms and included the influence of physical impairments in
the rating. Goldman et al. also suggested increasing the structure of the GAF
by adding explicit directions regarding how scores should be assigned.
Following the suggestions of Goldman et al., Hall (1995) created a modified GAF. He
increased the structure of the original GAF by adding directions for
assigning scores and increasing the number of criteria required within each 10-point
interval. Using 16 detailed patient intake histories and discharge summaries from
hospital charts of depressed patients with or without comorbid eating disorders,
raters assigned GAF ratings with both the original and the modified GAF.
Participants were separated into two groups. Group I consisted of nurses,
physical therapists, social workers, psychology technicians, and clinical
psychologists. Group II consisted of other members of the interdisciplinary team
such as psychiatrists and general practitioners. The intra-class correlation coefficients
between groups for both admission and discharge were higher in the modified GAF
group when compared with the original GAF group. Hall attributed the lower agreement
with the original GAF to greater inter-rater variability rather than patient heterogeneity,
as shown by the standard errors of the ratings for each patient. Specifically, of the 16
standard errors for patient admission data, 13 were higher in the original GAF group than
in the modified group; that is, raters’ GAF scores varied more in the group using the
original GAF. In addition, all of the mean admission GAF scores were higher in the
original GAF group than in the modified GAF group. Therefore, raters using the
modified GAF rated patients as more impaired than those using the original GAF.
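Hall’s per-patient comparison can be illustrated with a short sketch. It assumes that “standard error” means the standard error of the mean rating across the raters in a group (sample standard deviation divided by the square root of the number of raters); the source does not give the formula, and the function names and ratings below are hypothetical, invented purely for illustration.

```python
import math
from statistics import mean, stdev


def standard_error(ratings: list[float]) -> float:
    """Standard error of the mean rating across raters for one patient."""
    return stdev(ratings) / math.sqrt(len(ratings))


def compare_groups(original: list[list[float]], modified: list[list[float]]):
    """Compare rater spread and mean level per patient between two groups.

    original[p] and modified[p] hold the GAF ratings that each group's raters
    gave patient p. Returns how many patients had a larger standard error,
    and how many had a higher mean rating, in the original-GAF group.
    """
    more_variable = sum(
        standard_error(o) > standard_error(m) for o, m in zip(original, modified)
    )
    rated_higher = sum(mean(o) > mean(m) for o, m in zip(original, modified))
    return more_variable, rated_higher


# Hypothetical ratings for three patients (not Hall's data).
original = [[55, 40, 62], [48, 35, 50], [60, 58, 61]]
modified = [[45, 44, 47], [40, 38, 41], [52, 50, 55]]
print(compare_groups(original, modified))  # (2, 3)
```

Because both groups rate the same patients, a larger spread in one group points to rater variability rather than patient heterogeneity, which is the logic behind Hall’s conclusion.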
The modified version appears to be useful for increasing inter-rater reliability
in settings with multiple raters of varying education and employment histories.
However, several features of Hall’s design complicate the interpretation of these
findings. First, research (e.g., Goldman et al., 1992) has shown that some raters tend
to rate consistently high while others rate low. Second, the ratings were performed by
staff members with different training and clinical experience. Lastly, raters had access
to GAF admission scores before selecting their discharge GAF ratings; therefore,
discharge ratings may have been influenced by a desire to demonstrate treatment
efficacy. In summary, the predictive validity of the GAF appears to be limited by
unclear directions regarding administration and by variations in the way different
disciplines assign GAF scores based upon the same data. Overall, it appears that
ratings made using the original GAF are strongly influenced by patient symptom
severity instead of by functioning in the targeted areas.
1.6. GAF and Treatment Outcome
Other studies have shown that clinicians’ GAF ratings of psychiatric patients
reflect clinical improvement during and following treatment (Piersma & Boes,
1997; Rund et al., 1994). Piersma and Boes (1997) evaluated GAF ratings for adult
inpatients, adult partial hospitalization patients, and adolescent inpatients. Mean
current GAF ratings of adult inpatients were 45.1 at admission and 60.1 at
discharge. For adult partial hospitalization patients, mean ratings were 55.2 at
admission and 65.5 at discharge. For adolescent inpatients, means were 30.6 at
admission and 52.7 at discharge. In all three groups, patients were rated as having
improved functioning at the time of discharge. However, this improvement could be
attributed to the fact that raters had access to the GAF admission scores when making
GAF discharge ratings and may have been motivated to show clinical improvement. The
authors attributed differences in GAF ratings between adolescents and adults to the
possibility that adolescent and adult psychiatrists interpret the GAF rating scale differently.
According to the authors, adolescent psychiatrists tended to rate according to the
lowest level of functioning on any of the three dimensions considered, whereas adult
psychiatrists tended to rate the GAF by averaging the levels of functioning across all
three dimensions. Skodol et al. (1988) raised similar concerns regarding the GAF
instructions, suggesting that confusion exists as to whether clinicians are to average
the levels of symptoms and social and occupational functioning or rate the lowest
level when symptoms and social and occupational functioning are not equivalent.
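The ambiguity described above can be made concrete: the same clinical profile yields different GAF scores depending on whether the rater averages the three dimensions or takes the lowest. The sketch assumes, purely for illustration, that a rater has already placed each dimension on the 1-100 GAF metric; the GAF itself does not ask for separate dimension scores, and the function names and profile are hypothetical.

```python
from statistics import mean


def gaf_by_average(symptoms: int, social: int, occupational: int) -> int:
    """Averaging rule: mean of the three dimension levels.

    Rounded to the nearest integer (Python's round sends exact halves to the
    nearest even number).
    """
    return round(mean([symptoms, social, occupational]))


def gaf_by_lowest(symptoms: int, social: int, occupational: int) -> int:
    """Lowest-level rule: the score tracks the most impaired dimension."""
    return min(symptoms, social, occupational)


# Hypothetical client: moderate symptoms, serious social impairment,
# mild occupational impairment.
profile = (55, 42, 68)
print(gaf_by_average(*profile))  # 55
print(gaf_by_lowest(*profile))   # 42
```

Here the two rules differ by 13 points, more than the width of one 10-point interval, for identical clinical information. When the three dimensions are equivalent the rules agree, which is why the instructions matter only for uneven profiles.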
Gordon and Gordon (1985) examined the relationship between GAF ratings
and treatment outcomes using a retrospective chart review of 232 discharged
patients treated in a psychiatric outpatient clinic. Raters were instructed to rate the
patients on Axis V and judge their symptomatic responses to treatment based upon
information in their charts. The results revealed that 55% of patients were judged as
symptomatically improved, yet only 19% were rated as functionally improved
according to the GAF. In addition, patients’ GAF ratings increased consistently with
the number of treatment sessions, regardless of psychiatric symptomatology.
In a subsequent study, Gordon and Gordon (1987) reported predictable
differences in GAF ratings between chronically ill state hospital patients, long-term
inpatients, short-term inpatients, and outpatients. The results showed that chronically
ill state hospital psychiatric patients reported greater levels of stress and had worse
premorbid levels of functioning than the other three groups, and that longer-term
inpatients scored worse on both measures than shorter-term inpatients and
outpatients. They concluded that a “strain ratio,” the ratio of the Axis IV score to the
Axis V score, could be used to estimate the amount of treatment an individual patient
requires.
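As described, the strain ratio divides the Axis IV rating by the Axis V rating. The sketch below encodes only that definition. It assumes, as in DSM-III-R, that higher Axis IV values indicate more severe psychosocial stressors and that higher Axis V (GAF) values indicate better functioning; how Gordon and Gordon scaled the axes or converted the ratio into an estimate of treatment amount is not specified here, so the function name and example ratings are hypothetical.

```python
def strain_ratio(axis_iv: float, axis_v: float) -> float:
    """Ratio of the Axis IV (stressor severity) rating to the Axis V (GAF) score.

    Assumes higher Axis IV values mean more severe stressors and higher
    Axis V values mean better functioning, so a larger ratio means more strain.
    """
    if axis_v <= 0:
        raise ValueError("Axis V score must be positive")
    return axis_iv / axis_v


# Hypothetical patients: same stressor rating, different functioning.
print(strain_ratio(4, 30) > strain_ratio(4, 60))  # True
```

Under these assumptions the ratio rises either when stressors become more severe or when functioning declines, so a single number combines both axes.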
Recently, two studies have expanded upon the idea of the “strain ratio” and
examined the association between GAF ratings during treatment and independent
information about post-treatment outcomes. Moos, McCoy, and Moos (2000)
assessed the adequacy of GAF ratings in a sample of 1,688 patients with substance
use disorders, many of whom also had psychiatric disorders. They found that
GAF ratings did not differ in relation to demographic characteristics such as race,
age, or sex, and that clinicians’ ratings of patients’ current GAF scores were based
primarily on psychiatric diagnoses and symptoms. Patients diagnosed with
polysubstance abuse, an Axis I psychiatric diagnosis, and a medical condition
received lower GAF scores. In addition, the authors concluded that
patients’ symptoms and psychiatric diagnoses were more strongly correlated with
clinicians’ judgments of functioning than were the patients’ actual levels of social and
occupational functioning. Lastly, they found little or no correlation between ratings
of patients’ current or highest level of global functioning and psychological, social, or
occupational functioning at a one-year follow-up. Although lower GAF ratings were
associated with longer hospital stays, they did not predict patients’ probability of
readmission or length of readmission. These findings cast doubt on the use of GAF
ratings for predicting treatment outcome and imply that the GAF fails at one of its
intended uses.
1.7. The GAF and Allocation of Services
In a review of 9,055 adult psychiatric intake evaluations, Thompson, Burns,
Goldman, and Smith (1992) found variations in the way GAF ratings were assigned
by managed care case managers compared with treatment providers for the same
cases with the same information. They concluded that higher GAF scores reflected a
need by the managed care companies to limit the use of all inpatient services rather
than their desire to selectively eliminate unnecessary hospitalizations.
Seeking to replicate and expand upon their findings, Moos, Nichol, and Moos
(2002) evaluated the value of the GAF as part of a system-wide program for
monitoring the allocation and outcomes of mental healthcare services via the
Department of Veterans Affairs. They found that patients’ clinical diagnoses and
symptoms were better predictors of GAF ratings than was their social and/or
occupational functioning. Moreover, patients with psychiatric diagnoses, psychoses,
or a recent inpatient episode were rated as more impaired. In contrast, indicators of
social and occupational functioning made only minimal contributions to the GAF ratings.
The authors concluded that GAF ratings provide little or no information independent
of the clinicians’ judgments about diagnoses and symptom severity and may not be a
useful predictor in programs evaluating the allocation and outcomes of mental health
care. In summary, the outcome research demonstrates little or no relationship between
GAF ratings and length of hospital stay, probability of readmission, and prediction of
overall treatment outcome. These results imply that the GAF does not serve one of its
primary purposes: it fails to predict patients’ future use of psychological services.
1.8. Future Directions of the GAF
In response to the limitations of the traditional GAF, First and Multi-Health Systems
Staff (1997) recently developed a cost-effective computer program. The
computerized GAF is a clinical tool that assists clinicians in making an Axis V
diagnosis. The computerized GAF program has been designed to ensure that all
aspects of functioning (i.e., psychological, social, and occupational) are considered in
the assessment process and that symptom severity and level of functioning are taken
into account in making a GAF rating. It uses a decision tree model in which
questions regarding all areas of functioning are presented to the clinician in a yes/no
format. Responses to these questions determine the order of presented questions and
guide the clinician’s overall assessment. However, no reliability or validity
information on the computerized GAF is provided in the manual. In addition, there is
no published research on this new instrument in the literature.
1.9. Rationale for the Study
To date, the research that has been conducted on the reliability and validity of
the GAF has been limited. Although some research has examined the GAF, few studies have
evaluated its psychometric properties. This lack of research is problematic since the
GAF is so widely used and is becoming a major tool within the realm of managed
health-care.
The GAF Scale is used to assess patients of all ages in a number of settings,
including outpatient clinics, inpatient clinics, residential treatment programs, and
private practices. Users include psychiatrists, psychologists, social workers,
physicians, nurses, counselors, and related health care workers. It has become a
particularly important tool within the realm of managed health-care and is currently
used to determine “medical necessity” of mental health benefits and eligibility for
disability benefits. Decisions such as whether an individual’s current mental health
state entitles them to mental health benefits depend not only on the patient’s diagnosis
but also on how it affects their current overall level of functioning. The GAF is also
used to determine the type of treatment, level of necessary care required, and
frequency and duration of treatment. Some managed care companies (e.g., Magellan
Behavioral Health) have set GAF rating requirements for inpatient and outpatient
services.
With these standards and its increasing use in both clinical and research
settings, the need for accurate and consistent GAF ratings has become increasingly
important. It is from this need that the computer-assisted GAF was developed. The
authors state, “it ensures the consideration of all three dimensions of the GAF rating
and provides a means to determine an accurate and reliable GAF rating” (First &
Multi-Health Systems Staff, 1997, p. 4).
In an effort to address the limited research on the validity and reliability of the
GAF and explore the utility of the computer-assisted GAF, Woldoff and Herbert
(2001) evaluated the reliability of clinicians’ judgments using the traditional and the
computer-assisted methods of determining GAF scores of two different case
vignettes. Social, occupational, and clinical data on two different psychiatric
disorders were used to create two case vignettes. One vignette represented an
individual with severe psychopathology such as schizophrenia, while the other
vignette described an individual with mild psychopathology such as an anxiety
disorder. Clinical psychology graduate students evaluated these vignettes at an initial
testing and again at a one week retest using two methods of administering the GAF
and determining the GAF score (i.e., the traditional paper/pencil GAF, and the
computer-assisted GAF).
Contrary to predictions, the results revealed that for the severe
psychopathology vignette the computer-assisted method had greater inter-rater
variability and worse test-retest reliability. In addition, the computer method yielded
greater inter-rater variability but comparable test-retest reliability for the milder
psychopathology vignette. The means for GAF ratings of the milder
psychopathology vignette were almost identical for both methods of administration.
However, for the severe psychopathology vignette, the means were significantly
different, with the computer-assisted GAF yielding lower GAF scores.
These results imply that the computer method of GAF determination yields greater
inter-rater variability when compared to the traditional method of GAF
administration.
There were a number of limitations of this study that could have impacted the
results. The first is the use of a limited sample since all individuals rating the GAF
were graduate students within the same university setting. This was a threat to the
external validity of the study. A related problem is subject homogeneity, as all
participants had similar backgrounds and were limited in diversity of race, age, and
gender. The most important limitation was that we were unable to compare
participants’ GAF ratings to any sort of “gold standard.”
These results could be related to the amount of information included in the
individual vignettes. Participants reported that the information provided in the severe
psychopathology vignette was vague and incomplete with regard to particular areas
of functioning and patient history. Participants described this as especially problematic
when answering the specific questions generated by the computer-assisted GAF program.
As previously discussed, the GAF report is a computerized program that uses a
decision tree with various yes/no questions. Each question taps into a different aspect
of the patient’s symptom severity and/or functional impairment. Based upon the
answers to the questions, an upper limit of the GAF rating is continually set until the
program determines a final range. Therefore, if the information needed to answer these
specific questions is not available, raters must rely on informal conjecture, which could
account for greater inter-rater variability when using the computer-assisted GAF.
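The upper-limit logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's actual item set: the question wording is hypothetical, paraphrased and abbreviated from the DSM-IV GAF anchors, and combining the symptom and functioning parts by taking the lower ceiling assumes the DSM-IV convention that the final rating reflects the worse of the two. The sketch also shows why missing information matters: any question the rater cannot confirm is treated as "no" here, and that default directly shifts the resulting range.

```python
from typing import Callable, List, Tuple

# Hypothetical, abbreviated yes/no items (NOT the program's actual wording).
# Each item is paired with the upper limit imposed by a "yes" answer and the
# items are ordered from least to most severe, so every "yes" lowers the ceiling.
SYMPTOM_ITEMS: List[Tuple[str, int]] = [
    ("Symptoms more than absent or minimal?", 80),
    ("Symptoms beyond transient, expectable reactions to stressors?", 70),
    ("Moderate symptoms (e.g., occasional panic attacks)?", 60),
    ("Serious symptoms (e.g., suicidal ideation)?", 50),
    ("Behavior considerably influenced by delusions or hallucinations?", 30),
]
FUNCTIONING_ITEMS: List[Tuple[str, int]] = [
    ("Any impairment beyond everyday problems?", 80),
    ("Some difficulty in social or occupational functioning?", 70),
    ("Moderate difficulty (e.g., few friends, conflicts with co-workers)?", 60),
    ("Serious impairment (e.g., no friends, unable to keep a job)?", 50),
    ("Major impairment in several areas (e.g., work, family relations)?", 40),
]

Answerer = Callable[[str], bool]


def ceiling_from(items: List[Tuple[str, int]], answer: Answerer) -> int:
    """Walk the items in order, lowering the upper limit on each "yes"."""
    ceiling = 100
    for question, cap in items:
        if not answer(question):
            break  # the first "no" settles this part; later items are skipped
        ceiling = min(ceiling, cap)
    return ceiling


def gaf_range(symptoms: Answerer, functioning: Answerer) -> Tuple[int, int]:
    """Return the 10-point range implied by the lower (worse) of the two ceilings."""
    ceiling = min(ceiling_from(SYMPTOM_ITEMS, symptoms),
                  ceiling_from(FUNCTIONING_ITEMS, functioning))
    return ceiling - 9, ceiling


def answers(yes: set) -> Answerer:
    """Build an answer function that says "yes" only to the listed questions."""
    return lambda question: question in yes


if __name__ == "__main__":
    sym_yes = {q for q, _ in SYMPTOM_ITEMS[:4]}      # up to "serious symptoms"
    func_yes = {q for q, _ in FUNCTIONING_ITEMS[:3]}  # up to "moderate difficulty"
    print(gaf_range(answers(sym_yes), answers(func_yes)))  # (41, 50)
```

Stopping at the first "no" in each part is one simple way a tree can reach a range with few questions; the real program's branching may differ.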
The present study addressed these limitations by evaluating the reliability of
clinicians’ judgments using two different methods of determining GAF scores of a
high information and low information vignette with equal levels of global functioning
at two different time points. GAF ratings were determined for vignettes by two
recognized experts in the administration and scoring of the GAF in order to establish
“gold standards” for each vignette. Dr. Robert Spitzer and Mimi Gibbons of New
York University completed the expert ratings for each vignette separately, and inter-rater agreement between their specific GAF ratings was perfect. They have numerous
publications on the GAF, assisted in the development of the DSM-IV casebook, and
have over twenty years of experience training countless research assistants and
clinicians in various structured diagnostic interviews that include the GAF. Each
rater’s GAF scores were compared to the “gold standards” to assess the accuracy of
GAF ratings by both means of administration. In addition, participants were all
experienced practicing clinical psychologists as opposed to graduate students.
The use of two information conditions addressed a major concern of our
previous research and answered questions related to whether the amount of
information available regarding the patient’s current psychiatric symptoms, previous
psychiatric history, family history, current and past social and occupational
functioning, and additional sources of information influence GAF scoring. This question is
important because the GAF is used in a variety of settings with varied
populations ranging from homeless individuals to those who have a well-developed
system of social support. Additionally, depending on the degree of impairment,
psychiatric patients may provide limited or inaccurate self-report information.
Furthermore, some individuals may not be able to discuss current symptoms, duration
of symptoms, family history, previous hospitalizations, and social and occupational
histories.
Based upon the results of previous research suggesting that the amount of
information included in the vignettes may directly influence GAF scores, specific
hypotheses were developed accordingly. It was hypothesized that the computer-assisted
GAF would have higher inter-rater agreement and greater test-retest reliability relative
to the paper-and-pencil condition when used with the high-information vignettes, and
greater inter-rater variability and lower test-retest reliability when used with the
low-information vignette. In addition, it was hypothesized that
the computer-assisted GAF administration would result in GAF scores that are closer
to the established “gold standard” in the high information condition. Specific
hypotheses are outlined in the following section.
CHAPTER 2 APPARATUS AND TEST PROCEDURE
2.1. Instrument
The GAF (APA, 1987) is a rating scale from 0 (most severe) to 100 (least
severe), divided into ten rankings ranging from most severe to no symptoms. Each
ranking is accompanied by a behavioral descriptor of functioning and symptom level
ranging from “absent or minimal symptom (e.g., mild anxiety before a group
presentation)....no more than everyday problems” to “persistent danger of severely
hurting self or others... or persistent inability to maintain minimal personal hygiene or
serious suicidal act with clear expectation of death.” In addition, each individual
ranking has a ten-point range (e.g., 41–50). Therefore, the rater must determine which descriptor
best represents a patient’s level of functioning for the specified time period and then
indicate the severity of the problem by assigning a specific score (see Appendix A).
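The two-step paper-and-pencil procedure (choose a descriptor range, then assign a specific score within it) can be sketched as below. This is an illustrative sketch only: it assumes the DSM-IV layout of ten-point ranges 1–10, 11–20, …, 91–100, leaves the handling of a score of 0 out, and uses a hypothetical `position` value in place of the rater's clinical judgment about where the patient falls within the chosen range.

```python
from typing import Tuple


def descriptor_range(score: int) -> Tuple[int, int]:
    """Return the ten-point GAF range (e.g., 41-50) that contains a score of 1-100."""
    if not 1 <= score <= 100:
        raise ValueError("this sketch only handles scores from 1 to 100")
    upper = ((score - 1) // 10 + 1) * 10
    return upper - 9, upper


def score_within_range(band: Tuple[int, int], position: float) -> int:
    """Map a judged position in [0, 1] (0 = most severe end) to a specific score."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 and 1")
    low, high = band
    return low + round(position * (high - low))


if __name__ == "__main__":
    print(descriptor_range(45))               # (41, 50)
    print(score_within_range((41, 50), 0.0))  # 41
    print(score_within_range((41, 50), 1.0))  # 50
```

Separating the two steps makes explicit which part of a rating comes from the descriptor and which from the rater's within-range judgment.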
The computer-assisted GAF (First & Multi-Health Systems Staff, 1997) has
the same basic format as the traditional GAF. However, the computer program
assists the clinician in determining the score by using a decision tree with the first
part of the tree covering the impact of patient’s symptom severity and the second part
covering the impact of the patient’s impairment in functioning. The computerized
GAF is designed to be completed in less than three minutes. The decision
tree determines the GAF rating using the minimum number of questions possible.
The computer program provides a means for clinicians to ensure that all aspects of
functioning (psychological, social, and occupational) are taken into consideration in
patient assessment and that symptom severity and level of functioning are
documented. Furthermore, it is designed to remove the “guesswork” from determining
this rating: throughout the decision tree, explanation screens clarify what is meant by
specific questions and provide examples of patients whose ratings fall in each
range. On completion of the GAF report questions, the
computer automatically determines a 10-point GAF rating range. At that point, a
sliding rating scale appears on the screen, and the rater specifies an exact GAF rating
within the 10-point range using clinical judgment; selecting the explanation button
displays hypothetical patients in that range for comparison. However, there are no
data on the reliability or validity of this method in either the manual or the
literature (see Appendix B).
2.2. Research Design
The study was a 2 (standard vs. computerized method) X 2 (test and retest) X
2 (high vs. low information) factorial design, with each factor being within-subjects.
Each rater evaluated one of two high information vignettes (Vignettes A and B) and
one of two low information vignettes (Vignettes C and D), rating one vignette with the
traditional method of GAF determination and the other with the computer-assisted GAF
program. The rationale for using four vignettes that vary in the amount of information
provided is explained below. Vignettes A and C were the same vignette, varying only in
the degree of information, as were Vignettes B and D. Furthermore, vignette assignment
was randomized among participants so that half the raters evaluated Vignettes A and D
and half the raters evaluated Vignettes B and C. Lastly, an equal number of
participants evaluated each vignette using each of the two methods of GAF
determination.
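A counterbalanced, randomized assignment of this kind can be sketched as follows. The vignette pairs (A with D, B with C) come from the text; the assumption that method allocation is also balanced within each pair group, and the seed value, are illustrative choices of this sketch rather than the study's documented procedure.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Vignette pairs from the design: (high-information, low-information).
VIGNETTE_PAIRS = [("A", "D"), ("B", "C")]
# Assumption: within each pair group, half the raters use paper-and-pencil on the
# high-information vignette (computer on the low) and half use the reverse.
METHOD_PLANS = [("paper", "computer"), ("computer", "paper")]


def assign_raters(n_raters: int, seed: int = 2004) -> List[Dict[str, Tuple[str, str]]]:
    """Randomly assign raters to balanced (vignette pair, method plan) cells."""
    cells = [(pair, plan) for pair in VIGNETTE_PAIRS for plan in METHOD_PLANS]
    if n_raters % len(cells):
        raise ValueError("n_raters must be a multiple of %d" % len(cells))
    slots = cells * (n_raters // len(cells))
    random.Random(seed).shuffle(slots)  # random order, balanced counts
    return [
        {"high": (pair[0], plan[0]), "low": (pair[1], plan[1])}
        for pair, plan in slots
    ]


if __name__ == "__main__":
    plan = assign_raters(36)
    counts = Counter(r[level] for r in plan for level in ("high", "low"))
    print(sorted(counts.items()))  # every (vignette, method) pair appears 9 times
```

Shuffling a list of pre-balanced slots, rather than drawing each rater's condition independently, guarantees equal cell sizes with 36 raters.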
2.3. Participants
Thirty-six doctoral level clinical psychologists participated. All participants
were recruited from clinical facilities in the greater Miami, Florida area, including
programs and clinics associated with the University of Miami and Citrus Health
Network. All participants had at least one year of experience with the administration
of the GAF with psychiatric populations and currently used the GAF in their practice
or clinic. Participants were chosen based on exposure to the diagnostic assessment
tool and willingness to participate in the research protocol.
2.4. Materials
Four fictitious vignettes incorporating social, occupational, and clinical data
representing two distinct psychiatric diagnoses were used. Each vignette included
information obtained during a typical intake evaluation including background
variables (age, sex, etc.), social/developmental history, current psychiatric symptoms,
medication history, and previous psychiatric/psychological care.
Each vignette included relevant background information, behavioral
observations, and clinical findings. The vignettes were developed using information
from the DSM-IV Casebook (APA, 1994) and the DSM-IV (APA, 1994). In
addition, two leading experts in the GAF, Mimi Gibbons, LCSW and Robert Spitzer,
Ph.D., reviewed each vignette and determined GAF scores for each of the four
vignettes. All vignettes had similar demographic information in order to avoid any
confounds that could influence the GAF ratings and compromise comparisons. Two of the
vignettes (vignettes A and B) represented a high information condition and two
vignettes (vignettes C and D) represented a low information condition. The
conditions were defined by the amount of information provided in the vignettes,
including second-party information, occupational functioning, social functioning,
symptom severity, psychiatric history, medication history, and family history.
Vignettes A and C described an individual with significant impairment in all
areas of functioning. These vignettes described someone with severe psychological
symptoms and impaired social and occupational functioning. Vignette A
incorporated a large amount of information regarding presenting problems and
psychiatric history, while vignette C contained less information regarding these
variables. Vignettes A and C were the same vignette, varying only in the degree of
information provided. Both vignettes contained sufficient information to determine
level of current functioning and a GAF score (See Appendices C and D).
Vignettes B and D represented an individual with significant impairment in
most areas of functioning with severe psychological symptoms and impaired social
and occupational functioning. Vignettes B and D were the same vignette, and varied
only by the degree of information provided. Vignette B incorporated a large amount
of information regarding presenting problems and psychiatric history, whereas
Vignette D contained less information regarding these variables. Additionally, each
vignette contained enough information to determine level of current functioning and a
GAF score (See Appendices E and F).
Four vignettes were used as a control factor because the research design is
within-subjects and there are two methods of GAF determination. Therefore, the same
vignette could not be used for each of the two conditions because GAF ratings in one
condition would directly influence GAF ratings made using the second method of
administration. In addition, results could be attributed to peculiarities of the vignette
in question. Therefore two versions of each vignette were used, and were presented to
subjects in counterbalanced order. It is important to note that the distinction between
the vignettes was the amount of information provided in the vignettes and not the
severity of psychopathology they represented (i.e., mild versus severe).
2.5. Procedure
Following participant selection, raters underwent a brief 15-minute overview
of the procedures involved in using the computerized version of the GAF. One high
information and one low information vignette were evaluated by each rater according
to two methods of administering and determining the GAF score. Method A
consisted of the traditional paper/pencil version in which the rater determines the
client’s GAF score based on the individual’s psychological, social, and occupational
functioning. Based upon the information at hand and their own clinical judgment
they determined the particular range (behavioral descriptor) that best described the
client. The clinician then determined a specific score that indicated the client’s
severity within that particular range.
Method B consisted of the computer-assisted GAF (First & Multi-Health
Systems Staff, 1997) in which assessment questions related to psychological, social,
and occupational functioning were presented to the clinician in a yes/no format. A
decision tree was then used to guide the clinician and determine a GAF rating using
the minimum number of questions possible. Therefore, the clinician only needed to
click on the yes icon or the no icon and the assessment tool generated the GAF 10-point range based upon the clinician’s responses. The clinician then determined an
exact GAF score using the information icon, which provided hypothetical
comparisons with other patients in that range.
One week following initial participation in the study, each rater re-evaluated
the same two vignettes in order to assess test-retest reliability. The raters re-evaluated
each vignette using the same method of GAF determination previously utilized.
2.6. Hypotheses and Statistical Analyses
1. It was hypothesized that the computer-assisted GAF, relative to the paper-and-pencil method, would have greater correspondence with the “gold standard” GAF
score when used with the high-information vignette relative to the low information
vignette.
In order to address this hypothesis, a 2 (method: computer-assisted GAF vs.
the traditional paper and pencil GAF) by 2 (level of information: high information
vignette vs. low information vignette) repeated-measures ANOVA was used to
analyze the data. A difference score was calculated between the GAF ratings and the
“gold standard” for each subject in each condition and the difference score served as
the dependent variable for this analysis. A significant interaction effect was predicted
between the two independent variables (i.e., method and level of information).
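Because both factors have only two levels, each effect in a 2 × 2 repeated-measures ANOVA has one numerator degree of freedom, and its F equals the squared one-sample t of a per-subject contrast score. The sketch below uses that equivalence. It assumes every participant contributes a (time-collapsed) difference score to all four cells of the 2 × 2 layout, and the data in the usage example are simulated, not study values.

```python
import numpy as np


def rm_anova_2x2(y):
    """F tests for a fully within-subjects 2 x 2 design.

    y has shape (n_subjects, 2, 2), indexed [subject, method, information].
    Each effect is tested with its per-subject contrast score: F = t**2,
    with df = (1, n - 1).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    contrasts = {
        # difference between method marginal means, per subject
        "method": y[:, 0, :].mean(axis=1) - y[:, 1, :].mean(axis=1),
        # difference between information marginal means, per subject
        "information": y[:, :, 0].mean(axis=1) - y[:, :, 1].mean(axis=1),
        # difference of differences: does the information effect depend on method?
        "interaction": (y[:, 0, 0] - y[:, 0, 1]) - (y[:, 1, 0] - y[:, 1, 1]),
    }
    results = {}
    for name, c in contrasts.items():
        t = c.mean() / (c.std(ddof=1) / np.sqrt(n))
        results[name] = (float(t ** 2), 1, n - 1)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated discrepancy scores for 36 raters (hypothetical, not study data).
    y = rng.normal(0.0, 5.0, size=(36, 2, 2))
    y[:, 0, 0] += 4.0  # inject an effect into one cell only
    for effect, (F, df1, df2) in rm_anova_2x2(y).items():
        print(f"{effect}: F({df1}, {df2}) = {F:.2f}")
```

Subject-level shifts cancel out of every contrast, which is why the within-subjects test removes stable between-rater differences.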
2. It was hypothesized that relative to the traditional paper-and-pencil method,
the computer-assisted GAF would have higher inter-rater agreement when used with
the high information condition when compared to the low information condition.
The variability within each condition served as the dependent variable and
Mauchly’s test of sphericity was used to assess the homogeneity of variance across
conditions. It was predicted that this test would reach statistical
significance, indicating a difference in variances between the two methods of GAF
determination in each information condition.
3. It was hypothesized that relative to the traditional paper-and-pencil method,
the computer-assisted GAF administration would result in greater test-retest
reliability in the high information condition when compared to the low-information
condition.
Cronbach’s alpha coefficients were computed for subjects’ initial GAF ratings and the
ratings they made one week later. This analysis resulted in 4 separate coefficients,
one for each combination of method and level of information.
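For test-retest data, Cronbach's alpha treats the two rating occasions as k = 2 "items". The sketch below implements the standard formula; the ratings in the usage example are hypothetical and only show how a single rater with a discrepant retest score can pull the coefficient down.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for an (n_raters, k_occasions) array of ratings.

    alpha = k / (k - 1) * (1 - sum of occasion variances / variance of totals)
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    occasion_var = x.var(axis=0, ddof=1).sum()  # variance of each occasion, summed
    total_var = x.sum(axis=1).var(ddof=1)       # variance of each rater's total
    return float(k / (k - 1) * (1.0 - occasion_var / total_var))


if __name__ == "__main__":
    # Hypothetical initial and one-week retest GAF ratings (not study data).
    consistent = [[40, 41], [45, 44], [50, 51], [55, 54]]
    with_outlier = consistent + [[60, 20]]  # one rater with a very low retest score
    print(round(cronbach_alpha(consistent), 3))
    print(round(cronbach_alpha(with_outlier), 3))
```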
Although these hypotheses are stated as directional, the theory behind them does not
firmly support directional predictions. Therefore, 2-tailed tests of significance were used in the
statistical analyses.
2.7. Power Analysis
A power analysis based upon a 2 by 2 ANOVA was used to determine the
number of subjects needed to demonstrate statistical significance. A medium effect size
(f = .25), based on Cohen’s conventions for a 2 by 2 ANOVA, was used because no prior
research using this type of comparison is available within the literature. Using a
power analysis program, Sample Power (Borenstein, Rothstein, & Cohen, 2001), a
total sample of 35 participants was identified as necessary to obtain a power of .80,
with alpha set at .05.
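The trade-off between sample size and power for a single within-subjects contrast can be illustrated with a normal approximation, as sketched below. This is not a reproduction of the Sample Power calculation: it is parameterized by dz, the standardized mean of per-subject contrast scores (an assumption of this sketch, not the Cohen's f used in the text), and the normal approximation slightly overstates power relative to the exact noncentral t distribution.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(dz: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power of a one-sample test on contrast scores."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = dz * sqrt(n)  # expected test statistic under the alternative
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


def required_n(dz: float, target: float = 0.80, alpha: float = 0.05,
               n_max: int = 10_000) -> int:
    """Smallest n whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if approx_power(dz, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    print(round(approx_power(0.5, 35), 3))
    print(required_n(0.5))
```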
CHAPTER 3 RESULTS
Thirty-six participants assigned GAF scores to two of the four vignettes. Descriptive
data are provided in Table 1 and Figure 1. Overall, the mean scores for the vignettes
were in the moderate-to-severe range.
It was hypothesized that the computer-assisted GAF, relative to the paper-and-pencil method, would have greater correspondence with the “gold standard” GAF
score, especially in the high information condition relative to the low information
condition. In other words, a 2-way interaction (method by level of information) was
predicted. In order to address this hypothesis, a 2 (method: computer-assisted GAF
vs. the traditional paper and pencil GAF) by 2 (high vs. low information) repeated-measures analysis of variance (ANOVA) was conducted to examine differences
among the GAF scores. For this analysis, scores at the two assessment time points
were collapsed, thereby yielding a single data point for each participant for each of
the experimental conditions. The dependent variable consisted of the difference score
between the gold standard for a given condition and the participant’s score for that
condition. That is, the gold standard score was subtracted from each participant’s
score, depending on vignette (40.5 for Vignettes A and C and 47.5 for Vignettes B
and D; see Figures 2 and 3).
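The discrepancy score described above can be computed as in the sketch below. The gold-standard values are those reported in the text. Averaging the test and retest ratings is this sketch's assumption about what "collapsed" means, and the `absolute` flag is a hypothetical option, since the text specifies subtracting the gold standard but not whether signed or absolute differences entered the ANOVA.

```python
# Gold-standard ratings reported in the text, keyed by vignette.
GOLD_STANDARD = {"A": 40.5, "C": 40.5, "B": 47.5, "D": 47.5}


def discrepancy(vignette: str, test: float, retest: float,
                absolute: bool = False) -> float:
    """Participant score minus gold standard, after collapsing the two time points.

    Collapsing is done here by averaging test and retest (an assumption).
    """
    collapsed = (test + retest) / 2.0
    diff = collapsed - GOLD_STANDARD[vignette]
    return abs(diff) if absolute else diff


if __name__ == "__main__":
    print(discrepancy("B", 50, 48))                 # 1.5
    print(discrepancy("C", 38, 40))                 # -1.5
    print(discrepancy("C", 38, 40, absolute=True))  # 1.5
```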
The ANOVA indicated no main effect for method, F(1, 35) = 0.50, p > .05, or for level of information, F(1, 35) = 3.06, p > .05.
However, consistent with predictions, the method by information interaction reached
significance, F(1, 35) = 6.49, p < .05. In order to clarify the interaction,
Tukey post-hoc tests were conducted to examine the various means at each point.
There was a significant difference in GAF scores between the low and high information
vignettes for the paper-and-pencil method, F(2, 69) = 8.88, p < .001, with
the high information condition resulting in higher discrepancy scores than the low
information condition. Consistent with predictions, there was also a significant
difference between the paper-and-pencil administration and computer-assisted
administration for the high information vignette condition, F(1, 35) = 5.02, p < .05,
with the paper-and-pencil condition yielding higher discrepancy scores than
the computer condition. Figure 4 illustrates this interaction. No other differences
were identified among the other group means.
The second hypothesis concerned the degree to which the variance in each of
the GAF administration conditions and level of information conditions may have
differed. The means and standard deviations of the participants’ GAF scores for each
condition were as follows: Paper-and-Pencil/Low Information Condition: M = 42.05,
SD = 7.65; Computer-Assisted/Low Information Condition: M = 44.35, SD = 10.78;
Paper-and-Pencil/High Information Condition: M = 48.61, SD = 6.16; and
Computer-Assisted/High Information Condition: M = 43.81, SD = 11.84. The relevant
assumption for examining the equality of variance in repeated-measures ANOVA is
sphericity, which can be thought of as an extension of the homogeneity of variance
assumption in independent-measures ANOVA (Cohen & Cohen, 1983). The sphericity
estimates for the method and information main effects, as well as for the method by
information interaction, were all 1, indicating that the variances were equal and not
statistically different.
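The spread that the second hypothesis concerns can be summarized directly from the standard deviations reported above, as in the sketch below. The variance ratios are a descriptive addition rather than an analysis reported in the study, and a formal comparison would need a test suited to how the ratings were paired (for example, Levene's test if the groups are independent). The sketch also computes the Greenhouse-Geisser sphericity estimate: for a factor with only two levels there is a single difference score, so this estimate equals 1 by construction, whatever the variances are.

```python
import numpy as np

# Standard deviations reported in the text for each method x information cell.
REPORTED_SD = {
    ("paper", "low"): 7.65,
    ("computer", "low"): 10.78,
    ("paper", "high"): 6.16,
    ("computer", "high"): 11.84,
}


def variance_ratio(sd_a: float, sd_b: float) -> float:
    """Larger variance divided by smaller variance (1.0 means equal spread)."""
    va, vb = sd_a ** 2, sd_b ** 2
    return max(va, vb) / min(va, vb)


def greenhouse_geisser_epsilon(cov) -> float:
    """Greenhouse-Geisser epsilon from a k x k covariance matrix of repeated measures.

    Uses the double-centred covariance matrix S*:
    epsilon = trace(S*)**2 / ((k - 1) * sum of squared entries of S*).
    """
    s = np.asarray(cov, dtype=float)
    k = s.shape[0]
    centre = np.eye(k) - np.ones((k, k)) / k
    s_star = centre @ s @ centre
    return float(np.trace(s_star) ** 2 / ((k - 1) * np.sum(s_star ** 2)))


if __name__ == "__main__":
    for info in ("low", "high"):
        r = variance_ratio(REPORTED_SD[("computer", info)], REPORTED_SD[("paper", info)])
        print(f"{info} information: computer/paper variance ratio = {r:.2f}")
    # With only two levels, epsilon is 1 regardless of the covariance matrix.
    print(greenhouse_geisser_epsilon([[58.5, 10.0], [10.0, 116.2]]))
```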
The third hypothesis examined the one-week test-retest reliability of each
method of GAF administration across each information condition. Table 2 presents
the Cronbach alphas for each pair, as well as a qualitative description of the
magnitude of the reliability coefficient. Most of the reliability coefficients from initial
testing to retest were good to excellent; the exception was the coefficient for the
computer GAF scores for the low information version of Vignette A (i.e., Vignette C), which could be
due to one very low score. When the extreme outlier was removed, the alpha
increased from .58 to .70.
CHAPTER 4 DISCUSSION
The Global Assessment of Functioning Scale (GAF) is a commonly used
measure of overall severity of psychiatric disturbance and associated functional
impairment. Despite its widespread use, there has been relatively little empirical
research on the GAF, particularly related to its psychometric properties. In addition,
the GAF combines three areas of functioning that do not necessarily covary (i.e.,
psychological, social, and occupational functioning) into a single measure, and
excludes the impact of physical impairments (Skodal, 1988; Goldman et al., 1992;
Roy-Byrne et al., 1996). Several researchers have identified these features as
important problems with the GAF. For example, a depressed individual may
experience a reduction in symptoms through medication, but continue to have
strained relationships with others and difficulty maintaining job stability, whereas
another individual with depression might be able to maintain employment but
experience occasional bouts of depression.
Prior research has highlighted several problems with the paper-and-pencil
GAF, including its lack of structure, poor reliability, and the fact that it is unclear
with respect to how to integrate the three areas of functioning into a single rating. In
an effort to address these limitations, First and colleagues at Multi-Health Systems,
Inc. created the computer-assisted GAF in 1997. The program consists of assessment
questions related to psychological, social, and occupational functioning presented in a
yes/no format. A decision tree is then used to guide the clinician and determine a
GAF rating using the minimum number of questions possible. The assessment tool
generates a 10-point range based upon the clinician’s responses, and the clinician
then determines an exact GAF score within that range. Only one study has examined
the psychometric properties of the computer-assisted GAF. Woldoff and Herbert
(2001) evaluated the reliability of clinical psychology graduate students' judgments
using the traditional and the computer-assisted methods of determining GAF scores
of two different case vignettes. One vignette represented an individual
with severe psychopathology, whereas the other vignette described an individual
with mild psychopathology. These vignettes were evaluated at an initial testing
and again at a one week retest using two methods of administering the GAF and
determining the GAF score (i.e., the traditional paper/pencil GAF, and the
computer-assisted GAF). The results revealed that for the severe psychopathology
vignette the computer-assisted method had greater inter-rater variability and worse
test-retest reliability. In addition, the computer method yielded greater inter-rater
variability but comparable test-retest reliability for the milder psychopathology
vignette. The means for GAF ratings of the milder psychopathology vignette were
almost identical for both methods of administration. These results implied that the
computer method of GAF determination yields greater inter-rater variability when
compared to the traditional method of GAF administration. However, we did not
compare participants’ GAF ratings to any sort of “gold standard” ratings.
4.1 Psychometric Properties of the Computerized GAF
The present study extended the existing body of research on the GAF in
several ways. It assessed the possibility of using a computer-assisted GAF
administration instead of the traditional paper-and-pencil GAF administration.
Moreover, it explored the potential impact that the amount of available information
regarding treatment history, symptoms, and family history had upon GAF ratings. In
addition, this study evaluated the accuracy of GAF ratings relative to expert ratings
using the two methods of GAF administration.
A primary goal of this study was to examine the relationship of practicing
clinicians’ ratings to “gold standard” scores as determined by two renowned
GAF experts. Gold standard scores provided a means of evaluating these two
measures that has not appeared in the literature to date. Results indicated that for the
traditional paper-and-pencil GAF administration method, the low information
condition was closer to the gold standard score than was the high information
condition. This finding may have occurred because the paper-and-pencil format lacks
structure in the form of specific instructions regarding the determination of an actual
GAF score. Therefore it is possible that greater amounts of available information
created greater confusion for clinicians as to how to incorporate the information into a
single composite rating.
Consistent with predictions, the paper-and-pencil condition resulted in
greater discrepancy scores than the computer-assisted administration in the high
information condition only; the two methods did not differ significantly from one
another in the low information condition. Nevertheless, there was a nonsignificant
trend toward higher discrepancy scores for the paper-and-pencil method in the low
information condition as well. This pattern of findings could be related to the
increased structure provided by the computer-assisted GAF program. Specifically,
because structured questions and the decision tree model remove some of the
“guesswork,” the computerized method may allow greater consistency across ratings.
In addition, the computer-assisted GAF forces the clinician to take all three areas of
functioning into account and determines the appropriate GAF range based on the
clinician's responses. This increased structure is especially relevant when a relatively
large amount of information is available, as is the case in most actual clinical
situations.
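The idea of forcing all three areas of functioning into the final range can be illustrated with a hypothetical sketch. The program's actual questions and branching rules are not described in this thesis, so the function below is not First's (1997) algorithm: it assumes the clinician has already chosen a 10-point range for each area (psychological, social, and occupational) and, as an illustrative integration rule, reports the most impaired of the three. The names `DomainRatings` and `composite_range` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DomainRatings:
    """Lower bound of the 10-point GAF range judged to fit each area.

    Hypothetical structure: e.g., 41 means the 41-50 range was selected.
    """
    psychological: int
    social: int
    occupational: int


VALID_LOWER_BOUNDS = set(range(1, 100, 10))  # 1, 11, 21, ..., 91


def composite_range(r: DomainRatings) -> tuple[int, int]:
    """Assumed rule: the composite range is the lowest (most impaired) range
    selected for any single area, so no area can be ignored."""
    chosen = (r.psychological, r.social, r.occupational)
    for bound in chosen:
        if bound not in VALID_LOWER_BOUNDS:
            raise ValueError(f"{bound} is not the lower bound of a GAF range")
    lowest = min(chosen)
    return lowest, lowest + 9


# Symptoms in the 51-60 range, but occupational functioning rated 31-40.
print(composite_range(DomainRatings(psychological=51, social=41, occupational=31)))
# prints (31, 40)
```

Taking the minimum is only one possible design choice. It makes the composite sensitive to impairment in any single area, but it is an assumption made for illustration rather than a documented feature of the program.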
Although both methods of GAF administration demonstrated adequate one-week
test-retest reliability, the computer-assisted GAF resulted in greater inter-rater
variability than the traditional paper-and-pencil method of administration, as indicated
by the standard deviations for both the high and low information conditions. This
finding appears to be related to outlier scores in the computer-assisted condition,
which decreased the reliability coefficients. When the outliers were removed from the
analysis, the standard deviations decreased and the reliability coefficients increased
from poor to acceptable. These results could be related to the procedural components
of the computer-assisted and paper-and-pencil GAF. Although most participants
reported finding the computerized GAF helpful, a small number reported that some of
the computer-assisted questions were confusing and repetitive. Specifically, they
stated that the decision tree model occasionally presented the same question more than
once, which seemed to suggest that they should change their initial response. This
feature of the program is intended to provide a reliability check when responses to
questions within the same domain disagree. Although not technically a program flaw,
it may confuse users and should be addressed in future revisions of the program.
As previously discussed, the computerized program poses questions that tap
different aspects of the patient’s symptom severity and functional impairment that
figure into the GAF score. When the information needed to answer these specific
questions is not available, raters must rely on informal conjecture, which could account
for the greater inter-rater variability observed with the computer-assisted GAF.
Participants also reported confusion about how to select a 10-point range on the
paper-and-pencil GAF when not all of the indicators identified for that range were
explicitly discussed in the vignette.
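The effect of outlier ratings on inter-rater variability, discussed earlier in this section, can be demonstrated with invented numbers. The sketch below computes the sample standard deviation of eight hypothetical ratings of one vignette before and after removing a single outlying rater. The thesis does not report its criterion for identifying outliers, so the two-standard-deviation cutoff (`z_cutoff`) is an assumption.

```python
from statistics import mean, stdev


def drop_outliers(ratings, z_cutoff=2.0):
    """Remove ratings more than z_cutoff sample SDs from the mean.

    The cutoff is an illustrative assumption, not the study's criterion.
    """
    m, s = mean(ratings), stdev(ratings)
    return [r for r in ratings if abs(r - m) <= z_cutoff * s]


# Invented GAF ratings of one vignette: seven raters agree closely, one does not.
ratings = [45, 48, 42, 46, 44, 47, 43, 15]
trimmed = drop_outliers(ratings)

print(f"SD with outlier:    {stdev(ratings):.2f}")  # 10.79
print(f"SD without outlier: {stdev(trimmed):.2f}")  # 2.16
```

With very small groups of raters, a z-score rule like this one can miss even extreme values, because the largest possible |z| in a sample of n ratings is (n - 1)/√n; for eight raters that bound is about 2.47, only just above the cutoff used here.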
4.2 Clinical Versus Actuarial Decision Making
Differences in the raters’ concept of mental health and the procedural
differences between the computer-assisted GAF and the traditional GAF may have
contributed to the present results. Although all raters were classified as experienced
clinicians on the basis of their education, training, and professional experience,
research has shown that experience is only weakly associated with the accuracy of
clinical judgment (Dawes, Faust, & Meehl, 1989). Research on clinical decision making
has contrasted two approaches, clinical and actuarial. In the actuarial approach, a
decision is made using empirically established relations between data and the event of
interest, independent of human judgment. In the clinical approach, decisions are based
on the mental processes of the human judge.
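The actuarial approach can be made concrete with a minimal sketch: a prediction formula is estimated once from past outcome data and then applied mechanically to every new case. The records, the outcome (length of hospital stay), and the helper names below are hypothetical and serve only to illustrate the idea; ordinary least-squares regression is used as one simple way of establishing the empirical relation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical historical records: admission GAF and observed length of stay
# in days (invented for illustration; not data from this or any cited study).
past_gaf = [25, 30, 35, 40, 45, 50]
past_days = [20, 17, 15, 12, 10, 7]

# Step 1: establish the relation empirically, once, from outcome data.
intercept, slope = fit_line(past_gaf, past_days)


def actuarial_prediction(gaf):
    """Step 2: apply the same fixed formula to every new case; no
    case-by-case human judgment enters the prediction."""
    return intercept + slope * gaf


for gaf in (30, 45):
    print(f"GAF {gaf}: predicted stay {actuarial_prediction(gaf):.1f} days")
```

In the clinical approach, a clinician would instead weigh the same information mentally for each patient; the defining feature of the actuarial version is that identical inputs always yield identical predictions.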
The literature has shown that in many decision-making domains, actuarial
models predict outcomes more accurately than trained human judges. In addition, a
number of researchers have concluded that professional identity is unrelated to the
accuracy of clinical judgment (Meehl, 1986; Dawes et al., 1989; McCauley, 1991).
Moreover, research has shown that psychologists’ training and clinical experience may
actually produce an exaggerated sensitivity to psychopathology.
Second, clinical judgment has been shown to be susceptible to various
biases, whereas actuarial judgment is not. Garb and Schramke (1996) reviewed this
literature, using meta-analytic methods to examine biases in the clinical judgments of
psychologists and psychiatrists. Black and Hispanic patients were more likely to be
misdiagnosed with schizophrenia, middle-class patients were more likely than
lower-class patients to be referred to outpatient psychotherapy, and Black patients were
more likely to be prescribed antipsychotic medication. The authors concluded that
there is little evidence that increased experience results in increased accuracy of
clinical prediction.
In light of this research, the present results may be related to the
methodological differences between the two formats. Specifically, the traditional GAF
relies heavily on clinical judgment and is therefore vulnerable to judgment errors such
as regression toward the mean, overconfidence, and hindsight bias. In contrast, the
computer-assisted GAF uses a more actuarial approach to determining a GAF score.
4.3 Clinical Implications
Overall, these results support the reliability and clinical utility of the
GAF. They suggest that although both methods can be used reliably by clinical
psychologists, the computer-assisted GAF appears to generate scores that are more
accurate, at least in the sense of being closer to “gold standard” scores provided by
expert diagnosticians, particularly when a large amount of information is available. A
notable strength of the computer-assisted GAF is that it ensures that all aspects of
functioning (psychological, social, and occupational) are explicitly considered (First,
1997). This feature is important because the GAF is used in a variety of settings with
populations ranging from homeless individuals to those with a highly developed system
of social support. In addition, the computer-assisted GAF is time-efficient, easy to
administer, and facilitates record keeping. In real-world practice, its most significant
drawbacks are the cost of the program and the need for a computer, given the limited
resources of many clinical settings.
The present findings are noteworthy for several reasons. Providers and
consumers of mental health services are interested in both symptoms and functioning.
In addition, because mental health providers and managed care organizations use the
GAF to justify insurance reimbursement for services and to demonstrate the efficacy of
treatment interventions, accurate and consistent measurement tools are clearly needed.
Moreover, in the United States, all patients discharged from a psychiatric
hospitalization are required to have a GAF score recorded as part of the discharge
process. If GAF scores continue to be assigned in an unstructured manner because of
confusion over how to integrate the potentially disparate contributions of a patient’s
psychiatric symptoms and functioning into a single rating, it may be difficult to ensure
an appropriate standard of care. In addition, a reliable and valid measure of symptoms
and functioning is critical to clinicians’ efforts to document the effectiveness of their
interventions.
A transition to the computer-assisted GAF program would require numerous
changes in policy, education, and training in mental health agencies. First, training
would be required to ensure that all clinicians understand the computer-assisted GAF
and the procedural components of the program. In addition, practice sessions using
videotaped case presentations would be recommended to familiarize clinicians with the
program and its intended use. Second, a major change in policy would be required so
that GAF scores are no longer assigned to meet managed care requirements.
Specifically, clinicians would be permitted to give GAF scores that reflect the client's
actual functioning without being penalized by the managed care provider when the
score falls outside the range designated by the service criteria. Lastly, policies would
have to be modified to accommodate the cost of the program, training, and changes in
how GAF scores are used to determine treatment eligibility. With further research and
health-care reform, a transition to the computer-assisted GAF program holds promise.
4.4 Strengths and Limitations
These results should be interpreted with caution given several limitations of
the study. First, the results may be limited by the relatively small sample of clinicians,
most of whom were female and Latino, from a small yet diverse community in South
Florida. The small sample size may have limited statistical power for some tests. In
addition, the clinicians’ individual backgrounds and graduate training differed. To the
extent that graduate training affects GAF ratings, responses may have differed
depending on training, clinical orientation, and area of expertise; however, the sample
size was insufficient to examine the potential effects of such variables. GAF ratings
may also have been adversely affected by the fact that participants differed in their
level of experience and expertise with the GAF, and no specific training was provided
to enhance reliability. Lastly, the sample may not have represented the wide array of
mental health professionals who use the GAF on a regular basis, such as social
workers, mental health technicians, and nursing staff.
Another potential limitation of this study was the use of fictitious vignettes
that represented only one gender, race, and age group. Raters' biases regarding these
demographic characteristics may have affected GAF ratings. In addition, the vignettes
may have omitted some of the relevant information that is typically obtained in an
actual clinical intake. Future research might use transcripts of clinician-patient
interviews recorded in clinical settings, which would more closely replicate the
conditions under which GAF ratings are actually made. Moreover, the vignettes
represented mild-to-moderate levels of psychiatric disturbance similar to what would be
found at a community outpatient clinic. Such a symptom profile may limit the
generalizability of these results, and the findings should be replicated with vignettes
representative of an inpatient population.
A final possible limitation concerns the specific questions used in the
computer-assisted program. No information is available about how the individual
questions were developed or how they were selected for inclusion in the decision tree
model. The program's author reports only that he completed his residency under Robert
Spitzer in the Biometrics Research Department at Columbia University and has a
doctoral degree in Computer Science. In addition, the authors of the program make the
rather dubious claim that “psychometric data is not needed as the program is a
diagnostic tool based upon the DSM and not a test” (First et al., 1997, p. 4).
Despite these limitations, this study has several noteworthy strengths. It
was the first to evaluate the psychometric properties of the GAF in relation to the
amount of available information. This is important because the GAF is used in a wide
array of settings with varied patient populations. In addition, unlike other studies, this
study obtained its reliability findings from raters without formal training in the use of
the GAF, which enhances the external validity of the findings as a realistic estimate of
the reliability achieved by clinical psychologists in real-world practice. Another
significant strength is that two recognized experts in the administration and scoring of
the GAF rated each vignette to establish “gold standard” scores.
4.5 Conclusions and Future Directions
In summary, the results of this study support the previous finding that the
GAF can be rated reliably when used with mildly to moderately impaired individuals.
Future studies are needed to address the aforementioned shortcomings using more
clinically typical stimuli with varying levels of psychopathology and a larger, more
diverse sample. In addition, it is necessary to evaluate whether the computer-assisted
GAF provides information independent of Axes I and II of the DSM-IV diagnostic
system and whether it captures information about patients’ social and occupational
functioning that is independent of clinical judgment about the severity of their
psychological symptoms. Moreover, future research is needed to evaluate the reliability
and validity of the proposed GAF replacements, including the computerized GAF,
SOFAS, GARF, and K-Axis, to determine whether they improve on the existing Axis V.
Lastly, future research needs to ascertain whether the GAF and these proposed
replacements can be used effectively as outcome measures.
The present findings are consistent with the conclusion that the GAF is a
worthwhile tool, and that the computer-assisted GAF may be an improvement over
the traditional paper-and-pencil GAF in some circumstances. The traditional GAF
lacks structure and depends heavily on clinical judgment and conjecture. In
addition, the directions do not explicitly state how to integrate the three areas of
functioning into a single composite rating. The computer-assisted GAF mitigates
many of these limitations by providing a structured set of questions that address each
area of functioning, ensuring that every area is considered in the composite score.
Furthermore, the computer-assisted GAF follows a decision tree model that
determines the appropriate GAF range based upon the responses of the clinician.
Further research is needed to evaluate the clinical utility of the GAF relative to other
recently developed measures.
LIST OF REFERENCES
American Psychiatric Association. (1987). Diagnostic and Statistical
Manual of Mental Disorders (3rd ed., revised) (DSM-III-R). Washington, DC: APA.
American Psychiatric Association. (1994). Diagnostic and Statistical
Manual of Mental Disorders (4th ed.) (DSM-IV). Washington, DC: APA.
Bodlund, O., Kullgren, G., Ekselius, L., Lindstrom, E., & von Knorring, L.
(1994). Axis V - Global assessment of functioning scale: Evaluation of a self-report
version. Acta Psychiatrica Scandinavica, 90, 342-347.
Borenstein, M., Rothstein, H., & Cohen, J. (2001). Sample Power [Computer
software]. Chicago, Illinois: SPSS Inc.
Byrne, P.R., Dagadakis, C., Unutzer, J., & Ries, R. (1996). Evidence for the
limited validity of the revised global assessment of functioning scale. Psychiatric
Services, 47, 864-866.
Calvocoressi, L., Libman, D., Vegso, S.J., McDougle, C.J., & Price, L.H. (1998).
Global functioning of inpatients with obsessive-compulsive disorder, schizophrenia,
and major depression. Psychiatric Services, 49, 379-381.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation
analysis for the behavioral sciences (2nd ed.). New Jersey: Lawrence Erlbaum
Associates, Inc.
Dawes, R.M., Faust, D., & Meehl, P.E. (1989). Clinical versus actuarial
judgment. Science, 243, 1668-1674.
Dworkin, R.J., Friedman, L.C., & Telschow, R.L. (1990). The longitudinal
use of the global assessment scale in multiple rater situations. Community Mental
Health Journal, 26, 335-344.
Endicott, J., Spitzer, R.L., & Fleiss, J.L. (1976). The global assessment scale.
Archives of General Psychiatry, 33, 766-771.
First, M. (1997). A DSM-IV Program for Windows: The GAF Report
(Computer software and manual). Toronto, Canada: Multi-Health Systems.
Garb, H. N., & Schramke, C. J. (1996). Judgment research and
neuropsychological assessment: A narrative review and meta-analyses. Psychological
Bulletin, 120, 140–153.
Goldman, H.H., Skodol, A.E., & Lave, T.R. (1992). Revising Axis V for
DSM-IV: A review of measures of social functioning. American Journal of
Psychiatry, 149, 1148-1156.
Gordon, R.E. & Gordon, K.K. (1985). Predicting length of hospital stay of
psychiatric patients. American Journal of Psychiatry, 142, 235-237.
Gordon, R.E. & Gordon, K.K. (1987). Relating axes IV and V of DSM-III to
clinical severity of psychiatric disorders. Canadian Journal of Psychiatry, 32, 423-424.
Gordan, R.E., Skodol, A.E., & Lave, T.R. (1992). Revising axis V for DSM-IV: A review
of measures of social functioning. American Journal of Psychiatry, 149, 1148-1156.
Hall, R.C. (1995). Global assessment of functioning: A modified scale.
Psychosomatics, 36, 267-275.
Higgins, J. & Purvins, K. (2000). A comparison of the Kennedy Axis V and
the Global Assessment of Functioning Scale. Journal of Psychiatric Practice, 6, 84-90.
Hilsenroth, M.J., Ackerman, S.J., Blagys, M.D., Baumann, B.D., Baity, M.R.,
Smith, S.R., Price, J.L., Smith, C.L., Heindselman, T.L., Mount, M., & Holdwick,
D.J. (2000). Reliability and Validity of DSM-IV Axis V. American Journal of
Psychiatry, 157, 1858-1863.
Jones, S.H., Thornicroft, G., Coffey, M., & Dunn, G. (1995). A brief mental
health outcome study: Reliability and validity of the global assessment of functioning
(GAF). British Journal of Psychiatry, 166, 654-659.
Kennedy, J.A. (2003). Mastering the Kennedy Axis: A New Psychiatric
Assessment of Patient Functioning. Washington, DC: American Psychiatric
Publishing.
Luborsky, L. (1962). Clinicians' judgments of mental health. Archives of
General Psychiatry, 7, 407-417.
Luborsky, L. & Bachrach, H. (1974). Factors influencing clinician's
judgments of mental health. Archives of General Psychiatry, 31, 292-299.
McCauley, C. (1991). Selection of National Science Foundation graduate
fellows: A case study of psychologists failing to apply what they know about decision
making. American Psychologist, 46, 1287-1291.
Meehl, P.E. (1986). Causes and effects of my disturbing little book. Journal of
Personality Assessment, 50, 370-375.
Mezzich, J.E., Evanczuk, K.J., Mathias, R.J., & Coffman, G.A. (1984).
Admission decisions and multiaxial diagnosis. Archives of General Psychiatry, 41,
1001-1004.
Moos, R. H., McCoy, L., & Moos, B. S. (2000). Global assessment of
functioning ratings: Determinants and role as predictors of one-year treatment
outcomes. Journal of Clinical Psychology, 56(4), 449-461.
Moos, R.H., Nichol, A.C., & Moos, B.S. (2002). Global assessment of
functioning ratings and the allocation and outcomes of mental health services.
Psychiatric Services, 53, 730-737.
Patterson, D. & Shin-Lee, M. (1995). Field trial of the global assessment of
functioning scale-modified. American Journal of Psychiatry, 152, 1386-1388.
Phelan, M., Wykes, T., & Goldman, H. (1994). Global functioning scales: A
review. Social Psychiatry and Psychiatric Epidemiology, 29, 205-211.
Piersma, H.L. & Boes, J.L. (1997). The GAF and psychiatric outcome: A
descriptive report. Community Mental Health, 46, 117-121.
Roy-Byrne, P., Dagadakis, C., Unutzer, J., & Ries, R. (1996). Evidence for
limited validity of the revised global assessment of functioning scale. Psychiatric
Services, 47(9), 864-866.
Skodol, A.E., Link, B.G., Shrout, P.E., & Horwath, E. (1988). The revision of
Axis V in DSM-III-R: Should symptoms be included? American Journal of
Psychiatry, 145, 825-829.
Thompson, J.W., Burns, B. J., Goldman, H. H., & Smith, J. (1992). Initial
level of care and clinical status in managed mental health programs. Hospital and
Community Psychiatry, 43, 599-603.
Williams, J. B. (1985). The multiaxial system of DSM-III: Where did it come
from and where should it go? Archives of General Psychiatry, 42, 175-180.
Woldoff, S. B. & Herbert, J.D. (2001). Reliability and Validity of the GAF
Using Two Methods of GAF Determination (Master's thesis, MCP Hahnemann
University, 2001). Manuscript in preparation.
APPENDIX A TRADITIONAL GAF SCALE
100-91  Superior functioning in a wide range of activities, life's problems never seem
        to get out of hand, is sought out by others because of his or her many positive
        qualities. No symptoms.
90-81   Absent or minimal symptoms, good functioning in all areas, interested and
        involved in a wide range of activities, socially effective, generally satisfied
        with life, no more than everyday problems or concerns.
80-71   If symptoms are present, they are transient and expectable reactions to
        psychosocial stressors; no more than slight impairment in social,
        occupational, or school functioning.
70-61   Some mild symptoms OR some difficulty in social, occupational, or school
        functioning, but generally functioning pretty well, has some meaningful
        interpersonal relationships.
60-51   Moderate symptoms OR any moderate difficulty in social, occupational, or
        school functioning.
50-41   Serious symptoms OR any serious impairment in social, occupational, or
        school functioning.
40-31   Some impairment in reality testing or communication OR major impairment
        in several areas, such as work or school, family relations, judgment,
        thinking, or mood.
30-21   Behavior is considerably influenced by delusions or hallucinations OR serious
        impairment in communication or judgment OR inability to function in almost
        all areas.
20-11   Some danger of hurting self or others OR occasionally fails to maintain
        minimal personal hygiene OR gross impairment in communication.
10-1    Persistent danger of severely hurting self or others OR persistent inability to
        maintain minimal personal hygiene OR serious suicidal act with clear
        expectation of death.
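Because the scale above divides scores from 1 to 100 into ten 10-point ranges, finding the range and anchor description for a given score is a simple lookup. The helpers `gaf_range` and `describe` below are hypothetical illustrations, not part of either administration method used in the study, and the short labels in `GAF_ANCHORS` merely abbreviate the first anchor of each range.

```python
# Abbreviated first anchor of each range, keyed by the range's lower bound.
GAF_ANCHORS = {
    91: "Superior functioning in a wide range of activities",
    81: "Absent or minimal symptoms",
    71: "Transient and expectable symptoms",
    61: "Some mild symptoms",
    51: "Moderate symptoms",
    41: "Serious symptoms",
    31: "Some impairment in reality testing or communication",
    21: "Behavior influenced by delusions or hallucinations",
    11: "Some danger of hurting self or others",
    1: "Persistent danger of severely hurting self or others",
}


def gaf_range(score: int) -> tuple[int, int]:
    """Return the (low, high) bounds of the 10-point range containing score."""
    if not 1 <= score <= 100:
        raise ValueError("scores on this scale run from 1 to 100")
    low = (score - 1) // 10 * 10 + 1
    return low, low + 9


def describe(score: int) -> str:
    """Format the range (highest value first, as on the scale) and its anchor."""
    low, high = gaf_range(score)
    return f"{high}-{low}: {GAF_ANCHORS[low]}"


print(describe(45))  # prints 50-41: Serious symptoms
```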
APPENDIX B COMPUTER-ASSISTED DECISION TREE MODEL
APPENDIX C VIGNETTE A/HIGH INFORMATION
Ms. Montgomery is a 41-year-old African-American woman who has been
participating in outpatient group therapy at a community mental health clinic for the
past ten months. She has a history of two psychiatric hospitalizations due to severe
depression and suicidal ideation. She is the third of four children, and reports that she
was raised by her paternal grandparents. While she reportedly developed
intellectually and physically at a normal rate, Ms. Montgomery describes herself as a
shy, lonely, and quiet child. In addition, she characterizes her family relationship
during her adolescence as “cold and unsupportive.” She also reports that she did not
suffer from any major illnesses during her childhood.
Ms. Montgomery states that she felt neutral about school and received average
grades. In addition, Ms. Montgomery states that she had several friends and that she
rarely got into trouble. After completing a two-year medical assistant program, Ms.
Montgomery enlisted in the army for approximately 3 years, after which time she
received a medical discharge because of psychological problems. Since leaving the
army, Ms. Montgomery states that she enrolled in Community College, but did not
finish her degree. In addition, Ms. Montgomery reports that she has had more than
seven different full-time jobs, and that she has experienced periods of unemployment
that have lasted for more than a year. Currently, Ms. Montgomery’s major source of
income is her unemployment benefits. She is also currently pursuing SSI benefits.
Ms. Montgomery reports that she has never been married and has no children.
She reports that her parents and all grandparents are deceased. Ms. Montgomery also
reports that she has contacted her siblings over the phone, but has a difficult time
visiting them since they live far away.
Ms. Montgomery appears about her stated age and is always prompt for her
appointments. She was approachable and had a relatively pleasant demeanor.
However, Ms. Montgomery was often poorly groomed, with soiled clothes and an unbathed appearance. Ms. Montgomery often had trouble making eye contact during
the assessment and was somewhat guarded. In addition, Ms. Montgomery had a
difficult time staying focused on the questions being asked, and tended to launch into
tangential stories at inappropriate times. Moreover, these stories tended to have a
common theme of persecution and paranoia.
Ms. Montgomery appears to have very poor insight into her psychological
problems. During all of her appointments with the current evaluator, she insisted that
she had been labeled incorrectly as having “mental problems” and that she did not
belong with the other individuals in her group therapy sessions. In addition, she
stated that she wanted a job but that she had found fault with each job opportunity,
apparently believing that she could do better. Overall, it appears that Ms.
Montgomery has some very negative views about her social situation and the overall
quality of her life. She has been consistently dissatisfied with her economic status as
well as her social pursuits. She denies any suicidal or homicidal ideation.
APPENDIX D VIGNETTE C/LOW INFORMATION
Ms. Montgomery is a 41-year-old African-American woman who has been
participating in outpatient group therapy at a community mental health clinic for the
past ten months. She has a history of severe depression. She is the third of four
children, and reports that she was raised by her paternal grandparents. While she
reportedly developed intellectually and physically at a normal rate, Ms. Montgomery
describes herself as a shy, lonely, and quiet child. In addition, she characterizes her
family relationship during her adolescence as “cold and unsupportive.” She also
reports that she did not suffer from any major illnesses during her childhood.
Ms. Montgomery states that she felt neutral about school and received average
grades. In addition, Ms. Montgomery states that she had several friends and that she
rarely got into trouble. After completing a two-year medical assistant program, Ms.
Montgomery enlisted in the army for approximately 3 years, after which time she
received a medical discharge because of psychological problems. Since leaving the
army, Ms. Montgomery states that she enrolled in Community College, but did not
finish her degree. In addition, Ms. Montgomery reports that she has had more than
seven different full-time jobs, and that she has experienced periods of unemployment
that have lasted for more than a year. Currently, Ms. Montgomery’s major source of
income is her unemployment benefits. She is also currently pursuing SSI benefits.
Ms. Montgomery also reports that she has never been married and has no children.
Ms. Montgomery was approachable and had a relatively pleasant demeanor.
However, Ms. Montgomery was often poorly groomed, with soiled clothes and an unbathed appearance. Ms. Montgomery often had trouble making eye contact during
the assessment and was somewhat guarded. Ms. Montgomery appears to have very
poor insight into her psychological problems. During all of her appointments with the
current evaluator, she insisted that she had been labeled incorrectly as having “mental
problems” and that she did not belong with the other individuals in her group therapy
sessions. In addition, she stated that she wanted a job but that she had found fault
with each job opportunity, apparently believing that she could do better. Overall, it
appears that Ms. Montgomery has some very negative views about her social
situation and the overall quality of her life. She has been consistently dissatisfied
with her economic status as well as her social pursuits. She denies any suicidal or
homicidal ideation.
APPENDIX E VIGNETTE B/HIGH INFORMATION
Ms. Harris is a 40-year-old African-American woman who has been
participating in outpatient therapy at a community mental health clinic for the past
year. She has a history of one psychiatric hospitalization due to a suicidal gesture.
She is the eldest of two children, and reports that she was raised by her mother and
aunt. While she reportedly developed intellectually and physically at a normal rate,
Ms. Harris describes herself as a timid child. In addition, she characterizes her family
relationship during her adolescence as “distant.” She also reports that she did not
suffer from any major illnesses during her childhood except for chicken pox.
Ms. Harris states that she enjoyed school and received average grades. In
addition, Ms. Harris states that she had a large group of friends and that she rarely got
into trouble. After completing High School, Ms. Harris enrolled in community
college. However, she found the work to be overwhelming and dropped out during
her sophomore year. Ms. Harris reports that she has been employed as a part-time
sales associate at a local store. However, she states that she has difficulty getting to
work on time and sometimes does not get out of bed. Currently, Ms. Harris’s major
source of income is her current salary and financial assistance she receives from her
children.
Ms. Harris reports that she has never been married and has three children. She
reports that her mother is deceased and that her aunt resides in a nursing home. She
claims that she has never met her father and is unaware as to his whereabouts. Ms.
Harris also reports that her siblings visit her about once a month but she doesn’t
contact them as she does not like to use the telephone. Lastly, she reports that her
youngest daughter and son live with her and assist her with the finances. Her other
children live on their own and visit on holidays. She states that her relationship with
her children is strained as they do not understand why she is so sad and withdrawn.
Ms. Harris appears about her stated age and is usually late for her
appointments. She was approachable, soft spoken, and had a relatively pleasant
demeanor. However, Ms. Harris was often poorly groomed wearing what appeared to
be the same sweat-suit each week. Ms. Harris often had trouble making eye contact
during the assessment and was tearful at times. In addition, Ms. Harris had a difficult
time staying focused on the questions being asked, and tended to perseverate on
family issues and her need to get better. Moreover, she would be quite talkative at
times and then suddenly withdraw and cry excessively.
Ms. Harris appears to have very poor insight into her psychological problems
and blames outsiders for her own difficulties. During all of her appointments with the
current evaluator, she insisted that she just had “bad nerves.” Furthermore, she stated
that she stopped taking her psychotropic medication because it hurt her stomach but
could not report when she discontinued using it. Overall, it appears that Ms. Harris
has some very negative views about her social situation and the overall quality of her
life. She has been consistently dissatisfied with her economic status and family life.
APPENDIX F VIGNETTE D/LOW INFORMATION
Ms. Harris is a 40-year-old African-American woman who has been
participating in outpatient therapy at a community mental health clinic for the past
year. She has a history of depression. She is the eldest of two children, and reports
that she was raised by her mother and aunt. While she reportedly developed
intellectually and physically at a normal rate, Ms. Harris describes herself as a timid
child. In addition, she characterizes her family relationship during her adolescence as
“distant.” She also reports that she did not suffer from any major illnesses during her
childhood except for chicken pox.
Ms. Harris states that she enjoyed school and received average grades. In
addition, Ms. Harris states that she had a large group of friends and that she rarely got
into trouble. After completing High School, Ms. Harris enrolled in community
college. However, she found the work to be overwhelming and dropped out during
her sophomore year. Ms. Harris reports that she has difficulty getting to work on time
and sometimes does not get out of bed. Currently, Ms. Harris’s major source of
income is her current salary and financial assistance she receives from her children.
Ms. Harris reports that she has never been married and has three children.
Lastly, she reports that her youngest daughter and son live with her and assist her
with the finances. Her other children live on their own and visit on holidays.
Ms. Harris was approachable, soft spoken, and had a relatively pleasant
demeanor. However, Ms. Harris was often poorly groomed, wearing what appeared to
be the same sweatsuit each week. Ms. Harris often had trouble making eye contact
during the assessment and was tearful at times. Ms. Harris appears to have poor
insight into her problems.
Ms. Harris appears to have very poor insight into her psychological problems
and blames outsiders for her own difficulties. During all of her appointments with the
current evaluator, she insisted that she just had “bad nerves.” Overall, it appears that
Ms. Harris has some very negative views about her social situation and the overall
quality of her life. She has been consistently dissatisfied with her economic status
and family life.
Table 1
GAF Minimum, Maximum, Means and Standard Deviations
___________________________________________________
                    Time 1            Time 2
Vignette          M       SD        M       SD
___________________________________________________
PPALow          40.33    7.73     42.00    8.62
PPCHigh         45.89    7.46     44.89    7.49
PPBLow          43.11    8.13     42.78    6.12
PPDHigh         52.67    5.41     51.00    4.27
CALow           37.56   14.15     45.00    9.82
CCHigh          41.33   10.65     42.89   11.34
CBLow           44.44    9.13     50.33   10.01
CDHigh          45.56   14.83     45.44   10.55
___________________________________________________
Note. PP = paper-and-pencil GAF administration; C = computer-assisted
GAF administration; Low = low information condition; High = high information condition.
Table 2
Test-Retest Reliability of GAF Ratings
_______________________________________________________________________________
Variable                      Cronbach’s Alpha      Description
_______________________________________________________________________________
PP = Paper-and-Pencil GAF
  PPALow1-PPALow2                   .98              Excellent
  PPCHigh1-PPCHigh2                 .95              Excellent
  PPBLow1-PPBLow2                   .91              Excellent
  PPDHigh1-PPDHigh2                 .87              Good
C = Computer-Assisted GAF
  CALow1-CALow2                     .58              Questionable
  CCHigh1-CCHigh2                   .77              Acceptable
  CBLow1-CBLow2                     .89              Good
  CDHigh1-CDHigh2                   .89              Good
_______________________________________________________________________________
Note. PP = paper-and-pencil GAF administration; C = computer-assisted
GAF administration; Low = low information condition; High = high information condition;
1 = Time 1; 2 = Time 2.
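Table 2 reports test-retest reliability as Cronbach’s alpha for each Time 1/Time 2 pair. A minimal sketch of that computation follows, assuming the Time 1 and Time 2 ratings are treated as two "items" and that each position in the two lists is the same rater's pair of ratings for one condition. The function name and the example ratings are hypothetical illustrations, not the study's data or analysis software.

```python
from statistics import variance


def cronbach_alpha_two_occasions(time1, time2):
    """Cronbach's alpha treating Time 1 and Time 2 ratings as two items.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    with k = 2 occasions. time1[i] and time2[i] are one rater's paired ratings
    (hypothetical pairing assumed for illustration).
    """
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need at least two paired ratings")
    k = 2
    totals = [a + b for a, b in zip(time1, time2)]  # each rater's summed score
    item_var_sum = variance(time1) + variance(time2)  # sample variances
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical ratings from four raters (not study data).
t1 = [40, 45, 50, 35]
t2 = [42, 44, 51, 37]
print(round(cronbach_alpha_two_occasions(t1, t2), 2))
```

Note that this index reflects consistency in the rank ordering of ratings across occasions: adding the same constant to every Time 2 rating leaves alpha unchanged, so a uniform shift between Time 1 and Time 2 (visible in Table 1 means) is not captured by it.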
[Figure: GAF ratings (y-axis, 0 to 60) by method (x-axis) for the paper-and-pencil and computer-assisted administrations in the low and high information conditions, shown alongside the gold standard GAF ratings.]
Figure 1. GAF Ratings Compared to Gold Standard GAF Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted
GAF administration; Low = low information condition; High = high information condition.
[Figure: mean difference between GAF and gold standard ratings, plotted as the mean ±2 SE (y-axis, -6 to 8), for the PPLOW, COMLOW, PPHIGH, and COMHIGH conditions; N = 36 per condition.]
Figure 2. Standard Error of GAF Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted
GAF administration; Low = low information condition; High = high information condition.
[Figure: mean difference between GAF ratings and gold standard ratings (y-axis, -4 to 6) for the PPLOW, PPHIGH, COMLOW, and COMHIGH conditions.]
Figure 3. Mean Difference Between GAF Ratings and Gold Standard Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted
GAF administration; Low = low information condition; High = high information condition.
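The captions of Figures 2 and 3 refer to the mean difference between raters' GAF scores and the gold standard ratings, with error bars of ±2 standard errors. A minimal sketch of that computation is shown below, assuming each rating is paired with the gold standard score for the vignette it rates and that SE is the sample standard deviation of the differences divided by the square root of N. The function name and the example numbers are hypothetical, and this is not presented as the procedure actually used to produce the figures.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference_with_2se(ratings, gold_standard):
    """Return (mean difference, lower bound, upper bound) for mean +/- 2 SE.

    ratings[i] is one GAF rating; gold_standard[i] is the gold standard GAF
    score for the vignette that rating refers to (hypothetical pairing).
    A positive mean difference means raters scored above the gold standard.
    """
    if len(ratings) != len(gold_standard) or len(ratings) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [r - g for r, g in zip(ratings, gold_standard)]
    m = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return m, m - 2 * se, m + 2 * se


# Hypothetical example: four ratings of a vignette whose gold standard is 50.
m, lo, hi = mean_difference_with_2se([50, 52, 48, 54], [50, 50, 50, 50])
print(m, round(lo, 2), round(hi, 2))
```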
[Figure: line graph of mean GAF ratings (y-axis, 30 to 55) at Time 1 and Time 2 for the Paper Low, Paper High, Computer Low, and Computer High conditions.]
Figure 4. Mean GAF Scores by Method and Information Across Time
Note. Low = low information condition; High = high information condition.
Vita
Sarah Beth Woldoff
EDUCATION
Drexel University, Philadelphia, PA 1997-2004
Doctor of Philosophy in Clinical and Health Psychology, July 2004
Master of Arts and Sciences in Clinical Psychology, Dec. 2000
Temple University, Philadelphia, PA 1992-1996
Bachelor of Science in Psychology, May 1996
HONORS AND DISTINCTIONS
• Award for Volunteer Community Services at Disability Resources, Temple University, 1996
• Magna Cum Laude, Temple University, 1996
• Golden Key National Honor Society, Temple University, 1995-1996
• Psi Chi (National Honor Society in Psychology), Temple University, 1993-1996
• Dean’s List, Temple University, 1993-1996
• Honors Program, Temple University, 1992-1995
• Early Admissions Award, Temple University, 1992
PROFESSIONAL PRESENTATIONS
Woldoff, S., Simmons, M., & Napolitano, D. (May 2002). Evaluation of High-p/Low-p
sequencing in the treatment of non-compliance. Paper presented at the annual meeting of the
Association for Behavior Analysis, Toronto, Canada.
Progar, P., Perrin, F., Woldoff, S., & Tessing, J. (May 2002). Experimental evaluation of
learning history upon current reinforcement contingencies. Symposium panel discussion at the
annual meeting of the Association for Behavior Analysis, Toronto, Canada.
Woldoff, S., Herbert, J., & Greenberg, R. (April 2001). Reliability of the GAF in Multiple-rater Situations. Paper presented at the annual meeting of the Eastern Psychological Association,
Washington, D.C.
Woldoff, S., & Harwell, V. (April 2001). Anxiety Disorders and Assessment in College-aged
Populations. Symposium presented at the annual meeting of the Eastern Psychological Association,
Washington, D.C.
Greenberg-Saluk, R., Herbert, J., Rheingold, V., Woldoff, S., & Crittenden, K. (Nov. 2000).
Brief FNE: Preliminary psychometric findings. Paper presented at the annual meeting of the
Association for the Advancement of Behavior Therapy, New Orleans, LA.
Progar, P., Woldoff, S., Vollmer, Mace, F.C., Daniels, Dency. (May 1998). A comparison of
NCR and DRO for Self-Injury when reinforcement rates are equated. Symposium panel discussion
at the annual meeting of the Association for Behavior Analysis, Orlando, FL.
Woldoff, S., Progar, P., Vollmer, T., & Eisenchink, K. (May 1998). Reducing Aggressive and
Self-Injurious Behavior using two different schedules of differential reinforcement of other
behavior. Poster presented at the annual meeting of the Association for Behavior Analysis,
Orlando, FL.
Woldoff, S., Progar, P., Mace, F.C., Vollmer, T., Lalli, J., & Ploog, F. (May 1997). A
comparison of NCR and DRO for Self-Injury when reinforcement rates are equated. Poster
presented at the annual meeting of the Association for Behavior Analysis, Chicago, IL.
MANUSCRIPTS
Woldoff, S., & Herbert, J. (in preparation). Reliability of the GAF in Multiple-rater Situations. Drexel University.