Reliability of The Global Assessment of Functioning Scale

A Thesis
Submitted to the Faculty of Drexel University
By Sarah Beth Woldoff, M.A.
in partial fulfillment of the requirement for the degree of Doctor of Philosophy
July 2004

Dedication

I would like to dedicate my dissertation to my family, who have always put my needs before their own. Their love and encouragement have helped me to attain all of my goals, and I appreciate all they have taught me.

Acknowledgments

I would like to thank my committee members for the time and effort they put into their participation in my defense. I would also like to extend a special thank you to my chair, Dr. James D. Herbert, for his countless e-mails and feedback sessions, and for being a true mentor in every sense of the word. Lastly, I want to thank Keith Davis for his support and encouragement, and for always being my greatest fan.

Table of Contents

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1. BACKGROUND AND LITERATURE SURVEY
    1.1 History of the GAF
    1.2 Limitation of the GAF
    1.3 New Measures of Social Functioning
        1.3.1 The SOFAS, GARF, and the K-AXIS
    1.4 Reliability of the GAF
    1.5 Predictive Validity
    1.6 GAF and Treatment Outcome
    1.7 The GAF and Allocation of Services
    1.8 Future Directions of the GAF
    1.9 Rationale for the Study
2. APPARATUS AND TEST PROCEDURE
    2.1 Instrument
    2.2 Research Design
    2.3 Participants
    2.4 Materials
    2.5 Procedure
    2.6 Hypotheses and Statistical Analyses
    2.7 Power Analysis
3. RESULTS
4. DISCUSSION
    4.1 Psychometric Properties of the Computerized GAF
    4.2 Clinical Versus Actuarial Decision Making
    4.3 Clinical Implications
    4.4 Strengths and Limitations
    4.5 Conclusions and Future Directions
LIST OF REFERENCES
APPENDIX A: TRADITIONAL GAF SCALE
APPENDIX B: COMPUTERIZED GAF
APPENDIX C: VIGNETTE A/HIGH INFORMATION
APPENDIX D: VIGNETTE C/LOW INFORMATION
APPENDIX E: VIGNETTE B/HIGH INFORMATION
APPENDIX F: VIGNETTE D/LOW INFORMATION
VITA

List of Tables

1. GAF Minimum, Maximum, Means and Standard Deviations
2. Test-Retest Reliability of GAF Ratings

List of Figures

1. GAF Ratings Compared to Gold Standard Ratings
2. Standard Error of GAF Ratings
3. Mean Difference Between GAF Ratings and Gold Standard Ratings
4. Mean GAF Scores by Method and Information Across Time

Abstract

Global Assessment of Functioning Scale
Sarah Beth Woldoff, M.A.
James D. Herbert, Ph.D.

Since the Global Assessment of Functioning Scale (GAF) was introduced in the revised third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R; American Psychiatric Association, 1987), its use in clinical settings has grown considerably. However, there is little research on the reliability and validity of the scale. Thirty-six psychologists with experience administering the GAF scored one high-information and one low-information vignette according to two methods of determining the GAF score. Method A consisted of the traditional paper-and-pencil version in which the rater determines the client’s GAF score based on the individual’s psychological, social, and occupational functioning. Method B consisted of a computer-assisted GAF (First & Multi-Health Systems Staff, 1997), in which assessment questions related to psychological, social, and occupational functioning are presented to the clinician in a yes/no format. Results indicated that both methods of GAF administration could be scored reliably by raters. Consistent with predictions, the results revealed a significant interaction between method and information level. Specifically, in the high-information condition, the computer-assisted method resulted in scores closer to “gold standards” determined by expert diagnosticians relative to the paper-and-pencil method. These findings are promising with respect to the clinical utility of the computer-assisted GAF procedure. Limitations of the study and directions for future research are discussed.

CHAPTER 1: BACKGROUND AND LITERATURE SURVEY

1.1.
History of the GAF

The Global Assessment of Functioning Scale (GAF) was introduced as a new rating scale of overall psychiatric disturbance as Axis V of the revised third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R; American Psychiatric Association, 1987). Since its introduction, its use in clinical settings has expanded due to the need for an easily and quickly administered measure of severity of mental illness. It is currently the most commonly used global assessment instrument for psychiatric patients (Bodlund, Kullgren, Ekselius, Lindstrom, & von Knorring, 1994; Piersma & Boes, 1997).

The GAF is derived from its predecessor, the Global Assessment Scale (GAS; Endicott, Spitzer, & Fleiss, 1976), a measure used to assess a patient’s overall level of functioning for a specified time period (American Psychiatric Association, 1987). The GAF is based on the assumption that the level of current functioning in psychiatric populations holds crucial information for treatment planning and treatment outcome. Furthermore, the GAF can be used in a variety of settings to follow changes in an individual over time without the need or expense of extensive training.

The GAF is similar to the GAS in that it has similar criteria and the same interval design, a value range from 0 (most severe) to 100 (least severe) with 10 anchor points at equal intervals (Hall, 1995). However, unlike the GAS, each interval of the GAF is accompanied by a behavioral descriptor ranging from “superior functioning in a wide range of activities…no symptoms” to “persistent danger of severely hurting self or others…persistent inability to maintain minimal personal hygiene.” Therefore, the interviewer must first determine the descriptor that summarizes the client’s current difficulties, and then indicate the severity of impairment within a nine-point range. However, the number of criteria required to meet a particular interval is not defined.
Furthermore, the rater must make a single rating based upon the patient’s overall level of psychological, social, and occupational functioning. Additionally, impairment in functioning due to physical limitations or psychosocial stressors is not included in the GAF ratings. The exclusion of these factors is cited as a major flaw by numerous researchers (Bodlund et al., 1994; Hall, 1995; Piersma & Boes, 1997), who have suggested that one’s general medical condition may have a major impact upon one’s social and occupational functioning.

1.2. Limitation of the GAF

A major difficulty with the GAF is that it integrates three different dimensions of functioning (i.e., social, occupational, and psychological symptoms) that do not necessarily covary. In fact, several studies have found that psychological functioning does not covary with social and occupational functioning (Bodlund, 1996; Byrne et al., 1994; Calvoressi, Libman, Vegso, McDougle, & Price, 1996). For example, individuals can experience mild psychological distress with severe impairment in their daily functioning and vice versa. However, the scoring methodology of the GAF does not allow for this differentiation, and instead forces the rater to focus more heavily upon either psychological symptoms or functioning.

Additionally, the DSM serves as a multiaxial system that involves assessment on several axes, each of which refers to a different domain of information that may assist the clinician in both treatment planning and predicting treatment outcome. The GAF (Axis V) appears to have considerable overlap with Axis I (clinical disorders and other conditions that may be the focus of clinical attention) and Axis II (personality disorders and mental retardation). Assessment of functioning beyond psychological symptoms is needed to develop a comprehensive diagnosis and treatment plan. A separation of Axes I and II from the GAF scale is also important if a distinct measure of global functioning is to be developed.
Many researchers (e.g., Goldman, Skodol, & Lave, 1992) have suggested that the GAF scale should be modified into two separate scales, one to measure global symptomatology and psychological functioning, and the other to measure social and occupational functioning. In an effort to address these concerns, two new experimental global functioning scales were included in the DSM-IV: the Global Assessment of Relational Functioning Scale (GARF) and the Social and Occupational Functioning Assessment Scale (SOFAS).

1.3. New Measures of Social Functioning

1.3.1. The SOFAS, GARF, and the K-AXIS

The SOFAS is intended to assess one’s level of social and occupational functioning without the influence of psychological symptoms. Unlike the GAF, this scale considers the impact of general medical conditions and physical disabilities upon one’s level of social and occupational functioning. The GARF is used to evaluate one’s level of functioning in relationships with family, friends, and significant others. This scale uses the three content areas of problem solving, organization, and emotion to measure the degree of relational functioning in psychiatric patients with a ranking from optimal to disrupted. These two experimental scales distinguish psychiatric symptoms from occupational and social functioning and are thought to provide clinical information beyond that provided by the GAF (Hilsenroth et al., 2000).

Patterson and Shin Lee (1995) evaluated the construct validity of a modified GAF scale, later to become the SOFAS, using the social, occupational, and clinical data of 196 psychiatric patients receiving outpatient mental health services. The results demonstrated the presence of convergent and discriminant validity of the modified GAF scale. The results suggest that the modified scale captures multidimensional information about functioning including social support, medication compliance, current living situation, and current potential for violence.
In addition, there was no significant difference in scores using the modified GAF scale across participants with four Axis I diagnoses, suggesting that this measure of functioning was not unduly affected by the nature of participants’ specific psychiatric symptoms.

Similarly, Byrne, Dagadakis, Unutzer, and Ries (1996) examined the validity of a modified version of the GAF. This modified GAF was similar to the SOFAS in that it was designed to consider only social and occupational functioning and not symptom severity. The results indicated that the revised GAF was nevertheless strongly related to patients’ psychiatric symptoms rather than social and occupational functioning. Even when ratings were performed by board-certified psychiatrists, the assessment of psychiatric symptom severity was the primary predictor of GAF scores. Byrne and colleagues concluded that their revised version functioned similarly to the original GAS, which has been found to correlate best with psychiatric symptom severity, duration of hospitalization, and independent living and self-care skills, and only weakly with social and occupational functioning.

Hilsenroth and colleagues (2000) completed the only published study investigating the reliability and convergent/discriminant validity of the GAF, the GARF, and the SOFAS. All three scales exhibited high levels of inter-rater reliability. However, the GARF and SOFAS were each more related to the GAF than to each other. The authors concluded that although all three scales can be scored reliably, the SOFAS and the GARF appear to tap into somewhat different constructs than the GAF and provide additional information regarding global functioning.

A second study by Hay and colleagues (2003) evaluated the validity of the GAF, the GARF, and the SOFAS in a two-year follow-up of adult psychiatric patients.
Results demonstrated that the SOFAS and the GAF scores on admission were significantly and negatively correlated with duration of hospital admission. In addition, the SOFAS ratings on discharge were significantly and negatively correlated with overall psychiatric outcome at the two-year follow-up. The authors concluded that the SOFAS had better predictive and concurrent validity than the GAF or the GARF and may be a more useful measure of adaptive functioning.

However, Kennedy and Foti (2003) criticized this study, noting that just as the GAF collapses functioning and symptoms into a single measure, the SOFAS merges social and occupational functioning into a single measure and likewise does not allow the rater to specify which factor is influencing the overall rating. They suggest that future research should focus on their scale, the Kennedy Axis V (K Axis), which breaks symptoms and functioning into distinct areas including psychological impairment, social skills, violence, activities of daily living, and occupational skills, so that no information is lost as it is in scales that use one global score. In addition, the K Axis can generate a total score equivalent to that of the GAF. The authors state that the K Axis allows for the measurement of symptoms and functioning in each of the major clinical domains, eliminating the need to use multiple instruments to measure these areas separately, which is both inefficient and expensive. In addition, the authors claim that the K Axis categorizes clinical information in a way that could simplify treatment planning.

In conclusion, it appears that the SOFAS and the GARF identify different elements of information important in identifying stressors and potential focal points of treatment planning. However, it appears that these two scales may have similar limitations to those identified for the GAF.
Although the research on these new scales is limited, it appears that they may add information to the diagnostic model within the DSM-IV, independent of psychiatric symptom severity. Additionally, the K Axis has been proposed as a better approach to the assessment of global functioning and symptomatology, but no independent research has yet been completed on this proposed Axis. More studies are needed to address the overall assessment of functioning and its relationship to psychiatric symptoms, treatment, and outcome measures.

1.4. Reliability of the GAF

Interviewer-rated scales such as the GAF are prone to problems of low inter-rater reliability because they are used by assessors with different levels of training and experience (Bodlund et al., 1994; Piersma & Boes, 1997). This is a major problem in that for scores to be comparable and meaningful across research paradigms, inter-rater reliability needs to be consistent from one study to another. That is, some raters may have a propensity to make high ratings, whereas others may have a propensity to make low ratings. Training and clinical experience may also impact GAF ratings. For example, raters such as nursing staff may be more interested in functioning and less interested in symptoms than psychiatrists due to their training and area of expertise. Moreover, variation in adherence to GAF guidelines and the heterogeneity of patient illness severity are the two most influential factors limiting inter-rater reliability (e.g., Dworkin, Friedman, & Telschow, 1990; Jones, Thornicroft, Coffey, & Dunn, 1995).

Although there have been a select number of publications on the psychometric qualities of the GAS (e.g., Dworkin et al., 1990), few reports address these issues in reference to the GAF. Data on the basic reliability and validity of the GAF were not provided in the DSM-III-R (APA, 1987). Therefore, much of the research discussed in this study will focus on its predecessor, the GAS.

Endicott et al.
(1976) performed the first series of test-retest reliability studies on the GAS and found intraclass correlation coefficients (ICCs) ranging from .61 to .91 with standard error of measurement scores ranging from 5.0 to 8.0. Most of these ratings were completed by a limited number of well-trained interviewers, raising questions about the degree to which the results are generalizable to typical clinical settings (Luborsky, 1962; Luborsky & Bachrach, 1974). Even with this potential bias, ICCs of .60 were obtained in one study within the series and suggest that the scale is only moderately reliable at best. Endicott et al. (1976) reported that patients’ psychiatric symptoms such as cognitive disorganization, hallucinations, delusions, suspiciousness, and inappropriate appearance were associated with lower GAS ratings, thereby demonstrating the association between the severity of psychological symptoms and clinicians’ ratings of global impairment.

Dworkin and colleagues (1990) used the GAS with multiple clinicians to rate 108 chronically mentally ill outpatients for a period of 18 months. Raters consisted of 17 psychiatrists, 17 residents, and 17 psychologists, all of whom participated in 90-minute training sessions. Each training session produced relatively strong inter-rater reliability, ranging from .66 to .92. The r of .66 occurred at the last training session, when there was pressure to complete the study and trainer fatigue appeared to lead to abbreviated vignette discussion. Although the GAS was designed to be used with minimal training, the authors concluded that training is necessary in order to maximize inter-rater reliability in situations in which individual patients may be rated sequentially by multiple interviewers. They concluded that training helps ensure that any change in the GAS score actually reflects change in the patient’s global functioning and not measurement error.
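For readers less familiar with the two statistics reported for the GAS above, the following is a minimal sketch of how an intraclass correlation coefficient and a standard error of measurement can be computed from repeated ratings. It assumes a one-way random-effects ICC for single ratings, which may not be the form used by Endicott et al. (1976) or Dworkin et al. (1990); the function names and the example ratings are hypothetical and are not data from any study cited here.

```python
from math import sqrt
from statistics import mean, stdev


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (an assumed form).

    `ratings` is a list of rows, one row per patient or vignette,
    each row holding the same number of ratings on the 0-100 scale.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of ratings per patient
    grand_mean = mean(score for row in ratings for score in row)
    row_means = [mean(row) for row in ratings]

    # Between-patients mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-patients mean square (disagreement among ratings of one patient)
    ms_within = sum(
        (score - m) ** 2 for row, m in zip(ratings, row_means) for score in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def standard_error_of_measurement(ratings, icc):
    """SEM = standard deviation of all ratings * sqrt(1 - reliability)."""
    all_scores = [score for row in ratings for score in row]
    return stdev(all_scores) * sqrt(1 - icc)


# Hypothetical data: four patients, each rated twice.
example_ratings = [[40, 44], [55, 51], [70, 74], [62, 58]]
example_icc = icc_oneway(example_ratings)
example_sem = standard_error_of_measurement(example_ratings, example_icc)
```

Under this formulation, the ICC approaches 1.0 as ratings of the same patient converge, and the standard error of measurement shrinks toward zero, which is why a change in score smaller than the standard error cannot be confidently attributed to change in the patient.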
Given the restricted diversity of clinicians, patients, information, and time frames in these studies, the inter-rater reliability of the GAF has not yet been well established. Although specialized training improved reliability ratings, such training does not reflect the GAF’s intended use, and training is not possible in all settings due to time constraints and limited resources.

In short, there is a paucity of data on the inter-rater reliability of Axis V of the DSM-IV, especially among clinicians without specialized training in the GAF. This is a major concern because in order for scores to be comparable and meaningful across raters, inter-rater reliability needs to be high. Although this problem appears to be attenuated through specific training, such training defeats one of the principal purposes of the GAF. The GAF is promoted as an assessment tool that is easily and quickly administered with minimal training. However, research has shown that specialized training is a key element to the GAF’s accurate use. Therefore, further research should systematically evaluate this concern using the current version of the GAF instead of its predecessor.

1.5. Predictive Validity

Several studies have reported diagnostic group differences when using the GAF with diverse patient populations (Byrne et al., 1996; Bodlund et al., 1994; Patterson & Shin-Lee, 1995). Skodol, Link, Shrout, and Horwath (1988) found significant variation in GAF ratings among 10 diagnostic groups of both psychiatric inpatients and outpatients. In particular, schizophrenics received the lowest GAF ratings, and individuals with anxiety disorders, adjustment disorders, and major depressive disorders, as well as those without an Axis I diagnosis, received the highest ratings even though their level of functioning was stated to be equivalent to that of the schizophrenics. Based on these findings, Skodol et al.
(1988) concluded that psychiatric symptoms have a greater impact on GAF ratings than variables associated with social and occupational functioning. A second study evaluating a sample of more than 10,000 psychiatric patients found that depressed patients had higher GAF ratings than nondepressed patients even though level of functioning was not consistently different between the two groups (Mezzich, Evanczuk, Mathias, & Coffman, 1985). Overall, these studies support the claim that individuals with major mental illnesses usually obtain lower GAF ratings than those diagnosed with less severe forms of psychopathology. This conclusion is of great importance as it is assumed that one’s functioning level holds critical information for treatment planning and treatment outcome. Furthermore, managed medical care tends to allocate more resources to psychological disorders associated with significant disabilities in functioning (Phelan, Wykes, & Goldman, 1994).

Jones and colleagues (1995) used the GAF to assess severity of disturbance in a sample of 103 chronically mentally ill patients over a 6-month period. GAF scores were obtained in combination with other ratings of psychological symptoms and disability. Following the suggestions of Goldman and colleagues (1992), the GAF was administered as both an overall scale and as two separate measures, one designated for symptoms (GAFSYM) and the other for disability (GAFDIS). These ratings were then compared with changes in medication and need for support. In all cases, a lower composite GAF score was associated with an increase in clinically identified support needs of the client. The authors concluded that the GAF is a valid measure of disturbance of psychological functioning among long-term mentally ill patients and can be readily used by multidisciplinary raters without extensive training.

A second problem with the GAF is its lack of structure and concrete directions for its use.
Specifically, it is unclear how many criteria a patient must meet in order to fall into a particular range, and how to determine an exact score within that range. Additionally, directions in the DSM-IV state that ratings are to be based upon limitations of functioning due to mental impairments alone, but it is not clear how the clinician is to distinguish functional impairments that are not related to psychiatric disturbance. Based upon these shortcomings, Goldman et al. (1992) suggested the use of a modified GAF that separated relational functioning from psychological functioning and symptoms, and included the influence of physical impairments in the rating. In addition, Goldman et al. suggested increasing the structure of the GAF by adding directions regarding how scores should be assigned.

Following the suggestions of Goldman et al., Hall (1995) created a modified GAF. He increased the structure of the original GAF by adding directions for assigning scores and increasing the number of criteria required within each 10-point interval. Using 16 detailed patient intake histories and discharge summaries from hospital charts of depressed patients with or without comorbid eating disorders, GAF ratings were evaluated by using both the original and modified GAF. Participants were separated into two groups. Group I consisted of nurses, physical therapists, social workers, psychology technicians, and clinical psychologists. Group II consisted of other members of the interdisciplinary team such as psychiatrists and general practitioners. The intraclass correlation coefficients between groups for both admission and discharge were higher in the modified GAF group when compared to the original GAF group. Hall concluded that the lower coefficients for the original GAF were due to greater inter-rater variability rather than patient heterogeneity, as shown by the standard errors of the ratings for each patient.
Specifically, of the 16 standard errors for patient admission data, 13 were higher in the original GAF group than in the modified group. Therefore, there was more variability among raters’ GAF scores in the group using the original GAF. In addition, all of the means for the patients’ admission GAF scores were also higher in the original GAF group than in the modified GAF group. Therefore, raters using the modified GAF rated patients as more impaired than those using the original GAF. The modified version appears to be useful for increasing inter-rater reliability in settings with multiple raters of varying education and employment histories. In addition, research (e.g., Goldman et al., 1992) has shown that some raters tend to rate consistently high while others rate low. Furthermore, staff members who had different training and clinical experience performed ratings. Lastly, raters had access to GAF admission scores before selecting their discharge GAF ratings. Therefore, discharge ratings may have been influenced by a desire to establish treatment efficacy.

In summary, the predictive validity of the GAF appears to be limited by unclear directions regarding administration and variations in the way different disciplines assign GAF scores based upon the same data. Overall, it appears that the ratings made using the original GAF are strongly influenced by patient symptom severity instead of by functioning in the targeted areas.

1.6. GAF and Treatment Outcome

Others have shown that clinicians’ GAF ratings of psychiatric patients demonstrate clinical improvement during and following treatment (Piersma & Boes, 1997; Rund et al., 1994). Piersma and Boes (1997) evaluated GAF ratings for adult inpatients, adult partial hospitalization patients, and adolescent inpatients. Mean scores for current GAF ratings of adult inpatients were 45.1 at admission and 60.1 at discharge. For adult partial hospitalization patients, mean GAF ratings were 55.2 at admission and 65.5 at discharge.
For adolescent inpatients, means were 30.6 at admission and 52.7 at discharge. In all three groups, patients were rated as having improved functioning at time of discharge. However, this could be attributed to the fact that raters had access to the GAF admission scores when making GAF discharge ratings and may have been motivated to show clinical improvement. Differences in GAF ratings between adolescents and adults were attributed to the possibility that adolescent and adult psychiatrists interpret the GAF rating scale differently. According to the authors, adolescent psychiatrists tended to rate according to the lowest level of functioning on any of three dimensions considered, whereas the adult psychiatrists tended to rate the GAF by averaging the levels of functioning on all three dimensions of functioning. Skodol, Link, Shrout, and Horwath (1988) raised similar concerns regarding the GAF instructions, suggesting that confusion exists as to whether clinicians are to average the various levels of symptoms and social and occupational functioning or rate the lowest level if symptoms and social and occupational functioning are not equivalent.

Gordon and Gordon (1985) examined the relationship between GAF ratings and treatment outcomes using a retrospective chart review of 232 discharged patients treated in a psychiatric outpatient clinic. Raters were instructed to rate the patients on Axis V and judge their symptomatic responses to treatment based upon information in their charts. The results revealed that 55% of patients were judged as symptomatically improved, yet only 19% were rated as functionally improved according to the GAF. In addition, they found that patients’ ratings on the GAF increased consistently as their number of treatment sessions increased, regardless of psychiatric symptomatology.
In a subsequent study, Gordon and Gordon (1987) reported predictable differences in GAF ratings between chronically ill state hospital patients, long-term inpatients, short-term inpatients, and outpatients. The results showed that chronically ill state hospital psychiatric patients reported greater levels of stress and had worse premorbid levels of functioning compared to the other three groups, and that longer-term inpatients scored worse on both measures than shorter-term inpatients and outpatients alike. They concluded that a “strain ratio,” a ratio of the Axis IV to Axis V score, could be used to estimate the amount of treatment an individual patient requires.

Recently, two studies have expanded upon the idea of the “strain ratio” and examined the association between GAF ratings during treatment and independent information about post-treatment outcomes. Moos, McCoy, and Moos (2000) assessed the adequacy of GAF ratings in a sample of 1,688 patients with substance use disorders, many of whom also had psychiatric disorders. They found that the GAF ratings did not differ in relation to demographic characteristics such as race, age, or sex, and that clinicians’ ratings of patients’ current GAF scores were based primarily on psychiatric diagnoses and symptoms. They found that subjects diagnosed with poly-substance abuse, Axis I psychiatric diagnoses, and a medical condition received lower GAF scores. In addition, they concluded that patients’ symptoms and psychiatric diagnoses were more strongly correlated with clinicians’ judgments of functioning than was the clients’ actual level of social and occupational functioning. Lastly, they found little or no correlation between ratings of patients’ current or highest level of global functioning and psychological, social, or occupational functioning at a one-year follow-up.
They also found that although lower GAF ratings were associated with longer hospital stays, they did not predict patients’ probability of readmission or length of readmission. These findings cast doubt on the use of GAF ratings for predicting treatment outcome and imply that the GAF fails at one of its intended uses.

1.7. The GAF and Allocation of Services

In a review of 9,055 adult psychiatric intake evaluations, Thompson, Burns, Goldman, and Smith (1992) found variations in the way GAF ratings were assigned by managed care case managers compared with treatment providers for the same cases with the same information. They concluded that higher GAF scores reflected a need by the managed care companies to limit the use of all inpatient services rather than their desire to selectively eliminate unnecessary hospitalizations.

Seeking to replicate and expand upon their findings, Moos, Nichol, and Moos (2002) evaluated the value of the GAF as part of a system-wide program for monitoring the allocation and outcomes of mental healthcare services via the Department of Veterans Affairs. They found that patients’ clinical diagnoses and symptoms were better predictors of GAF ratings than was their social and/or occupational functioning. Moreover, patients with psychiatric diagnoses, psychoses, or a recent inpatient episode were rated as more impaired. Therefore, indicators of social and occupational functioning made minimal contributions to the GAF ratings. The authors concluded that GAF ratings provide little or no information independent of the clinicians’ judgments about diagnoses and symptom severity and may not be a useful predictor in programs evaluating the allocation and outcomes of mental health care.

In summary, the outcome research demonstrates little or no relationship between GAF ratings and length of hospital stay, probability of readmission, and prediction of overall treatment outcome.
These results imply that the GAF does not meet one of its primary uses, in that it fails to predict patients’ future use of psychological services.

1.8. Future Directions of the GAF

Based upon the limitations of the traditional GAF, a cost-effective computer program was recently developed by First and Multi-Health Systems, Incorporated (1997). The computerized GAF is a clinical tool that assists clinicians in making an Axis V diagnosis. The computerized GAF program has been designed to ensure that all aspects of functioning (i.e., psychological, social, and occupational) are considered in the assessment process and that symptom severity and level of functioning are taken into account in making a GAF rating. It uses a decision tree model in which questions regarding all areas of functioning are presented to the clinician in a yes/no format. Responses to these questions determine the order of presented questions and guide the clinician’s overall assessment. However, no reliability or validity information on the computerized GAF is provided in the manual. In addition, there is no published research on this new instrument in the literature.

1.9. Rationale for the Study

To date, the research that has been conducted on the reliability and validity of the GAF has been limited. Although some research has examined the GAF, few studies have evaluated its psychometric properties. This lack of research is problematic since the GAF is so widely used and is becoming a major tool within the realm of managed health-care. The GAF Scale is used to assess patients of all ages in a number of settings, including outpatient clinics, inpatient clinics, residential treatment programs, and private practices. Users include psychiatrists, psychologists, social workers, physicians, nurses, counselors, and related health care workers.
It has become a particularly important tool within the realm of managed health-care and is currently used to determine “medical necessity” of mental health benefits and eligibility for disability benefits. Decisions such as whether an individual’s current mental health state entitles them to mental health benefits depend not only on the patient’s diagnosis but also on how it affects their current overall level of functioning. The GAF is also used to determine the type of treatment, level of necessary care required, and frequency and duration of treatment. Some managed care companies (e.g., Magellan Behavioral Health) have issued required GAF ratings for inpatient and outpatient services. With these standards and its increasing use in both clinical and research settings, the need for accurate and consistent GAF ratings has become increasingly important. It is from this need that the computer-assisted GAF was developed. The authors state, “it ensures the consideration of all three dimensions of the GAF rating and provides a means to determine an accurate and reliable GAF rating” (First & Multi-Health Systems Staff, 1997, p. 4).

In an effort to address the limited research on the validity and reliability of the GAF and explore the utility of the computer-assisted GAF, Woldoff and Herbert (2001) evaluated the reliability of clinicians’ judgments using the traditional and the computer-assisted methods of determining GAF scores of two different case vignettes. Social, occupational, and clinical data on two different psychiatric disorders were used to create two case vignettes. One vignette represented an individual with severe psychopathology such as schizophrenia, while the other vignette described an individual with mild psychopathology such as an anxiety disorder.
Clinical psychology graduate students evaluated these vignettes at an initial testing and again at a one-week retest using two methods of administering the GAF and determining the GAF score (i.e., the traditional paper/pencil GAF and the computer-assisted GAF). Contrary to predictions, the results revealed that for the severe psychopathology vignette the computer-assisted method had greater inter-rater variability and worse test-retest reliability. In addition, the computer method yielded greater inter-rater variability but comparable test-retest reliability for the milder psychopathology vignette. The means for GAF ratings of the milder psychopathology vignette were almost identical for both methods of administration. However, for the severe psychopathology vignette, the means differed significantly, with the computer-assisted GAF yielding lower GAF scores. These results imply that the computer method of GAF determination yields greater inter-rater variability when compared to the traditional method of GAF administration.

There were a number of limitations of this study that could have impacted the results. The first is the use of a limited sample, since all individuals rating the GAF were graduate students within the same university setting. This was a threat to the external validity of the study. A related problem is subject homogeneity, as all participants had similar backgrounds and were limited in diversity of race, age, and gender. The most important limitation was that we were unable to compare participants’ GAF ratings to any sort of “gold standard.” These results could be related to the amount of information included in the individual vignettes. Participants reported that the information provided in the severe psychopathology vignette was vague and incomplete with regard to particular areas of functioning and patient history.
This was said to be very problematic with regard to answering the specific questions generated by the computer-assisted GAF program. As previously discussed, the GAF report is a computerized program that uses a decision tree with various yes/no questions. Each question taps into a different aspect of the patient’s symptom severity and/or functional impairment. Based upon the answers to the questions, an upper limit of the GAF rating is continually set until the program determines a final range. Therefore, if necessary information is not available to answer these specific questions, informal conjecture must be used, which could account for greater inter-rater variability when using the computer-assisted GAF.

The present study addressed these limitations by evaluating the reliability of clinicians’ judgments using two different methods of determining GAF scores of a high information and a low information vignette with equal levels of global functioning at two different time points. GAF ratings were determined for vignettes by two recognized experts in the administration and scoring of the GAF in order to establish “gold standards” for each vignette. Dr. Robert Spitzer and Mimi Gibbons of New York University completed the expert ratings for each vignette separately, and inter-rater agreement was perfect between their specific GAF ratings. They have numerous publications on the GAF, assisted in the development of the DSM-IV casebook, and have over twenty years of experience training countless research assistants and clinicians in various structured diagnostic interviews that include the GAF. Each rater’s GAF scores were compared to the “gold standards” to assess the accuracy of GAF ratings by both means of administration. In addition, participants were all experienced practicing clinical psychologists as opposed to graduate students.
The use of two information conditions addressed a major concern of our previous research and answered questions related to whether the amount of information available regarding the patient’s current psychiatric symptoms, previous psychiatric history, family history, current and past social and occupational functioning, and additional sources of information influences GAF scoring. This holds significant importance since the GAF is used in a variety of settings with varied populations ranging from homeless individuals to those who have a well-developed system of social support. Additionally, depending on the degree of impairment, psychiatric patients may provide limited or inaccurate self-report information. Furthermore, some individuals may not be able to discuss current symptoms, duration of symptoms, family history, previous hospitalizations, and social and occupational histories.

Based upon the results of previous research suggesting that the amount of information included in the vignettes may directly influence GAF scores, specific hypotheses were developed accordingly. It was hypothesized that the computer-assisted GAF would have higher inter-rater agreement and greater test-retest reliability relative to the paper-and-pencil condition when used with the high-information vignettes and greater inter-rater variability and less inter-rater reliability when used with the low-information vignette. In addition, it was hypothesized that the computer-assisted GAF administration would result in GAF scores that are closer to the established “gold standard” in the high information condition. Specific hypotheses are outlined in the following section.

CHAPTER 2 APPARATUS AND TEST PROCEDURE

2.1. Instrument

The GAF (APA, 1987) is a rating scale from 0 (most severe) to 100 (least severe), divided into ten rankings ranging from most severe to no symptoms.
Each ranking is accompanied by a behavioral descriptor of functioning and symptom level ranging from “absent or minimal symptom (e.g., mild anxiety before a group presentation)....no more than everyday problems” to “persistent danger of severely hurting self or others... or persistent inability to maintain minimal personal hygiene or serious suicidal act with clear expectation of death.” In addition, each individual ranking has a nine-point range. Therefore, the rater must determine which descriptor best represents a patient’s level of functioning for the specified time period and then indicate the severity of the problem by assigning a specific score (see Appendix A).

The computer-assisted GAF (First and Multi-Health Systems Staff, 1997) has the same basic format as the traditional GAF. However, the computer program assists the clinician in determining the score by using a decision tree, with the first part of the tree covering the impact of the patient’s symptom severity and the second part covering the impact of the patient’s impairment in functioning. The computerized GAF is designed to be completed in less than three minutes. The decision tree determines the GAF rating using the minimum number of questions possible. The computer program provides a means for clinicians to ensure that all aspects of functioning (psychological, social, and occupational) are taken into consideration in patient assessment and that symptom severity and level of functioning are documented. Furthermore, it is designed to remove the “guess work” in determining this rating: throughout the decision tree, explanation screens make clear what is specifically meant by certain questions and provide examples of patients whose ratings fall in each particular range. On completion of the GAF report questions, the computer automatically determines a 10-point GAF rating range.
At that point, a sliding rating scale appears on the screen and the rater specifies an exact GAF rating within the 10-point range using clinical judgment and hypothetical comparison with other patients in that range, available by simply selecting the explanation button. However, there are no data on the reliability or validity of this method in either the manual or the literature (see Appendix B).

2.2. Research Design

The study was a 2 (standard vs. computerized method) X 2 (test and retest) X 2 (high vs. low information) factorial design, with each factor being within-subjects. Each rater evaluated one of two high information vignettes (Vignette A and B) using the traditional method of GAF determination and one of two low information vignettes (Vignette C and D) using the computer-assisted GAF program. The rationale for using four vignettes that varied in the amount of information provided is explained below. Vignettes A and C were the same vignette, varying only by the degree of information, as were Vignettes B and D. Furthermore, vignette assignment was randomized among participants so that half the raters evaluated Vignettes A and D and half the raters evaluated Vignettes B and C. Lastly, an equal number of participants evaluated each vignette using one of the two methods of GAF determination.

2.3. Participants

Thirty-six doctoral level clinical psychologists participated. All participants were recruited from clinical facilities in the greater Miami, Florida area, including programs and clinics associated with the University of Miami and Citrus Health Network. All participants had at least one year of experience with the administration of the GAF with psychiatric populations and currently used the GAF in their practice or clinic. Participants were chosen based on exposure to the diagnostic assessment tool and willingness to participate in the research protocol.

2.4. Materials

Four fictitious vignettes incorporating social, occupational, and clinical data representing two distinct psychiatric diagnoses were used. Each vignette included information obtained during a typical intake evaluation including background variables (age, sex, etc.), social/developmental history, current psychiatric symptoms, medication history, and previous psychiatric/psychological care. Each vignette included relevant background information, behavioral observations, and clinical findings. The vignettes were developed using information from the DSM-IV Casebook (APA, 1994) and the DSM-IV (APA, 1994). In addition, two leading experts in the GAF, Mimi Gibbons, LCSW and Robert Spitzer, Ph.D., reviewed each vignette and determined GAF scores for each of the four vignettes. All vignettes had similar demographic information in order to prevent any confounds that may influence the GAF ratings and prevent comparison.

Two of the vignettes (Vignettes A and B) represented a high information condition and two vignettes (Vignettes C and D) represented a low information condition. The conditions were defined by the amount of information provided in the vignettes, including second-party information, occupational functioning, social functioning, symptom severity, psychiatric history, medication history, and family history. Vignettes A and C described an individual with significant impairment in all areas of functioning. These vignettes described someone with severe psychological symptoms and impaired social and occupational functioning. Vignette A incorporated a large amount of information regarding presenting problems and psychiatric history, while Vignette C contained less information regarding these variables. Vignettes A and C were the same vignette, varied only by the degree of information provided. Both vignettes contained sufficient information to determine level of current functioning and a GAF score (see Appendices C and D).
Vignettes B and D represented an individual with significant impairment in most areas of functioning, with severe psychological symptoms and impaired social and occupational functioning. Vignettes B and D were the same vignette and varied only by the degree of information provided. Vignette B incorporated a large amount of information regarding presenting problems and psychiatric history, whereas Vignette D contained less information regarding these variables. Additionally, each vignette contained enough information to determine level of current functioning and a GAF score (see Appendices E and F).

Four vignettes were used as a control factor since the research design is within-subjects and there are two methods of GAF determination. The same vignette could not be used for each of the two conditions because GAF ratings in one condition would directly influence GAF ratings made using the second method of administration. In addition, results could be attributed to peculiarities of the vignette in question. Therefore, two versions of each vignette were used and were presented to subjects in counterbalanced order. It is important to note that the distinction between the vignettes was the amount of information provided in the vignettes and not the severity of psychopathology they represented (i.e., mild versus severe).

2.5. Procedure

Following participant selection, raters underwent a brief 15-minute overview of the procedures involved in using the computerized version of the GAF. One high information and one low information vignette were evaluated by each rater according to two methods of administering and determining the GAF score. Method A consisted of the traditional paper/pencil version in which the rater determines the client’s GAF score based on the individual’s psychological, social, and occupational functioning.
Based upon the information at hand and their own clinical judgment, they determined the particular range (behavioral descriptor) that best described the client. The clinician then determined a specific score that indicated the client’s severity within that particular range. Method B consisted of the computer-assisted GAF (First & Multi-Health Systems Staff, 1997), in which assessment questions related to psychological, social, and occupational functioning were presented to the clinician in a yes/no format. A decision tree was then used to guide the clinician and determine a GAF rating using the minimum number of questions possible. Therefore, the clinician only needed to click on the yes icon or the no icon and the assessment tool generated the GAF 10-point range based upon the clinician’s responses. The clinician then determined an exact GAF score using the information icon, which provided hypothetical comparisons with other patients in that range.

One week following initial participation in the study, each rater re-evaluated the same two vignettes in order to assess test-retest reliability. The raters re-evaluated each vignette using the same method of GAF determination previously utilized.

2.6. Hypotheses and Statistical Analyses

1. It was hypothesized that the computer-assisted GAF, relative to the paper-and-pencil method, would have greater correspondence with the “gold standard” GAF score when used with the high-information vignette relative to the low information vignette. In order to address this hypothesis, a 2 (method: computer-assisted GAF vs. the traditional paper and pencil GAF) by 2 (level of information: high information vignette vs. low information vignette) repeated-measures ANOVA was used to analyze the data. A difference score was calculated between the GAF ratings and the “gold standard” for each subject in each condition and the difference score served as the dependent variable for this analysis.
A significant interaction effect was predicted between the two independent variables (i.e., method and vignette information).

2. It was hypothesized that relative to the traditional paper-and-pencil method, the computer-assisted GAF would have higher inter-rater agreement when used with the high information condition when compared to the low information condition. The variability within each condition served as the dependent variable and Mauchly’s test of sphericity was used to assess the homogeneity of variance across conditions. It was predicted that the test of homogeneity of variance would reach statistical significance, indicating a difference in variances between the two methods of GAF determination in each information condition.

3. It was hypothesized that relative to the traditional paper-and-pencil method, the computer-assisted GAF administration would result in greater test-retest reliability in the high information condition when compared to the low-information condition. Cronbach’s alpha coefficients were computed between subjects’ initial GAF ratings and GAF ratings made one week later. This analysis resulted in 4 separate coefficients, one for each combination of method and level of information.

Although these hypotheses are stated as directional, the theory behind them does not support this presumption. Therefore, 2-tailed tests of significance were used in the statistical analyses.

2.7. Power Analysis

A power analysis based upon a 2 by 2 ANOVA was used to determine the number of subjects needed to demonstrate statistical significance. A small effect size of .25, based on Cohen’s standards for a 2 by 2 ANOVA, was used because no prior research using this type of comparison is available within the literature. Using a power analysis program, Sample Power (Borenstein, Rothstein, & Cohen, 2001), a total sample of 35 participants was identified as necessary to obtain a power of .80, with alpha set at .05.
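
The test-retest analysis described under Hypothesis 3 can be illustrated with a minimal sketch. It assumes that the two assessment occasions are treated as two "items" in the standard Cronbach's alpha formula; the function name and the example ratings are hypothetical and are not taken from the study data or from the software actually used.

```python
from statistics import variance


def cronbach_alpha(test, retest):
    """Cronbach's alpha for two occasions, treated as k = 2 items.

    Assumption: each list holds one GAF rating per rater, in the same
    rater order, for a single method-by-information condition.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equal-length lists with at least 2 raters")
    k = 2  # number of occasions
    totals = [a + b for a, b in zip(test, retest)]
    item_var = variance(test) + variance(retest)  # sum of occasion variances
    total_var = variance(totals)                  # variance of summed scores
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical ratings from five raters at test and one-week retest.
initial = [40, 45, 50, 42, 48]
one_week = [41, 44, 52, 40, 47]
alpha = cronbach_alpha(initial, one_week)
```

Running the function once per condition would give the four coefficients mentioned above, one for each combination of method and level of information.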
CHAPTER 3 RESULTS

Thirty-six participants scored GAFs on two of four vignettes. Descriptive data are provided in Table 1 and Figure 1. Overall, the mean scores for the vignettes were in the moderate-to-severe range.

It was hypothesized that the computer-assisted GAF, relative to the paper-and-pencil method, would have greater correspondence with the “gold standard” GAF score, especially in the high information condition relative to the low information condition. In other words, a 2-way interaction (method by level of information) was predicted. In order to address this hypothesis, a 2 (method: computer-assisted GAF vs. the traditional paper and pencil GAF) by 2 (high vs. low information) repeated-measures analysis of variance (ANOVA) was conducted to examine differences among the GAF scores. For this analysis, scores at the two assessment time points were collapsed, thereby yielding a single data point for each participant for each of the experimental conditions. The dependent variable consisted of the difference score between the gold standard for a given condition and the participant’s score for that condition. That is, the gold standard score was subtracted from each participant’s score, depending on vignette (40.5 for Vignettes A and C and 47.5 for Vignettes B and D; see Figures 2 and 3). The ANOVA indicated no main effect for method, F(1, 35) = 0.50, p > .05, or for level of information, F(1, 35) = 3.06, p > .05. However, consistent with predictions, the method by information interaction reached significance, F(1, 35) = 6.49, p < .05. In order to clarify the interaction, Tukey post-hoc tests were conducted to examine the various means at each point.
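
The construction of the dependent variable described above can be sketched as follows. The gold-standard values are those reported in the text; the function name is hypothetical, and collapsing the two time points by taking their mean is an assumption, since the text does not state how the scores were collapsed.

```python
# Gold-standard GAF scores as reported above (A/C and B/D share a score).
GOLD_STANDARD = {"A": 40.5, "C": 40.5, "B": 47.5, "D": 47.5}


def discrepancy(vignette, test_score, retest_score):
    """Signed difference between a rater's collapsed score and the gold standard.

    Assumption: the two assessment time points are collapsed by averaging.
    """
    collapsed = (test_score + retest_score) / 2
    return collapsed - GOLD_STANDARD[vignette]


# Hypothetical rater: scored Vignette A at 45 (test) and 50 (retest).
d = discrepancy("A", 45, 50)  # 47.5 - 40.5 = 7.0
```

A positive value indicates a rating above the expert score and a negative value a rating below it; each participant would contribute one such value per experimental condition to the 2 by 2 ANOVA.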
There was a significant difference in GAF scores for the low and high information vignettes for the paper-and-pencil method, F(2, 69) = 8.88, p < .001, with the high information condition resulting in higher discrepancy scores than the low information condition. Also consistent with predictions, there was a significant difference between the paper-and-pencil administration and computer-assisted administration for the high information vignette condition, F(1, 35) = 5.02, p < .05, with the paper-and-pencil condition yielding higher discrepancy scores than the computer condition. Figure 4 illustrates this interaction. No other differences were identified among the other group means.

The second hypothesis concerned the degree to which the variance in each of the GAF administration conditions and level of information conditions may have differed. The means and standard deviations of the participants’ GAF scores for each condition were as follows: Paper-and-Pencil/Low Information Condition: M = 42.05, SD = 7.65; Computer-Assisted/Low Information Condition: M = 44.35, SD = 10.78; Paper-and-Pencil/High Information Condition: M = 48.61, SD = 6.16; and Computer-Assisted/High Information Condition: M = 43.81, SD = 11.84. The appropriate statistic to examine the equality of variance in repeated-measures ANOVA is sphericity. Sphericity is a test of the equality of variance assumption in repeated-measures ANOVA, which can be thought of as an extension of the homogeneity of variance assumption in independent measures ANOVA (Cohen & Cohen, 1983). The sphericity values for the method and information main effects, as well as for the method by information interaction, were all 1, indicating that the variances were equal and not statistically different.

The third hypothesis examined the one-week test-retest reliability of each method of GAF administration across each information condition.
Table 2 presents the Cronbach alphas for each pair, as well as a qualitative description of the magnitude of the reliability coefficient. Most of the reliability coefficients from initial testing to the retest were excellent to good; the exception was the coefficient for the computer GAF scores for the low information version of Vignette A, which could be due to one very low score. When the extreme outlier was removed, the alpha increased from .58 to .70.

CHAPTER 4 DISCUSSION

The Global Assessment of Functioning Scale (GAF) is a commonly used measure of overall severity of psychiatric disturbance and associated functional impairment. Despite its widespread use, there has been relatively little empirical research on the GAF, particularly related to its psychometric properties. In addition, the GAF combines three areas of functioning that do not necessarily covary (i.e., psychological, social, and occupational functioning) into a single measure, and excludes the impact of physical impairments (Skodal, 1988; Goldman et al., 1992; Roy-Byrne et al., 1996). Several researchers have identified this exclusion as an important problem with the GAF. For example, a depressed individual may experience a reduction in symptoms through medication, but continue to have strained relationships with others and difficulty maintaining job stability, whereas another individual with depression might be able to maintain employment but experience occasional bouts of depression.

Prior research has highlighted several problems with the paper-and-pencil GAF, including its lack of structure, poor reliability, and the fact that it is unclear with respect to how to integrate the three areas of functioning into a single rating. In an effort to address these limitations, First and colleagues at Multi-Health Systems, Inc. created the computer-assisted GAF in 1997.
The program consists of assessment questions related to psychological, social, and occupational functioning presented in a yes/no format. A decision tree is then used to guide the clinician and determine a GAF rating using the minimum number of questions possible. The assessment tool generates a 10-point range based upon the clinician’s responses, and the clinician then determines an exact GAF score within that range.

Only one study has examined the psychometric properties of the computer-assisted GAF. Woldoff and Herbert (2004) evaluated the reliability of clinical psychology graduate students’ judgments using the traditional and the computer-assisted methods of determining GAF scores of two different case vignettes. One vignette represented an individual with severe psychopathology, whereas the other vignette described an individual with mild psychopathology. These vignettes were evaluated at an initial testing and again at a one-week retest using two methods of administering the GAF and determining the GAF score (i.e., the traditional paper/pencil GAF and the computer-assisted GAF). The results revealed that for the severe psychopathology vignette the computer-assisted method had greater inter-rater variability and worse test-retest reliability. In addition, the computer method yielded greater inter-rater variability but comparable test-retest reliability for the milder psychopathology vignette. The means for GAF ratings of the milder psychopathology vignette were almost identical for both methods of administration. These results implied that the computer method of GAF determination yields greater inter-rater variability when compared to the traditional method of GAF administration. However, we did not compare participants’ GAF ratings to any sort of “gold standard” ratings.

4.1 Psychometric Properties of the Computerized GAF

The present study extended the existing body of research on the GAF in several ways.
It assessed the possibility of using a computer-assisted GAF administration instead of the traditional paper-and-pencil GAF administration. Moreover, it explored the potential impact that the amount of available information regarding treatment history, symptoms, and family history had upon GAF ratings. In addition, this study evaluated the accuracy of GAF ratings relative to expert ratings using the two methods of GAF administration.

A primary goal of this study was to examine the relationship of practicing clinicians’ ratings to “gold-standard” scores as determined by a group of renowned GAF experts. Gold standard scores provided a means of evaluating these two measures that has not appeared in the literature to date. Results indicated that for the traditional paper-and-pencil GAF administration method, the low information condition was closer to the gold standard score than was the high information condition. This finding may have occurred because the paper-and-pencil format lacks structure in the form of specific instructions regarding the determination of an actual GAF score. Therefore, it is possible that greater amounts of available information created greater confusion for clinicians as to how to incorporate the information into a single composite rating.

Also consistent with predictions, the paper-and-pencil condition resulted in greater discrepancy scores than the computer-assisted administration for the high information condition only; the two methods did not significantly differ from one another in the low information condition. Nevertheless, it should be noted that there was a nonsignificant trend for higher discrepancy scores for the paper-and-pencil method in the low information condition as well. This pattern of findings could be related to the increased structure provided by the computer-assisted GAF program.
Specifically, since some of the “guesswork” is removed through the use of structured questions and the decision tree model, the computerized method allows for greater consistency across ratings. In addition, the computer-assisted GAF forces the clinician to take all three areas of functioning into account and determines the appropriate GAF range based upon the responses provided by the clinician. This increased structure is especially relevant to situations in which a relatively high amount of information is available, which would characterize most actual clinical situations.

Although both methods of GAF administration demonstrated adequate one-week test-retest reliability, the computer-assisted GAF resulted in greater inter-rater variability as indicated by the standard deviations for both the high and low information conditions when compared to the traditional paper-and-pencil method of administration. This finding appears to be related to outlier scores in the computer-assisted condition, which decreased reliability coefficients. When the outliers were removed from the analysis, a decrease in the standard deviations was observed and the reliability coefficients increased from poor to acceptable.

These results could be related to the procedural components of the computer-assisted and paper-and-pencil GAF. Although most participants reported finding the computerized GAF helpful, a small number of participants reported that they found some of the computer-assisted questions confusing and repetitive. Specifically, they stated that the decision tree model occasionally presented the same question more than once, suggesting perhaps that they should change their initial response. The purpose of this feature of the program is to provide a reliability check if responses to questions within the same domain are not in agreement. Although not technically a program flaw, it may create confusion among users, and should be addressed in future revisions of the program.
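The three quantities discussed above (test-retest reliability expressed as Cronbach's alpha across two administrations, inter-rater variability expressed as a standard deviation, and the discrepancy from a gold standard score) can be illustrated with a minimal sketch. The function names and all ratings below are hypothetical and are not data from this study; the alpha formula is the standard one for k = 2 "items" (Time 1 and Time 2), and the analysis actually run for this study may have differed in its details.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(time1, time2):
    """Cronbach's alpha treating two administrations as k = 2 items.

    time1 and time2 hold one rating per rater, in the same rater order.
    """
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equal-length lists with at least 2 raters")
    k = 2
    item_variances = variance(time1) + variance(time2)
    total_variance = variance([a + b for a, b in zip(time1, time2)])
    if total_variance == 0:
        raise ValueError("summed ratings have zero variance")
    return k / (k - 1) * (1 - item_variances / total_variance)


def rater_sd(ratings):
    """Sample standard deviation across raters (inter-rater variability)."""
    return stdev(ratings)


def mean_discrepancy(ratings, gold_standard):
    """Mean signed difference between raters' scores and a gold standard score."""
    return mean(r - gold_standard for r in ratings)


if __name__ == "__main__":
    # Hypothetical GAF ratings from five raters at two time points.
    t1 = [40, 45, 50, 35, 42]
    t2 = [42, 44, 51, 36, 40]
    hypothetical_gold = 45  # illustrative value only
    print("alpha:", round(cronbach_alpha(t1, t2), 2))
    print("SD at Time 1:", round(rater_sd(t1), 2))
    print("mean discrepancy at Time 1:", mean_discrepancy(t1, hypothetical_gold))
```

Recomputing these values after removing a single extreme rating shows how sensitive the standard deviation and the reliability coefficient can be to one outlying rater in a small sample.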
As previously discussed, the computerized program poses questions that tap into different aspects of the patient’s symptom severity and functional impairment that figure into the GAF score. If necessary information is not available to answer these specific questions, informal conjecture must be used, which could account for greater inter-rater variability when using the computer-assisted GAF. Moreover, participants reported confusion over how to select a 10-point range on the paper-and-pencil GAF if all of the indicators identified for that range were not explicitly discussed in the vignette.

4.2 Clinical Versus Actuarial Decision Making

Differences in the raters’ concept of mental health and the procedural differences between the computer-assisted GAF and the traditional GAF may have contributed to the present results. Although all raters were identified as experienced clinicians given their education, training, and professional experience, research has shown that experience is only weakly associated with performance (Dawes, Faust, & Meehl, 1989). Research on clinical decision making has contrasted two approaches, clinical and actuarial. In the actuarial approach, a decision is made using empirically established relations between data and the event of interest, independent of human judgment. In the clinical approach, decisions are based on the mental processes of the actual human judge. The literature has shown that in many decision-making domains, actuarial models are more accurate in making predictions about outcomes than are trained human judges. In addition, a number of researchers have concluded that professional identity is unrelated to the accuracy of clinical judgment (Meehl, 1986; Dawes et al., 1989; McCauley, 1991). Moreover, research has shown that psychologists’ training and clinical experiences may actually cause an exaggerated sensitivity to psychopathology.
Secondly, clinical judgment has been shown to be susceptible to various biases whereas actuarial judgment is not. Garb and Schramke (1996) reviewed this literature, examining biases in the clinical judgments of psychologists and psychiatrists using meta-analytic methods. Blacks and Hispanics were more likely to be misdiagnosed with schizophrenia, middle-class patients were more likely to be referred to outpatient psychotherapy than lower-class patients, and Black patients were more likely to be prescribed anti-psychotic medication. In conclusion, the authors stated that there is little evidence that increased experience results in increased accuracy of clinical prediction.

Based upon this research, it is possible that the results of the present study are related to the differences in methodology. Specifically, the traditional GAF relies heavily upon clinical judgment and thereby can be subject to potential judgment errors such as regression toward the mean, overconfidence, and the hindsight bias. In contrast, the computer-assisted GAF uses more of an actuarial approach in determining a GAF score.

4.3 Clinical Implications

Overall, these results generally support the reliability and clinical utility of the GAF. The results suggest that whereas both methods can be utilized reliably by clinical psychologists, the computer-assisted GAF appears to generate scores that are more accurate, at least in the sense of being closer to “gold standard” scores provided by expert diagnosticians. A notable strength of the computer-assisted GAF is that it ensures that all aspects of functioning (psychological, social, and occupational) are explicitly considered (First, 1997). This feature of the program is of significant importance as the GAF is used in a variety of settings with varied populations ranging from homeless individuals to those who have a highly developed system of social support.
In addition, the computer-assisted GAF is time-efficient, easy to administer, and allows for appropriate record keeping. In real-world practice, the most significant drawback is the potential cost and need for a computer given the limited resources in many clinical settings.

The present findings are noteworthy for several reasons. Providers and consumers of mental health services are interested in both symptoms and functioning. In addition, as mental health providers and managed care providers utilize the GAF scale to justify insurance reimbursement for services and to demonstrate efficacy of treatment interventions, the need for accurate and consistent measurement tools is of obvious importance. Moreover, all patients discharged from a psychiatric hospitalization are required to have a GAF score recorded as part of the discharge process in the United States. If GAF scores continue to be given in an unstructured manner due to confusion over how to integrate the potentially disparate contributions of a patient’s psychiatric symptoms and functioning into a single GAF rating, problems ensuring an appropriate standard of care may result. In addition, a reliable and valid measure of symptoms and functioning is critical to clinicians’ efforts to document the effectiveness of their interventions.

A transition to the computer-assisted GAF program would require numerous changes in policy, education, and training in mental health agencies. First, training would be required to ensure that all clinicians understand the computer-assisted GAF and the procedural components of the actual program. In addition, practice sessions using videotaped case presentations would be recommended to familiarize all clinicians with the computer-assisted program and its intended use. Second, a major change in policy would be required so that GAF scores were no longer given based upon managed health-care requirements.
Specifically, clinicians would be permitted to give GAF scores reflective of the client's actual functioning without being penalized by the managed-care provider because the score is not within the designated range for service criteria. Lastly, policies would have to be modified to allow for the cost of the program, training, and changes in how GAF scores are used to determine treatment eligibility. Therefore, with further research and health-care reforms, a transition to the computer-assisted GAF program holds promising potential.

4.4 Strengths and Limitations

These results should be interpreted with caution given that the study had several possible limitations. First, the results may be limited by the relatively small sample of clinicians, most of whom were female and Latino, from a small yet diverse community in South Florida. The small sample size may have limited statistical power for some tests. In addition, the clinicians’ individual backgrounds and graduate training differed from one another. To the extent that graduate training affects GAF ratings, responses may have differed depending upon training, clinical orientation, and area of expertise. However, the sample size was insufficient to examine the potential effects of such variables. Another factor that may have had an adverse impact on GAF ratings was that participants did not have the same level of experience and expertise with the GAF, and specific training was not completed to enhance reliability. Lastly, the sample may not have accurately represented the wide array of individuals in the mental health profession who use the GAF on a regular basis, such as social workers, mental health technicians, and nursing staff.

Another potential limitation of this study may have been the use of fictitious vignettes that represented only one specific gender, race, and age group. It is possible that biases regarding these demographics may have impacted GAF ratings.
In addition, the vignettes may not have contained some of the relevant information that is typically obtained in the actual clinical intake process. Future research might use actual transcriptions of clinician-patient interviews observed in actual clinical settings. This would ensure replication of the conditions under which the GAF rating would actually occur. Moreover, the vignettes represented mild-to-moderate levels of psychiatric disturbance similar to what would be found at a community outpatient clinic. The use of such a symptom profile may have restricted the generalizability of these results and points to the need to replicate these findings with vignettes representative of an inpatient population.

A final possible limitation concerns the specific questions used in the computer-assisted program. Specifically, no information is available as to how the individual questions were developed and how it was determined to include them in the decision tree model. The author simply reports that he completed his residency under Robert Spitzer in the Biometrics Research Department at Columbia University and has a doctoral degree in Computer Science. In addition, the authors of the program make the rather dubious claim that “psychometric data is not needed as the program is a diagnostic tool based upon the DSM and not a test” (First et al., 1997, p. 4).

In spite of the aforementioned limitations, this study nevertheless has several noteworthy strengths. This study was the first to evaluate the psychometric properties of the GAF in relation to the amount of information available. This is of importance given the fact that the GAF is used in a wide array of settings with varied patient populations.
In addition, unlike other studies, the reliability findings were obtained for raters without formalized training in the use of the GAF, thereby enhancing the external validity of the findings as representing a realistic estimate of reliability figures found in real-world practice with clinical psychologists. Another significant strength of this research is that GAF ratings were determined by two recognized experts in the administration and scoring of the GAF to establish “gold standard” scores for each vignette.

4.5 Conclusions and Future Directions

In summary, the results of this study support the previous finding that the GAF can be rated reliably when used with mild-to-moderately impaired individuals. Future studies are needed to explore the aforementioned shortcomings using more clinically typical stimuli with varying levels of psychopathology and a larger, more diverse sample. In addition, it is necessary to evaluate whether the computer-assisted GAF provides information independent of Axis I and Axis II of the DSM-IV diagnostic system and whether it contains information about patients’ social and occupational functioning that is independent of clinical judgment about the severity of their psychological symptoms. Moreover, future research is needed to evaluate the reliability and validity of the proposed GAF replacements, including the computerized GAF, SOFAS, GARF, and the K-Axis, to determine if they are an improvement to the existing Axis V. Lastly, future research needs to ascertain whether the GAF and these proposed replacements can be used effectively as outcome measures.

The present findings are consistent with the conclusion that the GAF is a worthwhile tool, and that the computer-assisted GAF may be an improvement over the traditional paper-and-pencil GAF in some circumstances. The traditional GAF lacks structure and depends greatly upon clinical judgment and conjecture.
In addition, the directions do not explicitly state how to integrate the three areas of functioning into a single composite rating. The computer-assisted GAF attenuates many of these limitations by providing a structured set of questions that address each area of functioning to ensure that they are considered in the composite score. Furthermore, the computer-assisted GAF follows a decision tree model that determines the appropriate GAF range based upon the responses of the clinician. Further research is needed to evaluate the clinical utility of the GAF relative to other recently developed measures.

LIST OF REFERENCES

American Psychiatric Association. (1987). Diagnostic and Statistical Manual of Mental Disorders (3rd ed., revised) (DSM-III-R). Washington, DC: APA.
American Psychiatric Association. (1994). Diagnostic and Statistical Manual of Mental Disorders (4th ed.) (DSM-IV). Washington, DC: APA.
Bodlund, O., Kullgren, G., Ekselius, L., Lindstrom, E., & von Knorring, L. (1994). Axis V - Global assessment of functioning scale: Evaluation of a self-report version. Acta Psychiatrica Scandinavica, 90, 342-347.
Borenstein, M., Rothstein, H., & Cohen, J. (2001). Sample Power [Computer software]. Chicago, Illinois: SPSS Inc.
Byrne, P.R., Dagadakis, C., Unutzer, J., & Ries, R. (1996). Evidence for the limited validity of the revised global assessment of functioning scale. Psychiatric Services, 47, 864-866.
Calvocressi, L., Libman, D., Vegso, S.J., McDougle, C.J., & Price, L.H. Global functioning of inpatients with obsessive-compulsive disorder, schizophrenia, and major depression. Psychiatric Services, 49, 379-381.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). New Jersey: Lawrence Erlbaum Associates, Inc.
Dawes, R.M., Faust, D., & Meehl, P.E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.
Dworkin, R.J., Friedman, L.C., & Telschow, R.L. (1990).
The longitudinal use of the global assessment scale in multiple rater situations. Community Mental Health Journal, 26, 335-344.
Endicott, J., Spitzer, R.L., & Fleiss, J.L. (1976). The global assessment scale. Archives of General Psychiatry, 33, 766-771.
First, M. (1997). A DSM-IV Program for Windows: The GAF Report (Computer software and manual). Toronto, Canada: Multi-Health Systems.
Garb, H. N., & Schramke, C. J. (1996). Judgment research and neuropsychological assessment: A narrative review and meta-analyses. Psychological Bulletin, 120, 140-153.
Goldman, H.H., Skodol, A.E., & Lave, T.R. (1992). Revising Axis V for DSM-IV: A review of measures of social functioning. American Journal of Psychiatry, 149, 1148-1156.
Gordon, R.E. & Gordon, K.K. (1985). Predicting length of hospital stay of psychiatric patients. American Journal of Psychiatry, 142, 235-237.
Gordon, R.E. & Gordon, K.K. (1987). Relating axes IV and V of DSM-III to clinical severity of psychiatric disorders. Canadian Journal of Psychiatry, 32, 423-424.
Hall, R.C. (1995). Global assessment of functioning: A modified scale. Psychosomatics, 36, 267-275.
Higgins, J. & Purvins, K. (2000). A comparison of the Kennedy Axis V and the Global Assessment of Functioning Scale. Journal of Psychiatric Practice, 6, 84-90.
Hilsenroth, M.J., Ackerman, S.J., Blagys, M.D., Baumann, B.D., Baity, M.R., Smith, S.R., Price, J.L., Smith, C.L., Heindselman, T.L., Mount, M., & Holdwick, D.J. (2000). Reliability and validity of DSM-IV Axis V. American Journal of Psychiatry, 157, 1858-1863.
Jones, S.H., Thornicroft, G., Coffey, M., & Dunn, G. (1995). A brief mental health outcome study: Reliability and validity of the global assessment of functioning (GAF). British Journal of Psychiatry, 166, 654-659.
Kennedy, J.A. (2003).
Mastering the Kennedy Axis: A New Psychiatric Assessment of Patient Functioning. Washington, DC: American Psychiatric Publishing.
Luborsky, L. (1962). Clinicians' judgement of mental health. Archives of General Psychiatry, 7, 407-417.
Luborsky, L. & Bachrach, H. (1974). Factors influencing clinician's judgement of mental health. Archives of General Psychiatry, 31, 292-299.
McCauley, C. (1991). Selection of National Science Foundation graduate fellows: A case study of psychologists failing to apply what they know about decision making. American Psychologist, 46, 1287-1291.
Meehl, P.E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50, 370-375.
Mezzich, J.E., Evanczuk, K.J., Mathias, R.J., & Coffman, G.A. (1984). Admission decisions and multiaxial diagnosis. Archives of General Psychiatry, 41, 1001-1004.
Moos, R.H., McCoy, L., & Moos, B.S. (2000). Global assessment of functioning ratings: Determinants and role as predictors of one-year treatment outcomes. Journal of Clinical Psychology, 56(4), 449-461.
Moos, R.H., Nichol, A.C., & Moos, B.S. (2002). Global assessment of functioning ratings and the allocation and outcomes of mental health services. Psychiatric Services, 53, 730-737.
Patterson, D. & Shin-Lee, M. (1995). Field trial of the global assessment of functioning scale-modified. American Journal of Psychiatry, 152, 1386-1388.
Phelan, M., Wykes, T., & Goldman, H. (1994). Global functioning scales: A review. Social Psychiatry and Psychiatric Epidemiology, 29, 205-211.
Piersma, H.L. & Boes, J.L. (1997). The GAF and psychiatric outcome: A descriptive report. Community Mental Health, 46, 117-121.
Roy-Byrne, P., Dagadakis, C., Unutzer, J., & Ries, R. (1996). Evidence for limited validity of the revised global assessment of functioning scale. Psychiatric Services, 47(9), 864-866.
Skodol, A.E., Link, B.G., Shrout, P.E., & Horwath, E. (1988). The revision of Axis V in DSM-III-R: Should symptoms be included?
American Journal of Psychiatry, 145, 825-829.
Thompson, J.W., Burns, B.J., Goldman, H.H., & Smith, J. (1992). Initial level of care and clinical status in managed mental health programs. Hospital and Community Psychiatry, 43, 599-603.
Williams, J.B. (1985). The multiaxial system of DSM-III: Where did it come from and where should it go? Archives of General Psychiatry, 42, 175-180.
Woldoff, S.B. & Herbert, J.D. (2001). Reliability and Validity of the GAF Using Two Methods of GAF Determination (Masters Thesis, MCP Hahnemann University, 2001). Manuscript in preparation.

APPENDIX A

TRADITIONAL GAF SCALE

91-100  Superior functioning in a wide range of activities, life's problems never seem to get out of hand, is sought out by others because of his or her many qualities. No symptoms.
90-81  Absent or minimal symptoms, good functioning in all areas, interested and involved in a wide range of activities, socially effective, generally satisfied with life, no more than everyday problems or concerns.
80-71  If symptoms are present they are transient and expectable reactions to psychosocial stresses; no more than slight impairment in social, occupational, or school functioning.
70-61  Some mild symptoms OR some difficulty in social, occupational, or school functioning, but generally functioning pretty well, has some meaningful interpersonal relationships.
60-51  Moderate symptoms OR any moderate difficulty in social, occupational, or school functioning.
50-41  Serious symptoms OR any serious impairment in social, occupational, or school functioning.
40-31  Some impairment in reality testing or communication OR major impairment in several areas, such as work or school, family relations, judgment, thinking, or mood.
30-21  Behavior is considerably influenced by delusions or hallucinations OR serious impairment in communications or judgment OR inability to function in all areas.
20-11  Some danger of hurting self or others OR occasionally fails to maintain minimal personal hygiene OR gross impairment in communication.
10-1  Persistent danger of severely hurting self or others OR persistent inability to maintain minimum personal hygiene OR serious suicidal act with clear expectation of death.

APPENDIX B

COMPUTER-ASSISTED DECISION TREE MODEL

APPENDIX C

VIGNETTE A/HIGH INFORMATION

Ms. Montgomery is a 41 year old African-American woman who has been participating in outpatient group therapy at a community mental health clinic for the past ten months. She has a history of two psychiatric hospitalizations due to severe depression and suicidal ideation. She is the third of four children, and reports that she was raised by her paternal grandparents. While she reportedly developed intellectually and physically at a normal rate, Ms. Montgomery describes herself as a shy, lonely, and quiet child. In addition, she characterizes her family relationship during her adolescence as “cold and unsupportive.” She also reports that she did not suffer from any major illnesses during her childhood. Ms. Montgomery states that she felt neutral about school and received average grades. In addition, Ms. Montgomery states that she had several friends and that she rarely got into trouble. After completing a two-year medical assistant program, Mrs. Montgomery enlisted in the army for approximately 3 years, after which time she received a medical discharge because of psychological problems. Since leaving the army, Ms. Montgomery states that she enrolled in Community College, but did not finish her degree. In addition, Ms. Montgomery reports that she has had more than seven different full-time jobs, and that she has experienced periods of unemployment that have lasted for more than a year. Currently, Ms. Montgomery’s major source of income is her unemployment benefits. She is also currently pursuing SSI benefits. Ms.
Montgomery reports that she has never been married and has no children. She reports that her parents and all grandparents are deceased. Ms. Montgomery also reports that she has contacted her siblings over the phone, but has a difficult time visiting them since they live far away. Ms. Montgomery appears about her stated age and is always prompt for her appointments. She was approachable and had a relatively pleasant demeanor. However, Ms. Montgomery was often poorly groomed, with soiled clothes and an unbathed appearance. Ms. Montgomery often had trouble making eye contact during the assessment and was somewhat guarded. In addition, Ms. Montgomery had a difficult time staying focused on the questions being asked, and tended to launch into tangential stories at inappropriate times. Moreover, these stories tended to have a common theme of persecution and paranoia. Ms. Montgomery appears to have very poor insight into her psychological problems. During all of her appointments with the current evaluator, she insisted that she had been labeled incorrectly as having “mental problems” and that she did not belong with the other individuals in her group therapy sessions. In addition, she stated that she wanted a job but that she had found fault with each job opportunity, apparently believing that she could do better. Overall, it appears that Ms. Montgomery has some very negative views about her social situation and the overall quality of her life. She has been consistently dissatisfied with her economic status as well as her social pursuits. She denies any suicidal or homicidal ideation.

APPENDIX D

VIGNETTE C/LOW INFORMATION

Ms. Montgomery is a 41 year old African-American woman who has been participating in outpatient group therapy at a community mental health clinic for the past ten months. She has a history of severe depression. She is the third of four children, and reports that she was raised by her paternal grandparents.
While she reportedly developed intellectually and physically at a normal rate, Ms. Montgomery describes herself as a shy, lonely, and quiet child. In addition, she characterizes her family relationship during her adolescence as “cold and unsupportive.” She also reports that she did not suffer from any major illnesses during her childhood. Ms. Montgomery states that she felt neutral about school and received average grades. In addition, Ms. Montgomery states that she had several friends and that she rarely got into trouble. After completing a two-year medical assistant program, Mrs. Montgomery enlisted in the army for approximately 3 years, after which time she received a medical discharge because of psychological problems. Since leaving the army, Ms. Montgomery states that she enrolled in Community College, but did not finish her degree. In addition, Ms. Montgomery reports that she has had more than seven different full-time jobs, and that she has experienced periods of unemployment that have lasted for more than a year. Currently, Ms. Montgomery’s major source of income is her unemployment benefits. She is also currently pursuing SSI benefits. Ms. Montgomery also reports that she has never been married and has no children. Ms. Montgomery was approachable and had a relatively pleasant demeanor. However, Ms. Montgomery was often poorly groomed, with soiled clothes and an unbathed appearance. Ms. Montgomery often had trouble making eye contact during the assessment and was somewhat guarded. Ms. Montgomery appears to have very poor insight into her psychological problems. During all of her appointments with the current evaluator, she insisted that she had been labeled incorrectly as having “mental problems” and that she did not belong with the other individuals in her group therapy sessions. In addition, she stated that she wanted a job but that she had found fault with each job opportunity, apparently believing that she could do better. Overall, it appears that Ms. 
Montgomery has some very negative views about her social situation and the overall quality of her life. She has been consistently dissatisfied with her economic status as well as her social pursuits. She denies any suicidal or homicidal ideation.

APPENDIX E

VIGNETTE B/HIGH INFORMATION

Ms. Harris is a 40 year old African-American woman who has been participating in outpatient therapy at a community mental health clinic for the past year. She has a history of one psychiatric hospitalization due to a suicidal gesture. She is the eldest of two children, and reports that she was raised by her mother and aunt. While she reportedly developed intellectually and physically at a normal rate, Ms. Harris describes herself as a timid child. In addition, she characterizes her family relationship during her adolescence as “distant.” She also reports that she did not suffer from any major illnesses during her childhood except for chicken pox. Ms. Harris states that she enjoyed school and received average grades. In addition, Ms. Harris states that she had a large group of friends and that she rarely got into trouble. After completing High School, Ms. Harris enrolled in community college. However, she found the work to be overwhelming and dropped out during her sophomore year. Ms. Harris reports that she has been employed as a part-time sales associate at a local store. However, she states that she has difficulty getting to work on time and sometimes does not get out of bed. Currently, Ms. Harris’s major source of income is her current salary and financial assistance she receives from her children. Ms. Harris reports that she has never been married and has three children. She reports that her mother is deceased and that her aunt resides in a nursing home. She claims that she has never met her father and is unaware as to his whereabouts. Ms. Harris also reports that her siblings visit her about once a month but she doesn’t contact them as she does not like to use the telephone.
Lastly, she reports that her youngest daughter and son live with her and assist her with the finances. Her other children live on their own and visit on holidays. She states that her relationship with her children is strained as they do not understand why she is so sad and withdrawn. Ms. Harris appears about her stated age and is usually late for her appointments. She was approachable, soft spoken, and had a relatively pleasant demeanor. However, Ms. Harris was often poorly groomed, wearing what appeared to be the same sweat-suit each week. Ms. Harris often had trouble making eye contact during the assessment and was tearful at times. In addition, Ms. Harris had a difficult time staying focused on the questions being asked, and tended to perseverate on family issues and her need to get better. Moreover, she would be quite talkative at times and then suddenly withdraw and cry excessively. Ms. Harris appears to have very poor insight into her psychological problems and blames outsiders for her own difficulties. During all of her appointments with the current evaluator, she insisted that she just had “bad nerves.” Furthermore, she stated that she stopped taking her psychotropic medication because it hurt her stomach but could not report when she discontinued using it. Overall, it appears that Ms. Harris has some very negative views about her social situation and the overall quality of her life. She has been consistently dissatisfied with her economic status and family life.

APPENDIX F

VIGNETTE D/LOW INFORMATION

Ms. Harris is a 40 year old African-American woman who has been participating in outpatient therapy at a community mental health clinic for the past year. She has a history of depression. She is the eldest of two children, and reports that she was raised by her mother and aunt. While she reportedly developed intellectually and physically at a normal rate, Ms. Harris describes herself as a timid child.
In addition, she characterizes her family relationship during her adolescence as “distant.” She also reports that she did not suffer from any major illnesses during her childhood except for chicken pox. Ms. Harris states that she enjoyed school and received average grades. In addition, Ms. Harris states that she had a large group of friends and that she rarely got into trouble. After completing High School, Ms. Harris enrolled in community college. However, she found the work to be overwhelming and dropped out during her sophomore year. Ms. Harris reports that she has difficulty getting to work on time and sometimes does not get out of bed. Currently, Ms. Harris’s major source of income is her current salary and financial assistance she receives from her children. Ms. Harris reports that she has never been married and has three children. Lastly, she reports that her youngest daughter and son live with her and assist her with the finances. Her other children live on their own and visit on holidays. Ms. Harris was approachable, soft spoken, and had a relatively pleasant demeanor. However, Ms. Harris was often poorly groomed, wearing what appeared to be the same sweatsuit each week. Ms. Harris often had trouble making eye contact during the assessment and was tearful at times. Ms. Harris appears to have poor insight into her problems. Ms. Harris appears to have very poor insight into her psychological problems and blames outsiders for her own difficulties. During all of her appointments with the current evaluator, she insisted that she just had “bad nerves.” Overall, it appears that Ms. Harris has some very negative views about her social situation and the overall quality of her life. She has been consistently dissatisfied with her economic status and family life.
Table 1
GAF Minimum, Maximum, Means and Standard Deviations
___________________________________________________
                Time 1             Time 2
Vignette      M       SD         M       SD
PPALow      40.33    7.73      42.00    8.62
PPCHigh     45.89    7.46      44.89    7.49
PPBLow      43.11    8.13      42.78    6.12
PPDHigh     52.67    5.41      51.00    4.27
CALow       37.56   14.15      45.00    9.82
CCHigh      41.33   10.65      42.89   11.34
CBLow       44.44    9.13      50.33   10.01
CDHigh      45.56   14.83      45.44   10.55
Note. PP = paper-and-pencil GAF administration; C = computer-assisted GAF administration; Low = low information condition; High = high information condition.

Table 2
Test-Retest Reliability of GAF Ratings
_______________________________________________________________________________
Variable                   Cronbach’s Alpha    Description
PP-Paper-and-Pencil GAF
  PPALow1-PPALow2                .98           Excellent
  PPCHigh1-PPCHigh2              .95           Excellent
  PPBLow1-PPBLow2                .91           Excellent
  PPDHigh1-PPDHigh2              .87           Good
C-Computer-assisted GAF
  CALow1-CALow2                  .58           Questionable
  CCHigh1-CCHigh2                .77           Acceptable
  CBLow1-CBLow2                  .89           Good
  CDHigh1-CDHigh2                .89           Good
Note. PP = paper-and-pencil GAF administration; C = computer-assisted GAF administration; Low = low information condition; High = high information condition.

[Figure: y-axis "GAF ratings" (0 to 60); x-axis "Method"; series: GAF rating and gold standard GAF rating]
Figure 1. GAF Ratings Compared to Gold Standard GAF Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted GAF administration; Low = low information condition; High = high information condition.

[Figure: y-axis "Standard Error of Mean +- 2 SE" (-6 to 8); x-axis categories PPLOW, PPHIGH, COMLOW, COMHIGH; N = 36 per category]
Figure 2. Standard Error of GAF Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted GAF administration; Low = low information condition; High = high information condition.

[Figure: y-axis "Mean Diff. Between GAF and Gold Standard" (-4 to 6); x-axis categories PPLOW, PPHIGH, COMLOW, COMHIGH]
Figure 3.
Mean Difference Between GAF Ratings and Gold Standard Ratings
Note. PP = paper-and-pencil GAF administration; C = computer-assisted GAF administration; Low = low information condition; High = high information condition.

[Figure: y-axis "GAF RATINGS" (30 to 55); x-axis Time 1, Time 2; series: Paper Low, Paper High, Computer Low, Computer High]
Figure 4. Mean GAF Scores by Method and Information Across Time
Note. Low = low information condition; High = high information condition.

Vita
Sarah Beth Woldoff

EDUCATION
1997-2004  Drexel University, Philadelphia, PA
           Doctorate of Philosophy in Clinical and Health Psychology, July 2004
           Masters of Arts and Sciences in Clinical Psychology, Dec. 2000
1992-1996  Temple University, Philadelphia, PA
           Bachelor of Science in Psychology, May 1996

HONORS AND DISTINCTIONS
• Award for Volunteer Community Services at Disability Resources, Temple University, 1996
• Magna Cum Laude, Temple University, 1996
• Golden Key National Honor Society, Temple University, 1995-1996
• Psi Chi (National Honor Society in Psychology), Temple University, 1993-1996
• Dean’s List, Temple University, 1993-1996
• Honors Program, Temple University, 1992-1995
• Early Admissions Award, Temple University, 1992

PROFESSIONAL PRESENTATIONS
Woldoff, S., Simmons, M., & Napolitano, D. (May 2002). Evaluation of High-p/Low-p sequencing in the treatment of non-compliance. Paper presented at the annual meeting of the Association for Behavior Analysis, Toronto, Canada.
Progar, P., Perrin, F., Woldoff, S., & Tessing, J. (May 2002). Experimental evaluation of learning history upon current reinforcement contingencies. Symposia panel discussion at the annual meeting of the Association for Behavior Analysis, Toronto, Canada.
Woldoff, S., Herbert, J., & Greenberg, R. (April 2001). Reliability of the GAF in Multiple-rater Situations. Paper presented for the annual meeting of the Eastern Psychological Association, Washington, D.C.
Woldoff, S., & Harwell, V. (April 2001).
Anxiety Disorders and Assessment in College-aged Populations. Symposia presented for the annual meeting of the Eastern Psychological Association, Washington, D.C.
Greenberg-Saluk, R., Herbert, J., Rheingold, V., Woldoff, S., & Crittenden, K. (Nov. 2000). Brief FNE: Preliminary psychometric findings. Paper presented at the annual meeting of the Association for the Advancement of Behavior Therapy, New Orleans, Louisiana.
Progar, P., Woldoff, S., Vollmer, Mace, F.C., Daniels, Dency. (May 1998). A comparison of NCR and DRO for Self-Injury when reinforcement rates are equated. Symposia panel discussion at the annual meeting of the Association for Behavior Analysis, Orlando, FL.
Woldoff, S., Progar, P., Vollmer, T., & Eisenchink, K. (May 1998). Reducing Aggressive and Self-Injurious Behavior using two different schedules of differential reinforcement of other behavior. Poster presented at the annual meeting of the Association for Behavior Analysis, Orlando, FL.
Woldoff, S., Progar, P., Mace, F.C., Vollmer, T., Lalli, J., & Ploog, F. (May 1997). A comparison of NCR and DRO for Self-Injury when reinforcement rates are equated. Poster presented at the annual meeting of the Association for Behavior Analysis, Chicago, IL.

MANUSCRIPTS
Woldoff, S. & Herbert, J. (in preparation). Reliability of the GAF in Multiple-rater Situations. Drexel University.