Evaluation of Three Testing Manuals

Laurie E. Gottschalk
Portland State University

Population of Choice

My population of choice is adolescents in grades 6-12. This population experiences a variety of growing pains and challenges that range from getting a good grade on a math test to dealing with eating disorders and depressive symptoms. Since much of this population is not well educated on the contents of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), it is easy for them to misinterpret their feelings as they pertain to mental illness. Having a variety of instruments to help interpret their feelings and symptoms could clarify those feelings and help a counselor guide them toward appropriate ways of dealing with them.

Adolescent Anger Rating Scale

Purpose of the test

The purpose of the Adolescent Anger Rating Scale (AARS) is to assess multiple aspects of anger for screening and treatment purposes. The AARS measures anger along three subscales and a Total Anger score. The three subscales are Instrumental Anger (IA), Reactive Anger (RA), and Anger Control (AC). Instrumental Anger is defined as a negative emotion that leads to a future act of planned retaliation or revenge. Reactive Anger differs in that it is an immediate reaction to what is perceived as a threat or negative event. Anger Control is the opposite of the previous two, as it is defined as a proactive method used to counter instrumental and reactive anger responses (AARS, 2001).

Selection and use with your population

This test would be an appropriate tool to assess and screen for anger levels, and it could give a prediction of the possibility that a particular student would commit a violent act, based on the three subscales that the test uses.
The test is specifically divided by gender and grade level. This is justified by research done by Burney and Kromrey (in press) suggesting that anger tends to start early and decline as students progress through middle school, and then repeats the same pattern in high school. However, as Henington (2003) reports, this justification is supported only by results found in the initial development, and there is little other empirical evidence to support it. As the years progress, the anger that adolescents experience declines. With this type of information, counselors could identify and work with students who could be considered high risk.

Quality of Test/Normative Data/Interpretation

The sample represented a typical range of ethnicities, with about 50% of the group being Caucasian, 30% African American, 4-8% Hispanic, and 3% Asian. Stephenson (2003) rightfully points out that African American students are overrepresented in the normative sample. The data seem to be somewhat overstated. The testing manual describes correlations ranging from .18 to .20 as strong positive correlations (AARS, 2001). While these correlations are positive, they are not high. The only high coefficients the manual actually reports are the internal consistency data. Both Stephenson (2003) and Henington (2003) agree that the instrument has face validity with reliable and valid scores, but they express strong concern that the test has too little data to support what it claims to do.

The AARS began as a 106-item instrument. After scrutiny by an expert panel the instrument was reduced to 51 items. Further review following a pilot test prompted the developers to remove additional items, resulting in the 41-item instrument that exists today. The normative sample was defined as two groups: a middle school group (ages 11-14) and a high school group (ages 14-19). The sample size of the middle school group was 2,171 (51% female and 49% male).
The high school sample size was reported to be 2,124 (60% female and 40% male) (AARS, 2001). Relatively high internal consistency was reported, as measured by Cronbach's (1951) alpha, which ranged from .81 to .92 for the standardization sample. The item-level correlation coefficients ranged from .34 to .69. Although correlation values closer to .00 carry little meaning, the AARS testing manual reasoned that “any item with a correlation higher than .30 was considered to potentially increase the reliability of the scale.” Test-retest data were taken at a two-week interval and also showed high correlations, ranging from .71 to .79 (AARS, 2001).

Criterion-related validity was computed using the Pearson product-moment correlation to determine and specify relationships between the AARS subscales and reported conduct, instrumental, and reactive referrals within the school setting. The IA coefficient as related to instrumental referrals was .18, reactive referrals showed a .20 correlation with the RA subscale, and a negative correlation was found between AC and both types of referrals. These correlations were used to justify having three subscales within the test. The correlations for the subscales are low, but the negative correlation between AC and both types of referrals does provide some evidence that controlled anger can result in a student having fewer referrals (AARS, 2001). These correlations suggest a relationship between the type of anger that a student experiences and the type of referral that the student receives. The AARS manual postulates that, in theory, a greater number of one type of referral (instrumental) relative to another (reactive) would be reflected in the test scores.

The AARS was compared to two other instruments to establish convergent and discriminant validity. Convergent validity was assessed using the Conners-Wells' Adolescent Self-Report Scale (1997). Correlations between the two tests varied from .35 to .61 across the scales.
Discriminant validity was assessed using the Multidimensional Anger Inventory (MAI; Siegel, 1986), and the reported coefficients for the IA and RA scales were .46 and .44 respectively. The comparison between these two instruments was justified on the grounds that they should have low correlation coefficients, since the two tests measure different aspects of the anger construct (AARS, 2001).

Quality of Test Manual

The AARS manual is well organized and relatively easy to read. The instrument has a clear rationale and was developed in a very professional manner. However, Henington (2003) felt that the validity data was “overstated.” Henington (2003) also made a specific point about the test not having a “lie scale” to detect false reports by test takers. Stephenson (2003) pointed out that there are “several unsupported claims regarding the nature of the subscale constructs.”

Ease of administration and scoring

The AARS is a self-report, Likert-type instrument that consists of a total of 41 questions. The test was written at a 4th grade reading level and was specifically designed to be administered to 11-19 year olds. The test can be administered to individuals (5-10 minutes) or to groups (10-20 minutes). It is recommended that the purpose of the test be disclosed to the test taker. Test takers are also required to fill out a demographics section that includes information on school referrals in addition to typical biographical information. Scoring is relatively easy: subscale scores are added to obtain the Total Anger score, and scoring materials are included with the test.

Assisting with counseling goals

As this test pertains to counseling in a school setting, it would be a very easy instrument to use and score. I would hesitate to use the test for predicting violent occurrences because there is little evidence that it can do so.
While it may not predict what it claims to predict, I feel that it would give an idea of the type of anger the student is dealing with, and good input on the intensity of that anger.

Reynolds Adolescent Depression Scale – V.2

Purpose of the test

The Reynolds Adolescent Depression Scale Version 2 (RADS-2) was designed to measure depression symptoms in school and clinical settings (RADS-2, 2002). The test manual specifically states that it is for measuring symptom severity only and is not designed for diagnosis. The RADS-2 was updated to measure depression symptoms based on four subscales: Dysphoric Mood (DM), Anhedonia/Negative Affect (AN), Negative Self-Evaluation (NS), and Somatic Complaints (SC). Items directly relate to the DSM-IV symptoms of depression and dysthymic disorder as described by the Research Diagnostic Criteria (RDC) (RADS-2, 2002).

Selection and use with your population

This instrument was specifically designed for adolescents and for use within a school setting. Many teenagers report feelings of depression, and the test could be used to clarify for the counselor the actual level of depressive symptoms. Using this test would give the counselor a justification for recommending clinical intervention to parents. The developers of this test paid significant attention to gender, age, and ethnic differences. Females tend to report higher scores than males, developmental trends were observed in the data, and there were no significant reports of variation based on ethnicity (RADS-2, 2002). The test is applicable to almost any student that you would come across in a school counseling setting.

Quality of Test/Normative Data/Interpretation

Internal reliability coefficients, measured using Cronbach's (1951) alpha, were very high and ranged from .91 to .96. Internal consistency reliability data was taken from a standardization sample of 2,240, with a high reliability alpha of .92.
Other studies conducted to measure reliability yielded similar results: Reynolds and Miller (1989), in a sample of 112, found an alpha of .96, and D'Imperio et al. (2000) used a sample of 144 and found an alpha of .92. In addition, Reynolds and Miller (1989) read the test aloud to students with special needs and still found a high reliability coefficient of .87 (RADS-2, 2002). The reliability alphas for this test would lead one to believe that it does indeed measure on a highly consistent basis. Internal consistency for the Depression Total scale was also very high: the school sample was reported at .93, the school-based restandardization sample at .92, and a clinical subsample at .94 (Blair, 2007). The reported test-retest reliability was also sound. A sample group of 104 was retested after two weeks, with a coefficient of .80 and a mean raw score difference of 1.47. A different sample group of 415 took the test again after three months and had a reliability coefficient of .79 with a mean raw score difference of less than 2 (RADS-2, 2002).

The RADS-2 manual reports construct validity based on the research conducted to create an instrument that reflected symptoms of depression using the DSM-IV, the ICD-10, the RDC, and the Hamilton Depression Rating Scale. The reported item-scale correlation coefficient was .53. Research comparing the RADS-2 to the Hamilton Depression Rating Scale showed an overall validity coefficient of .82 with a p-value of <.0001. Subscale coefficients between the two instruments ranged from .54 to .79. This evidence suggests that the two scales measure similar criteria. Another study compared the RADS-2 with the Adolescent Psychopathology Scale (APS) and reported coefficients of .76 as the inventories relate to depression and .74 as they relate to dysthymic disorder. The validity research supports the RADS-2 claim to measure depression symptom severity.
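To make the internal consistency figures above more concrete, the following is a minimal sketch of how a Cronbach's alpha coefficient can be computed from item-level responses. The response data are invented for illustration and are not drawn from the RADS-2 manual, and the function name `cronbach_alpha` is an assumption of this sketch rather than anything the manual provides.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses.

    `responses` is a list of rows, one row per respondent,
    each row holding that respondent's score on every item.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (columns of the table).
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical Likert-type responses: 5 respondents by 4 items.
example_responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
]

alpha = cronbach_alpha(example_responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises toward 1.0 as the items vary together across respondents, which is why coefficients in the .90s are read as evidence that the items measure a common construct consistently.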

Quality of Test Manual

The test manual is incredibly detailed and “represents a significantly improved version of an already extremely well-developed assessment tool for depression” (Blair, 2007). The manual is laden with copious amounts of data that further support the addition of the four subscales and the reevaluation of the original RADS in an effort to reflect the current (2000) census.

Administration and Scoring

The RADS-2 is reviewed as a well-written and well-constructed instrument by Blair (2007) and is easy to understand, administer, and score according to Carlson (2007). The RADS-2 is reported to be a 30-item, self-report, Likert-type instrument that yields scores on four depression subscales and is written at a 2nd grade reading level. It is designed for adolescents ages 11-20. The test takes approximately 5 minutes to complete and can be scored easily. Higher scores represent a higher level of depressive symptoms. Six critical items on the test allow the interpreter to discriminate between clinically depressed and non-depressed adolescents. These items are a safety measure to ensure that adolescents whose scores may not indicate severe depressive symptoms are still identified based on their responses to these questions and not just their overall score (RADS-2, 2002).

Counseling Goals

The RADS-2 could be very useful in a school counseling setting. With all of the miscellaneous jobs that are put on school counselors, I would imagine that a short inventory that has been shown to be reliable and valid would be warmly welcomed. I would be careful, though, to explain the purpose of the instrument to the student and make sure that the student understands that the instrument is not for diagnostic purposes. It would be very easy for a student to misinterpret the results and jump to conclusions.

Eating Disorder Inventory-3

Purpose of the test

The purpose of the Eating Disorder Inventory-3 (EDI-3) is to confirm and evaluate treatment methods for those who have a suspected eating disorder. The original manual was published in 1983, with revisions in 1991 and 2004. The test produces scores on 12 primary scales: 3 are specific to eating disorders, and 9 are psychological scales that are not specific to eating disorders but are reported to be relevant. These subscales are then grouped into 6 composite scales known as the Eating Disorder Risk Composite (EDRC), Ineffectiveness Composite (IC), Interpersonal Problems Composite (IPC), Affective Problems Composite (APC), Overcontrol Composite (OC), and General Psychological Maladjustment Composite (GPMC). These subscales are reported to provide a basis for treatment planning, and the scores on the many different subscales can provide a counselor with information as to how treatment may be developed. The instrument also comes with a referral form and symptom checklist to help confirm the existence of an eating disorder (EDI-3, 2004).

Selection and Use with your population

The EDI-3 is a justifiable instrument to use in a school counseling setting because many adolescents today are dealing with eating disorders. While they may not be diagnosed, and a school counselor is certainly not required to treat an eating disorder professionally, the school setting is almost a secondary home to students, where symptoms of an eating disorder would be observed. The test is written to apply to both males and females, although females tend to be more associated with eating disorders, which would allow a counselor the freedom to use it with any student.

Quality of Test/Normative Data/Interpretation

The EDI-3 divided its normative data into four diagnostic groups: Anorexia Nervosa-Restricting type (AN-R), Anorexia Nervosa-Binge-Eating/Purging type (AN-B/P), Bulimia Nervosa (BN), and Eating Disorders Not Otherwise Specified (EDNOS). All of the sample types were based on the DSM-IV-TR diagnostic criteria (EDI-3, 2004). Three of the samples were adult clinical samples, two from the U.S. and one international, all aged 18 and older. The other sample was a U.S. adolescent clinical sample, with ages ranging from 11 to 17 years. These groups were further defined into three normative sampling groups known as U.S. Adult Clinical, International Adult Clinical, and U.S. Adolescent Clinical (EDI-3, 2004).

The Eating Disorder Risk Composite score, based on the Drive for Thinness, Bulimia, and Body Dissatisfaction subscales, yielded the highest reliability alphas, which ranged from .90 to .97 across the four diagnostic groups. The psychological scales all produced coefficients ranging from .74 to .85 across the norming groups. Test-retest reliability was measured over a span of one to seven days with 34 female participants ranging in age from 15 to 55; the EDRC and GPMC coefficients were extremely high, at .98 and .97 respectively. The Eating Disorder Risk scales and Psychological scales had median coefficients of .95 and .93. These coefficients all support the stability and reliability of the scale itself (EDI-3, 2004).

Factor analysis was used to support the vast majority of the validity evidence.
Atlas (2007) reports that results “yielded meaningful scale groups of Ineffectiveness, Interpersonal Problems, Affective Problems, and Overcontrol Composites.” The manual itself concedes that the information from the factor analysis was not mathematically significant; however, the original purpose of the EDI-3 subscales was to reflect clinical relevance, so singling out composites does not invalidate the instrument (EDI-3, 2004, p. 137).

Correlations were reported with other external measures. The EDI-3 was compared to the EAT-26 scale. The DT scale produced the highest correlations; the results reported were .72 for adult and .70 for adolescent populations (EDI-3, 2004). Other highly correlated scales were the BD and EDRC scales, with correlations ranging from .52 to .63 for adults and adolescents. The Psychological scales did not correlate as highly but still yielded consistent results, with a median adult correlation of .27 and a median adolescent correlation of .45. This shows that the two instruments correlate moderately on the Psychological scales but highly on the eating disorder risk scales. Convergent validity was presented in a comparison of the EDI-3 to the Rosenberg Self-Esteem Scale, which produced an expected, consistent inverse correlation on all scales. This further reinforces the theoretical relationship between poor self-esteem and depression (Beck, 1976) (EDI-3, 2004, p. 142).

The EDI-3 represents a very sensitive instrument for what is a highly sensitive matter. While great care and consideration were put into the creation of the manual, both Atlas (2007) and Kagee (2007) report the need for further development of construct validity and its application to cross-cultural samples. Atlas (2007) reports the EDI-3 to be “ultimately disappointing,” and suggests that the screening components may be more useful than the actual scale itself.
Much advancement has been made in the eating disorder field, and specific interview techniques have been refined. These two factors combined possibly outweigh the need for an instrument of this length and expense.

Quality of Test Manual

The test manual provides extensive information regarding the development, research, norming data, test interpretation, and reliability and validity data. Kagee (2007) confirms that the data provided reflect substantial empirical research but recommends more studies to help establish construct validity. The work put into the EDI-3 was described as “voluminous” and done “with care” by Atlas (2007).

Ease of Administration and Scoring

The Eating Disorder Inventory-3 was written at a 4th grade reading level and was originally developed for a population 18 and older; however, the latest revision has expanded that range to ages 12 to 53. The manual states that there is no need for professional psychological training. Scoring seems quite involved, but Kagee (2007) reports that it is easy to score. Upon opening the test book a scoring rubric is provided, and all 12 of the subscales are scored on a 0-4 point value system based on the Likert-type scale answers. Scores are computed for the 12 subscales and then combined to determine the scores of the 6 composite scales. The manual provides ample information on the interpretation of all of the scales, each of which yields a range of scores indicating an elevated clinical range, moderate clinical range, or low clinical range (EDI-3, 2004).

Application to Counseling Goals

I would be extremely cautious about applying this in a school counseling setting. While the test has a noble purpose and certainly has evidence to back up its claims, the test is incredibly lengthy and scoring is time consuming, although reported as easy to do. With such a demanding instrument, it would be very impractical to institute it in a school counseling setting.
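
As an aside on the scoring procedure described under Ease of Administration and Scoring (items scored 0-4, summed into subscales, subscales combined into composites), the steps can be sketched as follows. This is only an illustration under simplifying assumptions: the item responses and the number of items per subscale are invented, the composite is formed by plain summation of raw subscale scores, and any conversion to standardized scores that the manual may require is not reproduced. Only the three subscales named above as making up the Eating Disorder Risk Composite are shown, and the function names are hypothetical.

```python
# Hypothetical item scores (each 0-4) for one test taker, grouped by subscale.
# The item counts are arbitrary and do not reflect the real instrument.
item_scores = {
    "Drive for Thinness": [4, 3, 2],
    "Bulimia": [0, 1, 0],
    "Body Dissatisfaction": [3, 4, 2],
}

# Subscales the paper names as the basis of the Eating Disorder Risk Composite.
EDRC_SUBSCALES = ("Drive for Thinness", "Bulimia", "Body Dissatisfaction")


def subscale_totals(scores_by_subscale):
    """Sum the 0-4 item scores within each subscale."""
    totals = {}
    for name, items in scores_by_subscale.items():
        if any(score not in range(5) for score in items):
            raise ValueError(f"item scores for {name!r} must be integers 0-4")
        totals[name] = sum(items)
    return totals


def composite_score(totals, member_subscales):
    """Combine subscale totals into one composite (assumed: simple sum)."""
    return sum(totals[name] for name in member_subscales)


totals = subscale_totals(item_scores)
edrc = composite_score(totals, EDRC_SUBSCALES)
print(totals, edrc)
```

Even in this reduced form the two-stage structure is visible, which helps explain why scoring 12 subscales and 6 composites by hand could feel time consuming despite being mechanically simple.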

References

AARS

Burney, D. M. (2001). Adolescent Anger Rating Scale: Professional Manual. Lutz, FL: Psychological Assessment Resources, Inc.

Henington, C. (2003). [Review of the Adolescent Anger Rating Scale]. The fifteenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42

Stephenson, H. (2003). [Review of the Adolescent Anger Rating Scale]. The fifteenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42

RADS

Blair, K. A. (2007). [Review of the Reynolds Adolescent Depression Scale-2nd Edition]. The seventeenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42

Carlson, J. F. (2007). [Review of the Reynolds Adolescent Depression Scale-2nd Edition]. The seventeenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42

Reynolds, W. M. (2002). Reynolds Adolescent Depression Scale-2nd Edition: Professional Manual. Lutz, FL: Psychological Assessment Resources, Inc.

EDI-3

Atlas, J. A. (2007). [Review of the Eating Disorder Inventory-3]. The seventeenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42

Garner, D. M. (2004). Eating Disorder Inventory-3: Professional Manual. Lutz, FL: Psychological Assessment Resources, Inc.

Kagee, A. (2007). [Review of the Eating Disorder Inventory-3]. The seventeenth Mental Measurements Yearbook. Available from: http://library.pdx.edu/dofd/resources.php?category=42