Testing Manuals Review - Laurie Gottschalk Professional School

Evaluation of Three Testing Manuals
Laurie E. Gottschalk
Portland State University
Population of Choice
My population of choice is adolescents in grades 6-12. This population experiences a
variety of growing pains and challenges, ranging from getting a good grade on a math test to dealing
with eating disorders and depressive symptoms. Because most adolescents are not familiar with the
Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), it is easy for them to
misinterpret how their feelings relate to mental illness. A variety of instruments that help interpret their
feelings and symptoms could clarify what they are experiencing and help a counselor guide them toward
appropriate ways of dealing with those feelings.
Adolescent Anger Rating Scale
Purpose of the test
The purpose of the Adolescent Anger Rating Scale (AARS) is to assess multiple aspects of anger for
screening and treatment purposes. The AARS measures anger along three subscales and a Total Anger
score. The three subscales are Instrumental Anger (IA), Reactive Anger (RA), and Anger Control (AC).
Instrumental Anger is defined as a negative emotion that leads to a future act of planned retaliation or
revenge. Reactive Anger differs in that it is an immediate reaction to what is perceived as a threat
or negative event. Anger Control is the opposite of the previous two; it is defined as a proactive method
used to counter instrumental and reactive anger responses (AARS, 2001).
Selection and use with your population
This test would be an appropriate tool to assess and screen for anger levels, and it could give some
prediction of the likelihood that a particular student would commit a violent act, based on the three
subscales that the test uses. The test is divided by gender and grade level. This division is justified by
research done by Burney and Kromrey (in press) suggesting that anger tends to start early and decline as
students progress through middle school, and that the same pattern then repeats in high school. However, as
Henington (2003) reports, this justification is supported only by results found in the initial development,
and there is little other empirical evidence to support it. According to this research, as the years progress
the anger that adolescents experience declines. With this type of information counselors could identify and
work with students who could be considered high risk.
Quality of Test/Normative Data/Interpretation
The sample represented a typical range of ethnicities, with about 50% of the group being
Caucasian, 30% African American, 4-8% Hispanic, and 3% Asian. Stephenson (2003) rightfully points out
that the African American population is overrepresented in the normative sample. The data seem
to be somewhat overstated. The testing manual notes that a strong positive correlation could range from
.18-.20 (AARS, 2001). While these are positive correlations, they are not high. The only high
correlations the manual actually reports are the internal consistency data. Both Stephenson (2003) and
Henington (2003) agree that the instrument has face validity and yields reliable and valid scores, but both
express considerable concern that too little data supports what the test claims to do.
The AARS began as a 106-item instrument. After scrutiny by an expert panel the instrument was
reduced to 51 items. Further review following a pilot test prompted the developers to remove additional
items, resulting in the 41-item instrument that exists today. The normative sample was defined as two
groups: a middle school group (ages 11-14) and a high school group (ages 14-19). The sample size of the
middle school group was 2,171 (51% female and 49% male). The high school sample size was reported to
be 2,124 (60% female and 40% male) (AARS, 2001).
Relatively high internal consistency was reported. Cronbach’s (1951) alpha
ranged from .81-.92 for the standardization sample. The item-level correlation coefficients ranged from
.34-.69. While correlation values closer to .00 tend to have little significance, the AARS testing
manual reasoned that “any item with a correlation higher than .30 was considered to potentially increase
the reliability of the scale.” Test-retest data were collected at a two-week interval and also showed high
correlations, ranging from .71-.79 (AARS, 2001).
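To make the internal consistency figures more concrete, the following is a minimal Python sketch of how Cronbach’s alpha is computed from item-level responses. The response matrix is invented for illustration and is not AARS data, and the function name cronbach_alpha is an assumption of this sketch.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Variance of each item across respondents
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    # Variance of each respondent's total score
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented responses: 4 respondents x 3 Likert-type items (not AARS data)
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # about 0.94 for this invented data
```

Alpha rises when the items vary together, which is why values in the .80s and .90s are read as evidence that the items measure a common construct.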
Criterion-related validity was computed using the Pearson product-moment correlation to
determine and specify relationships between the AARS subscales and reported conduct, instrumental, and
reactive referrals within the school setting. The correlation between the IA subscale and instrumental
referrals was .18, the correlation between the RA subscale and reactive referrals was .20, and a negative
correlation was found between AC and both types of referrals. These correlations were used to justify
having three subscales within the test. The correlations for the subscales are low, but the negative
correlation between AC and both types of referrals does provide some evidence that controlled anger can
result in a student having fewer referrals (AARS, 2001). These correlations suggest a relationship between
the type of anger that a student experiences and the type of referral that the student receives. The AARS
postulates that, theoretically, a greater number of one type of referral (instrumental) as opposed to
another (reactive) would be reflected in the test scores.
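One way to see why correlations of .18 and .20 are modest is to square them, which gives the proportion of variance the two measures share. The Python sketch below does that arithmetic for the two reported coefficients and also shows a plain Pearson calculation. The subscale scores and referral counts in the second part are invented for illustration, and the names pearson_r and shared_variance are assumptions of this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Coefficients reported in the manual (AARS, 2001)
reported = {"IA vs. instrumental referrals": 0.18, "RA vs. reactive referrals": 0.20}

# r squared = proportion of shared variance
shared_variance = {label: round(r ** 2, 4) for label, r in reported.items()}
print(shared_variance)  # 0.0324 and 0.04, i.e. roughly 3% and 4%

# Invented subscale scores and referral counts, only to show the calculation
scores = [10, 12, 15, 20, 22]
referrals = [0, 1, 1, 2, 3]
r = pearson_r(scores, referrals)
```

On this reading, each subscale accounts for only about 3 to 4 percent of the variation in its matching referral type.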
The AARS was compared to two other instruments to establish convergent and discriminant
validity. Convergent validity was assessed using the Conners-Wells’ Adolescent Self-Report (1997).
Correlations between the two tests varied from .35-.61 across the scales. Discriminant validity
was assessed using the Multidimensional Anger Inventory (MAI; Siegel, 1986), with reported coefficients
of .46 and .44 for the IA and RA scales, respectively. The comparison between these two instruments was
justified on the grounds that they should have low correlation coefficients, since the two tests measure
different aspects of the anger construct (AARS, 2001).
Quality of Test Manual
The AARS manual is well organized and relatively easy to read. The instrument was well
rationalized and was developed in a very professional manner. However, Henington (2003) felt that the
validity data was “overstated.” Henington (2003) also made a specific point about the test not having a
“lie scale” to detect false reports by test takers. Stephenson (2003) pointed out that there are “several
unsupported claims regarding the nature of the subscale constructs.”
Ease of administration and scoring
The AARS is a self-report, Likert-type instrument that consists of a total of 41 questions. The test
was written at a 4th grade reading level and was specifically designed to be administered to 11-19 year
olds. The test can be administered to individuals (5-10 minutes) or to groups (10-20 minutes). It is
recommended that the purpose of the test be disclosed to the test taker. Test takers are also required to fill
out a demographics section that includes information on school referrals in addition to typical
biographical information. Scoring is relatively easy: subscale scores are added to obtain the Total Anger
score, and scoring materials are included with the test.
Assisting with counseling goals
As this test pertains to counseling in a school setting, it would be a very easy instrument to use
and score. I would hesitate to use the test for predicting violent occurrences because there is too little
evidence that it can do so. While it may not predict what it claims to predict, I feel that it would give
an idea of the type of anger the student is dealing with, and good input on the intensity of that
anger.
Reynolds Adolescent Depression Scale – V.2
Purpose of the test
The Reynolds Adolescent Depression Scale Version 2 (RADS-2) was designed to measure
depression symptoms in school and clinical settings (RADS-2, 2002). The test manual specifically states
that it is for measuring symptom severity only and is not designed for diagnosis. The RADS-2 was updated
to measure depression symptoms on four subscales: Dysphoric Mood (DM), Anhedonia/Negative
Affect (AN), Negative Self-Evaluation (NS), and Somatic Complaints (SC). Items directly relate to the
DSM-IV symptoms of depression and dysthymic disorder as described by the RDC (RADS-2, 2002).
Selection and use with your population
This instrument was specifically designed for adolescents and for use within a school setting. Many
teenagers claim feelings of depression, and the test could be used to clarify for the counselor the actual
level of depressive symptoms. Using this test would provide the counselor with a justification to recommend
clinical intervention to parents. The developers of this test paid significant attention to gender, age, and
ethnic differences. Females tend to report higher scores than males, developmental trends were observed in
the data, and there were no significant reports of variation based on ethnicity (RADS-2, 2002). The test is
applicable to almost any student that you would come across in a school counseling setting.
Quality of Test/ Normative Data/Interpretation
Internal reliability coefficients, measured using Cronbach’s (1951) alpha, were very high and
ranged from .91-.96. Internal consistency reliability data were taken from a standardization sample of
2,240, which yielded a high reliability alpha of .92. Other studies of reliability yielded
similar results: Reynolds and Miller (1989) found an alpha of .96 in a sample of 112, and D’Imperio et al.
(2000) found an alpha of .92 in a sample of 144. In addition, Reynolds and Miller (1989) read the test
aloud to students with special needs and still found a high reliability coefficient of .87 (RADS-2, 2002).
The reliability alphas for this test suggest that it does indeed measure on a very
consistent basis. Internal consistency for the Depression Total scale was also very high: .93 for the
school sample, .92 for the school-based restandardization sample, and .94 for a clinical subsample (Blair,
2007). Test-retest reliability was also sound. A sample group of
104 retested after two weeks produced a coefficient of .80 and a mean raw score difference of 1.47. A
different sample group of 415 took the test again after three months and had a reliability coefficient of
.79 with a mean raw score difference of less than 2 (RADS-2, 2002).
The RADS-2 manual reports construct validity based on the research conducted to create an
instrument that reflects symptoms of depression as described in the DSM-IV, the ICD-10, the RDC, and the
Hamilton Depression Rating Scale. The reported item-scale correlation coefficient
was .53. Research comparing the RADS-2 to the Hamilton Depression Rating Scale showed an overall
validity coefficient of .82 (p < .0001). Subscale coefficients between the two instruments
ranged from .54-.79. This evidence suggests that the two scales measure similar criteria. Another study
compared the RADS-2 with the Adolescent Psychopathology Scale (APS) and found correlations of .76 for
depression and .74 for dysthymic disorder. The validity research supports the
RADS-2 claim to measure depression symptom severity.
Quality of Test Manual
The test manual is incredibly detailed and ‘represents a significantly improved version of an
already extremely well-developed assessment tool for depression’ (Blair, 2007). The manual is laden
with copious amounts of data that support the addition of the four subscales and the
restandardization of the original RADS to reflect the current (2000) census.
Administration and Scoring
The RADS-2 is described as a well-written and well-constructed instrument by Blair (2007)
and as easy to understand, administer, and score by Carlson (2007). The RADS-2 is reported to
be a 30-item instrument based on a self-report Likert-type scale, yielding scores on four depression
subscales and written at a 2nd grade reading level. It is designed for adolescents ages 11-20. The test takes
approximately 5 minutes to complete and can be scored easily. Higher scores indicate a higher level of
depressive symptoms. Six critical items on the test allow the interpreter to discriminate between
clinically depressed and non-depressed adolescents. These items are a safety measure to ensure that
adolescents whose total scores may not indicate severe depressive symptoms are still identified based on
their responses to these questions and not just their overall score (RADS-2, 2002).
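The logic of the critical items can be sketched in a few lines of Python: a student is flagged for follow-up if either the total score or the critical items call for it. Everything numeric here is hypothetical. The item numbers, the total score cutoff, the response threshold, the required number of critical items, and the 1-4 response range are placeholders chosen for illustration and are not taken from the RADS-2 manual. The sketch also ignores any reverse-scored items.

```python
# All constants below are hypothetical placeholders, not RADS-2 manual values
CRITICAL_ITEMS = {3, 7, 12, 18, 22, 27}  # hypothetical item numbers
TOTAL_CUTOFF = 70                        # hypothetical total score cutoff
CRITICAL_RESPONSE = 3                    # hypothetical "endorsed" threshold
CRITICAL_COUNT = 4                       # hypothetical number of endorsed items

def needs_follow_up(responses):
    """responses maps item number (1-30) to a point value (assumed 1-4)."""
    total = sum(responses.values())
    critical_hits = sum(
        1 for item in CRITICAL_ITEMS if responses[item] >= CRITICAL_RESPONSE
    )
    # Flag on either route, so a low total cannot hide endorsed critical items
    return total >= TOTAL_CUTOFF or critical_hits >= CRITICAL_COUNT

# A student with a low total who still endorses every critical item
low_total = {item: 1 for item in range(1, 31)}
masked = dict(low_total)
for item in CRITICAL_ITEMS:
    masked[item] = 4

print(needs_follow_up(low_total))  # False
print(needs_follow_up(masked))     # True
```

The second student has a total well under the placeholder cutoff but is still flagged, which is the safety-net behavior the manual describes.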
Counseling Goals
The RADS-2 could be very useful in a school counseling setting. With all of the miscellaneous
jobs that are put on school counselors, I would imagine that a short inventory that has proven to be
reliable and valid would be warmly welcomed. I would, however, take care to explain the purpose of
the instrument to the student and make sure that they understand that the instrument is not for diagnostic
purposes. It would be very easy for a student to misinterpret the results and jump to conclusions.
Eating Disorder Inventory-3
Purpose of test
The purpose of the Eating Disorder Inventory-3 (EDI-3) is to confirm and evaluate treatment
methods for those who have a suspected eating disorder. The original manual was published in 1983, with
revisions in 1991 and 2004. The test produces scores on 12 primary scales: 3 are eating disorder
specific, and 9 are psychological scales that are not eating disorder specific but are reported to be
relevant. These scales are then grouped into 6 composite scales: the Eating Disorder Risk Composite
(EDRC), Ineffectiveness Composite (IC), Interpersonal Problems Composite (IPC), Affective Problems
Composite (APC), Overcontrol Composite (OC), and General Psychological Maladjustment
Composite (GPMC). The composites are reported to provide a basis for treatment planning, and the
scores on the many different scales can give a counselor information about how treatment may
be developed. The instrument also comes with a referral form and symptom checklist to help confirm the
existence of an eating disorder (EDI-3, 2004).
Selection and Use with your population
The EDI-3 is a justifiable instrument to use in a school counseling setting because many
adolescents today are dealing with eating disorders. While they may not be diagnosed, and a school
counselor is certainly not required to treat an eating disorder professionally, the school setting is
almost a secondary home to students, where symptoms of an eating disorder would be observed. The test
is written to apply to both males and females, although females tend to be more associated with eating
disorders, which allows a counselor the freedom to use it with any student.
Quality of Test/Normative Data/Interpretation
The EDI-3 divided its normative data into four diagnostic groups: Anorexia Nervosa-Restricting
type (AN-R), Anorexia Nervosa-Binge-Eating-Purging type (AN-B/P), Bulimia Nervosa (BN),
and Eating Disorders Not Otherwise Specified (EDNOS). All of the sample types were based on the
DSM-IV-TR diagnostic criteria (EDI-3, 2004). Three of the samples were adult clinical samples, two
from the U.S. and one international, all ages 18 and older. The other sample was a U.S. adolescent
clinical sample, with ages ranging from 11-17 years. These groups were further defined into three
normative sampling groups known as U.S. Adult Clinical, International Adult Clinical, and U.S.
Adolescent Clinical (EDI-3, 2004).
The Eating Disorder Risk Composite, based on the Drive for Thinness, Bulimia, and Body
Dissatisfaction subscales, yielded the highest reliability alphas, ranging from .90-.97 across
the four diagnostic groups. The psychological scales all produced coefficients ranging from .74-.85
across the norming groups. Test-retest reliability was measured over a one- to seven-day span
with 34 female participants ranging in age from 15-55. The EDRC and GPMC coefficients were
extremely high, .98 and .97 respectively. The Eating Disorder Risk scales and Psychological scales had
median coefficients of .95 and .93. These coefficients all support the stability and reliability of the scale
itself (EDI-3, 2004).
Factor analysis was used to support the vast majority of the validity data. Atlas (2007) reports that
results “yielded meaningful scale groups of Ineffectiveness, Interpersonal Problems, Affective Problems,
and Overcontrol Composites.” The manual itself acknowledges that the results of the factor analysis
were not mathematically significant; however, the original purpose of the EDI-3 subscales was to reflect
clinical relevance, so singling out composites does not invalidate the instrument (EDI-3, 2004, p. 137).
Correlations were reported with other external measures. The EDI-3 was compared to the EAT-26
scale. The DT scale produced the highest correlations: .72 for the adult and .70 for the
adolescent populations (EDI-3, 2004). Other highly correlated scales were the BD and EDRC scales, with
correlations ranging from .52-.63 for adults and adolescents. The Psychological scales did
not correlate as highly but still yielded consistent results, with a median adult correlation of .27 and
a median adolescent correlation of .45. This shows that the two instruments correlate moderately on the
Psychological scales but highly on the Eating Disorder Risk scales.
Convergent validity was presented in a comparison of the EDI-3 to the Rosenberg Self-Esteem
Scale, which produced the expected consistent inverse correlation on all scales, further reinforcing the
theoretical relationship between poor self-esteem and depression (Beck, 1976) (EDI-3, 2004, p. 142).
The EDI-3 represents a very sensitive instrument for what is a highly sensitive matter. While
great care and consideration were put into the creation of the manual, both Atlas (2007) and Kagee (2007)
report the need for further development of construct validity and its application to cross-cultural samples.
Atlas (2007) reports the EDI-3 to be “ultimately disappointing,” and suggests that the screening
components may be more useful than the actual scale itself. Much advancement has been made in the
eating disorder field and specific interview techniques have been refined. These two factors combined
possibly outweigh the necessity for an instrument of this length and expense.
Quality of Test Manual
The test manual provides extensive information regarding the development, research, norming
data, test interpretation, and reliability and validity data. Kagee (2007) confirms that the data provided
reflect substantial empirical research but recommends more studies to help establish construct validity.
The work put into the EDI-3 was described as “voluminous” and done “with care” by Atlas (2007).
Ease of Administration and Scoring
The Eating Disorder Inventory-3 was written at a 4th grade reading level and was originally
developed for a population 18 and older; however, the latest revision has expanded that range to ages 12
to 53. The manual notes that there is no need for professional psychological training.
Scoring seems quite involved; however, Kagee (2007) reports that it is easy to score. Upon
opening the test book a scoring rubric is provided, and all 12 of the subscales are scored on a 0-4
point value system based on the Likert-type scale answers. Scores are computed for the 12
subscales and then combined to determine the scores of the 6 composite scales. The manual provides
ample information on the interpretation of all of the scales, which all yield a range of scores indicating
an elevated clinical range, moderate clinical range, or low clinical range (EDI-3, 2004).
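The two-step structure of the scoring (item points into scale scores, scale scores into a composite) can be illustrated with a short Python sketch. Only the Eating Disorder Risk Composite is shown, since its three contributing scales are named above. The item responses are invented, the number of items per scale is arbitrary, and combining scales by simple raw-score addition is an assumption of this sketch rather than the manual’s documented procedure.

```python
def score_scale(points):
    """Sum the item point values for one scale; each item is worth 0-4 points."""
    if any(p not in range(5) for p in points):
        raise ValueError("item points must be whole numbers from 0 to 4")
    return sum(points)

# Invented item points for the three eating-disorder-specific scales
item_points = {
    "Drive for Thinness": [4, 3, 0, 2],
    "Bulimia": [0, 1, 0, 0],
    "Body Dissatisfaction": [3, 4, 2, 1],
}

# Step 1: one score per scale
scale_scores = {name: score_scale(points) for name, points in item_points.items()}

# Step 2: combine the scale scores into the composite (assumed simple sum)
edrc_raw = sum(scale_scores.values())
print(scale_scores, edrc_raw)
```

The same pattern would repeat for the other five composites once the scales belonging to each are taken from the manual.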
Application to Counseling Goals
I would be extremely cautious about applying this instrument in a school counseling setting. While the
test has a noble purpose and certainly has evidence to back up its claims, the test is incredibly lengthy,
and scoring is time consuming even though it is reported as easy to do. With such a demanding instrument,
it would be very impractical to institute it in a school counseling setting.
References
AARS
Burney, D. M. (2001). Adolescent Anger Rating Scale. Professional Manual. Lutz, FL:
Psychological Assessment Resources, Inc.
Henington, C. (2003). [Review of the Adolescent Anger Rating Scale]. The fifteenth Mental
Measurements Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42
Stephenson, H. (2003). [Review of the Adolescent Anger Rating Scale]. The fifteenth Mental
Measurements Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42
RADS
Blair, K. A. (2007). [Review of the Reynolds Adolescent Depression Scale-2nd Edition]. The seventeenth
Mental Measurements Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42
Carlson, J. F. (2007). [Review of the Reynolds Adolescent Depression Scale-2nd Edition]. The
seventeenth Mental Measurements Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42
Reynolds, W. M. (2002). Reynolds Adolescent Depression Scale-2nd Edition: Professional Manual.
Lutz, FL: Psychological Assessment Resources, Inc.
EDI-3
Atlas, J. A. (2007). [Review of the Eating Disorder Inventory-3]. The seventeenth Mental Measurements
Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42
Garner, D. M. (2004). Eating Disorder Inventory-3: Professional Manual. Lutz, FL:
Psychological Assessment Resources, Inc.
Kagee, A. (2007). [Review of the Eating Disorder Inventory-3]. The seventeenth Mental Measurements
Yearbook. Available from:
http://library.pdx.edu/dofd/resources.php?category=42