Critique of Testing Instruments
Crystal Ellis
March 4, 2008
COUN 566 Appraisal and Instruments
Dr. Miars

Introduction

The population that I will be working with is school-aged children in a school setting, so the tests I chose to critique are ones that can be used with school-aged children. Upon receiving my graduate degree, I would be qualified to administer these tests. The three tests I chose are the Reynolds Child Depression Scale (RCDS), the Adolescent Anger Rating Scale (AARS), and the Clinical Assessment of Interpersonal Relations (CAIR). The following is an analysis of the published critiques of these tests and of their uses in the field.

Adolescent Anger Rating Scale (AARS)

Purpose, Selection and Use of the Test

The purpose of the AARS is to measure total anger expression and to differentiate between instrumental anger, reactive anger, and anger control. (Burney, 2001) The development of the instrument was prompted by interest in understanding the root causes of adolescent anger, the need to measure types of anger, and the need to develop specific treatment plans to decrease violence caused by anger expressed in adolescents. (Burney, 2001) Burney further describes the AARS as an instrument intended specifically for adolescents aged 11-19. AARS items are written at approximately a fourth grade reading level. (Burney, 2001)

The AARS is a 41-item instrument that uses a 4-point Likert scale ranging from 1 (Hardly Ever) to 4 (Very Often). (Henington, MMYB 15) Twenty items measure Instrumental Anger (IA), a delayed, possibly covert, goal-related response. (Henington, MMYB 15) Eight items measure Reactive Anger (RA), an immediate response to events perceived as negative, threatening, or fear provoking. (Henington, MMYB 15) Thirteen items measure Anger Control (AC), a proactive cognitive behavioral anger management response.
(Henington, MMYB 15)

Norm Data and Quality of the Test

The norm sample included adolescents from various ethnic groups, including African American, Asian, Caucasian, Hispanic, and multi-ethnic groups. Inner city, urban, and suburban environments are represented. (Burney, 2001) The norm group consists of 4,187 adolescents from the United States, divided into middle school (grades 6-8, ages 11-14) and high school (grades 9-12, ages 14-19). (Henington, MMYB 15)

Henington's review of the AARS manual reports the following data. Internal consistency was estimated using Cronbach's alpha. Alpha coefficients for the entire standardization sample ranged from .81 to .92. Alpha coefficients and standard errors of measurement (SEM) for the AARS subscales were provided by grade level and gender group. Alpha coefficients for girls and boys in Grades 6-8 and Grades 9-12 were consistent with those observed for the total norm sample, ranging from .80 to .92 for participants in Grades 6-8 and from .81 to .94 for participants in Grades 9-12. Little variability in SEM was found across the norm groups. (Burney, 2001) Reactive Anger (RA) obtained the lowest alpha values across gender and grade, with younger girls having the lowest (.80). (Henington, MMYB 15) Instrumental Anger (IA) obtained the highest values across gender and grade, with older boys having the highest (.94). (Henington, MMYB 15)

AARS test-retest reliability was measured using 175 pairs of AARS protocols with a 2-week interval between ratings. (Henington, MMYB 15) The correlations ranged from .71 to .79. (Henington, MMYB 15) These values indicate a fairly stable measure.

The Pearson product-moment correlations among the subscales were relatively low. (Henington, MMYB 15) Item-total correlations ranged from .42 to .69 for IA items, .37 to .64 for RA items, and .34 to .65 for AC items.
(Henington, MMYB 15) The low correlations among the subscales suggest that each AARS subscale measures a specific type of anger independently of the other subscales.

Content validity was assessed using an expert panel (school psychologists, school personnel, university professors, and clinicians); the experts supported the face and content validity of the AARS. (Burney, 2001) Criterion validity was assessed using Pearson product-moment correlations between the scores on each AARS scale and subscale and the number of conduct referrals, as well as the number of instrumental and reactive anger conduct referrals. (Henington, MMYB 15) The results yielded positive correlations, which the manual interprets as support for the relationships described. (Henington, MMYB 15) A modest negative correlation was found between anger control and the number of conduct referrals (-.31), the number of instrumental anger referrals (-.29), and the number of reactive anger referrals (-.36), indicating that the more control an adolescent has over his or her anger, the fewer general or specific conduct referrals he or she receives. (Burney, 2001) Furthermore, positive correlations were found between Total Anger scores and the number of conduct referrals.

Administration, Scoring, and Quality of the Test Manual

The AARS can be administered to a group or to an individual. When it is given in a group setting, the environment should provide privacy and confidentiality. Rapport building techniques are especially useful in easing the anxiety of the test taker and in encouraging respondents to answer honestly. (Burney, 2001) It is also important to discuss any questions that come up and to distinguish between the response options on the Likert scale. (Burney, 2001) The manual also suggests (in more than one place) ensuring that the respondent answers each question.
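
The manual's advice that every item be answered can be checked mechanically before scoring. The following is a minimal Python sketch and is not part of the AARS materials. The item count (41) and the 1 to 4 response range come from the description of the instrument above; the function name and the data layout (a list with None marking a skipped item) are assumptions made only for illustration.

```python
# Hypothetical completeness check for a 41-item, 4-point Likert protocol.
# Item count and response range follow the AARS description in this paper;
# the list layout and the use of None for a skipped item are assumptions.

N_ITEMS = 41
VALID_RESPONSES = {1, 2, 3, 4}  # 1 = Hardly Ever ... 4 = Very Often


def find_incomplete_items(responses):
    """Return the 1-based numbers of items that are missing or out of range."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return [
        number
        for number, value in enumerate(responses, start=1)
        if value not in VALID_RESPONSES
    ]


# Usage: a protocol with item 5 skipped and item 12 marked out of range.
protocol = [2] * N_ITEMS
protocol[4] = None
protocol[11] = 7
print(find_incomplete_items(protocol))
```

A check like this only flags items for follow-up with the respondent; how missing items are actually handled would be governed by the manual.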

Application to Counseling Goals

According to Henington, the AARS is a useful tool, and careful consideration was given to its development. However, Henington feels that the author overstates some of the validity data. One key concern involves the etiology of anger and the number of demographic variables considered. Henington notes that researchers have found anger and behavior problems, such as conduct disorder, oppositional defiance, and attention deficit disorder, to be related to, rather than caused by, these characteristics. (Henington, MMYB 15) Finally, because the AARS is a self-report measure with no validity or lie scale, it is unknown whether adolescents will alter their responses to gain the approval of those who have access to the information they provide.

According to Henington, discriminant comparisons were made between the AARS and the Multidimensional Anger Inventory (MAI). The rationale for this comparison was that the two instruments measure different aspects of anger. The correlations between the MAI and the AARS were described as moderately low. (Henington, MMYB 15) These relatively low correlations support the ability of AARS scores to measure constructs of anger that differ from those measured by current anger instruments. Stephenson regards this comparison as a flaw in the validity study, asserting that it would have been useful to see convergent data from other measures, such as the State-Trait Anger Expression Inventory-2 and other scales.

Overall, the value of the instrument is likely to outweigh the concerns raised here. I can see that this instrument would be a useful screening tool in an educational setting, especially for identifying types of anger.

Reynolds Child Depression Scale (RCDS)

Purpose, Selection and Use of the Test

The Reynolds Child Depression Scale (RCDS) is a self-report, paper and pencil measure intended to assess the severity of depressive symptomatology in 8-12 year old (grades 3-6) children.
(Carlson, MMYB 11) Although the RCDS is not intended for diagnostic purposes, it was developed in accordance with widely accepted diagnostic systems, such as the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III) and the Research Diagnostic Criteria (RDC), and it can be used as a screener and as an assessment and evaluation instrument in clinical and research settings. (Carlson, MMYB 11)

The RCDS test booklet is entitled “About Me.” It may be safe to assume that most children are familiar with this concept, as it is something that most teachers address in their curriculum at some point. I think that this familiarity is perhaps an advantage for the young child who may feel nervous. Children are told to choose responses that tell how they have been feeling for the last two weeks. (Rohrbeck, MMYB 11) The author recommends that the items be read aloud to children in grades 3 and 4. (Reynolds, 1981)

The scale includes thirty items (at a second grade reading level) that tap cognitive, motor-vegetative, somatic, and interpersonal symptoms of depression. (Rohrbeck, MMYB 11) Twenty-nine items use a 4-point Likert-type response format, with the choices almost never, sometimes, a lot of the time, and all of the time. The last item consists of five faces with expressions that range from happy to sad; the child is asked to choose the face that shows how he or she feels. Seven items on the scale are reverse scored. (Rohrbeck, MMYB 11)

Norm Data and Quality of the Test

The test norms were based on a sample of 1,620 elementary school aged children in the midwestern and western areas of the United States. (Carlson, MMYB 11) Both Carlson and Rohrbeck of the Mental Measurements Yearbook felt that the sample seems representative of those regions. The norm sample included approximately 30% ethnic minority children from urban, suburban, and rural areas.
(Carlson, MMYB 11) The RCDS mean total score is 56.42, which corresponds to a mean item score of 1.88 across the thirty items. (Reynolds, 1981) Qualitatively, this suggests that the average response to items for which high scores indicate depressive symptomatology fell between “almost never” and “sometimes.” Given that depression and depressive symptoms are not considered a normal aspect of childhood, this level of overall symptom endorsement appears consistent with expectations. (Reynolds, 1981)

Both Carlson and Rohrbeck of the Mental Measurements Yearbook also found the reliability and validity evidence quite impressive overall. Carlson reports that the internal consistency coefficients (using Cronbach's alpha) and the split-half coefficients, corrected for length by the Spearman-Brown formula, were in the upper .80s and lower .90s within grade, gender, and ethnic groups, as well as for a subset of learning disabled children. Test-retest reliability was surprisingly good as well (.82 and .85). The standard error of measurement, computed to be between 3 and 4 points for the total RCDS scale, lends further support to the clinical use of this measure. (Reynolds, 1981)

To establish validity, the manual's author addresses content, criterion-related, and construct validity. Evidence of content validity is that the items were developed to reflect the DSM-III-R and RDC symptoms of depression. (Reynolds, 1981) As evidence of construct validity, there were several studies of convergent validity. (Rohrbeck, MMYB 11) The RCDS correlates .76 with the Children's Depression Inventory. (Rohrbeck, MMYB 11) Criterion-related validity was reported by comparing RCDS performance with two other measures of depression in children. In all instances, correlations in the mid .70s were obtained.
(Reynolds, 1981)

Administration, Scoring and Quality of the Test Manual

Reynolds explains that the RCDS is written at a reading level well below that of the minimum age for the test. The RCDS can be administered individually, in small groups of 5-10, or in larger groups (20-30) in a classroom setting. However, the test manual also advises against administering the test to large groups. The RCDS is designed to measure depressive symptoms, so to increase the clinical utility of RCDS scores, a cut-off score is provided to designate a clinically relevant level of depressive symptomatology. (Reynolds, 1981)

There are two forms of the test, Form HS and Form OCR. Form HS is hand scored, is used with small groups or individuals, and takes each child about 10 minutes to complete. (Reynolds, 1981) Form OCR is for larger groups, with the option of mailing the forms in for machine scoring. The cut-off score of 74 defines a clinically relevant level of depressive symptoms; a child with a score above 80 is characterized as severe. (Reynolds, 1981)

The RCDS should not be presented as a test, since this may suggest to young children that there are right and wrong answers. It is also important to be mindful of when the test is given. For example, it should not be given around a holiday, field trip, or report card time. This helps to avoid adding stress for the child. The administrator should have a calm and even demeanor and should allow the child to ask any questions. Questions should be answered honestly and concisely. (Reynolds, 1981)

The test manual itself is clear and thorough in its presentation of reliability and validity and in its explicit guidance on the use of the test. Carlson states that throughout the manual the test author issues many caveats for potential test users in an effort to ensure proper test use and interpretation.
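
The scoring rules described for the RCDS (reverse-scored items, a summed total, and a cut-off of 74) can be sketched in Python. This is an illustration only and is not the published scoring key. The positions of the seven reverse-scored items below are placeholders, the faces item is assumed to have already been converted to a number by the scorer, and treating a total equal to the cut-off as meeting it is also an assumption.

```python
# Sketch of RCDS-style scoring under stated assumptions (not the real key).
LIKERT_MIN, LIKERT_MAX = 1, 4   # almost never ... all of the time
N_LIKERT_ITEMS = 29             # items answered on the 4-point scale
N_ITEMS = 30                    # 29 Likert items plus the faces item
CUTOFF = 74                     # cut-off score reported in this paper

# Placeholder 1-based positions for the seven reverse-scored items.
# These are NOT the actual item numbers; the manual's key would be needed.
REVERSED_ITEMS = frozenset({1, 5, 10, 12, 23, 25, 29})


def reverse(value):
    """Reverse a 1-4 response so that 1 becomes 4, 2 becomes 3, and so on."""
    return LIKERT_MIN + LIKERT_MAX - value


def total_score(likert_responses, faces_score):
    """Sum the 29 Likert items (reversing where keyed) plus the faces item."""
    if len(likert_responses) != N_LIKERT_ITEMS:
        raise ValueError(f"expected {N_LIKERT_ITEMS} Likert responses")
    total = faces_score
    for number, value in enumerate(likert_responses, start=1):
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"item {number} is outside the 1-4 range")
        total += reverse(value) if number in REVERSED_ITEMS else value
    return total


def at_or_above_cutoff(total):
    """Assumes a total equal to the cut-off counts as meeting it."""
    return total >= CUTOFF


def mean_item_score(total):
    """Average score per item across all 30 items."""
    return total / N_ITEMS


# Usage with made-up responses: every Likert item answered "sometimes" (2).
example_total = total_score([2] * N_LIKERT_ITEMS, faces_score=2)
print(example_total, at_or_above_cutoff(example_total))
print(round(mean_item_score(56.42), 2))
```

The last line reproduces the arithmetic behind the norm data reported above: a mean total of 56.42 spread over thirty items gives a mean item score of about 1.88.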
The test manual presents a solid base of studies pertaining not only to reliability and validity, but also to the standardization procedures and psychometric properties of the RCDS.

Application to Counseling Goals

Overall, the test is quite useful in determining depressive symptomatology. Rohrbeck states that the test is not to be used as the sole means of determining depressive qualities in children, but rather as a tool to help define characteristics. This reviewer also mentions that the test is not to be used to determine suicide potential; the Children's Depression Inventory would be a better tool for that. The reviewer does state, however, that the test is very practical in its use with young children.

Clinical Assessment of Interpersonal Relations (CAIR)

Purpose, Selection and Use of the Test

The Clinical Assessment of Interpersonal Relations (CAIR) is an instrument developed to measure the quality of the relationships of children and adolescents with significant people in their lives, specifically parents, peers, and teachers. (Bracken, 1993) Interpersonal relations are defined “as the unique and relatively stable behavioral patterns that exists or develops between two or more people as a result of individual and extra individual influences.” (Bracken, 1993) Behavioral aspects of relationships, environmental influences, and similarity of characteristics are included in this definition.

The test itself is composed of five relationship scales (mother, father, male peers, female peers, teachers) and a Total Relationship Index (TRI). (Keith, MMYB 13) Each scale contains 35 items, which are well organized in an eight-page test booklet containing an identification section, directions, scales, and a summary page. The test taker uses a four-point Likert scale to mark how he or she honestly feels about each relationship.
The relationship characteristics measured are companionship, emotional support, guidance, emotional comfort, reliance, trust, understanding, conflict, identification, respect, empathy, intimacy, affect, acceptance, and shared values. (Keith, MMYB 13)

This test aligns with the theoretical model of interpersonal relationships. Simply put, the theories generally suggest that the interpersonal relationships of children and adolescents are influenced by and related to their functioning in many different settings and can often predict later psychosocial adjustment. (Bracken, 1993) Based on this theoretical position, the CAIR and the Multidimensional Self Concept Scale (MSCS) were co-developed and co-normed. Bracken, the author of the test, identified six contexts in which children and adolescents most often function (social, competence, affect, academic, family, and physical) along with three relationship domains (social, family, and academic). These contexts and domains form the MSCS and the CAIR, respectively. (Medway, MMYB 13)

One would select and use this test with children and adolescents ranging in age from 9.0 to 19.11. The CAIR would be useful to school psychologists and neurophysiologists, as it would help in determining a child's feelings about his or her place based on the relationships developed with others around him or her. It helps to identify a child's relationship difficulties, and it could serve as a guide for a therapist working on intervention in these areas. (Keith, MMYB 13)

Norm Data and Quality of the Test

Keith's findings regarding the norm sample are as follows. A national sample of 2,501 children in grades 5 through 12, ages 9-19 years, was used for norming. Children were drawn from regular education classes and special education classes in rural, urban, and suburban school districts. The participating school districts were from around the United States.
Children from intact, reconstituted, and single parent family homes, as well as children living in foster homes, were also included in the norm group. Both genders were equally represented.

The CAIR yields raw scores and standard scores; the standard scores have a mean of 100 and a standard deviation of 15. The scores can also be converted to T scores, with a mean of 50 and a standard deviation of 10. The classification system of the CAIR describes the extent to which relationships are positive or negative; it is therefore easily understood by parents and teachers and does not label a child.

The internal consistency of the CAIR Total Relationship Index (TRI) is high, at .96 for the total standardization sample. (Bracken, 1993) Test-retest reliability also exceeds .90 for each of the five scales. The CAIR shows a moderate correlation with the MSCS, at .55. Content validity is strong, as the instrument was based on sound psychological theories, research, and literature support. Standard errors of measurement for the five subscales range from 3.0 to 3.97, with the TRI at an SEM of 3.0. (Bracken, 1993)

Administration, Scoring and Quality of the Test Manual

Bracken states that a formal degree or training is not required to administer the CAIR, but it is highly recommended that administration be done under the supervision of a trained professional. Interpretation of the scores is done solely by a professional with graduate training in a related field of psychology. The test materials consist of the CAIR rating form and the CAIR score summary profile form. The rating form can be completed in about 15 minutes as long as a quiet, comfortable, nondistracting environment has been provided. It is essential that the person administering the CAIR have a good rapport with the student. It is also helpful to go over the rating form prior to the beginning of the test.
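
The score metrics reported for the CAIR under Norm Data can be related with simple arithmetic. The Python sketch below is illustrative only: it converts a standard score (mean 100, standard deviation 15) to a T score (mean 50, standard deviation 10) through the shared z score, and it builds a confidence band from the SEM of 3.0 cited for the TRI. It assumes a linear conversion and a normal-theory band centered on the obtained score; the tables in the CAIR manual may be constructed differently, and the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_MEAN, STANDARD_SD = 100, 15   # standard-score metric from the text
T_MEAN, T_SD = 50, 10                  # T-score metric from the text


def standard_to_t(standard_score):
    """Linear conversion: same z score expressed on the T-score metric."""
    z = (standard_score - STANDARD_MEAN) / STANDARD_SD
    return T_MEAN + T_SD * z


def confidence_band(score, sem, level):
    """Normal-theory band: score plus or minus z_critical * SEM (an assumption)."""
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    z_critical = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_critical * sem
    return score - half_width, score + half_width


# Usage with a made-up TRI standard score of 85 and the SEM of 3.0.
print(standard_to_t(85))
low, high = confidence_band(85, sem=3.0, level=0.95)
print(round(low, 1), round(high, 1))
```

A standard score one standard deviation below the mean (85) maps to a T score one standard deviation below its mean (40), which is why the two metrics carry the same information.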

The administrator can answer questions regarding words on the test, but may not in any way add to the meaning of a word when offering help. The test is written at a third grade reading level. The CAIR can be given in a group setting or in an individual setting. There is no time limit on the CAIR, although the average time to take the test has been stated as 20 minutes. (Medway, MMYB 13)

Scoring must be done with care because items are scored differentially: positively worded items are scored from 4 to 1, whereas negatively worded items are scored from 1 to 4. The examiner must be careful to apply the correct scoring procedure. All raw scores are calculated and then converted to standard scores based on examinee age and gender. Next, confidence intervals ranging from 85% to 99% are assigned, and percentiles and T scores are found. (Medway, MMYB 13)

The test manual has been reviewed as being “exceptionally well written,” although more information is needed about age, race, and gender in comparative analyses. (Keith, MMYB 13) Medway goes on to say that the manual adequately covers interpretive issues, such as differences in ratings across gender and type of interpersonal relationship, so that the examiner has some understanding of relationship normality.

Application to Counseling Goals

I believe that both reviewers and the author of this test felt that the CAIR was a psychometrically strong instrument. I can see its use in the school setting as useful and practical. Medway praises the instrument as a “well conceived and developed instrument that provides a straightforward method of measuring children's important social networks.” (Medway, MMYB 13)

Bibliography

Bracken, Bruce. Clinical Assessment of Interpersonal Relations. 1993. PRO-ED, Inc., Austin, TX.
Burney, Deanna McKinnie. Adolescent Anger Rating Scale. 2001. Psychological Assessment Resources, Odessa, FL.
Carlson, Janet. Mental Measurements Yearbook 11 (MMYB 11), RCDS.
Henington, Carlsen. MMYB 15, AARS.
Keith, Patricia. MMYB 13, CAIR.
Medway, Fredric. MMYB 13, CAIR.
Reynolds, William. Reynolds Child Depression Scale. 1981-1989. Psychological Assessment Resources, Odessa, FL.
Rohrbeck, Cynthia. MMYB 11, RCDS.
Stephenson, Hugh. MMYB 15, AARS.