School Psychology Review, Volume 20, No. 2, 1991, pp. 284-300

TEACHER RATINGS OF ACADEMIC SKILLS: THE DEVELOPMENT OF THE ACADEMIC PERFORMANCE RATING SCALE

George J. DuPaul, University of Massachusetts Medical Center
Mark D. Rapport, University of Hawaii at Manoa
Lucy M. Perriello, University of Massachusetts Medical Center

Abstract: This study investigated the normative and psychometric properties of a recently developed teacher checklist, the Academic Performance Rating Scale (APRS), in a large sample of urban elementary school children. This instrument was developed to assess teacher judgments of academic performance, both to identify the presence of academic skills deficits in students with disruptive behavior disorders and to continuously monitor changes in these skills associated with treatment. A principal components analysis yielded a three-factor solution for the APRS. All subscales were found to be internally consistent, to possess adequate test-retest reliability, and to share variance with criterion measures of children's academic achievement, weekly classroom academic performance, and behavior. The total APRS score and all three subscales also discriminated between children with and without classroom behavior problems according to teacher ratings.

The academic performance and adjustment of school-aged children have come under scrutiny over the past decade due to concerns about increasing rates of failure and poor standardized test scores (Children's Defense Fund, 1988; National Commission on Excellence in Education, 1983). Reports indicate that relatively large percentages of children (i.e., 20-30%) experience academic difficulties during their elementary school years (Glidewell & Swallow, 1969; Rubin & Balow, 1978), and these rates are even higher among students with disruptive behavior disorders (Cantwell & Satterfield, 1978; Kazdin, 1986).
Further, the results of available longitudinal studies suggest that youngsters with disruptive behavior disorders and concurrent academic performance difficulties are at higher risk for poor long-term outcome (e.g., Weiss & Hechtman, 1986). These findings have direct implications for the assessment of the classroom functioning of students with behavior disorders. Specifically, it has become increasingly important to screen for possible academic skills deficits in this population and to monitor changes in academic performance associated with therapeutic interventions.

Frequently, traditional measures of academic achievement (e.g., standardized psychoeducational batteries) are used as integral parts of the diagnostic process and for long-term assessment of academic success. Several factors limit the usefulness of norm-referenced achievement tests for these purposes, such as (a) a failure to sample the curriculum in use adequately, (b) the use of a limited number of items to sample various skills, (c) the use of response formats that do not require the student to perform the behavior (e.g., writing) of interest, (d) an insensitivity to small changes in student performance, and (e) limited contribution to decisions about programmatic interventions (Marston, 1989; Shapiro, 1989).

Given the limitations of traditional achievement tests, more direct measurement methods have been utilized to screen for academic skills deficits and monitor intervention effects (Shapiro, 1989; Shapiro & Kratochwill, 1988). Several methods are available to achieve these purposes, including curriculum-based measurement (Shinn, 1989), direct observations of classroom behavior (Shapiro & Kratochwill, 1988), and calculation of product completion and accuracy rates (Rapport, DuPaul, Stoner, & Jones, 1986). These behavioral assessment techniques involve direct sampling of academic behavior and have demonstrated sensitivity to the presence of skills deficits and to treatment-induced change in such performance (Shapiro, 1989).

In addition to these direct assessment methods, teacher judgments of students' achievement have been found to be quite accurate in identifying children in need of academic support services (Gresham, Reschly, & Carey, 1987; Hoge, 1983). For example, Gresham and colleagues (1987) collected brief ratings from teachers regarding the academic status of a large sample of schoolchildren. These ratings were highly accurate in classifying students as learning disabled or nonhandicapped and were significantly correlated with student performance on two norm-referenced aptitude and achievement tests. In fact, teacher judgments were as accurate in discriminating between these two groups as the combination of the standardized tests.

Although teacher judgments may be subject to inherent biases (e.g., confirming previous classification decisions), they possess several advantages for both screening and identification purposes. Teachers are able to observe student performance on a more comprehensive sample of academic content than could be included on a standardized achievement test. Thus their judgments provide a more representative sample of the domain of interest in academic assessment (Gresham et al., 1987). Such judgments also provide unique data regarding the "teachability" (e.g., ability to succeed in a regular education classroom) of students (Gerber & Semmel, 1984). Finally, obtaining teacher input about a student's academic performance can provide social validity data in support of classification and treatment-monitoring decisions. At the present time, however, teachers typically are not asked for this information in a systematic fashion, and, when available, such input is considered to be highly suspect data (Gresham et al., 1987).

Teacher rating scales are important components of a multimodal assessment battery used in the evaluation of the diagnostic status of, and effects of treatment on, children with disruptive behavior disorders (Barkley, 1988; Rapport, 1987). Given that functioning in a variety of behavioral domains (e.g., following rules, academic achievement) across divergent settings is often affected in children with such disorders, it is important to include information from multiple sources across home and school environments. Unfortunately, most of the available teacher rating scales specifically target the frequency of problem behaviors, with few, if any, items related directly to academic performance. Thus, the dearth of items targeting teacher judgments of academic performance is a major disadvantage of these measures when screening for skills deficits or monitoring of academic progress is a focus of the assessment.

To address the exclusive focus on problem behaviors of most teacher questionnaires, a small number of rating scales have been developed in recent years that include items related to academic acquisition and classroom performance variables. Among these are the Children's Behavior Rating Scale (Neeper & Lahey, 1986), the Classroom Adjustment Rating Scale (Lorion, Cowen, & Caldwell, 1975), the Health Resources Inventory (Gesten, 1976), the Social Skills Rating System (Gresham & Elliott, 1990), the Teacher-Child Rating Scale (Hightower et al., 1986), and the Walker-McConnell Scale of Social Competence and School Adjustment (Walker & McConnell, 1988). These scales have been developed primarily as screening and problem identification instruments, and all have demonstrated reliability and validity for these purposes. Although all of these questionnaires are psychometrically sound, each scale possesses one or more of the following characteristics that limit its utility for both screening and progress monitoring of academic skills deficits: (a) items worded at too general a level (e.g., "Produces work of acceptable quality given her/his skill level") to allow targeting of academic completion and accuracy rates across subject areas, (b) a failure to establish validity with respect to criterion-based measures of academic success, and (c) requirements for completion (e.g., a large number of items) that detract from their appeal as instruments that may be used repeatedly or on a weekly basis for brief periods. The need for a brief rating scale that could be used to identify the presence of academic skills deficits in students with disruptive behavior disorders and to continuously monitor changes in those skills associated with treatment was instrumental in the development of the Academic Performance Rating Scale (APRS).

Author note: This project was supported in part by BRSG Grant S07 RR05712 awarded to the first author by the Biomedical Research Support Grant Program, Division of Research Resources, National Institutes of Health. A portion of these results was presented at the annual convention of the National Association of School Psychologists, April 1990, in San Francisco, CA. The authors extend their appreciation to Craig Edelbrock and three anonymous reviewers for their helpful comments on an earlier draft of this article and to Russ Barkley, Terri Shelton, Kenneth Fletcher, Gary Stoner, and the teachers and principals of the Worcester, MA, Public Schools for their invaluable contributions to this study. Address all correspondence to George J. DuPaul, Department of Psychiatry, University of Massachusetts Medical Center, 55 Lake Avenue North, Worcester, MA 01655.
The APRS was designed to obtain teacher perceptions of specific aspects (e.g., completion and accuracy of work in various subject areas) of a student's academic achievement in the context of a multimodal evaluation paradigm that would include more direct assessment techniques (e.g., curriculum-based measurement, behavioral observations). Before investigating the usefulness of this measure for the above purposes, its psychometric properties and technical adequacy must be established. Thus, this study describes the initial development of the APRS and reports on its basic psychometric properties with respect to factor structure, internal consistency, test-retest reliability, and criterion-related validity. In addition, normative data by gender across elementary school grade levels were collected.

METHOD

Subjects

Subjects were children enrolled in the first through sixth grades of 45 public schools in Worcester, Massachusetts. This system is an urban, lower middle-class school district with a 28.5% minority (African-American, Asian-American, and Hispanic) population. Complete teacher ratings were obtained for 493 children (251 boys and 242 girls); these were included in the factor analytic and normative data analyses. Children ranged from 6 to 12 years of age (M = 8.9; SD = 1.8). A two-factor index of socioeconomic status (Hollingshead, 1975) was obtained, with the relative percentages of subjects in each class as follows: I (upper), 12.3%; II (upper middle), 7.1%; III (middle), 45.5%; IV (lower middle), 26.3%; and V (lower), 8.8%.

A subsample of 50 children (22 girls and 28 boys) was randomly selected from the above sample to participate in a study of the validity of the APRS. Children at all grade levels participated, with the relative distribution of subjects across grades as follows: first, 19%; second, 16%; third, 17%; fourth, 17%; fifth, 13.5%; and sixth, 17.5%. The relative distribution of subjects across socioeconomic strata was equivalent to that obtained in the original sample.

Measures

The primary classroom teacher of each participant completed two brief measures: the APRS and the Attention-deficit Hyperactivity Disorder (ADHD) Rating Scale (DuPaul, in press). In addition, teachers of the children participating in the validity study completed the Abbreviated Conners Teacher Rating Scale (ACTRS; Goyette, Conners, & Ulrich, 1978).

APRS. The APRS is a 19-item scale that was developed to reflect teachers' perceptions of children's academic performance and abilities in classroom settings (see Appendix A). Thirty items were initially generated based on suggestions provided by several classroom teachers, school psychologists, and clinical child psychologists. Of the original 30 items, 19 were retained based on feedback from a separate group of classroom teachers, principals, and school and child psychologists regarding item content validity, clarity, and importance. The final version included items directed toward work performance in various subject areas (e.g., "Estimate the percentage of written math work completed relative to classmates"), academic success (e.g., "What is the quality of this child's reading skills?"), behavioral control in academic situations (e.g., "How often does the child begin written work prior to understanding the directions?"), and attention to assignments (e.g., "How often is the child able to pay attention without you prompting him/her?"). Two additional items were included to assess the frequency of staring episodes and social withdrawal. Although the latter are only tangentially related to the aforementioned constructs, they were included because "overfocused" attention (Kinsbourne & Swanson, 1979) and reduced social responding (Whalen, Henker, & Granger, 1989) are emergent symptoms associated with psychostimulant treatment.
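The scoring scheme for these items (each rated on a 1-to-5 Likert scale, with items 12, 13, and 15-19 reverse-keyed so that higher totals reflect better academic status) can be sketched in code. This is a minimal illustration rather than part of the original article; the function name and input format are hypothetical, and the conventional 6-minus-rating transformation for reverse-keyed items is assumed:

```python
REVERSE_KEYED = {12, 13, 15, 16, 17, 18, 19}  # items keyed so a low raw rating is positive

def score_aprs(ratings):
    """Compute the APRS total score.

    ratings: dict mapping item number (1-19) to a 1-5 Likert rating.
    Reverse-keyed items are flipped (6 - rating) so that a higher
    total always corresponds to better teacher-rated academic status.
    """
    if set(ratings) != set(range(1, 20)):
        raise ValueError("expected ratings for all 19 items")
    total = 0
    for item, r in ratings.items():
        if not 1 <= r <= 5:
            raise ValueError(f"item {item}: rating must be 1-5")
        total += (6 - r) if item in REVERSE_KEYED else r
    return total

# A child rated 4 on every positively keyed item and 2 on every
# reverse-keyed item scores 4 per item after rekeying: 19 * 4 = 76.
example = {i: (2 if i in REVERSE_KEYED else 4) for i in range(1, 20)}
print(score_aprs(example))  # 76
```

With this keying, possible totals range from 19 to 95, and the subscale scores described under Results are unweighted sums over the items assigned to each factor.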
Teachers answered each item using a 1 (never or poor) to 5 (very often or excellent) Likert scale format. Seven APRS items (i.e., nos. 12, 13, 15-19) were reverse-keyed in scoring so that a higher total score corresponded with a positive academic status.

ADHD Rating Scale. The ADHD Rating Scale consists of 14 items directly adapted from the ADHD symptom list in the most recent edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R; American Psychiatric Association, 1987). Teachers indicated the frequency of each symptom on a 1 (not at all) to 4 (very much) Likert scale, with higher scores indicative of greater ADHD-related behavior. This scale has been found to have adequate internal consistency and test-retest reliability, and to correlate with criterion measures of classroom performance (DuPaul, in press).

ACTRS. The ACTRS (or Hyperactivity Index) is a 10-item rating scale designed to assess teacher perceptions of psychopathology (e.g., hyperactivity, poor conduct, inattention) and is a widely used index for identifying children at risk for ADHD and other disruptive behavior disorders. It has adequate psychometric properties and is highly sensitive to the effects of psychopharmacological interventions (Barkley, 1988; Rapport, in press).

Observational measures. Children participating in the validity study were observed unobtrusively in their regular classrooms by a research assistant who was blind to obtained teacher rating scale scores. Observations were conducted during a time when each child was completing independent seatwork (e.g., math worksheet, phonics workbook). Observations were conducted for 20 min, with on-task behavior recorded for 60 consecutive intervals. Each interval was divided into 15 s of observation followed by 5 s for recording. A child's behavior was recorded as on- or off-task in the same manner as employed by Rapport and colleagues (1982). A child was considered off-task if (s)he exhibited visual nonattention to written work or the teacher for more than 2 consecutive seconds within each 15-s observation interval, unless the child was engaged in another task-appropriate behavior (e.g., sharpening a pencil). The observer was situated in a part of the classroom that avoided direct eye contact with the target child, but at a distance that allowed easy determination of on-task behavior. This measure was included as a partial index of academic engaged time, which has been shown to be significantly related to academic achievement (Rosenshine, 1981).

Academic efficiency score. Academic seatwork was assigned by each child's classroom teacher at a level consistent with the teacher's perceptions of the child's ability level, with the stipulation that the assignment be gradeable in terms of percentage completed and percentage accurate. Assignments were graded after the observation period by the research assistant and teacher, the latter of whom served as the reliability observer for academic measures. An academic efficiency score (AES) was calculated in a manner identical to that employed by Rapport and colleagues (1986), whereby the number of items completed correctly by the child was divided by the number of items assigned to the class and multiplied by 100. This statistic represents the mean weekly percentage of academic assignments completed correctly relative to classmates and was used as the classroom-based criterion measure of academic performance.

Published norm-referenced achievement test scores. The results of school-based norm-referenced achievement tests (i.e., Comprehensive Test of Basic Skills; CTB/McGraw-Hill, 1982) were obtained from the school records of each student in the validity sample. These tests are administered routinely on a group basis in the fall or spring of each school year. National percentile scores from the most recent administration (i.e., within the past year) of this test were recorded for Mathematics, Reading, and Language Arts.

Procedure

Regular education teachers from 300 classrooms for grades 1 through 6 were asked to complete the APRS and ADHD rating scales with regard to the performance of two children in their class. Teachers from elementary schools in all parts of the city of Worcester participated (i.e., a return rate of 93.5%), resulting in a sample that included children from all socioeconomic strata. Teachers were instructed by one of the authors on which students to assess (i.e., one boy and one girl randomly selected from the class roster), to complete APRS ratings according to each child's academic performance during the previous week, and to complete the ADHD scale to reflect the child's usual behavior over the year. Teacher ratings for the large sample (N = 487) were obtained within a 1-month period in the early spring, to ensure familiarity with the students' behavior.

A subsample of 50 children was selected randomly from the larger sample, and parent consent for participation in the validity study was procured. Teacher ratings for this subsample were obtained within a 3-month period in the late winter and early spring. Teacher ratings on the APRS were obtained from a randomly selected half of the sample participating in the validity study (n = 25) on a second occasion, 2 weeks after the original administration of this scale, to assess test-retest reliability. Ratings reflected children's academic performance over the previous week. The research assistant completed the behavioral observations and collected AES data on 3 separate days (i.e., a total of 60 min of observation) during the same week that APRS, ADHD, and ACTRS ratings were completed. Means (across the 3 observation days) for percentage on-task and AES scores were used in the data analyses.

Interobserver reliability. The research assistant was trained by the first author to an interobserver reliability of 90% or greater prior to conducting live observations, using videotapes of children completing independent work. Reliability coefficients for on-task percentage were calculated by dividing agreements by agreements plus disagreements and multiplying by 100%. Interobserver reliability also was assessed weekly throughout the data collection phase of the study using videotapes of 10 individual children (who were participants in the validity study) completing academic work during one of the observation sessions. Interobserver reliability was consistently above 80%, with a mean of 90% for all children. A mean kappa coefficient (Cohen, 1960) of .74 was obtained for all observations, indicating reliability beyond chance levels. Following each observation period, the teacher and assistant independently calculated the amount of work completed by the student relative to classmates and the percentage of items completed correctly. Interrater reliability for these measures was consistently above 96%, with a mean reliability of 99%.

RESULTS

Several analyses will be presented to explicate the psychometric properties of the APRS. First, the factor structure of this instrument was determined to aid in the construction of subscales. Second, the internal consistency and stability of APRS scores were examined. Next, gender and grade comparisons were conducted to identify the effects these variables may have on APRS ratings as well as to provide normative data. Finally, the concurrent validity of the APRS was evaluated by calculating correlation coefficients between rating scale scores and the criterion measures.

Factor Structure of the APRS

The APRS was factor analyzed using a principal components analysis followed by a normalized varimax rotation with iterations (Bernstein, 1988). As shown in Table 1, three components with eigenvalues greater than unity were extracted, accounting for approximately 68% of the variance: Academic Success (7 items), Impulse Control (3 items), and Academic Productivity (12 items). The factor structure replicated across halved random subsamples (i.e., n = 242 and 246, respectively). Congruence coefficients (Harman, 1976) between similar components ranged from .84 to .98, with a mean of .92, indicating a high degree of similarity in factor structure across subsamples. Items with loadings of .60 or greater on a specific component were retained to keep the number of complex items (i.e., those with significant loadings on more than one factor) to a minimum. In subsequent analyses, factor (subscale) scores were calculated in an unweighted fashion, with complex items included on more than one subscale (e.g., items 3-6 included on both the Academic Success and Academic Productivity subscales).

TABLE 1
Factor Structure of the Academic Performance Rating Scale

Scale items: 1. Math work completed; 2. Language Arts work completed; 3. Accuracy of math work; 4. Accuracy of Language Arts work; 5. Consistency of work; 6. Follows group instructions; 7. Follows small-group instructions; 8. Learns material quickly; 9. Neatness of handwriting; 10. Quality of reading; 11. Quality of speaking; 12. Careless work completion; 13. Time to complete work; 14. Attention without prompts; 15. Requires assistance; 16. Begins work carelessly; 17. Recall difficulties; 18. Stares excessively; 19. Social withdrawal. Each item received loadings on the Academic Success, Impulse Control, and Academic Productivity components; underlined values in the original indicate the items included in each factor. (The individual loadings and percentages of variance are illegible in this copy.)

Given that the APRS was designed to evaluate the unitary construct of academic performance, it was expected that the derived factors would be highly correlated. This hypothesis was confirmed, as the intercorrelations among Academic Success and Impulse Control, Academic Success and Academic Productivity, and Impulse Control and Academic Productivity were .69, .88, and .63, respectively. Despite the high degree of overlap between the Academic Success and Productivity components (i.e., items reflecting accuracy and consistency of work correlated with both), examination of the factor loadings revealed some important differences (see Table 1). Specifically, the Academic Success factor appears related to classroom performance outcomes, such as the quality of a child's academic achievement, ability to learn material quickly, and recall skills. Alternatively, the Academic Productivity factor is associated with behaviors that are important in the process of achieving classroom success, including completion of work, following instructions accurately, and ability to work independently in a timely fashion.

Internal Consistency and Reliability of the APRS

Coefficient alphas were calculated to determine the internal consistency of the APRS and its subscales. The results of these analyses demonstrated adequate internal consistencies for the Total APRS (.96), as well as for the Academic Success (.94) and Academic Productivity (.94) subscales. The internal consistency of the Impulse Control subscale was weaker (.72). Subsequently, the total sample was randomly subdivided (i.e., n = 242 and 246, respectively) into two independent subsamples. Coefficient alphas were calculated for all APRS scores within each subsample, with results nearly identical to those obtained above.

Test-retest reliability data were obtained for a subsample of 26 children (with both genders and all grades represented) across a 2-week interval, as described previously. The reliability coefficients were uniformly high for the Total APRS Score (.95) and the Academic Success (.91), Impulse Control (.88), and Academic Productivity (.93) subscales. Since rating scale scores can sometimes "improve" simply as a function of repeated administrations (Barkley, 1988), the two mean scores for each scale were compared using separate t-tests for correlated measures. Scores for each APRS scale were found to be equivalent across administrations, with t-test results as follows: Total APRS Score (t(24) = 1.24, n.s.), Academic Success (t(24) = 1.31, n.s.), Academic Productivity (t(24) = 1.32, n.s.), and Impulse Control (t(24) = .15, n.s.).

Gender and Grade Comparisons

Teacher ratings on the APRS were broken down by gender and grade level to (a) assess the effects of these variables on APRS ratings and (b) provide normative comparison data. The means and standard deviations across grade levels for APRS total and subscale scores are presented for girls and boys in Table 2. A 2 (Gender) x 6 (Grade) multivariate analysis of variance (MANOVA) was conducted employing APRS scores as the dependent variables. Significant multivariate effects were obtained for the main effect of Gender (Wilks's Lambda = .95; F(4, 472) = 6.20, p < .001) and the interaction between Gender and Grade (Wilks's Lambda = .93; F(20, 1566) = 1.61, p < .05).
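One practical use of the normative data in Table 2 is to express an individual child's APRS Total score as a standardized distance from same-gender, same-grade peers. The sketch below is an illustration, not a procedure from the article; the mean and standard deviation are the grade 5 boys' Total values reported in Table 2, and the function name is hypothetical:

```python
# Grade 5 boys' APRS Total score norms from Table 2 (M and SD).
MEAN_TOTAL = 63.68
SD_TOTAL = 18.04

def aprs_z(total_score, mean=MEAN_TOTAL, sd=SD_TOTAL):
    """Standardized (z) distance of an APRS Total score from the
    normative mean for the child's gender-by-grade group."""
    return (total_score - mean) / sd

# A fifth-grade boy with a Total score of 45 falls about one
# standard deviation below the mean for his comparison group.
print(round(aprs_z(45.0), 2))  # -1.04
```

The same calculation applies to any cell of Table 2 by substituting that cell's mean and standard deviation.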
TABLE 2
Means and Standard Deviations for the APRS by Grade and Gender

Grade 1 (n = 82)
  Girls (n = 40): Total 67.02 (16.27); Academic Success 23.92 (7.37); Impulse Control 9.76 (2.49); Academic Productivity 44.68 (10.91)
  Boys (n = 42): Total 71.95 (16.09); Academic Success 26.86 (6.18); Impulse Control 10.67 (2.82); Academic Productivity 46.48 (11.24)

Grade 2 (n = 91)
  Girls (n = 46): Total 72.56 (12.33); Academic Success 26.61 (5.55); Impulse Control 10.15 (2.70); Academic Productivity 47.85 (7.82)
  Boys (n = 45): Total 67.84 (14.86); Academic Success 25.24 (6.15); Impulse Control 9.56 (2.72); Academic Productivity 44.30 (10.76)

Grade 3 (n = 92)
  Girls (n = 43): Total 72.10 (14.43); Academic Success 25.07 (6.07); Impulse Control 10.86 (2.65); Academic Productivity 47.88 (9.35)
  Boys (n = 49): Total 68.49 (16.96); Academic Success 25.26 (6.53); Impulse Control 9.27 (2.67); Academic Productivity 45.61 (11.89)

Grade 4 (n = 79)
  Girls (n = 38): Total 67.79 (18.69); Academic Success 24.08 (7.56); Impulse Control 10.36 (2.91); Academic Productivity 44.26 (-)
  Boys (n = 41): Total 69.77 (15.83); Academic Success 25.35 (6.50); Impulse Control 9.83 (2.77); Academic Productivity 45.71 (-)

Grade 5 (n = 79)
  Girls (n = 44): Total 73.02 (14.10); Academic Success 26.11 (6.01); Impulse Control 10.76 (2.34); Academic Productivity 48.36 (-)
  Boys (n = 35): Total 63.68 (18.04); Academic Success 23.14 (7.31); Impulse Control 8.69 (2.82); Academic Productivity 42.40 (12.47)

Grade 6 (n = 70)
  Girls (n = 31): Total 74.10 (14.45); Academic Success 26.59 (6.26); Impulse Control 10.79 (2.25); Academic Productivity 48.77 (9.13)
  Boys (n = 39): Total 65.24 (12.39); Academic Success 23.75 (5.90); Impulse Control 9.05 (2.35); Academic Productivity 43.59 (8.19)

Note: Standard deviations are in parentheses; a hyphen marks standard deviations that are illegible in this copy.

Separate 2 x 6 univariate analyses of variance (ANOVAs) were conducted subsequently for each of the APRS scores to determine the source of the obtained multivariate effects. A main effect for Gender was obtained for the APRS Total score (F(1, 476) = 6.37, p < .05), Impulse Control (F(1, 475) = 16.79, p < .001), and Academic Productivity (F(1, 475) = 6.95, p < .05) subscale scores. For each of these scores, girls obtained higher ratings than boys, indicating greater teacher-rated academic productivity and behavioral functioning among girls. No main effect for Gender was obtained on Academic Success subscale scores. Finally, a significant interaction between Gender and Grade was obtained for the APRS Total score (F(5, 476) = 2.68, p < .05), Academic Success (F(5, 475) = 2.63, p < .05), and Impulse Control (F(5, 475) = 3.59, p < .01) subscale scores. All other main and interaction effects were nonsignificant.
Simple effects tests were conducted to elucidate Gender effects within each Grade level for those variables where a significant interaction was obtained. Relatively similar results were obtained across APRS scores. Gender effects were found only within grades 6 (fll, 475) = 7.02, p < .Ol) and 6 (fly, 475) = 6.61, p < .05) for the APRS total score. Alternatively, gender differences on the Academic Success subscale were obtained solely within grades 1 (F(1,475) = 4.24, p < .05) and 5 (F(1, 475) = 4.14, p < .05). These results indicate that girls in the first and f&h grades were rated as more academically competent than boys. Significant differences between boys and girls in Impulse Control scores were also found within grades 3 (fll, 475) = 8.73, p < .Ol), 5 (F(1,475) = 12.24,~ < .OOl), and 6 (F(I, 475) = 8.06, p < .Ol) with girls judged to exhibit greater behavioral control in these three grades. All other simple effects tests were nonsignificant. School Psychology Correlations TABLE 3 APRS Scores Total Score Measures ACTRS’ ADHD Between Ratings Review, 7997, Vol. 20, No. 2 and Criterion Measures Academic Success Impulse Control Academic Productivity -m6()***b 9.43’” 0.49”” ,.&4*** -.72*** 0.59”’ -.61*** 0.72”“” On Task Percentage .29* .22 .24 .31* AES” .53*** .26 .41** .57*** CTBS Math .48*** .62*** .28 .39** CTBS Reading .53*** .62*** .34* 44’” CTBS Language .53*** .61*** .41** .45** ‘Abbreviated Conners Teacher Rating Scale. bCorrelations are based on N = 50 with degrees of freedom ‘Academic Efficiency Score. **p<.o1 "pC.05 Note: National Relationships and Criterion percentile -p = 48. < .ool scores were used for all Comprehensive Among APRS Scores Measures Divergent Test of Basic Skills (CTBS) subscales. 
Validity of the APRS Correlation coefficients between APRS scores and criterion measures were The relationships among all APRS calculated with ACTRS ratings partialled scores and several criterion measures out to statistically control for variance were examined to determine the concur- attributable to teacher ratings of problem rent validity of the APRS. Criterion behavior (see Table 4). Significant relameasures included two teacher rating tionships remained between APRS acascales (ACTRS, ADHD Rating Scale), direct demic dimensions (i.e., Total Score, Acaobservations of on-task behavior, percent- demic Success, and Academic Proage of academic assignments completed ductivity subscales) and performance correctly @ES), and norm-referenced measures such as AES and achievement achievement test scores (CTBS reading, test scores. As expected, partialling out math, and language). Pearson productACTRS scores reduced the correlations moment correlations among these mea- between the Impulse Control subscale and sures are presented in Table 3. Overall, the criterion measures to nonsignificant levels. None of the partial correlations absolute values of obtained correlation coefficients ranged from .22 to .72 with with ADHD ratings and on-task percent24 out of 28 coefficients achieving statis- age were statistically significant, indicattical significance. Further, the APRS Total ing that these criterion measures were Score and Academic Productivity subscale more related to teacher perceptions of a child’s behavioral control than to his or were found to share greater than 36% of her academic performance. The Academic the variance with the AES, ACTRS, and Success subscale continued to share 26% ADHD Rating Scale.The Academic Success or greater of the variance of CTBS scores subscale shared an average of 38% of the when ACIDS scores were partialled out. variance of CTBS scores. 
Weaker correla- In addition, the Total APRS score and the tions were obtained between APRS scores Academic Productivity subscale shared 9% and direct observations of on-task behav- of the variance with AES beyond that ior with only an average of 7.2% of the accounted for by teacher ratings of latter’s variance accounted for. problem behavior. Academic Correlations Ratings On Task Percentage Rating Scale TABLE 4 Between APRS Scores and Criterion with ACTRSa Scores Partialled Out Total Score Measures ADHD Performance Academic Success 293 Measures Impulse Control Academic Productivity -.12b 0.24 0.24 -. 07 0.04 0.01 0.03 9.04 AESC .32* .06 .22 .37** CTBS Math .38** .56*** .I4 .25 CTBS Reading .46*** .58*** .24 .34* CTBS Language .43** .54*** .28 .30* *Abbreviated bCorrelations Conners Teacher Rating Scale. are based on N = 50 with degrees of freedom ‘Academic Efficiency Score. *p < .05 *+p < .Ol Note: National percentile ““p = 48. < a01 scores were used for all Comprehensive The divergent validities of the APRS subscales were examined to assess the possible unique associations between subscale scores and criterion measures. This was evaluated using separate t-tests for differences between correlation coefficients that are from the same sample (Guilford & Fruchter, 1973, p. 167). The Academic Success subscale was more strongly associated with CTBS percentile rankings than the other subscales or ACTRS ratings. This finding was expected given that the Academic Success subscale is comprised of items related to the outcome of academic performance. Specifically, the relationship between CTBS Math scores and Academic Success ratings was significantly greater than that obtained between CTBS Math scores and Impulse Control (t(47) = 3.03, p < .Ol), Academic Productivity (t(47) = 3.11, p < .Ol, and ACTRS (t(47) = 2.35, p < .05) ratings. Similar results were obtained for CTBS Reading scores. 
The correlation of CTBS Reading scores with Academic Success ratings was significantly greater than their relationship with Impulse Control (t(47) = 2.50, p < .05), Academic Productivity (t(47) = 2.38, p < .05), and ACTRS (t(47) = 2.76, p < .01) ratings. Finally, the relationship between Academic Success ratings and CTBS Language scores was significantly greater than that obtained between the latter and Academic Productivity ratings (t(47) = 2.12, p < .05). The Academic Productivity subscale was found to have the strongest relationships with teacher ratings of problem behavior and accurate completion of academic assignments. The correlation between Academic Productivity and ACTRS ratings was significantly greater than that obtained between ACTRS and Academic Success ratings (t(47) = 2.84, p < .01). In a similar fashion, Academic Productivity ratings were associated to a greater degree with AES scores than were Academic Success ratings (t(47) = 4.29, p < .01). Thus, the Academic Productivity subscale was significantly related to criterion variables that represent factors associated with achieving classroom success (i.e., absence of problem behaviors and accurate work completion). It should be noted that validity coefficients associated with the Impulse Control subscale were not found to be significantly greater than those of either of the other subscales.

APRS Ratings: Sensitivity to Group Differences

A final analysis was conducted to investigate the sensitivity of APRS ratings to differences between groups of children with and without attention and impulse control problems (i.e., the former group representing students who are potentially exhibiting academic performance difficulties).
Children from the total sample with scores 2 or more standard deviations above the mean on the ADHD Rating Scale (n = 35) were compared with students who received teacher ratings of ADHD symptomatology within 1 standard deviation of the mean (n = 390). Separate t-tests were conducted employing each of the APRS scores as dependent measures. Statistically significant differences were obtained between groups for the APRS Total Score (t(423) = 12.32, p < .001) and the Academic Success (t(423) = 7.23, p < .001), Impulse Control (t(423) = 8.95, p < .001), and Academic Productivity (t(423) = 10.20, p < .001) subscales, with the children exhibiting ADHD symptoms rated as significantly inferior on all APRS dimensions relative to control children.

DISCUSSION

The APRS is a brief teacher questionnaire that provides reliable and valid information about the quality of a student's academic performance and behavioral conduct in educational situations. Separate principal components analyses resulted in the extraction of three components or subscales (i.e., Academic Success, Impulse Control, and Academic Productivity) that were congruent across random subsamples. The Academic Success subscale accounted for over half of the variance, which supports the construct validity of the APRS, as the instrument was intended to assess teacher perceptions of the quality of students' academic skills. An additional 13% of rating variance was accounted for by the Academic Productivity and Impulse Control subscales. Although the latter are highly correlated with the Academic Success subscale, both appear to provide unique information regarding factors associated with the process of achieving classroom success (e.g., work completion, following instructions, behavioral conduct).

Psychometric Properties of the APRS

The APRS total and subscale scores were found to possess acceptable internal consistency, to be stable across a 2-week interval, and to evidence significant levels of criterion-related validity.
Although the Impulse Control subscale was found to have adequate test-retest reliability, its internal consistency was lower than that of the other subscales. This latter finding is likely due to the smaller number of items in this subscale. The relationships among APRS scores and criterion measures, such as academic efficiency, behavior ratings, and standardized academic achievement test scores, were statistically significant. The APRS Total Score and two subscales were found to have moderate validity coefficients and to share appreciable variance with several subtests of a norm-referenced achievement test and a measure of classwork accuracy. Further, when validity coefficients were calculated with ACTRS ratings partialled out, most continued to be statistically significant, indicating that APRS scores provide unique information regarding a child's classroom performance relative to brief ratings of problem behavior. Two of the three APRS subscales were found to exhibit divergent validity. Although all APRS subscales were positively correlated with achievement test scores, the strongest relationships were found between the Academic Success subscale and CTBS percentile rankings, accounting for an average of 38% of the variance. Alternatively, although negative correlations were obtained between teacher reports of problem behaviors (i.e., ACTRS and ADHD ratings) and all APRS scores, the strongest relationships were found between the former rating scales and Academic Productivity scores. Further, a classroom-based measure of work completion accuracy (AES) had a significantly greater correlation with the Academic Productivity subscale than with the other subscales, with 32.5% of the variance accounted for. This latter finding may appear counterintuitive (i.e., that Academic Success did not have the strongest relationship with AES), but it is most likely due to the fact that AES represents a combination of the child's academic ability, attention to task, behavioral control, and motivation to perform.
Given the varied item content of the Academic Productivity subscale, it is not surprising that it shares more variance with a complex variable like AES. This pattern of results indicates that the Academic Success subscale is most representative of the teacher's judgment of a student's global achievement status, whereas the Academic Productivity subscale has a greater relationship with factors associated with the process of day-to-day academic performance. Finally, although the Impulse Control subscale was significantly associated with most of the criterion measures, it did not demonstrate divergent validity. This result, combined with its brevity, lower internal consistency, and redundancy with teacher ratings of problem behavior, limits its practical utility as a separate subscale. Although statistically significant positive correlations with on-task percentage were obtained for the APRS Total and Academic Productivity scores, the Academic Success and Impulse Control subscales were not related to this observational measure. One explanation for this result is that the Academic Productivity subscale is more closely related to factors associated with independent work productivity (e.g., attention to task) than are the other subscales. A second possible explanation for the weaker correlations between this criterion variable and all APRS scores is that children's classroom performance is a function of multiple variables and is unlikely to be represented by a single, specific construct. As such, teacher ratings of academic functioning should be more strongly related to global measures, such as AES or standardized achievement test scores, that represent a composite of ability, attention to task, and task completion and accuracy, than to a more specific index such as on-task frequency. Teacher ratings on the APRS differentiated a group of children displaying behavior and attention problems from their normal classmates.
Youngsters who had received scores 2 or more standard deviations above the mean on a teacher rating of ADHD symptomatology received significantly lower scores on all APRS scales relative to a group of classmates who were within 1 standard deviation of the mean on ADHD ratings. This result provides preliminary evidence of the APRS's discriminant validity and its value for screening/problem identification purposes. Further studies are necessary to establish its utility in differentiating youngsters with disruptive behavior disorders who are exhibiting concomitant academic problems from those who are not.

APRS: Grade and Gender Differences

Girls were rated as more competent than boys on the Academic Productivity subscale, regardless of grade level. This result was expected, as gender differences favoring girls have been found for most similar teacher questionnaires (e.g., Weissberg et al., 1987). Alternatively, for the Total Score and the remaining subscale scores, girls were rated as outperforming boys only within specific grade levels. In general, these differences were obtained at the fifth and sixth grade levels, suggesting that gender differences with respect to achievement status and behavioral control are most evident in the upper grades. The latter result could indicate that gender differences in daily academic performance do not affect teachers' overall assessment of educational status until the later grades, when demands for independent work greatly increase. Interestingly, no significant grade differences were obtained for any of the APRS scores. As Hightower and colleagues (1986) have suggested, a lack of differences across grade levels implies that teachers complete ratings of academic performance in relative terms (i.e., in comparison with similar-aged peers) rather than absolute terms.
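The 2-standard-deviation screening rule used to form the symptomatic group in the analyses above can be sketched as follows. This is a minimal illustration of the cutoff logic only; the function name and sample data are hypothetical, not drawn from the APRS study.

```python
from statistics import mean, stdev

def flag_extreme(scores, k=2.0):
    """Return indices of cases scoring at least k sample standard deviations
    above the sample mean -- the kind of cutoff used to form a screening
    group from teacher ratings of symptomatology."""
    m, s = mean(scores), stdev(scores)   # sample mean and SD
    cutoff = m + k * s
    return [i for i, x in enumerate(scores) if x >= cutoff]

# Hypothetical ratings: one clearly elevated case among twenty typical ones.
ratings = [10] * 20 + [30]
flagged = flag_extreme(ratings)          # index of the elevated case
```

In practice a second cutoff (within 1 SD of the mean) would define the comparison group, as was done in the study.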
Limitations of the Present Study

Several factors limit definitive conclusions about the utility of the APRS based on the present results. First, the sample of children studied was limited to an urban location in one geographic region; it is unknown how representative these normative data would be for children from rural or suburban settings as well as from other regions. Previous research with similar teacher questionnaires suggests significant differences in scores across urban, suburban, and rural settings (e.g., Hightower et al., 1986). Second, for the norms to be generally applicable, APRS ratings would need to be collected for a sample representative of the general population with respect to ethnicity and socioeconomic status. A further limitation of the present study was the limited range of criterion measures employed. In particular, the relationship of APRS scores with more direct measures of academic performance (e.g., criterion-based measurement) should be explored, as the weaknesses of norm-referenced achievement tests for this purpose are well documented (Marston, 1989; Shapiro, 1989). Finally, additional psychometric properties of this scale, such as predictive validity and inter-rater reliability, need to be documented. Empirical investigations are necessary to determine the usefulness of the APRS as a treatment-sensitive instrument. Evidence for the latter is especially important, as a primary purpose for creating the APRS was to allow assessment of intervention effects on academic performance.

Summary

The results of this preliminary investigation indicate that the APRS is a highly reliable rating scale that has demonstrated initial validity for assessing teacher perceptions of the quality of student academic performance. Given its unique focus on academic competencies rather than behavioral deficits, it appears to have potential utility within the context of a multimethod assessment battery. In particular, it should serve as a valuable supplement to behavioral assessment techniques (e.g., direct observations of behavior, curriculum-based measurement) given its brevity, its focus on both global and specific achievement parameters, and its relationship with classroom-based criteria of academic success. The present results provide initial support for the utility of the APRS as a screening/problem identification measure. Further, when used in the context of an assessment battery that includes more direct measures of academic performance, the APRS may provide important data regarding the social validity (i.e., teacher perceptions of changes in academic status) of obtained intervention effects, although its incremental validity would need to be established. The APRS's sensitivity to the effects of behavioral and psychopharmacological interventions awaits further empirical study.

REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
Barkley, R. A. (1988). Child behavior rating scales and checklists. In M. Rutter, A. H. Tuma, & I. S. Lann (Eds.), Assessment and diagnosis in child psychopathology (pp. 113-155). New York: Guilford.
Bernstein, I. H. (1988). Applied multivariate analysis. New York: Springer-Verlag.
Cantwell, D. P., & Satterfield, J. H. (1978). The prevalence of academic underachievement in hyperactive children. Journal of Pediatric Psychology, 3, 168-171.
Children's Defense Fund. (1988). A call for action to make our nation safe for children: A briefing book on the status of American children in 1988. Washington, DC: Author.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
CTB/McGraw-Hill. (1982). The Comprehensive Test of Basic Skills. Monterey, CA: Author.
DuPaul, G. J. (in press). Parent and teacher ratings of ADHD symptoms: Psychometric properties in a community-based sample. Journal of Clinical Child Psychology.
Gerber, M. M., & Semmel, M. I. (1984). Teacher as imperfect test: Reconceptualizing the referral process. Educational Psychologist, 19, 137-148.
Gesten, E. L. (1976). A Health Resources Inventory: The development of a measure of the personal and social competence of primary-grade children. Journal of Consulting and Clinical Psychology, 44, 775-786.
Glidewell, J. C., & Swallow, C. S. (1969). The prevalence of maladjustment in elementary schools. Report prepared for the Joint Commission on Mental Illness and Health of Children. Chicago: University of Chicago Press.
Goyette, C. H., Conners, C. K., & Ulrich, R. F. (1978). Normative data on Revised Conners Parent and Teacher Rating Scales. Journal of Abnormal Child Psychology, 6, 221-236.
Gresham, F. M., & Elliott, S. N. (1990). Social skills rating system. Circle Pines, MN: American Guidance Service.
Gresham, F. M., Reschly, D. J., & Carey, M. P. (1987). Teachers as "tests": Classification accuracy and concurrent validation in the identification of learning disabled children. School Psychology Review, 16, 543-553.
Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education (5th ed.). New York: McGraw-Hill.
Harman, H. H. (1976). Modern factor analysis (3rd ed., rev.). Chicago: University of Chicago Press.
Hightower, A. D., Work, W. C., Cowen, E. L., Lotyczewski, B. S., Spinell, A. T., Guare, J. C., & Rohrbeck, C. A. (1986). The Child Rating Scale: The development of a socioemotional self-rating scale for elementary school children. School Psychology Review, 16, 239-255.
Hoge, R. D. (1983). Psychometric properties of teacher-judgment measures of pupil aptitudes, classroom behaviors, and achievement levels. Journal of Special Education, 17, 401-429.
Hollingshead, A. B. (1975). Four factor index of social status. New Haven, CT: Yale University, Department of Sociology.
Kazdin, A. E. (1985). Treatment of antisocial behavior in children and adolescents. Homewood, IL: Dorsey Press.
Kinsbourne, M., & Swanson, J. M. (1979). Models of hyperactivity: Implications for diagnosis and treatment. In R. L. Trites (Ed.), Hyperactivity in children: Etiology, measurement, and treatment implications (pp. 1-20). Baltimore: University Park Press.
Lorion, R. P., Cowen, E. L., & Caldwell, R. A. (1975). Normative and parametric analyses of school maladjustment. American Journal of Community Psychology, 3, 291-301.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford Press.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: Author.
Neeper, R., & Lahey, B. B. (1986). The Children's Behavior Rating Scale: A factor analytic developmental study. School Psychology Review, 15, 277-288.
Rapport, M. D. (1987). Attention Deficit Disorder with Hyperactivity. In M. Hersen & V. B. Van Hasselt (Eds.), Behavior therapy with children and adolescents (pp. 325-361). New York: Wiley.
Rapport, M. D. (in press). Psychostimulant effects on learning and cognitive function in children with Attention Deficit Hyperactivity Disorder: Findings and implications. In J. L. Matson (Ed.), Hyperactivity in children: A handbook. New York: Pergamon Press.
Rapport, M. D., DuPaul, G. J., Stoner, G., & Jones, J. T. (1986). Comparing classroom and clinic measures of attention deficit disorder: Differential, idiosyncratic, and dose-response effects of methylphenidate. Journal of Consulting and Clinical Psychology, 54, 334-341.
Rapport, M. D., Murphy, A., & Bailey, J. S. (1982). Ritalin vs. response cost in the control of hyperactive children: A within-subject comparison. Journal of Applied Behavior Analysis, 15, 205-216.
Rosenshine, B. V. (1981). Academic engaged time, content covered, and direct instruction. Journal of Education, 3, 38-66.
Rubin, R. A., & Balow, B. (1978). Prevalence of teacher-identified behavior problems. Exceptional Children, 45, 102-111.
Shapiro, E. S. (1989). Academic skills problems: Direct assessment and intervention. New York: Guilford Press.
Shapiro, E. S., & Kratochwill, T. R. (Eds.). (1988). Behavioral assessment in schools: Conceptual foundations and practical applications. New York: Guilford Press.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.
Walker, H. M., & McConnell, S. R. (1988). The Walker-McConnell Scale of Social Competence and School Adjustment. Austin, TX: Pro-Ed.
Weiss, G., & Hechtman, L. (1986). Hyperactive children grown up. New York: Guilford.
Weissberg, R. P., Cowen, E. L., Lotyczewski, B. S., Boike, M. F., Orara, N., Ahvay, Stalonas, P., Sterling, S., & Gesten, E. L. (1987). Teacher ratings of children's problem and competence behaviors: Normative and parametric characteristics. American Journal of Community Psychology, 15, 387-401.
Whalen, C. K., Henker, B., & Granger, D. A. (1989). Ratings of medication effects in hyperactive children: Viable or vulnerable? Behavioral Assessment, 11, 179-199.

George J. DuPaul, PhD, received his doctorate from the University of Rhode Island in 1985. He is currently Assistant Professor of Psychiatry at the University of Massachusetts Medical Center. His research interests include the assessment and treatment of Attention Deficit Hyperactivity Disorder and related behavior disorders.

Mark D. Rapport, PhD, is currently Associate Professor of Psychology at the University of Hawaii at Manoa. His research interests include assessment of the cognitive effects of psychotropic medications and the treatment of Attention Deficit Hyperactivity Disorder and related behavior disorders.

Lucy M. Perriello, MA, received a Master's degree in Counseling Psychology from Assumption College in 1988.
She is currently a Research Associate in Behavioral Medicine at the University of Massachusetts Medical Center.

APPENDIX A
Academic Performance Rating Scale

Student ____________________  Date ________
Teacher ____________________  Grade ________

For each of the items below, please estimate the above student's performance over the PAST WEEK. For each item, please circle one choice only.

1. Estimate the percentage of written math work completed (regardless of accuracy) relative to classmates.
   0-49% (1)   50-69% (2)   70-79% (3)   80-89% (4)   90-100% (5)

2. Estimate the percentage of written language arts work completed (regardless of accuracy) relative to classmates.
   0-49% (1)   50-69% (2)   70-79% (3)   80-89% (4)   90-100% (5)

3. Estimate the accuracy of completed written math work (i.e., percent correct of work done).
   0-64% (1)   65-69% (2)   70-79% (3)   80-89% (4)   90-100% (5)

4. Estimate the accuracy of completed written language arts work (i.e., percent correct of work done).
   0-64% (1)   65-69% (2)   70-79% (3)   80-89% (4)   90-100% (5)

5. How consistent has the quality of this child's academic work been over the past week?
   Consistently Poor (1)   More Poor than Successful (2)   Variable (3)   More Successful than Poor (4)   Consistently Successful (5)

6. How frequently does the student accurately follow teacher instructions and/or class discussion during large-group (e.g., whole class) instruction?
   Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

7. How frequently does the student accurately follow teacher instructions and/or class discussion during small-group (e.g., reading group) instruction?
   Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

8. How quickly does this child learn new material (i.e., pick up novel concepts)?
   Very Slow (1)   Slow (2)   Average (3)   Quickly (4)   Very Quickly (5)

9. What is the quality or neatness of this child's handwriting?
   Poor (1)   Fair (2)   Average (3)   Above Average (4)   Excellent (5)

10. What is the quality of this child's reading skills?
    Poor (1)   Fair (2)   Average (3)   Above Average (4)   Excellent (5)

11. What is the quality of this child's speaking skills?
    Poor (1)   Fair (2)   Average (3)   Above Average (4)   Excellent (5)

12. How often does the child complete written work in a careless, hasty fashion?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

13. How frequently does the child take more time to complete work than his/her classmates?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

14. How often is the child able to pay attention without you prompting him/her?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

15. How frequently does this child require your assistance to accurately complete his/her academic work?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

16. How often does the child begin written work prior to understanding the directions?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

17. How frequently does this child have difficulty recalling material from a previous day's lessons?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

18. How often does the child appear to be staring excessively or "spaced out"?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)

19. How often does the child appear withdrawn or tend to lack an emotional response in a social situation?
    Never (1)   Rarely (2)   Sometimes (3)   Often (4)   Very Often (5)