Hayward, Stewart, Phillips, Norris, & Lovell

Test Review: Oral and Written Language Scales: Listening Comprehension and Oral Expression (OWLS)

N.B. I have chosen to review only the Listening Comprehension and Oral Expression subtests. In considering the Written Expression subtest, I found that its items were specific to the student's ability to engage in writing activities such as printing his or her name, copying a word, and writing letters, and later composing sentences from the examiner's spoken words (e.g., Item 25, "Write one sentence using these four words") or responding in writing to the examiner's instruction (e.g., Item 19, "A girl is looking for her lost ring. Write here what she asked her mother."). At the starting age of 5-7 years, the student begins by copying words and sentences, writing a letter (e.g., Item 5, "Write the letter 'f' here"), or writing sentences spoken by the examiner (Item 12, "Write the sentence I say here."). For our purposes, I focused on abilities that I thought were related to reading and were comparable with the other tests reviewed so far.

Name of Test: Oral and Written Language Scales: Listening Comprehension and Oral Expression (OWLS)

Author(s): Elizabeth Carrow-Woolfolk

Publisher/Year: American Guidance Service, Inc., 1995

Forms: One

Age Range: 3 years through 21 years

Norming Sample
Test construction began in 1991 with a tryout phase followed by a national standardization study. From the standardization study (1992-1993), test administration was detailed and normative scores were developed. In developing the normative scores, the author notes that smoothing procedures were used to deal with "irregularities caused by sampling fluctuations" (Carrow-Woolfolk, 1995, p. 119). Item analyses were carried out for both scales. Both classical methods (item difficulty and item discrimination) and Rasch scaling were used to determine the suitability, presentation order, and consistency of test items. The author provides a good explanation of these procedures for the potential examiner (Carrow-Woolfolk, 1995, p. 115).

Total Number: 1,985

Number and Age: Students' ages ranged from 3 to 21 years. There were 13 age groups: 6-month intervals for ages 3 years, 0 months to 4 years, 11 months; 1-year intervals for ages 5 years, 0 months to 11 years, 0 months; and then age groups 12-13, 14-15, 16-18, and 19-21. All age groups have N = 100 or more persons. Younger children, ages 3 and 4, were given only the Listening Comprehension and Oral Expression Scales.

Location: 74 sites (Northeast, North Central, South, and West)

Demographics: Demographics are reported by gender, geographical region, race/ethnicity, and socioeconomic status (maternal education).

Rural/Urban: Not specified

SES: SES was indexed by maternal education, which seems unusual, but the author states that the mother's educational level has a "plausible link to the examinee's developed language abilities" (Carrow-Woolfolk, 1995, p. 110).

Other: None

Sample characteristics compared favourably to 1991 U.S. Census information (Bureau of the Census, 1991). All data are presented in table form by various sample characteristics to illustrate representativeness.

Summary Prepared By: Eleanor Stewart, November 2007

Test Description/Overview
The complete test kit consists of the examiner's manual, Listening Comprehension Easel, Oral Expression Easel, and record forms. (The Written Expression Scale is packaged separately.)
The OWLS consists of three subtests: Listening Comprehension, Oral Expression, and Written Expression. The first two are published together in one test kit while the third is packaged and presented separately. As stated by the Buros reviewers, "The decision to package the Written Expression subtest separately was unfortunate, as it complicates comparing students' performance in these three areas" (Carpenter & Malcolm, 2001, p. 860).

Theory: The OWLS is based on Elizabeth Carrow-Woolfolk's previous work in the area of language test development. As in her previous tests, she presents a model based on her theoretical work in language development (she cites Carrow-Woolfolk, 1989; Carrow-Woolfolk & Lynch, 1981). In this manual, the author presents a brief overview of key elements that include the basic dimensions of language, language knowledge, and language performance. Language knowledge refers to the structure of language, which includes content and form, while language performance refers to "internal systems the language user employs to process language" (Carrow-Woolfolk, 1995, p. 7). The author proposes that these two dimensions together account for verbal communication. She further elaborates on her theory and spends the rest of the section on theory describing the elements of content, form, and use. Much of this information is familiar to me from either reviewing her other tests or using them in clinical practice. Hers is a servo system view of language with interconnecting and overlapping circuits. Language processing theory, according to Carrow, "separates the four major processes by the requirements of their perspective processing systems" (p. 12). I think that the author is confident of her theoretical foundation in that she references only Chomsky and Foster as theoretical sources; otherwise, the text is sparse in terms of citations to the work of others in language theory. It is on the basis of her theory that the OWLS is organized around the four structural categories she proposes (lexical, syntactic, pragmatic, and supralinguistic) and the processing categories of listening comprehension, oral expression, written expression, and reading. Her model resembles Lois Bloom's content/form/use model familiar to speech pathologists and developmental linguists.

Comment from the Buros reviewers: "The only apparent weakness of this test is its art work. Stimulus materials consist of black-and-white line drawings. Many of these drawings seem vague to the concepts to be tapped by the items. As the items increase in developmental difficulty, they become much more detailed and subtle in their differences. Students who do not attend well to visual detail, or who have short attention spans, may miss the cues and details presented in the materials that are used to prompt them for the appropriate language responses" (Carpenter & Malcolm, 2001, pp. 863-864).

Purpose of Test: The purpose of this test is to assess language knowledge and processing skills. The author identifies three purposes: to identify language problems, to plan intervention and monitor progress, and to use in research. The author states that identifying language problems will assist in "addressing potential academic difficulties" (Carrow-Woolfolk, 1995, p. 3). According to the author, growth across time from preschool through high school and into post-secondary education can be tracked. Also, because of the age range covered, the author claims the test is useful in research studies.
Comment: The Buros reviewer states that "support that the author provided for each of these claims was uneven" (Carpenter & Malcolm, 2001, p. 861).

Areas Tested:
1. Listening Comprehension: 111 items (four black-and-white line drawings per test plate) in a progression of increasing difficulty are used for this subtest. The student must select (by pointing or verbal response) the picture that best matches the examiner's statement. The test items reflect increasing difficulty in terms of length and complexity, syntactic structures, semantic factors, and "amount and type of background information needed to comprehend, and degree to which the cognitive system is involved…" (p. 20). For example, from the manual (p. 21), the items begin simply with "Show me the car" (noun) and progress to "Dad said, 'I think I'll hit the sack.' What did he do?" (idiom).
2. Oral Expression: 96 items are presented in the same format (i.e., black-and-white line drawings, etc.). Verbal responses are required to questions about the pictures or to requests to complete descriptions. The author defends her choice to use pictures throughout the test by stating that the decision "is based on the desire to provide consistency in testing, elicit responses more readily, and hold the examinee's attention. Higher level items are far less dependent on picture cues than lower level items, where the need for modeling is greater" (Carrow-Woolfolk, 1995, p. 23).

Comment: I think the pictures are boring.

Areas Tested:
Oral Language: Vocabulary, Grammar, Narratives, Other
Print Knowledge: Environmental Print, Alphabet, Other
Phonological Awareness: Segmenting, Blending, Elision, Rhyming, Other
Reading: Single Word Reading/Decoding, Comprehension, Spelling, Other
Writing: Letter Formation, Capitalization, Punctuation, Conventional Structures, Other
Listening: Lexical, Syntactic, Supralinguistic, Word Choice, Details

Who can Administer: School psychologists, speech pathologists, educational diagnosticians, early childhood specialists, and other professionals with graduate-level training in testing and interpretation may administer this test.

Administration Time: Table 1.1 provides average administration times in minutes for the normative sample. The Buros reviewers estimated 15 to 40 minutes overall, with 5 to 15 minutes for Listening Comprehension and 10 to 25 minutes for Oral Expression.

Test Administration (General and Subtests): Chapter 4 provides a general overview of testing that contains familiar information about the setting, arrangement of materials, positioning of the examiner and student, establishing rapport, etc. Chapters 5 and 6 detail the specific subtests' administration and scoring. Start points by age are given for the Oral Expression Scale. The examiner begins each subtest with an example for the student that, if answered correctly, triggers administration of the rest of the subtest. If the student responds incorrectly to the example, the examiner is instructed to teach the response. Three examples are provided; if all three attempts fail despite instruction, the examiner is instructed to note this in the record booklet. Repetitions of the examples are allowed to assist low-functioning students. The examiner can start with a test item lower than the suggested age start point. Target responses are marked on the examiner's side of the test booklet. No repetitions are allowed on the Listening Comprehension Scale, whereas one repetition is allowed on the Oral Expression Scale.
Prompting is allowed on the Oral Expression Scale and is outlined in the manual in the detailed section on item-by-item scoring rules. This section addresses the specifics of each test item in terms of scoring rule, preferred and acceptable responses, and errors (grammatical, semantic, pragmatic). Basal and ceiling rules apply and differ between the two subtests. Overall, administration is straightforward and easy to follow.

Comment from the Buros reviewer: "The establishment of the basal and ceiling for the Oral Expression subtest involves some uncertainty, as the scoring information provided on the record form may not be adequate for correctly scoring all items. More complete scoring information is provided in the test manual. A solution to this problem is to provide the needed information in the same place that instructions for administering each item are provided" (Carpenter & Malcolm, 2001, p. 861).

Listening Comprehension is measured by asking the examinee to select one of four pictures that best depicts a statement made by the examiner (e.g., "In which picture is she not walking to school"). Oral Expression is assessed by asking the examinee to look at one or more line drawings and respond verbally to a statement made by the examiner (e.g., "Tell me what is happening here and how the mother feels"). Contrary to the author's claim, these tasks are not typical of those found in the classroom, and as with other language tests of this nature, concerns about the ecological validity of the instrument need to be addressed in the test manual.

Test Interpretation: Chapter 7, "Determination and Interpretation of Normative Scores," provides instruction for converting raw scores to standard scores, calculating confidence intervals and other standardized scores, dealing with scores of 0, and interpreting each type of standardized score. Interpretation of the OWLS is limited to the use of standardized scores. Appendix C, "Grammar and Usage Guidelines," provides a useful glossary and introduction to common grammatical mistakes that the examiner may encounter (e.g., faulty agreement between subject and verb).

Comment: No further information, particularly in relation to Carrow-Woolfolk's theoretical model, is given. No curriculum links are discussed or identified, which is surprising given that the author makes a claim about classroom language in the introduction.

Comments from Buros reviewers: "Listening Comprehension, Oral Expression, Oral Composite. The authors are to be commended for providing clear and easy-to-follow directions for scoring as well as determining normative scores. More attention, however, should have been directed to establishing the cautions that examiners need to exercise in interpreting these scores. This is especially the case for test-age equivalents that can be derived for each subtest and a composite score for the whole test" (Carpenter & Malcolm, 2001, p. 861).

Comment: "An interesting feature of the Oral Expression subtest is that the examiner can conduct a descriptive analysis of correct and incorrect responses. For all but 30 of the 96 items on this subtest, correct responses can be categorized as preferred or acceptable responses, providing additional information on how well the examinee understood the oral expression task. In contrast, incorrect responses can further be classified as a miscue involving grammar or a miscue involving semantic and/or pragmatic aspects of language.
Although the manual provides item-by-item scoring rules for making these decisions, no data are provided on the reliability of these scores" (Carpenter & Malcolm, 2001, p. 861).

Comment: "Both subtests are easy to administer, requiring only about 15 to 40 minutes depending upon the age of the child …" (Carpenter & Malcolm, 2001, p. 861).

Standardization: The following types of scores are provided for the Listening Comprehension, Oral Expression, and Oral Composite scores:
Age equivalent scores (called test-age equivalents)
Grade equivalent scores
Percentiles
Standard scores
Stanines
Other: Normal Curve Equivalents (NCEs) are provided because some agencies and legislative requirements mandate their use.

Mean standard scores for both the Listening Comprehension and Oral Expression Scales were 100 with a standard deviation of 15. SEMs (at the 68%, 90%, and 95% levels) and the available confidence intervals are presented by age on page 123 (Carrow-Woolfolk, 1995). Across age ranges, the SEM was 4 standard score points for the Oral Composite, 6.1 for Listening Comprehension, and 5.4 for Oral Expression. No mention of, or caution regarding, the use of age equivalent scores was found in the manual.

Reliability:
Internal consistency of items: Using Guilford's formula, internal reliabilities were calculated for the three standard scores available. These are reported in Table 9.1 by age, subtest, and composite. Mean reliability coefficients across ages (computed using Fisher's z transformation) were high: .84, .87, and .91 for the two subtests and the composite, respectively.

Test-retest: Three age ranges (4 years, 0 months through 5 years, 11 months, n = 50; 8 years, 0 months through 10 years, 11 months, n = 54; and 16 years, 0 months through 18 years, 11 months, n = 33) were randomly selected, and sample characteristics were provided. The median retest interval was 8 weeks. Table 9.4 presents reliability coefficients for the subtests and composite, along with mean standard scores and standard deviations, for the three age ranges. Corrected coefficients range from .73 to .89. (Gain is calculated as the second testing minus the first.)

Comment from the Buros reviewer: "Correlations for test-retest reliability and internal consistency for the Oral Expression scale and the composite score for both scales were almost always above .80. For the Listening Comprehension subtest, however, measures of reliability were below .80 for children aged 6 to 9, suggesting that this particular scale appears to be best suited as a screening device at these ages" (Carpenter & Malcolm, 2001, p. 862).

Inter-rater: Test booklets from 96 students in age groups 3 to 5, 6 to 8, 9 to 12, and 13 to 21 years were used, and characteristics are reported by gender and race/ethnicity. Completed test booklets were scored independently by four raters who did not have previous experience with the Oral Expression Scale but had attended a brief training session outlining the scoring procedures and rules. Coefficients ranged from .93 to .99 with a mean of .95. The author states, "agreement was greatest for the youngest ages taking the early, less complex items.
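To show how the precision figures above translate into score bands, the sketch below applies the conventional normal-theory confidence interval formula to the average SEM values just quoted; this worked example is mine rather than the manual's, and it assumes the usual score ± z × SEM construction. The second line shows the standard Fisher z procedure by which reliability coefficients are averaged across age groups.

\[
\mathrm{CI}_{95\%} = \text{standard score} \pm 1.96 \times \mathrm{SEM},
\qquad \text{e.g., } 100 \pm 1.96 \times 4 \approx 92 \text{ to } 108 \text{ for the Oral Composite.}
\]
\[
z_i = \tfrac{1}{2}\ln\!\frac{1 + r_i}{1 - r_i}, \qquad
\bar{z} = \frac{1}{k}\sum_{i=1}^{k} z_i, \qquad
\bar{r} = \frac{e^{2\bar{z}} - 1}{e^{2\bar{z}} + 1},
\]

where the r_i are the age-group reliability coefficients and r-bar is the mean coefficient of the kind reported in the manual.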
However, agreement at the upper ages is also quite acceptable in light of the complexity of the scoring of the later items" (Carrow-Woolfolk, 1995, p. 126). A second analysis was also conducted with a Multi-Faceted Rasch Model (FACETS) to examine "significant rater-by-examinee interactions" (p. 126). Results showed that five items were problematic, three because of rater errors (recording mistakes). The remaining two items received lower scores, and on this basis the manual was clarified and examples were added.

Other: None

Validity:
Comment: "Several studies were highlighted that compare results of the Listening Comprehension and Oral Expression Scales, by age levels, with commonly used measures of ability (Kaufman Assessment Battery for Children, Wechsler Intelligence Scales for Children--Third Edition, and the Kaufman Brief Intelligence Scale), other measures of language (Test for Auditory Comprehension of Language--Revised, Peabody Picture Vocabulary Test--Revised, and Clinical Evaluation of Language Fundamentals--Revised), and tests of academic achievement (Kaufman Test of Educational Achievement, Comprehensive Form, Peabody Individual Achievement Test--Revised, and Woodcock Reading Mastery Test--Revised). Correlations with other language tests were moderate to high, reflecting similarities between the tests. The use of the OWLS can be justified by the more extensive nature of the test and its age ranges. Correlations with IQ measures indicate strong positive relationships between the Listening Comprehension and Oral Expression Scales and various tests of verbal ability. Moderate correlations were obtained between these scales and the nonverbal sections of the IQ batteries. Moderate correlations were also obtained between the OWLS Scales and various measures of verbal achievement. Low correlations were obtained between these scales and measures of math achievement. The author offered these correlations as evidence of divergent validity, especially in areas where low correlations were obtained between the OWLS and various subtests of math achievement. Validity studies conducted with clinical populations (students with speech impairments, language delays, and language impairments) indicated the more involved the speech or language difficulty, the lower the scores on the OWLS. This would indicate that the test is able to identify students with difficulties in the language functions" (Carpenter & Malcolm, 2001, p. 864).

Content: The author refers readers to the material presented in the introduction regarding the model and descriptions of constructs (Chapters 2 and 3).

Comment: No other information about content is provided, whereas other, newer tests include research to support their claims.

Criterion Prediction Validity: In the introduction to this section, the author provides this overview: "Convergent validity is shown by relatively high correlations with other measures of the same ability, and discriminant validity is demonstrated by lower correlations with measures of different constructs" (Carrow-Woolfolk, 1995, p. 127). The section then proceeds with a summary of evidence for each of the following sets of correlations, which were analyzed using Guilford's (1954) formula to correct for restricted range of scores.

Language: The Test for Auditory Comprehension of Language-Revised (TACL-R), Peabody Picture Vocabulary Test-Revised (PPVT-R), and Clinical Evaluation of Language Fundamentals-Revised (CELF-R) were administered.
Sample sizes ranged from 31 children (TACL-R and CELF-R) to 98 children (PPVT-R). The age ranges were: 4 years, 1 month to 5 years, 11 months (TACL-R); 7 years, 0 months to 11 years, 0 months (PPVT-R); and 14 years, 1 month to 16 years, 7 months (CELF-R). A counterbalanced order of presentation was used for all studies. Average intervals between testing ranged from 27 days to 109 days, with some administrations occurring the same day and others up to 292 days apart. Results were: PPVT-R (.75), TACL-R Total Score (.78), and CELF-R Total Language (.91). The author discusses the results in terms of the targeted skills for each test with a summary statement: "While there is an apparent relationship between performance on all four instruments, OWLS addresses language at a level that is conceptually different from just knowledge of vocabulary or syntactic structures. As discussed in Chapter 3, progressive difficulty of items on the Listening Comprehension and Oral Expression scales is achieved not only by increasing the difficulty of the vocabulary but also by including progressively more complex language structures and contexts" (Carrow-Woolfolk, 1995, p. 130).

Cognitive ability: The author notes that although measures of language would be expected to produce higher correlations, cognitive measures should show some degree of correlation given the complex relationship between the domains of language and cognition. The Kaufman Assessment Battery for Children (K-ABC), Wechsler Intelligence Scale for Children-III (WISC-III), and Kaufman Brief Intelligence Test (K-BIT) were administered. The procedures followed the same outline as discussed above for the language measures. Sample sizes ranged from 31 children (K-ABC) to 66 children (K-BIT). Age ranges were: 4 years, 7 months to 6 years, 11 months (K-ABC); 8 years, 0 months to 11 years, 11 months (WISC-III); and 14 years, 7 months to 21 years, 11 months (K-BIT). A counterbalanced order of presentation was used for all studies. Average intervals between testing ranged from the same day to 37 days. Results were: K-ABC Achievement Score (.82), WISC-III Verbal IQ (.74), and K-BIT Vocabulary subtest (.76). Nonverbal ability correlations were .70, .69, and .65, respectively, for the appropriate subtests on each test. Global score correlations were .76, .73, and .75, respectively.

Academic achievement: Following the same research frame as first described above for the language measures, the author also reports correlations with academic achievement. The Kaufman Test of Educational Achievement (K-TEA), Peabody Individual Achievement Test-Revised (PIAT-R), and Woodcock Reading Mastery Test-Revised (WRMT-R) were studied. Sample sizes ranged from 30 children (K-TEA and WRMT-R) to 31 children (PIAT-R). Age ranges were: 8 years, 2 months to 9 years, 0 months (K-TEA); 9 years, 3 months to 11 years, 1 month (PIAT-R); and 10 years, 2 months to 12 years, 10 months (WRMT-R). A counterbalanced order of presentation was used for all studies. Average intervals between testing ranged from 24 to 65 days. Results indicated "positive correlations between the Oral Composite and the K-TEA, PIAT-R and WRMT-R, suggesting dependence on language in academic tasks" (Carrow-Woolfolk, 1995, p. 134). This statement is confirmed by the data, which show the highest correlation to be with WRMT-R Word Comprehension (.88) and the lowest with the K-TEA Mathematics Composite (.43).
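Neither the review nor the manual excerpt quoted above reproduces the range-restriction correction applied to these correlations. For readers who want the algebra, the correction most often cited from Guilford (1954) is the standard Pearson formula for direct range restriction, sketched below; whether this is exactly the variant used in the OWLS validity studies is an assumption on my part.

\[
r_c = \frac{r\,(S/s)}{\sqrt{1 - r^2 + r^2\,(S/s)^2}},
\]

where r is the correlation observed in the restricted validity sample, s and S are the standard deviations of the restricted sample and the unrestricted (norming) population respectively, and r_c is the corrected correlation.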
Clinical validity is evidenced by comparisons with clinical populations in which the samples were matched for age, gender, race/ethnicity, and SES (using the standardization sample), and t-tests for paired samples were conducted. Classification criteria were specified for each diagnostic category, and all sample characteristics collected were described. The following diagnostic categories were studied:

Speech impaired: 33 students were tested, and the criteria were specified in the manual. The students received no language services and ranged in age from 3 years, 1 month to 13 years, 2 months. Gender, race/ethnicity, and SES were provided. No significant difference was found for Listening Comprehension, though the clinical group performed slightly lower on the Oral Expression Scale (p < .05). However, the author states, the "4 point difference between the mean scores for the Oral Composite is not significant" (Carrow-Woolfolk, 1995, p. 136).

Language delayed: 63 students, ages 3 years, 4 months to 7 years, 11 months, were included. The classification was specified in the manual, and gender, race/ethnicity, etc. were provided. Significant differences were found at p < .001, as expected.

Language impaired: 37 students, ages 8 years, 0 months to 12 years, 10 months, were included. The classification is specified in the manual, and gender, etc., were provided. The children in the clinical group scored significantly lower (p < .001) on both subtests and the composite. The author notes, "The 19- to 23-point differences are even larger than those for the younger language delayed group" (Carrow-Woolfolk, 1995, p. 138).

Mentally handicapped: 32 students, ages 5 years, 7 months to 17 years, 11 months, were included. As expected for this clinical group, all scored significantly lower, with a 28- to 30-point difference, at p < .001.

Learning disabled-reading: 40 students, ages 6 years, 6 months to 14 years, 5 months, were tested. "Previous research…found language deficits in 90.5% of a population of learning disabled children, this clinical group was expected to score lower than its control group on the OWLS" (p. 139). Indeed, they did, with point differences of 6 to 10 at the p < .05 and p < .001 levels of significance, respectively. They performed better on Listening Comprehension (the difference was not significant).

Learning disabled-undifferentiated: 38 students, ages 7 years, 10 months to 18 years, 10 months, were tested. Characteristics were provided, and results were similar to those reported above for the learning disabled-reading clinical group in that the Listening Comprehension differences were not significant. However, this clinical group scored significantly lower on both Oral Expression and the Oral Composite (differences of 16 and 13 points, respectively, at p < .001).

Hearing impaired: 27 students with mild to moderate loss (40-55 dB), who were mainstreamed for most or all of their school day, were tested using the same amplification used in their classrooms. Ages ranged from 3 years, 3 months to 20 years, 0 months, and characteristics were provided. As expected, all scores fell significantly lower at p < .01.

Chapter One: 46 students receiving special services for reading difficulties, ages 7 years, 1 month to 18 years, 10 months, were tested. Sample characteristics were provided, and mean scores and standard deviations were reported. All show lower scores than the control group from the standardization sample. For example, the Oral Composite score mean was 91.9 ± 11.1.
Comments from the Buros reviewer: "...seven studies demonstrated that students with special needs scored lower on the OWLS than children included in the standardization sample that were matched on age, gender, race/ethnicity, and SES. This included children with learning disabilities in reading, learning disabilities in general academic skills, speech impairments, language delays, language impairments, hearing difficulties, and mental handicaps. The findings from these studies provide strong support for using the OWLS to help in the identification of students with learning disabilities as well as other disabilities involving language and cognitive difficulties" (Carpenter & Malcolm, 2001, p. 861).

"Finally, there was relatively strong support for the author's primary claim that the instrument provides a valid measure of general listening and speaking skills. First, scores on the OWLS were moderately to highly correlated with scores on other measures of language development. Second, measures of achievement and cognitive development were also moderately to highly correlated with OWLS scores. Previous research has established that school learning is dependent on language ability and that there is a substantial relationship between the development of cognitive and language skills. Third, mean scores of students in the standardization sample increased from one age to the next. As expected, differentiation was greatest for young children and least for older students and young adults. Fourth, as noted earlier, students with language difficulties, such as a hearing or language impairment, obtained lower scores on the test than matched students in the standardization sample" (Carpenter & Malcolm, 2001, p. 861).

Construct Identification Validity: Evidence is provided for two types of construct validity: developmental progression of scores and intercorrelations of the scales. In terms of developmental progression, age differentiation is evidenced by increases in the raw scores, with steeper increases in the earlier years. Table 9.6 presents the means and standard deviations for each age interval from the standardization sample.

Comment: I am not sure why the author chose raw scores rather than standard scores for this purpose.

Intercorrelations between the two scales, based on standard scores, are presented in Table 9.7. Correlations between the Listening Comprehension Scale and the Oral Expression Scale are moderate, ranging from .54 to .77 with a mean of .70. This degree of correlation indicates that each scale taps skills that are unique but nonetheless related, which supports the use of the overall Oral Composite score.

Differential Item Functioning: Not reported

Other: None

Summary/Conclusions/Observations: The two Buros reviewers differ on many aspects of their reviews; one is a professor and the other a school psychologist. The special education professor was more critical, and it was primarily his comments that I included in this review. He stated the following in summary: "In summary, the OWLS provides reliable and valid scores for determining the language competence of individual children. The only exception involves the Listening Comprehension measure, which appears to be best suited as a screening device for children 6 to 9 years of age" (Carpenter & Malcolm, 2001, p. 862).
On the other hand, the school psychologist was generally more positive and stated, "The OWLS addresses these areas in a fashion that taps into everyday language functioning more so than do other language tests" (Carpenter & Malcolm, 2001, p. 862).

Clinical/Diagnostic Usefulness: The Buros reviewer states: "The OWLS Listening Comprehension and Oral Expression Scales may prove to be one of the more popular and widely used language tests. Examiners may find that the OWLS provides information on language functions that are not tapped by other language tests they are currently using" (Carpenter & Malcolm, 2001, p. 864).

My response: As a speech-language pathologist, I disagree with the reviewers on this point. The OWLS is an older test now, superseded by measures such as the CELF-4, which is more comprehensive, more current, and more closely linked to current U.S. education requirements and curriculum. I think that few speech-language pathologists will use the OWLS, but other professionals, such as special educators and reading specialists, who must make decisions regarding reading abilities may still find this test useful with the caveats described in this review and that of the Buros reviewers.

References

Carpenter, C., & Malcolm, K. (2001). Test review of the Oral and Written Language Scales: Listening Comprehension and Oral Expression. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 860-864). Lincoln, NE: Buros Institute of Mental Measurements.

Carrow-Woolfolk, E. (1995). Manual: Listening Comprehension and Oral Expression. Circle Pines, MN: American Guidance Service, Inc.

Current Population Survey, March 1991 [machine-readable data file]. (1991). Washington, DC: Bureau of the Census (Producer and Distributor).

To cite this document: Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: Oral and written language scales: Listening comprehension and oral expression (OWLS). Language, Phonological Awareness, and Reading Test Directory (pp. 1-13). Edmonton, AB: Canadian Centre for Research on Literacy. Retrieved [insert date] from http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm