Hayward, Stewart, Phillips, Norris, & Lovell
Test Review: Oral and Written Language Scales: Listening Comprehension and Oral Expression (OWLS)
N.B. I have chosen to review only the Listening Comprehension and Oral Expression subtests. In considering the Written Expression
subtest, I found that the items were specific to the student’s ability to engage in writing activities such as printing his or her name,
copying a word, and writing letters, and later, composing sentences from the examiner’s spoken words (e.g., Item 25 “Write one
sentence using these four words”) or responding in writing to the examiner’s instruction (Item 19 “A girl is looking for her lost ring.
Write here what she asked her mother.”). However, at the start age of 5-7 years, the student begins by copying words and sentences,
writing a letter (e.g., Item 5 Write the letter “f” here) or writing spoken sentences (item 12 “Write the sentence I say here.”). For our
purpose, I focused on abilities that I thought were related to reading and were comparable with other tests reviewed so far.
Name of Test: Oral and Written Language Scales: Listening Comprehension and Oral Expression (OWLS).
Author(s): Elizabeth Carrow-Woolfolk
Publisher/Year: 1995 American Guidance Service, Inc.
Forms: only one
Age Range: 3 years through 21 years.
Norming Sample
Test construction began in 1991 with a tryout phase followed by a national standardization study. From the standardization study
(1992-1993), test administration was detailed and normative scores were developed. In developing the normative scores, the author
notes that smoothing procedures were used to deal with “irregularities caused by sampling fluctuations” (Carrow-Woolfolk, 1995, p.
119).
Item analyses were carried out for both scales. Both classical (item difficulty and item discrimination) and Rasch scaling methods
were used to determine the suitability, presentation order, and consistency of test items. The author provides a good explanation of
these procedures for the potential examiner (Carrow-Woolfolk, 1995, p.115).
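For readers unfamiliar with the classical item statistics mentioned above, the sketch below illustrates how item difficulty (the proportion of examinees passing an item) and item discrimination (a corrected item-total correlation) are typically computed. The data and function are invented for illustration; they are not OWLS items or the author's actual procedure.

import numpy as np

def classical_item_stats(responses):
    # responses: rows = examinees, columns = items, scored 0/1.
    # Returns item difficulty (proportion passing) and discrimination
    # (corrected item-total, i.e., the item correlated with the rest score).
    responses = np.asarray(responses, dtype=float)
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = []
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]          # total score without item j
        discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return difficulty, np.array(discrimination)

# Hypothetical responses: 6 examinees by 4 items
data = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1]]
p_values, discriminations = classical_item_stats(data)
print("difficulty:", p_values)
print("discrimination:", discriminations)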
Total Number: 1,985
Number and Age: The students ages ranged from 3 to 21 years. There were 13 age groups as follows: 6-month age intervals for ages
3 years, 0 months to 4 years, 11 months; 1-year intervals for ages 5 years, 0 months to 11 years, 0 months; and then by age groups
12-13, 14-15, 16-18, and 19-21. All ages have N=100 or more persons. Younger children, ages 3 and 4, were given only the
Listening Comprehension and Oral Expression Scales.
Location: 74 sites (NE, North Central, South, and West)
Demographics: Demographics are reported by gender, geographical region, race/ethnicity, and socioeconomic status (maternal education).
Rural/Urban: not specified.
SES: SES was indexed by maternal education, which seems unusual, but the author states that the mother's educational level has a "plausible link to the examinee's developed language abilities" (Carrow-Woolfolk, 1995, p. 110).
Other: none
Sample characteristics compared favourably to 1991 U.S. Census information (Bureau of the Census, 1991). All data are presented in
table form by various sample characteristics to illustrate representativeness.
Summary Prepared By: Eleanor Stewart November 2007
Test Description/Overview
The complete test kit consists of the examiner’s manual, Listening Comprehension Easel, Oral Expression Easel, and record forms.
(The Written Expression Scale is packaged separately).
The OWLS consists of three subtests: Listening Comprehension, Oral Expression and Written Expression. The first two are
published together in one test kit while the third is packaged and presented separately. As stated by the Buros reviewers, “The
decision to package the Written Expression subtest separately was unfortunate, as it complicates comparing students' performance in
these three areas" (Carpenter & Malcolm, 2001, p. 860).
Theory: The OWLS is based on Elizabeth Carrow's previous work in the area of language test development. As with her previous tests, she presents a model based on her theoretical work in language development (she cites Carrow-Woolfolk,
1989; Carrow-Woolfolk and Lynch, 1981). In this manual, the author presents a brief overview of key elements that include the basic
dimensions of language, language knowledge, and language performance. Language knowledge refers to the structure of language
that includes content and form while language performance refers to “internal systems the language user employs to process
language” (Carrow-Woolfolk, 1995, p.7). The author proposes that these two dimensions together account for verbal communication.
The author further elaborates on her theory and spends the rest of the section on theory describing the elements of content, form, and
use. Much of this information is familiar to me from either reviewing her other tests or using them in clinical practice. Hers is a servo
system view of language with interconnecting and overlapping circuits. Language processing theory, according to Carrow, “separates
the four major processes by the requirements of their perspective processing systems” (p. 12).
I think that the author is confident of her theoretical foundation in that she only references Chomsky and Foster as theoretical
sources. Otherwise, the text is sparse in terms of citations to the work of others in language theory. It is on the basis of her theory that
the OWLS is organized around the four structural categories she proposes (lexical, syntactic, pragmatic, and supralinguistic) and the
processing that includes listening comprehension, oral expression, written expression, and reading. Her model resembles Lois
Bloom’s content/form/use model familiar to speech pathologists and developmental linguists.
Comment from the Buros reviewers: "The only apparent weakness of this test is its art work. Stimulus materials consist of black-and-white line drawings. Many of these drawings seem vague to the concepts to be tapped by the items. As the items increase in
developmental difficulty, they become much more detailed and subtle in their differences. Students who do not attend well to visual
detail, or who have short attention spans, may miss the cues and details presented in the materials that are used to prompt them for
the appropriate language responses” (Carpenter & Malcolm, 2001, pp. 863-864).
Purpose of Test: The purpose of this test is to assess language knowledge and processing skills. The author identifies three purposes: to identify language problems, to plan intervention, and to monitor progress; the test can also be used in research. The author states that identifying language
problems will assist in “addressing potential academic difficulties” (Carrow-Woolfolk, 1995, p. 3). According to the author, growth
across time from preschool through high school and into post-secondary education can be tracked. Also, due to the age range
covered, the author claims that it is useful in research studies.
Comment: The Buros reviewer states that “support that the author provided for each of these claims was uneven” (Carpenter &
Malcolm, 2001, p. 861).
Areas Tested:
1. Listening Comprehension: 111 items (four black and white line drawings per test plate) in progression of increasing difficulty
are used for this subtest. The student must select (by pointing or verbal response) the picture that best matches the examiner’s
statement. The test items reflect increasing difficulty in terms of length and complexity, syntactic structures, semantic factors,
and “amount and type of background information needed to comprehend, and degree to which the cognitive system is
involved…” (p. 20). For example, from the manual (p. 21), the items begin simply with “Show me the car” (noun) and
progress to "Dad said, 'I think I'll hit the sack.' What did he do?" (idiom).
2. Oral Expression: 96 items are presented, as above (i.e., black-and-white line drawings, etc.). Verbal responses are required to
questions about the pictures or to requests to complete descriptions. The author defends her choice to use pictures throughout
the test by stating that the decision “is based on the desire to provide consistency in testing, elicit responses more readily, and
hold the examinee’s attention. Higher level items are far less dependent on picture cues than lower level items, where the
need for modeling is greater” (Carrow-Woolfolk, 1995, p. 23). Comment: I think the pictures are boring.
Areas Tested:
Oral Language: Vocabulary, Grammar, Narratives, Other
Print Knowledge: Environmental Print, Alphabet, Other
Phonological Awareness: Segmenting, Blending, Elision, Rhyming, Other
Reading: Single Word Reading/Decoding, Comprehension, Spelling, Other
Writing: Letter Formation, Capitalization, Punctuation, Conventional Structures, Other
Listening: Lexical, Syntactic, Supralinguistic, Word Choice, Details
Who can Administer: School psychologists, speech pathologists, educational diagnosticians, early childhood specialists and other
professionals with graduate level training in testing and interpretation may administer this test.
Administration Time: Table 1.1 provides average administration time in minutes for the normative sample. The Buros reviewers
estimated 15 to 40 minutes overall, with 5 to 15 minutes for Listening Comprehension and 10 to 25 minutes for Oral Expression.
Test Administration (General and Subtests):
Chapter 4 provides a general overview of testing that contains familiar information about the setting, arrangement of materials and
positioning of the examiner and student, establishing rapport, etc.
Chapters 5 and 6 detail the specific subtests' administration and scoring. Start points by age are given for the Oral Expression Scale. The examiner begins each subtest with an example item for the student; if it is answered correctly, the rest of the subtest is administered. If the student responds incorrectly to the example, the examiner is instructed to teach the response. Three examples are provided; if all three attempts fail even with instruction, the examiner is instructed to note this in the record booklet. Repetitions of the examples are allowed to assist low-functioning students. The examiner can start with a test item lower than the suggested age start
point. Target responses are marked on the examiner’s side of the test booklet. No repetitions are allowed on the Listening
Comprehension Scale whereas one repetition is allowed on the Oral Expression Scale. Prompting is allowed on the Oral Expression
scale and is outlined in the manual in the detailed section on Item-by-item Scoring Rules. This section addresses the specifics of each
test item in terms of scoring rule, preferred and acceptable responses, and errors (grammatical, semantic, pragmatic). Basal and ceiling
rules apply and differ between the two subtests. Overall, administration is straightforward and easy to follow.
Comment from the Buros reviewer: “The establishment of the basal and ceiling for the Oral Expression subtest involves some
uncertainty, as the scoring information provided on the record form may not be adequate for correctly scoring all items. More
complete scoring information is provided in the test manual. A solution to this problem is to provide the needed information in the
same place that instructions for administering each item are provided” (Carpenter & Malcolm, 2001, p. 861).
Listening Comprehension is measured by asking the examinee to select one of four pictures that best depicts a statement (e.g., 'In
which picture is she not walking to school') made by the examiner. Oral expression is assessed by asking the examinee to look at one
or more line drawings and respond verbally to a statement made by the examiner (e.g., 'Tell me what is happening here and how the
mother feels'). Contrary to the author's claim, these tasks are not typical of those found in the classroom, and like other language tests
of this nature, concerns about the ecological validity of the instrument need to be addressed in the test manual.
Test Interpretation:
Chapter 7, “Determination and Interpretation of Normative Scores”, provides instruction for converting raw scores to standard
scores, calculating confidence intervals and other standardized scores, dealing with 0 scores, and the interpretation of each type of
standardized score. Interpretation of the OWLS is limited to the use of standardized scores. Appendix C, “Grammar and Usage
Guidelines”, provides a useful glossary and introduction to common grammatical mistakes that the examiner may encounter (e.g.,
faulty agreement between subject and verb).
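As an illustration of what converting among standardized scores involves, the sketch below transforms a standard score (M = 100, SD = 15) into a percentile, Normal Curve Equivalent, and approximate stanine under a normality assumption. The example score is hypothetical and this is not code from the manual; in practice the manual's own conversion tables are used.

from scipy import stats

def normative_scores(standard_score, mean=100.0, sd=15.0):
    # Convert a standard score (M = 100, SD = 15 by default) into other
    # normative scores, assuming a normal distribution of scores.
    z = (standard_score - mean) / sd
    percentile = stats.norm.cdf(z) * 100
    nce = 50 + 21.06 * z                             # Normal Curve Equivalent
    stanine = int(min(9, max(1, round(z * 2 + 5))))  # approximate stanine, 1-9
    return {"z": z, "percentile": percentile, "NCE": nce, "stanine": stanine}

# Hypothetical example: a standard score of 85 (one SD below the mean)
print(normative_scores(85))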
Comment: No further information, particularly in relation to Carrow-Woolfolk’s theoretical model, is given. No curriculum links are
discussed or identified, which is surprising given the author's claim about classroom language in the introduction.
Comments from Buros reviewers: “Listening Comprehension, Oral Expression, Oral Composite. The authors are to be commended
for providing clear and easy-to-follow directions for scoring as well as determining normative scores. More attention, however,
should have been directed to establishing the cautions that examiners need to exercise in interpreting these scores. This is especially
the case for test-age equivalents that can be derived for each subtest and a composite score for the whole test” (Carpenter &
Malcolm, 2001, p. 861).
Comment: “An interesting feature of the Oral Expression subtest is that the examiner can conduct a descriptive analysis of correct
and incorrect responses. For all but 30 of the 96 items on this subtest, correct responses can be categorized as preferred or
acceptable responses, providing additional information on how well the examinee understood the oral expression task. In contrast,
incorrect responses can further be classified as a miscue involving grammar or a miscue involving semantic and/or pragmatic
aspects of language. Although the manual provides item-by-item scoring rules for making these decisions, no data are provided on
the reliability of these scores” (Carpenter & Malcolm, 2001, p. 861).
Comment: “Both subtests are easy to administer, requiring only about 15 to 40 minutes depending upon the age of the child. The
establishment of the basal and ceiling for the Oral Expression subtest involves some uncertainty, as the scoring information provided
on the record form may not be adequate for correctly scoring all items. More complete scoring information is provided in the test
manual. A solution to this problem is to provide the needed information in the same place that instructions for administering each
item are provided” (Carpenter & Malcolm, 2001, p. 861).
Standardization:
Age equivalent scores (called test-age equivalents)
Grade equivalent scores
Percentiles
Standard scores
Stanines
Other: Listening Comprehension, Oral Expression, and Oral Composite scores; Normal Curve Equivalents (NCEs) are provided because some agencies and legislative requirements mandate their use.
Mean scaled scores for both the Listening Comprehension and Oral Expression Scales were 100 with a standard deviation of 15.
SEMs and confidence intervals (at the 68%, 90%, and 95% levels) are presented by age on page 123 (Carrow-Woolfolk, 1995). Across age ranges, the SEM was 4 standard score points for the Oral Composite, 6.1 for Listening Comprehension, and 5.4 for Oral Expression.
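To illustrate how such confidence intervals are formed, the sketch below builds a band around an obtained standard score from the SEM. The obtained score is hypothetical; the 5.4-point SEM mirrors the Oral Expression value reported above.

from scipy import stats

def confidence_interval(standard_score, sem, level=0.90):
    # Band around an obtained standard score: score +/- z * SEM,
    # where z is the two-tailed critical value for the chosen level.
    z = stats.norm.ppf(1 - (1 - level) / 2)
    half_width = z * sem
    return standard_score - half_width, standard_score + half_width

# Hypothetical obtained Oral Expression score of 92 with the 5.4-point SEM
print(confidence_interval(92, 5.4, level=0.90))   # roughly (83.1, 100.9)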
No mention or caution regarding the use of age equivalent scores was found in the manual.
Reliability:
Internal consistency of items: Using Guilford’s formula, internal reliabilities were calculated for the three standard scores available.
These are reported in Table 9.1 by age for each subtest and the composite. Mean reliability coefficients (using Fisher's z transformation)
across subtests and composites were high: .84, .87, and .91 respectively.
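For readers unfamiliar with Fisher's z transformation, this minimal sketch shows the usual way a set of correlation coefficients is averaged; the coefficients are invented, not the OWLS values.

import numpy as np

def mean_correlation(coefficients):
    # Average correlations via Fisher's z: r -> arctanh(r), take the mean,
    # then back-transform with tanh.
    z = np.arctanh(np.asarray(coefficients, dtype=float))
    return float(np.tanh(z.mean()))

# Invented age-level reliability coefficients for one scale
print(mean_correlation([0.81, 0.86, 0.88, 0.90]))   # about .87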
Test-retest:
Three age ranges (4 years, 0 months through 5 years, 11 months, n = 50; 8 years, 0 months through 10 years, 11 months, n = 54; and 16 years, 0 months through 18 years, 11 months, n = 33) were randomly selected, and sample characteristics were provided. The median retest interval was 8 weeks. Table 9.4 presents reliability coefficients for subtest and composite mean standard scores and standard deviations for the three age ranges. Corrected coefficients range from .73 to .89. (Gain is calculated as the second testing score minus the first.)
Comment from the Buros reviewer: “Correlations for test-retest reliability and internal consistency for the Oral Expression scale
and the composite score for both scales were almost always above .80. For the Listening Comprehension subtest, however, measures
of reliability were below .80 for children aged 6 to 9, suggesting that this particular scales appears to be best suited as a screening
device at these ages” (Carpenter & Malcolm, 2001, p. 862).
Inter-rater: 96 students in age groups 3 to 5, 6 to 8, 9 to 12, and 13 to 21 years were included, and characteristics are reported by gender and race/ethnicity. Completed test booklets were scored independently by four raters who did not have previous experience with the Oral Expression Scale but had attended a brief training session outlining the scoring procedures and rules. Coefficients ranged from .93 to .99 with a mean of .95. The author states, "agreement was greatest for the youngest ages taking the early, less complex items. However, agreement at the upper ages is also quite acceptable in light of the complexity of the scoring of the later items" (Carrow-Woolfolk, 1995, p. 126). A second analysis was conducted with a Multi-Faceted Rasch Model (FACETS) to examine "significant rater-by-examinee interactions" (p. 126). Five items were found to be problematic: three involved rater errors (recording mistakes), and the remaining two received lower scores; on this basis the manual was clarified and examples were added.
Other:
Validity:
Comment: “Several studies were highlighted that compare results of the Listening Comprehension and Oral Expression Scales, by
age levels, with commonly used measures of ability (Kaufman Assessment Battery for Children, Wechsler Intelligence Scales for
Children--Third Edition, and the Kaufman Brief Intelligence Scale,) other measures of language (Test for Auditory Comprehension
of Language--Revised, Peabody Picture Vocabulary Test--Revised, and Clinical Evaluation of Language Fundamentals--Revised),
and tests of academic achievement (Kaufman Test of Educational Achievement, Comprehensive Form, Peabody Individual
Achievement Test--Revised, and Woodcock Reading Mastery Test--Revised). Correlations with other language tests were moderate to
high, reflecting similarities between the tests. The use of the OWLS can be justified by the more extensive nature of the test and its
age ranges. Correlations with IQ measures indicate strong positive relationships between the Listening Comprehension and Oral
Expression Scales and various tests of verbal ability. Moderate correlations were obtained between these scales and the nonverbal
sections of the IQ batteries. Moderate correlations were also obtained between the OWLS Scales and various measures of verbal
achievement. Low correlations were obtained between these scales and measures of math achievement. The author offered these
correlations as evidence of divergent validity, especially in areas where low correlations were obtained between the OWLS and
various subtests of math achievement. Validity studies conducted with clinical populations (students with speech impairments,
language delays, and language impairments) indicated the more involved the speech or language difficulty, the lower the scores on
the OWLS. This would indicate that the test is able to identify students with difficulties in the language functions” (Carpenter &
Malcolm, 2001, p. 864).
Content: The author refers readers to the material presented in the introduction regarding the model and descriptions of constructs
(Chapters 2 and 3).
Comment: No other information about content validity is provided, whereas other, newer tests include research to support their claims.
Criterion Prediction Validity: In the introduction to this section, the author provides this overview: “Convergent validity is shown
by relatively high correlations with other measures of the same ability, and discriminant validity is demonstrated by lower
correlations with measures of different constructs" (Carrow-Woolfolk, 1995, p. 127). The section then summarizes the evidence from each of the following sets of correlations, all of which were analyzed using Guilford's (1954) formula correcting for a restricted range of scores (a minimal sketch of this correction follows the academic achievement results below):
Language: The Test for Auditory Comprehension of Language-revised, Peabody Picture Vocabulary Test-revised, and Clinical
Evaluation of Language Fundamentals-revised were administered. Sample sizes ranged from 31 children (TACL-R and CELF-R) to
98 children (PPVT-R). The age ranges were: 4 years, 1 month to 5 years, 11 months (TACL-R), 7 years, 0 months to 11 years, 0
months (PPVT-R), and 14 years, 1 month to 16 years, 7 months (CELF-R). Counterbalanced order of presentation was used for all
studies. Average intervals between testing ranged from 27 days to 109 days, with some administrations occurring the same day and
others up to 292 days between testing administration. Results were: PPVT-R (.75), TACL-R Total Score (.78), and CELF-R Total
Language (.91). The author discusses the results in terms of the targeted skills for each test with a summary statement: “While there
is an apparent relationship between performance on all four instruments, OWLS addresses language at a level that is conceptually
different from just knowledge of vocabulary or syntactic structures. As discussed in Chapter 3, progressive difficulty of items on the
Listening Comprehension and Oral Expression scales is achieved not only by increasing the difficulty of the vocabulary but also by
including progressively more complex language structures and contexts” (Carrow-Woolfolk, 1995, p. 130).
Cognitive ability: The author notes that though measures of language would be expected to produce higher correlations, cognitive
measures should show some degree of correlation given the complex relationship between the domains of language and cognition.
The Kaufman Assessment Battery for Children (K-ABC), Wechsler Intelligence Scale for Children-III (WISC-III), and Kaufman Brief Intelligence Test (K-BIT) were administered. The procedures followed the same outline as discussed above in the section on language measures. Sample sizes ranged from 31 children (K-ABC) to 66 children (K-BIT). Age ranges were: 4 years, 7 months to 6 years, 11 months (K-ABC), 8 years, 0 months to 11 years, 11 months (WISC-III), and 14 years, 7 months to 21 years, 11 months (K-BIT). Counterbalanced order of presentation was used for all studies. Average intervals between testing ranged from the same day to 37 days. Results were: K-ABC Achievement Score (.82), WISC-III Verbal IQ (.74), and K-BIT Vocabulary subtest (.76). Nonverbal ability correlations were .70, .69, and .65, respectively, for the appropriate subtests on each test. Global score correlations were as follows: .76, .73, and .75.
Academic achievement: Following the same research frame as first described above relative to language measures, the author also
reports on correlations with academic achievement. The Kaufman Test of Educational Achievement (K-TEA), Peabody Individual
Achievement Test-revised (PIAT-R), and Woodcock Reading Mastery Test-revised (WRMT-R) were studied. Sample sizes ranged
from 30 children (K-TEA and WRMT-R) to 31 children (PIAT-R). Age ranges were: 8 years, 2 months to 9 years, 0 months (K-TEA), 9 years, 3 months to 11 years, 1 month (PIAT-R), and 10 years, 2 months to 12 years, 10 months (WRMT-R).
Counterbalanced order of presentation was used for all studies. Average intervals between testing ranged from 24 to 65 days. Results
indicated “positive correlations between the Oral Composite and the K-TEA, PIAT-R and WRMT-R, suggesting dependence on
language in academic tasks” (Carrow-Woolfolk, 1995, p. 134). This statement is confirmed by the data which show the highest
correlation to be with WRMT-R Word Comprehension (.88) and lowest correlation with K-TEA Mathematics Composite (.43).
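The restriction-of-range correction cited in the introduction to this section (Guilford, 1954) is commonly written in the form sketched below. The values are invented, and this is offered only as an illustration of the general correction, not as a reproduction of the author's analysis.

import math

def correct_for_range_restriction(r, sd_sample, sd_population):
    # Estimate the correlation in the unrestricted population from the
    # correlation observed in a sample with a narrower spread of scores.
    k = sd_population / sd_sample
    return (r * k) / math.sqrt(1 - r * r + (r * k) ** 2)

# Invented values: r = .60 in a sample whose SD (10) is narrower than
# the reference population SD (15)
print(round(correct_for_range_restriction(0.60, 10, 15), 2))   # about .75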
Clinical validity is evidenced by comparisons with clinical populations in which the samples were matched for age, gender, race/ethnicity, and SES (using the standardization sample), and t-tests for paired samples were conducted (a minimal sketch of this matched-pairs comparison follows the clinical groups below). Classification criteria were specified for each diagnostic category, and all sample characteristics collected were described. The following diagnostic categories were studied:
Speech impaired: 33 students were tested, and the criteria were specified in the manual. Students received no language services and ranged in age from 3 years, 1 month to 13 years, 2 months. Gender, race/ethnicity, and SES were provided. No significant difference was found for Listening Comprehension, though the clinical group performed slightly lower (p < .05) on the Oral Expression Scale. However, the author states, the "4 point difference between the mean scores for the Oral Composite is not significant" (Carrow-Woolfolk, 1995, p. 136).
Language delayed: 63 students, ages 3 years, 4 months to 7 years, 11 months, were included. The classification was specified in the manual, and gender, race/ethnicity, etc. were provided. Significant differences were found at p < .001, as expected.
Language impaired: 37 students, ages 8 years, 0 months to 12 years, 10 months, were included. The classification is specified in the manual, and gender, etc. were provided. The children in the clinical group scored significantly lower at p < .001 on both subtests and the composite. The author notes, "The 19- to 23-point differences are even larger than those for the younger language delayed group" (Carrow-Woolfolk, 1995, p. 138).
Mentally handicapped: 32 students, ages 5 years, 7 months to 17 years, 11 months were included. As expected in this clinical group,
all scored significantly lower with a 28- to 30-point difference at p<.001.
Learning disabled-reading: 40 students, ages 6 years, 6 months to 14 years, 5 months, were tested. "Previous research…found language deficits in 90.5% of a population of learning disabled children, this clinical group was expected to score lower than its control group on the OWLS" (p. 139). Indeed, they did, with point differences of 6 to 10 at the p < .05 and p < .001 levels of significance, respectively. They performed better on Listening Comprehension (not a significant difference).
Learning disabled-undifferentiated: 38 students, ages 7 years, 10 months to 18 years, 10 months, were tested. Characteristics were provided, and results were similar to those reported above for the learning disabled-reading clinical group in that Listening Comprehension differences were not significant. However, this clinical group performed significantly differently on both Oral Expression and the Oral Composite (differences of 16 and 13 points, respectively, at p < .001).
Hearing impaired: 27 students with mild to moderate loss (40-55 dB), who were mainstreamed for most or all of their school day, were tested using the same amplification as used in their classrooms. Ages ranged from 3 years, 3 months to 20 years, 0 months, and characteristics were provided. As expected, all scores fell significantly lower at p < .01.
Chapter One: 46 students receiving special services for reading difficulties, ages 7 years, 1 month to 18 years, 10 months, were tested. Sample characteristics were provided, and mean scores and standard deviations were reported. All scores were lower than those of the control group from the standardization sample. For example, the Oral Composite mean was 91.9 (SD = 11.1).
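As noted before the clinical group summaries, each comparison paired clinical examinees with controls matched on age, gender, race/ethnicity, and SES and applied a t-test for paired samples. The sketch below illustrates that kind of comparison with invented standard scores; none of these values come from the manual.

import numpy as np
from scipy import stats

# Invented standard scores: each clinical examinee paired with a control
# matched on age, gender, race/ethnicity, and SES.
clinical = np.array([84, 79, 91, 88, 76, 82, 90, 85])
matched_controls = np.array([97, 95, 102, 99, 94, 100, 103, 96])

t_stat, p_value = stats.ttest_rel(clinical, matched_controls)
mean_diff = float(np.mean(clinical - matched_controls))
print(f"mean difference = {mean_diff:.1f} points, t = {t_stat:.2f}, p = {p_value:.4f}")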
Comments from the Buros reviewer: “ ...seven studies demonstrated that students with special needs scored lower on the OWLS than
children included in the standardization sample that were matched on age, gender, race/ethnicity, and SES. This included children
with learning disabilities in reading, learning disabilities in general academic skills, speech impairments, language delays, language
impairments, hearing difficulties, and mental handicaps. The findings from these studies provide strong support for using the OWLS
to help in the identification of students with learning disabilities as well as other disabilities involving language and cognitive
difficulties” (Carpenter & Malcolm, 2001, p. 861).
“Finally, there was relatively strong support for the author's primary claim that the instrument provides a valid measure of general
listening and speaking skills. First, scores on the OWLS were moderately to highly correlated with scores on other measures of
language development. Second, measures of achievement and cognitive development were also moderately to highly correlated with
OWLS scores. Previous research has established that school learning is dependent on language ability and that there is a substantial
relationship between the development of cognitive and language skills. Third, mean scores of students in the standardization sample
increased from one age to the next. As expected, differentiation was greatest for young children and least for older students and
young adults. Fourth, as noted earlier, students with language difficulties, such as a hearing or language impairment, obtained lower
scores on the test than matched students in the standardization sample” (Carpenter & Malcolm, 2001, p. 861).
Construct Identification Validity: Evidence is provided for two types of construct validity, i.e., developmental progression of
scores and intercorrelations of the scales. In terms of developmental progression, age differentiation is evidenced by increases in the
raw scores with steeper increases in the earlier years. Table 9.6 presents the means and standard deviations for each age interval from
the standardization sample.
Comment: I’m not sure why the author chose raw scores rather than standard scores for this purpose.
Intercorrelations between the two scales, based on standard scores, are presented in Table 9.7. Moderate correlations are evidenced between the Listening Comprehension Scale and the Oral Expression Scale, ranging from .54 to .77 with a mean of .70. This degree of correlation suggests that each scale taps skills that are unique but nonetheless related, which supports the use of the overall Oral Composite score.
Differential Item Functioning: not reported.
Other: none
Summary/Conclusions/Observations:
The two Buros reviewers differ on many aspects of their reviews: one is a special education professor and the other a school psychologist. The professor was more critical, and it was primarily his comments that I included in this review. He stated the following in
summary: “In summary, the OWLS provides reliable and valid scores for determining the language competence of individual
children. The only exception involves the Listening Comprehension measure, which appears to be best suited as a screening device
for children 6 to 9 years of age” (Carpenter & Malcolm, 2001, p. 862).
On the other hand, the school psychologist was generally more positive and stated, “The OWLS addresses these areas in a fashion
that taps into everyday language functioning more so than do other language tests” (Carpenter & Malcolm, 2001, p. 862).
Clinical/Diagnostic Usefulness:
The Buros reviewer states: “The OWLS Listening Comprehension and Oral Expression Scales may prove to be one of the more
popular and widely used language tests. Examiners may find that the OWLS provides information on language functions that are not
tapped by other language tests they are currently using” (Carpenter & Malcolm, 2001, p. 864).
My response: As a speech-language pathologist, I disagree with the reviewers on this point. The OWLS is an older test now, superseded by such measures as the CELF-4, which is more comprehensive, more current, and more closely linked to current U.S. education requirements and curriculum. I think that few speech-language pathologists will use the OWLS, but other professionals, such as special educators and reading specialists, who must make decisions regarding reading abilities may still find this test useful, with the caveats described in this review and that of the Buros reviewers.
References
Carpenter, C., & Malcolm, K. (2001). Test review of the Oral and Written Language Scales: Listening Comprehension and Oral Expression. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 860-864). Lincoln, NE: Buros Institute of Mental Measurements.
Carrow-Woolfolk, E. (1995). Manual: Listening Comprehension and Oral Expression. Circle Pines, MN: American Guidance
Service, Inc.
Current Population Survey, March 1991 [machine-readable data file]. (1991). Washington, DC: Bureau of the Census (Producer and
Distributor).
To cite this document:
Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: Oral and written language scales:
Listening comprehension and oral expression (OWLS). Language, Phonological Awareness, and Reading Test Directory (pp.
1-13). Edmonton, AB: Canadian Centre for Research on Literacy. Retrieved [insert date] from
http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm.