Assessment Notes

Test of Early Reading Ability, Third Edition.
Review of the Test of Early Reading Ability, Third Edition by SHARON deFUR, Associate
Professor of Special Education, College of William and Mary, Williamsburg, VA:
DESCRIPTION. The Test of Early Reading Ability, Third Edition (TERA-3) is a norm-referenced, individually administered test that assesses the mastery of emergent literacy skills in
young children ages 3 years 6 months to 8 years 6 months. There are five identified purposes of
the TERA-3: (a) to identify children who are below peers in reading development; (b) to identify
strengths and weaknesses of individual children; (c) to document progress as a result of early
reading intervention; (d) to serve as a measure in reading research; and (e) to serve as one
component of a comprehensive assessment. To their credit, the authors clearly state that the
TERA-3 is not to be used as a sole basis for instructional planning.
The TERA-3 has two alternate forms and the test kit includes an examiner's manual, Form A and
B picture book, and examiner record booklets for Forms A and B. Three subtests comprise the
TERA-3: (a) Alphabet (mastering the alphabet and its functions); (b) Conventions (understanding
the arbitrary conventions of reading and writing in English); and (c) Meaning (understanding that
print conveys thought and meaning). The full test can be administered typically in 30 minutes or
less, but the items are not timed.
Nonclinical staff can administer the TERA-3, but the authors strongly recommend that the
examiner have formal training in assessment with a basic understanding of testing statistics, and
general procedures regarding test administration, scoring, and interpretation. Regardless of
training, the authors recommend careful study of the examiner's manual and a minimum of three
practice opportunities before using the TERA-3 in an actual testing situation.
The TERA-3 results are reported in raw scores, age and grade equivalents, percentile scores,
standard scores, and confidence scores for each of the three subtests. A Reading Quotient is
calculated by summing scores from the three subtests and using a table to convert the sum to a
Reading Quotient. The Reading Quotient is also reported as a percentile. Standard error of
measurement, confidence interval, and standard score ranges are calculated for the combined
subtests and for the Reading Quotient. Answers receive a score of 1 for correct or 0 for incorrect,
and expected answers are clearly indicated in the examiner record booklet. Chronological age
determines the start point for testing, but a basal is established when three items are correct in a
row, and a ceiling is established when three items are failed in a row. The examiner record
booklet includes a profile sheet that offers a graphic comparison across the three TERA-3
subtests as well as a graph to compare the TERA-3 Reading Quotient with other comparable
measures that might have been administered to the child. In addition, the examiner record
booklet has space for interpretation and comments, where, in addition to diagnostic implications,
the examiner can note the conditions of testing and the degree of validity obtained given the
testing conditions. The authors take care to help the user interpret the scores obtained on the
TERA-3. Throughout the manual, the authors emphasize that 'tests do not diagnose, people do,'
urging the user to consider why a child responded in a certain way and not just that the child
responded in that way.
DEVELOPMENT. The TERA-3 represents a revision of the TERA-2 with the authors taking the
following actions based on the recommendations of reviewers of TERA-2. The test authors
collected new normative data, addressing the need for appropriate demographic representation;
conducted extensive reliability and validity studies; added items as recommended; made the test
pictures in color; and re-introduced grade and age equivalents (with reluctance) because they are
required by many state and local agencies. No discussion is directed at describing the specific
field tests for the TERA-3; however, extensive discussion is provided in the description of the
technical adequacy of the TERA-3.
The authors provide a readable review of reading literature that documents the importance of
emergent literacy skills in alphabet, conventions, and meaning along with the importance of
assessing reading in young children. The authors substantiate the appropriateness of addressing
these three areas simultaneously, rather than sequentially, as this progression mirrors the reading
development process. The theoretical framework underlying the TERA-3 is well supported in
current reading research (National Reading Panel, 2000).
TECHNICAL. The TERA-3 used a relatively small norming sample (N = 875), but one that was
well matched to the general school-age population (gender, race, ethnicity, SES, disability, and
urban/rural) and representative of regions across the United States. All data were collected
between February 1999 and April 2000. In response to criticisms of the TERA-2, the test authors
carefully examined the possibility of bias for gender, race, ethnicity, and culture and found none
evident, or made accommodations as needed.
The test developers assessed content sample reliability using two measures. The first measure
was the use of coefficient alpha across age intervals for each subtest and for the Reading
Quotient. They found all subtests had acceptable levels of alpha with all values exceeding .80.
The Reading Quotient had alphas of .91 or higher across all ages. They also evaluated the
internal consistency reliability for a variety of examinee subgroups and found alphas for all
subgroups to be high also (above .91) and concluded that this indicates that the TERA-3 has no
clear bias for any subgroup. In addition, the alternate forms correlations exceeded .80.
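For readers less familiar with coefficient alpha, the following is a minimal sketch of how it can be computed from 1/0 item scores of the kind the TERA-3 produces. The function name and the five-examinee data set are hypothetical illustrations, not material from the examiner's manual, and the sketch uses the standard textbook formula rather than whatever software the test developers used.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a table of item scores.

    item_scores: one row per examinee, one 0/1 score per item.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item across examinees
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the examinees' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 examinees, 4 items (not TERA-3 data)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency and not of stability over time.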
Test-retest reliability (an interval of 2 weeks) resulted in correlation coefficients near .88, with
most comparisons near .92. The test developers assessed interscorer reliability using one TERA
author and two advanced graduate students who observed a total of 40 protocols; the interscorer
reliability was at .99, clearly supporting the consistency that could be expected between test
examiners who had well-developed skills in administering the TERA-3.
The test developers describe a convincing and systematic process used to determine content
validity for the TERA-3. These include reviewing research, comparing lists of emerging reading
behaviors, subjecting items for expert examination, employing a conventional item analysis, and
a differential item functioning analysis. Each of these measures strongly supported the assertion
that items on the TERA-3 represent the behaviors consistent with those expected for emerging
readers, and do so without bias.
The test developers estimated the concurrent validity of the TERA-3 scores by comparing them
to scores on other norm-referenced measures, including the TERA-2, the Stanford Achievement
Test-9, and the Woodcock Reading Mastery Test-Revised (NU). Not surprisingly, the concurrent
validity coefficients for the TERA-2 were extremely high. The TERA-3 compared well to the
SAT-9 reading comprehension and the WRMT-R(NU) reading quotient. Moderate predictive
validity was found for other subtests of the SAT-9 and the WRMT-R(NU). The test developers
also determined that the TERA-3 differentiated appropriately for chronological age and for
children who were experiencing reading, language, or learning disabilities where lower scores on
the TERA-3 would be expected. In conclusion, the authors provide convincing evidence that the
TERA-3 is a psychometrically sound measure of early reading ability.
COMMENTARY. The TERA-3 represents the culmination of 20 years of revision on a Test of
Early Reading Ability where the test developers have carefully attended to the criticisms of
earlier versions. To their credit, the technical development and analysis of the TERA-3 instill
confidence that the test scores can be considered highly reliable and valid and that the authors
have taken great care to address any inherent bias due to race, gender, ethnicity, SES, or
disability. In spite of the TERA-3's technical adequacy, commendably, the authors urge the user
not to rely solely on this test for curricular or other diagnostic decisions and point the user to
other sources of data that can substantiate or refute the findings of the TERA-3 for any one child.
The examiner's manual is well written and readable, which can serve to educate the user who
studies it attentively.
The TERA-3 is easy to use and score, but I support the test developers' recommendation that the
user have training in assessment and interpretation and that the prospective user engage in
systematic practice prior to using the TERA-3 for diagnostic purposes. The authors indicate that
they reluctantly reinstated the calculation of age and grade equivalent scores and I share their
reluctance. In spite of all of the psycho-educational specialists' warnings that age and grade
equivalents are meaningless scores regarding instruction or diagnostic comparisons across
instruments, educators tend to gravitate toward these scores because they seem the most
understandable. Yet an age equivalent of 4 years 4 months on the TERA-3 Alphabet subtest or a
grade equivalent of 1.3 on the Meaning subtest has no instructional implications, nor does it
accurately measure progress. Children's age and grade equivalents could rise over time while
their standard scores and percentiles decline, because their rate of improvement lags behind
the passage of chronological time. I would urge the authors to make this precaution
more prominent than it is in their current manual.
SUMMARY. The TERA-3 represents a reliable and valid measure of early reading ability and
reflects those skills that have been identified in reading research as critical to the development of
reading. It provides data that suggest strengths or weaknesses in understanding the alphabet and
its functions, understanding the conventions of print, and in deriving meaning from print. The
TERA-3 offers a quick tool, with easy one-to-one nonclinical administration, to supplement
other formal and informal assessments of developmental reading and can screen for specific areas
of strength and weakness in individual children. Although the TERA-3 is not a restricted
assessment tool, interpretation can be best done by examiners who have training in assessment
and an understanding of developmental reading skills. Given that alphabetic knowledge and
phonological awareness have a high rate of prediction for future reading skills (National Reading
Panel, 2000), the TERA-3 provides some data that could assist educators in identifying those
students who would benefit from early intervention in their reading instruction.
REVIEWER'S REFERENCE
National Reading Panel. (2000, April). Report of the National Reading Panel: Teaching children
to read: An evidence-based assessment of the scientific research literature on reading and its
implications for reading instruction: Reports of the subgroups. Washington, DC: National
Institute of Child Health and Human Development, National Institutes of Health.
Review of the Test of Early Reading Ability, Third Edition by LISA F. SMITH, Associate
Professor, Psychology Department, Kean University, Union, NJ:
DESCRIPTION. The Test of Early Reading Ability, Third Edition (TERA-3) is an individually
administered assessment of emerging reading skills for children between the ages 3 years, 6
months and 8 years, 6 months. The authors define five purposes for the TERA-3:
(a) to identify those children who are significantly below their peers in reading development and
thus may be candidates for early intervention, (b) to identify strengths and weaknesses of
individual children, (c) to document children's progress as a consequence of early reading
intervention programs, (d) to serve as a measure in research studying reading development in
young children, and (e) to accompany other assessment techniques. (examiner's manual, p. 8)
The TERA-3 is attractively packaged. It contains three sturdy and colorful spiral bound booklets,
one each for the examiner's manual, Test Form A, and Test Form B, and a profile/examiner
record booklet for each form of the test. Each form of the TERA-3 is made up of three subtests.
Subtest I, Alphabet, has 29 items designed to assess skills such as knowledge of the alphabet,
counting syllables, and initial and final letter sounds. Subtest II, Conventions, has 21 items
designed to assess principles such as page orientation, punctuation, spelling, and capitalization.
Subtest III, Meaning, has 30 items designed to assess skills such as reading comprehension,
sentence construction, and paraphrasing.
The administration is not timed but takes an average of 30 minutes. Age-appropriate entry points
are given; ceiling and basal points are clear. The authors state that children under the age of 5
may require a break after each 10 minutes of testing. Scores, demographics, and other data are
entered on the appropriate form's profile/examiner record booklet page. In scoring the items,
correct responses are scored as 1, incorrect as 0. Although the authors claim that the scoring is
straightforward, there is some potential for ambiguity or subjectivity for some items.
Each subtest has a mean of 10 and a standard deviation of 3. The three subtest scores are
combined to form a composite Reading Quotient score with a mean of 100 and a standard
deviation of 15. The name given to this score and the fact that the metric is identical to an IQ can
lead to serious misinterpretation of the nature of what is being measured here. The authors
appear to be making the argument that reading ability is somehow analogous to intelligence, a
contention with a host of philosophical and empirical problems. Percentiles, age equivalents, and
grade equivalents are also given, the last two with cautionary remarks.
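The two metrics described above can be compared with a short sketch. It assumes scores are approximately normally distributed, which is an assumption made here for illustration; the TERA-3 itself converts scores with norm tables, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Approximate percentile rank of a standard score under a normal curve."""
    z = (score - mean) / sd  # distance from the mean in SD units
    return 100 * NormalDist().cdf(z)


# A subtest score of 7 (mean 10, SD 3) and a Reading Quotient of 85
# (mean 100, SD 15) both sit one standard deviation below the mean.
subtest_pr = percentile_rank(7, mean=10, sd=3)
quotient_pr = percentile_rank(85, mean=100, sd=15)
print(round(subtest_pr, 1), round(quotient_pr, 1))
```

The two values agree because both scales are linear rescalings of the same z-score. The shared mean of 100 and standard deviation of 15 is also what makes the Reading Quotient look like an IQ on paper, whatever construct it actually measures.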
DEVELOPMENT. The TERA-3 has been under development since 1981. The examiner's
manual gives a detailed historical overview and a listing of current improvements. It offers
several definitions of reading and describes components of early reading to establish the
rationale for the TERA-3 subtests. Individual items were developed from consultation of
research results, the literature base, other tests, and curriculum materials. In forming the subtests,
the authors 'asked seven professionals with expertise in reading to review our item placement
and suggest any changes they deemed necessary' (examiner's manual, p. 62). Although the
professionals are listed by name, no credentials are given.
The authors argue that the TERA-3 is based on a philosophy of emergent reading. They cite
Valencia (1997) as providing a guiding structure for the development of the measure, with the
notable exception that the TERA-3 does not include any measure of phonemic awareness. There
are some questions that arise with respect to item development and placement. These relate to the
numbers of item types present within each subtest across forms and the ordering of items.
However, these issues are relatively minor and probably do not affect the subscores generated.
TECHNICAL.
Standardization. The norming sample included 875 children aged 3-6 to 8-6 from 22 states. This
sample appears to be representative of nationwide statistics as reported in the 1999 U.S. Census,
with regard to geographic region, gender, race, urban/rural residence, ethnicity, family income,
educational level of parents, and disability status. The sample was also stratified by age.
Participants took both forms of the TERA-3 during one testing session; no counterbalancing
procedures are described.
Reliability. Evidence of reliability is presented for content sampling, time sampling, and
interrater reliability. For content sampling, acceptable coefficient alpha data are given by age,
form, subtest, and reading quotient score. Acceptable coefficient alpha data are also given by
selected subgroups: gender, ethnicity, learning disabled, language impaired, and reading
disabled. However, it would have been helpful to see the data for the selected subgroups by age,
as well. Correlations for the alternate forms (immediate administration) are also acceptable.
Overall, for content sampling, Subtest II, Conventions, demonstrates lower reliability (.83) on
both forms as compared to the other subtests (roughly .90) and the Reading Composite (.95).
Test-retest reliability at a 2-week interval was investigated using n = 30 children aged 4-6 years
from Michigan and n = 34 children aged 7-9 years from Texas. Though the correlations shown
are acceptable, it would be difficult to generalize about stability over time given the sample
characteristics. Interrater reliability on 40 randomly drawn protocols was .99.
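To show what these reliability coefficients imply for an individual score, the sketch below applies the classical formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability), to the values cited above (.95 for the composite with SD 15, .83 for Conventions with SD 3). The function names, the 95% multiplier of 1.96, and the observed score of 90 are assumptions for illustration; the manual's own tabled values may be derived differently.

```python
import math


def sem(sd, reliability):
    """Classical standard error of measurement."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(score, sd, reliability, z=1.96):
    """Symmetric confidence band around an observed score (z=1.96 for ~95%)."""
    margin = z * sem(sd, reliability)
    return score - margin, score + margin


composite_sem = sem(15, 0.95)    # composite, reliability .95
conventions_sem = sem(3, 0.83)   # Conventions subtest, reliability .83
low, high = confidence_interval(90, 15, 0.95)  # hypothetical observed score of 90
print(round(composite_sem, 2), round(conventions_sem, 2))
print(round(low, 1), round(high, 1))
```

Relative to its own scale, the Conventions subtest carries the larger error band, which is a practical reason to interpret single-subtest differences more cautiously than the composite.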
Validity. Evidence of content validity is provided by lists of research, curriculum materials, and
other tests consulted; favorable evaluations by the seven professionals (mentioned previously);
and a parallel comparison of the item content on the TERA-3 to Valencia's (1997) categories of
early reading behaviors. The authors state that they selected items for the TERA-3 based on the
item-total score Pearson correlation rather than the point-biserial correlation, although these are
mathematically identical. A number of differential item functioning (DIF) bias studies were
conducted. Although 13 items across the two forms exhibited some DIF, the amount observed
was negligible. There is some concern that 7 of the 13 items demonstrating DIF were on Subtest
I, Alphabet, Form B. Criterion-prediction studies were conducted using the TERA-2, the
Stanford Achievement Test Series-Ninth Edition (n = 70), the Woodcock Reading Mastery Test-Revised-Normative Update (n = 64), and teacher ratings (n = 411). The correlations tend to be
moderate to high. Evidence of construct validity was determined by correlating performance on
the TERA-3 to age. Favorable correlations here are hardly surprising given the developmental
nature of reading and the additional instruction received as age increases. Similarly, group
differentiations comparing disability subgroups to nonclassified subgroups are what would be
expected. Details of a confirmatory factor analysis need clarification.
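The remark above that the item-total Pearson correlation and the point-biserial correlation are mathematically identical can be checked numerically for a 1/0 item. The sketch below uses hypothetical data for six examinees and textbook formulas; the function names are illustrative and nothing here comes from the TERA-3 item analysis itself.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population covariance and SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item and a total score."""
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    p = len(passed) / len(item)  # proportion passing the item
    return (mean(passed) - mean(failed)) / pstdev(total) * math.sqrt(p * (1 - p))


# Hypothetical data: one item's 1/0 scores and six examinees' total scores
item = [1, 1, 0, 1, 0, 0]
total = [9, 7, 4, 8, 5, 3]
r_pearson = pearson(item, total)
r_pb = point_biserial(item, total)
print(math.isclose(r_pearson, r_pb))
```

Because the two statistics coincide whenever one variable is dichotomous, choosing one label over the other has no effect on which items are retained.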
COMMENTARY/SUMMARY. Generally, the TERA-3 accomplishes its stated purposes,
especially if used in conjunction with other assessments. Its strengths lie in the ease of
administration, easy-to-use tables for scoring, and a clearly written examiner's manual. However,
claims by the authors that the TERA-3 is 'a valid measure of reading' (examiner's manual, p. 76)
should be viewed with caution. Tests themselves are not valid. The TERA-3 will be used with
diverse types of children in a variety of settings for an assortment of reasons. As such, the
validity of the TERA-3 will depend on the specific use of the test in a given situation. The
authors should be commended for offering an assessment based on modern reading theory that
incorporates examples from everyday life that should appeal to children.
REVIEWER'S REFERENCE
Valencia, S. W. (1997). Authentic classroom assessment of early reading: Alternatives to
standardized tests. Preventing School Failure, 41(2), 63-70.