Evaluation of Formal Assessment
Reviewed by Myria Knapp

Test Title: Peabody Individual Achievement Test-Revised (PIAT-R)
Author: Frederick C. Markwardt, Jr.
Publisher: AGS American Guidance Service
Copyright: 1989

Description

General Purpose: The PIAT-R provides screening in general information, reading recognition, reading comprehension, mathematics, spelling, and written expression. It measures functional knowledge and abilities that are expected in the education system. The test offers both multiple-choice and free-response questions in the following six categories.
General Information: measures general encyclopedic knowledge.
Reading Recognition: an oral test of reading.
Reading Comprehension: measures understanding of what the subject reads.
Mathematics: a multiple-choice test that examines knowledge and application of mathematical concepts.
Spelling: measures the subject's ability to recognize letters after hearing their sound or name.
Written Expression: at level two, the test assesses the subject's writing skill.

Materials provided/needed: The book of plates that contains the six subtests, the manual, test records, the written expression response booklet, and the pronunciation guide cassette, all of which are included.

Alternate Forms: There is only one form of the PIAT-R. There are, however, five subtests. The written expression subtest has a Form A and a Form B. Form A is for younger children, and Form B has two writing prompts to choose from.

Administration

Age ranges: Kindergarten through Grade 12 (Ages 5-0 to 18-11)

Administration and scoring time: One hour is usually enough time to complete this assessment, and an additional 15 minutes can be added to record observations, obtain derived scores, and plot profiles. The test is untimed except for written expression, where subjects are given 20 minutes. When waiting for a response, administrators should allow 30 seconds on mathematics and 15 seconds on the other subtests.
Types of scores reported: Raw scores can be converted into grade and age equivalents, grade-based and age-based standard scores, percentile ranks, normal curve equivalents, and stanines.

Starting points, basal and ceiling levels: The subject's grade level determines the starting point. The starting point for the first subtest is found simply from the subject's grade. The starting point for each following subtest is taken from the raw score on the previous subtest. If a subtest is omitted, the raw score on the subtest before the omitted one determines the starting point. The basal is the highest five consecutive correct answers. The ceiling is reached when the subject makes five errors in seven consecutive items.

Standard error of measurement and confidence levels: The mean standard error of measurement (SEM) of the total test raw score is 5.8, which gives the band at the 68% confidence level. At the 95% confidence level the band is ±11.368 (1.96 × SEM). At the 99% confidence level the band is ±14.964 (2.58 × SEM).

Norming Procedures

Sampling procedures: A professional or counselor was designated for each school district. This coordinator surveyed the site, chose examiners, and supervised the whole testing process at each location. The examiners were professionals with experience in education. Every examiner had to attend a two-hour workshop to make sure all examiners understood proper administration.

Size and characteristics of sample: The sample consisted of 1,563 students ranging from kindergarten through Grade 12 in 33 communities nationwide. An additional 175 kindergartners were tested. Most students were from public schools (91.4%). Special education classes were excluded. The sample was balanced by sex and distributed across geographic regions, socioeconomic statuses, and race/ethnic groups. There were more kindergartners (150) than any other group, and the number per group decreased in the upper grades (9-12) in each region.

Date of Norms: Between April and June of 1986.
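The confidence figures given above under the standard error of measurement (11.368 and 14.964) are what you get by multiplying the SEM of 5.8 by a normal-curve z value. The sketch below shows that arithmetic and how it turns an obtained raw score into a range. It assumes z values of 1.96 and 2.58, which reproduce the quoted figures; the function name and the raw score of 200 are made up for illustration and do not come from the manual.

```python
# z multipliers are an assumption; 1.96 and 2.58 reproduce the figures in the review.
Z_VALUES = {68: 1.00, 95: 1.96, 99: 2.58}

SEM_TOTAL_RAW = 5.8  # mean SEM of the total test raw score, as reported in the review


def confidence_band(raw_score, level, sem=SEM_TOTAL_RAW):
    """Return (low, high) bounds around an obtained raw score (hypothetical helper)."""
    half_width = Z_VALUES[level] * sem
    return raw_score - half_width, raw_score + half_width


if __name__ == "__main__":
    example_raw = 200  # made-up total raw score, for illustration only
    for level in (68, 95, 99):
        low, high = confidence_band(example_raw, level)
        print(f"{level}% band: {low:.3f} to {high:.3f}")
```

Read this way, the SEM itself stays at 5.8 at every confidence level; only the width of the band around the score changes.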
Reliability

Split-Half Reliability: Calculated using the raw score on even items and the raw score on odd items. The coefficient by grade ranged from .95-.98. The coefficient by age ranged from .93-.99.

Kuder-Richardson Reliability: By grade, all six subtests had coefficients that ranged from .94-.98. The coefficient by age ranged from .95-.99.

Test-Retest Reliability: To obtain this data, 50 subjects were randomly selected in kindergarten and grades 2, 4, 6, 8, and 10. These subjects were tested twice, with a two- to four-week interval between sessions. The coefficient by grade ranged from .84-.96. The coefficient by age ranged from .90-.96.

Item Response Theory Reliability: This reliability was based on estimated raw scores for both age and grade. The coefficient by grade ranged from .96-.99. The coefficient by age ranged from .96-.99.

Validity

Content Validity: The information from the Kuder-Richardson and split-half reliability helped make the case for content validity. The PIAT-R manual also goes into detail about how validity was tested in every subtest. A great deal of research went into making sure every subtest had content that accurately measured what it said it would measure.

Construct Validity: Construct validity was supported in several ways. The first is developmental change: test scores are expected to increase with age and grade, and the raw score means of this test increase with both grade level and age level. The second is correlation with other tests. The total mean correlation of the PIAT-R with the PPVT-R was .72. The PPVT-R was also correlated with the original PIAT; the correlations for these two tests, by grade and by age, ranged from .83-.97. Factor analysis was the third way validity was supported. The intercorrelations among the tests ranged from .61-.95 by grade and from .66-.96 by age.
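The split-half entry under Reliability above describes correlating the raw score on odd items with the raw score on even items. A minimal sketch of that calculation follows. The response data are invented for illustration, the function names are hypothetical, and the Spearman-Brown step-up is included only as a common companion to split-half estimates; the review does not say whether the publisher applied it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def split_half(responses):
    """responses: one list of 0/1 item scores per subject, item 1 first."""
    odd_scores = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_scores = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd_scores, even_scores)
    # Spearman-Brown step-up to full test length (assumption, see lead-in).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Invented item scores for five subjects on a six-item subtest.
example_responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    r_half, r_full = split_half(example_responses)
    print(f"odd/even correlation: {r_half:.3f}")
    print(f"stepped-up estimate:  {r_full:.3f}")
```

With real data, the same function would be run separately for each grade or age group to get a range of coefficients like the ones reported.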
Classroom uses

Suggested by Authors: The test is used to measure functional knowledge and abilities that are seen in the education setting. It is suggested for use when a survey of scholastic achievement is needed. It also provides a diagnostic instrument for the achievement level of subjects. Many professionals in the school setting can use the test in many settings. It provides information on a student's strengths and weaknesses and on behavior in testing situations. The manual says the test provides insight into how the student learns and handles school subjects. This in turn helps guide intervention and support for the student in his or her areas of weakness. The test can also be used outside of the school system, for example to measure the achievement of a candidate for a job or training.

My Opinion of Appropriate Uses: I believe this test would be great for determining a student's overall academic level. If you have time, you could use it to test every student at the beginning of the year to see where they are in every subject and get a better idea of the class. However, that seems very unlikely given the length of the test. It would be valuable in the beginning stages of intervention (RTI), when trying to figure out why a student is falling behind. It could be used to measure progress; however, once you find exactly where the student is struggling, I believe you would give an assessment tailored to that subject area. I believe this test would be best used to get an initial idea of a student's strengths and weaknesses, and from there to choose more tests or a plan of intervention.

Desirable Uses
It measures a variety of content areas.
It gives an overall idea of a student's ability, strengths, and weaknesses.
It has a strong norm reference, with a large and varied sample.
It is a great way to assess a student's total academic knowledge.
Reliability coefficients averaged in the .90s.
It seemed like a lot went into making sure the test had strong validity in each subtest.

Undesirable Uses
The norm reference did not include children with special needs.
There is no color in the test booklet.
It is a longer test.
There is no alternate form.
It is an old test.
The standard error of measurement is pretty high compared to other tests.
The book says the starting point is determined by chronological age, but the testing booklet says it is determined by grade, which is confusing.
The examiner has to pause between subtests to count the raw score. It seems like students would lose focus during this time.