Standardized Testing and California Schools’ API Scores
What’s the Connection?

Let’s Start Thinking
1. Where is the best place to examine direct data about student learning?
2. List at least three advantages and three disadvantages to using standardized assessment tools.
3. List at least three advantages and three disadvantages to using local or homegrown assessment tools.
4. What are some advantages to embedded assessment?

What’s the Deal with Testing?
As a society, we like numbers.
If something can be quantified, it is viewed as valid or more scientific. If it cannot be quantified, we view the activity with suspicion.
Machine scoring of a test is fast, efficient, and cheap.
Hand scoring of a test is slow, time-consuming, and very expensive.

Lessons from the Past
Mass testing came about in the late 1800s and early 1900s.
It was originally used to decide who was qualified to attend universities and who was bound to work in factories.
It attempted to model the efficient factory methods of Henry Ford: a test should be easy, cheap, and work for everyone.
Early IQ tests (the Alpha-Beta Tests) were developed for the U.S. Army as a way to decide the career path of new recruits.
Early tests were also developed to determine which immigrants could enter the U.S.

Standardized Tests – What’s the Difference?
Criterion-Referenced Test
Criterion-referenced tests, also called mastery tests, compare a person's performance to a set of objectives.
Anyone who meets the criterion can get a high score.
Everyone knows what the benchmarks / objectives are and can attain mastery to meet them.
It is possible for ALL the test takers to achieve 100% mastery.

Standardized Tests – What’s the Difference?
Norm-Referenced Test
Norm-referenced tests compare an individual's performance with the performance of others.
They are designed to yield a normal curve, with 50% of test takers scoring above the 50th percentile and 50% scoring below it, so half the test takers MUST pass and half the test takers MUST fail.
The test makers design the test with questions that MOST people will get incorrect.
If too many people get a question correct, or too many score well, then test questions are “thrown out” until the scores form a normal curve again.

Interpreting Test Scores (some definitions)
Raw score. This is the number of items the student answered correctly. It is used to calculate the other, more useful scores.
Stanine. One of nine equal sections of the normal curve. Stanines can be easily averaged and compared from test to test, but are less precise than other scores.
Normal curve equivalent (NCE). For these scores, the normal curve is divided into equal units ranging from 1 to 99, with an average of 50. These can be averaged and compared from test to test or year to year.

Normal Curve
Half of the test takers are grouped into the “passing” region of the curve and half into the “failing” region of the curve.
So by definition, half the test takers MUST “fail”, i.e. be below the 50th percentile.

State/School Goals
So when a school says that its goal is to have 70% of its students above the 50th percentile, is this possible?
Well, yes, but it would mean that another school would have to have 70% of its students below the 50th percentile.

Closer to Home: San Diego City Schools (SDCS)
In 2001, SDCS officials reported that as a district (second largest in the state), they had 66% of their students above the 50th percentile on the SAT/9 test for 2000.
The news media reported “the shame of SDCS” because 1/3 of their students were below the 50th percentile.
Was this a fair report?

MEASUREMENT AND EVALUATION: CRITERION- VERSUS NORM-REFERENCED TESTING
Many educators and members of the public fail to grasp the distinctions between criterion-referenced and norm-referenced testing.
It is common to hear the two types of testing referred to as if they served the same purposes or shared the same characteristics. Much confusion can be eliminated if the basic differences are understood.

The following is adapted from: Popham, J. W. (1975). Educational evaluation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

Dimension: Purpose
Criterion-Referenced Tests: To determine whether each student has achieved specific skills or concepts. To find out how much students know before instruction begins and after it has finished.
Norm-Referenced Tests: To rank each student with respect to the achievement of others in broad areas of knowledge. To discriminate between high and low achievers.

Dimension: Content
Criterion-Referenced Tests: Measures specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
Norm-Referenced Tests: Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.

Dimension: Item Characteristics
Criterion-Referenced Tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
Norm-Referenced Tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.

Dimension: Score Interpretation
Criterion-Referenced Tests: Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant.
A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
Norm-Referenced Tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.

Tests Currently Used in California
California Achievement Test – 6th Edition (CAT/6): National Norm Referenced Test
California Standards Test (CST): State Norm Referenced Test w/ Scaled Scores
Golden State Exam: Criterion Referenced Test
CA-High School Exit Exam (CA-HSEE): Criterion Referenced Test

Testing Case In Point
In this scenario we will use a fictitious “norm-referenced” test being given at a single high school.

John and his fellow students at Anywhere High School are given the “Let’s Achieve Test” version 1 (LAT/1).
The LAT/1 is a norm-referenced test.

John does not perform well on the test, compared to the other test takers. He scores below the 50th percentile and is classified “below grade level”.
John spends the next school year getting extra tutoring, staying after school, and going to Saturday tutoring sessions.

The following school year on the LAT/1, John performs better than he did the previous year.
However, because of a school-wide focus on the test, all the other students in the school also perform better.
As a result, John’s norm-referenced test score is still below the 50th percentile and he is still classified as “below grade level”.

Academic Performance Index (API)
The API score was created to provide a systematic method of rank-ordering schools based on a number of criteria.
It is meant to measure the academic growth and performance of a school.
The schools would receive a rank compared to ALL other schools in the state and a second ranking comparing them to SIMILAR schools around the state.

Early Proposed API Criteria (1999):
Test Results (SAT/9) – 60% of score
Attendance Rates
Graduation Rates
Other statewide test results (GSE, CA-HSEE)
From 1999 to 2002, ONLY the SAT/9 test results were used to calculate 100% of a school’s API score.

Current API Criteria (baseline set in 2002):
California Achievement Test (CAT/6) – about 12% of score. Includes mathematics, reading, language, science
California Standards Test (CST) – about 73% of score. Includes mathematics, science, language arts, social science
CA-High School Exit Exam (CA-HSEE) – about 15% of score.
Eventually API scores will also include graduation and attendance rates from schools as part of the overall “score”.

Consider This
So, does this system adequately measure the success of CA students?
Does it reflect the learning that is happening in CA classrooms?

Some Questions
What are the appropriate uses of norm-referenced tests? Criterion-referenced tests?
How should these tests be used at the state/district/school level?
What role does testing play in looking at school performance? Student performance? Teacher performance?

The Real Question We Should Ask
Testing is a reality that is here to stay. It has been legislated by the state of CA under the STAR system and by the federal government under the NCLB Act.
So we should really be asking: How do we use these tools to support students and their learning in CA schools?
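
A Small Illustration of the “Testing Case In Point” Scenario
The John scenario described earlier can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any real test publisher computes scores: the ten raw scores are invented, the helper name percentile_rank is hypothetical, and percentile rank is assumed here to mean the percentage of test takers who scored strictly below a given score (one of several conventions in use).

```python
from bisect import bisect_left


def percentile_rank(score, all_scores):
    """Percent of test takers who scored strictly below `score` (assumed convention)."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)  # count of scores strictly less than `score`
    return 100.0 * below / len(ordered)


# Invented raw scores for ten students; John is the first entry.
year1 = [40, 35, 45, 50, 55, 60, 65, 70, 75, 80]

# Year 2: every student, John included, answers 10 more items correctly.
year2 = [s + 10 for s in year1]

john_year1 = percentile_rank(year1[0], year1)
john_year2 = percentile_rank(year2[0], year2)

print("John's raw score:", year1[0], "->", year2[0])
print("John's percentile rank:", john_year1, "->", john_year2)
```

Under these assumptions John's raw score rises from 40 to 50, yet his percentile rank is identical in both years, because every other student's score rose by the same amount. A criterion-referenced cut score, by contrast, would register his improvement regardless of how his classmates performed.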