Tests and Time: An Assessment Guide for Education Leaders

From the length and frequency of testing to the ability of assessments to track student learning over time, this research-driven guide explores the different interactions of tests and time—and what they mean for today’s educators.

Table of Contents
Section 1: Finding your best assessment, part one: Length, reliability, and validity
Section 2: Finding your best assessment, part two: Learning progressions
Section 3: Using your best assessment: Questions that help guide assessment
Section 4: Testing timeline: A brief history of American assessment innovations
Section 5: Action steps: Checklist for evaluating assessments
Section 6: Recommended reading: Books, articles, and more

©Copyright 2018 Renaissance Learning, Inc. All rights reserved.

Section 1: Finding your best assessment: Length, reliability, and validity

Bigger is better . . . isn’t it? When it comes to assessment, this seems to be a common misconception: Some educators mistakenly believe that longer tests are always better measures of student learning. However, that’s just not true. Thanks to the many innovations that have occurred over the last 170 years of American assessment, today it’s absolutely possible to have very short assessments that provide educators with highly reliable, valid, and meaningful data about their students. In part one of Finding your best assessment, we’ll explore the test designs, technology, and approaches that make shorter tests possible—and help you figure out if a shorter test is a better option for you and your students.

Takeaways for Education Leaders
Longer tests are not always better measures of student learning. When evaluating an assessment, consider:
• Item format: The item format used, as well as the design of individual items, affects test length. Two tests could assess the same content and have similar reliability and validity, but one could take much longer.
• Item quantity: There is a direct relationship between the number of items on a test and its length, but a “diminishing returns” curve for the number of items and the reliability of an assessment. Adding a large number of items may have only minimal impact on reliability—and that small increase in reliability may be meaningless depending on how you use the results.
• Computer-adaptive testing: A well-designed CAT may require half the number of items to be as reliable as a traditional test. In addition, CAT also tends to be more reliable for low- and high-achieving students because it adapts to their precise level.
Ask if a shorter test could achieve your reliability and validity goals, while saving precious instructional time.

How does item format affect testing time, reliability, and validity?

[Figure: Sample assessment item formats]

An assessment can feature one or more item formats, such as constructed-response items (e.g., essay questions), performance tasks (e.g., asking a student to draw a graph or conduct an experiment), and selected-response items (e.g., true/false and multiple-choice questions). It’s important to note that there is no universal “best” item format. A well-designed assessment may use any of these formats, or a mix of formats, and be reliable and valid. Which format to use depends on what you’re trying to assess.
If you want to understand a student’s opinions or thinking process, constructed-response items and performance tasks may be good options. For measuring a wide range of content—such as mastery of state standards—in a single session, selected-response items are often a better choice.

That said, there are some key differences between item formats when it comes to the time needed to administer and score an assessment. Constructed-response items and performance tasks usually take more time to complete than selected-response questions. Because these first two formats often must be graded by hand, they also tend to take longer to score than selected-response items, which can often be instantaneously and accurately scored by computer. This is important because the faster teachers can get results from assessments, the faster they can act on that data and get students the help they need.

[Figure: Location of answer blanks in multiple-choice questions]

Even within an item format, there is still a lot of variability. Take fill-in-the-blank or cloze items, for example. If you place the answer blank near the beginning of the sentence (or “stem”), students may need to reread the question multiple times to understand what it’s asking. The more students have to reread, the longer the test will take. Placing the blank near the end of the sentence can help minimize the amount of rereading required and thus speed up the test. This means two multiple-choice tests could have the same number of items, assess almost identical content, and have very similar reliability and validity, but one could take longer to complete! Clearly, longer tests are not inherently better.

When writing your own test or purchasing one from an assessment provider, be sure to ask these questions about item format:
• What is the best item format for the content of this assessment?
• Does this item format match any time limitations I may have for test administration?
• How will this item format be scored and how long will it take to get results?
• Are the items written in a way that minimizes rereading and overall testing time?

How does item quantity affect testing time, reliability, and validity?

Another important element to consider is the number of items included in the assessment. The quantity of items will have a notable impact on length, reliability, and validity. In general, the more items an assessment has, the longer the assessment will take. It’s pretty clear that a test with ten multiple-choice questions will usually take longer than one with only two multiple-choice questions. However, this rule isn’t universal—as discussed above, different item formats have different time requirements, so two essay questions may take much longer than ten multiple-choice questions.

[Figure: Increasing the number of items on an assessment lengthens the administration time]

Assuming all questions on an assessment have the same format, there is a fairly linear relationship between quantity and length. Doubling the number of questions on a test will also double the test’s length (or could lengthen the test further if student fatigue or boredom slow response times). Plotted on a graph, the relationship will generally be a straight line.
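To make the point concrete, here is a minimal back-of-the-envelope sketch. The per-item minutes below are invented for illustration only; actual timing varies widely by grade, subject, and item design.

```python
# Illustrative only: rough administration-time estimates by item format.
# The per-item minutes are assumptions chosen to show why item count alone
# does not determine test length.
MINUTES_PER_ITEM = {
    "selected_response": 1.0,       # e.g., multiple-choice or true/false
    "constructed_response": 15.0,   # e.g., a short essay question
    "performance_task": 25.0,       # e.g., conduct an experiment, draw a graph
}

def estimated_minutes(item_counts: dict[str, int]) -> float:
    """Sum of (number of items x assumed minutes per item) across formats."""
    return sum(count * MINUTES_PER_ITEM[fmt] for fmt, count in item_counts.items())

# Ten multiple-choice items vs. two essay items:
print(estimated_minutes({"selected_response": 10}))    # 10.0 minutes
print(estimated_minutes({"constructed_response": 2}))  # 30.0 minutes
```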
Longer assessments also tend to be more reliable—but the relationship between item quantity and length is very different from the relationship between item quantity and reliability. If you have a short test, adding one or two additional items could markedly increase the reliability. However, if you have a long test, adding a few questions may have only a tiny effect on reliability. It’s a classic case of diminishing returns and, at a certain point, it’s just not worth it to keep adding items. In fact, for a very long test, adding many more items may barely improve reliability—and might even decrease reliability if students get tired or bored and start to guess answers.

[Figure: Increasing the number of items has diminishing returns on reliability]

On a graph, the relationship between quantity and reliability is usually a curve that starts out steep and then flattens as more and more items are added. (The exact shape of this curve will vary depending on the assessment.) Since reliability is a key component of validity—assessments must be reliable in order to be valid—validity often follows a similar curve in relation to test length.

Do tests with more items take more time—are longer tests longer? Definitely, yes. Are longer tests more reliable? Generally, yes. Are they better? That’s a different question entirely.

Let’s first look at something called standard error of measurement (SEM), which is closely related to reliability: As reliability increases, SEM decreases. In simple terms, SEM describes the size of the range in which a student’s “true score” is likely to fall. Since no test can get a “perfect” measure of a student’s ability (their “true score”), all tests have an SEM. If the SEM is 1 and a student’s score is 5, their “true score” is likely to be 4 (5 minus SEM), 5, or 6 (5 plus SEM). It’s important to consider a test’s score range when contemplating SEM: If the score range is only 30 points, you should worry if the SEM is 10 points—but if the range is 300, an SEM of 10 could be very good! For this reason, you cannot directly compare SEMs between tests that use different score scales (an SEM of 10 on one test may be worse than an SEM of 100 on another, depending on the ranges).
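The shape of this curve can be sketched with two standard psychometric formulas: the Spearman-Brown prophecy formula for the reliability of a lengthened test, and SEM = SD × √(1 − reliability). The starting reliability, score standard deviation, and item counts below are hypothetical and are not drawn from any particular assessment; Spearman-Brown also assumes the added items are of comparable quality, which real tests (and tired students) may not deliver.

```python
import math

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`
    (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def sem(reliability: float, score_sd: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)

# Hypothetical 10-item test with reliability 0.70 and a score SD of 10 points.
base_items, base_reliability, score_sd = 10, 0.70, 10.0

for items in (10, 20, 40, 80, 160):
    r = spearman_brown(base_reliability, items / base_items)
    print(f"{items:>3} items: reliability ~{r:.2f}, SEM ~{sem(r, score_sd):.1f} points")

# Approximate output (note the flattening curve: each doubling buys less):
#  10 items: reliability ~0.70, SEM ~5.5 points
#  20 items: reliability ~0.82, SEM ~4.2 points
#  40 items: reliability ~0.90, SEM ~3.1 points
#  80 items: reliability ~0.95, SEM ~2.3 points
# 160 items: reliability ~0.97, SEM ~1.6 points
```

Going from 10 to 20 items buys a large gain; going from 80 to 160 items buys very little, which is the diminishing-returns pattern described above.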
Try this thought exercise. Say you have two tests that use the same score scale from 0 to 100. One is 15 minutes long with an SEM of 4, meaning a child who scores 75 is likely to have a “true score” as low as 71 or as high as 79. The other test is 60 minutes long and has an SEM of 2. In this case, a child who scores 75 is likely to have a “true score” as low as 73 or as high as 77. Which test is better?

Well, that depends on what you’re going to do with the assessment data. If this is a summative assessment that is administered only once and determines if a student advances to the next grade, you may decide the longer test with the smaller SEM is the best option. Accuracy would be paramount and you’d lose relatively little instructional time.

On the other hand, if you’re using the assessment to inform instructional decisions and testing students every month, you may choose the shorter assessment with the larger SEM. Perhaps you’re grouping all students with scores between 60 and 80 in the same instructional group. In this case, a score of 71, 75, or 79 won’t change the instruction a child receives and you’ll get 45 more minutes of instructional time every month. Here, efficiency takes priority and shorter is far better.

So which is better? Only you can answer. When evaluating length and reliability, you should:
• Ensure all assessments meet your reliability standards—an unreliable test is no good, no matter how short or long it is.
• Get the greatest reliability per minute—if two tests have similar reliability and validity, the shorter one is often the better choice.
• Take a holistic look at your return on investment—if you’re sacrificing a lot of extra time for a little extra reliability, think about how you’ll use the results and whether the small increase in reliability is meaningful for your purposes.
• Watch out for tests that are longer than your students’ attention spans—fatigue, boredom, and stress can all result in artificially low scores that don’t reflect students’ real skill levels.

How does computer-adaptive testing (CAT) affect testing time, reliability, and validity?

[Figure: CAT tailors item difficulty to match a student’s skill level]

Remember how we said that longer tests are generally more reliable? There’s one big exception to that rule: computer-adaptive testing, also known as computerized adaptive testing or CAT. With CAT, each student’s testing experience is unique. When a student answers a question correctly, the assessment automatically selects a more difficult item to be the next question. When a student answers a question incorrectly, the opposite occurs and the next item is less difficult than the current one. Since a well-designed CAT assessment has thousands of questions in its item bank, even if students have similar skill levels, they’ll likely see different questions—and the same student could even test and re-test without items being repeated.

Tailoring item difficulty according to student ability has several notable benefits. Since students do not see overly challenging items, they are less likely to be stressed, anxious, or intimidated. Similarly, since they do not see overly easy items, they are less likely to become bored or careless. When these negative emotions and distractions are reduced, scores tend to be more reliable and valid. Perhaps more important, CAT requires fewer items to gauge a student’s skill level since it essentially “skips” the too-easy and too-hard questions that would need to be included on a traditional test. Think of it this way: If you have no idea what a student’s math level is, a traditional or fixed-form test (where all students see the same questions) needs to include everything from basic addition all the way up to advanced algebra or even calculus. That’s a very long test—and if it only has a few questions for each skill, it may not be very precise!
“The basic notion of an adaptive test is to mimic automatically what a wise examiner would do.” — Howard Wainer

Meanwhile with CAT, if a student answers two or three algebra questions correctly—something like finding x if (22x – 57) • (144 + 289) = -5,629—then the test will not ask basic addition, subtraction, and multiplication questions. Instead, it will present more and more difficult questions until the student has an incorrect answer. This means a computer-adaptive test needs fewer items overall to pinpoint the student’s ability level.

In addition, CAT can provide more precise measures for low-achieving and high-achieving students than traditional tests. Consider the low-achieving student who cannot answer any questions on a 50-item fixed-form algebra test; we know his math skill level is below algebra, but how far below? The traditional test provides little insight. In contrast, after a student provides an incorrect answer on a computer-adaptive test, the test will continue to adapt down until it finds the student’s actual level, even if it’s far below the initial starting point. CAT essentially mimics what an expert teacher would do if she could personally question each student and thus can provide a more precise measure of student skill.

As a result of all these factors, a well-designed CAT can be two or more times as efficient as traditional tests. In other words, if a traditional test has 50 items, a CAT assessment may need only 25 items to reach the same reliability; or, if both tests have the same number of items, the CAT one is likely to be more reliable. Along with improved reliability, CAT assessments also tend to have improved validity over traditional assessments. Shorter tests can be more reliable and more valid! In fact, hundreds of studies have shown that even very short CAT assessments (including ones that take some students as little as 11 minutes to finish) can predict performance on much longer fixed-form assessments, such as year-end summative state tests. The benefits of CAT are so great that some high-stakes assessments are moving from fixed-form to computer-adaptive.
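Here is a deliberately simplified sketch of the adaptive loop described above: pick the available item closest to the current ability estimate, move the estimate up after a correct answer and down after an incorrect one, and stop after a fixed number of items. Real CAT engines, including commercial ones, select items and estimate ability using item response theory with large calibrated item banks and statistical stopping rules; the tiny item bank, fixed step sizes, and fixed test length below are illustrative assumptions only, not any vendor’s algorithm.

```python
import random

# Hypothetical item bank: (item_id, difficulty) pairs on an arbitrary scale.
ITEM_BANK = [(f"item_{i}", d) for i, d in enumerate(range(-10, 11))]

def run_cat(answer_item, start_estimate: float = 0.0, max_items: int = 10) -> float:
    """Administer up to `max_items` items, adapting difficulty after each response.

    `answer_item(difficulty)` returns True if the student answers an item of that
    difficulty correctly. Returns the final ability estimate.
    """
    estimate, step, seen = start_estimate, 4.0, set()
    for _ in range(max_items):
        # Choose the unseen item whose difficulty is closest to the current estimate.
        item_id, difficulty = min(
            (item for item in ITEM_BANK if item[0] not in seen),
            key=lambda item: abs(item[1] - estimate),
        )
        seen.add(item_id)
        if answer_item(difficulty):
            estimate += step   # correct: try something harder next
        else:
            estimate -= step   # incorrect: try something easier next
        step = max(step / 2, 0.5)  # take smaller steps as the estimate settles
    return estimate

def simulated_student(difficulty: float) -> bool:
    """Hypothetical student with true ability 3: reliable on easier items,
    only occasionally correct on harder ones."""
    return difficulty <= 3 or random.random() < 0.25

print(run_cat(simulated_student))  # estimate should land near 3 after ~10 items
```

Because each item is chosen near the current estimate, the loop spends its ten items around the student’s actual level instead of spreading them from basic addition to calculus, which is why an adaptive test can match a longer fixed-form test’s precision with fewer questions.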
Is CAT right for you and your students? You should definitely consider CAT if:
• Your students represent a wide range of ability levels and you need to know exactly how high or how low their skill levels are—CAT means you can use one test for an accurate measure of all students’ skill levels.
• You have limited time to administer tests or need to assess students multiple times throughout the year—shorter CAT assessments could help you protect your instructional time without sacrificing reliability or validity.
• You want a reliable measure of student progress over the school year or from year to year—CAT assessments will adapt up as students’ skill levels improve, allowing you to use the same test in multiple grades for directly comparable scores.

On-Demand Webinar: Can CAT assessments provide the insights you need to improve students’ high-stakes test performance? Watch Predicting Performance on High-Stakes Tests to hear one researcher discuss why one popular CAT assessment is a good predictor of performance on year-end summative state tests. Available at www.renaissance.com/resources/webinars/predicting-performance-on-high-stakes-tests-recorded-webinar/

Are you assessing in the past—or in the future?

The United States’ first mandated written assessment took one hour. That was back in 1845, when multiple-choice questions had yet to be invented, formal adaptive testing didn’t exist, and digital computers were almost 100 years away. Today, things have changed a lot, but many assessments still seem stuck in the past. How many “modern” assessments still take an hour or more? How many need to take that long? Could a shorter test give you the reliability, validity, and insights you need to support student success?

Before you decide on your best assessment, there’s one more factor you need to consider—the critical element that bridges assessment and instruction. In part two of Finding your best assessment, we’ll examine why learning progressions are so essential for good assessments.

This section is part of Tests and Time: An Assessment Guide for Education Leaders, a six-part guide that examines the factors that affect test length, reliability, and validity; the role of learning progressions in making assessment data actionable; critical questions educators should ask before scheduling assessments; and a brief history of assessment innovations in the United States. Download the complete guide at www.renaissance.com/tests-and-time

Sources
Beck, J. E., Mostow, J., & Bey, J. (2003). Can automated questions scaffold children’s reading comprehension? Pittsburgh, PA: Carnegie Mellon University Project LISTEN.
Bell, R., & Lumsden, J. (1980). Test length and validity. Applied Psychological Measurement, 4(2), 165-170.
Brame, C. J. (2013). Writing good multiple choice test questions. Vanderbilt Center for Teaching. Retrieved from https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions
Clay, B. (2001). Is this a trick question? A short guide to writing effective test questions. Lawrence, KS: University of Kansas Curriculum Center.
Croft, M., Guffy, G., & Vitale, D. (2015). Reviewing your options: The case for using multiple-choice test items. Iowa City, IA: ACT.
Fetzer, M., Dainis, A., Lambert, S., & Meade, A. (2011). Computer adaptive testing (CAT) in an employment context. Surrey, UK: SHL.
Galli, J. A. (2001). Measuring validity and reliability of computer adaptive online skills assessments. Washington, DC: Brainbench. Retrieved from https://www.brainbench.com/xml/bb/mybrainbench/community/whitepaper.xml?contentId=938
Leung, C. K., Chang, H. H., & Hau, K. T. (2003). Computerized adaptive testing: A comparison of three content balancing methods. The Journal of Technology, Learning, and Assessment, 2(5), 1-15.
Livingston, S. A. (2009). Constructed-response test questions: Why we use them; how we score them. ETS R&D Connections, 11, 1-8.
Mattimore, P. (2009, February 5). Why our children need national multiple choice tests. Retrieved from http://www.opednews.com/articles/Why-Our-Children-Need-Nati-by-Patrick-Mattimore-090205-402.html
Monaghan, W. (2006). The facts about subscores. ETS R&D Connections, 4, 1-6. Princeton, NJ: Educational Testing Services. Retrieved from http://www.ets.org/Media/Research/pdf/RD_Connections4.pdf
Nicol, D. (2007). E-assessment by design: Using multiple-choice tests to good effect. Journal of Further and Higher Education, 31(1), 53–64.
Phipps, S. D., & Brackbill, M. L. (2009). Relationship between assessment item format and item performance characteristics. American Journal of Pharmaceutical Education, 73(8), 1-6.
Popham, W. J. (2008). Classroom assessment: What teachers need to know (5th ed.). Boston: Allyn and Bacon.
Popham, W. J. (2009). All about assessment / Unraveling reliability. Educational Leadership, 66(5), 77-78.
Renaissance Learning. (2015). Star Math technical manual. Wisconsin Rapids, WI: Author.
Renaissance Learning. (2015). Star Reading technical manual. Wisconsin Rapids, WI: Author.
Shapiro, E. S., & Gebhardt, S. N. (2012). Comparing computer-adaptive and curriculum-based measurement methods of assessment. School Psychology Review, 41(3), 295-305.
Stecher, B. M., Rahn, M., Ruby, A., Alt, M., Robyn, A., & Ward, B. (1997). Using alternative assessments in vocational education. Santa Monica, CA: RAND Corporation.
Stiggins, R. J. (2005). Student-involved classroom assessment for learning (4th ed.). Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
Wainer, H. (2000). CATs: Whither and whence. Princeton, NJ: Educational Testing Services. Retrieved from https://doi.org/10.1002/j.2333-8504.2000.tb01835.x
Weiss, D. J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and Evaluation in Counseling and Development, 37, 70-84. Retrieved from http://www.psych.umn.edu/psylabs/catcentral/pdf%20files/we04070.pdf
Wells, C. S., & Wollack, J. A. (2003). An instructor’s guide to understanding test reliability. Madison, WI: University of Wisconsin Testing & Evaluation Services.
Young, J. W., So, Y., & Ockey, G. J. (2013). Guidelines for best test development practices to ensure validity and fairness for international English language proficiency assessments. Princeton, NJ: Educational Testing Service (ETS).

Section 2: Finding your best assessment: Learning progressions

All educational assessments, from lengthy high-stakes summative exams to quick skill checks, share one thing in common: They provide a measure of student learning—knowledge, skill, attitude, and/or ability—at a specific moment in time. When you string together the results of multiple test administrations, you can sometimes zoom out and get a slightly bigger picture: You can see where the student is now, understand where they have been in the past, and perhaps even get a sneak peek of where they may go in the future.

But how do you get the full picture? Not just where students are predicted to go, but what their ultimate destination looks like and how they can get there? Furthermore, how do you make that assessment data actionable so you can use it to help students move forward? The answer is a learning progression. A learning progression is the critical bridge that connects assessment, instruction, practice, and academic standards. However, while the right learning progression can help educators and students find a clear path to success, the wrong learning progression may lead them astray and make it that much harder for them to achieve their goals.

Takeaways for Education Leaders
Learning progressions are roadmaps for the learning journey, showing how students develop greater understanding over time. They arrange skills in a teachable order that educators can use to inform instruction.
When an assessment is linked with a learning progression, a student’s assessment score can place them into the learning progression—revealing which skills they are likely ready to learn next. Some learning progressions also come with a wealth of resources, such as detailed descriptions or examples of each skill, approaches to teaching the skill for students with different learning needs (such as English Language Learners), and even instructional materials to help teach the skill as well as activities that give students practice with the skill.

When evaluating assessments, look for tests that are linked to a high-quality learning progression: One that is based on research, built and reviewed by experts, empirically validated using real-world data, and continually updated. It’s also important to select a learning progression that is aligned with your state’s specific academic standards, or else the learning progression may place skill instruction in the wrong grade.

[Figure: Learning progressions show the way forward]

In this section, we’ll examine what a learning progression is, how it can be used, and why it’s so important to find a learning progression that truly fits your specific needs.

What are learning progressions?

Learning progressions are essentially roadmaps for the learning journey. If academic standards represent waypoints or landmarks along a student’s learning journey (what is learned), with college and career readiness being the ultimate destination, then learning progressions describe possible paths the student might take to get from one location to another (how learning progresses). Dr. Karin Hess, an internationally recognized educational researcher, described learning progressions as “research-based, descriptive continuums of how students develop and demonstrate deeper, broader, and more sophisticated understanding over time. A learning progression can visually and verbally articulate a hypothesis about how learning will typically move toward increased understanding for most students.”

In simplified terms, learning progressions provide a teachable order of skills. They differ from scopes and sequences in that learning progressions are networks of ideas and practices that reflect the interconnected nature of learning. Learning progressions incorporate both the vertical alignment of skills within a domain and the horizontal alignment of skills across multiple domains. For example, learning progressions recognize that the prerequisites for a specific skill may be in a different domain entirely. A learning progression may also show that skills that at first appear unrelated are often learned concurrently.

Learning progressions can come in all shapes and sizes; they could describe only the steps from one standard to another, or they could encompass the entirety of a child’s pre-K–12 education. It is critically important to note that there is no one “universal” or “best” learning progression. Think of it this way: With mapping software, there are often multiple routes between two points. The same is true with learning; even the best learning progression will represent only one of many possible routes a student could take. Good learning progressions will describe a “commonly traveled” path that generally works for most students.
Great learning progressions will put the educator in the driver’s seat and allow her to make informed course adjustments as needed.

[Figure: Learning progressions identify which skills act as prerequisite “building blocks” for future learning]

How do I use a learning progression?

To get the most out of a learning progression, an educator must understand how learning progressions relate to assessment and instruction. Returning to our roadmap metaphor, academic standards describe the places students must go. Learning progressions identify possible paths students are likely to travel. Instruction is the fuel that moves students along those paths. And assessment is the GPS locator—it tells educators exactly where students are at a specific moment in time. (Great assessments will also show how fast those students are progressing, if they’re moving at comparable speeds to their peers, and whether they’re accelerating or decelerating—and may even predict where they’ll be at a future time and if they’re on track to reach their learning goals.)

When an assessment is linked with a learning progression, a student’s score places them within the learning progression. While the assessment can identify which skills a student has learned or even mastered, the learning progression can provide the educator with insights into which skills a student is likely ready to learn. It answers the question, “What’s next?” In this way, learning progressions provide a way for assessments to do more than just report on learning that has occurred so far. As Margaret Heritage and Frederic Mosher have stated, “if assessments are seen as standing outside regular instruction, no matter how substantively informative and educative they are … they are very unlikely to be incorporated into and have a beneficial effect on teaching.” Learning progressions are how assessments get “inside” instruction. The combination of the two allows assessments to meaningfully inform instruction.

Educators can use learning progressions to plan or modify instruction to make sure that students are getting the right instruction and content at the right time. If an educator knows she needs to teach a specific skill, she can use the learning progression to look backward and see what prerequisites are needed for that skill. She may also seek out other skills that can be taught alongside her selected skill for more efficient and interconnected teaching, avoiding the repetitious learning that can occur when individual skills are taught in isolation. For a student who is lagging far behind peers and needs to catch up quickly, a learning progression can identify which skills are the “building blocks” that students need not just for success in their current grade but to succeed in future grades as well (for example, a student will struggle to progress in math if they cannot multiply by 5, but not knowing how to count in a base-5 or quinary numeral system is likely to be less of a roadblock). As a result, the teacher can better concentrate her time and instruction on these focus skills to help the student reach grade level more quickly.
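As a rough illustration of the “What’s next?” idea, the sketch below stores a hypothetical mini-progression as an ordered list of skills with score thresholds and uses a student’s scaled score to separate skills they have likely mastered from skills they are likely ready to learn. The skills, thresholds, score scale, and the notion of a “ready band” are invented for this example and do not come from any real learning progression or any state’s standards.

```python
# Hypothetical mini-progression: skills in teachable order, each tagged with the
# scaled score at which students are typically ready to learn it.
PROGRESSION = [
    ("count objects to 20", 100),
    ("add within 20", 200),
    ("subtract within 20", 230),
    ("multiply one-digit numbers", 400),
    ("divide with remainders", 480),
    ("add fractions with like denominators", 560),
]

def place_student(score: int, ready_band: int = 100):
    """Split the progression into skills likely mastered vs. likely ready to learn."""
    mastered = [skill for skill, threshold in PROGRESSION if threshold <= score]
    ready = [
        skill for skill, threshold in PROGRESSION
        if score < threshold <= score + ready_band
    ]
    return mastered, ready

mastered, ready = place_student(450)
print("Likely mastered:", mastered)   # counting through one-digit multiplication
print("Likely ready to learn next:", ready)  # division with remainders
```

A real system would also attach resources, prerequisite links across domains, and state-standard alignments to each skill; the point here is only how a single score, once mapped to a progression, can suggest a next step.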
Well-designed learning progressions are often paired with useful resources, such as detailed descriptions or examples of each skill, approaches to teaching the skill for students with different learning needs (such as English Language Learners), information about how the skill relates to domain-level expectations and state standards, and even instructional materials to help teach the skill as well as activities that give students practice with the skill.

Having assessments and resources linked directly to skills within the larger context of the learning progression is ideal for personalized learning. It allows educators to quickly see where students are in relation to one another and to the larger learning goals, determine the next best steps to move each student’s learning forward, and find tailored resources that match a student’s specific skill level and learning needs—all while ensuring that every student is consistently moving toward the same set of grade-level goals, even if they are on different paths.

However, not all learning progressions are well designed—and even an exceptionally well-designed learning progression may lead you and your students astray if it’s not designed to meet your specific needs and goals.

How do I find the right learning progression?

The key is finding a learning progression that’s not just well designed—it should also be designed in a way that matches how you will implement and use it with students. So what should you look for in a learning progression? First, it’s very unlikely that you will be selecting a learning progression as a stand-alone resource. Your learning progression should be tightly linked to your assessment—otherwise you will have no reliable way to place your students in the learning progression. This means that evaluating learning progressions should be a non-negotiable part of your overall assessment selection process.

(What about assessments without a learning progression? Remember that learning progressions are how assessment data gets “inside” instruction; without a learning progression, it is difficult for assessment to meaningfully inform instruction. Outside of summative assessments—designed to report on past learning rather than guide future instruction—we cannot recommend investing in assessments that are not linked to high-quality learning progressions.)

With that in mind, look for a high-quality learning progression that is:
• Based on research about how students learn and what they need to learn to be ready for the challenges of college, career, and citizenship
• Built and reviewed by educational researchers and subject matter experts, with guidance and advice from independent consultants and content specialists
• Empirically validated using real-world student data to make sure the learning progression reflects students’ actual (observed) order of skill development and that assessment scores are appropriately mapped to the learning progression
• Continually updated based on new research, changes in academic standards, ongoing data collection and validation efforts, and observations from experts in the field

[Figure: The foundation of a high-quality learning progression]

If you would not use a map of California to navigate your way around Virginia, why would you use a Common Core learning progression if your state doesn’t use Common Core?
However, even the highest-quality learning progression in the world may be a poor choice if it’s not actually a good fit for your specific needs. In the United States, one of the biggest and most important factors impacting fit is state standards. Just as there is frequently more than one route to a single destination, there are many skills that can be taught and learned in different, equally logical orders. Different states often choose different orders for the same skills, some add skills that others remove, and many call the same skill by different names. Each learning progression presents one possible order or “roadmap” of learning—but that doesn’t mean it’s the only order out there or that it necessarily matches the order found in your state’s standards. Using a learning progression that is built for another state can be a bit like trying to navigate your way around one state using a map of a completely different state. You may find it quite difficult to reach your destination in time!

Imagine that a teacher in Virginia, whose students must meet the Virginia Standards of Learning, tries to rely on a learning progression aligned with California Common Core State Standards. Even if the Common Core learning progression is of exceptionally high quality, it will be a poor fit for her needs. For example, Virginia students are expected to be able to describe what a median is and to calculate it for any given number set by the end of fifth grade. In Common Core states, that skill is part of the sixth-grade standards.

If our Virginia teacher follows the Common Core learning progression, she will fail to teach her kids a key aspect of her state’s standards and leave them unprepared for their high-stakes math test at the end of the school year. Alternatively, she could compare the Common Core learning progression to the Virginia standards and try to keep track of all the ways the two differ—but that’s a lot of extra work for a teacher who already has to squeeze so much into each school day. If she had a learning progression that already matched the order of the Virginia Standards of Learning, our Virginia teacher could use the learning progression to help all her students meet all the state’s standards, in the right order and in the right grade for Virginia.

Your learning progression should be a helpful tool that suggests a logical, easy-to-follow route to success with your state standards. It should not be an obstacle course that forces you to constantly zigzag between another state’s requirements and your own state’s standards.

Taking things one step further, your learning progression should support a flexible approach to learning. After all, you know your students better than any learning progression ever will. The learning progression doesn’t teach students. It won’t monitor their work to see if they understood the concept taught yesterday. It can’t decide what is a realistic goal for a student. Only you can do that. But your learning progression should make doing all of that easier.

Look for learning progressions that you can interact with—you should be able to look forward and backward within the learning progression to understand how skills can develop.
You will want to be able to search for specific skills or standards across the entire learning progression and then find related resources. Your learning progression should also allow you to bundle skills together for instructional purposes, or ungroup any skills to teach discretely if need be. After all, the learning progression is only the roadmap—you’re the one actually driving the car.

How do I find my best assessment?

There are a lot of factors that go into assessment selection—and those factors may differ depending on your school’s or district’s specific needs, goals, and initiatives. However, there are a few universal elements to consider. As we explored in part one of Finding your best assessment, you should consider the format of the assessment and the items within it. Are they the best option for the content you’re trying to measure? How quickly will you get results? Think about the reliability, validity, and return on investment. Does the assessment meet your reliability goals? Are you sacrificing a lot of instructional time for a little bit of extra reliability? Consider whether the assessment is appropriate for all your students. Will it work as well for your low-achieving and high-performing students as it does for your on-level students? Can you use it to measure progress over time, including over multiple school years?

Examine the learning progression as well. Does the assessment map to a learning progression? Is it a well-designed learning progression? Is the learning progression aligned to your specific state standards? Does it include additional resources to support instruction and practice? Can you easily interact with and navigate through the learning progression?

Once you’ve selected your best assessment and are ready to use it, you may encounter a different set of questions: When do I assess my students? How often? Why? In the next section, this guide will look at recommendations about assessment timing and frequency from the experts.

“There is no single, universally accepted and absolutely correct learning progression.” — W. James Popham

This section is part of Tests and Time: An Assessment Guide for Education Leaders, a six-part guide that examines the factors that affect test length, reliability, and validity; the role of learning progressions in making assessment data actionable; critical questions educators should ask before scheduling assessments; and a brief history of assessment innovations in the United States. Download the complete guide at www.renaissance.com/tests-and-time

Sources
Blythe, D. (2015). Your state, your standards, your learning progression. Renaissance Blog. Wisconsin Rapids, WI: Renaissance Learning. Retrieved from https://www.renaissance.com/2015/06/11/your-state-your-standards-your-learning-progression/
California Department of Education. (2014). California Common Core State Standards: Mathematics. Sacramento, CA: Author.
Hess, K. K. (2012). Learning progressions in K–8 classrooms: How progress maps can influence classroom practice and perceptions and help teachers make more informed instructional decisions in support of struggling learners (Synthesis Report 87). Minneapolis, MN: University of Minnesota. Retrieved from http://www.cehd.umn.edu/NCEO/OnlinePubs/Synthesis87/SynthesisReport87.pdf
Kerns, G. (2014). Learning progressions: Deeper and more enduring than any set of standards. Renaissance Blog. Wisconsin Rapids, WI: Renaissance Learning. Retrieved from https://www.renaissance.com/2014/10/02/learning-progressions-deeper-and-more-enduring-than-any-set-of-standards/
Mosher, F., & Heritage, M. (2017). A hitchhiker’s guide to thinking about literacy, learning progressions, and instruction (CPRE Research Report #RR 2017–2). Philadelphia, PA: Consortium for Policy Research in Education.
National Research Council (NRC). (2007). Taking science to school: Learning and teaching science in grades K–8. Washington, DC: The National Academies Press. Retrieved from https://doi.org/10.17226/11625
Popham, W. J. (2007). All about accountability / The lowdown on learning progressions. Educational Leadership, 64(7), 83-84. Retrieved from http://www.ascd.org/publications/educational-leadership/apr07/vol64/num07/The-Lowdown-on-Learning-Progressions.aspx
Renaissance Learning. (2013). Core Progress for math: Empirically validated learning progressions. Wisconsin Rapids, WI: Author.
Renaissance Learning. (2013). Core Progress for reading: Empirically validated learning progressions. Wisconsin Rapids, WI: Author.
Renaissance Learning. (2013). The research foundation for Star Assessments: The science of Star. Wisconsin Rapids, WI: Author.
Virginia Department of Education. (2016). Mathematics Standards of Learning for Virginia public schools. Richmond, VA: Author.
Wilson, M. (2018). Classroom assessment: Continuing the discussion. Educational Measurement, 37(1), 49-51.
Wilson, M. (2018). Making measurement important for education: The crucial role of classroom assessment. Educational Measurement, 37(1), 5-20.
Yettick, H. (2015, November 9). Learning progressions: Maps to personalized teaching. Education Week. Retrieved from https://www.edweek.org/ew/articles/2015/11/11/learning-progressions-maps-to-personalized-teaching.html

Section 3: Using your best assessment: Questions that help guide assessment

Once you’ve found a timely, trustworthy assessment with a well-designed learning progression, it’s time to start assessing your students! . . . or is it? Over-testing is a real concern in today’s schools. A typical student may take more than 100 district and state exams between kindergarten and high school—and that’s before you count the tests included in many curriculum programs and educator-created tests. How can you balance the assessments that help inform instruction with ensuring enough time is reserved for that instruction? When should you assess your students? Which students should you assess? How frequently? What factors should drive your assessment schedule?

Takeaways for Education Leaders
The assessment process isn’t just about gathering data; more critical is using that data to answer questions and make decisions. The questions you’re trying to answer can help you determine if you should assess students, which students to assess, what type of assessment to use, and when and how often to use it. Educators may find it helpful to organize questions—and associated assessments—into three broad categories:
• Discover questions to guide interim assessments, universal screeners, diagnostic tools, and similar tests. These questions help you discover your students’ specific strengths and unique needs prior to a new phase of learning.
• Monitor questions to guide progress monitoring and formative assessment. These questions help you monitor student progress during the current phase of learning.
• Verify questions to guide summative assessments. These questions help you decide if students should advance and if your core instructional and supplemental programs are working.
If you do not have a specific question in mind that an assessment will answer, consider not assessing students and instead devoting the time to instruction, practice, or other learning activities.

These questions often challenge educators, in part because there are no universal answers save “It depends.” It depends on the specific needs and goals of your school or district. It depends on your students, the challenges they face, and the strengths they can build on. It depends on how your team approaches teaching and learning. It depends on so much. But that doesn’t mean there aren’t any answers at all. We looked at research, consulted with experts, and found a few insights we think will help you discover the right rhythm for your assessment schedule.

Let your assessment questions guide you

The assessment process isn’t just about gathering data—more critical is using that data to answer questions and make decisions. Education professors John Salvia, James E. Ysseldyke, and Sara Witmer define assessment as “the process of collecting information (data) for the purpose of making decisions for or about individuals.” It follows that if there are no decisions to be made, there is no reason to gather the data. Eric Stickney, an educational researcher at Renaissance®, summarized this simply as “Never give an assessment without a specific question in mind.”

[Figure: The questions you have will help determine the type, timing, and frequency of assessment]

The questions you’re trying to answer will help determine if you should assess students, which students to assess, what type of assessment to use, and when and how often to use it. For the purposes of this guide, we’re grouping assessment-related questions into three broad (and sometimes overlapping) categories: the “discover” questions that most frequently arise before learning begins, the “monitor” questions that occur during learning, and the “verify” questions that happen after a stage of learning has ended.

Assessing to discover your students

Sometimes you have questions about what your students have learned—but sometimes you have questions about the students themselves. These latter questions arise most frequently before learning or a new phase of learning (e.g., new school year, new semester, or new unit) has begun. Because they’re generally broad, open-ended questions that educators ask to learn more about their students or uncover issues that may otherwise have gone unnoticed, we can think of them as “discover” questions.

Discover questions—and the assessment data that answers them—can help educators identify the unique traits of an individual student or understand the composition of a group of students. At the individual level, discover questions include:
• Where is the student on my state-specific learning progression?
• Is this student ready for grade-level instruction?
• Is he on track to meet state standards?
• What are his strengths or weaknesses?
• What learning goals should I set for this student?
• Is additional, more targeted, testing needed?
• Is the student a candidate for intervention?
• Which skills is he ready to learn?
• How does he compare to his peers?
• Is he achieving typical growth?
• Did he experience summer learning loss?

At the group level, discover questions include:
• What percentage of students are on track to meet grade-level goals?
• How many will likely need additional support to meet learning goals?
• What percentage qualify for intervention?
• How many should be further evaluated for special education service needs?
• How should we allocate current resources? Are additional resources needed?
• How do specific subgroups, such as English Language Learners, compare to the overall student population?
• Are there achievement gaps between different student groups?

[Figure: Assessing before learning helps educators discover students’ unique needs]

There are several types of assessment that can be used to answer discover questions, including interim assessments, benchmark assessments or benchmarking assessments, universal screeners or universal screening assessments, diagnostic assessments, and periodic assessments. (Note “periodic assessments” aren’t actually an assessment type per se, but rather a description of how an assessment is used: periodically.) One thing to remember is that most of these terms describe how an assessment and its results are used rather than its content or format—one test can often serve multiple purposes! While this may make things confusing when trying to distinguish between the types, there is also a huge benefit to both educators and students: When one assessment can serve multiple functions, students can take fewer tests overall and more time can be devoted to instruction.

How often you use these assessments depends on how often you find yourself asking discover questions and how often you think the answers may change. If you’re not asking questions or if not enough time has passed for the answers from the last assessment administration to have changed, then it’s probably not ideal to assess students again. In addition, if you’re not prepared to act on the assessment data or if you’re unable to make changes based on assessment data, then we would caution against assessing your students. Assessment for assessment’s sake is not a practice we can recommend; the ultimate goal of assessment should be to inform educators and benefit student learning.

Although only you can know the assessment timing and frequency that’s right for your students, many schools and districts opt to assess all students two to five times a year for “discover” purposes. Among these, most will assess near the beginning of the school year (which is especially helpful if you have new students without detailed historical records or students who may have experienced summer learning loss) and then again around the middle of the year (to check for changes and ensure students are progressing as expected).
Some add another assessment after the first quarter or trimester (to make sure the year is off to a good start or to get better projections of future achievement), after the second trimester or third quarter (to ensure students are progressing as expected), and/or near the end of the school year (to provide a complete picture of learning over the year and a comparison point for the following school year). Some will also assess a specific subset of students more frequently. However, many times these assessments move from answering “discover” questions to answering “monitor” questions.

[Figure: One assessment can often serve multiple purposes]

Assessing to monitor ongoing learning

As the school year progresses, you may have questions about what and how your students are learning. This brings us to our second category of questions, the “monitor” questions, where you’re primarily looking to understand how students are progressing, if the instruction provided is working for them, or if changes need to be made. Whereas discover questions are more focused on the student before a new phase of learning begins, monitor questions are more focused on the student’s learning during the current phase of learning. Monitor questions include:
• How is the student responding to core instruction? Is it working?
• How is the student responding to supplemental intervention? Is it working?
• Is she progressing as expected?
• Is she on track to meet her learning goals? How do I help her reach those goals?
• Is she experiencing typical, accelerated, or slowed growth?
• How does she compare to her peers?
• Is she learning the new skills being taught?
• Is she retaining previously taught skills?
• Which skills is she ready to learn next?
• Does she need additional instruction or practice?
• Does she need special support, scaffolding, or enrichment?
• Should she stay in her current instructional group or move to a different one?

There are several approaches that can be used to address monitor questions. Perhaps the two most common are progress monitoring and formative assessment. Note that progress monitoring and formative assessment are not assessment types in and of themselves. Instead, they describe processes that use assessment as part of an overall approach to monitoring student learning and shaping instruction.

As with discover questions, how often you need answers to monitor questions will help decide how often to assess students. Once again, if you’re not asking questions or if you’re not able to act on the answers, reconsider whether your limited time is best spent assessing students. Don’t arbitrarily assess students just because a certain amount of time has passed. Depending on the rhythm of your instruction and how quickly students seem to grasp new skills, your formal assessment schedule may not be a regular one—perhaps you’ll assess one week, not assess for six, then assess every other week for a month.

[Figure: Assessing during learning helps educators monitor student progress and adjust instruction to keep students moving forward]
Similarly, if you only have questions about specific students (perhaps those identified as “at risk” or “on watch” by your universal screener or those currently in intervention), you may choose to only assess those students instead of the whole group. Alternatively, you could assess all students on a semi-regular basis and then choose to assess identified students on a more frequent basis. In all cases, the information gleaned from formative assessment may be very helpful in deciding which students should take formal assessments and how often they should take them.

For formal assessments that are not part of instruction, time is also a critical factor to consider both when choosing an assessment tool and when deciding how frequently to use that tool. Consider an intervention model where students have 2.5 hours of intervention time per week. If your formal assessment takes one hour, using it biweekly means losing 20% of your intervention’s instructional time! Although you may want fresh insights every few weeks, you might decide the time sacrifice is too great and assess only once per month. Alternatively, if your assessment tool takes only 20 minutes, you could assess every two or three weeks and lose very little instructional time. In short, consider assessment length when making timing and frequency decisions. (See the first section of this guide for more about the factors that affect assessment length, reliability, and validity.)
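The time trade-off in that example can be worked out directly. The sketch below uses the numbers from the paragraph above (2.5 hours of weekly intervention time and a one-hour assessment given every two weeks) alongside the shorter hypothetical 20-minute tool; the four-week month is a simplifying assumption.

```python
def share_of_time_lost(assessment_minutes: float, times_per_month: float,
                       instructional_minutes_per_week: float) -> float:
    """Fraction of available instructional time spent on formal assessment."""
    monthly_instruction = instructional_minutes_per_week * 4  # assume a 4-week month
    return (assessment_minutes * times_per_month) / monthly_instruction

weekly_intervention = 2.5 * 60  # 2.5 hours of intervention per week, as in the example

# A 60-minute assessment given twice a month vs. a 20-minute assessment twice a month.
print(share_of_time_lost(60, 2, weekly_intervention))  # 0.2   -> 20% of intervention time
print(share_of_time_lost(20, 2, weekly_intervention))  # ~0.07 -> roughly 7%
```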
• Is the student ready to progress to the next stage of learning?
• Did he reach his learning goals?
• How does he compare to his peers?
• How does my group of students, as a whole, compare to state or national norms?
• Did enough students master required standards?
• How effective are our core instructional and supplemental intervention services?
• Are there patterns of weakness among students that indicate a change in curriculum or supplemental program may be needed?

Of all the assessment types, summative assessments may be the easiest to schedule. In many cases, your state or district has already determined the frequency and timing for you. For the summative assessments that you do schedule, ask yourself two questions. First, “Is this phase of learning complete?” If it is, then ask, “Do I need an overall measure of learning for this phase?” If both answers are yes, you may want to administer a summative assessment. If the first answer is no, you may want to delay the summative assessment until the phase of learning is done. If the second answer is no, you may choose to skip a summative assessment entirely—not all phases of learning need to be verified by one.

Finding the balance between assessing and learning

Every time you set aside time for assessment, the time available for instruction shrinks. However, the insights provided by assessments can be critical for improving teaching and learning. Instruction without assessment can sometimes be like driving with a blindfold on—you might not know you’ve gone off course until it’s too late. The key is finding the right balance between the two.

To help find that balance, keep these key questions in mind whenever you think about assessing students:

• Do I have a specific question in mind that this assessment will help answer?
• Will those answers help me improve teaching and learning—is there a high-quality learning progression that helps the assessment connect with my instruction and curriculum?
• Do I have time to meet with the team and make plans based on the assessment data?
• Will I be able to make changes to instruction or content based on the assessment?
• Is assessment the best use of this time—or would it be better to dedicate it to instruction or practice?
• Could a shorter assessment answer my questions in less time—and thus preserve more time for teaching and learning?

The truth is that no one knows your students like you do. Only you can determine which assessment is best for your needs. Only you can decide the right timing and frequency for that assessment. Only you can find the right balance. Later in this guide, you’ll find a checklist to help you find your best assessment and learning progression, as well as decide when and how often to best use those resources.
Sources

Black, P., Wilson, M., & Yao, S. (2011). Road maps for learning: A guide to the navigation of learning progressions. Measurement: Interdisciplinary Research and Perspectives, 9, 71-123.
Center on Response to Intervention. (n.d.). Progress monitoring. Retrieved from: https://www.rti4success.org/essential-components-rti/progress-monitoring
Center on Response to Intervention. (n.d.). Universal screening. Retrieved from: https://rti4success.org/essential-components-rti/universal-screening
Council of Chief State School Officers (CCSSO). (2012). Distinguishing formative assessment from other educational assessment labels. Washington, DC: Author. Retrieved from: https://www.michigan.gov/documents/mde/CCSSO_Assessment__Labels_Paper_ada_601108_7.pdf
Fuchs, L. S. (n.d.). Progress monitoring within a multi-level prevention system. RTI Action Network. Retrieved from: http://www.rtinetwork.org/essential/assessment/progress/mutlilevelprevention
Great Schools Partnership. (n.d.). The glossary of education reform. Retrieved from: https://www.edglossary.org/
Herman, J. L., Osmundson, E., & Dietel, R. (2010). Benchmark assessment for improved learning: An AACC policy brief. Los Angeles, CA: University of California.
Lahey, J. (2014, January 21). Students should be tested more, not less. The Atlantic. Retrieved from: https://www.theatlantic.com/education/archive/2014/01/students-should-be-tested-more-not-less/283195/
Lazarin, M. (2014). Testing overload in America’s schools. Washington, DC: Center for American Progress.
Paul, A. M. (2015, August 1). Researchers find that frequent tests can boost learning. Scientific American. Retrieved from: https://www.scientificamerican.com/article/researchers-find-that-frequent-tests-can-boost-learning/
Renaissance Learning. (2013). The research foundation for Star Assessments: The science of Star. Wisconsin Rapids, WI: Author.
Salvia, J., Ysseldyke, J. E., & Witmer, S. (2016). Assessment in special and inclusive education. Boston, MA: Cengage Learning.
Stecker, P. M., Lembke, E. S., & Foegen, A. (2008). Using progress-monitoring data to improve instructional decision making. Preventing School Failure, 52(2), 48-58.
The Center on Standards & Assessment Implementation. (n.d.). Overview of major assessment types in standards-based instruction. Retrieved from: http://www.csai-online.org/sites/default/files/resources/6257/CSAI_AssessmentTypes.pdf
Wilson, M. (2018). Making measurement important for education: The crucial role of classroom assessment. Educational Measurement: Issues and Practice, 37(1), 5-20.

Testing timeline: A brief history of American assessment innovations

Today is test day. In a quiet classroom, children stare at the preprinted sheets in front of them. Some students squirm; they’re nervous. The assessment is timed, and they only have 60 minutes to answer all of the questions before them. It’s a lot of pressure for kids who are, for the most part, only 12 or 13 years old. Over the next several weeks, children all over the city will experience similar levels of anxiety as they take identical tests.

Their teacher is nervous too. This test will be used to judge not only students’ abilities but also the quality of their schooling. The results will be public.
Parents will discuss them—and so will administrators, legislators, and other school authorities. In short, there’s a lot riding on this test. When the reports finally roll in that summer, the scores are dismal. On average, students answer only 30% of the questions correctly. Citizens are in shock. Newspapers are packed with articles and letters to the editor, some attacking and some praising the results. People passionately debate the value of the assessment.

Pop quiz: What year is it?

You might think it was 2015. That year, thousands of eighth-grade students across the country took the National Assessment of Educational Progress (NAEP) reading assessment, a paper-and-pencil assessment that took up to 60 minutes for students to complete. In 2015, only 34% of eighth graders scored proficient on NAEP, triggering an onslaught of media coverage debating the quality of American education—as well as the quality of the tests themselves.

In reality, it’s 1845, and these children are taking America’s first mandated written assessment. It’s the first time external authorities have required that students take standardized written exams in order to measure their ability and achievement levels, but it won’t be the last. Over the next 170 years, standardized testing will become a widespread and, well, standard part of American education.

Let’s briefly explore how assessment has evolved—and, in many ways, stayed the same—between 1845 and today. As a comparison, we’ll note major innovations in transportation along the way.

In 1845, the first reported mandated written assessment in the United States takes place in Boston, Massachusetts. While only 530 students take this first assessment, thousands follow in their footsteps as standardized written assessments spread across the country in the decades following. The same year, Robert William Thomson patents the first vulcanized rubber pneumatic tire—the type of tire now used on cars, bicycles, motorcycles, buses, trucks, heavy equipment, and aircraft. At this point, however, only the bicycle has been invented—and it still uses wooden wheels banded with iron.

While educators have always adapted to meet the needs of their students (consider the Socratic method of tailoring questions according to a student’s specific assertions, which has been around for more than two thousand years), the first formal adaptive test does not appear until 1905. Called the Binet-Simon intelligence test—and commonly known today as an intelligence quotient, or IQ, test—it features a variable starting level. The examiner then selects item sets based on the examinee’s performance on the previous set, providing a fully adaptive assessment. Just two years earlier, in 1903, Orville and Wilbur Wright made history with the first flight of their airplane in Kitty Hawk, North Carolina. By 1905, the brothers are already soaring around in the Wright Flyer III, sometimes called the first “practical” fixed-wing aircraft.

Exactly seven decades after the first mandated written assessment in Massachusetts, the first multiple-choice tests are administered in Kansas in 1915. These three tests—one for grades 3–5, one for grades 6–8, and one for high school—are collectively known as the “Kansas Silent Reading Tests.” Devised the year prior by Dr. Frederick J. Kelly, each test consists of 16 short paragraphs and corresponding questions.
Students have five minutes to read and answer as many questions as possible. Standardization and speed seem to be hot topics in this decade: only a few years earlier, in 1913, Henry Ford installed the first moving assembly line for the mass production of cars.

One of the most famous standardized academic assessments in the world is born when the Scholastic Aptitude Test (SAT) is first administered in 1926. Students have a little more than an hour and a half—97 minutes, to be exact—to answer 315 questions about nine subjects, including artificial language, analogies, antonyms, and number series. Interestingly, the SAT comes after the similarly named Stanford Achievement Test, which was first published in 1922 (to differentiate the two, the Stanford tests are known by their edition numbers, the most recent version being the “Stanford 10” or “SAT-10”). In 1927, just one year after the first SAT, the Sunbeam 1000 HP Mystery becomes the first car in the world to travel over 200 mph. The same year, production of the iconic Ford Model T comes to an end after more than 15 million cars have rolled off the assembly line.

Although multiple-choice tests were invented two decades earlier, it’s not until 1936 that they can be scored automatically. That year, the IBM 805 Test Scoring Machine sees its first large-scale use for the New York Regents exam. Experienced users can score around 800 answer cards per hour—the speed limited not by the machine itself but by the operator’s ability to insert cards into the machine and record the scores. Meanwhile, the world of transportation is introduced to the first practical jet aircraft: the Heinkel He 178 becomes the world’s first turbojet-powered aircraft to take flight in 1939.

The SAT’s main rival is born in 1959, when the first American College Testing (ACT) exam is administered. Each of its four sections—English, mathematics, social studies, and natural sciences—takes 45 minutes to complete, for a total test time of three hours. That same year, in the skies above, the turbojet powers a new airspeed record as the Convair F-106 Delta Dart becomes the first aircraft to travel faster than 1,500 mph.

Although planning began in 1964, the first National Assessment of Educational Progress (NAEP) takes place in 1969. Instead of today’s more well-known reading and math assessments, the first NAEP focuses on citizenship, science, and writing. It combines paper-and-pencil tests with interviews, cooperative activities, and observations of student behavior. There are no scores; NAEP only reports the percentage of students who could answer a question or complete an activity. Also in 1969, Neil Armstrong and Edwin “Buzz” Aldrin become the first humans to set foot on the moon. A few months later, Charles Conrad and Alan Bean become the third and fourth individuals to take a stroll on the lunar surface. Back on earth, the first Boeing 747 takes flight.

It’s hard to pinpoint the very first computerized adaptive test (CAT): a few claim David J. Weiss develops the first one in either 1970 or 1971; others give this honor to Abraham G.
Bayroff of the US Army Behavioral Research Laboratory, who experimented with “programmed testing machines” and “branching tests” in the 1960s; and some point earlier still to the work of the Educational Testing Service (ETS) in the 1950s. Regardless, computerized adaptive testing gains great momentum in the 1970s. In 1975, the first Conference on Computerized Adaptive Testing takes place in Washington, DC. By the end of the decade, the US Department of Defense has started investigating the possibility of large-scale computerized adaptive testing. In the middle of the decade, in July 1976, the Lockheed SR-71 Blackbird shoots across the sky at a whopping 2,193 mph—setting an airspeed record that has yet to be broken.

Computerized adaptive tests start moving out of the laboratory and into the real world. One of the first operational computerized adaptive testing programs in education is the College Board’s ACCUPLACER college placement tests. In 1985, the four tests—reading comprehension, sentence skills, arithmetic, and elementary algebra—are used in a low-stakes environment to help better place students into college English and mathematics courses. For some students, the ACCUPLACER might be their first experience with a computer—but not for all of them. In 1981, IBM introduced its first personal computer, the IBM 5150. A few years later, in 1984, Apple debuted the first Macintosh. This decade also sees the Rutan Voyager fly around the globe without stopping or refueling, making it the first airplane to do so. The 1986 trip takes the two pilots nine days and three minutes to complete.

After nearly 20 years of research and development, the computerized adaptive version of the Armed Services Vocational Aptitude Battery—more commonly known as the CAT-ASVAB—earns the distinction of being the first large-scale computerized adaptive test to be administered in a high-stakes setting.* First implemented in 1990 at a select number of test sites, the CAT-ASVAB goes nationwide in 1996, in part thanks to its reduced testing time and lower costs in comparison to the paper-and-pencil version (called the P&P-ASVAB). Today, the CAT-ASVAB takes about half the time (1.5 hours) of the P&P-ASVAB (3 hours). (1996 also sees the advent of the first Renaissance Star Reading® assessment, a computerized adaptive test that quickly measures students’ reading levels.) Another brainchild of the 1970s also comes to fruition in this decade: the Global Positioning System (GPS) is declared fully operational in 1995, with 27 satellites orbiting the globe.

The new millennium ushers in a new era of American testing. The No Child Left Behind Act of 2001 (NCLB) mandates state testing in reading and math annually in grades 3–8 and once in high school. While the Improving America’s Schools Act of 1994 (IASA) had previously required all states to develop educational standards and assess students, not all states were able to comply—and those that did not faced few consequences. This time, things are different: states must comply with NCLB or risk losing their federal funding. The new millennium also sees GPS come to consumer electronics when the United States decides to stop degrading GPS signals used by the public. For the first time, turn-by-turn navigation is possible for civilians. This decade also sees the introduction of Facebook (2004), YouTube (2005), the iPhone (2007), and the Tesla Roadster (2008).
*Some claim this honor should go to the Novell corporation’s certified network engineer (CNE) examination, the Educational Testing Service’s (ETS) Graduate Record Examination (GRE), or the National Council of State Boards of Nursing’s (NCSBN) NCLEX nursing licensure examination, all of which debuted computerized adaptive tests in the early 1990s.

Have we reached testing overload?

A 2014 report titled Testing Overload in America’s Schools finds that average students in grades 3–5 spend 15 hours taking district and state exams each year. Students in grades 6–8 spend even more time, with 16 hours each year spent on standardized assessments. On average, students in grades 3–8 take 10 standardized assessments each year, although some are found to take as many as 20 standardized tests in a single year. Their younger and older counterparts generally take 6 standardized tests per year, totaling four hours per year in grades K–2 and nine hours per year in grades 9–12.

This means a typical student may take 102 standardized tests before graduating high school, and some will take many more than that!
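As a quick check on that figure, here is a small sketch (ours, not from the report) that reproduces the 102-test estimate from the per-grade averages cited above.

# Reproducing the "102 standardized tests" estimate from the averages above:
# roughly 6 tests per year in grades K-2 and 9-12, and 10 per year in grades 3-8.
tests_per_year = {}
for grade in ["K", "1", "2", "9", "10", "11", "12"]:
    tests_per_year[grade] = 6
for grade in ["3", "4", "5", "6", "7", "8"]:
    tests_per_year[grade] = 10

print(sum(tests_per_year.values()))  # 7 * 6 + 6 * 10 = 102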
Over 170 years ago, it took more than three weeks to get from New York to Los Angeles by train and one hour to finish the country’s first mandated written exam. Today the trip requires less than six hours in an airplane, but many assessments still take an hour or longer—and students take many more tests than they used to. Meanwhile, self-driving cars navigate city streets, flying drones deliver groceries to customers’ doors, the Curiosity rover is taking selfies on Mars, and you can order almost anything—from almost anywhere in the world—right from your phone.

But things are changing. The passage of the Every Student Succeeds Act (ESSA) in 2015—which replaces NCLB—doesn’t eliminate mandated assessments, but it does offer states new levels of flexibility and control over their assessments. Around the same time, states across the nation reconsider the benefits and drawbacks of mandated assessments. Several eliminate high school graduation exams. Some limit the amount of time districts can devote to testing. Others discontinue achievement tests for specific grades or subjects. A few allow parents and guardians to opt their children out of some or even all standardized exams.

When evaluating assessments, keep in mind the history of assessment in the United States—and all of the technological innovations over the years and the great leaps in learning science that have made it possible to create shorter tests that still provide educators with meaningful data.

Sources

Assessment Systems. (2017, January 11). A history of adaptive testing from the father of CAT, Prof. David J. Weiss [Video file]. Retrieved from: https://www.youtube.com/watch?v=qb-grX8oqJQ
Bayroff, A. G. (1964). Feasibility of a programmed testing machine (BESRL Research Study 6U-3). Washington, DC: US Army Behavioral Research Laboratory.
Bayroff, A. G., & Seeley, L. C. (1967). An exploratory study of branching tests (Technical Research Note 188). Washington, DC: US Army Behavioral Research Laboratory.
Beeson, M. F. (1920). Educational tests and measurements. Colorado State Teachers College Bulletin, 20(3), 40-53.
Fletcher, D. (2009, December 11). Brief history: Standardized testing. Time. Retrieved from: http://content.time.com/time/nation/article/0,8599,1947019,00.html
Gamson, D. A., McDermott, K. A., & Reed, D. S. (2015). The Elementary and Secondary Education Act at fifty: Aspirations, effects, and limitations. RSF: The Russell Sage Foundation Journal of the Social Sciences, 1(3), 1-29. Retrieved from: https://muse.jhu.edu/article/605398
IACAT. (n.d.). First adaptive test. Retrieved from: http://iacat.org/node/442
IBM. (n.d.). Automated test scoring. Retrieved from: http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/testscore/
Jacobsen, R., & Rothstein, R. (2014, February 26). What NAEP once was, and what NAEP could once again be. Economic Policy Institute. Retrieved from: https://www.epi.org/publication/naep-naep/
Lazarin, M. (2014). Testing overload in America’s schools. Washington, DC: Center for American Progress.
Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing. College Board. Retrieved from: https://files.eric.ed.gov/fulltext/ED562580.pdf
McCarthy, E. (2014, March 5). Take the very first SAT from 1926. Mental Floss. Retrieved from: http://mentalfloss.com/article/50276/take-very-first-sat
McGuinn, P. (2015). Schooling the state: ESEA and the evolution of the US Department of Education. RSF: The Russell Sage Foundation Journal of the Social Sciences, 1(3). Retrieved from: https://www.rsfjournal.org/doi/full/10.7758/RSF.2015.1.3.04
National Center for Education Statistics (NCES). (2012). NAEP: Measuring student progress since 1964. Retrieved from: https://nces.ed.gov/nationsreportcard/about/naephistory.aspx
National Center for Education Statistics (NCES). (2017). Timeline for National Assessment of Educational Progress (NAEP) assessments from 1969 to 2024. Retrieved from: https://nces.ed.gov/nationsreportcard/about/assessmentsched.aspx
Pommerich, M., Segall, D. O., & Moreno, K. E. (2009). The nine lives of CAT-ASVAB: Innovations and revelations. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from: https://pdfs.semanticscholar.org/dab4/36470022fa4819d8d6256727ff869aaf58cb.pdf
Reese, W. J. (2013). Testing wars in the public schools: A forgotten history. Cambridge, MA: Harvard University Press.
Seeley, L. C., Morton, M. A., & Anderson, A. A. (1962). Exploratory study of a sequential item test (BESRL Technical Research Note 129). Washington, DC: US Army Behavioral Research Laboratory.
Strauss, V. (2017, December 6). Efforts to reduce standardized testing succeeded in many school districts in 2017. Here’s why and how. The Washington Post. Retrieved from: https://www.washingtonpost.com/news/answer-sheet/wp/2017/12/06/efforts-to-reduce-standardized-testing-succeeded-in-manyschool-districts-in-2017-heres-why-and-how/?utm_term=.bf6c68cbe156
US Congress Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions (Publication No. OTA-SET-519). Washington, DC: US Government Printing Office.
van der Linden, W. J., & Glas, C. A. W. (2000). Computerized adaptive testing: Theory and practice. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Winship, A. E. (Ed.). (1917). Educational news: Kansas. New England and National Journal of Education, 85(21), 582-586.
Action steps: Checklist for evaluating assessments

From reliability curves to learning progressions, this guide has covered a lot of ground. To make it easier to put this information into action, we’ve compiled a checklist that you may find helpful when considering a new assessment solution or reviewing an existing one. These yes/no questions will help you determine whether an assessment fits your needs. “Yes” answers indicate an assessment that is a good fit; “no” answers indicate a gap where the assessment is a poor fit. There are a few critical questions where a “no” should immediately disqualify an assessment as an option; they are marked as such. While this is not an exhaustive list of all possible factors to consider, asking yourself these questions before buying a new assessment or renewing your contract or subscription for a current assessment is a great way to start finding your best assessment.

For each question, mark Yes or No, then tally the Yes answers for a total score. A short sketch after the checklist shows one way to do the tallying.

Item format and scoring
• Is this item format suitable for the content you will be assessing? (If no, eliminate this assessment as an option.)
• Is this the best item format for the content you will be assessing?
• Are the items written in a way that minimizes rereading and overall testing time?
• Are the items automatically scored by computer?
• Are results available immediately after the assessment is completed?

Reliability and validity
• Does the assessment meet your minimum reliability standards? (If no, eliminate this assessment as an option.)
• Does the assessment meet your minimum validity standards? (If no, eliminate this assessment as an option.)
• Among the assessment options with similar reliability, does this assessment offer the shortest administration time?
• Does this assessment offer the greatest “return on investment” when comparing reliability and validity against administration time?
• Is the test designed to minimize distractions—such as fatigue, boredom, or stress—that could cause artificially low scores?
• Does the assessment accurately predict performance on other measures, such as year-end summative state tests?

Computer-adaptive assessments
• Is the assessment computer adaptive?
• Will the assessment adapt up or down as much as needed to pinpoint a student’s specific skill level?
• Can you use the same assessment multiple times during the school year—or even over multiple years?

Learning progressions
• Is the assessment linked to a high-quality learning progression? (If no, eliminate this assessment as an option.)
• Is the learning progression aligned to your state standards?
• Can educators easily interact with and navigate the learning progression to see prerequisite skills, skills that can be taught concurrently, and future skills?
• Is the learning progression paired with resources to help teach skills, such as instructional materials or practice activities?

Fit for purpose
• Does the assessment support a flexible administration schedule—can it be used on an “as-needed” basis, without advance scheduling?
• Does the assessment provide easy-to-read reports that you can use to track student achievement and growth over time?
• Will the assessment provide the kind of data you need to answer the questions you have about your students?
• Will using the assessment help you improve teaching and learning? (If no, eliminate this assessment as an option.)
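If you keep this checklist in a spreadsheet or a short script, the scoring rule is simple: a “no” on any critical question rules the assessment out; otherwise, the number of “yes” answers becomes the comparison score. Below is a minimal sketch of that logic; it is ours rather than the guide’s, and the abbreviated sample questions are only illustrative.

from typing import List, Tuple

def score_checklist(answers: List[Tuple[str, bool, bool]]) -> Tuple[bool, int]:
    """Each answer is (question, is_critical, answered_yes).
    Returns (still_a_candidate, total_yes_count)."""
    for question, is_critical, answered_yes in answers:
        if is_critical and not answered_yes:
            return False, 0  # a "no" on a critical question disqualifies the assessment
    return True, sum(1 for _, _, answered_yes in answers if answered_yes)

answers = [
    ("Item format suitable for the content?", True, True),
    ("Meets minimum reliability standards?", True, True),
    ("Results available immediately?", False, False),
    ("Linked to a high-quality learning progression?", True, True),
]
print(score_checklist(answers))  # (True, 3)

Comparing total scores is only meaningful among assessments that clear all of the critical questions.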
If you’d like to see how Renaissance Star Assessments® can meet your assessment needs, we encourage you to contact us. Please call us at 888-979-7950, email us at educatordevelopment@renaissance.com, or use the live chat feature on our website at www.renaissance.com.

Recommended reading: Books, articles, and more

Looking for more great resources? Here are the top recommendations from our academic advisors, psychometricians, and subject matter experts.

Renaissance Blog

Many sections of this guide were originally published on the Renaissance Blog. The Renaissance Blog is full of shareable tips and insightful thoughts on pre-K–12 education, with new posts published all the time. Here are some popular blog posts to get you started.

• The difference between proficiency and growth by Dr. Jan Bryan
• Giving meaning to test scores by Dr. Catherine Close
• Understanding the reliability and validity of test scores by Dr. Catherine Close
• The basics of test score reliability for educators by Dr. Catherine Close
• What educators need to know about measurement error by Dr. Catherine Close
• Rising to literacy challenges with effective universal screening by Dr. Denise Gibbs
• The law of the vital few (focus skills) by Dr. Gene Kerns
• No datum left behind: Making good use of every bit of educational data by Eric Stickney
• 4 questions learning data can help teachers answer by Renaissance

Additional Renaissance resources

In addition to the blog, Renaissance provides many other free resources for educators, from eBooks and guides to recorded webinars. Here are some highlights.

• EdWords™: Assessment Edition, an eBook of education terms
• The Perfect District Assessment System, an on-demand webinar by Rick Stiggins
• Giving Meaning to Test Scores, an on-demand webinar by Dr. Dianne Henderson
• Predicting Performance on High-Stakes Tests, an on-demand webinar by Eric Stickney
• How Learning Progressions Show the Way Forward, an on-demand webinar by Diana Blythe, Julianne Robar, and Rita Wright
• A School Leader’s Guide to Using Data to Enrich Classroom Teaching & Learning, a guide with more than 25 strategies for finding and using student data
• 81,000-student Florida district boosts achievement and state ranking with spot-on K–12 assessments, a success story about Lee County School District
• Better data, better decisions: Building an assessment portfolio to energize achievement, a success story about Hartford School District

Books

When you’re ready to dive deep into a topic, books are often one of the best options. These are the books and authors that helped inspire and inform this guide.
• Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement by John Hattie
• Developing Assessment-Capable Visible Learners, Grades K–12: Maximizing Skill, Will, and Thrill by Nancy Frey, John Hattie, and Douglas Fisher
• Unlocking Student Talent: The New Science of Developing Expertise by Robin J. Fogarty, Gene M. Kerns, and Brian M. Pete
• Assessment in Special and Inclusive Education by John Salvia, James E. Ysseldyke, and Sara Witmer
• Test Better, Teach Better: The Instructional Role of Assessment by W. James Popham
• Educational Assessment of Students by Susan M. Brookhart and Anthony J. Nitko
• Assessment and Student Success in a Differentiated Classroom by Carol Ann Tomlinson and Tonya R. Moon
• The Perfect Assessment System by Rick Stiggins
• The Promise and Practice of Next Generation Assessment by David T. Conley
• Using Formative Assessment to Enhance Learning, Achievement, and Academic Self-Regulation by Heidi L. Andrade and Margaret Heritage

Third-party resources

This guide wouldn’t be possible without the research, studies, and reports produced by hundreds of educators, education organizations, and others around the globe. Here are our favorites.

• The Glossary of Education Reform by the Great Schools Partnership
• Testing Overload in America’s Schools by Melissa Lazarin, published by the Center for American Progress
• Is This a Trick Question? A Short Guide to Writing Effective Test Questions by Ben Clay, published by the Kansas Curriculum Center
• Reviewing Your Options: The Case for Using Multiple-Choice Test Items by Michelle Croft, Gretchen Guffy, and Dan Vitale, published by ACT
• The Facts About Subscores by William Monaghan, published by ETS
• Essential Components of RTI and Screening Tools Chart by the Center on Response to Intervention
• Academic Progress Monitoring by the National Center on Intensive Intervention
• Progress Monitoring Within a Multi-Level Prevention System by Lynn Fuchs, published by the RTI Action Network
• The Standards for Educational and Psychological Testing by the American Educational Research Association (AERA), American Psychological Association (APA), and the National Council on Measurement in Education (NCME)
• The Role of Interim Assessments in a Comprehensive Assessment System: A Policy Brief by Marianne Perie, Scott Marion, Brian Gong, and Judy Wurtzel, published by The Aspen Institute

Assessment solutions by Renaissance®

Renaissance Star 360® is the most comprehensive pre-K–12 interim and formative assessment suite available. Star 360 includes computer-adaptive assessments for reading, math, and early literacy—all in both English and Spanish—as well as fixed-form tests for formative assessment. Renaissance Star Assessments® can be used for universal screening, progress monitoring, instructional planning, and growth tracking.
They feature:

• Highly efficient item designs for quicker testing time—20 minutes or less
• Fully adaptive testing experience for learners of all achievement levels
• Automatic computer scoring for immediate, accurate results
• Easy-to-read reports for tracking student achievement and growth over time
• Valid, reliable data that’s proven to predict performance on state tests
• Flexible scheduling so educators can assess on an as-needed basis
• High-quality learning progressions aligned to state standards
• Rich instruction and practice resources linked to the learning progressions

Find your best assessment at www.renaissance.com

©Copyright 2018 Renaissance Learning, Inc. All rights reserved. All logos, designs, and brand names for Renaissance products and services, including but not limited to Accelerated Reader, Accelerated Reader 360, Renaissance, Renaissance Flow 360, myON, myON Books, myON News, Renaissance Growth Alliance, Renaissance Learning, Renaissance-U, Star 360, Star Custom, Star Early Literacy, Star Math, Star Reading, and Star Spanish are trademarks of Renaissance Learning, Inc., and its subsidiaries, registered, common law, or pending registration in the United States and other countries. All other product and company names should be considered the property of their respective companies and organizations.

Star 360® | Accelerated Reader 360® | myON Reader™ | myON News™ | Accelerated Math® | Renaissance Flow 360™ | Renaissance Growth Alliance™

267460.062118