Assessing Students' Reading and Texts

Agenda
- Assessment humor
- Major issues with accountability and reading assessment
- America's infatuation with assessment and accountability
- What statistics tell us: a little and a lot
- Reading assessments for your class

Assessment
- Two types of assessment: formal and informal
- Assessment is an ever-present reality
- Good teachers are always assessing their students
- Good teachers are always assessing their own practice
- A sign of a bad teacher is one who is not introspective and not seeking constructive feedback!
- EXAMPLES

Accountability = Assessment???

High-Stakes Testing
"You can't fatten a calf by weighing it" (a proverb, quoted on the House floor during the NCLB debate)

Origins of High-Stakes Testing
- Schooling and high-stakes testing grew exponentially in conjunction with industrialization and with "modern" psychology (a la E.L. Thorndike)
- A worldwide "paradigm shift" (Kuhn, 1962) from rural farming to industrialization

Origins of High-Stakes Testing: Schools as Factories
- Assembly lines (age-graded classrooms)
- Interchangeable parts (teachers all teach the same curriculum in each grade)
- Product (all students have the same knowledge when finished)
- Quality control (tests at specific intervals to ensure learning)
- Belief that similar inputs = similar outcomes = measurable knowledge

Standardized Tests: Why Americans Love 'em
- Belief that standardized tests are inherently fair
- Standardized tests lend themselves to statistical analysis and reporting
- The U.S. population tends to trust statistics
- Belief that STATISTICS = UNBIASED
- Belief that statistics are math

U.S. Statistics: Test Your Knowledge
- At what pace has the murder rate grown over the last 40 years (in other words, how much worse is murder per capita now than 40 years ago)?
- How has the violent crime rate changed in the last 30 years?
- How much more likely are children to be abducted today compared to 25 years ago? 40 years ago?

Statistics
The answer to ALL of the above: far less. Violent crime rates have fallen relatively consistently since 1972.

U.S. Statistics: Discussion
- If you were answering these questions as a layperson, and not in the context of this class, how would you answer?
- How would most people you know answer?
- What leads people to answer as they do?
Misuse of Statistics
Take the following example: "Every year since 1950, the number of American children gunned down has doubled." (Children's Defense Fund)
Now the real statistic: "The number of American children killed each year by guns has doubled since 1950." (Children's Defense Fund)
By the first account, the number of children gunned down in 1995 (when the first quote appeared) would have been roughly 35 trillion: doubling annually from even a single death in 1950 yields 2^45, or about 35 trillion, by 1995.

Statistics
When people who are 'trusted' quote statistics, what they say is given even more credibility (often regardless of how ridiculous the claim). Using your incredible statistical (and basic math) skills, what is wrong with the following statistical analysis?

Bad Statistics! Bad Dog!

Statistics
1) PER CAPITA is a ratio. It does NOT change when the population size changes. Per capita is Latin for "by the head" or "for each head."
- Example: If I were to say that 20% of Russians smoke, it does not matter whether the population of Russia is 10 people, 200,000 people, or 400,000 people: one in five Russians smoke.
2) Statistical procedures and measurements are prone to specific kinds of bias, but "the way they do statistics" does not change.

Statistics = Fact?
Statistics can be manipulated and are seldom free of bias:
- Discarding unfavorable data
- Sampling bias / margin of error
- Leading/loaded question bias
- False causality
- Misuse of the null hypothesis
- Numerous other issues (see examples in the hyperlinks below)
For more examples of common misuses of statistics, see:
http://en.wikipedia.org/wiki/Misuse_of_statistics
http://cseweb.ucsd.edu/~ricko/CSE3/Lie_with_Statistics.pdf
http://knowledge.wharton.upenn.edu/article/the-use-and-misuse-of-statistics-how-and-whynumbers-are-so-easily-manipulated/

Statistics = Fact? Discarding Unfavorable Data
Companies and their research staff can, and often do, ignore data that contradict what they hope to find, and/or they fail to publish studies that are disadvantageous to them.
- Medical studies in which the outcomes do not favor the introduction and use of a new (and costly) medicine.
- Ignoring myriad unfavorable variables and outcomes while selectively using those that are favorable (e.g., the drug reduces arthritis pain but increases risk of death fourfold).
- Researchers studying the benefits of a new antidepressant chose to discard from the study sample a group of people who showed dramatically increased risk of suicide when on the drug (excluding them for myriad reasons).

Statistics = Fact? Sampling Bias (also tied to margin of error)
Recent example: The 2012 Romney campaign managers and statisticians genuinely thought the race would be close or that they would win handily. They based this upon statistics garnered by sampling voters through telephone polls. They ignored the findings of statistician Nate Silver, who had accurately predicted the electoral vote in 2008 and was again predicting (accurately) the 2012 electoral vote.
"Dick Morris, former campaign manager for Bill Clinton's 1996 reelection [and a leading strategist for Mitt Romney] has absolutely put his political pundit reputation on the line by declaring that Mitt Romney will win the Presidency in a landslide, which of course mirrors yours truly's prediction of Romney getting 52% of the votes against Obama's 47%." - JustPlainPolitics.com
Romney sampling = "likely voters" with home phones who were willing to answer a poll about their preference of presidential candidates.

Statistics = Fact? Leading/Loaded Question Bias
- Do you support the attempt by the USA to bring freedom and democracy to other places in the world?
- Do you support the unprovoked military action by the USA?
- Do you support ObamaCare?
- Do you support the Affordable Care Act?
- Do you think teachers should be held to high standards that are measured fairly and accurately?
- Do you support more standardized testing in K-12 public school classrooms?

Statistics = Fact? False Causality (A 'causes' B)
Correlation is NOT causation. Many things are correlated (related) to each other, but this does not mean that one thing causes another.
- Almost all heavy drug use starts with alcohol or marijuana use. Thus, marijuana use causes heroin addiction (FALSE).
- The number of people buying ice cream at the beach is statistically related to the number of people who drown at the beach. Thus, ice cream causes drowning (FALSE).

Statistics = Fact? Misuse of the Null Hypothesis
Statisticians use the 'null hypothesis' as their starting point: that there is no relationship between two measured phenomena, or that a potential medical treatment has no effect. They assume this is true until proven otherwise via conclusive evidence (confidence intervals). The U.S. court system follows a similar approach: innocent until proven guilty beyond a reasonable doubt. But the acquittal of a defendant does not prove the defendant "innocent" of the crime; it merely states that there is insufficient evidence for a conviction.
Suppose a tobacco company runs studies to show that its products do not cause cancer, but it uses a small sample and runs the study over a short period of time. It is then unlikely to disprove the null hypothesis (that there is no relationship between using X tobacco product and cancer), but it should not therefore report that its product does not cause cancer. (See the simulation sketch at the end of this section.)

Statistics Bias: Reporting
Based upon performance, which stock would you be more inclined to buy? [Two stock charts shown side by side.] In fact, the stocks performed at the same rate; the only difference is in how their performance is graphed.

Statistics = Fact? (& "reading is reading")
- The U.S. population tends to trust statistics
- Belief that STATISTICS = UNBIASED (statistics = math)
- Thus, belief that standardized tests are inherently fair
- But statistics can be manipulated, poorly done, and biased to find what one is seeking to find
- The Romney campaign
- SAT scores as an example of school failure (Berliner & Biddle): concern that SAT scores over time were not changing or getting higher
- "Culture of Fear" & statistics
- The Bell Curve (Murray & Herrnstein)

Statistics and Standardized Tests
Standardized tests are trusted to give us an accurate picture of how well students are doing in school. Because they are statistically based, norm-referenced, and relatively easily scored, most people trust the information they give.
WHAT DO THEY NOT GIVE? WHAT DO THEY NOT MEASURE? WHAT SOURCES SHOULD WE TRUST FOR READING STATISTICS?
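To make the tobacco example above concrete, here is a minimal simulation sketch; it is not from the slides, and all numbers (effect size, sample size) are hypothetical. It shows that even when a real effect exists, a small, short study usually fails to reject the null hypothesis, which is exactly why "we found no significant link" is not the same as "there is no link."

```python
# Hypothetical simulation of the "small sample" misuse of the null
# hypothesis: a real effect exists, but underpowered studies rarely
# detect it -- and non-detection is NOT evidence of "no effect."
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
true_effect = 0.3      # assumed (hypothetical) real increase in risk score
n_per_group = 20       # deliberately small sample, short study
n_studies = 2000

rejections = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    exposed = rng.normal(true_effect, 1.0, n_per_group)  # effect is real
    _, p = ttest_ind(exposed, control)
    if p < 0.05:
        rejections += 1

print(f"Studies detecting the (real) effect: {rejections / n_studies:.0%}")
# Most such studies "find nothing"; concluding the product is safe
# from them would be a misuse of the null hypothesis.
```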
A Real (Valid) Statistic
According to the National Assessment of Educational Progress (NAEP), approximately one in four 12th-grade students (among those who have not already dropped out of school) still reads at a "below basic" level, while only one student in twenty reads at an "advanced" level.

High-Stakes Testing
Your performance will be based, at least in part, upon your students' test scores; measures such as the Student Success Act (2011) and Race to the Top rely on students' scores to grade teachers' performance and to set performance pay. 50% of your yearly assessment, regardless of content area, will be based upon one test score! This is yet another reason to make sure that your students can read their content area texts effectively: your job will depend upon it!

Different Kinds of Testing
Norm-referenced tests vs. criterion-referenced tests

Norm-Referenced Tests
- NCLB and large-scale testing tend to be norm-referenced
- Norm-referenced means a student's performance is measured against a 'norm' for that age, ability level, etc.
- Students are compared to a large average
- Teacher success is based largely on norm-referenced test scores

Criterion-Referenced Tests
- Classroom testing is almost always criterion-referenced
- Criterion-referenced tests measure knowledge or ability in a specific area (whether or not a student has learned specific material)

Reliability vs. Validity
- Reliability refers to the confidence we can place in a measuring instrument to give us the same numeric value when the measurement is repeated on the same object. Will students score roughly the same if the test/assessment is repeated (i.e., results are not random)?
- Validity refers to whether the assessment instrument/tool actually measures the property it is supposed to measure. Does the assessment tool actually measure the right thing?

Reliability
"Another way to think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes in the morning, and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course, you peeled and cooked them). Likewise, instruments such as classroom tests and national standardized exams should be reliable - it should not make any difference whether a student takes the assessment in the morning or afternoon; one day or the next." (http://fcit.usf.edu/assessment/basic/basicc.html)

Reliability: Classroom Example
"Another measure of reliability is the internal consistency of the items. For example, if you create a quiz to measure students' ability to solve quadratic equations, you should be able to assume that if a student answers an item correctly, he or she will also be able to answer other, similar items correctly." (http://fcit.usf.edu/assessment/basic/basicc.html; the source also includes a table outlining three common reliability measures.) A sketch of one common internal-consistency measure follows below.
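The "internal consistency" idea can be made concrete with a short sketch. This is not from the source page; it is a minimal illustration, with hypothetical quiz data, of Cronbach's alpha, one widely used internal-consistency statistic.

```python
# A minimal sketch (hypothetical data) of Cronbach's alpha:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
# Values near 1 suggest the quiz items "hang together" consistently.
import numpy as np

# Rows = students, columns = quiz items (1 = correct, 0 = incorrect).
scores = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' totals

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```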
Validity
- Validity refers to the accuracy of an assessment: whether or not it measures what it is supposed to measure.
- Even if a test is reliable, it may not provide a valid measure. If a test is valid, it is almost always reliable; a reliable test, however, is not necessarily valid.
- GENERAL RULE: Validity implies reliability; reliability does NOT imply validity.

Validity
Imagine a bathroom scale that consistently tells you that you weigh 118 pounds. The reliability (consistency) of this scale is very good, but it is not accurate (valid): reliability ≠ validity. Because "teachers, parents, and school districts make decisions about students based on assessments (such as grades, promotions, and graduation), the validity inferred from the assessments is essential -- even more crucial than the reliability" (http://fcit.usf.edu/assessment/basic/basicc.html). When you try to assess whether or not a student is able to read a text, you must use a valid measurement. For instance, students may be able to read and comprehend all of the individual words of a text but not understand the content; testing only their ability to literally 'read' the words would not be a valid assessment.

Reliability and Validity: The Target Diagram
A classic target diagram illustrates four possible situations. In the first, you are hitting the target consistently, but you are missing the center; you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center, but, on average, you are getting the right answer for the group (though not very well for individuals); you get a valid group estimate, but you are inconsistent. Here you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center; your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario: you consistently hit the center of the target, and your measure is both reliable and valid.

High-Stakes Tests
Today's corollary: a gun aimed at teachers saying, "If all of your students don't score well on the test, you're a bad teacher who needs to go." Bang.

Assessment & Reading
- You, the content area teacher, can do a number of relatively simple assessments to gauge a student's or a group of students' reading ability/level.
- There are also numerous ways to determine the reading level of various texts (textbooks, articles, web pages, etc.).
- The goal: to match readers' abilities to appropriate texts (within the Zone of Proximal Development).

Assessment & Reading: Informal Assessments - Questioning
- Question students (orally) about textual information, from general to specific, making note of students' responses in some format.
- Question students directly (but privately) about their reading abilities.
- Students know when they struggle with reading and are often more open about their struggles than you would initially imagine.

Assessment & Reading: Informal Assessments - Observation
- As the teacher, your knowledge is the best indicator of students' relative reading abilities: when and where do students struggle with reading?
- Watch for (and make note of) those students who never volunteer to read and who avoid reading out loud, even in small groups.
- Listen to peers' comments about individual students' reading abilities.
- Watch students as they read (do they look up often? are they easily distracted? do they react negatively or disruptively?).
- Speed of reading is a good indicator of reading ability (though not always!).

Rate of Comprehension
There is, obviously, a relationship (correlation) between how quickly one reads and one's level of fluency: good readers read more quickly. This is not a one-to-one correspondence, however. To measure rate of comprehension, create a simple ratio for how long it takes students to read a passage of a specific length. This measure is best used with a Comprehension Inventory (see Vacca and Vacca, p. 111). Note that speed does not necessarily correspond with accuracy (sometimes slower readers read for better understanding than faster readers). By combining the Comprehension Inventory and the Rate of Comprehension, one can get a fuller picture (e.g., students who rush through a reading but read superficially, and students who spend too much time on one passage). A small reading-rate calculation is sketched below.
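Here is a minimal sketch, with hypothetical student data, of the simple rate ratio described above: words per minute, reported alongside a comprehension score so that speed is never interpreted on its own.

```python
# A minimal sketch (hypothetical numbers) of a reading-rate ratio.
def words_per_minute(word_count: int, seconds: float) -> float:
    """Reading rate as words per minute."""
    return word_count / (seconds / 60.0)

# Hypothetical class data: (student, passage word count, reading time
# in seconds, comprehension-inventory score out of 10).
timings = [
    ("Ana",   450, 150, 9),   # fast and accurate
    ("Ben",   450, 140, 4),   # fast but superficial
    ("Carla", 450, 320, 8),   # slow, but reading for understanding
]

for name, words, secs, comp in timings:
    wpm = words_per_minute(words, secs)
    print(f"{name}: {wpm:.0f} wpm, comprehension {comp}/10")
```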
Readability
Readability formulas:
- Can be a good arbiter of the difficulty of a text.
- Should be used in conjunction with other data; a formula alone tells you very little that is of use. You must combine such information with other data points and with what you know about your students.
- Zone of Proximal Development: do NOT use readability formulas to tailor all of your information/texts to students' respective abilities. Doing so can hinder student reading growth, and readability formulas do not correlate with students' individual abilities, background knowledge, or ability to read specific texts.
What did you find about the texts you selected using the Fry Readability Graph or the Flesch-Kincaid Readability Formula?

Reading and ZPD
Zone of Proximal Development (Lev Vygotsky)

Assessment & Reading: Formal Tools
There are a number of formal assessment tools for determining a student's ability to read well:
- Dynamic Indicators of Basic Early Literacy Skills (DIBELS): primarily for emergent and early literacy
- Woodcock Reading Mastery Test: ibid.
- Diagnostic Assessment of Reading (DAR): can be used with secondary students (expensive)
- District measures (FCAT and other measures)
- Lexile scores

Readability: Lexile© Scores
- Lexile scores can be obtained through state agencies (Departments of Education).
- The Scholastic Reading Inventory (SRI) provides a Lexile score.
- Check with your school or district to find out whether students have been tested in ways that yield a Lexile score.
- Some resources are free (analyzing a classroom text, for example).
- Lexile gives a score that roughly corresponds to the range within which average readers of a given age should fall, and it also rates texts to determine their Lexile scores (reading difficulty). Teachers can use these data to find appropriate reading materials for students.

Cloze Tests
Cloze tests help you see how well students know material (they can be especially helpful as a pre-test of content) while also helping you determine reading ability.
Formula for a cloze test (sketched in code below):
1) Select 250-500 words from a selected piece of text.
2) Leaving the first sentence intact, delete every fifth to seventh word of the text thereafter (delete a mixture of important vocabulary, conjunctions, verbs, etc., as this tells teachers a great deal about comprehension).
3) Delete fifty words in all.
4) Multiply students' exact word replacements (or very close substitutions) by two to get a percentage correct.
Simple cloze tests as a class: http://www.edict.com.hk/vlc/cloze/cloze.htm
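The four-step formula above translates directly into a short script. The following is a minimal sketch of those steps; the first-sentence handling and answer matching are deliberately naive, and the function names and sample usage are hypothetical.

```python
# A minimal sketch of the cloze-test procedure: keep the first sentence
# intact, then blank out every Nth word (here every 6th) until fifty
# words are deleted. Scoring doubles the count of exact replacements to
# yield a percentage, since 50 deletions x 2 = 100.
import re

def make_cloze(passage: str, nth: int = 6, max_blanks: int = 50):
    """Return (cloze_text, answer_key) for a 250-500 word passage."""
    # Keep the first sentence intact (naive split on the first period).
    first, _, rest = passage.partition(". ")
    words = rest.split()
    answers, out = [], []
    for i, word in enumerate(words, start=1):
        if i % nth == 0 and len(answers) < max_blanks:
            answers.append(re.sub(r"\W", "", word))  # key, minus punctuation
            out.append("_____")
        else:
            out.append(word)
    return first + ". " + " ".join(out), answers

def score_cloze(responses, answers):
    """Exact replacements x 2 = percentage correct (assumes 50 blanks)."""
    exact = sum(r.strip().lower() == a.lower()
                for r, a in zip(responses, answers))
    return exact * 2

# Hypothetical usage, given a content-area passage string `passage`:
# cloze_text, key = make_cloze(passage)
# print(score_cloze(student_responses, key), "percent")
```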
Readability vs. Comprehension
Readability formulas rate a text's complexity in terms of words and grammar, but we are actually more interested in the text's difficulty in terms of reader comprehension of the content. Sad to say, no formula can measure whether readers understand a text. Take, for example, the following two sentences: "He waved his hands." "He waived his rights." Both score well in readability formulas: simple words, short sentences. But whereas everybody understands what the first sentence describes, you might need a law degree to fully comprehend the implications of the second.

Informal Observation & Questioning Video

Assessment of Reading: Individual - San Diego Quick (SDQ)
- Create a list of vocabulary words from your discipline/content area that range in "readability": monosyllabic to polysyllabic, common English words to Latin-based words, etc.
- Paste individual words onto note cards. Note the readability level of each word on the back of its card (using your judgment and/or readability formulas).
- Starting with lower-level words, test the student's ability to pronounce the words randomly and quickly (gauge their confidence with the words).
- Note where the student begins to struggle. Move forward AND backward in word difficulty to determine where the student reads comfortably; this will help determine their reading level.

Assessment of Reading: Individual - Cloze Tests
See the cloze test procedure (and the code sketch) above: cloze tests help you see how well students know material, especially as a pre-test of content, while also helping you determine reading ability. The site linked above offers simple cloze tests to run as a class, plus more cloze tests.
Assessment & Reading: Group - Content Area Reading Inventory (CARI)
- This is basically an informal test that you develop, using your knowledge of the text, to see how well students are reading specific material: do they understand the main ideas, vocabulary, etc.?
- It should be based upon literal, inferred, and predictive answers: in the text (word-for-word), searching the text (in the text but in different wording), inferential (implied in the text but not directly stated), and applied (in one's head/experience, thought-provoking).
- Can be used in a fashion similar to a pre-test (NOT graded).
Click here for more on creating and using a CARI.

Assessment & Reading: Group - Informal Reading Inventory (IRI)
- Assign a longer passage for all students to read (one that they have not read before); have them put their names on a note card with four squares on it.
- Tell them that they will each read the passage aloud.
- Create a note-taking system (for yourself) to record students' strengths and weaknesses.
- Go around the room, stopping to listen to students as they read. Look for fluency, vocabulary struggles, mispronunciations, frequent stopping and starting, speed of reading, etc.
- Make (simple) notes of each student's areas of strength and weakness, and record them in a reading inventory log.
- Create learning opportunities and specific activities for students based upon reading ability.
Click here for more on creating and using an IRI.

Assessment of Reading: Group - Student Response Form (SRF)
- Have students read a passage they have not seen (in your content area/lesson).
- Give them a response form. Include on the form a place to mark reading time, plus Part I and Part II questions.
- When students complete the reading, they: 1) mark the time at which they completed the reading; 2) answer the Part I questions (literal questions "from the text") WITHOUT using the text; 3) answer the Part II questions (interpretive, analytical, applied), for which they may refer to the text.
Click here for more information on Student Response Forms.

Assessment of Reading: Group - Timed Reading
See Rate of Comprehension above: time students' reading of a passage of known length and pair the rate with a Comprehension Inventory (see above, or Vacca and Vacca, p. 111) to distinguish students who rush through a reading superficially from those who read slowly but for understanding.

Readability
Readability formulas:
- Can be a good arbiter of the difficulty of a text; however, they are NOT a measure of a student's reading ability, nor should they be used that way!
- Use them in conjunction with other data and with what you know about your students.

Readability Formulas: A Caveat
Readability formulas can be misused and misunderstood. Basic readability measures are simple formulas, essentially ratios of syllables to words (and words to sentences). They do NOT take into account:
- Specialized vocabulary (regardless of the length of the word)
- Difficulty in the construction of passages (think in terms of "Truth is untruth insofar as…")
- Unusual word formations, such as poetry

Parting (Emily Dickinson)
My life closed twice before its close;
It yet remains to see
If Immortality unveil
A third event to me,
So huge, so hopeless to conceive,
As these that twice befell.
Parting is all we know of heaven,
And all we need of hell.
This poem scores at a 5.6 grade level according to Flesch-Kincaid. (A sketch of the formula follows below.)
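For reference, the Flesch-Kincaid grade-level formula itself is simple enough to sketch in a few lines. This is a minimal illustration, not a published calculator: the syllable counter is a rough vowel-group heuristic, so outputs will only approximate official scores.

```python
# A minimal sketch of the Flesch-Kincaid grade-level formula:
#   0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (minimum of one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

sample = "He waved his hands. He waived his rights."
print(f"Grade level: {flesch_kincaid_grade(sample):.1f}")
# Both sentences score as very easy reading: the formula sees only
# word and sentence length, not meaning.
```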
Summary and Discussion: Using This Information
What can YOU do?
- Find out how well your students are reading: prior test data (state measures, other measures, IEPs, etc.) and informal assessments that you conduct in your content area.
- Determine readability AND supplement accordingly. You may not have a choice of whether or not to use a specific text; if you do have a choice, choose wisely (not catering to students' weaknesses, but within their zone of proximal development).
- Use a checklist to gauge texts for appropriateness for different readers (see p. 134).
- Directly teach the text and model effective reading to help struggling readers: use read/think-alouds, guided questions, and pre-reading strategies.
- Use paired or group activities with mixed-ability students.