DDM Part II: Analyzing the Results
Dr. Deborah Brady

Agenda
- Overview of how to measure growth in 4 "common sense" ways
- A quick look at "standardization"
- Not all analyses are statistical or new; we'll use familiar ways of looking at student work
- Excel can help when you have a whole grade's scores, but it is not essential
- Time for your questions; exit slips
- My email: dbrady3702@msn.com; PowerPoint and handouts at http://tinyurl.com/k23opk6

Considerations for Local DDMs

1. Comparable across schools
- Where possible, measures are identical: teachers with the same job (e.g., all 5th grade teachers) use the same measure
- Identical measures are easier to compare, but do identical measures provide meaningful information about all students?
- Exceptions: when might assessments not be identical?
  - Different content (different sections of Algebra I)
  - Differences in untested skills (reading and writing on a math test for ELL students)
  - Other accommodations (fewer questions for students who need more time)
- NOTE: Roster verification and group size will be considerations for DESE

2. Comparable across the district
- Aligned to your curriculum (comparable content), K-12 in all disciplines
- Appropriate for your students and aligned to your district's content
- Informative and useful to teachers and administrators
- "Substantial" assessments (comparable rigor): "substantial" units with multiple standards and/or concepts assessed (DESE has recently described finals and midterms as preferable)
- See the Core Curriculum Objectives (CCOs) on the DESE website if you are concerned: http://www.doe.mass.edu/edeval/ddm/example/
- Quarterly assessments, benchmarks, midterms, and common end-of-year exams
- NOTE: All of this data stays in your district. Only the High/Moderate/Low (HML) rating goes to DESE, with a MEPID for each educator.

Examples of 4 + 1 Methods for Calculating Growth (each is in the handout)
- Pre/post test
- Repeated measures
- Holistic rubric (analytical rubric)
- Post test only
- A look at "standardization" with percentiles

Typical Gradebook and Distribution (page 1 of handout)
- Alphabetical order (random)
- Sorted low to high
- Determine "cut scores" (validate them in the student work); use the "Stoplight Method" to help see cut scores
- Graph the distribution of all scores
- Graph the distribution of High, Moderate, and Low scores

Random order: 90 76 92 72 80 98 91 75 60 52 76 77 96 61 63 78 79 95 80 85 86 84 65
Sorted: 52 60 61 63 65 72 75 76 76 77 78 79 80 80 84 85 86 90 91 92 95 96 98

[Chart: distribution of whole-class scores, low to high. "Cut" scores and "common sense": validate them with performances. What work is not moving at an average rate? What work shows accelerated growth? Some benchmarks have determined rates of growth over time.]
[Chart: High, Moderate, Low distribution. High count: 6; Moderate count: 12; Low count: 5.]

Pre/Post Test
- Description: The same or similar assessments administered at the beginning and at the end of the course or year
- Example: A Grade 10 ELA writing assessment aligned to College and Career Readiness Standards at the beginning and end of the year
- Measuring growth: the difference between the pre- and post-test; check whether all students have an equal chance of demonstrating growth
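When a whole grade's scores live in a spreadsheet or an exported file, the same arithmetic can be scripted. Below is a minimal sketch in plain Python (not part of the original handout) using the pre/post scores from the worked example on the next slide; it sorts by % growth so cut scores can be eyeballed and then validated against the student work.

```python
# Minimal sketch: growth = post minus pre; % growth = difference / pre-test.
# The scores are the ten pre/post pairs from the worked example that follows.
pre_scores  = [20, 25, 30, 35, 35, 40, 40, 50, 50, 50]
post_scores = [35, 30, 50, 60, 60, 70, 65, 75, 80, 85]

rows = []
for pre, post in zip(pre_scores, post_scores):
    diff = post - pre          # growth: post minus pre
    pct = diff / pre           # % growth: difference divided by pre-test
    rows.append((pre, post, diff, pct))

# Sort low to high on % growth so Low / Moderate / High cut scores
# can be seen, then validated against the actual student work.
for pre, post, diff, pct in sorted(rows, key=lambda r: r[3]):
    print(f"pre={pre:3d}  post={post:3d}  diff={diff:3d}  growth={pct:4.0%}")
```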
Pre/Post Tests: Worked Example

Pre-test (sorted)   Post-test   Difference (post minus pre)   % growth (difference / pre)
       20              35                 15                        75%
       25              30                  5                        20%
       30              50                 20                        67%
       35              60                 25                        71%
       35              60                 25                        71%
       40              70                 30                        75%
       40              65                 25                        62%
       50              75                 25                        50%
       50              80                 30                        60%
       50              85                 35                        70%

Analysis: look at the range of growth from pre to post. Where is the cut score? Look at the work. Look at the distribution.
How many Low / Moderate / High? In this example: 5 Low, 3 Moderate, 2 High.

Holistic
- Description: Assess growth across student work collected throughout the year
- Example: Tennessee Arts Growth Measure System
- Measuring growth: a growth rubric (see example)
- Considerations: an option for multifaceted performance assessments; rating can be challenging and time consuming

Holistic Example (an unusual rubric): the "Details" criterion

1 - No improvement in the level of detail (one is true):
  * No new details across versions
  * New details are added, but not included in future versions

2 - Modest improvement in the level of detail (one is true):
  * There are a few details included across all versions
  * A few new details are added that are not relevant, accurate, or meaningful
  * Many details are added, but they are not included consistently, or none are improved or elaborated upon
  * There are many added details, but several are not relevant, accurate, or meaningful

3 - Considerable improvement in the level of detail (all are true):
  * There are many examples of added details across all versions
  * At least one detail is improved or elaborated on in future versions
  * The added details reflect relevant and meaningful additions

4 - Outstanding improvement in the level of detail (all are true):
  * On average, multiple details are added across every version
  * Details are consistently included in future versions
  * There are multiple examples of details that build and elaborate on previous versions
  * The added details reflect the most relevant and meaningful additions

Example taken from Austin, a first grader from Anser Charter School in Boise, Idaho. Used with permission from Expeditionary Learning. Learn more about this and other examples at http://elschools.org/student-work/butterfly-drafts

Holistic Rubrics Are Easier for Large-Scale Assessments like MCAS
A holistic rubric (for a category such as Topic Development or Conventions) puts all of the criteria in one cell; it is useful when the categories overlap.

Writing criteria (all in one cell): 1) claims/evidence, 2) counterclaims, 3) organization, 4) language/style

Advanced:
1) Insightful, accurate, carefully developed claims and evidence.
2) Counterclaims are thoughtfully, accurately, and completely discussed and argued.
3) The whole essay and each paragraph are carefully organized and show interrelationships among ideas.
4) Sentence structure, vocabulary, and mechanics show control over language use.

Proficient: adequate, effective; the student "gets it."
Needs Improvement: misconceptions; some errors.
At Risk: serious errors.

MCAS Has 2 Holistic Rubrics

Topic/Idea Development (scored 6 to 1):
6 - Rich topic/idea development; careful, subtle organization; effective, rich use of language
5 - Full topic/idea development; logical organization; strong details; appropriate use of language
4 - Moderate topic/idea development and organization; adequate, relevant details; some variety in language
3 - Rudimentary topic/idea development and/or organization; basic supporting details; simplistic language
2 - Limited or weak topic/idea development, organization, and/or details; limited awareness of audience and/or task
1 - Little topic/idea development, organization, and/or details; little or no awareness of audience and/or task

Standard English Conventions (scored 4 to 1):
4 - Control of sentence structure, grammar, usage, and mechanics (the length and complexity of the essay provide an opportunity for the student to show control of standard English conventions)
3 - Errors do not interfere with communication, and/or there are few errors relative to the length of the essay or the complexity of sentence structure, grammar, usage, and mechanics
2 - Errors interfere somewhat with communication, and/or there are too many errors relative to the length of the essay or the complexity of sentence structure, grammar, usage, and mechanics
1 - Errors seriously interfere with communication, AND there is little control of sentence structure, grammar, usage, and mechanics

Pre and Post Rubric (2 Criteria) Growth
Add the scores for the two criteria (Topic and Conventions) on the pre-test and the post-test, then find the difference.

Pre (Topic/Conv)   Post (Topic/Conv)   Gain (Topic/Conv)   Total gain   % growth
     1/1                 1/1                 0/0               0            0%
     1/2                 2/2                 1/0               1          100%
     1/2                 2/3                 1/1               2          100%
     2/3                 3/3                 1/0               1           50%

Analysis: add the criteria gains together as a raw score; put the scores in order; % of growth = difference / pre.

Rubrics Do Not Represent Percentages
A student who received a 1 on the rubric would probably receive about a 50 as a grade:
1 = about 50: F, seriously at risk
2 = roughly 60-75: D to C, at risk
3 = roughly 76-89: C+ to B+, average
4 = 90-100: A to A+, above most

Holistic Rubric or Holistic Descriptor: Keeping the 1-4 Scale Distribution
[Table: pre and post rubric scores on the 1-4 scale for each student, the difference (from -1 to +2), the rank order of the differences, and cut points for Low, Moderate, and High growth.]

Converting Rubrics to Percentages
Not recommended for classroom use because it distorts the meaning of the descriptors; it may facilitate large-scale use. A district decision.

Pre   Converted "grade"   Post   Converted "grade"   Difference   % growth (difference / pre)
 0           0              1            50               50              50%
 0           0              1            50               50              50%
 0           0              1            50               50              50%
 1          50              0             0              -50             -50%
 1          50              1            50                0               0
 1          50              1            50                0               0
 1          50              3            82               32              64%
 1          50              1            50                0               0
 2          65              3            82               17              26%

"Common sense" analysis:
- Was the assessment too difficult? There are zeros in the pretest (3).
- Zero growth; only 1 student improved.
- Change the assessment scale? Look at all of the grade-level assessments.
- Is the % conversion not helpful in this case?

Repeated Measures
- Description: Multiple assessments given throughout the year
- Examples: running records, attendance, the mile run
- Measuring growth: graphically, ranging from the sophisticated to the simple
- Considerations: less pressure on each administration; authentic tasks (reading aloud, running)
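As a sketch of that "sophisticated to simple" range, the snippet below summarizes one student's repeated measure two ways: a simple first-to-last change and a least-squares slope (average change per administration). The error counts are hypothetical and are not taken from the running-record example that follows.

```python
# Minimal sketch: two ways to summarize a repeated measure for one student.
# The running-record error counts below are hypothetical.
errors = [65, 48, 30, 22, 15, 13]   # six administrations, September through June

# Simple: overall change, last administration minus first.
print("overall change:", errors[-1] - errors[0])

# More sophisticated: least-squares slope, i.e. average change per administration.
n = len(errors)
x_mean = (n - 1) / 2
y_mean = sum(errors) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(errors)) / \
        sum((x - x_mean) ** 2 for x in range(n))
print(f"average change per administration: {slope:.1f}")
```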
Repeated Measures Example: Running Record Errors in Reading
[Chart: average errors for the high-, moderate-, and low-error groups at each of six administrations: September, November, January, March, April, and June.]

Post Test Only
- Example: the AP exam. Use it as a baseline to show growth for each level, or for the classroom.
- This assessment does not have a "normal curve."
- An alternative for a post-test-only classroom measure that still shows student growth is to give a mock AP exam as the pre and the post.

Post Test Only: AP Exam Example
[Chart: distribution of AP exam scores, from five down to one.]

Looking for Variability
[Two charts of the number of students with Low, Moderate, and High growth: one distribution labeled "Good," the other "Problematic."]
The second graph is problematic because it doesn't give us information about the difference between average and high growth; too many students fall into the "high" growth category.
NOTE: Look at the work and make "common sense" decisions.
Consider the whole grade level; one class's variation may be caused by the teacher's effectiveness.
Critical question: Do all students have an equal possibility of success?

"Standardizing" Local Norms
- Percentages versus percentiles: a percentage is figured within a class or course; percentiles are figured across all courses in the district
- Many assessments with different standards can be compared on a common scale

Student A          Raw score   Percentage   District percentile
English             15/20         75%            62nd
Math                22/25         88%            72nd
Art                116/150        77%            59th
Social Studies       6/10         60%            71st
Science             70/150        46%            70th
Music               35/35        100%            61st

Standardization in Everyday Terms
Standardization is a process of putting different measures on the same scale. For example:
- Most cars cost $25,000, give or take $5,000
- Most apples cost $1.50, give or take $0.50
- Getting a $5,000 discount on a car is about equal to what discount on an apple? (One "give or take" on a car matches one "give or take" on an apple, so about $0.50.)
Technical terms: "most are" = the mean; "give or take" = the standard deviation.

Percentile/Standard Deviation Excel Functions
- Sort high to low or low to high, use the graphing function, and use statistical functions including percentiles and standard deviation
- Student grades can be sorted from highest to lowest score with one command
- A table of student scores can be easily graphed with one command
- Excel will easily calculate percentiles, but this is probably not necessary (a short scripted sketch of the same arithmetic appears at the end of this handout)

"Common Sense"
- The purpose of DDMs is to assess teacher impact
- The student scores and the Low, Moderate, and High growth rankings are totally internal
- DESE (in two years) will see MEPIDs, with an L, M, or H next to each MEPID
- The important part of this process needs to be the focus: your discussions about student learning with colleagues, your discussions about student learning with your evaluator, an ongoing process
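For districts that prefer scripting to spreadsheets, here is a minimal sketch of the percentile-rank and "give or take" (standard deviation) arithmetic referenced in the Excel slide above. The course scores, the student score, and the helper function names are illustrative assumptions, not district data or part of the original handout.

```python
# Minimal sketch: percentile rank within a course and distance from the
# course mean in standard deviations. All numbers here are hypothetical.
from statistics import mean, pstdev

course_scores = [52, 60, 61, 63, 65, 72, 75, 76, 77, 80, 85, 92]  # hypothetical

def percentile_rank(score, scores):
    """Percent of scores in the course at or below the given score."""
    return 100 * sum(s <= score for s in scores) / len(scores)

def give_or_take(score, scores):
    """How many standard deviations the score sits from the course mean."""
    return (score - mean(scores)) / pstdev(scores)

student_score = 80
print(f"raw score: {student_score}")
print(f"percentile rank within the course: {percentile_rank(student_score, course_scores):.0f}")
print(f"standard deviations from the mean: {give_or_take(student_score, course_scores):+.2f}")
```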