Study Guide: Foundations of Assessment Final, Chapters 8-16

CH 8: Rubrics

Two types of rubrics: performance and product.

Performance assessment
- Students create products or demonstrate behaviors rather than answering selected-response or constructed-response questions.
- Teachers or others observe the process of construction.
- A judgment is made of the performance.
- Examples: oral book report, play/skit.
- Know that the closer the task is to the real world, the closer the evaluation comes to measuring the real skill.

Three features of performance assessment
1. Multiple evaluative criteria
   - The response must be judged on more than one criterion.
   - Avoid "mystery criteria" (criteria students must guess at); example criteria: intro, body, conclusion, content, CUPS.
2. Pre-specified quality standards
   - The evaluative criteria are known in advance.
3. Judgmental appraisal
   - Depends on human judgment.

Identifying suitable performance tasks
- Tasks are generally few in number, so each task is significant and complex.
- Great care must be taken in creating and selecting the tasks.
- Tasks should provide evidence for inferences to be made about key curricular aims.

Rubrics
- Definition: scoring procedures for judging student performance.
- Rubrics specify the evaluative criteria. For instance, a writing composition could be scored on the basis of organization, word choice, and clarity, plus spelling, punctuation, and grammar.
- Effective rubrics inform instruction.

Rubric components
1. Evaluative criteria (e.g., a four-by-four grid; keep it even).
2. A clear description of the qualitative differences for each criterion.
3. Holistic and analytic scoring
   - Holistic scoring doesn't enumerate students' shortcomings, but it is quicker.
   - Analytic scoring provides more diagnostic information.

Sources of error
- Scoring-instrument flaws
- Procedural flaws
- Teachers' unintentional personal-bias errors

Rubric-writing rules
1. The skill is significant.
2. The criteria can be taught.
3. Use few criteria.
4. Label criteria succinctly.
5. Match the length of the rubric to your tolerance for detail.
6.
Use an even number of quality levels (not one of Popham's rules).

CH 9: Portfolio Assessment

Portfolio = a systematic collection of students' work, updated as achievement and skills grow.

Seven key ideas (not listed on the final, but you must be able to apply them):
1. Make sure students own their portfolios.
   - Portfolios must be perceived as collections of the students' own work, not receptacles for products you will grade.
2. Decide what kinds of work samples to collect.
   - Substantial variety is preferred over a limited range of work products.
3. Collect and store work samples.
   - Students decide, with teacher input, what to collect.
   - Students collect and store the samples themselves and must have access to them.
4. Select criteria by which to evaluate the work.
   - Best if determined jointly by teacher and student.
   - Criteria must be clearly described for self-evaluation to be successful.
5. Have students evaluate their products continually.
   - Evaluation can be holistic and/or analytic.
   - Evaluations are dated and attached to the product.
6. Schedule and conduct portfolio conferences.
   - Conversation is essential to effectiveness.
7. Involve parents in the process.
   - Communicate the process to parents early.

Types of portfolios
1. Working portfolio: documents student progress and growth over time (student evaluation).
   - Provides evidence of student growth, or the lack of it.
   - Provides meaningful opportunities for self-evaluation.
2. Celebration portfolio: a selection of the student's best work, as a celebration.
   - Especially appropriate for the early grades.
3. Passport portfolio: the work required to move to the next level.

CH 10: Affect

Affective survey = a systematic assessment of students' values, interests, and attitudes (VIA):
- Values
- Interest (or lack of interest) in a topic
- Attitudes toward learning

Affect predicts behavior: an individual's affective status predicts that individual's future behavior, although not perfectly.

Likert (Rensis Likert) inventories
- A series of statements to which students register their agreement or disagreement.
- The number of choices reflects students' development (typically three choices for 3rd grade or younger; for very young students, smiley and frowny faces can be used).
- Use qualifiers to avoid "always" or "never" scenarios.
- Can be time-consuming to create and use.
- Not appropriate for measuring more than one affective variable, so rarely recommended/used.

Building an affect inventory:
1. Pick your subject: choose the affective variable(s) to measure.
2. Determine the number of items per variable (for our class: 10 different approaches, a pair of statements for each, 20 items in total).
3. Create a series of positive and negative statements related to each affective variable.
4. Mix up the order (an A-Z sort works).
5. Determine the number and phrasing of students' response options.
   - Consider 5 options: strongly agree, agree, uncertain, disagree, strongly disagree.
6. Create clear directions for the inventory and an appropriate presentation format.
   - For ours, the directions must include a sample item showing how to respond (p. 256).
7. Guarantee anonymity.
   - No handwriting, no names; use a collection model (students finish, put the survey in an envelope, and place it in a bin).
   - Remind students there are no wrong answers.

CH 11: Improving Tests

Two categories of improvement procedures:
- Judgmentally based ("words")
- Empirically based ("numbers")

Five criteria for judgmentally based improvement procedures (p. 267):
1. Adherence to the commandments/specific item-writing guidelines
2. Contribution to the score-based inference
3. Accuracy of content
4. Absence of content lacunae (gaps): nothing important is left out
5. Fairness (GIRAF)

Three groups of people who can judge items:
- Test writers
- Professional colleagues
- Students

Difficulty index
- P = R/T, where P is the difficulty index, R is the number of students answering the item correctly, and T is the total number of students answering.

Item-discrimination index
- Positive discriminator: high scorers do better on the item than low scorers.
- Negative discriminator: high scorers do worse on the item than low scorers.
- Nondiscriminator: high scorers and low scorers do equally well.

Computing the index:
1. Order the test papers from high score to low score.
2. Divide the papers into a high group and a low group with an equal number in each; if the total is odd, throw out the median paper.
3. Calculate a p value for the high group and a p value for the low group.
4. Subtract p-low from p-high to obtain each item's discrimination index (D).

Interpreting D:
- .40 and above: very good items
- .30-.39: reasonably good items, but possibly subject to improvement
- .20-.29: marginal items, usually needing improvement
- .19 and below: poor items, to be rejected or improved by revision

Distractor analysis (p. 277)
You need to know how to draw this: take the item and tally how many people in the upper and lower halves chose each answer option.

Item #28 alternatives (p = .50, D = -.33):

                    A    B*   C    D    OMIT
Upper 15 students   2    5    0    8    0
Lower 15 students   4    10   0    0    1

- B is the keyed correct answer, but the item may need fixing because most of the upper students did not choose it.
- D needs review because the high scorers went for it while none of the low scorers did.
- C is doing nothing for the item: no one chose it.
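The p and D calculations above can be sketched in a few lines of code. This is a minimal illustration using the Item #28 data from the distractor-analysis example; the function names (`difficulty_index`, `discrimination_index`) are my own labels, not terms from the chapter.

```python
def difficulty_index(num_right, num_total):
    """P = R / T: the proportion of students answering the item correctly."""
    return num_right / num_total

def discrimination_index(upper_right, upper_n, lower_right, lower_n):
    """D = p_high - p_low: positive when high scorers outperform low scorers."""
    return upper_right / upper_n - lower_right / lower_n

# Item #28 distractor counts: B is the keyed correct answer.
upper = {"A": 2, "B": 5, "C": 0, "D": 8, "OMIT": 0}   # top 15 students
lower = {"A": 4, "B": 10, "C": 0, "D": 0, "OMIT": 1}  # bottom 15 students

p = difficulty_index(upper["B"] + lower["B"], 30)         # 15/30 = .50
d = discrimination_index(upper["B"], 15, lower["B"], 15)  # .33 - .67 = -.33
print(f"p = {p:.2f}, D = {d:.2f}")  # p = 0.50, D = -0.33
```

The negative D reproduces the table's diagnosis: more low scorers than high scorers chose the keyed answer, so the item discriminates in the wrong direction and needs revision.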
CH 12: Formative Assessment

Formative assessment: a planned process in which assessment evidence of students' status is used by teachers to adjust their ongoing instructional procedures, or by students to adjust their current learning tactics. It occurs during learning.
Summative assessment: occurs after learning.

Learning progression
- A sequenced set of building blocks students must master en route to mastering a target curricular aim.
- Purpose: identify the lower-order curricular aims (building blocks) needed to master higher-order aims.
  - Higher-order aim: write a clear explanatory essay.
  - Lower-order aims: know the important mechanical conventions of spelling, punctuation, and grammar (enabling knowledge); be able to present information in an organized way (cognitive subskill).
  - Building blocks: lesser aims that enable higher-order aims. Guidelines: modest in number, truly rather than arguably requisite, sequenced according to teacher judgment.

Creating a learning progression
- Acquire a thorough understanding of the target curricular aim.
- Identify all requisite "must know" subskills and bodies of enabling knowledge.
- Determine the measurability of each building block; eliminate building blocks that cannot be assessed.
- Arrange the remaining building blocks in an instructionally sensible sequence.

Formative assessment applies a common, everyday process to instruction:
1. Identify a desired end.
2. Consider the strategies (means) for trying to reach that end.
3. Reflect on their effectiveness.
4. Employ different means if the situation indicates a need.

Deterrents to the use of formative assessment
- Educators' misunderstandings about formative assessment
- Resistance to change
- Failure of external accountability tests to reflect formative-assessment-driven improvements

CH 13: Standardized Tests

Three things make a test standardized: it is administered, scored, and interpreted the same way for everyone.

Central tendency and variability
Measures necessary to describe the performance of a group:
- Mean: the average score.
- Median: the middle score.
- Range: the high score minus the low score.
- Standard deviation: how far scores are, on average, from the mean. Know the definition of SD, but you do not have to compute it.
- Distribution = a fancy way to say "a set of scores."
- Raw score: the number of correct answers.

Percentiles
- Indicate the percentage of students in the norm group whom the student outperformed.
  - A percentile of 60 indicates the student performed better than 60% of the students in the norm group.
- The most frequently used relative score; easy to understand.
- Usefulness relies on the quality of the norm group.
- Think "percentage of outscored students."

Grade-equivalent scores
- Relative interpretation.
- Estimate student performance using grade level and month of the school year; a developmental score, since it represents a continuous range of grade levels.
- A grade-equivalent score of 4.5 means grade 4, fifth month of the school year.
- The score is an estimate of the grade level of a student who would obtain that score on that particular test.
- Most appropriate for basic skills like reading and math.
- Two implied assumptions:
  1. The subject area tested is equally emphasized at each grade level.
  2. Student mastery increases at a constant rate.

Scale scores
- Relative interpretation.
- Often used to describe group test performance at the state, district, and school levels.
- Utilize converted raw scores positioned on an arbitrarily chosen scale.
- Represent student performance and can be used to make direct comparisons between groups.

Stanine scores
- Also assume the distribution of scores is normal, but are less precise and therefore more easily tolerate deviations from normality.
- The normal curve is divided into 9 segments.
  - Each segment represents a range of percentiles.
  - The fifth stanine is the center and covers the middle 20% of scores.
- Stanines 1-3: low performance; 4-6: average performance; 7-9: high performance; 5 is right in the middle.
- With a bell curve, the highest and lowest stanines always cover equivalent proportions of scores.

CH 14: Inappropriate Test Preparation

Two guidelines:
1. Professional ethics
   - Test-prep practices must not violate ethical professional norms.
   - Teachers model appropriate behavior for children.
2. Educational defensibility
   - Test prep must not increase scores without also increasing student mastery.
   - Practices that increase both scores and mastery are desirable.
   - Practices that increase only test performance short-change students.

Five test-prep practices:
1. Previous-form preparation
   - Practice and instruction using an earlier version of the test.
   - Inappropriate: neither ethical nor educationally defensible.
   - Coaching merely for test-score gains; likely to boost scores without boosting mastery.
2. Current-form preparation
   - Practice and instruction on the current test itself.
   - Inappropriate: neither ethical nor educationally defensible.
   - The current form could only have been obtained unethically; an outright example of cheating.
3. Generalized test-taking preparation
   - Covers test-taking skills addressing a variety of test formats.
   - Appropriate both ethically and educationally.
   - Should be brief, maintains momentum toward curricular aims, and helps students demonstrate knowledge on varied test forms.
4. Same-format preparation
   - Addresses test content formatted exactly like the test.
   - Inappropriate: ethical, but not educationally defensible.
   - Mastery of content is not likely to rise as test scores do; administrators may opt for this anyway because it raises scores.
5. Varied-format preparation
   - Addresses test content using a variety of item formats, e.g., subtraction problems formatted in many different ways: vertical columns, horizontal rows, story problems.
   - Ethical and educationally defensible: students practice the content in different forms, so scores and mastery are both likely to rise.

Summary table:

Test-prep practice        Professional ethics?   Educational defensibility?
Previous form             no                     no
Current form              no                     no
Generalized test-taking   yes                    yes
Same format               yes                    no
Varied format             yes                    yes

(Mnemonic for the five practices: PLANETS CAN GET SIX VOTES.)

CH 15: The Evaluation of Instruction

Evaluation measures teacher success; grading measures student success.

Two types of teacher evaluation:
- Self-evaluation
  - A personal appraisal of one's own instructional activities.
  - Purpose: improvement of teaching effectiveness.
- External evaluation
  - Generally not improvement-focused.

Two categories of external evaluation:
- Summative teacher evaluation: focused on high-stakes personnel decisions.
- Formative teacher evaluation: focused on helping teachers improve their instructional effectiveness; done collaboratively with a supervisor or on-site administrator.

Pretest-posttest model
- Assessing only after instruction does not indicate effectiveness.
- If students have already mastered a curricular aim prior to instruction, outcome evidence is meaningless.
- A pretest-posttest data-gathering model allows for measurement of growth: differences between pretest and posttest scores indicate how much students learned.

Split-and-switch model
- Works well with a reasonably large class.
- Uses 2 test forms of similar difficulty.
- Addresses many of the problems associated with the classic pretest-posttest design.
- Likely test question: what do you call a pretest/posttest design that uses two different forms? (Split and switch.)

CH 16: Grading

Goal-attainment grading (GAG):
1. Clarify curricular aims.
   - Share understandable curricular aims with students and parents.
   - Provide illustrative test items, criteria for evaluation, and exemplars.
2. Choose goal-attainment evidence.
   - Consider which evidence will help determine students' progress on each aim.
   - Share the assessment approach with students and parents.
   - Use well-described goals and assessments.
3. Weight the goal-attainment evidence.
   - Decide how much information about goal attainment each piece of evidence gives.
   - Assign an appropriate weight to each component.
   - Be transparent about weighting decisions:
     - Communicate the evidence used.
     - Communicate the importance assigned to each piece.
4. Arrive at the final goal-attainment grade.
   - Focus on the student's status with regard to the aims, and apply the weights to the evidence.
   - Teachers of younger students should consider grading from a developmental perspective.

Grading options
- Absolute grading
  - The grade represents a level of performance; all students, or no students, could earn an "A."
  - Related to criterion-referenced assessment.
  - Creates legitimate levels of expectation that must be satisfied.
- Relative grading
  - The grade represents relative position within a group; best and worst levels of performance always exist.
  - Related to norm-referenced assessment.
  - Class-specific grading can address differences in group composition and instruction; this flexibility means teachers can change expectations between classes.
- Aptitude-based grading
  - The grade represents actual performance in relation to potential performance.
  - Potential performance is based on aptitude tests or observation; it is difficult to determine and relies on students' test performance or teacher judgment.

Ways grades can be reported
- Letter grades, numerical grades, or verbal descriptors.
- Find out which scheme is employed in the district.

Small notes I took during the final review:
- Selected answers and writing.
- Ch 8: performance assessment, rubrics, and the roles of rubrics.
- Continuum of scoring guides: mystery criteria.
- "Mom bear" rubric: hypergeneral, too soft.
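The weighting arithmetic behind GAG steps 3 and 4 (Ch 16) can be sketched in code. This is a minimal illustration: the component names, weights, and letter-grade cutoffs below are invented for the example, not prescribed by the chapter, and the cutoffs assume an absolute (criterion-referenced) grading scheme.

```python
def final_grade(scores, weights,
                cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Combine percentage scores using announced weights, then map the
    composite to a letter grade. Weights must total 100% (transparency:
    both the evidence and its weighting are shared with students)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must total 100%"
    composite = round(sum(scores[k] * weights[k] for k in weights), 2)
    for cutoff, letter in cutoffs:
        if composite >= cutoff:
            return composite, letter
    return composite, "F"

# Hypothetical goal-attainment evidence for one student.
scores = {"exam1": 84, "exam2": 91, "portfolio": 78}
weights = {"exam1": 0.3, "exam2": 0.3, "portfolio": 0.4}  # announced in advance
print(final_grade(scores, weights))  # (83.7, 'B')
```

The design point is step 3's transparency requirement: because the weights are explicit data rather than an informal judgment, they can be communicated to students and parents before any evidence is collected.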