Study Guide: Foundations of Assessment Final
Chapters 8-16
Ch 8: Rubrics
-Two types of rubrics: performance and product
Performance assessment
- Students create products or demonstrate behaviors rather than answering selected-response or constructed-response questions
- Teachers or others observe the process of construction
- A judgment is made of the performance
- Examples: oral book report, play/skit
- Know: the closer the task is to the real world, the more accurate the evaluation will be
Three features of performance assessment
1. Multiple evaluative criteria
   a. The response must be judged on more than one criterion
   b. Mystery criteria
      i. Intro, body, conclusion, content, CUPS
2. Prespecified quality standards
   a. The evaluative criteria are known in advance
3. Judgmental appraisal
   a. Depends on human judgment
Identifying suitable performance tasks
- Tasks are generally few in number
  o Each task is significant and complex
  o Great care must be taken in creating and selecting the tasks
- Tasks should provide evidence for inferences to be made from key curricular aims
Rubrics:
- Definition
  o Scoring procedures for judging student performance
- Specify the evaluative criteria
  o For instance, a writing composition could be scored on the basis of:
    - organization, word choice, and clarity
    - spelling, punctuation, and grammar
  o Effective rubrics inform instruction
1. Evaluative criteria
   a. Four by four (make even)
2. A clear description of the qualitative differences for each criterion
3. Holistic and analytic scoring (see the sketch below)
   a. Holistic scoring doesn't enumerate students' shortcomings, but is quicker
   b. Analytic scoring provides more diagnostic information
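A minimal sketch of the difference (criteria and point scales illustrative, not from the text): analytic scoring keeps a score per criterion, which is what makes it diagnostic, while holistic scoring collapses everything into one overall judgment.

# Analytic vs. holistic scoring of one essay (criteria/scales illustrative).
analytic = {"organization": 3, "word choice": 4, "clarity": 2}  # 0-4 points each
analytic_total = sum(analytic.values())  # 9 of 12; the low "clarity" flags a weakness
holistic = 3                             # one overall 0-4 judgment; faster, less diagnostic
print(analytic_total, holistic)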
Sources of Error
- Scoring instrument flaws
- Procedural flaws
- Teachers' unintentional personal-bias errors
Rubric writing rules
1. Skill is significant
2. Criteria can be taught
3. Few criteria
4. Succinctly labeled
5. Match length of rubric to tolerance for detail
6. Even number of quality levels (not Popham)
CH 9: Portfolio Assessment
= A systematic collection of students’ work updated as achievement and skills grow
Seven Key Ideas (not a list on the final, but you have to apply them)
1. Make sure students own their portfolios
   a. Must be perceived as collections of their work, not receptacles for products you will grade
2. Decide what kinds of work samples to collect
   a. Substantial variety is preferred over a limited range of work products
3. Collect and store work samples
   a. Students decide, with teacher input, what to collect
   b. Students collect and store them
   c. Students must have access
4. Select criteria by which to evaluate work
   a. Best if determined by teacher and student together
   b. Must be clearly described for self-evaluation to be successful
5. Students should evaluate products continually
   a. Can evaluate holistically and/or analytically
   b. Evaluations are dated and attached to the product
6. Schedule and conduct portfolio conferences
   a. Conversation is essential to effectiveness
7. Involve parents in the process
   a. Communicate the process to parents early
Types of Portfolios
1. Documents student progress, growth over time (student evaluation) = working portfolio
   a. Provides evidence of student growth or lack of it
   b. Provides meaningful opportunities for self-evaluation
2. Best work = celebration portfolio
   a. A selection of best work as a celebration
      i. Especially appropriate for early grades
3. Work required to move to the next level = passport portfolio
CH 10: Affective Survey
= a systematic assessment of students' attitudes, interests, and values (VIA)
- Values
- Interest or lack of interest in a topic
- Attitudes toward learning
- Affect predicts behavior = an individual's affective status predicts that individual's future behavior, although not perfectly
Rensis Likert inventories
  o A series of statements to which students register their agreement or disagreement
  o The number of choices reflects students' development (typically three choices for 3rd grade or younger); if very young, can use smile and frown faces
  o Uses qualifiers to avoid "always" or "never" scenarios
  o Can be time consuming to create and use
  o Not appropriate for measuring more than one affect variable, so rarely recommended/used
Building a Likert inventory (scoring sketch follows this list)
1. Pick your subject: pick the variable to measure
2. Determine the number of items per variable (our class: 10 different approaches, create a pair, 20 in total)
3. Create a series of positive and negative statements related to each affective variable
   a. Mix up the order (could do an A-Z sort)
4. Determine the number and phrasing of students' response options
   a. Consider 5 options: strongly agree, agree, uncertain, disagree, strongly disagree
5. Create clear directions for the inventory and an appropriate presentation format
   a. Must include a sample item in the directions for ours (pg 256)
   b. Explain how to respond
   c. Guarantee anonymity
      i. No handwriting, no name, collection model (finish, put in envelope, and place in bin)
   d. Remind students there are no wrong answers
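A minimal scoring sketch (not from the text): positive statements score 5 down to 1 from strongly agree to strongly disagree, and negative statements are reverse-scored so that a higher total always means more positive affect. Items and responses are made up.

# Scoring a Likert inventory (items/data illustrative).
# Responses are 1-5 (strongly disagree .. strongly agree).
items = [
    {"statement": "I enjoy reading on my own.", "positive": True},
    {"statement": "Reading time feels like a chore.", "positive": False},
]
responses = [5, 2]  # one student's answers, in item order

total = 0
for item, r in zip(items, responses):
    total += r if item["positive"] else 6 - r  # reverse-score negative statements
print(total)  # 9 of 10 -> fairly positive attitude toward reading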
CH 11: Improving Tests
2 categories of improvement:
- Words, aka judgmentally based improvement procedures
- Numbers, aka empirically based improvement procedures
Five criteria for judgmentally based improvement procedures (pg 267)
1. Remember the commandments/specific guidelines
2. Contribution to the score-based inference
3. Accuracy of content
4. Absence of content lacunae (gaps)
   a. Not leaving anything out
5. Fairness
(GIRAF)
Three groups of people who review test items:
- Test writers
- Professional colleagues
- Students
Difficulty Index
p = R / T
  p: difficulty index (proportion answering the item correctly)
  R: number of right answers to the item
  T: total number of test takers
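A minimal sketch of the computation (numbers illustrative, not from the text):

# Difficulty index: p = R / T.
def difficulty_index(right: int, total_takers: int) -> float:
    return right / total_takers

print(difficulty_index(30, 40))  # 0.75 -> three quarters answered correctly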
Item-Discrimination Index
- Positive discriminator
  o High scorers > low scorers
- Negative discriminator
  o High scorers < low scorers
- Nondiscriminator
  o High scorers = low scorers
1. Order test scores from high to low
2. Divide the papers into a high group and a low group with an even number in each; if odd, throw out the median paper
3. Calculate a p value for the high group (ph) and a p value for the low group (pl)
4. Subtract pl from ph to obtain each item's discrimination index: D = ph - pl
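A minimal sketch of those four steps (data and names illustrative; each student is a (total score, got-this-item-right) pair):

# Item-discrimination index per the steps above (data illustrative).
def discrimination_index(students):
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    if len(ranked) % 2 == 1:
        ranked.pop(len(ranked) // 2)      # odd class size: drop the median paper
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[half:]
    p_h = sum(item for _, item in high) / half
    p_l = sum(item for _, item in low) / half
    return p_h - p_l                       # D = ph - pl

students = [(95, 1), (90, 1), (85, 1), (70, 0), (60, 1), (50, 0)]
print(round(discrimination_index(students), 2))  # ph=1.0, pl=0.33 -> D=0.67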
D value         Interpretation
.40 and above   Very good items
.30-.39         Reasonably good items, but possibly subject to improvement
.20-.29         Marginal items, usually needing improvement
.19 and below   Poor items, to be rejected or improved by revision
Distractor Analysis (pg 277)
You need to know how to draw this. You basically take the item and tally how many people in the upper and lower halves chose each answer option.
Item #28 (p = .50, D = -.33)
Alternatives        A    B*   C    D    Omit
Upper 15 students   2    5    0    8    0
Lower 15 students   4    10   0    0    1
(* = correct answer)
Answer B is correct, but the item probably needs fixing because most of the upper students did not choose it (hence D = -.33). Distractor D needs review because the high scorers went for it while none of the low scorers did. Distractor C is doing nothing for the item because no one chose it.
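A minimal sketch of building such a tally (data illustrative, chosen to match Item #28 above):

# Tally answer choices by upper/lower group ("" = omit).
from collections import Counter

upper = ["A"] * 2 + ["B"] * 5 + ["D"] * 8   # top half of the class (15 students)
lower = ["A"] * 4 + ["B"] * 10 + [""]       # bottom half (15 students)

for label, group in (("Upper", upper), ("Lower", lower)):
    tally = Counter(group)
    print(label, [tally.get(c, 0) for c in ("A", "B", "C", "D", "")])
# Upper [2, 5, 0, 8, 0] / Lower [4, 10, 0, 0, 1] -> matches the table above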
CH 12: Formative Assessment
Formative assessment
- A planned process in which assessment evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics
- Occurs during learning
Summative assessment
 Occurs after learning
Learning progression
 A sequenced set of building blocks students must master en route to mastering a target
curricular aim
 Purpose: identify the lower order curricular aims (building blocks) needed to master
higher order aims
o Higher order aim: write a clear explanatory essay
o Lower order aims: know the important mechanical conventions of spelling,
punctuation, and grammar (enabling knowledge), be able to present information
in an organized way (cognitive subskill)
o Building blocks: lesser aims that enable higher order aims (guidelines: modest in
number, truly rather than arguably requisite, sequenced according to teacher
judgment)
 Creating a learning progression
o Acquire a thorough understanding of the target curricular aim
o Identify all requisite “must know” subskills and bodies of enabling knowledge
o Determine the measurability of each building block (determine whether each building
block can be assessed; eliminate building blocks that cannot be assessed)
o Arrange the remaining building blocks in an instructionally sensible sequence
Formative assessment applies a common everyday process to instruction
 Identify a desired end
 Consider the strategies (means) to try to reach that end
 Reflect on effectiveness
 Employ different means if the situation indicates a need
Deterrents to the use of formative assessment
 Educators’ misunderstandings about formative assessment
 Resistance to change
 Failure of external accountability tests to reflect formative assessment driven
improvements
CH 13: Standardized Tests
3 things that make a test standardized:
- Administered, scored, and interpreted the same way
Central Tendency and Variability
- Measures necessary to describe the performance of a group
  o Mean: the average score
  o Median: the middle score
  o Range: highest score minus lowest score
  o Standard deviation: the average distance of scores from the mean
    - Know the definition of SD, but you do not have to compute it
- Distribution = a fancy way to say a set of scores
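A minimal sketch computing all four measures with Python's statistics module (the score list is illustrative):

import statistics

scores = [72, 85, 85, 90, 98]               # a distribution = a set of scores
print(statistics.mean(scores))              # mean: 86.0
print(statistics.median(scores))            # median: 85
print(max(scores) - min(scores))            # range: 26
print(round(statistics.pstdev(scores), 2))  # SD: typical distance from the mean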
Percentiles
- Indicate the % of students in the norm group that the student outperformed
  o A percentile of 60 indicates the student performed better than 60% of the students in the norm group
- Most frequently used relative score
- Easy to understand
- Usefulness relies on the quality of the norm group
- "Percentage of outscored students"
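A minimal sketch of the "percentage of outscored students" idea (norm-group scores illustrative):

def percentile_rank(score, norm_group):
    """Percent of the norm group this score outperforms."""
    outscored = sum(1 for s in norm_group if s < score)
    return 100 * outscored / len(norm_group)

norm_group = [40, 55, 60, 62, 70, 75, 80, 88, 90, 95]
print(percentile_rank(82, norm_group))  # 70.0 -> outperformed 70% of the norm group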
Grade-Equivalent Scores
- Relative interpretation
- Estimates student performance using grade level and months of the school year
  o It is a developmental score, as it represents a continuous range of grade levels
  o Example: in a grade-equivalent score of 4.5, the "4" is the grade and the ".5" is the fifth month of the school year
- The score is an estimate of the grade level of a student who would obtain that score on that particular test
- Most appropriate for basic skills like reading and math
- Two assumptions are implied:
  1. The subject area tested is equally emphasized at each grade level
  2. Student mastery increases at a constant rate
-raw score: number of correct answers
Scale Scores
- Relative interpretation
- Often used to describe group test performances at the state, district, and school levels
- Use converted raw scores
- Scores are positioned on an arbitrarily chosen scale
- Represent student performance
- Can be used to make direct comparisons between groups
Stanine Scores
- Also assume the distribution of scores is normal
- Less precise, and therefore more easily tolerate deviations from normality
- The normal curve is divided into 9 segments (see the sketch below)
  o Each segment represents a range of percentiles
  o The fifth stanine is the center and covers the middle 20% of scores
- 1, 2, 3 = low performing
- 4, 5, 6 = average performing
- 7, 8, 9 = high performing
- 5 is right in the middle
- The highest and lowest stanines always cover equivalent percentages because the bell curve is symmetric
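A minimal sketch mapping a percentile rank to a stanine, assuming the conventional 4-7-12-17-20-17-12-7-4 percent bands (these notes only state the middle 20%, so the other bands are an assumption):

import bisect

# Cumulative percentile ceilings for stanines 1-9, from the conventional
# 4-7-12-17-20-17-12-7-4 percent bands of the normal curve (assumed).
CEILINGS = [4, 11, 23, 40, 60, 77, 89, 96, 100]

def stanine(percentile: float) -> int:
    return bisect.bisect_left(CEILINGS, percentile) + 1

print(stanine(60))  # 5 -> the middle 20% (percentiles 41-60)
print(stanine(95))  # 8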
CH 14: Inappropriate Test Prep
2 guidelines
1. Professional ethics
   a. Test prep practices must not violate ethical professional norms
   b. Teachers model appropriate behavior for children
2. Educational defensibility
   a. Test prep must not increase scores without also increasing student mastery
   b. Practices that increase both scores and mastery are desirable
   c. Practices that only increase test performance short-change students
5 test prep practices
1. Previous-form preparation
   a. Practice and instruction using an earlier version of the test
   b. Inappropriate: neither ethical nor educationally defensible
      i. Coaching merely for test-score gains
      ii. Likely to boost scores without boosting mastery
2. Current-form preparation
   a. Practice and instruction on the current test itself
   b. Inappropriate: neither ethical nor educationally defensible
      i. The current form could only be used if obtained unethically
      ii. An outright example of cheating
3. Generalized test-taking preparation
   a. Covers test-taking skills addressing a variety of test formats
   b. Appropriate both ethically and educationally
      i. SHOULD BE BRIEF
      ii. Maintains momentum on curricular aims
      iii. Helps students demonstrate knowledge on varied test formats
4. Same-format preparation
   a. Addresses test content formatted similarly to the test
   b. Inappropriate: ethical, but not educationally defensible
      i. Mastery of content is not likely to rise as test scores do
      ii. Administrators may opt for this anyway
5. Varied-format preparation
   a. Addresses test content using a variety of item formats
      i. Example: subtraction problems formatted in many different ways (vertical columns, horizontal rows, story problems)
   b. Ethical and educationally defensible
      i. Practice with content in different formats
      ii. Scores and mastery are both likely to rise
Test prep practice        Professional ethics   Educational defensibility
Previous form             no                    no
Current form              no                    no
Generalized test taking   yes                   yes
Same format               yes                   no
Varied format             yes                   yes
PLANETS CAN GET SIX VOTES (mnemonic: Previous, Current, Generalized, Same, Varied)
CH 15: The Evaluation of Instruction
- Evaluation measures teacher success; grading measures student success
Two types of teacher evaluation:
 Self evaluation
o Personal appraisal of instructional activities
o purpose: improvement of teaching effectiveness
 External evaluation
o Generally not improvement-focused
Two categories of external evaluation
 Summative teacher evaluations
o Focused on high-stakes personnel decisions
 Formative teacher evaluations
o Focused on helping teachers improve their instructional
effectiveness
o Done collaboratively with a supervisor or on-site administrator
Pretest-posttest model
- Assessing only after instruction does not indicate effectiveness
- If students have already mastered a curricular aim prior to instruction, outcome evidence is meaningless
- A pretest-posttest data-gathering model allows for measurement of growth
- Differences between pretest and posttest scores indicate how much students grew during instruction
Split-and-Switch Model
- Works well with a reasonably large class
- Uses 2 test forms of similar difficulty: half the class takes one form as the pretest and the other form as the posttest, while the other half does the reverse
- Addresses many of the problems associated with the classic pretest-posttest design
  o Likely test question: what do you call a pretest/posttest design that uses two different forms? (split and switch)
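A minimal sketch of the resulting comparison, assuming the design described above (all scores illustrative): because each form is taken by one half of the class before instruction and by the other half after, same-form score differences estimate growth without anyone seeing the same items twice.

# Split-and-switch comparisons (all data illustrative).
# Half 1 pretests on Form A, posttests on Form B; Half 2 does the reverse.
form_a_pre  = [55, 60, 58, 52]   # Half 1, before instruction
form_b_post = [78, 82, 79, 85]   # Half 1, after instruction
form_b_pre  = [50, 62, 57, 54]   # Half 2, before instruction
form_a_post = [80, 75, 85, 83]   # Half 2, after instruction

def mean(xs):
    return sum(xs) / len(xs)

# Same form, different halves: pre-vs-post difference per form estimates growth.
print(mean(form_a_post) - mean(form_a_pre))   # Form A growth estimate
print(mean(form_b_post) - mean(form_b_pre))   # Form B growth estimate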
CH 16: Grading
Goal-Attainment Grading (GAG)
1. Clarify curricular aims
   a. Share understandable curricular aims with students and parents
   b. Provide illustrative test items
   c. Provide criteria for evaluation
   d. Provide exemplars
2. Choose goal-attainment evidence
   a. Consider which evidence will help determine students' progress on the aim
   b. Share the assessment approach with students and parents
   c. Use well-described goals and assessments
3. Weight the goal-attainment evidence (see the sketch after this list)
   a. Decide how much information about goal attainment each piece of evidence gives
   b. Assign an appropriate weight to each component
   c. Be transparent about weighting decisions
      i. Communicate the evidence used
      ii. Communicate the importance assigned to each piece
4. Arrive at the final goal-attainment grade
   a. Focus on the student's status with regard to the aim and apply the weights to the evidence
   b. Teachers of younger students should consider grading from a developmental perspective
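A minimal sketch of steps 3-4 (the evidence sources and weights are illustrative, not prescribed by the text):

# Weighted goal-attainment grade (evidence/weights illustrative).
# Each entry: (score on a 0-100 scale, weight reflecting how much information
# that evidence gives about goal attainment).
evidence = [
    (88, 0.5),  # unit exam
    (92, 0.3),  # performance task
    (75, 0.2),  # quizzes
]
grade = sum(score * w for score, w in evidence) / sum(w for _, w in evidence)
print(round(grade, 1))  # 86.6 -> final goal-attainment grade before conversion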
Evaluation options
- Absolute grading
  o Grade represents a level of performance
  o All students or no students could get an "A" grade
  o Related to criterion-referenced assessment
  o Creates legitimate levels of expectation that must be satisfied
- Relative grading
  o Grade represents relative position within a group
  o Best and worst levels of performance always exist
  o Related to norm-referenced assessment
  o Class-specific grading can address differences in group composition and instruction
  o Flexibility means teachers can change expectations among classes
- Aptitude-based grading
  o Grade represents actual performance in relation to potential performance
  o Potential performance is based on aptitude tests or observation
    - Difficult to determine
    - Relies on students' test performance or teachers' judgment
Ways grades can be reported
 Letter grades
 Numerical grades
 Verbal descriptors
 Find out which scheme is employed in the district
Weird small notes I took during the final review:
- Selected answers and writing
- Ch 8: performance assessment, rubrics
- Roles of rubrics
- Continuum of scoring guides: mystery criteria
- Mama bear = hypergeneral, too soft