Learning from Interim Assessments: District
Implementation to Classroom Practice
JAMES H. MCMILLAN
LISA M. ABRAMS
VIRGINIA COMMONWEALTH UNIVERSITY
MARCES Annual Conference
University of Maryland, College Park
October 20, 2011
(PowerPoint available at http://www.soe.vcu.edu/merc)
Flight Plan
 Why we need interim assessment
 What the research says about impact
 Qualitative study summary
 Quantitative study summary
 Recommendations for practice
Need for Interim Assessments
 Increased pressure to understand student achievement:
   – Are students making progress toward meeting the requirements of the state test?
   – Are students on track to pass the state test?
   – Are subgroups of students on track to meet AYP targets?
 Greater information needs:
   – Measure student progress relative to a set of specific content standards/skills
   – Identify content areas of strength and areas for improvement
   – Shape instructional decisions
   – Serve as an “early warning” system
   – Inform strategies to support the learning of individual students
   – Provide results that can be aggregated at the classroom, grade/team, school, and district levels
Offer a Range of Instructional Uses
(see Supovitz & Klein, 2003)
 Planning:
   – Decide on content
   – Set pacing and instructional strategies or approaches (e.g., mastery orientation)
 Delivery:
   – Target instruction to the whole class or to small groups, depending on mastery of content/skills
   – Provide feedback and/or re-teach selected content and/or skills
   – Select and use supplemental or additional resources
 Remediation:
   – Identify low-performing students
   – Design plans for providing additional supports/assistance
 Evaluation:
   – Monitor/track student progress
   – Examine effectiveness of interventions
   – Determine instructional effectiveness
What We Know About Interim Testing
 Widespread use across districts in Virginia and nationally (Marsh, Pane & Hamilton, 2006).
 Mixed views on the usefulness of interim test results:
   – Compared to their own classroom assessments, interim tests are seen as less useful and as providing redundant information.
   – Compared to the state test, interim results are seen as more useful to “identify and correct gaps in their teaching.”
 Factors that influence teachers’ views: quick turnaround of results, alignment with curriculum, capacity and support, instructional leadership, perceived validity, reporting, added value
Impact on Teachers
 Informs instructional adjustments (Brunner et al., 2005; Marsh, Pane &
Hamilton, 2006; Oláh, Lawrence & Riggan, 2010; Yeh, 2006)
 Increased collaboration and problem solving (Lachat & Smith, 2005;
Wayman & Cho, 2009; Yeh, 2006)
 Enhanced self-efficacy, increased reflection (Brunner et al., 2005; Yeh,
2006)
 Increased emphasis on testing and test preparation; colleagues and standards become the primary influences on practice (Loeb, Knapp & Elfers, 2008)
 Variability within schools – some teachers use the information, others do not; 80% of the variability in teacher survey responses was within rather than between schools (Marsh, Pane & Hamilton, 2006).
Impact on Students
 Achievement – although research is limited, it suggests the impact may be mixed:
   – Targeted instruction led to improvements in student test scores (Lachat & Smith, 2005; Nelson & Eddy, 2008; Trimble, Gay & Matthews, 2005; Yeh, 2006) and proficiency in reading and mathematics (Peterson, 2007).
   – Large-scale studies have failed to find significant differences in student achievement between treatment and comparison schools (Henderson, Petrosino & Guckenburg, 2008; May & Robinson, 2007; Quint, Speanik & Smith, 2008).
 Increased engagement and motivation (Yeh, 2006)
 Increased access to learning opportunities – tutoring and remedial services (Marsh, Pane & Hamilton, 2006)
 Targeted instruction toward the “bubble kids”
MERC Research on Interim Assessments
 Qualitative study
   – Explored the extent to which teachers used interim test results to support learning.
 Quantitative study
   – Designed to examine teachers’ self-reports about using interim test results and the influence of results on instruction.
 Research questions:
   – What conditions are necessary to promote use of test results?
   – How do teachers analyze and use interim test results to inform instruction? To inform decisions about students?
   – What most influences teachers’ use of test results?
Qualitative Study Research Design and Methods
 Qualitative double-layer category focus-group design (Krueger & Casey, 2009)
   – Layers: school type & district (N=6)
 Protocol covered:
   – the general nature of interim testing policies and the type of data teachers receive
   – expectations for using interim test results
   – instructional uses of interim test results
   – general views on interim testing policies, practices, and procedures
 Focus group sessions
Participants
 Selection: two-stage convenience sampling process
   – District → School Principal → Teachers
 Data Collection:
   – Spring 2009/Fall 2010; 15 focus groups with 67 core-content-area teachers
 Demographic Profile:
   – The majority were white (82%), female (88%), and taught at the elementary level (80%)
   – Average of 11.5 years of classroom experience (range of 1-34 yrs.)
   – 33% were beginning teachers with 1-3 years of teaching experience; 20% had been teaching for over 20 years
   – 20% were middle school teachers in civics, science, mathematics, and language/reading
Data Analysis
 A transcript-based approach within a constant-comparative analytic framework was used to identify emergent patterns or trends (Krueger & Casey, 2009).
 Analysis focused on the frequency and extensiveness of viewpoints or ideas.
 Codes were created in 9 key areas and applied to the text:
   – e.g., “alignment”, “test quality”, “individualized instruction”, “testing time”
 High inter-coder agreement
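The deck reports “high inter-coder agreement” without naming a statistic. As a minimal sketch, assuming two coders assigned one of the study’s codes to each transcript segment, a common chance-corrected agreement measure is Cohen’s κ (the code labels and data below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned by two coders to the same ten transcript
# segments; labels come from the study's code categories.
coder_a = ["alignment", "test quality", "testing time", "alignment",
           "individualized instruction", "test quality", "alignment",
           "testing time", "test quality", "alignment"]
coder_b = ["alignment", "test quality", "testing time", "test quality",
           "individualized instruction", "test quality", "alignment",
           "testing time", "test quality", "alignment"]

# kappa corrects raw percent agreement for agreement expected by chance
print(round(cohen_kappa_score(coder_a, coder_b), 2))
```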
Findings: District Policies and Expectations
 Theme 1: Interim testing policies related to test construction and administration were similar among school divisions. Inconsistencies were evident across content areas and grade levels within districts.

“They are graded, but they are not part of their grade. So they [interim test results] will show up on their report card as a separate category just so parents know and the students know what the grade is, but it doesn’t have any effect on their class grade.”

 Theme 2: There are clear and consistent district- and building-level expectations for teachers’ analysis and use of interim test results to make instructional adjustments in an effort to increase student achievement.

“Our principal expects when you have a grade level meeting to be able to say, this is what I’m doing about these results, because it is an unwritten expectation but it is clearly passed on… by sitting down with them the first time they are giving the test and describing how you do data analysis and literally walking them through it and showing them patterns to look for.”
Findings: Access to Results and Analysis
 Theme 3: Timely access to test results and use of a software program supported data analysis and reporting.

“If we are supposed to be using this information to guide instruction we need immediate feedback, like the day of, so we can plan to adjust instruction for the following day.”

 Theme 4: It was important for teachers to discuss results with others and to have time with colleagues to do so.

“We have achievement team meetings where we look at every single teacher, every single class, everything, and look at the data really in depth to try to figure out what’s going on. What is the problem with this class? Why is this one doing better?”
Findings: Informing Instruction
 Theme 5: Teachers analyze interim test results at the class and individual student level to inform review, re-teaching, and remediation or enrichment.

“If I see a large number of my students missing in this area, I am going to try to re-teach it to the whole class using a different method. If it is only a couple of [students], I will pull them aside and instruct one-on-one.”

“It makes a difference in my instruction. I mean, I think I’m able to help students more that are having difficulty based on it. I am able to hone in on exactly where the problem is. I don’t have to fish around.”

 Theme 6: A variety of factors related to data quality and validity affect teachers’ use of interim test data.

“We really need to focus on the tests being valid. It is hard to take it seriously when you don’t feel like it is valid. When you look at it and you see mistakes or passages you know your students aren’t going to be able to read because it is way above their reading level.”
Findings: Testing Time vs. Learning Time
 Theme 7: Teachers expressed significant concerns about the amount of instructional time that is devoted to testing and the implications for the quality of their instruction.

“I think it has definitely made us change the way we teach because you are looking for how can I teach this the most effectively and the fastest… that is the truth, you have got to hurry up and get through it [curriculum] so that you can get to the next thing so that they get everything [before the test]. I do feel like sometimes I don’t teach things as well as I used to because of the time constraints.”

“You are sacrificing learning time for testing time… we leave very little time to actually teaching. These kids are losing four weeks out of the year of instructional time.”
Conclusions From Qualitative Study
 In the main, consistent with other research.
 Importance of conversations among teachers.
 Relatively little emphasis on instructional correctives.
 Alignment and high-quality items are essential.
Quantitative Study: Research Design and Methods
o Survey design
o Conducted Spring 2010
o Administered online in 4 school districts
o Target population: elementary (4th and 5th grades) and middle school teachers (core content areas)
o 460 teachers responded; 390 with useable responses
o Response rates ranged from 25.4% to 85.1% across the districts¹
o Survey items adapted from the Urban Data Study survey, American Institutes for Research
o Analyses
   o Frequency and measures of central tendency
   o Factor analysis and regression procedures
1. Response rate reported for 3 of the 4 participating districts due to differences in recruitment procedures.
Demographic Information: Race and Gender

Characteristic              n     %
Gender a
  Males                     61    15.7
  Females                   328   84.3
Race
  White                     362   93.0
  Black/African American    21    5.4
  Other                     7     1.8
Note: Total Sample Size N = 390. a. The data contain one missing value.
Demographic Information: Grade Level and Years of Experience

Characteristic           n     %
Grade Level
  Elementary             169   43.3
  Middle                 221   56.7
Teaching Experience
  5 or fewer years       56    14.9
  6-10 years             93    24.8
  11+ years              226   60.3
Note: Total Sample Size N = 390.
Demographic Information: Subjects and Grade Level

Characteristic                     n     %
Subject a
  All (Elementary)                 102   26.2
  Reading/English/Language Arts    119   30.5
  Mathematics                      111   28.5
Grade Level
  Elementary                       169   43.3
  Middle                           221   56.7
Note: Total Sample Size N = 390. a. Responses to this item allowed for multiple selections.
Demographic Information: Degrees

Educational Qualification a                     n     %
  Bachelors Degree                              382   97.9
  Masters Degree                                197   65
  Educational Specialist/Professional Diploma   36    18.8
  Certificate of Advanced Graduate Studies      21    11.7
Note: Total Sample Size N = 390. a. Frequencies will not add up to N = 390 due to multiple selections by participants.
Interim Assessment Survey
Survey Topics:
  Policies and Procedures
  Accessing Test Data
  Analyzing Results
  Instructional Uses
  Attitudes
  Demographics

Variables for Analysis:
  Six Conditions for Use
  Instructional Adjustments
  Authentic Strategies
  Use of Scores
  Traditional Strategies
Condition 1: Alignment
Items on Scale                                       % Agree or Strongly Agree
1. Well-aligned with state and division standards    77%
2. Well-aligned with the state assessment            69%
3. Well-aligned with the pacing guides               75%
4. Well-aligned with what I teach in the classroom   77%
5. Appropriately challenging for my students         69%
Note: Scale – Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4.
Reliability estimate for the scale, Cronbach’s α = .901 (n = 300).
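Each conditions scale on these slides reports a Cronbach’s α reliability estimate. As a minimal sketch of how such an estimate can be computed, assuming item responses sit in a pandas DataFrame (the column names and random demo data are hypothetical, not from the study):

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a multi-item scale: rows are respondents,
    columns are scale items; listwise deletion of missing responses."""
    items = items.dropna()
    k = items.shape[1]                          # number of items on the scale
    item_var = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical demo: five 4-point alignment items for 300 respondents.
# (Random, uncorrelated data, so alpha will be near 0; the real
# alignment items yielded the .901 reported above.)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(1, 5, size=(300, 5)),
                    columns=[f"align_{i}" for i in range(1, 6)])
print(round(cronbach_alpha(demo), 3))
```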
Condition 2: Division (District) Policy
Items on Scale                                                                                        % Agree or Strongly Agree
1. The division sets clear, consistent goals for schools to use data for school improvement.          63%
2. Division staff provide information and expertise that support the data use efforts at my school.   48%
3. The division’s data use policies help us address student needs at our school.                      45%
4. The division has designated adequate resources (e.g., time, staff, money) to facilitate teachers’ use of data.   28%
Note: Scale – Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4.
Reliability estimate for the scale, Cronbach’s α = .864 (n = 267).
Condition 3: School Environment
Items on Scale                                                                     % Agree or Strongly Agree
1. Teachers in this school are continually learning and seeking new ideas.         86%
2. Teachers are engaged in systematic analysis of student performance data.        72%
3. Teachers in this school approach their work with inquiry and reflection.        83%
4. Assessment of student performance leads to changes in the curriculum.           54%
5. Teachers in this school regularly examine school performance on assessments.    78%
Note: Agreement Scale – Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4.
Reliability estimate for the scale, Cronbach’s α = .856 (n = 283).
Condition 4: Time Spent Analyzing
and Reviewing Interim Data
Items on Scale                                  % 1-2 or More Hours
Independently                                   70%
Analyzing with other teachers                   46%
Analyzing with principal/assistant principal    11%
With students                                   45%
With parents                                    8%
Note: Frequency Scale – 0; <1 hour; 1-2 hours; 2-3 hours; more than 3 hours.
Reliability estimate for the scale, Cronbach’s α = .718 (n = 358).
Condition 5: Frequency of Analysis and Review
Items on Scale                       % 1-2 Times a Month or More
Department chair/grade-level chair   15%
Grade level lead teacher             20%
Other teachers                       34%
Instructional coaches                10%
School administrators                12%
Central office staff                 2%
Parents/guardians                    11%
Students                             26%
Note: Frequency scale – Never = 1; 1-2 times a quarter = 2; 1-2 times a month = 3; 1-2 times a week = 4.
Reliability estimate for the scale, Cronbach’s α = .815 (n = 200).
Condition 6: Teachers’ Interactions
Items on Scale                     % Moderate or Major Extent
Grade level teams to review data   56%
Share ideas to improve teaching    56%
Share and discuss student work     66%
Discuss unsuccessful lessons       56%
Note: Extent Scale – Not at all = 1; Slight Extent = 2; Moderate Extent = 3; Major Extent = 4.
Reliability estimate for the scale, Cronbach’s α = .867 (n = 361).
Conditions: Some Additional Individual Items
Items on Scale                               % Hindering to a Moderate or Major Extent
Lack of time to study and think about data   51%
Lack of time to collaborate with others      52%
Insufficient professional development        27%
Data provided too late for use               7%
Curriculum pacing pressures                  60%
Of little use in my instruction              37% agree or strongly agree
Note: Extent scale – Not at all = 1; Minor = 2; Moderate = 3; Major = 4.
Instructional Adjustments
Component 1 (n = 303; α = .90)
Scale range = 1-4: 1 = no influence or change on instruction; 4 = major influence or change on instruction

Item                                                                             Loading   M      SD
Adjusting goals for student learning                                             .758      2.40   .954
Determining a student’s grouping for instruction                                 .735      2.19   .993
The types of assessments I use to evaluate students                              .710      2.24   .940
Adjusting pacing in areas where students encountered problems                    .698      2.40   .911
The instructional strategies I employ                                            .689      2.56   .957
Adjusting use of textbooks and instructional materials                           .672      2.12   .953
Changed teaching method (e.g., lecture, cooperative learning, student inquiry)   .650      2.57   .956
The curriculum content I teach                                                   .650      1.89   .911
Use same-level achievement groupings                                             .619      2.19   1.00
Changed the sequence of instruction                                              .617      2.30   1.00
Used mixed-level achievement groupings                                           .557      2.27   1.02
Added, deleted, or changed skills taught                                         .541      2.57   .917
Component mean                                                                             2.31
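The component tables on this and the following slides report item loadings from a factor analysis; the deck does not say which extraction or rotation method was used. As a minimal, hypothetical sketch using unrotated principal components (the item names and random data below are placeholders, not the study data):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical 1-4 Likert responses for the 12 adjustment items
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.integers(1, 5, size=(303, 12)),
                 columns=[f"adjust_{i}" for i in range(1, 13)])

# Standardize, then extract components; on standardized data a loading
# (component weight scaled by the root eigenvalue) is the correlation
# between an item and that component
Z = (X - X.mean()) / X.std(ddof=1)
pca = PCA(n_components=4).fit(Z)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(pd.DataFrame(loadings, index=X.columns,
                   columns=[f"comp_{j}" for j in range(1, 5)]).round(3))

# Item means and SDs, as reported alongside the loadings in the slides
print(X.mean().round(2))
print(X.std(ddof=1).round(3))
```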
Instructional Adjustments: Some Individual Items
 85% of teachers reported making some kind of change
in instructional strategies
 67% of teachers reported some level of change in
student expectations
 84% of teachers reported some level of influence in
adjusting goals for student learning
 35% of teachers indicated that reviewing results with
their principal or assistant principal was somewhat or
very useful
Instructional Adjustments: Some Individual Items
Items on Scale               % Increase
Problem-solving activities   58%
Cooperative/group learning   49%
Inquiry/investigation        47%
Peer tutoring                31%
Collaboration/team teaching  29%
Worksheets                   8%
Textbook-based assignments   8%
Lecturing                    7%
Authentic Instructional Strategies
Component 2 (n = 310; α = .82)
Scale range = 1-5: 1 = large decrease in use of strategy; 5 = large increase in use of strategy

Item                              Loading   M      SD
Inquiry/investigation             .767      3.54   .798
Problem-solving activities        .732      3.67   .790
Project-based assessments         .697      3.31   .816
Use of student response journals  .659      3.55   1.05
Collaborative/team teaching       .630      3.64   .998
Peer or cross-age tutoring        .616      3.61   .948
Use of portfolios                 .606      3.61   1.21
Cooperative learning/group work   .602      3.55   .739
Component mean                              3.56
Use of Scores
Component 3 (n = 283; α = .79)
Scale range = 1-4: 1 = no use; 4 = extensive use

Item                                                                                  Loading   M      SD
Results for subgroups of students (e.g., SWD, ELL/LEP, gender, race/ethnicity)        .766      2.36   1.05
Scale scores or other scores that show how close students are to performance levels   .736      2.43   1.03
Results for each grade level                                                          .724      2.23   1.07
Results for specific reporting categories                                             .698      2.77   1.03
Percent of students scoring at or above the proficient level                          .662      2.85   .971
Component mean                                                                                  2.53
Traditional Instructional Strategies
Component 4 (n = 320; α = .61)
Scale range = 1-5: 1 = large decrease in use of strategy; 5 = large increase in use of strategy

Item                        Loading   M      SD
Lecturing                   .687      2.94   .732
Worksheets                  .635      2.97   .687
Textbook-based assignments  .563      3.14   1.09
Component mean                        3.02
Bivariate Correlations Between Conditions and Use
Condition                          Instructional   Authentic       Use of    Traditional
                                   Adjustments     Instructional   Scores    Instructional
                                                   Strategies                Strategies
Alignment                          .229*           .101            .266*     .007
Division Policy                    .336**          .087            .373**    -.022
School Environment                 .218**          .084            .189*     .088
Frequency of Analysis and Review   .425**          .249**          .400**    .036
Teachers’ Interactions             .381**          .113*           .398**    -.039
Time Spent Analyzing               .425**          .249**          .400**    .036
*correlations significant at .05; **correlations significant at .01.
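As a minimal sketch of how a table like this can be produced, assuming each condition and use scale has been scored as a column in a pandas DataFrame (the file and column names in the usage comment are hypothetical):

```python
import pandas as pd
from scipy import stats

def correlation_table(df: pd.DataFrame, condition_cols, use_cols) -> pd.DataFrame:
    """Pearson r between each condition scale and each use scale,
    flagged * for p < .05 and ** for p < .01 (pairwise deletion)."""
    rows = {}
    for c in condition_cols:
        row = {}
        for u in use_cols:
            pair = df[[c, u]].dropna()          # pairwise deletion of missing
            r, p = stats.pearsonr(pair[c], pair[u])
            row[u] = f"{r:.3f}" + ("**" if p < .01 else "*" if p < .05 else "")
        rows[c] = row
    return pd.DataFrame(rows).T                 # conditions as rows

# Hypothetical usage:
# df = pd.read_csv("teacher_scales.csv")
# print(correlation_table(df,
#       ["alignment", "division_policy", "school_environment"],
#       ["instr_adjustments", "authentic_strategies", "use_of_scores"]))
```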
Stepwise Regression:
Conditions With Instructional Adjustments
Model   Variable Entered           R      R²     Beta   Sig.
1       Time Spent Analyzing       .488   .239   .300   .000
2       Division Policy            .530   .281   .153   .000
3       Frequency Reviewing Data   .549   .301   .154   .006
4       Teachers’ Interactions     .559   .312   .129   .039
Note: Total Sample Size N = 390. a. The data contain one missing value.
Stepwise Regression:
Conditions With Use of Specific Scores
Model   Variable Entered               R      R²     Beta   Sig.
1       Frequency Reviewing Data       .406   .165   .175   .000
2       Division Policy                .487   .237   .219   .000
3       Time Spent Analyzing           .515   .265   .163   .006
4       Time Interacting with Others   .530   .281   .156   .039
Note: Total Sample Size N = 390. a. The data contain one missing value.
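Both tables show a forward stepwise build-up: each model adds the condition scale that most improves prediction of the outcome. A minimal sketch with statsmodels, assuming a DataFrame of teacher-level scale scores like the hypothetical one above (the function and column names in the usage comment are illustrative, not the study’s code):

```python
import statsmodels.api as sm

def forward_stepwise(df, outcome, candidates, enter_p=0.05):
    """Forward stepwise OLS: at each step, enter the candidate predictor
    with the smallest p-value below enter_p; report the model-by-model
    R and R-squared, mirroring the slide tables."""
    selected, steps = [], []
    while True:
        best = None
        for var in candidates:
            if var in selected:
                continue
            X = sm.add_constant(df[selected + [var]])
            fit = sm.OLS(df[outcome], X, missing="drop").fit()
            p = fit.pvalues[var]
            if p < enter_p and (best is None or p < best[1]):
                best = (var, p, fit)
        if best is None:                         # no candidate qualifies
            break
        var, _, fit = best
        selected.append(var)
        steps.append((len(selected), var,
                      round(fit.rsquared ** 0.5, 3), round(fit.rsquared, 3)))
    return steps  # [(model number, variable entered, R, R-squared), ...]

# Hypothetical usage:
# steps = forward_stepwise(df, "instr_adjustments",
#                          ["time_spent", "division_policy",
#                           "freq_review", "interactions"])
```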
Conclusions From Quantitative Study
 Interim testing may serve a meaningful formative purpose and affect instruction.
   – District policy and school leadership that encourage and support an environment of data use, together with time for teachers to review and analyze data (especially with other teachers), are positively related to teachers’ instructional adjustments and use of specific report scores.
   – Teachers report extensive use of interim test data across many different instructional adjustments. No single type of adjustment was used most often.
   – Only 37% of teachers agree or strongly agree that interim testing is of little use in instruction.
 Elementary school teachers’ use of interim data was only slightly greater than middle school teachers’ use.
 The greatest barriers to using interim data are lack of time for review and analysis of data and pacing guide pressures.
Recommendations for Effective Practice
Support for each recommendation is drawn from the VCU qualitative study, the VCU quantitative study, researcher experience with districts, and the literature.

 Clarify purpose – focus on instructional adjustments
   Literature: Riggan & Oláh, 2011; Oláh, Lawrence & Riggan, 2010; Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Bulkley, Christman, Goertz & Lawrence, 2010; Christman, Neild, Bulkley, Blanc, Liu, Mitchell & Travers, 2009; Yeh, 2006
 Establish alignment evidence – content
   Literature: Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Bulkley, Christman, Goertz & Lawrence, 2010; Goertz, Oláh & Riggan, 2009; Hintze & Silberglitt, 2005
 Establish alignment evidence – cognitive level
   Literature: Bulkley, Christman, Goertz & Lawrence, 2010
 Provide clear guidelines for use
   Literature: Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Bulkley, Christman, Goertz & Lawrence, 2010
 Establish district and school environments that support data-driven decision making
   Literature: Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Bulkley, Christman, Goertz & Lawrence, 2010; Christman, Neild, Bulkley, Blanc, Liu, Mitchell & Travers, 2009; Goertz, Oláh & Riggan, 2009; Yeh, 2006
Recommendations for Effective Practice
 Use high-quality items
   Literature: Bulkley, Christman, Goertz & Lawrence, 2010; Yeh, 2006
 Provide structured time for review and analysis
   Literature: Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010
 Use teams of teachers for review and analysis
   Literature: Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Goertz, Oláh & Riggan, 2009; Yeh, 2006
 Include estimates of error
   Literature: Bulkley, Christman, Goertz & Lawrence, 2010; Yeh, 2006
 Distribute questions along with results, with numbers of students selecting each alternative
   Literature: Oláh, Lawrence & Riggan, 2010; Bulkley, Christman, Goertz & Lawrence, 2010; Yeh, 2006
 Monitor unintended consequences
   Literature: Bulkley, Christman, Goertz & Lawrence, 2010
 Document costs – How much instructional time is being replaced by testing, test prep, and review and analysis of results? How much does the process cost in terms of software and personnel?
Recommendations for Effective Practice
 Evaluate use of results – What evidence exists that teachers are using results to modify instruction and that students are learning more?
 Provide adequate professional development
 Address effect of pacing guide
 Keep items secure until after the test is administered
 Standardize administrative procedures for all schools within a district – no longer than one hour for each test
 Ensure fairness
 Verify results with other evidence

Literature: Bulkley, Christman, Goertz & Lawrence, 2010; Christman, Neild, Bulkley, Blanc, Liu, Mitchell & Travers, 2009
Learning from Interim Assessments: District
Implementation to Classroom Practice
Questions?