Written for CADRE
by Abt Associates (Daphne Minner, et al.), 2012
In 2011, the National Science and Technology Council
reviewed how 13 federal agencies spent $3.4 billion in
fiscal year 2010 to support STEM education.
NSF was found to have made the largest investment in
STEM education, and its DRK-12 program had the largest
budget of its 6 educational research and development
programs.
The compendium reviewed here (Part 1 of 2) focuses on 5
cohorts of DRK-12 projects (2008-2012) that utilized
instruments designed to assess teacher practices,
pedagogical content knowledge, and content knowledge.
The purpose of this compendium is to provide an overview
of the current status of STEM instrumentation commonly
being used in the U.S. and to provide resources useful to
research and evaluation professionals.
Research Question: What are the instruments,
constructs, and methods being used to study
teacher outcomes within the DRK-12 portfolio?
Only extant, named instruments (as opposed to
instruments being developed as part of a current
proposal) were included.
Two Phases:
◦ Phase 1: A review of all proposals funded by DRK-12
from 2008 to 2012 revealed 295 eligible projects.
◦ Phase 2: Data were collected for instrument-specific
information about reliability and validity
evidence, development and piloting, accessibility of the
instrument, administration, and constructs measured.
Because CADRE is funded as a cooperative
agreement rather than a contract, the team was
unable to access FastLane files and relied on
materials provided by PIs.
For 36 projects, materials were unavailable.
For 8 of the 57 PCK instruments, the actual
instruments were unavailable.
6 instruments required purchasing.
75 projects proposed to measure teacher
practices, PCK, or content knowledge: 71% measured
only 1 outcome, 24% measured 2 outcomes,
and 5% measured all 3.
Instruments Identified
◦ Practices: 42
◦ PCK: 24
◦ Content Knowledge: 27
5 Categories of Instruments
◦ Instructional Practices (Appendix A)
◦ Instructional Practices plus Additional Constructs (Appendix B)
◦ Instructional Beliefs (Appendix C)
◦ System-wide Reform Focused (Appendix D)
◦ Discourse Focused (Appendix E)
Researchers need to be more cognizant about providing
relevant psychometric information on the
tools used and developed so that others
can reliably implement the tools in their own
research.
Instruments developed must go through
rigorous reliability and validity testing.
Initial step towards the systematic
assessment and improvement of STEM
research tools.
Eleven instruments primarily assessed classroom
instructional practices:
seven observation protocols, three rubrics, and one survey
Predominantly designed for pre-k through middle school
teachers (6, 55%)
More focused on science (5, 45%) than mathematics (3, 27%)
or technology (2, 18%).
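The percentages quoted for these eleven instruments can be reproduced with a short check. This is only an illustrative sketch: the counts come from the slide, while the function and variable names are assumptions made for the example.

```python
# Counts reported on the slide for the 11 instructional-practice instruments.
TOTAL_INSTRUMENTS = 11
reported_counts = {
    "pre-K through middle school": 6,
    "science": 5,
    "mathematics": 3,
    "technology": 2,
}


def share(count: int, total: int = TOTAL_INSTRUMENTS) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Hypothetical helper output: label -> rounded percentage of the 11 instruments.
percentages = {label: share(n) for label, n in reported_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]} of {TOTAL_INSTRUMENTS} = {pct}%")
```

Note that the three subject counts (5 + 3 + 2) add up to 10, so the subject focus of one of the eleven instruments is not stated on the slide.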
The three science observation protocols capture variables
ranging from the lesson’s temporal flow and percentage of
time students spend in different types of groupings, to the
extent of opportunity for students to engage in the various
phases of the investigation cycle.
The two science scoring rubrics are intended to be applied to
lesson artifacts and instructional materials that the teacher
provides students. They contain codes for student grouping,
structure of lessons, use of scientific resources, hands-on
opportunities through investigation, cognitive depth of the
materials, encouragement of the scientific discourse
community, and opportunity for explanation/justification,
and connections/applications to novel situations.
Across these eleven instruments, one had low reliability
evidence, and four (36%) had acceptable or good evidence.
For only two instruments was the team able to find validity
evidence.
Instructional Strategies Classroom
Observation Protocol
◦ Identifying a sense of purpose; taking account of student
ideas; engaging students with relevant phenomena;
developing and using scientific ideas; promoting student
thinking about phenomena, experiences, and knowledge
Scoop Notebook – Artifact rubric
◦ Portfolio assessment that captures: grouping, structure of
lessons, use of scientific resources, hands-on, inquiry,
cognitive depth, scientific discourse community,
explanation/justification, assessment,
11 instruments measure instructional practices in addition to
one or two other constructs, such as:
◦ physical context
◦ demographics
◦ teacher content knowledge
◦ an aspect of classroom management
This more comprehensive nature is also reflected in the subject
domains being assessed—
◦ 2 each, mathematics and science
◦ 5 both mathematics and science
◦ 1 technology
◦ 1 general teaching skills
Exist for many subjects. Middle School version tests all sciences very
generally, whereas high school breaks it apart by specific domains
Sit-down test, 4 hours, 50 multiple-choice questions and 2
constructed-response questions
The test is designed to provide evidence that an examinee has a
basic working knowledge of teaching foundations
Ratings are made after at least 3 hours of observation
Ratings for each item are made on a 7-point scale. Behavioral
descriptors are present at the 1, 3, 5, and 7 levels.
Assesses the materials and instructional supports for math and
science learning present
Efficacy Belief (MTEBI, 2000)
◦ Modified from Riggs (California State U) & Enochs (Kansas State U)
◦ What’s measured: personal math teaching efficacy (13 items) & math
teaching outcome expectancy (8 items); the extent to which teachers
believe they have the capability to positively affect student
◦ Validity Evidence
◦ Likert 5-point; MTEBI: 21 items, STEBI: 25 items
Principles of
◦ Chapman (Utah State U) & Abd-Hamid (U of Iowa), 2010
◦ What’s measured: teacher (& student) perceptions of frequency
of occurrence when students are responsible for each of 5 principles of
scientific inquiry (NRC); the extent to which students are experiencing
inquiry in science classrooms
◦ Validity Evidence: Construct
◦ 5-point, 20 (teacher & student version)
PS: Why is VNOS-C included?
What’s measured
Validity Evidence
Local Systemic
Protocol (LSC)
Research Inc.,
Inside the
Teacher Interview
Overall quality of
observed math/science
lesson: lesson design;
math/science content;
classroom culture; likely
impact on students'
(5-point scale)
Teachers' perceptions of
factors that influenced
selection of lesson
content and pedagogy
(Good inter-rater
% reliability)
Thirteen instruments looking at instructional
practices and social aspects of classroom
community (including class management).
Observation protocols
Six are non-domain specific
Three are math-specific
Three are science-specific
One measures both
Seven demonstrated more than one type of
validity (more than other categories)
Three scales
◦ Lesson design and implementation
◦ Content -> PCK -> Propositional and Procedural
◦ Classroom culture (e.g., egalitarian student-teacher relationship)
High interrater % agreement
High Validity
◦ Construct
◦ Content
◦ Predictive
Three domains
◦ Emotional
◦ Classroom organization
◦ Instructional support
High internal consistency
High interrater % agreement
Content validity
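Inter-rater percent agreement, cited above as reliability evidence, is the share of items on which two observers give the same rating. A minimal sketch follows, assuming exact-match agreement; the ratings are hypothetical values made up for illustration, not data from the compendium.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave exactly the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical ratings by two observers on 10 protocol items (assumed 7-point scale).
rater_a = [5, 6, 3, 7, 4, 5, 2, 6, 5, 4]
rater_b = [5, 6, 4, 7, 4, 5, 2, 6, 5, 3]

agreement = percent_agreement(rater_a, rater_b)
print(f"Exact agreement: {agreement:.0f}%")
```

Exact-match agreement does not correct for agreement expected by chance, so it tends to look more favorable than chance-corrected statistics.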
25 content tests: 12 general, 8 science, 3 math, 1 science and
math, 1 technology.
◦ General Tests: American College Testing, GRE, ITBS: Iowa Test of Basic Skills,
◦ Science: MOSART, FACETS, IL Certification Testing System Study Guide-Science,
FCI Force Concept Inventory Assessment, DTAMS-science: Diagnostic Science
Assessment for Middle School Teachers, Classroom Test of Scientific Reasoning
◦ Math: MKT, M-SCAN, DTAMS-math: Diagnostic Math Assessment for Middle
School Teachers.
◦ Science and Math: TIMSS
◦ Technology: TAGLIT: Taking a Good Look At Instructional Technology
By type: 12 student tests, 9 teacher tests, 2 surveys, 1 observation
tool, 1 combined student test and teacher tools.
By level: 7 K-12, 4 elementary and middle, 4 postsecondary, 3 high
school, 2 middle, 1 grades 4-9, 1 elementary, 1 middle and high,
1 high and postsecondary, 1 with no level indicated.
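Each of the three breakdowns above (by subject, by instrument type, by grade level) should account for all 25 content tests. The sketch below is a quick consistency check using the counts from the slide; the dictionary names are assumptions made for the example.

```python
TOTAL_TESTS = 25  # content tests reported on the slide

by_subject = {
    "general": 12, "science": 8, "math": 3,
    "science and math": 1, "technology": 1,
}
by_type = {
    "student test": 12, "teacher test": 9, "survey": 2,
    "observation tool": 1, "student test and teacher tools": 1,
}
by_level = {
    "K-12": 7, "elementary and middle": 4, "postsecondary": 4,
    "high school": 3, "middle": 2, "grades 4-9": 1, "elementary": 1,
    "middle and high": 1, "high and postsecondary": 1,
    "no level indicated": 1,
}

# Sum each breakdown and compare it with the reported total.
totals = {
    "subject": sum(by_subject.values()),
    "type": sum(by_type.values()),
    "level": sum(by_level.values()),
}

for name, total in totals.items():
    status = "matches" if total == TOTAL_TESTS else "does not match"
    print(f"By {name}: {total} ({status} the reported {TOTAL_TESTS})")
```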
Each assessment is composed of 25 items—20 multiple-choice and 5
Paper-and-pencil format
Pre- and post-tests before and after workshops
To determine growth in teachers' content knowledge
To be completed by test-takers within an hour.
Each assessment has 3-4 science sub-domains.
Available for use free of charge.
Scored for a fee of $10 per teacher per assessment; this includes scores on
individual items, on each science sub-domain in the content area, and on
four different knowledge types (memorized, conceptual understanding,
higher-order thinking, pedagogical content knowledge)
http://louisville.edu/education/centers/crmstd/diag-sci-assess-middle
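For budgeting, the scoring fee scales with the number of teachers and the number of assessments each one takes. A minimal sketch, assuming the $10 fee stated above; the workshop size and the pre/post design in the example are hypothetical.

```python
FEE_PER_TEACHER_PER_ASSESSMENT = 10  # USD, as stated on the slide


def scoring_cost(teachers: int, assessments_per_teacher: int) -> int:
    """Total scoring fee in USD for a group of teachers."""
    return FEE_PER_TEACHER_PER_ASSESSMENT * teachers * assessments_per_teacher


# Hypothetical workshop: 30 teachers, each taking a pre-test and a post-test.
cost = scoring_cost(teachers=30, assessments_per_teacher=2)
print(f"Estimated scoring cost: ${cost}")
```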
Free and can be accessed after completion of four online tutorials
that explain test design, use, scoring, and interpretation of results.
A set of multiple-choice items is linked to the K–12 physical science and
earth science content, and the K–8 life science content, in the NRC NSES,
as well as to the research literature about misconceptions
concerning science concepts.
ASW – Analysis of Student Work: A rubric is used to
score teachers’ evaluations of a standardized set of
video cases of student problem solving.
LoU – Levels of Use Interviews: An interview
determines how a change is being implemented in
the classroom.
SEPUP - Group Interaction and Communication of
Scientific Information Rubrics: Rubrics are used to
grade student work on a variety of measures
including how they design and conduct an
investigation, analyze data, understand concepts,
evaluate evidence and identify tradeoffs,
communicate scientific information, and work
cooperatively in a group.
Detailed access information can be found for
each instrument in Appendices H & I of the
compendium.
Part 2 of the compendium (not covered here)
details measurement of students’ content
knowledge, reasoning skills, and
psychological attributes.

Feb 1 - Teacher research instruments DRK12