Journal of Assessment and Accountability in Educator Preparation
Volume 2, Number 1, February 2012, pp. 48-57
Teacher Education Accountability Systems:
What an Examination of Institutional Reports
Shows About Current Practice
Robert M. Boody, Tomoe Kitajima
University of Northern Iowa

Correspondence: Robert Boody, Educational Psychology & Foundations, College of Education, University of Northern Iowa, Cedar Falls, IA 50614-0607. Email: robert.boody@uni.edu
Throughout education and other professions such as
medicine there is an increasing push to collect and use
data in decision making. The No Child Left Behind
legislation is a case in point. Teacher education is not
immune from this trend. Certain types of data are
required to be reported to the federal government by the
Higher Education Act. Teacher education units that
want to be accredited by the National Council for
Accreditation of Teacher Education (NCATE) must
have a system that collects, analyzes, and uses data
about student performance and unit operations
(NCATE, 2008). Even for those institutions not NCATE affiliated, most states have incorporated similar standards into state accreditation. With different audiences requiring more and more evidence, the use of an accountability system has become a necessity. Thus, all teacher education programs have an accountability system. But there is very little
professional literature on the topic. What is “best
practice” in accountability systems? What are important issues with which some institutions are still
struggling? And what seem to be productive answers?
There are few studies that describe or report on such
accountability systems outside of NCATE accreditation
documents. There is little literature that attempts to
explore conceptual grounding for such systems. There
is little comparative work across institutions.
It became clear as we developed a literature review
on accountability and assessment systems in teacher
education that there is simply not much literature
available. There is some, of course, and several large bodies of related material that are helpful, but little material that is directly applicable. In some ways this
should not be surprising. It was only in 2000 that
NCATE changed to a performance-based system for
accreditation. Even in 2003 when NCATE put out a
document with examples of systems, it was noted that
most of the assessments submitted for possible
inclusion had only been recently developed and in fact
were still under development (Elliott, 2003). Although
assessment of teacher education candidates has a long
history, the type of broad ranging accountability
systems that NCATE Standard 2 (NCATE, 2008)
expects is a recent development.
The purpose of this study is to provide a concrete
examination of important aspects of accountability
systems through the lens of 10 specific teacher
education institutions. Please note that our purpose
here is not to evaluate these 10 accountability systems;
that is the job of the NCATE or state accreditation
review teams or both. Our purpose, rather, is to
describe what is being done—the current state of
practice—including commonalities and differences
across the institutions. Thus, our review is focused
more on what is important from disciplinary and
functional perspectives rather than what different
review teams will allow or flag in the accreditation
process.
In this article we address six important issues, including (a) the system’s computer backbone, (b) evidence for reliability and validity, (c) decision points used for initial licensure, (d) processes for decision making about students, (e) regular evaluation of the system, and (f) use of data for program improvement.
Method
Institutions Studied
The 10 institutions we studied were chosen
purposively rather than randomly. Thus, our findings
should not be interpreted as representative of some
specific larger population. First, we chose several
institutions with highly respected teacher education
programs as we hoped they would likely have high
quality assessment systems. Second, for the same
reason, we chose only NCATE accredited institutions
so we could be sure that they considered Standard 2 as
something important to follow. Third, because we
wanted to examine the state of the art now rather than
as it was years ago, we chose institutions that had been
through NCATE accreditation recently; of the 10
institutions studied, 1 was visited in 2007, 2 in 2008, 2
in 2010, and all the rest in 2011. Fourth, to make the
study feasible, we chose only programs that had their
NCATE Institutional Report (IR), which would include
a description of the accountability system, online.
Finally, we chose institutions that would provide a
certain amount of variability in size and type of
institution.
Because our interest was not in evaluating the
systems nor in generating contention, we have not
publicly identified the institutions. Total campus
student body size ranged from fewer than 2,000 students to as many as 24,000. The majority of
institutions were state assisted comprehensive
universities, but the sample included both private
liberal arts and research intensive institutions as well.
Data Sources
All of our data was taken from each institution’s
online Institutional Report (IR), a document used for
NCATE accreditation. It is possible that in some cases
a teacher preparation unit might have information about
their assessment system that they do not put in the IR.
Ultimately, then, our data is not the accountability
system as it exists on the ground, but the system as it is
described in the IR. The two perspectives on the system may be quite similar or quite different.
Table 1
NCATE Standard 2 with Subcategories and Target-Level Descriptors
Standard 2: Assessment System and Unit Evaluation
The unit has an assessment system that collects and analyzes data on applicant qualifications, candidate and
graduate performance, and unit operations to evaluate and improve the performance of candidates, the unit,
and its programs.
2a. Assessment System
TARGET
The unit, with the involvement of its professional community, is regularly evaluating the capacity and
effectiveness of its assessment system, which reflects the conceptual framework and incorporates candidate
proficiencies outlined in professional and state standards. The unit regularly examines the validity and
utility of the data produced through assessments and makes modifications to keep abreast of changes in
assessment technology and in professional standards. Decisions about candidate performance are based on
multiple assessments made at multiple points before program completion and in practice after completion of
programs. Data show a strong relationship of performance assessments to candidate success throughout
their programs and later in classrooms or schools. The unit conducts thorough studies to establish fairness,
accuracy, and consistency of its assessment procedures and unit operations. It also makes changes in its
practices consistent with the results of these studies.
2b. Data Collection, Analysis, and Evaluation
TARGET
The unit's assessment system provides regular and comprehensive data on program quality, unit operations,
and candidate performance at each stage of its programs, extending into the first years of completers’
practice. Assessment data from candidates, graduates, faculty, and other members of the professional
community are based on multiple assessments from both internal and external sources that are
systematically collected as candidates progress through programs. These data are disaggregated by program
when candidates are in alternate route, off-campus, and distance learning programs. These data are regularly
and systematically compiled, aggregated, summarized, analyzed, and reported publicly for the purpose of
improving candidate performance, program quality, and unit operations. The unit has a system for
effectively maintaining records of formal candidate complaints and their resolution. The unit is developing
and testing different information technologies to improve its assessment system.
2c. Use Of Data For Program Improvement
TARGET
The unit has fully developed evaluations and continuously searches for stronger relationships in the
evaluations, revising both the underlying data systems and analytic techniques as necessary. The unit not
only makes changes based on the data, but also systematically studies the effects of any changes to assure
that programs are strengthened without adverse consequences. Candidates and faculty review data on their
performance regularly and develop plans for improvement based on the data.
Note: only the Target level (the highest level) is provided here. The original document also includes two
other levels: Unacceptable and Acceptable. Adapted from Professional Standards for the Accreditation of
Schools, Colleges, and Departments of Education. Copyright 2008 by the National Council for Accreditation of Teacher Education.
But in either case, since the IR is the public face of the unit to both the accreditation site visit team as well as to peers or any of the general public who care to access it, we
believe that the IR provides a reading on how the unit
understands and implements the technical and practical
requirements of Standard 2, the NCATE accreditation
standard that directly addresses unit accountability
systems (NCATE, 2008). This Standard is outlined in
Table 1 for easy reference.
Results
Computer Backbone
Although it is certainly theoretically possible to run
an accountability system without computer technology,
it would be difficult, and probably unworkable for all
but the very smallest programs. We are unaware of an
institution at this time, even the smallest, that does not employ one or more technologies to support its accountability system.
When beginning the development of our
accountability system at the University of Northern
Iowa (UNI), one of the first decisions we had to make
was how to set up the computer system. In visits to
other institutions, the main models we saw, in terms of
control, were (a) locally-developed (that is, within the
college of education), (b) university-run, (c) a
combination of these two, or (d) commercial system
developed specifically for the purpose. In the end we
chose option c because we wanted the advantages of
connecting with the university mainframe—Web-based
access, security administered by the university, and
student data updated in real-time—combined with the
ability to collect whatever other data we wanted.
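To make the combined model concrete, here is a minimal sketch, assuming a single relational database with hypothetical table and column names (this is not the actual UNI schema): the university supplies a read-only feed of student records, while the unit owns and extends its own assessment tables keyed to the same student identifier.

```python
import sqlite3

# Hypothetical illustration of the "combined" model (option c): the
# university system supplies read-only student records, while the unit
# maintains its own assessment tables in the same relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Refreshed from the university mainframe (read-only to the unit).
    CREATE TABLE university_students (
        student_id TEXT PRIMARY KEY,
        name       TEXT,
        gpa        REAL
    );
    -- Owned and managed by the teacher education unit.
    CREATE TABLE unit_assessments (
        student_id TEXT REFERENCES university_students(student_id),
        checkpoint TEXT,   -- e.g., 'entry', 'student_teaching'
        instrument TEXT,   -- e.g., 'disposition_rubric'
        score      REAL
    );
""")
conn.execute("INSERT INTO university_students VALUES ('S001', 'Pat Doe', 3.4)")
conn.execute(
    "INSERT INTO unit_assessments VALUES ('S001', 'entry', 'disposition_rubric', 2.8)"
)

# A unit report can then join its own data to current university records.
for row in conn.execute("""
    SELECT u.name, u.gpa, a.checkpoint, a.instrument, a.score
    FROM unit_assessments AS a
    JOIN university_students AS u USING (student_id)
"""):
    print(row)
```

The design choice this illustrates is that the unit never duplicates or maintains core student records; it only joins against them, which is what keeps student data current in real time from the unit's perspective.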
As part of this study we wanted to see what
choices other institutions had made. It appears that 5 of
our 10 sampled institutions use what we classify as a
locally-developed system. This means that it was set
up and managed by the teacher education unit itself
rather than the university. Such a system can generally download student data from the university system, but only infrequently. This model gives the
unit the maximum amount of control, but also the
maximum of responsibility and fiscal outlay, and
depends on having considerable expertise available.
None of the 10 programs relied solely on the university
mainframe. Three of the institutions chose option c,
which combines real-time access to and through the
university mainframe with additional data tables under
the control of the unit. These units use the university
mainframe system, but have had programmers under
the direction of the unit add additional data capabilities
(usually relational database tables) managed by teacher
education but using the university’s software and
hardware. Finally, 2 of the programs rely primarily on
commercial systems, in these cases Tk20 and
TaskStream.
Not all of the 10 sampled systems seem to be web-accessible; indeed, it appears that only half are.
Interestingly, the systems that are web-accessible are
the 3 institutions employing the combined approach
and the 2 using commercial packages. We do not
believe this to be a coincidence; it would take extensive
effort and resources to develop and maintain safe and
functional web access as a unit.
Evidence for Reliability and Validity of
Assessments in the Accountability System
NCATE Standard 2 (NCATE, 2008) considers
evidence for reliability and validity to be an essential
part of a quality accountability system, as expressed in
the following criterion, “The unit conducts thorough
studies to establish fairness, accuracy, and consistency
of its assessment procedures and unit operations” (2a).
This is in line with standard psychometric practice.
Although Standard 2 does not seem to us to make this
point explicitly, we believe it should be read as
requesting reliability and validity evidence for all
specific assessments as well as for the overall system.
In this section we address individual assessments only;
examination of the system as a whole will be covered
below.
Every one of our 10 IRs at least mentioned
reliability and validity of assessments; however, many IRs only mentioned them without providing any results, concerns, or implications. Our analysis shows that of the
10 institutions:
• 3 simply mentioned reliability and validity
with little additional detail;
• 4 mentioned them along with a plan for
evaluating reliability and validity, but no
results are given;
• 1 IR mentioned them and provided a few
verbally described examples (again, no
numbers or other study results were given);
and
• 2 IRs did provide some amount of specific
results.
One unit honestly noted: “We are still developing
and conducting thorough studies to establish absence of
bias and to assure fairness, accuracy, and consistency
of the performance assessment procedures.” This IR
provided no additional details.
A more detailed plan is given in the following
description from another IR. No additional details were
provided beyond this paragraph.
Assessment Accuracy, Consistency and Freedom from
Bias
As a sub-component of the two-year process, and in
connection with the Curriculum Alignment Audit, the
[accountability system] includes a process that is
designed to provide effective steps to eliminate bias in
its assessments and evaluate the fairness, accuracy,
and consistency of its assessment procedures and unit
operations. Evaluation of assessment reliability is a
departmental function, conducted as clusters of faculty
who deal with the same or similar assessments
convene to examine this issue and identify disparities
that would unduly compromise accuracy and
consistency or would reflect an undisclosed bias in the
construction, administration, and/or scoring of
assessments.
Yet another unit included the description below.
This IR suggests that studies of a professional variety have been carried out, although no results are provided to the reader.
Instruments that use online methods to collect data or
to enter data after assessments occur by other means
are examined for consistency. Formal analyses of
reliability and agreement are conducted where
appropriate once sufficient data are available for
stable analysis. Factor analysis of instruments that
have a subscale construction has also been done.
Results of these analyses have shown high reliability,
consistency, and adequate factor integrity.
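The quoted IR does not show what its "formal analyses of reliability and agreement" looked like, so the following is only a sketch of what such analyses typically involve, using invented scores and standard formulas: Cronbach's alpha for the internal consistency of a multi-item rubric, and Cohen's kappa for chance-corrected agreement between two raters.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal consistency of a multi-item instrument.
    scores: 2-D array, rows = candidates, columns = rubric items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same candidates."""
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    categories = np.union1d(rater_a, rater_b)
    observed = np.mean(rater_a == rater_b)
    expected = sum(np.mean(rater_a == c) * np.mean(rater_b == c)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Invented data: 5 candidates scored on a 4-item rubric, plus two raters
# scoring the same 5 candidates on a single holistic scale.
rubric = [[3, 3, 2, 3], [4, 4, 3, 4], [2, 2, 2, 3], [3, 4, 3, 3], [4, 3, 4, 4]]
rater_a = [3, 4, 2, 3, 4]
rater_b = [3, 4, 2, 4, 4]
print(f"Cronbach's alpha: {cronbach_alpha(rubric):.2f}")
print(f"Cohen's kappa:    {cohen_kappa(rater_a, rater_b):.2f}")
```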
It appears that most units primarily use locally created assessments, making for limited ability to build
an evidentiary base across institutions. One of the few
assessments used across a number of institutions, other
than Praxis, is the Teacher Work Sample (TWS)
(Renaissance Partnership for Improving Teacher
Quality, 2002), a performance based assessment. There
is a growing national body of evidence for the reliability and validity of the TWS (see, for example,
Cornish, Boody, & Robinson, 2010; Denner, Newsome,
& Newsome, 2005; Denner, Norman, & Lin, 2009;
Denner, Norman, Salzman, Pankratz, & Evans, 2004;
Denner, Salzman, & Bangert, 2001; Denner, Salzman,
Newsome, & Birdsong, 2003; McConney, Schalock, &
Schalock, 1998).
Decision Points for Initial Licensure Programs
All of the sampled units followed roughly the same
decision points. Our institution (UNI) does the same,
and like the others, our decision points derive from the levels that were used prior to the accountability system itself. The
precise terms used at the different institutions vary somewhat, but roughly they correspond to
1. Before teacher education,
2. Entry into teacher education,
3. Entry into student teaching, and
4. Graduation and recommendation for licensure.
In addition, some institutions included one or more of
(a) entry into the university, and (b) after graduation.
Strictly speaking, these two stages do not belong to the
teacher education experience itself, but data from them
can be useful for accountability.
Decision Making Process Around Candidates
Recognizing that the ultimate value of an
accountability system is improvement in candidates and
not just reporting, Standard 2 (NCATE, 2008) includes
the following, “Candidates and faculty review data on
their performance regularly and develop plans for
improvement based on the data” (2c). In this section
we examine the extent to which decisions about
candidate progress require direct human interaction or
are simply carried out by the system checking off the
list of requirements.
What we found was that almost all of our 10 cases
followed a checkoff system, meaning that the electronic
system verified completion of certain listed
requirements, allowing the candidate to progress when
all listed requirements are met without any additional
investigation or judgment by faculty. The alternative to
the checkoff approach might include one or more
faculty members examining the data and rendering a
decision, collecting additional data through interviews,
etc., or providing feedback directly to the student. Note
that we do not mean to imply that a given student or
their advisor does not use checkoff data for feedback—
we are just saying that such use has not been built into
the system description.
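In programming terms, a pure checkoff gate is just a boolean conjunction over a list of requirements. The sketch below is a hypothetical illustration of that contrast (the requirement names are invented, not taken from any of the IRs): the checkoff rule advances a candidate automatically, while the judgment-based alternative would interpose a human step at the same point.

```python
# Hypothetical sketch of a checkoff progression gate; requirement names
# are invented for illustration and follow the common decision points.
REQUIREMENTS = {
    "entry_to_teacher_education": ["min_gpa", "basic_skills_test", "disposition_form"],
    "entry_to_student_teaching": ["methods_courses", "field_experience_hours"],
    "licensure_recommendation": ["student_teaching_passed", "content_exam"],
}

def may_advance(candidate_record: dict, level: str) -> bool:
    """Pure checkoff rule: advance when every listed requirement is met,
    with no further faculty investigation or judgment."""
    return all(candidate_record.get(req, False) for req in REQUIREMENTS[level])

candidate = {"min_gpa": True, "basic_skills_test": True, "disposition_form": True}
print(may_advance(candidate, "entry_to_teacher_education"))  # True: gate opens

# The alternative described above would add a human step at the same point,
# e.g., flagging the record for a faculty interview or portfolio review
# before the advancement is recorded.
```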
Out of the 10 IRs examined, we found that
• 7 institutions reported that all decisions were
checkoff based;
• 1 institution indicated that all levels used a
checkoff approach with the exception of one
level that added something more (interview
and portfolio review); and
• 2 systems had substantially more than
checkoffs at 3 of the 4 levels.
Considering the last group, the 2 that used more than
checkoff at 3 of the 4 levels, one of them required
portfolio reviews at each of the three higher levels. The
other unit used a series of interviews, building from an
interview with a single faculty member to both an
internal interdisciplinary interview and another
interview with external public school teachers and
administrators.
Typical of the largest group, those using only
checkoffs at each decision point, is the following
account.
Candidates have access to their checkpoint data
through the . . . student record system, which is
available through. . .via the Internet. Candidates and
their advisors can use this information to plan
schedules, work on remediating performances, and
making academic as well as professional suggestions
toward dispositional improvements.
As this description notes, candidates and advisors can
use data on the system; it is available. But its use is
apparently not required according to the system
description.
Regular Evaluation of the System as a Whole
The issue here is whether each institution includes
evaluation of the accountability system itself, as
opposed to using data from the system to evaluate
individual students or programs. Standard 2 (NCATE,
2008) puts it this way, “The unit, with the involvement
of its professional community, is regularly evaluating
the capacity and effectiveness of its assessment system,
which reflects the conceptual framework and
incorporates candidate proficiencies outlined in
professional and state standards” (2a). We take this to
mean evaluating and proposing changes to the
accountability system itself, rather than the use of data
generated by the accountability system to evaluate and
improve programs.
We found this issue particularly difficult to examine from accreditation documents, as NCATE now
has multiple accreditation approaches, the structure of
which do not all ask the institution to address this part
of Standard 2. Our findings are thus particularly
tentative. We assume that most or even all of the units
actually engage in this work over time, as we found
evidence of changes made to the system. However,
there were only 2 of the 10 IRs that explicitly discussed
a systematic plan for doing so. None of the IRs
presented evidence from such evaluations.
Following is an account taken from 1 of the 10 IRs
we studied. It does not give results of any evaluation of
the system, but does provide a process by which it is
(or should be) done.
Formal and informal evaluation of the unit assessment
system takes place at various levels and involves
multiple stakeholder groups. Program faculty review
the functioning of the assessment system dynamically,
as they monitor students and review their programs. In
many of the review sessions, particularly Semester
Review, the Assessment Coordinator works directly
with program faculty and captures ideas on functional
changes that would enhance the operation of the
assessment system. As new enhancements to the
assessment system have been deployed, the
Assessment Coordinator has provided direct training
to those using the system. Through this direct contact
with end-users, the Assessment Coordinator has
obtained evaluations of the assessment system.
Information gathered from end-users, including candidates, cooperating teachers, university supervisors, public school administrators, and faculty
has led to changes in both the functional and reporting
components of the assessment system.
The Assessment Coordinator also works closely
with Information Technology (IT) to review the
assessment system from a technical perspective. IT
staff review the load, access, and data demands and
have recommended changes and enhancements to the
technical elements of the system. The move to [a
database system] and the incorporation of [a specific
form of] credentialing are two of the major changes
that were initiated by IT evaluation of the assessment
system. Since the assessment system accesses data
from [the university’s mainframe] system, the same
data security protocols are applied to both, and SOE
assessments are evaluated by the same security
standards.
Several formal committees of the School evaluate
the assessment system. The Faculty Executive
Committee offers guidance to the Dean on all matters
pertaining to the SOE and shares responsibility for
governance. The Faculty Executive Committee is
comprised of at least six faculty representatives, the
Associate Deans, and the Dean. The Faculty
Executive Committee evaluates the assessment system
to ensure that it is functioning to meet the specified
needs of the SOE and its programs. Through a review of
the SOE committee structure and the functioning of
the assessment system, the Faculty Executive
Committee recommended the restructuring of several
committees to allow for a more unified point of action.
The Academic Affairs Committee is a new committee
since the last NCATE visit. It consolidates
responsibilities of the prior Assessment, Curriculum,
and Admissions and Financial Aid Committees. In the
prior structure, discrete responsibilities for review and
action had been distributed across the committees, and
representatives did not have adequate view of the
whole. The new committee centralizes the action and
review responsibilities into one committee. Academic
Affairs has the responsibility to review the reports
generated by the assessment system and the Areas. It
also reviews how the system is working and makes
suggestions for changes. It was this Committee that
generated the change in the Program Review format.
Through the Program Review process, Areas have
indicated changes that they would like to have in the
assessment system. Placing these evaluations within
the Program Review process ensures that a major
school-wide committee considers the recommendations.
Another unit also did not provide results of system
evaluation, but provided a set of criteria by which they
assess or intend to assess their system. We include it
here because we think their attempt to delineate the
qualities they want in an assessment system is to be
commended.
Our system is based on five principles: Efficacy,
Comprehensiveness, Bias Elimination, Capacity, and
Technological Sufficiency.
Efficacy addresses the question of the system's
degree of effectiveness, in a holistic sense, in doing
what it is designed to do. It inquires into how well the
system succeeds in data collection in an adequate
range of data types so as to make its content most
useful. It addresses the issue of the system's ability to
aggregate and disaggregate data at multiple levels that
provide sufficient clarity and meaningfulness to be an
effective tool for evaluation and decision making.
Comprehensiveness analyzes the needs of the
system to embrace expanding parameters of data
needs, such as the data that is available in clinical
settings. It looks at the potential for organic
expansion, in which existing elements of the system
are leveraged to enlarge the system's
comprehensiveness. In this analysis, potential
opportunities are discussed and explored to the end of
system expansion.
While bias elimination is formally addressed on the
two-year cycle of the curriculum alignment audit. . ., it
is addressed more informally on an annual basis in the
unit evaluation process. The unit assessment and
evaluation committee references anecdotal evidence,
exit surveys, [student assessment] data, and follow-up
surveys to make a more comprehensive evaluation of
candidate feedback patterns that suggest problems
related to assessment accuracy, consistency, and
freedom from bias.
Assessment system capacity and technological
sufficiency are evaluated as a closely related tandem,
especially since technology greatly impacts the ability
to make gains in system capacity. The current
software and hardware interconnections are examined
in the effort to identify future technological
enhancements that could meaningfully impact the
system's overall capacity.
Use of Data Generated by the System for
Program Change
Since the purpose of an accountability system is not just reporting but also program improvement,
Standard 2 (NCATE, 2008) includes the following,
“[The unit] also makes changes in its practices
consistent with the results of these studies” (2a). To
close the feedback loop, the standard adds this criterion
as well, “The unit not only makes changes based on the
data, but also systematically studies the effects of any
changes to assure that programs are strengthened
without adverse consequences” (2c).
Out of the 10 institutions studied we found:
• 1 did not mention data-based changes at all;
• 1 mentioned them, but with no process or
examples given;
• 3 gave examples of changes made to
programs, but not necessarily tied to specific
data;
• 3 included numerous examples of changes,
with some reference to process but no
connection to specific data;
• 1 gave numerous examples of changes with
some reference to relevant data; and
• 1 gave substantial system design but no
outcomes.
Details on how such a system works were fairly
sparse in the documents available online—possibly
phone interviews or even site visits might be necessary
to tease out more details. Several units appear to hold
one or two day retreats to look at data and ponder over
changes. One unit requires each of its programs to submit a review at least once a year. Below are several
examples taken from one IR illustrating changes made
and supported with reference to data.
• Praxis II score analyses led to the revision of
social sciences content course requirements in
the undergraduate programs.
• Candidate, mentor teacher, mentor principal,
methods faculty, and clinical faculty feedback
prompted a revised structure for the
Elementary Education internship year.
• Focus group discussions resulted in the
development of a new course requirement in
children’s literature for Early Childhood
Education candidates.
The second criterion, studying the effects of changes made in response to feedback from the system, was
mentioned in only one of the IRs. Here is what that
unit wrote. Note that this statement is all that was
provided. The actual changes and the data behind them
were not given, neither were results of follow-up
studies.
The unit not only makes changes when evaluations
indicate, but also systematically studies the effects of
any changes to assure that the intended program
strengthening occurs and that there are no adverse
consequences. Beginning in January of 1998, Teacher
Education faculty have held semesterly retreats per
year, one at the beginning of spring semester
(January), one at the beginning of fall semester
(August), and one during the summer (May or June).
During each retreat, the topic has been the UAS, the
conceptual framework, and curriculum issues.
Discussion and Conclusion
Summary and Discussion by Issue
Computer Backbone. It is hard to imagine a
teacher education program of any considerable size
functioning without a web-based system. It is hard to see how data can effectively be relayed out to people, much less collected, without one. Yet, it appears half
the units lack one. For this study we did not do site
visits to explore capabilities and faculty satisfaction.
We recommend this for future studies, because the IRs
neither discussed the rationales behind the choice of
approach nor provided any evidence either for or
against suitability and effectiveness. What does each
institution believe it is receiving, or not receiving, from
its system choice? Are they aware of all the alternatives?
Evidence for Reliability and Validity. Although the IRs showed evidence of growing attention to quality in instrumentation for accountability systems, more is still needed. We speculate that one reason more attention is not paid to this is that
most teacher education faculty are not well versed in
psychometrics.
An important aspect was brought up by only one
unit: the importance of building reliability in up front as
opposed to simply reporting it after the fact. Below is
what this unit wrote.
For assessments that are rated by faculty teams, inter-rater reliability workshops are conducted to prepare
faculty. The portfolio, COE exit exam, and the
graduate comprehensive exam are examples. In
addition to the general workshop, raters engage in a
brief training prior to each grading session to
standardize the process for that session. Training
involves review of the rubric, the rating scale, and the
procedure to assure a consensus of understanding. It
also involves a short trial-run as a test of rater
agreement for the session.
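As an illustration of the kind of trial run the quoted unit describes, a session gate might compare raters' scores on a shared sample and proceed only when agreement clears some preset threshold; both the exact-match criterion and the 0.8 threshold below are our assumptions, not the unit's stated procedure.

```python
import numpy as np

def trial_run_check(rater_scores, threshold=0.8):
    """Before a grading session, all raters score the same sample.
    rater_scores: 2-D array, rows = raters, columns = rubric items.
    Returns the mean pairwise exact-agreement rate and whether it
    clears the (assumed) threshold for proceeding with live scoring."""
    scores = np.asarray(rater_scores)
    n_raters = scores.shape[0]
    rates = [np.mean(scores[i] == scores[j])
             for i in range(n_raters) for j in range(i + 1, n_raters)]
    agreement = float(np.mean(rates))
    return agreement, agreement >= threshold

# Invented trial-run data: three raters, one shared five-item sample.
agreement, ok = trial_run_check([[3, 4, 2, 3, 4],
                                 [3, 4, 2, 4, 4],
                                 [3, 4, 2, 3, 4]])
print(f"agreement = {agreement:.2f}, proceed = {ok}")
```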
Decision Points Used for Initial Licensure. This is
one area in which all units followed essentially the
same path.
Decision Making Process Around Candidates. As
described above, most of the units we studied did not
have any required process involving faculty judgment
for candidates as they moved from stage to stage within
the program. Likewise there was no requirement for
candidate self-reflection. In most of the IRs the
computer simply tracked the requirements and allowed
students to move forward when every requirement was
checked off. It is possible that programs are missing
something by following this strategy. In this regard,
Hall, Smith, and Nowinski (2005) note the following.
Assessment information should also be used for
Candidate Self-Improving.
Data from peer
observations, candidate’s team teaching small groups
of students, and clinical supervisor feedback should be
seen by candidates as important information to guide
self improvements rather than just as the basis for
final grades.
In these examples, Dispositions can be as
important as the structure of a particular teacher
education experience. An important implication of
using evidence is that the program evaluation data can
be made available to candidates in systematic ways.
For example, the assessment system at State
University of New York Cortland relies heavily on
candidates entering their own data and monitoring
their accumulating performance records. (p. 31)
Regular Evaluation of the System Itself. Although
we saw evidence that some units were making system
changes that presumably came from evaluation of the
accountability system itself, there was little direct
mention of this part of the Standard except in one IR.
Use of Data for Program Improvement. Part of the
purpose of accountability systems is to provide
information useful for program improvement. Wilkins, Young, and Sterner's (2009) study of 80 IRs found that assessment systems were useful in identifying areas on
which a unit could work. However, they found that
most IRs were not strong in stating changes, and it was
not clear if many of the changes described were due to
data or to other forces. This was similar to our
findings. Their hypothesis was that “This may be
caused by teacher education programs being at an early
stage of developing, testing, and refining their
performance-based systems rather than at a stage where
the focus is on long-term data collection, aggregation,
and reporting to make informed decisions” (p. 21).
Conclusion
The results of our study suggest that there is room
for improvement in the state of practice in
accountability systems, or at least as portrayed in IR
documentation. Perhaps part of the solution is to raise awareness of evaluation issues among teacher education faculty.
But we also believe that there is a need for more detailed descriptive work and comparative analysis of
systems. Merely reading the IRs did not usually give
us the reasoning behind the choices made—only the
choices. We recommend that future research include
site visits to see the accountability system in operation,
interview faculty, staff, and administrators for details
and rationales, and interview users for how it affects
them. And we also believe that there is a need to
develop additional conceptual understandings and
practical processes, to advance the state of the art.
References
Cornish, Y., Boody, R. M., & Robinson, V. (2010). A
study of rater differences in scoring the Teacher
Work Sample. Journal of Assessment and
Accountability in Educator Preparation, 1, 53-62.
Denner, P., Newsome, J., & Newsome, J. D. (2005, February). Generalizability of teacher work sample performance assessments across occasions of development. Paper presented at the meeting of the Association of Teacher Educators, Chicago, IL.
Denner, P., Norman, A., & Lin, S. (2009). Fairness and consequential validity of teacher work samples. Educational Assessment, Evaluation and Accountability, 21, 235-254. doi: 10.1007/s11092-008-9059-6
Denner, P. R., Lin, S.-Y., Newsome, J. R., Newsome, J. D., & Hedeen, D. L. (this issue). Evidence for improved P-12 student learning and teacher work sample performance from pre-internships to student-teaching internships. Journal of Assessment and Accountability in Educator Preparation, 2, 23-35.
Denner, P. R., Norman, A. D., Salzman, S. A.,
Pankratz, R. S., & Evans, C. S. (2004). The
Renaissance Partnership teacher work sample:
Evidence supporting score generalizability,
validity, and quality of student learning assessment.
In E. M. Guyton & J. R. Dangel (Eds.). Teacher
education yearbook XII: Research linking teacher
preparation and student performance (pp. 23-56).
Dubuque, IA: Kendall/Hunt.
Denner, P. R., Salzman, S. A., Newsome, J. D., &
Birdsong, J. R. (2003). Teacher work sample
assessment: Validity and generalizability of
performances across occasions of development.
Journal for Effective Schools, 2(1), 29-48.
Denner, P., Salzman, S., & Bangert, A. (2001). Linking
teacher assessment to student performance: A
benchmarking, generalizability, and validity study
of the use of teacher work samples. Journal of
Personnel Evaluation in Education, 15, 287-307.
Elliott, E. J. (2003). Assessing education candidate performance: A look at changing practices. Washington, DC: National Council for Accreditation of Teacher Education. Retrieved from http://www.ncate.org/Portals/0/documents/Accreditation/article_assessmentExamples.pdf
Hall, G. E., Smith, C., & Nowinski, M. B. (2005). An organizing framework for using evidence-based assessments to improve teaching and learning in teacher education programs. Teacher Education Quarterly, 32(3), 19-33.
McConney, A. A., Schalock, M. D., & Schalock, H. D.
(1998). Focusing improvement and quality
assurance: Work samples as authentic performance
measures of prospective teachers’ effectiveness.
Journal of Personnel Evaluation in Education, 11,
343-363.
National Council for Accreditation of Teacher Education. (2008). Professional standards for the accreditation of schools, colleges, and departments of education. Retrieved from http://www.ncate.org/Standards/NCATEUnitStandards/UnitStandardsinEffect2008/tabid/476/Default.aspx#stnd2
Renaissance Partnership for Improving Teacher Quality [RPITQ]. (2002). Teacher work sample: Performance prompt, teaching process standards, scoring rubrics. Retrieved from http://www.uni.edu/itq/RTWS/
Wilkins, E. A., Young, A., & Sterner, S. (2009). An
examination of institutional reports: Use of data for
program improvement. Action in Teacher
Education, 31(1), 14-23.
Authors
Robert Boody is Associate Professor of Educational
Psychology and Foundations at the University of
Northern Iowa. His research interests include teacher
knowledge and change, classroom assessment,
philosophy of inquiry, and accountability systems in
the preparation of educators.
Tomoe Kitajima recently graduated from the
University of Northern Iowa with her EdD in
Curriculum and Instruction. Her research interests
include reflexive inquiry, existential personal projects,
and spirituality in leisure. She is currently working as a
program evaluator.