Journal of Assessment and Accountability in Educator Preparation
Volume 2, Number 1, February 2012, pp. 48-57

Teacher Education Accountability Systems: What an Examination of Institutional Reports Shows About Current Practice

Robert M. Boody, Tomoe Kitajima
University of Northern Iowa

Correspondence: Robert Boody, Educational Psychology & Foundations, College of Education, University of Northern Iowa, Cedar Falls, IA 50614-0607. Email: robert.boody@uni.edu

Abstract

Teacher education units that want to be accredited by the National Council for Accreditation of Teacher Education (NCATE) must have a system that collects, analyzes, and uses data about student performance and unit operations (NCATE, 2008). And even for those institutions not NCATE affiliated, most states have incorporated similar standards into state accreditation. With different audiences requiring more and more evidence, the use of an accountability system has become a necessity. But there is very little professional literature on the topic. What is "best practice" in accountability systems? What are important issues with which some institutions are still struggling? And what seem to be productive answers? There are few studies that describe or report on such accountability systems outside of NCATE accreditation documents. There is little literature that attempts to explore conceptual grounding for such systems. There is little comparative work across institutions. In 2003, when NCATE put out a document with examples of systems, it was noted that most of the assessments submitted for possible inclusion had only recently been developed and in fact were still under development (Elliott, 2003). The purpose of this study is to provide a concrete examination of important aspects of accountability systems through the lens of 10 specific teacher education institutions. In this article we address six important issues: (a) the system's computer backbone, (b) evidence for reliability and validity, (c) decision points used for initial licensure, (d) processes for decision making about students, (e) regular evaluation of the system, and (f) use of data for program improvement. The results of our study suggest that there is room for improvement in the state of practice in accountability systems, or at least as portrayed in IR documentation. Perhaps part of the solution is to raise awareness of evaluation issues among teacher education faculty. But we also believe that there is a need for more detailed descriptive work and comparative analysis of systems, going beyond merely reading the Institutional Reports, and a need to develop additional conceptual understandings and practical processes.

Throughout education and other professions such as medicine there is an increasing push to collect and use data in decision making. The No Child Left Behind legislation is a case in point. Teacher education is not immune from this trend. Certain types of data must be reported to the federal government under the Higher Education Act. Teacher education units that want to be accredited by the National Council for Accreditation of Teacher Education (NCATE) must have a system that collects, analyzes, and uses data about student performance and unit operations (NCATE, 2008). Even for those institutions not NCATE affiliated, most states have incorporated similar standards into state accreditation. With different audiences requiring more and more evidence, the use of an accountability system has become a necessity. Thus, all teacher education programs have an accountability system.

But there is very little professional literature on the topic. What is "best practice" in accountability systems? What are important issues with which some institutions are still struggling? And what seem to be productive answers? There are few studies that describe or report on such accountability systems outside of NCATE accreditation documents. There is little literature that attempts to explore conceptual grounding for such systems. There is little comparative work across institutions.

It became clear as we developed a literature review on accountability and assessment systems in teacher education that there is simply not much literature available. There is some, of course, and several large bodies of related material that are helpful, but little material that is directly applicable. In some ways this should not be surprising. It was only in 2000 that NCATE changed to a performance-based system for accreditation. Even in 2003, when NCATE put out a document with examples of systems, it was noted that most of the assessments submitted for possible inclusion had only recently been developed and in fact were still under development (Elliott, 2003). Although assessment of teacher education candidates has a long history, the type of broad-ranging accountability system that NCATE Standard 2 (NCATE, 2008) expects is a recent development.

The purpose of this study is to provide a concrete examination of important aspects of accountability systems through the lens of 10 specific teacher education institutions. Please note that our purpose here is not to evaluate these 10 accountability systems; that is the job of the NCATE or state accreditation review teams or both. Our purpose, rather, is to describe what is being done—the current state of practice—including commonalities and differences across the institutions. Thus, our review is focused more on what is important from disciplinary and functional perspectives than on what different review teams will allow or flag in the accreditation process. In this article we address six important issues: (a) the system's computer backbone, (b) evidence for reliability and validity, (c) decision points used for initial licensure, (d) processes for decision making about students, (e) regular evaluation of the system, and (f) use of data for program improvement.

Method

Institutions Studied

The 10 institutions we studied were chosen purposively rather than randomly. Thus, our findings should not be interpreted as representative of some specific larger population. First, we chose several institutions with highly respected teacher education programs, as we hoped they would likely have high quality assessment systems. Second, for the same reason, we chose only NCATE accredited institutions so we could be sure that they considered Standard 2 as something important to follow.
Third, because we wanted to examine the state of the art now rather than as it was years ago, we chose institutions that had been through NCATE accreditation recently; of the 10 institutions studied, 1 was visited in 2007, 2 in 2008, 2 in 2010, and the rest in 2011. Fourth, to make the study feasible, we chose only programs that had their NCATE Institutional Report (IR), which would include a description of the accountability system, available online. Finally, we chose institutions that would provide a certain amount of variability in size and type of institution. Because our interest was not in evaluating the systems nor in generating contention, we have not publicly identified the institutions. Total campus student body size ranged from fewer than 2,000 students to as many as 24,000. The majority of institutions were state-assisted comprehensive universities, but the sample included both private liberal arts and research-intensive institutions as well.

Data Sources

All of our data were taken from each institution's online Institutional Report (IR), a document used for NCATE accreditation. It is possible that in some cases a teacher preparation unit might have information about its assessment system that it does not put in the IR. Ultimately, then, our data are not the accountability system as it exists on the ground, but the system as it is described in the IR. The two may be quite similar, or they may diverge. But in either case, since the IR is the public face of the unit to the accreditation site visit team as well as to peers and any of the general public who care to access it, we believe that the IR provides a reading on how the unit understands and implements the technical and practical requirements of Standard 2, the NCATE accreditation standard that directly addresses unit accountability systems (NCATE, 2008). This Standard is outlined in Table 1 for easy reference.

Table 1
NCATE Standard 2 with Subcategories and Target-Level Descriptors

Standard 2: Assessment System and Unit Evaluation
The unit has an assessment system that collects and analyzes data on applicant qualifications, candidate and graduate performance, and unit operations to evaluate and improve the performance of candidates, the unit, and its programs.

2a. Assessment System (Target)
The unit, with the involvement of its professional community, is regularly evaluating the capacity and effectiveness of its assessment system, which reflects the conceptual framework and incorporates candidate proficiencies outlined in professional and state standards. The unit regularly examines the validity and utility of the data produced through assessments and makes modifications to keep abreast of changes in assessment technology and in professional standards. Decisions about candidate performance are based on multiple assessments made at multiple points before program completion and in practice after completion of programs. Data show a strong relationship of performance assessments to candidate success throughout their programs and later in classrooms or schools. The unit conducts thorough studies to establish fairness, accuracy, and consistency of its assessment procedures and unit operations. It also makes changes in its practices consistent with the results of these studies.

2b. Data Collection, Analysis, and Evaluation (Target)
The unit's assessment system provides regular and comprehensive data on program quality, unit operations, and candidate performance at each stage of its programs, extending into the first years of completers' practice. Assessment data from candidates, graduates, faculty, and other members of the professional community are based on multiple assessments from both internal and external sources that are systematically collected as candidates progress through programs. These data are disaggregated by program when candidates are in alternate route, off-campus, and distance learning programs. These data are regularly and systematically compiled, aggregated, summarized, analyzed, and reported publicly for the purpose of improving candidate performance, program quality, and unit operations. The unit has a system for effectively maintaining records of formal candidate complaints and their resolution. The unit is developing and testing different information technologies to improve its assessment system.

2c. Use of Data for Program Improvement (Target)
The unit has fully developed evaluations and continuously searches for stronger relationships in the evaluations, revising both the underlying data systems and analytic techniques as necessary. The unit not only makes changes based on the data, but also systematically studies the effects of any changes to assure that programs are strengthened without adverse consequences. Candidates and faculty review data on their performance regularly and develop plans for improvement based on the data.

Note: Only the Target level (the highest level) is provided here. The original document also includes two other levels: Unacceptable and Acceptable. Adapted from Professional Standards for the Accreditation of Schools, Colleges, and Departments of Education. Copyright 2008 by the National Council for Accreditation of Teacher Education.

Results

Computer Backbone

Although it is certainly theoretically possible to run an accountability system without computer technology, it would be difficult, and probably unworkable for all but the very smallest programs. We are unaware of an institution at this time, even the smallest, that does not employ one or more technologies to support its accountability system. When beginning the development of our accountability system at the University of Northern Iowa (UNI), one of the first decisions we had to make was how to set up the computer system. In visits to other institutions, the main models we saw, in terms of control, were (a) locally developed (that is, within the college of education), (b) university-run, (c) a combination of these two, or (d) a commercial system developed specifically for the purpose. In the end we chose option c because we wanted the advantages of connecting with the university mainframe—Web-based access, security administered by the university, and student data updated in real time—combined with the ability to collect whatever other data we wanted.

As part of this study we wanted to see what choices other institutions had made. It appears that 5 of our 10 sampled institutions use what we classify as a locally developed system. This means that it was set up and managed by the teacher education unit itself rather than the university. Such a system can generally download student data from the university system, but only periodically rather than in real time. This model gives the unit the maximum amount of control, but also the maximum responsibility and fiscal outlay, and it depends on having considerable expertise available.

None of the 10 programs relied solely on the university mainframe. Three of the institutions chose option c, which combines real-time access to and through the university mainframe with additional data tables under the control of the unit. These units use the university mainframe system, but have had programmers under the direction of the unit add additional data capabilities (usually relational database tables) managed by teacher education but using the university's software and hardware. Finally, 2 of the programs rely primarily on commercial systems, in these cases Tk20 and TaskStream.

Not all of the 10 sampled systems seem to be web-accessible; indeed, it appears to be only half. Interestingly, the web-accessible systems belong to the 3 institutions employing the combined approach and the 2 using commercial packages. We do not believe this to be a coincidence; it would take extensive effort and resources for a unit to develop and maintain safe and functional web access on its own.
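To make the differences among these models concrete, the sketch below (in Python, with an in-memory SQLite database standing in for institutional systems) illustrates the data structure implied by the combined approach: the university's student records remain the system of record, while the unit adds its own relational tables keyed to the university's student identifier. This is only a minimal sketch of ours; all table, column, and checkpoint names are hypothetical and are not drawn from any of the sampled IRs.

    import sqlite3

    # One in-memory database stands in for both the university mainframe and
    # the unit-managed tables; every name below is hypothetical.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # University-maintained student records (read-only from the unit's side).
    cur.execute("""
        CREATE TABLE university_students (
            student_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            major      TEXT,
            gpa        REAL
        )""")

    # Unit-managed table added alongside the university data: assessment
    # results keyed to the university's student identifier.
    cur.execute("""
        CREATE TABLE unit_assessments (
            student_id TEXT REFERENCES university_students (student_id),
            checkpoint TEXT,   -- e.g., 'entry' or 'student_teaching'
            instrument TEXT,   -- e.g., 'disposition_rating'
            score      REAL
        )""")

    cur.execute("INSERT INTO university_students VALUES ('S001', 'Pat Doe', 'Elementary Ed', 3.4)")
    cur.execute("INSERT INTO unit_assessments VALUES ('S001', 'entry', 'disposition_rating', 3.0)")

    # A join yields current university data combined with the unit's own
    # assessment records, the core advantage claimed for the combined model.
    for row in cur.execute("""
            SELECT u.name, u.gpa, a.checkpoint, a.instrument, a.score
            FROM university_students u
            JOIN unit_assessments a USING (student_id)"""):
        print(row)

Even at this scale the tradeoff is visible: the unit controls its own tables and queries, but depends on the university system for the authoritative records it joins against; a locally developed system would instead copy the university table periodically and accept some staleness.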
Evidence for Reliability and Validity of Assessments in the Accountability System

NCATE Standard 2 (NCATE, 2008) considers evidence for reliability and validity to be an essential part of a quality accountability system, as expressed in the following criterion: "The unit conducts thorough studies to establish fairness, accuracy, and consistency of its assessment procedures and unit operations" (2a). This is in line with standard psychometric practice. Although Standard 2 does not seem to us to make this point explicitly, we believe it should be read as requesting reliability and validity evidence for each specific assessment as well as for the overall system. In this section we address individual assessments only; examination of the system as a whole is covered below.

Every one of our 10 IRs at least mentioned reliability and validity of assessments; however, many IRs only mentioned them without providing any results, concerns, or implications. Our analysis shows that of the 10 institutions:

• 3 simply mentioned reliability and validity with little additional detail;
• 4 mentioned them along with a plan for evaluating reliability and validity, but gave no results;
• 1 mentioned them and provided a few verbally described examples (again, no numbers or other study results were given); and
• 2 did provide some amount of specific results.

One unit honestly noted: "We are still developing and conducting thorough studies to establish absence of bias and to assure fairness, accuracy, and consistency of the performance assessment procedures." This IR provided no additional details.

A more detailed plan is given in the following description from another IR. No additional details were provided beyond this paragraph.

Assessment Accuracy, Consistency and Freedom from Bias. As a sub-component of the two-year process, and in connection with the Curriculum Alignment Audit, the [accountability system] includes a process that is designed to provide effective steps to eliminate bias in its assessments and evaluate the fairness, accuracy, and consistency of its assessment procedures and unit operations. Evaluation of assessment reliability is a departmental function, conducted as clusters of faculty who deal with the same or similar assessments convene to examine this issue and identify disparities that would unduly compromise accuracy and consistency or would reflect an undisclosed bias in the construction, administration, and/or scoring of assessments.

Yet another unit included the description below. This IR suggests that studies of a professional variety have been carried out, although no results are provided to the reader.

Instruments that use online methods to collect data or to enter data after assessments occur by other means are examined for consistency. Formal analyses of reliability and agreement are conducted where appropriate once sufficient data are available for stable analysis. Factor analysis of instruments that have a subscale construction has also been done. Results of these analyses have shown high reliability, consistency, and adequate factor integrity.
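To illustrate what one such formal analysis involves, the sketch below (in Python) computes Cronbach's alpha, a standard internal-consistency estimate, for a set of rubric items. The ratings are invented for illustration and do not come from any of the sampled institutions; we use population variance throughout for simplicity, although sample variance is also common.

    from statistics import pvariance

    def cronbach_alpha(scores):
        # `scores` holds one row per candidate, each row containing that
        # candidate's score on every rubric item.
        k = len(scores[0])                  # number of items
        items = list(zip(*scores))          # scores regrouped by item
        item_var_sum = sum(pvariance(item) for item in items)
        total_var = pvariance([sum(row) for row in scores])
        return (k / (k - 1)) * (1 - item_var_sum / total_var)

    # Hypothetical data: five candidates scored on four rubric items (1-5 scale).
    ratings = [
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [3, 3, 2, 3],
        [4, 5, 4, 4],
    ]
    print(f"alpha = {cronbach_alpha(ratings):.2f}")  # prints alpha = 0.92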
It appears that most units primarily use locally created assessments, making for limited ability to build an evidentiary base across institutions. One of the few assessments used across a number of institutions, other than Praxis, is the Teacher Work Sample (TWS; Renaissance Partnership for Improving Teacher Quality, 2002), a performance-based assessment. There is a growing national body of evidence for the reliability and validity of the TWS (see, for example, Cornish, Boody, & Robinson, 2010; Denner, Newsome, & Newsome, 2005; Denner, Norman, & Lin, 2009; Denner, Norman, Salzman, Pankratz, & Evans, 2004; Denner, Salzman, & Bangert, 2001; Denner, Salzman, Newsome, & Birdsong, 2003; McConney, Schalock, & Schalock, 1998).

Decision Points for Initial Licensure Programs

All of the sampled units followed roughly the same decision points. Our institution (UNI) does the same, and like the others, ours comes from the levels that were used prior to the accountability system itself. The precise terms used at the different institutions vary somewhat, but roughly they correspond to:

1. Before teacher education,
2. Entry into teacher education,
3. Entry into student teaching, and
4. Graduation and recommendation for licensure.

In addition, some institutions included one or both of (a) entry into the university and (b) after graduation. Strictly speaking, these two stages do not belong to the teacher education experience itself, but data from them can be useful for accountability.

Decision Making Process Around Candidates

Recognizing that the ultimate value of an accountability system is improvement in candidates and not just reporting, Standard 2 (NCATE, 2008) includes the following: "Candidates and faculty review data on their performance regularly and develop plans for improvement based on the data" (2c). In this section we examine the extent to which decisions about candidate progress require direct human interaction or are simply carried out by the system checking off the list of requirements. What we found was that almost all of our 10 cases followed a checkoff system, meaning that the electronic system verified completion of certain listed requirements, allowing the candidate to progress when all listed requirements were met, without any additional investigation or judgment by faculty. The alternative to the checkoff approach might include one or more faculty members examining the data and rendering a decision, collecting additional data through interviews, or providing feedback directly to the student.
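The checkoff model is simple enough to render directly in code. The following sketch (in Python, with hypothetical requirement names of ours, not taken from any sampled IR) captures its essential logic: each decision point carries a set of requirements, and the system advances a candidate exactly when every requirement is marked complete, with no human judgment in the loop.

    # Hypothetical requirements per decision point; a real system would read
    # these from institutional policy tables rather than literals.
    REQUIREMENTS = {
        "entry_to_teacher_education": {"basic_skills_test", "gpa_minimum", "intro_course"},
        "entry_to_student_teaching": {"methods_courses", "field_experience", "disposition_check"},
        "recommendation_for_licensure": {"student_teaching", "content_exam", "exit_portfolio"},
    }

    def may_advance(completed, decision_point):
        # Pure checkoff: advance if and only if every listed requirement is met.
        return REQUIREMENTS[decision_point] <= completed

    candidate = {"basic_skills_test", "gpa_minimum", "intro_course"}
    print(may_advance(candidate, "entry_to_teacher_education"))  # True
    print(may_advance(candidate, "entry_to_student_teaching"))   # False

Anything more than this, such as a required faculty interview or portfolio review, has to be added on top of this automatic gate, which may help explain why so few units build such steps into their systems.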
Note that we do not mean to imply that a given student or their advisor does not use checkoff data for feedback; we are just saying that such use has not been built into the system description. Out of the 10 IRs examined, we found that:

• 7 institutions reported that all decisions were checkoff based;
• 1 institution indicated that all levels used a checkoff approach with the exception of one level that added something more (an interview and a portfolio review); and
• 2 systems had substantially more than checkoffs at 3 of the 4 levels.

Considering the last group, the 2 that used more than checkoffs at 3 of the 4 levels, one required portfolio reviews at each of the three higher levels. The other unit used a series of interviews, building from an interview with a single faculty member to both an internal interdisciplinary interview and another interview with external public school teachers and administrators.

Typical of the largest group, those using only checkoffs at each decision point, is the following account.

Candidates have access to their checkpoint data through the . . . student record system, which is available through . . . via the Internet. Candidates and their advisors can use this information to plan schedules, work on remediating performances, and making academic as well as professional suggestions toward dispositional improvements.

As this description notes, candidates and advisors can use data on the system; it is available. But its use is apparently not required according to the system description.

Regular Evaluation of the System as a Whole

The issue here is whether each institution includes evaluation of the accountability system itself, as opposed to using data from the system to evaluate individual students or programs. Standard 2 (NCATE, 2008) puts it this way: "The unit, with the involvement of its professional community, is regularly evaluating the capacity and effectiveness of its assessment system, which reflects the conceptual framework and incorporates candidate proficiencies outlined in professional and state standards" (2a). We take this to mean evaluating and proposing changes to the accountability system itself, rather than the use of data generated by the accountability system to evaluate and improve programs.

We found this piece particularly difficult to examine from accreditation documents, as NCATE now has multiple accreditation approaches, not all of which ask the institution to address this part of Standard 2. Our findings are thus particularly tentative. We assume that most or even all of the units actually engage in this work over time, as we found evidence of changes made to the system. However, only 2 of the 10 IRs explicitly discussed a systematic plan for doing so. None of the IRs presented evidence from such evaluations.

Following is an account taken from 1 of the 10 IRs we studied. It does not give results of any evaluation of the system, but does describe a process by which it is (or should be) done.

Formal and informal evaluation of the unit assessment system takes place at various levels and involves multiple stakeholder groups. Program faculty review the functioning of the assessment system dynamically, as they monitor students and review their programs. In many of the review sessions, particularly Semester Review, the Assessment Coordinator works directly with program faculty and captures ideas on functional changes that would enhance the operation of the assessment system.
As new enhancements to the assessment system have been deployed, the Assessment Coordinator has provided direct training to those using the system. Through this direct contact with end-users, the Assessment Coordinator has obtained evaluations of the assessment system. Information gathered from end-users, including candidates, cooperating teachers, university supervisors, public school administrators, and faculty has led to changes in both the functional and reporting components of the assessment system.

The Assessment Coordinator also works closely with Information Technology (IT) to review the assessment system from a technical perspective. IT staff review the load, access, and data demands and have recommended changes and enhancements to the technical elements of the system. The move to [a database system] and the incorporation of [a specific form of] credentialing are two of the major changes that were initiated by IT evaluation of the assessment system. Since the assessment system accesses data from [the university's mainframe] system, the same data security protocols are applied to both, and SOE assessments are evaluated by the same security standards.

Several formal committees of the School evaluate the assessment system. The Faculty Executive Committee offers guidance to the Dean on all matters pertaining to the SOE and shares responsibility for governance. The Faculty Executive Committee is comprised of at least six faculty representatives, the Associate Deans, and the Dean. The Faculty Executive Committee evaluates the assessment system to ensure that it is functioning to meet the specified needs [of] the SOE and its programs. Through a review of the SOE committee structure and the functioning of the assessment system, the Faculty Executive Committee recommended the restructuring of several committees to allow for a more unified point of action.

The Academic Affairs Committee is a new committee since the last NCATE visit. It consolidates responsibilities of the prior Assessment, Curriculum, and Admissions and Financial Aid Committees. In the prior structure, discrete responsibilities for review and action had been distributed across the committees, and representatives did not have adequate view of the whole. The new committee centralizes the action and review responsibilities into one committee. Academic Affairs has the responsibility to review the reports generated by the assessment system and the Areas. It also reviews how the system is working and makes suggestions for changes. It was this Committee that generated the change in the Program Review format. Through the Program Review process, Areas have indicated changes that they would like to have in the assessment system. Placing these evaluations within the Program Review process ensures that a major school-wide committee considers the recommendations.

Another unit also did not provide results of system evaluation, but provided a set of criteria by which it assesses or intends to assess its system. We include it here because we think their attempt to delineate the qualities they want in an assessment system is to be commended.

Our system is based on five principles: Efficacy, Comprehensiveness, Bias Elimination, Capacity, and Technological Sufficiency. Efficacy addresses the question of the system's degree of effectiveness, in a holistic sense, in doing what it is designed to do.
It inquires into how well the system succeeds in data collection in an adequate range of data types so as to make its content most useful. It addresses the issue of the system's ability to aggregate and disaggregate data at multiple levels that provide sufficient clarity and meaningfulness to be an effective tool for evaluation and decision making. Comprehensiveness analyzes the needs of the system to embrace expanding parameters of data needs, such as the data that is available in clinical settings. It looks at the potential for organic expansion, in which existing elements of the system are leveraged to enlarge the system's comprehensiveness. In this analysis, potential opportunities are discussed and explored to the end of system expansion. While bias elimination is formally addressed on the two-year cycle of the curriculum alignment audit . . ., it is addressed more informally on an annual basis in the unit evaluation process. The unit assessment and evaluation committee references anecdotal evidence, exit surveys, [student assessment] data, and follow-up surveys to make a more comprehensive evaluation of candidate feedback patterns that suggest problems related to assessment accuracy, consistency, and freedom from bias. Assessment system capacity and technological sufficiency are evaluated as a closely related tandem, especially since technology greatly impacts the ability to make gains in system capacity. The current software and hardware interconnections are examined in the effort to identify future technological enhancements that could meaningfully impact the system's overall capacity.

Use of Data Generated by the System for Program Change

Since the purpose of an accountability system is not just reporting but also program improvement, Standard 2 (NCATE, 2008) includes the following: "[The unit] also makes changes in its practices consistent with the results of these studies" (2a). To close the feedback loop, the standard adds this criterion as well: "The unit not only makes changes based on the data, but also systematically studies the effects of any changes to assure that programs are strengthened without adverse consequences" (2c).

Out of the 10 institutions studied we found:

• 1 did not mention data-based changes at all;
• 1 mentioned them, but with no process or examples given;
• 3 gave examples of changes made to programs, but not necessarily tied to specific data;
• 3 included numerous examples of changes, with some reference to process but no connection to specific data;
• 1 gave numerous examples of changes with some reference to relevant data; and
• 1 gave a substantial system design but no outcomes.

Details on how such a system works were fairly sparse in the documents available online; phone interviews or even site visits might be necessary to tease out more details. Several units appear to hold one- or two-day retreats to look at data and ponder changes. One unit requires each of its programs to submit a review at least every year.

Below are several examples taken from one IR illustrating changes made and supported with reference to data.

• Praxis II score analyses led to the revision of social sciences content course requirements in the undergraduate programs.
• Candidate, mentor teacher, mentor principal, methods faculty, and clinical faculty feedback prompted a revised structure for the Elementary Education internship year.
• Focus group discussions resulted in the development of a new course requirement in children's literature for Early Childhood Education candidates.

The second criterion, studying the effects of changes made due to feedback from the system, was mentioned in only one of the IRs. Here is what that unit wrote. Note that this statement is all that was provided: the actual changes and the data behind them were not given, and neither were the results of follow-up studies.

The unit not only makes changes when evaluations indicate, but also systematically studies the effects of any changes to assure that the intended program strengthening occurs and that there are no adverse consequences. Beginning in January of 1998, Teacher Education faculty have held semesterly retreats per year, one at the beginning of spring semester (January), one at the beginning of fall semester (August), and one during the summer (May or June). During each retreat, the topic has been the UAS, the conceptual framework, and curriculum issues.

Discussion and Conclusion

Summary and Discussion by Issue

Computer Backbone. It is hard to imagine a teacher education program of any considerable size functioning without a web-based system. It is hard to see how data can effectively be relayed out to people, much less collected, without one. Yet it appears half the units lack one. For this study we did not do site visits to explore capabilities and faculty satisfaction. We recommend this for future studies, because the IRs neither discussed the rationales behind the choice of approach nor provided any evidence for or against its suitability and effectiveness. What does each institution believe it is receiving, or not receiving, from its system choice? Are they aware of all the alternatives?

Evidence for Reliability and Validity. Although the IRs showed evidence of growing attention to quality in instrumentation for accountability systems, more is still needed. We speculate that one of the reasons more attention is not paid to this is that most teacher education faculty are not well versed in psychometrics. An important aspect was brought up by only one unit: the importance of building reliability in up front as opposed to simply reporting it after the fact. Below is what this unit wrote.

For assessments that are rated by faculty teams, interrater reliability workshops are conducted to prepare faculty. The portfolio, COE exit exam, and the graduate comprehensive exam are examples. In addition to the general workshop, raters engage in a brief training prior to each grading session to standardize the process for that session. Training involves review of the rubric, the rating scale, and the procedure to assure a consensus of understanding. It also involves a short trial-run as a test of rater agreement for the session.
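A trial-run of this kind can be scored with simple agreement statistics. The sketch below (in Python) computes percent agreement and Cohen's kappa, a chance-corrected agreement index, for two raters scoring the same set of submissions; the ratings are invented for illustration and are not taken from the unit quoted above.

    from collections import Counter

    def cohen_kappa(rater1, rater2):
        # Chance-corrected agreement between two raters over the same items.
        n = len(rater1)
        observed = sum(a == b for a, b in zip(rater1, rater2)) / n
        c1, c2 = Counter(rater1), Counter(rater2)
        # Expected agreement: chance that both raters choose each category.
        expected = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical trial-run: two raters score ten portfolios on a 1-4 rubric.
    rater_a = [3, 4, 2, 3, 3, 4, 1, 2, 3, 4]
    rater_b = [3, 4, 2, 3, 2, 4, 1, 2, 3, 3]
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    print(f"percent agreement = {agreement:.0%}")          # 80%
    print(f"kappa = {cohen_kappa(rater_a, rater_b):.2f}")  # 0.72

A session with low agreement on the trial-run would signal the need for further rubric discussion before operational scoring begins.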
Decision Points Used for Initial Licensure. This is one area in which all units followed essentially the same path.

Decision Making Process Around Candidates. As described above, most of the units we studied did not have any required process involving faculty judgment for candidates as they moved from stage to stage within the program. Likewise, there was no requirement for candidate self-reflection. In most of the IRs the computer simply tracked the requirements and allowed students to move forward when every requirement was checked off. It is possible that programs are missing something by following this strategy. In this regard, Hall, Smith, and Nowinski (2005) note the following.

Assessment information should also be used for Candidate Self-Improving. Data from peer observations, candidate's team teaching small groups of students, and clinical supervisor feedback should be seen by candidates as important information to guide self improvements rather than just as the basis for final grades. In these examples, Dispositions can be as important as the structure of a particular teacher education experience. An important implication of using evidence is that the program evaluation data can be made available to candidates in systematic ways. For example, the assessment system at State University of New York Cortland relies heavily on candidates entering their own data and monitoring their accumulating performance records. (p. 31)

Regular Evaluation of the System Itself. Although we saw evidence that some units were making system changes that presumably came from evaluation of the accountability system itself, there was little direct mention of this part of the Standard except in one IR.

Use of Data for Program Improvement. Part of the purpose of accountability systems is to provide information useful for program improvement. Wilkins, Young, and Sterner's (2009) study of 80 IRs found that assessment systems were useful in identifying areas on which a unit could work. However, they found that most IRs were not strong in stating changes, and it was not clear whether many of the changes described were due to data or to other forces. This was similar to our findings. Their hypothesis was that "This may be caused by teacher education programs being at an early stage of developing, testing, and refining their performance-based systems rather than at a stage where the focus is on long-term data collection, aggregation, and reporting to make informed decisions" (p. 21).

Conclusion

The results of our study suggest that there is room for improvement in the state of practice in accountability systems, or at least as portrayed in IR documentation. Perhaps part of the solution is to raise awareness of evaluation issues among teacher education faculty. But we also believe that there is a need for more detailed descriptive work and comparative analysis of systems. Merely reading the IRs did not usually give us the reasoning behind the choices made—only the choices. We recommend that future research include site visits to see accountability systems in operation, interviews with faculty, staff, and administrators to gather details and rationales, and interviews with users about how the systems affect them. And we also believe that there is a need to develop additional conceptual understandings and practical processes, to advance the state of the art.

References

Cornish, Y., Boody, R. M., & Robinson, V. (2010). A study of rater differences in scoring the Teacher Work Sample. Journal of Assessment and Accountability in Educator Preparation, 1, 53-62.

Denner, P., Newsome, J., & Newsome, J. D. (2005, February). Generalizability of teacher work sample performance assessments across occasions of development. Paper presented at the meeting of the Association of Teacher Educators, Chicago, IL.

Denner, P., Norman, A., & Lin, S. (2009). Fairness and consequential validity of teacher work samples. Educational Assessment, Evaluation and Accountability, 21, 235-254. doi:10.1007/s11092-008-9059-6
Denner, P. R., Lin, S.-Y., Newsome, J. R., Newsome, J. D., & Hedeen, D. L. (this issue). Evidence for improved P-12 student learning and teacher work sample performance from pre-internships to student-teaching internships. Journal of Assessment and Accountability in Educator Preparation, 2, 23-35.

Denner, P. R., Norman, A. D., Salzman, S. A., Pankratz, R. S., & Evans, C. S. (2004). The Renaissance Partnership teacher work sample: Evidence supporting score generalizability, validity, and quality of student learning assessment. In E. M. Guyton & J. R. Dangel (Eds.), Teacher education yearbook XII: Research linking teacher preparation and student performance (pp. 23-56). Dubuque, IA: Kendall/Hunt.

Denner, P. R., Salzman, S. A., Newsome, J. D., & Birdsong, J. R. (2003). Teacher work sample assessment: Validity and generalizability of performances across occasions of development. Journal for Effective Schools, 2(1), 29-48.

Denner, P., Salzman, S., & Bangert, A. (2001). Linking teacher assessment to student performance: A benchmarking, generalizability, and validity study of the use of teacher work samples. Journal of Personnel Evaluation in Education, 15, 287-307.

Elliott, E. J. (2003). Assessing education candidate performance: A look at changing practices. Washington, DC: National Council for Accreditation of Teacher Education. Retrieved from http://www.ncate.org/Portals/0/documents/Accreditation/article_assessmentExamples.pdf

Hall, G. E., Smith, C., & Nowinski, M. B. (2005). An organizing framework for using evidence-based assessments to improve teaching and learning in teacher education programs. Teacher Education Quarterly, 32(3), 19-33.

McConney, A. A., Schalock, M. D., & Schalock, H. D. (1998). Focusing improvement and quality assurance: Work samples as authentic performance measures of prospective teachers' effectiveness. Journal of Personnel Evaluation in Education, 11, 343-363.

National Council for Accreditation of Teacher Education. (2008). Professional standards for the accreditation of schools, colleges, and departments of education. Retrieved from http://www.ncate.org/Standards/NCATEUnitStandards/UnitStandardsinEffect2008/tabid/476/Default.aspx#stnd2

Renaissance Partnership for Improving Teacher Quality. (2002). Teacher work sample: Performance prompt, teaching process standards, scoring rubrics. Retrieved from http://www.uni.edu/itq/RTWS/

Wilkins, E. A., Young, A., & Sterner, S. (2009). An examination of institutional reports: Use of data for program improvement. Action in Teacher Education, 31(1), 14-23.

Authors

Robert Boody is Associate Professor of Educational Psychology and Foundations at the University of Northern Iowa. His research interests include teacher knowledge and change, classroom assessment, philosophy of inquiry, and accountability systems in the preparation of educators.

Tomoe Kitajima recently graduated from the University of Northern Iowa with her EdD in Curriculum and Instruction. Her research interests include reflexive inquiry, existential personal projects, and spirituality in leisure. She is currently working as a program evaluator.