Journal of Assessment and Accountability in Educator Preparation, Volume 2, Number 1, February 2012, pp. 23-35

Evidence for Improved P-12 Student Learning and Teacher Work Sample Performance from Pre-Internships to Student-Teaching Internships

Peter R. Denner, Shu-Yuan Lin, Julie R. Newsome, Jack D. Newsome, Deborah L. Hedeen
Idaho State University

Teacher Work Samples (TWSs) were examined for the P-12 student learning reported by pre-interns (TWS1) and student-teaching interns (TWS2), and for the total TWS scores of the teacher candidates at the two internship levels. Across sets of TWSs from different years, teacher candidates performed better overall as student-teaching interns on their TWSs than the same candidates did as pre-interns. The percentages of students achieving the lesson targets and showing learning gains were high for both pre-interns and student-teaching interns. The student-teaching interns reported a higher percentage of their students who showed learning gains from lesson pre-assessments to post-assessments than the pre-interns, but fewer students achieving the lesson targets on the post-assessments according to their stated success criteria. The latter finding was partly due to the pre-interns selecting an easier learning goal for their first achievement target. The TWS total scores were related positively to each of the student learning measures for the student-teaching interns, but to only one of the measures for the pre-interns. Recent teacher candidates had better evidence for their impacts on student learning and higher percentages of their students who showed learning gains than teacher candidates who graduated five years earlier. The results support a positive influence for a sequential teacher preparation program on the abilities of teacher candidates to meet targeted teaching standards and to support student learning as they progress from pre-interns (TWS1) to student-teaching interns (TWS2).

Correspondence: Peter Denner, College of Education, 921 South 8th Ave., Stop 8059, Idaho State University, Pocatello, ID 83209. Email: dennpete@isu.edu

Author Note: Special thanks to the many teacher education faculty (too numerous to name here) who tirelessly scored Teacher Work Samples for the College of Education at Idaho State University between 2003 and 2009.

During the last decade, teacher preparation programs have responded to national and state mandates to set rigorous standards for teacher training and to demonstrate their accountability for high-quality teacher preparation. An important aspect of accountability is establishing the link between teacher candidates' performance and evidence for their impacts on the learning of the students they teach (Schalock, Schalock, & Myton, 1998). Indeed, the National Council for Accreditation of Teacher Education's (NCATE, 2008) Professional Standards for the Accreditation of Schools, Colleges, and Departments of Education requires teacher preparation programs to provide evidence of the impact of their candidates on the learning of P-12 students. The NCATE requirement emerged from value-added research showing that teacher effects on student learning are both additive and cumulative (see Marzano, 2003, Chapter 8 for a review). Schalock, Cowart, and Staebler (1993) proposed that teacher impacts on student learning could be examined in two ways: (a) as teacher effectiveness, defined by Schalock et al. as positive impacts on student achievement resulting from shorter-term instruction, and (b) as teacher productivity, defined as gains in student achievement on state-mandated achievement tests resulting from longer-term instruction.
For teacher preparation programs to connect candidate performance measures to long-term teacher productivity, it is first essential for them to be able to show the link between their candidate performance measures and short-term teacher effectiveness. Schalock (1987) was among the first to suggest that the effectiveness of prospective teachers could be assessed using a measure of learning gains as demonstrated in the context of teacher work samples.

Responding to the call to connect candidate performance to student learning, the Renaissance Partnership for Improving Teacher Quality (Pankratz, 1999), as one of its strategies, adapted the Western Oregon University (Schalock, Schalock, & Girod, 1997) Teacher Work Sample Methodology (TWSM). In addition to documenting candidates' abilities to plan, deliver, and assess a standards-driven instructional sequence, the Renaissance Teacher Work Sample (RTWS) assessment requires teacher candidates to profile their impacts on student learning and to reflect on the results of their instruction in order to increase student learning. Consistent with the Western Oregon University TWSM, the Renaissance approach (Denner, Norman, Salzman, Pankratz, & Evans, 2004; Pankratz, 1999) has been to set specific criteria for quality teaching performance in the RTWS standards-linked scoring rubric. The criteria take into consideration the significance of the learning goals, the quality of the assessments used to measure student learning, and the candidates' abilities to profile student performance relative to the learning goals (Denner et al., 2004). Teacher candidates are not held directly accountable by the RTWS scoring criteria for their effectiveness. Nevertheless, as part of their analysis of student learning, teacher candidates are asked to report learning gains and the number and percent of their students who achieved two of the learning goals (or achievement targets) of the lessons. Consequently, the relationship between the RTWS total scores and these reported measures of impact on student learning could be examined.

The present study employed Idaho State University's (ISU) implementation of the RTWS. A major purpose of the present study was to determine whether TWS total scores are related to the teacher candidates' impact on P-12 student learning (teaching effectiveness) as reported in their TWSs. McConney, Schalock, and Schalock (1998), using the Western Oregon University TWSM, reported positive links between student learning, as measured by an index of pupil growth, and instructional variables measured within the context of their TWSM. One previous study (Denner & Salzman, 2003), using a very small number of RTWSs, showed a positive trend between the RTWS holistic scores of teacher candidates and the percentage of their students who showed learning gains.
In an effort to extend the findings of the previous investigations, this study investigated the relationship between TWS total scores and teacher effectiveness as measured by the percentage of the P-12 students meeting achievement targets on the post-assessment and the percentage of the P-12 students who showed improvement from the pre-assessment to the post-assessment in the context of the TWS lessons.

At ISU, teacher candidates learn about the essential skills required for their TWSs in foundations coursework and are required to write an initial TWS (TWS1) during a pre-internship associated with a general methods course (EDUC 309 Planning, Delivery & Assessment, 6 credits) as a critical assessment for entrance to student-teaching. The candidates complete a second TWS (TWS2) during their student-teaching internships as a critical assessment for program completion. The candidates' performance scores and their student learning data are retained from both TWSs as part of the unit assessment system. Because of the availability of data from the two TWSs, the TWS total scores and the reported measures of student learning could be examined for changes from TWS1 to TWS2. Expanding beyond the previous studies, a major purpose of the present study was to determine whether value was added to the candidates' teaching effectiveness as they progressed sequentially from pre-interns (TWS1) to student-teaching interns (TWS2) in their teacher preparation programs.

Methods

Participants

The first set of participants consisted of 548 teacher candidates who completed one or more Teacher Work Samples (TWSs) at ISU during the period from fall 2005 through spring 2007. In this set, there were 288 pre-interns and 260 student-teaching interns. For the pre-interns, 49.5% of the TWSs (TWS1s) were from candidates in elementary education and 50.5% from candidates in secondary education. For the student-teaching interns, 43.8% of the TWSs (TWS2s) were from candidates in elementary education and 56.2% from candidates in secondary education. Of the 548 teacher candidates, there were 152 teacher candidates who completed both internships (both TWS1 and TWS2) within the period from fall 2005 to spring 2007.

As a follow-up reexamination, the paired TWSs (TWS1 and TWS2) for a second set of 84 teacher candidates who completed their student-teaching internships (TWS2s) in calendar year 2008, and who had previously completed a TWS1, were examined for documentation of the candidates' impacts on P-12 student learning. The 84 pairs of TWSs included 41 (48.8%) from candidates in elementary education and 43 (51.2%) from candidates in secondary education. Sixty-three of these TWS pairs were submitted by female teacher candidates (75.0%) and 21 were submitted by male teacher candidates (25.0%).

In addition, a representative set of 20 paired TWSs collected by Denner, Newsome, and Newsome (2005) from teacher candidates at ISU who completed their pre-internship during the spring of 2003 and their student-teaching internship during the fall of 2003 was compared with the later sets of TWSs with respect to the reported student learning measures. The retrospective set of TWSs from 2003 consisted of 20 pairs of TWS1s and TWS2s. The paired set included nine (45.0%) from candidates in elementary education and 11 (55.0%) from candidates in secondary education.
Fifteen (75.0%) of the TWS pairs were submitted by female teacher candidates and five (25.0%) were submitted by male teacher candidates.

Measures

The teacher candidates in this study completed their TWSs according to the guidelines employed at ISU. The guidelines specified the standards to be demonstrated and the tasks to be performed. (See http://ed.isu.edu/depts/assistdean/assistdean_index.shtml for the targeted TWS standards, TWS guidelines, and TWS scoring rubric.) A description of the tasks required for the TWSs was presented in Denner, Salzman, Newsome, and Birdsong (2003). Among the required TWS tasks, all teacher candidates were asked to profile and analyze student learning for at least two of the achievement targets (learning goals) of the TWS lessons. The instructors of the courses associated with the internships used the TWS scoring rubric to rate the indicators of each standard, and then the eight standards, on a three-point scale of 0 = Not Met, 1 = Met Acceptable, or 2 = Met At Target. For this study, the TWS total scores were determined by summing the scores for the eight standards. Hence, the TWS total scores could vary from 0 to 16 points.

The additional measures employed in this study came from the student learning measures reported by the teacher candidates in their TWSs. The teacher candidates reported the number and percent of their students who achieved each of two featured achievement targets (Target1 and Target2) and the number and percent of their students who showed improvement (learning gains) from the pre-assessment to the post-assessment on those same achievement targets. Inspection of the TWSs indicated the candidates typically set a criterion of 75% or 80% of the possible points for achievement of the two lesson targets on their post-assessments, depending upon the possible points on the measure. Two additional measures were generated for this study by averaging the percentage of students showing gains and the percentage of students achieving the targets across Target1 and Target2.

This study did not examine measures of teacher candidate demographics. Denner, Norman, and Lin (2009) examined the effect of various demographic characteristics of teacher candidates (including gender, age, race/ethnicity, grade point average, and program) on TWS performance levels. They found program major was a consistent predictor of TWS performance along with grade point average, but other demographic factors were not primary predictors. As reported previously, the proportions of teacher candidates in elementary education and in secondary education were similar across all of the sets of TWSs in this study.
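To make the scoring and the derived measures concrete, the short sketch below uses hypothetical values (not data from the study) to show how a TWS total score is summed from the eight standard-level rubric ratings and how the two averaged student learning measures are formed from the candidate-reported percentages for Target1 and Target2.

```python
# Minimal sketch with hypothetical values; not the unit's actual database code.
# Each of the eight TWS standards is rated 0 = Not Met, 1 = Met Acceptable,
# or 2 = Met At Target, so summed totals can range from 0 to 16 points.
standard_ratings = [2, 1, 2, 2, 1, 2, 2, 1]
assert len(standard_ratings) == 8 and all(r in (0, 1, 2) for r in standard_ratings)
tws_total = sum(standard_ratings)  # 13 for this hypothetical candidate

# Candidate-reported class percentages for the two featured achievement targets.
pct_achieving = {"Target1": 88.0, "Target2": 75.0}  # met the stated success criterion
pct_improving = {"Target1": 72.0, "Target2": 90.0}  # gained from pre- to post-assessment

# The two derived measures used in this study: averages across Target1 and Target2.
avg_pct_achieving = sum(pct_achieving.values()) / len(pct_achieving)  # 81.5
avg_pct_improving = sum(pct_improving.values()) / len(pct_improving)  # 81.0

print(tws_total, avg_pct_achieving, avg_pct_improving)
```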
Beyond the demographic factors just discussed, no effort was made in this study to control for lesson content, number of students taught, student demographics, grade level, school setting, or other factors that might affect teaching effectiveness as measured in this study. The position of the developers of the RTWS (see Denner et al., 2004), which followed the lead of the developers of the Western Oregon University TWSM (see Schalock et al., 1998), has been that teacher candidates should consider such factors when they plan their lessons and demonstrate positive impacts on student learning regardless of such situational and contextual factors. Indeed, Wright, Horn, and Sanders (1997) have shown that variables such as class size and student heterogeneity exert little influence on teacher effectiveness as measured by academic gain. For the candidates for whom we had scores from TWS1 and TWS2, as explained in Denner et al. (2003), policies at ISU regarding internship placements ensured that the same teacher candidate taught a different topic to different students at a different grade level in a separate semester, and usually at a different school, when he or she completed the two TWSs. Again, no effort was made to control for any situational or contextual factors when comparing the TWS1 to the TWS2 performances of these teacher candidates.

Scoring

The TWS scores used in this study were the scores assigned by the program faculty of the courses the candidates were required to take in conjunction with their internships. All of the course instructors were trained and experienced raters. A random sample of 50 TWSs was selected by internship level (25 TWS1s and 25 TWS2s). The 50 TWSs were rescored by a trained and experienced rater who had not previously scored any of the TWSs used in this study. The Pearson correlation between the original TWS total scores and the second ratings was r = .92, p < .001, indicating a high level of inter-rater agreement and sufficient scoring reliability for the purposes of this study.

The reliability of the achievement and improvement percents reported by the teacher candidates for their students in their TWSs was not assessed. Because the ISU teacher candidates were only required to report these measures, and were not required to demonstrate any particular level of achievement or learning gains, there was no incentive for the teacher candidates to make false reports. When submitting their TWSs, the teacher candidates signed an affidavit affirming that the reported work was completed by them and was not being reported dishonestly. Cooperating teachers and university supervisors observed the candidates during the TWS lessons and reviewed their TWS reports. No additional effort was made to verify the accuracy of the achievement and improvement data as reported by the teacher candidates.

Procedures

The TWSs were completed to meet graduation requirements for teacher education programs in the College of Education at ISU. The TWS scores and student learning measures are routinely collected and entered into a database for the teacher education programs. This study made use of the existing TWS scores and the existing student learning measures (the reported percent of students meeting Target1 and Target2 and the reported percent of students who showed improvement from the pre-assessment to the post-assessment on Target1 and Target2) contained in the database. The records of the candidates were located as found sets by the principal investigator within the database via a search of the records based on the semester and year when the TWSs were completed. The data were then exported from the database and entered into SPSS® 16.0 for Windows® for data analysis. The TWSs from calendar year 2003 had been collected as part of a previous study (Denner et al., 2005). The student learning measures were available from the existing data files of the earlier study, but had not been reported. The use of the existing data was approved by the ISU Human Subjects Committee. Candidates in the ISU teacher education programs were informed of the purposes of the unit assessment system and the fact that their assessment information may be used as part of program evaluation studies.
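As an illustration of the scoring-reliability check described in the Scoring subsection above, the sketch below computes the Pearson correlation between an original and a second, independent set of TWS total scores. The score vectors are fabricated stand-ins for the 50 rescored TWSs, and SciPy is assumed here rather than the statistical software actually used in the study.

```python
# Fabricated rating vectors standing in for the 50 rescored TWSs; the study
# reported r = .92 between the original totals and the second rater's totals.
from scipy.stats import pearsonr

original_totals = [13, 15, 12, 16, 14, 11, 15, 13, 14, 16]  # first rating, 0-16 scale
rescored_totals = [13, 14, 12, 16, 15, 11, 15, 12, 14, 16]  # independent second rating

r, p_value = pearsonr(original_totals, rescored_totals)
print(f"inter-rater r = {r:.2f}, p = {p_value:.4f}")
```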
Design

The design was descriptive, correlational, and causal-comparative. Descriptive statistics for TWS performance levels and the student learning measures were calculated separately by internship level for the TWS sets. Regression analyses and Pearson correlations were used to investigate the relations between the participants' total TWS scores and the student learning measures by internship level for the initial TWS set. The effects of internship level on TWS performance and the student learning measures were examined only for the sets of teacher candidates who produced both TWS1 and TWS2 during the periods of the study. The effects were tested using correlated t-tests. The primary dependent variables were the total TWS scores, the average percentage of students achieving the lesson targets, and the average percentage of students showing improvement on the lesson targets. Other dependent variables were the reported percent of students achieving each of the two lesson targets (Target1 and Target2) and the reported percent of students showing improvement on the two lesson targets. The level of significance was set at α = .05 for statistical tests of separate dependent variables and was held at α = .033 (FWE = .10) for statistical tests performed on related dependent measures. Cohen's d was reported as the measure of effect size for all t-tests.
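To restate the analysis plan in concrete form, the sketch below uses fabricated data and SciPy/NumPy (in place of the SPSS runs actually used) to illustrate the two kinds of tests just described: a simple linear regression of a reported learning measure on TWS total scores, and a correlated t-test on paired TWS1 and TWS2 totals. Cohen's d is computed here as the mean paired difference divided by the standard deviation of the differences, a common convention for paired designs; the study does not state which formula it used. The final lines show the per-test alpha implied by a family-wise error rate of .10 spread over three related measures.

```python
# Sketch with fabricated data; illustrates the analyses named in the Design section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simple linear regression of a reported student learning measure on TWS totals.
tws_totals = rng.integers(8, 17, size=60).astype(float)              # 0-16 rubric totals
pct_gains = np.clip(60 + 2.0 * (tws_totals - 12) + rng.normal(0, 15, 60), 0, 100)
slope, intercept, r, p_reg, se = stats.linregress(tws_totals, pct_gains)
print(f"regression: r = {r:.2f}, p = {p_reg:.3f}")

# Correlated (paired) t-test for TWS1 versus TWS2 totals from the same candidates.
tws1 = rng.integers(10, 17, size=40).astype(float)
tws2 = np.clip(tws1 + rng.normal(1.5, 2.0, size=40), 0, 16)
t, p_t = stats.ttest_rel(tws2, tws1)
diff = tws2 - tws1
cohens_d = diff.mean() / diff.std(ddof=1)    # one common convention for paired data
print(f"paired t = {t:.2f}, p = {p_t:.3f}, d = {cohens_d:.2f}")

# A family-wise error rate of .10 split over three related measures gives a
# per-test alpha of about .033, matching the value stated in the Design section.
alpha_per_test = 0.10 / 3
print(f"per-test alpha = {alpha_per_test:.3f}")
```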
Results

Reported Impacts on Student Learning

Table 1 presents the means and standard deviations for the student learning measures reported in the TWSs by internship level (TWS1 and TWS2) by the 2005-2007 teacher candidates. The pre-interns reported that an average of 81.6% of their students met the achievement targets on the post-assessments, whereas the average percent reported for the students taught by the student-teaching interns was lower at 75.5%. In addition, an average of 80.6% of the students taught by pre-interns showed pre-assessment to post-assessment learning gains, while 92.6% of the students taught by student-teaching interns showed learning gains. Although the pre-interns reported a higher percentage of students who achieved Target1, they also reported a lower percentage of students who showed improvement on that target from the pre-assessment to the post-assessment. In contrast, although the student-teaching interns reported lower percentages of students achieving the criterion they set for Target1 and Target2, they reported a higher percentage of students showing improvement for both achievement targets.

Relations of TWS Scores to the Student Learning Measures

Table 1 also presents the means and standard deviations for the TWS performances of the 288 pre-interns who completed TWS1 between fall 2005 and spring 2007 and for the 260 student-teaching interns who completed TWS2 during the same period. The mean TWS1 total score was 13.1 (SD = 2.3) and the mean TWS2 total score was 14.7 (SD = 1.6).

For the 2005-2007 pre-interns, a linear regression of the TWS1 total scores on the student learning measures showed a statistically significant relationship to the reported percent of students achieving Target2, F(1, 275) = 5.78, MSE = 483.41, p = .017, r = .14, but not to the reported percent of students achieving Target1, F(1, 274) = 0.07, MSE = 303.47, p = .796, r = .02, or to the average percent of the students achieving the targets, F(1, 272) = 3.14, MSE = 230.44, p = .077, r = .11. The regression of the pre-interns' TWS1 total scores on the average percent of students showing improvement (learning gains) on the achievement targets from the pre-assessment to the post-assessment did not yield a statistically significant relationship, F(1, 270) = 1.85, MSE = 239.35, p = .174, r = .08. There was also no statistically significant relationship to the reported percent of students showing improvement on Target1, F(1, 276) = 0.53, MSE = 498.75, p = .466, r = .04, or to the reported percent of students showing improvement on Target2, F(1, 270) = 1.52, MSE = 354.94, p = .219, r = .08. Together, these results indicate the TWS1 total scores of the pre-interns were positively related only to achievement of the second featured achievement target (Target2) and not to any of the other student learning measures.

For the 2005-2007 student-teaching interns, the regression analyses revealed that the TWS2 scores were related statistically to each of the student learning measures. There was a statistically significant relationship between the TWS2 total scores and the reported percent of students achieving Target1, F(1, 251) = 4.64, MSE = 468.94, p = .032, r = .14, the reported percent of students achieving Target2, F(1, 250) = 4.88, MSE = 517.67, p = .028, r = .14, and the average percent of the students achieving the targets, F(1, 250) = 6.13, MSE = 400.70, p = .014, r = .16. A statistically significant relationship was also shown for the student-teaching interns between their TWS2 total scores and the reported percent of students showing improvement on Target1, F(1, 252) = 5.05, MSE = 185.37, p = .025, r = .14, the reported percent of students showing improvement on Target2, F(1, 250) = 12.93, MSE = 243.69, p < .001, r = .22, and the average percent of the students showing improvement, F(1, 250) = 10.87, MSE = 178.38, p = .001, r = .20. Although these relationships were small, TWS performance was positively linked to the P-12 student learning impacts of the student-teaching interns.
Table 1
Means and Standard Deviations for the Teacher Work Sample (TWS) Total Scores and Student Learning Measures by Internship Level (TWS1 and TWS2) of All 2005-2007 Teacher Candidates and the Teacher Candidates with Paired TWSs

                                             All Teacher Candidates                     Teacher Candidates with TWS Pairs
                                             TWS1                  TWS2                 TWS1                  TWS2
Measures                                     n     M (SD)          n     M (SD)         n     M (SD)          n     M (SD)
TWS Total Scores                             288   13.1 (2.3)      260   14.7 (1.6)     152   13.3 (2.2)      152   14.9 (1.4)
Average Percent Achieving Lesson Targets     274   81.6 (15.2)     252   75.5 (20.2)    140   82.0 (15.4)     140   76.1 (20.9)
Percent Achieving Target1                    276   88.7 (17.4)     253   76.1 (21.8)    140   89.5 (16.2)     140   77.0 (22.5)
Percent Achieving Target2                    277   74.5 (22.2)     252   75.0 (22.9)    140   74.6 (22.4)     140   75.2 (23.0)
Average Percent Showing Improvement          272   80.6 (15.5)     252   92.6 (13.6)    140   80.1 (16.9)     140   92.4 (15.3)
Percent Improving on Target1                 278   73.0 (22.3)     254   92.8 (13.7)    140   70.9 (24.3)     140   92.4 (15.2)
Percent Improving on Target2                 272   88.4 (18.9)     252   92.4 (16.0)    140   89.3 (19.3)     140   92.4 (17.3)

Note. TWS stands for Teacher Work Sample. TWS1 was completed by pre-interns and TWS2 was completed by student-teaching interns. Target1 refers to the first lesson achievement target featured in a TWS. Target2 refers to the second lesson achievement target featured in a TWS.

Table 2
Means and Standard Deviations for the 2008 and 2003 Student Learning Measures by Internship Level (TWS1 and TWS2)

                                        2008                                     2003
                                              TWS1            TWS2               TWS1                  TWS2
Student Learning Measures               n     M (SD)          M (SD)             n(a)   M (SD)         n(a)   M (SD)
Average Percent Achieving Targets       84    81.6 (16.1)     79.2 (16.0)
Percent Achieving Target1               84    86.1 (21.6)     79.3 (19.4)        9      64.9 (25.5)    10     83.3 (22.4)
Percent Achieving Target2               84    77.1 (21.1)     79.0 (19.2)        9      70.6 (25.5)    9      80.7 (27.2)
Average Percent Improving               84    79.9 (16.0)     93.5 (11.3)
Percent Improving on Target1            84    72.5 (23.9)     94.0 (12.1)        13     84.7 (19.5)    17     84.6 (19.0)
Percent Improving on Target2            84    87.3 (17.0)     92.9 (12.0)        13     85.0 (18.6)    14     87.3 (22.4)

Note. TWS stands for Teacher Work Sample. TWS1 was completed when the candidates were pre-interns and TWS2 was completed when the same candidates were student-teaching interns. Target1 refers to the first lesson achievement target featured in a TWS. Target2 refers to the second lesson achievement target featured in a TWS.
(a) Valid n only for the 2003 teacher candidates out of 20 TWS pairs.

Effect of Sequential Internships on TWS Performance

Performances were available for both TWS1 and TWS2 for 152 of the teacher candidates from 2005-2007. The means and standard deviations are shown in Table 1. The mean TWS total score of these 152 teacher candidates was 13.3 on TWS1 and 14.9 on TWS2. The correlated t-test for the positive mean difference of 1.6 for the paired TWS1 and TWS2 total scores was statistically significant, t(151) = 7.61, p < .001, d = 0.62. The effect size was moderate to large. Thus, the same teacher candidates were shown to perform better overall as student-teaching interns on their TWSs than they did as pre-interns.

Effect of Sequential Internships on Student Learning

As can be seen in Table 1, all of the learning measures were available from both TWS1 and TWS2 for 140 of the 152 teacher candidates from 2005-2007 who had both TWSs available. The student learning measures were not adequately reported by twelve of the 152 teacher candidates (7.9%). The mean for the average percent of their students achieving the lesson targets was M = 82.0 for TWS1 and M = 76.1 for TWS2. The correlated t-test for this mean difference of -6.0% was statistically significant, t(140) = -2.88, SE = 2.08, p = .005, d = 0.24.
Hence, these teacher candidates reported a higher percentage of students who achieved the lesson targets on average when they were pre-interns than when they were student-teaching interns. Considered separately, the negative mean difference of -12.5% for Target1 was statistically significant, t(139) = -5.34, SE = 2.35, p < .001, d = -0.45, but the positive mean difference of 0.62% for Target2 was not statistically significant, t(139) = 0.24, SE = 2.57, p = .809, d = .02.

For the average percentage of their students showing improvement, the mean of these 140 teacher candidates was M = 80.1 on TWS1 and M = 92.4 on TWS2. The correlated t-test for this mean difference of 12.3% was statistically significant, t(140) = 6.72, SE = 1.83, p < .001, d = 0.57. Looking at the achievement targets separately, the positive mean difference of 21.6% for Target1 was statistically significant, t(139) = 8.93, SE = 2.41, p < .001, d = .75, but the positive mean difference of 3.1% for Target2 was not statistically significant, t(139) = 1.49, SE = 2.11, p = .140, d = .13. Hence, the teacher candidates reported a higher percentage of students who showed learning gains from the pre-assessment to the post-assessment when they were student-teaching interns than when they were pre-interns, but the main influence was for the first featured achievement target.

Reported Impacts on Student Learning in 2008

The effects of the sequential internships on P-12 student learning were reexamined for the student-teaching interns in 2008. Table 2 presents the means and standard deviations for the student learning measures of the paired TWSs (TWS1 and TWS2) of the 84 teacher candidates who completed their student-teaching internships (TWS2s) in calendar year 2008 and who had previously completed a TWS1. The student learning measures were available from both TWSs for all 84 of the teacher candidates. As can be seen from Table 2, a higher percentage of P-12 students were reported to achieve the lesson targets on average (M = 81.62 versus M = 79.16) when the teacher candidates were pre-interns (TWS1) than when they were student-teaching interns (TWS2). However, this difference was not statistically significant, t(83) = 1.10, SE = 2.24, p = .276, d = .12. Considered separately, the negative mean difference of -6.83% for Target1 was statistically significant, t(83) = -2.28, SE = 3.00, p = .025, d = .25, but the positive mean difference of 1.92% for Target2 was not statistically significant, t(83) = 0.68, SE = 2.84, p = .501, d = .07. Similar to the teacher candidate performances from fall 2005 through spring 2007, there was a negative mean difference in the average percent of students achieving the lesson targets reported by the teacher candidates when they were student-teaching interns compared to when they were pre-interns. However, the difference was negative and statistically significant only for Target1.

Similar to the earlier findings for the 2005-2007 teacher candidates with paired TWSs, Table 2 shows the mean percentage of the students showing improvement on both achievement targets was higher when the teacher candidates from 2008 were student-teaching interns (TWS2) than when they were pre-interns (TWS1). The means for Target1 were M = 72.5 for TWS1 and M = 94.0 for TWS2, and the means for Target2 were M = 87.3 for TWS1 and M = 92.9 for TWS2.
The correlated t-test for the positive mean difference of 13.6% in the average percent of students improving across both achievement targets was statistically significant, t(83) = 7.51, SE = 1.80, p < .001, d = .82. Separately, the correlated t-test for the positive mean difference of 21.6% for Target1 was statistically significant, t(83) = 8.37, SE = 2.58, p < .001, d = .91. The effect size was large. The correlated t-test for the positive mean difference of 5.5% for Target2 was also statistically significant, t(83) = 2.58, SE = 2.14, p = .012, d = .28. However, this effect size was small. Together with the earlier findings for the 2005-2007 teacher candidates, the results indicate the teacher candidates had higher percentages of P-12 students who showed learning gains from pre-assessments to post-assessments when they were student-teaching interns than when they were pre-interns. As with the teacher candidates from 2005-2007, the effect was larger for the first featured achievement target (Target1). Overall, the findings indicate the teacher candidates increased their abilities to impact student learning as they progressed in a sequential teacher preparation program from a pre-internship to a student-teaching internship. In addition, the average percentage of P-12 students reported to show improvement by the 2008 student-teaching interns was very high at 93.5%.

Reported Impacts on Student Learning in 2003

The effects of the sequential internships on the evidence for P-12 student learning contained in TWSs were also examined for a representative set of 20 paired TWSs from 2003. Table 2 contains the means and standard deviations for the reported student learning measures by internship level (TWS1 and TWS2). As can be seen from Table 2, the number of TWSs containing the student learning information varied from TWS1 to TWS2. In addition, the number of TWSs containing information about the percentage of students achieving the lesson targets was different from the number of TWSs with information about the percentage of students improving on the lesson targets. From Table 2, it can also be seen that half or fewer of the 40 TWSs in the set of 20 TWS pairs contained information sufficient to determine the percentages of students achieving each of the achievement targets. Inspection of the means in Table 2 indicates the average percentage of students reported to achieve each of the lesson targets was higher for TWS2 than for TWS1. However, the number of TWSs in 2003 with data for both TWS1 and TWS2 for the two achievement targets was too small to test the differences for statistical significance. Table 2 also shows the mean percentage of the students reported to improve on both achievement targets was much lower in the set of TWS2s from 2003 than in the set of TWS2s from 2008. Again, the number of candidates with information about the percentage of students improving on the achievement targets in both of their TWSs was insufficient to test meaningfully. It should be noted, however, that the means shown in Table 2 indicate little difference in 2003 in the percentage of students showing improvement on the two achievement targets from TWS1 to TWS2. Together, the results for the 2003 TWSs indicated that ISU teacher candidates in 2003 did not show consistent evidence for their impacts on student learning.
The percentage of the teacher candidates (50% or less) showing evidence sufficient to determine the percentage of their students achieving the lesson targets was low for both pre-interns and student-teaching interns. Although the percentage of candidates showing evidence sufficient to determine the percentages of their students improving on the lesson targets was somewhat higher (around 65%), the means shown in Table 2 did not reveal that the candidates were getting any better at improving student learning from their pre-internship to their student-teaching internship. This finding suggests that the increase in teaching practice from the candidates' pre-internships to their student-teaching internships did not increase the percentage of their students reported to improve on their TWS achievement targets. In addition, Table 2 shows the student-teaching interns in 2008 reported higher average percentages of their students with learning gains (94.0% on the first achievement target and 92.9% on the second achievement target) than did the student-teaching interns with available information in 2003 (84.6% for the first achievement target and 87.3% for the second achievement target). The differences are revealing despite the fact that they could not be tested statistically.

Discussion

Evidence for Teaching Effectiveness

Do Teacher Work Samples (TWSs) show evidence for the teaching effectiveness of teacher candidates? Consistent with the pioneering work at Western Oregon University (McConney et al., 1998; Schalock, 1987; Schalock et al., 1997), the findings of this study confirm that TWSs provide evidence for the teaching effectiveness of teacher candidates. The 2005-2007 student-teaching interns reported that an average of 75.5% of their students met the criterion (typically set between 75% and 80% of the possible points on the post-assessment) for the achievement targets, and 92.6% of their students showed learning gains from pre-assessments to post-assessments. These findings were replicated by the 2008 student-teaching interns, who reported an average of 79.2% of their students met the achievement targets and 93.5% showed learning gains from their pre-assessments to their post-assessments. Hence, TWSs do serve as a means of quality assurance, whereby teacher candidates demonstrate their abilities to teach so that students can learn.

Due to the statistical phenomenon known as regression toward the mean and other factors affecting the reliability of gain scores, it is doubtful that any selected set of TWSs is going to include learning gains that average 100%. In that light, the average percentage of P-12 students showing learning gains reported here (93.5% for the 2008 student-teaching interns) came very close to demonstrating that our student-teaching interns are able to teach so that all students can learn, in accordance with the expectation of the NCATE (2008) Professional Standards for the Accreditation of Schools, Colleges, and Departments of Education. This finding is relevant to policy makers who are considering a TWS assessment as a requirement for teacher licensure. It is also relevant to teacher preparation programs that use, or are considering using, TWSs as a means to document the teaching effectiveness of their program graduates.
Relation of TWS Performance to Teaching Effectiveness

Is there a relationship between TWS performance levels and the evidence for teaching effectiveness exhibited in the TWSs of teacher candidates? Consistent with the findings reported by McConney et al. (1998), the TWS performance levels of the 2005-2007 student-teaching interns in this study were related positively to all of the evidence for student learning contained within their TWSs. Unlike the findings reported by McConney et al. (1998, p. 357), where TWS measures were reported to explain from 24.5% to 59.5% of the variance in their metrics of student learning, the amounts of variance in the percentages of students showing learning gains that could be explained by the total TWS performance levels were found to be much smaller in this study. The TWS total scores of the student-teaching interns accounted for only 2% to 5% of the variance in the learning measures. The difference in the results was undoubtedly linked to the different metrics that were used to assess impacts on student learning. McConney et al. (1998) used an adjusted index of pupil growth that took into account both the complexity of the achievement targets and the quality of the assessments. The learning measures employed in this study were not adjusted for these or any other factors.

In contrast, for the pre-interns (TWS1), a positive relationship was shown only for the second featured achievement target, not for the first achievement target. On the second featured achievement target, those pre-interns who scored better on their TWS1 overall began to report higher achievement levels, but the percentage of explained variance was very small (only 2%). In addition, the TWS1 total scores were not related to the evidence for learning gains. The contrast between the results for TWS2 and the results for TWS1 suggests the relationship between TWS performance and teaching effectiveness increases as candidates progress sequentially in their preparation programs from being pre-interns to student-teaching interns. This prospect will be explored later in the context of other findings of this investigation.

Finally, although the McConney et al. (1998) study and the present study both showed positive relationships between TWS total scores and reported measures of student learning contained within TWSs, the amounts of explained variance imply that the assessment of teacher candidates' abilities to meet the targeted teaching standards is also considerably separate from the issue of their teaching effectiveness. Indeed, it has always been claimed that the RTWS does not hold teacher candidates directly accountable for their teaching effectiveness, but is instead a measure of their abilities to meet the targeted teaching standards (Denner et al., 2004). The findings of this investigation support this claim.

Teaching Performance Gains

Do teacher candidates improve their abilities to meet teaching standards in a sequential preparation program that requires them to document their teaching performances using TWSs in both a pre-internship and a student-teaching internship? The findings of the present investigation revealed that teacher candidates performed higher overall as student-teaching interns on their TWSs than they did as pre-interns. This finding supports a contribution from the sequential experience of completing two TWSs across two levels of internship.
However, this finding is contrary to our previous studies (Denner & Lin, 2005; Denner et al., 2005; Denner et al., 2003), which did not show overall performance differences by internship level. The differences in the findings were likely due in part to the dissimilar methodologies of the studies, but may also be attributed to program changes that have occurred because of those earlier studies. (The most notable change was a repositioning and redesign of a course on diversity away from the student-teaching internship to an earlier position in the program in favor of a new student-teaching seminar focused on teaching performance issues.) The finding of the present study is important because it shows that teacher preparation programs that employ two TWSs can make a difference in the documented teaching abilities of their teacher candidates with respect to widely acknowledged teaching standards. Of course, part of the improvement on TWS2 might be due to practice in completing the tasks required by the TWSs, although this was not found in our prior studies. For performance assessments, improvements in abilities to meet standards are not separable from improvements in abilities to execute the authentic tasks and to supply the required documentation used to measure those standards. As a result, higher TWS performance is evidence of better ability to meet the teaching standards. This interpretation is strengthened by the additional evidence, discussed next, for improved teaching effectiveness as the candidates progressed from pre-interns to student-teaching interns.

Improved Teaching Effectiveness

Do TWSs contain evidence of increased teaching effectiveness as candidates progress from being pre-interns to student-teaching interns in a sequential teacher preparation program? For the two recent sets (2005-2007 and 2008) of paired TWS performances (TWS1 and TWS2), the TWS documentation revealed a higher percentage of P-12 students who showed learning gains from the pre-assessments to the post-assessments when teacher candidates were student-teaching interns than when they were pre-interns. For both of the sets, the mean differences were statistically significant and substantial. This result supports an influence for a sequential teacher preparation program on the abilities of teacher candidates to support student learning as they progress from pre-interns to student-teaching interns. Other teacher preparation programs should consider using a similar sequential internship model (a pre-internship with TWS1 followed by a student-teaching internship with TWS2).

The finding of a decline in the reported average percent of students achieving the lesson targets when the teacher candidates were student-teaching interns compared to their TWSs as pre-interns was likely due to the teacher candidates setting more challenging achievement targets as student-teaching interns. When the teacher candidates were pre-interns, they tended to choose easier achievement targets. This was particularly true for the first achievement target, where a higher percentage of the students taught by the pre-interns achieved the target on their post-assessments, but a smaller percentage of their students showed learning gains from the pre-assessment to the post-assessment.
Hence, rather than a negative finding, this result may reflect an improvement by the teacher candidates from their pre-internships to their student-teaching internships in setting appropriate achievement targets and choosing challenging content. The issue of setting appropriate achievement targets and its relation to demonstrated impacts on student learning merits further consideration and investigation.

While the sequential model was undoubtedly one reason for the results, similar results were not present in the TWSs from calendar year 2003. Although the percentages of students reported to achieve the TWS lesson targets were similar in both 2003 and 2008, in calendar year 2003 many of the teacher candidates were missing evidence in their TWSs, which was much less the case from 2005 to 2007, and was not the case at all in 2008. In addition, evidence for improvement in the percentages of students showing learning gains on the two lesson targets from the pre-internship (TWS1) to the student-teaching internship (TWS2) was not present in the calendar year 2003 TWSs. The comparisons between 2003 and 2008 reveal that, beyond any practice effects resulting from the sequential internships, the use of two TWSs as part of the assessment system has enabled us to improve the quality of our teacher preparation program in terms of the abilities of candidates to teach so their students can learn. This is consistent with the expectations of NCATE (2008) that focusing on candidate performances as an important aspect of unit accountability is a vehicle for program improvement. It is also consistent with the vision of Del Schalock, Mark Schalock, and their colleagues at Western Oregon University (Schalock et al., 1997; Schalock et al., 1998; Schalock, 1987; Schalock et al., 1993) that a Teacher Work Sample Methodology focused on student learning would lead to the reform and improvement of teacher preparation programs.

References

Denner, P. R., & Lin, S.-Y. (2005). Fairness and aspects of the consequential validity of performance assessments using a teacher work sample. In P. R. Denner (Chair), Fairness and aspects of the consequential validity of performance assessments using a teacher work sample. Symposium presented at the 59th annual meeting of the American Association for Colleges of Teacher Education, Washington, DC.

Denner, P., Newsome, J., & Newsome, J. D. (2005, February). Generalizability of teacher work sample performance assessments across occasions of development. Research report presented at the annual meeting of the Association of Teacher Educators, Chicago, IL.

Denner, P., Norman, A., & Lin, S. (2009). Fairness and consequential validity of teacher work samples. Educational Assessment, Evaluation and Accountability, 21, 235-254. doi:10.1007/s11092-008-9059-6

Denner, P. R., Norman, A. D., Salzman, S. A., Pankratz, R. S., & Evans, C. S. (2004). The Renaissance Partnership teacher work sample: Evidence supporting score generalizability, validity, and quality of student learning assessment. In E. M. Guyton & J. R. Dangel (Eds.), Teacher education yearbook XII: Research linking teacher preparation and student performance (pp. 23-56). Dubuque, IA: Kendall/Hunt.

Denner, P. R., & Salzman, S. A. (2003, January). Ways teacher work samples impact the learning of all students. In R. S. Pankratz (Chair), Evidence of teacher work sample impact on P-12 student learning, teacher performance and teacher preparation programs. Symposium conducted at the 55th annual meeting of the American Association for Colleges of Teacher Education, New Orleans, LA.
Denner, P. R., Salzman, S. A., Newsome, J. D., & Birdsong, J. R. (2003). Teacher work sample assessment: Validity and generalizability of performances across occasions of development. Journal for Effective Schools, 2(1), 29-48.

Marzano, R. J. (2003). What works in schools: Translating research into action. Alexandria, VA: Association for Supervision and Curriculum Development.

McConney, A. A., Schalock, M. D., & Schalock, H. D. (1998). Focusing improvement and quality assurance: Work samples as authentic performance measures of prospective teachers' effectiveness. Journal of Personnel Evaluation in Education, 11, 343-363.

National Council for Accreditation of Teacher Education. (2008). Professional standards for the accreditation of schools, colleges, and departments of education. Washington, DC: Author.

Pankratz, R. (1999). Improving teacher quality through partnerships that connect teacher performance to student learning. Unpublished manuscript, Western Kentucky University.

Schalock, M. D. (1987). Teacher productivity: What is it? How might it be measured? Can it be warranted? Journal of Teacher Education, 38(5), 59-62.

Schalock, M. D., Cowart, B., & Staebler, B. (1993). Teacher productivity revisited: Definition, theory, measurement, and application. Journal of Personnel Evaluation in Education, 7, 179-196.

Schalock, H. D., Schalock, M., & Girod, G. (1997). Teacher work sample methodology as used at Western Oregon State College. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? (pp. 15-45). Thousand Oaks, CA: Corwin Press.

Schalock, H. D., Schalock, M., & Myton, D. (1998, February). Effectiveness—along with quality—should be the focus. Phi Delta Kappan, 79, 468-470.

Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 57-67.

Authors

Dr. Peter R. Denner is the Associate Dean of the College of Education at Idaho State University. His current research interests are focused on standards-based performance assessments of teacher quality and the linking of teacher performance assessments, particularly Teacher Work Samples, to the learning of P-12 students.

Dr. Shu-Yuan Lin is an associate lecturer in the Department of Educational Foundations in the College of Education at Idaho State University. Her current research interests and special projects are focused on English as a second/new/foreign language instruction, technology integration in K-16 instruction, and cultural and linguistic diversity in education.

Dr. Julie R. Newsome is an associate professor in the Department of Educational Foundations in the College of Education at Idaho State University. Her current research interests and special projects are focused on performance assessments of P-12 students and teacher candidates and how these can demonstrate teacher quality in the accreditation process.

Dr. Jack D. Newsome is the former Associate Dean of the College of Education at Idaho State University. Before his retirement in 2010, his research interests were focused on standards-based performance assessments of teacher quality and the linking of teacher performance assessments, particularly Teacher Work Samples, to the learning of P-12 students.
Dr. Deborah L. Hedeen is the Dean of the College of Education at Idaho State University. Her current research interests are focused on teacher candidate impact on student learning, P-16 seamless education and the Common Core State Standards, and designing inclusive learning environments for students with disabilities.