Exhibit 5. Fair, Accurate, Consistent, and Bias-free Assessment

Strategies to assure assessments are fair, accurate, consistent, and free of bias include: (1) using multiple measures to assess performance; (2) holding ongoing conversations among faculty teaching the same course and using the same assessments; (3) collecting assessment data at multiple points across the programs; (4) providing ongoing professional development for ourselves and our partners in the schools where our candidates participate; (5) obtaining estimates of inter-rater reliability across evaluators; (6) estimating the content validity of the assessment instruments; (7) recalibrating rubrics after they have been applied over a period of time; (8) testing assessment outcomes for consistency and predictive validity for a sample of candidates; and (9) restructuring assessment tasks when the tasks have been deemed invalid or unreliable.

Examples include:

1. In every program across the unit, multiple measures of candidates' knowledge, performance, and dispositions are conducted. No decision about program improvement, and no judgment of candidate performance, is based on a single assessment.

2. Ongoing conversations among faculty teaching common courses and using common assessments occur in EDT, for example. In the Special Education program, faculty who teach the same course and use the same rubric convene regularly to ask whether the data are reasonably consistent regardless of which instructor applies the rubric. In Educational Leadership, full-time and adjunct faculty members who teach the same courses review the common assessments and work to resolve barriers to valid assessment as they arise; for example, a lack of clarity in a rubric that is misunderstood by one or more faculty members must be resolved through collaborative revision. These conversations also confirm the applicability of an assessment to real school settings, thereby establishing its content validity for contemporary schools and assuring another dimension of fairness. In the Middle Childhood program in EDT, similar conversations are held to continually review the assessments in use.

3. As shown in Table 6, five common assessments are carried out in every program. Multiple assessments over time strengthen reliability when evaluating candidate performance. A more specific example comes from Early Childhood Education: candidates are evaluated in field experiences as sophomores and juniors, and those data are related to their senior-year performance in student teaching. Repeated assessments in the field sample candidates' behavior multiple times in multiple settings, thus strengthening reliability.

4. Ongoing professional development is conducted by faculty from the Department of Teacher Education together with partner clinical educators. The purpose of this professional development is to help university faculty and clinical educators build a teacher education program that produces quality educators, to improve student learning in preK-12 settings, to engage educators in self-study centered on pedagogical dialogue about best practices and instruction, and to pursue research and inquiry for the improvement of teaching and learning.
5. In the Special Education program in EDT, informal estimates of inter-rater reliability were developed across instructors for certain assessment rubrics (a minimal sketch of how such an estimate can be computed appears after this list).

6. Examples of content validity are included in the prior examples in this section (a brief illustration of one common content-validity index also appears after this list).

7. In the School Psychology program, for example, candidates prepare a case study that addresses 10 of the 11 domains required by NASP. Studies have been conducted to investigate the validity of the rubric in relation to problem-solving outcomes and the inter-rater reliability of the instrument. As a result, substantial changes to the case study rubric have been made to improve its fairness and freedom from bias.

8. The assessment of candidates in an initial field experience in Early Childhood Education during their sophomore year is compared to their later field experiences during the junior and senior years. Over time, estimates of the predictive validity of these assessments have been gathered and analyzed (a simple version of this comparison is sketched after this list).

9. In Early Childhood Education, faculty members reflected on the ways in which candidates responded to an assignment asking them to compare their own life experiences to their students' life experiences. The review revealed that substantial numbers of candidates lacked the prerequisite cognitive skills to carry out the task. As a result, the instruction was revised and the rubric was recalibrated according to a more substantive theoretical framework. Both efforts enhanced the validity of the assessment, and thus its fairness and freedom from bias.
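The inter-rater reliability estimates mentioned in example 5 can be computed in several ways; Cohen's kappa is one common index for two raters scoring the same work. The Python sketch below is illustrative only: the two sets of rubric scores are hypothetical, not program data, and a program might equally well use simple percent agreement or an intraclass correlation.

    from collections import Counter

    # Hypothetical rubric scores (1 = unacceptable, 2 = acceptable, 3 = target)
    # assigned independently by two instructors to the same ten candidate work samples.
    rater_a = [3, 2, 3, 1, 2, 3, 2, 2, 3, 1]
    rater_b = [3, 2, 2, 1, 2, 3, 2, 3, 3, 1]

    n = len(rater_a)

    # Observed agreement: proportion of samples rated identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal score frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[s] * freq_b[s] for s in set(rater_a) | set(rater_b)) / n**2

    # Kappa corrects the observed agreement for agreement expected by chance.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    print(f"Observed agreement: {p_observed:.2f}, Cohen's kappa: {kappa:.2f}")

With these illustrative ratings, observed agreement is 0.80 and kappa is about 0.69, a level often read as substantial agreement.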
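For the content-validity estimates referenced in strategy 6 and example 6, one widely used summary is Lawshe's content validity ratio (CVR), computed from expert panelists' judgments of whether each rubric item is essential. The sketch below assumes a hypothetical panel of faculty and clinical educators; the panel size and vote counts are assumptions for illustration, not unit data.

    # Lawshe's content validity ratio for each rubric item:
    # CVR = (n_essential - N/2) / (N/2), where N is the panel size and
    # n_essential is the number of panelists rating the item "essential".
    # Panel counts below are hypothetical.
    panel_size = 12
    essential_votes = {"item_1": 11, "item_2": 8, "item_3": 5}

    for item, n_essential in essential_votes.items():
        cvr = (n_essential - panel_size / 2) / (panel_size / 2)
        print(f"{item}: CVR = {cvr:+.2f}")  # ranges from -1 (none) to +1 (all essential)

Items with low or negative CVR values would be candidates for the collaborative revision described in example 2.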
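The predictive-validity analysis described in example 8 amounts, in its simplest form, to correlating candidates' early field-experience scores with their later student-teaching scores. The sketch below shows that comparison with hypothetical composite scores for eight candidates; the values are assumptions for illustration only.

    from statistics import correlation  # available in Python 3.10+

    # Hypothetical composite ratings for the same candidates at two points:
    # sophomore field experience vs. senior student teaching.
    sophomore_scores = [2.1, 2.8, 3.0, 2.4, 3.3, 2.6, 2.9, 3.1]
    senior_scores    = [2.4, 3.0, 3.2, 2.3, 3.4, 2.8, 2.7, 3.3]

    # A strong positive Pearson correlation is one piece of evidence that the
    # early assessment has predictive validity for later performance.
    r = correlation(sophomore_scores, senior_scores)
    print(f"Pearson r between sophomore and senior ratings: {r:.2f}")

A weak or negative correlation, by contrast, would prompt the kind of task restructuring described in strategy 9.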