Exhibit 5. Fair, Accurate, Consistent, and Bias-free Assessment
Strategies to ensure that assessments are fair, accurate, consistent, and free of bias include:
(1) Using multiple measures to assess performance
(2) Ongoing conversations among faculty teaching the same course and using the same
(3) Collecting assessment data at multiple points across the programs
(4) Ongoing professional development for ourselves and the partners in schools where
our candidates participate
(5) Obtaining estimates of inter-rater reliability across evaluators
(6) Estimating content validity of the assessment instruments
(7) Recalibrating rubrics after they have been applied over a time period
(8) Testing assessment outcomes for consistency and predictive validity for a sample of
(9) Restructuring assessment tasks when the tasks have been deemed invalid or
Examples include:
In every program across the unit, multiple measures are used to assess
candidates’ knowledge, performance, and dispositions. No decisions for
program improvement or judgments on candidate performance are based on a
single assessment.
Ongoing conversations among faculty teaching common courses and using
common assessments occur in EDT, for example. In the Special Education
program, faculty who teach the same course and use the same rubric regularly
convene to ask whether or not the data are reasonably consistent regardless of
which instructor is using the rubric. Other examples from Educational
Leadership depict conversations among faculty members and adjunct faculty
members who teach the same courses. These partners review the common
assessments and attempt to resolve barriers to valid assessment that might
arise. For example, a lack of clarity in a rubric that leads one or more
faculty members to misunderstand it must be resolved through collaborative
revision. The applicability of the assessment to real school settings can be
validated as well. These conversations are directed at confirming the content
validity of an assessment for contemporary school settings, thus assuring
another dimension of fairness. In the Middle Childhood program in EDT, for
example, similar conversations are held in order to continually review
the assessments being used.
As shown in Table 6, five common assessments are carried out in every
program. Multiple assessments over time strengthen reliability when
evaluating the outcomes in terms of candidate performance. A more specific
example comes from Early Childhood Education. Candidates are evaluated in a
field experience as sophomores and as juniors, and those data are related to
their senior year experience in student teaching. Repeated assessments in the field
sample their behavior multiple times in multiple settings, thus strengthening
reliability.
Ongoing professional development is conducted by both the faculty from the
Department of Teacher Education and partner clinical educators. The purpose
of our professional development is to help both the university faculty and the
clinical educators develop a teacher education program that produces quality
educators, improve student learning in PreK-12 settings, engage educators
in self-study centered on pedagogical dialogues about best practices and
instruction, and engage in research and inquiry for the improvement of
teaching and learning.
In the Special Education program in EDT, informal estimates of inter-rater
reliability were developed across instructors on certain assessment rubrics.
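As an illustration only, an estimate of inter-rater reliability of this kind could be computed as Cohen's kappa, which adjusts the raw agreement between two raters for the agreement expected by chance. The sketch below is not the program's actual procedure; the function name, the 1-4 rubric scale, and the scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a categorical rubric."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same rubric level by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from how often each rater used each rubric level.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical level for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) from two instructors for the same eight candidates.
instructor_1 = [3, 4, 2, 3, 4, 1, 3, 2]
instructor_2 = [3, 4, 2, 2, 4, 1, 3, 3]
kappa = cohen_kappa(instructor_1, instructor_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa near 1 suggests the instructors apply the rubric consistently, while a value near 0 suggests agreement no better than chance, which would point to a rubric in need of clarification.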
Examples of content validity are included in prior examples in this section.
In the School Psychology program, for example, candidates prepare a case
study that addresses 10 of the 11 domains required by NASP. Studies have been
conducted to investigate the validity of the rubric in relation to problem solving
outcomes and the inter-rater reliability of the instrument. As a result,
substantial changes to the rubric for the case study have been made to improve
its fairness and reduce bias.
Assessments of candidates in an initial field experience in Early Childhood
Education during their sophomore year are compared to those from later field
experiences during the junior and senior years. Over time, estimates of the
predictive validity of these assessments have been gathered and analyzed.
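One common way to estimate predictive validity of this kind is to correlate early field-experience scores with later student-teaching scores for the same candidates. The sketch below uses a Pearson correlation for that purpose; it is an assumption about how such an estimate might be obtained, not a description of the analysis actually performed, and all names and scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need at least two paired scores.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined when one set of scores is constant.")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores for five candidates (same order in both lists).
sophomore_field = [2.0, 2.5, 3.0, 3.5, 4.0]
senior_student_teaching = [2.5, 2.5, 3.5, 3.5, 4.0]
r = pearson_r(sophomore_field, senior_student_teaching)
print(f"Predictive validity estimate (Pearson r): {r:.2f}")
```

A strong positive correlation would suggest that the early assessment anticipates later performance. With small samples, any such estimate should be read cautiously.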
In Early Childhood Education, the faculty members reflected on the ways in
which candidates responded to an assignment to compare their life experiences
to their students’ life experiences. This reflection revealed that a substantial
number of candidates lacked the prerequisite cognitive skills to carry out the
task. As a result, the instruction was revised and the rubric was recalibrated
according to a more substantive theoretical framework. Both efforts enhanced
the validity of the assessment, and thus its fairness and freedom from bias.
