An Assessment Guide to Educational Effectiveness
Chapter 1: Assessment Concepts, Terms, and Purpose – provides an overview of assessment in the
United States and defines the elements of effective assessment practice. The intended audience of this
chapter includes college and university presidents, vice presidents, provosts, board members, and executive
sponsors of campus assessment efforts.
Chapter 2: Planning for Assessment – provides the operational elements of planning for assessment
including governance, structures, and strategies. The intended audience of this chapter includes executive
sponsors of campus assessment and directors of institutional effectiveness, directors of assessment,
assessment coordinators, department heads, and others on committees involved in the assessment process.
Chapter 3: Conducting Assessment – provides an operational view of assessment including decisions that
need to be made, and designing processes to keep assessment useful, manageable, and sustainable. The
intended audience of this chapter includes those directly involved in conducting assessment.
Chapter 4: Using Assessment Results – provides information about the important final steps of
assessment: using actionable knowledge to improve educational quality. The intended audience of this
chapter includes senior leadership as well as those involved in assessment.
The clickable document map below provides readers with the ability both to understand Blackboard’s
Educational Effectiveness strategic consulting framework and to access specific topics by clicking on the links.
[Document map: Blackboard’s Educational Effectiveness strategic consulting framework. Topics are organized from strategic to tactical within seven areas: Institutional Strategy; Faculty & Program Development; Student Support & Engagement; Evaluation Criteria; Measurement Methods; Measurement Instruments; and Use of Results. Representative topics include designing assessment governance, goals and outcomes, and institutional assessment plans; designing direct and indirect measurements, rubrics, surveys and course evaluations, portfolios, and embedded assessments; aligning assessment with standards; interpreting results; improving equity of outcomes; making results visible; designing effective reports; and using results for program improvement.]
Chapter 1: Assessment Concepts, Terms, and Purpose
Assessment as an engine of change. While it is likely that very few institutions would be
conducting assessment were it not for accreditation requirements, and while many do view the
process as an exercise in accountability, institutions have much to gain from outcomes
assessment. If used as a diagnostic tool, assessment can serve as a powerful engine of
change that keeps education a vibrant, relevant, and indispensable social institution. Indeed, outcomes assessment is arguably the only systematic means we have to continually improve our core business – educating our students. By framing assessment within our institutional mission and goals, we can understand the impact of our work on several levels. To what extent did we achieve our goals? To what extent are we aligned among mission, goals and processes? Where are opportunities to celebrate our accomplishments? Where are opportunities to improve performance?
Of course, educators have long practiced assessment in many forms. Assessment in the
classroom allows us to understand student performance. Administrative assessment allows us
to understand faculty and staff performance. The work of our Institutional Research offices
allows us to understand the inputs and outputs of our work: average test scores, average high
school GPA, graduation rates, retention rates, average GPA on graduation, time to graduation,
average admission scores, etc. All of these processes tend to focus on the student or employee
as the unit of measure. Outcomes assessment focuses on the program as the unit of measure.
More importantly, outcomes assessment provides a means for us to understand the quality of
our programs. With thoughtfully designed methods, we can generate data that tell us not only where we stand in achieving our desired end states (judgment and accountability) but also what to do to improve our work (diagnosis and improvement).
Before delving into the implementation of outcomes assessment (how to measure, what to
measure, etc.), it is helpful to have a common understanding of terms used throughout this
collection of documents.
Terms and definitions. What do we mean when we use the term “assessment” and what is
the difference between “assessment” and “evaluation”?
Arguably the father (and mother) of evaluation practice, Michael Scriven, in his Evaluation
Thesaurus (Fourth Edition, 1991, Sage Publications) provides 4 definitions for evaluation. We
share the most relevant definition:
EVALUATE, EVALUATION The process of determining the merit, worth, or value of
something, or the product of that process.
Scriven sees no distinction between the terms “assessment” and “evaluation” in his definition of
“assessment”:
ASSESSMENT Often used as a synonym for evaluation, but sometimes the subject of
valiant efforts to differentiate it, presumably to avoid some of the opprobrium associated
with the term “evaluation” in the minds of people for whom the term is more important
than the process. None of these efforts are worth much, either in terms of intrinsic logic
or adoption.
Although we do not make distinctions between the two terms, many campuses use the terms differently – often assigning the term “evaluation” to instances where faculty or staff are the unit of measure (e.g. performance evaluation) and “assessment” to practices adopted to address accreditation standards. At Blackboard, we have adopted a definition that specifically addresses our work with clients:
ASSESSMENT In an educational context, a process focused on understanding how well a
program has delivered specific knowledge, skills, and competencies and using that
understanding to plan for improved program performance going forward.
Throughout this document, we will use “assessment” and “evaluation” interchangeably.
However, when working with clients who use these terms for specific processes, we adjust our
use of the terms according to campus definitions, as we do other elements of educational
practice.
OUTCOME: An outcome is what a student knows, thinks, or is able to do as a result of a
program. Outcomes are observable and therefore measurable, where “measurable” typically means a quantitative expression of quality. Outcomes should have a name and a description, where the description typically sets forth the criteria for evaluating the outcome. For example:
Outcome name: Information Literacy
Outcome description: The ability to know when there is a need for information, to be
able to identify, locate, evaluate, and effectively and responsibly use and share that
information for the problem at hand. - Adopted from The National Forum on Information
Literacy (http://www.infolit.org/)
Criteria for evaluation:
Recognize need: Effectively defines the scope of the research question or thesis. Effectively determines key concepts. Types of information (sources) selected directly relate to concepts or answer research question.
Access information: Accesses information using effective, well-designed search strategies and most appropriate information sources.
Evaluate: Thoroughly (systematically and methodically) analyzes own and others' assumptions and carefully evaluates the relevance of contexts when presenting a position.
Use for a purpose: Communicates, organizes and synthesizes information from sources to fully achieve a specific purpose, with clarity and depth.
Use ethically and legally: Students use correctly all of the following information use strategies (use of citations and references; choice of paraphrasing, summary, or quoting; using information in ways that are true to original context; distinguishing between common knowledge and ideas requiring attribution) and demonstrate a full understanding of the ethical and legal restrictions on the use of published, confidential and/or proprietary information.
(http://www.aacu.org/value/rubrics/pdf/InformationLiteracy.pdf)
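To make the shape of an outcome concrete, here is a minimal sketch of an outcome represented as a small data structure. The class name, fields, and the shortened criteria labels are illustrative assumptions, not part of any Blackboard product or of the AAC&U rubric itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Outcome:
    """An outcome: a name, an extended definition, and observable criteria for evaluation."""
    name: str
    description: str
    criteria: List[str] = field(default_factory=list)

# Hypothetical instance echoing the Information Literacy example above.
information_literacy = Outcome(
    name="Information Literacy",
    description=("The ability to know when there is a need for information, to be able to "
                 "identify, locate, evaluate, and effectively and responsibly use and share "
                 "that information for the problem at hand."),
    criteria=["Recognize need", "Access information", "Evaluate",
              "Use for a purpose", "Use ethically and legally"],
)
print(information_literacy.name, "-", len(information_literacy.criteria), "criteria")
```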
In addition to curricular outcomes, there are co-curricular outcomes resulting from the programs
and services typically organized within student affairs. Among the knowledge, skills, and
competencies related to the co-curriculum are career planning, interpersonal relationships,
academic planning, multicultural competence, physical health and wellbeing, etc.
Beyond curricular and co-curricular outcomes, an argument can be made that any collection of
organized activities on a campus has implications for intended outcomes. The student billing
function could think of their work as resulting in personal financial skills. The registrar’s office
could think of their work in terms of academic planning skills. The library could think of their
work in terms of information location and access skills. Campus Security can think of their
outcomes as knowledge of personal safety, and so on.
With the principal terms of “assessment” and “outcomes” defined here, additional definitions will
be provided as concepts are introduced.
Accreditation and accountability. While we have argued that assessment is of value to all
institutions of higher education because of its potential to serve continuous improvement, there
is no question that institutions of higher education are being driven to adopt assessment by the
professional and regional accreditation agencies. Over the last two decades, as higher
education in the US has reached full maturity, accreditation standards have moved away from
compliance models to quality assurance and enhancement models. No longer as focused on
the capacity to deliver education, regional accreditation now has expectations about institutional
and educational effectiveness. Each regional agency states these expectations in language
making clear that institutions need to conduct assessment and use results to improve; they do
not expect institutions to provide laundry lists of achievement levels or comparability with other
institutions.
To underscore the concept that “accountability” in the US means that institutions have systematic processes in place and use resultant data to improve their work, relevant language from each of the accreditation agency standards is included below.
Southern Association of Colleges and Schools
3.3.1 The institution identifies expected outcomes, assesses the extent to which it achieves these outcomes, and provides evidence of improvement based on analysis of the results in each of the following areas: (Institutional Effectiveness)
3.3.1.1 educational programs, to include student learning outcomes
3.3.1.2 administrative support services
3.3.1.3 educational support services
3.3.1.4 research within its educational mission, if appropriate
3.3.1.5 community/public service within its educational mission, if appropriate
North Central Association – Higher Learning Commission
Core Component - 2c The organization’s ongoing evaluation and assessment processes
provide reliable evidence of institutional effectiveness that clearly informs strategies for
continuous improvement.
Western Association of Schools and Colleges
Standard 4 Creating an Organization Committed to Learning and Improvement
The institution conducts sustained, evidence-based, and participatory discussions about how
effectively it is accomplishing its purposes and achieving its educational objectives. These
activities inform both institutional planning and systematic evaluations of educational
effectiveness. The results of institutional inquiry, research, and data collection are used to
establish priorities at different levels of the institution and to revise institutional purposes,
structures, and approaches to teaching, learning, and scholarly work.
Middle States Commission on Higher Education
Standard 7 Institutional Assessment The institution has developed and implemented an
assessment process that evaluates its overall effectiveness in achieving its mission and goals
and its compliance with accreditation standards.
New England Association of Schools and Colleges
Evaluation 2.4 The institution regularly and systematically evaluates the achievement of its
mission and purposes, giving primary focus to the realization of its educational objectives. Its
system of evaluation is designed to provide relevant and trustworthy information to support
institutional improvement, with an emphasis on the academic program. The institution’s
evaluation efforts are effective for addressing its unique circumstances. These efforts use both
quantitative and qualitative methods.
Northwest Commission on Colleges and Universities
Standard 1.A – Mission and Goals The institution’s mission and goals define the institution,
including its educational activities, its student body, and its role within the higher education
community. The evaluation proceeds from the institution’s own definition of its mission and
goals. Such evaluation is to determine the extent to which the mission and goals are achieved
and are consistent with the Commission’s Eligibility Requirements and standards for
accreditation.
This is not to say that the accrediting organizations had fully developed underlying theories and
methodologies to offer their members when the standards were initially introduced. Assessment
guidelines are evolving, with workshops, written materials, rubrics and other tools being offered
by accrediting agencies to assist institutions to meet accreditation expectations. Indeed, the
field of evaluation is a developing field as discussed in the following section.
Evaluation theory. Evaluation theory is new enough that it is safe to say that all or most
theorists are alive today. The field is becoming well established as a profession and as an
academic discipline. The field grew out of massive government funding of social programs in the 1960s, when it was important to know the consequences – both intended and unintended – of government spending on programs such as Head Start, Upward Bound, and other War on Poverty initiatives. That era gave rise to graduate programs in evaluation across the US, which today offer many options for those seeking advanced degrees in evaluation – particularly in educational evaluation.
Theories and evaluation models abound, with a range of underlying concepts. Theories such as “goal-based,” “goal-free,” “practical,” “empowerment,” “appreciative inquiry,” and “program” all have a stake in the literature of evaluation. Some theories are more appropriate than others
depending on what is being evaluated. Among the things to evaluate are: product, cost,
process, social impact, and outcomes.
Our work at Blackboard is unique to each client’s needs, but in general, the approach is grounded in higher education, and a blend of:
• Program level: Understanding the impact of the design and delivery of our programs and services.
• Outcomes based: Examining the program through the expected knowledge, skills and competencies of students who go through the program.
• Action oriented: Generating “actionable knowledge” by developing information that tells us where we stand and what to do to improve our programs.
Our theoretical model gives rise to specific methodologies.
Evaluation methodology. It is important to remember that evaluation research differs substantially from scientific research methodology. While evaluation uses some functional elements of scientific research methodology, the purposes are different. Where the purpose of scientific research methodology is to test a hypothesis, evaluation lives in a very real context of teaching, learning, support and administrative activities. Where the former insists on value-free or value-neutral observations, evaluation research is entirely about our values as educators and our desire to contribute to an educated citizenry and develop each student to his or her full potential.
The different purposes and uses of these two methods often present challenges in academia where scientific methodology has dominated for generations. The idea of “scientific rigor” has led to many assessment practices that are over-engineered, over-sampled, and intrusive. Indeed, outcomes assessment requires us to put on new lenses.
New lenses for education. Because educators are comfortable thinking of their work in terms
of each student’s mastery of disciplinary knowledge or content, organizing courses to transfer
that knowledge from faculty to student, and issuing grades as evidence of mastery, the
assessment process initially places many squarely in a zone of discomfort. The assessment
process calls for collaboration with colleagues to decide on program outcomes, looking at
aggregate rather than individual student performance, and examining these data in terms of
program performance (rather than individual student performance) and then collectively
determining a course of action to increase program performance. Naming and defining program
outcomes in a way that is observable (and therefore measurable) are as yet alien
responsibilities for faculty.
In addition to new processes, the vocabulary of outcomes assessment is new, very specific and
complex. In the course of evolving, attempts to explain the purpose of higher education
assessment gave rise to certain concepts that have not withstood the test of merit, yet have
remained in the lexicon as attractive distracters. The term “value added,” for example, has led
some institutions to pursue growth of learning – a tiresome endeavor that only leads
investigators to the non-starter conclusion that students know more when they graduate than
when they entered the institution. The term “lifelong learning” has led some institutions to
struggle with how to measure a concept over which they will have no control once the student
has graduated.
There are many hazards along the road to building manageable sustainable assessment
processes that lead to actionable knowledge and institutional effectiveness. Our work as
Blackboard consultants is to clear the hazards, recommend the right tools, sharpen visibility,
and organize for sustainability and manageability of assessment processes. What does
effective assessment practice look like?
Effective assessment practice. Several levels of organizational capability need to come
together in order for institutions to realize the power of assessment as an engine of change.
These levels can be described as follows:
1. Institutional Strategy
Leadership – Top leadership visibly supports outcomes based education and views assessment
as a continuous improvement strategy
Goals and outcomes – Goal statements describe desired end states of the program; outcome
statements describe specific knowledge, skills or competencies gained from the program.
Assessment plan - An institution-wide assessment plan has been adopted which guides the
assessment activities across the institution.
Governance model - A representative group meets regularly to monitor the quality of
assessment methods, information gained, and improvements in practices
Visibility - Outcomes results are reported to all stakeholders who use the information to
collectively plan and improve program design and delivery
2. Measurement Methods
Direct measurement - All outcomes are assessed using direct measurement such as tests or
rubrics over a defined period of time
Indirect measurement - Indirect measurements such as surveys are used to supplement and
inform direct measurement of outcomes
Quantitative expression of quality - Quality of outcomes is expressed quantitatively with institution-wide understanding of the level of achievement desired
Equity of outcomes - Outcomes results for at-risk populations are routinely examined and result in strategies to achieve parity among all groups
Manageability and sustainability - Assessment processes are ongoing while producing low
levels of distraction to the teaching and learning process
3. Measurement instruments
Surveys and course evaluations - Surveys and course evaluations are easily deployed through
automated means and reports are automatically generated
Rubrics - Outcomes rubrics are easily deployed through automated means and reports are
automatically generated
Curriculum maps - Curriculum map shells are easily generated through automated processes
and can be accessed across all programs from a central location.
Tests - Tests for specific outcomes are easily deployed and reports are automatically generated
Portfolios - Portfolios are easily created, populated, and assessed with associated rubrics
through automated processes
4. Evaluation criteria
Standards - The institution is easily able to associate accreditation standards with evidence of
program performance at all times with little manual processing
Targets - The institution has established targets (expected levels of performance) for all outcomes
Outcomes defined - Outcomes are defined with a name or label and an expanded definition
describing the key elements of the outcome; there are no conflated outcomes
Rubric criteria for evaluation - Each outcome for which rubrics are used in direct measurement has at least three specific, observable criteria used to evaluate the outcome
Test items are criteria for assessments (embedded assessment) - Embedded test questions
relate to specific criteria for evaluation and scores by test item are used for evaluation purposes
5. Use of Results
Program improvement focus - Assessment results are consistently viewed as indicators of program performance (not student performance) and are used to improve programs
Collective interpretation - Assessment results are interpreted, discussed and decided upon by
all individuals who are involved with the design and delivery of the program.
Used for improvement - Assessment processes focus on gaining an understanding of how to
improve program performance over time; improvement decisions flow directly from the
assessment results
Used for accountability - Assessment results are regularly examined by program members to
evaluate program merit
Results visible - Assessment results are accessible and visible to all interested stakeholders
Meaningful reports - Assessment reports are presented in a form that is meaningful to most
audiences.
6. Faculty and Program Development
Outcomes based academic program design - All educational programs have established a
comprehensive set of student learning outcomes
Faculty-driven change - All full-time program faculty collectively review assessment results and
reach agreement on curricular and pedagogical changes leading to improved program
outcomes
Program level engagement - All full-time faculty in programs are knowledgeable about and
involved with program assessment
Faculty development program - Faculty development workshops on program assessment are
regularly provided
7. Student support and engagement
Outcomes based program or service design - All student support programs and services have
identified a comprehensive set of student outcomes that go beyond student satisfaction
indicators
Staff-driven program or service change - Program staff and leaders collectively review
assessment results and reach agreement on program and service changes that will lead to
improved student outcomes
Staff professional development program - Student affairs and student services staff fully
participate in outcomes assessment workshops and seminars
Actionable knowledge. The power of assessment lies in the development of actionable knowledge – information that tells an institution where it stands and what to do to improve. There is no one assessment process that will lay this information at our doorstep, but there are many pieces of information on campuses that, when brought together into a quality dashboard, can provide it. The following dashboard example would be built from
a number of data sources: course evaluations, campus climate surveys, facilities surveys,
rubrics, final examinations, and other instruments that campuses use every year. This
dashboard uses both direct and indirect measurements to paint a picture of where the campus
stands in its desire to provide high quality instruction, support services, and physical
environments.
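As a rough sketch of how such a dashboard might be assembled, the example below combines hypothetical direct and indirect measures for two outcomes and compares them against targets; the outcome names, scales, and values are assumptions for illustration only.

```python
# Hypothetical quality-dashboard rows assembled from several data sources (0-4 scale).
direct_measures = {          # e.g., rubric scores and final-examination results
    "Written communication": 3.1,
    "Scientific reasoning": 2.2,
}
indirect_measures = {        # e.g., course evaluations and campus climate surveys
    "Written communication": 3.4,
    "Scientific reasoning": 2.9,
}
targets = {"Written communication": 3.0, "Scientific reasoning": 3.0}

print(f"{'Outcome':<24}{'Direct':>8}{'Indirect':>10}{'Target':>8}  Status")
for outcome, target in targets.items():
    direct, indirect = direct_measures[outcome], indirect_measures[outcome]
    status = "meets target" if direct >= target else "below target"
    print(f"{outcome:<24}{direct:>8.1f}{indirect:>10.1f}{target:>8.1f}  {status}")
```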
In interpreting the information in the illustration “Know where you are,” this campus knows that graduates are not leaving with the level of scientific reasoning skills (circled in red) that would be expected from having completed 12 units of science lab courses. An examination of the rubric used to assess scientific reasoning provides the granular information we need to improve the institution’s performance. In this case, faculty can collectively engage to plan for emphasizing and reinforcing skills that will improve students’ ability to hypothesize, analyze, and conclude.
There are many ways for a campus to develop the skills and organizational capacity to use
assessment information in powerful ways. There are a number of useful resources in the field
that can be accessed as well.
Field Resources. Although there are numerous resources available for educational assessment practitioners, a few stand out in terms of immediate use:
• Maintained by the University of Kentucky, a very active listserv of US higher education assessment practitioners provides useful advice and support. Subscribe to this listserv at http://lsv.uky.edu/archives/assess.html.
• With a focus on teaching and learning, developing a quick understanding of the context surrounding outcomes assessment is important for faculty and senior leaders. From Teaching to Learning - A New Paradigm for Undergraduate Education by Robert B. Barr and John Tagg (1994) will be very useful. In this article the authors describe the role of educators under the traditional “instruction paradigm” and under the new “learning paradigm.” See this article at http://ilte.ius.edu/pdf/BarrTagg.pdf.
• To develop a conceptual understanding of the assessment of student learning, T. Dary Erwin’s Assessing Student Learning and Development: A Guide to the Principles, Goals, and Methods of Determining College Outcomes (1991) is an accessible and useful book.
• At a practice level, program assessment methodology is clearly presented in Worthen, Sanders and Fitzpatrick’s Program Evaluation: Alternative Approaches and Practical Guidelines (3rd Edition) (2003).
• From a theoretical perspective, E. Jane Davidson’s Evaluation Methodology Basics (2005) is a compendium of general evaluation practices. Although not aimed at educational evaluation, it is a solid source of theory and practice considerations.
Assessment is a powerful gift. At Blackboard, the definition of institutional effectiveness is: the capacity of an organization to sustain adaptive processes to achieve its mission. Three powerful gifts result from outcomes assessment. First, institutions have a means of collectively engaging faculty in the design and delivery of the curriculum – and of doing this on an ongoing basis. Second, the institution can now provide evidence to substantiate bold claims of the type typically found in mission statements, such as “our graduates have a deep awareness of the impact of their professional work on society.” Third, the most powerful gift is adaptivity – the ability to
change course to meet a desired end state. Faculty engagement, continuous improvement,
institutional identity, adaptation and change – a powerful gift indeed.
Chapter 2: Planning for Assessment
Assessment planning involves governance, structures and strategies. An effective governance model is key to success in assessment. Ideally, an institution’s assessment initiative has 3 core elements of governance: (1) an executive sponsor providing legitimacy, visibility, and resources to the initiative, (2) an assessment team with wide college representation tasked with designing and overseeing the process, and (3) a staff position responsible for organizing, executing and presenting resultant data. This model provides the head, heart and hands of successful assessment processes in an institution. The specific roles and responsibilities of these core elements are to:
1. Executive sponsor
a. Introduce the concept of outcomes and the purpose of evaluating outcomes
b. Appoint and charge a working group to plan and oversee assessment processes
c. Keep the campus informed of key milestones achieved, celebrate successes,
and encourage participation in the process
d. Provide resources when needed to accomplish the work of assessment
2. Assessment team or steering committee
a. Define an institution-wide structure, process, and schedule for assessment
b. Establish broad understanding and agreement of the process
c. Identify key outcomes the institution will want to evaluate across all programs; we
will refer to these as trans-disciplinary outcomes
d. Provide guidelines for programs to define disciplinary outcomes
e. Identify assessment methods and instruments to be used on key outcomes the
institution will want to evaluate across all programs
f. Regularly monitor process and oversee the quality of process and results
g. Provide direction to assessment support staff and make decisions as they arise
from support staff
3. Assessment support position
a. Staff working group meetings, identifying areas of needed attention
b. Assist programs with assessment planning, process design, and implementation
c. Provide expertise on evaluation methods
d. Create instruments such as surveys, rubrics, curriculum maps
e. Administer assessment processes
f. Create reports on assessment results; present reports if necessary
g. Schedule meetings for collective discussion and decision-making
h. Organize and centrally maintain assessment data
Once these structures are in place, there are governance strategies to consider.
Models of governance. Will assessment be centralized (designed and overseen by an institutional assessment team), de-centralized (designed and conducted at the program or discipline level), or some combination? Many institutions expect their faculty to self-organize for assessment. Some have driven assessment down to the course level. Course-level and self-organizing assessment approaches are both recipes for inconsistent results – results that are not comparable from one course to another, unusable metrics (e.g. grades), and results that cannot be generalized to the program level, and certainly not to the institution level. Faculty are not experts in program assessment, and without a structured faculty development effort, such governance decisions will expend a great deal of resources for little return. For the sake of consistency, manageability, and sustainability, four general guidelines are recommended:
1. Centralize as much as possible. Assess all mission-related and trans-disciplinary outcomes centrally, leaving all disciplinary outcomes to programs. Trans-disciplinary outcomes will typically be outcomes that are not “owned” by a specific discipline, and are often those introduced in the general education program with further development in major disciplines. Cross-disciplinary assessment teams can be formed to manage and conduct assessment activities.
2. De-centralize at the highest common level. These assessment activities will typically
reflect disciplinary outcomes related to disciplinary knowledge and application of theory
and practice in a given field of study. Many programs will list trans-disciplinary outcomes
as well, but program level assessment teams can be formed to manage and conduct
assessment focused on disciplinary outcomes.
3. Rotate assessment activities so that over a period of time all outcomes will have been
assessed both at the central and decentralized levels. Phased approaches may also be
used to learn lessons of good practice for all through the experiences of a few. For
example, an institution may start with Phase I being the assessment of general
education outcomes, followed by Phase II being the assessment of disciplinary
outcomes. The lessons learned through Phase I will enable subsequent assessment
processes to run more smoothly.
4. Always sample when using process-heavy methods such as rubric assessment. Sampling allows for the development of valid results while keeping processes manageable and sustainable. The methodology should also include steps to assure that the sample is representative of the entire population; a simple sketch of one approach follows.
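The sketch below shows one simple way to draw such a sample: proportional random sampling stratified by program, so each program contributes artifacts in rough proportion to its size. The sampling fraction, field names, and artifact records are hypothetical assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(artifacts, fraction=0.2, seed=1):
    """Proportional random sample of artifacts, stratified by program, for rubric scoring."""
    by_program = defaultdict(list)
    for artifact in artifacts:
        by_program[artifact["program"]].append(artifact)
    rng = random.Random(seed)
    sample = []
    for items in by_program.values():
        k = max(1, round(len(items) * fraction))  # keep at least one artifact per program
        sample.extend(rng.sample(items, k))
    return sample

# Hypothetical usage: capstone papers tagged with the student's program.
artifacts = ([{"program": "Biology", "paper": f"bio_{i}"} for i in range(50)]
             + [{"program": "History", "paper": f"hist_{i}"} for i in range(20)])
print(len(stratified_sample(artifacts)))  # roughly 20% of each program's artifacts
```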
These guidelines are proposed to maintain high-quality results, minimize effort, and comprehensively address an institution’s expected outcomes. “High-quality results” means that outcomes assessment of a representative sample of student work results in both summative and formative performance metrics; high-quality formative performance metrics mean results at the criterion for evaluation level. “Minimizing effort” means that a knowledgeable and interested group of faculty and staff has conducted the assessment and taken the burden of assessment off the backs of faculty and staff across the institution. “Comprehensive assessment” means that with a planned, purposeful approach, the institution will have an understanding of how it is performing on all outcomes within a given time period, and, with improvements made as a result of initial rounds, will observe performance increases over time.
The governance models described above, working hand in hand, assure that little effort is expended on duplicated, unnecessary, or unproductive assessment activity. Regardless of the model of governance used at an institution, the overall process should ensure that coordination takes place among the institution’s assessment processes.
Institutional assessment, for example, will typically include assessment of the academic,
student support, and administrative functions of the university. Many institutions provide a
framework for organizing, storing and reporting data resulting from all assessment processes.
Typically this framework includes places for outcomes, outcomes assessment results, and
actions to be taken to improve outcomes. A sample framework with sample instructions for
submitting departments is provided for illustration.
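One minimal way to picture such a framework is as a set of records with a common shape, as in the hypothetical sketch below; the field names and the example entry are illustrative assumptions rather than the format of any particular campus or product.

```python
# A hypothetical entry in a central reporting framework; field names are illustrative only.
framework_entry = {
    "unit": "Department of Biology",                 # academic, co-curricular, or administrative
    "outcome": "Scientific reasoning",               # the expected outcome
    "measure": "Capstone project rubric (sampled)",  # how the outcome was assessed
    "result": "62% of sampled students met the target level",
    "action": "Add hypothesis-design exercises to the two core laboratory courses",
    "cycle": "2024-25",
}
for key, value in framework_entry.items():
    print(f"{key:>8}: {value}")
```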
Approaches such as this provide a central common framework for Assessment Directors, Assessment Steering Committees, and leaders to access and monitor assessment activity across the organization. General education, academic departments, student support programs and administrative assessment activities may take place concurrently and somewhat independently, but can easily be reported and tracked within such a framework. (Indeed, the content should be reviewed at a central level with an eye to reducing duplication of efforts and providing feedback to programs with ways to maximize useful information.)
[Figure: Institutional Quality Framework spanning academic, co-curricular, and administrative assessment.]
Institutional messaging is a key use of the framework approach. Clear language sets a foundation for expectations. Some institutions have used framework headers such as “What are you going to do?” This invites respondents to list process statements rather than outcome statements, which in turn invites people to report results of processes. The following graphic illustrates the difference between frameworks that set expectations for change and those that will result in the status quo.
The quality of information obtained depends on clear directions and language that set
expectations for quality of responses.
Is this a lot of work? It doesn’t have to be as consuming as we see on many campuses.
Indeed, the term “assessment” comes from the Latin verb assidere, which means “to sit beside.”
The process of assessment should sit beside teaching and learning, and serve to inform the
improvement of teaching and learning. In addition to approaches we have discussed above
(centralization/decentralization, rotation and phasing, and sampling), we now turn to
assessment at the program level and approaches that can be taken to keep assessment both
manageable and sustainable.
Sustainability
The key to developing sustainable and manageable processes is careful thought and purposeful planning before launching a campus-wide assessment process. Five recommendations are made for the sake of sustainability and manageability; more detailed discussion follows.
1. Design assessment processes toward the conclusion of the students’ studies in a program or at the institution.
2. Develop granularity by developing data at the criteria for evaluation level.
3. Reuse graded work; avoid the creation of work (for anyone) that is solely for use in program assessment.
4. Multipurpose artifacts: use student artifacts that can be used by applying multiple rubrics.
5. Repurpose tests; analyze existing tests that capture program outcomes at the criteria for evaluation level.
These recommendations and the context that gives rise to them are discussed briefly below.
End of program focus: As mentioned earlier, assessment processes should capture the
knowledge, skills, and competencies as students are leaving the program in order to answer the
question: do graduates have the knowledge, skills and competencies set out in our goals?
While it is interesting to know whether students progress from one year to the next, assessing progress is a lot of work only to learn that students have more knowledge and skills leaving than when they came in the door. So, recommendation 1 is: Design assessment processes toward
the conclusion of the students’ studies in a program or at the institution.
Granularity: The most useful assessment information lies in the granularity of the results –
meaning student performance results that, when aggregated across the sample, provide data at
the criterion for evaluation level. It is the granularity that provides the adaptive muscle for
program improvement. What does this mean? If, for example, the outcome “written communication” is defined as having criteria for evaluation that include key idea, presentation of evidence, transition statements, strong vocabulary usage, and error-free grammar and mechanics, then the real insight into improving program performance is at the criteria for evaluation level. This suggests that recommendation 2 is: Granularity is necessary for visibility; develop granularity by developing data at the criteria for evaluation level.
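A brief sketch of what criterion-level granularity looks like follows; the rubric scale, the sample of papers, and the scores are hypothetical, and the criteria echo the written communication example above.

```python
from statistics import mean

# Hypothetical rubric scores (0-4) for a sample of student papers, recorded per criterion.
scored_papers = [
    {"key idea": 3, "evidence": 2, "transitions": 2, "vocabulary": 3, "grammar/mechanics": 4},
    {"key idea": 4, "evidence": 3, "transitions": 2, "vocabulary": 3, "grammar/mechanics": 3},
    {"key idea": 3, "evidence": 2, "transitions": 1, "vocabulary": 4, "grammar/mechanics": 4},
]

# Aggregate across the sample at the criteria for evaluation level, not just the outcome level.
for criterion in scored_papers[0]:
    print(f"{criterion:<20}{mean(paper[criterion] for paper in scored_papers):.2f}")

# The outcome-level average is still available, but the low criterion averages (here,
# "transitions") are what point to specific curricular or pedagogical changes.
print(f"{'written communication':<22}"
      f"{mean(mean(paper.values()) for paper in scored_papers):.2f}")
```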
Reuse of graded work and multipurpose artifacts: There are two key aspects of
sustainability regarding rubric evaluation – reusing work submitted for grades and choosing
artifacts that can be assessed for multiple outcomes. Senior theses and capstone projects are
an excellent way to repurpose student work that has already been submitted for individual
grading. Additionally, these particular artifacts can be used to assess several competencies at once. Thinking of the senior project/thesis processes at many institutions, one can imagine using the thesis and the presentation to capture: written communication, oral communication, information literacy, problem solving, critical thinking, and perhaps others. Recommendation 3 is about avoiding the creation of work (for anyone) that is solely for use in program assessment. Recommendation 4 is to use student artifacts that can be used to assess more than one outcome.
[Figure: levels of learning, e.g., comprehend, apply, analyze, synthesize.]
Repurposing tests: Within the realm of disciplinary outcomes, a simple and elegant way to assess these is to repurpose final exam scores by test question for program assessment. Many programs will place embedded questions that can be used for program evaluation into course finals. This is not necessary if there is a program exit examination that all students take or if there is a required course that program faculty agree will capture the key disciplinary outcomes. The underlying assumption in this approach is that each test question can be associated with a criterion for evaluation that will roll up to disciplinary outcomes: problem solving, analytic thinking, knowledge of disciplinary theory, application of knowledge, etc. Aggregating student responses by test item and expressing this as a percentage of the total possible for that item gives the program a good idea of what students are learning and the levels of higher order thinking they are achieving. Thus, recommendation 5 is to analyze existing tests that capture program outcomes. Arguably, all test items reflect a criterion for evaluating a program outcome, making final tests a rich source of information about program performance.
[Chart: Course X final exam, average score by test item as a percentage of total possible, for items such as Class/Instance, Aggregation or Composition, Draw Use Case, Modify Diagram, Elaborate: Order Code, and Modify Class Diagram.]
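The arithmetic behind this approach is straightforward, as the sketch below shows; the exam items, point values, student scores, and the mapping of items to criteria are all hypothetical.

```python
from collections import defaultdict

# Hypothetical final-exam data: each item's point value, the criterion it evaluates,
# and the points each student earned on it.
item_points = {"Q1": 10, "Q2": 10, "Q3": 20, "Q4": 10}
item_criterion = {"Q1": "knowledge of disciplinary theory",
                  "Q2": "knowledge of disciplinary theory",
                  "Q3": "application of knowledge",
                  "Q4": "problem solving"}
student_scores = [
    {"Q1": 9, "Q2": 7, "Q3": 12, "Q4": 8},
    {"Q1": 8, "Q2": 6, "Q3": 15, "Q4": 5},
    {"Q1": 10, "Q2": 9, "Q3": 10, "Q4": 7},
]

# Average score per item as a percentage of the total possible for that item,
# then rolled up to the disciplinary outcome each item evaluates.
rollup = defaultdict(list)
for item, possible in item_points.items():
    pct = 100 * sum(s[item] for s in student_scores) / (possible * len(student_scores))
    rollup[item_criterion[item]].append(pct)
    print(f"{item} ({item_criterion[item]}): {pct:.0f}% of possible")
for outcome, pcts in rollup.items():
    print(f"{outcome}: {sum(pcts) / len(pcts):.0f}%")
```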
To summarize the principle of sustainability: the key is to generate as little new work as possible, using existing artifacts that provide sufficient granularity to create actionable knowledge – knowledge that leads to action, action that increases the quality of teaching and learning, support programs and services, and administrative work. The other principle that we
emphasize in planning to do assessment is the idea of actively managing the process. Active
management requires ownership, roles and responsibilities.
Ownership
At the institution, program, or in fact any level, the assessment process needs people with
sufficient authority and responsibility to see that assessment activities lead to actions that will
improve performance. This implies the involvement of senior leadership as well as leadership at
the assessment activity level. Senior leaders who remain disengaged from the assessment
process are setting their institutions up for mediocrity, if not failure, of the assessment process, the result of which is institutional status quo. At these institutions, much rich and useful
information is developed, with no usage beyond collecting charts, tables, and reports and
organizing them for presentation to accreditors. One way to examine the roles and
responsibilities needed for assessment is the template used in Blackboard Consulting projects.
Roles and Responsibilities
Executive Sponsor
• Champions the initiative
• Provides high level oversight, direction, and support
• Approves major scope changes including additional funding
• Regularly updates senior leaders and ensures initiative progress
Initiative Lead
• Provides team leadership
• Manages initiative milestones, schedule, budget, and human resources
• Provides updates to Executive Sponsor and key stakeholders
• Recommends scope changes as appropriate
• Brings initiative team together as necessary
• Identifies key users of Blackboard Outcomes System
Initiative Team Members
• Actively supports the initiative
• Participates in all team meetings
• Owns processes as assigned
• Contributes to process decisions
Even with clear roles, responsibilities and ownership, the collection of data can still result in less
than useful information without a clear evaluation question. The following section will provide an
overview of this critical element.
Developing the evaluation question
Evaluation questions serve the same purpose as research questions but differ from each other
in important ways. They both frame the evidence gathering activities around things that matter.
While the research question captures the heart of a hypothesis and sets a framework for
research methodology, the evaluation question captures the heart of a program and sets a
framework for the evaluation process.
Ideally, a new program starts by first identifying the desired knowledge, skills, and competencies
needed to fulfill the expectations of the desired end state (goal). The next step is to then build a
series of activities to produce the desired results, followed by the assessment process which
leads to ongoing program improvement. It is safe to say this rarely, if ever, happens.
Assessment is usually a process that is tacked onto an existing practice – sometimes without
defined outcomes in place. In these instances, we often find surveys that are a collection of
“satisfaction” items or survey items that reflect the curiosity or interests of the persons designing
the survey. When working with clients we often hear comments like, “it would be interesting to
know how students . . . .” It is important to remember that keeping survey questions focused on the evaluation question is essential for useful results, as they help inform us about the desired end state.
Evaluation questions should be crafted to capture both summative (judgmental) and formative
(diagnostic) information. In educational program evaluation the question would be some variant
of: To what extent were the goals of the program achieved and what opportunities are there to
improve?
Course or instructor evaluations are a great example of a common and widely practiced form of
assessment that typically occurs without an underlying evaluation question to guide its use. As
a result, faculty get summaries (and in some instances copies of student responses) on a
course-by-course basis, which encourages them to view the information on a course-by-course
basis. These metrics are read with interest and likely interpreted as “the perceptions of students
in a course”. Because faculty receive these reports for each administration there is little
incentive to do anything with the information because of the sample size. This represents a lot
of activity with little payoff. Of particular concern are course evaluations that ask “overall, how
do you rate this course?” This single metric is then often used by performance review
committees and deans to decide on retention, promotion, or pay increases. This represents a
questionable use of a single metric. Neither of these practices is particularly effective, useful, or reliable because there is no underlying evaluation question. Additionally, with current technology, there is no reason to rely on a single question about overall instructor rating, as overall performance scores can easily be calculated by averaging responses of other course evaluation questions.
There is a way to make effective use of these data if we think of course evaluations within the context of a faculty development program model. Just as the instructional program leads to knowledge, skills and competencies of students, the institution’s faculty development program leads to knowledge, skills, and competencies of faculty.
The evaluation question would be: to what extent is the faculty development program resulting
in the expected outcomes, and where are opportunities to improve individual and overall
performance?
In such a model, faculty development goals and outcomes are clearly articulated. Course evaluations, consolidated over a year or two, provide metrics on each of the expected outcomes for individual faculty and for the faculty as a whole. Individual faculty can now rely on the data because they are consolidated across all courses. Faculty and their department heads can now have a conversation on ways to improve individual performance. Provosts can now see patterns across all faculty indicating professional development programming opportunities and schedule workshops, retreats, and other opportunities to improve faculty teaching skills.
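A small sketch of this kind of consolidation follows; the evaluation items, the mapping of items to faculty development outcomes, the rating scale, and the responses are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical course-evaluation responses (1-5) gathered across courses and terms.
responses = [
    {"faculty": "A", "term": "Fall", "item": "clear explanations", "rating": 4},
    {"faculty": "A", "term": "Spring", "item": "clear explanations", "rating": 5},
    {"faculty": "A", "term": "Fall", "item": "timely feedback", "rating": 3},
    {"faculty": "B", "term": "Fall", "item": "clear explanations", "rating": 4},
    {"faculty": "B", "term": "Spring", "item": "timely feedback", "rating": 2},
]
# Hypothetical mapping of evaluation items to faculty development outcomes.
item_to_outcome = {"clear explanations": "instructional clarity",
                   "timely feedback": "feedback practice"}

# Consolidate across all courses and terms: per-faculty and faculty-wide metrics per outcome.
per_faculty = defaultdict(lambda: defaultdict(list))
overall = defaultdict(list)
for r in responses:
    outcome = item_to_outcome[r["item"]]
    per_faculty[r["faculty"]][outcome].append(r["rating"])
    overall[outcome].append(r["rating"])

for faculty, outcomes in sorted(per_faculty.items()):
    for outcome, ratings in outcomes.items():
        print(f"Faculty {faculty} – {outcome}: {mean(ratings):.1f}")
for outcome, ratings in overall.items():
    print(f"All faculty – {outcome}: {mean(ratings):.1f}")
```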
Understanding Diversity
For campuses whose mission and goals address the expectation of a welcoming and inclusive
learning environment, assessment is a useful means of understanding where we are and what
to do to improve. The question of diversity is related to equity of outcomes and requires no
more effort than displaying results disaggregated by demographic groups in order to answer
these questions. These data provide an understanding of gaps in knowledge, skills, and
competencies between demographic groups and serve as a basis for curricular and pedagogical
strategies that will improve the learning outcomes of target groups. Generally, such changes
also improve the learning outcomes for all students. This type of approach may also serve to
change the conversation from the “underprepared student” to the “underperforming curriculum.”
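As a minimal sketch of what this looks like in practice, the example below displays a set of outcome scores disaggregated by demographic group; the group labels, scale, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical outcome scores (0-4) for sampled students, each tagged with a demographic group.
records = [
    {"group": "Group 1", "score": 3.2}, {"group": "Group 1", "score": 2.9},
    {"group": "Group 2", "score": 2.1}, {"group": "Group 2", "score": 2.4},
    {"group": "Group 3", "score": 3.0}, {"group": "Group 3", "score": 3.3},
]

# Display the same results disaggregated by group to surface gaps in achievement.
by_group = defaultdict(list)
for record in records:
    by_group[record["group"]].append(record["score"])

overall = mean(r["score"] for r in records)
print(f"All students: {overall:.2f}")
for group, scores in sorted(by_group.items()):
    print(f"{group}: {mean(scores):.2f} ({mean(scores) - overall:+.2f} vs. all students)")
```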
The Nature of Outcomes. We discussed earlier that an outcome is what students know, think
or are able to do as a result of their experience in an institution, program, or other purposefully
designed collection of activities. Outcomes, then, can reasonably be expected to reflect the
organization of a college or university.
Institutional outcomes are shaped by a combination of the mission statement and the
knowledge, skills, and competencies we expect for all graduates – typically these are results of
the general education program. An example mission statement that reads in part “achieve success in their chosen careers and promote justice and peace in a constantly changing global society” suggests that the institutional outcomes would include understanding of peace and justice, and global awareness, in addition to the common outcomes defined by general
education and the co-curriculum.
Co-curricular outcomes are shaped by the programs beyond the classroom. At one time the theory of student affairs offices was founded on the principle of in loco parentis – in the place of a parent – which dissolved in the 1960s without the emergence of a clear model for the design and delivery of student affairs programming. The organization of student affairs,
however, provides a mechanism for identifying co-curricular outcomes. If the institution
supports students through offices and activities such as: New Student Orientation, Career &
Placement, Personal Counseling, Academic Counseling, Student Activities, Student Clubs &
Organizations, then one would expect to see outcomes such as: help-seeking skills, career
planning skills, realistic self-appraisal, academic goal setting, leadership, interpersonal
relationship skills, membership skills, physical health and wellbeing, spiritual awareness, etc.
Curricular outcomes result from the design and delivery of the curriculum. As in the co-curriculum, we can analyze the college catalog and identify most of these outcomes by what we
see in the design and delivery mechanisms of the curriculum.
Curricular outcomes – disciplinary. Disciplinary outcomes can be found in the required
courses for majors. Typically we will see a core set of courses that guide students to develop:
knowledge of disciplinary theory and practice, application of theory and practice, knowledge of
disciplinary history and philosophy, etc. Disciplinary outcomes are also found in the collection of
general education courses representing the breadth of knowledge that institutions want their
students to develop. Knowledge of history, literature, mathematics, fine arts, sciences, social
sciences – institutions want all students to have a grounding in the theories and principles of
these disciplines. In addition, however, they are intended to expose students to knowledge,
skills and competencies beyond disciplinary theory. These are the trans-disciplinary outcomes
– often referred to as transferable skills. Some institutions refer to these as “soft skills.” Yet they are as critical to success as disciplinary knowledge.
Curricular outcomes – trans-disciplinary. Trans-disciplinary outcomes are aligned less with disciplinary content than with pedagogical practice. They are, however, purposeful and intentional, included by design to produce knowledge, skills, and competencies that students need. The use of intensive writing, technology,
presentations, teamwork, research, and other pedagogies arguably result in outcomes such as:
written communication, oral communication, technological literacy, teamwork, information
literacy.
Other outcomes. Arguably, there are no departments, offices, or programs in a college or university that do not have some outcomes they expect in terms of what students know, think or are able to do. The student billing office plays a role in developing students’ knowledge of managing personal finances, the library plays a role in developing students’ ability to access information, campus security plays a role in understanding and practicing personal safety, and so on. “Services” such as food service have a role in developing an understanding of nutrition and health.
Through defining outcomes, institutions not only have a means of expressing the unique character of their institutions and identifying the value to students; they have also taken the first step in measuring the quality of these outcomes. In addition to clear outcomes, the clarity of mission and goal statements is also critical to the alignment among all of these elements so that the institution is able to consistently achieve its mission.
Effective mission and goal statements. Chapter 1 described the importance of writing
effective outcome statements. We see how the outcome name captures the heart of the
outcome, and the extended definition contains the criteria for evaluating the outcome. The key to effective outcome statements is that the criteria for evaluation are observable; and because they are observable, they are therefore measurable. Mission and goal statements are not
directly measurable but serve as a critical foundation upon which the outcomes rest. Missions
describe the overall purpose of the organization and goal statements describe the desired end
state as guided by the mission.
Mission statements often describe something other than the purpose of the organization as
seen in three real examples whose names have been redacted:
1. The mission of Community College X is to address the higher education needs of its community. Through its diverse programs and services, CCX assists students in meeting their educational goals. We fulfill this mission as an institution of higher education by preparing students:
• To participate responsibly in a culturally diverse, technological and global society.
• For successful transfer to colleges and universities.
• For employment and advancement within their chosen careers.
The mission statement of Community College X is clear and comprehensive in stating its
purpose (meet educational needs of community); how it meets those needs (through diverse
programs . . .); and the end result (prepare students to . . .)
2. State University Y is a diverse, student-centered, globally-engaged public university
committed to providing highly-valued undergraduate and graduate educational
opportunities through superior teaching, research, creative activity and service for the
people of California and the world.
This mission statement tells us who comes to the university; what the institution does (engage
globally, provide highly valued. . .); how they do it (through superior teaching); and who they
serve (people of California and the world). It does not, however, speak to preparing students to
participate in society and the workplace.
3. Ivy League University Z is one of the world's most important centers of research and at
the same time a distinctive and distinguished learning environment for undergraduates
and graduate students in many scholarly and professional fields. The University
recognizes the importance of its location in City N and seeks to link its research and
teaching to the vast resources of a great metropolis. It seeks to attract a diverse and
international faculty and student body, to support research and teaching on global
issues, and to create academic relationships with many countries and regions. It expects
all areas of the university to advance knowledge and learning at the highest level and to
convey the products of its efforts to the world.
From Ivy League University Z’s mission statement, we can tell why the university is important
(research, distinguished learning environment); where it is located and why that’s a good thing;
who it attracts; the relationships it thinks are important; that it expects all areas to advance
knowledge and learning; and finally that it shares this knowledge with the rest of the world.
Again, it does not speak to preparing students to participate in society and the workplace.
These illustrations attempt to convey that the essential purpose of all educational institutions is
often overlooked for the sake of describing where they are located, who their students are, how
they prepare students, and how important they are on the world stage of education. If preparing
students for meaningful participation in society is not in the mission statement, the institution
could lose its focus on teaching and learning. We also see lost opportunities in institutional goal
statements.
Goal statements, in higher education, rather than describing the desired end state, often
describe the process of achieving the desired end state, as in the statements listed under “The
Language of Process.” We can easily see what this university will do to achieve a desired end
state, but the desired end state is not as clear.
The Language of Process
X University’s mission is characterized by its pursuit of the following institutional goals:
• To foster a safe, civil, and healthy University community
• To provide access to academic programs at reasonable cost and in multiple settings
• To strengthen interdisciplinary collaboration and international programs
• To increase diversity within the student body, faculty, and staff through institutional practices and
programs
• To recognize excellence in the teaching, research, learning, creative work, scholarship, and service
contributions of students, faculty, and staff
• To conduct ongoing assessment activities and engage in continuous improvement initiatives within
the University
• To establish lifelong relationships between alumni and the University
• To advance responsible environmental stewardship
• To support community and regional partnerships that elevate civic, cultural, social, and economic
life
If we recast these goals into the Language of Goals, we can clearly see the desired end state.
The Language of Goals
X University’s mission is characterized by its pursuit of the following institutional goals:
1. Educational Excellence: Students throughout the region and beyond are prepared to enter society through affordable, interdisciplinary and international programs.
2. Student Access and Success: Students from diverse backgrounds participate together in the life of University X supported by inclusive faculty, staff, practices, and programs.
3. Strength of Community: University X students, alumni, and other stakeholders are deeply engaged with the university through scholarship, service, and civic, cultural, social, and economic life, in a university culture that practices safe, healthy, civil, and environmentally responsible community life.
The point of the above illustration is to highlight the difference between “goals” that give us a list of things to “do” and goals that give us a target to collectively work towards. It is the difference between administration and leadership, and between the tactical and the visionary.
There are organizational implications for attending to this concept. Both the nature of how
people view their work and the ongoing improvement of that work are at stake. Specifically,
when operationalized, the list approach can lead to standoffs between one process and another
because of the focus on process; whereas the target approach will encourage reconciliation of
processes to achieve the desired end state. From an assessment perspective, the list approach
will result in checklists of things people did to operationalize the goal. Did we do it? Yes, and
here is a list of the ways we did it – a recipe for maintaining the status quo. By using target language, assessment is forced to look at the extent to which the goal was achieved and to see areas for future improvement.
While clear goals are a fundamental piece of good assessment, they are only one piece of
the structure and process for institutional effectiveness. There are many other elements that
come together when we examine how well those goals have served the institution.
Planning for institutional and program review
Concepts that apply to accreditation and program review processes are framed in similar ways
because of the common practice of aligning institutional goals to program goals. The
key elements of making these review processes productive and useful lie in a methodology
such as:
1. Creating clear institutional and programmatic goals that depict the desired end state
2. Identifying the key outcomes that the institution or program expects to see if the goal is
achieved
3. Gathering, interpreting and reporting metrics at a sufficiently granular level to provide
information on the extent to which the goals were achieved and the opportunities to
improve for the next review period.
4. Stating what actions or plans will be adopted to achieve the desired improvement
Methods used for institutional and program review processes are the real challenge for those
responsible for organizing them. Institutional effectiveness and assessment directors know that
serving up metrics is useless without a range of other structures, people, and processes in
place, including:
1. The visible support of executive sponsorship
2. The structured engagement of key stakeholders – particularly faculty regarding the
assessment of academic outcomes and student support personnel regarding the
assessment of co-curricular and support outcomes.
3. A critical cohort of community members skilled in developing actionable knowledge
4. Human and technological resources to support manageable and sustainable
assessment activity including experts in
a. Rubric design
b. Survey design
c. Data analysis
d. Report design
e. Sampling methodology
We will discuss these methods in more detail in the following section “Conducting Assessment”.
Planning for Assessment is a big part of the success of assessment initiatives that lead to
improvement. And planning is a lot of work, but the risk of not planning is huge. Without a
planned approach to assessment, institutions run the risk of expending huge amounts of energy
and resources for little payoff. With a planned approach, however, institutions are positioned to
reap gains in terms of validating the design and delivery of all of their programs, adapting to new
internal and external expectations, and systematically improving performance of the institution.
Chapter 3: Conducting Assessment
Defining has many dimensions within the context of outcomes assessment. In earlier
discussions we have explored the importance of
defining an institutional governance structure, of
defining processes, and of defining outcomes with
clear criteria for evaluation. In this section, we will
assume that all of those structures are in place and
explore decisions of methodology and assessment
practice, which leads us into tools and instruments,
roles and responsibilities, and a host of other details
that belie the overly simplified representation of
“define, decide, assess.”
Deciding on assessment methods at first blush seems to be a relatively straightforward
proposition. There are two major classifications in methodology: direct and indirect. Within
these classifications there are two basic instruments or tools: direct measurement typically
involves tests or rubrics while indirect measurement typically involves surveys or counts. It is at
the point of deciding which test,
rubric, survey or count to use that
the decision on assessment
methods becomes challenging. The
chart here provides a diagrammatic
representation of the categories
where decisions for conducting
assessment take place.
Tests are a useful source of
program assessment data. They
are also attractive because students
are not required to do anything
beyond what is required for the
course, and they are motivated to
perform at their best. This is an
“embedded assessment”
opportunity at its best. The real
argument for using tests is that
every question in a test provides evidence of both disciplinary and trans-disciplinary outcomes. By this, we
mean that each test question has the following characteristics:
1. Test questions are always aligned to an outcome and always represent a criterion for
evaluating an outcome whether recognized, articulated or not.
2. A test question in the context of a program always contains an element of disciplinary
knowledge and a level of difficulty which can always be aligned to higher order thinking
(understanding, applying, analyzing, synthesizing, evaluating, etc.).
3. Because they represent criteria for evaluating an outcome, test questions can often be
aligned with trans-disciplinary outcomes such as analytic thinking, problem solving,
mathematical reasoning, and so on.
This method requires a few more steps on the part of faculty: to articulate the criteria for evaluation related to a test question and to calculate the test results by question rather than by student. Accordingly, in this illustration, we can see the primary function of course-level metrics: issuing a grade to the student.
Let’s assume that the General Education Committee has identified this test as one that is suitable for use in assessing how well the General Education program has performed on Quantitative Reasoning. These very same data now become an instrument of program assessment, except that rather than calculating by row (student performance), the calculation is done on the columns (program performance). What these data tell us is the extent to which the program has delivered on key disciplinary criteria for evaluation. Of course, the questions need to be linked to criteria for evaluation of an outcome to tell us how the outcome is performing. Using the Association of American Colleges and Universities’ definition of Quantitative Literacy (http://www.aacu.org/value/rubrics/pdf/QuantitativeLiteracy.pdf), the faculty would link the criteria for evaluating Quantitative Literacy to the questions, for example:
Interpret: average of Q1 and Q2
Represent: average of Q3 and Q4
Calculation: Q5
Application/analysis: Q6
Communication: Q7

Math 204 Spring '10 Final
Interpret: 74%
Represent: 80%
Calculation: 83%
Analysis: 81%
Communication: 61%
Overall: 76%
With these data, the General Education Committee has data sets that contribute to understanding how well their program is developing Quantitative Reasoning for all graduates. They can ask themselves (1) is the overall performance what we want, and (2) where are the opportunities to improve the program’s performance? There is always a way to improve performance. Note that the sample size would need to be sufficient before making decisions based on such data.
The important thing to note is the shift from the student as the unit of measure to the program as
the unit of measure. In assessment we are looking at what students know, think or are able to
do as an indicator of the impact of the program.
To summarize the use of tests in direct assessment, the following steps are necessary (a brief computational sketch follows the list):
1. Aggregate average student performance by test question
2. Align test questions to criteria for evaluating the outcome
3. Report aggregate average student performance by criterion to identify areas of
improvement (formative)
4. Report average of all criteria for overall performance on the outcome to identify
achievement of the outcome (summative)
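As a rough illustration of these four steps, the aggregation might be sketched in a few lines of Python. The per-question scores below are hypothetical; the question-to-criterion mapping follows the Quantitative Literacy example above.

# A minimal sketch (hypothetical data) of steps 1-4: aggregate question
# scores, align questions to criteria, and report formative (per-criterion)
# and summative (overall) results.
from statistics import mean

# Step 1: average student performance per question, as a fraction of the
# total possible score for that question (hypothetical values).
question_scores = {"Q1": 0.72, "Q2": 0.76, "Q3": 0.81, "Q4": 0.79,
                   "Q5": 0.83, "Q6": 0.81, "Q7": 0.61}

# Step 2: align questions to criteria for evaluating the outcome.
criteria_map = {
    "Interpret": ["Q1", "Q2"],
    "Represent": ["Q3", "Q4"],
    "Calculation": ["Q5"],
    "Application/analysis": ["Q6"],
    "Communication": ["Q7"],
}

# Step 3: formative report - aggregate performance by criterion.
by_criterion = {c: mean(question_scores[q] for q in qs)
                for c, qs in criteria_map.items()}

# Step 4: summative report - average of all criteria for the outcome.
overall = mean(by_criterion.values())

for criterion, score in by_criterion.items():
    print(f"{criterion}: {score:.0%}")
print(f"Overall: {overall:.0%}")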
Many commercial tests are available to assess a range of outcomes. Institutions often find this
attractive because of the low investment in time and effort. In deciding to use commercial
products to measure outcomes, there are some compelling reasons to think twice before
making this decision:
1. Are the costs of commercial tests justified by the information we will get? In many
cases, results are reported in terms of your institution’s performance against other
institutions. Thus, the research question implied in this situation is: How do our
students perform in comparison to other colleges or universities? We are less
positioned to ask the more appropriate evaluation question: How does our program
perform in comparison to like colleges or universities?
2. Are commercial test results reported by criterion for evaluation? Without granular
information on criteria for evaluation, you will have no ability to develop improvement
plans. All we will know is that we performed better or worse than other institutions,
which could reflect student selectivity as much as program performance.
3. Will students put any effort into a test that does not count for grades?
4. What is the general feeling about requiring students to participate in an exercise that is
of little or no educational benefit to them?
Until commercial tests provide clear criteria for evaluation and test results are reported by
criterion for evaluation, they will only provide part of the information needed.
Rubrics are the other (and arguably the most impactful) instrument of direct assessment. A
rubric is simply a matrix with performance level descriptors mapped against a set of criteria for
evaluating an outcome. A well designed rubric is a powerful mechanism for giving students a
learning target, and for giving faculty a teaching target. Faculty who collaboratively create a rubric often find it important, energizing, and useful academic work. Whether or not a rubric evaluation process is used to evaluate outcomes, it is worthwhile to construct rubrics for every institution-level outcome and communicate them to students and faculty. This practice will begin to provide a common understanding of teaching and learning expectations. Institutions are strongly encouraged, however, to develop a critical core of faculty and staff who are skilled in creating rubrics.
Rubric design is critical to meaningful evaluation. A poorly designed rubric is a roadblock to
both students and faculty and certainly to improving program performance. Examples of poorly
designed rubrics we have encountered are:
1. Rubrics that consist of one criterion for evaluating an outcome. This rubric will not
identify opportunities for program improvement because there is insufficient granularity
to identify what needs to improve.
2. Rubrics that have no performance level descriptors. This rubric will result in unreliable results because evaluators can each use their own definitions of quality.
3. Rubrics that use judgmental terms as performance descriptors. Terms like excellent,
good, fair, poor, needs improvement can distract conversations by making student
performance the focus rather than program performance.
4. Holistic rubrics that lump all criteria for evaluation into a single performance scale. This rubric
behaves like rubrics that have one criterion for evaluating an outcome. They will result
in understanding how well the outcome performs, but will not identify areas of
improvement.
The characteristics of poorly designed rubrics suggest the elements of a well-designed rubric.
Creating rubrics is an art, and campuses that have this expertise are fortunate. An example of a well designed rubric is taken from the VALUE (Valid Assessment of Learning in Undergraduate Education) project of the Association of American Colleges and Universities. Earlier we referred to the criteria for evaluating quantitative literacy. The VALUE rubric for Quantitative Literacy (http://www.aacu.org/value/rubrics/pdf/QuantitativeLiteracy.pdf), modified for this discussion, is shown here. Note the non-judgmental performance levels, the full description of each criterion and performance level, the presence of several key criteria for evaluating the outcome, and in particular, how the criteria for evaluation provide a framework for the use of tests in program evaluation as we described above.
Deciding on assessment instruments for direct measurement, given what we have covered in this section, suggests that the decision-making process on how to assess an outcome might be as follows (a simple decision sketch appears after the list):
1. Consider existing tests for disciplinary outcomes:
a. If there is an existing final test (or tests); and
b. If those tests are taken by a representative sample of students; and
c. If the tests are taken near the completion of the program; and
d. If scores by question are available; and
e. If a collection of final tests in a program makes a complete picture of disciplinary outcomes (knowledge of theory, practice, history/philosophy, application, analysis and evaluation of theory, for example); then
f. Use test question scores – reported as a percentage of the total possible item score – by criterion for evaluation.
2. If no test or series of tests is available, use rubrics applied to the completed work of students nearing the completion of their program.
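One way to picture this decision path is as a simple rule. The sketch below is a minimal illustration with hypothetical type and function names, under the assumption that all of the conditions in item 1 must hold before embedded test data are preferred.

# A minimal sketch (hypothetical names) of the decision logic above:
# prefer embedded test data when the listed conditions hold; otherwise
# fall back to rubric evaluation of work near program completion.
from dataclasses import dataclass

@dataclass
class TestEvidence:
    exists: bool                        # a final test (or tests) exists
    representative_sample: bool         # taken by a representative sample
    near_program_completion: bool       # administered near completion
    scores_by_question: bool            # item-level scores are available
    covers_disciplinary_outcomes: bool  # the tests cover the outcomes as a set

def choose_direct_instrument(evidence: TestEvidence) -> str:
    if (evidence.exists
            and evidence.representative_sample
            and evidence.near_program_completion
            and evidence.scores_by_question
            and evidence.covers_disciplinary_outcomes):
        return "Use test question scores, reported by criterion for evaluation"
    return "Apply rubrics to completed work of students nearing program completion"

# Example: item-level scores are not available, so rubrics are recommended.
print(choose_direct_instrument(TestEvidence(True, True, True, False, True)))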
Deciding on who, what, and when to assess takes us once again into the territory of
manageability and sustainability. Our discussions of sampling, artifacts, and frequency are
written through the lens of manageability and sustainability.
Sampling is critical in rubric assessments where large populations are involved. Many
institutions try to capture all students in a given assessment activity, which is not necessary.
Rubric assessment of all 2,000 students in a graduating class is neither manageable nor
sustainable. Rubric assessment of a sample population of 95, however, with a team of 5
evaluators each scoring 38 papers ((95 papers x 2 reads per paper)/5 evaluators) is sustainable
and manageable, particularly when the papers are being read for specific reasons (written
communication, information literacy, etc.); evaluators are not reading for content.
Sampling in outcomes assessment borrows from the guidelines of social scientific research
methodology. Typically this is a consideration in rubric evaluation. A quick internet search for
“statistical sampling size” will identify several usable tables to incorporate into an assessment
plan. Keep in mind that assessment is not a formal research project ending in published results. Instead the aim is to produce actionable knowledge that assists in the continuous improvement of the program. As such, a margin of error of roughly 10% is sufficient for the purposes of assessment.
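For those who prefer a formula to a lookup table, a standard finite-population sample-size calculation is sketched below. The 95% confidence level (z = 1.96) and p = 0.5 are conventional assumptions for this kind of calculation, not requirements stated in this guide.

# A minimal sketch of a finite-population sample-size calculation, one
# common way to decide how many artifacts to score. Values here are
# illustrative assumptions.
import math

def sample_size(population: int, margin_of_error: float = 0.10,
                z: float = 1.96, p: float = 0.5) -> int:
    """Sample size for a proportion, with finite-population correction.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the most
    conservative assumption about variability in the population.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)  # infinite population
    n = n0 / (1 + (n0 - 1) / population)                  # finite correction
    return math.ceil(n)

# Example: a graduating class of 2,000 with a 10% margin of error needs
# roughly 92 papers - close to the 95-paper sample described above.
print(sample_size(2000, margin_of_error=0.10))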
Artifact collection processes are a challenge for all institutions. A perfect scenario is that all
students submit all their work electronically, neatly organized across the institution, with “tags”
that align to outcomes, so that gathering artifacts is easily accomplished. This scenario, of
course, does not exist - anywhere. In the planning stages of assessment, however, the process
of artifact collection should be collaboratively planned, clearly defined, and communicated in
advance to faculty and students. Bringing sampling methodology into the mix of considerations
further complicates the process. When applying sampling to a set of artifacts one approach
might be:
1. Identify the outcome to be assessed and the rubric to be used
2. Identify the source of evidence (artifact) that will reflect the outcome and is closest to
program completion
3. Identify the complete population of eligible students and artifacts
4. Determine the nature of artifacts that will be collected for evaluation through
randomization, purposive selection, or other systematic means.
5. Determine the workflow of obtaining the artifact from students. Examples: will the
artifact be collected electronically? Is the artifact already being submitted via an
electronic method? Have the artifacts already been collected via some other
online/offline method?
6. Identify courses from which the artifacts will be collected
Frequency of assessment is also viewed through the lens of manageability and sustainability.
The goal is to avoid going through assessment processes repeatedly and getting the same
results. Assessment results should show improvement over time as assessment results lead to
improvements in practice. Once the desired level of performance is achieved and assessment
results stabilize, the assessment cycle can move from annual to less frequent maintenance
checks. There are events and conditions that could affect results, however:
1. Initially it may be necessary to modify the rubric if evaluator feedback indicates it is
vague, confusing, or difficult to score; this is best discovered and corrected during pilot
phases
2. As the process matures, assessment teams may want to change the rubric in some
fundamental fashion. The institution may feel that important criteria are missing, or that
the levels of performance should be widened in order to better understand learning
needs.
Whatever conditions arise that suggest changing the rubric or the process, change them. This is less about preserving historical and comparable data than it is about establishing the right tools to generate program improvement.
Course evaluations are an important part of academic cycles serving the faculty by providing
feedback on their performance, and frequently serving the retention and promotion process by
providing information on student perspectives of faculty performance. Although volumes could
be written on campus issues surrounding course evaluations, this section will address only the
relationship of outcomes assessment to course evaluations. Guidelines for the administration of
course evaluations have typically been worked out by campus institutional research offices and
most campuses have established processes for these. The relationship of outcomes
assessment to course evaluations is discussed in the section entitled “Planning for Assessment”
in which we argued that course evaluations are about the outcomes of professional
development goals in an institution. But course evaluations could also address student
learning. If we believe that developing students’ ability to write and speak clearly, think critically,
work collaboratively on teams, analyze issues, etc. is the responsibility of faculty across the
institution, should course evaluations contain a question such as:
Please indicate the degree to which this course improved my ability to:
Express my thoughts in writing
Express my thoughts orally
Analyze and critique the thoughts of others
Work well in a team situation
Using this approach, course evaluations would provide an additional dimension to
understanding student learning. With or without the addition of student learning outcomes in
course evaluations there are new ways to think about course evaluations.
As in test questions, each course evaluation item always aligns to a faculty performance
outcome - whether explicitly stated or not. Mostly, course evaluation results are not reported by
outcome, but in the order in which the questions were asked. This practice tends to shift
interpretation of results to frequency distributions, standard deviations, and other metrics that
tend to take our focus away from the meaning of the data. In order to support meaningful and
useful interpretation, course evaluation results might instead be reported by the expected
performance outcome. For example, a question that asks whether the faculty member graded
fairly is about “fairness”, as is a question that asks whether objectives of the class were
provided to students. A question that asks whether the faculty member was available during
posted office hours is about “accessibility.” And a question that asks whether the faculty
member treated students respectfully is about “respectful behavior.”
When course evaluations are reported in an outcomes framework, they become a powerful instrument for improving individual and institutional performance. This approach, both for individual faculty and across the institution, provides a roadmap for individual improvement planning and professional development workshops. It should be noted that in this illustration, the category marked “overall” is an average of all questions – it is not a question that asks students to provide an “overall” rating for the course.
In the illustration, this
instructor would take
action on these data by (1) studying the survey questions that are contributing to the low mark in
“fairness,” (2) examining student comments from the course evaluations to determine why the
rating is low, (3) having focus groups or conversations with students to understand the gap
between student perceptions and the faculty member’s performance, and (4) establishing a plan
to improve. The Provost would examine this report from an institutional performance
perspective, noting that clear communication has the lowest rating across the university. The
provost would also study the survey questions and develop additional information as to why this
rating is low, then design a faculty development workshop for all faculty to improve their
classroom communication skills.
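A minimal sketch of this kind of reporting is shown below. The evaluation items, outcome tags, and averages are hypothetical, and the “overall” figure is computed as the average of all items rather than asked as a separate question.

# A minimal sketch (hypothetical items and outcome tags) of reporting
# course evaluation results by outcome rather than by question order.
from statistics import mean

# Each item is tagged with the faculty performance outcome it evidences.
items = [
    {"text": "The instructor graded fairly",                 "outcome": "Fairness",            "avg": 3.1},
    {"text": "Course objectives were provided to students",  "outcome": "Fairness",            "avg": 3.4},
    {"text": "The instructor was available in office hours", "outcome": "Accessibility",       "avg": 4.2},
    {"text": "The instructor treated students respectfully", "outcome": "Respectful behavior", "avg": 4.5},
    {"text": "The instructor explained concepts clearly",    "outcome": "Clear communication", "avg": 3.6},
]

# Group item averages under their outcome, then average within each group.
by_outcome = {}
for item in items:
    by_outcome.setdefault(item["outcome"], []).append(item["avg"])

report = {outcome: mean(scores) for outcome, scores in by_outcome.items()}
# "Overall" is the average of all items, not a stand-alone question.
report["Overall"] = mean(item["avg"] for item in items)

for outcome, score in report.items():
    print(f"{outcome}: {score:.2f}")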
Surveys are used on campuses for a wide range of purposes. As in course evaluations, each survey question is associated with an outcome, again usually unexpressed. Campuses often describe their surveys as “satisfaction” surveys; however, one might argue that colleges and universities are in the business of educating rather than satisfying students. Survey design
should start with the goals of the program.
For example, the facilities and maintenance office assessment might look something like the
following:
Goals: Buildings and grounds at University X are clean, safe, and contribute to the educational and social success of its members and the surrounding community.
Outcomes: Clean classrooms, public spaces, office spaces, and grounds; safe classrooms, public spaces, offices, grounds, etc.; functional classrooms, public spaces, offices, and grounds.
Measurements: Number of campus security incidents; number of injuries due to hazards; survey results on perceptions of students, faculty, and staff.
In designing a survey instrument, we want to ask no more and no less than the information we
need to identify how well the program is meeting its goals, and how the program can be
improved. By “tagging” each question with the associated outcome, we can on reporting, group
these results to show performance on the outcome overall, and the components that will
contribute to improved performance going forward. As with rubric design, survey design is key
to developing useful, actionable knowledge. In many instances, surveys are being deployed out
of routine with no one reviewing and using the results. In these instances, the survey should be
decommissioned to avoid survey fatigue.
Survey design and development on campuses are often the responsibility of people with direct knowledge or expertise in the practice being surveyed, but who may not necessarily be skilled in survey design. As
with rubric design, this is an area in which a small investment in time and money can lead to
large rewards in terms of useful knowledge. Qualities of a good survey in an educational setting
include:
1. Questions align with criteria for evaluating an outcome. For example, a new student orientation exit survey might ask whether students understand how to access help with understanding their student bill, using the library, academic advising, adjusting to college, etc.
2. Response categories are comparable across questions. For example, avoid mixing a series of 5-point Likert items (“strongly agree” to “strongly disagree”) with 4-point items (“very useful” to “not useful”). There are very few questions that cannot be worded to maintain a consistent response category throughout the survey.
3. Response categories are grouped together. For example, avoid interrupting a series of 5-point Likert items with “yes/no” responses in the middle. Regardless of reporting format, this type of organization creates problems for the reader or the report designer. Again, this issue can be avoided through rewording.
4. Questions are understood clearly by the respondents. For example, avoid questions that are vaguely written, contain vocabulary or concepts that are inaccessible to the respondent, or that should be split into two separate questions or reduced to the essence of the question.
5. Questions are specific about criteria for evaluation. For example, questions that ask for
an overall rating, or level of satisfaction provide no meaningful information because the
criteria on which the response was based are unknown. This is a very common practice
in surveys; unfortunately, the single value yielded by such questions is often used for
consequential decisions. Faculty committees are known to use the results of a single
question asking for an overall rating for a course to make decisions about retention,
promotion and tenure. A more valid measure would simply be to take an average of the
other questions. Particularly in cases where electronic course evaluations are used,
there is no reason to retain this question, as the average of other responses should be
easily calculated.
Other tools used in assessment are curriculum maps and portfolios. While both of these tools
are a bit removed from the assessment of student learning, both can be valuable in institutional
processes.
Curriculum maps are a very useful tool to understand how a program is designed to deliver on outcomes. A curriculum map is a matrix, mapping expected program outcomes as column headings against program courses as row headings. Individual cells within the matrix indicate which course delivers on which outcome. Institutions and programs complete curriculum maps in different ways and, as might be expected, some approaches are more useful than others. This can be a very powerful mechanism for collaboration, planning, and understanding program design by all members of the program.
The concept of curriculum maps can be applied to any program that has a set of activities
designed to produce outcomes. The student retention program, for example, can map expected
skills and competencies (outcomes) against program activities. In fact any program or service
can use this approach to determine: (a) that all of the expected outcomes are addressed in the
planned actions of the unit, (b) where there are gaps that need filling, and (c) where there are
overlaps that can be reduced. The characteristics of good maps are that coverage, gaps, and overlaps are made visible and understandable in each of these three areas. Generally, this means that numerical values are assigned to the intensity of coverage, perhaps accompanied by color coding of cells, so that all three are clear to the stakeholders (a small computational sketch follows).
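The sketch below illustrates one way such a map might be analyzed. The courses, outcomes, and the 0–3 intensity scale are hypothetical.

# A minimal sketch (hypothetical courses and outcomes) of using a
# curriculum map matrix to surface coverage, gaps, and overlaps.
# Cell values indicate intensity of coverage: 0 = none, 1 = introduced,
# 2 = reinforced, 3 = emphasized.
outcomes = ["Written communication", "Critical thinking",
            "Quantitative literacy", "Multicultural competence"]
curriculum_map = {
    "HIST 101": [1, 2, 0, 1],
    "MATH 204": [0, 1, 3, 0],
    "ENGL 210": [3, 2, 0, 0],
    "CAPSTONE": [2, 3, 0, 0],
}

# Coverage: total intensity per outcome across the program.
coverage = [sum(row[i] for row in curriculum_map.values())
            for i in range(len(outcomes))]

for outcome, total in zip(outcomes, coverage):
    status = "GAP - addressed in at most one course" if total <= 1 else "covered"
    print(f"{outcome}: total intensity {total} ({status})")

# Overlap check: outcomes emphasized (level 3) in more than one course
# may signal duplication worth a program-level conversation.
for i, outcome in enumerate(outcomes):
    emphasized = [c for c, row in curriculum_map.items() if row[i] == 3]
    if len(emphasized) > 1:
        print(f"Possible overlap for {outcome}: {', '.join(emphasized)}")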
Portfolios are often thought to be a convenient collection method for artifacts organized by
expected outcomes which can then be used in the assessment of those outcomes. With the
emergence of automated artifact collection technologies, there may be less focus on portfolios because such technologies avoid the problems associated with getting students to faithfully populate their portfolios, and avoid the ethical issue of requiring all students to complete a portfolio while using only a portion of the total portfolios for actual assessment.
Characteristics of assessment portfolios might include:
1. A separate portfolio space for artifact(s) associated with a specific outcome
2. A clear explanation of the outcome including name, criteria for evaluation, and a means
to access the associated rubric(s)
3. Instructions that ask for the student’s best work that matches the criteria for evaluating
the outcome.
4. The ability for students to replace the artifact(s) as the student produces higher levels of
their best work
5. The student’s ability to take a copy of the portfolio and contents with them upon
graduation or to access the portfolio for a certain period of time after graduation for use
in employment or graduate school applications.
Tests, rubrics, surveys, course evaluations, curriculum maps, and portfolios are the main tools
of educational outcomes assessment. Throughout this discussion, we have kept a focus on
quality of tools and processes, while keeping an eye on the issues of sustainability, manageability, and scalability, which will be directly addressed in the next section.
Sustainability, Manageability and Scalability are those high level dynamics that play out in
the most minute details of assessment practice. Each of these issues has been addressed
throughout the preceding discussions, but to summarize these minute details, a few guidelines
are:
Sustainability:
1. Pilot and conduct direct assessment on a small scale initially and use the initial pilots to
identify opportunities to create efficiency – meaning getting the needed information with
the least amount of effort
2. Cultivate assessment across the institution through demonstrating the benefits and
sharing the energy generated through using results that improve programs
3. Use resources up front to organize, staff, and build expertise in critical areas of
assessment design
4. Keep executive sponsorship in the information loop
Manageability:
1. Maintain active oversight and guidance by organizing people with responsibility for the
assessment processes
2. Establish and communicate clear roles and responsibilities of staff and committees and
provide authority to modify, change, and improve assessment processes
Scalability:
1. Establish a plan for growing assessment practice across the institution
2. Communicate successes and lessons learned
The information in this section is designed to produce actionable knowledge – knowledge that
results in the successive improvement of quality in teaching and learning, support programs,
and administrative services across the institution. A significant factor in creating actionable
knowledge is the use of results, which includes the interpretation of data and converting the
resultant knowledge into action. Our next section, “Using Assessment Results” will discuss this
dimension.
Chapter 4: Using Assessment Results
. . . for institutions, three powerful gifts result from outcomes assessment. First, institutions have a means of
engaging faculty in the design and delivery of the curriculum – and to do this on an ongoing basis. Secondly,
the institution can now provide evidence to substantiate bold claims such as “our graduates have a deep
awareness of the impact of their professional work on society.” Thirdly, the most powerful gift is adaptivity – the ability to
change course to meet a desired end state. Faculty engagement, continuous improvement, institutional
identity, adaptation and change – a powerful gift indeed.
Karen Yoshino, Blackboard Blog January, 2010
Actionable knowledge is the end game of outcomes assessment – meaning the use of data to
tell us where we stand and how to improve. Getting to actionable knowledge, however, requires
a process for giving meaning to data. When meaning is assigned to data, we raise the
opportunity for action – action that improves the quality of teaching and learning, student
support programs, and administrative services. This process includes systematic approaches
to analysis, interpretation, communication, improvement planning, and follow-up actions. Each
of these concepts will be discussed in this section.
[Figure: diagram relating direct and indirect measures to Institutional Effectiveness, Student Success, and Educational Effectiveness.]
What does actionable knowledge look like? In the example here, we see that direct measurement of teaching and learning indicates our General Education program has not developed the level of knowledge and skills we have set as our standard for “scientific reasoning.” This information can be provided to our accreditors as evidence that the institution has named, evaluated, and identified underperforming areas. Most institutional metrics (graduation, retention, average GPA, etc.) provide this level of insight. They tell us where we stand, but provide little insight into how to improve performance.
[Figure: Educational Effectiveness Dashboard (% of total possible) showing institution-level results for outcomes such as critical thinking, scientific reasoning, multicultural competence, safe campus, and access to financial aid; and 2009-10 scientific reasoning results broken down by criterion – observe, hypothesize, design, collect, analyze, conclude.]
If we then look at the underlying results that produced the scientific reasoning indicator (in this case a rubric evaluation process), we can better understand what to do to improve the outcome. In the illustration here, we see that hypothesis, analysis, and conclusion
skills need to be improved. This is an opportunity to bring the lab science faculty together as a
whole, examine the design of the curriculum and make plans to reorder, reinforce, and
emphasize these skills to generate improved student AND program performance in the future.
Analysis of outcomes assessment data requires us to think about “data” in different ways than we routinely think about data. In general, when interpreting assessment data it is useful to:
• Maintain a focus on the program as the unit of measure;
• Use data for the purpose of improving performance; and
• Avoid the tempting but common questions, stemming from our prior knowledge and daily practice, that detract from the focus on program improvement. We will discuss many of these detractors in this section.
Program is the unit of measure. Faculty constantly evaluate student work and assign grades
on a student-by-student basis. As a result, our tendencies are to view aggregate student data
as a reflection of aggregate student performance rather than our real focus – program
performance. It is true that we must rely on student work in outcomes evaluation, but once
student performance results are aggregated we need to resist the temptation to speak about
areas of underperformance as student underperformance, but instead to think of it in terms of
program underperformance.
Although most program assessment processes thankfully do not rely on grades, it is important
to discuss the underlying rationale. Unless grades are assigned to students using a formal
rubric process, a grade is simply
not sufficiently granular to be
useful in program assessment.
For example, a B+ issued to a
student after an oral
presentation in a history class is
based on two areas: history and
oral communication.
Accordingly, the use of the B+ in
assessing “oral communication”
for the general education
program or the history program
will be skewed by the faculty
member’s weighting between
history and oral communication
skills. To further complicate this example, the variation in weighting among several history faculty means that a collection of grades for program assessment gives us no actionable knowledge. In using grades for program assessment, we spend considerable institutional
resources for little return.
Scientific and Evaluative methods. We read scholarly journals containing data gathered with
the highest levels of scientific rigor and analyzed with the tools and standards of inferential
statistics. As a result, we are accustomed to thinking about data as objective and value neutral,
and the implications of these data as adding to a
body of theoretical knowledge rather than of
practical use. Because we have become
accustomed to taking in new research knowledge
and filing it away in our minds, we are not inclined
to take action on data.
The data provided by our institutional research
offices are generated with statistical tools and
strict standards. These data are often used for
external reporting purposes or used as internal
indicators of program performance. These data
(retention rates, Average GPA, admission scores,
demographics) are used to tell us where we stand, but because they are not sufficiently granular
they give us little information about where to target improvement efforts. Often, when we do
take action based on these high level data, we do so without the benefit of knowing whether the
action is going to fix the problem.
Another aspect of our training in scientific research methodology is a tendency to bring
misplaced scientific rigor to the assessment process. While no one is arguing against
methodological rigor in program assessment, traditional methods of statistical research often
create unnecessary
distractions. A recent
assessment listserv
discussion on inter-rater
reliability encouraged many
voices to contribute
information on what inter-rater
reliability programs were being
used, what analyses were being generated, and detailed methodologies. No one raised the
issue of examining the specificity of the rubrics! We have seen many rubrics with vague performance descriptors or no performance descriptors. These rubrics will surely lead to inter-rater reliability issues. However, a tightly written rubric with specific differentiators among performance levels, accompanied by a short introductory exercise, will head off most inter-rater reliability issues.
In thinking about the level of rigor in conducting assessment it is well to keep in mind that the
methodology should be rigorous enough to identify systematic patterns of program
performance. Program assessment is not a high stakes proposition for individuals; no individual
gets failed, blamed or held accountable as a result of the process. It is high stakes, however,
because it is the only mechanism we have to manage the quality of our programs and failure to
conduct assessment puts educational quality at risk.
Finally, the use of correlations, regressions, and other methods of inferential statistics is of little
use in program assessment because the unit of measure is the program, not the student.
The value of “value added.” During the evolution of educational assessment, the issue of
accountability was the principal driver. As the conversation continued and the practice of
assessment began to take shape the concept of “value added” was introduced. This concept
was often expressed in terms of questions such as: “Why does going to college matter?” or
“What value does this college add that another college doesn’t?” “What do students get from an
education at University X?” While these are interesting research questions, the pursuit of “value
added” has resulted in significant amounts of energy and resources spent on standardized and
pre-post tests that only raise more questions.
In the case of standardized tests, we learn that our students perform more poorly than our
comparison group or the national norm on quantitative literacy. Does this mean we did a worse
job of educating our students in quantitative literacy, or does it mean our comparison group
draws from a population of students who had higher levels of math skills coming in the door, or
does it mean that the test is not a good measure of quantitative literacy? More importantly,
what do we do now that we have this information? Unless the standardized test reports provide
a breakdown at the criteria for evaluation level, there is little more we can do with this
information but say we do better or worse than comparison groups.
Pre-post testing is often thought to be a means of measuring value-added, but when we find
that students know and are able to do more between pre- and post-testing events, what do we
learn? Only that they do better at the end of a program or they do not. However, pre-post test
results can give us useful information. If we examine the results by criteria for evaluation we
have valuable information. Pre-test scores reported by criteria for evaluation will tell us how to
adapt the program to address the knowledge and skill needs of the incoming student population.
Post-test scores reported by criteria for evaluation will indicate for us specific ways to improve
the design or delivery of the curriculum.
The graphic below is a concept map for all of the elements we have discussed above that lead
to actionable knowledge – using results for accountability and for improvement.
What do we mean by “summative” and “formative?” We have argued that the primary
purpose of educational program assessment is to provide summative and formative information
about program quality. Unfortunately, the terms “summative” and “formative” carry different
definitions depending on your location in the world of education. In grants and other projects,
the term is time-based. So formative might mean mid-program and summative might mean at
the end of a program. In schools, summative is assessment of learning, and formative is
assessment for learning – where the former uses grades or report card content, and the latter
uses various mechanisms for feedback to students. The result is that context brings different
interpretations to the assessment table.
We argue that the terms formative and summative mean more than when an assessment occurred; they should instead convey the purpose of the assessment. Thus, we take “summative” to mean judgment (to what extent were the outcomes of the program achieved?) and “formative” to mean diagnostic (what are the opportunities to improve the program?). In this context, it makes sense that we examine evidence of students nearing the end of a program to answer the summative question – to what extent did we achieve the program goals? If we also have sufficiently granular data, we can also use that same end-point information to identify improvement opportunities.
Designing assessment processes for analysis. The conceptual “set-up” for analyzing
assessment information is important in generating actionable knowledge. Although rarely found
in the real world, ideal assessment instruments are organized so that reporting and analysis are
apparent. In this ideal situation: criteria for evaluation are grouped by outcome, outcomes are
aligned to goals of the program, goals of the program are aligned to goals of the institution, and
goals of the institution are aligned to the mission. In the ideal survey, questions are aligned to
outcomes, answer scales are consistent throughout the survey; indeed, the institution has
established a standard scale so that all surveys are designed on the same scale. Coming back
to the real world, these conditions are rarely found. This makes report design extremely
important.
Data + meaning = Actionable Knowledge. The key to designing assessment reports is to
focus on the evaluation question, which is another way to say “what is the meaning of the
data?” Across all types of assessment results to be reported, design the report for your least
experienced reader by providing
graphics whenever possible, and stick
with the principles that: (a) fullness and
completeness of data do not
Not Recommended
necessarily equate to usefulness of
data, (b) interesting data do not always
equate to important data, and (c)
granular results reported at the criteria
for evaluation level facilitate use of
results. Detail level data can always
be produced for those who are
interested, but the distillation of data to
Recommended
its key elements will keep the
conversation focused on the meaning
45
and not the methodology or the statistical intricacies involved.
Actionable knowledge + effective reports = Improvement Opportunities. The following are
helpful pointers for reporting assessment results to support the analysis process:
Reporting surveys and course evaluation results
1. Resist the urge to report results in the order they are found in the instrument; instead,
group item results by construct logic. For example, the course evaluation questions
“The instructor showed respect for students,” “The instructor was tolerant of differing
opinion,” and “The instructor encouraged broad participation” might all be grouped under
the construct “inclusive behavior.” Whether the 3 individual question results are reported
or not, the combined results should be calculated for “inclusive behavior.” This will
support analysis by the qualities of teaching that the university considers desirable.
Indeed, all questions on a survey or course evaluation should be aligned to a construct
that links to the reason the question is being asked in the first place.
2. Resist the urge to report all of the data in the instrument; only report those that respond
to the evaluation question. Staying with the course evaluation example, the question
“Was this course team taught?” has little to do with the quality of the course. This item
responds to a research question such as “what is the status of team teaching at the
university?” and as such should be reported for a separate audience. Excluding data
that do little to inform the evaluation question will encourage interpretation and use of
these reports. Another way to state this is: beware of reporting instances (or asking questions) in which all responses in a course are the same, e.g., “Was the course team taught?” In the instance of the team teaching question, if it is important to know how many courses are being team taught and by how many teachers, the institution should consider adding this to the registrar’s database officially maintained by the institution.
3. Beware of survey items that cannot be associated to a criterion for evaluation –
especially if they are “student satisfaction” type questions. When surveys or course
evaluations contain questions like “The overall quality of teaching in this course was
excellent/outstanding,” consider reporting the combined results of all other questions as
well as the results of this question. The argument here is that there are no criteria for
evaluating “excellent/outstanding.” Some students will respond to this question on the
basis of the personality, political views, or any number of factors that have little to do
with effective teaching and learning. Instead, we should put the emphasis on the questions that relate to specific criteria (fairness, inclusive behavior, availability, effective communication, student engagement, etc.). This becomes particularly important if student perceptions are being used in evaluating the performance of faculty, as committees may rely on the responses to questions such as “overall quality” to make impactful decisions.
Reporting rubric evaluation results
1. Report average aggregate results by criterion for evaluation (rubric rows) as well as averages for the outcome as a whole (a brief computational sketch follows this list). This will support analysis by providing answers to the summative question (how well is the program performing on this outcome?) and the formative question (where are there opportunities to improve performance?).
2. Where rubric performance levels (rubric columns) assign labels such as “excellent,
good, fair, poor”, attempt to convert these to non-judgmental terms like “level 1, level 2,
level 3, level 4”. This will support analysis by lessening the tendency to see the data as
“fair” student performance, and encourage viewing the data as indicators of program
performance.
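The sketch referenced in item 1 is shown here; the criteria, the four-level scale, and the scores are hypothetical.

# A minimal sketch (hypothetical scores) of reporting rubric results by
# criterion (rows) and for the outcome as a whole, using non-judgmental
# performance levels (1-4) rather than labels like "excellent" or "poor".
from statistics import mean

# Each evaluated artifact receives a level (1-4) on every criterion.
scores = {
    "Interpretation": [3, 2, 4, 3, 2],
    "Representation": [3, 3, 4, 4, 3],
    "Calculation":    [4, 3, 4, 4, 3],
    "Application":    [3, 3, 3, 4, 3],
    "Communication":  [2, 2, 3, 2, 2],
}

# Formative view: average level per criterion identifies where to improve.
by_criterion = {criterion: mean(levels) for criterion, levels in scores.items()}

# Summative view: average across all criteria indicates overall achievement.
outcome_average = mean(by_criterion.values())

for criterion, avg in by_criterion.items():
    print(f"{criterion}: {avg:.2f} / 4")
print(f"Outcome overall: {outcome_average:.2f} / 4")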
Once raw assessment data have been converted into meaningful reports, they are ready for the
process of interpretation.
Improvement Opportunities + Decision Path = Quality Enhancement. The process of
interpretation takes us directly back to the evaluation question – to what extent did program X
deliver on the expected outcomes, and what are opportunities to improve performance? This
implies that those participating in the interpretation of results have a combination of: a stake in
the program being evaluated, expertise in the subject matter, the skills to create plans for
improvement, and the organizational capital to make recommendations that will be broadly
accepted in the community of practice. Since very few individuals possess all of these
characteristics, this suggests that the interpretation process be collective and collaborative.
Collective and collaborative interpretation processes. After a combination of expertise and
skills has been identified, and before coming together as an interpretation team, each member
should have an opportunity to review assessment reports, together with a description of the
methodology, and the instruments used.
In most circumstances, the
interpretation process will be
facilitated by a brief interpretation of
the data and even a recommended
action statement such as in this
graphic.
A team leader or chair of the group
should have a thorough
understanding of the data and be in
agreement with the initial interpretive statements as well as the recommendation before a group
meeting takes place. The group should be provided a framework within which to conduct the
interpretation discussion. Elements of the framework should include expectations for the
outcomes of the meeting. In general these expectations will be:
a. Summative: Is the program performing as desired?
b. Formative: In what areas do we find opportunities to improve either the design or the
delivery of the program?
c. Action plan: What do we want to accomplish in terms of improved performance going
forward, what steps will be taken, and who will be responsible for follow-through?
Action plans. The types of recommended changes resulting from the assessment process
typically have broad implications involving many faculty. This is the point at which broad groups
of faculty become engaged with each other in improving the design and delivery of the
curriculum. Conversations about evidence of the need to improve written communication, global awareness, or critical thinking can now take place. Sufficient granularity is provided to understand which elements of critical thinking need to be more specifically infused across the curriculum, which makes everyone think about the approaches in their own programs and in the courses they teach.
It is also likely the point at which the Director of Assessment releases ownership of the data and
the process to Vice Presidents and Provosts, and the point at which the assessment team
releases ownership of the process to the broader faculty. For example, recommendations on
general education outcomes may have to be processed through the General Education
Committee. The academic senate may need to consider the recommendations.
A powerful gift. The mechanisms by which the data move through the governance process of
the institution will vary by institution, but the result will be powerful. Through the assessment
process, institutions have a mechanism to systematize adaptive processes – to make those adjustments and change course to meet the mission and goals of the institution. A powerful gift indeed.