An Assessment Primer:
A Guide to Conducting
Assessment Projects
August 2003
Metropolitan Community College
Produced by the
Office of Research, Evaluation and Assessment
An Assessment Primer:
A Guide to Conducting
Assessment Projects
A Resource Developed for the
District Steering Committee for Institutional Assessment
General Education Outcomes Assessment Subcommittee
Subcommittee for Assessment of Occupational Program Outcomes
Student Development Assessment Subcommittee
The Metropolitan Community College District
Business & Technology College – Blue River Community College
Longview Community College – Maple Woods Community College
Penn Valley Community College – Administrative Center
Produced by the
Office of Research, Evaluation and Assessment
August 22, 2003
TABLE OF CONTENTS

INTRODUCTION
   Why is Assessment Important?
   What is Assessment?
   Expectations of an Assessment Program
   Measures of Learning
   Expectations and Demands for Accountability

DEVELOPMENT OF ASSESSMENT PROJECTS
   Spend Time to Review the Literature and Discuss Issues about the Topic
   Develop Question(s) that Form the Basis of Inquiry
   Develop a Research Plan to Obtain the Data
   Develop Answers Based on the Data

METHODOLOGICAL ISSUES
   What to Assess
   Define Components
   Examine Component Intricacies
   Review Measurement Options
   Scales
   Rubrics
   How Many Subjects?
   Sampling
   Random Sampling
   Stratified Sample
   Representative Sample
   Living with Something Called “Error”

DESIGN ISSUES
   Experimental vs. Control Group
   Collection of Students
   Longitudinal Design
   Mixed Model

METHODOLOGICAL CONSTRUCT
   Written Data
   Objective Data
   Rating Scales
   Survey Data

IDENTIFYING MEANS OF MEASUREMENT
   Fixed Scale Data
   Sliding Scale Data
   Using Proportions
   Norm Referenced Measurements
   Criterion Referenced Measurements
   Qualitative vs. Quantitative Data

LOOKING AT DIFFERENCES
   Significant vs. Meaningful Differences
   Control for Effect
   Pre-Test to Post-Test Differences
   Reliability vs. Validity

IMPLEMENTATION ISSUES
   Establishing Analytical Credibility
   Answer the Question but First Define the Question
   Provide Information According to its Purpose
   Match your Information with its Recipients
   Beware of the Perils of Printout Worship
   Keep Focused on what you are Attempting
   Develop a Research Plan and Follow It
   Take Time to Summarize
   Make an Effort to “Do It Right”
   How Much Data is Enough?

HOW GENERALIZABLE ARE THE SCORES
   Do the Scores Suggest Learning has Occurred?
   Does the Learning Relate to the Larger Question of Assessment?
   Does the Activity Producing the Score Suggest General or Specific Learning?

GATHERING AND MANAGING RESOURCES FOR ASSESSMENT
   Whose Responsibility is Assessment Anyway?
   Obtaining Human Resources for Assessment
   Consultants
   Institutional Employee
   Assessment Committee
   Assessment Consortium
   External Agency
   Avoiding the HINIBUS Syndrome

A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES
   Embedded Course Assessment
   Using Norm-Referenced Tests
   Commercially Produced Tests
   Criterion-Referenced Tests
   Portfolio
   Scoring Rubrics
   Classroom Assessment Techniques (CAT)
   Capstone Courses

SUMMARY

REFERENCES
Preface
This document was developed to assist MCC personnel engaged in assessment or
research activities. The contents are the result of the author’s consulting with district
faculty, staff and administrators and mirror the questions and concerns voiced by those
persons as they engage in their research/assessment activities.
Any questions or comments regarding this document should be directed to its author:
Dr. Charles L. Van Middlesworth
District Director, Research, Evaluation and Assessment
Metropolitan Community College
3200 Broadway
Kansas City, MO 64111
Telephone: (816) 759-1085
Fax: (816) 759-1158
Email: charles.vanmiddlesworth@mcckc.edu
An Assessment Primer
INTRODUCTION
Why is Assessment Important?
Since the late 1980s much of higher education has been focused, with varying degrees of
success, on assessing what students know. External mandates for institutional
accountability have made colleges and universities shift their focus from teaching to
learning. This is not to say that the assessment of student learning has not been taking
place within the walls of academe; rather, the emphasis has not been on what students
learn and on validating that students learn what we think they learn. Assessment is far
more than external accountability. It is the process of gathering and using information
about student learning to improve the institution, curriculum, pedagogy, faculty and
students.
What is Assessment?
Assessment has been defined as the systematic gathering, interpretation and use of
information about student learning for the purposes of improvement. Assessment has
also been defined as a means for focusing our collective attention, examining our
assumptions and creating a shared academic culture dedicated to continuously improving
the quality of higher learning. Assessment requires making expectations and standards
for quality explicit and public by systematically gathering evidence on how well
performance matches those expectations and standards. It also includes analyzing and
interpreting the evidence of assessment and using the information to document, explain,
and improve performance. Thus, the single unifying notion of this discussion is that
assessment is a process, not a product.
Expectations of an Assessment Program
For an assessment program to be considered “effective” it should contain the following
features:
• Structured; that is, it should be organized and have a recognizable framework;
• Systematic; it is conceived and implemented according to an assessment plan developed by the institution;
• Ongoing; assessment activities and feedback are continuing rather than episodic;
• Sustainable; that is, the institution is able to maintain the assessment program with the structures, processes, and resources in place;
• A process exists that uses assessment results to improve student learning.
The assessment process will be “framed” by the questions an institution wants to answer
about its teaching and learning. This cannot be emphasized enough: questions
will influence every decision made concerning the assessment process. Many assessment
plan developers attempt short cuts by beginning with how the data is to be collected,
rather than discussing and questioning what they want to know. I have learned from years of
experience that it is nearly impossible to solve a problem that has not been defined! It is
important to query what students have learned, but it is equally important to provide
students with the option of “reflecting on their learning”.
Assessment can refer to two different activities: the gathering of information and the
utilization of information for improvement. From a practitioner’s point of view, the most
important description of assessment is simply: 1) what do students know; 2) how do
you know they know; and 3) what do you do with the information? As a term,
assessment data has various meanings within higher education. The term can refer to a
student’s course grade, a college placement test score, or a score on some form of
standardized instrument produced by an external agency or organization. The
information for assessment may be numerical, written reflection on previous work,
results from internally or externally developed instruments or examinations, or
assessments embedded within course work, to name a few. The principal goal of a
program for the assessment of student academic achievement is to improve teaching and
learning with evidence provided from an institution’s assessment program. Assessment
information can stimulate changes in curriculum. A common misconception in higher
education is that assessment consists of administering exams, assigning a course grade,
scoring an assignment with a rubric or having a student demonstrate learning.
Assessment is not a single act but rather a process that includes developing a conceptual
framework, identifying learning outcomes, developing a methodology for
implementation, evaluating design issues, administering the activity, analyzing the
results and, as the final step, using the information learned from the process to
improve teaching and learning. This primer is designed to assist the reader as they
become involved with their institution’s assessment program.
Measures of Learning
Data collection methods include paper and pencil testing, essays and writing samples,
portfolio collections of student work, exit interviews, surveys, focused group interviews,
the use of external evaluators, logs and journals, behavioral observations, and many other
research tools. Research methods should be tailored to the type of data to be gathered,
the degree of reliability required, and the most appropriate measurement for the activity
being conducted.
In practical terms, there are two different types of learning measurement: direct
measures of learning and indirect measures of learning. Direct Measures of Learning
include pre- and post-testing; capstone courses; oral examinations; internships; portfolio
assessments; evaluation of capstone projects; theses or dissertations; standardized
national exams; locally developed tests; performance on licensure, certification, or
professional exams; and juried reviews and performances. Indirect Measures of
Learning might include information gathered from alumni, employers, and students;
graduation rates; retention and transfer studies; graduate follow-up studies; success of
students in subsequent institutional settings; and job placement data. The preference for
assessment programs is to use direct measures of learning.
What can institutions of higher education do to prepare themselves to meet the
expectations or demands for accountability through the assessment of student
learning?
Institutions should create “an assessment-oriented perspective”. “An assessment-oriented
perspective” exists when all levels of the institution become advocates for “doing what is
right” and commit time, energy and resources to seriously examine student learning.
Advocating an emphasis toward student learning enables the institution to place “at the
head of the table” the single most important aspect of higher education, and that is
learning. The role of faculty in this endeavor is paramount. The comment made most
often by faculty is that they are content specialists, not assessment specialists; however,
faculty need to realize it is through their knowledge of the content area that learning
questions are posed, discussed, defined and assessed. One method to use when engaging
in the process of assessment is triage. Triage has been defined as the sorting of and
assigning priority order to projects on the basis of where time, energy and resources are
best used or most needed. In this case, triage refers to identifying a learning need, such
as general education, discussing its attributes, defining its context, developing its
components and learning outcomes, and developing assessment strategies to answer
those questions. First
and foremost, all should recognize that assessment is research. Assessment is research
because through the multi-stage research process judgments are made, and judgment
translates into meaning.
DEVELOPMENT OF ASSESSMENT PROJECTS
For the last 10 years, institutions of higher education have spent considerable time and
energy refining their “Plan for the Assessment of Student Academic Achievement”.
When an institution’s plan was developed, submitted and accepted, there were probably
some that did not think revisions would be necessary until the next accreditation visit.
This is not true, and by now it is widely known that some plans will require significant
changes. A method of reviewing assessment plans and making adjustments that are timely and appropriate is:
1. Spend time to review the literature and discuss issues about the topic
2. Develop question(s) that form the basis of inquiry
3. Develop a research plan to obtain the data
4. Develop answers based on the data.
Spend Time to Review the Literature and Discuss Issues about the Topic
In general terms, the first step to building an effective assessment project is to spend time
to review the literature and discuss issues about the topic. Without question, discussions
regarding learning topics are more productive when efforts are taken to review pertinent
literature. The literature review allows members of the “learning topic group” to obtain
a “theoretical perspective” on the topic as well as to examine the implementation of
similar projects at like institutions. Having and discussing background information tends
to keep the learning topic “in focus”.
Develop Question(s) that Form the Basis of Inquiry
Once the literature review and subsequent discussion have taken place, the learning topic
group needs to develop a series of questions that will form the basis of inquiry. As
mentioned previously, assessment is research and research is an activity that is employed
to answer questions. During the question development stage steps need to be taken to
insure the question and its components are adequately defined or specified. Poorly
framed questions will generate data of little value.
Develop a Research Plan to Obtain the Data
The third step in this process is to create a research plan that becomes the operational
document for examining the learning topic. A research plan is the methodological
roadmap to successful and useful assessment activities. The development of the research
plan occurs following the first two steps of this process, noted above. Earlier narrative
provides the basis from which the learning topic will be framed, structured and assessed.
Develop Answers Based on the Data
The last step in this process seems logical and/or obvious. In fact, this
aspect of the process can be the most difficult because the data provided might show that
earlier steps were not “fleshed-out” to the extent they should have been. At this stage if
any short cuts or other “less defined” activities were implemented, the data will provide a
clear statement that the process needs to be refined or some aspects of the project need
revisiting.
METHODOLOGICAL ISSUES
What to Assess
Within the context of the aforementioned steps, it is critical that appropriate
methodological decisions be made about the project and its associated activities. Several
issues have been identified to provide a “framework” to assist with the development of
the methodological component of the learning topic project. The first issue is to
determine what to assess. Earlier, a four-step process asked the learning topic group to
review the literature and develop questions that formed the basis of the assessment
activity.
Define Components
In methodological terms, it is time to define the components of the question in
operational terms. “In operational terms” suggests that each component of the learning
topic be defined in a unique and identifiable way.
Examine Component Intricacies
Once the learning topic components are defined, members of the group need to focus on
the measurement possibilities. Examination of component intricacies provides a basis for
determining if the learning topic is to be assessed as a whole and/or through the
individual components. If the “individual component” option is chosen then the
measurement option chosen needs to afford the learning topic group the ability to
individually assess each component as well as its contribution to the whole.
Review Measurement Options
Measurement options for components include the use of fixed or sliding scales or scoring
rubrics. Scales or rubrics become numerical representations of learning activities,
experiences and breadth of learning.
Scales
A scale has been defined as a set of items “arranged in some order of intensity or
importance” (Vogt, 1993). Scales associated with research and assessment activities can
be generally categorized as fixed or sliding. A fixed scale refers to a set of points that
identify a particular social or learning event. The points on a fixed scale are integers, that
is, on a scale of 1 to 7 each case (or person’s) behavior and/or learning is associated with
a component score that is either a 1, 2, 3, 4, 5, 6 or 7. Fixed scale points are defined in terms
of “agreement”, “satisfaction” or other, but the scale definition and value are the same for
each item. Fixed scales are sometimes misconstrued as sliding scales because during the
analysis the mean for an individual item may be a 3.7, thus, many think of the scale as
sliding. This is not so, because 3.7 is a summary of item responses rather than an
individual item score. All scores for the item create the item summary mean. A sliding
scale may use the same 7-point scale but a person or case has the option of placing
himself or herself at any point along the scale. Sliding scales are typically used to
determine where a person would place himself or herself in response to a set of questions
or items that pertain to a defined knowledge or opinion set. Although the scale is a 7-point scale, the scale attributes are different for each item.
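
To make this distinction concrete, the short Python sketch below (not part of the original primer; the response values are invented) shows how integer fixed-scale responses produce a fractional item summary mean even though no individual ever records a fractional score.

    # Hypothetical fixed-scale (1 to 7) responses to a single survey item.
    responses = [3, 4, 4, 5, 3, 4, 2, 5, 4, 3]

    # Every individual response is an integer on the fixed scale.
    assert all(isinstance(r, int) and 1 <= r <= 7 for r in responses)

    # The item summary is the mean of those integers, which is usually
    # fractional (here 3.7); the fraction describes the group of responses,
    # not any single respondent.
    item_mean = sum(responses) / len(responses)
    print(f"Item summary mean: {item_mean:.1f}")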
Rubrics
A rubric, on the other hand, utilizes an attribute scale designed to signify the level or rate
of learning that has taken place. The idea behind a rubric is for members of the learning
topic group to identify competencies or attributes that enable specific distinctions
between performance or knowledge levels for each person completing the assessment
activity. A rubric may not lend itself to utilizing a mean to describe performance or
differences between participant scores. Rubrics are typically discrete scales that identify
participants as achieving a score of 1, 2, 3, 4, 5 or 6, rather than identifying participants
as at the 3.46 level. If the learning topic group wants to utilize a sliding scale, those
discussions need to occur during the project development stage prior to implementation.
How Many Subjects?
One of the most frequently asked questions of methodologists is “how many people do we
need to make this a legitimate study/process?” The answer to this question has
antecedents in the discussion above. The number of students is determined by the nature
of the project and the level of implementation (pilot study or major assessment
component).
Sampling
The first question that should be asked is whether or not all students should receive the
assessment or a sample. This decision is largely determined by the size of the student
population, availability of human and financial resources, and relevance to the project
intent. Small institutions should carefully scrutinize the use of sampling because of the
small numbers of students that represent the target group. Readers should be cautioned
that not all members of a campus community necessarily endorse the use of sampling.
There are many reasons for the reluctance to accept sampling as a viable component of
the assessment process. This writer believes much of the reluctance stems from a
misunderstanding of sampling, its attributes, and rules regarding the appropriate use of
sampling. When properly applied, sampling is a powerful tool and should be in every
institution’s methodological toolbox.
There are several types of sampling: random, stratified and representative. By far the
easiest sampling technique to implement is the random sample.
Random Sampling
A random sample is determined by “selecting a group of subjects for study from a larger
group (population) so that each individual is chosen entirely by chance” (Vogt, 1993).
The important aspect of the random sample is that every member of the population has an
equal opportunity of being chosen.
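
As a minimal illustration (not part of the original primer), the Python sketch below draws a simple random sample from a hypothetical roster of student IDs; the roster size and the sample size are invented.

    import random

    # Hypothetical roster of student IDs; a real project would load the
    # actual enrollment file instead.
    population = [f"S{n:05d}" for n in range(1, 2501)]

    random.seed(42)  # fixed seed only so the draw can be reproduced
    # Every member of the population has an equal chance of being chosen.
    sample = random.sample(population, k=200)
    print(len(sample), sample[:5])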
Stratified Sample
A stratified (random) sample involves selecting a group of subjects from particular
categories (or strata, or layers) within the population (Vogt, 1993). Examples of a
stratified sample are using females, or males 30 years of age or older, or students
completing Composition I, as the population for the study.
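
A comparable sketch for a stratified random sample, again with an invented roster and invented stratum labels, groups students by stratum and then samples within each group.

    import random

    # Hypothetical roster: (student_id, stratum label); labels are invented.
    strata = ["Composition I completers", "30 years or older", "other"]
    random.seed(7)
    roster = [(f"S{n:05d}", random.choice(strata)) for n in range(1, 1001)]

    # Group the roster by stratum, then draw a random sample within each
    # stratum the study cares about.
    by_stratum = {}
    for student_id, label in roster:
        by_stratum.setdefault(label, []).append(student_id)

    samples = {label: random.sample(ids, k=min(50, len(ids)))
               for label, ids in by_stratum.items()}
    for label, chosen in samples.items():
        print(label, len(chosen))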
Representative Sample
Representative samples are not mentioned in the literature to the extent of random and
stratified samples. A representative sample is the selection of a group of subjects that are
similar to the population from which it was drawn. In the case of a representative sample,
the operative word is similar. In its complete form, similar refers to matching the
characteristics of a subject population in terms of the criteria that identify that population.
For instance, if the learning topic group wants to assess a representative sample of
students at an institution the criteria for determining membership within the sample must
be through “demographic analysis”. Prior to selecting the sample the learning topic
group would identify the attributes of the student population that are determined
“demographic”. In most cases the demographic attributes of a subject population would
be gender, age, racial/ethnic affiliation, marital status, socio-economic level and so forth.
To obtain a representative sample subjects would be selected in the direct proportion of
their membership within the student population. For instance, if 25 percent of the student
population is female, over 25 years of age, white, single and lower middle class, then 25
percent of the study subjects must also meet these criteria. Representative samples are
not used at a frequency as great as random or stratified because of the complex nature of
developing such samples.
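
The proportional matching described above can be sketched as follows; the demographic profiles, their population shares, and the target sample size are all hypothetical.

    # Hypothetical demographic profiles and their shares of the population.
    population_shares = {
        "female, 25 or older, white, single, lower middle class": 0.25,
        "male, under 25, white, single, middle class":            0.40,
        "all other profiles":                                     0.35,
    }
    total_sample_size = 400  # illustrative target

    # Allocate the sample in direct proportion to each profile's share,
    # mirroring the 25 percent example in the text.
    allocation = {profile: round(share * total_sample_size)
                  for profile, share in population_shares.items()}
    print(allocation)  # 100, 160, and 140 subjects respectively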
Living with Something Called “Error”
Aside from the mechanics involved with selecting a sample of subjects is the realization
that every form of design has some sampling error. The typical amount of sampling error
associated with surveys varies from 3 to 5 percent. The amount of error a project is
allowed to tolerate is proportional to the number of persons associated with the project
and the amount of “infrastructure” support that is available.
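
For a simple random sample, the 3 to 5 percent figure can be related to sample size with the standard margin-of-error approximation sketched below; the 95 percent confidence level and the sample sizes are illustrative assumptions, not prescriptions from this primer.

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        """Approximate 95 percent margin of error for a proportion p
        estimated from a simple random sample of size n."""
        return z * math.sqrt(p * (1 - p) / n)

    # Samples of a few hundred to roughly a thousand respondents give the
    # 3 to 5 percent error range mentioned above.
    for n in (385, 600, 1068):
        print(n, f"{margin_of_error(n):.1%}")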
DESIGN ISSUES
There are several types of research designs that lend themselves to assessment projects:
experimental versus control groups, pre- post-test, collection of students, longitudinal or
mixed model. Each design will be briefly discussed.
Experimental vs. Control Group
The experimental versus control group design is one of the most widely known research designs.
The underlying principle of the experimental design is the ability to assess the
intervention(s), or the lack thereof, with a group of subjects. As is commonly known, the
control group does not receive the intervention whereas the experimental group does.
The pre-/post-test design gives campus personnel the opportunity to evaluate the
intervention results by using a test at the beginning of the semester as a comparison with
data acquired at the conclusion of the semester.
Collection of Students
The design using a collection of students involves identifying a unique characteristic,
such as students who have completed Comp I, have completed 50 credit
hours, or have an ACT score of 24 or above. Students meeting the selection
requirements become project subjects.
Longitudinal Design
An increasingly popular design is the longitudinal study. Longitudinal designs involve
identifying subjects on the basis of participation in a particular course, program, and/or
course of study. Data elements collected for the longitudinal design involve those aspects
that directly relate to the learning topic being examined. The length of a longitudinal
study is determined based on the needs of the project.
Mixed Model
On the other hand, the mixed model involves using a combination of both qualitative and
quantitative data. Mixed models are used with many of the designs previously discussed.
Mixed models provide an excellent opportunity to link the qualitative and quantitative
data being collected in this process.
METHODOLOGICAL CONSTRUCT
The term methodological construct may appear to be misleading, but the writer uses the
term to refer to “how the data is collected”. There are many different means for
collecting data: written data, objective data, rating scales, sliding scale data, proportional
data, norm referenced measurements, and qualitative versus quantitative data.
Written Data
Written data refers to data collected through the use of “controlled prompts”, open-ended
responses to assessment surveys or classroom assignments linked with larger assessment
projects. For written assessment data to be of value, considerable “up-front” time must
be used to develop scoring criteria and appropriate score values. Categorization of
written data, also relevant when using qualitative analysis, requires rigorous training in
order to “norm” responses to scale values. Inter-rater reliability is obtained from making
a concerted effort to insure that multiple readers/raters assign a score to a subject’s
writing that falls within the same scale value or does not vary more than one score value.
Without a control for inter-rater reliability written assessment data becomes suspect for
use institution-wide.
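
A rough sketch of the inter-rater check described above, using invented rater scores, tallies exact agreement and agreement within one scale value for two readers.

    # Hypothetical scores from two raters for the same ten essays (1 to 6 rubric).
    rater_a = [4, 3, 5, 2, 4, 6, 3, 4, 5, 2]
    rater_b = [4, 3, 4, 2, 5, 6, 3, 3, 5, 2]

    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

    # Working rule from the text: scores should fall within the same scale
    # value or differ by no more than one score value.
    print(f"Exact agreement: {exact:.0%}")
    print(f"Agreement within one score value: {adjacent:.0%}")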
Objective Data
Objective data are those data that are presented with a multiple-choice theme, and are
typically used to identify the amount of knowledge possessed. This format is frequently
used in the development of local assessment instruments. These instruments may be
associated with specific courses and/or faculty that are using these data as a link with
larger assessment projects. Prior to using locally developed instruments such as these, it
is necessary to conduct the appropriate reliability and validity checks. It should be noted
that some commercially produced assessment instruments also use the objective format.
Rating Scales
Rating scales refer to the use of specific-point scale values to distinguish subject opinion
or knowledge level. Likert-type scales are “fixed point” scales when subjects complete
the instrument; however, during analysis an item receives a score value that appears to be
sliding or continuous. Readers should note that Likert-type scales are constructed so
subject opinion or perception is determined from the mid-point of the scale (neutral or
undecided), rather than the anchorage points (extreme lower and upper score). Scores
obtained from a rubric need to be recorded as an integer, unless provisions have been
made to account for “sliding” scores. As has been mentioned, significant “up-front”
discussion needs to occur prior to implementation so the assessment instrument does not
yield data values that create more confusion than answers.
Survey Data
Surveys as a means to collect assessment data have as many advocates as they have critics.
The prevailing wisdom is to view survey data as supplemental to other forms of
assessment data. Unfortunately, surveys have received a great deal of criticism that is
unwarranted when viewing surveys as a methodological design. In this writer’s
experience, most surveys developed and administered fail to meet the lowest level of test
for appropriateness. For some persons a survey consists of the best questions they can
think of, typed using the latest word processing “gee whiz” format to make it look
professional. If surveys are to be used as an assessment tool, considerable time and
energy needs to be expended to insure: 1) the information desired can be obtained by
using a survey design, 2) questions/statements have been clearly thought-out and written
clearly, 3) the survey has been field-tested on a group of respondents that mirror those for
whom the survey is written, 4) appropriate reliability and validity tests have been
administered, and 5) the data collected can be linked with and support larger sets of
assessment data.
IDENTIFYING MEANS OF MEASUREMENT
As mentioned previously, defining the means of measurement for an assessment project
is one of the most important steps in the process. The following four types of
measurements are possible options for most assessment projects.
Fixed Scale Data
The first type is fixed scale data, or data that have numerical points that are integers.
A discussion topic for this type of scale is whether or not an individual score can have a
partial value, e.g., a decimal point. Typically, fixed scale data points should be viewed as
having meaning or value unto themselves; that is, a 1 is a 1, and multiple scores are
summed rather than averaged. This rule is violated frequently.
Sliding Scale Data
The sliding scale is the second type of measurement. As mentioned previously, sliding
scales allow subjects to place their opinion or knowledge level at any point, fraction or
otherwise, along the scale.
Using Proportions
Proportions as a measurement provide a simple way of determining the knowledge or
opinion of a group of subjects. The precision of the proportion is determined by the sum
of what the proportion represents. Many faculty prefer this because of current grading
practices.
Norm Referenced Measurements
Norm referenced measurements represent a score that allows local subjects to be
compared with a larger population regionally or nationally. These scores are derived
externally and are generally very reliable.
Criterion Referenced Measurements
Criterion referenced measurements are questions or items that are written to a specific or
predetermined set of criteria. The criteria become the standard against which students
are measured.
Qualitative vs. Quantitative Data
Quantitative data refers to numerical information that explains events, abilities and/or
behavior that requires precision. Qualitative data refers to observational information such
as color, texture or clarity of some object or place. Qualitative data is desirable when
describing or measuring something that lacks inherent precision. Many times qualitative
data is used to construct instruments from which quantitative data is eventually collected.
Both have a place in assessment program measurement.
LOOKING AT DIFFERENCES
Once a process has been developed, instruments implemented, and data collected
researchers have the task of examining the results. The research plan provides an outline
of the analytical steps necessary to demonstrate the degree to which learning has
occurred.
Significant vs. Meaningful Differences
One method of determining "learning growth" is the use of significance tests. There are a
variety of tests that can be used and numerous reference materials that explain their
meaning and use. What is of importance to the current topic is the distinction of what is
significant and what is meaningful. Identification of significance is established during
initial discussion about the assessment project and is stated within the research plan.
Significant difference is a difference between two statistics, e.g., means, proportions,
such that the magnitude of the difference is most likely not due to chance alone (Wilson,
1980). Values associated with significant differences are normally thought of as .05
(the result occurs by chance less than 5 times in 100), .01 (less than 1 chance in 100), or
.001 (occurs by chance less than 1 in 1,000). Many tend to view assessment results
strictly in terms of significant difference and are disappointed if those results are not
obtained. What should be recommended is viewing what researchers categorize as
"meaningful differences". Meaningful differences refer to differences that fall within the
range of .15 to .06 (the result occurs by chance less than 15 to 6 times in 100). The differences are worthy
of note and represent a meaningful change among and/or between subjects. Meaningful
differences are especially noteworthy for assessment projects that examine growth or
change within short periods of time, e.g., one semester, a year or two years. Change that
occurs over short periods of time is less dramatic and must be measured with a
"learning ruler" marked in fine increments. Viewing assessment through smaller
increments of change is more realistic and more reflective of what occurs within shorter
intervals of time.
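
As a hedged illustration, the sketch below uses invented scores and a standard two-sample t-test (SciPy is assumed to be available; the primer does not prescribe a particular test) and then applies the significant/meaningful distinction described above.

    from scipy import stats  # assumes SciPy is installed

    # Hypothetical scores from two course sections on the same assessment.
    group_a = [72, 78, 81, 69, 74, 77, 80, 73, 76, 79]
    group_b = [75, 82, 84, 78, 80, 79, 85, 77, 81, 83]

    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    if p_value < 0.05:
        label = "statistically significant"
    elif p_value <= 0.15:
        label = "meaningful (within the .06 to .15 range discussed above)"
    else:
        label = "neither significant nor meaningful"
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: {label}")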
Control for Effect
Another aspect of "looking at differences" is qualifying change by "controlling for
effect". Controlling for effect is an important step in the interpretation of results from an
experimental/control group or evaluation study. "Effect size" is a process used to see
how much of the standardized mean difference is attributed to the intervention as
compared with differences generally found for this intervention or activity.
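
One common way to express a standardized mean difference is Cohen's d; the sketch below (with invented scores) is offered as one reasonable reading of "controlling for effect", not as the primer's prescribed procedure.

    import statistics

    def cohens_d(experimental, control):
        """Standardized mean difference between two groups (pooled SD)."""
        n1, n2 = len(experimental), len(control)
        s1, s2 = statistics.variance(experimental), statistics.variance(control)
        pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
        return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd

    experimental = [78, 82, 85, 80, 84, 79, 83, 81]
    control = [74, 77, 79, 73, 78, 75, 76, 74]
    print(f"Effect size d = {cohens_d(experimental, control):.2f}")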
Pre-Test to Post-Test Differences
Assessment projects look for differences between groups of subjects through testing for
prior intervention and post intervention; e.g., pre- and post-testing. The use of pre- and
post-testing can serve an assessment project in several ways: 1) pre-tests provide the
basis against which later tests are compared; 2) the legitimacy of the intervention is
framed from pre- and post-test data; 3) differences between the two test scores form the
basis of reporting change or learning, and 4) change can be identified through differences
in test score means or the growth attributed to the distance between scores.
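
A minimal pre-/post-test sketch, assuming matched pre and post scores are available for each student (the values are invented), reports individual gains and the change in means.

    import statistics

    # Hypothetical matched pre- and post-test scores for eight students.
    pre = [55, 62, 48, 70, 66, 59, 73, 61]
    post = [63, 70, 55, 74, 72, 66, 78, 65]

    gains = [after - before for before, after in zip(pre, post)]
    print("Individual gains:", gains)
    print("Mean gain:", statistics.mean(gains))
    print("Pre-test mean:", statistics.mean(pre), "Post-test mean:", statistics.mean(post))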
Reliability vs. Validity
Construct Validity is the degree to which inferences can be made from
operationalizations in a study to the theoretical constructs on which those
operationalizations were based (Vogt, 1993). In other words, if your study is based on a
theoretical notion about some aspect of student learning, it is assumed that as the study
was developed the context of the study is defined operationally (specific aspects of
learning to be studied). At the conclusion of the study if inferences or predictions about
learning from the operational aspects of the assessment can be linked to the theoretical
constructs, then the project has construct validity. On the other hand, content validity
refers to the degree to which items accurately represent the “thing” (behavior, learning,
etc.) being measured. Reliability refers to the consistency or stability of a measure
from one administration to the next. If the results are similar, the instrument is said to be
reliable. Two common techniques for estimating reliability are the KR-20 and Cronbach’s Alpha.
Reliability rates are provided by a two-point decimal value with a .60 being considered as
minimal for most studies. A reliability of .60 means that 6 out of 10 persons view
attributes of the study similarly.
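
Cronbach's Alpha can be computed directly from an item-score matrix using the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); the five-student, four-item score matrix below is invented for illustration.

    import statistics

    # Hypothetical item scores: five students (rows) by four items (columns).
    scores = [
        [4, 3, 4, 5],
        [2, 2, 3, 3],
        [5, 4, 5, 4],
        [3, 3, 4, 4],
        [1, 2, 2, 2],
    ]

    k = len(scores[0])                         # number of items
    items = list(zip(*scores))                 # columns become item tuples
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in scores])

    alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
    print(f"Cronbach's alpha = {alpha:.2f}")   # compare with the .60 minimum noted above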
IMPLEMENTATION ISSUES
Establishing Analytical Credibility
During the analytical phase of an assessment project two questions resonate: 1) does the
design have analytical credibility?; and 2) how is the data (or score) generalizable to the
fundamental questions of assessment? To answer question one, nine points are presented
to provide the framework for determining analytical credibility:
• Answer the Question but First Define the Question
Making the statement to define questions before seeking answers probably seems like
questionable advice. If assessment projects “get into trouble” it typically is a result of
varying from the identified question or focus with an excursion to “it would be nice to
know land”. Defining the question before seeking to answer the question, especially
during the project development stage, allows the linking of literature review, colleague
discussion, question definition and agreement to occur prior to implementation.
• Provide Information According to its Purpose
Research and assessment projects have the potential for voluminous and complex
analytical reports containing facts, figures, data tables and graphs, ad nauseam. A general
rule is to provide the minimum information necessary to fulfill a request or answer the
assessment question, uncluttered with excessive verbiage and unnecessary analysis.
Without question members of the assessment project group should have detailed
information as well as synthesis about the project in which they are involved. Providing
the “right fit”, regarding the amount of information needed to make judgments about the
project, needs to be part of the routine business discussed at assessment committees or
project teams.
• Match your Information with its Recipients
The discussion of who receives information is as important as purpose and content.
The question of who is to be excluded from the distribution list is as important as that of
who should be included. When communicating the results of assessment it is essential to
build a data portrait of assessment recipients. Most faculty and administrators have
preferences for particular types of data; some prefer data in tabular form, graphs,
narrative, and/or executive summaries. It is clear that most assessment initiatives lack the
resources to produce five different versions of a report to fit the preferences of different
persons. Knowing how strong particular preferences are for information and adjusting
accordingly will enable the assessment “feedback loop” to be more effective.
• Beware of the Perils of Printout Worship
When many persons think of research and assessment projects a picture comes into focus
of several people carefully examining several inches of computer output while engaged in
an intense discussion. While it is true many forms of analysis produce large stacks of
computer output, it would be unwise for any learning topic group to distribute the
computer output without some synthesis. Regardless of the significance of the work at
hand, very few decision-makers, faculty or administrators, will sift through reams of
output to find the project produced significant results.
• Keep Focused on what you are Attempting
The origin of this phrase comes from years of experience watching research and
assessment committees lose focus on their projects, and consequently spend inordinate
amounts of time discussing issues long since agreed upon. When a project team loses focus
on its goals, work stalls and the subsequent inertia provides the catalyst for individuals to
re-engage in academic debate that attempts to redefine a project as implementation is taking
place. The best analogy for this is a dog chasing its tail. Assessment projects should
never be implemented as long as there are questions about their intent, theoretical
framework, design or implementation procedures.
• Develop a Research Plan and Follow It
Projects that have a research plan demonstrate to the campus community that thought and
deliberate action were principal agents of its creation. Likewise, following the
established research plan demonstrates professional and methodological credibility.
Research plans are dynamic and they can be changed to adjust to circumstances and
events that necessitate change. It is imperative that if a change in the plan does occur
those changes should reflect sound methodological judgment. Research plans that are
continually in a state of flux provide little data that is worthwhile or effective. The rules
of reliability and validity must always be the plan’s fundamental tenets.
• Take Time to Summarize
For research and assessment information to be helpful it is necessary for it to be
summarized in some manner. It would be helpful to the campus audience if the
summary contained narrative as well as tabular data. For instance, if a summary report is
distributed and the principal text is a table highlighting a mean (average) score of 3.543,
several questions need to be answered. First, what does a mean (average) score of 3.543
mean, and from what context does this score emerge? Issues pertaining to the score can
be defined by explaining the “context” for the score from the section in the assessment
project plan that pertains to score range. Second, when viewing the score 3.543, how is
this score different from 3.5 or 3.54? It is important to have an agreed upon style for
reporting assessment data that does not suggest or infer artificial precision. An example
of artificial precision is using percent figures containing decimal points, such as 86.3
percent. Is it necessary to provide data points that contain two proportions (e.g., 86 with
regard to 100 and .3 with regard to 10)?
• Make an Effort to “Do It Right”
No one doubts research and assessment projects take time, financial resources, as well as
physical and emotional energy. If an activity consumes this much energy it would seem
logical for participants to insure assessment meets the “do it right” criterion. “Doing it
right” should take on a practical orientation that is supportive of multi-methodologies and
design philosophy. Assessment projects that are part of an overall institutional
assessment initiative and support the “assessment-oriented perspective” produce
information about learning that assists with the evaluation of curriculum.
Another aspect of “doing it right” is to follow sound methodological practices. Earlier it
was stressed that discussion about the learning topic, literature review and identification
of assessment goals provide the basis for determining the methodologies used. Following
sound or accepted practices sends a message throughout the institution that assessment
projects have credibility. Credibility also occurs when the assessment initiative has
faculty buy-in. If faculty do not legitimize the assessment process, it is unlikely that
anyone else will view the efforts as meaningful. The campus assessment initiative needs
to be inclusive; that is, it should involve faculty at all levels. Critics as well as
“believers” should have equal seating at the assessment table. It is important for
assessment committees to meet “head-on” issues or questions raised by the critics of
assessment. Excluding critics from assessment discussions only strengthens their resolve
and intensifies attacks. If the issues raised by critics have merit then those issues should
be examined and discussed by the campus community. An action that lessened tension at
my institution was to encourage the district assessment steering committee to develop a
statement on assessment. A statement was developed and endorsed by the faculty senate
at one of their business meetings. The statement on Ethical Conduct and Assessment
provides the institution’s philosophy on assessment as well as an understanding of how
assessment data will be used. Developing and following the research plan, involving
faculty in its development and implementation, making curriculum decisions based on the
assessment data, and providing feedback to all levels of the institution meets the criteria
for an institution “doing it right”.
• How Much Data is Enough? Or When Does Baseline Data Cease Being Baseline Data?
All assessment initiatives, regardless of institutional size, must make the decision
regarding “how much data is enough”? Prior to answering this question there are several
issues that must be discussed. First, there are statistical concerns that include replication,
sampling error, and methodological viability. The second issue is manageability.
Manageability includes what is practical in terms of human resources, burnout, and
financial costs. As mentioned in the section, “How Many Subjects”, a plan for consistent
assessment of student learning is of greater value to an institution than massive testing
that runs for several semesters, stops and lacks applicability to the overall assessment
initiative.
HOW GENERALIZABLE ARE THE SCORES
Asking the question about how generalizable the scores are to the fundamental questions
of assessment (reference step 2, above) provides the basis for the project’s external
validity. Establishing the relationship between assessment scores and the assessment
instrument answers “to what population(s), settings, treatment variables, and
measurement variables can the effect (outcome) of the activity be generalized to other
learning” (Leedy, 1974:150)? This section poses three questions as a way to explain the
link between generalizable scores with assessment questions.
• Do the Scores Suggest Learning has Occurred?
During the development phase of the assessment program faculty engage in discussions
that establish the theoretical framework of the project as well as defining what constitutes
learning. A majority of the time evidence of learning is attributed to an assessment score.
The score must be viewed within the context it was created as well as through the
rationale used to create the meaning of each score value.
• Does the Learning Relate to the Larger Question of [component] Assessment?
This question seeks to establish whether or not an assessment activity (or score) can be
used to provide evidence of more global learning. For instance, if a group of faculty
develops a project to examine an attribute of critical thinking, will the score obtained
through its assessment provide credible evidence or data that links the project’s activity
with the established components for critical thinking institution-wide? Administrators
and faculty need to recognize that assessment projects scattered throughout the institution
with little or no linkage to the overall assessment initiative cannot be classified as an
effective assessment program.
• Does the Activity Producing the Score Suggest General or Specific Learning?
This question asks whether, when a group of subjects completes an assessment activity
and produces a set of scores, the basis for those scores is a series of questions that
yields evidence of general learning or of specific learning. For example, when a group
of subjects completes an assessment activity for critical thinking, do the scores reflect the
subjects’ general knowledge about critical thinking, or can specific evidence be extracted
from the assessment to identify knowledge of deduction, inference, and so forth?
Distinguishing general knowledge from specific component knowledge provides the
basis for what could be termed a “robust” assessment program.
GATHERING AND MANAGING RESOURCES FOR ASSESSMENT
Whose Responsibility is Assessment Anyway?
Several months ago there was considerable discussion on an assessment listserv
regarding the responsibility for assessment. In the course of the dialogue, the tally was
roughly even between faculty having the responsibility and administrators being
responsible. However, most agreed that it is all our responsibility. Faculty should be
responsible for developing the institutional assessment program, with the administrators
responsible for obtaining the financial resources. In addition to financial resources, it is
also the responsibility of the administration to provide technical assistance for the
assessment initiative. It is understood that not all institutions have the ability to support
assessment in a multi-level fashion; that is, provide financial, human and methodological
resources.
Obtaining Human Resources for Assessment
Institutions not able to hire an assessment director or coordinator may have to look at
other options for the technical and human resources needed to support the assessment
program. Several options exist for institutions with limited resources.
Consultants
Institutions can hire a consultant to visit the campus periodically to monitor assessment
progress. Consultants are able to bring a considerable amount of knowledge and
experience to an assessment program in fairly short order. However, the limitation of a
consultant, especially if he or she is not local, is that they are not always available when
assessment questions or problems arise. If a consultant is used, institutions must insure
that the momentum for assessment does not wax and wane with consultant visits.
Institutional Employee
Another option is identifying an existing institutional employee to coordinate the
assessment program. If an institution chooses to use a current employee, he or she should
have several characteristics: 1) must be a faculty member; 2) must have the respect of
his/her colleagues; and 3) he or she should be freed from a majority of their teaching
assignment. The institution should be willing to invest some of its resources to provide
the faculty member with fundamental knowledge about assessment as well as funds to
bring in experts from time to time. Of course, the advantage of using a current employee
is that they are on campus every day. There is one caution about naming an employee as the
“assessment person”. It is too easy for campus personnel to assume the “assessment
person” is responsible for administering, analyzing and writing assessment project
reports. Therefore, it becomes too easy for faculty and others to assume their role is to
respond to the work of others rather than actively engaging in the assessment program.
Assessment Committee
Many institutions create an assessment committee that has representatives from the
faculty, staff and administration to monitor the assessment program. Providing a
“charge” that outlines expectations for the assessment committee insures the work of the
group is meaningful. For instance, the assessment committee is charged with:
• Determining how the assessment program functions;
• Clarifying the role faculty play in its operation;
• Identifying what measures and standards have been proposed and adopted for assessing student learning; and
• Stating how the results of assessment are used to identify changes that may be needed if student learning is to improve.
Assessment Consortium
A different option is to network with a group of colleges and establish an “assessment
consortium”. A consortium has several advantages in that expenses are shared and a variety
of personnel may be available as an assessment resource rather than a single consultant.
Collective knowledge and collaboration are primary benefits of assessment through a
consortium. A limitation of a consortium is that it is only as strong as the
commitment of the institutions involved.
External Agency
A fifth option is to have a testing organization/company analyze institutional data. If a
testing organization option is chosen, it may necessitate an institution modifying its
assessment strategies to use commercially produced assessments. Testing organizations
can provide a considerable amount of technical expertise in the field in very short order;
however, affiliation with a “for profit” business may create obligations for using their
products. Technical expertise is available but may be by telephone or email rather than
on-site.
Avoiding the HINIBUS Syndrome
One shortcoming of using persons who are not employees of your institution is that it may result in
a “lukewarm” commitment to assessment because of comments like “Dr. Smith is not a
member of our faculty”, “it’s not our data”, or “the data are more supportive of the needs
of outsiders than our needs”, and so forth. Institutions should be careful not to fall for the
“it’s not our data” objection when “outsiders” analyze institutional assessment data.
It is too easy for assessment momentum to be lowered by questions of doubt or
allegations of inappropriate use of institutional assessment data. This is what can be
called the HINIBUS Syndrome, or “Horrible If Not Invented By Us”. Faculties need
to be cautioned that it is not necessary to independently invent all assessment procedures
or activities. Using a combination of locally developed and norm referenced assessments
complements most assessment programs. Readers should note these are issues that need to
be discussed during the steps reported in the section on Development of Assessment
Projects.
A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES
Embedded Course Assessment
The term “course embedded assessment” refers to linking classroom activities and
assignments to the assessment of a common learning outcome. The outcome is linked to
what students are already learning in class, thereby taking advantage of existing
[curricular offerings] that instructors collect or by introducing new assessment measures
into courses. To successfully embed assessment measures into existing assignments, the
following sequence of activities is recommended:
• Specify intended outcomes;
• Identify related courses;
• Select measurements and techniques;
• Assign techniques to course and embed measures;
• Specify assessment criteria;
• Evaluate student performance on exams, papers, projects, etc., for course grades;
• Evaluate student performance on course embedded measures.
(Larry H. Kelley, Workshop on Embedded Assessment. Kelley Planning and Educational Services, LLC).
The most commonly used embedded assessment methods involve the gathering of
student data based on questions placed within course assignments. These questions are
intended to assess student outcomes and are incorporated into course assignments or
requirements, such as, final examinations, research reports, course projects or some type
of demonstration. Student responses are typically graded by at least two faculty members
in order to determine whether or not the students are achieving the prescribed learning
goals and objectives. It should be noted the embedded assessment is a different process
from that used by the course instructor to grade the course assignments, exams, or papers.
There are several advantages to using course embedded assessments:
• Student information gathered from embedded assessment draws on accumulated education experiences and familiarity with specific areas or disciplines.
• Embedded assessment often does not require additional time for data collection, since instruments used to produce student-learning information can be derived from course assignments that are currently part of the requirements.
• The presentation of feedback to faculty and students can occur quickly, creating an environment conducive to ongoing programmatic improvement.
• Course embedded assessment is part of the curricular structure, and students have a tendency to respond seriously to this method.
(Blue Ridge Community College Student Outcomes Assessment Manual: A
Guide for the College Community).
Using Norm-referenced Tests
Norm-referenced tests refer to instruments that are designed and administered to large
groups of students. The collective responses of these students represent learning
associated with the student sample and the test; the result is a mean (average)
response. After the test has been administered many times, and with each administration the
instrument has been subjected to rigorous item and content validity and reliability checks, the test is
considered “normed” and becomes the reference point for all students taking the test. The
means for students at your campus can then be compared with those of all students who have
taken the test. Norm-referenced tests are often developed by testing companies, which
employ “experts” to develop the test items. The assumption regarding
norm-referenced tests is the specific test, subtest or module content is considered to be
what all students should know about a given topic. Tests, subtests or modules are
normally categorized or named in general terms, such as: Reading or Reading
Comprehension, Critical Thinking, Scientific Reasoning, and so forth.
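To make the comparison concrete, the Python sketch below shows one common way a
campus mean might be set against a published norm using a z-score. Every number here,
including the publisher's norm mean and standard deviation, is hypothetical; the actual
comparison should follow the test publisher's technical manual.

```python
# A minimal sketch; every number here is hypothetical.
from math import sqrt
from statistics import mean

campus_scores = [58, 61, 55, 63, 59, 62, 57, 60]  # hypothetical local results

national_norm_mean = 60.0  # hypothetical value from the test publisher's norms
national_norm_sd = 5.0     # hypothetical value from the test publisher's norms

campus_mean = mean(campus_scores)

# Standard error of the campus mean, using the publisher's norm standard deviation.
standard_error = national_norm_sd / sqrt(len(campus_scores))

# z indicates how many standard errors the campus mean sits above or below the norm.
z = (campus_mean - national_norm_mean) / standard_error

print(f"Campus mean: {campus_mean:.1f}, national norm: {national_norm_mean:.1f}, z = {z:.2f}")
```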
Commercially Produced Tests (similar to norm-referenced tests)
Commercially produced tests and examinations are used to measure student competencies
under controlled conditions. These tests are typically developed by professional
organizations and companies to determine the level of learning a student should acquire
in a specific field of study. Commercially produced tests generally consist of
multiple-choice questions whose results can be used to compare local students with
students from institutions across the country. If properly chosen, the results from these
tests can be used to improve teaching and learning. The most notable advantages of
commercially produced tests are:
• Institutional comparisons of student learning;
• Little professional time is needed beyond faculty efforts to analyze examination results and develop appropriate curricular changes that address the findings;
• Nationally developed tests are devised by experts in the respective field;
• Tests are typically administered to students in large groups and do not require faculty involvement while students are taking the exam.
The strongest criticism of commercially produced tests is that they may not reflect the
institution’s curriculum. Test design and content should reflect an institution’s
curriculum in order for the results to be helpful.
Criterion-Referenced Tests
Criterion-referenced tests are designed to measure how well a student has learned a
specific body of knowledge and skills. Multiple-choice tests, similar to a driver’s license
test, are examples of criterion-referenced tests. Criterion-referenced tests are usually
designed to determine whether or not a student has learned the material taught in a
specific course or program. Criterion-referenced tests used within a course are designed
to test the information learned from the course as well as the instruction that prepared
students for the test. The principal use of criterion-referenced tests comes from a pre-
and post-test design that determines how much students know before instruction begins
and after it has finished. The test measures the specific skills that make up the
designated curriculum. Each skill is expressed as an instructional objective, and each
skill is tested using at least four items in order to obtain an adequate sample of student
performance and to minimize the effect of guessing. The items that test any given skill
are parallel in difficulty. Each student’s score, normally expressed as a percentage, is
compared with a preset standard for acceptable achievement, with little regard for the
performance of other students. [Source: The National Center for Fair & Open Testing
and Educational Psychology Interactive.]
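The pre-/post-test logic described above can be illustrated with a short sketch. In the
Python example below, the passing standard and the student scores are hypothetical; an
actual program would substitute its own preset standard for acceptable achievement.

```python
# A minimal sketch; the standard and all scores are hypothetical.
PASSING_STANDARD = 80  # hypothetical preset percentage for acceptable achievement

# Hypothetical results: {student: (pre_test_percent, post_test_percent)}
results = {
    "student_a": (45, 85),
    "student_b": (60, 78),
    "student_c": (50, 92),
}

for student, (pre, post) in results.items():
    gain = post - pre
    status = "meets standard" if post >= PASSING_STANDARD else "below standard"
    # Each student is judged against the preset standard, not against classmates.
    print(f"{student}: pre {pre}%, post {post}% (gain {gain}) -> {status}")
```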
Portfolio
A portfolio is normally considered to be a collection that represents an individual’s
cumulative work at a given point in time. In assessment terms, a portfolio is a collection
of student work that exhibits to faculty a student’s progress and achievement in specified
areas. For instance, a student’s portfolio could include written papers (term papers,
reports, etc.) accompanied by a reflective piece written by the student; results from a
comprehensive or capstone examination; and norm-referenced exam results, such as the
WGCTA, Cornell X, CAAP Critical Thinking, or CollegeBASE. For a vocational student,
the portfolio may consist of pieces of machinery the student designed and built; a
collection of computer programs; field reports about specific activities or procedures; or
a set of drawings that demonstrate the student’s knowledge.
A portfolio can be collected over the student’s experience at the institution (e.g., one
year, several semesters, etc.) so faculty can evaluate the full scope of a student’s work.
In particular, the longitudinal aspect of evaluating portfolios allows faculty to “see” the
academic growth of students as they progress through the institution. Central to the
evaluation of a student portfolio is a scale or rubric with which to grade the material or
artifacts. The criteria for grading the portfolio need to be in place prior to the formal
evaluation of a student’s material or artifacts. The proliferation of modern technology
has provided new ways to store written and visual information, such as on a disk, CD,
or webpage.
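As a purely illustrative aid, the following Python sketch shows one way portfolio
artifacts might be recorded so that the longitudinal view described above is easy to
produce. The field names and categories are hypothetical, not a prescribed structure.

```python
# A minimal sketch with hypothetical field names and categories.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Artifact:
    title: str     # e.g., "Term paper with reflective piece"
    term: str      # term in which the work was produced, e.g., "2002-FA"
    category: str  # e.g., "written work", "exam result", "project"

@dataclass
class Portfolio:
    student_id: str
    artifacts: List[Artifact] = field(default_factory=list)

    def by_term(self) -> List[Artifact]:
        """Return artifacts in chronological order so growth over time is visible."""
        return sorted(self.artifacts, key=lambda artifact: artifact.term)

portfolio = Portfolio("A00123")
portfolio.artifacts.append(Artifact("Term paper with reflective piece", "2002-FA", "written work"))
portfolio.artifacts.append(Artifact("Capstone examination results", "2003-SP", "exam result"))

for artifact in portfolio.by_term():
    print(f"{artifact.term}: {artifact.title} ({artifact.category})")
```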
Scoring Rubrics
Rubrics are a popular means for assessing student learning because, with proper training,
they can be a consistently reliable way to assess essays, papers, projects, and
performances. A rubric contains descriptions of the traits a student’s work must have to
be considered “high-level”, “acceptable”, or “poor quality”. A rubric can contain several
categories of student ability, such as comprehensibility, usage, and risk taking and
variety, to name a few. Within each rubric category (e.g., risk taking) there are multiple
“levels” of learning a student can display. An analytic rubric measures each part of a
student’s work separately, whereas a holistic rubric combines the parts into a single
overall judgment.
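The distinction between analytic and holistic scoring can be illustrated with a brief
sketch. In the Python example below, the trait names echo those mentioned above, but
the four-point scale and the level labels are hypothetical, and averaging trait scores is
only one of several ways a holistic judgment might be reached.

```python
# A minimal sketch; the 1-4 scale and the level labels are hypothetical.
LEVEL_LABELS = {1: "poor quality", 2: "developing", 3: "acceptable", 4: "high-level"}

# Hypothetical analytic scores for one student essay, one level per trait.
analytic_scores = {
    "comprehensibility": 3,
    "usage": 2,
    "risk taking and variety": 4,
}

def holistic_score(trait_scores: dict) -> int:
    """Collapse the separate trait ratings into a single rounded overall level."""
    return round(sum(trait_scores.values()) / len(trait_scores))

for trait, level in analytic_scores.items():
    print(f"{trait}: {level} ({LEVEL_LABELS[level]})")

overall = holistic_score(analytic_scores)
print(f"holistic: {overall} ({LEVEL_LABELS[overall]})")
```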
Classroom Assessment Techniques (CAT)
Classroom Assessment Techniques (CATs) are methods faculty use to obtain useful
information, or feedback, on what, how much, and how well their students are learning.
The feedback can be as simple as asking students to 1) list the most important thing they
learned today; 2) identify what they are most confused about; or 3) describe what
additional information they would like about what was discussed today. This process was
popularized by Angelo and Cross in their work Classroom Assessment Techniques,
published by Jossey-Bass in 1993.
Capstone Courses
Capstone courses are designed to integrate the knowledge, concepts, and skills associated
with a complete sequence of study in a program. The method of assessment is to use the
courses themselves as the instrument and basis for assessing teaching and student
learning. The evaluation of a student’s work in capstone courses serves as a means of
assessing student outcomes. The capstone course becomes the forum in which students
display the knowledge gained through the various aspects of their programmatic
experiences. Whether a single capstone course or several capstone courses are necessary
to adequately assess a student’s learning varies from program to program. Generally
speaking, capstone courses are the final experiences for students within a discipline or
program.
SUMMARY
The purpose of this paper was to provide an overview of the practical aspects of
conducting learning assessment in a campus community. The comments and suggestions
are derived from years of working with campus communities as they attempt to put their
“assessment house in order”. As mentioned previously, encouraging an institution to
create “an assessment-oriented perspective” is the first step in creating a campus climate
that is assessment friendly. The majority of comments made within this paper can be
summarized as a “back-to-basics” approach to fostering the assessment initiative and
developing assessment projects. Hopefully, the information contained within this paper
will provide readers with suggestions or techniques that make their experience with
assessment more rewarding, exciting, and productive.
REFERENCES
Angelo, Thomas A., and Cross, K. Patricia (1993). Classroom Assessment Techniques: A
Handbook for College Teachers. San Francisco: Jossey-Bass.
Kraemer, Helena Chmura, and Thiemann, Sue (1987). How Many Subjects? Newbury
Park, CA: Sage Publications.
Miller, Delbert C. (1991). Handbook of Research Design and Social Measurement.
Newbury Park, CA: Sage Publications.
Norris, Donald M. (1983). “Triage and the Art of Institutional Research.” The AIR
Professional File, Number 16, Spring-Summer 1983. The Association for
Institutional Research.
Vogt, W. Paul (1993). Dictionary of Statistics and Methodology: A Nontechnical Guide
for the Social Sciences. Newbury Park, CA: Sage Publications.
Wilson, Terry C. (1980). Researcher’s Guide to Statistics: Glossary and Decision Map.
Lanham, MD: University Press of America.