An Assessment Primer: A Guide to Conducting Assessment Projects

August 2003

Metropolitan Community College
Produced by the Office of Research, Evaluation and Assessment

A Resource Developed for the District Steering Committee for Institutional Assessment
General Education Outcomes Assessment Subcommittee
Subcommittee for Assessment of Occupational Program Outcomes
Student Development Assessment Subcommittee

The Metropolitan Community College District
Business & Technology College – Blue River Community College
Longview Community College – Maple Woods Community College
Penn Valley Community College – Administrative Center

Produced by the Office of Research, Evaluation and Assessment
August 22, 2003

TABLE OF CONTENTS

INTRODUCTION
   Why is Assessment Important?
   What is Assessment?
   Expectations of an Assessment Program
   Measures of Learning
   Expectations and Demands for Accountability

DEVELOPMENT OF ASSESSMENT PROJECTS
   Spend Time to Review the Literature and Discuss Issues about the Topic
   Develop Question(s) that Form the Basis of Inquiry
   Develop a Research Plan to Obtain the Data
   Develop Answers Based on the Data

METHODOLOGICAL ISSUES
   What to Assess
   Define Components
   Examine Component Intricacies
   Review Measurement Options
   Scales
   Rubrics
   How Many Subjects?
   Sampling
   Random Sampling
   Stratified Sample
   Representative Sample
   Living with Something Called Error

DESIGN ISSUES
   Experimental vs. Control Group
   Collection of Students
   Longitudinal Design
   Mixed Model

METHODOLOGICAL CONSTRUCT
   Written Data
   Objective Data
   Rating Scales
   Survey Data

IDENTIFYING MEANS OF MEASUREMENT
   Fixed Scale Data
   Sliding Scale Data
   Using Proportions
   Norm Referenced Measurements
   Criterion Referenced Measurements
   Qualitative vs. Quantitative Data

LOOKING AT DIFFERENCES
   Significant vs. Meaningful Differences
   Control for Effect
   Pre-Test to Post-Test Differences
   Reliability vs. Validity

IMPLEMENTATION ISSUES
   Establishing Analytical Credibility
   Answer the Question but First Define the Question
   Provide Information According to its Purpose
   Match your Information with its Recipients
   Beware of the Perils of Printout Worship
   Keep Focused on what you are Attempting
   Develop a Research Plan and Follow It
   Take Time to Summarize
   Make an Effort to "Do It Right"
   How Much Data is Enough?

HOW GENERALIZABLE ARE THE SCORES
   Do the Scores Suggest Learning has Occurred?
   Does the Learning Relate to the Larger Question of Assessment?
   Does the Activity Producing the Score Suggest General or Specific Learning?

GATHERING AND MANAGING RESOURCES FOR ASSESSMENT
   Whose Responsibility is Assessment Anyway?
   Obtaining Human Resources for Assessment
   Consultants
   Institutional Employee
   Assessment Committee
   Assessment Consortium
   External Agency
   Avoiding the HINIBUS Syndrome

A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES
   Embedded Course Assessment
   Using Norm-Referenced Tests
   Commercially Produced Tests
   Criterion-Referenced Tests
   Portfolio
   Scoring Rubrics
   Classroom Assessment Techniques (CAT)
   Capstone Courses

SUMMARY

REFERENCES

Preface

This document was developed to assist MCC personnel engaged in assessment or research activities. The contents are the result of the author's consulting with district faculty, staff and administrators and mirror the questions and concerns voiced by those persons as they engage in their research and assessment activities. Any questions or comments regarding this document should be directed to its author:

Dr. Charles L. Van Middlesworth
District Director, Research, Evaluation and Assessment
Metropolitan Community College
3200 Broadway
Kansas City, MO 64111
Telephone: (816) 759-1085
Fax: (816) 759-1158
Email: charles.vanmiddlesworth@mcckc.edu

An Assessment Primer

INTRODUCTION

Why is Assessment Important?

Since the late 1980s, much of higher education has been focused, with varying degrees of success, on assessing what students know. External mandates for institutional accountability have made colleges and universities shift their focus from teaching to learning. This is not to say that the assessment of student learning has not been taking place within the walls of academe; rather, the emphasis has not been on what students learn and on validating that students learn what we think they learn. Assessment is far more than external accountability. It is the process of gathering and using information about student learning to improve the institution, curriculum, pedagogy, faculty and students.

What is Assessment?

Assessment has been defined as the systematic gathering, interpretation and use of information about student learning for the purposes of improvement. Assessment has also been defined as a means for focusing our collective attention, examining our assumptions and creating a shared academic culture dedicated to continuously improving the quality of higher learning. Assessment requires making expectations and standards for quality explicit and public by systematically gathering evidence on how well performance matches those expectations and standards.
It also includes analyzing and interpreting the evidence of assessment and using the information to document, explain, and improve performance. Thus, the single unifying notion of this discussion is that assessment is a process not a product. Expectations of an Assessment Program For an assessment program to be considered “effective” it should contain the following features: 8/22/2003 1 • Structured; that is it should be organized and have a recognizable framework; • Systematic; it is conceived and implemented according to an assessment plan developed by the institution; • Ongoing; assessment activities and feedback are continuing rather than episodic; • Sustainable; that is, the institution is able to maintain the assessment program with the structures, processes, and resources in place; • A Process exists that uses assessment results to improve student learning. The assessment process will be “framed” through the questions an institution wants to know about its teaching and learning. This cannot be emphasized enough – questions will influence every decision made concerning the assessment process. Many assessment plan developers attempt short cuts by beginning with how the data is to be collected, rather than to discuss and question what they want to know. I have learned from years of experience that it is nearly impossible to solve a problem that has not been defined! It is as important to query what students have learned, but it is equally important to provide students with the option of “reflecting on their learning”. Assessment can refer to two different activities: the gathering of information and the utilization of information for improvement. From a practitioner point of view, the most important description of assessment is simply: 1) what do students know; 2) how do you know they know; and 3) what do you do with the information? As a term, assessment data has various meanings within higher education. The term can refer to a student’s course grade, a college placement test score, or a score on some form of standardized instrument produced by an external agency or organization. The information for assessment may be numerical, written reflection on previous work, results from internally or externally developed instruments or examinations, or assessments embedded within course work, to name a few. The principal goal of a program for the assessment of student academic achievement is to improve teaching and learning with evidence provided from an institution’s assessment program. Assessment 8/22/2003 2 information can stimulate changes in curriculum. A common misconception in higher education is that assessment consists of administering exams, assigning a course grade, scoring an assignment with a rubric or having a student demonstrate learning. Assessment is not an act but rather a process that includes developing a conceptual framework, identifying learning outcomes, developing a methodology for implementation, evaluating design issues, administration of the activity, analyzing the results, with the final step being using the information learned from the process to improve teaching and learning. This primer is designed to assist the reader as they become involved with their institution’s assessment program. 
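Because assessment is a process rather than a single act, some groups find it helpful to keep the pieces of a project in a single written record as the process unfolds. The sketch below is purely illustrative: the field names simply restate the process stages described above as a checklist and do not represent an MCC form or system.

    from dataclasses import dataclass, field

    @dataclass
    class AssessmentProjectPlan:
        """Illustrative record of the stages described above (hypothetical fields)."""
        learning_question: str                        # what do we want to know?
        outcomes: list = field(default_factory=list)  # intended learning outcomes
        measures: list = field(default_factory=list)  # direct/indirect measures chosen
        design: str = ""                              # e.g., pre/post, longitudinal, mixed model
        analysis_plan: str = ""                       # how the results will be analyzed
        use_of_results: str = ""                      # how findings feed back into teaching

    plan = AssessmentProjectPlan(
        learning_question="Can students completing Composition I write a coherent argumentative essay?",
        outcomes=["organization", "use of evidence", "mechanics"],
        measures=["embedded essay scored with a common rubric"],
        design="single group, end of semester",
        analysis_plan="two raters per essay; report mean rubric scores by outcome",
        use_of_results="share with the department; revise assignments where scores are weak",
    )
    print(plan.learning_question)

Keeping even an informal record of this kind makes it easier to check, at the end of a project, whether the data actually answer the question that was posed at the start.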
Measures of Learning Data collection methods include paper and pencil testing, essays and writing samples, portfolio collections of student work, exit interviews, surveys, focused group interviews, the use of external evaluators, logs and journals, behavioral observations, and many other research tools. Research methods should be tailored to the type of data to be gathered, the degree of reliability required, and the most appropriate measurement for the activity being conducted. In practical terms, there are two different types of learning measurement: direct measures of learning and indirect measures of learning. Direct Measures of Learning include pre- and post-testing; capstone courses; oral examinations; internships; portfolio assessments; evaluation of capstone projects; theses or dissertations; standardized national exams; locally developed tests; performance on licensure, certification, or professional exams; and juried reviews and performances. Indirect Measures of Learning might include information gathered from alumni, employers, and students; graduation rates; retention and transfer studies; graduate follow-up studies; success of students in subsequent institutional settings; and job placement data. The preference for assessment programs is to use direct measures of learning. 8/22/2003 3 What can institutions of higher education do to prepare themselves to meet the expectations or demands for accountability through the assessment of student learning? Institutions should create “an assessment-oriented perspective”. “An assessment-oriented perspective” exists when all levels of the institution become advocates for “doing what is right” and commit time, energy and resources to seriously examine student learning. Advocating an emphasis toward student learning enables the institution to place “at the head of the table” the single most important aspect of higher education, and that is learning. The role of faculty in this endeavor is paramount. The comment made most often by faculty is they are content not assessment specialists; however, faculty need to realize it is through their knowledge of the content area that learning questions are posed, discussed, defined and assessed. One method to use when engaging in the process of assessment is triage. Triage has been defined as the sorting of and assigning priority order to projects on the basis of where time, energy and resources are best used or most needed. In this case, triage refers to identifying a learning need, such as general education, discussing its attributes, defining its context, developing its component and learning outcomes and developing assessment strategies to answer those questions. First and foremost, all should recognize that assessment is research. Assessment is research because through the multi-stage research process judgments are made, and judgment translates into meaning. DEVELOPMENT OF ASSESSMENT PROJECTS For the last 10 years, institutions of higher education have spent considerable time and energy refining their “Plan for the Assessment of Student Academic Achievement”. When an institution’s plan was developed, submitted and accepted, there were probably some that did not think revisions would be necessary until the next accreditation visit. This is not true and by now it is widely known that some plans will require significant changes. A method of reviewing assessment plans and making adjustments that are timewise and appropriate are: 1. 
Spend time to review the literature and discuss issues about the topic 8/22/2003 4 2. Develop question(s) that form the basis of inquiry 3. Develop a research plan to obtain the data 4. Develop answers based on the data. Spend Time to Review the Literature and Discuss Issues about the Topic In general terms, the first step to building an effective assessment project is to spend time to review the literature and discuss issues about the topic. Without question, discussions regarding learning topics are more productive when efforts are taken to review pertinent literature. The literature review allows members of the “learning topic group” the ability to obtain a “theoretical perspective” of the topic as well as examine implementation of similar projects at like institutions. Having and discussing background information tends to keep the learning topic “in focus”. Develop Question(s) that Form the Basis of Inquiry Once the literature review and subsequent discussion has taken place the learning topic group needs to develop a series of questions that will form the basis of inquiry. As mentioned previously, assessment is research and research is an activity that is employed to answer questions. During the question development stage steps need to be taken to insure the question and its components are adequately defined or specified. Poorly framed questions will generate data of little value. Develop a Research Plan to Obtain the Data The third step in this process is to create a research plan that becomes the operational document for examining the learning topic. A research plan is the methodological roadmap to successful and useful assessment activities. The development of the research plan occurs following the first two steps of this process, noted above. Earlier narrative provides the basis from which the learning topic will be framed, structured and assessed. 8/22/2003 5 Develop Answers Based on the Data The last statement constituting this process seems logic and/or obvious. In fact, this aspect of the process can be the most difficult because the data provided might show that earlier steps were not “fleshed-out” to the extent they should have been. At this stage if any short cuts or other “less defined” activities were implemented, the data will provide a clear statement that the process needs to be refined or some aspects of the project need revisiting. METHODOLOGICAL ISSUES What to Assess Within the context of the aforementioned steps, it is critical that appropriate methodological decisions be made about the project and its associated activities. Several issues have been identified to provide a “framework” to assist with the development of the methodological component of the learning topic project. The first issue is to determine what to assess. Earlier, a four-step process asked the learning topic group to review the literature and develop questions that formed the basis of the assessment activity. Define Components In methodological terms, it is time to define the components of the question in operational terms. “In operational terms” suggests that each component of the learning topic be defined in a unique and identifiable way. Examine Component Intricacies Once the learning topic components are defined, members of the group need to focus on the measurement possibilities. Examination of component intricacies provides a basis for determining if the learning topic is to be assessed as a whole and/or through the individual components. 
If the “individual component” option is chosen then the measurement option chosen needs to afford the learning topic group the ability to individually assess each component as well as its contribution to the whole. 8/22/2003 6 Review Measurement Options Measurement options for components include the use of fixed or sliding scales or scoring rubrics. Scales or rubrics become numerical representations of learning activities, experiences and breadth of learning. Scales A scale has been defined as a set of items “arranged in some order of intensity or importance” (Vogt, 1993). Scales associated with research and assessment activities can be generally categorized as fixed or sliding. A fixed scale refers to a set of points that identify a particular social or learning event. The points on a fixed scale are integers, that is, on a scale of 1 to 7 each case (or person’s) behavior and/or learning is associated with a component score that is either a 1,2,3,4,5,6 or 7. Fixed scale points are defined in terms of “agreement”, “satisfaction” or other, but the scale definition and value are the same for each item. Fixed scales are sometimes misconstrued as sliding scales because during the analysis the mean for an individual item may be a 3.7, thus, many think of the scale as sliding. This is not so, because 3.7 is a summary of item responses rather than an individual item score. All scores for the item create the item summary mean. A sliding scale may use the same 7-point scale but a person or case has the option of placing himself or herself at any point along the scale. Sliding scales are typically used to determine where a person would place himself or herself in response to a set of questions or items that pertain to a defined knowledge or opinion set. Although the scale is a 7point scale, the scale attributes are different for each item. Rubrics A rubric, on the other hand, utilizes an attribute scale designed to signify the level or rate of learning that has taken place. The idea behind a rubric is for members of the learning topic group to identify competencies or attributes that enable specific distinctions between performance or knowledge levels for each person completing the assessment activity. A rubric may not lend itself to utilizing a mean to describe performance or differences between participant scores. Rubrics are typically discrete scales that identify 8/22/2003 7 participants as achieving a score of 1, 2, 3, 4, 5 or 6, rather than identifying participants as at the 3.46 level. If the learning topic group wants to utilize a sliding scale, those discussions need to occur during the project development stage prior to implementation. How Many Subjects? One of the most frequently asked question of methodologists is “how many people do we need to make this a legitimate study/process? The answer to this question has antecedents in the discussion above. The number of students is determined by the nature of the project and the level of implementation (pilot study or major assessment component). Sampling The first question that should be asked is whether or not all students should receive the assessment or a sample. This decision is largely determined by the size of the student population, availability of human and financial resources, and relevance to the project intent. Small institutions should carefully scrutinize the use of sampling because of the small numbers of students that represent the target group. 
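Where a sample is used instead of testing every student, the mechanical step of drawing it can be quite simple. The sketch below draws a simple random sample of student IDs from a hypothetical roster; the roster size, sample size and random seed are invented, and deciding how large the sample should be is a separate question (see How Many Subjects, above).

    import random

    # Hypothetical roster of enrolled student IDs (in practice, pulled from the student system).
    roster = [f"S{n:05d}" for n in range(1, 2501)]   # 2,500 students

    random.seed(42)                          # fixed seed so the draw can be reproduced and documented
    sample = random.sample(roster, k=250)    # every student has an equal chance of selection

    print(f"Selected {len(sample)} of {len(roster)} students")
    print(sample[:5])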
Readers should be cautioned that not all members of a campus community necessarily endorse the use of sampling. There are many reasons for the reluctance to accept sampling as a viable component of the assessment process. This writer believes much of the reluctance stems from a misunderstanding of sampling, its attributes, and rules regarding the appropriate use of sampling. When properly applied, sampling is a powerful tool and should be in every institution’s methodological toolbox. There are several types of sampling: random, stratified and representative. By far the easiest sampling technique to implement is the random sample. Random Sampling A random sample is determined by “selecting a group of subjects for study from a larger group (population) so that each individual is chosen entirely by chance” (Vogt, 1993). 8/22/2003 8 The important aspect of the random sample is that every member of the population has an equal opportunity of being chosen. Stratified Sample A stratified (random) sample involves selecting a group of subjects from particular categories (or strata, or layers) within the population (Vogt, 1993). Examples of a stratified sample are using females, or males 30 years of age or older, or students completing Composition I, as the population for the study. Representative Sample Representative samples are not mentioned in the literature to the extent of random and stratified samples. A representative sample is the selection of a group of subjects that are similar to the population from which it was drawn. In the case of representative sample, the operative word is similar. In its complete form, similar refers to matching the characteristics of a subject population in terms of the criteria that identify that population. For instance, if the learning topic group wants to assess a representative sample of students at an institution the criteria for determining membership within the sample must be through “demographic analysis”. Prior to selecting the sample the learning topic group would identify the attributes of the student population that are determined “demographic”. In most cases the demographic attributes of a subject population would be gender, age, racial/ethnic affiliation, marital status, socio-economic level and so forth. To obtain a representative sample subjects would be selected in the direct proportion of their membership within the student population. For instance, if 25 percent of the student population is female, over 25 years of age, white, single and lower middle class, then 25 percent of the study subjects must also meet these criteria. Representative samples are not used at a frequency as great as random or stratified because of the complex nature of developing such samples. Living with Something Called “Error” Aside from the mechanics involved with selecting a sample of subjects is the realization that every form of design has some sampling error. The typical amount of sampling error 8/22/2003 9 associated with surveys varies from 3 to 5 percent. The amount of error a project is allowed to tolerate is proportional to the number of persons associated with the project and the amount of “infrastructure’ support that is available. DESIGN ISSUES There are several types of research designs that lend themselves to assessment projects: experimental versus control groups, pre- post-test, collection of students, longitudinal or mixed model. Each design will be briefly discussed. Experimental vs. 
Control Group

The experimental versus control group design is one of the most widely known research designs. The underlying principle of the experimental design is the ability to assess the effect of an intervention, or the lack of one, with a group of subjects. As is commonly known, the control group does not receive the intervention whereas the experimental group does. The pre- and post-test design gives campus personnel the opportunity to evaluate the intervention by comparing a test administered at the beginning of the semester with data acquired at the conclusion of the semester.

Collection of Students

The design using a collection of students involves identifying a unique characteristic, such as students who have completed Comp I, have completed 50 credit hours, or have an ACT score of 24 or above. Students meeting the selection requirements become project subjects.

Longitudinal Design

An increasingly popular design is the longitudinal study. Longitudinal designs involve identifying subjects on the basis of participation in a particular course, program, and/or course of study. Data elements collected for the longitudinal design involve those aspects that directly relate to the learning topic being examined. The length of a longitudinal study is determined by the needs of the project.

Mixed Model

The mixed model, on the other hand, involves using a combination of both qualitative and quantitative data. Mixed models are used with many of the designs previously discussed and provide an excellent opportunity to link the qualitative and quantitative data being collected in this process.

METHODOLOGICAL CONSTRUCT

The term methodological construct may appear to be misleading, but the writer uses the term to refer to how the data are collected. There are many different means for collecting data: written, objective, rating scales, sliding scale data, proportional data, norm referenced measurements, and qualitative versus quantitative data.

Written Data

Written data refers to data collected through the use of "controlled prompts", open-ended responses to assessment surveys, or classroom assignments linked with larger assessment projects. For written assessment data to be of value, considerable "up-front" time must be used to develop scoring criteria and appropriate score values. Categorization of written data, also relevant when using qualitative analysis, requires rigorous training in order to "norm" responses to scale values. Inter-rater reliability is obtained by making a concerted effort to ensure that multiple readers/raters assign a score to a subject's writing that falls within the same scale value or does not vary by more than one score value. Without a control for inter-rater reliability, written assessment data becomes suspect for use institution-wide.

Objective Data

Objective data are data presented in a multiple-choice format and are typically used to identify the amount of knowledge possessed. This format is frequently used in the development of local assessment instruments. These instruments may be associated with specific courses and/or faculty that are using these data as a link with larger assessment projects. Prior to using locally developed instruments such as these, it is necessary to conduct the appropriate reliability and validity checks. It should be noted that some commercially produced assessment instruments also use the objective format.
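A simple way to monitor the inter-rater reliability described under Written Data above is to track how often two readers award the identical rubric score, and how often their scores fall within one point of each other. The sketch below uses invented scores and plain percent agreement; it is only a first check, not a full reliability analysis.

    # Scores two trained readers assigned to the same ten essays (invented data).
    rater_a = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3]
    rater_b = [4, 4, 5, 2, 3, 3, 4, 4, 2, 3]

    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

    print(f"Exact agreement:      {exact:.0%}")       # readers gave the identical score
    print(f"Agreement within one: {within_one:.0%}")  # scores differ by no more than one point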
Rating Scales

Rating scales refer to the use of specific-point scale values to distinguish subject opinion or knowledge level. Likert-type scales are "fixed point" scales when subjects complete the instrument; however, during analysis an item receives a score value that appears to be sliding or continuous. Readers should note that Likert-type scales are constructed so that subject opinion or perception is determined from the mid-point of the scale (neutral or undecided), rather than from the anchor points (the extreme lower and upper scores). Scores obtained from a rubric need to be recorded as integers, unless provisions have been made to account for "sliding" scores. As has been mentioned, significant "up-front" discussion needs to occur prior to implementing an assessment instrument that would otherwise yield data values creating more confusion than answers.

Survey Data

Surveys as a means of collecting assessment data have as many advocates as they have critics. The prevailing wisdom is to view survey data as supplemental to other forms of assessment data. Unfortunately, surveys have received a great deal of criticism that is unwarranted when they are viewed as a methodological design. In this writer's experience, most surveys developed and administered fail to meet even the lowest test of appropriateness: for some persons a survey consists of the best questions they can think of, typed using the latest word processing "gee whiz" format to make it look professional. If surveys are to be used as an assessment tool, considerable time and energy need to be expended to ensure that: 1) the information desired can be obtained using a survey design; 2) questions and statements have been clearly thought out and clearly written; 3) the survey has been field-tested on a group of respondents that mirrors those for whom the survey is written; 4) appropriate reliability and validity tests have been administered; and 5) the data collected can be linked with and support larger sets of assessment data.

IDENTIFYING MEANS OF MEASUREMENT

As mentioned previously, defining the means of measurement for an assessment project is one of the most important steps in the process. The following types of measurement are possible options for most assessment projects.

Fixed Scale Data

The first type is fixed scale data, or data whose numerical points are integers. A discussion topic for this type of scale is whether or not an individual score can have a partial value (e.g., a decimal point). Typically, fixed scale data points should be viewed as having meaning or value unto themselves; that is, a 1 is a 1, and when multiple scores are combined they are summed rather than averaged. This rule is violated frequently.

Sliding Scale Data

The sliding scale is the second type of measurement. As mentioned previously, sliding scales allow subjects to place their opinion or knowledge level at any point, fractional or otherwise, along the scale.

Using Proportions

Proportions as a measurement provide a simple way of determining the knowledge or opinion of a group of subjects. The precision of the proportion is determined by the size of the group the proportion represents. Many faculty prefer proportions because they resemble current grading practices.

Norm Referenced Measurements

Norm referenced measurements produce a score that allows local subjects to be compared with a larger population regionally or nationally. These scores are derived externally and are generally very reliable.
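Where a test publisher reports a norm-group mean and standard deviation, a local score or group mean can be placed against that norm group. The sketch below uses invented norm values and assumes roughly normal score distributions; in practice publishers supply their own conversion or percentile tables.

    from statistics import NormalDist

    # Invented national norms for a standardized subtest.
    norm = NormalDist(mu=61.2, sigma=5.4)

    local_mean = 63.0   # mean score for students at our campus (invented)

    z = (local_mean - norm.mean) / norm.stdev
    percentile = norm.cdf(local_mean) * 100

    print(f"z-score: {z:.2f}; percentile rank in the norm group: {percentile:.0f}")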
Criterion Referenced Measurements

Criterion referenced measurements are questions or items written to a specific or predetermined set of criteria. The criteria become the standard against which student performance is compared.

Qualitative vs. Quantitative Data

Quantitative data refers to numerical information that describes events, abilities and/or behavior requiring precision. Qualitative data refers to observational information such as the color, texture or clarity of some object or place. Qualitative data is desirable when describing or measuring something that lacks inherent precision. Many times qualitative data is used to construct instruments from which quantitative data is eventually collected. Both have a place in assessment program measurement.

LOOKING AT DIFFERENCES

Once a process has been developed, instruments implemented, and data collected, researchers have the task of examining the results. The research plan provides an outline of the analytical steps necessary to demonstrate the degree to which learning has occurred.

Significant vs. Meaningful Differences

One method of determining "learning growth" is the use of significance tests. There are a variety of tests that can be used and numerous reference materials that explain their meaning and use. What is of importance to the current topic is the distinction between what is significant and what is meaningful. The criterion for significance is established during initial discussion about the assessment project and is stated within the research plan. A significant difference is a difference between two statistics (e.g., means or proportions) such that the magnitude of the difference is most likely not due to chance alone (Wilson, 1980). Values associated with significant differences are normally thought of as .05 (the result would occur by chance fewer than 5 times in 100), .01 (fewer than 1 in 100), or .001 (fewer than 1 in 1,000). Many tend to view assessment results strictly in terms of significant difference and are disappointed if those results are not obtained. The recommendation is to also consider what researchers categorize as "meaningful differences". Meaningful differences refer to differences that fall within the range of .15 to .06 (i.e., results that would occur by chance in fewer than 15 to 6 cases in 100). These differences are worth noting and represent a meaningful change among and/or between subjects. Meaningful differences are especially noteworthy for assessment projects that examine growth or change within short periods of time, e.g., one semester, one year or two years. Change that occurs over short periods of time is less dramatic and must be measured with a finely marked "learning ruler". Viewing assessment through smaller increments of change is more realistic and more reflective of what occurs within shorter intervals of time.

Control for Effect

Another aspect of "looking at differences" is qualifying change by "controlling for effect". Controlling for effect is an important step in the interpretation of results from an experimental/control group or evaluation study. "Effect size" is the standardized mean difference attributable to the intervention, which can then be compared with the differences generally found for this kind of intervention or activity.

Pre-Test to Post-Test Differences

Assessment projects look for differences between groups of subjects by testing prior to the intervention and after the intervention, i.e., pre- and post-testing; a brief sketch of a pre/post comparison follows below.
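As a concrete illustration of pre- to post-test change and "controlling for effect", the sketch below computes the mean gain for one group of students and one common standardized effect size (the mean gain divided by the pooled standard deviation of the pre- and post-test scores). The scores are invented and effect-size conventions vary, so treat this as an illustration rather than a recommended analysis.

    from statistics import mean, stdev

    # Invented pre- and post-test scores for the same eight students (matched order).
    pre  = [52, 61, 47, 70, 58, 64, 55, 60]
    post = [58, 66, 50, 74, 63, 65, 61, 64]

    gains = [b - a for a, b in zip(pre, post)]
    mean_gain = mean(gains)

    # One common standardized effect size: mean gain over the pooled standard deviation.
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    effect_size = mean_gain / pooled_sd

    print(f"Mean gain: {mean_gain:.1f} points")
    print(f"Standardized effect size: {effect_size:.2f}")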
The use of pre- and post-testing can serve an assessment project in several ways: 1) pre-tests provide the baseline against which later tests are compared; 2) the legitimacy of the intervention is framed by the pre- and post-test data; 3) differences between the two test scores form the basis for reporting change or learning; and 4) change can be identified through differences in test score means or the growth attributed to the distance between scores.

Reliability vs. Validity

Construct validity is the degree to which inferences can be made from operationalizations in a study to the theoretical constructs on which those operationalizations were based (Vogt, 1993). In other words, if your study is based on a theoretical notion about some aspect of student learning, it is assumed that as the study was developed the context of the study was defined operationally (the specific aspects of learning to be studied). If, at the conclusion of the study, inferences or predictions about learning from the operational aspects of the assessment can be linked to the theoretical constructs, then the project has construct validity. Content validity, on the other hand, refers to the degree to which items accurately represent the "thing" (behavior, learning, etc.) being measured. Reliability refers to the consistency or stability of a measure from one administration to the next. If the results are similar, the instrument is said to be reliable. Two common techniques for estimating reliability are KR-20 and Cronbach's alpha. Reliability is reported as a two-place decimal value, with .60 considered minimal for most studies; a reliability of .60 indicates that roughly 60 percent of the variability in scores reflects consistent measurement rather than error.

IMPLEMENTATION ISSUES

Establishing Analytical Credibility

During the analytical phase of an assessment project two questions resonate: 1) does the design have analytical credibility? and 2) how are the data (or scores) generalizable to the fundamental questions of assessment? To answer question one, nine points are presented to provide a framework for determining analytical credibility:

• Answer the Question but First Define the Question

Advising readers to define questions before seeking answers probably seems like questionable advice. Yet if assessment projects "get into trouble" it is typically the result of varying from the identified question or focus with an excursion into "it would be nice to know" land. Defining the question before seeking to answer it, especially during the project development stage, allows the linking of literature review, colleague discussion, question definition and agreement to occur prior to implementation.

• Provide Information According to its Purpose

Research and assessment projects have the potential to produce voluminous and complex analytical reports containing facts, figures, data tables and graphs, ad nauseam. A general rule is to provide the minimum of information necessary to fulfill a request or answer the assessment question, uncluttered with excessive verbiage and unnecessary analysis. Without question, members of the assessment project group should have detailed information as well as a synthesis of the project in which they are involved. Providing the "right fit" regarding the amount of information needed to make judgments about the project needs to be part of the routine business discussed by assessment committees or project teams.
• Match your Information with its Recipients

The discussion of who receives information is as important as purpose and content. The question of who is to be excluded from the distribution list is as important as that of who should be included. When communicating the results of assessment it is essential to build a data portrait of assessment recipients. Most faculty and administrators have preferences for particular types of data; some prefer data in tabular form, graphs, narrative, and/or executive summaries. It is clear that most assessment initiatives lack the resources to produce five different versions of a report to fit the preferences of different persons. Knowing how strong particular preferences for information are, and adjusting accordingly, will enable the assessment "feedback loop" to be more effective.

• Beware of the Perils of Printout Worship

When many persons think of research and assessment projects, a picture comes into focus of several people carefully examining several inches of computer output while engaged in an intense discussion. While it is true that many forms of analysis produce large stacks of computer output, it would be unwise for any learning topic group to distribute the computer output without some synthesis. Regardless of the significance of the work at hand, very few decision-makers, faculty or administrators, will sift through reams of output to find that the project produced significant results.

• Keep Focused on what you are Attempting

The origin of this phrase comes from years of experience watching research and assessment committees lose focus on their project and consequently spend inordinate amounts of time discussing issues long since agreed upon. When a project team loses focus on its goals, work stalls and the resulting inertia provides the catalyst for individuals to re-engage in academic debate that attempts to redefine a project as implementation is taking place. The best analogy for this is a dog chasing its tail. An assessment project should never be implemented as long as there are questions about its intent, theoretical framework, design or implementation procedures.

• Develop a Research Plan and Follow It

Projects that have a research plan demonstrate to the campus community that thought and deliberate action were the principal agents of their creation. Likewise, following the established research plan demonstrates professional and methodological credibility. Research plans are dynamic, and they can be changed to adjust to circumstances and events that necessitate change. It is imperative that if a change in the plan does occur, the change reflects sound methodological judgment. Research plans that are continually in a state of flux provide little data that is worthwhile or effective. The rules of reliability and validity must always be the plan's fundamental tenet.

• Take Time to Summarize

For research and assessment information to be helpful it is necessary for it to be summarized in some manner. It would be helpful to the campus audience if the summary contained narrative as well as tabular data. For instance, if a summary report is distributed and the principal text is a table highlighting a mean (average) score of 3.543, several questions need to be answered. First, what does a mean (average) score of 3.543 mean, and from what context does this score emerge? Issues pertaining to the score can be addressed by explaining the "context" for the score from the section of the assessment project plan that pertains to score range.
Second, when viewing the score 3.543, how is this score different from 3.5 or 3.54? It is important to have an agreed-upon style for reporting assessment data that does not suggest or imply artificial precision. An example of artificial precision is using percent figures containing decimal points, such as 86.3 percent. Is it necessary to provide data points that combine two proportions (e.g., 86 with regard to 100 and .3 with regard to 10)?

• Make an Effort to "Do It Right"

No one doubts that research and assessment projects take time and financial resources, as well as physical and emotional energy. If an activity consumes this much energy it would seem logical for participants to ensure assessment meets the "do it right" criterion. "Doing it right" should take on a practical orientation that is supportive of multiple methodologies and design philosophies. Assessment projects that are part of an overall institutional assessment initiative and support the "assessment-oriented perspective" produce information about learning that assists with the evaluation of curriculum.

Another aspect of "doing it right" is to follow sound methodological practices. Earlier it was stressed that discussion about the learning topic, literature review and identification of assessment goals provide the basis for determining the methodologies used. Following sound or accepted practices sends a message throughout the institution that assessment projects have credibility. Credibility also comes when the assessment initiative has faculty buy-in. If faculty do not legitimize the assessment process, it is unlikely that anyone else will view the efforts as meaningful. The campus assessment initiative needs to be inclusive; that is, it should involve faculty at all levels. Critics as well as "believers" should have equal seating at the assessment table. It is important for assessment committees to meet head-on the issues or questions raised by the critics of assessment. Excluding critics from assessment discussions only strengthens their resolve and intensifies attacks. If the issues raised by critics have merit, then those issues should be examined and discussed by the campus community. An action that lessened tension at my institution was to encourage the district assessment steering committee to develop a statement on assessment. A statement was developed and endorsed by the faculty senate at one of their business meetings. The statement on Ethical Conduct and Assessment provides the institution's philosophy on assessment as well as an understanding of how assessment data will be used. Developing and following the research plan, involving faculty in its development and implementation, making curriculum decisions based on the assessment data, and providing feedback to all levels of the institution meet the criteria for an institution "doing it right".

• How Much Data is Enough? Or, When does Baseline Data Cease being Baseline Data?

All assessment initiatives, regardless of institutional size, must make the decision regarding how much data is enough. Prior to answering this question there are several issues that must be discussed. First, there are statistical concerns that include replication, sampling error, and methodological viability. The second issue is manageability. Manageability includes what is practical in terms of human resources, burnout, and financial costs.
As mentioned in the section, “How Many Subjects”, a plan for consistent assessment of student learning is of greater value to an institution than massive testing that runs for several semesters, stops and lacks applicability to the overall assessment initiative. HOW GENERALIZABLE ARE THE SCORES Asking the question about how generalizable the scores are to the fundamental questions of assessment (reference step 2, above) provides the basis for the project’s external validity. Establishing the relationship between assessment scores and the assessment instrument answers “to what population(s), settings, treatment variables, and measurement variables can the effect (outcome) of the activity be generalized to other learning” (Leedy, 1974:150)? This section poses three questions as a way to explain the link between generalizable scores with assessment questions. • Do the Scores Suggest Learning has Occurred? During the development phase of the assessment program faculty engage in discussions that establish the theoretical framework of the project as well as defining what constitutes learning. A majority of the time evidence of learning is attributed to an assessment score. 8/22/2003 20 The score must be viewed within the context it was created as well as through the rationale used to create the meaning of each score value. • Does the Learning Relate to the Larger Question of [component] Assessment? This question seeks to establish whether or not an assessment activity (or score) can be used to provide evidence of more global learning. For instance, if a group of faculty develops a project to examine an attribute of critical thinking, will the score obtained through its assessment provide creditable evidence or data that links the project’s activity with the established components for critical thinking institution-wide? Administrators and faculty need to recognize that assessment projects scattered throughout the institution with little or no linkage to the overall assessment initiative cannot be classified as an effective assessment program. • Does the Activity Producing the Score Suggest General or Specific Learning? This question tries to determine that when a group of subjects completing an assessment activity produce a set of scores, is the basis for these scores a series of questions that produced evidence of general learning or specific learning? For example, when a group of subjects complete an assessment activity for critical thinking, do the scores reflect the subject’s general knowledge about critical thinking or can specific evidence be extracted from the assessment to identify knowledge of deduction, inference, and so forth. Distinguishing general knowledge from specific component knowledge provides the basis for what could be termed a “robust” assessment program. GATHERING AND MANAGING RESOURCES FOR ASSESSMENT Whose Responsibility is Assessment Anyway? Several months ago there was considerable discussion on an assessment listserv regarding the responsibility for assessment. In the course of the dialogue, the tally was roughly even between faculty having the responsibility and administrators being responsible. However, most agreed that it is all our responsibility. Faculty should be 8/22/2003 21 responsible for developing the institutional assessment program, with the administrators responsible for obtaining the financial resources. In addition to financial resources, it is also the responsibility of the administration to provide technical assistance for the assessment initiative. 
It is understood that not all institutions have the ability to support assessment in a multi-level fashion; that is, to provide financial, human and methodological resources.

Obtaining Human Resources for Assessment

Institutions not able to hire an assessment director or coordinator may have to look at other options for the technical and human resources needed to support the assessment program. Several options exist for institutions with limited resources.

Consultants

Institutions can hire a consultant to visit the campus periodically to monitor assessment progress. Consultants are able to bring a considerable amount of knowledge and experience to an assessment program in fairly short order. However, the limitation of a consultant, especially one who is not local, is that he or she is not always available when assessment questions or problems arise. If a consultant is used, institutions must ensure that the momentum for assessment does not wax and wane with consultant visits.

Institutional Employee

Another option is identifying an existing institutional employee to coordinate the assessment program. If an institution chooses to use a current employee, he or she should have several characteristics: 1) be a faculty member; 2) have the respect of his or her colleagues; and 3) be freed from a majority of his or her teaching assignment. The institution should be willing to invest some of its resources to provide the faculty member with fundamental knowledge about assessment as well as funds to bring in experts from time to time. Of course, the advantage of using a current employee is that he or she is on campus every day. There is one caution about naming an employee as the "assessment person". It is too easy for campus personnel to assume the "assessment person" is responsible for administering, analyzing and writing assessment project reports. It then becomes too easy for faculty and others to assume their role is to respond to the work of others rather than to engage actively in the assessment program.

Assessment Committee

A third option, used by many institutions, is to create an assessment committee with representatives from the faculty, staff and administration to monitor the assessment program. Providing a "charge" that outlines expectations for the assessment committee ensures the work of the group is meaningful. For instance, the assessment committee is charged with:

• Determining how the assessment program functions;
• Clarifying the role faculty play in its operation;
• Identifying what measures and standards have been proposed and adopted for assessing student learning; and
• Stating how the results of assessment are used to identify changes that may be needed if student learning is to improve.

Assessment Consortium

A fourth option is to network with a group of colleges and establish an "assessment consortium". A consortium has several advantages: expenses are shared, and a variety of personnel may be available as an assessment resource rather than a single consultant. Collective knowledge and collaboration are primary benefits of assessment through a consortium. A limitation of a consortium is that its strength is only as viable as the commitment of the institutions involved.

External Agency

A fifth option is to have a testing organization or company analyze institutional data. If the testing organization option is chosen, it may necessitate an institution modifying its assessment strategies to use commercially produced assessments.
Testing organizations can provide a considerable amount of technical expertise in the field in very short order; however, affiliation with a "for profit" business may create obligations to use their products. Technical expertise is available but may be provided by telephone or email rather than on-site.

Avoiding the HINIBUS Syndrome

One shortcoming of using persons who are not employees of your institution is that it may result in a "lukewarm" commitment to assessment because of comments like, "Dr. Smith is not a member of our faculty", "it's not our data", or "the data are more supportive of the needs of outsiders than our needs", and so forth. Institutions should be careful not to fall for the "it's not our data" objection when "outsiders" analyze institutional assessment data. It is too easy for assessment momentum to be lowered by questions of doubt or allegations of inappropriate use of institutional assessment data. This is what can be called the HINIBUS Syndrome, or "Horrible If Not Invented By Us". Faculties need to be cautioned that it is not necessary to independently invent all assessment procedures or activities. Using a combination of locally developed and norm referenced assessments complements most assessment programs. Readers should note these are issues that need to be discussed during the steps reported in the section on Development of Assessment Projects.

A COLLECTION OF SPECIFIC ASSESSMENT TECHNIQUES

Embedded Course Assessment

The term "course embedded assessment" refers to linking classroom activities and assignments to the assessment of a common learning outcome. The outcome is linked to what students are already learning in class, thereby taking advantage of existing [curricular offerings] that instructors collect, or by introducing new assessment measures into courses. To successfully embed assessment measures into existing assignments, the following sequence of activities is recommended:

• Specify intended outcomes;
• Identify related courses;
• Select measurements and techniques;
• Assign techniques to courses and embed measures;
• Specify assessment criteria;
• Evaluate student performance on exams, papers, projects, etc., for course grades;
• Evaluate student performance on course embedded measures.
(Larry H. Kelley, Workshop on Embedded Assessment. Kelley Planning and Educational Services, LLC)

The most commonly used embedded assessment methods involve the gathering of student data based on questions placed within course assignments. These questions are intended to assess student outcomes and are incorporated into course assignments or requirements such as final examinations, research reports, course projects or some type of demonstration. Student responses are typically graded by at least two faculty members in order to determine whether or not the students are achieving the prescribed learning goals and objectives. It should be noted that embedded assessment is a different process from that used by the course instructor to grade the course assignments, exams, or papers. There are several advantages to using course embedded assessments:

• Student information gathered from embedded assessment draws on accumulated educational experiences and familiarity with specific areas or disciplines.
• Embedded assessment often does not require additional time for data collection, since instruments used to produce student-learning information can be derived from course assignments that are already part of the requirements.
• The presentation of feedback to faculty and students can occur quickly, creating an environment conducive to ongoing programmatic improvement.
• Course embedded assessment is part of the curricular structure, and students have a tendency to respond seriously to this method.
(Blue Ridge Community College Student Outcomes Assessment Manual: A Guide for the College Community)

Using Norm-Referenced Tests

Norm-referenced tests are instruments that are designed for and administered to large groups of students. The collective responses of these students represent the learning associated with the student sample and the test, the result being a mean (average) response. After the test has been administered many times, and with each administration the instrument has been subjected to rigorous item and content validity and reliability checks, the test is considered "normed" and becomes the reference point for all students taking it. The means for students at your campus can then be compared with those of all students who have taken the test. Norm-referenced tests are often developed by testing companies, which employ "experts" to develop the test items. The assumption regarding norm-referenced tests is that the specific test, subtest or module content represents what all students should know about a given topic. Tests, subtests or modules are normally categorized or named in general terms, such as Reading or Reading Comprehension, Critical Thinking, Scientific Reasoning, and so forth.

Commercially Produced Tests [similar to norm referenced]

Commercially produced tests and examinations are used to measure student competencies under controlled conditions. Tests are typically developed by professional organizations and companies to determine the level of learning a student should acquire in a specific field. Commercially produced tests generally consist of multiple choice questions whose results can be used to compare local students with other students from institutions across the country. If properly chosen, the results from these tests can be used to improve teaching and learning. The most notable advantages of commercially produced tests are:

• Institutional comparisons of student learning;
• Little professional time is needed beyond faculty efforts to analyze examination results and develop appropriate curricular changes that address the findings;
• Nationally developed tests are devised by experts in the respective field;
• Tests are typically administered to students in large groups and do not require faculty involvement while students are taking the exam.

The strongest criticism of commercially produced tests is that they may not reflect the institution's curriculum. Test design and content should reflect an institution's curriculum in order for the results to be helpful.

Criterion-Referenced Tests

Criterion-referenced tests are designed to measure how well a student has learned a specific body of knowledge and skills. Multiple choice tests, similar to a driver's license test, are examples of criterion-referenced tests. Criterion-referenced tests are usually constructed to determine whether or not a student has learned the material taught in a specific course or program. Criterion-referenced tests that are used within a course are designed to test the information learned from the course as well as the instruction that prepared students for the test.
Criterion-Referenced Tests

Criterion-referenced tests are designed to measure how well a student has learned a specific body of knowledge and skills. Multiple-choice tests similar to a driver’s license exam are examples of criterion-referenced tests. Criterion-referenced tests are usually constructed to determine whether or not a student has learned the material taught in a specific course or program. When used within a course, they are designed to test the information learned from the course as well as the instruction that prepared students for the test.

The principal use of criterion-referenced tests comes from a pre- and post-test design that determines how much students know before instruction begins and after it has finished. The test measures the specific skills that make up the designated curriculum. Each skill is expressed as an instructional objective, and each skill is tested using at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items that test any given skill are parallel in difficulty. Each student’s score, normally expressed as a percentage, is compared with a preset standard for acceptable achievement, with little regard for the performance of other students. [Source: The National Center for Fair & Open Testing and Educational Psychology Interactive].

Portfolio

A portfolio is normally considered to be a collection of work that represents an individual’s cumulative effort at a given point in time. In assessment terms, a portfolio is a collection of student work that exhibits to faculty a student’s progress and achievement in specified areas. For instance, a student’s portfolio could include written papers (term papers, reports, etc.) accompanied by a reflective piece written by the student; results from a comprehensive or capstone examination; and norm-referenced exam results such as the WGCTA, Cornell X, CAAP Critical Thinking, or CollegeBASE. For a vocational student, the portfolio may consist of pieces of machinery the student designed and built, a collection of computer programs, field reports about specific activities or procedures, or a set of drawings that demonstrate the student’s knowledge.

A portfolio can be collected over the student’s experience at the institution, e.g., one year, several semesters, etc., so faculty can evaluate the full scope of a student’s work. In particular, the longitudinal aspect of evaluating portfolios allows faculty to “see” the academic growth of students as they matriculate through the institution. Central to the evaluation of a student portfolio is a scale or rubric with which to grade the material or artifacts. The criteria for grading the portfolio need to be in place prior to the formal evaluation of a student’s material or artifacts. The proliferation of modern technology has provided new ways of storing written and visual information, such as on a disk, CD or webpage.

Scoring Rubrics

Rubrics are a popular means of assessing student learning because, with proper training, raters can use them as a consistently reliable way to assess essays, papers, projects, and performances. A rubric contains descriptions of the traits a piece of student work must display to be considered “high-level”, “acceptable”, or “poor quality”. A rubric can contain several categories of student ability, such as comprehensibility, usage, risk taking and variety, to name a few. Within each rubric category (e.g., risk taking) there are multiple “levels” of learning a student can display. An analytic rubric scores each part of a student’s work separately, whereas a holistic rubric combines them into a single judgment.
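To make the analytic/holistic distinction concrete, the short sketch below scores a single piece of student work trait by trait and then collapses the trait ratings into one overall level, as a holistic reading would. It is a minimal illustration in Python; the trait names come from the examples above, while the three-point numeric scale and the ratings shown are hypothetical, not a recommended rubric.

    # Shared rating scale; the labels mirror the quality terms used above.
    LEVELS = {3: "high-level", 2: "acceptable", 1: "poor quality"}

    # Hypothetical ratings one reader might assign to a single essay.
    ratings = {
        "comprehensibility": 3,
        "usage": 2,
        "risk taking": 1,
        "variety": 2,
    }

    # Analytic scoring: each trait of the work is reported separately.
    for trait, score in ratings.items():
        print(f"{trait}: {score} ({LEVELS[score]})")

    # A holistic reading collapses the traits into one overall judgment;
    # here that judgment is approximated by rounding the average trait rating.
    overall = round(sum(ratings.values()) / len(ratings))
    print(f"overall (holistic approximation): {overall} ({LEVELS[overall]})")

Whichever form is used, the level descriptions and any weighting should be agreed upon by the raters before student work is evaluated, just as with portfolio grading criteria.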
Classroom Assessment Techniques (CAT)

The term Classroom Assessment Techniques (CAT) refers to the ways faculty obtain useful information, or feedback, on what, how much and how well their students are learning. The feedback can be as simple as: 1) list the most important thing you learned today; 2) what is it that you are most confused about; or 3) what additional information would you like to know about what was discussed today? This process was popularized by Angelo and Cross in their work Classroom Assessment Techniques, published by Jossey-Bass in 1993.

Capstone Courses

Capstone courses are designed to integrate the knowledge, concepts, and skills associated with a complete sequence of study in a program. The method of assessment is to use the courses themselves as the instrument and basis for assessing teaching and student learning. The evaluation of a student’s work in capstone courses serves as a means of assessing student outcomes. The capstone course becomes the forum in which a student displays the knowledge gained through the various aspects of his or her programmatic experience. Whether a single capstone course or several are necessary to adequately assess a student’s learning varies from program to program. Generally speaking, capstone courses are the final experiences for students within a discipline or program.

SUMMARY

The purpose of this paper was to provide an overview of the practical aspects of conducting learning assessment in a campus community. The comments and suggestions are derived from years of working with campus communities as they attempt to put their “assessment house in order”. As mentioned previously, encouraging an institution to create “an assessment-oriented perspective” is the first step in creating a campus climate that is assessment friendly. The majority of comments made within this paper can be summarized as a “back-to-basics” approach to fostering the assessment initiative and developing assessment projects. It is hoped that the information contained within this paper will provide readers with suggestions or techniques that make their experience with assessment more rewarding, exciting and productive.

REFERENCES

1993 Angelo, Thomas A. and Cross, K. Patricia. Classroom Assessment Techniques: A Handbook for College Teachers. Jossey-Bass: San Francisco, CA.

1987 Kraemer, Helena Chmura and Thiemann, Sue. How Many Subjects? Sage Publications: Newbury Park, CA.

1991 Miller, Delbert C. Handbook of Research Design and Social Measurement. Sage Publications: Newbury Park, CA.

1983 Norris, Donald M. “Triage and the Art of Institutional Research.” The AIR Professional File, Number 16, Spring-Summer 1983. The Association for Institutional Research.

1993 Vogt, W. Paul. Dictionary of Statistics and Methodology: A Nontechnical Guide for the Social Sciences. Sage Publications: Newbury Park, CA.

1980 Wilson, Terry C. Researcher’s Guide to Statistics: Glossary and Decision Map. University Press of America: Lanham, MD.