Evaluating Online and Blended Learning Environments

What does evaluation mean to you?
• Analysis
• Critique
• Judgment
• Feedback
• Audit
• Reflection
• Improvement
• Client perspective
• Satisfaction

Agenda
1. Clarify the challenges of evaluating online and blended learning.
2. Introduce an evaluation model.
3. Present case studies.
4. Engage in a planning exercise.

Planning Exercise Part 1
• Sketch an evaluation plan for a new online course titled "21st Century Communication Skills."
• The course jointly enrolls students from around the globe.
• The course is designed for participants from multiple cultures and various fields of study.

"It would be very surprising if even 10 percent of organizations using e-learning actually conducted any well-structured and executed evaluations."
http://www.alleninteractions.com/

"An evaluation can first and foremost determine whether the distance learning version worked as effectively as, or better than, the standard instructional approach – teaching students face to face."
- The ASTD Distance Learning Handbook

We already know that online learning works as well as face-to-face instruction. Despite 50 years of "no significant differences" between media, people persist in trying to find them.
Dr. Ken Allen, NYU

Allen, K., Galvis, D., & Katz, R. (2004). Evaluation of CDs and chewing gum in teaching dental anatomy. Journal of Dental Research, 83.

Evaluation Is the Key to Effective, Successful Online Learning Initiatives
A major evaluation challenge is determining what your stakeholders will regard as credible evidence.

Evaluation Paradigms
• People hold different evaluation paradigms.
• We should recognize our own paradigm and those of others.
• Try to avoid paradigm wars.

Experimental (Quantitative) Paradigm
• There are facts with an objective reality that exist regardless of our beliefs.
• The goal of evaluation is to detect the causes of changes in phenomena through measurement and quantitative analysis.
• Experimental designs are best because they reduce the "error" which hides the truth.
• Detachment is the ideal state.

Interpretive (Qualitative) Paradigm
• Reality is socially constructed through collective definitions of phenomena.
• The goal of evaluation is to interpret phenomena from multiple perspectives.
• Ethnographic methods such as observation and interviews are best because they provide the basis for sharing interpretations.
• Immersion is the ideal state.

Postmodern (Critical) Paradigm
• Reality is individually constructed based upon experience, gender, culture, etc.
• The goal of evaluation is to improve the status of underprivileged minorities.
• Critical theory that deconstructs phenomena is best because it reveals the "hidden curriculum" or other power agendas in technological innovations.
• Political engagement is the ideal state.

Pragmatic (Eclectic) Paradigm
• Reality is complex, and many phenomena are chaotic and unpredictable.
• The goal of evaluation is to provide decision-makers with the information they need to make better decisions.
• Methods and tools should be selected on the basis of their potential for enhancing the quality of decision-making.
• Wonder and skepticism are the ideal states.

Experimental Evaluation Flaws
• There is over-reliance on comparative designs using inadequate measures.
• "No significant differences" is the most common result in media comparisons.
• Learning is difficult to measure in most cases, especially in higher education.

We don't know enough about the outcomes of teaching and learning in higher education.
It is convenient for everyone involved to pretend that high quality, relevant teaching and learning are occurring.

"Quality" ratings of universities and colleges by commercial entities have enormous impact in the USA today. The criteria used for these rankings are surprisingly dubious.

Film Clip from "Declining by Degrees" by John Merrow and Learning Matters

Interpretive Evaluation Flaws
• Administrators often express disdain for "anecdotal evidence."
• Observations and interviews can be expensive and time-consuming.
• Qualitative interpretations are open to bias.

The Failure of Educational Research
– Vast resources going into education research are wasted.
– They [educational researchers] employ weak research methods, write turgid prose, and issue contradictory findings.
– Too much useless work is done under the banner of qualitative research.
– Qualitative research … [yields] … little that can be generalized beyond the classrooms in which it is conducted.

Postmodern Evaluation Flaws
• It is easier to criticize than to propose solutions.
• Extreme subjectivity is not widely accepted, especially outside higher education.
• Whose power agenda should be given precedence?

The Trouble with Postmodernists
• They write critiques in a language inaccessible to decision-makers.
• They regard technologies as inherently evil.

Pragmatic Evaluation Flaws
• Requires a larger commitment of resources to the evaluation enterprise.
• Mixed methods can be expensive and time-consuming.
• Sometimes decision-makers ignore even the best evidence.

The Lesson of the Vasa
http://www.vasamuseet.se/

So what are some better ideas about evaluating online and blended learning courses? Three core starting points:
• Plan up front.
• Align anticipated decisions with evaluation questions.
• Use multiple criteria and multiple data collection methods.

Decision-Making and Evaluation
• We must make decisions about how we go about designing and using e-learning.
• Information from evaluation is a better basis for decision-making than habit, intuition, superstition, politics, prejudice, or just plain ignorance.

Planning is the key!
• A major challenge is getting stakeholders to identify the decisions they face.
• Clear decisions drive the rest of the planning.
• Evaluation questions emerge from decisions.
• Methods emerge from questions.

Conducting Evaluations - Step 1
• Identify decisions that must be made about e-learning:
– adopt
– expand
– improve
– abandon
– reallocate funding

Conducting Evaluations - Step 2
• Clarify questions that must be addressed to guide decisions:
– Who is enrolled in e-learning and why?
– How can the course management system (CMS) be improved?
– What is the impact on access?
– What is the impact on performance?

Conducting Evaluations - Step 3
• Select methods:
– Observations
– Interviews
– Focus groups
– Questionnaires
– Data log analysis
– Expert review
– Usability studies

Conducting Evaluations - Step 4
• Collect the data:
– Triangulate.
– Revise data collection strategies as needed.
– Accept limitations, but aim for quality.
– Be sensitive to demands on all participants.

Conducting Evaluations - Step 5
• Report findings so that they influence decisions in time:
– Report early and often.
– Use multiple formats.
– Engage stakeholders in focus groups.
– Don't hide complexity.

Selective Criteria for Evaluation: Learning, Consistency, Economy, Safety, Flexibility, Efficiency (each rated on a Low-to-High scale)

Learning
In comparison to traditional instructor-led instructional methods, e-learning may show statistically significant, but modest, learning gains as measured by most standardized tests. Developing reliable, valid measures of the most important outcomes is difficult and expensive.

Who do we want our learners to become?
• Better problem-solvers and communicators
• Capable of working collaboratively as well as independently
• Knowledgeable
• Highly skilled
• Experts who possess robust mental models specific to the professions in which they work
• Lifelong learners who value personal and professional development

Developing reliable and valid online tests is expensive. In addition, many, if not most, important outcomes are difficult to assess with traditional measures.

Consistency
In comparison to traditional instructor-dependent instructional methods, e-learning can be more consistent, providing each learner with equivalent exposure to content, interactions, and assessment strategies, all of which can be reliably documented.

Economy
In comparison to traditional classroom instruction, e-learning can be more economical. Unfortunately, valid examples of ROI evaluations for e-learning are still rare, especially in higher education.

Safety
In comparison to many types of laboratory or field activities, e-learning can be safer for both people and equipment. Safety is an increasingly important criterion in higher education as well as in business and industry.

Flexibility
In comparison to traditional instructional approaches, e-learning can be more flexible for both instructors and learners. This is clearly an important advantage in business and industry, and increasingly in many higher education contexts as well.

Efficiency
In comparison to traditional instructional approaches, e-learning can yield time savings of 25 percent or more in achieving a given set of objectives. This factor alone justifies its adoption in any organization concerned with efficiency.

[Chart: Approach A and Approach B rated Low to High on Learning, Consistency, Economy, Safety, Flexibility, and Efficiency]

Bad News
There is no single, easy-to-administer, inexpensive, reliable, and valid approach to evaluating e-learning. Oh...no!

Good News
There are practical strategies for documenting the development and use of e-learning, improving it, and building a case for its effectiveness and impact. Thank goodness!

Many models:
• Objectives-based
• Accreditation
• Kirkpatrick's 4 Levels
• Countenance
• Goal Free
• Reform
• Naturalistic
• Adversary
• Connoisseurship
• Theory-based
• Fourth Generation

Kirkpatrick's Four Levels
http://coe.sdsu.edu/eet/Articles/k4levels/index.htm
Level 4 (Results) – detect the impact on outcomes
Level 3 (Transfer) – find out if behavior changed
Level 2 (Learning) – assess their learning
Level 1 (Reaction) – measure how they liked it

Phillips's Fifth Level (ROI)
http://www.roiinstitute.net/
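Phillips's fifth level expresses program results in monetary terms, typically as ROI (%) = (net program benefits ÷ program costs) × 100. As a purely hypothetical illustration (the figures are invented, not from any study cited here): a course that cost $40,000 to develop and deliver and produced $50,000 in measurable benefits would yield (($50,000 − $40,000) ÷ $40,000) × 100 = 25% ROI. The hard part, of course, is defending the benefit estimates, which is one reason so few organizations reach this level.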
ASTD Survey
Reaction: 92%
Learning: 34%
Transfer: 11%
Results: 2%
ROI: ?

Reeves/Hedberg Instructional Product Evaluation Model (2003)
• Views evaluation as a process that:
– focuses on supporting decision-making
– adopts procedures and tools from many different fields, e.g., usability testing
– involves innovative reporting approaches
– engages six facets (review, needs assessment, formative, effectiveness, impact, and maintenance)
• Evaluation functions are keyed to instructional design functions:

Evaluation Function – Development Activity
Review – Conceptualization
Needs Assessment – Design
Formative Evaluation – Development
Effectiveness Evaluation – Implementation
Impact Evaluation – Institutionalization
Maintenance Evaluation – Re-conceptualization

Review
• Ensure that the development team is well informed about previous work done in the area during the early stages of course or program conceptualization.
• Avoid reinventing the wheel.
• Review related literature.
• Examine competing e-learning courses and programs. ("I can do better than this!")

Needs Assessment
• Identify the critical needs that an e-learning program is supposed to meet.
• Provides essential information to guide the design phase of the development process.
• Primary methods: task analysis, job analysis, and learner analysis.
• Yields a list of specific goals and objectives that learners will accomplish through engaging in e-learning.

Formative Evaluation
• Collect information that can be used for making decisions about improving e-learning programs.
• Formative evaluation should be continuous.
• Provided the results are used, formative evaluation usually provides the biggest payoff for evaluation activities.
• Faculty or sponsors may be reluctant to accept the results of formative evaluation.

Effectiveness Evaluation
• Estimate short-term effectiveness in meeting objectives.
• A necessary, but insufficient, approach to determining outcomes.
• Evaluating implementation is as important as evaluating outcomes.
• You must understand how e-learning programs were actually implemented to interpret results. ("A connection with the server could not be established.")

Impact Evaluation
• Estimate the long-term impact on performance, both intended and unintended.
• Extremely difficult to evaluate the impact of e-learning courses and programs, but increasingly important.
• Evaluating impact is increasingly critical because of increased emphasis on the bottom line.
• Some managers expect impact evaluation to include "return-on-investment" (ROI) approaches.

Maintenance Evaluation
• Ensure the viability of e-learning courses and programs over time.
• Maintenance is one of the weakest aspects of e-learning, especially in higher education.
• Methods include document analysis, interviews, observations, and automated data collection.
• Very few organizations currently engage in serious maintenance evaluation of e-learning initiatives.
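The planning chain behind this model (decisions drive questions, and questions drive methods) can be captured in a simple planning matrix, which is how the case study that follows was organized. Below is a minimal, hypothetical sketch in Python; the decision, questions, and method names are illustrative placeholders, not a prescribed tool.

# A minimal, hypothetical sketch of an evaluation planning matrix:
# one decision, the questions that must be answered to inform it,
# and the data collection methods chosen to answer each question.
plan = {
    "decision": "Should the blended course become part of the core curriculum?",
    "questions": {
        "Did students achieve higher-order outcomes?": ["problem-solving measure", "concept maps"],
        "What were the logistical requirements for implementation?": ["observations", "interviews"],
        "How could the course be improved?": ["questionnaires", "focus groups"],
    },
}

def print_matrix(plan):
    """Print each evaluation question with the methods selected to answer it."""
    print("Decision:", plan["decision"])
    for question, methods in plan["questions"].items():
        print(f"  {question} -> {', '.join(methods)}")

print_matrix(plan)

Even a sketch this small makes gaps visible: a question with no methods listed, or a method that answers no question, signals a plan that needs revision before data collection begins.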
USAFA Case Study
• Decisions
• Questions
• Methods
• Results
• Recommendations

Engineering Education
• Problem: Cadets not achieving higher-order outcomes
• Critical outcomes for 21st century graduates of the US Air Force Academy:
– Frame and resolve ill-defined problems
– Exhibit intellectual curiosity
– Communicate with multiple media
– Enrich mental model of engineering
• Solution: New ENGR 110 "Introduction to Engineering" blended learning course developed
• Course was intended to be a showcase for alternative pedagogical dimensions.
• Course was intended to take maximum advantage of the technological infrastructure available at USAFA.

Pedagogical Dimensions of BL (blended learning)
• Task-Oriented – cadets were given three tasks during the semester: get to Mars, build a research site on Mars, and develop a power source on Mars
• Constructionist – cadets created knowledge representations of solutions
• Conversational – cadets joined listservs and other forums to discuss tasks
• Collaborative – cadets worked in teams throughout the course
• Challenging – there were no "correct" solutions to tasks, but lots of wrong ones
• Responsive – faculty and external experts provided multiple levels of guidance and feedback
• Reflective – cadets kept electronic journals and participated in focus groups
• Formative – cadets developed prototypes and refined them over time

• The Web provided rich resources about Mars, space travel, engineering, the Air Force, etc.
• Web tools enabled cadets to collaborate.
• E-mail supported consultation with experts.
• PowerPoint was used to construct knowledge representations.
• Excel, Stella, and other tools afforded problem-solving and modeling.

Decisions had to be made:
– After a three-year beta test, should the new course become part of the "core"?
– How could this type of course be supported after the faculty who created it were gone?

Evaluation questions:
– Did students achieve higher-order outcomes?
– What were the logistical requirements for implementation?
– How could the blended learning course be improved?

Evaluation methods can be represented using a matrix that crosses the questions (How is the course implemented? Learner reactions? Enhancements? What learning occurs?) with the methods used to answer them (observations, interviews, problem-solving measure).

• A comparative evaluation was conducted using two experimental classes and two control classes with a range of measures:
– Standardized problem-solving instrument
– Concept maps
– Questionnaires
• Interviews and focus groups were employed.
• Intensive observations were conducted.

[Chart: Engr Mech and Engr 110 compared on the pedagogical dimensions Task-Oriented, Challenging, Collaborative, Constructionist, Conversational, Responsive, Reflective, and Formative]

• Educationally significant differences were found on a standardized measure of problem-solving.
• Concept maps revealed little.
• Observations indicated that the course was very demanding on both cadets and faculty.
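The pre- and post-course results reported next describe a gain of about one standard deviation. As an illustrative aside with invented scores (not the USAFA data), a standardized mean difference of that kind can be computed as follows:

# Hypothetical illustration: computing a standardized mean difference
# (Cohen's d) between an experimental group and a control group.
# The scores below are invented for demonstration only.
import statistics

new_course = [78, 85, 90, 88, 76, 92, 84, 80]
control = [70, 72, 75, 68, 74, 71, 73, 69]

mean_new, mean_ctrl = statistics.mean(new_course), statistics.mean(control)
sd_new, sd_ctrl = statistics.stdev(new_course), statistics.stdev(control)

# Pooled standard deviation for equal-sized groups
pooled_sd = ((sd_new ** 2 + sd_ctrl ** 2) / 2) ** 0.5

d = (mean_new - mean_ctrl) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # a d of about 1.0 corresponds to a one-sigma difference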
Pre- and Post-Course Results
• No pre-course differences between cadets in the new course and those in the control course
• Significant post-course differences between cadets in the new course and those in the control course
• Cadets in the new course improved by a whole standard deviation (a 1-sigma difference)

[Chart: pre- and post-course score distributions on a D / D+ / S- / S / S+ / E- / E scale]

• Other benefits found included:
– richer mental models
– improved communication skills
– enhanced research skills
– better team skills

• Recommendations:
– Continue to support the course for two more years
– Explore extensions of the pedagogical dimensions into other courses
– Provide more faculty release time

Planning Exercise Part 2
Plan an evaluation of the "21st Century Communication Skills" course by identifying the following:
- Decisions
- Questions
- Methods

Recommendation 1
Embrace the complexity of e-learning, describing it in many ways. And don't yield to the temptation to oversimplify.

Recommendation 2
Render judgment with great care. Whereas description preserves complexity, judgment forces decisions of acceptance or rejection.

Recommendation 3
Keep before you the image of multiple publics ready to eat you alive! Evaluating is always a political activity.

Recommendation 4
Remember that data don't make decisions. People do! Your task is to help people (including yourself) make decisions based on sound information. Evaluate before you decide.

Recommendation 5
Don't confuse measuring with evaluating. And avoid the empirical swamp of traditional media comparisons.

Recommendation 6
Don't ask: Which test should we use? Ask: What can we count as evidence that learning has occurred?

Recommendation 7
Lastly, prepare to work far into the night. The evaluator's labors are long and difficult.

Thank You!
Professor Emeritus Tom Reeves
The University of Georgia
Instructional Technology
604 Aderhold Hall
Athens, GA 30602-7144 USA
treeves@uga.edu
http://it.coe.uga.edu/~treeves