Testing a Strategic Evaluation Framework for Incrementally Building Evaluation Capacity in a Federal R&D Program
27th Annual Conference of the American Evaluation Association
Washington, DC
October 17, 2013

JOHN TUNNA
Director, Office of Research and Development
Office of Railroad Policy and Development
Federal Railroad Administration

Federal Railroad Administration (FRA) Evaluation Implementation Plan
• Introduction
  – R&D Evaluation Mandate
  – R&D Evaluation Goals
  – R&D Evaluation Standards
• Uses of Evaluation
  – Formative
  – Summative
• Types of Evaluation (CIPP Evaluation Model)
  – Context
  – Input
  – Implementation
  – Impact
• Evaluation Framework & Key Evaluation Questions
• Start-up Pilot Evaluations
• Institutionalizing and Mainstreaming Evaluation
  – Metaevaluation
  – The Evaluation Manual
    • Evaluation templates
    • Attestation of standards

R&D Evaluation Mandate
• Congressional Mandates
  – Government Performance and Results Act (GPRA, 1993)
  – Program Assessment Rating Tool (PART, 2002)
  – GPRA Modernization Act of 2010
• OMB Memos
  – M-13-17, July 26, 2013: Next Steps in the Evidence and Innovation Agenda
  – M-13-16, July 26, 2013: Science and Technology Priorities for the FY 2015 Budget
  – M-10-32, July 29, 2010: Evaluating Programs for Efficacy and Cost-Efficiency
  – M-10-01, October 7, 2009: Increased Emphasis on Program Evaluations
  – M-09-27, August 8, 2009: Science and Technology Priorities for the FY 2011 Budget
• Federal Evaluation Working Group
  – Reconvened in 2012 to help build evaluation capacity across the federal government
  – “[We] need to use evidence and rigorous evaluation in budget, management, and policy decisions to make government work effectively.”
• GAO Reports
  – Program Evaluation: Strategies to Facilitate Agencies’ Use of Evaluation in Program Management and Policy Making (June 2013)
  – Program Evaluation: A Variety of Rigorous Methods Can Help Identify Effective Interventions (GAO-10-30, November 2009)
  – Program Evaluation: Experienced Agencies Follow a Similar Model for Prioritizing Research (GAO-11-176, January 2011)

R&D Evaluation Mandate
OMB Memo M-13-16 (July 26, 2013)
Subject: Science and Technology Priorities for the FY 2015 Budget
“Agencies . . . should give priority to R&D that strengthens the scientific basis for decision-making in their mission areas, including but not limited to health, safety, and environmental impacts. This includes efforts to enhance the accessibility and usefulness of data and tools for decision support, as well as research in the social and behavioral sciences to support evidence-based policy and effective policy implementation.”
“Agencies should work with their OMB contacts to agree on a format within their 2015 Budget submissions to: (1) explain agency progress in using evidence and (2) present their plans to build new knowledge of what works and is cost-effective.”

R&D Evaluation Goals
• Meet R&D accountability requirements
• Guide and strengthen Division R&D program effectiveness and impact
• Facilitate knowledge diffusion and technology transfer
• Build R&D evaluation capacity
• Improve railroad safety

Why Evaluation in R&D?
Assessing the logic of R&D programs
• ACTIVITIES: funded activity “family”; scientific research; technology development
• OUTPUTS: deliverables/products; technical report(s); forecasting model(s)
• OUTCOMES: application of research; data use; adoption of guidelines, standards, or regulations; changing practices; knowledge gains
• IMPACTS: reduced accidents and injuries; emergent outcomes (positive or negative); environmental effects

The Research-Evaluation Paradigm
Research
• Primary purpose: contribute to knowledge; improve understanding
• Primary audience: scholars, researchers, academicians
• Types of questions: hypotheses; theory-driven; preordinate
• Sources of data: surveys, tests, experiments; preordinate
• Criteria: validity, reliability, generalizability
Evaluation
• Primary purpose: program improvement; decision-making
• Primary audience: program funders, administrators, decision makers
• Types of questions: practical, applied; open-ended, flexible
• Sources of data: interviews, field observations, documents, mixed sources; open-ended, flexible
• Criteria: utility, feasibility, propriety, accuracy, accountability

Program Evaluation Standards: Guiding Principles for Conducting Evaluations
• Utility (useful): to ensure evaluations serve the information needs of the intended users.
• Feasibility (practical): to ensure evaluations are realistic, prudent, diplomatic, and frugal.
• Propriety (ethical): to ensure evaluations are conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
• Accuracy (valid): to ensure that an evaluation reveals and conveys valid and reliable information about all important features of the subject program.
• Accountability (professional): to ensure that those responsible for conducting the evaluation document and make available for inspection all aspects of the evaluation that are needed for independent assessments of its utility, feasibility, propriety, accuracy, and accountability.
Note: The Program Evaluation Standards were developed by the Joint Committee on Standards for Educational Evaluation and have been approved by the American National Standards Institute (ANSI).

CIPP Evaluation Model (Context, Input, Process, Product)
Types of Evaluation
• Context
• Input
• Implementation
• Impact
Stakeholder engagement is key.
This framework is Daniel L. Stufflebeam’s adaptation of his CIPP Evaluation Model for use in guiding program evaluations in the Federal Railroad Administration’s Office of Research and Development. For additional information, see Stufflebeam, D.L. (2000). The CIPP model for evaluation. In D.L. Stufflebeam, G.F. Madaus, & T. Kellaghan (Eds.), Evaluation models (2nd ed., Chapter 16). Boston: Kluwer Academic Publishers.

Evaluation Framework: Roles and Types of Evaluation
Formative Evaluation (proactive)
• Context: identifies needs, problems, and assets; helps set goals and priorities
• Inputs: assesses alternative approaches; develops program plans, designs, and budgets
• Implementation: monitors, documents, and guides execution
• Impact: assesses positive and negative outcomes; informs policy development and strategic planning
Summative Evaluation (retrospective)
• Context: assesses original program goals and priorities
• Inputs: reassesses project and program plans, procedural plans, and budget
• Implementation: assesses execution
• Impact: assesses outcomes, impacts, side effects, and cost-effectiveness
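For readers who want to operationalize the framework above, the role-by-type matrix can be treated as a simple lookup structure: two evaluation roles (formative, summative) crossed with four evaluation types (context, inputs, implementation, impact), each cell holding the focus of that evaluation and, later, its key evaluation questions. The sketch below (Python) is illustrative only; the cell descriptions are paraphrased from the slide above, but the data structure, names, and helper function are assumptions, not part of the FRA plan.

from typing import Dict, List, Tuple

ROLES = ("formative", "summative")
TYPES = ("context", "inputs", "implementation", "impact")

# (role, type) -> focus of that cell, paraphrased from the framework slide
FRAMEWORK: Dict[Tuple[str, str], str] = {
    ("formative", "context"): "identify needs, problems, and assets; help set goals and priorities",
    ("formative", "inputs"): "assess alternative approaches; develop program plans, designs, and budgets",
    ("formative", "implementation"): "monitor, document, and guide execution",
    ("formative", "impact"): "assess positive and negative outcomes; inform policy and strategic planning",
    ("summative", "context"): "assess original program goals and priorities",
    ("summative", "inputs"): "reassess project and program plans, procedural plans, and budget",
    ("summative", "implementation"): "assess execution",
    ("summative", "impact"): "assess outcomes, impacts, side effects, and cost-effectiveness",
}

# key evaluation questions are attached to each cell as they are drafted
questions: Dict[Tuple[str, str], List[str]] = {cell: [] for cell in FRAMEWORK}

def add_question(role: str, eval_type: str, question: str) -> None:
    """Record a key evaluation question in the appropriate cell of the matrix."""
    questions[(role, eval_type)].append(question)

if __name__ == "__main__":
    add_question("formative", "context",
                 "What are the highest priority needs to improve safety culture in the U.S. rail industry?")
    for (role, eval_type), focus in FRAMEWORK.items():
        print(f"{role}/{eval_type}: {focus}")
        for q in questions[(role, eval_type)]:
            print(f"  - {q}")

Keeping the matrix explicit in this way makes it easy to confirm that every cell has at least one key evaluation question before an evaluation plan, such as the safety culture example that follows, is finalized.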
Evaluation Framework: Key Evaluation Questions – Safety Culture

Formative Evaluation
• Context: What are the highest priority needs to improve safety culture in the U.S. rail industry?
• Inputs: What are the most promising alternatives for safety culture interventions (BBS, ISROP, Rules Revision, Close Calls, etc.)? How do they compare (potential success, costs, etc.)? How can these interventions be most effectively implemented? What are some potential barriers to implementation?
• Implementation: To what extent do safety culture interventions proceed on time, within budget, and effectively? How can safety culture interventions be implemented to maximize effectiveness?
• Impact: What indicators of impact or use, if any, have emerged to show that these interventions are being adopted more broadly? What are some emerging outcomes (positive or negative)? How can the implementations be modified to minimize costs and maximize effectiveness?

Summative Evaluation
• Context: To what extent did this intervention address the high priority safety need?
• Inputs: What intervention strategy was chosen, and why was it chosen compared to other viable strategies (regarding prospects for success, feasibility, and costs)?
• Implementation: To what extent was the intervention carried out as planned, or modified with an improved plan? If needed, how can the intervention design be improved?
• Impact: To what extent did these interventions improve safety and safety culture? Were there any unanticipated negative or positive side effects? What conclusions and lessons learned can be reached (e.g., cost-effectiveness, stakeholder engagement, program effectiveness)?

Evaluation as a Key Strategy Tool
• Ask questions that matter about processes, products, programs, policies, and impacts, then develop appropriate and rigorous methods to answer them.
• Measure the extent to which, and the ways in which, program goals are being met. What is working, and why, or why not?
• Use evaluation to refine program strategy, design, and implementation, and inform others about lessons learned, progress, and program impacts.
• Improve the likelihood of success with:
  – Intended users
  – Intended uses
  – Outcomes and impacts
  – Unanticipated (positive) outcomes
• Use evaluation to develop appropriate and useful performance measures for reporting R&D outcomes, and monitor those outcomes for continuous improvement.

Michael Coplen
Senior Evaluator
Office of Research & Development
Federal Railroad Administration
202-493-6346
Michael.Coplen@dot.gov

QUESTIONS?

Supplemental Information

Evaluation Framework: Illustrative Questions – Fatigue Website

Formative Evaluation
• Context: What are the highest priority needs for sleep health and safety in the railroad industry?
• Inputs: Given the need for sleep health education and training, what are the most promising alternatives (fatigue website, regulations, etc.)? How do they compare (potential success, costs, etc.)? How can this strategy be most effectively implemented? What are some potential barriers to implementation?
• Implementation: To what extent is the website project proceeding on time, within budget, and effectively? If needed, how can the design be improved?
• Impact: To what extent are people using the website? What other indicators of use, if any, have emerged that show the website is being accessed and the information is being acted upon? What are some emerging outcomes (positive or negative)? How can the implementation be modified to maintain and measure success?

Summative Evaluation
• Context: To what extent did the fatigue website address this high priority need?
• Inputs: What strategy was chosen, and why, compared to other viable strategies (regarding prospects for success, feasibility, and costs)?
• Implementation: To what extent was the website carried out as planned, or modified with an improved plan?
• Impact: To what extent did this project effectively address the need to educate railroad employees on sleep health and safety? Were there any unanticipated negative or positive side effects? What conclusions and lessons learned can be reached (e.g., cost-effectiveness, stakeholder engagement, program effectiveness)?

Input Evaluation: Program Design and Partnership
Clear Signal for Action (CSA) Theory of Change (diagram)
• Commitment to change: safety culture; management values, attitudes, and competencies; patterns of behavior (management and labor)
• Intervention: establish steering committee (management); develop checklist (steering committee); observer training (steering committee, observers); data gathering and feedback (observers); data analysis and corrective action planning (steering committee, CA team); corrective actions where workers do not have control (CA team) and where workers have control (steering committee)
• Targets: at-risk conditions, at-risk behaviors, and incidents

Implementation Evaluation
• Peer-to-peer feedback
• Continuous improvement (CI)
• Safety leadership development (SLD)
• Safety outcomes

Impact Evaluation: Expected changes and possible metrics (Union Pacific example)
• Implementation (S.T.E.E.L. activities): communications; steering committee training; employee involvement in S.T.E.E.L.; checklist development; sampler training; sampling; coaching; feedback; data analysis; barrier identification and removal; leadership training
• First-order impacts (S.T.E.E.L.-targeted employee practices and management practices): equipment control; rule compliance; safe behaviors; reactions to problems; safety-enabling leadership behaviors; communication quality, amount, and consistency; attitude toward safety
• Second-order impacts (general employee practices, employee well-being, and incidents): personal sense of control/responsibility; job satisfaction; health; stress; awareness; incidents; investigations; personal injuries; discipline; derailments; FTX results; collisions; decertifications; close calls; safety hotline
• Third-order impacts (culture and corporate results): safety culture; labor-management relations; liability; incident costs; productivity; public image
• Other influences include: corporate policy changes; FRA practices
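The Union Pacific example above is, in effect, a tiered list of candidate metrics: implementation activities feed first-, second-, and third-order impacts. The sketch below (Python) shows one way those tiers could be captured for tracking purposes; it is illustrative only, the groupings follow the slide, and the structure and function names are assumptions rather than FRA or Union Pacific code.

from typing import Dict, List

# impact tier -> candidate metrics, grouped as on the Union Pacific slide
IMPACT_TIERS: Dict[str, List[str]] = {
    "implementation (S.T.E.E.L. activities)": [
        "communications", "steering committee training",
        "employee involvement in S.T.E.E.L.", "checklist development",
        "sampler training", "sampling", "coaching", "feedback",
        "data analysis", "barrier identification and removal",
        "leadership training",
    ],
    "first-order impacts (targeted employee and management practices)": [
        "equipment control", "rule compliance", "safe behaviors",
        "reactions to problems", "safety-enabling leadership behaviors",
        "communication quality, amount, and consistency",
        "attitude toward safety",
    ],
    "second-order impacts (general practices, well-being, incidents)": [
        "personal sense of control/responsibility", "job satisfaction",
        "health", "stress", "awareness", "personal injuries",
        "derailments", "collisions", "close calls", "decertifications",
    ],
    "third-order impacts (culture and corporate results)": [
        "safety culture", "labor-management relations", "liability",
        "incident costs", "productivity", "public image",
    ],
}

def metrics_for(tier: str) -> List[str]:
    """Return the candidate metrics recorded for one impact tier."""
    return IMPACT_TIERS[tier]

if __name__ == "__main__":
    for tier, metrics in IMPACT_TIERS.items():
        print(f"{tier}: {len(metrics)} candidate metrics")

Keeping the tiers explicit makes it easier to pair each candidate metric with a data source and a baseline when the impact evaluation is designed.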