THE TAMING OF UNCERTAINTY
Magda Osman
QMUL, Head of Dynamic Learning and Decision-making Lab
www.magdaosman.co.uk
m.osman@qmul.ac.uk

Act 1 – The Comedy of Gains & Losses
Act 2 – The Tragedy of Unknowing
Act 3 – The Triumph of Heroes

Scene setting
Decision-making scenarios (Meder, Le Lec & Osman, 2013, Trends in Cognitive Sciences)
Types of uncertainty (Meder, Le Lec & Osman, 2013, Trends in Cognitive Sciences)

How are dynamic situations studied in the lab?
• Microworlds: mini computerised situations that mimic uncertain domains, in which the participant interacts with the system and then attempts to control various outcomes
1. A brief scenario is presented
2. The participant is then shown the computer-based task
3. They have a set number of trials in which to manipulate variables in order to control an outcome to criterion
4. This is followed by knowledge tests designed to assess their understanding of the rules or causal structure connecting the input variables to the output variables

Characters
Dynamic Decision-Making (DDM)
"…is a goal-directed process that involves selecting actions that will reliably achieve and maintain the same outcome over time" (Brehmer, 1992)
• Where can DDM be found? Where changes occur endogenously and/or exogenously
• What are the defining characteristics of DDM? Sequential, interdependent, online
Osman (2010). Psychological Bulletin

Agency & Control
Choice involves a selection between alternatives; inherent in this is an action (mental/physical) that identifies the preferred choice
Control: "…is the combination of cognitive processes needed to co-ordinate actions in order to achieve a goal on a reliable basis over time"
Agency: "…is the overarching state or sustained experience that is concerned with ownership of and responsibility for observed actions"
Osman (2014). Future-minded: The psychology of agency and control

Heroes vs. Villains
• Bounded rationality
• Unconscious
• Reinforcement learning
• Invalidity
• Causality
• Unreliability

Plot
Advances in DDM research
• Machine learning / Animal learning / Neuroscience: Huys & Dayan (2009); Dayan (2009); Daw, Niv & Dayan (2005); Dickinson (1985); Matignon, Laurent & Le Fort-Piat (2006)
• Dynamic control / Naturalistic decision-making tasks / Contingency learning / Management science: Brehmer (1992); Busemeyer (1999); Edwards (1962); Kirlik, Miller & Jagacinski (1993); Sterman (1989)

So, what are the key influences on dynamic decision-making?
Monitoring & Control Theory (Osman, 2008, 2010, 2011)
The agent
The agent is engaged in a goal-directed way (goals) in order to achieve a certain outcome (reward), and repeatedly behaves as if an action will achieve and maintain a certain goal (sense of agency/contingency learning)
The situation
The actions of the agent are informed by state changes in the environment and by the feedback/reward structure, in which the state changes are potentially knowable
Simple prediction
Only manipulations of agency/goals/reward/feedback/contingency should impact decision-making performance

Synopsis
Research programme
Learning:
- Mode of learning (prediction vs. control) (observation vs. action)
- Type of goal (exploration vs. exploitation)
Sense of agency: (low vs. high)
Contingency: (extreme, high, moderate, low)
Performance feedback: (positive, negative, both, neither)
Reward: (gains, losses)

Preamble: Losses vs. Gains
Lab-based decision-making
– Hedonic principle + loss aversion = losses are generally more salient than gains (à la Prospect Theory)
So, losses should drive learning more effectively
Real-world decision-making
– During early stages of learning, negative rewards lead to improved performance
So, losses should drive learning more effectively
(Brett & VandeWalle, 1999; Kahneman & Tversky, 1970; Latham & Locke, 1990; Tabernero & Wood, 1999)

Experimental set-up
• Experimental set-up: 100 learning trials; 20 test trials with a familiar goal; 20 test trials with an unfamiliar goal (in both test phases, reward is performance related)
• Actions: intervene on input 1, input 2 or input 3, or make no intervention; value setting (1-100)
• Goal of task: learn to control a dynamic outcome to a specific goal
• Reward set-up: maximize gains vs. minimize losses; points convert to money

Structure of reward
Reward is based on the discrepancy between the achieved and the target value:
• If the outcome value is closer to target than on the previous trial: Gains +10 (maximum gain), Losses -5 (minimum loss)
• If the outcome value is further from target than on the previous trial: Gains +5 (minimum gain), Losses -10 (maximum loss)
• Point assignment is probabilistic (80% reliable)
• Total points in Test 1 + Test 2 × 2.5 = final winnings

Structure of task environment
3 inputs, 1 output (continuous variables)
$y(t) = y(t-1) + b_1 x_1(t) + b_2 x_2(t) + e(t)$
• Output value = $y(t)$
• Previous output value = $y(t-1)$
• Positive input: $b_1 = 0.65$
• Negative input: $b_2 = -0.65$
• Null input: the third input does not appear in the equation and has no effect on the output
• Random noise = $e(t)$, drawn from a normal distribution with a mean of 0 and an SD of 8 (intermediate noise)

Act 1: The Comedy of Gains & Losses
Learning performance: outcome feedback on every trial (trials 1-100)
N = 40 (Gains N = 20, Losses N = 20)
– No differences in learning patterns
Test performance: outcome feedback on every trial (Test 1, Test 2)
– Limited difference in test performance

Act 2: The Tragedy of Unknowing
Learning performance:
Outcome feedback every 5th trial (trials 1-100)
[Figure: learning performance across trials 1-100, Gains vs. Losses]
N = 50 (Gains N = 25, Losses N = 25)
– Advantage for the Gains group
Test performance: outcome feedback every 5th trial
[Figure: Test 1 and Test 2 performance across trials 1-20, Gains vs. Losses]
– Small but reliable advantage for the Gains group in test performance

Comparison of both experiments
[Figure: learning performance and test performance, Exp 1a vs. Exp 1b]

Final Act: The Triumph of Heroes
Best circumstances for DDM
Despite impoverished information (Exp 1b), DDM is robust enough that learning is possible, but only when maximizing gains
1. High-quality (but simple) outcome information, presented frequently, with reward signals [not socially framed]
2. Incentivization schemes make a difference, but only under extreme uncertainty [avoid punishment schedules under these conditions]

Strategies
When conditions are highly unstable, people:
1. intervene on the system a lot
2. make dramatic changes in parameter settings
3. change multiple variables at once
When conditions are highly stable, people:
1. intervene on the system very little
2. make conservative changes in parameter settings
3. make minimal, systematic changes to variables
Under both conditions, people tend to stick with their choice of strategy rather than switch over time

Monitoring and Control
• Simple prediction
• Only manipulations of agency/goals/reward/feedback/contingency should impact decision-making performance
• This has been supported in several studies
• Critically, the above factors affect forecasting behaviour as well as DDM (control behaviour)
• Critically, there is NO evidence that these factors have a differential effect on DDM and forecasting because they tap into different "systems" (i.e. System 1 vs. System 2)

Thanks to
Researchers and undergraduate students: Patrycja Marta Bartoszek, Zuzanna Hola, Bjoern Meder, Brian Glass, Agata Ryterska, Maarten Speekenbrink, Susanne Stollewerk

Exploitation, Optimality, Variability
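The task environment and reward structure from the experimental set-up above can be simulated in a few lines, which makes the update rule $y(t) = y(t-1) + b_1 x_1(t) + b_2 x_2(t) + e(t)$ and the probabilistic reward schedule concrete. This is a minimal sketch, not the software used in these studies. Several details are assumptions: the function names (`next_output`, `reward`, `run_block`), the starting output and target values, the treatment of ties, and the reading of "80% reliable" as "the correct amount is paid on 80% of trials and the opposite amount otherwise".

```python
import random

# Parameters from the task-environment slide.
B1 = 0.65       # weight of the positive input x1
B2 = -0.65      # weight of the negative input x2 (x3 is the null input)
NOISE_SD = 8.0  # e(t) ~ Normal(0, 8), "intermediate noise"

# Reward schedule from the reward-structure slide: (closer, further).
REWARDS = {"gains": (10, 5), "losses": (-5, -10)}
RELIABILITY = 0.8  # "80% reliable" (interpretation is an assumption)


def next_output(y_prev, x1, x2, x3, rng):
    """y(t) = y(t-1) + b1*x1(t) + b2*x2(t) + e(t); x3 is accepted but ignored."""
    return y_prev + B1 * x1 + B2 * x2 + rng.gauss(0.0, NOISE_SD)


def reward(condition, y_prev, y_new, target, rng):
    """Points for one trial in the 'gains' or 'losses' condition."""
    closer_pts, further_pts = REWARDS[condition]
    # Assumption: an equal distance to target counts as "further".
    closer = abs(y_new - target) < abs(y_prev - target)
    correct, wrong = (closer_pts, further_pts) if closer else (further_pts, closer_pts)
    # Assumption: on the unreliable 20% of trials the opposite amount is paid.
    return correct if rng.random() < RELIABILITY else wrong


def run_block(condition, inputs, y0, target, seed=0):
    """Simulate a block of trials; `inputs` is a list of (x1, x2, x3) settings."""
    rng = random.Random(seed)
    y, outputs, points = y0, [], []
    for x1, x2, x3 in inputs:
        y_new = next_output(y, x1, x2, x3, rng)
        points.append(reward(condition, y, y_new, target, rng))
        outputs.append(y_new)
        y = y_new
    return outputs, points


if __name__ == "__main__":
    # Hypothetical plan: raise the positive input for five trials.
    plan = [(10, 0, 0)] * 5
    ys, pts = run_block("gains", plan, y0=50.0, target=80.0, seed=1)
    print([round(v, 1) for v in ys], pts, sum(pts))
```

With a noise SD of 8, a single intervention of 10 units on input 1 (an expected shift of 6.5) can still leave the output further from target, so in this sketch both the outcome and the points carry noise, which is the kind of uncertainty the task was designed to create.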