A Maximum Expected Gain Model of Movement under Risk
L. T. Maloney, J. Trommershäuser, M. S. Landy
Psychology and Neural Science, New York University
VSS 2003, Sarasota, FL

We present a novel model for the planning of movements in environments where there are explicit gains and losses associated with the outcomes of actions. In our model, subjects choose movement trajectories to maximize expected gain. The model takes into account the consequences of accidental deviations that carry the movement into "dangerous regions."

ILLUSTRATION

The subject sees a target on a computer screen that consists of overlapping reward and penalty regions: a reward circle worth +100 points and a penalty circle worth -500 points, each of radius R = 9 mm (18 mm in diameter). On each trial, she must rapidly touch the screen or incur a large 'timeout' penalty.

Example (simulation): different outcomes of the (simulated) experiment.
[Figure: cumulative gain in $ as a function of trial number for several simulated runs; 100 trials, bonus: 25¢ per 1000 points. A numerical sketch of this simulation appears at the end of the poster.]

MEGaMove* MODEL

Assumptions:
1. The outcome of movement planning is a visual-motor strategy, S.
2. When a strategy is executed, the result is a particular movement trajectory, τ.
3. A visual-motor strategy imposes a probability density, p(τ|S), on the space of possible movement trajectories.
4. The movement space contains possibly overlapping regions, R_i, i = 1, …, n.
5. If a trajectory τ passes through region R_i, a gain G_i is earned (gains may be negative).
6. The subject chooses the strategy, S, that maximizes the overall expected gain

   \Gamma(S) = \sum_{i=1}^{n} G_i P_i(S) + G_{TO} P_{TO}(S) + B(S),

   where

   P_i(S) = \int_{R_i} p(\tau \mid S) \, d\tau

   is the probability that the trajectory resulting from strategy S passes through region R_i, G_{TO} P_{TO}(S) is the 'timeout' penalty for failing to complete a movement within a specified time limit, and B(S) is the biomechanical cost associated with strategy S. (A numerical sketch of this computation appears at the end of the poster.)

TEST OF MODEL

We report the results of an experimental test of the model.

Eighteen conditions:
• 6 stimulus configurations (varied within block), with penalty-reward separations of R, 1.5R, and 2R
• 3 penalty conditions: 0, -100, -500 (varied between blocks)
• 40 repetitions per condition
• Movement end points (x, y) recorded: 720 = 40 × 6 × 3 data points per subject
• Time limit: 700 ms; response after the time limit: -700 points
• One practice session: 300 trials, slowly decreasing time limit

Results:
• Mean movement end points: [Figure: mean movement end points, x and y in mm, per condition.]
• Subject's end-point variability: σ = 4.83 mm
• Trial-by-trial variation of movement end points: subjects are not learning by trial-by-trial error correction.

PREDICTED MEAN END POINT

[Figure: predicted vs. observed mean movement end points (x, y in mm) for penalty = 0, -100, and -500 points per trial. X: mean movement end point; x = x_green − X (data corrected for constant pointing bias).]

CONCLUSIONS

Subjects shifted their mean movement end points in response to changes in penalties and in the location of the penalty region. Overall, subjects acted so as to maximize expected gain in a variety of stimulus configurations, in good agreement with the predictions of the model. We conclude that movement planning takes extrinsic costs and the subject's own motor uncertainty into account.

Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003). Spatial Vision, 16, 255-275.
Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003). JOSA A, in press.

*: Maximum Expected Gain Model of Movement planning

Supported by NIH EY08266 and HFSP RG0109/1999-B; J.T. funded by the DFG (Emmy Noether-Programm).
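
APPENDIX: NUMERICAL SKETCHES

Below is a minimal sketch of the expected-gain computation in assumption 6, not the authors' analysis code. Beyond what the poster states, it assumes that a strategy is summarized by its mean end point, that end points scatter around that mean as an isotropic bivariate Gaussian with the σ = 4.83 mm reported above, and that the timeout term G_TO P_TO(S) and the biomechanical cost B(S) are constant across aim points and can be dropped. The geometry (a +100-point reward circle at the origin, a same-size penalty circle displaced along the x-axis) follows the illustration; the Monte Carlo scan over aim points is purely illustrative.

```python
"""Minimal sketch of the MEGaMove expected-gain computation.

Assumptions beyond the poster: end points ~ isotropic bivariate
Gaussian around the aim point; timeout and biomechanical terms
constant across aim points, hence omitted. Not the authors' code.
"""
import numpy as np

SIGMA = 4.83       # end-point standard deviation [mm] (from the poster)
R = 9.0            # radius of reward and penalty circles [mm]
G_REWARD = 100.0   # gain for hitting the reward circle [points]
rng = np.random.default_rng(0)

def expected_gain(aim_x, penalty, sep, n=100_000):
    """Monte Carlo estimate of Gamma(S) for an aim point (aim_x, 0).

    The reward circle is centered at (0, 0), the penalty circle at
    (-sep, 0); the regions overlap, so both gains can accrue on the
    same trial, matching assumptions 4-5.
    """
    x = rng.normal(aim_x, SIGMA, n)
    y = rng.normal(0.0, SIGMA, n)
    in_reward = x**2 + y**2 <= R**2
    in_penalty = (x + sep)**2 + y**2 <= R**2
    return (G_REWARD * in_reward + penalty * in_penalty).mean()

# Predicted optimal mean end point: scan candidate aims along x.
aims = np.linspace(0.0, 10.0, 101)
for penalty in (0.0, -100.0, -500.0):
    for sep in (R, 1.5 * R, 2.0 * R):   # separations from the poster
        gains = [expected_gain(a, penalty, sep) for a in aims]
        best = aims[int(np.argmax(gains))]
        print(f"penalty {penalty:6.0f}, sep {sep / R:.1f}R: "
              f"predicted shift {best:4.1f} mm from target center")
```

With a penalty of 0 the predicted optimal aim sits at the target center; as the penalty magnitude grows or the penalty circle moves closer, the predicted mean end point shifts farther from the penalty region. This is the qualitative pattern the experiment tests.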
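
The second sketch mimics the "Example (simulation)" panel above: cumulative winnings over 100 trials at a fixed aim point, converted to dollars at the stated bonus rate of 25¢ per 1000 points. The particular condition (penalty -500, separation 1.5R, aim 7 mm from the target center) is a hypothetical choice for illustration only.

```python
"""Sketch of the 'Example (simulation)' panel: cumulative winnings
over 100 trials, for a few simulated runs of one assumed condition."""
import numpy as np

SIGMA, R, G_REWARD = 4.83, 9.0, 100.0
PENALTY, SEP, AIM = -500.0, 1.5 * 9.0, 7.0   # hypothetical condition
rng = np.random.default_rng(1)

for run in range(4):                   # a few simulated outcomes
    x = rng.normal(AIM, SIGMA, 100)    # 100 trials at a fixed aim point
    y = rng.normal(0.0, SIGMA, 100)
    points = (G_REWARD * (x**2 + y**2 <= R**2)
              + PENALTY * ((x + SEP)**2 + y**2 <= R**2))
    dollars = points.cumsum() * 0.25 / 1000.0   # 25¢ per 1000 points
    print(f"run {run + 1}: final gain ${dollars[-1]:+.2f}")
```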