Schedules of Reinforcement and Choice

Simple Schedules
• Ratio vs. interval
• Fixed vs. variable
• Crossing the two dimensions gives the four simple schedules: FR, VR, FI, VI

Fixed Ratio (FR)
• CRF (continuous reinforcement) = FR 1
• Partial/intermittent reinforcement
• Post-reinforcement pause (PRP)

Causes of the FR PRP
• Fatigue hypothesis
• Satiation hypothesis
• Remaining-responses hypothesis
  – The reinforcer acts as a discriminative stimulus signaling that the next reinforcer will not arrive any time soon

Evidence
• PRP increases as FR size increases
  – Does not support satiation
• Multiple FR schedules
  – Long and short ratio components, each signaled by its own stimulus
  – PRP is longer if the upcoming schedule is long, shorter if it is short
  – Does not support fatigue
[Diagram: multiple schedule alternating FR 10 and FR 40 components; pauses before the FR 10 component are short (S), pauses before the FR 40 component are long (L)]

Fixed Interval (FI)
• Also has a PRP
• Not due to remaining responses, though
• Time estimation: responding accelerates as the interval elapses
• Minimizes the cost-to-benefit ratio, since responses early in the interval cost effort but earn nothing

Variable Ratio (VR)
• Steady response pattern
• PRPs unusual
• High response rate

Variable Interval (VI)
• Steady response pattern
• Slower response rate than VR

Comparison of VR and VI Response Rates
• Response rate for VR is faster than for VI
• Molecular theories
  – Small-scale events
  – Reinforcement on a response-by-response basis
• Molar theories
  – Large-scale events
  – Reinforcement over the whole session

IRT Reinforcement Theory
• Molecular theory
• IRT: interresponse time, the time between two consecutive responses
• VI schedule
  – Long IRTs are reinforced: the longer the wait, the more likely the interval has timed out
• VR schedule
  – Elapsed time is irrelevant, so short IRTs end up reinforced
[Demonstration: a random number generator (mean = 5) sets the time or response count required for each of 30 reinforcer deliveries; tallying which interresponse times are reinforced shows the interval schedule favouring long IRTs and the ratio schedule favouring short ones]

Response-Reinforcer Correlation Theory
• Molar theory
• Behaviour is governed by the response-reinforcer relationship across the whole experimental session
  – The long-range reinforcement outcome matters; trial-by-trial events are unimportant
• Criticism: too cognitive
[Figure: feedback functions relating responses/minute (0-100) to reinforcers/hour (0-100); the VR 60 function rises with response rate, while the VI 60-sec function levels off]

Choice
• Two-key/two-lever protocol
• Ratio-ratio
• Interval-interval
• Typically VI-VI
• CODs (changeover delays)

Matching Law
• B1/(B1 + B2) = R1/(R1 + R2), or equivalently B1/B2 = R1/R2
• B = behaviour (responses)
• R = reinforcement

Bias
• Spending more time on one alternative than the law predicts
• Side preferences
• Biological predispositions
• Quality and amount of the reinforcers
• Undermatching, overmatching

Qualities and Amounts
• Q1: quality of first reinforcer
• Q2: quality of second reinforcer
• A1: amount of first reinforcer
• A2: amount of second reinforcer
• These enter the matching law as additional ratios: B1/B2 = (R1/R2)(Q1/Q2)(A1/A2)

Undermatching
• Most common
• Response proportions are less extreme than reinforcement proportions

Overmatching
• Response proportions are more extreme than reinforcement proportions
• Rare
• Found when a large penalty is imposed for switching
  – e.g., a barrier between the keys
[Figure: B1/(B1 + B2) plotted against R1/(R1 + R2) for undermatching and overmatching; undermatching curves lie closer to 0.5 (indifference) than the perfect-matching diagonal, overmatching curves lie farther from it]

Baum's Variation
• B1/B2 = b(R1/R2)^s
• s = sensitivity of behaviour to the relative rate of reinforcement
  – Perfect matching: s = 1
  – Undermatching: s < 1
  – Overmatching: s > 1
• b = response bias
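To make s and b concrete, here is a minimal Python sketch (not from the slides; the reinforcement rates and parameter values are illustrative) that computes the choice proportions Baum's variation predicts:

```python
def response_ratio(r1, r2, s=1.0, b=1.0):
    """Baum's generalized matching law: B1/B2 = b * (R1/R2)**s."""
    return b * (r1 / r2) ** s

def choice_proportion(r1, r2, s=1.0, b=1.0):
    """Convert the ratio B1/B2 into the proportion B1/(B1 + B2)."""
    ratio = response_ratio(r1, r2, s, b)
    return ratio / (1 + ratio)

# Illustrative reinforcement rates: key 1 pays twice as often as key 2.
r1, r2 = 40, 20

for label, s in [("perfect matching", 1.0),
                 ("undermatching", 0.5),
                 ("overmatching", 2.0)]:
    print(f"{label} (s={s}): B1/(B1+B2) = {choice_proportion(r1, r2, s=s):.3f}")

# Output:
# perfect matching (s=1.0): B1/(B1+B2) = 0.667
# undermatching (s=0.5): B1/(B1+B2) = 0.586
# overmatching (s=2.0): B1/(B1+B2) = 0.800
```

Note how s < 1 pulls choice toward indifference (0.5) while s > 1 pushes it toward exclusive preference, the same pattern as in the undermatching/overmatching figure above.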
Matching as a Theory of Choice
• Animals match because they have evolved to do so
• A nice, simple approach, but ultimately wrong
• Consider a VR-VR schedule
  – Animals come to respond exclusively on one alternative, whichever ratio is lower
  – The matching law cannot explain this

Melioration Theory
• Invest effort in the "best" alternative
• In VI-VI, partition responses so as to get the best reinforcer:response ratio on each alternative
  – Overshooting the goal sets up a feedback loop
• In VR-VR, keep shifting toward the lower schedule, which always gives the better reinforcer:response ratio
• The mixture of responding matters over the long run, but trial-by-trial responding shifts the balance

Optimization Theory
• Optimize reinforcement over the long term
• Minimum work for maximum gain
• Respond to both choices so as to maximize overall reinforcement

Momentary Maximization Theory
• Molecular theory
• Select whichever alternative has the highest value at that moment
• Short-term vs. long-term benefits

Delay-Reduction Theory
• Immediate or delayed reinforcement
• Basic principles of the matching law, plus:
• Choice is directed toward whichever alternative gives the greatest reduction in the delay to the next reinforcer
• Has both molar (matching responses to reinforcement) and molecular (control by the shorter delay) features

Self-Control
• Conflict between short- and long-term choices
• Choice between a small, immediate reward and a larger, delayed reward
• Self-control is easier if the immediate reinforcer is delayed or made harder to get

Value-Discounting Function
• V = M/(1 + KD)
  – V = value of the reinforcer
  – M = reward magnitude
  – K = discounting-rate parameter
  – D = reward delay
• Set M = 10, K = 5
  – If D = 0, then V = 10/(1 + 0) = 10
  – If D = 10, then V = 10/(1 + 5*10) = 10/51 = 0.196

Reward Size and Delay
• Set M = 5, K = 5, D = 1
  – V = 5/(1 + 5*1) = 5/6 = 0.833
• Set M = 10, K = 5, D = 5
  – V = 10/(1 + 5*5) = 10/26 = 0.385
• To get the same V (0.833) at D = 5, M must rise to about 21.66

Ainslie-Rachlin Theory
• The value of a reinforcer decreases as the delay between the choice and receipt of the reinforcer increases
• The organism chooses whichever reinforcer has the higher value at the moment of choice
• Implies the ability to change one's mind; hence the usefulness of binding decisions
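The worked numbers above, and the preference reversal that the Ainslie-Rachlin theory predicts, are easy to check in code. A minimal Python sketch follows (the M, K, and D values in the first part come from the slides; the delays in the reversal example are illustrative additions):

```python
def discounted_value(m, k, d):
    """Hyperbolic value-discounting function: V = M / (1 + K*D)."""
    return m / (1 + k * d)

# Worked examples from the slides (M = magnitude, K = discount rate, D = delay).
print(discounted_value(10, 5, 0))   # 10.0
print(discounted_value(10, 5, 10))  # ~0.196
print(discounted_value(5, 5, 1))    # ~0.833
print(discounted_value(10, 5, 5))   # ~0.385

# Magnitude needed to hold V constant at a longer delay:
# V = M/(1 + K*D)  =>  M = V * (1 + K*D)
v_target = discounted_value(5, 5, 1)
m_needed = v_target * (1 + 5 * 5)
print(m_needed)  # ~21.67 (the slides' 21.66 comes from using the rounded V = 0.833)

# Ainslie-Rachlin preference reversal (illustrative): a small reward (M = 5)
# becomes available 1 time unit sooner than a large one (M = 10).
for t in (0, 2, 6):  # time remaining until the small reward
    small = discounted_value(5, 5, t)
    large = discounted_value(10, 5, t + 1)
    choice = "small" if small > large else "large"
    print(f"t={t}: small={small:.3f}, large={large:.3f} -> choose {choice}")

# Far from both rewards (t = 6 or t = 2) the larger reward has the higher value,
# but just before the small reward arrives (t = 0) its value overtakes the larger
# one's: the preference reversal that makes binding decisions useful.
```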