www.ucl.ac.uk/behaviour-change @UCLBehaveChange behaviourchange@ucl.ac.uk

Using Bayesian methods for evaluating behaviour change interventions
12 May 2014
With live-blogging directly from the workshop: http://blogs.ucl.ac.uk/bayesian-bci/

Programme
Chair: Dr Jamie Brown
4.30-4.35 Introduction to the Centre for Behaviour Change (Prof Susan Michie)
4.35-4.50 Introductions and aims of the workshop (Prof Robert West)
4.50-5.10 Bayesian statistics and its use in intervention evaluation (Prof Zoltan Dienes)
5.10-5.20 Questions
5.20-5.30 Summary of a social marketing intervention on smoking in Botswana (Larissa Persons)
5.30-5.50 Bayesian versus classical statistics in evaluating this intervention (Prof Robert West and Dr Emma Beard)
5.50-6.25 Discussion around other examples provided by participants (Dr Jamie Brown)
6.25-6.30 Concluding remarks and possible future steps (Prof Robert West)
6.30-7.30 Reception

UCL Centre for Behaviour Change
Susan Michie, Director
www.bct-taxonomy.com
www.behaviourchangewheel.com
Book Launch: 2 June 2014

Introductions and aims of the workshop
Robert West

How to get the most out of null results using Bayes
Zoltán Dienes

The problem: Does a non-significant result count as evidence for the null hypothesis or as no evidence either way?
[Figure: the dance of the p values. Mean differences (mdiff) in verbal ability, plotted against H0, for successive experiments, each annotated with its p value. Source: Geoff Cumming, http://www.latrobe.edu.au/psy/esci/index.html]

1. Intervals
2. Bayes factors

The four principles of inference by intervals:
[Figure: intervals for the difference between means, shown against 0, the null region and the minimal interesting value]
Accept the null region hypothesis
Reject the null region hypothesis
Reject a directional theory
Data are insensitive: suspend judgment

The Bayes Factor: Strength of evidence for one theory versus another (null)
If B > 1, then the data supported your theory over the null
If B < 1, then the data supported the null over your theory
If B = about 1, the experiment was not sensitive.
Jeffreys, 1939: Bayes factors more than 3 or less than 1/3 are substantial
B > 3: substantial support for theory
B < 1/3: substantial support for null

To know which theory the data support, we need to know what the theories predict
The null is normally the prediction of e.g. no difference
On the null hypothesis only this value is plausible
[Figure: plausibility against population difference between conditions]
We need to decide what difference or range of differences is consistent with one’s theory
Difficult, but it forces one to think clearly about one’s theory.
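The Jeffreys thresholds quoted above can be sketched as a small helper. This is only an illustration: the function name `interpret_bayes_factor` and the wording of the returned labels are assumptions, not part of the workshop material.

```python
def interpret_bayes_factor(b, threshold=3.0):
    """Label a Bayes factor B (theory over null) using a symmetric threshold.

    The default threshold of 3 follows the Jeffreys convention quoted on the
    slide; the function name and label strings are hypothetical.
    """
    if b <= 0:
        raise ValueError("A Bayes factor must be positive")
    if b > threshold:
        return "substantial support for theory"
    if b < 1.0 / threshold:
        return "substantial support for null"
    # Between 1/threshold and threshold the data do not discriminate well.
    return "data insensitive: suspend judgment"


if __name__ == "__main__":
    for b in (4.88, 1.01, 0.2):
        print(b, "->", interpret_bayes_factor(b))
```

Passing a different `threshold` shows how a stricter convention widens the range of B values treated as insensitive.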
To calculate a Bayes factor, one must decide what range of differences is predicted by the theory:
1) Uniform distribution
2) Normal
3) Half normal

Example: Does imagining a sports move improve sports performance?
[Figure: plausibility against population difference in means between practice versus no practice, with a point labelled "Performance with real practice for same amount of time"]

Similar sorts of effects as those predicted in the past have been on the order of a 5% difference between conditions
[Figure: plausibility against population difference in means between conditions, axis marked at 0 and 5]
Implies: Smaller effects more likely than bigger ones; effects bigger than 10% very unlikely

To calculate a Bayes factor in a t-test situation
Need the same information from the data as for a t-test:
Mean difference, Mdiff
SE of difference, SEdiff
Note: t = Mdiff / SEdiff, so SEdiff = Mdiff / t
Also note F(1,x) = t²(x)

Generalising to categorical data
Intervention given to one of two classes about harm of smoking

                  Smoker   Not smoker
intervention        32         54
no intervention     31         35

Odds ratio = (32*35) / (31*54)
Ln odds ratio is normally distributed with squared SE = 1/32 + 1/54 + 1/31 + 1/35

A different intervention had reduced smoking with an odds ratio of 3.
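A minimal sketch of how a Bayes factor could be computed for the smoking table above. Several things here are assumptions rather than the speaker's stated method: the theory is modelled as a half normal on the Ln odds ratio with SD = Ln 3 (taking the other intervention's odds ratio as the typical effect size), the Ln odds ratio is oriented so that fewer smokers in the intervention class counts as positive (the reciprocal of the odds ratio as written on the slide), and the integral is done with a simple midpoint rule. The function names are hypothetical and this is not the code behind any particular online calculator.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor_half_normal(obs, se, prior_sd, steps=20000):
    """B for a half-normal theory (mode 0, SD = prior_sd) versus a point null.

    obs is the observed effect and se its standard error; the likelihood is
    assumed normal. The prior is integrated numerically from 0 to 10 SDs.
    """
    upper = 10.0 * prior_sd
    width = upper / steps
    likelihood_theory = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width                        # midpoint of each slice
        prior = 2.0 * normal_pdf(theta, 0.0, prior_sd)   # half normal density
        likelihood_theory += prior * normal_pdf(obs, theta, se) * width
    likelihood_null = normal_pdf(obs, 0.0, se)
    return likelihood_theory / likelihood_null


# Cell counts from the slide.
smoker_int, nonsmoker_int = 32, 54
smoker_ctl, nonsmoker_ctl = 31, 35

# Assumed orientation: positive means fewer smokers with the intervention.
log_or = math.log((nonsmoker_int * smoker_ctl) / (smoker_int * nonsmoker_ctl))
se = math.sqrt(1 / 32 + 1 / 54 + 1 / 31 + 1 / 35)

b = bayes_factor_half_normal(log_or, se, prior_sd=math.log(3))

if __name__ == "__main__":
    print(f"Ln odds ratio = {log_or:.3f}, SE = {se:.3f}, B = {b:.2f}")
```

Under these assumptions B falls between 1/3 and 3, so the table on its own would count as insensitive rather than as evidence for or against the intervention. A different choice of prior SD or a uniform prior would give a different B, which is why the prediction of the theory has to be stated explicitly.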
[Figure: Ln 3 marked on the axis]

To calculate a Bayes factor:
1) Google “Zoltan Dienes”
2) First site to come up is the right one: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/
3) Click on “Click here for a Bayes factor calculator”
4) Scroll down and click on “Click here to calculate your Bayes factor!”

The tai chi of the Bayes factors
B, in the order shown on the slide: 2.96 4.88 0.52 4.88 2.70 0.46 4.40 1024.6 3.33 4.88 1.73 4.28 2.96 49.86 2.16 2.12 1.01 0.65 0.75 28.00 4.28 49.86 5.60 2.36 1.73

The dance of the p values
p, in the order shown on the slide: .081 .034 .74 .034 .09 .817 .028 .001 .056 .031 .279 .024 .083 .002 .167 .172 .387 .614 .476 .006 .028 .002 .024 .144 .23
http://www.latrobe.edu.au/psy/esci/index.html
[Figure: mean differences (mdiff) in verbal ability against H0 for successive experiments, annotated with p values and Bayes factors]

My typical practice:
If I can think of a way of determining an approximate expected size of effect => use a half normal with SD equal to that typical size
If I can think of a way of determining an approximate upper limit of effect => use a uniform from 0 to that limit

Moral and inferential paradoxes of orthodoxy:
1. On the orthodox approach, standardly you should plan in advance how many subjects you will run.
If you just miss out on a significant result you are not allowed to just run 10 more subjects and test again.
You are not allowed to run until you get a significant result.
Bayes: It does not matter when you decide to stop running subjects. You can always run more subjects if you think it will help.
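The stopping rule described above can be explored with a small simulation: add one subject at a time, recompute B, and stop as soon as B passes a threshold in either direction. This is a sketch under loud assumptions that are not from the slides: each subject contributes one score drawn from a normal distribution with known SD 1, the theory is a half normal with SD 1, and all function names are hypothetical. It is not claimed to reproduce any published simulation.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bf_half_normal(obs, se, prior_sd):
    """Closed-form B for a half-normal theory (mode 0) versus a point null."""
    total_var = prior_sd ** 2 + se ** 2
    post_mean = obs * prior_sd ** 2 / total_var
    post_sd = math.sqrt(prior_sd ** 2 * se ** 2 / total_var)
    theory = (2.0 * normal_pdf(obs, 0.0, math.sqrt(total_var))
              * normal_cdf(post_mean / post_sd))
    return theory / normal_pdf(obs, 0.0, se)


def run_until_decision(d, rng, threshold=3.0, max_n=100, min_n=1):
    """Add subjects one at a time; stop when B crosses a threshold or N runs out."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(d, 1.0)       # assumed: known population SD of 1
        if n < min_n:
            continue
        b = bf_half_normal(total / n, 1.0 / math.sqrt(n), prior_sd=1.0)
        if b > threshold:
            return "reject"              # reject the null
        if b < 1.0 / threshold:
            return "accept"              # accept the null
    return "undecided"


def decision_rates(d, sims=300, seed=1, **kwargs):
    """Per cent of simulated studies ending in each decision."""
    rng = random.Random(seed)
    counts = {"reject": 0, "accept": 0, "undecided": 0}
    for _ in range(sims):
        counts[run_until_decision(d, rng, **kwargs)] += 1
    return {k: 100.0 * v / sims for k, v in counts.items()}


if __name__ == "__main__":
    for d in (0.0, 1.0):
        print("d =", d, decision_rates(d))
```

Varying `threshold`, `min_n` and `max_n` shows how the rates of wrongly rejecting or wrongly accepting the null respond to the stopping rule.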
Moral paradox:
If p = .07 after running the planned number of subjects
i) If you run more and report significant at 5%, you have cheated
ii) If you don’t run more and bin the results, you have wasted taxpayers’ money and your time, and wasted relevant data
You are morally damned either way

Inferential paradox:
Two people with the same data and theories could draw opposite conclusions

Moral and inferential paradoxes of orthodoxy:
2. On the orthodox approach, it matters whether you formulated your hypothesis before or after looking at the data.
Post hoc vs planned comparisons
Predictions made in advance of looking at the data are treated differently from those made afterwards
Bayesian inference: It does not matter what day of the week you thought of your theory
The evidence for your theory is just as strong regardless of its timing

Moral and inferential paradoxes of orthodoxy:
3. On the orthodox approach, you must correct for how many tests you conduct in total.
For example, if you ran 100 correlations and 4 were just significant, researchers would not try to interpret those significant results.
On Bayes, it does not matter how many other statistical hypotheses you investigated (or your RA investigated without telling you). All that matters is the data relevant to each hypothesis under investigation.

For orthodoxy but not Bayes:
Different people with the same data and theories can come to different conclusions
You can thus be tempted to make false (albeit inferentially irrelevant) claims, such as about when you thought of your theory

What is the aim of statistics?
1) Control the proportion of errors you make in the long run in accepting and rejecting hypotheses (conventional statistics)
2) Indicate how strong the evidence is for one hypothesis rather than another / how much you should change your confidence in one hypothesis rather than another (Bayesian statistics)

Dienes 2011 Perspectives on Psychological Science

Threshold = 3, fixed 10 trials
d = 0: reject = 2, accept = 55
d = 1: reject = 91, accept = 0

Table 1. Per cent decision rates for BH(0,1), MaxN = 100, MinN = 1

Threshold:         3    4    5    6    7    8    9   10
d = 0   Reject    14   12   11   11    7    7    6    5
        Accept    86   87   86   86   85   79   74   69
d = 1   Reject    97  100  100  100  100  100  100  100
        Accept     1    0    0    0    0    0    0    0

If maxN = 1000, threshold = 10, for d = 0: reject = 5, accept = 93

Table 2. Per cent decision rates for BH(0,1), MaxN = 100, MinN = 10

Threshold:         3    4    5    6    7    8    9   10
d = 0   Reject     7    7    7    5    5    3    4    3
        Accept    93   91   88   81   83   82   73   66
d = 1   Reject   100  100  100  100  100  100  100  100
        Accept     0    0    0    0    0    0    0    0

Devil hypothesis: If a devil, you will lose a finger 9/10 of the time
Cat hypothesis: If a cat, you lose a finger only 1/10 of the time

Evidence supports the theory that most strongly predicted it
John puts his hand in the box and loses a finger.
Which hypothesis is most strongly supported, the cat hypothesis or the devil hypothesis?
Cat hypothesis predicts this result with probability = 1/10
Devil hypothesis predicts this result with probability = 9/10
Strength of evidence for devil over cat hypothesis = 9/10 divided by 1/10 = 9
The evidence is nine times as strong for the devil over the cat hypothesis
OR Bayes Factor (B) = 9

Consider: John does not lose a finger
Now evidence strongly supports cat over devil hypothesis (BF = 9 for cat over devil hypothesis or 1/9 for devil over cat hypothesis)

Probability of losing finger given cat = 4/10
Probability of losing finger given devil = 6/10
Now if John loses a finger, strength of evidence for devil over cat = 6/4 = 1.5
Not very strong

We can distinguish:
Evidence for cat hypothesis over devil
Evidence for devil hypothesis over cat
Not much evidence either way.

Bayes factor tells you how strongly the data are predicted by the different theories (e.g. your pet theory versus null hypothesis):
B = probability of your data given your pet theory divided by probability of data given null hypothesis

Questions

Social Marketing for tobacco control amongst teens
An overview of the Botswana intervention

Our context: a rapid rise in teen girl smoking rates
[Figure: rate of cigarette use, 13-15 yr old girls (GYTS): 2.6 in 2002, 10.9 in 2008]
Aware that smoking is associated with long-term health issues
Think it is normal to try smoking
Think celebrities & aspirational people (women) smoke
Think smoking will reap social rewards
Find it hard to say no
Grow up seeing adults around them smoking cigarettes

Our intervention: Tobacco Control Botswana
Multi-intervention approach to strip out the aspiration from smoking amongst teenage girls - using different influencers and channels to build momentum & reinforce change
Teen movement
School sessions
Parents campaign
Seeding: influencers, content

The teen movement
Our start point: the Botswana teenage mind...
Social inclusion and belonging is everything
Girls are trying to find their identity and express themselves – they want to work out who they are and what they can do and be
They think their friends are the only people who really understand what they’re going through
The here and now is everything – long term health risks of smoking are understood but don’t really matter to them
“My daughter likes wearing Indian bindis and scarfs on her head like a Muslim. I don’t know why, she just likes it!”
“Peer pressure and getting into relationships might stop me from achieving my dreams.”
“Girls my age don’t think about health at all. They think about outer health (looks). That’s what counts.”

Introducing SKY...
Be true to yourself.
Shapo ka yone: I’m good without it
Sure ka yone: I’m sure about it – I like it
SKY Be true. What’s your sky?
SKY is a movement for girls, by girls. It helps girls be true to themselves and express themselves through the choices they make about things. Girls say Sure Ka Yona to the things they like. They say Shapo Ka Yona to the things they don’t like. SKY aims to make one of these choices the choice not to smoke.

Key components of the movement
FACEBOOK: community
SKY MAGAZINE: free, for teens
SKY LIVE: pop up events
TOUCH THE SKY: music single
THE SUNDAY SKY: Yarona FM
15 MINUTES OF SKY: Radio RB2
CELEBRITIES: DiSKYples
BRANDS: Mafia soul, Yarona etc.
SKY pledge:
I pledge to be true to myself and what I believe in. To be who I am, not who someone else thinks I should be. I will say Sure Ka Yona to the things that I am all about – like standing up for my friends or listening to a song just because it makes me smile. I will say Shapo Ka Yona to the things I am good without – like smoking or backstabbing. I will make choices that are true to me. I will follow my skyline. I pledge to find my SKY.

1. Building the SKY movement

2. Embedding smoking
RADIO SHOW DISCUSSIONS
MAGAZINE ARTICLES
FACEBOOK COMMENTS
CELEBRITY INTERVIEWS
QUIZZES AND GAMES
PEER-TO-PEER
SKY SINGLE LYRICS
THE PLEDGE
CELEBRITY POSTERS

3. Expressing the choice not to smoke
Shapo ka yone SMOKING!
Pledge it – online, facebook, SMS, events
Show it – girls’ pictures
Say it – girls’ messages
Wear it – the wristband
Experience it – pop up events

Our supporting interventions
Teen movement
School sessions
Parents campaign
Seeding: influencers, content

Seeding strategy
AIM
Integrate an anti-smoking/smoking prevention message into existing content, and encourage key influencers to include the message in their communications, in a seamless way – so that it feels like a natural part of everything surrounding it
ROLE
Convey an explicit message through implicit means. Complement and reinforce SKY by helping to strip the aspiration out of smoking and re-contextualise it
CONTENT
Silent Shout (originates: Botswana): youth talk show - one of very few local soapies/chat based programmes
Rhythm City (originates: South Africa): drama series around the music business – very popular in SA and beyond
CELEBS (local and South African)
Zeus, Zizi Panther, Boitumelo ‘Boity’ Thulo, Samantha Mogwe

Parents campaign
AIM
Make parents think about the role they play in regard to their teenage children’s attitudes to smoking, and adopt a more mindful attitude towards it
ROLE
Influence the influencers – parents’ smoking behaviour often acts to normalise smoking for their children - indirect but important route to our target
In development: likely to be a relatively simple campaign in partnership with newspapers, ‘straight’ and adult in nature – ‘advertorials’ may play a large part within it

Schools sessions
AIM
Use schools as a channel to ‘do’ health in a way that makes knowledge that is currently theoretical relevant and real.
Make sure it is a “schools programme” like no other
ROLE
Reinforce the underlying reason why the majority of girls don’t want to smoke before social context comes into play - and drive it home
RULES OF OUR GAME
Delivered by an aspirational role model
Out of the classroom
‘Gross’ factor
Experiential and interactive

Bayesian versus classical statistics in evaluating this intervention
Robert West

Discussion

Concluding remarks and possible future steps
Robert West