Using Bayesian methods for evaluating behaviour change interventions 12 May 2014

www.ucl.ac.uk/behaviour-change
@UCLBehaveChange
behaviourchange@ucl.ac.uk
With live-blogging directly from the workshop:
http://blogs.ucl.ac.uk/bayesian-bci/
Programme
Chair: Dr Jamie Brown
4.30-4.35 Introduction to the Centre for Behaviour Change (Prof Susan Michie)
4.35-4.50 Introductions and aims of the workshop (Prof Robert West)
4.50-5.10 Bayesian statistics and its use in intervention evaluation (Prof Zoltan Dienes)
5.10-5.20 Questions
5.20-5.30 Summary of a social marketing intervention on smoking in Botswana (Larissa Persons)
5.30-5.50 Bayesian versus classical statistics in evaluating this intervention (Prof Robert West and Dr Emma Beard)
5.50-6.25 Discussion around other examples provided by participants (Dr Jamie Brown)
6.25-6.30 Concluding remarks and possible future steps (Prof Robert West)
6.30-7.30 Reception
UCL Centre for Behaviour Change
Susan Michie, Director
www.bct-taxonomy.com
www.behaviourchangewheel.com
Book Launch:
2 June 2014
Introductions and aims of the workshop
Robert West
How to get the most out of null results using Bayes
Zoltán Dienes
The problem:
Does a non-significant result count as evidence for the
null hypothesis or as no evidence either way?
[Figure: successive experiments (Geoff Cumming: http://www.latrobe.edu.au/psy/esci/index.html). The difference in verbal ability (mdiff) in 25 successive experiments, plotted against H0 = 0. Some experiments are significant (*, **, ***), some marginal (?) and some clearly not, with p values ranging from <.001 to .817.]
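The figure's point is that exact replications of one experiment can give wildly different p values. Here is a minimal simulation sketch of that "dance", assuming a two-group design with a known SD analysed by a z test. The sample size, true difference and SD below are hypothetical values chosen for illustration, not the settings behind Cumming's figure.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def dance_of_p_values(n_experiments=25, n_per_group=32, true_diff=10.0, sd=20.0, seed=1):
    """Replicate the same two-group experiment many times and collect the p values.

    All parameter values are hypothetical; a z test with known SD keeps the
    sketch free of third-party dependencies.
    """
    rng = random.Random(seed)
    se_diff = sd * math.sqrt(2.0 / n_per_group)  # SE of a difference between two means
    p_values = []
    for _ in range(n_experiments):
        treatment = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        mdiff = sum(treatment) / n_per_group - sum(control) / n_per_group
        p_values.append(two_sided_p(mdiff / se_diff))
    return p_values


if __name__ == "__main__":
    ps = dance_of_p_values()
    print(" ".join(f"{p:.3f}" for p in ps))
    print(f"smallest p = {min(ps):.3f}, largest p = {max(ps):.3f}")
```

Every run samples from the same population, so any spread in the printed p values is sampling noise alone.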
1. Intervals
2. Bayes factors
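The four principles of inference by intervals:
[Figure: a null region around 0, bounded by the minimal interesting value, on an axis of difference between means.]
1. If the interval lies entirely within the null region: accept the null region hypothesis.
2. If the interval lies entirely outside the null region: reject the null region hypothesis.
3. If the upper limit of the interval is below the minimal interesting value: reject a directional theory.
4. If the interval includes both null region values and values above the minimal interesting value, the data are insensitive: suspend judgment.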
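The four principles can be written as a small decision function. This is a sketch under stated assumptions: the null region is taken to be symmetric, from minus to plus the minimal interesting value, and the directional theory is taken to predict a positive difference at least as large as the minimal interesting value. The interval could be, for example, a 95% confidence interval. The example intervals are hypothetical.

```python
from typing import List


def interval_inference(lower: float, upper: float, min_interesting: float) -> List[str]:
    """Apply the four interval principles to the interval [lower, upper].

    Assumptions (ours, for illustration): the null region is
    [-min_interesting, +min_interesting], and the directional theory
    predicts a positive difference of at least min_interesting.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    m = min_interesting
    conclusions = []
    # Principle 1: interval wholly inside the null region
    if -m <= lower and upper <= m:
        conclusions.append("accept the null region hypothesis")
    # Principle 2: interval wholly outside the null region
    if upper < -m or lower > m:
        conclusions.append("reject the null region hypothesis")
    # Principle 3: even the top of the interval falls short of the minimal interesting value
    if upper < m:
        conclusions.append("reject a directional theory")
    # Principle 4: none of the above, so the data are insensitive
    if not conclusions:
        conclusions.append("data are insensitive: suspend judgment")
    return conclusions


for lo, hi in [(-1.0, 1.5), (2.5, 6.0), (-1.0, 6.0)]:
    print((lo, hi), interval_inference(lo, hi, min_interesting=2.0))
```

More than one conclusion can hold at once: an interval inside the null region also rules out the directional theory.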
The Bayes Factor:
Strength of evidence for one theory versus another (null)
If B > 1, the data supported your theory over the null.
If B < 1, the data supported the null over your theory.
If B is about 1, the experiment was not sensitive.
Jeffreys, 1939: Bayes factors more than 3 or less than 1/3 are substantial:
B > 3: substantial support for the theory
B < 1/3: substantial support for the null
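The reading of B above, with Jeffreys' cut-offs, can be written as a helper. This is a sketch: the wording of the returned labels is ours, and the Bayes factors in the example loop are illustrative values.

```python
def interpret_bayes_factor(b: float, threshold: float = 3.0) -> str:
    """Verbal reading of a Bayes factor B for a theory over the null.

    threshold = 3 follows Jeffreys' (1939) convention for 'substantial'.
    """
    if b <= 0:
        raise ValueError("a Bayes factor must be positive")
    if b > threshold:
        return "substantial support for the theory over the null"
    if b < 1.0 / threshold:
        return "substantial support for the null over the theory"
    # Between 1/threshold and threshold: direction only, not substantial evidence
    if b > 1.0:
        return "not substantial: data only weakly favour the theory"
    if b < 1.0:
        return "not substantial: data only weakly favour the null"
    return "no evidence either way"


for b in (0.2, 0.46, 1.01, 4.88, 1024.6):
    print(f"B = {b}: {interpret_bayes_factor(b)}")
```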
To know which theory the data support, you need to know what the theories predict.
The null is normally the prediction of, e.g., no difference.
[Figure: plausibility of each population difference between conditions; on the null hypothesis only one value, a difference of 0, is plausible.]
One needs to decide what difference or range of differences is consistent with one’s theory.
Difficult, but it forces one to think clearly about one’s theory.
To calculate a Bayes factor, one must decide what range of differences is predicted by the theory, for example as a:
1) Uniform distribution
2) Normal distribution
3) Half-normal distribution
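The three shapes can be written as probability density functions over the population difference, that is, how plausible the theory finds each difference. This is a minimal sketch, and the parameter values in the loop are hypothetical, chosen only to contrast the shapes.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(x: float, lower: float, upper: float) -> float:
    # Every difference between lower and upper is equally plausible; others are ruled out
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0


def half_normal_pdf(x: float, sd: float) -> float:
    # A normal centred on 0 with its negative half folded over: only differences in the
    # predicted direction are plausible, and smaller ones more plausible than bigger ones
    return 2.0 * normal_pdf(x, 0.0, sd) if x >= 0 else 0.0


# Hypothetical parameter values, chosen only to show the three shapes side by side
for x in (-2.0, 0.0, 2.0, 4.0):
    print(f"x = {x:+.0f}: uniform(0, 4) = {uniform_pdf(x, 0.0, 4.0):.3f}, "
          f"normal(2, 1) = {normal_pdf(x, 2.0, 1.0):.3f}, "
          f"half-normal(sd = 2) = {half_normal_pdf(x, 2.0):.3f}")
```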
Example: Does imagining a sports move improve sports performance?
[Figure: plausibility of the population difference in means between practice versus no practice, with performance after real practice for the same amount of time marked on the axis.]
Similar sorts of effects as those predicted in the past have been on
the order of a 5% difference between conditions
[Figure: plausibility of the population difference in means between conditions, with 0 and 5 marked on the axis.]
Implies: Smaller effects more likely than bigger ones; effects
bigger than 10% very unlikely
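Why a half-normal with SD = 5 implies that effects bigger than 10% are very unlikely: for a half-normal, P(effect > x) = erfc(x / (SD·√2)). Here is a short check, assuming H1 is modelled as a half-normal whose SD equals the typical past effect of 5%.

```python
import math


def half_normal_prob_above(x: float, sd: float) -> float:
    """P(effect > x) when H1 is a half-normal (mean 0, scale sd); x below 0 gives 1."""
    if x < 0:
        return 1.0
    return math.erfc(x / (sd * math.sqrt(2.0)))


typical_effect = 5.0  # per cent difference seen for similar effects in the past
for cutoff in (5.0, 10.0, 15.0):
    print(f"P(effect > {cutoff:g}%) = {half_normal_prob_above(cutoff, typical_effect):.3f}")
```

About 32% of the predicted effects exceed 5%, and fewer than 5% exceed 10%, which matches the reading above.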
To calculate Bayes factor in a t-test situation
Need same information from the data as for a t-test:
Mean difference, Mdiff
SE of difference, SEdiff
Note: t = Mdiff / SEdiff, so SEdiff = Mdiff / t
Also note: F(1, x) = t(x)²
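Recovering SEdiff from what papers usually report can be done in a couple of lines. This is a sketch, and the Mdiff, t and F values in the example are hypothetical. Because F discards the sign, the SE is taken from |Mdiff|.

```python
import math


def se_from_t(mdiff: float, t: float) -> float:
    """SE of the difference from a reported t: SEdiff = Mdiff / t."""
    if t == 0:
        raise ValueError("t = 0 gives no information about the SE")
    return abs(mdiff / t)


def se_from_f(mdiff: float, f: float) -> float:
    """Same, from F(1, df): t = sqrt(F), with the sign of t following Mdiff."""
    if f <= 0:
        raise ValueError("F must be positive")
    return abs(mdiff) / math.sqrt(f)


# Hypothetical report: Mdiff = 5, t(30) = 2.5, equivalently F(1, 30) = 6.25
print(se_from_t(5.0, 2.5), se_from_f(5.0, 6.25))
```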
Generalising to categorical data
Intervention given to one of two classes about harm of smoking

                   Smoker   Not smoker
Intervention           32           54
No intervention        31           35

Odds ratio = (32*35) / (31*54)
Ln odds ratio is approximately normally distributed, with squared SE = 1/32 + 1/54 + 1/31 + 1/35
A different intervention had reduced smokers with an odds ratio of 3, i.e. ln 3 (about 1.10) on the ln odds ratio scale.
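The categorical example can be worked through in code. This is a sketch that uses the cell layout of the table above. The SE is the usual large-sample formula quoted on the slide (often called Woolf's method), which requires every cell to be non-zero. The helper name is hypothetical.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int):
    """ln odds ratio and its SE for a 2x2 table.

    a, b: smokers and non-smokers in the intervention class
    c, d: smokers and non-smokers in the no-intervention class
    OR = (a*d) / (c*b), matching (32*35) / (31*54) above.
    """
    odds_ratio = (a * d) / (c * b)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # large-sample SE; no zero cells allowed
    return math.log(odds_ratio), se


ln_or, se_ln_or = log_odds_ratio(32, 54, 31, 35)
expected = math.log(3)  # size of effect found for a different intervention
print(f"OR = {math.exp(ln_or):.3f}, ln OR = {ln_or:.3f}, SE = {se_ln_or:.3f}")
print(f"ln 3 = {expected:.3f}")
```

This gives OR ≈ 0.67 (ln OR ≈ -0.40) with SE ≈ 0.33, so the odds of smoking were lower in the class that received the intervention.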
To calculate a Bayes factor:
1) Google “Zoltan Dienes”
2) First site to come up is the right one:
http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/
3) Click on “Click here for a Bayes factor calculator”
4) Scroll down and click on “Click here to calculate your Bayes factor!”
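The slide's route is the online calculator. For readers who want to see what such a calculation involves, here is a minimal sketch. It assumes a normal likelihood for the observed difference (Mdiff with its SE) and uses the three models of H1 listed earlier. The closed-form integrals are our own derivation, and `bayes_factor` is a hypothetical function, not the calculator's code.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor(mdiff: float, se: float, model: str, **params: float) -> float:
    """B = P(data | H1) / P(data | H0), with a normal likelihood for mdiff.

    model "uniform" needs lower, upper; "normal" needs mean, sd;
    "half_normal" needs sd (H1 predicts positive differences only).
    """
    likelihood_h0 = normal_pdf(mdiff, 0.0, se)  # H0: population difference is exactly 0
    if model == "uniform":
        lo, hi = params["lower"], params["upper"]
        # Average of the likelihood over a flat prior on [lo, hi]
        marginal = (normal_cdf((hi - mdiff) / se) - normal_cdf((lo - mdiff) / se)) / (hi - lo)
    elif model == "normal":
        mean, sd = params["mean"], params["sd"]
        # Normal prior convolved with normal noise: variances add
        marginal = normal_pdf(mdiff, mean, math.sqrt(sd ** 2 + se ** 2))
    elif model == "half_normal":
        sd = params["sd"]
        tau2 = sd ** 2 + se ** 2
        post_mean = mdiff * sd ** 2 / tau2
        post_sd = sd * se / math.sqrt(tau2)
        # Twice the normal-prior result, times the posterior mass above zero
        marginal = 2.0 * normal_pdf(mdiff, 0.0, math.sqrt(tau2)) * normal_cdf(post_mean / post_sd)
    else:
        raise ValueError(f"unknown model: {model}")
    return marginal / likelihood_h0


# Smoking example: OR = (32*35) / (31*54) < 1 means lower odds of smoking after the intervention.
# Assumption: the theory predicts a reduction, so the sign is flipped to make that direction
# positive, and H1 is a half-normal with SD = ln 3.
ln_or = math.log((32 * 35) / (31 * 54))
se_ln_or = math.sqrt(1 / 32 + 1 / 54 + 1 / 31 + 1 / 35)
b = bayes_factor(-ln_or, se_ln_or, "half_normal", sd=math.log(3))
print(f"B = {b:.2f}")
```

Under these assumptions B comes out close to 1, so the classroom data would be insensitive rather than evidence for or against the intervention.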
[Figure: "The dance of the p values" beside "The tai chi of the Bayes factors" for 25 successive experiments on the difference in verbal ability (http://www.latrobe.edu.au/psy/esci/index.html). The p values range from <.001 to .817; the Bayes factors range from 0.46 to 1024.6.]
My typical practice:
If I can think of a way of determining an approximate expected size of effect: use a half normal with SD equal to that typical size.
If I can think of a way of determining an approximate upper limit of effect: use a uniform from 0 to that limit.
Moral and inferential paradoxes of orthodoxy:
1. On the orthodox approach, standardly you should plan in advance how many subjects you will run.
If you just miss out on a significant result, you are not allowed to run 10 more subjects and test again.
You are not allowed to run until you get a significant result.
Bayes: It does not matter when you decide to stop running subjects. You can always run
more subjects if you think it will help.
Moral paradox:
If p = .07 after running the planned number of subjects:
i) If you run more and report significant at 5%, you have cheated.
ii) If you don’t run more and bin the results, you have wasted taxpayers’ money and your time, and wasted relevant data.
You are morally damned either way.
Inferential paradox
Two people with the same data and theories could draw opposite conclusions
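The orthodox worry in point 1 can be checked by simulation. If H0 is true and you keep adding subjects until p < .05, the long-run rate of false rejections climbs above 5%. This is a minimal sketch with a hypothetical design: one sample, known SD = 1, an initial 20 subjects and up to nine top-ups of 10.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_start=20, n_extra=10, max_looks=10, alpha=0.05, n_sims=2000, seed=2014):
    """Share of null-true studies declared 'significant' when the researcher tests
    and, while p >= alpha, adds n_extra more subjects and tests again.

    Hypothetical setup: one sample, known SD = 1, population mean 0 (H0 true).
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for look in range(max_looks):
            batch = n_start if look == 0 else n_extra
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            z = (total / n) * math.sqrt(n)  # mean divided by its SE, 1 / sqrt(n)
            if two_sided_p(z) < alpha:
                significant += 1
                break
    return significant / n_sims


print(f"one planned test: {false_positive_rate(max_looks=1):.3f}")
print(f"up to 10 looks:   {false_positive_rate(max_looks=10):.3f}")
```

With a single planned test the rate stays near .05; with repeated looks it is clearly larger.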
2. On the orthodox approach, it matters whether you formulated your hypothesis before
or after looking at the data.
Post hoc vs planned comparisons
Predictions made before rather than after looking at the data are treated differently
Bayesian inference: It does not matter what day of the week you thought of your theory
The evidence for your theory is just as strong regardless of its timing
3. On the orthodox approach, you must correct for how many tests you conduct in total.
For example, if you ran 100 correlations and 4 were just significant, researchers would not try
to interpret those significant results.
On Bayes, it does not matter how many other statistical hypotheses you investigated (or your RA did without telling you). All that matters is the data relevant to each hypothesis under investigation.
For orthodoxy but not Bayes:
Different people with the same data and theories can come to different
conclusions
You can thus be tempted to make false (albeit inferentially irrelevant) claims, such as about when you thought of your theory.
What is the aim of statistics?
1) Control the proportion of errors you make in the long run in accepting and rejecting hypotheses (conventional statistics)
2) Indicate how strong the evidence is for one hypothesis rather than another / how much you should change your confidence in one hypothesis rather than another (Bayesian statistics)
Dienes 2011 Perspectives on Psychological Science
Threshold = 3
Fixed 10 trials
d = 0 Reject = 2 accept = 55
d = 1 Reject = 91 accept = 0
Threshold:         3    4    5    6    7    8    9   10
d = 0, Reject     14   12   11   11    7    7    6    5
d = 0, Accept     86   87   86   86   85   79   74   69
d = 1, Reject     97  100  100  100  100  100  100  100
d = 1, Accept      1    0    0    0    0    0    0    0
Table 1 Per cent decision rates for BH(0,1), MaxN = 100, MinN = 1
If maxN = 1000, threshold = 10, for d = 0, reject = 5 accept = 93
Threshold:         3    4    5    6    7    8    9   10
d = 0, Reject      7    7    7    5    5    3    4    3
d = 0, Accept     93   91   88   81   83   82   73   66
d = 1, Reject    100  100  100  100  100  100  100  100
d = 1, Accept      0    0    0    0    0    0    0    0
Table 2 Per cent decision rates for BH(0,1), MaxN = 100, MinN = 10
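Tables 1 and 2 come from simulations of sequential Bayes factors in Dienes (2011). The sketch below shows the kind of simulation involved, not the paper's code. It rests on three assumptions of ours: BH(0,1) is read as a Bayes factor whose H1 is a half-normal with SD 1 (in units of the population SD), the SD is treated as known, and sampling stops as soon as B crosses the threshold or its reciprocal. Its numbers should not be expected to match the tables exactly.

```python
import math
import random


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bf_half_normal(mdiff: float, se: float, sd_h1: float) -> float:
    """B for H1 = half-normal(0, sd_h1) against H0 = 0, with a normal likelihood.

    Written as one ratio so the two small densities never underflow separately.
    """
    tau2 = sd_h1 ** 2 + se ** 2  # variance of mdiff under a normal(0, sd_h1) prior
    post_mean = mdiff * sd_h1 ** 2 / tau2
    post_sd = sd_h1 * se / math.sqrt(tau2)
    log_ratio = 0.5 * mdiff ** 2 / se ** 2 - 0.5 * mdiff ** 2 / tau2
    return 2.0 * (se / math.sqrt(tau2)) * math.exp(log_ratio) * normal_cdf(post_mean / post_sd)


def decision_rates(d, threshold, min_n, max_n, n_sims=1000, seed=2011):
    """Per cent of simulated studies that reject or accept H0 under optional stopping.

    Hypothetical setup: one N(d, 1) observation per subject; once n >= min_n, compute B
    after every subject and stop when B >= threshold (reject H0) or B <= 1/threshold
    (accept H0). Studies reaching max_n without crossing either bound count as undecided.
    """
    rng = random.Random(seed)
    reject = accept = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(d, 1.0)
            if n < min_n:
                continue
            b = bf_half_normal(total / n, 1.0 / math.sqrt(n), sd_h1=1.0)
            if b >= threshold:
                reject += 1
                break
            if b <= 1.0 / threshold:
                accept += 1
                break
    return 100.0 * reject / n_sims, 100.0 * accept / n_sims


for d in (0.0, 1.0):
    r, a = decision_rates(d, threshold=3, min_n=10, max_n=100)
    print(f"d = {d:g}: reject {r:.0f}%, accept {a:.0f}%")
```

Unlike the p value rule in the previous simulation, stopping here depends only on the strength of evidence, so checking after every subject is allowed by design.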
Devil hypothesis: if it is a devil, you will lose a finger 9/10 of the time.
Cat hypothesis: if it is a cat, you lose a finger only 1/10 of the time.
Evidence supports the theory that most strongly predicted it
John puts his hand in the box and loses a finger.
Which hypothesis is most strongly supported, the cat
hypothesis or the devil hypothesis?
Cat hypothesis predicts this result with probability = 1/10
Devil hypothesis predicts this result with probability = 9/10
Strength of evidence for devil over cat hypothesis = (9/10) / (1/10) = 9
The evidence is nine times as strong for the devil over the cat hypothesis
OR
Bayes Factor (B) = 9
Consider:
John does not lose a finger
Now the evidence strongly supports the cat over the devil hypothesis (B = 9 for cat over devil hypothesis, or 1/9 for devil over cat hypothesis)
Suppose instead: probability of losing a finger given cat = 4/10; probability of losing a finger given devil = 6/10.
Now, if John loses a finger, the strength of evidence for devil over cat = (6/10) / (4/10) = 1.5
Not very strong
We can distinguish:
Evidence for cat hypothesis over devil
Evidence for devil hypothesis over cat
Not much evidence either way.
The Bayes factor tells you how strongly the data are predicted by the different theories (e.g. your pet theory versus the null hypothesis):
B = probability of your data given your pet theory / probability of data given null hypothesis
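The cat/devil arithmetic and the general formula can be expressed as code. This is a sketch: when each hypothesis assigns a definite probability to the observed outcome, B is simply the ratio of those probabilities, and exact fractions keep 9, 1/9 and 3/2 exact. The function name is hypothetical.

```python
from fractions import Fraction


def likelihood_ratio(p_data_given_h1: Fraction, p_data_given_h0: Fraction) -> Fraction:
    """Bayes factor for h1 over h0 when each hypothesis gives the data a definite probability."""
    return Fraction(p_data_given_h1) / Fraction(p_data_given_h0)


# Probability of losing a finger under each hypothesis
lose_finger = {"devil": Fraction(9, 10), "cat": Fraction(1, 10)}
keep_finger = {h: 1 - p for h, p in lose_finger.items()}

print("loses finger, devil over cat:", likelihood_ratio(lose_finger["devil"], lose_finger["cat"]))
print("keeps finger, devil over cat:", likelihood_ratio(keep_finger["devil"], keep_finger["cat"]))
# Weaker version: 6/10 (devil) against 4/10 (cat)
print("weaker version, devil over cat:", likelihood_ratio(Fraction(6, 10), Fraction(4, 10)))
```

This prints 9, 1/9 and 3/2, the three Bayes factors worked out above.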
Questions
Social Marketing for tobacco control amongst teens
An overview of the Botswana intervention
Our context: a rapid rise in teen girl smoking rates
[Chart: rate of cigarette use among 13-15 yr old girls (GYTS), rising from 2.6 in 2002 to 10.9 in 2008.]
Aware that smoking is associated with long-term health issues
Think it is normal to try smoking
Think celebrities & aspirational people (women) smoke
Think smoking will reap social rewards
Find it hard to say no
Grow up seeing adults around them smoking cigarettes
Our intervention: Tobacco Control Botswana
Multi-intervention approach to strip out the aspiration from smoking amongst teenage girls, using different influencers and channels to build momentum & reinforce change:
Teen movement
School sessions
Parents campaign
Seeding (influencers, content)
The teen movement
Our start point: the Botswana teenage mind...
Social inclusion and belonging is everything
Girls are trying to find their identity and express themselves – they want to work out who they are and what they can do and be
They think their friends are the only people who really understand what they’re going through
The here and now is everything – long-term health risks of smoking are understood but don’t really matter to them
“My daughter likes wearing Indian bindis and scarfs on her head like a Muslim. I don’t know why, she just likes it!”
“Peer pressure and getting into relationships might stop me from achieving my dreams.”
“Girls my age don’t think about health at all. They think about outer health (looks). That’s what counts.”
Introducing SKY...
Be true to yourself.
Shapo ka yone: I’m good without it
Sure ka yone: I’m sure about it – I like it
SKY
Be true.
What’s your sky?
SKY is a movement for girls, by girls. It helps girls be true to themselves and express themselves
through the choices they make about things. Girls say Sure Ka Yona to the things they like. They say
Shapo Ka Yona to the things they don’t like.
SKY aims to make one of these choices the choice not to smoke.
Key components of the movement
FACEBOOK: community
SKY MAGAZINE: free, for teens
SKY LIVE: pop up events
TOUCH THE SKY: music single
THE SUNDAY SKY: Yarona FM
15 MINUTES OF SKY: Radio RB2
CELEBRITIES: DiSKYples
BRANDS: Mafia soul, Yarona etc.
SKY pledge:
I pledge to be true to myself and what I believe in. To be who I am, not who someone else thinks I should be.
I will say Sure Ka Yona to the things that I am all about – like standing up for my friends or listening to a song just because it makes me smile.
I will say Shapo Ka Yona to the things I am good without – like smoking or backstabbing.
I will make choices that are true to me.
I will follow my skyline.
I pledge to find my SKY.
1. Building the SKY movement
2. Embedding smoking
RADIO SHOW DISCUSSIONS
MAGAZINE ARTICLES
FACEBOOK COMMENTS
CELEBRITY INTERVIEWS
QUIZZES AND GAMES
PEER-TO-PEER
SKY SINGLE LYRICS
THE PLEDGE
CELEBRITY POSTERS
3. Expressing the choice not to smoke
Shapo ka yone SMOKING!
Pledge it – online, Facebook, SMS, events
Show it – girls’ pictures
Say it – girls’ messages
Wear it – the wristband
Experience it – pop up events
Our supporting interventions
Seeding strategy
AIM
Integrate an anti-smoking/ smoking prevention message into existing content, and
encourage key influencers to include the message in their communications, in a
seamless way – so that it feels like a natural part of everything surrounding it
ROLE
Convey an explicit message through implicit means. Complement and reinforce
SKY by helping to strip the aspiration out of smoking and re-contextualise it
CONTENT
Silent Shout (originates: Botswana): youth talk show, one of very few local soapies/chat-based programmes
Rhythm City (originates: South Africa): drama series around the music business, very popular in SA and beyond
CELEBS
Zeus (local)
Zizi Panther (South African)
Boitumelo ‘Boity’ Thulo
Samantha Mogwe
Parents campaign
AIM
Make parents think about the role they play in regard to their teenage children’s attitudes to smoking, and adopt a more mindful attitude towards it
ROLE
Influence the influencers – parents’ smoking behaviour often acts to normalise
smoking for their children - indirect but important route to our target
In development: likely to be a relatively simple campaign in
partnership with newspapers, ‘straight’ and adult in nature –
‘advertorials’ may play a large part within it
Schools sessions
AIM
Use schools as a channel to ‘do’ health in a way that makes knowledge that is
currently theoretical relevant and real. Make sure it is a “schools programme” like
no other
ROLE
Reinforce the underlying reason why the majority of girls don’t want to smoke before social context comes into play, and drive it home
RULES OF OUR GAME
Delivered by an aspirational role model
Out of the classroom
‘Gross’ factor
Experiential and interactive
Bayesian versus classical statistics in evaluating this intervention
Robert West
Discussion
Concluding remarks and possible future steps
Robert West