Hierarchical Bayesian Analysis of Proportions - Dwight Howard Free Throw Shooting (2013-2014)

Hierarchical Bayesian Analysis:
Binomial Proportions
Dwight Howard’s Game by Game Free Throw Success
Rate – 2013/2014 NBA Season
Data Source: www.nba.com
Data/Model Description
• n = 70 NBA games during the 2013/14 season in which Dwight
Howard attempted at least one free throw (aka foul
shot)
• Assume that for each game, Mr. Howard has an
underlying “true” success rate for free throws, $\pi_i$,
which can vary due to many environmental factors
(although the actual process is the same: an undefended
shot 15 feet from the backboard)
• For the $i$th game, Mr. Howard takes $n_i$ free throw
attempts, successfully making $y_i$ of them
• Assume: Random Variable $Y_i \sim \text{Binomial}(n_i, \pi_i)$
Binomial Likelihood for Y|p
$p \equiv P(\text{Success})$, $n \equiv$ number of trials, $Y \equiv$ number of successes

$$p(y) = P(Y = y \mid n, p) = \binom{n}{y} p^{y} (1-p)^{n-y}, \qquad y = 0, 1, \ldots, n$$

$$E\{Y\} = np \qquad V\{Y\} = np(1-p)$$

[Figure: probability mass functions of Bin(n=10, p=0.25), Bin(n=10, p=0.50), and Bin(n=10, p=0.90) for y = 0, ..., 10]
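The binomial formulas above can be checked numerically. The following is a minimal sketch in Python (the deck itself contains no code, and the function names `binom_pmf` and `binom_mean_var` are illustrative, not from the source). It evaluates the probability mass function, mean, and variance for the three panels with n = 10.

```python
from math import comb


def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y | n, p) for a Binomial(n, p) random variable."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """E{Y} = np and V{Y} = np(1 - p)."""
    return n * p, n * p * (1 - p)


for p in (0.25, 0.50, 0.90):
    pmf = [binom_pmf(y, 10, p) for y in range(11)]
    mean, var = binom_mean_var(10, p)
    # The most likely number of successes is the index of the largest pmf value.
    mode = pmf.index(max(pmf))
    print(f"p={p:.2f}  mean={mean:.2f}  var={var:.3f}  mode={mode}")
```

The probabilities over y = 0, ..., n sum to 1, and the mean computed directly from the pmf agrees with np.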
Modeling the Variation in Success Rates - $\pi_i$
• Prior Distribution: Beliefs on possible values of $\pi_i$ and
how “likely” they are. Important questions:
  - What is the range of possible values? Between 0 and 1
  - What is the “expected value”? 0.2? 0.5? 0.8?
  - What is a range of values we may want to put most of the
    density between? (0.025-0.975)? (0.25-0.75)? (0.4-0.8)?
  - What is the shape of the distribution? The beta family of
    densities gives a natural (and conjugate) distribution with
    a great deal of flexibility in the shape of the prior.
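$$\pi_i \sim \text{Beta}(\alpha, \beta)$$

$$p(\pi_i) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi_i^{\alpha-1} (1-\pi_i)^{\beta-1}, \qquad 0 \le \pi_i \le 1, \quad \alpha, \beta > 0$$

$$E\{\pi_i\} = \frac{\alpha}{\alpha+\beta} \qquad V\{\pi_i\} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$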
Beta Prior for $\pi_i$
[Figure: Beta prior densities $p(\pi_i)$ plotted against $\pi_i$ on (0, 1) for Beta(1,1) (uniform), Beta(3,2), and Beta(5,5)]
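To make the Beta family concrete, here is a small Python sketch (helper names `beta_pdf` and `beta_mean_var` are illustrative, not from the source) that evaluates the density, mean, and variance for the three priors shown in the figure: Beta(1,1), Beta(3,2), and Beta(5,5).

```python
from math import gamma, sqrt


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density at x, for 0 < x < 1."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_mean_var(a: float, b: float) -> tuple[float, float]:
    """Mean a/(a+b) and variance ab / ((a+b)^2 (a+b+1))."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


for a, b in [(1, 1), (3, 2), (5, 5)]:
    mean, var = beta_mean_var(a, b)
    print(f"Beta({a},{b}): mean={mean:.3f}  sd={sqrt(var):.3f}  "
          f"density at 0.5={beta_pdf(0.5, a, b):.3f}")
```

Larger values of a + b concentrate the density around the mean, which is why Beta(5,5) is more peaked than Beta(1,1) even though both have mean 0.5.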
Prior Distributions for $\alpha$, $\beta$
• The parameters of the Beta distribution that acts as the
prior distribution for the individual game $\pi_i$ must be
specified, or given prior distributions themselves.
• The mean of the distribution of the $\pi_i$ is $\mu = \alpha/(\alpha+\beta)$,
which can guide the choice of means for the priors on
$\alpha$ and $\beta$
• Suppose we want to choose distributions for $\alpha$ and $\beta$ so
that the prior mean is around 0.60 (he is a center and
tall). We want to allow for a wide range of possibilities,
permitting the data to have a larger impact on the
posterior densities of the $\pi_i$ and $\mu$
• Exponential Distributions: $\alpha \sim \text{EXP}(0.33)$, $\beta \sim \text{EXP}(0.50)$
(rate parameters, so the prior means are about 3 and 2)
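Prior Distributions for $\alpha$, $\beta$

$$p(\alpha) = 0.33\, e^{-0.33\alpha}, \quad \alpha > 0 \qquad E\{\alpha\} \approx 3 \qquad V\{\alpha\} \approx 9$$

$$p(\beta) = 0.50\, e^{-0.50\beta}, \quad \beta > 0 \qquad E\{\beta\} = 2 \qquad V\{\beta\} = 4$$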
Prior Densities for $\alpha$ and $\beta$
[Figure: exponential prior densities p(alpha) and p(beta) plotted for alpha, beta from 0 to 10]
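The exponential priors are chosen so that the ratio of prior means, E{α}/(E{α}+E{β}), is about 0.60. The ratio of means is not the same thing as the prior mean of μ = α/(α+β), so a quick simulation is a useful check on what the priors imply. This is a sketch only, using Python's standard library; the seed and number of draws are arbitrary choices, not from the source.

```python
import random
import statistics

random.seed(1)                      # arbitrary seed, for reproducibility only
RATE_ALPHA, RATE_BETA = 0.33, 0.50  # exponential rate parameters from the slides
N_DRAWS = 100_000

mu_draws = []
for _ in range(N_DRAWS):
    a = random.expovariate(RATE_ALPHA)  # expovariate takes the rate (1 / mean)
    b = random.expovariate(RATE_BETA)
    mu_draws.append(a / (a + b))

ratio_of_means = (1 / RATE_ALPHA) / (1 / RATE_ALPHA + 1 / RATE_BETA)
prior_mean_mu = statistics.fmean(mu_draws)

mu_sorted = sorted(mu_draws)
lo = mu_sorted[int(0.025 * N_DRAWS)]
hi = mu_sorted[int(0.975 * N_DRAWS)]

print(f"ratio of prior means:       {ratio_of_means:.3f}")
print(f"simulated prior mean of mu: {prior_mean_mu:.3f}")
print(f"central 95% of prior mu:    ({lo:.3f}, {hi:.3f})")
```

The simulated prior mean of μ typically comes out a little below 0.60, and the central 95% range covers nearly all of (0, 1), which is consistent with the stated goal of letting the data dominate the posterior.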
Posterior Distributions of $\alpha$, $\beta$, $\pi_1, \ldots, \pi_n$

Likelihood (Stage 1):
$$p(y_1, \ldots, y_{70} \mid \pi_1, \ldots, \pi_{70}) = \prod_{i=1}^{70} \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i - y_i}$$

Priors on Stage 1 Parameters (Stage 2):
$$p(\pi_1, \ldots, \pi_{70} \mid \alpha, \beta) = \prod_{i=1}^{70} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi_i^{\alpha-1} (1-\pi_i)^{\beta-1}$$

Priors on Stage 2 Parameters (Stage 3):
$$p(\alpha, \beta) = 0.33\, e^{-0.33\alpha} \cdot 0.50\, e^{-0.50\beta}$$

Posterior Distribution of Parameters given the data is proportional to the product of the densities:
$$p(\pi_1, \ldots, \pi_{70}, \alpha, \beta \mid y_1, \ldots, y_{70}) \propto e^{-0.33\alpha} e^{-0.50\beta} \left[ \prod_{i=1}^{70} \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i - y_i} \right] \left[ \prod_{i=1}^{70} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi_i^{\alpha-1} (1-\pi_i)^{\beta-1} \right]$$

$$\propto e^{-0.33\alpha} e^{-0.50\beta} \left[ \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \right]^{70} \prod_{i=1}^{70} \pi_i^{y_i + \alpha - 1} (1-\pi_i)^{n_i - y_i + \beta - 1}$$
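The unnormalized posterior above is easiest to work with on the log scale. Below is a minimal Python sketch of that log density (the function name `log_posterior` is illustrative, not from the source). For the demonstration it uses only the four lowest-ranked games listed in the results table (Y = 1, 5, 1, 6 with n = 7, 16, 6, 17), since the full 70-game data set is not reproduced in the deck.

```python
from math import lgamma, log


def log_posterior(pis, alpha, beta, y, n, rate_a=0.33, rate_b=0.50):
    """Unnormalized log posterior of (pi_1..pi_k, alpha, beta) given y.

    Binomial coefficients are dropped because they do not involve parameters.
    """
    if alpha <= 0 or beta <= 0 or any(not 0 < p < 1 for p in pis):
        return float("-inf")
    k = len(y)
    # log of Gamma(a+b) / (Gamma(a) Gamma(b)), which appears once per game
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    lp = -rate_a * alpha - rate_b * beta + k * log_norm
    for p, yi, ni in zip(pis, y, n):
        lp += (yi + alpha - 1) * log(p) + (ni - yi + beta - 1) * log(1 - p)
    return lp


# Four games taken from the results table (a subset, for illustration only).
y_demo = [1, 5, 1, 6]
n_demo = [7, 16, 6, 17]

plausible = log_posterior([0.3] * 4, 3.0, 2.0, y_demo, n_demo)
implausible = log_posterior([0.9] * 4, 3.0, 2.0, y_demo, n_demo)
print(f"log posterior at pi = 0.3: {plausible:.2f}")
print(f"log posterior at pi = 0.9: {implausible:.2f}")
```

Success rates near the observed proportions score far higher than rates near 0.9, as expected for games in which most attempts were missed.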
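MCMC Implementation in OpenBUGS
• Assign Distributions and Relations for $\alpha$, $\beta$, $\mu$, $\{\pi_i\}$, $\{Y_i\}$

$$Y_i \mid n_i, \pi_i \sim \text{Bin}(n_i, \pi_i) \qquad f(y_i) = \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i - y_i}, \quad 0 \le y_i \le n_i$$

$$\pi_i \mid \alpha, \beta \sim \text{Beta}(\alpha, \beta) \qquad f(\pi_i) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi_i^{\alpha-1} (1-\pi_i)^{\beta-1}, \quad 0 \le \pi_i \le 1; \; \alpha, \beta > 0$$

$$\alpha \sim \text{Exp}(0.33) \qquad f(\alpha) = 0.33\, e^{-0.33\alpha}$$

$$\beta \sim \text{Exp}(0.50) \qquad f(\beta) = 0.50\, e^{-0.50\beta}$$

$$\mu = \frac{\alpha}{\alpha+\beta}$$

Overdispersed Initial Values for 3 Chains:
Chain 1: $\alpha = 1$, $\beta = 1$, $\pi_i = 0.5$, $i = 1, \ldots, 70$
Chain 2: $\alpha = 100$, $\beta = 10$, $\pi_i = 0.9$, $i = 1, \ldots, 70$
Chain 3: $\alpha = 10$, $\beta = 100$, $\pi_i = 0.1$, $i = 1, \ldots, 70$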
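The deck fits this model in OpenBUGS and does not show the model code. As a rough illustration of what such a sampler does, here is a hand-rolled Metropolis-within-Gibbs sketch in Python. It is not the OpenBUGS algorithm, and every name in it is illustrative. It uses only the twelve games listed in the results table (the full 70-game data set is not reproduced in the deck), so its output will not match the reported results. Each $\pi_i$ is drawn from its conjugate Beta full conditional; $\alpha$ and $\beta$ are updated by random-walk Metropolis steps on the log scale. The three chains start from the α, β values listed above.

```python
import math
import random

# Twelve games (made, attempted) from the results table; NOT the full data set.
Y = [1, 5, 1, 6, 1, 1, 7, 5, 9, 17, 9, 7]
N = [7, 16, 6, 17, 2, 2, 13, 9, 12, 24, 11, 7]
RATE_A, RATE_B = 0.33, 0.50   # exponential prior rates from the slides
EPS = 1e-12                   # keeps pi away from exactly 0 or 1 before taking logs


def log_cond(x, other, sum_log, rate, k):
    """Log full conditional (up to a constant) of alpha (or beta) given the rest."""
    return -rate * x + k * (math.lgamma(x + other) - math.lgamma(x)) + x * sum_log


def mh_log_scale(x, other, sum_log, rate, k, step, rng):
    """One random-walk Metropolis step on log(x), with Jacobian term log(prop / x)."""
    prop = x * math.exp(step * rng.gauss(0.0, 1.0))
    log_ratio = (log_cond(prop, other, sum_log, rate, k)
                 - log_cond(x, other, sum_log, rate, k)
                 + math.log(prop) - math.log(x))
    return prop if rng.random() < math.exp(min(0.0, log_ratio)) else x


def run_chain(alpha, beta, n_iter=6000, burn_in=2000, step=0.5, seed=0):
    """Return post-burn-in draws of (alpha, beta, mu)."""
    rng = random.Random(seed)
    k = len(Y)
    draws = []
    for it in range(n_iter):
        # Gibbs step: pi_i | alpha, beta, y_i ~ Beta(alpha + y_i, beta + n_i - y_i)
        pis = [min(max(rng.betavariate(alpha + y, beta + n - y), EPS), 1 - EPS)
               for y, n in zip(Y, N)]
        sum_log_pi = sum(math.log(p) for p in pis)
        sum_log_1m = sum(math.log(1 - p) for p in pis)
        alpha = mh_log_scale(alpha, beta, sum_log_pi, RATE_A, k, step, rng)
        beta = mh_log_scale(beta, alpha, sum_log_1m, RATE_B, k, step, rng)
        if it >= burn_in:
            draws.append((alpha, beta, alpha / (alpha + beta)))
    return draws


# Starting values for alpha and beta taken from the three chains above.
starts = [(1.0, 1.0), (100.0, 10.0), (10.0, 100.0)]
chain_means = []
for c, (a0, b0) in enumerate(starts, start=1):
    draws = run_chain(a0, b0, seed=c)
    mu_mean = sum(d[2] for d in draws) / len(draws)
    chain_means.append(mu_mean)
    print(f"chain {c}: posterior mean of mu ~ {mu_mean:.3f}")
```

If the three chains report similar values for μ despite their very different starting points, that is informal evidence of convergence; the burn-in length and step size here are arbitrary and would need tuning in real use.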
Summary of Results - $\alpha$, $\beta$, $\mu$

Parameter   Mean     SD        MC Error   2.50%    50%      97.50%
alpha       8.61     2.547     0.03375    4.674    8.246    14.53
beta        7.128    2.093     0.02767    3.886    6.836    11.99
mu          0.5468   0.02489   8.24E-05   0.4978   0.547    0.5954

[Figure: posterior densities P(alpha), P(beta), and P(mu), each based on a sample of 300000 draws]
The posterior distribution of $\mu$, the mean of the game
specific “true” success rates, is centered at 0.55 with a
posterior standard deviation of 0.025. A 95%
credible interval for his true average success
rate is 0.50 to 0.60.
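The standard deviation of 0.025 describes uncertainty about the average rate μ, not how much the game-specific rates vary from game to game. A rough way to gauge the latter is to plug the posterior means of α and β from the table into the Beta variance formula. This plug-in calculation is an approximation added here for illustration; it is not part of the original analysis.

```python
from math import sqrt

# Posterior means of alpha and beta, and posterior SD of mu, from the summary table.
alpha_hat, beta_hat = 8.61, 7.128
sd_of_mu = 0.02489

# Plug-in estimate of the mean of the Beta(alpha, beta) distribution.
mu_plugin = alpha_hat / (alpha_hat + beta_hat)

# Plug-in SD of the game-specific rates pi_i under Beta(alpha_hat, beta_hat).
total = alpha_hat + beta_hat
sd_between_games = sqrt(alpha_hat * beta_hat / (total**2 * (total + 1)))

print(f"plug-in mu:                    {mu_plugin:.4f}")
print(f"posterior SD of mu (table):    {sd_of_mu:.4f}")
print(f"plug-in SD of game-level pi_i: {sd_between_games:.4f}")
```

Under this approximation the game-to-game spread of the true rates is several times larger than the uncertainty about their mean.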
Summary of Results - $\pi_i$

Rank  Game    Mean    SD       MC Error  2.50%   50%     97.50%  Y   n   Pi-hat
1     pi[39]  0.418   0.107    3.93E-04  0.2114  0.4175  0.6277  1   7   0.142857
2     pi[65]  0.4264  0.08958  2.84E-04  0.2539  0.4258  0.6025  5   16  0.3125
3     pi[8]   0.4375  0.1094   3.61E-04  0.2243  0.4377  0.6503  1   6   0.166667
4     pi[28]  0.4443  0.08811  2.44E-04  0.2738  0.4441  0.617   6   17  0.352941
…     …       …       …        …         …       …       …       …   …   …
34    pi[31]  0.5415  0.1203   2.33E-04  0.3023  0.5436  0.7704  1   2   0.5
35    pi[27]  0.5416  0.1201   2.37E-04  0.302   0.5436  0.7701  1   2   0.5
36    pi[9]   0.5432  0.09327  1.79E-04  0.3585  0.5443  0.7217  7   13  0.538462
37    pi[62]  0.5499  0.1008   2.00E-04  0.3491  0.5513  0.7413  5   9   0.555556
…     …       …       …        …         …       …       …       …   …   …
67    pi[66]  0.637   0.09261  2.51E-04  0.4496  0.6397  0.8105  9   12  0.75
68    pi[60]  0.6456  0.07668  2.03E-04  0.4902  0.6476  0.7895  17  24  0.708333
69    pi[24]  0.6618  0.09364  2.92E-04  0.4712  0.6648  0.8356  9   11  0.818182
70    pi[47]  0.6917  0.1009   4.04E-04  0.4857  0.6954  0.8763  7   7   1
The table shows the 4 lowest, 4 middle, and 4 highest game specific posterior mean
success rates. Note that the lowest game specific success rates are pulled up from the MLE
Pi-hat = Y/n toward the overall mean (with the amount of shift highest when $n_i$ is small).
Similarly, the highest game specific success rates are shrunk down toward the overall mean.
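The shrinkage pattern follows from conjugacy: given α and β, the posterior for game i is Beta(α + y_i, β + n_i − y_i), with mean (α + y_i)/(α + β + n_i). The sketch below plugs in the posterior means of α and β from the earlier summary table; this is an approximation for illustration, since the reported posterior means average over the uncertainty in α and β. The function names are illustrative, and the four games are taken from the table above.

```python
ALPHA_HAT, BETA_HAT = 8.61, 7.128            # posterior means of alpha and beta
MU_HAT = ALPHA_HAT / (ALPHA_HAT + BETA_HAT)  # plug-in overall mean


def shrunk_rate(y, n, alpha=ALPHA_HAT, beta=BETA_HAT):
    """Mean of Beta(alpha + y, beta + n - y), the conditional posterior of pi_i."""
    return (alpha + y) / (alpha + beta + n)


def weight_on_data(n, alpha=ALPHA_HAT, beta=BETA_HAT):
    """Share of the shrunk estimate that comes from the game's own y/n."""
    return n / (alpha + beta + n)


# (Y, n) for four games in the table above.
games = {"pi[39]": (1, 7), "pi[31]": (1, 2), "pi[60]": (17, 24), "pi[47]": (7, 7)}

for name, (y, n) in games.items():
    print(f"{name}: MLE={y / n:.3f}  shrunk={shrunk_rate(y, n):.3f}  "
          f"weight on data={weight_on_data(n):.2f}")
```

The shrunk estimate is a weighted average of the MLE and the overall mean, with the weight on the game's own data growing with n_i. That is why a game with 2 attempts moves almost all the way to the overall mean while a game with 24 attempts moves much less.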
[Figure: posterior densities P(pi[39]) and P(pi[47]) on the range 0.0 to 1.0, each based on a sample of 300000 draws. pi[39] is the game with the lowest posterior mean success rate; pi[47] is the game with the highest posterior mean success rate.]