Randomization/Permutation Tests
Body Mass Indices Among NBA & WNBA Players
Home Field Advantage in English Premier League

Background
• Goal: Compare 2 (or more) treatment effects or means based on sample measurements
  o Independent Samples: Units in different treatment conditions are independent of one another. In controlled experiments they have been randomized to treatments. Observed data are: $Y_{11},\ldots,Y_{1n_1}$ and $Y_{21},\ldots,Y_{2n_2}$
  o Paired Samples: Units are observed under each condition (treatment), and the difference is obtained for each unit: $d_j = Y_{1j} - Y_{2j}$, $j=1,\ldots,n$
• Procedure: Working under the null hypothesis of no differences in treatment effects, ask how extreme the observed treatment difference is relative to many (in theory all) possible randomizations/permutations of the observed data to the treatment labels.

Independent Samples - 2 Treatments
Model (each observation is its mean plus its random error):
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i=1,2;\ j=1,\ldots,n_i, \qquad E\{\varepsilon_{ij}\} = 0$$
where: $\mu \equiv$ overall population mean, $\alpha_i \equiv$ effect of treatment $i$ subject to $\alpha_1 + \alpha_2 = 0$, and $\mu_i = \mu + \alpha_i$

No treatment effect (no difference in population means):
$$\alpha_1 = \alpha_2 = 0 \;\Rightarrow\; Y_{ij} = \mu + \varepsilon_{ij} \;\Rightarrow\; H_0: \mu_1 = \mu_2$$
Under $H_0$, all observed data come from the same population and the treatment labels are "random".

Test Statistic used to compare 2 treatments (one of many):
$$T = \bar{Y}_1 - \bar{Y}_2 \qquad \text{where: } \bar{Y}_i = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$

Algorithm:
  o Compute the Test Statistic for the observed data and save
  o Obtain a large number (N) of permutations of the observed values to the treatment labels
  o For each permutation, compute the Test Statistic and save
  o P-value = (1 + # Permuted TS ≥ Observed TS)/(N+1), so that the observed sample counts as one of the N+1 samples

Example - NBA and WNBA Players' BMI
• Groups: Male: NBA (i=1) and Female: WNBA (i=2)
• Samples: Random samples of n1 = n2 = 20 from 2013 seasons (2013/2014 for NBA)
$$\text{BMI} = \frac{\text{kg}}{\text{metres}^2} = 703 \times \frac{\text{lbs}}{\text{inches}^2}$$

Player                   id  Gender  Height (in)  Weight (lbs)  BMI
Giannis Antetokounmpo     1  1       81           205           21.97
Joel Anthony              2  1       81           245           26.25
Alex Len                  3  1       85           255           24.81
Erik Murphy               4  1       82           230           24.05
Ersan Ilyasova            5  1       82           235           24.57
Kevin Garnett             6  1       83           253           25.82
Chauncey Billups          7  1       75           202           25.25
Juwan Howard              8  1       81           250           26.79
Vladimir Radmanovic       9  1       82           235           24.57
Tiago Splitter           10  1       83           240           24.49
Jarvis Varnado           11  1       81           230           24.64
Alexey Shved             12  1       78           190           21.95
Jermaine O`Neal          13  1       83           255           26.02
Michael Kidd-Gilchrist   14  1       79           232           26.13
Metta World Peace        15  1       79           260           29.29
Tim Hardaway Jr.         16  1       78           205           23.69
Greivis Vasquez          17  1       78           211           24.38
Daniel Gibson            18  1       74           200           25.68
Terrence Ross            19  1       79           197           22.19
Chris Kaman              20  1       84           265           26.40
Tamika Catchings         21  2       73           167           22.03059
Courtney Clements        22  2       72           155           21.01948
Allie Quigley            23  2       70           140           20.08571
Quanitra Hollingsworth   24  2       77           203           24.06966
Katie Smith              25  2       71           175           24.40488
Tayler Hill              26  2       70           145           20.80306
Allison Hightower        27  2       70           139           19.94224
Kara Braxton             28  2       78           225           25.99852
Eshaya Murphy            29  2       71           164           22.87086
Michelle Campbell        30  2       74           183           23.49324
Briann January           31  2       68           144           21.89273
Jasmine James            32  2       69           175           25.84016
Kelsey Bone              33  2       76           200           24.34211
Jia Perkins              34  2       68           155           23.5651
Ebony Hoffman            35  2       74           215           27.60135
Shavonte Zellous         36  2       70           155           22.23776
Matee Ajavon             37  2       68           160           24.32526
Karima Christmas         38  2       72           180           24.40972
Erika de Souza           39  2       77           190           22.52825
Jayne Appel              40  2       76           210           25.55921

Males: $\bar{Y}_1 = 24.95$    Females: $\bar{Y}_2 = 23.35$
Test Statistic: $T = \bar{Y}_1 - \bar{Y}_2 = 24.95 - 23.35 = 1.60$

Permutation Samples
• Generate a permutation of the 40 integers using a random number generator (like pulling 1:40 from a hat, one at a time, without replacement)
• Assign the first 20 players selected (based on id) to Treatment 1, and the last 20 to Treatment 2
• Compute and save the Test Statistic: $T = \bar{Y}_1 - \bar{Y}_2$
• Continue for many (N total) samples
• Count the number as large or larger than the observed Test Statistic (in absolute value, if 2-sided test)
• P-value obtained as (Count+1)/(N+1)

Permutation Samples (EXCEL)

Group  id  BMI      Ran1     sort_ran1  sort_id  sort_BMI  Group
1       1  21.9654  0.12415  0.01077    11       24.6441   1
1       2  26.2513  0.23551  0.06690    34       23.5651   1
1       3  24.8118  0.60869  0.08209    38       24.4097   1
1       4  24.0467  0.41569  0.08289    24       24.0697   1
1       5  24.5695  0.53313  0.08631     7       25.2455   1
1       6  25.8178  0.49959  0.09680    10       24.4912   1
1       7  25.2455  0.08631  0.11182    33       24.3421   1
1       8  26.7871  0.77255  0.12415     1       21.9654   1
1       9  24.5695  0.66982  0.12967    15       29.2870   1
1      10  24.4912  0.09680  0.19153    16       23.6875   1
1      11  24.6441  0.01077  0.23389    26       20.8031   1
1      12  21.9543  0.92364  0.23551     2       26.2513   1
1      13  26.0219  0.66018  0.25141    40       25.5592   1
1      14  26.1330  0.52730  0.25471    28       25.9985   1
1      15  29.2870  0.12967  0.37605    35       27.6014   1
1      16  23.6875  0.19153  0.38224    18       25.6757   1
1      17  24.3808  0.82900  0.41569     4       24.0467   1
1      18  25.6757  0.38224  0.45036    19       22.1905   1
1      19  22.1905  0.45036  0.45735    31       21.8927   1
1      20  26.4024  0.76531  0.49959     6       25.8178   1
2      21  22.0306  0.63860  0.51106    39       22.5283   2
2      22  21.0195  0.88937  0.52730    14       26.1330   2
2      23  20.0857  0.80044  0.53313     5       24.5695   2
2      24  24.0697  0.08289  0.54241    36       22.2378   2
2      25  24.4049  0.87924  0.60869     3       24.8118   2
2      26  20.8031  0.23389  0.63860    21       22.0306   2
2      27  19.9422  0.96878  0.66018    13       26.0219   2
2      28  25.9985  0.25471  0.66982     9       24.5695   2
2      29  22.8709  0.89297  0.69176    37       24.3253   2
2      30  23.4932  0.81115  0.76531    20       26.4024   2
2      31  21.8927  0.45735  0.77255     8       26.7871   2
2      32  25.8402  0.93426  0.80044    23       20.0857   2
2      33  24.3421  0.11182  0.81115    30       23.4932   2
2      34  23.5651  0.06690  0.82900    17       24.3808   2
2      35  27.6014  0.37605  0.87924    25       24.4049   2
2      36  22.2378  0.54241  0.88937    22       21.0195   2
2      37  24.3253  0.69176  0.89297    29       22.8709   2
2      38  24.4097  0.08209  0.92364    12       21.9543   2
2      39  22.5283  0.51106  0.93426    32       25.8402   2
2      40  25.5592  0.25141  0.96878    27       19.9422   2

Original Sample (Column C)        Permutation Sample (Column G)
Group       Mean                  Group       Mean
1           24.9466               1           24.5772
2           23.3510               2           23.7204
Difference   1.5957               Difference   0.8568

Comments:
• Column 4 (Ran1) has its smallest number (.01077) at id=11. Thus player 11 is the first player in group 1 in the permutation sample. The next smallest is .06690 (id=34).
• The "sort" columns (5-8) give the first permutation samples for the 2 groups.
• The difference in BMI for groups 1 and 2 in the original sample is 1.5957.
• The difference in BMI for groups 1 and 2 in the permutation sample is 0.8568.

R Program

    ### Download dataset
    nba.bmi <- read.csv("http://www.stat.ufl.edu/~winner/data/wnba_nba_bmi.csv", header=TRUE)
    attach(nba.bmi); names(nba.bmi)

    ### Obtain sample sizes, sample means, and observed Test Statistic
    (n1 <- length(BMI[Gender==1])); (n2 <- length(BMI[Gender==2]))
    (ybar1.obs <- mean(BMI[Gender==1])); (ybar2.obs <- mean(BMI[Gender==2]))
    (TS.obs <- ybar1.obs-ybar2.obs); (n.tot <- n1+n2)

    ### Choose number of permutations and initialize TS vector to save Test Statistics
    ### set seed to be able to reproduce permutation samples
    N <- 9999; TS <- rep(0,N); set.seed(97531)

    ### Loop through N samples, generating Test Stat each time
    for (i in 1:N) {
      perm <- sample(1:n.tot,size=n.tot,replace=FALSE)
      if (i == 1) print(perm)
      ybar1 <- mean(BMI[perm[1:n1]])              ### mean BMI of first n1 elements of perm
      ybar2 <- mean(BMI[perm[(n1+1):(n1+n2)]])    ### mean BMI of next n2 elements of perm
      TS[i] <- ybar1-ybar2
    }

    ### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
    (num.exceed <- sum(abs(TS)>=abs(TS.obs)))
    (p.val.2sided <- (num.exceed+1)/(N+1))

    ### Draw histogram of distribution of TS, with vertical line at TS.obs
    hist(TS,xlab="Mean1 - Mean2",breaks=seq(-2.5,2.5,0.25),
         main="Randomization Distribution for BMI")
    abline(v=TS.obs)

R Output

    > ### Obtain sample sizes, sample means, and observed Test Statistic
    > (n1 <- length(BMI[Gender==1]))
    [1] 20
    > (n2 <- length(BMI[Gender==2]))
    [1] 20
    > (ybar1.obs <- mean(BMI[Gender==1]))
    [1] 24.94665
    > (ybar2.obs <- mean(BMI[Gender==2]))
    [1] 23.35099
    > (TS.obs <- ybar1.obs-ybar2.obs)
    [1] 1.595653
    > (n.tot <- n1+n2)
    [1] 40
    ### First permutation of 1:40
    [1] 26 31 12 20 4 28 23 13 2 19 9 35 34 5 16 14 29 11 32 24 39 10
    [26] 30 21 27 1 38 17 22 15 25 8 18 6 40 33 37 7 3 36
    > ### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
    > (num.exceed <- sum(abs(TS)>=abs(TS.obs)))
    [1] 121
    > (p.val.2sided <- (num.exceed+1)/(N+1))
    [1] 0.0122

Normal t-test (Equal Variances Assumed)
Model:
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}, \qquad i=1,2;\ j=1,\ldots,n_i, \qquad \varepsilon_{ij} \sim NID(0,\sigma^2)$$
Sample means and variances:
$$\bar{Y}_i = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i} \sim N\left(\mu_i, \frac{\sigma^2}{n_i}\right) \qquad s_i^2 = \frac{\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_i\right)^2}{n_i-1}$$
$$\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1-\mu_2,\ \sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right) \;\Rightarrow\; Z = \frac{\left(\bar{Y}_1-\bar{Y}_2\right)-\left(\mu_1-\mu_2\right)}{\sqrt{\sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim N(0,1)$$
$$\frac{(n_i-1)s_i^2}{\sigma^2} \sim \chi^2_{n_i-1} \;\Rightarrow\; W = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$$
Pooled Sample Variance:
$$s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$
Since $\bar{Y}_i \perp s_i^2$, it follows that $Z \perp W$, and:
$$T = \frac{Z}{\sqrt{W/(n_1+n_2-2)}} = \frac{\left(\bar{Y}_1-\bar{Y}_2\right)-\left(\mu_1-\mu_2\right)}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim \text{Student's } t_{n_1+n_2-2}$$

t-test for NBA vs WNBA BMI
$$H_0: \mu_1-\mu_2 = 0 \qquad H_A: \mu_1-\mu_2 \neq 0 \qquad TS: t_{obs} = \frac{\left(\bar{Y}_1-\bar{Y}_2\right)-0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \overset{H_0}{\sim} t_{n_1+n_2-2}$$
Data (from EXCEL spreadsheet): $n_1 = n_2 = 20$, $\bar{Y}_1 = 24.9466$, $\bar{Y}_2 = 23.3510$, $s_1^2 = 3.0919$, $s_2^2 = 4.2694$
$$s_p^2 = \frac{(20-1)3.0919+(20-1)4.2694}{20+20-2} = 3.6806 \qquad t_{obs} = \frac{(24.9466-23.3510)-0}{\sqrt{3.6806\left(\frac{1}{20}+\frac{1}{20}\right)}} = 2.6301$$
$$P\text{-value}: 2P\left(t_{38} \geq 2.6301\right) = 2(.0061) = .0122$$
Note: The permutation test and the t-test give the same P-value to 4 decimal places, which is what we expect when the data are approximately normal.

Paired Samples
• Data consist of n pairs of observations $(Y_{1j}, Y_{2j})$, $j=1,\ldots,n$
• Data are on the same subject (or individuals matched on external criteria) under 2 conditions (often Before/After)
• Construct the differences: $d_j = Y_{1j} - Y_{2j}$
• The true population mean difference is: $\mu_d = \mu_1 - \mu_2$
• Wish to test $H_0: \mu_d = 0$ with a 1-sided or 2-sided alternative
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}, \qquad i=1,2;\ j=1,\ldots,n, \qquad E\{\varepsilon_{ij}\} = 0$$
$$d_j = Y_{1j}-Y_{2j} = \left(\mu_1+\varepsilon_{1j}\right)-\left(\mu_2+\varepsilon_{2j}\right) = \left(\mu_1-\mu_2\right)+\left(\varepsilon_{1j}-\varepsilon_{2j}\right) = \left(\mu_1-\mu_2\right)+\varepsilon_j, \qquad \varepsilon_j = \varepsilon_{1j}-\varepsilon_{2j}$$
$$\text{Under } H_0: \mu_d = \mu_1-\mu_2 = 0: \quad d_j = \varepsilon_j \;\Rightarrow\; E\{d_j\} = 0$$
Thus: under $H_0$, once a difference is observed, it could just as easily have had the opposite sign (+/-).

Procedure
• Compute an observed Test Statistic that measures the treatment effect in some manner (such as the sample mean of the differences)
• For many randomization samples:
  o Generate a series of n U(0,1) random variables: $U_1,\ldots,U_n$
  o If (say) $U_j < 0.5$, set $d_j^* = -d_j$, where $d_j^*$ is the difference for case j in this sample; otherwise, set $d_j^* = d_j$
  o Compute the Test Statistic for this sample and save
• Compare the observed Test Statistic with the sampled Test Statistics in a manner similar to the Independent Samples case: compute the proportion of sampled Test Statistics that are as extreme as or more extreme than the observed Test Statistic

Example: English Premier League Football - 2012
• Interested in determining if there is a home field effect
  o League has 20 teams; each plays all 19 opponents Home and Away (190 pairs of teams, each pair playing once on each team's home field). No overtime.
  o Label teams in alphabetical order: 1=Arsenal, 20=Wigan
  o Let $Y_{1jk} = (H_j - A_k)$, $j < k$: goal differential when j is at Home, k is Away
  o Let $Y_{2jk} = (A_j - H_k)$, $j < k$: goal differential when j is Away, k is at Home
  o $d_{jk} = Y_{1jk} - Y_{2jk} = (H_j + H_k) - (A_j + A_k)$, $j < k$
• Note: d represents combined Home Goals minus combined Away Goals for the pair of teams
• No home effect should mean $\mu_d = 0$

Representative Games from the Sample

Team.j             Team.k                H.j  A.k  Y.1jk  H.k  A.j  Y.2jk  d.jk   Ran1    d.jk*(1)  Ran2    d.jk*(2)
Arsenal            Aston Villa           2    1     1     0    0     0      1     0.3686  -1        0.4514  -1
Chelsea            Everton               2    1     1     1    2     1      0     0.6741   0        0.0780   0
Fulham             Liverpool             1    3    -2     4    0    -4      2     0.5002   2        0.1319  -2
Manchester City    Manchester United     2    3    -1     1    2     1     -2     0.0414   2        0.9600  -2
Newcastle United   Queens Park Rangers   1    0     1     1    2     1      0     0.8097   0        0.0184   0
Norwich City       Wigan Athletic        2    1     1     1    0    -1      2     0.6642   2        0.4300  -2
Southampton        Stoke City            1    1     0     3    3     0      0     0.9612   0        0.9095   0
Sunderland         West Bromwich Albion  2    4    -2     2    1    -1     -1     0.1422   1        0.0997   1
West Ham United    Wigan Athletic        2    0     2     2    1    -1      3     0.9499   3        0.5974   3
Average                                                                     0.556          1.000            -0.333

Comments (regarding these 9 pairs, and these 2 samples - Full Analysis next slide):
• For the original sample, the Test Statistic is the average difference: 0.556
• For the first random sample, games 1, 4, 8 had Ran1 < 0.5, and their d.jk switched sign. The new sampled Test Statistic was 1.000
• For the second random sample, games 1, 2, 3, 5, 6, 8 had Ran2 < 0.5, and their d.jk switched sign.
  The new sampled Test Statistic was -0.333
• The P-value for a 1-tailed test ($H_A: \mu_d > 0$) would be p = (1+1)/(2+1) = 2/3, as both the original sample and the Ran1 sample have Test Statistics ≥ 0.556. The 2-sided P-value is also p = 2/3, since only the Ran1 sample has |Test Statistic| ≥ 0.556.

R Program

    epl2012 <- read.csv("http://www.stat.ufl.edu/~winner/data/epl_2012_home_perm.csv", header=TRUE)
    attach(epl2012); names(epl2012)

    ### Obtain Sample Size and Test Statistic (Average of d.jk)
    (n <- length(d.jk))
    (TS.obs <- mean(d.jk))

    ### Choose the number of samples and initialize TS, and set seed
    N <- 9999; TS <- rep(0,N); set.seed(86420)

    ### Loop through samples and compute each TS
    for (i in 1:N) {
      ds.jk <- d.jk            # Initialize d*.jk = d.jk
      u <- runif(n)-0.5        # Generate n U(-0.5,0.5)'s
      u.s <- sign(u)           # -1 if u < 0, +1 if u > 0
      ds.jk <- u.s * ds.jk     # Switch the sign of d.jk where u < 0
      TS[i] <- mean(ds.jk)     # Compute Test Statistic for this sample
    }

    summary(TS)
    (num.exceed1 <- sum(TS >= TS.obs))             # Count for 1-sided (Upper Tail) P-value
    (num.exceed2 <- sum(abs(TS) >= abs(TS.obs)))   # Count for 2-sided P-value
    (p.val.1sided <- (num.exceed1 + 1)/(N+1))      # 1-sided p-value
    (p.val.2sided <- (num.exceed2 + 1)/(N+1))      # 2-sided p-value

    ### Draw histogram of distribution of TS, with vertical line at TS.obs
    hist(TS,xlab="Mean Home-Away",
         main="Randomization Distribution for EPL 2012 Home Field Advantage")
    abline(v=TS.obs)

R Output

    > ### Obtain Sample Size and Test Statistic (Average of d.jk)
    > (n <- length(d.jk))
    [1] 190
    > (TS.obs <- mean(d.jk))
    [1] 0.6368421
    >
    > summary(TS)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
    -0.573700 -0.110500 -0.005263 -0.002513  0.100000  0.542100
    > (num.exceed1 <- sum(TS >= TS.obs)) # Count for 1-sided (Upper Tail) P-value
    [1] 0
    > (num.exceed2 <- sum(abs(TS) >= abs(TS.obs))) # Count for 2-sided P-value
    [1] 0
    > (p.val.1sided <- (num.exceed1 + 1)/(N+1)) # 1-sided p-value
    [1] 1e-04
    > (p.val.2sided <- (num.exceed2 + 1)/(N+1)) # 2-sided p-value
    [1] 1e-04

The observed mean difference (0.6368) exceeded all 9999 sampled values (min = -0.5737, max = 0.5421).
Thus, both P-values = (0+1)/(9999+1) = .0001

Normal Paired t-test
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}, \qquad i=1,2;\ j=1,\ldots,n$$
$$E\{\varepsilon_{ij}\} = 0 \qquad V\{\varepsilon_{ij}\} = \sigma^2 \qquad COV\{\varepsilon_{1j},\varepsilon_{2j}\} = \rho\sigma^2$$
$$d_j = Y_{1j}-Y_{2j} = \left(\mu_1+\varepsilon_{1j}\right)-\left(\mu_2+\varepsilon_{2j}\right) = \left(\mu_1-\mu_2\right)+\left(\varepsilon_{1j}-\varepsilon_{2j}\right) = \left(\mu_1-\mu_2\right)+\varepsilon_j, \qquad \varepsilon_j = \varepsilon_{1j}-\varepsilon_{2j}$$
$$E\{\varepsilon_j\} = E\{\varepsilon_{1j}-\varepsilon_{2j}\} = E\{\varepsilon_{1j}\}-E\{\varepsilon_{2j}\} = 0-0 = 0$$
$$V\{\varepsilon_j\} = V\{\varepsilon_{1j}-\varepsilon_{2j}\} = V\{\varepsilon_{1j}\}+V\{\varepsilon_{2j}\}-2COV\{\varepsilon_{1j},\varepsilon_{2j}\} = \sigma^2+\sigma^2-2\rho\sigma^2 = 2(1-\rho)\sigma^2 \equiv \sigma_d^2$$
$$E\{d_j\} = \mu_1-\mu_2 \qquad V\{d_j\} = \sigma_d^2 \qquad E\{\bar{d}\} = \mu_1-\mu_2 \qquad V\{\bar{d}\} = \frac{\sigma_d^2}{n}$$
Under Normality of $d_j$ (unlikely here):
$$Z = \frac{\bar{d}-\left(\mu_1-\mu_2\right)}{\sqrt{\sigma_d^2/n}} \sim N(0,1) \qquad \frac{(n-1)s_d^2}{\sigma_d^2} \sim \chi^2_{n-1} \qquad \bar{d} \perp s_d^2$$
By the same argument as for the Independent Samples t-test:
$$T = \frac{\dfrac{\bar{d}-\left(\mu_1-\mu_2\right)}{\sqrt{\sigma_d^2/n}}}{\sqrt{\dfrac{(n-1)s_d^2}{\sigma_d^2}\Big/(n-1)}} = \frac{\bar{d}-\left(\mu_1-\mu_2\right)}{\sqrt{s_d^2/n}} \sim t_{n-1}$$

Paired t-test for EPL 2012 Home vs Away Goals
$$H_0: \mu_1-\mu_2 = 0 \qquad H_A: \mu_1-\mu_2 \neq 0 \qquad TS: t_{obs} = \frac{\bar{d}-0}{\sqrt{s_d^2/n}} \overset{H_0}{\sim} t_{n-1}$$
Data (from EXCEL spreadsheet): $n = 190$, $\bar{d} = 0.6368$, $s_d^2 = 4.3912$
$$t_{obs} = \frac{0.6368-0}{\sqrt{4.3912/190}} = 4.1888 \qquad P\text{-value}: 2P\left(t_{189} \geq 4.1888\right) = 2(.00002) = .00004$$
Note: The t-test gives a smaller P-value, but the permutation P-value is limited by the number of samples: with N = 9999 it cannot fall below 1/(N+1) = .0001.
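
As a rough check of the paired t-test arithmetic and of the closing note, the R sketch below recomputes the t statistic and its 2-sided P-value from the summary statistics quoted above, and compares that P-value with the smallest value a permutation test based on N = 9999 samples can report. The variable names (dbar, s2.d, t.obs, p.ttest, p.perm.min) are illustrative choices and are not part of the original program.

```r
# Summary statistics copied from the paired t-test slide (EPL 2012)
n    <- 190      # number of team pairs
dbar <- 0.6368   # sample mean of d.jk
s2.d <- 4.3912   # sample variance of d.jk

# Paired t statistic and its 2-sided p-value from the t(n-1) distribution
t.obs   <- dbar / sqrt(s2.d / n)
p.ttest <- 2 * pt(abs(t.obs), df = n - 1, lower.tail = FALSE)

# Smallest p-value a permutation test can report with N sampled sign-flips:
# (0 + 1)/(N + 1), reached when no sampled statistic is as extreme as observed
N <- 9999
p.perm.min <- (0 + 1) / (N + 1)

round(t.obs, 4)        # compare with t_obs on the slide
p.ttest                # compare with the t-test P-value on the slide
p.perm.min             # floor for the permutation P-value
p.ttest < p.perm.min   # is the t-test P-value below that floor?
```

With the raw data available, `t.test(d.jk)` should give the same statistic; increasing N lowers the floor on the permutation P-value at the cost of more computation.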