1180:Lab10
The binomial distribution can be slightly confusing at first. You take a random variable that you understand (like a coin flip) and repeat it a fixed number of times, independently and with the same chance of success each time. The number of successes (heads) becomes your new random variable, which is binomially distributed.
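To make the idea concrete, here is a small sketch that builds binomial counts by hand from individual coin flips and compares them with what rbinom gives directly. The number of flips, the number of experiments and the seed are arbitrary values chosen for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n_flips=10                       # flips per experiment (assumed value)
n_experiments=5000               # number of repeated experiments (assumed value)

# Each row is one experiment; each entry is one flip (1 = heads, 0 = tails)
flips=matrix(rbinom(n_experiments*n_flips,size=1,prob=.5),nrow=n_experiments)
heads_by_hand=rowSums(flips)     # count the heads in each experiment

# rbinom with size=n_flips does the flipping and counting in one step
heads_direct=rbinom(n_experiments,size=n_flips,prob=.5)

# Both averages should land close to n_flips*0.5 = 5
mean(heads_by_hand)
mean(heads_direct)
```

The two vectors are different random draws, so they will not match entry by entry, but their histograms should look alike.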
For our example, we’ll use the outcomes of games in the NCAA basketball tournament. These outcomes obviously depend on the seeds involved, with a 1 seed far more likely to beat a 16 seed than an 8 seed is to beat a 9 seed. I’ve created an ad hoc relative strength system.
#Hypothetical Relative Strengths
getstrengths=function(teams){
  seed_winprob=seq(.8,.05,length=teams)
  seed_strength=seed_winprob/(1-seed_winprob)
  seed_strength
}
For example, with 16 teams we have the following strengths.
> s=getstrengths(16)
> s
[1] 4.00000000 3.00000000 2.33333333 1.85714286 1.50000000 1.22222222
[7] 1.00000000 0.81818182 0.66666667 0.53846154 0.42857143 0.33333333
[13] 0.25000000 0.17647059 0.11111111 0.05263158
To get the probability that a 16 seed beats a 1 seed in this setup, we divide the strength of the 16 seed by the sum of the strengths of the 16th seed and the 1st seed.
> s[16]/(s[1]+s[16])
[1] 0.01298701
Or the probability that a 3rd seed beats a 9th seed:
> s[3]/(s[3]+s[9])
[1] 0.7777778
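If you get tired of typing the ratio, the same calculation can be wrapped in a small helper. This is only a sketch: winprob is a hypothetical name that is not part of the lab code, and getstrengths is repeated here so the snippet runs on its own.

```r
# Strength function from the lab, repeated so this snippet is self-contained
getstrengths=function(teams){
  seed_winprob=seq(.8,.05,length=teams)
  seed_winprob/(1-seed_winprob)
}

# Hypothetical helper: probability that seed a beats seed b
winprob=function(s,a,b){ s[a]/(s[a]+s[b]) }

s=getstrengths(16)
winprob(s,16,1)                    # same as s[16]/(s[1]+s[16])
winprob(s,3,9)                     # same as s[3]/(s[3]+s[9])
winprob(s,16,1)+winprob(s,1,16)    # the two possible outcomes sum to 1
winprob(s,1,7)                     # the 7 seed has strength 1, so this is 0.8
```

The last line shows how to read seed_winprob: it is each seed's chance of beating a team whose strength is exactly 1.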
First let’s simulate the classic first round matchups (1v16, 2v15, etc.). The tournament has had its current form for 27 years and there are 4 regions every time. Therefore we should simulate 27 4-trial experiments. We’ll call the successes ‘upsets’.
upsets16=rbinom(27,size=4,prob=s[16]/(s[1]+s[16]))
hist(upsets16,breaks=seq(-.5,4.5),main='16 v 1',xlab='upsets',ylab='Number of Years')
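It can help to know what the histogram should look like on average. The sketch below uses dbinom to get the theoretical probability of each upset count in a single year, then scales by 27 years. The value of p is the 16 v 1 probability printed earlier; the variable names are mine.

```r
p=0.01298701                     # s[16]/(s[1]+s[16]) from the output above
years=27
k=0:4                            # possible upset counts in a 4-game year

prob_k=dbinom(k,size=4,prob=p)   # P(exactly k upsets in one year)
expected_years=years*prob_k      # expected number of years with k upsets

round(data.frame(upsets=k,probability=prob_k,expected_years=expected_years),4)
```

With p this small, nearly all of the probability sits on zero upsets, so most simulated years should fall in the first bar.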
Save this plot.
Simulate this for all eight of the first round matchups and put the histograms on the same figure. You’ll need to use
> par(mfrow=c(2,4))
to split up your graph properly. You may do this by hand or use a for loop.
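One possible shape for the for loop version is sketched below. It is an illustration, not the required answer: the names fav and dog, the list called upsets and the seed are my own choices, and getstrengths is repeated so the snippet runs on its own.

```r
# Strength function from the lab, repeated so this snippet is self-contained
getstrengths=function(teams){
  seed_winprob=seq(.8,.05,length=teams)
  seed_winprob/(1-seed_winprob)
}
s=getstrengths(16)

set.seed(2)                      # arbitrary seed, only for reproducibility
par(mfrow=c(2,4))                # 2 rows by 4 columns of panels
upsets=vector("list",8)          # one vector of 27 yearly counts per matchup
for(fav in 1:8){
  dog=17-fav                     # first round opponents' seeds sum to 17
  p=s[dog]/(s[fav]+s[dog])       # probability that the underdog wins
  upsets[[fav]]=rbinom(27,size=4,prob=p)
  hist(upsets[[fav]],breaks=seq(-.5,4.5),main=paste(dog,'v',fav),
       xlab='upsets',ylab='Number of Years')
}
```

Keeping the simulated counts in a list means you can look at them again later without re-running the simulation.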
The following code will simulate a March Madness-like tournament of arbitrary size and count the number of upsets that we get.
#Weird Spaghetti Logic that gets the proper matchups
#I did this with recursive programming
getseeds=function(rounds){
  if(rounds>1){
    r=2^rounds
    x=getseeds(rounds-1)
    y=r+1-x
    seed_order=seq(r)
    for(i in seq(1,r/2)){
      seed_order[2*i-1]=x[i]
      seed_order[2*i]=y[i]
    }
  } else{seed_order=c(1,2)}
  seed_order
}
#Simulate Tournament (and count the upsets)
tournament=function(rounds){
  N=2^rounds
  upsets=0
  remainingteams=getseeds(rounds)
  seed_strength=getstrengths(N)
  while(N>1){
    next_N=N/2
    newteams=numeric(next_N)
    for(i in seq(1,next_N)){
      pos1=2*i-1
      pos2=pos1+1
      teams=remainingteams[c(pos1,pos2)]
      strengths=seed_strength[teams]
      winner=sample(teams,size=1,prob=strengths)
      if(2*seed_strength[winner]<sum(strengths)){upsets=upsets+1}
      newteams[i]=winner
    }
    remainingteams=newteams
    N=next_N
  }
  upsets
}
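If you are curious what getseeds actually returns, the sketch below lays out a 4-round bracket as one first round game per row. getseeds is repeated so the snippet runs on its own; bracket and games are names I made up.

```r
# getseeds from the lab, repeated so this snippet is self-contained
getseeds=function(rounds){
  if(rounds>1){
    r=2^rounds
    x=getseeds(rounds-1)
    y=r+1-x
    seed_order=seq(r)
    for(i in seq(1,r/2)){
      seed_order[2*i-1]=x[i]
      seed_order[2*i]=y[i]
    }
  } else{seed_order=c(1,2)}
  seed_order
}

bracket=getseeds(4)
# Adjacent entries play each other, so put one game on each row
games=matrix(bracket,ncol=2,byrow=TRUE)
games
rowSums(games)   # every first round pair of seeds sums to 2^4+1 = 17
```

The winners of rows 1 and 2 meet in the next round, then rows 3 and 4, and so on, which is why the 1 seed cannot meet the 2 seed before the final.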
The only function that you need to worry about is tournament, which takes the number of rounds as input and returns a random number of upsets.
Write a program that simulates T tournaments of R rounds each and records the number of upsets in each one.
(Hint: you’ll need to call the tournament function multiple times, so use a for loop.)
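The general pattern for "call a random function many times and store the results" is sketched below. To keep the snippet runnable on its own it uses a hypothetical stand-in called toy_tournament, which is not the lab's simulator; simulate_upsets and its argument names are also my own. In the lab you would call the real tournament function instead.

```r
# Hypothetical stand-in so this sketch runs on its own.
# An R-round bracket has 2^R-1 games; here each one is an upset with
# probability 0.3, which is NOT how the lab's tournament function works.
toy_tournament=function(rounds){ rbinom(1,size=2^rounds-1,prob=.3) }

# Run n_tournaments tournaments of `rounds` rounds and return the upset counts
simulate_upsets=function(n_tournaments,rounds,sim=toy_tournament){
  upsets=numeric(n_tournaments)        # storage for the results
  for(i in seq_len(n_tournaments)){
    upsets[i]=sim(rounds)              # one simulated tournament per pass
  }
  upsets
}

set.seed(3)                            # arbitrary seed, only for reproducibility
u=simulate_upsets(1000,3)
table(u)                               # how often each upset count occurred
```

Allocating the result vector before the loop, rather than growing it one element at a time, keeps the loop simple and fast.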
Use your program to simulate tournaments of increasing size. First do 1000 tournaments with 1 round, then 2 rounds, and so on all the way to 8 rounds. Plot histograms of the number of upsets. When there are four rounds or fewer, set the ‘breaks’ option in hist so that each bar corresponds to a single integer.
For example, the ‘breaks’ option for a one round tournament should be set like this: hist(....,breaks=seq(-.5,1.5))
This will mean that there is one bin for zero upsets and one for one upset. In a one round tournament, these are the only two possibilities. Try using hist with and without this option to see what it does.
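You can check what a set of breaks does without drawing anything by passing plot=FALSE and looking at the counts. The sketch below uses made-up data, and intbreaks is a hypothetical helper of mine. It relies on the fact that an R-round tournament has 2^R-1 games, so the number of upsets can never exceed 2^R-1.

```r
x=c(0,0,1,0,1,1,0)                       # made-up upset counts, 1-round tournament
h=hist(x,breaks=seq(-.5,1.5),plot=FALSE)
h$breaks                                 # bin edges: -0.5 0.5 1.5
h$counts                                 # 4 zeros and 3 ones

# Hypothetical helper: one bin per integer from 0 up to 2^rounds-1
intbreaks=function(rounds){ seq(-.5,2^rounds-.5) }
intbreaks(2)                             # -0.5 0.5 1.5 2.5 3.5
```

Putting the edges at half-integers keeps every whole number in the middle of its own bin, so no count can land on a boundary.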