Section 002: Wednesday 9:40−10:30am
Section 003: Wednesday 10:45−11:35am
Lab location − LCB 115
Lab instructor − Erica Graham, graham@math.utah.edu
Lab webpage − www.math.utah.edu/~graham/Math1180.html
Lab office hours − Monday, 9:00 − 10:30 a.m., LCB 115
Review: In the previous lab, we explored covariances and correlations between 2 random variables in the context of virus susceptibility.
Background: Today, we’ll use Maple to calculate the maximum likelihood estimator for the binomial distribution. We will also see how confidence limits relate to hypothesis testing.
restart; with(Statistics): with(plots): randomize(): ## makes your results (more or less) unique
Suppose the probability of having a cartilage piercing in a particular subpopulation is 0.4. Also assume that the decision of an individual in this population to get his/her cartilage pierced is not influenced by any other person. Given this probability, how can we simulate 50 groups of 5 people to determine how many groups have exactly 2 people with their cartilage pierced? We first need to figure out the type of distribution.
Since the piercing review committee only returns a ’yes’ or a ’no’, we can think of this as a binomially distributed event. So, our first step in the simulation is to defined the piercing random variable, say ’C’ for cartilage, that has a binomial distribution. The Binomial(n, p) command in the Statistics package will let us do this. Since we are only sampling 5 people at a time, the binomial distribution parameter, n, is 5. Then we can sample C 50 times to get our groups. The 50 numbers we get are the number of cartilage−pierced people in those groups. We’ll call this list ’groups.’
C:=RandomVariable(Binomial(5,0.4)): groups:=Sample(C,50): ## sample 50 groups of 5 people; returns number with pierced cartilage
To see how many groups have exactly 2 cartilage−pierced people, we’ll use the new Maple command ’evalhf
( )’ along with the previously introduced add( ) command.
groups[1]; evalhf(groups[1]=2); results:=[seq(evalhf(groups[j]=2),j=1..50)]; ## 1s if = 2, 0s if not k50:=add(results[j],j=1..50); ## add all 1s to get no. of groups with exactly 2 pierced people
3.
0.
results := 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1.
k50 := 11.
Page 1 of 7
So, the experimental prevalence of cartilage piercing in the given subpopulation is k50/50. We can compare this to the actual probability of having exactly 2 of 5 people with pierced cartilage using the
ProbabilityFunction( ) command in the Statistics package. Notice what happens for this command in general and how this changes when we make k = 2.
k50/50; ## experimental result f:=ProbabilityFunction(C,k); eval(f,k=2); ## estimated probability of finding exactly 2 of 5 with pierced cartilage.
0.2200000000
f :=
0 binomial 5, k 0.4
k
0.6
5 k k 0 otherwise
0.34560
Now, we’ll look at the case where we consider simulating 50 different groups of 5, all with different probabilities of having cartilage piercings. Given a certain probability, how many groups out of each set of
50 could we find exactly 2 of 5 people with the piercing? To simulate this situation, we’ll need a list of ’test’ probabilities. Then we’ll create 50 groups for each of these probabilities, see which ones have exactly 2 pierced people, total them and plot our findings.
Pc:=[seq(0.05..0.95,0.05)]; ## test probabilities
N:=nops(Pc); ## how many loops we do
Pc := 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80,
0.85, 0.90, 0.95
N := 19 for i from 1 to N do:
X:=RandomVariable(Binomial(5,Pc[i])): grp:=Sample(X,50): count[i]:=add([seq(evalhf(grp[j]=2),j=1..50)][k],k=1..50): end do:
Note: The ’\n’ in the plot labels creates a line break in the label.
plot([seq([[Pc[i],count[i]],[Pc[i+1],count[i+1]]],i=1..N−1)],p=0.
.1,color="Black",gridlines=true,labels=["probability","# groups with 2 of 5\ncartilage−pierced people"],labeldirections=
[horizontal, vertical]);
18
14
10
6
0
0 0.2
0.4
0.6
probability
0.8
1
Notice which probability gives the maximum number of groups. We can compare this to the maximum likelihood (ML) estimator for p. To start, we’ll define a new random variable with n=5 and p undefined, and then define the likelihood function L(p) using ProbabilityFunction( ).
Cmle:=RandomVariable(Binomial(5,p)):
L:=unapply(ProbabilityFunction(Cmle,2),p); ## the likelihood function for this problem
L := p 10 p
2 p 1
3
Page 2 of 7
Without doing actual computation, we can get an idea of where the ML estimator might be.
plot(L(p),p=0..1,0..0.5,labels=["p","likelihood"],gridlines=true, labeldirections=[horizontal, vertical]);
0.5
0.4
0.3
0.2
0.1
0
0 0.2
0.4
0.6
0.8
1 p
Let’s find the ML estimator explicitly. Since we know p is a probability and our estimator is therefore strictly between 0 and 1, we can tell Maple to exclude these extreme values from its search.
MLest:=fsolve(diff(L(p),p)=0,p=0.001..0.999); ## where’s the derivative equal to zero?
MLest := 0.4000000000
Now, let’s find the 95% confidence limits. For this, the degree of confidence is 0.95, and so alpha = 1 −
0.95 = 0.05 represents the probability that the true proportion of groups with exactly 2 cartilage−pierced members lies outside the confidence limits.
5
We can find these limits exactly by solving
k = 2
b k; 5, p l
2
=
2
for the lower limit pl and
k = 0
b k; 5, p h
=
2
for the upper limit ph. We’ll define the left−hand sides of these equations as functions of p.
alpha:=1−0.95; b:=ProbabilityFunction(Cmle,k); ## binomial p.d.f. for a generic k value lowerfn:=unapply(sum(b,k=2..5),p); ## lower confidence limit summation upperfn:=unapply(sum(b,k=0..2),p); ## upper confidence limit summation
:= 0.05
b :=
0 binomial 5, k p k p 1
5 k k 0 otherwise
lowerfn := p 10 p
2 p 1
upperfn := p p 1
5
3
10 p
3 p 1
5 p p 1
4
2
5 p
4 p 1 p
5
10 p
2 p 1
3
Now, we can see what it is that we wish to determine graphically. Normally, we’ve used plots[display]( ) to see more than one graph side−by−side. Since we’ve imported the ’plots’ package, we can use the display( ) command by itself.
plot1:=plot([[0,alpha/2],[1,alpha/2]],color="Red"): ## probability threshold plot2:=display([plot(lowerfn(p),p=0..1,color="LimeGreen"),plot1], title="Pr(finding 2 or more with cartilage piercings)"): plot3:=display([plot(upperfn(p),p=0..1,color="DodgerBlue"),plot1], title="Pr(finding less than 2 with cartilage piercings)"):
Page 3 of 7
display(Array(1..2,[plot2,plot3]),gridlines=true,labels=
["proportion with cartilage piercings",""]);
Pr(finding 2 or more with cartilage piercings)
1
0.6
0
0 0.2
0.4
0.6
0.8
1 proportion with cartilage piercings
Pr(finding less than 2 with cartilage piercings)
1
0.6
0
0 0.2
0.4
0.6
0.8
1 proportion with cartilage piercings
We wouldn’t want to solve for the points where the lines meet in each graph by hand, but luckily for us,
Maple will. We’ll specify a solution range between 0 and 1 for the probabilities.
pl_true:=fsolve(lowerfn(p)=alpha/2,p=0..1); ## where left graph’s lines meet ph_true:=fsolve(upperfn(p)=alpha/2,p=0..1); ## where right graph’s lines meet
pl_true := 0.05274495053
ph_true := 0.8533672004
What did we actually find with pl_true and ph_true?
plot00:=plot(L(p),p=pl_true..ph_true,0..0.5,filled=true,color=
"LightGray"): plot01:=plot(L(p),p=0..1,0..0.5,color="Black",thickness=2): plot02:=plot([[pl_true,alpha/2],[ph_true,alpha/2]],color="Red", style=point,symbol=solidcircle,symbolsize=14): plot03:=plot([[pl_true,alpha/2],[ph_true,alpha/2]],color="Red"): display([plot00,plot01,plot02,plot03],labels=["p","likelihood"], labeldirections=[horizontal, vertical]);
0.5
0.4
0.3
0.2
0.1
0
0 0.2
0.4
0.6
0.8
1 p
How does the idea of confidence intervals relate to hypothesis testing? Suppose we knew for sure (due to extreme stalking by highly paid individuals) that 40% of the subpopulation at hand had cartilage piercings, but that out of 100 individuals we observed on the street, we found that only 25% were sporting pierced cartilage. What’s the probability that we would be able to find such a number of cartilage−pierced individuals that is so far from the expected number in the subpopulation? And, is this enough to completely invalidate the 40% that those professional stalkers found?
First, we need to identify this in terms of hypothesis testing. Our null hypothesis says that the true parameter is 0.4, i.e. it matches the population proportion. We can decide whether to accept or reject the null hypothesis based on the 95% confidence limits determined by this new problem. This approach asks whether the true proportion is actually equal to 0.4. We’ll look at a zoomed in version of the appropriate
Page 4 of 7
functions to simply approximate the confidence limits (because Maple has issues with super−high accuracy).
Note that the summation notation is equivalent to what we did with the original ’lowerfn’ and ’upperfn.’
This construction will keep the function in summation form.
lowerfn2:=(n,p)−>sum(n!/(k!*(n−k)!)*p^k*(1−p)^(n−k),k=25..100); upperfn2:=(n,p)−>sum(n!/(k!*(n−k)!)*p^k*(1−p)^(n−k),k=0..25);
lowerfn2 := n, p
100
k = 25
n! p k
1 p n k
k! n k !
upperfn2 := n, p
25
k = 0
n! p k
1 p n k
k! n k !
plot4:=plot([lowerfn2(100,p),upperfn2(100,p)],p=0..0.5,0..0.05, color=["LimeGreen","DodgerBlue"]): display([plot4,plot([[0,alpha/2],[0.5,alpha/2]],color="Red")], gridlines=true);
0.05
0.04
0.03
0.02
0.01
0
0 0.1
0.2
0.3
0.4
0.5
p
Here, pl ~ 0.17 and ph ~ 0.35. Since 0.4 does not fall within this range, we may reject the null hypothesis.
This means we can be confident that having found 25 of 100 people with pierced cartilage, it’s not true that
0.4 describes piercing prevalence. To further solidify this conclusion, we can compute the p−value of our test. This is the probability of observing a result at least as extreme as the measured result if the null hypothesis is true. To do this, we find the probability of observing 25 or 55 people of 100 with pierced cartilage given the expected proportion of 0.4.
F:=(n,p)−>sum(n!/(k!*(n−k)!)*p^k*(1−p)^(n−k),k=0..15)+sum(n!/(k!*
(n−k)!)*p^k*(1−p)^(n−k),k=55..100); pValue:=F(100,MLest);
15 100
n! p k
1 p n k
n! p k
1 p n k
F := n, p
k = 0
k! n k !
k = 55
k! n k !
pValue := 0.001710977806
The result of this two−tailed test says that there is less than a 0.2% chance of us finding some number of people with pierced cartilage that is more than (40−25) = 15 people away from the expected value (on either side) if 0.4 is the true proportion. In other words, we can reject the findings of the professional stalkers because there’s (statistically) no way their number could produce what we saw with our very own eyes. We win!
11
Page 5 of 7
Useful Tip #1: Read each problem carefully , and be sure to follow the directions specified for each question! I will take a vow of silence if you ask me a question that is clearly stated in a problem.
Useful Tip #2: Minimize your code by not simply copying and pasting absolutely everything we do in class. See if you can eliminate unnecessary commands by knowing what it is you have to do and what tools you (minimally) need to do it.
Useful Tip #3: When in doubt, restart !
Useful Tip #4: When you reopen a saved .mw file, remember to re−execute everything in order to use previously defined things. Maple can show you the output of what you’ve done before, but it won’t remember how it got there.
Useful Tip #5: Read through your completed assignment again before handing it in, to make sure that things make sense. It helps both the learning and grading processes.
Paper−saving tip: Make the size of your output graphs smaller to save paper when you print them. Please ask me if you’re unsure of how to do this. (You can see how much paper you’d use beforehand by going to
File Print Preview.) Also, please DO NOT attach printer header sheets (usually yellow, pink or blue) to your assignment. Recycle them instead!
(0) Import the Maple Statistics and plots packages. Also run the randomize( ) command to ensure random results.
with(Statistics): with(plots): randomize():
Note #1: You will be penalized one raw point for each unnecessary Maple command. Make sure you understand what’s needed and what’s superfluous!
Note #2: I will not grade anything (i.e. you will lose full points for any problem) that is wrong because you failed to follow directions in any capacity. Be careful!
Suppose students fall asleep during any given minute after the first 30 minutes of a rather monotonous 90− minute lecture with some probability p. The TA for the class checks every 5 minutes, starting at minute 30, to see if there are any students sleeping (mainly to avoid falling asleep herself), but she doesn’t count them.
She does, however, record whether there was someone sleeping in class for the remaining 60 minutes.
Assume that any sleeping student is not impacted by the sleep state of his/her peers.
(1) Let the presence of a sleeping student be described by a binomially distributed random variable. If this situation occurs campus−wide at a large university, and there are 100 TAs who record and submit their results from a single lecture to the Psychology department, simulate the number of TAs who were able to find at least one sleeping student in exactly 7 out of the total number of intervals checked, for various probabilities p. Run the simulation for p values ranging from 0.01 to 0.96 with a step size of 0.05.
Suppress your ’do loop’ output.
## simulation
(2) Save to ’plot1’ a plot of your ’count’ results all divided by 100.
Required:
[1] Omit axes labels.
[2] Choose ’Black’ to be the color for all your lines.
## plot1
(3) Appropriately define the likelihood function of the current situation as L(p) with random variable S.
## likelihood function
(4)(a) Save to ’plot2’ a plot of L(p) for p=0..1.
Page 6 of 7
Required:
[1] Omit axes labels.
[2] Choose ’Red’ as the color for L(p).
[3] Specify a thickness of 3.
## plot2
(b) Display both ’plot1’ and ’plot2’ on the same set of axes.
## display plots
(c) Looking at the red curve above, approximately how likely is it that a by−the−5−minute student slumber probability equaling 0.4 generated the data you found above? Your answer should be in a percentage.
(d) Use your graph to compare where the maximum values for L(p) and your simulation results occur.
Which p is larger?
(5) Compute the maximum likelihood estimator for p, as we did in class. Save the result to MLest.
## maximum likelihood estimator
(6)(a) Find the true 99% lower and upper confidence limits of the data. Suppress the output of b, lowerfn and upperfn, wherever you define them.
Save your limits as pl99 and ph99 for lower and upper, respectively.
## confidence limits
(b) What percentage of all possible probabilities is contained in the 99% confidence interval?
## some percentage
(c) How confident could you be that the true p lies between 0.3971 and 0.7539?
## be confident!
(7)(a) Now assume that you were the TA, and that you happened to find slumbering students in 8 of the total number of 5−minute intervals. Test the null hypothesis that the actual proportion of 5−minute intervals during which a sleeping student is found is your MLest by finding (non−graphically!) the "exact" 95% confidence limits for this new situation and seeing if this MLest lives within the interval.
## confidence limits
(b) Given your results from the previous part, would you reject the null hypothesis? Why?
(c) Confirm your conclusion by performing a two−tailed test.
## calculate p−value
(8) Out of curiosity, what do the evalhf( ) and ProbabilityFunction( ) commands do? describe, Describe,
DESCRIBE!
Did you remember to save paper?
Good luck with finals!
Page 7 of 7