STAT 401 Lab 2 Solutions Geoffrey Thompson 5/30/2013

Just as a reminder, the web page is http://gzt.public.iastate.edu/stat401/
and my e-mail address is gzt@iastate.edu. I don’t have scheduled
office hours, but if you want to discuss questions feel free to e-mail me to set up
a time.
1 Example Problems With Solutions
1.1 Efron Dice
I have four dice, A, B, C, D. They are fair dice (each face of each die will come
up with probability 1/6). The faces are labeled as follows:
A: 4 4 4 4 0 0
B: 3 3 3 3 3 3
C: 6 6 2 2 2 2
D: 5 5 5 1 1 1
So four faces of die A are labeled 4. Therefore, e.g.:
P (A = 4) = 4(1/6) = 2/3
Find the following:
• P (A > B)
Note that B is always 3. Therefore, A > B when A = 4. Therefore,
P (A > B) = P (A = 4) = 2/3.
The general steps to solving this are to look at what values, b, one of the
variables, B, can take and then, for each of those, find the probability that
A > b.
• P (B > C)
B is always 3. C takes on 2 values: 6 and 2. So B > C when C = 2. So
P (B > C) = P (C = 2) = 2/3.
• P (C > D)
Note: this one is tricky.
Idea: if I roll die C and get a 2, what is the probability that I beat D?
Continue for all possibilities.
I discussed this answer in lab:
c                    6     2
P(C = c)            1/3   2/3
P(D < c)             1    1/2
P(C = c)P(D < c)    1/3   1/3
Add up the bottom row to figure out P (C > D), which is 2/3. This wasn’t
quite covered in class, apparently, which is why I solved it in the lab.
• P (D > A)
Note: same method as the previous.
d                    5     1
P(D = d)            1/2   1/2
P(A < d)             1    1/3
P(D = d)P(A < d)    1/2   1/6
Add up the bottom row to figure out P (D > A), which is 2/3.
One way to think about this is that you have a random variable, X, which
is 1 when D > A and 0 otherwise.
Then you are finding E(X).
• OPTIONAL CHALLENGE: you get to choose one die and somebody else
will randomly select one of the three remaining dice. Whoever then rolls
the highest number wins a giant sack of cash. Which die should you
choose?
For each die, there are three possible comparisons. We have already made
two for each die. For instance, die A needs to be compared to B, C, and
D. We already have a comparison to B and D. We know that compared
to B, A will win 2/3 of the time. Compared to D, A will win 1/3 of the time.
We do not know how A compares to C. We have the same sort of thing
for the other dice.
So the “best” die will be the one that has the best performance against
the third die whose result we don’t yet know.
We have two comparisons to evaluate: P (A > C) and P (B > D).
These dice behave in a counterintuitive manner!
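One way to check the four answers above, and to evaluate the two remaining comparisons, is to count face pairs directly: each pair of dice has 36 equally likely outcomes. The following is a minimal Python sketch rather than anything used in lab. The names DICE, p_beats and p_win_vs_random_opponent are made up for illustration, and the code assumes the dice are rolled independently.

```python
from fractions import Fraction
from itertools import product

# Face labels copied from the problem statement.
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}


def p_beats(x, y):
    """Exact P(x > y) for two independent fair dice, by counting face pairs."""
    wins = sum(1 for a, b in product(DICE[x], DICE[y]) if a > b)
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))


def p_win_vs_random_opponent(x):
    """P(die x wins) when the opponent's die is chosen uniformly from the other three.

    Assumption (illustrative): "randomly select" means each remaining die
    is equally likely.
    """
    others = [y for y in DICE if y != x]
    return sum(p_beats(x, y) for y in others) / len(others)


# The four comparisons solved above, then the two left to evaluate.
for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C"), ("B", "D")]:
    print(f"P({x} > {y}) = {p_beats(x, y)}")

# Overall winning chance for each choice of die in the optional challenge.
for x in DICE:
    print(f"choose {x}: {p_win_vs_random_opponent(x)}")
```

Because no two dice share a face value, ties never occur, so P(y > x) = 1 − P(x > y) for any pair.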
1.2 Normal Dice
We are now using normal, 6-sided dice, X.
That is, there are 6 sides, each side is labeled with a different integer between
1 and 6, and each side has probability 1/6 of being rolled.
• Sketch the pmf and cdf of X.
• If you roll one die, X, what is the mean of X?
One way to help think about this is to make a table. Fill in the values of
X on the first row. Then fill in their probabilities on the second row. The
product is the third row. Then the expectation (i.e., the mean) is the sum
of the third row.
x               1     2     3     4     5     6
P(X = x)       1/6   1/6   1/6   1/6   1/6   1/6
x · P(X = x)   1/6   1/3   1/2   2/3   5/6    1
You can verify that the sum of the bottom row is 3.5.
• What is the variance of X?
We already have E(X), so we need to calculate $E(X^2)$.
x                      1     2     3     4     5     6
x^2                    1     4     9    16    25    36
P(X^2 = x^2)          1/6   1/6   1/6   1/6   1/6   1/6
x^2 · P(X^2 = x^2)    1/6   2/3   3/2   8/3  25/6    6
You can verify the sum is 91/6. So then we have:
$V(X) = E(X^2) - (E(X))^2 = \frac{91}{6} - 3.5^2 = 2.91666\ldots$
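The table method translates directly into a few lines of Python. This is only a sketch for checking the arithmetic: mean_and_variance is a hypothetical helper name, it assumes every face is equally likely, and it uses exact fractions so that no rounding creeps in.

```python
from fractions import Fraction


def mean_and_variance(faces):
    """Mean and variance of one roll of a fair die with the given face values."""
    faces = list(faces)
    p = Fraction(1, len(faces))                      # each face is equally likely
    mean = sum(x * p for x in faces)                 # sum of the x * P(X = x) row
    second_moment = sum(x * x * p for x in faces)    # sum of the x^2 * P(X = x) row
    return mean, second_moment - mean ** 2           # V(X) = E(X^2) - (E(X))^2


mean, variance = mean_and_variance(range(1, 7))
print(mean, variance, float(variance))
```

Passing a different list of faces reuses the same calculation for other fair dice.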
• What is the mean of 2X?
Recall that, for a random variable X and constants a and b:
E(aX + b) = a · E(X) + b
In this case, a = 2 and b = 0. So E(2X) = 2E(X) = 7.
• What is the variance of 2X?
Recall that, for a random variable X and constants a and b:
$V(aX + b) = a^2 \cdot V(X)$
In this case, a = 2 and b = 0. So V (2X) = 4V (X) ≈ 11.7.
• I’m playing a board game. I have a choice between either rolling two dice,
X1 and X2 , and adding them up or rolling one die, X, and using 2X for
my turn. It’s the last turn of the game. If I get a 10 or higher, I win.
Otherwise, I lose. Should I roll 2 dice or roll one die and double it?
I gave, in class, the probability distribution of X1 + X2 for this instance.
As a result, we have P (X1 + X2 ≥ 10) = 1/6. In other words, if I roll two
dice, that is the probability that I win.
If I use one die, then we have the following:
P (2X ≥ 10) = P (X ≥ 5) = 1/3
So using only one die, in this case, is twice as good.
• What about if I only need a 7 or higher to win?
Using the numbers I gave, using two dice has a probability of 21/36 ≈ 0.58 of
winning.
If using only one die, rolls of 4, 5, 6 win (since 2X ≥ 7) while 1, 2, 3 lose. This
means the probability of winning is 0.5.
So using two dice is the better choice.
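Both versions of the board game question can be checked by listing outcomes. The sketch below is illustrative only (the function names are hypothetical): it counts the 36 equally likely ordered pairs for two dice and the 6 outcomes for a single doubled die.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a normal six-sided die


def p_two_dice_at_least(target):
    """P(X1 + X2 >= target) for two independent fair dice."""
    outcomes = list(product(FACES, repeat=2))
    wins = sum(1 for a, b in outcomes if a + b >= target)
    return Fraction(wins, len(outcomes))


def p_doubled_die_at_least(target):
    """P(2X >= target) for one fair die whose roll is doubled."""
    wins = sum(1 for x in FACES if 2 * x >= target)
    return Fraction(wins, len(FACES))


for target in (10, 7):
    print(target, p_two_dice_at_least(target), p_doubled_die_at_least(target))
```

The doubled die has the same mean as the sum of two dice but a larger variance, which is why it does better for a high target like 10 and worse for a target near the mean like 7.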
• Okay, this one isn’t a normal die anymore: I have an 8-sided die labeled
from 0 to 7. What is the mean and what is the variance of its rolls?
The mean, E(X), is still 3.5.
x                      0     1     2     3     4     5     6     7
x^2                    0     1     4     9    16    25    36    49
P(X^2 = x^2)          1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
x^2 · P(X^2 = x^2)     0    1/8   1/2   9/8    2   25/8   9/2  49/8
Sum up the bottom row to find $E(X^2)$. The variance is, again, $E(X^2) - (E(X))^2$.
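For reference, carrying out that sum with the numbers in the bottom row of the table gives the following worked check. It uses nothing beyond the table values and the mean of 3.5 stated above.

```latex
\begin{align*}
E(X^2) &= \frac{0 + 1 + 4 + 9 + 16 + 25 + 36 + 49}{8} = \frac{140}{8} = 17.5 \\
V(X)   &= E(X^2) - (E(X))^2 = 17.5 - 3.5^2 = 5.25
\end{align*}
```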
1.3 Coloring a Complete Graph Randomly
A complete graph G on n vertices is a graph with an edge connecting each pair
of vertices.
Insert crude doodle:
There are $\binom{n}{2} = \frac{n(n-1)}{2}$ edges in G.
Suppose each edge is colored red with probability p and blue with probability
q = 1 − p. Every edge is either colored red or blue.
• What is the expected number of red edges in G?
I just want to note, this problem is couched in a lot of rather difficult terms,
so don’t feel bad if you don’t get it. Everybody was having problems. This
is harder than anything you would be tested on. However, I am providing
this solution in the hopes that it will help you see the path through these
kinds of problems.
The whole write-up of the question is a leadup to the last couple sentences
of the problem statement: There are $\frac{n(n-1)}{2}$ edges . . . . Each edge is colored
red with probability p.
And then the question: What is the expected number of red edges?
Here’s the big hint to take away from reading the question: we have some
big fixed number of edges on the graph, each one has a certain probability
of having some property, and we want to count how many of those there
are. After seeing that, we need to look at what probability distributions
that we know of deal with that kind of thing.
Of the distributions we know, the binomial seems most appropriate. Looking at the definitions, if we have n(n−1)/2 edges and each has a probability
of p of being red, then the expected number of red edges is pn(n − 1)/2.
• What is the standard deviation of the number of red edges in G?
We need to find the variance. The standard deviation is the square root
of the variance.
Working from the definition of a binomial again, we have:
$V(X) = \sigma_X^2 = \frac{n(n-1)}{2}\, p(1-p) \;\rightarrow\; \text{std.dev} = \sigma_X = \sqrt{\frac{n(n-1)}{2}\, p(1-p)}$
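To see what these formulas give for a concrete graph, here is a small Python sketch. The function name red_edge_stats is hypothetical, and the values n = 20 and p = 0.25 in the example call are arbitrary choices for illustration, not part of the problem. It assumes the edges are colored independently, which is what makes the binomial model apply.

```python
from math import comb, sqrt


def red_edge_stats(n, p):
    """Mean and standard deviation of the number of red edges.

    Assumes each of the C(n, 2) edges is red independently with probability p,
    so the count is binomial with C(n, 2) trials.
    """
    edges = comb(n, 2)                      # n(n - 1)/2 edges in the complete graph
    mean = edges * p                        # binomial mean
    std_dev = sqrt(edges * p * (1 - p))     # square root of the binomial variance
    return mean, std_dev


mean, std_dev = red_edge_stats(20, 0.25)    # illustrative values only
print(mean, std_dev)
```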
• Suppose n > 3, 0 < p < 1. If I choose 3 vertices at random from G, what
is the probability that the edges connecting them to each other are all
red?
This is also binomial.
We have to do three things, then:
1. Find out what n is in this case. This will be the total number of
edges we are looking at.
2. Find out what p is.
3. Find out what x value we’re trying to find. As above, it’s the number
of red edges.
So we chose three vertices from G. There are two ways to figure out how
many edges there are: calculating it or drawing the triangle.
$\binom{3}{2} = 3$ edges. So this is n.
p hasn’t changed from above: it’s still the p from the problem statement.
x: if all the edges are red, then x = n = 3.
So the answer is $b(3, 3, p) = p^3$, using the notation from the course notes.
What is the probability that they are all blue?
If all the edges are blue, then it is the same as above, except x = 0.
Using our notation, then, we are interested in $b(0, 3, p) = (1 - p)^3$.
• TRICKY AND OPTIONAL: If I choose k vertices, 1 < k < n, what is
the probability that all of the edges are one color? e.g. all red or all blue.
Note again this is harder than anything that will come up in class or in
an exam. It is tricky and optional.
Break this up into two parts: all red and then all blue.
This problem is not that much different from the previous one, except the
n we are concerned with is different.
This is still a binomial problem. We have a bunch of edges, they’re all
independently colored red with some probability, and we want to know
the probability that a certain number of edges are colored red.
The only thing that has changed from the previous problem is the number
n.
This is considered tricky and optional because I haven’t really told you
how to calculate n, but I gave a hint: if there are n vertices, there are $\binom{n}{2}$
edges. So if there are k vertices, there are $\binom{k}{2}$ edges.
For the sake of brevity, let’s define $N = \binom{k}{2}$.
Then the probability that all the edges are red is b(N, N, p). Checking the
definition, this is $p^N$.
Conversely, the probability that all the edges are blue is b(0, N, p). By
definition, this is $(1 - p)^N$.
So the probability that all the edges are one color is $p^N + (1 - p)^N$.
Secret math talk: this converges to 0 in probability as k → ∞.
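The formula is easy to evaluate for a few values of k to see how quickly it shrinks. The sketch below is illustrative: p_monochromatic is a hypothetical name, and p = 0.5 in the loop is an arbitrary choice.

```python
from math import comb


def p_monochromatic(k, p):
    """P(all edges among k chosen vertices are one color): p^N + (1 - p)^N.

    N = C(k, 2) is the number of edges among the k vertices; edges are
    assumed to be colored independently, red with probability p.
    """
    n_edges = comb(k, 2)
    return p ** n_edges + (1 - p) ** n_edges


# p = 0.5 is an illustrative value, not part of the problem.
for k in range(2, 8):
    print(k, comb(k, 2), p_monochromatic(k, 0.5))
```

With k = 3 this reproduces the triangle case from the previous problem: the chance of all red plus the chance of all blue.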
• Let n = 20 and k = 5. That is, we choose 5 vertices at random from the
graph with their edges and call this new graph K.
Suppose the number of red edges in G is 50.
What is the probability that there are exactly 2 red edges in K?
HINT: hypergeometric.
This was deemed optional in lab because this had not yet been fully covered in class.
There are $\binom{k}{2} = \binom{5}{2} = 10$ edges in K.
There are $\binom{n}{2} = \binom{20}{2} = 190$ edges in G. If there are 50 red edges, there are 140
blue edges.
If there are 2 red edges in K, there are 8 blue edges in K.
By the definition of the hypergeometric distribution, we have the following:
$P(X = 2) = \frac{\binom{50}{2}\binom{140}{8}}{\binom{190}{10}}$
The intuition is that it’s the number of ways of choosing 2 red edges from
the big graph and 8 blue edges from the big graph divided by the number
of ways of choosing 10 edges.
• If we use the binomial approximation, what is our estimate of the probability that there are exactly 2 red edges in K?
I provided that, for this problem, $p = \frac{50}{\binom{20}{2}} = \frac{5}{19}$.
Here we have 10 edges and want the probability that 2 are red.
This is $b(2, 10, p) = \binom{10}{2} p^2 (1 - p)^8$.
The two answers are fairly close to each other.
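To put numbers on "fairly close", both expressions can be evaluated with Python's standard library. This is a sketch for checking the arithmetic. The constant names are made up for illustration, and the code simply evaluates the two formulas as written above.

```python
from fractions import Fraction
from math import comb

RED_IN_G = 50            # red edges in G, given in the problem
EDGES_IN_G = comb(20, 2) # 190 edges in G
EDGES_IN_K = comb(5, 2)  # 10 edges in K
RED_IN_K = 2             # number of red edges we ask about

# Hypergeometric: choose 2 of the 50 red and 8 of the 140 blue, out of all
# ways to choose 10 of the 190 edges.
hypergeometric = Fraction(
    comb(RED_IN_G, RED_IN_K) * comb(EDGES_IN_G - RED_IN_G, EDGES_IN_K - RED_IN_K),
    comb(EDGES_IN_G, EDGES_IN_K),
)

# Binomial approximation with p = 50/190 = 5/19.
p = Fraction(RED_IN_G, EDGES_IN_G)
binomial = comb(EDGES_IN_K, RED_IN_K) * p ** RED_IN_K * (1 - p) ** (EDGES_IN_K - RED_IN_K)

print(float(hypergeometric), float(binomial))
```

The approximation works reasonably here because only 10 of the 190 edges are drawn, so removing an edge changes the proportion of red edges very little.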
The first two problems of the lab are probably not harder than what you will
be asked to do. The third problem is quite tricky.