Probability and Statistics (Spring 2025)
Final Exam (June 5, 2025)
Baseball, Cheerleaders, and Hypothesis Testing
(Because statistics can be fun, too!)
1. True Fan or Cheer Fan? A Hypothesis Testing Adventure (30%)
Somewhere in Asia, baseball is not just a sport — it’s a whole vibe. The stadiums are
packed with fans, but not everyone is there for the same reason.
Some fans are there for the game — they follow the players, know the stats, and live
for the home run. These people proudly call themselves “true baseball fans.”
But there’s another group — just as enthusiastic — who are really there for the cheerleaders. They may not know who’s batting, but they know every cheer routine by
heart. These are the “true cheerleader fans.”
There are 6 teams in the pro league.
Jane, a college student and a proud baseball lover, has been going to games regularly.
She noticed a quirky pattern: the more often someone points their phone camera at
the cheerleaders, the less likely they are to know who’s on first base.
So she starts collecting data.
Let X be the number of times a person points their phone at the cheerleaders during
a game.
For true baseball fans, X ∼ Binomial(90, 0.1)
For true cheerleader fans, X ∼ Binomial(90, 0.5)
In the population, 75% of fans are baseball fans, 25% are cheerleader fans.
Now enter Mike — Jane’s boyfriend. Mike claims he’s a baseball fan. But Jane’s
suspicious. So she brings him to a game... and decides it’s time for a statistical test
of truth.
(a) (5%) Formulate a binary hypothesis testing problem to help Jane test whether
Mike is really a true baseball fan.
Clearly write down the form of the binary hypothesis testing problem to get full
credit.
(b) (10%) Based on Jane’s observation X, derive the decision rule that minimizes the
overall probability of error. Simplify the rule as much as possible. (The answer
can be expressed in log form.)
(c) (In this problem, you may leave summations in your answers.)
(3%) What is the name of this decision rule?
(4%) What is the Type I error probability?
(4%) What is the Type II error probability?
(4%) What is the overall error probability?
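The model in Problem 1 can be explored numerically before attempting the derivation. The following is a minimal Python sketch, not part of the exam, that draws fans from the stated population (75% baseball fans, 25% cheerleader fans) and their phone-pointing counts X. The function name `sample_fan`, the seed, and the sample size are illustrative assumptions.

```python
import random


def sample_fan(rng, n_trials=90, p_baseball=0.1, p_cheer=0.5, prior_baseball=0.75):
    """Draw one (is_baseball_fan, x) pair from the two-group model of Problem 1."""
    is_baseball_fan = rng.random() < prior_baseball
    p = p_baseball if is_baseball_fan else p_cheer
    # A Binomial(n, p) draw is the number of successes in n independent Bernoulli(p) trials.
    x = sum(rng.random() < p for _ in range(n_trials))
    return is_baseball_fan, x


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
samples = [sample_fan(rng) for _ in range(10_000)]
frac_baseball = sum(is_fan for is_fan, _ in samples) / len(samples)
print(f"fraction of baseball fans in the sample: {frac_baseball:.3f}")
```

Applying a candidate decision rule to `samples` and counting misclassifications gives an empirical check on the error probabilities asked for in part (c).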
2. Team “Universe”: Losing Games, Selling Tickets (15%)
Team Universe is dead last in the standings... but somehow sells more tickets than any
other team. What’s their secret? Cheerleaders. Lots of them. 23, to be exact. Let
Xi be the number of fans who show up specifically for cheerleader i. The {Xi} are
independent random variables.
Assume the following:
For i=1 to 20:
Xi ∼ Binomial(1500, 0.2)
X21 ∼ Binomial(3000, 0.2)
X22 ∼ Binomial(3000, 0.2)
X23 ∼ Binomial(4000, 0.2)
Let Y = X1 + X2 + ... +X23 be the total number of cheerleader fans attending a home
game.
(a) (5%) Find the moment generating function (MGF) of Y.
(b) (5%) What is the distribution of Y? What are E[Y] and Var(Y)?
(c) (5%) The team manager cares about the ticket sales. She wants to know the
probability that more than 8000 cheer fans show up, i.e., P[Y > 8000].
She uses the standard normal CDF Φ(·) to estimate this. Explain in detail why
this is valid, and express the probability in terms of Φ(·).
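One way to sanity-check answers to Problem 2 is to simulate Y directly from its definition. The sketch below is not part of the exam; it uses only the Python standard library, and the names `PARAMS` and `sample_total`, the seed, and the number of draws are illustrative assumptions.

```python
import random

# (number of trials, success probability) for each of the 23 cheerleaders in Problem 2
PARAMS = [(1500, 0.2)] * 20 + [(3000, 0.2), (3000, 0.2), (4000, 0.2)]


def sample_total(rng):
    """Draw one realization of Y = X1 + X2 + ... + X23."""
    total = 0
    for n, p in PARAMS:
        # Each Xi is a count of successes in n independent Bernoulli(p) trials.
        total += sum(rng.random() < p for _ in range(n))
    return total


rng = random.Random(1)  # arbitrary fixed seed
draws = [sample_total(rng) for _ in range(50)]
mean_y = sum(draws) / len(draws)
var_y = sum((y - mean_y) ** 2 for y in draws) / (len(draws) - 1)  # sample variance
frac_above = sum(y > 8000 for y in draws) / len(draws)  # empirical P[Y > 8000]
print(mean_y, var_y, frac_above)
```

With only 50 draws the estimates are rough; they are meant as a plausibility check against the closed-form results, not a replacement for them.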
3. Mr. J’s Cursed Praise (5%)
Mr. J is a popular YouTuber with a strange superpower: whenever he praises a baseball
team before a game, that team tends to lose.
Out of the last 200 games, Mr. J praised a team in each one — and 160 of them lost.
Now, some teams are thinking:
“What if we pay Mr. J to praise our opponents?”
They estimate the true “jinx probability P[J]” by P̂ = 160/200 = 0.8. But they’re
worried: what if the true P[J] is actually less than 0.5? That would be bad.
Please use Chebyshev’s inequality to provide an upper bound on the probability that
the true P[J] is less than 0.5. (Is Mr. J truly magical... or just statistically lucky?)
4. (10%) Let X be a discrete non-negative integer-valued random variable with MGF as
follows:
\[ \phi_X(s) = K \cdot \frac{14 + 5e^{s} - 3e^{2s}}{8(2 - e^{s})} \]
(a) (5%) Find E[X].
(b) (5%) Find the conditional expected value of X given that X ≠ 0.
5. (15%) Let X and Y be independent random variables with CDFs F_X(x) and F_Y(y)
respectively. Let U = min(X, Y) and V = max(X, Y).
(a) (6%) Express F_{U,V}(u, v) in terms of the CDFs of X and Y.
(b) (4%) Find f_{U,V}(u, v) if X and Y are independent continuous uniform (0, 1) random
variables.
(c) (5%) (Continued from (b)) Find the correlation coefficient ρ_{U,V} of U and V.
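An analytical answer to part (c) can be checked against a Monte Carlo estimate. The sketch below is not part of the exam; it draws independent uniform (0, 1) pairs, forms U and V, and computes the sample correlation coefficient. The helper `pearson`, the seed, and the sample size are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2)  # arbitrary fixed seed
pairs = [(rng.random(), rng.random()) for _ in range(20_000)]
u = [min(x, y) for x, y in pairs]  # U = min(X, Y)
v = [max(x, y) for x, y in pairs]  # V = max(X, Y)
rho_hat = pearson(u, v)
print(f"estimated correlation of U and V: {rho_hat:.3f}")
```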
6. (25%) A spy inside a classified research facility wishes to use a gateway router for covert
communication of research secrets to an outside accomplice. To do so, the spy covertly
communicates a bit W ∈ {0, 1} for every n transmitted packets. To signal W = 0, n
packets are sent out through the gateway router as a Poisson process of rate λ packets
per second. To signal W = 1, n packets are sent out as a Poisson process of rate 2λ
packets per second. The secret communication bits are equiprobable, that is,
P[W = 1] = P[W = 0] = 1/2. The spy’s accomplice outside the gateway router monitors
the outbound packet transmission process by observing n packet interarrival times
T1, T2, ..., Tn and uses X = min(T1, T2, ..., Tn) as the decision statistic for guessing the bit W
for every n packets. (That is, the accomplice uses the minimum of the n packet
interarrival times to perform the hypothesis test.) Let the monitoring period start
at time 0, so that the interarrival time of the first packet is simply its arrival time.
(a) (5%) Find the likelihood functions for the decision statistic X given W = 0 (hypothesis H0) and W = 1 (hypothesis H1), respectively.
(b) (4%) What hypothesis test based on X minimizes the total probability of error for the accomplice? Clearly identify the decision boundary
between H0 and H1.
(c) (5%) What is the total probability of error under the aforementioned hypothesis
testing rule? Round the answer to three decimal places.
(d) (6%) Assume that the gateway router performs random dropping for each packet
that passes through it with dropping probability p, 0 < p < 1. Since such a
random drop behavior is unknown to the accomplice, the decision boundary for
determining H0 or H1 remains the same. Find the probability of error as a function
of p under such random dropping.
(e) (5%) Determine how random dropping changes P_FA (Type I error) and P_MISS
(Type II error) when p = 1/2. Explain why each of the two error probabilities
increases or decreases.
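The setup of Problem 6 (without packet dropping) can be simulated to check a derived likelihood or decision boundary. The sketch below is not part of the exam. It relies on the standard fact that the interarrival times of a Poisson process with rate λ are i.i.d. exponential with rate λ. The exam leaves λ and n symbolic, so the values `LAM = 1.0` and `N = 10` are assumptions made only so the code runs, as are the function name and seed.

```python
import random

LAM = 1.0  # assumed value of lambda (packets per second); the exam keeps it symbolic
N = 10     # assumed number of packets per bit; the exam keeps n symbolic


def sample_min_interarrival(rng, rate, n):
    """Draw X = min(T1, ..., Tn) for a Poisson process with the given rate."""
    # Interarrival times of a Poisson process are i.i.d. Exponential(rate).
    return min(rng.expovariate(rate) for _ in range(n))


rng = random.Random(3)  # arbitrary fixed seed
x_h0 = [sample_min_interarrival(rng, LAM, N) for _ in range(5_000)]      # W = 0
x_h1 = [sample_min_interarrival(rng, 2 * LAM, N) for _ in range(5_000)]  # W = 1
mean_h0 = sum(x_h0) / len(x_h0)
mean_h1 = sum(x_h1) / len(x_h1)
print(f"mean of X under H0: {mean_h0:.4f}, under H1: {mean_h1:.4f}")
```

Thresholding `x_h0` and `x_h1` at a candidate decision boundary and counting the errors gives an empirical estimate to compare with the answer to part (c).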