# Section 2

```
STAT 139
Week 3
Unit 2: Probability and Random Variables
Unit 3: Confidence Intervals and Hypothesis Testing
TF: Nicole Pashley (npashley@g.harvard.edu)
Probability Review
• P(A) ≥ 0,  P(S) = 1,  P(A^c) = 1 − P(A)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• If A and B are disjoint (mutually exclusive) events, then P(A ∪ B) =
• Conditional probability: the probability of one event occurring given that
we know the outcome of another event
P(A|B) =
P(A ∩ B) =
• Law of Total Probability: for a partition A_1, ..., A_n:
P(B) =
• Bayes' Rule:
P(A|B) =
• Two events A and B are independent if and only if knowing that one event occurs does
not change the probability that the other event occurs, which means:
Exercise 1: 70% of students at a college use Macs while the other 30% use other computer
brands (assume students only have one computer). 80% of the Mac users also have iPhones,
while 40% of the non-Mac users have iPhones.
a) What is the probability that a randomly selected student is a Mac user and has an iPhone?
b) Now we randomly pick another student. What is the probability that they have an
iPhone?
c) What is the probability that a randomly picked student does not use a Mac, given they
use an iPhone?
Random variables: PMFs, PDFs, and CDFs
• Probability mass function (PMF) of a discrete r.v. X: the function f such that for every real number x, f(x) = P(X = x)
• Cumulative distribution function (CDF) of a r.v. X: the function F_X given by F_X(x) =
P(X ≤ x)
• Probability density function (PDF) of a continuous r.v. X: the derivative, f, of the
CDF, given by f(x) = F′(x)
Means (Expected value) and Variances of r.v.s
• Expectation:
E[X] = Σ_{all x} x P(X = x)          for a discrete r.v.
E[X] = ∫_{−∞}^{∞} x f(x) dx          for a continuous r.v.
• LOTUS (expectation of a function):
E[g(X)] = Σ_{all x} g(x) P(X = x)    for a discrete r.v.
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx    for a continuous r.v.
• Variance (σ_X²): Var(X) = E[(X − μ)²] = E[X²] − μ²
• Standard deviation: SD(X) = √Var(X)
Exercise 2: We have a random variable X with
X = −1 with probability 0.25
X =  0 with probability 0.25
X =  1 with probability 0.5
Find E[5X + 3] and Var(5X + 3).
Normal and Binomial distributions
Exercise 3: Each person in a book club, with 10 members, must purchase To Kill a Mockingbird to read. Each person buys a paperback copy with 60% probability and buys a hardcover
copy with probability 40% (and no one buys more than one copy). What is the expected
total number of paperback copies of To Kill a Mockingbird purchased by members of the
book club? What is the probability that fewer than 9 paperbacks are purchased by the club?
Sample Statistics: X̄ and S² as r.v.s
• Sample mean X̄: X̄ = (1/n) Σ_{i=1}^{n} X_i; Population mean: μ = E[X_i]
• Sampling distribution of a statistic: reference distribution that arises from a chance
mechanism used to select a random sample from a population
• Law of Large Numbers: for i.i.d. X_i with variance σ², X̄ will have E[X̄] = μ and Var(X̄) = σ²/n,
so X̄ concentrates around μ as n grows
• Central Limit Theorem: Sample means and sums of i.i.d. r.v.s will be approximately normally distributed, no matter what the original distribution of the individual X_i is (assuming n
is large and the variance is finite)
Statistical Inference
Two major types of inference:
• Estimation (confidence intervals)
• Making decisions (tests of hypothesis)
Confidence Intervals
• Confidence intervals: provide a range of plausible values for the true parameter (estimand) based on the data collected
• General form: Point estimate ± margin of error
• Margin of error: Measures the accuracy of our estimate
• Correct interpretation of 95% CI: If we repeatedly draw samples of size n and
calculate 95% CIs for each sample, then 95% of such CIs will cover the unknown
population mean μ
• 95% confidence interval for the mean when n > 30: (x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n)
Exercise 4: Suppose you are overseeing the operations of a candy manufacturing company.
You want to build a confidence interval for the average weight of a certain type of candy
bar. You take a sample of 50 candy bars and find that the sample mean of the candy bar
weights was 56 grams. The standard deviation was 4. Assume that the candy bar weights
are approximately normally distributed. What is a 95% confidence interval for the average
candy bar weight? What is a 99% confidence interval? Give an interpretation of these
intervals.
```
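The exercises above can be checked numerically. The following is a minimal sketch, not an official solution: it uses only the Python standard library, all variable and function names (`binom_pmf`, `normal_ci`, and so on) are hypothetical, and for Exercise 4 it assumes the stated standard deviation of 4 can be plugged into the handout's normal-based interval x̄ ± z·σ/√n.

```python
from math import comb, sqrt
from statistics import NormalDist

# Exercise 1: multiplication rule, law of total probability, Bayes' rule.
p_mac = 0.70                  # P(Mac)
p_iphone_given_mac = 0.80     # P(iPhone | Mac)
p_iphone_given_other = 0.40   # P(iPhone | not Mac)

p_mac_and_iphone = p_iphone_given_mac * p_mac                       # part (a)
p_iphone = p_mac_and_iphone + p_iphone_given_other * (1 - p_mac)    # part (b)
p_other_given_iphone = p_iphone_given_other * (1 - p_mac) / p_iphone  # part (c)

# Exercise 2: E[X] and Var(X) from the PMF, then the linear transformation
# rules E[aX + b] = a E[X] + b and Var(aX + b) = a^2 Var(X).
pmf = {-1: 0.25, 0: 0.25, 1: 0.5}
mean_x = sum(x * p for x, p in pmf.items())
var_x = sum(x**2 * p for x, p in pmf.items()) - mean_x**2
mean_y = 5 * mean_x + 3
var_y = 5**2 * var_x

# Exercise 3: number of paperbacks modeled as Binomial(n = 10, p = 0.6),
# assuming members choose independently.
def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_members, p_paperback = 10, 0.6
expected_paperbacks = n_members * p_paperback
p_fewer_than_9 = 1 - binom_pmf(9, n_members, p_paperback) - binom_pmf(10, n_members, p_paperback)

# Exercise 4: normal-based confidence interval (assumption: SD of 4 treated as sigma).
def normal_ci(xbar, sd, n, level):
    """Return (lower, upper) for xbar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level = 0.95
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

ci95 = normal_ci(56, 4, 50, 0.95)
ci99 = normal_ci(56, 4, 50, 0.99)

print(f"Ex 1: {p_mac_and_iphone:.3f}, {p_iphone:.3f}, {p_other_given_iphone:.3f}")
print(f"Ex 2: E = {mean_y}, Var = {var_y}")
print(f"Ex 3: E = {expected_paperbacks}, P(X < 9) = {p_fewer_than_9:.4f}")
print(f"Ex 4: 95% CI = ({ci95[0]:.2f}, {ci95[1]:.2f}), 99% CI = ({ci99[0]:.2f}, {ci99[1]:.2f})")
```

The 99% interval comes out wider than the 95% interval because a higher confidence level requires a larger z multiplier. If the 4 grams is a sample standard deviation rather than a known σ, a t-based interval would be a reasonable alternative; with n = 50 the difference is small.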