STAT 139 Week 3
Unit 2: Probability and Random Variables
Unit 3: Confidence Intervals and Hypothesis Testing
TF: Nicole Pashley (npashley@g.harvard.edu)

Probability Review
• $P(A) \ge 0$ for every event $A$, and $P(S) = 1$
• Complement rule: $P(A^c) = 1 - P(A)$
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• If $A$ and $B$ are disjoint (mutually exclusive) events, then $P(A \cup B) = P(A) + P(B)$
• Conditional probability: the probability of one event occurring given that we know the outcome of another event. For $P(B) > 0$, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, so $P(A \cap B) = P(A \mid B)\,P(B)$
• Law of Total Probability: for a partition $A_1, \ldots, A_n$ of $S$, $P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$
• Bayes' Rule: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
• Two events $A$ and $B$ are independent if and only if knowing that one event occurs does not change the probability that the other event occurs. That is, $P(A \cap B) = P(A)\,P(B)$, or equivalently $P(A \mid B) = P(A)$ when $P(B) > 0$.

Exercise 1: 70% of students at a college use Macs, while the other 30% use other computer brands (assume each student has exactly one computer). 80% of the Mac users also have iPhones, while 40% of the non-Mac users have iPhones.
a) What is the probability that a randomly selected student is a Mac user and has an iPhone?
b) Now we randomly pick another student. What is the probability that they have an iPhone?
c) What is the probability that a randomly picked student does not use a Mac, given that they use an iPhone?

Random variables: PMFs, PDFs, and CDFs
• Probability mass function (PMF) of a discrete r.v. $X$: the function $f$ such that for every real number $x$, $f(x) = P(X = x)$
• Cumulative distribution function (CDF) of a r.v. $X$: the function $F_X$ given by $F_X(x) = P(X \le x)$
• Probability density function (PDF) of a continuous r.v. $X$: the derivative $f$ of the CDF, given by $f(x) = F'(x)$

Means (Expected value) and Variances of r.v.s
• Expectation:
$$E[X] = \begin{cases} \sum_{\text{all } x} x\,P(X = x) & \text{for a discrete r.v.} \\ \int_{-\infty}^{\infty} x\,f(x)\,dx & \text{for a continuous r.v.} \end{cases}$$
• LOTUS (expectation of a function of $X$):
$$E[g(X)] = \begin{cases} \sum_{\text{all } x} g(x)\,P(X = x) & \text{for a discrete r.v.} \\ \int_{-\infty}^{\infty} g(x)\,f(x)\,dx & \text{for a continuous r.v.} \end{cases}$$
• Variance ($\sigma_X^2$): $\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = E[X^2] - \mu^2$, where $\mu = E[X]$
• Standard deviation: $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$

Exercise 2: We have a random variable $X$ with
$$X = \begin{cases} -1 & \text{with probability } 0.25 \\ 0 & \text{with probability } 0.25 \\ 1 & \text{with probability } 0.5 \end{cases}$$
Find $E[5X + 3]$ and $\mathrm{Var}(5X + 3)$.

Normal and Binomial distributions
Exercise 3: Each person in a book club with 10 members must purchase To Kill a Mockingbird to read. Each person buys a paperback copy with probability 60% and a hardcover copy with probability 40% (and no one buys more than one copy). What is the expected total number of paperback copies of To Kill a Mockingbird purchased by members of the book club? What is the probability that fewer than 9 paperbacks are purchased by the club?

Sample Statistics: $\bar{X}$ and $S^2$ as r.v.s
• Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$; population mean: $\mu = E[X_i]$
• Sampling distribution of a statistic: the reference distribution that arises from the chance mechanism used to select a random sample from a population
• Law of Large Numbers: as $n$ grows, $\bar{X}$ converges to $\mu$. For i.i.d. $X_i$ with variance $\sigma^2$, $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$, so $\bar{X}$ concentrates around $\mu$ as $n$ increases.
• Central Limit Theorem: for i.i.d. $X_i$ with finite variance, the sample mean $\bar{X}$ (and the sum $\sum_{i=1}^{n} X_i$) is approximately normally distributed when $n$ is large, no matter what the distribution of the individual $X_i$ is.

Statistical Inference
Two major types of inference:
• Estimation (confidence intervals)
• Making decisions (hypothesis tests)

Confidence Intervals
• Confidence interval: provides a range of plausible values for the true parameter (the estimand) based on the data collected
• General form: point estimate ± margin of error
• Margin of error: the half-width of the interval, which measures how precise our estimate is
• Correct interpretation of a 95% CI: if we repeatedly draw samples of size $n$ and calculate a 95% CI from each sample, then in the long run about 95% of those CIs will cover the unknown population mean $\mu$
• 95% confidence interval for the mean when $n > 30$: $\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$

Exercise 4: Suppose you are overseeing the operations of a candy manufacturing
company. You want to build a confidence interval for the average weight of a certain type of candy bar. You take a sample of 50 candy bars and find that the sample mean of the candy bar weights was 56 grams. The sample standard deviation was 4 grams. Assume that the candy bar weights are approximately normally distributed. What is a 95% confidence interval for the average candy bar weight? What is a 99% confidence interval? Give an interpretation of these intervals.
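As a check on Exercise 1, the multiplication rule, the Law of Total Probability, and Bayes' Rule from the Probability Review can be coded directly. This is a minimal sketch; the helper names `total_probability` and `bayes` are hypothetical, and `fractions.Fraction` is used only so the answers come out as exact fractions instead of rounded decimals.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i) over a partition A_1..A_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(prior, likelihood, evidence):
    """Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B)."""
    return likelihood * prior / evidence

# Partition of students: A_1 = "uses a Mac", A_2 = "does not use a Mac"
p_mac = Fraction(7, 10)
p_not_mac = 1 - p_mac                     # complement rule
p_iphone_given_mac = Fraction(8, 10)
p_iphone_given_not_mac = Fraction(4, 10)

# (a) multiplication rule: P(Mac and iPhone) = P(iPhone | Mac) P(Mac)
p_mac_and_iphone = p_iphone_given_mac * p_mac

# (b) Law of Total Probability over the Mac / non-Mac partition
p_iphone = total_probability(
    [p_mac, p_not_mac], [p_iphone_given_mac, p_iphone_given_not_mac]
)

# (c) Bayes' Rule: P(not Mac | iPhone)
p_not_mac_given_iphone = bayes(p_not_mac, p_iphone_given_not_mac, p_iphone)

print(p_mac_and_iphone, p_iphone, p_not_mac_given_iphone)
```

Running it prints `14/25 17/25 3/17`, that is, 0.56, 0.68, and about 0.176.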
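For Exercise 2, the discrete form of LOTUS and the identity $\mathrm{Var}(Y) = E[Y^2] - (E[Y])^2$ can be evaluated mechanically from the PMF. The sketch below assumes the PMF is stored as a Python dict from values to probabilities; `expect` and `variance` are hypothetical helper names, not part of any library.

```python
from fractions import Fraction

# PMF of X from Exercise 2, stored as {value: probability}
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}

def expect(pmf, g=lambda x: x):
    """LOTUS for a discrete r.v.: E[g(X)] = sum over x of g(x) P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var(g(X)) = E[g(X)^2] - (E[g(X)])^2, both computed with LOTUS."""
    mean = expect(pmf, g)
    return expect(pmf, lambda x: g(x) ** 2) - mean ** 2

def y(x):
    """The linear transformation Y = 5X + 3 asked about in the exercise."""
    return 5 * x + 3

print(expect(pmf, y), variance(pmf, y))
```

It prints `17/4 275/16`, which agrees with the shortcuts $E[5X+3] = 5E[X] + 3$ and $\mathrm{Var}(5X+3) = 25\,\mathrm{Var}(X)$ using $E[X] = 1/4$ and $\mathrm{Var}(X) = 11/16$.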
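Exercise 3 fits the Binomial model if we assume (the exercise does not say this explicitly) that the 10 members decide independently, so the number of paperbacks is $X \sim \mathrm{Bin}(10, 0.6)$. This sketch uses only `math.comb` from the standard library; `binom_pmf` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumption: the 10 members choose independently, each buying a paperback w.p. 0.6
n, p = 10, 0.6

expected_paperbacks = n * p  # E[X] = np

# P(X < 9) = P(X <= 8), summing the PMF over k = 0, ..., 8
p_fewer_than_9 = sum(binom_pmf(k, n, p) for k in range(9))

print(expected_paperbacks, round(p_fewer_than_9, 4))
```

It prints `6.0 0.9536`. Summing only the two excluded terms gives the same answer faster: $P(X < 9) = 1 - P(X = 9) - P(X = 10)$.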
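The Sample Statistics bullets can be illustrated by simulation. This is a sketch under assumptions chosen only for illustration: the population is Exponential with rate 1 (so $\mu = 1$, $\sigma^2 = 1$, and the distribution is clearly skewed), each sample has size $n = 50$, and the seed `139` is arbitrary. It checks that the simulated $\bar{X}$ values have mean near $\mu$, variance near $\sigma^2/n = 0.02$, and, as the CLT suggests, that roughly 95% of them fall within $1.96\,\sigma/\sqrt{n}$ of $\mu$.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(rate=1); return each sample mean."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(139)   # fixed seed so the run is reproducible
n, reps = 50, 20_000
mu, sigma2 = 1.0, 1.0      # mean and variance of Exponential(rate=1)

xbars = simulate_sample_means(n, reps, rng)

mean_of_xbar = statistics.fmean(xbars)    # should be close to mu
var_of_xbar = statistics.variance(xbars)  # should be close to sigma2 / n = 0.02

# CLT check: if Xbar is approximately N(mu, sigma2 / n), about 95% of the
# simulated sample means should lie within 1.96 standard errors of mu.
se = (sigma2 / n) ** 0.5
share_within = sum(abs(x - mu) <= 1.96 * se for x in xbars) / reps

print(round(mean_of_xbar, 3), round(var_of_xbar, 4), round(share_within, 3))
```

The exact printed values depend on the seed; with another seed the three numbers should still land close to 1, 0.02, and 0.95. The share is only approximately 0.95 because $\bar{X}$ is still slightly skewed at $n = 50$.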
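For Exercise 4, the large-sample interval from the Confidence Intervals bullets can be computed with `statistics.NormalDist` from the Python standard library, which also supplies the 99% critical value (about 2.576). The sketch plugs the sample standard deviation in for $\sigma$, as the $n > 30$ rule permits; `z_interval` is a hypothetical helper name. The second half checks the repeated-sampling interpretation by simulation, where the "true" population $N(56, 4^2)$ and the seed are assumptions made only for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def z_interval(xbar, sd, n, level):
    """Large-sample CI for a mean: xbar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Exercise 4 summary statistics: n = 50, sample mean 56 g, sample SD 4 g
ci95 = z_interval(56, 4, 50, 0.95)
ci99 = z_interval(56, 4, 50, 0.99)
print([round(b, 2) for b in ci95], [round(b, 2) for b in ci99])

# Coverage check of the repeated-sampling interpretation.
# Assumed (hypothetical) population for the simulation: N(mu = 56, sigma = 4).
rng = random.Random(2024)
mu, sigma, n, reps = 56.0, 4.0, 50, 10_000
covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(fmean(sample), stdev(sample), n, 0.95)
    if lo <= mu <= hi:
        covered += 1
coverage = covered / reps
print(round(coverage, 3))
```

The intervals print as `[54.89, 57.11] [54.54, 57.46]`; the 99% interval is wider because it must capture $\mu$ in a larger share of repeated samples. Because $s$ replaces $\sigma$, a $t$ interval with 49 degrees of freedom (critical value about 2.01 at 95%) would be slightly wider, and the simulated coverage of the $z$ interval typically comes out a little under 0.95.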