Lecture 3
Properties of Summary Statistics: Sampling Distribution

Main Theme
How can we use math to justify that our numerical summaries from the sample are good summaries of the population?

Lecture Summary
• Today, we focus on two summary statistics of the sample and study their theoretical properties
  – Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  – Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
• They aim to give us an idea about the population mean and the population variance (i.e. the parameters)
• First, we'll study, on average, how well our statistics do in estimating the parameters
• Second, we'll study the distribution of the summary statistics, known as their sampling distributions

Setup
• Let $X_1, \dots, X_n \in \mathbb{R}^p$ be i.i.d. samples from the population $F_\theta$
• $F_\theta$: distribution of the population (e.g. normal) with features/parameters $\theta$
  – Often the distribution AND the parameters are unknown.
  – That's why we sample from it to get an idea of them!
• i.i.d.: independent and identically distributed
  – Every time we sample, we redraw $n$ members from the population and obtain $(X_1, \dots, X_n)$. This provides a representative sample of the population.
• $\mathbb{R}^p$: $p$ is the dimension of $X_i$
  – For simplicity, we'll consider univariate cases (i.e. $p = 1$)

Loss Function
• How "good" are our numerical summaries (i.e. statistics) at capturing features of the population (i.e. parameters)?
• Loss Function: measures how good the statistic is as an estimator of the parameter
  – 0 loss: the statistic is a perfect estimator of the parameter
  – ∞ loss: the statistic is a terrible estimator of the parameter
• Example: $L(T, \theta) = (T - \theta)^2$, where $T$ is the statistic and $\theta$ is the parameter. Called squared-error loss

Sadly…
• It is impossible to compute the value of the loss function
• Why? We don't know what the parameter is! (since it's an unknown feature of the population and we're trying to study it!)
• More importantly, the statistic is random! It changes every time we take a different sample from the population. Thus, the value of our loss function changes from sample to sample

A Remedy
• A more manageable question: on average, how good is our statistic at estimating the parameter?
• Risk (i.e. expected loss): the average loss incurred over repeated sampling,
  $R(\theta) = E[L(T, \theta)]$
• Risk is a function of the parameter
• For squared-error loss, we have $R(\theta) = E[(T - \theta)^2]$

Bias-Variance Trade-Off: Squared-Error Risk
• After some algebra, we obtain another expression for the squared-error risk:
  $R(\theta) = E[(T - \theta)^2] = (E[T] - \theta)^2 + E[(T - E[T])^2] = \mathrm{Bias}(T)^2 + \mathrm{Var}(T)$
• Bias: on average, how far the statistic is from the parameter (i.e. accuracy),
  $\mathrm{Bias}(T) = E[T] - \theta$
• Variance: how variable the statistic is (i.e. precision)
• In estimation, there is always a bias-variance tradeoff!

Sample Mean, Bias, Variance, and Risk
• Let $T(X_1, \dots, X_n) = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. We want to see how well the sample mean estimates the population mean. We'll use squared-error loss.
• $\mathrm{Bias}(\bar{X}) = 0$, i.e. the sample mean is unbiased for the population mean
  – Interpretation: on average, the sample mean will be close to the population mean, $\mu$
• $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
  – Interpretation: the sample mean is precise up to order $\frac{1}{n}$. That is, we decrease the variability of our estimate of the population mean by a factor of $\frac{1}{n}$
• Thus, $R(\mu) = \frac{\sigma^2}{n}$
  – Interpretation: on average, the squared distance between the sample mean and the population mean is $\frac{\sigma^2}{n}$, and this holds for every value of the population mean $\mu$
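A quick way to see these three facts in action is a small simulation. The sketch below is not part of the original slides; it assumes Python with NumPy and uses arbitrary illustrative values ($\mu = 5$, $\sigma = 2$, $n = 25$, 100,000 replications). It repeatedly draws samples of size $n$ from a normal population and checks that the sample mean is approximately unbiased, that its variance is close to $\sigma^2/n$, and that its squared-error risk matches $\sigma^2/n$.

```python
import numpy as np

# Simulation check (illustrative values, not from the lecture): bias, variance,
# and squared-error risk of the sample mean under repeated sampling.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 100_000    # arbitrary illustrative choices

samples = rng.normal(mu, sigma, size=(reps, n))   # `reps` independent samples of size n
xbars = samples.mean(axis=1)                      # one sample mean per replication

print("estimated Bias(X-bar):          ", xbars.mean() - mu)        # should be near 0
print("estimated Var(X-bar):           ", xbars.var(ddof=1))        # should be near sigma^2 / n
print("theoretical sigma^2 / n:        ", sigma**2 / n)
print("estimated risk E[(X-bar - mu)^2]:", np.mean((xbars - mu)**2))
```

Because each replication draws a fresh sample, the averages over replications approximate the expectations that define bias, variance, and risk.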
Sample Variance and Bias
• Let $T(X_1, \dots, X_n) = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. We want to see how well the sample variance estimates the population variance. We'll use squared-error loss.
• $\mathrm{Bias}(S^2) = 0$, i.e. the sample variance is unbiased for the population variance
  – Interpretation: on average, the sample variance will be close to the population variance, $\sigma^2$
• $\mathrm{Var}(S^2) = ?$, $R(\sigma^2) = ?$
  – Depends on assumptions about the fourth moments.

In Summary…
• We studied how good, on average, our statistic is at estimating the parameter
• The sample mean, $\bar{X}$, and the sample variance, $S^2$, are unbiased for the population mean, $\mu$, and the population variance, $\sigma^2$, respectively

But…
• What about the distribution of our summary statistics?
• So far, we have only studied the statistics' "average" behavior.
• Sampling distribution!

Sampling Distribution when F is Normal
Case 1 (Sample Mean): Suppose $F$ is a normal distribution with mean $\mu$ and variance $\sigma^2$ (denoted $N(\mu, \sigma^2)$). Then $\bar{X}$ is distributed as
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
Proof: Use the fact that $X_i \sim N(\mu, \sigma^2)$, i.i.d.

Sampling Distribution Example
• Suppose you want to know the probability that your sample mean, $\bar{X}$, is within $\epsilon > 0$ of the population mean.
• We assume $\sigma$ is known and $X_i \sim N(\mu, \sigma^2)$, i.i.d. Then
$P\!\left(|\bar{X} - \mu| \le \epsilon\right) = P\!\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le \frac{\epsilon}{\sigma/\sqrt{n}}\right) = P\!\left(|Z| \le \frac{\epsilon\sqrt{n}}{\sigma}\right)$, where $Z \sim N(0,1)$.

Sampling Distribution when F is Normal
Case 1 (Sample Variance): Suppose $F$ is a normal distribution with mean $\mu$ and variance $\sigma^2$ (denoted $N(\mu, \sigma^2)$). Then $\frac{(n-1)S^2}{\sigma^2}$ is distributed as
$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2_{n-1}$
where $\chi^2_{n-1}$ is the chi-square distribution with $n-1$ degrees of freedom.

Some Preliminaries from Stat 430
• Fact 0: $(\bar{X}, X_1 - \bar{X}, \dots, X_n - \bar{X})$ is jointly normal
  Proof: Because $\bar{X}$ and $X_i - \bar{X}$ are linear combinations of normal random variables, they must be jointly normal.
• Fact 1: For any $i = 1, \dots, n$, we have $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0$
  – Since $\bar{X}$ and $X_i - \bar{X}$ are jointly normal, the zero covariance between them implies that $\bar{X}$ and $X_i - \bar{X}$ are independent
  – Furthermore, because $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is a function of the $X_i - \bar{X}$ only, $S^2$ is independent of $\bar{X}$
• Fact 2: If $W = Y + Z$ with $W \sim \chi^2_{a+b}$, $Y \sim \chi^2_{a}$, and $Y$ and $Z$ independent, then $Z \sim \chi^2_{b}$
  Proof: Use moment generating functions
• Now, we can prove the result for the sample variance. (see blackboard)
$W = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^{n} \frac{\left((X_i - \bar{X}) + (\bar{X} - \mu)\right)^2}{\sigma^2} = \underbrace{\frac{(n-1)S^2}{\sigma^2}}_{Z} + \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{Y}$
with $W \sim \chi^2_{n}$ and $Y \sim \chi^2_{1}$. Since $Y$ and $Z$ are independent, Fact 2 gives $Z = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.

Sampling Distribution when F is not Normal
Case 2: Suppose $F$ is an arbitrary distribution with mean $\mu$ and variance $\sigma^2$ (denoted $F(\mu, \sigma^2)$). Then, as $n \to \infty$,
$\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \to N(0,1)$ in distribution
Proof: Use the fact that $X_i \sim F(\mu, \sigma^2)$, i.i.d., and apply the central limit theorem

Properties
• As the sample size increases, $\bar{X}$ stays unbiased
  – In particular, it is asymptotically unbiased
• As the sample size increases, $\bar{X}$ approaches the normal distribution at a rate $\frac{1}{\sqrt{n}}$
  – Even without an infinite sample size, using $N\!\left(\mu, \frac{\sigma^2}{n}\right)$ as an approximation to the distribution of $\bar{X}$ is meaningful for large samples
  – How large? General rule of thumb: $n \ge 30$

Example
• Suppose $X_i \sim \mathrm{Exp}(\lambda)$, i.i.d.
  – Remember, $E[X_i] = \frac{1}{\lambda}$ and $\mathrm{Var}(X_i) = \frac{1}{\lambda^2}$
• Then, for large enough $n$, $\bar{X} \approx N\!\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)$

An Experiment

Lecture Summary
• Risk: the average loss/mistake the statistic makes in estimating the population parameter
  – Bias-variance tradeoff for squared-error loss
  – $\bar{X}$ and $S^2$ are unbiased for the population mean and the population variance
• Sampling distribution: the distribution of the statistic used to estimate the population parameter
  – If the population is normally distributed:
    $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$ and $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
  – If the population is not normally distributed:
    $\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \to N(0,1)$, or $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for large $n$
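In the spirit of the "An Experiment" slide, here is a minimal simulation sketch (not part of the original lecture) that assumes Python with NumPy and arbitrary illustrative values for $\mu$, $\sigma$, $\lambda$, and $n$. It checks numerically that $\frac{(n-1)S^2}{\sigma^2}$ has the mean ($n-1$) and variance ($2(n-1)$) of a $\chi^2_{n-1}$ distribution when the population is normal, and that standardized sample means of $\mathrm{Exp}(\lambda)$ data behave approximately like $N(0,1)$ once $n$ is around 30.

```python
import numpy as np

# Simulation sketch (illustrative values, not from the lecture) for the two
# sampling-distribution results in the summary above.
rng = np.random.default_rng(1)
reps, n = 100_000, 30

# 1) Normal population: (n-1) S^2 / sigma^2 should behave like chi-square_{n-1}
mu, sigma = 0.0, 3.0
normal_samples = rng.normal(mu, sigma, size=(reps, n))
scaled_s2 = (n - 1) * normal_samples.var(axis=1, ddof=1) / sigma**2
print("mean of (n-1)S^2/sigma^2:", scaled_s2.mean(), " [chi^2_{n-1} mean =", n - 1, "]")
print("var  of (n-1)S^2/sigma^2:", scaled_s2.var(ddof=1), " [chi^2_{n-1} var =", 2 * (n - 1), "]")

# 2) Non-normal population: Exp(lambda) sample means, standardized, should be
#    approximately N(0, 1) for moderately large n (rule of thumb: n >= 30)
lam = 2.0
exp_samples = rng.exponential(scale=1 / lam, size=(reps, n))
z = (exp_samples.mean(axis=1) - 1 / lam) * np.sqrt(n) * lam   # sqrt(n)(X-bar - mu)/sigma
print("simulated P(Z <= 1.96):", np.mean(z <= 1.96), " [Phi(1.96) is about 0.975]")
```

Increasing $n$ should make both comparisons tighter, which is exactly the asymptotic statement in Case 2.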