Topic 4: Statistical Inference

Outline
• Statistical inference
  – confidence intervals
  – significance tests
• Statistical inference for β1
• Statistical inference for β0
• Tower of Pisa example

Theory for Statistical Inference
• Xi iid Normal(μ, σ²), parameters unknown
  $\bar{X} = \frac{\sum X_i}{n}$
  $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$, $s = \sqrt{s^2}$
  $s(\bar{X}) = \frac{s}{\sqrt{n}}$

Theory for Statistical Inference
• Consider the variable $t = \frac{\bar{X} - \mu}{s(\bar{X})}$
• t is distributed as t(n-1)
• Use this distribution in inference for μ
  – confidence intervals
  – significance tests

Confidence Intervals
  $\bar{X} \pm t_c \, s(\bar{X})$
where tc = t(1-α/2, n-1), the upper (1-α/2)100 percentile of the t distribution with n-1 degrees of freedom
• 1-α is the confidence level

Confidence Intervals
• $\bar{X}$ is the sample mean (center of interval)
• $s(\bar{X})$ is the estimated standard deviation of $\bar{X}$, sometimes called the standard error of the mean
• $t_c \, s(\bar{X})$ is the margin of error and describes the precision of the estimate

Confidence Intervals
• Procedure such that (1-α)100% of the time, the true mean will be contained in the interval
• We do not know whether a single interval is one that contains the mean or not
• Confidence describes the “long-run” behavior of the procedure
• If the data are non-Normal, the procedure is only approximate (central limit theorem)

Significance tests
  $H_0: \mu = \mu_0$ vs $H_a: \mu \ne \mu_0$
  $t^* = \frac{\bar{X} - \mu_0}{s(\bar{X})}$
  Reject $H_0$ if $|t^*| \ge t_c$, where $t_c = t(1 - \alpha/2, n - 1)$
  $P = \mathrm{Prob}(|t| > |t^*|)$, where $t \sim t(n - 1)$

Significance tests
• Under H0, t* will have distribution t(n-1)
• P(reject H0 | H0 true) = α (Type I error)
• Under Ha, t* will have a noncentral t(n-1) distribution
• P(do not reject H0 | Ha true) = β (Type II error)
• Type II error is related to the power of the test
NOTE: IN THIS COURSE USE α=.05 UNLESS SPECIFIED OTHERWISE

Theory for β1 Inference
  $b_1 \sim N(\beta_1, \sigma^2(b_1))$, where $\sigma^2(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$
  $t^* = \frac{b_1 - \beta_1}{s(b_1)}$, where $s(b_1) = \sqrt{\frac{s^2}{\sum (X_i - \bar{X})^2}}$
  Under $H_0$, $t^* \sim t(n - 2)$

Confidence Interval for β1
  b1 ± tc s(b1)
where tc = t(1-α/2, n-2), the upper (1-α/2)100 percentile of the t distribution with n-2 degrees of freedom
• 1-α is the confidence level

Significance tests for β1
  $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$
  $t^* = \frac{b_1 - 0}{s(b_1)}$
  Reject $H_0$ if $|t^*| \ge t_c$, where $t_c = t(1 - \alpha/2, n - 2)$
  $P = \mathrm{Prob}(|t| > |t^*|)$, where $t \sim t(n - 2)$

Theory for β0 Inference
  $b_0 \sim N(\beta_0, \sigma^2(b_0))$, where $\sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$
  $t^* = \frac{b_0 - \beta_0}{s(b_0)}$
  For $s(b_0)$, replace $\sigma^2$ by $s^2$ and take the square root
  Under $H_0$, $t^* \sim t(n - 2)$

Confidence Interval for β0
  b0 ± tc s(b0)
where tc = t(1-α/2, n-2), the upper (1-α/2)100 percentile of the t distribution with n-2 degrees of freedom
• 1-α is the confidence level

Significance tests for β0
  $H_0: \beta_0 = 0$ vs $H_a: \beta_0 \ne 0$
  $t^* = \frac{b_0 - 0}{s(b_0)}$
  Reject $H_0$ if $|t^*| \ge t_c$, where $t_c = t(1 - \alpha/2, n - 2)$
  $P = \mathrm{Prob}(|t| > |t^*|)$, where $t \sim t(n - 2)$

Notes
• The normality of b0 and b1 follows from the fact that each of these is a linear combination of the Yi, each of which is an independent normal
• For b1 see KNNL p42
• For b0 try this as an exercise

Notes
• Usually the CI and significance test for β0 are not of interest
• If the εi are not normal but are relatively symmetric, then the CIs and significance tests are reasonable approximations

Notes
• These procedures can easily be modified to produce one-sided confidence intervals and significance tests
• Because $\sigma^2(b_1) = \sigma^2 / \sum (X_i - \bar{X})^2$, we can make this quantity small by making $\sum_{i=1}^{n} (X_i - \bar{X})^2$ large

SAS Proc Reg
  proc reg data=a1;
    model lean=year/clb;
  run;
The clb option generates confidence intervals

Parameter Estimates
  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
  Intercept    1            -61.12088         25.12982     -2.43     0.0333   -116.43124    -5.81052
  year         1              9.31868          0.30991     30.07     <.0001      8.63656    10.00080
The CIs are given in the last two columns. The CI for the intercept is uninteresting.

Review
• What is the default value of α that we will use in this class?
• What is the default confidence level that we use in this class?
• Suppose you could choose the X’s. How would you choose them if you wanted a precise estimate of the slope? intercept? both?

Background Reading
• Chapter 2
  – 2.3 : Considerations
• Chapter 16
  – 16.10 : Planning sample sizes with power
• Appendix A.6
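
Worked check of the Proc Reg output
The interval and test formulas for β0 and β1 can be checked against the Parameter Estimates table from the Tower of Pisa example. The following is a minimal Python sketch, not part of the course's SAS code. The function names (t_pdf, t_cdf, t_quantile, t_inference) are hypothetical, the t distribution is evaluated by simple numerical integration rather than a statistics library, and df = 11 (that is, n = 13) is an assumption: the slides do not state n, and this value was chosen because it is consistent with the printed confidence limits.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile t(p, df) by bisection; intended for p in (0.5, 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_inference(estimate, std_error, df, null_value=0.0, alpha=0.05):
    """Two-sided t test and (1 - alpha) confidence interval for one coefficient."""
    t_star = (estimate - null_value) / std_error
    t_c = t_quantile(1 - alpha / 2, df)
    margin = t_c * std_error
    # Clamp at zero: the numerical CDF can overshoot 1 by a tiny amount.
    p_value = max(0.0, 2 * (1 - t_cdf(abs(t_star), df)))
    return {
        "t_star": t_star,
        "t_c": t_c,
        "ci": (estimate - margin, estimate + margin),
        "p_value": p_value,
        "reject": abs(t_star) >= t_c,
    }


# Estimates and standard errors copied from the Parameter Estimates table.
DF_ERROR = 11  # assumption: n - 2 with n = 13; not stated in the slides
intercept = t_inference(-61.12088, 25.12982, DF_ERROR)
slope = t_inference(9.31868, 0.30991, DF_ERROR)

for name, res in (("Intercept", intercept), ("year", slope)):
    print(name, round(res["t_star"], 2), round(res["p_value"], 4),
          tuple(round(v, 5) for v in res["ci"]), res["reject"])
```

Under the df assumption above, the printed t values, P-values, and confidence limits should agree with the SAS table up to rounding, which ties the clb columns back to the formula b ± tc s(b).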