* Inference for Proportions (Ch. 18-22, BVD)

* When you calculate a statistic from a sample, you often do so to estimate a population parameter. The sample result is a point estimate of the true value in the population.
* We realize the population parameter is probably not exactly equal to our sample result, so we want to say how accurate the estimate is.
* We know from Ch. 18 that sampling distributions for means and proportions follow a Normal model centered at the true population parameter.
* If we "reach out" a certain number of standard errors (the margin of error) from the sample statistic, we can say how close the estimate tends to be to the true population value in repeated random sampling.
* A confidence level (C%) gives the success rate of "capturing" the true parameter in an interval of that size if the process were repeated many times (although usually we only do it once).
* Confidence level: To say that we are 95% confident is shorthand for "95% of all possible samples of a given size from this population will result in an interval that captures the true population parameter."
* Confidence interval: To interpret a C% confidence interval for an unknown parameter, say, "We are C% confident that the interval from __________ to __________ captures the actual value of the (population parameter in context)."

* One-proportion z-interval:
* p-hat +/- z* sqrt(p-hat * q-hat / n)
* In general: sample statistic +/- ME
* ME = the number of standard errors you reach out from the statistic.
* 1-PropZInt on the calculator. (A Python sketch of this interval, the sample-size calculation, and the matching test appears after these notes.)

* Finding z* (the critical value):
* Draw or imagine a Normal model with C% shaded, symmetric about the center.
* What percent is left in the two tails?
* What percentile is the upper or lower boundary of the shaded region at?
* Look up that percentile in the z-table to read off z* (or use invNorm with the area to the left of that boundary, e.g., invNorm(0.975) for 95% confidence).
* It's good to memorize the most common ones, like 95% => z* = 1.96.

* Choosing a sample size:
* ME = z* (SE). Plug in the desired ME (e.g., "within 3%" means ME = 0.03). Plug in z* for the desired level of confidence. Plug in sample p-hat and q-hat if they exist, or use 0.5 for both. Solve the equation for n. See pages 374-375 for an example.

* For all inference for proportions, check:
* 1. Is it plausible that the sample/experiment results are independent of one another?
* 2. Random sampling/assignment?
* 3. Is the sample less than 10% of the population?
* 4. Success/Failure: are np and nq both at least 10?
* Success/Failure: if doing an interval, use p-hat and q-hat; if doing a hypothesis test, use p0 and q0.

* One-proportion z-test:
* Null: p equals the hypothesized value p0 (no hats!).
* Alternate: p is not equal to, greater than, or less than p0.
* Hypothesized model: centered at p0, with standard deviation sqrt(p0 * q0 / n).
* Find the z-score of the sample value.
* Use the table or normalcdf to find the area of the shaded region (double it for a two-tailed test).
* 1-PropZTest on the calculator; report z and the p-value.

* When stating which test/interval you'll be doing, and/or writing hypotheses, define the variables used in those hypotheses so there is no confusion.
* Beware of writing down whatever your calculator spits out without defining/explaining what those numbers represent.

* Two-tailed alternate: use it if all you want to know is whether there is evidence that the null isn't true.
* One-tailed alternate: use it if you want to know whether there is evidence that the null value is too high or too low.

* Use the given alpha level (significance level) or choose one.
* If the p-value is below alpha, reject the null. If the p-value is above alpha, fail to reject the null.
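* A minimal Python sketch of the interval, sample-size, and test calculations above (not part of the course materials; it assumes scipy is installed, and the counts x and n are made-up numbers for illustration):

    # One-proportion z-interval, sample-size calculation, and one-proportion z-test.
    from math import sqrt, ceil
    from scipy.stats import norm

    x, n = 110, 200            # hypothetical: 110 successes in 200 trials
    p_hat = x / n
    q_hat = 1 - p_hat

    # Confidence interval: p-hat +/- z* sqrt(p-hat*q-hat/n)
    conf = 0.95
    z_star = norm.ppf(1 - (1 - conf) / 2)   # same idea as invNorm(0.975) -> about 1.96
    se = sqrt(p_hat * q_hat / n)
    me = z_star * se
    print(f"{conf:.0%} CI: ({p_hat - me:.4f}, {p_hat + me:.4f})")

    # Sample size for a desired margin of error: solve ME = z* sqrt(p*q/n) for n
    desired_me = 0.03                        # "within 3%"
    p_guess = 0.5                            # use 0.5 if no prior p-hat exists
    n_needed = ceil(z_star**2 * p_guess * (1 - p_guess) / desired_me**2)
    print("n needed:", n_needed)

    # One-proportion z-test of H0: p = p0 (two-tailed)
    p0 = 0.5
    sd0 = sqrt(p0 * (1 - p0) / n)            # hypothesized model's standard deviation
    z = (p_hat - p0) / sd0
    p_value = 2 * norm.sf(abs(z))            # double the tail area for a two-tailed test
    print(f"z = {z:.3f}, p-value = {p_value:.4f}")

* The printed z and p-value correspond to what 1-PropZTest reports, and the interval matches 1-PropZInt.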
* "The p-value of ________ is above/below our alpha level of ________, therefore we fail to reject/reject the null hypothesis and conclude (what the null says in context / what the alternate says in context)."

* If you choose alpha = 0.05, then 1 - alpha gives the confidence level for a two-tailed test (a 95% CI) that would lead to the same conclusions.
* If you're doing a one-tailed test, all of alpha sits in one tail. The corresponding CI is shortened: instead of 1 - alpha, it is 1 - 2*alpha.
* For a one-tailed test with alpha = 0.05, the CI that would give the same conclusions is 1 - 0.1, or 90%.

* Type I error: rejecting the null when it is really true.
* Type II error: failing to reject a null that is really false.
* Probability of a Type I error = alpha.
* Probability of a Type II error is represented by beta, but there is a different beta for each specific alternate value you might want to detect. You will not have to calculate beta.
* The power of a test is 1 - beta: the probability of correctly detecting that a null is false.
* If you increase alpha (i.e., lower the confidence level C), you increase the probability of a Type I error, increase the power of the test, and decrease the probability of a Type II error.
* Increasing the sample size can decrease the probability of both types of error while still increasing power.
* In reality, you almost never know what types of errors have occurred, if any. You analyze which types of errors would be the most damaging, and choose your sample size, alpha level, confidence level, and so on accordingly. Make a truth table to help you analyze the risks.

* Changes for two proportions (see the Python sketch after this list):
* Check whether the two groups are independent of each other, not just whether individual sample data are independent.
* Check Success/Failure for both groups.
* SE formula for the CI is sqrt(p1*q1/n1 + p2*q2/n2).
* CI = sample difference +/- ME.
* The null is usually p1 - p2 = 0.
* The SE formula for the hypothesis test uses the pooled p-total and q-total instead of p1, q1, p2, q2.
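* A minimal Python sketch of the two-proportion formulas above (again not from the course; scipy is assumed available, and the success counts and sample sizes are made-up numbers):

    # Two-proportion z-interval (unpooled SE) and z-test (pooled SE).
    from math import sqrt
    from scipy.stats import norm

    x1, n1 = 60, 150     # hypothetical group 1: 60 successes out of 150
    x2, n2 = 45, 140     # hypothetical group 2: 45 successes out of 140
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    diff = p1 - p2

    # CI: (p1 - p2) +/- z* sqrt(p1*q1/n1 + p2*q2/n2)
    z_star = norm.ppf(0.975)                     # 95% confidence
    se_ci = sqrt(p1 * q1 / n1 + p2 * q2 / n2)    # unpooled SE for the interval
    me = z_star * se_ci
    print(f"95% CI for p1 - p2: ({diff - me:.4f}, {diff + me:.4f})")

    # Test of H0: p1 - p2 = 0, using the pooled ("total") proportion in the SE
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se_test = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = diff / se_test
    p_value = 2 * norm.sf(abs(z))                # two-tailed p-value
    print(f"z = {z:.3f}, p-value = {p_value:.4f}")

* Note the design choice the notes describe: the interval uses each group's own p and q in the SE, while the test uses the pooled proportion because the null assumes the two proportions are equal.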