AP Stat Summary of Using Inference Procedures

For all situations below:
Confidence Interval: statistic ± (critical value)(standard deviation of statistic)
Standardized Test Statistic: (statistic − parameter) / (standard deviation of statistic)

INFERENCE FOR PROPORTIONS

1.) One Proportion Z-Interval - used when you have one sample, the word "proportion" is mentioned, and you want to estimate the true proportion.
Conditions:
1.) Plausible Independence Condition - this requires that you know something about how the data were collected.
2.) Randomization Condition - were proper randomization techniques used to collect the data?
3.) 10% Condition - unless the sample size is less than 10% of the population, the normal model may not be appropriate.
4.) Success/Failure Condition - we must have at least 10 "successes" (np̂ ≥ 10) and at least 10 "failures" (n(1 − p̂) ≥ 10).
Formula: p̂ ± z*·√( p̂(1 − p̂)/n )

2.) One Proportion Z-Test - used when you have one sample, the word "proportion" is mentioned, and you want to compare the sample proportion to a hypothesized proportion.
Conditions: same as above for the One Proportion Z-Interval, except that the success/failure condition is checked using the hypothesized proportion: np₀ ≥ 10 and n(1 − p₀) ≥ 10.
Formula: z = (p̂ − p₀) / √( p₀(1 − p₀)/n )
Null: p = p₀
Alt: p ≠ p₀, p < p₀, or p > p₀
*For both procedures above use the normal model (or chart) to obtain the p-value.

3.) Two Proportion Z-Interval - used when you have two samples, the word "proportion" is mentioned, and you want to estimate the true difference between the two proportions.
Conditions:
1.) Plausible Independence Condition - it is important to be certain that the two sample groups are independent of one another. If the samples are NOT independent, this procedure is inappropriate.
2.) Randomization Condition - were proper randomization techniques used to collect the data?
3.) 10% Condition - unless the sample size of each sample is less than 10% of its respective population, the normal model may not be appropriate.
4.) Success/Failure Condition - we must have at least 10 "successes" (np̂ ≥ 10) and at least 10 "failures" (n(1 − p̂) ≥ 10) from each sample. Note: some statisticians feel that as long as each of these counts is more than 5, the success/failure condition is met.
Formula: (p̂₁ − p̂₂) ± z*·√( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )

4.) Two Proportion Z-Test - used when you have two samples, the word "proportion" is mentioned, and you want to compare the two proportions.
Conditions: same as above for the Two Proportion Z-Interval.
Formula: z = (p̂₁ − p̂₂) / √( p̂_c(1 − p̂_c)(1/n₁ + 1/n₂) ), where p̂_c = (x₁ + x₂)/(n₁ + n₂)
Null: p₁ = p₂
Alt: p₁ ≠ p₂, p₁ < p₂, or p₁ > p₂
*For both procedures above use the normal model (or chart) to obtain the p-value.

INFERENCE FOR MEANS

5.) One Sample t-Interval (for means) - used when you have one sample and the word "average" or "mean" is mentioned, or you have lists of quantitative data, and you want to estimate the true mean.
Conditions:
1.) Randomization Condition - the data come from a random sample or randomized experiment.
2.) 10% Condition - the sample size is less than 10% of the population.
3.) Nearly Normal Condition - the data come from a unimodal, symmetric, bell-shaped distribution. This can be verified by constructing a histogram or a normal probability plot of the data.
Formula: x̄ ± t*·(s/√n), where t* has n − 1 degrees of freedom
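The proportion formulas above can be sketched in Python using only the standard library. The z* value (e.g., 1.96 for 95% confidence) still comes from the normal table, and the two-sided p-value uses the normal CDF via math.erf. The function names and example counts below are illustrative, not from the original.

```python
import math

def normal_p_value(z):
    # Two-sided p-value from the standard normal model: 2 * P(Z > |z|)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def one_prop_z_interval(x, n, z_star=1.96):
    # p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def one_prop_z_test(x, n, p0):
    # z = (p-hat - p0) / sqrt(p0(1 - p0)/n); note the null value p0 in the SE
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_prop_z_test(x1, n1, x2, n2):
    # Pooled proportion p-hat_c = (x1 + x2)/(n1 + n2) goes into the SE
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

For example, 60 successes in 100 trials tested against p₀ = 0.5 gives z = 2.0 and a two-sided p-value near 0.0455.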
6.) One Sample t-Test (for means) - used when you have one sample and the word "average" or "mean" is mentioned, or you have lists of quantitative data, and you are comparing the sample mean to a hypothesized mean.
Conditions: same as above for the One Sample t-Interval (for means).
Formula: t = (x̄ − μ₀) / (s/√n), with n − 1 degrees of freedom
Null: μ = μ₀
Alt: μ ≠ μ₀, μ < μ₀, or μ > μ₀
*For each of the above use the t-distribution (chart) with n − 1 degrees of freedom. (Note: if you are given the population standard deviation, use σ instead of s in the formulas above.)

7.) Two Sample t-Interval (for means) - used when you have two samples and the word "average" or "mean" is mentioned, or you have lists of quantitative data, and you are interested in estimating the true difference between the means.
Conditions:
1.) Independence Condition - the data from each sample should be independent. There is really no way to check this, but you should think about whether it is reasonable.
2.) Randomization Condition - the data come from random samples or randomized experiments.
3.) 10% Condition - each sample size is less than 10% of its population.
4.) Nearly Normal Condition - the data come from a unimodal, symmetric, bell-shaped distribution. This can be verified by constructing a histogram or a normal probability plot. You must check this for both samples.
Formula: (x̄₁ − x̄₂) ± t*·√( s₁²/n₁ + s₂²/n₂ ), where the degrees of freedom are no less than n − 1 (where n is the smaller sample size) and no more than n₁ + n₂ − 2. (Note: you will probably compute an interval such as this on the calculator; just record the df value given on the screen after the test.)

8.) Two Sample t-Test (for means) - used when you have two samples and the word "average" or "mean" is mentioned, or you have lists of quantitative data, and you are comparing their means.
Conditions: same as above for the Two Sample t-Interval (for means).
Formula: t = ( (x̄₁ − x̄₂) − (μ₁ − μ₂) ) / √( s₁²/n₁ + s₂²/n₂ )
(Note: the exact degrees of freedom formula is not needed for the exam; again, you can get df from your calculator when you run this test.)
Null: μ₁ = μ₂ (equivalently, μ₁ − μ₂ = 0)
Alt: μ₁ ≠ μ₂, μ₁ < μ₂, or μ₁ > μ₂
**If for some reason the variances are known to be equal, the denominator of the formula above changes to the pooled standard error on your formula sheet (the third formula under section 1); it pools the samples because of the equal variances.**
*For each of the above use the t-distribution (chart) with the correct degrees of freedom.

9.) Special Case Two Sample t-Test or Confidence Interval (Matched Pairs) - used when the data are matched in pairs and you are looking at the mean of the differences.
Conditions:
1.) Paired Data Condition - the data must be paired, and you need a justifiable way to establish the pairing.
The rest of the conditions are the same as the conditions for the one-sample procedures for means. The procedures themselves are also the same: perform them on the differences between the two samples.

CHI-SQUARE TESTS

10.) Chi-Square Goodness of Fit Test - used when comparing a sample distribution to a hypothesized population distribution.
Conditions:
1.) Counted Data Condition - make sure the sample data are listed as counts.
2.) Randomization Condition - need random cases from the population of interest.
3.) Expected Cell Frequency Condition - there are at least 5 cases in each expected cell. (Some texts relax this: no more than 20% of the expected counts are less than 5 and no expected count is less than 1.)
Null: the distribution of _____ is the same as _____.
Alt: the distribution of _____ is different from _____.
Degrees of freedom = number of categories − 1.
Expected counts are calculated by multiplying each percentage from the hypothesized (population or old) distribution by the sample size.
Formula: χ² = Σ over all cells of (O − E)²/E
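Under the same assumptions, the t statistics and the chi-square goodness-of-fit statistic are easy to compute with Python's standard library; t* and the p-values still come from the charts (the t and chi-square CDFs are not in the stdlib). The function names and sample data below are illustrative, not from the original.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    # t = (x-bar - mu0) / (s / sqrt(n)), df = n - 1
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, n - 1

def two_sample_t(d1, d2):
    # t = (x1-bar - x2-bar) / sqrt(s1^2/n1 + s2^2/n2), for a null difference of 0
    se = math.sqrt(stdev(d1) ** 2 / len(d1) + stdev(d2) ** 2 / len(d2))
    return (mean(d1) - mean(d2)) / se

def chi_square_gof(observed, proportions):
    # Expected count = hypothesized proportion * sample size;
    # chi2 = sum of (O - E)^2 / E, df = number of categories - 1
    n = sum(observed)
    expected = [p * n for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1
```

For example, observed counts [30, 20, 50] tested against hypothesized proportions [0.25, 0.25, 0.5] give χ² = 2.0 with df = 2.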
11.) Chi-Square Test of Homogeneity - used when you have two or more groups and you want to know if the category proportions are the same for each group.
Expected cell count = (row total)(column total) / (table total)
Degrees of freedom = (# of rows − 1)(# of columns − 1)
Conditions:
1.) Counted Data Condition - make sure the sample data are listed as counts.
2.) Randomization Condition - only needed if generalizing to a larger population.
3.) Expected Cell Frequency Condition - expected cell counts must be at least 5.
The formula for the test statistic is the same as it is above for the goodness-of-fit test.
Null: the distribution is the same for each group.
Alt: the distribution is not the same for each group.

12.) Chi-Square Test for Independence - used to determine if, in a single population, there is an association between two categorical variables (usually data presented in a two-way table).
Expected counts and degrees of freedom are calculated the same as they are for homogeneity. Assumptions and conditions are the same as they are for homogeneity. The formula for the test statistic is the same as for the goodness-of-fit test.
Null: Variable 1 and Variable 2 are independent.
Alt: Variable 1 and Variable 2 are not independent.
**For all the chi-square tests, look up the test statistic on the chi-square distribution (chart) or use the chi-square cdf function on your calculator.

INFERENCE FOR REGRESSION

13.) Confidence Interval for the Slope of a Regression Line - used when looking for an estimate of the true slope of a regression line.
Conditions:
1.) Linearity Condition - the scatterplot must look linear.
2.) Randomization Condition
3.) Equal-Variance Condition - the residual plot should not have a pattern.
4.) Normality Condition - a normal probability plot of the residuals should be linear.
Standard error about the line: s = √( Σ resid² / (n − 2) )
Standard error of the slope: SE_b = s / √( Σ(x − x̄)² )
Degrees of freedom = n − 2
Formula: b ± t*·SE_b, where t* has n − 2 degrees of freedom

14.) Linear Regression t-Test for Slope - used when you want to check for a linear relationship between two variables.
Conditions are the same as above for the interval. Formulas for the standard errors and degrees of freedom are also the same.
Formula: t = b / SE_b
Null: there is no linear relationship between variable A and variable B (β = 0).
Alt: there is a linear relationship between variable A and variable B (β ≠ 0, β < 0, or β > 0).
**For regression inference techniques use the t-distribution.**
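A minimal sketch of the two-way-table chi-square statistic and the regression-slope quantities, again with only the standard library; the least-squares slope is computed from scratch, and t* or the chi-square p-value still comes from a chart or calculator. Function names and data are illustrative assumptions, not from the original.

```python
import math
from statistics import mean

def chi_square_two_way(table):
    # Expected count = (row total)(column total) / (table total);
    # chi2 = sum of (O - E)^2 / E, df = (rows - 1)(cols - 1)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            chi2 += (obs - exp) ** 2 / exp
    return chi2, (len(table) - 1) * (len(table[0]) - 1)

def slope_inference(xs, ys):
    # Least-squares slope b, then s = sqrt(sum resid^2 / (n - 2)),
    # SE_b = s / sqrt(sum (x - x-bar)^2), t = b / SE_b, df = n - 2
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar                      # intercept
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(resid_ss / (n - 2))
    se_b = s / math.sqrt(sxx)
    return b, se_b, b / se_b, n - 2
```

For example, the table [[10, 20], [20, 10]] (equal row and column totals, expected count 15 in every cell) gives χ² = 100/15 ≈ 6.67 with df = 1.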