AP Stat Summary of Using Inference Procedures

For all situations below:

Confidence Interval:
statistic ± (critical value)(standard deviation of statistic)

Standardized Test Statistic:
(statistic − parameter) / (standard deviation of statistic)
INFERENCE PROPORTIONS
1.) One Proportion Z-interval-used when you have one sample and the word “proportion” is
mentioned and you want to estimate the true proportion.
Conditions:
1.) Plausible Independence Condition-This requires that you know
something about the data.
2.) Randomization Condition-Were proper randomization techniques used to
collect the data?
3.) 10% Condition- Unless the sample size is less than 10% of the
population the normal model may not be appropriate.
4.) Success/Failure Condition-We must have at least 10 “successes”
(np ≥ 10) and 10 “failures” (n(1 − p) ≥ 10).

Formula: p̂ ± z*√( p̂(1 − p̂) / n )
2.) One Proportion Z-test-used when you have one sample and the word “proportion” is
mentioned and you want to compare the sample proportion to an old proportion.
Conditions:
Same as above for the One Proportion Z-interval, except that the
success/failure condition is checked using the hypothesized proportion p₀.
Formula: z = (p̂ − p₀) / √( p₀(1 − p₀) / n )
Null: p = p₀
Alt: p > p₀, p < p₀, or p ≠ p₀
*For both procedures above use the normal model (or chart) to obtain the p-value.
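A minimal sketch of the test, with made-up numbers (the null value 0.5 and the counts are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical: H0: p = 0.5 vs Ha: p != 0.5, with 412 successes in n = 1000
x, n, p0 = 412, 1000, 0.5
p_hat = x / n

# Per the formula, the standard deviation uses the null value p0, not p-hat
sd = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd

# Two-sided p-value from the normal model
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```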
3.) Two Proportion Z-interval-used when you have two samples and the word “proportion”
is mentioned and you want to estimate the true difference between the two proportions.
Conditions:
1.) Plausible Independence Condition-It is important to be certain that the
two sample groups are independent of one another. If the samples are
NOT independent, this procedure is inappropriate.
2.) Randomization Condition-Were proper randomization techniques used to
collect the data?
3.) 10% Condition- Unless the sample size of each sample is less than 10%
of its respective population the normal model may not be appropriate.
4.) Success/Failure Condition-We must have at least 10 “successes”
(np ≥ 10) and 10 “failures” (n(1 − p) ≥ 10) from each sample.
Note: Some statisticians feel that as long as each of the above
computations is more than 5, the success/failure condition is met.
Formula: (p̂₁ − p̂₂) ± z*√( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )
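A sketch of the two-proportion interval with hypothetical counts (all numbers below are invented for illustration):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts for two independent samples (made up)
x1, n1 = 120, 300
x2, n2 = 90, 300
p1, p2 = x1 / n1, x2 / n2

z_star = NormalDist().inv_cdf(0.975)  # 95% confidence
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
lower, upper = diff - z_star * se, diff + z_star * se
```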
4.) Two Proportion Z-Test- used when you have two samples and the word “proportion” is
mentioned and you want to compare the difference between the two proportions.
Conditions:
1.) Plausible Independence Condition-It is important to be certain that the
two sample groups are independent of one another. If the samples are
NOT independent, this procedure is inappropriate.
2.) Randomization Condition-Were proper randomization techniques used to
collect the data?
3.) 10% Condition- Unless the sample size of each sample is less than 10%
of its respective population the normal model may not be appropriate.
4.) Success/Failure Condition-We must have at least 10 “successes”
(np ≥ 10) and 10 “failures” (n(1 − p) ≥ 10) for each sample.
Note: Some statisticians feel that as long as each of the above
computations is more than 5, the success/failure condition is met.
Formula: z = (p̂₁ − p̂₂) / √( p̂c(1 − p̂c)(1/n₁ + 1/n₂) ),
where p̂c = (x₁ + x₂) / (n₁ + n₂)
Null: p₁ = p₂
Alt: p₁ > p₂, p₁ < p₂, or p₁ ≠ p₂
*For both procedures above use the normal model (or chart) to obtain the p-value.
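The pooled test can be sketched the same way (the counts below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical: H0: p1 = p2 with made-up counts from two independent samples
x1, n1 = 120, 300
x2, n2 = 90, 300
p1, p2 = x1 / n1, x2 / n2

p_c = (x1 + x2) / (n1 + n2)  # pooled proportion, used only in the test
sd = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (p1 - p2) / sd

p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
```

Pooling is appropriate here because the null hypothesis assumes the two proportions are equal, so the samples are combined to estimate that common proportion.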
INFERENCE for MEANS
5.) One Sample t-interval(for means)-used when you have one sample and the word
“average” or “mean” is mentioned or you have lists of quantitative data and you want to
estimate the true mean.
Conditions:
1.) Randomization Condition-the data come from a random sample or
randomized experiment.
2.) 10% Condition-the sample size is less than 10% of the population.
3.) Nearly normal condition-The data come from a unimodal, symmetric,
bell-shaped distribution. This can be verified by constructing a histogram
or a normal probability plot of the data.
Formula: x̄ ± t*( s / √n ), with n − 1 degrees of freedom
6.) One Sample t-Test (for means)- used when you have one sample and the word
“average” or “mean” is mentioned or you have lists of quantitative data and you are comparing
the sample mean to the population mean.
Conditions: Same as the above for the One Sample t-interval (for means).
Formula: t = (x̄ − μ₀) / ( s / √n ), with n − 1 degrees of freedom
Null: μ = μ₀
Alt: μ > μ₀, μ < μ₀, or μ ≠ μ₀
*For each of the above use the t-distribution (chart) for n-1 degrees of freedom.
(Note: If you are given the population standard deviation, use σ instead of s in the above
formulas.)
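A sketch of the one-sample t-test, using the same kind of made-up data and a t-table critical value rather than a computed p-value:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical: test H0: mu = 10.0 with made-up data (n = 10, df = 9)
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 9.9]
n = len(data)
x_bar, s = mean(data), stdev(data)

mu0 = 10.0
t = (x_bar - mu0) / (s / sqrt(n))

# Compare |t| to the t-table value for df = 9 (2.262 at the 5% two-sided level)
reject_at_5pct = abs(t) > 2.262
```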
7.) Two Sample t-interval(for means)- used when you have two samples and the word
“average” or “mean” is mentioned or you have lists of quantitative data and you are interested
in estimating the true mean difference.
Conditions:
1.) Independence Condition-The data from each sample should be
independent. There is really no way to check this but you should think
about whether it is reasonable.
2.) Randomization Condition-the data come from random samples or
randomized experiments.
3.) 10% Condition-each sample size is less than 10% of its population.
4.) Nearly normal condition-The data come from a unimodal, symmetric,
bell-shaped distribution. This can be verified by constructing a histogram
or a normal probability plot of the data. You must check this for both
samples.
Formula: (x̄₁ − x̄₂) ± t*√( s₁²/n₁ + s₂²/n₂ ), where the degrees of freedom are no less than n − 1
(where n is the smaller sample size) and no more than n₁ + n₂ − 2. (Note: You will
probably do an interval such as this using the calculator. Just record the value given
on the screen after the test.)
8.) Two Sample t-Test (for means)- used when you have two samples and the word
“average” or “mean” is mentioned or you have lists of quantitative data and you are comparing
their means or mean difference.
Conditions: Same as the above for the Two Sample t-interval (for means).
Formula: t = ( (x̄₁ − x̄₂) − (μ₁ − μ₂) ) / √( s₁²/n₁ + s₂²/n₂ )
(Note: the degrees of freedom formula for this is not needed for the exam, but again
you can get df from your calculator when you run this test.)
Null: μ₁ = μ₂
Alt: μ₁ > μ₂, μ₁ < μ₂, or μ₁ ≠ μ₂
** If for some reason the variances are equal and you are using a two sample t-test
(for means), the denominator of the formula above changes to the ugly mess on your
formula sheet. The third formula under section 1 is the one you would use as your
denominator; it is used for pooling the samples due to equal variances.**
*For each of the above use the t-distribution (chart) for the correct degrees of
freedom.
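A sketch of the unpooled test statistic, with the same made-up summary statistics:

```python
from math import sqrt

# Hypothetical: H0: mu1 = mu2 with made-up summary statistics
x_bar1, s1, n1 = 10.5, 1.2, 10
x_bar2, s2, n2 = 9.8, 1.5, 12

# Under H0, mu1 - mu2 = 0, so the numerator reduces to the difference in means
t = (x_bar1 - x_bar2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Conservative comparison: t-table value for df = 9 at the 5% two-sided level
reject_at_5pct = abs(t) > 2.262
```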
9.) Special Case Two Sample t-Test or Confidence Interval (Matched Pairs Test for
Comparing Differences or Confidence Intervals)-used when the data are matched in pairs
and you are looking for a comparison of the mean differences.
Conditions:
1.) Paired Data Condition-The data must be paired. (Note: You need a
justifiable way to do this.)
The rest of the conditions for this are the same as the conditions for the
one sample procedures for means.
The procedures themselves are also the same. Perform them on the
differences between the two samples.
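The "subtract, then do a one-sample procedure" idea can be sketched like this (all paired values are invented):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same six subjects (made up)
before = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4]
after = [11.4, 11.0, 12.9, 12.1, 11.6, 12.0]

# Work with the differences, then run a one-sample t procedure on them
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)

t = (d_bar - 0) / (s_d / sqrt(n))  # H0: mean difference is 0; df = n - 1 = 5
```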
CHI-SQUARE TESTS
10.) Chi-Square Goodness of Fit Test-used when comparing a sample distribution to a
hypothesized population distribution.
Conditions:
1.) Counted Data Condition-make sure the sample data are listed in counts.
2.) Randomization-Need random cases from a population of interest.
3.) Expected Cell-Frequency-There are at least 5 cases in each expected
cell. No more than 20% of the expected counts are less than 5 and no
expected counts are less than 1.
Null: The distribution of _____ is the same as ________.
Alt: The distribution of _____ is different than _______.
Degrees of freedom= number of categories-1.
Expected Counts are calculated by multiplying % from the population distribution or the old
distribution by the sample size.
Formula: χ² = Σ (O − E)² / E, summed over all cells, with the degrees of freedom above
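A sketch of the expected-count and test-statistic arithmetic (the observed counts and claimed percentages are made up):

```python
# Hypothetical: 100 observations vs a claimed 25% / 50% / 25% split (made up)
observed = [30, 50, 20]
claimed = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [p * n for p in claimed]  # expected count = claimed % times sample size

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of categories minus 1
```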
11.) Chi-Square Test of Homogeneity-used when you have more than 2 groups and you
want to know if the category proportions are the same for each group.
Expected Cell Counts = (row total)(column total) / (table total)
Degrees of freedom = (# of rows − 1)(# of columns − 1)
Conditions:
1.) Counted Data Condition-make sure the sample data are listed in counts.
2.) Randomization-Only needed if generalizing to a larger population.
3.) Expected Cell-Frequency-Expected cell counts must be at least 5.
Formula for the Test Statistic is the same as it is above for GOF.
Null: The distribution is the same for each group.
Alt: The distribution is not the same for each group.
12.) Chi-Square Test for Independence-Used to determine if, in a single population, there
is an association between two categorical variables. (Usually data presented in a 2-way table)
Expected counts and degrees of freedom are calculated the same as they are for homogeneity.
Assumptions and Conditions are the same as they are for homogeneity.
Formula for the test statistic is the same as GOF.
Null: Variable 1 and Variable 2 are independent.
Alt: Variable 1 and Variable 2 are not independent.
**For all the chi-square tests you should look up the test statistic on the chi-square distribution
chart or use the chi-square cdf function on your calculator.
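The expected-count and statistic arithmetic for a two-way table (used by both homogeneity and independence) can be sketched with a hypothetical 2×2 table:

```python
# Hypothetical 2x2 table of counts (made up)
table = [[20, 30],
         [30, 20]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

# Expected cell count = (row total)(column total) / (table total)
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(len(table)) for j in range(len(table[0])))
df = (len(table) - 1) * (len(table[0]) - 1)
```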
INFERENCE FOR REGRESSION
13.) Confidence Interval for the Slope of a Regression Line-used when looking for an
estimate of the true slope of a regression line.
Conditions:
1.) Linearity Condition-Scatterplot must look linear.
2.) Randomization
3.) Equal-Variance Condition-Residual plot should not have a pattern.
4.) Normality Condition-Normal Probability Plot of the residuals should be
linear.
The Standard Error About the Line Formula is: s = √( Σ resid² / (n − 2) )
The Standard Error About the Slope Formula is: SE_b = s / √( Σ(x − x̄)² )
Degrees of Freedom = n − 2
Formula: b ± t* SE_b, with n − 2 degrees of freedom
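Worked end to end with invented (x, y) data, using a t-table value for df = 3 (3.182 at 95% confidence):

```python
from math import sqrt
from statistics import mean

# Hypothetical (x, y) data (made up)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx  # slope
a = y_bar - b * x_bar                                             # intercept

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # standard error about the line
se_b = s / sqrt(sxx)                                # standard error of the slope

t_star = 3.182  # t* for 95% confidence with df = n - 2 = 3, from a t-table
lower, upper = b - t_star * se_b, b + t_star * se_b
```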
14.) Linear Regression t-test for Slope-used when you want to check for a linear
relationship between two variables.
Conditions are the same as the above for the interval.
Formulas for standard errors and degrees of freedom are also the same.
Formula: t = b / SE_b
Null: There is no linear relationship between variable A and variable B. (β = 0)
Alt: There is a linear relationship between variable A and variable B. (β > 0, β < 0, or β ≠ 0)
**For regression inference techniques use the t-distribution.**
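A sketch of the slope test statistic, recomputed from the same kind of made-up data (compare |t| to a t-table with n − 2 degrees of freedom):

```python
from math import sqrt
from statistics import mean

# Hypothetical: H0: beta = 0 (no linear relationship) for made-up data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx  # slope

# Residuals about the least-squares line, then the two standard errors
residuals = [y - ((y_bar - b * x_bar) + b * x) for x, y in zip(xs, ys)]
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
se_b = s / sqrt(sxx)

t = b / se_b  # compare to the t-distribution with n - 2 = 3 degrees of freedom
```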