STA 101 Final Review

Statistics 101
Thomas Leininger
June 24, 2013
Announcements
All work (besides projects) should be returned to you and should
be entered on Sakai.
Office Hour: 2–3pm today (Old Chem 114)
Final Exam: 9am–12pm HERE
Bring a calculator–no cell phones, laptops, tablets, etc.
Allowed one 8½ × 11 inch cheat sheet with notes on both sides.
You must create this yourself.
Statistics 101 (Thomas Leininger)
STA 101 Final Review
June 24, 2013
2 / 19
Topics for today
different conditions and hypotheses in each test (and a list of tests) — when to
use Z, T, F stats, 2 vs 1 sample tests, degrees of freedom
pooled variance, pooled proportion (when to use)
confidence intervals – is it always two-sided?, how to interpret CI of difference
b/w 2 means
chi-square – what we are testing, when to use, how to approach hypotheses
ANOVA (filling in the chart)
Regression – interpreting linear lines and writing them, correlation, residuals
Type I, II error
Bayesian probability – won’t be on there (but cond’l probability might)
MLR – won’t be on there
Review
What you need to know about HTs:
Format for answering a hypothesis test question:
1. State the null and alternative hypotheses
2. Check conditions
3. Calculate the test statistic (T, Z, etc.) and standard error (if needed)
4. Calculate the p-value (double if two-sided hypothesis)
5. Reject or fail to reject the null hypothesis
6. Interpret your decision in context of the problem
Know how to interpret a p-value in context
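The steps above can be sketched in code. This is only an illustration: the numbers (sample mean 52, s = 8, n = 64, null value 50) are made up, and it assumes the conditions have been checked and n is large enough to use a Z statistic.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the course data).
x_bar, s, n = 52.0, 8.0, 64
mu_0 = 50.0          # Step 1: H0: mu = 50, HA: mu != 50
alpha = 0.05

# Step 2 (conditions) is assumed to hold: random sample, n >= 30, no strong skew.

# Step 3: standard error and test statistic.
se = s / sqrt(n)
z = (x_bar - mu_0) / se

# Step 4: p-value, doubled because HA is two-sided.
one_tail = 1 - NormalDist().cdf(abs(z))
p_value = 2 * one_tail

# Step 5: decision.
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# Step 6 (interpretation in context) still has to be written in words.
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

With n < 30 the same steps apply, but the p-value would come from a t-distribution with n − 1 degrees of freedom instead.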
What you need to know about CIs:
Format for answering a confidence interval question:
1. Check conditions
2. Find and state the critical value ($z^\star$, $t^\star_{df}$)
3. Calculate the standard error
4. Calculate the confidence interval
5. Interpret your confidence interval in context of the problem
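As a rough sketch of the confidence interval steps, the snippet below builds a 95% interval for a mean. The summary statistics are hypothetical, and it assumes a large sample so that $z^\star$ can be used as the critical value.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the course data).
x_bar, s, n = 52.0, 8.0, 64
conf_level = 0.95

# Step 2: critical value z* that leaves (1 - conf_level) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

# Step 3: standard error.
se = s / sqrt(n)

# Step 4: point estimate +/- margin of error.
margin = z_star * se
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"z* = {z_star:.3f}, CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Step 5 would then read something like "we are 95% confident that the population mean is between the two endpoints", stated in the context of the problem.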
Different types of tests
General conditions for HTs/CIs with means or proportions:
Independence: random samples (< 10% of population sampled)
nearly normal data: either we know the population is normal or
we have to use CLT
Conditions for CLT:
sample looks nearly normal, no large skew, no major outliers. If
not met, should use randomization
sample size ≥ 30. Else, you should use a t-distribution.
One sample test for mean
Conditions for one sample test:
Independence: random samples (< 10% of population sampled)
sample looks nearly normal, no large skew, no major outliers. If not met, should use
randomization
sample size ≥ 30. Else, you should use a t-distribution.
Note: if paired data, use differences as a one sample test
Conditions for two sample test:
Independence: random samples (< 10% of population sampled)
sample looks nearly normal, no large skew, no major outliers. If not met, should use
randomization
both sample sizes ≥ 30. Else, you should use a t-distribution (if either or both are smaller
than 30).
Recap - inference for one proportion
Population parameter: p, point estimate: p̂
Conditions:
independence
- random sample
at least 10 successes and failures
- if not → randomization
Standard error: $SE = \sqrt{\frac{p(1-p)}{n}}$
for CI: use p̂
for HT: use p0
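A small sketch of the difference between the two standard errors. The counts (60 successes out of 100) and the null value p0 = 0.5 are made up for illustration.

```python
from math import sqrt

# Hypothetical data: 60 successes in a random sample of 100.
successes, n = 60, 100
p_hat = successes / n
p_0 = 0.5  # null value, used only for the hypothesis test

def se_proportion(p, n):
    """Standard error of a sample proportion, sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# Confidence interval: plug in the observed proportion p_hat.
se_ci = se_proportion(p_hat, n)

# Hypothesis test: plug in the null value p_0, since H0 is assumed true.
se_ht = se_proportion(p_0, n)
z = (p_hat - p_0) / se_ht

# Success-failure condition, checked with the same p used for the SE.
ci_condition_ok = successes >= 10 and (n - successes) >= 10
ht_condition_ok = n * p_0 >= 10 and n * (1 - p_0) >= 10

print(f"SE for CI = {se_ci:.4f}, SE for HT = {se_ht:.4f}, z = {z:.2f}")
```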
Recap - comparing two proportions
Population parameter: (p1 − p2 ), point estimate: (p̂1 − p̂2 )
Conditions:
independence within groups
- random sample and 10% condition met for both groups
independence between groups
at least 10 successes and failures in each group
- if not → randomization
$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
for CI: use p̂1 and p̂2
for HT:
when H0 : p1 = p2 : use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
when H0 : p1 − p2 = (some value other than 0): use p̂1 and p̂2
- this is pretty rare
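For the common case H0 : p1 = p2, the test could be sketched as follows. The counts are hypothetical, and the sketch assumes the independence and success-failure conditions have already been checked.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts for two independent random samples.
suc_1, n_1 = 40, 100
suc_2, n_2 = 30, 100

p_hat_1 = suc_1 / n_1
p_hat_2 = suc_2 / n_2

# Pooled proportion: total successes over total sample size.
p_pool = (suc_1 + suc_2) / (n_1 + n_2)

# Under H0: p1 = p2, the pooled value replaces both p1 and p2 in the SE.
se_pooled = sqrt(p_pool * (1 - p_pool) / n_1 + p_pool * (1 - p_pool) / n_2)

z = (p_hat_1 - p_hat_2) / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided HA: p1 != p2

print(f"p_pool = {p_pool:.2f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

For a confidence interval the pooled value would not be used; the SE would be built from p̂1 and p̂2 separately.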
Reference - standard error calculations
             one sample                        two samples
mean         $SE = \frac{s}{\sqrt{n}}$         $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion   $SE = \sqrt{\frac{p(1-p)}{n}}$    $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
When working with means, it’s very rare that σ is known, so we
usually use s.
When working with proportions,
if doing a hypothesis test, p comes from the null hypothesis
if constructing a confidence interval, use p̂ instead
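The two standard errors for means can be written as short helper functions. The function names and the example numbers are illustrative only.

```python
from math import sqrt

def se_one_mean(s, n):
    """SE of a sample mean: s / sqrt(n), with s standing in for sigma."""
    return s / sqrt(n)

def se_two_means(s_1, n_1, s_2, n_2):
    """SE of a difference of two sample means (unpooled)."""
    return sqrt(s_1**2 / n_1 + s_2**2 / n_2)

# Hypothetical summary statistics.
se_single = se_one_mean(s=10, n=25)
se_diff = se_two_means(s_1=3, n_1=9, s_2=4, n_2=16)

print(f"one sample: {se_single:.3f}, two samples: {se_diff:.3f}")
```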
When to use pooled standard error
For a two sample HT/CI for a difference of means, we can pool our
information about the variance if s1 and s2 can be assumed to be
roughly equal
If we can do this, then we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where
$s_{pool}^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$
The degrees of freedom for the t-distribution are now df = n1 + n2 − 2
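A minimal sketch of the pooled calculation, using made-up sample sizes and standard deviations. In practice pooling is only reasonable when the two standard deviations look roughly equal; the numbers here are chosen for easy arithmetic rather than realism.

```python
from math import sqrt

# Hypothetical summary statistics for two groups.
s_1, n_1 = 2.0, 11
s_2, n_2 = 4.0, 21

# Pooled variance: weighted average of the two sample variances,
# with weights equal to each group's degrees of freedom.
s2_pool = (s_1**2 * (n_1 - 1) + s_2**2 * (n_2 - 1)) / (n_1 + n_2 - 2)

# The pooled variance replaces both s_1^2 and s_2^2 in the two-sample SE.
se_pooled = sqrt(s2_pool / n_1 + s2_pool / n_2)

# Degrees of freedom for the t-distribution.
df = n_1 + n_2 - 2

print(f"pooled variance = {s2_pool}, SE = {se_pooled:.3f}, df = {df}")
```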
When to use pooled proportion
When doing a HT for a two sample test for a difference of proportions,
the null hypothesis is that p1 = p2 or p1 − p2 = 0
Since we assume that H0 is true in a HT, we assume ppool = p1 = p2 ,
which means our standard error is
$SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$
We estimate $p_{pool}$ with $\hat{p}_{pool} = \frac{\text{\# of successes}_1 + \text{\# of successes}_2}{n_1 + n_2}$
More on confidence intervals
how to interpret diff b/w two means
ANOVA—filling in chart
Exercise 5.40

Educational attainment
        Less than HS   HS      Jr Coll   Bachelor’s   Graduate   Total
Mean    38.67          39.6    41.39     42.55        40.85      40.45
SD      15.81          14.97   18.1      13.62        15.51      15.17
n       121            546     97        253          155        1,172

            Df      Sum Sq    Mean Sq   F value   Pr(>F)
degree      XXXXX   XXXXX     501.54    XXXXX     0.0682
Residuals   XXXXX   267,382   XXXXX
Total       XXXXX   XXXXX
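One way to fill in the missing entries of the Exercise 5.40 chart is sketched below. It uses only the values given in the chart (five groups, 1,172 observations, Mean Sq for degree, Sum Sq for residuals) and the standard relationships Mean Sq = Sum Sq / Df and F = Mean Sq(group) / Mean Sq(residuals). Variable names are illustrative.

```python
# Values read from the Exercise 5.40 chart.
n_total = 1172        # total sample size
k_groups = 5          # number of educational attainment groups
ms_group = 501.54     # Mean Sq for degree
ss_resid = 267382     # Sum Sq for residuals

# Degrees of freedom.
df_group = k_groups - 1
df_resid = n_total - k_groups
df_total = n_total - 1

# Sum Sq for degree follows from Mean Sq = Sum Sq / Df.
ss_group = ms_group * df_group
ss_total = ss_group + ss_resid

# Mean Sq for residuals and the F statistic.
ms_resid = ss_resid / df_resid
f_value = ms_group / ms_resid

print(f"Df: {df_group}, {df_resid}, {df_total}")
print(f"Sum Sq: {ss_group:.2f}, {ss_resid}, {ss_total:.2f}")
print(f"Mean Sq residuals: {ms_resid:.2f}, F value: {f_value:.3f}")
```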
Chi-square tests
Goodness of Fit: 1 variable
H0 : There is no inconsistency between the observed and the
expected counts.
HA : There is an inconsistency between the observed and the
expected counts.
Test of Independence: 2 variables
H0 : Variable 1 and Variable 2 are independent.
HA : Variable 1 and Variable 2 are not independent.
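For a goodness of fit test, the statistic sums (observed − expected)² / expected over all categories, with df = (number of categories) − 1. The sketch below uses made-up counts and made-up null proportions; the p-value would then come from a chi-square table or software.

```python
# Hypothetical observed counts for one categorical variable with 3 levels.
observed = [20, 30, 50]

# Hypothetical proportions claimed under H0.
null_proportions = [0.25, 0.25, 0.50]

n = sum(observed)
expected = [n * p for p in null_proportions]

# Each expected count should be at least 5 for the chi-square approximation.
condition_ok = all(e >= 5 for e in expected)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(f"chi-square = {chi_sq:.2f} on {df} degrees of freedom")
```

For a test of independence the statistic has the same form, but each expected count is (row total × column total) / table total and df = (rows − 1) × (columns − 1).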
Regression
Exercise 7.28
[Figure: scatterplot of BAC (grams per deciliter) against cans of beer]

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -0.012701    0.012638   -1.005     0.332
bac$Beers    0.017964    0.002402    7.480  2.97e-06

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06
Write the equation of the regression line. Interpret the slope and
intercept in context.
Do the data provide strong evidence that drinking more cans of beer is
associated with an increase in blood alcohol? State the null and
alternative hypotheses, report the p-value, and state your conclusion.
What is $R^2$? Interpret $R^2$ in context. What is the correlation?
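Some of the numbers needed for these questions can be read off or derived from the regression output. The sketch below is one way to check them; it relies on two standard facts, that the slope's t value is estimate / standard error and that in simple linear regression the correlation is the square root of R-squared with the sign of the slope.

```python
from math import sqrt

# Values copied from the Exercise 7.28 regression output.
intercept = -0.012701
slope = 0.017964
se_slope = 0.002402
r_squared = 0.7998

# Equation of the regression line.
equation = f"predicted BAC = {intercept} + {slope} * beers"

# t statistic for H0: slope = 0 (should match the t value column up to rounding).
t_slope = slope / se_slope

# Correlation takes the sign of the slope.
r = sqrt(r_squared) if slope > 0 else -sqrt(r_squared)

print(equation)
print(f"t = {t_slope:.2f}, R^2 = {r_squared}, r = {r:.3f}")
```

The interpretations themselves (slope, intercept, and R-squared in context) still need to be written in words.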
Regression
Conditions for regression
linearity
nearly normal residuals
constant variability (of residuals)
Residuals
Predict BAC content for someone who has had 5 cans of beer:
$\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$
Observed BAC is 0.10
Residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$
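The prediction and residual can be checked with a few lines of code. The coefficients and the observed value are the ones on this slide; the function name is just for illustration.

```python
def predict_bac(beers):
    """Predicted BAC from the fitted line in the Exercise 7.28 output."""
    return -0.012701 + 0.017964 * beers

observed_bac = 0.10
predicted_bac = predict_bac(5)

# Residual = observed - predicted; positive means the line under-predicts.
residual = observed_bac - predicted_bac

print(f"predicted = {predicted_bac:.6f}, residual = {residual:.3f}")
```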
Type I, Type II error
                     Decision
                     fail to reject H0    reject H0
Truth    H0 true     ✓                    Type 1 Error
         HA true     Type 2 Error         ✓
Type 1 error is rejecting H0 when you shouldn’t have, and the
probability of doing so is α (significance level)
Type 2 error is failing to reject H0 when you should have, and the
probability of doing so is β (more complicated to calculate)
Power of a test is the probability of correctly rejecting H0 (that is,
rejecting H0 when HA is true); it equals 1 − β
In hypothesis testing, we want to keep α and β low, but there are
inherent trade-offs.
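The trade-off can be seen numerically. The sketch below computes β and power for a one-sided Z test of a mean under a set of made-up values (null mean 50, true mean 52, σ = 8, n = 64); all of these are assumptions for illustration, and σ is treated as known.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu_0, mu_true, sigma, n, alpha):
    """Power of a test of H0: mu = mu_0 vs HA: mu > mu_0 when mu = mu_true."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: chance the sample mean falls below the cutoff even though HA is true.
    beta = NormalDist(mu_true, se).cdf(cutoff)
    return 1 - beta

# Hypothetical scenario evaluated at two significance levels.
power_05 = power_one_sided_z(mu_0=50, mu_true=52, sigma=8, n=64, alpha=0.05)
power_01 = power_one_sided_z(mu_0=50, mu_true=52, sigma=8, n=64, alpha=0.01)

print(f"alpha = 0.05: power = {power_05:.3f}, beta = {1 - power_05:.3f}")
print(f"alpha = 0.01: power = {power_01:.3f}, beta = {1 - power_01:.3f}")
```

Lowering α from 0.05 to 0.01 makes a Type 1 error less likely but raises β, so power drops.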
Calculating sample sizes
For our CIs, we can calculate the minimum sample size needed to
provide a certain margin of error
For a desired (maximum) ME, call it m, we start with
m ≥ ME = critical value × SE (a function of n)
Solve for n, should always look like n ≥ 123.45
Then round n up to the next whole number (123.45 → 124).
For a two-sample CI, you would need both sample sizes to be at least
this large.
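For a CI for a mean, ME = $z^\star \sigma / \sqrt{n}$, so requiring ME to be at most m gives $n \ge (z^\star \sigma / m)^2$. A minimal sketch, assuming a guessed standard deviation of 15 and a desired margin of error of 2 (both made up):

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_mean(sigma, m, conf_level=0.95):
    """Smallest n with z* x sigma / sqrt(n) <= m, rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_exact = (z_star * sigma / m) ** 2
    return ceil(n_exact)  # always round up to the next whole number

# Hypothetical inputs: guessed standard deviation 15, desired ME of 2.
n_needed = min_sample_size_mean(sigma=15, m=2)

print(f"minimum sample size: {n_needed}")
```

For a proportion the same idea applies with SE = $\sqrt{p(1-p)/n}$, using a prior guess for p.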