Class 09 notes Review of Part 1

Class 09 Exam1 Prep
Things you should know
Exam Details
• 75 minutes
• Allowed
– Any book, the course website and all files linked to it,
your notes, Excel, Word, calculators.
• Not Allowed
– Communication with others (of any kind using any
device)
– Websites other than the course.
• Short answers, lots of partial credit. Some easy.
Some difficult.
Class 01 Jefferson Had Girls
• We used BINOMDIST(X,4,0.5,FALSE) to find the
probabilities of 0,1,2,3,4 girls in four children IF
the coin flip model is true.
• We constructed a table of Observed vs Expected
Counts for the 31,595 Danish Families.
• You should be able to finish the hypothesis test.
• You should be able to do the hypothesis test for a
p other than 0.5.
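The Class 01 calculation can be cross-checked outside Excel. The following is a minimal Python sketch, not the course's own code: `binom_pmf` is a hypothetical helper that mirrors BINOMDIST(X,n,p,FALSE), and the expected counts assume the coin flip model (p = 0.5) and the 31,595 families from the notes. Changing `p` gives the expected counts for a hypothesis other than 0.5.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    like BINOMDIST(x,n,p,FALSE)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n_children = 4
n_families = 31595   # number of Danish families, from the notes
p = 0.5              # coin flip model; change this to test a different p

# Probability of 0, 1, 2, 3, 4 girls, then expected number of families for each.
probs = [binom_pmf(x, n_children, p) for x in range(n_children + 1)]
expected = [n_families * pr for pr in probs]

for girls, (pr, e) in enumerate(zip(probs, expected)):
    print(f"{girls} girls: probability {pr:.4f}, expected families {e:.1f}")
```

The expected counts are what the observed counts get compared against in the hypothesis test.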
Class 02 Prob, Pmfs, Binomial
• The first probability problem
– What if the situation were slightly different?
• How probability works for two either/or events.
– Athlete and Gender, Disease and Test.
– Either build a 2x2 table of counts…or draw a prob
tree…or use formulas.
– What if one of the events had three outcomes?
• Positive, Negative, Inconclusive
• Strain A, Strain B, no disease
• University Athlete, Intramural Athlete, Non-Athlete
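The "build a 2x2 table of counts" approach for Disease and Test can be sketched in a few lines of Python. Every number below (population size, prevalence, test accuracy) is a hypothetical value chosen for illustration and does not come from the course.

```python
# Hypothetical inputs for illustration only; none of these come from the course.
population = 10_000
prevalence = 0.01            # P(disease)
sensitivity = 0.90           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

# Build the 2x2 table of counts: disease status by test result.
diseased = population * prevalence
healthy = population - diseased
true_pos = diseased * sensitivity
false_neg = diseased - true_pos
false_pos = healthy * false_positive_rate
true_neg = healthy - false_pos

# P(disease | positive) = diseased positives / all positives
p_disease_given_pos = true_pos / (true_pos + false_pos)
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")
```

A third outcome (for example Inconclusive) would add one more column to the table, and the same "count in the cell divided by count in the column" logic still applies.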
Class 02 Prob, Pmfs, Binomial
• BINOMDIST(X,n,p,false)
– Applying the rules of probability to calculate the
probability of X successes in n independent trials.
– This is a PMF
– Characteristics of Pmfs
• A schedule assigning the unit of prob to the possible values
of X.
• Mean, Mode, Median, Std Deviation, Variance.
– Binomial Mean is n*p
– Binomial Standard Deviation is [n*p*(1-p)]^.5
– Most of our problems had p=.5. What if p was
different?
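To see what happens when p is not .5, the binomial mean and standard deviation formulas can be checked against the pmf itself. This is a small sketch with a hypothetical example (n = 10, p = 0.3); `binom_pmf` is an illustrative helper that plays the role of BINOMDIST(X,n,p,FALSE).

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials, like BINOMDIST(x,n,p,FALSE)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3   # hypothetical example with p different from 0.5

pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the pmf (probability-weighted sums).
mean_from_pmf = sum(x * pr for x, pr in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * pr for x, pr in enumerate(pmf))

# The shortcut formulas from the notes.
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

# Mode: the value of X with the highest probability.
mode = max(range(n + 1), key=lambda x: pmf[x])

print(f"mean {mean_from_pmf:.3f} vs n*p = {mean_formula:.3f}")
print(f"sd   {sqrt(var_from_pmf):.3f} vs [n*p*(1-p)]^.5 = {sd_formula:.3f}")
print(f"mode {mode}")
```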
Class 03 Lady Tasting Tea
• Hypothesis Testing
– Formulate H0 and Ha
– Pick alpha (usually 0.05)
– Identify and calculate the test statistic
• So far it has either been number correct (proportion correct) or
calculated chi-squared.
– Calculate the p-value…the prob of observing a test statistic
at least as extreme as the one observed if H0 is true.
• So far we’ve used the binomial, the normal, the chi-squared to
calculate p-values.
– Reject H0 (in favor of Ha) if p-value is less than 0.05. Say
the result is statistically significant. (Our result is rare if H0
is true).
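The steps above can be walked through in Python for the Lady Tasting Tea test. This is a sketch under stated assumptions: it takes 8 correct out of 10, which is the case implied by the formula =1-BINOMDIST(7,10,0.5,TRUE) in the summary of tests in these notes, and `binom_cdf` is a hypothetical helper standing in for BINOMDIST(X,n,p,TRUE).

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial, like BINOMDIST(x,n,p,TRUE)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Step 1: H0: p = 0.5 (she is guessing), Ha: p > 0.5
n, p0 = 10, 0.5
# Step 2: pick alpha
alpha = 0.05
# Step 3: test statistic = number correct (assumed to be 8 here)
observed = 8
# Step 4: p-value = P(X >= observed) if H0 is true
p_value = 1 - binom_cdf(observed - 1, n, p0)
# Step 5: compare to alpha
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"p-value = {p_value:.3f}: {decision}")
```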
Class 04 Wunderdog and Normal
• As n gets big, the binomial looks like the
normal.
• NORMDIST(X,μ,σ,TRUE)
– As an approximation to the binomial set μ=n*p
and σ=[n*p*(1-p)]^.5
– Normal is a PDF
– Normal is a family of distributions….but all have
identical properties…see next slide.
EMBS Fig 6.4, p 249
• For more detail you can use NORMDIST(X,0,1,TRUE) and NORMINV(p,0,1).
Normal continued
• NORMDIST(X,μ,σ,false)
– The height of the normal density curve. We have never used it…and
never will.
• NORMINV(p,μ,σ)
– finds the x value such that P(X<x) = p
• Lots of variations of questions you can use the normal to answer.
– What is p?
– What X gives a specified p?
– What is σ?
• Lots of decisions to make (for example, what target Lorex should use)
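For readers working outside Excel, Python's standard library offers rough equivalents of NORMDIST(X,μ,σ,TRUE) and NORMINV(p,μ,σ) through `statistics.NormalDist`. The sketch below uses the Wunderdog numbers (87 correct out of 149) that appear in the summary of tests in these notes; the 0.95 used for the inverse lookup is an arbitrary illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

n, p = 149, 0.5
mu = n * p                       # 74.5
sigma = sqrt(n * p * (1 - p))    # about 6.10

approx = NormalDist(mu, sigma)   # normal approximation to Binomial(149, 0.5)

# "What is p?"  Like =1-NORMDIST(87,74.5,6.10,TRUE)
p_value = 1 - approx.cdf(87)

# "What X gives a specified p?"  Like =NORMINV(0.95,74.5,6.10)
cutoff = approx.inv_cdf(0.95)

# Standard normal lookups, like NORMINV(0.975,0,1)
z_975 = NormalDist(0, 1).inv_cdf(0.975)

print(f"P(X > 87) is about {p_value:.3f}")
print(f"95th percentile of X is about {cutoff:.1f}")
print(f"z for 0.975 is about {z_975:.2f}")
```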
Class 06 Descriptive Statistics
• Each of the characteristics of a prob distribution
(mean, median, mode, var, stdev, skew) has a
corresponding summary descriptive statistic
• Know how to calculate summary descriptive
statistics and what they tell you.
– They are only useful for NUMERICAL (not categorical)
scaled variables.
• EXCEPTION: If there are only two categories (Male/Female),
(Right/Wrong), (Athlete/Not) then you can change to a 1/0
number and descriptive statistics make sense.
• Descriptive statistics make some sense for birth
months…especially in the soccer context where birth month
number predicts tournament participation.
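The 1/0 recoding described above is easy to try out. This is a minimal sketch using Python's `statistics` module; both data lists are made up for illustration and are not course data.

```python
from statistics import mean, median, mode, stdev

# Hypothetical data for illustration only.
scores = [72, 85, 85, 90, 64, 78, 85, 91]                   # numerical variable
right_wrong = ["Right", "Wrong", "Right", "Right", "Wrong"]  # two-category variable

# Summary descriptive statistics for the numerical variable.
print("mean", mean(scores), "median", median(scores),
      "mode", mode(scores), "stdev", round(stdev(scores), 2))

# Recode the two categories as 1/0; the mean is then the proportion of "Right".
coded = [1 if r == "Right" else 0 for r in right_wrong]
proportion_right = mean(coded)
print("proportion right", proportion_right)
```

The mean of the 1/0 variable is the sample proportion, which is why descriptive statistics make sense for two-category data.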
Class 07 Roulette and GOF
• Observations of a categorical variable with
multiple categories (n=904) are summarized
using a table of counts.
– Use Countif or pivottable to calculate the counts.
• If we have an H0 that leads to expected counts,
then we can perform a chi-squared GOF test.
– H0: All categories are equally probable (roulette and
soccer birth month)
• Know how to perform and interpret a GOF test.
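A chi-squared GOF test with "all categories equally probable" as H0 can be sketched as follows. The six observed counts are hypothetical, and `chi2_sf` is an illustrative stand-in for Excel's CHIDIST written with only the standard library (a series for the incomplete gamma function); in practice a statistics package would normally supply this function.

```python
from math import exp, lgamma, log

def chi2_sf(x, dof):
    """Upper-tail chi-squared probability, like CHIDIST(x, dof).

    Computed as 1 minus the regularized lower incomplete gamma function,
    using its power series.
    """
    if x <= 0:
        return 1.0
    a, half_x = dof / 2, x / 2
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= half_x / (a + k)
        total += term
    lower = total * exp(-half_x + a * log(half_x) - lgamma(a))
    return 1.0 - lower

def gof_test(observed, expected):
    """Return (calculated chi-squared, degrees of freedom, p-value)."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return chi_sq, dof, chi2_sf(chi_sq, dof)

# Hypothetical counts for a six-category variable (not course data).
observed = [18, 22, 16, 25, 19, 20]
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # H0: all categories equally probable

chi_sq, dof, p_value = gof_test(observed, expected)
print(f"chi-squared = {chi_sq:.2f}, dof = {dof}, p-value = {p_value:.3f}")
```

A large p-value means the observed counts are not unusual if H0 is true, so we fail to reject H0.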
Class 07 Lorex GOF
• Start with n=144 observations of a numerical
variable.
• Create (somewhat arbitrary) BINS, and a table of
summary counts.
• H0: the data came from N(10.2,.16)
– Under H0, we can calculate the expected counts in
each bin.
• This is another use of the chi-squared GOF test.
– Numerical data, bins and observed counts, H0 and
expected counts, GOF test.
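The "expected counts in each bin" step can be sketched in Python. Two assumptions are made here: N(10.2,.16) is read as mean 10.2 and standard deviation .16 (the N(μ,σ) convention used elsewhere in these notes), and the bin edges are hypothetical, chosen only to give eight bins.

```python
from statistics import NormalDist

n = 144
h0 = NormalDist(mu=10.2, sigma=0.16)   # H0: the data came from N(10.2,.16)

# Hypothetical bin edges; the first and last bins are open-ended.
edges = [9.96, 10.04, 10.12, 10.20, 10.28, 10.36, 10.44]

# Probability of each bin = difference of the cdf at its two edges.
cdf_at_edges = [0.0] + [h0.cdf(e) for e in edges] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf_at_edges, cdf_at_edges[1:])]

for i, e in enumerate(expected, start=1):
    print(f"bin {i}: expected count {e:.1f}")
```

These expected counts are then compared with the observed bin counts using the same calculated chi-squared as in any other GOF test.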
Class 08 2-tailed tests and sample proportions
• P-value is always the probability of observing a
test statistic AS EXTREME as ours under H0.
– The interpretation of “extreme” depends on Ha.
• Ha: p>.5, extreme means X ≥ 8.
• Ha: p≠.5, extreme means X ≥ 8 or X ≤ 2
– The calculation of the p-value depends on Ha.
• The p-value for a 2-tailed test will usually be
twice that from a one-tailed test.
– It is more difficult to find statistical significance when
you are less sure what you are looking for.
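The difference between the two definitions of "extreme" can be checked numerically. This sketch assumes n = 10 and H0: p = .5, which is consistent with the X ≥ 8 and X ≤ 2 cutoffs above; `binom_cdf` is a hypothetical helper standing in for BINOMDIST(X,n,p,TRUE).

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial, like BINOMDIST(x,n,p,TRUE)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p0 = 10, 0.5   # assumed sample size and H0 value

# Ha: p > .5, so extreme means X >= 8
one_tailed = 1 - binom_cdf(7, n, p0)

# Ha: p != .5, so extreme means X >= 8 or X <= 2
two_tailed = (1 - binom_cdf(7, n, p0)) + binom_cdf(2, n, p0)

print(f"one-tailed p-value = {one_tailed:.3f}")
print(f"two-tailed p-value = {two_tailed:.3f}")
```

With p = .5 the binomial is symmetric, so the two-tailed p-value here is exactly twice the one-tailed value.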
There is often another way…
• X is Binomial(n,p)
• X is N(n*p,[n*p*(1-p)]^.5)
• pbar = X/n is N(p,[p*(1-p)/n]^.5)
– Going from binomial to normal is an
approximation that is good when n is big.
– The last two ARE EXACTLY THE SAME.
Suppose n=100, p=.5
• X is N(50,5), since [n*p*(1-p)]^.5 = 5
• pbar is N(0.5,0.05), since [p*(1-p)/n]^.5 = 0.05
[Figure: normal curve with two horizontal scales, X from 35 to 65 and pbar from 0.35 to 0.65]
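That the count scale and the proportion scale give the same answer can be verified directly. The sketch below uses n = 100 and p = .5 from the notes; the observed count of 60 is a hypothetical value chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.5

x_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))   # X is N(50, 5)
pbar_dist = NormalDist(p, sqrt(p * (1 - p) / n))    # X/n is N(0.5, 0.05)

x = 60   # hypothetical observed count

tail_x = 1 - x_dist.cdf(x)            # probability on the count scale
tail_pbar = 1 - pbar_dist.cdf(x / n)  # same event on the proportion scale

print(f"P(X > 60)      = {tail_x:.4f}")
print(f"P(X/n > 0.60)  = {tail_pbar:.4f}")
```

Both calculations are two standard deviations above the mean, so the probabilities match.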
When interpreting Statistics….
• It’s all about the n!
– 60% correct is unimpressive if n=10 and very
impressive if n=100.
– Even with the chi-squared GOF test, n matters
• Distance = (15-10)^2/10 = 2.5
• Distance = (150-100)^2/100 = 25
n is the total count (907 for roulette, 288 for soccer, 144 for Lorex)
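Both "n matters" examples can be reproduced with a short Python sketch. It assumes H0: p = 0.5 and a one-tailed test for the 60% correct comparison; `binom_upper_tail` and `distance` are hypothetical helper names.

```python
from math import comb

def binom_upper_tail(x, n, p):
    """P(X >= x) for a binomial, like 1-BINOMDIST(x-1,n,p,TRUE)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# 60% correct under H0: p = 0.5, for two sample sizes.
p10 = binom_upper_tail(6, 10, 0.5)      # 6 of 10 correct
p100 = binom_upper_tail(60, 100, 0.5)   # 60 of 100 correct
print(f"n=10:  p-value = {p10:.3f}")
print(f"n=100: p-value = {p100:.3f}")

def distance(observed, expected):
    """One category's contribution to calculated chi-squared."""
    return (observed - expected) ** 2 / expected

# Same 50% overshoot, ten times the count, ten times the distance.
d_small = distance(15, 10)
d_large = distance(150, 100)
print(d_small, d_large)
```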
Hypothesis Tests we have come to love…….

| Example | H0 | Ha | Test Statistic | Distribution of test statistic given H0 | p-value formula | p-value | conclusion |
|---|---|---|---|---|---|---|---|
| LTT | p=0.5 | p>0.5 | X=number correct | Binomial | =1-BINOMDIST(7,10,0.5,TRUE) | 0.055 | fail to reject |
| Wunderdog | p=0.5 | p>0.5 | X=number correct | Binomial | =1-BINOMDIST(86,149,0.5,TRUE) | 0.024 | reject |
| Wunderdog | p=0.5 | p>0.5 | X=number correct | Normal | =1-NORMDIST(87,74.5,6.10,TRUE) | 0.020 | reject |
| Wunderdog | p=0.5 | p>0.5 | pbar = sample proportion correct | Normal | =1-NORMDIST(0.586,0.5,0.041,TRUE) | 0.020 | reject |
| Buttered toast | p=0.5 | p ≠.5 | X=number butter up | Binomial | =1-binomdist(28,48,.5,true) + binomdist(19,48,.5,true) | 0.097 | fail to reject |
| Roulette Wheel | All 38 p's = 1/38 | they do not | calculated chi-squared | chi-squared with 37 dof | =CHIDIST(31.2,37) | 0.737 | fail to reject |
| Lorex | Data came from N(10.2,.16) | they did not | calculated chi-squared | chi-squared with 7 dof | =CHIDIST(7.577,7) | 0.371 | fail to reject |

• 6.10 = [149*.5*.5]^.5
• 0.041 = [.5*.5/149]^.5
Pfeifer’s Pfoibles
• How many will she get correct?
• Do I have the disease?
– Give me probabilities as answers even though I did not
explicitly ask for them.
• H0 is a statement about the process, not the outcomes.
• If p-value < 0.05 say ….
– The result is statistically significant
– We reject the null hypothesis.
• If p-value > 0.05 say …
– The result is NOT statistically significant.
– We fail to (or cannot) reject H0.