Class 09 Exam1 Prep
Things you should know

Exam Details
• 75 minutes
• Allowed
  – Any book, the course website and all files linked to it, your notes, Excel, Word, calculators.
• Not Allowed
  – Communication with others (of any kind, using any device)
  – Websites other than the course website.
• Short answers, lots of partial credit. Some easy. Some difficult.

Class 01 Jefferson Had Girls
• We used BINOMDIST(X,n,p,FALSE) with n=4 and p=0.5 to find the probabilities of 0,1,2,3,4 girls in four children IF the coin flip model is true.
• We constructed a table of Observed vs Expected Counts for the 31,595 Danish Families.
• You should be able to finish the hypothesis test.
• You should be able to do the hypothesis test for a p other than 0.5.

Class 02 Prob, Pmfs, Binomial
• The first probability problem
  – What if the situation were slightly different?
• How probability works for two either/or events.
  – Athlete and Gender, Disease and Test.
  – Either build a 2x2 table of counts…or draw a prob tree…or use formulas.
  – What if one of the events had three outcomes?
    • Positive, Negative, Inconclusive
    • Strain A, Strain B, no disease
    • University Athlete, Intramural Athlete, Non-Athlete

Class 02 Prob, Pmfs, Binomial continued
• BINOMDIST(X,n,p,FALSE)
  – Applies the rules of probability to calculate the probability of X successes in n independent trials.
  – This is a PMF
  – Characteristics of PMFs
    • A schedule assigning the unit of prob to the possible values of X.
    • Mean, Mode, Median, Std Deviation, Variance.
  – Binomial Mean is n*p
  – Binomial Standard Deviation is [n*p*(1-p)]^.5
  – Most of our problems had p=.5. What if p were different?

Class 03 Lady Tasting Tea
• Hypothesis Testing
  – Formulate H0 and Ha
  – Pick alpha (usually 0.05)
  – Identify and calculate the test statistic
    • So far it has been either the number correct (proportion correct) or the calculated chi-squared.
  – Calculate the p-value…the prob of observing a test statistic at least as extreme as the one observed if H0 is true.
    • So far we’ve used the binomial, the normal, and the chi-squared to calculate p-values.
  – Reject H0 (in favor of Ha) if the p-value is less than 0.05. Say the result is statistically significant. (Our result is rare if H0 is true.)

Class 04 Wunderdog and Normal
• As n gets big, the binomial looks like the normal.
• NORMDIST(X,μ,σ,TRUE)
  – As an approximation to the binomial, set μ=n*p and σ=[n*p*(1-p)]^.5
  – Normal is a PDF
  – Normal is a family of distributions…but all have identical properties…see next slide.

[Figure: EMBS Fig 6.4, p 249. For more detail you can use NORMDIST(X,0,1,TRUE) and NORMINV(p,0,1).]

Normal continued
• NORMDIST(X,μ,σ,FALSE)
  – The height of the normal density curve. We have never used it…and never will.
• NORMINV(p,μ,σ)
  – Finds the x value such that P(X<x) = p
• Lots of variations of questions you can use the normal to answer.
  – What is p?
  – What X gives a specified p?
  – What is σ?
• Lots of decisions to make (what target Lorex should use)

Class 06 Descriptive Statistics
• Each of the characteristics of a prob distribution (mean, median, mode, var, stdev, skew) has a corresponding summary descriptive statistic.
• Know how to calculate summary descriptive statistics and what they tell you.
  – They are only useful for NUMERICAL (not categorical) scaled variables.
    • EXCEPTION: If there are only two categories (Male/Female), (Right/Wrong), (Athlete/Not), then you can change to a 1/0 number and descriptive statistics make sense.
    • Descriptive statistics make some sense for birth months…especially in the soccer context, where birth month number predicts tournament participation.

Class 07 Roulette and GOF
• Observations of a categorical variable with multiple categories (n=904) are summarized using a table of counts.
  – Use COUNTIF or a PivotTable to calculate the counts.
• If we have an H0 that leads to expected counts, then we can perform a chi-squared GOF test.
  – H0: All categories are equally probable (roulette and soccer birth month)
• Know how to perform and interpret a GOF test.

Class 07 Lorex GOF
• Start with n=144 observations of a numerical variable.
• Create (somewhat arbitrary) BINS, and a table of summary counts.
• H0: the data came from N(10.2,.16)
  – Under H0, we can calculate the expected counts in each bin.
• This is another use of the chi-squared GOF test.
  – Numerical data, bins and observed counts, H0 and expected counts, GOF test.

Class 08 2-tailed tests and sample proportions
• P-value is always the probability of observing a test statistic AS EXTREME as ours under H0.
  – The interpretation of “extreme” depends on Ha.
    • Ha: p>.5, extreme means X ≥ 8.
    • Ha: p≠.5, extreme means X ≥ 8 or X ≤ 2
  – The calculation of the p-value depends on Ha.
• The p-value for a 2-tailed test will usually be twice that from a one-tailed test.
  – It is more difficult to find statistical significance when you are less sure what you are looking for.

There is often another way…
• X is Binomial(n,p)
• X is N(n*p,[n*p*(1-p)]^.5)
• pbar = X/n is N(p,[p*(1-p)/n]^.5)
  – Going from binomial to normal is an approximation that is good when n is big.
  – The last two ARE EXACTLY THE SAME.

Suppose n=100, p=.5
• X is N(50,5), where 5 = [n*p*(1-p)]^.5
• pbar is N(0.5,0.05), where 0.05 = [p*(1-p)/n]^.5

[Figure: the same normal curve drawn over two horizontal axes, X from 35 to 65 and pbar from 0.35 to 0.65.]

When interpreting Statistics…
• It’s all about the n!
  – 60% correct is unimpressive if n=10 and very impressive if n=100.
  – Even with the chi-squared GOF test, n matters
    • Distance = (15-10)^2/10 = 2.5
    • Distance = (150-100)^2/100 = 25
  – n is the total count (907 for roulette, 288 for soccer, 144 for Lorex)

Hypothesis Tests we have come to love…
Example        | H0                         | Ha           | Test statistic                   | Distribution of test statistic given H0 | p-value (Excel)                                        | p-value | Conclusion
LTT            | p=0.5                      | p>0.5        | X=number correct                 | Binomial                                | =1-BINOMDIST(7,10,0.5,TRUE)                            | 0.055   | fail to reject
Wunderdog      | p=0.5                      | p>0.5        | X=number correct                 | Binomial                                | =1-BINOMDIST(86,149,0.5,TRUE)                          | 0.024   | reject
Wunderdog      | p=0.5                      | p>0.5        | X=number correct                 | Normal                                  | =1-NORMDIST(87,74.5,6.10,TRUE)                         | 0.020   | reject
Wunderdog      | p=0.5                      | p>0.5        | pbar = sample proportion correct | Normal                                  | =1-NORMDIST(0.586,0.5,0.041,TRUE)                      | 0.020   | reject
Buttered toast | p=0.5                      | p≠.5         | X=number butter up               | Binomial                                | =1-BINOMDIST(28,48,.5,TRUE) + BINOMDIST(19,48,.5,TRUE) | 0.097   | fail to reject
Roulette Wheel | All 38 p's = 1/38          | they do not  | calculated chi-squared           | chi-squared with 37 dof                 | =CHIDIST(31.2,37)                                      | 0.737   | fail to reject
Lorex          | Data came from N(10.2,.16) | they did not | calculated chi-squared           | chi-squared with 7 dof                  | =CHIDIST(7.577,7)                                      | 0.371   | fail to reject

Notes: 6.10 = [149*.5*.5]^.5 and 0.041 = [.5*.5/149]^.5

Pfeifer’s Pfoibles
• How many will she get correct?
• Do I have the disease?
  – Give me probabilities as answers even though I did not explicitly ask for them.
• H0 is a statement about the process, not the outcomes.
• If p-value < 0.05 say…
  – The result is statistically significant.
  – We reject the null hypothesis.
• If p-value > 0.05 say…
  – The result is NOT statistically significant.
  – We fail to (or cannot) reject H0.
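
Checking the binomial and normal p-values outside Excel
The one-tailed binomial and normal p-values for the Lady Tasting Tea and Wunderdog examples can be cross-checked without Excel. The sketch below is only an illustration, not part of the course materials: the helper names binom_cdf and norm_cdf are hypothetical stand-ins written here to mimic BINOMDIST(...,TRUE) and NORMDIST(...,TRUE), and it assumes the counts used in the slides (8 or more correct out of 10, and 87 or more correct out of 149).

```python
from math import comb, erf, sqrt

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); hypothetical stand-in for BINOMDIST(x, n, p, TRUE)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def norm_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma); hypothetical stand-in for NORMDIST(x, mu, sigma, TRUE)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Lady Tasting Tea: H0 p=0.5, Ha p>0.5, "extreme" means X >= 8 out of n=10.
p_ltt = 1 - binom_cdf(7, 10, 0.5)

# Wunderdog: H0 p=0.5, Ha p>0.5, "extreme" means X >= 87 out of n=149.
n, p0, x = 149, 0.5, 87
p_wd_binom = 1 - binom_cdf(x - 1, n, p0)

# Normal approximation on the count scale: X is N(n*p, [n*p*(1-p)]^.5).
sigma_x = sqrt(n * p0 * (1 - p0))
p_wd_norm_x = 1 - norm_cdf(x, n * p0, sigma_x)

# Same test on the proportion scale: pbar = X/n is N(p, [p*(1-p)/n]^.5).
sigma_pbar = sqrt(p0 * (1 - p0) / n)
p_wd_norm_pbar = 1 - norm_cdf(x / n, p0, sigma_pbar)

for label, value in [
    ("LTT, binomial", p_ltt),
    ("Wunderdog, binomial", p_wd_binom),
    ("Wunderdog, normal on X", p_wd_norm_x),
    ("Wunderdog, normal on pbar", p_wd_norm_pbar),
]:
    verdict = "reject H0" if value < 0.05 else "fail to reject H0"
    print(f"{label}: p-value = {value:.3f} -> {verdict}")
```

The count-scale and proportion-scale normal calculations use the same z value, which is why they give the same p-value. The chi-squared rows are left out because the Python standard library has no direct equivalent of CHIDIST.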