Refresher in inferential statistics

+ Refresher in inferential statistics
Tim.bates@ed.ac.uk
http://www.psy.ed.ac.uk/events/research_seminars/psych stats

+ Resources
- http://www.statmethods.net

+ Our basic question…
- Did something occur?
- Importantly, did what we predicted would occur actually occur? That is, is the world as we predicted?
- Why does this require statistics?

+ Is breastfeeding good for babies' brains?
The association between breastfeeding and IQ is moderated by a genetic polymorphism (rs174575) in the FADS2 gene.
Caspi A et al. PNAS 2007;104:18860-18865. ©2007 by National Academy of Sciences

+ Overview
- Hypothesis testing
- p-values
- Type I vs. Type II errors
- Power
- Correlation
- Fisher's exact test
- t-test
- Linear regression
- Non-parametric statistics (mostly for you to go over in your own time)

+ Hypothesis testing
1. Propose a null and an experimental hypothesis. Mistakes here may make the experiment un-analysable.
2. Consider the assumptions of the test. Are they met?
   - Statistical independence of observations
   - Distributions of the observations (Student's t distribution, normal distribution, etc.)
3. Compute the relevant test statistic: Student's t-test → t; ANOVA → F; chi-squared test → χ².
4. Compute the probability of the test statistic under the null hypothesis (the p-value):
   - Is it smaller than your chosen threshold (α)?
   - Either reject (or fail to reject) the null hypothesis.

+ What mistakes can we make?
Your decision vs. the world:
- Decide "yes", world "yes": correct detection
- Decide "yes", world "no": false positive
- Decide "no", world "yes": false negative
- Decide "no", world "no": correct rejection

+ Starting to make inferences… the binomial
- Toss a coin

+ Dropping lots of coins...
- Pachinko

+ Normal compared to binomial
- n = 6, p = .5

+ Distributions
- Normal (μ, σ)
- Binomial (n, p)

+ Distributions
- Poisson (λ): accidents in a period of time
- Power law: publication rates

+ Testing what distribution you have

+ Why are things normal?

+ Central limit theorem
- The mean of a large number of independent random variables (identically distributed, with finite variance) is approximately normally distributed.

+ Hypothesis testing
- Making statistical decisions using experimental data.
- We need to form a null hypothesis (we can reject, but not confirm, hypotheses).
- A result is "significant" if it is unlikely to have occurred by chance alone, i.e., if the null hypothesis were true.
- Ronald Fisher: "We may discover whether a second sample is or is not significantly different from the first".

+ Error
- Type-I error: false alarm, a bogus effect
  - Rejecting the null hypothesis when it is really true
  - Much of published science is Type-I error (Ioannidis, 2008)
- Type-II error: missing a real effect
  - Failing to reject the null hypothesis when it is false
  - Many small projects have this problem
- Type-III error :-)
  - Lazy, incompetent, or willful ignorance of the truth

+ p-values
- Almost any difference (a count, a difference in means, a difference in variances) can occur with some probability, irrespective of the true situation.
- All we can do is set a threshold probability below which we decide that a result is unlikely to have occurred by chance.
- p = .05 means that, if the null hypothesis is true, a result at least this large would arise by chance 1 time in 20.

+ Type I vs. Type II errors
- Type I: false positive. Probability of a Type I error = α; using p = .05 as the criterion sets α to .05.
- Type II: false negative. Probability of a Type II error = β; power = 1 - β.
Your decision vs. the world:
- Decide "yes", world "yes": correct detection (power, 1 - β)
- Decide "yes", world "no": Type I error (α)
- Decide "no", world "yes": Type II error (β)
- Decide "no", world "no": correct rejection

+ p-values
- The p-value is the probability, assuming the null hypothesis is true, of observing a difference as large as or larger than the one in the data.
- A p-value criterion (α) gives us a binary answer to our questions.
- Questions: is a smaller p-value
  - "More" significant?
  - Indicative of a "bigger" effect? (If so, when?)
  - And how could we measure "effect"?

+ Compare these two statements
- "It's significant, but how big is the effect?"
- "I can see it's big, but what is the p-value?"

+ Confidence intervals
- A range of plausible values for the population parameter at a given confidence level (for instance 95%).
- Closely related to p-values.
 p = 1-CI  i.e., if p<.05, 95% CI will not include 0 (no difference)  Would you rather have a CI or a p-value?  Why? What is an effect size? + P and CI  You can’t go from p to CI!  You can go from CI to p  At a p=.05, 95%CIs will overlap less than 25%  At p= .01, the 95% CI bars just touch + Units of a Confidence Interval   Unlike p, CIs are given in the units of the DV  Cumming and Finch (2005)  BMI in people on a low carb diet might be19-23 kg/m2 Cumming, G. and Finch S.(2005). Inference by eye: confidence intervals and how to read pictures of data. American Psychologist. 60:170-80. PMID: 15740449 + Standard Errors and Standard Deviations   SE is (typically) the standard error of the mean  The precision with which we have estimated the population mean based on our sample  Computationally, it is ∂/sqrt(n) A 95% confidence interval is ± 1.96 SE + Example: coin toss  Random sample of 100 coin tosses, of a coin believed to be fair  We observed number of 45 heads, and 55 tails: Is the coin fair? + Binomial test  binom.test(x=45, n=100, p=.5, alternative="two.sided”) number of successes = 45, number of trials = 100 p-value = 0.3682 alternative hypothesis: true probability of success != 0.5 95 percent confidence interval: 0.3503 0.5527 sample estimates: probability of success: 0.45 + Categorical Data  Fisher’s Exact Test  Categorical data resulting from classifying objects in one of two ways  Tests significance of the observed "contingency" of the two outcomes.  Fisher, R. A. (1922). On the interpretation of χ2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87-94. + The Lady Drinking Tea  Question: Does Tea taste better if the milk is added to the tea, or vice versa? 
- Null hypothesis: the drinker cannot tell.
- Subject: Ms Bristol
- Experiment: 8 "trials" (cups), 4 prepared each way, in random order
- DV: discrimination of milk-first versus milk-second
- Enter the data into a 2 × 2 contingency table

+ Fisher contingency table
               Truth: Milk   Truth: Tea
Guess: Milk         3            1
Guess: Tea          1            3

A <- c(1, 1, 1, 0, 1, 0, 0, 0)  # her guesses, coded 0/1 by which was poured first
B <- c(1, 1, 1, 1, 0, 0, 0, 0)  # the truth for each cup, same coding
guessTable <- table(A, B)       # 2 x 2 contingency table (levels ordered 0, 1)
labels <- list(Guess = c("Milk", "Tea"), Truth = c("Milk", "Tea"))  # label levels 0, 1
dimnames(guessTable) <- labels  # add labels
fisher.test(guessTable, alternative = "greater")  # one-sided test: can she tell?

+ Can she tell?
Fisher's Exact Test for Count Data
  p-value = 0.24 (association could not be established)
  alternative hypothesis: true odds ratio is greater than 1
  95 percent confidence interval: 0.313 to Inf
  sample estimates: odds ratio 6.40

+ What if we have two continuous variables?
- Are they related?
- Q: If you have continuous depression scores, or the same scores dichotomised at a cut-off, which analysis is more powerful?

+ Correlation of two continuous variables: Pearson's r
- All variables continuous: Pearson

+ Correlation: what are the maximum and minimum correlations?

+ Power (1 - β)
- The probability that a test will correctly reject a false null hypothesis.
- The complement of the false-negative rate, β (a false negative = missing a real effect).
- 1 - β = P(correctly rejecting a false null hypothesis)

+ Power and how to get it
- Probability of rejecting the null hypothesis when it is false
- Whence comes power?
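One way to answer "Whence comes power?" is to vary one ingredient at a time and watch power change. The sketch below uses base R's `power.t.test()` for a two-sample, two-sided t-test with sd = 1, the same setting as the "Power of t-test" slide. The particular grids of n, delta and alpha values are arbitrary choices for illustration, and `power_at()` is a hypothetical helper defined here.

```r
# Where does power come from? Vary one ingredient at a time, using base R's
# power.t.test() for a two-sample, two-sided t-test with sd = 1.
# The grids of n, delta and alpha values below are arbitrary illustrations.
power_at <- function(n, delta = 0.5, sig.level = 0.05) {
  power.t.test(n = n, delta = delta, sd = 1, sig.level = sig.level)$power
}

# 1. Sample size: n is the number in *each* group (delta = 0.5)
by_n <- sapply(c(15, 30, 64, 100), power_at)

# 2. Effect size: delta in SD units, with n = 15 per group
by_delta <- sapply(c(0.2, 0.5, 0.8), function(d) power_at(15, delta = d))

# 3. Alpha: a stricter threshold costs power (n = 15, delta = 0.5)
by_alpha <- sapply(c(0.05, 0.01), function(a) power_at(15, sig.level = a))

print(round(by_n, 2))
print(round(by_delta, 2))
print(round(by_alpha, 2))
```

With n = 15 per group and delta = .5 this should reproduce the slide's power of about 0.26. Power rises with sample size (to roughly 0.80 at 64 per group) and with effect size, and it falls when α is made stricter.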
+ Power applied to a correlation
Samples of n = 30 from a population in which two normal traits correlate 0.3:
library(MASS)     # mvrnorm()
library(ggplot2)  # plotting
r <- 0.3
xy <- mvrnorm(n = 30, mu = rep(0, 2), Sigma = matrix(c(1, r, r, 1), nrow = 2, ncol = 2))
xy <- data.frame(xy)
names(xy) <- c("x", "y")
ggplot(xy, aes(x, y)) + geom_point() + geom_smooth(method = "lm")  # scatter plot with fitted regression line

+ Power of a correlation test
library(pwr)
pwr.r.test(n = 30, r = .3, sig.level = 0.05)
  n = 30
  r = 0.3
  sig.level = 0.05
  power = 0.359
  alternative = two.sided

+ Power: r = .3

+ t-test
- When we wish to compare means in a sample, we must estimate the standard deviation from the sample.
- Student's t-distribution is the distribution of the standardized mean (the mean divided by its estimated standard error) in small samples from normally distributed populations.

+ t-distribution function
- t is defined as the ratio $t = Z / \sqrt{V / \nu}$, where
  - Z is normally distributed with expected value 0 and variance 1;
  - V has a chi-square distribution with ν degrees of freedom;
  - Z and V are independent.

+ Normal and t-distributions
- Normal is in blue
- Green = t with df = 1
- Red = t with df = 3 (far right: df increasing to 30)

+ Power of t-test
power.t.test(n = 15, delta = .5)
  Two-sample t test power calculation
  n = 15; delta = 0.5; sd = 1; sig.level = 0.05
  power = 0.26
  alternative = two.sided
  NOTE: n is number in *each* group

+ Linear regression

+ Linear regression
fit <- lm(y ~ x1 + x2 + x3, data = mydata)
summary(fit)                # show results
anova(fit)                  # ANOVA table
coefficients(fit)           # model coefficients
confint(fit, level = 0.95)  # CIs for model parameters
fitted(fit)                 # predicted values
residuals(fit)              # residuals
influence(fit)              # regression diagnostics

+ Nonparametric Statistics
Timothy C. Bates, tim.bates@ed.ac.uk

+ Bootstrapping: kurtosis differences
library(psych)  # kurtosi()
# x, y: the two samples being compared
kurtosisDiff <- function(x, y, B = 1000) {
  kx <- replicate(B, kurtosi(sample(x, replace = TRUE)))  # bootstrap kurtosis of x
  ky <- replicate(B, kurtosi(sample(y, replace = TRUE)))  # bootstrap kurtosis of y
  return(kx - ky)
}
kurtDiff <- kurtosisDiff(x, y, B = 10000)
mean(kurtDiff > 0)  # p = 0.205, NS

+ Parametric statistics 1
- Assume data are drawn from populations with a certain distribution (usually normal).
- Compute the probability that groups are related/unrelated or same/different, given that underlying model.
- e.g., t-test, Pearson's correlation, ANOVA…

+ Parametric statistics 2
Assumptions of parametric statistics:
1. Observations are independent
2. Your data are normally distributed
3. Variances are equal across groups (tests can be modified to cope with unequal σ², e.g., Welch's t-test)

+ Non-parametric statistics?
- Non-parametric statistics do not assume any underlying distribution.
- They compute the probability that your groups are the same or different by comparing the ranks of subjects across the range of scores.

+ Non-parametric statistics
Assumptions of non-parametric statistics:
1. Observations are independent

+ Non-parametric statistics?
- Non-parametric statistics do not assume any underlying distribution.
- Because they do not estimate or model this distribution, they have less power to detect effects when parametric assumptions hold…
- So don't use them unless you have to.

+ Why use a non-parametric statistic?
- Very small samples: here parametric tests can lead to Type-I (false alarm) errors.
- Outliers more often lead to spurious Type-I (false alarm) errors in parametric statistics.
- Non-parametric statistics reduce data to ordinal ranks, which reduces the impact, or leverage, of outliers.

+ Non-parametric choices (parametric analogue in brackets)
- Discrete data: χ² (no alternative)
- Continuous data, by question:
  - Association: Spearman's rank (like Pearson's r)
  - Different central value, two groups: Mann-Whitney U / Wilcoxon rank sums (like Student's t)
  - Different central value, more than 2 groups: Kruskal-Wallis test (like ANOVA)
  - Difference in σ²: Brown-Forsythe (like F-test)

+ Binomial test
binom.test(45, 100, .5, alternative = "two.sided")
  number of successes = 45, number of trials = 100, p-value = 0.3682
  alternative hypothesis: true probability of success is not equal to 0.5
  95 percent confidence interval: 0.350 0.5527
  sample estimates: probability of success 0.45
binom.test(51, 235, (1/6), alternative = "greater")

+ Spearman rank test (ρ, rho)
- Named after Charles Spearman; a non-parametric measure of correlation.
- Assesses how well an arbitrary monotonic function describes the relationship between two variables.
- Does not require the relationship to be linear.
- Does not require interval measurement.

+ Spearman rank (ρ, rho)
$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ (when there are no tied ranks)
- d = difference in rank of a given pair
- n = number of pairs
- Alternative test: Kendall's tau (Kendall's τ)

+ Mann-Whitney U
- AKA the "Wilcoxon rank-sum test" (Mann & Whitney, 1947; Wilcoxon, 1945)
- Non-parametric test for a difference in the medians of two independent samples (strictly, of medians only when the two distributions have the same shape)
- Assumptions:
  - Samples are independent
  - Observations can be ranked (ordinal or better)

+ Mann-Whitney U
$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$
- U tests the difference in the medians of two independent samples
- n1 = number of observations in sample 1
- n2 = number of observations in sample 2
- R1 = sum of the ranks in sample 1 (the lower-ranked sample)

+ Mann-Whitney U or t?
- Should you use it over the t-test?
  - Yes, if you have a very small sample (<20), where central limit assumptions are not met
  - Yes, if your data are really ordinal
  - Otherwise, probably not.
- It is less prone to Type-I error (spurious significance) due to outliers.
- But it does not, in fact, handle comparisons of samples whose variances differ very well (use an unequal-variance t-test on the rank data instead).

+ Wilcoxon signed-rank test (related samples)
- Same idea as the Mann-Whitney U, generalized to matched samples
- The non-parametric counterpart of the paired-samples t-test

+ Kruskal-Wallis
- Non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis)
- Tests equality of medians across groups.
- It is an extension of the Mann-Whitney U test to 3 or more groups.
- Does not assume a normal population.
- Assumes population variances among groups are equal.

+ Aesop: Mann-Whitney U example
- Suppose that Aesop is dissatisfied with his classic experiment, in which one tortoise was found to beat one hare in a race.
- He decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general…

+ Aesop 2: Mann-Whitney U
- He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post (their rank order) is as follows:
tort <- c(1, 7, 8, 9, 10, 11)
hare <- c(2, 3, 4, 5, 6, 12)
- The original tortoise still goes at warp speed and the original hare is still lazy, but the others run truer to stereotype.

+ Aesop 3: Mann-Whitney U
wilcox.test(tort, hare)
  W = 25, p-value = 0.31
- Tortoises and hares do not differ significantly.
- tort = c(1, 7, 8, 9, 10, 11) (n2 = 6)
- hare = c(2, 3, 4, 5, 6, 12) (n1 = 6, R1 = 32)
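The U formula from the Mann-Whitney U slide can be checked by hand against the Aesop data. This is a minimal sketch that takes the hares as sample 1 (the lower-ranked sample, with R1 = 32 as on the "Aesop 3" slide). It applies `rank()` to the pooled data so that the same lines would also work on raw finishing times rather than finishing positions.

```r
# Finishing positions from the Aesop slides
tort <- c(1, 7, 8, 9, 10, 11)
hare <- c(2, 3, 4, 5, 6, 12)

# Rank the pooled data (here the values are already ranks 1..12)
ranks <- rank(c(hare, tort))
n1 <- length(hare)               # sample 1: hares
n2 <- length(tort)               # sample 2: tortoises
R1 <- sum(ranks[seq_len(n1)])    # sum of ranks in sample 1

# U from the rank-sum formula on the Mann-Whitney U slide
U <- n1 * n2 + n1 * (n1 + 1) / 2 - R1

# R's W for wilcox.test(x, y) is the rank sum of x minus n_x(n_x + 1)/2;
# no ties and small samples, so the p-value is exact
res <- wilcox.test(tort, hare)

print(U)
print(res$statistic)
print(res$p.value)
```

Both routes give 25, and the exact two-sided p-value is 286/924 ≈ 0.31. Taking the tortoises as sample 1 instead gives the complementary value n1·n2 - U = 11, with the same two-sided p-value.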

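The Spearman formula on the "Spearman rank (ρ, rho)" slide is Pearson's r applied to ranks, written in terms of the rank differences d. The sketch below computes ρ both ways. The x and y values are hypothetical, made up for illustration and chosen without ties, because the d² formula is exact only when no ranks are tied.

```r
# Hypothetical paired scores (made up for illustration; no ties)
x <- c(3.1, 4.7, 2.2, 5.9, 4.0, 6.3, 1.8, 5.1)
y <- c(10, 14, 9, 13, 15, 20, 7, 12)

d <- rank(x) - rank(y)   # difference in rank for each pair
n <- length(x)           # number of pairs
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Built-in Spearman correlation (Pearson's r on the ranks)
rho_builtin <- cor(x, y, method = "spearman")

print(c(rho_formula, rho_builtin))  # both equal 11/14, about 0.786
```

With tied ranks the two can differ slightly. The `cor(..., method = "spearman")` route (average ranks, then Pearson's r) is the safer general choice.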