Statistical inference
Statistical inference is inference about a population from a random sample drawn from it. It includes:
• point estimation
• interval estimation
• hypothesis testing (or statistical significance testing)
• prediction

Estimation and Inference
Suppose we have a single sample. The questions we might want to answer are these:
• What is the population mean value?
• Is the population mean value significantly different from the current expectation or recommended level?
• What is the level of uncertainty associated with our estimate of the population mean?

In order to be reasonably confident that our inferences are correct, we need to establish some facts about the distribution of the data:
• Is the sample size large enough?
• Are there outliers in the data?
• Is the mean a sensible summary statistic?
• If the data were collected over a period of time, is there evidence of serial correlation?
• Are the values normally distributed or not?

What is the level of uncertainty associated with our point estimate x̄ of the population mean µ?

Confidence Intervals for µ
x̄ ± margin of error

Shows the likely range in which the true population mean is situated. The more ‘confidence’ required, the wider the interval. The convention is to calculate 95% confidence intervals.

x̄ ± (confidence coefficient) × σ/√n
where σ/√n is the standard error of the sample mean.

Gosset and the t-distribution: when σ is unknown and is estimated by the sample standard deviation s, the confidence coefficient comes from Student's t distribution with n − 1 degrees of freedom.

To improve the ‘precision’ of the sample mean (that is, to reduce the standard error σ/√n):
• decrease σ (?)
• increase n

One-Sample T: Cadmium
Variable   N     Mean       StDev      SE Mean    95% CI
Cadmium    168   0.268690   0.163347   0.012602   (0.243810, 0.293571)

Interpretation?

Hypothesis Testing - Strategy
i. Take a random sample from the population of interest and calculate a suitable test statistic.
ii. Investigate how likely the value of that statistic is if some specified hypothesis is true.
iii. Decide, in light of ii, whether the data give evidence against that hypothesis.

Significance.
The key concept is the amount of variation that we would expect to occur by chance alone when nothing scientifically interesting is going on. If we measure bigger differences than we would expect by chance, we say that the result is statistically significant. If we measure no more variation than we might reasonably expect by chance alone, we say that our result is not statistically significant.

The first step in a hypothesis test is to state a claim that we will try to find evidence against.
“Innocent until proven guilty …”

The hypothesis that the population parameter is equal to some claimed value is called the null hypothesis (H0). The hypothesis that must be true if the null hypothesis is false is called the alternative hypothesis (H1 or HA).

Tests Concerning Means
One Sample Test: ‘mean cadmium levels’
Is µ = 0.30?

H0 : µ = 0.30
Various alternative hypotheses:
H1 : µ ≠ 0.30   (two sided)
H1 : µ < 0.30   (one sided)
H1 : µ > 0.30   (one sided)

One-Sample T: Cadmium
Test of mu = 0.3 vs not = 0.3
Variable   N     Mean       StDev      SE Mean    95% CI                 T       P
Cadmium    168   0.268690   0.163347   0.012602   (0.243810, 0.293571)   -2.48   0.014
(Large sample with N > 30 → Normality Assumption not necessary)

P-value:
• The probability of obtaining a test statistic at least as extreme as the one observed, if the null hypothesis were true.
• How likely are data like these if the true mean Cadmium level were equal to 0.3?
• A low p-value means the observed data would be unlikely if the null hypothesis were true, so it counts as evidence against H0 (a p-value of < 0.05 is conventionally considered low).
P-value: 0.014. Moderate evidence against H0.

Comparing CI & Hypothesis tests
• Confidence interval: A fixed level of confidence is chosen. We determine a range of possible values for the parameter that are consistent with the data (at the chosen confidence level).
• Hypothesis test: Only one possible value for the parameter is tested. We determine the strength of the evidence provided by the data against the proposition that the hypothesized value is the true value.

CAUTION!
A non-significant test does not imply that the null hypothesis is true.
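
Worked check of the Cadmium output
The one-sample t results quoted in the Minitab output can be recomputed from the summary statistics alone (N, Mean, StDev) and the hypothesised value 0.30. The sketch below is an illustration only, not the original analysis. It assumes nothing beyond the Python standard library, so the t distribution is handled by simple numerical integration and bisection rather than a statistics package. The helper names t_pdf, t_upper_tail and t_critical are hypothetical names introduced here.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Two-sided critical value t* with P(|T| > t*) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics taken from the Minitab output in the slides.
n, mean, sd = 168, 0.268690, 0.163347
mu0 = 0.30          # hypothesised population mean (H0: mu = 0.30)
alpha = 0.05        # 5% significance level, i.e. 95% confidence

df = n - 1
se = sd / math.sqrt(n)                      # standard error of the mean
t_star = t_critical(alpha, df)              # confidence coefficient
ci = (mean - t_star * se, mean + t_star * se)

t_stat = (mean - mu0) / se                  # one-sample t statistic
p_value = 2 * t_upper_tail(abs(t_stat), df) # two-sided p-value

print(f"SE mean = {se:.6f}")
print(f"95% CI  = ({ci[0]:.6f}, {ci[1]:.6f})")
print(f"T       = {t_stat:.2f}")
print(f"P       = {p_value:.3f}")
print("0.30 inside 95% CI:", ci[0] <= mu0 <= ci[1])
print("reject H0 at 5% level:", p_value < alpha)
```

The printed values should agree with the Minitab table up to rounding. The last two lines illustrate the comparison of confidence intervals and tests: 0.30 lies outside the 95% interval exactly when the two-sided test rejects H0 at the 5% level.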