Jan 17, 2014

Key terms
• Hypothesis, null hypothesis: the research question. The null is the hypothesis of “no relationship.”
• Normal distribution: the bell curve. The standard normal distribution has a mean of “0” and a standard deviation of “1.”
• “Alpha .05”: the percent error we are willing to tolerate. We expect to get results different from those found this many times if the experiment were repeated 100 times.
• “P” value: the probability that is compared against the alpha level.
• Student’s t-test: used to test for significant differences between two mean values, or a difference from a specific level of a mean, such as zero (hypothesis testing).
• Confidence intervals: what is the expected range of possible observations, based on the data we have?

Can you sample every person in a population?
In most cases, it is impossible to survey every person in a population to come up with a distribution of data, so we can never know for sure what a distribution looks like. So we use a sample. A sample is drawn, and we hope that the sample is representative of the population (e.g., Sheldon knocking). The size of the sample is based in large part on the amount of variability in the population, not the “size” of the population.

Generally, when n is greater than or equal to about 30, we assume the sample means are approximately normally distributed. This is based in part on a concept you may recall called the “Central Limit Theorem”: the distribution of sample means is approximately normal regardless of whether the population being sampled is normal.

When we talk about a “normal distribution,” you can picture the bell curve.
• A normal distribution is defined by its mean and std. dev.
• The mean is a measure of location (where is the mean?).
• The std. dev. is a measure of spread or variation.
• A normal distribution underlying our data is a requirement for many statistical procedures.

Lit Eval Note: When you suspect that data may not be normally distributed, you need to look very closely at the statistics used.

Standard Normal Curve (Z-Distribution)
• Has a mean of “0” and a std. dev. of “1”
• The area under the whole distribution is 1
• The area under the curve (AUC) between any two points can be interpreted as the relative frequency of the values included between those points
• Understanding this concept is essential to understanding p and alpha.

Z = (X – μ) / σ
Where: X is the variable to be standardized, μ is the population mean, and σ is the population standard deviation.

Example: Transformation of a distribution with a mean of 50 and a standard deviation of 5

Example: SAT scores
• Population mean μ = 500
• Standard deviation σ = 100
• Approximately 68% of SAT scores fall between 400 and 600
• Approximately 95% of scores fall between 300 and 700

[Figure: Distribution 1 and Distribution 2]

What happens if we do not have a normal distribution of data?
Is it possible to draw conclusions about areas under the curve, as we did with the Z distribution, a symmetrical distribution? It depends… we often still do, depending on how far from normal a distribution may be. Rule of thumb: n = 30.

Alpha
If we choose an alpha level of .05 and did an experiment 100 times, we would expect to get similar results (an observed value equal to or less than the one we got this time) 95 times, and a different result only 5 times.

Alpha is the probability of concluding there is a difference between groups when there really is no difference between them (i.e., you found one of those times when Sheldon knocked something other than his normal 3 times). This is also referred to as a type I error. A statistical result is usually considered statistically significant if the p-value is below the alpha level, conventionally 5%.

So what happens when the p-value is .0512: do you conclude the study has obtained non-significant results? What are the implications of having steadfast rules for statistical significance?

P value
The level of statistical significance. A value of p<0.05 means that, if the null hypothesis were true, a result at least this extreme would occur by chance less than 1 time in 20; such a result meets an alpha level of 0.05. The smaller the p-value, the stronger the evidence against the null hypothesis.
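The area-under-the-curve idea behind the Z distribution, p, and alpha can be checked numerically. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist`; the helper names `z_score` and `area_between` are made up for this example and are not part of the lecture. It reuses the SAT figures from the slide (mean 500, standard deviation 100).

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize a value: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def area_between(lo, hi, mu, sigma):
    """Area under the normal curve between lo and hi (a relative frequency)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)


# SAT example from the slide: mean 500, standard deviation 100.
sat_mu, sat_sigma = 500, 100
within_1sd = area_between(400, 600, sat_mu, sat_sigma)  # roughly 68%
within_2sd = area_between(300, 700, sat_mu, sat_sigma)  # roughly 95%

# Area in both tails beyond |Z| = 1.96 on the standard normal curve.
# This tail area is what an alpha of .05 refers to in a two-sided test.
two_tail = 2 * (1 - NormalDist().cdf(1.96))

print(f"Z for a score of 600: {z_score(600, sat_mu, sat_sigma):.2f}")
print(f"Share between 400 and 600: {within_1sd:.4f}")
print(f"Share between 300 and 700: {within_2sd:.4f}")
print(f"Two-tailed area beyond |Z| = 1.96: {two_tail:.4f}")
```

A score of 600 standardizes to Z = 1, so "between 400 and 600" is the same region as "within one standard deviation of the mean" on the Z distribution.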
Alpha does not change, whereas p-values depend on the actual value of the statistic in question.

When we do research, we set a relatively conservative standard that a researcher must meet in order to claim that s/he has discovered some phenomenon or answered some question. The standard is the alpha level, usually set at .05. Assuming that the null hypothesis is true, this means we may reject the null only if the observed data are so unusual that they would have occurred by chance at most 5% of the time. The smaller the alpha, the more stringent the test (the less likely it is to find a statistically significant result).

Once the alpha level has been set, a statistic (like t or Z) is computed. Each statistic has an associated probability value called a p-value: the likelihood of a statistic at least as extreme as the observed one occurring due to chance, given the sampling distribution. Alpha sets the standard for statistical significance, yes or no: whether or not we can reject the null hypothesis. The p-value indicates how extreme the data actually are.

Alpha is the probability of a Type I error:
• The error of rejecting the null hypothesis when it is really true, or saying something is significant when it is not. We found a difference when there is not really a difference.
• With this kind of error, a drug that does not work could get to market.
• Sometimes referred to as a false positive.

Type II error (beta):
• The probability of concluding that there is no difference between treatment groups when there really is a difference
• The error of accepting the null hypothesis and concluding no difference, when it is actually false and there is a difference. Failing to recognize a real difference.
• Sometimes referred to as a false negative.
• With this type of error, we could keep a potentially life-saving drug off the market.

In research, we generally set both levels in advance of doing the study: alpha (type I) at 0.05 and beta (type II) at 0.20.
What does this tell us?

Power is the ability of a study to detect a significant difference between treatment groups; the probability that a study will have a statistically significant result (p<0.05) when a real difference exists. Power = 1 – beta (the false-negative rate). By convention, adequate study power is usually set at 0.8 (80%). This corresponds to a beta of 0.2 (a false-negative rate of 20%). Power increases as sample size increases. The power of a study should be stated in the methods section of a study report.

Statistical Power
• The probability of correctly rejecting the null hypothesis when it is false (finding a difference when there really is one).
• It has become common to see power reported in clinical studies.
• Think of it as your “confidence” in your results.
• Power is 1 – (Type II error). So, it’s 1 – the chance you got it wrong = the probability you got it right.
• Typically we see alpha 0.05, beta 0.20, power 80%.
• While alpha = 0.05 is an absolute according to most statistical experts, power is not.

Power analysis is used in sample size planning and can be used for hypothesis testing.
• To calculate power for a given sample size, all you need is:
  - The desired alpha level
  - An estimate of how big the effect is in the population
  - An estimate of the variability

Suppose you calculate that you need 1800 subjects to achieve 80% power for a hypertension study you are conducting. After completing the study you end up with 1788 subjects. What implications does this have for the quality of your results? Is the study flawed because you did not achieve sufficient power? (You missed your goal by 12 subjects.)
• The effect size and variability of your results will dictate the power. Depending on the size of these two variables, you may still have reached sufficient power.
• Even if you did not reach .80 or 80%, what are the implications?

Test statistics take the form of:
Test statistic = (some measure of difference) / (some measure of variation)

Student’s t-test is one of many test statistics.
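To get a feel for the 1800-versus-1788 question above, here is a rough sketch of a power calculation. It is not the lecture's method: it assumes a two-sided comparison of two equal-sized groups and uses a normal approximation, and the effect size (`DELTA`, a 2 mmHg difference) and variability (`SIGMA`, 15 mmHg) are hypothetical numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation; the negligible chance of rejecting in the
    wrong tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    se = sigma * sqrt(2 / n_per_group)      # standard error of the difference
    return z.cdf(delta / se - z_crit)


# Hypothetical inputs, not taken from the lecture.
DELTA = 2.0    # assumed true difference in blood pressure, mmHg
SIGMA = 15.0   # assumed standard deviation, mmHg

power_planned = two_sample_power(900, DELTA, SIGMA)  # 1800 subjects in total
power_actual = two_sample_power(894, DELTA, SIGMA)   # 1788 subjects in total

print(f"Power with 1800 subjects: {power_planned:.3f}")
print(f"Power with 1788 subjects: {power_actual:.3f}")
```

Under these assumed inputs, losing 12 subjects lowers power by well under one percentage point. With a different effect size or variability the picture could change, which is the point of the bullets above.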
The t distribution is very similar to the Z distribution, except that the area under the curve is affected by the sample size.
• Z vs. t computationally:
  Z = (x1 – x2) / s.d.
  t = (x1 – x2) / (s.d. / sqrt n)

The t-test is one of the most common statistical tests you will see. It compares mean scores, or compares a mean score to a fixed value, e.g.:
• Average change in blood pressure is more than 10 mmHg
• Average difference in blood pressure change between drug A and drug B is zero

Basically, a t-test is a Z score adjusted for sample size. Generally, all test statistics take the form of “some difference divided by some measure of variation.” For the t-test, it is:
t = (sample mean – comparison value) / (s / sqrt n)

However, the actual test statistic (the calculation) varies with the nature of your study, depending on whether you have “dependent” or “independent” samples.
• Dependent samples are also referred to as “within-subjects” or “repeated measures”; simply put, the same subjects are present in both groups being compared.
• Independent samples compare two separate, unrelated groups.
• Dependence or independence of the samples affects the variability, and this is accounted for in the calculations of the different tests (dependent and independent t-tests).
• Look for recognition of this issue, as this usually means the authors know what they are doing!

Assume:
• Random sample of 100 students taking the statistics exam in 2012
• Mean score = 86, standard deviation = 25
• From 2000 to 2012, the mean score was 80 (population mean)

Question: Did 2012’s students do significantly better on the test than students in previous years?
Null form: There is no difference between the 2012 students’ test scores and those of previous years.
Assume the variable test score has a normal distribution.

Step 1: Hypotheses
• Null hypothesis = H0: μ equals 80
• Alternative hypothesis = HA: μ does not equal 80 (or μ > 80?)
In this example, should we use a 1-sided or 2-sided test?
Step 2: Decide what percent of error we are willing to tolerate (alpha) for incorrectly rejecting our H0
• This is the Type I error (i.e., incorrectly saying that this year’s students did better when in fact they did not)
• Let’s say we are willing to accept a 5% Type I error, or α = 0.05

Step 3: Based on the accepted level of α and the sample size, determine the critical value of the t statistic above which H0 will be rejected
• For α = 0.05 and n = 100,
• Critical value (c.v.) = 1.66 (from the t-distribution table, one-sided or 1-tailed test)

Step 4: Calculate the t statistic
t = (sample mean – population mean) / (s / sqrt n)
t = (86 – 80) / (25 / sqrt 100) = 2.4

Step 5: Since 2.4 (t-calc) > 1.66 (t-table), the hypothesis that the two means are equal is rejected. In other words, the means are different based on this result.

A confidence interval (C.I.) is an interval around our mean that indicates the reliability of the mean.
C.I. = mean ± t (s / sqrt n)
• 95% CI of μ = mean ± 1.96 (s / sqrt n): 95% of all sample means fall within 1.96 standard errors of the population mean
• 99% CI of μ = mean ± 2.58 (s / sqrt n): 99% of all sample means fall within 2.58 standard errors of the population mean

Confidence Intervals:
Assume:
• Sample size n = 100
• Sample mean age = 54.85 years
• Sample standard deviation = 5.50
95% CI = 54.85 ± 1.96 (5.5 / sqrt 100) = (53.77, 55.93)
• This range captures the true value of the mean population age with 95% confidence
• There is a 2.5% chance that the true mean actually lies above 55.93, and a 2.5% chance that it lies below 53.77
• My question was always: “Where does the 1.96 come from?”

Short-term outcomes of an employer-sponsored diabetes management program at an ambulatory care pharmacy clinic. Yoder et al. American Journal of Health-System Pharmacy
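The exam-score t-test and the age confidence interval worked through above can be reproduced in a few lines. This is a minimal sketch using only Python's standard library; the function names `one_sample_t` and `normal_ci` are made up for this example. The critical value 1.66 is copied from the slide rather than computed, because the standard library has no t distribution. The sketch also suggests one answer to "Where does the 1.96 come from?": it is the Z value that leaves 2.5% of the standard normal curve in each tail.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_t(sample_mean, pop_mean, s, n):
    """t = (sample mean - population mean) / (s / sqrt n)."""
    return (sample_mean - pop_mean) / (s / sqrt(n))


def normal_ci(mean, s, n, level=0.95):
    """Confidence interval for a mean using the Z multiplier (large n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * s / sqrt(n)
    return mean - half_width, mean + half_width


# Exam example: n = 100, sample mean 86, s = 25, population mean 80.
t_calc = one_sample_t(86, 80, 25, 100)
T_CRIT = 1.66  # one-sided critical value quoted on the slide (not computed here)
reject_null = t_calc > T_CRIT

# The multipliers used in the 95% and 99% confidence intervals.
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)

# Age example: n = 100, mean 54.85 years, s = 5.50.
ci_low, ci_high = normal_ci(54.85, 5.50, 100)

print(f"t = {t_calc:.2f}, reject H0: {reject_null}")
print(f"95% multiplier: {z95:.2f}, 99% multiplier: {z99:.2f}")
print(f"95% CI for mean age: ({ci_low:.2f}, {ci_high:.2f})")
```

For small samples the multiplier would come from the t distribution instead of Z, which is why the general formula above is written with t.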