Hypothesis Testing III
MARE 250 – Dr. Jason Turner

To ASSUME is to make an…
Four assumptions for means-test hypothesis testing:
1. Random samples
2. Independent samples
3. Normal populations (or large samples)
4. Equal variances (standard deviations)

Significance Level
The probability of making a TYPE I error (rejecting a true null hypothesis) is called the significance level (α) of a hypothesis test.
The TYPE II error probability (β) is the probability of not rejecting a false null hypothesis.
For a fixed sample size, the smaller we specify the significance level (α), the larger the probability (β) of not rejecting a false null hypothesis.

Significance Level
                         If H0 is true     If H0 is false
If H0 is rejected        TYPE I ERROR      No error
If H0 is not rejected    No error          TYPE II ERROR

I have the POWER!!!
The power of a hypothesis test is the probability of NOT making a TYPE II error – that is, of rejecting a false null hypothesis (finding sufficient evidence to support the alternative hypothesis).
POWER = 1 – β
Plotting power against sample size produces a power curve.

We need more POWER!!!
For a fixed significance level, increasing the sample size increases the power.
Therefore, you can run a test to determine if your sample size HAS THE POWER!!!
By using a sufficiently large sample size, we can obtain a hypothesis test with as much power as we want.

Increasing the power of the test
There are four factors that can increase the power of a means test:
1. Larger effect size (difference) – The greater the real difference between the two populations, the more likely it is that the sample means will also differ.
2. Higher α-level (level of significance) – If you choose a higher value for α, you increase the probability of rejecting the null hypothesis, and thus the power of the test. (However, you also increase your chance of a Type I error.)
3. Less variability – When the standard deviation is smaller, smaller differences can be detected.
4. Larger sample sizes – The more observations there are in your samples, the more confident you can be that the sample means represent μ for the two populations. Thus, the test will be more sensitive to smaller differences.

Calculating Power
Power – the probability of being able to detect an effect of a given size.
Sample size – the number of observations in each sample.
Difference (effect) – the difference between μ for one population and μ for the other.

Calculating Power
For a t-test – provide the difference (between means) and the standard deviation (the larger of the two):
- If you enter the sample size – you get the power
- If you enter the power – you get the required sample size

Calculating Power
For an ANOVA – similar, but also provide the number of levels, along with the difference (between means) and the standard deviation (largest):
- If you enter the sample size – you get the power
- If you enter the power – you get the required sample size
- The sample size is calculated per level

Increasing the power of the test
The most practical way to increase power is often to increase the sample size.
However, you can also try to decrease the standard deviation by making improvements in your process or measurement.

Sample size
Increasing the size of your samples increases the power of your test.
However, in the real world this is also a function of:
1. Time
2. Money
3. Logistics
4. Reality

Data Transformations
One advantage of using parametric statistics is that they make it much easier to describe your data.
If you have established that your data follow a normal distribution, you can be sure that a particular set of measurements is properly described by its mean and standard deviation.
If your data are not normally distributed, you cannot use any of the tests that assume normality (e.g.
ANOVA, t-test, regression analysis).

Data Transformations
If your data are not normally distributed, it is often possible to normalize them by transforming them.
Transforming data so that you can use parametric statistics is completely legitimate.

Data Transformations
People often feel uncomfortable when they transform data because it seems to artificially improve their results – but this is only because they feel comfortable with linear (arithmetic) scales.
However, there is no reason not to use other scales (e.g. logarithms, square roots, reciprocals, or angles) where appropriate (see Chapter 13).

Data Transformations
Different transformations work for different data types:
Logarithm: Growth rates are often exponential, and log transforms will often normalize them. Log transforms are particularly appropriate if the variance increases with the mean.
Reciprocal: If a log transform does not normalize your data, you could try a reciprocal (1/x) transformation. This is often used for enzyme reaction-rate data.

Data Transformations
Square root: This transform is often of value when the data are counts, e.g. # urchins, # Honu. A square-root transform will convert data with a Poisson distribution to an approximately normal distribution.
Arcsine: Also known as the angular transformation; it is especially useful for percentages and proportions.

Which Transformation?
The Johnson Transformation is useful when the collected data are non-normal but you want to apply a methodology that requires a normal distribution.
It is a MINITAB program – not a TEST!

Which Transformation?
The Johnson Transformation should be used as a first step, before you transform data "by hand."
Why?
1) It's quick and easy (point and click)
2) It runs a variety of very complex data-transformation functions
3) However, it only runs LOG- and ARCSINE-based equations
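Before the Minitab point-and-click steps, here is a rough illustration of the underlying idea – try candidate transformations and keep the one that best passes a normality test. This is a sketch only, assuming Python with scipy; it uses this lecture's four "by hand" transforms and the Shapiro–Wilk test, NOT Minitab's actual Johnson (SB/SL/SU) curve fitting.

```python
import numpy as np
from scipy import stats

def best_transform(x, alpha=0.05):
    """Illustrative only: try the lecture's 'by hand' transforms and
    return the one with the highest Shapiro-Wilk normality p-value.
    (This is NOT the algorithm Minitab's Johnson Transformation uses.)"""
    x = np.asarray(x, dtype=float)
    candidates = {
        "none":       x,
        "log":        np.log(x) if (x > 0).all() else None,
        "reciprocal": 1.0 / x if (x != 0).all() else None,
        "sqrt":       np.sqrt(x) if (x >= 0).all() else None,
        # arcsine only makes sense for proportions in [0, 1]
        "arcsine":    np.arcsin(np.sqrt(x)) if ((x >= 0) & (x <= 1)).all() else None,
    }
    results = {}
    for label, xt in candidates.items():
        if xt is None:
            continue  # transform not applicable to these values
        _, p = stats.shapiro(xt)
        results[label] = (p, xt)
    label, (p, xt) = max(results.items(), key=lambda kv: kv[1][0])
    return label, p, p > alpha, xt

# Hypothetical example: strongly right-skewed (lognormal) data
rng = np.random.default_rng(1)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=200)
name, p, passed, transformed = best_transform(raw)
print(f"best transform: {name}, Shapiro-Wilk p = {p:.3f}")
# the log transform should be selected for this lognormal sample
```

Like Minitab's output, this reports which transformation was applied and a normality p-value for the transformed data, so you can judge whether the transformation "worked."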
How To?
STAT – Quality Tools – Johnson Transformation
Enter which variable is to be transformed, and what the "new" transformed variable will be called.
Minitab places the transformed data into a new column in your MINITAB datasheet – you can copy this into Excel and save FOREVER…

Johnson Transformation
How do I know if it worked?
If the Johnson Transformation program is successful it will:
1) Transform the data and report which transformation it ran (the formula)
2) Run a normality test to verify
3) Provide you with the transformed data (if you ask for it)
4) Produce an output with 3 graphs

Johnson Transformation
[Example output: "Johnson Transformation for Males" – 3 graphs. Probability plot for the original data: N = 31, AD = 0.876, P-Value = 0.022 (non-normal). Probability plot for the transformed data: N = 31, AD = 0.176, P-Value = 0.915 (normal). Best transformation type: SL, Z for best fit = 0.96; transformation function = −10.8734 + 2.58295 * Log(X + 5.34335) – i.e., it did a LOG transformation.]

Johnson Transformation
How do I know if it worked?
If the Johnson Transformation program is NOT successful it will:
1) Tell you it failed to find a data transformation that passed the normality test
2) Produce an output with only 2 graphs

Johnson Transformation
[Example output: 2 graphs – the program did not transform the data.]

Then What?
If the Johnson Transformation program was unsuccessful at transforming your data to meet parametric assumptions, then run the data transformations "by hand."
There are several; I am teaching you 4:
1) Log
2) Reciprocal
3) Square Root
4) Arcsine

Then What?
Calculate these in your working Excel file:
1) Make new column headers
2) Insert the function in the cell below each header
3) Enter the cell number for the first datapoint
4) Copy the cell and paste/fill down
5) Wash, rinse, repeat…for the other 3 variables
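The same four "by hand" columns can also be generated outside Excel in one step. A minimal sketch, assuming Python with numpy; the data values are hypothetical stand-ins for your own columns (counts for log/reciprocal/square root, proportions for arcsine), and the Excel formulas in the comments are the usual worksheet equivalents:

```python
import numpy as np

# Hypothetical example data, standing in for your Excel columns:
counts = np.array([3.0, 7.0, 12.0, 18.0, 25.0, 41.0])          # e.g. # urchins
proportions = np.array([0.10, 0.25, 0.40, 0.55, 0.80, 0.95])   # e.g. % cover

# The four "by hand" transformations from the lecture:
log_t = np.log10(counts)                     # log        (Excel: =LOG10(A2))
recip_t = 1.0 / counts                       # reciprocal (Excel: =1/A2)
sqrt_t = np.sqrt(counts)                     # square root (Excel: =SQRT(A2))
arcsine_t = np.arcsin(np.sqrt(proportions))  # arcsine    (Excel: =ASIN(SQRT(A2)))

# "Fill down" happens automatically: each line transforms the whole column
for label, col in [("log", log_t), ("reciprocal", recip_t),
                   ("sqrt", sqrt_t), ("arcsine", arcsine_t)]:
    print(label, np.round(col, 3))
```

Note that the arcsine (angular) transformation expects proportions between 0 and 1, so divide percentages by 100 first.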