ARE OBSERVATIONS OBTAINED DIFFERENT?

• You use different statistical tests for different problems.
• We will examine some basic tests (χ², t-test, regression, ANOVA, ANCOVA).
• We expect you to use these basic tests in your research.
• Your research project should not be so complicated that more advanced tests are required.
• Always state your hypothesis – what you are testing.

BASIC PREMISE OF STATISTICAL TESTING:

Null hypothesis: the coin is fair.
Toss a coin 100 times.
A fair coin: mean x̄ = 50 heads, sd = √(½ × ½ × 100) = 5 heads.
You observe 60 heads. Is the coin fair?
Distance from the mean = (60 – 50)/5 = 2 sd.
A deviation of 2 sd or more in either direction has about a 5% chance; here it is in one direction only, so about a 2.5% chance (5%/2).
[Figure: frequency distribution of the proportion of heads]

            NULL HYPOTHESIS    NULL HYPOTHESIS
            ACCEPTED           REJECTED
TRUE        CORRECT            TYPE I ERROR
FALSE       TYPE II ERROR      CORRECT

What if you set the probability threshold for claiming the coin is unfair at 5%?
What if you set it at 25%?

NONPARAMETRIC TESTS: (data do not have to be normally distributed)

Data must be counts, and you test the proportional distribution of counts.
Null hypothesis: no difference in proportion of red among strata.

χ² CONTINGENCY TABLE:

Observed counts, with expected counts in parentheses:

SPECIES     STRATA #1    STRATA #2    STRATA #3    TOTAL
RED         8 (2.94)     1 (3.24)     1 (3.82)     10
NOT RED     2 (7.06)     10 (7.76)    12 (9.18)    24
TOTAL       10           11           13           34

Proportion red: #1 = 8/10; #2 = 1/11; #3 = 1/13
Expected for each cell = (R × C)/TOTAL, where R is the row total and C is the column total
χ² = Σ (O – E)²/E = 8.71 + 3.63 + 1.55 + 0.65 + 2.08 + 0.87 = 17.49
P < 0.001; df = (r – 1)(c – 1) = 2

χ² CONTINGENCY TABLE:
Make a spreadsheet with the table categories and the count in each, then have MYSTAT use the counts as frequencies (Data … Case weighting … By frequencies).
Depending on the table, use One-way frequency tables (one category, e.g., tree type) or Tables (more than one category, e.g., tree type and strata) in Analyze in MYSTAT.

PARAMETRIC TESTS (data are normally distributed)

Data do not have to be counts.
Easier to see differences (more powerful) than nonparametric statistics.
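The χ² contingency-table arithmetic (expected counts, then the sum of (O – E)²/E) can be checked by hand or with a few lines of code. The following is a minimal sketch in Python, not the MYSTAT procedure; the function name `chi_square` is a hypothetical helper, and the closed-form p value is valid only because this table has 2 degrees of freedom.

```python
import math

# Observed counts from the slide: rows = RED, NOT RED; columns = strata #1, #2, #3.
observed = [
    [8, 1, 1],
    [2, 10, 12],
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts).

    Hypothetical helper written for this illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell = (row total x column total) / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Sum (O - E)^2 / E over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


chi2, df, expected = chi_square(observed)

# Assumption: with exactly 2 degrees of freedom the chi-square upper-tail
# probability has the closed form exp(-x / 2); this shortcut does not hold
# for other df values.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
```

Without intermediate rounding the statistic comes out near 17.47; the 17.49 on the slide is the sum of terms computed from expected counts rounded to two decimals. Either way P < 0.001.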
Null hypothesis: no difference in proportion of red between strata #1 and #2.
[Figure: frequency distributions of proportion red for strata #1, #2 and #3]
Proportion red (mean ± sd): #1 = 0.79 ± 0.25; #2 = 0.08 ± 0.17; #3 = 0.08 ± 0.17

T-TEST:
t = (x̄1 – x̄2) √[n1n2/(n1 + n2)] / √{[(n1 – 1)s1² + (n2 – 1)s2²]/(n1 + n2 – 2)}
t = [(0.71)(1.41)]/0.214 = 4.68
P < 0.005, degrees of freedom = n1 + n2 – 2 = 6

T-TEST: Use Hypothesis testing in Analyze in MYSTAT for means.

EVEN MORE POWERFUL IF A PRIORI BASIS TO PAIR OBSERVATIONS.

Null hypothesis: no difference in relative abundance of red between strata #1 and #2 for matched plots based on similarity.

PAIRED T-TEST:
Pairs (#1 – #2): 0.5 – 0 = 0.5; 1.0 – 0 = 1.0; 1.0 – 0.33 = 0.67; 0.67 – 0 = 0.67
Mean difference = 0.71, sd = 0.21
t = 0.71/(0.21/√4) = 6.76
P < 0.01, degrees of freedom = n – 1 = 3

Now use numbers, not proportions.

Null hypothesis: no difference in absolute abundance of red between strata #1 and #2.

T-TEST:
Strata #1: mean = 2.0, sd = 0.82, n = 4
Strata #2: mean = 0.25, sd = 0.5, n = 4
t = [(2 – 0.25)(1.41)]/0.68 = 3.63
P < 0.01, degrees of freedom = 6

STATISTICAL TESTS

Null hypothesis: there is no relationship between red and blue + green in plots.
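The three t statistics worked out for strata #1 and #2 (two-sample on proportions, paired on proportions, two-sample on counts) can be recomputed from the numbers on the slides. This is a minimal sketch in Python using only the standard library; `pooled_t` and `paired_t` are hypothetical helper names, and the inputs are the rounded means, standard deviations and pair differences given on the slides.

```python
import math
from statistics import mean, stdev


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics.

    Hypothetical helper; returns (t, degrees of freedom).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t = (mean1 - mean2) * math.sqrt(n1 * n2 / (n1 + n2)) / math.sqrt(pooled_var)
    return t, n1 + n2 - 2


def paired_t(differences):
    """Paired t statistic from the within-pair differences.

    Hypothetical helper; returns (t, degrees of freedom).
    """
    n = len(differences)
    t = mean(differences) / (stdev(differences) / math.sqrt(n))
    return t, n - 1


# Proportion red, strata #1 vs. #2 (mean, sd, n as given on the slide).
t_prop, df_prop = pooled_t(0.79, 0.25, 4, 0.08, 0.17, 4)

# Paired differences in proportion red (#1 minus #2) for the four matched plots.
t_pair, df_pair = paired_t([0.5, 1.0, 0.67, 0.67])

# Absolute number of red per plot, strata #1 vs. #2.
t_count, df_count = pooled_t(2.0, 0.82, 4, 0.25, 0.5, 4)

print(f"proportions: t = {t_prop:.2f}, df = {df_prop}")
print(f"paired:      t = {t_pair:.2f}, df = {df_pair}")
print(f"counts:      t = {t_count:.2f}, df = {df_count}")
```

The results should land within a few hundredths of the slide values (4.68, 6.76 and 3.63); the small differences arise because the slides round intermediate quantities such as √2 to 1.41. Note that pairing raises t but lowers the degrees of freedom from 6 to 3.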
REGRESSION ANALYSIS:
[Figure: scatter plot of RED (y axis) against BLUE or GREEN (x axis) for plots in strata #1, #2 and #3]
RED = 2.33 – 0.75(BLUE or GREEN)
r² = 0.75, r = -0.88
Degrees of freedom = 12 – 2 = 10
P < 0.001

REGRESSION ANALYSIS: Use Regression … Linear … Least squares in Analyze in MYSTAT. Select dependent (y) and independent (x) variables.

PARAMETRIC TESTS (data are normally distributed)
WHAT IF MULTIPLE COMPARISONS OF A CATEGORY (ANOVA)

Null hypothesis: no difference in relative abundance of red among all strata.
Three possible t-test comparisons: #1 vs. #2; #1 vs. #3; #2 vs. #3
PROBLEM: As the number of comparisons increases, the likelihood of finding at least one significant difference by chance increases. ANOVA takes this into account to compare differences in mean values.

1-WAY ANOVA:
F = 19.75; df = 2, 9 (strata – 1, samples – strata); p < 0.001

ANOVA: Use Analysis of variance … Estimate model in Analyze in MYSTAT. Select continuous dependent (y) variable and categorical independent (x) variables.

MULTIPLE COMPARISONS (ANOVA): (Which specific differences are significant?)

Post-hoc analysis: Must compensate for the number of comparisons and the fact that a difference is already known to be significant.
Bonferroni test: (t-test adjusted for # of comparisons)
#1 vs. #2: p < 0.001
#1 vs. #3: p < 0.001
#2 vs. #3: p < 1.0

ANOVA – POST HOC: (cannot do with MYSTAT, but can with SYSTAT)
Use Analysis of variance … Estimate model … Hypothesis test in Analyze in SYSTAT.

MULTIPLE COMPARISONS (ANOVA): (several independent categorical variables)

Null hypothesis: no difference in relative abundance of red between strata and with distance into the woods.
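The multiple-comparisons problem noted under ANOVA can be made concrete with a short calculation. The sketch below, in Python, assumes the comparisons are independent (a simplification: the three pairwise t-tests among strata share data, so the true rate differs somewhat) and uses hypothetical helper names. It also shows the usual Bonferroni adjustment of dividing the significance level by the number of comparisons.

```python
def pairwise_count(groups):
    """Number of distinct pairwise comparisons among `groups` groups."""
    return groups * (groups - 1) // 2


def familywise_error(alpha, comparisons):
    """Chance of at least one false positive across `comparisons` tests.

    Assumes the tests are independent, which is only an approximation
    for pairwise t-tests that reuse the same strata.
    """
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level after a Bonferroni adjustment."""
    return alpha / comparisons


k = pairwise_count(3)              # three strata give three comparisons
fwer = familywise_error(0.05, k)   # chance of at least one spurious result
per_test = bonferroni_alpha(0.05, k)

print(f"{k} comparisons: familywise error = {fwer:.3f}, "
      f"Bonferroni per-test alpha = {per_test:.4f}")
```

With three comparisons at the 5% level, the chance of at least one spurious "significant" difference is already about 14%, which is why a single ANOVA followed by an adjusted post-hoc test is preferred to three separate t-tests.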
TWO-WAY ANOVA:
[Figure: red by strata (#1, #2, #3) and distance from edge (near, far)]
Strata: F = 15.65; df = 2,6; p < 0.005
Distance: F = 0.12; df = 1,6; p < 0.74
Strata × Distance interaction: F = 0.51; df = 2,6; p < 0.63
COULD HAVE N-WAY ANOVA, BUT YOUR PROJECT SHOULD NOT EXCEED A 2-WAY.

THE INTERACTION TERM'S MEANING (no variety)

              LOCATION
SEASON     A     B     C     mean
I          1     2     3     2
II         2     2     2     2
III        3     2     1     2
mean       2     2     2     2

NO MAIN EFFECTS (SEASON or LOCATION – no differences)
INTERACTION IS SIGNIFICANT (greatest at A:III and C:I)

THE INTERACTION TERM'S MEANING (wider variety)

              LOCATION
SEASON     A     B     C     mean
I          1     2     3     2
II         4     5     6     5
III        7     8     9     8
mean       4     5     6     5

MAIN EFFECTS (SEASON or LOCATION – differences)
NO INTERACTION (highest always in C and III)

MULTIPLE COMPARISONS (ANCOVA): (several independent variables: one categorical and one continuous)

Null hypothesis: no difference in relative abundance of red with blue + green and distance into the woods (assume equal slopes).
[Figure: RED (#/plot) against BLUE + GREEN (#/plot), plotted separately for FAR and NEAR distance from edge]

ANCOVA:
Blue + Green: F = 36.10; df = 1,9; p < 0.0002
Distance: F = 0.78; df = 1,9; p < 0.40
Interaction (slope): F = 0.08; df = 1,8; p ≈ 0.78
COULD HAVE N-WAY ANCOVA.

ANCOVA: Use Analysis of variance … Estimate model in Analyze in MYSTAT. In SYSTAT, use General linear model … Estimate model in Analyze. Select continuous dependent (y) variable, categorical independent (x1) variable and covariate (x2). In SYSTAT, create an interaction term to test slope.

DATA TRANSFORMATIONS
(can normalize data or make them continuous so parametric statistics can be used, or make data linear for regression)

• Data are not always normally distributed, but a transformation may make them normal (e.g., log). If they cannot be normalized, then you must use nonparametric statistics (less powerful).
• Data are not always continuous. Percentages and proportions are not continuous because they cannot be less than 0 or greater than 100 or 1. To make them continuous from 0 to infinity or from –infinity to +infinity, you can use transforms:
    arcsine transform = arcsin(√proportion)
    logarithmic transform = log(proportion)*
    logit transform = log[proportion/(1 – proportion)]*
  This stretches both tails and compresses the peak to approximate a continuous normal distribution.
  * If some proportions = 0 or 1, then add a small constant to all values (e.g., 0.001).
• Data for regression are not always linear. Various transformations, especially log x, log y or both, can transform a curve into a straight line. What do logarithmic transforms imply about the linear function?

DATA TRANSFORMATIONS: Use Data … Transform … Let in MYSTAT.

ARE OBSERVATIONS OBTAINED DIFFERENT?

• Different statistical tests for different problems.
• You will use these basic tests in your research (χ², t-test, regression, ANOVA, ANCOVA).
• Your research project should not be so complicated that more advanced tests are required.
• Always graph your data and state your hypothesis.

Meadow vole (Microtus pennsylvanicus)
Yellow-bellied marmot (Marmota flaviventris)
UNDERC-WEST (National Bison Range)

USE MYSTAT WITH DATA FILES CREATED LAST WEEK
(be sure to set 6 decimal places in Edit … Options … Output in MYSTAT so p values are exact)

WITH MYSTAT ANSWER THESE QUESTIONS: (you will use χ², regression, t-test, 2-way ANOVA, ANCOVA)

• Does snap-trapping lead to a sex bias in Microtus?
• What is the relationship between length and mass for Microtus? (hint: need to use Data … Transform … Let)
• Do Microtus and Marmota exhibit similar length and mass growth relationships? (hint: think about question above)
• Does Marmota mass vary with month? Explain ecologically what you see.
• Does reproductive status of female Microtus differ with mass? Why do you observe this? (hint: need to use Data … Select cases)
• Does the reproductive status of male and female Microtus with mass differ?

Due in two weeks!
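For reference when working on the questions, the proportion transforms listed under DATA TRANSFORMATIONS can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MYSTAT Let command: the function names are hypothetical, "log" is taken to mean the natural logarithm, and the example proportions are the plot values used in the paired t-test. The logit is applied here only to values strictly between 0 and 1.

```python
import math


def arcsine_transform(p):
    """Arcsine (angular) transform: arcsin of the square root of a proportion."""
    return math.asin(math.sqrt(p))


def log_transform(p, constant=0.0):
    """Natural-log transform; `constant` is the small value added when zeros occur."""
    return math.log(p + constant)


def logit_transform(p):
    """Logit transform: log of the odds p / (1 - p); requires 0 < p < 1."""
    return math.log(p / (1 - p))


# Proportion red per plot (values from the paired t-test slide).
stratum1 = [0.5, 1.0, 1.0, 0.67]
stratum2 = [0.0, 0.0, 0.33, 0.0]

# The arcsine transform is defined at 0 and 1, so no constant is needed.
arcsine1 = [arcsine_transform(p) for p in stratum1]
arcsine2 = [arcsine_transform(p) for p in stratum2]

# The log transform is undefined at 0, so add the small constant to all values.
log2 = [log_transform(p, constant=0.001) for p in stratum2]

# Logit shown only for proportions strictly between 0 and 1.
logits = [logit_transform(p) for p in (0.33, 0.5, 0.67)]

print("arcsine #1:", [round(v, 3) for v in arcsine1])
print("arcsine #2:", [round(v, 3) for v in arcsine2])
print("log #2:    ", [round(v, 3) for v in log2])
print("logit:     ", [round(v, 3) for v in logits])
```

The arcsine transform maps proportions onto the range 0 to π/2, while the logit maps 0.5 to 0 and sends values near 0 or 1 toward large negative or positive numbers.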