Data Analysis

A Few Necessary Terms
• Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)
• Continuous Variable: Measurements along a continuum, such as Flow Velocity
• What type of variable would “Mottled Sculpin/meter²” be?
• What type of variable is “Substrate Type”?
• What type of variable is “% of bank that is undercut”?

A Few Necessary Terms
• Explanatory Variable: Independent variable. Plotted on the x-axis. The variable you use as a predictor.
• Response Variable: Dependent variable. Plotted on the y-axis. The variable that is hypothesized to depend on, or be predicted by, the explanatory variable.

Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
• T-test: A categorical explanatory variable with 2 options.
• ANOVA: A categorical explanatory variable with more than 2 options.
• Regression: A continuous explanatory variable.

Statistical Tests
• Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (Ho) against an Alternative Hypothesis (Ha).
• Test Statistic: A single number calculated from the sample data (such as t or F) that is used to obtain the p-value.
• p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
• Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
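The test-selection rules and the p-value definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `choose_test` and `permutation_p_value` are hypothetical, the density values are made up, and the p-value is estimated with a label-shuffling (permutation) test rather than a Student's t-test, because shuffling mirrors the definition "probability of data this extreme if the null hypothesis is correct" most directly.

```python
import random
from statistics import mean


def choose_test(explanatory_is_categorical, n_groups=None):
    """Pick a test using the rules above (response variable assumed continuous)."""
    if not explanatory_is_categorical:
        return "regression"
    if n_groups is None or n_groups < 2:
        raise ValueError("a categorical explanatory variable needs at least 2 groups")
    return "t-test" if n_groups == 2 else "ANOVA"


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels.

    If the null hypothesis is correct, the group labels do not matter, so we
    shuffle them and count how often the shuffled difference in means is at
    least as extreme as the observed difference.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical sculpin densities (fish per square meter) at two sites.
site_a = [0.42, 0.38, 0.45, 0.40, 0.47, 0.41]
site_b = [0.21, 0.25, 0.19, 0.28, 0.22, 0.24]

alpha = 0.05  # the usual significance cutoff
p = permutation_p_value(site_a, site_b)
print("test to use:", choose_test(True, n_groups=2))
print("estimated p-value:", p)
print("reject Ho" if p < alpha else "fail to reject Ho")
```

Because the two made-up samples do not overlap at all, very few shuffles reproduce a difference as large as the observed one, so the estimated p-value should land well below 0.05.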
Student’s T-Test
Tests the statistical significance of the difference between means from two independent samples.
Compares the mean of the response variable between the 2 groups of a categorical variable.
[Figure: Mottled Sculpin/m² at Cross Plains and Salmo Pond]

Precautions and Limitations
• Meet Assumptions
  • Observations come from data with a normal distribution (check with a histogram)
  • Samples are independent
  • Equal variance is assumed (check with a boxplot)
  • No other sample biases
• Interpreting the p-value

Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent samples.
[Figure: Mottled Sculpin/m² by reach type (Riffle, Pool, Run), with the Grand Mean marked]
ANOVA website

Precautions and Limitations
• Meet Assumptions
  • Observations come from data with a normal distribution
  • Samples are independent
  • Equal variance is assumed
  • No other sample biases
• Interpreting the p-value
• Pairwise t-tests to follow

Simple Linear Regression
• What is it? Least squares line
• When is it appropriate to use?
• Assumptions?
• What does the p-value mean? The R-squared value?
• How to do it in Excel

Simple Linear Regression
Tests the statistical significance of a relationship between two continuous variables, Explanatory and Response.
[Figure: scatterplot of Brown Trout/m² (y-axis) against Mottled Sculpin/m² (x-axis) with the least squares line, R² = 0.6955]

Precautions and Limitations
• Meet Assumptions
  • Observations come from data with a normal distribution
  • Samples are independent
  • Equal variance is assumed
  • Relationship is linear
  • No other sample biases
• Interpret the p-value and R-squared value

Residual Plots
• Residuals are the vertical distances from the observed points to the best-fit line.
• Residuals always sum to zero (for a least squares line fitted with an intercept).
• Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
[Figure: the Brown Trout/m² vs. Mottled Sculpin/m² scatterplot (R² = 0.6955) with the residuals marked]

Residual vs. Fitted Value Plots
[Figure: Residuals (y-axis) plotted against Fitted Values (MS_CPUA) (x-axis). The points are the Observed Values; the zero line represents the Model Values.]

Residual Plots Can Help Test Assumptions
• “Normal” Scatter
• Fan Shape: Unequal Variance
• Curve (linearity)

Have we violated any assumptions?
[Figure: the Brown Trout/m² vs. Mottled Sculpin/m² scatterplot (R² = 0.6955) next to its Residuals vs. Fitted Values (MS_CPUA) plot]

R-Squared and P-value
• High R-Squared, low p-value: significant relationship
• Low R-Squared, low p-value: significant relationship
• High R-Squared, high p-value: NO significant relationship
• Low R-Squared, high p-value: NO significant relationship

• P-value indicates the strength of the evidence that a relationship exists between the two variables. It does not measure how strong that relationship is.
• R-Squared indicates how much of the variance in the response variable is explained by the explanatory variable. You can think of this as a measure of predictability.
  • If this is low, other variables likely play a role.
  • If this is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP! Check the p-value.
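The least squares line, its residuals, and the difference between R-squared and the p-value can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the helper names `fit_line` and `slope_p_value` are hypothetical, both datasets are invented, and the p-value is estimated by shuffling the response values (a permutation test), not by the formula-based p-value that a standard regression output reports.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least squares line. Returns (slope, intercept, residuals, r_squared)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus fitted
    ss_res = sum(r ** 2 for r in residuals)             # what the fit minimizes
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, residuals, r_squared


def slope_p_value(x, y, n_shuffles=1000, seed=0):
    """Permutation estimate: how often does shuffled y give a slope this steep?"""
    rng = random.Random(seed)
    observed = abs(fit_line(x, y)[0])
    shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(fit_line(x, shuffled)[0]) >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical data: only 3 points, lying almost exactly on a line.
few_x = [0.1, 0.2, 0.3]
few_y = [0.10, 0.16, 0.18]

# Hypothetical data: 200 noisy points with a weak upward trend.
rng = random.Random(1)
many_x = [rng.uniform(0.0, 0.5) for _ in range(200)]
many_y = [0.3 * xi + rng.gauss(0.0, 0.08) for xi in many_x]

for label, xs, ys in [("few points", few_x, few_y), ("many points", many_x, many_y)]:
    slope, intercept, residuals, r2 = fit_line(xs, ys)
    p = slope_p_value(xs, ys)
    print(f"{label}: slope={slope:.3f}  R^2={r2:.3f}  p~{p:.3f}  "
          f"sum of residuals={sum(residuals):.1e}")
```

The three-point dataset has an R-squared of about 0.92, yet with only six possible orderings of the response values its permutation p-value cannot drop anywhere near 0.05, which is the "high R-squared, high p-value" case. The larger noisy dataset should show the opposite pattern: a low R-squared with a small p-value. In both fits the residuals sum to zero up to rounding error.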