Jennifer Siegel
Statistical background: Z-tests, t-tests, ANOVAs

Why statistics?
- Science tries to predict the future: is an observed result a genuine effect?
- We try to strengthen our predictions with statistics, using a p-value to indicate our level of certainty that a result reflects a genuine effect in the whole population (more on this later...).

Hypotheses
- Develop an experimental hypothesis:
- H0 = null hypothesis (no effect)
- H1 = alternative hypothesis (there is an effect)

Statistically significant results
- The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true.
- The conventional significance level is .05 (5%): if p < .05 we reject H0 and treat the effect as genuine, accepting a 5% risk of being wrong.
- Type 1 error = false positive; Type 2 error = false negative.
- Confidence = 1 − probability of a Type 1 error, which is why at the .05 level we say we are "95% certain" the experimental effect is genuine.

An example theory
- Let's pretend you came up with the following theory: having a baby increases brain volume (associated with possible structural changes).

Z-test vs t-test
- A z-test compares a score (or a sample mean) against a known population: z = (x − μ) / σ, where μ and σ are the population mean and standard deviation.
- In practice we can rarely test a whole population: it costs too much, we are not able to include everyone, it is too time-consuming, and people have an ethical right to privacy.
- Realistically, researchers can only do sample-based studies, which is where the t-test comes in.

The t-test
- t = difference between sample means / estimated standard error of the difference between means
- Degrees of freedom = sample size − 1
- For two samples: t = (x̄₁ − x̄₂) / s(x̄₁−x̄₂), where s(x̄₁−x̄₂) = sqrt(s₁²/n₁ + s₂²/n₂)

Worked example: brain volume and giving birth
- H0: there is no difference in brain size before and after giving birth.
- H1: the brain is significantly smaller or significantly larger after giving birth (a difference is detected).

        Before delivery   6 weeks after delivery   Difference
        1437.4            1494.5                   57.1
        1089.2            1109.7                   20.5
        1201.7            1245.4                   43.7
        1371.8            1383.6                   11.8
        1207.9            1237.7                   29.8
        1150.7            1180.1                   29.4
        1221.9            1268.8                   46.9
        1208.7            1248.3                   39.6
  Sum   9889.3            10168.1                  278.8
  Mean  1236.1625         1271.0125                34.85
  SD    113.8544928       119.0413426              5.18685*

  * Since the same women are measured twice, this is a repeated-measures (paired) t-test: t = mean difference / standard error of the mean difference = 34.85 / 5.19 ≈ 6.72, with df = n − 1 = 7 (the 5.19 in the table is best read as that standard error).

- t(7) ≈ 6.72 is well beyond the critical value at the .05 level, so we reject H0: women have a significantly larger brain six weeks after giving birth.
- Critical values and p-values can be looked up at http://www.danielsoper.com/statcalc/calc08.aspx

Types of t-test
- One-sample (sample vs. hypothesised mean)
- Independent groups (2 separate groups)
- Repeated measures (same group, different measures); a code sketch of the worked example above is given at the end of this part.

ANOVA (ANalysis Of VAriance)
- Factor = what is being compared (e.g. type of pregnancy); levels = the different elements of a factor (e.g. age of mother).
- The test produces an F-statistic, usually followed by post hoc testing.
- 1-way ANOVA: 1 factor with more than 2 levels.
- Factorial ANOVA: more than 1 factor.
- Mixed-design ANOVA: some factors are independent, others are related (repeated).
- A significant F tells you there is a significant difference somewhere between the groups, NOT where the difference lies. Finding exactly where the difference lies requires further statistical analysis = post hoc analysis (a one-way ANOVA sketch is also given at the end of this part).

Summary
- Z-tests are for populations, t-tests are for samples, and ANOVAs compare more than 2 groups in more complicated scenarios.
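As a check on the worked example, here is a minimal Python sketch (assuming NumPy and SciPy are available) that runs the same repeated-measures t-test on the brain-volume data from the table above; the result agrees closely with the t ≈ 6.72 reported on the slide.

```python
import numpy as np
from scipy import stats

# Brain volumes for the same 8 women, taken from the table above (units as on the slide)
before = np.array([1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7])
after  = np.array([1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3])

diff = after - before
print("mean difference:", diff.mean())                              # ~ 34.85
print("SE of mean difference:", diff.std(ddof=1) / np.sqrt(len(diff)))

# Repeated-measures (paired) t-test: same women measured twice
t, p = stats.ttest_rel(after, before)
print(f"t({len(diff) - 1}) = {t:.2f}, p = {p:.4f}")                  # t(7) ~ 6.7, p < .001
```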
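And a minimal one-way ANOVA sketch, also in Python with SciPy. The three groups here are purely illustrative (randomly generated), not data from the presentation; the point is only the shape of the analysis: one factor with three levels, an omnibus F-test, and then post hoc comparisons if F is significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative only: one factor with three levels (e.g. three age-of-mother groups),
# scores drawn at random just to show the mechanics of the test
group_a = rng.normal(loc=10.0, scale=2.0, size=12)
group_b = rng.normal(loc=11.5, scale=2.0, size=12)
group_c = rng.normal(loc=13.0, scale=2.0, size=12)

# Omnibus one-way ANOVA: is there a difference *somewhere* among the groups?
f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p:.4f}")

# A significant F does not say where the difference lies; follow up with post hoc
# pairwise tests (here, uncorrected t-tests purely as an illustration; in practice
# you would correct for multiple comparisons, e.g. Bonferroni)
for name, g1, g2 in [("a vs b", group_a, group_b),
                     ("a vs c", group_a, group_c),
                     ("b vs c", group_b, group_c)]:
    t, p = stats.ttest_ind(g1, g2)
    print(name, f"t = {t:.2f}, p = {p:.4f}")
```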
Varun V. Sethi
Correlation and linear regression

Objective
- Correlation
- Linear regression
- Take-home points

Correlation vs regression
- Correlation: how strong and how linear is the relationship between two variables? (descriptive)
- Regression: how well does a linear model explain my data? (inferential)

Correlation
- Correlation measures the strength and direction of the relationship between variables.
- [Figure: correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row).]
- [Scattergrams: positive correlation, negative correlation, no correlation.]

Measures of correlation
1) Covariance
2) Pearson correlation coefficient (r)

Covariance
- The covariance is a statistic representing the degree to which 2 variables vary together:
  cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / n
- Compare the variance formula: sx² = Σᵢ (xᵢ − x̄)² / n. Note that sx² = cov(x, x).

Pearson correlation coefficient (r)
- r(x, y) = cov(x, y) / (sx · sy), where s is the standard deviation of the sample.
- r is a kind of "normalised" (dimensionless) covariance.
- r takes values from −1 (perfect negative correlation) to +1 (perfect positive correlation); r = 0 means no correlation.
- Limitations: sensitive to extreme values; describes a relationship, not a prediction; does not imply causality.

Regression
- Regression is the prediction of one variable from knowledge of one or more other variables.
- How good is a linear model (y = ax + b) at explaining the relationship between two variables? If there is such a relationship, we can "predict" the value of y for a given x (in the slide's example, x = 25 gives a predicted y of 7.498).

Linear dependence between 2 variables
- Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other.
- Examples: energy needed to boil a given amount of water; money needed to buy coffeepots.

Fitting data to a straight line (or vice versa)
- Model: ŷ = ax + b, where ŷ is the predicted value of y, a is the slope of the regression line and b is the intercept.
- For each observation, ŷi is the predicted value, yi is the observed value, and the residual error εi = yi − ŷi is the difference between the obtained and predicted values of y.
- The best-fit line (the values of a and b) is the one that minimises the sum of squared errors, SSerror = Σ(yi − ŷi)².

Adjusting the straight line to the data
- Minimise Σ(yi − ŷi)², which is Σ(yi − a·xi − b)².
- The minimum SSerror is at the bottom of the curve, where the gradient is zero, and this can be found with calculus.
- Take partial derivatives of Σ(yi − a·xi − b)² with respect to the parameters a and b, set them to zero and solve as simultaneous equations, giving:
  a = r · (sy / sx) and b = ȳ − a·x̄
- This can always be done.

How well does the line fit the data?
- We can calculate the regression line for any data, but how well does it fit?
- Total variance = predicted variance + error variance: sy² = sŷ² + ser²
- It can be shown that r² is the proportion of the variance in y that is explained by our regression model: r² = sŷ² / sy²
- Substituting sŷ² = r²·sy² into sy² = sŷ² + ser² and rearranging gives: ser² = sy²(1 − r²)
- So the greater the correlation, the smaller the error variance and the better our prediction.
- Do we get a significantly better prediction of y from our regression equation than by just predicting the mean? That is tested with an F-statistic.
- Uses: prediction/forecasting, and quantifying the strength of the relationship between y and the predictors Xj (X1, X2, X3). A short numeric sketch of correlation and simple regression follows this part.

The General Linear Model
- A General Linear Model is just any model that describes the data in terms of a straight line.
- Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept (note the switch in notation from the fitting section above): y = bx + a + ε
- Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y.
- The different x variables are combined in a linear way and each has its own regression coefficient: y = b0 + b1x1 + b2x2 + ... + bnxn + ε
- The b parameters (regression coefficients) reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for. A multiple-regression sketch also follows this part.
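A minimal numeric sketch of correlation and simple linear regression (assuming only NumPy; the data are made up purely for illustration), showing Pearson's r, the least-squares slope and intercept, and the identities a = r·sy/sx and r² = proportion of variance explained.

```python
import numpy as np

# Made-up illustrative data: y is a noisy linear function of x
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Pearson correlation coefficient: normalised covariance
r = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

# Least-squares fit of y = a*x + b
a, b = np.polyfit(x, y, deg=1)

# The slope can also be written as a = r * sy / sx, and b = mean(y) - a * mean(x)
a_from_r = r * y.std(ddof=1) / x.std(ddof=1)
b_from_r = y.mean() - a_from_r * x.mean()

# r^2 is the proportion of variance in y explained by the fitted line
y_hat = a * x + b
r_squared = y_hat.var() / y.var()

print(f"r = {r:.3f}, a = {a:.3f} (= {a_from_r:.3f}), b = {b:.3f} (= {b_from_r:.3f})")
print(f"r^2 = {r_squared:.3f}, error share 1 - r^2 = {(y - y_hat).var() / y.var():.3f}")
```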
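And a corresponding multiple-regression sketch, again with made-up data and assuming only NumPy: the single predictor is replaced by a design matrix with one column per x variable (plus a column of ones for b0), and the coefficients are obtained by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Made-up predictors x1, x2, x3 and a dependent variable y built from them plus noise
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 0.5 + 2.0 * x1 - 1.0 * x2 + 0.0 * x3 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones (for the intercept b0) plus one column per predictor
X = np.column_stack([np.ones(n), x1, x2, x3])

# Least-squares estimates of b0, b1, b2, b3 in y = b0 + b1*x1 + b2*x2 + b3*x3 + e
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients [b0, b1, b2, b3]:", np.round(b, 3))

# Proportion of variance in y explained by the model (R^2)
y_hat = X @ b
print("R^2 =", round(1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum(), 3))
```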
Take-home points
- Correlation does not mean the variables are genuinely related: any two variables that both increase or decrease over time will show a nice correlation, e.g. CO2 concentration in the air over Antarctica and rental costs in London. Beware of this in longitudinal studies!
- A relationship between two variables does not mean causality (e.g. leaves on the forest floor and hours of sun).
- Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y.
- Multiple regression models the effect of several independent variables, x1, x2 etc., on one dependent variable, y.
- Both are types of General Linear Model.

Thank you