SPSS Workshop
Research Support Center
Chongming Yang

Causal Inference
• If A, then B, under condition C
• If A, then B with 95% probability, under condition C

Student T Test (William S. Gosset’s pen name = Student)
• Assumptions
– Small sample
– Normally distributed
• t distributions: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
• df = degrees of freedom = number of independent observations (n − 1 for the one-sample test)

Types of T Tests
• One sample – test against a specific (population) mean
• Two independent samples – compare means of two independent samples that represent two populations
• Paired – compare means of repeated samples

One Sample T Test
• Conceptually, convert the sample mean to a t score and examine whether t falls within the acceptable region of the distribution
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

Two Independent Samples
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Paired Observation Samples
• d = difference value between first and second observations
$t = \frac{\bar{d}}{s_d / \sqrt{n}}$

Multiple Group Issues
• Groups A, B, C comparisons
– AB, AC, BC
– .95, .95, .95
• Joint probability across all three comparisons (assuming they are independent)
– .95 × .95 × .95 ≈ .86, so the overall error rate rises to about .14

Analysis of Variance (ANOVA)
• Completely randomized groups
• Compare group variances to infer group mean difference
• Sources of total variance
– Within groups
– Between groups
• F distribution: $F = \frac{SSB / df_1}{SSW / df_2}$
– SSB = between-groups sum of squares
– SSW = within-groups sum of squares

Fisher-Snedecor Distribution

F Test
• Null hypothesis: 𝑥1 = 𝑥2 = 𝑥3 . . .
= 𝑥𝑛 (all group means are equal)
• Given df1, df2, and the F value, determine whether the corresponding probability falls within the acceptable region of the distribution

Issues of ANOVA
• Indicates some group difference
• Does not reveal which two groups differ
• Needs other tests to identify specific group differences
– Hypothesized comparisons: contrasts
– No hypothesized comparisons: post hoc tests
• ANOVA has been replaced by multiple regression, which can in turn be replaced by General Linear Modeling (GLM)

Multiple Linear Regression
• Causes 𝑥 can be continuous or categorical
• Effect 𝑦 is a continuous measure
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$
• A milder term for causes: predictors
• Objective: identify important 𝑥

Assumptions of Linear Regression
• Y and X have linear relations
• Y is continuous or interval and unbounded
• Expected value (mean) of $\varepsilon$ = 0
• $\varepsilon$ is normally distributed and not correlated with predictors
• Predictors should not be highly correlated
• No measurement error in any variable

Least Squares Solution
• Choose 𝛽0, 𝛽1, 𝛽2, 𝛽3, . . . 𝛽𝑘 to minimize the sum of squared differences between observed $y_i$ and model-estimated/predicted $\hat{y}_i$:
$\sum_i (y_i - \hat{y}_i)^2$
• Obtained by solving a system of equations

Explained Variance in 𝑦
$R^2 = \frac{\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right] - \sum (y_i - \hat{y}_i)^2}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}$

Standard Error of 𝛽
$SE_{\beta_j} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - k - 1} \cdot \frac{1}{\sum_i (x_{ij} - \bar{x}_j)^2 \, (1 - R_j^2)}}$
• $R_j^2$ here is the $R^2$ from regressing predictor $x_j$ on the other predictors

T Test of Significance of 𝛽
• t = 𝛽 / SE𝛽
• If |t| exceeds the critical value (p < .05)
• Then 𝛽 is significantly different from zero

Confidence Intervals of 𝛽

Standardized Coefficient (𝛽𝑒𝑡𝑎)
• Make 𝛽s comparable among variables by putting them on the same scale (standardized scores)
$Beta = \beta \cdot \frac{std_x}{std_y}$

Interpretation of 𝛽
• If x increases by one unit, y increases by 𝛽 units, holding the other X values constant

Model Comparisons
• Complete Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$
• Reduced Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_g x_g$
• Test F = MSdrop / MSE
– MS = mean square
– MSE = mean square error

Variable Selection
• Select significant predictors from a pool of candidates
• Stepwise is undesirable, see http://en.wikipedia.org/wiki/Stepwise_regression
• Forward
• Backward (preferable)

Dummy-coding of Nominal 𝑥
• R = Race (1=White, 2=Black, 3=Hispanic, 4=Others)

R   d1  d2  d3
1   1   0   0
1   1   0   0
2   0   1   0
2   0   1   0
3   0   0   1
3   0   0   1
4   0   0   0
4   0   0   0

• Include all dummy variables in the model, even if not every one is significant.

Interaction
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_2 x_3 + \dots + \beta_k x_k$
• Create a product term X2X3
• Include X2 and X3 even if their effects are not significant
• Interpret the interaction effect: the X2 effect depends on the level of X3.

Plotting Interaction
• Write out the model with main and interaction effects
• Use standardized coefficients
• Plug in some plausible values of the interacting variables and calculate y
• Use one X for the X dimension and the Y value for the Y dimension
• See examples http://frank.itlab.us/datamodel/node104.html

Diagnostics
• Linear relation of predicted and observed (plotting)
• Collinearity
• Outliers
• Normality of residuals (save residual as new variable)

Repeated Measures (MANOVA, GLM)
• Measure(s) repeated over time
• Change in individual cases (within)?
• Group differences (between, categorical x)?
• Covariate effects (continuous x)?
• Interaction between within and between variables?
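
With only two time points, the "change in individual cases (within)" question reduces to the paired t test from the earlier slide, $t = \bar{d} / (s_d / \sqrt{n})$. The following is a minimal Python sketch of that calculation, not SPSS output; the function name `paired_t` and the scores are hypothetical and used for illustration only.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic for two repeated measurements on the same cases.

    d is taken as second minus first, so a positive t means scores went up.
    Returns (t, df) with df = n - 1.
    """
    if len(first) != len(second):
        raise ValueError("both measurements must cover the same cases")
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for five cases measured twice (made-up numbers).
time1 = [10, 12, 9, 11, 13]
time2 = [12, 14, 10, 14, 15]
t_value, df = paired_t(time1, time2)
print(f"t = {t_value:.3f}, df = {df}")
```

With more than two time points, or with between-subject groups, the repeated measures GLM takes over from this calculation.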
Assumptions
• Normality
• Sphericity: variances of the differences between all pairs of repeated measures are equal, so that
• Total sum of squares can be partitioned more precisely into
– Within subjects
– Between subjects
– Error

Model
$y_{ij} = \mu + \pi_i + \tau_j + (\pi\tau)_{ij} + \varepsilon_{ij}$
• 𝜇 = grand mean
• 𝜋𝑖 = constant of individual i
• 𝜏𝑗 = constant of jth treatment
• 𝜀𝑖𝑗 = error of i under treatment j
• 𝜋𝜏 = interaction

F Test of Effects
• F = MSbetween / MSwithin (simple repeated)
• F = MStreatment / MSerror (with treatment)
• F = MSwithin / MSinteraction (with interaction)

Four Types of Sums of Squares
• Type I: balanced design
• Type II: adjusting for other effects
• Type III: no empty cells, unbalanced design
• Type IV: empty cells

Exercise
• http://www.ats.ucla.edu/stat/spss/seminars/Repeated_Measures/default.htm
• Copy data to the SPSS syntax window, select and run
• Run Repeated Measures GLM
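
To see what the repeated measures GLM computes for the "with treatment" case, F = MStreatment / MSerror, the sum-of-squares partition can be sketched by hand. This is an illustrative Python sketch under simple assumptions (one within-subject factor, no missing cells, no between-subject groups); the function name `repeated_measures_f` and the data are hypothetical, and it is not a substitute for the SPSS procedure.

```python
def repeated_measures_f(scores):
    """One-way repeated measures F test.

    scores[i][j] is the score of subject i under treatment (or time) j.
    Total SS is split into subjects, treatment, and error, then
    F = MS_treatment / MS_error.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of treatments / time points
    grand = sum(sum(row) for row in scores) / (n * k)

    subject_means = [sum(row) / k for row in scores]
    treatment_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((y - grand) ** 2 for row in scores for y in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_treatment = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_error = ss_total - ss_subjects - ss_treatment

    df_treatment = k - 1
    df_error = (n - 1) * (k - 1)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error

    return {
        "ss_subjects": ss_subjects,
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "F": ms_treatment / ms_error,
    }


# Hypothetical data: 4 subjects measured at 3 time points (made-up numbers).
data = [
    [3, 4, 6],
    [2, 4, 5],
    [4, 5, 8],
    [3, 6, 7],
]
result = repeated_measures_f(data)
print(result)
```

The resulting F is compared against the F distribution with df_treatment and df_error degrees of freedom, and its value should agree with the within-subjects effect reported by Repeated Measures GLM for the same data when sphericity is assumed.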