Linear Regression and Correlation

Fitted Regression Line

[Figure: scatter of weight, Y (g, roughly 80-200), against length, X (cm, 54-70), with the fitted regression line.]

Equation of the Regression Line

The least-squares regression line of Y on X is

    Y = b_0 + b_1 X

with slope and intercept

    b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}

[Worked slides: regression calculations and plotting of the regression line.]

Residuals

Using the fitted line, it is possible to obtain an estimate of the y coordinate at each x_i:

    \hat{y}_i = b_0 + b_1 x_i

The "error" in the fit is termed the residual error:

    e_i = y_i - \hat{y}_i

[Figure: the weight-length scatter with the fitted line; the vertical distance from one point to the line is labeled "Residual".]

Residual Standard Deviation

    s_{Y|X} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}

[Figure: residuals from the example, shown on the weight-length scatter.]

Other Ways to Evaluate Residuals
- Lag plots: plot residuals against a time delay of the residuals, to look for temporal structure.
- Look for skew in the residuals.
- Look for kurtosis in the residuals: error not distributed "normally".

Model Residuals: Constrained vs. Freely Moving

[Figure: residual histograms and residual-residual scatter plots for a "pairwise model" and an "independent model", under constrained and freely moving conditions.]

Parametric Interpretation of Regression: Linear Models

Conditional Populations and Conditional Distributions
- A conditional population of Y values is the population associated with a fixed, or given, value of X.
- A conditional distribution is the distribution of values within the conditional population above.
- \mu_{Y|X}: the population mean Y value for a given X.
- \sigma_{Y|X}: the population SD of the Y values for a given X.

The Linear Model

Assumptions: linearity and constant standard deviation.

    \mu_{Y|X} = \beta_0 + \beta_1 X
    Y = \beta_0 + \beta_1 X + \varepsilon

Statistical Inference Concerning \beta_1

You can make statistical inferences on the model parameters themselves:
- b_0 estimates \beta_0
- b_1 estimates \beta_1
- s_{Y|X} estimates \sigma_{Y|X}

Standard Error of the Slope

The 95% confidence interval for \beta_1 is

    b_1 \pm t_{0.025} \, SE_{b_1}, \qquad SE_{b_1} = \frac{s_{Y|X}}{\sqrt{\sum (x_i - \bar{x})^2}}

Hypothesis Testing: Is the Slope Significantly Different from Zero?

Test H_0: \beta_1 = 0 using the test statistic

    t_s = \frac{b_1}{SE_{b_1}}, \qquad df = n - 2

Coefficient of Determination

r^2, the coefficient of determination, measures how much of the variance in the data is accounted for by the linear model:

    r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}

An r^2 near 1 means the line "captures" most of the data variance.

Correlation Coefficient

r is symmetric under exchange of x and y, and is related to the slope by b_1 = r \, (s_Y / s_X):

    r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}

What's this? Adjusted r^2: it adjusts r^2 to compensate for the fact that adding even uncorrelated variables to the regression improves r^2.

Statistical Inference on Correlations

Like the slope, one can define a t-statistic for correlation coefficients:

    t_s = \frac{b_1}{SE_{b_1}} = r \sqrt{\frac{n - 2}{1 - r^2}}

Consider the following "spike-triggered averages" (STAs):

[Figure: a grid of spike-triggered averages from several recordings; voltage traces on scales from hundreds of microvolts to ±10 mV, plotted from -5 to 15 ms.]

STA Example

r = 0.25; is this correlation significant? With N = 446,

    t = 0.25 \sqrt{\frac{444}{1 - 0.25^2}} \approx 5.4

so the correlation is highly significant.

[Figure: scatter (y from -6 to 6, x from -10 to 5) underlying the STA correlation.]
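To make the regression calculations concrete, here is a minimal sketch in Python (NumPy only) that computes the least-squares fit, the residual standard deviation, r^2, and the slope t-statistic. The length/weight numbers are invented stand-ins for the data in the figures.

```python
import numpy as np

# Hypothetical length/weight data standing in for the example in the figures.
x = np.array([55.0, 57.0, 59.5, 61.0, 62.5, 64.0, 66.5, 69.0])  # length (cm)
y = np.array([90.0, 102.0, 110.0, 125.0, 130.0, 145.0, 160.0, 182.0])  # weight (g)

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Least-squares slope and intercept: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Fitted values and residuals
yhat = b0 + b1 * x
resid = y - yhat

# Residual standard deviation, s_{Y|X}, with n - 2 degrees of freedom
s_yx = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Coefficient of determination: 1 - SS_resid / SS_total
r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - ybar) ** 2)

# Standard error of the slope and the t-statistic for H0: beta_1 = 0
se_b1 = s_yx / np.sqrt(np.sum((x - xbar) ** 2))
t_s = b1 / se_b1  # compare against t with df = n - 2

print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, s = {s_yx:.2f}, r2 = {r2:.3f}, t = {t_s:.1f}")
```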
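The STA significance check can be reproduced the same way. This assumes r = 0.25 and N = 446 as quoted above; with the conventional df = n - 2 the statistic comes out near 5.44 (the slide's original arithmetic used n - 1, giving 5.45). The scipy call is only for the p-value.

```python
import numpy as np
from scipy import stats  # used only for the p-value

r, n = 0.25, 446

# t-statistic for H0: rho = 0, with df = n - 2
t = r * np.sqrt((n - 2) / (1 - r ** 2))
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed

print(f"t = {t:.2f}, df = {n - 2}, p = {p:.1e}")  # t = 5.44, p well below 0.001
```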
When is Linear Regression Inadequate?
- Curvilinearity
- Outliers
- Influential points

Curvilinearity

[Figure: a clearly curved scatter (x from 60 to 70) that a straight regression line would fit poorly.]

Outliers
- Outliers can reduce correlations and unduly influence the regression line.
- You can "throw out" some clear outliers; there are a variety of tests to use.
- Example: Grubbs' test computes

    Z = \frac{|x - \bar{x}|}{SD}

  Look up the critical Z value in a table. Is your Z value larger? Then the difference is significant and the point can be discarded.

Influential Points
- Points that have a lot of influence on the regressed model.
- Not really outliers, as their residuals are small.

[Figure: scatter (x from 50 to 105, y from 50 to 300) with an extreme point whose small residual belies its large influence on the fitted line.]

Conditions for Inference

Design conditions:
- Random subsampling model: for each x observed, y is viewed as randomly chosen from the distribution of Y values for that X.
- Bivariate random sampling: each observed (x, y) pair must be independent of the others. The experimental structure must not include pairing, blocking, or an internal hierarchy.

Conditions on parameters:
- \mu_{Y|X} = \beta_0 + \beta_1 X
- \sigma_{Y|X} is not a function of X.

Conditions concerning population distributions:
- Same SD for all levels of X.
- Independent observations.
- Normal distribution of Y for each fixed X.
- Random samples.

[Figure: error bars on the coefficients of the model.]

MANOVA and ANCOVA

MANOVA
- Multivariate analysis of variance.
- Developed as a theoretical construct by S.S. Wilks in 1932.
- Key to assessing differences in groups across multiple metric dependent variables, based on a set of categorical (non-metric) variables acting as independent variables.

MANOVA vs. ANOVA

    ANOVA:  Y_1 = X_1 + X_2 + X_3 + ... + X_n
            (one metric DV; non-metric IVs)

    MANOVA: Y_1 + Y_2 + ... + Y_n = X_1 + X_2 + X_3 + ... + X_n
            (metric DVs; non-metric IVs)

ANOVA Refresher

    Source   | SS          | df  | MS          | F
    ---------|-------------|-----|-------------|------------
    Between  | SS(B)       | k-1 | SS(B)/(k-1) | MS(B)/MS(W)
    Within   | SS(W)       | N-k | SS(W)/(N-k) |
    Total    | SS(W)+SS(B) | N-1 |             |

Reject the null hypothesis if the test statistic is greater than the critical F value with k-1 numerator and N-k denominator degrees of freedom. If you reject the null, at least one of the group means is different.

MANOVA Guidelines
- The assumptions are the same as for ANOVA.
- Additional condition of multivariate normality: all variables, and all combinations of the variables, are normally distributed.
- Assumes equal covariance matrices (the standard deviations between variables should be similar).

Example
- Group 1 receives technical dietary information interactively from an on-line website.
- Group 2 receives the same information from a nurse practitioner.
- Group 3 receives the information from a video tape made by the same nurse practitioner.
- Users rate the instruction on usefulness, difficulty, and importance.
- Note: one categorical grouping variable (three levels) indexing the groups, and three metric dependent variables. (A code sketch of this analysis appears at the end of this section.)

Hypotheses
- H0: there is no difference between the treatment groups (on-line learners vs. oral learners vs. visual learners).
- HA: there is a difference.

Order of Operations; MANOVA Output
- The individual ANOVAs are not significant.
- The overall multivariate effect is significant.
- Post hoc tests are then run to find the culprit!

Once More, with Feeling: ANCOVA
- Analysis of covariance: a hybrid of regression analysis and ANOVA-style methods.
- Suppose you have pre-existing effect differences between subjects.
- Suppose two experimental conditions, A and B: you could test half your subjects with AB (A then B) and the other half with BA, using a repeated-measures design.

Why use it?
- Suppose there exists a particular variable that *explains* some of what's going on in the dependent variable in an ANOVA-style experiment. Removing the effects of that variable can help you determine whether the categorical difference is "real" or simply depends on this variable.
- In a repeated-measures design, suppose the following situation: sequencing effects, where performing A first impacts outcomes in B. Example: A and B represent different learning methodologies.
- ANCOVA can compensate for systematic biases among samples (for example, if sorting produces unintentional correlations in the data).

Example Results

[Figure: ANCOVA output for the first example.]

Second Example

How do the amount spent on groceries, and the amount one intends to spend, depend on a subject's sex? H0: no dependence. Two analyses:
- a MANOVA to look at the dependence, and
- an ANCOVA to determine whether the apparent effect is rooted in covariance between intended spending and actual spending.

MANOVA and ANCOVA Results

[Figure: MANOVA and ANCOVA output for the spending example.]

So if you remove the amount the subjects intend to spend from the equation, there is no significant difference in spending between the sexes. The spending difference is not a result of "impulse buys", it seems.
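As promised above, here is a sketch of the dietary-information MANOVA using statsmodels. The group names, rating means, and sample sizes are all invented for illustration; only the overall shape of the analysis follows the example.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n_per = 20  # subjects per instruction group (invented)

# Hypothetical ratings for the dietary-information example: three groups,
# three metric dependent variables. Every name and number here is made up.
group_means = {
    "online": (6.5, 4.0, 6.0),   # (usefulness, difficulty, importance)
    "oral":   (6.0, 3.5, 6.5),
    "video":  (5.5, 4.5, 6.0),
}
rows = []
for group, (u, d, imp) in group_means.items():
    for _ in range(n_per):
        rows.append({
            "group": group,
            "usefulness": rng.normal(u, 1.0),
            "difficulty": rng.normal(d, 1.0),
            "importance": rng.normal(imp, 1.0),
        })
df = pd.DataFrame(rows)

# One multivariate test across all three DVs at once; mv_test() reports
# Wilks' lambda, Pillai's trace, and friends for the group effect.
mv = MANOVA.from_formula("usefulness + difficulty + importance ~ group", data=df)
print(mv.mv_test())
```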
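And a sketch of the grocery-spending analysis, treating ANCOVA as a linear model with the covariate added (one standard way to run it). Again, every number and column name is hypothetical; the data are built so that intended spending explains nearly all of the actual spending, mirroring the conclusion above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 50  # shoppers per group; all numbers below are invented

# Hypothetical grocery data in which actual spending is driven almost
# entirely by intended spending, not by sex.
intend = rng.normal(100.0, 20.0, 2 * n)
sex = np.repeat(["F", "M"], n)
spent = intend + rng.normal(0.0, 10.0, 2 * n)
df = pd.DataFrame({"sex": sex, "intend": intend, "spent": spent})

# ANOVA-style model: does spending differ by sex?
anova_fit = smf.ols("spent ~ C(sex)", data=df).fit()

# ANCOVA: the same question after removing the covariance with
# intended spending, i.e. adding the covariate to the model.
ancova_fit = smf.ols("spent ~ C(sex) + intend", data=df).fit()

print(anova_fit.pvalues)   # p-value for the sex term alone
print(ancova_fit.pvalues)  # sex term once the covariate is accounted for
```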
Principal Component Analysis

Say you have time-series data characterized by multiple channels or trials. Is there a set of factors underlying the data that explains it? That is, is there a simpler explanation for the observed behavior? In other words, can you infer the quantities that are supplying variance to the observed data, rather than testing *whether* known factors supply the variance?

Example: 8 channels of recorded EMG activity.

[Figure: 8 EMG channels plotted over 2500 samples.]

PCA works by "rotating" the data (considering a time series as a spatial vector) to a "position" in the abstract space that minimizes covariance; formally, it finds an orthogonal basis in which the covariance matrix of the data is diagonal. Don't worry about what this means. Note how a single component explains almost all of the variance in the 8 EMGs recorded.

[Figure: principal-component time courses of the EMG data; the largest component carries nearly all the variance.]

The next step would be to correlate these components with some other parameter in the experiment.

[Figure: neural firing rates plotted alongside the largest principal component over the same 2500 samples.]

Some additional uses:
- Say you have a very large data set, but believe there are some common features uniting it.
- Use a PCA-type analysis to identify those common features.
- Retain only the most important components to describe a "reduced" data set.
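A minimal PCA sketch (NumPy only), assuming a synthetic stand-in for the 8-channel EMG data: one common drive, scaled differently per channel, plus channel noise. The first component should explain almost all of the variance, as in the figures.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the 8-channel EMG recording: a single common
# drive, scaled differently per channel, plus channel-specific noise.
t = np.arange(2500)
drive = np.sin(2 * np.pi * t / 500.0)            # shared underlying component
gains = rng.uniform(0.5, 1.5, 8)                 # per-channel scaling
X = np.outer(drive, gains) + rng.normal(0.0, 0.1, (2500, 8))

# PCA via singular value decomposition of the mean-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of variance explained by each principal component
var_explained = S ** 2 / np.sum(S ** 2)
print(np.round(var_explained, 3))  # the first component should dominate

# Component time courses ("scores"); the first should track the drive,
# and could then be correlated against, e.g., neural firing rates.
scores = U * S  # equivalently Xc @ Vt.T
```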