WORKSHOP: Time series analysis

Use the dataset TSA.sav. These data were obtained from a single individual who filled out a questionnaire based on the Five Factor Model of personality on 90 consecutive days. The data file contains three variables: a daily Extraversion score (E), a daily Neuroticism score (N), and a variable indicating the day. Go to GRAPHS – SEQUENCE to plot E and N.

1. Lagged variables

First, we are going to make some lagged variables. To this end, use TRANSFORM – CREATE TIME SERIES. Click the variable E into the NEW VARIABLE(S) box. Choose LAG for the FUNCTION and leave the ORDER at 1. Click on CHANGE (do not skip this step!) and click OK. In the VARIABLE VIEW you can now see that a new variable E_1 has been made from the existing variable E. It consists of exactly the same numbers as E, but shifted downwards one position. We call this a lagged variable.

Go to GRAPHS – SCATTER – SIMPLE and plot E against E_1. Fit a linear regression line to the plot.

Question:
1. What does this plot represent?

Go to TRANSFORM – CREATE TIME SERIES. The variable E is already in the NEW VARIABLE(S) box. Change the NAME to E_2 and the ORDER to 2. Click on CHANGE and then OK. There is now a new variable E_2, which is the same as E but shifted down two positions.

2. Autocorrelation

Go to ANALYZE – CORRELATE – BIVARIATE, click E, E_1, and E_2 to the right, and click OK. You have now obtained the correlations between E and itself at lag 1 and lag 2.

Questions:
2. What does the significant lag 1 correlation mean?
3. Why is the correlation between E and E_1 not the same as the correlation between E_1 and E_2?

Go to GRAPHS – TIME SERIES – AUTOCORRELATIONS, click E to the right, and click OK. The output shows all autocorrelations and partial autocorrelations up to lag 16.

Questions:
4. Compare the autocorrelations with the correlations you obtained for the lagged variables, and look at the plot of the autocorrelations (ACF): what do you notice?
5. What do the partial autocorrelations represent?
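For readers who want to reproduce these steps outside SPSS, here is a minimal sketch in Python/pandas. The series is a simulated AR(1)-like stand-in for the real E scores in TSA.sav (the simulation parameters are illustrative only). It builds the lagged variables with `shift`, computes the lag correlations, and also runs the lag-1 regression and checks the relation mean = intercept / (1 - coefficient) discussed in the Autoregression section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated stand-in for 90 daily Extraversion scores: an AR(1)-like series
# (the real data are in TSA.sav).
e = np.empty(90)
e[0] = 4.0
for t in range(1, 90):
    e[t] = 2.0 + 0.5 * e[t - 1] + rng.normal(scale=0.5)

df = pd.DataFrame({"E": e})
df["E_1"] = df["E"].shift(1)   # lag 1: the same numbers shifted down one position
df["E_2"] = df["E"].shift(2)   # lag 2: shifted down two positions

# Correlations of E with itself at lag 1 and lag 2 (pandas drops the
# missing leading values pairwise, as SPSS does by default):
print(df.corr())

# AR(1) as a regression of E(t) on E(t-1); drop the first row, which has
# no lagged value.
d = df.dropna(subset=["E_1"])
X = np.column_stack([np.ones(len(d)), d["E_1"]])
intercept, b = np.linalg.lstsq(X, d["E"], rcond=None)[0]

# Mean and intercept are linked by mean = intercept / (1 - b):
print(df["E"].mean(), intercept / (1 - b))
```

The `shift` call is the scripted counterpart of CREATE TIME SERIES with the LAG function.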
3. Autoregression

Now we are going to fit an autoregressive model. Go to ANALYZE – REGRESSION – LINEAR, choose E as the dependent variable and E_1 as the independent variable, and click OK.

Questions:
6. What does this model imply? Can you make a drawing of it?
7. What do you conclude (both in statistical and in substantive terms)?

Go to ANALYZE – TIME SERIES – ARIMA and put in E as the dependent variable. Then change the value for Autoregressive p in MODEL from 0 to 1. Click OK. Compare your results to those obtained with the regression analysis. As you can see, the autoregressive parameter is almost identical (the difference is caused by a difference in estimation procedure). This shows that an AR(1) model is a model in which each observation is regressed on the observation at the previous occasion.

Note, however, that the constants in the two models are rather different. This is because the constant in the regression model is the intercept (i.e., the expected value of the dependent variable y(t) when the predictor is zero), while the constant from the ARIMA model is the mean of the dependent variable (you can check this by comparing it to the mean of the variable E). It can be shown that the intercept and the mean in an AR(1) model are related to each other in the following way:

mean = intercept / (1 - regression coefficient)

4. Cross-correlation

Next we will investigate the relationship between E and N over time. In particular, we want to know whether there is a lead–lag relationship, meaning that one variable changes as a function of the other variable at a previous occasion. Another way of saying this is that one variable influences the other. First make lagged variables for N in the same manner as you did before for E. This gives you N_1 and N_2. Now go to ANALYZE – CORRELATE – BIVARIATE and click E, E_1, E_2, N, N_1, and N_2 to the right (in this order!).

Questions:
8. What is the lag 1 autocorrelation of E?
9. What is the lag 1 autocorrelation of N?
10. What is the lag 1 cross-correlation between E and N at the previous occasion?
11. What is the lag 1 cross-correlation between N and E at the previous occasion?
12. Do the same for all four lag 2 correlations (both auto- and cross-correlations).

Now go to GRAPHS – TIME SERIES – CROSS-CORRELATIONS, click E and N to the right, and click OK. Compare the results to the lagged cross-correlations you obtained above.

Question:
13. How should the cross-correlations be interpreted?

5. An autoregressive model

From the above it became clear that E can be predicted from itself at the previous occasion, and also from N at previous occasion(s). Now we are going to make a model in which we predict E from both E and N at previous occasions. Go to ANALYZE – REGRESSION – LINEAR and put E in as the dependent variable and E_1, E_2, N_1, and N_2 as predictors.

Questions:
14. Which predictors are significant?
15. What does this mean in substantive terms? (Do not forget to take the sign of the parameters into account!)

Now run the same regression model with N as the dependent variable.

Questions:
16. Which predictors are significant?
17. What does this mean in substantive terms? (Do not forget to take the sign of the parameters into account!)
18. What do you conclude about the reciprocal influence of E and N?

6. Cycles

We can also expect there to be a weekly cycle in the data. To check for this, we will make week-cycle variables that we can use as independent variables in our autoregressive model. A sine wave can be written as

X(t) = R sin(ωt + φ),

where ω = 2π/P, with P being the period (here P = 7 because we are considering a week cycle); φ is the phase shift, which determines where in the cycle we start out; and R is the amplitude, determining the range of values (you can think of it as a scaling parameter).
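To make the sine-wave formula concrete, the sketch below (Python; the amplitude and phase shift values are illustrative, not part of the SPSS exercise) generates the wave for the 90 days and confirms that with a period of 7 it repeats itself exactly every seven days.

```python
import numpy as np

day = np.arange(1, 91)       # the day variable, 1..90
P = 7                        # period: one week
omega = 2 * np.pi / P        # angular frequency
R, phi = 1.0, 0.0            # amplitude and phase shift (illustrative values)

x = R * np.sin(omega * day + phi)

# With period 7 the values repeat every 7 days, and R bounds the range:
print(np.allclose(x[:7], x[7:14]))   # True
print(np.abs(x).max() <= R)          # True
```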
It is well known that such a sine wave can be rewritten as the sum of a sine and a cosine in the following way:

X(t) = a sin(ωt) + b cos(ωt),

such that R = (a² + b²)^(1/2), and φ can also be written as a function of a and b. Thus we are going to make a sine variable and a cosine variable. Go to TRANSFORM – COMPUTE and name the target variable "sine". Then type in the box called NUMERIC EXPRESSION:

SIN(day*2*3.141593/7)

Click OK. You have now made a new variable that represents a sine wave. Go to TRANSFORM – COMPUTE again and change the name of the target variable to "cosine". Change SIN in the numeric expression to COS. Click OK. You have now made a new variable that represents a cosine wave.

Next we can use these two variables as predictors in our time series model to see whether there is a significant week cycle, without having to define where the peak of the cycle is. Go to ANALYZE – TIME SERIES – ARIMA. Choose E as the dependent variable and sine and cosine as independent variables. Set the AR order p to 2. Click OK.

Previously you found that the lagged N variables were significant predictors of E. Hence you can also run the model above (with week cycle) including the lagged N variables. To compare the models you can make use of the AIC and BIC, which are reported in the table Residual Diagnostics. For both measures, smaller values indicate that the model describes the data better while taking the complexity of the model into account. (When the models are nested, that is, one model is a special case of the other, more general model, you can use the difference in -2 times the log-likelihoods. This difference should be chi-square distributed, with df equal to the difference in the number of parameters between the two models. This way you can determine a p-value which tells you whether adding the extra predictors significantly improves the prediction of the outcome variable.)

You can run the same models for N.

Questions:
19. Is there evidence for a weekly cycle in E and N?
20. Do the auto- and cross-regression relationships change when you include a week cycle?
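As a numerical check on the sine-plus-cosine trick and on the model-comparison logic, here is a sketch in Python (simulated data; the amplitude, phase, and noise level are made up for illustration). It fits the two predictors by ordinary least squares, recovers R and φ from a and b, and compares the models with and without the cycle via the log-likelihoods and AIC.

```python
import numpy as np

rng = np.random.default_rng(3)
day = np.arange(1, 91)
w = 2 * np.pi / 7                       # as in SIN(day*2*3.141593/7)

# Simulated weekly cycle with known amplitude and phase, plus noise:
R_true, phi_true = 1.5, 0.8
y = R_true * np.sin(w * day + phi_true) + rng.normal(scale=0.5, size=day.size)

def fit_ols(X, y):
    """OLS with an intercept; returns coefficients and Gaussian log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = float(np.sum((y - X @ coefs) ** 2))
    n = len(y)
    loglik = -n / 2 * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return coefs, loglik

# Model with sine and cosine predictors versus an intercept-only null model:
coefs, ll_cycle = fit_ols(np.column_stack([np.sin(w * day), np.cos(w * day)]), y)
_, ll_null = fit_ols(np.empty((len(y), 0)), y)

a, b = coefs[1], coefs[2]
R_hat = np.hypot(a, b)                  # R = (a^2 + b^2)^(1/2)
phi_hat = np.arctan2(b, a)              # phi recovered from a and b

# Nested models: -2 * (difference in log-likelihoods), chi-square with df = 2;
# 5.99 is the .05 critical value of chi-square(2).
lr = -2 * (ll_null - ll_cycle)
aic_null = 2 * 1 - 2 * ll_null          # k = 1 estimated coefficient (intercept)
aic_cycle = 2 * 3 - 2 * ll_cycle        # k = 3 (intercept, a, b)
print(R_hat, phi_hat, lr > 5.99, aic_cycle < aic_null)
```

Because the cycle is genuinely present in the simulated data, the likelihood-ratio statistic far exceeds the critical value and the AIC favours the cycle model.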
7. Trends and changes

In the beginning you made sequence plots of E and N. Look at these closely: there is some indication of an increase in E and a decrease in N over the first 55 days, and a reversed pattern afterwards. To check this, we need two trend variables. The first trend is simply the day variable. The second trend will be added to the first and only starts after day 55.

Go to TRANSFORM – COMPUTE and call the target variable "change". In the NUMERIC EXPRESSION box type 0. Click OK. You now have a new variable on which every case has a score of zero. Go to TRANSFORM – COMPUTE again. Do not change the target variable name. Remove the 0 from the NUMERIC EXPRESSION box and type: day - 55. Click on IF (lower left corner), choose "Include if case satisfies condition", and type: day > 55. Click CONTINUE and then OK. You now have a new variable that represents the difference in trend between the first 55 days and the rest (days 56 to 90).

Go to ANALYZE – REGRESSION – LINEAR and choose E as the dependent variable. Choose "day" as the predictor. Click on NEXT to create a second block of predictors, and choose "day" and "change" as predictors. Click on STATISTICS and check the box for R squared change. Click CONTINUE and then OK.

Questions:
21. Is there a change in trend in the data?
22. What can you conclude about the trend before and after day 55?
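The two-trend construction can also be checked with a quick script. The sketch below (Python; the slopes and noise level are invented for illustration) builds the day and change variables, fits the two nested regression models, and reports the increase in R squared that SPSS lists as "R squared change".

```python
import numpy as np

rng = np.random.default_rng(4)
day = np.arange(1, 91)

# "change" is 0 through day 55 and (day - 55) from day 56 onwards,
# mirroring the COMPUTE step with the IF condition day > 55:
change = np.where(day > 55, day - 55, 0)

# Simulated series: upward trend that reverses after day 55 (illustrative):
y = 0.05 * day - 0.12 * change + rng.normal(scale=0.5, size=day.size)

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (with an intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

r2_day = r_squared(day[:, None], y)                     # block 1: day only
r2_both = r_squared(np.column_stack([day, change]), y)  # block 2: day + change
print(r2_day, r2_both, r2_both - r2_day)                # last value: R squared change
```

A clearly positive R squared change here corresponds to a significant change in trend at day 55.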