WORKSHOP: Time series analysis
Use the dataset TSA.sav. These data were obtained from a single
individual who filled out a questionnaire related to the Five Factor Model of
personality on 90 consecutive days. The data file contains three variables:
a daily Extraversion score (E), a daily Neuroticism score (N), and a
variable indicating the day.
Go to GRAPHS – SEQUENCE to plot E and N.
1. Lagged variables
First, we are going to make some lagged variables. To this end, use
TRANSFORM – CREATE TIME SERIES
Now click the variable E to the NEW VARIABLE(S) box.
Choose LAG for the FUNCTION.
Leave the ORDER at 1.
Click on CHANGE (do not skip this step!!!)
Click OK.
Now you can see in the VARIABLE VIEW that a new variable E_1 has been
created from the existing variable E; it consists of exactly the same
numbers as E, but shifted down one position. We call this a lagged variable.
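Outside SPSS, the same lagging operation is a one-line shift. A minimal Python sketch (the data and variable names here are illustrative, not taken from TSA.sav):

```python
# Create a lag-1 copy of a daily series by shifting its values down one
# position; the first element has no predecessor, so it becomes None
# (SPSS would mark it as a missing value).
E = [3.2, 4.1, 3.8, 4.5, 4.0]          # toy daily Extraversion scores

def lag(series, order=1):
    """Return the series shifted down by `order` positions."""
    return [None] * order + series[:-order]

E_1 = lag(E, 1)
print(E_1)  # [None, 3.2, 4.1, 3.8, 4.5]
```

Setting `order=2` gives the lag-2 variable created in the next step.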
Go to GRAPHS – SCATTER – SIMPLE and plot E against E_1. Fit a linear
regression line to the plot.
Question:
1. What does this plot represent?
Go to TRANSFORM – CREATE TIME SERIES. The variable E is already in
the NEW VARIABLE(S) box.
Change the NAME to E_2.
Change the ORDER to 2.
Click on CHANGE.
Click OK.
Now there is a new variable E_2 which is the same as E but shifted down
2 positions.
2. Autocorrelation
Go to ANALYZE – CORRELATE – BIVARIATE and click E, E_1 and E_2 to
the right. Click OK. Now you have obtained correlations between E and
itself at lag 1 and lag 2.
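These lagged correlations can be reproduced with NumPy (assuming NumPy is available; the series below is simulated rather than taken from TSA.sav):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate a weakly autocorrelated daily series of length 90.
e = np.empty(90)
e[0] = rng.normal()
for t in range(1, 90):
    e[t] = 0.5 * e[t - 1] + rng.normal()

# Correlate the series with itself shifted by 1 and 2 days; the pairs
# with a missing lagged value are simply dropped, as SPSS does.
r1 = np.corrcoef(e[1:], e[:-1])[0, 1]   # lag-1 autocorrelation
r2 = np.corrcoef(e[2:], e[:-2])[0, 1]   # lag-2 autocorrelation
print(r1, r2)
```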
Questions:
2. What does the significant lag 1 correlation mean?
3. Why is the correlation between E and E_1 not the same as the
correlation between E_1 and E_2?
Go to GRAPHS – TIME SERIES – AUTOCORRELATIONS and click E to the
right and click OK. In the output you see all autocorrelations and partial
autocorrelations up to lag 16.
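The autocorrelation function simply repeats the lagged correlation for every lag. A minimal NumPy sketch (again on simulated data, not TSA.sav):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags, computed the textbook way:
    lagged covariances about the overall mean, divided by the overall
    variance (the convention most ACF routines use)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom
                     for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
e = np.empty(90)
e[0] = rng.normal()
for t in range(1, 90):
    e[t] = 0.5 * e[t - 1] + rng.normal()

r = acf(e, 16)
print(np.round(r, 2))
```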
Questions:
4. Compare the results for the autocorrelations to the correlations you
obtained for the lagged variables: what do you notice?
5. When looking at the plot of the autocorrelations (ACF), what do you
notice?
6. What do the partial autocorrelations represent?
3. Autoregression
Now we are going to make an autoregressive model. Go to ANALYZE –
REGRESSION – LINEAR and choose E as the dependent variable, and E_1
as the independent variable. Click OK.
Questions:
6. What does this model imply? Can you make a drawing of it?
7. What do you conclude (both in statistical and in substantive terms)?
Go to ANALYZE – TIME SERIES – ARIMA and put in E as the dependent
variable. Then change the value for Autoregressive p in MODEL from 0 to
1. Click OK.
Compare your results to the results obtained with the regression analysis.
As you can see the autoregressive parameter is almost identical (the
difference is caused by a difference in estimation procedure). This shows
that an AR(1) model is a model in which each observation is regressed on
the observation at the previous occasion.
Note however that the constants in both models are rather different. This
is because the constant in the regression model is the intercept (i.e., the
expected value of the dependent variable y(t) when the predictor is zero),
while the constant from the ARIMA model is the mean of the dependent
variable (you can check this by comparing it to the mean of the variable
E). It can be shown that the intercept and the mean in an AR(1) model
are related to each other in the following way:
Mean = intercept / (1 - regression coefficient)
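This relation between the mean, the intercept, and the autoregressive coefficient is easy to verify on simulated data (a sketch, not the TSA.sav results):

```python
import numpy as np

rng = np.random.default_rng(2)
phi_true, c_true = 0.6, 2.0          # AR(1): y_t = c + phi*y_{t-1} + noise
n = 5000                             # long series, so estimates are stable
y = np.empty(n)
y[0] = c_true / (1 - phi_true)       # start at the theoretical mean
for t in range(1, n):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal()

# OLS regression of y_t on y_{t-1}: the slope estimates phi, the
# intercept estimates c (np.polyfit returns slope first).
phi_hat, c_hat = np.polyfit(y[:-1], y[1:], 1)

# Check the stated relation: mean = intercept / (1 - coefficient).
print(y.mean(), c_hat / (1 - phi_hat))   # the two should be close
```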
4. Cross-correlation
Next we will investigate the relationship between E and N over time. In
particular we want to investigate whether there is a lead-lag relationship
meaning that one variable changes as a function of the other variable at a
previous occasion. Another way of saying this is that one variable
influences the other.
First make lagged variables for N in the same manner as you did before
for E. This gives you N_1 and N_2. Now go to ANALYZE – CORRELATE –
BIVARIATE and click E E_1 E_2 N N_1 N_2 to the right (in this order!).
Questions:
8. What is the lag 1 autocorrelation of E?
9. What is the lag 1 autocorrelation of N?
10. What is the lag 1 cross-correlation between E and N at the previous
occasion?
11. What is the lag 1 cross-correlation between N and E at the previous
occasion?
12. Do the same for all 4 lag 2 correlations (both auto and cross).
Now go to GRAPHS – TIME SERIES – CROSS-CORRELATIONS and click E
and N to the right. Click OK.
Compare the results to the lagged cross-correlations you obtained above.
Question:
13. How should the cross-correlations be interpreted?
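Lagged cross-correlations are computed exactly like lagged autocorrelations, only between two different series. A sketch with simulated data in which E leads N by construction (variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 90
e = np.empty(n)
nn = np.empty(n)                     # "nn" stands in for the N series
e[0], nn[0] = rng.normal(), rng.normal()
for t in range(1, n):
    # N today is pushed down by E yesterday: a lead-lag relation.
    e[t] = 0.4 * e[t - 1] + rng.normal()
    nn[t] = 0.4 * nn[t - 1] - 0.5 * e[t - 1] + rng.normal()

# Cross-correlation of N(t) with E(t-1): does E lead N?
r_ne = np.corrcoef(nn[1:], e[:-1])[0, 1]
# Cross-correlation of E(t) with N(t-1): does N lead E?
r_en = np.corrcoef(e[1:], nn[:-1])[0, 1]
print(r_ne, r_en)   # the first is clearly negative in this simulation
```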
5. An autoregressive model with lagged predictors
From the above it became clear that E can be predicted from itself at the
previous occasion, and it can also be predicted from N at previous
occasion(s). Now we are going to make a model in which we predict E
from both E and N at previous occasions.
Go to ANALYZE – REGRESSION – LINEAR and put E in as the dependent
variable and E_1, E_2, N_1 and N_2 as predictors.
Questions:
14. Which predictors are significant?
15. What does this mean in substantive terms? (do not forget to take the
sign of the parameters into account!)
Now run the same regression model for N as the dependent variable.
Questions:
16. Which predictors are significant?
17. What does this mean in substantive terms? (do not forget to take the
sign of the parameters into account!)
18. What do you conclude about the reciprocal influence of E and N?
6. Cycles
We can also expect there to be a weekly cycle in the data. To check for this,
we will make a week cycle variable that we can use as an independent
variable in our autoregressive model.
A sine wave can be written as:
X(t) = R · sin(ωt + φ),
where ω = 2π/f is the angular frequency, with f the length of the cycle
(here f = 7 because we are considering a week cycle); φ is the phase
shift, which determines where in the cycle we start out; and R is the
amplitude, determining the range of values (you can think of it as a
scaling parameter).
It is well known that a sine wave can be rewritten as the sum of a sine
and a cosine in the following way:
X(t) = α · sin(ωt) + β · cos(ωt),
such that R = √(α² + β²), and the phase shift φ can also be written as a
function of α and β (tan φ = β/α).
Thus we are going to make a sine variable and a cosine variable.
Go to TRANSFORM – COMPUTE and name the target variable “sine”.
Then type in the box called numeric expression: SIN(day*2*3.141593/7)
Click OK.
Now you have made a new variable that represents a sine wave.
Go to TRANSFORM – COMPUTE again and change the name of the target
variable to “cosine”.
Change “SIN” in the numeric expression to “COS” (the SPSS cosine function).
Click OK.
Now you have made a new variable that represents a cosine wave.
Next we can use these two variables as predictors in our time series
model to see whether there is a significant week cycle, without having to
define where the peak of the cycle is.
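Mirroring the SPSS COMPUTE steps, the two cycle predictors can be built and used in a regression sketch like this (simulated data with a built-in weekly cycle, not TSA.sav):

```python
import numpy as np

n = 90
day = np.arange(1, n + 1)
w = 2 * np.pi / 7
sine = np.sin(day * w)               # same as SIN(day*2*3.141593/7)
cosine = np.cos(day * w)             # same as COS(day*2*3.141593/7)

# Simulate a series with a genuine weekly cycle plus noise.
rng = np.random.default_rng(4)
e = 4.0 + 0.8 * np.sin(day * w + 1.2) + rng.normal(0, 0.3, n)

# Regress on sine and cosine; the two coefficients recover the cycle
# without having to specify in advance where its peak lies.
X = np.column_stack([np.ones(n), sine, cosine])
b0, a, b = np.linalg.lstsq(X, e, rcond=None)[0]
amplitude = np.hypot(a, b)           # R = sqrt(alpha^2 + beta^2)
print(round(amplitude, 2))           # close to the true amplitude 0.8
```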
Go to ANALYZE – TIME SERIES – ARIMA.
Choose E as the dependent variable.
Choose sine and cosine as independent variables.
Set the AR order p at 2.
Click OK.
Previously you found that the lagged N variables were significant
predictors of E. Hence you can also run the model above (with the week
cycle) with the lagged N variables added as independent variables.
To compare the models you can make use of the AIC and BIC, which are
reported in the table Residual Diagnostics. For both measures, smaller
values indicate that the model describes the data better, while taking
into account the complexity of the model.
(When the models are nested (that is, one model is a special case of the
other more general model), you can use the difference in -2 times the log
likelihoods. This should be chi-square distributed with df equal to the
difference in number of parameters between the two models. This way
you can actually determine a p-value which tells you whether adding the
extra predictors to your model significantly improves the prediction of the
outcome variable).
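As a worked example of this likelihood-ratio comparison, suppose the model with the two extra lagged-N predictors has a -2 log-likelihood of 210.4 and the simpler model 217.4 (hypothetical numbers, purely for illustration). With two extra parameters the difference is compared to a chi-square with df = 2, for which the p-value has the closed form exp(-x/2):

```python
import math

# Hypothetical -2 log-likelihoods for two nested models; the larger
# model adds 2 parameters (N_1 and N_2), so df = 2.
neg2ll_small = 217.4
neg2ll_big = 210.4
lr = neg2ll_small - neg2ll_big       # chi-square test statistic

# For df = 2 the chi-square survival function is simply exp(-x/2).
p = math.exp(-lr / 2)
print(round(p, 3))   # about .03 here, so for these made-up numbers the
                     # extra predictors would significantly improve fit
```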
You can run the same models for N.
Questions:
19. Is there evidence for a weekly cycle in E and N?
20. Do the auto- and cross-regression relationships change when you
include a week cycle?
7. Trends and changes
In the beginning you made sequence plots of E and N. Look at these
closely: there is some indication of an increase in E and decrease in N in
the first 55 days, and a reversed pattern afterwards. To check this we
need two trend variables. The first trend is simply the day variable. The
second trend will be added to the first trend, and only starts after day 55.
Go to TRANSFORM – COMPUTE and call the target variable change.
In the numeric expression box type 0.
Click OK.
Now you have a new variable on which every case has a zero score.
Go to TRANSFORM – COMPUTE. Do not change the target variable name.
Remove the 0 from the numeric expression box.
Type in the numeric expression box: day - 55.
Click on IF (lower left corner).
Choose “Include if case satisfies condition”, and type: day > 55.
Click CONTINUE.
Click OK.
Now you have a new variable that can be used to represent the difference
in trend between the first 55 days and the rest (day 56 to 90).
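The two COMPUTE steps above amount to a few lines of code (a sketch of the same construction):

```python
import numpy as np

day = np.arange(1, 91)                     # day 1 .. 90
# Second trend: 0 up to and including day 55, then day - 55 afterwards,
# exactly what the COMPUTE with the IF condition produces.
change = np.where(day > 55, day - 55, 0)

print(change[53:58])   # around the breakpoint: [0 0 1 2 3]
```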
Go to ANALYZE – REGRESSION – LINEAR and choose E as the dependent variable.
Choose “day” as the predictor.
Click on NEXT to create a new model.
Choose “day” and “change” as predictors.
Click on STATISTICS and check the box for R squared change.
Click CONTINUE.
Click OK.
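The R-squared change reported by SPSS can be reproduced by fitting the two nested models and comparing their R² values. A sketch on simulated data with a built-in break at day 55 (not the TSA.sav results):

```python
import numpy as np

rng = np.random.default_rng(5)
day = np.arange(1, 91)
change = np.where(day > 55, day - 55, 0)
# Simulate: a rising trend that reverses after day 55, plus noise.
e = 3.0 + 0.03 * day - 0.06 * change + rng.normal(0, 0.3, 90)

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X includes the intercept column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones_like(day, dtype=float)
r2_model1 = r_squared(np.column_stack([ones, day]), e)          # day only
r2_model2 = r_squared(np.column_stack([ones, day, change]), e)  # + change
print(round(r2_model2 - r2_model1, 3))   # the R-squared change
```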
Questions:
21. Is there a change in trend in the data?
22. What can you conclude about the trend before and after day 55?