LINKÖPINGS UNIVERSITET 732G06 TIME SERIES ANALYSIS Institutionen för datavetenskap

LINKÖPINGS UNIVERSITET
Institutionen för datavetenskap
Statistik, CL, ANd
732G06 TIME SERIES ANALYSIS
Fall semester 2008
Assignment
Assignment week 36: Time series regression
Prediction of winter rye
Data material: winter_rye.txt
Background: During the last decades the harvest of winter rye in Sweden has increased. There is,
however, considerable variation between years, which is mainly due to weather conditions. This
makes it impossible to make precise predictions of the harvest for coming years. The task consists of
investigating whether a simple linear regression model can be used to predict the harvest, in hectares, of
winter rye during the next year.
A. Graphical illustration
Get acquainted with the time series of yearly harvests of winter rye during the period 1960-1995 by
making suitable graphs. Use Microsoft Excel and enter your data from the file winter_rye.txt (tab-separated).
B. Fitting regression model to raw data
Describe the yearly harvests of winter rye as a linear function of time (year number), i.e. a regression
model including a linear trend. Calculate a prediction interval for the harvest in the year after the last
observation of the series.
If you use Minitab for this part, simply copy and paste your data from Excel to Minitab. Make sure to
copy so that the heading of each column ends up in the first, unnumbered row of the Minitab worksheet.
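For those who want to check the software output by hand, the linear trend fit and the prediction interval can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function name `linear_trend_prediction`, the use of Python and the numbers in the example are not part of the assignment, and the critical value `t_crit` must be looked up in a t-table for n - 2 degrees of freedom (approximately 2.03 for a 95 % interval with 36 observations).

```python
import math

def linear_trend_prediction(years, harvest, t_crit):
    """Fit harvest = b0 + b1 * year by least squares and return
    (b0, b1, s, prediction interval for the year after the last one)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(harvest) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, harvest))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(years, harvest)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    x_new = years[-1] + 1
    y_hat = b0 + b1 * x_new
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / sxx)
    return b0, b1, s, (y_hat - half_width, y_hat + half_width)

# Made-up illustrative numbers, NOT the values in winter_rye.txt.
years = [1990, 1991, 1992, 1993, 1994, 1995]
harvest = [10.0, 12.0, 11.0, 14.0, 13.0, 16.0]
# 2.776 is the 97.5 % quantile of the t-distribution with 4 degrees of freedom.
b0, b1, s, interval = linear_trend_prediction(years, harvest, t_crit=2.776)
print(b1, s, interval)
```

The interval is wider than a confidence interval for the regression line because the `1 +` term under the square root accounts for the random error of the new observation itself.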
This regression model assumes that there is a basic linear relationship between harvest and time which
is disturbed by random errors that:
1. are statistically independent
2. have zero mean and constant variance over time
3. are normally distributed
The first property is particularly important. If there is an obvious tendency for a large positive
(negative) deviation from the regression line to be followed by further positive (negative) deviations, the
observations are said to be serially correlated, and in that case the calculated p-values, confidence intervals
and prediction intervals may be seriously distorted.
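Besides plotting the residuals in time order, two common numerical checks for serial correlation are the lag-1 autocorrelation of the residuals and the Durbin-Watson statistic. The sketch below is one possible way to compute them in Python; the function names and the residual values are made up for illustration and are not prescribed by the assignment.

```python
def lag1_autocorrelation(residuals):
    """Sample autocorrelation between each residual and the previous one."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    numerator = sum(dev[t] * dev[t - 1] for t in range(1, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 when there is no lag-1 correlation,
    clearly below 2 for positive and above 2 for negative correlation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Made-up residuals: 'slow' drifts gradually (positive serial correlation),
# 'alternating' flips sign every step (negative serial correlation).
slow = [1.0, 1.2, 0.8, 0.5, -0.3, -0.9, -1.1, -0.7, -0.2, 0.4]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(lag1_autocorrelation(slow), durbin_watson(slow))
print(lag1_autocorrelation(alternating), durbin_watson(alternating))
```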
Are there any signs of serial correlation in the current analysis? Are the other assumptions fulfilled?
Can the predictions be improved if we replace the simple linear regression model with a model describing
the harvest as a non-linear function of time?
Analysis of trends in water-bearing from Lake Hjälmaren
Data material: Hjalmaren.txt , Hjalmarenmonth.txt
Background: During the 20th century several changes were made to our environment that, in a long-term
perspective, would have an impact on the water-bearing in our watercourses. First, almost every
watercourse in Sweden has been subjected to regulation. Second, intensified farming and forestry have led to
more efficient drainage methods. In addition, there are possible effects of a clear change in climate.
Your task is to investigate, by applying different regression techniques, whether there are any long-term
trends in water-bearing data from the outflow of Lake Hjälmaren. More precisely, there are two
purposes:
1. to clarify if there are statistically significant upward or downward trends for annual means or
specific seasons (months).
2. to illustrate how the presence of seasonal variation affects different types of statistical trend
analyses.
A. Graphical illustration of data
Get acquainted with the data material by plotting the series of yearly and monthly values.
B. Regression analysis
Carry out a formal regression analysis with time as the independent variable. Compare the results for
monthly and yearly values with respect to:
- the average yearly change
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time variable is
zero.
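The two quantities in the list can be illustrated with a small Python sketch that returns the slope, its standard error and the t-statistic for the hypothesis that the slope is zero. Everything here is an assumption made for illustration: the function name `slope_test`, the made-up series and the coding of time as 0, 1, 2, ... are not taken from the data files. The p-value itself is then read from a t-distribution with n - 2 degrees of freedom, as Minitab does. Note that if time is counted in months, the slope is a change per month and must be multiplied by 12 to be comparable with the yearly analysis.

```python
import math

def slope_test(t, y):
    """Return (slope, standard error of slope, t-statistic for slope = 0)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    stt = sum((a - mean_t) ** 2 for a in t)
    sty = sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y))
    b1 = sty / stt
    b0 = mean_y - b1 * mean_t
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(t, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(stt)
    return b1, se_b1, b1 / se_b1

# Made-up monthly series (4 years): trend + seasonal pattern + irregular part.
season = [-6, -2, 3, 8, 6, 1, -3, -7, -5, 0, 3, 2]
noise = [1.5, -2.0, 0.5, 3.0, -1.0, -2.5, 0.5]
monthly = [50 + 0.1 * i + season[i % 12] + noise[i % 7] for i in range(48)]
annual = [sum(monthly[12 * k:12 * k + 12]) / 12 for k in range(4)]

b_month, se_month, t_month = slope_test(list(range(48)), monthly)
b_year, se_year, t_year = slope_test(list(range(4)), annual)

print("yearly change from monthly fit:", 12 * b_month, "t =", t_month)
print("yearly change from annual fit: ", b_year, "t =", t_year)
```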
Are there any signs of serial correlation in the annual series?
Are there any signs of serial correlation in the monthly series?
C. Regression analysis of deseasonalised data
The water-bearing at the outflow of Lake Hjälmaren is not evenly distributed over the year. The
second half of the year in general has more precipitation than the first half. We can therefore reduce the
variation in the time series of monthly data by deseasonalising the observations. This means that the
observed values are adjusted downwards for months with values above the annual mean and upwards
for months with values below the annual mean. For this:
- Calculate the average water-bearing for each of the twelve calendar months and use these mean
values to deseasonalise the monthly series (if you have problems computing the mean values
you can find an Excel file on the course webpage that includes these values). To deseasonalise,
just subtract the corresponding mean value from the observation.
- Then carry out a new regression analysis with the deseasonalised values as response variable
and compare with the results you got in B.
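The two steps above, computing the twelve calendar-month means and subtracting them, can be sketched in Python as follows. The function name `deseasonalise` and the two years of made-up values are illustrative assumptions; the column layout of Hjalmarenmonth.txt is not assumed here.

```python
def deseasonalise(values, months):
    """Subtract each calendar month's mean from the observations of that month.

    values: monthly observations in time order
    months: calendar month (1-12) of each observation
    """
    totals = {}
    counts = {}
    for v, m in zip(values, months):
        totals[m] = totals.get(m, 0.0) + v
        counts[m] = counts.get(m, 0) + 1
    month_means = {m: totals[m] / counts[m] for m in totals}
    adjusted = [v - month_means[m] for v, m in zip(values, months)]
    return month_means, adjusted

# Made-up values for two years, NOT the Lake Hjälmaren data.
months = list(range(1, 13)) * 2
values = [30, 28, 35, 50, 45, 25, 15, 12, 14, 20, 26, 32,
          34, 30, 37, 54, 47, 27, 17, 12, 16, 24, 28, 34]

month_means, adjusted = deseasonalise(values, months)
print(month_means[1], adjusted[0], adjusted[12])
```

The adjusted series sums to zero by construction, so only the variation around the seasonal pattern, including any trend, is left for the regression.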
How has the deseasonalising affected
- the standard deviation, s, in the regression analysis?
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time variable
is zero?
- the possible occurrence of serial correlation?
D. Multiple regression analysis of trend and seasonal effect
Analyse the monthly water-bearing data by using a regression model with time as one independent
variable and dummy variables for the different months.
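A model of this kind has one column for the intercept, one for time and eleven 0/1 columns for the months, with one month left out as reference. The Python sketch below shows one way to set it up and fit it by solving the normal equations. It is an illustration only: the function names, the choice of January as reference month and the made-up series are assumptions, and in practice Minitab performs the same fit once the dummy columns have been created.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_trend_with_month_dummies(values, months):
    """Least squares fit of value = b0 + b1 * t + effects of months 2..12.

    January (month 1) is the reference category (an arbitrary choice).
    Returns the coefficient list and the residual standard deviation s.
    """
    rows = []
    for t, m in enumerate(months):
        dummies = [1.0 if m == k else 0.0 for k in range(2, 13)]
        rows.append([1.0, float(t)] + dummies)
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    sse = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return beta, math.sqrt(sse / (n - p))

# Made-up monthly series (4 years), NOT the Lake Hjälmaren data.
season = [-6, -2, 3, 8, 6, 1, -3, -7, -5, 0, 3, 2]
noise = [1.5, -2.0, 0.5, 3.0, -1.0, -2.5, 0.5]
months = [i % 12 + 1 for i in range(48)]
values = [50 + 0.1 * i + season[i % 12] + noise[i % 7] for i in range(48)]

beta, s = fit_trend_with_month_dummies(values, months)
print("trend per month:", beta[1], "s:", s)
```

Only eleven dummy variables are used because a twelfth would be an exact linear combination of the intercept and the other eleven.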
How has the introduction of dummy variables affected
- the standard deviation, s, in the regression analysis?
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time
variable is zero?
E. Regression analysis of trends for separate months
Carry out separate analyses of water-bearing data for the months July and January respectively. Are
there any significant trends during these two calendar months?
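Analysing a single calendar month amounts to picking out that month's observation from every year and regressing the resulting yearly series on the year. A minimal Python sketch of the subsetting step is given below; the function names and the numbers are made up for illustration, and the significance test for each subseries is carried out as in the simple regression in B.

```python
def month_subseries(years, months, values, month):
    """Return (years, values) restricted to one calendar month."""
    picked = [(y, v) for y, m, v in zip(years, months, values) if m == month]
    return [y for y, _ in picked], [v for _, v in picked]

def least_squares_slope(x, y):
    """Slope of the least squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

# Made-up data with only January (1) and July (7) values for three years.
years = [2001, 2001, 2002, 2002, 2003, 2003]
months = [1, 7, 1, 7, 1, 7]
values = [40.0, 10.0, 44.0, 9.0, 48.0, 11.0]

jan_years, jan_values = month_subseries(years, months, values, 1)
jul_years, jul_values = month_subseries(years, months, values, 7)
print(least_squares_slope(jan_years, jan_values))
print(least_squares_slope(jul_years, jul_values))
```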
F. Summarising analysis
Summarise the results from all regression analyses made and discuss which method of analysis is the
most appropriate in each case.