LINKÖPINGS UNIVERSITET
Institutionen för datavetenskap
Statistik, CL, ANd

732G06 TIME SERIES ANALYSIS
Fall semester 2008

Assignment

Assignment week 36: Time series regression

Prediction of winter rye

Data material: winter_rye.txt

Background: During the last decades the production of winter rye has increased in Sweden. There is, however, substantial variation between years, mainly due to weather conditions. This makes it impossible to predict the harvest in coming years precisely. The task is to investigate whether a simple linear regression model can be used to predict the harvest of winter rye (in hectares) for the next year.

A. Graphical illustration

Get acquainted with the time series of yearly harvests of winter rye during the period 1960-1995 by making suitable graphs. Use Microsoft Excel and enter your data from the file winter_rye.txt (tab-separated).

B. Fitting regression model to raw data

Describe the yearly harvests of winter rye as a linear function of time (year number), i.e. a regression model including a linear trend. Calculate a prediction interval for the harvest in the year after the last observation of the series. If you use Minitab for this part, simply copy and paste your data from Excel to Minitab. Make sure to copy so that the header of each column ends up in the first, unnumbered row of the Minitab worksheet.

This regression model assumes that there is a basic linear relationship between harvest and time, which is disturbed by random errors that:

1. are statistically independent
2. have zero mean and constant variance over time
3. are normally distributed

The first property is particularly important. If there is an obvious tendency for a large positive (negative) deviation from the regression line to be followed by further positive (negative) deviations, the observations are said to be serially correlated. In that case the calculated p-values, confidence intervals and prediction intervals may be seriously distorted.
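As a rough illustration of how the serial correlation described above might be checked numerically, the sketch below fits a straight line by ordinary least squares and then computes the lag-1 autocorrelation of the residuals together with the Durbin-Watson statistic. The Durbin-Watson statistic is not mentioned in this assignment; it is a common diagnostic brought in here as an assumption. Python is likewise an assumption (the assignment itself uses Excel and Minitab), the function names are hypothetical, and the series `y` is made-up illustrative data, not the winter rye harvests.

```python
def fit_linear_trend(t, y):
    """Ordinary least squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def residuals(t, y, a, b):
    """Deviations of the observations from the fitted line."""
    return [yi - (a + b * ti) for ti, yi in zip(t, y)]


def lag1_autocorrelation(e):
    """Sample correlation between each residual and the one before it."""
    e_mean = sum(e) / len(e)
    num = sum((e[i] - e_mean) * (e[i - 1] - e_mean) for i in range(1, len(e)))
    den = sum((ei - e_mean) ** 2 for ei in e)
    return num / den


def durbin_watson(e):
    """Sum of squared successive differences divided by sum of squares."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den


# Made-up illustrative series (NOT the winter rye data).
t = list(range(1, 11))
y = [2.1, 2.9, 3.2, 3.0, 2.8, 3.9, 5.1, 5.6, 5.2, 4.9]

a, b = fit_linear_trend(t, y)
e = residuals(t, y, a, b)
print("intercept:", a, "slope:", b)
print("lag-1 autocorrelation of residuals:", lag1_autocorrelation(e))
print("Durbin-Watson statistic:", durbin_watson(e))
```

The Durbin-Watson statistic lies between 0 and 4. Values near 2 suggest little lag-1 correlation, values well below 2 point towards positive serial correlation, and values well above 2 towards negative serial correlation. A plot of the residuals against time remains a useful complement to these numbers.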
Are there any signs of serial correlation in the current analysis? Are the other assumptions fulfilled? Can the predictions be improved if we replace the simple linear regression model with a model describing the harvest as a non-linear function of time?

Analysis of trends in water-bearing from Lake Hjälmaren

Data material: Hjalmaren.txt, Hjalmarenmonth.txt

Background: During the 20th century several changes were made in our environment that, in a long-term perspective, would have an impact on the water-bearing in our water courses. First, almost every water course in Sweden has been subjected to regulation. Second, intensified farming and forestry have led to more efficient draining methods. In addition, there are possible effects of an evident climate change.

Your task is to investigate, by applying different regression techniques, whether there are any long-term trends in water-bearing data from the outflow of Lake Hjälmaren. More precisely, there are two purposes:

1. to clarify whether there are statistically significant upward or downward trends for annual means or specific seasons (months).
2. to illustrate how the presence of seasonal variation affects different types of statistical trend analyses.

A. Graphical illustration of data

Get acquainted with the data material by plotting the series of yearly and monthly values.

B. Regression analysis

Carry out a formal regression analysis with time as the independent variable. Compare the results for monthly and yearly values with respect to:

- the average yearly change
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time variable is zero.

Are there any signs of serial correlation in the annual series? Are there any signs of serial correlation in the monthly series?

C. Regression analysis of deseasonalised data

The water-bearing at the outflow of Lake Hjälmaren is not evenly distributed over the whole year. The second half of the year has in general more precipitation than the first half.
We can therefore reduce the variation in the time series of monthly data by deseasonalising the observations. This means that the observed values are adjusted downwards for months with values above the annual mean and upwards for months with values below the annual mean. To do this:

- Calculate the average water-bearing for each of the twelve calendar months and use these mean values to deseasonalise the monthly series (if you have problems computing the mean values, you can find an Excel file on the course webpage that includes these values).
- To deseasonalise, simply subtract the corresponding monthly mean from each observation.

Then carry out a new regression analysis with the deseasonalised values as the response variable and compare with the results you got in B. How has the deseasonalising affected:

- the standard deviation, s, in the regression analysis?
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time variable is zero?
- the possible occurrence of serial correlation?

D. Multiple regression analysis of trend and seasonal effect

Analyse the monthly water-bearing data by using a regression model with time as one independent variable and dummy variables for the different months. How has the introduction of dummy variables affected:

- the standard deviation, s, in the regression analysis?
- the p-value for testing the hypothesis that the coefficient (beta-parameter) for the time variable is zero?

E. Regression analysis of trends for separate months

Carry out separate analyses of water-bearing data for the months of July and January respectively. Are there any significant trends during these two calendar months?

F. Summarising analysis

Summarise the results from all regression analyses made and discuss which method of analysis is the most appropriate in each case.
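The deseasonalising step in part C (subtracting each calendar month's mean from the observations of that month) can be sketched as follows. This is only a minimal illustration under stated assumptions: Python is used here although the assignment itself works in Excel and Minitab, the function names `monthly_means` and `deseasonalise` are hypothetical, and the numbers are made up and cover only three calendar months over two years. They are not taken from Hjalmarenmonth.txt.

```python
from collections import defaultdict


def monthly_means(months, values):
    """Mean of `values` for each calendar month that appears in `months`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for m, v in zip(months, values):
        sums[m] += v
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}


def deseasonalise(months, values):
    """Subtract the mean of the matching calendar month from each value."""
    means = monthly_means(months, values)
    return [v - means[m] for m, v in zip(months, values)]


# Made-up example: months 1-3 observed in two consecutive years.
months = [1, 2, 3, 1, 2, 3]
values = [10.0, 20.0, 30.0, 14.0, 22.0, 34.0]

adjusted = deseasonalise(months, values)
print(adjusted)  # [-2.0, -1.0, -2.0, 2.0, 1.0, 2.0]
```

In this made-up series the large differences between months disappear after the adjustment, while the difference between the first and the second year is still visible in the adjusted values. The adjusted series would then serve as the response variable in the regression on time.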