
Time series

[Figure: Sales figures, Jan 1998 - Dec 2001]

[Figure: Tot-P (µg/l), Råån, Helsingborg, 1980-2001]

Characteristics

• Non-independent observations (correlation structure)

• Systematic variation within a year (seasonal effects)

• Long-term increasing or decreasing level (trend)

• Irregular variation of small magnitude (noise)

Where can time series be found?

• Economic indicators: sales figures, employment statistics, stock market indices, …

• Meteorological data: precipitation, temperature,…

• Environmental monitoring: concentrations of nutrients and pollutants in air masses, rivers, marine basins,…

Time series analysis

Purpose: Estimate different parts of a time series in order to

– understand the historical pattern

– assess the current status

– make forecasts of the future development

Methodologies:

Method                                       This course?
Time series regression                       No
Classical decomposition                      No
Exponential smoothing                        No
ARIMA modelling (Box-Jenkins)                Yes
Non-parametric tests                         Yes
Transfer function and intervention models    Yes
State space modelling                        Yes
Spectral domain analysis                     No

Time series regression?

Let $y_t$ = (observed) value of the time series at time point $t$, and assume a year is divided into $L$ seasons.

Regression model (with linear trend):

$$y_t = \beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{j,t} + \varepsilon_t$$

where $x_{j,t} = 1$ if $y_t$ belongs to season $j$ and 0 otherwise, $j = 1, \dots, L-1$, and $\{\varepsilon_t\}$ are assumed to have zero mean and constant variance ($\sigma^2$).

The parameters $\beta_0, \beta_1, \beta_{s,1}, \dots, \beta_{s,L-1}$ are estimated by the Ordinary Least Squares method:

$$(b_0, b_1, b_{s,1}, \dots, b_{s,L-1}) = \arg\min \sum_t \Big( y_t - \big(\beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{j,t}\big) \Big)^2$$
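The OLS problem above can be sketched with NumPy's least-squares solver. This is a minimal sketch, assuming t = 1, 2, …, n, that the first observation belongs to season 1, and that season L is the reference season whose indicator is left out; the function name `fit_trend_season` is hypothetical.

```python
import numpy as np

def fit_trend_season(y, L):
    """OLS estimates (b0, b1, b_s1, ..., b_s,L-1) for
    y_t = beta0 + beta1*t + sum_{j=1}^{L-1} beta_sj * x_{j,t} + eps_t.

    Assumes y[0] is observed at t = 1 and belongs to season 1;
    season L is the reference season (no indicator column).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    season = (t - 1) % L + 1                       # 1, 2, ..., L, 1, 2, ...
    x = np.column_stack([(season == j).astype(float) for j in range(1, L)])
    X = np.column_stack([np.ones(n), t, x])        # design matrix
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # minimises the sum of squares
    return b

# Usage: a noiseless quarterly series (L = 4) with known parameters
L = 4
true_b = np.array([10.0, 0.5, -2.0, 1.0, 3.0])    # beta0, beta1, beta_s1..beta_s3
t = np.arange(1, 21)
effect = np.array([-2.0, 1.0, 3.0, 0.0])           # season 4 is the reference
y = true_b[0] + true_b[1] * t + effect[(t - 1) % L]
b = fit_trend_season(y, L)
print(np.round(b, 6))                              # recovers true_b
```

Because the example series has no noise, the least-squares solution coincides with the parameters used to generate it.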

Advantages:

• Simple and robust method

• Easily interpreted components

• Normal inference (confidence intervals, hypothesis testing) directly applicable

Drawbacks:

• Fixed components in the model (mathematical trend function and constant seasonal components)

• No account is taken of correlation between observations

Example: Sales figures

[Figure: Sales figures January 1998 - December 2001; x-axis: month]

Year   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1998   20.33  20.96  23.06  24.48  25.47  28.81  30.32  29.56  30.01  26.78  23.75  24.06
1999   23.58  24.61  27.28  27.69  29.99  30.87  32.09  34.53  30.85  30.24  27.86  24.67
2000   26.09  26.66  29.61  32.12  34.01  32.98  36.38  35.90  36.42  34.04  31.29  28.50
2001   28.43  29.92  33.44  34.56  34.22  38.91  41.31  38.89  40.90  38.27  32.02  29.78

Construct seasonal indicators: $x_1, x_2, \dots, x_{12}$

January (1998-2001): $x_1 = 1, x_2 = 0, x_3 = 0, \dots, x_{12} = 0$

February (1998-2001): $x_1 = 0, x_2 = 1, x_3 = 0, \dots, x_{12} = 0$, etc.

December (1998-2001): $x_1 = 0, x_2 = 0, x_3 = 0, \dots, x_{12} = 1$

sales   time  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
20.33      1   1   0   0   0   0   0   0   0   0    0    0    0
20.96      2   0   1   0   0   0   0   0   0   0    0    0    0
23.06      3   0   0   1   0   0   0   0   0   0    0    0    0
24.48      4   0   0   0   1   0   0   0   0   0    0    0    0
  ...    ...
32.02     47   0   0   0   0   0   0   0   0   0    0    1    0
29.78     48   0   0   0   0   0   0   0   0   0    0    0    1

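Use only 11 of the indicators, e.g. $x_1, \dots, x_{11}$, in the regression model. Since $x_1 + \dots + x_{12} = 1$ for every month, including all 12 together with the constant term would make the columns linearly dependent; the omitted month (here December) becomes the reference season.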
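The reason for dropping one indicator can be checked numerically: with a constant column present, the 12 indicators are linearly dependent. A small NumPy sketch, assuming 48 monthly observations starting in January:

```python
import numpy as np

n, L = 48, 12
t = np.arange(1, n + 1)
month = (t - 1) % L + 1                                  # 1 = January, ..., 12 = December
x = np.column_stack([(month == j).astype(float) for j in range(1, L + 1)])  # x1..x12

X_all = np.column_stack([np.ones(n), t, x])              # constant, time, x1..x12
X_11 = np.column_stack([np.ones(n), t, x[:, :L - 1]])    # constant, time, x1..x11

# x1 + ... + x12 equals the constant column, so X_all cannot have full column rank
print(X_all.shape[1], np.linalg.matrix_rank(X_all))      # 14 columns, rank 13
print(X_11.shape[1], np.linalg.matrix_rank(X_11))        # 13 columns, rank 13
```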

Regression Analysis: sales versus time, x1, ...

The regression equation is sales = 18.9 + 0.263 time + 0.750 x1 + 1.42 x2 + 3.96 x3 + 5.07 x4 + 6.01 x5

+ 7.72 x6 + 9.59 x7 + 9.02 x8 + 8.58 x9 + 6.11 x10 + 2.24 x11

Predictor Coef SE Coef T P

Constant 18.8583 0.6467 29.16 0.000

time 0.26314 0.01169 22.51 0.000

x1 0.7495 0.7791 0.96 0.343

x2 1.4164 0.7772 1.82 0.077

x3 3.9632 0.7756 5.11 0.000

x4 5.0651 0.7741 6.54 0.000

x5 6.0120 0.7728 7.78 0.000

x6 7.7188 0.7716 10.00 0.000

x7 9.5882 0.7706 12.44 0.000

x8 9.0201 0.7698 11.72 0.000

x9 8.5819 0.7692 11.16 0.000

x10 6.1063 0.7688 7.94 0.000

x11 2.2406 0.7685 2.92 0.006

S = 1.087 R-Sq = 96.6% R-Sq(adj) = 95.5%

Analysis of Variance

Source DF SS MS F P

Regression 12 1179.818 98.318 83.26 0.000

Residual Error 35 41.331 1.181

Total 47 1221.150

Source DF Seq SS

time 1 683.542

x1 1 79.515

x2 1 72.040

x3 1 16.541

x4 1 4.873

x5 1 0.204

x6 1 10.320

x7 1 63.284

x8 1 72.664

x9 1 100.570

x10 1 66.226

x11 1 10.039

Unusual Observations

Obs time sales Fit SE Fit Residual St Resid

12 12.0 24.060 22.016 0.583 2.044 2.23R

21 21.0 30.850 32.966 0.548 -2.116 -2.25R

R denotes an observation with a large standardized residual

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI

1 32.502 0.647 ( 31.189, 33.815) ( 29.934, 35.069)

Values of Predictors for New Observations

New Obs time x1 x2 x3 x4 x5 x6

1 49.0 1.00 0.000000 0.000000 0.000000 0.000000 0.000000

New Obs x7 x8 x9 x10 x11

1 0.000000 0.000000 0.000000 0.000000 0.000000
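The Minitab fit can be cross-checked in Python. This is a sketch assuming NumPy and the 48 monthly values from the data table (January 1998 = time 1, December = reference month); `design` is a hypothetical helper. Up to rounding, the estimates agree with the coefficient table (constant about 18.858, time about 0.263, x1 about 0.750) and the fit for time 49 (January 2002) with 32.502.

```python
import numpy as np

# Monthly sales, January 1998 - December 2001 (one row per year)
sales = np.array([
    [20.33, 20.96, 23.06, 24.48, 25.47, 28.81, 30.32, 29.56, 30.01, 26.78, 23.75, 24.06],
    [23.58, 24.61, 27.28, 27.69, 29.99, 30.87, 32.09, 34.53, 30.85, 30.24, 27.86, 24.67],
    [26.09, 26.66, 29.61, 32.12, 34.01, 32.98, 36.38, 35.90, 36.42, 34.04, 31.29, 28.50],
    [28.43, 29.92, 33.44, 34.56, 34.22, 38.91, 41.31, 38.89, 40.90, 38.27, 32.02, 29.78],
]).ravel()

def design(t):
    """Columns: constant, time, x1..x11 (December is the reference month)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    month = (t.astype(int) - 1) % 12 + 1
    x = np.column_stack([(month == j).astype(float) for j in range(1, 12)])
    return np.column_stack([np.ones(len(t)), t, x])

time = np.arange(1, 49)
b, *_ = np.linalg.lstsq(design(time), sales, rcond=None)
residuals = sales - design(time) @ b              # e_t, t = 1..48

fit_49 = float((design(49) @ b)[0])               # point forecast for January 2002
print(np.round(b[:3], 3))                         # constant, time, x1
print(round(fit_49, 3))
print(round(float(np.sum(residuals ** 2)), 3))    # residual sum of squares
```

The standard errors and the 95% intervals in the output follow from the usual OLS formulas and are not reproduced in this sketch.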

[Figure: Sales figures with predicted value; x-axis: month]

What about serial correlation in data?

Positive serial correlation:

Values follow a smooth pattern

Negative serial correlation:

Values show a “thorny” pattern

How can it be detected?

Use the residuals:

$$e_t = y_t - \hat{y}_t = y_t - \Big( b_0 + b_1 t + \sum_{j=1}^{11} b_{s,j}\, x_{j,t} \Big), \quad t = 1, \dots, 48$$

[Figure: Residual plot from the regression analysis; residuals vs. month number (from Jan 1998)]

Smooth or thorny?

Durbin-Watson test on residuals:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

Rule of thumb:

If d < 1 or d > 3, the conclusion is that the residuals (and the original data) are serially correlated.

Use the shape of the residual plot (smooth or thorny) to decide whether the correlation is positive or negative.

(More thorough rules for comparisons and decisions about positive or negative correlation exist.)

Durbin-Watson statistic = 2.05 (reported in the output)

The value is > 1 and < 3, so there is no significant serial correlation in the residuals.
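The statistic and the rule of thumb can be sketched as follows. A useful approximation is d ≈ 2(1 - r₁), where r₁ is the lag-1 autocorrelation of the residuals; this is why values near 2 indicate no serial correlation, values near 0 positive and values near 4 negative correlation. The residual vectors below are made-up illustrations, and `thumb_rule` is a hypothetical helper encoding only the rough d < 1 / d > 3 rule from the text.

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def thumb_rule(d):
    """Rough rule from the text: d < 1 or d > 3 indicates serial correlation."""
    return "serially correlated" if d < 1 or d > 3 else "no clear serial correlation"

smooth = [1.0, 1.2, 1.1, 0.9, -0.8, -1.0, -1.1, -0.9]   # slowly varying residuals
thorny = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign changes every step
for name, e in (("smooth", smooth), ("thorny", thorny)):
    d = durbin_watson(e)
    print(name, round(d, 3), thumb_rule(d))
```

The smooth residuals give d well below 1 (positive correlation) and the thorny ones give d = 3.5 (negative correlation).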

• Decompose: analyse the observed time series in its different components:

– Trend part (TR)

– Seasonal part (SN)

– Cyclical part (CL)

– Irregular part (IR)

Cyclical part: state of the market in economic time series. In environmental series, it is usually combined with TR.

• Multiplicative model: $y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$

Suitable for economic indicators

The level is carried by $TR_t$ or by $TC_t = (TR \cdot CL)_t$; $SN_t$, $IR_t$ (and $CL_t$, when it is not included in $TC_t$) work as indices

Seasonal variation increases with the level of $y_t$

• Additive model: $y_t = TR_t + SN_t + CL_t + IR_t$

More suitable for environmental data

Requires constant seasonal variation

$SN_t$, $IR_t$ (and $CL_t$) vary around 0
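A small synthetic illustration of the difference (not course data): the same increasing trend is combined with a seasonal pattern additively and multiplicatively, and the within-year swing around the trend is compared. The sine-shaped pattern and all parameter values are arbitrary assumptions.

```python
import numpy as np

t = np.arange(1, 49)                              # four years of monthly data
phase = 2 * np.pi * ((t - 1) % 12) / 12
tr = 10 + 0.5 * t                                 # TR_t, increasing level
sn_add = 3 * np.sin(phase)                        # additive SN_t, varies around 0
sn_mult = 1 + 0.3 * np.sin(phase)                 # multiplicative SN_t, varies around 1

y_add = tr + sn_add                               # y_t = TR_t + SN_t
y_mult = tr * sn_mult                             # y_t = TR_t * SN_t

def seasonal_swing(y):
    """Within-year range of the deviations from the trend, one value per year."""
    return np.ptp((y - tr).reshape(4, 12), axis=1)

print(np.round(seasonal_swing(y_add), 2))         # same swing every year
print(np.round(seasonal_swing(y_mult), 2))        # swing grows with the level
```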

Example 1: Sales data

[Figure: Sales figures, Jan 1998 - Dec 2001]

[Figure: Observed (blue) and deseasonalised (magenta)]

[Figure: Observed (blue) and theoretical trend (magenta)]

[Figure: Observed (blue) with estimated trendline (black)]

Example 2: Home sales data

Estimation of components, working scheme

1. Seasonal adjustment/deseasonalisation:

• $SN_t$ usually accounts for the largest amount of variation among the components.

The time series is deseasonalised by calculating centred and weighted moving averages:

$$M_t^{(L)} = \frac{y_{t-L/2} + 2y_{t-(L/2-1)} + \dots + 2y_t + \dots + 2y_{t+(L/2-1)} + y_{t+L/2}}{2L}$$

where L = number of seasons within a year (L = 2 for half-year data, 4 for quarterly data and 12 for monthly data).

– $M_t$ becomes a rough estimate of $(TR \cdot CL)_t$.

– Rough seasonal components are obtained by

• $y_t / M_t$ in a multiplicative model

• $y_t - M_t$ in an additive model

– Mean values of the rough seasonal components are calculated for each season separately, giving L means.

– The L means are adjusted to

• have an exact average of 1 (i.e. their sum equals L) in a multiplicative model

• have an exact average of 0 (i.e. their sum equals zero) in an additive model.

– Final estimates of the seasonal components are set to these adjusted means and are denoted $sn_1, \dots, sn_L$.

– The time series is now deseasonalised by

• $y_t^* = y_t / sn_t$ in a multiplicative model

• $y_t^* = y_t - sn_t$ in an additive model

where $sn_t$ is one of $sn_1, \dots, sn_L$, depending on which season t represents.

2. Seasonally adjusted values are used to estimate the trend component and, when present, the cyclical component.

If no cyclical component is present:

• Apply linear regression on the seasonally adjusted values to obtain estimates $tr_t$ of a linear (or quadratic) trend component.

• The residuals from the regression fit constitute estimates $ir_t$ of the irregular component.

If a cyclical component is present:

Estimate the trend and cyclical components as a whole (do not split them) by

$$tc_t = \frac{y^*_{t-m} + y^*_{t-(m-1)} + \dots + y^*_{t-1} + y^*_t + y^*_{t+1} + \dots + y^*_{t+m}}{2m+1}$$

i.e. a non-weighted centred moving average of length 2m+1 calculated over the seasonally adjusted values.

– Common values for 2m+1: 3, 5, 7, 9, 11, 13

– The choice of m is based on properties of the final estimate of $IR_t$, which is calculated as

• $ir_t = y^*_t / tc_t$ in a multiplicative model

• $ir_t = y^*_t - tc_t$ in an additive model

– m is chosen so as to minimise the serial correlation and the variance of $ir_t$.

– 2m+1 is called the (number of) points of the moving average.
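Steps 1 and 2 of the working scheme can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not Minitab's implementation: L is even, the series starts in season 1, and a straight-line trend is used when no cyclical component is assumed. The function names `centred_ma`, `simple_ma` and `decompose` are hypothetical, and the seasonal pattern in the usage example is made up.

```python
import numpy as np

def centred_ma(y, L):
    """M_t = (y_{t-L/2} + 2y_{t-L/2+1} + ... + 2y_{t+L/2-1} + y_{t+L/2}) / (2L), L even.
    NaN for the first and last L/2 points, where M_t is undefined."""
    w = np.r_[1.0, np.full(L - 1, 2.0), 1.0] / (2 * L)
    out = np.full(len(y), np.nan)
    out[L // 2:len(y) - L // 2] = np.convolve(y, w, mode="valid")
    return out

def simple_ma(y, m):
    """Non-weighted centred moving average of length 2m+1 (NaN where undefined)."""
    width = 2 * m + 1
    out = np.full(len(y), np.nan)
    out[m:len(y) - m] = np.convolve(y, np.ones(width) / width, mode="valid")
    return out

def decompose(y, L, model="additive", m=None):
    """Classical decomposition following the working scheme.

    Assumes L is even and y[0] belongs to season 1. With m=None a linear trend
    tr_t is fitted (no cyclical component); otherwise tc_t is a (2m+1)-point MA.
    Returns (sn, y_star, trend_or_tc, ir).
    """
    y = np.asarray(y, dtype=float)
    mult = model == "multiplicative"
    season = np.arange(len(y)) % L                 # 0 = season 1, ..., L-1 = season L

    # Step 1: rough seasonal components, their season means, normalisation
    M = centred_ma(y, L)                           # rough estimate of (TR*CL)_t
    rough = y / M if mult else y - M               # NaN where M_t is undefined
    means = np.array([np.nanmean(rough[season == j]) for j in range(L)])
    sn = means * L / means.sum() if mult else means - means.mean()
    y_star = y / sn[season] if mult else y - sn[season]

    # Step 2: trend (no cycle) or trend+cycle, then the irregular component
    if m is None:
        t = np.arange(1, len(y) + 1)
        tc = np.polyval(np.polyfit(t, y_star, 1), t)
    else:
        tc = simple_ma(y_star, m)
    ir = y_star / tc if mult else y_star - tc
    return sn, y_star, tc, ir

# Usage: additive monthly series with a linear trend and a fixed seasonal pattern
s = np.array([-3, -2, 0, 1, 2, 4, 5, 4, 1, -2, -4, -6], dtype=float)  # sums to 0
t = np.arange(1, 49)
y = 20 + 0.3 * t + s[(t - 1) % 12]
sn, y_star, tc, ir = decompose(y, 12)              # linear trend, no cycle
_, _, tc7, ir7 = decompose(y, 12, m=3)             # 7-point moving average for TC
sn_mult, *_ = decompose(y, 12, model="multiplicative")
```

On this noiseless series, the 2×12 moving average reproduces the linear trend exactly and cancels the zero-sum seasonal pattern, so the estimated seasonal components equal the pattern that was put in; real data will leave a non-zero irregular component.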

Example, cont: Home sales data

Minitab can be used for decomposition via Stat > Time Series > Decomposition.

Choice of model type: an option to choose between the two models.

Time Series Decomposition

Data Sold

Length 47,0000

NMissing 0

Trend Line Equation

Yt = 5,77613 + 4,30E-02*t

Seasonal Indices

Period Index

1 -4,09028

2 -4,13194

3 0,909722

4 -1,09028

5 3,70139

6 0,618056

7 4,70139

8 4,70139

9 -1,96528

10 0,118056

11 -1,29861

12 -2,17361

Accuracy of Model

MAPE: 16,4122

MAD: 0,9025

MSD: 1,6902

Deseasonalised data have been stored in a column named DESE1.

Moving averages on this column can be calculated via Stat > Time Series > Moving Average.

Choice of 2m+1

[Figure: TC component with 2m+1 = 3 (blue)]

MSD should be kept as small as possible.

By saving the residuals from the moving averages, we can calculate the MSD and the serial correlation for each choice of 2m+1.

2m+1   MSD     Corr(e_t, e_{t-1})
3      1.817   -0.444
5      1.577   -0.473
7      1.564   -0.424
9      1.602   -0.396
11     1.542   -0.431
13     1.612   -0.405

A 7-point or 9-point moving average seems most reasonable: 7 points gives one of the lowest MSDs, and 9 points gives the weakest lag-1 correlation.

Serial correlations are simply calculated via Stat > Time Series > Lag, followed by Stat > Basic Statistics > Correlation.

Or manually in the Session window:

MTB > lag 'RESI4' c50

MTB > corr 'RESI4' c50
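A Python analogue of this comparison, assuming NumPy: for each candidate 2m+1 it computes the residuals e_t = y*_t - tc_t of a centred moving average, their MSD and the lag-1 correlation (pairing e_t with e_{t-1}, as the lag-and-correlate steps do). The series is synthetic and the helper names are hypothetical; Minitab's own MSD may be computed from its moving-average fits in a slightly different way (for example non-centred), so the numbers need not match its output.

```python
import numpy as np

def lag1_corr(e):
    """Correlation between e_t and e_{t-1}, using only pairs where both are
    defined (like correlating a column with its lagged copy)."""
    e = np.asarray(e, dtype=float)
    cur, prev = e[1:], e[:-1]
    ok = ~np.isnan(cur) & ~np.isnan(prev)
    return np.corrcoef(cur[ok], prev[ok])[0, 1]

def msd(e):
    """Mean squared deviation of the defined (non-NaN) residuals."""
    return np.nanmean(np.asarray(e, dtype=float) ** 2)

def compare_lengths(y_star, lengths=(3, 5, 7, 9, 11, 13)):
    """For each 2m+1, residuals e_t = y*_t - tc_t from a centred (2m+1)-point MA."""
    y_star = np.asarray(y_star, dtype=float)
    rows = []
    for k in lengths:
        m = k // 2
        tc = np.full(len(y_star), np.nan)
        tc[m:len(y_star) - m] = np.convolve(y_star, np.ones(k) / k, mode="valid")
        e = y_star - tc                            # additive irregular component
        rows.append((k, msd(e), lag1_corr(e)))
    return rows

# Usage on a synthetic deseasonalised series of length 47 (arbitrary parameters)
rng = np.random.default_rng(0)
t = np.arange(1, 48)
y_star = 6 + 0.04 * t + np.sin(2 * np.pi * t / 20) + rng.normal(0, 1, t.size)
rows = compare_lengths(y_star)
for k, s, r in rows:
    print(f"{k:>2}  MSD={s:.3f}  r1={r:.3f}")
```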

Analysis with multiplicative model:

Time Series Decomposition

Data Sold

Length 47,0000

NMissing 0

Trend Line Equation

Yt = 5,77613 + 4,30E-02*t

Seasonal Indices

Period Index

1 0,425997

2 0,425278

3 1,14238

4 0,856404

5 1,52471

6 1,10138

7 1,65646

8 1,65053

9 0,670985

10 1,02048

11 0,825072

12 0,700325

Accuracy of Model

MAPE: 16,8643

MAD: 0,9057

MSD: 1,6388


Classical decomposition, summary

Multiplicative model: $y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$

Additive model: $y_t = TR_t + SN_t + CL_t + IR_t$

Deseasonalisation

• Estimate the trend+cyclical component by a centred moving average:

$$CMA_t = \frac{y_{t-L/2} + 2y_{t-(L/2-1)} + \dots + 2y_t + \dots + 2y_{t+(L/2-1)} + y_{t+L/2}}{2L}$$

where L is the number of seasons (e.g. 12, 4, 2)

• Filter out the seasonal and error (irregular) components:

– Multiplicative model: $sn_t \cdot ir_t = y_t / CMA_t$

– Additive model: $sn_t + ir_t = y_t - CMA_t$

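Calculate monthly averages

Multiplicative model: $\overline{sn}_m = \frac{1}{n_m} \sum_{l=1}^{n_m} (sn_l \cdot ir_l)$

Additive model: $\overline{sn}_m = \frac{1}{n_m} \sum_{l=1}^{n_m} (sn_l + ir_l)$

for seasons m = 1, …, L, where the sums run over the $n_m$ observations belonging to season m.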

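Normalise the monthly means

Multiplicative model: $sn_m = \frac{\overline{sn}_m}{\frac{1}{L}\sum_{l=1}^{L} \overline{sn}_l} = \frac{L\,\overline{sn}_m}{\sum_{l=1}^{L} \overline{sn}_l}$

Additive model: $sn_m = \overline{sn}_m - \frac{1}{L}\sum_{l=1}^{L} \overline{sn}_l$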
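The normalisation can be sketched in NumPy and checked against the Minitab output for the home sales data: the additive seasonal indices should sum to 0 and the multiplicative ones to L = 12. The helper names are hypothetical; the index values are copied from the output (with decimal points instead of decimal commas).

```python
import numpy as np

def normalise_additive(means):
    """sn_m = mean_m - (1/L) * sum_l mean_l, so the components sum to 0."""
    means = np.asarray(means, dtype=float)
    return means - means.mean()

def normalise_multiplicative(means):
    """sn_m = L * mean_m / sum_l mean_l, so the components average exactly 1."""
    means = np.asarray(means, dtype=float)
    return means * len(means) / means.sum()

# Seasonal indices reported by Minitab for the home sales data
additive = [-4.09028, -4.13194, 0.909722, -1.09028, 3.70139, 0.618056,
            4.70139, 4.70139, -1.96528, 0.118056, -1.29861, -2.17361]
multiplicative = [0.425997, 0.425278, 1.14238, 0.856404, 1.52471, 1.10138,
                  1.65646, 1.65053, 0.670985, 1.02048, 0.825072, 0.700325]

print(round(float(np.sum(additive)), 4))          # 0.0
print(round(float(np.sum(multiplicative)), 4))    # 12.0
# Re-normalising already normalised indices changes them only by rounding error
print(np.max(np.abs(normalise_additive(additive) - additive)))
print(np.max(np.abs(normalise_multiplicative(multiplicative) - multiplicative)))
```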

Deseasonalise

Multiplicative model: $d_t = y_t / sn_t$

Additive model: $d_t = y_t - sn_t$

where $sn_t = sn_m$ for the current month m

Fit a trend function $tr_t = f(t)$ and detrend the (deseasonalised) data

Multiplicative model: $cl_t \cdot ir_t = d_t / tr_t$

Additive model: $cl_t + ir_t = d_t - tr_t$

Estimate the cyclical component and separate it from the error component

Multiplicative model:

$$cl_t = \frac{(cl \cdot ir)_{t-k} + (cl \cdot ir)_{t-(k-1)} + \dots + (cl \cdot ir)_t + \dots + (cl \cdot ir)_{t+k}}{2k+1}, \qquad ir_t = \frac{(cl \cdot ir)_t}{cl_t}$$

Additive model:

$$cl_t = \frac{(cl + ir)_{t-k} + (cl + ir)_{t-(k-1)} + \dots + (cl + ir)_t + \dots + (cl + ir)_{t+k}}{2k+1}, \qquad ir_t = (cl + ir)_t - cl_t$$
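The last two steps of the summary (detrending and splitting the remainder into cyclical and irregular parts) can be sketched as follows, assuming NumPy, a least-squares polynomial as the trend function f(t) and a (2k+1)-point non-weighted centred moving average for $cl_t$. The function name `separate_cycle` and the synthetic series are illustrative assumptions.

```python
import numpy as np

def separate_cycle(d, k, model="additive", degree=1):
    """Last steps of the summary scheme for deseasonalised values d_t:
    tr_t = f(t) by a least-squares polynomial (an assumption; any trend function
    could be used), cl_t = (2k+1)-point centred MA of the detrended values,
    and ir_t = what remains."""
    d = np.asarray(d, dtype=float)
    mult = model == "multiplicative"
    t = np.arange(1, len(d) + 1)
    tr = np.polyval(np.polyfit(t, d, degree), t)
    detrended = d / tr if mult else d - tr        # (cl*ir)_t or (cl+ir)_t
    width = 2 * k + 1
    cl = np.full(len(d), np.nan)                  # undefined for the first/last k points
    cl[k:len(d) - k] = np.convolve(detrended, np.ones(width) / width, mode="valid")
    ir = detrended / cl if mult else detrended - cl
    return tr, cl, ir

# Usage on a synthetic deseasonalised series with a slow cycle (arbitrary values)
rng = np.random.default_rng(1)
t = np.arange(1, 61)
d = 50 + 0.2 * t + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)
k = 3
tr, cl, ir = separate_cycle(d, k)
tr_m, cl_m, ir_m = separate_cycle(d, k, model="multiplicative")
```

By construction, $cl_t + ir_t$ (additive) or $cl_t \cdot ir_t$ (multiplicative) reproduces the detrended series wherever the moving average is defined.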
