A PAIR OF STATIONARY STOCHASTIC PROCESSES WITH A Thesis by Qi Li

advertisement

A PAIR OF STATIONARY STOCHASTIC PROCESSES WITH

APPLICATION TO WICHITA TEMPERATURE DATA

A Thesis by

Qi Li

Bachelor of Statistics, Nankai University, 2007

Submitted to the Department of Mathematics and Statistics and the faculty of the Graduate School of

Wichita State University in partial fulfillment of the requirements for the degree of

Master of Science

August 2010

c Copyright 2010 by Qi Li

All Rights Reserved

A PAIR OF STATIONARY STOCHASTIC PROCESSES WITH

APPLICATION TO WICHITA TEMPERATURE DATA

The following faculty members have examined the final copy of this thesis for form and content, and recommend that it be accepted in partial fulfillment of the requirement for the degree of Master of Science with a major in Mathematics.

Tianshi Lu, Committee Chair

Chunsheng Ma, Committee Member

Yanwu Ding, Committee Member iii

ACKNOWLEDGEMENTS

This research is supported in part by the Kansas NSF EPSCoR under Grant EPS

0903806 and in part by a Kansas Technology Enterprize Corporation grant on Understanding

Climate Change in the Great Plains: Source, Impact, and Mitigation.

iv

ABSTRACT

The thesis investigates a pair of stationary stochastic process models whose domains are the set of integers and the set of real numbers respectively. The stationary processes with our specific correlation functions include the discrete and continuous first and second order autoregressive processes as their special cases. The maximum likelihood method is then applied to obtain the nonlinear equation system for the maximum likelihood estimators of the model parameters and the solutions are found by using the deepest gradient algorithm.

The advantage of the algorithm lies in the calculation could be divided into several steps at a cost of O ( n ) calculations per step. Finally, predictions are given for both simulated data and Wichita temperature data.

v

TABLE OF CONTENTS

Chapter Page

1 INTRODUCTION

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1

2 MODELS OF DISCRETE STATIONARY TIME SERIES

. . . . . . . . . . . . .

3

2.1

Stationary Time Series Models and Correlation Functions

. . . . . . . . . . .

3

2.1.1

Correlation Function with Real and Distinct α

1

, α

2

. . . . . . . . . .

4

2.1.2

Correlation Function with Only One Exponential Factor

. . . . . . .

6

2.1.3

Correlation Function with Complex Conjugates α

1

, α

2

. . . . . . . .

7

2.2

Special Cases

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10

2.2.1

Special Case DAR(1)

. . . . . . . . . . . . . . . . . . . . . . . . . . .

11

2.2.2

Special Case DAR(2)

. . . . . . . . . . . . . . . . . . . . . . . . . . .

12

3 MODELS OF STATIONARY STOCHASTIC PROCESS ON THE REAL LINE

.

15

3.1

Stationary Stochastic Process Models and Correlation Functions

. . . . . . .

15

3.1.1

Correlation Function with Real and Distinct α

1

, α

2

. . . . . . . . . .

15

3.1.2

Correlation Function with Only One Exponential Factor

. . . . . . .

16

3.1.3

Correlation Function with Complex Conjugates α

1

, α

2

. . . . . . . .

17

3.2

Special Cases

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

19

3.3

Embedding

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

20

4 EXACT MAXIMUM LIKELIHOOD ESTIMATION

. . . . . . . . . . . . . . . .

22

4.1

Likelihood Function

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

22

4.2

Several Forms of Σ

− 1

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

22

4.3

Explicit Inverses of A

1 and A

2

. . . . . . . . . . . . . . . . . . . . . . . . . .

23

4.4

Equation System for Exact MLEs

. . . . . . . . . . . . . . . . . . . . . . .

26 vi

TABLE OF CONTENTS (continued)

Chapter Page

5 NUMERICAL SOLUTIONS AND PREDICTIONS

. . . . . . . . . . . . . . . . .

30

5.1

Numerical Algorithm for The Solutions

. . . . . . . . . . . . . . . . . . . . .

30

5.2

Fitting Simulated Data

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

36

5.3

Fitting Wichita Temperature Data

. . . . . . . . . . . . . . . . . . . . . . .

39

6 CONCLUSIONS

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

6.1

Conclusions

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

6.2

Future Work

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

REFERENCES

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

44 vii

LIST OF FIGURES

Figure Page

5.1

Prediction with complete data estimation

. . . . . . . . . . . . . . . . . . . . .

36

5.2

Prediction with exact the original model

. . . . . . . . . . . . . . . . . . . . . .

37

5.3

Prediction with randomly deleted data estimation

. . . . . . . . . . . . . . . .

38

5.4

Original temperature data

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

39

5.5

Operated temperature data

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

40

5.6

Prediction with real data estimation

. . . . . . . . . . . . . . . . . . . . . . . .

41 viii

CHAPTER 1

INTRODUCTION

The first purpose of this thesis is to investigate certain kinds of functions which might be both the correlation functions of discrete stationary time series models and the correlation functions of stationary stochastic process models on the real line with the difference lying in the change of index domains. These kinds of functions contain the correlation functions of the 1st and 2nd order discrete autoregressive models (denoted as DAR(1) and DAR(2)) as special cases when the functions are the correlation functions with discrete index domains.

Correspondingly, they contain the correlation functions of the 1st and 2nd order continuous autoregressive models (denoted as CAR(1) and CAR(2)) as special cases when the functions are the correlation functions with continuous index domains.

Next we develop a method to fit the discrete time series data using one of our continuous stochastic process models. The Yule-Walker equations are not applicable to estimate the parameters in the model and calculate the predictions because of the incomplete data and the nonspecific stochastic equation of the process. We use the likelihood method to find the maximum likelihood estimators (denoted as MLEs) for the parameters in our covariance function and calculate the predictions. During the procedure of developing the algorithm to calculate the MLEs we find that one great advantage of our model is the convenience of computation regarding both the determinant and the inverse of the correlation matrix.

There are 6 chapters in the thesis and the structure is set up as follows. In chapter 2, we show the correlation structures of our general models with respect to discrete stationary time series case and then show DAR(1) and DAR(2) are special cases of the general model.

In chapter 3, we discuss the similar correlation structures but change the index domain to the set of real numbers. We show the stationary stochastic processes on the real line with the correlation functions, when they are well defined, contain CAR(1) and CAR(2) as special cases. In chapter 4, under the normality assumption, we give the equation system

1

for the exact likelihood estimation of the parameters. In chapter 5, we use matrix notation to represent the exact likelihood estimation equations and their partial derivatives for the convenience of computation. Then the algorithm is performed on both the simulated data and an example of incomplete temperature data. We show the corresponding MLEs of the parameters and compute the predictions. In chapter 6, conclusions are presented and some directions for future research are pointed out with the unsolved problems encountered during our study of the subject.

2

CHAPTER 2

MODELS OF DISCRETE STATIONARY TIME SERIES

In this chapter we introduce some discrete stationary time series models with the specific correlation functions and then show that they contain the first order discrete autoregressive model ( DAR (1) ) and the second order discrete autoregressive model ( DAR (2) ) as special cases.

2.1

Stationary Time Series Models and Correlation Functions

Considering a discrete times series { X t

, t ∈

Z

} which is a sequence of random variables on the same probability space with

Z being the set of integers, we say that it is (weak) stationary if its 1st and 2nd moments do not vary with respect to time. This is expressed as follows.

Firstly, the mean function m x

( t ) of a (weak) stationary time series does not change with respect to time which implies that the means at different time points are a constant,

E { X t

} = E { X t + s

} = m x

( t ) = m x

(0) , t, ∀ s ∈

Z

.

Secondly, the correlation function R x

( t

1

, t

2

) only depends on the lag between the two time points,

E { X t

1

X t

2

} = R x

( t

1

, t

2

) = R x

( t

1

+ s, t

2

+ s ) = R ( t

1

− t

2

) ,

So the covariance function also depends on the lag t

1

− t

2

, t

1

, t

2

, ∀ s ∈

Z

.

C x

( t

1

, t

2

) = C x

( t

1

− t

2

, 0) = C ( t

1

− t

2

) , t

1

, t

2

Z

.

It is known that a Gaussian time series with zero mean is characterized by its correlation function. The correlation functions we will introduce depend on some parameters.

According to the different choices of those parameters, the correlation functions have three different forms which will be stated in the following subsections.

3

2.1.1

Correlation Function with Real and Distinct α

1

, α

2

Assume α

1

, α

2 are two distinct real numbers with 0 < | α

1

| , | α

2

| < 1 and θ is a constant whose range is determined by the values of α

1 and α

2

. We investigate that under what conditions the function

ρ ( n ) = θα

| n |

1

+ (1 − θ ) α

2

| n |

, n ∈

Z

(2.1) is the correlation function of a discrete stationary time series on

Z

. To make sure it is a correlation function we need the function to be positive definite and obtain the conditions in Lemma 1.

Lemma 1.

Let α

1

, α

2 be two real numbers with − 1 < α

1

< α

2

< 1. The function

( 2.1

) is the correlation function for a discrete stationary times series if and only if the constant

θ is in the range

1

1 − b

≤ θ ≤

1

1 − a where a =

(1+ α

1

)(1 − α

2

)

(1 − α

1

)(1+ α

2

) and b =

(1 − α

1

)(1+ α

2

)

(1+ α

1

)(1 − α

2

)

.

Proof: A function is a positive definite function in the discrete case if and only if the

its Fourier series is nonnegative. The Fourier series of function ( 2.1

) is

+ ∞

X n = −∞

ρ ( n ) e iωn

= θ

+ ∞

X

α

| n |

1 e iωn

+ (1 − θ )

+ ∞

X

α

| n |

2 e iωn n = −∞ n = −∞

= θ [1 +

+ ∞

X

( α

1 e iω

) n

+

+ ∞

X

( α

1 e

− iω

) n

] + (1 − θ )[1 +

+ ∞

X

( α

2 e iω

) n

+

+ ∞

X

( α

2 e

− iω

) n

] n =1 n =1 n =1 n =1

= θ (1 +

α

1 e iω

1 − α

1 e iω

1 − α 2

1

+

1

α

1 e

α

= θ

(1 − α

1 e iω )(1 − α

1 e − iω )

− iω

1 e − iω

+ (1

) + (1

− θ )

− θ )(1 +

1

α

1 − α

2

2

2 e

α iω

2 e iω

+

(1 − α

1 e iω )(1 − α

1 e − iω )

α

2 e

− iω

1 − α

2 e − iω

)

= θ

1 − 2 α

1 − α 2

1

1 cos ω + α 2

1

1 − α

+ (1 − θ )

1 − 2 α

2 cos

2

2

ω + α 2

2

, ω ∈

R and the range of θ is the set of θ s that guarantee the expression to be nonnegative for all

ω ∈

R

.

4

The procedure to find the range of θ is as same as to find the range of x with the following conditions

 x

1 − α

2

1

1 − 2 α

1 cos ω + α

2

1

+ y

1 − α

2

2

1 − 2 α

2 cos ω + α

2

2

 x + y = 1

≥ 0 which is a linear programming problem with restriction.

As we know, the range of the points that satisfy the inequality ax + by ≥ 0 with b > 0 is the half plain above the line ax + by = 0 when a and b are fixed. Now the tangent of the line changes according to ω . So we need to find the line with the largest and the smallest tangent respectively, and then figure out the intersection of the two half planes. Finally, we determine the range of x such that the line x + y = 1 falls into this intersection.

The line of our linear programming could be written as y =

(1 − α 2

1

)(1 − 2 α

2

(1 − α 2

2

)(1 − 2 α

1 cos ω + α 2

2 cos ω + α 2

1

)

) x.

To find the extreme value of the tangent we take the first derivative of

1 − 2 α

2

1 − 2 α

1 cos ω + α 2

2 cos ω + α 2

1 then let its numerator equal 0 and solve the equation, which is

(1 − 2 α

2 cos ω + α

2

2

)

0

(1 − 2 α

1 cos ω + α

2

1

) − (1 − 2 α

2 cos ω + α

2

2

)(1 − 2 α

1 cos ω + α

2

1

)

0

=2 α

2 sin ω (1 − 2 α

1 cos ω + α

2

1

) − 2 α

1 sin ω (1 − 2 α

2 cos ω + α

2

2

)

=2 sin ω ( α

1

α

2

− 1)( α

2

− α

1

) = 0 .

Since − 1 < α

1

< α

2

< 1, the numerator achieves its extremes when sin ω = 0. Thus the tangent will be maximized and minimized when cos ω = 1 or cos ω = − 1, which is

 y = −

(1+ α

1

)(1 − α

2

(1 − α

1

)(1+ α

2

)

) x when cos ω = 1 y = −

(1 − α

1

)(1+ α

2

)

(1+ α

1

)(1 − α

2

) x when cos ω = − 1 .

Both tangents are negative and reciprocal to each other. The range of θ is the range of x such that x + y = 1 falls between the two lines whose boundary is given by the x coordinates

5

of the two crossing points. The following inequality is the range of θ which we have already stated in Lemma 1,

1

1 − b

≤ θ ≤

1

1 − a

.

2.1.2

Correlation Function with Only One Exponential Factor

Assume α is a real number with 0 < | α | < 1 and θ is a real number whose range is determined by α . We discuss under what conditions the function

ρ ( n ) = (1 + θ | n | ) α

| n |

, n ∈

Z

(2.2) is the correlation function of a discrete stationary time series. The conditions to make sure

the function ( 2.2

) positive definite are stated in Lemma 2.

Lemma 2.

Let α be a real number with 0 < | α | < 1 and θ be a real number whose range is determined by α

. Then the function ( 2.2

) is the correlation function of a discrete

stationary time series if and only if θ satisfies the following conditions

− 1 − α

2

2 α

1 − α

2

2 α

≤ θ ≤ 1 − α

2

2 α

≤ θ ≤ − 1 − α

2

2 α when 0 < α < 1 when − 1 < α < 0 .

Proof:

The Fourier series of function ( 2.2

) is

X n = −∞

ρ ( n ) e iωn

= 1 +

X

(1 + θn ) α n e iωn

+

X

(1 + θn ) α n e

− iωn

= 1 +

= n =1

(1 + θ ) αe iω − α 2 e i 2 ω

(1 − αe iω ) 2 n =1

(1 + θ ) αe

− iω

+

− α

(1 − αe − iω ) 2

2 e

− i 2 ω

1 − 4 θα

2 − α

4

+ 2[( θ − 1) α + ( θ + 1) α

3

] cos( ω )

.

(1 − 2 α cos( ω ) + α 2 ) 2

The range of θ is the set of θ s that guarantee the expression to be nonnegative for all ω ∈

R

.

Because the denominator is always nonnegative, we only need to make sure the numerator is nonnegative.

We need to find the range of θ that guarantees the expression

[2( α + α

3

) cos( ω ) − 4 α

2

] θ − 2( α − α

3

) cos( ω ) + 1 − α

4

(2.3)

6

to be nonnegative. Let the first derivative of expression ( 2.3

) with respect to

ω equal 0, we obtain

[(1 − θ ) α − (1 − θ ) α

3

] sin( ω ) = 0 which is 0 when sin( ω ) = 0, equivalently cos( ω ) = − 1 or cos( ω ) = 1. By plugging cos( ω ) =

− 1 or cos( ω

) = 1 into expression ( 2.3

) and solving the inequality, we get

θ ≥ −

θ ≤ −

θ

θ

1 − α

2

2 α

1 − α

2

2 α

1 − α

2

2 α

1 − α

2

2 α when 0 < α < 1 when 0 < α < 1 and cos( ω ) = 1 when − 1 < α < 0 and cos( ω ) = 1 when − 1 < α < 0 and cos( ω ) = − 1 and cos( ω ) = − 1 .

We match them with the compatible pairs and get our result stated in Lemma 2, which is

− 1 − α

2

2 α

1 − α

2

2 α

≤ θ ≤ 1 − α

2

2 α

≤ θ ≤ − 1 − α

2

2 α when 0 < α < 1 when − 1 < α < 0 .

2.1.3

Correlation Function with Complex Conjugates α

1

, α

2

Assume α

1

, α

2 are two conjugate complex numbers with 0 < | α

1

| = | α

2

| < 1 ( | α | denotes the modulus of a complex number α ) and θ is a complex number whose range is determined by the values of α

1 and α

2

. We investigate under what conditions the function

ρ ( n ) = θα

| n |

1

+ (1 − θ ) α

| n |

2

= 2 | θ | a

| n | cos( | n | β

1

+ β

2

) , n ∈

Z

(2.4) is the correlation function of a discrete stationary time series. Here a, | α

1

| , | α

2

| are the identical real numbers represent the norm of α

1

, α

2 and 0 < a < 1.

β

1

= Arg( α

1

) > 0 >

Arg( α

2

) = − β

1 and β

2

= Arg( θ ). The “Arg” whose range is [ − π, π ) represents the principle value of a complex number. We state the conditions in Lemma 3.

Lemma 3.

Let α

1

, α

2 be two conjugate complex numbers with 0 < | α

1

| = | α

2

| < 1 and θ be a complex number whose range is determined by the values of α

1 and α

2

. The

function ( 2.4

) is the correlation function of a discrete stationary time series if and only if the

7

following condition is satisfied

− π + arccot a sin( β

1

)

1 − a 2

≤ β

2

≤ arccot − a sin( β

1

)

1 − a 2 where β

1

= Arg( α

1

), β

2

= Arg( θ ), a = | α

1

| = | α

2

| and the range of arccot is (0 , π ).

Proof:

The Fourier series of function ( 2.4

) is

X

2 | θ | a

| n | cos( | n | β

1

+ β

2

) e iωn n = −∞

X

= | θ | a

| n |

[ e i ( | n | β

1

+ β

2

)

+ e

− i ( | n | β

1

+ β

2

)

] e iωn

= | n = −∞

θ | e iβ

2

X a

| n | e i | n | β

1 e iωn

+ e

− iβ

2

X a

| n | e

− i | n | β

1 e iωn

!

.

n = −∞ n = −∞

We could omit | θ | in the following steps since | θ | > 0. Now let us take a look at a simplification of the first sum: e iβ

2

X a

| n | e i | n | β

1 e iωn n = −∞

= e iβ

2

"

1 +

X a n e in ( β

1

+ ω )

+

X a n e in ( β

1

− ω )

#

= e iβ

2 n =1 ae i ( β

1

+ ω )

1 +

1 − ae i ( β

1

+ ω ) n =1 ae i ( β

1

− ω )

+

1 − ae i ( β

1

− ω )

= e iβ

2

1 − a 2 e 2 iβ

1

(1 − ae i ( β

1

+ ω ) )(1 − ae i ( β

1

− ω ) )

, and then we simplify the second sum with the same method:

X e

− iβ

2 a

| n | e

− i | n | β

1 e iωn n = −∞

= e

− iβ

2

1 − a 2 e

− 2 iβ

1

(1 − ae − i ( β

1

− ω ) )(1 − ae − i ( β

1

+ ω ) )

.

8

By adding them together and further simplifying, we obtain e iβ

2

X a

| n | e i | n | β

1 e iωn

+ e

− iβ

2

X a

| n | e

− i | n | β

1 e iωn n = −∞ n = −∞

1 − a 2 e 2 iβ

1

= e iβ

2

(1 − ae i ( β

1

+ ω ) )(1 − ae i ( β

1

− ω ) )

1 − a 2 e

− 2 iβ

1

+ e

− iβ

2

(1 − ae − i ( β

1

− ω ) )(1 − ae − i ( β

1

+ ω ) ) e iβ

2 (1 − a

2 e

2 iβ

1 )(1 − ae

− i ( β

1

− ω ) − ae

− i ( β

1

+ ω )

+ a

2 e

− 2 iβ

1 )

=

(1 − 2 a cos( β

1

+ ω ) + a 2 )(1 − 2 a cos( β

1

− ω ) + a 2 )

+ e

− iβ

2

(1

(1 − a 2

− 2 a e cos(

− 2 iβ

1

β

1

+

)(1

ω

) + ae a 2 i ( β

1

− ω )

)(1 − 2

− ae a i ( β

1

+ ω ) cos( β

1

+ a 2 e 2 iβ

1 )

.

− ω ) + a 2 )

(2.5)

From ( 2.5

) we find it is

β

2

= Arg( θ ) that determines the sign of the Fourier series

of function ( 2.4

). We discuss the range of

β

2 by considering the geometric meaning of the

complex numbers in ( 2.5

). Since

α

1

, α

2 are already known the values of a and β

1 are fixed.

Because the correlation function ( 2.4

) is a real value function and the Fourier series is also

real we claim the two complex numbers e iβ

2 (1 − a

2 e

2 iβ

1 )(1 − ae

− i ( β

1

− ω ) − ae

− i ( β

1

+ ω )

+ a

2 e

− 2 iβ

1 ) and e

− iβ

2 (1 − a

2 e

− 2 iβ

1 )(1 − ae i ( β

1

− ω )

− ae i ( β

1

+ ω )

+ a

2 e

2 iβ

1 ) are complex conjugates. Let us denote c

1

= (1 − a 2 e 2 iβ

1 ) and c

2

= (1 − ae

− i ( β

1

− ω ) − ae

− i ( β

1

+ ω ) + a

2 e

− 2 iβ

1 c

1

= (1 − a

2 e

− 2 iβ

1 c

2

= (1 − ae i ( β

1

− ω ) − ae i ( β

1

+ ω )

+ a

2 e

2 iβ

1 ).

The sign of ( 2.5

) only depends on the Arg(

c

1

) and Arg( c

2

) but not their modulus. It is easy to find that the Arg( c

1

) is fixed since it only depends on a and β

1

. Thus the range of Arg( c

2

) plays the most important role. The complex number c

2

= (1 + a 2 e

− 2 iβ

1 ) − ( ae

− i ( β

1

− ω ) + ae

− i ( β

1

+ ω ) ) could be viewed as a substraction of two parts. The first part is the part in the first bracket which is not changed with respect to ω . The second part is the part in the second bracket and it is changed when ω changes. We should notice that the second part, if we consider it as a vector on the complex plane, is always moving along the line

Arg( c ) = − β

1 or Arg( c ) = − β

1

+ π , because no matter what value ω takes the two complex numbers ae

− i ( β

1

− ω ) and ae

− i ( β

1

+ ω ) are always symmetric with respect to this line. We draw

9

the conclusion that the extreme values of Arg( c

2

) are achieved when ω = 0 or ω = π . Thus

β

2 also achieve its extremes when ω = 0 or ω = π . Let us consider the value of ω in two cases and preform the calculation respectively as follows.

When ω

= 0, the numerator of ( 2.5

) becomes

e iβ

2 (1 − a

2 e

2 iβ

1 )(1 − ae

− iβ

1 )

2

+ e

− iβ

2 (1 − a

2 e

− 2 iβ

1 )(1 − ae iβ

1 )

2

= (1 − ae iβ

1 )(1 − ae

− iβ

1 ) e iβ

2 (1 + ae iβ

1 )(1 − ae

− iβ

1 ) + e

− iβ

2 (1 + ae

− iβ

1 )(1 − ae iβ

1 )

= 2(1 − a

2

) cos( β

2

) + 2 a sin( β

1

) sin( β

2

) .

Letting it be greater than 0 we find the range of β

2 is cot( β

2

) ≥ − a sin( β

1

)

1 − a 2 when or sin( β

2

) > 0 cot( β

2

) ≤ − a sin( β

1

)

1 − a 2 when sin( β

2

) < 0 .

When ω = π , we perform the calculation in the same way and obtain the numerator

of ( 2.5

) as

2(1 − a

2

) cos( β

2

) − 2 a sin( β

1

) sin( β

2

) .

Letting it be greater than 0 we find the range of β

2 is cot( β

2

) ≥ a sin( β

1

)

1 − a 2 when or

The final conclusion is cot( β

2

) ≤ a sin( β

1

)

1 − a 2 when sin( sin(

β

β

2

2

)

) >

<

0

0 .

− π + arccot a sin( β

1

)

1 − a 2

≤ β

2

≤ arccot − a sin( β

1

)

1 − a 2

.

2.2

Special Cases

In this section we will show the correlation functions of DAR (1) and DAR (2) are included in the correlation functions of our models. Thus DAR (1) and DAR (2) are special cases of our discrete stationary time series models.

10

Definition 1.

A stationary random process { X t

, t ∈

Z

} is from DAR ( p ) model if there exist the following relation

X t

= φ

1

X t − 1

+ φ

2

X t − 2

+ . . .

+ φ p

X t − p

+ ω t

, t ∈

Z where φ

1

, φ

2

, . . . , φ p are constants( φ p

= 0) and { ω t

, t ∈

Z

} is a Gaussian white noise series independently from { X t − 1

, X t − 2

, . . .

} with mean 0 and variance σ

2

ω

.

Given the definition of the general DAR ( p ) model, we are going to develop the correlation function for DAR (1) and DAR (2) models respectively.

2.2.1

Special Case DAR(1)

Proposition 1.

For DAR (1) model X t

= φX t − 1

+ ω t

, φ = 0 the correlation function has the following form: r ( n ) = φ

| n |

, n ∈

Z

.

Proof: First we claim the correlation function of an invertible DAR (1) model is a

special case of ( 2.2

) when

θ = 0 with | φ | < 1 and then we show the procedure to derive the correlation function. Since the correlation function is even we assume n ≥ 0 while deriving the formula. When p = 1 and the DAR (1) model is invertible, the sequence { X t

, t ∈

Z

} can be written as

X t

= φX t − 1

+ ω t

= φ ( φX t − 2

+ ω t − 1

) + ω t

. . . .

This method suggests, by continuing to iterate backwards, that the DAR (1) model can be represented as a linear combination of Gaussian white noise series

X t

=

X

φ i

ω t − i

.

i =0

11

The covariance function C ( n ) , n ≥ 0 satisfies

C ( n ) = cov ( X t + n

, X t

)

= E [(

X

φ i

ω t + n − i

)(

X

φ j

ω t − j

)] j =0 i =0

= σ

2

ω

X

φ i

φ i + n i =0

= σ

2

ω

φ n

X

φ

2 i

= i =0

σ 2

ω

φ n

1 − φ 2

.

And it is easy to verify that

C ( n ) =

σ 2

ω

φ n

1 − φ 2

= φC ( n − 1) = φ n

C (0) = C ( − n ) .

Thus, r ( n ) = r ( − n ) =

C ( n )

= φ

| n |

.

C (0)

2.2.2

Special Case DAR(2)

Proposition 2.

For the DAR (2) model X t

= φ

1

X t − 1

+ φ

2

X t − 2

+ ω t

, φ

2

= 0 with the equation 1 − φ

1 z − φ

2 z

2

= 0 having two roots z

1

, z

2

, the correlation function satisfies one of the following three forms r ( n ) = c

1 z

1

−| n |

+ c

2 z

2

−| n | r ( n ) = z

1

−| n |

( c

1

+ c

2

| n | ) r ( n ) = a | z

1

|

−| n | cos ( | n | β + b )

(2.6)

(2.7)

(2.8) where c

1

, c

2

, a, b and β are constants determined by the model.

Proof: Because the correlation function is even so we only show the proof for n ≥ 0.

The correlation function has the following relation r ( n ) − φ

1 r ( n − 1) − φ

2 r ( n − 2) = cov ( X t − n

, X t

− φ

1

X t − 1

+ φ

2

X t − 2

)

= cov ( X t − n

, ω t

) = 0

12

(2.9)

Suppose the equation 1 − φ

1 z − φ

2 z 2 = 0 has two different roots z

1

, z

2 which could be either real numbers or complex conjugates (note that if the AR (2) model is invertible, both z

1

, z

2

are outside the unit circle). Then we claim the general solution to equation ( 2.9

) is

r ( n ) = c

1 z

1

− n

+ c

2 z

2

− n

(2.10) where c

1

, c

2 depend on the initial conditions which are r (0) = c

1

+ c

2 r (1) = c

1 z

1

− 1

+ c

2 z

2

− 1

.

This claim of the general form solution can be verified through the substitution of

equation ( 2.10

) into equation ( 2.9

)

( c

1 z

1

− n

+ c

2 z

2

− n

) − φ

1

( c

1 z

1

− ( n − 1)

+ c

2 z

− ( n − 1)

2

) − φ

2

( c

1 z

− ( n − 1)

1

+ c

2 z

2

− ( n − 1)

)

= c

1 z

1

− n

(1 − φ

1 z

1

− φ

2 z

1

2

) + c

2 z

2

− n

(1 − φ

1 z

2

− φ

2 z

2

2

)

= c

1 z

− n

1

· 0 + c

2 z

− n

2

· 0 = 0

So ( 2.10

) is a form of the correlation function of

DAR

(2) which is ( 2.6

) when

z

1

, z

2 are real

numbers and ( 2.8

) when

z

1

, z

2

are complex conjugates. It is easy to check that ( 2.7

) is the

corresponding correlation function when the equation 1 − φ

1 z − φ

2 z 2 = 0 has equal roots.

After calculation we give the specific value of those parameters in ( 2.6

), ( 2.7

) and

( 2.8

). Let

α

1

=

1 z

1 and α

2

=

1 z

2 where z

1

, z

2 are two roots of equation 1 − φ

1 z − φ

2 z

2

= 0,

then the three formulas ( 2.6

), ( 2.7

) and ( 2.8

) correspond to the following three formulas in

the same order r ( n ) =

(1 − α

2

2

( α

) α

1

| n | +1

1

− (1 − α

2

1

) α

| n | +1

2

,

− α

2

)(1 + α

1

α

2

) r ( n ) = (1 + | n |

1 − α

2

1

1 + α 2

1

) α

| n |

1

, r ( n ) = [ cos ( | n | β ) + cot ( β ) sin ( | n | β )

1 − | α

1

| 2

1 + | α

1

| 2

] | α

1

|

| n |

, β = Arg( α

1

) .

(2.11)

(2.12)

(2.13)

13

So the correlation functions of DAR (2) are all special cases of our general models because the coefficients all fall into the range we determined in Lemma 1, Lemma 2 and Lemma 3.

At the end of this chapter we claim our discrete time series model contains some cases other than DAR (1) and DAR (2) model. One reason is that the range of θ could be broader

(note: We still need | α

1

| < 1 , | α

2

| < 1, although sometimes it is even not strict enough, to make the correlation matrix nonnegative definite).

14

CHAPTER 3

MODELS OF STATIONARY STOCHASTIC PROCESS ON THE REAL LINE

In this chapter we introduce stationary stochastic process models and their specific correlation functions whose index set is the set of real numbers and focus on the three types

of functions similar to ( 2.1

),(

2.2

) and ( 2.4

) to find the necessary and sufficient conditions

such that they are the correlation functions of stationary stochastic processes on the real line.

Then we show that they contain the first order continuous autoregressive model ( CAR (1)) and the second order continuous autoregressive model ( CAR (2)) as special cases.

3.1

Stationary Stochastic Process Models and Correlation Functions

A stochastic process { X t

, t ∈

R

} , where

R is the set of real numbers, is a set of random variables on the same probability space at any time point. We say it is (weak) stationary if its 1st and 2nd moments do not vary with respect to time. This could be expressed as the same properties shown at the beginning of the second chapter with a change of the index domain from

Z to

R

. Because the formulas for those properties are the same we will not repeat them here.

3.1.1

Correlation Function with Real and Distinct α

1

, α

2

Assume α

1

, α

2 are two distinct real numbers with 0 < α

1

, α

2

< 1 and θ is a constant whose range is determined by the values of α

1 and α

2

, we investigate under what conditions the function

ρ ( t ) = θα

| t |

1

+ (1 − θ ) α

| t |

2

, t ∈

R

(3.1) is the correlation function of a stationary stochastic process on the real line. To make sure it is a correlation function we need the function to be positive definite, thus we introduce

Lemma 4.

Lemma 4.

Let α

1

, α

2 be two real numbers and 0 < α

1

< α

2

<

1. The function ( 3.1

)

is the correlation function for a stationary stochastic process on the real line if and only if

15

the constant θ satisfies ln α

2 ln α

2

− ln α

1

≤ θ ≤ ln α

1 ln α

1

− ln α

2

.

Proof: By Bochner’s theorem, a continuous function is the positive definite function if and only if its Fourier transform is nonnegative, which is

Z

+ ∞

[ θα

| t |

1

−∞

+ (1 − θ ) α

| t |

2

] e iωt d t

=

Z

0

+ ∞

θα t

1 e iωt d t +

Z

0

−∞

θα

− t

1 e iωt d t +

Z

(1 − θ ) α t

2

0 e iωt d t +

1 − θ

Z

0

−∞

(1 − θ ) α

− t

2 e iωt d t

=

= θ

θ

− ln α

1

2 ln

(ln α

1

) 2

α iω

+

1

ω

+

2

θ

− ln α

1

+

+ (1 − θ ) iω

+

− 2 ln α

2

1 − θ

− ln α

2

(ln α

2

) 2 + ω 2

.

− iω

+

− ln α

2

+ iω

We need to find the range of θ such that the expression above is not less than 0 by solving the restricted linear programming problem

 x

− 2 ln α

1

(ln α

1

) 2 + ω 2

+ y

− 2 ln α

2

(ln α

2

) 2 + ω 2

≥ 0

 x + y = 1 with the same procedure that we have used for the discrete case.

After similar calculations we get the two lines with extremal tangent, which are

 y = − ln α

1 ln α

2 x ω → ∞

 y = − ln α

2 ln α

1 x ω = 0 .

Thus, we must have ln α

2 ln α

2

− ln α

1

≤ θ ≤ ln α

1 ln α

1

− ln α

2

.

3.1.2

Correlation Function with Only One Exponential Factor

Assume α is a real number with 0 < α < 1 and θ is a real number whose range is determined by α . We investigate under what conditions the function

ρ ( t ) = (1 + θ | t | ) α

| t |

, t ∈

R

(3.2) is the correlation function of a stationary stochastic process on the real line. We need the conditions to guarantee the function above positive definite which are stated in Lemma 5.

16

Lemma 5.

Let α be a real number with 0 < α < 1 and θ be a constant whose range is determined by α

. The function ( 3.2

) is the correlation function of a stationary stochastic

process on the real line if and only if θ satisfies ln α ≤ θ ≤ − ln α.

Proof:

The Fourier transform of the function ( 3.2

) is

Z

(1 + θ | t | ) α

| t | e iωt d t

−∞

Z

Z

0

= α t e iωt d t +

0 −∞

α

− t e iωt d t + θ

Z

0 tα t e iωt d t − θ

1

= − ln α + iω

+

− ln

1

α + iω

− ln α

θ

+ iω

Z

0

Z

0

−∞ e

(ln α + iω ) t d t + tα

− t e iωt d t

− ln

θ

α + iω

1

= − ln α + iω

+

− ln

1

α + iω

+

(ln α

θ

+ iω ) 2

+

( − ln α

θ

+ iω ) 2

− 2( θ + ln α ) ω

2

+ 2( θ − ln α )(ln α )

2

=

[(ln α ) 2 + ω 2 ] 2

Z

0

−∞ e

( − ln α + iω ) t d t and it is nonnegative for all ω ∈

R if and only if the numerator is nonnegative. Because the numerator is a quadratic form of ω , thus the necessary and sufficient conditions are

θ + ln α ≤ 0

θ − ln α ≥ 0 , which is our conclusion in Lemma 5.

3.1.3

Correlation Function with Complex Conjugates α

1

, α

2

Assume α

1

, α

2 are two conjugate complex numbers with 0 < | α

1

| = | α

2

| < 1 and θ is a complex number whose range is determined by the values of α

1 and α

2

, we investigate under what conditions the function

ρ ( t ) = θα

| t |

1

+ (1 − θ ) α

| t |

2

= 2 | θ | a

| t | cos( | t | β

1

+ β

2

) , t ∈

R

(3.3) is the correlation function of a stationary stochastic process on the real line, where a =

| α

1

| = | α

2

| are the identical real numbers represent the modulus of α

1

, α

2 and 0 < a < 1.

17

β

1

= Arg( α

1

) > 0 > Arg( α

2

) = − β

1 and β

2

= Arg( θ ). The “Arg” whose range is [ − π, π ) represent the principle value of a complex number. We state the conditions in Lemma 6.

Lemma 6.

Let α

1

, α

2 be two conjugate complex numbers with 0 < | α

1

| = | α

2

| < 1 and θ be a complex number whose range is determined by the values of α

1 and α

2

, the

function ( 3.3

) is the correlation function of a stationary stochastic process on the real line if

and only if one of the following conditions is satisfied

π

2

+ arctan

β

1

− ln a

≤ β

2

≤ − arctan

β

1

− ln a or arctan

β

1

− ln a

≤ β

2

π

2

− arctan where β

1

= Arg( α

1

), β

2

= Arg( θ ) and a = | α

1

| = | α

2

| .

β

1

− ln a

Proof:

The Fourier transform of the function ( 3.3

) is

Z

| θ | a

| t | cos( | t | β

1

+ β

2

) e iωt d t

=

−∞

| θ | n

Z

∞ e iβ

2 e

[ − ln a + i ( ω + β

1

)] t d t +

2

0

Z

0

+ e

− iβ

2 e

[ − ln a + i ( ω + β

1

)] t d t o

Z

0 e

− iβ

2 e

[ln a + i ( ω − β

1

)] t d t +

Z

0

−∞ e iβ

2 e

[ − ln a + i ( ω − β

1

)] t d t

=

| θ

2

|

−∞

= −

| θ |

2 e iβ

2

− ln a + i ( ω + β

1

) e

− iβ

2 ln a + i ( ω − β

1

) e iβ

2 e

− iβ

2

− + e

− iβ

2 [ln a + i ( ω + β

1

)] + e iβ

2 [ln a − i ( ω + β

1

)]

+

− ln a + i ( ω − β

1

) − ln a + i ( ω + β

1

) e iβ

2 [ln a + i ( ω − β

1

)] + e

− iβ

2 [ln a − i ( ω − β

1

)]

+

(ln a ) 2 + ( ω + β

1

) 2 (ln a ) 2 + ( ω − β

1

) 2

.

Nonnegativity of the above formula depends on the range of β

2 with respect to all ω ∈

R

. The denominator is always positive. For further calculation let us denote the complex numbers as ln a + i ( ω + β

1

) = a

1 e iθ

1 ln a − i ( ω + β

1

) = a

1 e

− iθ

1 ln a + i ( ω − β

1

) = a

2 e iθ

2 ln a − i ( ω − β

1

) = a

2 e

− iθ

2

18

where a

1

, a

2 are two nonnegative real numbers. After substitution and further simplification the condition that the numerator of the above formula is nonnegative is equivalent to

0 ≥ e iβ

2 a

2 e

− iθ

1 + a

1 e iθ

2 + e

− iβ

2 a

2 e iθ

1 + a

1 e

− iθ

2

= e iβ

2 e

− i ( θ

1

− θ

2

) a

2 e

− iθ

2 + a

1 e iθ

1 + e

− iβ

2 e i ( θ

1

− θ

2

) a

2 e iθ

2 + a

1 e

− iθ

1

= 2 (ln a + iβ

1

) e i [ β

2

− ( θ

1

− θ

2

)]

+ (ln a − iβ

1

) e

− i [ β

2

− ( θ

1

− θ

2

)] where ln a + iβ

1 and ln a − iβ

1 are complex conjugates. The range of θ

1

− θ

2 could be obtained by studying its geometric meaning which is the angle between the two vectors ln a + i ( ω + β

1

) and ln a + i ( ω − β

1

). The range of θ

1

− θ

2 is given as follows:

− arctan

2 β

1

− ln a

≤ θ

1

− θ

2

≤ 0 or

2 π − 2 arctan

β

1

− ln a

≤ θ

1

− θ

2

≤ 2 π − arctan

2 β ln

1 a

.

Let us denote θ

3

= Arg(ln a + iβ

1

) and θ

3 could be expressed as θ

3

= π − arctan

β

1

− ln a

.

The range of β

2 is

− π ≤ β

2

− ( θ

1

− θ

2

) + θ

3

≤ −

π

2 or

π

2

≤ β

2

− ( θ

1

− θ

2

) + θ

3

≤ π and finally, the range of β

2 is given by

π

2

+ arctan

β

1 ln a

≤ β

2

≤ − arctan

β

1

− ln a or

3.2

Special Cases arctan

β

1

− ln a

≤ β

2

π

2

− arctan

β

1

− ln a

Definition 2.

A stationary stochastic process { X t

, t ∈

R

} is from CAR ( p ) model if there exist the following relation:

φ p

X t

+ φ p − 1

X t

0

+ . . .

+ φ

1

X t

( p − 1)

+ X t

( p )

= ω t

,

19

where X t

( i )

, i = 1 , 2 , . . . , p represent the ith derivative of X t

, { ω t

, t ∈

R

} is a continuous white noise process and φ

1

, φ

2

, . . . , φ p are real numbers.

The definition for a stationary stochastic process { X t

, t ∈

R

} from CAR (1) and

CAR (2) could be easily obtained. Here we show the correlation functions of CAR (1) and

CAR

(2) without calculation. One can find the details from Box and Jenkins [ 2 ], Chan and

Tong [ 3 ] and Ma [ 7 ].

Proposition 3.

The correlation function of a CAR (1) model is r ( t ) = e

| t | ln α

= α

| t |

, t ∈

R

, where 0 < α < 1 is determined by the coefficient of the model.

Proposition 4.

The correlation function of a CAR (2) model belongs to one of the following three types r ( t ) = ln α

2

α

| t |

1

+ln α

1 ln α

2

− ln α

1

α

| t |

2 , r ( t ) = (1 − ln α | t | ) α

| t |

,

0 < α

1

< α

2

0 < α < 1 ,

< 1 , r ( t ) = cos( β | t | ) − ln αβ

− 1 sin( β | t | ) α

| t |

, 0 < α < 1 , β > 0 .

Because the above correlation functions of CAR (1) and CAR (2) are all special cases of the correlation functions of our general stationary stochastic process model, it is natural to claim CAR (1) and CAR (2) are special cases of our stationary stochastic processes.

3.3

Embedding

Now let us extend the discussion. The embedding problem is: when we have a discrete stationary time series { X t

, t ∈

Z

} whose correlation function is from one of our discrete stationary time series models, can we find a stationary stochastic process { Y t

, t ∈

R

} such that Y t

= X t when t ∈

Z

?

For the first kind of correlation functions which have two distinct real numbers α

1

, α

2

, we could find the following relation between the ranges of θ with respect to discrete time series and continuous stochastic process cases

1

1 − b

≤ ln α

2 ln α

2

− ln α

1

≤ θ ≤ ln α

1 ln α

1

− ln α

2

1

1 − a

.

20

It means if we have a stationary stochastic process { Y t

, t ∈

R

} we could always claim there is a discrete stationary time series { X t

, t ∈

Z

} with the property Y t

= X t

, ∀ t ∈

Z by simply changing the index domain. But if we are given { X t

, t ∈

Z

} first, even with 0 < α

1

< α

2

< 1, we are not so sure that there exist a stationary stochastic process { Y t

, t ∈

R

} satisfies our requirement. The reason is although a function of the first kind is positive definite in the discrete case, it may not be positive definite on the real line.

For the other two cases, which are the second and third kinds of correlation functions, the discussion is almost the same. We list the inequalities below and it is clear that their relations between the discrete cases and the corresponding continuous cases are the same as that in the first kind. For the second kind of correlation functions we have

(1 − α 2 )

2 α

≤ ln α ≤ θ ≤ − ln α ≤

(1 − α 2 )

2 α and for the third kind of correlation functions we have

− π + arccot a sin( β

1

)

1 − a 2

≤ −

π

2

β

1

+ arctan(

− ln a

) ≤ β

2

β

1

≤ − arctan(

− ln a

) ≤ 0 or

0 ≤ arctan(

β

1

− ln a

) ≤ β

2

π

2

− arctan(

β

1

− ln a

) ≤ arccot − a sin( β

1

)

1 − a 2

.

21

CHAPTER 4

EXACT MAXIMUM LIKELIHOOD ESTIMATION

4.1

Likelihood Function

Before showing the likelihood function of our model we introduce some prerequisite conditions. We assume the stationary stochastic process { X t

, t ∈

R

} has the finitedimensional Gaussian distribution and the elements of its correlation matrix Σ have the

structure ( 3.1

). Then the likelihood function of a vector

X = ( x t

1

, x t

2

, . . . , x t n

)

0

, where x t i is an observation of { X t

} at time point t i

, i ∈ 1 , 2 , . . . , n , is

= ln l ( α

1

, α

2

, θ, σ

2

) = ln f ( X, σ

2

1

Σ) = ln [

(2 πσ 2 ) n

2

| Σ |

1

2

− n

2 ln (2 πσ

2

) −

1

2 ln | Σ | −

1

2 σ 2

X

0

Σ

− 1

X.

1 exp −

2 σ 2

X

0

Σ

− 1

X ]

To find the maximum of the likelihood function above is the same as finding the minimum of the function n ln ( σ

2

) + ln | Σ | +

1

σ 2

X

0

Σ

− 1

X.

Some prerequisite knowledge to obtain the minimum of this function is introduced in the next two sections.

4.2

Several Forms of Σ

− 1

Proposition 5.

Given two nonsingular matrices A , B ∈

R n × n , if their addition

A + B is also nonsingular, then we have

( A + B )

− 1

= A

− 1

( A

− 1

+ B

− 1

)

− 1

B

− 1

= A

− 1 − A

− 1

( A

− 1

+ B

− 1

)

− 1

A

− 1

= B

− 1 − B

− 1

( A

− 1

+ B

− 1

)

− 1

B

− 1

.

(4.1)

(4.2)

(4.3)

Readers can verify the proposition by checking the multiplication of matrices equals the

identity matrix or refer to Schott [ 13 ].

22

The correlation matrix of the column vector X = ( x t

1

, x t

2

, . . . , x t n

)

0 is

Σ = θ A

1

+ (1 − θ ) A

2 where and

A

1

1

=

α t

2

1

− t

1

α t

3

− t

1

1

 .

..

α t n

− t

1

1

α t

2

− t

1

1

1

α t

3

− t

2

1

.

..

α

1 t n

− t

2

α t

3

− t

1

1

. . . α t n

− t

1

1

α t

3

1

− t

2 . . . α t n

1

− t

2

.

..

1 . . . α t n

− t

3

1

. ..

.

..

α t n

− t

3

1

. . .

1

A

2

=

1 α

2 t

2

− t

1

α t

2

2

− t

1 1

α t

3

2

 .

..

− t

1

α

2 t

3

.

..

− t

2

α t n

2

− t

1 α

2 t n

− t

2

α

2 t

3

− t

1

α t

3

2

− t

2

. . . α

2 t n

− t

1

. . . α t n

− t

2

2

.

..

1 . . . α

2 t n

− t

3

. ..

.

..

.

α

2 t n

− t

3 . . .

1

So we have

Σ

− 1

= [ θ A

1

+ (1 − θ ) A

2

]

− 1

= A

1

− 1

[(1 − θ ) A

1

− 1

+ θ A

2

− 1

]

− 1

A

2

− 1

(4.4)

= ( θ A

1

)

− 1 { θ A

1

− [( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } ( θ A

1

)

− 1

(4.5)

= ((1 − θ ) A

2

)

− 1 { (1 − θ ) A

2

− [( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } ((1 − θ ) A

2

)

− 1

.

(4.6)

4.3

Explicit Inverses of A

1 and A

2

To find the Σ

− 1 we need to figure out the inverses of A

1

, A

2 and the inverse of

( θ A

1

)

− 1 + ((1 − θ ) A

2

)

− 1 . Now let us first take a look at what A

1

− 1 is.

As we have stated earlier A

1 can be viewed as the correlation matrix of a model like

DAR (1), which we define it as

X t

2

= α t

2

1

− t

1 X t

1

+ ε t

2

23

where { X t

, t ∈

R

} is a stochastic process with t

1

< t

2

< . . .

which is not stationary and

{ ε t

, t ∈

R

} is a Gaussian white noise with ε t i independent of X t for all t < t i

(note: here we assume the variances of ε t are different at different time points).

The joint pdf of column vector X = ( x t

1

, x t

2

, . . . , x t n

)

0

, t

1

< t

2

< . . . < t n with covariance matrix σ 2 A

1 is f ( X, σ

2

A

1

) =

1

(2 πσ 2 ) n

2

| A

1

|

1

2

1 exp ( −

2 σ 2

X

0

A

1

− 1

X ) .

We denote the covariance matrix of the vector X

1

= ( x t

1

, ε t

2

, . . . , ε t n

)

0 as Σ

1

, var ( x t i

) =

σ 2 , for all i ∈ { 1 , 2 , . . . , n } and var ( ε t i

) = σ 2

ε ti

, for all i ∈ { 1 , 2 , . . . , n } . Because all the components of X

1 are independent with each other, the covariance matrix Σ

1 has the form

Σ

1

σ 2 0 0 . . .

0

=

0 σ 2

ε t

2

0 . . .

0

0 0 σ 2

ε t

.

..

.

..

.

..

0 0

3

. . .

. ..

0

..

.

0 . . . σ 2

ε tn

 and the joint pdf of the vector X

1 is f ( X

1

, Σ

1

) =

1

(2 π ) n

2

| Σ

1

|

1

2 exp ( −

1

2

X

1

0

Σ

− 1

1

X

1

) .

From the model we could get the relations between σ 2 and each σ 2

ε ti as follows:

σ

2

= var ( x t i

) = var ( α

1 t i

− t i − 1 x t i − 1

+ ε t i

)

= α

1

2( t i

− t i − 1

) var ( x t i − 1

) + var ( ε t i

)

= α

2( t i

− t i − 1

)

1

σ

2

+ σ

2

ε ti

.

Thus, we have

σ

2

ε ti

= (1 − α

1

2( t i

− t i − 1

)

) σ

2

.

By writing the expression of the transformation from the vector X

1

= ( x t

1

, ε t

2

, . . . , ε t n

)

0

24

to the vector X = ( x t

1

, x t

2

, . . . , x t n

)

0

, we claim it is a one-to-one transformation because

 x t

1

 x t

2

 x t

3

.

..

 x t n

1

=

α t

2

1

− t

1

α

1 t

3

.

..

− t

1

α

1 t n

− t

1

0

1

α

1 t

3

.

..

− t

2

α

1 t n

− t

2 α

1

0 . . .

0

  x t

1

0 . . .

0 

ε t

2

 1 . . .

0 

.

..

. .. ...

ε t

3

.

..

 t n

− t

3 . . .

1

 

ε t n

 and the Jacobian determinant of the transform matrix is 1, thus obviously nonsingular.

We can expand the joint pdf of the vector X

1 as f ( X

1

, Σ

1

) =

1

(2 π ) n

2

| Σ

1

|

1

2 exp [ −

1

2

1

(

σ 2 x

2 t

1

+

1

σ 2

ε t

2

ε

2 t

2

+ . . .

+

1

σ 2

ε tn

ε

2 t n

)] .

To find the joint pdf of the vector X from the joint pdf of the vector X

1

, we just need to apply the one to one transformation, which is the same as the following substitution:

σ

2

ε ti

= (1 − α

1

2( t i

− t i − 1

)

) σ

2 and

ε t i

= x t i

− α

1 t i

− t i − 1 x t i − 1 for any i ∈ { 2 , 3 , . . . , n } . Then we get f ( X, σ

2

A

1

) =

(2 π ) n

2

1

| Σ

1

|

1

2 exp {−

1

2 σ 2

[ x

2 t

1

+

1

1 − α

1

2( t

2

− t

1

)

( x t

2

− α

1 t

2

− t

1 x t

1

)

2

+ . . .

+

1

1 − α

1

2( t n

− t n − 1

)

( x t n

− α t n

− t n − 1

1 x t n − 1

)

2

] }

=

(2 π ) n

2

1

| Σ

1

|

1

2

1 exp {−

2 σ 2

1

[

1 − α

1

2( t

2

− t

1

) x

2 t

1

1 − α

2( t

3

− t

1

)

1

+

(1 − α

1

2( t

2

− t

1

)

)(1 − α

2( t

3

1

− t

2

)

) x

2 t

2

+ . . .

+

2 α t

2

1

− t

1

1 − α

1

2( t

2

− t

1

) x t

1 x t

2

1

1 − α

1

2( t n

− t n − 1

) x

2 t n

] } , which is another form of f ( X, σ

2

A

1

) =

1

(2 πσ 2 ) n

2

| A

1

|

1

2

1 exp ( −

2 σ 2

X

0

A

1

− 1

X ) .

(4.7)

(4.8)

25

Now by comparing ( 4.7

) and ( 4.8

) we conclude that

| Σ

1

| = | σ 2 A

1

| and the explicit form of A

1

− 1 is

A

1

− 1

=

1

1 − α

1 −

2( t

2

1

α

α t

1

2

2(

1

− t

1 t

2 t

1)

− t

1)

0

..

.

0

(1 − α

2( t

2

1

1 − α

1

− t

1)

1 −

α

α

α t

2

1

1

− t

1

1 − α

2( t

1

2( t

3

2

− t

1)

− t

1)

.

..

)(1 − α

2( t

3

1 t

3

1

− t

2

2( t

0

3

− t

2)

− t

2)

)

0

α t

3

1

− t

2

(1 − α

2( t

3

1

1 −

1 − α

− t

α

2( t

1

2( t

4

1

3

− t

2)

− t

2)

2)

)(1 − α

2( t

4

1

.

..

0

− t

3)

)

. . .

. . .

. . .

. ..

. . .

0

0

0

..

.

.

1

1 − α

2( tn

1

− tn

− 1)

The elements of the tridiagonal matrix A

1

− 1 are given by the following formulas:

( A

1

− 1

) ii

=

( A

1

− 1

)

11

=

1 − α

2( ti

+1

− ti

− 1)

[1 − α

2( ti

1

1

− ti

− 1)

][1 − α

2( ti

+1

1

− ti )

]

,

1

1 − α

2( t

2

1

− t

1)

, i = 1 and n

( A

( A

1

− 1

1

− 1

) nn

)

= i,i +1

1 − α

1

2( tn − tn

− 1)

1

,

= ( A

1

− 1

) i +1 ,i

= −

1 −

α ti

+1

1

− ti

α

2( ti

+1

1

− ti )

, i ∈ { 1 , 2 , . . . , n − 1 } .

Similarly, the explicit form of A

− 1

2 can be obtained by applying the same procedure to A

2 and the results changes from α

1 to α

2 with the same formula types.

4.4

Equation System for Exact MLEs

Here we introduce another lemma, which is given in Fonseca [ 4 ], to help determine

the form of [ θ A

1

− 1

+ (1 − θ ) A

2

− 1

]

− 1 .

Lemma 7.

If a matrix T has the following form

T =

 a

1

 b

1

 a

.

..

0 b

2

.

..

1

 b

1 b a

0

.

..

2

3

. . .

. . .

. . .

. ..

0

0

0

..

.

0 0 0 . . . b n − 1 a

0

0

0

.

..

n

 where we define b

0

= 1, then the ij th element of its inverse, which is denoted as ( T

− 1

) ij

, can be written as

( T

− 1

) ij

= ( − 1) i + j b min ( i,j )

. . . b max ( i,j ) − 1

γ min ( i,j ) − 1

δ max ( i,j )+1

γ n

26

where γ i and δ i satisfy the following recurrence

γ i

= a i

γ i − 1

− b

2 i − 1

γ i − 2

,

δ i

= a i

δ i +1

− b

2 i

δ i +2

.

Since we have already shown that θ A

1

− 1

+ (1 − θ ) A

2

− 1 is a symmetric tridiagonal matrix, the ijth element of [ θ A

1

− 1

+ (1 − θ ) A

2

− 1

]

− 1 , which we denote as η ij

, can be obtained by applying Lemma 7.

We will use ( 4.5

) to calculate the elements of Σ

− 1 . First, the ij th element of θ A

1

[ θ A

1

− 1

+ (1 − θ ) A

2

− 1

]

− 1 is

ζ ij

= θα

1

| t i

− t j

|

− η ij

.

Second, let us make a change of our matrices to make the notation of our result simpler. Let

0

B =

0 n

0

0

0 n

θ A

1

− [ θ A

1

− 1

+ (1 − θ ) A

2

− 1

]

− 1

0

0 n

0

0 n

0

 where 0 n represents a column vector in

R n consists of all zeros; hence B is a matrix in

R

( n +2) × ( n +2) and we expand the index of ζ ij to i, j ∈ { 0 , 1 , 2 , . . . , n, n + 1 } with ζ

0 j

= ζ i 0

=

ζ n +1 ,j

= ζ i,n +1

= 0. Let

B

1

= 0 n

( θ A

1

)

− 1 0 n which is a matrix in

R n × ( n +2)

.

Denote the diagonal elements of θ A

1

− 1 as n c i

, i ∈ { 1 , 2 , . . . , n } o and the subdiagonal elements as n d i

, i ∈ { 1 , 2 , . . . , n − 1 } o

. We expand the index of { d i

} to i ∈ { 0 , 1 , 2 , . . . , n, n +

1 } with d

0

= d n +1

= 0. Then we have

Σ

− 1

= ( θ A

1

)

− 1 { θ A

1

− [( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } ( θ A

1

)

− 1

= B

1

BB

1

0

.

27

Denoting the ij th element of Σ

− 1 as σ ij , we can express σ ij as follows:

σ ij

= (Σ

− 1

) ij

= d i − 1 c i d i +1

ζ i − 1 ,j − 1

ζ i,j − 1

ζ i +1 ,j − 1

ζ i − 1 ,j

ζ ij

ζ i +1 ,j

ζ i − 1 ,j +1

  d j − 1

ζ i,j +1

 c j

ζ i +1 ,j +1

  d j +1

 for any i, j ∈ { 1 , 2 , . . . , n } .

Now we express the log likelihood function as ln l ( α

1

, α

2

, θ, σ

2

) = − n

2 ln (2 πσ

2

) −

1

2 ln | Σ | −

1

2 σ 2 n

X n

X

σ ij x t i x t j

.

i =1 j =1

(4.9)

Because σ ij and | Σ | do not depend on σ 2

, we let the partial derivative of ( 4.9

) with respect

to σ 2 equal 0 and solve the equation to find The MLE for σ 2 , which is

2

=

1 n n

X n

X

σ ij x t i x t j

.

i =1 j =1

(4.10)

To find the MLEs for θ, α

1 and α

2 we need the following lemma which is given in Plackett

[ 12 ].

Lemma 8.

If f ( X, Q ) is the multivariate normal density with mean zero and positive definite covariance matrix Q = ( σ ij

) n × n and f ( X, Q ) =

1

(2 π ) n

2

| Q |

1

2 exp {−

1

2

X

0

Q

− 1

X } , then for i, j ∈ { 1 , 2 , . . . , n } we have

∂σ ij f ( X, Q ) =

1

2

∂x

2 i f ( X, Q ) i = j

∂x i

∂x j f ( X, Q ) i = j.

Let Q = σ 2 Σ, f ( X, Q ) = l ( α

1

, α

2

, θ, σ 2 ) For α = α

1

, α = α

2 and α = θ using the

28

Lemma 8 we get

∂α ln l =

X l

− 1

∂σ ij i ≤ i

∂l

·

∂σ

∂α

=

X l

− 1

∂x t i

∂x t j j<i

∂l 2

· ij

∂σ

∂α ij

+

1

2

X l

− 1

∂x t i

2 j =1

∂l 2

·

∂σ ii

∂α

= σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 }

∂ [ θα

1

| t i

− t j

|

+ (1 − θ ) α

2

| t i

− t j

|

]

∂α j<i l =1 l =1

= σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2

}

∂ [ θα

1

| t i

− t j

|

+ (1 − θ ) α

2

| t i

− t j

|

]

∂α j<i l =1 l =1 from this, the three likelihood equations become

σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 }

∂ [ θα

1

| t i

− t j

|

+ (1 − θ ) α

2

| t i

− t j

|

]

∂α

1 j<i l =1 l =1

= σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 } θ ( t i

− t j

) α

1 t i

− t j

− 1 j<i l =1 l =1

= 0 , (4.11)

σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 }

∂ [ θα

1

| t i

− t j

|

+ (1 − θ ) α

2

| t i

− t j

|

]

∂α

2 j<i l =1 l =1

= σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2

} (1 − θ )( t i

− t j

) α

2 t i

− t j

− 1

= 0 j<i l =1 l =1

(4.12) and

σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 }

∂ [ θα

| t i

− t j

|

1

+ (1 − θ ) α

2

| t i

− t j

|

]

∂θ j<i l =1 l =1

= σ

− 2 n

X

{ (

X n

σ il x t l

) · (

X

σ jl x t l

) − σ ij

σ

2 } ( α

1 t i

− t j j<i l =1 l =1

− α

2 t i

− t j ) = 0 (4.13)

By substituting equation ( 4.10

) into the above equations ( 4.11

), ( 4.12

) and ( 4.13

), we

get a nonlinear equation system about α

1

, α

2 and θ . The MLEs ˆ

1

, ˆ

2 and ˆ are obtained by solving these equations simultaneously and numerically.

29

CHAPTER 5

NUMERICAL SOLUTIONS AND PREDICTIONS

In this chapter we will describe the way to put the idea of estimating the parameters in the model by its MLEs into practice and work on both the simulated and real data.

The basic idea to find the MLEs numerically is simple. First, we perform initial analysis on the original data and take the first difference of the sequence to approximate stationarity. Second, we try to locate the global maximum of the likelihood function by dividing the range of α

1

, α

2 and θ into cubes and comparing the values of the function at the points in different cubes. To achieve a relatively small cube we need to perform the procedure many times. Then we assign a point in the obtained cube as a starting point of our iterations and use the deepest gradient to find the estimators within a preset error.

Finally, we use the krigging method for the predictions.

5.1

Numerical Algorithm for The Solutions

Although the closed form of the nonlinear equation system is obtained in last chapter, it is not convenient for calculation. Thus we develop a way to find the estimation of parameters in the model numerically.

Proposition 6.

If Σ is a nonsingular matrix with each element a function of some parameters including α , then

∂ Σ

− 1

∂α

= − Σ

− 1

∂ Σ

Σ

− 1

.

∂α

Proof.

For the identity matrix I = ΣΣ

− 1

, we take the first partial derivative of α on both sides

∂ I

0 =

∂α

=

∂ Σ

Σ

− 1

∂α

+ Σ

∂ Σ

− 1

∂α

,

30

then we reach the conclusion

∂ Σ

− 1

∂α

= − Σ

− 1

∂ Σ

Σ

− 1

.

∂α

Now let us find the partial derivative of the likelihood function with respect to σ 2 , θ, α

1 and α

2 respectively. The partial derivative of the likelihood function with respect to σ 2 is

( n ln σ

2

∂σ 2

1

= n

σ 2

1

= n

σ 2

σ

1

σ 4

1

4

+ ln | Σ | +

X

[( θ

0

Σ

− 1

A

1

X

)

− 1

σ

1

2

X ]

0

X

0

Σ

− 1

X )

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1

[( θ A

1

)

− 1

X ] which leads to the same estimation for σ 2

as ( 4.10

).

The partial derivative of σ 2 gives us an explicit form of the MLE, so we do not consider

σ 2 as one dimension when we divide the range of parameters. The partial derivative of the likelihood function with respect to θ, α

1 and α

2 are

∂θ

( n ln σ

2

= −

∂θ ln |

+ ln

A

1

|

− 1

Σ | +

1

σ 2

X

0

Σ

− 1

X )

[(1 − θ ) A

1

− 1

+ θ A

2

− 1

]

− 1

A

2

− 1

| +

=

=

=

1

σ 2

X

0

(

∂θ

Σ

− 1

) X

∂ ln | (1 − θ ) A

1

− 1

+ θ A

2

− 1 | +

1

σ 2

X

0

∂θ

[

1

A

θ

1

− 1 − A

1

− 1

( θ A

1

− 1

+

∂θ

∂ ln | (1 − θ ) A

1

− 1

+ θ A

2

− 1

|

∂θ

1

σ 2

1

[

θ 2

X

0

A

1

− 1

X + ( A

1

− 1

X )

0

∂θ ln | (1 − θ ) A

1

− 1

+ θ A

2

− 1

∂θ

| −

( θ A

1

− 1

+

θ 2

1 − θ

A

2

− 1

)

− 1

( A

1

− 1

X )]

1

( σθ ) 2

X

0

A

1

− 1

X

θ 2

1 − θ

A

2

− 1

)

− 1

A

1

− 1

] X

+

1

σ 2

( A

1

− 1

X )

0

( θ A

1

− 1

+

θ

2

1 − θ

A

2

− 1

)

− 1

[ A

1

− 1

+

2 θ − θ

2

(1 − θ ) 2

A

2

− 1

]( θ A

1

− 1

+

θ

2

1 − θ

A

2

− 1

)

− 1

( A

1

− 1

X ) ,

31

and

∂α

1

( n ln ( σ

2

) + ln | Σ | +

= −

∂α

1

= −

∂α

1 ln | A ln | A

1

1

− 1

− 1

1

σ 2

X

0

Σ

− 1

X )

[(1 − θ ) A

1

| +

∂α

1 ln |

− 1

(1

+

θ

θ

A

)

2

− 1

A

1

]

− 1

− 1

+

A

θ

2

− 1

| +

A

2

− 1 |

1

σ 2

X

0

(

∂α

1

Σ

− 1

) X

1

σ 2

= −

∂α

1

[((1 − θ ) A

2

)

− 1

X ]

0

(

∂α

1 ln | A

1

− 1 | +

∂α

1

[( θ A

1

)

− 1 ln | (1 − θ ) A

1

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1

)[((1 − θ ) A

2

)

− 1

X ]

+ θ A

2

− 1 |

+

1

σ 2

[((1 − θ ) A

2

)

− 1

X ]

0

{ [( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1

∂α

1

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } [((1 − θ ) A

2

)

− 1

X ]

= −

∂α

1 ln | A

1

− 1

| +

∂α

1 ln | (1 − θ ) A

1

− 1

+ θ A

2

− 1

|

+

1

σ 2

[((1 − θ ) A

2

)

− 1

X ]

0

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1

1

θ

∂ A

1

− 1

∂α

1

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } [((1 − θ ) A

2

)

− 1

X ]

∂α

2

( n ln ( σ

2

) + ln | Σ | +

1

σ 2

X

0

Σ

− 1

X )

= −

∂α

2

= ln | A

1

− 1

[(1 − θ ) A

1

− 1

+ θ A

2

− 1

]

− 1

A

2

− 1

| +

1

σ 2

X

0

(

∂α

2

Σ

− 1

) X

∂α

2

1

σ 2 ln | (1 − θ ) A

1

− 1

[( θ A

1

)

− 1

X ]

0

(

∂α

+

2

θ

[(

A

θ

2

− 1

A

1

)

| −

− 1

∂α

2 ln | A

2

− 1 |

+ ((1 − θ ) A

2

)

− 1

]

− 1

)[( θ A

1

)

− 1

X ]

=

∂α

2

1

+

σ 2 ln | (1 − θ ) A

1

− 1

[( θ A

1

)

− 1

X ]

0

+

{ [( θ A

θ

1

A

2

)

− 1

− 1

| −

+ ((1

∂α

2

θ ) ln

A

|

2

A

2

− 1

)

− 1

|

]

− 1

∂α

2

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1 } [( θ A

1

)

− 1

X ]

=

∂α

+

2

1

σ 2 ln | (1 − θ ) A

[( θ A

1

)

− 1

X ]

0

1

− 1

[( θ

+ θ A

2

− 1

| −

A

1

)

− 1

∂ ln | A

2

− 1

|

∂α

2

+ ((1 − θ ) A

2

)

− 1

]

− 1

1

1 − θ

∂ A

2

− 1

∂α

2

[( θ A

1

)

− 1

+ ((1 − θ ) A

2

)

− 1

]

− 1

} [( θ A

1

)

− 1

X ] .

Letting the four partial derivatives equal 0, we get the equation system for the MLEs.

32

The MLEs obtained through this equation system coincide with the MLEs obtained in last chapter.

The calculation is not very time-consuming because A₁⁻¹ and A₂⁻¹ are symmetric tridiagonal matrices. To prove this, it suffices to show that the calculation never requires multiplying two matrices together, even when both matrices are sparse.

First let us consider how to evaluate the partial derivatives of ln|Σ| with respect to the parameters at a given point in the parameter space, taking ∂ln|Σ|/∂θ as an example. Since A₁⁻¹ and A₂⁻¹ are symmetric tridiagonal matrices, A = (1−θ)A₁⁻¹ + θA₂⁻¹ is also symmetric tridiagonal. Suppose its LU decomposition is A = LU. Let the vector l be the subdiagonal of L, and let u and e be the diagonal and superdiagonal of U, respectively; thus l and e are in $\mathbb{R}^{(n-1)\times 1}$ and u is in $\mathbb{R}^{n\times 1}$. Let u(i) denote the ith component of u, i ∈ {1, 2, …, n}, and let l(j), e(j) denote the jth components of l and e, j ∈ {1, 2, …, n−1}, respectively. So the matrices are

$$
L=\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0\\
l(1) & 1 & 0 & \cdots & 0 & 0\\
0 & l(2) & 1 & \ddots & \vdots & \vdots\\
\vdots & \ddots & \ddots & \ddots & 0 & 0\\
0 & 0 & \cdots & l(n-2) & 1 & 0\\
0 & 0 & \cdots & 0 & l(n-1) & 1
\end{pmatrix}
$$

and

$$
U=\begin{pmatrix}
u(1) & e(1) & 0 & \cdots & 0 & 0\\
0 & u(2) & e(2) & \ddots & \vdots & \vdots\\
\vdots & \ddots & \ddots & \ddots & 0 & 0\\
0 & 0 & \cdots & u(n-2) & e(n-2) & 0\\
0 & 0 & \cdots & 0 & u(n-1) & e(n-1)\\
0 & 0 & \cdots & 0 & 0 & u(n)
\end{pmatrix}.
$$

Then we have the iteration

$$
u(1)=A_{11},\qquad
l(i)=\frac{e(i)}{u(i)},\qquad
u(i+1)=A_{i+1,i+1}-l(i)\,e(i),\qquad i\in\{1,2,\dots,n-1\},
$$

where $A_{ii}$ represents the ith element on the diagonal of A = (1−θ)A₁⁻¹ + θA₂⁻¹.
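In code, this recursion takes a single pass over the two diagonals. The sketch below (array names are mine; the inputs are assumed to be NumPy arrays and A positive definite, so every u(i) > 0) also accumulates ln|A| = Σᵢ ln u(i):

```python
import numpy as np

def tridiag_lu(diag, off):
    """LU recursion for a symmetric tridiagonal matrix A, in O(n).

    diag : length-n main diagonal of A
    off  : length-(n-1) off-diagonal of A (the vector e in the text)
    Returns u (diagonal of U), l (subdiagonal of L), and ln|A|.
    """
    n = len(diag)
    u = np.empty(n)
    l = np.empty(n - 1)
    u[0] = diag[0]
    for i in range(n - 1):
        l[i] = off[i] / u[i]                   # l(i) = e(i)/u(i)
        u[i + 1] = diag[i + 1] - l[i] * off[i]
    return u, l, float(np.sum(np.log(u)))      # ln|A| = sum of ln u(i)
```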

After applying the LU decomposition of A we have

$$
\frac{\partial\ln|\Sigma|}{\partial\theta}
=-\frac{\partial}{\partial\theta}\ln\big|A_1^{-1}\big[(1-\theta)A_1^{-1}+\theta A_2^{-1}\big]^{-1}A_2^{-1}\big|
=\frac{\partial}{\partial\theta}\ln\big|(1-\theta)A_1^{-1}+\theta A_2^{-1}\big|
=\frac{\partial}{\partial\theta}\ln|LU|
=\frac{\partial}{\partial\theta}\sum_{i=1}^{n}\ln u(i)
=\sum_{i=1}^{n}\frac{1}{u(i)}\frac{\partial u(i)}{\partial\theta}.
\tag{5.1}
$$

Because we have already given the formulas for the diagonal and subdiagonal elements of A₁⁻¹ and A₂⁻¹, the formulas for the diagonal and subdiagonal elements of A follow easily. By inspecting the matrix product LU, e is not only the superdiagonal vector of U but also the superdiagonal (and, by symmetry, the subdiagonal) vector of A. The derivative of u(i) can be obtained through the following iteration:

$$
\frac{\partial u(1)}{\partial\theta}=\frac{\partial A_{11}}{\partial\theta},\qquad
\frac{\partial l(i)}{\partial\theta}
=\frac{1}{u(i)^2}\Big[u(i)\frac{\partial e(i)}{\partial\theta}-e(i)\frac{\partial u(i)}{\partial\theta}\Big],\qquad
\frac{\partial u(i+1)}{\partial\theta}
=\frac{\partial A_{i+1,i+1}}{\partial\theta}-e(i)\frac{\partial l(i)}{\partial\theta}-l(i)\frac{\partial e(i)}{\partial\theta},
\qquad i\in\{1,2,\dots,n-1\}.
$$

By this iteration, the vectors u and ∂u/∂θ are determined for any given parameter values, so ∂ln|Σ|/∂θ is determined through (5.1). For further explanation one may refer to Van Loan [5].
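A sketch of the same pass extended to carry the θ-derivatives along (names are mine; `ddiag` and `doff` are the elementwise θ-derivatives of the diagonals of A, which under the convention A = (1−θ)A₁⁻¹ + θA₂⁻¹ used here are just the corresponding entries of A₂⁻¹ − A₁⁻¹):

```python
def tridiag_logdet_dtheta(diag, off, ddiag, doff):
    """Compute d(ln|A|)/d(theta) via (5.1) in a single O(n) pass.

    diag, off   : diagonal and off-diagonal of A
    ddiag, doff : their elementwise derivatives with respect to theta
    """
    u, du = diag[0], ddiag[0]        # u(1) and its derivative
    total = du / u
    for i in range(len(diag) - 1):
        l = off[i] / u
        dl = (u * doff[i] - off[i] * du) / u ** 2
        u_next = diag[i + 1] - l * off[i]
        du = ddiag[i + 1] - off[i] * dl - l * doff[i]
        u = u_next
        total += du / u              # accumulate (1/u(i)) du(i)/dtheta
    return total
```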

Now let us consider how to compute the partial derivatives of X′Σ⁻¹X, again taking ∂(X′Σ⁻¹X)/∂θ as an example.

This part is much easier theoretically. After applying Proposition 4, we split ∂(X′Σ⁻¹X)/∂θ into two parts. Although the two parts look quite different, their calculation turns out to be similar. Because A₁⁻¹ and A₂⁻¹ are symmetric tridiagonal, obtaining A₁⁻¹X requires only about 3n calculations per step.
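For instance, with NumPy arrays the product of a symmetric tridiagonal matrix and a vector costs roughly 3n multiplications (a sketch, with my own naming):

```python
def tridiag_matvec(diag, off, x):
    """y = A x for symmetric tridiagonal A, about 3n multiplications."""
    y = diag * x                 # diagonal contribution
    y[:-1] += off * x[1:]        # superdiagonal contribution
    y[1:] += off * x[:-1]        # subdiagonal contribution
    return y
```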

To find the value of the expression

$$
\Big(\theta A_1^{-1}+\frac{\theta^2}{1-\theta}A_2^{-1}\Big)^{-1}(A_1^{-1}X),
$$

we do not compute the inverse of θA₁⁻¹ + (θ²/(1−θ))A₂⁻¹ directly, but instead solve a linear equation system. Setting

$$
Y=\Big(\theta A_1^{-1}+\frac{\theta^2}{1-\theta}A_2^{-1}\Big)^{-1}(A_1^{-1}X)
$$

changes the problem into finding the unique solution of

$$
\Big(\theta A_1^{-1}+\frac{\theta^2}{1-\theta}A_2^{-1}\Big)Y=A_1^{-1}X.
$$

Since we have already given the iteration that produces the LU decomposition of a symmetric tridiagonal matrix, it is easy to obtain the matrices L and U here as well. Writing

$$
LZ=A_1^{-1}X,\qquad UY=Z,
$$

Y is obtained by solving the two linear equation systems in sequence. Another 3n multiplications then give the vector

$$
\Big[A_1^{-1}+\frac{2\theta-\theta^2}{(1-\theta)^2}A_2^{-1}\Big]Y
$$

and hence the entire second part of the partial derivative.
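Both solves are single O(n) sweeps, since L is unit lower bidiagonal and U upper bidiagonal. A sketch (names mine, reusing the factors u, l, e produced by the LU recursion above):

```python
import numpy as np

def solve_lu_tridiag(l, u, e, b):
    """Solve (LU) y = b given the tridiagonal LU factors, in O(n).

    l : subdiagonal of L;  u, e : diagonal and superdiagonal of U.
    """
    n = len(b)
    z = np.empty(n)
    z[0] = b[0]
    for i in range(1, n):                    # forward sweep: L z = b
        z[i] = b[i] - l[i - 1] * z[i - 1]
    y = np.empty(n)
    y[-1] = z[-1] / u[-1]
    for i in range(n - 2, -1, -1):           # backward sweep: U y = z
        y[i] = (z[i] - e[i] * y[i + 1]) / u[i]
    return y
```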

Having discussed how to carry out the calculation of these partial derivatives, we now examine how well the algorithm works by using the model to fit both simulated and real data.

Figure 5.1: This graph shows the predicted data with the estimated parameters α̂₁ = 0.2881, α̂₂ = 0.3220, θ̂ = −6.6702. The upper and lower black lines give the 90% confidence bands of the predicted data. The red line between them is the predicted data, and the green line with asterisks is the original data generated by the given DAR(2) model.

5.2 Fitting Simulated Data

The simulated data come from a given DAR(2) model. We analyze the complete generated data and part of the data, respectively, to illustrate the usefulness of our model.

We first establish a DAR(2) model with given parameters,

$$
x_t-\phi_1 x_{t-1}-\phi_2 x_{t-2}=\omega_t,\qquad t\in\mathbb{Z},
$$

where φ₁ = 0.6, φ₂ = −0.08, and {ω_t, t ∈ ℤ} is Gaussian white noise with mean 0 and variance 1. We generate a random sample of size 3000 from this DAR(2) model. A direct calculation shows that the model has corresponding α₁ = 0.2, α₂ = 0.4, and θ = −7/9 ≈ −0.7778. We use the first 2000 observations to estimate the parameters and then use those MLEs to predict the data from 2001 to 3000. The graph comparing the prediction with the original random sample over observations 2001 to 2100 is shown in Figure 5.1.
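A minimal way to generate such a sample (the burn-in length, the seed, and the zero initial values are my own choices, not specified in the thesis):

```python
import numpy as np

def simulate_dar2(n, phi1=0.6, phi2=-0.08, burn=500, seed=0):
    """Simulate x_t = phi1*x_{t-1} + phi2*x_{t-2} + w_t with w_t ~ N(0, 1).

    A burn-in stretch is discarded so that the returned sample is
    effectively drawn from the stationary distribution.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + w[t]
    return x[burn:]

sample = simulate_dar2(3000)   # the sample size used in this section
```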


Figure 5.2: This graph shows the predicted data with the parameters α̂₁ = 0.2, α̂₂ = 0.4, θ̂ = −0.7778, set exactly equal to the original values. The two black dashed lines give the 90% confidence bands of the predicted data. The blue line between them is the predicted data, and the green line with asterisks is the original data generated by the given DAR(2) model.

To show that the estimation is good enough, we compare it with the prediction made using parameters exactly identical to the original ones. The graph is shown in Figure 5.2.

Figure 5.3: This graph shows the predicted data with the estimated parameters α̂₁ = 0.2825, α̂₂ = 0.3198, θ̂ = −6.0331. The upper and lower black lines give the 90% confidence bands of the predicted data. The red line between them is the predicted data, and the green line with asterisks is the original data generated by the given DAR(2) model.

Because our model was developed to deal with the missing-data problem, we test its efficiency on incomplete simulated data, which can be viewed as data with missing values, from the same DAR(2) model. To obtain the missing data, we randomly delete some of the first 2000 observations; 1701 observations remain after deletion. We use these 1701 observations for the estimation and predict the observations from 2001 to 3000. The comparison of the prediction with the original data set is shown in Figure 5.3. Comparing Figure 5.1 with Figure 5.3, we claim that the missing data do not play an important role in parameter estimation: their only effect is a reduction of the sample size, which may slightly affect the accuracy of the estimation. To verify this claim, another experiment was carried out. A sample of size 10000 was generated by the same DAR(2) model, and we estimated the parameters using the complete data and incomplete data respectively, the size of the incomplete data being far larger than 3000. The estimates are almost the same, which illustrates that missing data pose no problem for our computation.

Figure 5.4: This graph shows the original temperature data set.

5.3 Fitting Wichita Temperature Data

Now we use the model to fit real data: the daily temperature of Wichita from 1973 to 2010. The data were obtained from the National Climatic Data Center's online system, and all observations come from a single meteorological observation station located at Wichita Mid-Continent Airport. As Figure 5.4 shows, the observed series contains some abnormal values, and several large gaps are scattered across the period.

Figure 5.5: This graph shows the temperature data set after preprocessing.

The preprocessed temperature data set contains 10354 observations, is shown in Figure 5.5, and appears stationary. We use the first 10000 observations to find the MLEs of the parameters and then make predictions for the rest. The comparison of the prediction with the observed data is shown in Figure 5.6.

From Figure 5.6, the predictions are not as satisfactory as for the simulated data from the given DAR(2) model, because the variance of the real data is much larger and some observations made under extreme weather conditions fall outside our prediction bands.

Figure 5.6: This graph shows the predicted data with the estimated parameters α̂₁ ≈ 0.3002, α̂₂ ≈ 0.3512, θ̂ ≈ 9.0000. The upper and lower black lines give the 90% confidence bands of the predicted data. The red line between them is the predicted data, and the green line with asterisks is the observed temperature data.

CHAPTER 6

CONCLUSIONS

6.1 Conclusions

The first and most important conclusion is that we developed general models whose correlation functions apply to both discrete time series and continuous-time stochastic processes, with the index domain being the only change. The models contain DAR(1) and DAR(2) when the index domain is discrete, and CAR(1) and CAR(2) when it is continuous. Second, we found a way to use our model to fit complete and incomplete data alike, and presented an algorithm that carries out the computation efficiently. Finally, we used the model to fit real data.

6.2 Future Work

The main goals were accomplished, but several detailed problems encountered along the way remain difficult and unsolved.

The method of comparing the values of the likelihood function at different points seems to lack theoretical support, but it is a straightforward idea and works well most of the time. The steepest-gradient method is in principle ideal for finding the maximum; however, since our likelihood function may have local maxima and the method depends heavily on the initial point, it does not show many advantages over the former approach. Once the region containing the global maximum is located, though, it can find the estimators quickly. While searching for the global maximum, we found that convergence is slow and that near the true point the function approximates its maximum only to machine accuracy, so we cannot pinpoint the exact maximum, although doing so is unnecessary. Solving this problem requires optimizing the algorithm.

Because of the special properties of the correlation structure, the calculation is feasible. However, we still have no method for generalizing the analysis to a correlation structure built from finitely many matrix summands, which is an obstacle to extending our results to more general cases.

Another problem is how to deal with the real data. It is difficult to claim that the data are stationary after our preprocessing; we can only say they are approximately stationary. Even if we could claim stationarity, we are still not sure that the correlation structure suits our data. A better way of handling data from the online system is still under development.

How to interpret the prediction is another issue. How well does the model fit the real data, and how should the predictions be judged against the original observations? If we used exactly the same model to generate the data twice, the two realizations would differ greatly, as Figure 5.2 illustrates; how, then, can we say the model fits the real data well? A further problem is that the MLEs may not be the best parameters for prediction, which means we may need to modify the estimators around the MLEs so that the prediction becomes more conservative. Answering all of these questions requires further research.

LIST OF REFERENCES

[1] C. F. Ansley, "An Algorithm for the Exact Likelihood of a Mixed Autoregressive-Moving Average Process," Biometrika, Vol. 66, No. 1, 1979, pp. 59-65.

[2] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, 3rd ed., Prentice Hall, Upper Saddle River, NJ, 1994.

[3] K. S. Chan and H. Tong, "A Note on Embedding a Discrete Parameter ARMA Model in a Continuous Parameter ARMA Model," Journal of Time Series Analysis, Vol. 8, No. 3, 1987, pp. 277-281.

[4] C. M. da Fonseca and J. Petronilho, "Explicit Inverses of Some Tridiagonal Matrices," Linear Algebra and its Applications, Vol. 325, 2001, pp. 7-21.

[5] C. F. Van Loan, Introduction to Scientific Computing: A Matrix-Vector Approach Using MATLAB, Prentice Hall, Upper Saddle River, NJ, 1997.

[6] C. Ma, "Exact Maximum Likelihood Estimation of an ARMA(1,1) Model with Incomplete Data," Journal of Time Series Analysis, Vol. 23, 1999, pp. 49-56.

[7] C. Ma, "Long-Memory Continuous-Time Correlation Models," Journal of Applied Probability, Vol. 40, No. 4, 2003, pp. 1133-1146.

[8] P. Newbold, "The Exact Likelihood Function for a Mixed Autoregressive-Moving Average Process," Biometrika, Vol. 61, No. 3, 1974, pp. 423-426.

[9] D. F. Nicholls and A. D. Hall, "The Exact Likelihood Function of Multivariate Autoregressive-Moving Average Models," Biometrika, Vol. 66, No. 2, 1979, pp. 259-264.

[10] J. Penzer and B. Shea, "The Exact Likelihood of an Autoregressive-Moving Average Model with Incomplete Data," Biometrika, Vol. 84, No. 4, 1997, pp. 919-928.

[11] M. S. Phadke and G. Kedem, "Computation of the Exact Likelihood Function of Multivariate Moving Average Models," Biometrika, Vol. 65, No. 3, 1978, pp. 511-519.

[12] R. L. Plackett, "A Reduction Formula for Normal Multivariate Integrals," Biometrika, Vol. 41, No. 3/4, 1954, pp. 351-360.

[13] J. R. Schott, Matrix Analysis for Statistics, Wiley-Interscience, Hoboken, NJ, 2005.

[14] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications: With R Examples, Springer Science+Business Media, New York, NY, 2006.

[15] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer-Verlag, New York, NY, 1999.
