Space/Time mapping – Part 3: The Simple Kriging limiting case

Space/time mapping using second-order moments and hard data. We consider a special case of the general BME framework in which the general knowledge consists only of the mean trend and the covariance function, and the site-specific data consist only of hard data. This case corresponds to the Simple Kriging method of classical Geostatistics.

1 The general knowledge base and prior pdf

The general knowledge base G of the S/TRF X(p) consists of
- its mean trend $m_X(p) = E[X(p)]$, and
- its covariance $c_X(p, p') = E[(X(p) - m_X(p))(X(p') - m_X(p'))]$.

This general knowledge base is expressed using the stochastic moment constraints

$$ h_\alpha(p_{map}) = \int d\chi_{map} \, g_\alpha(\chi_{map}) \, f_G(\chi_{map}; p_{map}), \qquad \alpha = 0, 1, \ldots, N_c $$

where the chosen $g_\alpha(X_{map})$ and corresponding $h_\alpha$ are as follows:
- $\alpha = 0$: $g_0(X_{map}) = 1$, $h_0 = 1$
- $\alpha = i$, $i = 1,\ldots,n$: $g_i(X_{map}) = X_i$, $h_i = m_X(p_i)$
- $\alpha = (i,j)$, $i = 1,\ldots,n$; $j = 1,\ldots,n$: $g_{ij}(X_{map}) = X_i X_j$, $h_{ij} = c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j)$

Hence, the mathematical form of the maximum entropy pdf is given by

$$ f_G(\chi_{map}) = \exp\Big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \Big\} $$

where the $1 + n + n^2$ Lagrange coefficients $\mu_0$, $\mu_i$ ($i = 1,\ldots,n$) and $\mu_{ij}$ ($i, j = 1,\ldots,n$) are obtained by solving the following set of $1 + n + n^2$ equations:

$\alpha = 0$:
$$ 1 = \int d\chi_{map} \, \exp\Big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \Big\} $$

$\alpha = i$, $i = 1,\ldots,n$:
$$ m_X(p_i) = \int d\chi_{map} \, \chi_i \exp\Big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \Big\} $$

$\alpha = (i,j)$, $i = 1,\ldots,n$; $j = 1,\ldots,n$:
$$ c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j) = \int d\chi_{map} \, \chi_i \chi_j \exp\Big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \Big\} $$

We can actually solve this set of equations to obtain values for the $1 + n + n^2$ unknown Lagrange coefficients.
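Before carrying out the solution, the moment constraints themselves can be illustrated numerically: sample averages of the $g_\alpha$ over many realizations should approach the $h_\alpha$ values in the table above. A minimal sketch in Python with NumPy, assuming a hypothetical 2-point Gaussian field (the means and covariance values below are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical 2-point setup: mean trend and covariance at p1, p2
m = np.array([10.0, 10.0])                 # m_X(p_i)
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])                 # c_X(p_i, p_j)

rng = np.random.default_rng(0)
chi = rng.multivariate_normal(m, C, size=200_000)   # realizations of chi_map

# alpha = 0: E[1] = 1 (normalization, trivially satisfied)
# alpha = i: the average of chi_i should approach h_i = m_X(p_i)
print(chi.mean(axis=0))                    # ~ [10, 10]

# alpha = (i,j): the average of chi_i * chi_j should approach
# h_ij = c_X(p_i, p_j) + m_X(p_i) m_X(p_j)
print(chi[:, 0] @ chi[:, 1] / len(chi))    # ~ 0.5 + 10*10 = 100.5
```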
Let us write the equation for the maximum entropy pdf using vector notation as

$$ f_G(\chi_{map}) = \exp\{ \mu_0 + \mu^T \chi + \chi^T M \chi \} $$

where $\chi = [\chi_1, \ldots, \chi_n]^T$, $\mu = [\mu_1, \ldots, \mu_n]^T$, and $M$ is an $n \times n$ matrix with elements $\mu_{ij}$. The problem is to find the values of $\mu_0$, $\mu$ and $M$. Let us define the $n \times 1$ vector $o = -M^{-1} \mu / 2$ and the $n \times n$ matrix $D = -M^{-1} / 2$. Since $o$ and $D$ have the same sizes as $\mu$ and $M$, and since there is a one-to-one relationship between them, solving for $o$ and $D$ is the same as solving for $\mu$ and $M$.

By rearranging these relationships we get $M = -D^{-1}/2$ and $\mu^T = o^T D^{-1}$, which when substituted in the prior pdf equation leads to

$$\begin{aligned}
f_G(\chi_{map}) &= \exp\{ \mu_0 + \mu^T \chi + \chi^T M \chi \} \\
&= \exp\{ \mu_0 + o^T D^{-1} \chi - \tfrac{1}{2} \chi^T D^{-1} \chi \} \\
&= \exp\{\mu_0\} \exp\big\{ \tfrac{1}{2} \big( 2\, o^T D^{-1} \chi - \chi^T D^{-1} \chi + (o^T D^{-1} o - o^T D^{-1} o) \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ \tfrac{1}{2} \big( 2\, o^T D^{-1} \chi - \chi^T D^{-1} \chi - o^T D^{-1} o \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ \tfrac{1}{2} \big( (o^T D^{-1} \chi + \chi^T D^{-1} o) - \chi^T D^{-1} \chi - o^T D^{-1} o \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ -\tfrac{1}{2} \big( \chi^T D^{-1} \chi - o^T D^{-1} \chi + o^T D^{-1} o - \chi^T D^{-1} o \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ -\tfrac{1}{2} \big( (\chi^T - o^T) D^{-1} \chi + (o^T - \chi^T) D^{-1} o \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ -\tfrac{1}{2} \big( (\chi^T - o^T) D^{-1} \chi - (\chi^T - o^T) D^{-1} o \big) \big\} \\
&= \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\big\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \big\}
\end{aligned}$$

where the symmetry of $D$ was used to write $2\, o^T D^{-1} \chi = o^T D^{-1} \chi + \chi^T D^{-1} o$. Using this expression for $f_G(\chi_{map})$ we can rewrite the constraint equations as

$\alpha = 0$:
$$ 1 = \int d\chi_{map} \, \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} $$

$\alpha = i$, $i = 1,\ldots,n$:
$$ m_X(p_i) = \int d\chi_{map} \, \chi_i \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} $$

$\alpha = (i,j)$, $i, j = 1,\ldots,n$:
$$ c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j) = \int d\chi_{map} \, \chi_i \chi_j \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} $$

We now use the following property of the multivariate Gaussian pdf with mean $o$ and covariance $D$:

$$ \int d\chi \, \frac{|D|^{-1/2}}{(2\pi)^{n/2}} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} = 1 $$

Using this property in the $\alpha = 0$ equation we get

$$ \exp\{ \mu_0 + \tfrac{1}{2} o^T D^{-1} o \} = \frac{|D|^{-1/2}}{(2\pi)^{n/2}} $$

and with this expression the remaining equations become

$\alpha = i$, $i = 1,\ldots,n$:
$$ m_X(p_i) = \int d\chi_{map} \, \chi_i \, \frac{|D|^{-1/2}}{(2\pi)^{n/2}} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} $$

$\alpha = (i,j)$, $i, j = 1,\ldots,n$:
$$ c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j) = \int d\chi_{map} \, \chi_i \chi_j \, \frac{|D|^{-1/2}}{(2\pi)^{n/2}} \exp\{ -\tfrac{1}{2} (\chi - o)^T D^{-1} (\chi - o) \} $$

Using the first- and second-moment properties of the multivariate Gaussian pdf we get

- $\alpha = i$, $i = 1,\ldots,n$: $m_X(p_i) = o_i$
- $\alpha = (i,j)$, $i, j = 1,\ldots,n$: $c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j) = D_{ij} + o_i o_j$

or equivalently

- $\alpha = i$, $i = 1,\ldots,n$: $o_i = m_X(p_i)$
- $\alpha = (i,j)$, $i, j = 1,\ldots,n$: $D_{ij} = c_X(p_i, p_j) + m_X(p_i)\, m_X(p_j) - o_i o_j = c_X(p_i, p_j) + o_i o_j - o_i o_j = c_X(p_i, p_j)$

which can be written in vector form as $o = m$ and $D = C$, where $m_i = m_X(p_i)$, $i = 1,\ldots,n$, and $C_{ij} = c_X(p_i, p_j)$, $i, j = 1,\ldots,n$. From this it follows that $M = -C^{-1}/2$ and $\mu^T = m^T C^{-1}$, which is the solution of the set of equations that provides numerical values for the Lagrange coefficients.

Hence, by way of summary, the maximum entropy pdf given a general knowledge base G consisting of the values of the mean trend $m_X(p)$ and covariance $c_X(p, p')$ at the points $p_i$, $i = 1,\ldots,n$, is given by

$$ f_G(\chi_{map}) = \exp\{ \mu_0 + \mu^T \chi + \chi^T M \chi \} $$

where $M = -C^{-1}/2$, $\mu^T = m^T C^{-1}$ and $\mu_0 = \ln\Big( \frac{|C|^{-1/2}}{(2\pi)^{n/2}} \Big) - \tfrac{1}{2} m^T C^{-1} m$, which can equivalently be written as the multivariate Gaussian pdf

$$ f_G(\chi_{map}) = \frac{|C|^{-1/2}}{(2\pi)^{n/2}} \exp\Big\{ -\tfrac{1}{2} (\chi - m)^T C^{-1} (\chi - m) \Big\} $$

where the elements of $m$ and $C$ are $m_i = m_X(p_i)$, $i = 1,\ldots,n$, and $C_{ij} = c_X(p_i, p_j)$, $i = 1,\ldots,n$, $j = 1,\ldots,n$.
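The correspondence between the Lagrange coefficients and $(m, C)$ can be checked numerically. A minimal sketch in Python with NumPy and SciPy, reusing the hypothetical 2-point setup of the earlier sketch: build $\mu_0$, $\mu$ and $M$ from $m$ and $C$, and compare the exponential form against a library Gaussian density at a test point.

```python
import numpy as np
from scipy.stats import multivariate_normal

m = np.array([10.0, 10.0])
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])
n = len(m)

Cinv = np.linalg.inv(C)
M = -Cinv / 2                            # matrix of mu_ij = -C^{-1}/2
mu = Cinv @ m                            # vector of mu_i, from mu^T = m^T C^{-1}
# mu_0 = ln(|C|^{-1/2} / (2*pi)^{n/2}) - 0.5 m^T C^{-1} m
mu0 = -0.5 * np.log((2 * np.pi) ** n * np.linalg.det(C)) - 0.5 * m @ Cinv @ m

chi = np.array([9.3, 11.2])              # arbitrary test point
f_maxent = np.exp(mu0 + mu @ chi + chi @ M @ chi)
f_gauss = multivariate_normal(mean=m, cov=C).pdf(chi)
print(f_maxent, f_gauss)                 # identical up to rounding
```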
2 Hard data and the posterior pdf

Let all the data be hard, $\chi_{data} = \chi_{hard} = [\chi_1, \ldots, \chi_{n-1}]^T$, so that $\chi_{map} = \begin{bmatrix} \chi_{hard} \\ \chi_k \end{bmatrix}$ where $\chi_k = \chi_n$. Then the posterior pdf is given by

$$ f_K(\chi_k) = \frac{ f_G(\chi_{hard}, \chi_k) }{ f_G(\chi_{hard}) } = \frac{ \exp\big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \big\} }{ \int d\chi_k \, \exp\big\{ \mu_0 + \sum_{i=1}^n \mu_i \chi_i + \sum_{i=1}^n \sum_{j=1}^n \mu_{ij} \chi_i \chi_j \big\} } $$

Since the prior pdf $f_G(\chi_{map})$ is Gaussian with mean $m_{map} = \begin{bmatrix} m_{hard} \\ m_k \end{bmatrix}$ and covariance

$$ C_{map} = \begin{bmatrix} C_{hard,hard} & C_{hard,k} \\ C_{k,hard} & C_{k,k} \end{bmatrix}, $$

and since the posterior pdf $f_K(\chi_k) = f_G(\chi_k \mid \chi_{hard})$ is its conditional pdf given the hard data, it follows from properties of Gaussian distributions that the posterior pdf is also Gaussian.

Hence when the knowledge base consists of the mean trend $m_X(p)$, the covariance $c_X(p, p')$, and the hard data $\chi_{hard} = [\chi_1, \ldots, \chi_{n-1}]^T$ of a S/TRF X(p), the posterior pdf $f_K(\chi_k)$ is univariate Gaussian, and from properties of Gaussian distributions we find that its mean $m_{k|hard}$ and variance $C_{k|hard}$ are

$$ m_{k|hard} = m_k + C_{k,hard} \, C_{hard,hard}^{-1} \, (\chi_{hard} - m_{hard}) $$
$$ C_{k|hard} = C_{k,k} - C_{k,hard} \, C_{hard,hard}^{-1} \, C_{hard,k} $$

where $m_k = m_X(p_k)$ is a scalar, $C_{k,hard}$ is a row vector with the $n-1$ elements $c_X(p_k, p_j)$, $j = 1,\ldots,n-1$, $C_{hard,hard}$ is an $(n-1) \times (n-1)$ matrix with elements $c_X(p_i, p_j)$, $i = 1,\ldots,n-1$; $j = 1,\ldots,n-1$, $\chi_{hard}$ is a column vector with the $n-1$ hard data, $m_{hard}$ is a column vector with the $n-1$ elements $m_X(p_i)$, $i = 1,\ldots,n-1$, $C_{k,k} = c_X(p_k, p_k)$ is a scalar, and $C_{hard,k}$ is the transpose of $C_{k,hard}$.

We note that:
- A good choice for the estimator $\hat{\chi}_k$ of $X(p_k)$ is the posterior mean $m_{k|hard}$, i.e. use $\hat{\chi}_k = m_{k|hard}$.
- This estimator is a linear combination of the hard data, i.e. $\hat{\chi}_k = \lambda_0 + \lambda_1 \chi_1 + \cdots + \lambda_{n-1} \chi_{n-1}$.
- The posterior variance $C_{k|hard}$ provides an assessment of the estimation error.
- The posterior variance $C_{k|hard}$ is never larger than the prior variance $C_{k,k}$.

3 Example

Let X(p) be a homogeneous random field representing river water quality along the scalar coordinate p (one-dimensional space along the river). The mean of X(p) is m = 10 g/m³, its covariance is $c_X(r) = \exp(-r)$ with the lag r in km, and the value of X(p) at 3 monitoring locations $p_{hard} = [p_1, p_2, p_3]^T = [0 \ \ 2.5 \ \ 3.1]^T$ (km) was exactly measured to be $\chi_{hard} = [\chi_1, \chi_2, \chi_3]^T = [20 \ \ 14 \ \ 18]^T$ (g/m³). Estimate the water quality at the estimation location $p_k = 1.7$ km.

Solution: The vector $\chi_{map} = [\chi_1, \chi_2, \chi_3, \chi_k]^T$ representing X(p) at the mapping points $p_{map} = [p_1, p_2, p_3, p_k]^T = [0 \ \ 2.5 \ \ 3.1 \ \ 1.7]^T$ is multivariate Gaussian, with mean vector $m_{map} = E[\chi_{map}]$ and covariance matrix $C_{map} = \mathrm{Cov}[\chi_{map}] = c_X(p_{map}, p_{map})$ with elements $C_{ij} = \exp(-|p_i - p_j|)$:

$$ m_{map} = \begin{bmatrix} 10 \\ 10 \\ 10 \\ 10 \end{bmatrix}; \qquad C_{map} = \begin{bmatrix} 1 & e^{-|0-2.5|} & e^{-|0-3.1|} & e^{-|0-1.7|} \\ & 1 & e^{-|2.5-3.1|} & e^{-|2.5-1.7|} \\ & & 1 & e^{-|3.1-1.7|} \\ (\text{sym}) & & & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.082 & 0.045 & 0.183 \\ & 1 & 0.549 & 0.449 \\ & & 1 & 0.247 \\ (\text{sym}) & & & 1 \end{bmatrix} $$

The posterior pdf of $\chi_k$ is univariate Gaussian with the following mean and variance:

$$ E[\chi_k \mid \chi_{hard}] = m_k + C_{k,hard} \, C_{hard,hard}^{-1} (\chi_{hard} - m_{hard}) = 10 + \begin{bmatrix} 0.183 & 0.449 & 0.247 \end{bmatrix} \begin{bmatrix} 1 & 0.082 & 0.045 \\ 0.082 & 1 & 0.549 \\ 0.045 & 0.549 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 20 - 10 \\ 14 - 10 \\ 18 - 10 \end{bmatrix} = 13.2 $$

$$ \mathrm{Var}[\chi_k \mid \chi_{hard}] = C_{k,k} - C_{k,hard} \, C_{hard,hard}^{-1} \, C_{hard,k} = 1 - \begin{bmatrix} 0.183 & 0.449 & 0.247 \end{bmatrix} \begin{bmatrix} 1 & 0.082 & 0.045 \\ 0.082 & 1 & 0.549 \\ 0.045 & 0.549 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0.183 \\ 0.449 \\ 0.247 \end{bmatrix} = 0.777 $$

The estimated water quality at $p_k = 1.7$ km is therefore 13.2 g/m³, with an estimation variance of 0.777 (g/m³)².
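These numbers are straightforward to reproduce numerically. A minimal sketch in Python with NumPy (all locations, data values and the covariance model are those of the example above):

```python
import numpy as np

p_hard = np.array([0.0, 2.5, 3.1])        # monitoring locations (km)
x_hard = np.array([20.0, 14.0, 18.0])     # hard data (g/m^3)
p_k, m = 1.7, 10.0                        # estimation location, mean trend

cov = lambda r: np.exp(-np.abs(r))        # c_X(r) = exp(-r)
C_hh = cov(p_hard[:, None] - p_hard[None, :])   # C_hard,hard (3x3)
C_kh = cov(p_k - p_hard)                        # C_k,hard (row vector)

# Posterior (Simple Kriging) mean and variance
w = np.linalg.solve(C_hh, x_hard - m)     # C_hard,hard^{-1} (chi_hard - m_hard)
mean_post = m + C_kh @ w                  # ~ 13.2
var_post = 1.0 - C_kh @ np.linalg.solve(C_hh, C_kh)  # ~ 0.777
print(round(mean_post, 1), round(var_post, 3))
```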
4 Deriving Simple Kriging as a Best Linear Unbiased Estimator (BLUE)

Another way to derive the formulas for Simple Kriging is by defining it as a Best Linear Unbiased Estimator (BLUE). Let X(p) be a random field with known mean $m_X(p)$ and covariance $c_X(p, p')$. Let $X_{hard} = [X_1, \ldots, X_{n-1}]^T$ represent X(p) at the points $p_{hard} = (p_1, \ldots, p_{n-1})$, and let the hard data $\chi_{hard} = [\chi_1, \ldots, \chi_{n-1}]^T$ be the exactly known values of $X_{hard}$. We start by defining the estimator $\hat{X}_k$ of $X_k = X(p_k)$ as a linear combination of $X_{hard}$,

$$ \hat{X}_k = \lambda_0 + \lambda^T X_{hard}, $$

where $\lambda_0$ and $\lambda^T = [\lambda_1, \ldots, \lambda_{n-1}]$ are parameters to be determined.

The unbiasedness condition imposes that $E[\hat{X}_k] = E[X_k]$, which leads to $\lambda_0 = m_k - \lambda^T m_{hard}$, so that the estimator can be rewritten as

$$ \hat{X}_k = m_k + \lambda^T (X_{hard} - m_{hard}) $$

The estimation error is $e_k = X_k - \hat{X}_k$, and its mean square value is $\sigma_e^2 = E[(X_k - \hat{X}_k)^2]$. Substituting $\hat{X}_k = m_k + \lambda^T (X_{hard} - m_{hard})$ in $\sigma_e^2$ leads to

$$\begin{aligned}
\sigma_e^2 &= E\big[ \big( X_k - m_k - \lambda^T (X_{hard} - m_{hard}) \big)^2 \big] \\
&= E\big[ (X_k - m_k)^2 - 2 (X_k - m_k) \lambda^T (X_{hard} - m_{hard}) + \big( \lambda^T (X_{hard} - m_{hard}) \big)\big( \lambda^T (X_{hard} - m_{hard}) \big) \big] \\
&= E\big[ (X_k - m_k)^2 - 2 (X_k - m_k) (X_{hard} - m_{hard})^T \lambda + \lambda^T (X_{hard} - m_{hard}) (X_{hard} - m_{hard})^T \lambda \big] \\
&= E[(X_k - m_k)^2] - 2\, E[(X_k - m_k)(X_{hard} - m_{hard})^T]\, \lambda + \lambda^T E[(X_{hard} - m_{hard})(X_{hard} - m_{hard})^T]\, \lambda \\
&= C_{k,k} - 2\, C_{k,hard}\, \lambda + \lambda^T C_{hard,hard}\, \lambda
\end{aligned}$$

where

$$ C_{k,hard} = [c_X(p_k, p_1) \ \ldots \ c_X(p_k, p_{n-1})] \qquad \text{and} \qquad C_{hard,hard} = \begin{bmatrix} c_X(p_1, p_1) & \ldots & c_X(p_1, p_{n-1}) \\ \vdots & & \vdots \\ c_X(p_{n-1}, p_1) & \ldots & c_X(p_{n-1}, p_{n-1}) \end{bmatrix} $$

The parameters $\lambda$ are obtained by minimizing the mean square error $\sigma_e^2$:

$$ \frac{\partial \sigma_e^2}{\partial \lambda_i} = 0, \quad i = 1, \ldots, n-1 $$

$$ \frac{\partial}{\partial \lambda^T} \big( C_{k,k} - 2\, C_{k,hard}\, \lambda + \lambda^T C_{hard,hard}\, \lambda \big) = 0 \;\Longrightarrow\; -2\, C_{k,hard} + 2\, \lambda^T C_{hard,hard} = 0 \;\Longrightarrow\; \lambda^T = C_{k,hard}\, C_{hard,hard}^{-1} $$

Substituting $\lambda^T = C_{k,hard}\, C_{hard,hard}^{-1}$, and using $\hat{\chi}_k$ and $\chi_{hard}$ in place of $\hat{X}_k$ and $X_{hard}$, leads to

$$ \hat{\chi}_k = m_k + C_{k,hard}\, C_{hard,hard}^{-1} (\chi_{hard} - m_{hard}) $$
$$ \sigma_e^2 = C_{k,k} - 2\, C_{k,hard}\, C_{hard,hard}^{-1} C_{hard,k} + C_{k,hard}\, C_{hard,hard}^{-1} C_{hard,k} = C_{k,k} - C_{k,hard}\, C_{hard,hard}^{-1} C_{hard,k} $$

These equations correspond to the mean and variance of the BME posterior pdf obtained when using the mean and covariance as general knowledge, and hard data as site-specific knowledge. Hence when the knowledge base consists of the mean, the covariance and hard data, BME yields Simple Kriging as a special case.
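The minimization step can also be checked numerically. A minimal sketch in Python with NumPy and SciPy, reusing the covariance blocks of the example in section 3: a general-purpose optimizer applied to $\sigma_e^2(\lambda) = C_{k,k} - 2\, C_{k,hard}\, \lambda + \lambda^T C_{hard,hard}\, \lambda$ recovers the closed-form weights $\lambda^T = C_{k,hard}\, C_{hard,hard}^{-1}$.

```python
import numpy as np
from scipy.optimize import minimize

# Covariance blocks from the example (c_X(r) = exp(-r))
p_hard, p_k = np.array([0.0, 2.5, 3.1]), 1.7
C_hh = np.exp(-np.abs(p_hard[:, None] - p_hard[None, :]))   # C_hard,hard
C_kh = np.exp(-np.abs(p_k - p_hard))                        # C_k,hard

# Mean square error as a function of the kriging weights lambda
mse = lambda lam: 1.0 - 2 * C_kh @ lam + lam @ C_hh @ lam

lam_closed = np.linalg.solve(C_hh, C_kh)        # closed-form BLUE weights
lam_numeric = minimize(mse, x0=np.zeros(3)).x   # numerical minimizer

print(lam_closed)
print(lam_numeric)       # matches up to solver tolerance
print(mse(lam_closed))   # ~ 0.777, the posterior variance of the example
```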