September 10, 1991    LIDS-P-2065

DATA FUSION WITH MINIMAL COMMUNICATION*

by Zhi-Quan Luo† and John N. Tsitsiklis‡

ABSTRACT

Two sensors obtain data vectors x and y, respectively, and transmit real vectors m1(x) and m2(y), respectively, to a fusion center. We obtain tight lower bounds on the number of messages (the sum of the dimensions of m1 and m2) that have to be transmitted for the fusion center to be able to evaluate a given function f(x, y). When the function f is linear, we show that these bounds are effectively computable. Certain decentralized estimation problems can be cast in our framework and are discussed in some detail. In particular, we consider the case where x and y are random variables representing noisy measurements and f(x, y) = E[z | x, y], where z is a random variable to be estimated. Furthermore, we establish that a standard method for combining decentralized estimates of Gaussian random variables has nearly optimal communication requirements.

KEY WORDS. Decentralized estimation, distributed computation, data fusion, communication complexity, lower bounds.

* This research was supported by the Natural Sciences and Engineering Research Council of Canada, Grant No. OPG0090391, and by the ARO under contract DAAL03-86-K-0171.
† Room 225/CRL, Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, L8S 4L7, Canada; e-mail: luozq@sscvax.cis.mcmaster.ca.
‡ Room 35-214, Laboratory for Information and Decision Systems, MIT, Cambridge, Mass. 02139, USA; e-mail: jnt@athena.mit.edu.

1 Introduction and Problem Formulation

Let there be two sensors, S1 and S2. Sensor S1 (respectively, S2) obtains a data vector x ∈ R^m (respectively, y ∈ R^n) and transmits to a fusion center a message m1(x) (respectively, m2(y)). Here, m1: R^m → R^{r1} and m2: R^n → R^{r2} are vector-valued functions which we call message functions. Finally, the fusion center uses the values of the received messages to evaluate a given function f: R^{m+n} → R^s. For this to be possible, the received messages must contain enough information; in particular, the function f must admit a representation of the form

f(x, y) = h(m1(x), m2(y)),    for all (x, y) ∈ G,    (1.1)

for some function h: R^{r1+r2} → R^s. Here, G is some subset of R^{m+n} representing the set of all pairs (x, y) that are of interest. For example, we might have some prior knowledge that guarantees that all possible observation pairs (x, y) lie in G. For reasons to be explained later, we also require the functions m1, m2, and h to be continuously differentiable. In the sequel, we will occasionally refer to the functions m1, m2 and h as a communication protocol.

The above framework is a generic description of the process of data fusion. Data are collected at geographically distant sites and are transmitted, possibly after being compressed, to a fusion center. The fusion center needs these data for a specific purpose and, no matter what this purpose is, it can always be modeled as the task of evaluating a particular function of the data. For example, suppose that x and y are random variables representing noisy observations. Let z be a vector random variable to be estimated, and suppose that we wish the fusion center to compute the least mean squares estimate E[z | x, y]. Assuming that the joint probability distribution of (x, y, z) is known, E[z | x, y] can be expressed as a function f(x, y), and we are back to the model introduced in the preceding paragraph.
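As a toy instance of the representation (1.1), not drawn from the paper itself, consider f(x, y) = ||x||^2 + ||y||^2: it can be evaluated at the fusion center from a single real-valued message per sensor. The following Python sketch (with function names of our choosing) makes the roles of m1, m2 and h explicit.

```python
import numpy as np

# Toy instance of Eq. (1.1): f(x, y) = ||x||^2 + ||y||^2 with r1 = r2 = 1.
def m1(x):                # message function of sensor S1
    return np.array([x @ x])

def m2(y):                # message function of sensor S2
    return np.array([y @ y])

def h(msg1, msg2):        # final evaluation function at the fusion center
    return msg1[0] + msg2[0]

x, y = np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0])
assert np.isclose(h(m1(x), m2(y)), x @ x + y @ y)
```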
From now on, we adopt the above framework. We assume that the function f and the data domain G are given. Our objective is to choose the message functions m1 and m2 in some desirable manner. An obvious solution is to let m1(x) = x and m2(y) = y. This corresponds to a centralized solution whereby all available data are transmitted to the fusion center. However, if communication is costly, as it sometimes is, there is an advantage in transmitting less information. We may thus pose the problem of choosing the message functions m1 and m2 so as to minimize the number r = r1 + r2 of real-valued messages that are transmitted by the two sensors (recall that ri is the dimension of the range of mi), subject to the constraint that f can be represented in the form (1.1). The minimal possible value of r will be called the communication complexity corresponding to f and G, and will be denoted by C_1(f; G), where the subscript "1" indicates that the functions mi and h are assumed to be C^1 functions. We will also consider the cases where the functions m1, m2, and h are restricted to be linear or analytic. For these cases, we use C_lin(f; G) and C_ω(f; G), respectively, to denote the corresponding communication complexity. Clearly, one has

C_1(f; G) ≤ C_ω(f; G) ≤ C_lin(f; G).

A couple of remarks about our model of communication are in order.

1. The assumption of continuous differentiability is introduced in order to eliminate some uninteresting communication protocols. For example, if no smoothness condition is imposed, then each sensor Si can simply interleave the bits in the binary expansions of the components of its data vector and send the resulting real number to the fusion center, thus sending a single real-valued message. Upon receiving this message, the fusion center can easily decode it and determine the values of x and y. Thus, with a total of 2 messages, the fusion center can recover the values of x and y, and can therefore evaluate f(x, y). Such a communication protocol is not interesting, since it basically amounts to sending all the information collected by the sensors to the fusion center. We are interested instead in protocols that intelligently compress the information contained in the values of x and y, and send to the fusion center only that information which is relevant to the evaluation of f(x, y). As we shall see later, the smoothness condition on the message functions succeeds in eliminating uninteresting communication protocols such as the one just described (a sketch of the interleaving trick appears after these remarks). The differentiability requirement on the function h of Eq. (1.1) is quite mild and not unnatural, given the assumption that m1 and m2 are differentiable.

2. We have assumed that messages are real-valued, in contrast to the digital communication often used in practice. Although such a continuous model of communication cannot be implemented exactly using digital devices, it is nonetheless a useful idealization for certain types of problems. For example, most (if not all) parallel and distributed numerical optimization algorithms are described and analyzed as if real numbers could be computed and transmitted exactly [BeT89]. In addition, there is a fair body of literature in which data are communicated and combined for the purpose of obtaining a centralized optimal estimate [Spe79, Cho79, WBC82, HRL88]. This literature invariably assumes that real-valued messages are transmitted, and the schemes proposed in these papers are often evaluated on the basis of the number of transmitted messages. However, there has been no work that tries to derive the minimal number of required messages, and this is where our contribution lies.
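The interleaving trick of remark 1 is easy to make concrete. The sketch below is an illustration of ours, not a protocol from the paper: it merges two reals in [0, 1) into one by alternating their binary digits (26 digits each, so that the result fits exactly in a double), and the fusion center reads both inputs back off. The resulting encoder is badly discontinuous, which is exactly what the smoothness assumption rules out.

```python
def interleave(x, y, bits=26):
    """Merge x, y in [0, 1) into one real by alternating binary digits:
    the k-th digit of x goes to position 2k-1 of z, that of y to position 2k."""
    z = 0.0
    for k in range(1, bits + 1):
        xk = int(x * 2**k) & 1          # k-th binary digit of x
        yk = int(y * 2**k) & 1          # k-th binary digit of y
        z += xk * 2.0**(-(2*k - 1)) + yk * 2.0**(-2*k)
    return z

def deinterleave(z, bits=26):
    """Decoding performed by the fusion center: read the digits back off z."""
    x = sum((int(z * 2**(2*k - 1)) & 1) * 2.0**(-k) for k in range(1, bits + 1))
    y = sum((int(z * 2**(2*k)) & 1) * 2.0**(-k) for k in range(1, bits + 1))
    return x, y

# exact recovery for inputs with short binary expansions:
assert deinterleave(interleave(0.625, 0.3125)) == (0.625, 0.3125)
```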
Another motivation for using a continuous communication model, as opposed to the discrete model often used in the theoretical computer science community [Yao79], is that it opens the possibility of applying tools from analysis, algebra and topology to the systematic study of communication complexity problems. It is also worth noting that a similar continuous framework has been successfully applied to the study of computational complexity [BoM75, BSS89].

Our formulation of the data fusion problem can be regarded as an extension of the one-way communication complexity model first introduced and studied by Abelson [Abe78]. In particular, Abelson considered the situation where two processors P1, P2 wish to compute some real-valued function f(x, y) under the assumption that the value of x (respectively, y) is given only to P1 (respectively, P2) and that the (real-valued) messages can be sent only from P1 to P2. Our setting has a similar flavor, except that we are dealing with a different "organizational structure." It is also worth noting that Abelson's model of continuous communication protocols has an interesting parallel in the field of mathematical economics; in the latter field, the problem of designing a communication protocol is formulated as a problem of designing a decentralized process that performs a desired economic function [Hur60, Hur85, MoR74]. Subsequent to Abelson's initial work, there have been several other studies [Abe80, LuT89, LuT91] of the communication complexity of various specific problems under more general continuous models of communication (e.g., allowing messages to be sent in both directions). The discrete counterpart of Abelson's formulation was introduced in [Yao79] and was followed by many studies of the communication complexity of specific graph and optimization problems (e.g., [JaJ84, LiS81, PaS82, TsL87]).

This paper is organized as follows. In Section 2, we consider the case where f is linear and G is a subspace of R^{m+n}, and we restrict ourselves to linear protocols. We motivate this problem in the context of decentralized estimation of Gaussian random variables, under the assumption that the statistics of the underlying random variables are commonly known. We obtain a complete characterization of the corresponding communication complexity C_lin(f; G), together with an effective algorithm for determining it. In the process of deriving these results, we solve a problem in linear algebra that could be of independent interest. In Section 3, we extend the results of Section 2 to the case of a general nonlinear function f and general communication protocols. In particular, we show that for the case of decentralized Gaussian estimation, the restriction to linear message functions does not increase the communication complexity. In Section 4, we consider a variation of the Gaussian case treated in Section 2. The main difference from Section 2 is that the covariance matrix of the observation noise at any particular sensor is assumed to be known by that sensor, but not by the other sensor or the fusion center. We apply a result from Section 3 and obtain a fairly tight bound on the communication complexity. In particular, we show that a standard method for combining decentralized estimates has nearly optimal communication requirements.
To the best of our knowledge, this is the first time that a result of this type appears in the estimation literature.

We adopt the following notational conventions throughout this paper. For any matrices M and N of size l × m and l × n, respectively, we use [M, N] to denote the matrix of size l × (m + n) whose columns are the columns of M followed by the columns of N. We let r(M) be the rank of M, and M^T its transpose. For any differentiable function f: R^{m+n} → R of two vector variables x ∈ R^m and y ∈ R^n, we use the notation ∇_x f(x, y) (respectively, ∇_y f(x, y)) to denote the m-dimensional (respectively, n-dimensional) vector whose components are the partial derivatives of f with respect to the components of x (respectively, y). If f: R^{m+n} → R^s is a vector function with component mappings f1, f2, ..., fs, then ∇f will denote its Jacobian matrix, whose i-th column is given by the gradient vector ∇fi. Similarly, ∇_x f (respectively, ∇_y f) will denote the matrix whose i-th column is ∇_x fi (respectively, ∇_y fi).

2 Decentralized Gaussian Estimation with Linear Messages

In this section, we consider a simple decentralized estimation problem in which all of the random variables involved are Gaussian and all the message functions are linear. We give a complete characterization of the communication complexity for this problem, together with an effective method for computing it. The results in this section, and the techniques developed for proving them, provide insight and motivation for the results of the next section, where the general nonlinear case is considered.

Let z ∈ R^l be a zero-mean Gaussian random variable with known covariance matrix Pzz. Suppose that the sensors S1 and S2 collect data about z according to the formulas

x = H1 z + v1,    (2.1)
y = H2 z + v2,    (2.2)

where H1, H2 are some m × l and n × l matrices, respectively, and v1 ∈ R^m, v2 ∈ R^n are zero-mean Gaussian noise variables, not necessarily independent. Let R be the covariance matrix of (v1, v2). Suppose that the fusion center is interested in computing f(x, y) = E[z | x, y], the conditional expectation of z given the observed values x and y. Assuming that v1 and v2 are independent of z, we have

f(x, y) = E[z | x, y] = Pzz H^T [H Pzz H^T + R]^{-1} [x; y],    (2.3)

where [x; y] denotes the column vector obtained by stacking x on top of y, where we let

H = [H1; H2],    (2.4)

and where we have also assumed that the inverse exists. Suppose that the matrices Pzz, H1, H2 and R are known to both sensors S1, S2, as well as to the fusion center. Then, Eqs. (2.3) and (2.4) imply that

f(x, y) = Ax + By,    (2.5)

for some matrices A, B (of size l × m and l × n, respectively) depending on Pzz, R, H1, and H2. Notice that the matrices A and B can be regarded as constant, since they can be precomputed and can be assumed to be available at the sensors and the fusion center. The set G of possible data pairs is the support of the probability distribution of (x, y), and our Gaussian assumptions imply that it is a subspace of R^{m+n}. If the noise covariance matrix R is positive definite, then clearly G = R^{m+n}. On the other hand, if the covariance matrix of (x, y) is singular, then (x, y) takes values in a proper subspace of R^{m+n} with probability 1. In other words, we have either

G = R^{m+n}    (2.6)

or

G = {(x, y) | Cx + Dy = 0},    (2.7)

where C and D are some matrices of size k × m and k × n, respectively, and k is some positive integer. The entries of C and D can be determined from Pzz, H1, H2, and R, so both C and D can be viewed as commonly known by the sensors and the fusion center.
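To make the reduction from (2.3) to (2.5) concrete, here is a small numpy sketch (the dimensions and the randomly generated H1, H2 are illustrative choices of ours) that precomputes the matrices A and B:

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n = 2, 3, 4
Pzz = np.eye(l)                          # covariance of z
H1 = rng.standard_normal((m, l))
H2 = rng.standard_normal((n, l))
R = 0.5 * np.eye(m + n)                  # covariance of (v1, v2), positive definite

H = np.vstack([H1, H2])                  # cf. (2.4)
K = Pzz @ H.T @ np.linalg.inv(H @ Pzz @ H.T + R)   # l x (m+n), cf. (2.3)
A, B = K[:, :m], K[:, m:]                # E[z | x, y] = A x + B y, cf. (2.5)
```

Since R is positive definite here, the support of (x, y) is all of R^{m+n}, and Theorem 2.1 below gives C_lin(f; G) = r(A) + r(B) for this instance.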
Under the restriction to linear message functions, we have the following characterization of the communication complexity.

Theorem 2.1. Let f(x, y) and G be given by (2.5) and (2.6)-(2.7). Suppose that the matrices Pzz, H1, H2, and R are known to both sensors S1, S2, as well as to the fusion center. We then have

C_lin(f; G) = r(A) + r(B),                              if G = R^{m+n},
C_lin(f; G) = min_X { r(A − XC) + r(B − XD) },          if G = {(x, y) | Cx + Dy = 0},    (2.8)

where the minimum is taken over all real matrices X of size l × k, and where k is the number of rows of C and D.

Proof. Consider any communication protocol for computing f with linear message functions. Let m1(x) = M1 x and m2(y) = M2 y be the message functions used by sensors S1 and S2, respectively, where M1 is a matrix of size r1 × m and M2 is a matrix of size r2 × n. (Thus, r1 and r2 are the numbers of messages sent to the fusion center by sensors S1 and S2, respectively.) By (1.1), there exists some final evaluation function h such that

Ax + By = f(x, y) = h(m1(x), m2(y)) = h(M1 x, M2 y),    for all (x, y) ∈ G.    (2.9)

We consider two cases.

Case 1. G = R^{m+n}. By (2.9), Ax + By is a function of M1 x and M2 y. So, there holds

Ax + By = h(0, 0) = constant,    for all (x, y) ∈ R^{m+n} satisfying M1 x = 0, M2 y = 0.    (2.10)

Since Ax + By is linear in (x, y) and vanishes at (0, 0), this constant is zero; thus, (x, y) must be orthogonal to the rows of the matrix [A, B] whenever (x, y) satisfies (2.10). In other words, the null space of

[[M1, 0], [0, M2]]

is contained in the null space of [A, B]. It follows that there exist matrices N1 and N2 of appropriate dimensions such that

[A, B] = [N1, N2] [[M1, 0], [0, M2]].

Therefore, we have A = N1 M1 and B = N2 M2, which further implies r(A) ≤ r(M1) ≤ r1 and r(B) ≤ r(M2) ≤ r2. Thus, r = r1 + r2 ≥ r(A) + r(B), which proves C_lin(f; R^{m+n}) ≥ r(A) + r(B).

We now show that C_lin(f; R^{m+n}) ≤ r(A) + r(B), by constructing a communication protocol that computes f with r(A) + r(B) linear message functions. This is accomplished as follows. By the singular value decomposition, A can be written as A = EF for some matrices E and F of size l × r(A) and r(A) × m, respectively. Furthermore, A is known to both sensors S1, S2 and to the fusion center, so the decomposition A = EF can be precomputed, with both the sensors and the fusion center knowing the values of E and F. Now let sensor S1 use the message function m1(x) = Fx, which takes r(A) messages. Upon receiving the value of Fx, the fusion center can compute Ax by using the formula Ax = E(Fx). By an identical argument, the value of By can be computed with r(B) linear messages from sensor S2. As a result, the fusion center can compute f(x, y) = Ax + By with a total of r(A) + r(B) linear messages, which proves C_lin(f; R^{m+n}) ≤ r(A) + r(B), as desired.
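The constructive half of Case 1 is just a rank factorization. A minimal numpy sketch of ours (any rank-revealing factorization would do in place of the SVD):

```python
import numpy as np

def rank_factorization(A, tol=1e-10):
    """Write A = E @ F with E of size l x r(A) and F of size r(A) x m."""
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    return U[:, :r] * s[:r], Vt[:r]      # E, F

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank 1, so r(A) = 1 message suffices
E, F = rank_factorization(A)
x = np.array([1.0, -1.0, 2.0])
msg = F @ x                              # sensor S1 sends r(A) real numbers
assert np.allclose(E @ msg, A @ x)       # the fusion center recovers A x
```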
Case 2. We now assume that G = {(x, y) | Cx + Dy = 0}. Again, by (2.9), we have

Ax + By = h(0, 0) = constant,    for all (x, y) ∈ R^{m+n} satisfying M1 x = 0, M2 y = 0, Cx + Dy = 0,

and, as in Case 1, this constant is zero. Therefore, the null space of the matrix

[[C, D], [M1, 0], [0, M2]]

is contained in the null space of the matrix [A, B]. As a result, there holds

[A, B] = [X, P1, P2] [[C, D], [M1, 0], [0, M2]]

for some matrices X, P1 and P2 of appropriate dimensions. Thus, we have A = XC + P1 M1 and B = XD + P2 M2, which implies r(A − XC) = r(P1 M1) ≤ r(M1) ≤ r1 and r(B − XD) = r(P2 M2) ≤ r(M2) ≤ r2. Therefore, r(A − XC) + r(B − XD) ≤ r1 + r2, which implies C_lin(f; G) ≥ min_X {r(A − XC) + r(B − XD)}.

It remains to prove C_lin(f; G) ≤ min_X {r(A − XC) + r(B − XD)}. To do this, fix any matrix X which attains the minimum in the expression min_X {r(A − XC) + r(B − XD)}. Notice from (2.7) that f(x, y) = Ax + By = (A − XC)x + (B − XD)y, for all (x, y) ∈ G. Taking singular value decompositions of A − XC and B − XD and using an argument similar to the one used in Case 1, we see that (A − XC)x can be computed with r(A − XC) linear messages from sensor S1, and (B − XD)y can be computed with r(B − XD) linear messages from S2; thus, f(x, y) = (A − XC)x + (B − XD)y is computable with a total of r(A − XC) + r(B − XD) linear messages. By the choice of X, we have C_lin(f; G) ≤ min_X {r(A − XC) + r(B − XD)}, as desired. Q.E.D.

Theorem 2.1 provides a complete but nonconstructive characterization of the communication complexity of computing E[z | x, y] with linear message functions, for the Gaussian case. In order to turn Theorem 2.1 into a useful result, we show below that C_lin(f; G) and the minimizing matrix X in (2.8) are effectively computable (in polynomial time). Some intuition for this result can be drawn from the following two extreme cases. Suppose that C = −D = I. Then, C_lin(f; G) = min_X {r(A − X) + r(B + X)}. Choosing X = −B, we see that C_lin(f; G) ≤ r(A + B). On the other hand, using the inequality r(A − X) + r(B + X) ≥ r(A + B), valid for all X, we have C_lin(f; G) ≥ r(A + B), and, therefore, C_lin(f; G) = r(A + B). For another extreme case, suppose that D = 0. Then, C_lin(f; G) = r(B) + min_X r(A − XC). Lemma 2.1 below shows that min_X r(A + XC) is computable in polynomial time. The proof of the polynomial-time computability of C_lin(f; G) in the general case is based on a combination of these techniques, as can be seen in the proof to follow.

Theorem 2.2. Suppose that the entries of the matrices A, B, C and D are all rational numbers. Then, there is a polynomial-time algorithm (in terms of the total size of the entries of A, B, C and D) for computing min_X {r(A − XC) + r(B − XD)}.

Remark: By transposing, we see that the minimization min_X {r(A − CX) + r(B − DX)} is also solvable in polynomial time.

Proof. The proof of Theorem 2.2 consists of a sequence of lemmas.

Lemma 2.1. For any rational matrices A and C, there exist square invertible matrices P and Q such that:
(a) P and Q depend only on C and are computable in polynomial time.
(b) If Y = XP and Ā = AQ^{-1}, then r(A + XC) = r([Ā1 + Y1, Ā2]), where Y1 is a submatrix of Y given by a certain partition Y = [Y1, Y2] of the columns of Y, and [Ā1, Ā2] is a corresponding partition of Ā.
(c) min_X r(A + XC) = r(Ā2). In particular, min_X r(A + XC) is computable in polynomial time.

Our next lemma, which is based on Lemma 2.1, reduces the original rank minimization problem to a simpler one.

Lemma 2.2. Let A, B, C and D be rational matrices. Then, there exist matrices A1, A2, B1 and B2 such that

min_X {r(A − XC) + r(B − XD)} = min_W {r([A1 + W, A2]) + r([B1 + W, B2])}.

Moreover, the matrices A1, A2, B1 and B2 can be computed in polynomial time from A, B, C and D.

The proofs of Lemmas 2.1 and 2.2 are lengthy and have been relegated to the appendix.
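Lemma 2.1(c) already admits a compact numerical rendering. The sketch below (ours) uses a singular value decomposition of C in place of the Gaussian elimination of the appendix: writing C = U S V^T, the orthogonal matrix V plays the role of Q^{-1}, and the block Ā2 consists of the last m − r(C) columns of AV.

```python
import numpy as np

def min_rank_A_plus_XC(A, C, tol=1e-10):
    """min over X of r(A + X @ C), cf. Lemma 2.1(c)."""
    U, s, Vt = np.linalg.svd(C)
    r = int(np.sum(s > tol))             # r(C)
    return np.linalg.matrix_rank(A @ Vt.T[:, r:])

# Example: A = I (3x3), C = e1^T (1x3). Only the first column of A + X C
# can be altered, so the minimum rank is 2, as the formula confirms.
A, C = np.eye(3), np.array([[1.0, 0.0, 0.0]])
assert min_rank_A_plus_XC(A, C) == 2
```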
Lemma 2.3. For any rational matrices A, B, C and D, there holds

min_X {r([A + X, C]) + r([B + X, D])} = min_{Y,Z} r(A − B + CY + DZ) + r(C) + r(D),    (2.11)

where the minima are taken with respect to all matrices of the appropriate dimensions.

Proof. We first show that the left-hand side of (2.11) is no smaller than the right-hand side. To do this, we first notice that

r([A + X, C]) = r([A + CY + X, C]),    for all Y,    (2.12)
r([B + X, D]) = r([B − DZ + X, D]),    for all Z.    (2.13)

Suppose that the minimum of min_X {r([A + X, C]) + r([B + X, D])} is attained at some X*. Then, using the above relations, we have

r([A + X*, C]) + r([B + X*, D]) = r([A + CY* + X*, C]) + r([B − DZ* + X*, D])
 = r([A + CY* + X*, C]) + r(B − DZ* + X*) + r(D)
 ≥ r([A − B + CY* + DZ*, C]) + r(D)
 = r(A − B + CY* + DZ*) + r(C) + r(D)
 ≥ min_{Y,Z} r(A − B + CY + DZ) + r(C) + r(D),

where the second equality follows from choosing Z* so that the columns of D become perpendicular to the columns of B − DZ* + X*. (Such a Z* can be found by solving for Z* the system D^T(B − DZ* + X*) = 0. This system clearly has a solution when D has full column rank; the case where D does not have full column rank is easily reduced to the full-rank case by throwing away some of the columns of D and letting the corresponding rows of Z* be equal to zero.) The first inequality follows from the general matrix inequality r(M) + r(N) ≥ r(M + N), valid for all M and N; the third equality follows from choosing Y* so that the columns of C are orthogonal to the columns of A − B + CY* + DZ*.

To show the reverse inequality, suppose that the minimum in the expression min_{Y,Z} r(A − B + CY + DZ) is attained at some matrices Y* and Z*. Then, using (2.12) and (2.13), we see that

r([A + X, C]) + r([B + X, D]) = r([A + CY* + X, C]) + r([B − DZ* + X, D]),    for all X.

Letting X* = −B + DZ* and using the above relation, we obtain

r([A + X*, C]) + r([B + X*, D]) = r([A − B + CY* + DZ*, C]) + r(D)
 ≤ r(A − B + CY* + DZ*) + r(C) + r(D)
 = min_{Y,Z} r(A − B + CY + DZ) + r(C) + r(D),

where the inequality follows from the general matrix inequality r([M, N]) ≤ r(M) + r(N), valid for all M and N, and the last step is due to the definition of Y* and Z*. This completes the proof of (2.11). Q.E.D.

We are now ready to complete the proof of Theorem 2.2. By Lemma 2.2, it suffices to show the polynomial-time computability of

min_W {r([A1 + W, A2]) + r([B1 + W, B2])}

for some known (and polynomially computable) matrices A1, A2, B1 and B2. By Lemma 2.3, we only have to argue that min_{Y,Z} r(A1 − B1 + A2 Y + B2 Z) is computable in polynomial time [cf. (2.11)]. Letting X = [Y; Z], we have

min_{Y,Z} r(A1 − B1 + A2 Y + B2 Z) = min_X r(A1 − B1 + [A2, B2] X) = min_X r(A1^T − B1^T + X^T [A2, B2]^T),

and the result follows from Lemma 2.1(c). Q.E.D.

Theorem 2.2 provides a method for evaluating C_lin(f; G) (as given by (2.8)). The proofs of Lemmas 2.1 and 2.2 (see the appendix) show that the running time of this method is roughly equal to the running time of performing several Gaussian eliminations and matrix inversions, plus that of evaluating the ranks of several matrices. It is still an open question whether there exist more efficient algorithms for computing C_lin(f; G). We also remark that the proof of Theorem 2.2 provides us with a (polynomial-time) algorithm for constructing a minimizing matrix X and a corresponding optimal communication protocol.
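The last reduction step also has a clean closed form. For the minimization min_{Y,Z} r(A1 − B1 + A2 Y + B2 Z) appearing above, stacking G = [A2, B2] and W = [Y; Z] reduces it to min_W r(M + GW), whose value is the rank of the projection of M onto the orthogonal complement of the column space of G (the minimizer is W = −G⁺M, with G⁺ the pseudoinverse). The numpy sketch below (ours, and equivalent to the Lemma 2.1(c) computation) uses this observation:

```python
import numpy as np

def min_rank_M_plus_GW(M, G, tol=1e-10):
    """min over W of r(M + G @ W) = r((I - P_G) M), where P_G is the
    orthogonal projector onto col(G); W = -pinv(G) @ M attains it."""
    U, s, _ = np.linalg.svd(G)
    r = int(np.sum(s > tol))             # r(G)
    return np.linalg.matrix_rank(U[:, r:].T @ M)

# min over Y, Z of r(A1 - B1 + A2 @ Y + B2 @ Z), with G = [A2, B2]:
rng = np.random.default_rng(1)
A1, B1 = np.eye(3), rng.standard_normal((3, 3))
A2, B2 = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
# generically prints 1: only a rank-one residue survives the projection
print(min_rank_M_plus_GW(A1 - B1, np.hstack([A2, B2])))
```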
3 The General Nonlinear Case

In this section, we consider the case where the function f is nonlinear and fairly arbitrary. Accordingly, we allow the message functions to be nonlinear as well. In terms of the decentralized estimation context, this is the situation that would arise if we were dealing with the optimal estimation of non-Gaussian random variables. We derive general lower bounds on the communication complexity of this problem. Our results imply that the lower bound of Theorem 2.1 remains valid even with general message functions. Thus, the restriction to linear message functions does not increase the communication complexity for the case of Gaussian random variables. We also consider in this section the computation of a rational function f(x, y) by communication protocols whose message functions and final evaluation function are analytic, and we use some analytical tools to obtain an exact characterization of the communication complexity. This bound will be used in Section 4, in our further analysis of decentralized Gaussian estimation.

In what follows, we assume that G, the set of possible observation pairs for the two sensors S1 and S2, is described by

G = {(x, y) | g1(x, y) = 0, g2(x, y) ≤ 0, x ∈ R^m, y ∈ R^n},    (3.1)

where g1: R^{m+n} → R^{t1} and g2: R^{m+n} → R^{t2} are some given differentiable functions, with t1, t2 nonnegative integers. When t1 = t2 = 0, we have G = R^{m+n}, which corresponds to the case where the pair (x, y) is unrestricted. Let f: R^{m+n} → R^s be a differentiable function of two vector variables x and y (x ∈ R^m, y ∈ R^n), and let g = (g1, g2). Our result is the following.

Theorem 3.1. Suppose that G (as defined by (3.1)) is nonempty. Suppose that either ∇g(z) (the Jacobian of g) has full rank for all z = (x, y) ∈ G, or that g is a linear mapping. Then, for any z = (x, y) ∈ G, we have

C_1(f; G) ≥ min_{X, Y≥0} {r(∇_x f(z) − ∇_x g1(z)X − ∇_x g2(z)Y) + r(∇_y f(z) − ∇_y g1(z)X − ∇_y g2(z)Y)}.    (3.2)

Here, the minimum is taken over all matrices X and Y of appropriate dimensions, subject to the constraint that all entries of Y are nonnegative.

Proof. Consider any optimal communication protocol for computing f(x, y) over G (i.e., one with a minimum number of messages). Let m1: R^m → R^{r1} and m2: R^n → R^{r2} be its message functions, which are assumed to be continuously differentiable. Here, r1 (respectively, r2) is the number of messages sent from sensor S1 (respectively, S2) to the fusion center. From Eq. (1.1), we have

f(x, y) = h(m1(x), m2(y)),    for all (x, y) ∈ G,    (3.3)

where h is a continuously differentiable function. We need the following simple lemma.

Lemma 3.1. Let p: R^l → R^s, q1: R^l → R^{t1}, and q2: R^l → R^{t2} be continuously differentiable functions. Suppose that the set G = {z ∈ R^l | q1(z) = 0, q2(z) ≤ 0} is nonempty and that p(z) is constant over G. Let q = (q1, q2). If ∇q(z) (the Jacobian of q) has full rank for all z ∈ G, or if q is a linear mapping, then there exist a matrix function A(z) of size t1 × s and a matrix function B(z) ≥ 0 (componentwise) of size t2 × s, such that ∇p(z) = ∇q1(z)A(z) + ∇q2(z)B(z), for all z ∈ G.

Proof. Let pi be the i-th component function of p, i = 1, ..., s. Consider, for each i, the following constrained optimization problem:

min_{z∈G} pi(z).    (3.4)

By assumption, each z satisfying q1(z) = 0 and q2(z) ≤ 0 is an optimal solution of (3.4). Since the regularity condition on the Jacobian of q (or the linearity of q) ensures the existence of a set of Lagrange multipliers, the necessary condition for optimality ([Lue84, page 300]) gives ∇pi(z) = ∇q1(z)ai(z) + ∇q2(z)bi(z), for some vector function ai(z) of dimension t1 and some vector function bi(z) ≥ 0 (componentwise) of dimension t2, for all i = 1, ..., s. Writing these relations in matrix form yields the desired result. Q.E.D.

By (3.3), f(x, y) − h(m1(x), m2(y)) = 0 for all (x, y) satisfying g1(x, y) = 0 and g2(x, y) ≤ 0. Let p(x, y) = f(x, y) − h(m1(x), m2(y)), and let q1(x, y) = g1(x, y), q2(x, y) = g2(x, y). Then, p, q1 and q2 satisfy the assumptions of Lemma 3.1.
Thus, there exist matrix functions Q1(x, y) and Q2(x, y) ≥ 0 such that

∇p(x, y) = ∇q1(x, y) Q1(x, y) + ∇q2(x, y) Q2(x, y),

for all (x, y) ∈ G. Equivalently, for all (x, y) ∈ G, we have

[∇m1 ∇_{m1}h ; ∇m2 ∇_{m2}h] = [∇_x f ; ∇_y f] − ∇g1 Q1 − ∇g2 Q2.

Fix any (x, y) ∈ G and let X = Q1(x, y) and Y = Q2(x, y). The above relation implies that

∇m1 ∇_{m1}h = ∇_x f − ∇_x g1 X − ∇_x g2 Y,    (3.5)
∇m2 ∇_{m2}h = ∇_y f − ∇_y g1 X − ∇_y g2 Y.    (3.6)

Since ∇m1 is a matrix of size m × r1, we obtain

r1 ≥ r(∇m1) ≥ r(∇m1 ∇_{m1}h) = r(∇_x f − ∇_x g1 X − ∇_x g2 Y),    (3.7)

where the last step is due to (3.5). Similarly, (3.6) yields r2 ≥ r(∇_y f − ∇_y g1 X − ∇_y g2 Y). Therefore,

C_1(f; G) = r1 + r2 ≥ r(∇_x f − ∇_x g1 X − ∇_x g2 Y) + r(∇_y f − ∇_y g1 X − ∇_y g2 Y)
 ≥ min_{X, Y≥0} {r(∇_x f − ∇_x g1 X − ∇_x g2 Y) + r(∇_y f − ∇_y g1 X − ∇_y g2 Y)},

for all (x, y) ∈ G. This completes the proof of Theorem 3.1. Q.E.D.

We remark that when f and G are given by (2.5) and (2.6)-(2.7), the right-hand side of (3.2) reduces to the right-hand side of (2.8). This implies that C_lin(f; G) = C_1(f; G). In other words, for the problem of estimating a Gaussian random variable, the restriction to linear message functions does not increase the communication complexity. It is not clear how such a restriction on the message functions affects the communication complexity for the estimation of general random variables.
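Spelled out (a check of ours, using the column-gradient convention of the notation section), the reduction reads:

```latex
% Linear case: f(x,y) = Ax + By, g_1(x,y) = Cx + Dy, and no inequality
% constraints (t_2 = 0), so the multiplier Y is absent.
\begin{align*}
\nabla_x f = A^T, \qquad \nabla_y f = B^T, \qquad
\nabla_x g_1 = C^T, \qquad \nabla_y g_1 = D^T,
\end{align*}
% so the right-hand side of (3.2) becomes
\begin{align*}
\min_X \bigl\{ r(A^T - C^T X) + r(B^T - D^T X) \bigr\}
  = \min_X \bigl\{ r(A - X^T C) + r(B - X^T D) \bigr\},
\end{align*}
% which is the right-hand side of (2.8), since rank is invariant under
% transposition and X^T ranges over all l x k matrices as X does.
```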
A disadvantage of Theorem 3.1 is that it only provides a lower bound for the communication complexity C_1(f; G), and it is not known in general how far this lower bound can be from C_1(f; G). However, we show next that if f is a rational vector function, then we can obtain tight lower bounds in a local sense, for the class of analytic communication protocols (Theorems 3.2 and 3.3). We first fix some notation.

Notation: Let f = (f1, ..., fs) be a (vector) rational function and let D (the domain of f) denote the open subset of R^{m+n} over which f is well defined (finite). For i = 1, ..., s, and for any n-tuple α = (α1, ..., αn) and m-tuple β = (β1, ..., βm) of nonnegative integer indices, we define the functions f_i^α: D → R and f_i^β: D → R by letting

f_i^α(x, y) = ∂^{α1+···+αn} f_i / (∂y1^{α1} ··· ∂yn^{αn}) (x, y),    f_i^β(x, y) = ∂^{β1+···+βm} f_i / (∂x1^{β1} ··· ∂xm^{βm}) (x, y).    (3.8)

(We use the convention f_i^0 = f_i.) Furthermore, a notation such as span{∇_x f_i^α(x, y) : y ∈ Dy, ∀i, α} will stand for the vector space spanned by all vectors of the form ∇_x f_i^α(x, y) that are obtained as y varies in a set Dy and as i and α vary within their natural domains. Finally, for any finite index set I and collection {a_i : i ∈ I} of vectors, we use [a_i : i ∈ I] to denote the matrix whose columns are given by the vectors a_i, i ∈ I.

Theorem 3.2. Let D denote the domain of f. Let Dx and Dy be two nonempty open subsets of R^m and R^n, respectively, with Dx × Dy ⊂ D. Consider an analytic communication protocol, consisting of a total of r1 + r2 messages, for computing f over G = Dx × Dy, where r1 (respectively, r2) denotes the number of messages sent to the fusion center from sensor S1 (respectively, S2). Then,

r1 ≥ max_{x∈Dx} dim span{∇_x f_i^α(x, y) : y ∈ Dy, ∀i, α},    (3.9)
r2 ≥ max_{y∈Dy} dim span{∇_y f_i^β(x, y) : x ∈ Dx, ∀i, β}.    (3.10)

Proof. By symmetry, we shall only prove (3.9). Let the message functions be denoted by m1(x), m2(y), and let the final evaluation function be denoted by h. We then have from (1.1),

f_i(x, y) = h_i(m1(x), m2(y)),    for all (x, y) ∈ Dx × Dy.

Fix any x ∈ Dx. Differentiating this expression repeatedly with respect to y yields

f_i^α(x, y) = h_i^α(m1(x), m2(y), y),    for all y ∈ Dy, ∀i, α,    (3.11)

where h_i^α is a suitable analytic function. We now differentiate both sides of (3.11) with respect to x to obtain

∇_x f_i^α(x, y) = ∇m1(x) ∇_{m1} h_i^α(m1(x), m2(y), y),    for all y ∈ Dy, ∀i, α.    (3.12)

Thus, for all y ∈ Dy and for all i, α, the vector ∇_x f_i^α(x, y) lies in the span of the columns of the matrix ∇m1(x). Since the number of columns of ∇m1(x) equals r1, it follows that r1 ≥ dim span{∇_x f_i^α(x, y) : y ∈ Dy, ∀i, α}. Since this relation holds for all x ∈ Dx, the validity of (3.9) follows. Q.E.D.

It should be clear from the proof that Theorem 3.2 remains valid if the function f is merely analytic, rather than rational. We continue with a corollary of Theorem 3.2 that will be used in the next section.

Corollary 3.1. Let f be a rational function with domain D. For any analytic protocol that computes f over an open set G ⊂ D, the numbers of messages r1 and r2 transmitted by sensors S1 and S2, respectively, satisfy

r1 ≥ r[∇_x f_i^α(x, y) : i, α],    for all (x, y) ∈ G,
r2 ≥ r[∇_y f_i^β(x, y) : i, β],    for all (x, y) ∈ G.

Proof. Given (x, y) ∈ G ⊂ D, let Dx and Dy be open sets containing x and y, respectively, such that Dx × Dy ⊂ G, and apply Theorem 3.2. Q.E.D.
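For intuition, the bound of Corollary 3.1 can be evaluated mechanically with a computer algebra system. The sketch below (ours) does this for the illustrative choice f(x, y) = x^T y, which is not an example from the paper: the first-order y-derivatives alone already certify r1 ≥ m.

```python
import numpy as np
import sympy as sp

m = n = 3
x = sp.symbols(f'x0:{m}')
y = sp.symbols(f'y0:{n}')
f = sum(xi * yi for xi, yi in zip(x, y))          # f(x, y) = x^T y

# Columns of [grad_x f^alpha : alpha]: gradients w.r.t. x of f and of its
# first-order derivatives w.r.t. each y_j (higher orders vanish here).
derivs = [f] + [sp.diff(f, yj) for yj in y]
point = dict(zip(x + y, np.random.default_rng(1).standard_normal(m + n)))
cols = [[float(sp.diff(g, xi).subs(point)) for xi in x] for g in derivs]
print(np.linalg.matrix_rank(np.array(cols).T))    # prints 3, so r1 >= m
```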
We now provide a partial converse of Theorem 3.2, by showing that the lower bounds (3.9) and (3.10) are tight in a local sense. Let f(x, y) = (f1(x, y), f2(x, y), ..., fs(x, y)) be a collection of rational functions to be computed by the fusion center, where x ∈ R^m, y ∈ R^n. Suppose that f_i(x, y) = p_i(x, y)/q_i(x, y), i = 1, ..., s, where p_i and q_i are relatively prime polynomials. We assume that all of the p_i's and q_i's are nonzero polynomials. Note that each p_i and q_i (i = 1, ..., s) can be written in the form

p_i(x, y) = Σ_{β=(β1,...,βn)∈B} p_{iβ}(x) y1^{β1} ··· yn^{βn},    q_i(x, y) = Σ_{β=(β1,...,βn)∈B} q_{iβ}(x) y1^{β1} ··· yn^{βn},    (3.13)

where each p_{iβ}, q_{iβ} is a suitable polynomial and B is a finite set of n-tuples of nonnegative integers. Symmetrically, we can write

p_i(x, y) = Σ_{α=(α1,...,αm)∈A} p_{iα}(y) x1^{α1} ··· xm^{αm},    q_i(x, y) = Σ_{α=(α1,...,αm)∈A} q_{iα}(y) x1^{α1} ··· xm^{αm},    (3.14)

where each p_{iα}, q_{iα} is a suitable polynomial and A is a finite set of m-tuples of nonnegative integers. We let

tx = max_{x∈R^m} r[∇p_{iβ}(x), ∇q_{iβ}(x) : 1 ≤ i ≤ s, β ∈ B],    (3.15)
ty = max_{y∈R^n} r[∇p_{iα}(y), ∇q_{iα}(y) : 1 ≤ i ≤ s, α ∈ A].    (3.16)

Finally, let Dx* and Dy* be the (open) sets of points at which the maxima in Eqs. (3.15) and (3.16), respectively, are attained. Our result is the following:

Theorem 3.3. Let f(x, y) = (p1(x, y)/q1(x, y), ..., ps(x, y)/qs(x, y)) be a rational (vector) function (x ∈ R^m, y ∈ R^n), where p_i, q_i (i = 1, ..., s) are relatively prime polynomials. Let D = {(x, y) | q_i(x, y) ≠ 0, ∀i}. Suppose that (0, 0) ∈ D and that p_i(0, 0) ≠ 0 for all i. Then, for any (x, y) ∈ D ∩ (Dx* × Dy*), there exists an open set G of the form G = Dx × Dy containing (x, y), with Dx × Dy ⊂ D ∩ (Dx* × Dy*), such that

tx = max_{x∈Dx} dim span{∇_x f_i^α(x, y) : y ∈ Dy, ∀i, α},    (3.17)
ty = max_{y∈Dy} dim span{∇_y f_i^β(x, y) : x ∈ Dx, ∀i, β},    (3.18)

and

C_ω(f; G) = tx + ty,    (3.19)

where tx, ty are defined by (3.15) and (3.16).

Proof. For notational simplicity, we shall prove (3.17), (3.18) and (3.19) only for the case s = 1 (i.e., f is a scalar rational function); we thus omit the subscript i from our notation. The general case s > 1 can be handled by modifying the proof slightly. Fix some (x*, y*) ∈ D ∩ (Dx* × Dy*). Let Dx × Dy ⊂ D ∩ (Dx* × Dy*) be an arbitrary open set containing (x*, y*). The validity of (3.17) and (3.18) follows directly from Theorem 3.3 of [Luo90]. Theorem 3.2 then yields

C_ω(f; Dx × Dy) ≥ tx + ty.    (3.20)

It only remains to show that (3.20) holds with equality. To do this, we construct an analytic communication protocol for computing f(x, y) over some open subset Dx × Dy of D ∩ (Dx* × Dy*) containing (x*, y*). Let B1 ⊂ B and B2 ⊂ B be such that |B1| + |B2| = tx and

r[∇p_β(x*), ∇q_{β'}(x*) : β ∈ B1, β' ∈ B2] = tx.    (3.21)

Similarly, let A1 ⊂ A and A2 ⊂ A be such that |A1| + |A2| = ty and

r[∇p_α(y*), ∇q_{α'}(y*) : α ∈ A1, α' ∈ A2] = ty.    (3.22)

Consider the communication protocol with m1(x) = {p_β(x), q_{β'}(x) : β ∈ B1, β' ∈ B2} and m2(y) = {p_α(y), q_{α'}(y) : α ∈ A1, α' ∈ A2}. Clearly, the total number of messages used in this protocol is tx + ty. We claim that this protocol can be used to compute f(x, y) over some open set Dx × Dy containing (x*, y*). We need the following lemma, whose proof can be found in [LuT91, Theorem A.1]. (Strictly speaking, Lemma 3.2 was proved in [LuT91] only for continuously differentiable functions, but its proof easily generalizes to analytic functions.)

Lemma 3.2. Let Ω be an open subset of R^l. Let F: Ω → R^t be an analytic mapping such that

max_{z∈Ω} r(∇F(z)) = t.

Suppose that f: Ω → R is an analytic function with the property ∇f(z) ∈ span{∇F(z)}, for all z ∈ Ω. Then, there exists some analytic function h such that f(z) = h(F(z)) for all z ∈ Ω', where Ω' is some open subset of Ω.

Consider the polynomial mapping F: R^{m+n} → R^{tx+ty} defined by F(x, y) = (m1(x), m2(y)). Clearly, max_{x,y} r(∇F(x, y)) = tx + ty. Moreover, we have

∇F(x, y) = [[∇m1(x), 0], [0, ∇m2(y)]]
 = [ [∇p_β(x); 0], [∇q_{β'}(x); 0], [0; ∇p_α(y)], [0; ∇q_{α'}(y)] : β ∈ B1, β' ∈ B2, α ∈ A1, α' ∈ A2 ],    (3.23)

for all x and y, where the second step follows from the definition of m1(x) and m2(y). On the other hand, by differentiating f(x, y) = p(x, y)/q(x, y) and using (3.13) and (3.14), we obtain, for all (x, y) sufficiently close to (x*, y*), that

∇f(x, y) ∈ span{∇p(x, y), ∇q(x, y)}
 = span{ [∇_x p(x, y); ∇_y p(x, y)], [∇_x q(x, y); ∇_y q(x, y)] }
 ⊂ span{ [∇p_β(x); 0], [∇q_{β'}(x); 0], [0; ∇p_α(y)], [0; ∇q_{α'}(y)] : β ∈ B1, β' ∈ B2, α ∈ A1, α' ∈ A2 },
where the last step follows from (3.15)-(3.16) and the fact that (3.21)-(3.22) hold for all (x, y) sufficiently close to (x*, y*). This, together with (3.23), implies ∇f(x, y) ∈ span{∇F(x, y)} for all (x, y) near (x*, y*). Thus, we can invoke Lemma 3.2 (with the correspondence t ↔ tx + ty and z ↔ (x, y)) to conclude that there exists some analytic function h: R^{tx+ty} → R such that f(x, y) = h(F(x, y)) for all (x, y) near (x*, y*). Since F(x, y) = (m1(x), m2(y)), we see that f(x, y) can be computed on the basis of the messages m1(x) and m2(y) over some open set containing (x*, y*). Now take an open subset of that set with the product form Dx × Dy such that (x*, y*) ∈ Dx × Dy. This completes the proof of Theorem 3.3. Q.E.D.

In essence, Theorem 3.3 states that the lower bounds of Theorem 3.2 are tight, in a local sense, for the class of analytic communication protocols. The adjective "local" refers to the fact that the protocol of Theorem 3.3 works only for (x, y) in a possibly small open set Dx × Dy. Theorem 3.3 also provides an alternative way of computing tx and ty (cf. (3.17)-(3.18)). This is particularly useful because in some applications the computation of tx and ty as defined by (3.15)-(3.16) can be quite involved, whereas the computation of tx and ty as given by (3.17)-(3.18) is relatively simple. We note that, instead of quoting the results of [Luo90], we could have proved the lower bound C_ω(f; D) ≥ tx + ty directly from Theorem 3.1; however, such an approach is more complicated. Finally, note that Theorem 3.3 asserts the existence of a local analytic protocol with tx + ty messages. Ideally, we would like to have a rational protocol (in which both the message functions and the final evaluation function are rational, instead of analytic), which uses only tx + ty messages, and which is global (in the sense that the domain G of the protocol coincides with the domain D of f).

4 Decentralized Gaussian Estimation Revisited

In this section, we consider a variation of the decentralized Gaussian estimation problem of Section 2. In contrast to Section 2, we now assume that some of the statistics of the random variables involved are only locally known. We apply the results of Section 3 to obtain tight bounds on the number of messages that have to be transmitted from the sensors to the fusion center, and we establish the near-optimality of a natural communication protocol.

Let z ∈ R^m be an unknown Gaussian random variable to be estimated by the fusion center. Let there be two sensors S1 and S2 which are making observations of z according to

u = H1 z + v1,    (4.1)
w = H2 z + v2,    (4.2)

where u ∈ R^n (respectively, w ∈ R^n) denotes the data vector observed by S1 (respectively, S2). Here, v1 and v2 are n-dimensional Gaussian noise vectors, independent of z and independent of each other, and H1 and H2 are two coefficient matrices of size n × m. We note that, in practice, the number n of observations obtained by each sensor is typically much larger than the dimension m of the random variable z to be estimated. For this reason, we focus on the case n ≥ m. Let R1 and R2 be the covariance matrices of v1, v2, respectively, and let Pzz be the covariance matrix of z, which we assume for simplicity to be positive definite. We assume that the fusion center wishes to compute the conditional expectation E[z | u, w]. Assuming the existence of the inverse in the equation below, we have

E[z | u, w] = Pzz H^T [H Pzz H^T + R]^{-1} [u; w],    (4.3)

where

H = [H1; H2],    R = [[R1, 0], [0, R2]].    (4.4)

Note that the invertibility of H Pzz H^T + R is equivalent to assuming that the support of the distribution of (u, w) is all of R^{2n}. For this case, the results of Section 2 show that if the matrices Pzz, Hi and Ri (i = 1, 2) are commonly known by the sensors and the fusion center, then the communication complexity is equal to 2m.

Let us now assume that Pzz is known by the two sensors and the fusion center, while the matrices H1, R1 are known only to sensor S1, and the matrices H2, R2 are known only to sensor S2. This case can be quite realistic. For example, the coefficients of H1 might be determined locally and on-line by sensor S1 (as would be the case if sensor S1 were running an extended Kalman filter). Also, the entries of R1 might be estimated locally and on-line by sensor S1, by computing the empirical variance or autocorrelation of past observations. In both of these cases, the values of H1 and R1 would be known only by sensor S1. In relation to the notation used earlier in the paper, we have x = (H1, R1, u), y = (H2, R2, w), and the function to be computed by the fusion center is

f(x, y) = f(H1, R1, u; H2, R2, w) = Pzz H^T [H Pzz H^T + R]^{-1} [u; w],    (4.5)

where H and R are given by (4.4). Note that Pzz does not appear as an argument on the left-hand side of Eq. (4.5), because it is considered a commonly known constant. Finally, we let G be the set of all (H1, R1, u, H2, R2, w) such that R1 and R2 are symmetric positive definite matrices. (Note that on G, the matrix H Pzz H^T + R is guaranteed to be invertible.)
Theorem 4.1. Let f and G be as above and assume that n ≥ m. Then,

(m^2 + m)/2 ≤ C_ω(f; G) ≤ m^2 + 3m.

Proof. The upper bound follows from well-known formulas for the combining of measurements; we repeat the argument here for the sake of completeness. As is well known, we have

E[z | u, w] = [Pzz^{-1} + H1^T R1^{-1} H1 + H2^T R2^{-1} H2]^{-1} [H1^T R1^{-1} u + H2^T R2^{-1} w].    (4.6)

This suggests the following protocol. Sensor S1 transmits H1^T R1^{-1} u (m messages) and H1^T R1^{-1} H1 to the fusion center. The latter is a symmetric matrix of dimension m × m; since the off-diagonal entries need only be transmitted once, m(m + 1)/2 messages suffice. The situation for sensor S2 is symmetrical, and we see that the total number of transmitted messages is equal to 2(m + m(m + 1)/2) = m^2 + 3m. It is clear from Eq. (4.6) that these messages enable the fusion center to compute E[z | u, w].
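The combining protocol in this upper bound is easily exercised numerically. In the numpy sketch below (dimensions and matrices are illustrative choices of ours), each sensor sends its local information vector and information matrix, and the fused estimate is checked against the centralized formula (4.3):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 4
Pzz = np.eye(m)
H1, H2 = rng.standard_normal((n, m)), rng.standard_normal((n, m))
R1, R2 = np.eye(n), 2.0 * np.eye(n)

z = rng.standard_normal(m)                       # a sample of z, for the demo
u = H1 @ z + rng.standard_normal(n)
w = H2 @ z + np.sqrt(2.0) * rng.standard_normal(n)

# Sensor S_i sends H_i^T R_i^{-1} (data) -- m numbers -- and the symmetric
# matrix H_i^T R_i^{-1} H_i -- m(m+1)/2 numbers: m^2 + 3m messages in total.
b1, M1 = H1.T @ np.linalg.solve(R1, u), H1.T @ np.linalg.solve(R1, H1)
b2, M2 = H2.T @ np.linalg.solve(R2, w), H2.T @ np.linalg.solve(R2, H2)
fused = np.linalg.solve(np.linalg.inv(Pzz) + M1 + M2, b1 + b2)   # Eq. (4.6)

# Cross-check against the centralized expression (4.3):
H = np.vstack([H1, H2])
R = np.block([[R1, np.zeros((n, n))], [np.zeros((n, n)), R2]])
direct = Pzz @ H.T @ np.linalg.solve(H @ Pzz @ H.T + R, np.concatenate([u, w]))
assert np.allclose(fused, direct)
```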
It is then understood that the rank in Corollary 3.1 will be computed in the vector space of mnx m symmetric matrices. - 1 H ) can be The (p, q)-th entry of Eij(I+HTRL written as eTEij(I+HTR1 Hl )eq, where ep and I eq are the p-th and q-th unit vectors, respectively. We then use the formula 2 VAXTAy = xyT + yxT to evaluate the gradient of the above expression, with respect to R1, at the point H1 = R1 = I. The result is seen to be -(eqe TEij + EijepeT). Note that eqeTEij + EijeieT = Eqj, if i and eqeTEij + EijeieT = 2Eqj, if i j, if i = j. Therefore, the matrices eqeTEij + EijepeT span the vector space of synuIetric matrices, which is of dimension rn(m + 1)/2. In the notation of Corollary 3.1, the rank of [Vzf,gc(x,y) : i,ac] evaluated at x* = (H,,Rl,u) = (I,I,u) and y* = (H2, R 2 , w) = (I, 0, w) is at least m(m + 1)/2. 2 o[a' '1'1le correct foriiula is only xyT. and aji sillmulta.ncoulslv. form bea.use VA standsl ,e for derivative in the direction get the symmetric 19 The desired lower bound now follows from Corollary 3.1, except for the minor difficulty that (x*, y*) 5 S. (This is because in g we have required R 2 to be positive definite.) However, an easy continuity argument shows that the rank of [Vfc(x, y): i, a] remains at least m(m + 1)/2 in an open set around (x*, y*). Thus, we can apply Corollary 3.1 to a point in the vicinity of (x*, y*) that belongs to g, and the proof is complete. Q.E.D. 5 Discussion In this paper, we considered the problem of minimizing the amount of communication in decentralized estimation. When the random variables involved are Gaussian, we have obtained some tight bounds on the number of messages that have to be commnunicated in order for a fusion center to make a statistically optimal estimation. Our results may provide useful insight and guidelines to design communication protocols for the decentralized estimation problems when the communication resource is scarce. While this paper has focused on static estimation problems, it might be interesting to consider extensions to decentralized Kalman filtering problems. A Appendix Proof of Lemma 2.1: Using Gaussian elimination, we can write C=P I ] Q, where P and Q are some invertible square matrices. Clearly, P and Q are computable in polynomial time. Let A = AQ- 1 and Y = XP. We then have r(A+XC) = r A + XP I = r (AQ-1 + XP [ r(A + 0 I 0]) ]) r r(A + [Y, 0]) = r([A 1 + Y},A 2 ]), where we have partitioned Y into [Y1, Y2] so that y[I O] =[r] 20 O] (A.1) and have partitioned A = [A1, A2] accordingly. This proves part (b). Since P is invertible and Y= [Y1, 2] = XP, we have from (A.1) minr(A+ XC) = nminr([A + Y1,A 2]) = r(A 2), (A.2) where the last step follows by taking Y1 = -Ai,. Since Q can be computed in polynomial time, we see that A 2 is also computable in polynomial time, which further implies that r(A 2) can be evaluated in polynomial time. Combining this with (A.2) yields part (c). Q.E.D. Proof of Lemma 2.2: First, by Lenunmma 2.1, there exists some linear transformation Y = XP under which r(A + XC) = r([AL + Y1, A2]), (A.3) where P and A' are two matrices computable in polynomial time and A' = [A', A'] is partitioned according to the partition Y = [Y1, Y2]. Under the same linear transformation Y = XP, we have r(B + XD) = r(B + Y'j) = r(B + Y1Dl + 1'2 D 2 ), where D = P-1D and i =/ [ (A.4) ] is partitioned according to Y = [Y,,Y2 ]. 
Using Lemma 2.1 again (with the correspondence B + Y1 D̄1 ↔ A, D̄2 ↔ C, and Y2 ↔ X), we obtain

min_{Y2} r(B + Y1 D̄1 + Y2 D̄2) = r(B' + Y1 D'),    (A.5)

where B' and D' are matrices computable in polynomial time from B, D̄1 and D̄2. Combining (A.3) and (A.4), we obtain

min_X {r(A + XC) + r(B + XD)} = min_{Y1,Y2} {r([Ā1 + Y1, Ā2]) + r(B + Y1 D̄1 + Y2 D̄2)}
 = min_{Y1} {r([Ā1 + Y1, Ā2]) + min_{Y2} r(B + Y1 D̄1 + Y2 D̄2)}
 = min_{Y1} {r([Ā1 + Y1, Ā2]) + r(B' + Y1 D')},    (A.6)

where the last step follows from (A.5). To complete the proof, we apply Lemma 2.1 to r(B' + Y1 D'). In particular, there exists a linear transformation W = Y1 P' (P' is an invertible matrix computable in polynomial time) such that

r(B' + Y1 D') = r([B1 + W1, B2]),    (A.7)

for some matrices B1 and B2 which are computable in polynomial time from B' and D'. Here, W = [W1, W2] is some partition of the matrix W. Under the same linear transformation W = Y1 P', we have

r([Ā1 + Y1, Ā2]) = r([Ā1 + W(P')^{-1}, Ā2]) = r([Ā1 P' + W, Ā2]) = r([Ā1 P' + [W1, W2], Ā2]),

where the second step follows from the invertibility of P'. Thus, we have

min_{W2} r([Ā1 + Y1, Ā2]) = min_{W2} r([Ā1 P' + [W1, W2], Ā2]) = r([(Ā1 P')1 + W1, 0, Ā2]) = r([A1 + W1, A2]),    (A.8)

where the second step follows from choosing W2 = −(Ā1 P')2, which is obtained by partitioning Ā1 P' = [(Ā1 P')1, (Ā1 P')2] according to the partition W = [W1, W2]; in the last step we have let A1 = (Ā1 P')1 and A2 = [0, Ā2]. We now combine (A.7) with (A.8) to obtain

min_{Y1} {r([Ā1 + Y1, Ā2]) + r(B' + Y1 D')} = min_{W1,W2} {r([Ā1 P' + [W1, W2], Ā2]) + r([B1 + W1, B2])}
 = min_{W1} {(min_{W2} r([Ā1 P' + [W1, W2], Ā2])) + r([B1 + W1, B2])}
 = min_{W1} {r([A1 + W1, A2]) + r([B1 + W1, B2])}.

This, together with (A.6), implies

min_X {r(A − XC) + r(B − XD)} = min_X {r(A + XC) + r(B + XD)} = min_{W1} {r([A1 + W1, A2]) + r([B1 + W1, B2])},

as desired. Q.E.D.

References

[Abe78] Abelson, H., "Towards a Theory of Local and Global Computation", Theoretical Computer Science, Vol. 6, 1978, pp. 41-67.

[Abe80] Abelson, H., "Lower Bounds on Information Transfer in Distributed Computations", Journal of the ACM, Vol. 27, No. 2, 1980, pp. 384-392.

[BoM75] Borodin, A. and Munro, I., The Computational Complexity of Algebraic and Numeric Problems, American Elsevier, New York, 1975.

[BSS89] Blum, L., Shub, M. and Smale, S., "On a Theory of Computation and Complexity over the Real Numbers: NP-Completeness, Recursive Functions and Universal Machines", Bulletin of the American Mathematical Society, Vol. 21, No. 1, 1989, pp. 1-47.

[BeT89] Bertsekas, D.P. and Tsitsiklis, J.N., Parallel and Distributed Computation: Numerical Methods, Prentice Hall, 1989.

[Cho79] Chong, C.Y., "Hierarchical Estimation", Proceedings of the 2nd MIT/ONR Workshop on Command, Control, and Communications, Monterey, California, 1979.

[HRL88] Hashemipour, H.R., Roy, S., and Laub, A.J., "Decentralized Structures for Parallel Kalman Filtering", IEEE Transactions on Automatic Control, Vol. AC-33, 1988, pp. 88-94.

[Hur60] Hurwicz, L., "Optimality and Informational Efficiency in Resource Allocation Processes", in Mathematical Methods in the Social Sciences, 1959, Arrow, K.J., Karlin, S. and Suppes, P. (Eds.), Stanford University Press, Stanford, California, 1960, pp. 27-48.

[Hur85] Hurwicz, L., "On Informational Decentralization and Efficiency in Resource Allocation", in Reiter, S. (Ed.), Studies in Mathematical Economics, Mathematical Association of America, 1985.

[JaJ84] Ja'Ja', J., "The VLSI Complexity of Selected Graph Problems", Journal of the ACM, Vol. 31, No. 2, 1984, pp. 377-391.
[LiS81] Lipton, R.J. and Sedgewick, R., "Lower Bounds for VLSI", Proceedings of the 13th STOC, 1981, pp. 300-307.

[Lue84] Luenberger, D.G., Linear and Nonlinear Programming, Addison-Wesley, 1984.

[Luo90] Luo, Z.-Q., "Communication Complexity of Computing a Collection of Rational Functions", in Advances in Computing and Information, Akl, S.G. and Koczkodaj, W.W. (Eds.), Springer-Verlag, 1990, pp. 453-462.

[LuT89] Luo, Z.-Q. and Tsitsiklis, J.N., "On the Communication Complexity of Distributed Algebraic Computation", submitted to the Journal of the Association for Computing Machinery.

[LuT91] Luo, Z.-Q. and Tsitsiklis, J.N., "On the Communication Complexity of Solving a Polynomial Equation", SIAM Journal on Computing, in press.

[MoR74] Mount, K. and Reiter, S., "The Informational Size of Message Spaces", Journal of Economic Theory, 1974, pp. 161-192.

[PaS82] Papadimitriou, C.H. and Sipser, M., "Communication Complexity", Proceedings of the 14th STOC, 1982, pp. 196-200.

[Spe79] Speyer, J.L., "Computation and Transmission Requirements for a Decentralized Linear-Quadratic-Gaussian Control Problem", IEEE Transactions on Automatic Control, Vol. AC-24, No. 2, April 1979, pp. 266-269.

[TsL87] Tsitsiklis, J.N. and Luo, Z.-Q., "Communication Complexity of Convex Optimization", Journal of Complexity, Vol. 3, 1987, pp. 231-243.

[WBC82] Willsky, A.S., Bello, M.G., Castanon, D.A., Levy, B.C. and Verghese, G.C., "Combining and Updating of Local Estimates and Regional Maps Along Sets of One-Dimensional Tracks", IEEE Transactions on Automatic Control, Vol. AC-27, No. 4, August 1982, pp. 799-813.

[Yao79] Yao, A.C., "Some Complexity Questions Related to Distributed Computing", Proceedings of the 11th STOC, 1979, pp. 209-213.