A Numerical Study of Witsenhausen's Counterexample by Jordan Romvary B.S. in Electrical and Computer Engineering, Rutgers University, 2012 Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the ARCHIVES MASSACHUSETTS INST1TUJTE OF TECHNOLOLGY MASSACHUSETTS INSTITUTE OF TECHNOLOGY JUL 0 7 2015 June 2015 @ Massachusetts Institute of Technology 2015. All rights reserved. A uthor.................... LIBRARIES Signature redacted Department of Oectrical Engineering-4nd Computer Science May 18, 2015 Certified by................. ........... Signature redacted Pablo Parrilo ofessor of Electrical Engineering Thesis Supervisor Accepted by ........................... Signature redacted DeCKA. Kolodziejski Chair, Department Committee on graduate Studies A Numerical Study of Witsenhausen's Counterexample by Jordan Romvary Submitted to the Department of Electrical Engineering and Computer Science on May 15, 2015, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science Abstract In this thesis, we consider Witsenhausen's Counterexample, a two-stage control system in decentralized stochastic control. In particular, we investigate a specific homogenous integral equation that arises from the necessary first-order condition for optimality of the Stage I controller in the system. Using finite element (FE) analysis, we develop a numerical framework to study this integral equation and understand the structure of optimal controllers. We then solve the integral equation as a mathematical optimization and use our FE model to numerically compute a nonlinear controller that satisfies the necessary condition approximately at a set of quadrature points. Thesis Supervisor: Pablo Parrilo Title: Professor of Electrical Engineering 3 4 Acknowledgments I want to start off my thanking those who have helped me pursue this research as well as those who have helped me in my adjustment to life as an MIT graduate student. To my advisor Pablo Parrilo, I want to thank you for challenging me to think critically and for your patience through my countless research dead ends and MATLAB® code malfunctions. To Alan Oppenheim and Yury Polyanskiy, thank you for your support during my first year and for the countless conversations we had while I was searching for a research group to work in. To Terry Orlando, thank you for making time to meet with me, for listening to my concerns, and for helping me navigate my first two years at MIT. I would also like to thank those in my life who have contributed the most to my academic success as well as my development as a young man. To my parents, I want to thank you for sacrificing having fancy vacations and new cars so that my siblings and I could attend the best schools possible and obtain a thorough and diverse education. To my siblings, Christian, Jonathan, and Victoria, I want to thank you for being my first and closest friends and for challenging me to reach the highest levels as well as for supporting me when I struggled to do so. In addition, I want to thank my girlfriend Aarthy for being there for me the past two years and for giving me the strength to persevere when I questioned my place in graduate school. And to those other countless individuals in the Ashdown Community and elsewhere who have contributed to my success and helped me reach this level at MIT, I thank you wholeheartedly. Finally, I want to acknowledge my Jesuit education at Saint Joseph's Preparatory School for driving me to always question and investigate anything and everything. In particular, I want to acknowledge the teachers and administrators who tought me to be a man for and with others and to always keep in mind that all I do should be for the betterment of society and the well being of others. Ad maiorem Dei gloriam. 5 6 Contents Introduction 1.1.1 Formal Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.1.2 Static and Dynamic LQG Teams . . . . . . . . . . . . . . . . . . . . . . . . 17 . . 15 Previous W ork . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . 21 1.3 Summary of Contributions.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.4 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 . . . 1.2 Background 25 Mathematical Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.2 Calculus of Variations: Functional Derivatives . . . . . . . . . . . . . . . . . . . . . 26 2.3 Numerical Integration: Quadrature Rules and Gauss-Hermite Quadrature . . . . . . 28 2.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.4 Optimal Transport Theory: The Monge-Kantorovich Problem . . . . . . . . . . . . 32 2.5 Properties of Special Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 . . . . . . 34 . . 2.1 2.5.2 MMSE Functional . W2 Functional . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 . . . Gauss-Hermite Quadrature Witsenhausen's Counterexample 35 3.1 Classical Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.2 Transport-Theoretic Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 The "Counterexample" Aspect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.3.1 Optimal Linear Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.3.2 1-Bit Quantization Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.3 . 3 . . . . . . Main Theorems from TTF . 2 Team Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 15 . 1 7 3.3.3 . 52 Proof of Lemma 3.3.1 . . . . . . . . . . . . . . . . . . 52 3.4.2 Proof of Lemma 3.3.3 . . . . . . . . . . . . . . . . . . 54 Finite Element Model for Witsenhausen's Counterexample 57 4.1.1 D erivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.1.2 D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 . . . . . . . . . . . . . 62 . . . . . . . . . . . . . . 62 . . 57 . . . . . . . . . . . . . 67 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Finite Element Model 4.2.1 Model Parameters 4.2.2 Formulas for Computing WC Formulas & Equations 4.2.3 Justification of Rational Basis Functions . . . . . . . . . . . . . . . . . . . . 68 4.2.4 Discussion of "Edge Effects" in MMSE Calculation . . . . . . . . . . . . . 69 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.1 Accuracy of the FE Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.2 Optimizational Framework using FE Model . . . . . . . . . . . . . . . . . . 74 4.3.3 Analysis of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 . . . . . . . . . . . . . 82 . Numerical Experiments using Finite Element Model..... . 4.3 Necessary Condition for WC . . . . . . . . . . . . . . . . . . . 4.2 . 3.4.1 4.1 4.4 Additional Plots for Chapter 4 4.5 Derivation of FE Model Approximations for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 99 5.1 C ontributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.2 Future W ork . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Proofs for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . 4 49 . 3.4 Refuting the Conjecture . . . . . . . . . . . . . . . . . A MATLAB@ Code 100 101 A.1 Function Argum ents A.2 Function Specifications ....... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .................................. 8 101 103 List of Figures 3-1 Classical Formulation of Witsenhausen's Counterexample. An initial random variable Xo is passed through the Stage I controller C1 to get U 1 . The sum Xo + U, is then passed, with additive uncertainty given by the random variable V, to the Stage II controller C2 to get U2 . We then take the difference between U2 and the output of Stage I to get X2 = (Xo + U1 ) - U2 . The objective of the control system is to minimize the quadratic cost k 2 U2 + X2 for some scalar k. 3-2 . . . . . . . . . . . . . . . I) of First-Order Condition for the Linear Case. Absolute Magnitude (IG[A] We set k = 0.2 and o,= 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 36 Solutions to (3.21) for k 2U 2 - 45 1. Each colored value represents a distinct solution to (3.21) for any designated (k, u) pair, where we iterate through all such pairs by changing k on the abscissa axis. 3-4 Absolute Magnitude of (3.28) for the 1-Bit Quantization Case. k 4-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.2 and a = 5 and use Q = 46 We set 50 quadrature weights. . . . . . . . . . . . . . . . . . 50 Third-Order Sub-Basis Rational Basis Functions with A -= . These two sub-basis functions in turn make up the rational basis functions, which we use to computationally approximate the Stage I controller. Both sub-basis functions are zero outside the interval [0, 1) and, within this interval, are either strictly decreasing or increasing. ......... 4-2 ........................................ Example of the jth Third-Order Basis Function with A aj = 2, and aj+1 = 64 = -, aj_1 = 0, 1. We note that this jth third-order basis function is zero outside the interval [aj_, aj+l) and, within this interval, is increasing on [aj_1, aj] and decreasing on [aj, aj+ ) .. . . . . . . . ... .. . ................65 4-3 Relative Errors for Approximation of Linear Controller with A = 0.1. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 9 71 4-4 Relative Errors for Approximation of 1-Bit Controller with a = 1. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 4-5 72 Absolute Size (IIG[]({x}!_ 0 )jI 2 ) of First-Order Condition for the Linear Case using FE Model. We set k = 0.2 and o- 5 and use the values for our FE model from Table 4.2. We see that the overall behavior of IIG[I]({xi}= 0 ) 2 follows that of the theoretical IG[A] I from Figure 3-2, allowing us to verify the accuracy and suitability of the FE m odel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4-6 Example of 5-Level Sinusoidal Quantizer with 75 4-7 Optimal 5-Level Sinusoidal Quantizer. The form of this Stage I controller was = [--10, -4,0, 3,15]. . . . . . determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . 4-8 78 Necessary Functional Equation (G[*]) for Optimal 5-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and GaussHermite quadrature pairs as depicted in Table 4.9. 4-9 . . . . . . . . . . . . . . . . . . . 79 Relative Errors for Approximation of Linear Controller with A = 0.5. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 83 4-10 Relative Errors for Approximation of Linear Controller with A = 1. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 84 4-11 Relative Errors for Approximation of Linear Controller with A = 5. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 85 4-12 Relative Errors for Approximation of 1-Bit Controller with a = 2.8. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 86 4-13 Relative Errors for Approximation of 1-Bit Controller with a = 0- = 5. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . . . 87 4-14 Relative Errors for Approximation of 1-Bit Controller with a = a 2 = 25. We denote functions approximated using our FE model with a hat, ^. . . . . . . . . . 88 4-15 Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . 10 89 4-16 Necessary Functional Equation (IIG[T3*]I) for Optimal 3-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . . . . . . . 90 4-17 Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . 91 4-18 Necessary Functional Equation (JIG[T*]H) for Optimal 4-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . . . . . . . 92 4-19 Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . 4-20 Necessary Functional Equation (JIG[Tr*] 93 I) for Optimal 6-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 11 . . . . . . . . . . . . . . . 94 12 List of Tables 4.1 FE Model Parameter Descriptions. These variables represent the principal components of our FE model, which we use to approximate the Stage I controller in the central problem of this thesis, Witsenhausen's Counterexample. . . . . . . . . . . . . 4.2 FE Model Parameter Values for Model Verification. 67 These values are used to evaluate the accuracy of our FE model approximation for the Stage I controller in the previously worked-out cases of linear controllers and 1-bit quantization controllers. 70 4.3 FE Model Parameter Values used to find Optimal 5-level Sinusoidal Quantizer. These parameters were used to fully define the optimization problem (4.17). 4.4 77 Optimal 5-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Tab le 4 .9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.5 Costs and Necessary Functional Equation Norms (JIG[-i*i]1I). These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3 for n = R = 3, n= R =4, n= R= 5, and n = R =6 and Gauss-Hermite quadrature pairs as depicted in Table 4.9. . . . . . . . . . . . . . 4.6 81 Optimal 3-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Tab le 4 .9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 89 Optimal 4-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Tab le 4 .9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 91 4.8 Optimal 6-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in T ab le 4.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 Gauss-Hermite Quadrature Nodes ({xj}R_) and Weights ({wj}R values were determined computationally for the case of (k = 0.2, a-= 5). 14 ). 93 These . . . . . . . 95 Chapter 1 Introduction We begin this thesis by introducing the central problem: Witsenhausen's Counterexample (WC). We shall build up to WC in the context of team decision theory, a variant of the more general decentralized stochastic control. We conclude with a discussion of previous works on WC as well as our contributions through this thesis. Then, we end with a detailing of the organization of the rest of this thesis. 1.1 Team Decision Theory Team decision theory refers to the stochastic control situation in which each decision maker (DM) has access to different sets of information and acts once. Following closely the approach and illustrative examples of [1], we shall make clear some of the basic concepts regarding the interplay between information structures and notions such as signaling, estimation, and error reduction. We shall detail basic results from the theory, including those dealing with the static linear quadratic Gaussian (LQG) case. 1 We then discuss WC using the terminology and methods introduced in this section and mention how it shows that dynamic LQG teams are fundamentally different from their static counterparts. 1.1.1 Formal Definition To begin, we define the team problem to consist of five main facets: 'By static we mean that the underlying information structure does not imply a necessary ordering of the control actions. 15 a) a q-length random vector = [6, ... , (] we let the probability distribution of [ui, ... b) a control action vector u = C B representing all uncertainties in the system, where be denoted as P( ); , UK] E U, where ui represents the action of decision maker i (DMa), for i = 1,..., K; c) an observation vector z = [z,.. . , ZK] E Z, where zi = 77i[uc() (1.1) represents the observation available to DMi, for i = 1,..., K. We note that {ili = 1,... , K} is also known as the information structure of the team problem. We further note that the observation of DMi may depend on the actions of the other agents; 2 d) a set of control laws { yi :Z UIi = 1, ... -÷ ,K}, where Zi and Uj refer to the sets of admissible observations and admissible actions, respectively, for DMi and where each control action of a particular DMi is selected according to ui = y I[zi]. We let F denote the set of acceptable control laws -yj for DMi. We also let -y = [71, ... F = F 1 x ... x 1 K refer to the strategy of the system of all DMs, and denote the set of all admissible strategies; e) a cost function L : U x L(u, , YK] ) = E -÷ R, where L(ui = Y1[T1(U-1,&)],.. .,UK = 'YK[?K(-K, ) i,, --- , q)- 3 (1.2) We note that the expectation of L w.r.t. P( ) is well-defined as we have fully specified the control laws. Next, we formally define the team decision problem in its most exact, so-called strategicform. If we introduce the functional J(y) = E [L(u = '-y(q(u, i)), )] for some control law vector, or strategy, F, then the strategic form of the decision problem is y min J() yEr min E [L(u = -y(rq(u, )), )]. yEr 2 (1.3) We refer to the situations in which (1.1) does not depend on the actions of the other agents, or in which the dependence of the observation on the actions of the other agents is known, as static team decision problems. On the other hand, the situations in which (1.1) does depend on the actions of the other agents are referred to as dynamic team decision problems, and such dynamic teams necessarily require some type of partial ordering of the actions of the agents on which the observation of agent i depends (so-called causality constraints). 3The notation u-j refers to a vector consisting of all the entries of u with the exception of the entry at index j. 16 We note that this optimization, as opposed to being a parameter optimization, is a functional optimization. Such optimizations are inherently more difficult, so attempting to solve for an optimal strategy via this formulation could be very computationally intensive. Instead, consider the decision maker's point of view. Specifically, for DMj, let yj represent the control laws of all the other DMs except for i. We will treat these control laws as fixed. Then, from DMi's point of view, the decision problem becomes min J(-yi, yi) = min E [L({ui = yi(qi(u, )),i}, We observe that, once we define our information structure ni as an appropriate Borel-measurable function, our measurement zi becomes a well-defined random variable. about the conditional distribution of (1.4) )]. Therefore, we can talk given zi and, in particular, the conditional expectation EJ[] = Ez [Eji[-]]. Thus, we can further reduce (1.4) to its extensive form: min J(-yi, >y) = min Ez, [Eg z, [L({ui = yj(nj(u, )y-i}, = Ez, [min Egjz [L({ui, y-J}, )] (1.5) . Indeed, we can reduce it to the following person-by-person, or semi-strategic, form: (1.6) min Ji(ui, zi; yi), Vi, min Egjz, [L({ui, yi},)] E UuiUui whose solution can be obtained iteratively by "guessing" the right strategy -y and then checking whether the chosen strategy is fixed under the optimization specified in (1.6) for all DMs. We note, however, that (1.3) -> (1.6), but not necessarily vice versa. 1.1.2 Static and Dynamic LQG Teams Next, we shall discuss some conceptual differences between what is possible in static and dynamic teams. For the sake of brevity and ease of exposition, we will assume that all the uncertainties in the problem are Gaussian random variables of arbitrary correlation. We shall further assume that the cost functions are quadratic and that all observation functions (i.e., q) are linear. These stipulations define the linear quadratic Gaussian (LQG) team problem. 17 - .A(O, a2 ) First, let us consider the very simple static LQG team as specified below, with X being the Gaussian initial state and V1, V2 ~ N(O, 1) being the noise measurements: 4 L =(X+U 1 -+U2 )2 +U+IU Z = X+ V Z2 = (1.7) X+V2 An interpretation of this problem is as follows: Both DM1 and DM 2 observe the initial state with some additive white Gaussian noise. They then act to bring the state, which was originally X, to some other state X + Ui + U 2 . Their goal is to have their actions cancel out X, while also minimizing the energy of their individual control actions. The order in which they act does not matter, and we can, without loss of generality, assume a lexicographical ordering of their actions, i.e., DM1 acts before DM 2 . In this problem, there is no possibility for cooperation between the two DMs beyond some initial coordination and planning before they receive their respective measurements. It turns out that the optimal solution to this problem is actually given by (1.6), as a result of the following theorem: Theorem 1.1.1 (Proposition 1, [1) Let _UTQU + uTS and z = Q, S, and H be matrices such that Q > 0. If L = H , then the unique optimal control laws are linear and can be solved from (1.6). At this point, the problem of finding an optimal set of control laws is no more than a simple optimization problem, and the action ordering does not matter. What happens if we consider a situation in which the information available to DM 1 and DM 2 is a little uneven, for example, if we allow DM 2 to know what DM, knows and also have a separate measurement on the state of the system after DM 2 acts? Would not such a problem necessitate DM1 acting before DM 2 ? To explore the answers to these questions, consider the LQG team as specified below: + U1 + U2)2 + IU2 + IU2 =(X ~= Y2 2 Z1~ + L Z,~ Z2 .,X+V (1.8) =, Y2 X + U1 + V2 4 . We note here that, because we assumed that both -y and 72 are Borel-measurable mappings, their outputs, U 1 and U2 , respectively, are themselves random variables. This is important in the case of dynamic teams because, if 772 depends on the control action U1 , then U2 is not well-defined until U1 is actualized. Hence, there is a necessary partial ordering of their actions, i.e., DM 1 must act before DM 2 18 At first glance, this problem appears to be fundamentally different from that of (1.7). It seems to necessitate DM 1 acting first, advancing the state of the system to X 1 = X + U 1 , and then DM 2 acting upon both DMj's initial measurement Z1 and a new noisy measurement of the system, Y2 . However, if DM 2 knows the control law 71 of DM, then we can define an equivalent set of measurements for DM 2 as 1 Z2=XVl -22 F2 Y71(1.9) Z1 = X+ V Y2 - Y(Yi) = X 19 + V2 which has the same form as the problem specification in (1.7). In fact, we can also get Z2 from Z2 with knowledge of -Yi. Hence, from Theorem 1.1.1, we know that the solution of the optimal control laws to (1.8) is linear. We refer to the information structure displayed in (1.8) as that of perfect recall, that is, an information structure in which all agents who act after other agents have access to all the information those previous agents had when they made their decisions. Indeed, if we further treat each successive agent's decision as coming from the same agent (i.e., a one-person team), then we get the following theorem: Theorem 1.1.2 (Proposition 2, [11) In one-person LQG teams with perfect recall, the optimal control laws are linear. Indeed, the information structure of perfect recall for one-person LQG games (which we can interpret our last example to be) is very similar to a more general information structure known as partially nested. Such an information structure can be applied to situations in which there is a time structure involved and a partial ordering of the actions of the agents, i.e., there is some element of sequential control involved. Indeed, we can represent the information available to DMi when it is its turn to act as Zz = Hj + DjU, (1.10) where Hi and Di are linear operators (i.e., matrices) that satisfy causality constraints, so that the action of any agent that acts after DMi is not factored into the observation information Zi. Such an information structure necessitates a partial ordering of the agent actions because, if qj depends on Ui, then Uj is not a well-defined random variable until Uj is actualized, meaning that DMi must act before DMj (unless DMj has some side information that can remove the dependence of rmj on the action Ui, akin to (1.9)). 19 As such, we have the following theorem: Theorem 1.1.3 (Proposition 3, [1) In an LQG team with a partially nested information structure, the optimal control laws are linear. Indeed, due to the assumed superiority of linear control laws, a natural conjecture would be the following: Conjecture 1.1.4 In any LQG team, the optimal control laws are linear. Conjecture 1.1.4 is reasonable given Theorems 1.1.2 and 1.1.3, but consider the following system for some k E R+ L =(X + U + U2)2+ k2U2 Y AX (1.11) Z = Z2 =Y 2 AX+U+V 2 At first glance, this problem stipulation looks very similar to that of (1.8). However, the underlying information structure is NOT the partially nested information structure that was present in that problem. Indeed, while DM 1 has perfect measurement of the state X, the only information that . DM 2 has about the underlying state X is wholly affected by the choice of control action of DM1 Therefore there is no equivalence to (1.7) as there was in the case of (1.8) because knowledge of the control law -y1 does not help in the same way. We can interpret this problem as follows: DM 1 observes the state of the system X and then performs the action U1, advancing the state of the system to Xi = X + U1. DM 2 then receives a noisy measurement of this state, Z2 = X + U1 + V2, and chooses a control action accordingly. The , goal for DM 1 is to try and cancel out X using as little energy as possible in its control action U 1 whereas DM 2 desires to cancel out X+U 1 . Now, if DM 1 cancels out X completely (i.e., X+U1 and DM 2 accordingly follows U 2 = 2 = 0), 2 0, then L = k X , whose expectation w.r.t X remains high. If, on the other hand, DM 1 uses no energy (i.e., U1 = 0), then DM 2 must choose a U2 to negate X based on an observation with additive white gaussian noise (AWGN), which is known as the minimum mean squared error (MMSE) problem in statistic inference and has a relatively high expectation , cost of L w.r.t. X and V2. As such, there is an inherent trade-off between the two purposes of DM1 that of reducing the error directly through its efforts and that of signaling DM 2 by reducing the 20 uncertainty in the information received by it. 5 It turns out that the control laws to the above problem were shown by Witsenhausen in [3] to be nonlinear, refuting Conjecture 1.1.4! This system, which we expand more upon in Chapter 3 and which serves as the main focus of this thesis, is known as Witsenhausen's Counterexample (WC). Indeed, one can see the advantage of nonlinearity by considering the control laws specified by = o-sgn(X) - X U2 = o- tanh(uZ 2 where U2 is the MMSE estimator for X (1.13) ) Ui + U under AWGN. In fact, these control laws, on average, outperform any linear control law in a certain regime of k and -! This is because these control laws, as opposed to linear control laws, attempt to balance the error reduction and signaling aspects of decentralized control, which, along with estimation, are the three pillars of the "tridimensional nature" of decentralized stochastic control. Further information concerning team decision theory and its widespread applications in information theory, economics, and game theory can be found in [1]. The rest of this thesis will focus on WC. 1.2 Previous Work Since its initial publication in 1968, the simple two-stage control system in (1.11) that Witsenhausen analyzed to disprove Conjecture 1.1.4 has attracted much interest in the control theory, information theory, and computer science research communities. Most of the research on WC has focused on two main thrusts. The first involves finding optimal controllers through the use of step functions (which we refer to as quantization schemes) for the canonical case of ir(k2 , .2 ) k=.2,-=5 (see Chapter 3). Beginning with Mitter and Sahai's demonstration that quantization schemes will achieve arbitrarily low costs in the regime of very small k (with o.2 k 2 = 5 1) [4], a lot of work has been conducted on A problem formulation from [2] that makes this concept of signaling more apparent is as follows: SL Y1 Z{ Z2 + U1+U2)2+ _IU12 =(X =Y2 X (1.12) =U1 If DM 1 and DM 2 pursue the control laws U1 = -yi(Yi) = eX and U2 = 7 2 (Y2) = E - 1 U1 , respectively, then they can, on average, almost cancel out the expected cost J(-y, ), if E -* 0. However, if DM 1 wished to communicate some information to DM 2 , then it could do so through a little additional cost. That is, instead of letting E -4 0, DM 1 could choose some Eo > 0 and transmit some information to DM 2 about the size of X via its control law yi. This is known as the "transparency of information," which is further explained in [2]. 21 efficiently searching the feasible set of (2n + 1)-bit quantization schemes, most successfully through hierarchical search methods [5] and potential games [6]. However, basic computational difficulties of such schemes have been discussed in [7] and [8], the latter of which showed that a discrete version of the WC was NP complete. The second thrust uses information theory to develop upper and lower bounds on the cost for optimal controllers. For example, considering a finite-dimensional analog to WC, Grover and Sahai [9, 10] were able to develop control strategies that approximate optimal controllers within a bounded interval. Moreover, Choudhuri and Mitra considered implicit discrete memoryless channels in the case of an asymptotic version of WC in [11] to great effect. In addition, Wu and Verdni recently used optimal transport theory to reformulate WC from a functional optimization to a probability measure optimization [12]. Using this formulation, Wu and Verdfi were able to discern more analytical properties of optimal controllers and show that the necessary condition first introduced by Witsenhausen in his original paper had to hold everywhere. We touch more on this in Chapter 3. 1.3 Summary of Contributions Our main contribution in this thesis is the design and implementation of a finite element model to be used in the study of the necessary condition of WC system. As we shall discuss later, the necessary condition presents as a homogenous integral equation of the Stage I controller, f. In particular, we investigate the case in which all the native random variables are Gaussian in nature and the system specifications, k and o-, satisfy k 2 U2 = 1. We verify the accuracy of our computational model using analytically derived formulas for the simple linear and 1-bit quantization controllers. We then develop a mathematical optimization framework to solve the necessary condition approximately at certain points of interest. These points of interest turn out be Gauss-Hermite quadrature abscissas. Using a simple family of five-parameter controllers, we then demonstrate successfully that we can find controllers within this family that approximately satisfy the necessary condition. 1.4 Organization of the Thesis The thesis will proceed as follows: Chapter 2 details some of the general mathematical necessaries and background that will be required for sufficient understanding of the content of this thesis. 22 Chapter 3 introduces WC and discusses the existence of optimal solutions and the non-optimality of linear controllers. Chapter 4 includes a full derivation of the first-order condition for optimality and also introduces our finite element model and the results of numerical experiments obtained using the model. Ultimately, Chapter 5 summarizes our main contributions and suggests future avenues for research that build upon the results of this thesis. 23 24 Chapter 2 Background In this chapter, we will detail and define some of the mathematical terminology and notation we will be using. We will also discuss some of the underlying concepts that serve as the basis for our finite element model and the derivation of the necessary variational equation for WC (Chapter 4). 2.1 Mathematical Notation To begin, let us discuss the notation and mathematical concepts we will be using in this thesis. We denote the set of real numbers by R, and use Rd to refer to the d-dimensional vector space defined in the usual way with R representing the underlying scalar field. Z is the set of integers, and N is the set of non-negative integers (including 0). We denote vectors, those belonging to Rd for some nonzero d E N as well as those belonging to the infinitely long extension of Rd, by non-italicized lower-case letters with "hats" (e.g., A) or by an explicitly defined Greek letter with a "hat" (e.g., C.). Constants are represented by italicized letters and are always assumed to be members of the real field, R, unless otherwise stated. In addition, limits are assumed to be defined as in the usual sense (so-called strong limits), and we denote elements of a set or a vector by italicized lower-case letters with subscripts (e.g., wi or aj). We denote sets themselves by an explicit representation like {ai}f 0 , or simply as {ai} when the bounds of the set are understood from the context. Underlying functions are assumed to be functions of R into R and are always denoted by lower-case letters, unless otherwise stated. Sets of functions or vectors are denoted by non-italicized capital letters and will be defined as they are introduced. Also, derivatives of functions are defined with ax -2- representing a partial derivative and d3a 44 regular" derivative (in the sense of single-variable calculus). We denote derivatives of functions 25 explicitly using these aforementioned representations, though we at times use upper ticks (like ') when the underlying variable we are differentiating w.r.t. is understood. Furthermore, we represent functionals between function spaces using the regular functional analysis notation. For example, if G : A -- B is a mapping between function spaces A and G B, and if fi-g, then we write g = G[f]. Also, inner products are defined similarly for function spaces as well as vector spaces. For example, the inner product between two vectors, a and b, is defined as (h, b) = E aibi, and the inner product between two functions, f and g, is defined as (f, g) = f f(x)g(x)dx, unless otherwise indicated. 1 Also, we are assuming an underlying measurable space of (R, B(R)), where R is our observation set, and B(R) is the Borel algebra comprising all real Borel sets. From this measurable space, we define our probability laws for all of our independent random variables in our control system. Random variables themselves are denoted as italicized capital letters (e.g., X). In addition, we denote by P(B(R)) the set of all Borel-measurablefunctions on this space, i.e., F(B(R)) is the set of all real-valued real functions f : R -÷ R such that, if E E B(R), then f- 1 (E) E B(R). Finally, for arguments that utilize the term weak, we mean this in the functional analysis sense as follows: Consider the underlying metric space (R, I-) and the corresponding topology that is induced by it, (R, r1 . 1). We note that, because this metric space is completely separable (i.e., contains a dense countable subset), we have that the Levy-Prokhorov metric metrizes the notion of weak convergence in measures (or, more simply in the case of probability measures, convergence in distribution). Hence, we say that a sequence of probability measures {P,} converges weakly to a probability measure P (written P, !4 P) iff P, d+ P. Thus, for example, when we say that a particular function # : P (B (R)) -+ R is lower semi-continuous, we mean that #(P) <; lim infu, 0, O(Pa) for any P 14 P. One final note is that we do utilize some standard shorthand throughout our discussions and proofs. For example, "RHS/LHS" means "right-hand side/left-hand side". 2.2 Calculus of Variations: Functional Derivatives The mathematical field known as calculus of variations is concerned with maximizing and/or minimizing functionals, or functions (of functions) that map functions in some function space to the 'Because we are assuming only real scalars, there is no need for complex conjugation in the second multiplicand as one usually encounters. Though, for the sake of completeness, we implicitly define the inner product as (f, g) f f(x)g(x)dx. 26 A.A.4k- - - lk4k- , - ' underlying scalar field (usually R). A central result of calculus of variations is that, if G is a functional, then any function f that minimizes G must necessarily have the first variationof G, 6GIf, be zero. The treatment in this section is based on [13], and we refer the reader to that resource for a more thorough summary of this material. Before defining what is meant by first variation, recall the concept of a derivative of a funcWanting the derivative of a function to capture the tion, If(x), from single-variable calculus. instantaneous change (or speed in some contexts) of the given function, we define it as follows: df (x) = lim f(X+A)-f W. A &-+-o dxk (2.1) We can interpret this as providing us local information about how our function behaves under incremental changes to a value in its domain. Indeed, the derivatives and their higher-order equivalents tell us so much information about the structure of the function that we can represent nearly every common function using a Taylor Series of the form f (x) = f (a) + f'(a)(x - a) + 2! (x -a) 2 + 3! (x - a)3 + o(jx - a1), (2.2) where a is the value in the domain about which we are defining our series. 2 This is a very useful representation as it leads to a necessary condition f'(x) = 0 for any value of x that minimizes/maximizes f, a condition that is commonly known as the first-order necessary condition for optimality.3 We can apply the same rationale when defining derivatives of functionals, also known as functional derivatives. With a little more mathematical finery and careful attention, we can similarly expand functions in terms of functional derivatives. In particular, if G is our underlying functional, and f is a specified function in the domain of G, then we can define the Giteaux derivative (also known as the first variation) of G at f as JG|f (0) = OG = de -G~f + EO] ,(2.3) dE 6=0 where 2 4' is a function with e a scalar and E<b the variation of f. We can interpret this as generalizing Here, we are using the little-o notation, which represents functions that go to zero faster than the specified function. That is, we say b is o(c) if lim,,,o b(x)= 0. 3 We are assuming underlying domains like R, which are not inherently bounded. Otherwise, we could have the maximum of a function on a bounded interval whose derivative is not zero. For example, consider f(x) = x on [-1, 1]. 27 , - . .4." the notion of a regular derivative in the sense that we look at how the functional responds to "instantaneous" changes in its underlying function, with this change represented by f + eip, where E is assumed to be very small. Indeed, as was stated earlier, we can expand G in terms of its first variation as G[f + e4] = G[f] + 6G|f (O)E + o(e). (2.4) This resembles the Taylor Series representation discussed above and also leads to a similar firstorder necessary condition for optimality for optimizing functionals: 6G~f (0) = 0. This condition becomes extremely useful in our analysis of optimal controllers in WC as we see in Chapter 4. 2.3 Numerical Integration: Quadrature Rules and Gauss-Hermite Quadrature Other concepts central to the work of this thesis are the ideas of numerical integration and quadrature rules. Our discussion here will proceed in a similar fashion to that of [14], and we refer the reader to that resource for a much more thorough introduction to this very important topic. To begin, the basic idea behind numerical integration is to approximate an integral of the form f f(x)w(x)dx by another integral f f(x)w(x)dx that can be evaluated more easily. 4 In a lot of cases, this latter integral involving f can be approximated as a summation utilizing a set of weights, {wi} 1 , and abscissa points, {xj} 1 , so that: f (x)w(x)dx (x)w(x)dx n = Zwif (xi). i=1 Hence, the key to quadrature rules is to choose the weights and abscissa points correctly. Now, assuming that the abscissa points are chosen ({xi} T), we can easily determine the weight function by considering the family of Lagrange polynomials on this set of abscissa points: A (X - X1)(X -X1) (xi - x1 )(Xi - X1) -- (X - Xi-1)(x ... (Xi - i_1)(xi -i1+) i+1) ...(X - Xn_ 1) (X - Xn) . .. (X - xn_1)(Xi - xn) (2.5) These polynomials act as interpolants in the open subsets between successive abscissa values (e.g., 4In this context, we refer to w(x) as the weighting function and f(x) as the integrand. To treat this more generally, we could define the integral as f fdp, where pt is a measure of the underlying measurable space. All of the results in this section could then be employed in a similar fashion to treat this case as discussed in [14]. 28 (Xi, Xi+1)) and as a standard Kronecker delta function at the abscissa points (i.e., at the boundaries of the aforementioned open subsets). 5 Indeed, if we consider the values of f at these abscissa points, we can define our approximation to f, f, explicitly as n f(x) an expression that satisfies f f (xi)Li(x), (2.6) exactly at the abscissa points. Using this approximation, we can then define our set of weights as (2.7) Li (x) w(x) dx, Wi since Jfw(x)dx J f(xj)Lj(x) w(x)dx Li(x)w(x)dx) f(Xi). Now that we know how to choose our weights given a set of abscissa (or nodal) points, we turn our attention to choosing the best set of points possible. In order to do so, we can employ the simple rationale that we would like to have some polynomial, P, of degree n with roots {xj 3 (v, P) = 6 f v(x)i(x)w(x)dx = 0 for any polynomial v of degree less than n. If we have such a P, then for any polynomial f of degree less than 2n, we have f(x)w(x)dx (t(x)P(x) = t(x)p(x)w(x)dx + = (tP) + J =0+ =( r(x)w(x)dx wjr(xj) j=1 5 Li(x, ) =i { J r(x)w(x)dx n 6We write P(x) + r(x))w(x)dx = ~0 otherwise (x - xo) ... (x - x,). 29 r(x)w(x)dx _1, such that n j=1 where we utilized Euler's division rule to determine another remainder polynomial, r, and polynomial multiplier, t, both with degrees less than n. We also utilized that xj's are roots of that f(xj) = P so t(xj)p(xj) + r(xj) = r(xj). Indeed, this means that we are able to fully represent a continuous integral involving In addition, even if f f and w by a finite summation involving only n points and n weights! is not a polynomial, we can think of the n-point quadrature rule as the best approximation of the integral of f, if we assumed f was represented using a Taylor Series of degree less than 2n. Of course, the goal is finding this special polynomial P and determining its roots. A straightforward way to do so is to employ the standard Gram-Schmidt orthogonalization procedure to determine an orthonormal basis for the span {1, X,X 2 ,... functions, {pj}> O, for x" pj(x), - j = 1, ... (2.8) f" and get a series of orthonormal n,, as follows: Pn(X) with po(x) = 1 when (a, b) , x} n a(x)b(x)w(x)dx. We then have that pn acts as P, and the roots of pn end up being the set of abscissas we are looking for! Indeed, any polynomial pn determined by this method is guaranteed to have n distinct roots and satisfy (V,pn) = 0 [141. Thus, we can methodically determine our abscissa points and then, using these points, determine our weights. In practice, however, finding pn for a general weight function w(x) is difficult. Luckily, a wellknown result from numerical analysis says that the set of orthonormal basis polynomials formed in (2.8) have to satisfy the following recurrence relation: Theorem 2.3.1 (Three-Term Recurrence Relation) pn+l(x) = (x - an)pn(x) - bnpn_1(x), with an = (T-'-Pn) f(nPn) ( = (poPo) ard b =- () (Pn-1,Pn-i) (bo = (2.9) 0). Using this relation, we then have the following incredible result from Golub-Welsch [15 in the form as it was presented in [14]: 30 Theorem 2.3.2 If {aj}i_7 and {b}}-1 are the terms in a three-term recurrence relation, then the Jacobian matrix for this three-term recurrence relation is defined as ao bj Vbi ai \b . (2.10) .. / n- -. a_-1 an~i ba_ 1 And so, if Jn = VAVT is the eigenvalue decomposition, we have that the eigenvalues of A correspond directly to the abscissa points (xj = Aj), and the weights correspond to scaled versions of the first components of the eigenvectors (w= (f w(x)dx) v2 0 , where Vj is the jth column of V). Indeed, we note that Theorem 2.3.2 has transformed the problem of finding appropriate weights and abscissas to one of finding eigenvalues and eigenvectors. As methods for solving such problems with sparse matrices like J, have been highly optimized over recent years, we can efficiently and accurately determine Gaussian quadrature rules! 2.3.1 Gauss-Hermite Quadrature In the case of infinite integrals involving a Gaussian weight function, w(x) = ex 2 , we can use what is known as Gauss-Hermite quadrature. The three-term recurrence relation for the orthogonal polynomial basis (known as the Gauss-Hermite polynomial basis) can easily be determined to be [14] PGH,n+1(=) XPGH,n(X) - () PGH,n-1(), (2.11) meaning that our weights and abscissas for any n-point approximation can be found by investigating the eigenvalues and eigenvectors for a matrix of the form: 0 1 0 (2.12) 0 n-2 31 n-1 This matrix is sparse (in the sense of being mostly zero), so we can efficiently determine the eigenvalues and eigenvectors and, ipso facto, efficiently determine the abscissa values and weights. Indeed, our approximation of expectations involving Gaussian random variables will use Gauss- fi(A, oa2 ). Then, we can determine an n-point approximation of the expectation of f(X) w.r.t. X using Hermite quadrature in a critical way. In particular, suppose f is any function and X Gauss-Hermite quadrature as follows: E [f(X)] f(x) dx oo f(( n 2'2)U+)U) eu du f x p22)Xj+ ~nw j=1 \/ (V2,)xj + /_) .(2.13) f j=1 It should be emphasized that (2.13) forms the backbone of our FE model as we discuss in Chapter 4. We also note that, when we refer to an n-point Gauss-Hermite quadrature in the context of Gaussian probability weighting functions, we are referring to the set of weights set of abscissas { (V/2a) xj + p}, where {wj}>_ 1 { ,} and the and {xj}>_ 1 are the weights and abscissas, respectively, for the standard n-point Gauss-Hermite quadrature rule. 2.4 Optimal Transport Theory: The Monge-Kantorovich Problem As we will be discussing an alternate formulation of WC that utilizes the theory of optimal transport, we will now very briefly introduce some of the related terminology and the basic transport problem. Generally speaking, optimal transport theory is concerned with problems regarding supply and demand, specifically methods of efficiently "transporting" the supply to meet the demand. A given tranportplan is regarded as being "efficient" if it minimizes some cost criterion, typically one based, in some sense, on the common Euclidean metric. A classic example of optimal transport theory involves finding minimum transportation plans from "mines" to "factories" within the Cartesian plane, where the mines and factories are located at given coordinates and the cost function in question is the Euclidean metric. 32 More formally, we can consider the classic Monge-Kantorovich probabilistic formulation for a probability space (Q, B(Q)) and probability measures [ and v on Q with a cost function c: O x Q- [0, +oo]: min cd Y E 11(g, V) (2.14) where II(y, v) {E E P(Q x Q) : -y(X, Q) = p (X), -(Q, Y) = v(Y) for Borel X, Y} (2.15) is the set of transportplans between it and v [161. Since p 0 v E II(y, v) (i.e., the joint distribution of independent p and v), we know that there always exists at least one transport plan to consider. One can also prove that an optimal transport plan always exists using calculus of variations [16]. We will touch more on this in Chapter 3 when we discuss a formulation of WC in terms of transport plans. 2.5 Properties of Special Functionals In the next two sections, we will be stating (without proof) some of the necessary properties and associated lemmas for both the MMSE functional defined in (3.11) and the quadratic Wasserstein metric. These two functionals will be used during our discussion of the Wu and Verd6 formulation of WC in Chapter 3. 2.5.1 W 2 Functional First, we define the following space and metric (the results in this section borrow heavily from [17]): Definition The quadratic Wasserstein space on R is the collection of all Borel probability measures with finite second moments and is denoted by P2 (R) A { P E F (B (R)) I f x 2 dFp(x) < oo}, where F (B (R)) denotes the set of Borel-measurable functions on R. Definition The quadratic Wasserstein metric (or W2 ) is a metric on P2 (R) defined for any P, Q E P2 (R) as W 2 (P,Q) A inf {E[(X -Y) The following results are borrowed from [12]: 33 :Px=PPy=Q}. (2.16) Lemma 2.5.1 a) For any P, Q E P2 (R), we have W 2 (P, Q) F 1 (t) - F-1(t)2dt = (2.17) for cumulative distributionfunctions Fp and FQ. b) (P, Q) " W 2 (P, Q) is weakly lower semi-continuous. c) (P, Q) " W22(P, Q) is convex in Q. d) For any strictly increasing measurable function f : R -÷ R, W2 (Px, Pf x)) = VE [X - f(X)]. Remark In the context of the Monge-Kantorovich transportation problem where our cost is given by W2 , we note that the optimal coupling of random variables X and Y are X = FF1 (U) and Y F -1(U), where U ~ Unif(O, 1). That is to say, if P is atomless, then the optimal transportation = plan between P and Q is deterministic and is given by T = Q-- o P, where o is the composition operator. 2.5.2 MMSE Functional In this subsection, we shall review two properties of the MMSE functional that are used for the main results discussed in this thesis. The MMSE functional is defined as mmse (X, a.2) - min E [(X - g(oX + N))2] (2.18) = E [var (Xo-X + N)], where X0 = (2.19) -X with X distributed according to the probability measure P, and N is some additive random noise. Using this, we have the following result from [18, 19, 20], assuming that Q E F (B (R)) and o> 0: Lemma 2.5.2 a) Q H-+ mmse (Q, .2 ) is weakly continuous. b) Among all probability distributionswith variance .2 , Gaussian distributions maximize the MMSE functional. 34 Chapter 3 Witsenhausen's Counterexample In this chapter, we will formally present WC and analyze various properties and aspects of the twostage controller system. As we had stated previously, WC is a two-stage control system in which two agents (hereafter referred to as Controller 1 (C1) and Controller 2 (C2)) attempt to "match" the value of a random variable in two stages. Its difficulty arises from its non-classical information structure (i.e., C1 and C2 have access to different information and can only communicate implicitly). We shall proceed by overviewing the classical formulation (Witsenhausen's so-called "classical formulation") and then briefly summarize a more recent alternate, but equivalent, formulation of WC based on optimal transport theory as given by Wu and Verdi [121. We shall also detail the necessary equations for understanding WC in the context of the canonical controller examples, i.e., in the linear and n-bit quantization cases, and briefly go through how Witsenhausen mathematically refuted Conjecture 1.1.4, the "LQG conjecture" mentioned in Chapter 1. 3.1 Classical Formulation Consider the two-stage decentralized control system specified as follows and depicted in Figure 3-1. Let X0 and V be two independent nondefective random variables with finite second moments. Then, we specify the state equations of the system as " Xo = Xo " X 1 =Xo+U SX 2 = X- U2 and the observation equations of the system as 35 . Figure 3-1: Classical Formulation of Witsenhausen's Counterexample. An initial random variable X0 is passed through the Stage I controller C1 to get U 1 . The sum Xo + U1 is then passed, with additive uncertainty given by the random variable V, to the Stage II controller C2 to get U2 We then take the difference between U2 and the output of Stage I to get X2 = (Xo + U1 ) - U2 . The objective of the control system is to minimize the quadratic cost k 2 U + X2 for some scalar k. " Y1 = X0 Y X 1 + V, where U1 and U2 are the control variables given by the Borel-measurable functions (i.e., control laws) yI and Y2 as U1 = 7 1 (Y) and U 2 = 2 (Y2 ). The cost function of the system is given by Exov[k 2 U1 + X] and the optimal cost of the system is the minimum cost over all applicable Borel-measurable functions, E[k 2 U + X], min (3.1) 'i,y 2 EF(Z3) where F (B) is the set of Borel-measurable real-valued functions. As Witsenhausen showed, the above basic two-stage control problem is equivalent to another specification. Specifically, if we let g = Y2 and f be such that f(Xo) = Xo + -y 1(Y 1), then our cost objective as specified in (3.1) is equivalent to the following: min J(f, g) = E[k 2 (f(Xo) - Xo) 2 + (f(Xo) f,ger(i3) We assume, without loss of generality, that E[Xo] = E[V] did. 36 = - g(f(Xo) + V)) 2 ]. (3.2) 0 and E[V 2 ] = 1, just as Witsenhausen I II I I I - 1. -- - -14- - - " - M - .- . ..... .... 1, 4 1. 1 -. 1.1 , "I In fact, we can simplify the functional optimization in (3.2) by recognizing that, relative to any f, the controller g acts as the MMSE estimator for estimating f(Xo) using the measurement f(Xo) + V. We shall then denote this MMSE estimator using standard functional analysis notation as g[f]. Explicitly, we can write g[f] using standard notation as gff](y) A E [f(Xo)If(Xo) + V = y] = Nf (y), Nf~O(y)' (3.3) where NkI[f](y) A f(x)ko(y with No[f](y) representing the p.d.f. of Y[f] A - f(x))dF(x), (3.4) f(Xo) + V and #(u) A (27r)- the standard Gaussian probability weighting function. We can interpret Nk kth conditional moment of f(Xo) under Y[f] (i.e., Nk[f](y) - e-U representing [f] as representing the E [fk(Xo)Iy[f] = y]). Indeed, according to Lemma 3 of [3], we can further reformulate (3.1) using the Fisher information of the random variable Y[f] about f, [f], which we represent as If] A JNo[f](y)n[f](y)dy, (3.5) where q[f] is the score function defined as: q-f](y) Remark This r[f] log No[f](y) = Nof](y) is similar to the well-known score function in statistical inference. interpret y as parameterizing f(Xo) through Y[f]. (3.6) We can I[f], in this context, measures how much "information" the variable Y[f] contains about f(XO) when it takes on a certain value. The Fisher information then directly corresponds to the ability of C2 to effectively estimate f(Xo) through the relation E [(f (Xo) - g[f](y[f]))2 = 1 - T[f]. Using these new definitions, we can further reformulate (3.2) to be J*(k, o-) = min k 2E f T'(B) [(XO - f(Xo)) 2] + 1 - I[f], (3.7) which is the standard version of the classical formulation of WC. Remark We can now understand WC better in the context of (3.7). The overall cost of the system 37 is directly tied to C1 's ability to send enough "information" to C2 through f(X) + V (as measured by 1[f]) so as to make C2's job easier while keeping in mind that it is penalized for the amount of "information" it sends (as measured through the quadratic cost). Borrowing further notation from Witsenhausen, we let ir(k2 , F) denote the problem of finding (3.7), where k > 0, V - K(0, 1), and F is a valid cumulative distribution function with Xo - F. If X0 ~ .J(0, o.2), then we denote the problem of finding the solution to (3.7) by 7r(k 2 , a.2 ). Finally, Witsenhausen proved a number of desirable and important properties regarding optimal controllers, which we now summarize in Theorem 3.1.1 [3]. Denoting the optimal controller for -r(k 2 , F) as f*, we have: Theorem 3.1.1 a) No [f], Ni[ f ], and g [f ] are all analytic functions; b) No[ f] is the p.d.f. for Y [f ] with No[f] > 0; c) N[f](y) = yNk[f](y) + Nk+1 f](y); d) g'[f](y) = E [(f(Xo) - E[f(X)])2 Y[f] e) E [(f(XO) f) E [f*(Xo) ] - g[f](y))2] = 1 - = _[f ]; 0 and E [f*(X)] < 40r; g) f* is monotonically nondecreasing on a(F), the intersection of all convex sets of probability one under F; h) J[f] - k2IE [(Xo - f(Xo)) 2 ] + E [f 2 (X0 )] - E (g2 i) the first variation of J[f] (in the sense of Ghteaux) is given by 6 Jil = G[f](x)6f(x)dF(x), (3.8) where G[f](x) = 2k 2 (f(x) - x) + j0 (y - f(x)) r f](y) [2 (f(x) - y) 2 + i[f] (f (x) - y) - 21; (3.9) furthermore, we necessarily have that 6 J.f = 0. The proofs of the above statements can found in [3]. 38 11 .......... -ali4- - -A&Lr&#,A --- . 4- --- _ "', Remark The necessary variational equation G[f] serves as the main focus of this thesis and will be explored in depth in Chapter 4, where we will derive it formally and investigate it numerically. In fact, using the above facts as well as some results concerning the boundedness of optimal controllers, Witsenhausen was able to prove the existence of optimal controllers. Utilizing the classical proof technique of assuming a minimizing sequence exists, Witsenhausen was able to show that the limit of the cost of a suitably chosen controller pair sequence (fs, g,) existed and was bounded from above and below by the optimal cost, J*. Thus, an optimal solution exists for WC, though Witsenhausen's proof of this fact does not offer many clues as to the structure of such solutions. Transport-Theoretic Formulation 3.2 Next, we will present an alternate formulation of the ir(k 2 ,, 2 ) problem as given by Wu and Verdni in [12], which we shall refer to throughout this thesis as the "TTF," or "Transport-Theoretic Formulation." To wit, Wu and Verdni decided to view this functional optimization problem of Witsenhausen's in a different light. Specifically, they view it as one in which the goal is to try and find a probability distribution, Q, that minimizes a weighted sum of the Wasserstein distance metric between Xo and the MMSE of Q Q and in the presence of additive white Gaussian noise, V. Following the reasoning in [12], we can recast (3.1) as J*(k 2 1, 2 , p) A inf fer(B) [k2,2E [(X - f(X))2] + a 2 mmse (f(X), E [(X a2) , (3.10) where mmse (X, a2 ) = - g(OX + V))2] (3.11) gEr(S)L E [var (XlaX + V)] (3.12) and X0 = oX, with X distributed according to the probability measure P ~ f(0, 1). If Xo ~ A(O, a 2 ), then X ~ j(0, 1), and we let (3.10) become simply J*(k 2 , a2 ) A J*(k 2 2 , g(0, 1)). (3.13) Noting that the optimal controller must be deterministic, Wu and Verdi found that it could be viewed in an optimal transport-theoretic sense. That is to say, they relaxed the Stage I control 39 f to be a random function Px x and then reframed the problem as follows: J*(k 2 u 2, p) = U2 inf PX 1 IXO o [k 2IE [(X 1 inf QCP2 (R) Xo) 2] + mmse - (X1, a2 )] [k 2 W (P, Q) + mmse (Q, 0 2 )]. (3.14) Thus, using the facts about optimal transport theory introduced in Chapter 2, we have that, for any particular Q from which X1 is distributed, the optimal control is given by f = F o Fp. Hence, the problem of finding an optimal control in the set of measurable functions on R in (3.1) is equivalent to the problem of finding the optimal output measure the metric space (R, I- I) In fact, by restricting Q in the space of probability measures on equipped with the underlying Borel --field. f to be of a certain type (e.g., affine), we are equivalently restricting our search to distributions Q of a certain type (e.g., Gaussian). Table 1 on p. 5735 in [12] has a listing of a few interesting restrictions and their consequences on the structure of Q, and we urge the reader to consult that table to gain a better understanding of this relationship. 3.2.1 Main Theorems from TTF The following results were put forth in [12] and are presented here mostly without proof, except a few that include broad proof sketches of the proofs as put forth by Wu and Verddi. Please consult the original paper for more-thorough proofs. Theorem 3.2.1 Let P E P2 (R) and or > 0. Then, there exists a R G P2(R) that achieves the minimum in (3.14). Proof Sketch We recall that Witsenhausen proved in [3] that any optimal Stage I control f must have E [f(Xo)] 4a2 . Hence, we can restrict our search for an optimal Q to the weakly compact subset {R E P2 (R) : E [R 2] < 40 2 }. We can then use the fact that W2 (P, Q) are lower semi-continuous w.r.t. Q to show that semi-continuous w.r.t. Q. Hence, there exists at least one Q - Q '-4 mmse k 2 Wy(P, Q that (Q, a2 ) and (P, Q) a Q)+mmse (Q, a) is lower achieves the infimum since weakly lower semi-continuous functions attain infimum on weakly compact sets.1 l Remark This proof of existence is more straightforward and simpler than Witsenhausen's original. This illustrates the depth of understanding that viewing WC as a problem of finding the right 'One can see this by recalling the definition of compactness in terms of open covers and then using a representative finite subcover together with the properties of the infimum of a set. 40 distribution for X1 adds. Theorem 3.2.2 Suppose that P has a real analytic and strictly positive density. Then, we have the following: E [Q] = E [P] and var(Q) < var(P) + , a) Any optimal Q satisfying (3.14) also has a real analytic density and unbounded support, with b) Any optimal Stage I controllerf is a strictly increasing, unbounded, piecewise real analytic function with a real analytic left inverse. Proof Sketch The key to this proof is to peturb the optimal Stage I output random variable Y[f] ~ Q E P2 (R) in order to derive the following variational equation (which Witsenhausen first derived in [3]) that serves as a necessary condition for any optimal f (P-a.e.): 2k 2 (f - id) = (#'* (q [f] 2 + 277[f]')) o f, (3.15) where id is the identity function and * represents the convolution operator. In addition, we then note that there is a left inverse, h o f = id, where h is given by h = id - 2k 2 ((()) The existence of h means that Since q is analytic, #'* (q2 + f must necessarily * (q2 + 2n')) . (3.16) be injective into R and therefore strictly increasing. 2q') is also analytic, and hence h is analytic. This then implies that f is piecewise real analytic and, since h is continuous, f's range must be unbounded. The former conclusion of the first part of the theorem can be found by using facts about composition of analytic functions and the Cauchy-Schwartz inequality. D Remark We note that the analyticity of h as given in (3.16) does not imply that well (as Wu and Verdd' pointed out in Footnote 3 on p. 5735 of [12]). Proving require proving that Q f f is analytic as is analytic would is supported on the entirety of R. Also, as Wu and Verdd discuss, the (piecewise) analyticity of f suggests that finding series approximations to the solution of the variational equation (3.15) could be a valid approach to finding the optimal f. However, the authors do note that the only polynomial solution to (3.15) is the affine controller, i.e., Q being Gaussian. 41 Indeed, we should point out that the #' in the alternate version of the necessary condition refers to the function - (#(y)) and necessitates use of the chain rule. For example, if a is some real function, we have that (O(a(y)))' = a'(y)#'(a(y)). Finally, and perhaps most importantly, it should be noted that, since f as defined above (FQ a Fp) is right-continuous, the variational equation must be identically zero at all points. The rightcontinuity, in combination with the density on R of the set of all points at which the necessary condition resolves to zero, makes this evidently clear. Corollary 3.2.3 The optimal Stage I controller for (3.14) cannot be piecewise affine. Proof Sketch The key to the proof is the use of the identity theorem for holomorphic functions and its results when we consider h o also assuming, of course, that f f on a small-enough open interval of the real line. We are has at least one discontinuity (a so-called "jump") and/or at least one nonsmooth "bend." Otherwise, f is completely affine, and our proof arguments do not follow. However, as we shall see, the affine controller is not always optimal and, in fact, hardly ever is. Theorem 3.2.4 The map P -+ J*(k2 , 0 2 , P) is concave, weakly upper semi-continuous, and trans- lation invariant. In addition, the following inequality holds: 0 J*(k 2 2 , P) < min{k 2 a2 var(P), a2 mmse (P, U2 )} 1. (3.17) Corollary 3.2.5 Let P, Q E P2 (R) and o- > 0. Then: a) For any b Q, J*(k 2 U2, P* Q) > J*(k 2 ,, ) For Gaussian P, a2 Theorem 3.2.6 o 2 6J* 3o. k p). J*(k 2 , c 2 ) is increasing. J*(k 2 , a2 ) is increasing, subadditive, and Lipschitz continuous, with 0 < 2 -k+1 Lemma 3.2.7 2 ,p) which is achieved by the affine controller f(x) =a (2k 2 2 var(P), (3.18) + lim J*(k 2 02-+0 = k 2 var(P) Lemma 3.2.8 If P ~ .A(0, 1) and k < 0.564, we have that lim J*(k 2 7a2 ) < 1 = lim J*(k2 o 2 ). '-*+OO 0T-+00 42 (3.19) Remark The last few theorems are mainly analytical results and add to our understanding of the underlying geometry of the problem. From these and the facts specified in Chapter 2 about the MMSE functional and Wasserstein distance, we see that the underlying cost functional is a weighted sum of a convex functional (the Wasserstein distance) and a concave functional (the MMSE functional). General solutions for such problems, even in the finite-dimensional case, remain elusive, and the solution to these optimization problems are currently an open problem. As such, it is not surprising that WC has not been solved for any particular choice of 0- and k, let alone in full generality. 3.3 The "Counterexample" Aspect In this section, we will explore Witsenhausen's refutation of the LQG conjecture. We shall start by discussing the two canonical controllers and then show that, in certain regimes, a simple nonlinear controller outperforms the best linear controller by a large margin. 3.3.1 Optimal Linear Controls To begin, let us consider the most basic linear control law. We define the following: fx(X) AX, (3.20) where A E R is the linear coefficient. Lemma 3.3.1 summarizes the main formulas related to a linear controller. For ease of exposition, we have placed the derivations at the end of the chapter in Section 3.4. Lemma 3.3.1 For the case of linear controllers, we have the following: IL(1 a) No[f](y) =e 2 b) N1[f,\](y) = 112)yNf'.(y), y, c) 9[fx](y)= e) J[f,\] J[A] 2 + ' d) r/[fA](y) = 2 1+12,\2 k2( _ A)42 43 Now, in order to find the optimal linear coefficient, A*, that minimizes J[fx], we can proceed in two ways. We can directly differentiate the above algebraic expression for J[fx] w.r.t. A, set the resulting expression to zero, and then solve for A. We could also utilize the necessary condition being identically zero everywhere to derive the equation for optimal A*. That being said, we will use the former approach as it is more straightforward (though we would get the same expression using the necessary condition). Lemma 3.3.2 The optimal linear coefficient A* that minimizes J[fx] has to satisfy the following equation: k 2 (1 - A*) = . (1 + o.2(A*)2) (3.21) 2 Proof Taking the partial derivative of J[fA] w.r.t A, we have J[fx] = 6A JV1 - A -2k (i21 2 2 - (1 - A) 2o. 2 + 1 A 1 + o2A2 A) + (1 + .2A 2 ) (2a'A) - (.2 A 2 ) (20.2A) (1 + 0.2A2) = -2k 2. 2 (1 - A) + 2o 2 A (1 + 0.2A2) 2o. 2 (-k2(1 - A) + (1.+ 2 2 T2A2)2 Then, setting this expression to zero, multiplying out the constants, and rearranging the two expressions, we get the desired formula. LI Thus, the optimal A* is a function of k and a with a value satisfying (3.21). That is, if we let R3.21 denote the set of real roots of (3.21), then we define the optimal linear coefficient to be a function of k and o as A*(ko) A arg min J[fr]. TE7R3.21 (3.22) Now, as Witsenhausen pointed out, (3.21) can have at most three solutions, so the space of solutions we end up searching over to find the optimal A* turns out to be extremely small. Indeed, we can visualize this necessary condition through the zeros of the function, G[A], which we define as G[A] A k 2 (A - 1) + 44 A (1 + a.2A2)2 (3.23) 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 0 0.5 A 1 1.5 Figure 3-2: Absolute Magnitude (IG[H]I) of First-Order Condition for the Linear Case. We set k = 0.2 and o- = 5. 45 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2 0.4 0.6 1 k 0.8 1.2 1.4 1.6 1.8 2 Figure 3-3: Solutions to (3.21) for k 2 , 2 = 1. Each colored value represents a distinct solution to (3.21) for any designated (k, -) pair, where we iterate through all such pairs by changing k on the abscissa axis. 46 In Figure 3-2, we graph IG[A] for the canonical case of k = 0.2 and o- = 5. In this case, we can clearly see three distinct solutions at roughly 0.04, 0.3 and 0.95. In fact, if we were to graph all the solutions to (3.21) we would notice a very interesting bifurcation that emerges in the special case when k 2 , 2 = 1 and k < 1. We end up having two equal cost solutions driving upwards and downwards to one and zero, respectively. These two curves represent the optimal solutions with the third concave curve representing a suboptimal solution. In Chapter 4, we explore this behavior in more depth using our finite element model, but for now, we present this visually in Figure 3-3. Before moving on, we note that, for very large k, the optimal linear coefficient A* is found through a simple manipulation of the equality in (3.21) as follows: A k 2 (1-)\'=-A( 2 2 (1 + 0 A ) 2 k2 +(1+ 0 2A 2 ) 2 } which, as k -+ o, in turn yields A* 1. (3.24) Thus, we get a minimum possible cost, in the limit of large k, of 2 2 ' J[A*] 00 11 + U2- (3.25) which is the traditional expression for the conditional expectation of Gaussian random variables. This result agrees with our intuition, since, in the limit of large k, the Stage 1I cost is inconsequential in comparison. In this situation, the only quantity that is of interest to be minimized is the energy of the Stage I controller. This energy minimization is obtained by using a zero-energy policy of allowing the value of X 0 to "pass through" Stage I unchanged. Hence, our optimal linear coefficient is A* = 1. On the other extreme, if we consider the case when k = 0, then quite clearly the optimal linear coefficient is A* = 0, as the equality in (3.21) is satisfied by such an assignment. This is also confirmed via a simple graphical investigation. Thus, if we have absolutely no cost on our ability to control what the Stage II controller sees, then we can force f(Xo) = 0 and then have our MMSE estimator simply guess zero with fidelity one. The optimal cost in this case is quite clearly zero, and this strategy is known as a zero-forcing control policy. 47 3.3.2 1-Bit Quantization Controls When k is sufficiently small but nonzero, the optimal linear control function f.\ is not necessarily the best control law one can choose. Indeed, in some parameter specifications for k and o, simple nonlinear control laws perform better. The 1-bit quantization controls are a particular family of nonlinear control laws that can perform better and are the next controls we investigate. To begin, we define the following: a, a, faGV)A x<O x> 0 (3.26) , where a E R is the 1-bit quantization coefficient. We note that we may also represent (3.26) more compactly as asgn(x). fa(x) As in the linear case, we summarize the relevant formulas in Lemma 3.3.3 and delay the proofs until Section 3.4. Lemma 3.3.3 For the case of 1-bit quantization controllers, we have the following: a) No[fa](y) = (v/ 2) #(y)#(a) cosh(ay), b) Ni[fa](y) = a tanh(ay)No[fa](y), c) g[fcj(y) = a tanh(ay), d) r/[fc](y) = -y + a tanh(ay), e) J[fa] A J[a] A k 2 (a2 - 2 o-a + a2) + (2w/2) a2 #(a) f0 sech(ay)#(y)dy. Now, in order to find the optimal 1-bit quantization coefficient, a*, we can once again choose one of two options. The first option is to minimize J[fa] directly via a partial differentiation w.r.t a, and the second option is to use the necessary functional equation. As in the linear case, for ease of exposition, we shall pursue the former course of action. Lemma 3.3.4 The optimal 1-bit quantization coefficient a* that minimizes J[fc] has to satisfy the 48 following equation: 2k 2 a* o- - = (v'7)a*0(a*) j sech(a*y)#(y) [(a*) 2 + a* tanh(a*y) - 2] dy. (3.27) Proof We can proceed as we did in Lemma 3.3.2. Taking the partial derivative of J[f,] w.r.t a, we have a9 a [ a 2 2 [k 2 9ZJ[f ] = [2(2 2 -2 2 k2 2-2 2)\ o-a + a 2 2\1+ o-a + a f / 2 + (vf2w) a20(a) a [ + 2I Vr 1 sech(ay)#(y)dy.] a2#(a) 1 c)sech(ay)0(y)dy] OO Setting A J[f,,] 0 and moving the second summand on the RHS over, we get the desired equation. Remark Unlike in the linear case, we note that the necessary equation for optimal a is a transcendental equation (because of the presence of fft. sech(ay)#(y)dy), so direct analytic evaluation of it is difficult. Luckily, as we shall do quite frequently in Chapter 4, we can approximate this integral using quadrature rules (specifically the Gauss-Hermite quadrature rule). As such, we can approximately graph (3.27) in a similar manner to how we treated (3.21) and qualitatively discern properties of optimal a* (which we do in Figure 3-4). However, further discussion of this behavior is, as it was in the linear case, outside the scope of this thesis. 3.3.3 Refuting the Conjecture Using the above facts concerning linear controllers and 1-bit quantizers, Witsenhausen was able to prove that, in a particular regime of wr(k 2 , a2 ), a rather naive 1-bit quantizer can outperform by a fairly large margin the best linear controller (effectively refuting Conjecture 1.1.4). Our resulting proof mirrors Witsenhausen's, and we reproduce the mechanics of it here for the sake of completeness. Theorem 3.3.5 For 7r(k 2 ,0. 2 ), linear controllers are not optimal in the regime of small k with ko- = 1. 49 1 I I I I 1 1.5 2 2.5 a I I I 3.5 4 4.5 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.5 3 5 Figure 3-4: Absolute Magnitude of (3.28) for the 1-Bit Quantization Case. We set k = 0.2 and o- = 5 and use Q = 50 quadrature weights. 50 Proof To begin, recall our discussion of optimal linear controls in the regime of small k, in Section 3.3.1. In that narrative, we had pointed out that, as k becomes smaller, the optimal A approaches both zero and one (i.e., we have two optimal A's of equal cost that approach both zero and one). In this regime of small k, J* [fx] -+ 1. Keeping this mind, let us define the very simple nonlinear controller f,(x) -sgn(x). = Using our previously stated equation for J[f,] combined with ku = 1, we can expand the cost of this nonlinear controller to be J[fa] = 2 1- + ( v/) 020(o-) sech(uy)#(y)dy. (3.29) Now, since sech(x) < 1 for all x E R, we have from Lemma 16 in [3] that sech(x)#(x)dx < j q(x)dx = 1, and, using this, we can bound the RHS of (3.29) to get J[f,] 2(1 - .) + (Vs) .24(u) Thus, taking the limit as k -+ 0 coupled with the fact that othat J[fa] < 2 (1 - . = - and limkO s = 0, we have Since this upper bound is strictly less than the best possible performance for the linear controller, we have that linear controllers are not optimal controllers in all regimes of 7r(k 2 ,U 2 ). . 1 Remark As Witsenhausen dutifully points out, his token nonlinear controller is "far from optimal." Indeed, he notes that, by using (2n+ 1)-bit quantizers together with a gradient descent method, one may obtain much better controllers in the regime of very small k. This agrees with our intuition and countless papers, including the best known results to date, which we had discussed in Chapter 1. This follows since, with small penalty on its movement, C1 has greater freedom to communicate more explicitly through Y[f] by sending appropriately spaced messages (w.r.t. F), which are represented by the quantization levels in any (2n + 1)-bit quantization scheme. 51 A few decades after Witsenhausen's paper, Mitter and Sahai were able to construct families of n-bit quantization schemes that perform infinitely better than the optimal linear control in the regime of small k [4]. In particular, the cost of their n-bit quantizers approaches zero in the regime of very small k. This result, combined with Witsenhausen's original example, has directed most of the work on this problem to be that of efficiently searching the space of (2n + 1)-bit quantizers for the optimal quantizers (or, more realistically, best-yet performing as the space of (2n + 1)-bit quantizers is infinite). Using rather elegant and creative approaches, researchers have been able to improve the cost little by little over the years. The best known result that this author is aware of is the one that was referenced in Chapter 1, that of Li, Marden, and Shamma. For the canonical problem of 7r( (_)2, 52), they were able to design a 25-bit quantizer with cost J 3.4 Proofs for Chapter 3 3.4.1 Proof of Lemma 3.3.1 = 0.1670790 [6]. Proof a) No[fx](y) 0(y - fx(x))dF(x) = f - (y - f.(x)) _ ( ) dx e -2 dx / +.22)) 2A(1 (1 - 27(1 + ]2 1+2,\2 2 2A b) j 0' ( from which we gather that Y[f] ~ K(0, 1 + o2 A 2 ). N1[fA](y) = 2_ 22,\2) e 0.2 e ) 2 27o fx(x)#$(y - fA(x))dF(x) 52 Y29 -V+a,2 dx e 27r(1 + 0.2A2) 2 du) ~ 1 A~ f -ooxe2 (1+-2 1+ u 27(a) + 7ra + I ( 1 2A2 ) 2 (1o 2- f-00o) - y 2 A2 2du 2 ye 2 022 (I+ 2 2A +o f-oo'c') du y) 022 ) 2 ( i-Ny) dx + + - e 1 d 2 1 ( ye 12A2 2ir( 1 + u2 2)A c) g[f ](y) = Ni [fx] (y) No[f\] (y) (2 2 2 (1+ - .2 T) ye- 2 12)y A2 (I+ u22 )7 Y- d) -y + g[f,](y) n fx](y) (1 2 A2 =I + 1+2A2 e) J[fA] = k2 E Lf )x o ) X O)2 ] + - k 2 (A - 1) 2 + ) o g [fx(y] Y[fAfx2 ) 2 + E [f(Xo)] (A _ 1)2X2] - ( .( A2 2 2)2 53 )(I1+02A2) = k2 =k 3.4.2 2 (1 2 A20 2 (1 + C. 2 A ) - (02A2) 1 + a.2A2 .22+ 2 A) 2 - .2 + 2 A22 1 + q2 A 2 Proof of Lemma 3.3.3 Proof a) O(y = = - fa(x))dF(x) (y - fi(x))dF(x) = c(y + oe) dF( ) + #(y - a) o00 dF(x) (y - a) + = O(y + a) (2 - - fa(x))dF(x) + No[fQ](y) + 2 ( _a2) i f- 7r (y)#(a) (ea'Y + e- y) = ( 2)#(y)O(a)cosh(ay). =N-[f](y)=-yNk [f](Y) + Nk+ [f](y), which holds for any b) Using the relation No[fa](y) = ((vf-)#(y)#(a) cosh(ay) ) d = (V)#(ce) = (2;)#(c) = -yN[fa](y) (#(y) cosh(ay)) (-y#(y) cosh(ay) + #(y)ao sinh(ay)) + aNO[fa](y)nh(y) =+No[f](y)h(ay) =-YNo Vfl](Y) + oeNo [fe] (y) tanh(ay), we get N1[fa](y) = +No[fa](y) + yNo[fa](y) = (-yNo[fj](y) + aNo[fa](y) tanh(ay)) + yNo[fa](y) = a tanh(ay)No[fe](y). 54 f, and c) g[fa](y) = Ni[fa](y) No [fa] (y) a tanh(ay)No[f)](y) No[fV] (y) = a tanh(ay). d) r/[fa](y) = -y + g[fa](y) = -y +a tanh(ay). e) J[fQ] = k 2E [(XO - fa(Xo))2] + E [fa(Xo)] - E X2] - 2E [Xofa(Xo)] + E [f(X 0 )]) + E [fa(Xo)] - 2 (joxfa(x)dF(x)) = k2 2 = k2 2- (jf 0 tanh2(ay)N[fa](y)dy - 2 (. + xfa(x)dF(x) k 2 (2 = k 2 (02 -2 = k 2 (02 -2 = 2 k 2 (a2 -2 - -Oa 2 -a 2 -a a2 - a2 ( xfo(x)dF(x)) - E [a2 tanh2 (aY[fa])] tanh2(ay)No[fa](y)dy + a2) + a 2 + a 2) +a2 -- a2 (ftanh2(ay) No[fa](y)dy) + = + 2 1 2 a2 + a 2 - a2 + -a Sa a2 + a2 + k 2 (E [(gfa](yrfa]))21 a2) + (1 - sech 2 (ay)) No[f.](y)dy) sech2(ay)No[f.](y)dy) (V21r) a 2q(a) 55 j sech(ay)4(y)dy. 56 Chapter 4 Finite Element Model for Witsenhausen's Counterexa mple In this chapter, we detail the computational model that we use to investigate WC, particularly, its necessary functional equation that was first introduced in Chapter 3 in two forms (the classical and TTF forms). We proceed by first deriving, in full, the necessary functional equation for optimality in the classical format and show that this equation is equivalent to that of the TTF. We then specify our finite element model and comment on its suitability as a numerical tool for investigation of WC. Following this, we present numerical results through which we verify the accuracy of our model, and then we present the results of using our FE model to approximately solve the necessary condition at certain points of interest. In our numerical investigations, we will be considering the standard version of WC, 7r(k 2, 4.1 2 0. ), as we have throughout this thesis. Necessary Condition for WC As we stated earlier in Theorem 3.1.1, the first variation of the WC cost functional J[f] has a succinct and analytically derivable form. In this section, we shall formally derive this cost functional, show that it is equivalent to the necessary variational equation obtained by Wu and Verdi in [121, and then discuss various aspects of this equation, including its computational difficulties. 4.1.1 Derivation Consider Theorem 4.1.1, which incorporates Lemma 9 from [3] and Theorem 2 from [12]: 57 Theorem 4.1.1 In WC, if f* is an optimal controller, then it satisfies the following exactly for all x C R: - X) + G[f*](x) = 2k 2 (f*(x) ] # (y - f*(x)) r[f*](y) [2 (y - f*(x)) 2 + j[f*](y) (y - f*(x)) - 21 dy, (4.1) (f*(x) - X) + d(-) + = 2k2 (f*(x) - X) * (T 2 [f*] + 27'[f*]) o (4.2) , 2k 2 - 2 (0 * o [i7"[f*] + q[f*],'[f*]]) o f*(x) (4.3) -0. Proof To begin, recall the cost function form from Theorem 3.1.1: Jf] = E [(f(XO) - Xo)2] + 1 - '[f]. (4.4) Now, using the standard condition for first optimality from calculus of variations, we need to figure out what the first variation of J[f], 6JIf , is and set the result equal to zero. Using (2.3) from Chapter 2 with some arbitrary variation 6(t), we have &9E [((f (XO) + eO(t)) - Xo)2] = E=0 ((fX() + eO(x)) - X)2 dF(x) ((f (X) + e6(X)) - X)2 dF(x) 2 ((f (x) + eO(x)) - x) O(x)dF(x) = /=j2 dO (f (x) - x) O(x)dF(x) and -(1 9E - I[f + O]) 0 = 1[f + e6] 6=0 E=OE ( [f + eO](y)No[f 2 = - 2[f 6=0 + E6](Y) + e1](y)d y) No[f +eO](y)dy 'Sometimes we may use the notation 6J[f] for the first variation of J[f]. 58 2f + + -o 60](y) No[f + E6]()y) ( (+n f + [2,q f + cO] (y) + 7 2 [f + E](y) 00co 09,E I ,= + () No[f + cO](y) No[f + EO](y) (ANo f + EO](y)) N[f + No No[f + EO](y) I6](Y) No[f + EO](y) dy 2:[f - dy co](y)) No[f + ce6](y) No[f + cO](y) aNoIf + co](y) dy + 772[If + Cel](Y) a9, 6](y) E2T [f + : -0 a No[f + 60](y) No[f + 6] (y) - 27[f + 6O](y) (1 f + co](y) 2 No[f + 0] (y) [f + CO](y) j 2[](y) - - 2, 2 f(x)) 2 ] # (y - f(x))0(x)dF(x) (y - f (x))(y - f(x))0(x)dF(x) [f](y) + n2 [f](y) J-000 _ (y - dy (y - f(x))(y f(x))0(x)dF(x)] dy - + [f[(y) 2(y - f(X))2 + r[f](y)(y - f(x)) - 2] O(y - f(x))dy)0(x)dF(x), using that 09E a fjo (y - f(x) - EO(x))dF(x) j (y - f(x) - EO(x))dF(x) = = j(y - f(x) - Eo(x))#(y - a-No[f + EO](y) f(x) - cO(x))0(x)dF(x), with a No[f + Eo](y) = j(y - 59 - f(x))0(x)dF(x) and + Ec](y) o2No[f = No[ f +,EO](y)) j(y J j - f (x) -,EO(x))#(y 9 +[(y - f((x) - EO(x))#(y - f(x) - cO(x))] 9(x)dF(x) j[#(y - (y - f(x) - cO(x))O(x)dF(x) f(x) - CO(x)) f(x) - 1 - (y - - eo(x)) 2q(y - f(x) - E)] 6(x)dF(x) f(x) - eO(x))2] O(y - f(x) - EO(x))O(x)dF(x). with 02 = E( j [1 - (y - f (x))2] q(y - f (x))9(x)dF(x). Thus, defining - x) + G[f](x) = 2k 2 (f(x) # (y - f (x)) I[f](y) [2 (y - f(X))2 + T[f](y) (y j - f(x)) - 21 dy, we have 6 JIf = JG[f](x)dF(x)6f(x). Hence, G[f](x) = 0 must hold F-a.e. since we require that 6J~f = 0. Furthermore, as pointed out by Wu and Verdi in [12], we must have that G[f*](x) = 0. We can see this by considering the fact that {G[f*](x) = 0}1 C R is dense in R, coupled with the right-continuity of optimal controllers. Finally, the latter two forms of the necessary condition can easily be ascertained through repeated use of integration by parts, as follows: G[f](x) = 2k 2 (f (x) x) + j - # (y - f(x)) 7I[f](y) [2 (y - f(X))2 + r[f](y) (y - f(x)) - 21 dy [(y = 2k 2 (f (x) - x) + 2 + (y = 2k 2 (f (x) - - f(x))O(y x) + 2j - - f - 1 ](y - f(x))r[f](y)dy f(x))q 2 [f](y)dy #"(y - f (x))rq[f](y)dy - j 60 -(y - f (X))q5(Y _ f x))r 2 [f](y)dy = 2k 2 (f(x) - x) - 2 = 2k 2 (f(x) - x) - j 0'(y - f (x))r7'[f](y)dy - q'(y - f(x)) 7 2 [f](y)dy 0'(y - f(x)) [2'[f](y)dy + 'q2[f](y)] dy, and G[f](x) = 2k 2 (f(x) - x) = 2k 2 (fW - x) + - f '(y - f (x)) [2''[f](y)dy + n 2 [f](y)] dy J q(y - f(x)) [2,q"[f](y)dy + 27'[f](y)] dy # = 2k 2 (f(x) - x) + 2 where we have used repeatedly that limto q(y - f (x)) [q"[f](y)dy + 77'[f](y)] dy, $(t) = limtex 0(t) = 0 in our multiple integration by parts. Thus, we have our formulas and have proved the theorem. E 4.1.2 Discussion Before moving on to specifying our finite element model, let us briefly discuss some aspects of the variational equation we just derived in Theorem 4.1.1. For one, we note the rather complicated nature of the equation in terms of f. Not only does f appear inside and outside the integral expres- sion, its appearance inside the integral expression is in the power term of the Gaussian probability weight function as well as in the score function. The score function itself involves the fraction of two integrals, both of which include f f appearing within in the power terms of a Gaussian probability weight function. This, coupled with the fact that the integral of the necessary condition has three separate terms involving f (#(y - f(x)), 'r[f](y), and [2(y - f(x)) 2 + n [f](y) (y - f(x)) - 2]), indicates extreme difficulty in analyzing this expression analytically as well as in modeling it numerically (though we do the latter successfully in this thesis through clever use of Gauss-Hermite quadrature). In addition, we can see that no simple (2n+ 1)-bit quantization scheme could possibly satisfy this equation. This is because any f(x) that satisfies the necessary condition has to incorporate uniquely identifiable information regarding x. Therefore, if in some interval x E (aleft, aright), f(x) = a, then the second summand in G[f](x) would be the same for all x E (aleft, aright) whereas the first summand will change depending on x. Thus, there is no way a (2n + 1)-quantization scheme could be zero for all values of x, as we know it must if f were to satisfy the necessary condition. Now, as we shall 61 see in Section 4.3.2, we can approximately satisfy the necessary condition at quadrature points for controllers that have flat sections (like the (2n + 1)-bit quantization schemes), however, we cannot satisfy it everywhere in general for such controllers. 4.2 Finite Element Model In this section, we shall present the main contribution of this thesis, our finite element (FE) model using rational basis functions. We will specify fully our M-element FE model, discuss our choice of rational functionals as our basis functions, and then touch on the degrading impact that "edge effects", or the effects of boundary conditions, have on the numerical computation of the MMSE estimator. 4.2.1 Model Parameters To begin, let us choose two nested, connected compact intervals in which to place our basis functions. We want to choose a bounded interval in which "most of" the probability of the underlying initial state variable Xo resides in and an enclosing bounded interval in which we end up placing our basis functions. The former interval models the region in which we want to most accurately model functions since that is where most of the probability of Xo will be calculated, 2 whereas the latter interval will be used to provide a wide-enough buffer for our numerical computation. 3 Denoting these intervals by BI and BO, respectively, we can choose them in the context of r(k 2 , a 2 ) as B1 = [-Ku, Ka], (4.5) BO = [-La, Lo], (4.6) where K and L are positive reals such that K < L. We choose symmetric intervals since the underlying probability distributions for X0 and V are symmetric. Remark In the context of our implementation, we further specify that both K and L are powers of 2 so that we can efficiently search for the appropriate sub-interval element using a binary search algorithm. Proceeding, let us partition BO into M + 1 non-overlapping evenly spaced boundary points. Denoting these boundary points by the set & A {ao = -La, . . . , a, 2 3 . . . , am = Lo}, we can refer to and hence the region in which the accuracy of the cost of our controller will be most affected. and to provide some protection against deleterious edge effects caused by our numerical integration. 62 jth subinterval (hereafter referred to as jth element), Sj, by [aj-=j (4.7) [agl,aM), j<M [am_,, am], j = M We then refer to the collection of all the such elements, E = {ei We note that Bo = j = 0, ... , M}, as the element set. U 1,jM S' Next, we need to specify our basis functions. We shall be utilizing a piecewise rational basis function set specified as follows. Let A - denote our mesh size, and let Oj denote La-(-Lo) = the jth basis function, which will be nonzero only on the two elements bordering the boundary point aj, or only on the inner element in the case of ao or am. As for the shape of these basis functions in their respective elements, we shall use a simple rational representation as in [21]. Let us formulate our sub-basis functions j and 2 as the following 3rd order rational basis functions: 4 a+ (i~X) = b2 , 0 <A (4.8) + 1+(A)(A (48 0, otherwise and 0 < X <A - d C+ (X) =A+i+i2() where the weights a, b, c, d - R are suitably chosen so that and limA find that a =- (X) = (4.9) otherwise 0, i(0) = 1, lim,,A p1 (x) 0, (2(0) = 0 1. We can figure out these weights by solving this system of equations, and we , b =, c = , and d =-. Therefore, we can fully specify our sub-basis functions as 4 4 dz)=3 1+iq)+(q) 2+(q) 3 (4.10) 3 - 0, 0<x<A otherwise and 4 3 2 (X) (L)(L)+ -1()Q) )3 0 < X< (4.11) 0, otherwise Please consult Figures 4-la and 4-1b for graphical representations of these sub-basis functions. 4 We choose a third order function to allow for greater changes in the value of our function between boundary points. We could choose higher-order rational basis functions, but we settled on using third order rational functions as they are sufficient to capture most of the nuances of the derived equations. 63 1.5 1 0.5 0 -0.54 0 0.1 0.2 0.3 0.4 0 5 0.6 0.7 0.8 0.9 (a) First Sub-Basis Function 1 ((1) with A 1.5 1 0.5 I 0 -0.5 U U.I 1 . 6.3 I .4 y 5 6. .7.f ?5.8 V. (b) Second Sub-Basis Function ( 1) with A ) These two Figure 4-1: Third-Order Sub-Basis Rational Basis Functions with A = computationally to use we which functions, sub-basis functions in turn make up the rational basis approximate the Stage I controller. Both sub-basis functions are zero outside the interval [0, and, within this interval, are either strictly decreasing or increasing. 64 1.5 1 -- 0.5 0 ' -0.5 -0.2 0 0.2 0.4 X 0.6 0.8 1 1.2 jth Third-Order Basis Function with A Figure 4-2: Example of the Jth Third-Order Basis Function with A = 1, aj_ = 0, a I , and aj+1 = 1. We note that this jth third-order basis function is zero outside the interval [ayj1, aj+i) and, within this interval, is increasing on [aj-1, aj1 and decreasing on [aj, aj+i). 65 Now that we have specified our sub-basis functions, we can define our basis functions, {n} o, more concretely as 1(x - ao), j = 0 2(x - aj-1) + 1(x - aj), 62(x - 0 < j < M (4.12) am-1), j = M Thus, we can picture our regular "sharklet" V$' in Figure 4-2 (the edge half-sharklets V50 and OM resemble simply '1 and 2, respectively). We note that each regular basis function is asymmetric about its boundary point pivot. We shall discuss the reasons for this choice of sub-basis functions in more detail in Section 4.2.3, but suffice it to say that the current specification will greatly aid in the accuracy of our numerical computations. Now that we have specified our basis functions, we can approximate a function on BO as a weighted sum of our M basis functions as follows: f[P1(x) = Zw4 j(x), j=0 (4.13) where cZ E RM+1 is a weighting vector. We then find that each function approximation can be uniquely specified by the choice of basis-functions weights, leading us to view f[c] as a functional mapping of a finite-dimensional vector space (RM+l) into an infinite dimensional inner product space of real Borel-measurable functions (with the inner product (., -) specified in the usual manner as detailed in Chapter 2). Hence, we have transformed any optimization problem involving our function f from one that lies in an uncountably infinite space to an approximate one whose feasible set is finite-dimensional. This is the essence of the FE method! In addition, we should note that nodal values are preserved in this model with f[C2 ](aj) = Wj, i.e., the weights simulate the values of f at the nodal points. Moreover, the model provides a continuous approximation of f with no "jumps" or abrupt changes since the approximation of f on each element is continuous (as each sub-basis is continuous, and we have a weighted sum of continuous functions). As most researchers believe the conjecture that the optimal controllers in WC are analytic, 5 this is a desirable feature of our model. We summarize the description of our model parameters in Table 4.1. 5 which is a conjecture at this point, but strongly believed, especially in light of the result that optimal controllers have to be piecewise-analytic [12]. 66 K L M { ao,..., aM} EFg {O O,..., f[l] I} Parameter Descriptions B, boundary (region of interest) BO boundary (region for computation) Number of Elements Boundary Points jth Element Element Set First Sub-Basis Function Second Sub-Basis Function Basis Functions Approximate Function Table 4.1: FE Model Parameter Descriptions. These variables represent the principal components of our FE model, which we use to approximate the Stage I controller in the central problem of this thesis, Witsenhausen's Counterexample. 4.2.2 Formulas for Computing WC Formulas & Equations In this section, we will detail how we will be computing each of the following functions using our approximation to a) No[cZj](y) f, E9i b) N1 [c](y) 0) g [C ](y) f[c2]: (y - f[](i)), iPf[ E29 1 [(f d) q [c ](y) -y + g[c](y), e) G [c2](x) 22 ( Q()-x f) J[K] Ak2Z where {}_ 1 1 (ii)# 0(y - f{](YO), )V No [v y Wi_1 [cn)] (xi + f [c ](x)) [x? + xinl[c ] (xi + f [cQ](x))-2] "A (fV1(i) - ii) + EZ_ and {3}v 1 2 ( _ T=1f Z 2 w [] (xi + f[j](fy))), are the abscissas and weights for the Gauss-Hermite quadrature for the Gaussian probability weighting function of Xo - .(O, Oa2), and {x}_ and {wj}91 are the abscissas and weights for the Gauss-Hermite quadrature for the Gaussian probability kernel P1(0,1) (see Chapter 2 for more details regarding how these weight-abscissa pairs are formed). We leave the full derivations until Section 4.5 of this Chapter. Remark It should be noted that, because of the presence of multiple integrals in the above expressions (and, therefore, multiple sums in the above expressions), we naturally encounter the dreaded 67 "curse of dimensionality" so common when modeling expressions involving nested integrals. In particular, we encounter the issue in two cases. The first is in the number of elements that we choose to use for our function approximation; the more elements, the better the approximation of the function. The second is in the calculation of the above integrals, which involve the Gaussian probability weighting function, #(-); the more quadrature weights we use, the better the approximation. If we assumed that most of the integrands are well approximated by polynomials of low degree, then we can get away with using less quadrature weights (see Section 2.3 in Chapter 2). However, we cannot necessarily assume that is the case, and, as we stated in the previous section on the necessary condition, the only polynomial controllers f that satisfy the variational equation are affine (or, to wit, linear). As such, our computations do take some time, but, through experimentation and comparison with known formulas, we managed to find that having Q = 80 weights and M = 226 elements work well. The results of using these parameters are depicted in Section 4.3. 4.2.3 Justification of Rational Basis Functions There are a number of reasons why we chose rational basis functions as our sub-basis functions. For one, the rational basis functions in (4.10) and (4.11) that we use as sub-basis functions in our FE model are asymmetric with a bias in the direction of increasing abscissa values. This means that, in calculations using our FE model for the value of a function at a given point, the nodal values "downstream" relative to that point contribute more to the value of the function at that point than do the nodal values "upstream" (e.g., a(10) is downstream of a(5), whereas a(5) is upstream of a(10)), since 2(x) > x and (1(x) < x on [0, A). Hence, on a given element Ej, the weight value at a3 has a greater effect on the shape of the function than the weight value at aj_1. This, coupled with the fact that optimal controllers are non-decreasing, means that our choice of rational basis function should adequately model optimal controllers because f [c21 will naturally be non-decreasing (and strictly increasing unless wj = wj_1) on each element Sj when the weights themselves are non-decreasing in j (i.e., wj_1 < wj). Indeed, another reason for choosing rational basis functions is their success in the past. Specifically, rational basis functions were shown to have great success in solving a multitude of problems, including Burgers' equation in fluid dynamics [21]. This is relevant because, as we saw in Theorem 4.1.1, the necessary variational equation involves a term of the form 77"[f](y) +?q'[f](y)[f](y), which is directly related to the viscous Burgers' equation. This, coupled with the smooth nature of our FE model, means that we should have a very good approximation of optimal controllers. 68 4.2.4 Discussion of "Edge Effects" in MMSE Calculation Finally, before discussing the results of our numerical experiments, we comment on our need to define both inner and outer intervals, B, and BO, respectively. This is mainly because of the effect that truncations have on the calculation of various quantities of interest in WC, in particular, that of the first conditional moment of f(Xo) given Y[f], N1 [f ]. To expand further, consider an arbitrary controller b(t) and its time-limited variant bT(t), where: bT (t) =~ b(t ), ) |t|<T 0, Itl > T ,t<T E R. (4.14) Now, let's consider the first conditional moment for bT(t), J N1[bT](y) bT(x)#(y - bT(x))dF(x) br(x)#(y = - bT(x))dF(x). (4.15) = 0, (4.16) _T- Investigating (4.15), we notice that lim Ni[bT](y) y-+o which is easy to see since, for large values of y, #(y - bT(x)) is very small and, because of the truncation in the integral, we have that generally N1 [bT](y) than Ni[b](y) - - 0 at a faster rate of convergence 0. In fact, even for finite values of y within [-T, T], we will have Ni[bT] or trending in that manner, regardless of the choice of b. = 0 So, even if N [b] is much wider and significantly non-zero outside of [-T, T], we will still have N1 [bTl trending towards 0 inside of [-T, T]. This is what is meant by "edge effects". As such, one naturally can see the need for two bounds in our model. The inner bound is used for analysis and modeling while the outer bound is strictly used for computation and mainly to provide a "buffer zone" so that the edge effects on the FE computations do not interfere. Since this buffer zone has to be, in practice, quite large, we thus need more elements to compensate in our model in order that we have the appropriate amount of elements to model the controller in the region of interest, B1 .6 6 The rest, and indeed the majority, of the elements reside in the outer buffer zone, Bo\Bi. Their weights and the shape of the controller in this outer buffer region do not matter as much in the grand scheme of our model and 69 Parameter Values 0.2 k 05 512 L 2 K Q 79 M 226 Table 4.2: FE Model Parameter Values for Model Verification. These values are used to evaluate the accuracy of our FE model approximation for the Stage I controller in the previously worked-out cases of linear controllers and 1-bit quantization controllers. Numerical Experiments using Finite Element Model 4.3 In this section, we shall illustrate use of our finite element model as an exploration tool to discover new aspects about the structure of the necessary condition. 4.3.1 Accuracy of the FE Model We begin by verifying the accuracy of our FE model formulation as specified in Section 4.2. In particular, we shall use known analytical results about linear controllers and 1-bit quantization schemes as discussed in Lemmas 3.3.1 and 3.3.3, respectively, to check our numerically computed results. We will use the parameter values indicated in Table 4.2, where we note that we have a very large buffer zone in order to mitigate edge effects in our calculations. To begin, let us consider the linear case. In Figures 4-3, 4-9, and 4-10,7 we plot the relative errors 8 in fx, No[fx], and g[f.\] for A E {0.1, 0.5, 1}. We note that in all 3 cases we have extreme fidelity with regards to our model mirroring the theoretical, even at a relatively small number of quadrature weights. We have greater accuracy the less "rounds" of integration we pursue, since small errors tend to accumulate over repeated integration. 9 However, we would like to point out that in cases of large A, our model does not work as well (see Figure 4-1110). In this case, our rational basis functions cannot effectively capture such sudden and wide increases. Even so, since most of the previous work on WC indicates that optimal controllers are bounded in some sense by the linear line of slope one (and, therefore, are never expected to have such large instantaneous do not affect significantly the results presented in this thesis. 7 Figures 4-9 and 4-10 are located in Section 4.4. 8 1f m(t) is the theoretical function and mh(t) our approximation, then the relative error, ER(t), is defined as follows: IMMt-TM~tl z()A ER [M FR[m, m(t) (maxjm(r),^n(r))9 E.g., computing g(c'I is a successive "round" of integration that builds upon the computation of No [Co]. ' 0 Figure 4-11 is located in Section 4.4. 70 4.5 X 10-7 4 3.5 1 x 10-6 0.9! 0.8 0.7 3 0.61 2.5 2 1.5 1 0.5 0.4 0.3 0.2 0.5 0.11 - -4 -2 x 2 4 6 8 10 (a) Relative Error for f: 3.5 U-S -6 -4 - -2 x 0 (b) Relative Error of No: ER[NO, No](x) ER[f, f(x) 10-7 - 3 2.5 2 1.5 1 0.5 - 4 -2 0 T 4 6 8 10 - 08-- G ---X (c) Relative Error of g: ER[g, g](x) Figure 4-3: Relative Errors for Approximation of Linear Controller with A = 0.1. denote functions approximated using our FE model with a hat, ^. 71 We 1.4 x 10-14 1.2 1 0.8 0.6 0.4 0.2. 0.8 01 -0.2 0.6 -0.4 0.4 -0.6 -0.8 0.21 ~10 -8 64-20 - -i--U -2- -- 24T-20 6810 x (a) Relative Error for f: ER[f, 1.2 X x (b) Relative Error of No: ER[No, No](x) f](x) 10-16 - 1 0.8 0.6 0.4 0.2 -0-- - x- o (c) Relative Error of g: S-10 ER[g, g](X) Figure 4-4: Relative Errors for Approximation of 1-Bit Controller with a = 1. We denote functions approximated using our FE model with a hat, increases), our model should perform adequately in exploratory pursuits of WC. Our model performs even better in the nonlinear 1-bit quantization case. In Figures 4-4, 4-12, and 4-13,11 we plot the errors in f, No[f0 ], and g[fa] for a E {1, 0, c}. Once again, we note that, in all three cases, our model performs extremely well and is able to accurately capture the theoretical features of 1-bit quantizers. We also did a test for large o (Figure 4-1412) and noticed that, unlike in the linear case, our FE model is extremely accurate in these regimes. This can be because, for large o-, there is only a small sudden change as opposed to the longer change involved in greater values of A in the linear case. Finally, we consider the linear case again and see if our model can accurately determine which linear coefficient values cause the variational equation in Theorem 4.1.1 to go to zero. We know from Witsenhausen in [3] that any optimal controller must satisfy the variational equation F-a.e., "Figures 4-12 and 4-13 are located in Section 4.4. 12 Figure 4-14 is located in Section 4.4. 72 4 I I I I I I 0.4 0.5 0.6 0.7 0.8 3.5 3 2.5 2 1.5 1 0.5 0 0 0.1 0.2 0.3 0.9 1 Figure 4-5: Absolute Size (IIG[f]({xi}_o)H12) of First-Order Condition for the Linear Case using FE Model. We set k = 0.2 and a = 5 and use the values for our FE model from Table 4.2. We see that the overall behavior of |IG[A]({x}/_i 0 )11 2 follows that of the theoretical IG[A]I from Figure 3-2, allowing us to verify the accuracy and suitability of the FE model. 73 and, from [12], we know that optimal controllers must satisfy it exactly. Therefore, let us consider how our model acts at certain "points of interest" within B1 . In fact, let us consider the shape of the Euclidean norms of G[AI] abscissa set, {xi}9 0 G[0]Ia 1 3 along A using just a 10-point Gauss-Hermite quadrature , as the points on which to evaluate G[LA]. We plot these norms in Figure 4-5. We note that I|G[A](.x)112 only becomes approximately zero at three values, located roughly around 0.05, 0.3, and 0.95. In fact, when we compare our graph in Figure 4-5 with the necessary optimal condition that we had derived earlier for the linear case (see Figure 3-2), we notice how similar the graphs look. Theoretically we have that the only three solution of (3.21) are A = 0.04174243050, 0.3031960455, and 0.9582575695. This matches up well with our findings using the FE model and so we can be even more confident in the fidelity of our FE model as an exploratory tool. 4.3.2 Optimizational Framework using FE Model Encouraged by the success of our model, which we discussed in Section 4.3.1, we now consider developing an optimizational framework in which we can use our model to investigate the necessary variational equation and determine new controllers that necessarily satisfy it. Using the same rationale as we had employed in Figure 4-5 to numerically determine which linear coefficients satisfy the necessary condition exactly, let us consider the following optimization: R-1 min llG[c 2 (4.17) Z G2[](x), i=:o CJ where {x,}i--1 represent R quadrature points, and F is any feasible set of L2 over which we choose to optimize. 14 Any cZ, that minimizes the above expression is therefore a candidate (through f[C]) for defining a controller that satisfies the necessary condition. Remark Before presenting the results of using our model to perform the above minimization, let us first comment on why this is a reasonable minimization to pursue. Why frame the problem in this manner? Should we not perhaps consider the entire 2-norm of G[c](x) over B1 instead? To answer this, let us consider the cost function in its most general form from Theorem 3.1.1: J[f] k 2E [(f (Xo) - Xo)2] + IE [f 2 (XO)j 13 - E [g 2 [f](Y)] . (4.18) Recall that , is the set of all FE boundary points. In a naive implementation, we would choose F = RM+1, but, because of computational constraints and the unwieldy nature of RM+1 when M is quite large, we will, in practice, choose a feasible set of much lower dimension. 14 74 20 15 10 5 0 -5 -10 -20 -20 -15 -10 -5 0 x 5 ' 10 Figure 4-6: Example of 5-Level Sinusoidal Quantizer with f ' -15 15 = 20 [-10, -4, 0,3, 15]. In this expression, we see that the underlying expectations are relative to either Xo or Y [f], both of whose p.d.f.s involve a Gaussian probability kernel, ( ) < (g). When we approximate the integrals of these expressions according to our quadrature rules, we find that the only values of our candidate controller f that affect the cost are those at the abscissa values. This, combined with the aforementioned fact that optimal controllers must satisfy G[f](x) = 0 exactly for all x, means that our search for controllers that minimize G[f] (as approximated through c2) at these abscissa points is an extremely worthwhile approach. Now, using the above framework and our FE model, we can pursue the mathematical optimization as defined in (4.17) using five quadrature points (i.e., R = 5) as our "points of interest." Using an initial condition of hnit = [0,0,0,0,0] and MATLAB@'s fminunc solver with its default properties for tolerance and precision [221, we choose a rather simple feasible set that is parameterized 75 to: CO(x- -1Ka) - 7r) +1]+ by 4 Wos ((r ) (x + Ko) -- 7) + ]+ f [N( = tk hs f ine < (x + o j Kor <x<Ko [cs (( ) 2 Ko, x < -Ka 1, -K n fx < -u(q)eKs x<( Ko, (4.i 0 X - 7) + 1] +f3 s(( 4, i, -); 1] at - (-) Ka <x<0 o -, 2 , x ;> K o, We refer to such functions f [, ] as 5-level sinusoidal quantizers.15 Remark Basically, f [-] is defined by five "levels" that indicate distinct points/intervals on which f[i] takes on the values defined by - : - 1 on (-00, and - 5 on [Ko, oo). The values of j2K) o 3at0 4a j22a o f [fl are obtained by stretching a scaled and shifted positively increasing cycle of a cosine function. An example of what f[f] looks like for a given f is shown in Figure 4-6. Next, let us discuss why we chose this particular feasible set of solutions. The first reason is quite obvious: by restricting our optimization from a finite-dimensional space of dimension 226 to one of size 5, we have a much more tractable and practical setting in which we can obtain meaningful results using standard mathematical optimization solvers in a reasonable amount of time. Second, we chose this particular form for our feasible set since it allows us to control the shape of the controller while keeping it analytic; our approximation of connected regardless of the choice of ' f[f] remains smooth and . Since we already know that optimal controllers have to be at least piecewise-analytic (Theorem 2, [12]), any controller in our feasible set as defined in (4.19) is therefore a good candidate for an optimal controller. Indeed, if we increased the number of levels in our feasible set, we would naturally have greater flexibility (and a greater chance) of obtaining better-performing controllers, but, for proof of concept, a size of n = 5 is acceptable, as we shall see next. Remark It should be noted that we are still using our FE model as described earlier and merely tailoring the larger set of weights, c., so that they approximate a function with the stipulations 1 5 We note that we can easily determine the appropriate FE weight vector c2 for our 5-level sinusoidalquantizers by simply sampling f[f] at a. So, implicitly, when we define our feasible set of 5-level sinusoidal quantizers for the mathematical optimization in (4.17), we are considering the subset of Rm+1 whose weight vectors can conceivably represent a sampling of a feasible 5-level sinusoidal quantizer at . =[-i,- 251,4,'%] according 76 Parameter Values k 0.2 a5 L 512 K 2 Q 79 226 M Table 4.3: FE Model Parameter Values used to find Optimal 5-level Sinusoidal Quantizer. These parameters were used to fully define the optimization problem (4.17). Optimal Weights 5*, 1 -0.611762678445885 5T, 2 -0.164533341705034 t5*, 3 t5*, 4 5* 5 -0.000110638239963 0.164268853065035 0.611542404605543 Table 4.4: Optimal 5-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. as given in (4.19). Therfore, we could say that we are "bootstrapping" our 5-level smooth feasible controller model on the back of our (M + 1)-weight FE model. Using the above feasible set and the FE model parameters as specified in Table 4.3 with R = 5, we obtain the optimized result of G[f-*] as shown in Figure 4-8. The weights of this optimal 16 function are depicted in Table 4.4, and the value of the objective function as computed by our model is 4 G 2[ig*(X,) = 0.0000001474794586097925, i=O which is well within the realm of approximately satisfying the necessary condition at the five important points. However, when we compute the cost of f[f*], we have J[f5*] = 0.963530809260363, which is greater than the best linear cost for the (k = 0.2, - = 5) case (J[A*] = 0.96) and much greater than the as-of-now best known cost (J[f*] = 0.1670790, [6]). Obviously, this is not ideal since we want to find the Stage I controller of minimum cost (that is, after all, the main objective of 16 in the sense of satisfying the necessary condition at the specified R quadrature points. 77 0.8 0.4 - 0.2 - - 0.6 0 -0.2 -0.4 -0.6 -0.8 -20 -15 -10 -5 0 5 10 15 20 Figure 4-7: Optimal 5-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 78 --------- 7 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.4 -0.5 - -0.3 - -0.2 -20 -15 -10 -5 0 5 10 15 20 x Figure 4-8: Necessary Functional Equation (G[rj*]) for Optimal 5-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 79 the problem). However, as we discussed earlier in Chapter 3, this is not unexpected since there exist many solutions to the necessary condition for any given (k, -) pair, and our approximate solution is not necessarily the largest. In addition, one may wonder if our solution is zero everywhere else, other than the R = 5 quadrature points of interest at which we evaluated G[ ]. In this case, our optimal 5-level sinusoidal quantizer is in fact nonzero everywhere outside of the R quadrature points, as Figure 4-8 shows. However, this is not unexpected since the objective function in (4.17) does not take into account the 1 values of G[f] outside of the R quadrature abscissa points, {X}?_- . This means that our optimal sinusoidal quantizer does not satisfy the necessary functional equation everywhere, but we have to keep in mind that, in the context of the overall problem, the most "important" points at which we want to ensure the necessary condition is satisfied are the Gaussian quadrature points. In our case, we accomplish this using the first five Gaussian quadrature points. Now, we should note that we can find solutions to the optimization problem depicted in (4.17) using other n-level sinusoidal quantizers and a greater number of Gaussian quadrature points. Indeed, we can depict the n-level sinusoidal quantizer as a function of the form: - ,1, f [tn](x) = ( nj+1 -inj) conn (x - (n (j - 1) - B) + nj, tn,n, if x < --B if x Ez j, j =I1 . .. ,In-- if x > B (4.20) where conn(x) = x cos - ir + J and B min{ac- I ac- > x, x E {x2 } _ 1 , a E N}, with nB and 2B n - I We can then solve (4.17) numerically for various values of n - R.1 7 In Figures 4-15, 4-16, 4-17, 4-18, 4-19, and 4-20 as well as in Tables 4.6, 4.7, and 4.8, we demonstrate optimal n-level sinusoidal 1 7 We set n = R so as to allow for greater degrees of freedom in solving (4.17). 80 Optimal Costs & Norms n =R 3 4 5 6 ||G[T*]||2 1.771340660161439e-07 1.116564623767737e-07 1.474794586097925e-07 1.645876831841833e-06 J [_*]I 0.961825561638322 0.961581394710436 0.963530809260363 2.075764121916171 Table 4.5: Costs and Necessary Functional Equation Norms (IIG[-r*]J). These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3 for n = R = 3, n = R = 4, n = R = 5, and n = R = 6 and Gauss-Hermite quadrature pairs as depicted in Table 4.9. quantizers for the cases of n = values of |IGL,*]H R 3, n = R = 4, and n = R = 6.18 Table 4.5 shows their respective and J[f*]. We notice that, just like in the case of n = R = 5, G[f *] ~~0 while the overall costs J[',*] are still above the optimal linear cost J[A*] = 0.96, but, as we discussed earlier, this is not unexpected nor extremely consequential given the nature of the randomness in WC. Also, it should be noted that our obtained optimal functional solution in the n = R = 6 case (as depicted in Figure 4-19) is not non-decreasing, a property that optimal controllers in WC must possess [3]. This lack of non-decreasing behavior is reflected in the overall cost of this 6-level sinusoidal quantizer being much greater than the benchmark linear case: J[f*] 4.3.3 = 2.075764121916171 > 0.96 = J[A*]. Analysis of Results From these results, we can state concretely a couple of positive conclusions. For one, we have demonstrated that our FE model can be used not only in a verification capacity but also as a tool to make meaningful progress on WC through optimization frameworks and feasible set definitions. For another, we have discovered an additional function besides linear functions that can approximately satisfy the necessary functional equation at three, four, five and six important quadrature points. In fact, we strongly believe that, if we increase the number of quadrature points we optimize over and/or the number of weights in our feasible set, we can obtain even better approximations for any number of quadrature points! In addition, we can state some more observations about solutions to the variational equation using Figure 4-8. We note that our minimized solutions (with the exception of - *) naturally tend towards odd symmetry, which agrees with our intuition since all of the independent random variables in 7r(k2 , ,2 ) are symmetric. Indeed, we also notice that our optimal controllers are generally increasing, a characteristic that agrees with the result of Lemma 7 from Witsenhausen [3], which 18 Figures 4-15, 4-16, 4-17, 4-18, 4-19, and 4-20 and Tables 4.6, 4.7 and 4.8, are located in Section 4.4. 81 states that optimal controllers must be non-decreasing. Finally, for the specific case of n = R = 5, we note how -N,2, T5,3, and T5,4 are all closely grouped around the origin. This makes sense because the origin is where most of the underlying probability of X 0 is concentrated, and that, by sending more of this probability to this region, our controller is making the job of our MMSE estimator in Stage II easier (meaning less estimation error, which, in turn, means less overall cost). This latter behavior is also mirrored in the optimized controllers for n = R = 3 and n = R = 4. All of these observations agree with the rationale employed in previous attempts in solving WC by optimizing over (2n + 1)-quantization schemes. Thus, they provide great credence to both the veracity of our model as well as its potential as a tool for investigating WC in further detail in other research contexts. 4.4 Additional Plots for Chapter 4 82 1 x 10-6 x 0.9 1 0.8 0.7 0.6 0.5 0.8' 0.6 0.41 0.3 0.4 0.2 0.1 0.2 910 -8 6 -4 -2 10-6 x Qi -8 -6 -4 -2 2 4- 6 8 10 (a) Relative Error for f: ER[f, f](x) x --Z 4 6 9 0 (b) Relative Error of No: ER[No, No](x) 1.5 x 10-7 1 0.5 10-8 6 -4 -2 0 2 x 4 6 8 10 (c) Relative Error of g: ER[g,g](x) Figure 4-9: Relative Errors for Approximation of Linear Controller with A denote functions approximated using our FE model with a hat, ^. 83 = 0.5. We 1 4.5 x 10-3 4 -10-6 0.9 0.8 3.5 0.7 31 2.5 0.6 0.5 2 0.4 0.3 1.5 0.2 0.5 0.1 0-8 9 6 -4---- --2-4 6 8 10 x (a) Relative Error for f: ER[f, 10-8 -G -4 -2-0 x 4- IO (b) Relative Error of No: ER[No,No](x) f](X) 1.5 x 10-3 0.5 t-S4 2 2-4 x 2 4 6 8 Ii0 (c) Relative Error of g: ER[g, ](x) Figure 4-10: Relative Errors for Approximation of Linear Controller with A denote functions approximated using our FE model with a hat, ^. 84 = 1. We i x-10-6 0.9 0.7 0.6 0.8 0.5 0.7 0.6 0.5 - 4 6 8 10 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 01 -8 - -4 -2 x QG -8 -(15 -4 -2 4 6 8 10 2 (a) Relative Error for f: 2 (b) Relative Error of No: ER[No, No](x) ER[fj](X) 0.35 0.3 0.25 0.2 0.15 0.1 0.05 910-8 -6-4 -2 x 2 4 810 (c) Relative Error of g: ER[g,g](x) Figure 4-11: Relative Errors for Approximation of Linear Controller with A denote functions approximated using our FE model with a hat, ^. 85 5. We 1.4 X 10-14 1.6 x 10-16 1.4 1.2 1.2 1 1 0.8 0.6 0.8 0.6 0.4 0.2 91048 0.4 0.2 -4-2 x 6 2 (a) Relative Error for f: S 0 8 - -4 -2 x 2- 4 0 (b) Relative Error for No: ER[No, No](x) ER[f, f](x) 10-16 3.5 3, 2.5 2 1.5' 0.5 0 9 -4 20 x 810 (c) Relative Error of g: ER[g, ](x) Figure 4-12: Relative Errors for Approximation of 1-Bit Controller with a denote functions approximated using our FE model with a hat, ^. 86 = 2.8. We 1.4 x 10-14 0.81 0.6 0.4 1.2 0.2 0.8 01 -0.2 -0.4 -0.6 -0.8 0.6 0.4 0.2 -i -0 o8 -6 -4 -2 0x 2 4 6 8 10 - - - -2 x 4 6 0 (b) Relative Error for N0 : ER[No, No](x) (a) Relative Error for f: ER[f, f](x) 4;-x 10-16 3.5 3 2.5 2 1.5 1 0.5 910-8-6 -4 -2 0 2 4 G 8 0 (c) Relative Error of g: ER[g, g](x) a = 5. We Figure 4-13: Relative Errors for Approximation of 1-Bit Controller with a = denote functions approximated using our FE model with a hat, ^. 87 1.2 X 10-14 0.8 0.6 0.4 0.8 0.2 0 0.6 -0.2 -0.4 0.4 -0.6: 0.2 -0.8 -8 -6 -4 2 0 2 x (a) Relative Error for Qi 46810 f: ER[f, f](x) 6 -8 -4 -2 U x Z I 4 Z5 I IU (b) Relative Error for No: ER[No, No](x) 10-14 1 U 0 .8 0 .7 0 .6 0 .5 0 .4 0 .3: 0 .2 0 .1 0 0-- x 41 110 (c) Relative Error of g: ER[g, ](x) Figure 4-14: Relative Errors for Approximation of 1-Bit Controller with a = 0 2 = 25. We denote functions approximated using our FE model with a hat, ^. 88 Optimal Weights t3*,2 -0.370552234771981 0.000071861774708 3,3 0.370689948937737 t3*, Table 4.6: Optimal 3-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 0.4 0.3 0.2 0.1 0 -0.1 -0.3 - -0.2 -0.4 -20 -15 -10 -5 0 5 10 15 20 Figure 4-15: Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 89 0.1 II I I I -4 -2 0 2 I I - 0.08 I 0.06 - 0.04 - 0.02 0 - -0.02 -0.06 - -0.08 - - -0.04 ' -0.1 -10 -8 -6 4 6 8 10 Figure 4-16: Necessary Functional Equation (IIG[r3*1]) for Optimal 3-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 90 Optimal Weights f4*1 -0.498521259836172 t 4 ,2 -0.155696146588598 0.155941726138290 0.498769790186308 tZ*3 4* 4 Table 4.7: Optimal 4-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 -20 -15 -10 -5 0 5 10 15 20 Figure 4-17: Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 91 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -20 -15 -10 -5 0 5 10 15 20 Figure 4-18: Necessary Functional Equation (flG[T] I|) for Optimal 4-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 92 Optimal Weights -4.67326888267048 t*h2 6.617177471430276 -3.621888183288994 T 6 3 t6, 42.585931219917199 T6 ,5 -2.669563529874990 t6T,1 6 ,6 6.475537965147624 Table 4.8: Optimal 6-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 8 6 4 2 0 -2 -4 -6 -20 -15 -10 -5 0 5 10 15 20 Figure 4-19: Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 93 2.5 2 1.5 1 -0.5 0 -0.5 ' -1 -20 -15 -10 -5 0 5 10 15 20 Figure 4-20: Necessary Functional Equation (JIG[r*]|I) for Optimal 6-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9. 94 Gauss-Hermite Quadrature Values R {xi}_____ 3 -8.660254037844387 0 8.660254037844387 _ -11.672071091694885 4 5 6 1 1 1 [ F0.045875854768068 1 0.454124145231932 -3.709818921513630 3.709818921513630 L 11.672071091694885 _ -14.284850069364028 -6.778130899871329 0 6.778130899871329 14.284850069364028 _ -16.621287167760595 -9.445879388768555 -3.083532950962971 3.083532950962971 9.445879388768555 16.621287167760595 {Wi_1 ~ 0.166666666666667 0.666666666666666 0.166666666666667 _ _ 0.454124145231932 L 0.045875854768068 j 1 0.011257411327721 0.222075922005613 0.533333333333332 0.222075922005613 _ 0.011257411327721 j 0.002555784402056 0.088615746041914 0.408828469556030 0.408828469556030 0.088615746041914 _ 0.002555784402056 j Table 4.9: Gauss-Hermite Quadrature Nodes ({xi}jL 1 ) and Weights ({wi} were determined computationally for the case of (k = 0.2, o = 5). 95 ). These values 4.5 Derivation of FE Model Approximations for Chapter 4 In this section, we will detail our derivations of the FE model formulas and equations that are derived from our approximated controller, f [cZ]. Proof a) No If [ ] ] (y) = J: - f [c.](x))dF(x) '/(y ~0 iq(y - f[P1(Yi)) No [c(y). b) Ni[f []](y) = - 0f [](x)(y f [c] (x))dF(x) ~ 6if[](i)#$(y - f[Z](zi)) i[f[c]](y). c) g I[f N1 [f []](y) No[f []](y) ]I(Y) fq] JJWX\No[f[c]](y) f[](x) ~ - f P]() (No [c'](y) q!4y - f P1](X)) N(y - f[PJ(i)) No [ o](y) ( Q[ (y d) '[f []](y) = -y + g[f []](y) 96 dF(x) dF(x) -y + g[](y) A?7[W](y). e) G[ f []I(x) = 2k 2 (fp](X) + _ X) [(y (y - f[ c2](x))r7f[c ]](y) [C](X))2 + n[f []](y)(y - f[](X)) - - 21 dy 2k 2 (f[cj(x) - x) q(y - f[P](x))7] 2k 2 (f[C](x) - x) + (y) [(y - j + f[C0](X))2 + T[](y)(y Win[C](X, + f [p](x)) [x2 + x7 f[p](x)) - 2] dy - ](x, + f [](x)) -21 f) k 2 E [(f (Xo) - f=(f](X) Xo)2] +JE [f 2p](Xo)] X)2 dF(x) Q Eiif 2 ](ii) + (f [CQl(y ) - i) fi23f2 K](y,) 100 ( j0 0 2[ ](y)#(y - f Q f[ ](x))dy) dF(x)dy f [](x))) dF(x) Q Ewi (fj002[C] - ?_,f 2 )+ " [] (xi + _ f2 [;]y, += 0i (fp](y) - i f[](x))dF(x)) dy Q Q Q - Q + 7b (y jg02[c;2](y) i 1 _f P1 V) Q k2 92 [f [C]](y)No[f [C]](y)dy (x, + i1 = k2 j Q 7bi - k2 E[g2 - f 2[](x)dF(x) - +- ( f [c](x)) dF(x)) i=1 i=1 Q =2 7 (fp](y,) )+ - ?_D,(f (] (;f) i=1 A i p] i=1 i=1 Q1 = k2 ?bf2 - i) + L 7_,if 2p](Z,) _ [C )]. 97 (j=1 WIj2 p] (X, + f [p](LF)) i=1 j=1 i=1 Lwi[Q Lq&gP(xi + f [Q(j))) - Jf [K]] 98 Chapter 5 Conclusion 5.1 Contributions In this thesis, we examined the necessary condition for optimality in WC. This necessary condition presents itself as a homogeneous integral equation in terms of the Stage I controller, be solved exactly by any optimal f. f, and must We started by presenting WC in full generality in two forms, the classical version, originally put forth by Witsenhausen in [31, and a new variant using optimal transport theory, put forth by Wu and Verdu in [121. We then proceeded to derive the necessary condition in full using the first-order condition for optimizing functionals from calculus of variations. Having done this, we developed a computational model to investigate solutions to this necessary condition in the case when all the native random variables are Gaussian, a system specification that we had denoted as 7r(k 2 , a 2 ). Using a finite element analysis approach, we specified elements and detailed our basis functions. These basis functions were asymmetric rational basis functions, which we used primarily for their computational bias towards nondecreasing functions as well as because of their smoothness. We verified our model against theoretically derived expressions for the cases of both linear controllers and 1-bit quantizing controllers. We then used this finite element model within a mathematical optimizational framework to approximately solve the necessary condition at specified Gauss-Hermite quadrature points. We defined a simple family of n-parameter controllers (the n-level sinusoidal quantizers) and demonstrated that we could find controllers that approximately satisfy the necessary condition within this family. 99 5.2 Future Work There are a number of directions for future research that the work in this thesis opens up. For one, our successful use of finite element analysis in a control-theoretic setting further enforces an already growing trend within the field and should encourage more analysis of control problems using similar numerical techniques. In addition, our repeated and successful use of Gaussian quadrature in approximating expectations and integrals involving Gaussian probability kernels should signal to other researchers that this simple numerical integration scheme can be used to quickly and accurately test the performance of their designed controllers before stress testing them using more sophisticated methods like Monte Carlo sampling. Moreover, our development of a versatile n-parameter controller family demonstrates that controllers need not be too complicated in order to satisfy the necessary condition. In fact, a very interesting research direction would be to increase the number of weights and generalize our nparameter controller family to a more sophisticated family of controllers. Additionally, some of the efficient search methods that have been used in both [51 and [6] with great success for the (2n+1)-bit quantization controllers could be employed within this feasible set. Finally, our work has shown that the necessary condition itself can be treated in a computational manner and is not as daunting of a object to study as it would appear. In fact, one could use the general structure of our model to study the effects of using different basis functions and/or unevenly sized elements in an effort to model even more accurately the equations and formulas that arise in WC. For example, one could look into using quadrature abscissa points as the boundary points of the elements instead of the uniformly spaced boundary points we utilized in our model. 100 Appendix A MATLAB® Code In this appendix, we list the main MATLAB@ functions that we developed for our computational finite element model. A.1 Function Arguments In the functions below, the common arguments used are the following: " a is a function handle that maps the ordinal position of an element in the FE model (Section 4.2) to its location on the abscissa real axis; * Bnd refers to the supremum of the BI bounded interval in which we conduct the computation using the FE model, i.e., Bnd = Ko- (Section 4.2.1); * den is the denominator of a fractional quantity; * Del refers to the mesh size of our FE model, A (Section 4.2.1); " f is a function handle representing the Stage I controller, typically composed of our FE approximation via approxRBF and requiring both an argument at which to evaluate it (i.e., x) and a vector of FE weights (i.e., w) (Section 4.2); " fVec is a vector comprising the values of f when f is evaluated at the abscissa values of the Gauss-Hermite quadrature [ XO, WO ] ; " j refers to the jth element of the FE model, j = 1,..., M (Section 4.2.1); " k refers to the scaling factor for the Stage I cost in WC (Section 3.1); 101 " 1 indicates the first sub-basis function when 1 = 1 and the second sub-basis function '1 2 when 1 = 2 (Section 4.2.1); " m is the total number of elements, or number of points at which we place a basis function, which we formally denoted as M (Section 4.2.1); " num is the numerator of a fractional quantity; " tauVec is a vector of length n representing the weights of an n-level sinusoidal quantizer (Section 4.3.2); " w is a vector of length M representing the weights (ca), or values at each element's location, of the overall FE model (Section 4.2); " x is a general variable declaration at which to evaluate a given function, typically representing the initial Stage I random variable realization Xo (Section 3.1); " y is a general variable declaration at which to evaluate a given function, typically representing the output; [X0, WO ] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature XO = {xi}-_O , (Section 2.3) abscissa points, and the probability weights at those points WO = for the initial Stage 1 random variable Xo {fwi}QI _ - (O, .2 ) (Section 3.1); [ X, W] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X= [p};11 and the probability weights at those points, W= iO for the additive white Gaussian noise V ~ .(O, 102 1) (Section 3.1); * [X1, W1] are vectors of length R (Section 4.3.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X1 = [{xi}R1 and the probability weights at those points, W, = IIG[ ]i at which A.2 {fWi~R-1]> (Section 4.3.2) is evaluated. Function Specifications Next, we shall detail the MATLAB® functions that we developed to obtain the results in this thesis. alpha.m This function calculates the approximate value of the right summand in the necessary functional equation G[f](y) (Theorem 4.1.1) using a FEM representation of the Stage I controller f via 77[Ci] (Section 4.2.2): Q-1 ret : wjy[w](xj + y) (2(xj + y) 2 + T[W](Xj + y) - 2). = j=0 1 function [ ret ] = alpha( x, w, f, fVec, XO, WO, X, W 2 phi = @(t) 3 Xfx = X+arrayfun(@(t) 4 etaVals 5 diff = arrayfun(@(y) 6 FuncVals = etaVals.*(2.*diff.A2 + etaVals.*diff - 7 ret = WO'*FuncVals; 8 1/sqrt(2*pi).*exp(-1/2*t.^2); f(t,w),x); = arrayfun(@(y) y, eta(y,w,f,fVec,X0,WO), Xfx)-arrayfun(@(t) end 103 Xfx); f(t,w), x); 2); approxRBF.m This function creates a function handle for a FE model of a continuous function represented by the element weight vector w (Section 4.2.1): val [ function ] val approxRBF( x, = w, Del, m, a 0; 2 val 3 curElem = getCurElem(x,m,a); 4 if = f[w]. curElem >= 1 if curElem <= m 5 val 6 = w(curElem) .*getBasisFcn(x,curElem-1,Del,m,a) + . 1 = w(curElem+l) .*getBasisFcn (x,curElem,Del,m,a); 7 else val 8 9 end else 10 11 val 12 13 = a(m); = a(O); end end calcGF.m This function calculates IIG[f] IIR (Section 4.3.2) using the FE model representation of the necessary functional equation G[Z] (Section 4.2.2): G = JIG[w]IIR. 1 function [ G ] = calcGF( f, tauVec, 2 w = getCosStepVec( tauVec, 3 G = norm(arrayfun(@(t) X, W, Xl, Wl, m, a, Bnd a, Bnd); 2*k^2*(f(t,w) -t)+alpha(t,w,f,arrayfun(@ (u) f(u,w),XO),XO,WO,X,W), 4 m+l, k, XO, WO, Xl),2); end 104 ... calcJ.m This function calculates J[f] (Section 3.1) using a FE model representation of the Stage I controller f via J[.,] (Section 4.2.2): J = J[w]. 1 [ J ] function = calcJ( f, tauVec, k, 2 w = getCosStepVec( tauVec, m+1, 3 fVec = arrayfun(@(u) XO, a, WO, X, W, X1, m, a, Bnd Bnd); f(u,w),XO); 4 5 J = WO'*(k^2.*((fVec - XO).^2)); 6 J = J + WO'*(fVec.^2); 7 J = J - 8 % J + WO'*arrayfun(@(t) WO'*arrayfun(@(t) X+arrayfun(@(v) % WO'*arrayfun(@(t) f(t,w)^2, XC); XC); mmse(u,w,f,fVec,X0,WO)^2, W'*arrayfun(@(u) f(vw),t)), k^2*(f(t,w)-t)^2, XO); end eta.m This function calculates the approximate value of q(y) (Section 3.1) using a FE model representation of the Stage I controller f via r[] (Section 4.2.2): ret = r/[w](y). 1 function ret 2 3 [ ret ] = eta( y, w, arrayfun(@(v) -v f, fVec, XO, WO) + mmse(v,w,f,fVec, end 105 XO, WO),y); getBasisFcn.m This function calculates the output of the jth basis function 4' when evaluated at the abscissa value of x: val = ij(x). 1 [ val ] = getBasisFcn function 2 val = 0; 3 if ( x, j, Del, a) m, 0 j < return; 4 j > m; elseif 5 return; 6 j == 0 elseif 7 val = getRatBasisFcn(x-a(0),Del,1); 8 elseif j 9 10 if m x == a(m) val = 1; 11 else 12 val = getRatBasisFcn(x-a(m-1),Del, 13 2 ); end 14 else 15 val = getRatBasisFcn(x-a(j),Del,l) 16 + getRatBasisFcn(x-a(j-1),Del,2); end 17 18 == end getCosStepVec. m This function calculates the FE model weight vector c2 (Section 4.2.1) from a given n-level sinusoidal quantizer realization via tauStar. [ wO ] = getCosStepVec( tauVec, m, a, Bnd) 1 function 2 %m = numElements; a = partitionFcn; 3 wO = zeros (m,l); 4 NWVec = length(tauVec); 5 numBndWeights = NWVec-2; 106 6 Delta = 2*Bnd/(numBndWeights+1); 7 connectorFcn = @(t) 8 begElem = int32(getCurElem(-Bnd,m,a))-l; 9 wO(1:(begElem+l)) 0.5*(cos((pi/Delta)*t-pi)+1); = tauVec(1).*ones(begElem+1,1); 10 endElem = int32(getCurElem(Bnd,m,a)); 11 wO((endElem+l):m) 12 bVec = = tauVec(NWVec).*ones(m-endElem,l); [begElem, arrayfun(@(t) ... int32 (getCurElem(-Bnd+t*Delta,m,a)), 1:numBndWeights) - 1, endElem]; for i = 2:length(bVec) 13 = arrayfun(@(t) subWVec 15 offsetInd = a(double(bVec(i-1))); 16 wO((bVec(i-1)+l):(bVec(i)-1)) = connectorFcn(t-offsetInd), wO(bVec(i)) 17 [(bVec(i-l)+1):(bVec(i)-1)); a(double(t)), 14 (tauVec(i)-tauVec(i-1)).*arrayfun (@(t) subWVec) ... + tauVec(i-1); = tauVec(i); end 18 end 19 getCurElem.m This function returns which element, or, more specifically, which element interval, a given abscissa value lies in. For example, if x E Ej (Section 4.2.1), then elemNum = 1 function [ elemNum 2 elemNum = 0; 3 12m = log2(m); 4 if x >= a(0) 5 6 7 8 = getCurElem ( x, m, if x < a(1) elemNum = 1; elseif x <= a(m) if x > a(m-1) elemNum = m; 9 10 ] else 11 elemNum = m/2; 12 for i = 2:(12m) 107 a) j. if x >= a(elemNum) if x < a(elemNum+l) elemNum = elemNum + 1; return; else elemNum = elemNum + 2^(12m-i); end else if x >= a(elemNum-1) elemNum = elemNum; return; else elemNum = elemNum - 2^(12m-i); end end end end else elemNum = m+1; end else elemNum = 0; end 36 end getRatBasisFcn.m This function calculates the output of the 1th sub-basis rational basis function when evaluated at the abscissa value x: val = 1 2 val 3 if 4 [ val ] = getRatBasisFcn function = Del, 0; x >= if ( x, (1(x). 0 x < Del 108 1) j (Section 4.2.1) 1 if 5 val 6 = 2 4/3-(4/3)/(1+(x/Del)+(x/Del)^2+(x/Del)^3); elseif 1 == 1 7 val = -1/3+(4/3)/(1+(x/Del)+(x/Del)^2+(x/Del)^3); 8 end 9 end 10 end 11 12 == end GF.m This function calculates G[f](y) (Theorem 4.1.1) using a FE model representation of the Stage I controller f via G[Z,] (Section 4.2.2): G 1 [ G ] function = GF( f, tauVec, 2 w = getCosStepVec( tauVec, 3 G = @(t) 4 = G[w](y). k, XO, m+1, a, WO, X, W, Xl, Wi, m, a, Bnd Bnd); f(u,w),XO),XO,WO,X,W); 2*k^2*(f(t,w)-t)+alpha(t,w,f,arrayfun(@(u) end mmse.m This function calculates the approximate value of the MMSE g(y) (Section 3.1) using a FE model representation of the Stage I controller f via g[c] (Section 4.2.2): ret = g[w](y). 1 [ ret function ] = mmse( y, w, f,fVec, 2 tol = le-150; 3 phi = @(t) 4 condPDF = smallDivide(arrayfun(@(u) 1/sqrt(2*pi).*exp(-1/2*t.^2); f(t,w),u)),y),XO), 5 XO, WO) arrayfun(@(t) arrayfun(@(v) phi(v-arrayfun(@(t) ... NO(t,w,f,XO,WO),y), tol); FuncVals = fVec.*condPDF; 109 R.qq.ppm a "PRW ret = WO'*FuncVals; 6 7 end NO.m This function calculates the approximate value of the No(y) (Section 3.1) using a FE model representation of the Stage I controller f via No[c] (Section 4.2.2): ret = No[w](y). 1 [ ret function @(t) ] = NO( y, w, phi 3 FuncVals = arrayfun(@(t) 4 ret = WO'*FuncVals; 5 XO, WO) 1/sqrt(2*pi).*exp(-1/2*t.^2); 2 = f, arrayfun(@(u) phi(u-arrayfun(@(v) f(v,w),t)),y),XO); end quantSign.m This function represents a 1-bit quantizer function out = 1 2 N = length(x); 3 out = ones (N, 1); 4 for j = 1:N 5 if < out(j) 6 0 = -1; end 7 end 8 9 x(j) 1 (Section 3.3.2): fala=1 (x). [ out ] = quantSign( x function fal- end 110 smallDivide.m This function serves as a wrapper around a division operation such that, if a MATLABS computation result is in either an oc or NaN declaration, the function returns 0. Otherwise, this function returns n,. den* 1 function [ val ] = smallDivide( num, 2 n = length(num); 3 val = zeros(n,1); 4 for i = 1:n 5 val(i) 6 if isinf(val(i)) 7 = num(i)/den; val(i) = 0; 8 end 9 if isnan(val(i)) val(i) 10 11 = 0; end end 12 13 den end 111 112 Bibliography [1] Y.-C. Ho, "Team decision theory and information structures," Proceedings of the IEEE, vol. 68, pp. 644-654, June 1980. [2] J.-M. Bismut, "An example of interaction between information and control: The Transparency of a game," IEEE Transactions on Automatic Control, vol. 18, no. 5, pp. 518-522, 1973. [3] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Contr., vol. 6, no. 1, pp. 131-147, 1968. control [4] S. Mitter and A. Sahai, "Information and control: Witsenhausen revisited," in Learning, and hybrid systems (Y. Yamamoto and S. Hara, eds.), vol. 241 of Lecture Notes in Control and Information Sciences, pp. 281-293, Springer London, 1999. [5] J. T. Lee, E. Lau, and Y.-C. Ho, "The Witsenhausen counterexample: A hierarchical search approach for nonconvex optimization problems," IEEE Transactions on Automatic Control, vol. 46, pp. 382-397, March 2001. [6] N. Li, J. R. Marden, and J. S. Shamma, "Learning approaches to the Witsenhausen counterexample from a view of potential games," in Proceedings of the 48th IEEE Conference on Decision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference (CDC/CCC), December 2009. [7] Y.-C. Ho and T. S. Chang, "Another look at the nonclassical information structure problem," IEEE Transactions on Automatic Control, vol. 25, pp. 537-540, June 1980. [8] C. H. Papadimitriou and J. N. Tsitsiklis, "Intractable problems in control theory," Proceedings of 24th IEEE Conference on Decision and Control, pp. 1099-1103, December 1985. [9] P. Grover, A. Sahai, and S. Y. Park, "The finite-dimensional witsenhausen counterexample," Proceedings of the 7th InternationalSymposium on Modeling and Optmization in Mobile, Ad Hoc, and Wireless Networks, 2009 (WiOPT), pp. 1-10, June 2009. [10] P. Grover and A. Sahai, "Witsenhausen's counterexample as Assisted Interference Suppression," Int. J. Syst. Control Commun., vol. 2, no. 1/2/3, pp. 197-237, 2010. [11] C. Choudhuri and U. Mitra, "On Witsenhausen's counterexample: The asymptotic vector case," in Proceedings of 2012 IEEE Information Theory Workshop (ITW), pp. 162-166, September 2012. [12] Y. Wu and S. Verdu, "Wistenhausen's counterexample: A view from optimal transport theory," in Proceedings of 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), pp. 5732-5737, December 2011. 113 [13] D. Liberzon, "Calculus of variations and optimal control theory: A concise introduction." http://liberzon.csl.illinois.edu/teaching/cvoc/node1.html, 2010. [Online; accessed 27-August2014]. [14] J. A. Gubner, "Gaussian quadrature and the eigenvalue problem," 2009. [15] G. H. Golub and J. H. Welsch, "Calculation of Gauss quadrature rules," Math. Comp., vol. 23, no. 106, pp. 221-230, 1969. [16] F. Santambrogio, "Introduction to optimal transport theory." Lecture notes, June 2009. [17] C. Villani, Optimal Transport, Old and New. Springer, 2009. [18] Y. Wu and S. Verdu, "Functional properties of minimum mean square error and mutual information," IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1289-1301, 2012. [19] Y. Wu and S. Verdu, "Functional properties of MMSE." [20] D. Guo, Y. Wu, S. Shamai, and S. Verdu, "Estimation in Gaussian noise: Properties of the minimum mean-square error," IEEE Transactions on Information Theory, vol. 57, pp. 23712385, April 2011. [21] A. van Niekerk and F. D. van Niekerk, "A Galerkin method with rational basis functions for Burgers equation," Computers Math. Applic., vol. 20, no. 2, pp. 45-51, 1990. [22] MATLAB, Version 8.4.0.150421 (R2014b). Natick, Massachusetts: The MathWorks Inc., 2014. 114