A Numerical Study of Witsenhausen's Counterexample
by
Jordan Romvary
B.S. in Electrical and Computer Engineering, Rutgers University, 2012
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author: Signature redacted
Department of Electrical Engineering and Computer Science
May 18, 2015
Certified by: Signature redacted
Pablo Parrilo
Professor of Electrical Engineering
Thesis Supervisor
Accepted by: Signature redacted
Leslie A. Kolodziejski
Chair, Department Committee on Graduate Studies
A Numerical Study of Witsenhausen's Counterexample
by
Jordan Romvary
Submitted to the Department of Electrical Engineering and Computer Science
on May 15, 2015, in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
Abstract
In this thesis, we consider Witsenhausen's Counterexample, a two-stage control system in decentralized stochastic control. In particular, we investigate a specific homogeneous integral equation
that arises from the necessary first-order condition for optimality of the Stage I controller in the
system. Using finite element (FE) analysis, we develop a numerical framework to study this integral
equation and understand the structure of optimal controllers. We then solve the integral equation as
a mathematical optimization and use our FE model to numerically compute a nonlinear controller
that satisfies the necessary condition approximately at a set of quadrature points.
Thesis Supervisor: Pablo Parrilo
Title: Professor of Electrical Engineering
Acknowledgments
I want to start off by thanking those who have helped me pursue this research as well as those
who have helped me in my adjustment to life as an MIT graduate student. To my advisor Pablo
Parrilo, I want to thank you for challenging me to think critically and for your patience through my
countless research dead ends and MATLAB® code malfunctions.
To Alan Oppenheim and Yury
Polyanskiy, thank you for your support during my first year and for the countless conversations we
had while I was searching for a research group to work in. To Terry Orlando, thank you for making
time to meet with me, for listening to my concerns, and for helping me navigate my first two years
at MIT.
I would also like to thank those in my life who have contributed the most to my academic success
as well as my development as a young man. To my parents, I want to thank you for sacrificing having
fancy vacations and new cars so that my siblings and I could attend the best schools possible and
obtain a thorough and diverse education. To my siblings, Christian, Jonathan, and Victoria, I want
to thank you for being my first and closest friends and for challenging me to reach the highest levels
as well as for supporting me when I struggled to do so. In addition, I want to thank my girlfriend
Aarthy for being there for me the past two years and for giving me the strength to persevere when
I questioned my place in graduate school. And to those other countless individuals in the Ashdown
Community and elsewhere who have contributed to my success and helped me reach this level at
MIT, I thank you wholeheartedly.
Finally, I want to acknowledge my Jesuit education at Saint Joseph's Preparatory School for
driving me to always question and investigate anything and everything. In particular, I want to
acknowledge the teachers and administrators who taught me to be a man for and with others and
to always keep in mind that all I do should be for the betterment of society and the well-being of
others.
Ad maiorem Dei gloriam.
Contents

1 Introduction
  1.1 Team Decision Theory
    1.1.1 Formal Definition
    1.1.2 Static and Dynamic LQG Teams
  1.2 Previous Work
  1.3 Summary of Contributions
  1.4 Organization of the Thesis

2 Background
  2.1 Mathematical Notation
  2.2 Calculus of Variations: Functional Derivatives
  2.3 Numerical Integration: Quadrature Rules and Gauss-Hermite Quadrature
    2.3.1 Gauss-Hermite Quadrature
  2.4 Optimal Transport Theory: The Monge-Kantorovich Problem
  2.5 Properties of Special Functionals
    2.5.1 MMSE Functional
    2.5.2 W2 Functional

3 Witsenhausen's Counterexample
  3.1 Classical Formulation
  3.2 Transport-Theoretic Formulation
    3.2.1 Main Theorems from TTF
  3.3 The "Counterexample" Aspect
    3.3.1 Optimal Linear Controls
    3.3.2 1-Bit Quantization Controls
    3.3.3 Refuting the Conjecture
  3.4 Proofs for Chapter 3
    3.4.1 Proof of Lemma 3.3.1
    3.4.2 Proof of Lemma 3.3.3

4 Finite Element Model for Witsenhausen's Counterexample
  4.1 Necessary Condition for WC
    4.1.1 Derivation
    4.1.2 Discussion
  4.2 Finite Element Model
    4.2.1 Model Parameters
    4.2.2 Formulas for Computing WC Formulas & Equations
    4.2.3 Justification of Rational Basis Functions
    4.2.4 Discussion of "Edge Effects" in MMSE Calculation
  4.3 Numerical Experiments using Finite Element Model
    4.3.1 Accuracy of the FE Model
    4.3.2 Optimizational Framework using FE Model
    4.3.3 Analysis of Results
  4.4 Additional Plots for Chapter 4
  4.5 Derivation of FE Model Approximations for Chapter 4

5 Conclusion
  5.1 Contributions
  5.2 Future Work

A MATLAB® Code
  A.1 Function Arguments
  A.2 Function Specifications
List of Figures

3-1 Classical Formulation of Witsenhausen's Counterexample. An initial random variable X₀ is passed through the Stage I controller C₁ to get U₁. The sum X₀ + U₁ is then passed, with additive uncertainty given by the random variable V, to the Stage II controller C₂ to get U₂. We then take the difference between U₂ and the output of Stage I to get X₂ = (X₀ + U₁) − U₂. The objective of the control system is to minimize the quadratic cost k²U₁² + X₂² for some scalar k.

3-2 Absolute Magnitude (|G[λ]|) of the First-Order Condition for the Linear Case. We set k = 0.2 and σ = 5.

3-3 Solutions to (3.21) for k²σ² = 1. Each colored value represents a distinct solution to (3.21) for any designated (k, σ) pair, where we iterate through all such pairs by changing k on the abscissa axis.

3-4 Absolute Magnitude of (3.28) for the 1-Bit Quantization Case. We set k = 0.2 and σ = 5 and use Q = 50 quadrature weights.

4-1 Third-Order Sub-Basis Rational Basis Functions with Δ = 1/2. These two sub-basis functions in turn make up the rational basis functions, which we use to computationally approximate the Stage I controller. Both sub-basis functions are zero outside the interval [0, 1) and, within this interval, are either strictly decreasing or increasing.

4-2 Example of the jth Third-Order Basis Function with Δ = 1/2, a_{j−1} = 0, a_j = 1/2, and a_{j+1} = 1. We note that this jth third-order basis function is zero outside the interval [a_{j−1}, a_{j+1}) and, within this interval, is increasing on [a_{j−1}, a_j] and decreasing on [a_j, a_{j+1}).

4-3 Relative Errors for Approximation of Linear Controller with λ = 0.1. We denote functions approximated using our FE model with a hat, ^.

4-4 Relative Errors for Approximation of 1-Bit Controller with a = 1. We denote functions approximated using our FE model with a hat, ^.

4-5 Absolute Size (‖G[λ]({x_i}_{i=0}^Q)‖₂) of the First-Order Condition for the Linear Case using the FE Model. We set k = 0.2 and σ = 5 and use the values for our FE model from Table 4.2. We see that the overall behavior of ‖G[λ]({x_i}_{i=0}^Q)‖₂ follows that of the theoretical |G[λ]| from Figure 3-2, allowing us to verify the accuracy and suitability of the FE model.

4-6 Example of 5-Level Sinusoidal Quantizer with Weights [−10, −4, 0, 3, 15].

4-7 Optimal 5-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-8 Necessary Functional Equation (‖G[Ψ₅*]‖) for Optimal 5-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-9 Relative Errors for Approximation of Linear Controller with λ = 0.5. We denote functions approximated using our FE model with a hat, ^.

4-10 Relative Errors for Approximation of Linear Controller with λ = 1. We denote functions approximated using our FE model with a hat, ^.

4-11 Relative Errors for Approximation of Linear Controller with λ = 5. We denote functions approximated using our FE model with a hat, ^.

4-12 Relative Errors for Approximation of 1-Bit Controller with a = 2.8. We denote functions approximated using our FE model with a hat, ^.

4-13 Relative Errors for Approximation of 1-Bit Controller with a = σ = 5. We denote functions approximated using our FE model with a hat, ^.

4-14 Relative Errors for Approximation of 1-Bit Controller with a = σ² = 25. We denote functions approximated using our FE model with a hat, ^.

4-15 Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-16 Necessary Functional Equation (‖G[Ψ₃*]‖) for Optimal 3-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-17 Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-18 Necessary Functional Equation (‖G[Ψ₄*]‖) for Optimal 4-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-19 Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4-20 Necessary Functional Equation (‖G[Ψ₆*]‖) for Optimal 6-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
List of Tables

4.1 FE Model Parameter Descriptions. These variables represent the principal components of our FE model, which we use to approximate the Stage I controller in the central problem of this thesis, Witsenhausen's Counterexample.

4.2 FE Model Parameter Values for Model Verification. These values are used to evaluate the accuracy of our FE model approximation for the Stage I controller in the previously worked-out cases of linear controllers and 1-bit quantization controllers.

4.3 FE Model Parameter Values used to find Optimal 5-Level Sinusoidal Quantizer. These parameters were used to fully define the optimization problem (4.17).

4.4 Optimal 5-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.5 Costs and Necessary Functional Equation Norms (‖G[Ψ_R*]‖). These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3 for n = R = 3, n = R = 4, n = R = 5, and n = R = 6 and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.6 Optimal 3-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.7 Optimal 4-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.8 Optimal 6-Level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.

4.9 Gauss-Hermite Quadrature Nodes ({x_j}_{j=1}^R) and Weights ({w_j}_{j=1}^R). These values were determined computationally for the case of (k = 0.2, σ = 5).
Chapter 1
Introduction
We begin this thesis by introducing the central problem: Witsenhausen's Counterexample (WC).
We shall build up to WC in the context of team decision theory, a variant of the more general
decentralized stochastic control. We then discuss previous work on WC as well as our contributions
in this thesis, and we conclude by detailing the organization of the remaining chapters.
1.1
Team Decision Theory
Team decision theory refers to the stochastic control situation in which each decision maker (DM) has
access to different sets of information and acts once. Following closely the approach and illustrative
examples of [1], we shall make clear some of the basic concepts regarding the interplay between
information structures and notions such as signaling, estimation, and error reduction. We shall
detail basic results from the theory, including those dealing with the static linear quadratic Gaussian
(LQG) case.¹ We then discuss WC using the terminology and methods introduced in this section
and mention how it shows that dynamic LQG teams are fundamentally different from their static
counterparts.
1.1.1
Formal Definition
To begin, we define the team problem to consist of five main facets:
¹By static we mean that the underlying information structure does not imply a necessary ordering of the control
actions.
a) a q-length random vector ξ = [ξ₁, ..., ξ_q] ∈ Ξ representing all uncertainties in the system, where we let the probability distribution of ξ be denoted as P(ξ);

b) a control action vector u = [u₁, ..., u_K] ∈ U, where u_i represents the action of decision maker i (DM_i), for i = 1, ..., K;

c) an observation vector z = [z₁, ..., z_K] ∈ Z, where

    z_i = η_i[u_{−i}, ξ]                                                  (1.1)

represents the observation available to DM_i, for i = 1, ..., K. We note that {η_i | i = 1, ..., K} is also known as the information structure of the team problem. We further note that the observation of DM_i may depend on the actions of the other agents;²

d) a set of control laws {γ_i : Z_i → U_i | i = 1, ..., K}, where Z_i and U_i refer to the sets of admissible observations and admissible actions, respectively, for DM_i, and where each control action of a particular DM_i is selected according to u_i = γ_i[z_i]. We let Γ_i denote the set of acceptable control laws γ_i for DM_i. We also let γ = [γ₁, ..., γ_K] refer to the strategy of the system of all DMs, and Γ = Γ₁ × ... × Γ_K denote the set of all admissible strategies;

e) a cost function L : U × Ξ → ℝ, where

    L(u, ξ) = L(u₁ = γ₁[η₁(u_{−1}, ξ)], ..., u_K = γ_K[η_K(u_{−K}, ξ)]; ξ₁, ..., ξ_q).³    (1.2)
We note that the expectation of L w.r.t. P(ξ) is well-defined as we have fully specified the control
laws.
Next, we formally define the team decision problem in its most exact, so-called strategic form. If we introduce the functional J(γ) = E[L(u = γ(η(u, ξ)), ξ)] for some control law vector, or strategy, γ ∈ Γ, then the strategic form of the decision problem is

    min_{γ∈Γ} J(γ) = min_{γ∈Γ} E[L(u = γ(η(u, ξ)), ξ)].                   (1.3)
²We refer to the situations in which (1.1) does not depend on the actions of the other agents, or in which the
dependence of the observation on the actions of the other agents is known, as static team decision problems. On the
other hand, the situations in which (1.1) does depend on the actions of the other agents are referred to as dynamic
team decision problems, and such dynamic teams necessarily require some type of partial ordering of the actions of
the agents on which the observation of agent i depends (so-called causality constraints).
³The notation u_{−j} refers to a vector consisting of all the entries of u with the exception of the entry at index j.
We note that this optimization, as opposed to being a parameter optimization, is a functional
optimization. Such optimizations are inherently more difficult, so attempting to solve for an optimal
strategy via this formulation could be very computationally intensive.
Instead, consider the decision maker's point of view. Specifically, for DM_i, let γ_{−i} represent the control laws of all the other DMs except for i. We will treat these control laws as fixed. Then, from DM_i's point of view, the decision problem becomes

    min_{γ_i∈Γ_i} J(γ_i, γ_{−i}) = min_{γ_i∈Γ_i} E[L({u_i = γ_i(η_i(u, ξ)), γ_{−i}}, ξ)].    (1.4)

We observe that, once we define our information structure η_i as an appropriate Borel-measurable function, our measurement z_i becomes a well-defined random variable. Therefore, we can talk about the conditional distribution of ξ given z_i and, in particular, the conditional expectation E_ξ[·] = E_{z_i}[E_{ξ|z_i}[·]]. Thus, we can further reduce (1.4) to its extensive form:

    min_{γ_i∈Γ_i} J(γ_i, γ_{−i}) = min_{γ_i∈Γ_i} E_{z_i}[E_{ξ|z_i}[L({u_i = γ_i(η_i(u, ξ)), γ_{−i}}, ξ)]]
                                 = E_{z_i}[min_{u_i∈U_i} E_{ξ|z_i}[L({u_i, γ_{−i}}, ξ)]].    (1.5)

Indeed, we can reduce it to the following person-by-person, or semi-strategic, form:

    min_{u_i∈U_i} E_{ξ|z_i}[L({u_i, γ_{−i}}, ξ)] = min_{u_i∈U_i} J_i(u_i, z_i; γ_{−i}), ∀i,    (1.6)

whose solution can be obtained iteratively by "guessing" the right strategy γ and then checking whether the chosen strategy is fixed under the optimization specified in (1.6) for all DMs.

We note, however, that any solution of (1.3) satisfies (1.6), but not necessarily vice versa.
1.1.2
Static and Dynamic LQG Teams
Next, we shall discuss some conceptual differences between what is possible in static and dynamic
teams.
For the sake of brevity and ease of exposition, we will assume that all the uncertainties
in the problem are Gaussian random variables of arbitrary correlation. We shall further assume
that the cost functions are quadratic and that all observation functions (i.e., the η_i) are linear. These
stipulations define the linear quadratic Gaussian (LQG) team problem.
First, let us consider the very simple static LQG team as specified below, with X ~ N(0, σ²) being the Gaussian initial state and V₁, V₂ ~ N(0, 1) being the measurement noises:⁴

    L  = (X + U₁ + U₂)² + U₁² + U₂²
    Z₁ = X + V₁                                                           (1.7)
    Z₂ = X + V₂
An interpretation of this problem is as follows: Both DM₁ and DM₂ observe the initial state with some additive white Gaussian noise. They then act to bring the state, which was originally X, to some other state X + U₁ + U₂. Their goal is to have their actions cancel out X, while also minimizing the energy of their individual control actions. The order in which they act does not matter, and we can, without loss of generality, assume a lexicographical ordering of their actions, i.e., DM₁ acts before DM₂. In this problem, there is no possibility for cooperation between the two DMs beyond some initial coordination and planning before they receive their respective measurements.
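The person-by-person "guess and check" iteration of (1.6) can be sketched for this team when attention is restricted to linear laws U_i = a_i Z_i. The following is an illustrative sketch only: the linear parameterization, the closed-form expected cost, and the choice σ² = 4 are assumptions for the demo, not formulas from this thesis.

```python
import numpy as np

# Person-by-person ("guess and check") iteration for the static LQG team (1.7),
# restricted to linear laws U_i = a_i * Z_i. The parameterization and the
# closed-form cost below are assumptions for this demo, not from the thesis.
sigma2 = 4.0  # variance of the initial state X (arbitrary choice)

def expected_cost(a1, a2):
    # E[L] for L = (X + U1 + U2)^2 + U1^2 + U2^2, expanded using
    # E[X^2] = sigma2, E[Z_i^2] = sigma2 + 1, E[X*Z_i] = sigma2,
    # and E[Z1*Z2] = sigma2 (V1, V2 independent of X and each other).
    return (sigma2
            + 2.0 * (a1**2 + a2**2) * (sigma2 + 1.0)
            + 2.0 * sigma2 * (a1 + a2)
            + 2.0 * sigma2 * a1 * a2)

def best_response(a_other):
    # Minimizer of E[L] over one gain with the other gain held fixed
    # (obtained by setting the derivative of expected_cost to zero).
    return -sigma2 * (1.0 + a_other) / (2.0 * (sigma2 + 1.0))

a1 = a2 = 0.0
for _ in range(100):
    a1 = best_response(a2)   # DM1 responds to DM2's current law
    a2 = best_response(a1)   # DM2 responds to DM1's updated law

a_star = -sigma2 / (3.0 * sigma2 + 2.0)  # symmetric fixed point
print(a1, a2, expected_cost(a1, a2))
```

Each pass holds one gain fixed and minimizes the expected cost over the other, exactly the fixed-point check described above; the iteration contracts to the symmetric gain a* = −σ²/(3σ² + 2), which, by Theorem 1.1.1 below, is in fact globally optimal among all (not just linear) control laws.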
It turns out that the optimal solution to this problem is actually given by (1.6), as a result of the following theorem:

Theorem 1.1.1 (Proposition 1, [1]). Let Q, S, and H be matrices such that Q > 0. If L = uᵀQu + uᵀSξ and z = Hξ, then the unique optimal control laws are linear and can be solved from (1.6).
At this point, the problem of finding an optimal set of control laws is no more than a simple optimization problem, and the action ordering does not matter. What happens if we consider a situation in which the information available to DM₁ and DM₂ is a little uneven, for example, if we allow DM₂ to know what DM₁ knows and also to have a separate measurement on the state of the system after DM₁ acts? Would not such a problem necessitate DM₁ acting before DM₂?
To explore the answers to these questions, consider the LQG team as specified below:

    L  = (X + U₁ + U₂)² + U₁² + U₂²
    Z₁ = Y₁ = X + V₁                                                      (1.8)
    Z₂ = [Y₁, Y₂], where Y₂ = X + U₁ + V₂

⁴We note here that, because we assumed that both γ₁ and γ₂ are Borel-measurable mappings, their outputs, U₁ and U₂, respectively, are themselves random variables. This is important in the case of dynamic teams because, if η₂ depends on the control action U₁, then U₂ is not well-defined until U₁ is actualized. Hence, there is a necessary partial ordering of their actions, i.e., DM₁ must act before DM₂.
At first glance, this problem appears to be fundamentally different from that of (1.7). It seems to necessitate DM₁ acting first, advancing the state of the system to X₁ = X + U₁, and then DM₂ acting upon both DM₁'s initial measurement Z₁ and a new noisy measurement of the system, Y₂. However, if DM₂ knows the control law γ₁ of DM₁, then we can define an equivalent set of measurements for DM₂ as

    Z̃₁ = Y₁ = X + V₁                                                     (1.9)
    Z̃₂ = Y₂ − γ₁(Y₁) = X + V₂

which has the same form as the problem specification in (1.7). In fact, we can also recover Z₂ from Z̃₂ with knowledge of γ₁. Hence, from Theorem 1.1.1, we know that the solution of the optimal control laws to (1.8) is linear.
We refer to the information structure displayed in (1.8) as that of perfect recall, that is, an
information structure in which all agents who act after other agents have access to all the information
those previous agents had when they made their decisions. Indeed, if we further treat each successive
agent's decision as coming from the same agent (i.e., a one-person team), then we get the following
theorem:
Theorem 1.1.2 (Proposition 2, [1]). In one-person LQG teams with perfect recall, the optimal control laws are linear.
Indeed, the information structure of perfect recall for one-person LQG teams (which we can interpret our last example to be) is very similar to a more general information structure known as partially nested. Such an information structure can be applied to situations in which there is a time structure involved and a partial ordering of the actions of the agents, i.e., there is some element of sequential control involved. Indeed, we can represent the information available to DM_i when it is its turn to act as

    Z_i = H_i ξ + D_i u,                                                  (1.10)

where H_i and D_i are linear operators (i.e., matrices) that satisfy causality constraints, so that the action of any agent that acts after DM_i is not factored into the observation information Z_i. Such an information structure necessitates a partial ordering of the agent actions because, if η_j depends on U_i, then U_j is not a well-defined random variable until U_i is actualized, meaning that DM_i must act before DM_j (unless DM_j has some side information that can remove the dependence of η_j on the action U_i, akin to (1.9)).

As such, we have the following theorem:

Theorem 1.1.3 (Proposition 3, [1]). In an LQG team with a partially nested information structure, the optimal control laws are linear.
Indeed, given the apparent optimality of linear control laws in these settings, a natural conjecture would be the following:

Conjecture 1.1.4. In any LQG team, the optimal control laws are linear.
Conjecture 1.1.4 is reasonable given Theorems 1.1.2 and 1.1.3, but consider the following system for some k ∈ ℝ₊:

    L  = k²U₁² + (X + U₁ − U₂)²
    Z₁ = Y₁ ≜ X                                                           (1.11)
    Z₂ = Y₂ ≜ X + U₁ + V₂

At first glance, this problem stipulation looks very similar to that of (1.8). However, the underlying information structure is NOT the partially nested information structure that was present in that problem. Indeed, while DM₁ has a perfect measurement of the state X, the only information that DM₂ has about the underlying state X is wholly affected by the choice of control action of DM₁. Therefore, there is no equivalence to (1.7) as there was in the case of (1.8), because knowledge of the control law γ₁ does not help in the same way.
We can interpret this problem as follows: DM₁ observes the state of the system X and then performs the action U₁, advancing the state of the system to X₁ = X + U₁. DM₂ then receives a noisy measurement of this state, Z₂ = X + U₁ + V₂, and chooses a control action accordingly. The goal for DM₁ is to try to cancel out X using as little energy as possible in its control action U₁, whereas DM₂ desires to cancel out X + U₁. Now, if DM₁ cancels out X completely (i.e., X + U₁ = 0) and DM₂ accordingly follows with U₂ = 0, then L = k²X², whose expectation w.r.t. X remains high. If, on the other hand, DM₁ uses no energy (i.e., U₁ = 0), then DM₂ must choose a U₂ to negate X based on an observation with additive white Gaussian noise (AWGN); this is known as the minimum mean squared error (MMSE) problem in statistical inference and incurs a relatively high expected cost of L w.r.t. X and V₂. As such, there is an inherent trade-off between the two purposes of DM₁: that of reducing the error directly through its efforts and that of signaling to DM₂ by reducing the uncertainty in the information received by it.⁵
It turns out that the control laws to the above problem were shown by Witsenhausen in [3] to be nonlinear, refuting Conjecture 1.1.4! This system, which we expand upon in Chapter 3 and which serves as the main focus of this thesis, is known as Witsenhausen's Counterexample (WC). Indeed, one can see the advantage of nonlinearity by considering the control laws specified by

    U₁ = σ sgn(X) − X                                                     (1.13)
    U₂ = σ tanh(σZ₂)

where U₂ is the MMSE estimator for X + U₁ under AWGN. In fact, these control laws, on average, outperform any linear control law in a certain regime of k and σ! This is because these control laws, as opposed to linear control laws, attempt to balance the error reduction and signaling aspects of decentralized control, which, along with estimation, are the three pillars of the "tridimensional nature" of decentralized stochastic control.
Further information concerning team decision theory and its widespread applications in information theory, economics, and game theory can be found in [1]. The rest of this thesis will focus
on WC.
1.2
Previous Work
Since its initial publication in 1968, the simple two-stage control system in (1.11) that Witsenhausen analyzed to disprove Conjecture 1.1.4 has attracted much interest in the control theory, information theory, and computer science research communities. Most of the research on WC has focused on two main thrusts. The first involves finding optimal controllers through the use of step functions (which we refer to as quantization schemes) for the canonical case of (k, σ) = (0.2, 5) (see Chapter 3).
Beginning with Mitter and Sahai's demonstration that quantization schemes can achieve arbitrarily low costs in the regime of very small k (with σ²k² = 1) [4], a lot of work has been conducted on efficiently searching the feasible set of (2n + 1)-bit quantization schemes, most successfully through hierarchical search methods [5] and potential games [6]. However, basic computational difficulties of such schemes have been discussed in [7] and [8], the latter of which showed that a discrete version of WC is NP-complete.

⁵A problem formulation from [2] that makes this concept of signaling more apparent is as follows:

    L  = (X + U₁ + U₂)² + U₁²
    Z₁ = Y₁ = X                                                           (1.12)
    Z₂ = Y₂ = U₁

If DM₁ and DM₂ pursue the control laws U₁ = γ₁(Y₁) = εX and U₂ = γ₂(Y₂) = −(1 + ε⁻¹)U₁, respectively, then they can, on average, almost cancel out the expected cost J(γ) if ε → 0. However, if DM₁ wished to communicate some information to DM₂, then it could do so at a little additional cost. That is, instead of letting ε → 0, DM₁ could choose some ε₀ > 0 and transmit some information to DM₂ about the size of X via its control law γ₁. This is known as the "transparency of information," which is further explained in [2].
The second thrust uses information theory to develop upper and lower bounds on the cost for
optimal controllers. For example, considering a finite-dimensional analog to WC, Grover and Sahai
[9, 10] were able to develop control strategies that approximate optimal controllers within a bounded
interval. Moreover, Choudhuri and Mitra considered implicit discrete memoryless channels in the
case of an asymptotic version of WC in [11] to great effect.
In addition, Wu and Verdú recently used optimal transport theory to reformulate WC from
a functional optimization to a probability measure optimization [12]. Using this formulation, Wu
and Verdú were able to discern more analytical properties of optimal controllers and show that the
necessary condition first introduced by Witsenhausen in his original paper has to hold everywhere.
We touch more on this in Chapter 3.
1.3 Summary of Contributions
Our main contribution in this thesis is the design and implementation of a finite element model
to be used in the study of the necessary condition of the WC system. As we shall discuss later, the
necessary condition presents as a homogenous integral equation in the Stage I controller, f. In
particular, we investigate the case in which all the native random variables are Gaussian in nature
and the system specifications, k and σ, satisfy k²σ² = 1.
We verify the accuracy of our computational model using analytically derived formulas for
the simple linear and 1-bit quantization controllers. We then develop a mathematical optimization
framework to solve the necessary condition approximately at certain points of interest. These points
of interest turn out to be Gauss-Hermite quadrature abscissas. Using a simple family of five-parameter
controllers, we then successfully demonstrate that we can find controllers within this family that
approximately satisfy the necessary condition.
1.4 Organization of the Thesis
The thesis will proceed as follows: Chapter 2 details some of the general mathematical preliminaries
and background that will be required for a sufficient understanding of the content of this thesis.
Chapter 3 introduces WC and discusses the existence of optimal solutions and the non-optimality
of linear controllers. Chapter 4 includes a full derivation of the first-order condition for optimality
and also introduces our finite element model and the results of numerical experiments obtained
using the model. Ultimately, Chapter 5 summarizes our main contributions and suggests future
avenues for research that build upon the results of this thesis.
Chapter 2
Background
In this chapter, we will detail and define some of the mathematical terminology and notation we
will be using. We will also discuss some of the underlying concepts that serve as the basis for our
finite element model and the derivation of the necessary variational equation for WC (Chapter 4).
2.1 Mathematical Notation
To begin, let us discuss the notation and mathematical concepts we will be using in this thesis. We
denote the set of real numbers by R, and use R^d to refer to the d-dimensional vector space defined
in the usual way with R representing the underlying scalar field. Z is the set of integers, and N is
the set of non-negative integers (including 0). We denote vectors, those belonging to R^d for some
nonzero d ∈ N as well as those belonging to the infinitely long extension of R^d, by non-italicized
lower-case letters with "hats" (e.g., â) or by an explicitly defined Greek letter with a "hat" (e.g., φ̂).
Constants are represented by italicized letters and are always assumed to be members of the real
field, R, unless otherwise stated.
In addition, limits are assumed to be defined as in the usual sense (so-called strong limits), and
we denote elements of a set or a vector by italicized lower-case letters with subscripts (e.g., w_i or
a_j). We denote sets themselves by an explicit representation like {a_i}_{i=0}^n, or simply as {a_i} when
the bounds of the set are understood from the context. Underlying functions are assumed to be
functions of R into R and are always denoted by lower-case letters, unless otherwise stated. Sets
of functions or vectors are denoted by non-italicized capital letters and will be defined as they are
introduced. Also, derivatives of functions are defined with ∂a/∂x representing a partial derivative and
da/dx a "regular" derivative (in the sense of single-variable calculus). We denote derivatives of functions
explicitly using these aforementioned representations, though we at times use upper ticks (like ')
when the underlying variable we are differentiating w.r.t. is understood.
Furthermore, we represent functionals between function spaces using the regular functional
analysis notation. For example, if G : A → B is a mapping between function spaces A and
B, and if f ↦ g under G, then we write g = G[f]. Also, inner products are defined similarly for function
spaces as well as vector spaces. For example, the inner product between two vectors, a and b, is
defined as (a, b) = Σ_i a_i b_i, and the inner product between two functions, f and g, is defined as
(f, g) = ∫ f(x)g(x)dx, unless otherwise indicated.¹
Also, we are assuming an underlying measurable space of (R, B(R)), where R is our observation
set, and B(R) is the Borel algebra comprising all real Borel sets. From this measurable space,
we define the probability laws for all of the independent random variables in our control system.
Random variables themselves are denoted as italicized capital letters (e.g., X). In addition, we
denote by F(B(R)) the set of all Borel-measurable functions on this space, i.e., F(B(R)) is the set
of all real-valued functions f : R → R such that, if E ∈ B(R), then f⁻¹(E) ∈ B(R).
Finally, for arguments that utilize the term weak, we mean this in the functional analysis sense as
follows: Consider the underlying metric space (R, |·|) and the corresponding topology that is induced
by it, (R, τ_{|·|}). We note that, because this metric space is completely separable (i.e., contains a dense
countable subset), the Lévy-Prokhorov metric metrizes the notion of weak convergence
of measures (or, more simply in the case of probability measures, convergence in distribution).
Hence, we say that a sequence of probability measures {P_n} converges weakly to a probability
measure P (written P_n ⇀ P) iff P_n →d P. Thus, for example, when we say that a particular
function φ : P(B(R)) → R is lower semi-continuous, we mean that φ(P) ≤ lim inf_{n→∞} φ(P_n) for
any P_n ⇀ P.
One final note is that we do utilize some standard shorthand throughout our discussions and
proofs. For example, "RHS/LHS" means "right-hand side/left-hand side".
2.2 Calculus of Variations: Functional Derivatives
The mathematical field known as the calculus of variations is concerned with maximizing and/or minimizing functionals, i.e., functions (of functions) that map functions in some function space to the
¹Because we are assuming only real scalars, there is no need for complex conjugation in the second multiplicand
as one usually encounters. For the sake of completeness, we explicitly define the inner product as (f, g) = ∫ f(x)g(x)dx.
underlying scalar field (usually R). A central result of the calculus of variations is that, if G is a
functional, then any function f that minimizes G must necessarily have the first variation of G at f,
δG|_f, equal to zero. The treatment in this section is based on [13], and we refer the reader to that
resource for a more thorough summary of this material.
Before defining what is meant by first variation, recall the concept of the derivative of a function,
f(x), from single-variable calculus. Wanting the derivative of a function to capture the
instantaneous change (or speed in some contexts) of the given function, we define it as follows:

    df(x)/dx = lim_{Δ→0} [f(x + Δ) − f(x)]/Δ.   (2.1)
We can interpret this as providing us local information about how our function behaves under incremental changes to a value in its domain. Indeed, the derivatives and their higher-order equivalents
tell us so much information about the structure of the function that we can represent nearly every
common function using a Taylor Series of the form

    f(x) = f(a) + f'(a)(x − a) + (f''(a)/2!)(x − a)² + (f'''(a)/3!)(x − a)³ + o(|x − a|³),   (2.2)

where a is the value in the domain about which we are defining our series.² This is a very useful
representation as it leads to a necessary condition, f'(x) = 0, for any value of x that minimizes/maximizes
f, a condition that is commonly known as the first-order necessary condition for optimality.³
We can apply the same rationale when defining derivatives of functionals, also known as functional derivatives. With a little more mathematical finery and careful attention, we can similarly
expand functionals in terms of functional derivatives. In particular, if G is our underlying functional,
and f is a specified function in the domain of G, then we can define the Gâteaux derivative (also
known as the first variation) of G at f as

    δG|_f(φ) = (d/dε) G[f + εφ] |_{ε=0},   (2.3)

where φ is a function, ε is a scalar, and εφ is the variation of f. We can interpret this as generalizing
²Here, we are using little-o notation, which represents functions that go to zero faster than the specified
function. That is, we say b is o(c) if b(x)/c(x) → 0 as x approaches the point in question.
³We are assuming underlying domains like R, which are not inherently bounded. Otherwise, we could have the
maximum of a function on a bounded interval whose derivative is not zero. For example, consider f(x) = x on [−1, 1].
the notion of a regular derivative in the sense that we look at how the functional responds to
"instantaneous" changes in its underlying function, with this change represented by f + εφ, where
ε is assumed to be very small.
Indeed, as was stated earlier, we can expand G in terms of its first variation as

    G[f + εφ] = G[f] + δG|_f(φ)ε + o(ε).   (2.4)

This resembles the Taylor Series representation discussed above and also leads to a similar first-order necessary condition for optimality when optimizing functionals: δG|_f(φ) = 0. This condition
becomes extremely useful in our analysis of optimal controllers in WC, as we see in Chapter 4.
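As a simple illustration of (2.3), the Gâteaux derivative can be checked numerically with finite differences. The sketch below (an illustrative example, not part of the thesis's model) uses the hypothetical functional G[f] = ∫ f(x)² dx on a grid, for which the first variation has the closed form δG|_f(φ) = 2∫ f(x)φ(x) dx, and compares the finite-difference estimate against it.

```python
import numpy as np

# Discretize functions on [-1, 1]; integrals via a simple Riemann sum
x = np.linspace(-1.0, 1.0, 2001)
dx = x[1] - x[0]
integrate = lambda g: np.sum(g) * dx

def G(f):
    # Illustrative functional G[f] = integral of f(x)^2 dx
    return integrate(f**2)

def gateaux(G, f, phi, eps=1e-4):
    # Central finite-difference estimate of (d/de) G[f + e*phi] at e = 0, per (2.3)
    return (G(f + eps * phi) - G(f - eps * phi)) / (2.0 * eps)

f = np.sin(np.pi * x)        # base function
phi = np.exp(x)              # direction of the variation
numeric = gateaux(G, f, phi)
closed_form = 2.0 * integrate(f * phi)   # analytic first variation for this G
```

Because this particular G is quadratic in ε, the central difference recovers the first variation essentially to machine precision.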
2.3 Numerical Integration: Quadrature Rules and Gauss-Hermite Quadrature
Other concepts central to the work of this thesis are the ideas of numerical integration and quadrature rules. Our discussion here will proceed in a similar fashion to that of [14], and we refer the
reader to that resource for a much more thorough introduction to this very important topic.
To begin, the basic idea behind numerical integration is to approximate an integral of the form
∫ f(x)w(x)dx by another integral ∫ f̂(x)w(x)dx that can be evaluated more easily.⁴
In a lot of cases, this latter integral involving f̂ can be written as a summation utilizing
a set of weights, {w_i}_{i=1}^n, and abscissa points, {x_i}_{i=1}^n, so that:

    ∫ f(x)w(x)dx ≈ ∫ f̂(x)w(x)dx = Σ_{i=1}^n w_i f(x_i).
Hence, the key to quadrature rules is to choose the weights and abscissa points correctly.
Now, assuming that the abscissa points {x_i}_{i=1}^n are chosen, we can easily determine the weights
by considering the family of Lagrange polynomials on this set of abscissa points:

    L_i(x) = [(x − x_1)⋯(x − x_{i−1})(x − x_{i+1})⋯(x − x_n)] / [(x_i − x_1)⋯(x_i − x_{i−1})(x_i − x_{i+1})⋯(x_i − x_n)].   (2.5)
These polynomials act as interpolants in the open subsets between successive abscissa values (e.g.,
⁴In this context, we refer to w(x) as the weighting function and f(x) as the integrand. To treat this more generally,
we could define the integral as ∫ f dμ, where μ is a measure on the underlying measurable space. All of the results in
this section could then be employed in a similar fashion to treat this case, as discussed in [14].
(x_i, x_{i+1})) and as a standard Kronecker delta at the abscissa points (i.e., at the boundaries
of the aforementioned open subsets).⁵
Indeed, if we consider the values of f at these abscissa points, we can define our approximation
to f, f̂, explicitly as

    f̂(x) = Σ_{i=1}^n f(x_i)L_i(x),   (2.6)

an expression that satisfies f̂ = f exactly at the abscissa points.
Using this approximation, we can then define our set of weights as

    w_i = ∫ L_i(x)w(x)dx,   (2.7)

since

    ∫ f̂(x)w(x)dx = ∫ (Σ_{i=1}^n f(x_i)L_i(x)) w(x)dx = Σ_{i=1}^n (∫ L_i(x)w(x)dx) f(x_i).
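To make (2.5)-(2.7) concrete, the sketch below (an illustrative example with arbitrarily chosen nodes, not part of the thesis's model) builds the Lagrange polynomials for four nodes on [−1, 1] with w(x) = 1, integrates them to obtain the weights per (2.7), and checks that the resulting interpolatory rule integrates polynomials of degree less than n exactly.

```python
import numpy as np
from numpy.polynomial import Polynomial

def lagrange_weights(nodes):
    # w_i = integral over [-1, 1] of L_i(x) dx, per (2.7) with w(x) = 1
    weights = []
    for i, xi in enumerate(nodes):
        L = Polynomial([1.0])
        for j, xj in enumerate(nodes):
            if j != i:
                L = L * Polynomial([-xj, 1.0]) / (xi - xj)  # factor (x - x_j)/(x_i - x_j)
        A = L.integ()                                       # antiderivative of L_i
        weights.append(A(1.0) - A(-1.0))                    # definite integral on [-1, 1]
    return np.array(weights)

nodes = np.array([-0.8, -0.3, 0.2, 0.9])
w = lagrange_weights(nodes)
# With n = 4 nodes, the rule sum_i w_i f(x_i) is exact for polynomials of degree <= 3:
approx = np.sum(w * nodes**3)
exact = 0.0   # integral of x^3 over [-1, 1]
```

Note that the weights sum to 2, the length of the interval, since the rule must integrate the constant 1 exactly.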
Now that we know how to choose our weights given a set of abscissa (or nodal) points, we turn
our attention to choosing the best set of points possible. In order to do so, we can employ the simple
rationale that we would like to have some polynomial, P, of degree n with roots {x_j}_{j=1}^n,⁶ such that
(v, P) = ∫ v(x)P(x)w(x)dx = 0 for any polynomial v of degree less than n.
If we have such a P, then for any polynomial f of degree less than 2n, we have

    ∫ f(x)w(x)dx = ∫ (t(x)P(x) + r(x)) w(x)dx
                 = ∫ t(x)P(x)w(x)dx + ∫ r(x)w(x)dx
                 = (t, P) + ∫ r(x)w(x)dx
                 = 0 + ∫ r(x)w(x)dx
                 = Σ_{j=1}^n w_j r(x_j),

⁵L_i(x_j) = δ_{ij}, i.e., 1 if i = j and 0 otherwise.
⁶We write P(x) = (x − x_1)⋯(x − x_n).
where we utilized polynomial (Euclidean) division to write f = tP + r for a remainder polynomial, r, and a
polynomial multiplier, t, both with degrees less than n. We also utilized that the x_j's are roots of P, so
that f(x_j) = t(x_j)P(x_j) + r(x_j) = r(x_j). Indeed, this means that we are able to fully represent a
continuous integral involving f and w by a finite summation involving only n points and n weights!
In addition, even if f is not a polynomial, we can think of the n-point quadrature rule as the best
approximation of the integral of f, if we assumed f was represented using a Taylor Series of degree
less than 2n.
Of course, the goal is finding this special polynomial P and determining its roots. A straightforward way to do so is to employ the standard Gram-Schmidt orthogonalization procedure to
determine an orthonormal basis for the span of {1, x, x², …, x^n} and get a series of orthonormal
functions, {p_j}_{j=0}^n, as follows:

    p_n(x) = x^n − Σ_{j=0}^{n−1} (x^n, p_j) p_j(x),   (2.8)

with p_0(x) = 1, where (a, b) = ∫ a(x)b(x)w(x)dx and each p_j is normalized after it is produced. We then
have that p_n acts as P, and the roots of p_n end up being the set of abscissas we are looking for! Indeed,
any polynomial p_n determined by this method is guaranteed to have n distinct roots and satisfy
(v, p_n) = 0 for any polynomial v of degree less than n [14]. Thus, we can
methodically determine our abscissa points and then, using these points, determine our weights.
In practice, however, finding p_n for a general weight function w(x) is difficult. Luckily, a well-known result from numerical analysis says that the set of orthogonal basis polynomials formed in
(2.8) has to satisfy the following recurrence relation:

Theorem 2.3.1 (Three-Term Recurrence Relation)

    p_{n+1}(x) = (x − a_n)p_n(x) − b_n p_{n−1}(x),   (2.9)

with a_n = (x p_n, p_n)/(p_n, p_n) and b_n = (p_n, p_n)/(p_{n−1}, p_{n−1}) (b_0 = 0).
Using this relation, we then have the following incredible result from Golub and Welsch [15] in the
form as it was presented in [14]:
Theorem 2.3.2 If {a_i}_{i=0}^{n−1} and {b_i}_{i=1}^{n−1} are the terms in a three-term recurrence relation, then the
Jacobi matrix for this three-term recurrence relation is defined as the n × n symmetric tridiagonal matrix

    J_n = [ a_0      √b_1                              ]
          [ √b_1     a_1      √b_2                     ]
          [          ⋱        ⋱         ⋱              ]
          [                   √b_{n−1}  a_{n−1}        ].   (2.10)

And so, if J_n = VΛV^T is the eigenvalue decomposition, we have that the eigenvalues of Λ correspond
directly to the abscissa points (x_j = λ_j), and the weights correspond to scaled versions of the first
components of the eigenvectors (w_j = (∫ w(x)dx) v_{j,0}², where v_j is the jth column of V and v_{j,0}
is its first component).
Indeed, we note that Theorem 2.3.2 has transformed the problem of finding appropriate weights
and abscissas into one of finding eigenvalues and eigenvectors. As methods for solving such problems
with sparse matrices like J_n have been highly optimized over recent years, we can efficiently and
accurately determine Gaussian quadrature rules!
2.3.1 Gauss-Hermite Quadrature
In the case of infinite integrals involving a Gaussian weight function, w(x) = e^{−x²}, we can use
what is known as Gauss-Hermite quadrature. The three-term recurrence relation for the orthogonal
polynomial basis (known as the Gauss-Hermite polynomial basis) can easily be determined to be
[14]

    p_{GH,n+1}(x) = x p_{GH,n}(x) − (n/2) p_{GH,n−1}(x),   (2.11)
meaning that our weights and abscissas for any n-point approximation can be found by investigating
the eigenvalues and eigenvectors of a matrix of the form:

    J_n = [ 0         √(1/2)                           ]
          [ √(1/2)    0         √(2/2)                 ]
          [           ⋱         ⋱          ⋱           ]
          [                     √((n−1)/2)  0          ].   (2.12)
This matrix is sparse (in the sense of being mostly zero), so we can efficiently determine the eigenvalues and eigenvectors and, ipso facto, efficiently determine the abscissa values and weights.
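As a sanity check on Theorem 2.3.2 applied to the recurrence (2.11), the sketch below builds the tridiagonal matrix (2.12), eigen-decomposes it, and compares the resulting nodes and weights against NumPy's reference Gauss-Hermite routine (here ∫ e^{−x²} dx = √π supplies the scaling of the squared first eigenvector components).

```python
import numpy as np

def gauss_hermite(n):
    # Off-diagonal entries sqrt(b_k) = sqrt(k/2), from the recurrence (2.11)
    off = np.sqrt(np.arange(1, n) / 2.0)
    J = np.diag(off, 1) + np.diag(off, -1)   # Jacobi matrix (2.12); zero diagonal
    lam, V = np.linalg.eigh(J)               # eigen-decomposition J = V diag(lam) V^T
    x = lam                                  # abscissas (ascending)
    w = np.sqrt(np.pi) * V[0, :] ** 2        # weights: (integral of e^{-x^2}) * v_{j,0}^2
    return x, w

x, w = gauss_hermite(7)
x_ref, w_ref = np.polynomial.hermite.hermgauss(7)
```

Since `numpy.linalg.eigh` exploits the symmetric structure, this reproduces the reference nodes and weights to machine precision.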
Indeed, our approximation of expectations involving Gaussian random variables will use Gauss-Hermite quadrature in a critical way. In particular, suppose f is any function and X ~ N(μ, σ²).
Then, we can determine an n-point approximation of the expectation of f(X) w.r.t. X using
Gauss-Hermite quadrature as follows:

    E[f(X)] = ∫ f(x) (2πσ²)^{−1/2} e^{−(x−μ)²/(2σ²)} dx
            = (1/√π) ∫ f((√2 σ)u + μ) e^{−u²} du
            ≈ Σ_{j=1}^n (w_j/√π) f((√2 σ)x_j + μ).   (2.13)
It should be emphasized that (2.13) forms the backbone of our FE model as we discuss in Chapter
4.
We also note that, when we refer to an n-point Gauss-Hermite quadrature in the context of
Gaussian probability weighting functions, we are referring to the set of weights {w_j/√π} and the
set of abscissas {(√2 σ)x_j + μ}, where {w_j}_{j=1}^n and {x_j}_{j=1}^n are the weights and abscissas,
respectively, for the standard n-point Gauss-Hermite quadrature rule.
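The change of variables in (2.13) can be exercised directly. The sketch below estimates E[f(X)] for X ~ N(μ, σ²) and compares against a case with a closed form (for f(x) = x², E[f(X)] = μ² + σ²); the rule is exact here because Gauss-Hermite quadrature with n points integrates polynomials of degree less than 2n exactly.

```python
import numpy as np

def gauss_hermite_expectation(f, mu, sigma, n=30):
    # E[f(X)] for X ~ N(mu, sigma^2), per (2.13)
    x, w = np.polynomial.hermite.hermgauss(n)   # standard GH nodes/weights
    return np.sum((w / np.sqrt(np.pi)) * f(np.sqrt(2.0) * sigma * x + mu))

mu, sigma = 1.5, 2.0
est = gauss_hermite_expectation(lambda x: x**2, mu, sigma)
# closed form: E[X^2] = mu^2 + sigma^2 = 6.25
```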
2.4 Optimal Transport Theory: The Monge-Kantorovich Problem
As we will be discussing an alternate formulation of WC that utilizes the theory of optimal transport,
we will now very briefly introduce some of the related terminology and the basic transport problem.
Generally speaking, optimal transport theory is concerned with problems regarding supply and
demand, specifically methods of efficiently "transporting" the supply to meet the demand. A given
transport plan is regarded as being "efficient" if it minimizes some cost criterion, typically one based,
in some sense, on the common Euclidean metric. A classic example of optimal transport theory
involves finding minimum-cost transportation plans from "mines" to "factories" within the Cartesian
plane, where the mines and factories are located at given coordinates and the cost function in
question is the Euclidean metric.
More formally, we can consider the classic Monge-Kantorovich probabilistic formulation for a
probability space (Ω, B(Ω)) and probability measures μ and ν on Ω with a cost function c : Ω × Ω →
[0, +∞]:

    min { ∫ c dγ : γ ∈ Π(μ, ν) },   (2.14)

where

    Π(μ, ν) ≜ {γ ∈ P(Ω × Ω) : γ(X, Ω) = μ(X), γ(Ω, Y) = ν(Y) for Borel X, Y}   (2.15)

is the set of transport plans between μ and ν [16].
Since μ ⊗ ν ∈ Π(μ, ν) (i.e., the joint distribution of independent draws from μ and ν), we know that there
always exists at least one transport plan to consider. One can also prove that an optimal transport
plan always exists using the calculus of variations [16]. We will touch more on this in Chapter 3 when
we discuss a formulation of WC in terms of transport plans.
2.5 Properties of Special Functionals
In the next two sections, we will state (without proof) some of the necessary properties and
associated lemmas for both the MMSE functional defined in (3.11) and the quadratic Wasserstein
metric. These two functionals will be used during our discussion of the Wu and Verdú formulation
of WC in Chapter 3.
2.5.1 W₂ Functional
First, we define the following space and metric (the results in this section borrow heavily from [17]):

Definition The quadratic Wasserstein space on R is the collection of all Borel probability measures
with finite second moments and is denoted by P₂(R) ≜ {P : ∫ x² dF_P(x) < ∞}, where
F_P denotes the cumulative distribution function associated with P.
Definition The quadratic Wasserstein metric (or W₂) is a metric on P₂(R) defined for any P, Q ∈
P₂(R) as

    W₂(P, Q) ≜ inf { √(E[(X − Y)²]) : P_X = P, P_Y = Q }.   (2.16)

The following results are borrowed from [12]:
Lemma 2.5.1 a) For any P, Q ∈ P₂(R), we have

    W₂²(P, Q) = ∫₀¹ (F_P⁻¹(t) − F_Q⁻¹(t))² dt   (2.17)

for cumulative distribution functions F_P and F_Q.
b) (P, Q) ↦ W₂(P, Q) is weakly lower semi-continuous.
c) (P, Q) ↦ W₂²(P, Q) is convex in Q.
d) For any strictly increasing measurable function f : R → R, W₂(P_X, P_{f(X)}) = √(E[(X − f(X))²]).
Remark In the context of the Monge-Kantorovich transportation problem where our cost is given
by W₂, we note that the optimal coupling of random variables X and Y is X = F_P⁻¹(U) and
Y = F_Q⁻¹(U), where U ~ Unif(0, 1). That is to say, if P is atomless, then the optimal transportation
plan between P and Q is deterministic and is given by T = F_Q⁻¹ ∘ F_P, where ∘ is the composition
operator.
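The inverse-CDF formula (2.17) lends itself to direct numerical evaluation. The sketch below (an illustrative check, with distributions chosen arbitrarily) approximates W₂ between two Gaussians by a midpoint rule in t and compares against the known closed form for Gaussians, W₂(N(m₁, s₁²), N(m₂, s₂²))² = (m₁ − m₂)² + (s₁ − s₂)².

```python
import numpy as np
from statistics import NormalDist

def w2_via_inverse_cdfs(P, Q, n=20000):
    # W2(P, Q) per (2.17): integrate (F_P^{-1}(t) - F_Q^{-1}(t))^2 over t in (0, 1)
    t = (np.arange(n) + 0.5) / n                       # midpoint rule avoids t = 0, 1
    d = np.array([P.inv_cdf(ti) - Q.inv_cdf(ti) for ti in t])
    return np.sqrt(np.mean(d ** 2))

P = NormalDist(mu=0.0, sigma=1.0)
Q = NormalDist(mu=1.0, sigma=2.0)
w2 = w2_via_inverse_cdfs(P, Q)
# closed form: sqrt((0 - 1)^2 + (1 - 2)^2) = sqrt(2)
```

The midpoint rule is used because the inverse CDFs blow up at t = 0 and t = 1; the residual tail error is small for light-tailed distributions.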
2.5.2 MMSE Functional
In this subsection, we shall review two properties of the MMSE functional that are used for the
main results discussed in this thesis.
The MMSE functional is defined as

    mmse(X, σ²) ≜ min_g E[(X − g(σX + N))²]   (2.18)
                = E[var(X | σX + N)],   (2.19)

where X is distributed according to the probability measure P (with X₀ = σX), and N is some additive
random noise.
Using this, we have the following result from [18, 19, 20], assuming that Q ∈ P₂(R) and
σ > 0:

Lemma 2.5.2 a) Q ↦ mmse(Q, σ²) is weakly continuous.
b) Among all probability distributions with variance σ², Gaussian distributions maximize the MMSE
functional.
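Part b) of Lemma 2.5.2 can be checked empirically. The Monte Carlo sketch below compares the MMSE of a standard Gaussian X against that of a binary X = ±1 of the same unit variance, both observed through Y = σX + N with N ~ N(0, 1); for these two inputs the conditional-mean estimators are known in closed form (linear for the Gaussian input, tanh for the binary input), so no minimization over g is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 500_000, 1.0

# Gaussian input: E[X | Y] = sigma * Y / (1 + sigma^2), so mmse = 1 / (1 + sigma^2)
x = rng.standard_normal(n)
y = sigma * x + rng.standard_normal(n)
mmse_gauss = np.mean((x - sigma * y / (1 + sigma**2)) ** 2)

# Binary input of the same unit variance: E[X | Y] = tanh(sigma * Y)
xb = rng.choice([-1.0, 1.0], size=n)
yb = sigma * xb + rng.standard_normal(n)
mmse_binary = np.mean((xb - np.tanh(sigma * yb)) ** 2)
```

With σ = 1 the Gaussian input attains mmse = 1/2, and the binary input comes in strictly below it, consistent with the lemma.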
Chapter 3
Witsenhausen's Counterexample
In this chapter, we will formally present WC and analyze various properties and aspects of the
two-stage controller system. As we had stated previously, WC is a two-stage control system in which two
agents (hereafter referred to as Controller 1 (C1) and Controller 2 (C2)) attempt to "match" the value
of a random variable in two stages. Its difficulty arises from its non-classical information structure
(i.e., C1 and C2 have access to different information and can only communicate implicitly). We shall
proceed by overviewing the classical formulation (Witsenhausen's so-called "classical formulation")
and then briefly summarize a more recent alternate, but equivalent, formulation of WC based on
optimal transport theory as given by Wu and Verdú [12]. We shall also detail the necessary equations
for understanding WC in the context of the canonical controller examples, i.e., in the linear and n-bit
quantization cases, and briefly go through how Witsenhausen mathematically refuted Conjecture
1.1.4, the "LQG conjecture" mentioned in Chapter 1.
3.1 Classical Formulation
Consider the two-stage decentralized control system specified as follows and depicted in Figure 3-1.
Let X₀ and V be two independent nondefective random variables with finite second moments. Then,
we specify the state equations of the system as

• X₀ = X₀
• X₁ = X₀ + U₁
• X₂ = X₁ − U₂

and the observation equations of the system as
Figure 3-1: Classical Formulation of Witsenhausen's Counterexample. An initial random
variable X₀ is passed through the Stage I controller C1 to get U₁. The sum X₀ + U₁ is then passed,
with additive uncertainty given by the random variable V, to the Stage II controller C2 to get U₂.
We then take the difference between U₂ and the output of Stage I to get X₂ = (X₀ + U₁) − U₂. The
objective of the control system is to minimize the quadratic cost k²U₁² + X₂² for some scalar k.
" Y1 = X0
Y
X 1 + V,
where U1 and U2 are the control variables given by the Borel-measurable functions (i.e., control
laws) yI and Y2 as U1 =
7 1 (Y)
and U 2
=
2
(Y2 ). The cost function of the system is given by
    E_{X₀,V}[k²U₁² + X₂²],

and the optimal cost of the system is the minimum cost over all applicable Borel-measurable functions,

    min_{γ₁,γ₂ ∈ F(B)} E[k²U₁² + X₂²],   (3.1)

where F(B) is the set of Borel-measurable real-valued functions.
As Witsenhausen showed, the above basic two-stage control problem is equivalent to another
specification. Specifically, if we let g = γ₂ and f be such that f(X₀) = X₀ + γ₁(Y₁), then our cost
objective as specified in (3.1) is equivalent to the following:

    min_{f,g ∈ F(B)} J(f, g) = E[k²(f(X₀) − X₀)² + (f(X₀) − g(f(X₀) + V))²].   (3.2)

We assume, without loss of generality, that E[X₀] = E[V] = 0 and E[V²] = 1, just as Witsenhausen
did.
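To make (3.2) concrete, the Monte Carlo sketch below evaluates J(f, g) for the canonical case k = 0.2, σ = 5 using the 1-bit quantization controller f(x) = σ·sgn(x), for which the MMSE Stage II controller is g(y) = σ·tanh(σy). The Stage I term alone has the closed form 2k²σ²(1 − √(2/π)) ≈ 0.404, and the Stage II term is negligible at these parameters, so the simulated cost should land near 0.404.

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma, n = 0.2, 5.0, 1_000_000

def f(x):
    return sigma * np.sign(x)           # 1-bit quantization controller

def g(y):
    return sigma * np.tanh(sigma * y)   # MMSE estimator of f(X0) given f(X0) + V

x0 = sigma * rng.standard_normal(n)     # X0 ~ N(0, sigma^2)
v = rng.standard_normal(n)              # V ~ N(0, 1)
J = np.mean(k**2 * (f(x0) - x0)**2 + (f(x0) - g(f(x0) + v))**2)
# Stage I term alone: 2 * k^2 * sigma^2 * (1 - sqrt(2/pi)) ≈ 0.404
```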
In fact, we can simplify the functional optimization in (3.2) by recognizing that, relative to
any f, the controller g acts as the MMSE estimator for estimating f(X₀) using the measurement
f(X₀) + V. We shall then denote this MMSE estimator using standard functional analysis notation
as g[f]. Explicitly, we can write g[f] as

    g[f](y) ≜ E[f(X₀) | f(X₀) + V = y] = N₁[f](y)/N₀[f](y),   (3.3)
where

    N_k[f](y) ≜ ∫ f^k(x) φ(y − f(x)) dF(x),   (3.4)

with N₀[f](y) representing the p.d.f. of Y[f] ≜ f(X₀) + V and φ(u) ≜ (2π)^{−1/2} e^{−u²/2} representing
the standard Gaussian probability weighting function. We can interpret N_k[f] as representing the
kth conditional moment of f(X₀) under Y[f] (i.e., N_k[f](y)/N₀[f](y) = E[f^k(X₀) | Y[f] = y]).
Indeed, according to Lemma 3 of [3], we can further reformulate (3.1) using the Fisher information of the random variable Y[f] about f, I[f], which we represent as

    I[f] ≜ ∫ N₀[f](y) η[f]²(y) dy,   (3.5)

where η[f] is the score function defined as

    η[f](y) ≜ (d/dy) log N₀[f](y) = N₀'[f](y)/N₀[f](y).   (3.6)

Remark This η[f] is similar to the well-known score function in statistical inference. We can
interpret y as parameterizing f(X₀) through Y[f]. I[f], in this context, measures how much
"information" the variable Y[f] contains about f(X₀) when it takes on a certain value. The Fisher
information then directly corresponds to the ability of C2 to effectively estimate f(X₀) through the
relation

    E[(f(X₀) − g[f](Y[f]))²] = 1 − I[f].
Using these new definitions, we can further reformulate (3.2) as

    J*(k, σ) = min_{f ∈ F(B)} { k²E[(X₀ − f(X₀))²] + 1 − I[f] },   (3.7)

which is the standard version of the classical formulation of WC.
Remark We can now understand WC better in the context of (3.7). The overall cost of the system
is directly tied to C1's ability to send enough "information" to C2 through f(X₀) + V (as measured
by I[f]) so as to make C2's job easier, while keeping in mind that C1 is penalized for the amount of
"information" it sends (as measured through the quadratic cost).
Borrowing further notation from Witsenhausen, we let π(k², F) denote the problem of finding
(3.7), where k > 0, V ~ N(0, 1), and F is a valid cumulative distribution function with X₀ ~ F. If
X₀ ~ N(0, σ²), then we denote the problem of finding the solution to (3.7) by π(k², σ²).
Finally, Witsenhausen proved a number of desirable and important properties regarding optimal
controllers, which we now summarize in Theorem 3.1.1 [3]. Denoting the optimal controller for
π(k², F) as f*, we have:
Theorem 3.1.1 a) N₀[f], N₁[f], and g[f] are all analytic functions;
b) N₀[f] is the p.d.f. for Y[f] with N₀[f] > 0;
c) N_k'[f](y) = −yN_k[f](y) + N_{k+1}[f](y);
d) g'[f](y) = E[(f(X₀) − E[f(X₀) | Y[f] = y])² | Y[f] = y];
e) E[(f(X₀) − g[f](Y[f]))²] = 1 − I[f];
f) E[f*(X₀)] = 0 and E[f*²(X₀)] ≤ 4σ²;
g) f* is monotonically nondecreasing on the intersection of all convex sets of probability one
under F;
h) J[f] = k²E[(X₀ − f(X₀))²] + E[f²(X₀)] − E[g²[f](Y[f])];
i) the first variation of J[f] (in the sense of Gâteaux) is given by

    δJ|_f = ∫ G[f](x) δf(x) dF(x),   (3.8)

where

    G[f](x) = 2k²(f(x) − x) + ∫ φ(y − f(x)) η[f](y) [2(f(x) − y)² + η[f](y)(f(x) − y) − 2] dy;   (3.9)

furthermore, we necessarily have that δJ|_{f*} = 0.
The proofs of the above statements can be found in [3].
Remark The necessary variational equation G[f] serves as the main focus of this thesis and will be
explored in depth in Chapter 4, where we will derive it formally and investigate it numerically. In
fact, using the above facts as well as some results concerning the boundedness of optimal controllers,
Witsenhausen was able to prove the existence of optimal controllers. Utilizing the classical proof
technique of assuming that a minimizing sequence exists, Witsenhausen was able to show that the limit
of the cost of a suitably chosen controller pair sequence (f_n, g_n) existed and was bounded from above
and below by the optimal cost, J*. Thus, an optimal solution exists for WC, though Witsenhausen's
proof of this fact does not offer many clues as to the structure of such solutions.
3.2 Transport-Theoretic Formulation
Next, we will present an alternate formulation of the π(k², σ²) problem as given by Wu and Verdú in
[12], which we shall refer to throughout this thesis as the "TTF," or "Transport-Theoretic Formulation." To wit, Wu and Verdú decided to view this functional optimization problem of Witsenhausen's
in a different light. Specifically, they view it as one in which the goal is to find a probability
distribution, Q, that minimizes a weighted sum of the Wasserstein distance between Q and X₀ and
the MMSE of Q in the presence of additive white Gaussian noise, V.
Following the reasoning in [12], we can recast (3.1) as

    J*(k²σ², P) ≜ inf_{f ∈ F(B)} { k²σ² E[(X − f(X))²] + σ² mmse(f(X), σ²) },   (3.10)

where

    mmse(X, σ²) = min_{g ∈ F(B)} E[(X − g(σX + V))²]   (3.11)
                = E[var(X | σX + V)]   (3.12)

and X₀ = σX, with X distributed according to the probability measure P.
If X₀ ~ N(0, σ²), then X ~ N(0, 1), and we let (3.10) become simply

    J*(k², σ²) ≜ J*(k²σ², N(0, 1)).   (3.13)
Noting that the optimal controller must be deterministic, Wu and Verdú found that it could be
viewed in an optimal transport-theoretic sense. That is to say, they relaxed the Stage I control
f to be a random map P_{X₁|X₀} and then reframed the problem as follows:

    J*(k²σ², P) = σ² inf_{P_{X₁|X₀}} { k²E[(X₁ − X₀)²] + mmse(X₁, σ²) }
                = σ² inf_{Q ∈ P₂(R)} { k²W₂²(P, Q) + mmse(Q, σ²) }.   (3.14)

Thus, using the facts about optimal transport theory introduced in Chapter 2, we have that, for any
particular Q from which X₁ is distributed, the optimal control is given by f = F_Q⁻¹ ∘ F_P. Hence, the
problem of finding an optimal control in the set of measurable functions on R in (3.1) is equivalent
to the problem of finding the optimal output measure Q in the space of probability measures on
the metric space (R, |·|) equipped with the underlying Borel σ-field.
In fact, by restricting f to be of a certain type (e.g., affine), we are equivalently restricting our
search to distributions Q of a certain type (e.g., Gaussian). Table 1 on p. 5735 in [12] has a listing
of a few interesting restrictions and their consequences on the structure of Q, and we urge the reader
to consult that table to gain a better understanding of this relationship.
3.2.1 Main Theorems from TTF
The following results were put forth in [12] and are presented here mostly without proof, except for a
few that include broad sketches of the proofs as put forth by Wu and Verdú. Please consult
the original paper for more thorough proofs.
Theorem 3.2.1 Let P ∈ P₂(R) and σ > 0. Then, there exists a Q ∈ P₂(R) that achieves the
minimum in (3.14).
Proof Sketch We recall that Witsenhausen proved in [3] that any optimal Stage I control f must
have E[f²(X₀)] ≤ 4σ². Hence, we can restrict our search for an optimal Q to the weakly compact
subset {R ∈ P₂(R) : ∫ x² dF_R(x) ≤ 4σ²}. We can then use the fact that Q ↦ mmse(Q, σ²) and
(P, Q) ↦ W₂(P, Q) are lower semi-continuous w.r.t. Q to show that Q ↦ k²W₂²(P, Q) + mmse(Q, σ²)
is lower semi-continuous w.r.t. Q. Hence, there exists at least one Q that achieves the infimum,
since weakly lower semi-continuous functions attain their infimum on weakly compact sets.¹ ∎
Remark This proof of existence is more straightforward and simpler than Witsenhausen's original.
This illustrates the depth of understanding gained by viewing WC as a problem of finding the right
distribution for X₁.

¹One can see this by recalling the definition of compactness in terms of open covers and then using a representative
finite subcover together with the properties of the infimum of a set.
Theorem 3.2.2 Suppose that P has a real analytic and strictly positive density. Then, we have the
following:
a) Any optimal Q satisfying (3.14) also has a real analytic density and unbounded support, with
E[Q] = E[P] and var(Q) bounded above in terms of var(P);
b) Any optimal Stage I controller f is a strictly increasing, unbounded, piecewise real analytic function with a real analytic left inverse.
Proof Sketch The key to this proof is to perturb the optimal Stage I output random variable
Y[f] ~ Q ∈ P₂(R) in order to derive the following variational equation (which Witsenhausen first
derived in [3]) that serves as a necessary condition for any optimal f (P-a.e.):

    2k²(f − id) = (φ' * (η[f]² + 2η[f]')) ∘ f,   (3.15)

where id is the identity function and * represents the convolution operator.
In addition, we then note that there is a left inverse, h ∘ f = id, where h is given by

    h = id − (1/(2k²)) (φ' * (η² + 2η')).   (3.16)

The existence of h means that f must necessarily be injective into R and therefore strictly increasing.
Since η is analytic, φ' * (η² + 2η') is also analytic, and hence h is analytic. This then implies that
f is piecewise real analytic and, since h is continuous, f's range must be unbounded.
The former conclusion of the first part of the theorem can be found by using facts about composition of analytic functions and the Cauchy-Schwarz inequality. ∎
Remark We note that the analyticity of h as given in (3.16) does not imply that f is analytic as well (as Wu and Verdú pointed out in Footnote 3 on p. 5735 of [12]). Proving that f is analytic would require proving that Q is supported on the entirety of ℝ.

Also, as Wu and Verdú discuss, the (piecewise) analyticity of f suggests that finding series approximations to the solution of the variational equation (3.15) could be a valid approach to finding the optimal f. However, the authors do note that the only polynomial solution to (3.15) is the affine controller, i.e., Q being Gaussian.
Indeed, we should point out that the φ' in the alternate version of the necessary condition refers to the derivative function (d/dy)φ(y) and necessitates use of the chain rule. For example, if a is some real function, we have that (φ(a(y)))' = a'(y)φ'(a(y)).
Finally, and perhaps most importantly, it should be noted that, since f as defined above (F_Q⁻¹ ∘ F_P) is right-continuous, the variational equation must be identically zero at all points. The right-continuity, in combination with the density in ℝ of the set of all points at which the necessary condition resolves to zero, makes this evidently clear.
Corollary 3.2.3 The optimal Stage I controller for (3.14) cannot be piecewise affine.

Proof Sketch The key to the proof is the use of the identity theorem for holomorphic functions and its consequences when we consider h ∘ f on a small-enough open interval of the real line. We are also assuming, of course, that f has at least one discontinuity (a so-called "jump") and/or at least one nonsmooth "bend." Otherwise, f is completely affine, and our proof arguments do not follow.
However, as we shall see, the affine controller is not always optimal and, in fact, hardly ever is.
Theorem 3.2.4 The map P ↦ J*(k², σ², P) is concave, weakly upper semi-continuous, and translation invariant. In addition, the following inequality holds:

0 ≤ J*(k², σ², P) ≤ min{k²σ² var(P), σ² mmse(P, σ²)}.    (3.17)
Corollary 3.2.5 Let P, Q ∈ P₂(ℝ) and σ > 0. Then:

a) For any Q, J*(k², σ², P * Q) ≥ J*(k², σ², P).

b) For Gaussian P, σ² ↦ J*(k², σ²) is increasing.

Theorem 3.2.6 σ² ↦ J*(k², σ²) is increasing, subadditive, and Lipschitz continuous, with 0 ≤ ∂J*/∂σ² ≤ k²/(k² + 1).
Lemma 3.2.7

lim_{σ²→0} J*(k², σ², P) = (k²/(k² + 1)) var(P),    (3.18)

which is achieved by the affine controller f(x) = (k²/(k² + 1)) x.
Lemma 3.2.8 If P ~ N(0, 1) and k < 0.564, we have that

lim_{σ²→∞} J*(k², σ²) < 1 = lim_{σ²→∞} J*_L(k², σ²),    (3.19)

where J*_L denotes the cost of the best linear controller.
Remark The last few theorems are mainly analytical results and add to our understanding of the underlying geometry of the problem. From these and the facts specified in Chapter 2 about the MMSE functional and the Wasserstein distance, we see that the underlying cost functional is a weighted sum of a convex functional (the Wasserstein distance) and a concave functional (the MMSE functional). General solutions for such problems, even in the finite-dimensional case, remain elusive, and the solution of these optimization problems is currently open. As such, it is not surprising that WC has not been solved for any particular choice of σ and k, let alone in full generality.
3.3
The "Counterexample" Aspect
In this section, we will explore Witsenhausen's refutation of the LQG conjecture. We shall start by
discussing the two canonical controllers and then show that, in certain regimes, a simple nonlinear
controller outperforms the best linear controller by a large margin.
3.3.1
Optimal Linear Controls
To begin, let us consider the most basic linear control law. We define the following:

f_λ(x) ≜ λx,    (3.20)

where λ ∈ ℝ is the linear coefficient.
Lemma 3.3.1 summarizes the main formulas related to a linear controller. For ease of exposition,
we have placed the derivations at the end of the chapter in Section 3.4.
Lemma 3.3.1 For the case of linear controllers, we have the following:

a) N₀[f_λ](y) = (1/√(2π(1 + σ²λ²))) e^(−y²/(2(1 + σ²λ²))), i.e., the N(0, 1 + σ²λ²) density,

b) N₁[f_λ](y) = (σ²λ²/(1 + σ²λ²)) y N₀[f_λ](y),

c) g[f_λ](y) = (σ²λ²/(1 + σ²λ²)) y,

d) η[f_λ](y) = −y/(1 + σ²λ²),

e) J[f_λ] ≜ J[λ] = k²(1 − λ)²σ² + σ²λ²/(1 + σ²λ²).
Now, in order to find the optimal linear coefficient, λ*, that minimizes J[f_λ], we can proceed in two ways. We can directly differentiate the above algebraic expression for J[f_λ] w.r.t. λ, set the resulting expression to zero, and then solve for λ. We could also utilize the necessary condition being identically zero everywhere to derive the equation for the optimal λ*. That being said, we will use the former approach, as it is more straightforward (though we would get the same expression using the necessary condition).

Lemma 3.3.2 The optimal linear coefficient λ* that minimizes J[f_λ] has to satisfy the following equation:

k²(1 − λ*) = λ*/(1 + σ²(λ*)²)².    (3.21)
Proof Taking the partial derivative of J[f_λ] w.r.t. λ, we have

∂J[f_λ]/∂λ = ∂/∂λ [k²(1 − λ)²σ² + σ²λ²/(1 + σ²λ²)]
= −2k²σ²(1 − λ) + σ² · ((1 + σ²λ²)(2λ) − (λ²)(2σ²λ))/(1 + σ²λ²)²
= −2k²σ²(1 − λ) + 2σ²λ/(1 + σ²λ²)²
= 2σ² (−k²(1 − λ) + λ/(1 + σ²λ²)²).

Then, setting this expression to zero, dividing out the constants, and rearranging the two terms, we get the desired formula. □
Thus, the optimal λ* is a function of k and σ with a value satisfying (3.21). That is, if we let R₃.₂₁ denote the set of real roots of (3.21), then we define the optimal linear coefficient to be a function of k and σ as

λ*(k, σ) ≜ argmin_{r ∈ R₃.₂₁} J[f_r].    (3.22)

Now, as Witsenhausen pointed out, (3.21) can have at most three solutions, so the space of solutions we end up searching over to find the optimal λ* turns out to be extremely small. Indeed, we can visualize this necessary condition through the zeros of the function G[λ], which we define as

G[λ] ≜ k²(λ − 1) + λ/(1 + σ²λ²)².    (3.23)
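To make this concrete, here is a small self-contained sketch (our own illustrative code, not from the thesis): it brackets the sign changes of G[λ] on a grid, bisects each bracket, and then applies the selection rule (3.22).

```python
def G(lam, k, sigma):
    # First-order condition (3.23); its zeros are the candidate coefficients.
    return k**2 * (lam - 1.0) + lam / (1.0 + sigma**2 * lam**2) ** 2

def J(lam, k, sigma):
    # Cost of the linear controller f_lambda (Lemma 3.3.1e).
    return k**2 * (1.0 - lam) ** 2 * sigma**2 + sigma**2 * lam**2 / (1.0 + sigma**2 * lam**2)

def roots_of_G(k, sigma, lo=-0.5, hi=1.5, n=20000):
    # Bracket sign changes of G on a fine grid, then bisect each bracket.
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        if G(a, k, sigma) * G(b, k, sigma) < 0.0:
            for _ in range(80):  # plain bisection
                m = 0.5 * (a + b)
                if G(a, k, sigma) * G(m, k, sigma) <= 0.0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return roots

k, sigma = 0.2, 5.0
roots = roots_of_G(k, sigma)                          # three roots, near 0.04, 0.3, 0.96
lam_star = min(roots, key=lambda r: J(r, k, sigma))   # selection rule (3.22)
```

For the canonical pair k = 0.2, σ = 5 (so k²σ² = 1), this recovers the three stationary points visible in Figure 3-2, the outer two of which have nearly equal cost J ≈ 0.96.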
Figure 3-2: Absolute Magnitude (|G[λ]|) of the First-Order Condition for the Linear Case. We set k = 0.2 and σ = 5.
Figure 3-3: Solutions to (3.21) for k²σ² = 1. Each colored value represents a distinct solution to (3.21) for any designated (k, σ) pair, where we iterate through all such pairs by changing k on the abscissa axis.
In Figure 3-2, we graph |G[λ]| for the canonical case of k = 0.2 and σ = 5. In this case, we can clearly see three distinct solutions at roughly 0.04, 0.3, and 0.95. In fact, if we were to graph all the solutions to (3.21), we would notice a very interesting bifurcation that emerges in the special case when k²σ² = 1 and k < 1. We end up having two equal-cost solutions driving upwards and downwards to one and zero, respectively. These two curves represent the optimal solutions, with the third, concave curve representing a suboptimal solution. In Chapter 4, we explore this behavior in more depth using our finite element model, but for now, we present this visually in Figure 3-3.
Before moving on, we note that, for very large k, the optimal linear coefficient λ* is found through a simple manipulation of the equality in (3.21) as follows:

k²(1 − λ) = λ/(1 + σ²λ²)²  ⟹  λ = 1 − λ/(k²(1 + σ²λ²)²),

which, as k → ∞, in turn yields

λ* → 1.    (3.24)

Thus, we get a minimum possible cost, in the limit of large k, of

J[λ*] → σ²/(1 + σ²),    (3.25)

which is the traditional expression for the MMSE of a Gaussian random variable observed in unit-variance Gaussian noise. This result agrees with our intuition, since, in the limit of large k, the Stage II cost is inconsequential in comparison. In this situation, the only quantity of interest to be minimized is the energy of the Stage I controller. This energy minimization is obtained by using a zero-energy policy of allowing the value of X₀ to "pass through" Stage I unchanged. Hence, our optimal linear coefficient is λ* = 1.

On the other extreme, if we consider the case when k = 0, then quite clearly the optimal linear coefficient is λ* = 0, as the equality in (3.21) is satisfied by such an assignment. This is also confirmed via a simple graphical investigation. Thus, if we have absolutely no cost on our ability to control what the Stage II controller sees, then we can force f(X₀) = 0 and then have our MMSE estimator simply guess zero with fidelity one. The optimal cost in this case is quite clearly zero, and this strategy is known as a zero-forcing control policy.
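Both limiting regimes are easy to check numerically. The sketch below (our own illustration; the function names are hypothetical) iterates the rearranged fixed-point form of (3.21) for a large k and confirms that λ* ≈ 1 with cost near σ²/(1 + σ²):

```python
def J(lam, k, sigma):
    # Cost of the linear controller f_lambda (Lemma 3.3.1e).
    return k**2 * (1.0 - lam) ** 2 * sigma**2 + sigma**2 * lam**2 / (1.0 + sigma**2 * lam**2)

def lam_star_large_k(k, sigma, iters=100):
    # Fixed-point iteration of  lambda = 1 - lambda / (k^2 (1 + sigma^2 lambda^2)^2),
    # the rearranged form of (3.21); the map is contractive for large k.
    lam = 1.0
    for _ in range(iters):
        lam = 1.0 - lam / (k**2 * (1.0 + sigma**2 * lam**2) ** 2)
    return lam

sigma = 5.0
lam = lam_star_large_k(100.0, sigma)   # very close to 1 for k = 100
cost = J(lam, 100.0, sigma)            # approaches sigma^2 / (1 + sigma^2) = 25/26
```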
3.3.2
1-Bit Quantization Controls

When k is sufficiently small but nonzero, the optimal linear control function f_λ* is not necessarily the best control law one can choose. Indeed, for some parameter specifications of k and σ, simple nonlinear control laws perform better. The 1-bit quantization controls are a particular family of nonlinear control laws that can perform better and are the next controls we investigate.
To begin, we define the following:

f_α(x) ≜ { −α, x < 0;  α, x ≥ 0 },    (3.26)

where α ∈ ℝ is the 1-bit quantization coefficient. We note that we may also represent (3.26) more compactly as f_α(x) = α sgn(x).
As in the linear case, we summarize the relevant formulas in Lemma 3.3.3 and delay the proofs
until Section 3.4.
Lemma 3.3.3 For the case of 1-bit quantization controllers, we have the following:

a) N₀[f_α](y) = √(2π) φ(y) φ(α) cosh(αy),

b) N₁[f_α](y) = α tanh(αy) N₀[f_α](y),

c) g[f_α](y) = α tanh(αy),

d) η[f_α](y) = −y + α tanh(αy),

e) J[f_α] ≜ J[α] ≜ k²(σ² − 2√(2/π) σα + α²) + √(2π) α² φ(α) ∫ sech(αy) φ(y) dy.
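Part e) is straightforward to evaluate numerically. The following sketch is our own illustration (the thesis evaluates the integral with Gauss–Hermite quadrature; here a plain midpoint rule on a truncated interval suffices, since sech(αy)φ(y) decays rapidly):

```python
import math

def phi(t):
    # standard Gaussian density
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def J_quant(alpha, k, sigma, n=4000, half_width=12.0):
    # Lemma 3.3.3e:  J[alpha] = k^2 (sigma^2 - 2 sqrt(2/pi) sigma alpha + alpha^2)
    #                + sqrt(2 pi) alpha^2 phi(alpha) * Int sech(alpha y) phi(y) dy.
    h = 2.0 * half_width / n  # midpoint rule for the transcendental integral
    integral = h * sum(
        phi(-half_width + (i + 0.5) * h) / math.cosh(alpha * (-half_width + (i + 0.5) * h))
        for i in range(n)
    )
    stage1 = k**2 * (sigma**2 - 2.0 * math.sqrt(2.0 / math.pi) * sigma * alpha + alpha**2)
    stage2 = math.sqrt(2.0 * math.pi) * alpha**2 * phi(alpha) * integral
    return stage1 + stage2

cost = J_quant(5.0, 0.2, 5.0)   # Witsenhausen's choice alpha = sigma = 5, k*sigma = 1
```

For α = σ = 5 with k = 0.2 this gives J ≈ 0.404, compared with ≈ 0.96 for the best linear controller of the previous section.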
Now, in order to find the optimal 1-bit quantization coefficient, α*, we can once again choose one of two options. The first option is to minimize J[f_α] directly via partial differentiation w.r.t. α, and the second is to use the necessary functional equation. As in the linear case, for ease of exposition, we shall pursue the former course of action.
Lemma 3.3.4 The optimal 1-bit quantization coefficient α* that minimizes J[f_α] has to satisfy the following equation:

2k²(α* − √(2/π) σ) = √(2π) α* φ(α*) ∫ sech(α*y) φ(y) [(α*)² + α*y tanh(α*y) − 2] dy.    (3.27)
Proof We can proceed as we did in Lemma 3.3.2. Taking the partial derivative of J[f_α] w.r.t. α, we have

∂J[f_α]/∂α = ∂/∂α [k²(σ² − 2√(2/π) σα + α²)] + √(2π) ∂/∂α [α² φ(α) ∫ sech(αy) φ(y) dy]
= 2k²(α − √(2/π) σ) + √(2π) (2αφ(α) − α³φ(α)) ∫ sech(αy) φ(y) dy − √(2π) α²φ(α) ∫ y sech(αy) tanh(αy) φ(y) dy
= 2k²(α − √(2/π) σ) − √(2π) α φ(α) ∫ sech(αy) φ(y) [α² + αy tanh(αy) − 2] dy,

using φ'(α) = −αφ(α). Setting ∂J[f_α]/∂α = 0 and moving the second summand on the RHS over, we get the desired equation. □
Remark Unlike in the linear case, the necessary equation for the optimal α is transcendental (because of the presence of ∫ sech(αy)φ(y) dy), so direct analytic evaluation of it is difficult. Luckily, as we shall do quite frequently in Chapter 4, we can approximate this integral using quadrature rules (specifically, the Gauss–Hermite quadrature rule). As such, we can approximately graph (3.27) in a similar manner to how we treated (3.21) and qualitatively discern properties of the optimal α* (which we do in Figure 3-4). However, further discussion of this behavior is, as it was in the linear case, outside the scope of this thesis.
3.3.3
Refuting the Conjecture
Using the above facts concerning linear controllers and 1-bit quantizers, Witsenhausen was able to prove that, in a particular regime of π(k², σ²), a rather naive 1-bit quantizer can outperform the best linear controller by a fairly large margin (effectively refuting Conjecture 1.1.4). Our resulting proof mirrors Witsenhausen's, and we reproduce the mechanics of it here for the sake of completeness.

Theorem 3.3.5 For π(k², σ²), linear controllers are not optimal in the regime of small k with kσ = 1.
Figure 3-4: Absolute Magnitude of (3.28) for the 1-Bit Quantization Case. We set k = 0.2 and σ = 5 and use Q = 50 quadrature weights.
Proof To begin, recall our discussion of optimal linear controls in the regime of small k in Section 3.3.1. In that narrative, we pointed out that, as k becomes smaller, the optimal λ approaches both zero and one (i.e., we have two optimal λ's of equal cost that approach zero and one, respectively). In this regime of small k, J*[f_λ] → 1.

Keeping this in mind, let us define the very simple nonlinear controller

f_σ(x) = σ sgn(x).

Using our previously stated equation for J[f_α] combined with kσ = 1, we can expand the cost of this nonlinear controller to be

J[f_σ] = 2(1 − √(2/π)) + √(2π) σ² φ(σ) ∫ sech(σy) φ(y) dy.    (3.29)

Now, since sech(x) ≤ 1 for all x ∈ ℝ, we have from Lemma 16 in [3] that

∫ sech(x) φ(x) dx < ∫ φ(x) dx = 1,

and, using this, we can bound the RHS of (3.29) to get

J[f_σ] ≤ 2(1 − √(2/π)) + √(2π) σ² φ(σ).

Thus, taking the limit as k → 0, coupled with the facts that σ = 1/k and lim_{k→0} √(2π)σ²φ(σ) = 0, we have that lim_{k→0} J[f_σ] ≤ 2(1 − √(2/π)) ≈ 0.404. Since this upper bound is strictly less than the best possible limiting performance of the linear controller, we have that linear controllers are not optimal controllers in all regimes of π(k², σ²). □
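The bound used in the final step can be checked directly. The snippet below (our own illustrative code) evaluates the upper bound 2(1 − √(2/π)) + √(2π)σ²φ(σ) along the constraint kσ = 1 and confirms that it sits strictly below the linear-controller limit of 1 and decreases toward ≈ 0.404:

```python
import math

def quantizer_upper_bound(sigma):
    # Bound on J[f_sigma] obtained from sech(x) <= 1 (with k*sigma = 1):
    #   J[f_sigma] <= 2 (1 - sqrt(2/pi)) + sqrt(2 pi) sigma^2 phi(sigma).
    phi_sigma = math.exp(-sigma**2 / 2.0) / math.sqrt(2.0 * math.pi)
    return 2.0 * (1.0 - math.sqrt(2.0 / math.pi)) + math.sqrt(2.0 * math.pi) * sigma**2 * phi_sigma

bounds = [quantizer_upper_bound(s) for s in (2.0, 5.0, 10.0, 50.0)]
# non-increasing and strictly below 1, tending to 2 (1 - sqrt(2/pi)) ~= 0.404
```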
Remark As Witsenhausen dutifully points out, his token nonlinear controller is "far from optimal." Indeed, he notes that, by using (2n+1)-bit quantizers together with a gradient descent method, one may obtain much better controllers in the regime of very small k. This agrees with our intuition and countless papers, including the best known results to date, which we discussed in Chapter 1. This follows since, with a small penalty on its movement, C₁ has greater freedom to communicate more explicitly through Y[f] by sending appropriately spaced messages (w.r.t. F), which are represented by the quantization levels in any (2n+1)-bit quantization scheme.
A few decades after Witsenhausen's paper, Mitter and Sahai were able to construct families of n-bit quantization schemes that perform arbitrarily better than the optimal linear control in the regime of small k [4]. In particular, the cost of their n-bit quantizers approaches zero in the regime of very small k. This result, combined with Witsenhausen's original example, has directed most of the work on this problem toward efficiently searching the space of (2n+1)-bit quantizers for the optimal quantizers (or, more realistically, the best-yet performing, as the space of (2n+1)-bit quantizers is infinite). Using rather elegant and creative approaches, researchers have been able to improve the cost little by little over the years. The best known result that this author is aware of is the one referenced in Chapter 1, that of Li, Marden, and Shamma. For the canonical problem of π(0.2², 5²), they were able to design a 25-bit quantizer with cost J = 0.1670790 [6].

3.4
Proofs for Chapter 3

3.4.1
Proof of Lemma 3.3.1
Proof a)

N₀[f_λ](y) = ∫ φ(y − f_λ(x)) dF(x)
= ∫ φ(y − λx) (1/(√(2π)σ)) e^(−x²/(2σ²)) dx
= (1/(2πσ)) ∫ exp(−(y − λx)²/2 − x²/(2σ²)) dx
= (1/√(2π(1 + σ²λ²))) e^(−y²/(2(1 + σ²λ²))),

where the last step follows by completing the square in x and integrating out the resulting Gaussian; from this we gather that Y[f_λ] ~ N(0, 1 + σ²λ²).

b) The same completion of the square shows that the inner Gaussian in x has mean (σ²λ/(1 + σ²λ²)) y, so

N₁[f_λ](y) = ∫ f_λ(x) φ(y − f_λ(x)) dF(x) = ∫ λx φ(y − λx) dF(x) = (σ²λ²/(1 + σ²λ²)) y N₀[f_λ](y).

c)

g[f_λ](y) = N₁[f_λ](y)/N₀[f_λ](y) = (σ²λ²/(1 + σ²λ²)) y.

d)

η[f_λ](y) = −y + g[f_λ](y) = −y/(1 + σ²λ²).

e)

J[f_λ] = k²E[(f_λ(X₀) − X₀)²] + E[f_λ²(X₀)] − E[g[f_λ](Y[f_λ])²]
= k²(λ − 1)²E[X₀²] + λ²σ² − (σ²λ²/(1 + σ²λ²))² E[Y[f_λ]²]
= k²(λ − 1)²σ² + λ²σ² − (σ²λ²)²/(1 + σ²λ²)
= k²(λ − 1)²σ² + (λ²σ²(1 + σ²λ²) − σ⁴λ⁴)/(1 + σ²λ²)
= k²(λ − 1)²σ² + σ²λ²/(1 + σ²λ²). □

3.4.2
Proof of Lemma 3.3.3

Proof a)

N₀[f_α](y) = ∫ φ(y − f_α(x)) dF(x)
= φ(y + α) ∫_{−∞}^0 dF(x) + φ(y − α) ∫_0^∞ dF(x)
= (1/2)(φ(y + α) + φ(y − α))
= (1/2) √(2π) φ(y) φ(α) (e^(αy) + e^(−αy))
= √(2π) φ(y) φ(α) cosh(αy).

b) Using the relation (d/dy) N_k[f](y) = −y N_k[f](y) + N_{k+1}[f](y), which holds for any f and k, we compute

(d/dy) N₀[f_α](y) = √(2π) φ(α) (d/dy)(φ(y) cosh(αy))
= √(2π) φ(α) (−y φ(y) cosh(αy) + α φ(y) sinh(αy))
= −y N₀[f_α](y) + α tanh(αy) N₀[f_α](y),

so that

N₁[f_α](y) = (d/dy) N₀[f_α](y) + y N₀[f_α](y) = α tanh(αy) N₀[f_α](y).

c)

g[f_α](y) = N₁[f_α](y)/N₀[f_α](y) = α tanh(αy).

d)

η[f_α](y) = −y + g[f_α](y) = −y + α tanh(αy).

e)

J[f_α] = k²E[(X₀ − f_α(X₀))²] + E[f_α²(X₀)] − E[g[f_α](Y[f_α])²]
= k²(σ² − 2E[X₀ f_α(X₀)] + α²) + α² − E[α² tanh²(αY[f_α])]
= k²(σ² − 2√(2/π) σα + α²) + α² ∫ (1 − tanh²(αy)) N₀[f_α](y) dy
= k²(σ² − 2√(2/π) σα + α²) + α² ∫ sech²(αy) N₀[f_α](y) dy
= k²(σ² − 2√(2/π) σα + α²) + √(2π) α² φ(α) ∫ sech(αy) φ(y) dy,

where we used E[X₀ f_α(X₀)] = α E[|X₀|] = √(2/π) σα and, in the last step, part a) together with sech(αy) cosh(αy) = 1. □
Chapter 4

Finite Element Model for Witsenhausen's Counterexample

In this chapter, we detail the computational model that we use to investigate WC, particularly its necessary functional equation, which was first introduced in Chapter 3 in two forms (the classical and TTF forms). We proceed by first deriving, in full, the necessary functional equation for optimality in the classical format and show that this equation is equivalent to that of the TTF. We then specify our finite element model and comment on its suitability as a numerical tool for the investigation of WC. Following this, we present numerical results through which we verify the accuracy of our model, and then we present the results of using our FE model to approximately solve the necessary condition at certain points of interest. In our numerical investigations, we will be considering the standard version of WC, π(k², σ²), as we have throughout this thesis.

4.1
Necessary Condition for WC
As we stated earlier in Theorem 3.1.1, the first variation of the WC cost functional J[f] has a succinct and analytically derivable form. In this section, we shall formally derive this first variation, show that the resulting condition is equivalent to the necessary variational equation obtained by Wu and Verdú in [12], and then discuss various aspects of this equation, including its computational difficulties.

4.1.1
Derivation

Consider Theorem 4.1.1, which incorporates Lemma 9 from [3] and Theorem 2 from [12]:
Theorem 4.1.1 In WC, if f* is an optimal controller, then it satisfies the following exactly for all x ∈ ℝ:

G[f*](x) = 2k²(f*(x) − x) + ∫ φ(y − f*(x)) η[f*](y) [2(y − f*(x))² + η[f*](y)(y − f*(x)) − 2] dy    (4.1)

= 2k²(f*(x) − x) + (φ' * (η²[f*] + 2η'[f*])) ∘ f*(x)    (4.2)

= 2k²(f*(x) − x) + 2 (φ * (η''[f*] + η[f*] η'[f*])) ∘ f*(x)    (4.3)

= 0.
Proof To begin, recall the cost function form from Theorem 3.1.1:

J[f] = k²E[(f(X₀) − X₀)²] + 1 − I[f],    (4.4)

where I[f] = ∫ η²[f](y) N₀[f](y) dy. Now, using the standard condition for first-order optimality from the calculus of variations, we need to compute the first variation of J[f], δJ[f],¹ and set the result equal to zero.

Using (2.3) from Chapter 2 with some arbitrary variation θ, the Stage I term gives

∂/∂ε E[((f(X₀) + εθ(X₀)) − X₀)²] |_{ε=0} = ∫ 2((f(x) + εθ(x)) − x) θ(x) dF(x) |_{ε=0} = 2 ∫ (f(x) − x) θ(x) dF(x).

For the Stage II term, recall that η[f] = N₀'[f]/N₀[f], so a perturbation of f enters I[f] only through N₀[f]. Denote by δN₀ the derivative ∂/∂ε N₀[f + εθ] |_{ε=0}. Then

δη = (δN₀)'/N₀[f] − η[f] δN₀/N₀[f],

and hence

∂/∂ε I[f + εθ] |_{ε=0} = ∫ (2η[f] N₀[f] δη + η²[f] δN₀) dy
= ∫ (2η[f] (δN₀)' − η²[f] δN₀) dy
= −∫ (2η'[f] + η²[f]) δN₀ dy,

where the last step uses integration by parts. In addition, using φ'(t) = −tφ(t), we have

δN₀(y) = ∂/∂ε ∫ φ(y − f(x) − εθ(x)) dF(x) |_{ε=0} = ∫ (y − f(x)) φ(y − f(x)) θ(x) dF(x).

Substituting this expression into the previous one and exchanging the order of integration, we obtain

δJ[f] = ∫ G[f](x) θ(x) dF(x),

where

G[f](x) = 2k²(f(x) − x) + ∫ (η²[f](y) + 2η'[f](y)) (y − f(x)) φ(y − f(x)) dy.

Since (y − f(x)) φ(y − f(x)) = φ'(f(x) − y), the integral above is precisely the convolution (φ' * (η²[f] + 2η'[f])) evaluated at f(x), which gives the form (4.2). A single integration by parts of the η' term,

∫ 2η'[f](y) (y − f(x)) φ(y − f(x)) dy = −∫ 2η[f](y) (1 − (y − f(x))²) φ(y − f(x)) dy,

collects the integrand into the bracketed form (4.1), while one further integration by parts of (4.2), using φ' * h = φ * h', yields (4.3). In all of the integrations by parts we have used repeatedly that lim_{t→±∞} φ(t) = 0.

Hence, G[f](x) = 0 must hold F-a.e., since we require that δJ[f] = 0 for every admissible variation θ. Furthermore, as pointed out by Wu and Verdú in [12], we must in fact have G[f*](x) = 0 for all x ∈ ℝ. We can see this by considering the fact that {x ∈ ℝ : G[f*](x) = 0} is dense in ℝ, coupled with the right-continuity of optimal controllers.

Thus, we have our formulas and have proved the theorem. □

¹Sometimes we may use the notation δJ|_f for the first variation of J[f].
4.1.2
Discussion

Before moving on to specifying our finite element model, let us briefly discuss some aspects of the variational equation we just derived in Theorem 4.1.1. For one, we note the rather complicated nature of the equation in terms of f. Not only does f appear inside and outside the integral expression, its appearance inside the integral is in the exponent of the Gaussian probability weight function as well as in the score function. The score function itself involves f appearing within the ratio of two integrals, both of which include f in the exponents of a Gaussian probability weight function. This, coupled with the fact that the integrand of the necessary condition has three separate factors involving f (namely φ(y − f(x)), η[f](y), and [2(y − f(x))² + η[f](y)(y − f(x)) − 2]), indicates extreme difficulty in analyzing this expression analytically as well as in modeling it numerically (though we do the latter successfully in this thesis through clever use of Gauss–Hermite quadrature).

In addition, we can see that no simple (2n+1)-bit quantization scheme could possibly satisfy this equation. This is because any f that satisfies the necessary condition has to incorporate uniquely identifiable information regarding x. Therefore, if f(x) = a on some interval x ∈ (a_left, a_right), then the second summand in G[f](x) would be the same for all x ∈ (a_left, a_right), whereas the first summand changes with x. Thus, there is no way a (2n+1)-bit quantization scheme could make G[f](x) zero for all values of x, as we know it must be if f were to satisfy the necessary condition. Now, as we shall see in Section 4.3.2, we can approximately satisfy the necessary condition at quadrature points for controllers that have flat sections (like the (2n+1)-bit quantization schemes); however, we cannot satisfy it everywhere in general for such controllers.
4.2
Finite Element Model
In this section, we shall present the main contribution of this thesis, our finite element (FE) model
using rational basis functions. We will specify fully our M-element FE model, discuss our choice
of rational functionals as our basis functions, and then touch on the degrading impact that "edge
effects", or the effects of boundary conditions, have on the numerical computation of the MMSE
estimator.
4.2.1
Model Parameters

To begin, let us choose two nested, connected compact intervals in which to place our basis functions. We want to choose a bounded interval in which "most of" the probability of the underlying initial state variable X₀ resides and an enclosing bounded interval in which we end up placing our basis functions. The former interval models the region in which we want to most accurately model functions, since that is where most of the probability of X₀ will be calculated,² whereas the latter interval will be used to provide a wide-enough buffer for our numerical computation.³ Denoting these intervals by B_I and B_O, respectively, we can choose them in the context of π(k², σ²) as

B_I = [−Kσ, Kσ],    (4.5)
B_O = [−Lσ, Lσ],    (4.6)

where K and L are positive reals such that K < L. We choose symmetric intervals since the underlying probability distributions for X₀ and V are symmetric.

Remark In the context of our implementation, we further specify that both K and L are powers of 2 so that we can efficiently search for the appropriate sub-interval element using a binary search algorithm.
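As a sketch of the lookup this remark refers to (our own illustration; with evenly spaced points the index is also directly computable, but bisection generalizes to non-uniform meshes), Python's bisect module finds the element containing a query point in O(log M) time:

```python
import bisect

def element_index(x, a):
    # a = [a_0, ..., a_M]: sorted boundary points of B_O.
    # Returns j in {1, ..., M} such that x lies in the element S_j = [a_{j-1}, a_j)
    # (the last element S_M is closed on the right).
    if not (a[0] <= x <= a[-1]):
        raise ValueError("x lies outside B_O")
    return min(bisect.bisect_right(a, x), len(a) - 1)

a = [-4.0 + 0.5 * i for i in range(17)]   # e.g. L*sigma = 4 with M = 16 elements
j = element_index(0.3, a)                 # 0.3 lies in [0.0, 0.5), i.e. element 9
```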
Proceeding, let us partition B_O using M + 1 evenly spaced boundary points. Denoting these boundary points by the set A ≜ {a₀ = −Lσ, …, aⱼ, …, a_M = Lσ}, we can refer to the jth subinterval (hereafter referred to as the jth element), Sⱼ, by

Sⱼ = [aⱼ₋₁, aⱼ) for j < M,  and  S_M = [a_{M−1}, a_M].    (4.7)

We then refer to the collection of all such elements, E = {Sⱼ : j = 1, …, M}, as the element set. We note that B_O = ∪_{1≤j≤M} Sⱼ.

²and hence the region in which the accuracy of the cost of our controller will be most affected.
³and to provide some protection against deleterious edge effects caused by our numerical integration.
Next, we need to specify our basis functions. We shall be utilizing a piecewise rational basis function set specified as follows. Let Δ ≜ (Lσ − (−Lσ))/M = 2Lσ/M denote our mesh size, and let ψⱼ denote the jth basis function, which will be nonzero only on the two elements bordering the boundary point aⱼ, or only on the single inner element in the case of a₀ or a_M.

As for the shape of these basis functions in their respective elements, we shall use a simple rational representation as in [21]. Let us formulate our sub-basis functions ξ₁ and ξ₂ as the following 3rd-order rational basis functions:⁴

ξ₁(x) = a/(1 + (x/Δ) + (x/Δ)² + (x/Δ)³) + b for 0 ≤ x < Δ, and 0 otherwise,    (4.8)

ξ₂(x) = c/(1 + (x/Δ) + (x/Δ)² + (x/Δ)³) + d for 0 ≤ x < Δ, and 0 otherwise,    (4.9)

where the weights a, b, c, d ∈ ℝ are suitably chosen so that ξ₁(0) = 1, lim_{x→Δ} ξ₁(x) = 0, ξ₂(0) = 0, and lim_{x→Δ} ξ₂(x) = 1. We can figure out these weights by solving this system of equations, and we find that a = 4/3, b = −1/3, c = −4/3, and d = 4/3. Therefore, we can fully specify our sub-basis functions as

ξ₁(x) = (4/3)/(1 + (x/Δ) + (x/Δ)² + (x/Δ)³) − 1/3 for 0 ≤ x < Δ, and 0 otherwise,    (4.10)

ξ₂(x) = 4/3 − (4/3)/(1 + (x/Δ) + (x/Δ)² + (x/Δ)³) for 0 ≤ x < Δ, and 0 otherwise.    (4.11)

Please consult Figures 4-1a and 4-1b for graphical representations of these sub-basis functions.

⁴We choose a third-order function to allow for greater changes in the value of our function between boundary points. We could choose higher-order rational basis functions, but we settled on using third-order rational functions as they are sufficient to capture most of the nuances of the derived equations.
(a) First Sub-Basis Function (ξ₁) with Δ = 1.  (b) Second Sub-Basis Function (ξ₂) with Δ = 1.

Figure 4-1: Third-Order Rational Sub-Basis Functions with Δ = 1. These two sub-basis functions in turn make up the rational basis functions, which we use to computationally approximate the Stage I controller. Both sub-basis functions are zero outside the interval [0, Δ] and, within this interval, are either strictly decreasing or increasing.
Figure 4-2: Example of the jth Third-Order Basis Function with Δ = 1/2, aⱼ₋₁ = 0, aⱼ = 1/2, and aⱼ₊₁ = 1. We note that this jth third-order basis function is zero outside the interval [aⱼ₋₁, aⱼ₊₁) and, within this interval, is increasing on [aⱼ₋₁, aⱼ] and decreasing on [aⱼ, aⱼ₊₁).
Now that we have specified our sub-basis functions, we can define our basis functions, {ψⱼ}ⱼ₌₀^M, more concretely as

ψⱼ(x) = { ξ₁(x − a₀), j = 0;  ξ₂(x − aⱼ₋₁) + ξ₁(x − aⱼ), 0 < j < M;  ξ₂(x − a_{M−1}), j = M }.    (4.12)

Thus, we can picture our regular "sharklet" ψⱼ in Figure 4-2 (the edge half-sharklets ψ₀ and ψ_M resemble simply ξ₁ and ξ₂, respectively). We note that each regular basis function is asymmetric about its boundary-point pivot. We shall discuss the reasons for this choice of sub-basis functions in more detail in Section 4.2.3, but suffice it to say that the current specification will greatly aid in the accuracy of our numerical computations.

Now that we have specified our basis functions, we can approximate a function on B_O as a weighted sum of our M + 1 basis functions as follows:

f[ω](x) = Σⱼ₌₀^M ωⱼ ψⱼ(x),    (4.13)

where ω ∈ ℝ^{M+1} is a weighting vector. We then find that each function approximation is uniquely specified by the choice of basis-function weights, leading us to view f[ω] as a mapping of a finite-dimensional vector space (ℝ^{M+1}) into an infinite-dimensional inner product space of real Borel-measurable functions (with the inner product ⟨·, ·⟩ specified in the usual manner, as detailed in Chapter 2). Hence, we have transformed any optimization problem involving our function f from one whose feasible set is infinite-dimensional to an approximate one whose feasible set is finite-dimensional. This is the essence of the FE method!

In addition, we should note that nodal values are preserved in this model, with f[ω](aⱼ) = ωⱼ; i.e., the weights are exactly the values of f at the nodal points. Moreover, the model provides a continuous approximation of f, with no "jumps" or abrupt changes, since the approximation of f on each element is continuous (each sub-basis function is continuous, and we have a weighted sum of continuous functions). As most researchers believe the conjecture that the optimal controllers in WC are analytic,⁵ this is a desirable feature of our model.
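As a minimal sketch of the construction above (our own illustrative code with hypothetical names, assuming a uniform mesh), the sub-basis functions (4.10)–(4.11), the basis (4.12), and the evaluation of f[ω] in (4.13) can be written as:

```python
def xi1(x, delta):
    # First sub-basis function (4.10): equals 1 at x = 0, decreases to 0 at x = delta.
    if 0.0 <= x < delta:
        q = x / delta
        return (4.0 / 3.0) / (1.0 + q + q * q + q ** 3) - 1.0 / 3.0
    return 0.0

def xi2(x, delta):
    # Second sub-basis function (4.11): equals 0 at x = 0, increases to 1 at x = delta.
    if 0.0 <= x < delta:
        q = x / delta
        return 4.0 / 3.0 - (4.0 / 3.0) / (1.0 + q + q * q + q ** 3)
    return 0.0

def f_approx(x, w, a):
    # FE approximation (4.13): f[w](x) = sum_j w_j psi_j(x), where the "sharklet"
    # basis psi_j of (4.12) is assembled from xi1/xi2 on the mesh points a[0..M].
    delta = a[1] - a[0]                  # uniform mesh assumed
    total = 0.0
    for j in range(len(a)):
        psi = 0.0
        if j > 0:                        # rising half on [a_{j-1}, a_j)
            psi += xi2(x - a[j - 1], delta)
        if j < len(a) - 1:               # falling half on [a_j, a_{j+1})
            psi += xi1(x - a[j], delta)
        elif x == a[j]:                  # last element is closed on the right
            psi += 1.0
        total += w[j] * psi
    return total
```

Evaluating f_approx at a mesh point returns the corresponding weight, which checks the nodal-value property claimed above.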
We summarize the description of our model parameters in Table 4.1.

⁵This is a conjecture at this point, but strongly believed, especially in light of the result that optimal controllers have to be piecewise analytic [12].
Parameter          Description
K                  B_I boundary (region of interest)
L                  B_O boundary (region for computation)
M                  Number of Elements
{a₀, …, a_M}       Boundary Points
Sⱼ                 jth Element
E                  Element Set
ξ₁                 First Sub-Basis Function
ξ₂                 Second Sub-Basis Function
{ψ₀, …, ψ_M}       Basis Functions
f[ω]               Approximate Function

Table 4.1: FE Model Parameter Descriptions. These variables represent the principal components of our FE model, which we use to approximate the Stage I controller in the central problem of this thesis, Witsenhausen's Counterexample.
4.2.2
Formulas for Computing WC Formulas & Equations
In this section, we will detail how we will be computing each of the following functions using our
approximation f[c]:

a) N_0[c](y) ≜ Σ_{i=1}^{Q} w̃_i φ(y − f[c](x̃_i)),

b) N_1[c](y) ≜ Σ_{i=1}^{Q} w̃_i f[c](x̃_i) φ(y − f[c](x̃_i)),

c) g[c](y) ≜ N_1[c](y) / N_0[c](y),

d) η[c](y) ≜ −y + g[c](y),

e) G[c](x) ≜ 2k²(f[c](x) − x) + Σ_{i=1}^{Q} w_i η[c](x_i + f[c](x)) [x_i² + x_i η[c](x_i + f[c](x)) − 2],

f) J[c] ≜ k² Σ_{i=1}^{Q} w̃_i (f[c](x̃_i) − x̃_i)² + Σ_{i=1}^{Q} w̃_i f²[c](x̃_i) − Σ_{i=1}^{Q} w̃_i Σ_{j=1}^{Q} w_j g²[c](x_j + f[c](x̃_i)),

where {x̃_i}_{i=1}^{Q} and {w̃_i}_{i=1}^{Q} are the abscissas and weights for the Gauss-Hermite quadrature for
the Gaussian probability weighting function of X_0 ∼ N(0, σ²), and {x_i}_{i=1}^{Q} and {w_i}_{i=1}^{Q} are the
abscissas and weights for the Gauss-Hermite quadrature for the Gaussian probability kernel N(0, 1)
(see Chapter 2 for more details regarding how these weight-abscissa pairs are formed). We leave
the full derivations until Section 4.5 of this chapter.
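The quadrature approximations in a)–d) are straightforward to implement. The following Python sketch is illustrative only (it is not the thesis's implementation): it evaluates N_0, N_1, g, and η for an arbitrary controller f, using Gauss-Hermite nodes adapted to the N(0, σ²) weighting of X_0.

```python
import numpy as np

def gh_prob(Q, sigma=1.0):
    # Gauss-Hermite nodes/weights adapted to the probability weight N(0, sigma^2):
    # with physicists' nodes t_i for weight exp(-t^2), substituting x = sqrt(2)*sigma*t
    # gives sum_i w_i h(x_i) ~ E[h(X)] for X ~ N(0, sigma^2).
    t, w = np.polynomial.hermite.hermgauss(Q)
    return np.sqrt(2.0) * sigma * t, w / np.sqrt(np.pi)

def phi(z):
    # standard Gaussian density
    return np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def N0(f, y, x_t, w_t):
    # N_0[c](y) ~ sum_i w~_i * phi(y - f(x~_i))
    return np.sum(w_t * phi(y - f(x_t)))

def N1(f, y, x_t, w_t):
    # N_1[c](y) ~ sum_i w~_i * f(x~_i) * phi(y - f(x~_i))
    return np.sum(w_t * f(x_t) * phi(y - f(x_t)))

def g(f, y, x_t, w_t):
    # MMSE estimator g[c](y) = N_1[c](y) / N_0[c](y)
    return N1(f, y, x_t, w_t) / N0(f, y, x_t, w_t)

def eta(f, y, x_t, w_t):
    return -y + g(f, y, x_t, w_t)
```

For a linear controller f(x) = λx with X_0 ∼ N(0, σ²), g should reduce to the familiar linear MMSE estimate λ²σ²y/(1 + λ²σ²), which provides a quick check of the quadrature.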
Remark It should be noted that, because of the presence of multiple integrals in the above expressions (and, therefore, multiple nested sums), we naturally encounter the dreaded "curse of dimensionality" so common when modeling expressions involving nested integrals. In particular, we encounter the issue in two cases. The first is in the number of elements that we choose to use for our function approximation; the more elements, the better the approximation of the function. The second is in the calculation of the above integrals, which involve the Gaussian probability weighting function, φ(·); the more quadrature weights we use, the better the approximation. If we could assume that most of the integrands are well approximated by polynomials of low degree, then we could get away with using fewer quadrature weights (see Section 2.3 in Chapter 2). However, we cannot necessarily assume that is the case: as we stated in the previous section on the necessary condition, the only polynomial controllers f that satisfy the variational equation are affine (or, to wit, linear). As such, our computations do take some time, but, through experimentation and comparison with known formulas, we found that having Q = 80 weights and M = 226 elements works well. The results of using these parameters are depicted in Section 4.3.
4.2.3
Justification of Rational Basis Functions
There are a number of reasons why we chose rational basis functions as our sub-basis functions. For
one, the rational basis functions in (4.10) and (4.11) that we use as sub-basis functions in our FE
model are asymmetric, with a bias in the direction of increasing abscissa values. This means that,
in calculations using our FE model for the value of a function at a given point, the nodal values
"downstream" relative to that point contribute more to the value of the function at that point than
do the nodal values "upstream" (e.g., a(10) is downstream of a(5), whereas a(5) is upstream of
a(10)), since φ^(2)(x) > x and φ^(1)(x) < x on [0, Δ). Hence, on a given element E_j, the weight value at
a_j has a greater effect on the shape of the function than the weight value at a_{j−1}. This, coupled
with the fact that optimal controllers are non-decreasing, means that our choice of rational basis
functions should adequately model optimal controllers, because f[c] will naturally be non-decreasing
(and strictly increasing unless w_j = w_{j−1}) on each element E_j when the weights themselves are
non-decreasing in j (i.e., w_{j−1} ≤ w_j).
Indeed, another reason for choosing rational basis functions is their success in the past. Specifically, rational basis functions have been shown to have great success in solving a multitude of problems, including Burgers' equation in fluid dynamics [21]. This is relevant because, as we saw in Theorem 4.1.1, the necessary variational equation involves a term of the form η''[f](y) + η'[f](y)η[f](y), which is directly related to the viscous Burgers' equation. This, coupled with the smooth nature of our FE model, means that we should have a very good approximation of optimal controllers.
4.2.4
Discussion of "Edge Effects" in MMSE Calculation
Finally, before discussing the results of our numerical experiments, we comment on our need to
define both inner and outer intervals, B_I and B_O, respectively. This is mainly because of the effect
that truncations have on the calculation of various quantities of interest in WC, in particular, that
of the first conditional moment of f(X_0) given Y[f], namely N_1[f].
To expand further, consider an arbitrary controller b(t) and its time-limited variant b_T(t), where:

b_T(t) = { b(t),  |t| ≤ T
         { 0,     |t| > T,     t ∈ R.   (4.14)
Now, let us consider the first conditional moment for b_T(t):

N_1[b_T](y) = ∫_{−∞}^{∞} b_T(x) φ(y − b_T(x)) dF(x) = ∫_{−T}^{T} b_T(x) φ(y − b_T(x)) dF(x).   (4.15)

Investigating (4.15), we notice that

lim_{y→∞} N_1[b_T](y) = 0,   (4.16)

which is easy to see since, for large values of y, φ(y − b_T(x)) is very small and, because of the
truncation in the integral, we have that generally N_1[b_T](y) → 0 at a faster rate of convergence
than N_1[b](y) → 0. In fact, even for finite values of y within [−T, T], we will have N_1[b_T] ≈ 0,
or trending in that manner, regardless of the choice of b. So, even if N_1[b] is much wider and
significantly non-zero outside of [−T, T], we will still have N_1[b_T] trending towards 0 inside of
[−T, T]. This is what is meant by "edge effects".
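The edge effect is easy to reproduce numerically. The sketch below is an illustration only: b is a hypothetical linear controller with X_0 ∼ N(0, σ²), σ = 5, and it compares N_1[b](y) with N_1[b_T](y) near the edge, integrating the truncated variant only over [−T, T] as in (4.15).

```python
import numpy as np
from scipy.integrate import quad

SIGMA = 5.0

def phi(z):
    # standard Gaussian density
    return np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def dF(x):
    # density of X0 ~ N(0, SIGMA^2)
    return np.exp(-x ** 2 / (2.0 * SIGMA ** 2)) / (SIGMA * np.sqrt(2.0 * np.pi))

def N1(b, y, lo, hi):
    # N_1[b](y) = integral of b(x) phi(y - b(x)) dF(x) over [lo, hi]
    val, _ = quad(lambda x: b(x) * phi(y - b(x)) * dF(x), lo, hi)
    return val

b = lambda x: x          # hypothetical untruncated controller
T = 5.0                  # truncation level; b_T = b on [-T, T] and 0 outside

y = 8.0
full = N1(b, y, -np.inf, np.inf)   # N_1[b](y)
trunc = N1(b, y, -T, T)            # N_1[b_T](y): the integrand vanishes outside [-T, T]
# trunc is far smaller than full: the mass of the kernel near x ~ y lies outside [-T, T]
```

Even though b and b_T agree on [−T, T], the truncated moment collapses towards zero, exactly the behavior described above.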
As such, one can naturally see the need for two bounds in our model. The inner bound is used
for analysis and modeling, while the outer bound is strictly used for computation, mainly to
provide a "buffer zone" so that edge effects do not interfere with the FE computations. Since
this buffer zone has to be, in practice, quite large, we need more elements in our model to
compensate, so that we have the appropriate number of elements to model the controller in the
region of interest, B_I.^6
^6 The rest, and indeed the majority, of the elements reside in the outer buffer zone, B_O \ B_I. Their weights and the shape of the controller in this outer buffer region do not matter as much in the grand scheme of our model and do not significantly affect the results presented in this thesis.
Parameter Values

k    0.2
σ    5
L    512
K    2
Q    79
M    226
Table 4.2: FE Model Parameter Values for Model Verification. These values are used to
evaluate the accuracy of our FE model approximation for the Stage I controller in the previously
worked-out cases of linear controllers and 1-bit quantization controllers.
4.3
Numerical Experiments using Finite Element Model
In this section, we shall illustrate the use of our finite element model as an exploration tool to discover
new aspects of the structure of the necessary condition.
4.3.1
Accuracy of the FE Model
We begin by verifying the accuracy of our FE model formulation as specified in Section 4.2. In
particular, we shall use known analytical results about linear controllers and 1-bit quantization
schemes as discussed in Lemmas 3.3.1 and 3.3.3, respectively, to check our numerically computed
results. We will use the parameter values indicated in Table 4.2, where we note that we have a very
large buffer zone in order to mitigate edge effects in our calculations.
To begin, let us consider the linear case. In Figures 4-3, 4-9, and 4-10,^7 we plot the relative
errors^8 in f_λ, N_0[f_λ], and g[f_λ] for λ ∈ {0.1, 0.5, 1}. We note that in all three cases our model
mirrors the theoretical results with extreme fidelity, even at a relatively small number of
quadrature weights. We have greater accuracy the fewer "rounds" of integration we pursue, since
small errors tend to accumulate over repeated integration.^9 However, we would like to point out
that in cases of large λ, our model does not work as well (see Figure 4-11^10). In this case, our
rational basis functions cannot effectively capture such sudden and wide increases. Even so, since
most of the previous work on WC indicates that optimal controllers are bounded in some sense by
the linear line of slope one (and, therefore, are never expected to have such large instantaneous
^7 Figures 4-9 and 4-10 are located in Section 4.4.
^8 If m(t) is the theoretical function and m̂(t) our approximation, then the relative error E_R is defined as E_R[m, m̂](t) = |m(t) − m̂(t)| / max_τ max(|m(τ)|, |m̂(τ)|).
^9 E.g., computing g[c] is a successive "round" of integration that builds upon the computation of N_0[c].
^10 Figure 4-11 is located in Section 4.4.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-3: Relative Errors for Approximation of Linear Controller with λ = 0.1. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-4: Relative Errors for Approximation of 1-Bit Controller with a = 1. We denote functions approximated using our FE model with a hat, ^.
increases), our model should perform adequately in exploratory pursuits of WC.
Our model performs even better in the nonlinear 1-bit quantization case. In Figures 4-4, 4-12,
and 4-13,^11 we plot the errors in f_a, N_0[f_a], and g[f_a] for a ∈ {1, 2.8, 5}. Once again, we note
that, in all three cases, our model performs extremely well and is able to accurately capture the
theoretical features of 1-bit quantizers. We also did a test for large a (Figure 4-14^12) and noticed
that, unlike in the linear case, our FE model is extremely accurate in these regimes. This may be
because, for large a, there is only a small sudden change, as opposed to the longer change involved
for greater values of λ in the linear case.
Finally, we consider the linear case again and see if our model can accurately determine which
linear coefficient values cause the variational equation in Theorem 4.1.1 to go to zero. We know
from Witsenhausen in [3] that any optimal controller must satisfy the variational equation F-a.e.,

^11 Figures 4-12 and 4-13 are located in Section 4.4.
^12 Figure 4-14 is located in Section 4.4.
Figure 4-5: Absolute Size (‖G[f]({x_i}_{i=0}^{9})‖_2) of First-Order Condition for the Linear Case
using FE Model. We set k = 0.2 and σ = 5 and use the values for our FE model from Table
4.2. We see that the overall behavior of ‖G[λ]({x_i}_{i=0}^{9})‖_2 follows that of the theoretical |G[λ]| from
Figure 3-2, allowing us to verify the accuracy and suitability of the FE model.
and, from [12], we know that optimal controllers must satisfy it exactly. Therefore, let us consider
how our model acts at certain "points of interest" within B_I. In fact, let us consider the shape of the
Euclidean norm of G[λ]^13 along λ, using just a 10-point Gauss-Hermite quadrature
abscissa set, {x_i}_{i=0}^{9}, as the points on which to evaluate G[λ]. We plot these norms in Figure 4-5. We
note that ‖G[λ]({x_i}_{i=0}^{9})‖_2 only becomes approximately zero at three values, located roughly around 0.05,
0.3, and 0.95. In fact, when we compare our graph in Figure 4-5 with the necessary optimality condition
that we derived earlier for the linear case (see Figure 3-2), we notice how similar the graphs look.
Theoretically, the only three solutions of (3.21) are λ = 0.04174243050, 0.3031960455,
and 0.9582575695. This matches up well with our findings using the FE model, and so we can be
even more confident in the fidelity of our FE model as an exploratory tool.
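These three values can be cross-checked independently. For the linear family f_λ(x) = λx, the cost is J(λ) = k²σ²(λ − 1)² + λ²σ²/(1 + λ²σ²) (the standard linear-case expression for WC, assumed here since (3.21) itself is stated in Chapter 3), and setting dJ/dλ = 0 and rescaling gives k²(λ − 1)(1 + λ²σ²)² + λ = 0. Solving this numerically recovers the three roots quoted above:

```python
import numpy as np
from scipy.optimize import brentq

k2, s2 = 0.2 ** 2, 5.0 ** 2   # (k = 0.2, sigma = 5)

def h(lam):
    # stationarity condition for the linear controller family, from dJ/dlam = 0,
    # rescaled to: k^2 (lam - 1)(1 + lam^2 sigma^2)^2 + lam = 0
    return k2 * (lam - 1.0) * (1.0 + lam ** 2 * s2) ** 2 + lam

# bracket the three sign changes on [0, 1] and solve each
roots = [brentq(h, lo, hi) for lo, hi in [(0.0, 0.2), (0.25, 0.5), (0.9, 1.0)]]
# roots ≈ 0.0417424, 0.3031960, 0.9582576
```

The agreement with both the theoretical values and the near-zeros of Figure 4-5 is a useful sanity check on the linear-case analysis.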
4.3.2
Optimizational Framework using FE Model
Encouraged by the success of our model, which we discussed in Section 4.3.1, we now consider
developing an optimizational framework in which we can use our model to investigate the necessary
variational equation and determine new controllers that necessarily satisfy it.
Using the same rationale as we had employed in Figure 4-5 to numerically determine which
linear coefficients satisfy the necessary condition exactly, let us consider the following optimization:

min_{c ∈ F} ‖G[c]‖² = Σ_{i=0}^{R−1} G²[c](x_i),   (4.17)

where {x_i}_{i=0}^{R−1} represent R quadrature points, and F is any feasible set of L² over which we choose
to optimize.^14 Any c* that minimizes the above expression is therefore a candidate (through f[c*])
for defining a controller that satisfies the necessary condition.
Remark Before presenting the results of using our model to perform the above minimization, let
us first comment on why this is a reasonable minimization to pursue. Why frame the problem in
this manner? Should we not perhaps consider the entire 2-norm of G[c](x) over B_I instead? To
answer this, let us consider the cost function in its most general form from Theorem 3.1.1:

J[f] = k² E[(f(X_0) − X_0)²] + E[f²(X_0)] − E[g²[f](Y)].   (4.18)

^13 Recall that ā is the set of all FE boundary points.
^14 In a naive implementation, we would choose F = R^{M+1}, but, because of computational constraints and the unwieldy nature of R^{M+1} when M is quite large, we will, in practice, choose a feasible set of much lower dimension.
Figure 4-6: Example of 5-Level Sinusoidal Quantizer with t̄ = [−10, −4, 0, 3, 15].
In this expression, we see that the underlying expectations are relative to either X_0 or Y[f], both of
whose p.d.f.s involve a Gaussian probability kernel, φ(·). When we approximate the integrals
of these expressions according to our quadrature rules, we find that the only values of our candidate
controller f that affect the cost are those at the abscissa values. This, combined with the aforementioned fact that optimal controllers must satisfy G[f](x) = 0 exactly for all x, means that our
search for controllers that minimize G[f] (as approximated through c) at these abscissa points is
an extremely worthwhile approach.
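As an illustration of this kind of quadrature-point minimization, the following Python sketch drives a two-parameter residual to zero at the R = 5 Gauss-Hermite abscissas for N(0, 1). It is illustrative only: the residual G below is a hypothetical stand-in, not the FE-based G[c] of this thesis.

```python
import numpy as np
from scipy.optimize import least_squares

# R = 5 Gauss-Hermite abscissas for the N(0, 1) kernel:
# physicists' nodes for weight exp(-t^2), scaled by sqrt(2)
t, _ = np.polynomial.hermite.hermgauss(5)
quad_pts = np.sqrt(2.0) * t

def G(theta, x):
    # Hypothetical stand-in residual: gap between a two-parameter controller
    # shape a*tanh(b*x) and an arbitrary smooth target (2/pi)*arctan(x).
    # In the thesis, G[c](x) is built from the FE model instead.
    a, b = theta
    return a * np.tanh(b * x) - (2.0 / np.pi) * np.arctan(x)

# minimize sum_i G(theta, x_i)^2 over theta, mirroring (4.17)
res = least_squares(lambda th: G(th, quad_pts), x0=np.array([1.0, 1.0]))
# the optimized theta drives the residual to (numerically) zero at the five
# quadrature points, while nothing constrains G in between them
```

Just as observed with the experiments below, the optimized residual vanishes only at the quadrature points; between them it is generally nonzero.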
Now, using the above framework and our FE model, we can pursue the mathematical optimization as defined in (4.17) using five quadrature points (i.e., R = 5) as our "points of interest." Using an initial condition of t̄_init = [0, 0, 0, 0, 0] and MATLAB®'s fminunc solver with its default properties for tolerance and precision [22], we choose a rather simple feasible set that is parameterized
by t̄ = [t̄_1, t̄_2, t̄_3, t̄_4, t̄_5] according to:

f[t̄](x) =
  t̄_1,                                                              x < −Kσ,
  (t̄_{j+1} − t̄_j) · ½[cos((2π/(Kσ))(x + Kσ − (j − 1)Kσ/2) − π) + 1] + t̄_j,
                         (j − 1)Kσ/2 − Kσ ≤ x < jKσ/2 − Kσ,  j = 1, 2, 3, 4,
  t̄_5,                                                              x ≥ Kσ.   (4.19)

We refer to such functions f[t̄] as 5-level sinusoidal quantizers.^15
Remark Basically, f[t̄] is defined by five "levels" that indicate distinct points/intervals on which
f[t̄] takes on the values defined by t̄: t̄_1 on (−∞, −Kσ], t̄_2 at −Kσ/2, t̄_3 at 0, t̄_4 at Kσ/2,
and t̄_5 on [Kσ, ∞). The values of f[t̄] between these levels are obtained by stretching a scaled and shifted positively
increasing cycle of a cosine function. An example of what f[t̄] looks like for a given t̄ is shown in
Figure 4-6.
Next, let us discuss why we chose this particular feasible set of solutions. The first reason is
quite obvious: by restricting our optimization from a finite-dimensional space of dimension 226
to one of size 5, we have a much more tractable and practical setting in which we can obtain
meaningful results using standard mathematical optimization solvers in a reasonable amount of
time. Second, we chose this particular form for our feasible set since it allows us to control the
shape of the controller while keeping it analytic; our approximation of f[t̄] remains smooth and
connected regardless of the choice of t̄. Since we already know that optimal controllers have to be
at least piecewise-analytic (Theorem 2, [12]), any controller in our feasible set as defined in (4.19)
is therefore a good candidate for an optimal controller. Indeed, if we increased the number of levels
in our feasible set, we would naturally have greater flexibility (and a greater chance) of obtaining
better-performing controllers, but, for proof of concept, a size of n = 5 is acceptable, as we shall
see next.
Remark It should be noted that we are still using our FE model as described earlier and merely
tailoring the larger set of weights, c, so that they approximate a function with the stipulations

^15 We note that we can easily determine the appropriate FE weight vector c for our 5-level sinusoidal quantizers by simply sampling f[t̄] at ā. So, implicitly, when we define our feasible set of 5-level sinusoidal quantizers for the mathematical optimization in (4.17), we are considering the subset of R^{M+1} whose weight vectors can conceivably represent a sampling of a feasible 5-level sinusoidal quantizer at ā.
Parameter Values

k    0.2
σ    5
L    512
K    2
Q    79
M    226
Table 4.3: FE Model Parameter Values used to find Optimal 5-level Sinusoidal Quantizer.
These parameters were used to fully define the optimization problem (4.17).
Optimal Weights

t̄*_{5,1}    -0.611762678445885
t̄*_{5,2}    -0.164533341705034
t̄*_{5,3}    -0.000110638239963
t̄*_{5,4}     0.164268853065035
t̄*_{5,5}     0.611542404605543
Table 4.4: Optimal 5-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table
4.3, R = 5, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
as given in (4.19). Therefore, we could say that we are "bootstrapping" our 5-level smooth feasible
controller model on the back of our (M + 1)-weight FE model.

Using the above feasible set and the FE model parameters as specified in Table 4.3 with R = 5,
we obtain the optimized result of G[t̄*_5] as shown in Figure 4-8. The weights of this optimal^16
function are depicted in Table 4.4, and the value of the objective function as computed by our
model is

Σ_{i=0}^{4} G²[t̄*_5](x_i) = 0.0000001474794586097925,

which is well within the realm of approximately satisfying the necessary condition at the five important points. However, when we compute the cost of f[t̄*_5], we have

J[t̄*_5] = 0.963530809260363,

which is greater than the best linear cost for the (k = 0.2, σ = 5) case (J[λ*] = 0.96) and much
greater than the as-of-now best known cost (J[f*] = 0.1670790, [6]). Obviously, this is not ideal,
since we want to find the Stage I controller of minimum cost (that is, after all, the main objective of

^16 In the sense of satisfying the necessary condition at the specified R quadrature points.
Figure 4-7: Optimal 5-Level Sinusoidal Quantizer. The form of this Stage I controller was
determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 5, and
Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-8: Necessary Functional Equation (G[t̄*_5]) for Optimal 5-Level Sinusoidal Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17) with
the FE model parameters as depicted in Table 4.3, R = 5, and Gauss-Hermite quadrature pairs as
depicted in Table 4.9.
the problem). However, as we discussed earlier in Chapter 3, this is not unexpected, since there exist
many solutions to the necessary condition for any given (k, σ) pair, and our approximate solution
is not necessarily the best.
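For context, the benchmark linear cost J[λ*] ≈ 0.96 quoted above can be reproduced directly from the closed-form linear-case cost (again the standard linear-controller expression for WC, assumed here rather than taken from this section):

```python
import numpy as np
from scipy.optimize import minimize_scalar

k2, s2 = 0.2 ** 2, 5.0 ** 2   # (k = 0.2, sigma = 5)

def J(lam):
    # cost of the linear controller f(x) = lam * x: the Stage I penalty
    # plus the linear MMSE error of estimating lam * X0 from Y
    return k2 * s2 * (lam - 1.0) ** 2 + lam ** 2 * s2 / (1.0 + lam ** 2 * s2)

res = minimize_scalar(J, bounds=(0.5, 1.5), method="bounded")
# res.fun ≈ 0.96, attained near lam ≈ 0.958
```

The minimizer sits at the largest of the three stationary λ values found earlier, confirming that the 5-level quantizer's cost of 0.9635 indeed exceeds the linear benchmark.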
In addition, one may wonder whether our solution is zero everywhere else, other than at the R = 5
quadrature points of interest at which we evaluated G[c]. In this case, our optimal 5-level sinusoidal
quantizer is in fact nonzero everywhere outside of the R quadrature points, as Figure 4-8 shows.
However, this is not unexpected, since the objective function in (4.17) does not take into account the
values of G[f] outside of the R quadrature abscissa points, {x_i}_{i=0}^{R−1}. This means that our optimal
sinusoidal quantizer does not satisfy the necessary functional equation everywhere, but we have to
keep in mind that, in the context of the overall problem, the most "important" points at which we
want to ensure the necessary condition is satisfied are the Gaussian quadrature points. In our case,
we accomplish this using the first five Gaussian quadrature points.
Now, we should note that we can find solutions to the optimization problem depicted in (4.17)
using other n-level sinusoidal quantizers and a greater number of Gaussian quadrature points.
Indeed, we can depict the n-level sinusoidal quantizer as a function of the form:

f[t̄_n](x) =
  t̄_{n,1},                                                     if x < −B,
  (t̄_{n,j+1} − t̄_{n,j}) conn_n(x − (Δ_n(j − 1) − B)) + t̄_{n,j},    if x ∈ E_j^n, j = 1, ..., n − 1,
  t̄_{n,n},                                                     if x > B,   (4.20)

where

conn_n(x) = ½ [cos(πx/Δ_n − π) + 1]

and

B = min{aσ | aσ > x, x ∈ {x_i}_{i=1}^{R}, a ∈ N},

with

E_j^n = [Δ_n(j − 1) − B, Δ_n j − B)

and

Δ_n = 2B / (n − 1).
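A minimal Python sketch of this family (our reading of the interval bookkeeping; the choice of B in the example is an arbitrary assumption for illustration) might look like:

```python
import numpy as np

def conn(u):
    # smooth "connector": half a cosine cycle rising from conn(0) = 0 to conn(1) = 1
    return (np.cos(np.pi * u - np.pi) + 1.0) / 2.0

def sinusoidal_quantizer(levels, B):
    """n-level sinusoidal quantizer: constant at levels[0] below -B, constant at
    levels[-1] above B, and smooth cosine interpolation between successive levels
    on the n - 1 equal elements of [-B, B]."""
    t = np.asarray(levels, dtype=float)
    n = len(t)
    delta = 2.0 * B / (n - 1)

    def f(x):
        if x <= -B:
            return t[0]
        if x >= B:
            return t[-1]
        j = min(int((x + B) // delta), n - 2)   # 0-based element index
        u = (x - (j * delta - B)) / delta        # position within the element, in [0, 1]
        return (t[j + 1] - t[j]) * conn(u) + t[j]

    return np.vectorize(f)

# the 5-level example of Figure 4-6, with B = 10 chosen arbitrarily for illustration
f5 = sinusoidal_quantizer([-10.0, -4.0, 0.0, 3.0, 15.0], B=10.0)
```

By construction, the quantizer passes through its level values at the equally spaced breakpoints and is non-decreasing whenever the levels themselves are non-decreasing.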
We can then solve (4.17) numerically for various values of n = R.^17 In Figures 4-15, 4-16, 4-17,
4-18, 4-19, and 4-20, as well as in Tables 4.6, 4.7, and 4.8, we demonstrate optimal n-level sinusoidal

^17 We set n = R so as to allow for greater degrees of freedom in solving (4.17).
Optimal Costs & Norms

n = R    ‖G[t̄*]‖²                  J[t̄*]
3        1.771340660161439e-07     0.961825561638322
4        1.116564623767737e-07     0.961581394710436
5        1.474794586097925e-07     0.963530809260363
6        1.645876831841833e-06     2.075764121916171
Table 4.5: Costs and Necessary Functional Equation Norms (‖G[t̄*]‖²). These values were
determined computationally as the solution of (4.17) with the FE model parameters as depicted in
Table 4.3 for n = R = 3, n = R = 4, n = R = 5, and n = R = 6 and Gauss-Hermite quadrature
pairs as depicted in Table 4.9.
quantizers for the cases of n = R = 3, n = R = 4, and n = R = 6.^18 Table 4.5 shows their respective
values of ‖G[t̄*]‖² and J[t̄*]. We notice that, just like in the case of n = R = 5, G[t̄*] ≈ 0, while the
overall costs J[t̄*] are still above the optimal linear cost J[λ*] = 0.96, but, as we discussed earlier,
this is neither unexpected nor extremely consequential given the nature of the randomness in WC. Also,
it should be noted that our obtained optimal functional solution in the n = R = 6 case (as depicted
in Figure 4-19) is not non-decreasing, a property that optimal controllers in WC must possess [3].
This lack of non-decreasing behavior is reflected in the overall cost of this 6-level sinusoidal quantizer
being much greater than the benchmark linear case: J[t̄*_6] = 2.075764121916171 > 0.96 = J[λ*].

4.3.3
Analysis of Results
From these results, we can state concretely a couple of positive conclusions.
For one, we have
demonstrated that our FE model can be used not only in a verification capacity but also as a tool to
make meaningful progress on WC through optimization frameworks and feasible set definitions. For
another, we have discovered an additional function besides linear functions that can approximately
satisfy the necessary functional equation at three, four, five and six important quadrature points.
In fact, we strongly believe that, if we increase the number of quadrature points we optimize over
and/or the number of weights in our feasible set, we can obtain even better approximations for any
number of quadrature points!
In addition, we can state some more observations about solutions to the variational equation
using Figure 4-8. We note that our minimized solutions (with the exception of t̄*_6) naturally tend
towards odd symmetry, which agrees with our intuition since all of the independent random variables in π(k², σ²) are symmetric. Indeed, we also notice that our optimal controllers are generally
increasing, a characteristic that agrees with the result of Lemma 7 from Witsenhausen [3], which

^18 Figures 4-15, 4-16, 4-17, 4-18, 4-19, and 4-20 and Tables 4.6, 4.7, and 4.8 are located in Section 4.4.
states that optimal controllers must be non-decreasing. Finally, for the specific case of n = R = 5,
we note how t̄*_{5,2}, t̄*_{5,3}, and t̄*_{5,4} are all closely grouped around the origin. This makes sense because
the origin is where most of the underlying probability of X_0 is concentrated; by sending
more of this probability to this region, our controller is making the job of our MMSE estimator in
Stage II easier (meaning less estimation error, which, in turn, means less overall cost). This latter
behavior is also mirrored in the optimized controllers for n = R = 3 and n = R = 4.
All of these observations agree with the rationale employed in previous attempts at solving WC
by optimizing over (2n + 1)-quantization schemes. Thus, they lend great credence to both the
veracity of our model and its potential as a tool for investigating WC in further detail in other
research contexts.
4.4
Additional Plots for Chapter 4
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-9: Relative Errors for Approximation of Linear Controller with λ = 0.5. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-10: Relative Errors for Approximation of Linear Controller with λ = 1. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-11: Relative Errors for Approximation of Linear Controller with λ = 5. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-12: Relative Errors for Approximation of 1-Bit Controller with a = 2.8. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-13: Relative Errors for Approximation of 1-Bit Controller with a = 5. We denote functions approximated using our FE model with a hat, ^.
[Panels: (a) Relative Error for f: E_R[f, f̂](x); (b) Relative Error of N_0: E_R[N_0, N̂_0](x); (c) Relative Error of g: E_R[g, ĝ](x).]

Figure 4-14: Relative Errors for Approximation of 1-Bit Controller with a = σ² = 25. We denote functions approximated using our FE model with a hat, ^.
Optimal Weights

t̄*_{3,1}    -0.370552234771981
t̄*_{3,2}     0.000071861774708
t̄*_{3,3}     0.370689948937737
Table 4.6: Optimal 3-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table
4.3, R = 3, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-15: Optimal 3-Level Sinusoidal Quantizer. The form of this Stage I controller was
determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 3, and
Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-16: Necessary Functional Equation (G[t̄*_3]) for Optimal 3-Level Sinusoidal
Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17)
with the FE model parameters as depicted in Table 4.3, R = 3, and Gauss-Hermite quadrature
pairs as depicted in Table 4.9.
Optimal Weights

t̄*_{4,1}    -0.498521259836172
t̄*_{4,2}    -0.155696146588598
t̄*_{4,3}     0.155941726138290
t̄*_{4,4}     0.498769790186308
Table 4.7: Optimal 4-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table
4.3, R = 4, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-17: Optimal 4-Level Sinusoidal Quantizer. The form of this Stage I controller was
determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 4, and
Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-18: Necessary Functional Equation (G[t̄*_4]) for Optimal 4-Level Sinusoidal
Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17)
with the FE model parameters as depicted in Table 4.3, R = 4, and Gauss-Hermite quadrature
pairs as depicted in Table 4.9.
Optimal Weights

t̄*_{6,1}    -4.67326888267048
t̄*_{6,2}     6.617177471430276
t̄*_{6,3}    -3.621888183288994
t̄*_{6,4}     2.585931219917199
t̄*_{6,5}    -2.669563529874990
t̄*_{6,6}     6.475537965147624
Table 4.8: Optimal 6-level Sinusoidal Quantizer Weight Values. These values were determined computationally as the solution of (4.17) with the FE model parameters as depicted in Table
4.3, R = 6, and Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-19: Optimal 6-Level Sinusoidal Quantizer. The form of this Stage I controller was
determined by solving (4.17) with the FE model parameters as depicted in Table 4.3, R = 6, and
Gauss-Hermite quadrature pairs as depicted in Table 4.9.
Figure 4-20: Necessary Functional Equation (G[t̄*_6]) for Optimal 6-Level Sinusoidal
Quantizer Weight Values. The form of this Stage I controller was determined by solving (4.17)
with the FE model parameters as depicted in Table 4.3, R = 6, and Gauss-Hermite quadrature
pairs as depicted in Table 4.9.
Gauss-Hermite Quadrature Values

R = 3:  {x_i}: -8.660254037844387, 0, 8.660254037844387
        {w_i}: 0.166666666666667, 0.666666666666666, 0.166666666666667

R = 4:  {x_i}: -11.672071091694885, -3.709818921513630, 3.709818921513630, 11.672071091694885
        {w_i}: 0.045875854768068, 0.454124145231932, 0.454124145231932, 0.045875854768068

R = 5:  {x_i}: -14.284850069364028, -6.778130899871329, 0, 6.778130899871329, 14.284850069364028
        {w_i}: 0.011257411327721, 0.222075922005613, 0.533333333333332, 0.222075922005613, 0.011257411327721

R = 6:  {x_i}: -16.621287167760595, -9.445879388768555, -3.083532950962971, 3.083532950962971, 9.445879388768555, 16.621287167760595
        {w_i}: 0.002555784402056, 0.088615746041914, 0.408828469556030, 0.408828469556030, 0.088615746041914, 0.002555784402056

Table 4.9: Gauss-Hermite Quadrature Nodes ({x_i}_{i=1}^{R}) and Weights ({w_i}_{i=1}^{R}). These values
were determined computationally for the case of (k = 0.2, σ = 5).
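The entries of Table 4.9 can be regenerated with standard Gauss-Hermite routines: take the physicists' nodes and weights for the weight e^{−t²}, substitute x = √2 σ t, and divide the weights by √π to obtain the rule for the N(0, σ²) probability weight. For example, for the R = 3 row:

```python
import numpy as np

def gh_prob(R, sigma):
    # Gauss-Hermite rule adapted to the N(0, sigma^2) probability weight
    t, w = np.polynomial.hermite.hermgauss(R)
    return np.sqrt(2.0) * sigma * t, w / np.sqrt(np.pi)

nodes, weights = gh_prob(3, sigma=5.0)
# nodes   ≈ [-8.660254037844387, 0, 8.660254037844387]
# weights ≈ [1/6, 2/3, 1/6], matching the R = 3 row of Table 4.9
```

The same call with R = 4, 5, 6 reproduces the remaining rows.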
4.5
Derivation of FE Model Approximations for Chapter 4
In this section, we detail the derivations of the FE model formulas and equations that arise from
our approximated controller, f[c].
Proof a)
No If [ ] ] (y) =
J:
- f [c.](x))dF(x)
'/(y
~0 iq(y - f[P1(Yi))
No [c(y).
b)
Ni[f []](y) =
-
0f [](x)(y
f [c] (x))dF(x)
~ 6if[](i)#$(y - f[Z](zi))
i[f[c]](y).
c)
g I[f
N1 [f []](y)
No[f []](y)
]I(Y)
fq]
JJWX\No[f[c]](y)
f[](x)
~
-
f P]()
(No
[c'](y)
q!4y
-
f P1](X))
N(y
- f[PJ(i))
No [ o](y)
(
Q[
(y
d)
'[f []](y) = -y + g[f []](y)
96
dF(x)
dF(x)
-y + g[](y)
A?7[W](y).
e)
G[ f []I(x) = 2k 2 (fp](X)
+
_ X)
[(y
(y - f[ c2](x))r7f[c ]](y)
[C](X))2 + n[f []](y)(y
-
f[](X))
-
-
21 dy
2k 2 (f[cj(x) - x)
q(y - f[P](x))7]
2k 2 (f[C](x)
-
x) +
(y) [(y
-
j
+
f[C0](X))2 + T[](y)(y
Win[C](X, + f [p](x)) [x2 + x7
f[p](x)) - 2] dy
-
](x, + f [](x)) -21
f)
\begin{align*}
J[f[\hat{\omega}]] &= k^2\, \mathbb{E}\big[(f[\hat{\omega}](X_0) - X_0)^2\big] + \mathbb{E}\big[f^2[\hat{\omega}](X_0)\big] - \mathbb{E}\big[g^2[f[\hat{\omega}]](Y)\big] \\
&= k^2 \int_{-\infty}^{\infty} \big(f[\hat{\omega}](x) - x\big)^2\, dF(x) + \int_{-\infty}^{\infty} f^2[\hat{\omega}](x)\, dF(x)
- \int_{-\infty}^{\infty} g^2[f[\hat{\omega}]](y)\, N_0[f[\hat{\omega}]](y)\, dy \\
&= k^2 \int_{-\infty}^{\infty} \big(f[\hat{\omega}](x) - x\big)^2\, dF(x) + \int_{-\infty}^{\infty} f^2[\hat{\omega}](x)\, dF(x)
- \int_{-\infty}^{\infty} \Big( \int_{-\infty}^{\infty} g^2[f[\hat{\omega}]](y)\, \phi\big(y - f[\hat{\omega}](x)\big)\, dy \Big)\, dF(x) \\
&\approx k^2 \sum_{i=1}^{Q} w_i \big(f[\hat{\omega}](x_i) - x_i\big)^2 + \sum_{i=1}^{Q} w_i\, f^2[\hat{\omega}](x_i)
- \sum_{i=1}^{Q} w_i \sum_{j=1}^{Q} \tilde{w}_j\, g^2[\hat{\omega}]\big(\tilde{x}_j + f[\hat{\omega}](x_i)\big) \\
&=: J[\hat{\omega}].
\end{align*}
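The final nested-quadrature approximation of J can be checked standalone against the closed-form cost of a linear controller f(x) = λx, for which J = k²(λ-1)²σ² + λ²σ²/(λ²σ² + 1). A Python sketch of this check (NumPy assumed; the parameter values and Q = 100 are illustrative choices, not the thesis's):

```python
import numpy as np

def gh(n, sigma):
    # Gauss-Hermite nodes/weights rescaled for E[h(X)], X ~ N(0, sigma^2)
    t, w = np.polynomial.hermite.hermgauss(n)
    return sigma * np.sqrt(2) * t, w / np.sqrt(np.pi)

k, sigma, lam, Q = 0.2, 5.0, 0.8, 100   # illustrative values
x0, w0 = gh(Q, sigma)                   # quadrature for X0 ~ N(0, sigma^2)
xv, wv = gh(Q, 1.0)                     # quadrature for V ~ N(0, 1)
f = lambda x: lam * x                   # linear Stage I controller
phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def g(y):
    # MMSE estimate g(y) = N1(y)/N0(y), both integrals by quadrature
    kern = w0 * phi(y - f(x0))
    return np.dot(f(x0), kern) / np.sum(kern)

# J = k^2 E[(f(X0)-X0)^2] + E[f^2(X0)] - E[g^2(Y)], with Y = f(X0) + V
Eg2 = sum(wi * sum(wj * g(xj + f(xi))**2 for xj, wj in zip(xv, wv))
          for xi, wi in zip(x0, w0))
J = k**2 * np.dot(w0, (f(x0) - x0)**2) + np.dot(w0, f(x0)**2) - Eg2

s2 = (lam * sigma)**2                   # closed form for a linear controller
J_closed = k**2 * (lam - 1)**2 * sigma**2 + s2 / (s2 + 1)
```

Here g is itself computed by quadrature, mirroring the construction of g[ω̂]; for a linear controller the quadrature cost and the closed form agree closely at this Q.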
Chapter 5
Conclusion
5.1
Contributions
In this thesis, we examined the necessary condition for optimality in WC. This necessary condition
presents itself as a homogeneous integral equation in terms of the Stage I controller, f, and must
be solved exactly by any optimal f. We started by presenting WC in full generality in two forms,
the classical version, originally put forth by Witsenhausen in [3], and a new variant using optimal
transport theory, put forth by Wu and Verdu in [12]. We then proceeded to derive the necessary
condition in full using the first-order condition for optimizing functionals from calculus of variations.
Having done this, we developed a computational model to investigate solutions to this necessary
condition in the case when all the native random variables are Gaussian, a system specification
that we had denoted as π(k², σ²).
Using a finite element analysis approach, we specified elements
and detailed our basis functions. These basis functions were asymmetric rational basis functions,
which we used primarily for their computational bias towards nondecreasing functions as well as
because of their smoothness. We verified our model against theoretically derived expressions for the
cases of both linear controllers and 1-bit quantizing controllers. We then used this finite element
model within a mathematical optimization framework to approximately solve the necessary condition at specified Gauss-Hermite quadrature points. We defined a simple family of n-parameter
controllers (the n-level sinusoidal quantizers) and demonstrated that we could find controllers that
approximately satisfy the necessary condition within this family.
5.2
Future Work
There are a number of directions for future research that the work in this thesis opens up. For one,
our successful use of finite element analysis in a control-theoretic setting reinforces an already
growing trend within the field and should encourage more analysis of control problems using similar
numerical techniques.
In addition, our repeated and successful use of Gaussian quadrature in approximating expectations and integrals involving Gaussian probability kernels should signal to other researchers that
this simple numerical integration scheme can be used to quickly and accurately test the performance
of their designed controllers before stress testing them using more sophisticated methods like Monte
Carlo sampling.
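As an illustration of this point, consider E[cos(X)] for X ~ N(0, σ²), which has the closed form e^(-σ²/2): a 40-node Gauss-Hermite rule recovers it to near machine precision, while a 10^4-sample Monte Carlo estimate carries O(1/√N) noise. A Python sketch (NumPy assumed; the values are illustrative):

```python
import numpy as np

sigma = 5.0
exact = np.exp(-sigma**2 / 2)          # E[cos(X)] for X ~ N(0, sigma^2)

# 40-node Gauss-Hermite rule, rescaled for a N(0, sigma^2) expectation
t, w = np.polynomial.hermite.hermgauss(40)
quad = np.dot(w / np.sqrt(np.pi), np.cos(sigma * np.sqrt(2) * t))

# Monte Carlo with 10^4 samples for comparison
rng = np.random.default_rng(0)
mc = np.cos(rng.normal(0.0, sigma, 10_000)).mean()

# |quad - exact| is near machine precision; |mc - exact| ~ 1/sqrt(N)
```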
Moreover, our development of a versatile n-parameter controller family demonstrates that controllers need not be too complicated in order to satisfy the necessary condition.
In fact, a very
interesting research direction would be to increase the number of weights and generalize our n-parameter controller family to a more sophisticated family of controllers. Additionally, some of the
efficient search methods that have been used in both [5] and [6] with great success for the (2n+1)-bit
quantization controllers could be employed within this feasible set.
Finally, our work has shown that the necessary condition itself can be treated in a computational
manner and is not as daunting an object to study as it would appear. In fact, one could use the
general structure of our model to study the effects of using different basis functions and/or unevenly
sized elements in an effort to model even more accurately the equations and formulas that arise in
WC. For example, one could look into using quadrature abscissa points as the boundary points of
the elements instead of the uniformly spaced boundary points we utilized in our model.
Appendix A
MATLAB® Code
In this appendix, we list the main MATLAB® functions that we developed for our computational
finite element model.
A.1
Function Arguments
In the functions below, the common arguments used are the following:

* a is a function handle that maps the ordinal position of an element in the FE model (Section 4.2) to its location on the abscissa real axis;
* Bnd refers to the supremum of the bounded interval BI in which we conduct the computation using the FE model, i.e., Bnd = Kσ (Section 4.2.1);
* den is the denominator of a fractional quantity;
* Del refers to the mesh size of our FE model, Δ (Section 4.2.1);
* f is a function handle representing the Stage I controller, typically composed of our FE approximation via approxRBF and requiring both an argument at which to evaluate it (i.e., x) and a vector of FE weights (i.e., w) (Section 4.2);
* fVec is a vector comprising the values of f when f is evaluated at the abscissa values of the Gauss-Hermite quadrature [X0, W0];
* j refers to the jth element of the FE model, j = 1, ..., M (Section 4.2.1);
* k refers to the scaling factor for the Stage I cost in WC (Section 3.1);
* l indicates the first sub-basis function ψ¹ when l = 1 and the second sub-basis function ψ² when l = 2 (Section 4.2.1);
* m is the total number of elements, or number of points at which we place a basis function, which we formally denoted as M (Section 4.2.1);
* num is the numerator of a fractional quantity;
* tauVec is a vector of length n representing the weights of an n-level sinusoidal quantizer (Section 4.3.2);
* w is a vector of length M representing the weights (ω̂), or values at each element's location, of the overall FE model (Section 4.2);
* x is a general variable declaration at which to evaluate a given function, typically representing the initial Stage I random variable realization X0 (Section 3.1);
* y is a general variable declaration at which to evaluate a given function, typically representing the output;
* [X0, W0] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X0 = {x_i}_{i=1}^Q, and the probability weights at those points, W0 = {w_i}_{i=1}^Q, for the initial Stage I random variable X0 ~ N(0, σ²) (Section 3.1);
* [X, W] are vectors of length Q (Section 4.2.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X = {x̃_i}_{i=1}^Q, and the probability weights at those points, W = {w̃_i}_{i=1}^Q, for the additive white Gaussian noise V ~ N(0, 1) (Section 3.1);
* [X1, W1] are vectors of length R (Section 4.3.2) representing the Gauss-Hermite quadrature (Section 2.3) abscissa points, X1 = {x_i}_{i=1}^R, and the probability weights at those points, W1 = {w_i}_{i=1}^R, at which ‖G[ω̂]‖_R (Section 4.3.2) is evaluated.

A.2
Function Specifications
Next, we shall detail the MATLAB® functions that we developed to obtain the results in this thesis.
alpha.m
This function calculates the approximate value of the right summand in the necessary functional
equation G[f](y) (Theorem 4.1.1) using a FE model representation of the Stage I controller f via η[ω̂]
(Section 4.2.2):

ret = Σ_{j=1}^{Q} w̃_j η[ω̂](x̃_j + f[ω̂](x)) (2x̃_j² + x̃_j η[ω̂](x̃_j + f[ω̂](x)) − 2).

function [ ret ] = alpha( x, w, f, fVec, X0, W0, X, W )
    % Shift the noise quadrature nodes by f(x): y_j = x_j + f(x)
    Xfx = X + arrayfun(@(t) f(t,w), x);
    etaVals = arrayfun(@(y) eta(y,w,f,fVec,X0,W0), Xfx);
    diff = Xfx - arrayfun(@(t) f(t,w), x);
    FuncVals = etaVals.*(2.*diff.^2 + etaVals.*diff - 2);
    ret = W'*FuncVals;
end
approxRBF.m
This function creates a function handle for a FE model of a continuous function represented by the
element weight vector w (Section 4.2.1):

val = f[w](x).

function [ val ] = approxRBF( x, w, Del, m, a )
    val = 0;
    curElem = getCurElem(x,m,a);
    if curElem >= 1
        if curElem <= m
            val = w(curElem).*getBasisFcn(x,curElem-1,Del,m,a) + ...
                w(curElem+1).*getBasisFcn(x,curElem,Del,m,a);
        else
            val = a(m);
        end
    else
        val = a(0);
    end
end
calcGF.m
This function calculates ‖G[f]‖_R (Section 4.3.2) using the FE model representation of the necessary
functional equation G[ω̂] (Section 4.2.2):

G = ‖G[ω̂]‖_R.

function [ G ] = calcGF( f, tauVec, k, X0, W0, X, W, X1, W1, m, a, Bnd )
    w = getCosStepVec( tauVec, m+1, a, Bnd );
    G = norm(arrayfun(@(t) 2*k^2*(f(t,w)-t) + ...
        alpha(t,w,f,arrayfun(@(u) f(u,w),X0),X0,W0,X,W), X1), 2);
end
calcJ.m
This function calculates J[f] (Section 3.1) using a FE model representation of the Stage I controller
f via J[ω̂] (Section 4.2.2):

J = J[ω̂].

function [ J ] = calcJ( f, tauVec, k, X0, W0, X, W, X1, m, a, Bnd )
    w = getCosStepVec( tauVec, m+1, a, Bnd );
    fVec = arrayfun(@(u) f(u,w), X0);
    % J = k^2 E[(f(X0) - X0)^2] + E[f^2(X0)] - E[g^2(Y)]
    J = W0'*(k^2.*((fVec - X0).^2));
    J = J + W0'*(fVec.^2);
    J = J - W0'*arrayfun(@(t) W'*arrayfun(@(u) mmse(u,w,f,fVec,X0,W0)^2, ...
        X+arrayfun(@(v) f(v,w),t)), X0);
end
eta.m
This function calculates the approximate value of η(y) (Section 3.1) using a FE model representation
of the Stage I controller f via η[ω̂] (Section 4.2.2):

ret = η[ω̂](y).

function [ ret ] = eta( y, w, f, fVec, X0, W0 )
    ret = arrayfun(@(v) -v + mmse(v,w,f,fVec,X0,W0), y);
end
getBasisFcn.m
This function calculates the output of the jth basis function ψ_j when evaluated at the abscissa
value of x:

val = ψ_j(x).

function [ val ] = getBasisFcn( x, j, Del, m, a )
    val = 0;
    if j < 0
        return;
    elseif j > m
        return;
    elseif j == 0
        val = getRatBasisFcn(x-a(0),Del,1);
    elseif j == m
        if x == a(m)
            val = 1;
        else
            val = getRatBasisFcn(x-a(m-1),Del,2);
        end
    else
        val = getRatBasisFcn(x-a(j),Del,1) + getRatBasisFcn(x-a(j-1),Del,2);
    end
end
getCosStepVec.m

This function calculates the FE model weight vector ω̂ (Section 4.2.1) from a given n-level sinusoidal
quantizer realization tauVec.

function [ w0 ] = getCosStepVec( tauVec, m, a, Bnd )
    % m = numElements; a = partitionFcn;
    w0 = zeros(m,1);
    NWVec = length(tauVec);
    numBndWeights = NWVec-2;
    Delta = 2*Bnd/(numBndWeights+1);
    connectorFcn = @(t) 0.5*(cos((pi/Delta)*t-pi)+1);
    begElem = int32(getCurElem(-Bnd,m,a))-1;
    w0(1:(begElem+1)) = tauVec(1).*ones(begElem+1,1);
    endElem = int32(getCurElem(Bnd,m,a));
    w0((endElem+1):m) = tauVec(NWVec).*ones(m-endElem,1);
    bVec = [begElem, arrayfun(@(t) ...
        int32(getCurElem(-Bnd+t*Delta,m,a)), 1:numBndWeights) - 1, endElem];
    for i = 2:length(bVec)
        subWVec = arrayfun(@(t) a(double(t)), (bVec(i-1)+1):(bVec(i)-1));
        offsetInd = a(double(bVec(i-1)));
        w0((bVec(i-1)+1):(bVec(i)-1)) = ...
            (tauVec(i)-tauVec(i-1)).*arrayfun(@(t) connectorFcn(t-offsetInd), subWVec) ...
            + tauVec(i-1);
        w0(bVec(i)) = tauVec(i);
    end
end
getCurElem.m
This function returns which element, or, more specifically, which element interval, a given abscissa
value lies in. For example, if x ∈ E_j (Section 4.2.1), then

elemNum = j.

function [ elemNum ] = getCurElem( x, m, a )
    elemNum = 0;
    l2m = log2(m);
    if x >= a(0)
        if x < a(1)
            elemNum = 1;
        elseif x <= a(m)
            if x > a(m-1)
                elemNum = m;
            else
                % Bisection over the (power-of-two) element boundaries
                elemNum = m/2;
                for i = 2:(l2m)
                    if x >= a(elemNum)
                        if x < a(elemNum+1)
                            elemNum = elemNum + 1;
                            return;
                        else
                            elemNum = elemNum + 2^(l2m-i);
                        end
                    else
                        if x >= a(elemNum-1)
                            return;
                        else
                            elemNum = elemNum - 2^(l2m-i);
                        end
                    end
                end
            end
        else
            elemNum = m+1;
        end
    else
        elemNum = 0;
    end
end
getRatBasisFcn.m
This function calculates the output of the lth rational sub-basis function ψˡ (Section 4.2.1)
when evaluated at the abscissa value x:

val = ψˡ(x).

function [ val ] = getRatBasisFcn( x, Del, l )
    val = 0;
    if x >= 0
        if x < Del
            if l == 2
                val = 4/3-(4/3)/(1+(x/Del)+(x/Del)^2+(x/Del)^3);
            elseif l == 1
                val = -1/3+(4/3)/(1+(x/Del)+(x/Del)^2+(x/Del)^3);
            end
        end
    end
end
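A quick way to see why this pair of rational pieces works as a basis: on [0, Δ) they satisfy ψ¹(0) = 1, ψ²(0) = 0, and ψ¹ + ψ² ≡ 1, since the rational terms cancel in the sum. A Python port of getRatBasisFcn checking these properties (Δ = 1 chosen for illustration):

```python
def rat_basis(x, delta, l):
    """Python port of getRatBasisFcn: the l-th rational sub-basis on [0, delta)."""
    if not (0 <= x < delta):
        return 0.0
    u = x / delta
    r = (4.0 / 3.0) / (1.0 + u + u**2 + u**3)
    return 4.0 / 3.0 - r if l == 2 else -1.0 / 3.0 + r

# Partition of unity on [0, delta): the rational terms cancel in the sum
vals = [rat_basis(x, 1.0, 1) + rat_basis(x, 1.0, 2) for x in (0.0, 0.25, 0.5, 0.99)]
# each entry equals 1 up to floating-point roundoff
```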
GF.m
This function calculates G[f](y) (Theorem 4.1.1) using a FE model representation of the Stage I
controller f via G[ω̂] (Section 4.2.2):

G = G[ω̂](y).

function [ G ] = GF( f, tauVec, k, X0, W0, X, W, X1, W1, m, a, Bnd )
    w = getCosStepVec( tauVec, m+1, a, Bnd );
    G = @(t) 2*k^2*(f(t,w)-t) + alpha(t,w,f,arrayfun(@(u) f(u,w),X0),X0,W0,X,W);
end
mmse.m
This function calculates the approximate value of the MMSE g(y) (Section 3.1) using a FE model
representation of the Stage I controller f via g[ω̂] (Section 4.2.2):

ret = g[ω̂](y).

function [ ret ] = mmse( y, w, f, fVec, X0, W0 )
    tol = 1e-150;
    phi = @(t) 1/sqrt(2*pi).*exp(-1/2*t.^2);
    % Conditional density weights phi(y - f(x_i)) / N0(y) at the X0 nodes
    condPDF = smallDivide(arrayfun(@(u) phi(y-f(u,w)), X0), ...
        N0(y,w,f,X0,W0), tol);
    FuncVals = fVec.*condPDF;
    ret = W0'*FuncVals;
end
N0.m

This function calculates the approximate value of N0(y) (Section 3.1) using a FE model representation of the Stage I controller f via N0[ω̂] (Section 4.2.2):

ret = N0[ω̂](y).

function [ ret ] = N0( y, w, f, X0, W0 )
    phi = @(t) 1/sqrt(2*pi).*exp(-1/2*t.^2);
    FuncVals = arrayfun(@(t) phi(y-f(t,w)), X0);
    ret = W0'*FuncVals;
end
quantSign.m
This function represents a 1-bit quantizer function f_{a=1} (Section 3.3.2):

out = f_{a=1}(x).

function [ out ] = quantSign( x )
    N = length(x);
    out = ones(N,1);
    for j = 1:N
        if x(j) < 0
            out(j) = -1;
        end
    end
end
smallDivide.m
This function serves as a wrapper around a division operation such that, if a MATLAB® computation result is either an Inf or NaN declaration, the function returns 0. Otherwise, this function
returns num/den.

function [ val ] = smallDivide( num, den, tol )
    % tol is a reserved threshold argument; the Inf/NaN guards below suffice here
    n = length(num);
    val = zeros(n,1);
    for i = 1:n
        val(i) = num(i)/den;
        if isinf(val(i))
            val(i) = 0;
        end
        if isnan(val(i))
            val(i) = 0;
        end
    end
end
Bibliography
[1] Y.-C. Ho, "Team decision theory and information structures," Proceedings of the IEEE, vol. 68,
pp. 644-654, June 1980.
[2] J.-M. Bismut, "An example of interaction between information and control: The Transparency
of a game," IEEE Transactions on Automatic Control, vol. 18, no. 5, pp. 518-522, 1973.
[3] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Contr., vol. 6,
no. 1, pp. 131-147, 1968.
[4] S. Mitter and A. Sahai, "Information and control: Witsenhausen revisited," in Learning, control
and hybrid systems (Y. Yamamoto and S. Hara, eds.), vol. 241 of Lecture Notes in Control and
Information Sciences, pp. 281-293, Springer London, 1999.
[5] J. T. Lee, E. Lau, and Y.-C. Ho, "The Witsenhausen counterexample: A hierarchical search
approach for nonconvex optimization problems," IEEE Transactions on Automatic Control,
vol. 46, pp. 382-397, March 2001.
[6] N. Li, J. R. Marden, and J. S. Shamma, "Learning approaches to the Witsenhausen counterexample from a view of potential games," in Proceedings of the 48th IEEE Conference on Decision
and Control, 2009 held jointly with the 2009 28th Chinese Control Conference (CDC/CCC),
December 2009.
[7] Y.-C. Ho and T. S. Chang, "Another look at the nonclassical information structure problem,"
IEEE Transactions on Automatic Control, vol. 25, pp. 537-540, June 1980.
[8] C. H. Papadimitriou and J. N. Tsitsiklis, "Intractable problems in control theory," Proceedings
of 24th IEEE Conference on Decision and Control, pp. 1099-1103, December 1985.
[9] P. Grover, A. Sahai, and S. Y. Park, "The finite-dimensional Witsenhausen counterexample,"
Proceedings of the 7th International Symposium on Modeling and Optimization in Mobile, Ad
Hoc, and Wireless Networks, 2009 (WiOPT), pp. 1-10, June 2009.
[10] P. Grover and A. Sahai, "Witsenhausen's counterexample as Assisted Interference Suppression,"
Int. J. Syst. Control Commun., vol. 2, no. 1/2/3, pp. 197-237, 2010.
[11] C. Choudhuri and U. Mitra, "On Witsenhausen's counterexample: The asymptotic vector case,"
in Proceedings of 2012 IEEE Information Theory Workshop (ITW), pp. 162-166, September
2012.
[12] Y. Wu and S. Verdu, "Witsenhausen's counterexample: A view from optimal transport theory,"
in Proceedings of 2011 50th IEEE Conference on Decision and Control and European Control
Conference (CDC-ECC), pp. 5732-5737, December 2011.
[13] D. Liberzon, "Calculus of variations and optimal control theory: A concise introduction."
http://liberzon.csl.illinois.edu/teaching/cvoc/node1.html, 2010. [Online; accessed 27-August-2014].
[14] J. A. Gubner, "Gaussian quadrature and the eigenvalue problem," 2009.
[15] G. H. Golub and J. H. Welsch, "Calculation of Gauss quadrature rules," Math. Comp., vol. 23,
no. 106, pp. 221-230, 1969.
[16] F. Santambrogio, "Introduction to optimal transport theory." Lecture notes, June 2009.
[17] C. Villani, Optimal Transport, Old and New. Springer, 2009.
[18] Y. Wu and S. Verdu, "Functional properties of minimum mean square error and mutual information," IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1289-1301, 2012.
[19] Y. Wu and S. Verdu, "Functional properties of MMSE."
[20] D. Guo, Y. Wu, S. Shamai, and S. Verdu, "Estimation in Gaussian noise: Properties of the
minimum mean-square error," IEEE Transactions on Information Theory, vol. 57, pp. 2371-2385, April 2011.
[21] A. van Niekerk and F. D. van Niekerk, "A Galerkin method with rational basis functions for
Burgers equation," Computers Math. Applic., vol. 20, no. 2, pp. 45-51, 1990.
[22] MATLAB, Version 8.4.0.150421 (R2014b). Natick, Massachusetts: The MathWorks Inc., 2014.