The NFL and the HMI: The No Free Lunch Theorem and the Human-Machine Interface
By Yu-Chi Ho
Harvard University
Premise and Introduction
Many scientific fields of study have postulated some sort of impossibility theorem. In
mathematics, for example, Gödel’s proof roughly states that in any sufficiently rich, consistent mathematical system
there always exist statements that can be neither proved nor disproved within the system. In economics, Arrow’s Impossibility Theorem
on social choice precludes the ideal of a perfect democracy. The No Free Lunch (NFL) theorem
[1], though far less celebrated and much more recent, tells us that without any structural
assumptions on an optimization problem, no algorithm can perform better on average than blind
search. Although for most such impossibility theorems, the proofs are lengthy, difficult, and
often not at all intuitive, we can get a feel for the NFL theorem by considering the proverbial
“needle in a haystack” problem. Clearly, in this instance, no algorithm has any better chance of
finding the optimum than blind search. From a philosophical viewpoint, the belief that you
cannot get something for nothing is reflected by the folk theorem that
Robustness × Efficiency = Constant, or Generality × Depth = Constant
Additional and more sophisticated insights can be gleaned by considering the following system-theoretic facts:
1. Optimal estimation and Kalman filtering: The essence of optimal filtering is to subtract out
all known structural information. What is left represents the residual independent, identically
distributed noise. No method operating on this residual can do any better than simple
averaging. The innovation sequence is just a sophisticated way of absorbing the truly new
information and integrating out the i.i.d. noise at the same time.
2. Design of the TCP/IP for the Internet: The Internet Protocol (IP) is extremely simple in
concept: basically, a packet is a packet is a packet. The Internet does not care whether a packet carries
voice, video, data, or anything else. In the 20 years since its invention, the IP has served us tremendously
well, despite the incredible evolution of the communication network into what we know today as the World Wide Web,
an evolution that no one, including the IP’s original designers, could have foreseen. The main reason is that the IP is not specially tuned to any particular way
we intend to use the network and makes no structural assumptions about the message type
and nature. Robustness is gained via its simplicity.
The second acronym of the title, HMI, stands for human-machine interface. Traditionally, this term has had the
narrow meaning of facilitating communication between humans and computing machines, such as via the graphical
user interface (GUI) or head-up displays. But as technology advances, this communication interface will also
evolve. Certainly, in the future we can expect voice, natural-language, and fuzzy-language inputs and outputs. Here, however, we
are using the term in an even broader sense: to demarcate the dividing line between what humans do and what
machines do in problem solving or optimization. As fuzzy logic proponents frequently remind us, “there are things
human beings do well versus what machines do well.” This dividing line is also a moving boundary as technology
and our understanding of the solution process progress. Solving a difficult problem often requires an appropriate
division of labor between humans and machines.
Two examples illustrate this aspect of the human-machine interface well. The first is the celebrated 1997 match
between the chess master Kasparov and the machine Deep Blue. The popular press tends to herald this as a triumph of
machine over human intelligence, but the truth is that Deep Blue’s win was due to an optimal allocation of the
skills of the assisting grand masters in nightly training and tuning of the vast computing power and sophisticated
programs in the machine. It is simply a demonstration that combined human and computational intelligence,
properly allocated, beats human intelligence alone. The other example is the design of the Boeing 777 airplane, during
which no paper engineering drawings were used, permitting coordination of design changes and concurrent
engineering on a scale never before possible. The resultant benefit is another illustration of using machines to do
tasks they do best, such as tedious and laborious tracking and coordination of details.
The twin purposes of this article are to explore the implications of NFL and to address the proper allocation of
natural and computational intelligence in optimization problem solving.
Fundamental Computational Limits
Two well-known fundamental limits exist for optimization computation. The first is the 1/√t limit on the rate at which confidence
intervals shrink in stochastic estimation. For example, if we take i.i.d. noisy samples of a constant x in the form z =
x + noise, there is no better estimate of x than the average of the noisy samples. Furthermore, the error of this optimal
estimate cannot decrease faster than 1/√t, where t is the number of samples taken. In other words, for each order-of-
magnitude increase in the accuracy of a performance estimate, two orders of magnitude of additional computing
cost must be paid, even in the best of circumstances. In cases where the system performance (x in the above example)
must be evaluated via expensive Monte Carlo simulation, this is often an infeasible burden. The second limit is the
well-known limit of combinatorial explosion, which we denote as the NP-hard limit or, in the context of system
theory, the curse of dimensionality. The search space for most problems becomes so large so quickly that no
foreseeable progress in computing hardware can keep pace. The simplest example is to consider arbitrary
multivariable functions: a function of one variable is a table; of two variables, a book of tables; of three, a library of books; of
four, a universe of libraries; and so on, quickly overtaking any astronomical number.
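
To make the first limit concrete, here is a minimal numerical sketch (the constant, noise level, and sample counts are illustrative assumptions, not values from the text): averaging t i.i.d. noisy samples drives the root-mean-square error down only as 1/√t, so each tenfold gain in accuracy costs roughly a hundredfold more samples.

import numpy as np

rng = np.random.default_rng(0)
x_true = 3.0        # the unknown constant x being estimated
noise_std = 1.0     # standard deviation of the i.i.d. measurement noise

for t in [100, 10_000, 1_000_000]:
    # Repeat the experiment to estimate the RMS error of the sample mean.
    errors = [np.mean(x_true + noise_std * rng.standard_normal(t)) - x_true
              for _ in range(100)]
    rms = float(np.sqrt(np.mean(np.square(errors))))
    print(f"t = {t:>9,d}   RMS error ~ {rms:.4f}   (noise_std/sqrt(t) = {noise_std / np.sqrt(t):.4f})")

# Each 100x increase in samples buys only a 10x reduction in error: the 1/sqrt(t) limit.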
Implications
The implication of the above discussion is that if you have a general optimization problem involving uncertainty and very
little prior knowledge, the situation is rather hopeless. Because of the NFL theorem, you cannot do any better than a
blind search. Each blind-search evaluation will be very expensive, with no hope of future improvement, theoretical
or otherwise. And the number of performance evaluations required to get anywhere is simply too large. Neither time
nor theoretical or technological progress is on your side. No grand optimization algorithm to end all algorithms is
possible.
The flip side of this conclusion is that structure (i.e., specialization) and heuristics are important. Understanding
structure and finding good heuristics require modeling and learning. This is the raison d’être for continuing
theoretical research on modeling and more specialized algorithms. A perfect example in the system theory context is
the solution of the LQG problem, which arguably represents optimal control theory in its finest hour. From the
theoretical structures erected since Gauss and Newton concerning calculus and least squares to the more modern
theory of probability, dynamic programming, and feedback control, we have managed to reduce a very difficult
optimization problem in real-time dynamic estimation and control to the off-line integration of a pair of matrix Riccati
differential equations. At the same time, this LQG solution has the virtue of being sufficiently general to be
practically useful. But it is a solution involving a large amount of structure and specialization. According to
Witsenhausen, you breathe on it and the whole thing collapses [2].¹ The LQG solution also illustrates another
example of the HMI, the other topic of this article. Integration of ordinary differential equations of moderate size, say
1000 variables, is a reasonably straightforward and feasible task for modern computers but not for humans or even
teams of humans. Ginsberg characterizes this demarcation as letting computers do moderately brute-force tasks
while leaving the more creative work to humans [3].
¹ Of course, Witsenhausen had in mind structural rather than parametric perturbations.
Of course, theories and structures such as LQG or linear programming come along once a generation, if not less
frequently. While we continue such efforts in developing new theoretical structures, what do we do when confronted
with impossible problems now?
This is where soft computing, soft optimization, and automated learning come in. We temper our goals and at the
same time try to extend the range of what computers do well—moving the boundary of HMI.
Soft Optimization and Learning
One illustration of this concept is the substitution of “good enough with high probability” for “best for sure” in
ordinal optimization [4]. This idea of goal softening is also implicit in many tools of computational intelligence,
such as genetic algorithms and fuzzy logic. What softening of the goal buys is an easing of the computational
burden. It is much easier to get something within the top 5% than it is to get the best. To illustrate this quantitatively,
suppose you have a search space of size |Θ| = 1 billion and take N = 1000 random samples. What is the probability
that you will have at least one sample in the top n? The answer is 1 − (1 − n/|Θ|)^N, which for the values chosen equals
0.01 for n = 10,000, or the top 0.001%, but decreases to 10⁻⁶ for n = 1. Thus, with a per-trial success probability of
0.01, roughly 100 trials suffice to make success likely, but a four-order-of-magnitude increase in trials is
required if you insist on the best.
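
These numbers are easy to verify directly; a minimal sketch (S stands for the search-space size |Θ|, and the values are the ones quoted above):

# Probability that N uniform random samples include at least one of the
# top-n designs in a search space of size S: 1 - (1 - n/S)**N.
def p_top_n(S: int, N: int, n: int) -> float:
    return 1.0 - (1.0 - n / S) ** N

S, N = 10**9, 1000
print(p_top_n(S, N, n=10_000))   # ~0.00995: the "good enough" (top 0.001%) goal
print(p_top_n(S, N, n=1))        # ~1e-6: insisting on the single best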
Ordinal optimization also takes the view that for many problems, order or rank information rather than a specific
value estimate is all that is needed. Of course, order is much easier to determine computationally than value:
consider the intuitive example of determining which of two boxes held in two hands is heavier versus identifying
how much heavier one is than the other.
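
A small simulation makes the point concrete (a sketch with illustrative, made-up numbers: two alternatives whose true performance gap is small relative to the observation noise). With a modest number of samples, the sample means already rank the two alternatives correctly most of the time, even though the value estimates themselves are still off by an amount comparable to the gap.

import numpy as np

rng = np.random.default_rng(1)
mu_a, mu_b, noise = 1.0, 1.2, 1.0   # illustrative true performances and noise level
t, trials = 100, 10_000             # samples per alternative, repeated experiments

correct_rank = 0
value_errors = []
for _ in range(trials):
    est_a = np.mean(mu_a + noise * rng.standard_normal(t))
    est_b = np.mean(mu_b + noise * rng.standard_normal(t))
    correct_rank += est_b > est_a           # is the truly better design ranked first?
    value_errors.append(abs(est_b - mu_b))  # how far off is the value estimate?

print("fraction ranked correctly:", correct_rank / trials)                    # typically ~0.92
print("mean value error / true gap:", np.mean(value_errors) / (mu_b - mu_a))  # typically ~0.4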
At the same time, topics such as data mining and neural network training represent attempts at increasing the
intelligence of machines and moving the demarcation line of HMI. In this vein, automated learning plays a
significant role. All human knowledge discoveries, advances, and propagation since time immemorial are based on
learning. From a practical viewpoint, automated learning, be it supervised or reinforcement learning, Q-learning or pattern
recognition, aims at removing the drudgery from the human learning process (i.e., rote learning and memorization or
the processing and summarizing of large volumes of data). Success depends on a carefully structured situation based
on human understanding, modeling, and creation. The computer then does the brute-force processing within this
framework subject to the NP-hard limit mentioned above (i.e., moderate but not combinatorially large brute-force
tasks). Many success stories have been recorded. Additional insights on HMI can be gleaned by contrasting
Bayesian learning with the PAC (Probably Approximately Correct) learning currently popular in artificial
intelligence [5]. Bayesian learning imposes a great deal of probabilistic structure but, when implemented with the
right data, is computationally very effective. PAC learning, on the other hand, assumes very little and in principle is
totally general. Everything can be left to the machine. But in practice, “such solutions are little solved,” in the words
of Poincaré. Its main contribution is not in practicality, but in laying bare the theoretical worst-case limit. Both are
creations of the human mind. We will always be needed to create models and to understand limits. The day when
machines will take over completely is far in the future.
One last example of human-machine partnership can be illustrated by heuristics. A heuristic solution born out of
human experience, insight, and ingenuity can often be good enough but is seldom optimal. Suppose we consider
randomly perturbing such a heuristic solution and let the probability that the perturbed solution actually constitutes a
better solution be ε. The probability that a better solution can then be found via N repeated random
perturbations is simply

success probability = 1 − (1 − ε)^N ≈ Nε for small ε.

For ε = 0.05, N = 20 perturbations already yield a success probability of about 0.64 (Nε ≈ 1). For many combinatorial optimization problems, the evaluations
of the performance criteria are often only moderately cost-intensive on a computer. Such an amalgam of human
ingenuity and experience with the computing power of a machine can yield dramatic improvements. We denote such
an approach as super- or metaheuristics [6-7].
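
The arithmetic behind these numbers is equally quick to check (a minimal sketch using the ε and N quoted above):

# Probability that at least one of N independent random perturbations of a
# heuristic solution improves on it, when each perturbation improves with
# probability eps: 1 - (1 - eps)**N, approximately N*eps for small eps.
def improvement_probability(eps: float, N: int) -> float:
    return 1.0 - (1.0 - eps) ** N

eps, N = 0.05, 20
print(improvement_probability(eps, N))   # ~0.64
print(eps * N)                           # 1.0, the small-eps approximation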
The Athans–Zadeh Debate
Last, we illustrate NFL and HMI by reviewing the debate between Athans and Zadeh on fuzzy versus traditional
control at the 1998 Conference on Decision and Control this past December [8]. Following is my own interpretation
of the essence of the debate.
Athans began by noting that almost all the applications of “fuzzy logic” to traditional feedback control problems
amount to nothing more than interpolating or extrapolating among well-known controller designs implemented with
traditional techniques. Although the interpolation/extrapolation is couched in terms of the membership functions of
fuzzy logic, this is a matter of semantics and not an advance in theory or technique. Zadeh did not actually dispute
this, but pointed out that traditional tools are applicable only to well-structured problems such as feedback control of
linear or piecewise linear dynamic systems where our knowledge about the problems is deep and extensive. But the
world is full of control problems beyond this narrow confine. Most such problems are poorly understood and
described only in natural language terms. For these problems, fuzzy logic can play a role by quantifying
imprecise natural language and/or by converting human experience to systematic but fuzzy if-then rules. There is no
argument that for these high-level, task-oriented problems, such as driving and parking a car in congested city traffic,
human ability has no equal. Fuzzy logic control attempts to duplicate human skills in such problems.² In this sense,
and in terms of our earlier discussion, there is actually more agreement than disagreement between the two
positions: We should continue our research in modeling and techniques so that they become applicable to a wider
problem domain. But in the meantime, when faced with daunting tasks beyond the reach of current techniques, fuzzy
logic and other soft computing tools may represent a viable alternative.
Yes, fuzzy logic often uses no models because there aren’t any. Instead it attempts to quantify and systematize
human experience. True, human experience and heuristics do not always work, but from the standpoint of dealing
with hard problems via goal softening and the easing of computational burden, it is a reasonable price to pay for a
less than optimal but often good enough solution.³ From the viewpoint of the NFL theorem, the debate is simply an
illustration of generality versus depth. Feedback control theory, particularly linear or piecewise linear control theory,
is well defined with considerable depth, whereas fuzzy logic assumes little but claims more generality. However,
being more generally applicable does not necessarily guarantee satisfactory performance.⁴ Similarly, no one should
expect robust adaptive control theory to be of help in high-level military command and control. As long as we must
communicate in natural languages and deal with ever bigger and harder problems, the above dichotomy and gap will
always exist. What matters is where we choose to place the HMI boundary to reflect scientific and technological
progress and how we trade off generality and depth given the NFL theorem.
This state of affairs, as well as the fundamental limits mentioned earlier, may suggest a very pessimistic view of the
future of optimization and decision making. Add to that the age-old insight that talent and creativity count less than
10% in the real world and one is tempted to subscribe to the second law of thermodynamics view of life: You cannot
win, you cannot break even, you cannot even quit the game. On the other hand, there is always the more optimistic
“glass is half full” view, which regards every advance in human knowledge as a local victory in the cosmic scheme
of things.⁵ More and more, drudgery work of the kind that can only be done by educated and skilled humans will be
taken over by machines, leaving humans to do even more sophisticated and intelligent work of the creative kind.
Human intelligence as the higher level partner with that of the machine will always be needed. Once we recognize
this, we will have far less debate and recrimination and can get on with what we each do best.
Acknowledgments
The work reported in the paper was supported in part by NSF grant EEC 95-27422, Army contract DAAH04-940148, AFOSR contract 49620-98-1-0387, and ONR contract N00014-98-1-0720. The author thanks Dr. T.W. Lau
for interesting discussions.
² To be fair to traditional control theory, fuzzy logic, despite its professed goal, has not accomplished the tasks of parking or driving either.
³ This is particularly true if these soft computing approaches are quantifiable and not merely intuitive or qualitative.
⁴ Blind search is universally applicable but not always satisfactory or efficient. See also the remarks concerning PAC learning above.
⁵ After all, Gödel’s proof and Arrow’s impossibility theorem did not stop continuing research and progress in mathematics and economics.

References
1. W.G. Macready and D.H. Wolpert, “The No Free Lunch theorems,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, April 1997.
2. H. Witsenhausen, “Separation of estimation and control for discrete time systems,” Proceedings of the IEEE, vol. 59, pp. 1557-1566, 1971.
3. M.L. Ginsberg, “Do computers need common sense?” Proc. Knowledge Representation & Reasoning Conference, 1996.
4. Y.C. Ho, “An explanation of ordinal optimization—soft optimization for hard problems,” Information Sciences, vol. 113, pp. 169-192, January 1999.
5. D. McAllester, “Some PAC-Bayesian Theorems,” Technical Report, AT&T Labs-Research, 1998.
6. T.W.E. Lau and Y.C. Ho, “Super-heuristics and its application to combinatorial optimization problems,” submitted to the Asian Journal of Control, 1999.
7. T.A. Feo, M.G.C. Resende, and S.H. Smith, “A greedy randomized adaptive search procedure for maximum independent set,” Operations Research, vol. 42, no. 5, pp. 860-878, 1994.
8. M. Athans and L. Zadeh, “Fuzzy versus traditional control—a debate,” Proceedings of the 1998 IEEE Conference on Decision and Control, 1998.