The NFL and the HMI: The No Free Lunch Theorem and the Human-Machine Interface

By Yu-Chi Ho
Harvard University

Premise and Introduction

Many scientific fields of study have postulated some sort of impossibility theorem. In mathematics, for example, Gödel's proof roughly states that in any sufficiently powerful mathematical system, statements exist that can be neither proved nor disproved. In economics, Arrow's Impossibility Theorem on social choice precludes the ideal of a perfect democracy. The No Free Lunch (NFL) theorem [1], though far less celebrated and much more recent, tells us that without any structural assumptions on an optimization problem, no algorithm can perform better on average than blind search.

Although the proofs of most such impossibility theorems are lengthy, difficult, and often not at all intuitive, we can get a feel for the NFL theorem by considering the proverbial "needle in a haystack" problem. Clearly, in this instance, no algorithm has any better chance of finding the optimum than blind search.
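The needle-in-a-haystack point can be checked with a small simulation. The sketch below is a minimal illustration of mine (the helper names and the problem size are assumptions, not taken from the article): it hides a single needle at a uniformly random position in a structureless search space and compares a deterministic left-to-right scan with blind random search. With no structure to exploit, both need about (n + 1)/2 evaluations on average, which is the NFL intuition in miniature.

```python
import random

def evals_to_find(order, needle):
    """Count evaluations until the needle is hit, following a given visiting order."""
    for k, x in enumerate(order, start=1):
        if x == needle:
            return k

def one_trial(n):
    needle = random.randrange(n)        # structureless problem: the optimum could be anywhere
    sequential = range(n)               # deterministic left-to-right scan
    blind = random.sample(range(n), n)  # blind random search without replacement
    return evals_to_find(sequential, needle), evals_to_find(blind, needle)

n, trials = 10_000, 2_000
results = [one_trial(n) for _ in range(trials)]
print(f"sequential scan: {sum(r[0] for r in results) / trials:8.1f} evaluations on average")
print(f"blind search:    {sum(r[1] for r in results) / trials:8.1f} evaluations on average")
print(f"theory (n+1)/2:  {(n + 1) / 2:8.1f}")
```

Any other fixed visiting order gives the same average: without structural knowledge, no search order helps.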
From a philosophical viewpoint, the belief that you cannot get something for nothing is reflected by the folk theorem that

    Robustness × Efficiency = Constant,  or equivalently,  Generality × Depth = Constant.

More sophisticated insights can be gleaned by considering the following system-theoretic facts:

1. Optimal estimation and Kalman filtering: The essence of optimal filtering is to subtract out all known structural information. What is left represents the residual independent, identically distributed noise. No method operating on this residual can do any better than simple averaging. The innovation sequence is just a sophisticated way of absorbing the truly new information and integrating out the i.i.d. noise at the same time.

2. Design of the TCP/IP for the Internet: The Internet Protocol (IP) is extremely simple in concept: basically, a packet is a packet is a packet. The Internet does not care whether it carries voice, video, data, or anything else. In the 20 years since its invention, the IP has served us tremendously well, despite the incredible evolution of the communication network into what we know today as the World Wide Web, an evolution that no one, including the original designers of the IP, could have foreseen. The main reason is that the IP is not specially tuned to any particular way we intend to use the network and makes no structural assumptions about the type and nature of the messages. Robustness is gained via simplicity.

The second acronym of the title, HMI, stands for human-machine interface. Traditionally, this term has had the narrow meaning of facilitating communication between humans and computing machines, such as via the graphical user interface (GUI) or head-up displays. But as technology advances, this communication interface will also evolve. Certainly, in the future we can expect voice, natural language, and fuzzy language inputs and outputs. Here, however, we are using the term in an even broader sense: to demarcate the dividing line between what humans do and what machines do in problem solving or optimization. As fuzzy logic proponents frequently remind us, "there are things human beings do well versus what machines do well." This dividing line is also a moving boundary as technology and our understanding of the solution process progress.

Solving a difficult problem often requires an appropriate division of labor between humans and machines. Two examples illustrate this aspect of the human-machine interface well. The first is the celebrated 1997 match between the chess master Kasparov and the machine Deep Blue. The popular press tends to herald this as a triumph of machine over human intelligence, but Deep Blue's win was actually due to an optimal allocation of the skills of the assisting grandmasters, who each night trained and tuned the vast computing power and sophisticated programs in the machine. It is simply a demonstration that combined human and computational intelligence, properly allocated, beats human intelligence alone. The other example is the design of the Boeing 777 airplane, during which no paper engineering drawings were used, permitting coordination of design changes and concurrent engineering on a scale never before possible. The resulting benefit is another illustration of using machines to do the tasks they do best, such as the tedious and laborious tracking and coordination of details.

The twin purposes of this article are to explore the implications of the NFL theorem and to address the proper allocation of natural and computational intelligence in optimization problem solving.

Fundamental Computational Limits

Two well-known fundamental limits exist for optimization computation. The first is the 1/√t limit on the decrease of confidence intervals in stochastic estimation. For example, if we take i.i.d. noisy samples of a constant x in the form z = x + noise, there is no better estimate of x than the average of the noisy samples. Furthermore, the error of this optimal estimate cannot decrease faster than 1/√t, where t is the number of samples taken. In other words, for each order of magnitude increase in the accuracy of a performance estimate, two orders of magnitude increase in computing cost must be paid, even in the best of circumstances. In cases where the system performance (x in the above example) must be evaluated via expensive Monte Carlo simulation, this is often an infeasible burden.
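As a quick check of the 1/√t scaling, here is a minimal Monte Carlo sketch of my own (the true value, noise level, and number of replications are assumed for illustration): it estimates a constant from t noisy samples by simple averaging and reports the root-mean-square error for two values of t.

```python
import random

def rms_error_of_average(t, true_x=1.0, noise_std=1.0, runs=500):
    """RMS error of the sample mean of t noisy observations z = x + noise."""
    sq_err = 0.0
    for _ in range(runs):
        avg = sum(true_x + random.gauss(0.0, noise_std) for _ in range(t)) / t
        sq_err += (avg - true_x) ** 2
    return (sq_err / runs) ** 0.5

for t in (100, 10_000):
    print(f"t = {t:6d}   RMS error of the average = {rms_error_of_average(t):.4f}")
# 100x more samples buys only about 10x more accuracy: the error shrinks like 1/sqrt(t).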
The second limit is the well-known limit of combinatorial explosion, which we denote as the NP-hard limit or, in the context of system theory, the curse of dimensionality. The search space for most problems becomes so large so quickly that no foreseeable progress in computing hardware can keep pace. The simplest example is to consider arbitrary multivariable functions: a function of one variable is a table; of two variables, a book of tables; of three, a library of books; of four, a universe of libraries; and so on, quickly overtaking any astronomical number.

Implications

The implication of the above discussion is that if you have a general optimization problem involving uncertainty and very little prior knowledge, the situation is rather hopeless. By the NFL theorem, you cannot do any better than blind search. Each blind-search evaluation will be very expensive, with no hope of future improvement, theoretical or otherwise. And the number of performance evaluations required to get anywhere is simply too large. Neither time nor theoretical and technological progress is on your side. No grand optimization algorithm to end all algorithms is possible.

The flip side of this conclusion is that structure, specialization, and heuristics are important. Understanding structure and finding good heuristics require modeling and learning. This is the raison d'être for continuing theoretical research on modeling and on more specialized algorithms. A perfect example in the system-theory context is the solution of the LQG problem, which arguably represents optimal control theory in its finest hour. From the theoretical structures erected since Gauss and Newton, concerning calculus and least squares, to the more modern theory of probability, dynamic programming, and feedback control, we have managed to reduce a very difficult optimization problem in real-time dynamic estimation and control to the off-line integration of a pair of matrix Riccati differential equations. At the same time, the LQG solution has the virtue of being sufficiently general to be practically useful. But it is a solution involving a large amount of structure and specialization. According to Witsenhausen, you breathe on it and the whole thing collapses [2] (of course, Witsenhausen had in mind structural rather than parametric perturbations).

The solution of the LQG problem also illustrates the other topic of this article, the HMI. Integration of ordinary differential equations of moderate size, say 1000 variables, is a reasonably straightforward and feasible task for modern computers, but not for humans or even teams of humans. Ginsberg characterizes this demarcation as letting computers do moderately brute-force tasks and relegating humans to things more creative [3].

Of course, theories and structures such as LQG or linear programming come along once a generation, if not less frequently. While we continue such efforts at developing new theoretical structures, what do we do when confronted with impossible problems now? This is where soft computing, soft optimization, and automated learning come in. We temper our goals and at the same time try to extend the range of what computers do well, moving the boundary of the HMI.

Soft Optimization and Learning

One illustration of this concept is the substitution of "good enough with high probability" for "best for sure" in ordinal optimization [4]. This idea of goal softening is also implicit in many tools of computational intelligence, such as genetic algorithms and fuzzy logic. What softening of the goal buys is an easing of the computational burden. It is much easier to get something within the top 5% than it is to get the best. To illustrate this quantitatively, suppose you have a search space Θ of size |Θ| = one billion and take N = 1000 random samples. What is the probability that at least one sample lies in the top n? The answer is 1 - (1 - n/|Θ|)^N, which for the values chosen equals 0.01 for n = 10,000 (the top 0.001%) but decreases to 10^-6 for n = 1. Thus, with a success probability of 0.01, approximately 100 trials are required before success can be expected, but a four-order-of-magnitude increase in trials is required if you insist on the best.
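This calculation is easy to reproduce. The short sketch below is my own illustration (the function name and the list of n values are assumptions): it evaluates 1 - (1 - n/|Θ|)^N for a space of one billion designs and N = 1000 blind samples.

```python
def prob_at_least_one_in_top_n(n, space_size, samples):
    """P(at least one of `samples` uniform draws lands in the top n of the space)."""
    return 1.0 - (1.0 - n / space_size) ** samples

space = 1_000_000_000   # |Theta| = one billion designs
N = 1_000               # blind random samples taken
for n in (1, 10, 100, 1_000, 10_000):
    p = prob_at_least_one_in_top_n(n, space, N)
    print(f"top {n:>6d} ({100 * n / space:.7f}% of the space): P = {p:.2e}")
```

The printed values recover the probability of about 0.01 for the top 10,000 designs and about 10^-6 for the single best design quoted above.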
Ordinal optimization also takes the view that for many problems, order or rank information, rather than a specific value estimate, is all that is needed. Of course, order is much easier to determine computationally than value: consider the intuitive example of determining which of two boxes held in two hands is heavier versus determining how much heavier one is than the other.

At the same time, topics such as data mining and neural network training represent attempts at increasing the intelligence of machines and moving the demarcation line of the HMI. In this vein, automated learning plays a significant role. All human knowledge discovery, advancement, and propagation since time immemorial has been based on learning. From a practical viewpoint, automated learning, be it supervised or reinforced, Q-learning or pattern recognition, aims at removing the drudgery from the human learning process (i.e., rote learning and memorization, or the processing and summarizing of large volumes of data). Success depends on a carefully structured situation based on human understanding, modeling, and creation. The computer then does the brute-force processing within this framework, subject to the NP-hard limit mentioned above (i.e., moderate but not combinatorially large brute-force tasks). Many success stories have been recorded.

Additional insight on the HMI can be gleaned by contrasting Bayesian learning with the PAC (probably approximately correct) learning currently popular in artificial intelligence [5]. Bayesian learning imposes a great deal of probabilistic structure but, when implemented with the right data, is computationally very effective. PAC learning, on the other hand, assumes very little and in principle is totally general; everything can be left to the machine. But in practice such solutions solve little, in the words of Poincaré. Its main contribution lies not in practicality but in laying bare the theoretical worst-case limit. Both are creations of the human mind. We will always be needed to create models and to understand limits. The day when machines take over completely is far in the future.

One last example of human-machine partnership is illustrated by heuristics. A heuristic solution born of human experience, insight, and ingenuity can often be good enough but is seldom optimal. Suppose we randomly perturb such a heuristic solution and let ε be the probability that the perturbed solution is actually a better one. The probability that a better solution is found via N repeated random perturbations is then simply 1 - (1 - ε)^N. For ε = 0.05, N = 20 perturbations already give a success probability of about 0.64, and roughly 60 perturbations push it above 0.95. For many combinatorial optimization problems, evaluating the performance criterion is only moderately expensive on a computer, so such an amalgam of human ingenuity and experience with the computing power of a machine can yield dramatic improvements. We denote such an approach as super- or metaheuristics [6], [7].
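The arithmetic behind these numbers is shown in the sketch below (my own illustration; the function names and the 0.95 confidence target are assumptions, not from the article): it evaluates 1 - (1 - ε)^N and also inverts the formula to find how many perturbations are needed for a desired confidence.

```python
import math

def success_prob(eps, n):
    """P(at least one of n independent perturbations improves on the heuristic)."""
    return 1.0 - (1.0 - eps) ** n

def perturbations_needed(eps, target=0.95):
    """Smallest n such that success_prob(eps, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - eps))

eps = 0.05
print(f"eps = {eps}: P(success in 20 perturbations) = {success_prob(eps, 20):.2f}")      # about 0.64
print(f"eps = {eps}: perturbations needed for 95% confidence = {perturbations_needed(eps)}")  # 59
```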
The Athans–Zadeh Debate

Last, we illustrate the NFL theorem and the HMI by reviewing the debate between Athans and Zadeh on fuzzy versus traditional control at the 1998 IEEE Conference on Decision and Control this past December [8]. What follows is my own interpretation of the essence of the debate.

Athans began by noting that almost all applications of fuzzy logic to traditional feedback control problems amount to nothing more than interpolating or extrapolating among well-known controller designs implemented with traditional techniques. Although the interpolation/extrapolation is couched in terms of the membership functions of fuzzy logic, this is a matter of semantics and not an advance in theory or technique.

Zadeh did not actually dispute this, but pointed out that traditional tools are applicable only to well-structured problems, such as feedback control of linear or piecewise-linear dynamic systems, where our knowledge about the problem is deep and extensive. But the world is full of control problems beyond this narrow confine. Most such problems are poorly understood and described only in natural-language terms. For these problems, fuzzy logic can play a role, either by quantifying imprecise natural language or by converting human experience into systematic, if fuzzy, if-then rules. There is no argument that for such high-level, task-oriented problems, such as driving and parking a car in congested city traffic, human ability has no equal. Fuzzy logic control attempts to duplicate human skills in such problems (to be fair to traditional control theory, fuzzy logic, despite its professed goal, has not accomplished the tasks of parking or driving either).

In this sense, and in terms of our earlier discussion, there is actually more agreement than disagreement between the two positions: We should continue our research in modeling and techniques so that they become applicable to a wider problem domain. But in the meantime, when faced with daunting tasks beyond the reach of current techniques, fuzzy logic and other soft computing tools may represent a viable alternative. Yes, fuzzy logic often uses no models, because there aren't any; instead it attempts to quantify and systematize human experience. True, human experience and heuristics do not always work, but from the standpoint of dealing with hard problems via goal softening and the easing of the computational burden, this is a reasonable price to pay for a less-than-optimal but often good-enough solution (particularly if these soft computing approaches are quantifiable and not merely intuitive or qualitative).

From the viewpoint of the NFL theorem, the debate is simply an illustration of generality versus depth. Feedback control theory, particularly linear or piecewise-linear control theory, is well defined and has considerable depth, whereas fuzzy logic assumes little but claims more generality. Being more generally applicable, however, does not guarantee satisfactory performance (blind search is universally applicable but not always satisfactory or efficient; see also the remarks concerning PAC learning above). Similarly, no one should expect robust adaptive control theory to be of help in high-level military command and control. As long as we must communicate in natural languages and deal with ever bigger and harder problems, the above dichotomy and gap will always exist. What matters is where we choose to place the HMI boundary to reflect scientific and technological progress, and how we trade off generality and depth given the NFL theorem.

This state of affairs, as well as the fundamental limits mentioned earlier, may suggest a very pessimistic view of the future of optimization and decision making. Add to that the age-old insight that talent and creativity count for less than 10% in the real world, and one is tempted to subscribe to the second-law-of-thermodynamics view of life: You cannot win, you cannot break even, you cannot even quit the game. On the other hand, there is always the more optimistic "glass is half full" view, which regards every advance in human knowledge as a local victory in the cosmic scheme of things (after all, Gödel's proof and Arrow's impossibility theorem did not stop continuing research and progress in mathematics and economics). More and more, drudgery of the kind that today can be done only by educated and skilled humans will be taken over by machines, leaving humans to do even more sophisticated and intelligent work of the creative kind. Human intelligence, as the higher-level partner of machine intelligence, will always be needed. Once we recognize this, we will have far less debate and recrimination and can get on with what we each do best.

Acknowledgments

The work reported in this paper was supported in part by NSF grant EEC 95-27422, Army contract DAAH04-940148, AFOSR contract 49620-98-1-0387, and ONR contract N00014-98-1-0720. The author thanks Dr. T.W. Lau for interesting discussions.
References

1. D.H. Wolpert and W.G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, Apr. 1997.
2. H. Witsenhausen, "Separation of estimation and control for discrete time systems," Proceedings of the IEEE, vol. 59, pp. 1557-1566, 1971.
3. M.L. Ginsberg, "Do computers need common sense?" in Proc. Knowledge Representation and Reasoning Conference (KR'96), 1996.
4. Y.C. Ho, "An explanation of ordinal optimization: Soft optimization for hard problems," Information Sciences, vol. 113, pp. 169-192, Jan. 1999.
5. D. McAllester, "Some PAC-Bayesian theorems," Technical Report, AT&T Labs-Research, 1998.
6. T.W.E. Lau and Y.C. Ho, "Super-heuristics and its application to combinatorial optimization problems," submitted to the Asian Journal of Control, 1999.
7. T.A. Feo, M.G.C. Resende, and S.H. Smith, "A greedy randomized adaptive search procedure for maximum independent set," Operations Research, vol. 42, no. 5, pp. 860-878, 1994.
8. M. Athans and L. Zadeh, "Fuzzy versus traditional control: A debate," in Proceedings of the 1998 IEEE Conference on Decision and Control, 1998.