From: AAAI Technical Report FS-92-01. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.

APPLICATIONS OF INDUCTIVE INFERENCE RESULTS TO COMPILER (AND OTHER SOFTWARE) OPTIMIZATION
A Research Perspective

Leona F. Fass

We have examined specific theoretical issues of computation over the years and have found our criteria for "properties of a problem solution" have evolved as our work has progressed. This changing perspective on computational techniques and solution models reflects the theoretical and technological advances we have observed. First we were satisfied if solutions to simple problems were computable and correct. Next we sought solutions that were elegant and generally applicable to an extended problem class. More recently, we have considered feasibility issues: finding problem solutions obtainable practicably, relative to real constraints of space and time.

The general area of program processing is an aspect of computation we have examined over the long range, and have observed to have improved vastly with respect to correctness, elegance and feasibility. Still, we feel such computational processing might be further advanced through the application of artificial intelligence techniques. In particular, we believe theoretically oriented AI results we have obtained have potential utility in such important applications as compiler (and other software) optimization.

Much of our research has concerned the development of techniques for representing, learning and analytically processing languages. With respect to these issues, the scope of our interest has included formal, natural and programming languages, and relationships among them. We began work along these lines, particularly in formal and programming languages, early in our student days, designing an ALGOL ("context-free" language) interpreter for a simulated pushdown-memory machine. [This was at Penn, on the List Processing Research Techniques or "Growing Machine" Project, with faculty John W.
Carr, III, and Harry J. Gray.] The machine, simulated within a host machine, could process and successfully execute simple programs. It also "learned" in its way, for it had an expandable operating system that users might dynamically modify. In this sense, one of the things it learned was how to syntactically analyze and execute instructions written in "context-free" ALGOL. It was, however, severely limited by the space and time constraints of its technological era (mid-to-late 1960's).

[This research was partially funded by an FY87 grant for "Applications of Grammatical Minimalization to Compiler Design", from the NPS Foundation Research Program.]

Almost two decades, and some computer generations, later, we found ourselves looking into similar issues, in connection with our post-doctoral theoretically-oriented AI work. We had developed a technique for inductive (syntactic) learning of context-free languages and sought to improve upon our original results. As we determined "learnable" solutions and examined their properties, we realized that there well could be applications to the processing of "context-free" programming languages, and to software systems, in use today. It is such applications of "theoretical AI" results to real-world programming language science that we now briefly describe.

We originally solved the syntactic learning problem for any context-free language. (An "elegant" solution, applicable to an extended problem class.) We did so by establishing that, from a finite, suitably represented language sample, it was possible to inductively construct a characterizing recognitive device for the entire language. We then showed that a corresponding generative grammar could be inferred by similar means.
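To give a flavor of inductive construction from a finite sample, the sketch below (an illustration of ours, not the construction in our papers, which operates on structured representations and skeletal automata) builds a prefix-tree acceptor from a finite positive sample of strings; all function and variable names here are hypothetical:

```python
# Illustrative sketch only: a prefix-tree acceptor built from a finite
# positive sample, a common starting point for inductively constructing
# a recognitive device from example strings.

def build_prefix_tree(sample):
    """Return (transitions, accepting) for a trie-shaped acceptor.

    transitions: dict mapping (state, symbol) -> state; state 0 is initial.
    """
    transitions, accepting, next_state = {}, set(), 1
    for word in sample:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)          # end of a sample word is accepting
    return transitions, accepting

def accepts(transitions, accepting, word):
    """Run the acceptor on a word; reject on any undefined transition."""
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting

# A finite sample of a toy language. The raw acceptor characterizes
# exactly the sample; merging equivalent states then generalizes it.
trans, acc = build_prefix_tree(["ab", "abb", "b"])
print(accepts(trans, acc, "abb"))  # True
print(accepts(trans, acc, "a"))    # False
```

State merging on such an acceptor is what turns a device for the sample into a device for the whole language.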
Due to the structural properties of context-free languages as a class (possible inherent ambiguity, central recursion in strings, nondeterministic pushdown processing in the general case), representation of the language became a critical factor in developing a successful inductive inference technique. Based on the suggestion of Leon Levy and Aravind Joshi [Leon S. Levy and Aravind K. Joshi, "Skeletal Structural Descriptions", Information and Control, Vol. 39 (1978), pp. 192-211], we represented sentences of a context-free language in a tree-like structured fashion, conveying phrase groups (Levy and Joshi called these "skeletons" of derivation trees). As recognitive processors, we used a class of tree automata, skeletal automata, that they first described. Considering the structured languages and their processors, we discovered inductively inferable syntactic models.

Comparing our inferred recognitive device with others for the structured language, we determined ours to be the minimal, deterministic acceptor. We found the corresponding "canonical" grammar to be the minimal, deterministic, structurally-equivalent grammar. In fact, we often found that our inference techniques were "minimalization techniques" that might produce "better" grammars or recognizers than any we had otherwise guessed or obtained. Bill Gasarch then analyzed our techniques and showed that the minimalization could be efficiently completed in polynomial time.

As the result of these findings, Georgetown colleague John Cherniavsky, and then Naval Postgraduate School (NPS) colleague Bruce MacLennan, encouraged us to investigate applications of our AI-produced theory in such practical environments as compiler design. Our theory might provide the basis for computational processing that is elegant, feasible and correct.
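The polynomial-time minimalization can be illustrated with a simplified analogue. Our technique applies to skeletal tree automata; the sketch below instead shows Moore-style partition refinement on an ordinary (string) DFA, the standard polynomial-time state-minimalization idea, with all names hypothetical:

```python
# Illustrative sketch only: Moore-style partition refinement, a standard
# polynomial-time minimalization for ordinary string DFAs. It is a
# simplified analogue of minimalization over skeletal tree automata.

def minimize_dfa(states, alphabet, delta, start, accepting):
    """Return the number of states in the minimal equivalent DFA.

    delta: dict mapping (state, symbol) -> state (a total function).
    """
    # Initial partition: accepting vs. non-accepting states.
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]

    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states in the block by which blocks their
            # transitions lead to, one entry per alphabet symbol.
            groups = {}
            for s in block:
                key = tuple(
                    next(i for i, b in enumerate(partition)
                         if delta[(s, a)] in b)
                    for a in alphabet
                )
                groups.setdefault(key, set()).add(s)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return len(partition)

# Example: a 4-state DFA over {a, b} accepting strings ending in 'a',
# with redundant states; the minimal acceptor needs only 2 states.
states = {0, 1, 2, 3}
delta = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 3, (2, 'b'): 2,   # state 2 behaves like state 0
    (3, 'a'): 3, (3, 'b'): 2,   # state 3 behaves like state 1
}
print(minimize_dfa(states, ['a', 'b'], delta, 0, {1, 3}))  # 2
```

Each refinement pass is polynomial in the number of states and symbols, and at most |states| passes are needed, which is the sense in which such minimalization is efficiently completable.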
It is still the case that many programming languages in use are syntactically describable as context-free. Thus we would, indeed, expect our theory of minimalized recognitive analysis to have utility in the design of modern compilers. Innovations in compiler construction, simulating theoretical pushdown processors for syntactic analysis, are still severely limited by real machine restrictions on memory space and time. We have begun to examine relationships between our inference/generative/recognitive theory and the context-free language structures defined by LR(k) and LL(k) grammars. While we cannot, of course, turn an arbitrary language into one that is LR(k) or LL(k), we conjecture we can minimize grammars of those that are so described. This may reduce the table size required by compilers using such syntax-based analysis techniques. While we do not, realistically, expect to infer a compiler, we believe this application of our inference results will benefit compiler writers, who may produce theoretically-sound processors that are "optimized" to be more space- and time-efficient.

We also believe we might productively apply our theoretical results to the attribute grammars that attach semantics to the syntactic constructs (subtrees!) determined by a program parse. Success in this area (with minimized attribute grammars) could result in feasible incremental compiling. This would provide immediate feedback for error correction, and so would be beneficial in generalized software design. It should also benefit programming language users, by leading to faster software development. (We particularly thank John Cherniavsky for introducing us to this area of inquiry.) These are just two of the applications of our "theoretical AI" inference results to real-world programming language science.
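The attribute-grammar idea mentioned above can be sketched very simply. The toy example below (purely illustrative; the node encoding and function name are our own hypothetical choices) attaches a synthesized "value" attribute to each subtree of an arithmetic parse, computed bottom-up as a parser would:

```python
# Illustrative sketch only: a toy attribute grammar attaching a
# synthesized "value" attribute to each subtree of a parse, in the
# spirit of semantics attached to syntactic constructs of a program.

def evaluate(node):
    """Compute the synthesized value attribute bottom-up.

    A node is either an int literal (a leaf) or a tuple
    (operator, left_subtree, right_subtree) for an interior node.
    """
    if isinstance(node, int):       # terminal: its attribute is the literal
        return node
    op, left, right = node          # interior: combine child attributes
    lv, rv = evaluate(left), evaluate(right)
    return lv + rv if op == '+' else lv * rv

# Parse tree for the expression 2 * (3 + 4); each subtree carries a
# value, and the root's value attribute is 14.
tree = ('*', 2, ('+', 3, 4))
print(evaluate(tree))  # 14
```

Incremental compiling would re-evaluate attributes only on the subtrees a program edit actually changes, which is where minimized attribute grammars could pay off.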
We have also determined some theoretically-based software testing results that have come out of our inductive inference work. These mainly involve determining correctness of processing, and confirm results other investigators have obtained through different ("non-AI") theoretical orientations.

A selected list of recent relevant work follows. Most of the items cited contain back-pointers to earlier related research.

Selected References

L.F. Fass, "Learnability of CFLs: Inferring Syntactic Models from Constituent Structure", presented at the 1987 Linguistic Institute, Meeting on the Theoretical Interactions of Linguistics and Logic, Stanford, July 1987. Abstracted in J. Symbolic Logic, Vol. 53, No. 4 (December 1988), pp. 1277-1278. Research Note appears in SIGART Special Issue on Knowledge Acquisition (April 1989), pp. 175-176.

L.F. Fass and W.I. Gasarch, "Complexity Issues in Skeletal Automata", preliminary version March 1987; appears as Computer Science Series TR2035, University of Maryland, College Park (1988).

L.F. Fass, "On Language Inference, Testing and Parsing", presented at the 1989 Linguistic Institute, Meeting on the Theoretical Interactions of Linguistics and Logic, University of Arizona, Tucson, July 1989.

L.F. Fass, "A Minimal Deterministic Acceptor for Any (Structured) Context-Free Language", preliminary version (1987). Extended version presented at the 1990-91 Annual Meeting of the Linguistic Society of America, Chicago, January 1991; abstracted in Meeting Handbook, p. 17.

L.F. Fass, "A Common Basis for Inductive Inference and Testing", Proceedings of the Seventh Pacific Northwest Software Quality Conference, Portland, Oregon (September 1989), pp. 183-200.

L.F. Fass, "An Algebraic Approach to Determining Correct Software", presented at the 40th Anniversary Meeting of the Society for Industrial and Applied Mathematics, Los Angeles, July 1992. Abstracted in Final Program, p. A56.

L.F.
Fass, "Software Design as a Problem in Learning Theory (A Research Overview)", Notes of AAAI-92, Workshop on Automating Software Design, San Jose, July 1992, pp. 48-49.

L.F. Fass, "Inference and Testing: When 'Prior Knowledge' is Essential to Learning", Notes of AAAI-92, Workshop on Constraining Learning with Prior Knowledge, San Jose, July 1992, pp. 88-92.

Dr. Fass may be reached at mailing address: P.O. Box 2914, Carmel, CA 93921.