From: AAAI Technical Report FS-92-01. Copyright © 1992, AAAI (www.aaai.org). All rights reserved.
APPLICATIONS OF INDUCTIVE INFERENCE RESULTS
TO COMPILER (AND OTHER SOFTWARE) OPTIMIZATION
A Research Perspective
Leona F. Fass
We have examined specific theoretical issues of computation over the years and have found our criteria for "properties of a problem solution" have evolved as our work has progressed. This changing perspective on computational techniques and solution models is reflective of the theoretical and technological advances we have observed.
First we were satisfied if solutions to simple problems were computable and correct. Next we sought solutions that were elegant and generally applicable to an extended problem class. More recently, we have considered feasibility issues: finding problem solutions obtainable practicably, relative to real constraints of space and time.
The general area of program processing is an aspect of computation we have examined over the long range, and have observed to have improved vastly, with respect to correctness, elegance and feasibility. Still, we feel such computational processing might be further advanced through the application of artificial intelligence techniques. In particular, we believe theoretically oriented AI results we have obtained have potential utility in such important applications as compiler (and other software) optimization.
Much of our research has concerned the development of techniques for representing, learning and analytically processing languages. With respect to these issues, the scope of our interest has included formal, natural and programming languages, and relationships among them. We began work along these lines, particularly in formal and programming languages, early in our student days, designing an ALGOL ("context-free" language) interpreter for a simulated pushdown-memory machine. [This was at Penn, on the List Processing Research Techniques or, "Growing Machine" Project, with faculty John W. Carr, III, and Harry J. Gray.] The machine, simulated within a host machine, could process and successfully execute simple programs. It also "learned" in its way, for it had an expandable operating system that users might dynamically modify. In this sense, one of the things it learned was how to syntactically analyze and execute instructions written in "context-free" ALGOL.
It was, however, severely limited by the space and time constraints of its technological era (mid-to-late 1960s).

[This research was partially funded by an FY87 grant for "Applications of Grammatical Minimalization to Compiler Design", from the NPS Foundation Research Program.]
Almost two decades--and some computer generations--later, we found ourselves looking into similar issues, in connection with our post-doctoral theoretically-oriented AI work. We had developed a technique for inductive (syntactic) learning of context-free languages and sought to improve upon our original results. As we determined "learnable" solutions and examined their properties, we realized that there well could be applications to the processing of "context-free" programming languages, and to software systems, in use today. It is such applications of "theoretical AI" results to real-world programming-language science that we now briefly describe.
We originally solved the syntactic learning problem for any context-free language. (An "elegant" solution, applicable to an extended problem class.) We did so by establishing that, from a finite suitably represented language sample, it was possible to inductively construct a characterizing recognitive device for the entire language. We then showed that a corresponding generative grammar could be inferred by similar means. Due to the structural properties of context-free languages as a class (possible inherent ambiguity, central recursion in strings, nondeterministic pushdown processing in the general case) representation of the language became a critical factor in developing a successful inductive inference technique. Based on the suggestion of Leon Levy and Aravind Joshi, we represented sentences of a context-free language in a tree-like structured fashion, conveying phrase groups (Levy and Joshi called these "skeletons" of derivation trees). As recognitive processors, we used a class of tree automata, skeletal automata, that they first described. Considering the structured languages and their processors, we discovered inductively inferable syntactic models.
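The skeletal representation above can be sketched concretely. The following is an illustrative sketch only, not the paper's implementation: a "skeleton" in the Levy-Joshi sense is a derivation tree whose internal node labels have been erased, leaving only the bracketing (phrase-group) structure over the terminal leaves. The grammar and tree encoding here are hypothetical.

```python
def skeleton(tree):
    """Erase the internal node labels of a derivation tree.

    A tree is either a leaf (a terminal symbol, given as a string)
    or a tuple (nonterminal_label, child, child, ...). The skeleton
    keeps the leaves and the tree shape, replacing every internal
    label with the anonymous marker '*'.
    """
    if isinstance(tree, str):       # leaf: a terminal symbol survives intact
        return tree
    label, *children = tree         # internal node: discard its label
    return ('*',) + tuple(skeleton(c) for c in children)

# Derivation of "a+a" in the hypothetical grammar E -> E + E | a:
derivation = ('E', ('E', 'a'), '+', ('E', 'a'))
print(skeleton(derivation))         # ('*', ('*', 'a'), '+', ('*', 'a'))
```

Two structurally distinct derivations of the same string yield distinct skeletons, which is what lets a finite sample of skeletons carry the phrase-group information the inference procedure needs.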
[Leon S. Levy and Aravind K. Joshi, "Skeletal Structural Descriptions", Information and Control, Vol. 39 (1978), pp. 192-211.]

Comparing our inferred recognitive device with others for the structured language, we determined ours to be the minimal, deterministic acceptor. We found the corresponding "canonical" grammar to be the minimal deterministic structurally-equivalent grammar. In fact, we often found that our inference techniques were "minimalization techniques" that might produce "better" grammars or recognizers than any we had otherwise guessed or obtained. Bill Gasarch then analyzed our techniques and showed that the minimalization could be efficiently completed in polynomial time.
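The flavor of such polynomial-time minimalization can be shown with a minimal sketch. The example below uses an ordinary string DFA rather than the paper's skeletal tree automata (the tree-automaton case generalizes the same idea): repeatedly split blocks of states whose members transition into different blocks, until the partition stabilizes. The particular automaton is a made-up example.

```python
def minimize(states, alphabet, delta, finals):
    """Moore-style partition refinement for a complete DFA.

    Returns the partition of `states` into equivalence classes
    (a set of frozensets); equivalent states may be merged.
    Runs in polynomial time in the number of states.
    """
    # Start by separating accepting from non-accepting states.
    partition = {frozenset(finals), frozenset(states - finals)} - {frozenset()}
    changed = True
    while changed:
        changed = False
        new_partition = set()
        for block in partition:
            # Group the block's states by which block each symbol leads to.
            sig = {}
            for q in block:
                key = tuple(next(b for b in partition if delta[q, a] in b)
                            for a in alphabet)
                sig.setdefault(key, set()).add(q)
            if len(sig) > 1:
                changed = True          # the block was split; refine again
            new_partition.update(frozenset(s) for s in sig.values())
        partition = new_partition
    return partition

# Hypothetical 4-state DFA in which states 1 and 2 behave identically:
states = {0, 1, 2, 3}
delta = {(0, 'a'): 1, (0, 'b'): 2, (1, 'a'): 3, (1, 'b'): 0,
         (2, 'a'): 3, (2, 'b'): 0, (3, 'a'): 3, (3, 'b'): 3}
classes = minimize(states, ['a', 'b'], delta, {3})
# Three equivalence classes result: {0}, {1, 2}, {3}
```

Merging each class into a single state yields the minimal deterministic acceptor for the language, mirroring the "better recognizer than any otherwise guessed" phenomenon described above.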
As the result of these findings, Georgetown colleague John Cherniavsky, and then, Naval Postgraduate School (NPS) colleague Bruce MacLennan, encouraged us to investigate applications of our AI-produced theory in such practical environments as compiler design. Our theory might provide the basis for computational processing that is elegant, feasible and correct.
It is still the case that many programming languages in use are syntactically-describable as context-free. Thus we would, indeed, expect our theory of minimalized recognitive analysis to have utility in the design of modern compilers. Innovations in compiler construction, simulating theoretical pushdown processors for syntactic analysis, are still severely limited by real machine restrictions on memory space and time. We have begun to examine relationships between our inference/generative/recognitive theory and the context-free language structures defined by LR(k) and LL(k) grammars. While we cannot, of course, turn an arbitrary language into one that is LR(k) or LL(k), we conjecture we can minimize grammars of those that are so described. This may reduce the table size required by compilers using such syntax-based analysis techniques. While we do not, realistically, expect to infer a compiler, we believe this application of our inference results will benefit compiler writers, who may produce theoretically-sound processors that are "optimized" to be more space- and time-efficient.
We also believe we might productively apply our theoretical results to the attribute grammars that attach semantics to the syntactic constructs (subtrees!) determined by a program parse. Success in this area (with minimized attribute grammars) could result in feasible incremental compiling. This would provide immediate feedback for error correction, and so, would be beneficial in generalized software design. It should also benefit programming-language users, by leading to faster software development. (We particularly thank John Cherniavsky for introducing us to this area of inquiry.)
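The attribute-grammar idea above can be illustrated with a hedged sketch (our own illustration, not the paper's formalism): semantic values are attached to the syntactic subtrees a parse determines, and a synthesized attribute is computed bottom-up. Re-evaluating only an edited subtree, rather than the whole program, is the germ of incremental compiling. The tiny expression grammar here is hypothetical.

```python
def synthesize(node):
    """Compute the synthesized 'val' attribute of a parse (sub)tree.

    A node is either an int literal (a leaf) or a tuple
    (operator, left_subtree, right_subtree) for a binary expression.
    Each production contributes a semantic rule: val is the literal
    at leaves, and the operator applied to the children's vals above.
    """
    if isinstance(node, int):
        return node                       # leaf: val is the literal itself
    op, left, right = node
    lv, rv = synthesize(left), synthesize(right)
    return lv + rv if op == '+' else lv * rv

tree = ('+', 2, ('*', 3, 4))
print(synthesize(tree))                   # 14
# If only the right subtree is edited, only it need be re-evaluated:
print(synthesize(('*', 3, 5)))            # 15
```

Because the attribute of a node depends only on its subtree, an incremental compiler can cache subtree attributes and recompute along the single path from an edit to the root.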
These are just two of the applications of our "theoretical AI" inference results to real-world programming-language science. We have also determined some theoretically-based software testing results that have come out of our inductive inference work. These mainly involve determining correctness of processing, and confirm results other investigators have obtained through different ("non-AI") theoretical orientations.
A selected list of recent relevant work follows. Most of
the items cited contain back-pointers to earlier related
research.
Selected References
L.F. Fass, "Learnability of CFLs: Inferring Syntactic Models from Constituent Structure", presented at 1987 Linguistic Institute, Meeting on the Theoretical Interactions of Linguistics and Logic, Stanford, July 1987. Abstracted in J. Symbolic Logic, Vol. 53, No. 4 (December 1988), pp. 1277-1278. Research Note appears in SIGART Special Issue on Knowledge Acquisition (April 1989), pp. 175-176.
L.F. Fass and W.I. Gasarch, "Complexity Issues in Skeletal Automata", preliminary version March 1987, appears as Computer Science Series, TR 2035, University of Maryland, College Park (1988).
L.F. Fass, "On Language Inference, Testing and Parsing",
presented at the 1989 Linguistic Institute, Meeting on the
Theoretical Interactions of Linguistics and Logic, University
of Arizona, Tucson, July 1989.
L.F. Fass, "A Minimal Deterministic Acceptor for Any (Structured) Context-Free Language", preliminary version (1987). Extended version presented at the 1990-91 Annual Meeting of the Linguistic Society of America, Chicago, January 1991; abstracted in Meeting Handbook, p. 17.
L.F. Fass, "A Common Basis for Inductive Inference and Testing", Proceedings of the Seventh Pacific Northwest Software Quality Conference, Portland, Oregon (September 1989), pp. 183-200.
L.F. Fass, "An Algebraic Approach to Determining Correct Software", presented at 40th Anniversary Meeting of the Society for Industrial and Applied Mathematics, Los Angeles, July 1992. Abstracted in Final Program, p. A56.
L.F. Fass, "Software Design as a Problem in Learning Theory (A Research Overview)", Notes of AAAI-92, Workshop on Automating Software Design, San Jose, July 1992, pp. 48-49.
L.F. Fass, "Inference and Testing: When 'Prior Knowledge' is Essential to Learning", Notes of AAAI-92, Workshop on Constraining Learning with Prior Knowledge, San Jose, July 1992, pp. 88-92.
Dr. Fass may be reached at mailing address: P.O. Box 2914; Carmel, CA 93921.