The Euphoria Translator: Internal Structure and Optimizations
by Rob Craig,
Rapid Deployment Software
January 2007
* Introduction
The Euphoria to C Translator is a program written entirely in Euphoria. The job of the Translator is to convert any Euphoria program into a set of C source files. These C source files can then be compiled by a C compiler to produce an executable file. Several different C/C++ compilers are supported, and executable programs (as well as shared libraries) can be created for Windows, DOS, Linux and FreeBSD.
The main advantage that the Translator has over the Interpreter is the execution speed of the user's Euphoria program. Many decisions that the Interpreter would make at run-time are made by the Translator at translation time. The Translator is thus able to produce streamlined C code, which is then further optimized by a C compiler to produce efficient machine code. This machine code can often outperform the Interpreter on a given Euphoria program by a factor of about 2 to 5.
* Front-end / Back-end Structure
The Translator uses the common Euphoria front-end. In a few places it was convenient to add some Translator-specific code, or to slightly modify the actions of the front-end.
The meat of the Translator is in the alternate back-end that it provides for processing the intermediate language (IL) and symbol table information created by the front-end. Whereas the Interpreter back-end immediately executes each IL opcode as it reads it, the Translator instead outputs C code that performs that IL opcode. The Binder also uses the common front-end; its back-end simply attaches the IL plus symbol table to a copy of the interpreter executable file. There is also a Euphoria back-end written in pure Euphoria (execute.e), which uses the same common front-end.
* Ground Rules
The Translator must always translate a correct Euphoria program into a correct C program. The Translator can assume that a program does not have run-time errors (and in the interests of speed, it rarely checks for run-time errors). This allows it to deduce additional information. For example, if it sees:
    integer x
    ...
    y[x] = 0
it can assume that x will not cause a subscript error, and must therefore be an integer in the range 1 to length(y) at this point in the code.
As another example, if it sees:
    integer x
    x = y
it can assume that y has an integer value at this point in the code.
* Converting Euphoria IL Operations to C Code
When the Translator was first developed, it was a straightforward matter to study the C code in the Interpreter for a given IL opcode, and output essentially the same C code.
At the very least, this eliminated the indirect jump from one opcode to the next that takes place in the Interpreter, so the translated code was already faster. The last C statement executed for one opcode was followed immediately by the first C statement for the next opcode. Furthermore, one level of indirection in accessing the operands of an operator could usually be removed: the Interpreter picks up the operand values indirectly via pointers, whereas the translated code could refer to the operands directly as C variables.
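The contrast between the two execution styles can be sketched in C. This is a minimal illustration under stated assumptions. The opcodes, the instr struct and both functions are hypothetical and are not Euphoria's real IL or Interpreter code. The interpreter-style loop dispatches each opcode through a switch and reaches its operands through pointers. The translator-style function performs the same two operations as straight-line C on plain variables.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only (not Euphoria's real IL). */
enum { OP_ADD, OP_MUL, OP_STOP };

typedef struct {
    int op;
    long *a, *b, *dst;   /* operands are reached through pointers */
} instr;

/* Interpreter style: each opcode is dispatched through a switch (which may
   compile to an indirect jump), and every operand is fetched indirectly. */
static void interpret(const instr *pc) {
    for (;; pc++) {
        switch (pc->op) {
        case OP_ADD:  *pc->dst = *pc->a + *pc->b; break;
        case OP_MUL:  *pc->dst = *pc->a * *pc->b; break;
        case OP_STOP: return;
        }
    }
}

/* Runs the IL equivalent of "t = x + y; z = t * y" through the loop. */
long run_interpreted(long x, long y) {
    long t = 0, z = 0;
    const instr prog[] = {
        { OP_ADD,  &x,   &y,   &t   },
        { OP_MUL,  &t,   &y,   &z   },
        { OP_STOP, NULL, NULL, NULL },
    };
    interpret(prog);
    return z;
}

/* Translator style: the same operations as straight-line C code; the
   operands are ordinary C variables and nothing is dispatched. */
long run_translated(long x, long y) {
    long t = x + y;   /* was OP_ADD */
    long z = t * y;   /* was OP_MUL */
    return z;
}
```

Both functions compute the same result. The second simply omits the dispatch and the pointer indirection.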
IL operations in the Interpreter must dynamically test the data types of each operand and select the appropriate code to execute. The Translator often has a good idea at translation time of what the possible data types are, and can eliminate run-time tests. In some cases it may also know the actual value of one or more operands, and can produce some very lean and efficient code in those cases.
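Here is a hedged sketch of the run-time type test that the Translator can sometimes remove. The value struct is a hypothetical tagged object, not Euphoria's real object format. Integer overflow handling is omitted to keep it short. generic_add() tests its operand types on every call, as interpreter code must. add_doubles() is what remains once both operands are known at translation time to be TYPE_DOUBLE.

```c
/* Hypothetical tagged value for illustration; NOT Euphoria's real format. */
typedef struct {
    int is_int;   /* 1: integer held in i, 0: floating-point held in d */
    long i;
    double d;
} value;

static double as_double(value v) { return v.is_int ? (double)v.i : v.d; }

/* Interpreter-style add: the operand types are tested at run-time.
   (Integer overflow handling is omitted in this sketch.) */
value generic_add(value a, value b) {
    value r = { 0, 0, 0.0 };
    if (a.is_int && b.is_int) {
        r.is_int = 1;
        r.i = a.i + b.i;
    } else {
        r.d = as_double(a) + as_double(b);
    }
    return r;
}

/* Translator-style add when both operands are known to be TYPE_DOUBLE:
   no run-time type tests remain. */
double add_doubles(double x, double y) { return x + y; }
```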
Once this much was working, many additional translation-time optimizations were devised to make the translated code even smaller and faster.
* Run-time routines
Translated code calls most of the same routines that the Interpreter back-end calls during execution. Translated code accesses these routines via a C library that is linked with each translated program. Sharing these routines saves maintenance effort and promotes compatibility between the Interpreter and the Translator.
The Translator understands the possible data types returned by each run-time routine, and uses that information to help streamline the code.
* Information Maintained at Translation time for variables and temps
- Type Information
The Translator tags each variable and temp at translation time with its current best estimate of the type that the data will have at run-time.
- Other Information
To further improve the quality of the emitted C code, the Translator also tracks other information:
    range of possible integer values: min ... max
    sequence length (if the length is constant)
* Local vs Global Information
As it calculates the best C code to emit for each Euphoria statement, the Translator tracks the range of possible values of each variable or temp used in the current statement, as well as the range of possible values across the entire Euphoria program. For instance, a global variable x might be known to be an integer at the current statement in the current routine, but not known to always be an integer across the whole program. Or perhaps we know the actual value of x, say 99, at the current point, but across the whole program we can only be sure that x is an integer.
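The local-versus-global distinction can be illustrated with a small C sketch of integer range tracking. The int_info struct and its helper functions are hypothetical names made up for this example. Right after an assignment such as x = 99, the local range is exact. The global range must widen to cover every assignment to x anywhere in the program.

```c
/* Hypothetical per-variable integer information; names are illustrative. */
typedef struct {
    long min, max;   /* range of possible integer values */
    int assigned;    /* has any assignment been seen yet? */
} int_info;

/* Local fact just after "x = value": the exact value is known. */
int_info local_after_assign(long value) {
    int_info li = { value, value, 1 };
    return li;
}

/* Global fact: widen the range to cover this assignment as well. */
void merge_into_global(int_info *global, int_info local) {
    if (!global->assigned) { *global = local; return; }
    if (local.min < global->min) global->min = local.min;
    if (local.max > global->max) global->max = local.max;
}

/* A variable whose range is a single point has a known constant value. */
int is_known_constant(int_info info) {
    return info.assigned && info.min == info.max;
}
```

With assignments x = 99 and x = 5 in different places, each local range is a known constant. The global range is only 5 ... 99.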
* Basic Blocks
Local information is computed for each *basic block*. A basic block is a straight-line series of Euphoria statements, with no branches in or out. At a branch in, or out, we usually have to discard much of our local information. Euphoria does not have a GOTO statement for branching, but branches are created by if-statements, while-statements, for-statements, exits, etc., as well as by subroutine calls.
When there is no local information available for a variable, the Translator falls back to using any available global information. In general, it at least knows the type that was used in declaring the variable, but it tries to do better than that where possible.
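A minimal C sketch of the fallback rule follows. The type codes, type_state, at_branch() and best_type() are all hypothetical. Local knowledge is preferred while it is valid. A branch into or out of the basic block wipes it, and lookups then fall back to the whole-program estimate.

```c
/* Hypothetical type codes; NO_INFO means "no local information". */
enum { NO_INFO, T_INTEGER, T_ATOM, T_OBJECT };
#define NVARS 4

typedef struct {
    int global_type[NVARS];   /* whole-program estimate; at worst the declared type */
    int local_type[NVARS];    /* estimate valid only inside the current basic block */
} type_state;

/* A branch into or out of the current block: local facts may no longer hold. */
void at_branch(type_state *s) {
    for (int v = 0; v < NVARS; v++)
        s->local_type[v] = NO_INFO;
}

/* Prefer local knowledge; otherwise fall back to the global estimate. */
int best_type(const type_state *s, int v) {
    return s->local_type[v] != NO_INFO ? s->local_type[v] : s->global_type[v];
}
```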
* Type information
For each piece of data in a program, the Translator keeps track of the current, worst-case type that the data might have at run-time. The possible types are:
    TYPE_NULL - no idea, haven't seen any relevant information yet
    TYPE_INTEGER - the data is in Euphoria 31-bit integer format
    TYPE_DOUBLE - the data is an atom in floating-point format
    TYPE_ATOM - the data is an atom, in either 31-bit integer format or floating-point format
    TYPE_SEQUENCE - the data is definitely a Euphoria sequence
    TYPE_OBJECT - the data could be anything
During a pass through the IL, the global type information will tend to become more pessimistic. For example, if during the first pass the Translator sees something like:
    x = 1
it will mark x as being TYPE_INTEGER locally, and when, later in the source code, it sees:
    x = 9.9
it will mark x as being TYPE_DOUBLE locally, but by the end of the pass it will mark x as being TYPE_ATOM globally.
The function or_type(a, b) will take any two types and merge them to return the most precise type that includes both of those types.
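One plausible C rendering of the or_type() merge rule is shown below. It is written from the type list above, not taken from the Translator's source, and the enum's numeric values are an assumption. TYPE_NULL acts as "nothing seen yet". Two different kinds of atom merge to TYPE_ATOM, and anything mixed with a sequence or an object becomes TYPE_OBJECT.

```c
/* Type names from the text; the numeric values are an assumption. */
typedef enum {
    TYPE_NULL, TYPE_INTEGER, TYPE_DOUBLE, TYPE_ATOM, TYPE_SEQUENCE, TYPE_OBJECT
} eu_type;

/* Most precise listed type that includes both a and b (a sketch). */
eu_type or_type(eu_type a, eu_type b) {
    if (a == TYPE_NULL) return b;        /* nothing known yet about a */
    if (b == TYPE_NULL) return a;
    if (a == b) return a;
    if (a == TYPE_OBJECT || b == TYPE_OBJECT) return TYPE_OBJECT;
    if (a == TYPE_SEQUENCE || b == TYPE_SEQUENCE) return TYPE_OBJECT;  /* sequence + atom */
    return TYPE_ATOM;  /* two different members of INTEGER, DOUBLE, ATOM */
}
```

For the x = 1 / x = 9.9 example, or_type(TYPE_INTEGER, TYPE_DOUBLE) gives TYPE_ATOM, which matches the global type described above.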
* Deletion of unreachable routines and statements
Deleting code that can never be executed is useful in two ways. It saves some memory by reducing the amount of C code emitted. It may also improve the Translator's knowledge about a variable. For example, when translating for Windows, if we have:
    atom x
    if platform() = LINUX then
        x = 9.9  -- this statement can be deleted (it's never reached)
    else
        x = 5    -- somewhere else in the program
    end if
    -- x must be 5
then we can safely assume that x = 9.9 will not be executed and x will have the value 5 after this piece of code. That might even enable the Translator to mark x as being an integer globally throughout the program, or even as having the value 5 globally.
In some cases, the Translator is not smart enough to delete a chunk of code, but by issuing code with constants in it, rather than variables, it will allow a good C compiler to see that certain code is unreachable.
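The constants-instead-of-variables point can be illustrated with a hedged C sketch of what translated code for the example above might look like. PLATFORM_WIN32, PLATFORM_LINUX and translated_platform_test() are hypothetical names, and their values are chosen only for this illustration. Because platform() is emitted as a constant, the if-condition is known at compile time. An optimizing C compiler can then discard the x = 9.9 branch.

```c
/* Hypothetical platform codes for this sketch only. */
#define PLATFORM_WIN32 2
#define PLATFORM_LINUX 3

/* Translating for Windows: platform() has been emitted as the constant
   PLATFORM_WIN32, so the condition below is a compile-time constant. */
double translated_platform_test(void) {
    double x;
    if (PLATFORM_WIN32 == PLATFORM_LINUX)
        x = 9.9;   /* never reached; an optimizing compiler can drop it */
    else
        x = 5;
    return x;
}
```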
* Use of Multiple Passes to Improve the Generated Code
During the first pass through a program (i.e. through the IL for the program), the Translator will not have seen all the assignments to a variable. This may force it to make pessimistic assumptions about the type and possible values that the variable might take on at run-time. After a complete pass, it will have seen all assignments, and this could allow it to assume a more restrictive type, or a narrower range of values, for the variable on subsequent passes.
Now, suppose that somewhere in the source code we have:
    y = x
and at the end of one pass the Translator figures out that x can only have an integer value. This information can then be used in determining the type of y on a second pass through the IL. And suppose that earlier in the source we have:
    z = y
then on a third pass the Translator will be able to use its improved estimate of y's type to more accurately determine the type of z.
In general, the more passes the Translator makes through the IL, the better it understands the type/value information for the variables, and the better the C code it can produce. Type information can gradually propagate and become more refined with each pass.
Currently, in c_decl.e, we have set:
    global constant LAST_PASS = 7  -- number of Translator passes
Experiments showed that after, say, 3 or 4 passes the improvement in the emitted C code is typically very small or even non-existent. So we chose 7 passes as a compromise between the possibility of better C code and the boredom of the person waiting for the Translator to finish its job. Since it is reading the IL on each pass, and not rescanning or reparsing the source, things go fairly quickly.
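The pass-by-pass propagation of the x, y, z example can be simulated with a toy C model. It makes simplifying assumptions that are not taken from the Translator. Every variable starts as TYPE_OBJECT. Each pass reads only the previous pass's global types. Each pass merges all assignments to a variable with a compact or_type(). All names here are hypothetical. Because z = y appears before y = x, y becomes an integer after pass 2 and z only after pass 3, as the text describes.

```c
#include <stddef.h>

/* Type codes named in the text; the numeric order is an assumption. */
typedef enum { T_NULL, T_INTEGER, T_DOUBLE, T_ATOM, T_SEQUENCE, T_OBJECT } eu_type;

/* Compact merge: the most precise listed type covering both a and b. */
static eu_type or_type(eu_type a, eu_type b) {
    if (a == T_NULL || a == b) return b;
    if (b == T_NULL) return a;
    if (a == T_OBJECT || b == T_OBJECT || a == T_SEQUENCE || b == T_SEQUENCE)
        return T_OBJECT;
    return T_ATOM;   /* two different kinds of atom */
}

enum { X, Y, Z, NVARS };   /* the variables x, y, z from the text */

/* One assignment "dst = src"; src < 0 means the right side is a literal. */
typedef struct { int dst; int src; eu_type literal; } assignment;

/* Source order as in the text: z = y comes before y = x. */
static const assignment prog[] = {
    { Z, Y,  T_NULL },     /* z = y */
    { Y, X,  T_NULL },     /* y = x */
    { X, -1, T_INTEGER },  /* x = 1 (somewhere in the program) */
};

/* Runs last_pass passes. first_int_pass[v] records the pass after which
   variable v was first known to be an integer (0 = never). */
void run_passes(int last_pass, eu_type global[NVARS], int first_int_pass[NVARS]) {
    for (int v = 0; v < NVARS; v++) {
        global[v] = T_OBJECT;      /* pessimistic start: could be anything */
        first_int_pass[v] = 0;
    }
    for (int pass = 1; pass <= last_pass; pass++) {
        eu_type next[NVARS] = { T_NULL };
        for (size_t k = 0; k < sizeof prog / sizeof prog[0]; k++) {
            eu_type rhs = prog[k].src < 0 ? prog[k].literal : global[prog[k].src];
            next[prog[k].dst] = or_type(next[prog[k].dst], rhs);
        }
        for (int v = 0; v < NVARS; v++) {
            global[v] = next[v];   /* end of pass: publish the new estimates */
            if (global[v] == T_INTEGER && first_int_pass[v] == 0)
                first_int_pass[v] = pass;
        }
    }
}
```

A real Translator also uses local information within a pass. This model deliberately leaves that out so that the one-step-per-pass propagation is easy to see.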
* Propagation of information into subroutines
The Translator looks at all calls to a routine in an effort to determine the most restricted type and possible values for each parameter, as used by the current program. For instance, you might declare a subroutine parameter as atom, but if the Translator sees that your program always passes an integer, it will take advantage of that fact.
Example: If the Translator can see that all calls to a generic library sort routine (that works on any type of data) actually pass a sequence of *integers*, that information will be noted, and the C code generated within the sort function will assume that the elements to be sorted are always integers.
Example: If the Translator sees that your program always passes the value 1 (say) as a parameter, it might be able to use that fact to eliminate chunks of code inside the subroutine where other (non-1) values are processed.
Thus a very general subroutine might be customized into a faster, smaller version in C that only works for the specific program being translated. This cannot be done when creating shared library (.dll/.so) routines, since those routines will be called from main programs that the Translator has not seen. The C code for those routines must be fully general.
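The sort example can be made concrete with a C sketch of the general and customized routines. This is only an illustration. Both functions and the value struct are hypothetical, insertion sort is chosen purely for brevity, and the general version handles atoms only. The general routine must check element types in every comparison. The customized one, usable when every call passes integers, compares plain C longs.

```c
/* Hypothetical tagged value (atoms only), not Euphoria's real format. */
typedef struct { int is_int; long i; double d; } value;

static double as_double(value v) { return v.is_int ? (double)v.i : v.d; }

/* General version: every comparison tests the element types. */
void sort_values(value *a, int n) {
    for (int i = 1; i < n; i++) {
        value key = a[i];
        int j = i - 1;
        while (j >= 0 && ((a[j].is_int && key.is_int) ? a[j].i > key.i
                                                       : as_double(a[j]) > as_double(key))) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Customized version for a program that only ever sorts integers. */
void sort_ints(long *a, int n) {
    for (int i = 1; i < n; i++) {
        long key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}
```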
* Folding of constant integer expressions
The Translator has some ability to "fold" constant expressions at translation time. For instance, 8*5 would be calculated as 40, and no C multiply operation would be issued. This is particularly useful when a variable, such as a subroutine parameter, is determined to always have a known constant value at a particular place in the source program.
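Below is a hedged C sketch of one folding step. fold_multiply() is a hypothetical helper. The rule that it folds only when the product is still a Euphoria 31-bit integer (-1073741824 to 1073741823) is an assumption of this sketch. When the rule fails, the multiply is simply left for run-time code.

```c
#include <stdbool.h>

/* Euphoria 31-bit integer range. */
#define EU_MININT (-1073741824LL)
#define EU_MAXINT 1073741823LL

/* Folds "a * b" for two known Euphoria integer constants, provided the
   product is still a Euphoria integer; otherwise returns false. */
bool fold_multiply(long long a, long long b, long long *result) {
    long long p = a * b;   /* |a|, |b| <= 2^30, so this cannot overflow */
    if (p < EU_MININT || p > EU_MAXINT)
        return false;      /* leave the multiply to run-time code */
    *result = p;
    return true;
}
```

For example, fold_multiply(8, 5, &r) stores 40 in r, so 40 can be emitted directly in place of the multiplication.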
* Eliminating Refs and DeRefs
Since the Translator and Interpreter share the same run-time routines, they must both use the same method of reclaiming unused memory. This means that the Translator often outputs the C macros Ref(), to increment the reference count, and DeRef(), to decrement the reference count, on atoms and sequences.
If the Translator is certain that some data is always going to be a Euphoria integer, it can leave out the Ref() or DeRef().
If the Translator is certain that some data will always be an atom in C double format, or a sequence, it can output RefDS() or DeRefDS(), which are simpler since they skip the test for integer.
If the Translator is certain that all elements of a sequence will be integers, it will output DeRefDSi(). This macro avoids checking the elements of the sequence for non-integers, which would in turn have to be DeRef'd. It can therefore save a great deal of time when reclaiming storage for sequences of integers (the most common type of sequence in most programs).
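The differences between the macro variants can be sketched in C. Only the macro names come from the text. The object layout, free_block() and the demo counters are hypothetical, and these definitions are not Euphoria's real ones. DeRef() has to test for an integer first. DeRefDS() skips that test. DeRefDSi() also skips scanning the elements when it releases a sequence.

```c
#include <stdlib.h>

/* Hypothetical layout for illustration only (not Euphoria's real format):
   integers are held inline; doubles and sequences live in counted blocks. */
typedef struct object object;
typedef struct block { long refcount; int n; object *elems; } block;
struct object { int is_int; long i; block *p; };

static long blocks_freed = 0, elems_scanned = 0;   /* demo counters */

static void free_block(block *b, int skip_elements);

/* General forms: must test for an integer first. */
#define Ref(o)      do { if (!(o).is_int) (o).p->refcount++; } while (0)
#define DeRef(o)    do { if (!(o).is_int && --(o).p->refcount == 0) free_block((o).p, 0); } while (0)

/* DS forms: o is known to be a double or a sequence, so no integer test. */
#define RefDS(o)    ((o).p->refcount++)
#define DeRefDS(o)  do { if (--(o).p->refcount == 0) free_block((o).p, 0); } while (0)

/* DSi form: o is a sequence of integers, so its elements are not examined. */
#define DeRefDSi(o) do { if (--(o).p->refcount == 0) free_block((o).p, 1); } while (0)

static void free_block(block *b, int skip_elements) {
    if (!skip_elements)
        for (int k = 0; k < b->n; k++) {
            elems_scanned++;
            DeRef(b->elems[k]);   /* elements may themselves need releasing */
        }
    free(b->elems);
    free(b);
    blocks_freed++;
}

/* Helper: a fresh sequence of n integers 0..n-1 with reference count 1. */
object new_int_sequence(int n) {
    block *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->refcount = 1;
    b->n = n;
    b->elems = malloc((size_t)n * sizeof *b->elems);
    if (b->elems == NULL && n > 0) abort();
    for (int k = 0; k < n; k++) {
        object e = { 1, k, NULL };
        b->elems[k] = e;
    }
    object o = { 0, 0, b };
    return o;
}
```

Releasing a 1000-element integer sequence with DeRefDSi() visits none of its elements. DeRef() visits all 1000.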
* Limitations to optimization
The Translator is never allowed to "cheat". A correct Euphoria program must be translated into a completely correct C program that will work in all cases. For example:
    integer y, z
    atom x
    sequence s
    x = y + z  -- can we assume that x will be assigned an integer value here?
               -- No! Integer overflow could occur and we must be prepared for it!
    x = y + 1  -- Not here either!
    y = length(s)
    x = y + 1  -- OK, sequence lengths are assumed to not get too close to
               -- the maximum integer value (1.07 billion, or 4Gb of storage),
               -- so we assume that x will be assigned an integer
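The overflow case that rules out the shortcut can be shown in a small C sketch. The eu_atom struct and both functions are hypothetical. They illustrate the logic only and do not reproduce the Translator's emitted code. The sum of two Euphoria integers can leave the 31-bit range, so the result must be checked and, if necessary, stored as a double. For length(s) + 1 the article's assumption about sequence lengths makes that check unnecessary.

```c
/* Euphoria 31-bit integer range. */
#define EU_MININT (-1073741824LL)
#define EU_MAXINT 1073741823LL

/* Hypothetical result: an integer, or on overflow an atom held as a double. */
typedef struct { int is_int; long i; double d; } eu_atom;

/* x = y + z with y, z integers: the sum may leave the integer range. */
eu_atom add_integers(long y, long z) {
    eu_atom r = { 1, 0, 0.0 };
    long long sum = (long long)y + z;   /* cannot overflow 64 bits */
    if (sum < EU_MININT || sum > EU_MAXINT) {
        r.is_int = 0;                   /* overflow: promote to double */
        r.d = (double)sum;
    } else {
        r.i = (long)sum;
    }
    return r;
}

/* x = length(s) + 1: lengths are assumed to stay far below EU_MAXINT,
   so no check is needed and the result is an integer. */
long length_plus_one(long len) { return len + 1; }
```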
* Generating the emake batch file
A slightly different version of emake.bat is generated for each of the supported C compilers. emake.bat (emake on Linux/FreeBSD) contains commands for compiling each of the generated C source files, as well as a command for linking all the object files with the Translator library. In some cases, environment variables are tested to see which C compiler is going to be used. A lot of this is in the front-end file c_decl.e.
* Multitasking
The method used by the Translator to support Euphoria's cooperative multitasking is quite different from that used by the Interpreter.
The Interpreter is able to maintain its own set of call stacks for the various active tasks. These stacks are just allocated blocks of memory, and are under the full control of the Interpreter. When task_yield() is called, the Interpreter can easily point to a new call stack for the new task about to get control. These stacks are software-controlled stacks; they are not the hardware stack.
Translated code is compiled by a C compiler and uses the hardware stack, which is manipulated by CPU instructions such as PUSH and POP. When a compiled C subroutine is entered, the CPU automatically pushes information onto the hardware stack, and the CPU stack register (ESP) is adjusted accordingly. When Euphoria's task_yield() is called, the Euphoria scheduler must do some tricky low-level machine operations to point ESP to the new stack top for the new task. If this is the start of a brand new task, some stack space must be found and reserved for this task. Direct manipulation of ESP and other registers is not defined in the C language, so the Euphoria scheduler and task_create() both have some machine code inserted into them, primarily for adjusting the hardware stack pointer. As strange as it sounds, when task_yield() "returns", it (in general) returns to a different place in the compiled C code than where it was called from. It actually "returns" to the place just after a call to task_yield(), but in the new task that is about to run. This is accomplished by pointing the hardware stack pointer to a new position prior to executing the hardware RET (return) instruction. Task switching is thus highly efficient, requiring the execution of just a few machine instructions.
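The idea of per-task hardware stacks, and of a yield that "returns" into a different task, can be illustrated in C. This is a substitute technique, not the Translator's method. The Translator moves ESP with a few inserted machine instructions, which portable C cannot express. The sketch below instead uses the POSIX ucontext functions getcontext(), makecontext() and swapcontext(). These are available on Linux with glibc, but POSIX marks them obsolescent and some C libraries omit them. All other names are hypothetical. Each swapcontext() call saves the current registers and stack pointer and resumes the other task just after its own last switch.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static int trace[4];
static int ntrace = 0;

/* Body of a second task, running on its own reserved stack. */
static void task_body(void) {
    trace[ntrace++] = 1;
    swapcontext(&task_ctx, &main_ctx);  /* "yield": save this task, resume main */
    trace[ntrace++] = 3;                /* resumed here by main's second switch */
}                                       /* returning resumes uc_link (main) */

/* Runs the two tasks once; trace should end up as 1, 2, 3, 4. */
int run_two_tasks(void) {
    static char task_stack[64 * 1024];  /* stack space reserved for the new task */
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    swapcontext(&main_ctx, &task_ctx);  /* start the new task */
    trace[ntrace++] = 2;
    swapcontext(&main_ctx, &task_ctx);  /* yield back into the task */
    trace[ntrace++] = 4;
    return ntrace;
}
```

The interleaved trace shows each task resuming exactly where it last switched away.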
* Routine Id's
To support the use of routine id's for indirect calls (call_proc/call_func), the Translator needs to have a stripped-down version of the symbol table stored in memory at run-time. It only contains information on Euphoria routines in the user's program that might possibly be the target of a routine id. See init-.c.
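A minimal C sketch of such a run-time table follows. The routine_entry layout, the two sample routines and the C signatures of routine_id() and call_func() are all hypothetical. Euphoria's real call_func() takes a sequence of arguments, and its real routine_id() also returns -1 for an unknown name. Only routines that might be targets are listed, and an indirect call is just a call through a function pointer looked up by index.

```c
#include <string.h>

/* Hypothetical stripped-down symbol table: only possible routine-id targets. */
typedef long (*eu_func)(long);

static long double_it(long x) { return 2 * x; }
static long negate(long x)    { return -x; }

typedef struct { const char *name; eu_func fn; } routine_entry;

static const routine_entry routine_table[] = {
    { "double_it", double_it },
    { "negate",    negate    },
};
#define NROUTINES ((int)(sizeof routine_table / sizeof routine_table[0]))

/* Look a routine up by name; -1 if it is not in the table. */
int routine_id(const char *name) {
    for (int i = 0; i < NROUTINES; i++)
        if (strcmp(routine_table[i].name, name) == 0)
            return i;
    return -1;
}

/* Indirect call through the table (one argument, to keep the sketch short). */
long call_func(int id, long arg) { return routine_table[id].fn(arg); }
```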
* Literal values
Literal, non-integer values (e.g. 1.0, "HELLO") are initialized in the standard Euphoria object format, in init-.c by init_literal(). To save space, duplicate literals in a program will all point to the same literal in memory.
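Sharing duplicate literals works like interning. The C sketch below uses plain strings as a stand-in for Euphoria objects. The pool, MAX_LITERALS and intern_literal() are hypothetical and are not the real init_literal(). Each distinct literal is stored once, and every later use of the same text gets a pointer to that single copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical literal pool; strings stand in for Euphoria objects. */
#define MAX_LITERALS 64
static const char *pool[MAX_LITERALS];
static int npool = 0;

/* Returns the shared copy of text, creating it on first use
   (NULL if the pool is full or memory runs out). */
const char *intern_literal(const char *text) {
    for (int i = 0; i < npool; i++)
        if (strcmp(pool[i], text) == 0)
            return pool[i];            /* duplicate: reuse the existing copy */
    if (npool == MAX_LITERALS)
        return NULL;
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, text);
    pool[npool++] = copy;
    return copy;
}
```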
* Shared Libraries
The -dll option will cause the Translator to create a Linux/FreeBSD shared library (.so file), or a Windows dynamic link library (.dll file). The library code and the main program share the same heap, but each has its own storage cache.
* Some Other Issues
When generating a C source file, the Translator tries not to generate too large a file, in case this causes a problem for the C compiler. It also uses DeRef1(), a simpler but slower version of DeRef(), when it appears that a C source file is getting to be quite large. See savespace().