ln_arm_compilation

Language Translation
Compilation vs. interpretation
Compilation diagram
Step 1 (compile): program -> compiler -> compiled program
Step 2 (run): input -> compiled program -> output
• compilation is translation from one language to
another, where the translated form is typically easier
to execute; a pure compiler produces language that
will be directly executed by hardware
• compilation allows one translation and then multiple
executions of the executable file (sometimes called a
binary file, or load module); thus a fairly large amount
of time can be spent by the compiler doing analysis
and optimization once, in order to produce an
executable that runs quickly each time it is run
• a compiled program typically runs fast but is harder to
debug
• compiler example: gcc
Interpretation diagram
single step: program + input -> interpreter -> output
• interpretation skips the intermediate step of
producing a form of the program in another language
and combines translation and execution
• interpretation starts from the source code each time
you want to run the program; it performs the same
analysis as a compiler but on a source-line-by-source-line basis
• a pure interpreter keeps no results from this analysis
even when encountering the same source line
repeatedly within the body of a loop (this means an
interpreted program will run faster if you make all the
variable and function names only one or two
characters in length and remove all the comments, but I don't recommend doing this!)
• an interpreted program typically runs slowly but is easier
to debug because of better run-time error diagnostics
• interpreted languages easily support dynamic typing and
dynamic scoping of variables
• interpreter examples: shells, m4, or Python on the
command line; also, formatted I/O (e.g., printf) relies
on interpretation of the format string at run time
Hybrid approach diagram
Step 1: program -> compiler -> byte code
Step 2: byte code + input -> JVM -> output
• Java compiler and JVM interpreter - a hybrid translation
model
− "javac" produces byte code, which is easy to interpret
− "java" interprets byte code
• provides for portability of byte code files across numerous
systems
• Perl also has a hybrid translation model
• other hybrid translation models include just-in-time (JIT)
compilers, which compile functions/procedures at runtime, on the first call
• terminology - source code that needs to be compiled is
typically called a "program", while source code that is
interpreted may be called a "script" (but may also be called
a "program")
Major translators in the compilation model
1. language preprocessor - textual substitution and
conditional compilation (direct execution of special
statements)
2. compiler - lexical analysis, parsing, code generation,
optimization
3. macro processor - textual substitution and conditional
assembly
4. assembler - translate symbols into addresses and
machine code
5. linker - external symbol resolution plus relocation,
produces executable
6. loader - relocation according to load address, produces
memory image
(note many compilers generate object code directly - without
calling a separate assembler)
Compile steps (diagram)
source (.c) -> language preprocessor (cpp), macro expansion and conditional compilation -> expanded source code -> compiler (ccom) -> assembly language (.s) (.asm)   [compile time]
assembly source w/ macros (.m) -> macro processor (m4), macro expansion and conditional assembly -> assembly language (.s) (.asm)
assembly language (.s) (.asm) -> assembler (as) -> object code (.o) (.obj)   [assembly time]
object code (.o) (.obj) + library routines (static linking) -> linker (ld) -> executable load module (a.out) (.exe)   [link time]
Load and run steps (diagram)
command interpreter (shell): search for file name -> loader
executable (load module) (a.out) (.exe) -> loader -> memory image (machine language instructions and data) -> fetch/decode/execute in CPU
dynamic linking against library files (Microsoft DLL) or shared objects (.so): load-time linking (early Windows) or run-time linking (most systems)
Translators (language preprocessor, e.g., for C)
− special syntax for preprocessor statements, e.g.,
#include
− macro facility, #define - trivially used for constant
substitution
− conditional compilation, #ifdef - used for versioning
#ifdef VERBOSE
printf( "value of a is %d\n", a );
#endif
where "#define VERBOSE" is included in the program
source or where you compile with "gcc -DVERBOSE"
Translators (compiler)
− lexical analysis: extracting lexical items ("tokens") from
the input
− syntactic analysis: parsing statements according to the
grammar rules of the language, generates a parse tree
− semantic analysis: determining the meaning of operations
according to the datatypes of the variables in the parse
tree, may involve adding conversion operators to the parse
tree
− intermediate code generation
− machine-independent optimizations, e.g., loop
transformations
− machine-specific code generation and register allocation
− machine-dependent optimizations, e.g., branch delay slot
scheduling
consider the statement a = b + 2*c; in the following code
float a,b;
extern float c;
...
a = b + 2*c;
...
lexical analysis extracts eight tokens and assigns symbolic
identifiers to entries in the symbol table
tokens:      `a'        `='  `b'        `+'  `2'  `*'  `c'        `;'
identifiers: symtab[0]  `='  symtab[1]  `+'  `2'  `*'  symtab[2]  `;'
syntactic analysis builds a parse tree
          =
         / \
symtab[0]   +
           / \
  symtab[1]   *
             / \
          `2'   symtab[2]
semantic analysis determines meaning
                =:float
               /       \
symtab[0]:float         +:float
                       /       \
        symtab[1]:float         *:float
                               /       \
               convert_to_float         symtab[2]:float
                       |
                      `2'
intermediate code generation yields something like
convert_to_float( 2, temp_float_0 )
multiply_float( temp_float_0, symtab[2], temp_float_1 )
add_float( symtab[1], temp_float_1, temp_float_2 )
store_float( temp_float_2, symtab[0] )
machine-independent optimization either does the conversion
at compile time or strength-reduces the multiply by 2 to an add
add_float( symtab[2], symtab[2], temp_float_1 )
add_float( symtab[1], temp_float_1, temp_float_2 )
store_float( temp_float_2, symtab[0] )
from this registers would be assigned and ARM code would
be generated (including storage allocation and addressing
for variables)
Translators (macro processor)
− simple abstraction through textual substitution ("open"
subroutines)
− provides either keyword or positional parameter
substitution
− extends instruction set by synthesizing instructions
using macro definitions
− cost occurs at assembly time (expanding the macro
definition), not at run time (procedure call, register
save/restore, and procedure return)
− conditional assembly is same idea as #ifdef facility of C
preprocessor
comparison of macro with run-time functions
              macro                          function
invocation    in-line substitution           run-time call and return
parameters    untyped; evaluated at each     typed; evaluated once at time
              appearance                     of call
trade-offs    fast but one copy of code      more overhead per call but
              at each call site              only one copy of code
Translators (assembler)
• translates a program written in assembly language to binary
machine code
• resolves local symbolic addresses; typically this is a 1-to-1
translation (one assembly statement per machine instruction)
• forward references generally require 2-pass assemblers
pass 1: find symbolic labels and assign them addresses
− run location counter (virtual instruction pointer)
− determine instruction size
− record addresses in symbol table
pass 2: use symbol table information to construct
instructions (symbolic -> binary)
• alternative to 2-pass approach is 1-pass with fixup (i.e.,
backpatching)
• other assembler facilities include data layout directives
(pseudo-ops)
Translators (linker)
separate assembly or compilation means the assembler does
not know all the addresses, thus the assembler produces only
partially-resolved object files
linker combines separate object files into a single executable
− lay out pieces of code & data (storage allocation based
on sizes)
− resolve external references
− perform relocation of absolute addresses
two pass:
1. assign code and data to memory addresses and build
symbol table from public symbols
2. use table to resolve external addresses and produce load
module
• object module file format (this is early UNIX; ELF is more
complex)
- header (includes sizes of text, data, and bss sections)
- text section (read only)
- data section (read/write)
- relocation/external symbol entries for text section
- relocation/external symbol entries for data section
- symbol table
- string table (symbol table entries index into string table)
Translators (command interpreter)
• command interpreter (shell) - a program that reads
command lines from the keyboard (or from a script file) and
either directly executes the command or searches for an
executable file having that command name and then
loads and branches to that loaded program
Translators (loader)
• bring a program into memory in preparation for execution
• read file header to find size of pieces
• allocate memory area(s)
• read instructions and data from file into memory
• relocation - adjusting absolute addresses relative to load
point
• jump to startup code
Binding times
The assembler, linker, and loader are all programs taking
input files and producing output.
Decisions and translations made by these programs are said
to be done at "assembly time", at "link time", and at "load
time", respectively.
Actual execution (i.e., instruction interpretation by the
hardware, such as performing adds, branches, etc.) takes
place at "run time".
• During execution, you can also talk of things happening at
specific times, such as register saving at procedure call
time.
• Dynamic linking is an example of a late decision, or "late
binding".
− It is the linking of separate procedures at either load
time or run time,
− and it typically requires that the normal (static) linker
include a simple table that names the needed routines
(for load-time linking) or include simple "stub" routines
that find and link to the shared library routines on their
first calls (for run-time linking).
• Another form of delayed binding is "just-in-time" (JIT). This
is used in several Java compilers, where methods are not
compiled until the first call.
• Many storage allocation decisions are made at each step.
For example, offsets are assigned to labels at assembly
time, under the assumption that any absolute addresses
will be updated by the linker and loader later.
(When we later study virtual memory, we will see that it is also an example
of late binding - specifically one where physical memory allocation
decisions that might be made by a traditional loader are instead deferred to
run time and made by the operating system.)
other programming tools / components of a program
development environment
• editors (e.g., vim, gedit, emacs)
• beautifiers (e.g., indent)
• project control (e.g., make)
• version control (e.g., sccs)
• GUI toolkit (e.g., widget library)
• test coverage (e.g., gcov)
• debuggers (e.g., gdb, dbx, ddd)
• debugging tools (e.g., Purify) that detect
− reading or writing beyond the bounds of an array
− reading or writing freed memory
− freeing memory multiple times
− reading uninitialized memory
− reading or writing through null pointers
− overflowing the stack by recursive function calls
− reading or writing memory addresses on which a watch-point
has been set
• portability advisors (e.g., lint)
• style checkers (e.g., CodeCheck) that flag
− exceeding a given input line length
− exceeding a given nesting depth of if-else statements
− not aligning open and close curly braces (Horstmann)
• performance profilers (e.g., gprof)