Compiler Optimization: Getting the Most out of High-Level Code
Given the size of today’s projects, it is now imperative to write code in a high-level language,
specifically C. But that makes code optimization all the more desirable. Here are some of the
techniques a compiler uses to optimize your code for minimal footprint, highest performance,
or a balance of the two.
by Shawn A. Prestridge, IAR Systems
Microcontrollers used to be very simple. They had no pipelining, a limited set of registers and
the only peripherals were I/O ports that the hardware designer tied to other pieces of hardware to
make them work. As such, writing assembly language code was a relatively straightforward task.
These days, architectures can have multistage pipelines, banked registers and many on-chip
peripherals. Because of the rising complexity of the devices, C has become the language of
choice to write software for microcontrollers. But can a compiler generate code as efficient as a
human can with an assembler? Assuming the individual has unlimited time, the answer is no.
However, in real-world conditions where you must meet schedules and achieve faster time-to-market, a compiler can generate code far more efficiently than any human can.
Conceptually, the operation of a compiler is simple: it takes C source code and compiles it into
object code. The object files are later linked together by a linker into an executable, but most
optimizations for the code are performed by the compiler. The compiler has several stages of
processing that it performs in order to turn your source code into object code (Figure 1). The first
stage runs the source code through a parser, which parses the C statements into a binary tree. The
result of this parsing is referred to as “intermediate code.” The first stage of optimization is
performed on this intermediate code by the high-level optimizer (HLO). The HLO analyzes the
code and performs transformations based upon C language constructs, so no target-specific
optimizations are performed by the HLO. Even though the IAR Embedded Workbench products
support over 30 different architectures, a large portion of the code in the compiler is the same
from one IAR Embedded Workbench to another because the parser and HLO are the same. After
the HLO optimizes the intermediate code, the code generator translates the optimized
intermediate code into target-specific code. This target code is then optimized by the low-level
optimizer, which performs architecture-specific optimizations. The optimized target code is then
transformed into object code by a compiler-internal assembler.
Optimization takes place in three phases: analysis, transformation and placement. The analysis
portion of optimization tries to understand the intention of the source code that you wrote so that
it can make intelligent decisions about how to transform your source code into more efficient C
language constructions while preserving the original meaning of the code. These transformations
are based on heuristics and generally lead to much tighter code. The compiler also performs
register allocation, which is a key part of producing efficient code. Register allocation decides
which variables should be located in registers rather than being in RAM. Having variables in a
register allows you to quickly perform mathematical operations on them without having to read
or write them from RAM. The problem is that the microcontroller only has a limited number of
registers to hold these variables, so the code has to be analyzed carefully. The analysis is split
into two parts: control flow and data flow. The control flow analysis is performed first and it is
the basis for the data flow analysis. Control flow analysis detects loops, optimizes jumps and
finds “unreachable” code. The data flow analysis finds constant values, useless computations and
“dead” code. The difference between unreachable and dead code is that unreachable code cannot
be executed based on the code structure while dead code cannot be reached based on the value of
variables.
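To make the distinction concrete, consider the small C fragment below (the function and variable names are invented purely for illustration). The body of the if statement is dead because data flow analysis can prove the value of mode is always zero, while the assignment after the return statement is unreachable because control flow can never arrive there.

/* Illustration of what control flow and data flow analysis find.
 * At full optimization, both marked statements can be removed. */
int scale(int x)
{
    const int mode = 0;        /* data flow analysis: mode is a known constant */

    if (mode)                  /* "dead" code: never executed because of the   */
    {                          /* value of mode                                */
        x *= 10;
    }

    return x * 2;

    x = 0;                     /* "unreachable" code: the code structure means */
                               /* control flow can never get here              */
}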
The second stage of optimization is transformation. There are two different levels of
transformation, high-level (which is architecture-independent) and low-level (which takes
advantage of the facilities provided to it by the architecture). In Figure 2, we see some of the
high-level transformations that can occur in the code. The first transformation is called “strength
reduction” and aims to replace an expensive operation with an equivalent one that needs fewer
instructions and/or MCU cycles. The other
transformations in Figure 2 seek to eliminate code that is either redundant (common
subexpression elimination) or unnecessary (constant folding and useless computations).
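The C sketch below shows roughly what these transformations look like in practice. It is illustrative only; the function names and the rewritten form are assumptions, not output taken from the compiler or from Figure 2.

unsigned int before(unsigned int a, unsigned int b)
{
    unsigned int x = a * 8;             /* strength reduction: a * 8 -> a << 3 */
    unsigned int y = (a + b) * (a + b); /* common subexpression: a + b reused  */
    unsigned int z = 60 * 60 * 24;      /* constant folding: evaluated at      */
                                        /* compile time as 86400               */
    return x + y + z;
}

/* Roughly what the optimizer produces, expressed back in C: */
unsigned int after(unsigned int a, unsigned int b)
{
    unsigned int t = a + b;             /* a + b evaluated only once           */
    return (a << 3) + t * t + 86400u;
}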
Loop transformations are also performed by the high-level optimizer and can be found in Figure
3. The first transformation in this figure is referred to as “loop-invariant code motion” and seeks
to move code that is not impacted by the loop operations outside of the loop (as the name of the
transformation implies). The second transformation is called “loop unrolling” and is used to
amortize the overhead of the test-and-branch conditions associated with the loop at the expense
of slightly larger code. Lastly, the high-level optimizer makes decisions about whether or not to
inline a function call based upon the number of times the function is called and the size of the
code contained within the function. Function calls are very costly partially due to the branch
instructions needed to jump to a function and return from it, but mostly because of the overhead
that the microcontroller’s application binary interface (ABI) enforces on the compiler. This ABI
requires that certain registers be preserved across function calls, so each call entails pushing
those registers onto the stack and later popping them back off to preserve the context. If the
function’s code is inlined, this overhead is eliminated and the function runs faster (and is
sometimes smaller!) than if the function is actually
called. Inlining gives you the functionality of a macro, but makes the code type-safe.
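The following sketch shows loop-invariant code motion and loop unrolling on a hypothetical buffer-scaling routine. The rewritten version approximates in C what the optimizer produces, under the simplifying assumption that the element count is even.

void scale_buffer(int *buf, int n, int gain, int offset)
{
    for (int i = 0; i < n; i++)
    {
        int bias = gain * offset;       /* loop-invariant: can be hoisted out  */
        buf[i] = buf[i] * gain + bias;  /* of the loop by the optimizer        */
    }
}

/* After loop-invariant code motion and 2x unrolling (n assumed even),
 * the code behaves roughly like this: */
void scale_buffer_opt(int *buf, int n, int gain, int offset)
{
    int bias = gain * offset;           /* computed once, before the loop      */
    for (int i = 0; i < n; i += 2)      /* unrolling halves the test-and-      */
    {                                   /* branch overhead                     */
        buf[i]     = buf[i]     * gain + bias;
        buf[i + 1] = buf[i + 1] * gain + bias;
    }
}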
The low-level optimizer (LLO) uses the instruction set of the underlying architecture to find
ways to optimize the code. The LLO examines the target code to find places where the
architecture can accomplish the goal with a small series of assembler instructions. Figure 4
illustrates two such constructs that can be reduced to just a few fast-executing instructions. The
LLO also looks at register allocation to decide which variables should be located in registers.
Although this allocation is normally not considered an optimization per se, it has a dramatic
effect on how fast the resulting code can execute since operations can be performed directly on
the data in the register rather than having to first read the value from some other memory source.
The LLO also decides where to place the code and data using a technique that is referred to as
“static clustering,” which collects the global and static variables into one place. This has two
important benefits: it allows the compiler to use the same base pointer for many memory
accesses and it eliminates alignment gaps between the memory elements.
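A classic example of the kind of idiom a low-level optimizer can recognize is a rotate written in plain C. This is a sketch, not one of the constructs in Figure 4: on targets that provide a rotate instruction the whole expression can collapse to a single instruction, while on other targets it falls back to shifts and an OR.

#include <stdint.h>

/* Rotate-left idiom; the mask keeps the shift counts in range so the
 * behavior is well defined even when amount is 0. */
uint32_t rotl32(uint32_t value, unsigned int amount)
{
    amount &= 31u;
    return (value << amount) | (value >> ((32u - amount) & 31u));
}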
There are limits to the optimization that can be performed. For example, common subexpression
elimination can only be applied to parts of expressions not involving function calls. The reason is
that function calls may have side effects that cannot be determined at compile time, so the
compiler must play it safe and preserve all function calls. If the function is inlined, however, the
compiler can more effectively examine the code and do common subexpression elimination to
avoid unnecessary computations with the added benefit of avoiding needless function calls. The
C language provides for the concept of separate compilation units, which means that source code
files in the project can be compiled individually. While this is indeed a very handy feature for
writing source files that are separated into common groups, it has the unfortunate side effect that
the compiler may not be aware of what is happening in other source files, which causes the
compiler to generate extra code in order to be conservative in its assumptions. This is particularly
true if you are calling small functions that are defined in other pieces of source code. The IAR
Embedded Workbench has a unique “Multi-file compilation” feature that lets the compiler treat
several source files as one monolithic piece of code. This gives the compiler greater visibility
into what the code is doing, so it can make better decisions about how to optimize it effectively.
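The fragment below illustrates the point with a hypothetical lookup() function defined in another compilation unit. Because the compiler cannot see whether lookup() has side effects, both calls must be preserved. If lookup() were inlined, or made visible through multi-file compilation, the compiler could prove it is side-effect free and evaluate the shared subexpression only once.

/* lookup() lives in another source file, so the compiler must assume it
 * can have side effects. */
extern int lookup(int key);

int sum_twice(int key)
{
    /* Both calls are preserved; the common subexpression lookup(key)
     * cannot be eliminated. */
    return lookup(key) + lookup(key);
}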
IAR Embedded Workbench allows you to control these optimizations at several different levels
to give you optimum granularity in your code development. The project-level setting is a global
setting that becomes the default for all files in the project. Several pieces of source code can be
contained within a group and that group can override the inherited optimization settings.
Similarly, optimization can be overridden at the file level or even at the function level by the use
of pragma directives. Additionally, optimization can have different goals for the compiler to
achieve: size, speed or a balanced approach. As the names of the first two imply, the compiler
will optimize purely for size or speed, respectively. When you use the balanced setting, the
compiler tries to strike a healthy balance between size and speed, sometimes giving a little on
one to achieve a little of the other. Moreover, IAR Embedded Workbench products also allow
control over which transformations are applied to the code so that you can get exactly what you
need.
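As a sketch of function-level control, the fragment below uses the compiler’s #pragma optimize directive, which is placed immediately before the function it should affect. The exact parameters accepted vary between compiler versions and targets, so consult the IAR C/C++ Development Guide for your toolchain.

/* Optimize this one function for speed, regardless of the project setting. */
#pragma optimize=speed
void process_samples(short *buf, int n)
{
    for (int i = 0; i < n; i++)
    {
        buf[i] = (short)((buf[i] * 3) >> 2);
    }
}

/* Optimize the following function for size instead. */
#pragma optimize=size
void housekeeping(void)
{
    /* ... */
}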
Embedded compilers have evolved greatly over the last thirty years, especially as it pertains to
their optimization capabilities. Many years ago, developers had to be very careful to structure
their C code in such a way that it could be easily optimized by the compiler. However, modern
compilers employ many different techniques to produce very tight and efficient code so that you
can focus on writing your source in a clear, logical and concise manner.
IAR Systems, Uppsala, Sweden. +46 18 16 78 00. [www.iar.com].