Code Generation CS 480

advertisement
Code Generation
CS 480
Can be complex
• To do a good job of teaching about code
generation I could easily spend ten weeks
• But, don’t have ten weeks, so I’ll do it in one
lecture
• (Obviously, I’ll omit a lot of material)
Remember Reverse Polish Notation
• Remember RPN?
• To evaluate y * (x + 4)
Push Y on stack
Push x on stack
Push 4 on stack
Do addition
Do multiplication
Code for (x + 12) * 7
Push fp
Push -8
Add
Get value (i.e., address is on stack, get value)
Push 12
Add
Push 7
multiply
Addressing modes
• Early discovery, some patterns, such as adding fp
+ constant, occur really often. Why not make
then part of instruction
• Common address modes
Absolute address (i.e., global)
#Constant (small integer values)
Register – contents of register
(reg) – value at register as an address
Con(reg) – value at c plus register (e.g., 8(fp) )
More unusual addressing modes
• PDP had a number of unusual addressing
modes
-(reg) use register, then decrement
(reg)+ use register, then increment
Look familiar? Could be used to build a great
stack system
Code for (x + 12) * 7, again
Move -8(fp), -(sp)
Move #12, -(sp)
Add (sp)+, (sp)
Move #7, -(sp)
Mult (sp)+, (sp)
Still easy to do, but still lots of memory access
Generating stack-style code is easy
• We can use the AR stack just like the stack in a
RPN calculator
• Simply do a post-order traversal of the AST
• Operands (variables, constants) push on stack
• Operators generate code to take arguments
from stack, compute the result, and push back
on stack
• Simple. This is what you are doing in prog 6
Then what’s the problem?
• Remember the von Neumann machine
design?
• CPU is separated from memory by a thin slow
wire, which is costly to traverse
• Every time you do a memory access things
slow down
Memory access in stack code
generation
•
•
•
•
One memory access to get instruction
One or two memory access to get arguments
Then you can do the operation
One more memory access to write result back
to memory
• That’s a lot of memory access
A few things you can do to speed up
• There are a few things that you can do with
hardware to speed this up just a bit
• Caching – (speeds instruction fetch, operand
fetch not so much)
• Pipelines – (but results coming from memory
cause frequent stalls)
Can be done much faster
Move -8(fp), r1
Add #12, r1
Mult #7, r1
Registers live on-chip, allow much faster access,
code will run much faster.
Bottom line for todays lecture
Pro
Con
Stack style
Easy to do
Slow execution
Register style
Fast execution
Hard to do
Registers can be used to hold
intermediate results
• Example (a + b) * (c + d)
Move a, r1
Add b, r1
Move c, r2
Add d, r2
Mult r2, r1
But notice we need to use two registers
No matter now many you have
• Can always come up with an expression that will
be more complicated than the number of
registers you have
(a + b) * (c + d) * (e + f) * ….
But are such things common?
Still. Need to handle it, called a register spill.
Solution: Save temp to memory, load when you
need it (basically going back to stack style).
Making effective use of addressing
modes
• Addressing modes give rise to two classes of
arguments
• Those that can be represented as addressing
modes (local variables, constants),
• And those that can not (expressions)
• Need to make effective use of those
Four cases for addition
X + y (that is, two addressing mode expressions)
move x, r1
add y, r1
X + (…) (that is, one addressing mode expression)
compute (…) and place into r1
add x, r1
(…) + x
compute (…) and place into r1
add x, r1
(…) + (….) (that is, neither side addressing mode)
compute left (…) and place into r1
compute right (…) and place into r2
add r2, r1 (and now r2 is free)
What about subtraction? Non
commutative
X - y (that is, two addressing mode expressions)
move x, r1
sub y, r1
X - (…) (that is, one addressing mode expression)
compute (…) and place into r1
sub x, r1 (NO! this is the inverse of the result we want)
(…) - x
compute (…) and place into r1
add x, r1
(…) - (….) (that is, neither side addressing mode)
compute left (…) and place into r1
compute right (…) and place into r2
sub r2, r1 (and now r2 is free)
Choices on subtraction
• Use two registers, or
• Use one register and create an invert
instruction
• Depends upon if you have enough registers
• Lots of special cases (division, multiplication,
shifts).
But registers have many other uses
• Can use registers to hold variables that are
being accessed a lot (such as loop index
values)
• Can use registers for passing parameters (save
memory access)
• So a good machine design would have lots of
registers, right? Not so fast
Using registers for variables
• Can make your code dramatically faster
• But need to discover WHICH variables are
commonly used – complicated data flow
analysis
• Or (as C does), allow the user to give you hints
(which are frequently wrong)
Using registers for parameters
• If the callee can use the values directly in
registers, then the code can be much faster
• If not, when they they were going to save
register values anyway, nothing is lost
• But callee might not know if they want
parameter is memory or in register
Cost to save an restore
• Remember the parameter calling sequence?
• The more registers you have, the more you
need to save and restore on procedure entry
and exit
• So there is a trade-off – faster function
invocation, slower execution of body of
function, or faster execution, slower
invocation
Worse yet – don’t know ahead of time
• don’t know until you have seen entire
function body how many registers you will
need
• Save them all – too much execution time
• Don’t save enough – run out of registers
• (old C trick, branch to end of function, and
generate saves after you have seen everything
else).
Interesting idea Register Window
• Interesting idea. Put a huge amount of
registers (say, 256) on-chip, but only allow
process to see a few (say, 16) at time
• When you execute a function, simply move up
window on registers
• Saves and restores are automatic, make
function invocation faster
Even better, overlapping windows
• What if six of those registers are shared with
caller, and ten are new?
• What might you use those six for?
• Can make for really function invocation
Downside of register windows
• Eventually you run out, and they must be
spilled as well (and restored)
• Context switch is now really slow, as you need
to save an restore ALL 256 registers
• Still an interesting idea
How is it really done?
• You can spend a lot of time worrying about
things that hardly ever happen (expressions
with 32 temporary values in the middle,
functions with 17 arguments)
• And programming styles change (OOP has
smaller functions, but fewer arguments)
• So there is never a clear answer, and it keeps
changing
Download