Microprocessors

Transmeta’s Crusoe Architecture
Umran A. Khan
Generations of Crusoe Processors
Original architecture: TM3120, TM5400
Later versions: TM5600-TM5800

The architecture is mostly the same, but improved:
Faster clock rate (up to 800 MHz now)
Smaller core (0.13 micron process)
Has special instructions for the OS it is emulating
Lower power consumption
Wider range of applications (from internet appliances to high-density servers)


We will look at the TM5400 here
Instruction Set

Uses a VLIW (Very Long Instruction Word) instruction format/engine
An instruction word is a 128-bit long packet
Each word (called a molecule) holds up to four individual operations called atoms
Atoms are packed into either 128-bit or 64-bit molecules
The atoms (operations) in a molecule execute in parallel (up to 4 operations per clock)
These operations must be independent of one another
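As a rough illustration of the molecule format, the sketch below packs atoms into one 128-bit word and checks that the atoms in a molecule are independent. It assumes each atom occupies a 32-bit slot (128 / 4); the slides do not give the real Crusoe encoding, so the `Atom` type, its `encoding` values and the register names are hypothetical.

```python
from dataclasses import dataclass

ATOM_BITS = 32        # assumption: a 128-bit molecule split into four 32-bit atom slots
MOLECULE_BITS = 128

@dataclass(frozen=True)
class Atom:
    opcode: str        # e.g. "FADD", "ADD", "LD", "BRCC"
    encoding: int      # hypothetical 32-bit encoding of the operation
    writes: frozenset  # registers this atom writes
    reads: frozenset   # registers this atom reads

def pack_molecule(atoms):
    """Pack up to four atoms into one integer, atom 0 in the lowest 32 bits."""
    if len(atoms) > MOLECULE_BITS // ATOM_BITS:
        raise ValueError("a 128-bit molecule holds at most four atoms")
    word = 0
    for slot, atom in enumerate(atoms):
        if not 0 <= atom.encoding < (1 << ATOM_BITS):
            raise ValueError(f"{atom.opcode} does not fit in {ATOM_BITS} bits")
        word |= atom.encoding << (slot * ATOM_BITS)
    return word

def independent(atoms):
    """True if no atom writes a register that another atom in the molecule reads or writes."""
    for i, a in enumerate(atoms):
        for j, b in enumerate(atoms):
            if i != j and a.writes & (b.reads | b.writes):
                return False
    return True

add = Atom("ADD", 0x11, frozenset({"r1"}), frozenset({"r2", "r3"}))
ld = Atom("LD", 0x22, frozenset({"r4"}), frozenset({"r5"}))
sub = Atom("SUB", 0x33, frozenset({"r6"}), frozenset({"r1"}))
print(hex(pack_molecule([add, ld])))  # 0x2200000011
print(independent([add, ld]))         # True
print(independent([add, sub]))        # False: SUB reads r1, which ADD writes
```

A dependent pair such as ADD/SUB above would have to be split across two molecules, which is exactly the job of the scheduler described later.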

Five Execution Units
FPU (Floating Point Unit)
Has a 10-stage floating point pipeline
Uses the conventional x86 80-bit register format
32 FP registers
2 Integer ALUs (Arithmetic-Logic Units)
Have a 7-stage integer pipeline
64 32-bit registers dedicated to them
LSU (Load/Store Unit)
Branch Unit
Sample Instruction: one 128-bit molecule
FADD -> FPU | ADD -> Integer ALU#0 | LD -> LSU (Load/Store) | BRCC -> BU (Branch)
Figure copied from reference #1
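The routing shown in the sample instruction can be sketched as a small dispatch function that assigns each atom to a unit while respecting the unit counts listed above (one FPU, two integer ALUs, one LSU, one branch unit). The opcode-to-unit table is hypothetical and only covers a few illustrative opcodes.

```python
from collections import Counter

# Unit counts from the slide: one FPU, two integer ALUs, one LSU and one branch unit.
UNITS = {"FPU": 1, "ALU": 2, "LSU": 1, "BU": 1}

# Hypothetical opcode-to-unit table with a few illustrative opcodes.
UNIT_FOR = {"FADD": "FPU", "ADD": "ALU", "SUB": "ALU", "LD": "LSU", "BRCC": "BU"}

def dispatch(molecule):
    """Assign each atom of a molecule to an execution unit, e.g. 'ALU#0'."""
    if len(molecule) > 4:
        raise ValueError("a molecule holds at most four atoms")
    used = Counter()
    plan = []
    for opcode in molecule:
        kind = UNIT_FOR[opcode]
        if used[kind] >= UNITS[kind]:
            raise ValueError(f"no free {kind} unit for {opcode}")
        # Number the units only where there is more than one of a kind (the ALUs).
        name = f"{kind}#{used[kind]}" if UNITS[kind] > 1 else kind
        used[kind] += 1
        plan.append((opcode, name))
    return plan

print(dispatch(["FADD", "ADD", "LD", "BRCC"]))
# [('FADD', 'FPU'), ('ADD', 'ALU#0'), ('LD', 'LSU'), ('BRCC', 'BU')]
```

A molecule with two loads, for example, is rejected here because there is only one load/store unit.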
Introduction to Code Morphing

Code Morphing Software is a clever translation software layer that dynamically recompiles an x86 program into the processor's native VLIW instruction format
Stored in the BIOS ROM and runs in main memory
An entire group of instructions is translated at once and then put into the translation cache
Basically, an emulation mechanism
The same approach can be used for architectures other than x86 (e.g. the Linux-oriented TM3120; compare Digital's FX!32, which translated x86 code to run on Alpha), but the TM5400 is known for its x86 compatibility
Great potential!
Crusoe Translation Layers (figure): x86 applications, the operating system and the x86 BIOS run on top of the Code Morphing layer, which runs on the CPU core
Traditional x86 Architecture



IA-32 instructions are translated by the CPU into simpler, more uniform RISC-like instructions (each instruction is translated individually)
This requires fancy, complicated translation
It has dedicated hardware for
x86 instruction translation
Branch prediction
Register renaming
Instruction reordering
Transmeta’s Simplified Core

A lot of the processor's functionality is implemented in software
Its hardware is made up of the execution units, the instruction decode unit and, of course, the cache
The rest of the dedicated hardware (previous slide) is replaced by software
Advantages
The CPU takes less die space
Less power demanding
Less expensive to produce and upgrade
Hardware vs. Software

Implementing hardware functions in software comes with a cost
Software is slower than hardware
But how much slower?
It is not so easy: reordering instructions, renaming registers, predicting branches on the fly, etc. on the same hardware used for addition and other instruction execution adds complications
Do the benefits outweigh the costs?
According to Transmeta, they do!
Execution, Decoding and
Scheduling

In x86,
Instructions are translated individually
An instruction's binary is fetched and decoded into n operations
These operations are reordered and fed to the execution units (i.e. FPU, ALU, etc.) in parallel
The results of out-of-order execution then have to be put back into the original program sequence (complicated and costly)
Execution, Decoding and
Scheduling (Continued)

In Crusoe,
A group of instructions is translated at once
Instructions are translated once and placed into the translation cache
If the same code is run again, the processor can grab it from the translation cache
Instructions can be reordered by the scheduler by looking at the generated code
Thus, the number of instructions executed can be minimized
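The difference between decoding on every execution and translating a group once can be sketched with a toy model that only counts decoding/translation work. The program layout, `run_per_instruction`, `TranslationCache` and the `translate` callback are hypothetical; the sketch does not model how the Code Morphing Software actually chains or executes translations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy program: block address -> (x86 instructions, address of the next block).
Program = Dict[int, Tuple[List[str], int]]

def run_per_instruction(program: Program, entry: int, blocks: int) -> int:
    """Traditional x86 model: every instruction is decoded each time it executes."""
    decodes, addr = 0, entry
    for _ in range(blocks):
        insts, nxt = program[addr]
        decodes += len(insts)
        addr = nxt
    return decodes

class TranslationCache:
    """Crusoe-style model: translate a whole block once, then reuse it from the cache."""

    def __init__(self, translate: Callable[[List[str]], List[str]]):
        self.translate = translate
        self.cache: Dict[int, List[str]] = {}
        self.translated_insts = 0

    def lookup(self, program: Program, addr: int) -> List[str]:
        if addr not in self.cache:                 # miss: translate the whole group at once
            insts = program[addr][0]
            self.cache[addr] = self.translate(insts)
            self.translated_insts += len(insts)
        return self.cache[addr]                    # hit: reuse the native code

    def run(self, program: Program, entry: int, blocks: int) -> int:
        addr = entry
        for _ in range(blocks):
            self.lookup(program, addr)             # a real CPU would now execute the VLIW code
            addr = program[addr][1]
        return self.translated_insts

program = {
    0x100: (["addl %eax,(%esp)", "addl %ebx,(%esp)", "movl %esi,(%ebp)", "subl %ecx,5"], 0x200),
    0x200: (["jmp 0x100"], 0x100),
}
print(run_per_instruction(program, 0x100, 1000))   # 2500 decodes
tc = TranslationCache(lambda xs: [f"vliw<{x}>" for x in xs])
print(tc.run(program, 0x100, 1000))                # 5 instructions translated
```

The gap grows with the number of loop iterations, which is why caching translations pays off for repeated code and matters little for code that runs only once.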
Caching and Optimization

Translation cache is used more efficiently
A translation can be re-optimized as it keeps being executed; it will probably take more than one pass for it to be truly optimized
Optimization is done in steps
Sections of code that run only once usually don't get optimized
Code is recompiled quickly to keep the processor and program running
Uses common optimizations done by an ordinary compiler
The optimizer is basically a simple compiler
Optimization Strategies

The Code Morphing software has many ways to gather feedback about a running program
“Instrumented translation”
Special code is added to collect information about the block that is going to be executed
This information is later used for optimization and retranslation
Branch prediction, path speculation and the reordering of loads and stores are done by the Code Morphing layer, with some hardware support (alias detection) and some condition code
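A minimal sketch of the kind of feedback an instrumented translation could gather, assuming the instrumentation simply counts how often each conditional branch is taken. The `BranchProfile` class, the 0.5 threshold and the branch addresses are hypothetical, not Transmeta's actual mechanism.

```python
class BranchProfile:
    """Hypothetical counters an instrumented translation updates at each conditional branch."""

    def __init__(self):
        self.taken = {}
        self.total = {}

    def record(self, branch_addr, was_taken):
        self.total[branch_addr] = self.total.get(branch_addr, 0) + 1
        if was_taken:
            self.taken[branch_addr] = self.taken.get(branch_addr, 0) + 1

    def likely_taken(self, branch_addr, threshold=0.5):
        """Favoured direction for a later, optimized translation, or None if never observed."""
        n = self.total.get(branch_addr, 0)
        if n == 0:
            return None
        return self.taken.get(branch_addr, 0) / n > threshold

profile = BranchProfile()
for outcome in [True] * 9 + [False]:   # e.g. a loop-closing branch
    profile.record(0x40, outcome)
print(profile.likely_taken(0x40))      # True: speculate along the taken path
print(profile.likely_taken(0x80))      # None: no data yet
```

An optimizing retranslation could then lay out the favoured path first and speculate along it, relying on rollback when the guess is wrong.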



Filtering
Determines how much effort should be spent on translating and optimizing a piece of code
Execution modes
Interpretation, translation with or without optimization
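Filtering can be pictured as choosing an execution mode from how often a block has run. The thresholds below (`TRANSLATE_AFTER`, `OPTIMIZE_AFTER`) are hypothetical values chosen for illustration; the slides do not say what the real Code Morphing Software uses.

```python
from enum import Enum

class Mode(Enum):
    INTERPRET = "interpret"
    TRANSLATE = "translate"
    OPTIMIZE = "translate + optimize"

# Hypothetical thresholds, for illustration only.
TRANSLATE_AFTER = 2
OPTIMIZE_AFTER = 50

def choose_mode(exec_count: int) -> Mode:
    """Pick how much effort to spend on a block, given how many times it has run so far."""
    if exec_count < TRANSLATE_AFTER:
        return Mode.INTERPRET     # code seen once is rarely worth translating
    if exec_count < OPTIMIZE_AFTER:
        return Mode.TRANSLATE     # warm code: quick translation without optimization
    return Mode.OPTIMIZE          # hot code: spend effort on optimization

# Record where a block that runs 100 times changes mode.
transitions, prev = [], None
for n in range(1, 101):
    mode = choose_mode(n)
    if mode is not prev:
        transitions.append((n, mode.name))
        prev = mode
print(transitions)  # [(1, 'INTERPRET'), (2, 'TRANSLATE'), (50, 'OPTIMIZE')]
```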
Translation Example
X86 CODE
addl %eax, (%esp)
addl %ebx, (%esp)
movl %esi, (%ebp)
subl %ecx, 5
FRONTEND
ld %r30, [%esp]
add.c %eax, %eax, %r30
ld %r31, [%esp]
add.c %ebx, %ebx, %r31
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5
OPTIMIZER (the second load from [%esp] is removed, and only the last condition-code update is kept)
ld %r30, [%esp]
add %eax, %eax, %r30
add %ebx, %ebx, %r30
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5
SCHEDULER (each line is one molecule)
ld %r30, [%esp]; sub.c %ecx, %ecx, 5
ld %esi, [%ebp]; add %eax, %eax, %r30; add %ebx, %ebx, %r30
KEY
ld - load
movl - load (in this example)
addl - load and add
subl - subtract
add.c - add with condition codes set
sub.c - subtract with condition codes set
Example from reference #2
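The optimizer and scheduler steps in this example can be approximated with three small passes: redundant load elimination, dead condition-code elimination and greedy list scheduling into molecules. This is a sketch under stated assumptions, not Transmeta's implementation: `Op`, the pass names and the unit table are hypothetical, the block is assumed to contain no stores (the real system relies on alias hardware), temporaries such as %r31 are assumed dead after the block, and the flags are assumed live at block exit.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Op:
    name: str                   # "ld", "add" or "sub"
    dst: str
    srcs: Tuple[str, ...]       # registers read (or an immediate such as "5")
    addr: Optional[str] = None  # memory operand of a load, e.g. "[%esp]"
    cc: bool = False            # True if the op sets condition codes (".c")

    def __str__(self):
        name = self.name + (".c" if self.cc else "")
        if self.addr:
            return f"{name} {self.dst}, {self.addr}"
        return f"{name} {self.dst}, " + ", ".join(self.srcs)

    def writes(self):
        return {self.dst} | ({"cc"} if self.cc else set())

def eliminate_redundant_loads(ops):
    """Drop a load from an address already loaded, reusing the earlier register (no stores assumed)."""
    available, rename, out = {}, {}, []
    for op in ops:
        op = replace(op, srcs=tuple(rename.get(s, s) for s in op.srcs))
        if op.name == "ld" and op.addr in available:
            rename[op.dst] = available[op.addr]     # assumes op.dst is a temporary dead after the block
            continue
        for w in op.writes():                       # a write invalidates values involving that register
            available = {a: r for a, r in available.items() if r != w and w not in a}
            rename = {d: r for d, r in rename.items() if d != w and r != w}
        if op.name == "ld":
            available[op.addr] = op.dst
        out.append(op)
    return out

def drop_dead_condition_codes(ops):
    """Clear '.c' on ops whose flags are overwritten before use (nothing in the block reads flags)."""
    live, out = True, []
    for op in reversed(ops):
        if op.cc:
            if not live:
                op = replace(op, cc=False)
            live = False
        out.append(op)
    return out[::-1]

UNIT = {"ld": "LSU", "add": "ALU", "sub": "ALU"}
CAPACITY = {"LSU": 1, "ALU": 2}

def schedule(ops, width=4):
    """Greedy list scheduling: place each op in the first molecule after its dependences with a free unit."""
    placed, molecules = [], []
    for j, op in enumerate(ops):
        earliest = 0
        for i, prev in enumerate(ops[:j]):
            raw = prev.writes() & set(op.srcs)
            war = op.writes() & set(prev.srcs)
            waw = op.writes() & prev.writes()
            if raw or war or waw:
                earliest = max(earliest, placed[i] + 1)
        m, unit = earliest, UNIT[op.name]
        while True:
            if m == len(molecules):
                molecules.append([])
            used = sum(UNIT[o.name] == unit for o in molecules[m])
            if len(molecules[m]) < width and used < CAPACITY[unit]:
                break
            m += 1
        molecules[m].append(op)
        placed.append(m)
    return molecules

frontend = [
    Op("ld", "%r30", ("%esp",), addr="[%esp]"),
    Op("add", "%eax", ("%eax", "%r30"), cc=True),
    Op("ld", "%r31", ("%esp",), addr="[%esp]"),
    Op("add", "%ebx", ("%ebx", "%r31"), cc=True),
    Op("ld", "%esi", ("%ebp",), addr="[%ebp]"),
    Op("sub", "%ecx", ("%ecx", "5"), cc=True),
]
optimized = drop_dead_condition_codes(eliminate_redundant_loads(frontend))
for op in optimized:
    print(op)
for molecule in schedule(optimized):
    print("; ".join(str(op) for op in molecule))
```

The optimized list matches the OPTIMIZER column, and the schedule produces the same two molecules as the SCHEDULER column. Atoms inside a molecule are printed in program order, which differs from the slide's ordering but does not matter because atoms in one molecule are independent.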
Power Management

Typical power saving approaches
Switching off the processor
Duty cycling (alternately stopping and running the processor), which causes glitches
Changing the clock rate by suspending to and restarting from RAM

Crusoe power saving approaches
LongRun power management (next slide)
Integrates the north bridge of the chipset (the RAM controllers) onto the CPU core
Can also integrate video and sound hardware
Saves power in the overall system
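LongRun Power Management
Feature of the Code Morphing Software layer, driven by detected CPU load
Can adjust the clock frequency on the fly
Can dynamically change the CPU voltage
Can reduce power consumption by roughly 30% by lowering the clock rate by 10% (with the voltage lowered by 10% as well)

Dynamic power scales as $P \propto f V^2$, so the saving is $100\% \times (1 - 0.9 \times 0.9^2) = 100\% \times (1 - 0.729) \approx 27\%$, i.e. roughly 30%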
Fewer heat problems
No need for extra fans, which take up more power and space
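The LongRun arithmetic can be checked with the usual dynamic-power approximation $P \propto C V^2 f$, followed by a toy load-based governor. The `OPERATING_POINTS` table (frequency/voltage pairs) and the `pick_point` policy are hypothetical; the real LongRun tables and policy are not described in these slides.

```python
def relative_power(freq_scale: float, volt_scale: float) -> float:
    """Dynamic CMOS power scales roughly as C * V^2 * f; return P_new / P_old."""
    return freq_scale * volt_scale ** 2

savings = 1 - relative_power(0.9, 0.9)   # 10% lower clock and 10% lower voltage
print(f"{savings:.1%}")                   # 27.1%

# Hypothetical operating points (MHz, volts), slowest first.
OPERATING_POINTS = [(300, 1.20), (400, 1.25), (500, 1.40), (600, 1.50), (700, 1.60)]

def pick_point(load: float):
    """Choose the slowest point whose frequency covers the demanded load (0..1 of maximum)."""
    f_max = OPERATING_POINTS[-1][0]
    for freq, volt in OPERATING_POINTS:
        if freq >= load * f_max:
            return freq, volt
    return OPERATING_POINTS[-1]

print(pick_point(0.1))  # (300, 1.2)
print(pick_point(0.5))  # (400, 1.25)
print(pick_point(1.0))  # (700, 1.6)
```

Because voltage enters squared, stepping down both frequency and voltage saves far more power than lowering the clock alone (a 10% clock cut with unchanged voltage saves only 10%).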
Conclusion

Advantages
Low power consumption technology
Low cost
Longer battery life
Great for mobile users, embedded systems and even high-density servers
Smaller and lighter computers
Code Morphing technology
Can, in principle, emulate other target architectures
Compatibility
Uses special optimization techniques for target operating systems
Easier software debugging (see reference #1)
Cheaper and simpler upgrades
Conclusion (Continued)

Disadvantages
An emulation cannot be faster than the real thing
Code translation requires extra cycles
Code Morphing software runs in main memory and takes up memory bandwidth
Heavy coding effort
Inherits some of the same problems as other VLIW processors
Needs clever compilers for parallelism
Too much fixup code (for speculation, predictions, rollbacks, etc.)
The technology seems to be geared mainly toward mobile users
For desktops (power users) and servers, performance outweighs power consumption

Performance is a measure of power consumption
Final Thoughts

Transmeta reported net revenue of only $4.1 million for the first quarter of 2002
No significant share in the mobile industry
Even though Transmeta has a clever technology, the clock speeds of AMD and Intel processors have overshadowed its impact, as happened to Multiflow (their clock speeds are about 1.0 GHz higher than the Crusoe's)
AMD and Intel have also developed their own power-efficient mobile processors (mobile Athlon XP with AMD PowerNow!™ technology and Mobile Pentium 4 with Intel® SpeedStep® technology)

Stay tuned for the next exciting episode: "AMD, I am your father!" vs. "Not any more!!!"
References
1. http://www.hardwareanalysis.com/content/editorials/article/1237.4/
2. http://www.transmeta.com/pdf/white_papers/paper_aklaiber_19jan00.pdf
3. http://www.arstechnica.com/cpu/1q00/crusoe/crusoe-1.html
4. http://www.erc.msstate.edu/~reese/EE8063/html/transmeta/transmeta.pdf