Application Specific Low Power ALU Design

APPLICATION SPECIFIC LOW POWER ALU DESIGN
Yu Zhou and Hui Guo
By Nathan Windels
OUTLINE
Review Sources of Power Consumption
Review ALU Structures
Chain Structure Design and Proposed ALU Customization
The Test Setup
Results

POWER CONSUMPTION
Power consumption is a critical design issue in embedded processor designs.
Two types of power consumption, and ways to reduce each:
Dynamic: reduce switching capacitance, switching frequency, or supply voltage
Static: reduce circuit size or operating temperature, or increase transistor threshold voltage
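The dynamic levers listed above follow from the standard first-order relation for switching power, P = a * C * Vdd^2 * f. The sketch below is only an illustration of that relation; the function name and all numeric values are hypothetical and are not taken from the paper.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

# Hypothetical operating point: 20% activity, 1 nF switched, 1.2 V, 100 MHz.
base = dynamic_power(0.2, 1e-9, 1.2, 100e6)

# Halving the switching activity halves the dynamic power.
half_activity = dynamic_power(0.1, 1e-9, 1.2, 100e6)

# Halving the supply voltage cuts the dynamic power to one quarter.
half_vdd = dynamic_power(0.2, 1e-9, 0.6, 100e6)

print(half_activity / base, half_vdd / base)
```

Reordering the ALU chain targets the activity term: it changes how often parts of the circuit switch, not the voltage or the clock.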


Levels at Which Power Consumption Can Be Reduced
Semiconductor chip design level (transistor sizing, threshold voltage scaling)
Register transfer level (clock gating, power gating)
System level (dynamic voltage scaling)
Individual functional components of the processor (ALU customization)

ALU STRUCTURES




The top is a tree structure (faster, larger area).
The bottom is a chain structure (slower, smaller area).
In many applications, the ALU is not on the critical path of the processor, so the chain structure is often used.
ASIPMeister uses the chain structure to save area.
The idea of this paper is to customize the ALU by repositioning the elements in the chain structure.
Swapping the ‘add’ and ‘or’ components may favour some applications and can save a considerable amount of ALU power.
This approach to power reduction is almost cost-free and extremely simple to implement.

PROPOSED CHAIN STRUCTURE




There are n functional components, concatenated by 2-to-1 multiplexers.
O_i is the operational activity of component i.
O_muxj is the operational activity of multiplexer j.
Changing component positions will not affect O_i, but it will affect O_muxj.
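Why position changes the multiplexer activity but not the component activity can be sketched with a toy model. The model is an assumption made for illustration, not the paper's power model: a result counts as one unit of activity on every multiplexer it passes through on its way to the output. The function name, component names, and operation counts are all hypothetical.

```python
def mux_activity(order, counts):
    """Toy model of per-multiplexer activity in a chain ALU.

    order:  component names, farthest from the output first.
    counts: name -> number of operations that use that component.
    Returns the activity of each of the n-1 multiplexers, where
    multiplexer j merges the chain so far with component j+1.
    """
    n = len(order)
    activity = [0] * (n - 1)
    for pos, name in enumerate(order):
        # Components at positions 0 and 1 both enter the chain at mux 0.
        first_mux = max(pos, 1) - 1
        for j in range(first_mux, n - 1):
            activity[j] += counts[name]
    return activity

# Hypothetical operation counts per 100 ALU operations.
counts = {"add": 70, "or": 10, "and": 10, "shift": 10}

adder_far = mux_activity(["add", "or", "and", "shift"], counts)
adder_near = mux_activity(["or", "and", "shift", "add"], counts)
print(adder_far, sum(adder_far))    # [80, 90, 100] 270
print(adder_near, sum(adder_near))  # [20, 30, 100] 150
```

In both orderings each component is used equally often, so the component activities are unchanged; only the multiplexer totals differ. The last multiplexer always sees every result, which is why its activity is the same in both cases.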
ALU CUSTOMIZATION





We can customize the ALU design by identifying frequently used functional components and placing them close to the output.
In application-specific designs, the frequency of a functional component is obtained from instruction frequencies.
We can therefore partition the instruction set into ALU and non-ALU instructions. The ALU instructions can then be grouped according to the functional component they actually use.
Different power consumption weights can be assigned to different functional components.
The design places high-weight, high-frequency components closer to the output.
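The customization steps above can be sketched as a small script. Everything specific here is hypothetical: the instruction-to-component mapping, the profile counts, and the power weights are invented for illustration, and ranking by the product of weight and frequency is one plausible reading of "high weight and high frequency", not necessarily the paper's exact rule.

```python
# Hypothetical mapping; None marks a non-ALU instruction.
INSTR_TO_COMPONENT = {
    "add": "adder", "addi": "adder", "sub": "adder",
    "and": "and", "or": "or", "sll": "shifter",
    "j": None, "jr": None,
}

def component_frequencies(profile, instr_to_component):
    """Group per-instruction counts by the functional component used."""
    totals = {}
    for instr, count in profile.items():
        comp = instr_to_component.get(instr)
        if comp is not None:  # skip non-ALU instructions
            totals[comp] = totals.get(comp, 0) + count
    return totals

def chain_order(freqs, weights):
    """Return components farthest-from-output first, so the highest
    weight * frequency score ends up last (closest to the output)."""
    return sorted(freqs, key=lambda c: freqs[c] * weights.get(c, 1.0))

# Hypothetical profile counts and relative power weights.
profile = {"add": 500, "addi": 300, "sub": 50, "and": 40,
           "or": 60, "sll": 35, "j": 120, "jr": 20}
weights = {"adder": 3.0, "shifter": 2.0, "and": 1.0, "or": 1.0}

freqs = component_frequencies(profile, INSTR_TO_COMPONENT)
order = chain_order(freqs, weights)
print(freqs)
print(order)  # the last entry is placed next to the output
```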
ALU TEST SETUP
Created a simple ALU in VHDL.
The design was synthesized with Synopsys Design Compiler, based on the ts11fs120 library.
Power consumption was estimated with Synopsys PrimePower.

ALU TEST SETUP (2)
A reduced ALU with 4 functions was used to explore all possible placements and verify the effectiveness of this approach.
The adder has the longest delay of the components, so positioning it next to the output in the chain reduces the overall delay.
Power always reaches its minimum when the related functional component is placed closest to the output.
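With only 4 functions there are 4! = 24 possible placements, so an exhaustive search is cheap. The sketch below enumerates them against a toy cost, assumed here for illustration only: each operation contributes one unit of activity per multiplexer its result passes through. The four component names and their counts are hypothetical; the paper's results came from synthesis and power estimation, not from a model like this.

```python
from itertools import permutations

def total_mux_activity(order, counts):
    """Toy cost for a chain ordered farthest-from-output first.

    A component at position pos passes through n - max(pos, 1)
    multiplexers before reaching the output.
    """
    n = len(order)
    return sum(counts[name] * (n - max(pos, 1))
               for pos, name in enumerate(order))

# Hypothetical operation counts per 100 ALU operations.
counts = {"add": 70, "or": 10, "and": 10, "shift": 10}

placements = list(permutations(counts))
best = min(placements, key=lambda o: total_mux_activity(o, counts))
worst = max(placements, key=lambda o: total_mux_activity(o, counts))

print(len(placements))  # 24
print(best, total_mux_activity(best, counts))
print(worst, total_mux_activity(worst, counts))
```

Under these assumed counts the cheapest placement puts the most frequently used component last in the chain, next to the output, which matches the trend reported on this slide.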

FULL PROCESSOR TEST SETUP




The processor design for a given application was automatically generated.
The target processor instruction set was the Portable Instruction Set Architecture (PISA).
The VHDL model was automatically generated by ASIPMeister.
SimpleScalar was used to compile the application program and to profile the program execution.
ALU OPERATION FREQUENCY
We can see from this graph that addition has the highest frequency for all designs, so the adder is placed closest to the output.
This data was obtained using the SimpleScalar profiler.

ALU TEST RESULTS
ALU TEST RESULTS (2)

The CPU clock time remains unchanged throughout all the designs, demonstrating that the ALU is not on the critical path.
CONCLUSION
The order of functional components in the chain affects power consumption; therefore, the most frequently used component should be placed close to the output.
This change is easy to make: simply swap the order of the ALU operations in the if-then-else statement in the HDL code.
This approach may be applicable to other designs with a similar chain structure (e.g., floating-point ALUs).
