APPLICATION SPECIFIC LOW POWER ALU DESIGN
Yu Zhou and Hui Guo
By Nathan Windels

OUTLINE
Review Sources of Power Consumption
Review ALU Structures
Chain Structure Design and Proposed ALU Customization
The Test Setup
Results

POWER CONSUMPTION
Power consumption is a critical design issue in embedded processor designs.
Two types of power consumption:
  Dynamic: reduce switching capacitance, switching frequency, or supply voltage
  Static: reduce circuit size or operating temperature, or increase transistor threshold voltage
Areas where power consumption can be addressed:
  Semiconductor chip design level (transistor sizing, threshold voltage scaling)
  Register transfer level (clock gating, power gating)
  System level (dynamic voltage scaling)
  Individual functional components of the processor (ALU customization)

ALU STRUCTURES
The top is a tree structure (faster, larger area).
The bottom is a chain structure (slower, smaller area).
In many applications the ALU is not on the critical path of the processor, so the chain structure is often used.
ASIPMeister uses the chain structure to save area.

The idea of this paper is to customize the ALU by repositioning the components in the chain structure.
Swapping the 'add' and 'or' components may favour some applications and can save a considerable amount of ALU power.
This approach to power reduction costs almost nothing and is simple to implement.

PROPOSED CHAIN STRUCTURE
There are n functional components, concatenated by 2-to-1 multiplexers.
O_i is the operational activity of component i.
O_muxj is the operational activity of multiplexer j.
Changing the component positions does not affect O_i, but it does affect O_muxj.

ALU CUSTOMIZATION
We can customize the ALU design by identifying frequently used functional components and placing them close to the output.
In application-specific designs, the frequency of use of a functional component is obtained from instruction frequencies.
We can therefore partition the instruction set into ALU and non-ALU instructions.
The ALU instructions can then be grouped according to the functional component they actually use.
Different power-consumption weights can be assigned to different functional components.
Design the chain so that high-weight, high-frequency components are placed closer to the output.

ALU TEST SETUP
A simple ALU was created in VHDL.
The design was synthesized with Synopsys Design Compiler, based on the ts11fs120 library.
Power consumption was estimated with Synopsys PrimePower.

ALU TEST SETUP (2)
A reduced ALU with 4 functions was used to explore all possible placements and verify the effectiveness of this approach.
The adder has the longest delay, so placing it next to the output in the chain reduces the overall delay.
Power always reaches its minimum when the corresponding functional component is placed closest to the output.

FULL PROCESSOR TEST SETUP
The processor design for a given application was generated automatically.
The target instruction set was the Portable Instruction Set Architecture (PISA).
The VHDL model was generated automatically by ASIPMeister.
SimpleScalar was used to compile the application program and to profile its execution.

ALU OPERATION FREQUENCY
The graph shows that addition has the highest frequency for all designs, so the adder is placed closest to the output.
This table was obtained using the SimpleScalar profiler.

ALU TEST RESULTS

ALU TEST RESULTS (2)
The CPU clock time remains unchanged across all the designs, demonstrating that the ALU is not on the critical path.

CONCLUSION
The order of functional components in the chain affects the power consumption; therefore, the most frequently used component should be placed close to the output.
This change is easy to make: just swap the order of the ALU operations in the if-then-else statement in the HDL code.
This approach may be applicable to other designs with a similar chain structure (floating-point ALUs).
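
ILLUSTRATIVE SKETCH (NOT FROM THE PAPER)
The placement rule can be tried out with a small Python model. This is only a sketch under stated assumptions: the chain is taken to be n components joined by n - 1 two-input multiplexers, the cost of an ordering is taken to be the weighted number of multiplexer stages that results pass through, and the operation names, counts, and weights are hypothetical values chosen for illustration. The function names (mux_stages, chain_cost, and so on) are likewise invented here. The sketch compares an exhaustive search over all placements, in the spirit of the reduced 4-function experiment, with simply sorting components so that the highest score sits next to the output.

```python
from itertools import permutations


def mux_stages(position, n):
    """Number of 2-to-1 multiplexers a component's result passes through.

    position is 0-based and counted from the end of the chain farthest from
    the output, so position n - 1 is the component closest to the output.
    Assumed chain: the first multiplexer merges components 0 and 1, and each
    later multiplexer merges the previous multiplexer's output with the next
    component.
    """
    if n < 2:
        return 0
    return n - max(position, 1)


def chain_cost(order, score):
    """Toy cost: sum of score * multiplexer stages traversed, per component."""
    n = len(order)
    return sum(score[name] * mux_stages(pos, n) for pos, name in enumerate(order))


def best_order_exhaustive(score):
    """Try every placement and return one with the lowest toy cost."""
    return min(permutations(sorted(score)), key=lambda order: chain_cost(order, score))


def greedy_order(score):
    """Sort by ascending score so the highest-score component is last (nearest the output)."""
    return tuple(sorted(score, key=score.get))


# Hypothetical profile: operation counts and relative power weights.
counts = {"add": 600, "or": 100, "and": 200, "xor": 100}
weights = {"add": 3, "or": 1, "and": 1, "xor": 1}
score = {name: counts[name] * weights[name] for name in counts}

exhaustive = best_order_exhaustive(score)
greedy = greedy_order(score)
print("exhaustive:", exhaustive, chain_cost(exhaustive, score))
print("greedy:    ", greedy, chain_cost(greedy, score))
```

Under this toy model, sorting by score reaches the same cost as the exhaustive search, because a component nearer the output never traverses more multiplexers than one farther away. Real power figures would still need synthesis and power estimation, as in the test setup described above.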