Take Home Midterm #2 and Final-Project #2: POWER CONSUMPTION,
PROCESS MISMATCH, and Logic Design
OUT: Feb. 18;
IN: Mar. 17 (complete report of Midterm-#2 and Final-Project)
DONE IN GROUPS!!!!
I acknowledge that I did this work on my own, and did not copy or plagiarize the
information from any other people or sources. I sign this to be true, and my
signature with my name means I did this work honorably.
NAME: _______________________________
SIGNATURE:__________________________
DATE:_________________________________
Optimal Power Consumption vs. Process Sensitivity/Mismatch
Everything we’ve done so far assumes we want the shortest delay. However, for some
applications, we want the minimum power consumed and can accept the maximum
delay that the real-time requirement allows. This is true for ultra-low-power
portable electronics, e.g. sensor networks.
Therefore, as the CTO of an IoT sensor-networking startup, you need to
understand the power consumption values for an XOR-2:
>> http://www.ambiqmicro.com/
>> http://www.ambiqmicro.com/spot-platform
Assume the XOR-2 characterization is general enough that you can generalize your
results for any logic gate in any technology. Simulate only in simple schematic form.
Assume minimum-type sizing for the XOR-2 gate. Automate this as much as possible
through scripting (e.g. Python or Perl), so you don’t waste too much time. Do this
only for the TT corner.
AUTOMATION SCRIPT (Tom Ruggeri, Feb. 2011): Here is a script, written by
graduate student Tom Ruggeri, that automates this simulation across multiple
processes. It was originally written for a D-FF, so you’ll have to adapt it to work for
the XOR-2 you have the schematic for.
one example on website: jace_akerland_mt2_auto_script(2).tar.gz
(Energy dissipated is: Vdd * integral of I dt, where I is the supply current)
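The energy measurement can be sketched numerically. This is a minimal sketch, assuming the energy drawn from the supply is Vdd times the integral of the supply current, and that the simulator exports the current as time/value samples. The function name `supply_energy` and the triangular test waveform are hypothetical, not part of the provided script.

```python
def supply_energy(times, currents, vdd):
    """Trapezoidal estimate of E = Vdd * integral(I dt), in joules."""
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    return vdd * charge

# Hypothetical waveform: triangular pulse 0 -> 100 uA -> 0 over 2 ns, Vdd = 1.0 V
t = [0.0, 1e-9, 2e-9]
i = [0.0, 100e-6, 0.0]
energy = supply_energy(t, i, 1.0)
print(f"Energy drawn from supply: {energy:.3e} J")
```

In practice the samples would come from the supply-current waveform of one switching event, and the integration window should cover the full transition.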
Problem #1: Energy-Delay Product for Scaled Vdd Technologies with NO
Process Mismatch
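Process / Vdd | Tplh | Tphl | Td(avg) | I(static) | Energy(static) | Energy(dynamic) | Energy(TOT)/computation
0.25u (2.5V)  |      |      |         |           |                |                 |
  2.5V        |      |      |         |           |                |                 |
  1.0V        |      |      |         |           |                |                 |
  0.6V        |      |      |         |           |                |                 |
  0.5V        |      |      |         |           |                |                 |
  0.3V        |      |      |         |           |                |                 |
65nm (1V)     |      |      |         |           |                |                 |
  1.0V        |      |      |         |           |                |                 |
  0.7V        |      |      |         |           |                |                 |
  0.5V        |      |      |         |           |                |                 |
  0.4V        |      |      |         |           |                |                 |
  0.3V        |      |      |         |           |                |                 |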
1) PLOT: 1) DELAY vs. VDD for each process; 2) ENERGY/COMPUTATION vs.
VDD for each process; 3) ENERGY*DELAY Product vs. VDD for each process
(EDP on Y-Axis; VDD and process node on X-Axis)
NOTE: ENERGY(TOT) is the total energy consumed (static AND dynamic) in
any one clock period. This is equivalent to ENERGY consumed /
computation. YOU NEED TO FIND STATIC ENERGY CORRECTLY, and also
what the cycle time is.
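The bookkeeping in the note above can be sketched as follows. This is only an illustration under stated assumptions: static energy is taken as Vdd * I(static) * cycle time, and the cycle time is set to a hypothetical 20 gate delays. All numbers are made up, and determining the correct cycle time remains part of the problem.

```python
def energy_per_computation(vdd, i_static, e_dynamic, t_cycle):
    """Return (static energy, total energy) consumed in one clock period."""
    e_static = vdd * i_static * t_cycle   # leakage integrated over the cycle
    return e_static, e_static + e_dynamic

def energy_delay_product(e_total, t_delay):
    """EDP in joule-seconds."""
    return e_total * t_delay

# Hypothetical values for one table row (not simulation results)
vdd = 0.5            # V
i_static = 10e-9     # A
e_dynamic = 2e-15    # J per switching event
td_avg = 1e-9        # s, average of Tplh and Tphl
t_cycle = 20 * td_avg  # ASSUMPTION: cycle time of 20 gate delays

e_static, e_total = energy_per_computation(vdd, i_static, e_dynamic, t_cycle)
edp = energy_delay_product(e_total, td_avg)
print(f"E_static={e_static:.3e} J, E_tot={e_total:.3e} J, EDP={edp:.3e} J*s")
```

Because the cycle time grows as Vdd drops, the static term grows relative to the dynamic term at low supply voltages, which is why it has to be computed carefully.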
Process Variation
In any “real design”, devices (transistors, capacitances, and resistances) do not
behave exactly as drawn. In practice, their parameters vary statistically, which
makes the delay difficult to predict. While traditionally we’d like to use a
Statistical Methodology (Monte Carlo) for variation simulation/prediction, this is
difficult. A simpler way is to implement a worst-case simulation for the
variation. There are two offsets possible in a process: 1) Vt variation; 2) channel
length variation. Vt variation is characterized as: Sigma(Vt) = Avt / sqrt(W*L),
where Avt ~ 2 mV*um in general for most technologies (Pelgrom’s mismatch
model).
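As a quick numeric illustration of the mismatch formula above, here is a minimal sketch assuming Avt is expressed in mV*um and W, L in um, so Sigma(Vt) comes out in mV. The device dimensions below are hypothetical and should be replaced by the actual minimum sizes in each process.

```python
import math

def sigma_vt_mv(avt_mv_um, w_um, l_um):
    """Pelgrom mismatch: Sigma(Vt) = Avt / sqrt(W * L), result in mV."""
    return avt_mv_um / math.sqrt(w_um * l_um)

AVT = 2.0  # mV*um, value quoted in the text

# Hypothetical minimum-size devices (W, L in um); not taken from a PDK
sigma_250 = sigma_vt_mv(AVT, w_um=0.5, l_um=0.25)
sigma_65 = sigma_vt_mv(AVT, w_um=0.13, l_um=0.065)

print(f"0.25u device: Sigma(Vt) = {sigma_250:.2f} mV")
print(f"65nm device:  Sigma(Vt) = {sigma_65:.2f} mV")
```

Smaller gate area gives a larger Sigma(Vt), so the scaled process is expected to show the larger worst-case Vt shift.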
For this part, still use your XOR-2 gate, but create the worst possible Tplh and Tphl
conditions. For example, the worst-case Tphl is when: PMOS has smaller Vt; NMOS
has larger Vt; process corner is Fast PMOS, Slow NMOS. The converse case (for
Tplh) is also true, and needs to be simulated. (NOTE: Ignore the 10% gate length
variation for this portion, to simplify your simulations. Just make a sigma-Vth
increase (worst-case) in both the NMOS and PMOS of the XOR-2 gate, assuming W/L
scaling across process technology.)
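The worst-case conditions described above can be tabulated before they are written into a simulation deck. This is a rough sketch: the function name, the dictionary keys, and the nominal Vt values are all hypothetical, and PMOS thresholds are handled as magnitudes.

```python
def worst_case_vt(vtn, vtp_mag, sigma):
    """Return Vt settings (volts) for each worst-case condition.

    vtn     : nominal NMOS threshold
    vtp_mag : nominal PMOS threshold magnitude
    sigma   : Sigma(Vt) shift to apply
    """
    return {
        # Slow pull-down, strong pull-up: worst high-to-low delay
        "worst_tphl": {"vtn": vtn + sigma, "vtp_mag": vtp_mag - sigma},
        # Converse case: worst low-to-high delay
        "worst_tplh": {"vtn": vtn - sigma, "vtp_mag": vtp_mag + sigma},
        # Simplified case from the NOTE: both thresholds increased
        "both_increased": {"vtn": vtn + sigma, "vtp_mag": vtp_mag + sigma},
    }

# Hypothetical nominal thresholds and shift (not from any model file)
cases = worst_case_vt(vtn=0.40, vtp_mag=0.45, sigma=0.02)
for name, vts in cases.items():
    print(name, vts)
```

How the shifted thresholds are applied (for example through a model parameter offset) depends on the simulator and model files in use.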
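Problem #2: Energy-Delay Products for Scaled Vdd Technologies with WORST-CASE Process Mismatch
Process / Vdd | Tplh | Tphl | Td(avg) | I(static) | Energy(static) | Energy(dynamic) | Energy(TOT)/computation
0.25u (2.5V)  |      |      |         |           |                |                 |
  2.5V        |      |      |         |           |                |                 |
  1.0V        |      |      |         |           |                |                 |
  0.6V        |      |      |         |           |                |                 |
  0.5V        |      |      |         |           |                |                 |
  0.3V        |      |      |         |           |                |                 |
65nm (1V)     |      |      |         |           |                |                 |
  1.0V        |      |      |         |           |                |                 |
  0.7V        |      |      |         |           |                |                 |
  0.5V        |      |      |         |           |                |                 |
  0.4V        |      |      |         |           |                |                 |
  0.3V        |      |      |         |           |                |                 |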
1) PLOT: 1) DELAY vs. process; 2) ENERGY(TOT) vs. process; 3)
ENERGY(TOT)*Delay Product vs. process (EDP on Y-Axis; process node on X-Axis) (SAME AS PROBLEM-#1)
Problem 3: Qualitative Discussion (short answers)
1) How does the delay vs. power dissipation scale as Vdd is lowered? At what
point can we stop scaling the supply voltage Vdd? Is there a limit? (HINT:
Look at the leakage)
2) Vdd scaling is a great way to reduce power consumption, if the increase in
delay can be tolerated. However, how does the delay vary between: a) the
no-process-mismatch case; b) the worst-case process-mismatch case?
3) While energy/computation seems to be reducing with reduced VDD, another
important metric is energy/computation * delay (or, energy-delay product).
How is the Energy-Delay Product scaling? Are we seeing any benefit for low-VDD operation as we continue reducing VDD? Why or why not?
Problem 4: 2-Page Skim Paper
1) Ultralow-voltage, minimum-energy CMOS
http://blaauw.eecs.umich.edu/getFile.php?id=247
2) Sub-threshold Sensor Network Processor
http://blaauw.eecs.umich.edu/getFile.php?id=263
3) Razor-I paper
http://blaauw.eecs.umich.edu/getFile.php?id=25
Read the three papers above, and write a 1-page synopsis summarizing low-voltage digital logic design. NOTE: I DO READ AND GRADE THESE CAREFULLY, TO
DETERMINE UNDERSTANDING OF THIS MATERIAL. PLEASE DO WRITE WELL.
Problem 4b: Pipelined 16b Adder with RC clock loading
GOAL: Design a 16-bit adder, with pipelined stages, and significant CLK skew.
[PART-A]: 16b-adder with LARGE CLK Wire Skew
Unfortunately, the clock routing tree has significant delay-skew. Assume the clock is
routed backwards, from back-to-front. (The CLK wire model is below.)
1) Step-1: Design a simple D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4-bits.
In your final report, please estimate the leakage power, active power (LOGIC and D-FFs), and maximum clock frequency (assuming worst-case delay), for a LONG RC-wire for the clock path, from back-to-front.
WIRE MODEL of CLK:
A) L=4mm, W=0.1um
B) Resistivity ρ=2.7e-8 (Ω-m)
C) Capacitance=10fF/um
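A first-order estimate of the unbuffered clock-wire delay can be sketched from the wire model above. The handout does not give a metal thickness, so the 0.2 um used below is purely an assumption, and the 0.38*R*C factor is the usual 50% delay approximation for a distributed RC line.

```python
RHO = 2.7e-8          # resistivity from the wire model, ohm*m
LENGTH_M = 4e-3       # L = 4 mm
WIDTH_M = 0.1e-6      # W = 0.1 um
CAP_PER_UM = 10e-15   # 10 fF/um
THICKNESS_M = 0.2e-6  # ASSUMPTION: metal thickness is not given in the handout

def wire_resistance(rho, length, width, thickness):
    """R = rho * L / (W * t), in ohms."""
    return rho * length / (width * thickness)

def wire_capacitance(cap_per_um, length_m):
    """Total capacitance in farads for a per-micron capacitance."""
    return cap_per_um * (length_m * 1e6)

r_wire = wire_resistance(RHO, LENGTH_M, WIDTH_M, THICKNESS_M)
c_wire = wire_capacitance(CAP_PER_UM, LENGTH_M)
t_wire = 0.38 * r_wire * c_wire   # distributed-RC 50% delay estimate

print(f"R = {r_wire:.0f} ohm, C = {c_wire * 1e12:.1f} pF, delay ~ {t_wire * 1e9:.1f} ns")
```

With these assumed numbers the end-to-end wire delay is tens of nanoseconds, which is the skew between the last and first D-FF that Part A has to account for.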
[PART-B]: 16b-adder with INV-Buffers Inserted Into CLK Wire
In order to minimize the clock skew, inverter buffers are inserted into this long CLK
wire. Insert INV buffers into this long clock, to break up the CLK delay.
1) Step-1: Design a D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4-bits.
In your final report, please estimate the leakage power, active power (LOGIC and D-FFs), and maximum clock frequency (assuming worst-case delay), for this ‘optimal’
inverter-buffer repeater insertion.
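The effect of repeater insertion can be explored with a deliberately simplified model: the wire is split into N equal segments, each contributing a distributed-RC delay of 0.38*(R/N)*(C/N) plus a fixed buffer delay. The wire resistance (which depends on an assumed metal thickness) and the 50 ps buffer delay are hypothetical values, and the model ignores the buffer's output resistance and input capacitance.

```python
R_WIRE = 5400.0   # ohm, ASSUMPTION (depends on an assumed 0.2 um metal thickness)
C_WIRE = 40e-12   # F, 4 mm at 10 fF/um
T_BUF = 50e-12    # s, ASSUMPTION: fixed delay per inverter buffer

def buffered_delay(n_segments, r_wire=R_WIRE, c_wire=C_WIRE, t_buf=T_BUF):
    """Total delay of a wire split into n equal buffered segments."""
    seg_delay = 0.38 * (r_wire / n_segments) * (c_wire / n_segments)
    return n_segments * (seg_delay + t_buf)

# Brute-force search for the segment count with the smallest total delay
best_n = min(range(1, 200), key=buffered_delay)

# Continuous optimum of 0.38*R*C/N + N*t_buf, for comparison
n_continuous = (0.38 * R_WIRE * C_WIRE / T_BUF) ** 0.5

print(f"unbuffered: {buffered_delay(1) * 1e9:.2f} ns")
print(f"best N = {best_n}, delay = {buffered_delay(best_n) * 1e9:.2f} ns")
```

A fuller treatment would size the inverters and include their drive resistance and gate capacitance, which changes both the optimal segment count and the power estimate.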
EXTRA CREDIT-1: Scale it to 32nm-CMOS, and show the changes in leakage power,
active power, maximum clock frequency, area, and noise margins.
EXTRA CREDIT-2: Add two power gates to your entire 16bx16b SRAM:
a) PMOS power-gating switch, for leakage current reduction
Size this PMOS header to achieve less than 10% degradation in total delay
(compared with NO PMOS header).
Problem-5: Grad students ONLY
Security idea. (PUF)