Take-Home Midterm #2 and Final-Project #2: POWER CONSUMPTION, PROCESS MISMATCH, and Logic Design

OUT: Feb. 18; IN: Mar. 17 (complete report of Midterm #2 and Final-Project)
DONE IN GROUPS!

I acknowledge that I did this work on my own, and did not copy or plagiarize the information from any other people or sources. I sign this to be true, and my signature with my name means I did this work honorably.

NAME: _______________________________
SIGNATURE: __________________________
DATE: _________________________________

Optimal Power Consumption vs. Process Sensitivity/Mismatch

Everything we’ve done so far assumes we want the shortest delay. However, some applications can accept the maximum tolerable delay in exchange for the minimum power consumed. This is a real-time requirement. It holds for ultra-low-power, portable electronics, e.g. sensor networks.

Therefore, as the CTO of an IoT sensor-networking startup, you need to understand the power consumption values for an XOR-2:
>> http://www.ambiqmicro.com/
>> http://www.ambiqmicro.com/spot-platform

Assume the XOR-2 characterization is general enough that you can generalize your results to any logic gate in any technology. Simulate only in simple schematic form. Assume minimum-type sizing for the XOR-2 gate. Automate this as much as possible through scripting (e.g. Python, PERL), so you don’t waste too much time. Do this only for the TT corner.

AUTOMATION SCRIPT (Tom Ruggeri, Feb. 2011):
Here is a script, written by graduate student Tom Ruggeri, that automates this simulation across multiple processes. It was originally written for a D-FF, so you’ll have to adapt it to work for the XOR-2 you have the schematic for.
One example on website: jace_akerland_mt2_auto_script(2).tar.gz

(Energy dissipated is the supply voltage times the integral of the supply current: E = Vdd * integral of I dt)

Problem #1: Energy-Delay Product for Scaled Vdd Technologies with NO Process Mismatch

Process        Vdd    Tplh   Tphl   Td(avg)   I(static)   Energy(static)   Energy(dynamic)   Energy(TOT)/computation
0.25u (2.5V)   2.5V
               1.0V
               0.6V
               0.5V
               0.3V
65nm (1V)      1.0V
               0.7V
               0.5V
               0.4V
               0.3V

1) PLOT:
   1) DELAY vs. VDD for each process
   2) ENERGY/COMPUTATION vs. VDD for each process
   3) ENERGY*DELAY Product vs. VDD for each process (EDP on Y-axis; VDD and process node on X-axis)

NOTE: ENERGY(TOT) is the total energy consumed (static AND dynamic) in any one clock period. This is equivalent to ENERGY consumed / computation. YOU NEED TO FIND STATIC ENERGY CORRECTLY, and also what the cycle time is.

Process Variation

In any “real design”, transistors, capacitances, and resistances do not match their nominal values. In practice, they vary statistically, which makes the delay extremely difficult to predict. Traditionally we would use a statistical methodology (Monte Carlo) to simulate and predict the variation, but this is difficult. A simpler way is to run a worst-case simulation of the variation.

There are two offsets possible in a process: 1) Vt variation; 2) channel length variation. Vt variation is characterized as:

Sigma(Vt) = Avt / sqrt(W*L), where Avt ~ 2 mV·um in general for most technologies, with W and L in um. (Pelgrom’s mismatch model)

For this part, still use your XOR-2 gate, but create the worst possible Tplh and Tphl conditions. For example, the worst-case Tphl occurs when: PMOS has smaller Vt; NMOS has larger Vt; process corner is Fast PMOS; Slow NMOS. The converse case (for Tplh) is also true, and needs to be simulated.

(NOTE: Ignore the 10% gate length variation for this portion, to simplify your simulations. Just apply a one-sigma Vth shift (worst-case) to both the NMOS and PMOS of the XOR-2 gate, assuming W/L scaling across process technology.)
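The quantities behind the Problem #1 and Problem #2 tables can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course script: the function names, the example W/L values, the sample current waveform, and the cycle time are all hypothetical placeholders. Replace them with your own simulator output.

```python
import math

def sigma_vt_mv(a_vt_mv_um, w_um, l_um):
    """Pelgrom mismatch: sigma(Vt) = Avt / sqrt(W * L), result in mV."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

def supply_energy(times_s, currents_a, vdd):
    """Energy drawn from the supply: Vdd * integral of I dt (trapezoidal rule)."""
    charge = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        charge += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return vdd * charge

def energy_per_computation(e_dynamic_j, i_static_a, vdd, t_cycle_s):
    """Total energy in one clock period: static (leakage) plus dynamic."""
    e_static = i_static_a * vdd * t_cycle_s
    return e_static + e_dynamic_j

def energy_delay_product(e_total_j, t_delay_s):
    """EDP = energy per computation times average propagation delay."""
    return e_total_j * t_delay_s

# Hypothetical device size (ASSUMPTION): W = 0.5 um, L = 0.25 um, Avt = 2 mV*um.
shift_mv = sigma_vt_mv(2.0, 0.5, 0.25)

# Hypothetical triangular supply-current pulse (ASSUMPTION): 1 mA peak over 2 ns.
e_dyn = supply_energy([0.0, 1e-9, 2e-9], [0.0, 1e-3, 0.0], vdd=1.0)

# Hypothetical leakage of 1 nA and cycle time of 10 ns (ASSUMPTIONS).
e_tot = energy_per_computation(e_dyn, i_static_a=1e-9, vdd=1.0, t_cycle_s=1e-8)
edp = energy_delay_product(e_tot, t_delay_s=1e-10)

print(f"sigma(Vt) = {shift_mv:.2f} mV")
print(f"E(dynamic) = {e_dyn:.3e} J, E(TOT) = {e_tot:.3e} J, EDP = {edp:.3e} J*s")
```

The static term depends on the cycle time you choose, which is why the NOTE above asks you to determine it carefully: at low Vdd the period grows and leakage is integrated over a longer window.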
Problem #2: Energy-Delay Products for Scaled Vdd Technologies with WORST-CASE Process Mismatch

Process        Vdd    Tplh   Tphl   Td(avg)   I(static)   Energy(static)   Energy(dynamic)   Energy(TOT)/computation
0.25u (2.5V)   2.5V
               1.0V
               0.6V
               0.5V
               0.3V
65nm (1V)      1.0V
               0.7V
               0.5V
               0.4V
               0.3V

1) PLOT:
   1) DELAY vs. process
   2) ENERGY(TOT) vs. process
   3) ENERGY(TOT)*Delay Product vs. process (EDP on Y-axis; process node on X-axis)
   (SAME AS PROBLEM #1)

Problem 3: Qualitative Discussion (short answers)

1) How do delay and power dissipation scale as Vdd is lowered? At what point can we stop scaling the supply voltage Vdd? Is there a limit? (HINT: Look at the leakage)
2) Vdd scaling is a great way to reduce power consumption, if the increase in delay can be tolerated. However, how does the delay vary between: a) the no-process-mismatch case; b) the worst-case process-mismatch case?
3) While energy/computation seems to decrease with reduced VDD, another important metric is energy/computation * delay (the energy-delay product). How is the Energy-Delay Product scaling? Are we seeing any benefit from low-VDD operation as we continue reducing VDD? Why or why not?

Problem 4: 2-Page Skim Paper

1) Ultralow-voltage, minimum-energy CMOS
   http://blaauw.eecs.umich.edu/getFile.php?id=247
2) Sub-threshold Sensor Network Processor
   http://blaauw.eecs.umich.edu/getFile.php?id=263
3) Razor-I paper
   http://blaauw.eecs.umich.edu/getFile.php?id=25

Read the three papers above, and write a 1-page synopsis (summary) of low-voltage digital logic design.
NOTE: I DO READ AND GRADE THESE CAREFULLY, TO DETERMINE UNDERSTANDING OF THIS MATERIAL. PLEASE DO WRITE WELL.

Problem 4b: Pipelined 16b Adder with RC clock loading

GOAL: Design a 16-bit adder with pipelined stages and significant CLK skew.

[PART-A]: 16b-adder with LARGE CLK Wire Skew

Unfortunately, the clock routing tree has significant delay skew. Assume the clock is routed backwards, from back to front. (The CLK wire model is below.)
1) Step-1: Design a simple D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4 bits.

In your final report, please estimate the leakage power, active power (LOGIC and DFFs), and maximum clock frequency (assuming worst-case delay), for a LONG RC wire for the clock path, routed from back to front.

WIRE MODEL of CLK:
A) L=4mm, W=0.1um
B) Resistivity ρ=2.7e-8 (Ω-m)
C) Capacitance=10fF/um

[PART-B]: 16b-adder with INV-Buffers Inserted Into CLK Wire

To minimize the clock skew, insert inverter buffers into this long CLK wire to break up the CLK delay.

1) Step-1: Design a D-FF (master-slave with pass-gates and inverters)
2) Step-2: Add D-FFs at the beginning/end of the 16b-adder.
3) Step-3: Add D-FFs every 4 bits.

In your final report, please estimate the leakage power, active power (LOGIC and DFFs), and maximum clock frequency (assuming worst-case delay), for this ‘optimal’ inverter-buffer repeater insertion.

EXTRA CREDIT-1: Scale it to 32nm-CMOS, and show the changes in leakage power, active power, maximum clock frequency, area, and noise margins.

EXTRA CREDIT-2: Add two power gates to your entire 16bx16b SRAM:
a) PMOS power-gating switch, for leakage current reduction. Size this PMOS header so that the total delay degrades by less than 10% (compared with NO PMOS header).

Problem-5: Grad students ONLY

Security idea. (PUF)
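
For the CLK wire model of Problem 4b, the lumped wire totals and a first-order delay can be estimated as in the minimal sketch below. Several things here are assumptions rather than part of the handout: the wire model lists no metal thickness, so the 0.2 um value is a placeholder; the function names are hypothetical; and segmented_delay with its 50 ps buffer delay is a deliberately crude model that ignores buffer drive resistance and input capacitance. The 0.38*R*C term is the usual approximation for the 50% delay of a distributed RC line.

```python
# Values from the WIRE MODEL of CLK in Problem 4b.
RHO_OHM_M = 2.7e-8        # resistivity
LENGTH_M = 4e-3           # L = 4 mm
WIDTH_M = 0.1e-6          # W = 0.1 um
CAP_PER_UM_F = 10e-15     # 10 fF/um

# ASSUMPTION: metal thickness is not given in the handout.
THICKNESS_M = 0.2e-6

def wire_resistance(rho, length_m, width_m, thickness_m):
    """Total wire resistance: R = rho * L / (W * t)."""
    return rho * length_m / (width_m * thickness_m)

def wire_capacitance(cap_per_um_f, length_m):
    """Total wire capacitance from a per-micron value."""
    return cap_per_um_f * (length_m * 1e6)

def distributed_rc_delay(r_total, c_total):
    """Approximate 50% delay of a distributed RC line: 0.38 * R * C."""
    return 0.38 * r_total * c_total

def segmented_delay(r_total, c_total, n_segments, t_buffer_s):
    """Crude repeater model: n equal wire segments, each followed by a
    buffer with a fixed delay (hypothetical; ignores buffer loading)."""
    seg = distributed_rc_delay(r_total / n_segments, c_total / n_segments)
    return n_segments * (seg + t_buffer_s)

r_wire = wire_resistance(RHO_OHM_M, LENGTH_M, WIDTH_M, THICKNESS_M)
c_wire = wire_capacitance(CAP_PER_UM_F, LENGTH_M)

print(f"R = {r_wire:.0f} ohm, C = {c_wire * 1e12:.1f} pF")
print(f"unbuffered delay ~ {distributed_rc_delay(r_wire, c_wire) * 1e9:.1f} ns")
for n in (2, 4, 8):
    d = segmented_delay(r_wire, c_wire, n, t_buffer_s=50e-12)
    print(f"{n} segments: ~ {d * 1e9:.2f} ns")
```

Because the wire delay grows with the square of the segment length, splitting the wire into n segments cuts the wire term by roughly a factor of n, while the buffer term grows linearly with n. That trade-off is what sets the ‘optimal’ number of repeaters in Part B.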