A Design Flow for the Development, Characterization, and Refinement of System Level Architectural Services
Douglas Densmore
Dissertation Talk and DES/CHESS Seminar
May 15th, 2007
Committee:
• Prof. Alberto Sangiovanni-Vincentelli (EECS) - Chair
• Prof. Jan Rabaey (EECS)
• Prof. Lee Schruben (IEOR)

Objective
• To demonstrate that architecture service modeling in system level design (SLD) can allow abstraction and modularity while maintaining accuracy and efficiency.
[Figure: Factors (Heterogeneity, Complexity, Time to Market); Solutions (Modularity, Abstraction, Accuracy, Efficiency); Techniques (#1 Event Based Architecture Service Modeling, #2 Architecture Service Characterization, #3 Architecture Service Refinement Verification); Outcomes.]
2/60

Outline
1. Problem Statement
• Motivating Factors
• Design Trends and EDA Growth
• Software Solutions
• Programmable Platforms
• Naïve Approach
• My Improved Approach
2. Approach
3. Contribution
3/60

Motivating Factors
Factor 1: Heterogeneity
Problem Statement Approach Contribution
[Figure: Block diagram of Intel's PXA270 System on a Chip (SoC), used in the Mypal A730 PDA (digital camera and a VGA-TFT display). Callouts: various communication types (PCMCIA, USB, system bus); various component types (SRAM, Quick Capture Interface). Courtesy: http://www.intel.com/design/embeddedpca/applicationsprocessors/302302.htm]
[Figure: Existing and predicted first integration of SoC technologies with standard CMOS processes, by year (1).]
Solution 1: Modularity
1. D. Edenfeld, et al., 2003 Technology Roadmap for Semiconductors, IEEE Computer, January 2004.
4/60

Motivating Factors
[Figure: Potential Design Complexity and Designer Productivity, 1981 to 2009. Complexity growth rate: 58%/yr compounded. Productivity growth rate: 21%/yr compounded. Curves: Logic Tr./Chip; Tr./S.M.; Equivalent Added Complexity. Axis: Productivity (K) Trans./Staff-Mo.]
(Figure axis, continued: Logic Transistors per Chip (M).)
Factor 2: Complexity
Courtesy: 1999 International Technology Roadmap for Semiconductors (ITRS)
Solution 2: Abstraction
5/60

Motivating Factors
Factor 3: Time to Market
• 37% of new digital products were late to market! (Ivo Bolsens, CTO Xilinx)
• Three months late still loses 15%+ of revenue.
• 50%+ revenue loss when nine months late.
• Year late effectively ends chance of revenue!
[Figure: Months versus year (1991, 2000, 2005) for Digital Consumer Devices, Set-Top Equipment, and Automotive; labeled values 16, 11, and 10.7. Courtesy: http://www.ibm.com]
Gartner DataQuest. Market Trends: ASIC and FPGA, Worldwide, 1Q05 Update edition, 2002-2008.
Solution 3: Accuracy and Efficiency
Challenge: Remain modular and abstract
6/60

Design Trends and EDA Growth
[Figure: Embedded market in $ millions for 2003, 2004, and 2009: Embedded Software (left), Embedded ICs (center), Embedded Boards (right). Annotations: ~$78.7 Billion; ~22% Growth for 2007; Tremendous Growth. Source: Ravi Krishnan. Future of Embedded Systems Technology. BCC Research, June 2005.]
[Figure: Methodology Gap. Design Complexity (# Transistors) across Gate Level, RTL, and ESL. Marked: Design Trend, Design Gap, Maximum Tolerable Design Gap, Today.]
[Figure: Gartner Dataquest projections of EDA industry revenue and of ESL revenues. Source: Richard Goering. ESL May Rescue EDA, Analysts Say. EE Times, June 2005.]
7/60

Software Tools Solution
Platform Based Design (1) is composed of three aspects:
1. Top Down Application Development
2. Platform Mapping
3. Bottom Up Design Space Exploration
Orthogonalization of concerns (2):
• Functionality and Architecture
• Behavior and Performance Indices
• Computation, Communication, and Coordination
• Metropolis Meta Modeling (MMM) language and compiler (4) are the core components.
• Backend tools provide various operations for manipulating designs and performing analysis.
[Figure: Metropolis framework. Meta model language; front end compiler; meta model; abstract syntax trees; back ends 1 to N; simulator tool, synthesis tool, verification tool.]
[Figure: Platform based design. Application Space; Application Instance; Platform Mapping; Platform; Platform Design-Space Export; Platform Instance; Architectural Space.]
Design Technology Improvements and Impact on Designer Productivity (3)
DT Improvement | Year | Productivity Delta | Productivity (Gates/Desn-Year) | Cost of Component Affected | Description of improvement
Electronic System Level (ES-Level) Methodology | 2005 | +60% | 200K | SW Development, Verification | Level above RTL including both HW and SW design.
1. A. Sangiovanni-Vincentelli, Defining Platform-based Design, EE Design, March 5, 2002.
2. K. Keutzer, Jan Rabaey, et al., System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design, Vol. 19, No. 12, December 2000.
3. 2004 International Technology Roadmap for Semiconductors (ITRS).
4. F. Balarin, et al., Metropolis: an Integrated Electronic System Design Environment, IEEE Computer, Vol. 36, No. 4, April 2003.
8/60

Programmable Platforms
What devices should the tools target?
What? A system for implementing an electronic design, distinguished by its ability to be programmed regarding its functionality.
At extremes:
• Software Programmable – GPPs, DSPs
• Bit Programmable – FPGAs, CPLDs
[Figure labels: Standardization; Memories; Standard Discretes; FPGAs; '67.]
Why? One set of models represents a very large design space of individual instantiations.
[Figure labels, continued: Customization; '57, '77, '87, '97; Custom LSIs; ASICs; Microprocessors; Field Programmability; Fabric & Embedded Computation Platform; "Modeling focus of this work". Source: Electronics Weekly, Jan 1991.]
Strengths:
• Rapid Time-to-Market
• Versatile, Flexible (increase product lifespan)
• In-Field Upgradeability
• Performance: 2-100X compared to GPPs
Weaknesses:
• Performance: 2-6x slower than ASIC
• Power: 13x compared to ASICs
Next "digital wave" will require programmable devices. (1)
Courtesy: K. Keutzer
1. Tsugio Makimoto, Paradigm Shift in the Electronics Industry, UCB, March 2005.
9/60

Programmable Platform Focus
What do MY system level models need to capture?
Classification | Description
Granularity | Abstraction level: CLB, Functional Unit, ISA
Host Coupling | Coupling to host processor: I/O, direct communication, same chip
Reconfiguration Methodology | How device is programmed: static, dynamic, partial
Memory Organization | How computations access memory: large block, distributed
K. Bondalapati, V. Prasanna, Reconfigurable Computing Systems, USC

Design Levels | Design Elements: Communication | Storage | Processing
Implementation | Switches, MUXES | RAM Organization | CLB/IP Block
uArch | Crossbar, Bus | Register File Size | Execution Unit Type
ISA | Address Size | Register Set | Custom Instructions
System Arch | Intercon. Network | Buffer Size | Number/Types of tasks
P. Schaumont, et al., A Quick Safari Through the Reconfigurable Jungle, DAC, June 2001.
[Figure labels: IBM's CoreConnect Architecture; Xilinx Virtex II XC2VP30; Xilinx Virtex II ML310 Board; MicroBlaze; PowerPC.]
10/60

Naïve Approach
[Figure: the naïve design flow. Labels: 1. Design Space Exploration; Simulation; Abstract, Modular SLD; Tools; Bridge the Gap!!; Datasheets; Expertise; Manual; "C" Model; Inflexible; Automatic Tool Flow; 2. Synthesis; Disconnected; Inaccurate!; Manual; RTL "Golden Model"; Lengthy Feedback; Implementation Platform; Implementation Gap!]
[Figure labels, continued: Estimated Performance Data; Architecture Model; Inefficient; Miss Time to Market!]
11/60

My Improved Approach
Technique 1: Modeling style and characterization for programmable platforms
• Functional level blocks of programmable components
[Figure labels: Abstract, Modular SLD; Estimated Performance Data; Real Performance Data (from characterization flow); Narrow the Gap; Actual Programmable Platform Description.]
Technique 2: Refinement Verification
[Figure labels: Abstract; Refined; Formal Checking; Manual Methods; Informal; Correct!!]
New approach has improved accuracy and efficiency by relating programmable devices and their tool flow with SLD (Metropolis). Retains modularity and abstraction.
12/60

Approach Statement
Problem: SLD of architecture service models is potentially inaccurate and inefficient.
My Approach: A PBD approach to Architecture Service Modeling which allows modularity and abstraction. By relating service models to:
• programmable platforms,
• platform characterization,
• and refinement verification,
they will retain accuracy and efficiency.
Design flow (chapters: Chapter 2 – System Level Architecture Services; Chapter 3 – Architecture Services Characterization; Chapter 4 – System Level Service Refinement):
Functional Modeling (Not discussed in this work)
1. Select architecture services from libraries (labels: Programmable, Xilinx Virtex II, FLEET, Special Purpose, General Purpose).
2. Assemble SLD, transaction based architecture from services.
3. Augment model with real performance data.
4. Simulation based Design Space Exploration.
4a. Based on DSE results, modify architecture model if needed (Yes? No?).
4b. Perform refinement check (event based, interface based, compositional component based) between abstract and refined models.
5. Produce an actual programmable platform description (i.e. MHS File) with the Structure Extractor.
6. Program actual device directly.
[Figure labels: Abstract, Modular; Real Performance Data; Narrow the Gap; MHS.]
13/60

Outline Revisited
1. Problem Statement
2. Approach
• My Improved Approach
• Approach Statement
• Architecture Service Descriptions
• Metropolis Overview
3. Contribution
• Programmable Architecture Service Modeling
• Programmable Platform Characterization
• Example of Techniques
Focus: Modularity
14/60

Architecture Service Taxonomy
Services are library elements, <F, C>, where F is a set of interface functions (capabilities) and C is a set of cost models.
• Single Component, Single Interface (SCSI) – One provided interface and one simple cost model.
• Multiple Component, Single Interface (MCSI) – One provided interface, one or more internal interfaces, and one or more complex cost functions.
• Multiple Component, Multiple Interface (MCMI) – Two or more provided interfaces, zero or more internal interfaces, one or more simple cost functions, and zero or more complex cost functions.
Services are also classified as active or passive.
[Figure: one diagram per service class showing provided interfaces, components, internal interfaces, and costs: CostA (C1, C2), CostB (C2), Cost (C1, C2, C3). Example labels: Xilinx Virtex II Pro; General Purpose Processor; Abstract; Add; DCT; FFT; CPU; Bus.]
15/60

Service Based Arch. Styles
Assemble collections of services to provide larger sets of capabilities and cost functions.
[Figure: Architecture Style 1 - Branching, built from SCSI, SCMI, MCSI, and MCMI services.]
• Branching Style – Allows for the usage of all types of services.
• Ring Style – Allows for the usage of Single Interface (SI) services only.
• Both Styles – Allow for the usage of active/passive and single/multiple component services.
• Hierarchy – Each style can be abstracted into composite services.
[Figure: Architecture Style 2 - Ring. Legend: Ovals – Passive Services; Squares – Active Services.]
16/60

Metropolis Objects
• Metropolis elements adhere to a "separation of concerns" ideology.
• Processes (Computation): active objects; sequential executing thread. (Diagram: Proc1 with ports P1, P2.)
• Media (Communication): passive objects; implement interface services. (Diagram: Media1 with interfaces I1, I2.)
• Quantity Managers (Coordination): schedule access to resources and quantities. (Diagram: QM1.)
17/60

Metro. Netlists and Events
Metropolis Architectures are created via two netlists:
• Scheduled – generate events (1) for services in the scheduled netlist.
• Scheduling – allow these events access to the services and annotate events with quantities.
[Figure: Scheduled Netlist (Proc1, Proc2 with ports P1, P2; Media1 with interfaces I1, I2) and Scheduling Netlist (QM1, Global Time).]
1. E. Lee and A. Sangiovanni-Vincentelli, A Unified Framework for Comparing Models of Computation, IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems, Vol. 17, N. 12, pg. 1217-1229, December 1998.
18/60

Services in Design Flow
[Figure: the design flow from the Approach Statement with the service modeling process expanded. Labels: Process Expanded; Characterization Data Input (Chapter 3); Libraries; Metropolis Media (uBlaze, OPB, PLB, BRAM, SynMaster); Structure Extraction; Structure Extractor; select architecture services from libraries; assemble SLD, transaction based architecture from services; programmable platform description (i.e. MHS File); program actual device directly.]
[Figure labels, continued: Mapping Process; Structure Extractor; PowerPC; Real Performance Data; Narrow the Gap; MHS; steps 4a, 4b, and 5 of the design flow; Chapter 4 – System Level Service Refinement.]
19/60

Programmable Arch. Modeling
• Services are organized by orthogonal aspects of the system.
• Leverage function level granularity; 1-to-1 model/IP correspondence.
• All services created here are XCMI with more than two provided interfaces each.
• Each service captures: Transaction Level; IP Parameters; I/O Interfaces.
• Computation Services: PPC405, MicroBlaze, SynthMaster, SynthSlave
  Computation Interfaces: Read(addr, offset, cnt, size), Write(addr, offset, cnt, size), Execute(operation, complexity)
• Communication Services: Processor Local Bus (PLB), On-Chip Peripheral Bus (OPB)
  Communication Interfaces: addrTransfer(target, master), addrReq(base, offset, transType, device), addrAck(device), dataTransfer(device, readSeq, writeSeq), dataAck(device)
• Other Services: OPB/PLB Bridge, Mapping Process, BRAM
Task before mapping: Read(addr, offset, cnt, size)
Task after mapping: Read(0x34, 8, 10, 4)
20/60

Sample Metropolis Service
    public medium uBlaze implements uBlazeISA, GPPOperation, OPBMaster {...}
Parameters
    private int C_FSL_LINKS;
    private int C_FSL_DATA_SIZE;
    private int C_USE_BARREL;
    private int C_USE_DIV;
    private int C_USE_HW_MUL;
Ports
    port OPBTrans _portOPB;
    //connection to characterizer
    port cycleLookup _portChar;
    //FSL ports
    port FSLMasterInterface[] _portMFSL;
    port FSLSlaveInterface[] _portSFSL;
    //connection to StateMedia
    port SchedReq _portSM;
    //StateMedia to global time
    port GTimeSMInterface _portGT;
[Figure: uBlaze medium with ports _portMFSL, _portSFSL, _portChar, _portOPB, _portSM, _portGT.]
Interface Function | Assumptions | Cycle Count
cpuRead(int bus) | Bus Dependent | 1 (LMB), 7 (OPB) cycles
cpuWrite(int bus) | Bus Dependent | 1 (LMB), 2 (OPB) cycles
fslRead(int size) | Transfer Size | (1 * size) cycles
fslWrite(int size) | Transfer Size | (1 * size) cycles
execute(int inst, int comp) | Valid INST Field | (1 * complexity) cycles
Each service is profiled manually and given a set of cost models. (Non-Ideal)
21/60

Programmable Arch. Modeling
• Coordination Services: PPC Sched, BRAM Sched, MicroBlaze Sched, PLB Sched, OPB Sched, General Sched
• Request(event e) - Adds event to pending queue of requested events.
• Resolve() - Uses an algorithm to select an event from the pending queue.
• PostCond() - Augments event with information (annotation). This is typically the interaction with the GTime quantity manager.
22/60

Sample Metropolis QM
    public quantity SeqQM implements QuantityManager {…}
    public quantity PLBArb implements QuantityManager {…}
Interfaces
    public eval void request(event e, RequestClass rc) {…}
    public update void resolve() {…}
    public update void postcond() {…}
    public eval boolean stable() {…}
Ports
    port StateMediumSched[] portTaskSM;
Request Class Interfaces
    public event getRequestEvent() {…}
    public int getserviceType() {…}
    public int getTaskId() {…}
    public int getComplexity() {…}
    public void setTaskId(int id) {…}
    public int getFlag() {…}
    public void setFlag(int flag) {…}
    public int getDeviceId() {…}
Each resolve() function is unique.
23/60

Architecture Extensions for Preemption
• Some services are naturally preempted.
  – CPU context switch, bus transactions
• Notion of Atomic Transactions
  – Prior to dispatching events to a quantity manager via the request() method, decompose events in the scheduled netlist into non-preemptable chunks.
  – Maintain status with an FSM object (counter) and controller.
Figure step 4. Update the FSM to track the state of the transaction.
Figure steps, continued (step 4 appears above):
1. A transaction is introduced into the architecture model.
2. Decoder transforms the transaction into atomic transactions.
3. Dispatch the atomic transaction (AT) to the quantity manager (individual events which make up the AT).
5. Communication with preempted processes through StateMedia (setMustDo(), setMustNotDo()).
6. Use stack data structure to store transactions and FSMs.
[Figure labels: Process (Task); Service (Media); Event; Transaction (i.e. Read); Decoder (Process); Quantity Manager; FSM; Initial State; Trans0 with FSM0; Trans1 with FSM1; states S1, S2, S3; atomic transactions A, B, C; SM.]
24/60

Architecture Extensions for Mapping
• Programmable platforms allow for both SW and HW implementations of a function.
• Need to express which architecture components can provide which services and with what affinity.
    public HashMap getCapabilityList()
• Export information from service associated with mapping process.
• Task column: operations available. Affinity column: ability to perform operations.
Mapping Process (Task) to HW DCT (Service): dedicated HW DCT, only can perform DCT!
Task | Affinity
Execute | 0/100
DCT | 100/100
FFT | 0/100
Mapping Process (Task) to uBlaze (Service): general purpose uProc, can perform multiple operations.
Task | Affinity
Execute | 50/100
DCT | 20/100
FFT | 2/100
Potential Mapping Strategies: Greedy; Best Average; Task Specific
25/60

Programmable Arch. Modeling
• Compose scheduling and scheduled netlists in top level netlist.
• Extract structure for programmable platform tool flow.
Structure Extractor
1. Assemble Netlists
2. Provide Service Parameters
3. Simulate Model
4. Extractor Script Tasks
   A. Identify parameters for service. For example MHZ, cache settings, etc.
   B. Examine port connections to determine topology.
   C. Examine address mapping for bus, I/O, etc.
   D. Check port names, instance names, etc. for instantiation. Decide on final topology.
5. Gather information and parse into appropriate tool format: File for Programmable Platform Tool Flow (MHS).
Top Level Netlist
    Public netlist XlinxCCArch
        XilinxCCArchSched schedNetlist;
        XilinxCCArchScheduling schedulingNetlist
        SchedToQuantity[] _stateMedia
[Figure labels: Scheduled Netlist (MicroBlaze, OPB); Scheduling Netlist (MicroBlaze Sched, OPB Sched); Process (Type, Parameters, Etc.); Connections; Topology; Mapping.]
Modular Modeling Style: Accurate & Efficient
26/60

Characterization in Design Flow
[Figure: the design flow from the Approach Statement with the characterization process (Chapter 3) expanded.]
Process Expanded:
1. Select device or family.
2. Create systems (System Creator; systems S1, S2, S3, ..., SN).
3. Extract data from systems (Data Extractor).
4. Categorize and store data (Characterizer Database: Execution Time for Processing; Transaction Cycles; Physical Timing).
Work with Xilinx Research Labs
1. Douglas Densmore, Adam Donlin, A. Sangiovanni-Vincentelli, FPGA Architecture Characterization in System Level Design, Submitted to CODES 2005.
2. Adam Donlin and Douglas Densmore, Method and Apparatus for Precharacterizing Systems for Use in System Level Design of Integrated Circuits, Patent Pending.
27/60

Prog.
Platform Characterization
Need to tie the model to actual implementation data!
1. Create template system description (process from Structure Extraction).
2. Generate many permutations of the architecture using this template and run them through programmable platform tool flow.
3. Extract the desired performance information from the tool reports for database population.
28/60

Prog. Platform Characterization
Create database ONCE prior to simulation and populate with independent (modular) information.
1. Data detailing performance based on physical implementation. (From Char Flow Shown)
2. Data detailing the composition of communication transactions. (From Metro Model Design)
3. Data detailing the processing elements computation. (From ISS for PPC)
29/60

Characterized Data Organization
How is the data associated with each service interface function?
• Each system interface function characterized has an entry.
• These indices can be hashed if appropriate.
• Entries can share data or be independent.
• Entries can have all, partial, or no information.
[Figure: database entries for System 1 through System N, each with an Index and a Method. Physical Timing (Characterizer): 4.2ns, 4ns, 3.8ns, 3.2ns. Computation Timing (ISS): uProc1 with FFT 20 Cycles and Filter 35 Cycles; uProc2 with FFT 10 Cycles and Filter 30 Cycles. Transaction Timing (Metro Model): Read = ACK, Trans, Data; Write = ACK, Data, ACK. Some entries are NULL.]
30/60

Prog. Platform Characterization
Why can't you just use static estimation?
• As resource usage increases system frequency generally decreases.
• Not linear nor monotonic.
• 15% change is a speed grade for the devices.
Table 3.3 data (rows recoverable from the extraction):
P | B | U | Addr. | Area | Max MHZ | MHZ delta | Area delta
1 | 2 | 1 | T | 1611 | 119 | 16.17% | 39.7%
1 | 2 | 1 | L | 1613 | 102 | -14.07% | 0.12%
1 | 3 | 0 | T | 1334 | 117 | 14.56% | -17.29%
1 | 3 | 0 | L | 1337 | 95 | -18.57% | 0.22%
1 | 3 | 1 | T | 1787 | 120 | 26.04% | 33.65%
[Figure: Combo Systems Resource Usage and System Frequency versus sample. Series: Loose Addr MHZ, Tight Addr MHZ, Loose Addr Slices, Tight Addr Slices. Axes: Slice Count (Area); MHZ (Performance). Annotations: periodic changes (PowerPC added; uBlaze added, 2s; BRAM added, 1s); increasing system complexity; area often plateaus; two top curves overlap (similar); high spikes in adjacent samples; frequency decreasing but not monotonic or linear; 10%+ delta.]
PLB Write Transfer Performance Comparison
• Design from rows 1, 3, and 5 of the table.
• Three abstraction levels: 1, 3, and 10 cycle transactions.
• Metropolis JPEG version: 112,500 write transactions for 3 MegaPixel, 24 bit color depth, 95% compressed image.
• 19% difference between intuition and characterization.
• Created database once prior to simulation.
Modular Characterization: Accurate & Efficient
31/60

Modeling & Char. Review
[Figure: Branching Architecture Example. Scheduled Netlist: Task1 to Task4 (processes), Dedicated HW and PPC (MCMI), PLB (MCMI), BRAM (SCSI), Characterizer. Scheduling Netlist: DedHW Sched, PPC Sched, PLB Sched, BRAM Sched, Global Time. Legend: Process; Media (scheduled); Quantity Manager; Quantity; Enabled Event; Disabled Event.]
32/60

Outline Revisited
1. Problem Statement
2. Approach
3. Contribution
• Architecture Refinement Verification
• Vertical Refinement
• Horizontal Refinement
• Surface Refinement
• Depth Refinement
• Design Flow Examples
• Summary and Conclusions
Focus: Abstraction
33/60

Arch. Refinement Verification
• Architectures often involve hierarchy and multiple abstraction levels.
  – Limited if it is not possible to check if elements in hierarchy or less abstract components are implementations of their counterparts.
• Asks "Can I substitute M1 for M2?"
  1. Representing the internal structure of a component.
  2. Recasting an architectural description in a new style.
  3. Applying tools developed for one style to another style.
D. Garlan, Style-Based Refinement for Software Architectures, SIGSOFT 96, San Francisco, CA, pg. 72-75.
Refinement Technique | Description | Metropolis
Style/Pattern Based | Define template components. Prove they have a desired relationship once. Build arch. from them. | Potential; TTL YAPI
Event Based | Properties (behaviors) expressed as event lists. Explicitly look for these event patterns. | Discussed
Interface Based | Create structure capturing all behavior of a component's interface. Compare two models. | Discussed
Compositional Component Based | Create structures capturing local behavior. Compose larger systems by synchronizing these smaller pieces. | Discussed
34/60

Refinement Verification in Design Flow
[Figure: the design flow from the Approach Statement with the refinement check (Chapter 4) expanded. Abstract and refined example netlists with processes P1 to P4 and media M1 to MN; Refinement Question: Yes? No?]
Process Expanded:
1. Identify changes to be made (structural or component).
   A. Inter-component structural changes (compositional component based)
   B. Structural changes between scheduled and scheduling components (event based)
   C. Intra-component changes (interface based)
2. Run verification tools.
Refinement types (1):
• Vertical and Horizontal Refinement: Event Based; Event Based Properties; focus on reasons 1, 2, and 3.
• Surface Refinement: Interface Based; Control Flow Graph; focus on introducing new behaviors (Reason 1).
• Depth Refinement: Compositional Component Based; Labeled Transition Systems; focus on abstraction & synthesis (Reasons 2 & 3).
1. Douglas Densmore, Metropolis Architecture Refinement Styles and Methodology, University of California, Berkeley, UCB/ERL M04/36, 14 September 2004.
35/60

Vertical Refinement
[Figure: Scheduled Netlist (Mapping Process, Rtos, PPC405, PLB, Cache, BRAM) and Scheduling Netlist (Rtos Sched, PPC Sched, PLB Sched, Cache Sched, BRAM Sched), in sequential and concurrent variants. Annotation: new origins and amounts of events scheduled and annotated.]
• Definition: A manipulation of the scheduled netlist structure that changes the number or origin of events as seen by the scheduling netlist.
Original | Sequential | Concurrent 1 | Concurrent 2
E1 (CPURead) | E1 (RTOSRead) | E1 (CPURead) | E1 (CPURead)
E2 (BusRead) | E2 (CPURead) | E2 (CacheRead) | E2 (CacheRead)
E3 (MemRead) | E3 (BusRead) | E3 (BusRead) |
 | E4 (MemRead) | E4 (MemRead) |
36/60

Horizontal Refinement
[Figure labels: Mapping Process; Rtos Sched; PPC405; Control Thread; Arb; PLB.]
• Definition: A manipulation of both the scheduled and scheduling netlist which changes the possible ordering of events as seen by the scheduling netlist.
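The two definitions above can be read as a comparison of the event sequences that the scheduling netlist observes before and after a change. The Python sketch below is an illustration only and is not part of Metropolis: the function name `classify_refinement` and the (event, origin) tuple encoding are assumptions. It treats a change in the multiset of (event, origin) pairs as vertical and a pure reordering as horizontal; a real check would reason about sets of possible orderings rather than one observed sequence.

```python
from collections import Counter
from typing import List, Tuple

# (event name, origin) as seen by the scheduling netlist; this encoding is an assumption.
Event = Tuple[str, str]


def classify_refinement(original: List[Event], refined: List[Event]) -> str:
    """Classify a change by comparing the observed event sequences."""
    if Counter(original) != Counter(refined):
        # The number or origin of events changed.
        return "vertical"
    if original != refined:
        # Same events and origins, different ordering.
        return "horizontal"
    return "unchanged"


# Hypothetical sequences written in the style of the deck's examples.
original = [("BusRead", "CPU1"), ("BusRead", "CPU1"),
            ("BusRead", "CPU2"), ("BusRead", "CPU2")]
interleaved = [("BusRead", "CPU1"), ("BusRead", "CPU2"),
               ("BusRead", "CPU1"), ("BusRead", "CPU2")]
plain = [("CPURead", "CPU1"), ("BusRead", "CPU1"), ("MemRead", "CPU1")]
with_rtos = [("RTOSRead", "RTOS")] + plain

print(classify_refinement(original, interleaved))  # horizontal
print(classify_refinement(plain, with_rtos))       # vertical
```

Comparing multisets first keeps the two cases disjoint: a change that adds events and also reorders them is reported as vertical in this sketch.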
[Figure: Scheduled Netlist (Rtos, PPC405, Cache, BRAM) and Scheduling Netlist (PPC Sched, Cache Sched, PLB Sched, BRAM Sched). Annotation: ordering of event requests changed.]
Original* | Refined (interleaved)
E1 (BusRead) -> From CPU1 | E1 (BusRead) -> From CPU1
E2 (BusRead) -> From CPU1 | E3 (BusRead) -> From CPU2
E3 (BusRead) -> From CPU2 | E2 (BusRead) -> From CPU1
E4 (BusRead) -> From CPU2 | E4 (BusRead) -> From CPU2
*Contains all possible orderings if abstract enough
37/60

Event Based Properties
• Properties expressed as event sequences as seen by the scheduling netlist.
Events: E1 (CPUExe), E2 (CPUExe), E3 (CPUExe), E4 (CPURead)
Resource Utilization
Resource | Bad Resolve() | Good Resolve()
CPU | E1, E2, E3, E4 | E4, E1, E2, E3
Bus | X, X, X, X, E4 | X, E4
Mem | X, X, X, X, X, E4 | X, X, E4
Latency (two columns each for Bad Resolve() and Good Resolve(); the sub-column headings are not recoverable)
Step | Bad Resolve() | | Good Resolve() |
CPU (0) | E1, E2 | E1 | E1, E2 | E1
CPU (1) | E2, E3 | E2 | E2, E3 | E2
Bus (1) | E1, EX | EX | E1, EX | EX
CPU (2) | E3 | E3 | E3 | E3
Bus (2) | E1, E2 | E2 | E1, E2 | E1
CPU (3) | | | |
Bus (3) | E1, E3 | E3 | E2, E3 | E2
38/60

Macro and MicroProperties
MicroProperty – The combination of one or more attributes (or quantities) and an event relation defined with these attributes.
MacroProperty – A property which implies a set of MicroProperties. Defined by the property which ensures the other's adherence. The satisfaction (i.e. the property holds or is true) of the MacroProperty ensures all MicroProperties covered by this MacroProperty are also satisfied.
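The implication in the MacroProperty definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Metropolis implementation: the class names, the `check` helper, the `granted` attribute, and the choice that DC covers RA and WA are all hypothetical (the abbreviations are borrowed from the deck's property figure). When the MacroProperty holds, the covered MicroProperties are reported as satisfied without being evaluated individually.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]                 # an event annotated with attributes (quantities)
Relation = Callable[[List[Event]], bool]  # an event relation evaluated over a trace


@dataclass(frozen=True)
class MicroProperty:
    name: str
    attributes: Tuple[str, ...]  # attributes the relation is defined with
    relation: Relation

    def holds(self, trace: List[Event]) -> bool:
        return self.relation(trace)


@dataclass(frozen=True)
class MacroProperty:
    name: str
    relation: Relation
    covers: Tuple[MicroProperty, ...]  # MicroProperties implied by this MacroProperty

    def holds(self, trace: List[Event]) -> bool:
        return self.relation(trace)


def check(macro: MacroProperty, trace: List[Event]) -> Dict[str, bool]:
    """Report the status of every covered MicroProperty.

    If the MacroProperty holds, the covered MicroProperties are reported as
    satisfied without being evaluated; otherwise each one is checked.
    """
    if macro.holds(trace):
        return {micro.name: True for micro in macro.covers}
    return {micro.name: micro.holds(trace) for micro in macro.covers}


def all_granted(kind: str) -> Relation:
    """Relation: every event of the given kind was granted access."""
    return lambda trace: all(e["granted"] for e in trace if e["kind"] == kind)


# Hypothetical grouping: which MicroProperties DC covers is an assumption.
read_access = MicroProperty("RA", ("kind", "granted"), all_granted("read"))
write_access = MicroProperty("WA", ("kind", "granted"), all_granted("write"))
data_consistency = MacroProperty(
    "DC",
    lambda trace: all(e["granted"] for e in trace),
    (read_access, write_access),
)

good = [{"kind": "write", "granted": True}, {"kind": "read", "granted": True}]
bad = [{"kind": "write", "granted": False}, {"kind": "read", "granted": True}]
print(check(data_consistency, good))  # {'RA': True, 'WA': True}
print(check(data_consistency, bad))   # {'RA': True, 'WA': False}
```

The shortcut in `check` is only sound if the MacroProperty's relation really implies each covered relation, as it does here, since "all events granted" implies "all reads granted" and "all writes granted".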
[Figure: property hierarchy with Levels 1 to 3. MacroProperties: Data Precision (DP), Data Consistency (DC), Data Coherency (DCo). MicroProperties: Sufficient Bits (SB), Write Access (WA), Read Access (RA), No Overflow (NO), Sufficient Space (SS), Data Valid (DV), Snoop Complete (SC).]
39/60

Event Petri Net
• Two Petri Nets – One for the service model and one for the events of interest.
• Model Event Petri Net – One transition set which represents events of interest, tEN. Transitions also are used to indicate interface functions.
• Property Event Petri Net – Initial marking vector is empty. One place per MacroProperty, p<prop>. Created such that in order to create a token in each MacroProperty place, all transitions must fire once and only once.
• Link the two event Petri nets together such that select tENs feed connection transitions, tCN, which produce the needed tokens for the property EPN.
[Figure: Model EPN with event transitions tE1 to tE6 (MemWrite, MemRead, CPURead, BusWrite, BusRead, CPUExecute, CPUWrite), connection transitions tC1 to tC3, and places pC1 to pC6. Prop EPN with places start1 to start3, pDP, pDC, and pDCo; transitions t1 to t14; labels RA, SB, NO, SS, WA, DV, SC.]
40/60

Surface Refinement Def.
Surface: Restriction on the location and information available to define component behavior.
TraceM – Trace in Metropolis = a finite set of function calls to media via interfaces.
[Figure: components P1, P2, P3 with an unknown MoC (DataFlow, KPN, etc.). Surfaces 1 to 4 are marked: Required Services Surface; Provides Services Surface; Observable. Component example: interface calls on ports are on the surface; internal operation is not visible.]
• Defined as in Hierarchical Verification (1)
  – Model: An object which can generate a set of finite sequences of behaviors, B.
  – Trace: a ∈ B.
  – Given a model X and a model Y, X refines the model Y, denoted X < Y, if given a trace a of X then the projection a[ObsY] is a trace of Y.
  – Two models X and Y are trace equivalent if X < Y and Y < X.
• The answer to the refinement problem (X, Y) is YES if X refines Y, otherwise NO.

1. T. Henzinger, S. Qadeer, S. K. Rajamani, "You Assume, We Guarantee: Methodology and Case Studies", 10th International Conference on Computer Aided Verification (CAV), Lecture Notes in Computer Science 1427, Springer-Verlag, 1998, pp. 440-451.

Control Flow Graph

• Defined much like*
• Tuple <Q, q0, X, Op, →>
  – Q – control locations
  – q0 – initial control location
  – X – set of variables
  – Op – function calls to media, basic block start and end
  – → – transition relation

[Figure: hypothetical automaton for variable X with states 1, 2, 3 and labels X=0, X=1, X=2.]

Graph for Model:
Control Location 1 – Group Node Type: ProcessDeclNode; initial control location
Control Location 2 – Group Node Type: LoopNode; while loop
Control Location 3 – Group Node Type: ThisPortAccessNode
Edge labels: X=0, X<2, X>=2, Port1.callRead()+

//sample code
process example {
  port Read port1;
  port Write port2;
  void thread() {
    int x = 0;
    while (x < 2) {
      port1.callRead();
      x++;
    }
    port2.callWrite();
  }
}

*"Temporal Safety Proofs for Systems Code", Henzinger et al.
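As a rough illustration of the tuple <Q, q0, X, Op, →>, the Python sketch below encodes the sample process as a small automaton and runs it to collect the observable port calls. It is a simplification under stated assumptions: the six control locations, the `Edge` and `CFA` classes, and the `run` method are hypothetical and do not reproduce the numbering or node types of the Metropolis CFG backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, int]


@dataclass
class Edge:
    """One element of the transition relation."""
    src: int
    dst: int
    op: Optional[str] = None                         # observable call to a medium
    guard: Callable[[State], bool] = lambda s: True  # enabling condition
    update: Callable[[State], None] = lambda s: None  # variable assignment


@dataclass
class CFA:
    locations: Set[int]          # Q
    initial: int                 # q0
    variables: Tuple[str, ...]   # X
    edges: List[Edge]            # Op labels and the transition relation

    def run(self, max_steps: int = 100) -> List[str]:
        """Follow enabled edges from q0 and record the observable calls."""
        state = {v: 0 for v in self.variables}
        loc, trace = self.initial, []
        for _ in range(max_steps):
            enabled = [e for e in self.edges if e.src == loc and e.guard(state)]
            if not enabled:
                break  # sink location reached
            edge = enabled[0]
            edge.update(state)
            if edge.op is not None:
                trace.append(edge.op)
            loc = edge.dst
        return trace


# Hypothetical, simplified location numbering for the sample process.
example = CFA(
    locations={1, 2, 3, 4, 5, 6},
    initial=1,
    variables=("x",),
    edges=[
        Edge(1, 2, update=lambda s: s.update(x=0)),           # int x = 0
        Edge(2, 3, guard=lambda s: s["x"] < 2),                # loop condition true
        Edge(3, 4, op="port1.callRead"),                       # port1.callRead()
        Edge(4, 2, update=lambda s: s.update(x=s["x"] + 1)),   # x++
        Edge(2, 5, guard=lambda s: s["x"] >= 2),               # loop exit
        Edge(5, 6, op="port2.callWrite"),                      # port2.callWrite()
    ],
)

observable_trace = example.run()
```

Only the calls on ports end up in `observable_trace`; the variable updates stay internal, which is the distinction the surface view relies on.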
Control Location 4 – Group Node Type: None; ending of basic block
Control Location 5 – Group Node Type: Collection of Variable Nodes
Control Location 6 – Group Node Type: Variable Node (collection) - End
Control Location 7 – Group Node Type: ThisPortAccessNode
Control Location 8 – Group Node Type: None; ending of basic block
Control Location 9 – Group Node Type: None; sink state
Control Location 10 – Group Node Type: None; bookend of LoopNode
Edge labels: Port1.callRead()-, X++(+), X++(-), Port2.callWrite()+, Port2.callWrite()-

Surface Refinement Domains

[Figure: two versions of a system built around a Switch Fabric component. The Communication Ref Domain, labeled <C, P, OP>, contains the move operations (move.source.Adder, move.dest.Adder, move.source.Prod, move.dest.Prod, move.source.mem, move.dest.mem). Computation Ref Domains 1 and 2 contain the Adder (Add(input1, input2)) and Producer (prodLit()) components. Storage Ref Domain 1 contains the Memory component (get(), put()).]

Surface Refinement Example

[Figure: control flow automata for FIFO SchedulerAb and FIFO SchedulerRef, each with control locations 1 to 12 and the calls terminated(), whatRound(), queryData(), putRound1_Status, and putPolicy(). Guards include True, False, Type & !Done, !Type & !Done, Else, and checked_all.]

• Trace containment check for single threaded processes
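For finite trace sets, a trace containment check of this kind reduces to projecting each trace of the refined model onto the observable calls of the abstract model and testing membership. The Python sketch below uses the abbreviated FIFO scheduler traces of this example; the function names `project` and `refines` and the extra call `internalStep` are hypothetical, and the actual flow relies on MOCHA or FORTE rather than explicit enumeration.

```python
from typing import Iterable, Set, Tuple

Trace = Tuple[str, ...]


def project(trace: Trace, observable: Set[str]) -> Trace:
    """Keep only the calls that the abstract model can observe (a[ObsY])."""
    return tuple(call for call in trace if call in observable)


def refines(x_traces: Iterable[Trace], y_traces: Iterable[Trace],
            obs_y: Set[str]) -> bool:
    """X < Y: every projected trace of X is also a trace of Y."""
    y_set = set(y_traces)
    return all(project(t, obs_y) in y_set for t in x_traces)


# Abbreviated FIFO scheduler traces, written with the slide's abbreviations.
T1 = ("Terminated",)
T2 = ("Terminated", "wRnd")
T3 = ("Terminated", "wRnd", "wRnd")
T4 = ("Terminated", "wRnd", "Tnated", "putPolicy", "PR1S", "qData")

B_ref = {T1, T3, T4}
B_ab = {T1, T2, T3, T4}
OBS = {call for t in B_ab for call in t}  # every abstract call is observable

ref_refines_ab = refines(B_ref, B_ab, OBS)  # refined traces are contained
ab_refines_ref = refines(B_ab, B_ref, OBS)  # T2 is missing from B_ref

# Hypothetical internal call that the abstract model cannot observe.
T3_internal = ("Terminated", "internalStep", "wRnd", "wRnd")
internal_ok = refines({T3_internal}, B_ab, OBS)
```

Because `ab_refines_ref` is false, the two schedulers are not trace equivalent; the refined scheduler only removes behavior.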
FIFO Scheduler Process Traces (*function calls abbreviated)
Trace   Calls
T1      Terminated()
T2      Terminated() wRnd()*
T3      Terminated() wRnd()* wRnd()*
T4      Terminated() wRnd()* Tnated()* putPolicy() PR1S()* qData()*

Bref = {T1, T3, T4}
Bab = {T1, T2, T3, T4}
Refinement!

1. Douglas Densmore, Sanjay Rekhi, A. Sangiovanni-Vincentelli, MicroArchitecture Development via Successive Platform Refinement, Design Automation and Test Europe (DATE), Paris, France, 2004.

Surface Refinement Flow

[Figure: a Metropolis Model (.mmm) feeds the CFG Backend (automatic). Outputs: a visual representation (for debugging); a Reactive Module of the CFG (X), combined with a Witness Module (W) by Edit and Parallel Composition (manual) into X||W and passed to MOCHA together with the CFA (Y) developed in the previous iteration; and a KISS file of the CFA, passed through the SIS state_assign script to a BLIF file, then Manual Edits to BLIF and NEXLIF2EXE to a Mode.exe file, then FORTE (automatic) together with the BLIF file developed in the previous iteration. MOCHA and FORTE each give the answer to X < Y.]

Three primary branches:
1. Visual representation for debugging.
2. CFG conversion to a reactive module. Works with the MOCHA tool flow. Requires manual augmentation of a witness module since Y has private variables.
3. CFG conversion to a KISS file. Works with the SIS and FORTE tool flows. Requires manual edits to convert BLIF to EXLIF.

Depth Refinement - LTS

• Depth Refinement – Want to make inter-component structural changes.
• Definition: A Labeled Transition System (LTS) is a tuple <Q, Q0, E, T, l> where:
  – Q is a set of states,
  – Q0 ⊆ Q is a set of initial states,
  – E is a finite set of transition labels or actions,
  – T ⊆ Q x E x Q is a labeled transition relation, and
  – l is an interpretation of each state on system variables.
• But in LTS there is no notion of input signals
  – When we compose LTS, a transition can be triggered when another LTS is in a given state.

[Figure: LTSs for a Service with Write, Write2, Read, and Read2 components; states empty, not empty, full; transitions write, write2, read2.]

Olga Kouchnarenko and Arnaud Lanoix.
Refinement and Verification of Synchronized Component-Based Systems. In FME 2003: Formal Methods, Lecture Notes in Computer Science, volume 2805/2003, pages 341–358. Springer Berlin / Heidelberg, 2003.

Refinement Rule 1: Strict transition refinement

• If there is a transition in the refined LTS from one state to another, then there must be the same transition in the abstract LTS.
• Note: The two transitions must have the same label!

Refinement Rule 2: Stuttering transition refinement

• If there is a new (tau) transition in the refined LTS, then its beginning state and ending state must correspond to the same state in the abstract LTS.

Refinement Rule 3: Lack of τ-divergence

• There are no new transitions in the refinement that go on forever.

Refinement Rule 4: External non-determinism preservation

• If there is a transition in the abstract LTS and the corresponding refined state does not have any transition, then
  – there must be another refined state that corresponds to the abstract state, and
  – it must take a transition to another refined state, and there must exist a state in the abstract LTS so that these two are glued together.

Depth Ref. Design Flow

1. Create a .fts file capturing the LTS for each component of the refined and abstract systems.
   1. Define observable events, OE
   2. Transaction labels correspond to OE
2. Define gluing invariants in .inv file.
3. Define synchronization between LTS in .sync file.

Transition System (.fts):
    //Two state values
    type SIGNAL = {consume, wait}
    local con : SIGNAL
    //Can only be in one state
    Invariant
      (con = consume) \/ (con = wait)
    //Initial state
    Initially (con = wait)
    //Transition to consume ("get" event)
    Transition get :
      enable (con = wait) ;
      assign con := consume
    //Transition to wait ("stallC" event)
    Transition stallC
      enable (con = consume) ;
      assign con := wait

Gluing invariant (.inv):
    ((con = consume) <--> (conR = consume))
    /\ ((con = wait) <--> ((conR = wait) \/ (conR = clean)))

Synchronization (.sync):
    //Buffer Events (reads and writes)
    //"write1" event is enabled when the LTSs are in the following states
    (write1) when
      ((prod = produce) /\ (buf = empty) /\ (con != consume)),
    (write3) when
      ((prod = produce) /\ (buf = notempty) /\ (con != consume)),
    (read1) when
      ((prod != produce) /\ (buf = notempty) /\ (con = consume)),
    (read3) when
      ((prod != produce) /\ (buf = full) /\ (con = consume)),
    //Producer Events
    make when
      (prod = wait),
    stall when
      (prod = produce),
    //Consumer Events
    get when
      (con = wait),
    stallC when
      (con = consume)

[Figure: abstract and refinement LTSs for the buffer (states empty, not empty, full; transitions write, read, write2, read2) joined by gluing relations.]

Example Design

1. Select an application and understand its behavior.
2. Create a Metropolis functional model which models this behavior.
3. Assemble an architecture from library services or create your own services.
4. Map the functionality to the architecture.
5. Extract a structural file from the top level netlist of the architecture created.

[Figure: JPEG Encoder Function Model (Block Level) with Preprocessing, DCT, Quantization, and Huffman blocks, each connected through a Mapping Process to an architecture assembled from the IP Library (MicroBlaze, SynthMaster, SynthSlave, On-Chip Peripheral Bus (OPB), BRAM). The Structure Extractor takes the Top Level Netlist and produces the File for Xilinx EDK Tool Flow.]

Example Design Cont.

1. Feed the captured structural file to the permutation generator.
2. Feed the permutations to the Xilinx tools and extract the data.
3. Capture execution info for software and hardware services.
4. Provide transaction info for communication services.
[Figure: the File for Xilinx EDK Tool Flow enters the Permutation Generator, which emits Permutation 1, Permutation 2, ..., Permutation N to the Platform Characterization Tool (Xilinx EDK/ISE Tools). Its outputs are Char Data stored in the Characterizer Database, combined with ISS Info for Software Routines (int DCT (data){ Begin calculate … }), cycle counts for Hardware Routines (DCT1 = 10 Cycles, DCT2 = 5 Cycles, FFT = 5 Cycles), and Transaction Info (32 Bit Read = Ack, Addr, Trans, Data, Ack). Inputs are marked Manual or Automatic.]

Example Design Cont.

Backend Tool Process:
1. Abstract Syntax Tree (AST) retrieves structure.
2. Control Data Flow Graph - Surface Refinement
   – FORTE – Intel Tool
   – Reactive Models – UC Berkeley
3. Event Traces – Refinement Properties
   – New Algorithm

[Figure: the JPEG Encoder Function Model (Preprocessing, DCT, Quantization, Huffman) mapped through Mapping Processes onto the architecture (MicroBlaze, SynthMaster, SynthSlave, On-Chip Peripheral Bus (OPB), BRAM), with Vertical Refinement, Horizontal Refinement, and Concurrent Vertical Refinement paths. ISS Info, Transaction Info, and Char Data feed the models; the Verification Tool answers Yes or No.]

1. Simulate the design and observe the performance.
   Execution time: 100ms; Bus Cycles: 4000; Ave Memory Occupancy: 500KB
2. Refine design to meet performance requirements.
3. Use Refinement Verification to check validity of design changes.
   • Vertical, or Horizontal
   • Depth, Surface
   • Refinement properties
4. Re-simulate to see if your goals are met.
   Execution time: 200ms; Bus Cycles: 1000; Ave Memory Occupancy: 100KB

MJPEG Encoding

[Figure: four architecture models (Arch 1 to Arch 4) of the MJPEG encoder, each a functional model mapped onto an architecture model. Panel captions: Completely Sequential; DCT and Quant separated; Y, Cr, and Cb components parallelized; Huffman operations parallelized.]

Functional Key: PreProcessing (P), DCT (D), Quantization (Q), Huffman Encoding (H), Table Modifications (TM), Collector (Col)
Mapping Guide: Mapping Process; uBlaze = Microblaze Soft Processor; FSL = Fast Simplex Link

System   Est. Cycles     Char. Cycles     Real Cycles   Rankings
Arch 1   145282 (52%)    228356 (25%)     304585        4, 4, 4
Arch 2   103812 (33%)    145659 (6%)      154217        3, 3, 2
Arch 3   103935 (29%)    145414 (1.2%)    147036        2, 2, 3
Arch 4   103320 (28%)    144432 (<+1%)    143335        1, 1, 1

Other case studies

• H.264 Deblocking Filter
  – 14 different mappings explored
  – Execution time analysis for computation, waiting, and communication operations.
  – Average difference between Metropolis simulation and actual implementation was 3.48%.
• SPI-5 Packet Processing
  – 6 architecture models developed
  – Optimal FIFO length determined

Summary and Conclusions

1. Heterogeneity → Modularity
  – Functional block level Metropolis models of programmable services.
    • Direct structural correspondence aids accuracy. Automatic structure extraction creates efficiency.
  – Independent characterization process of actual hardware implementations.
    • Shown to be accurate. Independence creates efficiency.
2. Complexity → Abstraction
  – Depth/Surface Refinement allows internal changes to the model.
    • Trace based formalism → accuracy. Automatic checking → efficiency.
  – Vertical/Horizontal Refinement allows structural changes to the model.
    • Event based formalism → accuracy. Refinement property encapsulation → efficiency.

Thanks

• Questions?
• Thanks
  – Metropolis Team: Yoshi Watanabe, Felice Balarin, Roberto Passerone, Abhijit Davare, Haibo Zeng, Qi Zhu, Guang Yang, Trevor Meyerowitz, Alessandro Pinto
  – Committee: Jan Rabaey, Alberto Sangiovanni-Vincentelli, John Wawrzynek, Lee Schruben
  – Industrial: Adam Donlin (Xilinx), Sanjay Rekhi (Cypress)