Module 4: HW and SW Co-design for SoC
Prof. 정정화 (Hanyang University)
Chapter 4. HW/SW Co-Design for SoC
Copyrightⓒ2004

HW/SW Co-design for SoC (outline)
  - Introduction of HW/SW Co-design
  - HW/SW Co-design Methodology
    - System Specification
    - HW/SW Co-partitioning
    - HW/SW Co-synthesis
    - HW/SW Co-verification
  - Co-design Related Works
  - References

What is Co-design?
  - A design methodology that supports cooperation and concurrency so that a system combining hardware and software meets its functional and performance goals at the same time
  - [Figure: Abstract Co-design Process. Labels: Function, Architecture, Hardware, Software, Synthesis, Mapping, Verification, Abstraction, Refinement]

Codesign Definition and Key Concepts
  - Codesign: exploiting the trade-offs between hardware and software in a system through their concurrent design
  - Key concepts
    - Concurrent: hardware and software developed at the same time on parallel paths
    - Integrated: interaction between hardware and software development to produce designs that meet performance criteria and functional specifications

Motivations for Codesign
  - Instruction Set Processors (ISPs) available as cores in many design kits (386s, DSPs, microcontrollers, etc.)
  - Systems on Silicon: many transistors available in typical processes (> 10 million transistors available in IBM ASIC process, etc.)
  - Increasing capacity of field-programmable devices; some devices can even be reprogrammed on the fly (FPGAs, CPLDs, etc.)
  - Efficient C compilers for embedded processors
  - Hardware synthesis capabilities

SoC Co-Design Challenges
  - Current systems are complex and heterogeneous
  - They contain many different types of components
  - Half of the chip can be filled with 200 low-power, RISC-like processors (ASIPs) interconnected by field-programmable buses and embedded in 20 Mbytes of distributed DRAM and flash memory; the other half is ASIC
  - Computational power will come not from multi-GHz clocking but from parallelism, with clock rates below 200 MHz.
    This will greatly simplify the design for correct timing, testability, and signal integrity.

Classical HW/SW Design Methodology
  - Classical HW/SW design process
    - The system is split into hardware and software at the very start of development
    - Hardware and software are developed independently
    - The two are integrated only after each has been completed
  - Problems with classical HW/SW design
    - It is hard to move a function between hardware and software midway through development
    - Integrating and verifying the system takes a long time
  - Recent design requirements
    - Optimal price/performance ratio
    - Low power consumption
    - Ease of use, weight, volume
    - Time to market

The Need for Co-design
  - A methodology is needed that considers hardware and software together from the earliest design stage: "Co-design"
  - Characteristics of co-design
    - Concurrency & Integration
    - Hardware and software are developed and verified together in an integrated way
    - Well suited to embedded system and SoC design
    - Optimal HW/SW partitioning improves the price/performance ratio
    - Shorter design time, lower design cost, and fewer errors in both hardware and software make time-to-market targets achievable

General Co-design Process
  - [Figure: flow from System Specification and Performance Goal through Control Data Flow Graph, Cost Estimation (delay, area, power), Constraint Analysis, Memory/Pipeline Optimization, Hardware/Software Partitioning, and Co-Synthesis (with Interface Library) to the Software, Hardware, and Interface Specifications; then Compiler and Behavioral Synthesis, producing Custom HW, Memory, and a CPU running Application SW and Device Drivers on a System Bus. A Co-simulator runs alongside the flow; the final step is Verification and Debugging.]

HW/SW Co-design Space
  - [Figure: flexibility (number of applicable products, 1 to 10000) plotted against power efficiency (MIPS/W, 10^1 to 10^5). SW: greater flexibility. HW: greater power efficiency. The co-design space lies between the two.]

Design space exploration
  - Figure labels: Customer/marketing, System architect, Cospecification, High-level transformation, Design space exploration, System analysis, Reused functions and processes, Process transformation, HW/SW partitioning and scheduling, HW arch & comp.,
    Reused HW & SW components, HW synthesis, SW synthesis, Evaluation (cosimulation)
  - Source: Ernst (IEEE Design & Test of Computers)

Previous work (1)
  - ASIP (application-specific instruction-set processor) codesign
    => builds a specific programmable processor
    => translates the application into software code executable by that processor
    => includes the instruction set design
  - Hardware/software synchronous system co-design
    => software processor acting as a master controller
    => a set of hardware accelerators acting as coprocessors
    => cost for software and speed for hardware

Previous work (2)
  - Hardware/software for distributed systems
    => the mapping of a set of communicating processes onto a set of interconnected processors
    => behavioral decomposition, processor allocation, communication transformation
    => partitioning methods restrict the cost function to parameter
  - Co-design tools and environments
    => CoWare, SpecSyn, SIERA, Ptolemy

System Specification
  - Overview: the stage in which the system is described in a unified representation
  - Characteristics
    - Must support unified design and analysis techniques for hardware and software
    - System tasks can easily be moved to hardware or to software
    - The system can be evaluated in an integrated design environment
    - Fast performance analysis is possible
  - System-level languages: SystemVerilog, SystemC, SpecC, etc.

System-Level Language
  - Why it is needed
    - The complexity of system design keeps growing
    - High-level abstraction and modeling are required
    - An efficient system design flow is needed
  - What it must provide
    - Support for system models at multiple abstraction levels
    - Integration of embedded software into the overall system
    - Generation of an executable design specification
    - Generation of an executable platform model

Types of System-Level Language
  - C/C++ based: SystemC, Cynlib, SoC++, Handel-C, A/RT (Library)
  - VHDL/Verilog replacements: VHDL+, SystemVerilog
  - Higher-level languages: SDL, SLDL
  - Entirely new language: SUPERLOG
  - Java based: Java
SystemVerilog
  - Features
    - Improves the productivity and readability of Verilog code
    - Provides concise hardware description
    - Extends Verilog-2001 toward high-level abstraction
    - Integrates assertions into Verilog to extend verification
  - References
    - SystemVerilog 3.1, ballot draft: Accellera's Extensions to Verilog, Accellera, Napa, California, April 2003
    - Verilog 2001: A Guide to the New Verilog Standard, Stuart Sutherland, Kluwer Academic Publishers, Boston, Massachusetts, 2001
    - http://www.eedesign.com/story/OEG20030521S0086

SystemC Design Flow
  - [Figure]

Trends in System-Level Languages
  - Extending existing design languages (e.g., SystemVerilog)
    - Supports system models at multiple abstraction levels
    - Advantages: syntax and environment familiar to designers; compatibility with earlier versions
    - Disadvantages: standardization is not yet complete; hardware description languages are hard to learn
  - Using C/C++-based languages (e.g., SystemC)
    - Advantages: allows highly abstract, logical descriptions; well suited to executable specifications
    - Disadvantages: HW/SW partitioning is not easy; hardware characteristics are hard to describe completely

General Co-design Process
  - [Figure repeated from the earlier slide of the same title]

HW/SW Partitioning: Overview
  - Definition: in HW/SW co-design, partitioning means restructuring a high-level behavioral description of the system into a hardware part and a software part
  - Goal: partition so that both cost and performance targets are met, taking into account system performance, area, latency, and communication overhead
  - Characteristics
    - Hardware implementation
      - High performance from hardware speed and parallel execution of tasks
      - Higher hardware cost, because additional ASICs or FPGAs are needed
    - Software implementation
      - Lower hardware cost, because the code runs on an inexpensive high-performance processor
      - Lower performance, because operations execute sequentially
Advantages of HW/SW Partitioning
  - Minimizes the cost and time of system development
  - Optimal price/performance ratio
    - Balances hardware performance against the low cost of software
  - Fewer iterations of the [partition -> evaluate -> repartition] loop
  - Market competitiveness (time to market)
  - Shorter verification time in the test phase
  - Flexible system description
  - A software implementation is easy to improve and maintain
  - Many forms of system can be designed

HW/SW Partitioning Flow
  - [Figure: System Specification and Target Architecture Library -> Target Architecture, SW Compiler -> HW Cost Estimation, SW Cost Estimation -> Partitioning Graph -> HW/SW Partitioning -> HW, SW]

Considerations in HW/SW Partitioning
  - HW/SW behavior characteristics: partition with the characteristics of both hardware and software in mind
  - Sharing of hardware functions: share hardware functions with hardware area and execution time in mind
  - Communication between HW and SW: account for the additional communication time between hardware and software
  - Scheduling: adjust execution time and execution order between hardware and software
  - Functional pipeline: optimize communication and execution time between hardware and software

HW/SW Behavior Characteristics
  - Hardware implementation: the modules operate in parallel; hardware area increases
  - Software implementation: the modules operate sequentially; execution time increases
  - SW implementation on processor P1:
      void V1(..){..}
      void V2(..){..}
      void main() { V1(..); V2(..); }
  - [Figure: the SW schedule runs V1 and then V2 on P1; the HW schedule runs V1 and V2 in parallel on hardware H1]

Sharing HW Functions
  - Hardware area can be minimized
  - Execution time increases
  - [Figure: V1 to V4 implemented with four functional units (FU1 to FU4), with two (FU1, FU2), or with one (FU1); fewer units take less area but lengthen the schedule]

Interfacing between HW and SW
  - Needed to transfer data between hardware and software
  - The additional communication time must be taken into account
  - [Figure: V1 on processor P1 and V2 on hardware H1, connected by channel C; the schedule runs V1 on P1, then the transfer on C, then V2 on H1]
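The three timing effects on the preceding slides (sequential software, shared hardware units, and channel transfer time) can be put into numbers with a small sketch. It is only an illustration under stated assumptions: the tasks are taken to be independent, hardware sharing uses simple list scheduling, and the function names `sw_time`, `hw_time`, and `mixed_time` are hypothetical, not from the slides.

```python
import heapq


def sw_time(task_times):
    """Software on one processor: tasks run one after another."""
    return sum(task_times)


def hw_time(task_times, units):
    """Hardware with `units` functional units shared by independent tasks.

    Assumption: each task is given to the unit that becomes free first
    (simple list scheduling); `units` must be at least 1.
    """
    free_at = [0] * units              # time at which each unit becomes free
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)  # earliest available unit
        heapq.heappush(free_at, start + t)
    return max(free_at)


def mixed_time(v1_on_p1, transfer_on_c, v2_on_h1):
    """V1 in software, then the transfer on channel C, then V2 in hardware."""
    return v1_on_p1 + transfer_on_c + v2_on_h1


if __name__ == "__main__":
    tasks = [3, 3, 3, 3]               # hypothetical execution times of V1..V4
    print("software only:", sw_time(tasks))
    for units in (4, 2, 1):
        print(f"hardware, {units} unit(s):", hw_time(tasks, units))
    print("V1 on P1 + channel + V2 on H1:", mixed_time(4, 1, 2))
```

With four equal tasks, halving the number of functional units doubles the schedule length, which is the area/time trade-off the sharing slide illustrates; the channel term in `mixed_time` is the extra cost that the interfacing slide asks the partitioner to include.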
Logical Bus Architecture
  - System bus signals
    - Address, data, and control signals
    - The address space consists of the memory space and the I/O space
    - Memory space: memory of the SW component
    - I/O space: ports within SW and registers in other HW
  - Port signals: specialized signals capable of directly interfacing between SW and HW components
  - Interrupt signals: raised when SW and HW components have completed an operation, or when an error condition is detected

Scheduling
  - Channel accesses: while the processor (SW) is reading or writing data over the communication channel, no other operation (HW or SW) can take place
    - [Figure: invalid and valid schedules of V1, V2, V3 across HW and SW with channel writes (w1, w3) and reads (r3)]
  - Execution order: [Read -> Execution -> Write]
    - Read the data over the communication channel
    - Execute once the required data has been read
    - Write the result back to the communication channel
    - [Figure: invalid and valid orderings of r1, r2, w1, w2 for V1 and V2 in SW]

Functional Pipelining
  - Optimizes the execution time of the whole system
  - [Figure: total execution time of V1 to V4 over HW and SW without pipelining and with pipelining]

Partitioning Graph (Computation Models)
  - State-oriented models: represent the system as a set of states and a set of state transitions
    - Finite State Machine (FSM)
    - Hierarchical Concurrent FSM
      - Has a hierarchical structure
      - The sub-state structure operates concurrently with the structure above it
  - [Figure: an FSM and a hierarchical concurrent FSM with states S00, S01, S02, S10 to S13 and transitions labeled C/10, C/00, C/01, C'/00]

Partitioning Graph (Computation Models)
  - Activity-oriented models: represent the system as a set of activities linked by data or control dependencies
    - Data Flow Graph (DFG)
    - Control Flow Graph (CFG)
    - Control/Data Flow Graph (CDFG): a model that combines the CFG and the DFG; commonly used in HW/SW co-design.
  - [Figure: example DFG (reads a, b, c, d; operators - and +; writes r), CFG (start; read i; i = 1; if i <= 10: read X(i), read Y(i), Z(i) = X(i)*Y(i), i = i + 1; end), and the CDFG that combines them]

Software Cost Estimation
  - Software program memory: estimated from the number of instructions in the assembler code
  - Software data memory: estimated from the memory requirements of all variables in the source code
  - Software execution time
    - Instruction execution times are estimated from the instruction set description
    - Subroutine (e.g., function call) execution times are estimated from the subroutine library
  - [Figure: C specification -> C compiler -> assembler code; together with the instruction set description and the subroutine library -> SW estimation -> partitioning graph]

Hardware Cost Estimation
  - C to VHDL: the C code is converted to VHDL code in order to estimate hardware cost
  - Hardware cost estimation: the actual hardware area and execution time of the VHDL code are estimated using the architecture library
  - [Figure: C specification -> C-VHDL generator -> VHDL code; together with the target architecture -> HW estimation -> partitioning graph]

Classification of Partitioning Algorithms
  - Constructive algorithms
    - Group the objects before partitioning
    - Grouping is based on the closeness between objects
    - As the circuit grows, the grouping step takes excessive run time
  - Iterative algorithms
    - Start from an initial partition produced by some method and repeatedly move objects to improve the result
    - Allow a more accurate evaluation than the closeness function used by constructive algorithms
    - Examples: greedy algorithm, simulated annealing

Clustering Partitioning Algorithm
  - A constructive algorithm that uses closeness
  - Algorithm
    - Group the objects with the highest closeness
    - Recompute closeness
    - Repeat until the termination condition is met
  - [Figure: objects O1 to O4 with pairwise closeness values (30, 25, 15, 10, ...); after each merge, closeness to the new group is recomputed as an average, e.g. Avg(10,10) = 10 and Avg(15,25) = 20]

Simulated Annealing
  - Iterative algorithm modeled after the physical annealing process
  - Algorithm
    - Start from an initial partition and an initial temperature
    - Lower the temperature slowly
    - Generate random moves at each temperature
    - Moves that improve the partitioning cost are applied
    - While the temperature is high, moves that worsen the partitioning cost may be applied as well
    - The quality of the result and the run time (complexity) depend on the temperature decrease rate
  - Reference: Kirkpatrick, S., Gelatt, C. D., Vecchi, M. P., "Optimization by simulated annealing", 1983
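The annealing steps listed above can be tried out with a short runnable sketch. Everything specific in it is an assumption made for illustration: the task table `TASKS`, the area budget, the penalty-based cost function, the geometric cooling schedule, and the fixed number of moves per temperature are hypothetical, and the sketch additionally remembers the best partition seen, which the slides do not mention.

```python
import math
import random

# Hypothetical per-task estimates: (sw_time, hw_time, hw_area).
TASKS = [(10, 2, 5), (8, 3, 4), (6, 1, 6), (4, 2, 3)]
AREA_BUDGET = 10  # assumed hardware area limit


def objfct(partition):
    """Cost of a partition; partition[i] is True when task i is in hardware."""
    time = sum(hw if in_hw else sw
               for (sw, hw, _), in_hw in zip(TASKS, partition))
    area = sum(a for (_, _, a), in_hw in zip(TASKS, partition) if in_hw)
    return time + 100 * max(0, area - AREA_BUDGET)  # penalize area overrun


def move(partition, rng):
    """Random move: flip one task between hardware and software."""
    i = rng.randrange(len(partition))
    flipped = list(partition)
    flipped[i] = not flipped[i]
    return flipped


def accept(delta_cost, temp):
    """min(1, e^(-delta_cost/temp)): improvements are always accepted."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temp)


def anneal(initial, temp=50.0, cooling=0.9, moves_per_temp=20,
           frozen=0.1, seed=0):
    rng = random.Random(seed)
    p, cost = list(initial), objfct(initial)
    best, best_cost = list(p), cost          # addition: keep the best seen
    while temp > frozen:                     # "not frozen"
        for _ in range(moves_per_temp):      # stand-in for "equilibrium"
            p_tentative = move(p, rng)
            cost_tentative = objfct(p_tentative)
            if accept(cost_tentative - cost, temp) > rng.random():
                p, cost = p_tentative, cost_tentative
                if cost < best_cost:
                    best, best_cost = list(p), cost
        temp *= cooling                      # geometric temperature decrease
    return best, best_cost


if __name__ == "__main__":
    all_software = [False] * len(TASKS)
    print("initial cost:", objfct(all_software))
    print("annealed:", anneal(all_software))
```

Starting from an all-software partition mirrors the software-oriented style of partitioning; a slower cooling rate or more moves per temperature trades run time for a better chance of escaping local minima.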
Simulated Annealing (Cont'd)
    temp = initial temperature
    cost = objfct(P)
    while not Frozen loop
        while not Equilibrium loop
            P_tentative = Move(P)
            cost_tentative = objfct(P_tentative)
            Δcost = cost_tentative - cost
            if (Accept(Δcost, temp) > Random(0,1)) then
                P = P_tentative
                cost = cost_tentative
            end if
        end loop
        temp = DecreaseTemp(temp)
    end loop

    Accept(Δcost, temp) = min(1, e^(-Δcost/temp))

Related Work on HW/SW Partitioning
  - POLIS: U.C. Berkeley
    - Performance is estimated through co-simulation, and the user partitions the system into hardware and software by hand
  - COSYMA (software-oriented partitioning)
    - Uses simulated annealing driven by cost estimation
    - The system is initially implemented entirely in software; the parts that are performance bottlenecks are identified and moved to hardware to minimize total execution time
  - Vulcan (hardware-oriented partitioning): Stanford U.
    - Uses a greedy algorithm
    - For a system described in HardwareC, the greedy algorithm moves noncritical operations to software to minimize hardware size
  - Ptolemy
    - Uses a greedy algorithm
    - Provides an integrated environment for system simulation and synthesis
    - Instead of describing the whole system in one unified, general-purpose language, it uses the representation best suited to each subsystem and provides a unified interface between the heterogeneous representations

General Co-design Process
  - [Figure repeated from the earlier slide of the same title]
HW/SW Co-Synthesis
  - The stage that synthesizes each hardware and software component, and the interfaces, from the partitioning result
  - Optimizes performance to fit the actual system architecture
  - Hardware synthesis
    - FPGA or ASIC
    - HDL structural description
    - Existing EDA tools: Design Compiler, Synplify
  - Software synthesis
    - Processor (ARM, Teak DSP, ...)
    - C or assembly code generation
    - Compile & optimize
  - Interface synthesis
    - Communication channel (bus, shared memory, ...)
    - Bus and protocol generation
  - [Figure: the partition splits into SW and HW parts with matching send/recv pairs, yielding software with interface and hardware with interface]

Hardware Synthesis
  - Definition: the stage that automatically implements a hardware design from a design specification written in a hardware description language (HDL)
  - Goals
    - Fast creation and modification of designs
    - Support for a methodology that offers multiple design alternatives
    - Relieves the VLSI designer of handling excessive detail
    - Makes correctly designed implementations possible

Hardware Synthesis (cont'd)
  - Communication is taken into account: receive, send, and control logic are generated
  - [Figure: task graph A to G; tasks B, C, D are mapped to hardware with receive logic (Recv 1), send logic (Send 1, Send 2), control logic, and clock, attached to the communication network]

Wrapper for Hardware Core
  - ASIC wrapper
    - Generates signals according to the network protocol
    - Architecture independent: the core ASIC needs no modification
  - [Figure: processor and ASIC connected over the data bus through the wrapper, with protocol signals Output Ready, Input Ready, Start]

Software Synthesis
  - Definition: the stage that automatically generates correct and efficient software from specifications and reusable components
  - Goals
    - Higher software productivity
    - Lower development cost
    - More reliable software implementations that satisfy the specification
    - Correct programs
    - Minimal memory usage (code, data)

Software Synthesis (cont'd)
    ...
    void main() {
        ...
        /* hardware execution */
        recv(data);
        B();
        C();
        D();
        send(data1, data2);
        /* hardware execution */
        ...
    }
  - [Figure: task graph A to G; tasks B, C, D are mapped to software with Recv 1, Send 1, Send 2]
Interface Synthesis
  - The stage that synthesizes what is needed for heterogeneous components to communicate
  - Interface components
    - Hardware: bus interface, glue logic
    - Software: device driver, operating system
  - Depends on the target architecture
  - [Figure: a processor (operating system, device drivers) and an ASIC, each attached through a network interface to the on-chip network (communication architecture)]

Types of Model, Channel, and Protocol
  - Communication models
    - Message passing
    - Shared memory communication
  - Communication channels
    - Dedicated lines
    - Bus
    - FIFO buffers
    - Shared memory
  - Communication protocols
    - 2-phase or 4-phase handshake
    - RS-232, USB, PCI, etc.
  - Inter-process communication models
    - Shared memory (shared memory M)
        Process A: begin variable i; ... M := i; ... end
        Process B: begin variable j; ... j := M; ... end
    - Message passing (channel)
        Process A: begin variable i; ... send(i); ... end
        Process B: begin variable j; ... receive(j); ... end

Characteristics of Communication Channels
    Communication channel   Communication model   Blocking or non-blocking   Topology
    Dedicated lines         Message passing       Blocking                   Point-to-point
    Bus (without memory)    Message passing       Blocking                   Multi-way
    FIFO                    Message passing       Non-blocking               Point-to-point
    FIFO                    Shared memory         Non-blocking               Multi-way

Channel Refinement
  - Turns the channels over which messages are sent into an actual communication network
  - Bus generation: determine the bus width (the number of data lines)
  - Protocol generation: define the transfer mechanism used over the bus
  - [Figure: channels among F1, F2, F3 refined into a microprocessor (F1, OS, device drivers) and two ASICs (F2, F3), each with a network interface on the physical communication network]
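As a rough illustration of the message-passing model from the inter-process communication slide above, the sketch below runs Process A and Process B as threads joined by a bounded FIFO channel. It is an assumption-laden sketch rather than a model of any real bus: Python's standard `queue.Queue` stands in for the channel, `put` and `get` stand in for `send(i)` and `receive(j)`, and the `None` end-of-stream marker and the `depth` parameter are hypothetical additions.

```python
import queue
import threading


def process_a(channel, values):
    """Sender: corresponds to 'send(i)' on the slide."""
    for i in values:
        channel.put(i)        # blocks only while the FIFO is full
    channel.put(None)         # hypothetical end-of-stream marker


def process_b(channel, received):
    """Receiver: corresponds to 'receive(j)' on the slide."""
    while True:
        j = channel.get()     # blocks until a message is available
        if j is None:
            break
        received.append(j)


def run_message_passing(values, depth=1):
    """Run A and B concurrently over a FIFO channel of the given depth."""
    channel = queue.Queue(maxsize=depth)
    received = []
    a = threading.Thread(target=process_a, args=(channel, values))
    b = threading.Thread(target=process_b, args=(channel, received))
    a.start()
    b.start()
    a.join()
    b.join()
    return received


if __name__ == "__main__":
    print(run_message_passing([1, 2, 3]))
```

A deeper FIFO lets the sender run further ahead of the receiver, which is the sense in which the table lists FIFO channels as non-blocking; with `depth=1` the sender stalls as soon as one message is waiting.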
General Co-design Process
  - [Figure repeated from the earlier slide of the same title]

What is HW/SW Co-verification?
  - A method of verifying circuit behavior that considers together the hardware implemented as an ASIC and the software running on the microprocessor
  - Used to verify systems that contain both hardware and software elements, such as SoCs and embedded systems
  - Finds and fixes quickly the problems that hardware and software elements may have at each design stage
  - [Figure: HW/SW co-verification setup. Labels: Software Code, Compiler, Link, Debug Files, HDL Code, Processor Model, Configuration File, Name, Specification, Debug Information, Memory Definition, Setup]

Co-verification Methods
  - Simulation-based
    - Advantages: flexibility, high visibility
    - Disadvantage: verification is very slow
    - Economical, because it can be used before the actual implementation exists
  - Emulation-based
    - Advantages: speed, ICE capability
    - Disadvantages: high cost, low visibility
    - Allows verification while observing the system actually running

Co-simulation
  - Soft or virtual prototype
  - Simulator features
    - ISS (C/C++)
    - HDL simulator (VHDL, Verilog)
  - Hardware simulation
    - The hardware is implemented in behavioral-level HDL and simulated with a logic simulator
  - Software simulation (software models for the system)
    - An Instruction Set Simulator (ISS) runs the compiled code on the host
    - An abstract RTOS must be supported
    - Peripherals are written as C models and simulated
  - Interface simulation
    - Uses virtual transactions
    - A wrapper is written in behavioral-level HDL for the abstract device driver and the hardware logic
  - [Figure: ISS (Instruction Set Simulator) <-> virtual interface / bus functional model <-> HDL simulator]
Co-emulation
  - Real prototype: hardware models for the system
  - Emulation system features
    - FPGA for the HW prototype
    - Real CPU for the SW code
    - Peripherals for I/O
  - An emulation system runs much faster than a simulator, so verification time is shortened
  - [Figure: emulation system with CPU, DSP, FPGA, memory, UART, etc., driven by a system-level testbench]

Co-verification Strategy
  - Verify the system with simulation and emulation, as appropriate to each design stage
  - Early design stage
    - Verification by co-simulation using a virtual prototype
    - Verifying the prototype quickly removes the relatively simple problems that appear early in the design
  - Mid-design and implementation stages
    - Verification by co-emulation using a real prototype
    - Removes problems that appear at the implementation stage, such as those arising at RTL or gate level

Mentor Graphics Seamless CVE
  - Seamless® provides a strong co-simulation environment through a hardware simulator, ISS, abstract RTOS, and virtual interface
  - [Figure: ISS <-> bus transaction <-> HDL simulator]

Altera Excalibur
  - Combines an ARM core, AHB, and FPGA in a single chip
  - Provides a fast and capable HW/SW verification environment at the implementation stage
  - [Figure: ARM core, AHB, FPGA]

EDA Co-design/Co-simulation Tools
    Company           Product                    Feature
    Cadence           VCC                        HW/SW co-design tool
    Coware            N2C                        C/C++ based high-level co-design tool
    Mentor Graphics   Platform Express           Platform based SoC design tool
    Mentor Graphics   Seamless CVE               HW/SW co-simulation tool
    Synopsys          CoCentric System Studio    SystemC based co-design & co-verification tool
Design Flow using Seamless CVE
  - Typical system design process (labeled "inefficient" in the figure): Specify System -> Design Hardware and Design Software -> Integrate & Test -> OK? (No: go back; Yes: done)
  - Alternative shown: Specify system -> Design HW & SW -> Test together

Design Flow using Seamless CVE: Logic Simulation
  - Requires a fully functional microprocessor model
    - Bus functional models are not fully functional
    - Software models are too slow
    - Software models may not be available
    - Hardware models have limited capability
  - Limited debugging capability
  - Okay for verifying hardware
  - Ineffective for running software

Design Flow using Seamless CVE: Instruction Set Simulation
  - Fast
  - Good debugging capability
  - Can model custom hardware
  - Limited I/O and interrupt handling

Design Flow using Seamless CVE: Seamless CVE
  - [Figure labels: X-ray for Debug; Software Debug; Software Simulation; ModelSIM for Simulation; Seamless; System Control; Hardware Debug; Hardware Simulation; Performance Optimization]

Design Flow using Seamless CVE
  - [Figure]

Platform Classification
  - Application platform
    - Multimedia platform: Nexperia, TI OMAP
    - 3G wireless platform: Infineon M-gold
    - Bluetooth platform: Parthus
    - Wireless platform: ARM PrimeXsys
  - Processor-centric platform: Improv Systems, ARC, Tensilica, Triscend
  - Communication-centric platform: Sonics, Palmchip

The Platform-Based Design Concept (Cadence)
  - [Figure labels: Pre-Qualified/Verified Foundation-IP*; HW-SW Kernel + Reference Design; Scaleable bus, test, power, IO, clock, timing architectures; MEM; CPU; FPGA; Hardware IP; SW IP; Application Space; Reconfigurable Hardware Region (FPGA, LPGA, ...); Programmable]
  - *IP can be hardware (digital or analogue) or software.
    IP can be hard, soft or 'firm' (HW), source or object (SW).
  - [Figure labels, continued: Processor(s), RTOS(es) and SW architecture; Foundry-Specific HW Qualification; SW architecture characterisation]

WCDMA BER with Image Quality Verification (Cadence)
  - A: WCDMA channel, floating point
  - B: WCDMA channel, fixed point
  - Polymorphism for rapid conversion of floating-point to fixed-point models
  - Image Quality Tester (IQT) metrics
    - modulation transfer function area (MTFA)
    - integrated contrast sensitivity (ICS)
    - square root integral (SQRI)
    - subjective quality factor (SQF)
    - folded SQRI
    - folded MTFA
    - peak signal/noise ratio (PSNR)

Platform Architecture
  - Do I need a dedicated DSP?
  - Which microcontroller? ARM? HC11? ARC?
  - Which RTOS do I use?
  - Which scheduling policy do I have to choose?
  - How fast will my user interface software run?
  - How much can I fit onto my microcontroller?
  - Which bus? PI? AMBA? A dedicated bus for the DSP?
  - Can I buy a QCELP decoding core?
  - Do I need dedicated HW, or can I run this on the microcontroller?

Platform-based HW/SW Coverification
  - [Figure]

Ingredients of An Architecture Platform
  - [Figure]

Example of Platform-based Design
  - [Figure]

Pros & Cons of Platform-based Design
  - [Figure]

Triscend A7 System Highlights
  - [Figure]

Cypress MicroSystems - PSoC(TM)
  - [Figure]

Wipro's SOC-RaPtor(TM) Architecture
  - [Figure]

Philips Rapid System Prototyping
  - [Figure]

Embedded Software Architecture for SoC Design
  - [Figure]
Solutions to Derivative Design Problem
  - The corrupted voice mail decoding is due to the JPEG decoding having a higher priority than the QCELP audio decoding of the voice mail on the DSP.
  - There are 2 possible solutions:
    1. HW/SW tradeoff: move the JPEG decoding, which stalled the QCELP audio decoding, into hardware
    2. SW/SW tradeoff: re-prioritise the QCELP audio decoding
  - We will explore option 1, by moving part of the JPEG decoding (IDCT) to dedicated HW.

JPEG Encoding/Decoding
  - Design of Systems-on-a-Chip, Masaharu Imai, School of IST, Osaka University
    E-mail: imai@ics.es.osaka-u.ac.jp
    http://vlsilab.ics.es.osaka-u.ac.jp/~imai/
  - Principle of codesign method
  - Specification of Machikane-I
  - JPEG encoding and decoding
  - Block diagram of Machikane-I
  - Estimation of design quality
  - Optimum partitioning

Design strategy
  - Put high-rate but simple functions on peripheral processors. This also moves control physically closer.
  - Consolidate low-rate background tasks on the main CPU.
  - Can multiple processes execute concurrently?
  - Is the performance granularity of available components fine enough to allow efficient search of the solution space?
  - Do computation and communication requirements conflict?
  - How accurately can we estimate performance?
  - [Figure labels: software, custom ASICs]

Granularity of Description
  - [Figure]

Tradeoff between HW cost and Performance
  - [Figure]

Spec. of Digital Motion Camera
  - Functions
    - Image input
    - Image display
    - Image compression
    - Image store
    - Image transmission
    - User interface
  - Encoding procedure
    - R2Y: transform from RGB to YCbCr
    - DCT: Discrete Cosine Transform
    - Q: quantization
    - VLC: Variable Length Coding
  - Decoding procedure
    - VLD: Variable Length Decoding
    - IQ: inverse quantization
    - IDCT: Inverse Discrete Cosine Transform
    - Y2R: transform from YCbCr to RGB
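The first and last steps of the codec above, R2Y and Y2R, are simple enough to sketch per pixel. The slide gives no coefficients, so the sketch assumes the full-range BT.601 constants commonly used with JPEG (JFIF); the function names `r2y` and `y2r` follow the slide's labels, while `_clamp` and the 8-bit rounding are assumptions.

```python
def _clamp(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def r2y(r, g, b):
    """RGB -> YCbCr, assuming full-range BT.601 (JFIF) coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp(y), _clamp(cb), _clamp(cr)


def y2r(y, cb, cr):
    """YCbCr -> RGB, the inverse of r2y under the same assumption."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp(r), _clamp(g), _clamp(b)


if __name__ == "__main__":
    pixel = (200, 100, 50)
    ycc = r2y(*pixel)
    print(pixel, "->", ycc, "->", y2r(*ycc))
```

Each pixel needs only a handful of multiply-accumulate operations with fixed constants, which is why a step like this is a natural candidate when deciding which codec functions to move into hardware.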
Block Diagram
  - [Figure]

CODEC
  - [Figure]

Software Implementation
  - [Figure]

Some Functions by Hardware
  - [Figure]

Computation Time
  - [Figure]

Pipeline Processing of Images
  - [Figure]

Case 1: Parallel Processing
  - [Figure]

Case 2: Sequential Processing
  - [Figure]

Partitioning Methods
  - [Figure]

Partitioning Method C
  - [Figure]

Partitioning Method D
  - [Figure]

Partitioning vs. Design Quality
  - [Figure]

Partitioning vs. Design Quality
  - [Figure]

Partitioning Method: Grouping of Similar Process Components
  - Y. Endo, H. Koizumi, Dept. of Computer & System Eng., Tokyo Denki Univ., Japan
  - [Figure: the design target model is divided into process components 1 to n. Major classification assigns each component to HardWare, SoftWare, or Un-Decision. Undecided components are collected into groups (Group 1 to Group 6). Detailed classification then assigns the groups to HardWare or SoftWare.]

Major Classification
  - The design target model is described in Visual Basic and analyzed in order to split it into process components.
  - The software speed of each process component is estimated.
  - Division into HW parts and SW parts
    - A component whose processing speed does not meet the expected result becomes an HW part.
  - Func.
    trade-off: because reducing the number of process components greatly shortens the trade-off work, similar process components are grouped to reduce the number of components to be traded off.

Detailed Classification
  - Methods for calculating HW processing time
    - Multiply the model's SW speed by a coefficient constant
    - Apply a knowledge database corresponding to the SW logic to calculate the HW processing speed
  - Communication time between HW and SW
    - Calculated from the communication speed between HW and SW and the amount of data transferred
  - If no combination satisfies the design objectives and constraints, the system algorithm is improved and the design objectives and constraints are re-examined.

Development of a Real-time Motion Image Encoder using Codesign Methodology
  - C.W. Chau, S. Kwong, K.F. Man, W.A. Halang and A.D. Stoyenko
    City University of Hong Kong; FernUniversität, Hagen, Germany; New Jersey Institute of Technology
  - A set of design constraints: area, real-time requirements, performance, memory requirements, power consumption, and programmability
  - A complete design cycle for a real-time image encoder
    - Develop or select an algorithm
    - Use high-level functional simulations to verify the suitability of the algorithm
    - Break the algorithm down into several sub-functions
    - Decompose the algorithm by hardware/software co-design
      - Software implementation: functions that need field programmability or that are inherently better implemented in software
      - Hardware implementation: time-critical functions, to improve system performance
Introduction
  - Hardware design: selection of the processor, the number of processors, and their connections
  - Software design
    - Different types of processors affect code generation
    - When using a fixed-point DSP, overflow prevention must be included
    - The algorithm must be modified to minimize the effect of the finite precision of fixed-point calculations
  - Interface design
    - In hardware design: adding latches, buffers, or address decoders
    - In software design: handling I/O routines and the hardware/software synchronization mechanism

System model of the real-time image encoder
  - [Figure 2. System model of the image encoder. Labels: PC/AT or compatible computer (master); slave: Interface Logic, Motion Compensation, Image Compensation, Buffer Control. Signals: motion image, motion vectors and estimation errors, compressed image data, compression control parameters]

System model of the real-time image encoder
  - Master-slave model for the system (PC/AT: master; image encoder: slave system)
  - Purpose of the real-time image encoder: compress a stream of motion images with a resolution of 128 by 128 pixels
  - Slave system: a coprocessor that reduces the workload of the host system
  - Host system: gains more time to handle other tasks such as the user interface and data logging

System specifications
    Description         Requirements
    Size of image       128 by 128 pixels
    Compression ratio   Adjustable; 30 times compression with 29 dB SNR
    Processing speed    Maximum compression speed = 15 frames/sec
    Host system         PC/AT or compatible computer
    Size                Same as PC/AT add-on card
    Cost                Less than US $200.00
Interface logic module
- DMA interface: implements two data transfer channels
  - one for transferring motion images from the master to the real-time image encoder
  - one for transferring the compressed image data back to the master
[Figure: interface logic. Blocks: PC/AT, PC/AT I/O space decoder, two DMA interfaces with FIFO buffers, configuration and control registers; connected modules: motion compensation, buffer control, image compression.]

Motion Compensation Module
[Figure: a searching window containing an 8 by 8 pixel block.]

Image Compression Module
- The Discrete Cosine Transform (DCT) algorithm transforms the estimation error from the time domain to the frequency domain.
- After the transformation, each block consists of DC data (the mean of the pixels in the block) and frequency data.
- The amount of data in the frequency domain is much smaller than in the time domain, since only edge information remains in the frequency domain.
- Huffman coding is performed on the quantized data and the motion vectors for further reduction.

Buffer Control Module
- Maintains a constant bit rate at the output of the real-time image encoder.
- A constant bit rate matters for applications such as picture phones and teleconferencing, because variable-bit-rate image compression causes uneven transmission time and jagged playback in these applications.
- Timeout timer: indicates when the transfer bit rate is too high or too low relative to the required rate.
- When either state is active, the buffer controller requests the image compression module to adjust the image compression ratio so that the required transfer bit rate is maintained.
[Figure: buffer control. Blocks: interface logic, request timer, timeout timer, buffer controller, image compression module, FIFO buffer.]
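The feedback described for the buffer controller can be sketched as a small adjustment function. This is only an illustration under assumptions: the source does not give the step size, the ratio limits, or the direction of adjustment, so `step`, `min_ratio`, `max_ratio`, and the rule "compress harder when the bit rate is too high" are all hypothetical.

```python
def adjust_compression_ratio(ratio, rate_too_high, rate_too_low,
                             step=1.0, min_ratio=1.0, max_ratio=60.0):
    """Return the compression ratio the buffer controller would request next.

    Assumption: a bit rate that is too high is lowered by compressing harder
    (larger ratio); a bit rate that is too low is raised by compressing less.
    """
    if rate_too_high:
        ratio += step
    elif rate_too_low:
        ratio -= step
    # Keep the request inside the (hypothetical) range the compressor supports.
    return max(min_ratio, min(max_ratio, ratio))


ratio = 30.0
ratio = adjust_compression_ratio(ratio, rate_too_high=True, rate_too_low=False)
print(ratio)
```

When neither timeout state is active the ratio is left unchanged, which corresponds to the encoder already running at the required bit rate.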
Software/Hardware Partition
- Interface logic module: implemented only in hardware, because it is the hardware interface between the ISA bus and the real-time image encoder.
- Several hardware/software partitioning methods are considered for the other three modules.
[Figure: system architecture. Blocks: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, DSP56001R40, memories (ROM, SRAM).]

Method 1
- All functions except interface logic and buffer control are implemented in software.
- The DSP56001R40 is used as the processor: low cost (fixed-point DSP), high precision (24-bit word length), high performance (40 million operations per second), and simple hardware implementation (simple bus interface).
- Simulator: sim56000 for the DSP56000 family
- Clock cycles for compressing an image: 7.27*10^6
- Compression speed of this system: 5.5 frames per second
- This design therefore cannot meet the system specification of 15 frames/sec.

System architecture
[Figure: system architecture. Blocks: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, memories (REG-FIFO, BANK1, and BANK2), DSP56001R40, memories (ROM, SRAM).]

Method 2
- The IN-FIFO is used to prefetch the next image block.
- The REG-FIFO holds the current image block during block searching. To keep the data for further use, it must be written back to the FIFO immediately.
- BANK1 and BANK2 are used as a double image buffer. One bank stores the previously decoded frame, and the other is filled with the currently decoded frame by the external image compression module.
- After motion compensation of the current frame finishes, the two banks switch roles and motion compensation of the next frame starts.
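The compression speed quoted for Method 1 follows from the cycle count once a clock rate is fixed. The sketch below assumes the simulated cycles run at 40 MHz; that clock value is an assumption (the slides do not state it), chosen because it reproduces the reported 5.5 frames per second.

```python
def frames_per_second(clock_hz: float, cycles_per_frame: float) -> float:
    """Frames compressed per second when one frame costs a fixed number of cycles."""
    return clock_hz / cycles_per_frame


CLOCK_HZ = 40e6               # assumed clock rate, not stated in the slides
METHOD1_CYCLES = 7.27e6       # clock cycles per image, from the Method 1 slide
REQUIRED_FPS = 15.0           # processing speed from the system specifications

fps = frames_per_second(CLOCK_HZ, METHOD1_CYCLES)
meets_spec = fps >= REQUIRED_FPS
print(f"Method 1: {fps:.1f} frames/sec, meets specification: {meets_spec}")
```

Under the same assumption, reaching 15 frames/sec would require at most 40e6 / 15, or roughly 2.67 million cycles per image, which indicates how much work has to leave the DSP.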
Motion compensation module
[Figure: motion compensation datapath. Blocks: IN-FIFO (from PC/AT), REG-FIFO, multiplexers, subtract unit, absolute unit, accumulator, comparator, MIN-ERR, MIN-VTR, BANK1 and BANK2 with address counters CTR1 and CTR2, control unit; outputs to the image compression module and to the interface logic.]

Method 2 (continued)
- CTR1 and CTR2 generate the addresses for the two SRAM banks.
- The subtraction unit, absolute unit, accumulator, comparator, MIN-ERR register, and MIN-VTR register are used to find the minimum total absolute error and the corresponding motion vector for the current block search.
- If the error of the current iteration is smaller than the error in MIN-ERR, MIN-ERR and MIN-VTR are updated to the new minimum total absolute error and the corresponding motion vector, respectively.
- After all iterations of the current block search, the motion vector with the minimum error is stored in REG-VTR.
- The motion compensation chip signals the external image compression module to read this motion vector.
- Clock cycles used for each image: 2.43*10^6

Method 3
- All modules are implemented in hardware.
- Interface logic, buffer control, and the motion compensation module are implemented as before.
- The image compression module, however, cannot be implemented in a single chip; at least two Actel A1280 devices are needed.
- Two A1280s alone exceed the system cost limit, so this approach is rejected.

System architecture
[Figure: system architecture. Blocks: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, motion compensation chip, memories (REG-FIFO, BANK1, and BANK2), DSP56001R40, memories (ROM, SRAM), FIFO buffer, 8051 MCU.]
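The block search performed by the motion compensation datapath (subtract, absolute value, accumulate, compare, then update MIN-ERR and MIN-VTR) can be mirrored in software as below. This is a sketch, not the chip's implementation: the function names, the search range, and the tiny 8 by 8 test frame are assumptions made for the example.

```python
def total_absolute_error(block, frame, top, left):
    """Subtract, take absolute values, and accumulate over one candidate position."""
    n = len(block)
    return sum(abs(block[r][c] - frame[top + r][left + c])
               for r in range(n) for c in range(n))


def block_search(block, ref_frame, top, left, search_range):
    """Return (MIN-VTR, MIN-ERR): the best motion vector and its error."""
    n = len(block)
    height, width = len(ref_frame), len(ref_frame[0])
    min_err, min_vtr = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue  # candidate falls outside the reference frame
            err = total_absolute_error(block, ref_frame, y, x)
            # Comparator: update the registers only on a strictly smaller error.
            if min_err is None or err < min_err:
                min_err, min_vtr = err, (dy, dx)
    return min_vtr, min_err


# Hypothetical reference frame in which every pixel value is unique.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
# The 2 by 2 block actually sits at row 2, column 3 of the reference frame.
blk = [row[3:5] for row in ref[2:4]]
vector, error = block_search(blk, ref, top=1, left=1, search_range=2)
print(vector, error)
```

The exhaustive double loop is what makes this step time-critical in software and a natural candidate for the dedicated hardware described above.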
Method 4
- Hardware: interface logic, buffer control, and the motion compensation module. Software: the other modules.
- The 8051 MCU has enough internal memory for Huffman coding, so no external memory is required.
- A FIFO buffer serves as the interface between the 8051 and the DSP.
- DSP: estimation error, DCT/IDCT, buffer control
- Clock cycles used for each image: 2.1*10^6
- Compression speed: 15.7 frames/second
- An in-circuit simulator from Philips is used to simulate the Huffman coding.

Results
Method  Speed (frames/sec)  Cost (US$)  Description
1       5.5                 189.00      Hardware: interface logic, buffer control. Software: block searching, estimation error, DCT, inverse DCT, Huffman coding
2       16.46               295.00      Hardware: interface logic, buffer control, block searching. Software: estimation error, DCT, inverse DCT, Huffman coding
3       -                   -           Hardware: interface logic, buffer control, block searching, estimation error, DCT, inverse DCT, Huffman coding
4       15.45               280.70      Hardware: interface logic, buffer control, block searching. Software: estimation error, DCT, inverse DCT, Huffman coding

Conclusions
- Method 2 and method 4 can both be chosen to meet the system requirement; method 4 is selected for the prototype.
- Although method 4 compresses more slowly than method 2, it still satisfies the system specifications at a lower system cost.
- Because the processor in method 4 runs at a lower clock frequency, PCB layout is less difficult for method 4 than for method 2.
- With the codesign methodology, hidden design problems can be discovered at an earlier stage, which reduces development time and cost.

A hardware/software partitioning technique with hierarchical design space exploration
Houria Oudghiri, Bozena Kaminska, and Janusz Rajski, Mentor Graphics Corp.
- This paper describes a new hardware/software partitioning approach based on a new use of hierarchical modeling.
- A set of DSP examples is considered for co-design in order to accelerate their performance on a target architecture that includes a standard DSP processor running concurrently with a custom SIMD (Single Instruction Multiple Data) processor.

Introduction and motivation (1)
- Modern electronic systems contain a mix of software running on general-purpose programmable processors and algorithms hardwired into dedicated hardware.
- Hardware/software co-design attempts to integrate hardware and software design techniques, with the goal of incorporating more of the system design into a single design methodology.
- Co-design finds applications in various fields such as protocol design, car engine control, and parallelizing algorithms to be run on hardware and software.

Introduction and motivation (2)
- Hierarchical model
- The input is the system model, specified as a set of blocks or operations with all their interdependencies.
- The input language may be an HDL or a programming language.
[Figure: classical co-design process compared with the new process. Labels: select one level, system model, partitioning, NEW, partitions 1 to 4, HW synthesis, code generation, HW partition, SW partitions 1 to 3, interface synthesis, co-implementation, evaluation, final implementation.]
Introduction and motivation (3)
- Specific application
  => The co-design process is used to partition the application algorithms into two execution codes, one for the DSP processor (software) and the other for the custom SIMD processor (hardware).
- Modeling technique limitations
  => The technique supports various models, from the simplest to the most complex, for the same input system.
  => The modeling technique provides a large choice for the final implementation.
- Estimation parameters
  => Performance, implementation cost, communication overhead

Introduction and motivation (4)
- The platform used to accelerate DSP applications is introduced.
- Hardware/software partitioning results for the FFT algorithm are given.
- Implementations without partitioning are compared with implementations with partitioning.
- Another analysis examines the degree of acceleration obtained when models of different complexity are used for the same input system.

The proposed methodology (1)
The proposed partitioning algorithm
  => The dependency graph includes all the blocks and their interactions in the same structure.
  => Classification and comparison of blocks become easier.
  => A unified structure (the weighted dependency graph) is used throughout the partitioning process.
  => Nodes (blocks in the model) carry their own weights (the performance estimate of each block).
  => Edges (block interactions) also carry weights (the quantity of interaction).
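The weighted dependency graph described above can be held in two dictionaries, and a greedy, time-constraint-driven assignment in the spirit of the paper's Algorithm 1 (stated on the next slide) can then walk it. The sketch below rests on several assumptions that the slides do not specify: each node carries a software time and a hardware time, every edge that crosses the HW/SW boundary adds its weight to the system time, and the graph update step of the algorithm is omitted. Block names and numbers are invented.

```python
def system_time(nodes, edges, hw):
    """Assumed cost model: per-node time plus the weight of boundary-crossing edges."""
    total = sum(hw_t if name in hw else sw_t for name, (sw_t, hw_t) in nodes.items())
    total += sum(w for (a, b), w in edges.items() if (a in hw) != (b in hw))
    return total


def greedy_partition(nodes, edges, time_limit):
    """Move blocks to hardware until the time constraint holds (or nothing is left)."""
    hw = set()
    while system_time(nodes, edges, hw) > time_limit:
        in_sw = [n for n in nodes if n not in hw]
        if not in_sw:
            break  # constraint cannot be met even with everything in hardware
        i = max(in_sw, key=lambda n: nodes[n][0])  # most time-consuming block
        hw.add(i)
        while system_time(nodes, edges, hw) > time_limit:
            # Software neighbours of i: rank by edge weight, then by software time.
            candidates = [(w, nodes[j][0], j)
                          for (a, b), w in edges.items()
                          for j in ((b,) if a == i else (a,) if b == i else ())
                          if j not in hw]
            if not candidates:
                break
            hw.add(max(candidates)[2])
    return hw, set(nodes) - hw


# Hypothetical blocks: name -> (software time, hardware time), in ms.
nodes = {"init": (2.0, 1.5), "bitrev": (6.0, 2.0),
         "butterfly": (20.0, 5.0), "output": (3.0, 2.5)}
# Hypothetical interactions: (block, block) -> communication cost in ms.
edges = {("init", "bitrev"): 0.5, ("bitrev", "butterfly"): 1.0,
         ("butterfly", "output"): 0.5}

hw_set, sw_set = greedy_partition(nodes, edges, time_limit=15.0)
print(sorted(hw_set), sorted(sw_set), system_time(nodes, edges, hw_set))
```

Charging boundary-crossing edges is what makes the inner loop useful here: pulling the strongest neighbour into hardware removes its communication cost as well as its software time.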
The proposed methodology (2)
Algorithm 1
Input: list of blocks and time constraints
Output: two subsets to which the blocks are assigned
Step 1: Construct the complete weighted dependency graph G.
Step 2: Assign all blocks to software; compute the complete system execution time.
Step 3: while (time constraints not satisfied) do
    Step 3_i: Select the node with the maximum execution time (i).
    Step 3_ii: Assign i to hardware; update the system execution time.
    Step 3_iii: while (time constraints not satisfied) do
        Step 3_iii_1: Select the maximum weighted edge connected to i with the most time-consuming node (j).
        Step 3_iii_2: Assign j to hardware; update the dependency graph G; update the system execution time.
    enddo
enddo

The proposed methodology (3)
- The co-design target architecture is based on two types of processors.
- The Texas Instruments TMS320C40 DSP is used as the master processor, and the custom SIMD processor PULSE (Parallel Ultra Large Scale Engine, 4 processors in parallel) as the slave processor.

Results and discussion (1)
[Figure: the hierarchical model of the FFT transform behavior, Level 1 to Level 8; blue marks the blocks assigned to PULSE. Block labels include Initialize, Bit Reversal, Danielson control, Output, Bit_init, Bit_loop1, Bit_loop2, Bit_swap1, Bit_swap2, Dan_init, Dan_loop1, Dan_real, Dan_imag, Out_init, and Out_write.]

Results and discussion (2)
- There is an optimal level in the hierarchy.
- This level provides an optimal and balanced partitioning of blocks between hardware and software implementation.
- Using the most complex and detailed model does not always yield the best solution.

Results and discussion (3)
Table 1: Block assignment at different hierarchical levels of the FFT transform (time constraint = 25 ms)
Level  Nb. of blocks  On C40  On PULSE  PULSE time (ms)  C40 time (ms)  Total (ms)
1      4              2       2         18.14            4.8            22.94
2      10             6       4         18.8             2.96           21.76
3      17             11      6         15.56            9              24.56
4      22             18      6         14.68            10.24          24.92
5      24             17      7         14.56            10.4           24.94
6      24             22      2         6.82             17.72          24.54
7      25             22      3         7                17.92          24.92
8      27             18      9         5.88             18.64          24.52

Results and discussion (4)
Table 2: Alternative comparison for the FFT transform
              Execution time (ms)            Code size (bytes)
Partition     PULSE    C40     Reduction     PULSE   C40    Reduction
1 (C40)       0        38.64   ...           0       1260   ...
2 (PULSE)     20.44    0       ...           2196    0      ...
Solution #1   18.15    4.80    40 %          1424    264    23 %
Solution #8   5.2      18.64   38 %          352     412    65 %
Sol. #1: all initialization and output operations assigned to PULSE
Sol. #8: only processing operations assigned to PULSE

Results and discussion (5)
- Considering the first levels of the hierarchy during partitioning considerably improves time performance, but not code size.
- Using the middle and last levels may considerably decrease the code size or the area with very little degradation in performance.
- The generated alternatives are compared with the lower-bound performance (the hardware solution) and the upper-bound performance (the software solution) in order to find the best trade-off.

Transformational partitioning for co-design
G.F. Marchioro, J.M. Daveau, T.B. Ismail, A.A. Jerraya
- A function $T_x$ takes a specification $S_i$ to an implementation $S_{i+1}$:
  (1) $S_0 \xrightarrow{T_x} S_1, \quad S_i \xrightarrow{T_x} S_{i+1}$
- A sequence of applied transformations forms the development history.
- The development history can be saved and reapplied after a specification change:
  (2) $((S_0 \, T_0) \, T_1) \dots T_n \rightarrow S_{n+1}$
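The idea that a development history can be saved and reapplied after a specification change can be illustrated by folding a list of transformations over a specification. This is a toy sketch: the representation of a specification as a list of function names, and the two transformations `decompose` and `assign`, are hypothetical stand-ins rather than the transformations used by the authors.

```python
from functools import reduce


def apply_history(spec, history):
    """Apply T_0, T_1, ..., T_n in order, turning S_0 into S_(n+1)."""
    return reduce(lambda s, transform: transform(s), history, spec)


def decompose(spec):
    """Hypothetical T_0: split every function into two sub-functions."""
    return [part for f in spec for part in (f + "_a", f + "_b")]


def assign(spec):
    """Hypothetical T_1: assign each sub-function to hardware or software."""
    return [(f, "HW" if f.endswith("_a") else "SW") for f in spec]


history = [decompose, assign]          # the saved development history

first = apply_history(["filter"], history)
# After the specification changes, the same history is simply replayed.
second = apply_history(["filter", "codec"], history)
print(first)
print(second)
```

Because the history is data rather than a one-off manual procedure, replaying it costs nothing more than another call with the revised specification.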
Transformation steps (1)
- Step 1: functional specification
- Step 2: functional decomposition

Transformation steps (2)
- Step 3: structural reorganization
  => Assigns an execution processor to each function.
  => This step acts on the design units.
  => Each abstract processor may be implemented in hardware or in software.
  => Several functions may be assigned to the same partition.

Transformation steps (3)
- Step 4: communication transformation and prototyping
  => Communications are transformed into processors communicating through buses and sharing communication control.
  => Processes P1 and P3 will be translated to behavioral VHDL.
  => Process P2 will be translated to C.

Structural reorganisation
- Step 1: Move => moves a design unit in the hierarchy; used to prepare a merge operation
- Step 2: Merge => combines the modules that will be assigned to the same processor into a single design unit
- Step 3: Map => identifies the hardware and software realization options for each process
- Step 4: Flat => performs a structural flattening operation on the hierarchy

Co-design On-line publications
- R. Camposano, J. Wilberg: "Embedded System Design", Design Automation for Embedded Systems, Vol. 1, Nos. 1-2, January 1996.
- Networked Computer Science Technical Reports Library (technical reports from many US and other universities): http://cstr.cs.cornell.edu/
- IMEC ftp reports (Cathedral): ftp://ftp.imec.be/pub/vsdm/reports/
- Stanford Tech Reports: http://elib.stanford.edu/
- Synopsys Research Publications: http://www.synopsys.com/news/pubs/research/ATG_index.html
- Paderborn (Camposano) Technical Reports: http://www-date.uni-paderborn.de/RESEARCH/BUILDABONG/buildabong.html
- University of Dortmund, Methodology for computer-aided design of integrated circuits (code generation): http://ls12-www.informatik.uni-dortmund.de/

Co-design Sites
- Bibliography of Hardware/Software Codesign: http://www-ti.informatik.uni-tuebingen.de/~buchen/
- Ralf Niemann's Codesign Links and Literature: http://ls12-www.informatik.uni-dortmund.de/~niemann/codesign/codesign_links.html
- URLs to Hardware/Software Co-Design Research: http://www.ece.cmu.edu/~thomas/hsURL.html
- RASSP Architecture Guide: http://www.sanders.com/hpc/ArchGuide/TOC.html
- EDA, Electronic Design Automation: http://www.eda.org
- COMET (Case Western Reserve University): http://bear.ces.cwru.edu/research/hard_soft.html
- COSMOS (Tima - Cmp, France): http://timacmp.imag.fr/Homepages/cosmos/research.html
- COSYMA (Braunschweig): http://www.ida.ing.tu-bs.de/projects/cosyma/
- Handel-C (Oxford): http://oldwww.comlab.ox.ac.uk/oucl/hwcomp.html
- Lycos (Technical University of Lyngby, Denmark): http://www.it.dtu.dk/~lycos/
- MOVE (Technical University Delft): http://cardit.et.tudelft.nl/MOVE/
- Polis (University of Berkeley): http://www-cad.eecs.berkeley.edu/Respep/Research/hsc/abstract.html
- ProCos (UK Research): http://www.comlab.ox.ac.uk/archive/procos/codesign.html
- Ptolemy (University of Berkeley): http://ptolemy.eecs.berkeley.edu/
- SPAM (Princeton): http://www.ee.princeton.edu/~spam/
- TRADES (University of Twente, INF/CAES): http://wwwspa.cs.utwente.nl/aid/aid.html
- Specification languages: SystemC: http://www.systemc.org
Answer III: H/W and S/W Codesign

Three Co-Design Approaches
- ASIP co-design: starts with an application, builds a specific programmable processor, and translates the application into software code.
- H/W-S/W synchronous system co-design: a S/W processor acts as the master controller, with a set of H/W accelerators as coprocessors.
- H/W-S/W co-design for distributed systems: mapping of a set of communicating processes onto a set of interconnected processors, involving behavioral decomposition, process allocation, and communication transformation. E.g., CoWare.

A Co-design method
Y. Endo, H. Koizumi, Dept. of Computer & System Eng., Tokyo Denki Univ., Japan
[Figure: the design target model is divided into process components 1 to n, which pass through three stages: Major Classification (division and assignment to hardware, software, or undecided), Grouping of Similar Process Components (groups 1 to 6), and Detailed Classification (assignment to hardware or software).]

Major Classification
- The design target model is described in Visual Basic and analyzed in order to divide it into process components.
- The software speed of each process component is estimated.
- Division into HW parts and SW parts: a component whose processing speed does not meet the expected result becomes a HW part.

Grouping of Similar Process Components
- Reducing the number of process components considered in the functional trade-off greatly shortens the trade-off work.
- Similar process components are therefore grouped to reduce the number of components in the trade-off.
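The two stages above, major classification followed by grouping of similar components, can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the component list, the per-component time requirement, and the `kind` label used to decide which components count as similar (the source does not say how similarity is judged, and the undecided category from the figure is left out).

```python
from collections import defaultdict

# Hypothetical process components:
# (name, kind, estimated software time in ms, required time in ms)
components = [
    ("fir_left",  "filter",  12.0, 5.0),
    ("fir_right", "filter",  11.0, 5.0),
    ("menu",      "control",  1.0, 10.0),
    ("logger",    "control",  2.0, 10.0),
    ("fft",       "transform", 30.0, 8.0),
]


def major_classification(comps):
    """A component whose estimated software time misses its requirement becomes HW."""
    return {name: ("HW" if sw_ms > required_ms else "SW")
            for name, _kind, sw_ms, required_ms in comps}


def group_similar(comps, assignment):
    """Group components that share a kind and an assignment, to shrink the trade-off."""
    groups = defaultdict(list)
    for name, kind, _sw_ms, _required_ms in comps:
        groups[(kind, assignment[name])].append(name)
    return dict(groups)


assignment = major_classification(components)
groups = group_similar(components, assignment)
print(assignment)
print(groups)
```

In this made-up example five components collapse into three groups, so the later detailed classification has three trade-off decisions to evaluate instead of five.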
Detailed Classification
- Methods for calculating the HW processing time:
  - Multiply the model's SW speed by a coefficient constant.
  - Apply a knowledge database corresponding to the SW logic to calculate the HW processing speed.
- Communication time between HW and SW:
  - Calculated from the HW/SW communication speed and the amount of data transferred.
- If the design goals and constraints cannot all be satisfied in combination, the system algorithm is improved and the design goals and constraints are re-examined.

A hardware/software partitioning technique with hierarchical design space exploration
Houria Oudghiri, Bozena Kaminska, and Janusz Rajski, Mentor Graphics Corp.
- Modeling technique limitations
  => The technique supports various models, from the simplest to the most complex, for the same input system.
  => The modeling technique provides a large choice for the final implementation.
- Estimation parameters
  => Performance, implementation cost, communication overhead

The proposed methodology
- The co-design target architecture is based on two types of processors.
- The Texas Instruments TMS320C40 DSP is used as the master processor, and the custom SIMD processor PULSE (Parallel Ultra Large Scale Engine, 4 processors in parallel) as the slave processor.

The proposed partitioning algorithm
  => The dependency graph includes all the blocks and their interactions in the same structure.
  => Classification and comparison of blocks become easier.
  => A unified structure (the weighted dependency graph) is used throughout the partitioning process.
  => Nodes (blocks in the model) carry their own weights (the performance estimate of each block).
  => Edges (block interactions) carry their weights (the quantity of interaction).
Algorithm
Input: list of blocks and time constraints
Output: two subsets to which the blocks are assigned
Step 1: Construct the complete weighted dependency graph G.
Step 2: Assign all blocks to software; compute the complete system execution time.
Step 3: while (time constraints not satisfied) do
    Step 3_i: Select the node with the maximum execution time (i).
    Step 3_ii: Assign i to hardware; update the system execution time.
    Step 3_iii: while (time constraints not satisfied) do
        Step 3_iii_1: Select the maximum weighted edge connected to i with the most time-consuming node (j).
        Step 3_iii_2: Assign j to hardware; update the dependency graph G; update the system execution time.
    enddo
enddo

The hierarchical model of the FFT transform behavior
[Figure: the hierarchical model of the FFT transform behavior, Level 1 to Level 8; blue marks the blocks assigned to PULSE. Block labels include Initialize, Bit Reversal, Danielson control, Output, Bit_init, Bit_loop1, Bit_loop2, Bit_swap1, Bit_swap2, Dan_init, Dan_loop1, Dan_real, Dan_imag, Out_init, and Out_write.]

Block assignment at different hierarchical levels of the FFT transform (time constraint = 25 ms)
Level  Nb. of blocks  On C40  On PULSE  PULSE time (ms)  C40 time (ms)  Total (ms)
1      4              2       2         18.14            4.8            22.94
2      10             6       4         18.8             2.96           21.76
3      17             11      6         15.56            9              24.56
4      22             18      6         14.68            10.24          24.92
5      24             17      7         14.56            10.4           24.94
6      24             22      2         6.82             17.72          24.54
7      25             22      3         7                17.92          24.92
8      27             18      9         5.88             18.64          24.52

Alternative comparison for the FFT transform
              Execution time (ms)            Code size (bytes)
Partition     PULSE    C40     Reduction     PULSE   C40    Reduction
1 (C40)       0        38.64   ...           0       1260   ...
2 (PULSE)     20.44    0       ...           2196    0      ...
Solution #1   18.15    4.80    40 %          1424    264    23 %
Solution #8   5.2      18.64   38 %          352     412    65 %
Sol. #1: all initialization and output operations assigned to PULSE
Sol. #8: only processing operations assigned to PULSE

Results and discussion
- Considering the first levels of the hierarchy during partitioning considerably improves time performance, but not memory size.
- Using the middle and last levels may considerably decrease the memory size or the area with very little degradation in performance.
- The generated alternatives are compared with the lower-bound performance (the hardware solution) and the upper-bound performance (the software solution) in order to find the best trade-off.
- There is an optimal level in the hierarchy.
- Using the most complex and detailed model does not necessarily yield the best solution.

OCAPI-xl model, IMEC
- The OCAPI-xl model was used to develop a stand-alone webcam including an interface to a digital CMOS image sensor, a GIF engine, a network layer, and an interface to a 10BaseT Ethernet PHY+MAC controller.
- The synthesized model for this NetCam (with raw-IP sockets) consisted of 25 concurrent processes, described in about 2K lines of C++ code (taking about 25K gates on an ASIC), and was designed from scratch in 14 man-months.

OCAPI-xl design flow
[Figure: OCAPI-xl design flow.]

Application Structure
[Figure: application structure.]

Cam-E-leon system architecture
[Figure: Cam-E-leon system architecture.]

Integrated H/W and S/W low-power design optimization environment and tools
[Figure: integrated low-power design flow. Labels: S/W core energy estimation, SW energy efficiency calculation, system-level energy estimation, HW/SW integration, algorithm selection, clustering, cluster scheduling, HW energy efficiency calculation, cluster selection, H/W synthesis and energy estimation. Tools named: ORINOCO, DSP Station, Seamless, Co-centric, Matlab/SPW, Cossap, Signal-master, Synopsys.]
IS-95 CDMA Searcher: integrated H/W and S/W design
황인기, Sungkyunkwan University
[Figure: design space of the searcher against cost (speed, area, power), moving from software blocks toward the goal. Software blocks: synchronous accumulator, energy estimate, comparator, asynchronous accumulator, comparator. Hardware blocks: PN-code generation, synchronous accumulator 1, synchronous accumulator 2, energy estimate, asynchronous accumulator, comparators with precomputation.]

References
- Sanjaya Kumar, James H. Aylor, Barry W. Johnson, Wm. A. Wulf, The Codesign of Embedded Systems: A Unified Hardware/Software Representation.
- "Synthesis and simulation of digital systems containing interacting hardware and software components," 29th DAC.
- "A model and methodology for hardware-software codesign."
- CAP Laboratory Homepage (http://peace.snu.ac.kr/)
- Pai Chou, Ross Ortega, Gaetano Borriello, "Synthesis of the Hardware/Software Interface in Microcontroller-Based Systems," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, Santa Clara, CA, November 1992, pp. 488-495.