Network on Chip
조준동, January 2008
SKKU 휴대폰학과 (Sungkyunkwan University, mobile phone department) © 조준동 2008

Technology Evolution

NoC (Network on Chip), U.C. Berkeley
• Implements a communication-network structure on a single semiconductor chip
• Defines the transmission protocols according to the OSI model
• Connects DSPs, microprocessors, memories, etc. on a single chip using HW/SW co-design
• Code optimization and construction of a low-power software IP library
• Bus structure for connecting modules
• Components
  – Region: an area that allows a special topology/network structure
  – Backbone
  – Wrapper: converts transmitted messages into the appropriate form; complex
• Suited to complex, large-scale systems

From Spaghetti Wires to NoC
• Marcello Coppola, MPSoC'05: On-chip communication infrastructure

NoC Definition
• A flexible and scalable packet-based on-chip micro-network designed according to a layered methodology
• Los Angeles: reducing commute time by 15 min → $15B economic impact
• On-chip communication will dominate performance and power efficiency.

A Legacy SoC Approach
• CoreConnect (PPC), AMBA (ARM), …

Putting the blocks together posed tough questions:
• Do the hardware interfaces work with one another?
• Does the chip have enough bus and memory bandwidth under worst-case loads?
• Do software tasks communicate without deadlock?
• Do all applications and features of the full system meet functional goals?
• Does the system meet performance goals?
• Are the cost and power acceptable?
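The NoC definition above describes a packet-based micro-network. The wrapper component converts transmitted messages into the appropriate form. Below is a minimal sketch of such a wrapper. It assumes a message is split into fixed-size flits behind a head flit that carries the message length. The flit size, the field layout, and the names `Flit`, `packetize` and `depacketize` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List

FLIT_BYTES = 4  # hypothetical payload width of one flit, in bytes


@dataclass
class Flit:
    kind: str      # "head", "body" or "tail"
    dest: int      # destination tile id; copied into every flit for simplicity
    payload: bytes


def packetize(message: bytes, dest: int) -> List[Flit]:
    """Sending wrapper: a head flit carrying the message length, then payload flits.

    The last payload flit is marked "tail"; an empty message yields only a head flit.
    """
    chunks = [message[i:i + FLIT_BYTES] for i in range(0, len(message), FLIT_BYTES)]
    flits = [Flit("head", dest, len(message).to_bytes(FLIT_BYTES, "big"))]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, dest, chunk))
    return flits


def depacketize(flits: List[Flit]) -> bytes:
    """Receiving wrapper: reassemble the original message from the flits."""
    if not flits or flits[0].kind != "head":
        raise ValueError("packet must start with a head flit")
    length = int.from_bytes(flits[0].payload, "big")
    return b"".join(f.payload for f in flits[1:])[:length]


if __name__ == "__main__":
    flits = packetize(b"hello, NoC!", dest=5)
    print([f.kind for f in flits])  # ['head', 'body', 'body', 'tail']
    print(depacketize(flits))       # b'hello, NoC!'
```

A real network interface would also add routing, priority, or error-check fields to the head flit. They are left out here to keep the packetize/reassemble round trip visible.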
Networks-on-Silicon, Philips

Wires-Centric Design
• Exploits logic structure to reduce wire loads
• Enables use of advanced circuits
  – wire properties and crosstalk are known early and well characterized
• Gives a stable design
  – key wire loads don't change with small logic changes

Wires Dominate: Power, Area, Delay
• Problem: contemporary tools leave wires as an afterthought
  – the result is a lack of structure, visibility, and control
• Solution 1: wires-first design
  – route key wires, then place gates
• Solution 2: route packets, not wires
  – on-chip networks
  – global wires fixed before the design starts

Dedicated Wires vs. Network
Dedicated wiring | On-chip network
Spaghetti wiring | Ordered wiring
Variation makes it hard to model crosstalk, returns, length, R & C | No variation, so crosstalk (XT), returns, R and C are easy to model exactly
Drivers sized for a "wire model": 99% too large, 1% too small | Driver sized exactly for the wire
Hard to use advanced signaling | Easy to use advanced signaling
Low duty factor | High duty factor
No protocol overhead | Small protocol overhead

Wires-First Design
[Figure: wires-first design flow; labels include Structured RTL, RTL, Floorplan, Structure, Synthesis, Local Netlists, Place & Route, Layout, Regions, Key Wires, Placement & Loads, Wire plan, Manual Design, Short Wire Models, Library, Slow Paths, Timing Analysis, R&C, Extractor]

On-Chip Interconnection Networks
• Replace dedicated global wiring with a shared network
[Figure: dedicated wiring vs. network; labels: Chip, Local Logic, Router, Network Wires]

Most Wires Are Idle Most of the Time
• Don't dedicate wires to signals; share wires across multiple signals
• Route packets, not wires
• Organize global wiring as an on-chip interconnection network
  – allows the wiring resource to be shared, keeping wires busy most of the time
  – allows a single global interconnect to be reused on multiple designs
  – makes global wiring regular and
    highly optimized

On-Chip Communication

SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
• Plug-and-play on-chip communications network
• Packet-based
• 50 employees in a year
• Provides IP and a design environment; supports SoC design
• Partnership with Cadence
• SiliconBackplane III: communications + media

Arteris NoC Layered Architecture

On-Chip Network
● Architecture: development of router/scheduler algorithms
● Design and verification of network models using SystemC
● Design of core IP for star- and mesh-type on-chip networks
● Design of master/slave network interfaces and a high-performance memory-management interface

Building an On-Chip-Network-Based SoC Design Platform and Design Environment
● Development of a tool for generating distributed crossbar-switch topologies and mapping IP
● Development of an IP-to-mesh-tile mapping tool
● Development of a network-topology generation tool based on analysis of the data flow between IPs; construction of an SoC platform

Application Areas
- Supports QoS-guaranteeing protocols, making it suitable for real-time applications and for applications that require large data bandwidth
- System-level chips needed to implement products such as multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and network terminals
- SoC design for data-intensive multimedia applications such as high-frame-rate video and 3D graphics
- Building a platform-based design environment that combines core on-chip-network IP and design-support tools into a single platform, and applying it to a variety of SoC designs

Recent Research Trends
• Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor)
• Reconfigurable Baseband Processing, Picochip
• Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
• A configurable platform: Altera Excalibur, Xilinx Virtex FPGA
• Adaptive Computing Machine, Quicksilver Tech.
• Mercury, Sky, Galileo, Tundra (crossbars, bridges)
• Virginia Tech's reconfigurable hardware

Structural Layers of NoC
Layer | Covers
Product | System control, product behaviour
Configuration | Network management, allocation, operation modes
Applications | Resource management, diagnostics, applications
Functions | Execution control, functions
Executables | RTOS, code, HW configurations
Hardware units | Processors, memories, configurable HW, logic
Resources | Resource types, buses, IO
Regions | Region types, switches, network interfaces
Communication | Channels and protocols

Network Protocol
Layers (top to bottom): Application, System/Session, Transport, Network, Data link, Physical
• Physical – signal voltage, timing, bus width, signal synchronization
• Data link – error detection and correction; arbitration of the physical medium
• Network – IP protocol; data routing
• Transport – TCP protocol; end-to-end connection

NoC Platform Development
• Scaling problem
  – How big a NoC is needed? What are the application-area requirements?
• Region definition problem
  – What kinds of regions are needed? What kinds of interfaces between regions? What are the capacity requirements for the regions?
• Resource design problem
  – What is needed inside resources? Internal computation type and internal communication?
• Application mapping flow problem
  – What kinds of languages, models and tools must be supported? How to validate and test the final products?

NoC Application Development
• Mapping problem
  – How to partition applications onto NoC resources? How to allocate functionality effectively? Is the performance adequate? Is resource usage balanced?
• Optimisation problem
  – How to perform global optimisation of heterogeneous applications? How to define the right optimisation targets? How to utilise application-/resource-type-specific tools?
• Validation problem
  – Are the constraints met? Are there communication bottlenecks or power-consumption hot spots? How to simulate a 10,000 GIPS system?
    How to test all applications?

Switch Network: CLICHE
• Uses the OSI model as the data-transmission protocol
• A network integrated on the chip (Network on Chip)
• Packet data transmission
• Components of large systems
• Favors chip-level integration of heterogeneous components
[Figure: 2D mesh of switches (S), each connected to a resource (labeled P, M, D, c, re) through a resource-network interface (rni); detail of a switch built from queues, selection logic, and multiplexers]

NoC Figures of Merit
[Figure: tree of NoC figures of merit; labels include System Quality, Functionality, Performance, Capacity, Computation, Communication, Storage, Responsiveness, Utilisation, Energy efficiency, Consumption, Scalability, Fault tolerance, Result quality (accuracy), Complexity, Structural, Functional, Control, Variability, Cost, Materials, Licencing, Production, Implementation, Development, Flexibility, Applicability, Configurability, Programmability, Modifiability, Coupling, Cohesion, Modularity, Volume, Lifetime, Usability, Manufacturability, Effort, Time, Risk]

NoC Design Flow (R. Marculescu)

NoC-Based Application Areas
[Figure: three tiers labeled Backbone, Platforms, and Systems. Platforms: baseband, database, multimedia. Systems: low-power, high-performance, and high-capacity communication systems; personal assistant; data-collection systems; entertainment devices; virtual-reality games]

Layered Radio Architecture

Stream-Based Design
[Figure: labels include Stream Packet, Processing Element 1, Processing Element 2, Application Layer, Software I/O Layer, Configuration Layer, Processing Layer, Interpret Packet, Configuration Pipeline, Processing Pipeline, Bypass Pipeline, Reconstruct Packet]
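The CLICHE slide arranges switches in a 2D mesh, with each switch attached to one resource through its rni. The slides do not say how packets are routed. Below is a minimal sketch that assumes dimension-order (XY) routing, a common choice for mesh NoCs. The coordinate convention, the port names ("east", "north", ...), and the functions `xy_route` and `next_port` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) routing: move along x until aligned, then along y.

    Returns every switch visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def next_port(cur: Coord, dst: Coord) -> str:
    """Output port a switch at `cur` selects for a packet headed to `dst`."""
    if dst[0] > cur[0]:
        return "east"
    if dst[0] < cur[0]:
        return "west"
    if dst[1] > cur[1]:
        return "north"  # assumed convention: +y is north
    if dst[1] < cur[1]:
        return "south"
    return "local"  # deliver to the attached resource through its rni


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))   # [(0, 0), (1, 0), (2, 0), (2, 1)]
    print(next_port((0, 0), (2, 1)))  # east
```

XY routing is deterministic, and every path is minimal (its hop count equals the Manhattan distance). On a 2D mesh it is also deadlock-free, because a packet never turns from the y dimension back into x. The trade-off is that it cannot steer around congested links.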
Low-Power Issues in NoC
• Application layer – DPM, resource management, power-management API
• Transport layer – managing data packets to guarantee QoS (minimum latency and message loss); PSM through messages
• Network layer – switching and routing for packetized data transmission
• Data link layer – reducing and recovering from packet-data errors and loss
• Physical layer – reliability issues due to DVS; on-chip synchronization

Tile-Based Architecture Platform (R. Marculescu)

Energy-Aware Mapping for Tile-Based Architectures (R. Marculescu)
• Objective: minimize the total communication energy consumption
• Constraint: meet the communication performance constraints (specified by the designer)
• For a 4×4 tile architecture, there are 16! (about 2.1 × 10^13) possible mappings

Mapping OFDM + CDMA onto a NoC
[Figure: OFDM/CDMA receiver blocks (RF, IF, AGC, ADC, NCO, GI removal, coarse and fine STR, timing processor, GI/FFT detector, FFT, demodulator, channel estimator/equalizer, CSI, phase rotator, CPE, CR, DP, Viterbi, FEC, SER), implemented as DSP and ASIC blocks and mapped onto the switch-based Network on Chip]
- Application encapsulation through mapping
- High-performance datapaths capable of parallel processing
- Uses both HW and SW elements

MPSoC Cluster

MPSoC Clock and Power (Olivier Franza, Intel)
• Increased uncertainty with process scaling
  – Process, voltage, and temperature variations, noise, coupling
• Affects design margin: over-design, power and performance loss
• Increased power constraints
  – Increasing leakage; power (density, delivery) limitations
• More transistors mean:
  – Larger clock distribution networks
  – Higher capacitance (more load and parasitics)
• With each new technology:
  – Gate delay decreases ~25%
  – Wire delay increases ~100%
  – Cross-chip communication increases
  – The clock needs multiple cycles to cover the die
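The energy-aware mapping slide states an objective (minimize total communication energy) over 16! candidate mappings for a 4×4 mesh. The slides do not give the energy model. The sketch below assumes a simple per-bit model: each bit pays `E_SWITCH` for every switch it passes through and `E_LINK` for every link, on a minimal path between tiles. The constants, traffic volumes, and function names are hypothetical. The designer's performance constraint is omitted, and exhaustive search is only used on a 2×2 mesh, where there are 4! = 24 mappings.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

Tile = Tuple[int, int]  # (x, y) position in the mesh

# Hypothetical per-bit energies in arbitrary units (not values from the slides).
E_SWITCH = 1.0  # energy to move one bit through one switch
E_LINK = 0.5    # energy to move one bit across one inter-tile link


def hops(a: Tile, b: Tile) -> int:
    """Links on a minimal (e.g. XY) path between two tiles: the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def comm_energy(mapping: Dict[str, Tile], traffic: Dict[Tuple[str, str], int]) -> float:
    """Total communication energy of one core-to-tile mapping.

    Each bit crosses `links` links and `links + 1` switches (source and destination included).
    """
    total = 0.0
    for (src, dst), bits in traffic.items():
        links = hops(mapping[src], mapping[dst])
        total += bits * ((links + 1) * E_SWITCH + links * E_LINK)
    return total


def best_mapping(cores: List[str], tiles: List[Tile],
                 traffic: Dict[Tuple[str, str], int]) -> Tuple[float, Dict[str, Tile]]:
    """Exhaustive search over all placements; only feasible for tiny meshes."""
    best: Optional[Tuple[float, Dict[str, Tile]]] = None
    for placement in itertools.permutations(tiles, len(cores)):
        mapping = dict(zip(cores, placement))
        energy = comm_energy(mapping, traffic)
        if best is None or energy < best[0]:
            best = (energy, mapping)
    if best is None:
        raise ValueError("more cores than tiles")
    return best


if __name__ == "__main__":
    tiles = [(x, y) for y in range(2) for x in range(2)]  # 2x2 mesh: 4! = 24 mappings
    traffic = {("A", "B"): 100, ("C", "D"): 100}          # hypothetical bits per frame
    energy, mapping = best_mapping(["A", "B", "C", "D"], tiles, traffic)
    print(energy, mapping)
    print(math.factorial(16))  # 20922789888000 candidate mappings for a 4x4 mesh
```

Brute force does not scale: 16! is about 2.1 × 10^13 mappings. For a 4×4 tile architecture, a heuristic search such as branch-and-bound or greedy placement would replace `best_mapping`, while `comm_energy` would still serve as its cost function.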
Interconnect Delays & Density (Hannu Tenhunen & Dr. Li-Rong Zheng, Royal Institute of Technology)

Multiple Clocks due to Interconnect Limitation

At Reduced Performance, Larger Resource Size

Noise in Mixed-Signal Systems

Multiple Clock Domains
• Low skew and jitter are ALWAYS a must
• Clock modeling requires more accuracy
  – Within-die variations, inductance, crosstalk, electromigration, self-heat, …
• Floor-plan modularity
  – Think adding/removing cores seamlessly!
• Hierarchical clock partitioning
  – Reduce the global clock and possibly relax its requirements
  – Generate "locally"-used clocks "locally"
• Implement clock-domain deskewing techniques
• Bound the clock problem into simple, reliable, efficient domains

DEC/Compaq Alpha
• More complex core to improve performance, more complex clocks (?)
• Source: DEC/Compaq – Gronowski et al., JSSC 1998; Xanthopoulos et al., ISSCC 2001; Barroso et al., ISCA 2000

Clock and Power Convergence: Intel® Itanium® Montecito
• Each core is split into 3 clock domains on a variable power supply
• Each domain is controlled by a Digital Frequency Divider (DFD) that generates low-skew, variable-frequency clocks; the DFDs are fed by a central PLL and aligned through phase detectors
• Regional Voltage Detector (RVD): supply-voltage monitor
• Second-level clock buffer (SLCB): digitally controlled delay buffer for active deskewing
• Regional Active Deskew (RAD): phase comparators that monitor and adjust the delay difference between SLCBs
• Clock Vernier Device (CVD): digitally controlled delay buffer
Clock generation and distribution are essential enablers of microprocessor performance.

On-Chip Interconnects: Circuits and Signaling (Wayne Burleson)
• Using Vdd programmability
  – High Vdd to devices on the critical path
  – Low Vdd to devices on non-critical paths
  – VddOff for inactive paths
[Figure: A – baseline fabric; B – fabric with Vdd-configurable interconnect]
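The Vdd-programmability slide assigns high Vdd to critical paths, low Vdd to non-critical paths, and turns inactive paths off. Below is a minimal sketch of that assignment rule. A segment gets low Vdd only if its assumed slowdown at low Vdd fits within its timing slack. Its dynamic power is estimated as activity × C × Vdd² × f. The supply levels, the 30% delay penalty, the `Segment` fields, and the per-segment treatment of slack are simplifying assumptions, not details from the slides. A real tool would budget slack along whole paths.

```python
from dataclasses import dataclass

# Hypothetical supply levels and delay penalty (not values from the slides).
VDD_HIGH = 1.2               # volts
VDD_LOW = 0.9                # volts
LOW_VDD_DELAY_FACTOR = 1.3   # assumed: a segment is 30% slower at VDD_LOW


@dataclass
class Segment:
    name: str
    delay_ns: float   # segment delay at VDD_HIGH
    slack_ns: float   # timing slack available to this segment at VDD_HIGH
    active: bool      # False if the segment is unused in the current configuration
    cap_pf: float     # switched capacitance
    activity: float   # switching activity factor, 0..1


def assign_vdd(seg: Segment) -> str:
    """'off' for inactive paths, 'low' if the slowdown fits in the slack, else 'high'."""
    if not seg.active:
        return "off"
    extra_delay = seg.delay_ns * (LOW_VDD_DELAY_FACTOR - 1.0)
    return "low" if extra_delay <= seg.slack_ns else "high"


def dynamic_power_mw(seg: Segment, level: str, freq_ghz: float) -> float:
    """P = activity * C * Vdd^2 * f; with C in pF and f in GHz the result is in mW."""
    vdd = {"high": VDD_HIGH, "low": VDD_LOW, "off": 0.0}[level]
    return seg.activity * seg.cap_pf * vdd ** 2 * freq_ghz


if __name__ == "__main__":
    segments = [
        Segment("critical", delay_ns=0.5, slack_ns=0.05, active=True, cap_pf=0.2, activity=0.5),
        Segment("non_critical", delay_ns=0.5, slack_ns=1.0, active=True, cap_pf=0.2, activity=0.5),
        Segment("unused", delay_ns=0.5, slack_ns=1.0, active=False, cap_pf=0.2, activity=0.0),
    ]
    for s in segments:
        level = assign_vdd(s)
        print(s.name, level, round(dynamic_power_mw(s, level, freq_ghz=1.0), 4))
```

Because dynamic power scales with Vdd², moving a non-critical segment from 1.2 V to 0.9 V cuts its dynamic power to (0.9/1.2)² ≈ 56%. The off state also removes leakage, which this dynamic-power-only model does not capture.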
The Vdd-configurable interconnect work builds on a similar idea for FPGAs described in: Fei Li, Yan Lin and Lei He, "Vdd Programmability to Reduce FPGA Interconnect Power," IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004.

Reliable Design (G. De Micheli)
1. Manufacturing imperfections: more likely to happen as lithography scales down
2. Approximations during design: uncertainty about the details of the design
3. Aging: oxide breakdown, electromigration
4. Environment-induced: soft errors (data corruption due to external radiation exposure), electromagnetic interference
5. Operating-mode induced: extremely low supply voltage

Dealing with Variability
• The variability problems that most often induce timing errors:
  1. Power-supply variation
  2. Wire-length estimation
  3. Crosstalk
  4. Soft errors

Adaptive Low-Power Transmission Scheme
• Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne. Self-calibrating Networks-on-Chip. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.

Reduced Energy Consumption
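The adaptive transmission slide gives only the title and the Worm et al. citation. The general idea behind such self-calibrating links is to run the link at the lowest voltage that keeps detected errors rare, relying on error detection and retransmission for correctness. Below is a generic closed-loop sketch of that idea under stated assumptions; it is not the paper's algorithm. The controller makes one decision per observation window. It raises the voltage whenever the error rate exceeds a target and lowers it only after `patience` consecutive clean windows. The millivolt steps, the target, and `toy_error_rate` (a hypothetical link that is reliable at or above 600 mV) are all illustrative.

```python
from typing import Callable, List


def self_calibrate(v_start_mv: int, v_min_mv: int, v_max_mv: int, step_mv: int,
                   error_rate: Callable[[int], float], target: float,
                   windows: int, patience: int = 3) -> List[int]:
    """Closed-loop link-voltage control, one decision per observation window.

    Errors are assumed to be detected and corrected by retransmission, so running
    briefly at a too-low voltage costs energy and latency but not correctness.
    """
    v = v_start_mv
    clean_windows = 0
    history = [v]
    for _ in range(windows):
        if error_rate(v) > target:
            v = min(v_max_mv, v + step_mv)  # too many errors: raise the swing
            clean_windows = 0
        else:
            clean_windows += 1
            if clean_windows >= patience:   # consistently clean: try a lower swing
                v = max(v_min_mv, v - step_mv)
                clean_windows = 0
        history.append(v)
    return history


def toy_error_rate(v_mv: int) -> float:
    """Hypothetical link: reliable at or above 600 mV, error-prone below."""
    return 1e-9 if v_mv >= 600 else 1e-3


if __name__ == "__main__":
    print(self_calibrate(800, 400, 1000, 50, toy_error_rate, target=1e-6, windows=12))
    # [800, 800, 800, 750, 750, 750, 700, 700, 700, 650, 650, 650, 600]
```

Once the controller reaches 600 mV, it keeps probing 550 mV for a single window before stepping back up. The `patience` hysteresis limits how often it pays the retransmission cost of these probes.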