Network on Chip
조준동
2008.1
SKKU 휴대폰학과
© 조준동
2008
Technology Evolution
NoC (network on chip)
U.C. Berkeley
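• Implants a communication-network structure on a single semiconductor chip
• Defines transmission protocols according to the OSI model
• Connects DSPs, microprocessors, memories, etc. within a single chip using H/W-S/W co-design
• Code optimization and construction of a low-power software IP library
• Bus structure for connecting modules
• Components
– Region: an area that permits a special topology/network structure
– Backbone
– Wrapper: converts transmitted messages into the appropriate form; complex
• Suited to complex, large-scale systems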
From Spaghetti Wires to NoC
• Marcello Coppola, MPSOC05
On-chip communication infrastructure
NoC definition
• A flexible and scalable packet-based on-chip micro-network designed according to a layered methodology
• Analogy: in Los Angeles, reducing commute time by 15 min -> $15b economic impact
• On-chip communication will dominate performance and power efficiency.
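To make "packet-based" concrete, here is a minimal sketch of splitting a message into fixed-size flits that carry source and destination addresses. The `Flit` fields, the head/body/tail naming, and the 4-byte flit size are illustrative assumptions, not details from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body", or "tail" (hypothetical naming)
    src: int        # sending resource id
    dst: int        # receiving resource id
    payload: bytes  # slice of the original message


def packetize(src: int, dst: int, message: bytes, flit_bytes: int = 4) -> List[Flit]:
    """Split a message into flits; the first is the head, the last is the tail."""
    chunks = [message[i:i + flit_bytes]
              for i in range(0, len(message), flit_bytes)] or [b""]
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            kind = "head"
        elif i == len(chunks) - 1:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, src, dst, chunk))
    return flits


def reassemble(flits: List[Flit]) -> bytes:
    """Concatenate flit payloads back into the original message."""
    return b"".join(f.payload for f in flits)
```

Because every flit is self-describing, the network only has to move flits between routers; the sender and receiver agree on the message boundaries, not the wires.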
A Legacy SoC Approach
CoreConnect (PPC), AMBA (ARM)…
Putting the blocks together posed tough questions:
• Do the hardware interfaces work with one another?
• Does the chip have enough bus and memory bandwidth under worst-case loads?
• Do software tasks communicate without deadlock?
• Do all applications and features of the full system meet functional goals?
• Does the system meet performance goals?
• Are the cost and power acceptable?
Networks-on-Silicon, Philips
Wires-Centric Design
• Exploits logic structure to reduce wire
loads
• Enables use of advanced circuits
– wire properties and crosstalk known
early and well characterized
• Gives a stable design
– key wire loads don’t change with small
logic changes
Wires dominate - power, area, delay
• Problem - Contemporary tools leave wires as
an afterthought
– result is lack of structure, visibility, and control
• Solution 1 - wires first design
– route key wires, then place gates
• Solution 2 - route packets, not wires
– on-chip networks
• global wires fixed before the design starts
Dedicated wires vs. Network
Dedicated Wiring | On-Chip Network
Spaghetti wiring | Ordered wiring
Variation makes it hard to model crosstalk, returns, length, R & C. | No variation, so easy to exactly model crosstalk (XT), returns, R and C.
Drivers sized for the 'wire model': 99% too large, 1% too small | Driver sized exactly for wire
Hard to use advanced signaling | Easy to use advanced signaling
Low duty factor | High duty factor
No protocol overhead | Small protocol overhead
Wires-first design
[Figure: wires-first design flow. Blocks shown: Structured RTL, RTL, Floorplan, Structure, Wire plan, Key Wires, Placement & Loads, Short Wire Models, Synthesis, Library, Local Netlists, Place & Route, Regions, Layout, R&C Extractor, Timing Analysis, Slow Paths, Manual Design]
On-Chip Interconnection Networks
• Replace dedicated global wiring with a shared network
[Figure: a chip with dedicated wiring next to a chip with a network; labels: Local Logic, Router, Network Wires]
Most Wires are Idle Most of the Time
• Don’t dedicate wires to signals; share wires across multiple signals
• Route packets, not wires
• Organize global wiring as an on-chip interconnection network
– allows the wiring resource to be shared, keeping wires busy most of the time
– allows a single global interconnect to be re-used on multiple designs
– makes global wiring regular and highly optimized
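A rough way to see why sharing keeps wires busy: compare the average utilization of dedicated wires with that of one shared link carrying the same traffic. The duty factors below are made-up numbers, and the model ignores protocol overhead and contention, so treat it as a back-of-the-envelope sketch only.

```python
def dedicated_utilization(duty_factors):
    """Average utilization when every signal has its own wire."""
    return sum(duty_factors) / len(duty_factors)


def shared_utilization(duty_factors, links=1):
    """Utilization when the same traffic is multiplexed onto `links` shared links.

    Capped at 1.0: a link cannot be more than fully busy.
    """
    return min(sum(duty_factors) / links, 1.0)


# Hypothetical example: eight signals, each active 10% of the time.
signals = [0.1] * 8
print("dedicated:", dedicated_utilization(signals))
print("shared:   ", shared_utilization(signals))
```

With these assumed numbers, eight mostly idle wires collapse into one link that is busy most of the time, which is the trade the slide describes.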
On-chip communication
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
• Plug-and-play on-chip communications network
• Packet-based
• 50 employees in a year
• Provides IP and a design environment; supports SoC design
• Allied with Cadence
• SiliconBackplane III targets communications + media
Arteris NoC layered architecture
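On-chip network architecture
● Router/Scheduler algorithm development
● Network model design and verification using SystemC
● Design of core IP for star and mesh on-chip networks
● Design of master/slave network interfaces and a high-performance memory management interface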
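Building an on-chip-network-based SoC design platform and design environment
● Development of a tool for distributed crossbar switch topology generation and IP mapping
● Development of an IP-to-mesh-tile mapping tool
● Development of a network topology generation tool based on analysis of data flow between IPs; SoC platform construction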
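Application areas
- Supports protocols that guarantee QoS, so it suits real-time applications and applications that require high data bandwidth
- System-level chips needed to implement products such as multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and network terminals
- SoC design for high-volume multimedia applications such as high-frame-rate video and 3D graphics
- A platform-based design environment that combines the on-chip network core IP and design support tools into a single platform, used across a variety of SoC designs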
Recent research trends
• Intel’s Reconfigurable Radio Architecture (mesh + nearest neighbor)
• Reconfigurable Baseband Processing, Picochip
• Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
• A configurable platform: Altera Excalibur, Xilinx Virtex FPGA
• Adaptive Computing Machine, Quicksilver Tech.
• Mercury, Sky, Galileo, Tundra (crossbars, bridges)
• Virginia Tech’s reconfigurable hardware
Structural layers of NOC
Product | System control, product behaviour
Configuration | Network management, allocation, operation modes
Applications | Resource management, diagnostics, applications
Functions | Execution control, functions
Executables | RTOS, code, HW configurations
Hardware units | Processors, memories, configurable HW, logic
Resources | Resource types, buses, IO
Regions | Region types, switches, network interfaces
Communication | Channels and protocols
Network protocol
Layer stack (top to bottom): Application, System/Session, Transport, Network, Data link, Physical
• Physical
– Signal voltage, timing, bus width, signal synchronization
• Data link
– Error detection and correction
– Arbitration of physical medium
• Network
– IP protocol
– Data routing
• Transport
– TCP protocol
– End-to-end connection
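The layering idea can be illustrated with a toy encapsulation model: on the way down each layer wraps the data from the layer above, and on the way up each layer strips only its own wrapper. The bracketed string format is purely an illustrative assumption; real NoC layers use binary headers and trailers.

```python
# Layers that add a wrapper, listed from top to bottom of the stack.
LAYERS = ["transport", "network", "data link", "physical"]


def send(payload: str) -> str:
    """Wrap the payload once per layer, top layer first (innermost)."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}|{frame}]"
    return frame


def receive(frame: str) -> str:
    """Strip wrappers bottom layer first; fail if a wrapper is missing."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}|"
        if not (frame.startswith(prefix) and frame.endswith("]")):
            raise ValueError(f"malformed frame at {layer} layer")
        frame = frame[len(prefix):-1]
    return frame


print(send("data"))
# -> [physical|[data link|[network|[transport|data]]]]
```

Each layer only needs to understand its own wrapper, which is what lets a layer be redesigned (for example, a different physical signaling scheme) without touching the others.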
NOC Platform development
• Scaling problem
– How big a NOC is needed? What are the application area requirements?
• Region definition problem
– What kind of regions are needed? What kind of interfaces between regions? What are the capacity requirements for the regions?
• Resource design problem
– What is needed inside resources? Internal computation type and internal communication?
• Application mapping flow problem
– What kind of languages, models and tools must be supported? How to validate and test the final products?
NOC Application Development
• Mapping problem
– How to partition applications for NOC resources? How to allocate functionality effectively? Is the performance adequate? Is the resource usage in balance?
• Optimisation problem
– How to perform global optimisation of heterogeneous applications? How to define the right optimisation targets? How to utilise application/resource type specific tools?
• Validation problem
– Are the constraints met? Are there communication bottlenecks or power consumption hot spots? How to simulate a 10000 GIPS system? How to test all applications?
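Switch network: CLICHE
• Uses the OSI model as the data transmission protocol
• A network integrated on the chip (Network on Chip)
• Packet data transmission
• The components are themselves large systems
• Advantageous for chip-level integration of heterogeneous components.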
[Figure: CLICHE mesh. Each switch (S) connects through a resource network interface (rni) to a resource; resource labels in the figure include P, M, c, D, and re. A detail of one switch shows queues, muxes, and selection logic.]
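For a mesh of switches like CLICHE, one common way to move a packet is dimension-order (XY) routing: travel along X until the column matches, then along Y. The slides do not say which routing algorithm is used, so this is an assumed example; coordinates are hypothetical (column, row) switch positions.

```python
def xy_route(src, dst):
    """Return the list of switch coordinates visited from src to dst.

    Dimension-order routing: correct the X coordinate first, then Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)           # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(len(route) - 1)  # 3 hops, the Manhattan distance
```

The selection logic in each switch then reduces to comparing the packet's destination with the switch's own coordinates.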
NoC figures of merit
[Figure: tree of system quality attributes. Labels shown: System Quality, Performance, Capacity, Computation, Communication, Storage, Utilisation, Energy Efficiency, consumption, Scalability, Fault tolerance, Result quality (accuracy), Responsiveness, Functionality, Complexity, Structural, Functional, Control, Variability, Cost, Materials, Licencing, Production, Implementation, Development, Effort, Time, Risk, Flexibility, Applicability, Configurability, Programmability, Modifiability, Modularity, Coupling, Cohesion, Usability, Manufacturability, Volume, Lifetime]
NoC design flow
R. Marculescu
NoC-based application areas
[Figure: a NoC BACKBONE supporting PLATFORMS (Baseband platform, Database platform, Multimedia platform), which in turn support SYSTEMS (Low-power communication systems, High-performance communication systems, High-capacity communication systems, Personal assistant, Data collection systems, Entertainment devices, Virtual reality games)]
Layered Radio Architecture
Stream-based design
[Figure: layered stream-processing architecture. Layers: Application Layer Software, I/O Layer, Configuration Layer, Processing Layer. Blocks: Stream Packet Processing Element 1, Stream Packet Processing Element 2, Stream Packet Interpret, Packet Processing Pipeline, Configuration Pipeline, Bypass Pipeline, Packet ReConstr.]
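Low-power issues in NoC
Application layer
- DPM, resource management, power management API
Transport layer
- Data packet management to guarantee QoS (minimal delay and message loss); PSM via messages
Network layer
- Switching and routing issues when transferring packetized data
Data link layer
- Reducing and recovering from packet data errors and losses
Physical layer
- Reliability issues under DVS; on-chip synchronization issues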
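As a small illustration of the data link layer's error handling, the sketch below appends an even-parity bit to each word and flags words that arrive corrupted, for example after voltage scaling makes the link less reliable. Parity is an assumed, deliberately simple code chosen for brevity; the slides do not name a specific scheme, and parity detects only an odd number of flipped bits.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """True if the received word still has an even number of 1s."""
    return sum(word) % 2 == 0


sent = add_parity([1, 0, 1, 1])    # -> [1, 0, 1, 1, 1]
received = sent.copy()
received[2] ^= 1                   # flip one bit to model a link error

print(parity_ok(sent))      # True
print(parity_ok(received))  # False: the receiver would request a resend
```

Detection plus retransmission costs energy only when an error actually occurs, which is one reason error handling appears in a low-power discussion.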
Tile-based Architecture Platform
R. Marculescu
Energy-Aware Mapping for Tile-based Architectures
R. Marculescu
Objective: minimize the total communication energy consumption
Constraint: meet the communication performance constraints (specified by designer)
For a 4x4 tile architecture, there are 16! possible mappings
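The size of the search space and the shape of the objective can be sketched as follows. The energy model here (bits sent times Manhattan hop count times a per-bit-per-hop energy) is a simplifying assumption rather than the model from the cited work, the traffic numbers are invented, and the performance constraints are ignored. Exhaustive search is shown only on a 2x2 mesh; at 4x4 it is already impractical.

```python
import math
from itertools import permutations


def comm_energy(mapping, traffic, width, e_bit_hop=1.0):
    """Total energy: sum over flows of bits * hops * energy per bit per hop.

    mapping: core name -> tile index (row-major on a mesh `width` tiles wide)
    traffic: (src core, dst core) -> bits exchanged
    """
    total = 0.0
    for (src, dst), bits in traffic.items():
        ts, td = mapping[src], mapping[dst]
        hops = abs(ts % width - td % width) + abs(ts // width - td // width)
        total += bits * hops * e_bit_hop
    return total


def best_mapping(cores, traffic, width, height):
    """Try every placement of cores on tiles; return (energy, mapping)."""
    best = None
    for perm in permutations(range(width * height), len(cores)):
        mapping = dict(zip(cores, perm))
        energy = comm_energy(mapping, traffic, width)
        if best is None or energy < best[0]:
            best = (energy, mapping)
    return best


print(math.factorial(16))  # 20922789888000 candidate mappings for 4x4

# Hypothetical 2x2 example: only 4! = 24 mappings to try.
traffic = {("a", "b"): 100, ("b", "c"): 10, ("a", "d"): 50}
energy, mapping = best_mapping(["a", "b", "c", "d"], traffic, 2, 2)
print(energy, mapping)
```

With roughly 2 x 10^13 candidates for a 4x4 mesh, a practical mapper needs a heuristic or a bounded search instead of enumeration.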
Mapping OFDM + CDMA to NoC
[Figure: receiver block diagram (RF, IF, AGC, ADC, NCO, CR, CPE, Coarse STR, Fine STR, Timing Processor, GI Removal, GI/FFT Detector, FFT, CSI, Channel Estimator/Equalizer, Phase Rotator, Demod, DP, Viterbi, FEC, SER), with blocks implemented as DSP or ASIC, mapped onto a Network on Chip mesh of switches (S), resource network interfaces (rni), and resources (P, M, c, D, re)]
- Application encapsulation through mapping
- High-performance datapath capable of parallel processing
- Uses both H/W and S/W elements
MP-SOC Cluster
MPSoC Clock and Power
Olivier Franza, Intel
• Increased uncertainty with process scaling
– Process, voltage, temperature variations, noise, coupling
• Affects design margin: over-design, power & performance loss
– Increased power constraints
– Increasing leakage, power (density, delivery) limitations
• More transistors mean:
– Larger clock distribution networks
– Higher capacitance (more load and parasitics)
• With each new technology:
– Gate delay decreases ~25%
– Wire delay increases ~100%
– Cross-chip communication increases
– Clock needs multiple cycles to cover die
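To see how quickly these two trends diverge, the sketch below compounds the per-generation figures (gate delay -25%, wire delay +100%) over several technology generations. Treating the rough "~" numbers as exact and compounding them is an assumption made only for illustration.

```python
def scaled_delays(generations, gate_change=-0.25, wire_change=1.0):
    """Relative gate delay, wire delay, and their ratio after N generations.

    Both delays start at 1.0 (normalized to the current technology).
    """
    gate = (1 + gate_change) ** generations
    wire = (1 + wire_change) ** generations
    return gate, wire, wire / gate


for n in range(1, 4):
    gate, wire, ratio = scaled_delays(n)
    print(f"gen {n}: gate x{gate:.3f}, wire x{wire:.1f}, wire/gate x{ratio:.1f}")
```

Under these assumptions the wire-to-gate delay ratio grows by about 2.7x per generation, which is why a clock edge eventually needs several cycles to cross the die.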
Interconnect Delays & Density
Hannu Tenhunen & Dr. Li-Rong Zheng, Royal Institute of Technology
Multiple Clocks due to Interconnect limitation
At reduced performance,
larger resource size
Noise in Mixed Signal
Systems
Multiple clock domains
• Low skew and jitter ALWAYS a must
• Clock modeling requires more accuracy
• Within-die variations, inductance, crosstalk, electromigration, self-heat, …
• Floor plan modularity
• Think adding/removing cores seamlessly!
• Hierarchical clock partitioning
• Reduce global clock and possibly relax its requirements
• Generate “locally”-used clock “locally”
• Implement clock domain deskewing techniques
• Bound clock problem into simple, reliable, efficient domains
DEC/Compaq Alpha
More complex core to improve performance, more complex clocks (?)
Source: DEC/Compaq; Gronowski et al., JSSC 1998; Xanthopoulos et al., ISSCC 2001; Barroso et al., ISCA 2000
Clock and Power Convergence
Intel® Itanium® Montecito
• Each core split into 3 clock domains on variable power supply
• Each domain controlled by Digital Frequency Divider (DFD) generating low-skew variable-frequency clocks; fed by central PLL and aligned through phase detectors
• Regional Voltage Detector (RVD): supply voltage monitor
• Second level clock buffer (SLCB): digitally controlled delay buffer for active deskewing
• Regional Active Deskew (RAD): phase comparators monitoring and adjusting delay difference between SLCBs
• Clock Vernier Device (CVD): digitally controlled delay buffer
Clock generation and distribution are essential enablers of microprocessor performance
On-Chip Interconnects: Circuits and Signaling, Wayne Burleson
• Using Vdd programmability
• High Vdd to devices on critical path
• Low Vdd to devices on non-critical paths
• VddOff for inactive paths
A – Baseline Fabric
B – Fabric with Vdd Configurable Interconnect
This work builds on a similar idea for FPGAs described in:
Fei Li, Yan Lin and Lei He. Vdd Programmability to Reduce FPGA Interconnect Power, IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004
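The three-way Vdd rule above can be written as a small assignment function. The path names, the slack values, and the `low_vdd_penalty_ns` threshold (the extra delay a path is assumed to incur at low Vdd) are all hypothetical; a real flow would take them from timing analysis.

```python
def assign_vdd(paths, low_vdd_penalty_ns=0.5):
    """Choose a supply level for each interconnect path.

    paths: name -> (active, slack_ns)
    Inactive paths are switched off; active paths with enough timing slack
    to absorb the low-Vdd slowdown get low Vdd; the rest are critical.
    """
    levels = {}
    for name, (active, slack_ns) in paths.items():
        if not active:
            levels[name] = "VddOff"
        elif slack_ns >= low_vdd_penalty_ns:
            levels[name] = "VddL"
        else:
            levels[name] = "VddH"
    return levels


# Hypothetical paths: (is the path used?, timing slack in ns)
paths = {
    "cpu_to_mem": (True, 0.0),     # critical path
    "cpu_to_uart": (True, 3.2),    # plenty of slack
    "spare_link": (False, 0.0),    # unused in this configuration
}
print(assign_vdd(paths))
# {'cpu_to_mem': 'VddH', 'cpu_to_uart': 'VddL', 'spare_link': 'VddOff'}
```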
Reliable design, G. De Micheli
1. Manufacturing imperfections: more likely to happen as lithography scales down
2. Approximations during design: uncertainty about details of design
3. Aging: oxide breakdown, electromigration
4. Environment-induced: soft errors (data corruption due to external radiation exposure), electromagnetic interference
5. Operating-mode induced: extremely low supply voltage
Dealing with variability
• Most variability problems that induce timing errors:
1. Power supply variation
2. Wire length estimation
3. Crosstalk
4. Soft errors
Adaptive low-power transmission scheme
Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne. Self-calibrating Networks-on-Chip. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.
Reduced Energy Consumption