Chapter 4. HW/SW Co-Design for SoC

Module 4
HW and SW Co-design for SoC
Prof. 정정화 (Hanyang University)

HW/SW Co-design for SoC
- Introduction of HW/SW Co-design
- HW/SW Co-design Methodology
  - System Specification
  - HW/SW Co-partitioning
  - HW/SW Co-synthesis
  - HW/SW Co-verification
- Co-design Related Works
- References
Copyrightⓒ2004

What is Co-design?
- A design methodology that supports cooperation and concurrency, so that a system combining hardware and software meets its functional and performance goals at the same time

[Figure: Abstract Co-design Process. Labels: Function, Architecture, Hardware, Software, Mapping, Synthesis, Refinement, Abstraction, Verification]

Codesign Definition and Key Concepts
- Codesign
  - Exploiting the trade-offs between hardware and software in a system through their concurrent design
- Key concepts
  - Concurrent: hardware and software are developed at the same time on parallel paths
  - Integrated: hardware and software development interact to produce designs that meet performance criteria and functional specifications

Motivations for Codesign
- Instruction Set Processors (ISPs) are available as cores in many design kits (386s, DSPs, microcontrollers, etc.)
- Systems on silicon: many transistors are available in typical processes (> 10 million transistors in an IBM ASIC process, etc.)
- Increasing capacity of field-programmable devices, some of which can even be reprogrammed on the fly (FPGAs, CPLDs, etc.)
- Efficient C compilers for embedded processors
- Hardware synthesis capabilities

SoC Co-Design Challenges
- Current systems are complex and heterogeneous
- They contain many different types of components
- Half of the chip can be filled with 200 low-power, RISC-like processors (ASIPs), interconnected by field-programmable buses and embedded in 20 Mbytes of distributed DRAM and flash memory; the other half is ASIC
- Computational power will come not from multi-GHz clocking but from parallelism, with clocks below 200 MHz. This will greatly simplify design for correct timing, testability, and signal integrity.
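
Classical HW/SW Design Methodology
- Classical HW/SW design process
  - The system is split into hardware and software at the earliest stage of development
  - Hardware and software are developed independently
  - The two parts are integrated after each has been completed
- Problems with classical HW/SW design
  - Hardware and software parts are hard to exchange in the middle of development
  - System integration and verification take a long time
- Recent design requirements
  - Best price/performance ratio
  - Low power consumption
  - Ease of use, weight, and volume
  - Time to market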
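
The Need for Co-design
- A methodology is needed that considers hardware and software together from the start of the design: "Co-design"
- Characteristics of co-design
  - Concurrency & Integration
  - Hardware and software are developed and verified together, in an integrated way
  - Well suited to the design of embedded systems and SoCs
  - An optimal hardware/software partition improves the price/performance ratio
  - Shorter time to market, through reduced design time, design cost, and errors in both hardware and software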

General Co-design Process
[Figure: general co-design flow. Blocks: System Specification; Performance Goal; Control Data Flow Graph; Cost Estimation (Delay, Area, Power); Constraint Analysis; Memory / Pipeline Optimization; Hardware/Software Partitioning; Co-Synthesis; Interface Library; Software Specification; Hardware Specification; Interface Specification; Compiler; Behavioral Synthesis; CPU, Memory, and Custom HW on a System Bus, running Application SW and Device Drivers; Verification; Debugging. A co-simulator runs alongside the whole flow.]

HW/SW Co-design Space
[Figure: plot of flexibility (number of products an implementation can be applied to, 1 to 10000) against power efficiency (MIPS/W, 10^1 to 10^5). Toward SW, flexibility increases; toward HW, power efficiency increases; the co-design space lies between the two.]

Design space exploration
[Figure: co-design flow. Blocks: Customer/marketing and system architect; Cospecification; High-level transformation; System analysis; Design space exploration; Reused functions and processes; Process transformation; HW/SW partitioning and scheduling; HW arch & comp.; Reused HW & SW components; HW synthesis; SW synthesis; Evaluation (cosimulation).]
Source: Ernst (IEEE D & T of Computer)

Previous work (1)
- ASIP (application-specific instruction-set processor) co-design
  - builds a specific programmable processor
  - translates the application into software code executable by that processor
  - includes the instruction set design
- Hardware/software synchronous system co-design
  - a software processor acts as the master controller
  - a set of hardware accelerators act as coprocessors
  - software is used for cost, hardware for speed

Previous work (2)
- Hardware/software co-design for distributed systems
  - maps a set of communicating processes onto a set of interconnected processors
  - behavioral decomposition, processor allocation, communication transformation
  - partitioning methods restrict the cost function to parameter
- Co-design corporations
  - Coware, Specsyn, Siera, Ptolemy
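
System Specification
- Overview
  - The stage in which the system is described with a unified representation
- Characteristics
  - Must support unified design and analysis techniques for both hardware and software
  - System tasks can easily be moved between hardware and software
  - The system can be evaluated within an integrated design environment
  - Fast performance analysis is possible
  - System-level languages: SystemVerilog, SystemC, SpecC, etc.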

System-Level Language
- Why it is needed
  - System design complexity is increasing
  - High-level abstraction and modeling are required
  - An efficient system design flow is needed
- Requirements
  - Must support system models at multiple abstraction levels
  - Must allow the embedded software to be integrated into the overall system
  - Must be able to produce an executable design specification
  - Must be able to produce an executable platform model

Types of System-Level Language
[Figure: taxonomy of system-level modeling languages. Categories shown: C/C++ based (library), VHDL/Verilog replacements, higher-level languages, entirely new languages, Java based. Languages shown: SystemC, Cynlib, SoC++, Handel-C, A/RT, VHDL+, SystemVerilog, SDL, SLDL, SUPERLOG, Java.]

SystemVerilog
- Features
  - Improves the productivity and readability of Verilog code
  - Provides concise hardware description
  - Extends Verilog-2001 with high-level abstraction
  - Integrates assertions into Verilog to extend verification
- Reference
  - SystemVerilog 3.1, ballot draft: Accellera's Extensions to Verilog, Accellera, Napa, California, April 2003
  - Verilog 2001: A Guide to the New Verilog Standard, Stuart Sutherland, Kluwer Academic Publishers, Boston, Massachusetts, 2001
  - http://www.eedesign.com/story/OEG20030521S0086

SystemC Design Flow
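
Trends in System-Level Languages
- Extending existing design languages (e.g., SystemVerilog)
  - Supports system models at multiple abstraction levels
  - Advantages
    - Syntax and environment familiar to designers
    - Compatibility with earlier versions
  - Disadvantages
    - Standardization is not yet complete
    - Hardware description languages are hard to learn
- Using C/C++-based languages (e.g., SystemC)
  - Advantages
    - Allows highly abstract, logical descriptions
    - Well suited to executable specifications
  - Disadvantages
    - HW/SW partitioning is not easy
    - Hardware characteristics are hard to describe completely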
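
HW/SW Partitioning Overview
- Definition
  - In HW/SW co-design, partitioning means restructuring a high-level behavioral description of the system into hardware and software parts
- Goal
  - Partition so that both cost and performance targets are met, taking into account system performance, area, latency, and communication overhead
- Characteristics
  - Hardware implementation
    - High performance, from hardware speed and parallel execution of tasks
    - Higher hardware cost, since additional ASICs or FPGAs are needed
  - Software implementation
    - Lower hardware cost, since it runs on a low-cost, high-performance processor
    - Lower performance, because operations execute sequentially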
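
Advantages of HW/SW Partitioning
- Minimizes the cost and time of system development
  - Fewer iterations of the [partition -> evaluate -> repartition] loop
  - Shorter verification time in the test phase
- Best price/performance ratio
  - Balances the performance of HW against the low cost of SW to reach the best price/performance ratio
- Market competitiveness (time to market)
- Flexible system description
  - Software implementation makes improvement and maintenance easy
  - Many forms of system can be designed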

HW/SW Partitioning Flow
[Figure: partitioning flow. Blocks: System Specification; Target Architecture Library; Target Architecture; SW Compiler; HW Cost Estimation; SW Cost Estimation; Partitioning Graph; HW/SW Partitioning; resulting Partitioning Graph with HW and SW parts.]
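
HW/SW Partitioning Considerations
- Hardware/software behavior: the partition must take the characteristics of both hardware and software into account
- Sharing of hardware functions: hardware functions are shared with area and execution time in mind
- Communication between HW and SW: the extra communication time between hardware and software must be considered
- Scheduling: execution times and ordering between hardware and software are adjusted
- Functional pipeline: communication and execution time between hardware and software are optimized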

HW/SW Behavior
- Hardware implementation: the modules run in parallel; hardware area increases
- Software implementation: the modules run sequentially; execution time increases

SW implementation (V1 and V2 on processor P1):
    void V1(..) {..}
    void V2(..) {..}
    void main()
    {
        V1(..);
        V2(..);
    }

[Figure: the SW schedule on processor P1 runs V1 and then V2 along the time axis t; the HW schedule on hardware H1 runs V1 and V2 in parallel.]

Sharing HW Functions
- Hardware area can be minimized
- Execution time increases

[Figure: four operations V1 to V4 implemented with four functional units (FU1 to FU4), with two (FU1, FU2), or with one (FU1). As fewer units are shared among the operations, area shrinks and execution time t grows.]
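
The area/time trade-off in the figure can be sketched numerically. This is a minimal sketch under stated assumptions, not the slide author's model: the function name `sharing_tradeoff`, the unit area, and the unit operation time are hypothetical, and the operations are assumed independent and equally long.

```python
import math

def sharing_tradeoff(num_ops, num_units, unit_area=1.0, op_time=1.0):
    """Return (area, time) when num_ops independent, equal-length
    operations share num_units identical functional units.
    unit_area and op_time are illustrative values, not from the slides."""
    area = num_units * unit_area
    # Operations run in rounds; each round uses every unit at most once.
    time = math.ceil(num_ops / num_units) * op_time
    return area, time

# Four operations V1..V4 on 4, 2, or 1 functional units.
for units in (4, 2, 1):
    print(units, sharing_tradeoff(4, units))
```

Halving the number of functional units halves the area and doubles the execution time in this simple model; real designs also pay for multiplexers and control logic, which the sketch ignores.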
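
Interfacing between HW and SW
- Needed to transfer data between hardware and software
- The additional communication time must be taken into account

[Figure: V1 runs on processor P1 and V2 on hardware H1, connected by channel C. In the schedule, V1 on P1 is followed by the transfer on C and then by V2 on H1.]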

Logical Bus Architecture
- System bus signals
  - Address, data, and control signals
  - The address space consists of the memory space and the I/O space
  - Memory space: memory of the SW component
  - I/O space: ports within SW and registers in other HW
- Port signals
  - Specialized signals that interface directly between SW and HW components
- Interrupt signals
  - Raised when SW and HW components have completed an operation, or when an error condition is detected

Scheduling
- Channel accesses
  - While the processor (SW) is reading or writing data over the communication channel, no other operation (HW or SW) can proceed
  - [Figure: an invalid and a valid schedule of V1, V2, and V3 across HW and SW, with channel accesses w1, r3, and w3 along the time axis t]
- Execution order
  - Each task follows the stages [Read -> Execution -> Write]
    - Read the data over the communication channel
    - Execute once the required data has been read
    - Write the result back to the communication channel
  - [Figure: an invalid and a valid SW schedule of V1 and V2 with reads r1, r2 and writes w1, w2 along the time axis t]

Functional Pipelining
- Optimizes the execution time of the whole system

[Figure: total execution time of tasks V1 to V4, split between HW and SW, over repeated data sets along the time axis t. With pipelining, successive data sets overlap and the total execution time is shorter than without pipelining.]
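
The gain from functional pipelining can be estimated with a standard pipeline formula. The sketch below is illustrative only: the function name `total_time` and the stage times are hypothetical, every data set is assumed to need the same time in each stage, and buffering between stages is assumed sufficient so that the slowest stage sets the rate.

```python
def total_time(stage_times, num_items, pipelined):
    """Total execution time for num_items data sets passing through the
    stages in order. Stage times are illustrative assumptions."""
    one_pass = sum(stage_times)
    if not pipelined:
        # Each data set runs through every stage before the next one starts.
        return num_items * one_pass
    # First data set fills the pipeline; each later one finishes one
    # bottleneck period after its predecessor.
    return one_pass + (num_items - 1) * max(stage_times)

stages = [2, 1, 3, 1]   # hypothetical times for V1..V4
print(total_time(stages, 3, pipelined=False))
print(total_time(stages, 3, pipelined=True))
```

The pipelined total is dominated by the slowest stage, which is why balancing the HW and SW stages matters when partitioning.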

Partitioning Graph (Computation Models)
- State-oriented models
  - Represent the system as a set of states and a set of state transitions
  - Finite State Machine (FSM)
  - Hierarchical Concurrent FSM
    - Has a hierarchical structure
    - Sub-states run concurrently with the higher-level structure

[Figure: an FSM with states S00, S01, S02, and a Hierarchical Concurrent FSM with states S00, S01, S02 and sub-states S10 to S13; transitions are labelled C/10, C/01, C/00, and C'/00.]

Partitioning Graph (Computation Models)
- Activity-oriented models
  - Represent the system as a set of activities related by data or control dependencies
  - Data Flow Graph (DFG)
  - Control Flow Graph (CFG)
  - Control/Data Flow Graph (CDFG)
    - Combines the CFG and the DFG
    - Widely used in HW/SW co-design

[Figure: a DFG (reads of a, b, c, d feeding - and + operations and a write of r), a CFG (start; i=1; if i<=10; Z(i) = X(i)*Y(i); i=i+1; end), and the corresponding CDFG.]

Software Cost Estimation
- Software program memory
  - Estimated from the number of instructions in the assembler code
- Software data memory
  - Estimated from the memory requirements of all variables in the source code
- Software execution time
  - Instruction execution times are estimated from the instruction set description
  - Subroutine execution times (e.g., function calls) are estimated from the subroutine library

[Figure: SW estimation flow. Blocks: C specification; C compiler; assembler code; instruction set description; subroutine library; SW estimation; partitioning graph.]
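
The three software estimates above can be sketched as simple sums. This is a hedged illustration, not a real estimator: the cycle table `CYCLES`, the 4-byte instruction size, the 60 MHz clock, and the function name `estimate_sw_cost` are all hypothetical stand-ins for what an instruction set description would supply.

```python
# Hypothetical cycles per instruction class (stand-in for an
# instruction set description).
CYCLES = {"load": 2, "store": 2, "add": 1, "mul": 3, "branch": 2}

def estimate_sw_cost(static_counts, executed_counts, variable_bytes,
                     bytes_per_instr=4, clock_mhz=60.0, cycles=CYCLES):
    """Estimate program memory, data memory, and execution time.

    static_counts:   instructions present in the assembler code
    executed_counts: instructions actually executed (loops unrolled)
    variable_bytes:  size of each variable in the source code
    """
    program_bytes = sum(static_counts.values()) * bytes_per_instr
    data_bytes = sum(variable_bytes.values())
    total_cycles = sum(cycles[op] * n for op, n in executed_counts.items())
    return {
        "program_bytes": program_bytes,
        "data_bytes": data_bytes,
        "exec_time_us": total_cycles / clock_mhz,  # cycles / (cycles per us)
    }

# A 10-iteration loop body: two loads, one mul, one add, one store, one branch.
body = {"load": 2, "mul": 1, "add": 1, "store": 1, "branch": 1}
executed = {op: 10 * n for op, n in body.items()}
print(estimate_sw_cost(body, executed, {"x": 40, "y": 40, "z": 40, "i": 4}))
```

Program memory uses the static instruction count while execution time uses the executed count, which is why the two are passed separately.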

Hardware Cost Estimation
- C to VHDL
  - The C code is converted to VHDL code in order to estimate hardware cost
- Hardware cost estimation
  - The actual hardware area and execution time of the VHDL code are estimated using the architecture library

[Figure: HW estimation flow. Blocks: C specification; C-to-VHDL generator; VHDL code; target architecture; HW estimation; partitioning graph.]

Classification of Partitioning Algorithms
- Constructive algorithms
  - Group the objects before partitioning
  - Grouping is based on the closeness of the objects
  - As the circuit grows, the grouping step takes excessive run time
- Iterative algorithms
  - Start from an initial partition obtained by some method and repeatedly move objects to improve the partitioning result
  - Allow a more accurate evaluation than the closeness function used by constructive algorithms
  - Examples: greedy algorithm, simulated annealing, etc.
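
Clustering Partitioning Algorithm
- A constructive algorithm that uses closeness
- Algorithm
  - Group the objects with the highest closeness
  - Recompute the closeness values
  - Repeat until the termination condition is met

[Figure: four objects O1 to O4 with pairwise closeness values (30, 25, 15, 10, ...). After a merge, the closeness of the merged group to each remaining object is the average of the two old values, e.g. Avg(15,25) = 20 and Avg(10,10) = 10.]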
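
The clustering loop can be sketched as follows. This is a minimal sketch, not the slide author's implementation: the function name `cluster`, the stop condition (a target number of groups), and the assignment of the figure's closeness values to particular object pairs are assumptions made for illustration.

```python
def cluster(objects, closeness, num_groups):
    """Repeatedly merge the two closest groups; the closeness between a
    merged group and any other group is the average of the two old values.
    Stops when num_groups groups remain (an assumed termination condition)."""
    groups = [frozenset([o]) for o in objects]
    close = {frozenset((frozenset([a]), frozenset([b]))): c
             for (a, b), c in closeness.items()}
    merges = []
    while len(groups) > num_groups:
        pair = max(close, key=close.get)      # highest closeness
        g1, g2 = tuple(pair)
        best = close.pop(pair)
        merged = g1 | g2
        others = [g for g in groups if g != g1 and g != g2]
        for g in others:                      # recompute closeness
            c1 = close.pop(frozenset((g1, g)), 0)
            c2 = close.pop(frozenset((g2, g)), 0)
            close[frozenset((merged, g))] = (c1 + c2) / 2
        groups = others + [merged]
        merges.append((set(merged), best))
    return groups, merges

# Assumed reading of the figure: which pair carries which value is a guess.
CLOSENESS = {("O1", "O2"): 30, ("O1", "O3"): 15, ("O2", "O3"): 25,
             ("O1", "O4"): 10, ("O2", "O4"): 10, ("O3", "O4"): 10}
groups, merges = cluster(["O1", "O2", "O3", "O4"], CLOSENESS, num_groups=2)
print(merges)
```

With these values O1 and O2 merge first, after which the merged group's closeness to O3 is Avg(15,25) = 20 and to O4 is Avg(10,10) = 10, matching the averages shown in the figure.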

Simulated Annealing
- An iterative algorithm modeled after the physical annealing process
- Algorithm
  - Start from an initial partition and an initial temperature
  - Decrease the temperature slowly
  - Generate random moves at each temperature
  - Moves that improve the partitioning cost are applied
  - While the temperature is high, moves that worsen the partitioning cost are applied as well
- The partitioning result and the run time (complexity) depend on the temperature decrease rate
- Reference
  - S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, "Optimization by simulated annealing", 1983

Simulated Annealing (Cont'd)
    temp = initial temperature
    cost = objfct(P)
    while not Frozen loop
        while not Equilibrium loop
            P_tentative = Move(P)
            cost_tentative = objfct(P_tentative)
            delta_cost = cost_tentative - cost
            if (Accept(delta_cost, temp) > Random(0,1)) then
                P = P_tentative
                cost = cost_tentative
            end if
        end loop
        temp = DecreaseTemp(temp)
    end loop

    Accept(delta_cost, temp) = min(1, e^(-delta_cost/temp))
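
A runnable version of the loop above might look like the following. It is a sketch under stated assumptions, not a reference implementation: the cost function `objfct` (HW area plus a deadline penalty), the task table, the geometric cooling schedule, the fixed number of moves per temperature standing in for the equilibrium test, and the extra best-so-far bookkeeping are all illustrative choices that the slides do not specify.

```python
import math
import random

def objfct(tasks, in_hw, deadline, penalty=100.0):
    """Hypothetical cost: HW area plus a penalty for missing the deadline."""
    area = sum(t["area"] for t, hw in zip(tasks, in_hw) if hw)
    time = sum(t["hw_time"] if hw else t["sw_time"]
               for t, hw in zip(tasks, in_hw))
    return area + penalty * max(0.0, time - deadline)

def accept(delta_cost, temp):
    """min(1, e^(-delta_cost/temp)); improving moves are always accepted."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temp)

def anneal(tasks, deadline, temp=10.0, cooling=0.9, moves_per_temp=50,
           frozen_temp=0.01, seed=0):
    rng = random.Random(seed)
    p = [False] * len(tasks)                 # initial partition: all SW
    cost = objfct(tasks, p, deadline)
    best, best_cost = list(p), cost          # remember the best partition seen
    while temp > frozen_temp:                # "while not Frozen"
        for _ in range(moves_per_temp):      # stand-in for "not Equilibrium"
            p_tentative = list(p)
            i = rng.randrange(len(tasks))    # Move(P): flip one task HW <-> SW
            p_tentative[i] = not p_tentative[i]
            cost_tentative = objfct(tasks, p_tentative, deadline)
            if accept(cost_tentative - cost, temp) > rng.random():
                p, cost = p_tentative, cost_tentative
                if cost < best_cost:
                    best, best_cost = list(p), cost
        temp *= cooling                      # DecreaseTemp(temp)
    return best, best_cost

TASKS = [  # hypothetical per-task HW area and SW/HW execution times
    {"area": 5.0, "sw_time": 8.0, "hw_time": 1.0},
    {"area": 3.0, "sw_time": 2.0, "hw_time": 1.0},
    {"area": 9.0, "sw_time": 6.0, "hw_time": 2.0},
    {"area": 2.0, "sw_time": 1.0, "hw_time": 0.5},
]
best, best_cost = anneal(TASKS, deadline=10.0)
print(best, best_cost)
```

A slower cooling rate (closer to 1) explores more partitions at the price of run time, which is the dependence on the temperature decrease rate noted on the previous slide.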

Related Work on HW/SW Partitioning
- POLIS: U.C. Berkeley
  - Performance is estimated through co-simulation, and the user partitions the system into hardware and software parts by hand
- COSYMA (software-oriented partitioning)
  - Uses simulated annealing driven by cost estimation
  - The system starts out implemented in software; the parts that are performance bottlenecks are identified and moved to hardware to minimize overall execution time
- Vulcan (hardware-oriented partitioning): Stanford U.
  - Uses a greedy algorithm
  - Starting from a system described in HardwareC, non-critical operations are moved to software to minimize hardware size
- Ptolemy
  - Uses a greedy algorithm
  - Provides an integrated environment for system simulation and synthesis
  - Rather than describing the whole system in one unified, general language, it uses the representation best suited to each subsystem and provides a unified interface between the heterogeneous representations

HW/SW Co-Synthesis
- The stage that synthesizes each hardware/software component and the interfaces from the partitioning result
- Performance is optimized to fit the actual system architecture
- Hardware synthesis
  - FPGA or ASIC
  - HDL structural description
  - Existing EDA tools: Design Compiler, Synplify
- Software synthesis
  - Processor (ARM, Teak DSP, ...)
  - C or assembly code generation
  - Compile & optimize
- Interface synthesis
  - Communication channel (bus, shared memory, ...)
  - Bus and protocol generation

[Figure: a partition of SW and HW blocks exchanging data through send/recv is synthesized into software with interface and hardware with interface.]
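
Hardware Synthesis
- Definition
  - The stage that automatically implements a hardware design from a design specification written in a hardware description language (HDL)
- Goals
  - Fast creation and modification of designs
  - Support for a methodology that offers multiple design alternatives
  - Relieve the designer of excessively detailed work in VLSI design
  - Make correct designs possible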

Hardware Synthesis (cont'd)
- Communication considerations
  - Receive, send, and control logic are generated

[Figure: a task graph A to G mapped onto hardware and a communication network. The hardware part (B, C, D) is surrounded by Receive Logic (Recv 1), Send Logic (Send 1, Send 2), and clocked Control Logic.]

Wrapper for Hardware Core ASIC
- Wrapper
  - Generates signals according to the network protocol
  - Architecture independent: the core ASIC needs no modification

[Figure: a wrapper between the processor data bus and the ASIC, with protocol signals Start, Input Ready, and Output Ready.]

Software Synthesis
- Definition
  - The stage that automatically generates correct and efficient software from specifications and reusable components
- Goals
  - Higher software productivity
  - Lower development cost
  - More reliable software implementations that satisfy the specification
  - Correct programs
  - Minimal memory use (code and data)

Software Synthesis (cont'd)
    ...
    void main()
    {
        ...
        /* hardware execution */
        recv(data);
        B();
        C();
        D();
        send(data1, data2);
        /* hardware execution */
        ...
    }

[Figure: the task graph A to G with the software part (B, C, D) bracketed by Recv 1, Send 1, and Send 2.]

Interface Synthesis
- The stage that synthesizes what is needed for heterogeneous components to communicate
- Interface components
  - Hardware: bus interface, glue logic
  - Software: device driver, operating system
- Depends on the target architecture

[Figure: communication architecture. A processor (operating system, device drivers) and an ASIC each connect through a network interface to the on-chip network.]

Types of Model, Channel, and Protocol
- Communication models
  - Message passing
  - Shared memory communication
- Communication channels
  - Dedicated lines
  - Bus
  - FIFO buffers
  - Shared memory
- Communication protocols
  - 2-phase or 4-phase handshake
  - RS-232, USB, PCI, etc.

Inter-process Communication Model

< Shared Memory >
    Shared Memory M

    Process A
    begin
        variable i;
        ...
        M := i;
        ...
    end

    Process B
    begin
        variable j;
        ...
        j := M;
        ...
    end

< Message Passing >
    Process A
    begin
        variable i;
        ...
        send(i);
        ...
    end

    Process B
    begin
        variable j;
        ...
        receive(j);
        ...
    end
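
The two inter-process models above can be tried out with threads. This is only an analogy in software, assuming Python's standard `threading` and `queue` modules; the function names and the use of an `Event` for synchronization are illustrative choices, not part of the slides.

```python
import queue
import threading

def shared_memory_demo(value):
    """Process A writes shared location M; process B reads it afterwards."""
    M = {"data": None}                  # shared memory M
    written = threading.Event()         # the processes must synchronize themselves
    result = []

    def process_a():
        i = value
        M["data"] = i                   # M := i
        written.set()

    def process_b():
        written.wait()                  # wait until A has written
        j = M["data"]                   # j := M
        result.append(j)

    threads = [threading.Thread(target=process_b),
               threading.Thread(target=process_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

def message_passing_demo(value):
    """Process A sends over a channel; process B blocks in receive."""
    channel = queue.Queue(maxsize=1)    # the channel between A and B
    result = []

    def process_a():
        i = value
        channel.put(i)                  # send(i)

    def process_b():
        j = channel.get()               # receive(j), blocks until data arrives
        result.append(j)

    threads = [threading.Thread(target=process_b),
               threading.Thread(target=process_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

print(shared_memory_demo(7), message_passing_demo(7))
```

In the shared-memory version the synchronization is separate from the data, while in the message-passing version the channel carries both, which is the practical difference between the two models.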

Characteristics of Communication Channels

Communication channel   Communication model   Blocking or non-blocking   Topology
Dedicated lines         Message passing       Blocking                   Point-to-point
Bus (without memory)    Message passing       Blocking                   Multi-way
FIFO                    Message passing       Non-blocking               Point-to-point
FIFO                    Shared memory         Non-blocking               Multi-way

Channel Refinement
- Builds the channels over which messages are sent into an actual communication network
- Bus generation
  - Determines the bus width (the number of data lines)
- Protocol generation
  - Defines the transfer mechanism that takes place over the bus

[Figure: functions F1, F2, F3 connected by channels are refined into a microprocessor (F1, OS, device drivers) and two ASICs (F2, F3), each attached through a network interface to the physical communication network.]

What is HW/SW Co-verification?
- A method of verifying circuit behavior that considers together the behavior of hardware implemented as an ASIC and the software running on a microprocessor
- Used to verify systems that contain both hardware and software elements, such as SoCs and embedded systems
- At each design stage, the problems that the hardware and software elements may have are found quickly and fixed

[Figure: HW/SW co-verification setup. Labels: Software Code; Compiler; Link; HDL Code; Processor Model Setup; Configuration File Setup (Configuration File Name, Specification, Debug Files Information, Debug Memory Definition).]

Co-verification Methods
- Simulation-based
  - Advantages: flexibility, high visibility
  - Disadvantage: verification is very slow
  - Economical, because it can be used before the actual implementation
- Emulation-based
  - Advantages: speed, ICE capability
  - Disadvantages: high cost, low visibility
  - The system can be verified while observing how it actually runs

Co-simulation
- Soft or virtual prototype
  - Software models for the system
- Simulator features
  - ISS (Instruction Set Simulator)
  - HDL simulator
  - Virtual interface
- Hardware simulation
  - Hardware is implemented in behavioral-level HDL and simulated with a logic simulator
- Software simulation
  - Compiled code is executed on the host using an Instruction Set Simulator (ISS)
  - An abstract RTOS must be supported
  - Peripherals are written as C models and simulated
- Interface simulation
  - Uses virtual transactions
  - Wrappers are described in behavioral-level HDL for the abstract device driver and the hardware logic

[Figure: an ISS (C/C++) and an HDL simulator (VHDL, Verilog) connected through a bus functional model.]

Co-emulation
- Real prototype
  - Hardware models for the system
- Emulation system features
  - FPGA for the HW prototype
  - Real CPU for the SW code
  - Peripherals for I/O
- An emulation system runs much faster than a simulator, which shortens verification time

[Figure: emulation system (CPU, DSP, FPGA, memory, UART, etc.) with a system-level testbench.]

Co-verification Strategy
- The system is verified with simulation and emulation at each design stage
- Early design stage
  - Verification by co-simulation using a virtual prototype
  - Verifying the prototype quickly removes the relatively simple problems that appear early in the design
- Mid-design and implementation stage
  - Verification by co-emulation using a real prototype
  - Removes problems that appear at implementation time, such as those arising at RTL or gate level

Mentor Graphics Seamless CVE
- Seamless® provides a strong co-simulation environment through a hardware simulator, an ISS, an abstract RTOS, and a virtual interface

[Figure: ISS and HDL simulator connected by bus transactions.]

Altera Excalibur
- Integrates an ARM core, AHB, and FPGA on a single chip
- Provides a fast, capable HW/SW verification environment at the actual implementation stage

[Figure: ARM core and FPGA connected by AHB.]

EDA Co-design/Co-simulation Tools

Company           Product                   Feature
Cadence           VCC                       HW/SW co-design tool
Coware            N2C                       C/C++-based high-level co-design tool
Mentor Graphics   Platform Express          Platform-based SoC design tool
Mentor Graphics   Seamless CVE              HW/SW co-simulation tool
Synopsys          CoCentric System Studio   SystemC-based co-design & co-verification tool

Design Flow using Seamless CVE
- Typical system design process
  - Specify system
  - Design HW & SW
  - Test together

[Figure: Specify System; Design Hardware and Design Software in parallel; Integrate & Test; OK? (Yes / No). The "No" feedback loops are labelled "inefficient".]

Design Flow using Seamless CVE
- Logic simulation
  - Requires a fully functional microprocessor model
    - Bus functional models are not fully functional
    - Software models are too slow
    - Software models may not be available
    - Hardware models have limited capability
  - Limited debugging capability
  - Okay for verifying hardware
  - Ineffective for running software

Design Flow using Seamless CVE
- Instruction set simulation
  - Fast
  - Good debugging capability
  - Can model custom hardware
  - Limited I/O and interrupt handling

Design Flow using Seamless CVE
- Seamless CVE
[Figure: Seamless at the center. Labels: X-ray for Debug; Software Debug; Software Simulation; ModelSIM for Simulation; Hardware Debug; Hardware Simulation; System Control; Performance Optimization.]

Platform Classification
- Application platforms
  - Multimedia platform: Nexperia, TI's OMAP
  - 3G wireless platform: Infineon's M-gold
  - Bluetooth platform: Parthus
  - Wireless platform: ARM's PrimeXsys
- Processor-centric platforms
  - Improv System, ARC, Tensilica, Triscend
- Communication-centric platforms
  - Sonics, Palmchip

The Platform-Based Design Concept (Cadence)
[Figure: platform-based design concept. Labels: Pre-Qualified/Verified Foundation-IP*; HW-SW Kernel + Reference Design; Scaleable bus, test, power, IO, clock, timing architectures; MEM; Hardware IP; SW IP; Application Space; CPU; FPGA; Programmable; Reconfigurable Hardware Region (FPGA, LPGA, ...); Processor(s), RTOS(es) and SW architecture; Foundry-Specific HW Qualification; SW architecture characterisation.]
*IP can be hardware (digital or analogue) or software. IP can be hard, soft or 'firm' (HW), source or object (SW).

WCDMA BER with Image Quality Verification (Cadence)
[Figure: WCDMA channel models in floating point and fixed point (paths A and B) feeding an Image Quality Tester (IQT). Caption: Polymorphism for Rapid Conversion of Floating to Fixed Pt. Models.]
- Image quality measures listed
  - modulation transfer function area (MTFA)
  - integrated contrast sensitivity (ICS)
  - square root integral (SQRI)
  - subjective quality factor (SQF)
  - folded SQRI
  - folded MTFA
  - peak signal/noise ratio (PSNR)

Platform Architecture
- Do I need a dedicated DSP?
- Which microcontroller? ARM? HC11? ARC?
- Which RTOS do I use? Which scheduling policy do I have to choose?
- How fast will my user interface software run? How much can I fit onto my microcontroller?
- Which bus? PI? AMBA? Dedicated bus for DSP?
- Can I buy a QCELP decoding core?
- Do I need dedicated HW or can I run this on the microcontroller?

Platform-based HW/SW Co-verification

Ingredients of An Architecture Platform

Example of Platform-based Design

Pros & Cons of Platform-based Design

Triscend A7 System Highlights

Cypress MicroSystems - PSoC(TM)

Wipro's SOC-RaPtor(TM) Architecture

Philips Rapid System Prototyping

Embedded Software Architecture for SoC Design

Solutions to Derivative Design Problem
The corrupted voice mail decoding is caused by the JPEG decoding having a higher priority on the DSP than the QCELP audio decoding of the voice mail. There are 2 possible solutions:
1. HW/SW tradeoff
   Move the JPEG decoding, which stalled the QCELP audio decoding, into hardware
2. SW/SW tradeoff
   Re-prioritise the QCELP audio decoding
We will explore option 1, by moving part of the JPEG decoding (IDCT) to dedicated HW.

JPEG Encoding/Decoding: Design of Systems-on-a-Chip
Masaharu Imai, School of IST, Osaka University
E-mail: imai@ics.es.osaka-u.ac.jp
http://vlsilab.ics.es.osaka-u.ac.jp/~imai/
- Principle of codesign method
- Specification of Machikane-I
- JPEG Encoding/Decoding
- Block Diagram of Machikane-I
- Estimation of Design Quality
- Optimum Partitioning

Design strategy
- Put high-rate but simple functions on peripheral processors.
  - This also moves control physically closer.
- Consolidate low-rate background tasks on the main CPU.
- Can multiple processes execute concurrently?
- Is the performance granularity of the available components fine enough to allow an efficient search of the solution space?
- Do computation and communication requirements conflict?
- How accurately can we estimate performance?
  - software
  - custom ASICs

Granularity of Description

Tradeoff between HW cost and Performance

Spec. of Digital Motion Camera
- Functions
  - Image Input
  - Image Display
  - Image Compression
  - Image Store
  - Image Transmission
  - User Interface
- Encoding procedure
  - R2Y: Transform from RGB to YCbCr
  - DCT: Discrete Cosine Transform
  - Q: Quantization
  - VLC: Variable Length Coding
- Decoding procedure
  - VLD: Variable Length Decoding
  - IQ: Inverse Quantization
  - IDCT: Inverse Discrete Cosine Transform
  - Y2R: Transform from YCbCr to RGB

Block Diagram

CODEC

Software Implementation

Some Functions by Hardware

Computation Time

Pipeline Processing of Images

Case 1: Parallel Processing

Case 2: Sequential Processing

Partitioning Methods

Partitioning Method C

Partitioning Method D

Partitioning vs. Design Quality

Partitioning Method
[Figure: partitioning procedure. The design target model is divided into process components 1 to n. Major classification (division, assignment) places components under HardWare, SoftWare, or Un-Decision; similar process components are grouped (Group 1 to Group 6); detailed classification (assignment) places them under HardWare or SoftWare.]
Y. Endo, H. Koizumi
Dept. of Computer & System Eng., Tokyo Denki Univ., Japan

Major Classification
- The design target model is described in Visual Basic and analyzed in order to divide it into process components.
- The software speed of each process component is predicted.
- Division into HW parts and SW parts
  - A component whose processing speed does not meet the expected result becomes a HW part.
- Reducing the number of process components subject to the functional trade-off greatly reduces the time the trade-off takes.
- Similar process components are therefore grouped, to reduce the number of components in the trade-off.

Detailed Classification
- Calculating the HW processing time
  - Multiply the SW speed of the model by a constant coefficient, or
  - Apply a knowledge database corresponding to the SW logic to calculate the HW processing speed
- Communication time between HW and SW
  - Calculated from the HW/SW communication speed and the amount of data transferred
- If no combination satisfies the design goals and constraints, the system algorithm is improved and the design goals and constraints are re-examined.
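
The coefficient-based estimate and the communication-time calculation can be written out as a short worked example. All names and numbers below (`evaluate`, the coefficient 0.1, the channel speed) are hypothetical; the slides only state that HW time is the SW time scaled by a constant and that communication time follows from channel speed and data volume.

```python
def hw_time_ms(sw_time_ms, coefficient):
    """HW processing time estimated as SW time times a constant coefficient."""
    return sw_time_ms * coefficient

def comm_time_ms(data_bytes, bytes_per_ms):
    """HW/SW communication time from channel speed and amount of data."""
    return data_bytes / bytes_per_ms

def evaluate(sw_time_ms, coefficient, data_bytes, bytes_per_ms):
    """Compare keeping a component in SW with moving it to HW."""
    hw = hw_time_ms(sw_time_ms, coefficient)
    comm = comm_time_ms(data_bytes, bytes_per_ms)
    return {
        "hw_ms": hw,
        "comm_ms": comm,
        "total_hw_ms": hw + comm,
        "hw_is_faster": hw + comm < sw_time_ms,
    }

# A 10 ms software component with little data to move, then with a lot.
print(evaluate(10.0, 0.1, 4000, 1000.0))
print(evaluate(10.0, 0.1, 12000, 1000.0))
```

The second call shows why communication time belongs in the classification: the same component stops being worth moving to hardware once the data transfer outweighs the computation saved.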

Development of a Real-time Motion Image Encoder using Codesign Methodology
C.W. Chau, S. Kwong, K.F. Man, W.A. Halang and A.D. Stoyenko
City University of Hong Kong; FernUniversität, Hagen, Germany; New Jersey Institute of Technology
- A set of design constraints: area, real-time requirements, performance, memory requirements, power consumption, and programmability
- A complete design cycle for a real-time image encoder:
  - develop or select an algorithm
  - use high-level functional simulations to verify the suitability of the algorithm
  - break the algorithm down into several sub-functions
  - decompose the algorithm by hardware/software co-design
- Software implementation: functions that need field programmability or that are inherently better implemented in software
- Hardware implementation: time-critical functions, to improve system performance

Introduction
- Hardware design
  - Selection of the processor, the number of processors, and their connections
- Software design
  - Different types of processor affect code generation; when using a fixed-point DSP:
    - overflow prevention must be included
    - the algorithm must be modified to minimize the effect of the finite precision of fixed-point calculations
- Interface design
  - Adding latches, buffers, or address decoders in the hardware design
  - Handling I/O routines and the hardware/software synchronization mechanism in the software design
4장. HW/SW Co-Design for SoC
System model of the real-time image
encoder
[Figure 2. System model of the image encoder. Master: PC/AT or compatible computer. Slave: interface logic, motion compensation, image compression and buffer control. Signals: motion image, motion vectors and estimation errors, compression control parameters, compressed image data.]
System model of the real-time image
encoder

Master-slave model for the system
(PC/AT: master; image encoder: slave)

Purpose of the real-time image encoder
: compress a stream of motion images with a resolution of 128 by 128 pixels

Slave system: a coprocessor that reduces the workload of the host system

Host system
: gains more time to handle other tasks such as the user interface and data logging
System specifications
Description          Requirements
Size of image        128 by 128 pixels
Compression ratio    Adjustable; 30 times compression with 29 dB SNR
Processing speed     Maximum compression speed = 15 frames/sec.
Host system          PC/AT or compatible computer
Size                 Same as a PC/AT add-on card
Cost                 Less than US $200.00
Interface logic module
DMA interface: implements two data transfer channels
- one for transferring motion images from the master to the real-time image encoder
- one for transferring the compressed image data back to the master
[Figure: interface logic. Blocks: PC/AT, PC/AT I/O space decoder, two DMA interfaces each with a FIFO buffer, configuration and control registers; connections to the motion compensation module, the buffer control module and the image compression module.]
Motion Compensation
Module
[Figure: an 8 by 8 pixel block and its searching window.]
Image Compression Module

Discrete Cosine Transform (DCT) algorithm
: transforms the estimation error from the time domain to the frequency domain

After the transformation: DC data (the mean of the pixels in the searching block) and frequency data of the block

The amount of data in the frequency domain is much smaller than in the time domain, since only edge information remains in the frequency domain.
Huffman coding
- performed on the quantized data and the motion vectors for further data reduction
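As a small illustration of why the transform helps, the sketch below computes a 2-D DCT of an 8 by 8 block in plain Python. It assumes the orthonormal DCT-II, which the slides do not specify, and the flat test block is made up. For a block with no edges all the energy lands in the single DC coefficient, which is proportional to the block mean.

```python
import math

N = 8  # block size, matching the 8 by 8 pixel blocks used by the encoder


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# A flat block (no edges): every pixel has the same estimation error.
flat_block = [[10.0] * N for _ in range(N)]
coeffs = dct_2d(flat_block)

dc = coeffs[0][0]                                   # equals N * mean for this scaling
ac_energy = sum(c * c for row in coeffs for c in row) - dc * dc
print(dc, ac_energy)
```

With this scaling the DC term is N times the block mean; other DCT normalizations differ only by a constant factor.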
Buffer Control Module




Maintains a constant bit rate at the output of the real-time image encoder
Importance of a constant bit rate output
: applications such as picture phones and tele-conferencing, because variable bit rate image compression causes uneven transmission time and jagged playback in these applications
Timeout timer: indicates that the required transfer bit rate is too high or too low
When either state is active, the buffer controller requests the image compression module to adjust the image compression ratio so as to maintain the required transfer bit rate
[Figure: buffer control module. Blocks: request timer, timeout timer, buffer controller, FIFO buffer; connections to the interface logic and the image compression module.]
Software/Hardware Partition

Interface logic module: implemented only in hardware
- because it is the hardware interface between the ISA bus and the real-time image encoder

Several hardware/software partition methods are considered for the other three modules
[Figure: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, DSP56001R40, memories (ROM, SRAM).]
Method 1







All functions except the interface logic and buffer control are implemented in software.
The DSP56001R40 is used as the processor:
low cost (fixed-point DSP), high precision (24-bit word length), high performance (delivers 40 million operations per second) and simple hardware implementation (simple bus interface).
Simulator: sim56000 for the DSP56000 family
Clock cycles for compressing one image: 7.27 × 10^6
Compression speed of this system: 5.5 frames per second
This design therefore cannot meet the system specification of 15 frames per second.
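The frame rate follows directly from the cycle count. The sketch below assumes a 40 MHz clock, which the slide does not state explicitly (it is inferred from the "40 million operations per second" figure); the second cycle count is the one quoted later in the deck for the version with hardware block searching.

```python
def frames_per_second(clock_hz, cycles_per_frame):
    """Frames per second when each frame costs a fixed number of clock cycles."""
    return clock_hz / cycles_per_frame


ASSUMED_CLOCK_HZ = 40e6        # assumption: 40 MHz DSP clock
REQUIRED_FPS = 15.0            # from the system specifications

method1_fps = frames_per_second(ASSUMED_CLOCK_HZ, 7.27e6)
method2_fps = frames_per_second(ASSUMED_CLOCK_HZ, 2.43e6)
meets_spec = method1_fps >= REQUIRED_FPS

print(round(method1_fps, 1), round(method2_fps, 2), meets_spec)
```

Under this assumption the all-software design reaches roughly a third of the required frame rate, which is why the block search is the first candidate to move into hardware.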
System architecture
[Figure: system architecture. Blocks: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, memories (REG-FIFO, BANK1 and BANK2), DSP56001R40, memories (ROM, SRAM).]
Method 2






The IN-FIFO is used to prefetch the next image block.
The REG-FIFO holds the current image block during block searching.
To keep the data for further use, data read out must be written back to the FIFO immediately.
BANK1 and BANK2 are used as a double image buffer.
One bank stores the previous decoded frame while the other is filled with the current decoded frame by the external image compression module.
When motion compensation of the current frame finishes, the two banks swap roles and motion compensation of the next frame starts.
Motion compensation
module
[Figure: motion compensation module. Blocks: IN-FIFO (from PC/AT), REG-FIFO, multiplexers (MUX), subtract unit, absolute unit, accumulator, comparator, MIN-ERR and MIN-VTR registers, BANK1 and BANK2 with counters CTR1 and CTR2, control unit; outputs to the image compression module and to the interface logic.]
Method 2 (continued)






CTR1 and CTR2 generate the addresses for the two SRAM banks.
The subtraction unit, absolute unit, accumulator, comparator, MIN-ERR register and MIN-VTR register
: are used to find the minimum total absolute error and the corresponding motion vector of the current block search
If the error of the current iteration is smaller than the error in MIN-ERR
- MIN-ERR and MIN-VTR are updated to the new minimum total absolute error and the corresponding motion vector, respectively
After all iterations of the current block search, the motion vector with the minimum error is stored in REG-VTR.
The motion compensation chip then signals the external image compression module to read this motion vector.
Clock cycles used for each image: 2.43 × 10^6
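The search that this datapath performs can be sketched in software as an exhaustive minimum-total-absolute-error search. This is a behavioural illustration only, not the chip's implementation: the function names are hypothetical, the window is assumed to be square, and the motion vector is expressed as an offset from the window's top-left corner. The toy data uses a 2 by 2 block instead of 8 by 8 to keep it short.

```python
def total_absolute_error(current, window, dx, dy):
    """Subtract, take absolute values and accumulate for one candidate offset."""
    size = len(current)
    return sum(abs(current[r][c] - window[r + dy][c + dx])
               for r in range(size) for c in range(size))


def block_search(current, window):
    """Return (motion_vector, error) minimizing the total absolute error."""
    size = len(current)
    span = len(window) - size + 1            # candidate offsets per axis
    min_err, min_vtr = None, None            # play the role of MIN-ERR / MIN-VTR
    for dy in range(span):
        for dx in range(span):
            err = total_absolute_error(current, window, dx, dy)
            if min_err is None or err < min_err:   # the comparator step
                min_err, min_vtr = err, (dx, dy)
    return min_vtr, min_err


# Toy data: the current block appears in the window at offset (dx=1, dy=2).
window = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 9, 8, 0],
          [0, 7, 6, 0]]
current = [[9, 8],
           [7, 6]]
vector, error = block_search(current, window)
print(vector, error)
```

Each inner evaluation maps onto the subtract, absolute and accumulate units, and the `if` maps onto the comparator updating the two registers.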
Method 3

All modules are implemented in hardware.
The implementation of the interface logic, buffer control and motion compensation modules is the same as before.
However, the image compression module cannot be implemented in a single chip.
At least two Actel A1280 devices are needed.
Two A1280 devices alone would exceed the system cost limit.
This approach is therefore rejected.
System architecture
[Figure: system architecture with an 8051 MCU. Blocks: PC/AT, in FIFO buffer, out FIFO buffer, interface control and buffer control, motion compensation chip, memories (REG-FIFO, BANK1 and BANK2), DSP56001R40, memories (ROM, SRAM), FIFO buffer, 8051 MCU.]
Method 4








Interface logic, buffer control and motion compensation module: hardware
The other modules: software
The 8051 MCU has enough internal memory for Huffman coding, so no external memory is required
A FIFO buffer: interface between the 8051 and the DSP
DSP: estimation error, DCT/IDCT, buffer control
Clock cycles used for each image: 2.1 × 10^6
Compression speed: 15.7 frames/second
An in-circuit simulator from Philips is used to simulate the Huffman coding.
Results
Method  Speed (frames/sec.)  Cost (US$)  Description
1       5.5                  189.00      Hardware: interface logic, buffer control
                                         Software: block searching, estimation error, DCT, inverse DCT, Huffman coding
2       16.46                295.00      Hardware: interface logic, buffer control, block searching
                                         Software: estimation error, DCT, inverse DCT, Huffman coding
3       -                    -           Hardware: interface logic, buffer control, block searching, estimation error, DCT, inverse DCT, Huffman coding
4       15.45                280.70      Hardware: interface logic, buffer control, block searching
                                         Software: estimation error, DCT, inverse DCT, Huffman coding
Conclusions





Both method 2 and method 4 meet the system requirements.
Method 4 is selected for the prototype.
Although method 4 compresses more slowly than method 2, it still satisfies the system specifications at a lower system cost.
Because the processor in method 4 runs at a lower clock frequency, PCB layout is less difficult for method 4 than for method 2.
With the codesign methodology, hidden design problems can be discovered at an earlier stage, which reduces development time and cost.
A hardware / software partitioning technique
with hierarchical design space exploration
Houria Oudghiri, Bozena Kaminska, and Janusz Rajski,
Mentor Graphics Corp.

This paper describes a new hardware/software partitioning approach based on a new use of hierarchical modeling.

A set of DSP examples is considered for co-design in order to accelerate their performance on a target architecture that includes a standard DSP processor running concurrently with a custom SIMD (Single Instruction Multiple Data) processor.
1. Introduction and
motivation(1)

Modern electronic systems contain a mix of software running on general-purpose programmable processors and algorithms hardwired into dedicated hardware.

Hardware/software co-design is an attempt to integrate hardware and software design techniques, with the goal of incorporating more of the system design into a single design methodology.

Co-design finds applications in various fields, such as protocol design, car engine control, and the parallelization of algorithms to be run on hardware and software.
1. Introduction and
motivation(2)
The input is the system model, specified as a set of blocks or operations with all their interdependencies.
The input language may be an HDL or a programming language.
[Figure: classical co-design process extended with a hierarchical model (new). Flow: hierarchical model, select one level, system model, partitioning into partitions 1 to 4, HW synthesis (HW partition) and code generation (SW partitions 1 to 3), interface synthesis, co-implementation, evaluation, final implementation.]
1. Introduction and
motivation(3)

Specific application
=> the co-design process is used to partition the application algorithms into two execution codes, one for the DSP processor (software) and the other for the custom SIMD processor (hardware)

Modeling technique limitations
=> supports various models, from the simplest to the most complex, for the same input system
=> the modeling technique provides a large choice for the final implementation
Estimation parameters
=> performance, implementation cost, communication overhead
1. Introduction and
motivation(4)

The platform used to accelerate DSP applications is introduced.

Hardware/software partitioning results for the FFT algorithm are given.

Implementations without partitioning are compared with implementations with partitioning.

A further analysis examines the degree of acceleration obtained when models of different complexity are used for the same input system.
2. The proposed
methodology(1)

The proposed partitioning algorithm
=> the dependency graph includes all the blocks and their interactions in a single structure
=> classification and comparison of blocks become easier
=> a unified structure (the weighted dependency graph) is used throughout the partitioning process
=> nodes (blocks in the model) carry weights (the performance estimate of each block)
=> edges (block interactions) also carry weights (the quantity of interaction)
2. The proposed
methodology(2)

Algorithm 1
Input: list of blocks and time constraints
Output: two subsets (software and hardware) to which the blocks are assigned
Step 1: Construct the complete weighted dependency graph G
Step 2: Assign all blocks to software; compute the complete system execution time
Step 3: while (time constraints not satisfied) do
    Step 3_i: Select the node with the maximum execution time (i)
    Step 3_ii: Assign i to hardware; update the system execution time
    Step 3_iii: while (time constraints not satisfied) do
        Step 3_iii_1: Select the maximum-weight edge connecting i to the most time-consuming node (j)
        Step 3_iii_2: Assign j to hardware; update the dependency graph G; update the system execution time
    end do
end do
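A minimal Python sketch of this greedy loop is given below. Several parts are assumptions made for the illustration and are not taken from the paper: the cost model (block times plus the weights of edges that cross the HW/SW boundary), the tie-breaking rule, the early exit when node i has no software neighbour left, and all block names and numbers.

```python
def system_time(blocks, edges, hw):
    """Assumed cost model: block times plus traffic on edges that cross HW/SW."""
    compute = sum(t["hw"] if name in hw else t["sw"] for name, t in blocks.items())
    comm = sum(w for (a, b), w in edges.items() if (a in hw) != (b in hw))
    return compute + comm


def partition(blocks, edges, time_limit):
    hw = set()                                   # Step 2: everything starts in software
    while system_time(blocks, edges, hw) > time_limit and len(hw) < len(blocks):
        # Steps 3_i and 3_ii: move the slowest remaining software block to hardware.
        i = max((n for n in blocks if n not in hw), key=lambda n: blocks[n]["sw"])
        hw.add(i)
        # Step 3_iii: pull in neighbours of i, heaviest edge first.
        while system_time(blocks, edges, hw) > time_limit:
            candidates = []
            for (a, b), weight in edges.items():
                if i in (a, b):
                    other = b if a == i else a
                    if other not in hw:
                        candidates.append((weight, blocks[other]["sw"], other))
            if not candidates:
                break                            # no software neighbour left; pick a new i
            hw.add(max(candidates)[2])
    return set(blocks) - hw, hw


# Hypothetical blocks (times in ms) and interaction weights.
blocks = {
    "init":      {"sw": 2,  "hw": 1},
    "bitrev":    {"sw": 6,  "hw": 2},
    "butterfly": {"sw": 20, "hw": 5},
    "output":    {"sw": 3,  "hw": 1},
}
edges = {("init", "bitrev"): 1, ("bitrev", "butterfly"): 4, ("butterfly", "output"): 2}

sw_set, hw_set = partition(blocks, edges, time_limit=20)
final_time = system_time(blocks, edges, hw_set)
print(sorted(sw_set), sorted(hw_set), final_time)
```

In this toy run the slowest block moves first, and its most heavily connected neighbour follows because the boundary traffic still keeps the total above the limit.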
2. The proposed
methodology(3)

The co-design target architecture is based on two types of processors.

The Texas Instruments TMS320C40 DSP is used as the master processor and the custom SIMD processor PULSE (Parallel Ultra Large Scale Engine, 4 processors in parallel) as the slave processor.
3. Results and discussion(1)

[Figure: the hierarchical model of the FFT transform behavior, shown at levels 1 to 8; blocks assigned to PULSE are shown in blue. Block labels in the figure include Initialize, Bit Reversal, Danielson control, Output, Initialize Variable, Initialize Data, Bit_init, Index_init, Read_data, Index_incr, Bit_loop1, Bit_loop2, Bit_cond, Bit_incr, Bit_shift, Bit_test, Bit_swap1, Bit_swap2, Bit_acc, Data_test, Dan_init, Dan_loop, Dan_loop1, Loop1_init, Loop1_body, Loop1_incr, Loop2_init, Loop2_test, Loop2_ass, Loop2_shif, Loop2_body, Loop2_incr, Dan_real, Dan_imag, Update Variables, Out_init, Out_write and Out_incr.]
3. Results and discussion(2)

There is an optimal level in the hierarchy.

This level provides an optimal and balanced
partitioning of blocks between hardware and software
implementation.

The use of the most complex and detailed model
doesn’t always mean obtaining the best solution
3. Results and discussion(3)

Table 1: Block assignment at different hierarchical levels of the FFT transform (time constraint = 25 ms)

Level  Nb. of blocks  Blocks on C40  Blocks on PULSE  PULSE time (ms)  C40 time (ms)  Total time (ms)
1      4              2              2                18.14            4.8            22.94
2      10             6              4                18.8             2.96           21.76
3      17             11             6                15.56            9              24.56
4      22             18             6                14.68            10.24          24.92
5      24             17             7                14.56            10.4           24.94
6      24             22             2                6.82             17.72          24.54
7      25             22             3                7                17.92          24.92
8      27             18             9                5.88             18.64          24.52
3. Results and discussion(4)

Table 2: Comparison of alternatives for the FFT transform

               Execution time (ms)          Code size (bytes)
Partition      PULSE   C40     Reduction    PULSE   C40    Reduction
1 (C40)        0       38.64   …..          0       1260   …..
2 (PULSE)      20.44   0       …..          2196    0      …..
Solution #1    18.15   4.80    40 %         1424    264    23 %
Solution #8    5.2     18.64   38 %         352     412    65 %

Sol. #1: all initialization and output operations assigned to PULSE
Sol. #8: only the processing operations assigned to PULSE
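The reduction columns can be reproduced with a short calculation. The choice of baselines below is inferred from the numbers and is not stated on the slide: execution time appears to be compared with the all-C40 partition, and code size with the all-PULSE partition.

```python
def reduction_percent(baseline, value):
    """Relative saving of `value` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - value / baseline)


ALL_C40_TIME_MS = 38.64          # partition 1: everything on the C40
ALL_PULSE_CODE_BYTES = 2196      # partition 2: everything on PULSE

# (PULSE, C40) pairs as listed in the table.
solutions = {
    "Solution #1": {"time_ms": (18.15, 4.80), "code_bytes": (1424, 264)},
    "Solution #8": {"time_ms": (5.2, 18.64), "code_bytes": (352, 412)},
}

results = {}
for name, s in solutions.items():
    time_red = reduction_percent(ALL_C40_TIME_MS, sum(s["time_ms"]))
    code_red = reduction_percent(ALL_PULSE_CODE_BYTES, sum(s["code_bytes"]))
    results[name] = (time_red, code_red)
    print(f"{name}: time -{time_red:.1f} %, code size -{code_red:.1f} %")
```

With these baselines the integer parts of the computed percentages match the 40/23 and 38/65 figures in the table.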
3. Results and discussion(5)

Considering the first levels of the hierarchy during partitioning considerably improves the time performance, but not the code size.

Using the medium and last levels may considerably decrease the code size or the area, with very little degradation in performance.

The generated alternatives are compared with the lower-bound performance (the hardware solution) and the upper-bound performance (the software solution) in order to find the best trade-off.
Transformational partitioning for co-design
G.F. Marchioro, J.M. Daveau, T.B. Ismail, A.A. Jerraya

A function $T_x$ transforms a specification $S_i$ into an implementation $S_{i+1}$:
$S_0 \xrightarrow{T_x} S_1, \qquad S_i \xrightarrow{T_x} S_{i+1}$  (1)
A sequence of applied transformations forms the development history:
$S_0 \xrightarrow{T_0} S_1 \xrightarrow{T_1} \cdots \xrightarrow{T_n} S_{n+1}$  (2)
The development history can be saved and reapplied after a specification change.
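The idea of saving a development history and replaying it after a specification change can be sketched as follows. The two transformations, the processor names and the toy specifications are hypothetical and stand in for the real partitioning transformations.

```python
from functools import reduce


def apply_history(spec, history):
    """Apply T0, T1, ..., Tn in order: S_{i+1} is T_i applied to S_i."""
    return reduce(lambda s, transform: transform(s), history, spec)


def split_codec(spec):
    """Hypothetical T0: functional decomposition of a block named 'codec'."""
    out = []
    for func in spec:
        out.extend(["encode", "decode"] if func == "codec" else [func])
    return out


def assign_processors(spec):
    """Hypothetical T1: tag each function with an abstract processor."""
    return [(func, "P_hw" if func == "encode" else "P_sw") for func in spec]


history = [split_codec, assign_processors]       # the saved development history

s2 = apply_history(["io", "codec"], history)
# After a specification change, the same history is simply replayed.
s2_changed = apply_history(["io", "codec", "ui"], history)
print(s2)
print(s2_changed)
```

Keeping the history as an ordered list of functions is what makes the replay a one-line operation.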
Transformation steps(1)

Step 1: functional specification
Step 2: functional decomposition
Transformation steps(2)

Step 3: structural reorganization
=> assigns an execution processor to each function
=> this step acts on the design units
=> each abstract processor may be implemented in hardware or in software
=> several functions may be assigned to the same partition
Transformation steps(3)

Step 4: communication transformation and prototyping
=> communications are transformed into processors communicating through buses and sharing communication control
=> processes P1 and P3 are translated into behavioral VHDL
=> process P2 is translated into C
Structural reorganisation
Step 1: Move
=> moves a design unit within the hierarchy; used to prepare a merge operation
Step 2: Merge
=> merges the modules that will be assigned to the same processor into a single design unit
Step 3: Map
=> identifies the hardware and software realization options for each process
Step 4: Flat
=> performs a structural flattening operation on the hierarchy
Co-design On-line
publications







- R. Camposano, J. Wilberg: "Embedded System Design", Design Automation for Embedded Systems, Vol. 1, Nos. 1-2, January 1996.
- Networked Computer Science Technical Reports Library (technical reports from many US and other universities): http://cstr.cs.cornell.edu/
- IMEC ftp reports (Cathedral): ftp://ftp.imec.be/pub/vsdm/reports/
- Stanford Tech Reports: http://elib.stanford.edu/
- Synopsys Research Publications: http://www.synopsys.com/news/pubs/research/ATG_index.html
- Paderborn (Camposano) Technical Reports: http://www-date.uni-paderborn.de/RESEARCH/BUILDABONG/buildabong.html
- University of Dortmund, Methodology for computer-aided design of integrated circuits (Code generation): http://ls12-www.informatik.uni-dortmund.de/
Co-design Sites

















- Bibliography of Hardware/Software Codesign: http://www-ti.informatik.uni-tuebingen.de/~buchen/
- Ralf Niemann's Codesign Links and Literature: http://ls12-www.informatik.uni-dortmund.de/~niemann/codesign/codesign_links.html
- URLs to Hardware/Software Co-Design Research: http://www.ece.cmu.edu/~thomas/hsURL.html
- RASSP Architecture Guide: http://www.sanders.com/hpc/ArchGuide/TOC.html
- EDA, Electronic Design Automation: http://www.eda.org
- COMET (Case Western Reserve University): http://bear.ces.cwru.edu/research/hard_soft.html
- COSMOS (TIMA-CMP, France): http://tima-cmp.imag.fr/Homepages/cosmos/research.html
- COSYMA (Braunschweig): http://www.ida.ing.tu-bs.de/projects/cosyma/
- Handel-C (Oxford): http://oldwww.comlab.ox.ac.uk/oucl/hwcomp.html
- Lycos (Technical University of Lyngby, Denmark): http://www.it.dtu.dk/~lycos/
- MOVE (Technical University Delft): http://cardit.et.tudelft.nl/MOVE/
- Polis (University of Berkeley): http://www-cad.eecs.berkeley.edu/Respep/Research/hsc/abstract.html
- ProCos (UK Research): http://www.comlab.ox.ac.uk/archive/procos/codesign.html
- Ptolemy (University of Berkeley): http://ptolemy.eecs.berkeley.edu/
- SPAM (Princeton): http://www.ee.princeton.edu/~spam/
- TRADES (University of Twente, INF/CAES): http://wwwspa.cs.utwente.nl/aid/aid.html
Specification languages
- SystemC: http://www.systemc.org
Answer III: H/W and S/W Codesign
Three Co-Design
Approaches



ASIP co-design: starts with an application, builds a specific programmable processor and translates the application into software code.
H/W-S/W synchronous system co-design: a S/W processor acts as the master controller, with a set of H/W accelerators as coprocessors.
H/W-S/W co-design for distributed systems: mapping of a set of communicating processes onto a set of interconnected processors, through behavioral decomposition, process allocation and communication transformation. E.g., CoWare.
A Co-design method
[Figure: flow of the co-design method. The design target model is divided into process components 1 to n. Major classification (division, assignment) places each component in hardware, software or undecided; similar process components are then grouped (groups 1 to 6); detailed classification (assignment) places the components in hardware or software.]
Y. Endo, H. Koizumi, Dept. of Computer & System Eng., Tokyo Denki Univ., Japan
Major Classification



The design target model is described in Visual Basic and analyzed in order to divide it into process components.
The software speed of each process component is estimated.
Division into HW parts and SW parts
- A component whose processing speed does not meet the expected result becomes a HW part.
Grouping of Similar Process
Components


Reducing the number of process components subject to the functional trade-off can greatly shorten the time needed for the trade-off analysis.
Therefore, similar process components are grouped to reduce the number of components considered in the trade-off.
Detailed Classification

Methods for calculating the HW processing time
- Multiply the SW speed of the model by a coefficient constant
- Apply a knowledge database corresponding to the SW logic to calculate the HW processing speed
Communication time between HW and SW
- Calculated from the HW/SW communication speed and the amount of data transferred
If no combination satisfies all of the design goals and constraints, the system algorithm is improved and the design goals and constraints are re-examined.
A hardware / software partitioning technique
with hierarchical design space exploration
Houria Oudghiri, Bozena Kaminska, and Janusz Rajski, Mentor Graphics Corp.
Modeling technique limitations
=> supports various models, from the simplest to the most complex, for the same input system
=> the modeling technique provides a large choice for the final implementation
Estimation parameters
=> performance, implementation cost, communication overhead
The proposed methodology

The co-design target architecture is based on two types of processors

The Texas Instruments DSP processor TMS320C40 is used as the master processor and
the custom SIMD processor PULSE (Parallel Ultra Large Scale Engine, 4 processors in
parallel) as the slave processor
The proposed partitioning
algorithm

=> the dependency graph includes all the blocks and their interactions in a single structure
=> classification and comparison of blocks become easier
=> a unified structure (the weighted dependency graph) is used throughout the partitioning process
=> nodes (blocks in the model) carry weights (the performance estimate of each block)
=> edges (block interactions) also carry weights (the quantity of interaction)
Algorithm
Input: list of blocks and time constraints
Output: two subsets (software and hardware) to which the blocks are assigned
Step 1: Construct the complete weighted dependency graph G
Step 2: Assign all blocks to software; compute the complete system execution time
Step 3: while (time constraints not satisfied) do
    Step 3_i: Select the node with the maximum execution time (i)
    Step 3_ii: Assign i to hardware; update the system execution time
    Step 3_iii: while (time constraints not satisfied) do
        Step 3_iii_1: Select the maximum-weight edge connecting i to the most time-consuming node (j)
        Step 3_iii_2: Assign j to hardware; update the dependency graph G; update the system execution time
    end do
end do
The hierarchical model
of the FFT transform behavior

[Figure: the hierarchical model of the FFT transform behavior, shown at levels 1 to 8; blocks assigned to PULSE are shown in blue. Block labels in the figure include Initialize, Bit Reversal, Danielson control, Output, Initialize Variable, Initialize Data, Bit_init, Index_init, Read_data, Index_incr, Bit_loop1, Bit_loop2, Bit_cond, Bit_incr, Bit_shift, Bit_test, Bit_swap1, Bit_swap2, Bit_acc, Data_test, Dan_init, Dan_loop, Dan_loop1, Loop1_init, Loop1_body, Loop1_incr, Loop2_init, Loop2_test, Loop2_ass, Loop2_shif, Loop2_body, Loop2_incr, Dan_real, Dan_imag, Update Variables, Out_init, Out_write and Out_incr.]
Block assignment at different
hierarchical levels of the FFT transform
(time constraint = 25 ms)

Level  Nb. of blocks  Blocks on C40  Blocks on PULSE  PULSE time (ms)  C40 time (ms)  Total time (ms)
1      4              2              2                18.14            4.8            22.94
2      10             6              4                18.8             2.96           21.76
3      17             11             6                15.56            9              24.56
4      22             18             6                14.68            10.24          24.92
5      24             17             7                14.56            10.4           24.94
6      24             22             2                6.82             17.72          24.54
7      25             22             3                7                17.92          24.92
8      27             18             9                5.88             18.64          24.52
Alternative comparison for the
FFT transform
               Execution time (ms)          Code size (bytes)
Partition      PULSE   C40     Reduction    PULSE   C40    Reduction
1 (C40)        0       38.64   …..          0       1260   …..
2 (PULSE)      20.44   0       …..          2196    0      …..
Solution #1    18.15   4.80    40 %         1424    264    23 %
Solution #8    5.2     18.64   38 %         352     412    65 %

Sol. #1: all initialization and output operations assigned to PULSE
Sol. #8: only the processing operations assigned to PULSE
Results and discussion





Considering the first levels of the hierarchy during partitioning considerably improves the time performance, but not the memory size.
Using the medium and last levels may considerably decrease the memory size or the area, with very little degradation in performance.
The generated alternatives are compared with the lower-bound performance (the hardware solution) and the upper-bound performance (the software solution) in order to find the best trade-off.
There is an optimal level in the hierarchy.
Using the most complex and detailed model does not necessarily yield the best solution.
OCAPI-xl model, IMEC








The OCAPI-xl model was used to develop a stand-alone webcam including an interface to a digital CMOS image sensor, a GIF engine, a network layer and an interface to a 10BaseT Ethernet PHY+MAC controller. The synthesized model for this NetCam (with raw-IP sockets) consisted of 25 concurrent processes, described in about 2K lines of C++ code (taking about 25K gates on an ASIC), and was designed from scratch in 14 man-months.
OCAPI-xl design flow
Application Structure
Cam-E-leon system
architecture
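Integrated H/W and S/W low-power design optimization environment and tools
[Figure: design flow. Steps: algorithm selection, clustering, cluster scheduling, cluster selection, S/W core energy estimation, SW energy-efficiency calculation, H/W synthesis and energy estimation, HW energy-efficiency calculation, HW/SW integration, system-level energy estimation. Tools named in the figure: ORINOCO, DSP Station, Seamless, Co-centric, Matlab/SPW, Cossap, Synopsys, Signal-master.]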
Integrated H/W and S/W design of an IS-95 CDMA searcher
황인기, Sungkyunkwan University
[Figure: exploration toward the design goal under a cost function (speed, area, power). Candidate blocks: PN-code generation; synchronous accumulator (SW); asynchronous accumulator (SW); comparator (SW); energy estimate (SW); synchronous accumulator 1 (HW); synchronous accumulator 2 (HW); asynchronous accumulator (HW); comparator with precomputation (HW); energy estimate (HW).]
References

- Sanjaya Kumar, James H. Aylor, Barry W. Johnson, Wm. A. Wulf, The Codesign of Embedded Systems: A Unified Hardware/Software Representation.
- "Synthesis and simulation of digital systems containing interacting hardware and software components", 29th DAC.
- "A model and methodology for hardware-software codesign".
- CAP Laboratory Homepage (http://peace.snu.ac.kr/)
- Pai Chou, Ross Ortega, Gaetano Borriello, "Synthesis of the Hardware/Software Interface in Microcontroller-Based Systems," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, Santa Clara, CA, November 1992, pp. 488-495.