lec6-sdr-arch

A Software Radio Architecture for
Linear Multiuser Detection
I. Seskar and N. B. Mandayam
Wireless Information Network Lab.
Rutgers Univ.
IT-SOC 2002
©스마트 모빌 컴퓨팅 Lab
1
What is SDR (Software Defined Radio)?

a collection of hardware and software technologies that enable
reconfigurable system architectures for wireless networks and user
terminals

efficient and comparatively inexpensive solution to the problem of
building multi-mode, multi-band, multi-functional wireless devices
that can be enhanced using software upgrades

SDR can really be considered an enabling technology that is
applicable across a wide range of areas within the wireless industry
SDR benefits
Standard architecture for a wide range of communications
products
Non-restrictive wireless roaming for consumers by extending the
capabilities of current and emerging commercial air-interface
standards
Uniform communication across commercial, civil, federal and
military organizations
Potential for significant life-cycle cost reductions
Over the air downloads of new features and services as well as
software patches
Advanced networking capabilities to allow truly "portable"
networks
THE SOFTWARE RADIO NODE
ARCHITECTURE


Defining the Software Radio Architecture
-
Functional Architecture
Physical Architecture Components
Resource Estimation and Management
Software Architecture
Software Tools
Architecture Migration
-
Embedded DSP
Multimode Radios
Multiband Multimode Conventional Radios
Speakeasy I & II, The Military Software Radio
Integrated Architectures
Case Studies in the Evolution of the Software Radio
Software Radio “Phase Space”
Hardware Software Mix
Functional architecture of a software radio
for linear multiuser detection
Logical partitioning of functionality in a software
radio receiver
for linear multiuser detection
Block diagram of software radio
implementation (TMS320C40 DSP)
BER range vs SNR
BER range vs Number of users
Conclusion
• The reconfigurability of linear multiuser receivers allows for the integration of multimedia services over wireless channels with variable quality of service (QoS) requirements.
• The reconfigurable radio architectures also provide diverse QoS guarantees, spanning several orders of magnitude in terms of BER requirements.
Design and Implementation of a
Completely Reconfigurable Radio
Srikathyayani Srikanteswara
Michael Hoffmann
Jeffrey H. Reed
Peter M. Athanas
Introduction
• Soft radio using stream-based computing and run-time reconfigurable hardware
• A custom computing machine (CCM) called Stallion for the processing layer
• Layered radio architecture
• Implementation of a rake receiver for WCDMA
Traditional choices
ASIC
-most efficient implementation of a given circuit
-little flexibility, high initial cost, long design cycle
FPGA
-lacks run-time and partial reconfigurability
-not well matched to wireless communication or signal processing applications
DSP
-maximum flexibility, short design cycle
-not efficient in power consumption or silicon area
CCM (custom computing machine)
-achieves flexibility in hardware without sacrificing power or silicon efficiency
-tries to retain the characteristics of both FPGAs and ASICs
-uses static hardware for frequently used cores such as multipliers => efficient radio design
-customizes an FPGA-based design so that the flexibility of FPGAs is retained only where necessary
Stallion Architecture
CCM developed by Virginia Tech
• Flexible, high-data-throughput, low-power computation
• Based on wormhole reconfigurable computing
• Supports fast run-time reconfiguration
• Configuration time is on the order of microseconds (most FPGAs take on the order of milliseconds)
Stallion Architecture
Stallion Architecture
(functional unit)
Reference material
Wormhole structure (floating-point multiplier)
Layered Radio Architecture
Hardware paging
-reconfigurable computing makes hardware paging possible
-hardware modules are paged in and out of the system in a manner similar to software paging performed with virtual memory
-allows optimal use of a system’s processing elements
Stream-based processing
-stream: a sequence of words containing both configuration information and computational data
-simplifies the interfaces between processing modules
-makes it easy to replace modules or add additional modules
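Hardware paging can be illustrated with a small software model. This is a minimal sketch, not the authors' implementation: it assumes a fixed number of reconfigurable slots, least-recently-used (LRU) eviction, and hypothetical module names (`despreader`, `channel_estimator`, `decoder`). A Python dict stands in for the hashed page table mentioned in the reference material.

```python
from collections import OrderedDict


class HardwarePager:
    """Toy model of hardware paging (assumed policy: LRU eviction).

    A fixed number of reconfigurable slots play the role of physical page
    frames; hardware modules are "paged in" when a stream needs them.
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        # module name -> slot index; the dict is a hash table, loosely
        # echoing a hashed page-table lookup
        self.resident = OrderedDict()
        self.reconfigurations = 0  # modules loaded so far ("page faults")

    def request(self, module):
        """Return the slot holding `module`, loading it into a slot if needed."""
        if module in self.resident:
            self.resident.move_to_end(module)  # mark as most recently used
            return self.resident[module]
        if len(self.resident) < self.num_slots:
            slot = len(self.resident)  # a free slot is still available
        else:
            _, slot = self.resident.popitem(last=False)  # evict the LRU module
        self.resident[module] = slot
        self.reconfigurations += 1  # configuration bits would be loaded here
        return slot


# Hypothetical receiver with two reconfigurable slots
pager = HardwarePager(num_slots=2)
for name in ["despreader", "channel_estimator", "despreader", "decoder"]:
    pager.request(name)
```

The third request hits a resident module, so only three of the four requests trigger a reconfiguration, and `decoder` reuses the slot freed by evicting `channel_estimator`.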
Reference material
Wormhole structure (basic principle) supporting stream-based processing
Reference material
Virtual memory using hashing (inverted page table structure)
The Layered Radio Architecture
• Soft radio interface (SRI) layer
• Configuration layer
• Processing layer
• Streams carry both the data to be processed and programming information
• Each layer’s functionality is isolated from the other layers
• Information is passed between the layers using stream-based processing
• The processing layer requires run-time reconfigurable hardware, while the higher layers do not have this constraint
Reference material
The Layered Radio Architecture
Design Issues for a Reconfigurable Platform
• A complex entity that needs to handle very high data rates efficiently
• Smooth reconfiguration of the radio
• Run-time reconfiguration capability
• Over-the-air updates
• Low power consumption
Overview of the Layered Radio
Architecture
Overview of Layered Architecture
• The layered architecture leverages stream-based processing, where a common bus is used for data as well as programming information.
• The architecture can handle complex data processing with efficient resource allocation, while maintaining hardware reusability, flexibility, and scalability.
Application-layer software (1)
Application-layer software (2)
User interface
• Receives data from the A/D converter
• Sends control packets to the SRI
• Sends reply signals to the host PC
Soft radio interface layer (1)
Soft radio interface layer (2)
• Contains system-level description code
• Sends algorithm code to the configuration layer
• Does not contain hardware configuration binary code
• References code from local memory
Configuration layer (1)
Configuration layer (2)
• Contains the actual configuration bits needed by the processing layer
• Receives programming packets from the SRI layer
• Sends configuration packets to the processing layer
• References binary code from local memory
Stream-based processing
Diagram of packet
Why use stream-based packets?
• They enable pipelining
• They retain a degree of flexibility
Stream-based processing (1)
•A stream is a packet of known length
containing either programming (configuration)
information or the data to be processed.
•Each processing module performs a unique
subset of the overall processing on the data
and then passes the data and control
information to the next module.
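To make the stream idea concrete, here is a minimal Python sketch of a linear chain of processing modules. It assumes, for illustration only, that a configuration packet carries the address of the module it targets and that a Python function stands in for the configuration bits. Data packets flow through every module, while a configuration packet reprograms only the module it addresses, loosely modeling self-steering streams. `Packet`, `ProcessingModule` and `run_stream` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Packet:
    kind: str                 # "config" or "data" (assumed packet types)
    dest: int = -1            # target module address (used by config packets)
    payload: List = field(default_factory=list)


class ProcessingModule:
    def __init__(self, address: int):
        self.address = address
        self.op: Callable = lambda x: x  # unprogrammed module passes data through

    def process(self, pkt: Packet) -> Packet:
        if pkt.kind == "config":
            if pkt.dest == self.address:
                self.op = pkt.payload[0]  # stand-in for loading configuration bits
            return pkt                    # config packets continue down the chain
        # Data packet: apply this module's subset of the processing, pass it on.
        return Packet("data", pkt.dest, [self.op(x) for x in pkt.payload])


def run_stream(modules: List[ProcessingModule], stream: List[Packet]) -> List[Packet]:
    """Push each packet through the whole chain, in stream order."""
    out = []
    for pkt in stream:
        for module in modules:
            pkt = module.process(pkt)
        out.append(pkt)
    return out


chain = [ProcessingModule(0), ProcessingModule(1)]
stream = [
    Packet("config", 0, [lambda x: 2 * x]),  # program module 0: scale by 2
    Packet("config", 1, [lambda x: x + 1]),  # program module 1: add 1
    Packet("data", payload=[1, 2, 3]),
    Packet("config", 1, [lambda x: x - 1]),  # reprogram module 1 mid-stream
    Packet("data", payload=[1, 2, 3]),
]
result = run_stream(chain, stream)
```

Because configuration and data share one stream, the second data packet (`[1, 3, 5]`) already sees the reprogrammed module 1 without any separate control path.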
Stream-based processing (2)
• Each module receives a packet from the previous processing element
• By the time the packet is passed on, it has been interpreted and some processed data has been attached
More detailed Soft radio interface layer
(1)
More detailed Soft radio interface layer
(2)
Where does the input signal come from?
A/D converter → Host PC → Buffer → Configuration layer
More detailed Soft radio interface layer
(2)
• Packets received from the configuration layer have status and error messages appended
More detailed Soft radio interface layer
(3)
More detailed configuration layer (1)
More detailed configuration layer (2)
• Receives algorithm code from the SRI layer
• Has its own local memory and references code from it
• The local memory holds the actual binary code, the processing module addresses, and a status list
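The configuration layer's role (turning an algorithm request from the SRI layer into the binary code and module address held in local memory, updating the status list, and reporting errors back) can be sketched as follows. The table layout, field names, algorithm names and status strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfigEntry:
    module_address: int   # processing module that should receive the bits
    binary: bytes         # configuration binary code held in local memory
    status: str = "idle"  # entry in the status list


class ConfigurationLayer:
    def __init__(self, local_memory: Dict[str, ConfigEntry]):
        self.local_memory = local_memory  # algorithm name -> ConfigEntry

    def handle_programming_packet(self, algorithm: str) -> dict:
        """Turn an SRI programming request into a configuration packet."""
        entry = self.local_memory.get(algorithm)
        if entry is None:
            # Reported back toward the SRI layer as status / error message.
            return {"status": "error", "message": f"no binary for {algorithm!r}"}
        entry.status = "loaded"
        return {"status": "ok", "dest": entry.module_address, "bits": entry.binary}


layer = ConfigurationLayer({"fir16": ConfigEntry(module_address=2, binary=b"\x01\x02")})
ok_packet = layer.handle_programming_packet("fir16")
err_packet = layer.handle_programming_packet("fft64")
```

Keeping the binaries in the configuration layer is what lets the SRI layer stay free of hardware configuration code: it only names the algorithm.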
More detailed Processing layer (1)
Features
• Linearly connected modules
• Static and reconfigurable modules
• Separate operation (a module does not disturb other modules)
• The main flow is pipelined
More detailed Processing layer (2)
More detailed Processing layer (3)
More detailed Processing layer (4)
For pipelined operation
• Each module can feed results back
• Concurrent input/output is also supported
More detailed Processing layer (5)
More detailed Processing layer (6)
More detailed Processing layer (7)
When a stream packet enters a processing module, the module:
• Interprets the packet and performs the necessary action
• Checks that the packet is valid and keeps the modules synchronized
• On each clock cycle, every processing module accepts a packet and sends out a packet
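The one-packet-per-clock behavior described above can be modeled as a chain of pipeline registers. In this sketch (an assumption-level model, not Stallion's actual timing), each stage is a plain function and a packet advances one stage per cycle, so after an initial latency equal to the number of stages, one packet leaves the chain every cycle. `simulate_pipeline` and `add_one` are hypothetical names.

```python
from collections import deque


def simulate_pipeline(stages, packets, cycles):
    """Clock a linear pipeline; registers[i] holds the output of stage i."""
    registers = [None] * len(stages)
    pending = deque(packets)
    outputs = []
    for _ in range(cycles):
        # The packet in the last register leaves the pipeline this cycle.
        if registers[-1] is not None:
            outputs.append(registers[-1])
        # Update from the back so every packet moves exactly one stage.
        for i in range(len(stages) - 1, 0, -1):
            prev = registers[i - 1]
            registers[i] = stages[i](prev) if prev is not None else None
        incoming = pending.popleft() if pending else None
        registers[0] = stages[0](incoming) if incoming is not None else None
    return outputs


def add_one(x):
    return x + 1


three_stage = [add_one, add_one, add_one]
```

For example, `simulate_pipeline(three_stage, [1, 2, 3], 6)` yields `[4, 5, 6]`: the first result is emitted on the fourth clock cycle, then one result per cycle.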
Section matching
Overview Design of Stallion (Virginia Tech)
(1)
Overview Design of Stallion (Virginia Tech)
(2)
Overview Design of Stallion (Virginia Tech)
(3)
Overview Design of Stallion (Virginia Tech)
(4)
Why does the main clock have to be divided?
• The guard slot prevents data collisions
• The forward slot transmits data and processing packets
• The backward slot is used for feedback
Overview Design of Stallion (Virginia Tech)
(5)
What does the power-up sequence look like?
• At power-on, the first P-module assigns address 0 to itself
• It sends an invalid stream on the bus to ensure that the other P-modules do not get the same address
• The first P-module does not act until the stream’s address is assigned to 0
Advantages in the Layered Architecture
• Defines the methodology for designing multimode radios using hardware paging
• Provides the framework for building a flexible soft radio at the expense of the overhead for packetizing data
• Excellent hardware reusability
• Builds libraries of hardware functions, much like software libraries
• Good data-flow properties and a simple interface between the processing-layer modules
Insight into Stallion (Virginia Tech Inc.) (1)
What is Stallion, and what are its features?
• Based on wormhole reconfigurable computing
• High reconfiguration speed
• Specifically suited to flexible, high-throughput, low-power computations
Insight into Stallion (Virginia Tech Inc.) (2)
Insight into Stallion (Virginia Tech Inc.) (3)
Operating description of Stallion
• The functional units are programmed to process data, while the crossbar aids in routing data
• Input and output are performed through 6 data ports
• Processed data can loop back through the smart crossbar
• The essence of wormhole run-time reconfiguration is that independent, self-steering streams of programming information and operand data interact within the hardware to solve the computational problem at hand
Rake Receiver
• The layered architecture, using CCMs, can support robust existing and future high-data-rate systems
• The rake receiver implementation demonstrates that the architecture can support very high data rates while retaining flexibility
• Implemented using a single Stallion for the processing layer
Rake Receiver
[Figure: CCDMA time-frequency allocation (x = time, y = frequency), with slots shared among user1, user2 and user3]
Rake Receiver
• A 3-finger rake takes 5976 cycles per slot
• Each slot contains 10240 samples
• Total processing rate of 0.5836 cycles/sample
• Operating speed of 4.48 MHz, i.e., 8.96% of a typical 50 MHz clock
• Shows capacity to support high data rates
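The rake figures are linked by simple arithmetic: cycles per sample is cycles per slot divided by samples per slot, and the required clock is cycles per sample times the sample rate. The short check below reproduces the quoted numbers. The sample rate it derives (about 7.68 Msps) is inferred from the slide's own figures; the slide does not state it.

```python
# Figures quoted on the slide for the 3-finger rake receiver
CYCLES_PER_SLOT = 5976
SAMPLES_PER_SLOT = 10240
OPERATING_CLOCK_MHZ = 4.48
TYPICAL_CLOCK_MHZ = 50.0

cycles_per_sample = CYCLES_PER_SLOT / SAMPLES_PER_SLOT  # about 0.5836
utilization = OPERATING_CLOCK_MHZ / TYPICAL_CLOCK_MHZ     # about 0.0896, i.e. 8.96 %
# Sample rate implied by the two quoted numbers (inferred, not on the slide)
implied_sample_rate_msps = OPERATING_CLOCK_MHZ / cycles_per_sample  # about 7.68

print(f"{cycles_per_sample:.4f} cycles/sample, {utilization:.2%} of 50 MHz, "
      f"~{implied_sample_rate_msps:.2f} Msps")
```

Running it prints `0.5836 cycles/sample, 8.96% of 50 MHz, ~7.68 Msps`, which is the sense in which the receiver leaves over 90% of a 50 MHz device's cycles free.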
Rake Receiver
(implementation statistics)
Conclusions
• The layered architecture needs a formal, unified structure to become a standard
• The layered architecture is suited to today’s FPGAs that support partial reconfiguration and to tomorrow’s configurable computing platforms
• Current research at Virginia Tech focuses on building libraries and soft radio modules
Reference
• J. Mitola III, “The Software Radio Architecture,” IEEE Commun. Mag., May 1995, pp. 26-38.
• J. Mitola and Zvonar, “The Soft Radio Architecture for Reconfigurable Platforms,” IEEE Commun. Mag., pp. 140-147.
• S. Srikanteswara et al., “Design and Implementation of a Completely Reconfigurable Soft Radio,” IEEE 0-7803-6267-5.
• S. Srikanteswara et al., “Configurable Computing for Communication Systems,” Proc. Wireless Commun. Conf. IMAPS, 1998, pp. 180-185.
• “FPGA in the Software Radio,” IEEE Communications Magazine, Feb. 1999.
DSP-Based Architectures for
Mobile Communications:
Past, Present and Future
This material is based on the paper by Gatherer, Stetzler,
McMahan and Auslander.
Introduction
Goal:
• Summarize trends in power consumption and MIPS, and describe the use of coprocessors
Approaches to the implementation:
• Programmable DSPs for flexibility
• Hard-wired ASICs to improve implementation efficiency
• Right answer: some combination of both approaches
• Flexibility is becoming more of an issue: the programmability offered by DSPs is even more desirable
Properties:
• GSM/UMTS
• Power
• MIPS
• Coprocessor
• Complementary technology
A Historical Perspective on Wireless
Handset Architectures for GSM - I
Needs for DSP in early GSM:
• Low-power requirement
Use of DSP:
-Upgrades to ASIC-based solutions became costly and difficult
-A single DSP was powerful enough to do all the DSP functions
-To improve system power consumption and board space
• The DSP was included mainly to do the vocoding, followed by “mission creep”
• Flexibility: an evolving standard, with a slightly different physical layer from the previous one in each generation
• Most of the phone would be implemented in ASIC, integrating a RISC microcontroller
• As GSM phones have gradually moved beyond the simple phone function, this has led to an increase in the fraction of the DSP MIPS used by something other than physical layer 1.
A Historical Perspective on Wireless
Handset Architectures for GSM - II
Reduced use of ASICs in GSM:
• Making an ASIC vocoder was like replicating an available commercial DSP architecture
• Product life cycle shortened from 2.5 years to 1 year
• Different worldwide standards related to GSM
Platform-based architecture:
• A DSP-based baseband approach can cope better with different RF and mixed-signal offerings
Spare DSP MIPS:
• Echo/noise cancellation, speech recognition, equalizer
Trends in Low-Power DSPs - I
Trends:
• DSP power dissipation is halving every 18 months
• The percentage of the physical-layer MIPS residing in the DSP changes from 100% in GSM to 10% in WCDMA
• But: more efficient architectures and enhanced instruction sets
Examples of evolving DSPs optimized for wireless applications:
• TI C54x, Lucent 1600 series and ADI 12xx series
C54x:
• Several power-saving features are built into the architecture
• Instruction set designed to reduce the code size and processor cycles required
• Modified Harvard architecture
-One program memory bus coupled with two data address generators
-High memory bandwidth and multiple-operand operations: fewer cycles to complete the same function
• Added instructions
-Allow efficient implementation of algorithms important to wireless applications
-C54x examples: 40-bit barrel shifter, compare-select-store, single and block repeat, block memory move, FIR, LMS, …
-In the near future: bit manipulation, …
Trends in Low-Power DSPs - II
Another trend:
• VLIW processors
-Examples: TI TMS320C6x, ADI TigerSHARC, Lucent and Motorola StarCore
-Instruction-level parallelism: statically scheduled, multiple-issue implementation
-Very efficient compilation of higher-level code, reducing the need for DSP-specific assembly-level coding of algorithms
-Support for a compiler-based, programmer-friendly environment
-Open, application-driven systems
Power management:
• The C54x uses a hybrid power-management strategy
-Automatic local clock gating and 3 user-controlled idle modes
-Flexible D-PLL-based clock generator and multiplier
Coprocessor - I
The problem of implementing a new standard with today’s DSPs:
• Standards are driven by what is possible for ASIC implementation at a given power and cost point.
• A newly defined standard cannot be implemented in a DSP alone.
• For a WCDMA voice-rate terminal, only 10% of the operations are suitable for implementation on a current DSP: the functions operating on data at the symbol rate, as opposed to the chip rate.
Appealing solution:
• A coprocessor-based architecture with a single programmable device at its core
• Example: the Pleiades project, a RISC engine with an attached configurable array of multipliers
-16-point complex radix-2 FFT: energy-delay product of 0.02% and 4% of that of the StrongARM and the C54x, respectively
Coprocessor - II
Division of coprocessors:
• Loosely coupled and tightly coupled
-Defined relative to the average time to complete an instruction on the DSP
TCC (tightly coupled coprocessor):
• The DSP initiates a task on the coprocessor that completes in a few instruction cycles
-A specific interface to the DSP core and access to some registers within that core
-A few cycles: involves a small amount of data
-Parallel scheduling of tasks on the DSP and the coprocessor is difficult
• User-definable instruction-set enhancement
-Provides power and speed improvement for small tasks where there is no data bottleneck through the DSP
-Specific tasks that are relatively small compared to the DSP
-Can be absorbed by replacing it with code
Coprocessor - III
LCC (loosely coupled coprocessor):
• Analogous to a subroutine call rather than an instruction
• Operates on large data sets
• Runs in parallel with the DSP
• Requires more careful scheduling of LCC instructions
• Main advantages
-Solves the bus-bandwidth problem when the raw input data rate or data reuse in the calculation is very high
-Computational units are local to data arranged for the data access required by a class of computation
• Applications at the chip-rate to symbol-rate boundary
-Simple but high-MIPS tasks
-Instruction and output buffers are memory-mapped to allow flexible access
• Careful coprocessor design: no significant power penalty to be paid for the flexibility
• DSP/coprocessor partitioning: e.g., decoder
Applications and Architectures for Future
Wireless Devices
Example services:
• Imaging services
• Location-based services
• Audio and visual environment
A need for more powerful DSPs:
• Open operating systems
• Dual-core RISC + DSP
• Multiple DSPs
• The extent to which application and communication functions can be effectively combined in one programming environment
FPGA/PLD Comparison
• FPGA
-Sea of configurable logic blocks (CLBs)
-Fine granularity
-LUT-based
-Complex interconnect
-Unpredictable performance
• PLD
-Fewer, larger logic array blocks (LABs)
-Coarse granularity
-Sum-of-products structure
-Fast, predictable timing
[Figures: Xilinx FPGA structure; Altera PLD structure]
Comparison of Technology Solutions
| Technology | Power consumption | Size | Cost | Field upgradable | Silicon evolution | Tools |
|---|---|---|---|---|---|---|
| High-speed DSPs | Very high | Modest | Moderate/High | High | Easy | Some / Multiple |
| ASICs | Moderate | Large | High | None | Difficult | Available |
| Parameterized hardware | Moderate | Moderate | Moderate | Some | Moderate | Some |
| Reconfigurable logic | Low | Low | Moderate/Low | High | Easy | Unavailable |
FPGA vs DSP
| Item | FPGA chip | DSP chip |
|---|---|---|
| Programming language | VHDL, Verilog | C, assembly language |
| Ease of software programming | Fairly easy; however, a programmer needs to understand the hardware architecture before programming | Easy |
| Performance | Can be very fast if an appropriate architecture is designed | Speed is limited by the clock speed of the DSP chip |
| Reconfigurability | SRAM-type FPGAs can be reconfigured an unlimited number of times | Can be reconfigured by changing the program memory content |
| Reconfiguration method | Done by downloading configuration data to the chip electronically | Done by simply reading a program at a different memory address |
| Areas of strength | FIR filter, IIR filter, correlator, convolver, FFT, etc. (where FPGAs can outperform DSPs) | Signal processing programs of a sequential nature |
| Power consumption | Can be minimized if the circuit is designed to save power, or if the power is dynamically controlled | Even if program A is larger than program B, power consumption does not change as long as the number of memory chips is the same |
| Implementation method of MAC | Parallel multiplier/adder or distributed arithmetic | Repeated operation of the MAC function |
| Speed of MAC | Can be fast if a parallel algorithm is used; with distributed arithmetic, the speed does not depend on the number of taps | Limited by the MAC speed of the DSP chip; a filter becomes slower as the number of taps increases |
| Parallelism | Can be parallelized to achieve high performance | DSP chip programming is usually sequential and cannot be parallelized |
A typical FPGA architecture
A typical FPGA architecture
Xilinx X3032 CLB
Distributed arithmetic MAC
[Figure: operands a1, a2, b1, b2 held in shift registers, combined through a shifter into an accumulator register]
Distributed arithmetic MAC
using an LUT
[Figure: shift registers addressing a lookup table, followed by a shifter and an accumulator register (signals b1, b2)]
A Distributed arithmetic
FIR filter using an LUT
[Figure: input shift registers addressing a lookup table, followed by a shifter and an accumulator register]
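The distributed-arithmetic (DA) idea in the three figures above can be checked in software. The sketch below precomputes a lookup table of every partial sum of the filter coefficients, then processes the input samples one bit plane at a time, shifting and accumulating the LUT outputs. It assumes integer coefficients and 8-bit two's-complement inputs; the function names are illustrative. Each output needs `bits` LUT accesses regardless of the number of taps (the LUT grows as 2^taps instead), which is why the FPGA vs. DSP comparison says DA speed does not depend on the number of taps.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose bit is set in addr (2**taps entries)."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(lut, samples, bits):
    """Bit-serial distributed-arithmetic inner product for two's-complement samples."""
    acc = 0
    for j in range(bits):  # one bit plane per clock cycle
        # Bit j of every sample forms the LUT address (the shift-register outputs).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> j) & 1) << k
        partial = lut[addr] * (1 << j)  # the shifter: weight by 2**j
        # The sign bit of a two's-complement number carries weight -2**(bits-1).
        acc = acc - partial if j == bits - 1 else acc + partial
    return acc


def da_fir(coeffs, signal, bits=8):
    """FIR filter y[n] = sum_k coeffs[k] * x[n-k], computed with distributed arithmetic."""
    lut = build_lut(coeffs)
    delay_line = [0] * len(coeffs)  # delay_line[k] holds x[n-k]
    out = []
    for x in signal:
        delay_line = [x] + delay_line[:-1]
        out.append(da_inner_product(lut, delay_line, bits))
    return out
```

For example, `da_fir([1, -2, 3], [5, -1, 7, -128, 127])` returns `[5, -11, 24, -145, 404]`, the same as direct convolution, without a single multiplication in the filtering loop.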
Utilization Comparison
Exercise: map wireless protocol blocks (TCI) to FPGA and PLD
• FPGAs excel for datapath blocks
• PLDs excel for control blocks
[Figure: Logic utilization, FPGA vs. PLD, for TCI FSM blocks (CRC, CRC FSM, PhysSend, Remote): normalized utilization, Xilinx FPGA vs. Altera PLD]
[Figure: Logic utilization, FPGA vs. PLD, for TCI datapath blocks (GenSync, MergeInteger, Select, Serial, Switch): normalized utilization, Xilinx FPGA vs. Altera PLD]
Utilization Analysis
When does an FPGA outperform a PLD?
• Large number of outputs or registers: a PLD requires one macrocell per output
• Multi-level logic (more than 2 levels): a PLD is restricted by its sum-of-products (SOP) structure
When does a PLD outperform an FPGA?
• Regular, two-level AND-OR structure: PLDs perform very well for state machines
• Very large number of inputs: each CLB typically has fewer than 7 inputs
Power Comparison
[Figure: Power dissipation (mW) of TCI blocks (TCI CRC, TCI CRC+FSM, PhysSend FSM), Xilinx FPGA vs. Altera PLD]
• FPGAs typically consume less energy than PLDs
-PLD: pseudo-NMOS AND-plane, with a sense amp at the AND-plane output
• A low-energy FPGA implementation currently exists (V. George, 1999)
• Low-energy PLD: implement PLDs with three AND-plane structures
1) Remove the sense amp
2) Static CMOS: no static power
3) Dynamic logic: fewer transistors
[Figure: Power consumption of TCI blocks (mW): Standard PLD 2.03, Dynamic PLD 1.1, CMOS PLD 0.174, Xilinx FPGA 0.091, LP-PGA FPGA 0.065]
Architecture Exploration
• For low-energy protocol implementation, FPGA and PLD deliver different benefits
-Power: low-power FPGA research is more mature
-Utilization: FPGA and PLD behave differently, and hence should be utilized differently
• Protocols = extended FSMs = FSM + datapath units
• First-order solution: a hybrid approach
-Use PLD for FSMs and FPGA for datapath units
-Make it parameterizable to allow application-specific trade-offs
[Figure: extended FSM: inputs feed a next-state decoder with a state register, and control-output and data-output decoders with an output register producing the control and data outputs; partitioned between PLD and FPGA]
Design flow for wireless protocol
processor - Platform Instantiation
(Phase II)
• Perform design exploration to find a suitable platform instance for a given set of target applications and constraints.
• The Y-chart approach (Kienhuis) involves an iterative process of:
-Mapping functions to parameterized architectural modules
-Performance evaluation of the resulting platform under the given set of functional constraints
• The principle of orthogonalization of concerns (functional vs. implementation) is applied to fully explore the design space.
[Figure: Phase II Y-chart: configurable platform and functional specification, mapping, performance evaluation]
Design flow for wireless protocol
processor (Phase III)
• Perform hardware and software synthesis to implement a specific application onto the platform instance.
• Through the software API, the hardware platform can be programmed or configured to perform the desired functionality.
• Remaining issues include generation and compilation of application code, the real-time operating system (RTOS), and any necessary design synthesis.
[Figure: Phase III: implementation, from Phase II]
Interconnect is a Major Power Component
[Figure: “Power Breakdown for Reconfigurable Logic” (Xilinx) [E. Kusse & J. Rabaey]: interconnect 65%, clock 21%, CLB 9%, I/O 5%]
[Figure: “Chip Interconnect Trends” [T. Sakurai]: power (%) vs. year, '95 to '10, with today marked]
SoC using cores
• Assembling an SoC using cores has not yet become reality
-Complex task: manual and error-prone, difficult full timing closure, physical problems, system verification
-Lack of established standards and interface synthesis tools
-H/W and S/W integration
SOC Target Architecture(1)
• Standard on-chip bus structures
-CoreConnect from IBM (PowerPC)
-AMBA from ARM (ARM)
• IBM’s SOC framework
-IBM Blue Logic Core Library
-A fixed bus architecture: Processor Local Bus (PLB), On-Chip Peripheral Bus (OPB), Device Control Register Bus (DCR)
SOC Target Architecture(2)

CoreConnect-based SOC
SOC Target Architecture(3)
• Define all the cores needed to implement the desired functionality
• Understand the functionality of all pins on all cores
• Define the request priorities and connect pins accordingly
• Define which cores access memory, the address map, clock domains, etc.
Automating SOC Integration
“Coral”
• Raising the level of abstraction
• Elements
- Virtual design
- Interface encapsulation and glueless interfaces
- Core and pin properties
- Interconnection engine
- Virtual to real synthesis engine
- Configuration engines
Virtual Design
• Virtual component, virtual interface, virtual net
• Real: 160 pins
• Virtual: 10 pins
From Interface Encapsulation
to Glueless Interface
• Two levels of glue logic encapsulation
- all the static and parameterizable protocol/interface logic
- glue logic between cores
• For third-party cores
- create core wrappers in VHDL
Core and Pin Properties(1)
• Pin and component information
- BUS_TYPE: PLB, OPB, ASB, …
- INTERFACE_TYPE: MASTER, …
- FUNCTION_TYPE: READ, INTERRUPT
- OPERATION_TYPE: REQUEST, …
- DATA_TYPE: ADDRESS, DATA, …
- RESOURCE_TYPE: BUS, PERIPHERAL
- PIN_GROUP: DCU, …
Core and Pin Properties(2)
Example: DCU_plbRequest on the PowerPC401
Virtual to Real Synthesis(1)
• Three steps with the VRSE (virtual to real synthesis engine)
1) Instantiate a real component in the real design
2) Traverse every virtual net and its virtual pins
3) Compare the properties on the real pins and determine which real pins should be connected together
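The three VRSE steps can be sketched as property matching. In this hypothetical model, step 1 is represented by a dictionary of instantiated real cores and their pins; the engine then walks each virtual net (step 2) and connects real pins whose properties agree on a chosen set of keys (step 3). The matching rule, the `CONTROL` data type, the `TRANSFER` operation type, and the pin names `DCU_plbABus`, `M0_request` and `M0_ABus` are assumptions; only the property names and `DCU_plbRequest` come from the slides.

```python
MATCH_KEYS = ("BUS_TYPE", "FUNCTION_TYPE", "OPERATION_TYPE", "DATA_TYPE")


def pins_match(props_a, props_b, keys=MATCH_KEYS):
    """Assumed rule: both pins define every key and agree on its value."""
    return all(k in props_a and k in props_b and props_a[k] == props_b[k] for k in keys)


def virtual_to_real(virtual_nets, real_pins):
    """Expand virtual nets into real pin-to-pin connections.

    virtual_nets: net name -> list of core names joined by that virtual net
    real_pins:    core name -> list of (pin name, property dict) for the
                  instantiated real core (step 1)
    """
    connections = []
    for net, cores in virtual_nets.items():  # step 2: traverse virtual nets
        for i, core_a in enumerate(cores):
            for core_b in cores[i + 1:]:
                for pin_a, props_a in real_pins[core_a]:
                    for pin_b, props_b in real_pins[core_b]:
                        if pins_match(props_a, props_b):  # step 3: compare properties
                            connections.append((net, (core_a, pin_a), (core_b, pin_b)))
    return connections


REQUEST = {"BUS_TYPE": "PLB", "FUNCTION_TYPE": "READ",
           "OPERATION_TYPE": "REQUEST", "DATA_TYPE": "CONTROL"}
ADDRESS = {"BUS_TYPE": "PLB", "FUNCTION_TYPE": "READ",
           "OPERATION_TYPE": "TRANSFER", "DATA_TYPE": "ADDRESS"}

real = {
    "ppc401": [("DCU_plbRequest", REQUEST), ("DCU_plbABus", ADDRESS)],
    "plb_arbiter": [("M0_request", REQUEST), ("M0_ABus", ADDRESS)],
}
links = virtual_to_real({"plb_master0": ["ppc401", "plb_arbiter"]}, real)
```

Here one virtual net between the two cores expands into one real connection per matching pin pair, which is how a small virtual interface can stand in for a much larger real pin count.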
Virtual to Real Synthesis(2)
Virtual to Real Synthesis(3)

virtual to real synthesis
Configuration Engine
• Various system configuration menus
- Clocking
- Address map definition
- Interrupt map definition
- DMA channel assignment
- I/O specification and generation
Bus Power
• Buses are a significant source of power dissipation and delay due to large capacitive loading
-15% of total power in the Alpha 21064
-30% of total power in the Intel 80386
Power Reduction in Buses
• Voltage swing reduction: increased probability of error, additional power supply
• Charge recovery: additional latency, additional circuitry
• Coding: additional bus lines, additional circuitry
Transition activity
• Self transition activity: signal probability, conditional probability
• Coupled transition activity: transition types (Type I, Type II, Type III, Type IV)
[Figure: coupled transition types I to IV between adjacent bus lines]
Low Power Encoding Schemes
• Generic encoder/decoder (codec) architecture
-Encoder: predictor, encoding logic block, decorrelator
-Decoder: correlator, decoding logic block, register
Coupling-Driven Bus-Invert Scheme (CBI)
• Coupling-driven bus invert: flip the data signal when the coupling effect of the inverted signal is less than that of the original data
• Problems
-How to accurately account for the coupling effect
-How to implement the scheme effectively with small hardware
• Basic idea
-An enumeration method to measure the coupling effect within a cycle time
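The coupling-driven bus-invert decision can be sketched in a few lines. The cost model is an assumption made for illustration (the slides do not give one): each wire toggle costs 1, a toggling wire next to a quiet one adds 1 unit of coupling, and adjacent wires toggling in opposite directions add 2; a toggle of the extra invert line costs 1. For each word, the encoder sends whichever of the word or its complement is cheaper. All function names are hypothetical.

```python
def wire_transitions(prev, curr, width):
    """Per-wire transition: +1 rising, -1 falling, 0 unchanged."""
    return [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]


def bus_cost(prev, curr, width, coupling_weight=1.0):
    """Self transitions plus a simple coupling penalty (assumed cost model)."""
    t = wire_transitions(prev, curr, width)
    self_cost = sum(abs(d) for d in t)
    coupling = 0
    for a, b in zip(t, t[1:]):       # adjacent wire pairs
        if a != 0 and b != 0 and a != b:
            coupling += 2            # neighbours switch in opposite directions
        elif (a == 0) != (b == 0):
            coupling += 1            # one wire switches next to a quiet neighbour
    return self_cost + coupling_weight * coupling


def cbi_encode(words, width, coupling_weight=1.0):
    """Return (bus value, invert bit) pairs, choosing the cheaper option per word."""
    mask = (1 << width) - 1
    bus, inv = 0, 0                  # assumed reset state of the bus
    encoded = []
    for w in words:
        plain_cost = bus_cost(bus, w, width, coupling_weight) + (inv != 0)
        flipped = ~w & mask
        flipped_cost = bus_cost(bus, flipped, width, coupling_weight) + (inv != 1)
        if flipped_cost < plain_cost:
            bus, inv = flipped, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
    return encoded


def cbi_decode(encoded, width):
    """Re-invert every word that was sent with the invert bit set."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


words = [0x00, 0xFF, 0x0F, 0xAA, 0x55]
encoded = cbi_encode(words, width=8)
```

With this model, sending `0xFF` after `0x00` costs 8, so the encoder keeps the bus at `0x00` and toggles only the invert line; decoding is transparent to the receiver.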
Wires-First Design
• Exploits logic structure to reduce wire loads
-Almost all logic has considerable structure
• Early visibility of timing and power dissipation
• Enables use of advanced circuits
• Gives a stable design
-Wire properties and crosstalk are known early and well characterized
-Key wire loads don’t change with small logic changes
• Gives the designer control
[Figure: wires-first design flow: RTL, structured RTL, floorplan, structure library, wire plan, key wires, short wire models, synthesis, local netlists, place & route, layout regions, placement & loads, timing analysis, slow paths, manual design, R&C extractor]
On-Chip Interconnection Networks
• Replace dedicated global wiring with a shared network
[Figure: chip with local logic blocks connected through routers and network wires, compared with dedicated wiring]
Most Wires are Idle Most of the Time
• Don’t dedicate wires to signals; share wires across multiple signals
• Route packets, not wires
• Organize global wiring as an on-chip interconnection network
-allows the wiring resource to be shared, keeping wires busy most of the time
-allows a single global interconnect to be reused on multiple designs
-makes global wiring regular and highly optimized
Dedicated wires vs. Network
| Dedicated wiring | On-chip network |
|---|---|
| Spaghetti wiring | Ordered wiring |
| Variation makes it hard to model crosstalk, returns, length, R & C | No variation, so crosstalk, returns, R and C are easy to model exactly |
| Drivers sized for ‘wire model’: 99% too large, 1% too small | Drivers sized exactly for the wire |
| Hard to use advanced signaling | Easy to use advanced signaling |
| Low duty factor | High duty factor |
| No protocol overhead | Small protocol overhead |
Circuits for On-Chip Networks
• Uniform, well-characterized lines enable custom circuits: 0.1x power, 3x velocity
[Figure: H-bridge driver with 100 mV swing driving long, lossy RC lines through regenerative repeaters]
Architecture for On-Chip Networks
• Topology: different constraints than off-chip networks
-buffering is expensive, bandwidth is cheap
-more wires between ‘tiles’ than needed for one channel
-multiple networks, higher dimensions, express channels
• Flow control
-run static, statically scheduled, and dynamic networks on one set of wires
-combine buffers with repeaters (ISSCC 2001)
-use methods that make efficient use of scarce resources (Flit Res.)
• Interface design
-standard interface from modules to network
-pinout and protocol independent of network implementation
References
[1] S. Srikanteswara, R. Boyle, et al., “A Soft Radio Architecture for Reconfigurable Platforms,” IEEE Communications Magazine, Feb. 2000.
[2] http://www.oren.com OR51210 Digital TV VSB Demodulator Product Datasheet
[3] http://www.ti.com TMS320C55x and TMS320C64x Technical Overview/Datasheet
[4] http://www.xilinx.com Virtex Platform FPGA
[5] SDR Trend Report (SDR 동향 보고서), TTA SDR Ad Hoc Group, Dec. 2001.
[6] John C. Davies IV, “Design and Implementation of an FPGA-based Soft-Radio Receiver Utilizing Adaptive Tracking,” Thesis for the degree of Master of Science in Electrical Engineering, Virginia Polytechnic Institute and State University.
[7] http://www.ist-pastoral.com
[8] http://www.ist-trust.com
[9] http://www.sdrf.org Technical Report, released 1999.
[10] Steven Winegarden, “Bus Architecture of a System on a Chip with User-Configurable System Logic,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, Mar. 2000.
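Conclusions
Outlook and the way forward
• Increasingly widespread use of soft radio architectures
• Closer integration of hardware and software design
• Need for more attention to and research on soft radio architectures
• Need for investment in developing the libraries and modules used in the layered architecture
• Participation in international standardization through involvement in research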