ECE506 Week 1

ECE 506
Reconfigurable Computing
http://www.ece.arizona.edu/~ece506
Lecture 1
Course Introduction
Ali Akoglu
Background needed for this course
° You should be familiar with:
• Digital design
• Architecture
- Controller+Datapath
- Memory Hierarchy
- Pipelining
• More listed in syllabus
° Assumes no knowledge of reconfigurable computing
Topic self-contained!
Reconfigurable Computing is a lot more than just devices
Goals
° Understanding of issues related to RC
(reconfigurable computing)
• Architectures
• Tools
• Design methodologies
° Detailed investigation of a specific problem
• Research project
Course Organization
• 3-5 Homework assignments (30%)
• Project (45%)
• Exam (15%)
• Class participation/attendance (10%)
• No required text – readings will be assigned from research
papers
Project
° Groups
• Size to be determined based on enrollment
- Likely 3-4 per group
° Topic subject to instructor approval
• Will give examples
° Phase 1: Literature Review (no page limit), Due:
Feb 19, In class hard copy!
° Phase 2: Project Plan, Due Feb 26, In class hard
copy
° Phase 3: Presentations, starting March 17
° Phase 4: Final Report and Demo (Last class, In
class hard copy)
Reading List
° Reading List will be posted ahead of time
• Collection of papers and/or tutorials
° Discussion Oriented
• Participation
General Purpose Computing?
° In 1945, John von Neumann demonstrated that a
computer could execute any kind of computation, given
a properly programmed control, without the need for
hardware modification.
° This quickly became the foundation of future generations of
high-speed digital computers.
• One of the reasons is its simplicity of programming that follows the sequential
way of human thinking.
General Purpose Computing?
° Because all algorithms must be programmed sequentially to run
on a von Neumann (VN) computer, many algorithms cannot be executed
with their best potential performance.
General Purpose Computing
° Advantage:
Flexibility: any well-coded program can be executed
° Drawbacks
Speed: Not optimal due to the sequential program
execution (temporal resource sharing).
Resource efficiency: Only one part of the hardware
resources is required for the execution of an instruction.
The rest remains idle.
Memory access: Memories are about 10 times slower
than the processor
Drawbacks are compensated using high clock speed,
pipelining, caches, instruction pre-fetching, etc.
Domain Specific Processors
° Data path is tailored for an optimal execution of a
common set of operations that mostly characterizes the
algorithms
• Digital signal processors (DSPs) are among the most widely used
domain-specific processors in telecommunication, multimedia, automobile,
radar, sonar, seismic, image processing, etc.
• Ability to perform one or more multiply-accumulate (MAC) operations in
a single cycle.
• Special support for efficient looping.
- Special loop or repeat instruction allows a loop implementation
without expending any instruction cycles for updating and testing
the loop counter or branching back to the top of the loop.
• Customized for data of a given width according to the application
domain. For example, in image processing, pixels are represented in the
Red Green Blue (RGB) system, where each color is represented by a byte,
so an image processing DSP will not need more than an 8-bit data path.
• Specialization increases the performance and improves the device
utilization.
• Flexibility is reduced, because the processor can no longer be used to
implement applications other than those for which it was optimally designed.
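The multiply-accumulate pattern mentioned above can be sketched in software. This is only an illustration, assuming Python as the example language. The names `mac` and `dot_mac` are hypothetical and do not come from any DSP toolchain. Each loop iteration corresponds to one MAC operation, which a DSP could issue in a single cycle without spending instruction cycles on the loop counter.

```python
def mac(acc, x, c):
    """One multiply-accumulate step: returns acc + x * c."""
    return acc + x * c


def dot_mac(samples, coeffs):
    """Dot product written as a chain of MAC steps (hypothetical helper).

    On a DSP with a hardware MAC unit and zero-overhead looping, each
    iteration of this loop would map to roughly one cycle.
    """
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = mac(acc, x, c)
    return acc
```

For instance, `dot_mac([1, 2, 3], [4, 5, 6])` accumulates 1*4, 2*5, and 3*6 in three MAC steps.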
Application Specific Processors
° Although DSPs incorporate a degree of application-specific
features such as MAC and data width
optimization, they still remain sequential machines.
° If a processor has to be used for only one application,
then the processing unit could be designed and
optimized for that particular application.
° In multimedia processing, processors are usually
designed to perform the compression of video frames
according to a video compression standard.
Application Specific Processors
if (a < b) then
{
    d = a+b;
    c = a*b;
}
else
{
    d = b+1;
    c = a-1;
}
VN computer: at least 3 instructions
Exec. time >= 3 * t_instruction
Application-specific circuit: the complete execution is done in
parallel in a single clock cycle
Exec. time = t_clock = delay of the longest path
from input to output
The VN computer needs to be clocked
at least 3 times faster (to reach equal exec. time)
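The comparison above can be sketched as a small model. This is an illustration under stated assumptions, not a hardware description. The function names are hypothetical, and the instruction count of 3 is the lower bound quoted on the slide. `spatial_branch` mimics a circuit that computes all four candidate results side by side and then selects between them, as a multiplexer would.

```python
def sequential_branch(a, b):
    """Reference behavior of the if/else example, executed step by step."""
    if a < b:
        d = a + b
        c = a * b
    else:
        d = b + 1
        c = a - 1
    return d, c


def spatial_branch(a, b):
    """Hardware-style evaluation: compute every candidate, then select."""
    sum_ab, prod_ab = a + b, a * b      # "then" branch results
    inc_b, dec_a = b + 1, a - 1         # "else" branch results
    take_then = a < b                   # comparator output drives the selection
    d = sum_ab if take_then else inc_b
    c = prod_ab if take_then else dec_a
    return d, c


def vn_exec_time_lower_bound(t_instruction, instructions=3):
    """Lower bound on VN execution time: instructions * t_instruction."""
    return instructions * t_instruction
```

Both functions return the same values for every input. The difference is that the second form has no sequential dependence between the four arithmetic operations, so a circuit can evaluate them within one clock period set by the longest path.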
Implementation Spectrum (Hardware vs. Software)
[Figure: implementation spectrum with Microprocessor, Reconfigurable, and ASIC, ordered toward Hardware]
• Computer hardware, such as application-specific integrated
circuits (ASICs)
- provides highly optimized resources for quickly performing
critical tasks,
- but it is permanently configured to only one application via a
multimillion-dollar design and fabrication effort.
• Computer software provides the flexibility to change
applications and perform a huge number of different tasks,
- but it is orders of magnitude worse than ASIC implementations in terms
of performance, silicon area efficiency, and power usage.
• Reconfigurable hardware blends the benefits of both hardware
and software.
- It implements circuits just like hardware, yet can be reprogrammed
cheaply and easily to implement a wide range of tasks.
Processing Approaches, Need for Reconfigurable Computing
[Figure: taxonomy of processing approaches. Labels: Programmable, Special Purpose, General Purpose (CISC, RISC; with or without media-extended ISA), Instruction Level Parallelism (VLIW/Superscalar), Data Level Parallelism (SIMD, MIMD), Reconfigurable, ASIC]
What is Reconfigurable Computing?
• Computation using hardware that can adapt
at the logic level (post-fabrication) to solve
specific problems
• A way of implementing circuits without
fabricating a device
• Spatial structure of the device is modified
to match the new application.
Reconfigurable Computing
° What is it?
• Compute by building a circuit rather than executing instructions.
° Efficient for long-running computations
• Video and image processing
• DSP
• Network processing
° Example: Z[i] = a*X[i] + b*Y[i]
//program
Load rx, X
Mpy r1, rx, ra
Load ry, Y
Mpy r2, ry, rb
Add r3, r1, r2
Store r3, Z
[Figure: dataflow circuit in which X is multiplied by a, Y is multiplied by b, and the two products are added to produce Z]
° Implements computations spatially, simultaneously computing millions of
operations in resources distributed across a silicon chip.
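The contrast between the six-instruction program and the spatial circuit can be sketched as follows. This is a rough model under stated assumptions. The function names are hypothetical, each instruction is counted as one step, and the circuit is assumed to be fully pipelined so that it accepts one element per cycle (pipeline fill latency is ignored).

```python
def sequential_axpby(a, b, xs, ys):
    """Executes the slide's program once per element; counts instructions."""
    zs = []
    instructions = 0
    for x, y in zip(xs, ys):
        rx = x             # Load  rx, X
        r1 = rx * a        # Mpy   r1, rx, ra
        ry = y             # Load  ry, Y
        r2 = ry * b        # Mpy   r2, ry, rb
        r3 = r1 + r2       # Add   r3, r1, r2
        zs.append(r3)      # Store r3, Z
        instructions += 6
    return zs, instructions


def spatial_axpby(a, b, xs, ys):
    """Models two multipliers feeding one adder, one element per cycle.

    Assumption: a fully pipelined circuit, fill latency ignored.
    """
    zs = [a * x + b * y for x, y in zip(xs, ys)]
    cycles = len(zs)
    return zs, cycles
```

Under these assumptions, the sequential version needs six steps per element while the circuit needs one, which is the sense in which the computation is "built" rather than executed instruction by instruction.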
Reconfigurable Computing?
° can be hundreds of times faster than microprocessor-based designs
° unlike in ASICs, computations are programmed into the
chip, not permanently frozen by the manufacturing
process.
° FPGA-based system can be programmed and
reprogrammed many times.
• apply a bug fix to correct faulty behavior, or add a new feature
• reconfigure a generic computation engine for a new task
• reconfigure a device during operation to allow a single piece of silicon
to simultaneously do the work of numerous special-purpose chips
Reconfigurable Computing?
° Delivering the best of hardware and software? Not quite!
• creating efficient programs for them is more complex
• useful only for operations that process large streams of data,
such as signal processing, networking, and the like.
• Compared to ASICs, they may be 5 to 25 times worse in terms
of area, delay, and performance.
° ASIC design may take months to years to develop and
have a multimillion-dollar price tag
° RC design might only take days to create and cost tens
to hundreds of dollars.
° For systems that do not require the absolute highest
achievable performance or power efficiency, RC is a
compelling design alternative.
Reconfigurable Computing?
° Current devices can compute functions
• on the order of millions of basic gates,
• running at speeds in the hundreds of Megahertz.
° To boost speed and capacity, additional special
elements can be embedded
• such as large memories, multipliers, fast-carry logic for
arithmetic and logic functions, and even complete
microprocessors.
° Reconfigurable devices today are capable of
implementing complete systems
Reconfigurable Computing Devices: FPGA
° Field Programmable Gate Arrays
• Logic blocks in a general routing structure.
• The array of logic gates is the "G" and "A" in FPGA.
- Logic blocks perform simple combinational logic, as well as sequential
logic.
• FPGA can implement very complex circuits.
• The logic and routing elements in an FPGA are controlled
by programming points
• By way of a configuration file or bitstream, an FPGA can
be configured to implement the user’s desired function.
- allowing customization at the user’s electronics
bench, or even in the final end product.
- This is why FPGAs are field programmable
FPGA
° Customizing an FPGA merely involves storing values to
memory locations. Much like compiling and then loading a
program onto a computer, creating an FPGA-based
circuit is a simple process of creating a bitstream to load into
the device.
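The idea that configuration is just stored values can be illustrated with a toy lookup table. This is a simplified sketch and an assumption on my part: the slides do not describe the internals of a logic block, the class name `LookupTable` is hypothetical, and real bitstream formats are vendor-specific and far more involved.

```python
class LookupTable:
    """Toy k-input logic element whose function is set by stored bits."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.bits = [0] * (1 << num_inputs)   # one stored bit per input combination

    def configure(self, bits):
        """Loads a new 'bitstream' (here simply a list of 0/1 values)."""
        if len(bits) != len(self.bits):
            raise ValueError("expected %d configuration bits" % len(self.bits))
        self.bits = [1 if bit else 0 for bit in bits]

    def evaluate(self, *inputs):
        """Uses the inputs as an address into the stored bits."""
        if len(inputs) != self.num_inputs:
            raise ValueError("expected %d inputs" % self.num_inputs)
        index = 0
        for value in inputs:
            index = (index << 1) | (1 if value else 0)
        return self.bits[index]


lut = LookupTable(2)
lut.configure([0, 0, 0, 1])   # behaves as a 2-input AND
and_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
lut.configure([0, 1, 1, 0])   # same element, now a 2-input XOR
xor_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
```

The same element implements AND or XOR depending only on which values were stored, which is the sense in which reprogramming does not require changing the hardware.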
Reconfigurable Computing, Function Level
Programming?
° Because of the FPGA’s dual nature—combining the flexibility
of software with the performance of hardware—an FPGA
designer must think differently from designers who use other
devices.
° Software developers typically write sequential programs that
exploit a microprocessor’s ability to rapidly step through a
series of instructions.
° In contrast, a high-quality FPGA design requires thinking
about spatial parallelism—that is, simultaneously using
multiple resources spread across a chip to yield a huge
amount of computation.
Reconfigurable Computing, Function Level Design?
° the flexibility of FPGAs gives architects new
opportunities generally not available in ASICs
• designs can be rapidly developed and deployed, and even
reprogrammed in the field with new functionality.
• they do not demand the huge design teams and validation efforts
required for ASICs.
° They also offer the ability to change the configuration, even
while the device is running.
° However, because FPGAs are noticeably slower and have lower
capacity than ASICs, designers must carefully optimize
their design to the target device.
Fields of Application
° Rapid Prototyping: Testing hardware before
fabrication
• Software simulation
- Relatively inexpensive
- Slow
- Accuracy?
• Hardware emulation
- Hardware testing under real operation conditions
- Fast
- (Relatively) Accurate
- Allows for several iterations
Fields of Application
° Post-fabrication Customization
• Time to market advantage
• Ship the first version of a product
• Remote upgrading with new product versions
• Remote repairing
Fields of Application
° Multi-modal Computation: Reconfigurable vehicles,
mobile phones, etc.
Built-in Digital Camera
Video phone service
Games
Internet
Navigation system
Emergency
Diagnostics
Different standards and protocols
Monitoring
Entertainment
Fields of Application
° Adaptive Computing Systems
• Computing systems that are able to adapt their behavior and structure
to changing operating and environmental conditions, time-varying
optimization objectives, and physical constraints like changing
protocols, new standards, or dynamically changing operation
conditions of technical systems.
Fields of Application
° Fault tolerance
• Autonomous fault detection on communication lines
• Detection of defective nodes
• Task migration on node failure
• Load balancing computation
Conclusion
° 10 years of Moore’s-law progress led to the microprocessor
• Raised engineers’ productivity
• Problem-solving became programming
• Grew to billions of units/year
° Further speed gains are no longer expected, due to
unreliability and higher variation of transistors
° Stalled progress in design methods for thirty years
° Future Multi-Core Designs are already available, but
do have major problems:
• Shared Memory Model does not scale to hundreds of processors on a chip
• Distributed Memory Model is difficult to program
• Power consumption and temperature are further problems
° Reconfigurable Processors, Networks, and
Memories on a Chip may be the solution…
Hot Reconfigurable Computing Research Areas
• Developing power-efficient architectures and CAD
techniques for FPGAs
• Important new applications for reconfigurable devices
(especially embedded applications and security)
• Better understanding the role of standard microprocessors
and reconfigurable hardware.
- Multiple types of parallelism
• Coarse-grained reconfigurable architectures
• 3D Reconfigurable Architectures
• Autonomous Systems
- Self-healing