The SHARC
Super Harvard Architecture Computer
Clare Smtih SHARC Presentation
The SHARC
• Developed by Analog Devices
• Optimized for demanding DSP and imaging
applications.
• 32-bit floating point, with 40-bit extended
floating-point capabilities.
• Large on-chip memory.
• Ideal for scalable multi-processing
applications.
Harvard Architecture
• Program memory can store data.
• Able to simultaneously read or write data at
one location and get instructions from
another place in memory.
• 2 buses
1 Data memory bus.
2 Program bus.
• Either two separate memories or a single
dual-port memory.
Super Harvard Architecture
• Many processors employ a Harvard
architecture by integrating two separate
memories or caches into the processor chip.
• The SHARC is unique in that its internal
memory is capable of holding a large
program as well as a large amount of data.
This is what makes it SUPER!
DSP
• Digital Signal Processor.
• Requires high-speed, low-overhead data movement
and rapid computation.
• Usually has a small on-chip ROM and RAM
and a single-cycle multiplier.
• Designed to run single-line, serial-in, serial-out
signal processing applications very fast.
DSP Computations
• The inner product of two vectors is a
common computation for determining
energy or correlation.
• The following C code is an example:
float result = 0.0f;
for (int n = 0; n < length; n++)
    result += x[n] * y[n];
• The processor that executes each iteration
of this loop in the fewest instruction cycles
will have the best performance.
SHARC DSP
• The SHARC incorporates features aimed at
optimizing such loops.
• High-Speed Floating Point Capability
• Extended Floating Point
• These features are DSP-specific.
• As a result, performance may be less
optimal when the SHARC is applied to a
non-DSP application.
Floating Point and
Extended Floating Point
• The SHARC supports 32-bit floating-point, 40-bit
extended floating-point, and fixed-point
(non-floating-point) formats.
• Floating-point computations take no additional
clock cycles.
• Data is automatically truncated or zero-padded
when moved between 32-bit memory and the
40-bit internal registers.
• Not accurate enough for scientific algorithms,
but provides an excellent signal-to-noise ratio.
SHARC’s Internal Memory
• Makes the SHARC unique.
• Size
• Allows many complex functions to be performed
on-chip, eliminating the need to move data between
internal and external memory.
• Memory size is significantly larger than that of most other
high-speed computational devices.
• Dual-block, dual-port
• Optimizes the Harvard architecture by allowing
instructions to be fetched while data memory
accesses are performed.
Multiply and Accumulate
Instructions on the SHARC
• Like most DSPs, the SHARC is able to
compute a product and add the product to a
running total in a single clock cycle.
• The SHARC’s “super” instruction can
multiply and accumulate while also adding,
subtracting, or averaging data in two other
registers.
• These instructions give the SHARC its peak
rating of 120 MFLOPS (million floating-point
operations per second).
Zero Overhead Looping
on the SHARC
• A single instruction placed before the loop
performs the loop setup, informing the
SHARC that a loop is about to begin.
• The instruction also specifies the iteration
count and the termination condition.
• As a result, the pipeline remains full
during loop execution, and the
termination condition is tested in
parallel with the loop body.
DAGs on the SHARC
• Data Address Generators (DAGs) are integer
computation units that manage the index
registers used to address memory.
• They allow the SHARC to fetch a value and
update the index value.
• If the updated value exceeds a limit, the
DAG adjusts the index so that it wraps.
• This occurs in the same clock cycle as the
read or write.
DAG Capabilities
• Circular Buffering
• Rather than physically moving data in and out of a
vector, circular buffers are used.
• By updating the index modulo the buffer length, the
oldest entry can be conveniently replaced by the newest entry.
• Bit Reverse Addressing
• The bit pattern of a vector index is reversed.
• Done automatically by the SHARC.
• Required by the Fast Fourier Transform (FFT), which is
often critical in DSP applications.
SHARC DSP
• What makes the SHARC unique?
– It also has some features not directly
related to optimizing numeric computations.
• Pipelining
• Handling Branches
• Why has this not emerged sooner?
– Technology has only recently become available
to make it economical to integrate these general-purpose
features into a single computing device.
SHARC’s Pipeline
• 3 stages
1 Instruction Fetch
2 Decode
3 Execution
• It takes three clock cycles for an instruction
to propagate through the pipeline.
• The processor still completes one
instruction per clock cycle, even though
each instruction requires three clock cycles,
because three instructions occupy the three stages at once.
SHARC’s Handling Branches
Delayed Branching
• When a branch instruction is encountered,
the two instructions that have already been fetched
and decoded are executed before the branch takes effect.
• This keeps the pipeline full and avoids
discarding those two instructions and refilling
the pipeline.
• Beneficial in situations such as loops of only a few
instructions, where the ratio of wasted clock cycles
to instructions would otherwise be significant.
SHARC’s Handling Branches
Non-delayed Branching
• Traditional branching.
• If the instructions cannot be reordered to fill
the two slots after a delayed branch, non-delayed
branching saves space.
• Uses only one word of storage, since no NOPs
are needed to pad unused slots.
• However, it takes three cycles, because the
pipeline must be refilled.
Multi-processing
• The SHARC is uniquely equipped for multiprocessing.
• Its link ports provide very powerful multiprocessing capabilities.
• Supports two main programming models, depending on
the application.
• Adapts well to different multi-processing
architectures.
Multi-processing
SHARC Links
• The SHARC has six link ports, each able to transfer
data at up to 40 Mbytes/sec.
• Links are designed for point-to-point
connections.
• Data can be transmitted in either direction
but not both simultaneously.
Multi-processing Program Model
MIMD
• Multiple instruction, multiple data.
• Good for applications that require multiple
instruction threads to execute concurrently.
• Processors operate individually.
• Each processor executes different code.
• Typically used for image reconstruction and
multi-channel DSP.
Multi-processing Program Model
SIMD
• Single instruction, multiple data.
• Works best when all processors execute
identical instruction sequences.
• Requires no overhead for inter-processor
synchronization.
• Typically used for synthetic aperture radar
and automatic target recognition.
Multi-processing Architectures
Cluster Design
• Groups of up to six SHARCs in a cluster.
• The most common way to join multiple
SHARCs.
• All processors, global I/O and global
memory are connected to a common
“Cluster bus.”
• Each SHARC can “drive” the bus.
Multi-processing Architectures
Mesh Design
• All SHARCs are joined by their link ports and
connected to a common bus.
• In SIMD mode, a single master SHARC
drives the bus.
• In MIMD mode, the mesh architecture cannot
function if the data is larger than the
available on-chip memory.
• Offers advantageous scalability over a wider range
of applications.
Summary of what makes the
SHARC Super
• Excellent performance for DSP
applications.
• A Harvard architecture with very
large on-chip memory.
• A respectable MFLOPS rating.
• Its multiprocessing capabilities.
How optimal is the SHARC for
non-DSP Applications?
• It is clearly geared toward DSP applications.
• While it may fare better than some other
processors, it still lags behind those
designed specifically for non-DSP
applications.