Performance Analysis of MicroBlaze Processor

Midterm Presentation
Performed by:
Alexei Iolin 307724211
Alexander Faingersh 306966912
Instructor:
Evgeny Fiksman
Winter 2005
Agenda
• Project Goals
• MicroBlaze architecture
• OPB timer/counter
• OPB interrupt controller
• Connecting Customized IP to FSL bus
• Our Customized IP
• Performance results
Project Goals
MicroBlaze is a soft-core processor developed by Xilinx that
meets performance, area-efficiency and low-cost targets.
Although MicroBlaze enables fast system development on a single
FPGA, some “special” applications run slower in software than
in a hardware IP. We will examine this with the EDK environment.
• Examine the calculation abilities of MicroBlaze by measuring the
running time of an application and its power consumption.
• Implement an arbitrary application (IDCT) in hardware and use it
as a hardware accelerator for MicroBlaze.
• Implement the same functionality in C and compare the results
with the hardware.
• Add self-written C code for testing the FPU.
• Use a well-known benchmark as application code, such as
Dhrystone MIPS or SPEC CPU 2000, or implement an arbitrary benchmark.
Hardware
EDK and MicroBlaze
• The Embedded Development Kit (EDK) is a set of microprocessor
design tools and common software platforms. The EDK includes
the Platform Studio tool suite, the MicroBlaze core and a library of
peripheral IP cores.
• The MicroBlaze embedded soft core is a 32-bit
Reduced Instruction Set Computer (RISC) optimized for
implementation in FPGAs, operating at up to 200 MHz.
• MicroBlaze gives you complete flexibility in setting
peripherals, memory and interface features on a single FPGA.
MicroBlaze Architecture
MicroBlaze Hardware Options and Functions
• Hardware Barrel Shifter
• Hardware Divider
• Machine Status Set and Clear Instructions
• Hardware Exception Support
• Pattern Compare Instructions
• Floating-Point Unit (FPU)
• Hardware Multiplier Enable
Bus Infrastructure
• Data-side On-chip Peripheral Bus (DOPB)
• Instruction-side On-chip Peripheral Bus (IOPB)
• Data-side Local Memory Bus (DLMB)
• Instruction-side Local Memory Bus (ILMB)
• Fast Simplex Link (FSL)
OPB Timer/Counter
The TC (Timer/Counter) is a 32-bit timer module that attaches to the OPB.
• Two programmable interval timers with interrupt, event
generation, and event capture capabilities.
• Each timer has three 32-bit registers:
1. TCSR - Control/Status Register
2. TLR - Load Register
3. TCR - Counter Register
• Both timer/counter modules can be used in Generate Mode,
Capture Mode, or Pulse Width Modulation (PWM) Mode.
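Since the timer is what produces the execution times reported later, here is a minimal sketch of how two readings of the counter register (TCR) could be turned into a time. It assumes the timer counts up and wraps at most once between the two readings. The function names and the clock frequency passed in are illustrative and not taken from the project.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two TCR readings of an up-counting 32-bit timer.
   Unsigned subtraction gives the right answer even if the counter
   wrapped around once between the two readings. */
uint32_t elapsed_ticks(uint32_t tcr_start, uint32_t tcr_end)
{
    return tcr_end - tcr_start;
}

/* Convert a tick count to microseconds for a given timer clock in Hz.
   The clock frequency is an input; it is not stated in the slides. */
double ticks_to_usec(uint32_t ticks, uint32_t clk_hz)
{
    assert(clk_hz != 0);
    return (double)ticks * 1e6 / (double)clk_hz;
}
```

For example, with a hypothetical 100 MHz timer clock, 100 ticks correspond to 1 usec.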
OPB Interrupt Controller
Continuing INTC…
INTC Features
• Priority between interrupt requests is determined by vector position.
• Supports data bus widths of 8, 16, or 32 bits for the OPB interface.
• Number of interrupt inputs configurable up to the width of the data bus.
• Interrupt Enable Register (IER) for selectively disabling individual
interrupt inputs.
• Master Enable Register for disabling the interrupt request output and
choosing software or hardware interrupts.
• Each input is configurable for edge or level sensitivity.
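Priority by vector position and masking through the IER can be illustrated with a small host-side model. This is only a sketch: it assumes that bit 0 is the highest-priority input, and the function name is hypothetical rather than part of any Xilinx driver.

```c
#include <stdint.h>

/* Returns the index of the highest-priority interrupt that is both
   pending (bit set in isr) and enabled (bit set in ier), or -1 if
   there is none. Assumption: bit 0 has the highest priority. */
int highest_priority_pending(uint32_t isr, uint32_t ier)
{
    uint32_t pending = isr & ier;   /* disabled inputs are masked out */
    for (int bit = 0; bit < 32; bit++) {
        if (pending & (UINT32_C(1) << bit))
            return bit;
    }
    return -1;
}
```

With inputs 2 and 3 pending and all inputs enabled, the model selects input 2; if only input 3 is enabled in the IER, it selects input 3.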
Connecting Customized IP to FSL BUS
• MicroBlaze has the ability to use its dedicated FSL bus interface
to integrate a customized IP core into a MicroBlaze soft
processor-based system.
• Generally, there are two ways to integrate a customized IP core
into a MicroBlaze system:
1. One way is to connect the IP to the On-chip Peripheral Bus (OPB).
2. The second way is to connect the user IP to the
MicroBlaze dedicated Fast Simplex Link (FSL) bus system.
• If the application is time-critical, the designer should take bus
standard delays into account, so the user IP should be
connected to the FSL bus system.
Otherwise, it can be connected as a slave or master on the OPB.
Continuing Customized IP…
• In general, every application can be realized and implemented either
as a software algorithm or as structural hardware. It is important to
exploit the advantage of a hardware implementation (parallel execution).
The example demonstrates how the parallel execution advantage can be
used. The software routine needs 12 clock cycles to calculate the
result G. In hardware, however, it takes only 2 clock cycles to
compute the same result.
Drawbacks of integrating the IP into the processor as a customized instruction:
• RISC architectures have a two-input, one-output ALU. IP with more
than two input values or more than one output value is problematic.
• If the critical path of the whole system runs through the user IP,
the performance (processor frequency) of the whole soft processor
decreases.
• The compiler cannot handle customized instructions directly, so the
user has to write them as inline assembler code. This can produce C
application code that is neither very clean nor portable.
Advantages of connecting the IP through the FSL:
• It is possible to use more than 2 dynamic inputs and
more than 1 output because up to 16 FSL interface
buses are provided.
• The user IP is independent and does not affect the internal
MB RISC architecture, so it does not decrease the clock
frequency of the MB.
• Implementing the IP outside the processor allows custom
calculations to run in parallel with the main application.
• The new hardware doesn't require inline assembler
code because the FSL interface has predefined C macros for I/O to the IP.
• Two MB processors connected back to back have a
very fast and clean way to communicate with each
other.
Our Customized IP
• We implemented a 1-dimensional IDCT on the FSL.
• A 1-dimensional IDCT realized in software requires a long execution
time because the C program executes many loops sequentially.
• Implementing the application as a hardware module greatly reduces the
execution time due to parallel processing.
• The software application writes 8 values from memory to the FSL. The
IDCT core gets the data, calculates the result and returns the result data
(8 words) back to the MB through the FSL.
• By cascading the 1-dimensional IDCT core, it is possible to build a
2-dimensional IDCT core (useful for image processing).
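To make the sequential cost of the software version concrete, here is a minimal sketch of a direct 8-point 1-dimensional IDCT in C. It assumes the common definition x[n] = 1/2 * sum over k of c(k) * X[k] * cos((2n+1)k*pi/16), with c(0) = 1/sqrt(2) and c(k) = 1 otherwise. The project's actual C code and number format are not shown in the slides, so every name below is illustrative.

```c
#define IDCT_N 8

/* cos(j*pi/16) for j = 0..8; other angles follow by symmetry. */
static const double COS_TAB[9] = {
    1.0, 0.98078528, 0.92387953, 0.83146961, 0.70710678,
    0.55557023, 0.38268343, 0.19509032, 0.0
};

/* Returns cos(j*pi/16) for any non-negative j using the table. */
static double cos_pi16(unsigned j)
{
    j %= 32;                     /* cosine has period 2*pi */
    if (j > 16)
        j = 32 - j;              /* cos(x) = cos(2*pi - x) */
    if (j > 8)
        return -COS_TAB[16 - j]; /* cos(x) = -cos(pi - x) */
    return COS_TAB[j];
}

/* Direct 8-point IDCT: 8 x 8 = 64 multiply-accumulate steps that a
   processor executes one after another. */
void idct_1d(const double in[IDCT_N], double out[IDCT_N])
{
    for (unsigned n = 0; n < IDCT_N; n++) {
        double sum = 0.0;
        for (unsigned k = 0; k < IDCT_N; k++) {
            double ck = (k == 0) ? 0.70710678 : 1.0; /* c(0) = 1/sqrt(2) */
            sum += ck * in[k] * cos_pi16((2 * n + 1) * k);
        }
        out[n] = 0.5 * sum;
    }
}
```

The nested loops are the part a hardware core can evaluate in parallel, which is where the speed-up of the FSL-attached IDCT comes from.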
Continuing Our Customized IP…
The whole embedded system consists of the MicroBlaze itself, two
FSL bus systems, the user core, an OPB on-chip bus, two OPB
peripherals (UART Lite and the MicroBlaze Debug Module) and the on-chip block RAM.
The application program is stored in the on-chip block RAM.
Continuing Our Customized IP…
FSL_M_Data - The data bus written to the FSL FIFO
FSL_M_Write - Input signal that controls the write enable signal of the FIFO.
FSL_M_Full - Output signal from the FIFO indicating that the FIFO is full.
FSL_S_Data - Output bus that indicates the data available at the read end of the FIFO.
FSL_S_Read - Input signal that controls the read acknowledge signal of the FIFO.
FSL_S_Exists - Output signal indicating that FIFO contains valid data.
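The signals above describe a FIFO with a write side and a read side. The following host-side model is a sketch of that behaviour, not the FSL hardware or the EDK macros: the depth of 16 words and all function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FSL_FIFO_DEPTH 16   /* assumed depth, not taken from the slides */

typedef struct {
    uint32_t data[FSL_FIFO_DEPTH];
    size_t head;    /* index of the oldest word */
    size_t count;   /* number of words currently stored */
} fsl_fifo;

/* Models FSL_M_Full: nonzero when no more words can be written. */
int fsl_full(const fsl_fifo *f)   { return f->count == FSL_FIFO_DEPTH; }

/* Models FSL_S_Exists: nonzero when a word is available to read. */
int fsl_exists(const fsl_fifo *f) { return f->count > 0; }

/* Models asserting FSL_M_Write with a word on FSL_M_Data.
   Returns 1 on success, 0 if the FIFO is full. */
int fsl_write(fsl_fifo *f, uint32_t word)
{
    if (fsl_full(f))
        return 0;
    f->data[(f->head + f->count) % FSL_FIFO_DEPTH] = word;
    f->count++;
    return 1;
}

/* Models asserting FSL_S_Read: takes the word shown on FSL_S_Data.
   Returns 1 on success, 0 if the FIFO is empty. */
int fsl_read(fsl_fifo *f, uint32_t *word)
{
    if (!fsl_exists(f))
        return 0;
    *word = f->data[f->head];
    f->head = (f->head + 1) % FSL_FIFO_DEPTH;
    f->count--;
    return 1;
}
```

In this model, writing 8 words and then reading 8 words returns them in the same order, mirroring how the application passes 8 values to the IDCT core and collects 8 result words. On the target, the predefined C macros of the FSL interface would take the place of these functions.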
Performance Results
Test Specifications | SW Application | Time
Testing basic start-up functionality, including printing out | Default EDK SW TestApp.c | 1.0968 sec
Testing the time taken to enter, print out from and exit an empty interrupt handler | Default EDK SW TestApp.c | 8.3326 (sec or msec)
Testing the time taken to enter the interrupt handler after the interrupt occurred | TestApp + Custom IDCT application | 14.76 usec
Testing the time for the custom IDCT (VHDL) hardware accelerator application | TestApp + Custom IDCT application | 3.26541 (sec or msec)
Time Table
EDK trainings - DONE
Studying the communication with OPB Timer and Controller - DONE
Measuring execution time for basic application files and interrupts - DONE
Implementation of IDCT in HW for hardware acceleration - DONE
Midterm Presentation - DONE
Implementation of IDCT in C (fixed & FPU version) and power consumption measurements - 2 weeks
Dhrystone benchmark or arbitrary benchmark - 1 week
Final presentation, poster and project book - 3 weeks
Questions?