Profiling-Based
Hardware/Software Co-Exploration for the Design of
Video Coding Architectures
Heiko Hübert and Benno Stabernack
Lu Hao
Contents
1. Background
2. MEMTRACE profiler
3. Software/Hardware Optimization
4. Conclusion
Background -- profiling
Profiling is used to understand the runtime behavior of applications
Efficient profiling approaches
Software profiling
 Sampling, instrumentation
 Flexible, but with high overhead
Hardware profiling
 Performance counters
 Inexpensive, but more rigid and not
universally available
Hybrid: combinations of the above
 Hold great potential, since they combine the
advantages of both while avoiding their drawbacks
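As an illustration of software profiling by instrumentation, the following is a minimal sketch and not part of MEMTRACE. The decorator `instrument`, the table `stats`, and the workload `sad_row` are hypothetical names chosen for this example. The wrapper executes on every call, which is where the overhead noted above comes from.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function statistics collected by the instrumentation:
# number of calls and accumulated wall-clock time in seconds.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrument(func):
    """Wrap func so that every call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@instrument
def sad_row(a, b):
    """Hypothetical workload: sum of absolute differences of two pixel rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


for _ in range(3):
    sad_row([10, 20, 30], [12, 18, 33])
```

After the loop, `stats["sad_row"]` holds the call count and the accumulated time of the instrumented function.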
An example of hardware profiling
PC – Performance Counter
Background – system analysis
Why do we need profiling?
 It is very important to adapt the system to the
application in order to find an efficient solution.
 Video coding
MEMTRACE profiler
 MEMTRACE delivers cycle-accurate profiling
results on a C function level.
 The results include clock cycles, various memory
access statistics, and optionally energy
consumption estimation for reduced instruction
set computer (RISC)-based processors.
 A focus is placed on memory access analysis, as
for data-intensive applications this aspect has a
high potential for increasing system efficiency.
MEMTRACE profiling toolflow
MEMTRACE -- Initialization
MEMTRACE – Performance Analysis
MEMTRACE – Post Processing
MEMTRACE backend
MEMTRACE -- Profiling data acquisition
 init()
 Initializes the profiler
 Creates a list of all functions and global variables
 nextInstruction()
 Checks whether program execution has moved from
one function to another
 If so, the cycle count of the previous function is
updated and the call count of the new function is
incremented
 memoryAccess()
 Determines whether a load or a store was performed
and which bit width (8, 16, or 32 bits) was used
busActivity()
 Identifies the bus status (idle cycle, core
access, or DMA access) and increments the
corresponding counter of the current function
cacheMiss()
 Called each time a cache miss occurs
finish()
 Called when the instruction set simulator (ISS)
terminates the simulation
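The callbacks listed above can be sketched as a small backend class. This is a simplified illustration under stated assumptions, not the actual MEMTRACE implementation: the class names, the snake_case method names, the argument lists, and the address-range symbol table are all hypothetical, and global variables are left out. Every change of function counts as a call here, so a return into the caller is counted too.

```python
from collections import defaultdict


class FunctionStats:
    """Counters kept for one profiled function."""

    def __init__(self):
        self.calls = 0
        self.cycles = 0
        self.loads = defaultdict(int)    # bit width -> number of loads
        self.stores = defaultdict(int)   # bit width -> number of stores
        self.bus = defaultdict(int)      # "idle" / "core" / "dma" -> count
        self.cache_misses = 0


class ProfilerBackend:
    def init(self, functions):
        """functions: list of (name, start_address, end_address) tuples."""
        self.functions = list(functions)
        self.stats = {name: FunctionStats() for name, _, _ in self.functions}
        self.current = None     # name of the function currently executing
        self.entry_cycle = 0    # cycle at which `current` was entered

    def _function_at(self, pc):
        for name, start, end in self.functions:
            if start <= pc < end:
                return name
        return None

    def next_instruction(self, pc, cycle):
        name = self._function_at(pc)
        if name == self.current:
            return
        # Execution moved to another function: update the cycle count of
        # the previous one and increment the call count of the new one.
        if self.current is not None:
            self.stats[self.current].cycles += cycle - self.entry_cycle
        if name is not None:
            self.stats[name].calls += 1
        self.current, self.entry_cycle = name, cycle

    def memory_access(self, is_store, width):
        if self.current is not None:
            s = self.stats[self.current]
            (s.stores if is_store else s.loads)[width] += 1

    def bus_activity(self, status):
        if self.current is not None:
            self.stats[self.current].bus[status] += 1

    def cache_miss(self):
        if self.current is not None:
            self.stats[self.current].cache_misses += 1

    def finish(self, cycle):
        # Account for the cycles of the function that was still running.
        if self.current is not None:
            self.stats[self.current].cycles += cycle - self.entry_cycle
            self.current = None
        return self.stats


# A simulator would drive the callbacks; here a short trace is fed by hand.
backend = ProfilerBackend()
backend.init([("main", 0x100, 0x140), ("idct", 0x140, 0x180)])
backend.next_instruction(0x100, cycle=0)
backend.memory_access(is_store=False, width=32)
backend.next_instruction(0x140, cycle=10)   # main -> idct
backend.memory_access(is_store=True, width=8)
backend.cache_miss()
backend.bus_activity("core")
backend.next_instruction(0x104, cycle=25)   # idct -> main
report = backend.finish(cycle=30)
```

In this hand-made trace, `main` and `idct` each accumulate 15 cycles, and the store, the cache miss, and the bus access are attributed to `idct`.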
Processor model generator
Interconnection
What can we do with the results of the
MEMTRACE profiler?
System partitioning
 Computationally intensive functions are well
suited for hardware acceleration in a
coprocessor
 Control-intensive functions are better suited
for software implementation on ASIPs
(application-specific instruction-set
processors)
Software Optimization
Loop unrolling
For computationally intensive parts,
arithmetic optimizations or SIMD
instructions can be applied if the
processor provides such instructions
 Video applications
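Loop unrolling can be illustrated with a sum of absolute differences (SAD), a kernel that is typical of video coding. This is a hedged sketch: the function names are hypothetical, and Python is used only to show the transformation. The actual benefit (fewer loop-control instructions, more room for instruction scheduling) appears when the same rewrite is applied to compiled code on the target processor.

```python
def sad_rolled(a, b):
    """Reference version: one absolute difference per loop iteration."""
    total = 0
    for i in range(len(a)):
        total += abs(a[i] - b[i])
    return total


def sad_unrolled4(a, b):
    """Same computation with the loop body unrolled by a factor of four."""
    n = len(a)
    limit = n - n % 4           # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:
        total += abs(a[i] - b[i])
        total += abs(a[i + 1] - b[i + 1])
        total += abs(a[i + 2] - b[i + 2])
        total += abs(a[i + 3] - b[i + 3])
        i += 4
    while i < n:                # remainder loop for the last n % 4 elements
        total += abs(a[i] - b[i])
        i += 1
    return total
```

The remainder loop keeps the unrolled version correct for lengths that are not a multiple of four.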
Hardware Optimization
 Memory Subsystem Optimizations
 External memory
 Cache (Cache miss)
• The data areas with the most cache misses and the smallest
size should be stored in on-chip memory
 SRAM
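The placement rule above (most cache misses, smallest size) can be sketched as a greedy selection by misses per byte under an SRAM budget. The function name, the data-area names, and all numbers below are hypothetical and serve only as an illustration. A greedy choice is a heuristic and is not guaranteed to be optimal.

```python
def select_for_on_chip(areas, sram_bytes):
    """Pick data areas for on-chip SRAM.

    areas: list of (name, size_bytes, cache_misses) with size_bytes > 0.
    Areas are ranked by cache misses per byte and taken while they fit.
    """
    ranked = sorted(areas, key=lambda a: a[2] / a[1], reverse=True)
    chosen = []
    free = sram_bytes
    for name, size, _misses in ranked:
        if size <= free:
            chosen.append(name)
            free -= size
    return chosen


# Hypothetical per-area profile: (name, size in bytes, cache misses)
profile = [
    ("frame_buffer", 101376, 50000),
    ("coeff_table", 512, 9000),
    ("mv_cache", 2048, 12000),
    ("bitstream_buf", 8192, 3000),
]
on_chip = select_for_on_chip(profile, sram_bytes=4096)
```

With a 4 KB budget, the two small, miss-heavy areas are selected and the large frame buffer stays in external memory.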
 Instruction Set Architecture Optimizations
 Frequently used instructions should be considered as
targets for optimization during the processor
architecture development.
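Finding frequently used instructions amounts to building a histogram over an executed-instruction trace. The sketch below assumes the trace is available as a list of mnemonics. The function name and the trace contents are hypothetical.

```python
from collections import Counter


def instruction_histogram(trace):
    """Return (mnemonic, count, share) tuples, most frequent first."""
    counts = Counter(trace)
    total = sum(counts.values())
    return [(m, n, n / total) for m, n in counts.most_common()]


# Hypothetical trace of executed instructions
trace = ["ldr", "mul", "add", "ldr", "mul", "add", "str", "ldr", "cmp", "bne"]
histogram = instruction_histogram(trace)
```

The entries at the head of `histogram` are the first candidates to examine when extending or tuning the instruction set.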
Conclusion
Profiling and system analysis
MEMTRACE architecture
 Initialization
 Performance analysis
 Post processing
Hardware/Software optimization
 Software
 Hardware
Questions?
Lu Hao
References
[1] H. Hübert and B. Stabernack, "Profiling-based hardware/software co-exploration for the
design of video coding architectures," IEEE Transactions on Circuits and Systems for
Video Technology, 2009, pp. 1680-1691.
[2] ST Microelectronics, Nomadik STn8820 Mobile Multimedia Application Processor,
data brief, Feb. 2008. [Online]. Available: www.st.com
[3] Broadcom, BCM2820 Low Power, High Performance Application Processor,
product brief, Sep. 2006. [Online]. Available: www.broadcom.com
[4] G. de Micheli and L. Benini, Networks on Chips. San Francisco, CA: Morgan
Kaufmann, 2006.
[5] H. Hübert, "MEMTRACE: A memory, performance and energy profiler targeting
RISC-based embedded systems for data-intensive applications," Ph.D. dissertation,
Dept. Elect. Eng. Comput. Sci., Tech. Univ. Berlin, Germany, 2009. [Online]. Available:
http://opus.kobv.de/tuberlin/volltexte/2009/2261