Niraj Shah and Mel Tsai

A Framework for Evaluating Programming Models for Embedded CMP Systems
Niraj Shah
Mel Tsai
CS252 Final Project
4/27/2000
Overview
Motivation
Target Architectures
Programming Model
Software Environment
Applications
Preliminary Results and Conclusions
Future Work
CS252 Final Project (Spring 2000)
Motivation
Embedded multiprocessor systems are different from their general-purpose (GP) counterparts
Interprocess communication can be very cheap
Communication architecture tailored to application
Desirable not to have a heavy OS or large library
to handle communication
Efficiently programming these systems in an
HLL is an absolute necessity
How do we evaluate the machine abstraction
that is presented to the programmer?
Target Architectures
[Figure: target architecture. Labels: PE (x8), Memory System, Register File, FU (x3), SFU (x2), Instruction Cache. Thanks Scott]
Instructions to perform communication
operations
Some simplifying assumptions
Programming Model
Language specification is a simplified subset of MPI




Single Program Multiple Data (SPMD) execution model
Separate address spaces for each process
Bind each process to a distinct PE
Communication primitives

Blocking/Non-blocking Sends & Receives
MPI Programming Model
MPI_Send(*data_location,
data_length,
type,
destination_PE,
tag_identifier,
MPI_COMM_WORLD);
Mescal Programming Model
Mescal_Send(data_length,
*data_location,
destination_PE);
How do we evaluate the programming model?
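The reduced argument list of Mescal_Send can be illustrated with a small host-side sketch. Everything below is an assumption made for illustration: the names sketch_send and sketch_recv, the single-entry mailbox per PE, and the constants NUM_PES and SLOT_SIZE are hypothetical, and the slides do not describe the actual receive call or the communication hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PES   4    /* assumed number of processing elements */
#define SLOT_SIZE 64   /* assumed maximum message size in bytes */

/* One single-entry mailbox per PE: a stand-in for the real
 * communication hardware, which the slides do not describe. */
typedef struct {
    unsigned char data[SLOT_SIZE];
    size_t length;
    int full;
} mailbox_t;

static mailbox_t mailboxes[NUM_PES];

/* Mirrors the three-argument shape of Mescal_Send on the slide:
 * no datatype, tag, or communicator. Returns 0 on success, or -1 if
 * the destination mailbox is still full (a real blocking send would
 * wait here instead of returning). */
int sketch_send(size_t data_length, const void *data_location,
                int destination_pe)
{
    assert(destination_pe >= 0 && destination_pe < NUM_PES);
    assert(data_length <= SLOT_SIZE);
    mailbox_t *box = &mailboxes[destination_pe];
    if (box->full)
        return -1;
    memcpy(box->data, data_location, data_length);
    box->length = data_length;
    box->full = 1;
    return 0;
}

/* Hypothetical receive counterpart: copies the pending message for
 * my_pe into buffer and returns its length, or -1 if no message is
 * pending or the buffer is too small. */
long sketch_recv(size_t buffer_length, void *buffer, int my_pe)
{
    assert(my_pe >= 0 && my_pe < NUM_PES);
    mailbox_t *box = &mailboxes[my_pe];
    if (!box->full || box->length > buffer_length)
        return -1;
    memcpy(buffer, box->data, box->length);
    box->full = 0;
    return (long)box->length;
}
```

Because each process is bound to a distinct PE, the destination PE number alone identifies the receiver in this sketch, which is one plausible reason the tag and communicator arguments can be dropped.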
Software Environment
Augmented IMPACT framework (single PE) to target CMPs
Compiler
Generates optimized code for each PE
Understands our programming model
Generates code to use our hardware

Trace Simulator
[Figure: trace simulator tool flow. Labels: MPI C source (*.c), MPI + probes, gcc compiler, “probed” executable, input data, trace data, *.X_im_p, emulator generator, machine description, MP simulator, simulation data]
Application - JPEG
JPEG encode/decode
[Figure: process 1 (splitter) feeds processes 2, 3, and 4 (each encode/decode), which feed process 5 (combiner)]
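The splitter, encode/decode, and combiner processes can be sketched as a block-distribution policy. The slides do not say how the splitter assigns work, so the round-robin policy and the function names below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_WORKER 2  /* encode/decode processes are numbered 2..4 */
#define NUM_WORKERS  3

/* Assumed round-robin splitter policy: which encode/decode process
 * handles image block number block_index. */
int worker_for_block(size_t block_index)
{
    return FIRST_WORKER + (int)(block_index % NUM_WORKERS);
}

/* Number of blocks the given worker receives out of total_blocks
 * under the same round-robin policy. The combiner can use this to
 * know how many results to expect from each worker. */
size_t blocks_for_worker(int worker, size_t total_blocks)
{
    assert(worker >= FIRST_WORKER && worker < FIRST_WORKER + NUM_WORKERS);
    size_t slot  = (size_t)(worker - FIRST_WORKER);
    size_t base  = total_blocks / NUM_WORKERS;
    size_t extra = total_blocks % NUM_WORKERS;
    return base + (slot < extra ? 1 : 0);
}
```

Round-robin keeps the per-worker block counts within one of each other, but it balances load only if every block costs about the same to encode or decode.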
Application – Network Routing
Based on MIT Click Modular Router
Translated to C (from C++) by the MESCAL team → CRACK (Click Rapidly Adapted to C-Kode)
Built router kernel from CRACK “Elements”
CRACK
Parallelized CRACK
[Figure: parallelized CRACK pipeline with idle cycle times. Processes: 1 InfiniteSource, 2 CheckIPHeader, 3 GetIPAddress, 4 LookupIPRoute, 5 Port 0 … Port n. Idle cycle time labels: 12.1%, 48.8%, 49.3%, 88%, 0%]
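Idle cycle times like those reported for the parallelized CRACK processes can be derived from per-process cycle counts. The sketch below assumes the simulator reports idle and total cycles for each process; the function names and that data layout are hypothetical, not taken from the framework.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of cycles a process spent idle (for example, waiting on
 * a blocking send or receive). Returns 0.0 for an empty trace. */
double idle_percent(unsigned long idle_cycles, unsigned long total_cycles)
{
    assert(idle_cycles <= total_cycles);
    if (total_cycles == 0)
        return 0.0;
    return 100.0 * (double)idle_cycles / (double)total_cycles;
}

/* Index of the process with the lowest idle percentage, i.e. the
 * likely bottleneck stage of the pipeline. */
size_t busiest_process(const double *idle_pct, size_t count)
{
    assert(count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (idle_pct[i] < idle_pct[best])
            best = i;
    }
    return best;
}
```

A stage that is never idle while its neighbors wait on it is a natural first target when rebalancing the pipeline.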
Preliminary Conclusions
Need a scheme to better parallelize (load-balance) applications
Need a way of overlapping computation and communication (i.e., non-blocking sends and receives)
Extensible framework is useful for
exploring different programming models
Allows for quantitative analysis of the
effect of communication primitives
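The benefit of overlapping computation and communication can be estimated with a simple cycle model. This is an idealized sketch, assuming that a non-blocking transfer proceeds fully in parallel with computation and has no setup cost; the function names are hypothetical.

```c
#include <assert.h>

/* Blocking primitives: the PE computes, then waits for the transfer. */
unsigned long blocking_cycles(unsigned long compute, unsigned long comm)
{
    return compute + comm;
}

/* Idealized non-blocking primitives: the transfer overlaps with
 * computation, so the longer of the two determines the time. */
unsigned long overlapped_cycles(unsigned long compute, unsigned long comm)
{
    return compute > comm ? compute : comm;
}

/* Cycles saved per iteration by overlapping, under this model. */
unsigned long overlap_savings(unsigned long compute, unsigned long comm)
{
    unsigned long blocking = blocking_cycles(compute, comm);
    unsigned long overlapped = overlapped_cycles(compute, comm);
    assert(blocking >= overlapped);
    return blocking - overlapped;
}
```

Under this model the saving is the smaller of the two costs, so overlap helps most when computation and communication take similar time.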
Future Work
Get more detailed numbers from parallelized
CRACK
Implement non-blocking sends and receives
Map multiple processes to a single PE
Performance evaluation of different
programming models for an application set
Support dynamic process creation
Incorporate microarchitectural simulation of
communication instructions