FBSim: A Fully Buffered Memory System Simulator

A Fully Buffered Memory System Simulator
Rami Nasr
M.S. Thesis and ENEE 759H Course Project
Thursday May 12th, 2005
Another Simulator?
Sim-DRAM exists and supports FB-DIMM. Why write another simulator?
• Sim-DRAM still had a few unworkable bugs in its FB-DIMM model when I began my study.
• FB-DIMM is radically different from other memory architectures. New simulator => fresh start.
• FBsim is made exclusively for simulating and studying the FB-DIMM architecture, which is easier to study with a dedicated simulator.
• Different scheduler, mapping algorithm, approach, style, and area of study within the FB-DIMM design space.
• FBsim is ideal for simulating 'unreasonably' high memory request rates and studying channel saturation effects.
• The two simulators can be used to validate each other's results in FB-DIMM studies.
• Writing a memory simulator was a great experience for me.
FBsim Overview


All code written from scratch.
Standalone product. Does not currently interface with CPU simulators or
memory traces. Instead probabilistically models memory transactions
according to user specifications.

=> Does not actually store memory data

Written in ANSI C. ~5000 lines of code. Code organized into header files,
commented, quite easy to hack.

Fast. For each memory channel, 1 second of run time simulates ~10 ms of
channel time (or ~1 ms during channel saturation) on a 2.4 GHz Pentium 4.

Supports Open & Closed Page Mode, Fixed & Variable Latency Mode.

Supports output of macro and micro (frame by frame) simulation data

Does not model channel init, maintenance, sync. overhead.

Does not model memory refresh.

Does not model power consumption or power-related timing limitations
(tFAW, etc.).

The above options can be incorporated readily into future versions.
FBsim Overview 2
[Block diagram: Input Transaction Generator, Address Mapper, Channel Scheduler 0, Channel Scheduler 1, ..., Channel Scheduler 7]
A Frame Iteration
• Try to generate transactions.
• Map any generated transactions to their channel schedulers.
• Fire each scheduler once.
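The three steps of a frame iteration can be sketched in C as below. This is only an illustration of the loop structure, not FBsim's actual code: the struct layouts, the modulo-based channel mapper, the queue depth, and the "retire one transaction per frame" scheduler are all placeholder assumptions, and the transactions generated for the frame are passed in by the caller rather than produced by a real generator.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 8   /* the diagram shows Channel Scheduler 0..7 */
#define QUEUE_DEPTH  64  /* hypothetical per-scheduler queue limit */

/* Hypothetical transaction record; FBsim's real fields are not shown. */
struct transaction {
    unsigned long address;
    int is_write;
};

/* Hypothetical per-channel scheduler state (counters only). */
struct channel_scheduler {
    size_t pending;       /* transactions waiting in this scheduler */
    unsigned long fired;  /* number of frames this scheduler has processed */
};

/* Placeholder mapper: simple modulo interleave across channels (assumption). */
static size_t map_to_channel(const struct transaction *t)
{
    return (size_t)(t->address % NUM_CHANNELS);
}

/* Placeholder scheduler step: retire at most one transaction per frame. */
static void fire_scheduler(struct channel_scheduler *s)
{
    if (s->pending > 0)
        s->pending--;
    s->fired++;
}

/*
 * One frame iteration:
 *   1. take the transactions generated this frame (supplied by the caller),
 *   2. map each one to its channel scheduler,
 *   3. fire every scheduler exactly once.
 * Returns the number of transactions dropped because a queue was full.
 */
static size_t frame_iteration(struct channel_scheduler *sched,
                              const struct transaction *generated, size_t n)
{
    size_t i, c, dropped = 0;

    assert(sched != NULL);
    assert(generated != NULL || n == 0);

    for (i = 0; i < n; i++) {
        struct channel_scheduler *s = &sched[map_to_channel(&generated[i])];
        if (s->pending < QUEUE_DEPTH)
            s->pending++;
        else
            dropped++;
    }
    for (c = 0; c < NUM_CHANNELS; c++)
        fire_scheduler(&sched[c]);
    return dropped;
}
```

Calling `frame_iteration` once per simulated frame, with whatever the transaction generator produced for that frame, gives the overall simulation loop.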
Input Transaction Model
• Step Distributions
• Normal (Gaussian) Distributions
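The slides name the distribution types but do not say which quantities they describe or how FBsim samples them. As a rough sketch only, a step (piecewise-constant) distribution could be sampled by inverse-CDF lookup as below. The `struct step` type, the `sample_step` function, and the idea that the sampled value is an integer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a step distribution: a value and its relative weight.
 * Both fields are illustrative, not FBsim's own types. */
struct step {
    int value;
    double weight;
};

/*
 * Pick a value from a step distribution by inverse-CDF lookup.
 * `u` is a uniform random number in [0, 1) supplied by the caller,
 * for example rand() / (RAND_MAX + 1.0).
 */
static int sample_step(const struct step *steps, size_t n, double u)
{
    double total = 0.0, threshold, running = 0.0;
    size_t i;

    assert(steps != NULL && n > 0);
    assert(u >= 0.0 && u < 1.0);

    for (i = 0; i < n; i++)
        total += steps[i].weight;
    assert(total > 0.0);

    threshold = u * total;
    for (i = 0; i < n; i++) {
        running += steps[i].weight;
        if (threshold < running)
            return steps[i].value;
    }
    return steps[n - 1].value; /* guards against floating-point round-off */
}
```

Passing the uniform draw in as a parameter keeps the sampler deterministic and easy to check; a normal distribution would need a separate transform (such as Box-Muller) on top of the same uniform source.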
Input Transaction Model 2
[Figure: panels labeled "Bus Trace Viewer" and "FBsim Model"]
Address Mapping

• Physical address must be mapped somehow to the right channel, DIMM, rank, bank, row, and column.
• FBsim is built to support different DIMM capacities, even different channel capacities and unbalanced configurations.
• => Algorithm needed to map each incoming transaction to a DIMM.

WHILE (a non zero row sum exists)
{
    WHILE (visit each channel with a non zero row sum exactly once)
    {
        The next 'result' is the channel DIMM with the highest number.
        Decrement that DIMM's number by 1.
        Decrement the row sum by 1.
    }
}

Modulus = 4+2+1+2 = 9
[Figure labels: Closed Page Mode, Open Page Mode]
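One possible reading of the mapping pseudocode, sketched in C: each DIMM carries a "number" (assumed here to be a relative capacity weight), a channel's row sum is the total of its DIMMs' numbers, and the algorithm emits a repeating table of (channel, DIMM) targets whose length equals the modulus. The function name `build_map_table`, the array bounds, the tie-breaking rule (lower DIMM index wins), and the way the 4+2+1+2 example is split across two channels are all assumptions, since the slide does not show them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 8
#define MAX_DIMMS    8

struct map_entry {
    int channel;
    int dimm;
};

/*
 * Build the interleave table described by the pseudocode.
 * weight[c][d] is the "number" of DIMM d on channel c; the array is
 * consumed (every entry is decremented to zero).
 * Returns the number of entries written, which equals the modulus
 * (the sum of all weights). `out` must have room for that many entries.
 */
static size_t build_map_table(int weight[MAX_CHANNELS][MAX_DIMMS],
                              size_t channels, size_t dimms,
                              struct map_entry *out, size_t out_cap)
{
    int row_sum[MAX_CHANNELS];
    size_t c, d, count = 0;
    int remaining = 0;

    assert(channels <= MAX_CHANNELS && dimms <= MAX_DIMMS);
    assert(out != NULL);

    for (c = 0; c < channels; c++) {
        row_sum[c] = 0;
        for (d = 0; d < dimms; d++) {
            assert(weight[c][d] >= 0);
            row_sum[c] += weight[c][d];
        }
        remaining += row_sum[c];
    }

    while (remaining > 0) {                /* a non-zero row sum exists */
        for (c = 0; c < channels; c++) {   /* visit each such channel once */
            size_t best = 0;
            if (row_sum[c] == 0)
                continue;
            for (d = 1; d < dimms; d++)    /* ties go to the lower index */
                if (weight[c][d] > weight[c][best])
                    best = d;
            assert(count < out_cap);
            out[count].channel = (int)c;
            out[count].dimm = (int)best;
            count++;
            weight[c][best]--;
            row_sum[c]--;
            remaining--;
        }
    }
    return count;
}
```

With a hypothetical split of the example weights as channel 0 = {4, 2} and channel 1 = {1, 2}, the table has 9 entries, and a transaction could then be steered by indexing the table with some address-derived value modulo 9. How FBsim derives that index is not stated in the slides.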
Channel Scheduler
FB-DIMM Frame Format Review
SouthBound (SB) Frame could be a:
• Channel Frame (not modeled in FBsim)
• Command Frame (up to three DRAM commands, with only one
command possible to each DIMM in the channel)
• Command + Wdata Frame (holds one DRAM command, plus one
DDR beat of write data)
NorthBound (NB) Frame could be a:
• Channel Frame (not modeled in FBsim)
• Read Response Frame (holds two DDR beats of returned read data)
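The frame types and slot rules above could be represented along the following lines. This is a minimal sketch for illustration: the type names, the `opcode` field, the limit of 8 DIMMs per channel, and the validity checker are assumptions, not FBsim's data structures or the FB-DIMM wire format.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS_PER_FRAME 3  /* "up to three DRAM commands" */
#define MAX_DIMMS          8  /* hypothetical DIMMs per channel */

enum sb_frame_type {
    SB_CHANNEL,        /* not modeled in FBsim */
    SB_COMMAND,        /* up to three DRAM commands */
    SB_COMMAND_WDATA   /* one DRAM command plus one DDR beat of write data */
};

enum nb_frame_type {
    NB_CHANNEL,        /* not modeled in FBsim */
    NB_READ_RESPONSE   /* two DDR beats of returned read data */
};

struct dram_command {
    int dimm;          /* target DIMM on the channel */
    int opcode;        /* hypothetical command encoding */
};

struct sb_frame {
    enum sb_frame_type type;
    struct dram_command cmd[MAX_CMDS_PER_FRAME];
    size_t num_cmds;
};

/* Returns 1 if the frame obeys the slot rules listed above, else 0. */
static int sb_frame_is_valid(const struct sb_frame *f)
{
    int seen[MAX_DIMMS] = {0};
    size_t i, limit;

    assert(f != NULL);
    switch (f->type) {
    case SB_COMMAND:       limit = MAX_CMDS_PER_FRAME; break;
    case SB_COMMAND_WDATA: limit = 1; break;
    default:               return 0; /* channel frames are not modeled */
    }
    if (f->num_cmds > limit)
        return 0;
    for (i = 0; i < f->num_cmds; i++) {
        int d = f->cmd[i].dimm;
        if (d < 0 || d >= MAX_DIMMS || seen[d])
            return 0;  /* only one command per DIMM in a frame */
        seen[d] = 1;
    }
    return 1;
}
```

A scheduler built on this could call `sb_frame_is_valid` before committing a frame, so that packing three commands into one frame never targets the same DIMM twice.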
Some of my Results
• 1x8 achieved 7.9 GBps before saturating (82%)
• 2x4 achieved 15.6 GBps (82%)
• 4x2 achieved 31.3 GBps (82%)
• 8x1 achieved 45.2 GBps (59%!)
Case Study Conclusion
• With at least two DIMMs on each channel, performance scales very well in FB-DIMM.
• More than two DIMMs only increases capacity, not throughput.
• Adding each DIMM adds ~5ns average channel latency in FLM, and slightly over half that in VLM.
• In closed page mode, only 82% of peak theoretical throughput of a channel can be reached.
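As a consistency check on the four configuration results, the peak per-channel throughput implied by each one can be computed from the reported figures. The sketch below assumes that "AxB" means A channels with B DIMMs each and that each percentage is relative to the peak of the whole configuration. Neither reading is stated on the slides, and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Peak per-channel throughput implied by a reported result, from
 *   achieved = fraction_of_peak * channels * peak_per_channel
 */
static double implied_peak_per_channel(double achieved_gbps,
                                       double fraction_of_peak,
                                       int channels)
{
    assert(fraction_of_peak > 0.0 && channels > 0);
    return achieved_gbps / (fraction_of_peak * channels);
}
```

Under those assumptions, `implied_peak_per_channel(7.9, 0.82, 1)`, `(15.6, 0.82, 2)`, `(31.3, 0.82, 4)`, and `(45.2, 0.59, 8)` all land between roughly 9.5 and 9.65 GBps, so the four results are mutually consistent with a single per-channel peak.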
Some of my Results 2
• In Closed Page Mode with 2:1 read/write ratio, a
reordering window of size ~12 transactions achieves
best possible performance (channel saturation) for a
FB-DIMM channel scheduler. Increasing window-size
over this has no benefit.
• The more skewed the read/write ratio, the bigger the
scheduling window needs to be (at 4:1, it's ~18).
• In Variable Latency Mode, a reordering window of size
~20 achieves best possible performance.
Some of my Results 3
Micro-study shows that in Closed
Page Mode, the FB channel can at
most reach ~93% write data
utilization on the SB, and ~84% read
data utilization on the NB.
Micro-study showed that FBsim
channel utilization was slightly worse
for non-2:1 read/write ratios (2%
worse at 4:1). The FBsim scheduler
could quite straightforwardly be made
more adaptive to the read/write ratio
of the transactions in the scheduler.
Future Ideas with FBsim
• I’m graduating this semester (if Dr Jacob and Mr
(Dr?) Wang so please), and escaping to the
corporate world.
• => Writing a guide for FBsim along with some
ideas for future work. Anyone who wishes to take
over development is eagerly encouraged to.
• If so, I would be happy to help get things rolling
by email or in person. Feel free to access & use
anything in FBsim or my thesis paper.
• I strongly believe a very interesting paper or
three can quite quickly come out of this research
area
Future Ideas with FBsim 2
• For credibility in a paper, add an interface between FBsim and a CPU
simulator or memory traces. Run real benchmarks through FBsim. Compare
and contrast these results with the transaction modeling results.
• AND/OR add more functionality and provable realism to the transaction
modeler. Study this.
• Best yet, integrate FBsim into the Sim-DRAM package as an added option.
• Add modeling for channel overhead, memory refresh overhead, error
simulation and error handling, power consumption constraints and metrics.
• Enhance adaptivity of FBsim scheduler to non 2:1 read/write ratios.
• Experiment with address mapping algorithm and load balancing.
• Experiment with different types of scheduler implementations (e.g., ones not
based on pattern matching). *involved*
• Study hardware constraints in FB-DIMM channel scheduling.
More Possible FB-DIMM Studies
• Channel utilization and configuration trade-offs for Open Page Mode
• Performance degradation of shrinking scheduler reorder window size
• Relaxation on critical DRAM device parameters (density, nBanks, timing constraints, clock frequency) allowed by FB-DIMM architecture
• OR optimizing the FB-DIMM architecture by increasing the SB and NB channel widths (adding lines) or bitrates, and maybe modifying the frame protocol
• AMB is a logic device on a memory module!! Can add buffers, arithmetic units, processing power, etc.
Special Thanks to...
• Dr Jacob for introducing me to the field and guiding my progress
• David Wang for the course lectures and material