Poster

advertisement
Gather-Scatter DRAM: Improving the Spatial Locality of Strided Access Patterns
Vivek Seshadri, Thomas Mullins, Amirali Boroumand, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd. C. Mowry
Field 1
InMemory
Databas
e Table
Physical Data Layout
Field 3
Record 1
Record 2
Record 3
Strided Access Pattern
1. High Latency
2. High Bandwidth Consump
3. High Energy Consumption
Record n
DRAM Module
Cache Line
Cache Line
Observation: Each row buffer has many u
Idea: Gather a cache line of useful value
Fixed mapping
Column IDbased Data
Shuffling
Goal:
cmd
addr
Stage 1
Minimize
Stage 2
chip
conflicts Stage 3
for
th
(stage ‘n’ is enabled only if n least significant bit of column ID is set)
common
Per-chip Column
patterns
Translation Logic
Output Column
if cmd Address
== READ or cmd
Gather/scatter many access patterns
n
(e.g., any 2 stride) with near-ideal
efficiency
and
latency!
Minimal support from 1) on-chip
==
WRITE:
morph = chip-ID AND
pattern
output = addr XOR morph
else:
output = addr
caches,
2) instruction set architecture, and
3)
software
Analytics
Hybrid Transactions/Analytics
verage across many workloads) (average of two workloads)
arying number of R/W/RW fields) (sum of 1 or 2 columns)
5
0
20
10
0
2.0
100
1.5
1.0
0.5
0.0
80
60
40
20
10
30
Throughput
10
30
120
Energy (mJ)
15
40
2.5
(mSec)
20
50
Execution Time
25
(mJ for 10000 trans.)
60
Energy
(millions/second)
Throughput
30
(transactions = 1R+1W per record)
(analytics = sum of 1 column)
0
Row Store Column Store GS-DRAM
25
20
15
10
5
0
w/o
Pref.
Pref.
21
8
(mSec)
Transactions
Execution Time
cmd
addr
pattern
data
(millions/second)
data
Cache Line
6
4
2
0
w/o Pref.Pref.
Download