
NC STATE UNIVERSITY
Retention-Aware Placement in DRAM
(RAPID): Software Methods for
Quasi-Non-Volatile DRAM
Ravi K. Venkatesan
Stephen Herr, Eric Rotenberg
Center for Embedded Systems Research (CESR)
Department of Electrical & Computer Engineering
North Carolina State University
DRAM in Next-Generation Mobile Devices
➢ Feature-rich next-generation mobile devices
➢ Increase in memory requirements
➢ DRAM displacing SRAM
– Samsung SGH i700 Triband Windows Smartphone: 32 MB DRAM
– Motorola i930 cell phone: 32 MB DRAM
➢ DRAM continuously drains battery even in standby
– Periodic refresh needed to preserve stored information
➢ Need to reduce DRAM refresh power
Sources:
DRAM Makers Prep Multichip Cell Phone Memories. ElectroSpec, 1/3/2003.
Innovative Mobile Products Require Early Access to Optimized DRAM Solutions. A Position Paper by Samsung Semiconductor, Inc., 2003.
Ravi Venkatesan © 2006
HPCA-12 02/14/2006
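DRAM – Cells
[Figure: DRAM cell array schematic with word line, bit line, DRAM capacitor, sense amp, and an input from a dummy cell. Two example cell retention times are labeled: 500 ms and 50 s.]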
Retention Time Measurement
ISSI DRAM Chip
(IS42S16800A)
128 Mb
Retention Time Characterization Algorithm
Flowchart (per page i):
– Disable refresh
– Write all 1s to page i
– wait(T)
– Read page i
– Is pattern all 1s?
• YES: T = T + ΔT, then repeat the test
• NO: Retention Time(page i) = T - ΔT
➢ 10 trials are performed for
– Writing all 1s
– Writing all 0s
➢ Retention time for each page = minimum of the 20 trials
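The characterization loop can be sketched in software as follows. This is a minimal simulation, not the measurement code used in the talk: `measure_retention`, `characterize_page`, the step size, and the jitter model are all hypothetical, and the real write/wait/read on hardware is replaced by a comparison against an assumed decay time.

```python
import random

def measure_retention(decays_after_ms, start_ms=100, step_ms=100, max_ms=60_000):
    """One trial: grow the wait T by step_ms until the pattern no longer reads back."""
    t = start_ms
    while t <= max_ms:
        # Hypothetical stand-in for: disable refresh, write pattern, wait(T), read page.
        pattern_intact = t < decays_after_ms
        if not pattern_intact:
            return t - step_ms      # Retention Time(page) = T - dT
        t += step_ms                # T = T + dT
    return max_ms

def characterize_page(true_retention_ms, trials_per_pattern=10, jitter_ms=200, seed=0):
    """Run 10 trials per pattern (all 1s, all 0s) and keep the minimum of the 20."""
    rng = random.Random(seed)
    results = []
    for _pattern in ("all_1s", "all_0s"):
        for _ in range(trials_per_pattern):
            # Assumed trial-to-trial variation; not taken from the slides.
            observed = true_retention_ms + rng.uniform(0, jitter_ms)
            results.append(measure_retention(observed))
    return min(results)

print(characterize_page(1550))
```

Taking the minimum over all trials keeps the estimate conservative: a page is only credited with a retention time it met in every trial.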
DRAM – Retention Time Distribution
[Figure: histogram of number of pages (0 to 800) versus retention time (0 to 50000 ms). Worst page retention time: 500 ms.]
Cumulative Retention Time Distribution
[Figure: % usable DRAM (0 to 100) versus refresh period (0 to 50000 ms).]
➢ 99% DRAM usable with 3 s refresh period
– 6X longer than worst page (500 ms at 25°C)
– 46X longer than default (64 ms at 70°C)
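The cumulative curve answers one question: for a given refresh period, what fraction of pages retain data at least that long? A minimal sketch follows. The function name and the sample retention times are hypothetical (they are not the measured data); the ratios at the end use the 3 s, 500 ms, and 64 ms figures quoted on the slide.

```python
def usable_fraction(retention_ms, refresh_period_ms):
    """Fraction of pages whose retention time is at least the refresh period."""
    usable = sum(1 for r in retention_ms if r >= refresh_period_ms)
    return usable / len(retention_ms)

# Hypothetical per-page retention times in ms, for illustration only.
sample_pages = [500, 2000, 3000, 10000, 20000, 30000, 40000, 50000]
print(usable_fraction(sample_pages, 3000))   # 6 of 8 pages -> 0.75

# Ratios behind the "6X" and "46X" annotations.
print(3000 / 500)   # 6.0
print(3000 / 64)    # 46.875
```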
DRAM – Retention Time Distribution 45°C
[Figure, two panels: number of pages (0 to 500) versus retention time (0 to 20000 ms), and % usable DRAM (0 to 100) versus refresh period (0 to 20000 ms). Worst page retention time: 320 ms.]
DRAM – Retention Time Distribution 70°C
[Figure, two panels: number of pages (0 to 1000) versus retention time (0 to 6000 ms), and % usable DRAM (0 to 100) versus refresh period (0 to 6000 ms). Worst page retention time: 150 ms.]
DRAM Refresh Approaches
Three classes of hardware techniques:
➢ Conventional
– Refresh period based on worst cell
– Worst-case temperature
➢ Temperature-Compensated Refresh (TCR)
– Refresh period based on worst cell
– Dynamically adjusts for temperature
➢ Better than worst-case design
– Multiple refresh periods (refresh period per block of cells)
– Can be coupled with TCR
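The difference between the conventional and TCR policies can be illustrated with the worst-page retention times reported in this talk (500 ms at 25°C, 320 ms at 45°C, 150 ms at 70°C). This is only a sketch: the function names are hypothetical, and using these three measured points as temperature bands is an assumption, not how any particular DRAM implements TCR.

```python
# Worst-page retention times (ms) at the three characterized temperatures (°C).
WORST_PAGE_RETENTION_MS = {25: 500, 45: 320, 70: 150}

def conventional_period_ms():
    """Conventional: one period, sized for the worst cell at the worst-case temperature."""
    return min(WORST_PAGE_RETENTION_MS.values())

def tcr_period_ms(temp_c):
    """TCR: still sized for the worst cell, but chosen for the current temperature band."""
    for limit in sorted(WORST_PAGE_RETENTION_MS):
        if temp_c <= limit:
            return WORST_PAGE_RETENTION_MS[limit]
    raise ValueError("temperature above characterized range")

print(conventional_period_ms())   # 150
print(tcr_period_ms(20))          # 500
print(tcr_period_ms(30))          # 320
```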
RAPID (Retention-Aware Placement in DRAM)
➢ Novel software-only approach
KEY IDEA:
➢ Allocate longer-retention pages before allocating shorter-retention pages
➢ Single refresh period selected based on shortest-retention page among populated pages, rather than shortest-retention page overall
NOTE:
➢ All pages are still refreshed
➢ Extended refresh period safe only for populated pages
➢ OK for overall correctness
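The key idea can be sketched as a small allocator that always hands out the free page with the longest retention time and derives the single refresh period from the populated pages only. The class name, method names, and per-page retention values below are hypothetical, and the sketch does no reconsolidation of data after frees.

```python
import heapq

class RetentionAwareAllocator:
    def __init__(self, retention_s_by_page):
        self.retention = dict(retention_s_by_page)
        # Max-heap of free pages, keyed on retention time (negated for heapq).
        self.free = [(-r, page) for page, r in self.retention.items()]
        heapq.heapify(self.free)
        self.populated = set()

    def allocate(self):
        """Return the free page with the longest retention time."""
        _neg_r, page = heapq.heappop(self.free)
        self.populated.add(page)
        return page

    def release(self, page):
        self.populated.remove(page)
        heapq.heappush(self.free, (-self.retention[page], page))

    def refresh_period(self):
        """Shortest retention among populated pages (all pages if none are populated)."""
        pages = self.populated or self.retention.keys()
        return min(self.retention[p] for p in pages)

# Hypothetical retention times in seconds for six pages.
alloc = RetentionAwareAllocator({0: 0.5, 1: 3, 2: 37, 3: 50, 4: 41, 5: 22})
a = alloc.allocate()
print(a, alloc.refresh_period())   # page 3, period 50
b = alloc.allocate()
print(b, alloc.refresh_period())   # page 4, period 41
```

Because unpopulated pages hold no live data, refreshing them too slowly loses nothing, which is why the period only needs to be safe for populated pages.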
RAPID
[Figure: worked example comparing page placement and the resulting refresh period under Conventional, RAPID-1, RAPID-2, and RAPID-3. Data objects: A, B, C, D. Program sequence: Allocate A, Allocate B, Allocate C, Allocate D, Free B, Free D. Time labels appearing in the figure: 0.5 s, 1 s, 3 s, 22 s, 37 s, 41 s, 50 s.]
RAPID – Recap
➢ Novel software-only approach
– Can exploit off-the-shelf DRAMs to reduce refresh power
➢ RAPID-1
– Eliminate outlier pages
➢ RAPID-2
– RAPID-1 and
– Allocate longer-retention pages before allocating shorter-retention pages
➢ RAPID-3
– RAPID-2 and
– Continuously reconsolidate data to longest-retention pages possible
Coupling RAPID with Off-the-shelf DRAMs
➢ RAPID requires a single refresh period that can be adjusted
➢ Refresh options in off-the-shelf DRAMs:
SELF-REFRESH
– DRAM issues refresh commands internally at a refresh period that is fixed or based on a particular temperature range
– Typically not programmable
– Low power
– Used in standby operation
AUTO-REFRESH
– Regularly timed refresh commands issued by memory controller
– Typically programmable
– High power
– Used in active mode
Coupling RAPID with Off-the-shelf DRAMs (cont.)
Neither refresh option is fully satisfactory
– Self-refresh not programmable
– Auto-refresh programmable, but power-inefficient
Key observation
– DRAMs allow enabling/disabling self-refresh via
configuration register
Coupling RAPID with Off-the-shelf DRAMs (cont.)
Solution
– Set up a periodic timer interrupt (INT1)
• Period = RAPID refresh period
– Set up another interrupt (INT2)
• Triggers 64 ms (self-refresh period) after INT1
– INT1 triggered:
• Enable self-refresh
– INT2 triggered:
• Disable self-refresh
[Figure: timeline in which INT1 fires once per RAPID refresh period (10 s in the example) and INT2 fires 64 ms after each INT1.]
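The two-interrupt scheme amounts to a simple schedule: self-refresh is on for 64 ms out of every RAPID refresh period. The sketch below only models the timing; the function names are hypothetical, and the actual enabling and disabling of self-refresh through the DRAM configuration register is not shown.

```python
def self_refresh_schedule(rapid_period_ms, burst_ms=64, horizon_ms=30_000):
    """List (time_ms, action) events for INT1/INT2 up to horizon_ms."""
    events = []
    t1 = 0
    while t1 < horizon_ms:
        events.append((t1, "INT1: enable self-refresh"))
        events.append((t1 + burst_ms, "INT2: disable self-refresh"))
        t1 += rapid_period_ms
    return events

def duty_cycle(rapid_period_ms, burst_ms=64):
    """Fraction of time the DRAM spends in self-refresh."""
    return burst_ms / rapid_period_ms

for when, action in self_refresh_schedule(10_000):
    print(when, action)
print(duty_cycle(10_000))   # 64 ms out of every 10 s
```

With the 10 s period from the timeline, self-refresh is active for well under 1% of the time.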
RAPID – Implementation
Modifications to routines which allocate and deallocate physical pages in memory
INACTIVE LIST (free-list of inactive physical pages)
[Figure: the inactive list from HEAD to TAIL, shown for Conventional and for RAPID-1.]
RAPID-1: 1% of outlier pages are excluded from Inactive List
RAPID – Implementation (cont.)
INACTIVE LIST (free-list of inactive physical pages)
[Figure: the single RAPID-1 inactive list (H ... T) next to the RAPID-2 and RAPID-3 organization, which has one inactive list per retention-time range: 50 – 45.3 seconds, 45.3 – 40.6, 40.6 – 35.9, ..., 7.7 – 3. Refresh period labels in the figure: 45.3 s, 40.6 s, 35.9 s.]
RAPID-2 and RAPID-3: Single Inactive List transformed into Multiple Inactive Lists
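One way to picture the multiple inactive lists is as equal-width retention bins between 3 s and 50 s. The ranges shown on the slide are each 4.7 s wide, which suggests ten lists, but that count is inferred rather than stated. The names below are hypothetical, and returning the lower bound of a list as a safe refresh period for its pages is an assumption of this sketch.

```python
from collections import deque

LOW_MS, HIGH_MS, NUM_LISTS = 3_000, 50_000, 10
WIDTH_MS = (HIGH_MS - LOW_MS) // NUM_LISTS   # 4700 ms per list

def list_index(retention_ms):
    """Index of the inactive list covering this retention time (9 = longest)."""
    idx = (retention_ms - LOW_MS) // WIDTH_MS
    return max(0, min(idx, NUM_LISTS - 1))

def build_inactive_lists(retention_ms_by_page):
    lists = [deque() for _ in range(NUM_LISTS)]
    for page, r in retention_ms_by_page.items():
        if r >= LOW_MS:   # pages below 3 s are treated as excluded outliers (assumption)
            lists[list_index(r)].append(page)
    return lists

def allocate(lists):
    """Take a page from the longest-retention non-empty list.

    Returns (page, lower bound of that list in ms).
    """
    for idx in range(NUM_LISTS - 1, -1, -1):
        if lists[idx]:
            return lists[idx].popleft(), LOW_MS + idx * WIDTH_MS
    raise MemoryError("no inactive pages")

# Hypothetical pages and retention times in ms.
lists = build_inactive_lists({"p0": 500, "p1": 3200, "p2": 41000, "p3": 49000})
print(allocate(lists))   # ('p3', 45300)
print(allocate(lists))   # ('p2', 40600)
```

Binning keeps allocation cheap: the allocator scans a handful of list heads instead of keeping every free page sorted by retention time.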
Related Work
➢ Characterization
[Hamamoto et al. 1998, IEEE Trans. Electron Devices] On Retention Time Distribution of DRAM
➢ Dual Period Refresh
[Yanagisawa 1988, US Patent] Semiconductor Memory
➢ Multiperiod Refresh
[Kim and Papaefthymiou 2001, 2003, IEEE Trans. VLSI Systems] Block-Based Multiperiod Refresh
➢ Multiperiod Refresh + Selective
[Ohsawa et al. 1998, ISLPED] Optimizing DRAM Refresh Count for Merged DRAM/Logic LSIs
➢ Placement + Single Refresh Period + Selective
[Kai et al. 2002, US Patent] Semiconductor Circuit and Method of Controlling the Same
[Le et al. 2005, US Patent] Bank Address Mapping According to Bank Retention Time in DRAM
[Kawasaki et al. 2005, US Patent] DRAM and Refresh Method Thereof
Evaluation Methodology
➢ Active Mode (5%), Idle Mode (95%)
➢ Simulated timeline = 24 hours
➢ Divide 24-hour timeline into time slices
➢ Probability [time slice = active period] = 5%
➢ Probability [time slice = idle period] = 95%
➢ Inject random number of allocations/frees in active period
– Vary DRAM utilization across timeline
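The methodology can be sketched as a small Monte Carlo loop. The slides give only the 24-hour timeline and the 5%/95% split, so everything else here is an assumption for illustration: one-minute slices, the page count, the cap on operations per active slice, and the even split between allocations and frees.

```python
import random

def simulate_timeline(num_slices=1440, total_pages=1000, p_active=0.05,
                      max_ops=50, seed=1):
    """Return (number of active slices, per-slice DRAM utilization)."""
    rng = random.Random(seed)
    occupied = 0
    active_slices = 0
    utilization = []
    for _ in range(num_slices):
        if rng.random() < p_active:            # this slice is an active period
            active_slices += 1
            for _ in range(rng.randint(0, max_ops)):
                want_alloc = rng.random() < 0.5
                if want_alloc and occupied < total_pages:
                    occupied += 1              # allocate one page
                elif not want_alloc and occupied > 0:
                    occupied -= 1              # free one page
        # Idle slices leave occupancy unchanged.
        utilization.append(occupied / total_pages)
    return active_slices, utilization

active, util = simulate_timeline()
print(active, max(util))
```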
Custom Hardware Techniques
➢ HW-Multiperiod (HW-M)
– Each DRAM page is refreshed at a tailored refresh period that is a multiple of the shortest refresh period
➢ HW-Multiperiod-Occupied (HW-M-O)
– Same as HW-Multiperiod
– Refresh only currently occupied pages
➢ HW-Ideal (HW-I)
– Each DRAM page is refreshed at its own tailored refresh period
➢ HW-Ideal-Occupied (HW-I-O)
– Same as HW-Ideal
– Refresh only currently occupied pages
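The four schemes differ only in which period each page gets and whether unoccupied pages are refreshed at all. The sketch below counts refresh operations per hour as a rough proxy for refresh energy. The function names, the sample pages, and the choice to round the HW-M period down to a multiple of the base period are assumptions, not details from the slides.

```python
def refreshes_per_hour(period_ms):
    return 3_600_000 / period_ms

def hw_multiperiod_period(retention_ms, base_ms):
    """Largest multiple of base_ms that does not exceed the page's retention time."""
    return max(base_ms, (retention_ms // base_ms) * base_ms)

def total_refreshes(retention_ms_by_page, occupied, scheme, base_ms=500):
    """Refresh operations per hour for HW-M, HW-M-O, HW-I, or HW-I-O."""
    total = 0.0
    for page, r in retention_ms_by_page.items():
        if scheme.endswith("-O") and page not in occupied:
            continue                      # occupied-only schemes skip free pages
        if scheme.startswith("HW-I"):
            period = r                    # ideal: the page's own retention time
        else:
            period = hw_multiperiod_period(r, base_ms)
        total += refreshes_per_hour(period)
    return total

# Hypothetical pages (retention in ms); only page 2 is occupied.
pages = {0: 500, 1: 1200, 2: 3000}
for scheme in ("HW-M", "HW-M-O", "HW-I", "HW-I-O"):
    print(scheme, total_refreshes(pages, {2}, scheme))
```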
RAPID – Refresh Energy Savings
[Figure: three bar charts (25°C, 45°C, 70°C) of normalized energy w.r.t. TCR (0 to 1) for TCR, R-1, R-2, R-3, HW-M, HW-I, HW-M-O, HW-I-O. Savings annotated in the figure: 83%, 93%, 95%.]
➢ 83-95% energy savings at 25°C
➢ Similar energy savings across all temperatures
➢ Nearly as effective as custom hardware techniques
RAPID vs. TCR
[Figure: three bar charts (25°C, 45°C, 70°C) of energy consumption (mW hours, axis 0 to 1.2) for TCR, R-1, R-2, R-3. Off-scale values labeled in the figure: 1.62 and 3.46.]
➢ Worst-case, non-temperature-adjusted R-1 yields same or better energy than TCR
➢ R-1 is a simpler, cost-effective alternative to TCR
➢ R-2 and R-3 yield much higher energy savings than TCR
Refresh Energy – Different DRAM Utilizations
[Figure: three bar charts (75%, 50%, and 25% average utilization) of energy consumption (mW hours, 0 to 0.2) for R-1, R-2, R-3, HW-M, HW-I, HW-M-O, HW-I-O.]
➢ R-1, HW-M and HW-I constant
➢ R-2 and R-3 yield more energy savings than HW-M and HW-I as utilization decreases
➢ R-2 and R-3 approach energy of HW-I-O
➢ HW-I-O not retention-aware
Summary – RAPID
➢ RAPID reduces refresh power to vanishingly small levels
– Quasi-non-volatile DRAM
➢ Software-only technique
– No custom DRAM support required
– Can exploit off-the-shelf DRAM
➢ Approaches energy levels of idealized techniques, which require hardware support