T s - EuroSys 2009

Memory Resource Allocation for File
System Prefetching
-- From a Supply Chain Management Perspective
Zhe Zhang (NCSU), Amit Kulkarni (NCSU)
Xiaosong Ma (NCSU/ORNL), Yuanyuan Zhou (UIUC)
3/14/2016
Aggressive prefetching: an idea
whose time has come*
 Enlarging processor-I/O gap
 Processing power doubling every 18 to 24 months
 Disparity between growth of disk latency and throughput
 Latency improving 10% per year while throughput improving
40% per year [Hennessy 03]
 Large memory cache sizes
 Usually 0.05% ~ 0.2% of storage capacity [Hsu 04]
* [Papathanasiou 05]
… and whose challenges follow
 Systems facing large number of concurrent requests
#1
Facebook
How to manage file systems’ memory
resource for aggressive prefetching?
#10
Servers handling large number of clients
Jaguar @ Oak Ridge National Lab: 11,000 compute nodes; Lustre, 72 I/O nodes, 18 DDN S2A9500 couplets
All streams are not created equal
MP3 : 128 kbps
Youtube : 200 kbps
Youtube HQ : 900 kbps
 Allocating memory resource according to access rate?
 Related work
 Access pattern detection: rate not detected [Lee 87, Li 04, Soundararajan 08]
 Aggressiveness control: based on sequentiality [Patterson 95, Kaplan 02, Li 05]
 Multi-stream prefetching: rate not sufficiently utilized [Cao 96, Tomkins 97, Gill 07]
Similar story in grocery stores!
Milk : 200 per day
Beer : 80 per day
$300 Wine : 1 per year
 Allocating storage resource according to consumption rate?
 Studied in Supply Chain Management (SCM)
 Demand rate measurement/analysis/prediction
 Dates back to the first wars
 Still active
 Wal-Mart: $24M on satellite network for instant inventory control
 Dell: aiming at “zero inventory”
Our contributions
 A mapping between data prefetching and SCM problems
 Novel rate-aware multi-stream prefetching techniques
based on SCM heuristics
 Implementation and performance evaluation
 Modified Linux 2.6.18 kernel
 Extensive experiments with modern server and multiple
workloads
 Coordinated multi-level prefetching
 Based on multi-echelon inventory control
 Extending application access pattern to lower level
 Evaluation with combinations of state-of-the-art single level
algorithms
Outline
 Motivation
 Background and problem mapping
 Algorithms
 Performance evaluation
 Conclusions
Background – Inventory cycles
 Inventory theory
 Task: manage inventory for goods
 Goal: satisfy customer demands
[Figure: inventory level over time, labeled with order quantity, cycle inventory, safety inventory, reorder point, lead time, and average, fast, and slow demand]
Background – Prefetching basics
[Figure: memory cache and disk, labeled with trigger distance and prefetch degree]
Background – Prefetching cycles
 Prefetching techniques:
 Task: manage the cache for data blocks
 Goal: satisfy application requests
[Figure: the inventory cycle diagram relabeled for prefetching: inventory level ↔ prefetched blocks, order quantity ↔ prefetch degree, reorder point ↔ trigger distance (Tc, Ts), lead time ↔ disk access time; cycle inventory, safety inventory, and average, fast, and slow demand as before]
Challenges in mapping
 Data requests vs. customer demands
 Data blocks are unique
 “Linear sequence of blocks” in detected streams
GroceryStore::getMilk();
FileSystem::getNextBlock();
FileSystem::getBlock(Position p);
 Prefetched data blocks vs. inventory
 Accessed data blocks remain in the cache
 But as “second class citizens” [Gill 05, Li 05]
Performance metrics and objectives
 Prefetching optimization objective: improve cache hit rate
 Dynamically adjust
 Trigger distance
 Prefetch degree
 SCM optimization objective: improve fill rate
 Fraction of demand satisfied from inventory
$$FR = 1 - \frac{ESC}{Q}$$
ESC: Expected Shortage per Cycle
Q: order quantity
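The fill rate formula above can be sketched in a few lines. This is only an illustration: the function name and the sample numbers are hypothetical, and reading ESC as cache misses per prefetch cycle and Q as the prefetch degree follows the problem mapping in this talk.

```python
def fill_rate(esc: float, q: float) -> float:
    """Fill rate FR = 1 - ESC / Q.

    esc: expected shortage per cycle
         (for prefetching: cache misses per prefetch cycle)
    q:   order quantity (for prefetching: prefetch degree)
    """
    if q <= 0:
        raise ValueError("order quantity must be positive")
    return 1.0 - esc / q


# Hypothetical numbers, not measurements from the talk:
# 2 misses per cycle with a prefetch degree of 32 pages.
print(fill_rate(2, 32))
```

A higher fill rate corresponds to a higher cache hit rate for the stream, which is why the two optimization objectives line up.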
Rate aware prefetching algorithms
[Figure: prefetched blocks over time, labeled with prefetch degree, cycle inventory, safety inventory, reorder point, trigger distance (Tc and Ts), and average, slow, and fast demand]
 Task: calculating Tc and Ts
 Tc: lead time × average consumption rate
 Ts: based on estimation of uncertainty
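As a rough sketch of the Tc rule above (lead time × average consumption rate), the snippet below computes the cycle part of the trigger distance in whole pages. The function name, the microsecond unit for lead time, the choice to round up, and the sample values are all assumptions made for illustration.

```python
def cycle_trigger_distance(lead_time_us: int, rate_pages_per_s: int) -> int:
    """Tc = lead time x average consumption rate, in whole pages.

    lead_time_us:     disk access (lead) time in microseconds (assumed unit)
    rate_pages_per_s: average access rate of the stream in pages/second
    """
    scaled = lead_time_us * rate_pages_per_s   # pages x 1,000,000
    return -(-scaled // 1_000_000)             # ceiling division (assumed rounding)


# Hypothetical 5 ms lead time for a slow and a fast stream.
for rate in (1000, 7000):
    print(rate, cycle_trigger_distance(5000, rate))
```

Ts is then added on top of this value; how Ts is chosen is what distinguishes the two algorithms that follow.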
Algorithm1: Equal Time Supplies (ETS)
 Safety inventory for all goods set to the same time supply
(e.g., amount of goods consumed in 5 days)
 With “standard” distribution shapes, uncertainty is
proportional to the mean value
 Ts: set to be proportional to average data access rate
$$T_i = T_i^C + T_i^S = \frac{R_i}{\sum_{1 \le j \le n} R_j} \times T_{total}$$
T_i: trigger distance of stream i
R_i: average rate of stream i
T_total: total allowed trigger distance
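A minimal sketch of the ETS allocation, assuming the total allowed trigger distance is split in proportion to each stream's average rate. The function name and the sample budget of 48 pages are hypothetical; integer division is used only to keep results in whole pages.

```python
def ets_trigger_distances(rates, t_total):
    """Split t_total (pages) across streams in proportion to their rates.

    rates:   average access rate of each stream (pages/second)
    t_total: total allowed trigger distance (pages)
    """
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("at least one stream must have a positive rate")
    # T_i = R_i / sum(R) * T_total, rounded down to whole pages.
    return [t_total * r // total_rate for r in rates]


# Hypothetical budget of 48 pages shared by a slow and a fast stream.
print(ets_trigger_distances([1000, 3000], 48))
print(ets_trigger_distances([1000, 7000], 48))
```

Because of the rounding down, the per-stream values may sum to slightly less than the budget; a real implementation would need a policy for the leftover pages.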
Algorithm2: Equal Safety Factors (ESF)
 Safety inventory set to maintain the same safety factor
across all goods
$$safety\_factor = \frac{safety\_inventory}{standard\_deviation}$$
 Ts: set to be proportional to standard deviation of access
rate
$$T_i = T_i^C + T_i^S = (R_i \times lead\_time) + \frac{\sigma_i}{\sum_{1 \le j \le n} \sigma_j} \times \left(T_{total} - \sum_{1 \le j \le n} T_j^C\right)$$
 Implementation challenges
 Measurement and calculation overhead
 Limited floating point calculation in kernel
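The ESF allocation can be sketched with integer arithmetic only, in the spirit of the floating point limitation noted above. This is an illustration under stated assumptions, not the kernel implementation: the function names, the microsecond unit for lead time, the rounding choices, and the sample values are all hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def esf_trigger_distances(rates, sigmas, lead_time_us, t_total):
    """Per-stream trigger distances T_i = T_i^C + T_i^S under ESF.

    rates:        average access rate of each stream (pages/second)
    sigmas:       standard deviation of each stream's access rate
    lead_time_us: disk access (lead) time in microseconds (assumed unit)
    t_total:      total allowed trigger distance (pages)
    """
    # T_i^C = R_i x lead_time, rounded up to whole pages (assumed rounding).
    cycle = [ceil_div(r * lead_time_us, 1_000_000) for r in rates]
    spare = t_total - sum(cycle)
    if spare < 0:
        raise ValueError("t_total cannot cover the cycle inventory")
    total_sigma = sum(sigmas)
    if total_sigma == 0:
        safety = [0] * len(rates)   # no uncertainty: no safety inventory
    else:
        # T_i^S = sigma_i / sum(sigma) x (T_total - sum(T^C)), rounded down.
        safety = [spare * s // total_sigma for s in sigmas]
    return [c + s for c, s in zip(cycle, safety)]


# Hypothetical: two streams at 1000 pages/second, one stable and one unstable.
print(esf_trigger_distances([1000, 1000], [32, 3000], 5000, 48))
```

With equal rates, nearly all of the spare trigger distance goes to the unstable stream, which matches the intent of equalizing safety factors.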
Comparing with Linux native prefetching
 Linux prefetching algorithm (kernel 2.6.18)
 Trigger distance (T) = Prefetch degree (P)
 Doubling T and P for each sequential hit
 Upper bounds:
 T = P = 32 (pages), labeled 32-32
 Implementation of SCM-based algorithms
 Principle: maintaining same memory consumption as original
algorithm
$$memory\_consumption = \frac{T + (T + P)}{2} = T + \frac{P}{2}$$
 Default parameters
 Tdefault = 24, Pdefault = 48, labeled 24-48
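As a quick check of the equal-memory principle, the sketch below evaluates the memory consumption formula, the average of T and T + P, for both configurations. The function name is hypothetical.

```python
def memory_consumption(t: int, p: int) -> float:
    """Average of T and T + P, i.e. T + P / 2 (pages)."""
    return (t + (t + p)) / 2


# Linux native upper bounds (32-32) vs. the SCM default parameters (24-48).
print(memory_consumption(32, 32))
print(memory_consumption(24, 48))
```

Both settings come out to 48 pages, so 24-48 trades a smaller trigger distance for a larger prefetch degree at the same memory cost.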
Experimental setup
 Platform
 Linux server
 2.33GHz quad-core CPU, 16GB memory
 Comparing 32-32, 24-48, ETS and ESF algorithms
 Workloads
 Synthetic benchmarks
 Linux file transfer applications
 HTTP web server workload
 Server benchmarks
 SPC2-VOD-like (sequential)
 TPC-H (random)
Two streams with different rates
 Rate of stream 1 fixed at 1000 pages/second
 Rate of stream 2 varying between 3000 and 7000 pages/second
[Figure: average response time (μs) vs. rate of fast stream (3000, 5000, 7000 pages/second), for 32-32, 24-48, and ETS]
ETS: 19%~25% improvement over 32-32
[Figure: number of cache misses per prefetch cycle (ESC) vs. rate of fast stream (3000, 5000, 7000 pages/second), for 32-32, 24-48, and ETS]
ETS: same number of cycles as 24-48 and similar ESC to 32-32
Two streams with different deviations
 SD of stream 1 fixed at square root of rate
 SD of stream 2 varying between 3 and 7 times the average rate
[Figure: average response time (μs) vs. SD of unstable stream (3 x mean, 5 x mean, 7 x mean), for ETS and ESF]
ESF: 20%~35% improvement over ETS
[Figure: response time (μs) of individual streams vs. SD of unstable stream (3 x mean, 5 x mean, 7 x mean): stable stream w/ ETS, stable stream w/ ESF, unstable stream w/ ETS, unstable stream w/ ESF]
ESF: large improvement for unstable stream, small degradation for stable stream
Throughput of server benchmarks
 SPC2-VOD-like (sequential streams)
 TPC-H (random accesses)
[Figure: aggregate throughput (pages/second) and memory usage (avg. amount of unaccessed data, pages) vs. number of sequential streams (8 to 32), for 32-32 and ETS]
ETS: 6%~53% improvement over 32-32
[Figure: random application (TPC-H) throughput (pages/second) vs. number of sequential streams (8 to 32), for 32-32 and ETS]
ETS: never worse than 32-32; 2.5% average improvement
Conclusions and future work
 Observations
 File blocks can be managed as apples!
 Simple approaches such as ETS seem to perform well
 Future work
 Awareness of both access rate and delivery time
 Adjusting the prefetch degree
 Acknowledgements
 Anonymous reviewers
 Our shepherd: George Candea
 Our sponsors: NSF and DOE Office of Science