Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System

USENIX 2002
An-I Andy Wang • Peter Reiher • Gerald Popek
University of California, Los Angeles
Geoffrey Kuenning
Harvey Mudd College
Conquest Overview

- File systems are optimized for disks
  - Performance problem
  - Complexity
- Now we have tons of inexpensive RAM
- What can we do with that RAM?
Conquest Approach

- Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way
- Simplification
  - Over 20% fewer semicolons than ext2, reiserfs, and SGI XFS
- Performance (under popular benchmarks)
  - 24% to 1900% faster than LRU disk caching
Motivation

- Most file systems are built for disks
- Problems with the disk assumption:
  - Performance
  - Complexity
Motivation – Conquest Design – Conquest Components – Performance Evaluation – Conclusion
Hardware Evolution
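[Figure: accesses per second (log scale, 1 KHz to 1 GHz) versus year (1990 to 2000). CPU and memory improve at 50% per year; disk improves at 15% per year. The gap between them is annotated as 10^5 (1 sec : 6 days) in 1990 and 10^6 (1 sec : 3 months) in 2000.]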
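As a rough check on the growth rates in the figure, the sketch below compounds the two annual rates over a decade. The function name and the use of simple compound growth are assumptions made for illustration; only the 50% and 15% rates come from the slide.

```python
def gap_growth(years, fast_rate=0.50, slow_rate=0.15):
    """Factor by which the fast/slow speed ratio widens after `years` years."""
    return ((1 + fast_rate) / (1 + slow_rate)) ** years


if __name__ == "__main__":
    # CPU and memory at 50% per year versus disk at 15% per year, 1990 to 2000.
    print(f"gap widens by about {gap_growth(10):.1f}x over 10 years")
```

Ten years of compounding widens the gap by roughly 14x, which is the same order as the change from 1 sec : 6 days to 1 sec : 3 months (about 15x).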
Inside the Pandora’s Box


[Figure: a disk drive, labeled with the disk arm and the disk platters]
Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time
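A minimal worked example of the access-time formula follows. The drive parameters are hypothetical values chosen for illustration, not figures from the talk, and the average rotational delay is taken to be half a revolution.

```python
def access_time_ms(seek_ms, rpm, request_bytes, transfer_mb_per_s):
    """Estimate one disk access as seek + rotational delay + transfer."""
    # On average the platter must turn half a revolution before the
    # requested sector passes under the head.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Transfer time for the request, taking 1 MB as 10**6 bytes.
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000.0) * 1_000.0
    return seek_ms + rotational_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive: 8 ms average seek, 7200 RPM, 40 MB/s, 4 KB request.
    print(f"{access_time_ms(8.0, 7200, 4096, 40.0):.2f} ms")
```

With these assumed figures the two mechanical terms (seek and rotation) account for almost all of the roughly 12 ms total; the transfer itself takes about a tenth of a millisecond.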
Disk Optimization Methods

- Disk arm scheduling
- Group information on disk
- Disk readahead
- Buffered writes
- Disk caching
- Data mirroring
- Hardware parallelism
Complexity Bytes
predictive readahead
synchronization
cache replacement
elevator algorithm
data consistency
asynchronous write
data clustering
Storage Media Alternatives
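[Figure: storage media alternatives plotted by cost in $/MB (log scale) against accesses/sec (log scale): tape, disk, and persistent RAM (battery-backed DRAM and (write once) flash memory), with "Magnetic RAM?" shown as an open question.]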
[Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Micron
Semiconductor Products 2000; Quantum 2000]
Price Trend of Persistent RAM
[Figure: price in $/MB (log scale, 10^-2 to 10^2) versus year (1995 to 2005) for persistent RAM, 1” HDD, 2.5” HDD, 3.5” HDD, and paper/film. Annotations: booming of digital photography; 4 to 10 GB of persistent RAM.]
[Grochowski 2000]
Old Order; New World

- Disk staying around
  - Cost, capacity, power, heat
- RAM as a viable storage alternative
  - PDAs, digital cameras, MP3 players
- More architectural changes due to RAM
  - A big assumption change from disk
  - Rethink data structures, interface, applications
Getting a Fresh Start
What does it take to design and build a system
that assumes ample persistent RAM as the
primary storage medium?
Conquest

- Design and build a disk/persistent-RAM hybrid file system
- Deliver all file system services from memory, with the exception of high-capacity storage
- Benefits:
  - Simplicity
  - Performance
Simplicity

- Remove disk-related complexities for most files
- Make things simpler for disk as well
- Less complexity
  - Fewer bugs
  - Easier maintenance
  - Shorter data path
Performance

- Overall
  - Memory data path
    - All management performed in memory
    - No disk-related overhead
  - Disk data path
    - Faster speed due to simpler access models
Conquest Components





- Media management
- Metadata management
- Allocation service
- Persistence support
- Resiliency support
User Access Patterns

- Small files
  - Take little space (10%)
  - Represent most accesses (90%)
- Large files
  - Take most space
  - Mostly sequential accesses
  - Except database applications
[Irlam 1993; Douceur et al., 1999; Roselli et al., 2000]
Files Stored in Persistent RAM

- Small files (< 1MB)
  - No seek time or rotational delays
  - Fast byte-level accesses
  - Contiguous allocation
- Metadata
  - Fast synchronous update
  - No dual representations
- Executables and shared libraries
  - In-place execution
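The placement rule on this slide can be sketched as a single decision function. This is an illustrative reading of the slide, not the Conquest implementation: the function name, its parameters, and the choice to keep executables in RAM regardless of size are assumptions.

```python
SMALL_FILE_LIMIT = 1 << 20  # 1 MB threshold taken from the slide


def choose_medium(size_bytes, is_metadata=False, is_executable=False):
    """Return "persistent_ram" or "disk" for a piece of file system state."""
    # Metadata, executables, and shared libraries are listed as RAM-resident.
    # (Assumption: this holds regardless of their size.)
    if is_metadata or is_executable:
        return "persistent_ram"
    # Small files stay in RAM; only large-file data goes to disk.
    if size_bytes < SMALL_FILE_LIMIT:
        return "persistent_ram"
    return "disk"


if __name__ == "__main__":
    print(choose_medium(4096))                 # a small text file
    print(choose_medium(700 * (1 << 20)))      # a large media file
    print(choose_medium(0, is_metadata=True))  # an i-node or directory entry
```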
Memory Data Path of Conquest
[Figure: two data paths compared. Conventional file systems: storage requests -> IO buffer management (IO buffer) -> persistence support -> disk management -> disk. Conquest memory data path: storage requests -> persistence support -> battery-backed RAM (small file and metadata storage).]
Large-File-Only Disk Storage

- Allocate in big chunks
  - Lower access overhead
  - Reduced management overhead
- No fragmentation management
- No tricks for small files
  - Storing data in metadata
- No elaborate data structures
  - Wrapping a balanced tree onto disk cylinders
[Devlinux.com 2000]
Sequential-Access Large Files

- Sequential disk accesses
  - Near-raw bandwidth
- Well-defined readahead semantics
- Read-mostly
  - Little synchronization overhead (between memory and disk)
Disk Data Path of Conquest
[Figure: two data paths compared. Conventional file systems: storage requests -> IO buffer management (IO buffer) -> persistence support -> disk management -> disk. Conquest disk data path: storage requests -> IO buffer management (IO buffer in battery-backed RAM, alongside small file and metadata storage) -> disk management -> disk (large-file-only file system).]
Random-Access Large Files

- Random access?
  - Common definition: nonsequential access
  - A typical movie has 150 scene changes
  - MP3 stores the title at the end of the files
- Near-sequential access?
  - Simplify large-file metadata representation significantly
Logical File Representation
[Figure: logical file representation. A file consists of name(s), an i-node holding file attributes, and data.]
Physical File Representation
[Figure: physical file representation. A file consists of name(s), an i-node holding file attributes and data locations, and data blocks.]
Ext2 Data Representation
[Figure: an ext2 i-node containing a series of data block locations (labeled "10" in the figure) followed by three index block locations.]
Problems with Ext2 Design
- Designed for disk storage
- Optimization for small files makes things
complex
- Random-access data structure for large files
that are accessed mostly sequentially
- Data access time dependent on the byte
position in a file
- Maximum file size is limited
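To illustrate why access time depends on byte position and why the file size is bounded, the sketch below counts how many index blocks must be read before a data block is reached. It assumes the commonly documented ext2 layout of 12 direct pointers plus single, double, and triple indirect blocks with 4-byte block pointers; the constants and the function name are illustrative, and the figure on the preceding slide may use different counts.

```python
DIRECT_POINTERS = 12   # assumed standard ext2 i-node layout
POINTER_SIZE = 4       # bytes per block pointer (assumed)


def indirection_levels(offset, block_size=4096):
    """Number of index blocks to traverse before reaching the data block."""
    pointers_per_block = block_size // POINTER_SIZE
    block = offset // block_size
    if block < DIRECT_POINTERS:
        return 0
    block -= DIRECT_POINTERS
    for level in (1, 2, 3):
        span = pointers_per_block ** level  # blocks reachable at this level
        if block < span:
            return level
        block -= span
    raise ValueError("offset exceeds the maximum file size for this layout")


if __name__ == "__main__":
    for offset in (0, 48 * 1024, 5 * 1024 * 1024, 5 * 1024 ** 3):
        print(offset, indirection_levels(offset))
```

Bytes near the start of a file are reached directly from the i-node, while bytes further in cost one to three extra index-block lookups, and offsets beyond the triple-indirect range cannot be represented at all.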
Conquest Representation

- Persistent RAM
  - Hash(file name) = location of data
  - Offset(location of data)
- Disk storage
  - Per-file, doubly linked list of disk block segments (stored in persistent RAM)
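The persistent-RAM half of this representation can be sketched with an ordinary hash table. The class below is a toy model, not Conquest code: `RamStore` and its methods are hypothetical names, and a Python dict stands in for the file-name hash.

```python
class RamStore:
    """Toy stand-in for in-core file storage, keyed by file name."""

    def __init__(self):
        self._files = {}  # Python dict: a hash table from name to contents

    def write(self, name, data):
        # Keep the whole file as one contiguous buffer.
        self._files[name] = bytearray(data)

    def read(self, name, offset, length):
        data = self._files[name]                    # hash(file name) -> location
        return bytes(data[offset:offset + length])  # offset into that location


if __name__ == "__main__":
    store = RamStore()
    store.write("notes.txt", b"hello world")
    print(store.read("notes.txt", 6, 5))
```

A read is one hash lookup followed by a byte offset into a contiguous buffer, so its cost does not depend on where in the file the bytes live.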
Conquest Design
+ Direct data access for in-core files
+ Worst case: sequential memory search for
infrequent random accesses to on-disk files
+ Maximum file size limited by physical storage
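The worst case mentioned above, a sequential memory search over an on-disk file's segment list, can be sketched as follows. The class and method names are hypothetical, and segments are modeled as (starting disk block, block count) runs, which is an assumption about what a "disk block segment" holds.

```python
class Segment:
    """One contiguous run of disk blocks belonging to a file."""

    def __init__(self, start_block, block_count):
        self.start_block = start_block
        self.block_count = block_count
        self.prev = None
        self.next = None


class SegmentList:
    """Per-file doubly linked list of disk block segments."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, start_block, block_count):
        seg = Segment(start_block, block_count)
        seg.prev = self.tail
        if self.tail is None:
            self.head = seg
        else:
            self.tail.next = seg
        self.tail = seg

    def locate(self, logical_block):
        """Map a file-relative block number to a disk block number."""
        if logical_block < 0:
            raise IndexError("logical block must be non-negative")
        skipped = 0
        seg = self.head
        while seg is not None:  # worst case: visits every segment
            if logical_block < skipped + seg.block_count:
                return seg.start_block + (logical_block - skipped)
            skipped += seg.block_count
            seg = seg.next
        raise IndexError("logical block is past the end of the file")
```

Because the list lives in memory, even a full walk touches no disk blocks; the only disk access is the final read of the located block. The file can also grow for as long as segments can be appended, so no fixed index depth bounds its size.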
Implementation Status

- Kernel module under Linux 2.4.2
- Fully functional and POSIX compliant
- Modified memory manager to support Conquest persistence
- Preparing for office-wide deployment
Conquest Evaluation

- Architectural simplification
  - Feature count
- Performance improvement
  - Memory-only workload
  - Memory and disk workload
Conventional Data Path
[Figure: conventional file system data path: storage requests -> IO buffer management (IO buffer) -> persistence support -> disk management -> disk.]

Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
Memory Path of Conquest
[Figure: Conquest memory data path: storage requests -> persistence support -> battery-backed RAM (small file and metadata storage). Annotation: memory manager encapsulation.]

Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
Disk Path of Conquest
[Figure: Conquest disk data path: storage requests -> IO buffer management (IO buffer in battery-backed RAM, alongside small file and metadata storage) -> disk management -> disk (large-file-only file system).]

Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management
PostMark Benchmark



- ISP workload (emails, web-based transactions)
- Conquest is comparable to ramfs
- At least 24% faster than the LRU disk cache

[Figure: PostMark throughput in trans/sec (0 to 9000) versus number of files (5000 to 30000) for SGI XFS, reiserfs, ext2fs, ramfs, and Conquest; 250 MB working set with 2 GB physical RAM.]
[Katcher 1997; Sweeney et al., 1996; Card et al., 1999; Namesys 2002]
PostMark Benchmark

- When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS

[Figure: PostMark throughput in trans/sec (0 to 5000) versus percentage of large files (0.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest; 10,000 files, 3.5 GB working set with 2 GB physical RAM. The plot is divided into a "<= RAM" region and a "> RAM" region.]
PostMark Benchmark

- When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS

[Figure: PostMark throughput in trans/sec (0 to 120) versus percentage of large files (6.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest; 10,000 files, 3.5 GB working set with 2 GB physical RAM.]
Lessons Learned

- Faster than LRU caching, unexpected
  - Heavyweight disk handling
  - Severe penalty for accesses to content
- Matching user access patterns to storage media offers considerable simplification and better performance
  - Not an automatic result
  - Need careful design
Conclusion

- Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements
- Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.
Questions . . .
Conquest: http://lasr.cs.ucla.edu/conquest
Andy Wang: awang@cs.ucla.edu