FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment

Introduction
 Farsite: serverless distributed file system
Logically functions as a centralized file server
 Designed for desktop environments
 Needs some effort for initial configuration
 Needs little central administration to maintain
Farsite Characteristics
 Peer-to-peer among untrusted machines
 Need to handle privacy, integrity, durability
Cryptography
Randomized replication
Byzantine fault-tolerance
Farsite Workloads
 High access locality
 Low update rate
 Sequential accesses with rare concurrency
Administration
 Machine certificates bind machines to their public keys
 User certificates bind users to their public keys
 Namespace certificates bind namespace roots to their managing machines
Design Assumptions
 ~10^5 machines
 All interconnected by a high-bandwidth, low-latency network
 Majority of machines to be up most of the time
 Uncorrelated permanent machine failures
 Read-mostly sharing
 Few malicious users
Enabling Technology Trends
 Increase in unused disk capacity
In 2000, 58% of disk capacity unused at Microsoft
Can replicate data for reliability
 Decrease in the computational cost of cryptography
Can easily encrypt at 53 MB/sec
Disk transfers at 32 MB/sec
Can use strong cryptography for security
Namespace Roots
 Allow multiple roots for multiple machines
Trust and Certification
 Based on public-key-cryptographic certificates
Encrypt(Key_public, text_plain) -> text_cipher
Decrypt(Key_private, text_cipher) -> text_plain
Encrypt(Key_private, text_plain) -> text_cipher
Decrypt(Key_public, text_cipher) -> text_plain
Public Key Encryption Basics
 Idea
Public key is published
Private key is the secret
 Encrypt(Key_my_public, "Hi, Andy")
Anyone can create it, but only I can read it
 Encrypt(Key_my_private, "I'm Andy")
Everyone can read it, but only I can create it
Public Key Encryption Basics
 Encrypt(Key_your_public, Encrypt(Key_my_private, "I know your secret"))
Only you can read it, and only I can send it
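
The nesting above can be tried out with textbook RSA on tiny primes. This is only an illustrative sketch: the primes, the exponents, and the helper names `make_keypair` and `apply_key` are assumptions chosen for readability, and unpadded RSA with keys this small offers no real security.

```python
def make_keypair(p, q, e):
    """Build a textbook RSA key pair from two primes (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)          # (public key, private key)

def apply_key(key, values):
    """Raise each integer to the key's exponent modulo the key's modulus."""
    exponent, n = key
    return [pow(v, exponent, n) for v in values]

# Hypothetical keys. "My" modulus is smaller than "yours" so that every
# value I produce with my private key still fits under your modulus.
my_public, my_private = make_keypair(61, 53, 17)
your_public, your_private = make_keypair(89, 97, 5)

message = list(b"I know your secret")

# Encrypt(Key_your_public, Encrypt(Key_my_private, message))
signed = apply_key(my_private, message)     # only I can create this
sealed = apply_key(your_public, signed)     # only you can read this

# You peel the outer layer with your private key, then check the inner
# layer with my public key.
opened = apply_key(my_public, apply_key(your_private, sealed))
recovered = bytes(opened)
```

Production systems use padded RSA or other schemes from a vetted library rather than raw modular exponentiation.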
Basic System
 Every machine has three roles
Client
• A machine that interacts with a user

Directory group
• A set of machines that manage files via a Byzantine-fault-tolerant protocol
• Every group member owns a replica

File host
More on the Basic System
+ Reliability
+ Data integrity
- Performance


Byzantine fault-tolerant protocols tolerate fewer than 1/3 faulty replicas
Need lots of replicas
- Privacy
- Storage consumption
System Enhancements
 Local caching
A client can lease a copy of a file
 Encrypt written files with public keys of all authorized clients
Offload those files to file hosts
Store only the content hash of those files locally
Can validate damaged copies
Can tolerate n - 1 file host failures
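
A minimal sketch of why a stored content hash lets n copies survive n - 1 file host failures: any single copy that matches the hash is enough. SHA-256 and the function names here are assumptions for illustration; the slides do not say which hash function is used.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash kept by the directory group in place of the file itself."""
    return hashlib.sha256(data).hexdigest()

def fetch_valid_copy(expected_hash, host_copies):
    """Return the first copy whose hash matches, or None if none does.

    host_copies holds one entry per file host: bytes, or None if the
    host is unreachable.
    """
    for copy in host_copies:
        if copy is not None and content_hash(copy) == expected_hash:
            return copy
    return None

original = b"encrypted file contents"
stored_hash = content_hash(original)

# Three hypothetical file hosts: one offline, one corrupted, one intact.
copies = [None, b"encrypted file c0ntents", original]
good = fetch_valid_copy(stored_hash, copies)
```

With three hosts, two of them can fail or lie and the client still recovers the file, because a bad copy cannot match the stored hash.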
Traditional Byzantine Approach [CL99]
[Figure: a client reaches a set of Byzantine servers through a Byzantine fault-tolerant protocol; the servers hold both files and metadata, with 3f + 1 file copies to handle f failures]
Farsite: BFT only for meta-data
[Figure: a client reaches the directory group through a Byzantine fault-tolerant protocol; file hosts hold f + 1 file copies for f failures]
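
The storage difference between the two designs is simple arithmetic. The sketch below uses hypothetical function names and counts file copies only; directory-group metadata replicas are not included.

```python
def bft_file_copies(f: int) -> int:
    """Copies needed when file data itself sits behind the BFT protocol."""
    return 3 * f + 1

def farsite_file_copies(f: int) -> int:
    """Copies needed when hash-validated file hosts hold the data."""
    return f + 1

for f in (1, 2, 3):
    print(f"f={f}: traditional={bft_file_copies(f)}, "
          f"Farsite={farsite_file_copies(f)}")
```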
Semantic Differences from NTFS
 Hard limit on concurrent writes
 Soft limit on concurrent reads
Sometimes supplies stale snapshots
 No name-locking on an open file's path
File System Features
 Reliability
 Availability
 Security
 Durability
 Consistency
 Scalability
 Efficiency
 Manageability
Reliability and Availability
 Replication
 When a machine is unavailable for an extended period
Its functions migrate to others
 Caching
Privacy
 File content and metadata are encrypted
 Convergent encryption
Encrypt(Hash_one_way(block_plain), block_plain) -> block_cipher
[Figure: each data block is hashed, and the hash is used as the key to encrypt that block]
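
A rough sketch of convergent encryption: the key for each block is the hash of that block's plaintext, so identical blocks always produce identical ciphertext. The XOR keystream below is a toy stand-in for a real symmetric cipher, and SHA-256 and all function names are assumptions, not details taken from the slides.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. Not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def convergent_encrypt(block: bytes):
    """Encrypt(Hash(block_plain), block_plain) -> (key, block_cipher)."""
    key = hashlib.sha256(block).digest()
    return key, xor_cipher(key, block)

def convergent_decrypt(key: bytes, cipher: bytes) -> bytes:
    return xor_cipher(key, cipher)

# Two users encrypt the same block independently.
key_a, cipher_a = convergent_encrypt(b"common application block")
key_b, cipher_b = convergent_encrypt(b"common application block")
plain = convergent_decrypt(key_a, cipher_a)
```

Because `cipher_a` and `cipher_b` are equal, a storage host can spot the duplicate without ever seeing the plaintext.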
More on Convergent Encryption
 Block hashes are used to identify identical block contents
 Block-level encryption allows block-level changes without re-encrypting the entire file
More on Convergent Encryption
 Encrypt(Key_file, file_hashes_plain) -> file_hashes_cipher
[Figure: the block hashes are themselves encrypted with the file key]
More on Convergent Encryption
 Encrypt(Key_client1_public, Key_file) -> Key_file_cipher1
 Encrypt(Key_client2_public, Key_file) -> Key_file_cipher2
…
 Store both encrypted file and keys
Directories
 Also encrypted
 Use exclusive encryption
Prevents a malicious client from encrypting a syntactically illegal name
Integrity
 Use hash trees to compare files
If the roots match, the two files are identical
If not, compare the hashes at the next lower level
Repeat until the discrepancy is identified
 The cost of an in-place update is logarithmic in the file size
 Logarithmic time to verify the integrity of an individual block
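
The comparison procedure can be sketched with a small binary hash tree. This is an assumption-laden illustration: it uses SHA-256, requires a power-of-two block count, and assumes both trees have the same shape; the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def first_difference(tree_a, tree_b):
    """Walk down from the root to one differing block; None if identical."""
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return None                      # roots match: files are identical
    index = 0
    for depth in range(top - 1, -1, -1):
        left = 2 * index
        # If the left child matches, the difference must be on the right.
        if tree_a[depth][left] != tree_b[depth][left]:
            index = left
        else:
            index = left + 1
    return index

file_a = [b"block0", b"block1", b"block2", b"block3"]
file_b = [b"block0", b"block1", b"CHANGED", b"block3"]
diff = first_difference(build_tree(file_a), build_tree(file_b))
```

The walk inspects one pair of hashes per level, so it touches a logarithmic number of nodes rather than every block.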
Durability
 Updates are logged and compressed locally
 The log is pushed back to the directory group periodically and when a lease is recalled
 Each log entry is verified
Consistency
 Control can be loaned to clients
Content leases
Name leases
Mode leases
Access leases
Data Consistency
 Content leases
Read/write
Read-only
• Assures no stale data
Single-writer, multiple-reader semantics
A lease is held until it expires or is recalled
Can lease a file, a directory, or a tree
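
A toy lease table can illustrate the single-writer, multiple-reader rule. The class and method names are hypothetical, and the sketch simply refuses a conflicting request where a real directory group would recall the outstanding lease; expiry is not modeled.

```python
class ContentLeases:
    """Toy lease table: many readers or one writer per path."""

    def __init__(self):
        self.readers = {}   # path -> set of client names holding read leases
        self.writer = {}    # path -> client name holding the write lease

    def acquire(self, path, client, mode):
        readers = self.readers.setdefault(path, set())
        writer = self.writer.get(path)
        if mode == "read":
            if writer is not None and writer != client:
                return False    # another client holds the write lease
            readers.add(client)
            return True
        if mode == "write":
            if writer not in (None, client) or readers - {client}:
                return False    # other readers or another writer exist
            self.writer[path] = client
            return True
        raise ValueError(f"unknown mode: {mode}")

    def release(self, path, client):
        self.readers.get(path, set()).discard(client)
        if self.writer.get(path) == client:
            del self.writer[path]

leases = ContentLeases()
two_readers = (leases.acquire("/docs/a", "c1", "read"),
               leases.acquire("/docs/a", "c2", "read"))
writer_blocked = not leases.acquire("/docs/a", "c3", "write")
```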
Namespace Consistency
 Name leases
Can create a file name
Can create a directory and its files and subdirectories
Windows File-Sharing Semantics
 Mode leases
Read, write, delete, exclude-read, exclude-write, exclude-delete
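
One plausible reading of how the six modes interact, sketched as a compatibility check: an open conflicts if it asks for a mode someone else excludes, or excludes a mode someone else already holds. This interpretation and the `compatible` helper are assumptions for illustration; the slides list the modes but not the rule.

```python
ACCESS_MODES = ("read", "write", "delete")

def compatible(held: set, requested: set) -> bool:
    """Check a requested set of modes against modes already held on a file."""
    for mode in ACCESS_MODES:
        exclude = "exclude-" + mode
        if mode in requested and exclude in held:
            return False    # an existing open forbids this access
        if exclude in requested and mode in held:
            return False    # the new open forbids an existing access
    return True

shared_read = compatible({"read"}, {"read"})
blocked_write = compatible({"read", "exclude-write"}, {"write"})
blocked_exclude = compatible({"write"}, {"read", "exclude-write"})
```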
Windows Deletion Semantics
 To delete a file: open it, mark it for deletion, close it
 A file is not deleted until its last open handle is closed
 Access leases


Public: Lease holder has the file open
Protected
• No other client will be granted access without first contacting the lease holder

Private
• No other client has any access lease on the file
Scalability
 Hint-based pathname translation
Caching
 Delayed directory-change notification
Space Efficiency
 Reclaim space from duplicate files
Workgroup-shared documents
Multiple copies of common applications
Can save 50% of storage requirement
Based on hash comparisons
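
Duplicate detection by hash comparison can be sketched as grouping files by content digest. The file names, SHA-256, and the helper names are illustrative assumptions, and the sketch hashes plaintext directly rather than the encrypted content a real deployment would compare.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """Group file names by content hash; return groups with 2+ members."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

def bytes_reclaimable(files):
    """Bytes saved if every duplicate after the first copy is dropped."""
    seen = set()
    saved = 0
    for data in files.values():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            saved += len(data)
        else:
            seen.add(digest)
    return saved

files = {
    "alice/report.doc": b"quarterly numbers",
    "bob/report.doc": b"quarterly numbers",
    "carol/notes.txt": b"todo",
}
duplicates = find_duplicates(files)
saved = bytes_reclaimable(files)
```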
Time Efficiency
 Insert a delay between a file's creation and its replication
Many files are expected to be deleted shortly after their creation
Reduced network traffic
Local-Machine Administration
 Machine replacement
A special case of hardware failure
 Little need for backup
Performance Measurements
 Used only five machines…
 With only 1 hour of file-system trace
450,164 file operations
 2 to 4 times as long as NTFS for reads/writes/closes
 9 times as long for opens
 20 times as long for metadata accesses
 5.5 times slower I/O latencies