Google File System Simulator

Pratima Kolan
Vinod Ramachandran
Google File System
• Master manages metadata
• Data transfer happens directly between client and chunk server
• Files are broken into 64 MB chunks
• Chunks are replicated across three machines for safety
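As a concrete illustration of the chunking scheme, here is a minimal sketch in Python (the helper names are ours, not part of GFS):

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB GFS chunk size
NUM_REPLICAS = 3                # default replication for safety

def chunk_index(byte_offset):
    # Which chunk of a file holds the given byte offset
    return byte_offset // CHUNK_SIZE

def chunk_count(file_size):
    # How many chunks a file of this size occupies (ceiling division)
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

# Example: a 200 MB file occupies 4 chunks; byte 100,000,000 lies in chunk 1.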
Event-Based Simulation
[Diagram: Components 1-3 place events in the simulator's priority queue; the simulator repeatedly gets the next high-priority event from the queue, simulates it, and delivers the output of the simulated event back to a component.]
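A minimal sketch of this loop in Python, assuming each event is a (time, action) pair and earlier times mean higher priority (the class and method names are ours):

import heapq
import itertools

class Simulator:
    def __init__(self):
        self.queue = []               # priority queue of pending events
        self.seq = itertools.count()  # tie-breaker for events with equal time

    def schedule(self, time, action):
        # A component places an event in the priority queue
        heapq.heappush(self.queue, (time, next(self.seq), action))

    def run(self):
        # Repeatedly get the next highest-priority (earliest) event and
        # simulate it; the action delivers its output to the target component
        while self.queue:
            time, _, action = heapq.heappop(self.queue)
            action(time)

An action can itself call schedule() to enqueue follow-up events, which is how request/response chains are simulated.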
Simplified GFS Architecture
[Diagram: the client connects through a switch with infinite bandwidth to the master server and to Network Disks 1-5; the switches represent network queues.]
Data Flow
1. The client queries the master server for the chunk ID it wants to read.
2. The master server returns the set of disk IDs that contain the chunk.
3. The client requests the chunk from one of those disks.
4. The disk transfers the data to the client.
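A sketch of this read path (the Master and Disk classes and their method names here are hypothetical, for illustration only):

class Master:
    def __init__(self, locations):
        self.locations = locations      # chunk_id -> list of disk IDs holding a replica

    def lookup(self, chunk_id):
        # Metadata only: return the disk IDs that contain the chunk
        return self.locations[chunk_id]

class Disk:
    def transfer(self, chunk_id):
        # Data transfer happens directly between disk and client,
        # bypassing the master
        return b"..."                   # chunk bytes

def client_read(master, disks, chunk_id):
    disk_ids = master.lookup(chunk_id)  # steps 1-2
    disk = disks[disk_ids[0]]           # step 3: pick a replica (policy varies)
    return disk.transfer(chunk_id)      # step 4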
Experiment Setup
• We have a client whose bandwidth can be varied from 0 to 1000 Mbps.
• We have 5 disks, each with a per-disk bandwidth of 40 Mbps.
• We have 3 chunk replicas per chunk of data as a baseline.
• Each client request is for 1 chunk of data from a disk.
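These parameters map directly onto simulator constants; a sketch (the names and the 50 Mbps sweep step are our assumptions):

NUM_DISKS = 5
DISK_BANDWIDTH_MBPS = 40                     # per-disk bandwidth
BASELINE_REPLICAS = 3                        # chunk replicas per chunk of data
CHUNKS_PER_REQUEST = 1                       # each client request reads one chunk
CLIENT_BANDWIDTHS_MBPS = range(0, 1001, 50)  # client bandwidth swept from 0 to 1000 Mbps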
Simplified GFS Architecture
[Diagram: the same topology as above, annotated with the experiment parameters: client bandwidth varied from 0 to 1000 Mbps and per-disk bandwidth of 40 Mbps. Chunk IDs 0-1000 are placed on Network Disks 1 and 2, chunk IDs 0-2000 on Network Disk 3, and chunk IDs 1001-2000 on Network Disks 4 and 5.]
Experiment 1
• Disk requests served without load balancing
– In this case we pick the first chunk server from the list of available chunk servers that contain the requested chunk.
• Disk requests served with load balancing
– In this case we apply a greedy algorithm and balance the load of incoming requests across the 5 disks, as sketched below.
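A sketch of the two selection policies, assuming the simulator tracks the number of outstanding requests per disk (names are ours):

def pick_first(replica_disks, outstanding):
    # Without load balancing: always take the first chunk server on the list
    return replica_disks[0]

def pick_greedy(replica_disks, outstanding):
    # With load balancing: greedily take the replica with the fewest
    # outstanding requests, spreading load across all 5 disks
    return min(replica_disks, key=lambda d: outstanding[d])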
Expectation
• In the non-load-balancing case we expect the effective request/data rate to peak at the value of 2 disks (80 Mbps), since picking the first replica concentrates every request on just 2 of the 5 disks.
• In the load-balancing case we expect the effective request/data rate to peak at the value of 5 disks (200 Mbps).
Load Balancing Graph
This graph plots the data rate at the client vs. the client bandwidth.
Experiment 2
• Disk requests served with no dynamic replication
– In this case we have a fixed number of replicas (3 in our case) and the server does not create more replicas based on statistics for read requests.
• Disk requests served with dynamic replication
– In this case the server replicates certain chunks based on the frequency of chunk requests.
– We define a replication factor, a fraction less than 1.
– Number of replicas for a chunk = replication factor × number of requests for the chunk.
– We cap the maximum number of replicas at the number of disks.
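The replica-count rule as a sketch (the floor of one replica, the integer rounding, and the example factor 0.01 are our assumptions):

def replicas_for_chunk(requests_for_chunk, replication_factor, num_disks):
    # replicas = replication_factor * number of requests for the chunk,
    # capped at the number of disks, never below one replica
    assert 0 < replication_factor < 1
    replicas = int(replication_factor * requests_for_chunk)
    return min(num_disks, max(1, replicas))

# e.g. with replication_factor = 0.01, a chunk that has seen 400 requests
# gets min(5, 4) = 4 replicas on the 5-disk setup.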
Expectation
• Our requests are all aimed at the chunks placed on disk 0, disk 1, and disk 2.
• In the non-replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 3 disks (120 Mbps).
• In the replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 5 disks (200 Mbps).
Replication Graph
This graph plots the data rate at the client vs. the client bandwidth.
Experiment 3
• Disk requests served with no rebalancing
– In this case we do not rebalance read requests based on the frequency of chunk requests.
• Disk requests served with rebalancing
– In this case we rebalance read requests by picking the request with the highest frequency and transferring it to a disk with a lower load, as sketched below.
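A sketch of one rebalancing step, assuming per-chunk request frequencies and per-disk request loads are tracked (all names are ours):

def rebalance_step(chunk_freq, chunk_to_disk, disk_load):
    # Pick the chunk with the highest request frequency...
    hottest = max(chunk_freq, key=chunk_freq.get)
    src = chunk_to_disk[hottest]
    # ...and the disk with the least load...
    dst = min(disk_load, key=disk_load.get)
    # ...and transfer the hot chunk's requests only if it actually helps
    if disk_load[dst] + chunk_freq[hottest] < disk_load[src]:
        chunk_to_disk[hottest] = dst
        disk_load[src] -= chunk_freq[hottest]
        disk_load[dst] += chunk_freq[hottest]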
Graph 3
Request Distribution Graph
[Bar chart: number of requests served by each of Disk 0 through Disk 4 under three configurations: no rebalancing/no replication, no rebalancing/replication, and rebalancing/no replication.]
Conclusion and Future Work
• GFS is a simple file system for large, data-intensive applications.
• We studied the behavior of certain read workloads on this file system.
• In the future we would like to come up with optimizations that could fine-tune GFS.