Duke Database Research Group

HDFS/GFS

Outline

• Requirements for a Distributed File System

• HDFS

– Architecture

– Read/Write

• Research Directions

– Popularity

– Failures

– Network

Properties of a Data Center

• Servers are built from commodity devices

– Failure is extremely common

– Servers only have a limited amount of HDD space

• Network is over-subscribed

– Bandwidth between servers differs depending on where they sit in the topology

• Demanding applications

– High throughput, low latency

• Resources are grouped into failure zones

– Independent units of failure

[Figure: data-center network with oversubscribed links of 10, 25, and 100 Gb/s]

Data-Center Architecture


[Figure: servers split across two independent units of failure, Failure Domain 1 and Failure Domain 2]

Goals for a Data Center File System

• Reliable

– Overcome server failures

• High performing

– Provide good performance to applications

• Aware of network disparities

– Make data local to the applications

Common Design Principles

• For performance: Partitioning the data

– Split data into chunks and distribute

• Provides high throughput

– Many people can read the chunks in parallel

• Better than everyone reading the same file

How data is partitioned across nodes

• For reliability: Replication

– Overcome failures by making copies

• At least one copy should be online

How data is duplicated across nodes

• For network disparity: Rack-aware allocation

– Read from the closest block

– Write to the closest location

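A minimal sketch of the partitioning idea above, in Python. The 128 MB block size is a common HDFS default; everything else here is an illustrative assumption, not HDFS's actual code:

    # Split a file into fixed-size blocks so many clients can read
    # different blocks in parallel instead of contending for one file.
    BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default

    def partition(data: bytes) -> list[bytes]:
        """Return the file's contents as a list of fixed-size blocks."""
        return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

    blocks = partition(b"x" * (300 * 1024 * 1024))  # a 300 MB file
    print(len(blocks))  # 3 blocks: 128 MB, 128 MB, 44 MB

Each block can then be replicated and placed independently, which is what the rack-aware write path later in the deck exploits.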

Outline

• Requirements for a Distributed File System

• HDFS

– Architecture

– Read/Write

• Research Directions

– Popularity

– Failures

– Network

HDFS Architecture

• Name Node – master (only 1 per data center)

– All reads/writes go through the master

– Manages the data nodes

• Detects failures – triggers replication

• Tracks performance

– Tracks the location of blocks (block-to-node mapping)

– Tracks the status of data nodes

– Rebalances the data center

– Orchestrates reads/writes

• Data Node –

– One per server

– Stores the blocks

• Tracks the status of its blocks

• Ensures the integrity of each block

[Figure: one name node coordinating many data nodes; each data node stores blocks such as B]
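A toy sketch of the name node's bookkeeping in Python (class and field names are illustrative assumptions, not HDFS source): the master stores only metadata, never block contents.

    import time

    class NameNode:
        """Toy master: tracks which data nodes hold which blocks."""
        def __init__(self):
            self.block_locations = {}  # block id -> set of data-node ids
            self.last_heartbeat = {}   # data-node id -> last heartbeat time

        def register_block(self, block_id, node_id):
            self.block_locations.setdefault(block_id, set()).add(node_id)

        def heartbeat(self, node_id):
            self.last_heartbeat[node_id] = time.time()

        def locate(self, block_id):
            # Answers a client's read: which data nodes hold this block?
            return self.block_locations.get(block_id, set())

    nn = NameNode()
    nn.register_block("B", "s1")
    nn.heartbeat("s1")
    print(nn.locate("B"))  # {'s1'}

The data nodes hold the actual bytes; the name node only answers "where is block B?" and orchestrates writes, rebalancing, and re-replication.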

What is a Distributed FS Write?

• HDFS

– For high performance

• Make N copies of the data to be written

• Default N = 3

[Figure: a client's "Write B" goes through the HDFS master; three replicas of block B land on different data nodes]

What is a Distributed FS Write?

• HDFS

– For fault tolerance

• Place replicas in two different fault domains:

• 2 copies in the same rack

• 1 copy in a different rack

[Figure: two replicas of block B in Zone 1, one replica in Zone 2]
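A small Python sketch of the placement rule on this slide (two replicas in one rack, a third in another). The rack map is an assumed input, and real HDFS placement has more constraints:

    import random

    def place_replicas(racks: dict) -> list:
        """racks: rack name -> list of servers. Returns 3 servers
        spanning two fault domains: 2 in one rack, 1 in another."""
        local_rack, remote_rack = random.sample(sorted(racks), 2)
        two_local = random.sample(racks[local_rack], 2)
        one_remote = random.choice(racks[remote_rack])
        return two_local + [one_remote]

    # Example: two failure domains with three servers each.
    racks = {"zone1": ["s1", "s2", "s3"], "zone2": ["s4", "s5", "s6"]}
    print(place_replicas(racks))  # e.g. ['s2', 's3', 's5']

Losing an entire rack then still leaves at least one live copy.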

What is a Distributed FS Write?

• HDFS

– For network awareness

• Currently does nothing specific

– It simply picks two random racks

What is a Distributed FS Read?

• HDFS

– For Network awareness/performance

• Pick the closest copy to read from

– Nothing specific for reliability

[Figure: a client's "Read B" goes to the name node, which directs it to the nearest replica across Zone 1 and Zone 2]
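A sketch of "pick the closest copy" in Python. The three-level distance (same node < same rack < remote rack) mirrors the usual notion of network distance; the topology encoding is an illustrative assumption:

    def distance(reader: str, holder: str, rack_of: dict) -> int:
        """0 = same node, 1 = same rack, 2 = different rack."""
        if reader == holder:
            return 0
        if rack_of[reader] == rack_of[holder]:
            return 1
        return 2

    def pick_replica(reader, replicas, rack_of):
        # Read from whichever data node is cheapest to reach.
        return min(replicas, key=lambda node: distance(reader, node, rack_of))

    rack_of = {"s1": "zone1", "s2": "zone1", "s5": "zone2"}
    print(pick_replica("s2", ["s1", "s5"], rack_of))  # 's1': same rack wins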


Implications of Read/Write Semantics

• One application write == 3 HDFS writes

– Writes are costly!!

– HDFS is optimized for write-once/read-many times workloads

• What is an update/edit? Rewrite the blocks?

– An update = delete the old data + write the new data

[Figure: "Modify B" goes through the name node; the old replicas of B are removed and new replicas B` are written across the zones]
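A toy sketch of these semantics in Python (names illustrative): the file system never edits a block in place; it drops the old replicas and performs a fresh, full write of the new version.

    def update_block(locations: dict, old_id: str, new_id: str, place):
        """locations: block id -> set of holders; place() picks 3 new nodes."""
        stale = locations.pop(old_id, set())  # step 1: invalidate old copies
        locations[new_id] = set(place())      # step 2: a complete new write
        return stale                          # nodes that must delete old_id

    locations = {"B": {"s1", "s2", "s5"}}
    stale = update_block(locations, "B", "B'", lambda: ["s3", "s4", "s6"])
    print(stale, locations)  # old holders of B; B' now lives on 3 nodes

So one logical edit costs a delete plus three physical writes, which is why HDFS favors write-once data.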

Interesting Challenges

• What happens with more popular blocks?

– Or less popular blocks?

• What happens during server failures?

– Can you lose data?

• What happens if you have a better network?

– No oversubscription

Outline

• Requirements for a Distributed File System

• HDFS

– Architecture

– Read/Write

• Research Directions

– Popularity

– Failures

– Network

Popularity in HDFS

• Not all files are equivalent

– E.g., more people search for bball than hockey

• More popular blocks will have more contention

– Leads to slower performance

– Searches for bball will be slower

Popularity in HDFS

• # of copies of a block = function(popularity)

– If 50 people search for bball, then make 50 copies

– If only 3 search for hockey, then make 3

• You want as many copies of a block as readers


Popularity in HDFS

• As data becomes old, fewer people care about it

– E.g., last year's weather versus today's weather

• When a block becomes old (older than a week)

– Reduce the number of copies.

– In Facebook data centers, old data keeps only one copy
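A toy policy sketch combining the two ideas above: replica count tracks the number of concurrent readers and decays with age. The one-week threshold and single-copy floor come from the slides; the exact function is an assumption, not a published algorithm.

    WEEK = 7 * 24 * 3600  # seconds

    def target_replicas(concurrent_readers: int, age_seconds: float) -> int:
        """More copies for hot blocks, fewer for old ones."""
        if age_seconds > WEEK:
            return 1                        # cold data: keep a single copy
        return max(3, concurrent_readers)   # hot data: roughly one per reader

    print(target_replicas(50, 0))         # bball: 50 copies
    print(target_replicas(3, 0))          # hockey: 3 copies
    print(target_replicas(50, 2 * WEEK))  # last year's weather: 1 copy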

Failures in Data Center

• Do servers fail?

– Facebook: 1% of servers fail after a reboot

– Google: at least one server fails every day

• Failed node doesn't send a heartbeat

• Name node determines which blocks lived on the failed node

• Starts replication

[Figure: the name node notices a dead data node and re-replicates its blocks (B, B`) from surviving copies]
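A minimal sketch of this failure path in Python (the timeout value and data structures are illustrative assumptions): a node that misses its heartbeat window is declared dead, and every block it held is re-replicated from the surviving copies.

    import time

    HEARTBEAT_TIMEOUT = 30.0  # seconds; illustrative

    def dead_nodes(last_heartbeat: dict, now: float) -> set:
        """Nodes whose last heartbeat is older than the timeout."""
        return {n for n, t in last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT}

    def rereplication_work(block_locations: dict, dead: set) -> dict:
        """For each block that lost a replica, the surviving sources."""
        return {b: holders - dead
                for b, holders in block_locations.items()
                if holders & dead}

    now = time.time()
    dead = dead_nodes({"s1": now - 60, "s2": now}, now)
    print(rereplication_work({"B": {"s1", "s2"}}, dead))  # {'B': {'s2'}}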


Problems With Locality-Aware DFS

• Ignores contention on the servers

– I/O contention greatly impacts performance

• Ignores contention in the network

– Similar performance degradation

[Figure: oversubscribed network with links of 10, 25, and 100 Gb/s]

Types of Network Topologies

• Current Networks

– Uneven B/W everywhere

• Future Networks

– Even B/W everywhere

[Figure: future network with uniform 100 Gb/s links throughout]

Implications of Network Topologies

• Blocks can be more spread out!

– No need to keep two copies within the same rack

– Same bandwidth everywhere, so no need for locality-aware placement

Summary

• Properties for a DFS

• Research Challenges

– Popularity

– Failure

– Data Placement

Un-discussed

• Cluster rebalancing

– Move blocks around based on utilization.

• Data integrity

– Use checksums to detect whether data has become corrupted

• Staging + pipelining

– Writes are staged at the client, then pushed through a pipeline of data nodes, each forwarding to the next
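For the data-integrity point, a minimal sketch of checksum-on-read in Python (CRC32 here; HDFS's actual checksum format differs):

    import zlib

    def store(block: bytes):
        """Persist a block together with its checksum."""
        return block, zlib.crc32(block)

    def read(block: bytes, expected: int) -> bytes:
        # A mismatch means this copy is corrupt; fall back to another replica.
        if zlib.crc32(block) != expected:
            raise IOError("block corrupted; read from another replica")
        return block

    data, csum = store(b"hello")
    assert read(data, csum) == b"hello"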
