04 Hadoop&HDFS

Hadoop&HDFS
1
OUTLINE
• Introduction
• Architecture
• Hadoop Distributed File System
– Architecture of HDFS
• NameNode
• DataNode
• HDFS Client
– Replica Management
2
What is Hadoop?
4
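Origins of Hadoop (2002~2004)
• Founder: Doug Cutting
• Lucene
– A high-performance document indexing engine API written in Java
– Indexes every word in a document, making search far more efficient than traditional word-by-word comparison
• Nutch
– An open-source web search engine
– Built on the Lucene library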
5
The Turning Point for Hadoop
• Nutch hit a bottleneck when processing large volumes of web data
• Google published three key technologies
– SOSP 2003: “The Google File System”
– OSDI 2004: “MapReduce: Simplified Data Processing on Large Clusters”
– OSDI 2006: “Bigtable: A Distributed Storage System for Structured Data”
6
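Origins of Hadoop (2004~Now)
• The technologies published by Google were used as references and implemented in Nutch one after another
– A distributed file system, the Nutch Distributed File System (NDFS)
– MapReduce
• In 2006, the distributed computing part of Nutch was split out into a separate project named Hadoop
• NDFS was renamed the Hadoop Distributed File System (HDFS)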
7
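Features of Hadoop
• When the data have no interdependencies, they can be processed efficiently in parallel.
• Automatically maintained data replicas provide fault tolerance, so the system can recover automatically when a failure occurs.
• Provides reliable data storage together with analysis and processing capabilities.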
8
[Figure: four machines, each labeled Linux]
9
Hadoop Architecture (1/3)
• The Hadoop project includes a number of related subprojects
– ZooKeeper
– Avro
– Pig
– Chukwa
– Hive
– MapReduce
– HBase
– HDFS
– Hadoop Core
10
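Hadoop Architecture (2/3)
– Hadoop Core:
• The core contains key components and interfaces for distributed file systems and general I/O.
– Avro:
• An efficient, cross-language data serialization system for RPC.
– MapReduce:
• A distributed data processing model and execution environment.
– HDFS:
• A distributed file system.
– Pig:
• A data flow language and execution environment for processing large datasets.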
11
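Hadoop Architecture (3/3)
– HBase:
• A distributed, column-oriented database system.
– ZooKeeper:
• A distributed coordination service that provides primitives for building distributed applications.
– Hive:
• A distributed data warehouse that manages data stored in HDFS and provides a SQL-based query language.
– Chukwa:
• A distributed data collection and analysis system.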
12
Google References
• The Google File System [2003]
• MapReduce [2004]
• Bigtable [2006]

Google                Hadoop
Google File System    HDFS
MapReduce             MapReduce Framework
Bigtable              HBase
13
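Differences Between the Hadoop and Google Architectures

Item                         Google           Hadoop
Development team             Google           Apache
Sponsors                     Google           Yahoo, Amazon
Resources                    open document    open source
Operating system             Linux            Linux / GPL
Search engine                Google           Nutch
Programming model            MapReduce        Hadoop MapReduce
File system                  GFS              HDFS
Database system              Bigtable         HBase
Domain-specific languages    Sawzall          Hive, Pig
Coordination service         Chubby           ZooKeeper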
14
Architecture of HDFS
[Figure: an HDFS Client and a cluster containing one NameNode (NN) and five DataNodes (DN)]
NN: NameNode
DN: DataNode
17
File Storing
[Figure: a 100 MB file is split into a 64 MB block and a 36 MB block; each block passes through a temporary copy (Temp) and is then stored on three DataNodes]
DN: DataNode
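The splitting shown in the figure can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the 64 MB block size from the slide; the function name `split_into_blocks` is hypothetical and not part of any HDFS API.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file is cut into.

    Every block is full-sized except possibly the last one,
    which holds whatever remains.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

# The 100 MB file from the slide becomes one 64 MB and one 36 MB block.
blocks = split_into_blocks(100 * MB)
print([size // MB for size in blocks])
```

Each of the resulting blocks is then replicated independently, which is why the figure shows several copies of both the 64 MB and the 36 MB block.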
18
Responsibilities of NameNode
• Maintaining the namespace tree and the
mapping of file blocks to DataNodes
• Replica management
20
Namespace
• Files and directories are represented by inodes.
• The inode data and the list of blocks belonging to each file comprise the metadata of the name system, called the image.
• The persistent record of the image is called a checkpoint.
• The modification log of the image is called the journal.
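As a rough mental model of these terms, the sketch below keeps a tree of inodes (the image) in memory and appends every change to a list (the journal). The class and method names (`Inode`, `Image`, `add_file`, `blocks_of`) are invented for illustration and do not mirror the real NameNode classes.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    """A file or directory in the namespace tree."""
    name: str
    is_dir: bool
    children: dict = field(default_factory=dict)  # name -> Inode (directories)
    blocks: list = field(default_factory=list)    # block IDs (files)

class Image:
    """In-memory namespace plus a journal of modifications."""

    def __init__(self):
        self.root = Inode("/", is_dir=True)
        self.journal = []  # every change is also logged here

    def add_file(self, path, block_ids):
        *dirs, file_name = [part for part in path.split("/") if part]
        node = self.root
        for d in dirs:  # create missing parent directories on the way down
            node = node.children.setdefault(d, Inode(d, is_dir=True))
        node.children[file_name] = Inode(file_name, is_dir=False,
                                         blocks=list(block_ids))
        self.journal.append(("add_file", path, tuple(block_ids)))

    def blocks_of(self, path):
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]
        return node.blocks

image = Image()
image.add_file("/logs/a.txt", [1, 2])
print(image.blocks_of("/logs/a.txt"), image.journal)
```

Writing the whole tree to disk would correspond to a checkpoint; the journal records what has changed since then.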
21
Namespace Storing
• NameNode keeps the image in RAM.
• The checkpoint and the journal are stored in the local host’s native file system.
22
Checkpoint & Journal
[Figure: the journal and the checkpoint]
23
NameNode’s Version
24
Protecting the Critical Information
• If either the checkpoint or the journal is missing or corrupt, the namespace will be lost partly or entirely.
• Storing the checkpoint and the journal in multiple storage directories and on an NFS server
• Creating periodic checkpoints with either a CheckpointNode or a BackupNode, and storing the checkpoint there
25
CheckpointNode Operations
• Downloading checkpoint and journal from
NameNode
• Combining the checkpoint and the journal to
create a new checkpoint and an empty journal
• Returning the new checkpoint back to the
NameNode
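The combining step can be pictured as replaying the journal on top of the old checkpoint. The sketch below is a simplification under stated assumptions: the image is a plain dict from path to block list, and the three journal operations (`create`, `delete`, `rename`) and both function names are hypothetical, chosen only to make the replay concrete.

```python
def apply_op(image, op):
    """Apply one journal record to an image (dict: path -> list of block IDs)."""
    kind, path, *args = op
    if kind == "create":
        image[path] = list(args[0])
    elif kind == "delete":
        image.pop(path, None)
    elif kind == "rename":
        image[args[0]] = image.pop(path)
    else:
        raise ValueError(f"unknown journal operation: {kind}")

def create_checkpoint(checkpoint, journal):
    """Replay the journal on a copy of the checkpoint.

    Returns the new checkpoint and an empty journal; the inputs are left
    untouched, much as the CheckpointNode works on downloaded copies.
    """
    new_image = {path: list(blocks) for path, blocks in checkpoint.items()}
    for op in journal:
        apply_op(new_image, op)
    return new_image, []

old_checkpoint = {"/a": [1]}
journal = [("create", "/b", [2, 3]), ("rename", "/a", "/c"), ("delete", "/b")]
new_checkpoint, new_journal = create_checkpoint(old_checkpoint, journal)
print(new_checkpoint, new_journal)
```

Because the new checkpoint already contains every logged change, the journal can start empty again, which keeps it from growing without bound.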
26
BackupNode
• A BackupNode is like a CheckpointNode, but it additionally maintains an image in memory.
27
Responsibilities of Each DataNode
• Storing blocks and their metadata
• Sending block reports and heartbeats to the NameNode
29
Blocks & Metadata
30
DataNode’s Version
31
Verification Log
32
Block Report
• Sent once an hour
• Contains the block ID, generation stamp, and size of each block
• Provides important information for replica management
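To see why block reports matter for replica management, consider how the receiver could turn them into a map from each block to the DataNodes that hold it. This is a minimal sketch; `ReplicaInfo` and `locate_blocks` are hypothetical names, and only the three fields listed on the slide are modeled.

```python
from collections import defaultdict
from typing import NamedTuple

class ReplicaInfo(NamedTuple):
    """One entry of a block report, with the fields named on the slide."""
    block_id: int
    generation_stamp: int
    size: int

def locate_blocks(block_reports):
    """Build block_id -> set of DataNodes from per-DataNode block reports."""
    locations = defaultdict(set)
    for datanode, report in block_reports.items():
        for replica in report:
            locations[replica.block_id].add(datanode)
    return dict(locations)

reports = {
    "dn1": [ReplicaInfo(1, 100, 64), ReplicaInfo(2, 100, 36)],
    "dn2": [ReplicaInfo(1, 100, 64)],
}
print(locate_blocks(reports))
```

The size of each set is the current replica count of that block, which is the number replica management compares against the desired replication.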
33
Heartbeats
• Sent once every three seconds
• Confirm that the DataNode's block replicas are available
• Contain the total storage capacity, the fraction of storage in use, and the number of data transfers currently in progress
• The NameNode controls the DataNode through its replies to the heartbeats
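A heartbeat receiver can be sketched as a table of "last seen" times. The three-second interval comes from the slide; the 600-second timeout used as a default here is an assumption for illustration, as are the class and method names. Times are passed in explicitly so the example stays deterministic.

```python
HEARTBEAT_INTERVAL = 3.0  # seconds, from the slide

class HeartbeatMonitor:
    """Tracks the most recent heartbeat of every DataNode."""

    def __init__(self, timeout=600.0):  # timeout value is an assumption
        self.timeout = timeout
        self.last_seen = {}  # datanode -> time of last heartbeat
        self.stats = {}      # datanode -> (capacity, used_fraction, transfers)

    def receive(self, datanode, now, capacity, used_fraction, transfers):
        """Record a heartbeat together with the statistics it carries."""
        self.last_seen[datanode] = now
        self.stats[datanode] = (capacity, used_fraction, transfers)

    def live_nodes(self, now):
        """DataNodes whose last heartbeat is no older than the timeout."""
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen <= self.timeout)

monitor = HeartbeatMonitor(timeout=10.0)
monitor.receive("dn1", now=0.0, capacity=100, used_fraction=0.5, transfers=0)
monitor.receive("dn2", now=0.0, capacity=100, used_fraction=0.4, transfers=2)
monitor.receive("dn1", now=9.0, capacity=100, used_fraction=0.5, transfers=1)
print(monitor.live_nodes(now=12.0))
```

In this run dn2 stops reporting after time 0, so by time 12 only dn1 is still considered live; its replicas would then be treated as unavailable.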
34
Block Writing
[Figure: the HDFS Client sends a Request to the NameNode (NN), receives a DN List, and then performs the Write on DataNodes (DN) in the cluster]
NN: NameNode
DN: DataNode
36
Writing a Block
37
File Appending
[Figure: one Client writes Appended Data after the existing File Data while other Clients read the file]
38
Block Reading
[Figure: the HDFS Client sends a Request to the NameNode (NN), receives a DN List, and then performs the Read from a DataNode (DN) in the cluster]
NN: NameNode
DN: DataNode
39
Topology Example
Rack0: N00, N01, N02
Rack1: N10, N11, N12
41
Read Example
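[Figure: a Client reading in a two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12); three nodes hold a block replica (BR), and one of them is marked as the Selected Replica]
BR: Block Replica
42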
Distance Example 1
Distance is 4
[Figure: the same two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12) with the Client, three block replicas (BR), and the Selected Replica]
BR: Block Replica
43
Distance Example 2
Distance is 2
[Figure: the same two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12) with the Client, three block replicas (BR), and the Selected Replica]
BR: Block Replica
44
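The distances 4 and 2 in the two examples are consistent with counting hops in a tree whose leaves are nodes and whose inner levels are racks. The sketch below assumes that model and writes each location as a path such as `/rack0/n00`; the path format and both function names are illustrative assumptions.

```python
def distance(a, b):
    """Number of tree hops between two locations given as '/rack/node' paths.

    Each location contributes one hop per path component that lies below
    the closest common ancestor.
    """
    parts_a = a.strip("/").split("/")
    parts_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1
    return (len(parts_a) - common) + (len(parts_b) - common)

def closest_replica(client, replicas):
    """Pick the replica location with the smallest distance to the client."""
    return min(replicas, key=lambda replica: distance(client, replica))

print(distance("/rack0/n00", "/rack1/n10"))  # different racks
print(distance("/rack0/n00", "/rack0/n01"))  # same rack
print(distance("/rack0/n00", "/rack0/n00"))  # same node
print(closest_replica("/rack0/n00",
                      ["/rack1/n10", "/rack0/n01", "/rack1/n11"]))
```

Reading from the nearest replica keeps traffic inside a rack whenever a replica is available there, which is the motivation for selecting by distance.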
Block Placement
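[Figure: a Client writing in a two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12); three block replicas (BR) are placed on nodes of the cluster]
BR: Block Replica
45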
At most one replica of a block on any one node
46
At most two replicas in the same rack, if the number of nodes is twice the number of racks
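The two placement rules can be expressed as a small check. This sketch only tests a proposed placement against the limits stated on the slides; it does not choose nodes, and the names `RACK_OF` and `placement_ok` are hypothetical. The topology is the two-rack example used in these slides.

```python
from collections import Counter

# Example topology from the slides: two racks with three nodes each.
RACK_OF = {
    "N00": "Rack0", "N01": "Rack0", "N02": "Rack0",
    "N10": "Rack1", "N11": "Rack1", "N12": "Rack1",
}

def placement_ok(replica_nodes, rack_of, max_per_rack=2):
    """Check one replica per node and at most max_per_rack replicas per rack."""
    if len(set(replica_nodes)) != len(replica_nodes):
        return False  # some node would hold two replicas of the same block
    per_rack = Counter(rack_of[node] for node in replica_nodes)
    return all(count <= max_per_rack for count in per_rack.values())

print(placement_ok(["N00", "N10", "N11"], RACK_OF))  # spread over both racks
print(placement_ok(["N00", "N00", "N10"], RACK_OF))  # same node used twice
print(placement_ok(["N00", "N01", "N02"], RACK_OF))  # three in one rack
```

Spreading replicas over more than one rack means that the loss of a single rack cannot remove every copy of a block.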
47
Replication Management
Over-Replicated
Under-Replicated
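The two states can be told apart by comparing the number of live replicas of a block with its target replication. A minimal sketch, assuming a target of three replicas (the count shown in the figures of this section); the function names are hypothetical.

```python
def replication_state(live_replicas, replication_factor=3):
    """Classify a block by comparing its live replica count with the target."""
    if live_replicas > replication_factor:
        return "over-replicated"   # a replica should be removed
    if live_replicas < replication_factor:
        return "under-replicated"  # a new replica should be created
    return "ok"

def blocks_needing_work(locations, replication_factor=3):
    """Map block_id -> state for every block that is not at its target."""
    result = {}
    for block_id, datanodes in locations.items():
        state = replication_state(len(datanodes), replication_factor)
        if state != "ok":
            result[block_id] = state
    return result

locations = {
    1: {"N00", "N01", "N02", "N10"},  # four replicas
    2: {"N00", "N10"},                # two replicas
    3: {"N01", "N10", "N11"},         # three replicas
}
print(blocks_needing_work(locations))
```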
48
Over-Replicated
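[Figure: a two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12) holding four block replicas (BR); Disk Space Utilization values of 50%, 51%, and 50% are shown for nodes holding a replica]
BR: Block Replica
49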
Under-Replicated
[Figure: a two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12) with block replicas (BR) spread over some of its nodes]
BR: Block Replica
50
Under-Replicated
[Figure: the same two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12) with three block replicas (BR)]
BR: Block Replica
51
Block Scanner
• Verifies the block replicas stored on each DataNode against their stored checksums
52
Balancer
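[Figure: Disk Space Utilization of each node in a two-rack cluster (Rack0: N00, N01, N02; Rack1: N10, N11, N12), with values between 10% and 62%, shown against the Cluster Utilization and a Threshold Value; three nodes hold a block replica (BR)]
BR: Block Replica
53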
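The labels in the figure suggest the balancing criterion: a node counts as balanced when its disk space utilization is within the threshold value of the cluster utilization. The sketch below assumes exactly that reading; the function names and the example numbers are illustrative, not taken from the slides.

```python
def cluster_utilization(nodes):
    """nodes: name -> (used, capacity). Returns total used / total capacity."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    return total_used / total_capacity

def classify_nodes(nodes, threshold):
    """Label each node relative to the cluster utilization +/- threshold."""
    average = cluster_utilization(nodes)
    labels = {}
    for name, (used, capacity) in nodes.items():
        utilization = used / capacity
        if utilization > average + threshold:
            labels[name] = "over-utilized"   # candidate source of block moves
        elif utilization < average - threshold:
            labels[name] = "under-utilized"  # candidate target of block moves
        else:
            labels[name] = "balanced"
    return labels

# Hypothetical cluster: cluster utilization is 120 / 300 = 40%.
nodes = {"N00": (10, 100), "N01": (45, 100), "N02": (65, 100)}
print(classify_nodes(nodes, threshold=0.10))
```

A smaller threshold gives a more evenly filled cluster at the cost of moving more data.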
Key Requirement
[Figure: the same two-rack cluster with node Disk Space Utilization between 10% and 62%, shown against the Cluster Utilization and the Threshold Value; the figure is labeled NO BLOCK CAN BE MOVED]
BR: Block Replica
54