Hadoop-HBase Tutorial

Gowtham Rajappan

HDFS – Hadoop Distributed File System, modeled on Google GFS.

Hadoop MapReduce – similar to Google MapReduce.

HBase – similar to Google Bigtable.
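To make the MapReduce model concrete, here is a minimal word count that runs in a single Python process. It only imitates the map, shuffle, and reduce phases. It does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    # Shuffle/sort: group every value emitted for the same key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    # Reducer: combine all values collected for a single key.
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs))


print(word_count(["the cat", "The dog"]))  # {'cat': 1, 'dog': 1, 'the': 2}
```

On a real cluster, many mappers and reducers run in parallel on different machines, and the framework performs the shuffle between them over the network.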

Master: hadoop01.cselabs.umn.edu

Slaves: hadoop02.cselabs.umn.edu through hadoop05.cselabs.umn.edu

You will need a CSE Labs account to access this cluster. You can
log in to any of these machines from any CS/CSE Labs machine.


Data is divided into tables. Each row is identified by a row key, and rows are kept sorted by key.
Each table is composed of columns, and columns are grouped into column families.
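One way to picture this data model is as nested maps: row key, then column family, then column qualifier, then value. The sketch below is a toy model in plain Python and not the HBase client API. The schema (`info`, `stats`) and the helper names are hypothetical. It follows HBase's convention of naming columns `family:qualifier`, where families must be declared in advance and qualifiers can be added freely to any row.

```python
from collections import defaultdict

# Hypothetical schema: column families are fixed when the table is defined.
COLUMN_FAMILIES = {"info", "stats"}


def make_table():
    # row key -> column family -> column qualifier -> value
    return defaultdict(lambda: defaultdict(dict))


def put(table, row_key, column, value):
    # Columns are named "family:qualifier"; only the family must be pre-declared.
    family, qualifier = column.split(":", 1)
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row_key][family][qualifier] = value


def scan(table):
    # Return rows in sorted row-key order, as an HBase scan does.
    for row_key in sorted(table):
        yield row_key, {family: dict(cols) for family, cols in table[row_key].items()}


t = make_table()
put(t, "row2", "info:name", "Bob")
put(t, "row1", "info:name", "Alice")
put(t, "row1", "stats:visits", 3)
for row in scan(t):
    print(row)
```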


Partitioning

A table is horizontally partitioned into regions; each region holds
a contiguous, sorted range of row keys.
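Because regions cover contiguous, sorted key ranges, the region that holds a row can be found with a binary search over the regions' start keys. The sketch below assumes hypothetical region boundaries. Each region covers `[start_key, next_start_key)`, and the first region starts at the empty key, so every row key belongs to exactly one region.

```python
import bisect

# Hypothetical region boundaries (start keys, sorted). Region i covers
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]); the last region is open-ended.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key strictly greater than row_key;
    # the region just before that position is the one containing row_key.
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


print(region_for(b"apple"))  # 0
print(region_for(b"g"))      # 1 (a start key belongs to its own region)
print(region_for(b"zebra"))  # 3
```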

Each region is managed by a single RegionServer at a time; one
RegionServer may hold multiple regions.

Persistence and data availability

HBase stores its data in HDFS. It does not replicate
RegionServers; instead, it relies on HDFS replication for data
availability.

Region data is cached in memory:
- Updates and reads are served from the in-memory cache (MemStore).
- The MemStore is flushed periodically to HDFS.
- A Write Ahead Log (WAL), stored in HDFS, makes updates durable.
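The write path above can be sketched as a toy model: log the update first, then apply it to the MemStore, flush to an immutable file when the MemStore fills up, and replay the log after a crash. This is a simplified illustration, not HBase's classes. The class name and the size-based `flush_threshold` are hypothetical, and dropping the log after each flush simplifies how HBase actually retires WAL files.

```python
class ToyRegion:
    """Toy model of one region's write path (hypothetical, not HBase's classes)."""

    def __init__(self, flush_threshold: int = 3):
        self.wal = []            # stands in for the Write Ahead Log kept in HDFS
        self.memstore = {}       # in-memory cache of recent updates (MemStore)
        self.store_files = []    # stands in for immutable files flushed to HDFS
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))   # 1. log the update first, for durability
        self.memstore[row] = value      # 2. then apply it to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.memstore:
            return
        # Persist the MemStore as a sorted, immutable file and start a new one.
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        # Simplification: every logged edit is now in a file, so drop the log.
        self.wal.clear()

    def get(self, row: str):
        # Check the newest data first: the MemStore, then files newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            if row in store_file:
                return store_file[row]
        return None

    def crash_and_recover(self) -> None:
        # A crash loses the MemStore; replaying the WAL restores unflushed edits.
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value
```

The key design point is ordering. Because the log is written before the MemStore, any update acknowledged to a client can be rebuilt from HDFS even if the RegionServer holding the MemStore fails.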

The HBase shell provides interactive commands for manipulating the
database:

Create/delete tables

Insert/update/read from tables

Manage regions


HBase provides atomic single-row operations:

CheckAndPut – Similar to test-and-set

CheckAndDelete

All operations on a single row are atomic, no matter how many columns are
involved.

HBase also provides row-level exclusive locks.

You can use these locks to implement single-row transactions.
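The sketch below shows why a per-row exclusive lock is enough to make multi-column updates and test-and-set atomic. It is a toy model and not the HBase client API. The class, the method names, and the `increment` helper are hypothetical. `check_and_put` mirrors the idea behind CheckAndPut: the put is applied only if a column currently holds an expected value, with `None` meaning the column is absent.

```python
import threading
from collections import defaultdict


class AtomicRowTable:
    """Toy single-row-atomic table (hypothetical model, not the HBase API)."""

    def __init__(self):
        self.rows = defaultdict(dict)                 # row key -> {column: value}
        self.row_locks = defaultdict(threading.Lock)  # one exclusive lock per row
        self._locks_guard = threading.Lock()          # guards lock creation

    def _lock(self, row):
        with self._locks_guard:
            return self.row_locks[row]

    def put(self, row, columns: dict) -> None:
        # Every column is written under the row lock, so no reader can see
        # a half-applied multi-column update.
        with self._lock(row):
            self.rows[row].update(columns)

    def get(self, row) -> dict:
        with self._lock(row):
            return dict(self.rows[row])

    def check_and_put(self, row, column, expected, columns: dict) -> bool:
        # Test-and-set: the check and the write happen under one lock.
        with self._lock(row):
            if self.rows[row].get(column) != expected:
                return False
            self.rows[row].update(columns)
            return True


def increment(table, row, column):
    # Optimistic retry loop: re-read and retry until the check succeeds.
    while True:
        current = table.get(row).get(column)
        if table.check_and_put(row, column, current, {column: (current or 0) + 1}):
            return
```

`increment` shows how test-and-set turns a read-modify-write into a safe operation. Concurrent callers that lose the race see their check fail and retry, so no update is lost.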

HBase stores multiple versions of a column in a row. Each version
is identified by an integer timestamp.

By default, the system time (in milliseconds) is used as the version
timestamp. However, the user can specify a logical timestamp instead.

Each update to a column in a row creates a new version of that
column.

A specific version can be read or deleted using its timestamp, and
HBase can return the list of all stored versions of a column.
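The versioning behavior described above can be modeled as a map from timestamp to value for a single column of a single row. This is a toy sketch and not the HBase client API; the class and method names are hypothetical. As described above, it defaults to the current time in milliseconds but accepts a caller-supplied logical timestamp. It also returns versions newest first.

```python
import time


class VersionedCell:
    """Toy model of one column in one row (hypothetical, not the HBase API)."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, timestamp=None) -> int:
        # Default to the system time in milliseconds; callers may pass a
        # logical timestamp instead. Writing the same timestamp overwrites.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value
        return timestamp

    def get(self, timestamp=None):
        """Return the newest value, or the value stored at exactly `timestamp`."""
        if timestamp is not None:
            return self.versions.get(timestamp)
        if not self.versions:
            return None
        return self.versions[max(self.versions)]

    def delete(self, timestamp) -> None:
        # Remove one specific version, identified by its timestamp.
        self.versions.pop(timestamp, None)

    def all_versions(self):
        # List every stored version, newest first.
        return sorted(self.versions.items(), reverse=True)


cell = VersionedCell()
for ts, value in [(1, "v1"), (2, "v2"), (3, "v3")]:  # logical timestamps
    cell.put(value, ts)
print(cell.get())           # v3
print(cell.get(2))          # v2
print(cell.all_versions())  # [(3, 'v3'), (2, 'v2'), (1, 'v1')]
```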

Hadoop Home - http://hadoop.apache.org/

HBase - http://hbase.apache.org/

API

http://hbase.apache.org/apidocs/

http://hadoop.apache.org/