HBase
Presented by
Chintamani Siddeshwar
Swathi Selvavinayakam
http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041
HBase
• Open source BigTable
• HDFS as underlying DFS
• ZooKeeper as lock service
• Tight integration with Hadoop MapReduce
Why HBase?
• Scales out to thousands of nodes
• Access granularity is a row: reads and writes to a single row are atomic
• Designed for workloads consisting of simple operations on individual items
• Provides efficient access to random rows
• Allows dynamic repartitioning of data
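To make "dynamic repartitioning" concrete, here is a minimal sketch of a table divided into contiguous, sorted row-key ranges that split once they grow past a threshold. The class `ToyRegions`, its method names, and the row-count threshold are all hypothetical and chosen for illustration; actual HBase region splitting is driven by data size and configuration, which this toy does not model.

```python
import bisect


class ToyRegions:
    """Toy model: contiguous, sorted row-key ranges that split when they grow."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]   # region i covers [start_keys[i], start_keys[i + 1])
        self.regions = [[]]      # each region keeps its row keys sorted

    def locate(self, row):
        """Return the index of the region whose key range contains `row`."""
        return bisect.bisect_right(self.start_keys, row) - 1

    def put(self, row):
        """Insert a row key, splitting the region if it exceeds the threshold."""
        i = self.locate(row)
        keys = self.regions[i]
        pos = bisect.bisect_left(keys, row)
        if pos == len(keys) or keys[pos] != row:
            keys.insert(pos, row)
        if len(keys) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split region i at its middle key into two adjacent regions."""
        keys = self.regions[i]
        mid = len(keys) // 2
        self.regions[i:i + 1] = [keys[:mid], keys[mid:]]
        self.start_keys.insert(i + 1, keys[mid])


regions = ToyRegions(max_rows_per_region=4)
for n in range(1, 6):
    regions.put(f"row{n:02d}")
print(regions.start_keys)  # boundaries after the fifth insert triggers a split
```

Because rows stay sorted by key, locating the region for any row is a binary search over the region start keys, which is also what keeps random row access cheap as the table grows.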
Data Model
• Sparse
• Distributed
• Multidimensional
• Persistent
• Sorted
• Map
• (row, column, timestamp) -> cell
• Column = Column Family : Column Qualifier
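The mapping above can be sketched as a nested in-memory map. This is only a toy model for illustration: `ToyTable` and its methods are hypothetical names, not the HBase client API, and nothing here is distributed or persistent.

```python
from collections import defaultdict


class ToyTable:
    """In-memory toy model of the (row, column, timestamp) -> cell map."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, timestamp, value):
        """Store one cell version under Column Family : Column Qualifier."""
        self._rows[row][f"{family}:{qualifier}"][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(f"{family}:{qualifier}", {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield (row, column names) in sorted row order for start_row <= row < stop_row."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, sorted(self._rows[row])


table = ToyTable()
table.put("row1", "info", "name", 1, "Ada")
table.put("row1", "info", "name", 2, "Ada L.")
table.put("row2", "stats", "visits", 1, 42)  # sparse: row2 has no info:name cell

print(table.get("row1", "info", "name"))     # newest version
print(table.get("row1", "info", "name", 1))  # version as of timestamp 1
print(list(table.scan("row1", "row3")))
```

Each row stores only the columns it actually has, which is what "sparse" means here, and new qualifiers can appear in any row without a schema change.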
System Structure
Other Features
• Compression
• In-memory column families
• Multiple masters
• Rolling restart
• Bloom filters
• Efficient bulk loads
• Source and sink for Hive, Pig, Cascading
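As a rough illustration of the Bloom filter feature listed above: a Bloom filter answers "definitely absent" or "possibly present" for a key, so a read can skip files that cannot contain the requested row. The sketch below is a generic toy Bloom filter, not HBase's implementation; the class name, bit count, and hashing scheme are assumptions made for the example.

```python
import hashlib


class ToyBloomFilter:
    """Generic toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        """Derive `num_hashes` bit positions by hashing the key with different seeds."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means the key was never added; True means it may have been."""
        return all(self.bits[pos] for pos in self._positions(key))


bloom = ToyBloomFilter()
bloom.add("row1")
print(bloom.might_contain("row1"))  # True: added keys are always reported
```

A `False` answer is always trustworthy, which is what makes it safe to skip a file; a `True` answer still requires checking the file itself.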
Use Cases
• Mozilla
• Yahoo!
• Twitter
• Facebook
• Adobe
HBase vs. RDBMS
HBase:
• Column oriented
• Flexible schema, add columns on the fly
• Good with sparse tables
• No query language
• De-normalize your data
• No multi-row transactions
RDBMS:
• Row oriented (mostly)
• Fixed schema
• Not optimized for sparse tables
• SQL
• Normalize as much as you can
• Transactional
Related Chapters
• Big Data
• Data Modelling
References
• http://ofps.oreilly.com/titles/9781449396107/intro.html
• http://wiki.apache.org/hadoop/Hbase/DataModel
• http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041
• http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf