HBase

Presented by
Chintamani Siddeshwar
Swathi Selvavinayakam

http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041

HBase
• Open source BigTable
• HDFS as underlying DFS
• ZooKeeper as lock service
• Tight integration with Hadoop MapReduce

Why HBase?
• Scales out to thousands of nodes
• Access granularity is a row: read/write to a single row is atomic
• Designed for workloads consisting of simple operations on individual items
• Provides efficient access to random rows
• Allows dynamic repartitioning of data

Data Model
• Sparse
• Distributed
• Multidimensional
• Persistent
• Sorted
• Map
• (row, column, timestamp) -> cell
• Column = Column Family : Column Qualifier

System Structure

Other Features
• Compression
• In-memory column families
• Multiple masters
• Rolling restart
• Bloom filters
• Efficient bulk loads
• Source and sink for Hive, Pig, Cascading

Use Cases
• Mozilla
• Yahoo!
• Twitter
• Facebook
• Adobe

HBase vs. RDBMS

HBase
• Column oriented
• Flexible schema, add columns on the fly
• Good with sparse tables
• No query language
• Denormalize your data
• No transactions

RDBMS
• Row oriented (mostly)
• Fixed schema
• Not optimized for sparse tables
• SQL
• Normalize as you can
• Transactional

Related Chapters
• Big Data
• Data Modelling

References
• http://ofps.oreilly.com/titles/9781449396107/intro.html
• http://wiki.apache.org/hadoop/Hbase/DataModel
• http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041
• http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
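
Appendix: Data Model Sketch

The Data Model slide describes a sparse, sorted map of (row, column, timestamp) -> cell, where a column is written as Column Family : Column Qualifier. The toy in-memory model below is one way to picture that mapping. It is only a sketch: the class ToyTable and its methods put, get, and scan are hypothetical names chosen for illustration and are not the HBase client API, and it ignores distribution and persistence entirely.

```python
from bisect import insort


class ToyTable:
    """Toy in-memory model of a sparse, sorted (row, column, timestamp) -> cell map.

    Hypothetical illustration only; this is not the HBase client API.
    """

    def __init__(self):
        self._cells = {}  # (row, "family:qualifier") -> {timestamp: value}
        self._keys = []   # (row, column) keys kept in sorted order

    def put(self, row, family, qualifier, timestamp, value):
        # Column = Column Family : Column Qualifier
        key = (row, f"{family}:{qualifier}")
        if key not in self._cells:
            # Sparse: a cell exists only once something is written to it.
            self._cells[key] = {}
            insort(self._keys, key)
        self._cells[key][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self._cells.get((row, f"{family}:{qualifier}"))
        if not versions:
            return None
        if timestamp is None:
            # Default to the most recent version of the cell.
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start_row, stop_row):
        """Yield (row, column, latest value) for start_row <= row < stop_row, in sorted order."""
        for row, column in self._keys:
            if start_row <= row < stop_row:
                versions = self._cells[(row, column)]
                yield row, column, versions[max(versions)]


table = ToyTable()
table.put("row2", "info", "name", 1, "bob")
table.put("row1", "info", "name", 1, "alice")
table.put("row1", "info", "name", 2, "alicia")
```

Calling table.get("row1", "info", "name") returns the latest version of that cell, while passing timestamp=1 returns the older one; table.scan("row1", "row3") walks the rows in sorted key order regardless of insertion order.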