Hive-HBase-In-BI.ppt

Hive and HBase in BigInsights 2.1
Richard Ding
BigInsights Development
sding@us.ibm.com
Agenda
 SQL and NoSQL for Hadoop
 Hive Overview
 HBase Overview
 HBase Backup/Restore (Jing Chen He)
© 2013 IBM Corporation
SQL and NoSQL for Hadoop
 Hadoop is designed to store and stream extremely large datasets in
batch. It is highly scalable and highly available. But:
 MapReduce is difficult to use
– Java API is tedious and requires programming expertise
– Many different file formats, storage mechanisms, configuration options, etc.
 MapReduce is for batch operations
– Not intended for real-time querying
– Does not support random access
– Does not handle billions of small files well
 Hive and HBase are two popular open source projects that address
the issues above
– BigInsights 2.1: Hive 0.9.0+ and HBase 0.94.3+
SQL-on-Hadoop
 Query the data where it resides – in HDFS or HBase
 Standards-compliant SQL interface
 Big SQL: SQL-on-Hadoop solution from BigInsights
What is Hive?
 Hive provides a SQL interface for data stored in Hadoop
 Supports a wide variety of Hadoop data:
– Many different file formats and data sources (e.g. HBase)
– Many different data representations (encodings)
– Provides API to define your own
 Hive catalog ("metastore") maps file structure to a tabular form
 Hive DDL populates the catalogs
– Existing data can be described
– Empty "tables" can be defined and populated via DML
 Hive DML statements bulk load tables
 Provides a subset of SQL SELECT for querying
– SQL is translated to one or more MapReduce job(s) for execution
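To make the "SQL is translated to MapReduce" point concrete, here is a minimal sketch of how a query such as SELECT page_url, COUNT(*) ... GROUP BY page_url could be expressed as a map phase, a shuffle, and a reduce phase. It is a plain-Python model, not Hive's actual query plan; the sample rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of (user_id, page_url); not taken from the slides.
ROWS = [(1, "/home"), (2, "/home"), (1, "/cart")]

def map_phase(rows):
    """Emit (group key, 1) for every input row, as a mapper would."""
    for _user_id, page_url in rows:
        yield page_url, 1

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}

def count_by_page(rows):
    """Model of: SELECT page_url, COUNT(*) FROM t GROUP BY page_url."""
    return reduce_phase(shuffle(map_phase(rows)))
```

A more complex query (joins, several aggregations) would chain more than one such job, which is one reason responses are not interactive.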
But, Hive
 Is not a real-time processing system
– Batch jobs for both loads and queries
– Response times are not (sub)second
 Has only limited SQL support
– Not SQL92 compliant
– No random updates and inserts
 Query optimization is still a work in progress
– Compared to a traditional RDBMS
Hive Components
[Diagram: Hive components]
– BigInsights interfaces: Hive Eclipse plugin, web browser
– Client interfaces (remote): Thrift client, JDBC, ODBC, Hive application
– Hive services: CLI (hive>), Hive Web Interface, Hive Server, Metastore
– Query execution: Driver, JobConf config
– Metadata: Hive Metastore
Data Model
 Tables
– Typed columns (int, float, string, boolean, binary, timestamp)
– Complex types (struct, map, array)
 Partitions
– A partition is a virtual column which defines how data is stored in DFS
based on its values. Each table can have one or more partitions (and
one or more levels of partition)
 Buckets
– In each partition, data can be divided into buckets based on the hash
value of a column in the table (useful for sampling, join optimization)
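A rough sketch of bucket assignment follows. It assumes an integer bucketing column and uses the value itself as the hash, which is a simplification; Hive's real hash function depends on the column type. The function names are hypothetical.

```python
NUM_BUCKETS = 32  # matches the 32-bucket example used later in the deck

def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified model: non-negative hash of the column value, modulo bucket count."""
    return (user_id & 0x7FFFFFFF) % num_buckets

def sample_bucket(user_ids, bucket: int):
    """Sampling: reading a single bucket yields roughly 1/NUM_BUCKETS of the rows."""
    return [u for u in user_ids if bucket_for(u) == bucket]
```

Because the same key always lands in the same bucket number, two tables bucketed on the same column can be joined bucket by bucket.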
Physical Layout
 Warehouse directory in DFS
– Specified by “hive.metastore.warehouse.dir” in hive-site.xml
– /biginsights/hive/warehouse is the default location for BigInsights
 One can think of tables, partitions and buckets as directories,
subdirectories and files respectively
Hive Entity        Sample             Sample location in DFS
database           testdb             $WH/testdb
table              T                  $WH[/testdb]/T
partition          date='01012013'    $WH/T/date=01012013
bucketing column   userid             $WH/T/date=01012013/000000_0
                                      ……
                                      $WH/T/date=01012013/000032_0
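The directory mapping in the table can be sketched as a few path-building helpers. This only mirrors the layout shown on the slide (database, table, partition, bucket file); the helper names are hypothetical, and a real Hive installation may name directories differently.

```python
import posixpath

WAREHOUSE = "/biginsights/hive/warehouse"  # BigInsights default, per the slide

def table_dir(table, database=None, warehouse=WAREHOUSE):
    """$WH[/database]/table"""
    parts = [warehouse] + ([database] if database else []) + [table]
    return posixpath.join(*parts)

def partition_dir(table, partitions, database=None):
    """One 'column=value' subdirectory per partition level, in order."""
    subdirs = [f"{column}={value}" for column, value in partitions]
    return posixpath.join(table_dir(table, database), *subdirs)

def bucket_file(table, partitions, bucket, database=None):
    """Bucket files are named by a zero-padded bucket number, e.g. 000000_0."""
    return posixpath.join(partition_dir(table, partitions, database), f"{bucket:06d}_0")
```

For example, bucket_file("T", [("date", "01012013")], 0) gives the first bucket file of the sample partition.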
File Format
 Actual data stored in flat files on DFS
– Control char delimited text file (default)
– Hadoop SequenceFile
– RCFile (Record Columnar File)
 Also supports custom Input/OutputFormat or custom SerDe
(Serializer/Deserializer) to use any format
Create Table
CREATE TABLE view_page (
  view_time INT,
  user_id BIGINT,
  page_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) SORTED BY (page_url) INTO 32 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS RCFILE;
Create Table Command
 The PARTITIONED BY clause defines virtual columns, which are different
from the data columns and are not actually stored with the data
 CLUSTERED BY clause specifies which column to use for bucketing
as well as how many buckets to create
 ROW FORMAT DELIMITED clause specifies how the rows are
formatted, i.e. which character will be the delimiter.
 STORED AS RCFILE indicates that the data is stored in a binary
format (RCFILE format) on DFS
 A COMMENT can be attached at both the table level and the
column level
 These clauses are all optional. If one is not specified, the default is used
HBase Overview
 An open source, distributed, scalable, NoSQL datastore
 Based on Google’s Bigtable paper [2006]
 Implemented as a sparse, consistent, distributed, multi-dimensional,
persistent, sorted map
 Fault-tolerant, horizontally scalable, high performance
HBase Advantage
 Highly Scalable
– Automatic partitioning
– Scale linearly and automatically with new nodes
 Low Latency
– Supports random reads/writes and small range scans
 Highly Available
 Strong Consistency
 Very good for “sparse data” (no fixed columns)
But HBase is not an RDBMS
 No secondary indexes (row-key only)
 No multi-row transactions (single row only)
 No SQL interface (get/put/scan/etc)
 No query optimizer
 Can take lots of disk-space
– It can be very verbose
– There is no schema
– 3x replication on DFS
When to use HBase?
 Large amounts of data (100s of GBs up to Petabytes)
 Need efficient random access inside large datasets
 Need to scale gracefully
 Do not need full RDBMS capabilities
 Relatively simple and fixed access patterns
Data Model
 “...a sparse, distributed, persistent multi-dimensional sorted map.
The map is indexed by a row key, column key, and a timestamp; each
value in the map is an uninterpreted array of bytes” – Google
Bigtable paper
 (row key, column key, timestamp) → value
 The table schema defines only the column families
– Can have a large, variable number of columns per row
 Rows are stored in sorted order by row key
– Row keys are byte arrays. Rows are sorted lexicographically by row key
 Each cell value has a version
– Timestamp
 A {row, column, version} tuple exactly specifies a cell
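The "sorted map" description can be modelled in a few lines. This is a toy in-memory sketch, not HBase's storage format: cells are keyed by (row, column, timestamp), rows sort lexicographically as byte strings, and newer versions sort before older ones. The class and method names are hypothetical.

```python
import bisect

class SortedCellMap:
    """Toy model: cells keyed by (row, column, timestamp), kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> value (uninterpreted bytes)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        ts = float("inf") if timestamp is None else timestamp
        i = bisect.bisect_left(self._keys, (row, column, -ts))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1
```

Because the keys are sorted, a range scan is a single seek followed by a sequential read, which is why small range scans are cheap.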
Column Family
 “Column keys are grouped into sets called column families, which
form the basic unit of access control.” – Google Bigtable paper
 Basic storage unit. Columns in the same family should have similar
properties and access patterns
 Configurable by column family
– Compression (none, Gzip, LZO, SNAPPY)
– Version retention policies
 A column is named using the following syntax: family:qualifier
Data Model, Cont
 Column family as storage unit
 Cells are first sorted by row keys, then by column keys, finally by
timestamps
 Good for “sparse data”, since a nonexistent column is simply ignored
 But a simple translation from an RDBMS table to an HBase table can take
a lot more storage space
 Updating a column just adds a new version
 Deleting a column/row just adds a new version with a “delete”
marker (called a tombstone)
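The append-only behaviour of updates and deletes can be sketched as follows. This is a simplified model of a single cell, assuming the read returns the entry with the highest timestamp; real HBase has several delete-marker types and removes masked data only during compaction. The names are hypothetical.

```python
TOMBSTONE = object()  # stands in for a "delete" marker

class VersionedCell:
    """Toy model of one cell: every write appends, nothing is overwritten."""

    def __init__(self):
        self.versions = []  # append-only list of (timestamp, value)

    def put(self, timestamp, value):
        """An update is just a new version."""
        self.versions.append((timestamp, value))

    def delete(self, timestamp):
        """A delete is a new version carrying a tombstone."""
        self.versions.append((timestamp, TOMBSTONE))

    def get(self):
        """Return the newest version, or None if it is a tombstone."""
        if not self.versions:
            return None
        _ts, value = max(self.versions, key=lambda version: version[0])
        return None if value is TOMBSTONE else value
```

Note that a delete makes the stored data grow rather than shrink, which is part of why HBase tables can take more disk space than expected.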
HBase Shell
 HBase shell is JRuby IRB (the JRuby implementation of Interactive
Ruby Shell) with some HBase-specific commands added
 Running the shell:
$ cd /opt/ibm/biginsights/hbase
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.3, r479dfe9d8f840afa063e7a61c6d073f0a57ba423, Wed Apr 24
03:11:01 PDT 2013
hbase(main):001:0>
Shell: Create Table
hbase(main):002:0> create 'usertable', 'family'
0 row(s) in 1.0220 seconds
hbase(main):003:0> describe 'usertable'
DESCRIPTION
{NAME => 'usertable', FAMILIES => [{NAME => 'family',
REPLICATION_SCOPE => '0', KEEP_DELETED_CELLS => 'false',
COMPRESSION => 'NONE', ENCODE_ON_DISK => true
ENABLED => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0',
DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false',
BLOOMFILTER => 'NONE', TTL => '2147483647',
VERSIONS => '3', BLOCKSIZE => '65536'}]}
1 row(s) in 0.0170 seconds
hbase(main):004:0> disable 'usertable'
0 row(s) in 5.0300 seconds
hbase(main):005:0> drop 'usertable'
0 row(s) in 1.2030 seconds
HBase Architecture
[Diagram: a client, a ZooKeeper quorum of ZK peers, master servers, and region servers, all on top of DFS]
High Level Overview
 ZooKeeper provides the coordination service
 Client finds the region server via ZK
 Client writes/reads directly to/from the region server
 Master assigns regions and balances load
 Region servers send heartbeats to ZK
 Master monitors ZK for failed region servers
HBase in ZooKeeper
$ bin/hbase zkcli
[zk: <hostname>:2181(CONNECTED) 0] ls /hbase
[root-region-server, rs, master, hbaseid, shutdown, backup-masters, unassigned,
draining, splitlog, table]
znode                 Description
root-region-server    location of the server hosting the root region
rs                    ephemeral nodes of the region servers
draining              ephemeral nodes of the draining region servers
master                the currently active master
shutdown*             the current cluster state
unassigned*           used for region transitioning and assignment
splitlog              used for log splitting work assignment
table                 used for table disabling/enabling
Client
 HBase client (HTable) first connects to ZK using the configuration
parameters in hbase-site.xml file:
<name>hbase.zookeeper.quorum</name>
<value>comma-delimited host names</value>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
 Client then finds the region server that serves the specific region by
querying the -ROOT- and .META. catalog tables (the -ROOT- location comes
from the root-region-server znode in ZK):
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "myTable");
 The region information is cached in the client so that subsequent
requests need not go through the lookup process until it becomes
stale
 Client writes/reads directly to/from the region server
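The lookup-then-cache behaviour can be sketched as below. This is a conceptual model only: the catalog lookup is replaced by a caller-supplied function, and the class, the sample regions, and the server names are hypothetical rather than the HBase client API.

```python
import bisect

class RegionLocator:
    """Toy client-side cache of region locations, keyed by region start key."""

    def __init__(self, meta_lookup):
        self._meta_lookup = meta_lookup  # row -> (start_key, end_key, server)
        self._starts = []                # sorted start keys of cached regions
        self._regions = {}               # start_key -> (end_key, server)
        self.meta_lookups = 0            # how often the catalog was consulted

    def locate(self, row):
        """Return the server for `row`, consulting the catalog only on a cache miss."""
        i = bisect.bisect_right(self._starts, row) - 1
        if i >= 0:
            end, server = self._regions[self._starts[i]]
            if end == b"" or row < end:  # an empty end key marks the last region
                return server
        self.meta_lookups += 1
        start, end, server = self._meta_lookup(row)
        if start not in self._regions:
            bisect.insort(self._starts, start)
        self._regions[start] = (end, server)
        return server

    def invalidate(self, start_key):
        """Drop a stale entry, e.g. after a request to a moved region fails."""
        if start_key in self._regions:
            del self._regions[start_key]
            self._starts.remove(start_key)

# Hypothetical two-region table used to exercise the cache.
REGIONS = [(b"", b"m", "rs1"), (b"m", b"", "rs2")]

def fake_meta(row):
    for start, end, server in REGIONS:
        if row >= start and (end == b"" or row < end):
            return start, end, server
```

After the first lookup for a region, further requests for rows in the same key range go straight to the cached region server.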
Important Client Configurations
Configuration and description:
 Auto Flush
– When performing a lot of Puts, make sure that setAutoFlush is set
to false on your HTable instance (default is true)
 Deferred Log Flush
– If deferred log flush is used, WAL edits are kept in memory until the
flush period. Deferred log flush can be configured on tables via
HTableDescriptor (or a shell command)
 hbase.client.write.buffer
– Default: 2 MB
 hbase.client.scanner.caching
– Number of rows that will be fetched when calling next on a scanner.
Default: 1, recommended: 100
 hbase.rpc.timeout
– RPC timeout value. Default: 60 secs
 Turn off WAL on Puts
– Call setWriteToWAL(false) on the Put to increase throughput (may cause
data loss during a RS failure)
Master
 Monitor all region server instances in the cluster
 Initiate region server failover
 Perform all metadata changes (e.g. create table)
 Manage region assignment
 Background services:
– LoadBalancer (move regions to balance the cluster load)
– CatalogJanitor (check and clean up the .META. table)
– LogCleaner (clear the HLogs in the old logs directory)
Backup Masters
 During BigInsights installation, you can configure two or more
masters
 When a master starts up, it races with other masters to write its
address into ZooKeeper. If it succeeds, it is the primary/active
master.
 If it does not succeed, there is already an active master, and it becomes
a backup master
 A backup master waits until the active master dies, then tries to become
the next active master
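The election race can be sketched with an in-memory stand-in for ZooKeeper. Everything here is hypothetical (the FakeZooKeeper class, its methods, and the addresses); it only illustrates "first writer of the master znode wins, others wait for it to disappear".

```python
MASTER_ZNODE = "/hbase/master"

class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble; not a real client API."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, data):
        """Create the znode if absent. Return True on success, False if it exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = data
        return True

    def expire(self, path):
        """Model the owner's session dying: its ephemeral znode vanishes."""
        self.nodes.pop(path, None)

def try_become_active(zk, address):
    """A master is active only if it manages to write its address first."""
    return zk.create_ephemeral(MASTER_ZNODE, address)
```

A backup master would call try_become_active again once it notices the znode is gone.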
Row Key Design
 Row key design is the most important factor
 Keep good data locality
 Know your access pattern
 Use a key structure that yields good locality for your access pattern
 Keep the key compact
 Avoid “hotspot” regions
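One common way to avoid a hotspot with monotonically increasing keys is to prefix ("salt") each key so that consecutive keys spread over several key ranges. The slides do not prescribe this technique; the sketch below is an illustration under that assumption, with a deliberately trivial checksum and hypothetical names.

```python
NUM_SALT_BUCKETS = 4  # hypothetical; often chosen relative to the number of region servers

def salt_for(key: bytes, buckets: int = NUM_SALT_BUCKETS) -> int:
    """Toy checksum of the key bytes; a real design would use a stronger hash."""
    return sum(key) % buckets

def salted_key(key: bytes) -> bytes:
    """Prefix the key with its salt so consecutive keys sort into different ranges."""
    return b"%d-%s" % (salt_for(key), key)
```

The trade-off is locality: a range scan over the original key order now needs one scan per salt bucket, so salting only pays off when the access pattern is dominated by writes or point reads.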
Region Server
[Diagram: a region server holds a Write-Ahead-Log and multiple regions; each region holds stores; each store holds a MemStore and StoreFiles (HFiles)]
Region
 A region is a horizontal partition of a table with a start row and an
end row
 Regions are the basic element of availability and distribution for
tables
 A region is automatically split by the hosting region server when it
reaches a specified size
 Periodically, a load balancer will move regions within the cluster to
balance the load
 When a region server fails, its regions will be reassigned to other
region servers
Region Server Components
 Region server makes a set of regions available to clients. It checks in
with the Master
 WAL stores all the edits to the Store. There is one WAL per region
server. All edits for all regions carried by a particular region server
are entered first in the WAL
 A Region stores the data for a certain row range of a table. A single
region has multiple stores
 A Store holds a column family in a region. It has a memstore and a
set of zero or more HFiles
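The relationship between these components can be sketched as a toy write path: an edit is appended to the server-wide WAL first, then applied to the memstore of the matching store, and a full memstore is flushed to a new file. The class names echo the slide, but the flush threshold and all method signatures are hypothetical.

```python
class Store:
    """One column family within a region: a memstore plus zero or more flushed files."""

    def __init__(self):
        self.memstore = {}
        self.hfiles = []  # each flushed file is modelled as a sorted list of (key, value)

    def flush(self):
        """Write the memstore out as a new sorted file and start an empty memstore."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

class RegionServer:
    def __init__(self, flush_threshold=2):
        self.wal = []      # one write-ahead log shared by all regions on this server
        self.regions = {}  # region name -> {column family: Store}
        self.flush_threshold = flush_threshold

    def put(self, region, family, row, qualifier, value):
        self.wal.append((region, family, row, qualifier, value))  # 1. log the edit
        store = self.regions.setdefault(region, {}).setdefault(family, Store())
        store.memstore[(row, qualifier)] = value                  # 2. apply to memstore
        if len(store.memstore) >= self.flush_threshold:
            store.flush()                                         # 3. flush when full
        return store
```

If the server crashed before a flush, the edits still in memstores could be recovered by replaying the WAL, which is why skipping the WAL risks data loss.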
API - Filters
 Predicate pushdown: all filters are applied on the server side
 Examples:
Filter and description:
 PrefixFilter
– A filter that will only return rows with the same row prefix
 KeyOnlyFilter
– A filter that will only return the key component of each KV (the
value will be rewritten as empty)
 FirstKeyOnlyFilter
– A filter that will only return the first KV from each row
 FuzzyRowFilter
– Filters data based on a fuzzy row key, i.e. the fuzzy info specifies a
matching mask such as "????_99_????_01", where each ? can be any
value
 RowFilter
– This filter is used to filter based on the key. It takes an operator
(equal, greater, not equal, etc) and a byte[] comparator for the
row, and column qualifier portions of a key.
 TimestampsFilter
– Filter that returns only cells whose timestamp (version) is
in the specified list of timestamps (versions)
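The semantics of a few of these filters can be sketched over a toy table. These functions are plain-Python models of what each filter keeps, not the HBase Filter API; the real FuzzyRowFilter works on byte arrays with a separate mask, and the sample table is made up.

```python
# Toy table: row key -> list of (column, value) cells, sorted by column.
TABLE = {
    b"user1_a": [(b"cf:age", b"30"), (b"cf:name", b"Ann")],
    b"user1_b": [(b"cf:name", b"Bob")],
    b"user2_a": [(b"cf:name", b"Cy")],
}

def prefix_filter(table, prefix):
    """Keep only rows whose key starts with `prefix` (cf. PrefixFilter)."""
    return {row: cells for row, cells in table.items() if row.startswith(prefix)}

def key_only(cells):
    """Keep each cell's key but blank its value (cf. KeyOnlyFilter)."""
    return [(column, b"") for column, _value in cells]

def first_key_only(cells):
    """Keep only the first cell of a row (cf. FirstKeyOnlyFilter)."""
    return cells[:1]

def fuzzy_match(row_key, mask):
    """True if `row_key` matches `mask`, where '?' accepts any character."""
    return len(row_key) == len(mask) and all(
        m == "?" or m == c for c, m in zip(row_key, mask)
    )
```

Running such predicates on the region server means only matching cells cross the network, which is the point of predicate pushdown.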
Coprocessor
 Inspired by Google’s BigTable coprocessors
 A framework that provides a library and runtime environment for
executing user code within the HBase region server and master
processes
 Observer coprocessor (“trigger”)
– MasterObserver (preCreateTable, postCreateTable, …)
– RegionObserver (preGet, postGet, prePut, postPut, …)
– WALObserver (preWALWrite, postWALWrite)
 Load coprocessors from configuration
– hbase.coprocessor.master.classes
– hbase.coprocessor.region.classes
– hbase.coprocessor.user.region.classes
– hbase.coprocessor.wal.classes
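The "trigger" idea behind observer coprocessors can be sketched as hooks that run before and after an operation inside the server. The hook names follow the prePut/postPut names on the slide, but the signatures are simplified and the classes are hypothetical, not the HBase coprocessor interfaces.

```python
class RegionObserver:
    """Base class with no-op hooks; subclasses override what they need."""

    def pre_put(self, row, value):
        return value  # may inspect, rewrite, or reject the value

    def post_put(self, row, value):
        pass

class Region:
    """Toy region that calls its observers around every put."""

    def __init__(self, observers=()):
        self.data = {}
        self.observers = list(observers)

    def put(self, row, value):
        for observer in self.observers:
            value = observer.pre_put(row, value)
        self.data[row] = value
        for observer in self.observers:
            observer.post_put(row, value)

class RejectEmptyObserver(RegionObserver):
    """Example pre-hook: refuse empty values, similar to an access check."""

    def pre_put(self, row, value):
        if not value:
            raise ValueError("empty value rejected")
        return value

class AuditObserver(RegionObserver):
    """Example post-hook: record which rows were written."""

    def __init__(self):
        self.log = []

    def post_put(self, row, value):
        self.log.append(row)
```

Access control and secondary indexing both fit this shape: the first vetoes operations in a pre-hook, the second writes index entries in a post-hook.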
Coprocessor, cont
 Usage: Access Control
– <property> <name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
– <property> <name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.token.TokenProvider,
org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
 Usage: Secondary Index
– No built-in implementation yet
– IBM Watson Lab developed a secondary index coprocessor
hbase(main):005:0> alter 'test', METHOD => 'table_att', 'coprocessor' => 'hdfs://myserver.ibm.com:9010/index-coprocessor0.6.0.jar|org.apache.hadoop.hbase.coprocessor.index.SyncSecondaryIndexObserver|1001|arg1=1,arg2=2'
Coprocessor, cont
 Endpoint Coprocessor (“stored procedure”)
– Implementation is installed on the server side
– Invoked from client side using dynamic proxy
Questions