Wasef: Incorporating
Metadata into NoSQL
Storage Systems
Ala’ Alkhaldi, Indranil Gupta,
Vaijayanth Raghavan, Mainak Ghosh
Department of Computer Science
University of Illinois at Urbana-Champaign
Distributed Protocols Research Group: http://dprg.cs.uiuc.edu
1
NoSQL Storage Systems
• Growing quickly
• $3.4B industry by 2018
• Fast reads and writes
• Several orders of magnitude faster than MySQL and other relational databases
• Easier to Manage
• Support CRUD Operations on Data (Create, Read, Update, Delete)
• Many companies use them to run critical infrastructure
• Google, Facebook, Yahoo!, and many others
• Many open-source NoSQL databases
• Apache Cassandra, Riak, MongoDB, etc.
2
The Need for Metadata
• Though NoSQL systems are easier to manage than RDBMSs, there are still many pain
points
• Today, System Administrators need to
• Parse flat files in system logs, e.g., to debug behavior
• Manually count token ranges, e.g., during node decommissioning
• Many of these pain points could be alleviated if there were a
metadata system available
• Metadata can also provide new features not possible today
• E.g., data provenance
3
Metadata
• Metadata = Essential Information about a {system, table, row}, but
excluding the data itself
• E.g., for a table: columns, and history of past deleted columns
• We argue that metadata should be treated as a first-class citizen in
NoSQL storage systems
• We present the first metadata collection system for NoSQL Storage
Systems, called Wasef
• We integrate Wasef into Apache Cassandra, which is the most popular
NoSQL Storage System
• Our Metadata-enabled Cassandra is called W-Cassandra
• Available for free download at: http://dprg.cs.uiuc.edu/downloads
4
The Wasef System
• Wasef is a Metadata Management System for NoSQL data stores
• Wasef is guided by five design principles – it should:
1. Be able to store metadata cleanly
2. Enable Accessibility of Metadata via Clean APIs
3. Be modular, and integrated with underlying NoSQL functionality
• Do not change other data APIs
4. Provide Flexibility in Granularity at which Metadata is Collected
5. Be efficient and only collect the minimal metadata required
5
Wasef Architecture
• Registry = List of (object, operation)
pairs saying which operation
triggers metadata collection for
which object
• Log = The Metadata itself
• Need easy querying and
accessibility
• Stored as system tables
• where available from the
underlying NoSQL Store
• Use CRUD (from underlying
NoSQL) for metadata
• APIs provided to
• Clients
• Use cases
6
Wasef APIs
Internal API
• Registry.add(target, operation)
• Registry.delete(target, operation)
• Registry.query(target, operation)
• Log.add(target, operation, timestamp, value)
• Log.delete(target, operation, startTime, endTime)
• Log.query(target, operation, startTime, endTime)
External API
• Wrappers around the Internal API
• Convenience functions
“target”
• Name of the database entity for which metadata is being collected
• We use a systematic naming convention using dotted notation
• Example: <KeySpaceName.Table.RowID.Column>
“operation”
• The operation which, when invoked by any client, triggers collection of metadata for this target
• Uses a systematic naming convention
• Examples: Column add, Row insert, Truncate table
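As a rough illustration (not necessarily the exact internal code path), these Internal API calls map onto ordinary CRUD operations over Wasef’s registry and log system tables, whose schemas appear two slides ahead; the argument values below are hypothetical:

-- Registry.add('School.Teacher', 'Truncate') roughly corresponds to:
insert into registry (target, operation) values ('School.Teacher', 'Truncate');
-- Log.add('School.Teacher', 'Truncate', 1509051314, '{rows_deleted:42}') roughly corresponds to
-- (the client column is presumably filled in from the authenticated session):
insert into log (target, operation, time, client, value)
values ('School.Teacher', 'Truncate', 1509051314, 'admin', '{rows_deleted:42}');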
7
W-Cassandra: Incorporating Wasef into
Cassandra (v 1.2.x)
Supported metadata targets and operations
Target  | Identifier                      | Operations             | Collected Metadata
--------+---------------------------------+------------------------+---------------------------------------------
Schema  | Name                            | Alter, Drop            | Old and new names, replication map
Table   | Name                            | Alter, Drop, Truncate  | Column family name, new and old properties (e.g., column names, types, ...)
Row     | Partitioning keys               | Insert, Update, Delete | Key names, affected columns, TTL, ...
Column  | Clustering keys and column name | Insert, Update, Delete | Key names, affected columns, TTL, ...
Node    | Node ID                         | On request             | Token ranges
8
W-Cassandra: Registry Table
Takeaways
• Separate row for each object
• Stores all triggering operations for that object
•  Makes it easy to look up during an operation
Schema of “registry” table (in CQL)
create table registry(
  target text,       -- partitioning key
  operation text,    -- clustering key
  primary key( target, operation ));
Example “registry” contents
  School.Teacher      → operations AlterCF_Add, Truncate
  School.Teacher.John → operations Delete_Row, Update_Row
  (in the wide-row layout, each operation name is stored as a column whose value is null)
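For concreteness, a minimal CQL sketch of how the example registry contents above could be populated, and how triggering operations are looked up (values taken from the example above; the exact internal code path may differ):

insert into registry (target, operation) values ('School.Teacher', 'AlterCF_Add');
insert into registry (target, operation) values ('School.Teacher', 'Truncate');
insert into registry (target, operation) values ('School.Teacher.John', 'Update_Row');
-- Single-partition lookup of all triggering operations registered for one object:
select operation from registry where target = 'School.Teacher';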
W-Cassandra: Log Table
Takeaways
• All metadata for a given object
stored as columns within one row
• Orders entries by time inserted
→ Querying all metadata for one object is fast
Schema of “log” table (in CQL)
create table log(
  target text,       -- partitioning key
  operation text,    -- clustering key
  time long,         -- clustering key
  client text,       -- clustering key
  value text,
  primary key(target, operation, time, client));
Example “log” contents
  School.Teacher
    AlterCF_Add-1509051314-admin → {col_name:address, col_type:text, compaction_class:SizeTieredCompactionStrategy}
    AlterCF_Add-2009051414-admin → {col_name:mobile, col_type:text, compression_sstable:DefaultCompressor}
  School.Teacher.John
    Update_Row-1510051314-admin → {col_name:address, col_old_val:null, col_new_val:’Urbana,IL’, ttl:432000}
    Update_Row-2010051414-admin → {col_name:mobile, col_old_val:null, col_new_val:’55555’, ttl:432000}
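A minimal CQL sketch of querying this log table, using the example rows above (the clustering order on time is what makes time-ranged, per-object queries fast):

-- All Update_Row metadata for one row object within a time window:
select time, client, value from log
where target = 'School.Teacher.John' and operation = 'Update_Row'
and time >= 1510051314 and time <= 2010051414;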
10
Use Case 1: Flexible Column Drop
Cassandra JIRA Issue 3919
• When a column is deleted, its data doesn’t go away
• Re-adding a new empty column still leaves old data available
for querying!
Wasef allows us to address this JIRA issue, and build a new
flexible column drop feature
The flexible column drop feature is akin to the “Trash Bin” in OSs today
• When a column is dropped, it is no longer available for
querying
• However, column is not deleted immediately
• The sysadmin has a grace period to “rescue” the deleted column
• Or the sysadmin can explicitly delete the column for good
[Figure: column drop lifecycle — Original Schema; First Column Drop → Tentative Drop (delete schema only); Add Column; Second Column Drop or Grace Period Expires → Permanent Drop (delete schema and data)]
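A minimal sketch of the two phases at the CQL level (shown with present-day ALTER TABLE syntax for illustration; the operation label and value format are hypothetical, and the actual feature lives inside Cassandra’s schema-alteration path):

-- Tentative drop: remove the column from the schema only, and log its definition
-- so it can be rescued during the grace period
alter table School.Teacher drop address;
insert into log (target, operation, time, client, value)
values ('School.Teacher', 'AlterCF_Drop', 1509051314, 'admin', '{col_name:address, col_type:text}');
-- Rescue within the grace period: re-add the column definition recorded in the log
alter table School.Teacher add address text;
-- If the grace period expires (or the column is dropped a second time),
-- Wasef permanently deletes both the schema entry and the column’s data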
11
Use Cases 2 and 3
• Use Case 2: Automated Node Decommissioning
• When a node is decommissioned, today sysadmin needs to manually check
ranges of tokens (keys)
• W-Cassandra automates this checking process
• Use Case 3: Data Provenance
• Today, NoSQL systems do not support tracking of provenance of data items
1. Where did this data item come from?
2. How was this data item generated/modified?
• Wasef tracks these two (for requested objects)
12
Evaluation on AWS: System Throughput
Setup
• AWS Cluster (6 machines)
• EC2 m1.large instances
• YCSB Heavy Workload from clients
• 12 GB of data
• 1M operations per run
• Plot shows maximum achievable throughput
Wasef lowers throughput
by only 9%
13
Latency Results
• Compared to Cassandra, Wasef:
• Affects read latency by only 3%
• Affects update latency by 15%
• Can be optimized further
• Latencies are not affected by metadata size (up to 8% of data)
14
Scalability With Cluster Size
Setup
• Increase cluster size from 2 to 10
servers
• Also proportionally increase dataset
size and client load
• {2GB data, 25 threads} per server
• Each point is the average of 1M
operations
Wasef’s overhead is only about 10% and rises slowly with cluster size
15
Use Case: Column Drop
Setup
• Customized client
• 4 nodes
• 8 GB dataset
• Each bar is the average of 500 drop operations
Dropping a column is 5% slower
(and is sometimes faster)
Note: The Wasef implementation is correct, while Cassandra 1.2’s is not
16
Summary
• Wasef is the first system to support metadata as a first-class citizen for
NoSQL data stores
• Modular, flexible, queryable, minimally intrusive
• W-Cassandra
• We augmented Cassandra 1.2.x with Wasef
• Implemented 3 use-case scenarios: Flexible Column Drop, Automated Node
Decommissioning, Data Provenance
• Performance
• Incurs low overheads on throughput and latency
• Scales well with cluster size, workload, data size, and metadata size
• Code is available for download at:
Distributed Protocols Research Group: http://dprg.cs.uiuc.edu
17
Backup Slides
18
Related Work
Wasef is not
1. Database catalog (Structural metadata)
• Describes database entities and the hierarchical relationships between them.
• Wasef collects descriptive and administrative metadata.
2. Zookeeper, Chubby, or Tango (Standalone metadata services)
• Wasef is a subsystem of the NoSQL datastore which collects metadata during system
operations.
3. Amazon S3, Azure Cloud Store, Google Cloud Data Store
• Metadata can be associated with the stored objects. However, metadata is limited in
size (10s of KB) and metadata operations are inflexible.
• Wasef treats metadata the same as any other system data.
4. Trio: data provenance system for RDBMS
• Scalability is a big issue.
Collecting metadata in NoSQL data stores is a relatively new field
19
Use Case 2: Node Decommissioning
Setup
• 4 nodes
• 4 GB dataset
• Token ranges per node increased from 64 to 256
The average overhead is 1.5%
Overhead is smaller at larger data sizes
20
Scalability With Metadata Size
Update and Read Latencies are Largely
Independent of Size of Metadata
21
2. Verification tool for node decommissioning
operation
Node decommissioning from cluster
nodetool decommission
• A critical operation when the replication factor is one
• Cannot be verified in the standard version
How the tool works
• During node decommission: store the new replicas for the handed-off token ranges in the Log table
(target: node IP; logged metadata: the token ranges and their new replicas)
• To verify:
nodetool decommission -verify <decommission node IP>
Token ranges are retrieved from the log and checked for existence in the system
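A minimal sketch of the metadata involved, assuming one log entry per handed-off token range (the operation label, value format, and addresses are hypothetical):

-- During decommission of node 10.0.0.5: record the new replica for each handed-off range
insert into log (target, operation, time, client, value)
values ('10.0.0.5', 'Decommission', 1509051314, 'admin', '{range:(100, 200], new_replica:10.0.0.7}');
-- Verification: read the logged ranges back and check that each is now served by its new replica
select value from log where target = '10.0.0.5' and operation = 'Decommission';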
22
3. Providing Data Provenance
Data Provenance:
• The history of an item, which includes its source, derivation, and ownership.
• It increases the value of the item since it proves its authenticity and reproducibility
(e.g., documenting the workflow of a scientific experiment)
Wasef provides data provenance by design. It collects:
• Target full name
• Operation name
• Timestamp
• The authenticated session owner name
• The results (depending on the operation)
Provenance data is treated like client data (it can be queried, searched, replicated, ...)
Garbage collection is not supported
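For example, the full provenance history of a single row object can be pulled with one partition-level query against the log table (a sketch using the example object from the earlier slides):

-- Every logged operation on this object: who performed it, when, and with what effect
select operation, time, client, value from log where target = 'School.Teacher.John';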
23
Experiments
• We modified Cassandra to incorporate Wasef
• We ran our system on AWS (Amazon Web Services)
• Settings
• EC2 (m1.large) Instances to evaluate our W-Cassandra System
• Each instance has 2 virtual CPUs (4 ECUs), 7.5 GB of RAM, and 480GB of ephemeral disk storage.
They run Ubuntu 12.04 64-bit.
• Workload: YCSB (Yahoo Cloud Serving Benchmark)
• Heavy workload (50% reads, 50% updates), Zipfian distribution; the client uses a separate machine.
24