Benchmarking Not-Only-SQL DataBases Over High Performance

NoSQL DB Benchmarking with High-Performance Networking Solutions
WBDB, Xian, July 2013
© 2013 Mellanox Technologies
Leading Supplier of End-to-End Interconnect Solutions
[Diagram labels: Server / Compute; Storage; Switch / Gateway; Front / Back-End; Virtual Protocol Interconnect; 56G InfiniBand and FCoIB; 10/40/56GbE and FCoE; Fibre Channel]
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
ICs
Adapter Cards
Switches/Gateways
Host/Fabric Software
Cables
Motivation to Accelerate Data Analytics
▪ Data Analysis Requires a Faster Network
• The Hadoop MapReduce framework is a network-intensive workload
- Mapped data is shuffled between nodes in the cluster
• Data Replication
- A high-availability event triggers multiple terabytes of data movement
▪ Provide Higher Data Value
• Expose the low-latency capabilities of SSDs
• Better server/CPU utilization
Big Data Applications Require a High-Bandwidth, Low-Latency Interconnect
* Data Source: Intersect360 Research, 2012, IT and Data scientists survey
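To put the replication point above in rough numbers, the following sketch estimates how long a multi-terabyte data movement takes at the link speeds mentioned in this deck. The 2 TB volume and the `efficiency` parameter are illustrative assumptions, not figures from the benchmarks, and the model ignores disk throughput and protocol overhead.

```python
TB = 10**12  # decimal terabyte, in bytes


def transfer_seconds(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal time to move data_bytes over a link of link_gbps gigabits per second.

    efficiency is a hypothetical fraction of line rate actually achieved (0 < e <= 1).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Assumed volume: 2 TB re-replicated after a node failure.
    for gbps in (1, 10, 40, 56):
        minutes = transfer_seconds(2 * TB, gbps) / 60
        print(f"{gbps:>2} Gb/s: {minutes:7.1f} minutes")
```

Under these assumptions, the same movement that occupies a 1GbE link for hours completes in minutes on 40GbE or 56Gb/s links, which is the gap the slide is pointing at.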
Cassandra, Update Latency
▪ Cassandra Database enables update capabilities
▪ Latency factors
• Commit-log settings
• Workload
Cassandra, Read Latency
▪ Cassandra Database Read
▪ Latency factors
• Media used
• Workload
System Used for Cassandra Benchmark
▪ 5 Nodes in the Ring
▪ 64GB RAM
• 8 x 8GB DDR3 1333MHz
▪ 2 x E5-2670
• 8 Cores per socket
▪ 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB Hard Drive
• 7200 RPM
▪ NIC: Mellanox Technologies MT27500 Family [ConnectX-3]
• 10Gb Ethernet
• FW_VER=2.11.500
▪ Switch SX1036
▪ OS: RH 6.3
• MLNX_OFED_LINUX-1.5.3
▪ Apache Cassandra 1.1.12, 2 seeds
Unlocking the Power of SSDs in a Hadoop Environment
▪ SSDs become the de facto standard in HDFS deployments
• Read capability is a critical factor for application performance
▪ E-DFSIO, part of Intel's HiBench test suite, profiles aggregated throughput on the cluster
• A 1GbE network impedes any performance benefit from SSD deployment
E-DFSIO, Showing the Power of SSD @ HDFS
HBase Benchmarking, Update Latency
▪ Updates are made to server memory
• Extremely low latency for HBase
- Java GC policy hurts latency at high throughput
HBase Benchmarking, Read Latency
▪ Read latency hits the limits of the storage media
System Used for HBase Benchmarks
▪ 4 Region servers, 1 Master, 3 ZooKeeper quorum servers
▪ 64GB RAM
• 8 x 8GB DDR3 1333MHz
▪ 2 x E5-2670
• 8 Cores per socket
▪ 5 x Seagate® Constellation® ES SATA 6Gb/s 2TB Hard Drive
• 7200 RPM
▪ NIC: Mellanox Technologies MT27500 Family [ConnectX-3]
• 10Gb Ethernet
• FW_VER=2.11.500
▪ Switch SX1036
▪ OS: RH 6.3
• MLNX_OFED_LINUX-1.5.3
▪ Apache HBase 0.94.9, ZooKeeper 3.4.5, Apache Hadoop 1.1.2
Test Drive Your Big Data
EMC 1000-Node Analytic Platform Accelerates Industry's Hadoop Development
24 petabytes of physical storage
Mellanox VPI Solutions
Hadoop Acceleration
2X Faster Hadoop Job Run-Time
High Throughput, Low Latency, RDMA Critical for ROI
The Great Things in Hadoop Distributed File System
• HDFS is a block storage solution
• Block size can be modified to provide efficient solutions for very large files
• Inherent reliability: no need for a high-end storage solution to make sure the data is there
• Tuned for Hadoop workloads: write once, read many
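As a rough illustration of the block-size and replication points, the sketch below computes how many blocks a file occupies and how much raw capacity it consumes across the cluster. The function name and the example sizes are illustrative assumptions; the 64 MiB figure is the Hadoop 1.x default block size, and 3 is the default replication factor.

```python
MiB = 1024**2
GiB = 1024**3


def hdfs_footprint(file_bytes: int, block_bytes: int, replication: int = 3):
    """Return (block_count, raw_bytes) for one file.

    HDFS does not pad the final block, so raw usage is the file size
    multiplied by the replication factor.
    """
    if file_bytes < 0 or block_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    block_count = -(-file_bytes // block_bytes)  # integer ceiling division
    raw_bytes = file_bytes * replication
    return block_count, raw_bytes


if __name__ == "__main__":
    for block in (64 * MiB, 128 * MiB, 256 * MiB):
        blocks, raw = hdfs_footprint(1 * GiB, block)
        print(f"block={block // MiB:>3} MiB -> {blocks:>2} blocks, {raw / GiB:.0f} GiB raw")
```

Larger blocks mean fewer block entries for the metadata server to track for the same file, while the raw capacity cost is set by the replication factor alone.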
The Less Great Things in HDFS
• Metadata server failure
• It's hard to manage the different settings to get the right nodes into the right capabilities.
• Default 3x replication
• Small files or latency-sensitive workloads
• Ingress and extraction of data require additional tools.
Local Disks – The Common Practice
Other Distributed Storage Solution for Hadoop, Really?!
OrangeFS as Hadoop Storage Solution
Lustre as Hadoop Storage Solution
Source: Map/Reduce on Lustre, Hadoop Performance in HPC Environments, Nathan Rutman, Senior Architect, Networked Storage Solutions, Xyratex
Ceph as Hadoop Storage Solution
▪ Generating a lot of interest since the Ceph kernel client was pulled into Linux kernel 2.6.34
• Object-based parallel file system
• Scalable metadata server
• Each file can specify its own striping strategy and object size
• Automatic rebalancing of data with minimal data movement
• Hadoop module for integrating Ceph has been in development since the 0.12 release
▪ Benchmarks on Ceph are still WIP
• We are currently working on running benchmarks on Ceph. Stay tuned!
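To make the per-file striping point more concrete, here is a simplified model of RAID-0 style striping of a file across objects. The parameter names `stripe_unit`, `stripe_count`, and `object_size` follow the layout terms used in Ceph's documentation, but the function `locate` is a hypothetical illustration and not Ceph code; the example sizes are assumptions.

```python
MiB = 1024**2


def locate(offset: int, stripe_unit: int, stripe_count: int, object_size: int):
    """Map a file byte offset to (object_number, offset_within_object).

    Stripe units are written round-robin across stripe_count objects;
    once each object in the set is full, the next object set begins.
    """
    if offset < 0 or stripe_unit <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0 and layout values positive")
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    units_per_object = object_size // stripe_unit
    unit_no = offset // stripe_unit            # index of the stripe unit in the file
    stripe_no = unit_no // stripe_count        # which full stripe (row) it belongs to
    stripe_pos = unit_no % stripe_count        # column within that stripe
    object_set = stripe_no // units_per_object
    object_no = object_set * stripe_count + stripe_pos
    object_offset = (stripe_no % units_per_object) * stripe_unit + offset % stripe_unit
    return object_no, object_offset


if __name__ == "__main__":
    # Assumed layout: 1 MiB stripe units over 4 objects of 4 MiB each.
    for off in (0, 1 * MiB, 4 * MiB, 16 * MiB):
        print(off // MiB, "MiB ->", locate(off, 1 * MiB, 4, 4 * MiB))
```

With a small stripe unit and several objects per set, consecutive reads fan out over multiple storage nodes, which is the kind of per-file tuning the bullet above refers to.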
Thank You