DATASHEET
BASHO DATA PLATFORM

Basho Data Platform for High Availability and Simplified Operations

Businesses want to use Big Data, the Internet of Things (IoT), and hybrid cloud applications to their competitive advantage. However, the components required to make these applications work can be very complex. Applications must be easy to manage while keeping the data highly available and massively scalable.

Basho Data Platform provides a distributed, scalable, fault-tolerant framework and resource manager for integrating databases and other key components of Big Data applications. It simplifies the process of integrating and deploying Big Data, IoT, and hybrid cloud applications by providing a complete set of data services. Basho Data Platform Core Services deploy, manage, and synchronize data in and between Storage Instances (Riak® KV, Riak® S2) and Service Instances (Apache™ Spark, Redis, Apache™ Solr).

"Basho specializes in solving distributed systems challenges, and integrated approaches such as Basho Data Platform help ensure that applications are highly available, massively scalable, and easy to deploy at production scale."
– Mac Devine, VP and CTO, IBM Cloud Services at IBM

Basho Data Platform Benefits

REDUCED COMPLEXITY
Integrated NoSQL databases, caching, in-memory analytics, and search components

HIGH AVAILABILITY
Fault tolerance across Riak KV, Apache Spark, and Redis clusters

SCALABILITY
Auto-scale clusters as data grows

REAL-TIME ANALYTICS
Integrated Apache Spark and Riak KV

FAST APPLICATION PERFORMANCE
Integrated Redis caching and Riak KV

OPTIMIZED SEARCH
Integrated Apache Solr and Riak KV

DATA SYNCHRONIZATION
Replicate and synchronize data within and across Riak KV, Apache Spark, and Redis

OPERATIONAL SIMPLICITY
Automated deployment and cluster management for Riak KV, Riak S2, Apache Spark, and Redis

Distributed Big Data and IoT Application Challenges
Modern applications have to serve customers located around the globe who want immediate access to information, no matter where they are or what device they are using. Ensuring that enterprise applications are available, scalable, and fast requires a complex technology stack. Here are some of the application challenges:

COMPLEX INFRASTRUCTURE
Globally available Big Data applications require multiple databases along with multiple service components, none of which integrates easily with the others. Companies require additional, specialized staff to manage the various components.

BASHO TECHNOLOGIES, INC. // WWW.BASHO.COM

DOWNTIME
Systems fail. When a cluster goes down, operations teams must scramble to get applications back up and running. Every second of downtime leads to lost sales and unhappy customers.

SCALING
Traditional relational databases are often manually sharded to accommodate growth. As the databases grow, additional hardware capacity is required. Both manual sharding and adding capacity are time-consuming, cumbersome processes that drain resources.

Basho Data Platform Solves These Challenges

Basho Data Platform addresses these challenges by providing a new way to build, deploy, and manage enterprise applications.

Introducing the Basho Data Platform

Basho Data Platform takes the complexity out of building and deploying active workloads in Big Data, IoT, and hybrid cloud applications by controlling the replication and synchronization of data between components. It provides cluster management that keeps your application highly available and scalable. Basho Data Platform supports multiple database models and tightly integrates Riak KV with Apache Spark, Redis, and Apache Solr.

Riak KV: Distributed NoSQL Database

A key/value data store that is highly available, scalable, and easy to operate, Riak KV is designed to intelligently replicate and retrieve data so your Big Data, IoT, or hybrid cloud application is always available.
Its scale-out architecture lets you add capacity seamlessly using commodity hardware for near-linear performance improvement. With global replication and masterless architecture, Riak KV serves data quickly and predictably even under peak load conditions. Multi-Cluster Replication makes it easy to replicate clusters across your datacenter or around the world, enabling data geolocation, secondary analytics, or business continuity.

High Availability: Replicates and retrieves data intelligently, making it available for read and write operations even in failure conditions.
Scalability: Automatically distributes data around the cluster and yields a near-linear performance increase as capacity is added.
Operational Simplicity: Add machines to the cluster easily, without a large operational burden.
Multi-Model: Includes key/value, search, and object storage in a single platform to support multiple types of data models.
Riak KV Data Types: Conflict-free replicated data types (CRDTs): flags, registers, counters, sets, and maps. You don’t have to write code to deal with data conflicts.
Multi-Cluster Replication: Serve global traffic, maintain active backups, run secondary analytics clusters, or meet disaster recovery and regulatory requirements.
Monitoring: Supports both SNMP (an SNMP server is built in) and JMX monitoring.
Low Latency: Store data and serve requests predictably and quickly, even during peak times.
Fault Tolerance: Lose access to nodes due to network partition or hardware failure and never lose data.
Robust APIs and Client Libraries: Protocol Buffers (PBC) and HTTP APIs give developers the flexibility to build in their preferred language. Supported languages include Java, Ruby, Python, C#, Erlang, .NET, and Node.js.

Apache Spark Add-On: In-Memory Analytics

The Apache Spark Add-On is architected for high performance, real-time analysis, and persistence of your data.
Cluster Management: No need for ZooKeeper. Manage Spark clusters at scale using built-in leader election enabled by the Spark Connector.
Performance at Scale: Apache Spark Add-On is architected for high performance, real-time analysis, and Riak KV persistence of Big Data.
Fast Data Mover: Intelligently load data into Spark clusters to minimize network traffic and processing overhead.
Automated Deployment: Quickly deploy and configure Spark clusters with Riak KV and auto-start failed Spark instances to reduce manual operations.
Write-Back to Riak: Store intermediate and final results back into Riak KV for further processing by Spark or other Big Data application components.
Application Simplicity: Integrate and update real-time analytics, caching, and search technologies to simplify the design and operations of Big Data applications.

Redis Add-On: Redis Caching

Increase application performance with Redis Caching. With Basho Data Proxy, Redis is now highly available. Combining the speed of Redis with the power of Riak KV provides low-latency, high performance at scale.

High Availability: Using the Basho Data Proxy, the high-performance caching capabilities of Redis become highly available.
Fast Cache: Combining the speed of Redis with the power of Riak KV provides low-latency, high performance at scale.
Automatic Sharding: Labor-intensive, error-prone manual sharding is a thing of the past with automatic data sharding across multiple cache servers.
Automatic Data Synchronization: Data is automatically synchronized between Redis and Riak KV, and the Basho Redis Proxy resolves cache misses without requiring custom code to populate the cache.
Automated Deployment and Management: Easily deploy and configure Redis instances with Riak KV, and auto-start failed instances or disable on failure to reduce manual processes.
Application Simplicity: Integrate and update Redis caching, real-time analytics, and search technologies to simplify your Big Data application.
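
The cache-miss behavior described for the Redis Add-On follows the general read-through caching pattern. The sketch below illustrates that pattern only and is not the Basho Data Platform API: the class name ReadThroughProxy and its methods are hypothetical, and plain Python dictionaries stand in for a Redis cache and a Riak KV store. Updating the cache on every write is likewise an assumption about how synchronization might work.

```python
"""Read-through caching sketch.

Assumptions: ReadThroughProxy is a hypothetical name, and plain dicts
stand in for Redis (cache) and Riak KV (store). Not a Basho API.
"""

from typing import Dict, Optional


class ReadThroughProxy:
    """Answers reads from the cache and falls back to the store on a miss."""

    def __init__(self, cache: Dict[str, str], store: Dict[str, str]) -> None:
        self.cache = cache  # stand-in for a Redis cache
        self.store = store  # stand-in for a Riak KV bucket
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Cache hit: return immediately without touching the store.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: read from the store and populate the cache,
        # so the caller never writes cache-population code itself.
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Assumed write path: persist first, then refresh the cache
        # so both copies stay in sync.
        self.store[key] = value
        self.cache[key] = value


store = {"user:1": "alice"}
proxy = ReadThroughProxy(cache={}, store=store)

first = proxy.get("user:1")    # miss: loaded from the store
second = proxy.get("user:1")   # hit: served from the cache
missing = proxy.get("user:2")  # miss: key exists in neither place

print(first, second, missing, proxy.hits, proxy.misses)
```

In this sketch the application only ever calls get and put on the proxy; deciding when to consult the store and when to fill the cache stays inside the proxy.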
Apache Solr Add-On: Riak Search

Standard full-text Solr queries auto-expand into distributed search queries. Broad support is provided for Solr query parameters. Intelligent monitoring detects and propagates changes to Solr indexes. Use existing Solr APIs to query data in Riak KV.

Ad-Hoc and Range Queries: Exact match, globs, inclusive/exclusive range queries, AND/OR/NOT, prefix matching, proximity searches, term boosting, sorting, pagination, and more.
Index Any Data Type: Support for various MIME types (JSON, XML, plain text, Riak Data Types) for automatic data extraction, along with support for custom Search extractors and for various language-specific analyzers, tokenizers, and filters.
Familiar Interfaces: Protocol Buffer interface for Riak and Solr interface via HTTP.
Simplified Search: Scoring and ranking for most relevant results.
Familiar Languages: Robust, easy-to-use query languages like Lucene (default) and DisMax.

Data Replication and Synchronization

Replicate and synchronize data across and between Riak KV and Spark, Redis, and Solr Service Instances to ensure data accuracy with no data loss and ensure high availability of your application.

Internal Data Store

A built-in, distributed data store for ensuring speed, fault tolerance, and ease of operations is used to persist static and dynamic configuration data (port number and IP address) across the Basho Data Platform.

Cluster Management & Monitoring

Integrated cluster management automates deployment and configuration of Riak KV, Spark, and Redis. Once deployed in production, issues can be automatically detected, and Redis instances or Spark clusters can automatically restart. Cluster management eliminates the need for ZooKeeper to manage Spark clusters.

24/7 CUSTOMER SUPPORT

Basho offers 24/7 access to Basho’s Client Services team, including 1-hour response time for emergency production help.
Basho’s support team has extensive experience with production installations and has worked on some of the largest Riak KV clusters in the world. Enterprise licensees have unlimited access to that experience and knowledge. Basho provides SLAs based upon the severity of the issue with 24/7 coverage.

GET STARTED

If you are interested in more information and would like to discuss your possible use case, please contact us at techtalk@basho.com. For technical documentation about Basho Data Platform and Basho products, please visit our documentation portal at docs.basho.com. Case studies and white papers are available in our Resource Center at basho.com/resources.

About Basho Technologies

Basho is a distributed systems company dedicated to developing disruptive technology that simplifies enterprises’ most critical data management challenges. Basho has attracted one of the most talented groups of engineers and technical experts ever assembled, devoted exclusively to solving some of the most complex issues presented by scaling distributed systems. Basho’s distributed database, Riak® KV, the industry-leading distributed NoSQL database, and Basho’s cloud storage software, Riak® S2, are used by fast-growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile, and social applications. The Basho Data Platform helps enterprises reduce the complexity of supporting Big Data applications by integrating Riak KV and Riak S2 with Apache Spark, Redis, and Apache Solr. Basho is the organizer of RICON, a distributed systems conference.

Riak is a registered trademark of Basho Technologies, Inc.

Basho Technologies, Inc.
617.714.1700 // www.basho.com
10900 NE 8th Street
Seattle, WA 98004