Data Sheet

Cisco Data Preparation

Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data.

As self-service business intelligence (BI) and analytics become the model, getting the data ready is the biggest challenge in any analytical exercise. Gathering the data, identifying duplicate data or blank fields, fixing misspellings, splitting columns, and adding data are difficult and time-consuming tasks. Time spent on data preparation is time that could be spent on analysis.

Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to gather, explore, cleanse, combine, and enrich the data that fuels analytics tools like Excel, Tableau, Qlik, SAS, and more.

Cisco Data Preparation:
• Is a comprehensive data preparation solution that provides all essential data preparation functions from any data source, to any analytic or BI tool, with built-in governance.
• Works the way business analysts work, allowing data exploration with immediate feedback in an experience similar to Excel, without coding or scripting.
• Automates the difficult, time-consuming work required by proactively guiding actions via intelligence that improves based on use.

Legacy extract, load, and transform (ELT) processes are slow and put additional burden on an already-backlogged IT department. Complex tools require expertise, and basic tools like Excel lack features and don’t scale. Cisco Data Preparation lets business analysts get answers faster, provides more comprehensive insights, and delivers better business outcomes across hundreds of projects and thousands of users.

Features

Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to gather, explore, cleanse, combine, and enrich the raw data that fuels analytics. Table 1 lists the features of Cisco Data Preparation.

© 2015 Cisco and/or its affiliates. All rights reserved.
This document is Cisco Public.

Table 1. Data Preparation Features

Feature: Description
Data: Gather data regardless of where it comes from: Hadoop Distributed File System (HDFS), relational databases, Excel, or flat files.
Explore: Find quality issues by engaging in ad hoc interactive exploration with full text search, interactive text and numeric filters and histograms, and visual data quality heat maps that highlight patterns, errors, duplicates, and sparse or missing data. One of the fastest ways to do this is with the aggregation feature.
Clean: Runs a set of sophisticated algorithms across specific sections of the data or across entire data sets. Without any coding or scripting, Data Preparation then highlights inconsistencies, gaps, and duplicate data so that analysts can fill in blanks, remove or rename duplicates, fix inconsistent capitalization, and perform other tasks needed to improve the data.
Shape: Pivot or de-pivot data in a single click; quickly split columns and create aggregations to make the data sets more suitable for the required analytic exercise.
Enrich: Provides the context needed for an analytic exercise. For example, industry data, +4 extensions appended to 5-digit ZIP codes, or additional information from third-party data providers can be included.
Combine: Combines data using machine learning. Automatically detects common attributes across multiple data sets, and then provides best-match options to the analyst, who chooses which combination to use for their analytic needs. With one click, analysts can assemble multiple data sets into a single answer set, and then merge multiple overlapping entity references into de-duplicated, trusted entities without any scripting, SQL, or complex Excel functionality like VLOOKUP, pivot tables, and macros.
Publish: Makes answer sets available directly through ODBC LiveQuery to Qlik, Tableau, Excel, and any other ODBC-compliant analytics tools or applications.
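The Clean and Shape rows of Table 1 name operations such as filling blanks, removing duplicates, fixing inconsistent capitalization, splitting columns, and creating aggregations. Cisco Data Preparation performs these without coding or scripting. Purely to illustrate what those operations do to a data set, here is a minimal Python sketch. The sample rows, the field names, the choice to fill blank amounts with 0.0, and every function name are assumptions made for this example; none of them comes from the product.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are invented for illustration.
raw_rows = [
    {"customer": "acme corp", "city_state": "Austin, TX", "amount": "120.50"},
    {"customer": "Acme Corp", "city_state": "Austin, TX", "amount": "120.50"},
    {"customer": "Globex", "city_state": "Boston, MA", "amount": ""},
    {"customer": "Initech", "city_state": "Austin, TX", "amount": "75"},
]


def clean(rows, default_amount=0.0):
    """Clean: fix capitalization, fill blank amounts, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().title()
        # Assumption: a blank amount is filled with a default value.
        amount = float(row["amount"]) if row["amount"].strip() else default_amount
        key = (customer, row["city_state"], amount)
        if key in seen:
            continue  # same record after normalization, so skip it
        seen.add(key)
        cleaned.append(
            {"customer": customer, "city_state": row["city_state"], "amount": amount}
        )
    return cleaned


def split_city_state(rows):
    """Shape: split the combined 'city_state' column into 'city' and 'state'."""
    shaped = []
    for row in rows:
        city, state = [part.strip() for part in row["city_state"].split(",", 1)]
        shaped.append(
            {"customer": row["customer"], "city": city, "state": state,
             "amount": row["amount"]}
        )
    return shaped


def total_by_state(rows):
    """Shape: aggregate the amount column per state."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["state"]] += row["amount"]
    return dict(totals)


prepared = split_city_state(clean(raw_rows))
totals = total_by_state(prepared)
print(prepared)
print(totals)
```

The two "Acme Corp" rows differ only in capitalization, so they collapse into one record once the names are normalized. This is why normalization runs before de-duplication in the sketch.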
Data Preparation with Data Virtualization

Combining Cisco Data Preparation and Cisco Data Virtualization accelerates time to analytic solutions. Self-service data preparation tools, combined with IT-curated data access using Data Virtualization, provide your business with the data and agility you need. This closed-loop data management process aligns business and IT: business gets the data and agility it needs, and IT delivers on the governance, scalability, and control it requires. Table 2 lists the Data Virtualization integration areas.

Table 2. Data Virtualization Integration Areas

Integration Area: Description
Data Virtualization data sources:
● Find Data in Business Directory – Connect to Business Directory, which contains curated data from one or more instances of Cisco Information Server. The curated data has been vetted by IT, annotated by end users, and gains value from repeated use.
● Find Data in Cisco Information Server (CIS) – Connect to CIS and gain access to a broader range of virtualized data. The virtualized data is integrated from numerous sources, ranging from databases to packaged apps to cloud sources.
● Ingest Data via Cisco Information Server – Use CIS to load Data Preparation. Integrate any combination of virtual and physical data quickly and load it into Data Preparation to refine data for analytics.
Data Virtualization deployments:
● Promote answer sets into CIS – Data sets prepared by business users in Data Preparation can be further operationalized to CIS and Business Directory. This allows wider adoption and consumption of Data Preparation output.

Technology

Cisco Data Preparation runs on an enterprise-scale platform built on Hadoop and powered by Spark. It is built on a four-layer architecture designed for interactive, self-service data preparation at scale. (See Figure 1.)

1.
User interface layer: Analysts quickly learn and enjoy using Data Preparation’s visually dynamic, multi-user interface, which is built with HTML5 and WebSocket technology to make the application interactive and intuitive.
2. Web services: A lightweight Java layer translates and mediates actions from the user interface into commands to the underlying platform layer. This layer processes critical capabilities, including rules for tenants, users, projects, and cell-level modifications, creating a comprehensive governance foundation.
3. Engines: Enabled by proprietary machine learning, latent semantic indexing, statistical pattern recognition, and text analytics techniques. The first engine has parallel in-memory pipelined capabilities that vastly accelerate many of the mundane data preparation functions. The second engine leverages Spark and operates over a large variety and volume of structured and unstructured data in real time, enabling Cisco Data Preparation to scale to billions of rows.
4. File management and storage: Provides a cost-effective data management environment. Data sets are stored and accessed through the library, which resides on top of HDFS.

Figure 1. Cisco Data Preparation Architecture

Data Preparation on Cisco UCS Big Data Infrastructure

Data Preparation installed on Cisco UCS® scales without limits by taking advantage of Cisco’s high-performance and easy-to-manage big data infrastructure. Cisco UCS provides a radically simplified architecture with embedded management that makes it easy to scale as your requirements evolve to solve larger problems and explore more complex scenarios. It also reduces your total cost of ownership (TCO) by requiring fewer infrastructure components and reducing operating expenses associated with staff resources.
Together you can solve complex analytical problems, improve business performance, and mitigate risk rapidly and confidently.

The recommended configuration for the Cisco Data Preparation platform deployment is based on Cisco UCS C220 M4/C240 M4, with:
● Two Intel Xeon E5-2680 v3 processors
● 256 GB RAM
● 10K RPM SAS HDD or SSD drives, which work with an external Hadoop cluster for data storage

Table 3 lists the benefits of using Cisco UCS.

Table 3. Cisco UCS Highlights

Highlights: Benefits
Reliable scalability: Cisco Unified Computing System™ delivers reliable scalability of hardware and management to increase business agility and operational efficiency, and to help you rapidly respond to changing business requirements.
Reduced TCO and improved staff efficiency: This simplified, intelligent infrastructure reduces your TCO with fewer management points, switches, adapters, cables, and power and cooling components.
Data preparation on Cisco UCS: Cisco Data Preparation on Cisco UCS streamlines customers’ ability to prepare their data for analytics at scale, and can be seamlessly integrated into existing enterprise application environments.

Service and Support

Cisco Services help you gain better visibility, better information, and better understanding to fuel performance, efficiency, and innovation from your software purchases. Cisco Services span three phases of lifecycle management: plan, build, and manage.

In the plan phase, Cisco helps you develop your Cisco Data Preparation architecture strategy and transformational roadmap in alignment with your business requirements. In the build phase, Cisco works with you to validate that the Data Preparation solution you designed is ready for production and then implements, integrates, or migrates new solutions and applications.
In the manage phase, Cisco helps you optimize your infrastructure, applications, and service management approach, and monitors and manages your Data Preparation deployment. Technical support is part of the services provided during the manage phase and delivers around-the-clock Data Preparation product support from Cisco’s Technical Assistance Center (TAC). It also provides timely, uninterrupted access to Cisco’s latest software application updates, including major upgrade releases that might include new features and functionality.

System Requirements

Cisco Data Preparation has the following system requirements:

Operating System
● 64-bit (x64) operating system
● CentOS Linux, v6.4 and 6.5 for development and testing

Software
● JDK 7 version 1.7 update 67
● Cloudera Distribution of Hadoop (CDH) 4.7 and 5.4
● Apache Spark 1.3 (prebuilt for CDH 5)

Others
● Cisco Information Server 7.0.2 or later

Ordering Information

Cisco Data Preparation is available for ordering. Table 4 lists the product identifiers required for ordering. To place an order, contact your Cisco account representative.

Table 4. Ordering Information

PID: Product Description
CDP-P-T: Data Prep – per core term
CDP-P-1Y: Data Prep – per core term 1 yr
CDP-P-2Y: Data Prep – per core term 2 yr
CDP-P-3Y: Data Prep – per core term 3 yr

For More Information

For more information about Cisco Data Preparation, contact your Cisco account representative.

Printed in USA 09/15