Cisco Data Preparation

Data Sheet
Cisco Data Preparation
Unleash your business analysts to develop the insights that drive better business
outcomes, sooner, from all your data.
As self-service business intelligence (BI) and analytics become the model, getting the data you need ready is the
biggest challenge in any analytical exercise. Gathering the data, identifying duplicate data or blank fields, fixing
misspellings, splitting columns, and adding data are difficult and time-consuming tasks. Time wasted on data
preparation is time that could be spent on analysis.
Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to gather,
explore, cleanse, combine and enrich the data that fuels analytics tools like Excel, Tableau, Qlik, SAS, and more.
Cisco Data Preparation:
• Is a comprehensive data preparation solution that provides all essential data preparation functions
from any data source, to any analytic or BI tool, with built-in governance.
• Works the way business analysts work, allowing data exploration with immediate feedback in an
experience similar to Excel, without coding or scripting.
• Automates the difficult, time-consuming work required by proactively guiding actions via intelligence
that improves based on use.
Legacy extract, load, and transform (ELT) processes are slow and put additional burden on an already-backlogged
IT department. Complex tools require expertise, and basic tools like Excel lack features and don’t scale. Cisco
Data Preparation lets business analysts get answers faster, provides more comprehensive insights, and delivers
better business outcomes across hundreds of projects and thousands of users.
Features
Cisco Data Preparation is a self-service application that makes it easy for nontechnical business analysts to
gather, explore, cleanse, combine, and enrich the raw data that fuels analytics. Table 1 lists the features of Cisco
Data Preparation.
© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.
Table 1. Data Preparation Features

Feature: Description
Data: Gather data regardless of where it comes from (Hadoop Distributed File System [HDFS], relational databases, Excel, flat files).
Explore: Find quality issues by engaging in ad hoc interactive exploration with full text search, interactive text and numeric filters and histograms, and visual data quality heat maps that highlight patterns, errors, duplicates, and sparse or missing data. One of the fastest ways to do this is with the aggregation feature.
Clean: Runs a set of sophisticated algorithms across specific sections of the data or across entire data sets. Without any coding or scripting, Data Preparation then highlights inconsistencies, gaps, and duplicate data so that analysts can fill in blanks, remove or rename duplicates, fix inconsistent capitalization, and perform other tasks needed to improve the data.
Shape: Pivot or de-pivot data in a single click; quickly split columns and create aggregations to make the data sets more suitable for the required analytic exercise.
Enrich: Provides the context needed for an analytic exercise. For example, industry data, 5-digit ZIP codes extended with +4 codes, or additional information from third-party data providers can be included.
Combine: Combines data using machine learning. Automatically detects common attributes across multiple data sets, and then provides best-match options to the analyst, who chooses which combination to use for their analytic needs. With one click, analysts can assemble multiple data sets into a single answer set, and then merge multiple overlapping entity references into de-duplicated, trusted entities without any scripting, SQL, or complex Excel functionality like VLOOKUP, pivot tables, and macros.
Publish: Makes answer sets available directly through ODBC LiveQuery to Qlik, Tableau, Excel, and any other ODBC-compliant analytics tools or applications.
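
To make the Explore, Clean, and Shape rows above concrete, the following is a minimal Python sketch of the kinds of steps they describe: counting blank fields, filling blanks, fixing inconsistent capitalization, removing duplicates, and splitting a column. It is an illustration only and is not how Cisco Data Preparation is implemented (the product requires no coding). The function names, the sample records, and the "UNKNOWN" placeholder are all assumptions made for this example.

```python
def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def profile_blanks(rows, columns):
    """Count blank values per column (a crude stand-in for a quality heat map)."""
    return {col: sum(is_blank(row.get(col)) for row in rows) for col in columns}


def clean_rows(rows, text_columns, fill_value="UNKNOWN"):
    """Fill blanks, normalize capitalization, and drop exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        fixed = dict(row)
        for col in text_columns:
            value = fixed.get(col)
            # Hypothetical policy: blanks get a placeholder, text gets Title Case.
            fixed[col] = fill_value if is_blank(value) else str(value).strip().title()
        key = tuple(sorted(fixed.items()))
        if key not in seen:  # keep only the first occurrence of each row
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def split_column(rows, column, new_columns, sep=","):
    """Split one column into several, padding with empty strings if needed."""
    result = []
    for row in rows:
        parts = [p.strip() for p in str(row.get(column, "")).split(sep, len(new_columns) - 1)]
        parts += [""] * (len(new_columns) - len(parts))
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_columns, parts))
        result.append(new_row)
    return result


# Made-up sample records with typical quality problems.
raw = [
    {"customer": "acme corp", "contact": "Lovelace, Ada"},
    {"customer": "ACME CORP ", "contact": "lovelace, ada"},
    {"customer": "Globex", "contact": ""},
]

blank_counts = profile_blanks(raw, ["customer", "contact"])
cleaned = clean_rows(raw, ["customer", "contact"])
shaped = split_column(cleaned, "contact", ["last_name", "first_name"])
```

With the sample data, the two differently capitalized "Acme Corp" records collapse into one row after cleaning, and the blank contact is replaced by the placeholder before the column is split.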
Data Preparation with Data Virtualization
Combining Cisco Data Preparation and Cisco Data Virtualization accelerates time-to-analytic solutions. Self-service
data preparation tools, combined with IT-curated data access using Data Virtualization, provide your
business with the data and agility you need. This closed-loop data management process aligns business and IT:
business gets the data and agility it needs, and IT delivers on the governance, scalability, and control it
requires.
Table 2 lists the Data Virtualization integration areas.
Table 2. Data Virtualization Integration Areas

Integration Area: Description
Data Virtualization data sources:
● Find Data in Business Directory – Connect to Business Directory, which contains curated data from one or more instances of Cisco Information Server. The curated data has been vetted by IT, annotated by end users, and gains value from repeated use.
● Find Data in Cisco Information Server (CIS) – Connect to CIS and gain access to a broader range of virtualized data. The virtualized data is integrated from numerous sources, ranging from databases to packaged apps to cloud sources.
● Ingest Data via Cisco Information Server – Use CIS to load Data Preparation. Integrate any combination of virtual and physical data quickly and load it into Data Preparation to refine data for analytics.
Data Virtualization deployments:
● Promote answer sets into CIS – Data sets prepared by business users in Data Preparation can be further operationalized to CIS and Business Directory. This allows wider adoption and consumption of Data Preparation output.
Technology
Cisco Data Preparation runs on an enterprise-scale platform built on Hadoop and powered by Spark. It is built on a
four-layer architecture designed for interactive, self-service data preparation at scale. (See Figure 1.)
1. User interface layer: Analysts quickly learn and enjoy using Data Preparation’s visually dynamic, multi-user
interface, which is built with HTML5 and WebSocket technology to make the application interactive and intuitive.
2. Web services: A lightweight Java layer translates and mediates actions from the user interface into commands
to the underlying platform layer. This layer enforces rules for tenants, users, projects, and cell-level
modifications, creating a comprehensive governance foundation.
3. Engines: Enabled by proprietary machine learning, latent semantic indexing, statistical pattern recognition, and
text analytics techniques. The first engine has parallel in-memory pipelined capabilities that vastly
accelerate many of the mundane data preparation functions. The second engine leverages Spark and operates
over a large variety and volume of structured and unstructured data in real time, enabling Cisco Data
Preparation to scale to billions of rows.
4. File management and storage: Provides a cost-effective data management environment. Data sets are stored
and accessed through the library, which resides on top of HDFS.
Figure 1. Cisco Data Preparation Architecture
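
The engines layer is described as pipelined and in-memory. The engines themselves are proprietary, so the following Python sketch only illustrates the general idea of pipelining under simple assumptions: each stage consumes rows from the previous stage as they are produced, so no intermediate data set is written out between steps. Python generators are lazy but not parallel, and the stage names and sample lines are hypothetical.

```python
def read_rows(lines):
    """Stage 1: parse comma-separated lines lazily, one row at a time."""
    for line in lines:
        yield [field.strip() for field in line.split(",")]


def drop_incomplete(rows, width):
    """Stage 2: skip rows that have the wrong width or any blank field."""
    for row in rows:
        if len(row) == width and all(row):
            yield row


def normalize(rows):
    """Stage 3: lowercase every field so later comparisons are consistent."""
    for row in rows:
        yield [field.lower() for field in row]


# Made-up input; the second line has a blank city and is filtered out.
raw_lines = ["Ada, London", "Grace,", "Linus, Helsinki"]

# Stages are chained; nothing runs until the final consumer pulls rows.
pipeline = normalize(drop_incomplete(read_rows(raw_lines), width=2))
prepared = list(pipeline)
```

Because each stage yields rows one at a time, a row can pass through all three steps before the next line is even parsed, which is the property that pipelined engines exploit at much larger scale.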
Data Preparation on Cisco UCS Big Data Infrastructure
Data Preparation installed on Cisco UCS® scales without limits by taking advantage of Cisco’s high-performance
and easy-to-manage big data infrastructure. Cisco UCS provides a radically simplified architecture with embedded
management that makes it easy to scale as your requirements evolve to solve larger problems and explore more
complex scenarios. It also reduces your total cost of ownership (TCO) by requiring fewer infrastructure components
and reducing operating expenses associated with staff resources. Together, Data Preparation and Cisco UCS help
you solve complex analytical problems, improve business performance, and mitigate risk rapidly and confidently.
The recommended configuration for the Cisco Data Preparation platform deployment is based on Cisco UCS
C220 M4/C240 M4, with:
● Two Intel Xeon E5-2680 v3 processors
● 256 GB RAM
● 10K RPM SAS HDD or SSD drives, which work with an external Hadoop cluster for data storage
Table 3 lists the benefits of using Cisco UCS.
Table 3. Cisco UCS Highlights

Highlights: Benefits
Reliable scalability: Cisco Unified Computing System™ delivers reliable scalability of hardware and management to increase business agility and operational efficiency and to help you respond rapidly to changing business requirements.
Reduced TCO and improved staff efficiency: This simplified, intelligent infrastructure reduces your TCO with fewer management points, switches, adapters, cables, and power and cooling components.
Data preparation on Cisco UCS: Cisco Data Preparation on Cisco UCS streamlines customers’ ability to prepare their data for analytics at scale, and can be seamlessly integrated into existing enterprise application environments.
Service and Support
Cisco Services help you gain better visibility, better information, and better understanding to fuel performance,
efficiency, and innovation from your software purchases. Cisco Services span three phases of lifecycle
management: plan, build, and manage. In the plan phase, Cisco helps you develop your Cisco Data
Preparation architecture strategy and transformational roadmap in alignment with your business requirements. In
the build phase, Cisco works with you to validate that the Data Preparation solution you designed is ready for
production and then implements, integrates, or migrates new solutions and applications. In the manage
phase, Cisco helps you optimize your infrastructure, applications, and service management approach, and
monitors and manages your Data Preparation deployment. Technical support is part of the services provided during
the manage phase. It delivers around-the-clock Data Preparation product support from Cisco’s Technical
Assistance Center (TAC) and provides timely, uninterrupted access to Cisco’s latest software application
updates, including major upgrade releases that might include new features and functionality.
System Requirements
Cisco Data Preparation has the following system requirements:
Operating System
● 64-bit (x64) operating system
● CentOS Linux, v6.4 and 6.5 for development and testing
Software
● JDK 7 version 1.7 update 67
● Cloudera Distribution of Hadoop (CDH) 4.7 and 5.4
● Apache Spark 1.3 (prebuilt for CDH 5)
Others
● Cisco Information Server 7.0.2 or later
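
As a rough aid for checking a host against the operating system and JDK requirements above, the following Python sketch parses version strings. It is not a Cisco-supplied tool. It assumes the usual `java -version` banner format (for example `java version "1.7.0_67"`) and the usual CentOS 6 release string (for example `CentOS release 6.5 (Final)`). Treating update 67 as a minimum within Java 1.7 is also an assumption, since the data sheet lists only that one update.

```python
import re

# Values taken from the requirements list; the comparison policy is assumed.
REQUIRED_JAVA = (1, 7, 0, 67)
SUPPORTED_CENTOS = {(6, 4), (6, 5)}


def parse_java_version(banner):
    """Extract (major, minor, patch, update) from a `java -version` banner."""
    match = re.search(r'version "(\d+)\.(\d+)\.(\d+)_(\d+)"', banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def java_ok(banner):
    """Assumed policy: Java 1.7 only, at update 67 or later."""
    found = parse_java_version(banner)
    return found is not None and found[:2] == REQUIRED_JAVA[:2] and found >= REQUIRED_JAVA


def centos_ok(release_text):
    """Check a release string such as 'CentOS release 6.5 (Final)'."""
    match = re.search(r"CentOS.*release (\d+)\.(\d+)", release_text)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) in SUPPORTED_CENTOS
```

In practice the banner would come from running `java -version` (which writes to standard error) and the release string from a file such as `/etc/centos-release`; both are passed in as plain text here so the checks can be exercised without touching the host.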
Ordering Information
Cisco Data Preparation is available for ordering. Table 4 lists the product identifiers required for ordering. To place
an order, contact your Cisco account representative.
Table 4. Ordering Information

PID: Product Description
CDP-P-T: Data Prep – per core term
CDP-P-1Y: Data Prep – per core term 1 yr
CDP-P-2Y: Data Prep – per core term 2 yr
CDP-P-3Y: Data Prep – per core term 3 yr
For More Information
For more information about Cisco Data Preparation, contact your Cisco account representative.
Printed in USA
09/15