Data is Loaded, Onto Big Data Analytics

Zunesis
9000 E. Nichols Ave, Suite 150
Centennial, CO, 80112
Zunesis.com
HP VERTICA 7
“CRANE” 2014
A look into the versatility and abilities of the platform’s newest version.
TABLE OF CONTENTS
Abstract (Author’s Notes)
Vertica and Data Types
Data is Loaded, Onto Big Data Analytics
About Zunesis
ABSTRACT – (AUTHOR’S NOTES)
Vertica 7.0 is a real-time analytical platform built to meet the demands of any big data initiative
thrown at it. Vertica can now host datasets in structured and semi-structured formats, which business
analysts, managers, and data scientists can then explore using standard SQL, Vertica SQL extensions,
R functions, and programmatic APIs to derive insights from their troves of captive data.
I would characterize this latest Vertica release in one word: versatility. I see this versatility across
several features of Vertica 7.0. What stands tallest, however, is the ability to bring different data
structures together, along with the openness to employ different analytical techniques. In my view,
these will be the hallmarks of the HP Vertica platform for the next generation of big data opportunities.
Stay tuned in the coming weeks for more on the data focus and analytical perspective of Vertica 7.0.
Frank Rogowski
VERTICA AND DATA TYPES
Vertica can now handle semi-structured data formats (such as JSON and CSV) natively, using built-in
parsers to ingest them quickly and easily.
Why is this important? Because vast amounts of source data with business exploitation potential are
buried inside the depths of social media archives and machine/application logs that are ever-expanding
and accelerating.
This unrealized potential is referred to as “dark data” in some professional circles, and it is
usually semi-structured. Using source data to benefit organizations through market share growth or
better delivery of public services is one of the primary drivers of Big Data euphoria. Continuous
access to insights such as customer preferences and market tastes can provide early indicators of
whether a business strategy is succeeding, or help marketers make necessary adjustments to campaigns.
As such, online marketers are now reveling in their newfound ability to evaluate website clickstream
data, observe online shopping behaviors, and react intelligently to individual preferences.
Vertica consumes semi-structured data from likely sources such as Twitter feeds or web-application
logs and loads it into what is called the “Flex Zone.” It does this without needing to know the data
structure or key-value pairs of the loaded format in advance. Just extract and load any desired CSV
and JSON files into the Flex Zone space of the platform. Then, using standard SQL calls, a data
engineer or power end-user can immediately investigate what is truly valuable within the newly
acquired dataset for further and deeper analysis.
The individual elements of a Flex Table (a table in the Flex Zone) that make up the semi-structured
data are called virtual columns. They can be viewed as map keys that provide lookups into the actual
raw map data (e.g., a JSON file). You can query the values within virtual columns using standard SQL
syntax to see what content is actually present.
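To make the idea of virtual columns more concrete, here is a minimal Python sketch of the concept outside of Vertica: each JSON record is flattened into a map of keys to values, and a "virtual column" is simply a key lookup that may or may not find a value in a given row. This is an analogy only, not how Vertica implements Flex Tables; the helper names (`flatten`, `load_flex`, `virtual_columns`, `select`) and the sample records are hypothetical.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested JSON objects into a single-level map with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def load_flex(lines):
    """Load JSON lines without declaring any schema up front."""
    return [flatten(json.loads(line)) for line in lines if line.strip()]

def virtual_columns(rows):
    """Every key seen in any row is available as a virtual column."""
    return sorted({key for row in rows for key in row})

def select(rows, *columns):
    """Look up the requested keys; rows that lack a key yield None."""
    return [tuple(row.get(col) for col in columns) for row in rows]

# Hypothetical sample records whose shapes differ from row to row.
raw = [
    '{"user": {"name": "ann", "lang": "en"}, "text": "great product"}',
    '{"user": {"name": "bob"}, "text": "slow shipping", "retweets": 3}',
]
flex_table = load_flex(raw)
print(virtual_columns(flex_table))
print(select(flex_table, "user.name", "retweets"))
```

The point of the sketch is that nothing about the record layout is declared before loading; the set of queryable keys is discovered from the data itself.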
After a person has evaluated the newly ingested data source, one option is to place some or all of
the key-value pairs into a structured, traditional Vertica columnar database table. The promotion
from Flex Tables to traditional Vertica tables is made easy with a single SQL command that
materializes the Flex Table’s virtual columns into regular Vertica tables whose columns have specified
data types. This allows the data engineer to take advantage of the deep capabilities the Vertica
platform offers for availability, performance, scalability, and data governance, as well as richer
data manipulation language (DML) operations if warranted.
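The promotion step can be pictured with a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for a typed relational table, so it does not reproduce the Vertica command the text refers to. The `materialize` helper, the `tweets` table, and the sample records are all hypothetical, and the choice of which keys to promote is assumed to come from the human evaluation described above.

```python
import json
import sqlite3

# Hypothetical raw records, as they might sit in a schemaless staging area.
raw = [
    '{"name": "ann", "retweets": "12", "lang": "en"}',
    '{"name": "bob", "retweets": "3"}',
]

# Decided after inspecting the data: keys to promote, with SQL type and cast.
promoted = {"name": ("TEXT", str), "retweets": ("INTEGER", int)}

def materialize(conn, table, records, columns):
    """Create a typed table and copy only the selected keys into it."""
    ddl = ", ".join(f"{col} {sql_type}" for col, (sql_type, _) in columns.items())
    conn.execute(f"CREATE TABLE {table} ({ddl})")
    placeholders = ", ".join("?" for _ in columns)
    for line in records:
        rec = json.loads(line)
        # Cast present values to the declared type; missing keys become NULL.
        row = [cast(rec[col]) if col in rec else None
               for col, (_, cast) in columns.items()]
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)

conn = sqlite3.connect(":memory:")
materialize(conn, "tweets", raw, promoted)

# Once typed, the column supports ordinary numeric aggregation.
total = conn.execute("SELECT SUM(retweets) FROM tweets").fetchone()[0]
print(total)
```

Keys that were not promoted (here `lang`) are simply left behind in the raw records, which is the trade-off the structured option makes.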
Additionally, now that the original semi-structured data is part of a
structured data model, it can be joined with other existing tables
(projections) to quickly create a richer perspective or business validation as
part of a big data use case. A composite example would be to evaluate the
effectiveness of recent marketing investments, gauge consumer sentiment
from social media feedback, and finally measure the resulting sales
outcome. Vertica provides the platform to bring this all together in one
product that accelerates the realized business value from almost any big
data use case.
Alternatively, there is the option to leave the newly loaded semi-structured data in a Vertica Flex
Table, or even to create a hybrid table structure in which a Flex Table’s virtual columns for raw data
co-exist with materialized traditional Vertica columns. The materialized columns hold values with a
defined data type (e.g., varchar(xx)). The new Vertica platform gives customers data definition
versatility based on their requirements for both short-term and long-term big data strategies.
DATA IS LOADED, ONTO BIG DATA ANALYTICS
Now that a customer has the data loaded into Vertica 7.0, the question becomes “what should I do next?”
What analytics, algorithms, and manipulation techniques should be employed to extract insights from data
that is ready and waiting on the platform? IT professionals and business managers find this one of
the most nebulous questions to answer. As is common in application, system, and use case development,
the answer is: it depends.
Luckily, Vertica has figured that out too, and has created a set of capabilities and extensions that
allow customers to exploit different analytic approaches to uncover intelligence from their data. Depending
on the use case need and in-house technical skills, Vertica customers can engage the system as they feel is
best suited for their big data requirements.
This is where the “rubber hits the road” in executing a big data strategy: usable insights are
derived and then put into intelligent action.
From my viewpoint, many big data projects are properly satisfied using standard SQL-99 or SQL
Analytics, which use techniques like the window OVER() clause to define partitioning, ordering, and
framing for an analytic function. SQL aggregates work well too. There are also Vertica SQL extensions
for performing specific tasks such as Time Series and Event Series Joins. All these SQL-ready
techniques are part of the Vertica product and are already supporting solid business ROI and project
justifications.
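For readers less familiar with analytic functions, the following Python sketch shows what partitioning, ordering, and framing each contribute, using a moving average as the example. It is a plain-Python illustration of the concept rather than Vertica SQL; the `moving_average` function and the sales figures are hypothetical.

```python
from collections import defaultdict

def moving_average(rows, partition_key, order_key, value_key, preceding=1):
    """Average the current row and up to `preceding` earlier rows per partition.

    Roughly mirrors:
      AVG(value) OVER (PARTITION BY p ORDER BY o
                       ROWS BETWEEN n PRECEDING AND CURRENT ROW)
    """
    # Partitioning: group rows so each group is processed independently.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    result = []
    for key in sorted(partitions):
        # Ordering: fix the row sequence inside the partition.
        ordered = sorted(partitions[key], key=lambda r: r[order_key])
        for i, row in enumerate(ordered):
            # Framing: the sliding subset of rows the function sees.
            frame = ordered[max(0, i - preceding): i + 1]
            avg = sum(r[value_key] for r in frame) / len(frame)
            result.append({**row, "moving_avg": avg})
    return result

# Hypothetical daily sales per region, deliberately out of order.
sales = [
    {"region": "west", "day": 2, "amount": 30},
    {"region": "east", "day": 1, "amount": 10},
    {"region": "west", "day": 1, "amount": 20},
    {"region": "east", "day": 2, "amount": 40},
    {"region": "east", "day": 3, "amount": 10},
]
windowed = moving_average(sales, "region", "day", "amount")
for row in windowed:
    print(row["region"], row["day"], row["moving_avg"])
```

Unlike a GROUP BY aggregate, the result keeps one output row per input row, which is what makes window functions useful for running totals, rankings, and trend lines.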
Some customers need time series analytics over enormous volumes of clickstream data, analyzed in
near real time: Vertica can do that. Other organizations may need sophisticated pattern-matching
analysis for fraud detection during and after credit transactions: Vertica has the functionality to
perform that very well too.
Not everyone needs a data scientist on their staff, but some truly do. Some big data requirements
may extend into more sophisticated custom statistical algorithms. Vertica can accommodate those who
want to leverage R, Java, or C++ for data mining and custom logic. These three integrations could be
used to deliver k-means clustering, decision trees, linear regression, naïve Bayes models, and a zoo
of other statistical algorithms as part of the customer’s specific strategy to pull even the most
esoteric insights from their data.
Vertica 7.0 can interact with CRAN-provided R packages or custom R functions for statistical
computing as an in-database capability. This is accomplished by loading R packages and creating User
Defined Functions (UDFs), making them available for later use in standard SQL calls. The platform
also works with custom C++ or Java code via an API, executing statistical analysis on targeted data
for a custom programmatic model. All three approaches are materialized as user defined extensions
(UDxs) within Vertica. UDxs let customers execute business logic for analytic operations that would
be difficult to perform in standard SQL. When a normal SQL query includes a UDF in its syntax, the
UDF is called to perform the necessary calculations on the data and return the desired output.
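The general pattern of registering custom logic and then calling it from SQL can be sketched with Python's standard-library `sqlite3` module. SQLite stands in for the database here purely for illustration; this is not Vertica's UDx API, and the `is_outlier` function, the `txn` table, and the fixed mean and standard deviation are hypothetical.

```python
import sqlite3

def is_outlier(value, mean, stddev):
    """Hypothetical rule: flag values more than 2 standard deviations from the mean."""
    if stddev == 0:
        return 0
    return 1 if abs(value - mean) / stddev > 2 else 0

conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("is_outlier", 3, is_outlier)

conn.execute("CREATE TABLE txn (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [(1, 20.0), (2, 25.0), (3, 22.0), (4, 400.0)],
)

# The custom logic now appears inside an otherwise ordinary SQL query.
flagged = conn.execute(
    "SELECT id FROM txn WHERE is_outlier(amount, ?, ?) = 1 ORDER BY id",
    (25.0, 10.0),
).fetchall()
print(flagged)
```

The analyst writing the query only needs to know the function's name and arguments, while the logic behind it can be as simple or as statistical as the use case demands.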
Again, Vertica 7.0 can work with those facilities as in-database capabilities, thus making it very appealing
as an open platform to conduct “slicing and dicing” of data into illuminating insights while demonstrating
ROI swiftly regardless of which analytic technique the customer wants to employ. The HP Vertica Analytics
Platform framework functions as an easy and powerful onramp for creative use of open source libraries
and third-party software.
With this Vertica approach, organizations can exploit their data using their existing talent,
playing to individual and team strengths, while gaining an ecosystem they can grow into as their big
data sophistication evolves.
Below is a table that illustrates the different data manipulation techniques and analytical
capacities that are possible within the Vertica 7.0 platform. The Zunesis service team has ample use
case examples across different industry domains and many technical implementation examples across
each of the three analytical categories below. These are usually part of a Zunesis Big Data workshop
and HP Vertica product demonstration, available upon request.
For additional detail, please visit the big data section of the Zunesis website.
ABOUT ZUNESIS
Company Information
Zunesis
9000 E. Nichols Ave, Suite 150
Centennial, CO, 80112
Zunesis.com