Zunesis
9000 E. Nichols Ave, Suite 150
Centennial, CO, 80112
Zunesis.com

HP Vertica 7 "Crane"
2014
A look into the versatility and abilities of the platform's newest version.

Contents
Abstract (Author's Notes)
Vertica and Data Types
Data is Loaded, Onto Big Data Analytics
About Zunesis

Abstract (Author's Notes)

Vertica 7.0 is a real-time analytical platform built to meet the demands of any big data initiative thrown at it. Vertica can now host datasets in structured and semi-structured formats that business analysts, managers, and data scientists can then exploit using standard SQL, Vertica SQL extensions, R functions, and programmatic APIs to derive insights from their troves of captive data.

The word I use to characterize this latest Vertica release is versatility. I see this versatility across several features of Vertica 7.0, but what stands tallest is the convergence of different data structures with the openness to employ different analytical techniques. In my view, these will be the hallmarks of the HP Vertica platform for the next generation of big data opportunities. Stay tuned in the coming weeks for more on the data focus and analytical perspective of Vertica 7.0.

Frank Rogowski, Chief Executive

Vertica and Data Types

Vertica now handles semi-structured data formats (such as JSON and CSV) natively, using built-in parsers to ingest them with speed and ease. Why is this important? Because vast amounts of source data with business exploitation potential are buried in the depths of social media archives and machine/application logs that are ever expanding and accelerating. This unrealized potential is referred to as "dark data" in some professional circles, and it usually has a semi-structured profile.

Using source data to benefit organizations, whether through market share growth or better delivery of public services, is one of the primary drivers of Big Data euphoria. Continuous access to insights such as customer preferences and market tastes can provide early indicators of a business strategy's success, or help marketers make necessary adjustments to campaigns. Online marketers are now reveling in their newfound ability to evaluate website clickstream data, gain access to online shopping behaviors, and intelligently react to individual preferences.

Vertica consumes semi-structured data formats from likely sources such as Twitter feeds or web application logs and loads them into what is called a "Flex Zone." It does this without knowing the data structure or key-value pairs of the loaded format. Simply extract and load the desired CSV and JSON files into the Vertica Flex Zone space of the platform. Then, using standard SQL calls, a data engineer or power end user can instantly investigate what is truly valuable within the newly acquired dataset for further and deeper analysis.
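As an illustration, here is a minimal sketch of that load-and-explore workflow. It assumes the Flex Table syntax and the fjsonparser flex parser documented for Vertica 7; the table name, file path, and key names are purely illustrative.

    -- Create a schema-less Flex Table and load raw JSON into it
    CREATE FLEX TABLE tweets_flex();
    COPY tweets_flex FROM '/data/tweets.json' PARSER fjsonparser();

    -- Catalog the key-value pairs that were found in the loaded data
    SELECT COMPUTE_FLEXTABLE_KEYS('tweets_flex');

    -- Explore virtual columns with ordinary SQL
    SELECT "user.lang", COUNT(*)
    FROM tweets_flex
    GROUP BY "user.lang";

A delimited or CSV-style source can be loaded the same way, substituting the corresponding flex parser for fjsonparser.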
The individual elements of a Flex Table (a table in the Flex Zone) that make up the semi-structured data are called virtual columns, which can be viewed as map keys providing lookups into the actual raw map data (i.e., the JSON file). You can query the values that exist within virtual columns, using standard SQL syntax, to see what content is actually present.

After the newly ingested data source undergoes human evaluation, one option is to promote a subset, or all, of the key-value pairs into a structured format in a traditional Vertica columnar database table. The transformation and promotion from Flex Tables to traditional Vertica tables is as easy as a single SQL command that materializes the Flex Table's virtual columns into regular Vertica tables defined by columns with specified data types. The data engineer can then take advantage of the deep features and capabilities the Vertica platform has to offer for availability, performance, scalability, data governance, and, if warranted, even richer data manipulation language (DML) exploitation.

Additionally, now that the original semi-structured data is part of a structured data model, it can be joined with other existing tables (projections) to quickly create a richer perspective or business validation as part of a big data use case. A composite example: evaluate the effectiveness of recent marketing investments, pulse the social media echoes of consumer sentiment, and finally measure the resulting sales outcomes. Vertica provides the platform to bring this all together in one product, accelerating the realized business value from almost any big data use case.

Alternatively, there is the option to leave the newly loaded semi-structured data in a Vertica Flex Table, or even to create a hybrid table structure in which a Flex Table's virtual columns holding raw, unstructured data co-exist with materialized, traditional Vertica columns holding defined data types (e.g., VARCHAR(xx)). The new Vertica platform gives the customer data definition versatility based on their requirements for both short-term and long-term big data strategies.
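To make the promotion path concrete, here is a hedged sketch of the options just described, assuming the flex helper functions documented for Vertica 7 (such as MATERIALIZE_FLEXTABLE_COLUMNS); all table and column names are illustrative.

    -- Materialize the most frequently occurring virtual columns as real, typed columns
    SELECT MATERIALIZE_FLEXTABLE_COLUMNS('tweets_flex', 4);

    -- Or copy a curated subset into a traditional columnar Vertica table
    CREATE TABLE tweets AS
        SELECT "user.name"::VARCHAR(140) AS user_name,
               "user.lang"::VARCHAR(8)   AS user_lang,
               "text"::VARCHAR(280)      AS tweet_text
        FROM tweets_flex;

    -- Or define a hybrid table: typed columns co-existing with the raw flex data
    CREATE FLEX TABLE clicks_hybrid(session_id VARCHAR(64), click_ts TIMESTAMP);

Once the data lives in regular columns, it can be joined against existing projections like any other Vertica table.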
Data is Loaded, Onto Big Data Analytics

Now that a customer has the data loaded into Vertica 7.0, the question becomes, "What should I do next?" What analytics, algorithms, and manipulation techniques should be employed to extract insights from data that is ready and waiting on the platform? IT professionals and business managers alike find this one of the most nebulous questions to answer. As is common in application, system, and use-case development, the answer is: it depends. Luckily, Vertica has figured that out too, and has created a set of capabilities and extensions that let customers apply different analytic approaches to uncover intelligence from their data. Depending on the use case and in-house technical skills, Vertica customers can engage the system however best suits their big data requirements. It must be said that this is where the "rubber hits the road" in executing a big data strategy, where usable insights are derived and then put into intelligible action.

From my viewpoint, many big data projects are properly satisfied using standard SQL-99 or SQL Analytics techniques such as the window OVER() clause, which defines partitioning, ordering, and framing for an analytic function; SQL aggregates work well too. There are also Vertica SQL extensions for specific tasks such as time series and event series joins. All of these SQL-ready techniques are part of the Vertica product and are already supporting solid business ROI and project justifications. Some customers' use cases call for time series analytics over enormous volumes of clickstream data in near real time: Vertica can do that. Other organizations may need sophisticated pattern-matching analysis for fraud detection during and after credit transactions: Vertica has the functionality to perform that very well too. A sketch of this SQL analytics style appears at the end of this section.

Not everyone needs a data scientist on staff, but some truly do. Some big data requirements extend into more sophisticated custom statistical algorithms. Vertica can accommodate those who want to leverage R, Java, or C++ for data mining with custom logic. These three integrations can be used to deliver k-means clustering, decision trees, linear regression, naïve Bayes models, and a host of other statistical algorithms as part of the customer's specific strategy to pull even the most esoteric insights from their data.

Vertica 7.0 can work with CRAN-provided R packages or custom R functions for statistical computing as an in-database capability. This is accomplished by loading R packages and creating user-defined functions (UDFs), making them available for later use in standard SQL calls; the SQL side of this registration is also sketched below. The platform likewise works with custom C++ or Java code via an API to execute statistical analysis on targeted data for a custom programmatic model. All three approaches are materialized as user-defined extensions (UDxs) within Vertica. UDxs let customers execute business logic best suited for analytic operations that may be difficult to perform in standard SQL: ordinary SQL queries include the user-defined functions in their syntax, and the UDFs perform the necessary calculations on the data and return the desired output.

Again, Vertica 7.0 handles all of these facilities as in-database capabilities, making it very appealing as an open platform for "slicing and dicing" data into illuminating insights and demonstrating ROI swiftly, regardless of which analytic technique the customer wants to employ. The HP Vertica Analytics Platform framework functions as an easy and powerful on-ramp for creative use of open source libraries and third-party software. With this approach, organizations can exploit their data using their existing talent, playing to individual and team strengths, while gaining an ecosystem they can grow into as their big data sophistication evolves.
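To ground the SQL analytics point, here is a minimal sketch of a window (OVER) query of the kind described above; the clickstream table and its columns are hypothetical.

    -- Rank each page view within its session and compute the gap since the previous click
    SELECT session_id,
           page_url,
           click_ts,
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY click_ts) AS click_seq,
           click_ts - LAG(click_ts) OVER (PARTITION BY session_id ORDER BY click_ts) AS think_time
    FROM clickstream;

The same partition-and-order pattern carries over to the Vertica-specific time series and event series extensions mentioned above.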
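For the custom-algorithm path, the SQL side of an in-database R function looks roughly like the sketch below. It assumes an R source file that defines the function and its factory per Vertica's R UDx convention; the library name, file path, function names, and tables are all illustrative.

    -- Register the R source file as a library, then expose its factory as a function
    CREATE LIBRARY r_stats AS '/opt/udx/kmeans_cluster.R' LANGUAGE 'R';
    CREATE TRANSFORM FUNCTION kmeans_cluster
        AS LANGUAGE 'R' NAME 'kmeansClusterFactory' LIBRARY r_stats;

    -- Invoke it from ordinary SQL, partitioning the work across the cluster
    SELECT kmeans_cluster(customer_id, spend, visits)
           OVER (PARTITION BY region)
    FROM customer_metrics;

C++ and Java UDxs follow the same register-then-invoke pattern through their respective APIs.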
The table below summarizes the data manipulation techniques and analytical capabilities available within the Vertica 7.0 platform. The Zunesis service team has ample use-case examples across different industry domains, and many technical implementation examples across each of the three analytical categories shown.

Analytical category                      Representative techniques in Vertica 7.0
Standard SQL-99 and SQL Analytics        Aggregates; window functions via the OVER() clause
Vertica SQL extensions                   Time series and event series joins; pattern matching
Custom UDx in R, Java, or C++            k-means clustering, decision trees, linear regression, naïve Bayes, and other custom algorithms

These are usually part of a Zunesis Big Data workshop and HP Vertica product demonstration, available upon request. For additional detail, please visit the big data section of the Zunesis website.

About Zunesis

Company Information
Zunesis
9000 E. Nichols Ave, Suite 150
Centennial, CO, 80112
Zunesis.com