QLIKVIEW DATA FLOWS TECHNICAL BRIEF

A QlikView Technical Brief
September 2013
qlikview.com

Table of Contents

Introduction
Overview
Data Sourcing
Loading and Modeling Data
Provisioning Data
Using The Data for Analytics
Data Governance and Security
Learn More

Introduction

Business Discovery relies on the connection, transformation, distribution and, ultimately, analysis of data. This paper provides an introductory overview of the data flows through a typical QlikView deployment and describes the role of the individual systems. We explain how data is sourced from multiple, heterogeneous sources, how it is manipulated to make it consistent and logical, and how it is distributed to where users can interact with the QlikView applications.

Overview

There are four main systems involved in building a QlikView enterprise system: QlikView Desktop, QlikView Publisher, QlikView Server, and Clients. To understand the data flow, we need to understand the role of these systems and where they are situated in the system architecture (see Figure 1).

Figure 1 - QlikView Architectural Overview

QlikView Desktop: This is the main tool for creating QlikView applications. The application designer uses this tool to specify where data is sourced, how it is manipulated, and how it is displayed. The application presentation is handled by Clients, but application data processing is managed by the Server.

Clients: This is where users use the QlikView application to view and interact with data. The application can be part of a standalone executable or part of a web page. The client side of applications is designed to consume few computational resources.

QlikView Server: This system serves applications and data to clients, performs application calculations, and manages security.
QlikView Publisher: This system provides a means of controlling how the data used by applications is updated.

The sections of this paper follow how data is sourced, loaded and modeled, provisioned, and then analyzed in QlikView. The paper also discusses how data is governed and secured.

Data Sourcing

QlikView extracts data from multiple, heterogeneous sources (e.g. databases, spreadsheets, web pages and ‘Big Data’ sources such as Hadoop and Google BigQuery) and creates a homogeneous data set suitable for analysis and visualization.

Figure 2: Loading data from multiple sources into QlikView

Data is sourced from a multitude of systems, from standard ODBC, OLE DB and JDBC data stores (such as Oracle), spreadsheets and web pages (HTML, XML, etc.) to systems that require custom connectors (such as Salesforce.com, SAP and Google BigQuery). For most data sources, a connection is made using a wizard that simplifies the connection process and gives the application designer control over how data is read. For example, the designer can choose not to read in certain fields, or can rename them. A data warehouse is not required, although if one already exists, QlikView can readily use it. Hadoop-based Big Data systems can be accessed in the same way. Where an ODBC or JDBC driver does not exist, QlikView has an open standard data exchange protocol (called QVX) that can be used to build custom connectors to data sources that do not offer standard connectivity.
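
As a rough illustration of the sourcing step described above, the following Python sketch reads from two different kinds of source, skips unwanted fields, and renames the rest so that both result sets share consistent field names. It is not QlikView code and does not use any QlikView connector: the in-memory SQLite database stands in for an ODBC or OLE DB source, the CSV text stands in for a spreadsheet, and every field name, table name and helper function is a hypothetical chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export; CSV text stands in for a real file.
SPREADSHEET_CSV = """Emp No,Full Name,Office,Notes
1,Ann Lee,Lund,temp
2,Bo Ek,Radnor,
"""


def load_spreadsheet(text, keep, rename, casts):
    """Read CSV rows, keep only the chosen fields, rename them, cast types."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {rename.get(field, field): record[field] for field in keep}
        for field, cast in casts.items():
            row[field] = cast(row[field])
        rows.append(row)
    return rows


def load_database(conn, query):
    """Read rows from a SQL source; column aliases in the query do the renaming."""
    cursor = conn.execute(query)
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# An in-memory SQLite database stands in for a 'standard' database source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (emp_no INTEGER, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 120.0, "EU"), (1, 80.0, "EU"), (2, 200.0, "US")],
)

employees = load_spreadsheet(
    SPREADSHEET_CSV,
    keep=["Emp No", "Full Name", "Office"],  # the 'Notes' field is not read
    rename={"Emp No": "EmployeeID", "Full Name": "Name"},
    casts={"EmployeeID": int},  # CSV values arrive as text
)
sales = load_database(
    conn,
    # the 'region' column is not read; the other columns are renamed
    'SELECT emp_no AS EmployeeID, amount AS "Sales $" FROM sales ORDER BY rowid',
)
```

After this step both `employees` and `sales` carry an `EmployeeID` field of the same type, which is the kind of consistency the later modeling step relies on.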

Loading and Modeling Data

QlikView’s primary method of conducting data analysis is to use its in-memory engine. Since its inception in 1993, QlikView has used an in-memory approach to data analytics and, for over 20 years, has built on this technology to offer the best in-memory analytics solution in the industry. In addition, QlikView introduced a direct query capability, Direct Discovery, to allow a measure of direct data access to the underlying data systems.

Data is loaded by QlikView from the various source systems into the in-memory engine via the Load Script. The Load Script is contained within a QlikView application and uses a SQL-like language to connect to source systems and perform data modeling. The data is loaded when the Load Script is executed. With QlikView Publisher, data loading can run on a periodic schedule, in response to triggers, or both.

Figure 3: Example QlikView Load Script

Once data is loaded into a QlikView application, it’s held in memory. This means that a QlikView application needs to access a data source only once to read in a dataset and store that historical data. For new data (‘delta’ or ‘updated’ data), QlikView can simply load this new data and append it to the historical data without having to do a full reload. In addition, QlikView uses sophisticated algorithms to compress the data (sometimes by up to 90% relative to its size on disk in a database) to make optimal use of the in-memory store. For more information, please see the blog post: http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers

Application developers also use the Load Script to model data from the various source systems prior to inserting it into the in-memory engine. In reality, business intelligence tools must cope with data that is incomplete, poorly labeled, or duplicated across multiple sources.

Linking data from different data sources requires the use of a key, but the same data can be labeled in different ways across different sources (e.g. “Sales,” “Sales Revenue,” and “Sales Numbers” might all be the same data; see Figure 4). QlikView can merge these similar data fields from different tables into a single, consolidated view (e.g. converting the 3 “Sales” fields into a single field called “Sales $”; see Figure 4).

Figure 4: Renaming of fields from different sources

A more subtle problem is slightly different data formats for the same underlying data. For example, one data source might store dates in a single “YYYY-MM-DD” field while another might have separate Year, Month, and Date fields. The application designer must be able to consolidate all date fields into a single, representative view.

The Load Script allows fields to be renamed, separated, joined, or otherwise manipulated. For example, the developer can do table joins, or create a ‘Name’ field by combining ‘First Name’ and ‘Last Name’ fields. Because QlikView reads data sources directly, it’s possible to manipulate fields across multiple data sources. For example, the user could conditionally read in sales person data (HR database) only where the sales person has made a sale (Sales database).

Figure 5: QlikView Data Model Viewer

QlikView provides a data model viewer (see Figure 5) that makes it easy to see the associations that have been made within the engine, and that provides information about the data such as density, field names, table names, and so on. It can also help find data model problems so that they can be fixed in the scripting environment.

The QlikView engine provides a unique associative capability to the data that has been loaded. This means that data that is sourced from multiple systems can be treated as a single data entity within the engine for the purpose of analytics, regardless of where the data came from.
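
The field manipulations described above (merging differently labeled sales fields, consolidating date formats, building a ‘Name’ field, and conditionally reading sales person data) can be sketched in Python as follows. This is an illustration of the logic only, not QlikView Load Script; all rows, field names such as `OrderDate`, and the `normalize` helper are assumptions made for the example.

```python
from datetime import date

# Hypothetical rows from three sources that label the same measure differently.
source_a = [{"EmployeeID": 1, "Sales": 100.0, "OrderDate": "2013-07-15"}]
source_b = [{"EmployeeID": 2, "Sales Revenue": 250.0,
             "Year": 2013, "Month": 8, "Date": 2}]  # 'Date' holds the day of month
source_c = [{"EmployeeID": 1, "Sales Numbers": 75.0, "OrderDate": "2013-09-01"}]

SALES_ALIASES = ("Sales", "Sales Revenue", "Sales Numbers")


def normalize(row):
    """Map one source row onto the consolidated field names."""
    # Whichever alias the source uses becomes the single 'Sales $' field.
    amount = next(row[name] for name in SALES_ALIASES if name in row)
    # Either date layout becomes a single date value.
    if "OrderDate" in row:
        order_date = date.fromisoformat(row["OrderDate"])
    else:
        order_date = date(row["Year"], row["Month"], row["Date"])
    return {"EmployeeID": row["EmployeeID"],
            "Sales $": amount,
            "OrderDate": order_date}


facts = [normalize(row) for row in source_a + source_b + source_c]

# Hypothetical HR source; employee 3 has made no sale.
hr_rows = [
    {"EmployeeID": 1, "First Name": "Ann", "Last Name": "Lee"},
    {"EmployeeID": 2, "First Name": "Bo", "Last Name": "Ek"},
    {"EmployeeID": 3, "First Name": "Cy", "Last Name": "Dahl"},
]

# Conditional read: keep only sales people who appear in the sales data,
# and combine the two name fields into one 'Name' field.
sellers = {fact["EmployeeID"] for fact in facts}
employees = [
    {"EmployeeID": row["EmployeeID"],
     "Name": f'{row["First Name"]} {row["Last Name"]}'}
    for row in hr_rows
    if row["EmployeeID"] in sellers
]
```

Both resulting tables share the `EmployeeID` field, so they can be linked on that key.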

QlikView applies associations between the data from the various systems by automatically mapping fields that have the same name and same data type. This allows users to interrogate and make discoveries in their data as if it were a single table of data, rather than data coming from a variety of disparate and unconnected systems. In Figure 5, one can see that automatic associations are made between the ‘Facts’ table and the ‘Employees’ table, for example, via the ‘EmployeeID’ field.

Provisioning Data

QlikView offers a set of file-based data persistence options. In fact, every QlikView application (a “.qvw” file) itself contains all the data needed for the application. The data within the .qvw file on disk, which is binary encoded, is the data that was loaded during the previous execution of the Load Script. The Load Script is also contained within the .qvw file, as is the entire presentation layer.

Larger deployments typically use a data staging layer. This is to a) provide atomic data packages that are optimized for a particular analytic need (e.g. a ‘Finance’ package that contains data from various Finance and Ops systems), and b) provide an optimized data loading environment for QlikView. QlikView developers can create a “.qvd” file, which is an optimized QlikView data file that can be loaded rapidly into a “.qvw” application. Typical deployments of QlikView include a “QVD Layer” containing a number of .qvd files (e.g. a Finance QVD, Sales QVD, Q1 QVD and so on) that application developers can use off the shelf to build their own specific QlikView applications, which promotes the reuse of consistent data across many QlikView apps. See Figure 6.

Figure 6: Example QVD Layer

Using The Data for Analytics

Once the data is loaded into the in-memory associative engine, a wide variety of powerful, real-time analytics capabilities becomes available.
This is because of the rapid and highly flexible nature of QlikView’s in-memory technology. Developers can create sophisticated analytics applications that give business users a very rich set of analysis capabilities and allow them to conduct their own analysis and interrogate their data the way they wish.

Using the Expression Language, which is accessed via most visualization objects in QlikView, the in-memory data can be dynamically aggregated, manipulated, and compared on-the-fly. New dimensions that did not previously exist in the data model can be calculated on-the-fly. New hierarchies can be defined, and different groupings (or sets) of data can be isolated for the purpose of comparative analytics.

With the in-memory analytics engine, QlikView apps can be built to provide the following:

• Calculated Dimensions
• Aggregations on-the-fly (e.g. statistics)
• Hierarchies on-the-fly
• Set Analysis
• Comparative Analysis
• Conditional Display

There has been a lot of discussion in the marketplace about ‘in-memory’. The term in-memory does not begin to paint the full picture of what analytics capabilities are available in a product. People should investigate what exactly they are getting when they acquire an in-memory solution. With QlikView, it is the ability to use in-memory technology to do on-demand calculation (i.e. nothing needs to be pre-calculated or pre-aggregated) across an entire multi-table data model, in a completely associative manner, that makes QlikView truly unique in this regard. For a more in-depth understanding of how QlikView works under the covers, see the blog post at: http://community.qlikview.com/blogs/qlikviewdesignblog/2013/07/15/logical-inference-and-aggregations

The Expression environment contains hundreds of functions that developers can utilize to build dynamic and highly relevant apps.
These functions are grouped (see Figure 7) and cover topics such as Aggregation, Financial, Mapping, Number Interpolation and so on.

Figure 7: Categories of the hundreds of functions available in the Expression Language

Data Governance and Security

How can you know that the sales revenue figure used by the accounting department is the same as that used by sales and marketing? How can you be sure that numbers are calculated the same way across different applications? This problem is given more urgency and importance by regulatory and reporting laws that require traceability. Ensuring consistency and accountability is the essence of data governance.

The QlikView Governance Dashboard and QlikView Expressor provide, respectively, data governance and centralized, controlled data provisioning for QlikView applications. The Governance Dashboard provides a comprehensive view of the data flows into QlikView, how the data is manipulated, and who is using what and when. QlikView Expressor allows for the provisioning of consistent and traceable rules for calculating business quantities such as sales revenue, employee costs, and profit. Data stewards use QlikView Expressor to provide common business rule definitions across a QlikView deployment.

Security is about controlling who has access to what data. All QlikView deployments require authentication, which is handled via Integrated Windows Authentication or a 3rd party Single Sign-On solution. Once the user’s identity is established, there is the issue of authorization to access different data sets. Authorization can be set at the application, application section, row, and individual data element levels. QlikView uses a number of industry standard and proprietary technologies to provide detailed control over what data users can see.
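
To make the idea of row-level and data-element-level authorization concrete, here is a minimal Python sketch that filters a data set according to the identity of an already authenticated user. It does not reproduce QlikView's own authorization mechanism; the `ACCESS` and `HIDDEN_FIELDS` tables, the user names, and the `visible_rows` function are all hypothetical.

```python
# Hypothetical access table: the regions each authenticated user may see.
ACCESS = {
    "anna": {"EU"},
    "raj": {"EU", "US"},
}

# Hypothetical element-level rule: fields hidden from specific users.
HIDDEN_FIELDS = {"anna": {"Margin"}}

ROWS = [
    {"Region": "EU", "Sales $": 120.0, "Margin": 0.31},
    {"Region": "US", "Sales $": 200.0, "Margin": 0.27},
]


def visible_rows(user, rows):
    """Return only the rows and fields the given user is authorized to see."""
    regions = ACCESS.get(user, set())        # unknown users see no rows
    hidden = HIDDEN_FIELDS.get(user, set())  # by default no field is hidden
    return [
        {field: value for field, value in row.items() if field not in hidden}
        for row in rows
        if row["Region"] in regions
    ]
```

In this sketch `visible_rows("anna", ROWS)` yields only the EU row without its `Margin` field, while a user absent from `ACCESS` gets an empty result. Denying by default for unknown users is a design choice of the example.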
In a QlikView system, all communications between the client and the server use either HTTPS or the QlikView proprietary QVP protocol and no ports are opened between the client and the server. For more information, please reference the QlikView Security Overview Technology White Paper.

Learn More

QlikView Architectural Overview
QlikView Governance Overview
QlikView Security Overview
QlikView Design Blog Post: Logical Inference and Aggregations
QlikView Design Blog Post: Symbol Tables and Bit Stuffed Pointers

© 2013 QlikTech International AB. All rights reserved. QlikTech, QlikView, Qlik, Q, Simplifying Analysis for Everyone, Power of Simplicity, New Rules, The Uncontrollable Smile and other QlikTech products and services as well as their respective logos are trademarks or registered trademarks of QlikTech International AB. All other company names, products and services used herein are trademarks or registered trademarks of their respective owners. The information published herein is subject to change without notice. This publication is for informational purposes only, without representation or warranty of any kind, and QlikTech shall not be liable for errors or omissions with respect to this publication. The only warranties for QlikTech products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting any additional warranty.