QLIKVIEW DATA FLOWS TECHNICAL BRIEF

A QlikView Technical Brief
September 2013
qlikview.com
Table of Contents
Introduction
Overview
Data Sourcing
Loading and Modeling Data
Provisioning Data
Using The Data for Analytics
Data Governance and Security
Learn More
QlikView Data Flows Technical Brief | 2
Introduction
Business Discovery relies on the connection, transformation, distribution and, ultimately,
analysis of data. This paper provides an introductory overview of the data flows through a
typical QlikView deployment and describes the role of individual systems. We explain how
data is sourced from multiple, heterogeneous sources, how it is manipulated to make it
consistent and logical, and how it is distributed where users can interact with the QlikView
applications.
Overview
There are four main systems involved in building a QlikView enterprise system: QlikView
Desktop, QlikView Publisher, QlikView Server, and Clients. To understand the data flow, we
need to understand the role of these systems and where they are situated in the system
architecture (see Figure 1).
Figure 1 - QlikView Architectural Overview (diagram; labels include SAN Storage and SMTP service)
QlikView Data Flows Technical Brief | 3
QlikView Desktop: This is the main tool for creating QlikView applications. The
application designer uses this tool to specify where data is sourced, how it is manipulated,
and how it is displayed. The application presentation is handled by Clients, but application
data processing is managed by the Server.
Clients: This is where users use the QlikView application to view and interact with data.
The application can be part of a standalone executable or part of a web page. The client
side of applications is designed to consume few computational resources.
QlikView Server: This system serves applications and data to clients, performs application
calculations, and manages security.
QlikView Publisher: This system provides a means of controlling how the data used by
applications is updated.
The sections of this paper follow how data is sourced, loaded and modeled, provisioned,
and then analyzed in QlikView. The paper also discusses how data is governed and secured.
[Diagram: Data Sourcing, Loading & Modeling Data, Provisioning Data, Data for Analytics, Governance and Security]
Data Sourcing
QlikView extracts data from multiple, heterogeneous sources (e.g. databases, spreadsheets,
web pages and ‘Big Data’ sources such as Hadoop and Google BigQuery) and creates a
homogeneous data set suitable for analysis and visualization.
Figure 2: Loading data from multiple sources into QlikView
(Diagram labels: QlikView; In-Memory Data Loading & Modeling; Direct Query (Direct Discovery);
QVX; ODBC; OLEDB; SFDC; SAP; Other; Web pages; Spreadsheets; ‘Big Data’; Data Warehouse;
‘Standard’ Databases; Heterogeneous data sources, locations and formats.)
Data is sourced from a multitude of systems, from standard ODBC, OLE DB and JDBC
data stores (such as Oracle), spreadsheets and web pages (HTML, XML, etc.) to systems
that require custom connectors (such as Salesforce.com, SAP and Google BigQuery). For
most data sources, a connection is made using a wizard that simplifies the connection
process and allows the application designer control over how data is read. For example, the
designer can choose not to read in certain fields or to rename them. The presence of a data
warehouse is not required, although if it already exists, it is easily leveraged by QlikView.
Accessing Hadoop-based Big Data systems is straightforward too. Where an ODBC or
JDBC driver does not exist, QlikView has an open standard data exchange protocol (called
QVX) that can be used to build custom connectors to data sources that do not offer
standard connectivity.
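The field selection and renaming described above can be illustrated with a minimal sketch. It is written in Python rather than in QlikView's own script language, and the sample data, the column names and the load_selected helper are all hypothetical; it shows only the idea of reading some fields, skipping others, and renaming on the way in.

```python
import csv
import io

# Hypothetical source extract; the column names are made up for illustration.
RAW_CSV = """cust_id,cust_nm,internal_flag,country
1,Acme,Y,SE
2,Globex,N,US
"""

# Fields to keep, mapped to the names the application should use.
# 'internal_flag' is deliberately left out, so it is never read in.
FIELD_MAP = {"cust_id": "CustomerID", "cust_nm": "Customer", "country": "Country"}


def load_selected(text, field_map):
    """Read CSV text, keeping only the mapped fields and renaming them."""
    reader = csv.DictReader(io.StringIO(text))
    return [{new: row[old] for old, new in field_map.items()} for row in reader]


customers = load_selected(RAW_CSV, FIELD_MAP)
print(customers[0])
```

Keeping the mapping in one place means the same definition decides both which fields are read and what they are called afterwards.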
Loading and Modeling Data
QlikView’s primary method of conducting data analysis is to use its in-memory engine. Since
its inception in 1993, QlikView has used an in-memory approach to data analytics and for
over 20 years, has built on this technology to offer the best in-memory analytics solution in
the industry. In addition, QlikView introduced a direct query capability, Direct Discovery, to
allow a measure of direct data access to the underlying data systems.
Data is loaded by QlikView from the various source systems into the in-memory engine
via the Load Script. The Load Script is contained within a QlikView application and uses a
SQL-like language to connect to source systems and perform data modeling. Data is
loaded when the Load Script is executed. Using QlikView Publisher, data loading can
occur on a periodic basis and/or based on triggers.
Figure 3: Example QlikView Load Script
Once data is loaded into a QlikView application, it’s held in-memory. What this means is
that QlikView applications require one-time data source access to read in a dataset and
store that historical data. For new data (‘delta’ or ‘updated’ data), QlikView can simply
load this new data and append it to the historical data without having to do a full reload.
In addition, QlikView utilizes sophisticated algorithms to compress the data (sometimes
up to 90% from the size on disk in a database) to make optimal use of the in-memory
store. For more information, please see blog post:
http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers
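The delta-loading idea can be sketched as follows. This is an illustration in Python, not QlikView script; the incremental_load helper, the OrderID key and the sample rows are hypothetical, and the rule that a delta row replaces a historical row with the same key is an assumption made for this sketch.

```python
def incremental_load(historical, delta, key):
    """Merge delta rows into historical rows without re-reading the full source.

    Assumption for this sketch: a delta row replaces the historical row that
    has the same key, and rows with new keys are appended.
    """
    merged = {row[key]: row for row in historical}
    for row in delta:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical data: the historical set was read once; the delta arrives later.
historical = [{"OrderID": 1, "Amount": 100}, {"OrderID": 2, "Amount": 250}]
delta = [{"OrderID": 2, "Amount": 260}, {"OrderID": 3, "Amount": 75}]

orders = incremental_load(historical, delta, "OrderID")
print(orders)
```

Only the delta has to be fetched from the source system; the historical rows are reused as they are.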
Application developers also use the Load Script to model data from the various source
systems prior to inserting it into the in-memory engine. In reality, business intelligence tools
must cope with data that is incomplete, poorly labeled, or duplicated across multiple sources.
Linking data from different data sources requires the use of a key, but the same data can be
labeled in different ways across different sources (e.g. “Sales,” “Sales Revenue,” and “Sales
Numbers” might all be the same data; see Figure 4). QlikView can merge these
similar data fields from different tables into a single, consolidated view (e.g. converting the
three “Sales” fields into a single field called “Sales $”).
Figure 4: Renaming of fields from different sources
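As a rough illustration of this kind of consolidation, the sketch below maps three differently labeled fields onto one common name. It is written in Python, not QlikView script; the source tables, the ALIASES mapping and the normalize helper are hypothetical.

```python
# Hypothetical source tables; each labels the same measure differently.
crm_rows = [{"Region": "North", "Sales": 120}]
erp_rows = [{"Region": "South", "Sales Revenue": 340}]
sheet_rows = [{"Region": "East", "Sales Numbers": 95}]

# Every known label for the measure maps to the single consolidated name.
ALIASES = {
    "Sales": "Sales $",
    "Sales Revenue": "Sales $",
    "Sales Numbers": "Sales $",
}


def normalize(rows, aliases):
    """Rename aliased fields; fields without an alias keep their name."""
    return [
        {aliases.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


consolidated = normalize(crm_rows + erp_rows + sheet_rows, ALIASES)
total_sales = sum(row["Sales $"] for row in consolidated)
print(total_sales)
```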
A more subtle problem is slightly different data formats for the same underlying data. For
example, one data source might store dates in a single “YYYY-MM-DD” field while another
might have separate Year, Month, and Day fields. The application designer must be able to
consolidate all date fields into a single, representative view.
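A minimal sketch of such a consolidation is given here in Python, not QlikView script. The records, the field names (OrderDate, Year, Month, Day) and the to_date helper are hypothetical and chosen only to mirror the two layouts described.

```python
from datetime import date

# Hypothetical records from two sources; field names are illustrative only.
source_a = [{"OrderID": 1, "OrderDate": "2013-09-02"}]
source_b = [{"OrderID": 2, "Year": "2013", "Month": "9", "Day": "15"}]


def to_date(record):
    """Return one date value regardless of which layout the record uses."""
    if "OrderDate" in record:
        # Single-field layout in YYYY-MM-DD form.
        return date.fromisoformat(record["OrderDate"])
    # Split layout with separate year, month and day fields.
    return date(int(record["Year"]), int(record["Month"]), int(record["Day"]))


unified = [
    {"OrderID": r["OrderID"], "OrderDate": to_date(r)}
    for r in source_a + source_b
]
print(unified)
```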
The Load Script allows fields to be renamed, separated, joined, or otherwise manipulated.
For example, the developer can do table joins, or create a ‘Name’ field by combining ‘First
Name’ and ‘Last Name’ fields. Because QlikView reads in data sources directly, it is possible to
manipulate fields across multiple data sources. For example, the developer could conditionally
read in sales person data (HR database) only where the sales person has made a sale (Sales
database).
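The two manipulations mentioned here, combining name fields and reading one source conditionally on another, can be sketched as follows. The example is in Python rather than QlikView script, and the sample rows and field names are hypothetical.

```python
# Hypothetical extracts from a Sales database and an HR database.
sales = [{"EmployeeID": 7, "Amount": 500}, {"EmployeeID": 9, "Amount": 120}]
hr = [
    {"EmployeeID": 7, "First Name": "Ada", "Last Name": "Lovelace"},
    {"EmployeeID": 8, "First Name": "Alan", "Last Name": "Turing"},
    {"EmployeeID": 9, "First Name": "Grace", "Last Name": "Hopper"},
]

# IDs of everyone who appears in the sales data.
sellers = {row["EmployeeID"] for row in sales}

# Keep only HR rows for people with at least one sale, and build a single
# 'Name' field from the two separate name fields.
sales_people = [
    {
        "EmployeeID": person["EmployeeID"],
        "Name": f'{person["First Name"]} {person["Last Name"]}',
    }
    for person in hr
    if person["EmployeeID"] in sellers
]
print(sales_people)
```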
Figure 5: QlikView Data Model Viewer
QlikView provides a data model viewer (see Figure 5) that makes it easy to see the
associations that have been made within the engine as well as providing information about
the data such as density, field names, table names, and so on. It can also help identify data
model problems, which can then be fixed in the scripting environment.
The QlikView engine provides a unique associative capability to the data that has been
loaded. This means that data that is sourced from multiple systems can be treated as a
single data entity within the engine for the purpose of analytics, regardless of where the
data came from. QlikView applies associations between the data from the various systems
by automatically mapping fields that have the same name and same data type. This allows
users to interrogate and make discoveries in their data as if it were a single table of data,
rather than data coming from a variety of disparate and unconnected systems. In Figure
5, one can see the automatic associations are made between the ‘Facts’ table and the
‘Employees’ table, for example, via the ‘EmployeeID’ field.
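The name-based matching described above can be approximated with a short sketch. It is written in Python and is not a description of the engine's internals; the table layouts and the find_associations helper are hypothetical, and Python types stand in for field data types.

```python
from itertools import combinations

# Hypothetical table definitions: field name -> data type.
tables = {
    "Facts": {"OrderID": int, "EmployeeID": int, "Amount": float},
    "Employees": {"EmployeeID": int, "Name": str, "Office": str},
    "Offices": {"Office": str, "Country": str},
}


def find_associations(tables):
    """Link every pair of tables that share a field with the same name and type."""
    links = {}
    for (left, left_fields), (right, right_fields) in combinations(tables.items(), 2):
        shared = sorted(
            name
            for name in left_fields.keys() & right_fields.keys()
            if left_fields[name] == right_fields[name]
        )
        if shared:
            links[(left, right)] = shared
    return links


associations = find_associations(tables)
print(associations)
```

In this sketch, renaming a field in one table is enough to create or remove a link, which is why consistent field naming during loading matters.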
Provisioning Data
QlikView offers a set of file-based data persistence options. In fact, every QlikView
application (a “.qvw” file) itself contains all the data needed for the application. This data
within the .qvw file on disk, which is binary encoded, represents the data that was loaded
during the previous execution of the Load Script. The Load Script is also contained within
the .qvw file, as is the entire presentation layer.
Larger deployments typically use a data staging layer. This is to a) provide atomic data
packages that are optimized for a particular analytic need (e.g. a ‘Finance’ package that
contains data from various Finance and Ops systems), and b) provide an optimized data
loading environment for QlikView. QlikView developers can create a “.qvd” file which is an
optimized QlikView data file that can be loaded rapidly into a “.qvw” application.
Typical deployments of QlikView include a “QVD Layer” containing a number of .qvd files
(e.g. a Finance QVD, Sales QVD, Q1 QVD and so on) that application developers can use
off the shelf to build their own specific QlikView applications and promote the reuse of
consistent data across many QlikView apps. See Figure 6.
Figure 6: Example QVD Layer
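The staging pattern behind a QVD Layer can be sketched in Python as follows. JSON files stand in for .qvd files purely for illustration (the real format is a QlikView-specific optimized file), and the store and load helpers, the file name and the sample rows are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def store(rows, path):
    """Write an extracted table to the staging layer."""
    path.write_text(json.dumps(rows))


def load(path):
    """Read a staged table back without touching the source system."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as staging:
    # Stand-in for a 'Finance QVD': extracted once, reused by several apps.
    finance_file = Path(staging) / "finance.json"
    store(
        [{"Account": "4000", "Amount": 1200.0}, {"Account": "5000", "Amount": -300.0}],
        finance_file,
    )

    # Two separate applications build on the same staged data.
    revenue_app = [row for row in load(finance_file) if row["Amount"] > 0]
    cost_app = [row for row in load(finance_file) if row["Amount"] < 0]

print(revenue_app, cost_app)
```

Because both applications read the same staged file, they work from identical figures, and the source system is queried only once.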
Using The Data for Analytics
Once the data is loaded into the in-memory associative engine, a large variety of powerful,
real-time analytics capabilities are available. This is because of the rapid and highly
flexible nature of QlikView’s in-memory technology. Developers can create sophisticated
analytics applications that give business users a very rich set of analysis capabilities and allow
business users to conduct their own analysis and interrogate their data the way they wish.
Using the Expression Language that is accessed via most visualization objects in QlikView,
the in-memory data can be dynamically aggregated, manipulated, and compared on-the-fly.
New dimensions can be calculated on-the-fly that were not previously in existence in the
data model. New hierarchies can be defined, and different groupings (or sets) of data can be
isolated for the purpose of comparative analytics.
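To make the idea of on-demand calculation concrete, the sketch below aggregates a small in-memory table at query time, once by a stored field, once by a dimension calculated on the fly, and once over an isolated set of rows. It is written in Python and does not use QlikView's Expression Language; the data and the aggregate helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory rows; nothing here is pre-aggregated.
rows = [
    {"Year": 2012, "Product": "A", "Sales": 100},
    {"Year": 2012, "Product": "B", "Sales": 40},
    {"Year": 2013, "Product": "A", "Sales": 150},
    {"Year": 2013, "Product": "B", "Sales": 20},
]


def aggregate(rows, dimension, measure, in_set=lambda row: True):
    """Sum a measure per dimension value over the rows that belong to a set."""
    totals = defaultdict(int)
    for row in rows:
        if in_set(row):
            totals[dimension(row)] += row[measure]
    return dict(totals)


# Aggregation by a field that exists in the data.
by_year = aggregate(rows, lambda r: r["Year"], "Sales")

# Calculated dimension: a size band that is not stored anywhere in the data.
by_band = aggregate(rows, lambda r: "Large" if r["Sales"] >= 100 else "Small", "Sales")

# An isolated set (product A only), usable for comparison against by_year.
product_a_by_year = aggregate(
    rows, lambda r: r["Year"], "Sales", in_set=lambda r: r["Product"] == "A"
)
print(by_year, by_band, product_a_by_year)
```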
With the in-memory analytics engine, QlikView apps can be built to do the following:
• Calculated Dimensions
• Aggregations on-the-fly (e.g. statistics)
• Hierarchies on-the-fly
• Set Analysis
• Comparative Analysis
• Conditional Display
There has been a lot of discussion in the marketplace about ‘in-memory’.
The term in-memory does not begin to paint the full picture of what analytics
capabilities are available in a product. People should investigate what exactly
they are getting when they acquire an in-memory solution. With QlikView, it is
the ability to use in-memory technology to do on-demand calculation (i.e. nothing
needs to be pre-calculated or pre-aggregated) across an entire multi-table data
model, in a completely associative manner, that makes QlikView truly unique in this regard.
For a more in-depth understanding of how QlikView works under the
covers, see the blog post at:
http://community.qlikview.com/blogs/qlikviewdesignblog/2013/07/15/logical-inference-and-aggregations
The Expression environment contains hundreds of functions that developers
can utilize to build dynamic and highly relevant apps. These functions are
grouped (see Figure 7) and cover topics such as Aggregation, Financial,
Mapping, Number Interpolation and so on.
Figure 7: Categories of the hundreds of functions available in the Expression Language
Data Governance and Security
How can you know that the sales revenue figure used by the accounting department
is the same as that used by sales and marketing? How can you be sure that numbers
are calculated the same way across different applications? This problem is given more
urgency and importance by regulatory and reporting laws that require traceability. Ensuring
consistency and accountability is the essence of data governance.
The QlikView Governance Dashboard and QlikView Expressor provide data governance
and centralized, controlled data provisioning for QlikView applications respectively. The
Governance Dashboard provides a comprehensive view into the data flows into QlikView,
how the data is manipulated, and who is using what and when. QlikView Expressor allows
for the provisioning of consistent and traceable rules for calculating business quantities
such as sales revenue, employee costs, and profit. Data stewards use QlikView Expressor
to provide common business rule definitions across a QlikView deployment.
Security is about controlling who has access to what data. All QlikView deployments require
authentication which is handled via Integrated Windows Authentication or a 3rd party Single
Sign-On solution. Once the user’s identity is established, there is the issue of authorization
to access different data sets. Authorization can be set at the application, application section,
row, and individual data element levels. QlikView uses a number of industry
standard and proprietary technologies to provide detailed control over what data users
can see. In a QlikView system, all communications between the client and the server use
either HTTPS or the QlikView proprietary QVP protocol and no ports are opened between
the client and the server. For more information, please reference the QlikView Security
Overview Technology White Paper.
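Row-level authorization can be pictured with the following sketch. It is written in Python and does not represent QlikView's actual security mechanism; the ACCESS table, the user names and the visible_rows helper are hypothetical, and the deny-by-default rule for unknown users is an assumption of this sketch.

```python
# Hypothetical access table: user -> field -> values that user may see.
ACCESS = {
    "anna": {"Region": {"North", "East"}},
    "ben": {"Region": {"South"}},
}

rows = [
    {"Region": "North", "Sales": 120},
    {"Region": "South", "Sales": 340},
    {"Region": "East", "Sales": 95},
]


def visible_rows(user, rows, access):
    """Return only the rows an authenticated user is authorized to see."""
    rules = access.get(user)
    if rules is None:
        # Assumption for this sketch: users without an entry see nothing.
        return []
    return [
        row
        for row in rows
        if all(row[field] in allowed for field, allowed in rules.items())
    ]


print(visible_rows("anna", rows, ACCESS))
```

The filter runs after the user's identity is known, which mirrors the separation between authentication and authorization described above.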
Learn More
QlikView Architectural Overview
QlikView Governance Overview
QlikView Security Overview
QlikView Design Blog Post: Logical Inference and Aggregations
QlikView Design Blog Post: Symbol Tables and Bit Stuffed Pointers
© 2013 QlikTech International AB. All rights reserved. QlikTech, QlikView, Qlik, Q, Simplifying Analysis for Everyone, Power of Simplicity, New Rules, The Uncontrollable Smile and
other QlikTech products and services as well as their respective logos are trademarks or registered trademarks of QlikTech International AB. All other company names, products
and services used herein are trademarks or registered trademarks of their respective owners. The information published herein is subject to change without notice. This publication
is for informational purposes only, without representation or warranty of any kind, and QlikTech shall not be liable for errors or omissions with respect to this publication. The only
warranties for QlikTech products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should
be construed as constituting any additional warranty.