Guided Analytics for Large Data

advertisement

Guided Analytics with large Detail Data

When we think of “guided analytics”, we need to think of it as two pieces of the same puzzle.

1.

What is it we want the user to be able to accomplish?

2.

How do we create an infrastructure to get him there?

The first piece deals with the business problem being addressed, and, is not the focus of this paper.

The second piece deals with the technical structure of the solution. This paper attempts to detail one way of drilling from one set of data, to another, in an efficient manner. Done properly, it is possible to “walk through” a logical process, to drill into massive amounts of data, without ever having to “scan” a large table.

A Tableau Packaged workbook ( Guided_Analytics_For_Large_Data.twbx

) is part of this document package.

For this example, we’ll use the well known Tableau Super Store Sales data, and, we’ll set up a dashboard that facilitates drilling from Region/Quarter, to State/Month, to Zip Code/Day, to individual order line.

Please use your imagination, pretending that the total detail data in hundreds of millions of rows…

Note:

This is just a very basic view of the total “Guided Analytic” process. A full dissertation would read like War And Peace!

The concept:

In a “real world”, IT created, data warehouse, the best performance solutions come from pre computed aggregate tables, each structured to allow for drilling from small tables, into slightly larger ones, in a repetitive process, until we end up at the largest table, looking at a very small subset of the total data.

Use Tableau’s Data Extract capability to build the individual aggregate tables needed.

Build views from each of the individual Data Extracts.

Use Tableau Actions to link from each Data extract TO the next larger Data Extract. Since we are dealing with LARGE data sets, processes should be implemented to limit the amount of data being pulled to the user view.

1.

Actions are Single Select ONLY.

2.

Clearing the selection should show no data.

The structure looks like this:

File sizes for the raw data, and, the Tableau Data Extracts used in this effort;

The level 1 TDE 46k

The level 2 TDE

The level 3 TDE

63K

244k

The level 4 TDE

The initial raw data 1,670k

543k

Before looking at the individual steps, let’s look at the final product:

Ugly and boring – but, “visual best practices is not the intent of this paper.

Now, let’s look at each of the four views, individually, commenting on the pieces needed for the drilling process.

#1 - Region-Quarter Summary – very small, and, very fast. Nothing special about this one…

#2 – State-Month - A little larger, BUT, we are only accessing a subset of the data.

Note the Level of Detail Shelf

#3 – ZipCode–Date. Note the Level of detail Shelf, again…

#4 – Detail. From the prior view, we’ll pass ZipCode and date – each must be somewhere in the detail view.

Once these view are built, we create the Tableau Actions needed to do the linking.

Remember the two key rules: Single Select AND No data on Clearing Selection!

There are three actions required:

The first one drills from the Region-Quarter view TO the State-Month view:

The second one drills from the State-Month view TO the ZipCode-Day view:

The final one drills from the ZipCode-Day view TO the Detail view:

OTHER things to consider when creating this type of process:

1.

While creating the various Data Extracts, he raw data should NOT be changing.

Otherwise the user could be drilling into what appears to be incorrect data when one extract is created from data in one state, and, a different extract is created from data in another state.

2.

Even though this example uses nothing but extracts, a real world process would, likely, do the final drill into the live data. Make sure that data is structure for the key information being passed. Example: in this example, if we were drilling into the live data for the Detail view, we’d want to be certain the detail table had indexes on the date and the zip code columns. Otherwise, we’d be doing a full table scan against that table!

3.

An additional topic to consider is using MENU selections, instead of SELECT, for the

Actions. Give the user choices in the drill process. The same concept (extracts linking to extracts) is followed.

4.

As the extracts are built, be sure to select (as appropriate):

The goal is to make each Data Extract as small and as “pertinent” as possible.

Download