Guilty Until Proven Innocent – User Trust in Our Data
This week I’m on jury duty. As part of the privilege of being United States citizens, we are occasionally
called to serve as jurors so that the defendant, as much as is humanly possible, is assured a fair trial.
During this trial, the judge, and both the prosecution and defense, have reiterated on a few occasions
that the defendant is presumed innocent until proven guilty. This means that the burden of proof is on
the prosecution to prove beyond a reasonable doubt all the material elements of the case. If the
prosecution fails to do this beyond a reasonable doubt then it is the responsibility of the juror to vote
not guilty.
Being a DW/BI practitioner, I’ve been reflecting on this “innocent until proven guilty” principle in relation to user
trust in the data from the data warehouse. I’ve realized that user trust in the data works in exactly the
opposite way. In almost every case, until a significant track record is established and maintained, the data
warehouse (and any reports/marts/cubes/dashboards that sit on top of it) is assumed to be wrong
(guilty) until proven right (innocent).
How many of us have heard statements such as “the data warehouse is wrong” or “your numbers
are off”? Most of us know that user trust in the data is one of the most important factors
determining the success of a DW/BI effort. Yet most of us have also observed that there seems to be a
natural, almost automatic, distrust of the data that comes from the data warehouse.
I see several reasons for this phenomenon:
1. Resistance to change. We, as people, tend to be resistant to change. Change disrupts, and
sometimes it disrupts in a way that is negative to the status quo. Many times individuals have a
personal stake in the status quo and don’t want the change. Other times, the change is simply
inconvenient and requires the rethinking of assumptions that might have been based on the
previous view of the world.
2. The data warehouse is a politically charged endeavor. For example, if the data warehouse reveals
business reality in a way that had not been seen before, it can shed light on areas of corporate
misalignment, underperformance, areas that are outside the organization’s core competency and
should be spun off or abandoned, etc. This is not a bad thing for an organization, but it may have
very negative personal impacts on individuals within the organization. I heard a quote recently
that went something like this: “it’s difficult to get someone to understand something when their
salary depends on not understanding it.” If the data warehouse has the potential to reveal
things like this, it will naturally be open to attack and meet with resistance.
3. Mistakes by the DW/BI team. To be fair, we as DW/BI practitioners have definitely made
mistakes at times. We’ve missed valid business cases, we’ve not understood the business rules
correctly, and we’ve made mistakes in our ETL, our modeling, our calculations, our presentation,
and many other areas. To further complicate this, we sometimes carry ourselves with that
all-too-characteristic IT elitism (a nice word for arrogance) that assumes we know more than our
users. (I’ve definitely done this.) We assume our numbers are right until proven wrong, and we
take some offense when they are challenged.
4. The assumption that because something is accepted it is therefore correct. How many times
have we seen existing, accepted reports or accepted numbers turn out to have been wrong for
years? In other words, there are hidden bugs and logic errors that can go undetected for a long
time. Accepted numbers, even if wrong, can be difficult to get the organization to abandon.
5. The organization wants to clean up its data definitions. When an organization embarks on
Enterprise Information Management (EIM) many hidden definitional misalignments will come to
the surface. Furthermore, the communication of the new definitions is often not as clear,
wholehearted, and consistent as it needs to be. When data definitions are in flux, the old, de facto
definitions can make the new numbers look wrong. This gets even worse when we realize that
language and data are not exact sciences.
6. The new numbers may not show the whole picture. How many of us have seen situations
where, even with good definitions and correct business rules, we still have a less than full
picture of the truth when we take the numbers to a larger audience? The classic example is
a “customer” that is defined correctly for one group, but when another group looks at it
they say that “customer” means something different to them. An enterprise data warehouse
would need to understand both of those meanings to be a true single version of truth for the
organization.
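The “customer” case in the last point can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the records, the field names, and the two rules (a sales rule based on recent purchases and a finance rule based on holding a contract) are hypothetical stand-ins for whatever two groups in a real organization might mean by the word. The point is only that both rules can be applied correctly to the same data and still produce different counts.

```python
from datetime import date

# Hypothetical records; field names and rules are assumptions for illustration.
accounts = [
    {"id": 1, "last_purchase": date(2024, 11, 3), "has_contract": True},
    {"id": 2, "last_purchase": date(2022, 1, 15), "has_contract": True},
    {"id": 3, "last_purchase": date(2024, 6, 30), "has_contract": False},
    {"id": 4, "last_purchase": None, "has_contract": True},
]

AS_OF = date(2025, 1, 1)


def is_customer_for_sales(account):
    """Assumed sales rule: purchased within the 365 days before AS_OF."""
    last = account["last_purchase"]
    return last is not None and (AS_OF - last).days <= 365


def is_customer_for_finance(account):
    """Assumed finance rule: holds a contract, regardless of purchases."""
    return account["has_contract"]


sales_count = sum(is_customer_for_sales(a) for a in accounts)
finance_count = sum(is_customer_for_finance(a) for a in accounts)
print(sales_count, finance_count)
```

Neither count is wrong. A warehouse that publishes only one of them, under the bare label “customers”, will look wrong to whichever group uses the other rule.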
Anybody who has been involved in EIM or data warehousing for very long can attest that it is a difficult
endeavor on many fronts. All of the above, plus many other factors, make DW/BI a very challenging
thing to do well. We must not assume there is a simple solution to this problem, but this does not mean
there is no solution.
So how do we overcome this mistrust of our data? Here are some practical steps:
1. Understand what we are up against – we have the burden of proof. I’m reminded of that quote
by Gandalf in The Lord of the Rings (J.R.R. Tolkien) when he says “It is wisdom to recognize
necessity, when all other courses have been weighed, though as folly it may appear to those
who cling to false hope.” It is a false hope to assume that we will have easy acceptance and
not have to defend our numbers. We must “recognize the necessity” that the burden of proof
will be on our team to prove our data is as good as or better than what is currently out there.
2. Have a strong focus on user adoption. One thing we have learned (the hard way, I’m afraid) is
that user adoption is a critical focus of the DW/BI team. It is typically neglected in
project planning and in the ongoing resourcing of a DW/BI team. User adoption
includes training, good documentation of business rules/data lineage/data definitions, and
hand-holding to help users with the BI tools, reports, cubes, etc. Also, free food never hurts.
3. Cultivate relationships with key data opinion leaders in the business. There are usually several, if
not many, users in the business community who are very good at really understanding and
putting together the numbers. Many times these are users whose job is to provide information
to their bosses, often including senior management. These users are key relationships to
foster because if they have a personal stake in the data warehouse they will defend it. We ignore
them at our own peril.
4. Build data quality validation into the process. Sometimes all it takes for a user to start accepting
new numbers is to clearly understand the process of how we arrived at the numbers (data
lineage) and to be shown consistently that the quality of the data is high. Self-disclosure of the
process is critical. The source systems, ETL, data warehouse, and presentation layers cannot be a
black box. We must demonstrate how data gets from source to destination. This can be
accomplished with a wiki or other documentation that explains where the data comes from, how
it is cleansed and transformed at the different layers of the architecture, and how it ends up
looking like it does. Couple this with some good, ongoing data profiling that is documented,
made available, and overseen by a data steward, and we are well on our way to significant user
trust in our data.
5. Anytime there is a numbers discrepancy, get all the way to the bottom of it. This can be painful
and time consuming, but the burden of proof is on us. This sometimes means cracking open a
source system to find data quality or data governance issues, combing through code in existing
reports or stored procedures, working through definitional inconsistencies, etc. If there are
accepted numbers, for example from an existing report or dashboard, we must either a) match
the numbers exactly or b) be able to demonstrate exactly why our numbers are different and
how they are better because of the process that we’ve put the data through. “I don’t know” is
not an acceptable answer to “why are these numbers different?” We must know, be able to
explain, and document why they are different. Furthermore, do not expect to have to
explain these reasons only once. We sometimes have to show the same user, or different users, a few
times before the reasons the new numbers are different become anchored.
6. We must admit our mistakes and seek to make them right. Trust can be lost easily. If we make a
significant mistake we must own up to it and aggressively make it right. I recognize this can be a
lot easier to write than it is to do. We must understand that we are attempting to cultivate trust,
so we must be careful of anything, whether technical or in attitude, that would undermine that trust.
7. Publish EIM data definitions and allow for a feedback loop. This is key to addressing the issue of the
whole picture of our data. In the example above, where we talked about the two different
business definitions of “customer”, we need to let users see and possibly challenge
our definitions, and then incorporate relevant changes as needed. Again, it’s not that the
definition was necessarily wrong, just incomplete.
8. Develop and incorporate internal “branding” for the DW/BI/EIM effort. Once we know that we
have the data correct and the users accept it, we want to brand the report or dashboard with
a logo that signals we are signing off on this data as correct. We must continually
and aggressively pursue data quality, and then over time, this branding will be a sign of quality
that users will trust. This is a main goal of the DW/BI team: a brand that says data quality.
9. Keep any portals, dashboards, or other presentation content that is managed by our team fresh
and current. We must not have stale, outdated content or skins in the presentation areas that
we maintain. I heard a statement recently, from one of the members of the Kimball Group, that
the only data warehouse some users will ever see is the portal we maintain. In the users’ minds,
reports/dashboards/portals equal the data warehouse. If they are stale, irrelevant, or unappealing,
that will be the users’ perspective of the data warehouse. Somewhat surprisingly, this can
undermine the trust in the data warehouse. Overcoming this requires some resources from the
DW/BI team to be devoted to keeping it up to date.
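The data quality validation and discrepancy steps above lend themselves to a small illustration. What follows is a minimal Python sketch under stated assumptions, not a description of any particular team's tooling: the function names (`profile_column`, `reconcile`), the region keys, and the sample figures are all hypothetical. It shows one way we might publish simple profiling facts for a column, and list every place where warehouse totals differ from an accepted report so that each difference can be traced and explained.

```python
from decimal import Decimal


def profile_column(rows, column):
    """Return simple, publishable quality facts for one column.

    `rows` is a list of dicts; None or a missing key counts as null.
    """
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def reconcile(accepted, warehouse, tolerance=Decimal("0")):
    """Compare accepted report totals with warehouse totals, key by key.

    Returns a list of (key, accepted_value, warehouse_value) tuples for
    every key whose values differ by more than `tolerance`, or that
    appears on only one side (the missing side is reported as None).
    """
    discrepancies = []
    for key in sorted(set(accepted) | set(warehouse)):
        a = accepted.get(key)
        w = warehouse.get(key)
        if a is None or w is None or abs(a - w) > tolerance:
            discrepancies.append((key, a, w))
    return discrepancies


# Hypothetical sample data, for illustration only.
source_rows = [
    {"region": "East", "amount": Decimal("100.00")},
    {"region": "West", "amount": Decimal("250.50")},
    {"region": None, "amount": Decimal("40.00")},
]
accepted_report = {"East": Decimal("100.00"), "West": Decimal("250.00")}
warehouse_totals = {
    "East": Decimal("100.00"),
    "West": Decimal("250.50"),
    "Unknown": Decimal("40.00"),
}

print(profile_column(source_rows, "region"))
for key, a, w in reconcile(accepted_report, warehouse_totals):
    print(f"{key}: accepted={a} warehouse={w}")
```

Each tuple that `reconcile` returns is a question that needs a documented answer. In this made-up data, the accepted report silently dropped the row with no region and carried a different West total, which is exactly the kind of finding we would need to explain rather than answer with “I don’t know”.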
In summary, if we understand that 1) we are “guilty until proven innocent” and that 2) the burden of
proof to establish the credibility of our data rests with us, we will not be surprised at the need to take
deliberate and aggressive steps to build and maintain that credibility. We need to understand that we
are going to be assumed wrong until we prove, and keep proving, we are right. User trust, as with trust
in other areas, can be difficult to build and easy to lose. We must keep a consistent, intentional focus on
establishing and keeping that trust. Implementing, and continuing in, the above steps can go a long way
to making our DW/BI team into a name synonymous with quality.
Kenny Sargent
Technical Lead, Enterprise Data Products
Compassion International, Colorado Springs, CO
Kenny is the Technical Lead for the Enterprise Data Products team at Compassion International, the
world’s largest Christian holistic child development organization. He can be reached at
ksargent@us.ci.org.
(Thanks to Joy and Warren from the Kimball Group, Lovan Chetty at Kalido, several folks from Project
Performance Corporation, and my team members who’ve lived in the trenches with me on the Enterprise
Data Products team - for helping formulate and shape some of the ideas presented above. Special
thanks to my Heavenly Father for giving me understanding.)