Guilty Until Proven Innocent – User Trust in Our Data

This week I'm on jury duty. As part of the privilege of being a United States citizen, we are occasionally called to serve as jurors so that the defendant, as much as is humanly possible, is assured a fair trial. During this trial, the judge, the prosecution, and the defense have all reiterated on several occasions that the defendant is presumed innocent until proven guilty. This means that the burden of proof is on the prosecution to prove, beyond a reasonable doubt, all the material elements of the case. If the prosecution fails to do this, it is the responsibility of the juror to vote not guilty.

Being a DW/BI practitioner, I've been reflecting on this "innocent until proven guilty" principle in relation to user trust in the data from the data warehouse. I've realized that user trust in the data works exactly the opposite way. In almost every case, until a significant track record is established and maintained, the data warehouse (and any reports, marts, cubes, or dashboards that sit on top of it) is assumed to be wrong (guilty) until proven right (innocent). How many of us have heard statements such as "the data warehouse is wrong" or "your numbers are off"?

Most of us know that user trust in the data is one of the most important factors determining the success of a DW/BI effort. However, most of us have also observed that there seems to be a natural, almost automatic, distrust of the data that comes from the data warehouse. I see several reasons for this phenomenon:

1. Resistance to change. We, as people, tend to be resistant to change. Change disrupts, and sometimes it disrupts the status quo in ways that feel negative. Many times individuals have a personal stake in the status quo and don't want the change. Other times, the change is simply inconvenient and requires rethinking assumptions that were based on the previous view of the world.

2. The data warehouse is a politically charged endeavor. For example, if the data warehouse reveals business reality in a way that had not been seen before, it can shed light on areas of corporate misalignment, underperformance, or areas that are outside the organization's core competency and should be spun off or abandoned. This is not a bad thing for an organization, but it may have very negative personal impacts on individuals within the organization. I heard a quote recently that went something like this: "it's difficult to get someone to understand something when their salary depends on not understanding it." If the data warehouse has the potential to reveal things like this, it will naturally be open to attack and meet with resistance.

3. Mistakes by the DW/BI team. To be fair, we as DW/BI practitioners have definitely made mistakes at times. We've missed valid business cases, we've misunderstood business rules, and we've made mistakes in our ETL, our modeling, our calculations, our presentation, and many other areas. To further complicate this, we sometimes carry ourselves with that all-too-characteristic IT elitism (a nice word for arrogance) that assumes we know more than our users. (I've definitely done this.) We assume our numbers are right until proven wrong, and we take offense when they are challenged.

4. The assumption that because something is accepted it is therefore correct. How many times have we seen existing, accepted reports or accepted numbers turn out to have been wrong for years?
In other words, there are hidden bugs and logic errors that can go undetected for a long time. Accepted numbers, even if wrong, can be difficult to get the organization to abandon.

5. The organization wants to clean up its data definitions. When an organization embarks on Enterprise Information Management (EIM), many hidden definitional misalignments come to the surface. Furthermore, the communication of the new definitions is often not as clear, wholehearted, and consistent as it needs to be. When data definitions are in flux, the old de facto definitions can make the new numbers look wrong. This gets even worse when we realize that language and data are not exact sciences.

6. The new numbers may not see the whole picture. How many of us have seen situations where, even with good definitions and correct business rules, we still have a less-than-full picture of the truth when we take the numbers to a larger audience? Consider the classic case of a "customer" being defined correctly for one group, but when another group looks at it, "customer" means something different to them. An enterprise data warehouse would need to understand both of those meanings to be a true single version of the truth for the organization.

Anybody who has been involved in EIM or data warehousing for very long can attest that it is a difficult endeavor on many fronts. All of the above, plus many other factors, make DW/BI a very challenging thing to do well. We must not assume there is a simple solution to this problem, but that does not mean there is no solution.

So how do we overcome this mistrust of our data? Here are some practical steps:

1. Understand what we are up against – we have the burden of proof. I'm reminded of the quote from Gandalf in The Lord of the Rings (J.R.R. Tolkien): "It is wisdom to recognize necessity, when all other courses have been weighed, though as folly it may appear to those who cling to false hope." It is a false hope to assume that we will have easy acceptance and not have to defend our numbers. We must "recognize the necessity" that the burden of proof will be on our team to prove our data is as good as or better than what is currently out there.

2. Have a strong focus on user adoption. One thing we have learned (the hard way, I'm afraid) is that user adoption is a critical focus of the DW/BI team. It is typically much neglected in the specific projects and ongoing resourcing considerations of a DW/BI team. User adoption includes training, good documentation of business rules, data lineage, and data definitions, and user hand-holding to help people use the BI tools, reports, cubes, etc. Also, free food never hurts.

3. Cultivate relationships with key data opinion leaders in the business. There are usually several, if not many, users in the business community who are very good at really understanding and putting together the numbers. Many times these are users whose job is to provide information to their bosses – often including senior management. These users are key relationships to foster, because if they have a personal stake in the data warehouse they will defend it. We ignore them at our own peril.

4. Build data quality validation into the process. Sometimes all it takes for a user to start accepting new numbers is to clearly understand the process of how we arrived at the numbers (data lineage) and to be shown consistently that the quality of the data is high. Self-disclosure of the process is critical.
The source systems, ETL, data warehouse, and presentation layers cannot be a black box. We must demonstrate how data gets from source to destination. This can be accomplished with a wiki or other documentation that explains where the data comes from, how it is cleansed and transformed at the different layers of the architecture, and how it ends up looking the way it does. Couple this with good, ongoing data profiling that is documented, made available, and overseen by a data steward, and we are well on our way to significant user trust in our data.

5. Anytime there is a numbers discrepancy, get all the way to the bottom of it. This can be painful and time-consuming, but the burden of proof is on us. It sometimes means cracking open a source system to find data quality or data governance issues, combing through code in existing reports or stored procedures, working through definitional inconsistencies, etc. If there are accepted numbers, for example from an existing report or dashboard, we must either a) match those numbers exactly or b) be able to demonstrate exactly why our numbers are different and how they are better because of the process we've put the data through (a small reconciliation sketch follows this list). "I don't know" is not an acceptable answer to "why are these numbers different?" We must know, be able to explain, and document why they are different. Furthermore, do not expect to have to explain these reasons only once; we sometimes have to walk the same user, or different users, through them a few times until they get anchored in the reasons the new numbers are different.

6. Admit our mistakes and seek to make them right. Trust can be lost easily. If we make a significant mistake, we must own up to it and aggressively make it right. I recognize this is a lot easier to write than it is to do. We must understand that we are attempting to cultivate trust, so we must be careful of anything, whether technical or in attitude, that would undermine that trust.

7. Publish EIM data definitions and allow for a feedback loop. This is key to addressing the issue of the whole picture of our data. In the example above, where we talked about the two different business definitions of "customer", we need users to be able to see and possibly challenge our definitions and cause us to incorporate relevant changes as needed. Again, it's not that the definition was necessarily wrong – but incomplete.

8. Develop and incorporate internal "branding" for the DW/BI/EIM effort. Once we know that we have the data correct and the users accept it, we want to brand the report or dashboard with a logo that represents our sign-off on the data as correct. If we continually and aggressively pursue data quality, then over time this branding will become a sign of quality that users trust. This is a main goal of the DW/BI team: a brand that says data quality.

9. Keep any portals, dashboards, or other presentation content managed by our team fresh and current. We must not have stale, outdated content or skins in the presentation areas we maintain. I heard a statement recently, from one of the members of the Kimball Group, that the only data warehouse some users will ever see is the portal we maintain. In the users' minds, reports, dashboards, and portals equal the data warehouse. If they are stale, irrelevant, or unappealing, that will be the users' perspective of the data warehouse. Somewhat surprisingly, this can undermine trust in the data warehouse itself. Overcoming it requires devoting some DW/BI team resources to keeping the presentation layer up to date.
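To make steps 4 and 5 a little more concrete, here is a minimal sketch (in Python with pandas) of the kind of automated reconciliation and data quality check described above. It is not a prescribed implementation: the file names, column names, and the 0.5% tolerance are illustrative assumptions, and in practice many teams do the same comparison in SQL directly against the source system and the warehouse.

    import pandas as pd

    TOLERANCE = 0.005  # illustrative: flag months where totals differ by more than 0.5%

    def monthly_totals(path, amount_col):
        # Aggregate an extract to monthly totals keyed by year-month.
        # Assumes the extract has an 'invoice_date' column (illustrative name).
        df = pd.read_csv(path, parse_dates=["invoice_date"])
        df["year_month"] = df["invoice_date"].dt.to_period("M").astype(str)
        return df.groupby("year_month")[amount_col].sum()

    def reconcile(legacy_path, warehouse_path):
        # Compare an accepted legacy report extract against the warehouse extract.
        legacy = monthly_totals(legacy_path, "report_amount")
        warehouse = monthly_totals(warehouse_path, "fact_amount")
        both = pd.concat([legacy, warehouse], axis=1, keys=["legacy", "warehouse"])
        both["abs_diff"] = (both["warehouse"] - both["legacy"]).abs()
        both["rel_diff"] = both["abs_diff"] / both["legacy"].abs()
        both["flagged"] = both["rel_diff"] > TOLERANCE
        # Months missing from either extract show up as NaN and deserve a look too.
        return both

    if __name__ == "__main__":
        result = reconcile("legacy_report_extract.csv", "dw_fact_extract.csv")
        # Every flagged month must be explained and documented, not hand-waved.
        print(result[result["flagged"]])

Publishing output like this on a schedule, alongside the data lineage documentation and profiling results the data steward oversees, is one way to make the "self-disclosure" in step 4 routine rather than reactive.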
In summary, if we understand that 1) we are "guilty until proven innocent" and 2) the burden of proof to establish the credibility of our data rests with us, we will not be surprised at the need to take deliberate and aggressive steps to build and maintain that credibility. We need to understand that we are going to be assumed wrong until we prove, and keep proving, that we are right. User trust, as with trust in other areas, can be difficult to build and easy to lose. We must keep a consistent, intentional focus on establishing and keeping that trust. Implementing, and continuing in, the steps above can go a long way toward making our DW/BI team a name synonymous with quality.

Kenny Sargent
Technical Lead, Enterprise Data Products
Compassion International, Colorado Springs, CO

Kenny is the Technical Lead for the Enterprise Data Products team at Compassion International, the world's largest Christian holistic child development organization. He can be reached at ksargent@us.ci.org.

(Thanks to Joy and Warren from the Kimball Group, Lovan Chetty at Kalido, several folks from Project Performance Corporation, and my team members who've lived in the trenches with me on the Enterprise Data Products team, for helping formulate and shape some of the ideas presented above. Special thanks to my Heavenly Father for giving me understanding.)