WHITE PAPER

10 Questions To Align Technology with Business Requirements for Your Big Data Project

By Neil Raden

Datameer

Most companies are looking at big data analytics as a must-have capability to remain competitive, but the industry is moving so quickly that it is difficult to find best practices or good guidance about how to proceed.

Datameer 10 QUESTIONS TO ANSWER BEFORE STARTING A BIG DATA ANALYTICS PROJECT WHITE PAPER

Why Alignment Matters

Big data projects have the potential to turn the existing relationships between technology teams and knowledge workers upside-down. Lines of business are taking the lead in both initiating and directing these projects, with IT providing support but not, for the most part, specifications and design. Previously, in the world of Business Intelligence (BI) and Enterprise Data Warehouses (EDW), the roles were reversed: IT modeled the data based on business requirements, built the databases and populated them according to the model schema. Today, data architectures are built on the fly as potentially interesting data is sourced, and models are essentially driven from the data.

Not surprisingly, this role reversal and a general lack of alignment have introduced challenges in the buying process for big data solutions, in their implementation, and even in proving ROI. All parties need to understand this change in roles, and agree on how a new big data solution furthers specific business initiatives, before investing in one.

Following are 10 questions to ask your counterpart in business or IT to help align your big data initiatives with the goals of the organization.

1. Are there important questions, or ongoing analyses, that we have not been able to address either with current tools and infrastructure, or from a lack of resources?
Until recently, most enterprise analytics initiatives followed a similar structure: data was gathered from various, mostly internal sources, integrated into a common logical model and housed in a series of relational databases. Traditional systems have often taken a long time to deploy, but once in place they are highly efficient at answering fairly standard business questions like “How much of Product X are we selling by region? By month? By channel?”

What traditional BI processes are not as good at answering are questions like “How can we predict which customers are most likely to churn?” or “How can we understand which customer interactions are most valuable in helping to turn a prospective customer into a paying customer?” If these more sophisticated questions are among the most important for your business and cannot be answered by your existing systems, now is the time to investigate big data.

Today’s computing economics and self-service analytic platforms make it possible to discover the answers to problems that were not conceivable even a few years ago. Even the end-to-end process is vastly simplified with better tools and methodologies.

Think about how you’ll address resource constraints if you have them. You won’t be able to do more with new technology until you can put resources behind it. Thankfully, modern approaches and platforms can offload some of what used to be the “IT backlog,” but business users and analysts are busy too, so have a plan in place.

2. Do we know what others in our industry are doing? Will we begin with fairly standard industry metrics or models, or attempt to break new ground?

Unless your business is truly exotic or unique, it is likely that others in your industry have already successfully addressed some of the most important industry problems.
Assistance from third parties can accelerate the first project. For example, there are canonical applications by industry: telco companies routinely develop models for churn, financial services firms for fraud, and retail operations for customer intimacy. Though each company may have its own distinguishing characteristics, the models, methods, data and analysis can be similar to a large extent and can inform your project with what is already known. Excellent sources of insight are available online in professional discussion groups. However, if you are “breaking new ground,” it will take some skill and work to estimate the level of effort you will need.

3. What was our experience with BI? How can that experience provide guidance toward organizing our big data analytics project?

BI implementations provide many useful benefits, but they have historically not been adopted by large numbers of knowledge workers in organizations. It is important to understand why this happens and to ensure that your big data analytics will be widely used by many types of workers.

BI projects can be highly successful, but when you examine the projects that did not achieve their intended benefits, recurring themes emerge that you can use to improve your likelihood of success with big data analytics. Highly successful BI projects share attributes such as sustainable executive sponsorship, direction from the business community, a focus on “quick wins” or “low hanging fruit” to draw some positive attention, and a commitment to improving everyone’s skills.

BI use is stratified. To be effective, it requires a contribution from both IT and those types of workers who straddle the line between IT and business analysis.
This creates a hierarchy of use. At the top is a small number of “power users,” who understand the data and models, have mastered the advanced capabilities of the tools and know how to get more capability from IT when necessary. But those requests take time. Analysts who are less informed about the BI technology rely on the power users to get their requests handled and use only a fraction of the software’s capability. Still others consume the output of the second group and transfer it to presentations and spreadsheets.

Big data analytics can avoid this stratification with better software and a flow-through architecture, from data ingestion to analysis, with far less dependence on IT. Your project should ensure that your staff is ready to gain the expertise needed to make the investment valuable, and some diligence is required to ensure that the old habits of 3-tier BI do not persist.

Some organizations are so traditional in their work, reporting relationships, time-to-value, etc., that there is little motivation to gain new skills. It’s a good idea to take the temperature of the organization to see if it is ready to act on new insights gained from new processes.

4. What type of “data problem” do we really have? Do we know what data we need, where it is located and what format it is in?

Big data does not always imply great volumes. It may simply be that the number of data sources, or their structure, does not fit the old data warehouse model. Questions you need to ask are:

• Can you reasonably predict the data’s volume?
• Is there data complexity: will you only need fairly structured data, or are there other diverse types that you will need to learn to work with?
• Will you use external sources, such as syndicated data or data from partners, that you haven’t used previously?
Using Hadoop-native big data analytic platforms, data can flow from ingestion all the way through to end-user analytics without creating an architecture of many intermediate repositories and proprietary tools. The core difference between an EDW architecture and a carefully designed Hadoop analytics approach is the ability to “model on the fly” (sometimes called “schema on read”), which provides tremendous flexibility and the ability to iterate quickly through the insight discovery process. This is especially valuable when the data arrives in unstructured and variable formats, because the Hadoop-native analytic platform can work out the data’s structure for analytic purposes without requiring the IT team to get involved.

For a first project, this type of investigation need not be comprehensive, but time spent thinking about it, and about the skill and effort that might be needed later, is time well spent. Remember, unlike traditional data warehousing and BI, you are not punished for augmenting your data sources over time. In a capable big data analytics architecture, it is assumed that data and analyses are dynamic.

5. What will we do with the analytic results? What do we expect to gain from this? Will our culture be able to absorb results from advanced analytics and machine learning?

Where will the value come from? Process efficiency? Customer retention or up-sell? Marketing productivity? Higher customer conversion? More revenue from existing customers? Increased sales productivity? Better customer experience across all channels? Improved service productivity and time to resolution? What approach do you plan to take, and do you have the skills in place to execute on it, at least on a pro forma basis?
Financial measures like ROI or IRR are always somewhat imprecise for analytics, because analytics (the process of informing people or processes) does not on its own always show up on the balance sheet. It takes other processes that benefit from the analytics to show true return. In fact, even measuring the costs and benefits of an analytics solution is murky.

You also need to ask: how will you help your organization adopt new processes born of learning from big data analytics? Developing spreadsheets and presentations for meetings, and reaching agreement based on rows and columns of numbers, is quite different from basing decisions on statistical models, probability, credibility and causation. New cultural norms are sometimes necessary as new insights are discovered.

Cultural attributes that can undermine success with analytics include:

• Using analytics to assign blame, “hold people accountable,” or apply pressure. In that model, the organization will tend to resist broader adoption and deeper investigation of analytics.
• “Explaining away” the findings of the analysis. Environmental factors and organizational evolution can sometimes create “false positives” in analytics. That said, if most analytical insights are regularly challenged, your organization likely has either dirty data or a culture that prefers managing from gut feel and guesswork rather than being data-driven.

6. Will we be able to control confirmation bias?

With lots of diverse data, it is fairly easy to prove almost any hypothesis. So you need to ask yourself: will we be able to implement a good process that helps us discover the answers the data shows, not simply confirm a theory? Rather than starting with a hypothesis and trying to gather data to support it, try to “listen to the data” with an open mind.
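As a rough illustration of how easily diverse data can appear to support a hypothesis, consider the minimal Python sketch below. It is not taken from any particular platform. The function names, the 30-row sample, the 500 candidate measures and the 0.36 cutoff (roughly the conventional 5% significance threshold for 30 rows) are all illustrative assumptions. The sketch generates purely random data and counts how many random measures look correlated with an equally random target.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_rows=30, n_features=500, threshold=0.36, seed=7):
    """Count random measures whose correlation with a random target
    exceeds the threshold, even though no real relationship exists."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Random measures that look 'significant':", count_spurious())
```

On average, about 5% of the random measures (around 25 of 500) would be expected to clear the cutoff by chance alone. An analyst who searches through enough candidate measures will therefore almost always find some that seem to confirm a theory, which is why the process for choosing what to test matters as much as the data.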
Perception matters too: if others in the organization believe that an analysis has been provided with an “agenda” behind it, they will tend to reject the data and any conversation or idea predicated on the “suspect” analysis.

7. Will we deliver the application using a cloud, or on-premise?

Decisions about cloud versus on-premise are important with respect to capacity, flexibility, cost, reliability, experience and a host of other factors. Cloud solutions will tend to be more attractive when organizations are lacking:

• Capital budget to procure hardware and license software, since cloud services can be funded from operational expense (OPEX)
• Technical skill sets required to deploy, integrate, and manage the various components of the system
• Time for a full procurement and deployment process, in which case the cloud gives the organization a “short cut” to value

If big data analytics is potentially your first serious initiative in the cloud, these issues require careful consideration:

• Security: how secure are the cloud solution’s architecture and infrastructure, and do you have the appropriate procedures in place internally to maintain ongoing security?
• Compliance: certain countries have regulations on securing specific data sets and keeping the data within that same country (Safe Harbor laws, etc.). Does your cloud solution provider support this?
• Ownership and administration: if the system is in the cloud, will the business users maintain it or will IT still have to manage it? The expected approach should be agreed upon before embarking on a cloud deployment.

8. How will we know that it worked? If management asks in 6 months, “Was it successful?” how will we answer?
Your planning will surely include some financial metrics (decommissioning existing proprietary solutions, or at least avoiding costly upgrades, and measurably improved performance in any number of ways), but think about non-financial improvements as well. One life insurance company saw 50% annual turnover in its actuarial department drop to almost zero when the actuaries were finally able to do creative work instead of chasing data from legacy systems and trying to reconcile it. A charitable organization saw its fundraising costs drop from 22% to less than 10% by discovering the most effective programs and donors.

Management loves numbers, so be sure to use some metrics. But we’d also suggest some mention of staff engagement, increases in analytical skills and output, and especially cycle time for new analyses. What did the organization learn that it didn’t know before? How have processes changed given new insights? What hard ROI can be demonstrated? What would happen from a business and user perspective if the system were decommissioned tomorrow? Is there a next phase planned? Why or why not?

9. If we don’t yet have ongoing funding, do we understand where that funding will need to come from and what will be required to justify it?

Even the most successful and wealthy companies regard investment as an exercise in allocating scarce resources, so don’t end up stranded and unable to meet your goals for lack of funds. Of course, implementing in the cloud bypasses, to some extent, the capital budget merry-go-round and may be a good alternative if capital funding is not a possibility.

It can be easier to get organizational funding for a pilot or proof-of-concept. However, it can take meaningful time and effort to deliver a pilot. It’s important to document what the pilot is supposed to achieve, with a clear and specific use case and a “ball park” ROI.
There’s no point in spending time on a pilot if it will ultimately be shut down for lack of ongoing funding, even when it demonstrates value. Understand which organization(s) would be expected to fund the project post-pilot, and get clear agreement on the expectations of the pilot and the proof points that will be required for ongoing funding.

10. Have we spent enough time understanding the level of effort to gather and integrate data we’ve never seen before?

Big data, whether of great scale or diversity, or both, poses problems not seen in existing analytical environments. ETL tools used for data warehousing are largely process scripting and monitoring tools, and they are weak on data that isn’t neatly structured. With big data, you will need skills and tools for data that is, frankly, a little messy. That may require some machine learning (ML) as well as orchestration of the ingestion process, whether streaming or in batch. Use a Proof of Concept to get a sense of what you are facing in terms of time, skill and expense.

ABOUT THE AUTHOR

Neil Raden is an author, consultant and industry analyst, a featured figure internationally and the founder and Principal Analyst at Hired Brains Research, an industry analyst and consulting firm specializing in the application of data management and analytics. Hired Brains focuses on the needs of organizations and the capabilities of technology by providing context to the often-bewildering choices facing organizations. Rather than producing market studies ranking companies, Hired Brains guides organizations through a process that stresses value and realistic planning.
He began his career as a property and casualty actuary with AIG in New York before forming Hired Brains in 1985 to deliver predictive analytics services, software engineering, and systems integration. Since then he has gathered experience in delivering environments for decision making in fields as diverse as health care, nuclear waste management and cosmetics marketing, and many others in between. With a mixture of research and advisory work, infused with experience working with clients on real projects to provide context to the industry, Hired Brains assists both providers and consumers of technology. He welcomes your comments at nraden@hiredbrains.com.

FREE TRIAL: datameer.com/free-trial
TWITTER: @Datameer
LINKEDIN: linkedin.com/company/datameer

©2016 Datameer, Inc. All rights reserved. Datameer is a trademark of Datameer, Inc. Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. Other names may be trademarks of their respective owners.

SAN FRANCISCO
1550 Bryant Street, Suite 490
San Francisco, CA 94103 USA
Tel: +1 415 817 9558
Fax: +1 415 814 1243

NEW YORK
9 East 19th Street, 5th floor
New York, NY 10003 USA
Tel: +1 646 586 5526

HALLE
Datameer GmbH
Große Ulrichstraße 7 – 9
06108 Halle (Saale), Germany
Tel: +49 345 2795030