eBook

6 obstacles analysts face in self-serve analytics and how to overcome them
Strategies and tips for implementing self-service analytics in a data mesh architecture

The promise of self-service analytics in a data mesh architecture

Forward-thinking, data-driven organizations have embraced modern data architectures as a way to better manage their data and deliver value at scale. The rise of decentralized environments such as the data mesh is taking this to a new level, giving organizations even greater flexibility, scalability, and seamless data movement.

A data mesh is also a highly effective way for organizations to improve self-service analytics: it gives data analysts and business users faster access to the data they need without having to move it into a data lake or warehouse, greatly reducing their reliance on IT and engineering. For any business that wants to truly benefit from this, it is critically important to remove the blockers standing in the way of the value that self-serve analytics can deliver.

In this ebook, we cover the six most common obstacles data leaders should be aware of when implementing self-service analytics, so that you can help your team deliver high-quality data products that enable faster decision-making and greater business agility.

Raj Bains
Founder and CEO, Prophecy

Obstacle 1
Slow product delivery

Slow product delivery is one of the biggest obstacles data leaders face when implementing self-service analytics. When data products take a long time to deliver, decision makers may not have the information they need to make timely, informed decisions. This can result in missed opportunities or, worse, decisions based on incomplete, inaccurate, or outdated information. Delays in data product delivery also cause frustration and erode confidence in the data team's ability to deliver value to the organization. Here are some of the top factors that contribute to slow product delivery:

Lack of visibility into available data
When analysts cannot seamlessly view, access, and analyze data from multiple, diverse sources, decision-making suffers. The causes vary: incompatible data formats, data silos, inadequate data integration tools, or a lack of proper data governance. These hindrances limit the efficiency and effectiveness of data analysis and create data blind spots. Analysts may struggle to identify correlations, trends, or patterns, and may be unable to extract the insights that drive business outcomes. Organizations therefore need to invest in robust data management and integration solutions that let analysts access and analyze data from different sources without impediment.

Tooling not built for data analysts
Most data transformation tooling is not designed with business data users in mind. The result is an over-reliance on engineering resources to build data pipelines that require complex coding. The complicated coding work needed to build pipelines that integrate and transform data for new data products can stretch from days to weeks or even months. This causes frustration for all stakeholders and keeps engineering from more critical tasks.
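To make the point concrete, here is a purely illustrative sketch of the kind of hand-written Spark pipeline that analysts typically wait on engineering to build. The table names, columns, and logic are hypothetical and not taken from any specific system.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_revenue_by_region").getOrCreate()

# Hypothetical source tables
orders = spark.read.table("raw.orders")
customers = spark.read.table("raw.customers")

daily_revenue = (
    orders
    .where(F.col("status") == "COMPLETE")             # keep only completed orders
    .join(customers, on="customer_id", how="left")    # enrich with customer region
    .withColumn("order_date", F.to_date("order_ts"))  # normalize the event timestamp
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"))            # daily revenue per region
)

# Publish the data product for analysts to consume
daily_revenue.write.mode("overwrite").saveAsTable("analytics.daily_revenue_by_region")

Even a simple transformation like this assumes familiarity with Spark APIs, join semantics, and deployment, which is exactly the dependency on engineering that self-service tooling aims to remove.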
Poor data quality assurance
Low-quality data pipelines can be a significant problem for organizations, leading to a range of issues that affect the performance and reliability of the system. Pipelines may not function correctly once built and deployed, or they may generate inaccurate data. These problems can go undetected during testing and development, only to surface in production, where the pipeline must be pulled down and rebuilt.

Lack of reusability of data artifacts
Quick access to pre-existing data sources, datasets, and pipelines saves analysts considerable time and resources. Without it, delivery of data products slows significantly, because analysts are forced to start the analysis from scratch every time a dataset needs evaluation.

Inability to identify data quality issues
Data analysis depends on high-quality data. Poor-quality data may contain errors, inconsistencies, or duplicates, making it time-consuming to work with and slowing the deployment of data pipelines. Because pipelines rely on the accuracy and consistency of data to produce meaningful insights, the insights generated from poor data may not be accurate or reliable, degrading the overall quality of the analysis. Investing in data cleaning and validation processes keeps the data at a high standard and minimizes errors and inaccuracies in the final output.

The solution
A key concept of a data mesh is data productization: treating data as a product that is designed, developed, and maintained by data teams. To equip the data team to enable self-service analytics and avoid slow product delivery, organizations need a data engineering layer that allows them to:

• Integrate more easily with modern tooling for data engineering and analytics
• Quickly build real-time data pipelines that access data from disparate source systems
• Enable more data team members to participate in data engineering
• Collaborate more easily between different teams and departments
• Ensure data quality to improve data pipeline reliability and performance

Obstacle 2
Lack of ownership and accountability

A lack of ownership means not knowing who owns a particular dataset, how it has been transformed, and who is responsible for each part of the process. For self-service analytics to succeed, data leaders must define who owns the data, or data product, and how it will be managed. Without clear ownership, businesses can expect a variety of challenges to trickle down the workflow:

Poorly defined business and technical requirements
For self-service analytics to be successful, organizations must develop well-defined business and technical requirements and communicate them clearly to all stakeholders. These include a clear understanding of which data sources are available, how the data is structured, and how often data is refreshed or added. Leaving this undefined leads to inaccurate analysis and poor decision-making.
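As a minimal sketch of what written-down ownership and requirements can look like, the following descriptor records who owns a data product, where it comes from, how it is structured, and how often it refreshes. The fields and values are hypothetical, not a prescribed standard.

from dataclasses import dataclass

@dataclass
class DataProductSpec:
    name: str                 # fully qualified name of the data product
    owner: str                # accountable team or individual
    source_tables: list[str]  # upstream datasets the product depends on
    refresh_cadence: str      # e.g. "hourly" or "daily at 06:00 UTC"
    schema: dict[str, str]    # column name -> data type
    intended_use: str         # the business question the product answers

daily_revenue_spec = DataProductSpec(
    name="analytics.daily_revenue_by_region",
    owner="sales-analytics team",
    source_tables=["raw.orders", "raw.customers"],
    refresh_cadence="daily at 06:00 UTC",
    schema={"order_date": "date", "region": "string", "revenue": "decimal(18,2)"},
    intended_use="Daily revenue reporting for regional sales leadership",
)

Keeping a descriptor like this versioned alongside the pipeline gives every stakeholder one place to check ownership and expectations before the data is used.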
Disconnect between data product and value
If the intended business value of a data product is unclear, trust in that product will be low. Even if the product itself is performant and can provide good insights, the lack of clarity leads to a lack of usage.

Low data product quality
Delivering data that is stale or incomplete can result in inaccurate analysis, incorrect insights, and bad decisions that do far more harm to the business than good. And once a data product is viewed as unreliable, it will go unused or be considered unusable, wasting the time and resources spent developing it.

The solution
These challenges often occur in organizations with a fragmented, decentralized data architecture, for which a data mesh is a better strategic fit than a data warehouse architecture that requires more centralized ownership of data. This decentralized architecture is best served by a data engineering platform that can:

• Automatically ensure data quality
• Offer an intuitive low-code environment that lets teams iterate on data products until requirements are better understood and defined
• Build and modify data pipelines quickly and easily, without writing code, reducing the time and resources required to create and modify pipelines as requirements evolve

Obstacle 3
Lack of data visibility

Data visibility is how well an organization can monitor, display, and analyze data from disparate sources. With clear data visibility, businesses can make better-informed decisions quickly. Organizational success relies heavily on data visibility: its absence leads to inefficient operations, missed opportunities, increased risk, and poor decision-making. Put simply, data visibility is critical. Here are some factors that affect data visibility and how they impact self-serve analytics:

Understanding where data originates
Self-serve analytics is often based on data collected from multiple, disparate sources. A lack of clarity about these sources and the quality of their data can lead to inaccurate results or incomplete insights.

Controlling data change
When a data source is modified or updated, the information it contains changes. For example, a sales database may gain new entries or have old entries modified as new transactions occur or customer information is updated. These changes affect the raw data and can change how that data is interpreted and used for analytics.

Rising data volumes
New data entering an organization's data ecosystem increases overall data volumes, which can significantly affect the performance of the pipelines that move data between the stages of the analysis process. As volumes grow, processing time and latency increase, which can lead to pipeline failures and data loss. New data can also change the insights derived from it, surfacing previously unidentified trends or contradicting existing data and requiring analytical models to be changed or rebuilt.

Unclear data labels and definitions
Labeling data is a critical aspect of data analysis because labels give information context and meaning. With labels in place, data becomes simpler to identify, organize, and analyze.
When data lacks proper labeling, it becomes challenging to perform accurate analysis and draw reliable conclusions. Analysis of unlabeled data may produce inaccurate insights, with significant consequences for the organization.

The solution
In a data mesh architecture, data visibility matters because it enables data teams and other stakeholders to understand and effectively manage the flow of data across the organization. This can be achieved with tooling that allows the data team to:

• Build and visualize data flows, making it easy to understand data lineage and see how data is transformed and where it comes from
• Easily catalog data, including metadata and information about data sources, tables, and columns, which helps users understand data structure and context

Obstacle 4
Working in silos

Siloed work is one of the most common issues data teams face, whether their organization is in growth mode or at enterprise scale. It occurs when different teams or team members take ownership of certain datasets or data products, work in isolation, and do not share information, knowledge, or resources with others in the organization. Working in silos greatly limits collaboration and reduces efficiency, because teams can be unaware of the work being done by others. Here is what that often looks like:

Ineffective communication with data engineers
Limited or poor communication with data engineering can severely undermine self-serve analytics. Data engineers are responsible for enabling data analysts and other business data users to create the self-service analytics needed to make informed decisions. With limited communication, decision-making is delayed at best, and poor guidance is shared with the organization at worst.

Duplicate efforts across teams
In organizations with siloed data structures, different teams commonly tackle similar challenges without realizing that their efforts overlap. This leads to inefficient use of resources, including time, budget, and manpower.

Broken transformations
When data engineers and data analysts work in a siloed structure, engineering has no visibility into the pipeline work done by analysts or whether those pipelines and their transformations are performing correctly. The resulting intelligence shared with decision makers can lead to poor decisions.

Poor data hygiene
Within a siloed data organization, it is difficult to ensure that data is being properly maintained, updated, and secured. This can lead to errors, inconsistencies, and other issues that compromise the integrity of the data being used.

Disconnect with business needs
If data engineering does not have a clear understanding of the downstream use cases for its data, it is impossible to ensure that the data is properly structured and optimized for those use cases.

Lack of data governance and lineage
When siloed teams work with their own datasets, there is very little visibility into how that data has been transformed and used over time, or whether there are any issues or errors in it. This undermines self-serve analytics because data analysts and business users have no way to confirm that their data is reliable and accurate.
As data evolves, quality issues can creep in without proper mechanisms to track modifications, additions, and deletions. This affects the quality and completeness of the data, and with it, trust in the data.

The solution
In a data mesh architecture, collaboration is essential for integrating data products, enabling data sharing, and building data pipelines across different domains. It allows data teams to unlock the true potential of their data assets and drive data-driven decision-making and value creation. Collaboration can be ensured with a data engineering platform that offers:

• A centralized workspace where data team members work together in a shared environment, eliminating disjointed workflows and keeping everyone on the same page
• Visual and low-code tooling that not only simplifies complex data engineering processes but also improves communication, because every data team member can understand and contribute to the pipelines and transformations
• Robust version control so that data team members can work simultaneously on data assets, track changes, and merge their work without conflicts
• Data lineage capabilities that trace the origin and transformation of data throughout pipelines, promoting transparency and understanding through clear insight into data dependencies and transformations
• Notification and alerting features so that team members receive updates on changes made by others as they happen, opening up opportunities for communication

Obstacle 5
Insufficient Service Level Agreements

Service Level Agreements (SLAs) are critical for ensuring that business users have reliable access to the data they need, when they need it. Without SLAs, users experience inconsistent and unreliable access to data, resulting in delayed response times or errors in their analysis. They may also have difficulty trusting the data they are using, which can result in inaccurate insights and incorrect decision-making. In practice, this may look like:

Performance does not meet business needs
Without SLAs, business users cannot access the data they need when they need it, data quality can be poor, and the accuracy of the data used to support decisions is questionable. There is also no clear accountability when performance issues occur, which causes friction between engineering and business users.

Inability to observe data accuracy and quality
When no SLAs specific to data quality are in place, neither data engineering nor data analysts and business users can be sure that their business intelligence is accurate. Poor-quality data leads to incorrect analyses and insights and, ultimately, poor decision-making. "Garbage in, garbage out" applies here.

Unclear timelines
Without SLAs that enforce overall data availability and quality, there is no guarantee that data products will be delivered on time or that they will be of high quality. This lack of clarity causes significant delays in data analysis and can even lead to failed delivery of data products. Further, a lack of data freshness, with no SLAs ensuring data is current, can delay decision-making and cause missed opportunities for growth or cost savings.
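As a minimal sketch of how a freshness SLA can be made observable, the check below compares the newest ingestion timestamp in a table against an agreed threshold. The table name, the ingested_at column, and the four-hour limit are hypothetical assumptions for illustration only.

from datetime import datetime, timedelta
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("freshness_check").getOrCreate()

FRESHNESS_SLA = timedelta(hours=4)  # hypothetical agreement: data is never more than 4 hours old

# Latest ingestion timestamp in the data product (assumes ingested_at is stored in UTC)
latest = (
    spark.read.table("analytics.daily_revenue_by_region")
    .agg(F.max("ingested_at").alias("latest_ingest"))
    .collect()[0]["latest_ingest"]
)

age = datetime.utcnow() - latest
if age > FRESHNESS_SLA:
    # In practice this would alert the owning team or fail the scheduled run
    raise RuntimeError(f"Freshness SLA breached: data is {age} old, limit is {FRESHNESS_SLA}")

Running a check like this on every refresh turns "is the data current?" from a judgment call into a measurable, enforceable agreement.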
The solution
The right data engineering platform gives data teams powerful capabilities that ultimately improve SLA compliance, such as:

• Automation of the tedious, repetitive tasks involved in data engineering, reducing the chance of errors and delays that threaten SLA compliance
• Low-code data transformation tooling that enables a much larger segment of the data team to perform transformations without being coding experts, boosting productivity, lowering the burden on data engineering, and removing a potential productivity bottleneck
• Strong collaboration capabilities that keep everyone on the same page and working toward the same goals

Obstacle 6
Code quality and complexity

Code quality and complexity go hand in hand, and each has a massive impact on the success of self-serve analytics. Code quality is an assessment of how well code is written: if the code underlying data pipelines or driving business intelligence is not easy for engineering or other business users to read and understand, its reliability and overall performance become highly questionable. Code complexity, which is also an indicator of code quality, refers to the amount of code used to build and deliver the reporting and intelligence that drives business decision-making. Here is how they impact self-serve analytics:

Code is incomprehensible
Code that is hard to understand is an indicator of poor code quality. When engineers write code that does not follow best practices, or write it in a way that other users cannot easily follow, issues arise that prevent self-service analytics from working correctly. The result is data products that go unused at best, or data products that are unreliable and deliver inaccurate outputs that damage decision-making at worst.

Breaks between development and production
When there is uncertainty about whether something working in development might break something in production, it is difficult to trust the data and insights generated by self-service analytics. This uncertainty leads to hesitation in making decisions based on the data and a lack of confidence in the accuracy of the results.

Inability to perform unit tests
Unit tests confirm that individual units of code function correctly and as expected. Without unit testing, there is a far higher risk of deploying low-quality code that causes data loss, unreliable outputs, and inaccurate business intelligence.
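As a minimal sketch of what unit testing a pipeline step can look like, the test below exercises a single, hypothetical transformation with pytest and a local SparkSession; the function and sample data are illustrative only.

import pytest
from pyspark.sql import SparkSession, functions as F

def add_order_date(df):
    # The unit under test: derive a calendar date from the raw order timestamp
    return df.withColumn("order_date", F.to_date("order_ts"))

@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[1]").appName("pipeline-unit-tests").getOrCreate()

def test_add_order_date(spark):
    df = spark.createDataFrame([("o-1", "2024-03-01 10:15:00")], ["order_id", "order_ts"])
    result = add_order_date(df).collect()[0]
    assert str(result["order_date"]) == "2024-03-01"

Small, fast tests like this catch broken transformations before deployment rather than after a decision has already been made on bad data.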
Pipelines are difficult to share
When data pipelines cannot easily be shared, data team members cannot collaborate effectively on them, especially for QA purposes. This raises the risk of inaccuracies in any analysis driven by those pipelines, or of pipelines that perform poorly. From a reusability standpoint, when pipelines cannot be shared easily, there is a greater risk of duplicated work, with multiple pipelines being developed to do the same job, wasting time and resources.

The solution
Maintaining code quality while reducing the impact of code complexity on self-service analytics comes down to the capabilities provided for data engineering. These include:

• Visual tooling for building data pipelines, which simplifies the design and management of complex data workflows, helps eliminate the need for complex code, and reduces the risk of errors
• Low-code development, which reduces the amount of manual coding required and improves the overall quality of the code
• Built-in data quality capabilities such as data validation, error handling, and remediation suggestions
• Strong collaborative capabilities such as commenting and sharing, code reusability, and version control

Delivering on the value of data mesh with self-serve transformations

Prophecy is a low-code data engineering platform that combines the power of a visual ETL tool with the flexibility and performance of custom-written code. Using drag-and-drop components, users of varying skill levels can automatically generate high-quality code that can be easily reviewed, optimized, and debugged on your cloud data platform of choice, with no vendor lock-in. Prophecy also enables all data users to build data pipelines that adhere to the principles of data mesh, such as domain-driven design and event-driven architecture, and that can be easily shared and reused across different teams or domains. Prophecy empowers data users to implement data mesh with a self-serve, low-code platform, enabling all data users to visually transform and ship trusted data with software engineering best practices.

Customer spotlight: Waterfall Asset Management

As a leader in managing complex financial assets such as asset-backed credit, whole loans, real assets, and private equity, Waterfall Asset Management understands how critical data and analytics are to discovering compelling investment opportunities. Implementing a data mesh architecture enabled them to greatly improve the productivity of their data operations as well as investment performance for their clients.

Business impact of Prophecy
• 14x improvement in data operations productivity
• 4x faster time-to-insight for trade desk analysts
• 3 hours to complete the Prophecy POC

The Challenge
Manual processes hurt time-to-value
Waterfall Asset Management, a global investment management firm, has been on a mission to leverage the power of data insights to improve investment performance and mitigate risk for clients. However, their data operations teams were hampered by manual processes and a legacy ETL system that buckled under the velocity and variety of their data at scale. Hiring more engineers was only a band-aid fix, with workflows still slow and data quality still suffering.

The Past
Before implementing their data mesh architecture, data delivery was slow: the legacy ETL system required manual intervention by data engineering and could not scale to support the needs of the business.

Missing investment opportunities
Less tech-savvy business users were either completely reliant on data engineering to access and transform data or forced to perform manual data work rather than creating business value. The inability to deliver data-driven insights in a timely and accurate manner impacted client services and portfolio performance.

The Solution
Empowering business users to do more with data
Waterfall chose Prophecy's low-code data engineering platform on the Databricks Lakehouse, giving business users an intuitive, self-serve tool to visually onboard and transform data without needing to code or depend on engineering.
Putting the power of data directly into the hands of the users dramatically increased team productivity and customer satisfaction.

Open-source code and data governance built in
As business users develop pipelines, Waterfall's engineering team can access the 100% open-source Spark code automatically generated by the platform. This lets data engineers apply software development best practices for collaboration and ensure that data products are performant and reliable.

The Future
Prophecy's low-code data platform serves as the backbone for Waterfall's data mesh architecture, enabling broader participation in data product delivery and providing massive performance improvements.

The Results
Data that moves at the speed of the market
Equipped with a low-code data platform that all data users can use, Waterfall has fast-tracked data engineering workflows, reducing the time to prepare and transform data from up to 2 weeks to about half a day, a 14x performance gain. The benefits of faster data engineering have been felt downstream, as business users on the trade desk now receive actionable data within 2 hours instead of a day, a 4x improvement over when they were dependent on the engineering team. And with repeatable frameworks in place, data pipelines can be standardized and easily shared across teams, reducing overall effort and operational errors.

"Prophecy removes the engineering barriers that were blocking our ability to help our clients drive investment value with data. Now anyone on our team can go from data to insights much faster, without having to speak to a team of engineers."
Shehzad Nabi
Chief Technology Officer

Streamlining the path to better investments
With the tools in place for all data users to accelerate data onboarding, the data operations team has been able to shift its focus toward more strategic tasks like data governance, addressing data quality problems before they reach the end user. With Prophecy in place, it is clear that Waterfall's focus on leveraging data to improve investment decisions is paying off and will continue to drive their business into the future.

Try Prophecy and realize self-service success

Prophecy helps organizations avoid the common obstacles standing in the way of implementing self-service analytics. Through a low-code platform, both technical and business data users can easily build high-quality data products in a collaborative and transparent way, enabling faster decision-making and greater business agility.

Get started with a free trial
You can create a free account and get full access to all platform features for 14 days. Want more of a guided experience? Request a demo and we'll walk you through how Prophecy can empower your entire data team with low-code ETL today.

Prophecy is a low-code data transformation platform that offers an easy-to-use visual interface to build, deploy, and manage data pipelines with software engineering best practices. Prophecy is trusted by enterprises, including multiple companies in the Fortune 50, where hundreds of engineers run thousands of ETL workloads every day. Prophecy is backed by some of the top VCs, including Insight Partners and SignalFire. Learn how Prophecy can help your data engineering in the cloud at www.prophecy.io.