Machine Learning Product Manual
VERSION: 0.0.5
Authors: Laszlo Sragner, Christopher Kelly
Contact: mlpm@hypergolic.co.uk
Website: http://machinelearningproductmanual.com/
© 2020-2021, Hypergolic Ltd, CC-BY 4.0

Introduction
This handbook will help you to plan, organise and execute cross-functional machine learning projects in small team environments. Throughout this document, you will learn the lifecycle of these projects, what assets you need to create and how to operate them on an ongoing basis. Finally, it will give you the reasoning on why these steps are necessary and how they will benefit your product.

Benefits
- Business-driven problem solving: A framework to surface, evaluate and prioritise potential opportunities.
- Consistent, low-friction organisational structure: Instructions on how to build cross-functional product teams and involve domain experts and engineers in machine learning projects. Fluid communication among the teams involved through meaningful reports and meetings.
- Fastest way into production and feedback from real users: Detailed instructions on how to build a machine learning MVP.
- Workload focus on the core product team: Describes how to allocate resources and update plans on an ongoing basis.
- Granular investment of time and resources: How to improve results on an MVP on an ongoing basis.

How to use this book
This manual treats machine learning as a product management question that will be part of the organisation on an ongoing basis. Decisions made during the initial phase of the project should take long-term impacts into account as well. The manual is deliberately opinionated and prescriptive, so readers need to evaluate their own circumstances if they want to deviate from this roadmap. One of the core benefits of this book's method is the granularity of the process, with many small goals to achieve, which makes the process very flexible. The Benefits section above details the advantages of adopting the process. The Executive Summary section gives you talking points to discuss with business leaders at your company to build organisational support.
First: Read Overview to get a very high-level overview of the whole process.
Second: Read Bootstrapping Overview and Operating Overview about how the product will be rolled out technically.
Third: Read Organisation Overview about what teams you need to set up and how they will communicate with each other.
Fourth: Read everything else from the beginning of Lifecycle, and keep reading the corresponding Organisation and Communication bits as you progress.

License
© 2020-2021, Hypergolic Ltd, Some Rights Reserved
Except where otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). See also: License

Acknowledgements
The Authors would like to express their thanks to the authors of MkDocs, Material for MkDocs, FastAPI and Mermaid for providing the tools and examples that made this eBook possible. Thank You! Diagrams by LucidChart, available on request in SVG format at mlpm@hypergolic.co.uk.
Contents
- Introduction (this page)
- Executive Summary
- Overview
- Lifecycle Overview
- Business Understanding Overview: Identifying Business Problems; Defining KPIs; Ranking Business Problems
- Planning Overview: Approach to Solving Problems with ML; ML MVP Design Workshop; Resourcing Business Problems; Initial Product Definition; ML MVP Design Review
- Decision Overview: Product Strategy; Business Plan; Go/No-Go Product Decision
- Bootstrapping Overview: Organisational Setup; Initial Project Structure; Initial Task Execution
- Operating Overview (Product): Product Overview; User Research; User Analysis; Product Concepts; Product Requirements; Product Prioritisation; Periodic Business Review; Feature Iteration; Core Iteration; Business-As-Usual
- Organisation Overview: Business Leadership Team; Product Management Team; Core Product Teams (Domain Team; Data Science Team; Data Acquisition Team); Functional Teams (Engineering Team; Data Warehouse Team; Business Intelligence Team)
- Communication Overview
- Meetings Overview: Special (Product Kick-Off; ML Design Workshop; ML Design Review); Regular (Product Meeting; Coordinating Sprint Planning; Sprint Planning; Daily Standup); Company Wide (All-Hands; One-To-Ones)
- Artefacts: Business Problem Template; Business Problem Ranked List; Product Strategy Template; Business Plan Template; User Persona Template; Product Concept Template; ML Product Requirements Template; Meeting Minutes Template
- Other Communication: Monitoring Dashboards; Reports; Object Handovers; User Interview Questions
- Glossary
- Citing
- Changes
- License [CC-BY]

Executive Summary
Only 10% of companies obtain significant financial benefits from machine learning technologies (source: MIT Sloan). Until now, machine learning has been dominated by traditional techniques that don't support the needs of businesses. This manual introduces a framework that aligns business value and machine learning efforts in your organisation. It creates transparency through regular feedback from customers and more confident financial projections. Using rigorous business processes, you have granular control over the strategic allocation of resources. Cross-functional involvement leads to higher organisational trust in machine learning projects and enables adoption at scale across the company.

Transparency
Instead of waiting for results until the complete product launches, the framework shows you how to cut it into smaller standalone deliverables. Each iteration is validated in the market by selected users, and the project team reports live product performance and updated estimations. This creates continuous insight into the project's health for both the organisation and its executives. The project presents real business results rather than monthly progress reports.

Control
Traditional machine learning projects only deliver results after they have consumed most of their allocated time and resources. Using this framework, project teams work iteratively, delivering results continuously. In each cycle, the team reorients the project based on insights discovered in the previous round. Executives have a clear view of progress and can intervene if the project deviates from its original objectives. This transparency and control leads to increased oversight and reduces potential opportunity costs.

Trust
A better organised, more transparent and more controllable project is a trustable project. Rather than being in the dark and reading never-ending progress reports, teams and management get real results and make real decisions that impact the future materially.
Executives can be confident in their decisions to invest in machine learning. This trust leads to more positive involvement in these projects across the organisation.

Organisational Transformation
The framework has a long-term, transformational effect on adopting machine learning across the organisation. Currently, the high failure rate of projects leads to decreased trust and involvement in them across organisations. Data Scientists are siloed into machine learning projects and fail to drive cross-functional engagement. Apart from the benefits during the product's lifetime, the framework drives organisational adoption through three main factors:
1. The close alignment with business value motivates the whole organisation to be involved in machine learning projects.
2. Cross-functional teams have the opportunity to repeatedly practise their roles, which builds institutional knowledge and reduces friction in future projects.
3. Adaptive planning allows teams to focus on the most valuable opportunities according to their expert opinions. This increases involvement, autonomy and general ownership, and enables a positive outlook on future engagements.
Organisational transformation with machine learning requires engaged cross-functional teams across the whole company. The framework described in this book enables them to overcome the challenges that traditional techniques struggle with. Project teams that adopt it will be able to successfully execute high-value projects and spread this institutional know-how throughout the company.

Overview
This manual is broken down into three major parts, each of which has an overview that gives a summary:
- Lifecycle
- Organisation
- Communication
The Lifecycle part describes how to break down the creation and management of the product: what the team needs to do, when and why. The Lifecycle is divided into five phases, each described in a separate chapter. Each chapter is divided into sections which describe a step according to three components: Goals, How to do it and the resulting Artefacts. This structure will be familiar if you have used Microsoft's Team Data Science Process before.
The Organisation part describes a suitable setup of teams and roles that can facilitate the above process. When designing it, we made sure that each team's responsibility matches their unique skillset, so they can work in their comfort zone and focus on what they do best. Each step in the lifecycle has its designated responsible team. That team is optimally situated to deal with that step and own it throughout the lifetime of the product.
The Communication part describes the required channels and artefacts between the teams, and the meetings that facilitate them. To create a low-friction setup, these are kept to a minimum and optimised so that the flow of information is as efficient as possible.

Lifecycle of a Machine Learning Product
This article outlines the high-level stages of Machine Learning product creation. Each phase consists of several steps, each of which has its own goals, roles, tasks and deliverables. The Machine Learning Product Lifecycle consists of five phases:
- Business Understanding
- Planning
- Decision
- Bootstrapping
- Operating

Machine Learning Product Lifecycle (Phase 1)

Business Understanding
The goal of this phase is to understand the range of business problems that can be solved by Machine Learning and then select one to work on. It requires a good understanding of the environment, the clients and how it fits into general corporate strategy.
It also needs to assess whether Machine Learning is the right solution, and reject it if not.

Planning
Before work on the product can commence, the right stakeholders need to create a plan that allows MVP-style early productionisation and later iterative improvement cycles. During two workshops, the Product Team agrees on the initial target personas for the MVP.

Decision
Planning is followed by a Go/No-Go decision by the Business Leadership Team, which enables the Bootstrapping and Operating phases. The Product Team estimates the one-off and ongoing costs and the ongoing benefits to support the decision.

Machine Learning Product Lifecycle (Phase 2)

Bootstrapping
The goal of an MVP-driven product is to shrink the time and resources needed between planning and continuous operation. This phase is Bootstrapping. The project structure is implemented, the communication interfaces are established, and a partial solution is deployed into production.

Operating
After Bootstrapping, the Product Management Team collects feedback from Business Leadership, Users and the Domain Team. Based on this feedback, they create feature requests with the Core Product Teams (Domain, Data Acquisition and Data Science Teams). The Core Product Teams are responsible for iterating on the feature and creating the assets that can be deployed and prove that the feature is complete.

Business Understanding
Most products and businesses will, at any one time, have many different business problems that present themselves. These problems may emerge from client conversations or marketing observations, via sales, marketing, customer success or other customer-facing teams, and may be closer to or further from any existing business strategy. The vast majority of teams will not have the resources to tackle all of the business problems of which they are aware. For this reason, it is necessary to analyse and prioritise the problems to work on. Many different prioritisation frameworks can be applied to business problems (some popular examples include RICE, MoSCoW, Kano and Buy-a-feature). We are not prescriptive about the particular model that any team follows, but rather provide a minimal, lightweight framework for prioritising problems that emphasises the particular nuances of prioritisation in ML. Generally, prioritisation, and in particular the activities that precede it, should be undertaken with the understanding that ML products are higher risk, and product success is not guaranteed. Spending longer periods refining estimates of work or cost will only minimally impact prioritisation and will have diminishing returns, given the challenges of estimation.

In this section
- Identifying Business Problems for ML
- Defining KPIs
- Ranking Business Problems

Identifying Business Problems for ML
Business problem identification is an ongoing process in any organisation. It is expected that business problems will be collected from a wide variety of stakeholders within and outside of your organisation. These problems may not be well formulated, and it cannot be assumed that they have been analysed carefully for fit with the ML capability or the business strategy. The product manager is responsible for organising the collection of information, analysing and, where necessary, re-stating problems to include sufficient information for review, and maintaining a library of these problems. Given the relative newness of ML and the exceptional coverage of AI in the media, the perception of its capabilities by the layperson is often distorted and poorly understood.
This has two impacts in particular:
1. Existing business units may not effectively identify the most impactful business problems for which ML can be useful.
2. The new capabilities of ML create new opportunities for products to solve business problems which are currently not vocalised by the market.
As such, in a larger and more established organisation it will often be beneficial for an ML Product Manager to partner with product managers and business leaders in other business units, to better identify use cases which support the unit's business objectives. It is expected that at the end of this process, a collection of business problems will be stated with sufficient quality for review. It is important to implement a framework which makes the process of developing and reviewing these business problems easy.

Goals
- Analyse a business problem to determine whether it is a good fit for an ML function.
- Document the identified business problems in a collection for review.

How to do it
- Collect business problems: Collect business problems from a wide variety of stakeholders.
- Analyse a business problem for the suitability of ML as a solution: Determine whether the business problem is relevant to the capability.
- Document the business problem: Prepare enough information to ensure that the business problem is clearly expressed and capable of being assessed for importance and any particular challenges.

Collect business problems
As discussed above, business problems should be collected as widely as possible. It is often most useful to create a straightforward way to capture a business problem as it is expressed. A business problem might be captured during or immediately after a meeting with a client, from a conversation with a salesperson or at another time. It may be useful to give other stakeholders in the organisation the ability to enter identified problems directly. This will allow for easy, passive problem collection. When analysing an existing product or business area, it will be faster to partner with internal domain experts to identify business problems. Care must be taken to build a mutual understanding of the ML capability, as well as the business objectives. As a minimum, the following should be understood and analysed:
1. Who are the current users and buyers of the product (often personas will be available for review)?
2. What are the jobs that those users and buyers are doing, for which the product is used?
3. How do the users and buyers measure success in those jobs; how do they value different jobs being completed and the attributes of completeness (e.g. accuracy, speed, cost)?
4. How does the product operate from a functional perspective?
5. What data is consumed, transformed and passed to end-users? Is the data of good quality, is it structured or unstructured, is it time-sensitive?
6. What is the current strategy of the business unit?
7. What KPIs does the business unit use to measure success?
8. What does the current product roadmap look like for the business unit, and what requests have been raised to add to the roadmap?
9. What observations have been made on competitor capability, win/loss analysis and the wider competitive landscape?
This is not an exhaustive checklist, but understanding and documenting these attributes will provide a sound foundation for further ideation of ML-relevant business problems. In particular, you should be aware that some of the best opportunities for the application of ML to an existing product might exist in areas adjacent to those currently served.
As such, it is crucial to document adjacent data flows and business processes.

Example
Consider a helpdesk ticketing product, used by a customer service team to manage incoming queries. In a typical legacy system, the customer service agent receives an email, opens a browser and accesses an internal system, where they enter the details from the email and select a problem or query type from a drop-down. The system routes the ticket to an available person in the correct team, based on the problem. An analysis of the flow of data for this system would start from the point at which a ticket is created, and end at the point at which the ticket is closed. Some useful features might be introduced using ML (for example, a system could be implemented to try to predict volumes of queries following an OS upgrade), though opportunities to add value are limited. Over the last few years, helpdesk systems have integrated more closely with the communication channels used by the underlying customers. Whilst the central ticketing concept remains, these helpdesk systems connect directly with email servers and clients to ingest customer emails, and integrate with live chat tools. An analysis of this data flow starts earlier, from the point at which the customer first expresses the problem. This opens up new opportunities for using ML. Instead of a customer service agent reading the email and directing the ticket, an ML system can classify many emails and immediately route them to the correct location. Other ML tools can look for anomalies in incoming emails to more quickly identify emerging issues. As these systems mature, some queries can be responded to directly and without human intervention. All of these processes significantly improve the customer experience and make helpdesk tools more valuable as a result, but they would have been far more difficult to implement without looking at the workflow adjacent to the original ticketing process.

Analyse a business problem for the suitability of ML as a solution
Business problems should be assessed for suitability and alignment with ML. An exhaustive review is not useful at this stage, but two tests are particularly helpful:
- Is there an engineering solution to the problem? Generally speaking, if an engineering solution exists, that should be the priority for solving the problem.
- Is the information required by the ML model generally objective? Most problems will require some data labelling to train the ML model. If you were to describe the labelling task to a diverse group of people who have sufficient general knowledge of the problem to complete the task, would they all label consistently?

Document the business problem
Finally, if the identified business problem has been accepted as relevant to the ML team, then it needs to be written up in such a way that it can be assessed for relevance. Problem statements should contain the following information:
- Describe who is experiencing the problem.
- Describe the current state experienced, and the desired or ideal state.
- Describe any relevant information about the things which currently prevent the person or organisation achieving the ideal state.
- Describe the benefits to the person or organisation of achieving the desired state.
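The four elements above can be captured in a lightweight, structured record so that problems are collected consistently. The following is a minimal sketch in Python; the class, fields and example values are illustrative assumptions, not part of the manual's Business Problem Template.

```python
# A minimal, illustrative record mirroring the four elements of a
# problem statement described above. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class BusinessProblem:
    who: str                 # who is experiencing the problem
    current_state: str       # the state experienced today
    desired_state: str       # the desired or ideal state
    blockers: list[str] = field(default_factory=list)  # what prevents the ideal state
    benefits: str = ""       # benefits of achieving the desired state

problem = BusinessProblem(
    who="Customer service agents triaging inbound email",
    current_state="Agents read every email and route tickets manually",
    desired_state="Most tickets are routed automatically on arrival",
    blockers=["No labelled routing history", "Legacy email integration"],
    benefits="Faster responses and lower cost per ticket",
)
```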
Artefacts
Business Problem Template

Defining KPIs
Business problems are prioritised by understanding their impact on the business. This impact should be understood in terms of the difference between the current and future state, as defined in Identifying Business Problems for ML, and the business value associated with that state change. The nature of ML solutions is such that an early definition of KPIs for any business problem is the most useful way to understand and facilitate discussion of impact. KPI development is an iterative process, so whilst it is discussed here in the context of understanding a proposed business problem, it will be necessary to periodically revisit the KPIs for any data product to ensure they remain the most relevant metrics to track. There are extensive resources available online (e.g. www.whatmatters.com) to guide the discussion and refinement of KPIs, and the estimation of business value. The process here may be adapted further based on your use of OKRs, or any other internal prioritisation system.

Goals
- Define one or more KPIs which will be affected by solving the problem, and the anticipated movement in those KPIs.

How to do it
- Define measurable outcomes: List the measurable outcomes that are expected to be achieved by solving the business problem.
- Define the business KPI impact: Business KPIs are the core metrics tracked by the business to understand business health.

Define measurable outcomes
Measurable outcomes are the things which you expect to change as a result of solving the business problem, or from moving from the current state to the future state. They have several attributes:
- They should express measurable milestones which, if achieved, will advance the objective(s) in a manner useful to their constituents.
- They must describe outcomes, not activities. "Deploy a model" is not an outcome. "Reduce the time to complete X task" more clearly represents the desired change.
- They must include evidence of completion. This evidence must be available, credible and easily discoverable.
It will often be necessary to include more than one measurable outcome for a particular business problem. A good test is to ask: "can I meet the measurable outcomes without fully solving the business problem?" If the answer is yes, then you need to iterate further.

Define the business KPI impact
Once the measurable outcomes have been documented, it is necessary to provide some estimate of business impact. Business KPIs include ARR, EBITDA and others. Business problems can alternatively be ranked based on product KPIs if these are in use. Product KPIs include DAUs, CAC, NPS, retention rate/churn and many more. Where existing ranking systems are in use they should, as far as possible, be followed, including any adaptation for their use in ML. One or more of the measurable outcomes is likely to be a desire to improve the performance of something. Particular care should be taken to ensure that measurable outcomes and business KPI impact remain aligned. This is the advantage of ranking ML problems from a foundation of KPIs.

Example
A streaming video service may describe a problem that "users watch one video, but struggle to find another and often switch to another service for their next show." The desired end state might be "make better suggestions to improve the proportion of users who watch a second video." When expressed as a defined outcome, we need to know how we expect this proportion to move (say, from 10% to 50%). We anticipate that the movement in this KPI will result in increases in user retention rate. We might estimate that this will improve the retention rate by 1%. In this case, we can now inspect the business plan more clearly. Is it realistic to improve second-video-watching from 10% to 50%? If this metric were currently at 95%, and the expected improvement was to 96%, would we expect that to have a significant impact on retention rate?
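As a sketch of this sanity check, the following back-of-envelope calculation tests whether the anticipated retention movement in the streaming example is plausible. Every figure other than the 10% to 50% movement quoted above is an illustrative assumption.

```python
# Back-of-envelope plausibility check for the streaming example above.
# The churn figures and subscriber count are assumed, illustrative inputs.
subscribers = 1_000_000
second_video_now, second_video_target = 0.10, 0.50  # proportion watching a second video
churn_single = 0.08  # assumed monthly churn of users who stop after one video
churn_multi = 0.06   # assumed monthly churn of users who keep watching

moved_users = subscribers * (second_video_target - second_video_now)
extra_retained = moved_users * (churn_single - churn_multi)
print(f"Extra subscribers retained per month: {extra_retained:,.0f}")
print(f"Retention rate improvement: {extra_retained / subscribers:.1%}")
# ~8,000 subscribers, or ~0.8 percentage points: close to the ~1%
# improvement the example anticipates, so the estimate is at least plausible.
```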
The goal is not to complete a case study of each business problem here. The key point is that care and attention should be taken to understand the potential to improve some relevant aspect of the product. Ultimately, it will often be best to score impact on a simple 10-point scale. Nonetheless, some understanding of the underlying expectations of performance changes should be documented and available for discussion and review with other stakeholders.

Artefacts
Business Problem Template

Ranking Business Problems
As highlighted in the Overview, it is essential and necessary to rank business problems. There are more business problems than can be tackled with the available resources, so the purpose of ranking is to enable the optimal selection of business problems to tackle. The order should take into account the perceived business impact, as well as any impediments to delivering the solution. It does not include an estimation of cost. It is almost always better to attempt a partial solution quickly for important business problems than to attempt to find a cost. Any cost you find is likely to be speculative and unreliable, given the inherent risk in solving problems with ML. The ultimate objective of ranking is to select the next business problem to review in more detail and attempt to address. That said, selection of a business problem to develop more fully does not commit you to complete development of the solution. The process of reviewing and documenting a potential solution necessarily involves a more thorough analysis that can lead to the discovery of impediments to the solution. It is important to ensure all stakeholders are confident in a solution after this process has been completed.

Goals
- Rank business problems in priority order, based on the value of solving them.
- Select the next business problem to work on in more detail.

How to do it
- Assign a friction score: A friction score recognises and takes into account any particular impediments to delivering the solution described.
- Order the business problems: Once benefit and friction have been documented, problems can be ranked for selection.

Assign a friction score
ML problems can be affected by different types of risk, and many ML products ultimately fail because these risks are not acknowledged or tracked. As such, it is important to take account of certain areas of risk when tackling business problems. This risk will be measured with a 'friction' score.
- Availability of data to label. Do you have all of the data you need to train a model? If not, where will any additional data be acquired from?
- Ownership of data. Are you modelling on data you legally own? You may need to understand your company's role in terms of GDPR. Is your intended use for this data lawful and consistent with data regulations and any internal rules? A basic test is: if you got this information from a person, are you sure that the product you are creating is consistent with the reason they gave you the data in the first place? If the data is from a company, do you have the right to use the data to build other products?
- Availability of domain experts to label data. Given that there is an ML model to be developed, it is likely that some training data will be required. This will need to be labelled. If this process is domain-dependent, are domain experts available who can provide those labels? If not, who will label your data?
- Legal/regulatory issues which may impede the ability to operate in production. Are there particular regulations which govern the area you are working on? For example, a model which improves acceptance/rejection decisions for financial products will have to pass significant hurdles to ensure it doesn't introduce any bias in decision making. This will create significant friction in deploying the model, delaying the realisation of any perceived benefits.
- Engineering structure in which the model is used. Does solving the business problem with ML require significant changes to a well-established workflow? Or can a solution be imagined which broadly follows existing engineering patterns? Sometimes this challenge can be addressed by a carefully structured MVP; perhaps the useful information or output of the model can be delivered over a basic email or a supplementary web page. If benefits are proven, then the additional investment in engineering change is less important. Do not over-weight this component.
Taking these risks into account, a friction score should now be assigned to the business problem. Again, this is a general score, so a value of 1-10 is sufficient for ranking purposes.

Order the business problems
Business problems must now be ordered. The most straightforward method of ordering is to combine the KPI impact and friction scores. If KPI impact is measured from 1 (low impact) to 10 (high impact), and friction is measured from 1 (low impediment to solution) to 10 (high impediment to solution), then a simple score can be calculated as (KPI impact) × (10 − friction). New problems can be selected from the top of the list. More complex scoring systems can be developed from here, and these might be more or less strategic. For instance, you may find that many of your highest-impact problems also have higher friction in delivery. You may opt to create a balanced delivery framework in which a mixture of high-impact/high-risk projects and low-impact/low-risk projects are worked on together. Development of this framework will depend on your particular circumstances.
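As a minimal sketch of the ordering described above, the snippet below scores and ranks a handful of problems; the example problems and their scores are illustrative assumptions.

```python
# Rank business problems by (KPI impact) x (10 - friction).
# Both scores run from 1 to 10; the example data is illustrative.
problems = [
    {"name": "Auto-route helpdesk emails", "impact": 8, "friction": 4},
    {"name": "Predict post-upgrade query volume", "impact": 4, "friction": 2},
    {"name": "Automate loan acceptance decisions", "impact": 9, "friction": 9},
]

for p in problems:
    p["score"] = p["impact"] * (10 - p["friction"])

for p in sorted(problems, key=lambda p: p["score"], reverse=True):
    print(f"{p['score']:>3}  {p['name']}")
# 48  Auto-route helpdesk emails
# 32  Predict post-upgrade query volume
#  9  Automate loan acceptance decisions  <- high impact, but heavily penalised by friction
```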
Artefacts
Business Problem Ranked List

Planning Overview
All ML projects continue with a planning phase, during which the initial MVP scope is set. This scoping process requires extensive cross-functional work to ensure an MVP is agreed upon. The MVP must be deliverable in a short timescale, and must provide meaningful business value for an identified group of constituents. The ML MVP differs from the more traditional MVP made famous in Ries' The Lean Startup. Traditional MVPs focus strictly on finding the minimum set of requirements which deliver value for the identified Early Adopters. In that paradigm, the risk of a new product lies entirely in the question of product-market fit. In contrast, using ML introduces an additional risk of delivery. It is possible that the product which best fits the market is not deliverable with existing ML models. An ML MVP must control for both of these risks. You must resource for success. Whilst no amount of planning can guarantee success in ML, failing to prepare will certainly create the conditions for failure. ML models which are created in a vacuum of domain expertise will rarely work in production. Further, the probabilistic nature of ML necessitates that careful attention is paid to how the results of any given ML model will be presented to human users, whether internal last-mile operators or external clients. Extensive resources are available online covering the nuances of traditional product management. It is outside the scope of this manual to cover the generalities of good product design or good market discovery. The lifecycle described in this section should be integrated with your existing product management practice for the best results.

In this section
- Approach to Solving Problems with ML
- ML MVP Design Workshop
- Resourcing Business Problems
- Initial Product Definition
- ML MVP Design Review

Approach to Solving Problems with ML
During the Business Understanding phase, you defined a business problem and determined its approximate value. Some of this work will likely benefit from being revisited, for both recency and thoroughness. By the end of this section, you should have a clear plan to create an MVP aligned with your business objectives.

Goals
- Define the initial ML model target.
- Define how the model will be integrated into the product.
- Define success criteria.

How to do it
- Decompose the business problem: The business problem should be decomposed into a Machine Learning problem and any additional assumptions surrounding it.
- Conduct any required additional Market Analysis: In particular, this market analysis should seek to clarify value and decision-making, validate business assumptions where possible, and identify the composition of the market.
- Identify opportunities to simplify the Machine Learning model: For example, can a multi-class classification model be implemented usefully as a binary classification model? These opportunities should be grounded in your market analysis.
- Identify the initial target market for your product: The target market will be the earliest addressable market which is experiencing the business problem most significantly.
- Design a pipeline for the implementation of the product: Create an initial pipeline for the product. For early models, you may often need to use last-mile human analysis or other human-in-the-loop (HITL) processes. Understand how your data will flow, be modelled, and then be processed. Clarify how the threshold will be set for your model and the resources that will be required to operate it.
- Design the delivery mechanism for the product: Clarify how it will be delivered to the target market and any additional minimum requirements they have.
- Define the minimum release criteria: Define what constitutes "done" in terms of the development effort. In a HITL process, what is the maximum capacity of the available resource, and what is the volume of data flowing into the model? This will help to create a minimum standard for release.
- Define the MVP success criteria: In Defining KPIs, the business problem impact was set. This is the foundation for analysing the success of the MVP. Define one or more measurable and observable benchmarks by which success will be judged.

Decompose the business problem
A business problem should be framed in terms of:
- a task to be completed;
- a gap between current and desired capabilities;
- other business objectives.
It should be relatively clear from a well-formed business problem what the value of solving it is. Now, the business problem described in the Business Understanding phase must be decomposed. The problem should be separated into a machine learning problem, and any other non-ML business problem(s) and assumptions. Articulating your assumptions will help to expose areas of risk to control. The Machine Learning problem should be defined as a black box at this point by the Product Manager. Its purpose should be defined based on what it does rather than how it does it. This allows an interface to be created between the product and the model (the "What"), while the Data Science Team creates the model itself (the "How") later. The "What" will also inform the labelling specification for the Data Acquisition Team, and how the Data Science Team will process and select data to be cleared.
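As a sketch of this separation, consider the helpdesk example from Business Understanding. The product is written against an agreed black-box interface (the "What"), while the implementation behind it (the "How") can start as a trivial stand-in and later be replaced by the Data Science Team's model. All names here are illustrative assumptions.

```python
from typing import Protocol

# The "What": the black-box contract the product depends on.
class TicketRouter(Protocol):
    def route(self, ticket_text: str) -> str:
        """Return the name of the team that should handle this ticket."""
        ...

# One possible "How": a placeholder the Data Science Team can later swap
# for a trained classifier without the product-side interface changing.
class KeywordRouter:
    def route(self, ticket_text: str) -> str:
        return "billing" if "invoice" in ticket_text.lower() else "general"

def handle(ticket_text: str, router: TicketRouter) -> str:
    # Product code only ever sees the interface, never the model internals.
    return router.route(ticket_text)

print(handle("Where is my invoice for March?", KeywordRouter()))  # -> billing
```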
Conduct any required additional Market Analysis
Following the initial problem decomposition, you must determine whether additional market research is required before moving forward. You will benefit from a clear understanding of the market segments for the market being addressed by the proposed product. You may also want to satisfy yourself that the assumptions about your product are reasonable. The benefits of additional market research should be weighed carefully against product and delivery risk. Once high-risk assumptions have been validated, and market segments have been defined, it is almost always better to push ahead with your MVP than to spend more time in this phase. After this point, spending more time on market research will likely only reveal details relevant to a future phase of work, which may not happen if a successful model cannot be created.

Identify opportunities to simplify the Machine Learning model
Having identified the ML problem and the corresponding market segments, the ML problem should be analysed to identify opportunities for simplification. For example, decompose a multi-class classification into multiple binary ones. In this stage, it is important to find a balance between a simplified model and a product which creates business value. Without business value being created, it will not be possible to test the potential success of this or any future, derived product. Given the need to find a balance here, this specification must be created and refined through dialogue between the (Lead) Data Scientist and the Product Manager.

Identify the initial target market for your product
The target market for an MVP is, by definition, smaller than the target market for any future envisaged product. For this phase, the ideal target is an Early Adopter: that section of the market which feels the pain of the problem most strongly and urgently. Here, we use Early Adopters in the general sense. They are identified as those who already:
- have dedicated targets/OKRs on the issue;
- have allocated a budget for it;
- have taken action to solve the problem by other means.
Often Early Adopters, and the target market more generally, can be identified on a scale based on how much action they are willing to take to resolve the problem. The objective is to validate both market need and product feasibility. You may identify a target segment for an ML MVP which is not at the end of the scale, if that target is satisfied by a reduced ML model that can be improved later. Market segmentation is a well-covered subject, which is not addressed in detail here. Common tactics include the use of buyer personas and user personas, the use of geographic and economic indicators, up to the range of tactics described in Anthony W. Ulwick's Jobs to Be Done.

Design a pipeline for the implementation of the product
The next key element of MVP planning is to design the data flow for the product: that is, the logical sequence of steps that will take raw data on one side, and process and prepare that data for use in a product on the other side. Common considerations include the following (a sketch of such a pipeline follows this list):
1. What raw data is required, and how will it be accessed?
2. Will that data be filtered by any existing/deterministic criteria?
3. What is the expected range of results from the ML model?
4. What non-ML processing will be required?
5. If last-mile or HITL processing is applied, how will the results of that work be captured for future model iterations?
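The following skeleton illustrates how these considerations map onto code for a hypothetical document-screening product with a HITL review step; every name and rule here is an illustrative assumption.

```python
# Skeleton of the data flow described above: raw data in, deterministic
# filter, black-box model, HITL review, and label capture for retraining.
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

def deterministic_filter(docs: list[Document]) -> list[Document]:
    # (2) Existing, non-ML criteria applied before the model sees the data.
    return [d for d in docs if d.text.strip()]

def model_score(doc: Document) -> float:
    # (3) The ML model as a black box returning a relevance score in [0, 1].
    # A keyword rule stands in until the Data Science Team delivers a model.
    return 1.0 if "urgent" in doc.text.lower() else 0.0

def human_review(doc: Document) -> bool:
    # (5) Last-mile HITL step: a domain expert confirms or rejects, and the
    # outcome is captured as a label for future model iterations.
    print(f"Queued {doc.doc_id} for expert review")
    return True

def run_pipeline(docs: list[Document], threshold: float = 0.5) -> list[tuple[str, bool]]:
    candidates = [d for d in deterministic_filter(docs) if model_score(d) >= threshold]
    return [(d.doc_id, human_review(d)) for d in candidates]

labels = run_pipeline([Document("t-1", "URGENT: server down"), Document("t-2", "thanks!")])
```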
Design the delivery mechanism for the product
The delivery mechanism should be defined after understanding the target market and the data flow. It is particularly important to understand the range of possible outputs from the ML model and how they will be handled. What will be presented to the user in the case of a failure of the classification mechanism? How will uncertainty be handled? Much has been written about the need for 'explainability' in Machine Learning models, but this can often be a misnomer. Far more important is the need to present or process data in such a way as to accommodate the uncertainty inherent in ML.

Define the minimum release criteria
The minimum release criteria are usually a function of model capabilities and resourcing. A typical HITL pipeline will involve an element of human processing, for which you will need to have some (domain) resource available. The amount of resource available and the volume of data being processed will determine the required performance of your model before release.

Example
For this example, assume you have a pipeline of 100,000 new documents to analyse per day. You believe around 10 have a particular characteristic which is valued by your target market. You want to maximise your chances of finding all 10, and you have one domain expert who can analyse one document in five minutes, or around 100 documents per day. At near 100% recall, your model will need to achieve a precision of 10% or more on release to create an operable product. Other types of models might require different measures of minimum performance before release. It is important to declare these benchmarks and to set them at a realistic performance threshold. This is often much easier with HITL processes than without them.
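The arithmetic behind this example can be made explicit. A minimal sketch, using the figures assumed in the example above:

```python
# Minimum precision needed so the model's flagged volume fits reviewer capacity.
daily_documents = 100_000      # documents arriving per day
expected_positives = 10        # documents with the valued characteristic
reviewer_capacity = 100        # documents one domain expert can check per day

print(f"Base rate of positives: {expected_positives / daily_documents:.3%}")

# At ~100% recall all positives must be flagged, and precision determines
# the total flagged volume: flagged = positives / precision.
min_precision = expected_positives / reviewer_capacity
print(f"Minimum viable precision: {min_precision:.0%}")  # -> 10%

flagged_per_day = expected_positives / min_precision
assert flagged_per_day <= reviewer_capacity  # the workload is operable
```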
Define the MVP success criteria
Finally, you must define the MVP success criteria. What measure(s) will you use to determine whether or not your MVP is successful? Willingness to pay is one measure of success. However, some MVPs may not have the required features in the early stages to charge for. In this case, the time spent using the product can be a good proxy. Are the initial (trial) clients using the product repeatedly? If a client logs in just once, they may just be experimenting. If they come back every day, you have a good indication that there is a market for your product. It is important that MVP success criteria are capable of being evidenced relatively easily, and are as objective as possible. Time and cost are clear measures of a client finding value in the product. However, it is reasonable to also include qualitative measures of success in the very early stages, for example feedback from trial users.

Artefacts
None
References: Running an ML MVP Design Workshop

Running an ML MVP Design Workshop
As described in the previous section, Approach to Solving Problems with ML, refining designs for ML MVPs involves finding an appropriate balance between:
- the ability to simplify and deliver one or more ML models;
- the need to address a market segment with a valuable product.
Given the balancing of these needs, the Data Science Team must work closely with the Product Manager to identify one or more good-quality MVP candidates. We find that this is best achieved using one or more design workshops. The basics of running a good meeting, or indeed a good design workshop, are not covered in depth here, but they should not be ignored. It is always valuable to declare meeting roles. You should be aware of, and take steps to avoid, group-think, and encourage open dialogue. If no one in your team has run a design workshop before, you will find it beneficial to research this further to find a model which works for you.

Goals
- Agree on the key characteristics of your MVP.

How to do it
- Prepare and distribute workshop materials: This is primarily the responsibility of the Product Manager, who should ensure everyone coming into the workshop has enough information to contribute fully to the discussion.
- Present and discuss the business problem and market: First, the Product Manager should drive a discussion of the business problem being addressed. This sets the context for the remainder of the meeting. Everyone taking part should fully understand why this product is being proposed.
- Ideate target markets, ML models and product pipelines: At this point, as many candidates should be created as possible. Creating more than one design will allow competing designs to be assessed more clearly, and you are more likely to find an optimal design as a result.
- Select MVP design candidate(s): Once the ideas have been discussed and evaluated, one or more candidate designs should be selected to refine in more detail.
- Identify any high-priority unknowns: Before ending the workshop, ensure everyone has a clear understanding of any unknowns which need to be resolved during the design phase.

Prepare and distribute workshop materials
Workshop materials should be prepared in advance. They should be sent out with enough time for participants to read them and ask any clarifying questions before the workshop starts. The focus of these materials should be:
- the business problem, as defined in Business Understanding;
- all information available on market segmentation;
- all the available context and background. The context might include interviews with buyers and users in the target market, market studies, or other relevant information.

Present and discuss the business problem and market
The Product Manager will usually lead the discussion on the business problem at the start of the workshop. This ensures that the work is grounded in a clear and shared understanding of the business problem. This section can contain activities that open dialogue and support ideation about the next steps. For example, participants might construct basic user personas and roleplay conversations examining the business problem.

Ideate target markets, ML models and product pipelines
The processes of defining target markets, ML models and product pipelines are defined in Approach to Solving Problems with ML. The objective here is to define as many options for a high-level MVP design as possible. There are different ways to create ideas in a group, and you should not be afraid to experiment with different techniques during this process. You may include timed elements of working individually and then back in a group, the use of whiteboarding or post-it-note collaboration, brainstorming activities or anything which works to generate more ideas for the MVP.

Select MVP design candidate(s)
Once you have a collection of MVP design candidates, you will need to select one or more candidates to develop in more detail. Again, there are many options for selecting the best candidate from within the group.
As a minimum, you should allow all participants to nominate design candidates for selection, and you should allow time to present and examine each design candidate. Ask yourself: does this design candidate meet the objective defined by the business problem? Does it deliver value for the target users? Is it deliverable? Following this cross-examination, the candidate(s) should be selected. It is better to select fewer candidates, as more extensive work will be required. It is beneficial to select two candidates if one is better but higher risk, and the second is easier to deliver but may deliver less business value. Multiple candidates should not be selected just for the sake of it.

Identify any high-priority unknowns
It may be tempting to close the design meeting at this point, as the candidates have been selected, but you should first articulate any particular unknowns which must be addressed during the design phase. These unknowns may have come up during the cross-examination or may remain unstated. The availability of designs for the candidate, and the answering of any other unknowns, are the criteria by which the designs will be assessed and this aspect of the planning phase deemed complete.

Artefacts
Meeting Minutes Template
The meeting minutes should be supplemented with records from the design workshop. This might include photos of whiteboard sketches, post-its or other artefacts to help capture the discussion from the workshop.

Resourcing Business Problems
In common with all projects and products, appropriate resourcing for ML products is critical. You will usually select the next business problem to work on based on capacity within the Core Product Teams. However, ML products need contributions from many other domains to be successfully delivered. Apart from the key skills needed in traditional product development, ML products are particularly sensitive to the availability of domain expertise. Domain experts are usually best positioned to help the Data Scientists understand the model to be created, and may also be needed for labelling tasks and last-mile or HITL work. Resourcing needs should be tracked as soon as the planning phase starts. If the required resources aren't available for a business problem, then one with different resourcing requirements should be selected.

Goals
- Ensure that there is a clear understanding of roles and responsibilities for the MVP phase of work.
- Ensure the project is sufficiently resourced for success.

How to do it
- Identify required resources: The business problem will provide an initial indication of resourcing needs as the planning phase starts. As this phase progresses and the ML MVP design evolves, new resourcing needs may become apparent. These should be tracked throughout.
- Match available resource to the required resource: Resource should be tracked to ensure general availability. Where resource is coming from outside the core teams, clear reporting and communication lines must be agreed to ensure alignment.
- Identify and resolve any under-resourcing issues: If there are insufficient resources for any identified task, this should be resolved before progressing the project.

Identify required resources
It is good practice to track project resources actively on an ongoing basis, to ensure product development does not hit a bottleneck and stall. Particular attention should be paid to this during the planning phase, as resource needs emerge. The key functional areas are described in the Organisation part.
These should be mapped to the data pipeline as it is developed during your design workshop. Pay particular attention to the following:
- Data labelling needs: What capabilities are required to label data? How will the team accomplish this during the MVP phase?
- Model specification: Feature specification will be driven by knowledge of the domain and data. How will the Domain Team transfer this knowledge to the Data Scientists?
- Engineering: What engineering needs does your product have? Do those engineers reside within your sponsoring organisation, or are they integrated with another department?

Match available resource to the required resource
As new resourcing needs emerge, ensure they are matched to a named person as soon as possible. It will be beneficial for any resource to take part in design workshops to refine the product specifications. Where resource is allocated from outside departments, it is important to set expectations as to the time commitment required. Wherever possible, secondments will assist effective delivery through tighter integration with the Product Team. Other methods can be used to ensure alignment, including adjusted personal OKRs and alignment via the business sponsor. A resource is made up of two components:
- Capability: The key skills required for the activity.
- Time: The amount of that capability required, based on the activities to be undertaken.
So, it is not enough to match the key skills to a product role. Some estimation of time commitment is also required. Time estimates are famously hard to make, so overly precise estimates should be avoided. Generally, tracking needs hour-by-hour will limit flexibility and is likely to be counterproductive; it is often better to track in half-day or whole-day increments from sprint to sprint.

Identify and resolve any under-resourcing issues
Actively tracking projected resource requirements in some way will help to identify under-resourcing issues sooner. When an under-resourcing issue arises, it should be addressed as soon as possible. Proceeding with a project which does not have the resources to succeed will expose all team members to unnecessary risk of failure.

Artefacts
None

Initial Product Definition
Many activities from traditional product development should be carried out when working on ML products. These support rapid problem solving and alignment between development activities and the business problem. These activities are not covered in depth in this manual. New product managers should explore these topics in more detail before proceeding. ML product specification requires additional considerations. First, it is beneficial to specify the modelling aspect of the problem in more functional detail. This differs from the user-first approach adopted for traditional product requirements. Second, the use of ML introduces probabilistic behaviour. This can lead to a failed user experience, which needs to be compensated for by better UX and UI design.

Goals
- Create a clear set of product requirements for the ML MVP.

How to do it
- Create User Personas: User Personas are a semi-fictional representation of your target user. They help to guide product development and maintain alignment with the business problem and domain.
- Write Product Requirements - ML: Product requirements represent the shared understanding of the product to be built. They should always be grounded in a business need.
- Write Product Requirements - non-ML: Non-ML product requirements should be produced and tracked within the Product Team.
- Refine User Interaction Design for model consumption: Careful attention should be paid to how the model will be consumed and used for the benefit of the end-user.
- Refine User Interaction Design for labelling tasks: Labelling tasks can also introduce design challenges. Multi-label models need careful consideration to avoid bias in the results.

Create User Personas
There are lots of great resources online to help you create user personas. For a primer, see the Wikipedia entry, or the Interaction Design Foundation blog. Creating user personas helps to ensure your requirements remain aligned to actual users, and they should be referred to throughout those requirements. Your personas should be based on interviews with real users (or prospective ones) as much as possible. It helps to ensure you have a clear picture, so they should always describe a specific (fictional) person. They should also have a name and a concrete age (not an age range). It often helps to include a picture. Apart from the demographic information, describe how the user experiences the business problem in user-centric terms.

Write Product Requirements - ML
The purpose of a product requirements document is to develop an approach to solving the identified business problem and to define the business requirements for the proposed change. It should be completed to a reasonable degree before commencing any work, to assess expected resource demands. However, it is a living document and may be updated after starting work. The ML product requirements document will include, minimally, the following:
- A description of the ML model to be created.
- One or more user stories which define how the model will be used, for context but not for engineering development. Those will be discussed in the non-ML requirements.
- The minimum release criteria for the model. This will often be defined in terms of precision and recall.
- A set of assumptions about how the model will create value for the user.
- One or more success metrics, which will be used to determine whether those assumptions were correct.
As ML product specifications are iterated on, the product requirements document should track other key aspects of the ML model:
- How data will be labelled to support the development of the model.
- How data will be collected from the implementation of the model, if it is required for model re-training.

Write Product Requirements - non-ML
Non-ML product requirements should be tracked as they would be for any other product. They should share the same user personas as their ML counterparts, and be aligned to the same business problem. For further information on how to write good product requirements, see Jeff Patton's User Story Mapping, or the blog section of your favourite product management software, e.g. Atlassian or Aha!.

Refine User Interaction Design for model consumption
Your product's users will not know how Machine Learning works. Because decisions are made statistically, information about the process should be shared only in the rare cases where the user can understand it. While designing interactions for Machine Learning, you must take into account the probabilistic nature of the model output to ensure users can interact with your product in predictable and productive ways. As with other areas of product definition, there are extensive resources available online to support you in designing interactions. See the Interaction Design Foundation again for a good primer. Some teams may benefit from the help of a dedicated UX designer.
Designs are not static and should evolve, taking into account how user behaviour changes, and how your understanding of user behaviour changes over time. When designing for interactions with models, you should pay particular attention to the following:
- What is the range of outputs from your model? Can your model fail to make a prediction, and if so, how will you handle this situation with your users? What action do you expect them to take in response to the range of outputs?
- Your model may produce an incorrect prediction. What will be the result of this? Will the user be aware of it immediately or only later? Can you take steps to mitigate that impact?
- If your model produces a response taking into account some user input, how will you manage that input? Will the user know what to provide to create the desired response?

Example
To demonstrate how to think about UX when designing ML interactions, consider a self-driving car. Whilst this might be a more complicated ML problem, the principles hold true for many other classes of problem. In this example, the model produces some control output to direct the car, and the user is the person in the driver's seat who wants to travel from point A to point B. Here, the self-driving model can produce a set of control outputs which cause the car to speed up, slow down, turn or continue straight ahead. At lower confidence levels, control should be handed back to the user. A set of interactions must be designed to alert the user that they have to take back control, and to ensure that this happens in an orderly way. The self-driving model can produce bad predictions, even at high confidence levels. The results of this could be inconvenient, in the case of taking the wrong lane, or catastrophic in the case of causing a collision with another road user. Interactions must be designed which encourage the user to pay close attention to the road ahead and the behaviour of the car. The user should be able to override the model quickly and regain control. Finally, the user must decide where the machine learning model directs the car. Routes might be selected via a traditional GPS interface. However, the user will now use this interface to travel via local roads which they know well, and through which they are likely to have preferred routes. The GPS interface may need to be adapted to learn user preferences, display different options to a destination, or otherwise help guide the user to achieve the desired result.

Refine User Interaction Design for labelling tasks
Finally, attention should be paid to the design of labelling tasks. Good design speeds up labelling. More importantly, it can impact the quality of labelling efforts. Labellers might interpret instructions in a biased way. The UX design should support the correct interpretation of the labelling task. You will need to consider this interaction carefully to optimise your labelling efforts. As the data and labelling tasks evolve, this design will need to be continually revisited to ensure it remains optimal.

Artefacts
ML Product Requirements Template
User Persona Template

ML MVP Design Review
In the initial design workshop, you agreed on the key characteristics of your MVP. Following that agreement, the team worked individually and in smaller groups to develop the initial product definition. Before considering the planning phase complete, the team should meet for a final time to review the more detailed product plans.
Reviewing product designs as a group, in a meeting, helps to ensure a more robust assessment of the completed plans. Team members should assess the designs to ensure they are well understood by all, and that they match the plan from the original design workshop.

Goals

- Ensure that the final MVP design is well understood and accepted by the team.

How to do it

- Review and finalise the MVP design: The final designs should be reviewed as a group. The purpose of a collective review is to ensure the team has a shared understanding of the design proposals, accepts that they are complete, and agrees that they are well aligned with the discussions from the first workshop.

Review and finalise the MVP design

In this workshop, the business problem should be quickly recapped. If there has been a delay between the first meeting and this one, then more time can be spent on this area. Next, the candidate design(s) should be summarised, and the complete designs presented in more detail by their authors. The team should scrutinise the designs closely and ensure they are acceptable for moving to the next phase of work.

Artefacts

- Meeting Minutes Template

Decision Overview

In the Bootstrapping phase, you will create an MVP which will assess the technical complexity and market demand for your product. Following this phase, you will make small, iterative changes, maintaining and improving models and introducing new features, to create a valuable product (the Operating phase). To do this effectively, the Product Team needs to be able to make feature prioritisation decisions quickly, acting autonomously. That autonomy must be governed by a clear agreement between the Business Leadership Team and the Product Team.

In this part, we describe the agreement in terms of two artefacts:

- The Product Strategy. This is a qualitative statement of your medium-term goals for the product.
- The Business Plan. This is a quantitative, high-level statement of expected value and cost, which is the reason why the product is being pursued.

Most of the work of the decision phase, to create the Product Strategy and Business Plan, can be completed concurrently with the Planning phase. Once the planning is complete, a formal agreement should be sought to proceed with the product, based on the documents above.

In this section

- Product Strategy
- Business Plan
- Go/No-Go Product Decision

Product Strategy

A product strategy consists of two things:

- A statement about the goals of the Product Team to address a problem.
- An outline of how the team will attempt to meet those goals.

It describes the work of the team over several months. You must create a short statement of product strategy before the start of the Bootstrapping phase. This statement forms a contract between the Product Team and the Business Leadership Team, describing how the team will use the resources at their disposal.

A product strategy is a tool which enables more autonomous decision-making. The Product Team should be empowered to make the feature prioritisation, design or other product decisions that they feel will best meet the agreed goals. In return, if the Product Team believes that the strategy is no longer viable, then the team must engage with the Business Leadership Team to update it.

There are lots of resources on product strategy available online. Some good examples include Shopify, Aha!, and The Silicon Valley Product Group, or Good Strategy/Bad Strategy for a more general discussion on strategy.
If you are creating a new product with a broad scope, you may find it useful to write a product vision statement. If more than one team is working on a product, you may find it helpful to divide your product strategy into OKRs assigned to each team. Our focus here is on creating the minimum framework to support your decision making.

Goals

- Define a product strategy for your data product

How to do it

- Agree on a set of medium-term goals for the project: Here, goals are an expected set of outcomes for the data product.
- Agree on a statement of how those goals will be met: This statement is the broad outline of how those outcomes will be achieved. It should give you room to take reasonable action to meet those goals.

Agree on a set of medium-term goals for the project

The first part of your product strategy to agree on is the goal. The goal should be a business goal, not a technical goal; it is an outcome you would like to achieve. Good goals share the following attributes:

- A goal should be inspectable. That is, an impartial observer should be able to review your progress against the goal and determine whether or not it was met. As a result, you will need to include some metric or measure of success.
- Some business goals apply to a broad market. Tackling a broad market with a new product is usually unrealistic. The goal should say who will experience the desirable outcome: the market segment, target customer or another group.
- A good goal is also time-bound, and a typical timeframe might be on the order of 6-12 months. Too short a period will require a specific, small and rigid goal, requiring repeated re-engagement with the Business Leadership Team as you learn and adjust. Too long a period will require a more loosely stated goal to account for the risk of change, reducing transparency and accountability. In both cases, autonomy will suffer.
- Finally, the goal should be realistic with the resources you have and the timeframe you've specified.

For some ML problems, the goal may be directly transcribed from the original Business Problem, if it is well contained and quickly solvable. For open-ended business problems, you will need to agree on an intermediate goal with the Business Leadership Team that is relevant to your business strategy.

Keep in mind that goals are as much about saying what you are not going to do as what you are. A well-stated goal will help you to avoid distractions from other stakeholders with different priorities. This is particularly important when working in an organisation with multiple divisions or several different products.

Agree on a statement of how those goals will be met

The second part of a product strategy is a broad outline of the steps you will take to meet these goals. For example, this statement might express some beliefs about how useful information can be extracted from a body of data. It should not be a detailed description of how you will approach the problem. Too much detail here will lead to frequent returns to renegotiate new terms with the Business Leadership Team.

Artefacts

- Product Strategy Template

Business Plan

Business plans come in many shapes and sizes. For our purposes, the business plan we describe here is a short statement which captures the expected value of the business problem, and the resources required to capture that value.
As ML projects progress, those two things can change significantly, so making them explicit gives you a benchmark to judge when change is reasonable and within the bounds of the project, and when more action needs to be taken in response. Significant deviations from the business plan are a strong indication that you need to return to the Business Leadership Team and renegotiate how resources are deployed.

Goals

- Create a simple business plan for your data product
- Keep that business plan up to date

How to do it

- Calculate the expected benefits of your product: Determine the value of your mature data product and the total market size for that product.
- Approximate ongoing costs for your product: Costs will be in the form of people, infrastructure and any other ongoing costs.
- Determine any one-time costs for your product: One-time costs might include initial model training, data labelling, data acquisition or anything else.
- Maintain a time-bound business plan: Use the costs and benefits to create and maintain a time-bound business plan for your product.

Calculate the expected benefits of your product

You will first need to calculate the expected benefits of the product, based on the goal defined in the product strategy. These benefits should be stated in terms of the economic value the product brings to the user, and the number of users the product is likely to reach. The first can be estimated by a simple model, which might be based on the cost of time saved or incremental revenue gained. This figure is not a commitment to delivering the economic benefit, but provides a benchmark against which any deviation in the work being done can be assessed. The benefits should be periodically revisited and updated, particularly if the value to users or the target market changes over time. Updates are likely to happen as you gain a better understanding of your target market.

Approximate ongoing costs for your product

Product costs should be estimated at a high level, including reasonable assumptions about several categories of resources. This is a high-level business plan, including product costs only; it is used to set expectations and track deviations. It does not include other business expenses, for example sales and marketing.

- Team costs: Your team costs will be based on the team expected to work on the product from the Bootstrapping phase. You should include resources from other areas, for example the Engineering and Domain Teams, in the proportions needed for your product.
- Infrastructure costs: Some reasonable assessment of infrastructure should be made, with ongoing costs tracked.
- Data costs: Any data (labelled or otherwise) which you expect to acquire from external sources should be costed and tracked.

Determine any one-time costs for your product

One-time costs are usually less significant, but if any such expenses are expected to be incurred, they should be recorded. These costs might include significant engineering efforts, or initial data labelling or data acquisition costs. One-time costs will usually be amortised over several years. Follow your company policy when allocating these costs.
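To make the arithmetic concrete, here is a minimal sketch of such a time-bound plan. All figures are invented for illustration; substitute your own estimates and follow your company's amortisation policy.

```python
# Illustrative, hypothetical figures; replace with your own estimates.
value_per_user_per_year = 120.0   # e.g. cost of time saved per user, per year
reachable_users = 2_000           # target segment you expect to reach

team_cost_per_year = 150_000.0    # proportional share of Product/Functional Teams
infra_cost_per_year = 20_000.0
data_cost_per_year = 10_000.0
one_time_costs = 30_000.0         # initial labelling, data acquisition, etc.
amortisation_years = 3            # follow your company policy

annual_benefit = value_per_user_per_year * reachable_users
annual_cost = (team_cost_per_year + infra_cost_per_year + data_cost_per_year
               + one_time_costs / amortisation_years)

print(f"Expected annual benefit: {annual_benefit:,.0f}")
print(f"Expected annual cost:    {annual_cost:,.0f}")
print(f"Expected annual margin:  {annual_benefit - annual_cost:,.0f}")
```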
Artefacts

- Business Plan Template

Go/No-Go Product Decision

Once the Product Strategy and Business Plan have been completed, they must be agreed with the Business Leadership Team before proceeding with the Bootstrapping phase. They should be presented, together with the finalised Product Definition, in a Product Kick-Off meeting convened for this purpose.

In many cases it is useful to agree these documents informally with the members of the Business Leadership Team before the Product Kick-Off meeting, though even if you have reached agreement, the meeting should still be held and recorded.

Goals

- Agree the Product Strategy and Business Plan with the Business Leadership Team, and agree to proceed with the Bootstrapping phase.

How to do it

- Hold a Product Kick-Off meeting: As described in the corresponding section, hold a meeting to agree the documents with the Business Leadership Team.

Hold a Product Kick-Off meeting

A Product Kick-Off meeting should be held with the Business Leadership Team and the team leaders of the Product Team. Allow time for the documents to be scrutinised, and be prepared to provide further background information where needed, for example from any market analysis completed. Ensure that agreements are recorded in the minutes for the meeting, which should be shared with all attendees.

Artefacts

- Product Strategy
- Business Plan

Bootstrapping

The goal of the Bootstrapping phase is to have a deployed MVP and to complete one Feature Iteration. This requires the completion of the Business Understanding, Planning and Decision phases. It also requires the organisational setup and the scheduling of the necessary regular meetings and communication lines. Because this is a "doing" phase, everyone must be clear about their duties and communication lines. The Product Teams and Functional Teams must synchronise their activities in advance, at the Planning phase, so they don't delay progress by waiting for each other.

Infrastructure

Creating Machine Learning models requires a considerable amount of computing and IO power. The product should use existing resources as much as possible. If new hardware elements are required, those should be in line with the rest of the host organisation. For example, if the company generally uses PostgreSQL for databases, then logs and labelled data should be stored in PostgreSQL databases as well. This allows simpler workflows for the Engineering (DevOps) Team, reduces friction and speeds up delivery. The Data Science Team should discuss these questions and reach agreement during the Planning phase.

Data Lineage and Source Code

From a technical point of view, the Data Science Team is in charge of the product-specific implementation. This requires accessing the relevant data sources and processing the data. The processing pipelines should be stored in an appropriate version control system (e.g. GitHub).

Labelling

The Data Science Team is also in charge of setting up the required labelling facilities that the Domain and Data Acquisition Teams will use throughout the life of the product. The solution should support data lineage linked to the available data sources. This will enable historical analysis of model performance.

Communication with Functional Teams

The Data Science Team should engage with the Functional Teams in their respective areas of responsibility to achieve a functional MVP: primarily with the Engineering Team, to create the right wrapper and APIs for the models and complete the deployment.

Deployment

The Data Science Team should use coherent data modelling that naturally leads to a shared interface to the problem at processing, training, evaluation and inference time. The Engineering Team is responsible for creating the wrapper and maintaining it alongside their standard DevOps procedures.
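As a minimal sketch of what such a shared interface might look like in Python, the fragment below defines one protocol that processing, training, evaluation and serving code can all target. The class and method names are illustrative assumptions; the referenced data classes stand for those defined later, in Step 2 of Initial Task Execution.

```python
from typing import Iterable, Protocol

# Illustrative names only: "TicketInput" etc. stand for the data classes that
# the Data Science Team defines in Step 2 of Initial Task Execution.

class ClassifierModel(Protocol):
    def train(self, batch: Iterable["LabelledTicket"]) -> None:
        """Consume labelled input/output pairs during training."""

    def predict(self, item: "TicketInput") -> "TicketOutput":
        """Used unchanged by processing, evaluation pipelines and the
        Engineering Team's serving wrapper."""
```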
Logging

Logging is shared between the Engineering Team and the Data Warehouse Team. Logging of the product should follow company guidelines, so that it doesn't create friction and fits into their usual work schedule. This enables all teams to operate in their comfort zone of relevant skills. If the company doesn't have a Data Warehouse, the Data Science Team should lead the effort to establish logging facilities that can fulfil this function. In this case, the simplest solution should be used (e.g. a simple SQL database suffices), but the Data Science Team should treat this function as external to the product.
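As a sketch of such a minimal logging facility, the snippet below creates a simple prediction-log table. It uses SQLite from Python's standard library purely as a stand-in; in practice you would follow company guidelines (e.g. PostgreSQL), and the schema shown is an illustrative assumption.

```python
import sqlite3  # stand-in for the company's standard database, e.g. PostgreSQL

conn = sqlite3.connect("model_logs.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS prediction_log (
    id          INTEGER PRIMARY KEY,
    logged_at   TEXT NOT NULL,   -- request timestamp
    model_id    TEXT NOT NULL,   -- commit hash / registry id, for data lineage
    request     TEXT NOT NULL,   -- serialised input class
    response    TEXT NOT NULL,   -- serialised output class
    latency_ms  REAL
)
""")
conn.commit()
```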
Monitoring

The Data Warehouse Team is responsible for the ETL/ELT processes, producing sanitised data both for the Data Science Team for raw consumption and for the Business Intelligence Team for analysis. Using services maintained by the Business Intelligence Team provides self-serve capabilities for all Product Teams and Business Leadership. This enables frictionless monitoring and analysis of a wide variety of problems without consuming technical resources. The Data Science Team will maintain direct access to the data, which acts as a source for modelling and for ad-hoc, more detailed analysis.

Reporting

The Monitoring Dashboard should act as the primary source of reported KPIs for the Product Management Team and the Business Leadership Team. If further analysis is required, it should be performed by the Data Science Team as a data processing step, and the code for it should be version controlled. This delivers end-to-end data lineage in the product.

In this section

- Organisational Setup
- Initial Project Structure
- Initial Task Execution

Organisational Setup

Before technical work can commence, the Product Team should identify and set up the responsible teams.

Goals

- Identify teams and team leaders
- Establish meetings
- Establish communication lines

How to do it

- Use the Product Document from the Planning phase to identify the teams: Share the strategy and vision established in the Decision phase across the company.
- Communicate the need for the regular and special meetings: Explain what administration this requires and why. All documents and meeting notes should be "living documents" directly supporting the product delivery efforts.
- Create notifications, alerts and channels: A regular feedback loop is necessary to increase awareness and ownership of the product among the Product Teams. Simple formal communication tools allow the teams to engage with the situation and establish team cohesion.

Teams

Please read Organisation Overview and its sub-pages for a detailed description of the required teams, their skills and responsibilities. Even if the company doesn't have these dedicated teams (e.g. Data Warehouse Teams or Data Acquisition Teams), or a single person is responsible for more than one of them (e.g. the Head of Data Science is responsible for Business Intelligence), the responsibilities must be allocated. The relevant person should keep in mind which "hat" they wear at a meeting. Separating the functions and establishing the correct communication lines helps to maintain a Bounded Context, and therefore to surface issues at a place where they can be resolved by the best person possible.

Meetings

After the teams are created, the Product Team should identify representatives who will take part in cross-functional meetings and be responsible for communicating and executing the necessary tasks. See also Meetings Overview.

Communication

During the Bootstrapping phase, the Product and Functional Teams work closely together. The right communication needs to happen according to the diagram in the Organisation Overview to execute the plans specified in the Planning phase efficiently. This ensures that only the teams that must be involved in a task are present in meetings, and that they can explain their position and reach agreement quickly.

Initial project structure

Starting any project from zero is hard. Starting one with many cross-functional teams is even harder. The Product Team should invest in setting up a good structure early on that enables rapid execution of the tasks in Bootstrapping and facilitates smooth communication among the many teams involved.

Goals

Create the administrative and communication framework that enables:

- The Product Team to perform their required activities.
- The Product Team to engage with the Functional Teams.

How to do it

- Set up code repositories according to company policy.
- Acquire computing and storage resources from the host organisation.
- Set up communication lines (e.g. chatrooms, email lists, notifications).

One of the critical points of the framework is that very little analysis happens in the planning phase. Experience shows that the effort of creating one-off code for estimations in the planning phase is comparable to writing production-ready code, because the majority of the data processing and analysis steps that will be done in production must also be done for any decent estimation. Because the initial product is only a partial solution and an MVP, it is easier to create high-quality data processing pipelines and deployable solutions than to write temporary code. The focus should be on using extendable frameworks and scalable, repeatable practices for pipeline creation. These frameworks enable large-scale analysis at an early stage, rather than ad-hoc scripting in Jupyter notebooks. Jupyter notebooks are notoriously tricky to version control, which is another reason to limit their use to visualisation.

By versioning every run of every pipeline, Data Scientists maintain a full picture of the work performed, which forms an excellent basis for successive iterations. All script and pipeline code must be stored in a git-like repository. The commit hashes can be used to identify the state of the codebase when the pipeline was run. They can also identify the data that was created by these pipelines. Ultimately, the trained model is identified by the same hash. This allows complete data lineage throughout all artefacts and a high level of reproducibility.
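A minimal sketch of this hash-based lineage, assuming the pipeline runs inside a git repository: stamp every artefact with the commit that produced it. The file layout, function names and metadata fields are illustrative.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def current_commit() -> str:
    """Return the commit hash of the code that is about to run the pipeline."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def write_artefact_metadata(artefact_path: Path, pipeline_name: str) -> None:
    """Stamp a dataset or model artefact with the commit that produced it."""
    metadata = {
        "pipeline": pipeline_name,
        "commit": current_commit(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    artefact_path.with_suffix(".meta.json").write_text(json.dumps(metadata, indent=2))

# e.g. write_artefact_metadata(Path("artefacts/model.pkl"), "train_model")
```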
Branching

In a traditional software engineering project, there is usually only one main branch, which contains the canonical version of the product. Each change and new feature is reconciled with this main branch and integrated or discarded. Machine Learning products are different: multiple versions of the same model can be in production at the same time, because only long-running A/B tests can decide which one is better. Branches in the project are therefore more persistent. Models must incorporate the branch information into their metadata for data lineage, and this should be represented in logging and business intelligence as well.

Data modelling

Data modelling is an important aspect of MLPM. It allows all technical teams to coordinate their interfaces through a standard definition. It also enables the Data Science Team to initiate changes and transform previous data into a new structure through schema evolution.

Data storage

All data is stored in data classes belonging to the data model and versioned as described above. This enables datasets to be linked to the code that generated them, and in turn versions the datasets with the version control system.

Labelling

Because iterations happen continuously, a convenient labelling facility is needed that performs four functions:

- Accepts data to be labelled from the production environment
- Allows the Data Science Team to select data for labelling
- Allows the Data Acquisition Team to label data and record any metadata (time, labeller, etc.)
- Allows the Data Science Team to create a labelled dataset from this labelled data and version it

The labelling tool should also be flexible enough to be changed when a subsequent iteration requires a different data model.

Initial Task Execution

This section is the most execution-oriented part of the whole product lifecycle. As in all MVPs, the maximum amount of work should be done with the minimum time and effort. This is best achieved by using general frameworks that are proven to be non-blocking and can be fine-tuned later. This section assumes that all meetings and communication lines are established, as they are the primary medium for coordinating the optimal rollout.

Goals

- Deploy a working prototype in production

How to do it

- Carefully review the 10 steps below and the diagram: Decide on the best way to implement each function and how much effort it will take.
- Execute each task as soon as possible, once any blocking task is finished: Communicate with teams on the critical path to foresee delays, and work on other tasks to fill in the gaps.
- Communicate and synchronise with teams according to the diagram in Organisation Overview: Once the system starts to work, monitor the situation through alerts and iron out details in response to them.

The following diagram depicts the complete architectural setup of the final product. In the next sections, we describe the steps to deliver each component in a way that leads to a working MVP.

Step 1: Data Science Code Repository

This is where the source code for the data class definitions, data processing pipelines and models will reside.

Relevant teams: Data Science

Step 2: Data Model Definition

The Data Science Team defines the classes that will contain all the data for the models. Data classes are the format in which each individual piece of information is represented. There are three main data classes; each of them can be a composite of multiple other ones.

- Input class: The representation of the model's input. Model serving will accept API requests in this form. The Feature Store, or the model serving service itself, will turn these into features and pass them to the served model. Multiple input classes form a batch in evaluation mode.
- Output class: The representation of the model's output. Model serving will return a response in this form. Labelling creates these classes and attaches them to an input class.
- Labelled class: A composite of a pair of input/output classes. Labelling creates labelled classes by creating an output class for each required input class and pairing the two. A set of these is used to create a training batch.

Relevant teams: Data Science
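As an illustration, the sketch below defines the three data classes for a hypothetical support-ticket classification product. The product, class names and fields are invented; only the three-class structure follows the text above.

```python
from dataclasses import dataclass
from typing import List

# Names and fields are illustrative; your data model will differ.

@dataclass(frozen=True)
class TicketInput:            # "Input class": what model serving accepts
    ticket_id: str
    subject: str
    body: str

@dataclass(frozen=True)
class TicketOutput:           # "Output class": what model serving returns
    category: str
    confidence: float

@dataclass(frozen=True)
class LabelledTicket:         # "Labelled class": input/output pair for training
    input: TicketInput
    output: TicketOutput
    labelled_by: str          # labelling metadata supports data lineage
    labelled_at: str

TrainingBatch = List[LabelledTicket]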
Optional: Feature Store

Data classes need to be turned into features before they can be used with a model. These transformations must be stored in the Code Repository. There are multiple commercial and open source solutions for feature stores. In the MVP stage, a feature store can be simplified to an extra component of the modelling service that performs these transformations.

Step 3: Data Processing Pipelines

The Data Science Team transforms the raw data into data classes and datasets with different processing pipelines. This is where data cleaning happens and where labelled data gets incorporated. The diagram below shows how labelling needs to be organised to enable continuous integration at the model training level; the components relevant to labelling are marked in blue. Because labelling is a productionised system, it is helpful to map its components to the respective parts of the live product. This enables reuse and simplifies its implementation.

Relevant teams: Data Science

Step 4: Labelling

Because labelling happens continuously, it needs to be a productionised system. The Data Science Team can assign items to be labelled at any time, and the Data Acquisition Team can label continuously. In turn, the Data Science Team can request the current set of labelled data and train a model with it. This is end-to-end continuous integration in Machine Learning and one of the core concepts of iterative Machine Learning. The following steps need to be accomplished to achieve this:

- Create a labelling interface: Because only the Data Acquisition Team will use this, a no-code UI can be created for it in the MVP stage, which can be turned into a full UI if the project matures.
- Labelled data storage: Because the labelling interface accesses this data, it needs to be stored in a similar database to the rest of the productionised system.
- Add output classes to the Feature Store: The feature store needs to be able to retrieve labels from the database and turn them into output classes. It will also transform these into output features.

Relevant teams: Data Science, Data Acquisition
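A minimal sketch of such a labelling facility, again using SQLite as a stand-in for the production database; the table, column and function names are invented. It covers selection, labelling with metadata, and exporting the current labelled set; versioning the export is handled as described in Initial Project Structure.

```python
import sqlite3

conn = sqlite3.connect("labelling.db")
conn.execute("""CREATE TABLE IF NOT EXISTS label_queue (
    item_id TEXT PRIMARY KEY, payload TEXT NOT NULL,
    label TEXT, labelled_by TEXT, labelled_at TEXT)""")

def enqueue(item_id: str, payload: str) -> None:
    """Data Science (or the production environment) submits an item for labelling."""
    conn.execute("INSERT OR IGNORE INTO label_queue (item_id, payload) VALUES (?, ?)",
                 (item_id, payload))
    conn.commit()

def record_label(item_id: str, label: str, labeller: str, when: str) -> None:
    """Data Acquisition labels an item and records the metadata."""
    conn.execute("UPDATE label_queue SET label=?, labelled_by=?, labelled_at=? "
                 "WHERE item_id=?", (label, labeller, when, item_id))
    conn.commit()

def export_labelled_dataset() -> list:
    """Data Science pulls the current labelled set to build a versioned dataset."""
    return conn.execute("SELECT item_id, payload, label FROM label_queue "
                        "WHERE label IS NOT NULL").fetchall()
```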
Step 5: Model Definition

Following the plans made with the Domain Team, the Data Science Team can create the trainable model structure according to the specification of the data classes and update the required feature transformation code.

Relevant teams: Data Science

Step 6: Initial Labelling Task

In this step, the Data Science Team selects items to be labelled according to the product plans and inserts them into the production database, and the Data Acquisition Team labels them. Because the architecture enables continuous delivery, the Data Science Team doesn't need to wait until labelling is finished; they can immediately move to the next step.

Relevant teams: Data Acquisition

Step 7: Model Training and Evaluation

After a considerable amount of data is labelled, the model is trained and a "mock" model is created. This training run is important even if the resulting model is not performant enough, because the model can be handed to the Engineering Team so that they can create the model serving service. The mock model can also provide insights about the problem that were not available in the planning phase. After consulting with the Domain Team, these findings can be incorporated into the labelling process, and new items can be selected.

Evaluation is a standard processing step that is performed with a custom data pipeline. Logs of both the training and the evaluation can be stored in a database; in the MVP stage, they can temporarily be placed in files. Both should be visualisable with appropriate tools. Often, evaluation requires more detailed information than just the output and the errors. Because the processing is done with a custom data pipeline, this evaluation output can be saved into a data model class and then into a dataset, which enables rich analysis.

Relevant teams: Data Science, Domain, Data Acquisition

Optional: Model Registry

Trained models are stored in model registries to enable data lineage and in-depth performance analysis of the production environment. There are multiple commercial and open source solutions for this. In the MVP stage, a registry can be omitted, but you must make sure that the model artefacts are tied to the commit hashes of the Code Repository.

Relevant teams: Data Science

Step 8: Model Serving

The Engineering Team can create the model serving service with the help of the mock model created in Step 7, so in this step only the procedure to hand over and update the service needs to be established. They must also set up logging of the requests, including the model's identifier, so the model can be identified in monitoring.

Relevant teams: Engineering
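A minimal sketch of such a serving wrapper, here using FastAPI (one of the tools acknowledged in this manual); the model loading, field names, assumed model interface and log format are illustrative, not a prescribed implementation.

```python
from fastapi import FastAPI
from pydantic import BaseModel
import logging
import pickle

logging.basicConfig(level=logging.INFO)
app = FastAPI()

MODEL_ID = "a1b2c3d"  # commit hash / registry id of the deployed model
# Assumes the mock model artefact handed over in Step 7 is present on disk.
model = pickle.load(open("model.pkl", "rb"))

class InputClass(BaseModel):   # mirrors the data model's input class
    text: str

class OutputClass(BaseModel):  # mirrors the data model's output class
    label: str
    confidence: float
    model_id: str

@app.post("/predict", response_model=OutputClass)
def predict(item: InputClass) -> OutputClass:
    label, confidence = model.predict(item.text)  # assumed model interface
    # Log the request with the model identifier so monitoring can attribute it.
    logging.info("model_id=%s input=%r label=%s", MODEL_ID, item.text, label)
    return OutputClass(label=label, confidence=confidence, model_id=MODEL_ID)
```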
Optional: A/B Testing

Machine Learning models can perform differently in production and test environments, despite the most careful evaluation. To keep this under control, an A/B testing system can help to deploy models either as a canary (1% Test, 99% Control) or in a normal A/B test.

Relevant teams: Engineering

Step 9: Logging

The Data Warehouse Team updates the ETL processes so that the logs from model usage are transformed into a state that can be presented in dashboards or further analysed by the Data Science Team.

Relevant teams: Data Warehouse

Step 10: Monitoring and Reporting

The last step is to make sure that the model in production operates correctly. The Business Intelligence Team can incorporate model output into KPI monitoring. The Data Science Team writes ad-hoc general analysis and reports on performance for the Product Management Team.

Relevant teams: Business Intelligence, Data Science

At this point, the MVP is finished, and the project can move on to the Operating phase. In that phase, the Product Team will create new features through multiple Feature Iterations. The Core Product Teams will use the architecture and the tools created here to perform Core Iterations. The Data Acquisition Team will use the labelling facilities to maintain the performance of the models in production, and the Data Science Team will continue monitoring.

Operating Phase Overview

The Operating phase is the long-term state of the product. A product mindset requires that, after completion, the project is not abandoned; the same team maintains it through successive iterations. These improvements are necessary because the Bootstrapping phase delivers only an MVP solution, and even that is developed in the shortest possible time. The Operating phase is where the main body of work happens, and where the MVP is turned into a fully-fledged product. The Product Team performs the following four tasks:

- Periodic Business Review
- Product Review and Feature Iteration
- Core Iteration
- Business-As-Usual

Periodic Business Review

The Product Management Team collects feedback from the Users, the Domain Team and the Business Leadership Team. They reconcile this with quantitative feedback from the Data Science Team and update the product roadmap. This allows them to make better recommendations about future actions.

Product Review and Feature Iteration

Feature Iteration is the primary vehicle for updating the product. The product review enables the Product Team to assess the need for new features. The Core Product Teams can respond to this with feature ideas, and these feature ideas are delivered by Core Iteration.

Core Iteration

This is the main process for updating a Machine Learning product. It is a circular process that is applied at multiple places throughout the lifecycle: at Bootstrapping, it delivers the MVP; at a Feature Iteration, it delivers an updated model; at the Business-As-Usual step, it maintains the model in production. It consists of four steps:

- Evaluate model performance and update the labelling specification.
- Select, label or synthetically generate training data.
- Update, train and deploy a model ensemble.
- Evaluate the updated model.

The process is repeated until the desired outcome is achieved.

Business-As-Usual

Models experience decay in production for various reasons. The Core Product Team counters this by regularly using the Core Iteration process to update the models. The Data Science and Data Acquisition Teams can perform this process alone, as a regular part of their daily jobs.

In this section

- Product
- Product Overview
- User Research
- User Analysis
- Product Concepts
- Product Requirements
- Periodic Business Review
- Feature Iteration
- Core Iteration
- Business-As-Usual

Product Overview

In the Operating phase, you can divide the role of the Product Management Team in two. First, they assess whether delivered work meets the product's original objectives. Second, they plan upcoming work. These functions are essential to maintaining ongoing alignment between business value and machine learning activities.

Planning for future work takes on several important aspects:

- The Product Management Team must prioritise upcoming work, deciding which features provide maximum value aligned with the defined product strategy.
- Features and future work must be suitably well defined to avoid ambiguity in delivery.
- They must clearly define the success criteria for each feature: the measures by which you will judge whether it was successful in creating the desired business value.

Assessing delivered work starts where the planning aspect ends: measuring and determining whether the success criteria have been met. Wherever possible, you should use quantitative measures of success, and for this reason user telemetry is often essential. However, quantitative metrics may be more challenging to define for certain products. Qualitative feedback from users, obtained through user interviews and other techniques, will help to refine your understanding of product-market fit and give you a more nuanced understanding of how your product is being used.

In This Section

- User Research
- User Analysis
- Product Concepts
- Product Requirements
- Product Prioritisation

User Research

Client and user interviews are often one of the most important sources of information when seeking to uncover unmet needs and to understand how products are being used. Interviews should be carried out on an ongoing basis, with a wide variety of users. The type of user will dictate the most suitable format; some interviews might be necessarily brief, whereas others may be more involved. Your interview process should cover both specific validation of deployed features and open questions to understand the context and inform future feature prioritisation (not necessarily in the same interview). Interviews serve many purposes, including understanding the demographics of the user base, their preferences, their roles and responsibilities, and the day-to-day activities they carry out as part of their job.
They seek to identify which tasks are more or less important to users, and which tasks are currently more challenging and why. They may also seek to validate ideas about user analytics. Some interviews are more open-ended, looking for new opportunities to assist otherwise well-served users. Others may be more targeted at specific areas, based on intelligence or to confirm observations from business analytics. This information should be re-incorporated into user personas and used to prioritise future features.

Goals

- Validate features that have been deployed.
- Improve your understanding of user needs.

How to do it

- Agree priorities for user feedback: Typically this will be agreed in the product meeting, by the wider Product Team, based on perceived gaps in knowledge.
- Maintain a list of user questions: Based on the user feedback priorities, product management should keep a prioritised list of questions for users.
- Select users to interview: You will often have better access to a smaller subset of users. Some care should still be taken to ensure you interview a balanced group which is representative of the broader market.
- Conduct interviews: Interviews can take many formats, from Slack conversations and email Q&As to phone conversations or in-person meetings.
- Keep detailed interview records: Recording interviews soon afterwards ensures that the information is as close as possible to what the user actually said, and helps to ground future work in user problems.
- Maintain user personas: User personas are semi-fictionalised descriptions of typical users. They are referred to when writing requirements, to lend context to features and their use.

Agree on priorities for user feedback

User research should be specific and directed towards a particular goal. Goals will vary between broad and narrow, and might include:

- Gaining a better understanding of user job activities.
- Researching how a proposed product would be used.
- Understanding barriers to the use of current products or features.
- Validating ideas about user analytics.

As user interview time is generally limited, agreeing on a set of priorities ensures maximum benefit from that time. Priorities are generally best agreed in the product meeting, and will be set based on feedback received, features deployed, and the need to prioritise proposed future features. These priorities should not limit conversation beyond that; open-ended questions may lead to relevant topics, and it is reasonable for interviewers to take such opportunities as they arise.

Maintain a list of user questions

It is often useful to create and maintain a list of prioritised questions. Questions should be clear, specific, and open-ended; writing questions down in advance will help less experienced interviewers, in particular, to structure the questions well. If more than one person is conducting interviews, then using a list of questions also helps ensure consistency and comparability between interviews and users. Finally, conscious prioritisation of questions ensures that you get answers to your most important questions when interview time is ad-hoc and limited. An example starting list of questions is available in User Interview Questions.

Select users to interview

In many circumstances, you will not be able to interview all of your users at will. You may need to work with other groups within your organisation to gain access to users. For B2B organisations, sales and customer success teams often have good access and may be used as a route to the users.
For B2C organisations, you might work more closely with marketing personnel to develop a strategy for accessing users. As you build up your interview process, you may need to record demographic, client or other information to keep interviews as representative of your user base as possible.

Conduct interviews

The format of your interviews will vary depending on your user base. Many user types might have limited availability; it is better to develop a regular cadence of brief interviews than to wait until a user has more time. This will result in a steady stream of feedback. Be flexible about the format of these interviews, and work with your user base, taking advantage of any mutual chat technologies if they are available and useful. That said, longer interviews are still important, particularly when researching more open-ended problems; these should be scheduled regularly, as far as your user base allows. Interviews will normally be led by Product Management, or by a UX team if available. Other teams, including the Engineering, Data Science and Domain Teams, should also take part in these interviews at regular intervals to ensure everyone has a chance to question and better understand user needs.

Keep detailed interview records

Record your interviews in a location that the whole organisation has access to, and write them up as soon as possible after the interview has ended. Writing them up while they are fresh in your mind ensures they accurately represent the things said by the interviewee, undistorted by the passage of time. Open-ended interviews might yield insights which seem irrelevant at the time but prove important for future pieces of work. Detailed records ensure this information is not lost.

Maintain user personas

Extensive information on user personas is available online. Good user personas portray a specific individual and the work this individual is trying to accomplish. It should be possible for a reader of a user persona to easily imagine the person being portrayed. It can be useful to include demographic information for this fictional individual, a (made-up) name, and even a photograph. A collection of user personas should be maintained that is representative of your user base. These personas should be regularly revisited following rounds of interviews, to keep them relevant and representative.

Artefacts

- User Persona Template

User Analysis

User analytics provide a quantitative measurement of user behaviour. Wherever possible, when designing a new feature, you should think about the changes in user behaviour you anticipate and how they would be reflected in interactions with your product. You should have a general view of which interactions indicate a successful product, and which indicate whether or not you are approaching the goals of your product strategy. You need to engage with the Engineering Team to ensure these interactions are captured and recorded. You also need to develop monitoring and reporting on them, both directly and through meaningful metrics derived from them.

Goals

- Monitor how users interact with your product
- Create meaningful reports on those interactions to guide prioritisation decisions

How to do it

- Develop a list of interactions that you want to record: With an understanding of your product, product strategy and individual implementations, decide which interactions are meaningful.
- Ensure a system is implemented to record those user interactions: Work with your Engineering Team to ensure those interactions can be captured and stored.
- Create a process to monitor and report on those user interactions: Processes could range from a simple SQL query to start with, towards complete integration with any available BI tools as you scale up.

Develop a list of interactions that you want to record

When planning a new product or introducing new features, you should consider how users might interact with those features. You must decide what you need to record to track those interactions, and keep a record of the defined user events.

Ensure a system is implemented to record those user interactions

From the planning stage onwards, if you do not have a BI system which records all of the interactions you need to monitor, you must ensure that such a system is implemented. Ensure that the system is kept up to date with each new feature implemented. You may have to consider user privacy when implementing this system.

Create a process to monitor and report on those user interactions

When first launching a product, you will often find it beneficial to record all interactions and use a straightforward reporting structure. For example, a daily report extracted using an SQL query might suffice to understand and track fundamental user interactions. As you gain a better understanding of user behaviour, you should refine your reporting to focus on the most meaningful interactions, and you may develop more sophisticated reporting as your understanding matures. User analytics should be monitored frequently, noting deviations, and should be reported on in the product meeting.
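As a sketch of such a straightforward daily report, the snippet below assumes a hypothetical interaction_log table (user_id, event, occurred_at); adapt the names to whatever your Engineering Team actually records.

```python
import sqlite3

# Hypothetical interaction_log table; names are illustrative.
conn = sqlite3.connect("analytics.db")
conn.execute("""CREATE TABLE IF NOT EXISTS interaction_log (
    user_id TEXT, event TEXT, occurred_at TEXT)""")

# A simple daily report: events and distinct users per event over the last week.
rows = conn.execute("""
    SELECT date(occurred_at)        AS day,
           event,
           COUNT(*)                 AS events,
           COUNT(DISTINCT user_id)  AS users
    FROM interaction_log
    WHERE occurred_at >= date('now', '-7 day')
    GROUP BY day, event
    ORDER BY day, events DESC
""").fetchall()

for day, event, events, users in rows:
    print(day, event, events, users)
```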
Artefacts

- None

Product Concepts

Product Concepts can be used to capture observed business problems during the lifecycle of a product. They record sufficient detail for the Product Team to determine whether or not a solution should be pursued, based on the perceived benefit and potential viability of solving that problem. Product Concepts are used to triage issues and possible features to direct development.

At any point in time, any member of the Product Team might observe potential problems for users, clients or team members. These ideas should be turned into Product Concepts and put through a rapid prioritisation that can quickly triage them without developing each into a full-blown requirements document. The Product Concept document is designed to give a quick snapshot of an observed problem; it is intended to be light-touch, so that it can capture a broad range of problems. You should capture every idea in a Product Concept without selectivity; selection will happen at the Product Meeting later.

The Product Concept document contains three key sections:

1. Problem & Desired Outcome statements.
2. Impact metrics.
3. Team commentary.

The document is not meant to be exhaustive; it is a high-level outline with enough information to prioritise the analysis of the problem against other problems. The document should capture as much context about the problem as possible, so it is best written by the person observing the problem.

Goals

- Capture a business problem or proposed feature with enough information to assess its relative impact

How to do it

- Write a problem and desired outcome statement: New features should be recorded as business problems to understand their impact. Write a short problem and desired outcome statement to define this problem.
- Record impact metrics: Impact metrics should be aligned with the product strategy, and any related KPIs captured. A scale of one to five can be used to define the range of impacts, from low to high.
- Present the concept at the product meeting: Concepts should be recorded in impact order, for review and prioritisation in the product meeting.

Write a problem and desired outcome statement

Whoever observes the problem should write the first draft of the problem statement. A good problem statement should describe:

- What a user is trying to do.
- What is stopping them from doing it or making it more difficult.
- A description of the impact this has.

It should describe the observed problem and include sufficient contextual information to assess the metrics in the user impact section. The problem observer should also write a desired-outcome statement. This is not a solution proposal, but a general statement by which a potential solution can be judged as acceptable (or not). A good desired outcome follows the Outcome-Driven Innovation model, including a direction (e.g. increase or decrease), a metric (e.g. the time taken or likelihood), an object of control (the outcome), and a contextual clarifier (describing the context in which the outcome is desired). Typically, smaller problems will be associated with improvements in one metric and one desired-outcome statement. Larger problems, however, may include more than one statement.

Record impact metrics

Define metrics to evaluate the expected impact of the proposed concept. The right impact metrics are aligned with your product strategy (and KPIs). Impact metrics are a guide to the impact of a problem and are used as a yardstick to prioritise further research and problem-solving activities. They are not meant to be heavily researched. Metrics should be listed approximately (as an estimate, by quartile or in general terms) and should include the proportion of users impacted, the frequency of occurrence of the problem, the impact size and the impact velocity.

Present the concept at the product meeting

See Product Meeting and Product Prioritisation for more details on the product process. Prioritisation of new features should be reviewed by the team before selecting a product concept to work up into a full requirements specification.

Artefacts

- Product Concept Template

Product Requirements

Product Concepts are selected for development into Product Requirements by the Product Team, in the product meeting. The collective review aims to balance risk and reward, taking into account the different perspectives of all stakeholders. The purpose of the product requirements stage is to develop an approach to solving the identified business problem and to define the business requirements for the proposed change. It should be completed to a reasonable degree before commencing data science work, though it is a living document and may be updated after starting work.

The objective of a product requirement will align with the problem defined in the preceding product concept. It defines requirements using user stories and data pipelines, and also provides a space for user interaction design. It is used as a base to supplement the discussion with the Data Science Team, but it does not replace that discussion. Approaches to solving a problem are typically defined in problem-solving workshops. Such a workshop may produce more than one possible solution to the problem, and as a result more than one product requirements document may be created. Usually, only one document or approach will subsequently be prioritised.

The success metrics section of the product requirements page is particularly important.
These should be clearly measurable attributes which align with the desired outcome described in the product concept.

Goals

- Define an approach to solving the identified and prioritised business problem
- Define how the feature will be assessed as successful or not

How to do it

- Define user stories: User stories are user-centric descriptions of a new feature or process.
- Define any changes in the data pipeline: Any changes to the data pipeline should be clearly described.
- Define any change in the user interface: New data features may introduce additional challenges when presenting model output to users. When introducing new features, careful consideration should be given to the user interface.
- Define success metrics: Success metrics should be clearly defined, including what should be measured and for how long.

Define user stories

Requirements should always be described using user stories. User stories should be linked to one or more user personas (see: User Research) and should follow a Who-What-Why format. More information on user stories is available online, for example on Wikipedia, or from Mike Cohn.

Define any changes in the data pipeline

Some features will require changes to the data pipeline originally defined in the Planning phase. As you discuss new features with the Data Science Team, ensure that the data pipeline is reviewed and any changes documented.

Define any change in the user interface

User interaction should be described using workflows, screen mockups and any other relevant tools. User interaction design will usually start as hand-drawn or other basic depictions, and be iterated on through design workshops. The principles described in Initial Product Definition should be adhered to, namely careful handling of the range of outputs from the model, including a null output.

Define success metrics

Finally, one or more metrics should be included, which will be measured to judge whether or not the change has been successful in meeting the desired outcome. Outcomes may be measurable directly by a single metric, or indirectly through several. These metrics should be derived from available user analytics; additional metrics may include interview outcomes or more qualitative criteria.

Artefacts

- ML Product Requirements Template

Product Prioritisation

A body of Product Concepts should be maintained, to collect and organise ideas from within and outside the Product Team on the future direction of the product. Product Concepts are organised by expected impact, to aid discussion of which concepts should be developed into more complete Product Requirements and ultimately introduced into the product. Candidate concepts are selected in the Product Meeting.

Concepts should be reviewed and selected based on their alignment with the Product Strategy. If the most valuable product concepts are not aligned with the strategy, then that strategy should be renegotiated with the Business Leadership Team before the concepts are prioritised and worked on. If moving forward with a concept would materially impact the Business Plan, then this should also be raised and renegotiated with the Business Leadership Team before moving forward. In the process of creating product requirements, you may invalidate initial assumptions made when prioritising a product concept. If this is the case, then the product requirements document should be discussed and reviewed at a subsequent product meeting before moving forward.
Goals

- Select a Product Concept to work on

How to do it

- Review the collection of Product Concepts: Take advantage of online tools to aid this review.
- Assess Product Concepts: Product concepts should be reviewed for impact, alignment with the product strategy, and impact on the business plan.
- Select a Product Concept to progress: Select one or more product concepts to work on.

Review the collection of Product Concepts

Product concepts should be reviewed collectively, in the product meeting. Online tools can aid the maintenance of this collection of documents. Summary information can be maintained in Google Sheets, or dedicated product management tools can be used (for example Confluence, Aha!, ProductPlan and many more). Take care to track Product Concepts closely, looking for groups of similar concepts or those which can be tackled collectively. Some maintenance of this library of information will be required, particularly as products age and concepts become out of date.

Assess Product Concepts

Product concepts must be assessed:

- Based on the value they create.
- Based on their alignment with the Product Strategy.
- Based on their alignment with the Business Plan.

This alignment can often be captured with a simple scoring mechanism, which makes ranking and selecting concepts easier and less partial. Since concepts are early ideas, it is normally misleading to score on anything more than a one-to-ten scale.
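A sketch of such a scoring mechanism, with invented concepts, weights and scores; the point is only that a shared, simple formula makes the ranking discussion less partial.

```python
# Hypothetical weights and scores on a one-to-ten scale; agree these with the team.
WEIGHTS = {"value": 0.5, "strategy_fit": 0.3, "plan_fit": 0.2}

concepts = [
    {"name": "Bulk re-labelling tool", "value": 6, "strategy_fit": 8, "plan_fit": 7},
    {"name": "New market segment",     "value": 9, "strategy_fit": 3, "plan_fit": 4},
]

def score(concept: dict) -> float:
    """Weighted sum of the assessment criteria."""
    return sum(WEIGHTS[key] * concept[key] for key in WEIGHTS)

for concept in sorted(concepts, key=score, reverse=True):
    print(f"{score(concept):.1f}  {concept['name']}")
```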
Select a Product Concept to progress

You should seek agreement across the Product Team when selecting a Product Concept. A shared scoring mechanism and a clear statement of product strategy can help to align team members when selecting a concept. Agreements to move a concept forward should be recorded clearly in the minutes of the product meeting in which the agreement was reached. If you find it hard to reach an agreement, this may indicate a product strategy which is unclear or no longer viable. If this is the case, schedule a periodic business review to discuss it in more detail.

Artefacts

- Product Concept

Periodic Business Review

Almost by definition, Machine Learning problems are explorations into the unknown. By creating an MVP and releasing work in small chunks, you will gradually learn more about the business problem you are investigating. You will learn more about the datasets and the domain you are working in. As you progress, you will build up a greater understanding of the complexity of the problem and of the market for its solution. For you to realise the benefits of this iterative approach, that knowledge must be re-incorporated into your plans for future work.

In some cases, knowledge is incorporated into your plans through feature prioritisation decisions. These decisions support the gradual pursuit of a product strategy. At other times, you will find information which challenges the product strategy or business plan. To manage risk optimally, you must regularly review progress against these two artefacts and determine whether action needs to be taken. Reviews can be held for two reasons:

1. A regular review during the lifecycle of your product.
2. An ad-hoc review, held as a result of learning new information which impacts your plans.

Reviews are held with the Product Team and Business Leadership. Their purpose is to assess whether the product strategy or business plan needs to change. Changes to the business plan might also include a change in the resources allocated to the product.

It is important to note that new information could be positive or negative; in either case, a review should be held. For example, information from early trials might indicate a larger market and higher value for your product. In this case, it may be advantageous to increase the available resources in order to launch a more complete product faster.

Goals

- Review progress and make suitable recommendations for future action

How to do it

- Review the product strategy: Review the product strategy to determine whether it is still achievable and desirable.
- Review the business plan: Review the business plan to determine whether it is still accurate.
- Decide whether any action should be taken: Decide whether any action needs to be taken to revise the product plans.

Review the product strategy

The product strategy should be periodically reviewed. You should ask the following questions:

- Is the product strategy still achievable?
- Do you still expect to be able to deliver the described product in the time given?
- Is the identified market still realistic and achievable?
- What evidence have you gathered for product feasibility and market applicability?

Review the business plan

The business plan should also be reviewed. The following questions are a good starting point:

- Is the business plan still realistic?
- Do you believe that the market value is still accurate?
- Do you still expect to be able to deliver the product within the resource and cost assumptions?

Decide whether any action should be taken

After reviewing the artefacts, you must determine whether further action is needed. The Product Team must determine whether they can still support the agreed Product Strategy and Business Plan. If not, these should be renegotiated. If further change is needed, for instance the reallocation of resources, then the Product Team's principal role is to ensure the Business Leadership Team has enough information about the progress of the product to make an informed decision. Reasonable outcomes from a periodic business review include:

- If the product strategy and business plan remain valid and have minimal changes, you will usually recommend that the project continues as planned.
- If the product strategy is still realistic but you require more resources, you might ask the Business Leadership Team to decide whether to add more resources or accept an extended time for delivery.
- If you have invalidated the product strategy, you might suggest an alternative product strategy (for example, with a different target market). If you have not identified an alternative strategy, you might suggest that the resources are allocated to another product.

You may also recognise positive changes to the product strategy or the business plan. For example, you might find that the target market places a higher value on your product. In this case, you should discuss whether changes should be made to take advantage of this. Discussions and outcomes should be clearly documented in the meeting minutes.

Artefacts

- Product Strategy
- Business Plan
- Business Leadership Report
- Any other supporting information from recent work

Feature Iteration

Once the Product Management Team reviews general feedback from all stakeholders, they prioritise a set of problems to deal with. Feature Iteration is the process that solves these problems, primarily with the help of the Core Iteration process.

Goals

- Resolve a selected problem by updating the model ensemble.

How to do it

- Select a problem: The product review will prioritise a set of problems.
- Brainstorm solution ideas: The Product Team should work on problems that are valuable and feasible. Brainstorming helps to generate a set of solutions, which increases the chance of success.
- Update the model by Core Iteration: The Core Product Teams update the model with the help of the well-practised Core Iteration process.
- Deployment into production: A new model might behave differently in production or have unforeseen consequences. These need to be carefully evaluated after deployment.

Select a problem

The Product Management Team collects feedback on the product from three different sources:

- Business Leadership Team: Strategic goals and business context
- Users: User surveys and feedback
- Domain Team: Subject matter insights and ideas

The Product Management Team distils these into one or more problem statements which align with the general product strategy. A coherent and measurable business case enables the Product Team to think about potential solutions.

Feature Iteration Process (diagram)

Brainstorm solution ideas

The Product Management Team approaches the Core Product Teams about the prioritised problems. Together they brainstorm solution ideas for each of these issues. It is important that this brainstorming is not a technical implementation phase. At most, a lightweight analysis should be done on any of the problems, as they are speculative at this stage. Based on these ideas, the Product Team prioritises a feature and requests the Core Product Teams to start working on it.

Update the model by Core Iteration

Given a feature request, the Core Product Teams can start working on delivering a solution and deploying it in the form of a model. This process is the Core Iteration. The Data Science Team is in charge of the technical aspects of the solution. They take the rough specification from the ideas phase and clarify the data requirements with the Domain Team. If data needs to be labelled or processed further, they set up the Data Acquisition Team to perform this. The Data Acquisition Team also specifies the ongoing data management requirements and the resources needed for them. It is essential to think about the two stages of development:

- One-off data collection, cleaning and labelling: This is required to evaluate the solution and deploy it as a local MVP. The DA Team must label enough data that the DS and Domain Teams can decide whether the solution promises to pass the requirements, at least in the mid-term.
- Ongoing data collection, human in the loop, augmented progress: Once the model is in production but its performance does not reach requirements, manual intervention is needed at the points where the model deviates from specification.

Example

The classic example of this is online fraud detection. Next to the "Fraud" and "Not Fraud" labels, there is a third one, "Not Sure". These cases are channelled to human operators who manually make a decision, thereby improving the model's end-to-end characteristics and also creating labelled data for future model improvement iterations. A minimal sketch of this routing logic is shown below.
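The following is a minimal sketch of the human-in-the-loop routing in the fraud example, assuming a model that outputs a fraud probability. The thresholds and the function name are hypothetical; in practice they would be tuned against the business requirements.

```python
# Hypothetical human-in-the-loop routing for the fraud example above.
# Thresholds are illustrative and would be tuned to business requirements.
def route_transaction(fraud_probability: float) -> str:
    if fraud_probability >= 0.95:
        return "Fraud"      # high confidence: handle automatically
    if fraud_probability <= 0.05:
        return "Not Fraud"  # high confidence: handle automatically
    return "Not Sure"       # uncertain: channel to a human operator

# Operator decisions on "Not Sure" cases become labelled data for the
# next model improvement iteration.
print(route_transaction(0.50))  # -> Not Sure
```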
Once the initial run of data is processed, the Data Science Team trains the model and evaluates it on a holdout set. The results are shared with the Domain Team, who try to find potential improvements, partly based on domain insights. Often these insights help to clarify the impact of different error types and to reallocate priorities. These evaluations can uncover unforeseen problems that were hidden during the ideation phase. This is the reason the project should progress from the idea phase to the Feature Iteration phase: so that the Product Team can work with real evidence of the difficulties rather than estimated or even hypothetical ones.

Deployment

If the feature requires changes to the data model and the model's API in production, the Core Product Teams negotiate this with the Functional Teams. There are multiple options for the actual rollout of the model to customers. These all assume some A/B testing capability in the company's production environment. Canary-style deployment means that at first only a tiny portion of the users are connected to the new model. The Engineering Team (or the DevOps/MLOps teams) closely monitors errors emanating from the model. This amount of data is not enough to evaluate the model statistically, but it can find bugs that got through unit testing. A proper A/B test must be conducted with a sizeable test proportion to get real-life performance feedback. The smaller the expected change, the more samples are needed from the tested model to draw a statistically significant conclusion, as the sketch below illustrates.
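As a rough illustration of the relationship between effect size and sample size, the sketch below estimates how many users each test arm needs, using statsmodels. The baseline and expected conversion rates, significance level and power are hypothetical assumptions for the example.

```python
# Rough sample-size estimate for an A/B test on a conversion-style metric,
# using statsmodels. Baseline and expected rates are hypothetical.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10  # current model
expected_rate = 0.11  # new model, i.e. a one percentage point improvement

effect_size = proportion_effectsize(expected_rate, baseline_rate)
n_per_arm = NormalIndPower().solve_power(effect_size=effect_size,
                                         alpha=0.05, power=0.8)
print(f"Approximate users needed per arm: {n_per_arm:.0f}")
```

Halving the expected improvement roughly quadruples the required sample, which is why small changes demand a sizeable test proportion.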
Core Iteration

Core Iteration is the primary process through which the Core Product Teams deliver features to the product or update existing features. It is a structured way to incorporate qualitative insights into the quantitative lifecycle of Machine Learning modelling. It enables end-to-end updates and improvements and combines the technical and domain teams into one unit.

Goals

- Update or create a feature in the model
- Update labelling instructions for the Data Acquisition Team

How to do it

- Use quantitative and qualitative feedback: Add the Product Manager's qualitative feedback to the Data Science Team's evaluation of the last iteration, and combine them with the help of the Domain Team.
- Update labelling instructions and the data to be labelled: As the problem statement is updated, it is necessary to change how labelled data is created to enable the creation of the new model.
- Update the model ensemble: The new feature might require new models or model updates.
- Train and evaluate: Create the new model, decide whether it improved, and send it for deployment. The evaluation results will feed into the next iteration cycle.

Traditional "Waterfall" process of modelling

Core Iteration breaks the traditional Waterfall process of creating Machine Learning models, that is:

1. Collect data
2. Clean data
3. Label data
4. Create a model
5. Evaluate the model

To establish the need for an iterative approach to model delivery, we need to review the problems with the Waterfall model.

Problem 1: Unforeseen subproblems at specification time

Real-life problems have many subproblems and edge cases. Businesses set absolute performance requirements, and if the model fails to achieve these it must be improved before it can be considered for deployment. These subproblems are often unknown at specification time but need to be rapidly discovered and resolved. Using the Waterfall technique means that this discovery will happen too late, only at the evaluation stage. By this time, the data cleaning and labelling processes have been fully completed and all allocated resources spent on them. Further improvements can only be achieved by allocating more resources. This also means that a potentially obsolete set of data was labelled. After a point, labelling data based on the initial assumptions does not improve performance, because a subproblem that was unknown at specification time inhibits further improvements.

Problem 2: Rigid model architecture

Creating and training custom models is expensive, and Data Scientists regularly default to using simple out-of-the-box techniques (e.g. logistic regression, random forests, SVMs). These techniques have some hyperparameters, but often their framework does not provide ensemble options. They are not flexible enough to incorporate the number of subproblems a real-life problem has. The only options to improve performance are:

- Collect more data
- Change hyperparameters
- Feature engineering

Collecting more data, as discussed in the previous section, is prohibitively expensive beyond a certain point. Hyperparameter tuning is a low-hanging fruit, and there are several packages to perform it, but it has limited scope for improvement. This leaves feature engineering, which again is an expensive option because it involves reprocessing the entire dataset.

To summarise: the model needs to reach an absolute performance threshold stated in the specification but cannot, because the data is labelled based on the original specification. Performance is held back by an unforeseen problem that is not reflected (or prioritised) in the labelled dataset. Even if it were, there is no guarantee that the rigid modelling tool could solve the problem through feature engineering and hyperparameter optimisation.

Initial MVP Model

The critical problem to solve is to gather a better understanding of the problem-solution combination at a faster pace. This can happen only at each evaluation phase. The framework should run evaluations frequently and effortlessly and be able to start from a small amount of initial labelling. The initial model should be capable of future improvements and natively support ensemble techniques. Deep Learning packages are often thought of as convoluted and costly to train. But in an iterative ML framework, they can be used as "programming languages" due to their generic programming capabilities. Deep learning packages can implement models of any complexity. Using them even for simpler models has the advantage that the deployment pipeline doesn't need to be changed if a more complex model is later needed to solve the task.

Quantitative and Qualitative Evaluation

The quantitative evaluation results should be reviewed, seeking qualitative characteristics. The primary strategy for gaining insights is to stratify the input space by some feature and feature value and compare performance metrics on subsets against the metrics on the entire set, as the sketch below illustrates. If an anomaly is detected, the Domain Team can seek an expert reason why this is happening. Often this insight modifies and clarifies the labelling specification and guides Data Scientists to create new labelling tasks for the Data Acquisition Team. The feature can cause a problem in two ways: the modelling problem is more challenging when the feature is present, or the feature is rare and underrepresented in the training set. Both problems can be treated in successive iterations with ensemble models. First, if the problem is more challenging and needs to be modelled differently from the rest of the input space, a new model should be written. It can be added to the complete model with ensemble techniques. Second, if the feature causing the problem is relatively rare, which is why it was overlooked in the first place, it can be oversampled in the next iteration. The Domain Team can clarify the labelling specification, and the Data Acquisition Team can operate on it immediately.
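Here is a minimal sketch of this stratified (slice-based) evaluation, assuming labels, predictions and a candidate stratifying feature are available in a pandas DataFrame. The data, the "channel" feature and the use of accuracy as the metric are all illustrative assumptions.

```python
# Sketch of stratified (slice-based) evaluation, assuming a pandas DataFrame
# of true labels, predictions and a candidate stratifying feature.
import pandas as pd

df = pd.DataFrame({
    "label":      [1, 0, 1, 1, 0, 1, 0, 0],
    "prediction": [1, 0, 0, 1, 0, 0, 1, 0],
    "channel":    ["web", "web", "app", "app", "web", "app", "app", "web"],
})

df["correct"] = df["label"] == df["prediction"]
overall_accuracy = df["correct"].mean()

# Compare each slice against the overall metric to surface anomalies
# worth discussing with the Domain Team.
slice_accuracy = df.groupby("channel")["correct"].mean()
print(f"Overall accuracy: {overall_accuracy:.2f}")
print(slice_accuracy[slice_accuracy < overall_accuracy])
```

Slices that underperform the overall metric are the natural starting points for the Domain Team conversation described above.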
These hard-to-model features can be the source of model drift. If the feature's presence in the input distribution increases, the model's performance will drop. The Data Science Team should modify their Business-As-Usual plans to monitor this and notify the relevant Functional Teams (the Data Warehouse and Business Intelligence Teams) about these changes.

Updating the Labelling Instructions

The insights from the evaluation have multiple implications. They suggest potential solutions, changes to the labelling specification and further hypotheses to evaluate in the next evaluation phase. First, the Data Acquisition Team needs to be informed of how the labelling should happen and how much resource is expected to be needed to reach the required performance metrics. Because Machine Learning is usually employed in difficult, hard-to-understand situations, it is usually beneficial to run a couple of test labelling exercises where multiple people label the same samples. These can be compared (for example, with an inter-annotator agreement measure such as Cohen's kappa) to reveal whether the labelling instructions are clear and objective and everyone can interpret them in the same way. They also reveal whether there are edge cases that haven't yet been clarified. These exercises can happen during a series of ad-hoc meetings.

Synthetic Data

In some cases, the labelling task can be expressed as a computational problem where metadata is labelled, and a data processing step creates the labelled data.

Example

Labelling surface forms in named entity linking and then matching these surface forms in the corpus is synthetic data creation, as labellers do not physically label each occurrence of a named entity.

Updating the Model Ensemble

Deep Learning frameworks enable the Data Science Team to treat Machine Learning modelling as a programming task: iteratively updating the model and testing it on the labelled data. The new feature will be implemented as an additional submodel and packaged together with the model. This allows reuse of the deployment infrastructure with minimal changes from the Engineering Team, because the model can reuse the old model's APIs.

Benefits of Iterative Modelling

There are numerous benefits to iterative modelling, just like continuous delivery in Software Engineering:

- The models require a smaller upfront investment.
- Modelling can be started on an incomplete specification.
- Unknown issues surface faster, before the majority of the resources are spent on the project.
- The team can practise deployment frequently; the model launch becomes a regular, unremarkable event.
- Risks are lower as each change is smaller, and it is easy to fall back to a previous model.

Business-As-Usual

Once the model is deployed and tested, the feature is part of the product and must be maintained. This requires budgeting for the cost of ongoing data acquisition and for performance monitoring by the Data Science Team.

Goals

- Monitor and maintain the ongoing performance of the product

How to do it

- Monitor model performance: The Data Science Team continuously monitors the performance of the product.
- Maintain model performance: The Core Product Teams regularly and autonomously update the models in production by performing the Core Iteration process.
- Estimate the ongoing cost of BAU: If the feature requires significant ongoing maintenance, its costs should be estimated, as this will come out of the product's general labelling budget.

Monitor model performance

The Data Science Team continuously monitors the performance of the product. They use either the dashboards provided by the Business Intelligence Team or ad-hoc analysis, for example along the lines of the sketch below.
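The following is a hypothetical sketch of such an ad-hoc check feeding an alert. The functions `get_daily_accuracy` and `send_alert` are stand-ins for your own metrics store and notification tooling, and the threshold is illustrative.

```python
# Hypothetical ad-hoc health check feeding an alert. get_daily_accuracy and
# send_alert stand in for your metrics store and notification tooling;
# the threshold is illustrative.
ACCURACY_FLOOR = 0.90

def check_model_health(get_daily_accuracy, send_alert) -> None:
    accuracy = get_daily_accuracy()
    if accuracy < ACCURACY_FLOOR:
        send_alert(f"Model accuracy dropped to {accuracy:.2%}, below the "
                   f"agreed floor of {ACCURACY_FLOOR:.0%}")

# Example wiring with stubbed dependencies:
check_model_health(lambda: 0.87, print)
```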
They also set up notifications and alerts to be aware of any anomalous activity. These are incorporated into the regular quantitative reports for the Product Management Team. The Domain Team tracks the performance from a qualitative perspective, and the Product Management Team collects user surveys and observes whether the feature attracts any positive or negative feedback.

Maintain model performance

Models that are susceptible to drift require continuous attention. These models usually handle problems which naturally face unknown situations. Typical examples are news-related tasks (topic modelling or named entity recognition). Drift has a negative effect on overall performance, so the Data Science and Domain Teams should work towards a solution that avoids it. Some drift can be handled through ongoing labelling. The iterative process of model creation enables this type of maintenance. The Data Science Team is in charge of setting up the ongoing labelling task, and the Data Acquisition Team is in charge of actually doing it.

Estimate the ongoing cost of BAU

Both teams are also in charge of estimating the labour needs of the labelling task. Separating labelling from modelling, and making it more measurable and calculable, is a significant reason for creating the Data Acquisition Team. Ongoing labelling tasks form continuous labour costs that the Data Acquisition Team must manage with their available resources. New features can only be requested against the available uncommitted capacity, and new MVP models and initial labelling tasks for new features should fit into their budget as well.

Organisation Overview

Machine Learning requires a cross-functional effort from the host company. The organisational structure described in this document is deliberately broad to make sure that all stakeholders are represented in it. This enables the product managers to identify already existing roles and resources in the organisation and utilise them in the project. The role of already existing teams is to provide services to the Product Team and prioritise these requests into their schedule. The organisational principle of this manual is Separation of Concerns. Each team should operate only in their area of expertise, which makes sure they are comfortable with their role in the project. They connect to as few teams as possible, through simple communication interfaces that enable operational simplicity. The organisational diagram describes the teams and the interfaces between them. Separating the project into these groups enables allocating all roles and responsibilities necessary to the project. The communication channels are established to distribute artefacts and information between the teams.

Team Structure

Each team consists of one or more persons; one of them is appointed to be the leader, who will represent the group's role at leadership meetings. In small companies, a person can represent multiple teams. However, each team's function must be performed regardless, as they are all necessary for delivering and iterating on the product.

Teams

The individual team pages detail their roles in each of the phases.

- Business Leadership Team
- Product Team
- Product Management Team
- Core Product Team
- Data Science Team
- Data Acquisition Team
- Domain Team
- Functional Teams
- Engineering Team
- Data Warehouse Team
- Business Intelligence Team

Business Leadership Team

The Business Leadership Team consists of the sponsor of the product from a business perspective and any staff that support this role.
The team's primary role is to ensure that the product is embedded into the company's general strategy and enabled to add value from early on.

Product Management Team

The Product Management Team is responsible for creating the project roadmap and enabling the progress of the project accordingly. In the Planning phase, they are in charge of mapping out the business opportunities and use cases and creating the plans for the next phases with the Core Product Teams. In the Bootstrapping phase, they lead the effort to set up the structure of the project and establish communication channels. In the Operating phase, they collect feedback from users, the Domain Team and the Business Leadership Team and request solutions from the Core Product Teams.

Core Product Teams

The Core Product Teams are the Domain Team, the Data Science Team and the Data Acquisition Team. They deliver the core Machine Learning part of the product by covering all five parts of the Machine Learning modelling process:

- Data specification: Domain Team and Data Science Team
- Data collection: Data Science Team
- Data labelling and dataset maintenance: Data Science Team and Data Acquisition Team
- Model creation: Data Science Team
- Model evaluation: Domain Team and Data Science Team

Domain Team

The Domain Team represents the business know-how regarding the subject of the project. They understand how the company interacts with its customers right now and what processes that requires. They can articulate how their processes work and what they require from the Product Management Team and the Data Science Team. These teams, in turn, can suggest automation and optimisation use cases that the Domain Team can evaluate conceptually and, after completion, qualitatively. During the Operating phase, they are in contact with the Users through their business processes and express their feedback about them to the Product Management Team. This will be aggregated with the Users' actual feedback and Business Leadership's strategic advice.

Data Science Team

Based on the product definition, the Data Science Team's job is to create the models that will be part of the augmented business process. They come up with solutions for the product problems proposed by the Product Management Team and determine their feasibility. They work closely with the Domain Team and the Data Acquisition Team to estimate costs and risks.

Data Acquisition Team

The Data Acquisition Team is in charge of creating and maintaining the datasets used to train the models. They understand the one-off and ongoing costs of the effort to maintain model performance at the level set in the specification. They also maintain any metadata related to these datasets. They coordinate closely with the other two Core Product Teams to achieve their goals.

Functional Teams

The Functional Teams are the Engineering Team, the Data Warehousing Team and the Business Intelligence Team. Functional Teams support the Core Product Teams in their goal to create the Machine Learning product. They are consulted in the Planning phase to ensure low friction and feasibility, and they take part in the Bootstrapping phase. In the Operating phase, their contribution should be minimal, as the Core Product Teams must deliver the main part of each new feature, but the Core Product Teams need to synchronise with the Functional Teams in case a change might impact them. The Product Team tries to involve the Functional Teams minimally and in a way that causes little friction.

The deployment process should use the company-wide standards as much as possible, because this causes the least amount of work for the Functional Teams.

Engineering Team

The Engineering Team implements the model deployment framework and makes sure that the relevant logs are recorded for the Data Warehouse Team. The solution should wrap the model in a way that it looks just like any other API; a sketch of this is shown below. If the company uses an MLOps platform, then the Engineering Team should include the MLOps team that maintains that platform. This framework should be implemented during the Bootstrapping phase, as live deployment is necessary for reaching MVP status.
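As a minimal sketch of wrapping a model so it "looks just like any other API", here is an example using FastAPI (one of the tools acknowledged by this book). The feature names and the stubbed score are hypothetical; the real model artefact would be the one delivered by the Data Science Team.

```python
# Minimal sketch of wrapping a model behind a standard HTTP API using FastAPI.
# Feature names and the stubbed score are hypothetical placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    amount: float
    merchant_category: str

@app.post("/predict")
def predict(features: Features) -> dict:
    score = 0.5  # stand-in for the trained model's prediction
    return {"score": score}
```

From the rest of the infrastructure's point of view, this endpoint is indistinguishable from any other internal service, which is exactly the point.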
Data Warehouse Team

The Data Warehouse Team is responsible for recording the logs from the deployed models and processing them through ETL. This should be done in the usual, standardised way the company deals with other logs. The Data Warehouse Team provides the Data Science Team with the necessary data to evaluate the performance of the model in the context of the company's other data. This is important because model performance should be evaluated in a broader business sense and not just from a statistical perspective.

Business Intelligence Team

The Business Intelligence Team provides the necessary information and framework for presenting information about the product and its models. If they maintain a dashboard system, this information should appear there and enable all the Product Teams to self-serve their data needs on the project. This can also be used to present performance updates to the Business Leadership Team. Reusing an already existing platform for this purpose is a beneficial and low-friction solution.

Business Leadership Team

The Business Leadership Team represents the company and its strategic objectives in the project. They also approve the project, the resources allocated to it and its continuation, based on the value it is adding. To perform these tasks, they must inform the Product Team of the business context of the company and receive the necessary information about the product so they can make the GO/NO-GO decisions.

Tasks and Responsibilities

Business Understanding phase
- Communicate high-level business strategy from the company's executives
- Assess the commercial impact of the product
- Integrate it into the company's commercial plans

Planning phase
- Receive the estimations of the project's impact
- Make GO/NO-GO decisions

Operating phase
- Receive the ongoing cost/benefit analysis of the product
- Decide on further resources to be allocated

Communication
- Receives progress reports on the current status of the product during the Bootstrapping phase
- Receives regular impact reports in the Operating phase
- Provides regular updates on company strategy for the Product Manager to update the Product Roadmap

Meetings
- Product Kick-Off Meeting
- Product Meeting (optional)

Product Management Team

The Product Management Team is responsible for creating the product roadmap and enabling its progress. The Product Management Team represents the users and the business in the product organisation. They represent the "Question", and the Core Product Teams represent the "Solution".
Tasks and Responsibilities

Business Understanding phase
- Communicate with the Business Leadership Team on strategic opportunities
- Collect customer feedback
- Assess technical capabilities
- Surface potential solutions

Planning phase
- Plan the Bootstrapping phase with the Core and Functional Teams
- Draw up an estimated plan for the Operating phase
- Estimate one-off and ongoing costs, ongoing benefits and the risks of deviating from these estimates

Bootstrapping phase
- Communicate with the Core Product and Functional Teams on the progress of the tasks
- Review plans to ensure smooth progress

Operating phase
- Collect feedback from the Users, the Domain Team and the Business Leadership Team
- Request new features from the Core Product Teams
- Maintain the Product Roadmap
- Run company-wide retrospectives
- Present progress at the company-wide all-hands meeting

Communication
- Receives feedback from Users on product usability
- Receives feedback from the Business Leadership Team on product performance
- Receives feedback from the Domain Team on product usability
- Receives progress reports from the Data Science Team
- Provides feature requests and their justification to the Core Product Teams

Meetings
- Product Kick-Off Meeting
- Product Meeting
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Domain Team

The Domain Team represents the business know-how regarding the subject of the project. Before the Machine Learning project starts, they support the business case with their knowledge. They also evaluate solutions from a qualitative perspective and provide feedback to the Product Management Team.

Tasks and Responsibilities

Business Understanding phase
- Provides the domain-knowledge-based understanding of the business problem to the Product Management Team

Planning phase
- Contributes domain knowledge to the solution
- Sets acceptance criteria for the solution

Bootstrapping phase
- Contributes to the data collection and acquisition tasks
- Evaluates the MVP
- Provides feedback to the Product Management Team

Operating phase
- Reports on problems with the product
- Creates potential new feature ideas
- Contributes domain knowledge to each feature request

Communication
- Provides feedback to the Product Management Team on the business understanding
- Provides expertise to the Data Science Team about potential solutions
- Reports on the quality of the data

Meetings
- Product Kick-Off Meeting
- Product Meeting
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Data Science Team

The Data Science Team's primary role is to create the initial model and organise its deployment with the Engineering Team. Together with the Domain Team, they come up with solutions for the feature requests of the Product Management Team. They also oversee the dataset creation with the Data Acquisition Team. During the Operating phase, they monitor the performance of the models in production and feed this information back to the Product Management Team.
Tasks and Responsibilities

Business Understanding phase
- Cursory feasibility studies on data and solutions

Planning phase
- Solution ideas, their implementation plans and their resource needs
- Linking business value to statistical performance
- Risks related to missing performance targets (business and statistical)

Bootstrapping phase
- Implement infrastructure to deliver continuous data processing capabilities
- Create the data model
- Create raw datasets
- Create data processing tasks for the Data Acquisition Team
- Implement the models
- Train the models
- Evaluate the models
- Specify the required logging structure for the Data Warehouse Team
- Deliver the model artefact to the Engineering Team

Operating phase
- Together with the Domain Team, create solution candidates for the Product Management Team's feature requests
- Together with the Data Acquisition Team, create the datasets required to deliver the solutions for the new features
- Evaluate the solutions quantitatively, and qualitatively with the Domain Team

Communication
- Communicate user analytics of the product to the Domain Team and the Product Management Team
- Communicate business performance of the product to the Business Leadership Team (through the Product Management Team)
- Communicate data definitions and specifications to the Data Acquisition Team
- Communicate API specifications and changes to the Engineering Team
- Communicate telemetry specifications to the Data Warehouse Team
- Communicate monitoring needs to the Business Intelligence Team

Meetings
- Product Kick-Off Meeting
- Product Meeting
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Data Acquisition Team

The Data Acquisition Team is in charge of collecting and maintaining the datasets required to create the models for the products. They work together with the Data Science Team and the Domain Team to create the specifications and quality requirements for these datasets. The Data Science Team provides them with the raw data and the tools to use.

Tasks and Responsibilities

Business Understanding phase
- N/A

Planning phase
- Feasibility estimation of the one-off and ongoing labour needs of the data acquisition efforts on the product

Bootstrapping phase
- Running the initial data acquisition process

Operating phase
- Specifying processes for new features
- Estimating resource needs for these processes
- Delivering new datasets

Communication
- Reports to the Domain Team and the Data Science Team on the progress of data acquisition

Meetings
- Product Kick-Off Meeting
- Product Meeting
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Engineering Team

The Engineering Team is in charge of the technical infrastructure and solutions in the organisation. They specify and deliver the solution that integrates the Machine Learning product into their infrastructure. If the company has an MLOps team, they are assumed to be part of the Engineering Team, and they are in charge of the tasks that match their areas of responsibility.
Tasks and Responsibilities

Business Understanding phase
- N/A

Planning phase
- Plan deployment and feasibility
- Plan telemetry requirements together with the Data Warehouse Team

Bootstrapping phase
- Create the deployable solution

Operating phase
- Maintain the solution and deploy new models if needed

Communication
- Communicates with the Data Science Team on the model serving API
- Communicates with the Data Warehouse Team on the telemetry specification

Meetings
- Product Kick-Off Meeting
- Product Meeting
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Data Warehouse Team

The Data Warehouse Team is in charge of recording the raw data from the deployed models and transforming it into the company's data warehouse through already established processes. The Data Science Team can use this data to evaluate the production performance of the deployed models.

Tasks and Responsibilities

Business Understanding phase
- N/A

Planning phase
- Plan the telemetry implementation based on the Data Science and Engineering Teams' specifications

Bootstrapping phase
- Add telemetry processing to the ETL process

Operating phase
- Maintain the solution

Communication
- Communicates with the Data Science and Engineering Teams on ongoing telemetry and data needs

Meetings
- Coordinating Sprint Planning
- Company-wide All-Hands Meeting

Business Intelligence Team

The Business Intelligence Team is in charge of providing the KPIs to the Product Team through self-service dashboards. This team calculates the relevant KPIs so they are comparable across the organisation. The Data Science Team expresses the model's business performance in this framework rather than a custom one.

Tasks and Responsibilities

Business Understanding phase
- N/A

Planning phase
- Plan KPI update needs
- Plan dashboard update needs

Bootstrapping phase
- Implement KPIs
- Implement dashboard changes

Operating phase
- Maintain KPIs
- Maintain dashboards

Communication
- Receives data specifications from the Data Warehouse Team
- Receives KPI specifications from the Data Science Team

Meetings
- Coordinating Sprint Planning (optional)
- Company-wide All-Hands Meeting

Communication

Good communication is essential in any ML project. When working in short sprints and delivering work incrementally, a smooth communication flow keeps all teams productive and aligned towards the shared team goals. You must foster an environment in which open communication is encouraged. You are likely to have many tools available to support good communication; you should ensure they are organised and well utilised. Adapt the communication elements described in this part to your tools and circumstances. We describe a number of important meetings, reports and other communication items in this section. None of these should become an exercise in bureaucracy. Keep meeting materials lightweight and practical, with a focus on the essential elements contributing to clear communication and coordination.

In this section

- Meetings Overview
- Special Meetings: Product Kick-Off, ML Design Workshop, ML Design Review
- Regular Meetings: Product Meeting, Coordinating Sprint Planning, Sprint Planning, Daily Standup, Company Wide All-Hands, One-To-Ones
- Artefacts: Business Problem Template, Business Problem Ranked List, Product Strategy Template, Business Plan Template, User Persona Template, Product Concept Template, ML Product Requirements Template, Meeting Minutes Template
- Other Communication: Monitoring, Dashboards, Reports, Object Handovers, User Interview Questions

Meetings Overview

A range of regular and special meetings should be organised to support the work as it progresses through the lifecycle.
These meetings help to maintain alignment between teams and ensure that resources are used efficiently. You should follow best practice for all meetings: ensure agendas and, where relevant, materials are distributed in advance for review by attendees, and publish minutes after meetings have concluded.

In this section

- Special Meetings: Product Kick-Off, ML Design Workshop, ML Design Review
- Regular Meetings: Product Meeting, Coordinating Sprint Planning, Sprint Planning, Daily Standup, Company Wide All-Hands, One-To-Ones

Product Kick-Off

Attendees: Product and Functional Team Leaders, Product Management Team Leader, Business Sponsor

Trigger: Before starting the Bootstrapping phase, once plans are sufficiently defined and the product strategy and business plan documents have been produced.

Purpose

The product kick-off is the forum in which agreement is sought to start the product and begin the Bootstrapping phase. During the meeting the planning and decision artefacts are reviewed. The key artefacts are the product strategy and business plan from the decision process, and the plans from the initial product definition. There must be agreement on the product strategy and business plan before starting work on the Bootstrapping phase, as these provide the governance mechanism for all future work (until they are replaced by a similar agreement). In practice, it is often useful to meet and seek informal agreement between individuals ahead of time, but even if this agreement is reached, the meeting should be held and the decision recorded. If no agreement is reached, the group should agree on how the impasse will be resolved before reviewing updated terms.

Agenda

- Review the Product Strategy
- Review the Business Plan
- Review the Initial Product Definition
- Decide whether to accept and proceed with the Bootstrapping phase, or revise

ML Design Workshop

Attendees: Functional and Product Team leaders, some team members, Product Management Team

Trigger: When planning for a new product, or when making significant changes to an existing product.

Purpose

The ML Design Workshop is described in the corresponding section of the Planning phase, as a core part of that phase of work. The same format should be used any time extensive changes are being made to the product. Design workshops, when they are conducted well, help to generate more creative ideas to solve problems and help to ensure that solutions satisfy both product and technical constraints. As opposed to all other meetings, a design workshop should be scheduled for as long as time allows. Most of the meeting will be spent brainstorming different ideas to solve the problem being addressed. After conducting a design workshop and documenting the selected design, a design review should always be held.

Agenda

- Present and discuss the business problem and market.
- Ideate target markets, ML models and product pipelines.
- Select design candidate(s).
- Identify any high-priority unknowns.

ML Design Review

Attendees: Functional and Product Team leaders, some team members, Product Management Team

Trigger: When a set of product plans has been completed, normally following a design workshop.

Purpose

A design review is held to review product designs whenever a significant change is introduced. If the plans started with a design workshop, then a review should always be held. Even if you did not start with a design workshop, if the plans are complex then it will be beneficial to review them as a group.
The most important artefact to present and discuss is the ML Product Requirements Template, though other supplementary materials may also be used. Design reviews help to ensure all members of the Product Team have a clear, shared understanding of the plans. They minimise the risk that a requirement has been misunderstood. To do this effectively, all members of the team should be encouraged to scrutinise and comment on each element of the plan.

Agenda

- Review and finalise the design.

Product Meeting

Attendees: Team leaders from the Functional and Product Teams, Product Management Team, Business Leadership Team representative

Frequency: Weekly

Purpose

The product meeting is the main forum for the regular review of progress and the prioritisation of new features to develop, and in general the primary information-sharing place. It should be held weekly and attended by the team leaders and the product manager. The product manager prepares and distributes materials in advance. The product manager should maintain a product roadmap, tracking performance against the product strategy and the progress of features from concept to delivery and assessment. By sending materials out in advance, the team has the opportunity to review them and fully prepare for the discussion.

Agenda

- Review user analytics: See User Analysis; a report of progress against the pre-defined KPIs, alongside any other relevant information on user behaviour.
- Review qualitative user research: See User Research; a report of user interviews and other qualitative information from users, as it relates to the product.
- Review progress against previous initiatives: Review of technical progress versus plan. This should include progress reports for items still in development, any relevant reports on delivered models and features, implemented initiatives and expected product benefits, and a specific review of recently released models and features with assessment against success criteria.
- Discuss the product strategy and business plan: Are these still viable and realistic? If not, agree on a recommendation for action to take to the Business Leadership Team.
- Review Product Concepts: What has been previously prioritised? Did the team create the requirements? Are any concepts being raised for prioritisation today?
- Review Product Requirements: Are any product requirements ready to add to the backlog?

Coordinating Sprint Planning

Attendees: Product Team Leaders, Functional Team Leaders

Frequency: Weekly

Purpose

Coordinated sprints keep teams aligned, which is particularly important in data projects, which necessarily involve more teams and require more cross-functional organisation. All teams deliver work incrementally and in parallel. Weekly company-wide sprints provide visibility of the work being produced and ensure roadblocks can be addressed collectively. See Sprint Planning for details on individual team sprints. Company-wide sprint meetings are attended by the team leaders or their representatives.

Agenda

- Review of progress towards the previous week's sprint goals.
- Review of overspill from the previous week, and work already assigned to next week's sprint.
- Review of new requests from the product meeting, and allocation to next week's sprint.

Sprint Planning

Attendees: All members of a team

Frequency: Weekly

Purpose

Sprint planning is the main way work is organised within a team. A sprint planning meeting should be held at or just before the start of a new sprint, after the company-wide sprint meeting has been held.

Agenda

- Review and confirmation of work completed in the previous sprint.
- Review of overspill and pre-assigned work for the current sprint.
- Review of new feature requests agreed in the company-wide sprint planning meeting.
- Allocation of work and priorities for the upcoming sprint.

Daily Standup

Attendees: All members of a team

Frequency: Every day

Purpose

Many teams schedule daily standups at the start of the day, though timing can vary. The standup format allows teams to adjust quickly to new information in a sprint, responding to roadblocks and other issues, to stay on track for delivery of the planned sprint items. Some teams are now experimenting with different formats of daily standup, including asynchronous standups. As long as a daily team-wide standup is held, teams should find the format that works best for them.

Agenda

- Progress: Reports and updates on progress from team members, against sprint goals.
- Plan: Review of the plan for the next day, in the context of the rest of the sprint.
- Problems: Discussion of any open problems or challenges currently impacting the team's ability to deliver.

Company-Wide All-Hands

Attendees: All staff within a business unit.

Frequency: Varies, depending on the situation. Short meetings every two to four weeks tend to work better than long meetings once a quarter.

Purpose

All-hands meetings are held with the wider company (or division or business unit). They provide an opportunity to share information with the whole team and celebrate success collectively. It is useful to conduct regular all-hands meetings, though these should be relatively informal, with time for Q&A and encouragement of open dialogue. All-hands meetings help to motivate team members towards the shared objectives of the company.

Agenda

- Update on company achievements, goals & strategy
- Relevant updates on product achievements & goals
- Any useful and positive customer feedback
- Q&A

One-To-Ones

Attendees: Team leader and a member of the team

Frequency: Fortnightly

Purpose

One-to-ones should be held with each team member frequently. These meetings provide an opportunity to coach team members, identify skill gaps, and answer questions which team members don't feel able to raise in team meetings. They should be relatively informal, covering progress on current work where needed but always leaving time for regular discussion of soft skills, performance and professional aims & objectives. An open-format meeting and open-ended questions work best.

Agenda

- How is it going?
- How can I help you?
- Anything else?

Business Problem Template

Business Problem Description
- A narrative description of the business problem
- Should include who experiences it, the current experience and the ideal experience, why it has not been solved previously, and the benefits of solving it

Alternative Solutions Considered
- Buy / Build / Partner
- E.g. An engineering solution would have these characteristics; an analysis of external suppliers found none.

Measurable Outcome
- E.g. Reduced time taken from X to Y

KPI Impact

Impact Description
- Written description of the business impact of solving the problem

Impact Analysis

KPI Name | Current KPI Score | Anticipated KPI Impact
E.g. Daily Active Users | E.g. 25,000 | High/Medium/Low
E.g. User Retention Rate | E.g. 92% per month | High/Medium/Low

KPI Impact Score
- A score for the impact, derived from your own KPIs
- Scoring mechanisms might range from 1 to 10

Solution Friction

Risk Analysis

Risk Name | Risk Rating
E.g. Decision to be automated is subject to legislative controls | E.g. High
E.g. Older data is archived offsite | E.g. Low
Risk Score
- A summary score for the risk analysis, based on your own situation, on a scale from 1 to 10

Business Problem Ranked List

Business Problem Name | KPI Impact Score | Friction Score | Total Score
E.g. Detect failing transactions earlier | 6 | 3 | 42

Product Strategy Template

Product Strategy Summary
- A two-line description of the product strategy, to be shared and used widely

Medium-Term Goals

Options Explored
- Summary of the different goals and strategies which were explored

Option 1
- Target market:
- Target market expectations: description of what is anticipated to be needed in order to satisfy the market segment
- Target timeframe: reasonable timeframe to meet these expectations
- Alignment with business strategy: consideration of how this aligns

Option 2
- Target market:
- Target market expectations: description of what is anticipated to be needed in order to satisfy the market segment
- Target timeframe: reasonable timeframe to meet these expectations
- Alignment with business strategy: consideration of how this aligns

Detailed Description of Selected Goals
- Space for more information on the goals selected for the strategy; what are the desired outcomes?

Selection Justification
- Description of why the particular segment and goal or set of goals was selected

How the Goals Will Be Met

Options Explored
- General description of any alternative options that were considered

Description of Selected Option
- Space for more description of the strategy to meet the goal

Business Plan Template

Metadata
- Name of product: name
- Link to business problem: link

Summary
- Total value per year: size of market * value per user
- Total cost per year: amortised one-time costs, plus annual costs

Target Market
- Target market description: e.g. Fashion stores with one to three physical locations, based in Europe
- Target market size: e.g. 10,000 individual businesses
- Links to any research on the target market's needs: reference to previous market research

Business Problem Value
- Time saved: (include the user or department who saves the time, the approximate number of users per client, and the cost per hour of the user)
- Costs avoided: (describe what costs will be avoided and how)
- Incremental revenue: (describe how the solution supports incremental revenue generation)
- Other: (provide a clear outline of how a client would value this)

Ongoing Solution Costs

Team Size (use fractions to represent the expected proportion of a shared resource)
- Data Scientists:
- Data Analysts:
- Domain Resources:
- Product Manager:
- Engineers:

Infrastructure Costs
- Estimate of the infrastructure costs to train and operate the data product

Data Costs
- Approximate cost of any new dataset(s) which need to be acquired to create the product

One-Time Costs
- Any significant one-time costs that are expected to be incurred, for example the one-time acquisition of a labelled dataset
- These costs will normally be amortised over five years, though another length of time might be used depending on the organisation

User Persona Template

Demographic Information
- Name:
- Age:
- Job Title:
- Company Type:
- Location:
- Education:

Picture
- Including a picture to represent this user will help everyone visualise the person described

Goals
- Description of the goals the person is trying to achieve

Frustrations
- Top-of-mind frustrations in trying to achieve the goals

Bio
- Description providing more details about the persona

Technical Capabilities
- Level of competence with IT, statistics or other relevant areas

Product Concept Template

Concept Information
- Name of concept:
- Date proposed:
- Status: E.g.
under review; not prioritised at this time; detailed requirements created

Concept Detail

Problem Statement
- What is the user trying to do? What is stopping them from doing it? What impact does this have?

Desired Outcome Statement
- In general terms, what would be a satisfactory outcome for the user?

Concept Impact

Impact Score
- A score for the expected impact on KPIs, from 1 to 10

Score Justification
- A brief description of why the concept was given this score

ML Product Requirements Template

Requirements Information
- Requirement Name: A short name, included for ease of reference
- Document Status: E.g. Draft, Scheduled, Released, Complete
- Document Owner:
- Creation Date:
- Scheduled Date:
- Link to Concept:

Success Metrics
- A description of the KPIs or other metrics which will be measured after release to determine the success of the initiative

User Stories

Requirement | User Story | Importance | Notes
E.g. Make song recommendations | E.g. Dave, who is a casual music listener, wants to find more songs similar to the one he is currently listening to because he has finished this one. | High |

Data Flow

Data Flow Description
- Description of the flow of data, the application of ML models, any last-mile processing, and any important feedback loops
- An overview of the general architecture; this is often best shown as a flow diagram

Data Flow Acceptance Criteria
- Any information about the acceptance criteria of the model, often in terms of precision and recall

User Interaction
- Description and diagrams of user interactions with the features described
- May be in the form of low-fidelity mock-ups and user flows to start with, up to higher-fidelity and more comprehensive images as the requirement is developed further

Open Questions

Question | Answer | Date raised | Response from

Meeting Minutes Template
- Meeting Name: e.g. Data Science Team Sprint Planning, Sprint 10
- Meeting Date and Time:
- Attendees: e.g. J. Locke
- Agenda: e.g. 1. Review last sprint; 2. Review next sprint; 3. New requests; 4. Task assignment
- Minutes: e.g. 1. Review last sprint: Model X training completed and ready to deploy
- Actions: e.g. JL to follow up with engineering on model X deployment

Monitoring

Ongoing monitoring of products is essential. This monitoring incorporates three aspects:

- The Data Science Team must monitor the performance and mid-term (statistical) impact of model changes.
- The Software Engineering Team must monitor the current health of deployed models (immediate impact).
- The Product Team must monitor the impact on the user and the performance of the product, via KPIs.

Monitoring should be real-time and automated, providing a clear view of the performance of the product. When using A/B testing techniques, monitoring processes will need to be able to distinguish the different test variants (see the sketch at the end of this section). Processes must be in place to detect unexpected aberrations quickly, allowing for fast roll-backs when necessary.

Dashboards

Most organisations now have dedicated Business Intelligence Teams and processes. You must integrate with these as soon as possible. The use of shared dashboards provides three things:

- The ability to easily monitor KPIs on an ongoing basis.
- The ability to easily share business performance with stakeholders.
- A common standard for the measurement of KPIs, which allows for direct comparability.

Exercise caution when starting any new product. Start with a wide range of KPIs and user metrics to ensure you build up a complete picture of user behaviour. Then narrow your focus to one or more key metrics or one North Star metric.
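Picking up the earlier point about distinguishing test variants: the sketch below compares a conversion KPI between two A/B arms with a two-proportion z-test from statsmodels. The counts are illustrative assumptions.

```python
# Sketch of distinguishing A/B variants when monitoring a conversion KPI.
# Counts are illustrative; proportions_ztest is from statsmodels.
from statsmodels.stats.proportion import proportions_ztest

conversions = [130, 162]   # [control, treatment]
exposures = [1200, 1180]   # users shown each variant

stat, p_value = proportions_ztest(conversions, exposures)
print(f"p-value: {p_value:.3f}")  # a low p-value suggests a real difference
```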
Reports

Dividing a product approach into iterative steps will not create transparency automatically. It is necessary to communicate product progress and status clearly and regularly to stakeholders, in ways that are aligned with the working practices of those stakeholders. Two reports are of particular importance, covered here:

- Reporting to the Product Leadership Team: Regular, weekly reports should be shared across the team to ensure all team leaders have a clear and detailed view of the current state.
- Reporting to the Business Leadership Team and other senior stakeholders: Regular but less frequent summary reports, showing evidenced progress towards the product strategy.

Product Leadership Reporting

Each team must provide a weekly report to the Product Manager in the 24 hours preceding the weekly Product Meeting. The Product Manager will aggregate these reports to provide a weekly summary, sent out as part of the materials for the Product Meeting before the meeting is held. Reports should detail progress against sprint goals, performance metrics, and any new and relevant qualitative insights from work completed. As this is a weekly report, it is useful to take advantage of automated reporting tools from any ticketing or tracking software in use. Qualitative statements should be short.

Business Leadership Reporting

Good quality reporting to the Business Leadership Team is essential to ensure transparency and maintain trust. Reporting should be sent consistently and based on the needs of your senior stakeholders. In most cases, weekly progress is too nuanced to be of value, and a monthly reporting cycle is better, but this should be discussed and agreed at the start of any new project. Reporting should make use of the reports from the Product Meeting, and again automation will assist in keeping to a regular cycle. Reports should lead with KPIs and their movement from month to month, then provide more detail on modelling and other technical work which has been delivered to production. Work planned for the following month can also be detailed; again, this should be in terms of the things that will be introduced into production over the following four weeks. Summary qualitative statements can be included at the beginning of your reporting and should be concise and generally limited to the top two or three points of interest.

Object Handovers

In the lifecycle of a data product, objects will frequently need to be passed from team to team. These objects might include trained models going from the Data Science Team to the Engineering Team for deployment, or table schemas being passed from the Data Warehousing Team to the Data Science Team, and so on. Often these objects will lie on the critical path of delivery for a phase of work, so the handovers must be smooth and efficient. All handovers should be documented in both the company-wide sprint plan and in individual sprint plans. The expected handover date should be recorded and tracked by the teams, and by the Product Manager. You should maintain clear and repeatable procedures for handovers, negotiated between teams, to ensure the smooth running of a project. In many cases, tooling to automate handovers can be introduced. The automation of handovers helps to ensure the smooth running of projects and should be pursued wherever possible.

Example

Updates to data schemas can be monitored by attaching notifications to code repositories. Any activity on these can then send alerts to Slack channels monitored by the subscribing team, for example along the lines of the sketch below.
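As a hypothetical illustration of such an alert, the sketch below posts a handover notification to a Slack incoming webhook. The webhook URL is a placeholder that your Slack workspace configuration would supply, and the message wording is an assumption.

```python
# Hypothetical handover notification posted to a Slack incoming webhook.
# The webhook URL is a placeholder supplied by your Slack workspace.
import json
import urllib.request

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def notify_schema_change(table_name: str) -> None:
    payload = {"text": f"Schema for `{table_name}` changed; please review."}
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```

In practice this would be triggered from the repository's notification hooks rather than called by hand.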
User Interview Questions

We include below a range of user interview questions which can serve as a starting point for your user research. Over time, you may adapt these to your interview style and product focus. However, you should always keep in mind the principle of asking open questions, based around the user and the job they are trying to do rather than the things they are doing within a product.

Starting Out

Early in the product lifecycle, and periodically after that, you need to build up a clear picture of the underlying job the client is trying to get done. Note that this is different from what they are using the product to do; often that is only a part of their overall job. You must try to build up a clear picture of the criteria by which they judge the job as being completed.

Example

For example, Salesforce is a B2B Customer Relationship Management tool, used by a salesperson to record their interactions with sales prospects. If you ask someone what they are trying to do with Salesforce, they will likely talk about the process of storing notes and sharing progress on a sale with their manager. However, if you were to ask the same person what job they are trying to do, they would tell you that they are trying to sell a product. The most valuable improvements to a tool like Salesforce will be those that improve job performance, in this case selling more products. Making it easier to store and share notes and progress reports will likely be much less valuable.

Why are you using that product? What job are you trying to get done?

This question is about discovering and understanding the job the customer is doing in the customer's context. The question is job-centric rather than product-centric. For more information on asking job-centric questions, search online for the jobs-to-be-done framework, e.g. Anthony W. Ulwick's Jobs to Be Done.

What is most important to you when completing that job?

This is the first question to ask. Your follow-up questions will dive into more detail on the things that are important to the interviewee about completing that job. Your aim in these questions is to construct a series of desired-outcome statements: understanding the conditions by which the job is judged to be more successful.

What things are you currently finding most challenging about that job?

Once you have a clear understanding of the job the customer is doing, and how the customer views that job as being more or less successful, you can explore the aspects that pose a particular challenge. It's important to interview widely, as different customers will, of course, have different perspectives on this question.

After Product Launch

Once a product is launched, questions will explore the relative success or failure of the product. They seek to validate further desired outcomes, the extent to which they have been met or improved upon, and whether that improvement has proved beneficial. Here, questions will be more specific to your product, but most will follow a similar framework.

Did you successfully complete the job you were trying to do?

Here, the question should be re-phrased to be specific to the job you previously identified. You should verify whether a job was completed successfully with your product. It may also be useful to understand if there was anything different about the conditions for the specific job that was attempted, and whether they were normal or abnormal.
Example

Taking our previous example, if you have created a new CRM tool, you might ask whether the user was able to sell a product with it successfully. If the user is experimenting, they may have used the CRM tool with a prospect they felt they had little chance of selling to anyway. It is important to understand whether this is the case before assessing how well the product performed.

Did the product help you to perform the job?

This question should also be phrased in terms of the specific job you identified. A general, open question will help to solicit more feedback on the overall job before becoming more specific.

What things were more difficult compared to how you performed the job previously?

This question compares the before and after of the product. It provides insight into anything important that may not yet have been delivered, and will help to inform future feature iterations.

Was the performance of any aspect of the job worse compared to not using the product?

Here you are looking to uncover any unforeseen consequences of a change in how a job is being performed. Improving one aspect of a job may negatively impact another, related aspect.

What did you find most challenging when trying to use the product?

Environmental questions about a product are important for ensuring adoption. Sometimes simple changes in workflow, such as the order in which information is entered in a form, can significantly improve product adoption rates.

What is most important to you to improve in the near future?

This question again should be asked about the job, not the product. It seeks to find out what is top-of-mind for the interviewee and to explore how they value the product being used.

Supplementary Questions

Some other questions can be particularly useful in different contexts.

Why is that important?

"Why?" is a powerful question, and you should ask it often. Understanding the why of a problem moves you from a symptom to a root cause and allows for better problem-solving. For more information, see Toyota's Five Whys.

What have you done to try to meet this need so far?

This question is particularly important when analysing business problems in greater depth, because it helps to assess value more accurately. If someone has done nothing to solve a problem, the solution is probably not very valuable. Conversely, if the interviewee has invested a significant amount of time trying to solve it, that is a stronger indication of value.

Why haven't you built a tool to do this already?

Similar to the previous question, this one seeks to assess the value of solving the problem, and it is particularly useful in B2B organisations that are large enough to solve problems using internal resources. If a budget has been allocated to find a way to solve a problem, that is a direct assessment of value which can be used to benchmark the problem.

Glossary of Terms

ARR: Annual Recurring Revenue
BAU: Business-As-Usual
BOUNDED CONTEXT: A principle that divides complex organisations into separate units and defines communication interfaces between them. Two teams responsible for two contexts only communicate with each other if there is an interface defined between them. See also: Martin Fowler.
CAC: Customer Acquisition Cost
CANARY TESTING: A technique to test a new model in production by directing a tiny proportion of the whole traffic to it.
CORE ITERATION: The main process of changing a model, in which the Core Product Teams create data to be labelled, label it, and train and evaluate a new model.
CORE PRODUCT TEAMS: The Data Science, Domain and Data Acquisition Teams collectively; together they perform the Core Iteration.
DAU: Daily Active Users
EBITDA: Earnings before interest, taxes, depreciation, and amortization
ETL/ELT: Extract-Transform-Load or Extract-Load-Transform; standard terminology for converting raw telemetry into structured data for Business Intelligence.
FEATURE ITERATION: The process of adding a new feature to the ML product. The Product Management Team requests a new feature and the Core Product Teams start one or more Core Iterations to achieve it.
FUNCTIONAL TEAM: The Engineering, Data Warehouse and Business Intelligence Teams together. While most of the work is done by the Product Team, Functional Teams must take part in planning and creating the end-to-end product. The Product Team ensures that they are involved in the process and contribute in their area of expertise, without experiencing too much friction.
HITL: Human-in-the-loop; a process where inputs are labelled by machine learning when the model can decide with high confidence, and by a real person when it cannot.
KPI: Key Performance Indicator
MAU: Monthly Active Users
ML: Machine Learning
MLPM: Machine Learning Product Management
NPS: Net Promoter Score
OKR: Objectives and Key Results
PRODUCT CONCEPT: We use Product Concepts to describe early ideas for different features to prioritise, written in a structured way. These ideas might come from customer interviews in which a customer expresses a desire for a particular feature, from a team member who observes some friction in a user journey, or from somewhere else in the organisation. Users of productboard (https://www.productboard.com/) will be familiar with these as Insights; users of Aha! (https://www.aha.io/) refer to these as Ideas; other platforms use different terminology.
PRODUCT TEAMS: The Product Management Team and the three Core Product Teams together; together they perform the Feature Iteration.
SEPARATION OF CONCERNS: A design principle for establishing a well-organised system in which each part fulfils a distinct and meaningful role.

Citing

Reference to cite when you use this book in a research paper:

@misc{2020MLPM,
  author = {Sragner, Laszlo and Kelly, Christopher},
  title  = {Machine Learning Product Manual},
  url    = {http://machinelearningproductmanual.com/},
  year   = {2020}
}

alternatively:

Sragner, L. and Kelly, C., 2020. Machine Learning Product Manual. [ebook] London. Available at: <http://machinelearningproductmanual.com/>.

Diagrams of the book are available in SVG format upon request; please contact the authors at mlpm@hypergolic.co.uk.

Changes

Version 0.0.5: Replaced Mermaid charts with Lucidchart SVGs; fixed the document so that Adobe Acrobat Reader can open it.
Version 0.0.4: Executive Summary and Benefits; all content finalised.
Version 0.0.3: All content reviewed.
Version 0.0.2: All content written.
Version 0.0.1: Initial version.

License and Terms

© 2020-2021, Hypergolic Ltd, Some Rights Reserved

Except where otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

You are free:
- to Share: copy, distribute and transmit the work
- to Remix: adapt the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
The best way to do this is with a link to machinelearningproductmanual.com. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.