Salesforce Data Cloud Demystified
Eliot Harper

Table of Contents
- Data Cloud Fundamentals
- Data Ingestion
- Data Model
- Harmonization
- Identity Resolution
- Insights
- Segments
- Activation
- About Us

Copyright © 2023 CloudKettle Inc. All rights reserved. This publication is protected by copyright and permission must be obtained from the publisher prior to any reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying or otherwise. To obtain permission to use material from this book, please email hello@cloudkettle.com with your request. ISBN 978-0-6455971-2-7

Data Cloud Fundamentals

Salesforce Data Cloud is arguably the most significant platform to be released in the history of Salesforce. However, its journey to market has been more evolutionary than revolutionary. Salesforce started on this journey back in 2019, when they first announced Customer 360 at Salesforce Connections in June of that year. Then at Dreamforce in the same year, the platform was renamed Customer 360 Audiences, which was then renamed to Salesforce CDP in May 2021, then Genie at Dreamforce in 2022, before settling on Data Cloud in February 2023. It's been quite hard to keep up.

How Data Cloud Works

Data Cloud is a purpose-built platform that enables organizations to connect all their customer data, at scale. Whether that data resides in a mobile app, data lake, data warehouse or Salesforce platform, or is collected from user interactions on your website, Data Cloud provides the ability to ingest data from multiple sources, either as a batch process or in real time. It then harmonizes the data into a structured, canonical data model and applies reconciliation rules to unify individual records into a single profile that adapts to a customer's activity and behavior, in real time. You can then analyze, segment and activate this data, automating customer experiences across the complete suite of Salesforce Customer 360 products. Whether it's injecting a Contact into a Marketing Cloud journey, converting a Lead in Sales Cloud, or creating a case in Service Cloud, Data Cloud can fulfill marketing, sales and service automation use cases.

Some organizations assume that they don't need Data Cloud, as they already use data platforms. Perhaps they're already storing customer data in a data lake, and using a data warehouse for business intelligence activities. But it's important to understand that Data Cloud is actually neither of those platforms, and at the same time both of them.

Data Lake

A data lake provides a convenient repository to store data quickly, where you can deposit raw copies of structured, unstructured or semi-structured data files, without needing to perform data modeling at time of ingestion. But the problem with data lakes is they can quickly become data swamps, flooded with outdated and incomplete data. The net result is that it can be hard to extract data from data lakes, which aren't optimized for querying at scale.

Data Warehouse

On the other hand, a data warehouse, also referred to as an 'enterprise data warehouse' or 'EDW', provides a central data repository which is specifically optimized for analytics and reporting purposes. Data warehouses provide fast and efficient data analysis, while also enhancing data quality through data cleansing, data organization and data governance processes.
Data warehouses tend to be more performant than data lakes, but they can be more expensive and limited in their ability to scale. Additionally, they can form data silos, which means they are often incompatible with other data sets. That makes it hard for users in other parts of an organization to access and use the data.

Data Lakehouse

Data Cloud, however, is built on a data lakehouse. A data lakehouse is a data platform that merges the benefits of data lakes and data warehouses into a single data architecture, so data teams can accelerate their data processing as they no longer need to straddle two disparate data systems to complete and scale advanced analytics, like machine learning.

CRM

But what about a CRM? After all, isn't that a data platform? Well, yes, it stores customer data — but that's where the similarities end. CRMs are used for managing customer relationships and sales engagements, pipelines, customer interactions, business transactions and facilitating sales and service processes. And by design, a CRM is built for storing known customer data. If a customer is unknown, then they simply don't exist in the CRM. Also, traditionally, CRM platforms store data in a transactional database that's optimized for data integrity. But to use this data at scale, for tasks like analytics or machine learning, it's necessary to copy this data to another system to process it.

MDM

Data Cloud isn't a Master Data Management (or MDM) platform either. MDM platforms are enterprise software products that create and manage a central, persisted system of record for master data, through a semantic reconciliation process. While Data Cloud provides data normalization, it doesn't provide a golden record — rather, it creates a unified customer profile that changes and adapts based on an individual's activity.

System of Record

Data Cloud is not a substitute for a CRM, MDM or any other platform, and it's never the first touchpoint in a data lifecycle. You still need a platform (or platforms) that generate a system of record (or unique identifier) to register that first entry point for your data — whether it's an order, support case, or customer record. Once this identifier has been established, then your data can be ingested into Data Cloud, which in turn provides a fabric layer that orchestrates all your data from different sources. And unlike transactional data stores, records in Data Cloud are fluid and designed to provide a moment-in-time insight into an individual's profile, their intent and behavior — all of which can change at any time.

Data Cloud: Next Generation Architecture

Salesforce has built one of the first — and arguably the most successful — cloud CRM platforms of all time. At its core, Salesforce uses a transactional data store that follows a single logical operation sequence to provide atomicity of record operations. This approach ensures that the database can cancel, or undo, a transaction or operation that is not completed appropriately. While transactional databases provide a high degree of data integrity, the downside is that these databases are designed for processing transactions, not analysis or transformations. In short, they don't handle analytical processing or scale well. Additionally, Salesforce has been a serial acquirer for almost two decades. Platforms like ExactTarget (now Marketing Cloud Engagement), Pardot (Marketing Cloud Account Engagement) and Demandware (Commerce Cloud), to mention a few, all use different database architectures and platforms.
While these platforms have effectively been integrated into Sales and Service Cloud, data has to be replicated across platforms. Data Cloud addresses both of these challenges by decoupling data from existing Salesforce platforms and harmonizing it into a normalized, highly canonical data model, where users can run analysis and predictions across the enterprise on a highly scalable microservices architecture that handles thousands of requests per second — storing billions of profiles, while also providing a petabyte-scale analytics environment. Data Cloud is set to form the foundation of the next-generation cloud architecture for Salesforce, and to be a game changer for both Salesforce and its customers.

Data Ingestion

As its name implies, Salesforce Data Cloud is a data platform — before you can begin using the platform, you first need to get data into it. And while Data Cloud provides various options for importing data, it's important to select an optimal method to integrate data sources into the platform. Unfortunately, in data architecture, integrations are often implemented in a poorly considered way, which can compromise data accessibility, quality and scalability. This chapter describes the different ingestion methods for importing data into Data Cloud and considerations for designing robust data pipelines.

Web Integration

The Salesforce Interactions SDK (or Software Development Kit) enables developers to capture interactions for both known and pseudonymous website visitors and store these events in Data Cloud. In turn, this data can be used to build behavioral profiles and audience segmentation. The SDK is implemented by a client-side script on the source website that provides various methods to send event payloads to Data Cloud. The following data types can be captured in Data Cloud using the SDK:

- Profile data for an individual, like user identity, phone and email
- eCommerce data, including cart and cart items, orders and product catalog entries
- Consent tracking to capture when a user provides consent (the SDK will only send events if a customer has consented to tracking)

While the Interactions SDK was initially created for Marketing Cloud Personalization to track first-party cookie website data using a JavaScript beacon, the SDK provides a broader, extensible data capture and collection framework that includes product-specific modules for both Marketing Cloud Personalization and Data Cloud. Marketing Cloud Personalization is not required in order to capture website behavior in Data Cloud using the SDK.

Engagement Mobile SDK

This SDK was formerly known as the Marketing Cloud Mobile SDK, which is used to send in-app messaging, inbox (push) messaging and location-based messaging to mobile apps using Marketing Cloud MobilePush. Similar to the Interactions SDK, the Engagement Mobile SDK has been extended to include a Data Cloud module that enables profile, eCommerce and consent data events (supported by the Interactions SDK) to be tracked in Data Cloud, in addition to mobile messaging and app-related behavioral events. The Engagement Mobile SDK is available for both iOS and Android mobile platforms and can be used without requiring MobilePush integration.

Salesforce Connectors

Data Cloud includes a set of connectors that enables data from Salesforce products to be ingested into Data Cloud using a configurable interface, without requiring custom development.
Platform connectors include:

- B2C Commerce Connector for importing product and customer order data
- Marketing Cloud Connector for importing engagement data for email, SMS and mobile push events — additionally, you can import up to 20 data extensions per account (across all business units)
- Marketing Cloud Personalization Connector for importing website events, including user profiles, behavioral data (like page views) and order data — up to 5 Personalization datasets can be imported
- Salesforce CRM Connector for importing records from standard and custom objects from one or more Sales and Service Cloud orgs

All connectors include 'bundles', which are sets of predefined data sets from the source platform that align with common use cases. Bundles not only determine source data sets and fields, but also automatically map source fields to the respective standard Data Model Objects (DMOs) and fields in Data Cloud. These predefined data mappings can also be customized.

MuleSoft Anypoint Connector

MuleSoft Anypoint is an integration platform that enables API integration across different databases, SaaS platforms, storage resources, and network services, through hundreds of pre-built connectors. The MuleSoft Anypoint Connector for Data Cloud enables data to be ingested from other Anypoint Connectors, either as a streaming or bulk process. Additionally, the Connector can be used to publish insights from Data Cloud to upstream platforms.

Cloud Storage Connectors

Data Cloud supports bulk importing and exporting data from and to popular object storage services, including:

- Amazon S3
- Microsoft Azure Storage
- Google Cloud Storage

These connectors are well suited to batch ingestion of voluminous datasets, as data files can be up to 200GB in size, with a maximum of 1,000 batch files for each scheduled run. Storage connectors provide a simple and convenient method for transferring data to Data Cloud on a scheduled basis, particularly for organizations that already run platforms and manage their data on these popular cloud computing services.

SFTP

Secure File Transfer Protocol (or SFTP) is an industry-standard network protocol for securely transferring large data files. Data Cloud can import CSV files from SFTP servers and supports files up to 4.5GB in size in a single data stream. The vast majority of enterprise platforms support exporting CSV files, which, when used in combination with a file transfer process or platform, makes SFTP a ubiquitous method for bulk importing data into Data Cloud.

Ingestion API

While there are various "out-of-the-box" connectors that enable declarative-style integration with Data Cloud without requiring any custom development, there are scenarios where data needs to be loaded into the platform programmatically, either in near real-time or as a batch process. The Data Cloud Ingestion API fulfills both requirements by supporting both streaming and bulk data imports. Using the Streaming API, developers can build a JSON-formatted payload that aligns to the data schema defined in a deployed data stream. This API follows a "fire and forget" approach, where a response is immediately returned and the imported data is processed asynchronously by the platform in near real-time, approximately every 3 minutes. This API is best suited for small batches of records (not exceeding 200KB).
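To make that flow concrete, the following sketch posts a small JSON payload to the Streaming Ingestion API using Python. The tenant endpoint, connector name (website_events), object name (order_event), field schema and token handling are illustrative assumptions rather than a prescribed configuration; actual values depend on how the data stream is deployed in your org.

```python
import requests

# Illustrative values only: the tenant endpoint, connector and object names,
# and the field schema all depend on your own Data Cloud configuration.
TENANT_ENDPOINT = "https://<your-tenant>.c360a.salesforce.com"  # hypothetical
ACCESS_TOKEN = "<token obtained via the OAuth flow>"            # hypothetical

def stream_order_event(order_id: str, email: str, amount: float) -> None:
    """Send a single record to a (hypothetical) deployed 'order_event' data stream."""
    payload = {
        "data": [
            {
                "order_id": order_id,
                "email": email,
                "order_total": amount,
                "event_time": "2023-09-01T10:15:00Z",
            }
        ]
    }
    response = requests.post(
        f"{TENANT_ENDPOINT}/api/v1/ingest/sources/website_events/order_event",
        json=payload,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    # 'Fire and forget': an accepted response only confirms the payload was queued;
    # the platform processes it asynchronously in near real-time.
    response.raise_for_status()

if __name__ == "__main__":
    stream_order_event("ORD-1001", "james@example.com", 129.95)
```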
Use cases for the Streaming API include:

- Visitors signing up on a website, triggering a database change
- An order fulfillment platform, where an order or shipment status changes
- A website chatbot conversation initiated by a website visitor
- Hotel or travel purchases completed on an online booking platform

The Bulk Ingestion API allows large data sets to be created, updated or deleted in Data Cloud, where CSV files with a file size of up to 150MB (and up to 100 files per job) can be imported. This API follows a similar multi-step process to the Salesforce Bulk API, where a job is first programmatically created, then CSV data is uploaded to the job, then the job is closed and the uploaded data is enqueued for processing. This API is best suited for transferring large amounts of data at a regular interval, for example, daily or weekly. Possible use cases include:

- Daily customer transactional data from a financial services provider
- Point-of-sale data from in-store customer transactions
- Customer loyalty status or points balances from a loyalty management system
- Subscriber engagement data from a third-party messaging platform

Pipeline Considerations

Identifying an appropriate connector, protocol, SDK or API is just the first stage in designing an integration to Data Cloud. How that data is then prepared and transferred for import — referred to as a 'data pipeline' — is equally important, as poor pipeline architecture can undermine the integration and, worse still, the integrity of your data.

Anti-patterns often surface in pipeline architectures. An anti-pattern is similar to a pattern, but while it may appear to be a working solution, it's the complete opposite of best practice. Anti-patterns typically arise when integration is done without any planning, design, or documentation. For example, a Data Cloud user may configure the Amazon S3 Connector to import membership data from an S3 bucket. Data is exported from a source system to the S3 bucket, but the user is unaware of how long the data export process takes and there is no validation of the exported data. The data stream runs on a predefined schedule, before the data has finished copying to the bucket — and even when the data file is available, data fails to import because required fields are missing.

When building data pipelines for Data Cloud, quality is key. It is recommended to establish processes that validate required fields and data schemas prior to file import, then report on exceptions so they can be remediated (a minimal validation sketch follows at the end of this chapter). Additionally, the platform has Limits and Guidelines for ingesting data. Ensure that data file properties and operations fall within these defined thresholds. Also, monitor the Data Stream Refresh History for errors. There are several scenarios where a data stream refresh may fail, and the Refresh History page can be used to identify and troubleshoot errors as they occur.
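As a sketch of that kind of pre-flight check, the snippet below verifies that an export file contains the columns a data stream expects and that required fields are populated before the file is uploaded. The file name and the required-column list are illustrative assumptions, not platform requirements.

```python
import csv

# Hypothetical schema for a 'membership' data stream — adjust to your own mapping.
REQUIRED_COLUMNS = {"member_id", "email", "joined_date"}

def validate_export(path: str) -> list[str]:
    """Return a list of problems found in the export file; an empty list means it is safe to upload."""
    problems: list[str] = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing column(s): {', '.join(sorted(missing))}"]
        for line_number, row in enumerate(reader, start=2):
            empty = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
            if empty:
                problems.append(f"row {line_number}: empty required field(s) {empty}")
    return problems

if __name__ == "__main__":
    issues = validate_export("membership_export.csv")
    if issues:
        # Report exceptions so they can be remediated before the scheduled refresh runs.
        print("\n".join(issues))
    else:
        print("Export passed validation and is ready to upload.")
```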
Data Model

While many Salesforce products, including Sales Cloud, Service Cloud, Education Cloud, Health Cloud and other Salesforce industry clouds, are built on a common 'core' platform that shares the same datastore (an Oracle relational database), Data Cloud uses a very different architecture and technology stack from other 'Clouds' in the Salesforce product line. This chapter explains the data models and related concepts used by the platform.

Starting with the storage layer, Data Cloud includes multiple services, including DynamoDB for hot storage (so data can be supplied fast), Amazon S3 for cold storage, and a SQL metadata store for indexing all metadata. As a result, Data Cloud can provide a petabyte-scale data store, which breaks the scalability and performance constraints associated with relational databases.

The physical architecture in Data Cloud is represented as a set of data objects, and understanding these is key, as they underpin how data is ingested, harmonized and activated in the platform.

Data Flow Objects and Phases

Data Source

A Data Source is the initial data layer used by Data Cloud. A Data Source represents a platform or system where your data originates from, outside of Data Cloud. These sources can be:

- Salesforce platforms, including Sales Cloud, Commerce Cloud, Marketing Cloud and Marketing Cloud Personalization
- Object storage platforms, including Amazon S3, Microsoft Azure Storage and Google Cloud Storage
- Ingestion APIs and Connector SDKs to programmatically load data from websites, mobile apps and other systems
- SFTP for file-based transfer

Data Stream

A Data Stream is an entity which can be extracted from a Data Source, like 'Orders' from Commerce Cloud, 'Contacts' from Sales Cloud, or 'Subscribers' from Marketing Cloud. Once a Data Source is connected to Data Cloud, Data Streams provide paths to the respective entity. As a result, a Data Source can contain one or more Data Streams.

Data Source Object

A Data Stream is ingested to a Data Source Object or 'DSO'. This object provides a physical, temporary staging data store that contains the data in the raw, native file format of the Data Stream (for example, a CSV file). Formulas can be applied to perform minor transformations on fields at time of data ingestion.

Data Lake Object

The next data object in the data flow is the Data Lake Object or 'DLO'. The DLO is the first object that is available for inspection and enables users to prepare their data by mapping fields and applying additional transformations. Similar to the DSO, this object also provides a physical store, and it forms the product of a DSO (and any transformations). DLOs are storage containers that reside in the data lake (Amazon S3), generally as Apache Parquet files — an open-source, column-oriented file format designed for efficient data storage and retrieval. On top of this, Apache Iceberg provides an abstraction layer between the physical data files and their table representation. The adoption of these industry-standard formats is worth noting, as they are widely supported by other cloud computing providers and, as a result, enable external platforms such as Snowflake to integrate with Data Cloud in a zero-copy architecture.
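The column-oriented layout is what makes Parquet efficient for analytical reads: a query can pull only the columns it needs instead of scanning whole rows. The short sketch below illustrates this with the pyarrow library on a toy table; it is a general illustration of the file format, not of Data Cloud's internal storage.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A toy table standing in for a DLO-like dataset.
table = pa.table({
    "individual_id": ["I-1", "I-2", "I-3"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "lifetime_value": [120.50, 87.00, 310.25],
})

# Write the data in Parquet's column-oriented format.
pq.write_table(table, "individuals.parquet")

# Read back only the columns a query needs — other columns are never scanned.
subset = pq.read_table("individuals.parquet", columns=["individual_id", "lifetime_value"])
print(subset.to_pydict())
```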
Data Model Object

Unlike DSOs and DLOs, which use a physical data store, a Data Model Object, or 'DMO', provides a virtual, non-materialized view into the data lake. The result from running a query associated with a view is not stored anywhere and is always based on the current data snapshot in the DLOs. Attributes within a DMO can be created from different Data Streams, Calculated Insights and other sources.

Similar to Salesforce objects, DMOs provide a canonical data model with predefined attributes, which are presented as standard objects, but custom DMOs can also be created (referred to as custom objects). And similar to Salesforce objects, DMOs can also have a standard or custom relationship to other DMOs, which can be structured as a one-to-one or many-to-one relationship.

There are currently 89 standard DMOs in Data Cloud. DMOs are organized into different Data Object subject areas, including:

- Case for service and support cases
- Engagement for engagement with an Individual, like email engagement activity (send, open, click)
- Loyalty for managing reward and recognition programs
- Party for representing attributes related to an individual, like contact or account information
- Privacy for tracking data privacy and consent preferences for an Individual
- Product for defining attributes related to products and services (or goods)
- Sales Order for defining past and forecast sales by product

For example, the Sales Order subject area uses the following DMOs:

- Sales Order for information about current and pending sales orders
- Sales Order Product for attributes related to a specific product or service
- Sales Store representing a retailer
- Order Delivery Method to define different order and delivery methods for fulfillment
- Opportunity to represent a sale that is in progress
- Opportunity Product to connect an opportunity to the product (or products) that it represents

Sales Order Subject Area in Data Cloud (source: architect.salesforce.com)

Data Spaces

Data Spaces provide logical partitions and a method of separating data between different brands, regions or departments, limiting which data users can access, without needing multiple Data Cloud instances. Additionally, Data Spaces can be used to align with a Software Development Lifecycle (SDLC), where you can stage and test Data Objects in a separate environment, without impacting production data. Data Sources, Data Streams and DLOs can be made accessible across Data Spaces, while DMOs and other platform features are isolated to users based on permission sets.

Data Spaces in Data Cloud (source: help.salesforce.com)

Conclusion

Due in part to technology advancements and the drop in storage costs, companies now have gargantuan datasets at their disposal. Every time a customer makes a purchase, opens an email, or even simply views a web page, these engagement events can be captured and stored, and if this data is organized properly it enables you to understand your customers, predict their needs, personalize interactions, and much more.

Data Cloud is set to form the backbone of the Salesforce platform to support the data needs of its customers and product line through the next two decades. But like other SaaS vendors, a core challenge for Salesforce is the success of its original platform, created more than 20 years ago, which was designed for a different era and runs on a competitor's platform. One of the main challenges that Salesforce has faced with its database and platform architecture is that it doesn't handle voluminous or "big" data well. Data Cloud addresses the constraints of that core architecture through a well-considered design that overcomes the limitations associated with relational databases.
Harmonization

Data Cloud accepts data from a variety of sources. Harmonization refers to the process of transforming (or modeling) different data sources into one. Once data has been unified (or harmonized) into a standardized data model, it can then be used for insights, segmentation and activation.

Data Dictionary

The first step in data harmonization is to map the source data. Poorly considered choices at this stage can result in inconsistencies and, worse still, incorrect data in downstream processes. In order to mitigate this risk, it is recommended to create a data dictionary for each data source prior to implementing data mapping. A data dictionary defines the data entities, attributes, context, and allowable values. Creating a dictionary for each data source not only enables a data mapping specification to be determined, but will also identify common attributes across data sources and how (or if) they relate to each other. Once data dictionaries have been defined, the data can be mapped.

DMO Concepts

In Data Cloud, mapping occurs between the DLO and DMO. It's important to stress that Data Cloud is not a 'bring your own' data model. For example, unlike Marketing Cloud, which allows users to create their own schemas and relationships in Data Extensions, Data Cloud uses a highly canonical data model consisting of 89 (and counting) pre-defined models, or 'DMOs', to accommodate the vast majority of platform use cases. Additionally, as previously mentioned, standard DMOs can be extended by adding custom fields and objects to create a hybrid model approach. However, it is recommended to identify applicable standard DMOs and fields where possible, and only extend the model with custom fields and objects where necessary.

While several Data Cloud DMOs may appear to be similar to Salesforce standard objects — for example Individual, Account, and Case — these objects are not the same. They may semantically express the same item, but they are conceptually very different.

Field Mapping

Mapping DLO fields to DMO fields in Data Cloud is a straightforward process and is achieved through an intuitive interface. As discussed in the previous chapter, when creating a data stream you must select a category, or type of data, found in that data stream: either Profile, Engagement, or Other. Correct category assignment is important, as it can't be changed after the data stream is created. Categories are also assigned to DMOs and are inherited from the first DLO mapped to them, and a DMO will only allow mapping from DLOs of the category that was first assigned to it. While categories are generally chosen by the user, there are some exceptions; for example, the Individual DMO will always use the Profile category. Additionally, data sources from Salesforce Connectors enforce default mappings which can't be changed.

Mapping fields in Data Cloud (source: Salesforce Trailhead)

The result of field mapping is a series of DMOs, each representing a semantic object derived from the physical objects (DLOs) and providing a virtual, non-materialized view into the data lake. It is also important to note that DLO to DMO field mapping is a one-to-one relationship — that is, you can't have multiple email addresses or mobile phone numbers for a single individual. For such requirements, it is necessary to split the record into two or more records, for example each sharing the same Salesforce Contact Id, as sketched below.
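A minimal sketch of that splitting approach: a source row carrying both a work and a personal email address is fanned out into two rows that share the same Contact Id, so each row maps cleanly onto single-value DMO fields. The field names are illustrative assumptions, not a prescribed schema.

```python
# A source record with two email addresses — more than a one-to-one mapping allows.
source_record = {
    "contact_id": "003XXXXXXXXXXXX",
    "first_name": "James",
    "work_email": "james@company.example",
    "personal_email": "james@home.example",
}

def split_contact_points(record: dict) -> list[dict]:
    """Fan a multi-email record out into one row per contact point, keyed on the same Contact Id."""
    rows = []
    for source_field in ("work_email", "personal_email"):
        email = record.get(source_field)
        if email:
            rows.append({
                "contact_id": record["contact_id"],   # shared key links the rows back together
                "first_name": record["first_name"],
                "email": email,
            })
    return rows

for row in split_contact_points(source_record):
    print(row)
```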
Fully Qualified Keys

An additional platform concept to understand is the Fully Qualified Key, or 'FQK', which avoids key conflicts when data from different sources is harmonized. An FQK is a composition of a source-provided key and a key qualifier. For example, if an event registration DLO and a CRM DLO are both mapped to the Individual DMO, FQKs effectively enable queries to be grouped by data source, allowing users to accurately identify, target, and analyze records.

Summary

DMOs are the result of harmonizing multiple data sources into unified views of the data lake. Arguably the most important step in platform design is to make well-considered choices at the harmonization stage, as wrong category assignment, inconsistent mapping and incorrect DMO selection can quickly result in technical debt that requires significant effort to remediate. Applying a methodical approach to data mapping, investing in authoring a comprehensive data dictionary, and making informed decisions at this implementation stage will help ensure that harmonized data from different data sources is accurately interpreted.

Identity Resolution

While Data Cloud can ingest profile data from different types of data sources, a key capability of the platform is establishing a single representation from different datasets — unifying multiple profile data points into a single representation of an individual. Through a combination of deterministic and probabilistic matching rules, Data Cloud is able to fulfill this process at scale, efficiently processing millions of records and deriving a unified profile for each individual. This chapter explains the core identity resolution concepts used by the platform, and considerations when creating rules to resolve unified profiles.

Same Person, Different Data

Platforms and systems vary in their approach to storing profile information for an individual. A digital marketing platform might (at a minimum) hold a first name and an email address for a customer, a CRM may hold a more complete dataset including a mailing address and phone number, while other platforms like an event registration or commerce platform may store additional profile attributes. Furthermore, some customers may have multiple (duplicate) profiles in data sources — for example, past orders in a commerce platform placed by the same customer under different email addresses.

Unified Individual

Identity resolution is a multi-step process. The first step is unifying the different data points to establish a unified individual. Data Cloud achieves this through an Individual Identity Link DMO that connects all profile data points to a unified individual record. The important concept to understand is that, unlike many Master Data Management (MDM) platforms, which provide a 'golden record' by aggregating profile attributes into a single record for an individual, Data Cloud establishes relationships from one or more individual records to a unified individual. As a result, lineage to the source data is retained, enabling tracking of historical data flow over time, so details of where the data originated, how it has changed, and its ultimate destination are completely preserved. For each unified individual record, there can be one or more Individuals, but there is always at least one unified individual record for every individual, even if no matches are returned.
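To picture the link model, the sketch below represents a hypothetical identity link table in plain Python: each unified individual points to the source records it was resolved from, so lineage back to each data source is preserved rather than collapsed into a single merged record.

```python
# Source individual records, each retaining its originating system (lineage preserved).
source_individuals = [
    {"source": "CRM",       "source_id": "003-001", "first_name": "James", "email": "james@home.example"},
    {"source": "Commerce",  "source_id": "CUST-88", "first_name": "Jim",   "email": "james@home.example"},
    {"source": "Marketing", "source_id": "SUB-431", "first_name": "James", "email": "j.smith@work.example"},
]

# An illustrative stand-in for the Individual Identity Link DMO:
# unified individual id -> the (source, source id) pairs resolved to it.
identity_links = {
    "UNIFIED-1": [(rec["source"], rec["source_id"]) for rec in source_individuals],
}

for unified_id, links in identity_links.items():
    print(unified_id, "is linked to:", links)
```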
Unified Profile

A unified profile is the result of a match rule — unified profiles are created on activation of match rules. This profile type represents attributes related to a unified individual and has a mutable identifier — that is, the representation of an individual can change at any time, based on in-the-moment data. For example, a unified profile could be based on three records belonging to the same person, where one record is from a CRM platform and two are based on purchase history from a commerce platform, like "James purchased two orders; a backpack and lipstick". However, if these related data points change in the data source — for example, "James just updated the shipping information for his lipstick order to his partner's address" — then we now know that James doesn't use lipstick and that transaction relates to a different individual. The result is a pliable profile model that is constantly resolving identities to link different data points together, providing a 'best intent' representation of what an individual is determined to be at a point in time, based on the data that's currently available.

Match Rules

Match rules in Data Cloud determine how the identity resolution process should identify matching records. Rules consist of one or more match criteria; once all criteria within a rule are met, a profile is matched. Match rules don't apply any gradation, scoring, weighting or hierarchy. Each rule is evaluated independently, and if any rule returns a match, the records are considered a match.

Note that person accounts from Salesforce CRM can't be used in identity resolution, as they contain a mixture of account and contact fields that don't correspond to either business accounts or individuals.

Three different types of match rules are available: exact, exact normalized, and fuzzy.

The exact match method uses a string-to-string comparison to return a deterministic match based on a case-insensitive value. For example, the source data values 'McArthur', 'Mcarthur' and 'MCARTHUR' will all be considered an exact match.

The exact normalized match method is available for certain fields in DMOs. This method normalizes field values based on their type and applies a deterministic match for the following fields:

- Email address: removes trailing space characters and non-alphanumeric delimiter characters, like quotation marks and angle brackets
- Phone: removes white space and other non-alphanumeric characters, and parses and validates phone numbers based on country code
- Address: standardizes based on country-specific rules for addresses; for example, an address line of "2201 Bruce Ave N" and "2201 Bruce Avenue North" would be determined to match

The fuzzy match method is a probabilistic match that finds strings that partially, but not exactly, match a given string. This method is commonly used in search engines and is used in the platform to match the first name field only. It uses the Bidirectional Encoder Representations from Transformers (or 'BERT') AI language model to match common misspellings, diacritical marks, synonyms, and other parameters. Different precision levels can be set for fuzzy match rules to provide granularity over match results:

- High Precision to match nicknames, punctuation, international abbreviations, international alphabet characters, and cross-cultural spellings. For example, 'Alexander' and 'Alex'.
- Medium Precision to match values with the same initials, gender variants, shuffled names, and similar subnames. For example, 'Eliot' and 'Eliott'.
- Low Precision to match values with loose similarities. For example, 'Liz' and 'Elizabeth'.
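As a conceptual illustration of the two deterministic rule types described above (not the fuzzy method, and not the platform's actual implementation), the sketch below compares two records using a case-insensitive exact match on last name and a normalized match on email, where normalization trims whitespace and strips delimiter characters before comparing.

```python
DELIMITERS = "<>'\""  # quotation marks and angle brackets, per the normalization described above

def exact_match(a: str, b: str) -> bool:
    """Case-insensitive string-to-string comparison."""
    return a.strip().lower() == b.strip().lower()

def normalize_email(value: str) -> str:
    """Illustrative normalization: trim whitespace and strip delimiter characters."""
    cleaned = value.strip()
    for character in DELIMITERS:
        cleaned = cleaned.replace(character, "")
    return cleaned.lower()

record_a = {"last_name": "McArthur", "email": '"james@home.example" '}
record_b = {"last_name": "MCARTHUR", "email": "<james@home.example>"}

print(exact_match(record_a["last_name"], record_b["last_name"]))                  # True
print(normalize_email(record_a["email"]) == normalize_email(record_b["email"]))  # True
```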
Reconciliation Rules

A reconciliation rule is used to determine the preferred value to use in a unified field when field values conflict, for fields that can't contain multiple values, like 'first name'. Reconciliation rules can be based on:

- Last Updated: the most recently updated record, based on the Last Modified Date field in the DSO
- Most Frequent: selects the most frequently occurring value
- Source Priority: sorts DLOs in a ranked order of most to least preferred

Reconciliation rules don't apply to contact points, like phone numbers or email addresses, as contact points remain part of a unified profile, so all contact points are available for activation.

Party Matching

Data Cloud uses the notion of a "party", which is an abstract entity that can refer to a subject area, the Party Identification DMO, or unique identifying fields in Data Cloud.

Party DMOs in Data Cloud (source: architect.salesforce.com)

In the context of identity resolution, this rule uses the Party Identification DMO to match a set of values to an Individual. This matching method uses the following attributes:

- Party Identification Id: a primary key
- Party: a foreign key match to the Individual Id field in the Individual DMO
- Party Identification Type: an optional descriptive identifier used to provide additional information about the identifier, for example "Email Subscriber"
- Identification Name: a second namespace of the party identification type that represents the value of the Party Identification Number, for example "Subscriber Key"
- Party Identification Number: the value that is used for matching purposes, for example, the Marketing Cloud Subscriber Key

A Party Matching Rule essentially matches on the Party Identification Type, Name and Number values. Conceptually, when these three values are concatenated together, records with identical values are considered to be a match.
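A minimal sketch of that concatenation idea, using invented values: records whose combined Type, Name and Number key is identical would be grouped together as a match. This illustrates the concept only; the platform's actual key handling is internal.

```python
from collections import defaultdict

# Invented Party Identification records for illustration.
party_identifications = [
    {"individual_id": "I-1", "type": "Email Subscriber", "name": "Subscriber Key", "number": "0012345"},
    {"individual_id": "I-2", "type": "Email Subscriber", "name": "Subscriber Key", "number": "0012345"},
    {"individual_id": "I-3", "type": "Email Subscriber", "name": "Subscriber Key", "number": "0099999"},
]

def party_match_key(record: dict) -> str:
    """Concatenate Type, Name and Number into a single comparison key."""
    return "|".join([record["type"], record["name"], record["number"]])

groups = defaultdict(list)
for record in party_identifications:
    groups[party_match_key(record)].append(record["individual_id"])

# Individuals sharing an identical key are treated as a match (I-1 and I-2 here).
for key, individuals in groups.items():
    print(key, "->", individuals)
```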
Implementation

Comprehensive testing is essential before enabling identity resolution rules, so rules can be validated for accuracy. It is recommended to identify different data point permutations from data streams and use sample datasets to test the resulting unified individual records. Data Cloud provides two separate identity graphs (or 'rulesets'), enabling rule changes to be staged, tested and compared against production records, as each ruleset provides a separate set of unified individuals. The maximum number of ruleset jobs that can be executed in any 24-hour period is four per data space, so a considered approach needs to be applied to implementing and testing rule changes, as free-form testing methods may result in the 24-hour threshold being reached.

Summary

While the concept of a 'unified profile' is a deviation from data models used in other Salesforce products and can take some time to comprehend, it provides many benefits to platform users, most notably the ability to preserve data lineage back to the data source. Another concept to understand is that unified profiles are fluid by design and don't create a 'golden record' or 'super record'; rather, they recognize that the representation of an individual can change at any time, providing a profile view based on in-the-moment data. Identity resolution is not a 'black box' — it's a complex process that requires careful consideration, configuration and testing to ensure an optimal resolution process and, ultimately, representation of an individual. Salesforce has clearly thought very carefully about identity resolution and addressed many shortcomings of similar platforms. It's one of many evolving platform features that we can expect to rapidly mature, as Salesforce continues to invest heavily in the product roadmap and enhancements.

Insights

Data Cloud includes insights features that enable harmonized and unified data to be augmented with multi-dimensional metrics — for example Lifetime Value (LTV), Net Promoter Score (NPS), Customer Satisfaction Score (CSAT), churn rate and lead score, to mention just a few. There are two approaches to creating insights in the platform: calculated insights and streaming insights. This chapter explains both features.

Calculated Insights

Calculated insights are derived from scheduled batch data and can be used to build insights from single, multi-dimensional or time-dimensional calculations, which in turn can be used across business intelligence, segmentation, activation and personalization use cases. This insight type is applied after data harmonization and unification, and enables the entire data model (all DMOs) to be queried, with a historical lookback period extending right back to data inception.

The primary use case for calculated insights is in segmentation and activation, as they enrich attributes in the data model, which can be used to determine applicable audience segments, then trigger and personalize messaging. Use cases for calculated insights include calculating:

- Recency Frequency Monetary (RFM)
- Lifetime Value (LTV)
- Customer Satisfaction Score (CSAT)

A further use case for calculated insights is to validate the data quality of identity resolution rulesets and identify outliers or anomalies. For example, calculated insights can determine:

- The number of matched contact points for each unified profile
- The consolidation rate of matched profiles for each data source
- The number of unique contact points for each data source

Calculated insights are well suited to providing a single calculated value which can be reused across the business, not only within Data Cloud and activation datasets, but also by external platforms using the Calculated Insights API — for example, to determine a customer loyalty status or points balance.

There are two approaches to creating calculated insights in the platform: writing SQL expressions, or declaratively through a builder tool in the platform user interface. SQL expressions are based on ANSI SQL syntax and include certain aggregates and functions for calculating measures (quantitative values), which include:

- count (COUNT)
- average (AVG)
- total (SUM)
- minimum (MIN)
- maximum (MAX)

Measures can be used together with dimensions (qualitative values), for example name, date and other attributes, to categorize measures. Note that only measures, not dimensions, can be activated in the platform; however, dimension filters can be used during activation. Calculated insights can also be run sequentially, where one calculated insight can be used as an input for the next (up to three), which is useful for reusing common logic across insights. Calculated insights are refreshed on a defined frequency (a minimum of one hour) and are computed by a Spark job running on Amazon EMR (a managed cluster platform for running big data frameworks).
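As an illustration of the SQL approach, the sketch below shows the shape of a simple lifetime value measure grouped by customer, together with a plain-Python equivalent of what the SUM aggregate computes. The table and field names are hypothetical placeholders, not real DMO API names, so treat this as the shape of a calculated insight rather than a ready-to-run definition.

```python
# The shape of an ANSI SQL expression a calculated insight might use
# (table and field names below are hypothetical placeholders).
CUSTOMER_LTV_SQL = """
SELECT   individual_id,
         SUM(order_total) AS lifetime_value
FROM     sales_order
GROUP BY individual_id
"""

# Conceptually, the measure (SUM) is aggregated by a dimension (individual_id):
orders = [
    {"individual_id": "UNIFIED-1", "order_total": 120.50},
    {"individual_id": "UNIFIED-1", "order_total": 89.00},
    {"individual_id": "UNIFIED-2", "order_total": 310.25},
]

lifetime_value: dict[str, float] = {}
for order in orders:
    lifetime_value[order["individual_id"]] = (
        lifetime_value.get(order["individual_id"], 0.0) + order["order_total"]
    )

print(lifetime_value)  # {'UNIFIED-1': 209.5, 'UNIFIED-2': 310.25}
```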
Streaming Insights

Streaming insights are similar to calculated insights, but are designed to solve different, 'real-time' use cases for engagement data. While calculated insights follow a batch process, streaming insights are real-time activities, limited to micro-batches of a few records, and can be used by Data Actions in the platform. Streaming insights are derived from real-time data sources, specifically the web SDK, mobile SDK and Marketing Cloud Personalization. While calculated insights allow joins across the entire data model, streaming insights only permit joining the Engagement DMO with the Individual and Unified Individual DMOs. Additionally, the aggregation and interval functions available to query the data are a subset of those available for calculated insights.

Streaming insight use cases are typically derived from transaction or event-based data. Typical use cases include:

- Geofencing for location-based data in a mobile app
- Opening a support case based on a customer review
- Triggering emails based on website behavior
- Creating a contact in CRM based on a new ecommerce order

Similar to calculated insights, streaming insights can be created using a visual builder or defined as an ANSI SQL expression, but measures are limited to SUM or COUNT. Another key difference is that SQL expressions for streaming insights require a start and end window definition. This is used to aggregate data for multiple individuals into a specific time window, spanning 1 minute to 24 hours. Streaming insights can be filtered to only qualify certain events through action rules. Derived insights trigger a Data Action, like sending an email or mobile push notification, publishing a Platform Event, or even making a webhook callout to an external platform. Streaming insights are not currently supported in Segments or Activations.

Summary

While calculated insights and streaming insights follow a similar approach to deriving insights (through SQL), they fulfill different use cases. Calculated insights are designed for batch processing large data sets and can perform complex calculations, while streaming insights are focused on processing micro-batches of real-time events. Both calculated and streaming insights provide a powerful approach to augmenting data in the platform with multi-dimensional metrics to fulfill sales automation, marketing automation and business intelligence use cases.

Segments

Data ingested into Salesforce Data Cloud is used to define and augment individual profiles. This profile data can then be used to group individual profiles with similar characteristics. This process is known as segmentation. Data Cloud offers advanced segmentation capabilities and enables segment creation to fulfill a variety of business needs. This chapter explains the segmentation capabilities and functionality in Data Cloud.

Segments For Everyone

Segmentation is a core feature of any Customer Data Platform (or CDP).
These platforms are specifically designed to fulfill marketing use cases, but Data Cloud goes beyond CDP use cases and enables different roles in an organization to create segments according to their business needs, including:

- Sales teams, to pre-qualify leads and prospects based on their web and email engagement behavior
- Service teams, to prioritize cases based on customer profiles and needs
- Marketing teams, to identify customer lifecycle stages and nurture them using journeys
- Analysts, to group individuals based on demographic, psychographic, behavioral and geographic data
- IT teams, to understand usage across devices (mobile and desktop) and apps
- Product teams, to classify product and feature use of digital products
- Finance teams, to analyze the economic value of specific customer groups

Segmentation Concepts

While individual profiles have different relationships to DMOs, segments provide a view of what this data represents. When considering which data points to include from a data source, it is important to consider which attributes are required to create segments. Segments use the harmonized data model together with calculated insights — not DLOs or DSOs. Only objects with a profile category can be targeted in a segment, specifically Individual, Unified Individual, and Loyalty Member, referred to as 'target entities'. Each segment defines the target entity to build the segment on (the 'segment on' entity), which in turn determines which attributes are available in the attribute library. If identity resolution has been configured, then it is recommended that Unified Individual is used as the segment target.

Filters are used to define the criteria that qualify an individual for segment membership and include three components:

- Container: the DMO for the attribute that is being filtered
- Aggregation Criteria: the number of results needed to qualify for the segment
- Attribute Criteria: the attributes used to filter segment results

Anatomy of a segment filter in Data Cloud

When segmenting on Unified Individual, the platform searches all connected Individuals to find a match, and once an individual is found, the Unified Individual profile joins the segment. In the example below, two individual records have been resolved to one Unified Individual profile; one is a home owner while the other is not. If the Attribute Criteria for the segment is 'Home Owner is true', then the Unified Individual joins the segment, as the filter criteria matches one Individual for that Unified Individual profile.

Relationship between a Unified Individual and Individuals
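A minimal sketch of that any-match behavior, using invented data: the unified profile qualifies for the segment as soon as any of its linked Individual records satisfies the attribute criteria.

```python
# Two Individual records resolved to one Unified Individual (invented data).
unified_individual = {
    "id": "UNIFIED-1",
    "individuals": [
        {"id": "I-1", "home_owner": True},
        {"id": "I-2", "home_owner": False},
    ],
}

def qualifies(unified: dict, attribute: str, expected) -> bool:
    """Attribute criteria such as 'Home Owner is true': any matching linked Individual qualifies the profile."""
    return any(individual.get(attribute) == expected for individual in unified["individuals"])

print(qualifies(unified_individual, "home_owner", True))  # True: UNIFIED-1 joins the segment
```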
Current and historical segment membership can be previewed or retrieved from the Segment Membership DMO, which is useful for testing, analytics and business intelligence purposes.

Segment Types

There are two different types of segments available in the platform: standard and rapid segments. Standard segments refresh every 12 or 24 hours, while rapid segments refresh every 1 or 4 hours. Rapid segments are only available for segment activation in Marketing Cloud Engagement, and enable near-real-time messaging and journey injection.

Nested Segments

Nested segments provide a convenient method to reuse segments in other segments, either as inclusion or exclusion criteria. For example, 'include a segment of all current customers', or 'exclude customers that have not opened an email in the past 12 months'.

Limits

Segments that reference Engagement category data are limited to a two-year lookback window; however, segments can be used in conjunction with calculated insights to aggregate historical data right back to data inception.

Summary

Segments in Data Cloud have multiple use cases for different business disciplines, which span well beyond the boundaries of CDP segmentation. The feature enables users across different organizational roles to understand, analyze and target individuals through a highly configurable and intuitive interface.

Activation

Activation in Data Cloud materializes and publishes a collection of segment members, along with their supporting attributes, to a configured activation platform. There are many different use cases for activating data across sales, service and marketing automation, including:

- Creating leads in a CRM platform
- Creating a support case based on customer survey data
- Sending an email from a marketing platform
- Activating targeted advertising from an ad platform
- Personalizing web content

This chapter explains the activation concepts and features available in Data Cloud.

Activation Target

An activation target is the receiving platform of a segment. Activation targets define credential and connection data to allow Data Cloud to send a payload of segment members to the target platform. Supported activation targets include:

- Amazon S3 cloud storage
- Google Ads, Meta and Amazon Ads advertising platforms
- Salesforce Marketing Cloud
- Data Cloud DMO

Activation

An activation is created from a segment and defines information related to segment members, including:

- Activation Target: the target platform or system
- Activation Membership: the object containing the segment members to activate, for example Individual
- Attributes: fields from the Activation Membership object or from any DMO, providing that there is a path to the DMO from the 'Activated On' object defined in the segment
- Contact Points: the contact attribute that will be used by the activation target, for example a phone number or email address (Contact Points are optional for Amazon S3 targets)

Source Priority

While Data Cloud allows multiple contact points to be defined for each Unified Individual profile (for example, different email addresses in Individual records derived from different data sources), a source priority order determines which contact point value to use for segment members when multiple values are available — for example, preferring an email address from a Contact record in the CRM over an email address from a marketing platform. If the defined source priority ruleset does not resolve a contact point value, the contact point is determined based on the highest Einstein Engagement Score in Marketing Cloud (which predicts an Individual's likelihood to engage with a message), and if that cannot be determined, the contact point is based on the lowest numerical identifier associated with an Individual.
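As a sketch of that selection logic, with invented data and priorities rather than the platform's internal algorithm: the contact point is taken from the highest-priority source that has one, with a simple fallback standing in for the further tie-breaking rules described above.

```python
# Ranked data sources, most preferred first (illustrative only).
SOURCE_PRIORITY = ["CRM", "Commerce", "Marketing"]

contact_points = [
    {"source": "Marketing", "email": "james@newsletter.example"},
    {"source": "CRM",       "email": "james@home.example"},
]

def resolve_email(points: list) -> str | None:
    """Pick the email from the highest-priority source; fall back to any available value."""
    by_source = {point["source"]: point["email"] for point in points}
    for source in SOURCE_PRIORITY:
        if source in by_source:
            return by_source[source]
    # Stand-in for the platform's further tie-breaking rules (e.g. engagement score).
    return next(iter(by_source.values()), None)

print(resolve_email(contact_points))  # james@home.example: the CRM value wins
```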
Attributes

Attributes can be included in an activation payload to personalize messages in Marketing Cloud and Amazon S3 activation targets. Two types of attributes are available in activations: direct attributes and related attributes. Direct attributes have a one-to-one relationship to the Individual DMO, for example 'First Name'. Related attributes have a one-to-many relationship to an individual, for example 'email opens' or 'ecommerce orders'. Filters can be applied to related attributes (these are different from segmentation filters) to narrow results for personalization — for example, customers who have opened an email in the last 30 days. Attributes are structured as a single JSON-formatted string, which can be parsed for messaging personalization using a server-side language in Marketing Cloud (AMPscript, SSJS or GTL).

Data Actions

While activations publish segments to an activation target, Data Actions enable near real-time events based on streaming insights, or on record changes to an engagement object when a record is created, updated or deleted (a Change Data Capture event), which in turn can trigger a flow or orchestrate external processes. Supported data action targets include:

- Salesforce Platform Events
- Salesforce Marketing Cloud
- Webhook

Data Actions can optionally be enriched with data from related objects and can also include one or more event rules to determine when data should be published, based on a set of conditions. Possible use cases for Data Actions include using:

- A Salesforce Platform Event to trigger a Flow that updates a Lead record to Sales Qualified and converts Leads to Contacts based on an initial purchase
- Marketing Cloud to inject new customers into a welcome journey after they activate their account, sending them a series of onboarding drip campaign emails
- A webhook to integrate with logistics and shipping platforms to create a new fulfillment when an order is created

Summary

Activations and Data Actions in Data Cloud enable platform data to be actionable and unlock many use cases, not only for marketing automation and personalization, but also for sales and service.

About Us

About the Author

Eliot Harper is a Principal Architect at CloudKettle and a Salesforce MVP. Eliot is an acknowledged expert in both Salesforce Data Cloud and Salesforce Marketing Cloud and is the author of The AMPscript Guide and Journey Builder Developer's Guide. He is a sought-after speaker at international events and regularly publishes related content on social media.

About CloudKettle

CloudKettle helps enterprises drive revenue with the Salesforce and Google ecosystems. We do this by providing the strategy and hands-on keyboard execution to leverage platforms like Salesforce Data Cloud, Sales Cloud, Marketing Cloud, Einstein, and CRM Analytics to create highly personalized cross-channel experiences that drive revenue. As your strategic advisor, we help by enhancing your people, processes, and technology to build a roadmap centered around scalable tactics and security. To learn more, contact hello@cloudkettle.com.