Salesforce Data Cloud
Demystified
Eliot Harper
Table of Contents
Data Cloud Fundamentals
Data Ingestion
Data Model
Harmonization
Identity Resolution
Insights
Segments
Activation
About Us
Copyright © 2023 CloudKettle Inc. All rights reserved. This publication is protected by copyright and permission must be obtained
from the publisher prior to any reproduction, storage in a retrieval system, or
transmission in any form or by any means, electronic, mechanical,
photocopying or otherwise. To obtain permission to use material from this
book, please email hello@cloudkettle.com with your request.
ISBN 978-0-6455971-2-7
Data Cloud Fundamentals
Salesforce Data Cloud is arguably the most significant
platform to be released in the history of Salesforce. However,
its journey to market has been more evolutionary than revolutionary. Salesforce started on this journey in June 2019,
when it first announced Customer 360 at Salesforce
Connections. Then at Dreamforce that same year, the platform was renamed Customer 360 Audiences,
which was renamed to Salesforce CDP in May 2021, then
Genie at Dreamforce in 2022, before settling on Data Cloud in
February 2023. It’s been quite hard to keep up.
How Data Cloud Works
Data Cloud is a purpose-built platform that enables organizations to connect all
their customer data, at scale. Whether that data resides in a mobile app, data
lake, data warehouse, or Salesforce platform, or is collected from user interactions on your website, Data Cloud provides the ability to ingest data from
multiple sources, either as a batch process or in real-time. It then harmonizes the
data into a structured, canonical data model and applies reconciliation rules to
unify individual records into a single profile that adapts to a customer’s activity and
behavior — in real-time. You can then analyze, segment and activate this data,
automating customer experiences across the complete suite of Salesforce
Customer 360 products. Whether it’s injecting a Contact into a Marketing Cloud
journey, converting a Lead in Sales Cloud, or creating a case in Service Cloud,
Data Cloud can fulfill marketing, sales and service automation use cases.

Some organizations assume that they don’t need Data Cloud, as they already
use data platforms. Perhaps they’re already storing customer data in a data
lake, and using a data warehouse for business intelligence activities. But it’s
important to understand that Data Cloud is actually neither of those platforms,
but both of them.

Data Lake
A data lake provides a convenient repository to store data quickly, where you
can deposit raw copies of structured, unstructured or semi-structured data
files, without needing to perform data modeling at time of ingestion. But the
problem with data lakes is they can quickly become data swamps, flooded with
outdated and incomplete data. The net result is that it can be hard to extract
data from data lakes, which aren’t optimized for querying at scale.

Data Warehouse
On the other hand, a data warehouse, also referred to as an ‘enterprise data
warehouse’ or ‘EDW’, provides a central data repository which is specifically
optimized for analytics and reporting purposes. Data warehouses provide fast
and efficient data analysis, while also enhancing data quality through data
cleansing, data organization and data governance processes. Data warehouses
tend to be more performant than data lakes, but they can be more expensive
and limited in their ability to scale. Additionally, they can form data silos, which
means they are often incompatible with other data sets. That makes it hard for
users in other parts of an organization to access and use the data.

Data Lakehouse
Data Cloud, however, is built on a data lakehouse. A data lakehouse is a data
platform, which merges the benefits of data lakes and data warehouses into a
single data architecture, so data teams can accelerate their data processing as
they no longer need to straddle two disparate data systems to complete and
scale advanced analytics, like machine learning.
CRM
But what about a CRM? After all, isn’t that a data platform? Well, yes, it stores
customer data — but that’s where the similarities end. CRMs are used for
managing customer relationships and sales engagements, pipelines, customer
interactions, business transactions and facilitating sales and service
processes. And by design, a CRM is built for storing known customer data. If
a customer is unknown, then to the CRM they simply don’t exist. Also, traditionally, CRM platforms
store data in a transactional database that’s optimized for data integrity. But to
use this data at scale, for tasks like analytics or machine learning, it’s
necessary to copy this data to another system to process it.
MDM
Data Cloud isn’t a Master Data Management (or MDM) platform either. MDM
platforms are enterprise software products that create and manage a central,
persisted system of record for master data, through a semantic reconciliation
process. While Data Cloud provides data normalization, it doesn’t provide a
golden record — rather, it creates a unified customer profile that changes
and adapts based on an individual’s activity.
System of Record
Data Cloud is not a substitute for a CRM, MDM or any other platform, and it’s
never the first touchpoint in a data lifecycle. You still need a platform (or
platforms) that generate a system of record (or unique identifier) to register
that first entry point for your data — whether it’s an order, support case, or
customer record. Once this identifier has been established, then your data can
be ingested into Data Cloud, which in turn provides a fabric layer that
orchestrates all your data from different sources. And unlike transactional data
stores, records in Data Cloud are fluid and designed to provide that moment-in-time insight into an individual’s profile, their intent and behavior — all of which
can change at any time.
Data Cloud: Next Generation Architecture
Salesforce has built one of the first — and arguably the most successful — cloud
CRM platforms of all time. At its core, Salesforce uses a transactional data
store that follows a single logical operation sequence to provide atomicity of
record operations. This approach ensures that the database can cancel, or
undo, a transaction or operation that is not completed appropriately. While
transactional databases provide a high degree of data integrity, the
downside is that these databases are designed for processing transactions,
not analysis or transformations. In short, they don’t process or scale well.

Additionally, Salesforce has been a serial acquirer for almost two decades.
Platforms like ExactTarget (now Marketing Cloud Engagement), Pardot
(Marketing Cloud Account Engagement) and Demandware (Commerce Cloud),
to mention a few, all use different database architectures and platforms. While
these platforms have effectively been integrated into Sales and Service Cloud,
data has to be replicated across platforms.

Data Cloud addresses both of these challenges by decoupling data from existing
Salesforce platforms and harmonizing it into a normalized, highly canonical
data model, where users can run analysis and predictions across the enterprise
on a highly scalable microservices architecture that handles thousands of
requests per second — storing billions of profiles, while also providing a
petabyte-scale analytics environment.

Data Cloud is set to form the foundation of the next generation of cloud
architecture for Salesforce and to be a game changer for both Salesforce
and its customers.
Data Ingestion
As its name implies, Salesforce Data Cloud is a data platform —
before you can begin using the platform, you first need to get
data into it. And while Data Cloud provides various options for
importing data, it’s important to select an optimal method to
integrate data sources into the platform. Unfortunately in data
architecture, integrations are often implemented in a poorly
considered way, which can compromise data accessibility,
quality and scalability. This chapter describes the different
ingestion methods for importing data into Data Cloud and
considerations for designing robust data pipelines.
Web Integration
The Salesforce Interactions SDK (or Software Development Kit) enables
developers to capture interactions for both known and pseudonymous website
visitors and store these events in Data Cloud. In turn, this data can be used to
build behavioral profiles and audience segmentation. The SDK is implemented
by a client-side script on the source website that provides various methods to
send event payloads to Data Cloud. The following data types can be captured
in Data Cloud using the SDK:
Profile data for an individual, like user identity, phone and email
eCommerce data including cart and cart items, orders and product catalog entries
Consent tracking to capture when a user provides consent (the SDK will only send events if a customer has consented to tracking)
While the Interactions SDK was initially created for Marketing Cloud
Personalization to track first-party cookie website data using a JavaScript
beacon, the SDK provides a broader, extensible data capture and collection
framework that includes product-specific modules for both Marketing Cloud
Personalization and Data Cloud. Marketing Cloud Personalization is not
required in order to capture website behavior in Data Cloud using the SDK.
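As a rough illustration, capturing an identity event with the SDK on a website might look like the sketch below. The consent provider, event names and payload shape are illustrative assumptions only; confirm the exact schemas against the Interactions SDK reference for your Data Cloud module configuration.

```typescript
// Hedged sketch only: the consent provider, event names and payload shape below
// are illustrative assumptions, not the definitive Interactions SDK schema.
declare const SalesforceInteractions: {
  init(config: object): Promise<void>;
  sendEvent(event: object): void;
};

async function trackIdentifiedVisitor(email: string): Promise<void> {
  // Initialize the SDK with the visitor's consent status; as noted above, events
  // are only sent when the customer has consented to tracking.
  await SalesforceInteractions.init({
    consents: [{ provider: "ExampleConsentManager", purpose: "Tracking", status: "Opt In" }],
  });

  // Send a profile (identity) event for a now-known visitor.
  SalesforceInteractions.sendEvent({
    user: { attributes: { eventType: "identity", email } },
  });
}
```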
Engagement Mobile SDK
This SDK, formerly known as the Marketing Cloud Mobile SDK, is used
to send in-app messaging, inbox (push) messaging and location-based
messaging to mobile apps using Marketing Cloud MobilePush. Similar to the
Interactions SDK, the Engagement Mobile SDK has been extended to include a
Data Cloud module that enables profile, eCommerce and consent data events
(supported by the Interactions SDK) to be tracked in Data Cloud, in addition to
mobile messaging and app-related behavioral events.

The Engagement Mobile SDK is available for both iOS and Android mobile
platforms and can be used without requiring MobilePush integration.
Salesforce Connectors
Data Cloud includes a set of connectors that enables data from Salesforce
products to be ingested into Data Cloud using a configurable interface, without
requiring custom development. Platform connectors include:
B2C Commerce Connector for importing product and customer order data
Marketing Cloud Connector for importing engagement data for email, SMS
and mobile push events — additionally, you can import up to 20 data
extensions per account (across all business units).
Marketing Cloud Personalization Connector for importing website events
including user profiles, behavioral data (like page views) and order data —
up to 5 Personalization datasets can be imported.
Salesforce CRM for importing records from standard and custom objects
from one or more Sales and Service Cloud orgs.
All connectors include ‘bundles’, which are sets of predefined data sets from
the source platform that align with common use cases. Bundles not only
determine source data sets and fields, but also automatically map source fields
to the respective standard Data Model Objects (DMOs) and fields in Data Cloud.
These predefined data mappings can also be customized.
MuleSoft Anypoint Connector
MuleSoft Anypoint is an integration platform that enables API integration
across different databases, SaaS platforms, storage resources, and network
services, through hundreds of pre-built connectors. The MuleSoft Anypoint
Connector for Data Cloud enables data to be ingested from other Anypoint
Connectors, either as a streaming or bulk process. Additionally, the Connector
can be used to publish insights from Data Cloud into upstream platforms.

Cloud Storage Connectors
Data Cloud supports bulk importing and exporting data from and to popular
object storage services including:
Amazon S3
Microsoft Azure Storage
Google Cloud Storage
These connectors are well suited for batch ingestion of voluminous datasets,
as data files can be up to 200GB in size, with up to 1,000 maximum batch files
for each scheduled run. Storage connectors provide a simple and convenient
method for transferring data to Data Cloud on a scheduled basis, particularly
for organizations that already run platforms and manage their data on these
popular cloud computing services.

SFTP
Secure File Transfer Protocol (or SFTP) is an industry-standard network
protocol for securely transferring large data files. Data Cloud can import CSV
files from SFTP servers and supports up to 4.5GB file size in a single data
stream. The vast majority of enterprise platforms support exporting CSV files,
which when used in combination with a file transfer process or platform,
makes SFTP a ubiquitous method for bulk importing data into Data Cloud.
Ingestion API
While there are various “out-of-the-box” connectors that enable declarative-style integration to Data Cloud without requiring any custom
development, there are scenarios where data needs to be loaded into the platform programmatically, either in near real-time or as a batch
process. The Data Cloud Ingestion API fulfills both requirements by
supporting both streaming and bulk data imports.

Using the Streaming API, developers can build a JSON formatted payload that
aligns to the data schema defined in a deployed data stream. This API follows
a “fire and forget” approach, where a response is immediately returned and the
imported data is processed asynchronously by the platform in near real-time,
approximately every 3 minutes. This API is best suited for small batches of
records (not exceeding 200KB); a hedged request sketch follows the list below. Use cases include:
Visitors signing up on a website that triggers a database change
An order fulfillment platform, where an order or shipment status changes
A website chatbot conversation that is initiated by a website visitor
Hotel or travel purchases completed on an online booking platform
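Below is a hedged sketch of a Streaming Ingestion API request. The tenant URL, connector name, object name and field names are placeholders, and the payload must mirror the schema of your own deployed data stream; confirm the endpoint path against the Ingestion API reference.

```typescript
// Hedged sketch of a Streaming Ingestion API request. The tenant URL, connector
// name ("order_connector"), object name ("order_events") and fields are
// placeholders; the real values come from your Ingestion API connector and
// data stream configuration.
const TENANT_URL = "https://example-tenant.c360a.salesforce.com"; // placeholder
const ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"; // obtained via OAuth token exchange

interface OrderEvent {
  order_id: string;
  email: string;
  status: string;
  event_time: string; // ISO 8601 timestamp
}

async function streamOrderEvent(event: OrderEvent): Promise<void> {
  const response = await fetch(
    `${TENANT_URL}/api/v1/ingest/sources/order_connector/order_events`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ACCESS_TOKEN}`,
        "Content-Type": "application/json",
      },
      // The payload must match the schema defined on the deployed data stream.
      body: JSON.stringify({ data: [event] }),
    }
  );
  // "Fire and forget": an acknowledgement is returned immediately and the record
  // is processed asynchronously in near real-time.
  if (!response.ok) {
    throw new Error(`Streaming ingestion failed: ${response.status}`);
  }
}
```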
The Bulk Ingestion API allows large data sets to be created, updated or deleted
in Data Cloud, where CSV files with a file size of up to 150MB (and up to 100
files per job) can be imported. This API follows a similar multi-step process to
the Salesforce Bulk API, where a job is first programmatically created, then CSV
data is uploaded to the job, then the job is closed and the uploaded data is
enqueued for processing (a hedged sketch of this multi-step flow follows the list below). This API is best suited for transferring large amounts
of data at a regular interval, for example, daily or weekly. Possible use cases
include:
Daily customer transactional data from a financial service provider
Point-of-sale data from in-store customer transactions
Customer loyalty status or points balances from a loyalty management
system
Subscriber engagement data from a third-party messaging platform
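The multi-step bulk flow described above might look roughly like the following sketch. The host, connector name, object name and endpoint paths are assumptions for illustration; verify them, along with the documented file size and batch limits, against the Ingestion API reference.

```typescript
// Hedged sketch of the Bulk Ingestion API flow: create a job, upload CSV, then
// close the job so the data is enqueued for processing. Host, connector name,
// object name and paths are illustrative; verify them against the API reference.
const TENANT_URL = "https://example-tenant.c360a.salesforce.com"; // placeholder
const ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN";
const jsonHeaders = {
  Authorization: `Bearer ${ACCESS_TOKEN}`,
  "Content-Type": "application/json",
};

async function bulkUploadTransactions(csv: string): Promise<void> {
  // 1. Create a job describing the target object and operation.
  const createRes = await fetch(`${TENANT_URL}/api/v1/ingest/jobs`, {
    method: "POST",
    headers: jsonHeaders,
    body: JSON.stringify({
      object: "daily_transactions", // placeholder data stream object
      sourceName: "pos_connector",  // placeholder Ingestion API connector
      operation: "upsert",
    }),
  });
  const { id: jobId } = (await createRes.json()) as { id: string };

  // 2. Upload one or more CSV batches, each within the documented size limits.
  await fetch(`${TENANT_URL}/api/v1/ingest/jobs/${jobId}/batches`, {
    method: "PUT",
    headers: { Authorization: `Bearer ${ACCESS_TOKEN}`, "Content-Type": "text/csv" },
    body: csv,
  });

  // 3. Close the job so the uploaded data is enqueued for processing.
  await fetch(`${TENANT_URL}/api/v1/ingest/jobs/${jobId}`, {
    method: "PATCH",
    headers: jsonHeaders,
    body: JSON.stringify({ state: "UploadComplete" }),
  });
}
```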
Pipeline Considerations
Identifying an appropriate connector, protocol, SDK or API is just the first stage
in designing an integration to Data Cloud. How that data is then prepared and
transferred for import — referred to as a ‘data pipeline’ — is equally important,
as poor pipeline architecture can undermine the integration and worse still, the
integrity of your data.
Anti-patterns often surface in pipeline architectures. An anti-pattern is similar
to a pattern, but while it may appear as a working solution, it’s the complete
opposite of best practice. Anti-patterns typically arise when integration is done
without any planning, design, or documentation. For example, a Data Cloud
user may configure the Amazon S3 Connector to import membership data from an
S3 bucket. Data is exported from a source system to the S3 bucket, but the
user is unaware of how long the data export process takes and there is no
validation of the exported data. The data stream runs on a predefined
schedule, before the data has completed copying to the bucket — and even
when the data file is available, data fails to import as required fields are
missing.

When building data pipelines for Data Cloud, quality is key. It is recommended
to establish processes that validate required fields and data schemas prior to
file import, then report on exceptions, so they can be remediated. Additionally,
the platform has Limits and Guidelines for ingesting data. Ensure that data file
properties and operations fall within these defined thresholds. Also, monitor
the Data Stream Refresh History for errors. There are several scenarios when
a data stream refresh may fail, and the Refresh History page can be used to
identify and troubleshoot errors, as they occur.
Data Model
While many Salesforce products including Sales Cloud, Service
Cloud, Education Cloud, Health Cloud and other Salesforce
industry clouds are built on a common ‘core’ platform that
share the same datastore (an Oracle relational database), Data
Cloud uses a very different architecture and technology stack
from other ‘Clouds’ in the Salesforce product line. This chapter
explains the data models and related concepts used by the
platform.
Starting with the storage layer, Data Cloud includes multiple services, including
DynamoDB for hot storage (so data can be supplied fast), Amazon S3 for cold
storage, and a SQL metadata store for indexing all metadata. As a result, Data
Cloud can provide a petabyte-scale data store, which breaks the scalability and
performance constraints associated with relational databases.

The physical architecture in Data Cloud is represented as a set of data objects,
and understanding these is key, as they form the basis of how data is
ingested, harmonized and activated in the platform.
Data Flow Objects and Phases
Data Source
A Data Source is the initial data layer used by Data Cloud. A Data Source
represents a platform or system where your data originates from, outside of
Data Cloud. These sources can either be:
Salesforce platforms including Sales Cloud, Commerce Cloud, Marketing Cloud and Marketing Cloud Personalization
Object storage platforms including Amazon S3, Microsoft Azure Storage and Google Cloud Storage
Ingestion APIs and Connector SDKs to programmatically load data from websites, mobile apps and other systems
SFTP for file-based transfer

Data Stream
A Data Stream is an entity which can be extracted from a Data Source, like
‘Orders’ from Commerce Cloud, ‘Contacts’ from Sales Cloud, or ‘Subscribers’
from Marketing Cloud. Once a Data Source is connected to Data Cloud, Data
Streams provide paths to the respective entity. As a result, a Data Source can
contain one or more Data Streams.

Data Source Object
A Data Stream is ingested to a Data Source Object or ‘DSO’. This object
provides a physical, temporary staging data store that contains the data in its
raw, native file format of the Data Stream (for example, a CSV file). Formulas
can be applied to perform minor transformations on fields at time of data
ingestion.

Data Lake Object
The next data object in the data flow is the Data Lake Object or ‘DLO’. The DLO
is the first object that is available for inspection and enables users to prepare
their data by mapping fields and applying additional transformations. Similar to
the DSO, this object also provides a physical store and it forms the product of a
DSO (and any transformation).
DLOs are storage containers that reside in the data lake (Amazon S3), generally
as Apache Parquet files, which are an open-source, column-oriented file format
designed for efficient data storage and retrieval. On top of this, Apache
Iceberg provides an abstraction layer between the physical data files and their table representation. The adoption of these industry-standard formats is
worth noting, as these file formats are widely supported by other cloud
computing providers, and as a result, enable external platforms to integrate to
Data Cloud in a zero-copy architecture, for example, Snowflake.
Data Model Object
Unlike DSOs and DLOs which use a physical data store, a Data Model Object, or
‘DMO’, enables a virtual, non-materialized view into the data lake. The result
from running a query associated with a view is not stored anywhere and is
always based on the current data snapshot in the DLOs. Attributes within a
DMO can be created from different Data Streams, Calculated Insights and
other sources.

Similar to Salesforce objects, DMOs provide a canonical data model with predefined attributes, which are presented as standard objects, but custom DMOs
can also be created (referred to as custom objects). And similar to Salesforce
objects, DMOs can also have a standard or custom relationship to other DMOs,
which can be structured as a one-to-one or many-to-one relationship. There are
currently 89 standard DMOs in Data Cloud. DMOs are organized into different
Data Object subject areas, including:
Case for service and support cases
Engagement for engagement with an Individual, like email engagement
activity (send, open, click)
Loyalty for managing reward and recognition programs.
Party for representing attributes related to an individual, like contact or
account information.
Privacy to track consent and data privacy preferences for an Individual.
Product to define attributes related to products and services (or goods)
Sales Order for defining past and forecast sales by product
For example, the Sales Order Subject Area uses the following DMOs:
Sales Order for information around current and pending sales orders
Sales Order Product for attributes related to a specific product or service
Sales Store representing a retailer
Order Delivery Method to define different order and delivery methods for
fulfillment
Opportunity to represent a sale that is in progress
Opportunity Product to connect an opportunity to the product (or products)
that it represents.
Sales Order Subject Area in Data Cloud (source: architect.salesforce.com)
Data Spaces
Data Spaces provide logical partitions and a method of separating data
between different brands, regions or departments, limiting which data is
available to which users, without needing to have multiple Data Cloud instances. Additionally, Data
Spaces can be used to align with a Software Development Lifecycle or SDLC,
where you can stage and test Data Objects in a separate environment, without
impacting production data. Data Sources, Data Streams and DLOs can be
made accessible across Data Spaces, while DMOs and other platform features
are isolated to users, based on permission sets.
Data Spaces in Data Cloud (source: help.salesforce.com)
Conclusion
Due in part to advances in technology and the drop in storage costs, companies now
have gargantuan datasets at their disposal. Every time a customer makes a
purchase, opens an email, or even simply views a web page, these engagement
events can be captured and stored, which, if organized properly, enables
you to understand your customers, predict their needs, personalize
interactions, and much more.
But like other SaaS vendors, a core challenge for Salesforce is the success of
its original platform created more than 20 years ago, which was really designed
for a different era and runs on a competitor’s platform. One of the main
challenges that Salesforce has faced with its database and platform
architecture is that it doesn’t handle voluminous or “big” data well. Data Cloud
addresses this through a well-considered architecture that overcomes the
limitations associated with relational databases.

Data Cloud is set to form the backbone of the Salesforce platform, supporting the
data needs of its customers and product line through the next two decades.
Harmonization
Data Cloud accepts data from a variety of sources.
Harmonization refers to the process of transforming (or
modeling) different data sources into one. Once data has been
unified (or harmonized) into a standardized data model, it can
then be used for insights, segmentation and activation.
Data Dictionary
The first step in data harmonization is to map the source data. Poorly
considered choices at this stage can result in inconsistencies and worse still,
incorrect data in downstream processes. In order to mitigate this risk, it is
recommended to create a data dictionary for each data source prior to
implementing data mapping.

A data dictionary defines the data entities, attributes, context, and allowable
values. Creating a dictionary for each data source not only enables a data
mapping specification to be determined, but will also identify common
attributes across data sources and how (or if) they relate to each other. Once
data dictionaries have been defined, the data can be mapped.
DMO Concepts
In Data Cloud, mapping occurs between the DLO and DMO. It's important to
stress that Data Cloud is not a 'bring your own' data model. For example, unlike
Marketing Cloud, which allows users to create their own schemas and
relationships in Data Extensions, Data Cloud uses a highly canonical data
model consisting of 89 (and counting) pre-defined models, or ‘DMOs’, to
accommodate the vast majority of platform use cases. Additionally, as
previously mentioned, standard DMOs can be extended by adding custom
fields and objects to create a hybrid model approach. However, it is
recommended to identify applicable standard DMOs and fields where possible,
and only extend the model with custom fields and objects where necessary.
While several Data Cloud DMOs may appear to be similar to Salesforce
standard objects — for example Individual, Account, and Case — these objects
are not the same. They may semantically express the same item, but they are
conceptually very different.

Field Mapping
Mapping DSO fields to DMO fields in Data Cloud is a straightforward process
and is achieved through an intuitive interface. As discussed in the previous
chapter, when creating a data stream, you must select a category or type of
data found in that data stream; either Profile, Engagement, or Other. Correct
category assignment is important, as it can't be changed after the data stream
is created. Categories are also assigned to DMOs and are inherited from the
first DLO mapped to it, where the DMO will only allow mapping of the same
DLO category that was first assigned to it. While categories are generally
chosen by the user, there are some exceptions, for example the Individual DMO
will always use the profile category. Additionally, data sources from Salesforce
Connectors enforce default mappings which can't be changed.

Mapping fields in Data Cloud (source: Salesforce Trailhead)
The result of field mapping is a series of DMOs, each representing a semantic
object from the physical objects (DLOs) that provide a virtual, non-materialized
view into the data lake. It is also important to note that DLO to DMO field
mapping is a one-to-one relationship, that is, you can't have multiple email
addresses or mobile phone numbers for a single individual. For such
requirements, it is necessary to split the record into two or more records, for example
each sharing the same Salesforce Contact Id.
Fully Qualified Keys
An additional platform concept to understand is that Data Cloud employs a
unique concept of a Fully Qualified Key or 'FQK', which avoids key conflicts
when data from different sources are harmonized. An FQK is a composition of
a source-provided key and a key qualifier, which prevents such conflicts. For
example, if an event registration DLO and CRM DLO are both mapped to the
Individual DMO, an FQK effectively enables queries to be grouped by data
source to allow users to accurately identify, target, and analyze records.

Summary
DMOs are the result of harmonizing multiple data sources into unified views of
the data lake. Arguably the most important step in platform design is to make
well considered choices at the harmonization stage, as wrong category
assignment, inconsistent mapping and incorrect DMO selection can quickly
result in technical debt that requires significant effort to remediate. Applying a
methodical approach to data mapping, investing in authoring a comprehensive
data dictionary, and making informed decisions in this implementation stage
will help ensure that harmonized data from different data sources is accurately
interpreted.
Identity Resolution
While Data Cloud can ingest profile data from different types of
data sources, a key capability of the platform is in establishing
a single representation of different datasets — unifying multiple
profile data points into a single representation of an individual.
Through a combination of deterministic and probabilistic
matching rules, Data Cloud is able to fulfill this process at
scale, efficiently processing millions of records and deriving a
unified profile for each individual.

This chapter explains the core identity resolution concepts
used by the platform, and considerations when creating rules
to resolve unified profiles.
Same Person, Different Data
Platforms and systems vary in their approach to storing profile information of
an individual. A digital marketing platform might (at a minimum) have a first
name and an email address for a customer, while a CRM may hold a more
complete dataset including a mailing address and phone number, while other
platforms like an event registration or commerce platform may store additional
profile attributes. Furthermore, some customers may have multiple (duplicate)
profiles in data sources, for example, past orders in a commerce platform that
use different email addresses for individual customers.
Unified Individual
Identity resolution is a multi-step process. The first step is in unifying the
different data points to establish a unified individual. Data Cloud achieves this
through an Individual Identity Link DMO that connects all profile data points to
a unified individual record. The important concept to understand is that unlike
many Master Data Management (MDM) platforms which provide a 'golden
record' through aggregating profile attributes into a single record for an
individual, Data Cloud establishes relationships from one or more individual
records to a unified individual. As a result, lineage to the source data is
retained, enabling tracking of historical data flow over time, so details of where
the data originated, how it has changed, and its ultimate destination are
completely preserved.
For each unified individual record, there can be one or more Individuals, but
there is always at least one unified individual record for every individual, even if
there are no matches returned.

Unified Profile
A unified profile is the result of a match rule — unified profiles are created on
activation of match rules. This profile type represents attributes related to a
unified individual and has a mutable identifier — that is, the representation of
an individual can change anytime, based on in-the-moment data. For example,
a unified profile could be based on three records belonging to the same person
where one record is from a CRM platform and two are based on purchase
history from a commerce platform, like "James purchased two orders; a
backpack and lipstick". However, if these related data points change in the data
source, for example, "James just updated the shipping information for his
lipstick order to his partner’s", then we now know that James doesn't use
lipstick and that transaction is related to a different individual.
The result is a pliable profile model that is constantly resolving identities to link
the different data points together to provide a 'best intent' representation of
what an individual is determined to be at point-in-time, based on the data that's
currently available.
Match Rules
Match rules in Data Cloud determine how the identity resolution process
should identify matching records. Rules consist of one or more match criteria.
Once all criteria within a rule are met, a profile is matched. Match rules
don't apply any gradation, scoring, weighting or hierarchy; each rule is
evaluated independently, and if any rule returns a match, the records are
considered a match.
Person accounts from Salesforce CRM can’t be used in identity resolution as
they contain a mixture of account and contact fields that don't correspond to
either business accounts or individuals. Three different types of match rules
are available: exact, exact normalized, and fuzzy.

The exact match method uses a string-to-string comparison to return a
deterministic-based match based on a case-insensitive value. For example,
‘McArthur’, ‘Mcarthur’ and ‘MCARTHUR’ source data values will all be
considered as an exact match.
The normalized match method is available for certain fields in DMOs. This
method normalizes field values based on their type and applies a deterministic-based match for the following fields (a conceptual normalization sketch follows this list):
Email address: removes trailing space characters and non-alphanumeric delimiter characters like quotation marks and angle brackets (<>)
Phone: removes white spaces and other non-alphanumeric characters, while also parsing and validating phone numbers based on country code
Address: standardizes based on country-specific rules for addresses; for example, an address line of "2201 Bruce Ave N" and "2201 Bruce Avenue North" would be determined to match
The Fuzzy match is a probabilistic-based match that finds strings that match
with a given string partially, but not exactly. This method is commonly used in
search engines and is used in the platform to match the first name field only. It
uses the Bidirectional Encoder Representations from Transformers (or 'BERT')
AI Language Model to match common misspellings, diacritical marks,
synonyms, and other parameters. Different precision levels can be set for fuzzy match rules to provide granularity
over match results
High Precision to match nicknames, punctuation, international
abbreviations, international alphabet characters, and cross-cultural
spellings. For example, ‘Alexander’ and ‘Alex’.
Medium Precision to match values with the same initials, gender variants,
shuffled names, and similar subnames. For example, ‘Eliot’ and ‘Eliott’
Low Precision to match values with loose similarities. For example, ‘Liz’ and
‘Elizabeth’.
Reconciliation Rules
A reconciliation rule is used to determine the preferred value to use in a Unified
field in the event of field value conflicts, for fields that can't contain multiple
values, like 'first name'. Reconciliation rules can be based on:
Last Updated: the most recently updated record based on Last Modified
Date field in the DSO.
Most Frequent: selects the most frequently occurring value
Source Priority: sorts DLOs in a ranked order of most to least preferred

Reconciliation rules don’t apply to contact points, like phone numbers or email
addresses, as contact points remain part of a unified profile, so all contact
points are available for activation.
Party Matching
Data Cloud uses a notion of a “party” which is an abstract entity that can either
refer to a subject area, a Party Identification DMO, or unique identifying fields in
Data Cloud.

Party DMOs in Data Cloud (source: architect.salesforce.com)
In the context of identity resolution, this rule uses the Party Identification DMO
to match a set of values to an Individual. This matching method uses the
following attributes:
Party Identification Id: a primary key
Party: a foreign key match to the Individual Id field in the Individual DMO
Party Identification Type: an optional descriptive identifier used to provide
additional information about the identifier, for example "Email Subscriber"
Identification Name: a second namespace of the party identification type
that represents the value of the Party Identification Number, for example
"Subscriber Key"
Party Identification Number: the value that is used for matching purposes,
for example, the Marketing Cloud Subscriber Key
A Party Matching Rule essentially matches on the Party Identification Type,
Name and Number values. Conceptually, when these three values are
concatenated together, unified individual records with identical values are
considered to be a match.
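Conceptually, that comparison can be pictured as building a composite key from the three values, as in the sketch below. This is purely illustrative and not how the platform stores or evaluates party identifiers.

```typescript
// Conceptual sketch: party matching effectively compares the combination of
// Party Identification Type, Name and Number. The composite key below is an
// illustration of that idea, not Data Cloud's internal representation.
interface PartyIdentification {
  type: string;   // e.g. "Email Subscriber"
  name: string;   // e.g. "Subscriber Key"
  number: string; // the value matched on, e.g. a Marketing Cloud Subscriber Key
}

function partyMatchKey(p: PartyIdentification): string {
  return [p.type, p.name, p.number].join("|");
}

function isPartyMatch(a: PartyIdentification, b: PartyIdentification): boolean {
  // Records whose composite keys are identical are treated as the same party.
  return partyMatchKey(a) === partyMatchKey(b);
}
```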
Implementation
Comprehensive testing is essential before enabling identity resolution rules, so
rules can be validated for accuracy. It is recommended to identify different data
point permutations from data streams and use sample datasets to test the
resulting unified individual records. Data Cloud provides two separate identity
graphs (or ‘rulesets’) enabling rule changes to be staged, tested and compared
against production records, as each ruleset provides a separate set of unified
individuals.

The maximum number of ruleset jobs that can be executed in any 24-hour
period is four rulesets per data space, so a considered approach needs to be
applied to implementing and testing rule changes, as free-form testing
methods may result in the 24 hour threshold being reached.
Summary
While the concept of a ‘unified profile’ is a deviation from data models used in
other Salesforce products and can take some time to comprehend, it provides
many benefits to platform users, most notably the ability to preserve data
lineage back to the data source. Another concept to understand is that unified
profiles are fluid by design and don’t create a ‘golden record’ or ‘super record’;
rather, they recognize that the representation of an individual can change at any
time, providing a profile view based on in-the-moment data.

Identity resolution is not a ‘black box’ — it’s a complex process that requires
careful consideration, configuration and testing to ensure an optimal resolution
process and ultimately representation of an individual. Salesforce has clearly
thought very carefully about identity resolution and addressed many
shortcomings of similar platforms. It’s one of many evolving platform features
that we can expect to rapidly mature, as Salesforce continues to invest heavily
in the product roadmap and enhancements.
Insights
Data Cloud includes insights features that enable harmonized
and unified data to be augmented with multi-dimensional
metrics, for example Lifetime Value (LTV), Net Promoter Score
(NPS), Customer Satisfaction Score (CSAT), churn rate and lead
score, to mention just a few. There are two approaches for
creating insights in the platform: calculated insights and
streaming insights. This chapter explains both features.
Calculated Insights
Calculated insights are derived from scheduled batch data and can be used to
build insights from single, multi-dimensional or time-dimensional calculations
and in turn used across business intelligence, segmentation, activation and
personalization use cases. This insight type is applied after data harmonization
and unification, and enables the entire data model (all DMOs) to be queried,
with a broad historical lookback period right back to data inception.

The primary use case for calculated insights is in segmentation and activation,
as it enriches attributes in the data model which can be used to determine
applicable audience segments, then trigger and personalize messaging. Use
cases for calculated insights include calculating:
Recency Frequency Monetary (RFM)
Lifetime Value (LTV)
Customer Satisfaction Score (CSAT)

A further use case for calculated insights is to validate the data quality of
identity resolution rulesets and to identify outliers or anomalies. For example, calculated
insights can determine:
The number of matched contact points for each unified profile
The consolidation rate of matched profiles for each data source
The number of unique contact points for each data source
Calculated insights are well-suited for providing a single calculated value which
can be reused across the business, not only within Data Cloud and activation
datasets, but also by external platforms using the Calculated Insights API. For
example, to determine a customer loyalty status or points balance.

There are two approaches to creating calculated insights in the platform:
writing SQL expressions, or declaratively through a builder tool in the platform
user interface.

SQL expressions are based on ANSI SQL syntax and include certain
aggregates and functions for calculating measures (quantitative values), which
include:
count (COUNT)
average (AVG)
total (SUM)
minimum (MIN)
maximum (MAX)
Measures can be used together with dimensions (qualitative values), for
example name, date and other attributes, to categorize measures. Note that
only measures, not dimensions, can be activated in the platform; however,
dimension filters can be used during activation.
dimension filters can be used during activation. Calculated insights can also be run sequentially, where one calculated insight
can be used as an input for the next calculated insight (up to three), which is
useful for reusing common logic across insights. Calculated insights are refreshed on a defined frequency (a minimum of one
hour) and are computed by a Spark job running on Amazon EMR (a managed
cluster platform for running big data frameworks).

Streaming Insights
Streaming insights are similar to calculated insights, but are designed to solve
different, ‘real-time’ use cases for engagement data. While calculated
insights follow a batch process, streaming insights are real-time activities, but are
limited to micro-batches of a few records and can be used by Data Actions in the
platform. Streaming insights are derived from real-time data sources, specifically
the web SDK, mobile SDK and Marketing Cloud Personalization.

While calculated insights allow joins across the entire data model to be queried,
streaming insights only permit joining of the Engagement DMO with the Individual
and Unified Individual DMOs. Additionally, the available aggregation and interval
functions to query the data are a subset of those available for calculated insights.

Streaming insights use cases are typically derived from transaction or event-based data. Typical use cases include:
Geofencing for location-based data in a mobile app
Open a support case based on a customer review
Trigger emails based on website behavior
Create a contact in CRM based on a new ecommerce order

Similar to calculated insights, streaming insights can be created using a visual
builder or defined as an ANSI SQL expression, but measures are limited to
SUM or COUNT. Another key difference is that SQL expressions for streaming
insights require a start and end window definition. This is used to aggregate data
for multiple individuals into a specific time window, spanning 1 minute to 24
hours.

Streaming insights can be filtered to only qualify certain events through action
rules. Derived insights trigger a Data Action, like sending an email, mobile push
notification, Platform Event or even a webhook callout to an external platform.
Streaming insights are not currently supported in Segments or Activations.
Summary
While calculated insights and streaming insights follow a similar approach to
deriving insights (through SQL), they fulfill different use cases. Calculated insights are
designed for batch processing large data sets and can perform complex
calculations, while streaming insights are focused on processing micro-batches
of real-time events. Both calculated and streaming insights provide a powerful
approach to augment data in the platform with multi-dimensional metrics to fulfill
sales automation, marketing automation and business intelligence use cases.
Segments
Data ingested into Salesforce Data Cloud is used to define and
augment individual profiles. This profile data can then be used
to group individual profiles with similar characteristics. This
process is known as segmentation. Data Cloud offers
advanced segmentation capabilities and enables segment
creation to fulfill a variety of business needs. This chapter
explains the segmentation capabilities and functionality in
Data Cloud.
Segments For Everyone
Segmentation is a core feature of any Customer Data Platform (or CDP). These
platforms are specifically designed to fulfill marketing use cases, but Data
Cloud goes beyond CDP use cases and enables different roles in an
organization to create segments according to their business needs, including:
Sales teams to pre-qualify leads and prospects based on their web and
email engagement behavior
Service teams to prioritize cases based on customer profiles and needs
Marketing teams to identify customer lifecycle stages and nurture them
using journeys
Analysts to group individuals based on demographic, psychographic,
behavioral and geographic data
IT teams to understand use across devices (mobile and desktop) and apps
Product teams to classify product and feature use of digital products
Finance teams to analyze economic values of specific customer groups
Segmentation Concepts
While individual profiles have different relationships to DMOs, segments
provide a view of what this data represents. When considering which data
points to include from a data source, it is important to consider what attributes
are required to create segments.
Segments use the harmonized data model together with calculated insights —
not DLOs or DSOs. Only objects with a profile category can be targeted in a
segment, specifically Individual, Unified Individual, and Loyalty Member,
referred to as 'target entities'.

Each segment defines the target entity to build the segment (or 'segment on'),
which in turn determines which attributes are available in the attribute library. If
identity resolution has been configured, then it is recommended that Unified
Individual is used as the segment target.

Filters are used to define the decision to qualify an individual for segment
membership and include three components:
Container: the DMO for the attribute that is being filtered
Aggregation Criteria: the number of results needed to qualify for the segment
Attribute Criteria: attributes used to filter segment results
Anatomy of a segment filter in Data Cloud
When segmenting on Unified Individual, the platform searches all connected
Individuals to find a match and once an individual is found, then the Unified
Individual profile joins the segment. In the example below, there are two
individual records that have been resolved to one Unified Individual profile; one
is a home owner while the other is not. If the Attribute Criteria for the segment
is 'Home Owner is true', then the Unified Individual joins the segment, as the
filter criteria matches one Individual for that Unified Individual profile.
Relationship between a Unified Individual and Individuals
Current and historical segment membership can be previewed or retrieved
from the Segment Membership DMO, which is useful for testing, analytics and
business intelligence purposes.
Segment Types

There are two different types of segments available in the platform: standard
and rapid segments. Standard segments refresh every 12 or 24 hours, while
rapid segments refresh every 1 or 4 hours. Rapid segments are only available
for segment activation in Marketing Cloud Engagement, and enable near-real-time
messaging and journey injection.
Nested Segments
Nested segments provide a convenient method to reuse segments in other
segments, either as an inclusion or exclusion criteria. For example, ‘include a
segment of all current customers’, or ‘exclude customers that have not opened
an email in the past 12 months’.
Limits
Segments that reference Engagement category data are limited to a two-year
lookback window; however, segments can be used in conjunction with
calculated insights to aggregate historical data right back to data inception.
Summary
Segments in Data Cloud have multiple use cases for different business
disciplines, which span well beyond the boundaries of CDP segmentation. The feature enables users across different organizational roles to understand,
analyze and target individuals, through a highly configurable and intuitive
interface.
Activation
Activation in Data Cloud materializes and publishes a
collection of segment members along with their supporting
attributes to a configured activation platform. There are many
different use cases for activating data across sales, service
and marketing automation, including:
Create leads in a CRM platform
Create a support case based on customer survey data
Send an email from a marketing platform
Activate targeted advertising from an ad platform
Personalize web content

This chapter explains the activation concepts and features
available in Data Cloud.
Activation Target
An activation target is the receiving platform of a segment. Activation targets
define credential and connection data to allow Data Cloud to send a payload of
segment members to the target platform. Supported activation targets
include:
Amazon S3 cloud storage
Google Ads, Meta and Amazon Ads advertising platforms
Salesforce Marketing Cloud
Data Cloud DMO
Activation
An activation is created from a segment and defines information related to
segment members, including:
Activation Target: the target platform or system
Activation Membership: the object containing the segment members to
activate, for example Individual
Attributes: fields from the Activation Membership object or from any DMO,
providing that there is a path to the DMO from the ‘Activated On’ object
defined in the segment.
Contact Points: which contact attribute will be used by the activation target,
for example, a phone number or email address (Contact Points are optional
for Amazon S3 targets)
Source Priority

While Data Cloud allows multiple contact points to be defined for each Unified
Individual profile (for example, different email addresses in Individual records
derived from different data sources), a source priority order determines which
contact point value to use for segment members when multiple values are
available, based on a prioritized data source — for example, preferring an email
from a Contact record in CRM over an email address from a marketing platform.

If the defined source priority ruleset does not resolve a contact point value,
then the contact point is determined based on the highest Einstein
Engagement Score in Marketing Cloud (which predicts an Individual's likelihood to
engage with a message), and if that cannot be determined, then the contact
point is defined based on the lowest numerical identifier associated with an
Individual.
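The fallback order can be pictured with a small conceptual sketch like the one below. It is an illustration of the described precedence, not platform code, and the field names are invented.

```typescript
// Conceptual sketch of the fallback order described above, not platform code:
// 1) source priority, 2) highest Einstein Engagement Score, 3) lowest identifier.
interface ContactPointCandidate {
  email: string;
  sourceRank: number;        // lower number = higher-priority data source
  engagementScore?: number;  // Einstein Engagement Score, when available
  recordId: number;
}

function resolveEmail(candidates: ContactPointCandidate[]): string {
  if (candidates.length === 0) {
    throw new Error("No contact points available for this profile");
  }
  const sorted = [...candidates].sort(
    (a, b) =>
      a.sourceRank - b.sourceRank ||                         // 1. source priority
      (b.engagementScore ?? 0) - (a.engagementScore ?? 0) || // 2. engagement score
      a.recordId - b.recordId                                // 3. lowest identifier
  );
  return sorted[0].email;
}
```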
Attributes
Attributes can be included in an activation payload to personalize messages in
Marketing Cloud and Amazon S3 activation targets. Two types of attributes are
available in activations: direct attributes and related attributes. Direct attributes
have a one-to-one relationship to the Individual DMO, for example, 'First
Name'. Related attributes have a one-to-many relationship to an individual, for
example, 'email opens' or 'ecommerce orders'.
Filters can be applied to related attributes (which are different from
segmentation filters) to narrow results for personalization — for example,
customers who have opened an email in the last 30 days.

Attributes are structured in a single JSON formatted string, which can be
parsed for messaging personalization using a server-side language in
Marketing Cloud (either AMPscript, SSJS or GTL).

Data Actions
While activations publish segments to an activation target, Data Actions enable
near real-time events based on streaming insights or record changes to an
engagement object when a record is created, updated or deleted (a Change
Data Capture event), which in turn can trigger a flow or orchestrate external
processes. Supported data action targets include:
Salesforce Platform Events
Salesforce Marketing Cloud
Webhook

Data Actions can optionally be enriched with data from related objects and can
also include one or more event rules to determine when data should be
published based on a set of conditions. Possible use cases for Data Actions
include using:
Salesforce Platform Event to trigger a Flow that updates a Lead record to
Sales Qualified and converts Leads to Contacts based on an initial purchase
Marketing Cloud to inject new customers into a welcome journey after
activating their account to send them a series of onboarding drip campaign
emails
Webhook to integrate with logistics and shipping platforms to create a new
fulfillment when an order is created (a hedged sketch of such a webhook receiver follows below)
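As an illustration of the webhook use case, a receiving endpoint might look like the sketch below. The route, payload fields and createFulfillment() helper are hypothetical placeholders; the actual payload shape is defined by your data action configuration.

```typescript
// Hedged sketch of a webhook receiver for the order-fulfillment use case above.
// The route, payload fields and createFulfillment() helper are hypothetical;
// the actual payload shape is defined by your data action configuration.
import express from "express";

const app = express();
app.use(express.json());

app.post("/data-actions/order-created", async (req, res) => {
  const { orderId, shippingAddress } = req.body; // hypothetical payload fields
  await createFulfillment(orderId, shippingAddress);
  res.sendStatus(204); // acknowledge receipt
});

async function createFulfillment(orderId: string, address: unknown): Promise<void> {
  // Call the logistics platform's API here (placeholder).
  console.log(`Creating fulfillment for order ${orderId}`, address);
}

app.listen(3000);
```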
Summary

Activations and Data Actions in Data Cloud enable platform data to be
actionable and unlock many use cases, not only for marketing automation and
personalization, but also for sales and service use cases.
About Us
About the Author
Eliot Harper is a Principal Architect at CloudKettle and a Salesforce MVP. Eliot
is an acknowledged expert in both Salesforce Data Cloud and Salesforce
Marketing Cloud and is author of The AMPscript Guide and Journey Builder
Developer’s Guide. He is a sought-after speaker at international events and
regularly publishes related content on social media.
About CloudKettle
CloudKettle helps enterprises drive revenue with the Salesforce and Google
ecosystems. We do this by providing the strategy and hands-on keyboard
execution to leverage platforms like Salesforce Data Cloud, Sales Cloud,
Marketing Cloud, Einstein, and CRM Analytics to create highly personalized
cross-channel experiences that drive revenue.
As your strategic advisor, we help by enhancing your people, processes, and
technology to build a roadmap centered around scalable tactics and security.
To learn more, contact hello@cloudkettle.com.