

For the exclusive use of L. Wang, 2024.
9-622-060
MAY 26, 2022
KARIM R. LAKHANI
SHANE GREENSTEIN
KERRY HERMAN
AWS and Amazon SageMaker (A): The
Commercialization of Machine Learning Services
The vision is to democratize machine learning for the entire planet.
— Amazon SageMaker Team
It was 2018, and Bratin Saha, vice president, Amazon Web Services (AWS) Machine Learning, and
a handful of his AWS colleagues were debating their latest challenge. Machine learning (ML), whose
fundamental structure was inspired by the human brain, offered a promise of insights and prediction
through analytics based on data. Unlike other analytics, ML could detect patterns in unstructured data.
Uses of ML included predicting patients at risk of opioid use disorder by looking at patient
prescriptions; observing products on a factory line to be sure they met quality standards; and, with
sensors that captured vibrations, signaling when a machine needed maintenance.
While AWS’s cloud computing services had quickly become among the most sought-after in the world—approaching pervasive, mainstream status—creating an ML
service posed some challenges. The data AWS customers stored in the cloud included everything: text,
images, geospatial data, X-rays, audio or speech assets, and much more. Saha noted, “Traditional
analytics relied on structured data, i.e., using data to predict something. But the vast majority of data
in the world—as much as 80% of it—was unstructured. Companies were swimming in text, emails,
social media, tweets. But we could not tap into this data. Our customers were looking for a way to tap
into this data and drive better outcomes; machine learning held promise for this.”
Large technology (tech) companies—those with teams of 10 or more data scientists—and a few
others had been investing in ML for some time and were creating a lot of value by integrating ML into
services. Saha and his team estimated they made up the vast majority of 2017 ML revenues. Other,
smaller companies were interested, and as Saha said, “thought they would experiment a bit.” These
smaller players made up the vast majority of ML volume. And increasingly, AWS was seeing growing
interest from mid-sized companies. Saha said, “There were some who thought ML would eventually
go big and were investing in it and starting to integrate.”
Professors Karim R. Lakhani and Shane Greenstein and Director Kerry Herman (Case Research & Writing Group) prepared this case. It was
reviewed and approved before publication by a company designate. Funding for the development of this case was provided by Harvard Business
School and not by the company. HBS cases are developed solely as the basis for class discussion. Cases are not intended to serve as endorsements,
sources of primary data, or illustrations of effective or ineffective management.
Copyright © 2022 President and Fellows of Harvard College. To order copies or request permission to reproduce materials, call 1-800-545-7685,
write Harvard Business School Publishing, Boston, MA 02163, or go to www.hbsp.harvard.edu. This publication may not be digitized, photocopied,
or otherwise reproduced, posted, or transmitted, without the permission of Harvard Business School.
This document is authorized for use only by Lu Wang in DSO 574 (2024) taught by Milan Miric, University of Southern California from Jan 2024 to Jun 2024.
As they considered building an ML service for AWS under the name SageMaker, the team had to
decide where to start. Saha asked, “Which customers should AWS target first with their ML service?”
Amazon Web Services: A Brief History
In 2006, Amazon launched AWS, a set of information technology (IT) infrastructure services to
businesses in the form of web services—eventually to become known as cloud computing. In the early
2000s, as it solved its own e-commerce fulfillment needs, Amazon confronted the limits of its early (1994) IT
development approaches, none of which had planned for future requirements. Amazon
leadership discovered a lack of coordination across infrastructure teams, which led teams to duplicate effort
building resources for each project, with little attention to potential efficiencies from scale or
reuse. 1 This hindered the company’s ability to move quickly and flexibly.
To solve this problem, Amazon’s leadership called for a set of standard infrastructure services to be
built for the entire company to access. They mandated well-documented application programming
interfaces (APIs), creating an ordered and disciplined approach to development across Amazon’s
internal development community; these eventually translated seamlessly to third-party developers as
Amazon opened its services. 2 The services all operated behind the AWS Management Console. As one
observer noted, Amazon essentially made infinite disk space available to anyone, at a low, pay-only-for-what-you-need price, and paired this with simple APIs such that anyone could “pick it up and
build something useful in it, in the first 24 hours of using an unreleased, unannounced product.” 3
In meeting its own e-commerce fulfillment needs, Amazon had developed valuable expertise in
building and running cost-effective data centers, as well as compute, storage, and database
infrastructure services. These different components combined to form an Internet “operating system.” 4
Packaged as discrete components, on a pay-as-used basis, these services formed AWS. Andy Jassy,
then chief of staff to CEO Jeff Bezos, said, “We realized we could contribute all of those key components
of that [I]nternet operating system, and with that we went to pursue this much broader mission [. . .]
which is [. . .] to allow any organization or company or any developer to run their technology
applications on top of our technology infrastructure platform.” 5
As its CEO, Jassy led AWS from its inception in 2003 as a stand-alone business providing tech
infrastructure for customers’ business use cases, including storage, computing, advertising, e-commerce, and product development. 6 (See Exhibit 1 for the AWS service structure and Exhibit 2 for
a representative list of AWS services.) Amazon advertised the service as an opportunity for small
enterprises or startups to quickly scale their businesses, as it eliminated the need for each small
company or startup to build and maintain their own tech infrastructure. 7 This gave AWS an early set
of connections with new users in the mass-middle business market.
Machine Learning
Machine learning, a subset of artificial intelligence (AI), taught computer programs to identify
patterns in datasets. 8 Early ML work took a knowledge-driven approach. Many traced the earliest
examples of a computer learning program to IBM researcher Arthur Samuel, who in 1952 wrote a
program for an IBM computer to play checkers. The more games the computer played, the more it
improved, as it learned which moves contributed to winning strategies. In 1957, psychologist
and Cornell researcher Frank Rosenblatt helped design computers that learned through trial and error;
he worked on perceptrons, a based on a type of neural network that simulated thought processes in the
human brain. 9 In 1967, the “nearest neighbor” algorithm helped computers use pattern recognition.
Other advances followed. Explanation-Based Learning, or EBL, taught computers to analyze training
data and, by discarding data identified as unimportant, derive a general rule to follow. NetTalk
learned to speak (or pronounce words) the same way that an infant would. 10 In the 1990s, ML work
shifted to a more data-driven approach, with programs aimed at helping computers analyze data to
make insights or gain learning. 11
ML consisted of either supervised learning (a model that categorized data based on pre-determined
criteria), unsupervised learning (a model that analyzed datasets to reveal previously unknown trends
or associations), or reinforcement learning (a model that identified the best path to reach a particular
outcome). 12 For example, ML could learn to identify malware accurately, or to detect anomalous
patterns of cloud data access and predict security flaws. 13 It had
applications in the Internet of Things, including some related to automation of mechanical processes
and safety on production lines. It also helped create recommendation systems, voice-activated virtual
assistants, and automation. 14
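A minimal sketch of the first two learning types can make the distinction concrete. The example below uses scikit-learn; the library and the toy viewing-history data are illustrative assumptions, not details from the case:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised learning: the categorization criteria (labels) are known in advance.
# Features: [hours watched, documentaries watched]; label: 1 = customer renews.
X = [[1, 0], [2, 1], [8, 3], [9, 4], [1, 1], [7, 2]]
y = [0, 0, 1, 1, 0, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[6, 3]]))  # classify an unseen customer

# Unsupervised learning: no labels; the model reveals groupings on its own,
# much like segmenting customers by viewing history.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(segments)  # a cluster assignment for each customer
```

Reinforcement learning follows a different loop (act, observe reward, adjust) and is omitted here for brevity.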
ML typically took place in several stages. First, users collected the relevant data and “cleaned” it to
standardize all accompanying metadata. For instance, within a dataset that listed educational
institutions, the cleaning stage might transform “HBS” into “Harvard Business School” to prevent the
ML algorithms from categorizing them as separate entities. After cleaning the data, users could apply
labels, calculations, or other modifications.
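The cleaning step might look like the following pandas sketch; the column name and alias table are illustrative assumptions:

```python
import pandas as pd

# Illustrative dataset: the same institution appears under several spellings.
df = pd.DataFrame({"institution": ["HBS", "Harvard Business School", "hbs", "MIT"]})

# A small alias table maps known variants to one canonical name,
# so downstream ML algorithms do not treat them as separate entities.
aliases = {"hbs": "Harvard Business School", "mit": "MIT"}
df["institution"] = (
    df["institution"]
    .str.strip()
    .apply(lambda s: aliases.get(s.lower(), s))
)
print(df["institution"].nunique())  # → 2
```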
Next, developers built algorithms that analyzed the data. An ML model’s initial datasets for
supervised learning trained the algorithms to perform specific functions, such as predicting whether
an image depicted a cat or a dog. 15 In contrast, unsupervised learning algorithms analyzed datasets to
identify trends; for example, Netflix’s ML algorithms segmented customers based on their previous
viewing history. 16 After creating and training the algorithms, programmers evaluated the accuracy of
the ML’s results and, if necessary, recalibrated the model or exposed it to additional training datasets. 17
In the final stage of ML development, programmers deployed the system and tracked its performance
over time. 18
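As a sketch of the evaluate-and-recalibrate stage, the following uses scikit-learn and its bundled Iris dataset; the library, model choice, and 90% accuracy threshold are assumptions, not details from the case:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train, then evaluate on data the model has never seen.
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# If accuracy falls short, recalibrate (here: a different neighbor count)
# or expose the model to additional training data, then re-evaluate.
if accuracy < 0.90:
    model = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"test accuracy: {accuracy:.2f}")  # deploy only once this is acceptable
```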
ML relied on machine learning libraries—essentially a compilation of functions and routines ready
for use—that allowed developers and data scientists to focus on the model they wanted to train and
deploy, rather than writing code to manage the underlying complexities of their data. They offered
common learning algorithms and utilities, and were typically usable across a range of code, including
Java, Python, and other popular programming languages. Different libraries focused on various
functionalities, i.e., text processing, graphics, data manipulation, or scientific computation. 19
Companies applying ML across their organization all typically applied the same series of steps to
their ML operations: data collection, data processing, feature engineering, data labeling, model design,
model training, model optimization, and model deployment and monitoring. AWS (and other cloud
vendors) aimed to provide a suite of tools that enabled data scientists to complete these steps—pulling
data from their data warehouse, creating their algorithm model code, and deploying the model onto
production—within one go-to environment and set of tools.
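The series of steps above can be sketched as one chained workflow; the function names and toy “model” below are illustrative placeholders, not an actual AWS or SageMaker API:

```python
# Each stage consumes the previous stage's output; a cloud ML suite aims to
# host all of these stages in one environment.
def collect():      return [[5.0, "yes"], [1.0, "no"], [4.5, "yes"], [0.5, "no"]]
def process(rows):  return [(r[0], 1 if r[1] == "yes" else 0) for r in rows]
def engineer(rows): return [([x, x * x], label) for x, label in rows]  # add a feature

def train(rows):
    # Toy "model": classify by comparing the first feature to its mean.
    threshold = sum(f[0] for f, _ in rows) / len(rows)
    return lambda f: 1 if f[0] > threshold else 0

def evaluate(model, rows):
    # Monitoring stage: fraction of correct predictions.
    return sum(model(f) == label for f, label in rows) / len(rows)

data = engineer(process(collect()))
model = train(data)
print(evaluate(model, data))
```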
a A perceptron, sometimes referred to as the first neural network, was an algorithm that classified input into two possible
categories (i.e., binary).
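The binary classification the footnote describes can be implemented in a few lines; the training data (the logical AND function) and the learning rate are illustrative choices:

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    """Learn weights for a binary classifier: output 1 if w.x + b > 0, else 0."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred  # -1, 0, or +1; nudge weights toward the target
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn logical AND, a linearly separable two-category problem.
classify = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
print([classify(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # → [0, 0, 0, 1]
```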
Amazon SageMaker: Making Machine Learning Accessible, But How?
Amazon had been doing ML since the early 2000s for Amazon.com. “We know a lot about what our
customers want,” Mukesh Karki, SageMaker’s engineering director, said. “We can do this and simplify
it for our clients.” Amazon’s retail teams had often innovated to solve internal problems, at times
looking to find a common widget. Omer Zaki, SageMaker’s general manager, added, “Back in the day,
Amazon did a lot of ML for their own operations and retail. These were a leading indicator of what the
customer or industry would need.”
The most significant gaps to making ML more accessible were software tools for different phases of
the ML workflow; simplifications that could make the math behind ML more accessible and
standardized; and a standardized interface between the tools used in different phases of the ML
workflow. Without these, it was very difficult for developers, engineers, and other business unit-affiliated managers not trained in advanced statistics to apply ML to their company’s needs. Saha said,
“For ML and AI we saw that we would have to help develop products that fit customer needs, teach
users new capabilities, and seed adoption through various use cases. It meant teaching users how to
use AI.”
A Long Game?
At a broader level, Saha also believed that, while ML might not be a large part of a company’s IT
spend, it was possible that ML services would play a decisive role in cloud service uptake. He said, “In
10 to 15 years, ML will be integral to everything a company does. So, while today ML could be a small
portion of a company’s total IT spend, over time it will become a bigger portion. It will be an important
factor in which cloud provider a customer chooses.” He added, “So we saw we could have a
disproportionately large downstream impact.”
AWS’s and Amazon’s top leadership supported the team’s explorations. Saha said, “This is where
long-term vision helps. Andy (Jassy), Adam (Selipsky), Peter (DeSantis), Swami (Sivasubramanian)
and AWS leadership in general always reserve a good part of our planning to review future
opportunities so that AWS continues to stay ahead of future customer needs. They ask, ‘What does the
business look like several years out? What investments should we look at?’” Yet, as the team plotted
building and growing new ML services, operating within AWS or Amazon without a viable business
model was not an option. Saha said, “We are a long-term-oriented company, and to show that there is
a viable business, we had to make a solid business plan, as if we were a startup. We needed intermediate
milestones.” Saha formalized the team, drawing on engineers with data services experience. The team
sized their investments based on a multi-year plan and set milestones (see Exhibit 3 for their early
vision). The milestones provided guard rails around setting goals to reach their targets. Saha added,
“This is why AWS is a great place to build a business. We have always worked off the long term.”
Relying on AWS Roadmaps
Services including Amazon’s Elastic Compute Cloud (EC2), launched in 2006, and Relational
Database Services (RDS), spun up in 2009, provided a roadmap of sorts for the new ML services, with
some differences. RDS had been able to leverage existing tools, standards, and software approaches
and enhanced them with additional features, availability, performance, scalability, reliability, and
security. It also made them accessible on an as-needed pricing basis in the cloud. In contrast to moving
an enterprise’s IT to a cloud provider such as AWS, AI/ML capabilities required newly educating
customers and better understanding use cases, with many more users or types of
users, all with different learning capabilities. Added to this was the challenge of market segment fit.
Zaki said, “The category didn’t exist.”
What to Build?
Relational database approaches typically operated under established paradigms, with accepted
standards across all the main players (Microsoft SQL Server, MySQL, Oracle Database, etc.), meaning
they were somewhat standardized. In contrast, ML had no such paradigms and was far less uniform
across developers and data scientists, with a mix of libraries, tools, and players, including Apache’s
MXNet, Facebook’s PyTorch, and Google’s TensorFlow. The team’s ML effort faced the challenge of
creating building blocks with value to offer, but this value would be different for different users, under
different circumstances and in differing environments, depending on each customer’s ML experience,
data, needs, and goals.
The team identified the core capabilities data scientists and developers needed for ML. These
included tools to manage whichever libraries users chose and to apply them to their specific needs;
training machine learning models; running inference; and
supporting notebooks, b which were popular among data scientists. The team learned that some users
needed pre-built algorithms with generic capabilities, while others would bring their own.
Containerization—or operating system (OS) virtualization, where apps could run in isolated user
spaces on the same OS—was important to these users. This raised the challenge of finding the fastest
path to those conditions.
The team recognized that customers’ data scientists had to go through a series of steps to even get
to the point of being able to start using ML. This included getting computing power provisioned to
them, often meaning that systems managers had to take that resource from somewhere else in the
company. The team’s first efforts focused on building a component that could manage these needs.
Sumit Thakur, SageMaker’s principal product manager, had worked on training ML models at scale.
He recalled, “We had to think differently from the way AWS has done things. We’re typically focused
on rebuilding something by moving it to the cloud—this is the AWS cloud/enterprise IT story. Now
we have to think more in terms of a different persona and build for them.” The team developed on the
AWS Management Console—a user interface for their ML service—to launch first.
Developing Road Maps
Getting early versions of the product out was a priority, and the team focused on iterating on
customer input. Saha noted, “We believed velocity of innovation was especially critical in an early
domain where standards had not yet been established. Customer feedback played a key role in our
roadmap process.”
The first step, referred to as the planning season (typically Q2), set the stage for strategy and
vision. The team captured signals coming from customers using early tool and product
iterations. This feedback informed a planning exercise—thinking through short- and long-term bets—
allowing the team to write up a page or two describing who the customer was and what their needs
were. “We’ll identify four to five questions to answer,” said Thakur. “We think of these as business
pitches.” This then informed the team’s customer outreach and product adjustments. By the beginning
of the following year, the team would have spoken to more customers, understood which ideas were
b Notebooks referred to digital computation notebooks used to document procedures, data, calculations and findings.
worth pursuing, and picked out those they considered high-value. Thakur said, “This is what we call
‘working backwards.’”
The next step, or what was also called press release and frequently asked questions (PRFAQ), had
the team filtering the new product features through the customer’s and end user’s experience. Thakur
acknowledged, “This means there is a presence of the customer in the room. It helps align everyone in
the room around ‘Who are we building this for?’ It enables us to authentically represent who we are
building this for.” At the end of the process, the team sought stakeholder review and approvals at every
level. From there, they launched into customer discovery sessions, including writing user stories.
The third step involved concretely scoping those features and experiences that the planning process
and PRFAQ had specified to be solved (see Exhibit 4).
Whom to Build It For?
Machine learning customers fell into three broad categories—sophisticated,
mid-market, and just-getting-started. The sophisticated companies were often large tech firms whose
large data science teams applied ML across many aspects of their business. The
team’s research had concluded that, given the early stages for ML, there were only a few such
companies, but these top ML spenders were a significant opportunity. Saha said, “The top 10 can be a
disproportionate amount of spend. This was an attractive target.” Another group of customers were
companies with mid-sized data science teams doing ML. These customers were typically just starting
to deploy ML at scale in their companies, and the average revenue per customer was comparatively
much smaller. Likewise, this group of customers presented a smaller opportunity, given ML was still
a fairly immature domain. Finally, there were many companies who were just getting started with ML.
These typically had small data science teams, most of whom were still experimenting with ML. Saha
noted, “While the average spend of these customers was also small, the volume here was easily an
order of magnitude more than other customer profiles.”
Exhibit 1  AWS Structure

Source: Amazon, “An Introduction to AWS,” https://aws.amazon.com/startups/start-building/how-aws-works/, accessed September 2021.
Exhibit 2  Select List of Representative AWS Services, 2011

Function               Service Name
Compute                Amazon Elastic Compute Cloud (EC2); Amazon Elastic MapReduce; Auto Scaling
Content Delivery       Amazon CloudFront
Database               Amazon SimpleDB; Amazon Relational Database Service (RDS)
E-Commerce             Amazon Fulfillment Web Service (FWS)
Messaging              Amazon Simple Queue Service (SQS); Amazon Simple Notification Service (SNS)
Monitoring             Amazon CloudWatch
Networking             Amazon Virtual Private Cloud (VPC); Elastic Load Balancing
Payments and Billing   Amazon Flexible Payments Service; Amazon DevPay
Storage                Amazon Simple Storage Service (S3); Amazon Elastic Block Storage (EBS); AWS Import/Export
Support                AWS Premium Support
Web Traffic            Alexa Web Information Service; Alexa Top Sites
Workforce              Amazon Mechanical Turk
Source: Janakiram MSV, “AWS Service Sprawl Starts to Hurt the Cloud Ecosystem,” Forbes, January 8, 2018, https://bit.ly/3Dn6iSv, accessed December 2021.
Exhibit 3
Early Vision for Amazon SageMaker Product
Code to Support the PC Operating System
Note:
This diagram illustrates code supporting the PC operating system. Moving from left to right, source code, representing
higher level abstractions, is grabbed by the compiler which converts it (relying on a range of tools and environments,
including libraries, integrated development environment, software project management, code repositories and
debuggers) into executable code on the right. Continuous performance, cost, and ease of use improvements are the
benefits gained by this code process.
Comparing PC and Machine Learning
Note:
This diagram compares PC and machine learning. As with the process to convert source code to executable code,
machine learning starts with data which is then acted on by algorithms which gives a model as output.
Exhibit 3 (continued)
Early Vision for Amazon SageMaker Product
Code to Support the ML Operating System
Source:
Company documents.
Note:
The diagram illustrates the underlying tools and activities involved in the ML process. Data represents higher-level
abstractions and relies on SageMaker’s toolkit engines to unpack. From there algorithms and conversion rely on
libraries (i.e., SageMaker’s Deep Learning AMI; SM Studio, SM Experiments, SM Model Management; and SM
Debugger) to complete their actions. The resulting model coming out of this process relies on runtime monitoring (i.e.,
SM Monitor). Continuous performance, cost, ease of use improvements are managed or tracked by additional tools
(i.e., SageMaker’s Engine—EIA, Dynamic Training, Distributed Training, and MLPerf).
Exhibit 4  Illustrative Roadmap to Building High Velocity Innovation Around Customer Needs

Step 1: Working Backwards
- Think through short- and long-term bets.
- Consider alternative “business pitches.”
- Reach out to customers and discuss.
- Product adjustments.

Step 2: Press Release & Frequently Asked Questions (PRFAQ)
- Discuss as if a customer is in the room.
- Focus on: “Who are we building this feature for?”
- Stakeholder review.
- Writing user stories.

Step 3: Scoping the Feature
- Narrow activities to key features.
- Define features.
- Use planning process to identify priorities.
- Identify open issues in PRFAQ.

Source: Casewriter.
Endnotes
1 Ron Miller, “How AWS Came to Be,” TechCrunch, July 2, 2016, https://tcrn.ch/2ZI91GB, accessed September 2021.
2 Miller, “How AWS Came to Be.”
3 Tom Krazit, “How Amazon’s S3 Jumpstarted the Cloud Revolution,” Protocol, March 12, 2021, https://bit.ly/3Nw47AV,
accessed December 2021.
4 Miller, “How AWS Came to Be.”
5 Miller, “How AWS Came to Be.”
6 Amazon, “Cloud Computing with AWS,” https://aws.amazon.com/what-is-aws/, accessed September 2021.
7 Amazon, “An Introduction to AWS,” https://go.aws/3Nwse2w, accessed September 2021.
8 Karen Hao, “What Is Machine Learning?” MIT Technology Review, November 17, 2018, https://bit.ly/36zgMT8, accessed
September 2021.
9 Melanie Lefkowitz, “Professor’s Perceptron Paved the Way for AI—60 Years Too Soon,” Cornell Chronicle, September 25,
2019, https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon, accessed December
2021.
10 Bernard Marr, “A Short History of Machine Learning—Every Manager Should Read,” Forbes, February 19, 2016,
https://bit.ly/36whY9T, accessed December 2021.
11 Marr, “A Short History of Machine Learning—Every Manager Should Read.”
12 Marco Iansiti and Karim R. Lakhani, Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the
World (Boston: Harvard Business Review Press, 2020), pp. 64-69.
13 Bernard Marr, “The Top 10 AI and Machine Learning Use Cases Everyone Should Know About,” Forbes, September 30, 2016,
https://bit.ly/3wJkDrn, accessed November 2021.
14 IBM Cloud Education, “Machine Learning,” IBM, July 15, 2020, https://ibm.co/3IPFYBP, accessed September 2021.
15 Iansiti and Lakhani, Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World,
pp. 64-66.
16 Iansiti and Lakhani, Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World,
pp. 66-68.
17 Iansiti and Lakhani, Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World,
pp. 64-68.
18 Amazon, “Machine Learning with Amazon SageMaker.”
19 Akhil Bhadwal, “15 Best Machine Learning Libraries You Should Know in 2021,” hackr.io blog, February 26, 2021,
https://hackr.io/blog/best-machine-learning-libraries, accessed December 2021.