
AI Developers Stymied by Server Shortage at AWS, Microsoft, Google — The Information

Art by Mike Sullivan
By Aaron Holmes and Anissa Gardizy

April 7, 2023 6:00 AM PDT
Startups and other companies trying to capitalize on the artificial intelligence boom sparked by
OpenAI are running into a problem: They can’t find enough specialized computers to make
their own AI software.
A spike in demand for server chips that can train and run machine-learning software has caused a
shortage, prompting major cloud-server providers including Amazon Web Services, Microsoft, Google
and Oracle to limit their availability for customers, according to interviews with the cloud companies
and their customers. Some customers have reported monthslong wait times to rent the hardware.
THE TAKEAWAY
• Customers report difficulties renting AI servers from major cloud providers
• Smaller AI server providers are booming, nearing capacity
• The server chip crunch is affecting even OpenAI
“All the startups who are trying to get into this space…maybe they can get one [server] but there’s no
way they’re going to get five,” said Johnny Dallas, founder and CEO of Zeet, which sells software that
makes it easier for engineers to run apps across multiple clouds.
The server chip shortage is a frustrating hangup for software developers trying to build AI tools hinging
on recent advancements in machine-learning models. These programmers, at small and big companies
alike, are developing large-language models to make personalized writing coaches or search engines
that respond to questions with written answers rather than links, similar to OpenAI’s ChatGPT. Many
others are licensing and augmenting software from OpenAI and its rivals to create specialized customer
service chatbots and research tools for corporate employees. For instance, OpenAI software is helping
Morgan Stanley bankers find the best locations to auction a work of art, based on the bank’s myriad
internal reports on art markets.
Yasyf Mohamedali, an engineer-in-residence at venture capital firm Root Ventures, said he has spent
weeks trying to rent an AI server from AWS and Google Cloud but hasn’t been successful. He recently
managed to get access to one through a small startup that rents them.
“It is literally not possible to get access” to AI servers “unless you have some existing contract with
[major cloud providers] or you’re pre-paying for it,” said Mohamedali, who is trying to retrofit an old-fashioned photo booth to print images enhanced with AI.
At the heart of the problem is one company, Nvidia, that produces the majority of chips—known as
graphics processing units—required to develop the AI software. But the shortage doesn’t stem from
supply chain problems; rather, cloud providers may have failed to anticipate the current wave of new AI
customers and haven’t ordered enough chips, said Wedbush Securities analyst Matt Bryson. That’s
because they have been scaling back on building new servers in the past year as cloud-spending growth
has slowed, he said.
Now the cloud providers are scrambling to get more, though it takes Nvidia two to three months to
fulfill new orders, he said. Lately, Nvidia has been shipping its newest line of GPUs to cloud providers, a
development that could ease the current shortage. (An Nvidia spokesperson declined to comment.)
Cloud providers expanding their data centers also are running into problems getting enough energy
sources to power them, according to a February report from commercial real estate firm CBRE. Making
matters worse, training AI software requires so much computer processing power that some cloud
providers can’t split their GPU-powered servers between different customers the way they do with
servers for simpler tasks like hosting websites, according to a person who has worked for multiple
cloud providers.
Waiting for GPUs
Companies trying to rent a large block of GPU servers now have to wait at least several months to access
those chips from Amazon, Oracle, Microsoft and Google, according to Naveen Rao, co-founder and CEO
of MosaicML, which sells software to help AI developers run their machine-learning models in the
cloud. Among MosaicML’s customers, those that have made multiyear spending commitments have had
more luck getting GPU servers than small startups that haven’t, Rao said.
Avidan Ross, managing partner at Root Ventures, using a photo booth the firm built that generates AI art. Photo by Yasyf
Mohamedali / Root Ventures
New AWS customers have struggled to get immediate access to GPU servers, and in some cases the
company advised them to rent servers that use Trainium, a chip Amazon developed in-house, according
to two people with knowledge of the situation. But developers are more familiar with developing
software using GPUs, so they prefer those chips.
In recent months, multiple customers of Brev.dev, which helps developers use cloud-based servers to
train new AI models, reported they were unable to rent a single Nvidia GPU server from AWS, said
Brev.dev co-founder and CEO Nader Khalil. Customers that agreed to rent a large number of GPUs at
once had more luck, he said.
Microsoft has similarly told new cloud customers in the past month that they must wait at least several
weeks to access AI services that rely on GPU servers, and the company has been rationing access to GPUs
among its own research and product teams, The Information previously reported.
Microsoft in recent months has suggested some cloud customers give up GPU servers they paid for in
advance but aren’t currently using so it can sell the capacity to other companies, according to someone
with direct knowledge. (Customers can get their money back if they give up the reserved servers, they
said.) This person also said Microsoft is in early-stage discussions to buy Nvidia GPUs from companies
that bought them to support the bitcoin mining industry, which has collapsed.
Another company feeling the GPU crunch within Microsoft is OpenAI, which relies on Azure to run its
models. The limited GPU capacity has made it harder for OpenAI to add capacity for Foundry, a product
that gives its customers dedicated computers for running AI software, according to a person with
knowledge of the situation.
And Oracle in the past month turned away new AI customers, citing constrained GPU server capacity,
according to someone with direct knowledge of the conversations. Oracle has a much smaller cloud-server business but has seen an influx of new business from AI startups, thanks to its relatively
inexpensive prices for training and running large-language models, The Information reported in March.
An Oracle spokesperson did not respond to a request for comment. Jacinda Mein, a Google
spokesperson, said Google Cloud has been “able to serve nearly all customer demand” and is currently
adding more GPUs. Brandon Sanford, a spokesperson for Microsoft, said the company is “excited by the
surge we’re seeing from customers and [we] have processes in place to prioritize customer needs and
adjust for demand.” A spokesperson for AWS declined to comment on the record for this article, and an
OpenAI spokesperson declined to comment.
Startups Come to the Rescue
Unable to get GPUs from the major cloud providers, Root Ventures’ Mohamedali said he and other
founders have turned to a slew of startups such as RunPod, Lambda Labs, Crusoe Energy and
CoreWeave. Mohamedali said he secured access to a GPU server a few weeks ago from RunPod, which
helps companies sell unused capacity in their data centers, including GPU-powered servers.
Major clouds sell GPUs on an always-on basis, meaning customers often pay for server capacity they are
not actually using, said Erik Dunteman, founder and CEO of Banana, which lets companies rent GPU
servers based on the number of seconds they use them—a more attractive option for smaller developers.
“People who have cost constraints can’t afford to have 100 GPUs running idle,” he said.
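The economics Dunteman describes can be sketched with a quick back-of-the-envelope calculation. The rates and usage pattern below are hypothetical assumptions for illustration, not actual pricing from Banana or any cloud provider:

```python
# Hypothetical comparison of always-on vs. per-second GPU billing.
# All rates and usage figures are illustrative assumptions.

GPUS = 100                     # fleet size from the quote above
HOURS_USED_PER_DAY = 4         # assumed actual utilization
DAYS = 30                      # one billing month

ALWAYS_ON_RATE = 2.00          # assumed $/GPU-hour, billed 24/7
PER_SECOND_RATE = 2.50 / 3600  # assumed $/GPU-second, billed only while running

# Always-on: you pay for every hour, idle or not.
always_on_cost = GPUS * ALWAYS_ON_RATE * 24 * DAYS

# Per-second: you pay only for the seconds the GPUs actually run.
per_second_cost = GPUS * PER_SECOND_RATE * HOURS_USED_PER_DAY * 3600 * DAYS

print(f"Always-on:  ${always_on_cost:,.0f}/month")
print(f"Per-second: ${per_second_cost:,.0f}/month")
```

Under these assumptions, a fleet that sits idle 20 hours a day costs several times more on an always-on plan, even at a lower hourly rate — which is the gap per-second providers are targeting.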
But the smaller cloud-GPU providers are now also grappling with rising demand. Lambda Labs, which
sells access to such chips and other hardware for AI development, is “close to capacity” when it comes
to Nvidia GPUs, said co-founder and CEO Stephen Balaban. The entirety of the $44 million in funding
the 128-person company raised in March will go toward buying more of the chips, he added. RunPod
and Crusoe are also close to hitting a limit in the number of GPUs they can provide, company
representatives said.
Most Crusoe customers are seeking 100 to more than 1,000 GPUs at a time to train and run their AI
models, said CEO and co-founder Chase Lochmiller.
Though the smaller providers are alleviating some of the server-chip crunch, Mohamedali and Dallas
said the big clouds could soon catch up and offer a better experience through features that make it
easier to utilize the servers. (For his photo booth project, Mohamedali says he still uses AWS for
computing tasks that do not require a GPU.)
Quran Goes Down
GPU shortages have happened before. For instance, Tarteel, a Quran recitation app that uses GPU
servers to power audio transcriptions, was unable to quickly rent additional servers powered by those
chips from its cloud providers, Google Cloud and AWS, when demand spiked during the Ramadan
holiday in April last year. It needed between eight and 16 of the Nvidia chips at a time to train its AI
models.
“We’d request capacity and it just wouldn’t be there,” said Anas Abou Allaban, co-founder of Tarteel.
“We’d have to continue trying every 5 to 10 minutes.…It usually took a few hours,” he said.
Google at one point wouldn’t grant Allaban’s company more chips until he met with a salesperson, he
said, which meant the Tarteel service was offline for about 12 hours. (Spokespeople for Google and
Amazon declined to comment on Tarteel.)
As a result, Tarteel moved its computing workloads that require GPUs to CoreWeave, another small
provider of GPU servers. The startup now sometimes requests up to 100 GPUs and hasn’t run into
problems, Allaban said. But he said he wouldn’t be surprised if the server crunch hits smaller providers.
“They’ll have to keep up with the demand,” he said.
Amir Efrati contributed to this article.
Aaron Holmes is a reporter covering tech with a focus on enterprise and cybersecurity. You can reach him
at aaron@theinformation.com or on Signal at 706-347-1880.
Subscriber Comments
Justin Weil
Great article. Question: I thought the bitcoin/crypto mining GPUs were a different type than those used for AI
(A100). Is this not the case?
Alex Diep
Software Engineer - Google
Justin Weil There are two types of accelerators for mining crypto: ASICs and GPUs. ASICs are
application-specific for mining, so they are pointless here.
The GPUs used for mining tend to be consumer-grade GPUs like the RTX series or Radeon. These
consumer-grade GPUs can be used for small models, but they don't have enough memory to fit the big
models. Another problem is that they aren't oriented toward connecting multiple GPUs together.
Gamers today tend to have only a single GPU.
This is where the value of the A100 and professional-grade GPUs comes in. They allow for connecting
many of them to scale, which lets them process the biggest models that can't fit into a single GPU.
Nvidia removed NVLink on the 4090 since the 4090 can be as fast as a single A100, so allowing NVLink
would eat into A100 sales.
So, yes, they are different, and the mining GPUs aren't appropriate for training large models.
Justin Weil
Alex Diep thanks, so my supposition was correct. Which is why I didn’t get the author’s comment
about Microsoft wanting to buy GPUs from miners. Even if they were to, there’s plenty of supply of
those in the channel.
1
Like · Reply · Report · about 7 hours ago
John Rutledge
Unicorn Wrangler - Rutledge Family Capital LLC
Fascinating.
I'm wondering if Ethereum proof of work mining rigs brought online and accessed via the Render Network
might help put a material dent in the dearth of computing supply?
Anyone have any thoughts on that possibility?