
AI Developers Stymied by Server Shortage at AWS, Microsoft, Google — The Information

Art by Mike Sullivan
By Aaron Holmes and Anissa Gardizy

April 7, 2023 6:00 AM PDT
Startups and other companies trying to capitalize on the artificial intelligence boom sparked by
OpenAI are running into a problem: They can’t find enough specialized computers to make
their own AI software.
A spike in demand for server chips that can train and run machine-learning software has caused a
shortage, prompting major cloud-server providers including Amazon Web Services, Microsoft, Google
and Oracle to limit their availability for customers, according to interviews with the cloud companies
and their customers. Some customers have reported monthslong wait times to rent the hardware.
THE TAKEAWAY
• Customers report difficulties renting AI servers from major cloud providers
• Smaller AI server providers are booming, nearing capacity
• The server chip crunch is affecting even OpenAI
“All the startups who are trying to get into this space…maybe they can get one [server] but there’s no
way they’re going to get five,” said Johnny Dallas, founder and CEO of Zeet, which sells software that
makes it easier for engineers to run apps across multiple clouds.
The server chip shortage is a frustrating hangup for software developers trying to build AI tools hinging
on recent advancements in machine-learning models. These programmers, at small and big companies
alike, are developing large-language models to make personalized writing coaches or search engines
that respond to questions with written answers rather than links, similar to OpenAI’s ChatGPT. Many
others are licensing and augmenting software from OpenAI and its rivals to create specialized customer
service chatbots and research tools for corporate employees. For instance, OpenAI software is helping
Morgan Stanley bankers find the best locations to auction a work of art, based on the bank’s myriad
internal reports on art markets.
Yasyf Mohamedali, an engineer-in-residence at venture capital firm Root Ventures, said he has spent
weeks trying to rent an AI server from AWS and Google Cloud but hasn’t been successful. He recently
managed to get access to one through a small startup that rents them.
“It is literally not possible to get access” to AI servers “unless you have some existing contract with
[major cloud providers] or you’re pre-paying for it,” said Mohamedali, who is trying to retrofit an old-fashioned photo booth to print images enhanced with AI.
At the heart of the problem is one company, Nvidia, that produces the majority of chips—known as
graphics processing units—required to develop the AI software. But the shortage doesn’t stem from
supply chain problems; rather, cloud providers may have failed to anticipate the current wave of new AI
customers and haven’t ordered enough chips, said Wedbush Securities analyst Matt Bryson. That’s
because they have been scaling back on building new servers in the past year as cloud-spending growth
has slowed, he said.
Now the cloud providers are scrambling to get more, though it takes Nvidia two to three months to
fulfill new orders, he said. Lately, Nvidia has been shipping its newest line of GPUs to cloud providers, a
development that could ease the current shortage. (An Nvidia spokesperson declined to comment.)
Cloud providers expanding their data centers also are running into problems getting enough energy
sources to power them, according to a February report from commercial real estate firm CBRE. Making
matters worse, training AI software requires so much computer processing power that some cloud
providers can’t split their GPU-powered servers between different customers the way they do with
servers for simpler tasks like hosting websites, according to a person who has worked for multiple
cloud providers.
Waiting for GPUs
Companies trying to rent a large block of GPU servers now have to wait at least several months to access
those chips from Amazon, Oracle, Microsoft and Google, according to Naveen Rao, co-founder and CEO
of MosaicML, which sells software to help AI developers run their machine-learning models in the
cloud. Among MosaicML’s customers, those that have made multiyear spending commitments have had
more luck getting GPU servers than small startups that haven’t, Rao said.
Avidan Ross, managing partner at Root Ventures, using a photo booth the firm built that generates AI art. Photo by Yasyf
Mohamedali / Root Ventures
New AWS customers have struggled to get immediate access to GPU servers, and in some cases the
company advised them to rent servers that use Trainium, a chip Amazon developed in-house, according
to two people with knowledge of the situation. But developers are more familiar with developing
software using GPUs, so they prefer those chips.
In recent months, multiple customers of Brev.dev, which helps developers use cloud-based servers to
train new AI models, reported they were unable to rent a single Nvidia GPU server from AWS, said
Brev.dev co-founder and CEO Nader Khalil. Customers that agreed to rent a large number of GPUs at
once had more luck, he said.
Microsoft has similarly told new cloud customers in the past month that they must wait at least several
weeks to access AI services that rely on GPU servers, and the company has been rationing access to GPUs
among its own research and product teams, The Information previously reported.
Microsoft in recent months has suggested some cloud customers give up GPU servers they paid for in
advance but aren’t currently using so it can sell the capacity to other companies, according to someone
with direct knowledge. (Customers can get their money back if they give up the reserved servers, they
said.) This person also said Microsoft is in early-stage discussions to buy Nvidia GPUs from companies
that bought them to support the bitcoin mining industry, which has collapsed.
Another company feeling the GPU crunch within Microsoft is OpenAI, which relies on Azure to run its
models. The limited GPU capacity has made it harder for OpenAI to add capacity for Foundry, a product
that gives its customers dedicated computers for running AI software, according to a person with
knowledge of the situation.
And Oracle in the past month turned away new AI customers, citing constrained GPU server capacity,
according to someone with direct knowledge of the conversations. Oracle has a much smaller cloud-server business but has seen an influx of new business from AI startups, thanks to its relatively
inexpensive prices for training and running large-language models, The Information reported in March.
An Oracle spokesperson did not respond to a request for comment. Jacinda Mein, a Google
spokesperson, said Google Cloud has been “able to serve nearly all customer demand” and is currently
adding more GPUs. Brandon Sanford, a spokesperson for Microsoft, said the company is “excited by the
surge we’re seeing from customers and [we] have processes in place to prioritize customer needs and
adjust for demand.” A spokesperson for AWS declined to comment on the record for this article, and an
OpenAI spokesperson declined to comment.
Startups Come to the Rescue
Unable to get GPUs from the major cloud providers, Root Ventures’ Mohamedali said he and other
founders have turned to a slew of startups such as RunPod, Lambda Labs, Crusoe Energy and
CoreWeave. Mohamedali said he secured access to a GPU server a few weeks ago from RunPod, which
helps companies sell unused capacity in their data centers, including GPU-powered servers.
Major clouds sell GPUs on an always-on basis, meaning customers often pay for server capacity they are
not actually using, said Erik Dunteman, founder and CEO of Banana, which lets companies rent GPU
servers based on the number of seconds they use them—a more attractive option for smaller developers.
“People who have cost constraints can’t afford to have 100 GPUs running idle,” he said.
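The economics Dunteman describes can be sketched with a quick back-of-the-envelope calculation. The rates and usage pattern below are hypothetical assumptions for illustration, not actual pricing from Banana or any cloud provider:

```python
# Hypothetical comparison of always-on vs. per-second GPU billing.
# All rates and usage figures are illustrative assumptions.

GPUS = 100                     # fleet size from the quote above
HOURS_USED_PER_DAY = 4         # assumed actual utilization
DAYS = 30                      # one billing month

ALWAYS_ON_RATE = 2.00          # assumed $/GPU-hour, billed 24/7
PER_SECOND_RATE = 2.50 / 3600  # assumed $/GPU-second, billed only while running

# Always-on: you pay for every hour, idle or not.
always_on_cost = GPUS * ALWAYS_ON_RATE * 24 * DAYS

# Per-second: you pay only for the seconds the GPUs actually run.
per_second_cost = GPUS * PER_SECOND_RATE * HOURS_USED_PER_DAY * 3600 * DAYS

print(f"Always-on:  ${always_on_cost:,.0f}/month")
print(f"Per-second: ${per_second_cost:,.0f}/month")
```

Under these assumptions, a fleet that sits idle 20 hours a day costs several times more on an always-on plan, even at a lower hourly rate — which is the gap per-second providers are targeting.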
But the smaller cloud-GPU providers are now also grappling with rising demand. Lambda Labs, which
sells access to such chips and other hardware for AI development, is “close to capacity” when it comes
to Nvidia GPUs, said co-founder and CEO Stephen Balaban. The entirety of the $44 million in funding
the 128-person company raised in March will go toward buying more of the chips, he added. RunPod
and Crusoe are also close to hitting a limit in the number of GPUs they can provide, company
representatives said.
Most Crusoe customers are seeking 100 to more than 1,000 GPUs at a time to train and run their AI
models, said CEO and co-founder Chase Lochmiller.
Though the smaller providers are alleviating some of the server-chip crunch, Mohamedali and Dallas
said the big clouds could soon catch up and offer a better experience through features that make it
easier to utilize the servers. (For his photo booth project, Mohamedali says he still uses AWS for
computing tasks that do not require a GPU.)
Quran Goes Down
GPU shortages have happened before. For instance, Tarteel, a Quran recitation app that uses GPU
servers to power audio transcriptions, was unable to quickly rent additional servers powered by those
chips from its cloud providers, Google Cloud and AWS, when demand spiked during the Ramadan
holiday in April last year. It needed between eight and 16 of the Nvidia chips at a time to train its AI
models.
“We’d request capacity and it just wouldn’t be there,” said Anas Abou Allaban, co-founder of Tarteel.
“We’d have to continue trying every 5 to 10 minutes.…It usually took a few hours,” he said.
Google at one point wouldn’t grant Allaban’s company more chips until he met with a salesperson, he
said, which meant the Tarteel service was offline for about 12 hours. (Spokespeople for Google and
Amazon declined to comment on Tarteel.)
As a result, Tarteel moved its computing workloads that require GPUs to CoreWeave, another small
provider of GPU servers. The startup now sometimes requests up to 100 GPUs and hasn’t run into
problems, Allaban said. But he said he wouldn’t be surprised if the server crunch hits smaller providers.
“They’ll have to keep up with the demand,” he said.
Amir Efrati contributed to this article.
Aaron Holmes is a reporter covering tech with a focus on enterprise and cybersecurity. You can reach him
at aaron@theinformation.com or on Signal at 706-347-1880.
Subscriber Comments
Justin Weil
Great article. Question: I thought the bitcoin/crypto mining GPUs were a different type than those used for AI
(A100). Is this not the case?
Alex Diep
Software Engineer - Google
Justin Weil There are two types of accelerators for mining crypto: ASICs and GPUs. ASICs are
application-specific for mining, so they are pointless here.
The GPUs used for mining tend to be consumer-grade GPUs like the RTX series or Radeon. These
consumer-grade GPUs can be used for small models, but they don't have enough memory to fit the big
models. Another problem is that they aren't oriented toward connecting multiple GPUs together.
Gamers today tend to have only a single GPU.
This is where the value of the A100 and professional-grade GPUs comes in. They allow for connecting
many of them to scale, which lets them process the biggest models that can't fit into a single GPU.
Nvidia removed NVLink on the 4090 since the 4090 can be as fast as a single A100, so allowing NVLink
would eat into A100 sales.
So, yes, they are different, and the mining GPUs aren't appropriate for training large models.
Justin Weil
Alex Diep thanks, so my supposition was correct. Which is why I didn’t get the author’s comment
about Microsoft wanting to buy GPUs from miners. Even if they were to, there’s plenty of supply of
those in the channel.
1
Like · Reply · Report · about 7 hours ago
John Rutledge
Unicorn Wrangler - Rutledge Family Capital LLC
Fascinating.
I'm wondering if Ethereum proof of work mining rigs brought online and accessed via the Render Network
might help put a material dent in the dearth of computing supply?
Anyone have any thoughts on that possibility?