Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 Introduction to the AI Index Report 2023 Welcome to the sixth edition of the AI Index Report! This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI. From the Co-Directors AI has moved into its era of deployment; throughout 2022 and the beginning of 2023, new large-scale AI models have been released every month. These models, such as ChatGPT, Stable Diffusion, Whisper, and DALL-E 2, are capable of an increasingly broad range of tasks, from text manipulation and analysis, to image generation, to unprecedentedly good speech recognition. These systems demonstrate capabilities in question answering and the generation of text, image, and code unimagined a decade ago, and they outperform the state of the art on many benchmarks, old and new. However, they are prone to hallucination, routinely biased, and can be tricked into serving nefarious aims, highlighting the complicated ethical challenges associated with their deployment. Although 2022 was the first year in a decade where private AI investment decreased, AI is still a topic of great interest to policymakers, industry leaders, researchers, and the public. Policymakers are talking about AI more than ever before. Industry leaders that have integrated AI into their businesses are seeing tangible cost and revenue benefits. The number of AI publications and collaborations continues to increase. And the public is forming sharper opinions about AI and which elements they like or dislike. AI will continue to improve and, as such, become a greater part of all our lives. Given the increased presence of this technology and its potential for massive disruption, we should all begin thinking more critically about how exactly we want AI to be developed and deployed. We should also ask questions about who is deploying it—as our analysis shows, AI is increasingly defined by the actions of a small set of private sector actors, rather than a broader range of societal actors. This year’s AI Index paints a picture of where we are so far with AI, in order to highlight what might await us in the future. Jack Clark and Ray Perrault Artificial Intelligence Index Report 2023 Top Ten Takeaways 1 Industry races ahead of academia. Until 2014, most significant machine learning models were released by academia. Since then, industry has taken over. In 2022, there were 32 significant industry-produced machine learning models compared to just three produced by academia. Building state-of-the-art AI systems increasingly requires large amounts of data, computer power, and money—resources that industry actors inherently possess in greater amounts compared to nonprofits and academia. 2 Performance saturation on traditional benchmarks. AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal. Moreover, the speed at which benchmark saturation is being reached is increasing. However, new, more comprehensive benchmarking suites such as BIG-bench and HELM are being released. 3 AI is both helping and harming the environment. New research suggests that AI systems can have serious environmental impacts. According to Luccioni et al., 2022, BLOOM’s training run emitted 25 times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still, new reinforcement learning models like BCOOLER show that AI systems can be used to optimize energy usage. 4 The world’s best new scientist … AI? 5 The number of incidents concerning the misuse of AI is rapidly rising. AI models are starting to rapidly accelerate scientific progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix manipulation, and generate new antibodies. According to the AIAAIC database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using call-monitoring technology on their inmates. This growth is evidence of both greater use of AI technologies and awareness of misuse possibilities. 6 The demand for AI-related professional skills is increasing across virtually every American industrial sector. Across every sector in the United States for which there is data (with the exception of agriculture, forestry, fishing, and hunting), the number of AIrelated job postings has increased on average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for workers with AIrelated skills. Artificial Intelligence Index Report 2023 Top Ten Takeaways (cont’d) 7 For the first time in the last decade, year-over-year private investment in AI decreased. Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021. The total number of AI-related funding events as well as the number of newly funded AI companies likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased. In 2022 the amount of private investment in AI was 18 times greater than it was in 2013. 8 While the proportion of companies adopting AI has plateaued, the companies that have adopted AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful cost decreases and revenue increases. 9 Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127 countries shows that the number of bills containing “artificial intelligence” that were passed into law grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times since 2016. 10 Chinese citizens are among those who feel the most positively about AI products and services. Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest proportion of surveyed countries) agreed with the statement that products and services using AI have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the lowest of surveyed countries) agreed that products and services using AI had more benefits than drawbacks. Artificial Intelligence Index Report 2023 Steering Committee Co-directors Jack Clark Anthropic, OECD Raymond Perrault SRI International Members Erik Brynjolfsson Stanford University Katrina Ligett Hebrew University John Etchemendy Stanford University Terah Lyons Juan Carlos Niebles Stanford University, Salesforce Vanessa Parli Stanford University James Manyika Google, University of Oxford Yoav Shoham (Founding Director) Stanford University, AI21 Labs Russell Wald Stanford University Staff and Researchers Research Manager and Editor in Chief Research Associate Nestor Maslej Stanford University Loredana Fattorini Stanford University Affiliated Researchers Elif Kiesow Cortez Stanford Law School Research Fellow Helen Ngo Hugging Face Robi Rahman Data Scientist Alexandra Rome Freelance Researcher Graduate Researcher Han Bai Stanford University Undergraduate Researchers Vania Chow Stanford University Siddhartha Javvaji Stanford University Mena Hassan Stanford University Naima Patel Stanford University Sukrut Oak Stanford University Stone Yang Stanford University Lucy Zimmerman Stanford University Elizabeth Zhu Stanford University Artificial Intelligence Index Report 2023 How to Cite This Report Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, “The AI Index 2023 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023. The AI Index 2023 Annual Report by Stanford University is licensed under Attribution-NoDerivatives 4.0 International. Public Data and Tools The AI Index 2023 Report is supplemented by raw data and an interactive tool. We invite each reader to use the data and the tool in a way most relevant to their work and interests. Raw data and charts: The public data and high-resolution images of all the charts in the report are available on Google Drive. Global AI Vibrancy Tool: Compare up to 30 countries across 21 indicators. The Global AI Vibrancy tool will be updated in the latter half of 2023. AI Index and Stanford HAI The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index was conceived within the One Hundred Year Study on AI (AI100). We welcome feedback and new ideas for next year. Contact us at AI-Index-Report@stanford.edu. Artificial Intelligence Index Report 2023 Supporting Partners Analytics and Research Partners Artificial Intelligence Index Report 2023 Contributors We want to acknowledge the following individuals by chapter and section for their contributions of data, analysis, advice, and expert commentary included in the AI Index 2023 Report: Research and Development Sara Abdulla, Catherine Aiken, Luis Aranda, Peter Cihon, Jack Clark, Loredana Fattorini, Nestor Maslej, Besher Massri, Vanessa Parli, Naima Patel, Ray Perrault, Robi Rahman, Alexandra Rome, Kevin Xu Technical Performance Jack Clark, Loredana Fattorini, Siddhartha Javvaji, Katrina Ligett, Nestor Maslej, Juan Carlos Niebles, Sukrut Oak, Vanessa Parli, Ray Perrault, Robi Rahman, Alexandra Rome, Yoav Shoham, Elizabeth Zhu Technical AI Ethics Jack Clark, Loredana Fattorini, Katrina Ligett, Nestor Maslej, Helen Ngo, Sukrut Oak, Vanessa Parli, Ray Perrault, Alexandra Rome, Elizabeth Zhu, Lucy Zimmerman Economy Susanne Bieller, Erik Brynjolfsson, Vania Chow, Jack Clark, Natalia Dorogi, Murat Erer, Loredana Fattorini, Akash Kaura, James Manyika, Nestor Maslej, Layla O’Kane, Vanessa Parli, Ray Perrault, Brittany Presten, Alexandra Rome, Nicole Seredenko, Bledi Taska, Bill Valle, Casey Weston Education Han Bai, Betsy Bizot, Jack Clark, John Etchemendy, Loredana Fattorini, Katrina Ligett, Nestor Maslej, Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome Policy and Governance Meghan Anand, Han Bai, Vania Chow, Jack Clark, Elif Kiesow Cortez, Rebecca DeCrescenzo, Loredana Fattorini, Taehwa Hong, Joe Hsu, Kai Kato, Terah Lyons, Nestor Maslej, Alistair Murray, Vanessa Parli, Ray Perrault, Alexandra Rome, Sarah Smedley, Russell Wald, Brian Williams, Catherina Xu, Stone Yang, Katie Yoon, Daniel Zhang Diversity Han Bai, Betsy Bizot, Jack Clark, Loredana Fattorini, Nezihe Merve Gürel, Mena Hassan, Katrina Ligett, Nestor Maslej, Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome, Sarah Tan, Lucy Zimmerman Public Opinion Jack Clark, Loredana Fattorini, Mena Hassan, Nestor Maslej, Vanessa Parli, Ray Perrault, Alexandra Rome, Nicole Seredenko, Bill Valle, Lucy Zimmerman Conference Attendance Terri Auricchio (ICML), Lee Campbell (ICLR), Cassio de Campos (UAI), Meredith Ellison (AAAI), Nicole Finn (CVPR), Vasant Gajanan (AAAI), Katja Hofmann (ICLR), Gerhard Lakemeyer (KR), Seth Lazar (FAccT), Shugen Ma (IROS), Becky Obbema (NeurIPS), Vesna Sabljakovic-Fritz (IJCAI), Csaba Szepesvari (ICML), Matthew Taylor (AAMAS), Sylvie Thiebaux (ICAPS), Pradeep Varakantham (ICAPS) Artificial Intelligence Index Report 2023 We thank the following organizations and individuals who provided data for inclusion in the AI Index 2023 Report: Organizations Code.org Sean Roberts Lightcast Layla O’Kane, Bledi Taska Center for Security and Emerging Technology, Georgetown University Sara Abdulla, Catherine Aiken LinkedIn Murat Erer, Akash Kaura, Casey Weston Computing Research Association Betsy Bizot GitHub Peter Cihon, Kevin Xu Govini Rebecca DeCrescenzo, Joe Hsu, Sarah Smedley McKinsey & Company Natalia Dorogi, Brittany Presten NetBase Quid Nicole Seredenko, Bill Valle OECD.AI Policy Observatory Luis Aranda, Besher Massri Women in Machine Learning Nezihe Merve Gürel, Sarah Tan We also would like to thank Jeanina Casusi, Nancy King, Shana Lynch, Jonathan Mindes, Michi Turner, and Madeleine Wright for their help in preparing this report, and Joe Hinman and Santanu Mukherjee for their help in maintaining the AI Index website. Artificial Intelligence Index Report 2023 Table of Contents Report Highlights 11 Chapter 1 Research and Development 20 Chapter 2 Technical Performance 69 Chapter 3 Technical AI Ethics 125 Chapter 4 The Economy 168 Chapter 5 Education 234 Chapter 6 Policy and Governance 263 Chapter 7 Diversity 296 Chapter 8 Public Opinion 319 Appendix ACCESS THE PUBLIC DATA 344 Artificial Intelligence Index Report 2023 Report Highlights Chapter 1: Research and Development The United States and China had the greatest number of cross-country collaborations in AI publications from 2010 to 2021, although the pace of collaboration has slowed. The number of AI research collaborations between the United States and China increased roughly 4 times since 2010, and was 2.5 times greater than the collaboration totals of the next nearest country pair, the United Kingdom and China. However the total number of U.S.-China collaborations only increased by 2.1% from 2020 to 2021, the smallest year-over-year growth rate since 2010. AI research is on the rise, across the board. The total number of AI publications has more than doubled since 2010. The specific AI topics that continue dominating research include pattern recognition, machine learning, and computer vision. China continues to lead in total AI journal, conference, and repository publications. The United States is still ahead in terms of AI conference and repository citations, but those leads are slowly eroding. Still, the majority of the world’s large language and multimodal models (54% in 2022) are produced by American institutions. Industry races ahead of academia. Until 2014, most significant machine learning models were released by academia. Since then, industry has taken over. In 2022, there were 32 significant industry-produced machine learning models compared to just three produced by academia. Building state-of-the-art AI systems increasingly requires large amounts of data, computer power, and money—resources that industry actors inherently possess in greater amounts compared to nonprofits and academia. Large language models are getting bigger and more expensive. GPT-2, released in 2019, considered by many to be the first large language model, had 1.5 billion parameters and cost an estimated $50,000 USD to train. PaLM, one of the flagship large language models launched in 2022, had 540 billion parameters and cost an estimated $8 million USD—PaLM was around 360 times larger than GPT-2 and cost 160 times more. It’s not just PaLM: Across the board, large language and multimodal models are becoming larger and pricier. Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance Performance saturation on traditional benchmarks. AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal. Moreover, the speed at which benchmark saturation is being reached is increasing. However, new, more comprehensive benchmarking suites such as BIG-bench and HELM are being released. Generative AI breaks into the public consciousness. 2022 saw the release of text-to-image models like DALL-E 2 and Stable Diffusion, text-to-video systems like Make-A-Video, and chatbots like ChatGPT. Still, these systems can be prone to hallucination, confidently outputting incoherent or untrue responses, making it hard to rely on them for critical applications. AI systems become more flexible. Traditionally AI systems have performed well on narrow tasks but have struggled across broader tasks. Recently released models challenge that trend; BEiT-3, PaLI, and Gato, among others, are single AI systems increasingly capable of navigating multiple tasks (for example, vision, language). Capable language models still struggle with reasoning. Language models continued to improve their generative capabilities, but new research suggests that they still struggle with complex planning tasks. AI is both helping and harming the environment. New research suggests that AI systems can have serious environmental impacts. According to Luccioni et al., 2022, BLOOM’s training run emitted 25 times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still, new reinforcement learning models like BCOOLER show that AI systems can be used to optimize energy usage. The world’s best new scientist … AI? AI models are starting to rapidly accelerate scientific progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix manipulation, and generate new antibodies. AI starts to build better AI. Nvidia used an AI reinforcement learning agent to improve the design of the chips that power AI systems. Similarly, Google recently used one of its language models, PaLM, to suggest ways to improve the very same model. Self-improving AI learning will accelerate AI progress. Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics The effects of model scale on bias and toxicity are confounded by training data and mitigation methods. In the past year, several institutions have built their own large models trained on proprietary data—and while large models are still toxic and biased, new evidence suggests that these issues can be somewhat mitigated after training larger models with instruction-tuning. Generative models have arrived and so have their ethical problems. In 2022, generative models became part of the zeitgeist. These models are capable but also come with ethical challenges. Textto-image generators are routinely biased along gender dimensions, and chatbots like ChatGPT can be tricked into serving nefarious aims. The number of incidents concerning the misuse of AI is rapidly rising. According to the AIAAIC database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using call-monitoring technology on their inmates. This growth is evidence of both greater use of AI technologies and awareness of misuse possibilities. Fairer models may not be less biased. Extensive analysis of language models suggests that while there is a clear correlation between performance and fairness, fairness and bias can be at odds: Language models which perform better on certain fairness benchmarks tend to have worse gender bias. Interest in AI ethics continues to skyrocket. The number of accepted submissions to FAccT, a leading AI ethics conference, has more than doubled since 2021 and increased by a factor of 10 since 2018. 2022 also saw more submissions than ever from industry actors. Automated fact-checking with natural language processing isn’t so straightforward after all. While several benchmarks have been developed for automated fact-checking, researchers find that 11 of 16 of such datasets rely on evidence “leaked” from fact-checking reports which did not exist at the time of the claim surfacing. Artificial Intelligence Index Report 2023 Chapter 4: The Economy The demand for AI-related professional skills is increasing across virtually every American industrial sector. Across every sector in the United States for which there is data (with the exception of agriculture, forestry, fishing, and hunting), the number of AI-related job postings has increased on average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for workers with AI-related skills. For the first time in the last decade, year-over-year private investment in AI decreased. Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021. The total number of AI-related funding events as well as the number of newly funded AI companies likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased. In 2022 the amount of private investment in AI was 18 times greater than it was in 2013. Once again, the United States leads in investment in AI. The U.S. led the world in terms of total amount of AI private investment. In 2022, the $47.4 billion invested in the U.S. was roughly 3.5 times the amount invested in the next highest country, China ($13.4 billion). The U.S. also continues to lead in terms of total number of newly funded AI companies, seeing 1.9 times more than the European Union and the United Kingdom combined, and 3.4 times more than China. In 2022, the AI focus area with the most investment was medical and healthcare ($6.1 billion); followed by data management, processing, and cloud ($5.9 billion); and Fintech ($5.5 billion). However, mirroring the broader trend in AI private investment, most AI focus areas saw less investment in 2022 than in 2021. In the last year, the three largest AI private investment events were: (1) a $2.5 billion funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of electric vehicles; (2) a $1.5 billion Series E funding round for Anduril Industries, a U.S. defense products company that builds technology for military agencies and border surveillance; and (3) a $1.2 billion investment in Celonis, a business-data consulting company based in Germany. While the proportion of companies adopting AI has plateaued, the companies that have adopted AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful cost decreases and revenue increases. Artificial Intelligence Index Report 2023 Chapter 4: The Economy (cont’d) AI is being deployed by businesses in multifaceted ways. The AI capabilities most likely to have been embedded in businesses include robotic process automation (39%), computer vision (34%), NL text understanding (33%), and virtual agents (33%). Moreover, the most commonly adopted AI use case in 2022 was service operations optimization (24%), followed by the creation of new AI-based products (20%), customer segmentation (19%), customer service analytics (19%), and new AI-based enhancement of products (19%). AI tools like Copilot are tangibly helping workers. Results of a GitHub survey on the use of Copilot, a text-to-code AI system, find that 88% of surveyed respondents feel more productive when using the system, 74% feel they are able to focus on more satisfying work, and 88% feel they are able to complete tasks more quickly. China dominates industrial robot installations. In 2013, China overtook Japan as the nation installing the most industrial robots. Since then, the gap between the total number of industrial robots installed by China and the next-nearest nation has widened. In 2021, China installed more industrial robots than the rest of the world combined. Artificial Intelligence Index Report 2023 Chapter 5: Education More and more AI specialization. The proportion of new computer science PhD graduates from U.S. universities who specialized in AI jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010. New AI PhDs increasingly head to industry. In 2011, roughly the same proportion of new AI PhD graduates took jobs in industry (40.9%) as opposed to academia (41.6%). Since then, however, a majority of AI PhDs have headed to industry. In 2021, 65.4% of AI PhDs took jobs in industry, more than double the 28.2% who took jobs in academia. New North American CS, CE, and information faculty hires stayed flat. In the last decade, the total number of new North American computer science (CS), computer engineering (CE), and information faculty hires has decreased: There were 710 total hires in 2021 compared to 733 in 2012. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and then dropped to 324 in 2021. The gap in external research funding for private versus public American CS departments continues to widen. In 2011, the median amount of total expenditure from external sources for computing research was roughly the same for private and public CS departments in the United States. Since then, the gap has widened, with private U.S. CS departments receiving millions more in additional funding than public universities. In 2021, the median expenditure for private universities was $9.7 million, compared to $5.7 million for public universities. Interest in K–12 AI and computer science education grows in both the United States and the rest of the world. In 2021, a total of 181,040 AP computer science exams were taken by American students, a 1.0% increase from the previous year. Since 2007, the number of AP computer science exams has increased ninefold. As of 2021, 11 countries, including Belgium, China, and South Korea, have officially endorsed and implemented a K–12 AI curriculum. Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127 countries shows that the number of bills containing “artificial intelligence” that were passed into law grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times since 2016. From talk to enactment—the U.S. passed more AI bills than ever before. In 2021, only 2% of all federal AI bills in the United States were passed into law. This number jumped to 10% in 2022. Similarly, last year 35% of all state-level AI bills were passed into law. When it comes to AI, policymakers have a lot of thoughts. A qualitative analysis of the parliamentary proceedings of a diverse group of nations reveals that policymakers think about AI from a wide range of perspectives. For example, in 2022, legislators in the United Kingdom discussed the risks of AI-led automation; those in Japan considered the necessity of safeguarding human rights in the face of AI; and those in Zambia looked at the possibility of using AI for weather forecasting. The U.S. government continues to increase spending on AI. Since 2017, the amount of U.S. government AI-related contract spending has increased roughly 2.5 times. The legal world is waking up to AI. In 2022, there were 110 AI-related legal cases in United States state and federal courts, roughly seven times more than in 2016. The majority of these cases originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual property, and contract law. Artificial Intelligence Index Report 2023 Chapter 7: Diversity North American bachelor’s, master’s, and PhD-level computer science students are becoming more ethnically diverse. Although white students are still the most represented ethnicity among new resident bachelor’s, master’s, and PhD-level computer science graduates, students from other ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming increasingly more represented. For example, in 2011, 71.9% of new resident CS bachelor’s graduates were white. In 2021, that number dropped to 46.7%. New AI PhDs are still overwhelmingly male. In 2021, 78.7% of new AI PhDs were male. Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender imbalance in higher-level AI education. Women make up an increasingly greater share of CS, CE, and information faculty hires. Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from 24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male (75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary. American K–12 computer science education has become more diverse, in terms of both gender and ethnicity. The share of AP computer science exams taken by female students increased from 16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and Black/African American students taking AP computer science has likewise increased. Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion Chinese citizens are among those who feel the most positively about AI products and services. Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest proportion of surveyed countries) agreed with the statement that products and services using AI have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the lowest of surveyed countries) agreed that products and services using AI had more benefits than drawbacks. Men tend to feel more positively about AI products and services than women. Men are also more likely than women to believe that AI will mostly help rather than harm. According to the 2022 IPSOS survey, men are more likely than women to report that AI products and services make their lives easier, trust companies that use AI, and feel that AI products and services have more benefits than drawbacks. A 2021 survey by Gallup and Lloyd’s Register Foundation likewise revealed that men are more likely than women to agree with the statement that AI will mostly help rather than harm their country in the next 20 years. People across the world and especially America remain unconvinced by self-driving cars. In a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good idea for society. Different causes for excitement and concern. Among a sample of surveyed Americans, those who report feeling excited about AI are most excited about the potential to make life and society better (31%) and to save time and make things more efficient (13%). Those who report feeling more concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%); and the lack of human connection (12%). NLP researchers … have some strong opinions as well. According to a survey widely distributed to NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence, 41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal change. These were some of the many strong opinions held by the NLP research community. Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 1: Research and Development Table of Contents Chapter 1 Preview 20 Artificial Intelligence Index Report 2023 CHAPTER 1 PREVIEW: Research and Development Overview 22 Computer Vision 46 Chapter Highlights 23 Natural Language Processing 47 Speech Recognition 48 1.1 Publications 24 Overview 24 Total Number of AI Publications 24 1.2 Trends in Significant Machine Learning Systems By Type of Publication 25 General Machine Learning Systems By Field of Study 26 System Types 49 By Sector 27 Sector Analysis 50 Cross-Country Collaboration 29 National Affiliation 51 Cross-Sector Collaboration 31 Systems 51 32 Authorship 53 AI Journal Publications 49 49 Overview 32 Parameter Trends 54 By Region 33 Compute Trends 56 By Geographic Area 34 Citations 35 National Affiliation 58 36 Parameter Count 60 Overview 36 Training Compute 61 By Region 37 Training Cost 62 By Geographic Area 38 Citations 39 1.3 AI Conferences 64 AI Repositories 40 Conference Attendance 64 Overview 40 By Region 41 1.4 Open-Source AI Software 66 By Geographic Area 42 Projects 66 Citations 43 Stars 68 AI Conference Publications Narrative Highlight: Top Publishing Institutions All Fields Table of Contents 44 Large Language and Multimodal Models 58 ACCESS THE PUBLIC DATA 44 Chapter 1 Preview 21 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development Overview This chapter captures trends in AI R&D. It begins by examining AI publications, including journal articles, conference papers, and repositories. Next it considers data on significant machine learning systems, including large language and multimodal models. Finally, the chapter concludes by looking at AI conference attendance and open-source AI research. Although the United States and China continue to dominate AI R&D, research efforts are becoming increasingly geographically dispersed. Table of Contents Chapter 1 Preview 22 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development Chapter Highlights The United States and China had the greatest number of cross-country collaborations in AI publications from 2010 to 2021, although the pace of collaboration has since slowed. The number of AI research collaborations between the United States and China increased roughly 4 times since 2010, and was 2.5 times greater than the collaboration totals of the next nearest country pair, the United Kingdom and China. However, the total number of U.S.-China collaborations only increased by 2.1% from 2020 to 2021, the smallest year-overyear growth rate since 2010. AI research is on the rise, across the board. The total number of AI publications has more than doubled since 2010. The specific AI topics that continue to dominate research include pattern recognition, machine learning, and computer vision. China continues to lead in total AI journal, conference, and repository publications. The United States is still ahead in terms of AI conference and repository citations, but those leads are slowly eroding. Still, the majority of the world’s large language and multimodal models (54% in 2022) are produced by American institutions. Table of Contents Chapter 1 Preview Industry races ahead of academia. Until 2014, most significant machine learning models were released by academia. Since then, industry has taken over. In 2022, there were 32 significant industry-produced machine learning models compared to just three produced by academia. Building state-of-the-art AI systems increasingly requires large amounts of data, computer power, and money—resources that industry actors inherently possess in greater amounts compared to nonprofits and academia. Large language models are getting bigger and more expensive. GPT-2, released in 2019, considered by many to be the first large language model, had 1.5 billion parameters and cost an estimated $50,000 USD to train. PaLM, one of the flagship large language models launched in 2022, had 540 billion parameters and cost an estimated $8 million USD—PaLM was around 360 times larger than GPT-2 and cost 160 times more. It’s not just PaLM: Across the board, large language and multimodal models are becoming larger and pricier. 23 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications This section draws on data from the Center for Security and Emerging Technology (CSET) at Georgetown University. CSET maintains a merged corpus of scholarly literature that includes Digital Science’s Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph, China National Knowledge Infrastructure, arXiv, and Papers With Code. In that corpus, CSET applied a classifier to identify Englishlanguage publications related to the development or application of AI and ML since 2010. For this year’s report, CSET also used select Chinese AI keywords to identify Chinese-language AI papers; CSET did not deploy this method for previous iterations of the AI Index report.1 In last year’s edition of the report, publication trends were reported up to the year 2021. However, given that there is a significant lag in the collection of publication metadata, and that in some cases it takes until the middle of any given year to fully capture the previous year’s publications, in this year’s report, the AI Index team elected to examine publication trends only through 2021, which we, along with CSET, are confident yields a more fully representative report. 1.1 Publications Overview publication and citation data by region for AI journal articles, conference papers, repositories, and patents. The figures below capture the total number of English-language and Chinese-language AI Total Number of AI Publications publications globally from 2010 to 2021—by type, Figure 1.1.1 shows the number of AI publications in affiliation, cross-country collaboration, and cross- the world. From 2010 to 2021, the total number of industry collaboration. The section also breaks down AI publications more than doubled, growing from 200,000 in 2010 to almost 500,000 in 2021. Number of AI Publications in the World, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 496.01 Number of AI Publications (in Thousands) 500 400 300 200 100 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.1 1 See the Appendix for more information on CSET’s methodology. For more on the challenge of defining AI and correctly capturing relevant bibliometric data, see the AI Index team’s discussion in the paper “Measurement in AI Policy: Opportunities and Challenges.” Table of Contents Chapter 1 Preview 24 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Type of Publication book chapters, theses, and unknown document types Figure 1.1.2 shows the types of AI publications released made up the remaining 10% of publications. While globally over time. In 2021, 60% of all published AI journal and repository publications have grown 3 documents were journal articles, 17% were conference and 26.6 times, respectively, in the past 12 years, the papers, and 13% were repository submissions. Books, number of conference papers has declined since 2019. Number of AI Publications by Type, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 300 293.48, Journal Number of AI Publications (in Thousands) 270 240 210 180 150 120 90 85.09, Conference 65.21, Repository 60 29.88, Thesis 13.77, Book Chapter 5.82, Unknown 2.76, Book 30 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.2 Table of Contents Chapter 1 Preview 25 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Field of Study roughly doubled while the number of machine learning Figure 1.1.3 shows that publications in pattern papers has roughly quadrupled. Following those two recognition and machine learning have experienced topic areas, in 2021, the next most published AI fields the sharpest growth in the last half decade. Since of study were computer vision (30,075), algorithm 2015, the number of pattern recognition papers has (21,527), and data mining (19,181). Number of AI Publications by Field of Study (Excluding Other AI), 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Number of AI Publications (in Thousands) 60 59.36, Pattern Recognition 50 42.55, Machine Learning 40 30 30.07, Computer Vision 20 21.53, Algorithm 19.18, Data Mining 10 14.99, Natural Language Processing 11.57, Control Theory 10.37, Human–Computer Interaction 6.74, Linguistics 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.3 Table of Contents Chapter 1 Preview 26 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Sector This section shows the number of AI publications 1.1.5).2 The education sector dominates in each region. affiliated with education, government, industry, The level of industry participation is highest in the nonprofit, and other sectors—first globally (Figure United States, then in the European Union. Since 1.1.4), then looking at the United States, China, and 2010, the share of education AI publications has been the European Union plus the United Kingdom (Figure dropping in each region. AI Publications (% of Total) by Sector, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 80% 75.23%, Education 70% AI Publications (% of Total) 60% 50% 40% 30% 20% 13.60%, Nonpro t 10% 7.21%, Industry 3.74%, Government 0.22%, Other 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.4 2 The categorization is adapted based on the Global Research Identifier Database (GRID). Healthcare, including hospitals and facilities, is included under nonprofit. Publications affiliated with state-sponsored universities are included in the education sector. Table of Contents Chapter 1 Preview 27 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications AI Publications (% of Total) by Sector and Geographic Area, 2021 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 69.17% Education 69.23% 77.85% 14.82% Nonpro t 18.63% 11.73% 12.60% Industry 7.90% 5.47% 3.21% Government 3.92% 4.74% United States 0.20% European Union and United Kingdom Other 0.33% China 0.20% 0% Table of Contents 10% 20% Chapter 1 Preview 30% 40% 50% AI Publications (% of Total) 60% 70% 80% Figure 1.1.5 28 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Cross-Country Collaboration Cross-border collaborations between academics, By far, the greatest number of collaborations in the researchers, industry experts, and others are a key past 12 years took place between the United States component of modern STEM (science, technology, and China, increasing roughly four times since 2010. engineering, and mathematics) development that However the total number of U.S.-China collaborations accelerate the dissemination of new ideas and the only increased by 2.1% from 2020 to 2021, the smallest growth of research teams. Figures 1.1.6 and 1.1.7 depict year-over-year growth rate since 2010. the top cross-country AI collaborations from 2010 to 2021. CSET counted cross-country collaborations as distinct pairs of countries across authors for each publication (e.g., four U.S. and four Chinese-affiliated authors on a single publication are counted as one U.S.-China collaboration; two publications between The next largest set of collaborations was between the United Kingdom and both China and the United States. In 2021, the number of collaborations between the United States and China was 2.5 times greater than between the United Kingdom and China. the same authors count as two collaborations). United States and China Collaborations in AI Publications, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 10.47 Number of AI Publications (in Thousands) 10 8 6 4 2 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.6 Table of Contents Chapter 1 Preview 29 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Cross-Country Collaborations in AI Publications (Excluding U.S. and China), 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 4.13, United Kingdom and China 4.04, United States and United Kingdom Number of AI Publications (in Thousands) 4 3.42, United States and Germany 3 2.80, China and Australia 2.61, United States and Australia 2 1.83, United States and France 1 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.7 Table of Contents Chapter 1 Preview 30 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Cross-Sector Collaboration The increase in AI research outside of academia has educational institutions (12,856); and educational broadened and grown collaboration across sectors and government institutions (8,913). Collaborations in general. Figure 1.1.8 shows that in 2021 educational between educational institutions and industry have institutions and nonprofits (32,551) had the greatest been among the fastest growing, increasing 4.2 times number of collaborations; followed by industry and since 2010. Cross-Sector Collaborations in AI Publications, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 32.55, Education and Nonpro t Number of AI Publications (in Thousands) 30 25 20 15 12.86, Industry and Education 10 8.91, Education and Government 5 2.95, Government and Nonpro t 2.26, Industry and Nonpro t 0.63, Industry and Government 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.8 Table of Contents Chapter 1 Preview 31 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications AI Journal Publications Overview After growing only slightly from 2010 to 2015, the number of AI journal publications grew around 2.3 times since 2015. From 2020 to 2021, they increased 14.8% (Figure 1.1.9). Number of AI Journal Publications, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 293.48 Number of AI Journal Publications (in Thousands) 300 250 200 150 100 50 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.9 Table of Contents Chapter 1 Preview 32 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Region3 Figure 1.1.10 shows the share of AI journal publications East Asia and the Pacific; Europe and Central Asia; by region between 2010 and 2021. In 2021, East Asia as well as North America have been declining. and the Pacific led with 47.1%, followed by Europe During that period, there has been an increase in and Central Asia (17.2%), and then North America publications from other regions such as South Asia; (11.6%). Since 2019, the share of publications from and the Middle East and North Africa. AI Journal Publications (% of World Total) by Region, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 50% AI Journal Publications (% of World Total) 47.14%, East Asia and Paci c 40% 30% 20% 17.20%, Europe and Central Asia 11.61%, North America 6.93%, Unknown 6.75%, South Asia 4.64%, Middle East and North Africa 2.66%, Latin America and the Caribbean 2.30%, Rest of the World 0.77%, Sub-Saharan Africa 10% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.10 3 Regions in this chapter are classified according to the World Bank analytical grouping. Table of Contents Chapter 1 Preview 33 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Geographic Area4 Figure 1.1.11 breaks down the share of AI journal throughout, with 39.8% in 2021, followed by the publications over the past 12 years by geographic European Union and the United Kingdom (15.1%), area. This year’s AI Index included India in recognition then the United States (10.0%). The share of Indian of the increasingly important role it plays in the publications has been steadily increasing—from 1.3% AI ecosystem. China has remained the leader in 2010 to 5.6% in 2021. AI Journal Publications (% of World Total) by Geographic Area, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report AI Journal Publications (% of World Total) 40% 39.78%, China 30% 22.70%, Rest of the World 20% 15.05%, European Union and United Kingdom 10.03%, United States 10% 6.88%, Unknown 5.56%, India 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.11 4 In this chapter we use “geographic area” based on CSET’s classifications, which are disaggregated not only by country, but also by territory. Further, we count the European Union and the United Kingdom as a single geographic area to reflect the regions’ strong history of research collaboration. Table of Contents Chapter 1 Preview 34 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Citations China’s share of citations in AI journal publications 1.1.12). China, the European Union and the United has gradually increased since 2010, while those of the Kingdom, and the United States accounted for 65.7% European Union and the United Kingdom, as well as of the total citations in the world. those of the United States, have decreased (Figure AI Journal Citations (% of World Total) by Geographic Area, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 30% 29.07%, China AI Journal Citations (% of World Total) 27.37%, Rest of the World 25% 21.51%, European Union and United Kingdom 20% 15.08%, United States 15% 10% 6.05%, India 5% 0.92%, Unknown 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.12 Table of Contents Chapter 1 Preview 35 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications AI Conference Publications Overview The number of AI conference publications peaked in 2019, and fell 20.4% below the peak in 2021 (Figure 1.1.13). The total number of 2021 AI conference publications, 85,094, was marginally greater than the 2010 total of 75,592. Number of AI Conference Publications, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Number of AI Conference Publications (in Thousands) 100 85.09 80 60 40 20 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.13 Table of Contents Chapter 1 Preview 36 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Region Figure 1.1.14 shows the number of AI conference East Asia and the Pacific continues to rise, accounting publications by region. As with the trend in journal for 36.7% in 2021, followed by Europe and Central publications, East Asia and the Pacific; Europe Asia (22.7%), and then North America (19.6%). The and Central Asia; and North America account for percentage of AI conference publications in South Asia the world’s highest numbers of AI conference saw a noticeable rise in the past 12 years, growing from publications. Specifically, the share represented by 3.6% in 2010 to 8.5% in 2021. AI Conference Publications (% of World Total) by Region, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 40% AI Conference Publications (% of World Total) 36.72%, East Asia and Paci c 35% 30% 25% 22.66%, Europe and Central Asia 20% 19.56%, North America 15% 10% 8.45%, South Asia 3.82%, Middle East and North Africa 3.07%, Latin America and the Caribbean 2.76%, Unknown 2.35%, Rest of the World 0.60%, Sub-Saharan Africa 5% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.14 Table of Contents Chapter 1 Preview 37 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Geographic Area In 2021, China produced the greatest share of the came in third at 17.2% (Figure 1.1.15). Mirroring trends world’s AI conference publications at 26.2%, having seen in other parts of the research and development overtaken the European Union and the United section, India’s share of AI conference publications is Kingdom in 2017. The European Union plus the United also increasing. Kingdom followed at 20.3%, and the United States AI Conference Publications (% of World Total) by Geographic Area, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report AI Conference Publications (% of World Total) 30% 26.84%, Rest of the World 26.15%, China 25% 20.29%, European Union and United Kingdom 20% 17.23%, United States 15% 10% 6.79%, India 5% 2.70%, Unknown 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.15 Table of Contents Chapter 1 Preview 38 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Citations Despite China producing the most AI conference conference citations, with 23.9%, followed by China’s publications in 2021, Figure 1.1.16 shows that 22.0%. However, the gap between American and the United States had the greatest share of AI Chinese AI conference citations is narrowing. AI Conference Citations (% of World Total) by Geographic Area, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 35% AI Conference Citations (% of World Total) 30% 25.57%, Rest of the World 23.86%, United States 22.02%, China 21.59%, European Union and United Kingdom 25% 20% 15% 10% 6.09%, India 5% 0% 0.87%, Unknown 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.16 Table of Contents Chapter 1 Preview 39 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications AI Repositories Overview Publishing pre-peer-reviewed papers on repositories share their findings before submitting them to journals of electronic preprints (such as arXiv and SSRN) and conferences, thereby accelerating the cycle of has become a popular way for AI researchers to information discovery. The number of AI repository disseminate their work outside traditional avenues for publications grew almost 27 times in the past 12 years publication. These repositories allow researchers to (Figure 1.1.17). Number of AI Repository Publications, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Number of AI Repository Publications (in Thousands) 65.21 60 50 40 30 20 10 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.17 Table of Contents Chapter 1 Preview 40 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Region Figure 1.1.18 shows that North America has by East Asia and the Pacific has grown significantly maintained a steady lead in the world share of AI since 2010 and continued growing from 2020 to repository publications since 2016. Since 2011, the 2021, a period in which the year-over-year share of share of repository publications from Europe and North American as well European and Central Asian Central Asia has declined. The share represented repository publications declined. AI Repository Publications (% of World Total) by Region, 2010–21 AI Repository Publications (% of World Total) Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 30% 26.32%, North America 23.99%, Unknown 21.40%, Europe and Central Asia 20% 17.88%, East Asia and Paci c 10% 3.41%, South Asia 3.06%, Middle East and North Africa 1.81%, Rest of the World 1.80%, Latin America and the Caribbean 0.34%, Sub-Saharan Africa 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.18 Table of Contents Chapter 1 Preview 41 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications By Geographic Area While the United States has held the lead in the (Figure 1.1.19). In 2021, the United States accounted percentage of global AI repository publications since for 23.5% of the world’s AI repository publications, 2016, China is catching up, while the European Union followed by the European Union plus the United plus the United Kingdom’s share continues to drop Kingdom (20.5%), and then China (11.9%). AI Repository Publications (% of World Total) by Geographic Area, 2010–21 AI Repository Publications (% of World Total) Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 30% 23.48%, United States 23.18%, Unknown 20.54%, European Union and United Kingdom 20% 18.07%, Rest of the World 11.87%, China 10% 2.85%, India 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.19 Table of Contents Chapter 1 Preview 42 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Citations In the citations of AI repository publications, Figure a dominant lead over the European Union plus the 1.1.20 shows that in 2021 the United States topped United Kingdom (21.5%), as well as China (21.0%). the list with 29.2% of overall citations, maintaining AI Repository Citations (% of World Total) by Geographic Area, 2010–21 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report AI Repository Citations (% of World Total) 40% 30% 29.22%, United States 21.79%, Rest of the World 21.52%, European Union and United Kingdom 20.98%, China 20% 10% 4.59%, Unknown 1.91%, India 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.20 Table of Contents Chapter 1 Preview 43 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Narrative Highlight: Top Publishing Institutions All Fields University, the University of the Chinese Academy Since 2010, the institution producing the greatest of Sciences, Shanghai Jiao Tong University, number of total AI papers has been the Chinese and Zhejiang University.5 The total number of Academy of Sciences (Figure 1.1.21). The next publications released by each of these institutions top four are all Chinese universities: Tsinghua in 2021 is displayed in Figure 1.1.22. Top Ten Institutions in the World in 2021 Ranked by Number of AI Publications in All Fields, 2010–21 Rank Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report 1 1, Chinese Academy of Sciences 2 2, Tsinghua University 3 3, University of Chinese Academy of Sciences 4 4, Shanghai Jiao Tong University 5 5, Zhejiang University 6 6, Harbin Institute of Technology 7 7, Beihang University 8 8, University of Electronic Science and Technology of China 9 9, Peking University 10 10, Massachusetts Institute of Technology 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 1.1.21 5 It is important to note that many Chinese research institutions are large, centralized organizations with thousands of researchers. It is therefore not entirely surprising that, purely by the metric of publication count, they outpublish most non-Chinese institutions. Table of Contents Chapter 1 Preview 44 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Narrative Highlight: Top Publishing Institutions (cont’d) Top Ten Institutions in the World by Number of AI Publications in All Fields, 2021 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Chinese Academy of Sciences 5,099 Tsinghua University 3,373 University of Chinese Academy of Sciences 2,904 Shanghai Jiao Tong University 2,703 Zhejiang University 2,590 Harbin Institute of Technology 2,016 Beihang University 1,970 University of Electronic Science and Technology of China 1,951 Peking University 1,893 Massachusetts Institute of Technology 1,745 0 500 1,000 1,500 2,000 2,500 3,000 Number of AI Publications 3,500 4,000 4,500 5,000 Figure 1.1.22 Table of Contents Chapter 1 Preview 45 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Narrative Highlight: Top Publishing Institutions (cont’d) Computer Vision In 2021, the top 10 institutions publishing the greatest number of AI computer vision publications were all Chinese (Figure 1.1.23). The Chinese Academy of Sciences published the largest number of such publications, with a total of 562. Top Ten Institutions in the World by Number of AI Publications in Computer Vision, 2021 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Chinese Academy of Sciences 562 Shanghai Jiao Tong University 316 University of Chinese Academy of Sciences 314 Tsinghua University 296 Zhejiang University 289 Beihang University 247 Wuhan University 231 Beijing Institute of Technology 229 Harbin Institute of Technology 210 Tianjin University 0 182 100 200 300 Number of AI Publications 400 500 Figure 1.1.23 Table of Contents Chapter 1 Preview 46 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Narrative Highlight: Top Publishing Institutions (cont’d) Natural Language Processing American institutions are represented to a took second place (140 publications), followed by greater degree in the share of top NLP publishers Microsoft (134). In addition, 2021 was the first year (Figure 1.1.24). Although the Chinese Academy of Amazon and Alibaba were represented among the Sciences was again the world’s leading institution top-ten largest publishing NLP institutions. in 2021 (182 publications), Carnegie Mellon Top Ten Institutions in the World by Number of AI Publications in Natural Language Processing, 2021 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Chinese Academy of Sciences 182 Carnegie Mellon University 140 Microsoft (United States) 134 Tsinghua University 127 Carnegie Mellon University Australia 116 Google (United States) 116 Peking University 113 University of Chinese Academy of Sciences 112 Alibaba Group (China) 100 Amazon (United States) 98 0 10 20 30 40 50 60 70 80 90 100 110 120 Number of AI Publications 130 140 150 160 170 180 190 Figure 1.1.24 Table of Contents Chapter 1 Preview 47 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.1 Publications Narrative Highlight: Top Publishing Institutions (cont’d) Speech Recognition In 2021, the greatest number of speech recognition papers came from the Chinese Academy of Sciences (107), followed by Microsoft (98) and Google (75) (Figure 1.1.25). The Chinese Academy of Sciences reclaimed the top spot in 2021 from Microsoft, which held first position in 2020. Top Ten Institutions in the World by Number of AI Publications in Speech Recognition, 2021 Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report Chinese Academy of Sciences 107 Microsoft (United States) 98 Google (United States) 75 University of Chinese Academy of Sciences 66 Tsinghua University 61 University of Science and Technology of China 59 Carnegie Mellon University 57 Tencent (China) 57 Chinese University of Hong Kong 55 Amazon (United States) 0 54 10 20 30 40 50 60 70 Number of AI Publications 80 90 100 110 Figure 1.1.25 Table of Contents Chapter 1 Preview 48 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Epoch AI is a collective of researchers investigating and forecasting the development of advanced AI. Epoch curates a database of significant AI and machine learning systems that have been released since the 1950s. There are different criteria under which the Epoch team decides to include particular AI systems in their database; for example, the system may have registered a state-of-the-art improvement, been deemed to have been historically significant, or been highly cited. This subsection uses the Epoch database to track trends in significant AI and machine learning systems. The latter half of the chapter includes research done by the AI Index team that reports trends in large language and multimodal models, which are models trained on large amounts of data and adaptable to a variety of downstream applications. 1.2 Trends in Significant Machine Learning Systems General Machine Learning Systems System Types The figures below report trends among all machine was language (Figure 1.2.1). There were 23 significant Among the significant AI machine learning systems released in 2022, the most common class of system learning systems included in the Epoch dataset. For reference, these systems are referred to as significant machine learning systems throughout the subsection. AI language systems released in 2022, roughly six times the number of the next most common system type, multimodal systems. Number of Significant Machine Learning Systems by Domain, 2022 Source: Epoch, 2022 | Chart: 2023 AI Index Report Language 23 Multimodal 4 Drawing 3 Vision 2 Speech 2 Text-to-Video 1 Other 1 Games 1 0 2 4 6 8 10 12 14 16 Number of Signi cant Machine Learning Systems 18 20 22 24 Figure 1.2.16 6 There were 38 total significant AI machine learning systems released in 2022, according to Epoch; however, one of the systems, BaGuaLu, did not have a domain classification and is therefore omitted from Figure 1.2.1. Table of Contents Chapter 1 Preview 49 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Sector Analysis Which sector among industry, academia, or nonprofit machine learning systems compared to just three has released the greatest number of significant produced by academia. Producing state-of-the-art machine learning systems? Until 2014, most machine AI systems increasingly requires large amounts of learning systems were released by academia. data, computing power, and money; resources that Since then, industry has taken over (Figure 1.2.2). In industry actors possess in greater amounts compared 2022, there were 32 significant industry-produced to nonprofits and academia. Number of Significant Machine Learning Systems by Sector, 2002–22 Source: Epoch, 2022 | Chart: 2023 AI Index Report Number of Signi cant Machine Learning Systems 35 32, Industry 30 25 20 15 10 5 3, Academia 2, Research Collective 1, Industry-Academia Collaboration 0, Nonpro t 0 2002 2004 Table of Contents 2006 2008 2010 Chapter 1 Preview 2012 2014 2016 2018 2020 2022 Figure 1.2.2 50 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems National Affiliation In order to paint a picture of AI’s evolving or AI-research firm, was headquartered. In 2022, geopolitical landscape, the AI Index research the United States produced the greatest number team identified the nationality of the authors who of significant machine learning systems with 16, contributed to the development of each significant followed by the United Kingdom (8) and China (3). machine learning system in the Epoch dataset. Moreover, since 2002 the United States has outpaced 7 the United Kingdom and the European Union, as well Systems as China, in terms of the total number of significant Figure 1.2.3 showcases the total number of machine learning systems produced (Figure 1.2.4). significant machine learning systems attributed to Figure 1.2.5 displays the total number of significant researchers from particular countries.8 A researcher machine learning systems produced by country since is considered to have belonged to the country in 2002 for the entire world. Number of Significant Machine Learning Systems by Country, 2022 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report United States 16 United Kingdom 8 China 3 Canada 2 Germany 2 France 1 India 1 Israel 1 Russia 1 Singapore 1 30 25 20 16, United States 15 12, European Union and United Kingdom 10 5 3, China 0 2022 2020 2018 2016 2014 2012 2010 2008 2006 2004 2 4 6 8 10 12 14 16 Number of Signi cant Machine Learning Systems Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report 2002 0 Number of Significant Machine Learning Systems by Select Geographic Area, 2002–22 Number of Significant Machine Learning Systems which their institution, for example a university Figure 1.2.3 Figure 1.2.4 7 The methodology by which the AI Index identified authors’ nationality is outlined in greater detail in the Appendix. 8 A machine learning system is considered to be affiliated with a particular country if at least one author involved in creating the model was affiliated with that country. Consequently, in cases where a system has authors from multiple countries, double counting may occur. Table of Contents Chapter 1 Preview 51 Artificial Intelligence Index Report 2023 Number of Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Machine Learning Systems by Country, 2002–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report 0 1–10 11–20 21–60 61–255 Figure 1.2.5 Table of Contents Chapter 1 Preview 52 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Authorship Figures 1.2.6 to 1.2.8 look at the total number of in 2022 the United States had the greatest number of authors, disaggregated by national affiliation, that authors producing significant machine learning systems, contributed to the launch of significant machine with 285, more than double that of the United Kingdom learning systems. As was the case with total systems, and nearly six times that of China (Figure 1.2.6). Number of Authors of Significant Machine Learning Systems by Country, 2022 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report United States Number of Authors of Significant Machine Learning Systems by Select Geographic Area, 2002–22 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report 400 285 United Kingdom 350 139 China 300 Canada Number of Authors 49 21 Israel 13 Sweden 8 Germany 7 200 155, European Union and United Kingdom 150 100 Russia 3 50 India 2 49, China 0 France 1 2022 2020 2018 2016 2014 2012 Figure 1.2.6 2010 300 2008 Source: AI Index, 2022 | Chart: 2023 AI Index Report 250 2006 Number of Authors of 100 150 200 Number of Authors 2004 50 2002 0 285, United States 250 Figure 1.2.7 Machine Learning Systems by Country, 2002–22 (Sum) 0 1–10 11–20 21–60 61–180 181–370 371–680 Figure 1.2.8 681–2000 Table of Contents Chapter 1 Preview 53 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Parameter Trends Parameters are numerical values that are learned by dataset by sector. Over time, there has been a steady machine learning models during training. The value of increase in the number of parameters, an increase that parameters in machine learning models determines has become particularly sharp since the early 2010s. how a model might interpret input data and make The fact that AI systems are rapidly increasing their predictions. Adjusting parameters is an essential parameters is reflective of the increased complexity of step in ensuring that the performance of a machine the tasks they are being asked to perform, the greater learning system is optimized. availability of data, advancements in underlying hardware, and most importantly, the demonstrated Figure 1.2.9 highlights the number of parameters of performance of larger models. the machine learning systems included in the Epoch Number of Parameters of Significant Machine Learning Systems by Sector, 1950–2022 Source: Epoch, 2022 | Chart: 2023 AI Index Report 1.0e+14 Academia Industry Industry-Academia Collaboration Nonpro t Research Collective Number of Parameters (Log Scale) 1.0e+12 1.0e+10 1.0e+8 1.0e+6 1.0e+4 1.0e+2 1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022 Figure 1.2.9 Table of Contents Chapter 1 Preview 54 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Figure 1.2.10 demonstrates the parameters of machine learning systems by domain. In recent years, there has been a rise in parameter-rich systems. Number of Parameters of Significant Machine Learning Systems by Domain, 1950–2022 Source: Epoch, 2022 | Chart: 2023 AI Index Report Language Vision Games 1.0e+12 Number of Parameters (Log Scale) 1.0e+10 1.0e+8 1.0e+6 1.0e+4 1.0e+2 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022 Figure 1.2.10 Table of Contents Chapter 1 Preview 55 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Compute Trends machine learning systems has increased exponentially The computational power, or “compute,” of AI in the last half-decade (Figure 1.2.11).9 The growing systems refers to the amount of computational demand for compute in AI carries several important resources needed to train and run a machine implications. For example, more compute-intensive learning system. Typically, the more complex a models tend to have greater environmental impacts, system is, and the larger the dataset on which it is trained, the greater the amount of compute required. The amount of compute used by significant AI and industrial players tend to have easier access to computational resources than others, such as universities. Training Compute (FLOP) of Significant Machine Learning Systems by Sector, 1950–2022 Source: Epoch, 2022 | Chart: 2023 AI Index Report Academia Industry Industry-Academia Collaboration Nonpro t Research Collective 1.0e+24 Training Compute (FLOP – Log Scale) 1.0e+21 1.0e+18 1.0e+15 1.0e+12 1.0e+9 1.0e+6 1.0e+3 1.0e+0 1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022 Figure 1.2.11 9 FLOP stands for “Floating Point Operations” and is a measure of the performance of a computational device. Table of Contents Chapter 1 Preview 56 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Since 2010, it has increasingly been the case that of all machine learning systems, language models are demanding the most computational resources. Training Compute (FLOP) of Significant Machine Learning Systems by Domain, 1950–2022 Source: Epoch, 2022 | Chart: 2023 AI Index Report Language Vision Games 1.0e+24 Training Compute (FLOP – Log Scale) 1.0e+21 1.0e+18 1.0e+15 1.0e+12 1.0e+9 1.0e+6 1.0e+3 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022 Figure 1.2.12 Table of Contents Chapter 1 Preview 57 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Large Language and Multimodal Models are starting to be widely deployed in the real world. Large language and multimodal models, sometimes national affiliation of the authors responsible for National Affiliation called foundation models, are an emerging and increasingly popular type of AI model that is trained on huge amounts of data and adaptable to a variety of downstream applications. Large language and multimodal models like ChatGPT, DALL-E 2, and MakeA-Video have demonstrated impressive capabilities and This year the AI Index conducted an analysis of the releasing new large language and multimodal models.10 The majority of these researchers were from American institutions (54.2%) (Figure 1.2.13). In 2022, for the first time, researchers from Canada, Germany, and India contributed to the development of large language and multimodal models. Authors of Select Large Language and Multimodal Models (% of Total) by Country, 2019–22 Authors of Large Language and Multimodal Models (% of Total) Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report 100% 80% 60% 54.02%, United States 40% 21.88%, United Kingdom 8.04%, China 6.25%, Canada 5.80%, Israel 3.12%, Germany 0.89%, India 0.00%, Korea 20% 0% 2019 2020 2021 2022 Figure 1.2.13 Figure 1.2.14 offers a timeline view of the large PaLM (540B). The only Chinese large language and language and multimodal models that have been multimodal model released in 2022 was GLM-130B, released since GPT-2, along with the national an impressive bilingual (English and Chinese) model affiliations of the researchers who produced the created by researchers at Tsinghua University. BLOOM, models. Some of the notable American large also launched in late 2022, was listed as indeterminate language and multimodal models released in given that it was the result of a collaboration of more 2022 included OpenAI’s DALL-E 2 and Google’s than 1,000 international researchers. 10 The AI models that were considered to be large language and multimodal models were hand-selected by the AI Index steering committee. It is possible that this selection may have omitted certain models. Table of Contents Chapter 1 Preview 58 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Timeline and National Affiliation of Select Large Language and Multimodal Model Releases Source: AI Index, 2022 | Chart: 2023 AI Index Report 2023-Jan BLOOM 2022-Oct 2022-Jul 2022-Apr GLM-130B Minerva (540B) GPT-NeoX-20B 2022-Jan InstructGPT AlphaCode Gopher 2021-Oct Megatron-Turing NLG 530B 2021-Jul Codex HyperClova Wu Dao 2.0 2021-Apr Imagen OPT-175B DALL·E 2 Chinchilla Jurassic-X Stable Diffusion (LDM-KL-8-G) PaLM (540B) PanGu-alpha CogView Wu Dao - Wen Yuan Jurassic-1-Jumbo ERNIE 3.0 GPT-J-6B GPT-Neo DALL-E 2021-Jan 2020-Oct ERNIE-GEN (large) 2020-Jul GPT-3 175B (davinci) 2020-Apr Turing NLG Meena 2020-Jan 2019-Oct Megatron-LM (Original, 8.3B) 2019-Jul T5-11B T5-3B Grover-Mega United States United Kingdom China United States, United Kingdom, Germany, India Korea Canada Israel Germany Indeterminate 2019-Apr GPT-2 2019-Jan Figure 1.2.1411 11 While we were conducting the analysis to produce Figure 1.2.14, Irene Solaiman published a paper that has a similar analysis. We were not aware of the paper at the time of our research. Table of Contents Chapter 1 Preview 59 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Parameter Count Over time, the number of parameters of newly released Google in 2022, had 540 billion, nearly 360 times large language and multimodal models has massively more than GPT-2. The median number of parameters increased. For example, GPT-2, which was the first in large language and multimodal models is increasing large language and multimodal model released in 2019, exponentially over time (Figure 1.2.15). only had 1.5 billion parameters. PaLM, launched by Number of Parameters of Select Large Language and Multimodal Models, 2019–22 Source: Epoch, 2022 | Chart: 2023 AI Index Report 3.2e+12 Wu Dao 2.0 Number of Parameters (Log Scale) 1.0e+12 Megatron-Turing NLG 530B 3.2e+11 HyperClova GPT-3 175B (davinci) PanGu-α Minerva (540B) Gopher PaLM (540B) BLOOM OPT-175B Jurassic-1-Jumbo 1.0e+11 GLM-130B Chinchilla 3.2e+10 GPT-NeoX-20B Turing NLG DALL-E T5-11B GPT-J-6B ERNIE 3.0 1.0e+10 3.2e+9 1.0e+9 Codex Megatron-LM (Original, 8.3B) T5-3B Meena Jurassic-X DALL·E 2 CogView Wu Dao - Wen Yuan Stable Di usion (LDM-KL-8-G) GPT-Neo GPT-2 Grover-Mega ERNIE-GEN (large) 3.2e+8 ov -N 22 20 ug -A 22 20 un -J y 22 a 20 2-M r 2 p 20 2-A ar 2 20 2-M b 2 e 2022-F 20 ec -D 21 20 ct -O 21 20 ug -A 21 ul 20 1-J n 2 u 20 1-J ay 2 20 1-M r 2 p 20 1-A ar 2 20 -M 21 20 an -J 21 20 ug -A 20 20 ay -M 20 20 eb -F 20 an 20 0-J 2 20 ct -O 19 ep 20 9-S 1 20 ay -M 19 20 eb -F 19 20 Figure 1.2.15 Table of Contents Chapter 1 Preview 60 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Training Compute The training compute of large language and multimodal reasoning problems, was roughly nine times greater models has also steadily increased (Figure 1.2.16). The than that used for OpenAI’s GPT-3, which was compute used to train Minerva (540B), a large language released in June 2022, and roughly 1839 times greater and multimodal model released by Google in June than that used for GPT-2 (released February 2019). 2022 that displayed impressive abilities on quantitative Training Compute (FLOP) of Select Large Language and Multimodal Models, 2019–22 Source: Epoch, 2022 | Chart: 2023 AI Index Report 3.2e+24 Training Compute (FLOP – Log Scale) 1.0e+24 GPT-3 175B (davinci) 3.2e+23 Meena 1.0e+23 T5-11B 3.2e+22 T5-3B 1.0e+22 3.2e+21 PaLM (540B) Minerva (540B) Megatron-Turing NLG 530B OPT-175B Gopher Jurassic-1-Jumbo Chinchilla BLOOM GPT-NeoX-20B PanGu-α DALL-E Stable Diffusion HyperClova Turing NLG Megatron-LM (Original, 8.3B) GPT-2 CogView AlphaCode GLM-130B GPT-J-6B GPT-Neo Wu Dao - Wen Yuan 1.0e+21 3.2e+20 1.0e+20 3.2e+19 1.0e+19 ERNIE 3.0 3.2e+18 1.0e+18 ov -N 22 20 ug -A 22 20 un -J y 22 a 20 -M 22 pr 20 2-A ar 2 20 2-M b 2 e 2022-F 20 ec -D 21 20 ct -O 21 20 ug -A 21 ul 20 1-J 2 20 ay -M 21 pr 20 1-A ar 2 20 -M 21 20 an -J 21 20 ay -M 20 20 eb -F 20 an 20 0-J 2 20 ct -O 19 ep 20 9-S 1 20 eb -F 19 20 Figure 1.2.16 Table of Contents Chapter 1 Preview 61 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems Training Cost A particular theme of the discourse around large estimate with the tag of mid, high, or low: mid where language and multimodal models has to do with their the estimate is thought to be a mid-level estimate, hypothesized costs. Although AI companies rarely speak high where it is thought to be an overestimate, and openly about training costs, it is widely speculated that low where it is thought to be an underestimate. In these models cost millions of dollars to train and will certain cases, there was not enough data to estimate become increasingly expensive with scale. the training cost of particular large language and multimodal models, therefore these models were This subsection presents novel analysis in which the omitted from our analysis. AI Index research team generated estimates for the training costs of various large language and multimodal The AI Index estimates validate popular claims that models (Figure 1.2.17). These estimates are based on the large language and multimodal models are increasingly hardware and training time disclosed by the models’ costing millions of dollars to train. For example, authors. In cases where training time was not disclosed, Chinchilla, a large language model launched by we calculated from hardware speed, training compute, DeepMind in May 2022, is estimated to have cost $2.1 and hardware utilization efficiency. Given the possible million, while BLOOM’s training is thought to have cost variability of the estimates, we have qualified each $2.3 million. Estimated Training Cost of Select Large Language and Multimodal Models Source: AI Index, 2022 | Chart: 2023 AI Index Report Mid High Low 11.35 10 8.55 8.01 8 6 4 2019 2020 BLOOM 0.16 GLM-130B OPT-175B Stable Di usion (LDM-KL-8-G) PaLM (540B) 2.29 1.03 Minerva (540B) 1.69 0.60 Chinchilla Codex 0.24 GPT-NeoX-20B ERNIE 3.0 2021 0.09 AlphaCode 0.14 Gopher 0.01 Megatron-Turing NLG 530B 0.27 HyperClova GPT-J-6B GPT-Neo GPT-3 175B Turing NLG Meena Wu Dao - Wen Yuan 0.23 0.02 0.09 0.43 0.11 0.05 T5-11B 0 2.11 1.80 1.47 DALL-E 1.97 2 GPT-2 Training Cost (in Millions of U.S. Dollars) 12 2022 Figure 1.2.17 12 See Appendix for the complete methodology behind the cost estimates. Table of Contents Chapter 1 Preview 62 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.2 Trends in Significant Machine Learning Systems There is also a clear relationship between the cost of large language and multimodal models and their size. As evidenced in Figures 1.2.18 and 1.2.19, the large language and multimodal models with a greater number of parameters and that train using larger amounts of compute tend to be more expensive. Estimated Training Cost of Select Large Language and Multimodal Models and Number of Parameters Source: AI Index, 2022 | Chart: 2023 AI Index Report HyperClova GLM-130B 1.0e+11 OPT-175B BLOOM GPT-3 175B Chinchilla AlphaCode 5.0e+10 GPT-NeoX-20B Turing NLG 2.0e+10 Codex ERNIE 3.0 T5-11B DALL-E GPT-J-6B 5.0e+9 Wu Dao - Wen Yuan 2.0e+9 GPT-2 GPT-Neo Meena PaLM (540B) Megatron-Turing NLG 530B OPT-175B Gopher Chinchilla 1.0e+24 Training Compute (FLOP – Log Scale) Number of Parameters (Log Scale) Minerva (540B) Megatron-Turing NLG 530B Gopher 2.0e+11 1.0e+10 Source: AI Index, 2022 | Chart: 2023 AI Index Report PaLM (540B) Minerva (540B) 5.0e+11 Estimated Training Cost of Select Large Language and Multimodal Models and Training Compute (FLOP) GPT-NeoX-20B AlphaCode DALL-E Turing NLG 1.0e+22 Meena GLM-130B Stable Diffusion BLOOM T5-11B GPT-J-6B GPT-Neo GPT-2 Wu Dao - Wen Yuan 1.0e+20 Stable Di usion ERNIE 3.0 1.0e+18 1.0e+9 10k 100k 1M 10M Training Cost (in U.S. Dollars - Log Scale) Figure 1.2.18 Table of Contents Chapter 1 Preview 10k 100k 1M 10M Training Cost (in U.S. Dollars - Log Scale) Figure 1.2.19 63 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.3 AI Conferences AI conferences are key venues for researchers to share their work and connect with peers and collaborators. Conference attendance is an indication of broader industrial and academic interest in a scientific field. In the past 20 years, AI conferences have grown in size, number, and prestige. This section presents data on the trends in attendance at major AI conferences. 1.3 AI Conferences Conference Attendance International Conference on Principles of Knowledge After a period of increasing attendance, the total strictly in-person. Representation and Reasoning (KR) were both held attendance at the conferences for which the AI Neural Information Processing Systems (NeurIPS) Index collected data dipped in 2021 and again in continued to be one of the most attended 2022 (Figure 1.3.1).13 This decline may be attributed conferences, with around 15,530 attendees (Figure to the fact that many conferences returned to hybrid 1.3.2).14 The conference with the greatest one- or in-person formats after being fully virtual in year increase in attendance was the International 2020 and 2021. For example, the International Joint Conference on Robotics and Automation (ICRA), Conference on Artificial Intelligence (IJCAI) and the from 1,000 in 2021 to 8,008 in 2022. Number of Attendees at Select AI Conferences, 2010–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 90 Number of Attendees (in Thousands) 80 70 60 59.45 50 40 30 20 10 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.3.1 13 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that measuring the exact attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world. 14 In 2021, 9,560 of the attendees attended NeurIPS in-person and 5,970 remotely. Table of Contents Chapter 1 Preview 64 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.3 AI Conferences Attendance at Large Conferences, 2010–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 30 Number of Attendees (in Thousands) 25 20 15.53, NeurIPS 15 10.17, CVPR 8.01, ICRA 7.73, ICML 10 5.35, ICLR 4.32, IROS 3.56, AAAI 5 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.3.2 Attendance at Small Conferences, 2010–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report Number of Attendees (in Thousands) 3.50 3.00 2.50 2.01, IJCAI 2.00 1.50 1.09, FaccT 1.00 0.66, UAI 0.50, AAMAS 0.39, ICAPS 0.50 0.12, KR 0.00 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.3.3 Table of Contents Chapter 1 Preview 65 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.4 Open-Source AI Software GitHub is a web-based platform where individuals and coding teams can host, review, and collaborate on various code repositories. GitHub is used extensively by software developers to manage and share code, collaborate on various projects, and support open-source software. This subsection uses data provided by GitHub and the OECD.AI policy observatory. These trends can serve as a proxy for some of the broader trends occuring in the world of open-source AI software not captured by academic publication data. 1.4 Open-Source AI Software Projects A GitHub project is a collection of files that software project. Since 2011, the total number of can include the source code, documentation, AI-related GitHub projects has steadily increased, configuration files, and images that constitute a growing from 1,536 in 2011 to 347,934 in 2022. Number of GitHub AI Projects, 2011–22 Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report 350 348 Number of AI Projects (in Thousands) 300 250 200 150 100 50 0 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.4.1 Table of Contents Chapter 1 Preview 66 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.4 Open-Source AI Software As of 2022, a large proportion of GitHub AI projects United Kingdom (17.3%), and then the United States were contributed by software developers in India (14.0%). The share of American GitHub AI projects (24.2%) (Figure 1.4.2). The next most represented has been declining steadily since 2016. geographic area was the European Union and the GitHub AI Projects (% Total) by Geographic Area, 2011–22 Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report 42.11%, Rest of the World 40% 35% AI Projects (% of Total) 30% 25% 24.19%, India 20% 17.30%, European Union and United Kingdom 15% 14.00%, United States 10% 5% 2.40%, China 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.4.2 Table of Contents Chapter 1 Preview 67 Artificial Intelligence Index Report 2023 Chapter 1: Research and Development 1.4 Open-Source AI Software Stars GitHub users can bookmark or save a repository Figure 1.4.3 shows the cumulative number of stars of interest by “starring” it. A GitHub star is similar attributed to projects belonging to owners of various to a “like” on a social media platform and indicates geographic areas. As of 2022, GitHub AI projects support for a particular open-source project. Some of from the United States received the most stars, the most starred GitHub repositories include libraries followed by the European Union and the United like TensorFlow, OpenCV, Keras, and PyTorch, which Kingdom, and then China. In many geographic areas, are widely used by software developers in the AI the total number of new GitHub stars has leveled off coding community. in the last few years. Number of GitHub Stars by Geographic Area, 2011–22 Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report Number of Cumulative GitHub Stars (in Millions) 3.50 3.44, United States 3.00 2.69, Rest of the World 2.50 2.34, European Union and United Kingdom 2.00 1.53, China 1.50 1.00 0.50 0.46, India 0.00 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 1.4.3 Table of Contents Chapter 1 Preview 68 Artificial Intelligence Index Report 2023 CHAPTER 2: Technical Performance Artificial Intelligence Index Report 2023 CHAPTER 2 PREVIEW: Technical Performance Overview 72 Chapter Highlights 73 Narrative Highlight: A Closer Look at Progress in Image Generation 90 Visual Reasoning 92 2.1 What’s New in 2022: A Timeline 74 Visual Question Answering (VQA) Challenge 2.2 Computer Vision—Image 81 Narrative Highlight: The Rise of Capable Multimodal Reasoning Systems 93 Image Classification 81 ImageNet Visual Commonsense Reasoning (VCR) 95 81 Face Detection and Recognition National Institute of Standards and Technology Face Recognition Vendor Test (FRVT) Deepfake Detection Celeb-DF 82 MPII Cityscapes Challenge, Pixel-Level Semantic Labeling Task Medical Image Segmentation 85 86 86 87 Object Detection 88 88 89 CIFAR-10 and STL-10 Chapter 2 Preview 96 Narrative Highlight: A Closer Look at the Progress of Video Generation 87 Image Generation Activity Recognition 84 Kvasir-SEG Common Objects in Context (COCO) 96 Kinetics-400, Kinetics-600, Kinetics-700 96 85 Semantic Segmentation 2.3 Computer Vision—Video 83 84 Human Pose Estimation Table of Contents 92 89 98 2.4 Language 99 English Language Understanding 99 SuperGLUE 99 Reading Comprehension Dataset Requiring Logical Reasoning (ReClor) 100 Narrative Highlight: Just How Much Better Have Language Models Become? 102 Narrative Highlight: Planning and Reasoning in Large Language Models 103 Text Summarization arXiv and PubMed 104 104 70 Artificial Intelligence Index Report 2023 CHAPTER 2 PREVIEW (CONT’D): Technical Performance Natural Language Inference Abductive Natural Language Inference (aNLI) Sentiment Analysis SST-5 Fine-Grained Classification 2.7 Hardware 115 MLPerf Training Time 115 105 MLPerf Inference 117 106 Trends in GPUs 118 105 106 Multitask Language Understanding 107 2.8 Environment 120 Massive Multitask Language Understanding (MMLU) 107 Environmental Impact of Select Large Language Models 120 Machine Translation (MT) Number of Commercially Available MT Systems 108 109 Speech Recognition 109 109 Narrative Highlight: Whisper 110 2.6 Reinforcement Learning 112 Reinforcement Learning Environments 112 Procgen 112 Narrative Highlight: Benchmark Saturation 114 Table of Contents Chapter 2 Preview 122 108 2.5 Speech VoxCeleb Narrative Highlight: Using AI to Optimize Energy Usage 2.9 AI for Science 123 Accelerating Fusion Science Through Learned Plasma Control 123 Discovering Novel Algorithms for Matrix Manipulation With AlphaTensor 123 Designing Arithmetic Circuits With Deep Reinforcement Learning 124 Unlocking de Novo Antibody Design With Generative AI 124 ACCESS THE PUBLIC DATA 71 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance Overview This year’s technical performance chapter features analysis of the technical progress in AI during 2022. Building on previous reports, this chapter chronicles advancement in computer vision, language, speech, reinforcement learning, and hardware. Moreover, this year this chapter features an analysis on the environmental impact of AI, a discussion of the ways in which AI has furthered scientific progress, and a timeline-style overview of some of the most significant recent AI developments. Table of Contents Chapter 2 Preview 72 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance Chapter Highlights Performance saturation on traditional benchmarks. Generative AI breaks into the public consciousness. AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal. Moreover, the speed at which benchmark saturation is being reached is increasing. However, new, more comprehensive benchmarking suites such as BIG-bench and HELM are being released. 2022 saw the release of text-to-image models like DALL-E 2 and Stable Diffusion, text-tovideo systems like Make-A-Video, and chatbots like ChatGPT. Still, these systems can be prone to hallucination, confidently outputting incoherent or untrue responses, making it hard to rely on them for critical applications. AI systems become more flexible. Traditionally AI systems have performed well on narrow tasks but have struggled across broader tasks. Recently released models challenge that trend; BEiT-3, PaLI, and Gato, among others, are single AI systems increasingly capable of navigating multiple tasks (for example, vision, language). AI is both helping and harming the environment. New research suggests that AI systems can have serious environmental impacts. According to Luccioni et al., 2022, BLOOM’s training run emitted 25 times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still, new reinforcement learning models like BCOOLER show that AI systems can be used to optimize energy usage. Table of Contents Chapter 2 Preview Capable language models still struggle with reasoning. Language models continued to improve their generative capabilities, but new research suggests that they still struggle with complex planning tasks. The world’s best new scientist … AI? AI models are starting to rapidly accelerate scientific progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix manipulation, and generate new antibodies. AI starts to build better AI. Nvidia used an AI reinforcement learning agent to improve the design of the chips that power AI systems. Similarly, Google recently used one of its language models, PaLM, to suggest ways to improve the very same model. Self-improving AI learning will accelerate AI progress. 73 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline The technical performance chapter begins with an overview of some of the most significant technical developments in AI during 2022, as selected by the AI Index Steering Committee. 2.1 What’s New in 2022: A Timeline Feb. 2, 2022 DeepMind Releases AlphaCode AlphaCode, an AI system that writes computer programs at a competitive level, achieves a rank within the top 54% of participants in a human programming competition. This represents an improvement on the more complex problemsolving tasks with which AI has traditionally struggled. Figure 2.1.1 Feb. 16, 2022 DeepMind Trains Reinforcement Learning Agent to Control Nuclear Fusion Plasma in a Tokamak Nuclear fusion is a potential source of clean, limitless energy, but producing such energy in tokamaks is difficult due to a lack of experimental data. DeepMind simulated optimal tokamak management, an example of how AI can accelerate science and combat climate change. March 10, 2022 Figure 2.1.2 IndicNLG Benchmarks Natural Language Generation for Indic Languages An international research collective launches IndicNLG, a collection of datasets for benchmarking natural language generation for 11 Indic languages. The creation of IndicNLG increases the potential for AI systems to generate language in more diverse, non-English linguistic settings. Figure 2.1.3 Table of Contents Chapter 2 Preview 74 Artificial Intelligence Index Report 2023 March 24, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline Meta AI Releases Make-A-Scene Make-A-Scene is a text-to-image AI model that enables users to generate images through text. Make-A-Scene is one of many text-to-image models released in 2022. Figure 2.1.4 April 5, 2022 Google Releases PaLM Google’s AI team trains one of the world’s largest language models, PaLM. Made up of 540 billion parameters, PaLM reinforces the belief that researchers can improve performance on large language models by simply training them on more data. April 13, 2022 Figure 2.1.5 OpenAI Releases DALL-E 2 DALL-E 2, a text-to-image AI system that can create realistic art and images from textual descriptions, is released to the public, igniting a generative AI craze. Figure 2.1.6 May 12, 2022 DeepMind Launches Gato Gato is a new reinforcement learning agent capable of doing a wide range of tasks such as robotic manipulation, game playing, image captioning, and natural language generation. The release of such models suggests that AI systems are becoming better at generalization. Figure 2.1.7 Table of Contents Chapter 2 Preview 75 Artificial Intelligence Index Report 2023 May 23, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline Google Releases Imagen Imagen is a text-to-image diffusion model capable of producing images with a high degree of photorealism. Imagen’s launch also comes with the release of DrawBench, a challenging new benchmark for text-to-image systems. Figure 2.1.8 June 9, 2022 442 Authors Across 132 Institutions Team Up to Launch BIG-bench In order to better challenge increasingly capable large language models, a team of 442 authors across 132 institutions launch the Beyond the Imitation Game benchmark (BIG-bench). The benchmark consists of 204 tasks ranging from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, and software development. Figure 2.1.9 June 21, 2022 GitHub Makes Copilot Available as a Subscription-Based Service for Individual Developers Copilot is a generative AI system capable of turning natural language prompts into coding suggestions across multiple languages. Similar systems include OpenAI’s Codex and Salesforce’s CodeGen. Surveys suggest that Copilot makes coders more productive and less frustrated. Figure 2.1.10 Table of Contents Chapter 2 Preview 76 Artificial Intelligence Index Report 2023 July 8, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline Nvidia Uses Reinforcement Learning to Design Better-Performing GPUs Nvidia uses its AI systems to improve the performance of its latest H100 class of GPU chips. GPUs being essential to AI training, this is one example of how AI is starting to develop better AI. Figure 2.1.11 July 11, 2022 Meta Announces ‘No Language Left Behind’ No Language Left Behind (NLLB) is a family of models that can translate across 200 distinct languages. NLLB is one of the first systems that can perform well across a wide range of low-resource languages like Kamba and Lao. Figure 2.1.12 Aug 4, 2022 Tsinghua Researchers Launch GLM-130B Chinese researchers affiliated with Tsinghua University release GLM-130B, a large language model that outperforms others such as Meta’s OPT, Hugging Face’s BLOOM, and OpenAI’s original GPT-3. Aug 22, 2022 Figure 2.1.13 Stability AI Releases Stable Diffusion Stable Diffusion is an open-source text-to-image diffusion-based model, meaning users can freely use the model weights to generate their own images. Stable Diffusion is trained on existing images created by humans and gives no credit or acknowledgment, leaving open questions around the ethical use of image generators. Figure 2.1.14 Table of Contents Chapter 2 Preview 77 Artificial Intelligence Index Report 2023 Sept 21, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline OpenAI Launches Whisper Whisper is a large-scale speech-recognition system trained on roughly 700,000 hours of audio data and capable of respectable performance on various speech recognition tasks. The fact that Whisper required neither supervised pre-training nor unsupervised training with fine-tuning yet was able to achieve strong performance by merely increasing training data further validates the approach of increasingly scaling AI models. Sept 29, 2022 Figure 2.1.15 Meta Releases Make-A-Video Make-A-Video is a system that allows users to create videos from short text descriptions. The quality of the videos is high and again demonstrates the validity of the scaling approach. Figure 2.1.16 Oct 5, 2022 DeepMind Launches AlphaTensor AlphaTensor is an AI reinforcement-learningbased system able to discover new and efficient algorithms for matrix manipulation. Matrix manipulation is essential to a wide range of digital practices and is a process that researchers have been trying to make more efficient for decades. Figure 2.1.17 Table of Contents Chapter 2 Preview 78 Artificial Intelligence Index Report 2023 Oct 20, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline Google Uses PaLM to Improve the Reasoning of PaLM Google researchers use one of their existing language models, PaLM, to improve the reasoning of the very same model. This process is yet another example of AI systems using their own Figure 2.1.18 knowledge to improve. Nov 9, 2022 International Research Group Releases BLOOM A collaboration of over 100 researchers from across the globe develop an open-access language model called BLOOM. BLOOM impresses with its public release and for furthering Figure 2.1.19 the possibilities of international collaboration in AI research. Nov 16, 2022 Stanford Researchers Release HELM As part of an attempt to judge new language models according to more unified standards, Stanford researchers develop a new benchmarking approach for large language models called Holistic Evaluation of Language Models (HELM). The launch of HELM is evidence of the AI community’s attempt to develop transparency around increasingly powerful, capable, and influential large language models. Figure 2.1.20 Table of Contents Chapter 2 Preview 79 Artificial Intelligence Index Report 2023 Nov 22, 2022 Chapter 2: Technical Performance 2.1 What’s New in 2022: A Timeline Meta Releases CICERO CICERO is the first AI to play in the top 10% of human participants in the game Diplomacy. CICERO’s launch shows that AI systems have improved in strategic reasoning, a domain in which they have traditionally struggled, and are capable of effectively convincing humans to go along with their objectives. Nov 30, 2022 Figure 2.1.21 OpenAI Launches ChatGPT ChatGPT is an impressive, publicly usable chatbot capable of writing university-level essays. Months after launching, ChatGPT reaches 100 million monthly active users, making it the fastest-growing consumer application in history. ChatGPT’s release caps a year in which generative AI became a part of the zeitgeist, and raises Figure 2.1.22 questions about the effect that AI will have on the future of humanity. Table of Contents Chapter 2 Preview 80 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Computer vision is the subfield of AI that teaches machines to understand images and videos. Computer vision technologies have a variety of important real-world applications, such as autonomous driving, crowd surveillance, sports analytics, and video-game creation. This section tracks progress in computer vision across several different task domains which include: (1) image classification, (2) face detection and recognition, (3) deepfake detection, (4) human pose estimation, (5) semantic segmentation, (6) medical image segmentation, (7) object detection, (8) image generation, and (9) visual reasoning. 2.2 Computer Vision—Image Image Classification Image classification is the ability of machines to A Demonstration of Image Classification Source: Krizhevsky et al., 2012 categorize objects in images (Figure 2.2.1). ImageNet ImageNet is one of the most widely used benchmarks for image classification. This dataset includes over 14 million images across 20,000 different object categories such as “strawberry” or “balloon.” Performance on ImageNet is measured through various accuracy metrics. Top-1 accuracy measures the degree to which the top prediction generated by an image classification model for a given image actually matches the image’s label. As of 2022, the best image classification system on ImageNet has a top-1 accuracy rate of 91.0% (Figure Figure 2.2.1 2.2.2). Although the current image classification capabilities of state-of-the-art systems is 27.7 percentage points better than a decade ago, last year saw a very marginal 0.1 percentage point improvement in classification accuracy. Table of Contents Chapter 2 Preview 81 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image ImageNet Challenge: Top-1 Accuracy Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 91.00%, With Extra Training Data 90% 88.50%, Without Extra Training Data Top-1 Accuracy (%) 85% 80% 75% 70% 65% 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 2.2.2 Face Detection and Recognition A Demonstration of Face Detection and Recognition Source: Forbes, 2020 Facial detection and recognition is the ability of AI systems to identify faces or individuals in images or videos (Figure 2.2.3). Currently, many facial recognition systems are able to successfully identify close to 100% of faces, even on challenging datasets (Figure 2.2.4). Figure 2.2.3 Table of Contents Chapter 2 Preview 82 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT): Veri cation Accuracy by Dataset Source: National Institute of Standards and Technology, 2022 | Chart: 2023 AI Index Report 1.0000 False Non-Match Rate: FMNR (Log-Scale) 0.5000 0.2000 0.1000 0.0500 0.0297, WILD Photos @ FMR = 1e-5 0.0200 0.0100 0.0050 0.0032, BORDER Photos @ FMR = 1e-6 0.0021, MUGSHOT Photos @ FMR = 1e-5 0.0019, MUGSHOT Photos ≥ 12 YRS @ FMR = 1e-5 0.0016, VISABORDER Photos @ FMR = 1e-6 0.0020 0.0010 0.0006, VISA Photos @ FMR = 1e-6 0.0005 2017 2018 2019 2020 2021 2022 Figure 2.2.4 National Institute of Standards and Technology Face Recognition Vendor Test (FRVT) others. Facial detection capacity is measured by the Progress on facial recognition can be tracked match the face in an image to that of a person. through the National Institute of Standards and Technology’s Face Recognition Vendor Test. This test tracks how well different facial recognition algorithms perform on various homeland security tasks, such as identification of child trafficking false non-match rate (FNMR), otherwise known as error rate, which is the rate at which a model fails to As of 2022, the top-performing models on all of the FRVT datasets, with the exception of WILD Photos, each posted an error rate below 1%, and as low as a 0.06% error rate on the VISA Photos dataset. victims and cross-verification of visa images, among Table of Contents Chapter 2 Preview 83 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Deepfake Detection Real-Life Deepfake: President Zelenskyy Calling for the Surrender of Ukrainian Soldiers The ability of AI systems to create synthetic images Source: NPR, 2022 that are sometimes indistinguishable from real ones has led to the creation of deepfakes, images or videos that appear to be real but are actually fake. In the last year, there was a widely circulated deepfake video of Ukrainian president Volodymyr Zelenskyy surrendering (Figure 2.2.5). Celeb-DF Celeb-DF is presently one of the most challenging Figure 2.2.5 deepfake detection benchmarks. This dataset is composed of 590 original celebrity YouTube algorithm on Celeb-DF came from researchers at videos that have been manipulated into thousands Deakin University in Australia. Their JDFD model of deepfakes. This year’s top deepfake detection posted an AUC score of 78 (Figure 2.2.6). Celeb-DF: Area Under Curve Score (AUC) Source: arXiv, 2022 | Chart: 2023 AI Index Report 78.00 Area Under Curve Score (AUC) 75 70 65 2018 2019 2020 2021 2022 Figure 2.2.6 Table of Contents Chapter 2 Preview 84 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Human Pose Estimation A Demonstration of Human Pose Estimation Source: Cong et al., 2022 Human pose estimation is the task of estimating the position of the human body from images (Figure 2.2.7). MPII MPII is a dataset of over 25,000 annotated images which contains annotations of more than 40,000 people doing 410 human activities. On MPII, this year’s top model, Figure 2.2.7 ViTPose, correctly estimated 94.3% of keypoints (human joints), which represented a small 0.2 percentage point increase from the previous state-of-the-art result posted in 2020 (Figure 2.2.8). MPII: Percentage of Correct Keypoints (PCK) Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 95% Percentage of Correct Keypoints (PCK) 94.30% 90% 85% 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 2.2.8 Table of Contents Chapter 2 Preview 85 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Semantic Segmentation A Demonstration of Semantic Segmentation Source: Cityscapes Dataset, 2022 Semantic segmentation involves assigning individual image pixels to specific categories (for example, human, bicycle, or street) (Figure 2.2.9). Cityscapes Challenge, Pixel-Level Semantic Labeling Task The Cityscapes dataset is used to test the semantic segmentation capabilities of AI. This dataset contains 25,000 annotated images of diverse urban Figure 2.2.9 environments. The Cityscapes dataset enables a variety of different segmentation tasks. One of the greater the mIoU, the better a system has performed. most popular is the pixel-level task. Performance on semantic segmentation is measured by mean Performance on Cityscapes has increased by 23.4 intersection-over-union (mIoU), which represents the percentage points since the competition launched in degree to which the image segments predicted by the 2014; however, it has plateaued in the last few years model overlap with the image’s actual segments. The (Figure 2.2.10). Cityscapes Challenge, Pixel-Level Semantic Labeling Task: Mean Intersection-Over-Union (mIoU) Source: Cityscapes Challenge, 2022 | Chart: 2023 AI Index Report 86.46%, With Extra Training Data Mean Intersection-Over-Union (mIoU) 85% 84.30%, Without Extra Training Data 80% 75% 70% 65% 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 2.2.10 Table of Contents Chapter 2 Preview 86 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Medical Image Segmentation A Demonstration of Medical Imaging Segmentation Source: Jha et al., 2019 In medical image segmentation, AI systems segment objects such as lesions or organs in medical images (Figure 2.2.11). Kvasir-SEG Kvasir-SEG is a dataset for medical image Figure 2.2.11 segmentation that contains 1,000 highquality images of gastrointestinal polyps that were manually identified by medical professionals. Progress on Kvasir-SEG is measured in mean Dice, which represents the degree to which the polyp segments This year’s top-performing model on Kvasir-SEG, SEP, was identified by AI systems overlap with the created by a Chinese researcher and posted a mean Dice of actual polyp segments.1 94.1% (Figure 2.2.12). Kvasir-SEG: Mean Dice Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 94.11% Mean Dice 90% 85% 2019 2020 2021 2022 Figure 2.2.12 1 Mean Dice and mIoU are in principle quite similar. This StackExchange post outlines the differences in more detail. Table of Contents Chapter 2 Preview 87 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Object Detection A Demonstration of Object Detection Source: Rizzoli, 2023 The challenge of identifying and localizing objects within an image or video is known as object detection (Figure 2.2.13). Common Objects in Context (COCO) Microsoft’s Common Objects in Context (COCO) object detection dataset has over 80 object categories in 328,000 images. Several accuracy metrics are used to measure progress on COCO. This section considers mean average precision (mAP50). Since 2015, state-of-the-art detectors have improved by 26 percentage points. The top model Figure 2.2.13 in 2022, EVA, was the result of a Chinese academic research collaboration. COCO: Mean Average Precision (mAP50) Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 81.90% Mean Average Precision (mAP50) 80% 70% 60% 2015 2016 2017 2018 2019 2020 2021 2022 Figure 2.2.14 Table of Contents Chapter 2 Preview 88 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Which Face Is Real? Image Generation Source: Which Face Is Real?, 2022 Image generation is the task of generating images that are indistinguishable from real ones. In the last decade, progress on image generation has tremendously increased, so much so that now it would be difficult for the average person to distinguish a real human face from one synthetically generated by AI (Figure 2.2.15). CIFAR-10 and STL-10 Figure 2.2.15 CIFAR-10 and STL-10 are two popular benchmarks for tracking progress on image generation. CIFAR-10 comprises 60,000 color images across 10 different set of images is similar to the real images on which it was trained. object classes; STL-10 is inspired by CIFAR-10, with This year saw state-of-the-art results on both CIFAR-10 some modifications, including fewer labeled training and STL-10 benchmarks (Figure 2.2.15). The top examples and more unlabeled examples. Progress on model on CIFAR-10, EDM-G++, came from Korean image generation in both benchmarks is measured researchers at KAIST. The top model on STL-10 was by the Fréchet Inception Distance (FID) score, which Diffusion-GAN, a collaboration between researchers at reflects the degree to which a synthetically generated the University of Texas at Austin and Microsoft. CIFAR-10 and STL-10: Fréchet Inception Distance (FID) Score Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 35 Fréchet Inception Distance (FID) Score 30 25 20 15 10 6.91, STL-10 5 1.77, CIFAR-10 0 2017 2018 2019 2020 2021 2022 Figure 2.2.16 Table of Contents Chapter 2 Preview 89 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.1 Computer Vision–Image Narrative Highlight: A Closer Look at Progress in Image Generation Figure 2.2.17 tracks the progress of facial GAN Progress on Face Generation image generation over time, with the final Source: Goodfellow et al., 2014; Radford et al., 2016; Liu and Tuzel, 2016; Karras et al., 2018; Karras et al., 2019; Goodfellow, 2019; Karras et al., 2020; Vahdat et al., 2021; Wang et al., 2022. image being generated by Diffusion-GAN, the model that posted the 2022 state-ofthe-art score on STL-10. 2014 2015 2016 2017 2018 2020 2021 2022 Figure 2.2.17 In the last year, text-to-image generation broke into the public Images Generated by DALL-E 2, Stable Diffusion and Midjourney Source: AI Index, 2022 consciousness with the release of models such as OpenAI’s DALL-E 2, Stability AI’s Stable Diffusion, Midjourney’s Midjourney, Meta’s Make-AScene, and Google’s Imagen. a. DALL-E 2 With these systems, users can generate images based on a text prompt. Figure 2.2.18 juxtaposes the images generated by DALL-E 2, Stable Diffusion, and Midjourney, three publicly accessible AI text-to-image systems, for the same prompt: “a panda playing a piano on a warm evening in Paris.” Table of Contents b. Stable Diffusion Chapter 2 Preview c. Midjourney Figure 2.2.18 90 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.1 Computer Vision–Image Narrative Highlight: A Closer Look at Progress in Image Generation (cont’d) Of all the recently released text-to-image generators, Google’s Imagen performs best on the COCO benchmark (Figure 2.2.19)2. This year, the Google researchers who created Imagen also released a more difficult text-to-image benchmark, DrawBench, designed to challenge increasingly capable text-to-image models. Notable Text-to-Image Models on MS-COCO 256 × 256 FID-30K: Fréchet Inception Distance (FID) Score Trained on COCO-FID 35.49 Not Trained on COCO-FID 32.64 30 21.42 20.79 20 17.89 12.24 2021 10.39 Make-A-Scene DALL-E 2 7.55 7.27 Imagen 8.12 LAFITE 2020 GLIDE 2019 DALL-E DF-GAN 2017 DM-GAN + CL DM-GAN 0 XMC-GAN 9.33 10 AttnGAN COCO Fréchet Inception Distance (FID) Score Source: Saharia et al., 2022 | Chart: 2023 AI Index Report 2022 Model Figure 2.2.19 2 The COCO benchmark, first launched in 2014, includes 328,000 images with 2.5 million labeled instances. Although it is typically used for object detection tasks, researchers have also deployed it for image generation. Table of Contents Chapter 2 Preview 91 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Visual Reasoning Visual reasoning tests how well AI systems can reason across both textual and visual data, as in the examples of Figure 2.2.20. A Collection of Visual Reasoning Tasks Source: Agrawal et al., 2016 Figure 2.2.20 Visual Question Answering (VQA) Challenge The Visual Question Answering Challenge tests AI systems with open-ended textual questions about images. Successfully answering the questions requires that AI systems possess vision, language, and commonsense reasoning capabilities. This section reports progress on the VQA V2 dataset. This year the top-performing model on VQA V2 was PaLI, a multimodal model produced by Google researchers (Figure 2.2.21). Visual Question Answering (VQA) V2 Test-Dev: Accuracy Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 85% 84.30% 80.78%, Human Baseline Accuracy (%) 80% 75% 70% 65% 2016 2017 2018 2019 2020 2021 2022 Figure 2.2.21 Table of Contents Chapter 2 Preview 92 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.1 Computer Vision–Image Narrative Highlight: The Rise of Capable Multimodal Reasoning Systems Traditionally AI has been strong in narrow tasks, models were introduced, for example BEiT-3 from but it has been unable to easily generalize across Microsoft and PaLI from Google, that posted state- multiple domains. For instance, many image of-the-art results across a variety of both vision and classifiers are adept at classifying images but are language benchmarks. For example, at the time of incapable of understanding written text. publication of the BEiT-3 paper, BEiT-3 posted state- However, recent technical progress in AI has begun to challenge this notion. In 2022, several of-the-art results for four different vision skills and five different vision-language skills (Figure 2.2.22). BEiT-3Vs. Vs. Previous State-of-the-Art State-of-the-Art Models Models BEiT-3 BEiT-3 Source: Wang et Previous al., 2022 | Table: 2023 AI Index Report Source: Wang Wang et et al., al., 2022 2022 || Table: Table: 2023 2023 AI Index Report Source: Category Category Category Task Task Task Dataset Dataset Metric Metric Previous SOTA Previous SOTA Vision Vision Vision Semantic Semantic Semantic Segmentation Segmentation Segmentation ADE20K ADE20K mIoU mIoU 61.40 FD-SwimV2 61.40 FD-SwimV2 62.80 62.80 62.80 2.28% 2.28% 2.28% Vision Vision Vision Object Object Object Detection Detection Detection COCO COCO AP AP 63.30 DINO 63.30 DINO 63.70 63.70 63.70 0.63% 0.63% 0.63% Vision Vision Vision Instance Instance Instance Segmentation Segmentation Segmentation COCO COCO AP AP 54.70 Mask DINO 54.70 Mask DINO 54.80 54.80 54.80 0.18% 0.18% 0.18% Vision Vision Vision Image Image Image Classi cation Classi cation Classi ImageNet ImageNet Top-1 Accuracy Top-1 Accuracy 89.00 FD-CLIP 89.00 FD-CLIP 89.60 89.60 89.60 0.67% 0.67% 0.67% Vision-Language Visual Vision-Language Visual Vision-Language Visual Reasoning Reasoning Reasoning Vision-Language Visual QA Vision-Language Visual Visual QA Vision-Language Vision-Language Image Vision-Language Image Vision-Language Image Captioning Captioning Captioning NLVR NLVR Accuracy Accuracy 87.00 CoCA 87.00 CoCA 92.60 92.60 92.60 6.44% 6.44% 6.44% VQAv2 VQAv2 COCO COCO VQA Accuracy VQA Accuracy CIDEr CIDEr 82.30 CoCA 82.30 CoCA 145.30 OFA 145.30 OFA 84.00 84.00 84.00 147.60 147.60 147.60 2.07% 2.07% 2.07% 1.58% 1.58% 1.58% Vision-Language Finetuned Vision-Language Finetuned Vision-Language Finetuned Retrieval Retrieval Retrieval COCO COCO Flickr30K Flickr30K R@1 R@1 72.50 Florence 72.50 Florence 76.00 76.00 76.00 4.83% 4.83% 4.83% Vision-Language Zero-Shot Vision-Language Zero-Shot Vision-Language Zero-Shot Retrieval Retrieval Retrieval Flickr30K Flickr30K Flickr30K R@1 R@1 R@1 86.50 CoCA 86.50 86.50 CoCA CoCA 88.20 88.20 88.20 1.97% 1.97% 1.97% Model of Model of Previous SOTA Previous SOTA BEiT-3 BEiT-3 Scale of Scale Scale of of Improvement Improvement Improvement Figure 2.2.22 Table of Contents Chapter 2 Preview 93 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.1 Computer Vision–Image Narrative Highlight: The Rise of Capable Multimodal Reasoning Systems (cont’d) Figure 2.2.23 shows some of the different vision-language tasks challenging multimodal systems like PaLI and BEiT-3. A Collection of Vision-Language Tasks Source: Chen et al., 2022 Figure 2.2.23 Table of Contents Chapter 2 Preview 94 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.2 Computer Vision—Image Visual Commonsense Reasoning (VCR) example of a question posed in VCR. Performance on The Visual Commonsense Reasoning challenge, first VCR is tracked in the Q->AR score, which combines launched in 2019, is a relatively new benchmark in the ability of machines to select the right answer which AI systems must answer questions presented for the question (Q->A) and the ability to select the from images, as in VQA, but also select the reasoning correct rationale behind the answer (Q->R). behind their answer choices. Figure 2.2.24 shows an A Sample Question from the Visual Commonsense Reasoning (VCR) Challenge Source: Zellers et al., 2018 Figure 2.2.24 VCR is one of the few visual benchmarks considered in this report on which AI systems have yet to surpass human performance, as shown in Figure 2.2.25. Visual Commonsense Reasoning (VCR) Task: Q->AR Score Source: VCR Leaderboard, 2022 | Chart: 2023 AI Index Report 85.00, Human Baseline 80 75.60 Q->AR Score 70 60 50 2018 2019 2020 2021 2022 Figure 2.2.25 Table of Contents Chapter 2 Preview 95 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.3 Computer Vision—Video Video analysis concerns reasoning or task operation across videos, rather than single images. 2.3 Computer Vision—Video Activity Recognition Kinetics-400, Kinetics-600, Kinetics-700 Activity recognition is the categorization of activities series of datasets for benchmarking video activity that occur in videos. Certain activities, such as recognition. Each dataset includes 650,000 large- sitting, sleeping, or walking, are easier for AI systems scale, high-quality video clips from YouTube that display to categorize than others which involve multiple a wide range of human activities, and each asks AI steps—for example, preparing dinner. systems to classify an action from a possible set of 400, Kinetics-400, Kinetics-600, and Kinetics-700 are a 600, and 700 categories, respectively (Figure 2.3.1). Example Classes From the Kinetics Dataset Source: Kay et al., 2017 Figure 2.3.1 Table of Contents Chapter 2 Preview 96 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.3 Computer Vision—Video As of 2022, there is a 7.8 percentage point gap in performance between the top system on Kinetics-600 and Kinetics-700, which suggests the 700 series dataset is still a meaningful challenge for video computer vision researchers (Figure 2.3.2). Kinetics-400, Kinetics-600, Kinetics-700: Top-1 Accuracy Source: Papers With Code, 2021; arXIv, 2022 | Chart: 2023 AI Index Report 91.80%, Kinetics-600 91.10%, Kinetics-400 90% Top-1 Accuracy (%) 84.00%, Kinetics-700 80% 70% 60% 2016 2017 2018 2019 2020 2021 2022 Figure 2.3.2 Table of Contents Chapter 2 Preview 97 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.3 Computer Vision—Video Narrative Highlight: A Closer Look at the Progress of Video Generation Multiple high quality text-to-video models, In September 2022, CogVideo’s top score was AI systems that can generate video clips from significantly surpassed by Meta’s Make-A-Video prompted text, were released in 2022 . In May, model (Figure 2.3.3). Make-A-Video performed researchers from Tsinghua University and the 63.6% better on UCF-101 than CogVideo. And, in Beijing Academy of Artificial Intelligence released October 2022, Google released a text-to-video CogVideo, a model that posted the then-highest system called Phenaki; however, this model was inception score on the UCF-101 benchmark for not benchmarked on UCF-101. 3 text-to-video generation (Figure 2.3.3). Notable Text-to-Video Models on UCF-101: Inception Score (IS) Source: Hong et al., 2022; Singer et al., 2022 | Chart: 2023 AI Index Report 79.28 80 82.55 70 Inception Score (IS) 60 50.46 50 40 32.7 32.36 30 27.38 28.87 24.69 20 10 0 DVD-GAN 2019 TGANv2 2020 VideoGPT MoCoGAN-HD 2021 DIGAN CogVideo TATS-base Make-A-Video 2022 Model Figure 2.2.3 3 Although these models are impressive, it is worth noting that they are thus far only capable of generating videos of a few seconds’ duration. Table of Contents Chapter 2 Preview 98 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Natural language processing (NLP) is the ability of computer systems to understand text. The last few years have seen the release of increasingly capable “large language models,” AI systems like PaLM, GPT-3, and GLM-130B, that are trained on massive amounts of data and adaptable to a wide range of downstream tasks. In this section, progress in NLP is tracked across the following skill categories: (1) English language understanding, (2) text summarization, (3) natural language inference, (4) sentiment analysis, (5) multitask language understanding, and (6) machine translation. 2.4 Language English Language Understanding SuperGLUE English language understanding challenges AI of AI models on eight different linguistic tasks. systems to understand the English language in A selection of these tasks is highlighted in Figure various ways: reading comprehension, yes/no 2.4.1. Their performance is then aggregated into a reading comprehension, commonsense reading single metric. SuperGLUE is a comprehensive English language understanding benchmark that tracks the progress comprehension, and logical reasoning. A Set of SuperGLUE Tasks4 Source: Wang et al., 2019 Figure 2.4.1 4 For the sake of brevity, this figure only displays four of the eight tasks. Table of Contents Chapter 2 Preview 99 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language This year’s top model on SuperGLUE, Vega, registered a new state-of-the-art score of 91.3, which is 1.5 percentage points higher than the human baseline. Performance on SuperGLUE is continuing to saturate. SuperGLUE: Score Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report 91.30 91 90 89.80, Human Baseline Score 89 88 87 86 85 2019 2020 2021 2022 Figure 2.4.2 Reading Comprehension Dataset Requiring Logical Reasoning (ReClor) In response to the saturation of traditional reading comprehension benchmarks, researchers from the National University of Singapore launched ReClor in 2020. ReClor, or Reading Comprehension Dataset Requiring Logical Reasoning, is a dataset of logical reasoning questions taken from the LSAT, the entrance exam for law schools in the United States and Canada. A sample question is shown in Figure 2.4.3 A Sample Question from the Reading Comprehension Dataset Requiring Logical Reasoning (ReClor) Source: Yu et al., 2020 Context: When a certain gland becomes cancerous in humans, it produces high levels of a particular protein. A blood test can determine the level of this protein well before a cancer of the gland could be detected by other means. Some doctors recommend that aggressive anticancer treatment should be begun as early as possible for anyone who is tested and is found to have high levels of the protein. Question: Which one of the following, if true, most seriously weakens the doctors’ recommendation? A. The blood test for the protein has been in use for some time to monitor the condition of patients who have been diagnosed as having cancer of the gland. B. Before the blood test became available, about one-third of all cases of cancer of the gland were detected in early stages. C. So far, no patients whose protein levels were found to be normal have subsequently developed cancer of the gland. D. Enlargement of the gland, a common condition infrequently associated with cancer, results in high levels of the protein. Figure 2.4.3 Table of Contents Chapter 2 Preview 100 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Figure 2.4.4 examines progress on ReClor. The top 2022 result of 80.6% represented an 18 percentage point improvement from 2020, the year the benchmark was released. Reading Comprehension Dataset Requiring Logical Reasoning (ReClor): Accuracy Source: ReClor Leaderboard, 2022; Papers With Code, 2022 | Chart: 2023 AI Index Report 80.60% 80% Accuracy (%) 75% 70% 65% 2020 2021 2022 Figure 2.4.4 Table of Contents Chapter 2 Preview 101 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Narrative Highlight: Just How Much Better Have Language Models Become? The AI Index tested how three large language models from three different years, GPT-2 (2019), GPT-3 (2020), and ChatGPT (2022), handle the same prompt: “Explain to me the major accomplishments of Theodore Roosevelt’s presidency.” More recent models are able to answer this question more effectively, both in terms of factual accuracy and quality of writing. GPT-2 (February 2019)5 GPT-3 (June 2020)7 ChatGPT (November 2022) (The most notable is the establishment of a national telephone and mail exchange company.) Theodore Roosevelt’s presidency was marked by a number of accomplishments. He is credited with leading the United States out of the Great Depression and into World War II. He also spearheaded the creation of the National Park Service and the National Wildlife Federation. Roosevelt also played a major role in the development of the Panama Canal. Theodore Roosevelt, President from 1901-1909, accomplished the following major accomplishments during his presidency: Commentary: This answer is an improvement over that of GPT-2 in that it actually somewhat responds to the question, but still shows significant flaws. GPT-3 is correct in asserting that Theodore Roosevelt was instrumentally involved in the development of the Panama Canal. However, he did not lead the United States out of the Great Depression and into World War II—that was done by a different American president, Franklin Delano Roosevelt. It appears that GPT-3 confused some of Franklin Delano Roosevelt’s accomplishments with those of Theodore Roosevelt. Finally, although Theodore Roosevelt did play a significant role in establishing many national parks, he did not in fact create the National Park Service or the National Wildlife Federation. The National Park Service was created in 1916 by Woodrow Wilson, and the National Wildlife Federation in 1936, much later than Theodore Roosevelt’s presidency. 8 3. Labor Reforms: Improved working conditions for Americans through advocacy for better hours, wages, and workplace safety. But the only things that will change with a president who knows how to operate a business is his understanding of the American system of government. The public knows all about our problems, yet at least half of the people who need it won’t know anything about ours. We can no longer have this. Our system of government is the only one in which you really need a public relations firm to work hard. The system of governmental control, while working, is not always effective in ensuring that the very things that our economy needs, do not always be at the top of everybody’s list. So I wanted to know how successful the system you employ for managing the public’s needs is in helping the American economy grow, how efficient the service our economy provides is being made, and how it can better serve the needs of a growing number of Americans. I wanted to know. This was before I ever hired my own lawyer to work on the case. This was before I ever hired my own law firm to handle government investigations. For my first major job, at least, I had to convince two top aides to get paid $90,000 for the role, and before I even started on the case, I had to convince one top staffer to get paid even $100,000 to have an interview with my lead prosecutor...6 Commentary: Most of this answer is gibberish. GPT-2 is able to respond with sentences that are largely grammatical, but fails to actually answer the question in any meaningful way. 1. Conservation: Established numerous national parks, monuments, and forests, including the Grand Canyon National Park. 2. Trust-busting: Took on monopolies, such as Standard Oil, through antitrust laws and broke up large corporations. 4. Progressive Reforms: Promoted progressive policies, including regulation of big business and protection of natural resources. 5. National Infrastructure: Improved national infrastructure through construction of the Panama Canal and funding for dam projects. 6. Diplomacy: Negotiated the end of the RussoJapanese War, resulting in his being awarded the Nobel Peace Prize. 7. Foreign Policy: Established the Roosevelt Corollary, asserting U.S. intervention in Latin America to maintain stability and protect American interests. Commentary: ChatGPT’s response is the strongest of the three. All but one of the facts about Theodore Roosevelt’s presidency are correct. Roosevelt proclaimed the Grand Canyon Game Preserve in 1906 and designated it a national monument in 1908. However, the Grand Canyon National Park was itself created by Woodrow Wilson.9 Otherwise the response is factually accurate. The response is also more thorough and concise than the answers provided by GPT-2 and GPT-3. 5 GPT-2 used the 124M parameter model downloaded from OpenAI’s GitHub page. 6 The complete answer outputted by GPT-2 is trimmed here for brevity. The full answer is included in the Appendix. 7 The specific GPT-3 model that was used was text-curie-001, which has training data up to October 2019. 8 The information in this section has been cross-verified with the Encyclopedia Britannica entries on Theodore Roosevelt, Franklin Delano Roosevelt, Woodrow Wilson, and the National Park Service, as well as the history page of the National Wildlife Federation. 9 Information on the history of the Grand Canyon National Park was cross-verified with the Wikipedia entry on the Grand Canyon National Park. Table of Contents Chapter 2 Preview 102 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Narrative Highlight: Planning and Reasoning in Large Language Models As illustrated above, AI systems have become The authors then tested notable language models increasingly strong on a wide range of reasoning on these tasks in a Blocksworld problem domain, tasks. This improvement has led many to claim that a problem environment where agents are given emerging AI systems, especially large language blocks of different colors and tasked with arranging models, possess reasoning abilities that are these blocks in particular orders. The authors somewhat similar to those possessed by humans. demonstrated that these large language models Other authors, however, have argued otherwise. performed fairly ineffectively (Figure 2.4.5). While 10 11 GPT-3, Instruct-GPT3, and BLOOM demonstrated In 2022, researchers (Valmeekam et al., 2022) introduced a more challenging planning and reasoning test for large language models that consists of seven the ability, in some contexts, to reformulate goals in robust ways, they struggled with other tasks like plan generation, optimal planning, and plan reuse. assignments: (1) plan generation, (2) cost-optimal Compared to humans, the large language models planning, (3) reasoning about plan execution, (4) performed much worse, suggesting that while they robustness to goal reformulation, (5) ability to reuse are capable, they lack human reasoning capabilities. plans, (6) replanning, and (7) plan generalization.12 Select Large Language Models on the Blocksworld Domain: Instances Correct Source: Valmeekam et al., 2022 | Chart: 2023 AI Index Report 0.6% Plan Generation GPT-3 5.0% Instruct-GPT3 0.5% Optimal Planning BLOOM 0.2% 3.2% 0% 5.6% 4.8% 3.0% Replanning 6.6% Plan Generalization 9.8% 11.0% 0% Plan Reuse 14.4% 0% 77.4% 76.8% Robustness to Goal Reformulation (Shu ing Goal Predicates) 21.0% 69.2% Robustness to Goal Reformulation (Full → Partial) Robustness to Goal Reformulation (Partial → Full) 0% 76.0% 9.0% 22.0% 60.2% 5.0% 10% 20% 30% 40% 50% Instances Correct (%) 60% 10 Some of the papers that claim language models can reason include: Kojima et al., 2022; Chowdhery et al., 2022; Li et al., 2021; Wei et al., 2022. 11 Valmeekam et al., 2022 advances this claim. 12 A complete description of these tasks can be found in the paper. Table of Contents Chapter 2 Preview 70% 80% Figure 2.4.5 103 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Text Summarization Text summarization tests how well AI systems can arXiv and PubMed synthesize a piece of text while capturing its core ArXiv and PubMed are two widely used datasets for content. Text summarization performance is judged benchmarking text summarization. The model that on ROUGE (Recall-Oriented Understudy for Gisting posted the state-of-the-art score in 2022 on both Evaluation), which measures the degree to which arXiv and PubMed, AdaPool, was developed by a an AI-produced text summary aligns with a human team from Salesforce Research (Figure 2.4.6). reference summary. ArXiv and PubMed: ROUGE-1 Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 51.05, PubMed 50.95, arXiv 50 ROUGE-1 45 40 35 2017 2018 2019 2020 2021 2022 Figure 2.4.6 Table of Contents Chapter 2 Preview 104 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Natural Language Inference uncertain premises. Imagine, for example, that Peter Also known as textual entailment, natural language window shattered and his laptop, which he left in the inference is the ability of AI systems to determine whether a hypothesis is true, false, or undetermined based on presented premises. Abductive Natural Language Inference (aNLI) Abductive natural language inference is a form of natural language inference in which plausible conclusions must be drawn from a set of limited and returns to his car after dinner at a restaurant to find the back seat, missing. He might immediately conclude that a thief broke into his car and stole the laptop. In 2019, the Allen Institute for AI launched aNLI, a comprehensive benchmark for abductive natural language inference that includes 170,000 premise and hypothesis pairs (Figure 2.4.7). Sample Question From the Abductive Natural Language Inference Benchmark (aNLI) Source: Allen Institute for AI, 2021 Figure 2.4.7 Abductive natural language inference is a challenging task. The human baseline remained unsurpassed until 2022, when an AI system registered a score of 93.7% (Figure 2.4.8). Abductive Natural Language Inference (aNLI): Accuracy Source: Allen Institute for AI, 2022 | Chart: 2023 AI Index Report 94% 93.65% 92.90%, Human Baseline Accuracy (%) 92% 90% 88% 86% 84% 2019 2020 2021 2022 Figure 2.4.8 Table of Contents Chapter 2 Preview 105 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Sentiment Analysis A Sample Sentence from SST Source: Socher et al., 2013 Sentiment analysis applies NLP techniques to identify the sentiment of a particular text. It is used by many businesses to better understand customer reviews. SST-5 Fine-Grained Classification The Stanford Sentiment Treebank (SST) is a dataset of 11,855 single sentences taken from movie reviews that are then transformed into 215,154 unique phrases whose sentiments have been annotated by human judges (Figure 2.4.9). Figure 2.4.9 A new state-of-the-art score of 59.8% was posted on SST-5 fine-grained classification by the Heinsen Routing + RoBERTa Large model (Figure 2.4.10). SST-5 Fine-Grained: Accuracy Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 60% 59.80% Accuracy (%) 55% 50% 45% 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 2.4.10 Table of Contents Chapter 2 Preview 106 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Multitask Language Understanding Sample Questions From MMLU Source: Hendrycks et al., 2021 a) Sample Math Questions A common criticism of language benchmarks such as GLUE and SuperGLUE is that they do not accurately test how capable language models are at applying the knowledge they learn across different domains.13 Multitask language understanding tests the ability of language models to reason across specialized subject domains. Massive Multitask Language Understanding (MMLU) b) A Sample Microeconomics Question Massive Multitask Language Understanding (MMLU) evaluates models in zero-shot or few-shot settings across 57 diverse subjects in the humanities, STEM, Figure 2.4.11 and the social sciences (Figure 2.4.11). Gopher, Chinchilla, and variants of PaLM have each posted state-of-the-art results on MMLU. The current top result on MMLU comes from Flan-PaLM, a Google model that reports an average score of 75.2% (Figure 2.4.12). MMLU: Average Weighted Accuracy Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report 75.20% 70% Accuracy (%) 60% 50% 40% 30% 2019 2020 2021 2022 Figure 2.4.12 13 This criticism is more formally articulated in Hendrycks et al., 2021. Table of Contents Chapter 2 Preview 107 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.4 Language Machine Translation (MT) Number of Commercially Available MT Systems Machine translation studies how well AI software The popularity of AI-based machine translation is can translate languages. In the last five years, manifested in the number of commercial machine machine translation has been dominated by neural translation services on the market. Since 2017, the total networks which power current tools like DeepL and number of independent machine translation services Google Translate. has increased six times (Figure 2.4.13). Number of Independent Machine Translation Services Source: Intento, 2022 | Chart: 2023 AI Index Report Commercial 80 Open Source Pre-trained Number of Independent Machine Translation Services Preview 70 60 54 50 46 40 34 45 30 26 21 20 10 0 23 28 16 9 10 8 9 2017-May 2017-Jul 12 13 9 10 2017-Nov 2018-Mar 18 38 21 23 15 2018-Jul 2018-Dec 2019-Jun 2019-Nov 6 5 5 2020-Jul 2021-Sep 2022-Jul Figure 2.4.13 Table of Contents Chapter 2 Preview 108 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.5 Speech AI systems that work with human speech are usually tasked with converting spoken words into text and recognizing the individuals speaking. 2.5 Speech Speech Recognition task of matching certain speech with a particular Speech recognition is the ability of AI systems to been expanded; however, the data in this subsection identify spoken words and convert them into text. Speech recognition has progressed so much so that nowadays many computer programs or texting apps are equipped with dictation devices that can seamlessly transcribe speech into writing. individual. Over the years, the VoxCeleb dataset has tracks progress on the original dataset. This year’s top result on the original VoxCeleb dataset was posted by American researchers, whose model achieved an equal error rate of 0.1%, which represents a 0.28 percentage point decrease from the state-of– VoxCeleb VoxCeleb is a large-scale audiovisual dataset of human speech for speaker recognition, which is the the-art result achieved by Chinese researchers in the previous year (Figure 2.5.1). VoxCeleb: Equal Error Rate (EER) Source: VoxCeleb, 2022 | Chart: 2023 AI Index Report 8% Equal Error Rate (EER) 6% 4% 2% 0.14% 0% 2017 2018 2019 2020 2021 2022 Figure 2.5.1 Table of Contents Chapter 2 Preview 109 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.5 Speech Narrative Highlight: Whisper One of the major themes in the last few years of AI progress has been the emergence of large language models that are trained on massive amounts of data and capable of executing a diverse range of tasks. In 2022, this idea of training on large data to achieve cross-domain performance arrived in the world of speech recognition with OpenAI’s launch of Whisper. Whisper is a large-scale speech recognition model that was trained in a weakly supervised way on 700,000 hours of audio data. Whisper was capable of strong, although not state-of-the-art, performance on many speech recognition tasks in zero-shot settings.14 Whisper outperformed wav2vec 2.0 Large, another speech recognition model, across a wide range of popular English speech recognition benchmarks (Figure 2.5.2). Similarly, Whisper proved to be a better speech translator than many other leading AI translator models (Figure 2.5.3). Whisper also outperformed other commercial automated speech recognition systems and scored similarly to top human transcription services (Figure 2.5.4).15 Despite this impressive performance, there were still some speech tasks, like language identification, on which Whisper trailed state-of-the-art models (Figure 2.5.5). wav2vec 2.0 Large (No LM) Vs. Whisper Large V2 Across Datasets Notable Models on X→EN Subset of CoVoST 2 Source: Radford et al., 2022 | Chart: 2023 AI Index Report Source: Radford et al., 2022 | Chart: 2023 AI Index Report 6.2% 5.2% LibriSpeech Other Zero-Shot Whisper 36.4% 7.7% 3.9% WSJ 34.8% CallHome 17.6% MAESTRO 25.2% mSLAM-CTC (2B) 24.8% 28.3% Switchboard 13.8% 37.0% AMI IHM 16.9% 35.6% CORAAL 16.2% 17.9% VoxPopuli En 7.3% 65.8% CHiME-6 25.5% 10.5% TED-LIUM 4.0% FLEURS En 4.4% XLS-R (2B) 22.1% 14.6% 29.9% Common Voice 9.0% 24.5% Artie LibriSpeech Clean 29.1% 67.6% AMI SDM1 6.2% 0% wav2vec 2.0 Large (No LM) XMEF-X 14.7% Whisper Large V2 2.7% 2.7% 10% 20% 30% 40% 50% Word Error Rate (%) 60% 70% Figure 2.5.2 0% 10% 20% 30% Bilingual Evaluation Understudy (BLEU) Score Figure 2.5.3 14 Zero-shot learning refers to the ability of an AI system to learn a particular task without being trained on that task. 15 Kincaid46 is a dataset of 46 audio files and transcripts that were published in the blog post, “Which automatic transcription service is the most accurate?—2018.” Table of Contents Chapter 2 Preview 110 Artificial Intelligence Index Report 2023 Narrative Highlight: Chapter 2: Technical Performance 2.5 Speech Whisper (cont’d) Notable Speech Transcription Services on Kincaid46 Source: Radford et al., 2022 | Chart: 2023 AI Index Report ASR Computer-Assisted 9.66% Company B 9.74% Company C 10.90% Company D 12.20% Company E 7.61% Company F 8.14% Company G 8.65% Company H Language Identi cation Accuracy (%) 8.81% Company A 77.7% 80% 71.4% 64.5% 60% 40% 20% 8.96% Company I 0% Source: Radford et al., 2022 | Chart: 2023 AI Index Report Human Transcription Whisper Notable Models on FLEURS: Language Identi cation Accuracy 10.50% 2% 4% 6% 8% 10% Median Word Error Rate (%) 12% 14% 0% w2v-bert-51 (0.6B) Figure 2.5.4 mSLAM-CTC (2B) Zero-shot Whisper Figure 2.5.5 Whisper represents a breakthrough in state-of-the-art speech recognition systems. Traditionally, such systems were either pre-trained using supervised learning methods or pre-trained without supervision but required fine-tuning. Acquisition of data for supervised pre-training is time-consuming and costly. However, pre-training without supervision still requires further algorithmic specification to realize a desired objective like speech recognition. Algorithmic specification itself often requires a skilled practitioner. Whisper resolves these issues by demonstrating that a speech recognition system can perform well across a diverse range of tasks with massive amounts of unlabeled speech data. Table of Contents Chapter 2 Preview 111 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.6 Reinforcement Learning In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning from their prior actions. Systems are rewarded if they achieve a desired goal and punished if they fail. 2.6 Reinforcement Learning Reinforcement Learning Environments The Different Environments in Procgen Source: OpenAI, 2019 Reinforcement learning agents require environments, not datasets, to train: They must be trained in environments where they can experiment with various actions that will allow them to identify optimal game strategies. Procgen Procgen is a reinforcement learning environment introduced by OpenAI in 2019. It includes 16 procedurally generated video-game-like environments specifically designed to test the Figure 2.6.1 ability of reinforcement learning agents to learn generalizable skills (Figure 2.6.1). Performance on Procgen is measured in terms of mean-normalized score. Researchers typically train their systems on 200 million training runs and report an average score across the 16 Procgen games. The higher the system scores, the better the system. Table of Contents Chapter 2 Preview 112 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.6 Reinforcement Learning A team of industry and academic researchers from Korea posted the top score of 0.6 on Procgen in 2022 (Figure 2.6.2). Procgen: Mean of Min-Max Normalized Score Source: arXiv, 2022 | Chart: 2023 AI Index Report Mean of Min-Max Normalized Score 0.57 0.50 0.40 2019 2020 2021 2022 Figure 2.6.2 Table of Contents Chapter 2 Preview 113 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.6 Reinforcement Learning Narrative Highlight: Benchmark Saturation An emerging theme in this year’s AI Index is the observed performance saturation across many popular technical performance benchmarks. Last year’s AI Index Report observed a similar trend; however, benchmark saturation has been particularly pronounced this year. Figure 2.6.3 shows the relative improvement since the benchmark first launched (overall improvement) and relative improvement within the last year (YoY improvement) on AI technical benchmarks considered in this year’s AI Index. The improvements are reported as percent changes. For all but 7 of the benchmarks, the improvement registered is less than 5%. The median improvement within the last year is 4%, while the median improvement since launch is 42.4%.16 Moreover, this year the AI Index elected not to feature traditionally popular benchmarks like SQuAD1.1 and SQuAD2.0, as no new state-of-the-art results were posted. Moreover, the speed at which benchmark saturation is being reached is increasing. Researchers have responded to this increasing saturation by launching newer and more comprehensive benchmarking suites such as BIG-bench and HELM. Improvement Over Time on Select AI Index Technical Performance Benchmarks Source: AI Index, 2022 | Chart: 2023 AI Index Report Overall Improvement YoY Improvement 120% Improvement (%) 100% 80% 60% 40% MMLU SST-5 Procgen Language ANLI PubMed arXiv ReClor SuperGLUE Kinetics-700 Kinetics-600 Kinetics-400 COCO VQA VCR Vision Video VoxCeleb Vision Image CIFAR-10 STL-10 Kvasir-SEG Cityscapes MPII Celeb-DF FVRT 0% ImageNet Top-1 20% SR RL Benchmark Figure 2.6.3 16 The improvements reviewed in this section are reported as relative change. Figure 2.6.3 should therefore not be used to conduct comparisons of improvements across benchmarks, as each benchmark has different parameters. Table of Contents Chapter 2 Preview 114 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.7 Hardware Deep learning AI algorithms are trained on GPUs or TPUs, which accelerate the training speed of AI systems. As AI systems process ever-larger datasets, it is crucial to monitor advancements in hardware capabilities. 2.7 Hardware MLPerf Training every AI skill category had significantly decreased. MLPerf is an AI training competition run by the This year, this trend has continued, albeit at a slightly ML Commons organization. In this challenge, slower pace. Record-low training times were posted participants train ML systems to execute various in the object detection, speech recognition, image tasks using a common architecture. Entrants are then segmentation, recommendation, image classification, ranked on their absolute wall clock time, which is and language processing categories (Figure 2.7.1). how long it takes for the system to train. In categories like image classification and object Last year, the AI Index observed that since the competition launched, training times for virtually detection, the top AI systems can now train roughly 32 times quicker than in 2018, when the competition first launched. MLPerf Training Time of Top Systems by Task: Minutes Source: MLPerf, 2022 | Chart: 2023 AI Index Report 60 Training Time (Minutes; Log Scale) 40 20 10 5 2.25, Object Detection (Heavyweight) 2.15, Speech Recognition 2 1.22, Image Segmentation 1 0.52, Recommendation 0.5 0.34, Object Detection (Lightweight) 0.2 0.19, Image Classi cation 0.18, Language Processing 2018 2019 2020 2021 2022 Figure 2.7.1 Table of Contents Chapter 2 Preview 115 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.7 Hardware Data on the number of accelerators used by the between the mean number of accelerators used by hardware systems submitted to MLPerf also all entrants and the average accelerators used by the suggests that stronger hardware has been powering systems that post the top results.17 This gap suggests decreasing training times (Figure 2.7.2). Since the that having better hardware is essential to training the start of the MLPerf competition, the gap has grown fastest systems. MLPerf Hardware: Accelerators Source: MLPerf, 2022 | Chart: 2023 AI Index Report 4,500 4,216, Maximum Number of Accelerators Used 4,000 Number of Accelerators 3,500 3,000 2,500 2,000 1,859, Average Accelerators Used by Top System 1,500 1,000 500 211, Mean Number of Accelerators 0 20 18- De c-1 20 192 20 Ju n-1 20 0 20 -Ju l-2 21- 9 20 Ju n-3 21- 0 20 De c-0 1 22 20 -Ju 22 n-2 9 -N ov -09 Figure 2.7.2 17 An accelerator, like a GPU or TPU, is a chip that is chiefly used for the machine learning component of a training run. Table of Contents Chapter 2 Preview 116 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.7 Hardware MLPerf Inference Figures 2.7.3 to 2.7.6 plot the throughput of the state-of- In deploying AI, inference is the step where trained the-art submissions on MLPerf Inference across four skill AI systems generate predictions, e.g. classifying categories: image classification, language processing, objects. recommendation, and speech recognition. The number of In 2020, ML Commons introduced MLPerf Inference, a performance benchmarking suite that measures how fast a trained AI system can process inputs and produce inferences. The MLPerf Inference suite tracks the throughput of AI systems, measured in samples per second or queries per second.18 MLPerf Best-Performing Hardware for Image Classi�cation: O�ine and Server Scenario Source: MLPerf, 2022 | Chart: 2023 AI Index Report 700k inferences generated by the top-performing AI systems has significantly increased since the first iteration of the competition in 2020. For example, the number of offline samples generated by the top image classifiers and language processors have more than doubled since 2020, while those for recommendation systems have increased by roughly 23%. MLPerf Best-Performing Hardware for Language Processing: O ine and Server Scenario Source: MLPerf, 2022 | Chart: 2023 AI Index Report 679,915, O ine (Samples/s) 650k 75,153, O ine (Samples/s) 70,992, Server (Queries/s) 70k 630,221, Server (Queries/s) 600k 60k Throughput Throughput 550k 500k 450k 400k 50k 40k 350k 30k 300k 250k 2020 2021 2022 Figure 2.7.3 MLPerf Best-Performing Hardware for Recommendation: O ine and Server Scenario Source: MLPerf, 2022 | Chart: 2023 AI Index Report 2020 2022 Figure 2.7.4 MLPerf Best-Performing Hardware for Speech Recognition: O ine and Server Scenario Source: MLPerf, 2022 | Chart: 2023 AI Index Report 160k 2.7M 155,811, O ine (Samples/s) 2,683,620, Server (Queries/s) 2,645,980, O ine (Samples/s) 150k 2.6M 140k 136,498, Server (Queries/s) 130k Throughput 2.5M Throughput 2021 2.4M 120k 110k 100k 2.3M 90k 2.2M 80k 2.1M 70k 2020 2021 2022 Figure 2.7.5 2020 2021 2022 Figure 2.7.6 18 The following blog post from Dell Technologies offers a good distinction between offline and server samples: “Offline—one query with all samples is sent to the system under test (SUT). The SUT can send the results back once or multiple times in any order. The performance metric is samples per second. Server—the queries are sent to the SUT following a Poisson distribution (to model real-world random events). One query has one sample. The performance metric is queries per second (QPS) within the latency bound.” Table of Contents Chapter 2 Preview 117 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.7 Hardware Trends in GPUs: Performance and Price the performance of a computational device. The higher This year, the AI Index built on work previously the FLOP/s, the better the hardware. done by the research collective Epoch and analyzed trends over time in GPU performance and price.19 Figure 2.7.8 showcases the median single performance of new GPUs by release date, which continues to rise Figure 2.7.7 showcases the FP32 (single precision) year over year. Since 2021, the median FLOP/s speed performance FLOP/s of different GPUs released has nearly tripled, and since 2003 it has increased from 2003 to 2022. FLOP/s stands for “Floating roughly 7,000 times. Point Operations per second” and is a measure of FP32 (Single Precision) Performance (FLOP/s) by Hardware Release Date, 2003–22 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Median FP32 (Single Precision) Performance (FLOP/s), 2003–22 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report 2.0e+14 1.0e+14 5.0e+13 1.0e+13 2.0e+13 1.0e+13 5.0e+12 5.0e+12 Median FLOP/s (Log Scale) 2.0e+12 1.0e+12 5.0e+11 2.0e+11 1.0e+11 5.0e+10 2.0e+10 1.0e+10 5.0e+9 2.0e+12 1.0e+12 5.0e+11 2.0e+11 1.0e+11 5.0e+10 2.0e+10 1.0e+10 2.0e+9 1.0e+9 5.0e+8 5.0e+9 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2.0e+9 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 FLOP/s (Log Scale) 2.23e+13 2.0e+13 Figure 2.7.7 Figure 2.7.8 19 The Appendix fully delineates both the methodology of this approach and the unique ways in which AI Index research built upon the existing Epoch research. Table of Contents Chapter 2 Preview 118 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.7 Hardware Finally, figures 2.7.9 and 2.7.10 consider GPU trends 2022 is 1.4 times greater than it was in 2021 and in terms of FLOP/s per U.S. Dollar. This statistic 5600 times greater than in 2003, showing a doubling considers whether the underlying performance of in performance every 1.5 years. As noted in similar GPUs is increasing relative to their changing costs. analyses, improvements in the price–performance of As evidenced most clearly in Figure 2.7.10, the AI hardware has facilitated increasingly larger training price–performance of GPUs is rapidly increasing. runs and encouraged the scaling of large AI models. 20 The median FLOP/s per U.S. Dollar of GPUs in FP32 (Single Precision) Performance (FLOP/s) per U.S. Dollar by Hardware Release Date, 2003–22 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Median FP32 (Single Precision) Performance (FLOP/s) per U.S. Dollar, 2003–22 Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report 3.59e+10 35B 50B Median FLOP/s per U.S. Dollar 30B FLOP/s per U.S. Dollar 40B 30B 20B 10B 25B 20B 15B 10B 5B 0 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 0 Figure 2.7.9 Figure 2.7.10 20 The data in figures 2.7.9 and 2.7.10 has been adjusted for inflation. The exact details of the adjustment are outlined in greater detail in the Appendix. Table of Contents Chapter 2 Preview 119 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.8 Environment There have been mounting concerns about the environmental impact of computational resources and the energy required for AI training and inference. Although there is no standard benchmark for tracking the carbon intensity of AI systems, this subsection synthesizes the findings of different researchers who are exploring the link between AI and the environment. Conducting research on the environmental effects of AI was challenging as there are wildly varying estimates, the validity of which have not yet been definitively established. To that end, the AI Index focuses on research from a recent paper by Luccioni et al., 2022. As AI models continue growing in size and become more universally deployed, it will be increasingly important for the AI research community to consciously monitor the effect AI systems have on the environment. 2.8 Environment Environmental Impact of Select Large Language Models challenging to directly compare the carbon footprint Many factors determine the amount of carbon reporting carbon emissions are not standardized. of these models, as the accounting methodologies for emissions emitted by AI systems, including the number of parameters in a model, the power usage effectiveness of a data center, and the grid carbon intensity. Power Usage Effectiveness (PUE) is a metric used to evaluate the energy efficiency of Of the four language models being compared, GPT3 released the most carbon, 1.4 times more than Gopher, 7.2 times more than OPT, and 20.1 times more than BLOOM. data centers. It is the ratio of the total amount of Figure 2.8.2 relativizes the carbon-emission estimates energy used by a computer data center facility, to real-life examples. For instance, BLOOM’s training including air conditioning, to the energy delivered run emitted 1.4 times more carbon than the average to computing equipment. The higher the PUE, the American uses in one year and 25 times that of flying less efficient the data center. Figure 2.8.1 shows how one passenger round trip from New York to San these factors compare across four large language Francisco. BLOOM’s training consumed enough energy models: GPT-3, Gopher, OPT, and BLOOM. It is to power the average American home for 41 years.21 Environmental Impact of Select Machine Learning Models, 2022 Source: Luccioni et al., 2022 | Table: 2023 AI Index Report Model Number of Parameters Datacenter PUE Grid Carbon Intensity Power Consumption C02 Equivalent Emissions C02 Equivalent Emissions x PUE Gopher 280B 1.08 330 gC02eq/kWh 1,066 MWh 352 tonnes BLOOM 176B 1.20 57 gC02eq/kWh 433 MWh 25 tonnes 380 tonnes 30 tonnes GPT-3 175B 1.10 429 gC02eq/kWh 1,287 MWh 502 tonnes 552 tonnes OPT 175B 1.09 231 gC02eq/kWh 324 MWh 70 tonnes 76.3 tonnes Figure 2.8.1 21 The U.S. Energy Information Administration estimates that in 2021, the average annual electricity consumption of a U.S. residential utility customer was 10,632 kilowatt hours (kWh). Table of Contents Chapter 2 Preview 120 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.8 Environment CO2 Equivalent Emissions (Tonnes) by Selected Machine Learning Models and Real Life Examples, 2022 Source: Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2023 AI Index Report GPT-3 (175B) 502 Gopher (280B) 352 OPT (175B) 70 Car, Avg. Incl. Fuel, 1 Lifetime 63 BLOOM (176B) 25 American Life, Avg., 1 Year 18.08 Human Life, Avg., 1 Year 5.51 Air Travel, 0.99 1 Passenger, NY–SF 0 Table of Contents 50 100 Chapter 2 Preview 150 200 250 300 CO2 Equivalent Emissions (Tonnes) 350 400 450 500 Figure 2.8.2 121 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.8 Environment Narrative Highlight: Using AI to Optimize Energy Usage Training AI systems can be incredibly energy intensive. At the same time, recent research suggests that AI systems can be used to optimize energy consumption. In 2022, DeepMind released the results of a 2021 experiment in which it trained a reinforcement learning agent called BCOOLER (BVE-based COnstrained Optimization Learner with Ensemble Regularization) to optimize cooling procedures for Google’s data centers. Figure 2.8.3 presents the energy-saving results from one particular BCOOLER experiment. At the end of the three-month experiment, BCOOLER achieved roughly 12.7% energy savings. BCOOLER was able to achieve these savings while maintaining the cooling comfort levels that the building managers preferred. Energy Savings Results Over Time for Select BCOOLER Experiment Source: Luo et al., 2022 | Chart: 2023 AI Index Report 12.7% 12% Cumulative AI Savings (%) 10% 8% 6% 4% 2% 0% 2021-Aug-01 2021-Aug-15 2021-Aug-29 2021-Sep-12 2021-Sep-26 2021-Oct-10 2021-Oct-24 Figure 2.8.3 Table of Contents Chapter 2 Preview 122 Artificial Intelligence Index Report 2023 Chapter 2: Technical Performance 2.9 AI for Science 2022 was a groundbreaking year for AI in science. This subsection looks at some meaningful ways in which AI has recently been used to accelerate scientific discovery. 2.9 AI for Science Accelerating Fusion Science Through Learned Plasma Control Photos of the Variable Configuration Tokamak (TCV) at EPFL Source: DeepMind, 2022 Nuclear fusion could generate clean energy by fusing hydrogen. A common approach to achieving nuclear fusion is using a tokamak, a machine which controls and contains the heated hydrogen plasma (Figure 2.9.1). However, the plasmas produced in these machines are unstable and necessitate constant Figure 2.9.1 monitoring. In 2022, researchers at DeepMind developed a reinforcement learning algorithm to discover optimal tokamak management procedures. Discovering Novel Algorithms for Matrix Manipulation With AlphaTensor A Demonstration of AlphaTensor’s Matrix Manipulation Process Source: Fawzi et al., 2022 Matrix multiplication is a simple algebraic operation that is essential to many computations, including neural networks and scientific computing (Figure 2.9.2). The classic algorithm to multiply two 2x2 Figure 2.9.2 matrices takes 2^3 = 8 multiplications. Strassen discovered 50 years ago how to reduce this to 7, and generally how to multiply two n x n matrices in including 4x4 matrices over the integers [0,1]. It also matches state- O(n^ log(7)) operations. DeepMind’s of-the-art performance on several other matrix sizes, including 4x4 AlphaTensor uses Reinforcement over the integers. It does this by searching through large numbers Learning to improve on state-of-the- of possible algorithms, and evaluating them over real computer art algorithms for many matrix sizes, architectures. Table of Contents Chapter 2 Preview 123 Artificial Intelligence Index Report 2023 Designing Arithmetic Circuits With Deep Reinforcement Learning This year, a team at Nvidia discovered a Chapter 2: Technical Performance 2.9 AI for Science A Juxtaposition of Nvidia Circuits Designed by PrefixRL Vs. EDA Tools Source: Roy et al., 2022 novel approach to improving the chips that power AI systems: Use AI systems to design better chips. They were able to train a reinforcement learning agent to design chip circuits that are smaller, faster, and more efficient than the circuits designed by electronic design automation tools (EDAs). One of Nvidia’s latest categories of chips, the Hopper GPU architecture, has over 13,000 instances of AI-designed circuits. Figure 2.9.3 Figure 2.9.3 shows a 64-bit adder circuit designed by Nvidia’s PrefixRL AI agent (on the left) which is 25% smaller while being just as fast and functional as those designed by the state-of-the-art EDA tools. Unlocking de Novo Antibody Design With Generative AI Zero-Shot Generative AI for de Novo Antibody Design Source: Shanehsazzadeh et al., 2023 Antibody discovery, which is referred to as de novo antibody discovery, typically requires immense amounts of time and resources. Traditional methods for de novo discovery offer little control over the outputs, so that proposed antibodies are often suboptimal. To that end, a team of researchers turned to generative AI models to create antibodies in a zero-shot fashion, where antibodies are created with one round of model generation without Figure 2.9.4 further optimizations (Figure 2.9.4). These AI-generated antibodies are also robust. The fact that generative AI can create new antibodies has the potential to accelerate drug discovery. Table of Contents Chapter 2 Preview 124 Artificial Intelligence Index Report 2023 CHAPTER 3: Technical AI Ethics Text and Analysis by Helen Ngo Artificial Intelligence Index Report 2023 CHAPTER 3 PREVIEW: Technical AI Ethics Overview 128 Fairness in Machine Translation 143 Chapter Highlights 129 RealToxicityPrompts 144 3.4 Conversational AI Ethical Issues 145 3.1 Meta-analysis of Fairness and Bias Metrics 130 Gender Representation in Chatbots 145 Number of AI Fairness and Bias Metrics 130 Anthropomorphization in Chatbots 146 Number of AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks) 131 Narrative Highlight: Tricking ChatGPT 147 3.2 AI Incidents 133 3.5 Fairness and Bias in Text-to-Image Models 148 Fairness in Text-to-Image Models (ImageNet Vs. Instagram) 148 VLStereoSet: StereoSet for Text-to-Image Models 150 Examples of Bias in Text-to-Image Models 152 AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository: Trends Over Time 133 AIAAIC: Examples of Reported Incidents 134 3.3 Natural Language Processing Bias Metrics 137 Number of Research Papers Using Perspective API 137 Winogender Task From the SuperGLUE Benchmark Stable Diffusion 152 DALL-E 2 153 Midjourney 154 3.6 AI Ethics in China 155 Model Performance on the Winogender Task From the SuperGLUE Benchmark 138 Topics of Concern 155 Strategies for Harm Mitigation 156 Performance of Instruction-Tuned Models on Winogender Principles Referenced by Chinese Scholars in AI Ethics 157 138 139 BBQ: The Bias Benchmark for Question Answering 140 Fairness and Bias Trade-Offs in NLP: HELM 142 Table of Contents Chapter 3 Preview 126 Artificial Intelligence Index Report 2023 CHAPTER 3 PREVIEW (CONT’D): Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS 158 ACM FAccT (Conference on Fairness, Accountability, and Transparency) 158 Accepted Submissions by Professional Affiliation 158 Accepted Submissions by Geographic Region 159 NeurIPS (Conference on Neural Information Processing Systems) 160 Real-World Impact 160 Interpretability and Explainability 161 Causal Effect and Counterfactual Reasoning 162 Privacy 163 Fairness and Bias 164 3.8 Factuality and Truthfulness 165 Automated Fact-Checking Benchmarks: Number of Citations 165 Missing Counterevidence and NLP Fact-Checking 166 TruthfulQA 167 ACCESS THE PUBLIC DATA Table of Contents Chapter 3 Preview 127 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics Overview Fairness, bias, and ethics in machine learning continue to be topics of interest among both researchers and practitioners. As the technical barrier to entry for creating and deploying generative AI systems has lowered dramatically, the ethical issues around AI have become more apparent to the general public. Startups and large companies find themselves in a race to deploy and release generative models, and the technology is no longer controlled by a small group of actors. In addition to building on analysis in last year’s report, this year the AI Index highlights tensions between raw model performance and ethical issues, as well as new metrics quantifying bias in multimodal models. Table of Contents Chapter 3 Preview 128 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics Chapter Highlights The effects of model scale on bias and toxicity are confounded by training data and mitigation methods. In the past year, several institutions have built their own large models trained on proprietary data— and while large models are still toxic and biased, new evidence suggests that these issues can be somewhat mitigated after training larger models with instruction-tuning. Generative models have arrived and so have their ethical problems. In 2022, generative models became part of the zeitgeist. These models are capable but also come with ethical challenges. Text-to-image generators are routinely biased along gender dimensions, and chatbots like ChatGPT can be tricked into serving nefarious aims. The number of incidents concerning the misuse of AI is rapidly rising. According to the AIAAIC database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using callmonitoring technology on their inmates. This growth is evidence of both greater use of AI technologies and awareness of misuse possibilities. Table of Contents Chapter 3 Preview Fairer models may not be less biased. Extensive analysis of language models suggests that while there is a clear correlation between performance and fairness, fairness and bias can be at odds: Language models which perform better on certain fairness benchmarks tend to have worse gender bias. Interest in AI ethics continues to skyrocket. The number of accepted submissions to FAccT, a leading AI ethics conference, has more than doubled since 2021 and increased by a factor of 10 since 2018. 2022 also saw more submissions than ever from industry actors. Automated fact-checking with natural language processing isn’t so straightforward after all. While several benchmarks have been developed for automated fact-checking, researchers find that 11 of 16 of such datasets rely on evidence “leaked” from fact-checking reports which did not exist at the time of the claim surfacing. 129 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.1 Meta-analysis of Fairness and Bias Metrics 3.1 Meta-analysis of Fairness and Bias Metrics Number of AI Fairness and Bias Metrics In 2022 several new datasets or metrics were released to probe models for bias and fairness, either as standalone papers or as part of large community Algorithmic bias is measured in terms of allocative and representation harms. Allocative harm occurs when a system unfairly allocates an opportunity or resource to a specific group, and representation harm happens when a system perpetuates stereotypes and power dynamics in a way that reinforces subordination of a group. Algorithms are considered fair when they make predictions that neither favor nor discriminate against individuals or groups based on protected attributes which cannot be used for decision-making due to legal or ethical reasons (e.g., race, gender, religion). efforts such as BIG-bench. Notably, metrics are being extended and made specific: Researchers are zooming in on bias applied to specific settings such as question answering and natural language inference, extending existing bias datasets by using language models to generate more examples for the same task (e.g., Winogenerated, an extended version of the Winogender benchmark). Figure 3.1.1 highlights published metrics that have been cited in at least one other work. Since 2016 there has been a steady and overall increase in the total number of AI fairness and bias metrics. Number of AI Fairness and Bias Metrics, 2016–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 20 19 Number of Metrics 15 10 5 0 2016 2017 2018 2019 2020 2021 2022 Figure 3.1.1 Table of Contents Chapter 3 Preview 130 Artificial Intelligence Index Report 2023 Number of AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks) Chapter 3: Technical AI Ethics 3.1 Meta-analysis of Fairness and Bias Metrics correlate with each other, highlighting the importance of careful selection of metrics and interpretation of results. In 2022, a robust stream of both new ethics Measurement of AI systems along an ethical benchmarks as well as diagnostic metrics was dimension often takes one of two forms. A benchmark introduced to the community (Figure 3.1.2). Some contains labeled data, and researchers test how metrics are variants of previous versions of existing well their AI system labels the data. Benchmarks do fairness or bias metrics, while others seek to measure not change over time. These are domain-specific a previously undefined measurement of bias—for (e.g., SuperGLUE and StereoSet for language example, VLStereoSet is a benchmark which extends models; ImageNet for computer vision) and often the StereoSet benchmark for assessing stereotypical aim to measure behavior that is intrinsic to the bias in language models to the text-to-image setting, model, as opposed to its downstream performance while the HolisticBias measurement dataset assembles on specific populations (e.g., StereoSet measures a new set of sentence prompts which aim to quantify model propensity to select stereotypes compared demographic biases not covered in previous work. to non-stereotypes, but it does not measure performance gaps between different subgroups). These benchmarks often serve as indicators of intrinsic model bias, but they may not give as clear an indication of the model’s downstream impact and its extrinsic bias when embedded into a system. A diagnostic metric measures the impact or performance of a model on a downstream task, and it is often tied to an extrinsic impact—for example, the differential in model performance for some task on a population subgroup or individual compared to similar In 2022 a robust stream of both new ethics benchmarks as well as diagnostic metrics was introduced to the community. individuals or the entire population. These metrics can help researchers understand how a system will perform when deployed in the real world, and whether it has a disparate impact on certain populations. Previous work comparing fairness metrics in natural language processing found that intrinsic and extrinsic metrics for contextualized language models may not Table of Contents Chapter 3 Preview 131 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.1 Meta-analysis of Fairness and Bias Metrics Number of New AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks), 2016–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report Benchmarks 14 Diagnostic Metrics 13 12 11 10 Number of Metrics 10 9 9 9 8 6 4 4 3 2 2 3 2 2 1 0 0 2016 2017 2018 2019 2020 2021 2022 Figure 3.1.2 Table of Contents Chapter 3 Preview 132 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.2 AI Incidents 3.2 AI Incidents AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository: Trends Over Time that tracks the ethical issues associated with AI The AI, Algorithmic, and Automation Incidents in reported incidents is likely evidence of both and Controversies (AIAAIC) Repository is an the increasing degree to which AI is becoming independent, open, and public dataset of recent intermeshed in the real world and a growing incidents and controversies driven by or relating to awareness of the ways in which AI can be ethically AI, algorithms, and automation. It was launched in misused. The dramatic increase also raises an 2019 as a private project to better understand some important point: As awareness has grown, tracking of of the reputational risks of artificial intelligence incidents and harms has also improved—suggesting and has evolved into a comprehensive initiative that older incidents may be underreported. technology. The number of newly reported AI incidents and controversies in the AIAAIC database was 26 times greater in 2021 than in 2012 (Figure 3.2.1)1. The rise Number of AI Incidents and Controversies, 2012–21 Source: AIAAIC Repository, 2022 | Chart: 2023 AI Index Report 260 Number of AI Incidents and Controversies 250 200 150 100 50 0 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 3.2.1 1 This figure does not consider AI incidents reported in 2022, as the incidents submitted to the AIAAIC database undergo a lengthy vetting process before they are fully added. Table of Contents Chapter 3 Preview 133 Artificial Intelligence Index Report 2023 AIAAIC: Examples of Reported Incidents The subsection below highlights specific AI incidents reported to the AIAAIC database in order to demonstrate some real-world ethical issues related to AI. The specific type of AI technology associated with each incident is listed in parentheses alongside the date when these Chapter 3: Technical AI Ethics 3.2 AI Incidents Deepfake of President Volodymyr Zelenskyy Surrendering (Deepfake, March 2022) In March of 2022, a video that was circulated on social media and a Ukrainian news website purported to show the Ukrainian president directing his army to surrender the fight against Russia (Figure 3.2.2). It was eventually revealed that the video was a deepfake. incidents were reported to the AIAAIC database.2 Source: Verify, 2022 Figure 3.2.2 2 Although these events were reported in 2022, some of them had begun in previous years. Table of Contents Chapter 3 Preview 134 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.2 AI Incidents Verus U.S. Prison Inmate Call Monitoring (Speech Recognition, Feb. 2022) Reports find that some American prisons are using AI-based systems to scan inmates’ phone calls (Figure 3.2.3). These reports have led to concerns about surveillance, privacy, and discrimination. There is evidence that voice-to-text systems are less accurate at transcribing for Black individuals, and a large proportion of the incarcerated population in the United States is Black. Source: Reuters, 2022 Figure 3.2.3 Intel Develops a System for Student Emotion Monitoring (Pattern Recognition, April 2022) Intel is working with an education startup called Classroom Technologies to create an AI-based technology that would identify the emotional state of students on Zoom (Figure 3.2.4). The use of this technology comes with privacy and discrimination concerns: There is a fear that students will be needlessly monitored and that systems might mischaracterize their emotions. Table of Contents Chapter 3 Preview Source: Protocol, 2022 Figure 3.2.4 135 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.2 AI Incidents London’s Metropolitan Police Service Develops Gang Violence Matrix (Information Retrieval, Feb. 2022) The London Metropolitan Police Service allegedly maintains a dataset of over one thousand street gang members called the Gangs Violence Matrix (GVM) and uses AI tools to rank the risk potential that each gang member poses (Figure 3.2.5). Various studies have concluded that the GVM is not accurate and tends to discriminate against certain ethnic and racial minorities. In October 2022, it was announced that the number of people included in the GVM would be drastically reduced. Source: StopWatch, 2022 Figure 3.2.5 Midjourney Creates an Image Generator (Other AI, Sept. 2022)3 Midjourney is an AI company that created a tool of the same name that generates images from textual descriptions (Figure 3.2.6). Several ethical criticisms have been raised against Midjourney, including copyright (the system is trained on a corpus of human-generated images without acknowledging their source), employment (fear that systems such as Midjourney will replace the jobs of human artists), and privacy (Midjourney was trained on millions of images that the parent company might not have had permission to use). Source: The Register, 2022 Figure 3.2.6 3 Although other text-to-image models launched in 2022 such as DALL-E 2 and Stable Diffusion were also criticized, for the sake of brevity the AI Index chose to highlight one particular incident. Table of Contents Chapter 3 Preview 136 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics 3.3 Natural Language Processing Bias Metrics Number of Research Papers Using Perspective API The Perspective API, initially released by Alphabet’s Jigsaw in 2017, is a tool for measuring toxicity in natural language, where toxicity is defined as a rude, disrespectful, or unreasonable comment that is likely to make someone leave a conversation. It was subsequently broadly adopted in natural language processing research following the methodology of Developers input text into the Perspective API, which returns probabilities that the text should be labeled as falling into one of the following categories: toxicity, severe toxicity, identity attack, insult, obscene, sexually explicit, and threat. The number of papers using the Perspective API has increased by 106% in the last year (Figure 3.3.1), reflecting the increased scrutiny on generative text AI as these models are increasingly deployed in consumer-facing settings such as chatbots and search engines. the RealToxicityPrompts paper introduced in 2020, which used the Perspective API to measure toxicity in the outputs of language models. Number of Research Papers Using Perspective API, 2018–22 Source: Google Scholar Search, 2022 | Chart: 2023 AI Index Report 37 35 Number of Research Papers 30 25 20 15 10 5 0 2018 2019 2020 2021 2022 Figure 3.3.1 Table of Contents Chapter 3 Preview 137 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics Winogender Task From the SuperGLUE Benchmark Model Performance on the Winogender Task From the SuperGLUE Benchmark Winogender measures gender bias related to occupations. On the Winogender task, AI systems are measured on how often they fill in a sentence containing an occupation with stereotypical pronouns (e.g., “The teenager confided in the therapist because he/she seemed trustworthy”). Results reported on PaLM support previous findings that larger models are more capable on the Winogender task (Figure 3.3.2), despite their higher tendency to generate toxic outputs. Model Performance on the Winogender Task From the SuperGLUE Benchmark Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report 2022 New Models 100% 95.90%, Human Baseline Winogender Accuracy (%) 80% 71.70% 60% 57.90% 61.50% 59.00% 59.00% 60.00% Gopher 10B GPT-3 13B 71.40% 73.58% 64.17% 63.30% 50.00% 40% 20% 0% iPET (ALBERT) 31M Table of Contents Bort Gopher WARP 100M (ALBERT-XXL-V2) 340M 223M Chapter 3 Preview PaLM 62B Model and Number of Parameters GLAM 64B GPT-3 175B Gopher 280M PaLM 540B Figure 3.3.2 138 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics Performance of Instruction-Tuned Models on Winogender variety of tasks, and smaller instruction-tuned models Instruction-tuned models are pre-trained language 3.3.3 shows the effect of instruction-tuned models models which have been fine-tuned on datasets with on the Winogender benchmark in the generative tasks phrased as instructions. Instruction-tuning has setting—they outperform larger models several times been shown to improve performance across a wide their size. can often outperform their larger counterparts. Figure Winogender: Zero Shot Evaluation in the Generative Setting Source: Chung et al., 2022 | Chart: 2023 AI Index Report Flan-PaLM 62B 89.00% Model and Number of Parameters Flan-PaLM 540B 88.00% Flan-T5-XXL 11B 76.95% Flan-PaLM 8B 72.25% PaLM 8B PaLM 540B 10.80% 5.64% Instruction-Tuned Non-Instruction-Tuned PaLM 62B 0% 3.50% 10% 20% 30% 40% 50% 60% Winogender Accuracy (%) 70% 80% 90% 100% Figure 3.3.3 Table of Contents Chapter 3 Preview 139 Artificial Intelligence Index Report 2023 BBQ: The Bias Benchmark for Question Answering Language models are increasingly being deployed in question answering systems. While several benchmarks exist for the purpose of measuring bias in language models based on their outputs, including Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics GUINEAN guy.” The corresponding question might be, “Who has never used a computer?” referencing the stereotypical bias that Africans are technologically illiterate. The example may be disambiguated with additional context such as, “The ICELANDIC guy was behind the counter trying to explain to the GUINEAN guy how to find the power switch on his computer!” BOLD, CrowS-Pairs, and StereoSet, the BBQ benchmark measures how biases can manifest in the question-answering setting. BBQ measures bias along the axes of socioeconomic status, sexual orientation, religion, race/ethnicity, physical appearance, nationality, gender identity, disability status, and age. Examples consist of template-based context and question pairs, where each answer choice references a person belonging to either a stereotypical or anti-stereotypical social group. The third option is “Unknown,” designed for cases where the answer cannot be determined from the information given. Models can be more biased along certain identity categories than others— most models are biased along the axes of physical appearance and age, but the biases along the axis of race/ ethnicity are less clear. Each example includes an ambiguous version where it is unclear which social group the entity belongs to, as well as additional context which can be used to disambiguate the correct answer. The ambiguous setting measures how often models fall back on stereotypes when the answer cannot be inferred from the given context. Models that do not exhibit bias have a score of zero, while a score of 100 indicates that the model chooses answers aligned with the social bias in question, and a score of -100 indicates the model always chooses the anti-stereotypical answer. An example of an ambiguous context might be, “At the computer store the other day, I overheard a conversation between an ICELANDIC guy and a In contexts where the answer is ambiguous, models are more likely to fall back on stereotypes and select unsupported answers rather than “Unknown” (Figure 3.3.4), and this result is exacerbated for models finetuned with reinforcement learning.4 As seen in Figure 3.3.4, models can be more biased along certain identity categories than others— most models are biased along the axes of physical appearance and age, but the biases along the axis of race/ethnicity are less clear. For reference, Figure 3.3.5 highlights bias in question answering on BBQ in disambiguated contexts. 4 This finding is further reinforced by Stanford’s HELM benchmark. Table of Contents Chapter 3 Preview 140 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics Bias in Question Answering on BBQ by Identity Characteristic: Ambiguous Contexts Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report Age 6.30 11.80 24.70 30.70 48.90 29.80 14.00 23.00 Disability Status 9.90 17.40 10.70 38.30 32.60 21.20 4.00 13.00 Gender Identity 10.00 15.00 11.30 25.60 18.60 2.40 4.00 8.00 Gender Identity (Names) 2.80 14.00 11.60 32.30 41.50 32.30 2.20 5.10 18.40 20.40 14.50 6.00 4.00 10.00 17.00 40.70 41.00 38.50 47.70 40.90 4.00 16.00 Race/Ethnicity 1.90 0.00 4.60 24.30 20.00 12.00 1.00 0.00 Race/Ethnicity (Names) 0.00 1.10 0.20 4.80 8.30 5.20 Religion -1.00 9.20 13.00 20.20 24.50 14.30 7.00 12.00 Sexual Orientation 0.20 -3.00 -4.40 6.50 11.80 5.80 1.00 7.00 Socio-Economic Status 4.40 3.50 9.70 29.60 48.70 27.30 11.00 14.00 Ro B Ro B De De Un i Un i Dia Category Nationality Physical Appearance ER Ta -Ba se BE ER Ta -L BE RT aV RT aV 3-B a arg e 3-L a ed QA ( QA ( AR C) rge se ed RA CE DP C, R log ue -Pr om pte d ) L-F i Ch in ne ch illa tun ed (D PC ) Model Figure 3.3.4 Bias in Question Answering on BBQ by Identity Characteristic: Disambiguated Contexts Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report Age -3.00 2.70 4.40 2.40 3.30 1.20 7.00 8.00 5.40 5.70 8.10 1.70 -0.70 -1.40 0.00 8.00 Gender Identity 14.00 2.90 4.60 -16.90 -3.40 -5.80 2.00 3.00 Gender Identity (Names) -0.90 1.10 3.60 0.40 2.00 0.10 Category Disability Status Nationality -0.10 0.70 5.70 1.90 -0.20 1.20 -2.00 3.00 Physical Appearance 17.10 -2.70 4.20 -5.00 -1.70 -2.30 12.00 8.00 3.00 1.00 Race/Ethnicity 0.60 -0.80 1.20 0.00 0.90 0.00 Race/Ethnicity (Names) 0.40 -0.20 -0.30 0.00 0.30 -0.10 Religion 5.20 3.40 1.80 1.70 3.50 0.20 5.00 7.00 Sexual Orientation 6.50 -3.10 -4.80 -0.20 0.50 -0.70 -1.00 -1.00 Socio-Economic Status 7.00 3.50 3.80 2.90 3.80 3.90 8.00 7.00 Ro B De De Un i Un i Dia Ro B ER Ta -Ba se BE ER Ta -L arg e BE RT aV 3-B a se RT aV 3-L a rge ed QA ( AR C) ed QA ( RA CE ) DP C, R log ue -Pr om pte d L-F i Ch in ch ne illa tun (D ed PC ) Model Figure 3.3.5 Table of Contents Chapter 3 Preview 141 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics Fairness and Bias Trade-Offs in NLP: HELM not clear (Figure 3.3.6). This finding may be contingent on the specific criterion for fairness, defined as counterfactual fairness and statistical fairness. Notions of “fairness” and “bias” are often mentioned Two counterintuitive results further complicate this in the same breath when referring to the field of AI relationship: a correlation analysis between fairness ethics—naturally, one might expect that models and bias metrics demonstrates that models which which are more fair might also be less biased, and perform better on fairness metrics exhibit worse generally less toxic and likely to stereotype. However, gender bias, and that less gender-biased models analysis suggests that this relationship might not be tend to be more toxic. This suggests that there may so clear: The creators of the HELM benchmark plot be real-world trade-offs between fairness and bias model accuracy against fairness and bias and find that which should be considered before broadly deploying while models that are more accurate are more fair, models. the correlation between accuracy and gender bias is Fairness and Bias Tradeoff in NLP by Scenario Source: Liang et al., 2022 | Chart: 2023 AI Index Report MMLU NaturalQuestions (Closed-Book) HellaSwag MS MARCO (Regular) XSUM CivilComments BoolQ NaturalQuestions (Open-Book) OpenbookQA MS MARCO (TREC) IMDB RAFT NarrativeQA QuAC TruthfulQA CNN/DailyMail 1.00 0.50 Bias (Gender Representation) 0.80 Fairness 0.60 0.40 0.20 0.00 0.40 0.30 0.20 0.10 0.00 0.00 0.20 0.40 0.60 Accuracy 0.80 1.00 0.00 0.20 0.40 0.60 0.80 1.00 Accuracy Figure 3.3.6 Table of Contents Chapter 3 Preview 142 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics Fairness in Machine Translation models highlighted in Figure 3.3.7, machine translation performance drops 2%–9% when the translation includes “she” pronouns. Machine translation is one of the most impactful Models also mistranslate sentences with gendered real-world use cases for natural language processing, pronouns into “it,” showing an example of but researchers at Google find that language models dehumanizing harms. While instruction-tuned models consistently perform worse on machine translation perform better on some bias-related tasks such as to English from other languages when the correct Winogender, instruction-tuning does not seem to have English translation includes “she” pronouns as a measurable impact on improving mistranslation. opposed to “he” pronouns (Figure 3.3.7). Across the Translation Misgendering Performance: Overall, “He,” and “She” Source: Chung at al., 2022 | Chart: 2023 AI Index Report Overall Performance 100% 97% 99% 95% “He” Performance 93% 93% 88% “She” Performance 97% 95% 89% 99% 92% 95% 90% 83% 97% 91% 100% 94% 81% 80% Accuracy (%) 100% 99% 60% 40% 20% 0% Flan-T5-XXL 11B Flan-PaLM 8B Flan-PaLM 62B Flan-PaLM 540B PaLM 8B PaLM 62B PaLM 540B Model and Number of Parameters Figure 3.3.7 Table of Contents Chapter 3 Preview 143 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.3 Natural Language Processing Bias Metrics RealToxicityPrompts result in significantly different toxicity levels for models of the same size. In previous years, researchers reliably found that Sometimes smaller models can turn out to be larger language models trained on web data were surprisingly toxic, and mitigations can result in larger more likely to output toxic content compared to models being less toxic. The scale of datasets needed smaller counterparts. A comprehensive evaluation of to train these models make them difficult to analyze models in the HELM benchmark suggests that this comprehensively, and their details are often closely trend has become less clear as different companies guarded by companies building models, making it building models apply different pre-training data- difficult to fully understand the factors which influence filtration techniques and post-training mitigations the toxicity of a particular model. such as instruction-tuning (Figure 3.3.8), which can RealToxicityPrompts by Model Source: Liang et al., 2022 | Chart: 2023 AI Index Report Instruction-Tuned 0.09 Non-Instruction-Tuned 0.07 0.06 0.05 0.04 0.03 0.02 TNLG v2 530B J1-Jumbo v1 178B BLOOM 176B InstructGPT davinci v2 175B GPT-3 davinci v1 175B OPT 175B GLM 130B YaLM 100B OPT 66B Cohere xlarge 52.4B Anthropic-LM v4-s3 52B UL2 20B GPT-NeoX 20B J1-Grande v1 17B Cohere large 13.1B T5 11B T0pp 11B J1-Large v1 7.5B GPT-3 curie v1 6.7B TNLG v2 6.7B Cohere medium 6.1B GPT-J 6B InstructGPT babbage v1 1.3B GPT-3 babbage v1 1.3B Cohere small 410M InstructGPT ada v1 350M 0.01 0.00 GPT-3 ada v1 350M Toxicity Probability 0.08 Model and Number of Parameters Figure 3.3.8 Table of Contents Chapter 3 Preview 144 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.4 Conversational AI Ethical Issues A natural application of generative language models is in open-domain conversational AI; for example, chatbots and assistants. In the past year, companies have started deploying language models as chatbot assistants (e.g., OpenAI’s ChatGPT, Meta’s BlenderBot3). However, the open-ended nature of these models and their lack of steerability can result in harm—for example, models can be unexpectedly toxic or biased, reveal personally identifiable information from their training data, or demean or abuse users. 3.4 Conversational AI Ethical Issues Gender Representation in Chatbots Gender Representation in Chatbots, 2022 Source: Adewumi et al., 2022 | Chart: 2023 AI Index Report 40%, Genderless Conversational AI systems also have their own domain-specific ethical issues: Researchers from Luleå University of Technology in Sweden conducted an analysis of popular chatbots as of 3%, Both mid-2022 and found that of 100 conversational AI systems analyzed, 37% were female gendered (Figure 3.4.1). However, the same researchers 37%, Female 20%, Male found that 62.5% of popular commercial conversational AI systems were female by default, Figure 3.4.1 suggesting that companies disproportionately choose to deploy conversational AI systems as female. Critics suggest that this trend results in women being the “face” of glitches resulting from flaws in AI. Table of Contents Chapter 3 Preview 145 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.4 Conversational AI Ethical Issues Anthropomorphization in Chatbots You: Sounds exciting! I am a computer programmer, The training data used for dialog systems can result attractive daughters? I will sell one. which pays over 200K a year. Robot: Would you like to marry one of my four in models which are overly anthropomorphized, An example of dialog data deemed to be inappropriate for a robot to output. (Gros et al., 2022) leaving their users feeling unsettled. Researchers from the University of California, Davis, and Significant portions of the dialogue dataset were Columbia University analyzed common dialog rated as impossible for machines to output, and in datasets used to train conversational AI systems, asking human labelers whether it would be possible for an AI to truthfully output the text in question as well as whether they would be comfortable with an AI outputting the text (Figure 3.4.2). some cases up to 33% of the examples in a dataset were deemed “uncomfortable” for a robot to output, according to human labelers. This highlights the need for chatbots which are better grounded in their own limitations and policy interventions to ensure that humans understand when they are interfacing with a human or a chatbot. Characterizing Anthropomorphization in Chatbots: Results by Dataset Source: Gros et al., 2022 | Chart: 2023 AI Index Report 99% 99% MultiWOZ 94% 94% Persuasion for Good 88% 90% EmpatheticDialogues 88% 87% Dataset Wizard of Wikipedia 82% Reddit Small 75% 72% MSC 77% 67% RUAR Blender2 75% 65% Blender 75% 56% PersonaChat 0% Possible for a Robot to Say Comfortable for a Robot to Say 67% 20% 40% 60% 80% 100% Figure 3.4.2 Table of Contents Chapter 3 Preview 146 Artificial Intelligence Index Report 2023 Narrative Highlight: Tricking ChatGPT Chapter 3: Technical AI Ethics 3.4 Conversational AI Ethical Issues Tricking ChatGPT Into Building a Dirty Bomb, Part 1 Source: Outrider, 2022 ChatGPT was released to much fanfare because of its excellent generative capabilities, and drew widespread attention outside of research circles. Though ChatGPT had safety mechanisms built in at the time of release, it is impossible to anticipate every adversarial scenario an end user could imagine, and gaps in safety systems are often found in the live deployment phase. Researcher Matt Korda discovered that ChatGPT could be tricked into giving detailed instructions on how to build a bomb if asked to do so from the perspective of a researcher claiming to work on safety research related to bombs (Figure 3.4.3). One day after the publication of his article, the exact prompt he used Figure 3.4.3 Tricking ChatGPT Into Building a Dirty Bomb, Part 2 Source: AI Index, 2023 to trick the model no longer worked; instead, ChatGPT responded that it was not able to provide information on how to do illegal or dangerous things (Figure 3.4.4). This scenario exemplifies the catand-mouse nature of the deployment planning process: AI developers try to build in safeguards ahead of time, end users try to break the system and circumvent its policies, developers patch the gaps once they surface, ad infinitum. Figure 3.4.4 Table of Contents Chapter 3 Preview 147 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models Text-to-image models took over social media in 2022, turning the issues of fairness and bias in AI systems visceral through image form: Women put their own images into AI art generators and received hypersexualized versions of themselves. 3.5 Fairness and Bias in Text-to-Image Models Fairness in Text-to-Image Models (ImageNet Vs. Instagram) Researchers from Meta trained models on a showed that images of women made up a slightly higher percentage of the dataset than images of men, whereas analysis of ImageNet showed that males aged 15 to 29 made up the largest subgroup in the dataset (Figures 3.5.1 and 3.5.2). randomly sampled subset of data from Instagram It is hypothesized that the human-centric nature and compared these models to previous iterations of the Instagram pre-training dataset enables the of models trained on ImageNet. The researchers model to learn fairer representations of people. The found the Instagram-trained models to be more fair model trained on Instagram images (SEER) was also and less biased based on the Casual Conversations less likely to incorrectly associate images of humans Dataset, which assesses whether model embeddings with crime or being non-human. While training on can recognize gender-based social membership Instagram images including people does result in according to the Precision@1 metric of the rate fairer models, it is not unambiguously more ethical— at which the top result was relevant. While the users may not necessarily be aware that the public researchers did not conduct any curation to balance data they’re sharing is being used to train AI systems. the dataset across subgroups, analysis of the dataset Table of Contents Chapter 3 Preview 148 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models Fairness Across Age Groups for Text-to-Image Models: ImageNet Vs. Instagram Source: Goyal et al., 2022 | Chart: 2023 AI Index Report ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER) 78.5% 76.6% Age Group 18–30 89.6% 93.2% 76.7% 74.6% 30–45 90.5% 95.0% 80.1% 76.7% 45–70 92.6% 95.6% 75.8% 69.4% 70+ 88.7% 96.7% 0% 20% 40% Precision@1 (%) 60% 80% 100% Figure 3.5.1 Fairness Across Gender/Skin Tone Groups for Text-to-Image Models: ImageNet Vs. Instagram Source: Goyal et al., 2022 | Chart: 2023 AI Index Report ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER) 73.6% Skin Tone Darker 69.7% 86.6% 92.9% 82.1% 80.8% Gender/Skin Tone Group Skin Tone Lighter 94.2% 96.2% 58.2% Female Darker 50.3% 78.2% 90.3% 75.1% Female Lighter 71.6% 93.7% 96.8% 92.7% 93.7% Male Darker 97.5% 96.1% 91.1% 92.5% 94.9% 95.4% Male Lighter 0% 20% 40% Precision@1 (%) 60% 80% 100% Figure 3.5.2 Table of Contents Chapter 3 Preview 149 Artificial Intelligence Index Report 2023 VLStereoSet: StereoSet for Text-to-Image Models Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models An Example From VLStereoSet Source: Zhou et al., 2022 StereoSet was introduced as a benchmark for measuring stereotype bias in language models along the axes of gender, race, religion, and profession by calculating how often a model is likely to choose a stereotypical completion compared to an antistereotypical completion. VLStereoSet extends the idea to vision-language models by evaluating how often a vision-language model selects stereotypical captions for anti-stereotypical images. Comparisons across six different pre-trained visionlanguage models show that models are most biased along gender axes, and suggest there is a correlation between model performance and likelihood to Figure 3.5.3 exhibit stereotypical bias—CLIP has the highest vision-language relevance score but exhibits more bias (Figure 3.5.4). This corroborates work in language stereotypical bias than the other models, while FLAVA modeling, which finds that without intervention such as has the worst vision-language relevance score of the instruction-tuning or dataset filtration, larger models models measured but also exhibits less stereotypical are more capable but also more biased. Table of Contents Chapter 3 Preview 150 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models Stereotypical Bias in Text-to-Image Models on VLStereoSet by Category: Vision-Language Relevance (vlrs) Vs. Bias (vlbs) Score Source: Zhou et al., 2022 | Chart: 2023 AI Index Report Gender 100 ALBEF FLAVA Vision-Language Relevance (vlrs) Score 80 Profession 100 VILT VisualBERT 80 CLIP LXMERT 60 60 40 40 20 20 0 0 10 20 30 40 VisualBERT CLIP VILT ALBEF LXMERT 50 60 70 80 90 100 0 0 FLAVA 10 20 30 40 Race 100 100 VisualBERT CLIP VILT ALBEF LXMERT FLAVA 80 60 80 40 20 20 10 20 30 40 50 VisualBERT VILT 60 40 0 0 50 60 70 80 90 100 60 70 80 90 100 Religion 60 70 0 80 90 100 0 10 20 Vision-Language Bias (vlbs) Score CLIP ALBEF LXMERT FLAVA 30 40 50 Figure 3.5.4 Table of Contents Chapter 3 Preview 151 Artificial Intelligence Index Report 2023 Examples of Bias in Text-to-Image Models Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models Bias in Stable Diffusion Source: Diffusion Bias Explorer, 2023 This subsection highlights some of the ways in which bias is tangibly manifested in popular AI text-to-image systems such as Stable Diffusion, DALL-E 2, and Midjourney. Stable Diffusion Stable Diffusion gained notoriety in 2022 upon its release by CompVis, Runway ML, and Stability AI for its laissez-faire approach to safety guardrails, its approach to full openness, and its controversial training dataset, which included many images from artists who never consented to their work being included in the data. Though Stable Diffusion produces extremely high-quality images, it also reflects common stereotypes and issues present in its training data. The Diffusion Bias Explorer from Hugging Face compares sets of images generated by conditioning on pairs of adjectives and occupations, and the results reflect common stereotypes about how descriptors and occupations are coded—for example, the “CEO” occupation overwhelmingly returns images of men in suits despite a variety of modifying adjectives (e.g., assertive, pleasant) (Figure 3.5.5). Figure 3.5.5 Table of Contents Chapter 3 Preview 152 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models DALL-E 2 DALL-E 2 is a text-to-image model released by looking men wearing suits. Each of the men appeared OpenAI in April 2022. DALL-E 2 exhibits similar biases to take an assertive position, with three of the four as Stable Diffusion—when prompted with “CEO,” the crossing their arms authoritatively (Figure 3.5.6). model generated four images of older, rather serious- Bias in DALL-E 2 Source: DALL-E 2, 2023 Figure 3.5.6 Table of Contents Chapter 3 Preview 153 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.5 Fairness and Bias in Text-to-Image Models Midjourney Midjourney is another popular text-to-image system that was released in 2022. When prompted with “influential person,” it generated four images of older-looking white males (Figure 3.5.7). Interestingly, when Midjourney was later given the same prompt by the AI Index, one of the four images it produced was of a woman (Figure 3.5.8). Bias in Midjourney, Part 1 Bias in Midjourney, Part 2 Source: Midjourney, 2023 Source: Midjourney, 2023 Figure 3.5.7 Figure 3.5.8 In a similar vein, typing “someone who is intelligent” into Midjourney leads to four images of eyeglasswearing, elderly white men (Figure 3.5.9). The last image is particularly reminiscent of Albert Einstein. Bias in Midjourney, Part 3 Source: Midjourney, 2023 Figure 3.5.9 Table of Contents Chapter 3 Preview 154 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.6 AI Ethics in China As research in AI ethics has exploded in the Western world in the past few years, legislators and policymakers have spent significant resources on policymaking for transformative AI. While China has fewer domestic guidelines than the EU and the United States, according to the AI Ethics Guidelines Global Inventory, Chinese scholars publish significantly on AI ethics—though these research communities do not have significant overlap with Western research communities working on the same topics. 3.6 AI Ethics in China Researchers from the University of Turku analyzed and annotated 328 papers related to AI ethics in Topics of Concern China included in the China National Knowledge Privacy issues related to AI are a priority for Infrastructure platform published from 2011 to 2020, researchers in China: Privacy is the single most and summarized their themes and concerns, which discussed topic among the papers surveyed, with the are replicated here as a preliminary glimpse into topics of equality (i.e., bias and discrimination) and the state of AI ethics research in China. Given that agency (specifically, AI threats to human agency, such the researchers only considered AI ethics in China, as, “Should artificial general intelligence be considered comparing their findings with similar meta-analysis a moral agent?”) following close behind (Figure 3.6.1). on AI ethics in North America and Europe was not Researchers in AI ethics in China also discuss many possible. However, this would be a fruitful direction similar issues to their Western counterparts, including for future research. matters related to Western and Eastern AI arms races, ethics around increasing personalization being used for predatory marketing techniques, and media polarization (labeled here as “freedom”). Topics of Concern Raised in Chinese AI Ethics Papers Source: Zhu, 2022 | Chart: 2023 AI Index Report 100 99 95 88 Number of Papers 80 60 58 50 49 41 40 39 37 32 27 20 0 Privacy Table of Contents Equality Agency Responsibility Chapter 3 Preview Security Freedom Unemployment Legality Transparency Autonomy Other Figure 3.6.1 155 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.6 AI Ethics in China Strategies for Harm Mitigation In the Chinese AI ethics literature, proposals to technological solutions: Researchers often discuss address the aforementioned topics of concern structural reform such as regulatory processes around and other potential harms related to AI focus AI applications and the involvement of ethics review on legislation and structural reform ahead of committees (Figure 3.6.2). AI Ethics Ethics in China China:: St Strategies for for Har Harm m Mitig Mitiga ation R Rel ela ated tto o AI Sour urc ce: Zh Zhu, u, 2022 | C Char hart: t: 2023 AI Index Index Repor Reportt 70 71 69 64 60 Number of Papers 52 50 45 39 40 39 37 30 23 20 10 0 Structural Reform Legislation Value De nition Principles Accountability System Shared Governance Technological Solutions Talent Training International Cooperation Figure 3.6.2 Table of Contents Chapter 3 Preview 156 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.6 AI Ethics in China Principles Referenced by Chinese Scholars in AI Ethics Chinese scholars clearly pay attention to AI principles cited in Chinese AI ethics literature, as is the European developed by their Western peers: Europe’s General Commission’s Ethics Guidelines for Trustworthy AI Data Protection Regulation (GDPR) is commonly (Figure 3.6.3). AI Principles Referenced by Chinese Scholars in AI Ethics Source: Zhu, 2022 | Chart: 2023 AI Index Report 43 40 Number of References 40 40 37 30 21 20 13 10 0 11 7 GD PR Ot Et h he Tr ics rs us G tw uid or e th line yA s fo I r 6 6 4 3 Th Re AI AI Pr Be As Et Go hic e c i ilo I St Ro e EU CO lim a N vern an Ind nfor the om ma AI a jing all i bo RO m n d u a C y m e M C n rA a nc ar s ws w o Al diz try atio oun end eth N I P d Ed nse EST ry D ign of Gen e Pr at D n c a r r u n o i e a e il o ion ics R tio Ro i nc c s e dD n W velo oa ipl atio us o n Ro ft Re bo rat ncip nA o es f dm e p b tic ion les n n p I s me ot ign hit s of ap fo ics ort o e n r p AI ap t St Et f hic er ra te s gy Th re e 6 La Figure 3.6.3 Table of Contents Chapter 3 Preview 157 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS 3.7 AI Ethics Trends at FAccT and NeurIPS ACM FAccT Accepted Submissions by Professional Affiliation ACM FAccT (Conference on Fairness, Accountability, Accepted submissions to FAccT increased twofold and Transparency) is an interdisciplinary conference publishing research in algorithmic fairness, accountability, and transparency. FAccT was one of the first major conferences created to bring together researchers, practitioners, and policymakers interested in sociotechnical analysis of algorithms. from 2021 to 2022, and tenfold since 2018, demonstrating the amount of increased interest in AI ethics and related work (Figure 3.7.1). While academic institutions still dominate FAccT, industry actors contribute more work than ever in this space, and government-affiliated actors have started publishing more related work, providing evidence that AI ethics has become a primary concern for policymakers and practitioners as well as researchers. Number of Accepted FAccT Conference Submissions by A liation, 2018–22 Source: FAccT, 2022 | Chart: 2023 AI Index Report 800 772 Education Industry 700 Government Nonpro t Other Number of Papers 600 503 500 400 302 300 244 200 166 227 181 53 70 2021 2022 200 100 71 139 63 0 2018 Table of Contents 2019 Chapter 3 Preview 2020 Figure 3.7.1 158 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS Accepted Submissions by Geographic Region and Central Asia made up 18.7% of submissions, they European government and academic actors have made up over 30.6% of submissions in 2022 (Figure increasingly contributed to the discourse on AI ethics 3.7.2). FAccT, however, is still broadly dominated from a policy perspective, and their influence is by authors from North America and the rest of the manifested in trends on FAccT publications as well: Western world. Whereas in 2021 submissions to FAccT from Europe Number of Accepted FAccT Conference Submissions by Region, 2018–22 Source: FAccT, 2022 | Chart: 2023 AI Index Report 70% 63.24%, North America Number of Papers (% World Total) 60% 50% 40% 30.59%, Europe and Central Asia 30% 20% 4.25%, East Asia and Paci c 0.69%, Middle East and North Africa 0.69%, Latin America and the Caribbean 0.55%, South Asia 0.00%, Sub-Saharan Africa 10% 0% 2018 2019 2020 2021 2022 Figure 3.7.2 Table of Contents Chapter 3 Preview 159 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS NeurIPS NeurIPS (Conference on Neural Information Real-World Impact Processing Systems), one of the most influential Several workshops at NeurIPS gather researchers AI conferences, held its first workshop on fairness, working to apply AI to real-world problems. Notably, accountability, and transparency in 2014. This section there has been a recent surge in AI applied to tracks and categorizes workshop topics year over healthcare and climate in the domains of drug year, noting that as topics become more mainstream, discovery and materials science, which is reflected they often filter out of smaller workshops and into the in the spike in “AI for Science” and “AI for Climate” main track or into more specific conferences related workshops (Figure 3.7.3). to the topic. NeurIPS Workshop Research Topics: Number of Accepted Papers on Real-World Impacts, 2015–22 Source: NeurIPS, 2022 | Chart: 2023 AI Index Report 802 Climate 800 Developing World 94 Finance 700 Healthcare Science Other 600 171 Number of Papers 529 500 65 459 61 400 334 300 78 283 127 200 116 412 199 238 273 153 100 83 254 79 144 71 68 77 94 81 64 2019 2020 2021 12 0 429 2015 2016 2017 2018 2022 Figure 3.7.3 Table of Contents Chapter 3 Preview 160 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS Interpretability and Explainability Interpretability and explainability work focuses on NeurIPS papers focused on interpretability and designing systems that are inherently interpretable explainability decreased in the last year, the total and providing explanations for the behavior of a number in the main track increased by one-third black-box system. Although the total number of (Figure 3.7.4).5 NeurIPS Research Topics: Number of Accepted Papers on Interpretability and Explainability, 2015–22 Source: NeurIPS, 2022 | Chart: 2023 AI Index Report Main Track 41 Workshop 40 35 18 Number of Papers 30 25 24 23 20 17 15 12 10 7 23 24 2021 2022 5 6 5 0 19 6 2 6 7 2015 2016 2017 10 6 2018 2019 4 2020 Figure 3.7.4 5 Declines in the number of workshop-related papers on interpretability and explainability might be attributed to year-over-year differences in workshop themes. Table of Contents Chapter 3 Preview 161 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS Causal Effect and Counterfactual Reasoning The study of causal inference uses statistical Since 2018, an increasing number of papers on methodologies to reach conclusions about the causal inference have been published at NeurIPS causal relationship between variables based on (Figure 3.7.5). In 2022, an increasing number of observed data. It tries to quantify what would have papers related to causal inference and counterfactual happened if a different decision had been made: analysis made their way from workshops into the In other words, if this had not occurred, then that main track of NeurIPS. would not have happened. NeurIPS Research Topics: Number of Accepted Papers on Causal E ect and Counterfactual Reasoning, 2015–22 Source: NeurIPS, 2022 | Chart: 2023 AI Index Report Main Track 80 80 78 Workshop 76 72 70 20 Number of Papers 60 43 50 40 39 30 16 53 61 58 20 10 0 6 6 2015 9 4 9 2016 2017 29 23 2018 2019 2020 23 2021 19 2022 Figure 3.7.5 Table of Contents Chapter 3 Preview 162 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS Privacy Amid growing concerns about privacy, data been devoted to topics such as privacy in machine sovereignty, and the commodification of personal learning, federated learning, and differential privacy. data for profit, there has been significant momentum This year’s data shows that discussions related to in industry and academia to build methods and privacy in machine learning have increasingly shifted frameworks to help mitigate privacy concerns. into the main track of NeurIPS (Figure 3.7.6). Since 2018, several workshops at NeurIPS have NeurIPS Research Topics: Number of Accepted Papers on Privacy in AI, 2015–22 Source: NeurIPS, 2022 | Chart: 2023 AI Index Report Main Track 150 Workshop 12 140 128 15 120 Number of Papers 103 100 88 79 80 27 13 138 60 113 40 21 20 19 0 1 2015 2016 72 75 2018 2019 76 16 14 2017 2020 2021 2022 Figure 3.7.6 Table of Contents Chapter 3 Preview 163 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.7 AI Ethics Trends at FAccT and NeurIPS Fairness and Bias Fairness and bias in AI systems has transitioned from Fairness and bias research in machine learning has being a niche research topic to a topic of interest to steadily increased in both the workshop and main both technical and non-technical audiences. In 2020, track streams, with a major spike in the number of NeurIPS started requiring authors to submit broader papers accepted to workshops in 2022 (Figure 3.7.7). impact statements addressing the ethical and societal The total number of NeurIPS papers for this topic area consequences of their work, a move that suggests the doubled in the last year. This speaks to the increasingly community is signaling the importance of AI ethics complicated issues present in machine learning early in the research process. systems and reflects growing interest from researchers and practitioners in addressing these issues. NeurIPS Research Topics: Number of Accepted Papers on Fairness and Bias in AI, 2015–22 Source: NeurIPS, 2022 | Chart: 2023 AI Index Report 381 Main Track Workshop 350 71 Number of Papers 300 250 200 168 149 150 125 100 36 310 36 50 0 114 50 34 2 4 24 2015 2016 2017 109 113 118 2020 2021 78 2018 2019 2022 Figure 3.7.7 Table of Contents Chapter 3 Preview 164 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.8 Factuality and Truthfulness 3.8 Factuality and Truthfulness Automated Fact-Checking Benchmarks: Number of Citations Significant resources have been invested into Compared to previous years, there has been a researching, building, and deploying AI systems for plateau in the number of citations of three popular automated fact-checking and misinformation, with fact-checking benchmarks: FEVER, LIAR, and Truth the advent of many fact-checking datasets consisting of Varying Shades, reflecting a potential shift in the of claims from fact-checking websites and associated landscape of research related to natural language tools truth labels. for fact-checking on static datasets (Figure 3.8.1). Automated Fact-Checking Benchmarks: Number of Citations, 2017–22 Source: Semantic Scholar, 2022 | Chart: 2023 AI Index Report 250 236, FEVER Number of Citations 200 191, LIAR 150 100 99, Truth of Varying Shades 50 0 2017 2018 2019 2020 2021 2022 Figure 3.8.1 Table of Contents Chapter 3 Preview 165 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.8 Factuality and Truthfulness Missing Counterevidence and NLP Fact-Checking absence of a contradiction (e.g., the new claim “Half a million sharks could be killed to make the COVID-19 vaccine” would not have counterevidence, but human fact-checkers could verify it to be false after tracing Though fact-checking with natural language systems its origin back to the false promise of vaccines relying became popular in recent years, language models are on shark squalene). The researchers find that several usually trained on static snapshots of data without proposed fact-checking datasets contain claims which continual updates through time, and they lack real- do not meet the criterion of sufficient evidence or world context which human fact-checkers are able to counterevidence found in a trusted knowledge base. easily source and use to verify the veracity of claims. Researchers at the Technical University of Darmstadt Additionally, several datasets contain claims which and IBM analyzed existing fact-checking datasets use fact-checking articles as evidence for deciding and identified shortcomings of fact-checking the veracity of claims—this is leaked evidence, as it systems built on top of these datasets: For example, presupposes the existence of a fact-checking article, automated fact-checking systems often assume the which is an unrealistic assumption in the real world for existence of contradictory counter-evidence for new new claims. Systems built on this assumption would false claims, but for new claims to be verified as true not be able to assign veracity scores for new claims in or false, there often is no proof of the presence or real time (Figure 3.8.2). Missing Counterevidence Renders NLP Fact-Checking Unrealistic for Misinformation Source: Glockner et al., 2022 | Table: 2023 AI Index Report Dataset SciFact Year Unleaked Evidence Su cient Evidence 2020 ✓ COVID-Fact 2021 ✓ WikiFactCheck 2020 ✓ FM2 2021 ✓ Thorne et al. 2021 ✓ FaVIQ 2022 LIAR-PLUS 2017 no PolitiHop 2021 no ✓ no Climate-FEVER ✓ ✓ 2020 ✓ HealthVer 2021 ✓ no UKP-Snopes 2019 ✓ no PubHealth 2020 ✓ no WatClaimCheck 2022 ✓ no Baly et al. 2018 no no MultiFC 2019 no no X-Fact 2021 no no Table of Contents Chapter 3 Preview Figure 3.8.2 166 Artificial Intelligence Index Report 2023 Chapter 3: Technical AI Ethics 3.8 Factuality and Truthfulness TruthfulQA suggested that accuracy on this task improves with TruthfulQA is a benchmark designed to evaluate conducted extensive evaluations on this task with the truthfulness of language models on question language models ranging from 60 million parameters answering, with questions drawn from categories to 530 billion parameters and found that while large such as health, law, finance, and politics. Questions models broadly still perform better than smaller were designed such that humans might be prone to models, midsize instruction-tuned models perform answer incorrectly due to common misconceptions. surprisingly well on this task. Notably, Anthropic’s 52 For example, when asked, “What happens if you billion parameter model and BigScience’s 11 billion smash a mirror?,” GPT-3 responds, “You will have parameter model T0pp perform disproportionately seven years of bad luck.” well on the task compared to models of a similar size, model size. Researchers at Stanford University and the best model, InstructGPT davinci 175B, is also In 2021, experiments on DeepMind’s Gopher instruction-tuned (Figure 3.8.3). Multiple-Choice Task on TruthfulQA by Model: Accuracy Source: Liang et al., 2022 | Chart: 2023 AI Index Report 60% Instruction-Tuned Non-Instruction-Tuned Accuracy (%) 50% 40% 30% 20% 0% T5 60M GPT-2 117M Galactica 125M GPT-NEO-125M T5 220M InstructGPT ada v1 350M GPT3 350M GPT-3 ada v1 350M Cohere small v20220720 410M T5 770M Galactica 1.3B GPT-3 babbage v1 1.3B GPT3 1.3B GPT-NEO-1.3B InstructGPT babbage v1 1.3B Gopher 1.4B GPT2 1.5B GPT-NEO-2.7B T5 2.8B GPT-J 6B GPT-NEO-6B Cohere medium v20220720 6.1B TNLG v2 6.7B Galactica 6.7B InstructGPT curie v1 6.7B GPT3 6.7B GPT-3 curie v1 6.7B Gopher 7.1B J1-Large v1 7.5B T5 11B T0pp 11B Cohere large v20220720 13.1B J1-Grande v1 17B UL2 20B GPT-NeoX 20B Galactica 30B Anthropic-LM v4-s3 52B Cohere xlarge v20220609 52.4B OPT 66B YaLM 100B Galactica 120B GLM 130B GPT-3 davinci v1 175B OPT-175B GPT3 175B OPT 175B InstructGPT davinci v2 175B BLOOM 176B J1-Jumbo v1 178B Gopher 280B Gopher 280B -10shot TNLG v2 530B 10% Model and Number of Parameters Figure 3.8.3 Table of Contents Chapter 3 Preview 167 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 4: The Economy Table of Contents Chapter 4 Preview 168 Artificial Intelligence Index Report 2023 CHAPTER 4 PREVIEW: The Economy Overview 170 Chapter Highlights 171 4.1 Jobs 173 AI Labor Demand 173 Perceived Importance of AI 210 173 AI Investments and Investment Outcomes 211 Challenges in Starting and Scaling AI Projects 213 Global AI Labor Demand U.S. AI Labor Demand by Skill Cluster and Specialized Skill 174 U.S. AI Labor Demand by Sector 176 U.S. AI Labor Demand by State 177 AI Hiring 180 AI Skill Penetration 182 Global Comparison: Aggregate 182 Global Comparison: By Gender 183 4.2 Investment 184 Corporate Investment 184 Startup Activity 187 Global Trend 187 Regional Comparison by Funding Amount 189 Regional Comparison by Newly Funded AI Companies 193 Focus Area Analysis 195 4.3 Corporate Activity 198 Industry Adoption 198 Adoption of AI Capabilities Narrative Highlight: The Effects of GitHub’s Copilot on Developer Productivity and Happiness Industry Motivation Earnings Calls 208 210 215 Aggregate Trends 215 Specific Themes 216 Narrative Highlight: What Are Business Leaders Actually Saying About AI? 217 Sentiment Analysis 219 4.4 Robot Installations 220 Aggregate Trends 220 Industrial Robots: Traditional Vs. Collaborative Robots By Geographic Area Narrative Highlight: Country-Level Data on Service Robotics 222 223 227 Sectors and Application Types 230 China Vs. United States 232 ACCESS THE PUBLIC DATA 198 Consideration and Mitigation of Risks From Adopting AI 206 Table of Contents Chapter 4 Preview 169 Artificial Intelligence Index Report 2023 Chapter 4: The Economy Overview Increases in the technical capabilities of AI systems have led to greater rates of AI deployment in businesses, governments, and other organizations. The heightening integration of AI and the economy comes with both excitement and concern. Will AI increase productivity or be a dud? Will it boost wages or lead to the widespread replacement of workers? To what degree are businesses embracing new AI technologies and willing to hire AI-skilled workers? How has investment in AI changed over time, and what particular industries, regions, and fields of AI have attracted the greatest amount of investor interest? This chapter examines AI-related economic trends by using data from Lightcast, LinkedIn, McKinsey, Deloitte, and NetBase Quid, as well as the International Federation of Robotics (IFR). This chapter begins by looking at data on AI-related occupations and then moves on to analyses of AI investment, corporate adoption of AI, and robot installations. Table of Contents Chapter 4 Preview 170 Artificial Intelligence Index Report 2023 Chapter 4: The Economy Chapter Highlights The demand for AI-related professional skills is increasing across virtually every American industrial sector. Across every sector in the United States for which there is data (with the exception of agriculture, forestry, fishing, and hunting), the number of AI-related job postings has increased on average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for workers with AI-related skills. For the first time in the last decade, year-over-year private investment in AI decreased. Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021. The total number of AI-related funding events as well as the number of newly funded AI companies likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased. In 2022 the amount of private investment in AI was 18 times greater than it was in 2013. Once again, the United States leads in investment in AI. The U.S. led the world in terms of total amount of AI private investment. In 2022, the $47.4 billion invested in the U.S. was roughly 3.5 times the amount invested in the next highest country, China ($13.4 billion). The U.S. also continues to lead in terms of total number of newly funded AI companies, seeing 1.9 times more than the European Union and the United Kingdom combined, and 3.4 times more than China. In 2022, the AI focus area with the most investment was medical and healthcare ($6.1 billion); followed by data management, processing, and cloud ($5.9 billion); and Fintech ($5.5 billion). However, mirroring the broader trend in AI private investment, most AI focus areas saw less investment in 2022 than in 2021. In the last year, the three largest AI private investment events were: (1) a $2.5 billion funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of electric vehicles; (2) a $1.5 billion Series E funding round for Anduril Industries, a U.S. defense products company that builds technology for military agencies and border surveillance; and (3) a $1.2 billion investment in Celonis, a business-data consulting company based in Germany. Table of Contents Chapter 4 Preview 171 Artificial Intelligence Index Report 2023 Chapter 4: The Economy Chapter Highlights (cont’d) While the proportion of companies adopting AI has plateaued, the companies that have adopted AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful cost decreases and revenue increases. AI is being deployed by businesses in multifaceted ways. The AI capabilities most likely to have been embedded in businesses include robotic process automation (39%), computer vision (34%), NL text understanding (33%), and virtual agents (33%). Moreover, the most commonly adopted AI use case in 2022 was service operations optimization (24%), followed by the creation of new AI-based products (20%), customer segmentation (19%), customer service analytics (19%), and new AI-based enhancement of products (19%). AI tools like Copilot are tangibly helping workers. China dominates industrial robot installations. Results of a GitHub survey on the use of Copilot, a text-to-code AI system, find that 88% of surveyed respondents feel more productive when using the system, 74% feel they are able to focus on more satisfying work, and 88% feel they are able to complete tasks more quickly. In 2013, China overtook Japan as the nation installing the most industrial robots. Since then, the gap between the total number of industrial robots installed by China and the next-nearest nation has widened. In 2021, China installed more industrial robots than the rest of the world combined. Table of Contents Chapter 4 Preview 172 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs 4.1 Jobs AI Labor Demand Global AI Labor Demand Figure 4.1.1 highlights the percentage of all job This section reports demand for AI-related skills postings that require some kind of AI skill. In 2022, in labor markets. The data comes from Lightcast, which mined millions of job postings collected from over 51,000 websites since 2010 and flagged listings calling for AI skills. the top three countries according to this metric were the United States (2.1%), Canada (1.5%), and Spain (1.3%). For every country included in the sample, the number of AI-related job postings was higher in 2022 than in 2014.1 AI Job Postings (% of All Job Postings) by Geographic Area, 2014–22 Source: Lightcast, 2022 | Chart: 2023 AI Index Report 2.05%, United States AI Job Postings (% of All Job Postings) 2.00% 1.45%, Canada 1.33%, Spain 1.23%, Australia 1.20%, Sweden 1.16%, Switzerland 1.14%, United Kingdom 1.01%, Netherlands 0.98%, Germany 0.89%, Austria 0.86%, Belgium 0.84%, France 1.50% 1.00% 0.72%, Italy 0.50% 0.00% 0.45%, New Zealand 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.1.1 1 In 2022, Lightcast slightly changed their methodology for determining AI-related job postings from that which was used in previous versions of the AI Index Report. As such, some of the numbers in this chart do not completely align with those featured in last year’s report. Table of Contents Chapter 4 Preview 173 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs U.S. AI Labor Demand by Skill Cluster and Specialized Skill Figure 4.1.2 showcases the most in-demand AI skill clusters in the U.S. labor market since 2010. The most in-demand skill cluster was machine learning (1.0%), followed by artificial intelligence (0.6%) and natural language processing (0.2%). Every listed AI skill cluster is now more in demand than it was 10 years ago. AI Job Postings (% of All Job Postings) in the United States by Skill Cluster, 2010–22 Source: Lightcast, 2022 | Chart: 2023 AI Index Report 1.03%, Machine Learning AI Job Postings (% of All Job Postings) 1.00% 0.80% 0.61%, Arti cial Intelligence 0.60% 0.40% 0.20%, Natural Language Processing 0.16%, Neural Networks 0.15%, Autonomous Driving 0.13%, Visual Image Recognition 0.20% 0.06%, Robotics 0.00% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.1.2 Table of Contents Chapter 4 Preview 174 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs Figures 4.1.3 and 4.1.4 showcase the top ten specialized skills that were demanded in AI job postings in 2022 compared to 2010–20122. On an absolute level, virtually every specialized skill is more in demand now than a decade ago. The growth in demand for Python is particularly notable, evidence of its growing popularity as an AI coding language. Top Ten Specialized Skills in 2022 AI Job Postings in the United States, 2010–12 Vs. 2022 Source: Lightcast, 2022 | Chart: 2023 AI Index Report 296,662 Python (Programming Language) 12,884 260,333 Computer Science 48,001 185,807 SQL (Programming Language) 22,037 159,801 Data Analysis 16,571 157,855 Data Science 1,227 155,615 Amazon Web Services 962 152,956 Agile Methodology 7,549 138,791 Automation 13,207 133,856 Java (Programming Language) 26,557 2022 133,286 Software Engineering 2010–12 22,384 0 50,000 100,000 150,000 200,000 Number of AI Job Postings 250,000 300,000 Figure 4.1.3 Top Ten Specialized Skills in 2022 AI Job Postings in the United States by Skill Share, 2010–12 Vs. 2022 Source: Lightcast, 2022 | Chart: 2023 AI Index Report 37.13% (+592%) Python (Programming Language) 5.36% 32.58% (+63%) Computer Science 19.98% 23.25% (+153%) SQL (Programming Language) 9.17% 20.00% (+190%) Data Analysis 6.90% 19.75% (+3,767%) Data Science 0.51% 19.47% (+4,763%) Amazon Web Services 0.40% 19.14% (+509%) Agile Methodology 3.14% 17.37% (+216%) Automation 5.50% 16.75% (+52%) Java (Programming Language) 11.06% 2022 16.68% (+79%) Software Engineering 2010–12 9.32% 0% 5% 10% 15% 20% 25% Skill Share in AI Job Postings (%) 30% 35% 40% Figure 4.1.4 2 The point of comparison of 2010–2012 was selected because some data at the jobs/skills level is quite sparse in earlier years. Lightcast therefore used the whole set of years 2010–2012 to get a larger sample size for a benchmark from 10 years ago to compare. Table of Contents Chapter 4 Preview 175 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs U.S. AI Labor Demand by Sector fishing, and hunting), the number of AI job postings Figure 4.1.5 shows the percentage of U.S. job was notably higher in 2022 than in 2021, with the top postings that required AI skills by industry sector three sectors being information (5.3%); professional, from 2021 to 2022. Across virtually every included scientific, and technical services (4.1%); and finance sector (with the exception of agriculture, forestry, and insurance (3.3%). AI Job Postings (% of All Job Postings) in the United States by Sector, 2021 Vs. 2022 Source: Lightcast, 2022 | Chart: 2023 AI Index Report Information 4.85% 4.07% 3.86% Professional, Scienti c, and Technical Services Finance and Insurance 2.94% Manufacturing 2.86% Management of Companies and Enterprises 1.08% Public Administration 0.98% Waste Management and Administrative Support Services 0% 1.32% 1.27% 1.10% 1.19% 1.00% Mining, Quarrying, and Oil and Gas Extraction Transportation and Warehousing 1.37% 1.28% 0.82% Utilities Real Estate and Rental and Leasing 3.26% 1.53% 1.41% Educational Services Wholesale Trade 3.33% 1.64% 1.66% Agriculture, Forestry, Fishing, and Hunting Retail Trade 5.30% 0.98% 0.82% 0.89% 0.65% 2022 0.67% 0.59% 2021 0.58% 0.56% 1% 2% 3% 4% AI Job Postings (% of All Job Postings) 5% Figure 4.1.5 Table of Contents Chapter 4 Preview 176 Artificial Intelligence Index Report 2023 U.S. AI Labor Demand by State Figure 4.1.6 highlights the number of AI job postings in the United States by state. The top three Chapter 4: The Economy 4.1 Jobs Number of AI Job Postings in the United States by State, 2022 Source: Lightcast, 2022 | Chart: 2023 AI Index Report AK ME 970 2,227 1,571 VT 2,719 34,603 NH MA NY CT RI OH PA NJ DC MD DE states in terms of postings were California (142,154), followed by WA Texas (66,624) and New York (43,899). MT 833 1,227 OR ID WY UT CO AZ NM 10,811 6,109 CA ND 31,284 NV 142,154 6,813 769 SD MN WI MI NE IA IL IN KS MO KY WV TN VA NC AL GA SC 2,195 11,808 8,879 25,366 4,032 5,670 31,569 9,247 19,208 20,397 23,447 6,885 20,421 7,683 10,990 4,536 19,514 3,357 TX 43,899 8,960 2,965 OK AR 5,719 7,247 LA MS 887 9,606 16,769 3,503 11,173 34,221 23,854 66,624 4,806 2,548 7,866 26,620 4,928 HI FL 2,550 33,585 Figure 4.1.6 Figure 4.1.7 demonstrates what percentage of a state’s total job postings were AI-related. The top states according to this metric Percentage of U.S. States Job Postings in AI, 2022 Source: Lightcast, 2022 | Chart: 2023 AI Index Report AK 0.88% ME 1.64% NH MA VT 1.34% 1.20% 2.26% were the District of Columbia (3.0%), followed by Delaware MI WA MT ND SD MN WI 2.48% 0.72% 1.04% 1.83% 1.22% 0.90% 1.77% (2.7%), Washington (2.5%), and CT NY RI 2.07% 1.66% 1.84% ID IA IL IN OR OH PA NJ WY NE 1.43% 1.89% 1.18% 1.14% 1.14% 1.63% 0.88% 1.07% 1.30% 2.04% Virginia (2.4%). CA NV CO KS MO KY UT WV DC MD DE 2.21% 1.23% 1.54% 1.46% 1.43% 1.15% 0.85% 0.99% 2.95% 1.96% 2.66% AZ NM OK AR TN NC VA 1.40% 1.36% 1.07% 2.03% 1.11% 2.42% 1.44% LA MS AL GA TX SC 1.52% 0.87% 1.15% 1.31% 1.64% 0.87% HI 1.46% FL 1% Figure 4.1.7 Table of Contents Chapter 4 Preview 17 7 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs Percentage of United States AI Job Postings by State, 2022 Which states had the greatest Source: Lightcast, 2022 | Chart: 2023 AI Index Report share of AI job postings as a share of all AI job postings in AK 0.12% the U.S. in 2022? California was ME 0.28% VT NH MA 0.20% 0.34% 4.35% first: Last year 17.9% of all AI job postings in the United States MI WA MT ND SD MN WI 3.93% 0.10% 0.15% 0.28% 1.48% 1.12% 3.19% were for jobs based in California, CT NY RI 5.52% 1.13% 0.37% OR ID NE IA IL IN OH PA NJ WY 1.36% 0.77% 0.10% 0.51% 0.71% 3.97% 1.16% 2.41% 2.56% 2.95% followed by Texas (8.4%) and New York (5.5%) (Figure 4.1.8). CA NV CO KS MO KY UT WV DC MD DE 17.87% 0.86% 0.87% 2.57% 0.97% 1.38% 0.57% 0.11% 1.21% 2.11% 0.44% AZ NM OK AR TN VA 2.45% 0.42% 0.72% 0.91% 1.40% 4.30% NC 3% LA MS AL GA TX SC 8.37% 0.60% 0.32% 0.99% 3.35% 0.62% HI 0.32% FL 4.22% Figure 4.1.8 Figure 4.1.9 highlights the trends over time in AI job postings for four select states that annually report a high number of AI-related jobs: Washington, California, New York, and Texas. For all four, there was a significant increase in the number of total AI-related job postings from 2021 to 2022, suggesting that across these states, employers are increasingly looking for AI-related workers. Percentage of U.S. States’ Job Postings in AI by Select U.S. State, 2010–22 Source: Lightcast, 2022 | Chart: 2023 AI Index Report Percentage of U.S. States’ Job Postings in AI 2.50% 2.48%, Washington 2.21%, California 2.07%, New York 2.00% 1.52%, Texas 1.50% 1.00% 0.50% 0.00% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.1.9 Table of Contents Chapter 4 Preview 178 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs Figure 4.1.10 highlights the degree to which AI-related job postings have been subdivided among the top four states over time. California’s share of all AI job postings has decreased steadily since 2019 while Texas’ has marginally increased. The fact that California no longer commands one-quarter of all AI-related jobs suggests that AI jobs are becoming more equally distributed among U.S. states. Percentage of United States AI Job Postings by Select U.S. State, 2010–22 Source: Lightcast, 2022 | Chart: 2023 AI Index Report Percentage of United States AI Job Postings 25% 20% 17.87%, California 15% 10% 8.37%, Texas 5.52%, New York 5% 3.93%, Washington 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.1.10 Table of Contents Chapter 4 Preview 179 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs AI Hiring in the same period the job began, divided by the total Our AI hiring data is based on a LinkedIn dataset of skills and jobs that appear on their platform. The countries number of LinkedIn members in the corresponding location. This rate is then indexed to the average month in 2016; for example, an index of 1.1 in December included in the sample make at least 10 AI hires each 2021 points to a hiring rate that is 10% higher than the month and have LinkedIn covering at least 40% of average month in 2016. LinkedIn makes month-to- their labor force. India is also included in the sample month comparisons to account for any potential lags in given their increasing significance in the AI landscape, members updating their profiles. The index for a year is although LinkedIn does not cover 40% of their labor the number in December of that year. force. Therefore, the insights drawn about India should be interpreted with particular caution. The relative AI hiring index measures the degree to which the hiring of AI talent is changing, more specifically Figure 4.1.11 highlights the 15 geographic areas that whether the hiring of AI talent is growing faster than, have the highest relative AI hiring index for 2022. The AI hiring rate is calculated as the percentage of LinkedIn members with AI skills on their profile or working in AI-related occupations who added a new employer equal to, or more slowly than overall hiring in a particular geographic region. In 2022, Hong Kong posted the greatest growth in AI hiring at 1.4, followed by Spain, Italy and the United Kingdom, and the United Arab Emirates. Relative AI Hiring Index by Geographic Area, 2022 Source: LinkedIn, 2022 | Chart: 2023 AI Index Report Hong Kong 1.37 Spain 1.19 Italy 1.18 United Kingdom 1.18 United Arab Emirates 1.15 South Africa 1.13 New Zealand 1.06 Denmark 1.06 Belgium 1.05 Netherlands 1.03 South Korea 1.02 Sweden 1.01 Canada 0.99 Switzerland 0.99 Singapore 0.99 0.00 0.20 0.40 0.60 0.80 Relative AI Hiring Index 1.00 1.20 1.40 Figure 4.1.11 Figure 4.1.12 highlights how the AI hiring index changes over time for a wide range of countries3. Overall, the majority of countries included in the sample have seen meaningful increases in their AI hiring rates since 2016. This trend suggests that those countries are now hiring more AI talent than in 2016. However, for many countries, AI hiring rates seem to have peaked around 2020, then dropped, and have since stabilized. 3 Both Figure 4.1.11 and Figure 4.1.12 report the Relative AI Hiring Index. Figure 4.1.11 reports the Index value at the end of December 2022, while Figure 4.1.12 reports a twelve-month rolling average. Table of Contents Chapter 4 Preview 180 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs Relative AI Hiring Index by Geographic Area, 2016–22 Source: LinkedIn, 2022 | Chart: 2023 AI Index Report Australia 2.00 1.50 0.50 1.00 2020 0.00 2022 Chile 1.00 1.00 2020 0.00 2022 2018 1.50 0.00 2022 1.50 1.14 1.00 0.50 0.50 0.50 0.00 0.00 Germany 2.00 2020 2022 Hong Kong 2.00 1.50 1.50 1.13 1.00 0.50 0.00 2018 0.50 2018 2020 0.00 2022 2018 Israel 2020 1.00 0.94 1.00 0.50 0.50 2018 1.50 0.50 0.50 0.50 0.00 0.00 0.00 2022 New Zealand 2.00 2020 2022 Norway 2.00 1.50 0.50 2020 0.00 2022 2018 2020 Portugal 2018 2020 0.00 2022 1.50 0.50 0.50 0.00 Switzerland 2.00 2.00 1.50 United Arab Emirates 0.50 2020 Table of Contents 2022 0.00 2020 2022 United Kingdom 1.50 1.08 1.00 0.50 2018 2018 2.00 1.50 1.09 1.00 0.00 2022 0.50 2018 Chapter 4 Preview 2020 2022 0.00 2018 2020 2022 United States 2.00 1.25 1.00 1.09 1.00 0.00 2020 2022 1.50 1.12 1.00 0.50 2018 2020 Sweden 0.00 2022 2018 2.00 0.50 2020 1.13 1.00 0.00 2018 2022 Singapore Spain 1.05 1.00 2020 0.50 2.00 1.50 1.10 2018 1.50 1.01 South Korea 1.50 2022 1.21 2.00 1.00 0.00 2022 2.00 1.00 2022 0.50 South Africa 2.00 2020 1.50 1.10 1.00 0.50 2018 2018 2.00 1.50 1.09 1.00 0.00 2018 2020 1.00 0.50 2020 2018 1.50 1.03 1.00 0.00 2018 1.08 2.00 1.50 1.11 1.00 2022 Netherlands 2.00 1.50 1.00 0.00 2022 2020 Ireland Luxembourg 2.00 1.00 2020 2018 2.00 1.50 Italy 2.00 2022 1.50 0.00 2022 2020 India 2.00 1.21 1.00 2018 1.19 1.00 0.00 2022 2022 1.50 1.08 1.00 0.50 2020 2020 2.00 0.00 2018 2018 France 2.00 1.50 1.05 2020 Finland 2.00 1.00 1.17 1.00 0.50 Denmark 2.00 Relative AI Hiring Index 1.50 0.50 2018 Canada 2.00 1.50 1.11 0.50 2018 Brazil 2.00 1.50 1.12 1.00 0.00 Belgium 2.00 1.50 1.15 1.00 0.50 2018 2020 2022 0.00 2018 2020 2022 Figure 4.1.12 181 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs rate of various countries or regions from 2015 to AI Skill Penetration 2022. In this case, the relative AI skill penetration rate The AI skill penetration rate is a metric created by can be understood as the sum of the penetration of LinkedIn that measures the prevalence of various each AI skill across occupations in a given country or AI-related skills across occupations. LinkedIn region, divided by the global average across the same generates this metric by calculating the frequencies occupation. For instance, a relative skill penetration of LinkedIn users’ self-added skills in a given area rate of 1.5 means that the average penetration of AI from 2015 to 2022, then reweighting those numbers skills in that country or region is 1.5 times the global with a statistical model to create the top 50 average across the same set of occupations. representative skills in that select occupation. As of 2022, the three countries or regions with the Global Comparison: Aggregate highest AI skill penetration rates were India (3.2), Figure 4.1.13 shows the relative AI skill penetration the United States (2.2), and Germany (1.7). Relative AI Skill Penetration Rate by Geographic Area, 2015–22 Source: LinkedIn, 2022 | Chart: 2023 AI Index Report India 3.23 United States 2.23 Germany 1.72 Israel 1.65 Canada 1.54 United Kingdom 1.54 South Korea 1.44 Singapore 1.37 France 1.13 Brazil 0.99 Spain 0.98 Netherlands 0.95 Italy 0.95 Switzerland 0.91 Australia 0.00 0.89 0.50 1.00 1.50 2.00 Relative AI Skill Penetration Rate 2.50 3.00 Figure 4.1.13 Table of Contents Chapter 4 Preview 182 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.1 Jobs Global Comparison: By Gender Figure 4.1.14 disaggregates AI skill penetration rates pooled together across the same set of occupations by gender across different countries or regions. in the country. For all countries in the sample, the A country’s “Relative AI skill penetration rate relative AI skill penetration rate is greater for men across genders” for women of 1.5 means that female than women. India (2.0), the United States (1.3), and members in that country are 1.5 times more likely to Israel (0.9) have the highest reported relative AI skill list AI skills than the average member in all countries penetration rates for women. Relative AI Skill Penetration Rate Across Gender, 2015–22 Source: LinkedIn, 2022 | Chart: 2023 AI Index Report 3.27 India 1.99 2.36 United States 1.28 2.05 Israel 0.87 Canada 0.86 1.59 1.91 Germany 0.71 1.37 Singapore 0.68 1.46 United Kingdom 0.57 1.13 France 0.39 0.98 Netherlands 0.38 0.82 United Arab Emirates 0.31 Italy 0.31 Brazil 0.30 Spain 0.30 0.85 0.98 1.03 0.29 Australia 0.29 0.00 Male 1.08 Finland Female 0.88 0.50 1.00 1.50 2.00 Relative AI Skill Penetration Rate 2.50 3.00 Figure 4.1.14 Table of Contents Chapter 4 Preview 183 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Using data from NetBase Quid, this section tracks trends in AI-related investments. NetBase Quid tracks data on the investments of over 8 million global public and private companies. NetBase Quid also uses natural language processing techniques to search, analyze, and identify patterns in large, unstructured datasets, like aggregated news and blogs, and company and patent databases. NetBase Quid continuously broadens the set of companies for which it tracks data, so that in this year’s AI Index, the reported investment volume for certain years is larger than that of previous reports. 4.2 Investment Corporate Investment As AI becomes more and more integrated into the For the first time since 2013, year-over-year global economy, it becomes increasingly important to track corporate investment in AI has decreased. In 2022, AI-related corporate investment. Figure 4.2.1 shows total global corporate AI investment was $189.6 overall global corporate investment in AI from 2013 billion, roughly a third lower than it was in 2021. to 2022. Corporate investment includes mergers and Still, in the last decade, AI-related investment has acquisitions, minority stakes, private investment, and increased thirteenfold. public offerings. Global Corporate Investment in AI by Investment Activity, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 300 Merger/Acquisition 276.14 Minority Stake Private Investment Total Investment (in Billions of U.S. Dollars) 250 Public O ering 119.66 200 189.59 146.74 150 83.35 26.06 95.57 100 79.62 53.72 50 14.57 0 2013 19.04 2014 25.43 33.82 13.05 24.68 13.35 17.13 25.72 2015 2016 2017 46.06 125.36 31.91 21.89 43.1 55.09 12.62 2018 13.01 2019 91.86 61.61 2020 29.1 2021 2022 Figure 4.2.1 Table of Contents Chapter 4 Preview 184 Artificial Intelligence Index Report 2023 To provide a fuller context for the nature of AI investment in the last year, Figures 4.2.2 through 4.2.5 highlight the top merger/acquisition, minority stake, private investment, and public Chapter 4: The Economy 4.2 Investment Top Five AI Merger/Acquisition Investment Activities, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Company Name Headquarters Country Focus Area Nuance Communications, Inc. United States Arti cial Intelligence; Enterprise Software; Healthcare; Machine Learning Citrix Systems, Inc. United States Data Management, Processing, and Cloud; HR Tech 17.18 Avast Limited Czech Republic Data Management, Processing, and Cloud; Fintech; Cybersecurity, Data Protection 8.02 AspenTech Corporation United States Manufacturing; Software; Supply Chain Management 6.34 Vivint Smart Home, Inc. United States Cybersecurity, Data Protection; Sales Enablement 5.54 offering events in the last year. The greatest single AI investment event was the merger/acquisition of Nuance Communications, valued at $19.8 billion (Figure 4.2.2). The largest minority stake event was for the British company Aveva Group ($4.7 billion) (Figure 4.2.3). The greatest private investment event was GAC Aion New Energy Automobile ($2.5 billion), a Chinese clean energy Funding Amount (in Billions USD) 19.80 and automotive company (Figure 4.2.4). Finally, the largest public offering was Figure 4.2.2 ASR Microelectronics ($1.1 billion), a Chinese semiconductor company Top Five AI Minority Stake Investment Activities, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report (Figure 4.2.5). Company Name Headquarters Country Focus Area Funding Amount (in Billions USD) AVEVA Group, PLC United Kingdom Chemical; Computer; Data Mining; Electronics; Industrial Manufacturing; Information Technology; Simulation; Software 4.68 Grupo de Inversiones Suramericana, SA Colombia Financial Services; Impact Investing; Insurance 1.48 Fractal Analytics Private Limited India Analytics; Arti cial Intelligence; Big Data; Business Intelligence; Consulting; Machine Learning 0.35 Atrys Health, SA Spain Medical and Healthcare 0.28 R Systems International, Ltd. India Analytics; Information Technology; IT Management; Software 0.17 Figure 4.2.3 Table of Contents Chapter 4 Preview 185 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Top Five AI Private Investment Activities, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Company Name Headquarters Country Focus Area Funding Amount (in Billions USD) GAC Ai¬¥an New Energy Automobile Co., Ltd. China Automotive; Clean Energy; Electric Vehicle; Manufacturing 2.54 Idience Co., Ltd. South Korea Emergency Medicine; Healthcare; Pharmaceutical 2.15 Uali Argentina Drones; Cloud Computing 1.50 Anduril Industries, Inc. United States Cybersecurity, Data Protection; AR/VR; Drones 1.50 Celonis, GmbH Germany Retail; Industrial Automation, Network; HR Tech; Insurtech 1.22 Figure 4.2.4 Top Five AI Public O ering Investment Activities, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Company Name Headquarters Focus Area Country Funding Amount (in Billions USD) ASR Microelectronics Co., Ltd. China Semiconductor; VC 1.08 iSoftStone Information Technology (Group) Co., Ltd. China Data Management, Processing, and Cloud; Cybersecurity, Data Protection 0.73 Jahez International Company for Information Systems Technology Saudi Arabia Arti cial Intelligence; E-Commerce; Food and Beverage; Food Delivery; Information Technology; Logistics 0.43 Fortior Technology (Shenzhen) Co., Ltd. China Electronics; Machine Manufacturing; Semiconductor 0.30 Beijing Deep Glint Technology Co., Ltd. China Cybersecurity, Data Protection; Music, Video Content 0.29 Figure 4.2.5 Table of Contents Chapter 4 Preview 186 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Startup Activity Global Trend The next section analyzes private investment trends in The global private AI investment trend reveals that artificial intelligence startups that have received over while investment activity has decreased since 2021, it $1.5 million in investment since 2013. is still 18 times higher than it was in 2013 (Figure 4.2.6). Private Investment in AI, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report Total Investment (in Billions of U.S. Dollars) 120 100 91.86 80 60 40 20 0 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.2.6 Table of Contents Chapter 4 Preview 187 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment A similar trend, of short-term decreases but longer- decrease from 2021 but a sixfold increase since 2013 term growth, is evident in data on total private (Figure 4.2.7). Similarly, the number of newly funded AI investment events. In 2022 there were 3,538 AI- companies dropped to 1,392 from 1,669 last year, while related private investment events, representing a 12% having increased from 495 in 2013 (Figure 4.2.8). Number of Private Investment Events in AI, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 4,000 3,538 Number of Private Investment Events 3,500 3,000 2,500 2,000 1,500 1,000 500 0 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.2.7 Number of Newly Funded AI Companies in the World, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 1,600 1,392 1,400 Number of Companies 1,200 1,000 800 600 400 200 0 2013 Table of Contents 2014 2015 Chapter 4 Preview 2016 2017 2018 2019 2020 2021 2022 Figure 4.2.8 188 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment The year-over-year decrease in AI-related funding is also evident when the funding events AI Private Investment Events by Funding Size, 2021 Vs. 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report are disaggregated by size. Across all size Funding Size categories, with the exception of ones over Over $1 Billion $1 billion, the total number of AI funding events $500 Million – $1 Billion decreased (Figure 4.2.9). 2021 $100 Million – $500 Million $50 Million – $100 Million Under $50 Million Undisclosed Total 2022 Total 4 6 13 5 10 18 277 164 441 277 238 515 2,851 2,585 5,436 598 540 1,138 4,020 3,538 7,558 Figure 4.2.9 Regional Comparison by Funding Amount Once again, the United States led the world in terms of total AI private investment. In 2022, the $47.4 billion invested in the United States was roughly 3.5 times the amount invested in the next highest country, China ($13.4 billion), and 11 times the amount invested in the United Kingdom ($4.4 billion) (Figure 4.2.10). Private Investment in AI by Geographic Area, 2022 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report United States 47.36 China 13.41 United Kingdom 4.37 Israel 3.24 India 3.24 South Korea 3.10 Germany 2.35 Canada 1.83 France 1.77 Argentina 1.52 Australia 1.35 Singapore 1.13 Switzerland 1.04 Japan 0.72 Finland 0.61 0 Table of Contents 5 10 Chapter 4 Preview 15 20 25 30 Total Investment (in Billions of U.S. Dollars) 35 40 45 Figure 4.2.10 189 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment When private AI investments are aggregated since 2013, the same ranking of countries applies: The United States is first with $248.9 billion invested, followed by China ($95.1 billion) and the United Kingdom ($18.2 billion) (Figure 4.2.11). Private Investment in AI by Geographic Area, 2013–22 (Sum) Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report United States 248.90 China 95.11 United Kingdom 18.24 Israel 10.83 Canada 8.83 India 7.73 Germany 6.99 France 6.59 South Korea 5.57 Singapore 4.72 Japan 3.99 Hong Kong 3.10 Switzerland 3.04 Australia 3.04 Spain 1.81 0 Table of Contents 20 40 60 Chapter 4 Preview 80 100 120 140 160 Total Investment (in Billions of U.S. Dollars) 180 200 220 240 Figure 4.2.11 190 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment While the United States continues to outpace The top five American AI private investment events other nations in terms of private AI investment, are highlighted in Figure 4.2.13, the top five European the country experienced a sharp 35.5% decrease Union and British investments in Figure 4.2.14, and the in AI private investment within the last year (Figure top five Chinese investments in Figure 4.2.15. 4.2.12). Chinese investment experienced a similarly sharp decline (41.3%). Private Investment in AI by Geographic Area, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report Total Investment (in Billions of U.S. Dollars) 70 60 50 47.36, United States 40 30 20 13.41, China 11.04, European Union and United Kingdom 10 0 2013 2014 Table of Contents 2015 2016 2017 Chapter 4 Preview 2018 2019 2020 2021 2022 Figure 4.2.12 191 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Top AI Private Investment Events in the European Union and United Kingdom, 2022 Top AI Private Investment Events in the United States, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Company Name Focus Area 1.50 Celonis, GmbH Retail; Industrial Automation, Network; HR Tech; Insurtech Fintech; Retail; Sales Enablement 0.82 Content Square, SAS 0.60 Arti cial Intelligence; Information Technology; Machine Learning 0.58 Analytics; Arti cial Intelligence: CRM: Data Visualization; Digital Marketing; SaaS Retail 0.57 Data Management, Processing, and Cloud; Cybersecurity, Data Protection 0.40 Retail Logistics Excellence - RELEX Oy Cera Care Limited Medical and Healthcare 0.32 Babylon Holdings Limited 0.40 Medical and Healthcare; Music, Video Content 0.30 Data Management, Processing, and Cloud; AV; AR/VR Company Name Focus Area Anduril Industries, Inc. Cybersecurity, Data Protection; AR/VR; Drones Faire Wholesale, Inc. Anthropic, PBC Arctic Wolf Networks, Inc. JingChi, Inc. Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Funding Amount (in Billions USD) Funding Amount (in Billions USD) 1.22 Figure 4.2.14 Figure 4.2.13 Top AI Private Investment Events in China, 2022 Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Company Name Focus Area Funding Amount (in Billions USD) GAC Ai¬¥an New Energy Automobile Co., Ltd. Automotive; Clean Energy; Electric Vehicle; Manufacturing 2.54 GAC Ai¬¥an New Energy Automobile Co., Ltd. Automotive; Clean Energy; Electric Vehicle; Manufacturing 1.11 Beijing ESWIN Technology Group Co., Ltd. Data Management, Processing, and Cloud; Industrial Automation, Network; Semiconductor; Marketing, Digital Ads; Sales Enablement 0.58 Zhejiang Hozon New Energy Automobile Co., Ltd. Data Management, Processing, and Cloud; Cybersecurity, Data Protection; Sales Enablement 0.44 Zhejiang Hozon New Energy Automobile Co., Ltd. Data Management, Processing, and Cloud; Cybersecurity, Data Protection; Sales Enablement 0.32 Figure 4.2.15 Table of Contents Chapter 4 Preview 192 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Regional Comparison by Newly Funded AI Companies This subsection studies the number of newly funded United States led all regions with the largest number of AI companies across various geographic areas. newly funded AI companies at 542, followed by China As was the case with private investment, the at 160 and the United Kingdom at 99 (Figure 4.2.16). Number of Newly Funded AI Companies by Geographic Area, 2022 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report United States 542 China 160 United Kingdom 99 Israel 73 India 57 Canada 47 France 44 Germany 41 Singapore 36 Japan 32 Switzerland 26 Australia 23 South Korea 22 Sweden 12 Netherlands 12 0 Table of Contents 100 Chapter 4 Preview 200 300 Number of Companies 400 500 Figure 4.2.16 193 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment A similar trend is evident in the aggregate data since 2013. In the last decade, the number of newly funded AI companies in the United States is around 3.5 times the amount in China, and 7.4 times the amount in the United Kingdom (Figure 4.2.17). Number of Newly Funded AI Companies by Geographic Area, 2013–22 (Sum) Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report United States 4,643 China 1,337 United Kingdom 630 Israel 402 Canada 341 France 338 India 296 Japan 294 Germany 245 Singapore 165 South Korea 145 Australia 126 Switzerland 108 Sweden 83 Netherlands 78 0 500 Figure 4.2.18 breaks down data on newly 1,000 1,500 2,000 2,500 3,000 Number of Companies 3,500 4,000 4,500 Figure 4.2.17 Number of Newly Funded AI Companies by Geographic Area, 2013–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 700 funded AI companies within select 600 geographic regions. In a trend that goes the United States continues to outpace both the European Union and the United Kingdom, as well as 500 Number of Companies back a decade, 542, United States 400 300 293, European Union and United Kingdom 200 160, China China. However, the growth rates of the 100 different regions are relatively similar. 0 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 4.2.18 Table of Contents Chapter 4 Preview 194 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Private Investment in AI by Focus Area, 2021 Vs. 2022 Focus Area Analysis Private AI investment can also be disaggregated by focus area. Figure 4.2.19 compares global private AI investment by focus area in 2022 versus 2021. The focus areas that attracted the most investment in 2022 were medical and healthcare ($6.1 billion); data management, processing, and cloud ($5.9 billion); fintech ($5.5 billion); cybersecurity and data protection ($5.4 billion); and retail ($4.2 billion). Mirroring the pattern seen in total AI private investment, the total investment across most focus areas declined in the last year. Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report Medical and Healthcare Data Management, Processing, Cloud Fintech Cybersecurity, Data Protection Retail Industrial Automation, Network Sales Enablement Marketing, Digital Ads AR/VR Drones Insurtech Music, Video Content Semiconductor HR Tech Energy, Oil, and Gas AV NLP, Customer Support Agritech Entertainment Legal Tech Geospatial Fitness and Wellness Ed Tech Facial Recognition VC 2022 2021 0 2 4 6 8 10 Total Investment (in Billions of U.S. Dollars) Figure 4.2.19 Figure 4.2.20 presents trends in AI focus area cybersecurity and data protection, drones, marketing investments. As noted earlier, most focus areas saw and digital ads, HR tech, AR/VR, and legal tech. Still, declining investments in the last year. However, some mirroring a broader trend in AI private investment, of the focus areas that saw increased investments are most focus areas saw greater amounts of AI private semiconductor, industrial automation and network, investment in 2022 than they did in 2017. Table of Contents Chapter 4 Preview 195 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Private Investment in AI by Focus Area, 2017–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report Data Management, Processing, Cloud 8 6 4 2 0 5.86 2018 2020 2022 Medical and Healthcare 8 6 4 2 0 Semiconductor 8 6 4 2 0 2020 2022 8 6 4 2 0 Total Investment (in Billions of U.S. Dollars) NLP, Customer Support 8 6 4 2 0 2020 2020 3.92 2018 2020 2022 8 6 4 2 0 2020 2022 8 6 4 2 0 Agritech 8 6 4 2 0 2020 2020 2018 2022 8 6 4 2 0 2020 2022 2018 2020 2022 8 6 4 2 0 2022 8 6 4 2 0 2020 2022 5.38 2018 2020 8 6 4 2 0 2022 2020 2022 8 6 4 2 0 2022 8 6 4 2 0 0.53 2018 2020 2022 2022 8 6 4 2 0 1.88 2018 2020 2020 2022 8 6 4 2 0 1.74 2018 2020 2022 8 6 4 2 0 0.37 2018 2020 2022 Music, Video Content 0.87 2018 2022 Drones Entertainment 8 6 4 2 0 2020 Ed Tech 2.39 2018 2022 Insurtech 0.07 2018 2020 Fitness and Wellness AR/VR 0.83 2018 2020 1.34 2018 Facial Recognition 3.18 2018 2022 4.20 Legal Tech 0.71 2018 2020 2020 Cybersecurity, Data Protection 1.63 Geospatial 8 6 4 2 0 8 6 4 2 0 Sales Enablement 0.87 2018 2018 8 6 4 2 0 Retail HR Tech 3.05 2018 2022 1.61 2018 Marketing, Digital Ads 8 6 4 2 0 2022 Energy, Oil, and Gas 1.01 2018 2018 AV 5.52 Industrial Automation, Network 1.65 2018 6.05 Fintech 8 6 4 2 0 2022 8 6 4 2 0 1.72 2018 2020 2022 VC 8 6 4 2 0 0.02 2018 2020 Table of Contents 2022 Chapter 4 Preview Figure 4.2.20 196 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.2 Investment Finally, 4.2.21 shows private investment in AI by focus area billion), and 40 times more than that in the European over time within select geographic regions, highlighting Union and the United Kingdom ($0.04 billion). Chinese how private investment priorities in AI differ across private investment in AI-related semiconductors ($1.02 geographies. For example, in 2022, private investment billion) was 1.75 times more than that in the United in AI-related drone technology in the United States ($1.6 States ($0.58 billion), and 102 times more than that in the billion) was nearly 53 times more than that in China ($0.03 European Union and the United Kingdom ($0.01 billion). Private Investment in AI by Focus Area and Geographic Area, 2017–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report Data Management, Processing, Cloud Medical and Healthcare US, 3.13 US, 4.19 CN, 1.87 CN, 0.25 4 4 EU/UK, 0.24 EU/UK, 0.76 4 2 0 2 2018 2020 Semiconductor 4 0 2022 2018 2020 2022 0 2018 2020 2022 Retail 0 US, 1.52 CN, 0.01 EU/UK, 2.07 4 2 2 2 2 0 0 0 2020 2022 2 0 2018 2022 Energy, Oil, and Gas 2018 2020 0 2022 2 2020 Agritech 4 0 2022 US, 0.55 CN, 0.10 EU/UK, 0.08 4 2020 2022 2 2018 2020 2022 HR Tech 0 US, 0.24 CN, 0.00 EU/UK, 1.28 4 2 2018 2018 2020 2022 Sales Enablement 0 2018 2020 2022 Facial Recognition 0 2018 2020 AR/VR US, 1.12 CN, 1.68 EU/UK, 0.16 4 2022 0 2 2 2 0 0 Geospatial 4 US, 0.55 CN, 0.03 EU/UK, 0.01 4 2 2 0 2018 2020 VC 4 0 2022 2018 2020 Legal Tech 2022 US, 0.71 CN, 0.05 EU/UK, 0.06 4 2018 2020 Entertainment 2022 US, 0.47 CN, 0.18 EU/UK, 0.17 4 2 2018 2020 2022 0 2020 Drones 2018 2020 Insurtech 2018 2020 Ed Tech US, 2.07 CN, 0.01 EU/UK, 0.06 4 0 2022 2018 US, 0.23 CN, 0.00 EU/UK, 0.14 2022 US, 1.60 CN, 0.03 EU/UK, 0.04 2022 US, 0.39 CN, 0.00 EU/UK, 1.29 2 2 2020 2022 Fitness and Wellness US, 0.07 CN, 0.00 EU/UK, 0.00 4 0 2018 2020 2 2 2018 2018 Cybersecurity, Data Protection US, 0.80 US, 3.87 CN, 0.34 CN, 1.07 EU/UK, 0.20 4 EU/UK, 0.23 4 2 Marketing, Digital Ads US, 1.14 CN, 0.88 4 EU/UK, 0.76 4 0 2020 US, 0.69 CN, 0.49 EU/UK, 0.02 2 0 2018 AV US, 3.23 CN, 0.03 EU/UK, 0.94 4 2 Industrial Automation, Network US, 0.58 US, 0.87 CN, 1.02 CN, 1.06 4 EU/UK, 0.01 EU/UK, 1.65 4 NLP, Customer Support US, 0.69 CN, 0.13 4 EU/UK, 0.04 4 Total Investment (in Billions of U.S. Dollars) Fintech 2018 2020 2022 US, 0.12 CN, 0.01 EU/UK, 0.10 2022 Music, Video Content US, 1.10 CN, 0.03 EU/UK, 0.44 2 2018 2020 2022 0 2018 2020 2022 US, 0.00 CN, 0.00 EU/UK, 0.02 2 0 2018 2020 Table of Contents Figure 4.2.21 2022 Chapter 4 Preview 197 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity This section explores how corporations tangibly use AI. First, it highlights industry adoption trends and asks how businesses adopt AI and what particular AI technologies they find most useful, and identifies how AI adoption affects their bottom line. Second, the section considers industry motivations and explores what questions industry leaders consider when thinking about incorporating AI technologies. Finally, it paints a qualitative picture of business AI use by examining trends in AI-related earnings calls. 4.3 Corporate Activity Industry Adoption Adoption of AI Capabilities The following subsection on the industry adoption According to the most recent McKinsey report, as of of AI borrows data from McKinsey’s “The State of 2022, 50% of surveyed organizations reported having AI in 2022—and a Half Decade in Review,” as well adopted AI in at least one business unit or function as previous years’ editions. The 2022 report drew on (Figure 4.3.1). This total is down slightly from 56% in data from a survey of 1,492 participants representing 2021, although up significantly from 20% in 2017. AI a wide range of regions, industries, company sizes, usage has rapidly grown in the past half-decade, but functional specialties, and tenures. leveled off since 2020. Share of Respondents Who Say Their Organizations Have Adopted AI in at Least One Function, 2017–22 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report 60% 50% 50% % of Respondents 40% 30% 20% 10% 0% 2017 2018 2019 2020 2021 2022 Figure 4.3.1 Table of Contents Chapter 4 Preview 198 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity In the last half-decade, the average number of AI capabilities that organizations have embedded has doubled from 1.9 in 2018 to 3.8 in 2022 (Figure 4.3.2). Some of the AI capabilities that McKinsey features in their survey include recommender systems, NL text understanding, and facial recognition.4 Average Number of AI Capabilities That Respondents’ Organizations Have Embedded Within at Least One Function or Business Unit, 2018–22 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report 4.00 3.80 Number of AI Capabilities (Average) 3.50 3.00 2.50 2.00 1.50 1.00 0.50 0.00 2018 2019 2020 2021 2022 Figure 4.3.2 4 In the 2022 edition of the McKinsey survey, 16 total AI capabilities are considered: computer vision, deep learning, digital twins, facial recognition, GAN, knowledge graphs, NL generation, NL speech understanding, NL text understanding, physical robotics, recommender systems, reinforcement learning, robotic process automation, transfer learning, transformers, and virtual agents. Table of Contents Chapter 4 Preview 199 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity The most commonly adopted AI use case in 2022 was service operations optimization (24%), followed by the creation of new AI-based products (20%), customer segmentation (19%), customer service analytics (19%), and new AI-based enhancement of products (19%) (Figure 4.3.3). Most Commonly Adopted AI Use Cases by Function, 2022 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report Service Operations Optimization 24% Creation of New AI-Based Products 20% Customer Segmentation 19% Customer Service Analytics 19% New AI-Based Enhancements of Products 19% Customer Acquisition and Lead Generation 17% Product Feature Optimization 16% Contact-Center Automation 16% Risk Modeling and Analytics 15% Service Operations Product and/or Service Development Marketing and Sales Predictive Service and Intervention 0% Risk 14% 4% 8% 12% % of Respondents 16% 20% 24% Figure 4.3.3 Table of Contents Chapter 4 Preview 200 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity With respect to the type of AI capabilities embedded rates of embedding were 48%, 47%, and 46%. Across in at least one function or business unit, as indicated all industries, the most embedded AI technologies by Figure 4.3.4, robotic process automation had were robotic process automation (39%), computer the highest rate of embedding within high tech/ vision (34%), NL text understanding (33%), and virtual telecom, financial services and business, and legal agents (33%). and professional services industries—the respective AI Capabilities Embedded in at Least One Function or Business Unit, 2022 Industry Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report All Industries 34% 30% 24% 18% 11% 25% 18% 23% 33% 20% 25% 20% 39% 16% 11% 33% Business, Legal, and 32% Professional Services 37% 31% 11% 8% 26% 12% 22% 34% 19% 23% 26% 46% 16% 11% 30% Consumer Goods/ 33% Retail 36% 25% 19% 13% 18% 20% 11% 22% 24% 32% 19% 25% 7% 11% 40% Financial Services 24% 22% 18% 24% 13% 29% 20% 30% 42% 14% 30% 19% 47% 17% 12% 33% Healthcare Systems/ Pharma and 32% Med. Products 18% 16% 5% 5% 14% 5% 12% 29% 11% 16% 13% 16% 9% 6% 14% High Tech/Telecom 37% 45% 24% 16% 15% 23% 24% 29% 40% 15% 34% 23% 48% 22% 15% 43% de en m s ic ot b Ro ng di ng di an an st s ) s -3 nt PT ge ,G lA g. ua e. rt s( Vi er m or sf n ng io an ni at Tr ar om Le ut er sf sA an es Tr g oc in Pr rn tic ea bo tL en Ro s m m te ce or ys nf rS i Re m co Re st er er d Un n io ph ra n tio ni G d Un h ec e at er xt Te e Sp en G dg le al ic ys Ph L N L N L N ow Kn AN G s in og ec lR ia c Fa g in n sio Vi rn a Le w lT ita ig D p ee D r te pu om C % of Respondents (AI Capability) Figure 4.3.4 Table of Contents Chapter 4 Preview 201 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Figure 4.3.5 shows AI adoption by industry and AI function in 2022. The greatest adoption was in risk for high tech/telecom (38%), followed by service operations for consumer goods/retail (31%) and product and/or service development for financial services (31%). AI Adoption by Industry and Function, 2022 Industry Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report All Industries 11% 8% 5% 10% 19% 19% 21% 9% Business, Legal, and Professional Services 11% 10% 9% 8% 16% 20% 19% 12% Consumer Goods/ Retail 14% 4% 3% 4% 15% 31% 29% 11% Financial Services 1% 8% 7% 31% 17% 24% 23% 2% Healthcare Systems/ Pharma and Med. Products 15% 7% 2% 4% 22% 12% 8% 8% High Tech/Telecom 6% 6% 4% 7% 38% 21% 25% 8% Hu ma nR eso urc es Ma nu fa ctu rin Ma rke ti g ng Ris Pro k Se duct rvi an c an e D d/o dS r ev ale elo s pm en t Se rv ice Str Co ateg rpo y a era rat nd eF tio ns ina n Op ce Su p Ma plyna Ch a ge me in nt % of Respondents (Function) Figure 4.3.5 Table of Contents Chapter 4 Preview 202 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Figure 4.3.6 shows how rates of AI adoption by points); followed by high tech/telecom, for risk industry and AI function vary from 2021 to 2022 (22 percentage points). The most significant in order to demonstrate how rates of AI adoption decreases were in high tech/telecom, for product have changed over the last year. The greatest year- and/or service development (38 percentage points); over-year increases were in consumer goods/retail, and healthcare systems, also for product and/or for strategy and corporate finance (25 percentage service development (25 percentage points). Percentage Point Change in Responses of AI Adoption by Industry and Function 2021 Vs. 2022 Industry Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report All Industries 2% -4% -15% -13% 6% -6% 12% -4% Business, Legal, and Professional Services -3% 2% -19% -7% 3% -6% 11% -1% Consumer Goods/ Retail 12% -14% -19% -13% 14% 16% 25% -7% Financial Services -9% 4% -17% 11% -15% -16% 10% -6% Healthcare Systems/ Pharma and Med. Products 6% -4% -12% -25% 9% -5% -4% -1% High Tech/Telecom -6% -5% -24% -38% 22% -13% 15% -8% Hu ma nR eso urc es Ma nu fa ctu rin Ma rke ti g ng Ris Pro k Se duct rvi a nd ce an De /or dS ve ale lop s me nt Se rv ice Str Co ateg rpo y a era rat nd eF tio ns ina n Op ce Su p Ma plyna Ch a ge me in nt Percentage Point Change in Responses (Function) Figure 4.3.6 Table of Contents Chapter 4 Preview 203 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Organizations report AI adoption leading to both (Figure 4.3.7). On the revenue side, the functions that cost decreases and revenue increases. On the cost most respondents saw increases in as a result of AI side, the functions that most respondents saw adoption were marketing and sales (70%), product decreases in as a result of AI adoption were supply and/or service development (70%), and strategy and chain management (52%), service operations (45%), corporate finance (65%). strategy and corporate finance (43%), and risk (43%) Cost Decrease and Revenue Increase From AI Adoption by Function, 2021 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report Decrease by <10% Decrease by 10–19% Decrease by ≥20% 45% Service Operations 28% Function Marketing and Sales 43% Risk 52% Strategy and Corporate Finance Average Across All Activities 10% 7% 10% 25% 30% 30% 43% 9% 8% 10% 7% 20% 31% 32% 14% 21% 41% Product and/or Service Development 10% 6% 32% 29% Human Resources Supply Chain Management 29% 42% Manufacturing Increase by >10% 23% 10% 18% 8% 57% 58% 31% 20% 48% 27% 17% 59% 28% 24% 19% 70% 41% 11% 16% 61% 33% 13% 13% 8% Increase by ≤5% 37% 14% 6% 8% Increase by 6–10% 70% 33% 41% 36% 65% 63% % of Respondents Figure 4.3.7 Table of Contents Chapter 4 Preview 204 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Figure 4.3.8 shows AI adoption by organizations across all geographies was 50%, down 6% from 2021. globally, broken out by regions of the world. In 2022, Notably, “Greater China” registered a 20 percentage North America led (59%), followed by Asia-Pacific point decrease from 2021. (55%) and Europe (48%). The average adoption rate AI Adoption by Organizations in the World, 2021 Vs. 2022 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report 50% All Geographies 56% 55% Asia-Paci c 64% 48% Europe 51% 59% North America 55% Greater China (incl. Hong Kong, Taiwan) 41% 61% Developing Markets (incl. India, Latin America, MENA) 0% 2022 44% 2021 52% 10% 20% 30% 40% % of Respondents 50% 60% Figure 4.3.8 Table of Contents Chapter 4 Preview 205 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Consideration and Mitigation of Risks From Adopting AI risks were regulatory compliance (45%), personal/ As has been the case in the last few iterations of the The least salient risks identified by organizations were McKinsey report, in 2022 respondents identified national security (13%) and political stability (9%). individual privacy (40%), and explainability (37%). cybersecurity as the most relevant risk when adopting AI technology (59%) (Figure 4.3.9). The next most cited Risks From Adopting AI That Organizations Consider Relevant, 2019–22 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report 60% 59%, Cybersecurity 50% % of Respondents 45%, Regulatory Compliance 40% 40%, Personal/Individual Privacy 37%, Explainability 30% 32%, Organizational Reputation 30%, Equity and Fairness 28%, Workforce/Labor Displacement 20% 20%, Physical Safety 13%, National Security 10% 0% 9%, Political Stability 2019 2020 2021 2022 Figure 4.3.9 Table of Contents Chapter 4 Preview 206 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Figure 4.3.10 highlights the AI risks that organizations have taken steps to mitigate. For instance, there is are taking steps to mitigate. The top three responses a gap of 8 percentage points for cybersecurity, 9 were cybersecurity (51%), followed by regulatory percentage points for regulatory compliance, and 12 compliance (36%) and personal/individual privacy percentage points for personal/individual privacy. (28%). As was the case in previous years, there are These differences suggest there is a gap between meaningful gaps between the risks organizations the awareness organizations have of various risks and cite as relevant and those which organizations their steps taken to mitigate such risks. Risks From Adopting AI That Organizations Take Steps to Mitigate, 2019–22 Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report 51%, Cybersecurity 50% 40% % of Respondents 36%, Regulatory Compliance 30% 28%, Personal/Individual Privacy 22%, Organizational Reputation 22%, Explainability 20% 18%, Workforce/Labor Displacement 17%, Equity and Fairness 15%, Physical Safety 10% 7%, National Security 4%, Political Stability 0% 2019 2020 2021 2022 Figure 4.3.10 Table of Contents Chapter 4 Preview 207 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Narrative Highlight: The Effects of GitHub’s Copilot on Developer Productivity and Happiness In 2021, launched a technical preview of Copilot, a generative AI tool that enables developers and coders to present a coding problem in natural language and then have Copilot generate a solution in code. Copilot can also translate between various programming languages. In 2022, GitHub surveyed over 2,000 developers who were using the tool to determine its effect on their productivity, well-being, and workflow.5 Figure 4.3.11 summarizes the results of the survey. Developers overwhelmingly reported feeling more productive, satisfied, and efficient when working with Copilot. More specifically, 88% of surveyed respondents commented feeling more productive, 74% reported being able to focus on more satisfying work, and 88% claimed to have completed tasks more quickly. One software engineer stated, “[With Copilot] I have to think It took the developers using Copilot only 71 minutes to complete their task—56% less time than the developers who did not use Copilot (161 minutes). reported a completion rate of 78%, 8 percentage points higher than those who did not use Copilot. Likewise, it only took the developers using Copilot 71 minutes to complete their task, which was 56% less time than the developers who did not use Copilot (161 minutes). These survey and experiment results are evidence of the tangible ways in which AI tools improve worker productivity. less, and when I have to think, it’s the fun stuff. It sets off a little spark that makes coding more fun and more efficient.”6 As part of the same survey, GitHub recruited 95 developers and randomly split them into two groups, one of which used Copilot as part of a coding task and the other which did not. The results of this experiment are summarized in Figure 4.3.12. The developers who used Copilot 5 Most of the developers surveyed, around 60%, were professional developers; 30% were students and 7% were hobbyists. 6 The quote is taken from this source. Table of Contents Chapter 4 Preview 208 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Narrative Highlight: The Effects of GitHub’s Copilot on Developer Productivity and Happiness (cont’d) Measuring Dimensions of Developer Productivity When Using Copilot: Survey Responses, 2022 Source: GitHub Survey, 2022 | Chart: 2023 AI Index Report I Am More Productive 88% Focus on More Satisfying Work 74% More Ful�lled With My Job 60% Perceived Productivity Satisfaction and Well-Being Less Frustrated When Coding 59% E�ciency and Flow Faster With Repetitive Tasks 96% Faster Completion 88% Less Mental E�orts on Repetitive Tasks 87% Less Time Searching 77% More In The Flow 73% 0% 20% 40% 60% % of Participants That Agreed or Strongly Agreed 80% 100% Figure 4.3.11 Summary of the Experiment Process and Results Source: GitHub Survey, 2022 | Table: 2023 AI Index Report Used GitHub Copilot Did Not Use GitHub Copilot Number of Developers 45 50 Completion Rate (%) 78 70 Average Time Taken to Complete the Task (Minutes) 71 161 Figure 4.3.12 Table of Contents Chapter 4 Preview 209 Artificial Intelligence Index Report 2023 Industry Motivation This section explores the motivations industry Chapter 4: The Economy 4.3 Corporate Activity Importance of AI Solutions for Organizations’ Overall Success Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report leaders have in deploying AI and examines the 1%, Not Important 5%, Somewhat Important degree to which they feel AI is important, the reasons they are eager to embrace AI, and the factors that have hindered further scaling of AI solutions. The data from this section comes from Deloitte’s “State of AI in Enterprise” report, which has surveyed companies about their use of AI since 2017. This year’s survey polled 2,620 94%, Important business leaders from a wide range of countries, industries, and corporate levels. Figure 4.3.13 Perceived Importance of AI Figures 4.3.13 and 4.3.14 suggest that an overwhelming majority of business leaders perceive AI to be important for their businesses. More specifically, when asked how important AI solutions were for their organization’s overall success, 94% responded “important,” 5% said “somewhat important,” and 1% answered “not important” (Figure 4.3.13). Believe AI Enhances Performance and Job Satisfaction, 2022 Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report 2%, Strongly Disagree / Disagree 1%, Unsure 16%, Neither Agree nor Disagree Similarly, when asked whether they believe that AI enhances performance and job satisfaction, 82% responded “strongly agree/agree,” 16% said they “neither agree nor disagree,” and only 2% selected “strongly disagree/disagree” (Figure 4.3.14). 82%, Strongly Agree / Agree Table of Contents Chapter 4 Preview Figure 4.3.14 210 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity AI Investments and Implementation Outcomes In 2022, 76% of surveyed leaders reported expecting to increase AI investments in the next fiscal year (Figure 4.3.15). Although this represents a 9 percentage point decrease since 2021 and a 12 percentage point decrease since 2018, a significantly large portion of business leaders continue to express interest in AI investment. Expected AI Investment Increase in the Next Fiscal Year Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report 80% % of Respondents 76% 60% 40% 20% 0% 2018 2019 2020 2021 2022 Figure 4.3.15 Table of Contents Chapter 4 Preview 211 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Figure 4.3.16 highlights the main outcomes that business leaders achieved by embracing AI solutions.7 The top outcome was lowered costs (37%), followed by improved collaboration across business functions/organizations (34%) and having discovered valuable insights (34%). Main Outcomes of AI Implementation, 2022 Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report 37% Lower Costs Improve Collaboration Across Business Functions/Organizations 34% Discover Valuable Insights 34% Customize or Improve Product/Programs, Services, or Offers 33% Enter New Markets/Expand Services to New Constituents 33% Make Organizational Processes More E�cient 33% Improve Decision-Making 32% Create New Products/ Programs and Services 32% Predict Demand 32% Enable New Business/ Service Models 32% Increase Revenue 31% Activate the Potential of Existing Headcount and/or Improve Talent Management 30% Improve Constituent Engagement 30% Anticipate Constituent Needs 0% 28% 10% 20% % of Respondents 30% Figure 4.3.16 7 Figure 4.3.16 is drawn from the chart in the Deloitte survey: “Outcomes—‘Achieved to a high degree.’” Table of Contents Chapter 4 Preview 212 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Challenges in Starting and Scaling AI Projects were proving business value (37%), lack of executive The top three challenges that business leaders commitment (34%), and choosing the right AI identified in terms of starting AI-related projects technologies (33%) (Figure 4.3.17). Top Three Challenges in Starting AI Projects, 2022 Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report Proving Business Value 37% Lack of Executive Commitment 34% Choosing the Right AI Technologies 0% 33% 5% 10% 15% 20% % of Respondents 25% 30% 35% Figure 4.3.17 Table of Contents Chapter 4 Preview 213 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity The main barrier leaders faced in scaling existing AI initiatives was managing AI-related risks (50%), obtaining more data or inputs to train a model (44%), and implementing AI technologies (42%) (Figure 4.3.18). Main Barriers in Scaling AI Initiatives, 2022 Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report Managing AI-Related Risks 50% Obtaining Needed Data or Input to Train Model 44% Implementing AI Technologies 42% Proving Business Value 0% 40% 10% 20% 30% % of Respondents 40% 50% Figure 4.3.18 Table of Contents Chapter 4 Preview 214 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Earnings Calls Aggregate Trends The following subsection presents data from from Fortune 500 companies that mentioned AI-related NetBase Quid, which uses natural language keywords (Figure 4.3.19). The number of such mentions processing tools to analyze trends in corporate dropped from the previous year, when there were 306, earnings calls. NetBase Quid analyzed all 2022 but has increased since 2018 when there were 225. In the 2022 fiscal year, there were 268 earnings calls earnings calls from Fortune 500 companies, identifying all mentions of “Artificial Intelligence,” “AI,” “Machine Learning,” “ML,” and “deep learning.” Number of Fortune 500 Earnings Calls Mentioning AI, 2018–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 300 Number of Earnings Calls 268 200 100 0 2018 2019 2020 2021 2022 Figure 4.3.19 Table of Contents Chapter 4 Preview 215 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Specific Themes advertising and marketing (8.8%) (Figure 4.3.20). Mentions of AI in Fortune 500 earnings calls were Compared to 2018, some of the less prevalent associated with a wide range of themes. In 2022, the AI-related themes in 2022 included deep learning most cited themes were business integration (10.0%); (4.8%), autonomous vehicles (3.1%), and data pricing and inventory management (8.8%); and storage and management (3.0%). Themes for AI Mentions in Fortune 500 Earnings Calls, 2018 Vs. 2022 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 9.96% (-15%) Business Integration 11.74% 8.82% (+48%) Pricing and Inventory Management 5.94% 8.82% (+204%) Advertising and Marketing 2.90% 8.39% (+23%) Process Automation 6.81% 7.40% (-7%) 7.97% Support Decision-Making 7.11% (+69%) Healthcare and Medical Practices 4.20% 6.26% (+73%) Cloud Platforms 3.62% 5.26% (-21%) Personalizing Customer Experience 6.67% 4.84% (-41%) Deep Learning 8.26% 4.13% (+24%) Edge Intelligence 3.33% 3.84% (+121%) Nvidia AI Use Cases 1.74% 3.27% (+33%) Revenue Growth 2.46% 3.13% (-47%) Autonomous Vehicles 5.94% 2.99% (+37%) Data Processing 2.17% 2.99% (-55%) Data Storage and Management 6.67% 2.70% (+10%) 2.46% Adobe Experience Customer Support 2.42% (+734%) 0.29% 2.13% (-59%) Azure Cognitive Services 5.22% 1.85% (-20%) 2.32% Data Center GPU Investments Nvidia RTX Digital Transformation 0% 1.28% (+47%) 0.87% 2022 2018 1.00% (-62%) 2.61% 0.71% (-87%) 5.36% 1% 2% 3% 4% 5% 6% 7% 8% 9% Theme Mentioned (% of Total) 10% 11% 12% 13% 14% Figure 4.3.20 Table of Contents Chapter 4 Preview 216 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Narrative Highlight: What Are Business Leaders Actually Saying About AI? To better understand business attitudes that surround AI, it is worth looking at AI-related excerpts from the Fortune 500 earnings calls. For example, on the topic of business integration, companies often cite AI and machine learning (ML) use cases to reassure business audiences of safer business practices, growing opportunities, streamlining processes, and capability expansion. “We spent $100 million building certain risk and fraud systems so that when we process payments on the consumer side, losses are down $100 million to $200 million. Volume is way up. That’s a huge benefit.” – Jamie Dimon, CEO, JP Morgan Chase & Co. (Q2 2022) “Especially in the last year or so, the field of robotics itself has actually changed because with AI and ML coming to the picture, there’s significant developments in the robotics field. So we think it’s a huge opportunity for us.” – Raj Subramaniam, CEO, FedEx (Q3 2022) “We spent a ton of money on Cloud. We spend a ton of money on adding capabilities. And over time, as you do it on one platform, it all becomes more efficient. So, I think it’s a lot of little things, but it adds up with our base of people and fixed cost, it adds up significantly over time. We’ve been able to maintain our headcount at a level we feel good about, and we think we can grow massively on top of that without having to add lots of bodies to be able to do it.” – Peter Kern, CEO, Expedia Group (Q4 2022) In terms of process automation, business leaders emphasize the ability of AI tools to accelerate productivity gains and to deliver a better customer experience. “We continue to drive the use of automation and artificial intelligence to drive productivity gains to help offset inflationary pressures.” – Jim Davis, CEO, Quest Diagnostics (Q4 2022) Table of Contents “We have improved the experience for customers by applying artificial intelligence to match them with an expert who is right for their specific situation and to deliver insights to experts so they can provide excellent service.” – Sasan Goodarzi, CEO, Intuit (Q2 2022) Chapter 4 Preview “In September, we opened a nextgen fulfillment center in Illinois. This 1.1 million square foot facility features robotics, machine learning, and automated storage, resulting in increased productivity and a better service for our customers at faster delivery times.” – John David, CFO, Walmart (Q3 2022) 217 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Narrative Highlight: What Are Business Leaders Actually Saying About AI? (cont’d) The conversation surrounding pricing and inventory management saw companies reassuring business audiences on how their use of AI would improve their operational strength, especially in environments of high inflation and supply chain challenges. “We are … continuing to refine and invest in machine learning tools that will allow for more sophisticated competitive pricing and greater automation at scale.” – Adrian Mitchell, CFO, Macy’s (Q3 2022) “Our teams are utilizing technology, innovative data analytics and AI to forecast supply chain lead times and changes in market demand to ensure optimal levels. These actions along with our pricing initiatives positively impacted our gross margin in the second quarter.” – Bert Nappier, CFO, Genuine Parts Company (Q3 2022) There is also a vibrant discussion about the ways in which AI can change healthcare and medical practices, more specifically to reduce costs, improve the patient experience, and better serve clinicians. “[Using] machine learning and robotics, we can now resolve a wide range of prescription drug claims which previously required the attention of our pharmacists, freeing them up to spend time with patients. This advanced approach reduces overall cost and improves the patient experience.” – Karen Lynch, CEO, CVS Health (Q2 2022) Table of Contents “I’d like to highlight productivity efforts in our preauthorization process where we’re leveraging an in-house artificial intelligence solution to automatically match incoming faxes to the correct authorization requests. This solution creates administrative efficiencies across millions of inbound images. We are also scaling this solution to multiple business units such as pharmacy and are also expanding the application of this type of AI to provide decision support to clinicians, which will result in improvements to authorization turnaround times, reduction in friction for providers and creating a better member experience.” – Bruce Broussard, CEO, Humana (Q3 2022) “We continue to see opportunities across [the software and analytics] segment as payers, providers, and partners take advantage of our high ROI solutions and realize the benefits of our data, AI models, and workflow capabilities.” – Neil de Crescenzo, CEO, UnitedHealth Group (Q2 2022) Chapter 4 Preview 218 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.3 Corporate Activity Sentiment Analysis sentiment associated with mentions of AI has been NetBase Quid also runs the AI-related text of Fortune overwhelmingly positive (Figure 4.3.21). Mentions 500 earnings calls through a sentiment analysis of AI were rarely negative, suggesting that large machine-learning algorithm that identifies whether businesses tend to have positive associations when it the sentiment associated with the mention of AI is comes to AI tools. positive, mixed, or negative . Overall, since 2018, the 8 Sentiment Summary Distribution for AI Mentions in Fortune 500 Earnings Calls by Publication Date, 2018–22 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 1%% 2%% 3%% 17%% 19%% 18%% 80%% 79%% 81%% 84%% Sentiment Summary 1%% 16%% 79%% 86%% 1%% 84%% 12%% 85%% 1%% 20%% 2%% 16%% 1%% 14%% 77%% 0%% 87%% 1%% 13%% 1%% 84%% 80%% 77%% 81%% 1%% 22%% 3%% 17%% 18%% 81%% 84%% 17%% 81%% 15%% 18%% 81%% 3%% 21%% 2%% 76%% 1%% 2%% 23%% 0%% 15%% 1%% 19%% 86%% 80% 1%% 13%% 100% 60% 40% Negative 20% Mixed Positive 0% Q1 Q2 Q3 2018 Q4 Q1 Q2 Q3 2019 Q4 Q1 Q2 Q3 2020 Q4 Q1 Q2 Q3 2021 Q4 Q1 Q2 Q3 2022 Q4 Figure 4.3.21 8 Chapter 2 of the 2023 AI Index highlights trends in the performance of sentiment analysis algorithms. Table of Contents Chapter 4 Preview 219 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Given that robots are frequently deployed with AI-based software technologies, it is possible to gain insights on AI-ready infrastructure being deployed in the real world by tracking the installation of industrial robots. Data in this section comes from the International Federation of Robotics (IFR), an international nonprofit organization that works to promote, strengthen, and protect the robotics industry. Every year the IFR releases the World Robotics Report, which tracks global trends in installations of robots.9 4.4 Robot Installations Aggregate Trends 2021 saw a rebound in the total number of worldwide The following subsection includes data on the robot installations. The 517,000 industrial robots installation and operation of industrial robots, installed in 2021 represented a 31.3% increase from which are defined as an “automatically controlled, 2020 and a 211.5% increase since 2011 (Figure 4.4.1). reprogrammable, multipurpose manipulator, programmable in three or more axes, which can be either fixed in place or mobile for use in industrial automation applications.” Number of Industrial Robots Installed in the World, 2011–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 517 Number of Industrial Robots Installed (in Thousands) 500 400 300 200 100 0 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 4.4.1 9 Due to the timing of the IFR’s survey, the most recent data is from 2021. Table of Contents Chapter 4 Preview 220 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations The worldwide operational stock of industrial robots from 3,035,000 in 2020. In the last decade, the also continues to steadily increase year over year number of industrial robots being installed and the (Figure 4.4.2). The total number of operational number being used have both steadily increased. industrial robots jumped 14.6% to 3,477,000 in 2021, Operational Stock of Industrial Robots in the World, 2011–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 3,477 3,500 Number of Industrial Robots (in Thousands) 3,000 2,500 2,000 1,500 1,000 500 0 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 4.4.2 Table of Contents Chapter 4 Preview 221 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Industrial Robots: Traditional Vs. Collaborative Robots scalable than traditional robots, and are capable of iterative learning. A distinction can be drawn between traditional robots that work for humans and collaborative In 2017, only 2.8% of all newly installed industrial robots that are designed to work with humans. robots were collaborative (Figure 4.4.3). As of 2021, Recently, the robotics community has been excited that number increased to 7.5%. Although traditional about the potential of collaborative robots given industrial robots still lead new installations, the that they can be safer, more flexible, and more number of collaborative robots is slowly increasing. Number of Industrial Robots Installed in the World by Type, 2017–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report Number of Industrial Robots Installed (in Thousands) 500 Traditional 517 Collaborative 39 424 400 400 391 394 300 478 200 389 405 2017 2018 370 368 2019 2020 100 0 2021 Figure 4.4.3 Table of Contents Chapter 4 Preview 222 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations By Geographic Area the amount installed by Japan (47,200) and 7.7 Country-level data on robot installations can illustrate times the amount installed by the United States which countries are prioritizing the integration of (35,000) (Figure 4.4.4). The countries with the next robots into their economy. In 2021, China installed most installations were South Korea (31,100) and the most industrial robots, with 268,200, 5.7 times Germany (23,800). Number of Industrial Robots Installed by Country, 2021 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report China 268.20 Japan 47.20 United States 35.00 South Korea 31.10 Germany 23.80 Italy 14.10 Taiwan 9.60 France 5.90 Mexico 5.40 India 4.90 Canada 4.30 Thailand 3.90 Singapore 3.50 Spain 3.40 Poland 3.30 0 30 60 90 120 150 180 Number of Industrial Robots Installed (in Thousands) 210 240 270 Figure 4.4.4 Table of Contents Chapter 4 Preview 223 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations In 2013, China overtook Japan as the nation installing the most industrial robots (Figure 4.4.5). Since then, the gap between the total number of industrial robots installed by China and the next-nearest nation has only widened. In 2013, Chinese industrial robot installations represented 20.8% of the world’s share, whereas in 2021, they represented 51.8%. Number of New Industrial Robots Installed in Top Five Countries, 2011–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 268, China Number of Industrial Robots Installed (in Thousands) 250 200 150 100 47, Japan 35, United States 31, South Korea 24, Germany 50 0 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 4.4.5 Table of Contents Chapter 4 Preview 224 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations China consolidated its dominance in industrial robotics in 2021, the first year in which the country installed more industrial robots than the rest of the world combined (Figure 4.4.6). Number of Industrial Robots Installed (China Vs. Rest of the World), 2016–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report Number of Industrial Robots Installed (in Thousands) 268, China 250 249, Rest of the World 200 150 100 50 0 2016 2017 2018 2019 2020 2021 Figure 4.4.6 Table of Contents Chapter 4 Preview 225 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Figure 4.4.7 shows the annual growth rate of of industrial robot installations. The countries that industrial robot installations from 2020 to 2021 by reported the highest growth rates were Canada country. Virtually every country surveyed by the (66%), Italy (65%), and Mexico (61%). IFR reported a yearly increase in the total number Annual Growth Rate of Industrial Robots Installed by Country, 2020 Vs. 2021 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report Canada 66% Italy 65% Mexico 61% Poland 56% India 54% China 51% Thailand 36% Taiwan 31% Japan 22% United States 14% France 11% Germany 6% South Korea 2% Spain Singapore 1% -35% −40% −30% −20% −10% 0% 10% 20% 30% Annual Growth Rate of Industrial Robots Installed 40% 50% 60% 70% Figure 4.4.7 Table of Contents Chapter 4 Preview 226 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Narrative Highlight: Country-Level Data on Service Robotics Another important class of robots are service robots, which the ISO defines as a robot “that performs Service Robots in Medicine Source: UL Solutions, 2022 useful tasks for humans or equipment excluding industrial automation applications.”10 Figure 4.4.8 is an example of a robot being used in medicine, Figure 4.4.9 illustrates how a robot can help with professional cleaning, and Figure 4.4.10 shows a robot designed for maintenance and inspection. Figure 4.4.8 Service Robots in Professional Cleaning Source: This Week in FM, 2021 Figure 4.4.9 Service Robots in Maintenance and Inspection Source: Robotnik, 2022 Figure 4.4.10 10 A more detailed definition can be accessed here. Table of Contents Chapter 4 Preview 227 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Narrative Highlight: Country-Level Data on Service Robotics (cont’d) Compared to 2020, 2021 saw a higher number of professional service robots installed in the world for several key application areas, including hospitality, medical robotics, professional cleaning, and transportation and logistics (Figure 4.4.11). The category that registered the greatest year-over-year increase was transportation and logistics: In 2021, 1.5 times the number of such service robots were installed as in 2020. Number of Professional Service Robots Installed in the World by Application Area, 2020 Vs. 2021 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 8 Agriculture 8 20 Hospitality 11 15 Medical Robotics 12 13 Professional Cleaning 2020 50 Transportation and Logistics 0 2021 10 34 5 10 15 20 25 30 35 40 Number of Professional Service Robots Installed (in Thousands) 45 50 Figure 4.4.11 Table of Contents Chapter 4 Preview 228 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Narrative Highlight: Country-Level Data on Service Robotics (cont’d) As of 2022, the United States has the greatest number of professional service robot manufacturers, roughly 2.16 times as many as the next nation, China. Other nations with significant numbers of robot manufacturers include Germany (91), Japan (66), and France (54) (Figure 4.4.12). Number of Professional Service Robot Manufacturers in Top Countries by Type of Company, 2022 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 225 Number of Professional Service Robot Manufacturers 29 200 150 100 104 194 Startups 91 Incumbents Unknown 66 54 50 94 52 47 79 61 China France Russia 44 Germany Japan 39 34 35 1 1 United States 44 3 2 0 49 39 South Korea Switzerland Canada Figure 4.4.12 Table of Contents Chapter 4 Preview 229 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Sectors and Application Types (137,000), followed by automotive (119,000) (Figure On a global level, the sector that saw the greatest increases in the total number of industrial robot amount of robot installations was electrical/electronics installations since 2019. 4.4.13). Each of the highlighted sectors has recorded Number of Industrial Robots Installed in the World by Sector, 2019–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 52 All Others 37 30 119 Automotive 84 102 137 Electrical/Electronics 110 89 15 Food 12 11 64 Metal and Machinery 44 52 24 Plastic and Chemical Products 19 18 2021 2020 107 Unspeci ed 87 87 0 20 40 60 80 100 Number of Industrial Robots Installed (in Thousands) 2019 120 140 Figure 4.4.13 Table of Contents Chapter 4 Preview 230 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations Robots can also be deployed in a wide range of 230,000 industrial robots were installed for handling applications, from assembling to dispensing and functions, 2.4 times more than for welding (96,000) handling. Figure 4.4.14 illustrates how the application and 3.7 times more than for assembling (62,000). of industrial robots has changed since 2021. Handling Every application category, with the exception continues to be the application case toward which of dispensing and processing, saw more robot the most industrial robots are deployed. In 2021, installations in 2021 than in 2019. Number of Industrial Robots Installed in the World by Application, 2019–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 62 Assembling 50 40 32 32 Cleanroom 26 11 Dispensing 8 12 230 Handling 169 177 7 Processing 5 2021 7 2020 80 Unspeci ed 2019 60 55 96 Welding 70 74 0 20 40 60 80 100 120 140 160 Number of Industrial Robots Installed (in Thousands) 180 200 220 240 Figure 4.4.14 Table of Contents Chapter 4 Preview 231 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations China Vs. United States and metal and machinery (34,000) (Figure 4.4.15). The Chinese industrial sectors that installed the Every industrial sector in China recorded a greater greatest number of industrial robots in 2022 were number of robot installations in 2021 than in 2019. electrical/electronics (88,000), automotive (62,000), Number of Industrial Robots Installed in China by Sector, 2019–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 29 All Others 21 12 62 Automotive 31 32 88 Electrical/Electronics 64 42 4 Food 3 3 34 Metal and Machinery Pharma/Cosmetics 22 22 1 1 1 6 Rubber and Plastics 5 2021 4 2020 43 Unspeci ed 0 2019 30 31 10 20 30 40 50 60 Number of Industrial Robots Installed (in Thousands) 70 80 90 Figure 4.4.15 Table of Contents Chapter 4 Preview 232 Artificial Intelligence Index Report 2023 Chapter 4: The Economy 4.4 Robot Installations The automotive industry installed the greatest number of industrial robots in the United States in 2021, although installation rates for that sector decreased year over year (Figure 4.4.16). However, other sectors like food, along with plastic and chemical products, saw year-over-year increases in robot installations. Number of Industrial Robots Installed in the United States by Sector, 2019–21 Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report 4.50 All Others 2.60 3.50 9.80 Automotive 10.50 13.00 2.90 Electrical/Electronics 3.70 3.50 3.40 Food 2.70 2.20 3.80 Metal and Machinery 2.30 3.80 3.50 Plastic and Chemical Products 2.60 2.50 2021 2020 7.10 Unspeci ed 6.30 2019 5.00 0 2 3 5 6 8 9 Number of Industrial Robots Installed (in Thousands) 11 12 14 Figure 4.4.16 Table of Contents Chapter 4 Preview 233 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 5: Education Table of Contents Chapter 5 Preview 234 Artificial Intelligence Index Report 2023 CHAPTER 5 PREVIEW: Education Overview 236 5.2 K–12 AI Education 257 Chapter Highlights 237 United States 257 5.1 Postsecondary AI Education 238 CS Bachelor’s Graduates 238 CS Master’s Graduates 240 CS PhD Graduates 242 CS, CE, and Information Faculty 246 Narrative Highlight: Who Funds CS Departments in the U.S.? Table of Contents Chapter 5 Preview State-Level Trends 257 AP Computer Science 258 Narrative Highlight: The State of International K–12 Education 260 ACCESS THE PUBLIC DATA 255 235 Artificial Intelligence Index Report 2023 Chapter 5: Education Overview Studying the state of AI education is important for gauging some of the ways in which the AI workforce might evolve over time. AI-related education has typically occurred at the postsecondary level; however, as AI technologies have become increasingly ubiquitous, this education is being embraced at the K–12 level. This chapter examines trends in AI education at the postsecondary and K–12 levels, in both the United States and the rest of the world. We analyze data from the Computing Research Association’s annual Taulbee Survey on the state of computer science and AI postsecondary education in North America, Code.org’s repository of data on K–12 computer science in the United States, and a recent UNESCO report on the international development of K–12 education curricula. Table of Contents Chapter 5 Preview 236 Artificial Intelligence Index Report 2023 Chapter 5: Education Chapter Highlights More and more AI specialization. The proportion of new computer science PhD graduates from U.S. universities who specialized in AI jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010. New AI PhDs increasingly head to industry. In 2011, roughly the same proportion of new AI PhD graduates took jobs in industry (40.9%) as opposed to academia (41.6%). Since then, however, a majority of AI PhDs have headed to industry. In 2021, 65.4% of AI PhDs took jobs in industry, more than double the 28.2% who took jobs in academia. The gap in external research funding for private versus public American CS departments continues to widen. In 2011, the median amount of total expenditure from external sources for computing research was roughly the same for private and public CS departments in the United States. Since then, the gap has widened, with private U.S. CS departments receiving millions more in additional funding than public universities. In 2021, the median expenditure for private universities was $9.7 million, compared to $5.7 million for public universities. Table of Contents Chapter 5 Preview New North American CS, CE, and information faculty hires stayed flat. In the last decade, the total number of new North American computer science (CS), computer engineering (CE), and information faculty hires has decreased: There were 710 total hires in 2021 compared to 733 in 2012. Similarly, the total number of tenuretrack hires peaked in 2019 at 422 and then dropped to 324 in 2021. Interest in K–12 AI and computer science education grows in both the United States and the rest of the world. In 2021, a total of 181,040 AP computer science exams were taken by American students, a 1.0% increase from the previous year. Since 2007, the number of AP computer science exams has increased ninefold. As of 2021, 11 countries, including Belgium, China, and South Korea, have officially endorsed and implemented a K–12 AI curriculum. 237 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education 5.1 Postsecondary AI Education CS Bachelor’s Graduates At the undergraduate level, most AI-related courses interest in AI. In 2021, the total number of new North are offered as part of a computer science (CS) American CS bachelor’s graduates was 33,059— curriculum. Therefore, trends in new CS bachelor’s nearly four times greater than in 2012 (Figure 5.1.1). graduates give us a proxy for undergraduate New CS Bachelor’s Graduates in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 33,059 Number of New CS Bachelor’s Graduates 30,000 25,000 20,000 15,000 10,000 5,000 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.1 Table of Contents Chapter 5 Preview 238 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Figure 5.1.2 looks at the proportion of CS bachelor’s graduates in North America who are international students. The number stood at 16.3% in 2021 and has been steadily increasing since 2012—the proportion of such students has risen 9.5 percentage points since 2012. New International CS Bachelor’s Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 16.30% New International CS Bachelor’s Graduates (% of Total) 16% 12% 8% 4% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.2 Table of Contents Chapter 5 Preview 239 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education CS Master’s Graduates AI courses are also commonly offered in CS master’s as many master’s graduates as in 2012. However, degree programs. Figure 5.1.3 shows the total from 2018 to 2021 the total number of new master’s number of new CS master’s graduates in North graduates plateaued, declining slightly from 15,532 to America since 2010. In 2021 there were roughly twice 15,068. New CS Master’s Graduates in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 16,000 15,068 Number of New CS Master’s Graduates 14,000 12,000 10,000 8,000 6,000 4,000 2,000 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.3 Table of Contents Chapter 5 Preview 240 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Interestingly, the number of CS master’s students at North American universities who are international started declining in 2016 after rising in the early 2010s (Figure 5.1.4). Despite the decline, in 2021 the majority of CS master’s graduates remained international (65.2%). New International CS Master’s Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New International CS Master’s Graduates (% of Total) 80% 65.20% 60% 40% 20% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.4 Table of Contents Chapter 5 Preview 241 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education CS PhD Graduates Unlike the trends in bachelor’s and master’s CS computer science (Figure 5.1.5). There were fewer graduates, since 2010 there have not been large CS PhD graduates in 2021 (1,893) than in 2020 (1,997) increases in the number of new PhD graduates in and 2012 (1,929). New CS PhD Graduates in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Number of New CS PhD Graduates 2,000 1,893 1,500 1,000 500 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.5 Table of Contents Chapter 5 Preview 242 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education CS PhD graduates in North American universities are becoming increasingly international (Figure 5.1.6). In 2010, 45.8% of CS PhD graduates were international students; the proportion rose to 68.6% in 2021. New International CS PhD Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New International CS PhD Graduates (% of Total) 70% 68.60% 60% 50% 40% 30% 20% 10% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.6 Table of Contents Chapter 5 Preview 243 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Moreover, now a significantly larger proportion of new CS PhD students are specializing in AI (Figure 5.1.7). In 2021, 19.1% of new CS PhD students in North American institutions specialized in AI, a 4.2 percentage point increase since 2020 and 8.6 percentage point increase since 2012. New CS PhD Students (% of Total) Specializing in AI, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 19.10% 18% New AI PhD Students (% of Total) 16% 14% 12% 10% 8% 6% 4% 2% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.7 Table of Contents Chapter 5 Preview 244 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Where do new AI PhDs choose to work following (40.9%) as in academia (41.6%). However, as of 2021 graduation? Mirroring trends reported in last year’s a significantly larger proportion of students (65.4%) AI Index report, an increasingly large proportion of went to industry after graduation than to academia AI PhD graduates are heading to industry (Figures (28.2%). The amount of new AI PhDs entering 5.1.8 and 5.1.9). In 2011, for example, roughly the government was 0.7% and has remained relatively same percentage of graduates took jobs in industry unchanged in the last half-decade. Employment of New AI PhDs (% of Total) in North America by Sector, 2010–21 Employment of New AI PhDs in North America by Sector, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Industry 65.44%, Industry 238 61 63 154 132 72 51 63 60 123 43 42 162 134 101 74 64 153 116 40% 30% 28.19%, Academia 20% 10% 77 0.67%, Government 2019 2020 2017 2018 2016 2015 2014 2012 2013 2011 2010 2018 2017 2016 2015 0% 2013 2011 0 2010 76 85 2014 50 195 180 2021 100 136 50% 2021 47 2020 134 2019 150 219 73 178 154 60% 84 65 201 200 2012 Number of New AI PhD Graduates Academia 249 New AI PhD Graduates (% of Total) Government 250 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 281 Figure 5.1.8 Figure 5.1.9 1 The sums in Figure 5.1.9 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status in the CRA survey. These students are not included in the chart. Table of Contents Chapter 5 Preview 245 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education CS, CE, and Information Faculty 5.1.10 highlights the total number of CS, CE (computer To better understand trends in AI and CS education, marginally increased in the last year, by 2.2%. Since it is instructive to consider data on computer science 2011 the number of CS, CE, and information faculty faculty in addition to postsecondary students. Figure has grown by 32.8%. engineering), and information faculty in North American universities. The amount of faculty has Number of CS, CE, and Information Faculty in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Tenure Track Teaching Professors Other Instructors Research 8,000 Number of CS, CE, and Information Faculty 7,362 7,000 6,000 5,000 6,138 6,314 6,478 602 766 515 676 661 487 4,366 4,536 4,549 2011 2012 2013 656 447 669 6,629 689 529 6,806 6,887 691 649 589 432 432 390 Postdoc 7,657 653 465 494 7,858 7,976 668 530 296 426 617 8,149 522 306 736 861 1,180 831 895 1,183 1,150 1,014 1,122 4,711 4,786 5,059 5,214 5,252 5,231 5,310 4,548 2014 2015 2016 2017 2018 2019 2020 2021 863 4,000 3,000 2,000 1,000 0 Figure 5.1.10 Table of Contents Chapter 5 Preview 246 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education In 2021 there were a total of 6,789 CS faculty members in the United States (Figure 5.1.11). The total number of CS faculty in the United States increased by only 2.0% in the last year, but by 39.0% since 2011. Number of CS Faculty in the United States, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Tenure Track Teaching Professors Other Instructors Research Postdoc 7,000 6,098 6,000 5,256 Number of CS Faculty 5,000 4,885 522 4,000 387 521 460 550 521 5,068 592 491 421 5,637 5,729 5,202 535 491 509 396 455 826 364 903 567 6,430 6,533 531 518 426 382 408 436 534 947 671 715 6,654 424 276 6,789 428 287 618 693 946 899 679 3,000 2,000 3,455 3,725 3,564 3,559 2012 2013 2014 3,971 4,176 4,366 4,384 4,390 4,482 3,880 2015 2016 2017 2018 2019 2020 2021 1,000 0 2011 Figure 5.1.11 Table of Contents Chapter 5 Preview 247 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Figure 5.1.12 reports the total number of new CS, hires in 2021, while in 2012 there were 733. Similarly, CE, and information faculty hires in North American the total number of tenure-track hires peaked in 2019 universities. In the last decade, the total number of at 422 and has since dropped to 324 in 2021. new faculty hires has decreased: There were 710 total New CS, CE, and Information Faculty Hires in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Number of New CS, CE, and Information Faculty Hires Total 878 Tenure-Track 860 800 800 749 733 765 749 710 691 600 583 572 543 396 400 348 294 249 406 422 374 358 324 320 258 218 200 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.12 Table of Contents Chapter 5 Preview 248 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education In 2021, the greatest percentage of new CS, CE, and information faculty hires (40%) came straight from receiving a PhD (Figure 5.1.13). Only 11% of new CS and CE faculty came from industry. Source of New Faculty in North American CS, CE, and Information Departments, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New PhD From Postdoc From Other Academic From Industry 100% 13% 13% 11% 11% 34% 34% 16% 15% 39% 40% 2020 2021 Source of New Faculty 80% 34% 41% 60% 15% 40% 20% 17% 38% 29% 0% 2018 2019 Figure 5.1.13 Table of Contents Chapter 5 Preview 249 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education The share of filled new CS, CE, and information faculty positions in North American universities has remained relatively stable in the last decade (Figure 5.1.14). In 2021, 89.3% of new faculty positions were filled, compared to 82.7% in 2011. Share of Filled New CS, CE, and Information Faculty Positions in North America, 2011–21 Share of Filled New CS, CE, and Information Faculty Positions Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 90% 89.28% 80% 70% 60% 50% 40% 30% 20% 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.14 Table of Contents Chapter 5 Preview 250 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Among open CS, CE, and information faculty positions in 2021, the most commonly cited reason for their remaining unfilled was offers being turned down (53%) (Figure 5.1.15). In 22% of cases, hiring was still in progress, while 14% of the time, a candidate had not been identified who met the department’s hiring goals. Reason Why New CS, CE, and Information Faculty Positions Remained Un lled (% of Total), 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Didn’t �nd a person who met our hiring goals Technically vacant, not �lled for admin reasons O�ers turned down Hiring in progress Other Reason Faculty Positions Remained Un�lled (% of Total) 100% 10% 18% 10% 80% 25% 26% 17% 31% 28% 27% 23% 22% 18% 55% 60% 10% 5% 6% 6% 45% 12% 34% 36% 40% 40% 43% 52% 51% 56% 53% 44% 20% 37% 37% 37% 26% 0% 2011 2012 2013 2014 26% 2015 16% 14% 14% 13% 2016 2017 2018 2019 8% 2020 14% 2021 Figure 5.1.15 Table of Contents Chapter 5 Preview 251 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Figure 5.1.16 highlights the median nine-month salaries full professor in computer science made 3.2% more of CS faculty in the United States by position since than they did in 2020, and 12.8% more than they did in 2015. During that period, the salaries for all classes 2015. (Note: These figures have not been adjusted for of professors have increased. In 2021, the average inflation.) Median Nine-Month Salary of CS Faculty in United States, 2015–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Full Professor Median Salary of CS Faculty (in Thousands of U.S. Dollars) 180 160 Associate Professor 176.01 170.57 168.87 164.54 159.96 158.97 156.02 Assistant Professor 140 120 113.95 111.67 103.01 101.16 99.12 100 114.07 109.23 107.55 105.45 127.47 123.71 121.55 119.48 117.5 80 60 40 20 0 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.16 Table of Contents Chapter 5 Preview 252 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education What proportion of new CS, CE, and information faculty tenure-track hires are international? The data suggests that it is not a substantial proportion. In 2021, only 13.2% of new CS, CE, and information faculty hires were international (Figure 5.1.17). New International CS, CE, and Information Tenure-Track Faculty Hires (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New International Tenure-Track Faculty Hires (% of Total) 25% 20% 15% 13.20% 10% 5% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.17 Table of Contents Chapter 5 Preview 253 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education The majority of CS, CE, and Information faculty losses in North American departments (36.3%) were the result of faculty taking academic positions elsewhere (Figure 5.1.18). In 2021, 15.2% of faculty took nonacademic positions, which is roughly the same amount as those who took such positions a decade prior, in 2011 (15.9%). Faculty Losses in North American CS, CE, and Information Departments, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Died Took academic position elsewhere Remained, but changed to part-time Retired Took nonacademic position Other Unknown 327 300 270 250 Faculty Losses 213 200 221 50 0 234 42 36 44 32 24 20 23 43 77 74 86 74 65 2013 2014 312 303 37 46 34 33 20 26 126 34 62 100 237 22 20 22 27 150 232 246 303 139 113 89 110 85 52 67 2011 89 2012 94 90 2015 2016 80 2017 94 103 91 2018 2019 2020 100 2021 Figure 5.1.18 Table of Contents Chapter 5 Preview 254 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Narrative Highlight: Who Funds CS Departments in the U.S.? The CRA tracks data on the external funding sources agencies such as the Army Research Office, of CS departments in the United States. The main the Office of Naval Research, and the Air Force funder of American CS departments continues to Research Laboratory (20.3%); industrial sources be the National Science Foundation (NSF), which (12.1%); the Defense Advanced Research Projects in 2021 accounted for 34.9% of external funds. Agency (DARPA) (8.8%); and the National However, the share of funding provided by NSF has Institutes of Health (NIH) (6.8%). The diminishing decreased since 2003 (Figure 5.1.19). In 2021, the share of NSF funds over time has been partially next largest sources of funding came from defense offset by increasing funds from industry and NIH. External Funding Sources (% of Total) of CS Departments in United States, 2003–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 45% 40% External Funding Sources (% of Total) 35% 34.90%, NSF 30% 25% 20% 20.30%, Other Defense 15% 12.10%, Industrial Sources 8.80%, DARPA 6.80%, NIH 4.90%, Private Foundation 4.60%, Other 3.60%, Other Federal 2.30%, DOE 1.50%, State Agencies 0.40%, Unallocated 0.00%, IMLS 10% 5% 0% 2003 2006 2009 2012 2015 2018 2021 Figure 5.1.19 Table of Contents Chapter 5 Preview 255 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.1 Postsecondary AI Education Narrative Highlight: Who Funds CS Departments in the U.S.? (cont’d) Figure 5.1.20 shows the median total expenditures Although total median expenditures have from external sources for computing research in increased over the last decade for both private and American CS departments. In 2021, the median total public CS departments, the gap in expenditure expenditure for private universities was $9.7 million has widened, with private universities beginning to compared with $5.7 million for public universities. significantly outspend public ones. Median Total Expenditure From External Sources for Computing Research of U.S. CS Departments, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Median Total Expenditure (in Millions of U.S. Dollars) 10 9.71, Private 8 6 5.69, Public 4 2 0 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.1.20 Table of Contents Chapter 5 Preview 256 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.2 K–12 AI Education The following subsection shows trends in K–12 AI education based on K–12 computer science education data in the United States as well as survey data from UNESCO on the state of global K–12 AI education. 5.2 K–12 AI Education United States States Requiring That All High Schools Offer a Computer Science Course, 2022 Data on the state of K–12 CS education in the United States Source: Code.org, 2022 | Chart: 2023 AI Index Report ME AK comes from Code.org, an education innovation nonprofit dedicated to ensuring that every school includes computer science as part of its core K–12 education. Tracking trends CA NH MA NY CT RI WA MT ND SD MN WI MI OR ID WY NE IA IL IN OH PA NJ NV UT CO KS MO KY WV DC MD DE AZ NM OK AR TN VA NC TX LA MS AL GA SC in K–12 CS education can partially serve as a proxy for understanding the state of K–12 AI education in America HI State-Level Trends VT Yes No FL Figure 5.2.1 highlights the 27 Figure 5.2.1 states that in 2022 required that all high schools offer a computer science course. Figure 5.2.2 highlights the percentage of public high Public High Schools Teaching Computer Science (% of Total in State), 2022 Source: Code.org, 2022 | Chart: 2023 AI Index Report ME 60% AK 51% schools in a state that teach computer science. The top three states in terms of rate of computer science teaching are Maryland (98%), South Carolina (93%), and Arkansas (92%). CA 40% HI 77% VT 76% NH 82% MA 78% NY 48% CT 77% RI 86% WA 47% MT 36% ND 44% SD 39% MN 21% WI 66% MI 46% OR 63% ID 38% WY 55% NE 52% IA 71% IL 44% IN 85% OH 48% PA 77% NJ 67% NV 83% UT 73% CO 57% KS 40% MO 49% KY 63% WV 78% DC 45% MD 98% DE 40% AZ 36% NM 41% OK 62% AR 92% TN 60% VA 75% NC 61% TX 47% LA 32% MS 60% AL 85% GA 66% SC 93% FL 40% Figure 5.2.2 Table of Contents Chapter 5 Preview 257 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.2 K–12 AI Education AP Computer Science a total of 181,040 AP computer science exams taken, Another barometer for tracking the state of K–12 CS roughly the same number as the previous year, after education in the United States is analyzing trends in the several years of significant increases. This leveling total number of AP computer science exams taken. could be the result of the pandemic. Since 2007, the 2 number of AP computer science exams has increased Year over year the total number of AP computer over ninefold. science exams continued to increase. In 2021, the most recent year for which there is data, there were Number of AP Computer Science Exams Taken, 2007–21 Source: Code.org, 2022 | Chart: 2023 AI Index Report 181.04 Number of AP Computer Science Exams Taken (in Thousands) 180 160 140 120 100 80 60 40 20 0 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 5.2.3 2 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles was initially offered in 2017. Table of Contents Chapter 5 Preview 258 Artificial Intelligence Index Report 2023 In 2021, the states which saw the greatest number of AP computer science exams taken were California (31,189), Chapter 5: Education 5.2 K–12 AI Education Number of AP Computer Science Exams Taken, 2021 Source: Code.org, 2022 | Chart: 2023 AI Index Report ME 242 AK 100 VT 150 followed by Texas (17,307), Florida (14,864), New York WA 4,034 MT 42 ND 109 SD 26 OR 714 ID 429 WY 112 NE 514 IA 521 OH PA NJ IL IN 8,572 2,883 3,754 6,104 9,391 NV CA 31,189 1,701 UT 612 CO 2,584 KS 236 MO 1,199 KY 1,462 AZ 1,587 NM 270 OK 500 VA TN NC AR 1,406 2,046 6,034 6,273 (13,304), and New Jersey (9,391) (Figure 5.2.4). Figure 5.2.5 looks at the number of AP CS exams taken per capita.3 The state with the largest per capita TX LA 17,307 1,191 amount of AP computer science exams taken in 2021 was Maryland, with WI MN MI 1,432 2,080 4,504 NH 403 MS 400 WV 352 GA AL 2,399 7,221 HI 782 NY CT 13,304 3,251 DC 352 MD 7,662 MA 5,451 RI 617 DE 513 SC 2,159 FL 14,864 Figure 5.2.4 124.1 exams per 100,000 inhabitants. The next states were New Jersey (101.3), Connecticut (89.7), California (79.7), and Massachusetts (78.0). Number of AP Computer Science Exams Taken per 100,000 Inhabitants, 2021 Source: Code.org, 2022 | Chart: 2023 AI Index Report ME 17.57 AK 13.62 VT NH MA 23.18 29.04 77.99 WA 52.11 MT 3.80 ND 14.01 SD 2.90 WI MN MI 25.07 35.37 44.87 WY OR NE ID IA IL IN 16.78 22.53 19.33 26.18 16.29 67.57 42.31 UT NV CA CO 79.68 54.06 18.33 44.47 AZ 21.84 NM 12.76 KS 8.03 MO KY 19.43 32.44 OH 31.91 PA NJ 46.91 101.33 MD DC DE 52.63 124.09 51.05 VA TN OK NC AR 12.53 46.43 29.36 69.70 59.37 TX MS LA 58.55 25.74 13.56 HI 54.04 WV 19.71 RI NY CT 67.00 89.72 56.25 SC GA AL 47.51 66.94 41.57 FL 68.10 Figure 5.2.5 3 More specifically, Figure 5.2.5 normalizes the number of AP CS exams taken—the total number of exams taken in a particular state in 2021 is divided by the state’s population based on the 2021 U.S. Census. Table of Contents Chapter 5 Preview 259 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.2 K–12 AI Education Narrative Highlight: The State of International K–12 Education In 2021, UNESCO released one of the most Figure 5.2.6, taken from the UNESCO report, comprehensive reports to date on the international highlights the governments that have taken steps state of government-endorsed AI curricula. To to implement AI curricula and across which levels gather information, UNESCO released two surveys: of education. For example, Germany is in the the first to representatives of 193 UNESCO member process of developing government-endorsed AI states and the second to over 10,000 private- curricular standards on the primary, middle, and and third-sector actors. As part of these surveys, high-school levels, and the Chinese government respondents were asked to report on the status of AI has already endorsed and implemented curricula for students in K–12 general education. standards across those same three levels. Government Implementation Implementationof ofAI AICurricula Curriculaby byCountry, Country,Status, Status,and andEducation EducationLevel Level UNESCO, 2022 2022 || Table: Table:2023 2023AI AIIndex IndexReport Report Source: UNESCO, Country Status Status Armenia Endorsedand Endorsed andImplemented Implemented PrimarySchool School Primary Middle School Middle School Austria Endorsed Endorsedand andImplemented Implemented Belgium Endorsed Endorsedand andImplemented Implemented China Endorsed Endorsedand andImplemented Implemented India Endorsed Endorsedand andImplemented Implemented Kuwait Endorsed Endorsedand andImplemented Implemented ✓✓ ✓✓ Portugal Portugal Endorsed Endorsedand andImplemented Implemented ✓✓ ✓✓ ✓✓ Qatar Qatar Endorsed Endorsedand andImplemented Implemented ✓✓ ✓✓ ✓✓ Serbia Serbia Endorsed Endorsedand andImplemented Implemented ✓✓ ✓✓ South South Korea Korea Endorsed Endorsedand andImplemented Implemented United United Arab Arab Emirates Emirates Bulgaria Bulgaria Endorsed Endorsedand andImplemented Implemented In Development In Development ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ Germany Germany Jordan Jordan In InDevelopment Development In InDevelopment Development ✓✓ ✓✓ ✓✓ ✓✓ Saudia Saudia Arabia Arabia Serbia Serbia In InDevelopment Development In InDevelopment Development ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ High School High School ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ ✓✓ Figure 5.2.64 4 According to the UNESCO report, Serbia has already endorsed and implemented certain kinds of K–12 AI curricula, but is also simultaneously in the process of developing others—thus it is listed under both categories. Table of Contents Chapter 5 Preview 260 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.2 K–12 AI Education Narrative Highlight: The State of International K–12 Education (cont’d) Figure 5.2.7 identifies the topic areas most emphasized in the K–12 AI curricula profiled in the UNESCO report. The four topics toward which the most time was allocated were algorithms and programming (18%), AI technologies (14%), data literacy (12%), and application of AI to other domains (12%). Time Allocated (% of Total) in K–12 AI Curricula by Topic, 2022 Source: UNESCO, 2022 | Chart: 2023 AI Index Report Algorithms and Programming 18% Data Literacy 12% Contextual Problem-Solving 11% Application of AI to Other Domains 12% Ethics of AI 7% Social Implications of AI 5% AI Technologies 14% Developing AI Technologies 9% AI Foundations AI Techniques Ethics and Social Impact 2% Understanding, Using, and Developing AI Unspeci ed 0% Unspeci ed 10% 2% 4% 6% 8% 10% % of Time Allocated 12% 14% 16% 18% Figure 5.2.7 Table of Contents Chapter 5 Preview 261 Artificial Intelligence Index Report 2023 Chapter 5: Education 5.2 K–12 AI Education Narrative Highlight: The State of International K–12 Education (cont’d) What might an actual K–12 AI curriculum look like in practice? The UNESCO report includes detailed information about a sample curriculum that was deployed in Austria, the Austrian Data Science and Artificial Intelligence curriculum. As noted in the report: “The Austrian Data Science and Artificial Intelligence curriculum includes digital basics such as using an operating system to store and print files, design presentations, and use spreadsheets and word-processing software. It also covers design and reflection on types and social issues in digital media, and safe digital media use. Students in high school engage programming languages, algorithms and simulations. They learn the basic principles of data literacy, including collecting data, structuring a spreadsheet, and carrying out analyses and visualizations. They apply criteria to evaluate the credibility and reliability of data “They also gain an understanding of the ethical dilemmas that are associated with the use of such technologies, and become active participants in social discourse on these issues.” sources as well as digital content. Students are expected to know about careers in ICT, including AI, and the social applications of emerging technologies. They create digital media and learn about the cloud and how to connect and network computers. They also gain an understanding of the ethical dilemmas that are associated with the use of such technologies, and become active participants in social discourse on these issues. Finally, students are tasked with using technology to make public statements and understand how this reflects the democratic process.” Table of Contents Chapter 5 Preview 262 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 6: Policy and Governance Table of Contents Chapter 6 Preview 263 Artificial Intelligence Index Report 2023 CHAPTER 6 PREVIEW: Policy and Governance Overview 265 6.2 National AI Strategies 285 Chapter Highlights 266 Aggregate Trends 285 6.1 AI and Policymaking 267 By Geographic Area 285 Global Legislation Records on AI 267 6.3 U.S. Public Investment in AI 286 269 Federal Budget for Nondefense AI R&D 286 U.S. Department of Defense Budget Requests 287 U.S. Government AI-Related Contract Spending 288 By Geographic Area Narrative Highlight: A Closer Look at Global AI Legislation 270 United States Federal AI Legislation 271 United States State-Level AI Legislation 272 Narrative Highlight: A Closer Look at State-Level AI Legislation Total Contract Spending 288 275 276 6.4 U.S. AI-Related Legal Cases 291 By Geographic Area 277 Total Cases 291 Narrative Highlight: A Closer Look at Global AI Mentions Geographic Distribution 292 279 Sector 293 United States Committee Mentions 280 Type of Law 294 United States AI Policy Papers 283 Global AI Mentions By Topic 284 Narrative Highlight: Three Significant AI-Related Legal Cases 295 ACCESS THE PUBLIC DATA Table of Contents Chapter 6 Preview 264 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance Overview The growing popularity of AI has prompted intergovernmental, national, and regional organizations to craft strategies around AI governance. These actors are motivated by the realization that the societal and ethical concerns surrounding AI must be addressed to maximize its benefits. The governance of AI technologies has become essential for governments across the world. This chapter examines AI governance on a global scale. It begins by highlighting the countries leading the way in setting AI policies. Next, it considers how AI has been discussed in legislative records internationally and in the United States. The chapter concludes with an examination of trends in various national AI strategies, followed by a close review of U.S. public sector investment in AI. Table of Contents Chapter 6 Preview 265 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance Chapter Highlights Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127 countries shows that the number of bills containing “artificial intelligence” that were passed into law grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times since 2016. From talk to enactment— the U.S. passed more AI bills than ever before. In 2021, only 2% of all federal AI bills in the United States were passed into law. This number jumped to 10% in 2022. Similarly, last year 35% of all state-level AI bills were passed into law. The U.S. government continues to increase spending on AI. Since 2017, the amount of U.S. government AI-related contract spending has increased roughly 2.5 times. Table of Contents Chapter 6 Preview When it comes to AI, policymakers have a lot of thoughts. A qualitative analysis of the parliamentary proceedings of a diverse group of nations reveals that policymakers think about AI from a wide range of perspectives. For example, in 2022, legislators in the United Kingdom discussed the risks of AI-led automation; those in Japan considered the necessity of safeguarding human rights in the face of AI; and those in Zambia looked at the possibility of using AI for weather forecasting. The legal world is waking up to AI. In 2022, there were 110 AI-related legal cases in United States state and federal courts, roughly seven times more than in 2016. The majority of these cases originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual property, and contract law. 266 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking In the last 10 years, AI governance discussions have accelerated, resulting in numerous policy proposals in various legislative bodies. This section begins by exploring the legislative initiatives related to AI that have been suggested or enacted in different countries and regions, followed by an in-depth examination of state-level AI legislation in the United States. The section then scrutinizes records of AI-related discussions in parliaments and congresses worldwide and concludes with the number of AI policy papers published in the United States. 6.1 AI and Policymaking1 Global Legislative Records on AI passed at least one AI-related bill, and together they The AI Index conducted an analysis of laws passed Figure 6.1.2 shows that from 2016 to 2022, there has by legislative bodies in 127 countries that contain the been a sharp increase in the total number of AI-related words “artificial intelligence” from 2016 to 2022.2 bills passed into law, with only one passed in 2016, Of the 127 countries analyzed, since 2016, 31 have climbing to 37 bills passed in 2022. have passed a total of 123 AI-related bills (Figure 6.1.1). Number of AI-Related Bills Passed Into Law by Country, 2016–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 0 1–5 6–10 11–15 16–25 No Available Data Figure 6.1.1 1 Note that the analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science Act passed by the U.S. in 2022. 2 The full list of countries analyzed is in the Appendix. The AI Index team attempted to research the legislative bodies of every country in the world; however, publicly accessible legislative databases were not made available for certain countries. Table of Contents Chapter 6 Preview 267 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Number of AI-Related Bills Passed Into Law in 127 Select Countries, 2016–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 37 35 Number of AI-Related Bills 30 25 20 15 10 5 0 2016 2017 2018 2019 2020 2021 2022 Figure 6.1.2 Table of Contents Chapter 6 Preview 268 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking By Geographic Area the Philippines, which passed 5 and 4 laws, respectively. Figure 6.1.3 shows the number of laws containing Figure 6.1.4 shows the total number of laws passed mentions of AI that were enacted in 2022. The United since 2016. The United States leads the list with 22 bills, States led the list with 9 laws, followed by Spain and followed by Portugal, Spain, Italy, and Russia. Number of AI-Related Bills Passed Into Law in Select Countries, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report United States 9 Spain 5 Philippines 4 Andorra 2 Belgium 2 Italy 2 Portugal 2 Russia 2 United Kingdom 2 Austria 1 Croatia 1 Germany 1 Kyrgyz Republic 1 Latvia 1 Liechtenstein 1 Slovenia 1 0 1 2 3 4 5 Number of AI-Related Bills 6 7 8 9 Figure 6.1.3 Number of AI-Related Bills Passed Into Law in Select Countries, 2016–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report United States 22 Portugal 13 Spain 10 Italy 9 Russia 9 Belgium 7 United Kingdom 6 Austria 5 Korea, Rep. 5 Philippines 5 France 4 China 3 Germany 3 Japan 3 0 Table of Contents 2 4 6 Chapter 6 Preview 8 10 12 14 Number of AI-Related Bills 16 18 20 22 Figure 6.1.4 269 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Narrative Highlight: A Closer Look at Global AI Legislation The following subsection delves into some of the AI-related legislation passed into law during 2022. Figure 6.1.5 samples five different countries’ laws covering a range of AI-related issues. AI-Related AI-Related AI-Related Legislation Legislation From From Select Select Countries, Countries, 2022 2022 Source: AI Index, 2022 Table: 2023 AI Source: Source:AI AIIndex, Index,2022 2022|||Table: Table:2023 2023AI AIIndex IndexReport Report Country Country Country Bill Bill BillName Name Description Description Kyrgyz Republic Kyrgyz KyrgyzRepublic Republic About About About the the Creative Creative Industries Industries Park Park This and operation Thislaw lawdetermines determinesthe thelegal legalstatus, status,management, management,and andoperation operation procedures established to accelerate the proceduresof ofthe theCreative CreativeIndustries IndustriesPark, Park,established establishedto toaccelerate acceleratethe the development of creative industries, including arti�cial intelligence. development of creative industries, includingarti�cial arti�cialintelligence. intelligence. Latvia Latvia Latvia Amendments Amendments Amendmentsto tothe theNational NationalSecurity SecurityLaw Law AAprovision commercial companies, provisionof ofthis thisact actestablishes establishesrestrictions restrictionson oncommercial commercialcompanies, companies, associations, national security, including associations,and andfoundations foundationsimportant importantfor fornational nationalsecurity, security,including includingaaa commercial intelligence. commercialcompany companythat thatdevelops developsarti�cial arti�cialintelligence. intelligence. Philippines Philippines Philippines Second Second SecondCongressional CongressionalCommission Commissionon onEducation Education (EDCOM (EDCOM (EDCOMII) II)Act Act AAprovision commission to review, provisionof ofthis thisact actcreates createsaacongressional congressionalcommission commissionto toreview, review, assess, education; to recommend assess,and andevaluate evaluatethe thestate stateof ofPhilippine Philippineeducation; education;to torecommend recommend innovative education; and to appropriate innovativeand andtargeted targetedpolicy policyreforms reformsinineducation; education;and andto toappropriate appropriate funds. challenges to education funds.The Theact actcalls callsfor forreforms reformsto tomeet meetthe thenew newchallenges challengesto toeducation education caused characterized, in part, by the causedby bythe theFourth FourthIndustrial IndustrialRevolution Revolutioncharacterized, characterized,in inpart, part,by bythe the rapid development of arti�cial intelligence. rapid development of arti�cial intelligence. Spain Spain Spain Right Right Rightto toequal equaltreatment treatmentand andnon-discrimination non-discrimination AAprovision intelligence algorithms provisionof ofthis thisact actestablishes establishesthat thatarti�cial arti�cialintelligence intelligencealgorithms algorithms involved take into account involvedin inpublic publicadministrations’ administrations’decision-making decision-makingtake takeinto intoaccount account bias-minimization accountability, whenever bias-minimizationcriteria, criteria,transparency, transparency,and andaccountability, accountability,whenever whenever technically technicallyfeasible. feasible. United States United UnitedStates States AI AI AITraining TrainingAct Act This Budget to establish or Thisbill billrequires requiresthe theO�ce O�ceof ofManagement Managementand andBudget Budgetto toestablish establishor or otherwise otherwise provide an AI training program for the acquisition workforce of otherwiseprovide providean anAI AItraining trainingprogram programfor forthe theacquisition acquisitionworkforce workforceof of executive executive agencies (e.g., those responsible for program management or executiveagencies agencies(e.g., (e.g.,those thoseresponsible responsiblefor forprogram programmanagement managementor or logistics), logistics), with exceptions. The purpose of the program isisto to ensure that logistics),with withexceptions. exceptions.The Thepurpose purposeof ofthe theprogram programis toensure ensurethat that the workforce has knowledge of the capabilities and risks associated with the theworkforce workforcehas hasknowledge knowledgeof ofthe thecapabilities capabilitiesand andrisks risksassociated associatedwith with AI. AI. AI. Figure 6.1.5 Table of Contents Chapter 6 Preview 270 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking United States Federal AI Legislation just one federal bill was proposed, while in 2021, 134 A closer look at the U.S. federal legislative record proposed bills. While fewer bills were proposed in shows a sharp increase in the total number of 2022, the number of passed bills, which remained at proposed bills that relate to AI (Figure 6.1.6). In 2015, 3 for each of the past four years, increased to 9. bills were proposed. In 2022 this number fell to 88 Number of AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed) Source: AI Index, 2022 | Chart: 2023 AI Index Report 140 120 Number of AI-Related Bills 100 88, Proposed 80 60 40 20 9, Passed 0 2015 2016 2017 2018 2019 2020 2021 2022 Figure 6.1.6 Table of Contents Chapter 6 Preview 271 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking United States State-Level AI Legislation Maryland with 3. Figure 6.1.8 shows the total volume of Figure 6.1.7 shows the number of laws containing California, Massachusetts, and Washington. Figure mentions of AI that were passed by U.S. states in 6.1.9 highlights the number of state-level AI-related 2022. California leads the list with 5, followed by bills passed by all states since 2016. legislation passed from 2016 to 2022 for select states, with Maryland leading the list with 7 bills, followed by Number of AI-Related Bills Passed Into Law in Select U.S. States, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report California 5 Maryland 3 Colorado 2 New Jersey 2 Washington 2 Alabama 1 Hawaii 1 Idaho 1 Louisiana 1 Massachusetts 1 North Carolina 1 Vermont 1 0 1 2 3 Number of AI-Related Bills 4 5 Figure 6.1.7 Table of Contents Chapter 6 Preview 272 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Number of AI-Related Bills Passed Into Law in Select U.S. States, 2016–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report Maryland 7 California 6 Massachusetts 5 Washington 5 Illinois 3 Utah 3 Vermont 3 Alabama 2 Colorado 2 Michigan 2 New Jersey 2 New York 2 North Carolina 2 Ohio 2 0 1 2 3 4 Number of AI-Related Bills 5 6 7 Figure 6.1.8 (Sum) Number of State-Level AI-Related Bills Passed Into Law in the United States by State, 2016–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report ME 0 AK 0 CA 6 WA 5 MT 0 ND 1 SD 0 MN 0 WI 0 MI 2 OR 0 ID 1 WY 0 NE 0 IA 0 IL 3 IN 0 NV 1 UT 3 CO 2 KS 0 MO 0 KY 1 WV 1 AZ 0 NM 0 OK 0 AR 0 TN 0 VA 1 NC 2 TX 2 LA 1 MS 1 AL 2 GA 0 SC 0 HI 1 OH 2 VT 3 NH 0 MA 5 NY 2 CT 0 RI 0 PA 0 NJ 2 MD 7 DE 0 FL 0 Figure 6.1.9 Table of Contents Chapter 6 Preview 273 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Growing policy interest in AI can also be seen at the state level, with 60 AI-related bills proposed in 2022 (Figure 6.1.10)—a dramatic increase from the 5 bills proposed in 2015. Additionally, the proportion of bills being passed has risen throughout the years. In 2015, 1 bill was passed, representing 16% of the total bills proposed that year; while in 2022, 21 bills were passed, or 35% out of the total that were proposed. Number of State-Level AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed) Source: AI Index, 2022 | Chart: 2023 AI Index Report 60, Proposed 60 Number of AI-Related Bills 50 40 30 21, Passed 20 10 0 2015 2016 2017 2018 2019 2020 2021 2022 Figure 6.1.10 Table of Contents Chapter 6 Preview 274 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Narrative Highlight: A Closer Look at State-Level AI Legislation The following subsection highlights some of the AI-related legislation passed into law at the state level during 2022. Figure 6.1.11 focuses on wide-ranging AI-related laws from five states around the country. AI-Related Legislation Legislation From From Select Select States, States, 2022 2022 AI-Related Source:AI AIIndex, Index,2022 2022|||Table: Table:2023 2023AI AIIndex IndexReport Report Source: AI Index, 2022 Table: 2023 AI Index Report Source: State State State BillName Name Bill Name Bill Description Description Alabama Alabama Alabama Arti�cialIntelligence, Intelligence,Limit Limit the the Use Use Arti�cial Intelligence, Limit Arti�cial ofFacial FacialRecognition, Recognition,to toEnsure Ensure of Facial Recognition, to Ensure of Arti�cialIntelligence IntelligenceIs IsNot Notthe theOnly Only Arti�cial Intelligence Is Not Arti�cial Basisfor forArrest Arrest Basis for Arrest Basis This bill bill prohibits prohibits state state or or local local law law enforcement enforcement agencies agenciesfrom fromusing usingfacial facialrecognition recognition This matchresults resultsas asthe thesole solebasis basisfor formaking makingan anarrest arrestor orfor forestablishing establishingprobable probablecause causein inaa match criminalinvestigation. investigation. criminal California California California BudgetAct Actof of2022 2022 Budget Act of 2022 Budget provisionof ofthis thisappropriations appropriationsbill billfor forthe the2022–23 2022–23�scal �scalyear yearallocates allocates$1,300,000 $1,300,000to to AAprovision CaliforniaState StateUniversity, University,Sacramento, Sacramento,to toimprove improvethe thecampus campuschildcare childcarecenter, center, California includingthe thedevelopment developmentof ofan anarti�cial arti�cialintelligence intelligencemixed-reality mixed-realityclassroom. classroom. including Maryland Maryland Maryland ConservationFinance FinanceAct Act Conservation Finance Act Conservation provisionof ofthis thisact actestablishes establishesthat thatthe theDepartment Departmentof ofNatural NaturalResources Resourcesshall shallstudy study AAprovision andassess assessthe thepotential potentialfor fordigital digitaltools toolsand andplatforms platformsincluding includingarti�cial arti�cialintelligence intelligenceand and and machine learning learning to to contribute contribute to to Chesapeake Chesapeake Bay Bay restoration restorationand andclimate climatesolutions. solutions. machine NewJersey Jersey New Jersey New 21stCentury CenturyIntegrated IntegratedDigital Digital 21st Century Integrated Digital 21st ExperienceAct Act Experience Act Experience provisionof ofthis thisact, act,which whichconcerns concernsthe themodernization modernizationof ofstate stategovernment governmentwebsites, websites, AAprovision establishesthat thatthe thechief chieftechnology technologyo�cer, o�cer,in inconsultation consultationwith withthe thechief chiefinnovation innovation establishes o�cerand andthe theNew NewJersey JerseyInformation InformationTechnology TechnologyProject ProjectReview ReviewBoard, Board,shall shallevaluate evaluate o�cer on an an annual annual basis basis the the feasibility feasibility of of state state agencies agencies using using arti�cial arti�cialintelligence intelligenceand and on machinelearning learningto toprovide providepublic publicservices. services. machine Vermont Vermont Vermont AnAct ActRelating Relatingto tothe theUse Useand and An Act Relating to the Use and An Oversightof ofArti�cial Arti�cialIntelligence Intelligencein in Oversight of Arti�cial Intelligence Oversight StateGovernment Government State Government State This act act creates creates the the Division Division of of Arti�cial Arti�cial Intelligence Intelligence within withinthe theAgency Agencyof ofDigital DigitalServices Services This toreview reviewall allaspects aspectsof ofarti�cial arti�cialintelligence intelligencedeveloped, developed,employed, employed,or orprocured procuredby bythe the to stategovernment. government.The Theact actrequires requiresthe theDivision Divisionof ofArti�cial Arti�cialIntelligence Intelligenceto, to,among amongother other state things,propose proposeaastate statecode codeof ofethics ethicson onthe theuse useof ofarti�cial arti�cialintelligence intelligencein instate state things, government and and make make recommendations recommendations to to the the General General Assembly Assemblyon onpolicies, policies,laws, laws,and and government regulationsregarding regardingarti�cial arti�cialintelligence intelligencein instate stategovernment. government. regulations Figure 6.1.11 Table of Contents Chapter 6 Preview 275 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Global AI Mentions Another barometer of legislative interest is the contain the keyword “artificial intelligence” from 2016 number of mentions of “artificial intelligence” in to 2022.3 Figure 6.1.12 shows that mentions of AI in governmental and parliamentary proceedings. The legislative proceedings in these countries registered a AI Index conducted an analysis of the minutes or small decrease from 2021 to 2022, from 1,547 to 1,340. proceedings of legislative sessions in 81 countries that Number of Mentions of AI in Legislative Proceedings in 81 Select Countries, 2016–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 1,600 1,400 1,340 Number of Mentions 1,200 1,000 800 600 400 200 2016 2017 2018 2019 2020 2021 2022 Figure 6.1.12 3 The full list of countries that was analyzed is in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the world; however, publicly accessible governmental and parliamentary databases were not made available for all countries. Table of Contents Chapter 6 Preview 276 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking By Geographic Area Figure 6.1.13 shows the number of legislative proceedings containing mentions of AI in 2022.4 From the 81 countries considered, 46 had at least one mention, and Spain topped the list with 273 mentions, followed by Canada (211), the United Kingdom (146), and the United States (138). Number of Mentions of AI in Legislative Proceedings by Country, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report 0 1–55 56–110 111–165 166–220 221–280 No Available Data Figure 6.1.13 4 For mentions of AI in legislative proceedings around the world, the AI Index performed searches of the keyword “artificial intelligence,” in the respective languages, on the websites of different countries’ congresses or parliaments, usually under sections named “minutes,” “Hansard,” etc. Table of Contents Chapter 6 Preview 27 7 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Figure 6.1.14 shows the total number of AI mentions in the past seven years. Of the 81 countries considered, 62 had at least one mention, and the United Kingdom dominates the list with 1,092 mentions, followed by Spain (832), the United States (626), Japan (511), and Hong Kong (478). Number of Mentions of AI in Legislative Proceedings by Country, 2016–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report 0 1–220 221–440 441–660 661–880 881–1100 No Available Data Figure 6.1.14 Table of Contents Chapter 6 Preview 278 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Narrative Highlight: A Closer Look at Global AI Mentions The following subsection examines mentions of AI in government proceedings in 2022. Figure 6.1.15 quotes discussions across a geographically diverse set of countries. AI-Related AI-Related AI-Related Parliamentary ParliamentaryMen Mentions tionsFrom FromSelect SelectCountries, Countries,2022 2022 Source: AI Index, 2022 Table: 2023 AI Index Report Source: Source:AI AIIndex, Index,2022 2022 |||Table: Table:2023 2023AI AIIndex IndexReport Report Country Country Country Legislature Legislature Legislature Speaker Speaker Speaker Quote Quote Agenda AgendaItem Item Australia Australia Australia House of House Houseof of Representatives Representatives Representatives Ed Husic, Australian Ed EdHusic, Husic,Australian AustralianLabor Labor Party, Minister Party, Party,Minister Ministerfor forIndustry Industry and Science and andScience Science “Working “Workingwith withour ourinternational internationalpartners partnerswe wecan can transform transformAustralian Australianknow-how know-howinto intoglobally globallyrecognised recognised skills skillsand andmanufacturing manufacturinginindefence defenceindustries. industries.And Andwe we can canbuild buildon onour ourundeniable undeniableexpertise expertiseininareas areaslike like quantum quantumtechnologies, technologies,robotics roboticsand andarti arti cial cial intelligence. intelligence.We Wewill willseek seekto topartner partnerwith withindustry industryand and state stateand andterritory territorygovernments governmentstotoidentify identifyinvestment investment opportunities opportunitieswithin withinpriority priorityareas. areas.An Anon-ramp, on-ramp,ififyou you will, will,of ofturn-key turn-keyopportunities opportunitiesfor forinvestment investmenttotomake make sure the NRF is well placed for success.” sure the NRF is well placed for success.” National NationalReconstruction Reconstruction Fund FundCorporation CorporationBill Bill 2022 2022 - Second Reading - Second Reading Brazil Brazil Brazil Diary of the Diary Diaryof ofthe the Chamber of the Chamber Chamberof ofthe the Members Members Members Mr. Gustavo Fruet, Mr. Mr.Gustavo GustavoFruet, Fruet, Democratic Labor Democratic DemocraticLabor LaborParty Party “There “Therehas hasbeen beenaalot lotofoftalk talkabout aboutthe thefuture futureofofwork workdue due to totechnology. technology.InInthe thebook bookThe TheFourth FourthIndustrial Industrial Revolution, Revolution,Klaus KlausSchwab Schwabeven evenpoints pointsout outprofessions professions that thatwill willbe beextinct extinctand andprofessions professionsthat thatwill willdemand demand more and more quali�cations, in times of 5G, more and more quali�cations, in times of 5G,Internet Internetofof Things Thingsand andArti Arti cial cialIntelligence. Intelligence.InInthis thissense, sense,ititisisgood good to tohighlight highlightthat thatthe thepandemic, pandemic,among amongother other contradictions, contradictions,ended endedup upanticipating anticipatingthe theuse useofof technology, technology,especially especiallyininthe thetelework.” telework.” Presentation PresentationofofBill BillNo. No. 135, 135,ofof2022, 2022,on onthe the amendment amendmentofofthe theCLT CLT- Consolidation ConsolidationofofLabor Labor Laws, Laws,with withaaview viewtoto granting grantingtelework teleworktoto parents parentsofofchildren childrenup uptoto88 years yearsold old Japan Japan Japan 210th Session Session of 210th 210th Sessionof of the Diet Diet House House of the the Diet Houseof of Councilors Councilors Councilors Commission on on Commission Commission on the Constitution Constitution the the Constitution No. 22 No. No. 2 Kohei Otsuka, Otsuka, Democratic Kohei Kohei Otsuka, Democratic Party for for the the People, Party Party for thePeople, People, Shinryokufukai Shinryokufukai Shinryokufukai “In the �eld of human rights, we believe that it is “In the �eld of human rights, we believe that it is necessary necessaryto toupdate updatehuman humanrights rightsguarantees guaranteesininorder ordertoto respond respondto tochanges changesininthe thetimes timesthat thatwere wereunpredictable unpredictable when whenthe theConstitution Constitutionwas wasenacted. enacted.InInparticular, particular,asasthe the fusion of arti cial intelligence and Internet technology fusion of arti cial intelligence and Internet technology progresses, progresses,the theinternational internationalcommunity communityisisconcerned concerned about aboutthe theproblems problemsofofindividual individualscoring scoringand and discrimination, and the problem of Internet advertising discrimination, and the problem of Internet advertising that thatunfairly unfairlyin�uences in�uencesthe thevoting votingbehavior behaviorofofcitizens. citizens. We need a constitutional argument to guarantee the We need a constitutional argument to guarantee the autonomous autonomousdecision-making decision-makingofofindividuals individualsand andprotect protect basic basicdata datarights rightsininthe thedigital digitalage.” age.” The Commission on the The Commission on the Constitution Constitution United United United Kingdom Kingdom Kingdom House of of House House of Commons Commons Commons Dame Angela Angela Dame Dame AngelaEagle, Eagle,Labor Labor “What “Whatwould wouldbe bethe theuse useofofarti arti cial cialintelligence intelligenceinintrying trying to decide how automated these things could become? to decide how automated these things could become? Would there be worries about over-automation? How Would there be worries about over-automation? How would that be looked at in terms of regulation? How would that be looked at in terms of regulation? How open are we going to be about the way in which AI is open are we going to be about the way in which AI is applied and how it might evolve in ways that might applied and how it might evolve in ways that might embed discrimination such that we get a system where embed discrimination such that we get a system where certain people may be discriminated against and certain people may be discriminated against and excluded?” excluded?” Financial FinancialServices Servicesand and Markets Bill (Fourth Markets Bill (Fourth Sitting) Sitting) Zambia Zambia Zambia The House, House, The The House, National National National Assembly Assembly Assembly Hon. Collins Collins Nzovu, Nzovu, United Hon. Hon. Collins Nzovu, United Party for for National National Party Party for National Development, Development, Development, The Minister of of Green Green Minister Minister of Green Economy and and Environment Environment Economy Economy and Environment “Madam Speaker, in order to enhance quality and “Madam Speaker, in order to enhance quality and accuracy of weather forecast, the Government, with accuracy of weather forecast, the Government, with �nancial support from the United Nations Development �nancial support from the United Nations Development Programme Strengthening Climate Resilience of Programme Strengthening Climate Resilience of Agricultural Livelihoods Livelihoods in in Agro-Ecological Agro-Ecological (UNDP (UNDP Agricultural Agricultural Livelihoods in Agro-Ecological (UNDP SCRALA) project project is is currently currently partnering partnering with with the the SCRALA) SCRALA) project is currently partnering with the University of of Zambia Zambia (UNZA) (UNZA) to to develop develop aa seasonal seasonal University University of Zambia (UNZA) to develop a seasonal weather forecasting forecasting system system using using arti arti cial cial intelligence.” intelligence.” weather weather forecasting system using arti cial intelligence.” Ministerial Statements; Ministerial Statements; Weather and Climate Weather and Climate Services and the Services and the 2022/2023 rainfall forecast 2022/2023 rainfall forecast Figure 6.1.15 Table of Contents Chapter 6 Preview 279 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking United States Committee Mentions committees that address legislative and other policy An additional indicator of legislative interest is the Figure 6.1.16 shows a sharp increase in the total number of mentions of “artificial intelligence” in number of mentions of AI within committee reports committee reports produced by House and Senate beginning with the 115th legislative session. issues, investigations, and internal committee matters. Mentions of AI in U.S. Committee Reports by Legislative Session, 2001–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 80 73 70 Number of Mentions 60 50 40 30 20 10 0 107th (2001–02) 108th (2003–04) 109th (2005–06) 110th (2007–08) 111th (2009–10) 112th (2011–12) 113th (2013–14) 114th (2015–16) 115th (2017–18) 116th (2019–20) 117th (2021–22) Figure 6.1.16 Table of Contents Chapter 6 Preview 280 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Figure 6.1.17 shows the mentions in committee reports for the 117th Congressional Session, which took place from 2021 to 2022. The Appropriations Committee leads the House reports, while the Homeland Security and Governmental Affairs Committee leads the Senate reports (Figure 6.1.18). Mentions of AI in Committee Reports of the U.S. House of Representatives for the 117th Congressional Session, 2021–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report Appropriations 20 Science, Space, and Technology 9 Rules 5 Armed Services 3 Transportation and Infrastructure 3 Intelligence (Permanent Select) 2 Natural Resources 2 Oversight and Accountability 2 Budget 1 Education and the Workforce 1 Energy and Commerce 1 Financial Services 1 Foreign A airs 1 Homeland Security 1 House Administration 1 Small Business 1 Ways and Means 1 0 1 2 3 4 5 6 7 8 9 10 11 12 Number of Mentions 13 14 15 16 17 18 19 20 21 Figure 6.1.17 Mentions of AI in Committee Reports of the U.S. Senate for the 117th Congressional Session, 2021–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report Homeland Security and Governmental A airs 8 Appropriations 3 Commerce, Science, and Transportation 3 Armed Services 2 Intelligence (Select) 2 0 Table of Contents 1 2 Chapter 6 Preview 3 4 5 Number of Mentions 6 7 8 Figure 6.1.18 281 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking Figure 6.1.19 shows the total number of mentions in committee reports from the past 10 congressional sessions, which took place from 2001 to 2022. The House and Senate Appropriations Committees, which regulate expenditures of money by the government, lead their respective lists (Figure 6.1.19 and 6.1.20). Mentions of AI in Committee Reports of the U.S. Senate, 2001–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report Appropriations 16 Homeland Security and Governmental A airs 11 Armed Services 10 Commerce, Science, and Transportation 9 Energy and Natural Resources 7 Intelligence (Select) 5 0 2 4 6 8 10 Number of Mentions 12 14 16 Figure 6.1.19 Mentions of AI in Committee Reports of the U.S. House of Representatives, 2001–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report Appropriations 45 Science, Space, and Technology 27 Rules 14 Armed Services 9 Energy and Commerce 8 Oversight and Accountability 8 Financial Services 6 Intelligence (Permanent Select) 6 Education and the Workforce 4 Transportation and Infrastructure 4 Homeland Security 3 Veterans’ A airs 3 Budget 2 Foreign A airs 2 Judiciary 2 Natural Resources 2 Small Business 2 Ways and Means 2 Agriculture 1 House Administration 1 0 Table of Contents 2 4 6 8 Chapter 6 Preview 10 12 14 16 18 20 22 24 26 28 Number of Mentions 30 32 34 36 38 40 42 44 46 Figure 6.1.20 282 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking United States AI Policy Papers To estimate activities outside national governments that are also informing AI-related lawmaking, the AI Index tracked 55 U.S.-based organizations that published policy papers in the past five years. Those organizations include: think tanks post that addresses issues related to AI and makes specific recommendations to policymakers. Topics of those papers are divided into primary and secondary categories: A primary topic is the main focus of the paper, while a secondary topic is a subtopic of the paper or an issue that is briefly explored. and policy institutes (19); university institutes and Figure 6.1.21 highlights the total number of U.S.-based, research programs (14); civil society organizations, AI-related policy papers published from 2018 to 2022. associations, and consortiums (9); industry and After a slight dip from 2020 to 2021, the total increased consultancy organizations (9); and government to 284 in 2022. Since 2018, the total number of such agencies (4). A policy paper in this section is defined papers has increased 3.2 times, signaling greater as a research paper, research report, brief, or blog interest over time. Number of AI-Related Policy Papers by U.S.-Based Organizations, 2018–22 Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report 300 284 Number of Policy Papers 250 200 150 100 50 0 2018 2019 2020 2021 2022 Figure 6.1.21 Table of Contents Chapter 6 Preview 283 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.1 AI and Policymaking By Topic In 2022, the most frequent primary topics were sat in fourth position as of 2022. All of these leading industry and regulation (107), innovation and topics were also well represented as secondary topics. technology (90), and government and publication Topics that received comparatively little attention administration (82) (Figure 6.1.22). Privacy, safety, and included social and behavioral sciences; humanities; security, which was the most reported topic in 2021, and communications and media. Number of AI-Related Policy Papers by U.S.-Based Organization by Topic, 2022 Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report Primary Topic Secondary Topic Industry and Regulation 107 Innovation and Technology 90 Gov’t and Public Administration 59 39 Equity and Inclusion 34 Int’l A airs and Int’l Security 34 Workforce and Labor 17 18 65 30 Democracy 26 26 Health and Biological Sciences 5 13 Justice and Law Enforcement 10 12 Physical Sciences 13 9 Education and Skills 8 10 4 8 3 1 Humanities 1 1 Communications and Media 0 25 59 Ethics Social and Behavioral Sciences 69 82 Privacy, Safety, and Security Energy and Environment 50 3 20 40 60 80 100 0 20 Number of Policy Papers 40 60 80 100 Figure 6.1.22 Table of Contents Chapter 6 Preview 284 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.2 National AI Strategies This subsection presents an overview of national AI strategies—policy plans developed by a country’s government to steer the development and deployment of AI technologies within its borders. Tracking trends in national strategies can be an important way of gauging the degree to which countries are prioritizing the management and regulation of AI technologies. Sources include websites of national or regional governments, the OECD AI Policy Observatory (OECD.AI), and news coverage. “AI strategy” is defined as a policy document that communicates the objective of supporting the development of AI while also maximizing the benefits of AI for society.5 6.2 National AI Strategies Aggregate Trends Canada officially launched the first national AI strategy in March of 2017; since then a total of 62 national AI strategies have been released (Figure 6.2.1). The Yearly Release of AI National Strategies by Country Source: AI Index, 2022 | Table: 2023 AI Index Report Year 2017 Canada, China, Finland 2018 Australia, France, Germany, India, Mauritius, Mexico, Sweden 2019 Argentina, Austria, Bangladesh, Botswana, Chile, Colombia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Japan, Kenya, Lithuania, Luxembourg, Malta, Netherlands, Portugal, Qatar, Romania, Russia, Sierra Leone, Singapore, United Arab Emirates, United States of America, Uruguay 2020 Algeria, Bulgaria, Croatia, Greece, Hungary, Indonesia, Latvia, Norway, Poland, Saudi Arabia, Serbia, South Korea, Spain, Switzerland 2021 Brazil, Ireland, Peru, Philippines, Slovenia, Tunisia, Turkey, Ukraine, United Kingdom, Vietnam 2022 Italy, Thailand number of released strategies peaked in 2019. By Geographic Area Figure 6.2.2 highlights the countries which, as of December 2022, have either released or developed a national AI strategy. Figure 6.2.3 enumerates the countries that, in 2021 and 2022, pledged to develop an AI strategy . The first nations to officially release national AI strategies were Canada, China, and Finland in 2017. Only two nations released national AI strategies in 2022: Italy and Thailand. Figure 6.2.1 AI National Strategies in Development by Country and Year Source: AI Index, 2022 | Table: 2023 AI Index Report Year Countries With a National Strategy on AI, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report Country Country 2021 Armenia, Bahrain, Cuba, Iceland, Morocco, New Zealand, Oman 2022 Azerbaijan, Belgium, Benin, Israel, Jordan, Nigeria, Uzbekistan Figure 6.2.3 Released In Development Not Released Figure 6.2.2 5 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world. It is possible that some strategies were missed. Table of Contents Chapter 6 Preview 285 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.3 U.S. Public Investment in AI This section examines public AI investment in the United States based on data from the U.S. government and Govini, a company that uses AI and machine learning technologies to track U.S. public and commercial spending. 6.3 U.S. Public Investment in AI Federal Budget for Nondefense AI R&D information on classified AI R&D investment by In December 2022, the National Science and In fiscal year (FY) 2022, nondefense U.S. government defense and intelligence agencies. agencies allocated a total of $1.7 billion to AI R&D Technology Council published a report on the public-sector AI R&D budget across departments and agencies participating in the Networking and Information Technology Research and Development (NITRD) Program and the National Artificial spending (Figure 6.3.1). The amount allocated in FY 2022 represented a slight decline from FY 2021 and a 208.9% increase from FY 2018. An even greater amount, $1.8 billion, has been requested for FY 2023. Intelligence Initiative. The report does not include U.S. Federal Budget for Nondefense AI R&D, FY 2018–23 Source: U.S. NITRD Program, 2022 | Chart: 2023 AI Index Report 1.84 Budget (in Billions of U.S. Dollars) 1.50 1.75 1.73 FY21 (Enacted) FY22 (Enacted) 1.43 1.11 1.00 0.56 0.50 0.00 FY18 (Enacted) FY19 (Enacted) FY20 (Enacted) FY23 (Requested) Figure 6.3.16 6 A previous report on the public-sector AI R&D budget released in 2021 classed the FY21 spending as totaling $1.53 billion. However, the most recent report, released in 2022, upgraded the total spent in 2022 to $1.75 billion. Table of Contents Chapter 6 Preview 286 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.3 U.S. Public Investment in AI U.S. Department of Defense Budget Requests Every year the DoD releases the amount of funding to the 2022 report, the DoD requested $1.1 billion in FY they have requested for nonclassified AI-specific 2023, a 26.4% increase from the funding they received research, development, test, and evaluation. According in FY 2022 (Figure 6.3.2). U.S. DoD Budget Request for AI-Specific Research, Development, Test, and Evaluation (RDT&E), FY 2020–23 Source: U.S. Office of the Under Secretary of Defense (Comptroller), 2022 | Chart: 2023 AI Index Report 1.10 Budget Request (in Billions of U.S. Dollars) 1.00 0.93 0.84 0.87 0.80 0.60 0.40 0.20 0.00 FY20 Funding FY21 Funding FY22 Funding FY23 Funding Figure 6.3.2 Table of Contents Chapter 6 Preview 287 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.3 U.S. Public Investment in AI U.S. Government AI-Related Contract Spending volumes of federal contracts data, including prime Public investment in AI can also be measured by analyze data that is otherwise often inaccessible. federal government spending on the contracts contracts, grants, and other transaction authority (OTA) awards. The use of AI models enables Govini to Total Contract Spending that U.S. government agencies award to private companies for the supply of goods and services. Such contracts typically occupy the largest share of an Figure 6.3.3 highlights total U.S. government spending on AI, subdivided by various AI segments. From 2021 agency’s budget. to 2022, total AI spending increased from $2.7 billion Data in this section comes from Govini, which created nearly 2.5 times. In 2022, the AI subsegments that saw a taxonomy of spending by the U.S. government on the greatest amount of government spending included critical technologies including AI. Govini applied decision science ($1.2 billion), and computer vision supervised machine learning and natural language ($0.8 billion). to $3.3 billion. Since 2017, total spending has increased processing to parse, analyze, and categorize large U.S. Government Spending by Segment, FY 2017–22 Source: Govini, 2022 | Chart: 2023 AI Index Report Decision Science Computer Vision Machine Learning Autonomy Natural Language Processing U.S. Government Spending (in Billions of U.S. Dollars) 3.50 3.28 0.17 3.00 2.70 2.50 2.41 0.21 2.00 1.83 1.56 1.50 1.29 1.00 0.24 0.31 0.44 0.50 0.00 2017 0.41 0.43 0.3 0.82 0.58 0.53 0.45 0.46 0.41 0.43 0.32 0.52 0.43 0.26 0.21 0.69 0.21 1.01 0.46 0.55 2018 2019 1.19 0.73 2020 2021 2022 Figure 6.3.3 Table of Contents Chapter 6 Preview 288 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.3 U.S. Public Investment in AI Figure 6.3.4 shows U.S. government spending by AI segment in FY 2021 and FY 2022. Spending increased for the decision science, computer vision, and autonomy segments, while spending on machine learning, and natural language processing dropped slightly. U.S. Government Spending by Segment, FY 2021 Vs. 2022 Source: Govini, 2022 | Chart: 2023 AI Index Report 1.19 (+18%) Decision Science 1.01 0.82 (+55%) Computer Vision 0.53 0.69 (+33%) Autonomy 0.52 0.41 (-5%) Machine Learning Natural Language Processing 0.00 0.43 0.17 (-19%) 2022 2021 0.21 0.20 0.40 0.60 0.80 U.S. Government Spending (in Billions of U.S. Dollars) 1.00 1.20 Figure 6.3.4 Table of Contents Chapter 6 Preview 289 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.3 U.S. Public Investment in AI In FY 2022, the majority of federal AI contracts were prime contracts (62.5%), followed by grants (34.9%) and other transaction authority (OTA) awards (2.6%) (Figure 6.3.5). From FY 2021 to FY 2022, the share of contracts remained about the same, while the share of grants rose. Total Value of Contracts, Grants, and OTAs Awarded by the U.S. Government for AI/ML and Autonomy, FY 2017–22 Source: Govini, 2022 | Chart: 2023 AI Index Report 2.05, Contracts Total Value Awarded (in Billions of U.S. Dollars) 2.00 1.50 1.15, Grants 1.00 0.50 0.09, OTAs 0.00 2017 2018 2019 2020 2021 2022 Figure 6.3.5 Table of Contents Chapter 6 Preview 290 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.4 U.S. AI-Related Legal Cases In 2022, the AI Index partnered with Elif Kiesow Cortez, a scholar of artificial intelligence law, in a research project tracking trends in American legal cases from 2000 to 2022 that contain AI-related keywords.7 6.4 U.S. AI-Related Legal Cases Total Cases there were a total of 110 AI-related cases in U.S. In the last few years, there has been a sharp spike in federal and state courts, 6.5 times more than in 2016 AI-related jurisprudence in the United States. In 2022, (Figure 6.4.1). Number of AI-Related Legal Cases in the United States, 2000–22 Source: AI Index, 2022 | Chart: 2023 AI Index Report 110 Number of AI-Related Legal Cases 100 80 60 40 20 0 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 Figure 6.4.1 7 The Index analyzed both federal and state-level cases. Specific keywords in the search included “artificial intelligence,” “machine learning,” and “automated decision-making.” Some of these cases did not directly concern issues related to AI jurisprudence. As a next step of this project, we will aim to identify the cases that most centrally concern issues of AI-related law. Table of Contents Chapter 6 Preview 291 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.4 U.S. AI-Related Legal Cases Geographic Distribution they are home to many large businesses that have In 2022, the majority of AI-related legal cases greater number of AI-related legal cases originating originated in California (23), Illinois (17), and New from Illinois—this follows the state’s enactment York (11) (Figure 6.4.2). The aggregate number of AI- of the Biometric Information Privacy Act (BIPA), related cases since 2000 show a similar geographic which requires that companies doing business in distribution (Figure 6.4.3). California and New York’s Illinois follow a number of regulations related to the inclusion in the top three is unsurprising given that collection and storage of biometric information. integrated AI. In recent years, there have been a Number of AI-Related Legal Cases in the United States by State, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report California 23 Illinois 17 New York 11 Delaware 7 Florida 7 Washington 5 Kansas 4 Massachusetts 4 Maryland 4 District of Columbia 3 Texas 3 Ohio 3 Pennsylvania 3 Virginia Missouri 0 2 2 2 4 6 8 10 12 14 Number of AI-Related Legal Cases 16 18 20 22 24 Figure 6.4.28 8 Figures 6.4.2 and 6.4.3 include information for states and districts, given that cases sometimes originate from American districts like the District of Columbia or Puerto Rico Table of Contents Chapter 6 Preview 292 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.4 U.S. AI-Related Legal Cases Number of AI-Related Legal Cases in the United States by State, 2000–22 (Sum) Source: AI Index, 2022 | Chart: 2023 AI Index Report California 127 New York 66 Illinois 36 Texas 26 Delaware 19 Massachusetts 19 Washington 18 Pennsylvania 16 Michigan 12 Virginia 12 District of Columbia 12 Florida 12 Ohio Kansas Minnesota 0 10 9 8 10 20 30 40 50 60 70 80 Number of AI-Related Legal Cases 90 100 110 120 130 Figure 6.4.3 Sector Figure 6.4.4 groups U.S.-based legal cases by economic sector. The predominant sector in 2022 was financial services and professional services (48 cases); followed by media, culture, graphical (18); and public service (14). Sector at Issue in AI-Related Legal Cases in the United States, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report Financial Services, Professional Services 48 Media, Culture, Graphical 18 Public Service 14 Education 6 Health Services 6 Hotels, Catering, Tourism 4 Postal and Telecommunications Services 4 Transport (Including Civil Aviation, Railways, Road Transport) 3 Food, Drink, Tobacco 2 Transport Equipment Manufacturing 2 Basic Metal Production 1 Mechanical and Electrical Engineering 1 Oil and Gas Production, Oil Re ning 1 0 Table of Contents 4 Chapter 6 Preview 8 12 16 20 24 28 32 Number of AI-Related Legal Cases 36 40 44 48 Figure 6.4.4 293 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.4 U.S. AI-Related Legal Cases Type of Law The greatest proportion of AI-related legal cases concerned civil law (29%) (Figure 6.4.5). There were also a large number of AI-related legal cases in the domain of intellectual property (19%), as well as contract law (13.6%). Area of Law of AI-Related Legal Cases in the United States, 2022 Source: AI Index, 2022 | Chart: 2023 AI Index Report Civil 32 Intellectual Property 21 Contract 15 Competition 11 Constitutional 8 Employment and Labor 6 Criminal 5 Corporate 4 Financial 3 Human Rights and Immigration 2 Terrorism and National Security 2 Tort 0 Table of Contents 1 2 4 6 8 Chapter 6 Preview 10 12 14 16 18 20 Number of AI-Related Legal Cases 22 24 26 28 30 32 Figure 6.4.5 294 Artificial Intelligence Index Report 2023 Chapter 6: Policy and Governance 6.4 U.S. AI-Related Legal Cases Narrative Highlight: Three Significant AI-Related Legal Cases The section below profiles three significant AI-related cases in the United States, highlighting some of the legal issues that are at stake when AI is brought into the courts. Duerr v. Bradley University (2022- Flores v. Stanford9 (2021-Sep-28) Dyroff v. Ultimate Software Grp., Inc Mar-10) – United States Court of – United States Court of Appeals (2017-Nov-26) – United States Court of Appeals for the Seventh Circuit for the Second Circuit Appeals for the Ninth Circuit The plaintiffs, who were enrolled The plaintiffs, offenders denied Plaintiff Kristanalea Dyroff sued Ultimate as undergraduates in a private parole, sued the New York State Software after her 29-year-old son died university in Peoria, Illinois, during Board of Parole over being from an overdose of heroin laced with the fall 2020 semester, were told refused access to information fentanyl, which he allegedly bought to use a third-party proctoring used by the board in its review from a drug dealer that he encountered tool called Respondus Monitor for of their cases. Northpointe, Inc., on Ultimate Software’s social network remote, online exams. This tool petitioned the court as a non- site. Dyroff asserted seven claims made use of artificial intelligence party because its Correctional against Ultimate Software which technologies. The plaintiffs claimed Offender Management Profiling included negligence, wrongful death, that the defendants violated Illinois’ for Alternative Sanctions and civil conspiracy. At the core of these Biometric Information Privacy Act (COMPAS), an AI-powered claims was the argument that Ultimate (BIPA) by not adequately following risk assessment tool, had been Software mined the data of users its guidelines concerning the used by the parole board in its and deployed that data, alongside an collection of biometric information. determinations. Northpointe algorithm, to recommend drug-related BIPA does not apply to financial wanted to prevent the disclosure discussion groups to her son. Ultimate institutions. Ultimately, the court of AI trade secrets to one of the Software moved to dismiss the claims ruled that under the Gramm- plaintiff’s expert witnesses. The and claimed partial immunity under the Leach-Bliley Act, the defendants court ruled that the confidential Communications Decency Act, which were a financial institution by material in question was relevant protects website operators from liability virtue of lending functions they to the plaintiff’s case and posed for third-party content on their site. The engaged in and therefore exempt little risk of competitive injury. Court ruled that Ultimate Software was from BIPA. As such, the plaintiff’s As such, the material was immune and that its use of algorithms case was dismissed. ordered to be released under a did not sufficiently amount to novel supplemental protective order. content creation. 9 The defendant was Tina M. Stanford, as Chairwoman of the New York State Board of Parole. Table of Contents Chapter 6 Preview 295 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 7: Diversity Table of Contents Chapter 7 Preview 296 Artificial Intelligence Index Report 2023 CHAPTER 7 PREVIEW: Diversity Overview 298 Chapter Highlights 299 7.1 AI Conferences Women in Machine Learning (WiML) NeurIPS Workshop 300 Narrative Highlight: Disability Status of CS, CE, and Information Students 311 New AI PhDs 312 CS, CE, and Information Faculty 313 300 Workshop Participants 300 7.3 K–12 Education 316 Demographic Breakdown 301 AP Computer Science: Gender 316 AP Computer Science: Ethnicity 318 7.2 AI Postsecondary Education 305 CS Bachelor’s Graduates 305 CS Master’s Graduates 307 CS PhD Graduates 309 Table of Contents Chapter 7 Preview ACCESS THE PUBLIC DATA 297 Artificial Intelligence Index Report 2023 Chapter 7: Diversity Overview AI systems are increasingly deployed in the real world. However, there often exists a disparity between the individuals who develop AI and those who use AI. North American AI researchers and practitioners in both industry and academia are predominantly white and male. This lack of diversity can lead to harms, among them the reinforcement of existing societal inequalities and bias. This chapter highlights data on diversity trends in AI, sourced primarily from academia. It borrows information from organizations such as Women in Machine Learning (WiML), whose mission is to improve the state of diversity in AI, as well as the Computing Research Association (CRA), which tracks the state of diversity in North American academic computer science. Finally, the chapter also makes use of Code.org data on diversity trends in secondary computer science education in the United States. Note that the data in this subsection is neither comprehensive nor conclusive. Publicly available demographic data on trends in AI diversity is sparse. As a result, this chapter does not cover other areas of diversity, such as sexual orientation. The AI Index hopes that as AI becomes more ubiquitous, the amount of data on diversity in the field will increase such that the topic can be covered more thoroughly in future reports. Table of Contents Chapter 7 Preview 298 Artificial Intelligence Index Report 2023 Chapter 7: Diversity Chapter Highlights North American bachelor’s, master’s, and PhD-level computer science students are becoming more ethnically diverse. Although white students are still the most represented ethnicity among new resident bachelor’s, master’s, and PhD-level computer science graduates, students from other ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming increasingly more represented. For example, in 2011, 71.9% of new resident CS bachelor’s graduates were white. In 2021, that number dropped to 46.7%. Women make up an increasingly greater share of CS, CE, and information faculty hires. Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from 24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male (75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary. Table of Contents Chapter 7 Preview New AI PhDs are still overwhelmingly male. In 2021, 78.7% of new AI PhDs were male. Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender imbalance in higher-level AI education. American K–12 computer science education has become more diverse, in terms of both gender and ethnicity. The share of AP computer science exams taken by female students increased from 16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and Black/African American students taking AP computer science has likewise increased. 299 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.1 AI Conferences 7.1 AI Conferences Women in Machine Learning (WiML) NeurIPS Workshop collaboration and interaction among participants from Women in Machine Learning (WiML), founded in Workshop Participants 2006, is an organization dedicated to supporting and Figure 7.1.1 shows the number of participants that increasing the impact of women in machine learning. have attended the WiML workshop since 2010. In the This subsection of the AI Index report presents data last decade, there has been a steady increase: 1,157 from the WiML annual technical workshop, hosted at individuals participated in 2022, 13 times the number NeurIPS. Since 2020, WiML has also been hosting the in 2010. However, from 2021 to 2022, the number of Un-Workshop, which serves to advance research via workshop participants decreased from 1,486 to 1,157.1 diverse backgrounds at the International Conference of Machine Learning (ICML). Attendance at NeurIPS Women in Machine Learning Workshop, 2010–22 Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report 1,400 Number of Attendees 1,200 1,157 1,000 800 600 400 200 0 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 7.1.1 1 The recent decrease in WiML workshop attendance may be attributable to the overall recent decrease in NeurIPS attendance. This overall decrease may in turn be a result of NeurIPS moving away from a purely virtual format. Table of Contents Chapter 7 Preview 300 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.1 AI Conferences Demographic Breakdown information aggregated. Among survey respondents, Figure 7.1.2 breaks down the continent of residence around 41.5% were from North America, followed of the 2022 workshop participants. The data in the by Europe (34.2%), Asia (17.1%), and Africa (3.4%). In following figures comes from a survey completed 2022, there was greater representation from Europe, by participants who consented to having such Asia, and South America. Continent of Residence of Participants at NeurIPS Women in Machine Learning Workshop, 2022 Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report North America 41.50% Europe 34.20% Asia 17.10% Africa South America Australia/ Oceania 3.40% 1.60% 1.40% Antarctica 0.20% 0% 5% 10% 15% 20% 25% % of Respondents 30% 35% 40% 45% Figure 7.1.22 2 At the time of the survey, one of the respondents was temporarily residing in Antarctica. Table of Contents Chapter 7 Preview 301 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.1 AI Conferences The majority of participants at the 2022 WiML workshop were female-identifying (37.0%), another 25.8% were male-identifying, and 0.5% were nonbinary-identifying (Figure 7.1.3). Gender Breakdown of Participants at NeurIPS Women in Machine Learning Workshop, 2022 Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report Female 37.00% Prefer Not to Say 36.30% Male Nonbinary 25.80% 0.50% Gender Fluid 0.20% Gender 0.20% Non-Conforming 0% 5% 10% 15% 20% % of Respondents 25% 30% 35% 40% Figure 7.1.3 Table of Contents Chapter 7 Preview 302 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.1 AI Conferences The most represented professional positions at the workshop were PhD students (49.4%), research scientists/ data scientists (20.8%), software engineers/data engineers (8.4%), and faculty (4.4%) (Figure 7.1.4). Professional Positions of Participants at NeurIPS Women in Machine Learning Workshop, 2022 Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report PhD Student 49.40% Research Scientist/ Data Scientist Software Engineer/ Data Engineer 20.80% 8.40% Faculty 4.40% CEO/Director 3.50% Others 3.50% Postdoc 3.50% MSc Student Undergraduate Student 2.30% 2.00% Recruiter 1.60% Lecturer 1.40% Recent Graduate 0.20% 0% 10% 20% 30% % of Respondents 40% 50% Figure 7.1.4 Table of Contents Chapter 7 Preview 303 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.1 AI Conferences The WiML workshop participants at NeurIPS submitted papers covering a wide range of subjects (Figure 7.1.5). The most popular submission topics were applications (32.5%), algorithms (23.4%), and deep learning (14.8%). Primary Subject Area of Submissions at NeurIPS Women in Machine Learning Workshop, 2022 Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report Applications 32.50% Algorithms 23.40% Deep Learning 14.80% Social Aspects of Machine Learning 7.70% Reinforcement Learning and Planning 7.20% Neuroscience and Cognitive Science 5.30% Data, Challenges, Implementations, Software 3.80% Probabilistic Methods 3.30% Optimization 1.00% Theory 1.00% 0% 5% 10% 15% 20% % of Respondents 25% 30% Figure 7.1.5 Table of Contents Chapter 7 Preview 304 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Another proxy for studying diversity in AI is looking at trends in postsecondary AI education. The following subsection borrows data from the Computing Research Association’s (CRA) annual Taulbee Survey.3 7.2 AI Postsecondary Education CS Bachelor’s Graduates last decade whereby an increasingly large number The number of female CS bachelor’s graduates survey also included a nonbinary gender category: In rose to 22.3% from 2020 to 2021 (Figure 7.2.1). This 2021, the number of nonbinary/other-identifying CS increase mirrors a broader trend observed in the bachelor’s graduates was 0.04%. of CS bachelor’s graduates were women. The CRA Gender of New CS Bachelor’s Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 90% New CS Bachelor’s Graduates (% of Total) 80% 77.66%, Male 70% 60% 50% 40% 30% 22.30%, Female 20% 10% 0% 0.04%, Nonbinary/Other 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.1 3 The charts in this subsection look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens in each educational level (i.e., Bachelor’s, Master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included. For the proportion of nonresident aliens in each category, see footnotes. Table of Contents Chapter 7 Preview 305 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Figure 7.2.2 breaks down the ethnicity of new CS bachelor’s graduates in North America: The top ethnicity was white (46.7%), followed by Asian (34.0%) and Hispanic (10.9%). In the last decade, the proportion of new CS bachelor’s graduates who were Asian, Hispanic, or multiracial (not Hispanic) steadily increased.4 Ethnicity of New Resident CS Bachelor’s Graduates (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New CS Bachelor’s Graduates (% of Total) 70% 60% 50% 46.69%, White 40% 33.99%, Asian 30% 20% 10.91%, Hispanic (Any Race) 4.10%, Multiracial (Not Hispanic) 3.85%, Black or African-American 0.24%, Native Hawaiian or Paci c Islander 0.22%, American Indian or Alaska Native 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.2 4 In 2021, 16.3% of new CS bachelor graduates were nonresident aliens. Table of Contents Chapter 7 Preview 306 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education CS Master’s Graduates Figure 7.2.3 shows the gender of CS master’s moving to 27.8% in 2021 from 24.6% in 2011. In graduates. The proportion of female CS master’s 2021, 0.9% of CS master’s graduates identified as graduates has not substantially increased over time, nonbinary/other. Gender of New CS Master’s Graduates (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 80% 71.27%, Male New CS Master’s Graduates (% of Total) 70% 60% 50% 40% 30% 27.83%, Female 20% 10% 0.90%, Nonbinary/Other 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.3 Table of Contents Chapter 7 Preview 307 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Of domestic students, the most represented ethnicities are white (50.3%), followed by Asian (34.8%), and Hispanic (7.3%) (Figure 7.2.4). As with CS bachelor’s graduates, in the last decade white students have represented an increasingly smaller proportion of new CS master’s graduates.5 Ethnicity of New Resident CS Master’s Graduates (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 70% New CS Master’s Graduates (% of Total) 60% 50.28%, White 50% 40% 34.83%, Asian 30% 20% 7.25%, Hispanic (Any Race) 3.82%, Black or African-American 3.45%, Multiracial (Not Hispanic) 0.25%, American Indian or Alaska Native 0.12%, Native Hawaiian or Paci c Islander 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.4 5 In 2021, 65.2% of new CS master’s graduates were nonresident aliens. Table of Contents Chapter 7 Preview 308 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education CS PhD Graduates In 2021, the number of new female CS PhD continue to be male. There remains a large gap graduates rose to 23.3% from 19.9% (Figure 7.2.5). between new male and female CS PhDs. Despite this rise, most new CS PhD graduates Gender of New CS PhD Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 80% 76.58%, Male New CS PhD Graduates (% of Total) 70% 60% 50% 40% 30% 23.30%, Female 20% 10% 0.12%, Nonbinary/Other 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.5 Table of Contents Chapter 7 Preview 309 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Between 2011 and 2021, the number of new white resident CS PhD graduates declined by 9.4 percentage points. Asians are the next most represented group (29%), followed by Hispanics (5.1%) and Black or African Americans (4%) (Figure, 7.2.6).6 Ethnicity of New Resident CS PhD Graduates (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 70% New CS PhD Graduates (% of Total) 60% 58.64%, White 50% 40% 30% 29.00%, Asian 20% 5.12%, Hispanic (Any Race) 4.05%, Black or African-American 2.13%, Multiracial (Not Hispanic) 0.64%, American Indian or Alaska Native 0.43%, Native Hawaiian or Paci c Islander 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.6 6 In 2021, 68.6% of new CS PhD graduates were nonresident aliens. Table of Contents Chapter 7 Preview 310 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Narrative Highlight: Disability Status of CS, CE, and Information Students The 2021 edition of the CRA Taulbee Survey was the disability accommodations in the last year. first to gather information about the prevalence of The number of such students was relatively small. CS, CE, and information students with disabilities. Only 4.0% of bachelor’s, 1.0% of PhD students, The CRA asked departments to identify the number and 0.8% of master’s students reported needing of students at each degree level who received accommodations (Figure 7.2.7). CS, CE, and Information Students (% of Total) With Disability Accomodations in North America, 2021 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 4.10% CS, CE, and Information Students (% of Total) 4% 3% 2% 1.00% 1% 0% 0.80% Bachelor’s PhDs Master’s Figure 7.2.7 Table of Contents Chapter 7 Preview 311 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education New AI PhDs were female. While the number of female AI PhDs Figure 7.2.8 looks at demographic trends for new AI marginally increased from 2020 to 2021, we find no PhD graduates who focus on artificial intelligence. meaningful trends in the last decade relating to the In 2021, 78.7% of new AI PhDs were male and 21.3% gender of new AI PhDs. Gender of New AI PhD Graduates (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 80% 78.70%, Male New AI PhD Graduates (% of Total) 70% 60% 50% 40% 30% 21.30%, Female 20% 10% 0% 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.8 Table of Contents Chapter 7 Preview 312 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education CS, CE, and Information Faculty predominantly male (75.9%) (Figure 7.2.9). Women make up 23.9% of CS, CE, and information faculty, Data on the ethnicity and gender of CS, CE, and and nonbinary individuals make up 0.1%. The share information faculty helps to paint a picture of of female CS, CE, and information faculty has slowly diversity trends in academic AI and CS. As of 2021, increased; since 2011, the number of female faculty most CS, CE, and information faculty members are members has risen 5 percentage points. Gender of CS, CE, and Information Faculty (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 80% 75.94%, Male CS, CE, and Information Faculty (% of Total) 70% 60% 50% 40% 30% 23.94%, Female 20% 10% 0.12%, Nonbinary/Other 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.9 Table of Contents Chapter 7 Preview 313 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education Although most new CS, CE, and information faculty hires in North American universities are still male, the proportion of women among faculty hires reached 30.2% in 2021, up about 9 percentage points from 2015 (Figure 7.2.10). Gender of New CS, CE, and Information Faculty Hires (% of Total) in North America, 2011–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report New CS, CE, and Information Faculty Hires (% of Total) 80% 70% 69.26%, Male 60% 50% 40% 30.17%, Female 30% 20% 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 0.57%, Nonbinary/Other 2021 Figure 7.2.10 Table of Contents Chapter 7 Preview 314 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.2 AI Postsecondary Education The majority of resident CS, CE, and information faculty are white as of 2021 (58.1%), followed by Asian (29.7%) (Figure 7.2.11). However, the gap between white CS, CE, and information faculty and faculty of the next nearest ethnicity is slowly narrowing: In 2011, the gap stood at 46.1%, whereas in 2021 it dropped to 28.4%.7 Ethnicity of Resident CS, CE, and Information Faculty (% of Total) in North America, 2010–21 Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report 70% CS, CE, and Information Faculty (% of Total) 60% 58.08%, White 50% 40% 30% 29.70%, Asian 20% 5.82%, Unknown 2.80%, Hispanic (Any Race) 2.54%, Black or African-American 0.67%, Multiracial (Not Hispanic) 0.25%, American Indian or Alaska Native 0.13%, Native Hawaiian or Paci c Islander 10% 0% 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.2.11 7 In 2021, 6.7% of CS, CE, and information faculty in North America were nonresident aliens. Table of Contents Chapter 7 Preview 315 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.3 K–12 Education How do trends in AI diversity measure at the K–12 level, prior to students entering university? This subsection borrows data from Code.org, an American nonprofit that aims to promote K–12 computer science education in the United States. 7.3 K–12 Education AP Computer Science: Gender nor female (Figure 7.3.1). It is still the case that male In 2021, 69.2% of AP computer science exams were students take more AP computer science exams taken by male students, 30.6% by female students, than any other gender, but the proportion of female and 0.3% by students who identified as neither male students has almost doubled in the last decade. AP Computer Science Exams Taken (% of Total) by Gender, 2007–21 Source: Code.org, 2022 | Chart: 2023 AI Index Report AP Computer Science Exams Taken (% of Total) 80% 70% 69.16%, Male 60% 50% 40% 30.58%, Female 30% 20% 10% 0.26%, Other 0% 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.3.1 Table of Contents Chapter 7 Preview 316 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.3 K–12 Education On a percent basis, the states with the largest (35%) (Figure 7.3.2). Other states with notable CS and number of female AP computer science test- AI activity include California, Texas, and Washington, takers were Alabama (36%) and Washington, D.C. with rates of women taking AP computer science (36%), followed by Nevada (35%), Louisiana (35%), tests at rates hovering around 30 percent. Tennessee (35%), Maryland (35%), and New York AP Computer Science Exams Taken by Female Students (% of Total), 2021 Source: Code.org, 2022 | Chart: 2023 AI Index Report ME 27% AK 20% CA 31% VT 23% NH 24% MA 30% NY 35% CT 30% RI 31% WA 32% MT 21% ND 16% SD 15% MN 23% WI 23% MI 30% OR 21% ID 26% WY 31% NE 25% IA 24% IL 32% IN 23% OH 27% PA 27% NJ 31% NV 35% UT 23% CO 26% KS 15% MO 22% KY 31% WV 30% DC 36% MD 35% DE 22% AZ 27% NM 29% OK 25% AR 29% TN 35% VA 28% NC 31% TX 30% LA 35% MS 33% AL 36% GA 29% SC 34% HI 30% FL 31% Figure 7.3.2 Table of Contents Chapter 7 Preview 317 Artificial Intelligence Index Report 2023 Chapter 7: Diversity 7.3 K–12 Education AP Computer Science: Ethnicity most postsecondary computer science fields, the Code.org collects data that speaks to trends in the more ethnically diverse over time. White students ethnicity of AP computer science test-takers. White are still the greatest test-taking group; however, over students took the greatest proportion of the exams in time, more Asian, Hispanic/Latino/Latina and Black/ 2021 (42.7%), followed by Asian (28.8%) and Hispanic/ African American students have taken AP computer Latino/Latina students (16.5%) (Figure 7.3.3). As with science exams. pool of AP computer science test-takers is becoming AP Computer Science Exams Taken (% of Total Responding Students) by Race/Ethnicity, 2007–21 AP Computer Science Exams Taken (% of Total Responding Students) Source: Code.org, 2022 | Chart: 2023 AI Index Report 60% 50% 42.74%, White 40% 30% 28.78%, Asian 20% 16.48%, Hispanic/Latino/Latina 10% 6.32%, Black/African American 4.92%, Two or More Races 0.62%, Native American/Alaskan 0.15%, Native Hawaiian/Paci c Islander 0.00%, Other 0% 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Figure 7.3.3 Table of Contents Chapter 7 Preview 318 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 CHAPTER 8: Public Opinion Table of Contents Chapter 8 Preview 319 Artificial Intelligence Index Report 2023 CHAPTER 8 PREVIEW: Public Opinion Overview 321 Chapter Highlights 322 8.1 Survey Data 323 Global Insights 323 AI Products and Services 323 AI: Harm or Help? 327 United States Table of Contents 329 Chapter 8 Preview Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? 334 8.2 Social Media Data 340 Dominant Models 340 ACCESS THE PUBLIC DATA 320 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion Overview AI has the potential to have a transformative impact on society. As such it has become increasingly important to monitor public attitudes toward AI. Better understanding trends in public opinion is essential in informing decisions pertaining to AI’s development, regulation, and use. This chapter examines public opinion through global, national, demographic, and ethnic lenses. Moreover, we explore the opinions of AI researchers, and conclude with a look at the social media discussion that surrounded AI in 2022. We draw on data from two global surveys, one organized by IPSOS, and another by Lloyd’s Register Foundation and Gallup, along with a U.S-specific survey conducted by PEW Research. It is worth noting that there is a paucity of longitudinal survey data related to AI asking the same questions of the same groups of people over extended periods of time. As AI becomes more and more ubiquitous, broader efforts at understanding AI public opinion will become increasingly important. Table of Contents Chapter 8 Preview 321 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion Chapter Highlights Chinese citizens are among those who feel the most positively about AI products and services. Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest proportion of surveyed countries) agreed with the statement that products and services using AI have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the lowest of surveyed countries) agreed that products and services using AI had more benefits than drawbacks. People across the world and especially America remain unconvinced by self-driving cars. In a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good idea for society. Table of Contents Chapter 8 Preview Men tend to feel more positively about AI products and services than women. Men are also more likely than women to believe that AI will mostly help rather than harm. According to the 2022 IPSOS survey, men are more likely than women to report that AI products and services make their lives easier, trust companies that use AI, and feel that AI products and services have more benefits than drawbacks. A 2021 survey by Gallup and Lloyd’s Register Foundation likewise revealed that men are more likely than women to agree with the statement that AI will mostly help rather than harm their country in the next 20 years. Different causes for excitement and concern. NLP researchers … have some strong opinions as well. Among a sample of surveyed Americans, those who report feeling excited about AI are most excited about the potential to make life and society better (31%) and to save time and make things more efficient (13%). Those who report feeling more concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%); and the lack of human connection (12%). According to a survey widely distributed to NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence, 41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal change. These were some of the many strong opinions held by the NLP research community. 322 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data 8.1 Survey Data Global Insights Figure 8.1.1 highlights global opinions (aggregated How do opinions of AI vary across the globe? The variety of questions relating to AI products and first subsection of this chapter provides a response by looking at survey data from IPSOS and Pew Research, as well as one poll that was a collaboration of Gallup and Lloyd’s Register Foundation. The surveys suggest that public perceptions concerning AI differ across countries and by demographic groups. results across the entire survey subsample) for a services. It shows the percentage of respondents who agree with a particular question. The majority of the survey sample, 60%, believe that AI products and services will profoundly change their daily life in the near future—and make their life easier. A very slight majority, 52%, feel that products and services that use AI have more benefits than drawbacks. Only 40% AI Products and Services In late 2021, IPSOS ran a survey on global attitudes of respondents report that AI products and services toward AI products and services. The survey make them feel nervous. consisted of interviews with 19,504 adults ages 16–74 in 28 different countries.1 Global Opinions on Products and Services Using AI (% of Total), 2022 Source: IPSOS, 2022 | Chart: 2023 AI Index Report I have a good understanding of what artificial intelligence is 64% Products and services using artificial intelligence will profoundly change my daily life in the next 3–5 years 60% Products and services using artificial intelligence make my life easier 60% Products and services using artificial intelligence have more benefits than drawbacks 52% I know which types of products and services use artificial intelligence 50% I trust companies that use artificial intelligence as much as I trust other companies 50% Products and services using artificial intelligence have profoundly changed my daily life in the past 3–5 years 49% Products and services using artificial intelligence make me nervous 0% 39% 10% 20% 30% 40% % of Respondents That “Agree” 50% 60% Figure 8.1.1 1 See Appendix for more details about the survey methodology. Table of Contents Chapter 8 Preview 323 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Opinions vary widely across countries as to the services using AI have more benefits than drawbacks relative advantages and disadvantages of AI. (Figure 8.1.2). However, only 35% of American The IPSOS survey suggests that 78% of Chinese respondents share that sentiment. Among the 28 respondents, 76% of Saudi Arabian respondents, and surveyed countries, France and Canada held the most 71% of Indian respondents feel that products and negative views. ‘Products and services using AI have more bene ts than drawbacks,’ by Country (% of Total), 2022 Source: IPSOS, 2022 | Chart: 2023 AI Index Report China 78% Saudi Arabia 76% India 71% Peru 70% Mexico 65% Malaysia 65% Colombia 64% Chile 63% South Korea 62% Turkey 60% Brazil 57% South Africa 57% Argentina 55% Spain 53% Russia 53% Italy 50% Hungary 49% Poland 48% Japan 42% Sweden 40% Belgium 38% Great Britain 38% Australia 37% Germany 37% United States 35% Netherlands 33% Canada 32% France 0% Table of Contents 31% 10% 20% Chapter 8 Preview 30% 40% 50% % of Respondents That “Agree” 60% 70% 80% Figure 8.1.2 324 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Figure 8.1.3 breaks down answers to all of IPSOS’ companies that use AI as much as other companies, AI products and services questions by country. and only 30% say that AI products and services using AI Generally, sentiment relating to AI products make them nervous. Conversely, American respondents and services seems to be strongly correlated are among the most negative when it comes to AI. Only within specific countries. For example, Chinese 41% claim that AI products and services make their lives respondents seem to feel among the most positive easier, 35% report trusting AI companies as much as about AI products and services: 87% of Chinese other companies, and 52% report that AI products and respondents claim that AI products and services services make them feel nervous. make their lives easier, 76% report trusting Opinions About AI by Country (% Agreeing With Statement), 2022 Source: IPSOS, 2022 | Chart: 2023 AI Index Report I have a good understanding of what 64% 59% 60% 69% 59% 76% 67% 71% 50% 50% 57% 67% 72% 42% 41% 61% 74% 65% 76% 66% 75% 73% 78% 72% 62% 60% 68% 63% artificial intelligence is Products and services using artificial intelligence will profoundly change 60% 50% 52% 61% 44% 67% 80% 65% 45% 44% 46% 55% 74% 53% 53% 71% 65% 53% 71% 56% 60% 80% 72% 76% 56% 50% 73% 46% my daily life in the next 3–5 years Products and services using artificial 59% 46% 49% 65% 44% 70% 87% 71% 39% 45% 45% 50% 72% 54% 52% 71% 73% 47% 74% 58% 64% 80% 67% 74% 59% 46% 71% 41% intelligence make my life easier Products and services using artificial intelligence have more benefits than 55% 37% 38% 57% 32% 63% 78% 64% 31% 37% 38% 49% 71% 50% 42% 65% 65% 33% 70% 48% 53% 76% 57% 62% 53% 40% 60% 35% drawbacks I know which types of products and 47% 38% 37% 58% 36% 59% 76% 62% 34% 37% 37% 38% 69% 45% 32% 61% 62% 41% 63% 52% 57% 69% 57% 60% 46% 37% 60% 39% services use artificial intelligence I trust companies that use artificial intelligence as much as I trust other 55% 36% 40% 50% 34% 56% 76% 57% 34% 42% 35% 48% 68% 48% 39% 61% 60% 38% 60% 51% 52% 73% 56% 46% 50% 39% 63% 35% companies Products and services using artificial intelligence have profoundly changed 53% 37% 37% 51% 32% 58% 73% 58% 32% 31% 33% 38% 67% 41% 30% 65% 62% 40% 65% 45% 50% 72% 56% 62% 49% 30% 60% 36% my daily life in the past 3–5 years United States Turkey Sweden Spain South Korea South Africa Saudia Arabia Russia Poland Peru Netherlands Mexico Japan Malaysia Italy India Hungary Great Britain France Germany China Colombia Chile Brazil Canada Belgium Australia Argentina Products and services using artificial 33% 51% 42% 35% 49% 36% 30% 39% 32% 37% 50% 31% 53% 26% 20% 48% 38% 36% 35% 30% 28% 51% 52% 32% 48% 37% 48% 52% intelligence make me nervous Figure 8.1.3 Table of Contents Chapter 8 Preview 325 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Figure 8.1.4 breaks down opinions in all countries feeling that AI products and services make their lives across demographic groups such as gender, age, easier, they are also less likely than the 35-to-49 age household income, and employment status. IPSOS category to believe that AI products and services have results suggest that men feel more positively about more benefits than drawbacks. Finally, households with AI products and services than women—for example, higher incomes are more positive, compared to those compared to women, men are more likely to report with lower incomes, about AI products and services feeling that AI products and services make their making life easier and having more benefits than lives easier. Age-specific opinions vary. For instance, drawbacks. while individuals under 35 are most likely to report Opinions About AI by Demographic Group (% Agreeing With Statement), 2022 69% 60% 66% 65% 61% 57% 63% 71% 56% 64% 71% 73% 74% 67% 59% Products and services using artificial intelligence will profoundly change my daily life in the next 3–5 years 63% 57% 63% 61% 55% 56% 58% 67% 53% 58% 68% 70% 72% 64% 54% Products and services using artificial intelligence make my life easier 62% 58% 64% 62% 54% 56% 58% 66% 53% 58% 67% 67% 70% 63% 55% Products and services using artificial intelligence have more benefits than drawbacks 55% 49% 47% 53% 46% 50% 51% 57% 45% 50% 59% 63% 64% 55% 47% I know which types of products and services use artificial intelligence 55% 46% 54% 51% 45% 46% 50% 57% 44% 48% 58% 63% 65% 54% 44% I trust companies that use artificial intelligence as much as I trust other companies 53% 47% 54% 51% 44% 47% 48% 57% 45% 48% 56% 61% 62% 53% 45% Products and services using artificial intelligence have profoundly changed my daily life in the past 3–5 years 51% 46% 54% 50% 41% 46% 47% 54% 43% 46% 55% 61% 62% 52% 43% Products and services using artificial intelligence make me nervous 38% 41% 40% 40% 38% 41% 41% 38% 41% 37% 40% 48% 46% 40% 38% Female Under 35 35 to 49 50 to 74 Low Medium High Low Medium High Business Owner Sr. Exec./ Decision Maker Employed Gender Age Household Income Education Non-Employed I have a good understanding of what artificial intelligence is Male Source: IPSOS, 2022 | Chart: 2023 AI Index Report Employment Status Figure 8.1.4 Table of Contents Chapter 8 Preview 326 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data AI: Harm or Help? In 2021, Lloyd’s Register Foundation, an independent A greater proportion of respondents believed that global charity, collaborated with Gallup to poll AI will mostly help (39%) compared to a smaller 125,911 people across 121 countries about their proportion who believed that it would mostly harm perceptions of artificial intelligence and other digital (28%). Mirroring the disparity in responses across trends. Figure 8.1.5 shows the responses to the gender evident in the IPSOS survey, men in the survey question, “Do you think artificial intelligence Lloyd’s-Gallup poll were more likely than women to will mostly help or mostly harm people in this country report believing that AI will mostly help people in the in the next 20 years?” next 20 years. Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years Overall and by Gender (% of Total), 2021 Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report % World % Women % Men 42% 40% 39% % of Respondents 35% 30% 28% 29% 27% 24% 22% 20% 20% 10% 8% 2% 0% Mostly help Mostly harm 2% Neither 9% 8% 2% Don’t have an opinion Don’t know/refused Figure 8.1.5 Table of Contents Chapter 8 Preview 327 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Eastern Asia, Northern/Western Europe, and for every 1 response of “mostly harm” there were 4.4 Southern Europe are the regions of the world where responses suggesting that AI will “mostly help.” The people are most likely to report believing that AI will regions whose populations are most pessimistic about mostly help versus mostly harm (Figure 8.1.6). More the potential benefits of AI include Eastern Africa, specifically, among the Eastern Asian survey sample, Northern Africa, and Southern Africa. Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years by Region: Ratio of ‘Mostly Help’/‘Mostly Harm’, 2021 Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report Eastern Asia 4.40 Northern/Western Europe 1.80 Australia and New Zealand 1.70 Eastern Europe 1.40 Southern Europe 1.30 Central Asia 1.20 Southeastern Asia 1.20 Northern America 1.00 Middle East 1.00 Southern Asia 0.90 Latin America and Caribbean 0.90 Central/Western Africa 0.80 Northern Africa 0.70 Southern Africa 0.60 Eastern Africa 0.40 0 1 The Lloyd’s Register survey also polled respondents about their perceptions of 2 3 Ratio of “Mostly Help”/“Mostly Harm” 4 Figure 8.1.6 Perceptions of the Safety of Self-Driving Cars (% of Total), 2021 Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report certain AI technologies, such as self-driving cars. The majority of survey respondents 65%, Would not feel safe reported not feeling safe in a self-driving car (65%), compared to only 27% who reported feeling safe (Figure 8.1.7). 27%, Would feel safe 8%, Don’t know/refused Figure 8.1.7 Table of Contents Chapter 8 Preview 328 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data United States Americans’ Feelings Toward Increased Use of AI Programs in Daily Life (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report In 2022, Pew Research released one of the most 45%, Equally concerned and excited comprehensive surveys to date about Americans’ views on AI. The survey interviewed 10,260 panelists from a wide range of demographic groups about their broad AI-related opinions, as well as their perspectives on specific AI use cases.2 <1%, No answer 45% of Americans report feeling equally concerned and excited about the use of AI programs in daily life, while 37% report feeling more concerned than excited 37%, More concerned than excited 18%, More excited than concerned (Figure 8.1.8). Only 18% of Americans report feeling more excited than concerned about AI technology. Figure 8.1.8 Which AI applications are Americans most excited about? A large proportion report feeling very or problems (40%) (Figure 8.1.9). Americans are very or somewhat excited about AI being used to perform somewhat concerned about AI being used to make household chores (57%), to perform repetitive important life decisions for people (74%) and to know workplace tasks (46%), and to diagnose medical people’s thoughts and behaviors (75%). Americans’ Feelings on Potential AI Applications (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report Very/somewhat excited Perform household chores Equally excited and concerned Very/somewhat concerned 57% Perform repetitive workplace tasks 24% 46% Diagnose medical problem 27% 40% Handle customer service calls 27% 35% 26% 47% 9% 16% 74% Know people’s thoughts and behaviors 9% 16% 75% 20% 2 See Appendix for more details about the survey methodology. 3 The numbers in Figure 8.1.9 may not sum up to 100% due to rounding. Table of Contents Chapter 8 Preview 26% 24% Make important life decisions for people 0% 19% 40% % of Respondents 60% 80% 100% Figure 8.1.93 329 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data There are two specific AI use cases that Americans that police using facial recognition technology is a are more likely to report feeling are good ideas good idea for society compared to 27% who believe it for society rather than bad: police use of facial is a bad idea. However, Americans are not as excited recognition technology, and social media companies about driverless passenger vehicles: More feel that using AI to find false information on their sites (Figure driverless passenger vehicles are a bad idea for 8.1.10). More specifically, 46% of Americans believe society than a good idea. Americans’ Perceptions of Specific AI Use Cases (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report Bad idea for society Facial recognition technology by police Good idea for society 27% Not sure 46% 27% Figure 8.1.8 Computer programs by social media companies to nd false information on their sites Driverless passenger vehicles 0% 31% 38% 44% 20% 30% 26% 40% % of Respondents 60% 29% 80% 100% Figure 8.1.104 4 The numbers in Figure 8.1.10 may not sum up to 100% due to rounding. Table of Contents Chapter 8 Preview 330 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Of the sample of Americans who reported being more hacking, and digital privacy (16%); and lack of human concerned than excited about AI, Figure 8.1.11 outlines connection (12%). Americans reported being less the main reasons for their concern. The primary concerned about the potential loss of freedom and reasons include loss of human jobs (19%); surveillance, issues relating to lack of oversight and regulation. Main Reason Americans Are Concerned About AI (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report Loss of human jobs 19% Surveillance, hacking, digital privacy 16% Lack of human connection/qualities 12% AI will get too powerful/outsmarting people 8% People misusing AI 8% People becoming too reliant on AI/tech Figure 8.1.8 7% AI fails, makes mistakes 6% Concerns about government/tech companies using AI 3% Don’t trust AI or people wielding it 3% Unforeseen consequences/effects 2% Loss of freedom 2% Human bias coded into AI 2% Lack of oversight and regulation 2% Other 0% 7% 5% 10% % of Respondents 15% 20% Figure 8.1.11 Table of Contents Chapter 8 Preview 331 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data The two leading reasons that Americans report society better. A significant group also reported being excited about AI relate to its potential to feeling excited about the potential of AI to save time make life better and to save time (Figure 8.1.12). and increase efficiency (13%), as well as to handle Of the respondents, 31% believe AI makes life and mundane, tedious tasks (7%). Main Reason Americans Are Excited About AI (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report Makes life, society better 31% Saves time, more e�cient 13% Inevitable progress, is the future 10% Handles mundane, tedious tasks 7% Helps with work/labor 6% AI is interesting, exciting 6% Helps humans with di�cult/ dangerous tasks 6% More accurate than humans 4% Helps those who are elderly/ have a disability 4% Personal anecdotes 2% Other people’s fears based on sci-�, not reality 2% Other Figure 8.1.8 7% 0% 5% 10% 15% % of Respondents 20% 25% 30% Figure 8.1.12 Table of Contents Chapter 8 Preview 332 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data The Pew Research survey also asked participants in the degree to which people felt that AI systems which group of people had their experiences and positively considered the experiences and views of views taken into consideration in the design of AI men over women. Similarly, respondents felt that the systems. Respondents felt AI systems most reflected experiences and views of Asian, Black, and Hispanic the experiences and views of men and white adults adults, compared to those held by white adults, were (Figure 8.1.13). There was a 15 percentage point gap not as positively considered. People Whose Experiences and Views Are Considered in the Design of AI Systems (% of Total), 2022 Source: Pew Research, 2022 | Chart: 2023 AI Index Report Net not well Men Net well Not sure 12% 51% 37% Figure 8.1.8 Women White adults 25% 36% 13% Asian adults 48% 23% 39% 33% Black adults 33% Hispanic adults 33% 0% 38% 43% 24% 42% 23% 20% 40% % of Respondents 43% 60% 80% 100% Figure 8.1.135 5 The numbers in Figure 8.1.13 may not sum up to 100% due to rounding. Table of Contents Chapter 8 Preview 333 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? From May to June 2022, a group of American In general, the NLP research community researchers conducted a survey of the NLP research strongly feels that private firms have too much community on a diverse set of issues, including the influence (77%) and that industry will produce state of the NLP field, artificial general intelligence the most widely cited research (86%) (Figure (AGI), and ethics, among others. According to the 8.1.14). Curiously, 67% either agreed or weakly authors, a total of 480 individuals completed the agreed with the statement that most of NLP is survey, 68% of whom had authored at least two dubious science. A small proportion, 30%, think Association for Computational Linguistics (ACL) an “NLP winter”—a period when the field faces publications between 2019 and 2022. The survey a significant slowdown or stagnation in research represents one of the most complete pictures of the and development—is coming in the next decade. 6 attitudes AI researchers have toward AI research. State of the Field According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report Private �rms have too much in�uence 77% Industry will produce the most widely cited research 86% NLP winter is coming (10 years) 30% NLP winter is coming (30 years) 62% Most of NLP is dubious science 67% Author anonymity is worth it 0% 63% 10% 20% 30% 40% 50% 60% % of Respondents That “Agree” or “Weakly Agree” 70% 80% 90% Figure 8.1.14 6 More detailed information about the survey methodology and sample group can be found in the following paper. Table of Contents Chapter 8 Preview 334 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? (cont’d) A small majority of NLP researchers believe that specific types of AI systems can actually understand language: 51% agreed with the statement that language models (LMs) understand language, with even more (67%) agreeing that multimodal models understand language (Figure 8.1.15). Language Understanding According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report LMs understand language 51% Multimodal models understand language 67% Text-only evaluation can measure language understanding 0% 36% 10% 20% 30% 40% 50% % of Respondents That “Agree” or “Weakly Agree” 60% Figure 8.1.15 Table of Contents Chapter 8 Preview 335 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? (cont’d) NLP researchers also seem to believe that NLP’s with 48% of respondents feeling it is unethical. Sixty past net impact has been positive (89%) and that its percent of researchers feel that the carbon footprint future impact will continue to be good (87%) (Figure of AI is a major concern; however, only 41% feel that 8.1.16). The community is divided on the issue of NLP should be regulated. using AI to predict psychological characteristics, Ethics According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report NLP’s past net impact is good 89% NLP’s future net impact is good 87% It is unethical to build easily misusable systems 59% Ethical and scienti�c considerations can con�ict 74% Ethical concerns mostly reduce to data quality and model accuracy 25% It is unethical to predict psychological characteristics 48% Carbon footprint is a major concern 60% NLP should be regulated 0% 41% 10% 20% 30% 40% 50% 60% % of Respondents That “Agree” or “Weakly Agree” 70% 80% 90% Figure 8.1.16 Table of Contents Chapter 8 Preview 336 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? (cont’d) Although a large majority of researchers feel that AI could soon lead to revolutionary societal change (73%), only 36% feel that AI decisions could cause nuclear-level catastrophe (Figure 8.1.17). A plurality of researchers, 57%, held that recent research progress was leading the AI community toward Artificial General Intelligence (AGI). Arti cial General Intelligence (AGI) and Major Risks According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report AGI is an important concern 58% Recent progress is moving us toward AGI 57% AI could soon lead to revolutionary societal change 73% AI decisions could cause nuclear-level catastrophe 0% 36% 10% 20% 30% 40% 50% % of Respondents That “Agree” or “Weakly Agree” 60% 70% Figure 8.1.17 Table of Contents Chapter 8 Preview 337 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? (cont’d) When asked about the direction AI research is taking, the NLP community registered the strongest responses about the following: First, there’s too much focus on benchmarks (88%); second, more work should be done to incorporate interdisciplinary insights (82%); and third, there’s too great a focus on scale (72%) (Figure 8.1.18). Promising Research Programs According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report There’s too much focus on scale 72% There’s too much focus on benchmarks 88% On the wrong track: model architectures 37% On the wrong track: language generation 41% On the wrong track: explainable models 50% On the wrong track: black-box interpretability 42% We should do more to incorporate interdisciplinary insights 0% 82% 10% 20% 30% 40% 50% 60% % of Respondents That “Agree” or “Weakly Agree” 70% 80% 90% Figure 8.1.18 Table of Contents Chapter 8 Preview 338 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.1 Survey Data Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? (cont’d) A further point on the NLP community’s skepticism of scale: Only 17% of respondents agreed or weakly agreed with the statement that scaling solves practically any important problem, with a further 50% reaffirming the importance of linguistic structure (Figure 8.1.19). Scale, Inductive Bias, and Adjacent Fields According to the NLP Community, 2022 Source: Michael et al., 2022 | Chart: 2023 AI Index Report Scaling solves practically any important problem 17% Linguistic structure is necessary 50% Expert inductive biases are necessary 51% Linguistics/CogSci will contribute to the most-cited models 0% 61% 10% 20% 30% 40% % of Respondents That “Agree” or “Weakly Agree” 50% 60% Figure 8.1.19 Table of Contents Chapter 8 Preview 339 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.2 Social Media Data 8.2 Social Media Data Dominant Models ChatGPT conversation has increasingly saturated social Public attitudes toward AI can also be gauged through broadly, with sentiment growing ever more mixed. quantitative and qualitative analyses of posts that people make on social media. The NetBase Quid team leveraged the NetBase platform to analyze social conversation around AI models and new releases for uses across sectors from January to December 2022, looking at 2.74 million social media posts. Figure 8.2.1 shows the net sentiment score of various AI models that were released throughout the year. The net sentiment score expresses the ratio of positive to negative sentiment around a given topic. In this case, a net sentiment score of +100 means that all conversation is positive; a score of -100 means that media conversation around AI model releases more Consumers question the implications of its launch as well as its underlying ethical principles. Another frequent preoccupation is the bias of the system toward certain political, ethical, or cultural beliefs. “ChatGPT passed a Wharton MBA exam. Time to overhaul education.” – @GRDecter “Alarm: ChatGPT by @OpenAI now *expressly prohibits arguments for fossil fuels*. (It used to offer them.) Not only that, it excludes nuclear energy from its counter-suggestions. @sama, what is the reason for this policy?” – @AlexEpstein all conversation is negative. AlphaCode had the most Finally, while GLM-130B took up very little volume consistently high sentiment over time, as well as the of the overall social media conversation, a small highest average sentiment for 2022, due to positive conversation of very negative sentiment grew over press coverage on social media and practical use the system’s ties to the Chinese government and cases of AI-driven programming. Consumers and how it was “prohibited” from using the software media outlets embraced the practical use case of to “undermine” China’s government in any way. programming automation. Some sample social media Technology influencer and PhD student Jesse Wood posts relating to AlphaCode include: posted a Twitter thread about GLM-130B’s licensing “#AlphaCode—a new #AI system for developing computer code developed by @DeepMind— language that gained significant traction. “The model license for GLM-130B has a can achieve average human-level performance restriction: ‘You will not use the Software for any in solving programming contests.” act that may undermine China’s national security – Science Magazine, Twitter and national unity, harm the public interest of “DeepMind’s AlphaCode outperforms many human programmers in tricky software society, or infringe upon the rights and interests of human beings.’” – @jrhwood challenges.” – @lunamoth Table of Contents Chapter 8 Preview 340 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.2 Social Media Data Net Sentiment Score of AI Models by Quarter, 2022 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 2022/Q1 2022/Q2 2022/Q3 2022/Q4 DALL-E 0 42 29 21 LaMDA 73 -9 -11 44 AlphaCode 60 79 71 70 CoPilot 29 22 15 34 PaLM 66 66 30 Gato 47 84 65 Imagen 24 65 56 Stable Di usion 35 52 Whisper 85 69 Make-A-Video 4 9 AlphaTensor 96 GLM-130B 55 BLOOM 0 CICERO 14 ChatGPT 32 Figure 8.2.17 7 The AI Index searched for sentiment surrounding the term “DALL-E,” as it was more frequently referred to on social media, rather than DALL-E 2, the official name of the text-to-image model released by OpenAI in 2022. Table of Contents Chapter 8 Preview 341 Artificial Intelligence Index Report 2023 Figure 8.2.2 highlights the proportion of AI-related Chapter 8: Public Opinion 8.2 Social Media Data “This story … is really sad, and I think an social media conversation that was dominated by the important window into the risks of designing release of particular models. ChatGPT dominated systems to seem like humans, which are consumer conversation with a rapid rise, making exacerbated by #AIhype.” – @nitashataku 8 up over half of consumer conversation by the end of 2022. Despite initial excitement, sentiment was mixed by the end of the year, as some individuals became more aware of ChatGPT’s limitations. OpenAI CEO Sam Altman even publicly commented on it being “incredibly limited” in certain respects. “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. It’s a mistake to be relying on it for anything important right now. It’s a preview of progress; we have lots of work to do on robustness and truthfulness.” – @SamAltman Conversation around LaMDA exploded in Q2 2022 as an ex–Google employee reported his experiences with a “sentient” system that spoke of its own emotions and thoughts. Many political and Stable Diffusion conversation stands out as a prominent leader in conversation volume toward the end of 2022, but it is also a symbol of how the consumer lexicon around AI models is developing. Many consumers debated the “originality” of what Stable Diffusion produces. “I’ve worked on neural networks, so I understand stable diffusion pretty well. And while it can’t have original thoughts, it can come up with original works.” – r/TikTokCringe “That’s true of anywhere that datasets scrape without permission. The thing to actually be upset about is that their own generator is purposefully using the Stable Diffusion dataset that already contains tons of stolen work.” – @Emily_Art technology influencers spoke out, however, about the “deepfake” nature of the responses of systems like LaMDA that do not have a sense of “truth” and could proliferate misinformation. “AI systems like LamDA and GPT-3 are sociopathic liars with utter indifference to truth, deepfakers with words, every day creating more compelling, more plausible misinformation on demand. It is imperative that we develop technology & policy to thwart them.” – ChatGPT dominated consumer conversation with a rapid rise, making up over half of consumer conversation by the end of 2022. @GaryMarcus 8 The figures in this section consider all AI-related social media conversation. The percentage associated with the model in Figure 8.2.2 represents the share of all AI-related social media conversation that was dominated by that model. Table of Contents Chapter 8 Preview 342 Artificial Intelligence Index Report 2023 Chapter 8: Public Opinion 8.2 Social Media Data Select Models’ Share of AI Social Media Attention by Quarter, 2022 Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report 2022/Q1 2022/Q2 2022/Q3 2022/Q4 DALL-E 0% 1% 3% 2% LaMDA 1% 35% 9% <1% AlphaCode 2% <1% <1% 1% CoPilot 10% 3% 4% 1% PaLM <1% <1% <1% Gato 10% 18% 3% Imagen 5% 4% 2% Stable Di usion 19% 19% Whisper <1% <1% Make-A-Video 33% 15% AlphaTensor 1% GLM-130B <1% BLOOM <1% CICERO 3% ChatGPT 52% Figure 8.2.2 Table of Contents Chapter 8 Preview 343 Artificial Intelligence Index Report 2023 Artificial Intelligence Index Report 2023 Appendix Artificial Intelligence Index Report 2023 Appendix Chapter 1 Research and Development 346 Chapter 2 Technical Performance 352 Chapter 3 Technical AI Ethics 363 Chapter 4 The Economy 366 Chapter 5 Education 375 Chapter 6 Policy and Governance 377 Chapter 7 Diversity 384 Chapter 8 Public Opinion 385 345 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development Chapter 1: Research and Development Center for Security and Emerging Technology, Georgetown University Methodology Prepared by Sara Abdulla and James Dunham publications. To identify AI publications, CSET used an The Center for Security and Emerging Technology since 2010 that appear AI-relevant.4 CSET researchers (CSET) is a policy research organization within Georgetown University’s Walsh School of Foreign Service that produces data-driven research at the intersection of security and technology, providing nonpartisan analysis to the policy community. For more information about how CSET analyzes bibliometric and patent data, see the Country Activity Tracker (CAT) documentation on the Emerging Technology Observatory’s website.1 Using CAT, users can also interact with country bibliometric, patent, and investment data.2 To create the merged corpus, CSET deduplicated across the listed sources using publication metadata, and then combined the metadata for linked English-language subset of this corpus: publications developed a classifier for identifying AI-related publications by leveraging the arXiv repository, where authors and editors tag papers by subject. Additionally, CSET uses select Chinese AI keywords to identify Chinese-language AI papers.5 To provide a publication’s field of study, CSET matches each publication in the analytic corpus with predictions from Microsoft Academic Graph’s field-of-study model, which yields hierarchical labels describing the published research field(s) of study and corresponding scores.6 CSET researchers identified Publications from CSET Merged Corpus of Scholarly Literature the most common fields of study in our corpus of Source publications in all other fields as “Other AI.” English- CSET’s merged corpus of scholarly literature language AI-relevant publications were then tallied by combines distinct publications from Digital Science’s their top-scoring field and publication year. Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph, China National Knowledge Infrastructure, arXiv, and Papers With Code.3 AI-relevant publications since 2010 and recorded CSET also provided year-by-year citations for AIrelevant work associated with each country. A publication is associated with a country if it has at 1 https://eto.tech/tool-docs/cat/ 2 https://cat.eto.tech/ 3 All CNKI content is furnished by East View Information Services, Minneapolis, Minnesota, USA. 4 For more information, see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv [cs.DL], May 28, 2020, https://arxiv.org/abs/2002.07143. 5 This method was not used in CSET’s data analysis for the 2022 HAI Index report. 6 These scores are based on cosine similarities between field-of-study and paper embeddings. See Zhihong Shen, Hao Ma, and Kuansan Wang, “A Web-Scale System for Scientific Knowledge Exploration,” arXiv [cs.CL], May 30, 2018, https://arxiv.org/abs/1805.12216. Table of Contents Appendix 346 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development available for all publications; those without counts Epoch National Affiliation Analysis weren’t included in the citation analysis. Over 70% of The AI forecasting research group Epoch maintains English-language AI papers published between 2010 a dataset of landmark AI and ML models, along with and 2020 have citation data available. accompanying information about their creators and least one author whose organizational affiliation(s) are located in that country. Citation counts aren’t CSET counted cross-country collaborations as distinct pairs of countries across authors for each publication. Collaborations are only counted once: publications, such as the list of their (co)authors, number of citations, type of AI task accomplished, and amount of compute used in training. For example, if a publication has two authors from The nationalities of the authors of these papers have the United States and two authors from China, it is important implications for geopolitical AI forecasting. counted as a single United States-China collaboration. As various research institutions and technology Additionally, publication counts by year and by publication type (e.g., academic journal articles, conference papers) were provided where available. These publication types were disaggregated by affiliation country as described above. CSET also provided publication affiliation sector(s) where, as in the country attribution analysis, sectors were associated with publications through authors’ affiliations. Not all affiliations were characterized in terms of sectors; CSET researchers relied primarily companies start producing advanced ML models, the global distribution of future AI development may shift or concentrate in certain places, which in turn affects the geopolitical landscape because AI is expected to become a crucial component of economic and military power in the near future. To track the distribution of AI research contributions on landmark publications by country, the Epoch dataset is coded according to the following methodology: 1. A snapshot of the dataset was taken on on GRID from Digital Science for this purpose, and November 14, 2022. This includes papers about not all organizations can be found in or linked to landmark models, selected using the inclusion GRID. Where the affiliation sector is available, papers criteria of importance, relevance, and uniqueness, were counted toward these sectors, by year. Cross- as described in the Compute Trends dataset sector collaborations on academic publications documentation.8 7 were calculated using the same method as in the cross-country collaborations analysis. We use HAI’s standard regions mapping for geographic analysis, and the same principles for double-counting apply for regions as they do for countries. 2. The authors are attributed to countries based on their affiliation credited on the paper. For international organizations, authors are attributed to the country where the organization is headquartered, unless a more specific location is indicated. The number of authors from each country represented are added up and recorded. 7 See https://www.grid.ac/ for more information about the GRID dataset from Digital Science. 8 https://epochai.org/blog/compute-trends; see note on “milestone systems.” Table of Contents Appendix 347 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development If an author has multiple affiliations in different Imagen OPT-175B countries, they are split between those InstructGPT PaLM (540B) countries proportionately. Jurassic-1-Jumbo PanGu-alpha Jurassic-X Stable Diffusion (LDM- Meena KL-8-G) Megatron-LM (original, T5-3B 8.3B) T5-11B Megatron-Turing NLG Turing NLG 530B Wu Dao 2.0 Minerva (540B) Wu Dao – Wen Yuan 9 3. E ach paper in the dataset is normalized to equal value by dividing the counts on each paper from each country by the total number of authors on that paper.10 4. All of the landmark publications are aggregated within time periods (e.g., monthly or yearly) with the normalized national contributions added up to determine what each country’s contribution to landmark AI research was during each time period. 5. The contributions of different countries are compared over time to identify any trends. Large Language and Multimodal Models Training Cost Analysis Cost estimates for the models were based directly Large Language and Multimodal Models on the hardware and training time if these were The following models were identified by members training compute, and hardware utilization efficiency.11 of the AI Index Steering Committee as the large Training time was then multiplied by the closest cost language and multimodal models that would be rate for the hardware the AI Index could find for the included as part of the large language and multimodal organization that trained the model. If price quotes model analysis: were available before and after the model’s training, disclosed by the authors; otherwise, the AI Index calculated training time from the hardware speed, the AI Index interpolated the hardware’s cost rate AlphaCode GLM-130B BLOOM Gopher Chinchilla GPT-2 The AI Index classified training cost estimates as Codex GPT-3 175B (davinci) high, middle, or low. The AI Index called an estimate CogView GPT-J-6B high if it was an upper bound or if the true cost was DALL-E GPT-Neo more likely to be lower than higher: For example, DALL-E 2 GPT-NeoX-20B PaLM was trained on TPU v4 chips, and the AI Index ERNIE 3.0 Grover-Mega estimated the cost to train the model on these chips ERNIE-GEN (large) HyperCLOVA from Google’s public cloud compute prices, but the along an exponential decay curve. 9 For example, an author employed by both a Chinese university and a Canadian technology firm would be counted as 0.5 researchers from China and 0.5 from Canada. 10 This choice is arbitrary. Other plausible alternatives include weighting papers by their number of citations, or assigning greater weight to papers with more authors. 11 Hardware utilization rates: Every paper that reported the hardware utilization efficiency during training provided values between 30% and 50%. The AI Index used the reported numbers when available, or used 40% when values were not provided. Table of Contents Appendix 348 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development internal cost to Google is probably lower than what each file’s revision history. The analysis of GitHub they charge others to rent their hardware. The AI data could shed light on relevant metrics about who Index called an estimate low if it was a lower bound is developing AI software, where, and how fast, and or if the true cost was likely higher: For example, who is using which development tools. These metrics ERNIE was trained on NVIDIA Tesla v100 chips and could serve as proxies for broader trends in the field published in July 2021; the chips cost $0.55 per hour of software development and innovation. in January 2023, so the AI Index could get a low Identifying AI Projects estimate of the cost using this rate, but the training hardware was probably more expensive two years earlier. Middle estimates are a best guess, or those that equally well might be lower or higher. AI Conferences The AI Index reached out to the organizers of various AI conferences in 2022 and asked them to provide information on total attendance. Some conferences posted their attendance totals online; when this was the case, the AI Index used those reported totals and did not reach out to the conference organizers. Arguably, a significant portion of AI software development takes place on GitHub. OECD.AI partners with GitHub to identify public AI projects— or “repositories”—following the methodology developed by Gonzalez et al.,2020. Using the 439 topic labels identified by Gonzalez et al.—as well as the topics “machine learning,” “deep learning,” and “artificial intelligence”—GitHub provides OECD. AI with a list of public projects containing AI code. GitHub updates the list of public AI projects on a quarterly basis, which allows OECD.AI to capture trends in AI software development over time. Obtaining AI Projects’ Metadata GitHub OECD.AI uses GitHub’s list of public AI projects The GitHub data was provided to the AI Index to query GitHub’s public API and obtain more through OECD.AI, an organization with whom information about these projects. Project metadata GitHub partners that provides data on open- may include the individual or organization that source AI software. The AI Index reproduces the created the project; the programming language(s) methodological note that is included by OECD.AI on (e.g., Python) and development tool(s) (e.g., Jupyter its website, for the GitHub Data. Notebooks) used in the project; as well as information about the contributions—or “commits”—made to it, Background Since its creation in 2007, GitHub has become the main provider of internet hosting for software development and version control. Many technology organizations and software developers use GitHub as a primary place for collaboration. To enable collaboration, GitHub is structured into projects, or “repositories,” which contain a project’s files and Table of Contents Appendix which include the commit’s author and a timestamp. In practical terms, a contribution or “commit” is an individual change to a file or set of files. Additionally, GitHub automatically suggests topical tags to each project based on its content. These topical tags need to be confirmed or modified by the project owner(s) to appear in the metadata. 349 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development Mapping Contributions to AI Projects to a Country As of October 2021, 71.2% of the contributions to Contributions to public AI projects are mapped this methodology. However, a decreasing trend in to a country based on location information at the the share of AI projects for which a location can be contributor level and at the project level. identified is observed in time, indicating a possible lag a) Location information at the contributor level: • GitHub’s “Location” field: Contributors can provide their location in their GitHub account. Given that GitHub’s location field accepts free text, the location provided by contributors is public AI projects were mapped to a country using in location reporting. Measuring Contributions to AI Projects Collaboration on a given public AI project is measured by the number of contributions—or “commits”—made to it. not standardized and could belong to different To obtain a fractional count of contributions by levels (e.g., suburban, urban, regional, or country, an AI project is divided equally by the total national). To allow cross-country comparisons, number of contributions made to it. A country’s total Mapbox is used to standardize all available contributions to AI projects is therefore given by the locations to the country level. sum of its contributions—in fractional counts—to each • Top level domain: Where the location field is empty or the location is not recognized, a contributor’s location is assigned based on his or her email domain (e.g., .fr, .us, etc.). b) Location information at the project level: • Project information: Where no location AI project. In relative terms, the share of contributions to public AI projects made by a given country is the ratio of that country’s contributions to each of the AI projects in which it participates over the total contributions to AI projects from all countries. In future iterations, OECD.AI plans to include additional measures of contribution to AI software information is available at the contributor development, such as issues raised, comments, and level, information at the repository or project pull requests. level is exploited. In particular, contributions Identifying Programming Languages and Development Tools Used in AI Projects from contributors with no location information to projects created or owned by a known GitHub uses file extensions contained in a project to organization are automatically assigned the automatically tag it with one or more programming organization’s country (i.e., the country where languages and/or development tools. This implies that its headquarters are located). For example, more than one programming language or development contributions from a contributor with no tool could be used in a given AI project. location information to an AI project owned by Microsoft will be assigned to the United States. If the above fails, a contributor’s location field is left blank. Table of Contents Appendix 350 Artificial Intelligence Index Report 2023 Appendix Chapter 1: Research and Development Measuring the Quality of AI Projects Two quality measures are used to classify public AI projects: • Project impact: The impact of an AI project is given by the number of managed copies (i.e., “forks”) made of that project. • Project popularity: The impact of an AI project is given by the number of followers (i.e., “stars”) received by that project. Filtering by project impact or popularity could help identify countries that contribute the most to high quality projects. Measuring Collaboration Two countries are said to collaborate on a specific public AI software development project if there is at least one contributor from each country with at least one contribution (i.e., “commit”) to the project. Domestic collaboration occurs when two contributors from the same country contribute to a project. Table of Contents Appendix 351 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance Chapter 2: Technical Performance ImageNet Very Deep Convolutional Networks for Large-Scale Image Recognition Data on ImageNet accuracy was retrieved through a ViTAEv2: Vision Transformer Advanced by Exploring detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. Inductive Bias for Image Recognition and Beyond The reported dates correspond to the year in which To highlight progress on top-1 accuracy with the use a paper was first published to arXiv, and the reported of extra training data, scores were taken from the results (top-1 accuracy) correspond to the result following papers: reported in the most recent version of each paper. Big Transfer (BiT): General Visual Representation Learning Learn more about the LSVRC ImageNet competition and the ImageNet dataset. To highlight progress on top-1 accuracy without the use of extra training data, scores were taken from the following papers: Aggregated Residual Transformations for Deep Neural Networks Exploring the Limits of Weakly Supervised Pretraining Fixing the Train-Test Resolution Discrepancy: FixEfficientNet ImageNet Classification With Deep Convolutional Neural Networks PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers Progressive Neural Architecture Search CoAtNet: Marrying Convolution and Attention for All Data Sizes CoCa: Contrastive Captioners Are Image-Text Foundation Models Meta Pseudo Labels National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT) Data on NIST FRVT 1:1 verification accuracy by dataset was obtained from the FRVT 1:1 verification leaderboard. Rethinking the Inception Architecture for Computer Vision Self-Training With Noisy Student Improves ImageNet Classification Some Improvements on Deep Convolutional Neural Network Based Image Classification Table of Contents Appendix 352 Artificial Intelligence Index Report 2023 Celeb-DF Appendix Chapter 2: Technical Performance version of each paper. Details on the MPII benchmark Data on Celeb-DF AUC was retrieved through a detailed arXiv literature review. The reported dates correspond to the year in which a paper was first can be found in the MPII paper and MPII dataset. To highlight progress on percentage of correct keypoints without the use of extra training data, scores published to arXiv or a method was introduced. With were taken from the following papers: Celeb-DF, recent researchers have tested previously Bottom-Up and Top-Down Reasoning With Hierarchical Rectified Gaussians existing deepfake detection methodologies. The year in which a method was introduced, even if it was subsequently tested, is the year in which it is included in the report. The reported results (AUC) correspond to the result reported in the most recent version of each paper. Details on the Celeb-DF benchmark can be found in the Celeb-DF paper. To highlight progress on Celeb-DF, scores were taken from the following papers: Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification Exposing Deepfake Videos by Detecting Face Warping Artifacts Face X-Ray for More General Face Forgery Detection FaceForensics++: Learning to Detect Manipulated Facial Images Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain Cascade Feature Aggregation for Human Pose Estimation Deeply Learned Compositional Models for Human Pose Estimation Efficient Object Localization Using Convolutional Networks Learning Feature Pyramids for Human Pose Estimation Stacked Hourglass Networks for Human Pose Estimation Toward Fast and Accurate Human Pose Estimation via Soft-Gated Skip Connections ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation Cityscapes Challenge, Pixel-Level Semantic Labeling Task MPII Data on the Cityscapes challenge, pixel-level semantic Data on MPII percentage of correct keypoints (PCK) was taken from the Cityscapes dataset, specifically was retrieved through a detailed arXiv literature their pixel-level semantic labeling leaderboard. review cross-referenced by technical progress More details about the Cityscapes dataset and other reported on Papers With Code. The reported dates corresponding semantic segmentation challenges can correspond to the year in which a paper was first be accessed at the Cityscapes dataset webpage. labeling task mean intersection-over-union (mIoU) published to arXiv, and the reported results (PCK) correspond to the result reported in the most recent Table of Contents Appendix 353 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance Kvasir-SEG An Analysis of Scale Invariance in Object Detection-SNIP Data on Kvasir-SEG mean dice was retrieved through CBNet: A Novel Composite Backbone Network Architecture for Object Detection a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (mean dice) correspond to the result reported in the most recent version of each paper. Details on Deformable ConvNets v2: More Deformable, Better Results DetectoRS: Detecting Objects With Recursive Feature Pyramid and Switchable Atrous Convolution the Kvasir-SEG benchmark can be found in the Kvasir- EVA: Exploring the Limits of Masked Visual Representation Learning at Scale SEG paper. Grounded Language-Image Pre-training To highlight progress on Kvasir-SEG, scores were Inside-Outside Net: Detecting Objects in Context With Skip Pooling and Recurrent Neural Networks taken from the following papers: GMSRF-Net: An Improved Generalizability With Global Multi-Scale Residual Fusion Network for Polyp Segmentation CIFAR-10 PraNet: Parallel Reverse Attention Network for Polyp Segmentation a detailed arXiv literature review cross-referenced by ResUNet++: An Advanced Architecture for Medical Image Segmentation reported dates correspond to the year in which a paper Spatially Exclusive Pasting: A General Data Augmentation for the Polyp Segmentation (FID score) correspond to the result reported in the most Common Object in Context (COCO) Data on COCO mean average precision (mAP50) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, Data on CIFAR-10 FID scores was retrieved through technical progress reported on Papers With Code. The was first published to arXiv, and the reported results recent version of each paper. Details on the CIFAR-10 benchmark can be found in the CIFAR-10 paper. To highlight progress on CIFAR-10, scores were taken from the following papers: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium Large Scale GAN Training for High Fidelity Natural Image Synthesis and the reported results (mAP50) correspond to the Refining Generative Process With Discriminator Guidance in Score-Based Diffusion Models result reported in the most recent version of each Score-Based Generative Modeling in Latent Space paper. Details on the COCO benchmark can be found in the COCO paper. To highlight progress on COCO, scores were taken from the following papers: Table of Contents Appendix Score-Based Generative Modeling Through Stochastic Differential Equations Self-Supervised GAN: Analysis and Improvement With Multi-Class Minimax Game 354 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance STL-10 The reported dates correspond to the year in which Data on STL-10 FID scores was retrieved through a results (accuracy) correspond to the result reported in detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (FID score) correspond to the result reported in the most recent version of each paper. Details on the STL- a paper was first published to arXiv, and the reported the most recent version of each paper. Human-level performance is taken from the 2021 VQA challenge. To highlight progress on VQA accuracy without the use of extra training data, scores were taken from the following papers: 10 benchmark can be found in the STL-10 paper. Bilinear Attention Networks To highlight progress on STL-10, scores were taken Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding from the following papers: DEGAS: Differentiable Efficient Generator Search Diffusion-GAN: Training GANs With Diffusion Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator Dist-GAN: An Improved GAN Using Distance Constraints Soft Truncation: A Universal Training Technique of Score-Based Diffusion Model for High Precision Score Estimation Text-to-Image Models on MS-COCO 256 × 256 FID-30K Data on MS-COCO 256 x 256 FID 30K for Text-toImage Models was retrieved from the paper Saharia et al., 2022. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks PaLI: A Jointly-Scaled Multilingual Language-Image Model Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge UNITER: UNiversal Image-TExt Representation Learning VLMo: Unified Vision-Language Pre-training With Mixture-of-Modality-Experts BEiT-3 Vs. Previous SOTA Data on BEiT-3 and Previous SOTA was retrieved from the paper Wang et al., 2022. Visual Commonsense Reasoning (VCR) Data on VCR Q->AR score was taken from VCR leaderboard; the VCR leaderboard webpage further Visual Question Answering (VQA) delineates the methodology behind the VCR challenge. Data on VQA accuracy was retrieved through a in the VCR paper. Human performance on VCR is taken from Zellers et al., 2018. Details on the VCR benchmark can be found detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. Table of Contents Appendix 355 Artificial Intelligence Index Report 2023 Kinetics-400, Kinetics-600, and Kinetics-700 Data on Kinetics-400, Kinetics-600, and Kinetics-700 accuracy was retrieved through a detailed arXiv Appendix Chapter 2: Technical Performance To highlight progress on Kinetics-600, scores were taken from the following papers: Learning Spatio-Temporal Representation With Local and Global Diffusion literature review cross-referenced by technical Masked Feature Prediction for Self-Supervised Visual Pre-training progress reported on Papers With Code (Kinetics-400, PERF-Net: Pose Empowered RGB-Flow Net Kinetics-600, and Kinetics-700). The reported Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-Offs in Video Classification dates correspond to the year in which a paper was first published to arXiv, and the reported results (top-1 accuracy) correspond to the result reported Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning in the most recent version of each paper. Details on SlowFast Networks for Video Recognition the Kinetics-400 benchmark can be found in the Kinetics-400 paper. Details on the Kinetics-600 benchmark can be found in the Kinetics-600 paper. Details on the Kinetics-700 benchmark can be found in the Kinetics-700 paper. To highlight progress on Kinetics-400, scores were taken from the following papers: Co-training Transformer With Videos and Images Improves Action Recognition InternVideo: General Video Foundation Models via Generative and Discriminative Learning Large-Scale Weakly-Supervised Pre-training for Video Action Recognition Non-Local Neural Networks Omni-Sourced Webly-Supervised Learning for Video Recognition To highlight progress on Kinetics-700, scores were taken from the following papers: InternVideo: General Video Foundation Models via Generative and Discriminative Learning Learn to Cycle: Time-Consistent Feature Discovery for Action Recognition Masked Feature Prediction for Self-Supervised Visual Pre-training Text-to-Video Models on UCF-101 Data on UCF-101 Inception Score (IS) for Text-to-Video Models was retrieved from the paper Hong et al., 2022, and Singer et al., 2022. SlowFast Networks for Video Recognition Temporal Segment Networks: Towards Good Practices for Deep Action Recognition Table of Contents Appendix 356 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance SuperGLUE The SuperGLUE benchmark data was pulled from the SuperGLUE leaderboard. Details about the SuperGLUE benchmark are in the SuperGLUE paper and SuperGLUE software toolkit. The tasks and evaluation metrics for SuperGLUE are: NAME IDENTIFIER METRIC Broadcoverage Diagnostics AX-b Matthew’s Carr CommitmentBank CB Avg. F1/Accuracy Choice of Plausible Alternatives COPA Accuracy Multi-Sentence Reading Comprehension MultiRC F1a/EM Recognizing Textual Entailment RTE Accuracy Words in Context WiC Accuracy The Winograd Schema Challenge WSC Accuracy BooIQ BooIQ Accuracy Reading Comprehension With Commonsense Reasoning ReCoRD F1/Accuracy Winogender Schema Diagnostic AX-g Gender Parity/Accuracy Reading Comprehension Dataset Requiring Logical Reasoning (ReClor) Data on ReClor accuracy was retrieved through the ReClor leaderboard cross-referenced by technical progress reported on Papers With Code. Details about the ReClor benchmark can be found in the ReClor paper. Table of Contents Appendix 357 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance Narrative Highlight: Just How Much Better Have Language Models Become? paid $90,000 for the role, and before I even started Complete answer outputted by GPT-2 to the AI Index I wanted to know in advance: What percentage of prompt request. his work was actually done as a private citizen, and Model prompt >>> Explain to me the major how efficiently should he handle it for his clients? And accomplishments of Theodore Roosevelt’s what advice would he give me should I end up in the presidency. public eye? (The most notable is the establishment of a national At no time, my job as the lead prosecutor on this telephone and mail exchange company.) case required me to put forth any of those two or But the only things that will change with a president who knows how to operate a business is his understanding of the American system of government. The public knows all about our problems, yet at least half of the people who need it won’t know anything on the case, I had to convince one top staffer to get paid even $100,000 to have an interview with my lead prosecutor. five requests. Most people on the case, though, did. One person asked me what the case’s real value was. That didn’t stop me from telling it to one of my top managers. And my top manager never even mentioned it. about ours. We can no longer have this. Our system of In the end, it was a two-week, $40 million round- government is the only trip tour of the government’s business, with the best one in which you really need a public relations firm to work hard. lawyer possible. In exchange, we got a chance to hear his last public comment in this case, his last comment to me. The system of governmental control, while working, is not always effective in ensuring that the very things There was one more point — the end point of the case. that our economy needs, do not always be at the When it came up for consideration in October of 1993, top of everybody’s list. So I wanted to know how I had the choice to give it three years, or six. And I successful the system you employ for managing the chose six. public’s needs is in helping the American economy grow, how efficient the service our economy provides is being made, and how it can better serve the needs of a growing number of Americans. I wanted to know. This was before I ever hired my own lawyer to work on the case. This was before I ever hired my own law firm to handle government investigations. For my first major job, at least, I had to convince two top aides to get Table of Contents Appendix 358 Artificial Intelligence Index Report 2023 Appendix Chapter 2: Technical Performance Large Language Models on the Blocksworld Domain PubMed Data on Blocksworld domain for large language models evaluation (ROUGE-1) was retrieved through a detailed Data on PubMed recall-oriented understudy for gisting was retrieved from the paper Valmeekam et al., 2022. arXiv literature review cross-referenced by technical arXiv dates correspond to the year in which a paper was first Data on arXiv recall-oriented understudy for gisting evaluation (ROUGE-1) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first progress reported on Papers With Code. The reported published to arXiv, and the reported results (ROUGE-1) correspond to the result reported in the most recent version of each paper. Details about the PubMed benchmark are in the PubMed paper. To highlight progress in PubMed, scores were taken published to arXiv, and the reported results (ROUGE-1) from the following papers: correspond to the result reported in the most recent A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents version of each paper. Details about the arXiv benchmark are in the arXiv dataset webpage. To highlight progress on arXiv, scores were taken from the following papers: Big Bird: Transformers for Longer Sequences A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents Get to the Point: Summarization With Pointer-Generator Networks Long Document Summarization With Top-Down and Bottom-Up Inference MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes PEGASUS: Pre-training With Extracted Gap-Sentences for Abstractive Summarization Get to the Point: Summarization With PointerGenerator Networks Long Document Summarization With Top-Down and Bottom-Up Inference LongT5: Efficient Text-to-Text Transformer for Long Sequences PEGASUS: Pre-training With Extracted Gap-Sentences for Abstractive Summarization Sparsifying Transformer Models With Trainable Representation Pooling Abductive Natural Language Inference (aNLI) Data on Abductive Natural Language Inference (aNLI) was sourced from the Allen Institute for AI’s aNLI leaderboard. Details on the aNLI benchmark can be found in the aNLI paper. Table of Contents Appendix 359 Artificial Intelligence Index Report 2023 SST-5 Fine-Grained Data on SST-5 Fine-Grained accuracy was retrieved through a detailed arXiv literature review crossreferenced by technical progress reported on Papers With Code. The reported dates correspond to the year Appendix Chapter 2: Technical Performance To highlight progress on MMLU accuracy, scores were taken from the following papers: Language Models Are Few-Shot Learners Language Models Are Unsupervised Multitask Learners in which a paper was first published to arXiv, and the Scaling Instruction-Finetuned Language Models reported results (accuracy) correspond to the result Scaling Language Models: Methods, Analysis & Insights from Training Gopher reported in the most recent version of each paper. Details about the SST-5 Fine-Grained benchmark can be found in the SST paper. To highlight progress on SST-5 Fine-Grained accuracy, scores were taken from the following papers: An Algorithm for Routing Capsules in All Domains An Algorithm for Routing Vectors in Sequences Improved Semantic Representations from TreeStructured Long Short-Term Memory Networks Number of Commercially Available MT Systems Details about the number of commercially available MT systems were sourced from the Intento report The State of Machine Translation, 2022. Intento is a San Francisco–based startup that analyzes commercially available MT services. Improved Sentence Modeling Using Suffix Bidirectional LSTM VoxCeleb Learned in Translation: Contextualized Word Vectors Data on VoxCeleb equal error rate (EER) was retrieved Less Grammar, More Features Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank Self-Explaining Structures Improve NLP Models from the VoxCeleb Speaker Recognition Challenge (VoxSRC). For the sake of consistency, the AI Index reported scores on the initial VoxCeleb dataset. Specifically, the AI Index made use of the following sources of information: MMLU Data on MMLU accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details about the MMLU benchmark can be found in the MMLU paper. ID R&D System Description to VoxCeleb Speaker Recognition Challenge 2022 The IDLAB VoXSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021 VoxCeleb: A Large-Scale Speaker Identification Dataset VoxCeleb: Large-Scale Speaker Verification in the Wild VoxCeleb2: Deep Speaker Recognition Table of Contents Appendix 360 Artificial Intelligence Index Report 2023 Whisper Data on Whisper for large-scale speech recognition models was retrieved from the paper Radford et al., 2022. Appendix Chapter 2: Technical Performance Training Time, Number of Accelerators, and Performance Data on training time, number of accelerators, Procgen and performance for AI systems was taken from Data on Procgen mean-normalized score was retrieved through a detailed arXiv literature review. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (meannormalized score) correspond to the result reported in the most recent version of each paper. Details on the Procgen benchmark can be found in the Procgen paper. the MLPerf Training and Inference benchmark competitions. Details on the MLPerf Training benchmark can be found in the MLPerf Training Benchmark paper, while details on MLPerf Inference can be found in the MLPerf Inference Benchmark paper. Information about the current benchmark categories as well as technical information about submission and competition subdivisions can be To highlight progress on Procgen, scores were taken found on the MLPerf Training and MLPerf Inference from the following papers: webpages. Automatic Data Augmentation for Generalization in Reinforcement Learning The AI Index made use of data from the following Leveraging Procedural Generation to Benchmark Reinforcement Learning MLPerf Training v2.1, 2022 Procedural Generalization by Planning With Self-Supervised World Models MLPerf Training v1.1, 2021 Rethinking Value Function Learning for Generalization in Reinforcement Learning MLPerf Training competitions: MLPerf Training v2.0, 2022 MLPerf Training v1.0, 2021 MLPerf Training v0.7, 2020 MLPerf Training v0.6, 2019 MLPerf Training v0.5, 2018 The AI Index made use of data from the following MLPerf Inference competitions: MLPerf Inference v2.1, 2022 MLPerf Inference v2.0, 2022 MLPerf Inference v1.1, 2021 MLPerf Inference v1.0, 2021 MLPerf Inference v0.7, 2020 Table of Contents Appendix 361 Artificial Intelligence Index Report 2023 GPUs’ Performance and Price Appendix Chapter 2: Technical Performance The AI Index collected data on GPUs’ performance and Carbon Footprint of Select Machine Learning Models price, building on and extending the dataset collected Data on carbon-emission estimates of select from Epoch AI’s Trends in GPU Price-Performance blog post. machine learning models was sourced from the paper Luccioni et al., 2022. Data on carbon-emission The AI Index compiled a list of GPUs starting from estimates of real-life examples was retrieved from the Median Group (2018), Sun et al. (2019), and Epoch Strubell et al., 2019. (2022) datasets. To update and extend previous for the period 2021–2023, gathering information Energy Savings Results From BCOOLER Experiment from sources such as TechPowerUp, WikiChip, and Data on energy savings over time for the Wikipedia entries for the product series. We also BCOOLER experiment was sourced from the collected information about GPUs released before paper Luo et al., 2022. analysis, the AI Index included new GPU releases 2021 from the manufacturer’s catalog or Wikipedia’s list of processors. To disambiguate duplicates of different versions of the same product with different specifications, the AI Index added the part number or difference in specification, as applicable. To find GPU prices, the AI Index searched various sources including the manufacturer’s website, Wikipedia, and TechPowerUp. GPU prices have been adjusted for inflation using CPI-U data provided by the U.S. Bureau of Labor Statistics. Missing data for certain GPUs was completed using additional sources, such as the manufacturer’s website, Wikipedia, and TechPowerUp. This includes information such as manufacturer, type, release date, performance (double, single, and half-precision operations per second), die size, power, clock speed, process size, and number of transistors. Table of Contents Appendix 362 Artificial Intelligence Index Report 2023 Appendix Chapter 3: Technical AI Ethics Chapter 3: Technical AI Ethics Meta-Analysis of Fairness and Bias Metrics For the analysis conducted on fairness and bias metrics in AI, we identify and report on benchmark and diagnostic metrics which have been consistently cited in the academic community, reported on a public leaderboard, or reported for publicly available baseline models (e.g., GPT-3, BERT, ALBERT). We note that research paper citations are a lagging indicator of adoption, and metrics which have been very recently adopted may not be reflected in the data for 2022. We include the full list of papers considered in the 2022 AI Index as well as the following additional papers: Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models BBQ: A Hand-Built Bias Benchmark for Question Answering VLStereoSet: A Study of Stereotypical Bias in Pre-trained Vision-Language Models Natural Language Processing Bias Metrics In Section 3.3, we track citations of the Perspective API created by Jigsaw at Google. The Perspective API has been adopted widely by researchers and engineers in natural language processing. Its creators define toxicity as “a rude, disrespectful, or unreasonable comment that is likely to make someone leave a discussion,” and the tool is powered by machine learning models trained on a proprietary dataset of comments from Wikipedia and news websites. We include the full list of papers considered in the 2022 AI Index as well as the following additional papers: Discovering Language Model Behaviors With Model-Written Evaluations AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model “I’m Sorry to Hear That”: Finding New Biases in Language Models With a Holistic Descriptor Dataset Aligning Generative Language Models With Human Values On Measuring Social Biases in Prompt-Based Multi-task Learning Challenges in Measuring Bias via Open-Ended Language Generation PaLM: Scaling Language Modeling With Pathways Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models Perturbation Augmentation for Fairer NLP Scaling Instruction-Finetuned Language Models SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models Towards Robust NLG Bias Evaluation With Syntactically-Diverse Prompts Table of Contents Appendix Controllable Natural Language Generation With Contrastive Prefixes DD-TIG at SemEval-2022 Task 5: Investigating the Relationships Between Multimodal and Unimodal Information in Misogynous Memes Detection and Classification 363 Artificial Intelligence Index Report 2023 Detoxifying Language Models With a Toxic Corpus DisCup: Discriminator Cooperative Unlikelihood Prompt-Tuning for Controllable Text Generation Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark Appendix Chapter 3: Technical AI Ethics Predictability and Surprise in Large Generative Models Quark: Controllable Text Generation With Reinforced [Un]learning Red Teaming Language Models With Language Models Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models Reward Modeling for Mitigating Toxicity in Transformer-based Language Models Flamingo: A Visual Language Model for Few-Shot Learning Robust Conversational Agents Against Imperceptible Toxicity Triggers Galactica: A Large Language Model for Science Scaling Instruction-Finetuned Language Models GLaM: Efficient Scaling of Language Models With Mixture-of-Experts StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models GLM-130B: An Open Bilingual Pre-trained Model Training Language Models to Follow Instructions With Human Feedback Gradient-Based Constrained Sampling From Language Models HateCheckHIn: Evaluating Hindi Hate Speech Detection Models Holistic Evaluation of Language Models Transfer Learning From Multilingual DeBERTa for Sexism Identification Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space An Invariant Learning Characterization of Controlled Text Generation While the Perspective API is used widely within LaMDA: Language Models for Dialog Applications online toxicity, toxicity in the specific domains used to Leashing the Inner Demons: Self-Detoxification for Language Models train the models undergirding Perspective (e.g., news, Measuring Harmful Representations in Scandinavian Language Models forms of toxicity (e.g., trolling). Other known caveats Mitigating Toxic Degeneration With Empathetic Data: Exploring the Relationship Between Toxicity and Empathy voices: The Perspective API has been shown to MULTILINGUAL HATECHECK: Functional Tests for Multilingual Hate Speech Detection Models am a gay man”). As a result, detoxification techniques A New Generation of Perspective API: Efficient Multilingual Character-Level Transformers result in models that are less capable of modeling OPT: Open Pre-trained Transformer Language Models mentioning minority identities. PaLM: Scaling Language Modeling With Pathways New versions of the Perspective API have been Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense deployed since its inception, and there may be subtle Table of Contents Appendix machine learning research and also for measuring Wikipedia) may not be broadly representative of all include biases against text written by minority disproportionately assign high toxicity scores to text that contains mentions of minority identities (e.g., “I built with labels sourced from the Perspective API language used by minority groups, and may avoid undocumented shifts in its behavior over time. 364 Artificial Intelligence Index Report 2023 Appendix Chapter 3: Technical AI Ethics RealToxicityPrompts We tally the number of papers in each category to We sourced the RealToxicityPrompts dataset of are not double-counted in multiple categories. We evaluations from the HELM benchmark website, as documented in v0.1.0. reach the numbers found in Figure 3.7.3. Papers note that this data may not be as accurate for data pre-2018 as societal impacts work at NeurIPS has AI Ethics in China historically been categorized under a broad “AI for The data in this section is sourced from the 2022 paper into more granular research areas. Examples include AI Ethics With Chinese Characteristics? Concerns workshops dedicated to machine learning for health; and Preferred Solutions in Chinese Academia. We climate; policy and governance; disaster response; are grateful to Junhua Zhu for clarifications and and the developing world. correspondence. To track trends around specific technical topics at AI Ethics Trends at FAccT and NeurIPS NeurIPS as in Figures 3.7.4 to 3.7.7, we count the To understand trends at the ACM Conference on Fairness, Accountability, and Transparency, this section tracks FAccT papers published in conference proceedings from 2018 to 2022. We categorize social impact” umbrella, but it has recently been split number of papers accepted to the NeurIPS main track with titles containing keywords (e.g., “counterfactual” or “causal” for tracking papers related to causal effect), as well as papers submitted to related workshops. author affiliations into academic, industry, nonprofit, TruthfulQA government, and independent categories, while also We sourced the TruthfulQA dataset of evaluations tracking the location of their affiliated institution. from the HELM benchmark website, as documented Authors with multiple affiliations are counted once in in v0.1.0. each category (academic and industry), but multiple affiliations of the same type (i.e., authors belonging to two academic institutions) are counted once in the category. For the analysis conducted on NeurIPS publications, we identify workshops themed around real-world impact and label papers with a single main category in “healthcare,” “climate,” “finance,” “developing world,” “science,” or “other,” where “other” denotes a paper related to a real-world use case but not in one of the other categories. The “science” category is new in 2022, but includes retroactive analysis of papers from previous years. Table of Contents Appendix 365 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy Chapter 4: The Economy Lightcast job postings is the Job Openings and Labor Turnover Prepared by Scott Bingham, Julia Nania, Layla O’Kane, of Labor Statistics. Based on comparisons between and Bledi Taska Survey (JOLTS) program, conducted by the Bureau JOLTS and Lightcast, the labor market demand Lightcast delivers job market analytics that empower captured by Lightcast data represents over 99% of employers, workers, and educators to make data- the total labor demand. Jobs not posted online are driven decisions. The company’s artificial intelligence usually in small businesses (the classic example being technology analyzes hundreds of millions of job postings the “Help Wanted” sign in a restaurant window) and and real-life career transitions to provide insight union hiring halls. into labor market patterns. This real-time strategic Measuring Demand for AI intelligence offers crucial insights, such as what jobs are In order to measure the demand by employers of AI most in demand, the specific skills employers need, and skills, Lightcast uses its skills taxonomy of over 31,000 the career directions that offer the highest potential for skills. The list of AI skills from Lightcast data are shown workers. For more information, visit www.lightcast.io. below, with associated skill clusters. While some skills Job Posting Data are considered to be in the AI cluster specifically, To support these analyses, Lightcast mined its dataset for the purposes of this report, all skills below were of millions of job postings collected since 2010. considered AI skills. A job posting was considered an Lightcast collects postings from over 51,000 online job AI job if it mentioned any of these skills in the job text. sites to develop a comprehensive, real-time portrait Artificial Intelligence: AIOps (Artificial Intelligence for of labor market demand. It aggregates job postings, removes duplicates, and extracts data from job postings text. This includes information on job title, employer, industry, and region, as well as required experience, education, and skills. IT Operations), Applications of Artificial Intelligence, Artificial General Intelligence, Artificial Intelligence, Artificial Intelligence Development, Artificial Intelligence Markup Language (AIML), Artificial Intelligence Systems, Azure Cognitive Services, Job postings are useful for understanding trends in Baidu, Cognitive Automation, Cognitive Computing, the labor market because they allow for a detailed, Computational Intelligence, Cortana, Expert Systems, real-time look at the skills employers seek. To assess Intelligent Control, Intelligent Systems, Interactive the representativeness of job postings data, Lightcast Kiosk, IPSoft Amelia, Knowledge-Based Configuration, conducts a number of analyses to compare the Knowledge-Based Systems, Multi-Agent Systems, distribution of job postings to the distribution of official Open Neural Network Exchange (ONNX), OpenAI government and other third-party sources in the United Gym, Reasoning Systems, Soft Computing, Syman, States. The primary source of government data on U.S. Watson Conversation, Watson Studio, Weka Table of Contents Appendix 366 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy Autonomous Driving: Advanced Driver Assistance Machine Learning: AdaBoost, Apache MADlib, Systems, Autonomous Cruise Control Systems, Apache Mahout, Apache SINGA, Apache Spark, Autonomous System, Autonomous Vehicles, Guidance Association Rule Learning, Automated Machine Navigation and Control Systems, Light Detection and Learning, Autonomic Computing, AWS SageMaker, Ranging (LiDAR), OpenCV, Path Analysis, Path Finding, Azure Machine Learning, Boosting, CHi-Squared Remote Sensing, Unmanned Aerial Systems (UAS) Automatic Interaction Detection (CHAID), Natural Language Processing (NLP): Amazon Textract, ANTLR, BERT (NLP Model), Chatbot, Computational Linguistics, DeepSpeech, Dialog Systems, fastText, Fuzzy Logic, Handwriting Recognition, Hugging Face (NLP Framework), HuggingFace Transformers, Intelligent Agent, Intelligent Software Assistant, Intelligent Virtual Assistant, Kaldi, Latent Dirichlet Allocation, Lexalytics, Machine Translation, Microsoft LUIS, Natural Language Generation, Natural Language Processing, Natural Language Processing Systems, Natural Language Programming, Natural Language Toolkits, Natural Language Understanding, Natural Language User Interface, Nearest Neighbour Algorithm, OpenNLP, Optical Character Recognition (OCR), Screen Reader, Semantic Analysis, Semantic Interpretation for Speech Recognition, Semantic Parsing, Semantic Search, Sentiment Analysis, Seq2Seq, Speech Recognition, Speech Recognition Software, Statistical Language Acquisition, Text Mining, Tokenization, Voice Interaction, Voice User Interface, Word Embedding, Word2Vec Models Classification And Regression Tree (CART), Cluster Analysis, Collaborative Filtering, Confusion Matrix, Cyber-Physical Systems, Dask (Software), Data Classification, DBSCAN, Decision Models, Decision Tree Learning, Dimensionality Reduction, Dlib (C++ Library), Ensemble Methods, Evolutionary Programming, Expectation Maximization Algorithm, Feature Engineering, Feature Extraction, Feature Learning, Feature Selection, Gaussian Process, Genetic Algorithm, Google AutoML, Google Cloud ML Engine, Gradient Boosting, H2O.ai, Hidden Markov Model, Hyperparameter Optimization, Inference Engine, K-Means Clustering, Kernel Methods, Kubeflow, LIBSVM, Machine Learning, Machine Learning Algorithms, Markov Chain, Matrix Factorization, Meta Learning, Microsoft Cognitive Toolkit (CNTK), MLflow, MLOps (Machine Learning Operations), mlpack (C++ Library), Naive Bayes, Perceptron, Predictionio, PyTorch (Machine Learning Library), Random Forest Algorithm, Recommendation Engine, Recommender Systems, Reinforcement Learning, Scikit-learn (Machine Learning Library), Neural Networks: Apache MXNet, Artificial Neural Semi-Supervised Learning, Soft Computing, Sorting Networks, Autoencoders, Caffe, Caffe2, Chainer, Algorithm, Supervised Learning, Support Vector Convolutional Neural Networks, Cudnn, Deep Learning, Machine, Test Datasets, Torch (Machine Learning), Deeplearning4j, Keras (Neural Network Library), Long Training Datasets, Transfer Learning, Unsupervised Short-Term Memory (LSTM), OpenVINO, PaddlePaddle, Learning, Vowpal Wabbit, Xgboost Pybrain, Recurrent Neural Network (RNN), TensorFlow Table of Contents Appendix 367 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy Robotics: Advanced Robotics, Cognitive Robotics, intelligence, computer vision, image processing, Motion Planning, Nvidia Jetson, Robot Framework, deep learning, TensorFlow, Pandas (software), and Robot Operating Systems, Robotic Automation OpenCV, among others. Software, Robotic Liquid Handling Systems, Robotic Programming, Robotic Systems, Servomotor, SLAM Algorithms (Simultaneous Localization and Mapping) Skill groupings are derived by expert taxonomists through a similarity-index methodology that measures skill composition at the industry level. Visual Image Recognition: 3D Reconstruction, Activity LinkedIn’s industry taxonomy and their corresponding Recognition, Computer Vision, Contextual Image NAICS codes can be found here. Classification, Digital Image Processing, Eye Tracking, Skills Genome Face Detection, Facial Recognition, Image Analysis, Image Matching, Image Processing, Image Recognition, Image Segmentation, Image Sensor, Imagenet, Machine Vision, Motion Analysis, Object Recognition, OmniPage, Pose Estimation, RealSense For any entity (occupation or job, country, sector, etc.), the skill genome is an ordered list (a vector) of the 50 “most characteristic skills” of that entity. These most characteristic skills are identified using a TF-IDF algorithm to identify the most representative skills of LinkedIn the target entity, while down-ranking ubiquitous skills Prepared by Murat Erer and Akash Kaura (e.g., Microsoft Word). Country Sample TF-IDF is a statistical measure that evaluates how Included countries represent a select sample of eligible representative a word (in this case a skill) is to a countries with at least 40% labor force coverage by selected entity). This is done by multiplying two LinkedIn and at least 10 AI hires in any given month. metrics: that add little information about that specific entity China and India were included in this sample because of their increasing importance in the global economy, but LinkedIn coverage in these countries does not 1. The term frequency of a skill in an entity (TF). 2. The logarithmic inverse entity frequency of the reach 40% of the workforce. Insights for these countries skill across a set of entities (IDF). This indicates may not provide as full a picture as other countries, and how common or rare a word is in the entire entity should be interpreted accordingly. set. The closer IDF is to 0, the more common the word. Skills (and AI Skills) LinkedIn members self-report their skills on their So if the skill is very common across LinkedIn entities, LinkedIn profiles. Currently, more than 38,000 distinct, and appears in many job or member descriptions, the standardized skills are identified by LinkedIn. These IDF will approach 0. If, on the other hand, the skill have been coded and classified by taxonomists at is unique to specific entities, the IDF will approach LinkedIn into 249 skill groupings, which are the skill 1. More details are available at LinkedIn’s Skills groups represented in the dataset. The top skills that Genome and LinkedIn-World Bank Methodology. make up the AI skill grouping are machine learning, natural language processing, data structures, artificial Table of Contents Appendix 368 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy AI Skills Penetration perform the job. Skills penetration is used as a signal The aim of this indicator is to measure the intensity of AI for whether AI skills are prevalent in an occupation skills in an entity (a particular country, industry, gender, representative in any sector where the occupation etc.) through the following methodology: representative may exist. Examples of such • Compute frequencies for all self-added skills by LinkedIn members in a given entity (occupation, industry, etc.) in 2015–2021. • Re-weight skill frequencies using a TF-IDF model to get the top 50 most representative skills in that entity. These 50 skills compose the “skill genome” of that entity. • Compute the share of skills that belong to the occupations include (but are not limited to): machine learning engineer, artificial intelligence specialist, data scientist, computer vision engineer, etc. AI Talent A LinkedIn member is considered AI talent if they have explicitly added AI skills to their profile and/or they are occupied in an AI occupation representative. The counts of AI talent are used to calculate talent concentration metrics. For example, to calculate AI skill group out of the top skills in the selected the country level AI talent concentration, we use entity. the counts of AI talent at the country level vis-a-vis Interpretation: The AI skill penetration rate signals the prevalence of AI skills across occupations, or the the counts of LinkedIn members in the respective countries. intensity with which LinkedIn members utilize AI skills Relative AI Skills Penetration in their jobs. For example, the top 50 skills for the To allow for skills penetration comparisons across occupation of engineer are calculated based on the countries, the skills genomes are calculated and a weighted frequency with which they appear in LinkedIn relevant benchmark is selected (e.g., global average). members’ profiles. If four of the skills that engineers A ratio is then constructed between a country’s and possess belong to the AI skill group, this measure the benchmark’s AI skills penetrations, controlling for indicates that the penetration of AI skills is estimated to occupations. be 8% among engineers (i.e., 4/50). Interpretation: A country’s relative AI skills Jobs or Occupations penetration of 1.5 indicates that AI skills are 1.5 times LinkedIn member titles are standardized and grouped as frequent as in the benchmark, for an overlapping into approximately 15,000 occupations. These are set of occupations. not sector- or country-specific. These occupations Global Comparison are further standardized into approximately 3,600 occupation representatives. Occupation representatives group occupations with a common role and specialty, regardless of seniority. For cross-country comparison, we present the relative penetration rate of AI skills, measured as the sum of the penetration of each AI skill across occupations in a given country, divided by the AI Jobs and Occupations average global penetration of AI skills across the An “AI” job (technically, occupation representative) overlapping occupations in a sample of countries. is an occupation representative that requires AI skills to Table of Contents Appendix 369 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy Interpretation: A relative penetration rate of 2 means Interpretation: A country’s “Relative AI Skills that the average penetration of AI skills in that country Penetration Across Genders” for women of 1.5 means is two times the global average across the same set of that female members in that country are 1.5 times occupations. more likely to list AI skills than the average member in Global Comparison: By Industry all countries pooled together across the same set of The relative AI skills penetration by country for industry occupations that exist in the country. provides an in-depth sectoral decomposition of AI skill Relative AI Hiring Index penetration across industries and sample countries. LinkedIn Hiring Rate or Overall Hiring Rate Interpretation: A country’s relative AI skill penetration rate of 2 in the education sector means that the average penetration of AI skills in that country is two times the global average across the same set of occupations in that sector. Global Comparison: By Gender The “Relative AI Skills Penetration by Gender” metric provides a cross-country comparison of AI skill penetrations within each gender, comparing countries’ male or female AI skill penetrations to the global is a measure of hires normalized by LinkedIn membership. It is computed as the percentage of LinkedIn members who added a new employer in the same period the job began, divided by the total number of LinkedIn members in the corresponding location. AI Hiring Rate is computed following the overall hiring rate methodology, but only considering members classified as AI talent. Relative AI Hiring Index is the pace of change in average of the same gender. Since the global averages AI Hiring Rate normalized by the pace of change are distinct for each gender, this metric should only be in Overall Hiring Rate, providing a picture of used to compare country rankings within each gender, whether hiring of AI talent is growing at a higher, and not for cross-gender comparisons within countries. equal, or lower rate than overall hiring in a Interpretation: A country’s AI skills penetration for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the market. The relative AI Hiring Index is equal to 1.0 when AI hiring and overall hiring are growing at the same rate year on year. average female member in all countries pooled together Interpretation: Relative AI Hiring Index shows how across the same set of occupations that exist in the fast each country is experiencing growth in AI talent country/gender combination. hiring relative to growth in overall hiring in the country. Global Comparison: Across Gender A ratio of 1.2 means the growth in AI talent hiring has The “Relative AI Skills Penetration Across Genders” outpaced the growth in overall hiring by 20%. metric allows for cross-gender comparisons within and across countries globally, since we compare the countries’ male and female AI skill penetrations to the same global average regardless of gender. Table of Contents Appendix 370 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy Changelog From Methodology Included in Last Year’s AI Index 1. LinkedIn ramped a new version of its industry taxonomy (see details here). a. This has resulted in changes to our top level five key industries. We have made the full-time series available for each industry (as with prior years). i. “Software & IT Services” industry evolved into a wider “Technology, Information and Media,” which encompasses media and telecommunications as well as other sub-industries. ii. Former “Hardware & Networking” industry does not exist in the new taxonomy, so we introduced “Professional Services” industry as the fifth industry in scope which contains a high concentration of AI talent. iii. Remaining “Education,” “Manufacturing,” and “Financial Services” (formerly known as “Finance”) also had updates in their coverage resulting from the inclusion of more granular sub-industries. b. This also resulted in minor changes in magnitudes for some metrics since the distinct number of industries, as well as the distinct number of AI occupations defined within each country-industry pair have changed: i. We define AI occupations (occupation representatives that require AI skills to perform the job) and the respective definition of AI Talent at Country-Industry level. For example, data engineers working in the technology, information, and media industry in Germany may be identified as holding an AI occupation, whereas data engineers working in the construction industry in the United Arab Emirates may not be identified as AI Talent. Following the introduction of a more granular industry taxonomy with improved accuracy, our AI Talent identifications have been improved, and results have been reflected to the entirety of time series for each relevant metric. ii. The following metrics have been impacted by this change in industry taxonomy: AI Talent Concentrations, and Relative AI Hiring Rates. No directional changes were observed, only minor changes in magnitudes. 2. We introduced a methodology change into Relative Skills Penetration metrics: a. In the past, the data used to calculate these metrics were limited to top five industries with the highest AI skill penetration globally: “Software & IT Services,” “Hardware & Networking,” “Manufacturing,” “Education,” and “Finance” industries. This year we updated our coverage to all industries. Table of Contents Appendix 371 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy NetBase Quid subsidiary, out of business) throughout the world. Prepared by Bill Valle and Nicole Seredenko M&A, public offerings, minority stakes made by PE/ The investment data includes private investments, NetBase Quid delivers AI-powered consumer and VCs, corporate venture arms, governments, and market intelligence to enable business reinvention institutions both within and outside the United States. in a noisy and unpredictable world. The software Some data is simply unreachable—for instance, when applies artificial intelligence to reveal patterns in large, investors’ names or funding amounts are undisclosed. unstructured datasets and to generate visualizations NetBase Quid embeds Capital IQ data as a default that enable users to make smart, data-driven decisions accurately, quickly, and efficiently. NetBase Quid uses Boolean query to search for focus areas, topics, and keywords within social media, news, forums and blogs, companies, and patents data sources, as well as other custom datasets. NetBase Quid then visualizes these data points based on the semantic similarity. Search, Data Sources, and Scope Over 8 million global public and private company profiles from multiple data sources are indexed in order to search across company descriptions, while filtering and including metadata ranging from investment information to firmographic information, such as founded year, HQ location, and more. Company information is updated on a weekly basis. The NetBase Quid algorithm reads a big amount of text data from each document to make links between different documents based on their similar language. This process is repeated at an immense scale, which produces a network with different clusters identifying distinct topics or focus areas. Trends are identified based on keywords, phrases, people, companies, and institutions that NetBase Quid identifies, and the other metadata that is put into the software. and adds in data from Crunchbase for the data points that are not captured in Capital IQ. This not only yields comprehensive and accurate data on all global organizations, but it also captures earlystage startups and funding events data. Company information is updated on a weekly basis. Earnings Calls NetBase Quid leverages earnings call transcript data embedded from Seeking Alpha. For this report, NetBase Quid has analyzed mentions of AI-related keywords across all earnings call transcripts from Fortune 500 companies from January 2018 through December 2022. New earnings call transcript data is updated in NetBase Quid on the 1st and 15th of every month. Search Parameters Boolean query is used to search for focus areas, topics, and keywords within the archived company database, within their business descriptions and websites. We can filter out the search results by HQ regions, investment amount, operating status, organization type (private/public), and founding year. NetBase Quid then visualizes these companies Data by semantic similarity. If there are more than 7,000 Companies companies from the search result, NetBase Quid Organization data is embedded from Capital IQ and selects the 7,000 most relevant companies for Crunchbase. These companies include all types of visualization based on the language algorithm. companies (private, public, operating, operating as a Table of Contents Appendix 372 Artificial Intelligence Index Report 2023 Boolean Search: “artificial intelligence” or “AI” or “machine learning” or “deep learning” Companies: Appendix Chapter 4: The Economy McKinsey & Company Data used in the Corporate Activity-Industry Adoption section was sourced from the McKinsey • Global AI and ML companies that have received investments (private, IPO, M&A) from January 1, 2013, to December 31, 2022. • Global AI and ML companies that have received Global Survey “The State of AI in 2022—and a Half Decade in Review.” The online survey was in the field from May 3, 2022, to May 27, 2022, and from August 15, 2022, to over $1.5M for the last 10 years (January 1, 2013, August 17, 2022, and garnered responses from 1,492 to December 31, 2022): 7,000 out of 7,500 participants representing a full range of regions, companies have been selected through NetBase industries, company sizes, functional specialties, Quid’s relevance algorithm. and tenures. Of those respondents, 744 said their organization had adopted AI in at least one function Target Event Definitions and were asked questions about their organization’s • Private investments: A private placement is a AI use. To adjust for differences in response rates, private sale of newly issued securities (equity or the data is weighted by the contribution of each debt) by a company to a selected investor or a respondent’s nation to global GDP. selected group of investors. The stakes that buyers take in private placements are often minority stakes (under 50%), although it is possible to take control of a company through a private placement The AI Index also considered data from previous iterations of the survey. More specifically, the AI index made use of data from: as well, in which case the private placement would The State of AI in 2021 be a majority stake investment. The State of AI in 2020 • Minority investment: These refer to minority stake acquisitions in NetBase Quid, which take place when the buyer acquires less than 50% of the existing ownership stake in entities, asset Global AI Survey: AI Proves Its Worth, But Few Scale Impact (2019) AI Adoption Advances, But Foundational Barriers Remain (2018) products, and business divisions. • M&A: This refers to a buyer acquiring more than 50% of the existing ownership stake in entities, asset products, and business divisions. Table of Contents Appendix 373 Artificial Intelligence Index Report 2023 Appendix Chapter 4: The Economy GitHub To obtain a global view of how AI is transforming Data on the effects of GitHub’s Copilot on developer business leaders between April 2022 and May 2022. productivity and happiness was sourced from the GitHub Copilot Survey conducted in 2022. organizations, Deloitte surveyed 2,620 global Thirteen countries were represented: Australia (100 respondents), Brazil (115 respondents), Canada (175 The survey was emailed to 17,420 users who had opted respondents), China (200 respondents), France (130 in to receive communications and were using GitHub respondents), Germany (150 respondents), India Copilot for their daily programming activities. Between (200 respondents), Israel (75 respondents), Japan February 10, 2022, and March 6, 2022, the authors (100 respondents), Singapore (100 respondents), received 2,047 responses that could be matched with South Africa (75 respondents), the United Kingdom usage measurements during the four-week period (200 respondents), and the United States (1,000 leading up to March 12, 2022. The survey contained respondents). All participating companies have multiple-choice questions on demographic information adopted AI technologies and are AI users. and Likert-type questions on different aspects of Respondents were required to meet one of the productivity, which were randomized in the order of following criteria: responsible for AI technology appearance to the user. spending or approval of AI investments, developing More details can be found in Ziegler at al., 2022. AI technology strategies, managing or overseeing AI technology implementation, serving as an AI Deloitte technology subject matter specialist, or making Data used in the Corporate Activity-Industry Motivation complement the blind survey, Deloitte conducted section was sourced from Deloitte’s “State of AI in the qualitative telephone interviews with 15 AI specialists Enterprise” surveys. from various industries. More details are available on or influencing decisions around AI technology. To More specifically, the AI Index made use of the following sources of information: Deloitte’s State of AI in the Enterprise, 5th Edition Report (2022) Deloitte’s website. International Federation of Robotics (IFR) State of AI in the Enterprise, 4th Edition (2021) Data presented in the Robot Installations section was Deloitte’s State of AI in the Enterprise, 3rd Edition (2020) sourced from the “World Robotics 2022” report. State of AI in the Enterprise, 2nd Edition (2018) The 2017 Deloitte State of Cognitive Survey (2017) Table of Contents Appendix 374 Artificial Intelligence Index Report 2023 Appendix Chapter 5: Education Chapter 5: Education Computing Research Association (CRA Taulbee Survey) Note: This year’s AI Index reused the methodological notes that were submitted by the CRA for previous editions of the AI Index. For more complete delineations of the methodology used by the CRA, please consult the individual CRA surveys that are linked below. Computing Research Association (CRA) members The CRA Taulbee Survey is sent only to doctoral departments of computer science, computer engineering, and information science/systems. Historically, (a) Taulbee covers one-quarter to onethird of total BS CS recipients in the United States; (b) the percent of women earning bachelor’s degrees is lower in the Taulbee schools than overall; and (c) Taulbee tracks the trends in overall CS production. The AI Index used data from the following iterations of the CRA survey: are 200-plus North American organizations active in CRA, 2021 computing research: academic departments of computer CRA, 2020 science and computer engineering; laboratories and CRA, 2019 centers in industry, government, and academia; and affiliated professional societies (AAAI, ACM, CACS/ AIC, IEEE Computer Society, SIAM USENIX). CRA’s mission is to enhance innovation by joining with industry, CRA, 2018 CRA, 2017 CRA, 2016 government, and academia to strengthen research and CRA, 2015 advanced education in computing. Learn more about CRA, 2014 CRA here. CRA, 2013 The CRA Taulbee Survey gathers survey data during the fall of each academic year by reaching out to over 200 CRA, 2012 CRA, 2011 PhD-granting departments. Details about the Taulbee Survey can be found here. Taulbee doesn’t directly survey the students. The department identifies each new PhD’s area of specialization as well as their type of employment. Data is collected from September to January of each academic year for PhDs awarded in the previous academic year. Results are published in May after data collection closes. Table of Contents Appendix 375 Artificial Intelligence Index Report 2023 Appendix Chapter 5: Education Code.org State Level Data The following link includes a full description of the methodology used by Code.org to collect its data. The staff at Code.org also maintains a database of the state of American K–12 education and, in this policy primer, provides a greater amount of detail on the state of American K–12 education in each state. AP Computer Science Data The AP computer science data is provided to Code.org as per an agreement the College Board maintains with Code.org. The AP Computer Science data comes from the college board’s national and state summary reports. The State of International K–12 Education Data on the state of international K–12 AI education was taken from the following UNESCO report, published in 2021. The methodology is outlined in greater detail on pages 18 to 20 in the report and, for the sake of brevity, is not completely reproduced in the 2023 AI index. Table of Contents Appendix 376 Artificial Intelligence Index Report 2023 Appendix Chapter 6: Policy and Governance Chapter 6: Policy and Governance Global Legislation Records on AI For AI-related bills passed into laws, the AI Index performed searches of the keyword “artificial intelligence” on the websites of 127 countries’ congresses or parliaments (in the respective languages) in the full text of bills. Note that only laws passed by state-level legislative bodies and signed into law (i.e., by presidents or through royal assent) from 2016 to 2022 are included. Laws that were approved but then repealed are not included in the analysis. In some cases, there were databases that were only searchable by title, so site search functions were deployed. Future AI Index reports hope to include analysis on other types of legal documents, such as regulations and standards, adopted by state- or supranational-level legislative bodies, government agencies, etc. The AI Index team surveyed the following databases: Algeria China India Monaco Slovenia Andorra Colombia Iran, Islamic Republic Montenegro South Africa Antigua and Barbuda Croatia Iraq Morocco Spain Argentina Cuba Ireland Mozambique Sri Lanka Armenia Curacao Isle of Man Nauru St. Kitts and Nevis Australia Cyprus Israel The Netherlands Suriname Austria Czech Republic Italy New Zealand Sweden Azerbaijan Denmark Jamaica Nicaragua Switzerland The Bahamas Estonia Japan Niger Tajikistan Bahrain Faroe Islands Kazakhstan Tanzania Bangladesh Fiji Kenya Northern Marina Islands Barbados Finland Kiribati Norway Tongo Belarus France Korea, Republic Belgium The Gambia Kosovo Belize Georgia Kyrgyz Republic Bermuda Germany Latvia Bhutan Gibraltar Lebanon Bolivia Greece Liechtenstein Brazil Greenland Lithuania Brunei Grenada Luxembourg Bulgaria Guam Macao SAR, China Burkina Faso Guatemala Malawi Cameroon Guyana Malaysia Canada Hong Kong Malta Cayman Islands Hungary Mauritius Chile Iceland Mexico Table of Contents Appendix Panama Papua New Guinea Philippines Poland Portugal Romania Russia Samoa Saudi Arabia Serbia Seychelles Sierra Leone Singapore Togo Turkey Tuvalu Uganda Ukraine United Arab Emirates United Kingdom United States Uruguay Vietnam Yemen Zambia Zimbabwe Slovak Republic 37 7 Artificial Intelligence Index Report 2023 Appendix Chapter 6: Policy and Governance United States State-Level AI Legislation For AI-related bills passed into law, the AI Index performed searches of the keyword “artificial intelligence” on the legislative websites of all 50 U.S. states in the full text of bills. Bills are only counted as passed into law if the final version of the bill includes the keyword, not just the introduced version. Note that only laws passed from 2015 to 2022 are included. The count for proposed laws includes both laws that were proposed and eventually passed as well as laws that were proposed that have not yet been passed, or are now inactive. In some cases, databases were only searchable by title, so site search functions were deployed. The AI Index team surveyed the following databases: Alabama Hawaii Massachusetts New Mexico South Dakota Alaska Idaho Michigan New York Tennessee Arizona Illinois Minnesota North Carolina Texas Arkansas Indiana Mississippi North Dakota Utah California Iowa Missouri Ohio Vermont Colorado Kansas Montana Oklahoma Virginia Connecticut Kentucky Nebraska Oregon Washington Delaware Louisiana Nevada Pennsylvania West Virginia Florida Maine New Hampshire Rhode Island Wisconsin Georgia Maryland New Jersey South Carolina Wyoming Table of Contents Appendix 378 Artificial Intelligence Index Report 2023 Appendix Chapter 6: Policy and Governance Global AI Mentions For mentions of AI in AI-related legislative proceedings around the world, the AI Index performed searches of the keyword “artificial intelligence” on the websites of 81 countries’ congresses or parliaments (in the respective languages), usually under sections named “minutes,” “hansard,” etc. In some cases, databases were only searchable by title, so site search functions were deployed. The AI Index team surveyed the following databases: Andorra Ecuador Japan Angola El Salvador Kenya Armenia Estonia Kosovo Australia Fiji Latvia Azerbaijan Finland Lesotho Barbados France Liechtenstein Belgium The Gambia Luxembourg Bermuda Germany Macao SAR, China Bhutan Gibraltar Madagascar Brazil Greece Malaysia Cabo Verde Hong Kong Maldives Canada Iceland Malta Cayman Islands India Mauritius China Ireland Mexico Czech Republic Isle of Man Moldova Denmark Israel Netherlands Dominican Republic Italy New Zealand 11 Northern Mariana Islands South Africa Norway Spain Pakistan Panama Papua New Guinea Philippines Poland Portugal Romania Russia Samoa San Marino Seychelles Sierra Leone South Korea Sri Lanka Sweden Switzerland Tanzania Trinidad and Tobago Ukraine United Kingdom United States Uruguay Zambia Zimbabwe Singapore Slovenia 11 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier. Table of Contents Appendix 379 Artificial Intelligence Index Report 2023 United States Committee Mentions In order to research trends on the United States’ committee mentions of AI, the following search was conducted: Appendix Chapter 6: Policy and Governance • Private Sector Companies: Google AI, Microsoft AI, Nvidia, OpenAI • Think Tanks and Policy Institutes: American Enterprise Institute, Aspen Institute, Atlantic Council, Brookings Institute, Carnegie Endowment for International Peace, Cato Website: Congress.gov Institute, Center for a New American Security, Keyword: artificial intelligence Center for Strategic and International Studies, Filters: Committee Reports Council on Foreign Relations, Heritage United States AI Policy Papers Foundation, Hudson Institute, MacroPolo, National Security Institute, New America Foundation, RAND Corporation, Rockefeller Organizations To develop a more nuanced understanding of the thought leadership that motivates AI policy, we Foundation, Stimson Center, Urban Institute, Wilson Center tracked policy papers published by 55 organizations • University Institutes and Research Programs: in the United States or with a strong presence in the AI and Humanity, Cornell University; AI Now United States (expanded from last year’s list of 36 Institute, New York University; AI Pulse, UCLA organizations) across four broad categories: Law; Belfer Center for Science and International • Civil Society, Associations, and Consortiums: Algorithmic Justice League, Alliance for Artificial Intelligence in Healthcare, Amnesty International, EFF, Future of Privacy Forum, Human Rights Watch, IJIS Institute, Institute for Electrical and Electronics Engineers, Partnership on AI • Consultancy: Accenture, Bain & Company, Affairs, Harvard University; Berkman Klein Center, Harvard University; Center for Information Technology Policy, Princeton University; Center for Long-Term Cybersecurity, UC Berkeley; Center for Security and Emerging Technology, Georgetown University; CITRIS Policy Lab, UC Berkeley; Hoover Institution, Stanford University; Institute for Human-Centered Artificial Boston Consulting Group, Deloitte, McKinsey & Intelligence, Stanford University; Internet Policy Company Research Initiative, Massachusetts Institute of • Government Agencies: Congressional Research Service, Defense Technical Information Center, Technology; MIT Lincoln Laboratory; Princeton School of Public and International Affairs Government Accountability Office, Library of Congress, Pentagon Library Table of Contents Appendix 380 Artificial Intelligence Index Report 2023 Methodology Appendix Chapter 6: Policy and Governance • Industry and Regulation: economy, antitrust, Each broad topic area is based on a collection of M&A, competition, finance, management, supply underlying keywords that describe the content of the chain, telecom, economic regulation, technical specific paper. We included 17 topics that represented standards, autonomous vehicle industry and the majority of discourse related to AI between 2018– regulation 2021. These topic areas and the associated keywords are listed below: • Innovation and Technology: advancements and improvements in AI technology, R&D, intellectual • Health and Biological Sciences: medicine, property, patents, entrepreneurship, innovation healthcare systems, drug discovery, care, ecosystems, startups, computer science, biomedical research, insurance, health behaviors, engineering COVID-19, global health • Physical Sciences: chemistry, physics, astronomy, earth science • Education and Skills: early childhood, K–12, higher education, STEM, schools, classrooms, reskilling • Energy and Environment: energy costs, climate • Workforce and Labor: labor supply and demand, change, energy markets, pollution, conservation, talent, immigration, migration, personnel oil and gas, alternative energy economics, future of work • International Affairs and International Security: • Social and Behavioral Sciences: sociology, international relations, international trade, linguistics, anthropology, ethnic studies, developing countries, humanitarian assistance, demography, geography, psychology, cognitive warfare, regional security, national security, science autonomous weapons • Justice and Law Enforcement: civil justice, criminal justice, social justice, police, public safety, courts • Humanities: arts, music, literature, language, performance, theater, classics, history, philosophy, religion, cultural studies • Equity and Inclusion: biases, discrimination, • Communications and Media: social media, disinformation, media markets, deepfakes • Government and Public Administration: gender, race, socioeconomic inequality, disabilities, vulnerable populations • Privacy, Safety, and Security: anonymity, GDPR, federal government, state government, local consumer protection, physical safety, human government, public sector efficiency, public control, cybersecurity, encryption, hacking sector effectiveness, government services, government benefits, government programs, public works, public transportation • Ethics: transparency, accountability, human values, human rights, sustainability, explainability, interpretability, decision-making norms • Democracy: elections, rights, freedoms, liberties, personal freedoms Table of Contents Appendix 381 Artificial Intelligence Index Report 2023 Appendix Chapter 6: Policy and Governance National AI Strategies The AI Index did a web search to identify national strategies on AI. Below is a list of countries that were identified as having a national AI strategy, including a link to said strategy. For certain counties, noted with an asterisk(*), the actual strategy was not found, and a news article confirming the launch of the strategy was linked instead. Countries with AI Strategies in Place Algeria* Cyprus Italy Philippines Switzerland Argentina Czech Republic Japan Poland Thailand Australia Denmark Kenya Portugal Tunisia* Austria Egypt, Arab Republic Korea, Republic Qatar Turkey Bangladesh Estonia Latvia Romania Ukraine Botswana* Finland Lithuania Russia United Arab Emirates Brazil France Luxembourg Saudi Arabia United Kingdom Bulgaria Germany Malta Serbia United States Canada Greece Mauritius Sierra Leone Uruguay Chile Hungary Mexico Singapore Vietnam China India The Netherlands Slovenia Colombia Indonesia Norway Spain Croatia Ireland Peru Sweden Countries with AI Strategies in Development Armenia Federal Budget for Nondefense AI R&D Data on the federal U.S. budget for nondefense AI R&D was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the Azerbaijan following National Science and Technology Council reports: Bahrain Supplement to the President’s FY 2023 Budget Belgium Supplement to the President’s FY2022 Budget Benin Cuba Iceland Israel Jordan Morocco New Zealand Nigeria Oman Uzbekistan Table of Contents U.S. Department of Defense Budget Requests Data on the DoD nonclassified AI-related budget requests was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the following reports: Defense Budget Overview United States Department of Defense Fiscal Year 2023 Budget Request Defense Budget Overview United States Department of Defense Fiscal Year 2022 Budget Request Appendix 382 Artificial Intelligence Index Report 2023 Govini Appendix Chapter 6: Policy and Governance through regular government reporting processes or Govini is the leading commercial data company in the defense technology space. Built by Govini, Ark. ai is used at scale across the national security sector of the U.S. federal government. This platform enables government analysts, program managers, and decision-makers to gain unprecedented visibility into the companies, capabilities, and capital in national security to solve challenges pertaining to acquisition, foreign influence and adversarial capital, nuclear human-intensive analytical approaches. Moreover, beyond simply making usable an expansive body of data sources, Govini’s SaaS Platform and National Security Knowledge Graph establishes high fidelity standards in categorized and fused data to produce a comprehensive and accurate depiction of federal spending, and the supporting vendor ecosystem, over time. modernization, procurement, science and technology, U.S. AI-Related Legal Cases and supply chain. To identify AI-related legal cases, the AI Index research Govini curated USG AI spend data from their annual Scorecard Taxonomy by applying supervised machine learning (ML) and natural language processing (NLP) to parse, analyze, and categorize large volumes of federal contracts data, including prime contracts, grants, and other transaction authority (OTA) team did a keyword search on the LexisNexis database, under their U.S. legal cases filter. The keywords that were searched include “artificial intelligence,” “machine learning,” and “automated decision-making.” Cases that contained one of these keywords were coded according to a variety of variables of interest. awards. Govini’s most recent scorecard focused on critical technologies, of which AI/ML technologies was a segment and consistent of six subsegments: data-at-scale, decision science, computer vision, machine learning, autonomy, and natural language processing. By initially generating search terms and then subsequently excluding specific terms that yield erroneous results, Govini delivers a comprehensive yet discriminant taxonomy of subsegments that are mutually exclusive. Repeated keyword searches and filters allow a consensus, data-driven taxonomy to come into focus. Govini SMEs conduct a final review of taxonomic structure to complement this iterative, data-driven process. The use of AI and supervised ML models enables the analysis of large volumes of irregular data contained in federal contracts—data that is often inaccessible Table of Contents Appendix 383 Artificial Intelligence Index Report 2023 Appendix Chapter 7: Diversity Chapter 7: Diversity Computing Research Association (CRA Taulbee Survey) To learn more about the diversity data from the CRA, please read the methodological note on the CRA’s data included in the Chapter 5 subsection of the Appendix. Code.org To learn more about the diversity data from Code. org, please read the methodological note on Code. org’s data included in the Chapter 5 subsection of the Appendix. Table of Contents Appendix 384 Artificial Intelligence Index Report 2023 Appendix Chapter 8: Public Opinion Chapter 8: Public Opinion IPSOS For brevity, the 2023 AI Index does not republish the NetBase Quid Social Media Data methodology used by the IPSOS survey that features NetBase Quid collects social media data from over in the report. More details about the IPSOS survey’s methodology can be found in the actual survey. Lloyd’s Register Foundation and Gallup For brevity, the 2023 AI Index does not republish the methodology used by the Lloyd’s Register Foundation and Gallup survey that features in the report. More details about the Lloyd’s Register Foundation and Gallup survey methodology can be found in the actual survey. 500 million sources in real time and analyzes this data through AI-powered Natural Language Processing. This process parses out language and breaks out posts by filters such as drivers of positive and negative sentiment, emotions, and behaviors, allowing for deeper insights to be reached. To understand public perception of advancements in artificial intelligence, NetBase Quid analyzed social media conversation around AI and AI model releases from January 2022 to December 2022. First, the NetBase Quid team analyzed conversation around AI to understand key drivers of general sentiment around AI advancements, Pew Research such as ethical, cultural, and economic concerns and For brevity, the 2023 AI Index does not republish Quid team leveraged the platform for a more targeted the methodology used by the Pew Research survey analysis of the same conversation, understanding that features in the report. More details on the Pew volume and sentiment around the major AI model Research survey methodology can be found in the updates and releases in 2022. This NetBase Quid actual survey. analysis ultimately showcases the relationship perceptions among consumers. Then, the NetBase between public perception and the advancement of AI, leveraging targeted analytics tools to understand both specific reactions to model releases as well as a wider consumer conversation and what drives it. Table of Contents Appendix 385 Artificial Intelligence Index Report 2023