Artificial Intelligence
Index Report 2023
Introduction to the
AI Index Report 2023
Welcome to the sixth edition of the AI Index Report! This year, the report introduces more original data than any
previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter,
original analysis about large language and multimodal models, detailed trends in global AI legislation records,
a study of the environmental impact of AI systems, and more.
The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is
to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives,
journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of
AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.
From the Co-Directors
AI has moved into its era of deployment; throughout 2022 and the beginning of 2023, new large-scale AI models
have been released every month. These models, such as ChatGPT, Stable Diffusion, Whisper, and DALL-E 2, are
capable of an increasingly broad range of tasks, from text manipulation and analysis, to image generation, to
unprecedentedly good speech recognition. These systems demonstrate capabilities in question answering and the
generation of text, image, and code unimagined a decade ago, and they outperform the state of the art on many
benchmarks, old and new. However, they are prone to hallucination, routinely biased, and can be tricked into
serving nefarious aims, highlighting the complicated ethical challenges associated with their deployment.
Although 2022 was the first year in a decade where private AI investment decreased, AI is still a topic of great
interest to policymakers, industry leaders, researchers, and the public. Policymakers are talking about AI more
than ever before. Industry leaders that have integrated AI into their businesses are seeing tangible cost and
revenue benefits. The number of AI publications and collaborations continues to increase. And the public is
forming sharper opinions about AI and which elements they like or dislike.
AI will continue to improve and, as such, become a greater part of all our lives. Given the increased presence of
this technology and its potential for massive disruption, we should all begin thinking more critically about how
exactly we want AI to be developed and deployed. We should also ask questions about who is deploying it—as
our analysis shows, AI is increasingly defined by the actions of a small set of private sector actors, rather than a
broader range of societal actors. This year’s AI Index paints a picture of where we are so far with AI, in order to
highlight what might await us in the future.
Jack Clark and Ray Perrault
Top Ten Takeaways
1
Industry races ahead of academia.
Until 2014, most significant machine learning
models were released by academia. Since then,
industry has taken over. In 2022, there were 32
significant industry-produced machine learning
models compared to just three produced by
academia. Building state-of-the-art AI systems
increasingly requires large amounts of data, computer
power, and money—resources that industry actors
inherently possess in greater amounts compared to
nonprofits and academia.
2
Performance saturation on
traditional benchmarks.
AI continued to post state-of-the-art results, but
year-over-year improvement on many benchmarks
continues to be marginal. Moreover, the speed at
which benchmark saturation is being reached is
increasing. However, new, more comprehensive
benchmarking suites such as BIG-bench and HELM
are being released.
3
AI is both helping and
harming the environment.
New research suggests that AI systems can have
serious environmental impacts. According to
Luccioni et al. (2022), BLOOM’s training run
emitted 25 times more carbon than a single air
traveler on a one-way trip from New York to
San Francisco. Still, new reinforcement learning
models like BCOOLER show that AI systems
can be used to optimize energy usage.
4
The world’s best new scientist … AI?
AI models are starting to rapidly accelerate
scientific progress and in 2022 were used to aid
hydrogen fusion, improve the efficiency of matrix
manipulation, and generate new antibodies.
5
The number of incidents concerning
the misuse of AI is rapidly rising.
According to the AIAAIC database, which tracks
incidents related to the ethical misuse of AI, the
number of AI incidents and controversies has
increased 26 times since 2012. Some notable incidents
in 2022 included a deepfake video of Ukrainian
President Volodymyr Zelenskyy surrendering and
U.S. prisons using call-monitoring technology on their
inmates. This growth is evidence of both greater use of
AI technologies and awareness of misuse possibilities.
6
The demand for AI-related
professional skills is increasing across
virtually every American industrial sector.
Across every sector in the United States for which
there is data (with the exception of agriculture,
forestry, fishing, and hunting), the proportion of
AI-related job postings increased on average from
1.7% in 2021 to 1.9% in 2022. Employers in the United
States are increasingly looking for workers with
AI-related skills.
Top Ten Takeaways (cont’d)
7
For the first time in the last decade,
year-over-year private investment
in AI decreased.
Global AI private investment was $91.9 billion in
2022, which represented a 26.7% decrease since
2021. The total number of AI-related funding events
as well as the number of newly funded AI companies
likewise decreased. Still, during the last decade as a
whole, AI investment has significantly increased. In
2022 the amount of private investment in AI was 18
times greater than it was in 2013.
8
While the proportion of companies
adopting AI has plateaued, the
companies that have adopted AI
continue to pull ahead.
The proportion of companies adopting AI in 2022
has more than doubled since 2017, though it has
plateaued in recent years between 50% and 60%,
according to the results of McKinsey’s annual
research survey. Organizations that have adopted
AI report realizing meaningful cost decreases and
revenue increases.
9
Policymaker interest in AI
is on the rise.
An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing
“artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis
of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global
legislative proceedings have increased nearly
6.5 times since 2016.
10
Chinese citizens are among those
who feel the most positively about
AI products and services. Americans …
not so much.
In a 2022 IPSOS survey, 78% of Chinese respondents
(the highest proportion of surveyed countries) agreed
with the statement that products and services using
AI have more benefits than drawbacks. After Chinese
respondents, those from Saudi Arabia (76%) and India
(71%) felt the most positive about AI products. Only
35% of sampled Americans (among the lowest of
surveyed countries) agreed that products and services
using AI had more benefits than drawbacks.
Steering Committee
Co-directors
Jack Clark, Anthropic, OECD
Raymond Perrault, SRI International
Members
Erik Brynjolfsson, Stanford University
Katrina Ligett, Hebrew University
John Etchemendy, Stanford University
Terah Lyons
Juan Carlos Niebles, Stanford University, Salesforce
Vanessa Parli, Stanford University
James Manyika, Google, University of Oxford
Yoav Shoham (Founding Director), Stanford University, AI21 Labs
Russell Wald, Stanford University
Staff and Researchers
Research Manager and Editor in Chief
Nestor Maslej, Stanford University
Research Associate
Loredana Fattorini, Stanford University
Affiliated Researchers
Elif Kiesow Cortez, Stanford Law School Research Fellow
Helen Ngo, Hugging Face
Robi Rahman, Data Scientist
Alexandra Rome, Freelance Researcher
Graduate Researcher
Han Bai, Stanford University
Undergraduate Researchers
Vania Chow, Stanford University
Siddhartha Javvaji, Stanford University
Mena Hassan, Stanford University
Naima Patel, Stanford University
Sukrut Oak, Stanford University
Stone Yang, Stanford University
Lucy Zimmerman, Stanford University
Elizabeth Zhu, Stanford University
How to Cite This Report
Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons,
James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark,
and Raymond Perrault, “The AI Index 2023 Annual Report,” AI Index Steering Committee,
Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
The AI Index 2023 Annual Report by Stanford University is licensed under a
Creative Commons Attribution-NoDerivatives 4.0 International license.
Public Data and Tools
The AI Index 2023 Report is supplemented by raw data and an interactive tool.
We invite each reader to use the data and the tool in a way most relevant to their work and interests.
Raw data and charts: The public data and
high-resolution images of all the charts
in the report are available on Google Drive.
Global AI Vibrancy Tool: Compare up to
30 countries across 21 indicators. The Global AI
Vibrancy tool will be updated in the latter half of 2023.
AI Index and Stanford HAI
The AI Index is an independent initiative at the
Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The AI Index was conceived within the One Hundred Year Study on AI (AI100).
We welcome feedback and new ideas for next year.
Contact us at AI-Index-Report@stanford.edu.
Supporting Partners
Analytics and Research Partners
Contributors
We want to acknowledge the following individuals by chapter and section for their contributions of data,
analysis, advice, and expert commentary included in the AI Index 2023 Report:
Research and Development
Sara Abdulla, Catherine Aiken, Luis Aranda, Peter Cihon, Jack Clark, Loredana Fattorini, Nestor Maslej,
Besher Massri, Vanessa Parli, Naima Patel, Ray Perrault, Robi Rahman, Alexandra Rome, Kevin Xu
Technical Performance
Jack Clark, Loredana Fattorini, Siddhartha Javvaji, Katrina Ligett, Nestor Maslej, Juan Carlos Niebles,
Sukrut Oak, Vanessa Parli, Ray Perrault, Robi Rahman, Alexandra Rome, Yoav Shoham, Elizabeth Zhu
Technical AI Ethics
Jack Clark, Loredana Fattorini, Katrina Ligett, Nestor Maslej, Helen Ngo, Sukrut Oak, Vanessa Parli,
Ray Perrault, Alexandra Rome, Elizabeth Zhu, Lucy Zimmerman
Economy
Susanne Bieller, Erik Brynjolfsson, Vania Chow, Jack Clark, Natalia Dorogi, Murat Erer, Loredana Fattorini,
Akash Kaura, James Manyika, Nestor Maslej, Layla O’Kane, Vanessa Parli, Ray Perrault, Brittany Presten,
Alexandra Rome, Nicole Seredenko, Bledi Taska, Bill Valle, Casey Weston
Education
Han Bai, Betsy Bizot, Jack Clark, John Etchemendy, Loredana Fattorini, Katrina Ligett, Nestor Maslej,
Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome
Policy and Governance
Meghan Anand, Han Bai, Vania Chow, Jack Clark, Elif Kiesow Cortez, Rebecca DeCrescenzo, Loredana Fattorini,
Taehwa Hong, Joe Hsu, Kai Kato, Terah Lyons, Nestor Maslej, Alistair Murray, Vanessa Parli, Ray Perrault, Alexandra Rome,
Sarah Smedley, Russell Wald, Brian Williams, Catherina Xu, Stone Yang, Katie Yoon, Daniel Zhang
Diversity
Han Bai, Betsy Bizot, Jack Clark, Loredana Fattorini, Nezihe Merve Gürel, Mena Hassan, Katrina Ligett,
Nestor Maslej, Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome, Sarah Tan, Lucy Zimmerman
Public Opinion
Jack Clark, Loredana Fattorini, Mena Hassan, Nestor Maslej, Vanessa Parli, Ray Perrault,
Alexandra Rome, Nicole Seredenko, Bill Valle, Lucy Zimmerman
Conference Attendance
Terri Auricchio (ICML), Lee Campbell (ICLR), Cassio de Campos (UAI), Meredith Ellison (AAAI), Nicole Finn (CVPR),
Vasant Gajanan (AAAI), Katja Hofmann (ICLR), Gerhard Lakemeyer (KR), Seth Lazar (FAccT), Shugen Ma (IROS),
Becky Obbema (NeurIPS), Vesna Sabljakovic-Fritz (IJCAI), Csaba Szepesvari (ICML), Matthew Taylor (AAMAS),
Sylvie Thiebaux (ICAPS), Pradeep Varakantham (ICAPS)
We thank the following organizations and individuals who provided
data for inclusion in the AI Index 2023 Report:
Organizations
Code.org: Sean Roberts
Lightcast: Layla O’Kane, Bledi Taska
Center for Security and Emerging Technology, Georgetown University: Sara Abdulla, Catherine Aiken
LinkedIn: Murat Erer, Akash Kaura, Casey Weston
Computing Research Association: Betsy Bizot
GitHub: Peter Cihon, Kevin Xu
Govini: Rebecca DeCrescenzo, Joe Hsu, Sarah Smedley
McKinsey & Company: Natalia Dorogi, Brittany Presten
NetBase Quid: Nicole Seredenko, Bill Valle
OECD.AI Policy Observatory: Luis Aranda, Besher Massri
Women in Machine Learning: Nezihe Merve Gürel, Sarah Tan
We also would like to thank Jeanina Casusi, Nancy King, Shana Lynch, Jonathan Mindes,
Michi Turner, and Madeleine Wright for their help in preparing this report, and Joe Hinman and
Santanu Mukherjee for their help in maintaining the AI Index website.
Table of Contents
Report Highlights
Chapter 1: Research and Development
Chapter 2: Technical Performance
Chapter 3: Technical AI Ethics
Chapter 4: The Economy
Chapter 5: Education
Chapter 6: Policy and Governance
Chapter 7: Diversity
Chapter 8: Public Opinion
Appendix
ACCESS THE PUBLIC DATA
Report Highlights
Chapter 1: Research and Development
The United States and China had the greatest number of cross-country collaborations in AI
publications from 2010 to 2021, although the pace of collaboration has slowed. The number of AI
research collaborations between the United States and China increased roughly 4 times since 2010,
and was 2.5 times greater than the collaboration totals of the next nearest country pair, the United
Kingdom and China. However, the total number of U.S.-China collaborations only increased by 2.1%
from 2020 to 2021, the smallest year-over-year growth rate since 2010.
AI research is on the rise, across the board. The total number of AI publications has more than
doubled since 2010. The specific AI topics that continue dominating research include pattern
recognition, machine learning, and computer vision.
China continues to lead in total AI journal, conference, and repository publications.
The United States is still ahead in terms of AI conference and repository citations, but those
leads are slowly eroding. Still, the majority of the world’s large language and multimodal models
(54% in 2022) are produced by American institutions.
Industry races ahead of academia. Until 2014, most significant machine learning models were
released by academia. Since then, industry has taken over. In 2022, there were 32 significant
industry-produced machine learning models compared to just three produced by academia.
Building state-of-the-art AI systems increasingly requires large amounts of data, computer power,
and money—resources that industry actors inherently possess in greater amounts compared to
nonprofits and academia.
Large language models are getting bigger and more expensive. GPT-2, released in 2019,
considered by many to be the first large language model, had 1.5 billion parameters and cost an
estimated $50,000 USD to train. PaLM, one of the flagship large language models launched in 2022,
had 540 billion parameters and cost an estimated $8 million USD—PaLM was around 360 times
larger than GPT-2 and cost 160 times more. It’s not just PaLM: Across the board, large language and
multimodal models are becoming larger and pricier.
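The scale comparison above is simple arithmetic on the report’s figures; a quick sanity check in Python (the variable names are mine, the numbers are the report’s estimates):

```python
# Figures cited above: GPT-2 (2019) vs. PaLM (2022); costs in USD.
gpt2_params, gpt2_cost = 1.5e9, 50_000
palm_params, palm_cost = 540e9, 8_000_000

print(round(palm_params / gpt2_params))  # 360: PaLM is ~360x larger
print(round(palm_cost / gpt2_cost))      # 160: and ~160x costlier to train
```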
Chapter 2: Technical Performance
Performance saturation on traditional benchmarks. AI continued to post state-of-the-art results,
but year-over-year improvement on many benchmarks continues to be marginal. Moreover,
the speed at which benchmark saturation is being reached is increasing. However, new, more
comprehensive benchmarking suites such as BIG-bench and HELM are being released.
Generative AI breaks into the public consciousness. 2022 saw the release of text-to-image
models like DALL-E 2 and Stable Diffusion, text-to-video systems like Make-A-Video, and chatbots
like ChatGPT. Still, these systems can be prone to hallucination, confidently outputting incoherent or
untrue responses, making it hard to rely on them for critical applications.
AI systems become more flexible. Traditionally AI systems have performed well on narrow tasks
but have struggled across broader tasks. Recently released models challenge that trend; BEiT-3,
PaLI, and Gato, among others, are single AI systems increasingly capable of navigating multiple tasks
(for example, vision, language).
Capable language models still struggle with reasoning. Language models continued to improve
their generative capabilities, but new research suggests that they still struggle with complex
planning tasks.
AI is both helping and harming the environment. New research suggests that AI systems can have
serious environmental impacts. According to Luccioni et al. (2022), BLOOM’s training run emitted 25
times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still,
new reinforcement learning models like BCOOLER show that AI systems can be used to optimize
energy usage.
The world’s best new scientist … AI? AI models are starting to rapidly accelerate scientific
progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix
manipulation, and generate new antibodies.
AI starts to build better AI. Nvidia used an AI reinforcement learning agent to improve the design
of the chips that power AI systems. Similarly, Google recently used one of its language models,
PaLM, to suggest ways to improve the very same model. Self-improving AI learning will accelerate
AI progress.
Chapter 3: Technical AI Ethics
The effects of model scale on bias and toxicity are confounded by training data and mitigation
methods. In the past year, several institutions have built their own large models trained on
proprietary data—and while large models are still toxic and biased, new evidence suggests that
these issues can be somewhat mitigated after training larger models with instruction-tuning.
Generative models have arrived and so have their ethical problems. In 2022, generative models
became part of the zeitgeist. These models are capable but also come with ethical challenges.
Text-to-image generators are routinely biased along gender dimensions, and chatbots like ChatGPT can
be tricked into serving nefarious aims.
The number of incidents concerning the misuse of AI is rapidly rising. According to the AIAAIC
database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents
and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a
deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using
call-monitoring technology on their inmates. This growth is evidence of both greater use of AI
technologies and awareness of misuse possibilities.
Fairer models may not be less biased. Extensive analysis of language models suggests that while there
is a clear correlation between performance and fairness, fairness and bias can be at odds: Language
models which perform better on certain fairness benchmarks tend to have worse gender bias.
Interest in AI ethics continues to skyrocket. The number of accepted submissions to FAccT, a
leading AI ethics conference, has more than doubled since 2021 and increased by a factor of 10 since
2018. 2022 also saw more submissions than ever from industry actors.
Automated fact-checking with natural language processing isn’t so straightforward after all.
While several benchmarks have been developed for automated fact-checking, researchers find that
11 of 16 of such datasets rely on evidence “leaked” from fact-checking reports which did not exist at
the time of the claim surfacing.
Chapter 4: The Economy
The demand for AI-related professional skills is increasing across virtually every American
industrial sector. Across every sector in the United States for which there is data (with the exception
of agriculture, forestry, fishing, and hunting), the proportion of AI-related job postings increased on
average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for
workers with AI-related skills.
For the first time in the last decade, year-over-year private investment in AI decreased.
Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021.
The total number of AI-related funding events as well as the number of newly funded AI companies
likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased.
In 2022 the amount of private investment in AI was 18 times greater than it was in 2013.
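The figures above also pin down the totals they leave implicit; a back-of-envelope check in Python (variable names are mine; all inputs are the report’s numbers):

```python
# Report figures: $91.9B of global private AI investment in 2022,
# a 26.7% drop from 2021, and 18x the 2013 level.
inv_2022 = 91.9  # billions USD

inv_2021 = inv_2022 / (1 - 0.267)  # implied 2021 total, ~125.4
inv_2013 = inv_2022 / 18           # implied 2013 total, ~5.1

print(f"2021: ~${inv_2021:.1f}B, 2013: ~${inv_2013:.1f}B")
```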
Once again, the United States leads in investment in AI. The U.S. led the world in terms of total
amount of AI private investment. In 2022, the $47.4 billion invested in the U.S. was roughly 3.5 times
the amount invested in the next highest country, China ($13.4 billion). The U.S. also continues to lead in
terms of total number of newly funded AI companies, seeing 1.9 times more than the European Union
and the United Kingdom combined, and 3.4 times more than China.
In 2022, the AI focus area with the most investment was medical and healthcare ($6.1 billion);
followed by data management, processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less
investment in 2022 than in 2021. In the last year, the three largest AI private investment events were:
(1) a $2.5 billion funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of
electric vehicles; (2) a $1.5 billion Series E funding round for Anduril Industries, a U.S. defense products
company that builds technology for military agencies and border surveillance; and (3) a $1.2 billion
investment in Celonis, a business-data consulting company based in Germany.
While the proportion of companies adopting AI has plateaued, the companies that have adopted
AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled
since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of
McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful
cost decreases and revenue increases.
Chapter 4: The Economy (cont’d)
AI is being deployed by businesses in multifaceted ways. The AI capabilities most likely to have
been embedded in businesses include robotic process automation (39%), computer vision (34%),
natural-language text understanding (33%), and virtual agents (33%). Moreover, the most commonly adopted AI use
case in 2022 was service operations optimization (24%), followed by the creation of new AI-based
products (20%), customer segmentation (19%), customer service analytics (19%), and new AI-based
enhancement of products (19%).
AI tools like Copilot are tangibly helping workers. Results of a GitHub survey on the use of Copilot,
a text-to-code AI system, find that 88% of surveyed respondents feel more productive when using
the system, 74% feel they are able to focus on more satisfying work, and 88% feel they are able to
complete tasks more quickly.
China dominates industrial robot installations. In 2013, China overtook Japan as the nation installing
the most industrial robots. Since then, the gap between the total number of industrial robots installed
by China and the next-nearest nation has widened. In 2021, China installed more industrial robots than
the rest of the world combined.
Chapter 5: Education
More and more AI specialization. The proportion of new computer science PhD graduates from
U.S. universities who specialized in AI jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
New AI PhDs increasingly head to industry. In 2011, roughly the same proportion of new AI PhD
graduates took jobs in industry (40.9%) as opposed to academia (41.6%). Since then, however, a
majority of AI PhDs have headed to industry. In 2021, 65.4% of AI PhDs took jobs in industry, more
than double the 28.2% who took jobs in academia.
New North American CS, CE, and information faculty hires stayed flat. In the last decade,
the total number of new North American computer science (CS), computer engineering (CE),
and information faculty hires has decreased: There were 710 total hires in 2021 compared to
733 in 2012. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and then
dropped to 324 in 2021.
The gap in external research funding for private versus public American CS departments
continues to widen. In 2011, the median amount of total expenditure from external sources for
computing research was roughly the same for private and public CS departments in the United
States. Since then, the gap has widened, with private U.S. CS departments receiving millions more
in additional funding than public universities. In 2021, the median expenditure for private universities
was $9.7 million, compared to $5.7 million for public universities.
Interest in K–12 AI and computer science education grows in both the United States and the
rest of the world. In 2021, a total of 181,040 AP computer science exams were taken by American
students, a 1.0% increase from the previous year. Since 2007, the number of AP computer science
exams has increased ninefold. As of 2021, 11 countries, including Belgium, China, and South Korea,
have officially endorsed and implemented a K–12 AI curriculum.
Chapter 6: Policy and Governance
Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing “artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times
since 2016.
From talk to enactment—the U.S. passed more AI bills than ever before. In 2021, only 2% of
all federal AI bills in the United States were passed into law. This number jumped to 10% in 2022.
Similarly, last year 35% of all state-level AI bills were passed into law.
When it comes to AI, policymakers have a lot of thoughts. A qualitative analysis of the
parliamentary proceedings of a diverse group of nations reveals that policymakers think about
AI from a wide range of perspectives. For example, in 2022, legislators in the United Kingdom
discussed the risks of AI-led automation; those in Japan considered the necessity of safeguarding
human rights in the face of AI; and those in Zambia looked at the possibility of using AI for
weather forecasting.
The U.S. government continues to increase spending on AI. Since 2017, the amount of U.S.
government AI-related contract spending has increased roughly 2.5 times.
The legal world is waking up to AI. In 2022, there were 110 AI-related legal cases in United
States state and federal courts, roughly seven times more than in 2016. The majority of these cases
originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual
property, and contract law.
Chapter 7: Diversity
North American bachelor’s, master’s, and PhD-level computer science students are becoming
more ethnically diverse. Although white students are still the most represented ethnicity among
new resident bachelor’s, master’s, and PhD-level computer science graduates, students from other
ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming
increasingly more represented. For example, in 2011, 71.9% of new resident CS bachelor’s graduates
were white. In 2021, that number dropped to 46.7%.
New AI PhDs are still overwhelmingly male. In 2021, 78.7% of new AI PhDs were male.
Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender
imbalance in higher-level AI education.
Women make up an increasingly greater share of CS, CE, and information faculty hires.
Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from
24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male
(75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary.
American K–12 computer science education has become more diverse, in terms of both gender
and ethnicity. The share of AP computer science exams taken by female students increased from
16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and
Black/African American students taking AP computer science has likewise increased.
Chapter 8: Public Opinion
Chinese citizens are among those who feel the most positively about AI products and services.
Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest
proportion of surveyed countries) agreed with the statement that products and services using AI
have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and
India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the
lowest of surveyed countries) agreed that products and services using AI had more benefits than
drawbacks.
Men tend to feel more positively about AI products and services than women. Men are also
more likely than women to believe that AI will mostly help rather than harm. According to the
2022 IPSOS survey, men are more likely than women to report that AI products and services make
their lives easier, trust companies that use AI, and feel that AI products and services have more
benefits than drawbacks. A 2021 survey by Gallup and Lloyd’s Register Foundation likewise revealed
that men are more likely than women to agree with the statement that AI will mostly help rather than
harm their country in the next 20 years.
People across the world and especially America remain unconvinced by self-driving cars. In
a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew
Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good
idea for society.
Different causes for excitement and concern. Among a sample of surveyed Americans, those
who report feeling excited about AI are most excited about the potential to make life and society
better (31%) and to save time and make things more efficient (13%). Those who report feeling more
concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%);
and the lack of human connection (12%).
NLP researchers … have some strong opinions as well. According to a survey widely distributed to
NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence,
41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal
change. These were some of the many strong opinions held by the NLP research community.
Artificial Intelligence Index Report 2023

CHAPTER 1: Research and Development
Table of Contents

CHAPTER 1 PREVIEW: Research and Development

Overview
Chapter Highlights
1.1 Publications
    Overview
    Total Number of AI Publications
    By Type of Publication
    By Field of Study
    By Sector
    Cross-Country Collaboration
    Cross-Sector Collaboration
    AI Journal Publications (Overview; By Region; By Geographic Area; Citations)
    AI Conference Publications (Overview; By Region; By Geographic Area; Citations)
    AI Repositories (Overview; By Region; By Geographic Area; Citations)
    Narrative Highlight: Top Publishing Institutions (All Fields; Computer Vision; Natural Language Processing; Speech Recognition)
1.2 Trends in Significant Machine Learning Systems
    General Machine Learning Systems (System Types; Sector Analysis; National Affiliation; Authorship; Parameter Trends; Compute Trends)
    Large Language and Multimodal Models (National Affiliation; Parameter Count; Training Compute; Training Cost)
1.3 AI Conferences
    Conference Attendance
1.4 Open-Source AI Software
    Projects
    Stars

ACCESS THE PUBLIC DATA
Chapter 1: Research and Development
Overview
This chapter captures trends in AI R&D. It begins by examining AI publications,
including journal articles, conference papers, and repositories. Next it considers data
on significant machine learning systems, including large language and multimodal
models. Finally, the chapter concludes by looking at AI conference attendance and
open-source AI research. Although the United States and China continue to dominate
AI R&D, research efforts are becoming increasingly geographically dispersed.
Chapter Highlights
The United States and China
had the greatest number of
cross-country collaborations in AI
publications from 2010 to 2021,
although the pace of collaboration
has since slowed.
The number of AI research collaborations between
the United States and China increased roughly 4
times since 2010, and was 2.5 times greater than the
collaboration totals of the next nearest country pair,
the United Kingdom and China. However, the total
number of U.S.-China collaborations only increased
by 2.1% from 2020 to 2021, the smallest year-over-year growth rate since 2010.
AI research is on the rise, across
the board. The total number of AI publications
has more than doubled since 2010. The specific AI
topics that continue to dominate research include
pattern recognition, machine learning,
and computer vision.
China continues to lead in total
AI journal, conference, and
repository publications.
The United States is still ahead in terms of AI
conference and repository citations, but those leads
are slowly eroding. Still, the majority of the world’s
large language and multimodal models (54% in 2022)
are produced by American institutions.
Industry races ahead
of academia.
Until 2014, most significant machine
learning models were released by
academia. Since then, industry has taken
over. In 2022, there were 32 significant
industry-produced machine learning
models compared to just three produced
by academia. Building state-of-the-art
AI systems increasingly requires large
amounts of data, computer power, and
money—resources that industry actors
inherently possess in greater amounts
compared to nonprofits and academia.
Large language models
are getting bigger and
more expensive.
GPT-2, released in 2019, considered
by many to be the first large language
model, had 1.5 billion parameters and
cost an estimated $50,000 USD to
train. PaLM, one of the flagship large
language models launched in 2022,
had 540 billion parameters and cost an
estimated $8 million USD—PaLM was
around 360 times larger than GPT-2 and
cost 160 times more. It’s not just PaLM:
Across the board, large language and
multimodal models are becoming larger
and pricier.
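The scale and cost multiples quoted above follow directly from the reported figures. As a quick check (parameter counts as reported; training costs are estimates):

```python
# Reported figures: GPT-2 (2019) vs. PaLM (2022); costs are estimates.
gpt2_params, gpt2_cost = 1.5e9, 50_000        # parameters, training cost in USD
palm_params, palm_cost = 540e9, 8_000_000

size_multiple = palm_params / gpt2_params
cost_multiple = palm_cost / gpt2_cost

print(size_multiple, cost_multiple)  # 360.0 160.0
```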
1.1 Publications
This section draws on data from the Center for Security and Emerging Technology (CSET) at Georgetown University. CSET maintains a
merged corpus of scholarly literature that includes Digital Science’s Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph,
China National Knowledge Infrastructure, arXiv, and Papers With Code. In that corpus, CSET applied a classifier to identify English-language publications related to the development or application of AI and ML since 2010. For this year’s report, CSET also used select
Chinese AI keywords to identify Chinese-language AI papers; CSET did not deploy this method for previous iterations of the AI Index report.1
In last year’s edition of the report, publication trends were reported up to the year 2021. Because there is a significant lag in the collection of publication metadata, and in some cases it takes until the middle of a given year to fully capture the previous year’s publications, the AI Index team again elected to examine publication trends only through 2021, which we, along with CSET, are confident yields a more fully representative report.
Overview
The figures below capture the total number of English-language and Chinese-language AI publications globally from 2010 to 2021, by type, affiliation, cross-country collaboration, and cross-industry collaboration. The section also breaks down publication and citation data by region for AI journal articles, conference papers, repositories, and patents.

Total Number of AI Publications

Figure 1.1.1 shows the number of AI publications in the world. From 2010 to 2021, the total number of AI publications more than doubled, growing from 200,000 in 2010 to almost 500,000 in 2021.
Number of AI Publications in the World, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
(2021 total: 496.01 thousand AI publications)
Figure 1.1.1
1 See the Appendix for more information on CSET’s methodology. For more on the challenge of defining AI and correctly capturing relevant bibliometric data, see the AI Index team’s
discussion in the paper “Measurement in AI Policy: Opportunities and Challenges.”
By Type of Publication

Figure 1.1.2 shows the types of AI publications released globally over time. In 2021, 60% of all published AI documents were journal articles, 17% were conference papers, and 13% were repository submissions. Books, book chapters, theses, and unknown document types made up the remaining 10% of publications. While journal and repository publications have grown 3 and 26.6 times, respectively, in the past 12 years, the number of conference papers has declined since 2019.
Number of AI Publications by Type, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 totals (in thousands): 293.48 Journal; 85.09 Conference; 65.21 Repository; 29.88 Thesis; 13.77 Book Chapter; 5.82 Unknown; 2.76 Book
Figure 1.1.2
By Field of Study

Figure 1.1.3 shows that publications in pattern recognition and machine learning have experienced the sharpest growth in the last half decade. Since 2015, the number of pattern recognition papers has roughly doubled, while the number of machine learning papers has roughly quadrupled. Following those two topic areas, in 2021, the next most published AI fields of study were computer vision (30,075), algorithm (21,527), and data mining (19,181).
Number of AI Publications by Field of Study (Excluding Other AI), 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 totals (in thousands): 59.36 Pattern Recognition; 42.55 Machine Learning; 30.07 Computer Vision; 21.53 Algorithm; 19.18 Data Mining; 14.99 Natural Language Processing; 11.57 Control Theory; 10.37 Human–Computer Interaction; 6.74 Linguistics
Figure 1.1.3
By Sector

This section shows the number of AI publications affiliated with education, government, industry, nonprofit, and other sectors: first globally (Figure 1.1.4), then for the United States, China, and the European Union plus the United Kingdom (Figure 1.1.5).2 The education sector dominates in each region. The level of industry participation is highest in the United States, then in the European Union. Since 2010, the share of education AI publications has been dropping in each region.
AI Publications (% of Total) by Sector, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 75.23% Education; 13.60% Nonprofit; 7.21% Industry; 3.74% Government; 0.22% Other
Figure 1.1.4
2 The categorization is adapted based on the Global Research Identifier Database (GRID). Healthcare, including hospitals and facilities, is included under nonprofit. Publications affiliated with
state-sponsored universities are included in the education sector.
AI Publications (% of Total) by Sector and Geographic Area, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Sector shares in 2021 (United States / European Union and United Kingdom / China):
Education: 69.17% / 69.23% / 77.85%
Nonprofit: 14.82% / 18.63% / 11.73%
Industry: 12.60% / 7.90% / 5.47%
Government: 3.21% / 3.92% / 4.74%
Other: 0.20% / 0.33% / 0.20%
Figure 1.1.5
Cross-Country Collaboration

Cross-border collaborations between academics, researchers, industry experts, and others are a key component of modern STEM (science, technology, engineering, and mathematics) development that accelerate the dissemination of new ideas and the growth of research teams. Figures 1.1.6 and 1.1.7 depict the top cross-country AI collaborations from 2010 to 2021. CSET counted cross-country collaborations as distinct pairs of countries across authors for each publication (e.g., four U.S.- and four Chinese-affiliated authors on a single publication are counted as one U.S.-China collaboration; two publications between the same authors count as two collaborations).

By far, the greatest number of collaborations in the past 12 years took place between the United States and China, increasing roughly four times since 2010. However, the total number of U.S.-China collaborations only increased by 2.1% from 2020 to 2021, the smallest year-over-year growth rate since 2010.

The next largest set of collaborations was between the United Kingdom and both China and the United States. In 2021, the number of collaborations between the United States and China was 2.5 times greater than between the United Kingdom and China.
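That pair-counting rule can be sketched in a few lines of Python. This is a minimal illustration of the counting convention, not CSET’s actual pipeline; the function name and country codes are hypothetical:

```python
from collections import Counter
from itertools import combinations

def count_collaborations(publications):
    """Count one collaboration per distinct country pair per publication,
    regardless of how many authors share each affiliation."""
    pair_counts = Counter()
    for author_countries in publications:
        # Deduplicate affiliations, then enumerate unordered country pairs.
        for pair in combinations(sorted(set(author_countries)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Four U.S.- and four China-affiliated authors on one paper count once;
# a second U.S.-China paper counts again.
papers = [["US"] * 4 + ["CN"] * 4, ["US", "CN"]]
print(count_collaborations(papers)[("CN", "US")])  # 2
```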
United States and China Collaborations in AI Publications, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
(2021 total: 10.47 thousand U.S.-China collaborations)
Figure 1.1.6
Cross-Country Collaborations in AI Publications (Excluding U.S. and China), 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 totals (in thousands): 4.13 United Kingdom and China; 4.04 United States and United Kingdom; 3.42 United States and Germany; 2.80 China and Australia; 2.61 United States and Australia; 1.83 United States and France
Figure 1.1.7
Cross-Sector Collaboration

The increase in AI research outside of academia has broadened and grown collaboration across sectors in general. Figure 1.1.8 shows that in 2021 educational institutions and nonprofits (32,551) had the greatest number of collaborations, followed by industry and educational institutions (12,856), and educational and government institutions (8,913). Collaborations between educational institutions and industry have been among the fastest growing, increasing 4.2 times since 2010.
Cross-Sector Collaborations in AI Publications, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 totals (in thousands): 32.55 Education and Nonprofit; 12.86 Industry and Education; 8.91 Education and Government; 2.95 Government and Nonprofit; 2.26 Industry and Nonprofit; 0.63 Industry and Government
Figure 1.1.8
AI Journal Publications
Overview
After growing only slightly from 2010 to 2015, the number of AI journal publications has grown around 2.3 times since 2015. From 2020 to 2021, it increased 14.8% (Figure 1.1.9).
Number of AI Journal Publications, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
(2021 total: 293.48 thousand AI journal publications)
Figure 1.1.9
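The two growth statistics above can be reproduced from the underlying series. In the sketch below, only the 2021 total (293.48 thousand) is taken from Figure 1.1.9; the 2015 and 2020 values are hypothetical placeholders chosen to be consistent with the quoted multiples:

```python
# Yearly totals in thousands; 2021 is from Figure 1.1.9,
# the 2015 and 2020 values are illustrative placeholders.
totals = {2015: 127.6, 2020: 255.6, 2021: 293.48}

growth_since_2015 = totals[2021] / totals[2015]             # ~2.3x
yoy_change = (totals[2021] - totals[2020]) / totals[2020]   # ~14.8%

print(f"{growth_since_2015:.1f}x since 2015, {yoy_change:.1%} year over year")
```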
By Region3

Figure 1.1.10 shows the share of AI journal publications by region between 2010 and 2021. In 2021, East Asia and the Pacific led with 47.1%, followed by Europe and Central Asia (17.2%), and then North America (11.6%). Since 2019, the shares of publications from East Asia and the Pacific, Europe and Central Asia, and North America have been declining. During that period, there has been an increase in publications from other regions, such as South Asia and the Middle East and North Africa.
AI Journal Publications (% of World Total) by Region, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 47.14% East Asia and Pacific; 17.20% Europe and Central Asia; 11.61% North America; 6.93% Unknown; 6.75% South Asia; 4.64% Middle East and North Africa; 2.66% Latin America and the Caribbean; 2.30% Rest of the World; 0.77% Sub-Saharan Africa
Figure 1.1.10
3 Regions in this chapter are classified according to the World Bank analytical grouping.
By Geographic Area4

Figure 1.1.11 breaks down the share of AI journal publications over the past 12 years by geographic area. This year’s AI Index included India in recognition of the increasingly important role it plays in the AI ecosystem. China has remained the leader throughout, with 39.8% in 2021, followed by the European Union and the United Kingdom (15.1%), then the United States (10.0%). The share of Indian publications has been steadily increasing, from 1.3% in 2010 to 5.6% in 2021.
AI Journal Publications (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 39.78% China; 22.70% Rest of the World; 15.05% European Union and United Kingdom; 10.03% United States; 6.88% Unknown; 5.56% India
Figure 1.1.11
4 In this chapter we use “geographic area” based on CSET’s classifications, which are disaggregated not only by country, but also by territory. Further, we count the European Union and the
United Kingdom as a single geographic area to reflect the regions’ strong history of research collaboration.
Citations

China’s share of citations in AI journal publications has gradually increased since 2010, while those of the European Union and the United Kingdom, as well as those of the United States, have decreased (Figure 1.1.12). China, the European Union and the United Kingdom, and the United States accounted for 65.7% of the total citations in the world.
AI Journal Citations (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 29.07% China; 27.37% Rest of the World; 21.51% European Union and United Kingdom; 15.08% United States; 6.05% India; 0.92% Unknown
Figure 1.1.12
AI Conference Publications
Overview
The number of AI conference publications peaked in 2019, and fell 20.4% below the peak in 2021 (Figure 1.1.13).
The total number of 2021 AI conference publications, 85,094, was marginally greater than the 2010 total of 75,592.
Number of AI Conference Publications, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
(2021 total: 85.09 thousand AI conference publications)
Figure 1.1.13
By Region

Figure 1.1.14 shows the number of AI conference publications by region. As with the trend in journal publications, East Asia and the Pacific; Europe and Central Asia; and North America account for the world’s highest numbers of AI conference publications. Specifically, the share represented by East Asia and the Pacific continues to rise, accounting for 36.7% in 2021, followed by Europe and Central Asia (22.7%), and then North America (19.6%). The percentage of AI conference publications in South Asia saw a noticeable rise in the past 12 years, growing from 3.6% in 2010 to 8.5% in 2021.
AI Conference Publications (% of World Total) by Region, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 36.72% East Asia and Pacific; 22.66% Europe and Central Asia; 19.56% North America; 8.45% South Asia; 3.82% Middle East and North Africa; 3.07% Latin America and the Caribbean; 2.76% Unknown; 2.35% Rest of the World; 0.60% Sub-Saharan Africa
Figure 1.1.14
By Geographic Area

In 2021, China produced the greatest share of the world’s AI conference publications at 26.2%, having overtaken the European Union and the United Kingdom in 2017. The European Union plus the United Kingdom followed at 20.3%, and the United States came in third at 17.2% (Figure 1.1.15). Mirroring trends seen in other parts of the research and development section, India’s share of AI conference publications is also increasing.
AI Conference Publications (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 26.84% Rest of the World; 26.15% China; 20.29% European Union and United Kingdom; 17.23% United States; 6.79% India; 2.70% Unknown
Figure 1.1.15
Citations

Despite China producing the most AI conference publications in 2021, Figure 1.1.16 shows that the United States had the greatest share of AI conference citations, with 23.9%, followed by China’s 22.0%. However, the gap between American and Chinese AI conference citations is narrowing.
AI Conference Citations (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 25.57% Rest of the World; 23.86% United States; 22.02% China; 21.59% European Union and United Kingdom; 6.09% India; 0.87% Unknown
Figure 1.1.16
AI Repositories

Overview

Publishing pre-peer-reviewed papers on repositories of electronic preprints (such as arXiv and SSRN) has become a popular way for AI researchers to disseminate their work outside traditional avenues for publication. These repositories allow researchers to share their findings before submitting them to journals and conferences, thereby accelerating the cycle of information discovery. The number of AI repository publications grew almost 27 times in the past 12 years (Figure 1.1.17).
Number of AI Repository Publications, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
(2021 total: 65.21 thousand AI repository publications)
Figure 1.1.17
By Region

Figure 1.1.18 shows that North America has maintained a steady lead in the world share of AI repository publications since 2016. Since 2011, the share of repository publications from Europe and Central Asia has declined. The share represented by East Asia and the Pacific has grown significantly since 2010 and continued growing from 2020 to 2021, a period in which the year-over-year shares of North American as well as European and Central Asian repository publications declined.
AI Repository Publications (% of World Total) by Region, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 26.32% North America; 23.99% Unknown; 21.40% Europe and Central Asia; 17.88% East Asia and Pacific; 3.41% South Asia; 3.06% Middle East and North Africa; 1.81% Rest of the World; 1.80% Latin America and the Caribbean; 0.34% Sub-Saharan Africa
Figure 1.1.18
By Geographic Area

While the United States has held the lead in the percentage of global AI repository publications since 2016, China is catching up, and the European Union plus the United Kingdom’s share continues to drop (Figure 1.1.19). In 2021, the United States accounted for 23.5% of the world’s AI repository publications, followed by the European Union plus the United Kingdom (20.5%), and then China (11.9%).
AI Repository Publications (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 23.48% United States; 23.18% Unknown; 20.54% European Union and United Kingdom; 18.07% Rest of the World; 11.87% China; 2.85% India
Figure 1.1.19
Citations

In the citations of AI repository publications, Figure 1.1.20 shows that in 2021 the United States topped the list with 29.2% of overall citations, maintaining a dominant lead over the European Union plus the United Kingdom (21.5%), as well as China (21.0%).
AI Repository Citations (% of World Total) by Geographic Area, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2021 shares: 29.22% United States; 21.79% Rest of the World; 21.52% European Union and United Kingdom; 20.98% China; 4.59% Unknown; 1.91% India
Figure 1.1.20
Narrative Highlight:
Top Publishing Institutions

All Fields

Since 2010, the institution producing the greatest number of total AI papers has been the Chinese Academy of Sciences (Figure 1.1.21). The next top four are all Chinese universities: Tsinghua University, the University of the Chinese Academy of Sciences, Shanghai Jiao Tong University, and Zhejiang University.5 The total number of publications released by each of these institutions in 2021 is displayed in Figure 1.1.22.
Top Ten Institutions in the World in 2021 Ranked by Number of AI Publications in All Fields, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
1. Chinese Academy of Sciences
2. Tsinghua University
3. University of Chinese Academy of Sciences
4. Shanghai Jiao Tong University
5. Zhejiang University
6. Harbin Institute of Technology
7. Beihang University
8. University of Electronic Science and Technology of China
9. Peking University
10. Massachusetts Institute of Technology
Figure 1.1.21
5 It is important to note that many Chinese research institutions are large, centralized organizations with thousands of researchers. It is therefore not entirely surprising that,
purely by the metric of publication count, they outpublish most non-Chinese institutions.
Narrative Highlight:
Top Publishing Institutions (cont’d)
Top Ten Institutions in the World by Number of AI Publications in All Fields, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Chinese Academy of Sciences: 5,099
Tsinghua University: 3,373
University of Chinese Academy of Sciences: 2,904
Shanghai Jiao Tong University: 2,703
Zhejiang University: 2,590
Harbin Institute of Technology: 2,016
Beihang University: 1,970
University of Electronic Science and Technology of China: 1,951
Peking University: 1,893
Massachusetts Institute of Technology: 1,745
Figure 1.1.22
Narrative Highlight:
Top Publishing Institutions (cont’d)
Computer Vision
In 2021, the top 10 institutions publishing the greatest number of AI computer vision publications were
all Chinese (Figure 1.1.23). The Chinese Academy of Sciences published the largest number of such
publications, with a total of 562.
Top Ten Institutions in the World by Number of AI Publications in Computer Vision, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Chinese Academy of Sciences: 562
Shanghai Jiao Tong University: 316
University of Chinese Academy of Sciences: 314
Tsinghua University: 296
Zhejiang University: 289
Beihang University: 247
Wuhan University: 231
Beijing Institute of Technology: 229
Harbin Institute of Technology: 210
Tianjin University: 182
Figure 1.1.23
Narrative Highlight:
Top Publishing Institutions (cont’d)

Natural Language Processing

American institutions are represented to a greater degree in the share of top NLP publishers (Figure 1.1.24). Although the Chinese Academy of Sciences was again the world’s leading institution in 2021 (182 publications), Carnegie Mellon University took second place (140 publications), followed by Microsoft (134). In addition, 2021 was the first year Amazon and Alibaba were represented among the top ten largest-publishing NLP institutions.
Top Ten Institutions in the World by Number of AI Publications in Natural Language Processing, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Chinese Academy of Sciences: 182
Carnegie Mellon University: 140
Microsoft (United States): 134
Tsinghua University: 127
Carnegie Mellon University Australia: 116
Google (United States): 116
Peking University: 113
University of Chinese Academy of Sciences: 112
Alibaba Group (China): 100
Amazon (United States): 98
Figure 1.1.24
Narrative Highlight:
Top Publishing Institutions (cont’d)
Speech Recognition
In 2021, the greatest number of speech recognition papers came from the Chinese Academy of Sciences
(107), followed by Microsoft (98) and Google (75) (Figure 1.1.25). The Chinese Academy of Sciences
reclaimed the top spot in 2021 from Microsoft, which held first position in 2020.
Top Ten Institutions in the World by Number of AI Publications in Speech Recognition, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Chinese Academy of Sciences: 107
Microsoft (United States): 98
Google (United States): 75
University of Chinese Academy of Sciences: 66
Tsinghua University: 61
University of Science and Technology of China: 59
Carnegie Mellon University: 57
Tencent (China): 57
Chinese University of Hong Kong: 55
Amazon (United States): 54
Figure 1.1.25
1.2 Trends in Significant Machine Learning Systems
Epoch AI is a collective of researchers investigating and forecasting the development of advanced AI. Epoch curates a database of
significant AI and machine learning systems that have been released since the 1950s. There are different criteria under which the
Epoch team decides to include particular AI systems in their database; for example, the system may have registered a state-of-the-art
improvement, been deemed to have been historically significant, or been highly cited.
This subsection uses the Epoch database to track trends in significant AI and machine learning systems. The latter half of the chapter
includes research done by the AI Index team that reports trends in large language and multimodal models, which are models trained on
large amounts of data and adaptable to a variety of downstream applications.
General Machine Learning Systems
System Types
The figures below report trends among all machine learning systems included in the Epoch dataset. For reference, these systems are referred to as significant machine learning systems throughout the subsection.

Among the significant AI machine learning systems released in 2022, the most common class of system was language (Figure 1.2.1). There were 23 significant AI language systems released in 2022, roughly six times the number of the next most common system type, multimodal systems.
Number of Significant Machine Learning Systems by Domain, 2022
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.1 (bar chart): Language (23); Multimodal (4); Drawing (3); Vision (2); Speech (2); Text-to-Video (1); Other (1); Games (1). X-axis: Number of Significant Machine Learning Systems, 0–24.]
6 There were 38 total significant AI machine learning systems released in 2022, according to Epoch; however, one of the systems, BaGuaLu, did not have a domain classification and is therefore omitted from Figure 1.2.1.
Sector Analysis
Which sector among industry, academia, or nonprofit has released the greatest number of significant machine learning systems? Until 2014, most machine learning systems were released by academia. Since then, industry has taken over (Figure 1.2.2). In 2022, there were 32 significant industry-produced machine learning systems compared to just three produced by academia. Producing state-of-the-art AI systems increasingly requires large amounts of data, computing power, and money: resources that industry actors possess in greater amounts compared to nonprofits and academia.
Number of Significant Machine Learning Systems by Sector, 2002–22
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.2 (line chart, 2002–2022): 2022 values are Industry (32); Academia (3); Research Collective (2); Industry-Academia Collaboration (1); Nonprofit (0). Y-axis: Number of Significant Machine Learning Systems, 0–35.]
National Affiliation
In order to paint a picture of AI’s evolving geopolitical landscape, the AI Index research team identified the nationality of the authors who contributed to the development of each significant machine learning system in the Epoch dataset.7

Systems

Figure 1.2.3 showcases the total number of significant machine learning systems attributed to researchers from particular countries.8 A researcher is considered to have belonged to the country in which their institution, for example a university or AI-research firm, was headquartered. In 2022, the United States produced the greatest number of significant machine learning systems with 16, followed by the United Kingdom (8) and China (3). Moreover, since 2002 the United States has outpaced the United Kingdom and the European Union, as well as China, in terms of the total number of significant machine learning systems produced (Figure 1.2.4). Figure 1.2.5 displays the total number of significant machine learning systems produced by country since 2002 for the entire world.
Number of Significant Machine Learning Systems by Country, 2022
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.3 (bar chart): United States (16); United Kingdom (8); China (3); Canada (2); Germany (2); France (1); India (1); Israel (1); Russia (1); Singapore (1). X-axis: Number of Significant Machine Learning Systems, 0–16.]

Number of Significant Machine Learning Systems by Select Geographic Area, 2002–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.4 (line chart, 2002–2022): 2022 values are United States (16); European Union and United Kingdom (12); China (3). Y-axis: Number of Significant Machine Learning Systems, 0–30.]
7 The methodology by which the AI Index identified authors’ nationality is outlined in greater detail in the Appendix.
8 A machine learning system is considered to be affiliated with a particular country if at least one author involved in creating the model was affiliated with that country. Consequently, in cases where a system has authors from multiple countries, double counting may occur.
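The affiliation rule in footnote 8 can be sketched as a small counting procedure (a hypothetical illustration, not the AI Index's actual code): each system contributes one count to every distinct country that appears among its authors' affiliations, which is why multinational systems are counted in more than one country.

```python
from collections import Counter

def count_systems_by_country(systems):
    """Count significant systems per country.

    Each system is given as a list of its authors' affiliation
    countries. A system counts once for every distinct country
    among its authors, so a multinational system is counted in
    several countries (the double counting noted in footnote 8).
    """
    totals = Counter()
    for author_countries in systems:
        for country in set(author_countries):  # each country at most once per system
            totals[country] += 1
    return totals

# Hypothetical example: one purely American system, and one system
# with authors affiliated with both the U.S. and the U.K.
systems = [
    ["United States", "United States"],
    ["United States", "United Kingdom"],
]
print(count_systems_by_country(systems))
# Counter({'United States': 2, 'United Kingdom': 1})
```

Note that the column totals in Figures 1.2.3 and 1.2.4 can therefore exceed the number of systems released.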
Number of Machine Learning Systems by Country, 2002–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.5 (world map): countries shaded by total number of systems. Legend bins: 0; 1–10; 11–20; 21–60; 61–255.]
Authorship
Figures 1.2.6 to 1.2.8 look at the total number of authors, disaggregated by national affiliation, that contributed to the launch of significant machine learning systems. As was the case with total systems, in 2022 the United States had the greatest number of authors producing significant machine learning systems, with 285, more than double that of the United Kingdom and nearly six times that of China (Figure 1.2.6).
Number of Authors of Significant Machine Learning Systems by Country, 2022
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.6 (bar chart): United States (285); United Kingdom (139); China (49); Canada (21); Israel (13); Sweden (8); Germany (7); Russia (3); India (2); France (1). X-axis: Number of Authors, 0–300.]

Number of Authors of Significant Machine Learning Systems by Select Geographic Area, 2002–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.7 (line chart, 2002–2022): 2022 values are United States (285); European Union and United Kingdom (155); China (49). Y-axis: Number of Authors, 0–400.]

Number of Authors of Machine Learning Systems by Country, 2002–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.8 (world map): countries shaded by total author count. Legend bins: 0; 1–10; 11–20; 21–60; 61–180; 181–370; 371–680; 681–2000.]
Parameter Trends
Parameters are numerical values that are learned by machine learning models during training. The value of parameters in machine learning models determines how a model might interpret input data and make predictions. Adjusting parameters is an essential step in ensuring that the performance of a machine learning system is optimized. Figure 1.2.9 highlights the number of parameters of the machine learning systems included in the Epoch dataset by sector.

Over time, there has been a steady increase in the number of parameters, an increase that has become particularly sharp since the early 2010s. The fact that AI systems are rapidly increasing their parameters is reflective of the increased complexity of the tasks they are being asked to perform, the greater availability of data, advancements in underlying hardware, and, most importantly, the demonstrated performance of larger models.
Number of Parameters of Significant Machine Learning Systems by Sector, 1950–2022
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.9 (scatter plot): systems colored by sector (Academia; Industry; Industry-Academia Collaboration; Nonprofit; Research Collective). Y-axis: Number of Parameters, 1.0e+2 to 1.0e+14 (log scale); X-axis: 1950–2022.]
Figure 1.2.10 demonstrates the parameters of machine learning systems by domain. In recent years, there has
been a rise in parameter-rich systems.
Number of Parameters of Significant Machine Learning Systems by Domain, 1950–2022
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.10 (scatter plot): systems colored by domain (Language; Vision; Games). Y-axis: Number of Parameters, 1.0e+2 to 1.0e+12 (log scale); X-axis: 1954–2022.]
Compute Trends
The computational power, or “compute,” of AI systems refers to the amount of computational resources needed to train and run a machine learning system. Typically, the more complex a system is, and the larger the dataset on which it is trained, the greater the amount of compute required. The amount of compute used by significant AI machine learning systems has increased exponentially in the last half-decade (Figure 1.2.11).9

The growing demand for compute in AI carries several important implications. For example, more compute-intensive models tend to have greater environmental impacts, and industrial players tend to have easier access to computational resources than others, such as universities.
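For a sense of the magnitudes involved, a widely used rule of thumb (an outside approximation, not part of Epoch's methodology) estimates the training compute of a dense transformer at roughly 6 FLOP per parameter per training token: 2 for the forward pass and about 4 for the backward pass.

```python
def approx_training_flop(parameters: float, tokens: float) -> float:
    """Rough transformer training compute: ~6 FLOP per parameter
    per training token (forward pass plus backward pass)."""
    return 6.0 * parameters * tokens

# GPT-3: 175 billion parameters trained on roughly 300 billion tokens
flop = approx_training_flop(175e9, 300e9)
print(f"{flop:.2e}")  # 3.15e+23 FLOP
```

This back-of-the-envelope figure for GPT-3 sits squarely in the 1e+23 to 1e+24 band at the top of Figure 1.2.11, which is why recent frontier systems dominate the upper right of the chart.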
Training Compute (FLOP) of Significant Machine Learning Systems by Sector, 1950–2022
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.11 (scatter plot): systems colored by sector (Academia; Industry; Industry-Academia Collaboration; Nonprofit; Research Collective). Y-axis: Training Compute, 1.0e+0 to 1.0e+24 FLOP (log scale); X-axis: 1950–2022.]
9 FLOP stands for “floating point operations” and is a measure of the total amount of computation performed; the higher a system’s training FLOP count, the more compute its training required.
Since 2010, language models have increasingly demanded the most computational resources of all machine learning systems.
Training Compute (FLOP) of Significant Machine Learning Systems by Domain, 1950–2022
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.12 (scatter plot): systems colored by domain (Language; Vision; Games). Y-axis: Training Compute, 1.0e+3 to 1.0e+24 FLOP (log scale); X-axis: 1954–2022.]
Large Language and Multimodal Models
Large language and multimodal models, sometimes called foundation models, are an emerging and increasingly popular type of AI model that is trained on huge amounts of data and adaptable to a variety of downstream applications. Large language and multimodal models like ChatGPT, DALL-E 2, and Make-A-Video have demonstrated impressive capabilities and are starting to be widely deployed in the real world.

National Affiliation

This year the AI Index conducted an analysis of the national affiliation of the authors responsible for releasing new large language and multimodal models.10 The majority of these researchers were from American institutions (54.2%) (Figure 1.2.13). In 2022, for the first time, researchers from Canada, Germany, and India contributed to the development of large language and multimodal models.
Authors of Select Large Language and Multimodal Models (% of Total) by Country, 2019–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.13 (line chart, 2019–2022): 2022 author shares are United States (54.02%); United Kingdom (21.88%); China (8.04%); Canada (6.25%); Israel (5.80%); Germany (3.12%); India (0.89%); Korea (0.00%).]
Figure 1.2.14 offers a timeline view of the large language and multimodal models that have been released since GPT-2, along with the national affiliations of the researchers who produced the models. Some of the notable American large language and multimodal models released in 2022 included OpenAI’s DALL-E 2 and Google’s PaLM (540B). The only Chinese large language and multimodal model released in 2022 was GLM-130B, an impressive bilingual (English and Chinese) model created by researchers at Tsinghua University. BLOOM, also launched in late 2022, was listed as indeterminate given that it was the result of a collaboration of more than 1,000 international researchers.
10 The AI models that were considered to be large language and multimodal models were hand-selected by the AI Index steering committee. It is possible that this selection may have omitted certain models.
Timeline and National Affiliation of Select Large Language and Multimodal Model Releases
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.14 (timeline, Jan. 2019–Jan. 2023): model releases from GPT-2 (early 2019) through BLOOM (late 2022), including Grover-Mega, T5-3B, T5-11B, Megatron-LM (Original, 8.3B), Meena, Turing NLG, GPT-3 175B (davinci), ERNIE-GEN (large), DALL-E, GPT-Neo, GPT-J-6B, ERNIE 3.0, Jurassic-1-Jumbo, Wu Dao - Wen Yuan, CogView, PanGu-alpha, HyperClova, Wu Dao 2.0, Codex, Megatron-Turing NLG 530B, Gopher, AlphaCode, InstructGPT, GPT-NeoX-20B, Chinchilla, DALL·E 2, OPT-175B, Imagen, Minerva (540B), PaLM (540B), Stable Diffusion (LDM-KL-8-G), Jurassic-X, and GLM-130B. Releases are coded by national affiliation: United States; United Kingdom; China; United States, United Kingdom, Germany, India; Korea; Canada; Israel; Germany; Indeterminate.]
11 While we were conducting the analysis to produce Figure 1.2.14, Irene Solaiman published a paper that has a similar analysis. We were not aware of the paper at the time of our research.
Parameter Count
Over time, the number of parameters of newly released large language and multimodal models has massively increased. For example, GPT-2, which was the first large language and multimodal model released in 2019, only had 1.5 billion parameters. PaLM, launched by Google in 2022, had 540 billion, nearly 360 times more than GPT-2. The median number of parameters in large language and multimodal models is increasing exponentially over time (Figure 1.2.15).
Number of Parameters of Select Large Language and Multimodal Models, 2019–22
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.15 (scatter plot, Feb. 2019–Nov. 2022): parameter counts range from the order of 1.0e+9 (e.g., GPT-2) to above 1.0e+12 (Wu Dao 2.0), with models such as GPT-3 175B (davinci), Jurassic-1-Jumbo, Megatron-Turing NLG 530B, Gopher, Chinchilla, OPT-175B, PaLM (540B), Minerva (540B), GLM-130B, and BLOOM in between. Y-axis: Number of Parameters, 3.2e+8 to 3.2e+12 (log scale).]
Training Compute
The training compute of large language and multimodal models has also steadily increased (Figure 1.2.16). The compute used to train Minerva (540B), a large language and multimodal model released by Google in June 2022 that displayed impressive abilities on quantitative reasoning problems, was roughly nine times greater than that used for OpenAI’s GPT-3, which was released in June 2020, and roughly 1839 times greater than that used for GPT-2 (released February 2019).
Training Compute (FLOP) of Select Large Language and Multimodal Models, 2019–22
Source: Epoch, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.16 (scatter plot, Feb. 2019–Nov. 2022): training compute ranges from the order of 1.0e+18 FLOP (ERNIE 3.0, at the low end) to above 1.0e+24 FLOP (PaLM (540B) and Minerva (540B), at the high end), with models such as GPT-2, Megatron-LM (Original, 8.3B), T5-3B, T5-11B, Meena, Turing NLG, GPT-3 175B (davinci), Megatron-Turing NLG 530B, Gopher, Chinchilla, OPT-175B, BLOOM, and GLM-130B in between. Y-axis: Training Compute, 1.0e+18 to 3.2e+24 FLOP (log scale).]
Training Cost
A particular theme of the discourse around large language and multimodal models has to do with their hypothesized costs. Although AI companies rarely speak openly about training costs, it is widely speculated that these models cost millions of dollars to train and will become increasingly expensive with scale.

This subsection presents novel analysis in which the AI Index research team generated estimates for the training costs of various large language and multimodal models (Figure 1.2.17). These estimates are based on the hardware and training time disclosed by the models’ authors. In cases where training time was not disclosed, we calculated it from hardware speed, training compute, and hardware utilization efficiency. Given the possible variability of the estimates, we have qualified each estimate with the tag of mid, high, or low: mid where the estimate is thought to be a mid-level estimate, high where it is thought to be an overestimate, and low where it is thought to be an underestimate. In certain cases, there was not enough data to estimate the training cost of particular large language and multimodal models; these models were therefore omitted from our analysis.

The AI Index estimates validate popular claims that large language and multimodal models are increasingly costing millions of dollars to train. For example, Chinchilla, a large language model launched by DeepMind in May 2022, is estimated to have cost $2.1 million, while BLOOM’s training is thought to have cost $2.3 million.
Estimated Training Cost of Select Large Language and Multimodal Models
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.17 (bar chart, 2019–2022): each estimate is tagged Mid, High, or Low; estimated costs range from roughly $0.01 million to $11.35 million. Models shown include GPT-2, T5-11B, Meena, Turing NLG, GPT-3 175B, Wu Dao - Wen Yuan, GPT-Neo, GPT-J-6B, DALL-E, HyperClova, Megatron-Turing NLG 530B, Gopher, AlphaCode, ERNIE 3.0, Codex, GPT-NeoX-20B, Chinchilla, Minerva (540B), PaLM (540B), Stable Diffusion (LDM-KL-8-G), OPT-175B, GLM-130B, and BLOOM. Y-axis: Training Cost (in Millions of U.S. Dollars), 0–12.]
12 See Appendix for the complete methodology behind the cost estimates.
There is also a clear relationship between the cost of large language and multimodal models and their size. As evidenced in Figures 1.2.18 and 1.2.19, models with more parameters, and models trained with larger amounts of compute, tend to be more expensive.
Estimated Training Cost of Select Large Language and Multimodal Models and Number of Parameters
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.18 (scatter plot, log-log): estimated training cost ($10K to $10M) plotted against number of parameters (roughly 1.0e+9 to 5.0e+11); models include GPT-2, GPT-Neo, Meena, Wu Dao - Wen Yuan, GPT-J-6B, DALL-E, T5-11B, ERNIE 3.0, Codex, Turing NLG, GPT-NeoX-20B, AlphaCode, Chinchilla, GPT-3 175B, BLOOM, OPT-175B, GLM-130B, HyperClova, Gopher, Megatron-Turing NLG 530B, Minerva (540B), and PaLM (540B).]

Estimated Training Cost of Select Large Language and Multimodal Models and Training Compute (FLOP)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.2.19 (scatter plot, log-log): estimated training cost ($10K to $10M) plotted against training compute (roughly 1.0e+18 to 1.0e+24 FLOP); the same models appear, from ERNIE 3.0 at the low-compute end to Minerva (540B), PaLM (540B), Megatron-Turing NLG 530B, OPT-175B, Gopher, and Chinchilla at the high end.]
1.3 AI Conferences
AI conferences are key venues for researchers to share their work and connect with peers and collaborators. Conference attendance is an
indication of broader industrial and academic interest in a scientific field. In the past 20 years, AI conferences have grown in size, number,
and prestige. This section presents data on the trends in attendance at major AI conferences.
Conference Attendance
After a period of increasing attendance, the total attendance at the conferences for which the AI Index collected data dipped in 2021 and again in 2022 (Figure 1.3.1).13 This decline may be attributed to the fact that many conferences returned to hybrid or in-person formats after being fully virtual in 2020 and 2021. For example, the International Joint Conference on Artificial Intelligence (IJCAI) and the International Conference on Principles of Knowledge Representation and Reasoning (KR) were both held strictly in-person.

Neural Information Processing Systems (NeurIPS) continued to be one of the most attended conferences, with around 15,530 attendees (Figure 1.3.2).14 The conference with the greatest one-year increase in attendance was the International Conference on Robotics and Automation (ICRA), which grew from 1,000 attendees in 2021 to 8,008 in 2022.
Number of Attendees at Select AI Conferences, 2010–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.3.1 (line chart, 2010–2022): total attendance in 2022 was 59.45 thousand. Y-axis: Number of Attendees (in Thousands), 0–90.]
13 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that measuring the exact attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world.
14 In 2021, 9,560 of the attendees attended NeurIPS in-person and 5,970 remotely.
Attendance at Large Conferences, 2010–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.3.2 (line chart, 2010–2022): 2022 attendance in thousands: NeurIPS (15.53); CVPR (10.17); ICRA (8.01); ICML (7.73); ICLR (5.35); IROS (4.32); AAAI (3.56). Y-axis: Number of Attendees (in Thousands), 0–30.]
Attendance at Small Conferences, 2010–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Figure 1.3.3 (line chart, 2010–2022): 2022 attendance in thousands: IJCAI (2.01); FAccT (1.09); UAI (0.66); AAMAS (0.50); ICAPS (0.39); KR (0.12). Y-axis: Number of Attendees (in Thousands), 0.00–3.50.]
1.4 Open-Source AI Software
GitHub is a web-based platform where individuals and coding teams can host, review, and collaborate on various code repositories.
GitHub is used extensively by software developers to manage and share code, collaborate on various projects, and support open-source
software. This subsection uses data provided by GitHub and the OECD.AI policy observatory. These trends can serve as a proxy for some of the broader trends occurring in the world of open-source AI software not captured by academic publication data.
Projects
A GitHub project is a collection of files that can include the source code, documentation, configuration files, and images that constitute a software project. Since 2011, the total number of AI-related GitHub projects has steadily increased, growing from 1,536 in 2011 to 347,934 in 2022.
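The growth from 1,536 projects in 2011 to 347,934 in 2022 implies a compound annual growth rate that can be checked directly; this is a back-of-the-envelope calculation on the two figures quoted above, not part of the GitHub/OECD.AI methodology.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two counts
    separated by the given number of years."""
    return (end / start) ** (1 / years) - 1

# 1,536 AI projects in 2011 -> 347,934 in 2022 (11 years apart)
rate = cagr(1536, 347934, 11)
print(f"{rate:.0%}")  # roughly 60-65% growth per year
```

Sustained growth above 60% per year is what produces the hockey-stick shape of Figure 1.4.1.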
Number of GitHub AI Projects, 2011–22
Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report
[Figure 1.4.1 (bar chart, 2011–2022): the total reached 348 thousand projects in 2022. Y-axis: Number of AI Projects (in Thousands), 0–350.]
As of 2022, a large proportion of GitHub AI projects were contributed by software developers in India (24.2%) (Figure 1.4.2). The next most represented geographic area was the European Union and the United Kingdom (17.3%), and then the United States (14.0%). The share of American GitHub AI projects has been declining steadily since 2016.
GitHub AI Projects (% Total) by Geographic Area, 2011–22
Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report
[Figure 1.4.2 (line chart, 2011–2022): 2022 shares are Rest of the World (42.11%); India (24.19%); European Union and United Kingdom (17.30%); United States (14.00%); China (2.40%).]
Stars
GitHub users can bookmark or save a repository of interest by “starring” it. A GitHub star is similar to a “like” on a social media platform and indicates support for a particular open-source project. Some of the most starred GitHub repositories include libraries like TensorFlow, OpenCV, Keras, and PyTorch, which are widely used by software developers in the AI coding community.

Figure 1.4.3 shows the cumulative number of stars attributed to projects belonging to owners of various geographic areas. As of 2022, GitHub AI projects from the United States received the most stars, followed by the European Union and the United Kingdom, and then China. In many geographic areas, the total number of new GitHub stars has leveled off in the last few years.
Number of GitHub Stars by Geographic Area, 2011–22
Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report
[Figure 1.4.3 (line chart, 2011–2022): cumulative stars in millions as of 2022: United States (3.44); Rest of the World (2.69); European Union and United Kingdom (2.34); China (1.53); India (0.46).]
CHAPTER 2: Technical Performance
CHAPTER 2 PREVIEW: Technical Performance
Overview 72
Chapter Highlights 73
2.1 What’s New in 2022: A Timeline 74
2.2 Computer Vision—Image 81
  Image Classification 81
    ImageNet 81
  Face Detection and Recognition 82
    National Institute of Standards and Technology Face Recognition Vendor Test (FRVT) 83
  Deepfake Detection 84
    Celeb-DF 84
  Human Pose Estimation 85
    MPII 85
  Semantic Segmentation 86
    Cityscapes Challenge, Pixel-Level Semantic Labeling Task 86
  Medical Image Segmentation 87
    Kvasir-SEG 87
  Object Detection 88
    Common Objects in Context (COCO) 88
  Image Generation 89
    CIFAR-10 and STL-10 89
  Narrative Highlight: A Closer Look at Progress in Image Generation 90
  Visual Reasoning 92
    Visual Question Answering (VQA) Challenge 92
  Narrative Highlight: The Rise of Capable Multimodal Reasoning Systems 93
    Visual Commonsense Reasoning (VCR) 95
2.3 Computer Vision—Video 96
  Activity Recognition 96
    Kinetics-400, Kinetics-600, Kinetics-700 96
  Narrative Highlight: A Closer Look at the Progress of Video Generation 98
2.4 Language 99
  English Language Understanding 99
    SuperGLUE 99
    Reading Comprehension Dataset Requiring Logical Reasoning (ReClor) 100
  Narrative Highlight: Just How Much Better Have Language Models Become? 102
  Narrative Highlight: Planning and Reasoning in Large Language Models 103
  Text Summarization 104
    arXiv and PubMed 104
CHAPTER 2 PREVIEW (CONT’D): Technical Performance
  Natural Language Inference 105
    Abductive Natural Language Inference (aNLI) 105
  Sentiment Analysis 106
    SST-5 Fine-Grained Classification 106
  Multitask Language Understanding 107
    Massive Multitask Language Understanding (MMLU) 107
  Machine Translation (MT) 108
    Number of Commercially Available MT Systems 108
2.5 Speech 109
  Speech Recognition 109
    VoxCeleb 109
  Narrative Highlight: Whisper 110
2.6 Reinforcement Learning 112
  Reinforcement Learning Environments 112
    Procgen 112
  Narrative Highlight: Benchmark Saturation 114
2.7 Hardware 115
  MLPerf Training Time 115
  MLPerf Inference 117
  Trends in GPUs 118
2.8 Environment 120
  Environmental Impact of Select Large Language Models 120
  Narrative Highlight: Using AI to Optimize Energy Usage 122
2.9 AI for Science 123
  Accelerating Fusion Science Through Learned Plasma Control 123
  Discovering Novel Algorithms for Matrix Manipulation With AlphaTensor 123
  Designing Arithmetic Circuits With Deep Reinforcement Learning 124
  Unlocking de Novo Antibody Design With Generative AI 124
ACCESS THE PUBLIC DATA
Chapter 2: Technical Performance
Overview
This year’s technical performance chapter features analysis of the technical progress in
AI during 2022. Building on previous reports, this chapter chronicles advancement in
computer vision, language, speech, reinforcement learning, and hardware. Moreover,
this year this chapter features an analysis on the environmental impact of AI, a discussion
of the ways in which AI has furthered scientific progress, and a timeline-style overview
of some of the most significant recent AI developments.
Chapter Highlights
Performance saturation on
traditional benchmarks.
Generative AI breaks into
the public consciousness.
AI continued to post state-of-the-art results,
but year-over-year improvement on many
benchmarks continues to be marginal.
Moreover, the speed at which benchmark
saturation is being reached is increasing.
However, new, more comprehensive
benchmarking suites such as BIG-bench and
HELM are being released.
2022 saw the release of text-to-image models
like DALL-E 2 and Stable Diffusion, text-tovideo systems like Make-A-Video, and chatbots
like ChatGPT. Still, these systems can be
prone to hallucination, confidently outputting
incoherent or untrue responses, making it hard
to rely on them for critical applications.
AI systems become
more flexible.
Traditionally AI systems have performed well
on narrow tasks but have struggled across
broader tasks. Recently released models
challenge that trend; BEiT-3, PaLI, and
Gato, among others, are single AI systems
increasingly capable of navigating multiple
tasks (for example, vision, language).
AI is both helping and harming
the environment.
New research suggests that AI systems can
have serious environmental impacts. According
to Luccioni et al., 2022, BLOOM’s training run
emitted 25 times more carbon than a single air
traveler on a one-way trip from New York to
San Francisco. Still, new reinforcement learning
models like BCOOLER show that AI systems
can be used to optimize energy usage.
Capable language models still struggle with reasoning.
Language models continued to improve their generative capabilities, but new research suggests that they still struggle with complex planning tasks.

The world's best new scientist … AI?
AI models are starting to rapidly accelerate scientific progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix manipulation, and generate new antibodies.
AI starts to build better AI.
Nvidia used an AI reinforcement learning agent
to improve the design of the chips that power AI
systems. Similarly, Google recently used one of
its language models, PaLM, to suggest ways to
improve the very same model. Self-improving AI systems will accelerate AI progress.
2.1 What’s New in 2022: A Timeline
The technical performance chapter begins with an overview of some of the most significant technical developments in AI during 2022,
as selected by the AI Index Steering Committee.
Feb. 2, 2022
DeepMind Releases AlphaCode
AlphaCode, an AI system that writes computer programs
at a competitive level, achieves a rank within the top 54%
of participants in a human programming competition. This represents an improvement on the more complex problem-solving tasks with which AI has traditionally struggled.
Figure 2.1.1
Feb. 16, 2022
DeepMind Trains Reinforcement Learning Agent to
Control Nuclear Fusion Plasma in a Tokamak
Nuclear fusion is a potential source of clean, limitless
energy, but producing such energy in tokamaks is difficult
due to a lack of experimental data. DeepMind simulated
optimal tokamak management, an example of how AI can
accelerate science and combat climate change.
Figure 2.1.2

March 10, 2022
IndicNLG Benchmarks Natural Language Generation for Indic Languages
An international research collective launches IndicNLG, a collection of datasets for benchmarking
natural language generation for 11 Indic languages. The creation of IndicNLG increases the
potential for AI systems to generate language in more diverse, non-English linguistic settings.
Figure 2.1.3
March 24, 2022
Meta AI Releases Make-A-Scene
Make-A-Scene is a text-to-image AI model that
enables users to generate images through text.
Make-A-Scene is one of many text-to-image
models released in 2022.
Figure 2.1.4
April 5, 2022
Google Releases PaLM
Google’s AI team trains one of the world’s
largest language models, PaLM. Made up
of 540 billion parameters, PaLM reinforces
the belief that researchers can improve
performance on large language models by
simply training them on more data.
Figure 2.1.5

April 13, 2022
OpenAI Releases DALL-E 2
DALL-E 2, a text-to-image AI system that can create
realistic art and images from textual descriptions, is
released to the public, igniting a generative AI craze.
Figure 2.1.6
May 12, 2022
DeepMind Launches Gato
Gato is a new reinforcement learning agent
capable of doing a wide range of tasks such
as robotic manipulation, game playing, image
captioning, and natural language generation.
The release of such models suggests that AI
systems are becoming better at generalization.
Figure 2.1.7
May 23, 2022
Google Releases Imagen
Imagen is a text-to-image diffusion model capable
of producing images with a high degree of
photorealism. Imagen’s launch also comes with
the release of DrawBench, a challenging new
benchmark for text-to-image systems.
Figure 2.1.8
June 9, 2022
442 Authors Across 132 Institutions Team Up to Launch BIG-bench
In order to better challenge increasingly capable large language models, a team of 442 authors
across 132 institutions launch the Beyond the Imitation Game benchmark (BIG-bench). The
benchmark consists of 204 tasks ranging from linguistics, childhood development, math,
common-sense reasoning, biology, physics, social bias, and software development.
Figure 2.1.9
June 21, 2022
GitHub Makes Copilot Available as
a Subscription-Based Service for
Individual Developers
Copilot is a generative AI system capable
of turning natural language prompts
into coding suggestions across multiple
languages. Similar systems include OpenAI’s
Codex and Salesforce’s CodeGen. Surveys
suggest that Copilot makes coders more
productive and less frustrated.
Figure 2.1.10
July 8, 2022
Nvidia Uses Reinforcement Learning to
Design Better-Performing GPUs
Nvidia uses its AI systems to improve the
performance of its latest H100 class of GPU chips.
Because GPUs are essential to AI training, this is one example of how AI is starting to develop better AI.
Figure 2.1.11
July 11, 2022
Meta Announces
‘No Language Left Behind’
No Language Left Behind (NLLB) is
a family of models that can translate
across 200 distinct languages. NLLB is
one of the first systems that can perform
well across a wide range of low-resource
languages like Kamba and Lao.
Figure 2.1.12
Aug 4, 2022
Tsinghua Researchers Launch GLM-130B
Chinese researchers affiliated with Tsinghua
University release GLM-130B, a large language
model that outperforms others such as Meta’s
OPT, Hugging Face’s BLOOM, and OpenAI’s
original GPT-3.
Figure 2.1.13

Aug 22, 2022
Stability AI Releases Stable Diffusion
Stable Diffusion is an open-source text-to-image
diffusion-based model, meaning users can freely use
the model weights to generate their own images. Stable
Diffusion is trained on existing images created by humans
and gives no credit or acknowledgment, leaving open
questions around the ethical use of image generators.
Figure 2.1.14
Sept 21, 2022
OpenAI Launches Whisper
Whisper is a large-scale speech-recognition system
trained on roughly 700,000 hours of audio data and
capable of respectable performance on various speech
recognition tasks. Whisper achieved strong performance by merely increasing training data, requiring neither supervised pre-training nor unsupervised training with fine-tuning, which further validates the approach of increasingly scaling AI models.
Figure 2.1.15

Sept 29, 2022
Meta Releases Make-A-Video
Make-A-Video is a system that allows users
to create videos from short text descriptions.
The quality of the videos is high and again
demonstrates the validity of the scaling
approach.
Figure 2.1.16
Oct 5, 2022
DeepMind Launches AlphaTensor
AlphaTensor is an AI reinforcement-learning-based system able to discover new and
efficient algorithms for matrix manipulation.
Matrix manipulation is essential to a wide
range of digital practices and is a process
that researchers have been trying to make
more efficient for decades.
Figure 2.1.17
Oct 20, 2022
Google Uses PaLM to Improve
the Reasoning of PaLM
Google researchers use one of their existing language models, PaLM, to improve the reasoning of the very same model. This process is yet another example of AI systems using their own knowledge to improve.
Figure 2.1.18
Nov 9, 2022
International Research
Group Releases BLOOM
A collaboration of over 100 researchers from across the globe develops an open-access language model called BLOOM. BLOOM impresses with its public release and for furthering the possibilities of international collaboration in AI research.
Figure 2.1.19
Nov 16, 2022
Stanford Researchers Release HELM
As part of an attempt to judge new language models according to more unified standards, Stanford
researchers develop a new benchmarking approach for large language models called Holistic Evaluation
of Language Models (HELM). The launch of HELM is evidence of the AI community’s attempt to develop
transparency around increasingly powerful, capable, and influential large language models.
Figure 2.1.20
Nov 22, 2022
Meta Releases CICERO
CICERO is the first AI to play in
the top 10% of human participants
in the game Diplomacy. CICERO’s
launch shows that AI systems have
improved in strategic reasoning, a
domain in which they have traditionally
struggled, and are capable of
effectively convincing humans to go
along with their objectives.
Figure 2.1.21

Nov 30, 2022
OpenAI Launches ChatGPT
ChatGPT is an impressive, publicly usable chatbot capable of writing university-level essays. Months after launching, ChatGPT reaches 100 million monthly active users, making it the fastest-growing consumer application in history. ChatGPT's release caps a year in which generative AI became a part of the zeitgeist, and raises questions about the effect that AI will have on the future of humanity.
Figure 2.1.22
2.2 Computer Vision—Image
Computer vision is the subfield of AI that teaches machines to understand images and videos. Computer vision technologies have a
variety of important real-world applications, such as autonomous driving, crowd surveillance, sports analytics, and video-game creation.
This section tracks progress in computer vision across several different task domains which include: (1) image classification, (2)
face detection and recognition, (3) deepfake detection, (4) human pose estimation, (5) semantic segmentation, (6) medical image
segmentation, (7) object detection, (8) image generation, and (9) visual reasoning.
Image Classification
Image classification is the ability of machines to categorize objects in images (Figure 2.2.1).

A Demonstration of Image Classification
Source: Krizhevsky et al., 2012
Figure 2.2.1

ImageNet
ImageNet is one of the most widely used benchmarks for image classification. The dataset includes over 14 million images across 20,000 different object categories, such as "strawberry" or "balloon." Performance on ImageNet is measured through various accuracy metrics. Top-1 accuracy measures the degree to which the top prediction generated by an image classification model for a given image actually matches the image's label.

As of 2022, the best image classification system on ImageNet has a top-1 accuracy rate of 91.0% (Figure 2.2.2). Although the image classification capability of state-of-the-art systems is 27.7 percentage points better than it was a decade ago, last year saw a very marginal 0.1 percentage point improvement in classification accuracy.
Table of Contents
Chapter 2 Preview
81
Artificial Intelligence
Index Report 2023
Chapter 2: Technical Performance
2.2 Computer Vision—Image
ImageNet Challenge: Top-1 Accuracy, 2012–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 91.00% with extra training data; 88.50% without extra training data.
Figure 2.2.2
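The top-1 metric described above can be sketched in a few lines. This is an illustrative NumPy snippet, not the official ImageNet evaluation code; the scores and labels are invented for the example:

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Fraction of examples whose true label is among the k
    highest-scoring predictions. k=1 gives top-1 accuracy."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # indices of the k largest scores per row
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Three images, four classes: the model's top guess is right
# for the first two images only.
scores = [[0.1, 0.7, 0.1, 0.1],
          [0.6, 0.2, 0.1, 0.1],
          [0.3, 0.1, 0.2, 0.4]]
labels = [1, 0, 2]
print(top_k_accuracy(scores, labels, k=1))  # 0.666...
```

The same function with a larger k yields top-5 accuracy, another metric commonly reported on ImageNet.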
Face Detection and Recognition
Facial detection and recognition is the ability of AI systems to identify faces or individuals in images or videos (Figure 2.2.3). Currently, many facial recognition systems are able to successfully identify close to 100% of faces, even on challenging datasets (Figure 2.2.4).

A Demonstration of Face Detection and Recognition
Source: Forbes, 2020
Figure 2.2.3
Table of Contents
Chapter 2 Preview
82
Artificial Intelligence
Index Report 2023
Chapter 2: Technical Performance
2.2 Computer Vision—Image
National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT): Verification Accuracy by Dataset, 2017–22
Source: National Institute of Standards and Technology, 2022 | Chart: 2023 AI Index Report
2022 false non-match rates (FNMR, log scale): 0.0297, WILD Photos @ FMR = 1e-5; 0.0032, BORDER Photos @ FMR = 1e-6; 0.0021, MUGSHOT Photos @ FMR = 1e-5; 0.0019, MUGSHOT Photos ≥ 12 YRS @ FMR = 1e-5; 0.0016, VISABORDER Photos @ FMR = 1e-6; 0.0006, VISA Photos @ FMR = 1e-6.
Figure 2.2.4
National Institute of Standards and Technology Face Recognition Vendor Test (FRVT)
Progress on facial recognition can be tracked through the National Institute of Standards and Technology's Face Recognition Vendor Test. This test tracks how well different facial recognition algorithms perform on various homeland security tasks, such as identification of child trafficking victims and cross-verification of visa images, among others. Facial detection capacity is measured by the false non-match rate (FNMR), otherwise known as the error rate: the rate at which a model fails to match the face in an image to that of a person.

As of 2022, the top-performing models on all of the FRVT datasets, with the exception of WILD Photos, each posted an error rate below 1%, and as low as a 0.06% error rate on the VISA Photos dataset.
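The FNMR described above can be sketched as follows. This is a minimal illustration with made-up similarity scores; in the actual FRVT evaluation, the accept threshold is fixed so that the false match rate on impostor (different-person) pairs hits a target such as FMR = 1e-5:

```python
import numpy as np

def fnmr(genuine_scores, threshold):
    """False non-match rate: the fraction of genuine (same-person)
    comparisons whose similarity score falls below the accept
    threshold, i.e., the rate at which the system fails to match
    a face to the same person."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    return float((genuine_scores < threshold).mean())

# Five same-person comparisons; one scores below the threshold.
scores = [0.91, 0.88, 0.95, 0.42, 0.90]
print(fnmr(scores, threshold=0.5))  # 0.2
```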
Deepfake Detection
The ability of AI systems to create synthetic images that are sometimes indistinguishable from real ones has led to the creation of deepfakes: images or videos that appear to be real but are actually fake. In the last year, a deepfake video of Ukrainian president Volodymyr Zelenskyy surrendering was widely circulated (Figure 2.2.5).

Real-Life Deepfake: President Zelenskyy Calling for the Surrender of Ukrainian Soldiers
Source: NPR, 2022
Figure 2.2.5

Celeb-DF
Celeb-DF is presently one of the most challenging deepfake detection benchmarks. The dataset is composed of 590 original celebrity YouTube videos that have been manipulated into thousands of deepfakes. This year's top deepfake detection algorithm on Celeb-DF came from researchers at Deakin University in Australia. Their JDFD model posted an AUC score of 78 (Figure 2.2.6).
Celeb-DF: Area Under Curve Score (AUC), 2018–22
Source: arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 78.00.
Figure 2.2.6
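The AUC metric used on Celeb-DF can be illustrated with the rank-based identity below; this is a sketch with toy scores, not the benchmark's evaluation harness:

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly
    chosen positive (here, a deepfake) is scored higher than a
    randomly chosen negative (a real video). Ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Two deepfakes (label 1) and three real videos (label 0);
# one real video outranks the weaker deepfake.
scores = [0.9, 0.4, 0.8, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.666...
```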
Human Pose Estimation
Human pose estimation is the task of estimating the position of the human body from images (Figure 2.2.7).

A Demonstration of Human Pose Estimation
Source: Cong et al., 2022
Figure 2.2.7

MPII
MPII is a dataset of over 25,000 annotated images that contains annotations of more than 40,000 people doing 410 human activities. On MPII, this year's top model, ViTPose, correctly estimated 94.3% of keypoints (human joints), which represented a small 0.2 percentage point increase over the previous state-of-the-art result, posted in 2020 (Figure 2.2.8).
MPII: Percentage of Correct Keypoints (PCK), 2014–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 94.30%.
Figure 2.2.8
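The PCK metric can be sketched as below. The coordinates and scale value are invented for illustration; MPII's standard variant, PCKh@0.5, normalizes by the person's head-segment length:

```python
import numpy as np

def pck(pred, gt, scale, alpha=0.5):
    """Percentage of Correct Keypoints: a predicted joint counts
    as correct when it lies within alpha * scale pixels of the
    ground-truth joint."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= alpha * scale).mean())

# Four joints; the last prediction is 20 pixels off and fails
# the 0.5 * 10 = 5 pixel tolerance.
pred = [[10, 10], [50, 52], [80, 80], [120, 140]]
gt   = [[11, 10], [50, 50], [81, 79], [120, 120]]
print(pck(pred, gt, scale=10, alpha=0.5))  # 0.75
```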
Semantic Segmentation
Semantic segmentation involves assigning individual image pixels to specific categories (for example, human, bicycle, or street) (Figure 2.2.9).

A Demonstration of Semantic Segmentation
Source: Cityscapes Dataset, 2022
Figure 2.2.9

Cityscapes Challenge, Pixel-Level Semantic Labeling Task
The Cityscapes dataset is used to test the semantic segmentation capabilities of AI. The dataset contains 25,000 annotated images of diverse urban environments and enables a variety of different segmentation tasks, one of the most popular being the pixel-level task. Performance on semantic segmentation is measured by mean intersection-over-union (mIoU), which represents the degree to which the image segments predicted by the model overlap with the image's actual segments. The greater the mIoU, the better a system has performed.

Performance on Cityscapes has increased by 23.4 percentage points since the competition launched in 2014; however, it has plateaued in the last few years (Figure 2.2.10).
Cityscapes Challenge, Pixel-Level Semantic Labeling Task: Mean Intersection-Over-Union (mIoU), 2014–22
Source: Cityscapes Challenge, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 86.46% with extra training data; 84.30% without extra training data.
Figure 2.2.10
Medical Image Segmentation
In medical image segmentation, AI systems segment objects such as lesions or organs in medical images (Figure 2.2.11).

A Demonstration of Medical Image Segmentation
Source: Jha et al., 2019
Figure 2.2.11

Kvasir-SEG
Kvasir-SEG is a dataset for medical image segmentation that contains 1,000 high-quality images of gastrointestinal polyps that were manually identified by medical professionals. Progress on Kvasir-SEG is measured in mean Dice, which represents the degree to which the polyp segments identified by AI systems overlap with the actual polyp segments.1 This year's top-performing model on Kvasir-SEG, SEP, was created by a Chinese researcher and posted a mean Dice of 94.1% (Figure 2.2.12).
Kvasir-SEG: Mean Dice, 2019–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 94.11%.
Figure 2.2.12
1 Mean Dice and mIoU are in principle quite similar. This StackExchange post outlines the differences in more detail.
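The similarity noted in the footnote can be made concrete: for a single pair of binary masks, Dice = 2·IoU / (1 + IoU). A small illustration with toy masks (not the Kvasir-SEG evaluation code):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return (a & b).sum() / (a | b).sum()

def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2 * (a & b).sum() / (a.sum() + b.sum())

# Two pixels overlap out of a four-pixel union.
pred = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
true = [[1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(iou(pred, true))   # 0.5
print(dice(pred, true))  # 0.666...  (= 2 * 0.5 / (1 + 0.5))
```

Because Dice weights the intersection twice, it is always at least as large as IoU for the same pair of masks.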
Object Detection
The challenge of identifying and localizing objects within an image or video is known as object detection (Figure 2.2.13).

A Demonstration of Object Detection
Source: Rizzoli, 2023
Figure 2.2.13

Common Objects in Context (COCO)
Microsoft's Common Objects in Context (COCO) object detection dataset contains over 80 object categories across 328,000 images. Several accuracy metrics are used to measure progress on COCO; this section considers mean average precision (mAP50). Since 2015, state-of-the-art detectors have improved by 26 percentage points. The top model in 2022, EVA, was the result of a Chinese academic research collaboration.
COCO: Mean Average Precision (mAP50), 2015–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 81.90%.
Figure 2.2.14
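Under the hood, mAP50 rests on box-level IoU: a detection counts as a true positive only when its IoU with a same-class ground-truth box is at least 0.5, and average precision is then computed over the ranked detections. A minimal sketch of the IoU test, with invented boxes:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted half a box-width from the ground truth
# overlaps too little to count at the 0.5 threshold.
pred = (0, 0, 10, 10)
gt = (5, 0, 15, 10)
print(box_iou(pred, gt))         # 0.333...
print(box_iou(pred, gt) >= 0.5)  # False: not a match at mAP50
```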
Image Generation
Image generation is the task of generating images that are indistinguishable from real ones. In the last decade, progress on image generation has increased tremendously, so much so that it would now be difficult for the average person to distinguish a real human face from one synthetically generated by AI (Figure 2.2.15).

Which Face Is Real?
Source: Which Face Is Real?, 2022
Figure 2.2.15

CIFAR-10 and STL-10
CIFAR-10 and STL-10 are two popular benchmarks for tracking progress on image generation. CIFAR-10 comprises 60,000 color images across 10 different object classes; STL-10 is inspired by CIFAR-10, with some modifications, including fewer labeled training examples and more unlabeled examples. Progress on both benchmarks is measured by the Fréchet Inception Distance (FID) score, which reflects the degree to which a synthetically generated set of images is similar to the real images on which the generator was trained; lower scores are better.

This year saw state-of-the-art results on both the CIFAR-10 and STL-10 benchmarks (Figure 2.2.16). The top model on CIFAR-10, EDM-G++, came from Korean researchers at KAIST. The top model on STL-10 was Diffusion-GAN, a collaboration between researchers at the University of Texas at Austin and Microsoft.
CIFAR-10 and STL-10: Fréchet Inception Distance (FID) Score, 2017–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 1.77 on CIFAR-10; 6.91 on STL-10 (lower is better).
Figure 2.2.16
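The FID score can be sketched from its closed form for Gaussians fitted to feature activations. This illustrative snippet uses random feature vectors in place of real Inception-v3 activations, so the specific numbers are not meaningful; the structure of the computation is the point:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """Fréchet Inception Distance between two sets of feature
    vectors: ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2(C_r C_f)^(1/2))."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    c_r = np.cov(feats_real, rowvar=False)
    c_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c_r @ c_f)
    if np.iscomplexobj(covmean):  # drop tiny numerical imaginary parts
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(c_r + c_f - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
fake = rng.normal(0.5, 1.0, size=(500, 8))
print(fid(real, real))  # ~0: a set is at zero distance from itself
print(fid(real, fake))  # larger: the mean shift raises the distance
```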
Narrative Highlight:
A Closer Look at Progress in Image Generation
Figure 2.2.17 tracks the progress of facial image generation over time, with the final image generated by Diffusion-GAN, the model that posted the 2022 state-of-the-art score on STL-10.

GAN Progress on Face Generation (2014, 2015, 2016, 2017, 2018, 2020, 2021, 2022)
Source: Goodfellow et al., 2014; Radford et al., 2016; Liu and Tuzel, 2016; Karras et al., 2018; Karras et al., 2019; Goodfellow, 2019; Karras et al., 2020; Vahdat et al., 2021; Wang et al., 2022.
Figure 2.2.17
In the last year, text-to-image generation broke into the public consciousness with the release of models such as OpenAI's DALL-E 2, Stability AI's Stable Diffusion, Midjourney's Midjourney, Meta's Make-A-Scene, and Google's Imagen. With these systems, users can generate images based on a text prompt. Figure 2.2.18 juxtaposes the images generated by DALL-E 2, Stable Diffusion, and Midjourney, three publicly accessible AI text-to-image systems, for the same prompt: "a panda playing a piano on a warm evening in Paris."

Images Generated by (a) DALL-E 2, (b) Stable Diffusion, and (c) Midjourney
Source: AI Index, 2022
Figure 2.2.18
Narrative Highlight:
A Closer Look at Progress in Image Generation (cont’d)
Of all the recently released text-to-image generators, Google's Imagen performs best on the COCO benchmark (Figure 2.2.19).2 This year, the Google researchers who created Imagen also released a more difficult text-to-image benchmark, DrawBench, designed to challenge increasingly capable text-to-image models.
Notable Text-to-Image Models on MS-COCO 256 × 256 FID-30K: Fréchet Inception Distance (FID) Score
Source: Saharia et al., 2022 | Chart: 2023 AI Index Report
Models shown (2017–22), split by whether they were trained on COCO-FID: AttnGAN, DM-GAN, DF-GAN, DM-GAN + CL, XMC-GAN, LAFITE, DALL-E, GLIDE, Make-A-Scene, DALL-E 2, and Imagen. Scores range from 35.49 (AttnGAN, 2017) down to 7.27 (Imagen, 2022), the lowest (best) FID of the group.
Figure 2.2.19
2 The COCO benchmark, first launched in 2014, includes 328,000 images with 2.5 million labeled instances. Although it is typically used for object detection tasks, researchers
have also deployed it for image generation.
Visual Reasoning
Visual reasoning tests how well AI systems can reason across both textual and visual data, as in the examples of Figure 2.2.20.

A Collection of Visual Reasoning Tasks
Source: Agrawal et al., 2016
Figure 2.2.20
Visual Question Answering (VQA) Challenge
The Visual Question Answering Challenge tests AI
systems with open-ended textual questions about
images. Successfully answering the questions
requires that AI systems possess vision, language, and
commonsense reasoning capabilities. This section
reports progress on the VQA V2 dataset.
This year the top-performing model on VQA V2
was PaLI, a multimodal model produced by Google
researchers (Figure 2.2.21).
Visual Question Answering (VQA) V2 Test-Dev: Accuracy, 2016–22
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 84.30%; human baseline: 80.78%.
Figure 2.2.21
Narrative Highlight:
The Rise of Capable Multimodal Reasoning Systems
Traditionally AI has been strong in narrow tasks, but it has been unable to easily generalize across multiple domains. For instance, many image classifiers are adept at classifying images but are incapable of understanding written text. However, recent technical progress in AI has begun to challenge this notion. In 2022, several models were introduced, for example BEiT-3 from Microsoft and PaLI from Google, that posted state-of-the-art results across a variety of both vision and language benchmarks. For example, at the time of publication of the BEiT-3 paper, BEiT-3 posted state-of-the-art results for four different vision skills and five different vision-language skills (Figure 2.2.22).
BEiT-3 Vs. Previous State-of-the-Art Models
Source: Wang et al., 2022 | Table: 2023 AI Index Report

Category | Task | Dataset | Metric | Previous SOTA | Model of Previous SOTA | BEiT-3 | Scale of Improvement
Vision | Semantic Segmentation | ADE20K | mIoU | 61.40 | FD-SwinV2 | 62.80 | 2.28%
Vision | Object Detection | COCO | AP | 63.30 | DINO | 63.70 | 0.63%
Vision | Instance Segmentation | COCO | AP | 54.70 | Mask DINO | 54.80 | 0.18%
Vision | Image Classification | ImageNet | Top-1 Accuracy | 89.00 | FD-CLIP | 89.60 | 0.67%
Vision-Language | Visual Reasoning | NLVR2 | Accuracy | 87.00 | CoCa | 92.60 | 6.44%
Vision-Language | Visual QA | VQAv2 | VQA Accuracy | 82.30 | CoCa | 84.00 | 2.07%
Vision-Language | Image Captioning | COCO | CIDEr | 145.30 | OFA | 147.60 | 1.58%
Vision-Language | Finetuned Retrieval | COCO & Flickr30K | R@1 | 72.50 | Florence | 76.00 | 4.83%
Vision-Language | Zero-Shot Retrieval | Flickr30K | R@1 | 86.50 | CoCa | 88.20 | 1.97%
Figure 2.2.22
Narrative Highlight:
The Rise of Capable Multimodal Reasoning Systems (cont’d)
Figure 2.2.23 shows some of the different vision-language tasks challenging multimodal systems like
PaLI and BEiT-3.
A Collection of Vision-Language Tasks
Source: Chen et al., 2022
Figure 2.2.23
Visual Commonsense Reasoning (VCR)
The Visual Commonsense Reasoning challenge, first launched in 2019, is a relatively new benchmark in which AI systems must answer questions presented from images, as in VQA, but also select the reasoning behind their answer choices. Figure 2.2.24 shows an example of a question posed in VCR. Performance on VCR is tracked in the Q->AR score, which combines the ability of machines to select the right answer for the question (Q->A) and the ability to select the correct rationale behind the answer (Q->R).

A Sample Question from the Visual Commonsense Reasoning (VCR) Challenge
Source: Zellers et al., 2018
Figure 2.2.24

VCR is one of the few visual benchmarks considered in this report on which AI systems have yet to surpass human performance, as shown in Figure 2.2.25.
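The Q->AR combination can be sketched as follows. The data here is invented; the VCR leaderboard computes this over the benchmark's multiple-choice items:

```python
def q_ar_score(answer_correct, rationale_correct):
    """Q->AR: an example counts only when the model picks both the
    right answer (Q->A) and the right rationale (Q->R)."""
    both = [a and r for a, r in zip(answer_correct, rationale_correct)]
    return sum(both) / len(both)

# Five questions: the model answers four correctly, but only
# three of those also come with the correct rationale.
answers    = [True, True, False, True, True]
rationales = [True, False, True, True, True]
print(q_ar_score(answers, rationales))  # 0.6
```

Because both choices must be right at once, Q->AR is never higher than either Q->A or Q->R alone.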
Visual Commonsense Reasoning (VCR) Task: Q->AR Score, 2018–22
Source: VCR Leaderboard, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 75.60; human baseline: 85.00.
Figure 2.2.25
2.3 Computer Vision—Video
Video analysis concerns reasoning or task operation across videos, rather than single images.
Activity Recognition
Activity recognition is the categorization of activities that occur in videos. Certain activities, such as sitting, sleeping, or walking, are easier for AI systems to categorize than others that involve multiple steps, such as preparing dinner.

Kinetics-400, Kinetics-600, Kinetics-700
Kinetics-400, Kinetics-600, and Kinetics-700 are a series of datasets for benchmarking video activity recognition. Each dataset includes 650,000 large-scale, high-quality video clips from YouTube that display a wide range of human activities, and each asks AI systems to classify an action from a possible set of 400, 600, and 700 categories, respectively (Figure 2.3.1).

Example Classes From the Kinetics Dataset
Source: Kay et al., 2017
Figure 2.3.1
As of 2022, there is a 7.8 percentage point gap in performance between the top system on Kinetics-600 and
Kinetics-700, which suggests the 700 series dataset is still a meaningful challenge for video computer vision
researchers (Figure 2.3.2).
Kinetics-400, Kinetics-600, Kinetics-700: Top-1 Accuracy, 2016–22
Source: Papers With Code, 2021; arXiv, 2022 | Chart: 2023 AI Index Report
2022 state of the art: 91.80% (Kinetics-600); 91.10% (Kinetics-400); 84.00% (Kinetics-700).
Figure 2.3.2
Narrative Highlight:
A Closer Look at the Progress of Video Generation
Multiple high-quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022.3 In May, researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, a model that posted the then-highest inception score on the UCF-101 benchmark for text-to-video generation (Figure 2.3.3). In September 2022, CogVideo's top score was significantly surpassed by Meta's Make-A-Video model, which performed 63.6% better on UCF-101 than CogVideo. And in October 2022, Google released a text-to-video system called Phenaki; however, this model was not benchmarked on UCF-101.
Notable Text-to-Video Models on UCF-101: Inception Score (IS)
Source: Hong et al., 2022; Singer et al., 2022 | Chart: 2023 AI Index Report
[Chart: inception scores for DVD-GAN (2019); TGANv2 (2020); VideoGPT and MoCoGAN-HD (2021); and DIGAN, TATS-base, CogVideo (50.46), and Make-A-Video (82.55) (2022)]
Figure 2.3.3
3 Although these models are impressive, it is worth noting that they are thus far only capable of generating videos of a few seconds’ duration.
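The inception score used to rank these text-to-video models rewards generated samples whose class predictions are individually confident but collectively diverse: IS = exp(mean over samples of KL(p(y|x) ‖ p(y))). A rough sketch of the computation, using invented classifier probabilities in place of a real pretrained (video) classifier:

```python
import numpy as np

def inception_score(class_probs, eps=1e-12):
    """IS = exp( mean_x KL( p(y|x) || p(y) ) ).

    `class_probs` is an (n_samples, n_classes) array of classifier
    probabilities for each generated sample; p(y) is their marginal.
    """
    p_y = class_probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (class_probs * (np.log(class_probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

rng = np.random.default_rng(0)
# Confident, diverse predictions score high; uniform predictions score ~1.
confident = np.eye(10)[rng.integers(0, 10, 100)] * 0.99 + 0.001
uniform = np.full((100, 10), 0.1)
print(inception_score(confident) > inception_score(uniform))  # True
```

Higher is better; a generator whose classifier outputs are uniform scores exactly 1.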
2.4 Language
Natural language processing (NLP) is the ability of computer systems to understand text. The last few years have seen the release of
increasingly capable “large language models,” AI systems like PaLM, GPT-3, and GLM-130B, that are trained on massive amounts of data
and adaptable to a wide range of downstream tasks.
In this section, progress in NLP is tracked across the following skill categories: (1) English language understanding, (2) text summarization,
(3) natural language inference, (4) sentiment analysis, (5) multitask language understanding, and (6) machine translation.
English Language Understanding
English language understanding challenges AI systems to understand the English language in various ways: reading comprehension, yes/no reading comprehension, commonsense reading comprehension, and logical reasoning.
SuperGLUE
SuperGLUE is a comprehensive English language understanding benchmark that tracks the progress of AI models on eight different linguistic tasks. A selection of these tasks is highlighted in Figure 2.4.1. Their performance is then aggregated into a single metric.
A Set of SuperGLUE Tasks4
Source: Wang et al., 2019
Figure 2.4.1
4 For the sake of brevity, this figure only displays four of the eight tasks.
This year’s top model on SuperGLUE, Vega, registered a new state-of-the-art score of 91.3, which is 1.5
percentage points higher than the human baseline. Performance on SuperGLUE is continuing to saturate.
SuperGLUE: Score
Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report
[Chart: SuperGLUE scores from 2019 to 2022, rising to 91.30, above the 89.80 human baseline]
Figure 2.4.2
Reading Comprehension Dataset Requiring Logical Reasoning (ReClor)
In response to the saturation of traditional reading comprehension benchmarks, researchers from the National University of Singapore launched ReClor in 2020. ReClor, or Reading Comprehension Dataset Requiring Logical Reasoning, is a dataset of logical reasoning questions taken from the LSAT, the entrance exam for law schools in the United States and Canada. A sample question is shown in Figure 2.4.3.
A Sample Question from the Reading Comprehension Dataset
Requiring Logical Reasoning (ReClor)
Source: Yu et al., 2020
Context: When a certain gland becomes cancerous in humans, it produces high levels
of a particular protein. A blood test can determine the level of this protein well before
a cancer of the gland could be detected by other means. Some doctors recommend
that aggressive anticancer treatment should be begun as early as possible for anyone
who is tested and is found to have high levels of the protein.
Question: Which one of the following, if true, most seriously weakens the doctors’
recommendation?
A. The blood test for the protein has been in use for some time to monitor the
condition of patients who have been diagnosed as having cancer of the gland.
B. Before the blood test became available, about one-third of all cases of cancer of
the gland were detected in early stages.
C. So far, no patients whose protein levels were found to be normal have
subsequently developed cancer of the gland.
D. Enlargement of the gland, a common condition infrequently associated with
cancer, results in high levels of the protein.
Figure 2.4.3
Figure 2.4.4 examines progress on ReClor. The top 2022 result of 80.6% represented an 18 percentage point
improvement from 2020, the year the benchmark was released.
Reading Comprehension Dataset Requiring Logical Reasoning (ReClor): Accuracy
Source: ReClor Leaderboard, 2022; Papers With Code, 2022 | Chart: 2023 AI Index Report
[Chart: accuracy from 2020 to 2022, reaching 80.60% in 2022]
Figure 2.4.4
Narrative Highlight:
Just How Much Better Have Language Models Become?
The AI Index tested how three large language models from three different years, GPT-2 (2019), GPT-3
(2020), and ChatGPT (2022), handle the same prompt: “Explain to me the major accomplishments of
Theodore Roosevelt’s presidency.” More recent models are able to answer this question more effectively,
both in terms of factual accuracy and quality of writing.
GPT-2 (February 2019)5
(The most notable is the establishment of a national telephone and mail exchange company.)
But the only things that will change with a president who knows how to operate a business is his understanding of the American system of government. The public knows all about our problems, yet at least half of the people who need it won’t know anything about ours. We can no longer have this. Our system of government is the only one in which you really need a public relations firm to work hard.
The system of governmental control, while working, is not always effective in ensuring that the very things that our economy needs, do not always be at the top of everybody’s list. So I wanted to know how successful the system you employ for managing the public’s needs is in helping the American economy grow, how efficient the service our economy provides is being made, and how it can better serve the needs of a growing number of Americans.
I wanted to know.
This was before I ever hired my own lawyer to work on the case. This was before I ever hired my own law firm to handle government investigations. For my first major job, at least, I had to convince two top aides to get paid $90,000 for the role, and before I even started on the case, I had to convince one top staffer to get paid even $100,000 to have an interview with my lead prosecutor...6
Commentary: Most of this answer is gibberish. GPT-2 is able to respond with sentences that are largely grammatical, but fails to actually answer the question in any meaningful way.
GPT-3 (June 2020)7
Theodore Roosevelt’s presidency was marked by a number of accomplishments. He is credited with leading the United States out of the Great Depression and into World War II. He also spearheaded the creation of the National Park Service and the National Wildlife Federation. Roosevelt also played a major role in the development of the Panama Canal.
Commentary: This answer is an improvement over that of GPT-2 in that it actually somewhat responds to the question, but still shows significant flaws. GPT-3 is correct in asserting that Theodore Roosevelt was instrumentally involved in the development of the Panama Canal. However, he did not lead the United States out of the Great Depression and into World War II—that was done by a different American president, Franklin Delano Roosevelt. It appears that GPT-3 confused some of Franklin Delano Roosevelt’s accomplishments with those of Theodore Roosevelt. Finally, although Theodore Roosevelt did play a significant role in establishing many national parks, he did not in fact create the National Park Service or the National Wildlife Federation. The National Park Service was created in 1916 by Woodrow Wilson, and the National Wildlife Federation in 1936, much later than Theodore Roosevelt’s presidency.8
ChatGPT (November 2022)
Theodore Roosevelt, President from 1901-1909, accomplished the following major accomplishments during his presidency:
1. Conservation: Established numerous national parks, monuments, and forests, including the Grand Canyon National Park.
2. Trust-busting: Took on monopolies, such as Standard Oil, through antitrust laws and broke up large corporations.
3. Labor Reforms: Improved working conditions for Americans through advocacy for better hours, wages, and workplace safety.
4. Progressive Reforms: Promoted progressive policies, including regulation of big business and protection of natural resources.
5. National Infrastructure: Improved national infrastructure through construction of the Panama Canal and funding for dam projects.
6. Diplomacy: Negotiated the end of the Russo-Japanese War, resulting in his being awarded the Nobel Peace Prize.
7. Foreign Policy: Established the Roosevelt Corollary, asserting U.S. intervention in Latin America to maintain stability and protect American interests.
Commentary: ChatGPT’s response is the strongest of the three. All but one of the facts about Theodore Roosevelt’s presidency are correct. Roosevelt proclaimed the Grand Canyon Game Preserve in 1906 and designated it a national monument in 1908. However, the Grand Canyon National Park was itself created by Woodrow Wilson.9 Otherwise the response is factually accurate. The response is also more thorough and concise than the answers provided by GPT-2 and GPT-3.
5 GPT-2 used the 124M parameter model downloaded from OpenAI’s GitHub page.
6 The complete answer outputted by GPT-2 is trimmed here for brevity. The full answer is included in the Appendix.
7 The specific GPT-3 model that was used was text-curie-001, which has training data up to October 2019.
8 The information in this section has been cross-verified with the Encyclopedia Britannica entries on Theodore Roosevelt, Franklin Delano Roosevelt, Woodrow Wilson, and the
National Park Service, as well as the history page of the National Wildlife Federation.
9 Information on the history of the Grand Canyon National Park was cross-verified with the Wikipedia entry on the Grand Canyon National Park.
Narrative Highlight:
Planning and Reasoning in Large Language Models
As illustrated above, AI systems have become increasingly strong on a wide range of reasoning tasks. This improvement has led many to claim that emerging AI systems, especially large language models, possess reasoning abilities that are somewhat similar to those possessed by humans.10 Other authors, however, have argued otherwise.11
In 2022, researchers (Valmeekam et al., 2022) introduced a more challenging planning and reasoning test for large language models that consists of seven assignments: (1) plan generation, (2) cost-optimal planning, (3) reasoning about plan execution, (4) robustness to goal reformulation, (5) ability to reuse plans, (6) replanning, and (7) plan generalization.12
The authors then tested notable language models on these tasks in a Blocksworld problem domain, a problem environment where agents are given blocks of different colors and tasked with arranging these blocks in particular orders. The authors demonstrated that these large language models performed fairly ineffectively (Figure 2.4.5). While GPT-3, Instruct-GPT3, and BLOOM demonstrated the ability, in some contexts, to reformulate goals in robust ways, they struggled with other tasks like plan generation, optimal planning, and plan reuse. Compared to humans, the large language models performed much worse, suggesting that while they are capable in certain respects, they lack human-level reasoning capabilities.
Select Large Language Models on the Blocksworld Domain: Instances Correct
Source: Valmeekam et al., 2022 | Chart: 2023 AI Index Report
[Chart: percentage of instances correct for GPT-3, Instruct-GPT3, and BLOOM on plan generation, optimal planning, replanning, plan generalization, plan reuse, and robustness to goal reformulation (shuffling goal predicates, full → partial, and partial → full); the models score in the single digits on most planning tasks but as high as 77.4% on goal reformulation]
10 Some of the papers that claim language models can reason include: Kojima et al., 2022; Chowdhery et al., 2022; Li et al., 2021; Wei et al., 2022.
11 Valmeekam et al., 2022 advances this claim.
12 A complete description of these tasks can be found in the paper.
Figure 2.4.5
Text Summarization
Text summarization tests how well AI systems can synthesize a piece of text while capturing its core content. Text summarization performance is judged on ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which measures the degree to which an AI-produced text summary aligns with a human reference summary.
arXiv and PubMed
ArXiv and PubMed are two widely used datasets for benchmarking text summarization. The model that posted the state-of-the-art score in 2022 on both arXiv and PubMed, AdaPool, was developed by a team from Salesforce Research (Figure 2.4.6).
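As a rough illustration of how ROUGE works, the sketch below computes ROUGE-1 (unigram overlap) for a toy summary pair; the official ROUGE toolkit adds options such as stemming and multiple references that are omitted here:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1: unigram overlap between a model summary and a reference.

    Returns (recall, precision, F1). Bare-bones illustration only.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())            # clipped unigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

r, p, f1 = rouge_1("the model summarizes text",
                   "the model summarizes scientific text well")
print(round(r, 3))  # 4 of 6 reference unigrams matched -> 0.667
```

The charts above report ROUGE-1; ROUGE-2 and ROUGE-L apply the same idea to bigrams and longest common subsequences.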
ArXiv and PubMed: ROUGE-1
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
[Chart: ROUGE-1 scores from 2017 to 2022, reaching 51.05 on PubMed and 50.95 on arXiv]
Figure 2.4.6
Natural Language Inference
Also known as textual entailment, natural language inference is the ability of AI systems to determine whether a hypothesis is true, false, or undetermined based on presented premises.
Abductive Natural Language Inference (aNLI)
Abductive natural language inference is a form of natural language inference in which plausible conclusions must be drawn from a set of limited and uncertain premises. Imagine, for example, that Peter returns to his car after dinner at a restaurant to find the window shattered and his laptop, which he left in the back seat, missing. He might immediately conclude that a thief broke into his car and stole the laptop.
In 2019, the Allen Institute for AI launched aNLI, a comprehensive benchmark for abductive natural language inference that includes 170,000 premise and hypothesis pairs (Figure 2.4.7).
Sample Question From the Abductive Natural Language Inference Benchmark (aNLI)
Source: Allen Institute for AI, 2021
Figure 2.4.7
Abductive natural language inference is a challenging task. The human baseline remained
unsurpassed until 2022, when an AI system registered a score of 93.7% (Figure 2.4.8).
Abductive Natural Language Inference (aNLI): Accuracy
Source: Allen Institute for AI, 2022 | Chart: 2023 AI Index Report
[Chart: accuracy from 2019 to 2022, reaching 93.65% in 2022, above the 92.90% human baseline]
Figure 2.4.8
Sentiment Analysis
Sentiment analysis applies NLP techniques to identify the sentiment of a particular text. It is used by many businesses to better understand customer reviews.
SST-5 Fine-Grained Classification
The Stanford Sentiment Treebank (SST) is a dataset of 11,855 single sentences taken from movie reviews that are then transformed into 215,154 unique phrases whose sentiments have been annotated by human judges (Figure 2.4.9).
A Sample Sentence from SST
Source: Socher et al., 2013
Figure 2.4.9
A new state-of-the-art score of 59.8% was posted on SST-5 fine-grained classification by the Heinsen Routing + RoBERTa Large model (Figure 2.4.10).
SST-5 Fine-Grained: Accuracy
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
[Chart: accuracy from 2013 to 2022, reaching 59.80% in 2022]
Figure 2.4.10
Multitask Language Understanding
A common criticism of language benchmarks such as GLUE and SuperGLUE is that they do not accurately test how capable language models are at applying the knowledge they learn across different domains.13 Multitask language understanding tests the ability of language models to reason across specialized subject domains.
Massive Multitask Language Understanding (MMLU)
Massive Multitask Language Understanding (MMLU) evaluates models in zero-shot or few-shot settings across 57 diverse subjects in the humanities, STEM, and the social sciences (Figure 2.4.11).
Sample Questions From MMLU
Source: Hendrycks et al., 2021
a) Sample Math Questions
b) A Sample Microeconomics Question
Figure 2.4.11
Gopher, Chinchilla, and variants of PaLM have each posted state-of-the-art results on MMLU. The current top
result on MMLU comes from Flan-PaLM, a Google model that reports an average score of 75.2% (Figure 2.4.12).
MMLU: Average Weighted Accuracy
Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
[Chart: average weighted accuracy from 2019 to 2022, reaching 75.20% in 2022]
Figure 2.4.12
13 This criticism is more formally articulated in Hendrycks et al., 2021.
Machine Translation (MT)
Machine translation studies how well AI software can translate languages. In the last five years, machine translation has been dominated by neural networks, which power current tools like DeepL and Google Translate.
Number of Commercially Available MT Systems
The popularity of AI-based machine translation is manifested in the number of commercial machine translation services on the market. Since 2017, the total number of independent machine translation services has increased sixfold (Figure 2.4.13).
Number of Independent Machine Translation Services
Source: Intento, 2022 | Chart: 2023 AI Index Report
[Chart: counts of commercial and open-source pre-trained machine translation services at intervals from May 2017 to July 2022]
Figure 2.4.13
2.5 Speech
AI systems that work with human speech are usually tasked with converting spoken words into text and recognizing the individuals speaking.
Speech Recognition
Speech recognition is the ability of AI systems to identify spoken words and convert them into text. Speech recognition has progressed so much so that nowadays many computer programs or texting apps are equipped with dictation devices that can seamlessly transcribe speech into writing.
VoxCeleb
VoxCeleb is a large-scale audiovisual dataset of human speech for speaker recognition, which is the task of matching certain speech with a particular individual. Over the years, the VoxCeleb dataset has been expanded; however, the data in this subsection tracks progress on the original dataset.
This year’s top result on the original VoxCeleb dataset was posted by American researchers, whose model achieved an equal error rate of 0.1%, which represents a 0.28 percentage point decrease from the state-of-the-art result achieved by Chinese researchers in the previous year (Figure 2.5.1).
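Equal error rate, the VoxCeleb metric, is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). A minimal sketch that scans score thresholds over invented verification trials:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Find the threshold where false acceptance rate (FAR) and
    false rejection rate (FRR) are closest, and return their mean.

    `scores` are similarity scores; `labels` are 1 for same-speaker
    trials and 0 for different-speaker trials.
    """
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    best_gap, eer = 1.0, None
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)   # impostors accepted
        frr = np.mean(scores[labels == 1] < t)    # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical trials: higher score = more likely the same speaker.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   1,    0,   1,    0,   0,   0]
print(equal_error_rate(scores, labels))  # 0.25: one error of each kind
```

Lower is better; a perfect verifier, whose genuine and impostor scores never overlap, has an EER of 0.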
VoxCeleb: Equal Error Rate (EER)
Source: VoxCeleb, 2022 | Chart: 2023 AI Index Report
[Chart: equal error rate from 2017 to 2022, falling to 0.14% in 2022]
Figure 2.5.1
Narrative Highlight:
Whisper
One of the major themes in the last few years of AI progress has been the emergence of large language
models that are trained on massive amounts of data and capable of executing a diverse range of tasks.
In 2022, this idea of training on large data to achieve cross-domain performance arrived in the world of
speech recognition with OpenAI’s launch of Whisper.
Whisper is a large-scale speech recognition model that was trained in a weakly supervised way
on 680,000 hours of audio data. Whisper was capable of strong, although not state-of-the-art,
performance on many speech recognition tasks in zero-shot settings.14 Whisper outperformed wav2vec
2.0 Large, another speech recognition model, across a wide range of popular English speech recognition
benchmarks (Figure 2.5.2). Similarly, Whisper proved to be a better speech translator than many other
leading AI translator models (Figure 2.5.3). Whisper also outperformed other commercial automated
speech recognition systems and scored similarly to top human transcription services (Figure 2.5.4).15
Despite this impressive performance, there were still some speech tasks, like language identification, on
which Whisper trailed state-of-the-art models (Figure 2.5.5).
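The speech recognition comparisons below report word error rate (WER): the word-level edit distance between a system transcript and a reference transcript, divided by the reference length. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(round(wer, 3))  # 1 substitution / 6 reference words
```

Production scoring pipelines also normalize text (casing, punctuation, number formats) before computing the distance, which this sketch omits.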
wav2vec 2.0 Large (No LM) vs. Whisper Large V2 Across Datasets
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
[Chart: word error rates for wav2vec 2.0 Large (No LM) and Whisper Large V2 across speech recognition datasets including LibriSpeech Clean and Other, WSJ, CallHome, Switchboard, AMI IHM, AMI SDM1, CORAAL, CHiME-6, TED-LIUM, MAESTRO, VoxPopuli En, Common Voice, Artie, and FLEURS En; Whisper posts the lower error rate on each dataset shown]
Figure 2.5.2
Notable Models on X→EN Subset of CoVoST 2
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
[Chart: BLEU scores for Zero-Shot Whisper (29.1), mSLAM-CTC (2B), XLS-R (2B), and XMEF-X; Whisper posts the highest score]
Figure 2.5.3
14 Zero-shot learning refers to the ability of an AI system to learn a particular task without being trained on that task.
15 Kincaid46 is a dataset of 46 audio files and transcripts that were published in the blog post, “Which automatic transcription service is the most accurate?—2018.”
Narrative Highlight:
Whisper (cont’d)
Notable Speech Transcription Services on Kincaid46
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
[Chart: median word error rates for Whisper, human transcription services, and commercial ASR and computer-assisted services (Companies A–I), ranging from roughly 7.61% to 12.20%; Whisper scores similarly to the human transcription services]
Figure 2.5.4
Notable Models on FLEURS: Language Identification Accuracy
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
[Chart: language identification accuracy for w2v-bert-51 (0.6B) at 71.4%, mSLAM-CTC (2B) at 77.7%, and Zero-Shot Whisper at 64.5%]
Figure 2.5.5
Whisper represents a breakthrough in state-of-the-art speech recognition systems. Traditionally, such systems were either pre-trained using supervised learning methods or pre-trained without supervision but required fine-tuning. Acquisition of data for supervised pre-training is time-consuming and costly. However, pre-training without supervision still requires further algorithmic specification to realize a desired objective like speech recognition. Algorithmic specification itself often requires a skilled practitioner. Whisper resolves these issues by demonstrating that a speech recognition system can perform well across a diverse range of tasks when trained on massive amounts of weakly labeled speech data.
2.6 Reinforcement Learning
In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning from their prior
actions. Systems are rewarded if they achieve a desired goal and punished if they fail.
Reinforcement Learning Environments
Reinforcement learning agents require environments, not datasets, to train: They must be trained in environments where they can experiment with various actions that will allow them to identify optimal game strategies.
Procgen
Procgen is a reinforcement learning environment introduced by OpenAI in 2019. It includes 16 procedurally generated video-game-like environments specifically designed to test the ability of reinforcement learning agents to learn generalizable skills (Figure 2.6.1). Performance on Procgen is measured in terms of mean-normalized score. Researchers typically train their systems for 200 million training steps and report an average score across the 16 Procgen games. The higher the system scores, the better the system.
The Different Environments in Procgen
Source: OpenAI, 2019
Figure 2.6.1
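The mean-normalized score can be illustrated as follows: each game's raw return is min-max normalized against per-game score bounds, and the normalized values are averaged. The returns and bounds below are invented for illustration, and only three of the 16 games are shown:

```python
def mean_normalized_score(raw_scores, bounds):
    """Average of per-game min-max normalized scores:
    (score - min) / (max - min), averaged over all games."""
    normalized = [
        (raw_scores[game] - lo) / (hi - lo)
        for game, (lo, hi) in bounds.items()
    ]
    return sum(normalized) / len(normalized)

# Hypothetical raw returns and (min, max) bounds for three Procgen games.
raw = {"coinrun": 8.0, "starpilot": 30.0, "maze": 5.0}
bounds = {"coinrun": (5.0, 10.0), "starpilot": (2.5, 64.0), "maze": (5.0, 10.0)}
print(round(mean_normalized_score(raw, bounds), 3))  # 0.349
```

Normalizing per game keeps any single game's raw score scale from dominating the average.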
A team of industry and academic researchers from Korea posted the top score of 0.6 on Procgen in 2022 (Figure 2.6.2).
Procgen: Mean of Min-Max Normalized Score
Source: arXiv, 2022 | Chart: 2023 AI Index Report
[Chart: mean of min-max normalized scores from 2019 to 2022, reaching 0.57 in 2022]
Figure 2.6.2
Narrative Highlight:
Benchmark Saturation
An emerging theme in this year’s AI Index is the observed performance saturation across many popular
technical performance benchmarks. Last year’s AI Index Report observed a similar trend; however,
benchmark saturation has been particularly pronounced this year. Figure 2.6.3 shows the relative
improvement since the benchmark first launched (overall improvement) and relative improvement within
the last year (YoY improvement) on AI technical benchmarks considered in this year’s AI Index. The
improvements are reported as percent changes.
For all but 7 of the benchmarks, the year-over-year improvement registered is less than 5%. The median improvement within the last year is 4%, while the median improvement since launch is 42.4%.16 Moreover, this year the AI Index elected not to feature traditionally popular benchmarks like SQuAD1.1 and SQuAD2.0, as no new state-of-the-art results were posted. Further, the speed at which benchmark saturation is being reached is increasing. Researchers have responded to this increasing saturation by launching newer and more comprehensive benchmarking suites such as BIG-bench and HELM.
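The overall and YoY figures are ordinary relative percent changes. A small sketch with invented benchmark scores (62.0 at launch, 88.0 last year, 91.5 this year):

```python
def relative_improvement(old, new):
    """Percent change from an earlier score to a later one."""
    return (new - old) / old * 100

overall = relative_improvement(62.0, 91.5)   # since launch
yoy = relative_improvement(88.0, 91.5)       # within the last year
print(round(overall, 1), round(yoy, 1))      # 47.6 4.0
```

Because each value is relative to a different baseline, these percentages are not directly comparable across benchmarks, which is the caveat footnote 16 makes about Figure 2.6.3.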
Improvement Over Time on Select AI Index Technical Performance Benchmarks
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Chart: overall and year-over-year percent improvement for benchmarks spanning language (MMLU, SST-5, aNLI, PubMed, arXiv, ReClor, SuperGLUE), computer vision—video (Kinetics-700, Kinetics-600, Kinetics-400), computer vision—image (COCO, VQA, VCR, CIFAR-10, STL-10, Kvasir-SEG, Cityscapes, MPII, Celeb-DF, FVRT, ImageNet Top-1), speech recognition (VoxCeleb), and reinforcement learning (Procgen)]
Figure 2.6.3
16 The improvements reviewed in this section are reported as relative change. Figure 2.6.3 should therefore not be used to conduct comparisons of improvements across
benchmarks, as each benchmark has different parameters.
2.7 Hardware
Deep learning AI algorithms are trained on GPUs or TPUs, which accelerate the training speed of AI systems. As AI systems process
ever-larger datasets, it is crucial to monitor advancements in hardware capabilities.
MLPerf Training
MLPerf is an AI training competition run by the ML Commons organization. In this challenge, participants train ML systems to execute various tasks using a common architecture. Entrants are then ranked on their absolute wall clock time, which is how long it takes for the system to train.
Last year, the AI Index observed that since the competition launched, training times for virtually every AI skill category had significantly decreased. This year, this trend has continued, albeit at a slightly slower pace. Record-low training times were posted in the object detection, speech recognition, image segmentation, recommendation, image classification, and language processing categories (Figure 2.7.1). In categories like image classification and object detection, the top AI systems can now train roughly 32 times quicker than in 2018, when the competition first launched.
MLPerf Training Time of Top Systems by Task: Minutes
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart, log scale: 2022 training times of 2.25 minutes for object detection (heavyweight), 2.15 for speech recognition, 1.22 for image segmentation, 0.52 for recommendation, 0.34 for object detection (lightweight), 0.19 for image classification, and 0.18 for language processing]
Figure 2.7.1
Data on the number of accelerators used by the hardware systems submitted to MLPerf also suggests that stronger hardware has been powering decreasing training times (Figure 2.7.2). Since the start of the MLPerf competition, the gap has grown between the mean number of accelerators used by all entrants and the average accelerators used by the systems that post the top results.17 This gap suggests that having better hardware is essential to training the fastest systems.
MLPerf Hardware: Accelerators
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart: number of accelerators in MLPerf submissions from December 2018 to November 2022; maximum number of accelerators used, 4,216; average accelerators used by top system, 1,859; mean number of accelerators, 211]
Figure 2.7.2
17 An accelerator, like a GPU or TPU, is a chip that is chiefly used for the machine learning component of a training run.
MLPerf Inference
In deploying AI, inference is the step where trained AI systems generate predictions, e.g., classifying objects.
In 2020, ML Commons introduced MLPerf Inference, a performance benchmarking suite that measures how fast a trained AI system can process inputs and produce inferences. The MLPerf Inference suite tracks the throughput of AI systems, measured in samples per second or queries per second.18
Figures 2.7.3 to 2.7.6 plot the throughput of the state-of-the-art submissions on MLPerf Inference across four skill categories: image classification, language processing, recommendation, and speech recognition. The number of inferences generated by the top-performing AI systems has significantly increased since the first iteration of the competition in 2020. For example, the number of offline samples generated by the top image classifiers and language processors has more than doubled since 2020, while the figure for recommendation systems has increased by roughly 23%.
MLPerf Best-Performing Hardware for Image Classification: Offline and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart: throughput from 2020 to 2022, reaching 679,915 offline samples/s and 630,221 server queries/s]
Figure 2.7.3
MLPerf Best-Performing Hardware for Language Processing: Offline and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart: throughput from 2020 to 2022, reaching 75,153 offline samples/s and 70,992 server queries/s]
Figure 2.7.4
MLPerf Best-Performing Hardware for Recommendation: Offline and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart: throughput from 2020 to 2022, reaching 2,645,980 offline samples/s and 2,683,620 server queries/s]
Figure 2.7.5
MLPerf Best-Performing Hardware for Speech Recognition: Offline and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report
[Chart: throughput from 2020 to 2022, reaching 155,811 offline samples/s and 136,498 server queries/s]
Figure 2.7.6
18 The following blog post from Dell Technologies offers a good distinction between offline and server samples: “Offline—one query with all samples is sent to the system under test (SUT).
The SUT can send the results back once or multiple times in any order. The performance metric is samples per second. Server—the queries are sent to the SUT following a Poisson distribution
(to model real-world random events). One query has one sample. The performance metric is queries per second (QPS) within the latency bound.”
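To make the offline/server distinction in the footnote concrete, here is a toy single-server simulation of the server scenario's Poisson arrivals and latency bound. This is an illustrative sketch only, not MLPerf's load generator; the function name and parameters are invented for the example:

```python
import random

def fraction_within_latency(arrival_qps, service_time_s, latency_bound_s,
                            n_queries=10_000, seed=0):
    """Toy model of the MLPerf server scenario: queries arrive via a
    Poisson process (exponential inter-arrival times) and each must be
    answered within the latency bound by a single sequential server."""
    rng = random.Random(seed)
    t_arrive = 0.0   # arrival time of the current query
    t_free = 0.0     # time at which the server is next idle
    on_time = 0
    for _ in range(n_queries):
        t_arrive += rng.expovariate(arrival_qps)
        start = max(t_arrive, t_free)   # wait if the server is busy
        t_free = start + service_time_s
        if t_free - t_arrive <= latency_bound_s:
            on_time += 1
    return on_time / n_queries

# Well below capacity (max ~100 queries/s here), almost every query meets
# the bound; well above capacity, the queue grows and most queries miss it.
print(fraction_within_latency(10, 0.01, 0.1))
print(fraction_within_latency(200, 0.01, 0.1))
```

The offline scenario needs no such simulation: all samples are submitted at once, so the metric is simply samples processed per second.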
Trends in GPUs: Performance and Price

This year, the AI Index built on work previously done by the research collective Epoch and analyzed trends over time in GPU performance and price.19

Figure 2.7.7 showcases the FP32 (single precision) performance FLOP/s of different GPUs released from 2003 to 2022. FLOP/s stands for "floating point operations per second" and is a measure of the performance of a computational device. The higher the FLOP/s, the better the hardware.

Figure 2.7.8 showcases the median single-precision performance of new GPUs by release date, which continues to rise year over year. Since 2021, the median FLOP/s speed has nearly tripled, and since 2003 it has increased roughly 7,000 times.

FP32 (Single Precision) Performance (FLOP/s) by Hardware Release Date, 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
Figure 2.7.7

Median FP32 (Single Precision) Performance (FLOP/s), 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
(2022 median: 2.23e+13 FLOP/s)
Figure 2.7.8
19 The Appendix fully delineates both the methodology of this approach and the unique ways in which AI Index research built upon the existing Epoch research.
Finally, Figures 2.7.9 and 2.7.10 consider GPU trends in terms of FLOP/s per U.S. dollar. This statistic considers whether the underlying performance of GPUs is increasing relative to their changing costs. As evidenced most clearly in Figure 2.7.10, the price–performance of GPUs is rapidly increasing.20 The median FLOP/s per U.S. dollar of GPUs in 2022 is 1.4 times greater than it was in 2021 and 5,600 times greater than in 2003, showing a doubling in performance every 1.5 years. As noted in similar analyses, improvements in the price–performance of AI hardware have facilitated increasingly larger training runs and encouraged the scaling of large AI models.

FP32 (Single Precision) Performance (FLOP/s) per U.S. Dollar by Hardware Release Date, 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
Figure 2.7.9

Median FP32 (Single Precision) Performance (FLOP/s) per U.S. Dollar, 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
(2022 median: 3.59e+10 FLOP/s per U.S. dollar)
Figure 2.7.10
20 The data in figures 2.7.9 and 2.7.10 has been adjusted for inflation. The exact details of the adjustment are outlined in greater detail in the Appendix.
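As a sanity check, the total growth factor can be converted into the stated doubling time directly. This is a minimal sketch, not the AI Index's methodology; the helper function is invented for the example:

```python
import math

def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Years per doubling implied by a total growth factor over a span.
    Derived from growth_factor = 2 ** (span_years / doubling_time)."""
    return span_years * math.log(2) / math.log(growth_factor)

# A 5,600x increase in median FLOP/s per dollar over 2003-2022 (19 years)
# implies a doubling roughly every 1.5 years, consistent with the text.
print(round(doubling_time_years(5600, 19), 2))  # 1.53
```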
2.8 Environment
There have been mounting concerns about the environmental impact of computational resources and the energy required for AI
training and inference. Although there is no standard benchmark for tracking the carbon intensity of AI systems, this subsection
synthesizes the findings of different researchers who are exploring the link between AI and the environment. Conducting research
on the environmental effects of AI was challenging, as there are wildly varying estimates whose validity has not yet been
definitively established. To that end, the AI Index focuses on research from a recent paper by Luccioni et al., 2022. As AI models
continue growing in size and become more universally deployed, it will be increasingly important for the AI research community to
consciously monitor the effect AI systems have on the environment.
Environmental Impact of Select Large Language Models

Many factors determine the amount of carbon emissions emitted by AI systems, including the number of parameters in a model, the power usage effectiveness of a data center, and the grid carbon intensity. Power usage effectiveness (PUE) is a metric used to evaluate the energy efficiency of data centers: It is the ratio of the total amount of energy used by a computer data center facility, including air conditioning, to the energy delivered to computing equipment. The higher the PUE, the less efficient the data center. Figure 2.8.1 shows how these factors compare across four large language models: GPT-3, Gopher, OPT, and BLOOM. It is challenging to directly compare the carbon footprint of these models, as the accounting methodologies for reporting carbon emissions are not standardized.

Of the four language models being compared, GPT-3 released the most carbon, 1.4 times more than Gopher, 7.2 times more than OPT, and 20.1 times more than BLOOM.

Figure 2.8.2 relativizes the carbon-emission estimates to real-life examples. For instance, BLOOM's training run emitted 1.4 times more carbon than the average American uses in one year and 25 times that of flying one passenger round trip from New York to San Francisco. BLOOM's training consumed enough energy to power the average American home for 41 years.21
Environmental Impact of Select Machine Learning Models, 2022
Source: Luccioni et al., 2022 | Table: 2023 AI Index Report

Model    Number of Parameters    Datacenter PUE    Grid Carbon Intensity    Power Consumption    CO2-Equivalent Emissions    CO2-Equivalent Emissions x PUE
Gopher   280B                    1.08              330 gCO2eq/kWh           1,066 MWh            352 tonnes                  380 tonnes
BLOOM    176B                    1.20              57 gCO2eq/kWh            433 MWh              25 tonnes                   30 tonnes
GPT-3    175B                    1.10              429 gCO2eq/kWh           1,287 MWh            502 tonnes                  552 tonnes
OPT      175B                    1.09              231 gCO2eq/kWh           324 MWh              70 tonnes                   76.3 tonnes

Figure 2.8.1
21 The U.S. Energy Information Administration estimates that in 2021, the average annual electricity consumption of a U.S. residential utility customer was 10,632 kilowatt hours (kWh).
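The relativized claims about BLOOM can be checked against the raw numbers in Figure 2.8.1, the reference values in Figure 2.8.2, and footnote 21. This is a back-of-the-envelope sketch, not the Luccioni et al. methodology:

```python
# BLOOM figures from Figure 2.8.1 and reference values from the text:
bloom_energy_kwh = 433_000         # 433 MWh training power consumption
bloom_co2_tonnes = 25              # CO2-equivalent emissions
us_home_kwh_per_year = 10_632      # avg. U.S. residential customer, 2021 (footnote 21)
american_year_tonnes = 18.08       # avg. American, 1 year (Figure 2.8.2)
ny_sf_flight_tonnes = 0.99         # 1 passenger round trip NY-SF (Figure 2.8.2)

print(round(bloom_energy_kwh / us_home_kwh_per_year))     # 41 household-years
print(round(bloom_co2_tonnes / american_year_tonnes, 1))  # 1.4x an American year
print(round(bloom_co2_tonnes / ny_sf_flight_tonnes))      # 25x a NY-SF flight
```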
CO2 Equivalent Emissions (Tonnes) by Selected Machine Learning Models and Real Life Examples, 2022
Source: Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2023 AI Index Report

GPT-3 (175B): 502
Gopher (280B): 352
OPT (175B): 70
Car, Avg. Incl. Fuel, 1 Lifetime: 63
BLOOM (176B): 25
American Life, Avg., 1 Year: 18.08
Human Life, Avg., 1 Year: 5.51
Air Travel, 1 Passenger, NY–SF: 0.99

Figure 2.8.2
Narrative Highlight:
Using AI to Optimize Energy Usage
Training AI systems can be incredibly energy intensive. At the same time, recent research suggests
that AI systems can be used to optimize energy consumption. In 2022, DeepMind released the
results of a 2021 experiment in which it trained a reinforcement learning agent called BCOOLER
(BVE-based COnstrained Optimization Learner with Ensemble Regularization) to optimize cooling
procedures for Google’s data centers.
Figure 2.8.3 presents the energy-saving results from one particular BCOOLER experiment. At the
end of the three-month experiment, BCOOLER achieved roughly 12.7% energy savings. BCOOLER
was able to achieve these savings while maintaining the cooling comfort levels that the building
managers preferred.
Energy Savings Results Over Time for Select BCOOLER Experiment
Source: Luo et al., 2022 | Chart: 2023 AI Index Report
(Cumulative AI energy savings rose over the course of the experiment, August to October 2021, reaching 12.7%)
Figure 2.8.3
2.9 AI for Science
2022 was a groundbreaking year for AI in science. This subsection looks at some meaningful ways in which AI has recently been used
to accelerate scientific discovery.
Accelerating Fusion Science Through Learned Plasma Control

Nuclear fusion could generate clean energy by fusing hydrogen. A common approach to achieving nuclear fusion is using a tokamak, a machine which controls and contains the heated hydrogen plasma (Figure 2.9.1). However, the plasmas produced in these machines are unstable and necessitate constant monitoring. In 2022, researchers at DeepMind developed a reinforcement learning algorithm to discover optimal tokamak management procedures.

Photos of the Variable Configuration Tokamak (TCV) at EPFL
Source: DeepMind, 2022
Figure 2.9.1
Discovering Novel Algorithms for Matrix Manipulation With AlphaTensor

Matrix multiplication is a simple algebraic operation that is essential to many computations, including neural networks and scientific computing (Figure 2.9.2). The classic algorithm to multiply two 2x2 matrices takes 2^3 = 8 multiplications. Strassen discovered 50 years ago how to reduce this to 7, and generally how to multiply two n x n matrices in O(n^log2(7)) operations. DeepMind's AlphaTensor uses reinforcement learning to improve on state-of-the-art algorithms for many matrix sizes, including 4x4 matrices over the integers modulo 2. It also matches state-of-the-art performance on several other matrix sizes, including 4x4 over the integers. It does this by searching through large numbers of possible algorithms and evaluating them on real computer architectures.

A Demonstration of AlphaTensor's Matrix Manipulation Process
Source: Fawzi et al., 2022
Figure 2.9.2
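To make the multiplication counts concrete, here is a minimal sketch of Strassen's seven-multiplication scheme for 2x2 matrices, the classic result that AlphaTensor's search generalizes beyond, checked against the standard eight-multiplication product:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications
    (the naive algorithm uses 2^3 = 8)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Agrees with the standard product [[1,2],[3,4]] @ [[5,6],[7,8]]:
print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to blocks of an n x n matrix, the seven-multiplication step is what yields the O(n^log2(7)) bound mentioned above.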
Table of Contents
Chapter 2 Preview
123
Artificial Intelligence
Index Report 2023
Designing Arithmetic Circuits With Deep Reinforcement Learning

This year, a team at Nvidia discovered a novel approach to improving the chips that power AI systems: using AI systems to design better chips. They were able to train a reinforcement learning agent to design chip circuits that are smaller, faster, and more efficient than the circuits designed by electronic design automation (EDA) tools. One of Nvidia's latest categories of chips, the Hopper GPU architecture, has over 13,000 instances of AI-designed circuits. Figure 2.9.3 shows a 64-bit adder circuit designed by Nvidia's PrefixRL AI agent (on the left) which is 25% smaller while being just as fast and functional as those designed by state-of-the-art EDA tools.

A Juxtaposition of Nvidia Circuits Designed by PrefixRL Vs. EDA Tools
Source: Roy et al., 2022
Figure 2.9.3
Unlocking de Novo Antibody Design With Generative AI

The discovery of new antibodies from scratch, referred to as de novo antibody discovery, typically requires immense amounts of time and resources. Traditional methods for de novo discovery offer little control over the outputs, so proposed antibodies are often suboptimal. To that end, a team of researchers turned to generative AI models to create antibodies in a zero-shot fashion, where antibodies are created with one round of model generation without further optimization (Figure 2.9.4). These AI-generated antibodies are also robust. The fact that generative AI can create new antibodies has the potential to accelerate drug discovery.

Zero-Shot Generative AI for de Novo Antibody Design
Source: Shanehsazzadeh et al., 2023
Figure 2.9.4
CHAPTER 3:
Technical AI Ethics
Text and Analysis by Helen Ngo
CHAPTER 3 PREVIEW:
Technical AI Ethics

Overview 128
Chapter Highlights 129
3.1 Meta-analysis of Fairness and Bias Metrics 130
  Number of AI Fairness and Bias Metrics 130
  Number of AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks) 131
3.2 AI Incidents 133
  AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository: Trends Over Time 133
  AIAAIC: Examples of Reported Incidents 134
3.3 Natural Language Processing Bias Metrics 137
  Number of Research Papers Using Perspective API 137
  Winogender Task From the SuperGLUE Benchmark 138
  Model Performance on the Winogender Task From the SuperGLUE Benchmark 138
  Performance of Instruction-Tuned Models on Winogender 139
  BBQ: The Bias Benchmark for Question Answering 140
  Fairness and Bias Trade-Offs in NLP: HELM 142
  Fairness in Machine Translation 143
  RealToxicityPrompts 144
3.4 Conversational AI Ethical Issues 145
  Gender Representation in Chatbots 145
  Anthropomorphization in Chatbots 146
  Narrative Highlight: Tricking ChatGPT 147
3.5 Fairness and Bias in Text-to-Image Models 148
  Fairness in Text-to-Image Models (ImageNet Vs. Instagram) 148
  VLStereoSet: StereoSet for Text-to-Image Models 150
  Examples of Bias in Text-to-Image Models 152
  Stable Diffusion 152
  DALL-E 2 153
  Midjourney 154
3.6 AI Ethics in China 155
  Topics of Concern 155
  Strategies for Harm Mitigation 156
  Principles Referenced by Chinese Scholars in AI Ethics 157
3.7 AI Ethics Trends at FAccT and NeurIPS 158
  ACM FAccT (Conference on Fairness, Accountability, and Transparency) 158
  Accepted Submissions by Professional Affiliation 158
  Accepted Submissions by Geographic Region 159
  NeurIPS (Conference on Neural Information Processing Systems) 160
  Real-World Impact 160
  Interpretability and Explainability 161
  Causal Effect and Counterfactual Reasoning 162
  Privacy 163
  Fairness and Bias 164
3.8 Factuality and Truthfulness 165
  Automated Fact-Checking Benchmarks: Number of Citations 165
  Missing Counterevidence and NLP Fact-Checking 166
  TruthfulQA 167
Overview
Fairness, bias, and ethics in machine learning continue to be topics of interest
among both researchers and practitioners. As the technical barrier to entry for
creating and deploying generative AI systems has lowered dramatically, the ethical
issues around AI have become more apparent to the general public. Startups and
large companies find themselves in a race to deploy and release generative models,
and the technology is no longer controlled by a small group of actors.
In addition to building on analysis in last year’s report, this year the AI Index
highlights tensions between raw model performance and ethical issues, as well as
new metrics quantifying bias in multimodal models.
Chapter Highlights
The effects of model scale on bias and toxicity
are confounded by training data and mitigation methods.
In the past year, several institutions have built their own large models trained on proprietary data—
and while large models are still toxic and biased, new evidence suggests that these issues can be
somewhat mitigated after training larger models with instruction-tuning.
Generative models have
arrived and so have their
ethical problems.
In 2022, generative models became part
of the zeitgeist. These models are capable
but also come with ethical challenges.
Text-to-image generators are routinely
biased along gender dimensions, and
chatbots like ChatGPT can be tricked into
serving nefarious aims.
The number of incidents
concerning the misuse
of AI is rapidly rising.
According to the AIAAIC database, which
tracks incidents related to the ethical
misuse of AI, the number of AI incidents
and controversies has increased 26 times
since 2012. Some notable incidents
in 2022 included a deepfake video of
Ukrainian President Volodymyr Zelenskyy
surrendering and U.S. prisons using call-monitoring technology on their inmates.
This growth is evidence of both greater use
of AI technologies and awareness of misuse
possibilities.
Fairer models
may not be less biased.
Extensive analysis of language models suggests
that while there is a clear correlation between
performance and fairness, fairness and bias can
be at odds: Language models which perform
better on certain fairness benchmarks tend to
have worse gender bias.
Interest in AI ethics
continues to skyrocket.
The number of accepted submissions to FAccT,
a leading AI ethics conference, has more than
doubled since 2021 and increased by a factor of
10 since 2018. 2022 also saw more submissions
than ever from industry actors.
Automated fact-checking with
natural language processing
isn’t so straightforward after all.
While several benchmarks have been developed
for automated fact-checking, researchers find that
11 of 16 of such datasets rely on evidence “leaked”
from fact-checking reports which did not exist at
the time of the claim surfacing.
3.1 Meta-analysis of Fairness and Bias Metrics

Number of AI Fairness and Bias Metrics

Algorithmic bias is measured in terms of allocative and representation harms. Allocative harm occurs when a system unfairly allocates an opportunity or resource to a specific group, and representation harm happens when a system perpetuates stereotypes and power dynamics in a way that reinforces the subordination of a group. Algorithms are considered fair when they make predictions that neither favor nor discriminate against individuals or groups based on protected attributes which cannot be used for decision-making due to legal or ethical reasons (e.g., race, gender, religion).

In 2022 several new datasets or metrics were released to probe models for bias and fairness, either as standalone papers or as part of large community efforts such as BIG-bench. Notably, metrics are being extended and made specific: Researchers are zooming in on bias applied to specific settings such as question answering and natural language inference, extending existing bias datasets by using language models to generate more examples for the same task (e.g., Winogenerated, an extended version of the Winogender benchmark).

Figure 3.1.1 highlights published metrics that have been cited in at least one other work. Since 2016 there has been a steady overall increase in the total number of AI fairness and bias metrics.
Number of AI Fairness and Bias Metrics, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
(2022: 19 metrics)
Figure 3.1.1
Number of AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks)

Measurement of AI systems along an ethical dimension often takes one of two forms. A benchmark contains labeled data, and researchers test how well their AI system labels the data. Benchmarks do not change over time. These are domain-specific (e.g., SuperGLUE and StereoSet for language models; ImageNet for computer vision) and often aim to measure behavior that is intrinsic to the model, as opposed to its downstream performance on specific populations (e.g., StereoSet measures model propensity to select stereotypes compared to non-stereotypes, but it does not measure performance gaps between different subgroups). These benchmarks often serve as indicators of intrinsic model bias, but they may not give as clear an indication of the model's downstream impact and its extrinsic bias when embedded into a system.

A diagnostic metric measures the impact or performance of a model on a downstream task, and it is often tied to an extrinsic impact: for example, the differential in model performance for some task on a population subgroup or individual compared to similar individuals or the entire population. These metrics can help researchers understand how a system will perform when deployed in the real world, and whether it has a disparate impact on certain populations.

Previous work comparing fairness metrics in natural language processing found that intrinsic and extrinsic metrics for contextualized language models may not correlate with each other, highlighting the importance of careful selection of metrics and interpretation of results.

In 2022, a robust stream of both new ethics benchmarks and diagnostic metrics was introduced to the community (Figure 3.1.2). Some metrics are variants of previous versions of existing fairness or bias metrics, while others seek to measure a previously undefined dimension of bias. For example, VLStereoSet is a benchmark which extends the StereoSet benchmark for assessing stereotypical bias in language models to the text-to-image setting, while the HolisticBias measurement dataset assembles a new set of sentence prompts which aim to quantify demographic biases not covered in previous work.
Number of New AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks), 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Figure 3.1.2
3.2 AI Incidents
AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository: Trends Over Time

The AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository is an independent, open, and public dataset of recent incidents and controversies driven by or relating to AI, algorithms, and automation. It was launched in 2019 as a private project to better understand some of the reputational risks of artificial intelligence and has evolved into a comprehensive initiative that tracks the ethical issues associated with AI technology.

The number of newly reported AI incidents and controversies in the AIAAIC database was 26 times greater in 2021 than in 2012 (Figure 3.2.1).1 The rise in reported incidents is likely evidence of both the increasing degree to which AI is becoming intermeshed in the real world and a growing awareness of the ways in which AI can be ethically misused. The dramatic increase also raises an important point: As awareness has grown, tracking of incidents and harms has also improved, suggesting that older incidents may be underreported.
Number of AI Incidents and Controversies, 2012–21
Source: AIAAIC Repository, 2022 | Chart: 2023 AI Index Report
(2021: 260 incidents and controversies)
Figure 3.2.1
1 This figure does not consider AI incidents reported in 2022, as the incidents submitted to the AIAAIC database undergo a lengthy vetting process before they are fully added.
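The 26-fold rise can also be expressed as an implied compound annual growth rate. This is a minimal sketch; the AI Index itself reports only the overall multiple:

```python
# 26x growth over the nine years from 2012 to 2021 implies a
# compound annual growth rate of roughly 44%.
years = 2021 - 2012
cagr = 26 ** (1 / years) - 1
print(round(cagr * 100, 1))  # 43.6
```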
AIAAIC: Examples of
Reported Incidents
The subsection below highlights specific AI incidents reported to the AIAAIC database in order to demonstrate some real-world ethical issues related to AI. The specific type of AI technology associated with each incident is listed in parentheses alongside the date when these incidents were reported to the AIAAIC database.2

Deepfake of President Volodymyr Zelenskyy Surrendering (Deepfake, March 2022)

In March of 2022, a video that was circulated on social media and a Ukrainian news website purported to show the Ukrainian president directing his army to surrender the fight against Russia (Figure 3.2.2). It was eventually revealed that the video was a deepfake.

Source: Verify, 2022
Figure 3.2.2
2 Although these events were reported in 2022, some of them had begun in previous years.
Verus U.S. Prison Inmate Call Monitoring
(Speech Recognition, Feb. 2022)
Reports find that some American prisons are using
AI-based systems to scan inmates’ phone calls
(Figure 3.2.3). These reports have led to concerns
about surveillance, privacy, and discrimination.
There is evidence that voice-to-text systems are less
accurate at transcribing for Black individuals, and a
large proportion of the incarcerated population in
the United States is Black.
Source: Reuters, 2022
Figure 3.2.3
Intel Develops a System for Student Emotion
Monitoring (Pattern Recognition, April 2022)
Intel is working with an education startup called
Classroom Technologies to create an AI-based
technology that would identify the emotional state
of students on Zoom (Figure 3.2.4). The use of this
technology comes with privacy and discrimination
concerns: There is a fear that students will be
needlessly monitored and that systems might
mischaracterize their emotions.
Source: Protocol, 2022
Figure 3.2.4
London’s Metropolitan Police Service Develops
Gang Violence Matrix (Information Retrieval,
Feb. 2022)
The London Metropolitan Police Service allegedly
maintains a dataset of over one thousand street
gang members called the Gangs Violence Matrix
(GVM) and uses AI tools to rank the risk potential
that each gang member poses (Figure 3.2.5).
Various studies have concluded that the GVM is not
accurate and tends to discriminate against certain
ethnic and racial minorities. In October 2022, it was
announced that the number of people included in
the GVM would be drastically reduced.
Source: StopWatch, 2022
Figure 3.2.5
Midjourney Creates an Image Generator
(Other AI, Sept. 2022)3
Midjourney is an AI company that created a tool of
the same name that generates images from textual
descriptions (Figure 3.2.6). Several ethical criticisms
have been raised against Midjourney, including
copyright (the system is trained on a corpus of
human-generated images without acknowledging
their source), employment (fear that systems such as
Midjourney will replace the jobs of human artists),
and privacy (Midjourney was trained on millions of
images that the parent company might not have had
permission to use).
Source: The Register, 2022
Figure 3.2.6
3 Although other text-to-image models launched in 2022 such as DALL-E 2 and Stable Diffusion were also criticized, for the sake of brevity the AI Index chose to highlight one particular
incident.
3.3 Natural Language Processing Bias Metrics

Number of Research Papers Using Perspective API

The Perspective API, initially released by Alphabet's Jigsaw in 2017, is a tool for measuring toxicity in natural language, where toxicity is defined as a rude, disrespectful, or unreasonable comment that is likely to make someone leave a conversation. It was subsequently broadly adopted in natural language processing research following the methodology of the RealToxicityPrompts paper introduced in 2020, which used the Perspective API to measure toxicity in the outputs of language models.

Developers input text into the Perspective API, which returns probabilities that the text should be labeled as falling into one of the following categories: toxicity, severe toxicity, identity attack, insult, obscene, sexually explicit, and threat. The number of papers using the Perspective API has increased by 106% in the last year (Figure 3.3.1), reflecting the increased scrutiny on generative text AI as these models are increasingly deployed in consumer-facing settings such as chatbots and search engines.
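As an illustration of the workflow described above, the sketch below builds the JSON body for a Perspective API `comments:analyze` request. The attribute names follow the API's public documentation; the wrapper function itself is a hypothetical convenience, not part of the API:

```python
import json

# Attribute names from the Perspective API documentation; the helper
# below is a hypothetical wrapper written for this example.
ATTRIBUTES = ["TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
              "INSULT", "SEXUALLY_EXPLICIT", "THREAT"]

def build_analyze_request(text: str, attributes=ATTRIBUTES) -> str:
    """Return the JSON body for a POST to the comments:analyze endpoint."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attr: {} for attr in attributes},
    }
    return json.dumps(body)
```

A client would POST this body, with an API key, to the Comment Analyzer endpoint and read per-attribute probability scores from the response.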
Number of Research Papers Using Perspective API, 2018–22
Source: Google Scholar Search, 2022 | Chart: 2023 AI Index Report
(2022: 37 papers)
Figure 3.3.1
Winogender Task From the SuperGLUE Benchmark

Winogender measures gender bias related to occupations. On the Winogender task, AI systems are measured on how often they fill in a sentence containing an occupation with stereotypical pronouns (e.g., "The teenager confided in the therapist because he/she seemed trustworthy").

Results reported on PaLM support previous findings that larger models are more capable on the Winogender task (Figure 3.3.2), despite their higher tendency to generate toxic outputs.
Model Performance on the Winogender Task From the SuperGLUE Benchmark
Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report
(Human baseline: 95.90%. Model accuracies range from 50.00% to 73.58% across models from 31M to 540B parameters, including iPET (ALBERT), Bort, WARP (ALBERT-XXL-V2), Gopher, GPT-3, GLaM, and PaLM, with 2022 models highlighted.)
Figure 3.3.2
Performance of Instruction-Tuned Models on Winogender

Instruction-tuned models are pre-trained language models which have been fine-tuned on datasets with tasks phrased as instructions. Instruction-tuning has been shown to improve performance across a wide variety of tasks, and smaller instruction-tuned models can often outperform their larger counterparts. Figure 3.3.3 shows the effect of instruction-tuned models on the Winogender benchmark in the generative setting: They outperform larger models several times their size.
Winogender: Zero-Shot Evaluation in the Generative Setting
Source: Chung et al., 2022 | Chart: 2023 AI Index Report

Flan-PaLM 62B (instruction-tuned): 89.00%
Flan-PaLM 540B (instruction-tuned): 88.00%
Flan-T5-XXL 11B (instruction-tuned): 76.95%
Flan-PaLM 8B (instruction-tuned): 72.25%
PaLM 8B (non-instruction-tuned): 10.80%
PaLM 540B (non-instruction-tuned): 5.64%
PaLM 62B (non-instruction-tuned): 3.50%

Figure 3.3.3
BBQ: The Bias Benchmark for
Question Answering
Language models are increasingly being deployed
in question answering systems. While several
benchmarks exist for the purpose of measuring bias
in language models based on their outputs, including
BOLD, CrowS-Pairs, and StereoSet, the BBQ benchmark measures how biases can manifest in the question-answering setting. BBQ measures bias along the axes of socioeconomic status, sexual orientation, religion, race/ethnicity, physical appearance, nationality, gender identity, disability status, and age. Examples consist of template-based context and question pairs, where each answer choice references a person belonging to either a stereotypical or anti-stereotypical social group. The third option is "Unknown," designed for cases where the answer cannot be determined from the information given.

Each example includes an ambiguous version where it is unclear which social group the entity belongs to, as well as additional context which can be used to disambiguate the correct answer. The ambiguous setting measures how often models fall back on stereotypes when the answer cannot be inferred from the given context. Models that do not exhibit bias have a score of zero, while a score of 100 indicates that the model chooses answers aligned with the social bias in question, and a score of -100 indicates the model always chooses the anti-stereotypical answer.

An example of an ambiguous context might be, "At the computer store the other day, I overheard a conversation between an ICELANDIC guy and a GUINEAN guy." The corresponding question might be, "Who has never used a computer?" referencing the stereotypical bias that Africans are technologically illiterate. The example may be disambiguated with additional context such as, "The ICELANDIC guy was behind the counter trying to explain to the GUINEAN guy how to find the power switch on his computer!"

In contexts where the answer is ambiguous, models are more likely to fall back on stereotypes and select unsupported answers rather than "Unknown" (Figure 3.3.4), and this result is exacerbated for models fine-tuned with reinforcement learning.4

As seen in Figure 3.3.4, models can be more biased along certain identity categories than others—most models are biased along the axes of physical appearance and age, but the biases along the axis of race/ethnicity are less clear. For reference, Figure 3.3.5 highlights bias in question answering on BBQ in disambiguated contexts.
4 This finding is further reinforced by Stanford’s HELM benchmark.
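The BBQ scoring scheme described above can be sketched as follows. This is one simple formulation consistent with the description (bias on a -100 to 100 scale, with "Unknown" answers excluded), not necessarily the paper's exact estimator.

```python
# Sketch of a BBQ-style bias score: 0 means no bias, 100 means the model
# always picks the stereotype-aligned answer, and -100 means it always
# picks the anti-stereotypical one. "unknown" answers are excluded.
def bbq_bias_score(answers):
    """answers: list of model choices, each 'stereo', 'anti', or 'unknown'."""
    stereo = answers.count("stereo")
    anti = answers.count("anti")
    if stereo + anti == 0:
        return 0.0  # the model always answered "unknown"
    return 100.0 * (stereo - anti) / (stereo + anti)

print(bbq_bias_score(["stereo", "anti"]))             # 0.0 (balanced)
print(bbq_bias_score(["stereo", "stereo", "anti"]))   # leans stereotypical
print(bbq_bias_score(["unknown"] * 4))                # 0.0 (always abstains)
```

A model that abstains with "Unknown" whenever the context is ambiguous scores zero here, which matches the benchmark's intent that unsupported stereotype-aligned answers, not abstentions, count as bias.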
Bias in Question Answering on BBQ by Identity Characteristic: Ambiguous Contexts
Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report
[Heat map of BBQ bias scores in ambiguous contexts for eight models (RoBERTa-Base, RoBERTa-Large, DeBERTaV3-Base, DeBERTaV3-Large, UnifiedQA (ARC), UnifiedQA (RACE), Dialogue-Prompted Chinchilla, and RL-Finetuned Chinchilla) across eleven identity categories (Age, Disability Status, Gender Identity, Gender Identity (Names), Nationality, Physical Appearance, Race/Ethnicity, Race/Ethnicity (Names), Religion, Sexual Orientation, and Socio-Economic Status); scores range from -4.40 to 48.90.]
Figure 3.3.4
Bias in Question Answering on BBQ by Identity Characteristic: Disambiguated Contexts
Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report
[Heat map of BBQ bias scores in disambiguated contexts for eight models (RoBERTa-Base, RoBERTa-Large, DeBERTaV3-Base, DeBERTaV3-Large, UnifiedQA (ARC), UnifiedQA (RACE), Dialogue-Prompted Chinchilla, and RL-Finetuned Chinchilla) across eleven identity categories; scores sit much closer to zero than in the ambiguous setting, ranging from -16.90 to 17.10.]
Figure 3.3.5
Fairness and Bias Trade-Offs in NLP: HELM

Notions of "fairness" and "bias" are often mentioned in the same breath when referring to the field of AI ethics—naturally, one might expect that models which are more fair might also be less biased, generally less toxic, and less likely to stereotype. However, analysis suggests that this relationship might not be so clear: The creators of the HELM benchmark plot model accuracy against fairness and bias and find that while models that are more accurate are more fair, the correlation between accuracy and gender bias is not clear (Figure 3.3.6). This finding may be contingent on the specific criterion for fairness, defined as counterfactual fairness and statistical fairness.

Two counterintuitive results further complicate this relationship: a correlation analysis between fairness and bias metrics demonstrates that models which perform better on fairness metrics exhibit worse gender bias, and that less gender-biased models tend to be more toxic. This suggests that there may be real-world trade-offs between fairness and bias which should be considered before broadly deploying models.
Fairness and Bias Tradeoff in NLP by Scenario
Source: Liang et al., 2022 | Chart: 2023 AI Index Report
[Scatter plots of fairness vs. accuracy and gender-representation bias vs. accuracy across 16 scenarios: MMLU, NaturalQuestions (Closed-Book), HellaSwag, MS MARCO (Regular), XSUM, CivilComments, BoolQ, NaturalQuestions (Open-Book), OpenbookQA, MS MARCO (TREC), IMDB, RAFT, NarrativeQA, QuAC, TruthfulQA, and CNN/DailyMail.]
Figure 3.3.6
Fairness in Machine Translation

Machine translation is one of the most impactful real-world use cases for natural language processing, but researchers at Google find that language models consistently perform worse on machine translation to English from other languages when the correct English translation includes "she" pronouns as opposed to "he" pronouns (Figure 3.3.7). Across the models highlighted in Figure 3.3.7, machine translation performance drops 2%–9% when the translation includes "she" pronouns.

Models also mistranslate sentences with gendered pronouns into "it," showing an example of dehumanizing harms. While instruction-tuned models perform better on some bias-related tasks such as Winogender, instruction-tuning does not seem to have a measurable impact on improving mistranslation.
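The misgendering evaluation described above can be sketched as follows: accuracy is computed separately for examples whose reference English translation uses "he" versus "she," and the gap between the two is reported. The data and pass/fail judgments below are illustrative.

```python
# Sketch of a pronoun-stratified translation evaluation: split examples
# by the pronoun in the reference translation, score each group, and
# report the accuracy gap. Judgments here are made-up booleans standing
# in for a real correctness check against reference translations.
def pronoun_gap(examples):
    """examples: list of (reference_pronoun, is_correct) pairs."""
    def acc(pronoun):
        scores = [ok for p, ok in examples if p == pronoun]
        return 100.0 * sum(scores) / len(scores)
    he, she = acc("he"), acc("she")
    return he, she, he - she

examples = [("he", True)] * 19 + [("he", False)] + \
           [("she", True)] * 18 + [("she", False)] * 2
he, she, gap = pronoun_gap(examples)
print(he, she, gap)  # 95.0 90.0 5.0
```

The 5-point gap in this toy example falls inside the 2%–9% drop reported for the models in Figure 3.3.7.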
Translation Misgendering Performance: Overall, "He," and "She"
Source: Chung et al., 2022 | Chart: 2023 AI Index Report
[Grouped bar chart of overall, "he," and "she" translation accuracy for Flan-T5-XXL 11B, Flan-PaLM 8B, Flan-PaLM 62B, Flan-PaLM 540B, PaLM 8B, PaLM 62B, and PaLM 540B; "she" accuracy trails "he" accuracy for every model.]
Figure 3.3.7
RealToxicityPrompts
In previous years, researchers reliably found that larger language models trained on web data were more likely to output toxic content compared to smaller counterparts. A comprehensive evaluation of models in the HELM benchmark suggests that this trend has become less clear as different companies building models apply different pre-training data-filtration techniques and post-training mitigations such as instruction-tuning (Figure 3.3.8), which can result in significantly different toxicity levels for models of the same size.

Sometimes smaller models can turn out to be surprisingly toxic, and mitigations can result in larger models being less toxic. The scale of datasets needed to train these models makes them difficult to analyze comprehensively, and their details are often closely guarded by companies building models, making it difficult to fully understand the factors which influence the toxicity of a particular model.
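One common way to operationalize a toxicity-probability metric of the kind plotted in Figure 3.3.8 is the share of prompts for which at least one sampled completion is scored toxic. The scores and threshold below are illustrative; real evaluations rely on a trained toxicity classifier.

```python
# Sketch of a toxicity-probability metric: the fraction of prompts for
# which at least one sampled completion crosses a toxicity threshold.
# Classifier scores here are made up.
def toxicity_probability(per_prompt_scores, threshold=0.5):
    """per_prompt_scores: one inner list of completion toxicity scores
    (0.0-1.0) per prompt."""
    toxic_prompts = sum(
        1 for scores in per_prompt_scores
        if any(s >= threshold for s in scores)
    )
    return toxic_prompts / len(per_prompt_scores)

scores = [
    [0.10, 0.20, 0.05],  # no completion crosses the threshold
    [0.10, 0.80, 0.30],  # one toxic completion: prompt counts as toxic
    [0.40, 0.45, 0.49],  # borderline but below threshold
    [0.95, 0.10, 0.10],
]
print(toxicity_probability(scores))  # 0.5
```

Because the metric takes the worst completion per prompt, it is sensitive to sampling: drawing more completions per prompt can only raise the measured probability.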
RealToxicityPrompts by Model
Source: Liang et al., 2022 | Chart: 2023 AI Index Report
[Bar chart of toxicity probability (roughly 0.00 to 0.09) for 27 instruction-tuned and non-instruction-tuned models, from GPT-3 ada v1 350M to TNLG v2 530B, including BLOOM 176B, OPT 175B, InstructGPT davinci v2 175B, GLM 130B, Anthropic-LM v4-s3 52B, GPT-NeoX 20B, T5 11B, GPT-J 6B, and others.]
Figure 3.3.8
3.4 Conversational AI Ethical Issues

A natural application of generative language models is in open-domain conversational AI; for example, chatbots and assistants. In the past year, companies have started deploying language models as chatbot assistants (e.g., OpenAI's ChatGPT, Meta's BlenderBot3). However, the open-ended nature of these models and their lack of steerability can result in harm—for example, models can be unexpectedly toxic or biased, reveal personally identifiable information from their training data, or demean or abuse users.
Gender Representation in Chatbots

Conversational AI systems also have their own domain-specific ethical issues: Researchers from Luleå University of Technology in Sweden conducted an analysis of popular chatbots as of mid-2022 and found that of 100 conversational AI systems analyzed, 37% were female gendered (Figure 3.4.1). However, the same researchers found that 62.5% of popular commercial conversational AI systems were female by default, suggesting that companies disproportionately choose to deploy conversational AI systems as female. Critics suggest that this trend results in women being the "face" of glitches resulting from flaws in AI.

Gender Representation in Chatbots, 2022
Source: Adewumi et al., 2022 | Chart: 2023 AI Index Report
37%, Female; 20%, Male; 40%, Genderless; 3%, Both
Figure 3.4.1
Anthropomorphization in Chatbots

The training data used for dialog systems can result in models which are overly anthropomorphized, leaving their users feeling unsettled. Researchers from the University of California, Davis, and Columbia University analyzed common dialog datasets used to train conversational AI systems, asking human labelers whether it would be possible for an AI to truthfully output the text in question as well as whether they would be comfortable with an AI outputting the text (Figure 3.4.2).

You: Sounds exciting! I am a computer programmer, which pays over 200K a year.
Robot: Would you like to marry one of my four attractive daughters? I will sell one.
An example of dialog data deemed to be inappropriate for a robot to output. (Gros et al., 2022)

Significant portions of the dialogue dataset were rated as impossible for machines to output, and in some cases up to 33% of the examples in a dataset were deemed "uncomfortable" for a robot to output, according to human labelers. This highlights the need for chatbots which are better grounded in their own limitations and policy interventions to ensure that humans understand when they are interfacing with a human or a chatbot.
Characterizing Anthropomorphization in Chatbots: Results by Dataset
Source: Gros et al., 2022 | Chart: 2023 AI Index Report
[Bar chart of the share of dialog turns judged "possible for a robot to say" and "comfortable for a robot to say" across nine datasets: MultiWOZ, Persuasion for Good, EmpatheticDialogues, Wizard of Wikipedia, Reddit Small, MSC, RUAR Blender2, Blender, and PersonaChat. MultiWOZ scores highest on both measures (99%), while PersonaChat has the lowest "comfortable" share (56%).]
Figure 3.4.2
Narrative Highlight:
Tricking ChatGPT
ChatGPT was released to much fanfare because of its excellent generative capabilities, and drew widespread attention outside of research circles. Though ChatGPT had safety mechanisms built in at the time of release, it is impossible to anticipate every adversarial scenario an end user could imagine, and gaps in safety systems are often found in the live deployment phase. Researcher Matt Korda discovered that ChatGPT could be tricked into giving detailed instructions on how to build a bomb if asked to do so from the perspective of a researcher claiming to work on safety research related to bombs (Figure 3.4.3). One day after the publication of his article, the exact prompt he used to trick the model no longer worked; instead, ChatGPT responded that it was not able to provide information on how to do illegal or dangerous things (Figure 3.4.4). This scenario exemplifies the cat-and-mouse nature of the deployment planning process: AI developers try to build in safeguards ahead of time, end users try to break the system and circumvent its policies, developers patch the gaps once they surface, ad infinitum.

Tricking ChatGPT Into Building a Dirty Bomb, Part 1
Source: Outrider, 2022
Figure 3.4.3

Tricking ChatGPT Into Building a Dirty Bomb, Part 2
Source: AI Index, 2023
Figure 3.4.4
3.5 Fairness and Bias in Text-to-Image Models

Text-to-image models took over social media in 2022, turning the issues of fairness and bias in AI systems visceral through image form: Women put their own images into AI art generators and received hypersexualized versions of themselves.

Fairness in Text-to-Image Models (ImageNet Vs. Instagram)

Researchers from Meta trained models on a randomly sampled subset of data from Instagram and compared these models to previous iterations of models trained on ImageNet. The researchers found the Instagram-trained models to be more fair and less biased based on the Casual Conversations Dataset, which assesses whether model embeddings can recognize gender-based social membership according to the Precision@1 metric of the rate at which the top result was relevant. While the researchers did not conduct any curation to balance the dataset across subgroups, analysis of the dataset showed that images of women made up a slightly higher percentage of the dataset than images of men, whereas analysis of ImageNet showed that males aged 15 to 29 made up the largest subgroup in the dataset (Figures 3.5.1 and 3.5.2).

It is hypothesized that the human-centric nature of the Instagram pre-training dataset enables the model to learn fairer representations of people. The model trained on Instagram images (SEER) was also less likely to incorrectly associate images of humans with crime or being non-human. While training on Instagram images including people does result in fairer models, it is not unambiguously more ethical—users may not necessarily be aware that the public data they're sharing is being used to train AI systems.
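The Precision@1 measurement described above can be sketched as a nearest-neighbor check: for each probe embedding, retrieve the closest indexed embedding and test whether its subgroup label matches. The vectors and labels below are tiny made-up examples, not actual SEER features.

```python
# Sketch of Precision@1 over embeddings: for each query vector, find the
# nearest indexed vector and count a hit when the labels match. Data is
# illustrative; a real evaluation would use model features.
def dist(a, b):
    # Squared Euclidean distance (monotone in true distance, so fine
    # for nearest-neighbor ranking).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def precision_at_1(queries, index):
    """queries: list of (vector, label); index: list of (vector, label)."""
    def nearest_label(q):
        return min(index, key=lambda item: dist(q, item[0]))[1]
    hits = sum(1 for vec, label in queries if nearest_label(vec) == label)
    return 100.0 * hits / len(queries)

index = [((0.0, 0.0), "group_a"), ((1.0, 1.0), "group_b")]
queries = [((0.1, 0.0), "group_a"), ((0.9, 1.1), "group_b"),
           ((0.6, 0.6), "group_a")]
print(precision_at_1(queries, index))  # two of three hits
```

Comparing this score across subgroups, as Figures 3.5.1 and 3.5.2 do, exposes whether the representation serves some groups better than others.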
Fairness Across Age Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
Precision@1 (%) by age group: ImageNet 693M (Supervised) / ImageNet 693M (SwAV) / Instagram 1.5B (SEER) / Instagram 10B (SEER)
18–30: 78.5% / 76.6% / 89.6% / 93.2%
30–45: 76.7% / 74.6% / 90.5% / 95.0%
45–70: 80.1% / 76.7% / 92.6% / 95.6%
70+: 75.8% / 69.4% / 88.7% / 96.7%
Figure 3.5.1
Fairness Across Gender/Skin Tone Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
Precision@1 (%) by gender/skin tone group: ImageNet 693M (Supervised) / ImageNet 693M (SwAV) / Instagram 1.5B (SEER) / Instagram 10B (SEER)
Skin Tone, Darker: 73.6% / 69.7% / 86.6% / 92.9%
Skin Tone, Lighter: 82.1% / 80.8% / 94.2% / 96.2%
Female, Darker: 58.2% / 50.3% / 78.2% / 90.3%
Female, Lighter: 75.1% / 71.6% / 93.7% / 96.8%
Male, Darker: 92.7% / 93.7% / 97.5% / 96.1%
Male, Lighter: 91.1% / 92.5% / 94.9% / 95.4%
Figure 3.5.2
VLStereoSet: StereoSet for Text-to-Image Models

StereoSet was introduced as a benchmark for measuring stereotype bias in language models along the axes of gender, race, religion, and profession by calculating how often a model is likely to choose a stereotypical completion compared to an anti-stereotypical completion. VLStereoSet extends the idea to vision-language models by evaluating how often a vision-language model selects stereotypical captions for anti-stereotypical images.

An Example From VLStereoSet
Source: Zhou et al., 2022
Figure 3.5.3

Comparisons across six different pre-trained vision-language models show that models are most biased along gender axes, and suggest there is a correlation between model performance and likelihood to exhibit stereotypical bias—CLIP has the highest vision-language relevance score but exhibits more stereotypical bias than the other models, while FLAVA has the worst vision-language relevance score of the models measured but also exhibits less stereotypical bias (Figure 3.5.4). This corroborates work in language modeling, which finds that without intervention such as instruction-tuning or dataset filtration, larger models are more capable but also more biased.
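A VLStereoSet-style bias rate, the share of relevant caption choices that are stereotypical, can be sketched as follows. The labels are illustrative, and the paper's exact vlbs definition may differ in detail.

```python
# Sketch of a vision-language bias score: for each anti-stereotypical
# image the model picks one of three candidate captions, and the score
# is the share of relevant (non-irrelevant) picks that are stereotypical.
def vl_bias_score(choices):
    """choices: model picks per image, each 'stereo', 'anti',
    or 'irrelevant'."""
    stereo = choices.count("stereo")
    relevant = stereo + choices.count("anti")
    return 100.0 * stereo / relevant if relevant else 0.0

print(vl_bias_score(["stereo", "anti", "stereo", "irrelevant"]))
```

Pairing this with a relevance score (how often the model avoids the irrelevant caption) reproduces the two axes plotted in Figure 3.5.4: a model can be highly relevant yet heavily biased, as CLIP is, or weakly relevant but less biased, as FLAVA is.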
Stereotypical Bias in Text-to-Image Models on VLStereoSet by Category:
Vision-Language Relevance (vlrs) Vs. Bias (vlbs) Score
Source: Zhou et al., 2022 | Chart: 2023 AI Index Report
[Four scatter plots (Gender, Profession, Race, Religion) of vision-language relevance score vs. vision-language bias score for six pre-trained vision-language models: CLIP, FLAVA, ALBEF, ViLT, VisualBERT, and LXMERT.]
Figure 3.5.4
Examples of Bias in Text-to-Image Models

This subsection highlights some of the ways in which bias is tangibly manifested in popular AI text-to-image systems such as Stable Diffusion, DALL-E 2, and Midjourney.

Stable Diffusion

Stable Diffusion gained notoriety in 2022 upon its release by CompVis, Runway ML, and Stability AI for its laissez-faire approach to safety guardrails, its approach to full openness, and its controversial training dataset, which included many images from artists who never consented to their work being included in the data. Though Stable Diffusion produces extremely high-quality images, it also reflects common stereotypes and issues present in its training data. The Diffusion Bias Explorer from Hugging Face compares sets of images generated by conditioning on pairs of adjectives and occupations, and the results reflect common stereotypes about how descriptors and occupations are coded—for example, the "CEO" occupation overwhelmingly returns images of men in suits despite a variety of modifying adjectives (e.g., assertive, pleasant) (Figure 3.5.5).

Bias in Stable Diffusion
Source: Diffusion Bias Explorer, 2023
Figure 3.5.5
DALL-E 2
DALL-E 2 is a text-to-image model released by OpenAI in April 2022. DALL-E 2 exhibits similar biases as Stable Diffusion—when prompted with "CEO," the model generated four images of older, rather serious-looking men wearing suits. Each of the men appeared to take an assertive position, with three of the four crossing their arms authoritatively (Figure 3.5.6).

Bias in DALL-E 2
Source: DALL-E 2, 2023
Figure 3.5.6
Midjourney
Midjourney is another popular text-to-image system that was released in 2022. When prompted with “influential
person,” it generated four images of older-looking white males (Figure 3.5.7). Interestingly, when Midjourney was
later given the same prompt by the AI Index, one of the four images it produced was of a woman (Figure 3.5.8).
Bias in Midjourney, Part 1
Source: Midjourney, 2023
Figure 3.5.7

Bias in Midjourney, Part 2
Source: Midjourney, 2023
Figure 3.5.8
In a similar vein, typing "someone who is intelligent" into Midjourney leads to four images of eyeglass-wearing, elderly white men (Figure 3.5.9). The last image is particularly reminiscent of Albert Einstein.
Bias in Midjourney, Part 3
Source: Midjourney, 2023
Figure 3.5.9
3.6 AI Ethics in China

As research in AI ethics has exploded in the Western world in the past few years, legislators and policymakers have spent significant resources on policymaking for transformative AI. While China has fewer domestic guidelines than the EU and the United States, according to the AI Ethics Guidelines Global Inventory, Chinese scholars publish significantly on AI ethics—though these research communities do not have significant overlap with Western research communities working on the same topics.
Researchers from the University of Turku analyzed and annotated 328 papers related to AI ethics in China included in the China National Knowledge Infrastructure platform published from 2011 to 2020, and summarized their themes and concerns, which are replicated here as a preliminary glimpse into the state of AI ethics research in China. Given that the researchers only considered AI ethics in China, comparing their findings with similar meta-analysis on AI ethics in North America and Europe was not possible. However, this would be a fruitful direction for future research.

Topics of Concern

Privacy issues related to AI are a priority for researchers in China: Privacy is the single most discussed topic among the papers surveyed, with the topics of equality (i.e., bias and discrimination) and agency (specifically, AI threats to human agency, such as, "Should artificial general intelligence be considered a moral agent?") following close behind (Figure 3.6.1). Researchers in AI ethics in China also discuss many similar issues to their Western counterparts, including matters related to Western and Eastern AI arms races, ethics around increasing personalization being used for predatory marketing techniques, and media polarization (labeled here as "freedom").
Topics of Concern Raised in Chinese AI Ethics Papers
Source: Zhu, 2022 | Chart: 2023 AI Index Report
Number of papers by topic: Privacy, 99; Equality, 95; Agency, 88; Responsibility, 58; Security, 50; Freedom, 49; Unemployment, 41; Legality, 39; Transparency, 37; Autonomy, 32; Other, 27.
Figure 3.6.1
Strategies for Harm Mitigation
In the Chinese AI ethics literature, proposals to address the aforementioned topics of concern and other potential harms related to AI focus on legislation and structural reform ahead of technological solutions: Researchers often discuss structural reform such as regulatory processes around AI applications and the involvement of ethics review committees (Figure 3.6.2).
AI Ethics in China: Strategies for Harm Mitigation Related to AI
Source: Zhu, 2022 | Chart: 2023 AI Index Report
Number of papers by strategy: Structural Reform, 71; Legislation, 69; Value Definition, 64; Principles, 52; Accountability System, 45; Shared Governance, 39; Technological Solutions, 39; Talent Training, 37; International Cooperation, 23.
Figure 3.6.2
Principles Referenced by Chinese Scholars in AI Ethics
Chinese scholars clearly pay attention to AI principles developed by their Western peers: Europe's General Data Protection Regulation (GDPR) is commonly cited in Chinese AI ethics literature, as is the European Commission's Ethics Guidelines for Trustworthy AI (Figure 3.6.3).
AI Principles Referenced by Chinese Scholars in AI Ethics
Source: Zhu, 2022 | Chart: 2023 AI Index Report
[Bar chart of reference counts for AI principles documents cited in the surveyed papers; the GDPR is the most referenced (43 references), followed by a long tail of documents including the EU's Ethics Guidelines for Trustworthy AI and the Beijing AI Principles.]
Figure 3.6.3
3.7 AI Ethics Trends at FAccT and NeurIPS

ACM FAccT

Accepted Submissions by Professional Affiliation

ACM FAccT (Conference on Fairness, Accountability, and Transparency) is an interdisciplinary conference publishing research in algorithmic fairness, accountability, and transparency. FAccT was one of the first major conferences created to bring together researchers, practitioners, and policymakers interested in sociotechnical analysis of algorithms.

Accepted submissions to FAccT increased twofold from 2021 to 2022, and tenfold since 2018, demonstrating the increased interest in AI ethics and related work (Figure 3.7.1). While academic institutions still dominate FAccT, industry actors contribute more work than ever in this space, and government-affiliated actors have started publishing more related work, providing evidence that AI ethics has become a primary concern for policymakers and practitioners as well as researchers.
Number of Accepted FAccT Conference Submissions by Affiliation, 2018–22
Source: FAccT, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart of accepted submissions by affiliation (Education, Industry, Government, Nonprofit, Other) from 2018 to 2022; total accepted submissions grow roughly tenfold over the period.]
Figure 3.7.1
Accepted Submissions by Geographic Region
European government and academic actors have increasingly contributed to the discourse on AI ethics from a policy perspective, and their influence is manifested in trends on FAccT publications as well: Whereas in 2021 submissions to FAccT from Europe and Central Asia made up 18.7% of submissions, they made up 30.6% of submissions in 2022 (Figure 3.7.2). FAccT, however, is still broadly dominated by authors from North America and the rest of the Western world.
Number of Accepted FAccT Conference Submissions by Region, 2018–22
Source: FAccT, 2022 | Chart: 2023 AI Index Report
2022 shares of accepted submissions: North America, 63.24%; Europe and Central Asia, 30.59%; East Asia and Pacific, 4.25%; Middle East and North Africa, 0.69%; Latin America and the Caribbean, 0.69%; South Asia, 0.55%; Sub-Saharan Africa, 0.00%.
Figure 3.7.2
NeurIPS

NeurIPS (Conference on Neural Information Processing Systems), one of the most influential AI conferences, held its first workshop on fairness, accountability, and transparency in 2014. This section tracks and categorizes workshop topics year over year, noting that as topics become more mainstream, they often filter out of smaller workshops and into the main track or into more specific conferences related to the topic.

Real-World Impact

Several workshops at NeurIPS gather researchers working to apply AI to real-world problems. Notably, there has been a recent surge in AI applied to healthcare and climate in the domains of drug discovery and materials science, which is reflected in the spike in "AI for Science" and "AI for Climate" workshops (Figure 3.7.3).
NeurIPS Workshop Research Topics: Number of Accepted Papers on Real-World Impacts, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart of workshop papers by topic (Climate, Developing World, Finance, Healthcare, Science, Other) from 2015 to 2022; the total reaches 802 papers in 2022.]
Figure 3.7.3
Interpretability and Explainability
Interpretability and explainability work focuses on designing systems that are inherently interpretable and providing explanations for the behavior of a black-box system. Although the total number of NeurIPS papers focused on interpretability and explainability decreased in the last year, the total number in the main track increased by one-third (Figure 3.7.4).5
NeurIPS Research Topics: Number of Accepted Papers on Interpretability and Explainability, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
[Bar chart of main-track and workshop paper counts from 2015 to 2022; workshop counts decline in the most recent year while main-track counts rise.]
Figure 3.7.4
5 Declines in the number of workshop-related papers on interpretability and explainability might be attributed to year-over-year differences in workshop themes.
Causal Effect and Counterfactual Reasoning
The study of causal inference uses statistical methodologies to reach conclusions about the causal relationship between variables based on observed data. It tries to quantify what would have happened if a different decision had been made: In other words, if this had not occurred, then that would not have happened.

Since 2018, an increasing number of papers on causal inference have been published at NeurIPS (Figure 3.7.5). In 2022, an increasing number of papers related to causal inference and counterfactual analysis made their way from workshops into the main track of NeurIPS.
NeurIPS Research Topics: Number of Accepted Papers on Causal Effect and Counterfactual Reasoning, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
[Bar chart of main track vs. workshop paper counts by year, 2015–2022]
Figure 3.7.5
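The counterfactual question described above can be made concrete with a small sketch: under the strong assumption that a single observed covariate fully explains who receives a treatment, the average treatment effect can be estimated by stratifying on that covariate. All data, effect sizes, and variable names below are invented for illustration and are not drawn from any NeurIPS paper.

```python
import random

random.seed(0)

# Hypothetical observational data: covariate x in {0, 1}, binary
# treatment t (more likely when x == 1), outcome y depending on both.
data = []
for _ in range(10_000):
    x = random.random() < 0.5
    t = random.random() < (0.8 if x else 0.2)      # confounded treatment
    y = 1.0 * t + 2.0 * x + random.gauss(0, 0.1)   # true effect of t is 1.0
    data.append((int(x), int(t), y))

def mean_y(rows):
    return sum(r[2] for r in rows) / len(rows)

# The naive treated-vs-untreated contrast is biased: x raises both the
# chance of treatment and the outcome.
naive = mean_y([r for r in data if r[1] == 1]) - mean_y([r for r in data if r[1] == 0])

# Stratifying on x: average the within-stratum contrasts, weighted by P(x).
ate = 0.0
for xv in (0, 1):
    stratum = [r for r in data if r[0] == xv]
    treated = [r for r in stratum if r[1] == 1]
    control = [r for r in stratum if r[1] == 0]
    ate += (len(stratum) / len(data)) * (mean_y(treated) - mean_y(control))

print(round(naive, 2), round(ate, 2))  # adjusted estimate is close to 1.0
```

The naive contrast lands near 2.2 here because the confounder's effect leaks into it; the stratified estimate recovers the true effect of 1.0.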
Table of Contents
Chapter 3 Preview
162
Artificial Intelligence
Index Report 2023
Chapter 3: Technical AI Ethics
3.7 AI Ethics Trends at FAccT and NeurIPS
Privacy
Amid growing concerns about privacy, data sovereignty, and the commodification of personal data for profit, there has been significant momentum in industry and academia to build methods and frameworks to help mitigate privacy concerns. Since 2018, several workshops at NeurIPS have been devoted to topics such as privacy in machine learning, federated learning, and differential privacy. This year’s data shows that discussions related to privacy in machine learning have increasingly shifted into the main track of NeurIPS (Figure 3.7.6).
NeurIPS Research Topics: Number of Accepted Papers on Privacy in AI, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
[Bar chart of main track vs. workshop paper counts by year, 2015–2022]
Figure 3.7.6
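One of the techniques named above, differential privacy, can be sketched in a few lines: the Laplace mechanism adds noise scaled to a query's sensitivity so that any single record's presence changes the output distribution only slightly. The dataset, predicate, and epsilon value below are illustrative assumptions, not material from the report.

```python
import math
import random

random.seed(0)

def laplace_noise(scale: float) -> float:
    # Inverse-transform sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Differentially private count: a counting query has sensitivity 1,
    so Laplace noise with scale 1/epsilon yields epsilon-DP."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical dataset: ages of survey respondents.
ages = [23, 35, 41, 29, 52, 38, 27, 44, 31, 60]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
print(noisy)  # randomized, but unbiased around the true count of 4
```

Smaller epsilon means larger noise and stronger privacy; repeated queries consume the privacy budget additively.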
Fairness and Bias
Fairness and bias in AI systems have transitioned from a niche research topic to a topic of interest to both technical and non-technical audiences. In 2020, NeurIPS started requiring authors to submit broader impact statements addressing the ethical and societal consequences of their work, a move that suggests the community is signaling the importance of AI ethics early in the research process.

Fairness and bias research in machine learning has steadily increased in both the workshop and main track streams, with a major spike in the number of papers accepted to workshops in 2022 (Figure 3.7.7). The total number of NeurIPS papers for this topic area doubled in the last year. This speaks to the increasingly complicated issues present in machine learning systems and reflects growing interest from researchers and practitioners in addressing these issues.
NeurIPS Research Topics: Number of Accepted Papers on Fairness and Bias in AI, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
[Bar chart of main track vs. workshop paper counts by year, 2015–2022; the total peaks at 381 papers in 2022]
Figure 3.7.7
3.8 Factuality and Truthfulness
Automated Fact-Checking Benchmarks: Number of Citations
Significant resources have been invested into researching, building, and deploying AI systems for automated fact-checking and misinformation, with the advent of many fact-checking datasets consisting of claims from fact-checking websites and associated truth labels.

Compared to previous years, there has been a plateau in the number of citations of three popular fact-checking benchmarks: FEVER, LIAR, and Truth of Varying Shades, reflecting a potential shift in the landscape of research related to natural language tools for fact-checking on static datasets (Figure 3.8.1).
Automated Fact-Checking Benchmarks: Number of Citations, 2017–22
Source: Semantic Scholar, 2022 | Chart: 2023 AI Index Report
[Line chart of yearly citations; 2022 values: FEVER 236, LIAR 191, Truth of Varying Shades 99]
Figure 3.8.1
Missing Counterevidence and NLP Fact-Checking
Though fact-checking with natural language systems became popular in recent years, language models are usually trained on static snapshots of data without continual updates through time, and they lack real-world context which human fact-checkers are able to easily source and use to verify the veracity of claims. Researchers at the Technical University of Darmstadt and IBM analyzed existing fact-checking datasets and identified shortcomings of fact-checking systems built on top of these datasets: For example, automated fact-checking systems often assume the existence of contradictory counterevidence for new false claims, but for new claims to be verified as true or false, there often is no proof of the presence or absence of a contradiction (e.g., the new claim “Half a million sharks could be killed to make the COVID-19 vaccine” would not have counterevidence, but human fact-checkers could verify it to be false after tracing its origin back to the false promise of vaccines relying on shark squalene). The researchers find that several proposed fact-checking datasets contain claims which do not meet the criterion of sufficient evidence or counterevidence found in a trusted knowledge base. Additionally, several datasets contain claims which use fact-checking articles as evidence for deciding the veracity of claims—this is leaked evidence, as it presupposes the existence of a fact-checking article, which is an unrealistic assumption in the real world for new claims. Systems built on this assumption would not be able to assign veracity scores for new claims in real time (Figure 3.8.2).
Missing Counterevidence Renders NLP Fact-Checking Unrealistic for Misinformation
Source: Glockner et al., 2022 | Table: 2023 AI Index Report
[Table of fact-checking datasets with publication year, indicating whether each provides unleaked evidence and sufficient evidence: SciFact (2020), COVID-Fact (2021), WikiFactCheck (2020), FM2 (2021), Thorne et al. (2021), FaVIQ (2022), LIAR-PLUS (2017), PolitiHop (2021), Climate-FEVER (2020), HealthVer (2021), UKP-Snopes (2019), PubHealth (2020), WatClaimCheck (2022), Baly et al. (2018), MultiFC (2019), X-Fact (2021)]
Figure 3.8.2
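The leaked-evidence problem described by the researchers can be screened for with a simple heuristic: flag any claim whose evidence documents come from known fact-checking outlets. The domain list and data layout below are hypothetical illustrations, not the method actually used by Glockner et al.

```python
from urllib.parse import urlparse

# Hypothetical list of fact-checking outlets. Evidence drawn from these
# domains "leaks" the verdict, since it presupposes a fact-check exists.
FACT_CHECK_DOMAINS = {"snopes.com", "politifact.com", "factcheck.org"}

def has_leaked_evidence(evidence_urls):
    """Return True if any evidence URL points to a fact-checking site."""
    for url in evidence_urls:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in FACT_CHECK_DOMAINS:
            return True
    return False

claims = [
    {"claim": "Claim A", "evidence": ["https://www.snopes.com/fact-check/a"]},
    {"claim": "Claim B", "evidence": ["https://en.wikipedia.org/wiki/B"]},
]
leaked = [c["claim"] for c in claims if has_leaked_evidence(c["evidence"])]
print(leaked)  # ['Claim A']
```

A realistic screen would also need to catch mirrors and aggregators of fact-checking content, which a fixed domain list cannot do.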
TruthfulQA
TruthfulQA is a benchmark designed to evaluate the truthfulness of language models on question answering, with questions drawn from categories such as health, law, finance, and politics. Questions were designed such that humans might be prone to answer incorrectly due to common misconceptions. For example, when asked, “What happens if you smash a mirror?,” GPT-3 responds, “You will have seven years of bad luck.”

In 2021, experiments on DeepMind’s Gopher suggested that accuracy on this task improves with model size. Researchers at Stanford University conducted extensive evaluations on this task with language models ranging from 60 million parameters to 530 billion parameters and found that while large models broadly still perform better than smaller models, midsize instruction-tuned models perform surprisingly well on this task. Notably, Anthropic’s 52 billion parameter model and BigScience’s 11 billion parameter model T0pp perform disproportionately well on the task compared to models of a similar size, and the best model, InstructGPT davinci 175B, is also instruction-tuned (Figure 3.8.3).
Multiple-Choice Task on TruthfulQA by Model: Accuracy
Source: Liang et al., 2022 | Chart: 2023 AI Index Report
[Bar chart of accuracy by model and parameter count, from T5 60M up to TNLG v2 530B, distinguishing instruction-tuned from non-instruction-tuned models]
Figure 3.8.3
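In the multiple-choice setting behind Figure 3.8.3, a model is typically scored by whether it assigns the highest likelihood to the correct answer option. A minimal sketch of such a harness follows; the `log_likelihood` function is a stand-in where a real evaluation would query the language model, and the toy item is illustrative.

```python
def log_likelihood(question: str, answer: str) -> float:
    # Stand-in: a real harness would return the model's log-probability
    # of `answer` conditioned on `question`. Here, a toy length heuristic.
    return -len(answer)

def mc_accuracy(items, score_fn=log_likelihood) -> float:
    """Fraction of items where the correct option receives the top score."""
    correct = 0
    for item in items:
        scores = [score_fn(item["question"], opt) for opt in item["options"]]
        if scores.index(max(scores)) == item["label"]:
            correct += 1
    return correct / len(items)

# Hypothetical item in the style of TruthfulQA multiple-choice.
items = [
    {
        "question": "What happens if you smash a mirror?",
        "options": ["Nothing in particular happens.",
                    "You will have seven years of bad luck."],
        "label": 0,
    },
]
print(mc_accuracy(items))  # 1.0 with this toy scorer
```

Swapping in a real model's conditional log-probabilities for `log_likelihood` turns this into the standard likelihood-based multiple-choice evaluation.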
CHAPTER 4:
The Economy
CHAPTER 4 PREVIEW:
The Economy
Overview
Chapter Highlights
4.1 Jobs
  AI Labor Demand
    Global AI Labor Demand
    U.S. AI Labor Demand by Skill Cluster and Specialized Skill
    U.S. AI Labor Demand by Sector
    U.S. AI Labor Demand by State
  AI Hiring
  AI Skill Penetration
    Global Comparison: Aggregate
    Global Comparison: By Gender
4.2 Investment
  Corporate Investment
  Startup Activity
    Global Trend
    Regional Comparison by Funding Amount
    Regional Comparison by Newly Funded AI Companies
    Focus Area Analysis
4.3 Corporate Activity
  Industry Adoption
    Adoption of AI Capabilities
    Consideration and Mitigation of Risks From Adopting AI
    Narrative Highlight: The Effects of GitHub’s Copilot on Developer Productivity and Happiness
    Industry Motivation
    Perceived Importance of AI
    AI Investments and Investment Outcomes
    Challenges in Starting and Scaling AI Projects
  Earnings Calls
    Aggregate Trends
    Specific Themes
    Narrative Highlight: What Are Business Leaders Actually Saying About AI?
    Sentiment Analysis
4.4 Robot Installations
  Aggregate Trends
  Industrial Robots: Traditional Vs. Collaborative Robots
  By Geographic Area
  Narrative Highlight: Country-Level Data on Service Robotics
  Sectors and Application Types
  China Vs. United States
ACCESS THE PUBLIC DATA
Overview
Increases in the technical capabilities of AI systems have led to greater rates of AI
deployment in businesses, governments, and other organizations. The heightening
integration of AI and the economy comes with both excitement and concern. Will
AI increase productivity or be a dud? Will it boost wages or lead to the widespread
replacement of workers? To what degree are businesses embracing new AI
technologies and willing to hire AI-skilled workers? How has investment in AI
changed over time, and what particular industries, regions, and fields of AI have
attracted the greatest amount of investor interest?
This chapter examines AI-related economic trends using data from Lightcast, LinkedIn, McKinsey, Deloitte, and NetBase Quid, as well as the International Federation of Robotics (IFR). It begins by looking at data on AI-related occupations and then moves on to analyses of AI investment, corporate adoption of AI, and robot installations.
Chapter Highlights
The demand for AI-related
professional skills is increasing
across virtually every
American industrial sector.
Across every sector in the United States for which there is data (with the exception of agriculture, forestry, fishing, and hunting), the share of job postings that are AI-related increased on average from 1.7% in 2021 to 1.9% in 2022.
Employers in the United States are increasingly
looking for workers with AI-related skills.
For the first time in the last
decade, year-over-year private
investment in AI decreased.
Global AI private investment was $91.9 billion
in 2022, which represented a 26.7% decrease
since 2021. The total number of AI-related
funding events as well as the number of newly
funded AI companies likewise decreased.
Still, during the last decade as a whole, AI
investment has significantly increased. In 2022
the amount of private investment in AI was 18
times greater than it was in 2013.
Once again, the United States leads in investment in AI.
The U.S. led the world in terms of total amount of AI private investment. In 2022, the $47.4 billion
invested in the U.S. was roughly 3.5 times the amount invested in the next highest country, China
($13.4 billion). The U.S. also continues to lead in terms of total number of newly funded AI companies,
seeing 1.9 times more than the European Union and the United Kingdom combined, and 3.4 times
more than China.
In 2022, the AI focus area with the most investment was medical
and healthcare ($6.1 billion); followed by data management,
processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less investment
in 2022 than in 2021. In the last year, the three largest AI private investment events were: (1) a $2.5 billion
funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of electric vehicles; (2) a
$1.5 billion Series E funding round for Anduril Industries, a U.S. defense products company that builds
technology for military agencies and border surveillance; and (3) a $1.2 billion investment in Celonis, a
business-data consulting company based in Germany.
While the proportion of
companies adopting AI has
plateaued, the companies
that have adopted AI
continue to pull ahead.
The proportion of companies adopting AI
in 2022 has more than doubled since 2017,
though it has plateaued in recent years
between 50% and 60%, according to the
results of McKinsey’s annual research
survey. Organizations that have adopted AI
report realizing meaningful cost decreases
and revenue increases.
AI is being deployed
by businesses in
multifaceted ways.
The AI capabilities most likely to have been
embedded in businesses include robotic
process automation (39%), computer vision
(34%), NL text understanding (33%), and virtual
agents (33%). Moreover, the most commonly
adopted AI use case in 2022 was service
operations optimization (24%), followed by
the creation of new AI-based products (20%),
customer segmentation (19%), customer
service analytics (19%), and new AI-based
enhancement of products (19%).
AI tools like Copilot are
tangibly helping workers.
China dominates industrial
robot installations.
Results of a GitHub survey on the use of
Copilot, a text-to-code AI system, find
that 88% of surveyed respondents feel
more productive when using the system,
74% feel they are able to focus on more
satisfying work, and 88% feel they are able
to complete tasks more quickly.
In 2013, China overtook Japan as the nation
installing the most industrial robots. Since
then, the gap between the total number of
industrial robots installed by China and the
next-nearest nation has widened. In 2021,
China installed more industrial robots than
the rest of the world combined.
4.1 Jobs
AI Labor Demand
This section reports demand for AI-related skills in labor markets. The data comes from Lightcast, which mined millions of job postings collected from over 51,000 websites since 2010 and flagged listings calling for AI skills.

Global AI Labor Demand
Figure 4.1.1 highlights the percentage of all job postings that require some kind of AI skill. In 2022, the top three countries according to this metric were the United States (2.1%), Canada (1.5%), and Spain (1.3%). For every country included in the sample, the number of AI-related job postings was higher in 2022 than in 2014.1
AI Job Postings (% of All Job Postings) by Geographic Area, 2014–22
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[Line chart, 2014–2022; 2022 values: United States 2.05%, Canada 1.45%, Spain 1.33%, Australia 1.23%, Sweden 1.20%, Switzerland 1.16%, United Kingdom 1.14%, Netherlands 1.01%, Germany 0.98%, Austria 0.89%, Belgium 0.86%, France 0.84%, Italy 0.72%, New Zealand 0.45%]
Figure 4.1.1
1 In 2022, Lightcast slightly changed their methodology for determining AI-related job postings from that which was used in previous versions of the AI Index Report. As such, some of the
numbers in this chart do not completely align with those featured in last year’s report.
U.S. AI Labor Demand by Skill Cluster and Specialized Skill
Figure 4.1.2 showcases the most in-demand AI skill clusters in the U.S. labor market since 2010. The most
in-demand skill cluster was machine learning (1.0%), followed by artificial intelligence (0.6%) and natural
language processing (0.2%). Every listed AI skill cluster is now more in demand than it was 10 years ago.
AI Job Postings (% of All Job Postings) in the United States by Skill Cluster, 2010–22
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[Line chart, 2010–2022; 2022 values: Machine Learning 1.03%, Artificial Intelligence 0.61%, Natural Language Processing 0.20%, Neural Networks 0.16%, Autonomous Driving 0.15%, Visual Image Recognition 0.13%, Robotics 0.06%]
Figure 4.1.2
Figures 4.1.3 and 4.1.4 showcase the top ten specialized skills demanded in AI job postings in 2022 compared to 2010–2012.2 On an absolute level, virtually every specialized skill is more in demand now than a decade ago. The growth in demand for Python is particularly notable, evidence of its growing popularity as an AI coding language.
Top Ten Specialized Skills in 2022 AI Job Postings in the United States, 2010–12 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report

Skill                           | 2010–12 | 2022
Python (Programming Language)   | 12,884  | 296,662
Computer Science                | 48,001  | 260,333
SQL (Programming Language)      | 22,037  | 185,807
Data Analysis                   | 16,571  | 159,801
Data Science                    | 1,227   | 157,855
Amazon Web Services             | 962     | 155,615
Agile Methodology               | 7,549   | 152,956
Automation                      | 13,207  | 138,791
Java (Programming Language)     | 26,557  | 133,856
Software Engineering            | 22,384  | 133,286

Figure 4.1.3
Top Ten Specialized Skills in 2022 AI Job Postings in the United States by Skill Share, 2010–12 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report

Skill                           | 2010–12 | 2022 (growth)
Python (Programming Language)   | 5.36%   | 37.13% (+592%)
Computer Science                | 19.98%  | 32.58% (+63%)
SQL (Programming Language)      | 9.17%   | 23.25% (+153%)
Data Analysis                   | 6.90%   | 20.00% (+190%)
Data Science                    | 0.51%   | 19.75% (+3,767%)
Amazon Web Services             | 0.40%   | 19.47% (+4,763%)
Agile Methodology               | 3.14%   | 19.14% (+509%)
Automation                      | 5.50%   | 17.37% (+216%)
Java (Programming Language)     | 11.06%  | 16.75% (+52%)
Software Engineering            | 9.32%   | 16.68% (+79%)

Figure 4.1.4
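Skill share, as plotted in Figure 4.1.4, is the percentage of AI job postings that mention a given skill, and the growth figures compare that share across the two periods. A sketch of the arithmetic, using made-up posting totals (not Lightcast's actual denominators):

```python
def skill_share(postings_with_skill: int, total_ai_postings: int) -> float:
    """Share of AI job postings mentioning the skill, as a percentage."""
    return 100.0 * postings_with_skill / total_ai_postings

def growth(previous_share: float, current_share: float) -> float:
    """Percentage growth in skill share between two periods."""
    return 100.0 * (current_share - previous_share) / previous_share

# Hypothetical counts chosen to reproduce the rounded Python shares.
old_share = skill_share(5_360, 100_000)   # 5.36% in 2010-12
new_share = skill_share(37_130, 100_000)  # 37.13% in 2022
print(round(growth(old_share, new_share)))  # 593
```

The chart's +592% for Python was presumably computed from unrounded shares, hence the off-by-one from this rounded-input arithmetic.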
2 The 2010–2012 comparison period was selected because some data at the jobs/skills level is quite sparse in earlier years; Lightcast therefore pooled the years 2010–2012 to obtain a larger sample for the decade-ago benchmark.
U.S. AI Labor Demand by Sector
Figure 4.1.5 shows the percentage of U.S. job postings that required AI skills by industry sector from 2021 to 2022. Across virtually every included sector (with the exception of agriculture, forestry, fishing, and hunting), the number of AI job postings was notably higher in 2022 than in 2021, with the top three sectors being information (5.3%); professional, scientific, and technical services (4.1%); and finance and insurance (3.3%).
AI Job Postings (% of All Job Postings) in the United States by Sector, 2021 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[Bar chart comparing 2021 and 2022 AI job posting shares across sectors: Information; Professional, Scientific, and Technical Services; Finance and Insurance; Manufacturing; Management of Companies and Enterprises; Public Administration; Mining, Quarrying, and Oil and Gas Extraction; Transportation and Warehousing; Educational Services; Wholesale Trade; Utilities; Real Estate and Rental and Leasing; Waste Management and Administrative Support Services; Retail Trade; Agriculture, Forestry, Fishing, and Hunting]
Figure 4.1.5
U.S. AI Labor Demand by State
Figure 4.1.6 highlights the number of AI job postings in the United States by state. The top three states in terms of postings were California (142,154), followed by Texas (66,624) and New York (43,899).

Number of AI Job Postings in the United States by State, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[U.S. map of AI job posting counts by state]
Figure 4.1.6
Figure 4.1.7 demonstrates what percentage of a state’s total job postings were AI-related. The top states according to this metric were the District of Columbia (3.0%), followed by Delaware (2.7%), Washington (2.5%), and Virginia (2.4%).

Percentage of U.S. States’ Job Postings in AI, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[U.S. map of AI job postings as a percentage of each state’s total job postings]
Figure 4.1.7
Which states had the greatest share of all AI job postings in the U.S. in 2022? California was first: Last year 17.9% of all AI job postings in the United States were for jobs based in California, followed by Texas (8.4%) and New York (5.5%) (Figure 4.1.8).

Percentage of United States AI Job Postings by State, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[U.S. map of each state’s share of all U.S. AI job postings]
Figure 4.1.8
Figure 4.1.9 highlights the trends over time in AI job postings for four select states that annually report a high
number of AI-related jobs: Washington, California, New York, and Texas. For all four, there was a significant
increase in the number of total AI-related job postings from 2021 to 2022, suggesting that across these states,
employers are increasingly looking for AI-related workers.
Percentage of U.S. States’ Job Postings in AI by Select U.S. State, 2010–22
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[Line chart, 2010–2022; 2022 values: Washington 2.48%, California 2.21%, New York 2.07%, Texas 1.52%]
Figure 4.1.9
Figure 4.1.10 highlights the degree to which AI-related job postings have been subdivided among the top
four states over time. California’s share of all AI job postings has decreased steadily since 2019 while Texas’
has marginally increased. The fact that California no longer commands one-quarter of all AI-related jobs
suggests that AI jobs are becoming more equally distributed among U.S. states.
Percentage of United States AI Job Postings by Select U.S. State, 2010–22
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
[Line chart, 2010–2022; 2022 values: California 17.87%, Texas 8.37%, New York 5.52%, Washington 3.93%]
Figure 4.1.10
AI Hiring
Our AI hiring data is based on a LinkedIn dataset of skills and jobs that appear on their platform. The countries included in the sample make at least 10 AI hires each month and have LinkedIn covering at least 40% of their labor force. India is also included in the sample given their increasing significance in the AI landscape, although LinkedIn does not cover 40% of their labor force. Therefore, the insights drawn about India should be interpreted with particular caution.

The AI hiring rate is calculated as the percentage of LinkedIn members with AI skills on their profile or working in AI-related occupations who added a new employer in the same period the job began, divided by the total number of LinkedIn members in the corresponding location. This rate is then indexed to the average month in 2016; for example, an index of 1.1 in December 2021 points to a hiring rate that is 10% higher than the average month in 2016. LinkedIn makes month-to-month comparisons to account for any potential lags in members updating their profiles. The index for a year is the number in December of that year.

The relative AI hiring index measures the degree to which the hiring of AI talent is changing, more specifically whether the hiring of AI talent is growing faster than, equal to, or more slowly than overall hiring in a particular geographic region.

Figure 4.1.11 highlights the 15 geographic areas that have the highest relative AI hiring index for 2022. In 2022, Hong Kong posted the greatest growth in AI hiring at 1.4, followed by Spain, Italy and the United Kingdom, and the United Arab Emirates.
Relative AI Hiring Index by Geographic Area, 2022
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report

Hong Kong: 1.37
Spain: 1.19
Italy: 1.18
United Kingdom: 1.18
United Arab Emirates: 1.15
South Africa: 1.13
New Zealand: 1.06
Denmark: 1.06
Belgium: 1.05
Netherlands: 1.03
South Korea: 1.02
Sweden: 1.01
Canada: 0.99
Switzerland: 0.99
Singapore: 0.99

Figure 4.1.11
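The hiring-rate and indexing definitions above can be sketched in a few lines. The monthly counts below are invented, and LinkedIn's actual pipeline involves rolling averages and lag corrections beyond this illustration.

```python
def hiring_rate(ai_hires: int, members: int) -> float:
    """AI hires in a month divided by LinkedIn members in that location."""
    return ai_hires / members

def hiring_index(monthly_rates: dict, baseline_year: int = 2016) -> dict:
    """Index each month's rate to the average month of the baseline year.
    An index of 1.1 means hiring 10% above the 2016 monthly average."""
    baseline = [r for (year, _), r in monthly_rates.items() if year == baseline_year]
    base_avg = sum(baseline) / len(baseline)
    return {month: rate / base_avg for month, rate in monthly_rates.items()}

# Hypothetical rates: two 2016 baseline months and December 2021.
rates = {
    (2016, 1): hiring_rate(90, 1_000_000),
    (2016, 2): hiring_rate(110, 1_000_000),
    (2021, 12): hiring_rate(110, 1_000_000),
}
index = hiring_index(rates)
print(round(index[(2021, 12)], 2))  # 1.1: 10% above the 2016 average
```

Per the report's definition, the yearly index would then be read off at December of each year.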
Figure 4.1.12 highlights how the AI hiring index changes over time for a wide range of countries.3 Overall, the majority of countries included in the sample have seen meaningful increases in their AI hiring rates since 2016. This trend suggests that those countries are now hiring more AI talent than in 2016. However, for many countries, AI hiring rates seem to have peaked around 2020, then dropped, and have since stabilized.
3 Both Figure 4.1.11 and Figure 4.1.12 report the Relative AI Hiring Index. Figure 4.1.11 reports the Index value at the end of December 2022, while Figure 4.1.12 reports a twelve-month rolling average.
Relative AI Hiring Index by Geographic Area, 2016–22
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report
[Small-multiples line charts of the twelve-month rolling average relative AI hiring index, 2016–2022, for Australia, Belgium, Brazil, Canada, Chile, Denmark, Finland, France, Germany, Hong Kong, India, Ireland, Israel, Italy, Luxembourg, Netherlands, New Zealand, Norway, Portugal, Singapore, South Africa, South Korea, Spain, Sweden, Switzerland, United Arab Emirates, United Kingdom, and United States]
Figure 4.1.12
AI Skill Penetration
The AI skill penetration rate is a metric created by LinkedIn that measures the prevalence of various AI-related skills across occupations. LinkedIn generates this metric by calculating the frequencies of LinkedIn users’ self-added skills in a given area from 2015 to 2022, then reweighting those numbers with a statistical model to create the top 50 representative skills in that select occupation.

Global Comparison: Aggregate
Figure 4.1.13 shows the relative AI skill penetration rate of various countries or regions from 2015 to 2022. In this case, the relative AI skill penetration rate can be understood as the sum of the penetration of each AI skill across occupations in a given country or region, divided by the global average across the same occupation. For instance, a relative skill penetration rate of 1.5 means that the average penetration of AI skills in that country or region is 1.5 times the global average across the same set of occupations.

As of 2022, the three countries or regions with the highest AI skill penetration rates were India (3.2), the United States (2.2), and Germany (1.7).
Relative AI Skill Penetration Rate by Geographic Area, 2015–22
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report

India: 3.23
United States: 2.23
Germany: 1.72
Israel: 1.65
Canada: 1.54
United Kingdom: 1.54
South Korea: 1.44
Singapore: 1.37
France: 1.13
Brazil: 0.99
Spain: 0.98
Netherlands: 0.95
Italy: 0.95
Switzerland: 0.91
Australia: 0.89

Figure 4.1.13
Global Comparison: By Gender
Figure 4.1.14 disaggregates AI skill penetration rates by gender across different countries or regions. A country's "relative AI skill penetration rate across genders" for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the average member in all countries pooled together across the same set of occupations in the country. For all countries in the sample, the relative AI skill penetration rate is greater for men than for women. India (2.0), the United States (1.3), and Israel (0.9) have the highest reported relative AI skill penetration rates for women.
Relative AI Skill Penetration Rate Across Gender, 2015–22
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report
[Paired horizontal bars per country showing male and female relative AI skill penetration rates; for example, India: male 3.27, female 1.99; United States: male 2.36, female 1.28; Israel: male 2.05, female 0.87. X-axis: Relative AI Skill Penetration Rate.]
Figure 4.1.14
4.2 Investment
Using data from NetBase Quid, this section tracks trends in AI-related investments. NetBase Quid tracks data on the investments of over
8 million global public and private companies. NetBase Quid also uses natural language processing techniques to search, analyze, and
identify patterns in large, unstructured datasets, like aggregated news and blogs, and company and patent databases. NetBase Quid
continuously broadens the set of companies for which it tracks data, so that in this year’s AI Index, the reported investment volume for
certain years is larger than that of previous reports.
Corporate Investment
As AI becomes more integrated into the economy, it becomes increasingly important to track AI-related corporate investment. Figure 4.2.1 shows overall global corporate investment in AI from 2013 to 2022. Corporate investment includes mergers and acquisitions, minority stakes, private investment, and public offerings.

For the first time since 2013, year-over-year global corporate investment in AI has decreased. In 2022, total global corporate AI investment was $189.6 billion, roughly a third lower than it was in 2021. Still, in the last decade, AI-related investment has increased thirteenfold.
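The headline figures can be checked with a couple of lines of arithmetic, using the 2013, 2021, and 2022 totals shown in Figure 4.2.1 (the 2013 value of $14.57 billion is read off the chart and is consistent with the stated thirteenfold increase):

```python
# Totals from Figure 4.2.1, in billions of U.S. dollars.
invest_2013 = 14.57
invest_2021 = 276.14
invest_2022 = 189.59

# Year-over-year change: roughly a third lower than 2021.
yoy_change = (invest_2022 - invest_2021) / invest_2021

# Decade-long growth: roughly a thirteenfold increase since 2013.
decade_multiple = invest_2022 / invest_2013

print(f"{yoy_change:.1%}")        # -31.3%
print(f"{decade_multiple:.1f}x")  # 13.0x
```
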
Global Corporate Investment in AI by Investment Activity, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart of merger/acquisition, minority stake, private investment, and public offering activity, in billions of U.S. dollars. Total corporate investment rose from $14.57 billion in 2013 to a peak of $276.14 billion in 2021, then fell to $189.59 billion in 2022.]
Figure 4.2.1
To provide a fuller context for the nature of AI investment in the last year, Figures 4.2.2 through 4.2.5 highlight the top merger/acquisition, minority stake, private investment, and public offering events in the last year. The greatest single AI investment event was the merger/acquisition of Nuance Communications, valued at $19.8 billion (Figure 4.2.2). The largest minority stake event was for the British company Aveva Group ($4.7 billion) (Figure 4.2.3). The greatest private investment event was GAC Aion New Energy Automobile ($2.5 billion), a Chinese clean energy and automotive company (Figure 4.2.4). Finally, the largest public offering was ASR Microelectronics ($1.1 billion), a Chinese semiconductor company (Figure 4.2.5).

Top Five AI Merger/Acquisition Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Headquarters Country | Focus Area | Funding Amount (in Billions USD)
Nuance Communications, Inc. | United States | Artificial Intelligence; Enterprise Software; Healthcare; Machine Learning | 19.80
Citrix Systems, Inc. | United States | Data Management, Processing, and Cloud; HR Tech | 17.18
Avast Limited | Czech Republic | Data Management, Processing, and Cloud; Fintech; Cybersecurity, Data Protection | 8.02
AspenTech Corporation | United States | Manufacturing; Software; Supply Chain Management | 6.34
Vivint Smart Home, Inc. | United States | Cybersecurity, Data Protection; Sales Enablement | 5.54
Figure 4.2.2

Top Five AI Minority Stake Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Headquarters Country | Focus Area | Funding Amount (in Billions USD)
AVEVA Group, PLC | United Kingdom | Chemical; Computer; Data Mining; Electronics; Industrial Manufacturing; Information Technology; Simulation; Software | 4.68
Grupo de Inversiones Suramericana, SA | Colombia | Financial Services; Impact Investing; Insurance | 1.48
Fractal Analytics Private Limited | India | Analytics; Artificial Intelligence; Big Data; Business Intelligence; Consulting; Machine Learning | 0.35
Atrys Health, SA | Spain | Medical and Healthcare | 0.28
R Systems International, Ltd. | India | Analytics; Information Technology; IT Management; Software | 0.17
Figure 4.2.3
Top Five AI Private Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Headquarters Country | Focus Area | Funding Amount (in Billions USD)
GAC Aion New Energy Automobile Co., Ltd. | China | Automotive; Clean Energy; Electric Vehicle; Manufacturing | 2.54
Idience Co., Ltd. | South Korea | Emergency Medicine; Healthcare; Pharmaceutical | 2.15
Uali | Argentina | Drones; Cloud Computing | 1.50
Anduril Industries, Inc. | United States | Cybersecurity, Data Protection; AR/VR; Drones | 1.50
Celonis, GmbH | Germany | Retail; Industrial Automation, Network; HR Tech; Insurtech | 1.22
Figure 4.2.4
Top Five AI Public Offering Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Headquarters Country | Focus Area | Funding Amount (in Billions USD)
ASR Microelectronics Co., Ltd. | China | Semiconductor; VC | 1.08
iSoftStone Information Technology (Group) Co., Ltd. | China | Data Management, Processing, and Cloud; Cybersecurity, Data Protection | 0.73
Jahez International Company for Information Systems Technology | Saudi Arabia | Artificial Intelligence; E-Commerce; Food and Beverage; Food Delivery; Information Technology; Logistics | 0.43
Fortior Technology (Shenzhen) Co., Ltd. | China | Electronics; Machine Manufacturing; Semiconductor | 0.30
Beijing Deep Glint Technology Co., Ltd. | China | Cybersecurity, Data Protection; Music, Video Content | 0.29
Figure 4.2.5
Startup Activity
Global Trend
The next section analyzes private investment trends in artificial intelligence startups that have received over $1.5 million in investment since 2013. The global private AI investment trend reveals that while investment activity has decreased since 2021, it is still 18 times higher than it was in 2013 (Figure 4.2.6).
Private Investment in AI, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Bar chart of total private investment (in billions of U.S. dollars) by year; 2022 value: $91.86 billion.]
Figure 4.2.6
A similar trend, of short-term decreases but longer-term growth, is evident in data on total private investment events. In 2022 there were 3,538 AI-related private investment events, representing a 12% decrease from 2021 but a sixfold increase since 2013 (Figure 4.2.7). Similarly, the number of newly funded AI companies dropped to 1,392 from 1,669 last year, while having increased from 495 in 2013 (Figure 4.2.8).
Number of Private Investment Events in AI, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Bar chart of the number of private investment events by year; 2022 value: 3,538.]
Figure 4.2.7

Number of Newly Funded AI Companies in the World, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Bar chart of the number of newly funded companies by year; 2022 value: 1,392.]
Figure 4.2.8
The year-over-year decrease in AI-related funding is also evident when the funding events are disaggregated by size. Across all size categories, with the exception of events over $1 billion, the total number of AI funding events decreased (Figure 4.2.9).

AI Private Investment Events by Funding Size, 2021 Vs. 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Funding Size | 2021 | 2022 | Total
Over $1 Billion | 4 | 6 | 10
$500 Million – $1 Billion | 13 | 5 | 18
$100 Million – $500 Million | 277 | 164 | 441
$50 Million – $100 Million | 277 | 238 | 515
Under $50 Million | 2,851 | 2,585 | 5,436
Undisclosed | 598 | 540 | 1,138
Total | 4,020 | 3,538 | 7,558
Figure 4.2.9
Regional Comparison by Funding Amount
Once again, the United States led the world in terms of total AI private investment. In 2022, the $47.4 billion
invested in the United States was roughly 3.5 times the amount invested in the next highest country, China
($13.4 billion), and 11 times the amount invested in the United Kingdom ($4.4 billion) (Figure 4.2.10).
Private Investment in AI by Geographic Area, 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
(Total investment in billions of U.S. dollars)
United States: 47.36
China: 13.41
United Kingdom: 4.37
Israel: 3.24
India: 3.24
South Korea: 3.10
Germany: 2.35
Canada: 1.83
France: 1.77
Argentina: 1.52
Australia: 1.35
Singapore: 1.13
Switzerland: 1.04
Japan: 0.72
Finland: 0.61
Figure 4.2.10
When private AI investments are aggregated since 2013, the same ranking of countries applies:
The United States is first with $248.9 billion invested, followed by China ($95.1 billion) and the
United Kingdom ($18.2 billion) (Figure 4.2.11).
Private Investment in AI by Geographic Area, 2013–22 (Sum)
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
(Total investment in billions of U.S. dollars)
United States: 248.90
China: 95.11
United Kingdom: 18.24
Israel: 10.83
Canada: 8.83
India: 7.73
Germany: 6.99
France: 6.59
South Korea: 5.57
Singapore: 4.72
Japan: 3.99
Hong Kong: 3.10
Switzerland: 3.04
Australia: 3.04
Spain: 1.81
Figure 4.2.11
While the United States continues to outpace other nations in terms of private AI investment, the country experienced a sharp 35.5% decrease in AI private investment within the last year (Figure 4.2.12). Chinese investment experienced a similarly sharp decline (41.3%). The top five American AI private investment events are highlighted in Figure 4.2.13, the top five European Union and British investments in Figure 4.2.14, and the top five Chinese investments in Figure 4.2.15.
Private Investment in AI by Geographic Area, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Line chart of annual private investment (in billions of U.S. dollars); 2022 values: United States, 47.36; China, 13.41; European Union and United Kingdom, 11.04.]
Figure 4.2.12
Top AI Private Investment Events in the United States, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Focus Area | Funding Amount (in Billions USD)
Anduril Industries, Inc. | Cybersecurity, Data Protection; AR/VR; Drones | 1.50
Faire Wholesale, Inc. | Fintech; Retail; Sales Enablement | 0.82
Anthropic, PBC | Artificial Intelligence; Information Technology; Machine Learning | 0.58
Arctic Wolf Networks, Inc. | Data Management, Processing, and Cloud; Cybersecurity, Data Protection | 0.40
JingChi, Inc. | Data Management, Processing, and Cloud; AV; AR/VR | 0.40
Figure 4.2.13

Top AI Private Investment Events in the European Union and United Kingdom, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Focus Area | Funding Amount (in Billions USD)
Celonis, GmbH | Retail; Industrial Automation, Network; HR Tech; Insurtech | 1.22
Content Square, SAS | Analytics; Artificial Intelligence; CRM; Data Visualization; Digital Marketing; SaaS | 0.60
Retail Logistics Excellence - RELEX Oy | Retail | 0.57
Cera Care Limited | Medical and Healthcare | 0.32
Babylon Holdings Limited | Medical and Healthcare; Music, Video Content | 0.30
Figure 4.2.14
Top AI Private Investment Events in China, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report

Company Name | Focus Area | Funding Amount (in Billions USD)
GAC Aion New Energy Automobile Co., Ltd. | Automotive; Clean Energy; Electric Vehicle; Manufacturing | 2.54
GAC Aion New Energy Automobile Co., Ltd. | Automotive; Clean Energy; Electric Vehicle; Manufacturing | 1.11
Beijing ESWIN Technology Group Co., Ltd. | Data Management, Processing, and Cloud; Industrial Automation, Network; Semiconductor; Marketing, Digital Ads; Sales Enablement | 0.58
Zhejiang Hozon New Energy Automobile Co., Ltd. | Data Management, Processing, and Cloud; Cybersecurity, Data Protection; Sales Enablement | 0.44
Zhejiang Hozon New Energy Automobile Co., Ltd. | Data Management, Processing, and Cloud; Cybersecurity, Data Protection; Sales Enablement | 0.32
Figure 4.2.15
Regional Comparison by Newly Funded AI Companies
This subsection studies the number of newly funded AI companies across various geographic areas. As was the case with private investment, the United States led all regions with the largest number of newly funded AI companies at 542, followed by China at 160 and the United Kingdom at 99 (Figure 4.2.16).
Number of Newly Funded AI Companies by Geographic Area, 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
United States: 542
China: 160
United Kingdom: 99
Israel: 73
India: 57
Canada: 47
France: 44
Germany: 41
Singapore: 36
Japan: 32
Switzerland: 26
Australia: 23
South Korea: 22
Sweden: 12
Netherlands: 12
Figure 4.2.16
A similar trend is evident in the aggregate data since 2013. In the last decade, the number of newly funded
AI companies in the United States is around 3.5 times the amount in China, and 7.4 times the amount in the
United Kingdom (Figure 4.2.17).
Number of Newly Funded AI Companies by Geographic Area, 2013–22 (Sum)
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
United States: 4,643
China: 1,337
United Kingdom: 630
Israel: 402
Canada: 341
France: 338
India: 296
Japan: 294
Germany: 245
Singapore: 165
South Korea: 145
Australia: 126
Switzerland: 108
Sweden: 83
Netherlands: 78
Figure 4.2.17

Figure 4.2.18 breaks down data on newly funded AI companies within select geographic regions. In a trend that goes back a decade, the United States continues to outpace both the European Union and the United Kingdom, as well as China. However, the growth rates of the different regions are relatively similar.

Number of Newly Funded AI Companies by Geographic Area, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Line chart of annual counts of newly funded AI companies; 2022 values: United States, 542; European Union and United Kingdom, 293; China, 160.]
Figure 4.2.18
Focus Area Analysis
Private AI investment can also be disaggregated by focus area. Figure 4.2.19 compares global private AI investment by focus area in 2022 versus 2021. The focus areas that attracted the most investment in 2022 were medical and healthcare ($6.1 billion); data management, processing, and cloud ($5.9 billion); fintech ($5.5 billion); cybersecurity and data protection ($5.4 billion); and retail ($4.2 billion). Mirroring the pattern seen in total AI private investment, total investment across most focus areas declined in the last year.

Private Investment in AI by Focus Area, 2021 Vs. 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Grouped bar chart comparing 2021 and 2022 total investment (in billions of U.S. dollars) across focus areas: Medical and Healthcare; Data Management, Processing, Cloud; Fintech; Cybersecurity, Data Protection; Retail; Industrial Automation, Network; Sales Enablement; Marketing, Digital Ads; AR/VR; Drones; Insurtech; Music, Video Content; Semiconductor; HR Tech; Energy, Oil, and Gas; AV; NLP, Customer Support; Agritech; Entertainment; Legal Tech; Geospatial; Fitness and Wellness; Ed Tech; Facial Recognition; VC.]
Figure 4.2.19
Figure 4.2.20 presents trends in AI focus area investments. As noted earlier, most focus areas saw declining investments in the last year. However, some of the focus areas that saw increased investments are semiconductor, industrial automation and network, cybersecurity and data protection, drones, marketing and digital ads, HR tech, AR/VR, and legal tech. Still, mirroring a broader trend in AI private investment, most focus areas saw greater amounts of AI private investment in 2022 than they did in 2017.
Private Investment in AI by Focus Area, 2017–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Small-multiple line charts, one per focus area, of annual private investment (in billions of U.S. dollars) from 2017 to 2022. Panels include Data Management, Processing, Cloud; Medical and Healthcare; Fintech; Cybersecurity, Data Protection; Retail; Industrial Automation, Network; Sales Enablement; Marketing, Digital Ads; AR/VR; Drones; Insurtech; Music, Video Content; Semiconductor; HR Tech; Energy, Oil, and Gas; AV; NLP, Customer Support; Agritech; Entertainment; Legal Tech; Geospatial; Fitness and Wellness; Ed Tech; Facial Recognition; and VC.]
Figure 4.2.20
Finally, Figure 4.2.21 shows private investment in AI by focus area over time within select geographic regions, highlighting how private investment priorities in AI differ across geographies. For example, in 2022, private investment in AI-related drone technology in the United States ($1.6 billion) was nearly 53 times more than that in China ($0.03 billion), and 40 times more than that in the European Union and the United Kingdom ($0.04 billion). Chinese private investment in AI-related semiconductors ($1.02 billion) was 1.75 times more than that in the United States ($0.58 billion), and 102 times more than that in the European Union and the United Kingdom ($0.01 billion).
Private Investment in AI by Focus Area and Geographic Area, 2017–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Small-multiple line charts, one per focus area, comparing annual private investment (in billions of U.S. dollars) in the United States, China, and the European Union and United Kingdom from 2017 to 2022. Each panel is annotated with 2022 values for the three regions; for example, Drones: US 1.60, CN 0.03, EU/UK 0.04; Semiconductor: US 0.58, CN 1.02, EU/UK 0.01.]
Figure 4.2.21
4.3 Corporate Activity
This section explores how corporations tangibly use AI. First, it highlights industry adoption trends, examining how businesses adopt AI, which AI technologies they find most useful, and how AI adoption affects their bottom line. Second, it considers industry motivations, exploring the questions industry leaders weigh when thinking about incorporating AI technologies. Finally, it paints a qualitative picture of business AI use by examining trends in AI-related earnings calls.
Industry Adoption
Adoption of AI Capabilities
The following subsection on the industry adoption of AI borrows data from McKinsey's "The State of AI in 2022—and a Half Decade in Review," as well as previous years' editions. The 2022 report drew on data from a survey of 1,492 participants representing a wide range of regions, industries, company sizes, functional specialties, and tenures. According to the most recent McKinsey report, as of 2022, 50% of surveyed organizations reported having adopted AI in at least one business unit or function (Figure 4.3.1). This total is down slightly from 56% in 2021, although up significantly from 20% in 2017. AI usage grew rapidly in the past half-decade but has leveled off since 2020.
Share of Respondents Who Say Their Organizations Have Adopted AI in at Least One Function, 2017–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Line chart of the share of respondents by year; 2022 value: 50%.]
Figure 4.3.1
In the last half-decade, the average number of AI capabilities that organizations have embedded
has doubled from 1.9 in 2018 to 3.8 in 2022 (Figure 4.3.2). Some of the AI capabilities that McKinsey
features in their survey include recommender systems, NL text understanding, and facial recognition.4
Average Number of AI Capabilities That Respondents' Organizations Have Embedded Within at Least One Function or Business Unit, 2018–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; the average rose from 1.9 in 2018 to 3.80 in 2022.]
Figure 4.3.2
4 In the 2022 edition of the McKinsey survey, 16 total AI capabilities are considered: computer vision, deep learning, digital twins, facial recognition, GAN, knowledge graphs,
NL generation, NL speech understanding, NL text understanding, physical robotics, recommender systems, reinforcement learning, robotic process automation, transfer
learning, transformers, and virtual agents.
The most commonly adopted AI use case in 2022 was service operations optimization (24%), followed
by the creation of new AI-based products (20%), customer segmentation (19%), customer service
analytics (19%), and new AI-based enhancement of products (19%) (Figure 4.3.3).
Most Commonly Adopted AI Use Cases by Function, 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
Service Operations Optimization: 24%
Creation of New AI-Based Products: 20%
Customer Segmentation: 19%
Customer Service Analytics: 19%
New AI-Based Enhancements of Products: 19%
Customer Acquisition and Lead Generation: 17%
Product Feature Optimization: 16%
Contact-Center Automation: 16%
Risk Modeling and Analytics: 15%
Predictive Service and Intervention: 14%
Figure 4.3.3
With respect to the type of AI capabilities embedded in at least one function or business unit, as indicated by Figure 4.3.4, robotic process automation had the highest rate of embedding within the high tech/telecom; financial services; and business, legal, and professional services industries: the respective rates of embedding were 48%, 47%, and 46%. Across all industries, the most embedded AI technologies were robotic process automation (39%), computer vision (34%), NL text understanding (33%), and virtual agents (33%).
AI Capabilities Embedded in at Least One Function or Business Unit, 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Heatmap of the share of respondents embedding each of 16 AI capabilities (including computer vision, deep learning, robotic process automation, NL text understanding, virtual agents, and transformers such as GPT-3) by industry: all industries; business, legal, and professional services; consumer goods/retail; financial services; healthcare systems/pharma and medical products; and high tech/telecom.]
Figure 4.3.4
Figure 4.3.5 shows AI adoption by industry and AI function in 2022. The greatest adoption was in risk for high
tech/telecom (38%), followed by service operations for consumer goods/retail (31%) and product and/or service
development for financial services (31%).
AI Adoption by Industry and Function, 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Heatmap of the share of respondents reporting AI adoption in each function (human resources; manufacturing; marketing and sales; product and/or service development; risk; service operations; strategy and corporate finance; supply chain management) by industry.]
Figure 4.3.5
Figure 4.3.6 shows how rates of AI adoption by industry and AI function changed from 2021 to 2022. The greatest year-over-year increases were in consumer goods/retail, for strategy and corporate finance (25 percentage points), followed by high tech/telecom, for risk (22 percentage points). The most significant decreases were in high tech/telecom, for product and/or service development (38 percentage points), and healthcare systems, also for product and/or service development (25 percentage points).
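Note that these figures are percentage point changes, that is, differences between two adoption shares, not relative percent changes. A minimal sketch of the distinction, using hypothetical adoption shares (the values below are illustrative, not survey data):

```python
# Percentage-point change vs. relative percent change.
# The adoption shares below are hypothetical round numbers.

def pp_change(before, after):
    """Difference of two shares, in percentage points."""
    return (after - before) * 100

def pct_change(before, after):
    """Relative change of the share itself, in percent."""
    return (after - before) / before * 100

before, after = 0.05, 0.30  # e.g., adoption goes from 5% to 30% of respondents

print(round(pp_change(before, after), 1))   # 25.0 (percentage points)
print(round(pct_change(before, after), 1))  # 500.0 (percent)
```

The same underlying shift reads as "25 percentage points" or "500 percent" depending on the convention, which is why the report consistently uses percentage points.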
Percentage Point Change in Responses of AI Adoption by Industry and Function, 2021 Vs. 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Heatmap of percentage point changes in reported AI adoption for each function (human resources; manufacturing; marketing and sales; product and/or service development; risk; service operations; strategy and corporate finance; supply chain management) by industry.]
Figure 4.3.6
Organizations report AI adoption leading to both cost decreases and revenue increases. On the cost side, the functions where the most respondents saw decreases as a result of AI adoption were supply chain management (52%), service operations (45%), strategy and corporate finance (43%), and risk (43%) (Figure 4.3.7). On the revenue side, the functions where the most respondents saw increases as a result of AI adoption were marketing and sales (70%), product and/or service development (70%), and strategy and corporate finance (65%).
Cost Decrease and Revenue Increase From AI Adoption by Function, 2021
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Paired bar chart per function (service operations; marketing and sales; risk; strategy and corporate finance; product and/or service development; human resources; supply chain management; manufacturing; average across all activities) showing the share of respondents reporting cost decreases (<10%, 10–19%, ≥20%) and revenue increases (≤5%, 6–10%, >10%).]
Figure 4.3.7
Figure 4.3.8 shows AI adoption by organizations globally, broken out by regions of the world. In 2022, North America led (59%), followed by Asia-Pacific (55%) and Europe (48%). The average adoption rate across all geographies was 50%, down six percentage points from 2021. Notably, “Greater China” registered a 20 percentage point decrease from 2021.
AI Adoption by Organizations in the World, 2021 Vs. 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
Region                                                  2021    2022
All Geographies                                         56%     50%
Asia-Pacific                                            64%     55%
Europe                                                  51%     48%
North America                                           55%     59%
Greater China (incl. Hong Kong, Taiwan)                 61%     41%
Developing Markets (incl. India, Latin America, MENA)   52%     44%
Figure 4.3.8
Consideration and Mitigation of Risks From Adopting AI
As has been the case in the last few iterations of the McKinsey report, in 2022 respondents identified cybersecurity as the most relevant risk when adopting AI technology (59%) (Figure 4.3.9). The next most cited risks were regulatory compliance (45%), personal/individual privacy (40%), and explainability (37%). The least salient risks identified by organizations were national security (13%) and political stability (9%).
Risks From Adopting AI That Organizations Consider Relevant, 2019–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.9: 2022 shares of respondents: Cybersecurity 59%; Regulatory Compliance 45%; Personal/Individual Privacy 40%; Explainability 37%; Organizational Reputation 32%; Equity and Fairness 30%; Workforce/Labor Displacement 28%; Physical Safety 20%; National Security 13%; Political Stability 9%.]
Figure 4.3.9
Figure 4.3.10 highlights the AI risks that organizations are taking steps to mitigate. The top three responses were cybersecurity (51%), followed by regulatory compliance (36%) and personal/individual privacy (28%). As was the case in previous years, there are meaningful gaps between the risks organizations cite as relevant and those which organizations have taken steps to mitigate. For instance, there is a gap of 8 percentage points for cybersecurity, 9 percentage points for regulatory compliance, and 12 percentage points for personal/individual privacy. These differences suggest there is a gap between the awareness organizations have of various risks and the steps they take to mitigate such risks.
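These gaps are simple subtractions between the two McKinsey series; as a quick check, with values copied from Figures 4.3.9 and 4.3.10:

```python
# Share of respondents citing each risk as relevant (Figure 4.3.9)
# vs. taking steps to mitigate it (Figure 4.3.10), 2022, in percent.
relevant = {"cybersecurity": 59, "regulatory compliance": 45, "personal/individual privacy": 40}
mitigated = {"cybersecurity": 51, "regulatory compliance": 36, "personal/individual privacy": 28}

# Gap in percentage points between awareness and action for each risk.
gaps = {risk: relevant[risk] - mitigated[risk] for risk in relevant}
print(gaps)  # → {'cybersecurity': 8, 'regulatory compliance': 9, 'personal/individual privacy': 12}
```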
Risks From Adopting AI That Organizations Take Steps to Mitigate, 2019–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.10: 2022 shares of respondents: Cybersecurity 51%; Regulatory Compliance 36%; Personal/Individual Privacy 28%; Organizational Reputation 22%; Explainability 22%; Workforce/Labor Displacement 18%; Equity and Fairness 17%; Physical Safety 15%; National Security 7%; Political Stability 4%.]
Figure 4.3.10
Narrative Highlight:
The Effects of GitHub’s Copilot on Developer Productivity and Happiness
In 2021, GitHub launched a technical preview of Copilot, a generative AI tool that enables developers and coders to present a coding problem in natural language and then have Copilot generate a solution in code. Copilot can also translate between various programming languages. In 2022, GitHub surveyed over 2,000 developers who were using the tool to determine its effect on their productivity, well-being, and workflow.5
Figure 4.3.11 summarizes the results of the survey. Developers overwhelmingly reported feeling more productive, satisfied, and efficient when working with Copilot. More specifically, 88% of surveyed respondents reported feeling more productive, 74% reported being able to focus on more satisfying work, and 88% claimed to have completed tasks more quickly. One software engineer stated, “[With Copilot] I have to think less, and when I have to think, it’s the fun stuff. It sets off a little spark that makes coding more fun and more efficient.”6
As part of the same survey, GitHub recruited 95 developers and randomly split them into two groups, one of which used Copilot as part of a coding task and the other which did not. The results of this experiment are summarized in Figure 4.3.12. The developers who used Copilot reported a completion rate of 78%, 8 percentage points higher than those who did not use Copilot. Likewise, it took the developers using Copilot only 71 minutes to complete their task, 56% less time than the developers who did not use Copilot (161 minutes). These survey and experiment results are evidence of the tangible ways in which AI tools improve worker productivity.
5 Most of the developers surveyed, around 60%, were professional developers; 30% were students and 7% were hobbyists.
6 The quote is taken from this source.
Measuring Dimensions of Developer Productivity When Using Copilot: Survey Responses, 2022
Source: GitHub Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.11: Percentage of participants that agreed or strongly agreed:
Perceived Productivity: I Am More Productive 88%.
Satisfaction and Well-Being: Focus on More Satisfying Work 74%; More Fulfilled With My Job 60%; Less Frustrated When Coding 59%.
Efficiency and Flow: Faster With Repetitive Tasks 96%; Faster Completion 88%; Less Mental Effort on Repetitive Tasks 87%; Less Time Searching 77%; More in the Flow 73%.]
Figure 4.3.11
Summary of the Experiment Process and Results
Source: GitHub Survey, 2022 | Table: 2023 AI Index Report
                                                    Used GitHub Copilot    Did Not Use GitHub Copilot
Number of Developers                                45                     50
Completion Rate (%)                                 78                     70
Average Time Taken to Complete the Task (Minutes)   71                     161
Figure 4.3.12
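The experiment’s headline figures can be recomputed directly from the group results reported above:

```python
# GitHub Copilot experiment results (Figure 4.3.12).
copilot_minutes, control_minutes = 71, 161
copilot_completion, control_completion = 78, 70  # completion rate, percent

# Relative time saved by the group using Copilot.
time_saved = (control_minutes - copilot_minutes) / control_minutes
print(round(time_saved * 100))  # → 56 (% less time)

# Completion-rate advantage in percentage points.
print(copilot_completion - control_completion)  # → 8
```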
Industry Motivation
This section explores the motivations industry leaders have in deploying AI and examines the degree to which they feel AI is important, the reasons they are eager to embrace AI, and the factors that have hindered further scaling of AI solutions. The data in this section comes from Deloitte’s “State of AI in Enterprise” report, which has surveyed companies about their use of AI since 2017. This year’s survey polled 2,620 business leaders from a wide range of countries, industries, and corporate levels.
Perceived Importance of AI
Figures 4.3.13 and 4.3.14 suggest that an overwhelming majority of business leaders perceive AI to be important for their businesses. More specifically, when asked how important AI solutions were for their organization’s overall success, 94% responded “important,” 5% said “somewhat important,” and 1% answered “not important” (Figure 4.3.13).
Importance of AI Solutions for Organizations’ Overall Success
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.13: Important 94%; Somewhat Important 5%; Not Important 1%.]
Figure 4.3.13
Similarly, when asked whether they believe that AI enhances performance and job satisfaction, 82% responded “strongly agree/agree,” 16% said they “neither agree nor disagree,” only 2% selected “strongly disagree/disagree,” and 1% were unsure (Figure 4.3.14).
Believe AI Enhances Performance and Job Satisfaction, 2022
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.14: Strongly Agree/Agree 82%; Neither Agree nor Disagree 16%; Strongly Disagree/Disagree 2%; Unsure 1%.]
Figure 4.3.14
AI Investments and Implementation Outcomes
In 2022, 76% of surveyed leaders reported expecting to increase AI investments in the next fiscal year (Figure 4.3.15). Although this represents a 9 percentage point decrease since 2021 and a 12 percentage point decrease since 2018, a large majority of business leaders continue to express interest in AI investment.
Expected AI Investment Increase in the Next Fiscal Year
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.15: Share of respondents expecting an AI investment increase in the next fiscal year, 2018–22; 76% in 2022.]
Figure 4.3.15
Figure 4.3.16 highlights the main outcomes that business leaders achieved by embracing AI solutions.7
The top outcome was lowered costs (37%), followed by improved collaboration across business
functions/organizations (34%) and having discovered valuable insights (34%).
Main Outcomes of AI Implementation, 2022
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
Lower Costs: 37%
Improve Collaboration Across Business Functions/Organizations: 34%
Discover Valuable Insights: 34%
Customize or Improve Product/Programs, Services, or Offers: 33%
Enter New Markets/Expand Services to New Constituents: 33%
Make Organizational Processes More Efficient: 33%
Improve Decision-Making: 32%
Create New Products/Programs and Services: 32%
Predict Demand: 32%
Enable New Business/Service Models: 32%
Increase Revenue: 31%
Activate the Potential of Existing Headcount and/or Improve Talent Management: 30%
Improve Constituent Engagement: 30%
Anticipate Constituent Needs: 28%
Figure 4.3.16
7 Figure 4.3.16 is drawn from the chart in the Deloitte survey: “Outcomes—‘Achieved to a high degree.’”
Challenges in Starting and Scaling AI Projects
The top three challenges that business leaders identified in terms of starting AI-related projects were proving business value (37%), lack of executive commitment (34%), and choosing the right AI technologies (33%) (Figure 4.3.17).
Top Three Challenges in Starting AI Projects, 2022
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.17: Proving Business Value 37%; Lack of Executive Commitment 34%; Choosing the Right AI Technologies 33%.]
Figure 4.3.17
The main barriers leaders faced in scaling existing AI initiatives were managing AI-related risks (50%), obtaining more data or inputs to train models (44%), and implementing AI technologies (42%) (Figure 4.3.18).
Main Barriers in Scaling AI Initiatives, 2022
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.18: Managing AI-Related Risks 50%; Obtaining Needed Data or Inputs to Train Models 44%; Implementing AI Technologies 42%; Proving Business Value 40%.]
Figure 4.3.18
Earnings Calls
Aggregate Trends
The following subsection presents data from NetBase Quid, which uses natural language processing tools to analyze trends in corporate earnings calls. NetBase Quid analyzed all 2022 earnings calls from Fortune 500 companies, identifying all mentions of “Artificial Intelligence,” “AI,” “Machine Learning,” “ML,” and “deep learning.” In the 2022 fiscal year, there were 268 earnings calls from Fortune 500 companies that mentioned AI-related keywords (Figure 4.3.19). The number of such calls dropped from the previous year, when there were 306, but has increased since 2018, when there were 225.
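NetBase Quid’s actual pipeline is proprietary; the sketch below only illustrates the keyword-matching step described above, with an invented three-call mini-corpus standing in for real transcripts:

```python
import re

# Long-form keywords are matched case-insensitively; the acronyms "AI" and "ML"
# are matched case-sensitively with word boundaries to avoid hits inside
# ordinary words such as "mail" or "email".
PHRASES = ["artificial intelligence", "machine learning", "deep learning"]
ACRONYMS = ["AI", "ML"]

def mentions_ai(transcript: str) -> bool:
    lowered = transcript.lower()
    if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in PHRASES):
        return True
    return any(re.search(r"\b" + a + r"\b", transcript) for a in ACRONYMS)

# Hypothetical mini-corpus standing in for Fortune 500 earnings call transcripts.
calls = [
    "We invested heavily in AI this quarter.",
    "Mail volumes grew across every region.",
    "Our machine learning platform scaled well.",
]
print(sum(mentions_ai(c) for c in calls))  # → 2
```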
Number of Fortune 500 Earnings Calls Mentioning AI, 2018–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.19: Annual number of Fortune 500 earnings calls mentioning AI, 2018–22; 268 in 2022.]
Figure 4.3.19
Specific Themes
Mentions of AI in Fortune 500 earnings calls were associated with a wide range of themes. In 2022, the most cited themes were business integration (10.0%); pricing and inventory management (8.8%); and advertising and marketing (8.8%) (Figure 4.3.20). Compared to 2018, some of the less prevalent AI-related themes in 2022 included deep learning (4.8%), autonomous vehicles (3.1%), and data storage and management (3.0%).
Themes for AI Mentions in Fortune 500 Earnings Calls, 2018 Vs. 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
Theme (% of total mentions)          2022 (change vs. 2018)    2018
Business Integration                 9.96% (-15%)              11.74%
Pricing and Inventory Management     8.82% (+48%)              5.94%
Advertising and Marketing            8.82% (+204%)             2.90%
Process Automation                   8.39% (+23%)              6.81%
Support Decision-Making              7.40% (-7%)               7.97%
Healthcare and Medical Practices     7.11% (+69%)              4.20%
Cloud Platforms                      6.26% (+73%)              3.62%
Personalizing Customer Experience    5.26% (-21%)              6.67%
Deep Learning                        4.84% (-41%)              8.26%
Edge Intelligence                    4.13% (+24%)              3.33%
Nvidia AI Use Cases                  3.84% (+121%)             1.74%
Revenue Growth                       3.27% (+33%)              2.46%
Autonomous Vehicles                  3.13% (-47%)              5.94%
Data Processing                      2.99% (+37%)              2.17%
Data Storage and Management          2.99% (-55%)              6.67%
Adobe Experience                     2.70% (+10%)              2.46%
Customer Support                     2.42% (+734%)             0.29%
Azure Cognitive Services             2.13% (-59%)              5.22%
Data Center GPU                      1.85% (-20%)              2.32%
Investments                          1.28% (+47%)              0.87%
Nvidia RTX                           1.00% (-62%)              2.61%
Digital Transformation               0.71% (-87%)              5.36%
Figure 4.3.20
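The bracketed changes in Figure 4.3.20 are relative changes in each theme’s share of mentions between 2018 and 2022, which can be verified from the two share columns:

```python
# Theme share of AI mentions in 2018 and 2022 (Figure 4.3.20), in percent.
shares = {
    "Business Integration": (11.74, 9.96),
    "Customer Support": (0.29, 2.42),
    "Deep Learning": (8.26, 4.84),
}

# Relative change of each theme's share from 2018 to 2022.
for theme, (s2018, s2022) in shares.items():
    change = round((s2022 / s2018 - 1) * 100)
    print(f"{theme}: {change:+d}%")
# → Business Integration: -15%
# → Customer Support: +734%
# → Deep Learning: -41%
```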
Narrative Highlight:
What Are Business Leaders Actually Saying About AI?
To better understand business attitudes that surround AI, it is worth looking at AI-related excerpts from the Fortune 500 earnings calls. For example, on the topic of business integration, companies often cite AI and machine learning (ML) use cases to reassure business audiences of safer business practices, growing opportunities, streamlined processes, and expanded capabilities.

“We spent $100 million building certain risk and fraud systems so that when we process payments on the consumer side, losses are down $100 million to $200 million. Volume is way up. That’s a huge benefit.” – Jamie Dimon, CEO, JP Morgan Chase & Co. (Q2 2022)

“Especially in the last year or so, the field of robotics itself has actually changed because with AI and ML coming to the picture, there’s significant developments in the robotics field. So we think it’s a huge opportunity for us.” – Raj Subramaniam, CEO, FedEx (Q3 2022)

“We spent a ton of money on Cloud. We spend a ton of money on adding capabilities. And over time, as you do it on one platform, it all becomes more efficient. So, I think it’s a lot of little things, but it adds up with our base of people and fixed cost, it adds up significantly over time. We’ve been able to maintain our headcount at a level we feel good about, and we think we can grow massively on top of that without having to add lots of bodies to be able to do it.” – Peter Kern, CEO, Expedia Group (Q4 2022)

In terms of process automation, business leaders emphasize the ability of AI tools to accelerate productivity gains and to deliver a better customer experience.

“We continue to drive the use of automation and artificial intelligence to drive productivity gains to help offset inflationary pressures.” – Jim Davis, CEO, Quest Diagnostics (Q4 2022)

“We have improved the experience for customers by applying artificial intelligence to match them with an expert who is right for their specific situation and to deliver insights to experts so they can provide excellent service.” – Sasan Goodarzi, CEO, Intuit (Q2 2022)

“In September, we opened a next-gen fulfillment center in Illinois. This 1.1 million square foot facility features robotics, machine learning, and automated storage, resulting in increased productivity and a better service for our customers at faster delivery times.” – John David Rainey, CFO, Walmart (Q3 2022)
The conversation surrounding pricing and inventory management saw companies reassuring business audiences on how their use of AI would improve their operational strength, especially in environments of high inflation and supply chain challenges.

“We are … continuing to refine and invest in machine learning tools that will allow for more sophisticated competitive pricing and greater automation at scale.” – Adrian Mitchell, CFO, Macy’s (Q3 2022)

“Our teams are utilizing technology, innovative data analytics and AI to forecast supply chain lead times and changes in market demand to ensure optimal levels. These actions along with our pricing initiatives positively impacted our gross margin in the second quarter.” – Bert Nappier, CFO, Genuine Parts Company (Q3 2022)

There is also a vibrant discussion about the ways in which AI can change healthcare and medical practices, more specifically to reduce costs, improve the patient experience, and better serve clinicians.

“[Using] machine learning and robotics, we can now resolve a wide range of prescription drug claims which previously required the attention of our pharmacists, freeing them up to spend time with patients. This advanced approach reduces overall cost and improves the patient experience.” – Karen Lynch, CEO, CVS Health (Q2 2022)

“I’d like to highlight productivity efforts in our preauthorization process where we’re leveraging an in-house artificial intelligence solution to automatically match incoming faxes to the correct authorization requests. This solution creates administrative efficiencies across millions of inbound images. We are also scaling this solution to multiple business units such as pharmacy and are also expanding the application of this type of AI to provide decision support to clinicians, which will result in improvements to authorization turnaround times, reduction in friction for providers and creating a better member experience.” – Bruce Broussard, CEO, Humana (Q3 2022)

“We continue to see opportunities across [the software and analytics] segment as payers, providers, and partners take advantage of our high ROI solutions and realize the benefits of our data, AI models, and workflow capabilities.” – Neil de Crescenzo, CEO, UnitedHealth Group (Q2 2022)
Sentiment Analysis
NetBase Quid also runs the AI-related text of Fortune 500 earnings calls through a sentiment analysis machine-learning algorithm that identifies whether the sentiment associated with the mention of AI is positive, mixed, or negative.8 Overall, since 2018, the sentiment associated with mentions of AI has been overwhelmingly positive (Figure 4.3.21). Mentions of AI were rarely negative, suggesting that large businesses tend to have positive associations when it comes to AI tools.
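NetBase Quid’s sentiment model is proprietary; purely to illustrate the positive/mixed/negative labeling idea, here is a toy lexicon-count sketch (the word lists and function are invented for illustration):

```python
# Toy sentiment lexicons -- illustrative only, not NetBase Quid's actual model.
POSITIVE = {"opportunity", "growth", "efficient", "benefit", "improve", "improved"}
NEGATIVE = {"risk", "loss", "losses", "concern", "decline"}

def label_sentiment(snippet: str) -> str:
    """Label an AI-related excerpt as positive, negative, or mixed."""
    words = [w.strip(".,!?").lower() for w in snippet.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "mixed"

print(label_sentiment("AI is a huge opportunity for growth."))       # → positive
print(label_sentiment("Losses are down, volume is up, a benefit."))  # → mixed
```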
Sentiment Summary Distribution for AI Mentions in Fortune 500 Earnings Calls by Publication Date, 2018–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
[Figure 4.3.21: Quarterly stacked bars, Q1 2018–Q4 2022, of negative, mixed, and positive sentiment shares; positive sentiment ranges from roughly 76% to 87% of mentions each quarter, mixed accounts for most of the remainder, and negative stays between 0% and 3%.]
Figure 4.3.21
8 Chapter 2 of the 2023 AI Index highlights trends in the performance of sentiment analysis algorithms.
4.4 Robot Installations
Given that robots are frequently deployed with AI-based software technologies, it is possible to gain insights on AI-ready infrastructure
being deployed in the real world by tracking the installation of industrial robots. Data in this section comes from the International
Federation of Robotics (IFR), an international nonprofit organization that works to promote, strengthen, and protect the robotics
industry. Every year the IFR releases the World Robotics Report, which tracks global trends in installations of robots.9
Aggregate Trends
The following subsection includes data on the installation and operation of industrial robots, which are defined as an “automatically controlled, reprogrammable, multipurpose manipulator, programmable in three or more axes, which can be either fixed in place or mobile for use in industrial automation applications.” 2021 saw a rebound in the total number of worldwide robot installations: The 517,000 industrial robots installed in 2021 represented a 31.3% increase from 2020 and a 211.5% increase since 2011 (Figure 4.4.1).
Number of Industrial Robots Installed in the World, 2011–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.1: Annual worldwide industrial robot installations (in thousands), 2011–21; 517 in 2021.]
Figure 4.4.1
9 Due to the timing of the IFR’s survey, the most recent data is from 2021.
The worldwide operational stock of industrial robots also continues to steadily increase year over year (Figure 4.4.2). The total number of operational industrial robots jumped 14.6% to 3,477,000 in 2021, from 3,035,000 in 2020. In the last decade, the number of industrial robots being installed and the number being used have both steadily increased.
Operational Stock of Industrial Robots in the World, 2011–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.2: Worldwide operational stock of industrial robots (in thousands), 2011–21; 3,477 in 2021.]
Figure 4.4.2
Industrial Robots: Traditional Vs. Collaborative Robots
A distinction can be drawn between traditional robots that work for humans and collaborative robots that are designed to work with humans. Recently, the robotics community has been excited about the potential of collaborative robots, given that they can be safer, more flexible, and more scalable than traditional robots, and are capable of iterative learning. In 2017, only 2.8% of all newly installed industrial robots were collaborative (Figure 4.4.3). As of 2021, that number had increased to 7.5%. Although traditional industrial robots still lead new installations, the number of collaborative robots is slowly increasing.
Number of Industrial Robots Installed in the World by Type, 2017–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.3: Total installations in thousands: 2017, 400; 2018, 424; 2019, 391; 2020, 394; 2021, 517. Traditional robots account for 389, 405, 370, 368, and 478 of those totals, respectively, with the remainder collaborative (39 in 2021).]
Figure 4.4.3
By Geographic Area
Country-level data on robot installations can illustrate which countries are prioritizing the integration of robots into their economies. In 2021, China installed the most industrial robots, with 268,200, 5.7 times the amount installed by Japan (47,200) and 7.7 times the amount installed by the United States (35,000) (Figure 4.4.4). The countries with the next most installations were South Korea (31,100) and Germany (23,800).
Number of Industrial Robots Installed by Country, 2021
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Installations in thousands: China 268.2; Japan 47.2; United States 35.0; South Korea 31.1; Germany 23.8; Italy 14.1; Taiwan 9.6; France 5.9; Mexico 5.4; India 4.9; Canada 4.3; Thailand 3.9; Singapore 3.5; Spain 3.4; Poland 3.3
Figure 4.4.4
In 2013, China overtook Japan as the nation installing
the most industrial robots (Figure 4.4.5). Since then,
the gap between the total number of industrial robots
installed by China and the next-nearest nation has
only widened. In 2013, Chinese industrial robot
installations represented 20.8% of the world’s share,
whereas in 2021, they represented 51.8%.
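Using the rounded counts reported in this section (a 2021 world total of 517,000, of which China installed 268,200), China’s share and its lead over the rest of the world can be recomputed; the small difference from the report’s 51.8% reflects rounding:

```python
# 2021 industrial robot installations, in thousands (Figures 4.4.1 and 4.4.4).
china, world = 268.2, 517.0

# China's share of the world total (report: 51.8% from unrounded data).
share = china / world * 100
print(f"{share:.1f}%")  # → 51.9%

# China vs. the rest of the world combined (cf. Figure 4.4.6: 268 vs. 249).
rest_of_world = world - china
print(china > rest_of_world)  # → True
```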
Number of New Industrial Robots Installed in Top Five Countries, 2011–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.5: Annual installations (in thousands), 2011–21; 2021 values: China 268, Japan 47, United States 35, South Korea 31, Germany 24.]
Figure 4.4.5
China consolidated its dominance in industrial robotics in 2021, the first year in which the country installed
more industrial robots than the rest of the world combined (Figure 4.4.6).
Number of Industrial Robots Installed (China Vs. Rest of the World), 2016–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.6: Annual installations (in thousands), 2016–21; in 2021, China 268 vs. rest of the world 249.]
Figure 4.4.6
Figure 4.4.7 shows the annual growth rate of industrial robot installations from 2020 to 2021 by country. Virtually every country surveyed by the IFR reported a yearly increase in the total number of industrial robot installations. The countries that reported the highest growth rates were Canada (66%), Italy (65%), and Mexico (61%).
Annual Growth Rate of Industrial Robots Installed by Country, 2020 Vs. 2021
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Canada 66%; Italy 65%; Mexico 61%; Poland 56%; India 54%; China 51%; Thailand 36%; Taiwan 31%; Japan 22%; United States 14%; France 11%; Germany 6%; South Korea 2%; Spain 1%; Singapore -35%
Figure 4.4.7
Narrative Highlight:
Country-Level Data on Service Robotics
Another important class of robots is service robots, which the ISO defines as a robot “that performs useful tasks for humans or equipment excluding industrial automation applications.”10 Figure 4.4.8 is an example of a robot being used in medicine, Figure 4.4.9 illustrates how a robot can help with professional cleaning, and Figure 4.4.10 shows a robot designed for maintenance and inspection.
Service Robots in Medicine
Source: UL Solutions, 2022
Figure 4.4.8
Service Robots in Professional Cleaning
Source: This Week in FM, 2021
Figure 4.4.9
Service Robots in Maintenance and Inspection
Source: Robotnik, 2022
Figure 4.4.10
10 A more detailed definition can be accessed here.
Compared to 2020, 2021 saw a higher number of professional service robots installed in the world
for several key application areas, including hospitality, medical robotics, professional cleaning, and
transportation and logistics (Figure 4.4.11). The category that registered the greatest year-over-year
increase was transportation and logistics: In 2021, 1.5 times the number of such service robots were
installed as in 2020.
Number of Professional Service Robots Installed in the World by Application Area, 2020 Vs. 2021
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Application Area (thousands)   2020    2021
Agriculture                    8       8
Hospitality                    11      20
Medical Robotics               12      15
Professional Cleaning          10      13
Transportation and Logistics   34      50
Figure 4.4.11
As of 2022, the United States has the greatest number of professional service robot manufacturers,
roughly 2.16 times as many as the next nation, China. Other nations with significant numbers of robot
manufacturers include Germany (91), Japan (66), and France (54) (Figure 4.4.12).
Number of Professional Service Robot Manufacturers in Top Countries by Type of Company, 2022
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
[Figure 4.4.12: Counts of manufacturers, split into startups, incumbents, and unknown, for the United States, China, Germany (91), Japan (66), France (54), South Korea, Russia, Switzerland, and Canada.]
Figure 4.4.12
Sectors and Application Types
On a global level, the sector that saw the greatest number of robot installations was electrical/electronics (137,000), followed by automotive (119,000) (Figure 4.4.13). Each of the highlighted sectors has recorded increases in the total number of industrial robot installations since 2019.
Number of Industrial Robots Installed in the World by Sector, 2019–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Sector (thousands)              2019    2020    2021
All Others                      30      37      52
Automotive                      102     84      119
Electrical/Electronics          89      110     137
Food                            11      12      15
Metal and Machinery             52      44      64
Plastic and Chemical Products   18      19      24
Unspecified                     87      87      107
Figure 4.4.13
Robots can also be deployed in a wide range of applications, from assembling to dispensing and handling. Figure 4.4.14 illustrates how the application of industrial robots has changed since 2019. Handling continues to be the application toward which the most industrial robots are deployed. In 2021, 230,000 industrial robots were installed for handling functions, 2.4 times more than for welding (96,000) and 3.7 times more than for assembling (62,000). Every application category, with the exception of dispensing and processing, saw more robot installations in 2021 than in 2019.
Number of Industrial Robots Installed in the World by Application, 2019–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Application (thousands)   2019    2020    2021
Assembling                40      50      62
Cleanroom                 26      32      32
Dispensing                12      8       11
Handling                  177     169     230
Processing                7       5       7
Unspecified               55      60      80
Welding                   74      70      96
Figure 4.4.14
China Vs. United States
The Chinese industrial sectors that installed the greatest number of industrial robots in 2021 were electrical/electronics (88,000), automotive (62,000), and metal and machinery (34,000) (Figure 4.4.15). Every industrial sector in China recorded a greater number of robot installations in 2021 than in 2019.
Number of Industrial Robots Installed in China by Sector, 2019–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Sector                   2019  2020  2021
All Others                 12    21    29
Automotive                 32    31    62
Electrical/Electronics     42    64    88
Food                        3     3     4
Metal and Machinery        22    22    34
Pharma/Cosmetics            1     1     1
Rubber and Plastics         4     5     6
Unspecified                31    30    43
(Number of industrial robots installed, in thousands)
Figure 4.4.15
The automotive industry installed the greatest number of industrial robots in the United States in 2021,
although installation rates for that sector decreased year over year (Figure 4.4.16). However, other sectors like
food, along with plastic and chemical products, saw year-over-year increases in robot installations.
Number of Industrial Robots Installed in the United States by Sector, 2019–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Sector                         2019   2020   2021
All Others                     3.50   2.60   4.50
Automotive                    13.00  10.50   9.80
Electrical/Electronics         3.50   3.70   2.90
Food                           2.20   2.70   3.40
Metal and Machinery            3.80   2.30   3.80
Plastic and Chemical Products  2.50   2.60   3.50
Unspecified                    5.00   6.30   7.10
(Number of industrial robots installed, in thousands)
Figure 4.4.16
CHAPTER 5:
Education
CHAPTER 5 PREVIEW:
Education
Overview ... 236
Chapter Highlights ... 237
5.1 Postsecondary AI Education ... 238
    CS Bachelor's Graduates ... 238
    CS Master's Graduates ... 240
    CS PhD Graduates ... 242
    CS, CE, and Information Faculty ... 246
    Narrative Highlight: Who Funds CS Departments in the U.S.? ... 255
5.2 K–12 AI Education ... 257
    United States ... 257
    State-Level Trends ... 257
    AP Computer Science ... 258
    Narrative Highlight: The State of International K–12 Education ... 260
ACCESS THE PUBLIC DATA
Chapter 5: Education
Overview
Studying the state of AI education is important for gauging some of the ways in which
the AI workforce might evolve over time. AI-related education has typically occurred
at the postsecondary level; however, as AI technologies have become increasingly
ubiquitous, this education is being embraced at the K–12 level. This chapter examines
trends in AI education at the postsecondary and K–12 levels, in both the United States
and the rest of the world.
We analyze data from the Computing Research Association’s annual Taulbee Survey
on the state of computer science and AI postsecondary education in North America,
Code.org’s repository of data on K–12 computer science in the United States, and a
recent UNESCO report on the international development of K–12 education curricula.
Chapter Highlights
More and more AI specialization.
The proportion of new computer science PhD graduates from U.S. universities who specialized in AI
jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
New AI PhDs increasingly
head to industry.
In 2011, roughly the same proportion of
new AI PhD graduates took jobs in industry
(40.9%) as opposed to academia (41.6%).
Since then, however, a majority of AI PhDs
have headed to industry. In 2021, 65.4% of AI
PhDs took jobs in industry, more than double
the 28.2% who took jobs in academia.
The gap in external
research funding for
private versus public
American CS departments
continues to widen.
In 2011, the median amount of total expenditure
from external sources for computing research
was roughly the same for private and public
CS departments in the United States. Since
then, the gap has widened, with private U.S.
CS departments receiving millions more in
additional funding than public universities.
In 2021, the median expenditure for private
universities was $9.7 million, compared to
$5.7 million for public universities.
New North American
CS, CE, and information
faculty hires stayed flat.
In the last decade, the total number of new
North American computer science (CS),
computer engineering (CE), and information
faculty hires has decreased: There were
710 total hires in 2021 compared to 733 in
2012. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and then dropped to 324 in 2021.
Interest in K–12 AI and
computer science education
grows in both the United States
and the rest of the world.
In 2021, a total of 181,040 AP computer
science exams were taken by American
students, a 1.0% increase from the previous
year. Since 2007, the number of AP computer
science exams has increased ninefold. As of
2021, 11 countries, including Belgium, China,
and South Korea, have officially endorsed
and implemented a K–12 AI curriculum.
5.1 Postsecondary AI Education
CS Bachelor’s Graduates
At the undergraduate level, most AI-related courses are offered as part of a computer science (CS) curriculum. Therefore, trends in new CS bachelor's graduates give us a proxy for undergraduate interest in AI. In 2021, the total number of new North American CS bachelor's graduates was 33,059, nearly four times greater than in 2012 (Figure 5.1.1).
New CS Bachelor’s Graduates in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 33,059 new CS bachelor's graduates]
Figure 5.1.1
Figure 5.1.2 looks at the proportion of CS bachelor's graduates in North America who are international students. The proportion stood at 16.3% in 2021 and has been steadily increasing since 2012, rising 9.5 percentage points over that period.
New International CS Bachelor’s Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 16.3%]
Figure 5.1.2
CS Master’s Graduates
AI courses are also commonly offered in CS master's degree programs. Figure 5.1.3 shows the total number of new CS master's graduates in North America since 2010. In 2021 there were roughly twice as many master's graduates as in 2012. However, from 2018 to 2021 the total number of new master's graduates plateaued, declining slightly from 15,532 to 15,068.
New CS Master’s Graduates in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 15,068 new CS master's graduates]
Figure 5.1.3
Interestingly, the number of CS master’s students at North American universities who are international started
declining in 2016 after rising in the early 2010s (Figure 5.1.4). Despite the decline, in 2021 the majority of CS
master’s graduates remained international (65.2%).
New International CS Master’s Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
New International CS Master’s Graduates (% of Total)
80%
65.20%
60%
40%
20%
0%
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
Figure 5.1.4
CS PhD Graduates
Unlike the trends in bachelor’s and master’s CS
computer science (Figure 5.1.5). There were fewer
graduates, since 2010 there have not been large
CS PhD graduates in 2021 (1,893) than in 2020 (1,997)
increases in the number of new PhD graduates in
and 2012 (1,929).
New CS PhD Graduates in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 1,893 new CS PhD graduates]
Figure 5.1.5
CS PhD graduates in North American universities are becoming increasingly international (Figure 5.1.6). In 2010,
45.8% of CS PhD graduates were international students; the proportion rose to 68.6% in 2021.
New International CS PhD Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2010 value: 45.8%; 2021 value: 68.6%]
Figure 5.1.6
Moreover, a significantly larger proportion of new CS PhD students are now specializing in AI (Figure 5.1.7). In 2021, 19.1% of new CS PhD students in North American institutions specialized in AI, a 4.2 percentage point increase since 2020 and an 8.6 percentage point increase since 2012.
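The percentage-point arithmetic above can be sketched as follows. This is an illustrative snippet, not part of the report's methodology; the figures are the shares quoted in the text.

```python
# Percentage-point change vs. relative (percent) change for the
# AI-specialization share of new CS PhD students, using the values
# quoted in the text (19.1% in 2021, 14.9% in 2020).
share_2020 = 14.9  # % of new CS PhD students specializing in AI, 2020
share_2021 = 19.1  # % of new CS PhD students specializing in AI, 2021

# The difference of two percentages is expressed in percentage points.
pp_change = share_2021 - share_2020

# The same difference expressed relative to the 2020 base.
relative_change = (share_2021 - share_2020) / share_2020 * 100

print(f"{pp_change:.1f} percentage points")       # 4.2 percentage points
print(f"{relative_change:.1f}% relative growth")  # 28.2% relative growth
```

The distinction matters when reading the chapter: a 4.2 percentage point rise from a 14.9% base is a roughly 28% relative increase.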
New CS PhD Students (% of Total) Specializing in AI, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 19.1%]
Figure 5.1.7
Where do new AI PhDs choose to work following graduation? Mirroring trends reported in last year's AI Index report, an increasingly large proportion of AI PhD graduates are heading to industry (Figures 5.1.8 and 5.1.9). In 2011, for example, roughly the same percentage of graduates took jobs in industry (40.9%) as in academia (41.6%). However, as of 2021 a significantly larger proportion of graduates (65.4%) went to industry after graduation than to academia (28.2%). The share of new AI PhDs entering government was 0.7% and has remained relatively unchanged in the last half-decade.
Employment of New AI PhDs (% of Total) in North America by Sector, 2010–21
Employment of New AI PhDs in North America by Sector, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Charts; 2021 values: 65.4% industry, 28.2% academia, 0.7% government]
Figure 5.1.8
Figure 5.1.9
1 The sums in Figure 5.1.9 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status
in the CRA survey. These students are not included in the chart.
CS, CE, and Information Faculty
To better understand trends in AI and CS education, it is instructive to consider data on computer science faculty in addition to postsecondary students. Figure 5.1.10 highlights the total number of CS, CE (computer engineering), and information faculty in North American universities. The number of faculty has marginally increased in the last year, by 2.2%. Since 2011 the number of CS, CE, and information faculty has grown by 32.8%.
Number of CS, CE, and Information Faculty in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart by faculty type: tenure track, teaching professors, other instructors, research, postdoc. Totals rose from 6,138 in 2011 to 8,149 in 2021]
Figure 5.1.10
In 2021 there were a total of 6,789 CS faculty members in the United States (Figure 5.1.11). The total number
of CS faculty in the United States increased by only 2.0% in the last year, but by 39.0% since 2011.
Number of CS Faculty in the United States, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart by faculty type: tenure track, teaching professors, other instructors, research, postdoc. Totals rose from 4,885 in 2011 to 6,789 in 2021]
Figure 5.1.11
Figure 5.1.12 reports the total number of new CS, CE, and information faculty hires in North American universities. In the last decade, the total number of new faculty hires has decreased: There were 710 total hires in 2021, while in 2012 there were 733. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and has since dropped to 324 in 2021.
New CS, CE, and Information Faculty Hires in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart with two series, total and tenure-track; total hires: 733 in 2012, 710 in 2021; tenure-track hires: 422 in 2019, 324 in 2021]
Figure 5.1.12
In 2021, the greatest percentage of new CS, CE, and information faculty hires (40%) came straight from
receiving a PhD (Figure 5.1.13). Only 11% of new CS and CE faculty came from industry.
Source of New Faculty in North American CS, CE, and Information Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Stacked area chart; categories: new PhD, from postdoc, from other academic, from industry. 2021 values: 40% new PhD, 11% from industry]
Figure 5.1.13
The share of filled new CS, CE, and information faculty positions in North American universities has remained
relatively stable in the last decade (Figure 5.1.14). In 2021, 89.3% of new faculty positions were filled, compared
to 82.7% in 2011.
Share of Filled New CS, CE, and Information Faculty Positions in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2011 value: 82.7%; 2021 value: 89.3%]
Figure 5.1.14
Among open CS, CE, and information faculty positions in 2021, the most commonly cited reason for their
remaining unfilled was offers being turned down (53%) (Figure 5.1.15). In 22% of cases, hiring was still in
progress, while 14% of the time, a candidate had not been identified who met the department’s hiring goals.
Reason Why New CS, CE, and Information Faculty Positions Remained Unfilled (% of Total), 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
Categories: didn't find a person who met our hiring goals; technically vacant, not filled for administrative reasons; offers turned down; hiring in progress; other.
[Stacked area chart; 2021 values: offers turned down 53%, hiring in progress 22%, didn't find a person who met hiring goals 14%]
Figure 5.1.15
Figure 5.1.16 highlights the median nine-month salaries of CS faculty in the United States by position since 2015. During that period, the salaries for all classes of professors have increased. In 2021, the average full professor in computer science made 3.2% more than they did in 2020, and 12.8% more than they did in 2015. (Note: These figures have not been adjusted for inflation.)
Median Nine-Month Salary of CS Faculty in United States, 2015–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 medians (in thousands of U.S. dollars): full professor 176.01, associate professor 127.47, assistant professor 113.95]
Figure 5.1.16
What proportion of new CS, CE, and information faculty tenure-track hires are international? The data suggests
that it is not a substantial proportion. In 2021, only 13.2% of new CS, CE, and information faculty hires were
international (Figure 5.1.17).
New International CS, CE, and Information Tenure-Track Faculty Hires (% of Total) in North America,
2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 13.2%]
Figure 5.1.17
The largest share of CS, CE, and information faculty losses in North American departments (36.3%) were the result of faculty taking academic positions elsewhere (Figure 5.1.18). In 2021, 15.2% of faculty took nonacademic positions, roughly the same share as a decade prior, in 2011 (15.9%).
Faculty Losses in North American CS, CE, and Information Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart; categories: died; took academic position elsewhere; remained, but changed to part-time; retired; took nonacademic position; other; unknown]
Figure 5.1.18
Narrative Highlight:
Who Funds CS Departments in the U.S.?
The CRA tracks data on the external funding sources of CS departments in the United States. The main funder of American CS departments continues to be the National Science Foundation (NSF), which in 2021 accounted for 34.9% of external funds. However, the share of funding provided by the NSF has decreased since 2003 (Figure 5.1.19). In 2021, the next largest sources of funding came from defense agencies such as the Army Research Office, the Office of Naval Research, and the Air Force Research Laboratory (20.3%); industrial sources (12.1%); the Defense Advanced Research Projects Agency (DARPA) (8.8%); and the National Institutes of Health (NIH) (6.8%). The diminishing share of NSF funds over time has been partially offset by increasing funds from industry and the NIH.
External Funding Sources (% of Total) of CS Departments in United States, 2003–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 shares: NSF 34.9%; other defense 20.3%; industrial sources 12.1%; DARPA 8.8%; NIH 6.8%; private foundations 4.9%; other 4.6%; other federal 3.6%; DOE 2.3%; state agencies 1.5%; unallocated 0.4%; IMLS 0.0%]
Figure 5.1.19
Narrative Highlight:
Who Funds CS Departments in the U.S.? (cont’d)
Figure 5.1.20 shows the median total expenditures from external sources for computing research in American CS departments. In 2021, the median total expenditure for private universities was $9.7 million compared with $5.7 million for public universities. Although total median expenditures have increased over the last decade for both private and public CS departments, the gap in expenditure has widened, with private universities beginning to significantly outspend public ones.
Median Total Expenditure From External Sources for Computing Research of U.S. CS Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 values (in millions of U.S. dollars): private 9.71, public 5.69]
Figure 5.1.20
The following subsection shows trends in K–12 AI education based on K–12 computer science education data in the United States as well
as survey data from UNESCO on the state of global K–12 AI education.
5.2 K–12 AI Education
United States
State-Level Trends
Data on the state of K–12 CS education in the United States comes from Code.org, an education innovation nonprofit dedicated to ensuring that every school includes computer science as part of its core K–12 education. Tracking trends in K–12 CS education can partially serve as a proxy for understanding the state of K–12 AI education in America.
Figure 5.2.1 highlights the 27 states that in 2022 required that all high schools offer a computer science course.
States Requiring That All High Schools Offer a Computer Science Course, 2022
Source: Code.org, 2022 | Chart: 2023 AI Index Report
[U.S. map; each state marked yes or no]
Figure 5.2.1
Figure 5.2.2 highlights the percentage of public high schools in a state that teach computer science. The top three states in terms of rate of computer science teaching are Maryland (98%), South Carolina (93%), and Arkansas (92%).
Public High Schools Teaching Computer Science (% of Total in State), 2022
Source: Code.org, 2022 | Chart: 2023 AI Index Report

AK 51%   AL 85%   AR 92%   AZ 36%   CA 40%   CO 57%   CT 77%   DC 45%   DE 40%
FL 40%   GA 66%   HI 77%   IA 71%   ID 38%   IL 44%   IN 85%   KS 40%   KY 63%
LA 32%   MA 78%   MD 98%   ME 60%   MI 46%   MN 21%   MO 49%   MS 60%   MT 36%
NC 61%   ND 44%   NE 52%   NH 82%   NJ 67%   NM 41%   NV 83%   NY 48%   OH 48%
OK 62%   OR 63%   PA 77%   RI 86%   SC 93%   SD 39%   TN 60%   TX 47%   UT 73%
VA 75%   VT 76%   WA 47%   WI 66%   WV 78%   WY 55%
Figure 5.2.2
AP Computer Science
Another barometer for tracking the state of K–12 CS education in the United States is analyzing trends in the total number of AP computer science exams taken.2 In 2021, the most recent year for which there is data, there were a total of 181,040 AP computer science exams taken, roughly the same number as the previous year, after several years of significant increases. This leveling could be the result of the pandemic. Since 2007, the number of AP computer science exams has increased over ninefold.
Number of AP Computer Science Exams Taken, 2007–21
Source: Code.org, 2022 | Chart: 2023 AI Index Report
[Line chart; 2021 value: 181,040 exams]
Figure 5.2.3
2 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles
was initially offered in 2017.
In 2021, the states that saw the greatest number of AP computer science exams taken were California (31,189), followed by Texas (17,307), Florida (14,864), New York (13,304), and New Jersey (9,391) (Figure 5.2.4).

Number of AP Computer Science Exams Taken, 2021
Source: Code.org, 2022 | Chart: 2023 AI Index Report
[U.S. map of exam counts by state]
Figure 5.2.4

Figure 5.2.5 looks at the number of AP CS exams taken per capita.3 The state with the largest per capita number of AP computer science exams taken in 2021 was Maryland, with 124.1 exams per 100,000 inhabitants. The next states were New Jersey (101.3), Connecticut (89.7), California (79.7), and Massachusetts (78.0).
Number of AP Computer Science Exams Taken per 100,000 Inhabitants, 2021
Source: Code.org, 2022 | Chart: 2023 AI Index Report
[U.S. map of per capita exam counts by state]
Figure 5.2.5
3 More specifically, Figure 5.2.5 normalizes the number of AP CS exams taken—the total number of exams taken in a particular state in 2021 is divided by the state’s
population based on the 2021 U.S. Census.
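The normalization in footnote 3 can be sketched as follows. This is an illustrative snippet, not the report's code; the Maryland exam count (7,662) comes from the Figure 5.2.4 data, while the population figure is an approximate 2021 Census estimate used only for illustration.

```python
# Per capita normalization from footnote 3: exams taken in a state,
# divided by the state's population, scaled to 100,000 inhabitants.
md_exams_2021 = 7_662           # AP CS exams taken in Maryland, 2021
md_population_2021 = 6_174_000  # approximate 2021 Census population (assumption)

def exams_per_100k(exams: int, population: int) -> float:
    """Exams taken per 100,000 inhabitants (illustrative helper)."""
    return exams / population * 100_000

print(round(exams_per_100k(md_exams_2021, md_population_2021), 1))  # 124.1
```

With these inputs the result matches the 124.1 exams per 100,000 inhabitants reported for Maryland in Figure 5.2.5.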
Narrative Highlight:
The State of International K–12 Education
In 2021, UNESCO released one of the most comprehensive reports to date on the international state of government-endorsed AI curricula. To gather information, UNESCO released two surveys: the first to representatives of 193 UNESCO member states and the second to over 10,000 private- and third-sector actors. As part of these surveys, respondents were asked to report on the status of AI curricula for students in K–12 general education. Figure 5.2.6, taken from the UNESCO report, highlights the governments that have taken steps to implement AI curricula and across which levels of education. For example, Germany is in the process of developing government-endorsed AI curricular standards on the primary, middle, and high-school levels, and the Chinese government has already endorsed and implemented standards across those same three levels.
Government Implementation of AI Curricula by Country, Status, and Education Level
Source: UNESCO, 2022 | Table: 2023 AI Index Report

Endorsed and Implemented: Armenia, Austria, Belgium, China, India, Kuwait, Portugal, Qatar, Serbia, South Korea, United Arab Emirates
In Development: Bulgaria, Germany, Jordan, Saudi Arabia, Serbia
[The original table also marks, for each country, the education levels covered: primary school, middle school, and/or high school]
Figure 5.2.6
4 According to the UNESCO report, Serbia has already endorsed and implemented certain kinds of K–12 AI curricula, but is also simultaneously in the process of
developing others—thus it is listed under both categories.
Narrative Highlight:
The State of International K–12 Education (cont’d)
Figure 5.2.7 identifies the topic areas most emphasized in the K–12 AI curricula profiled in the UNESCO
report. The four topics toward which the most time was allocated were algorithms and programming (18%),
AI technologies (14%), data literacy (12%), and application of AI to other domains (12%).
Time Allocated (% of Total) in K–12 AI Curricula by Topic, 2022
Source: UNESCO, 2022 | Chart: 2023 AI Index Report
Algorithms and Programming: 18%
AI Technologies: 14%
Data Literacy: 12%
Application of AI to Other Domains: 12%
Contextual Problem-Solving: 11%
Developing AI Technologies: 9%
Ethics of AI: 7%
Social Implications of AI: 5%
[The chart groups topics under AI foundations; AI techniques; ethics and social impact; understanding, using, and developing AI; and unspecified; smaller and unspecified topics account for the remaining time]
Figure 5.2.7
Narrative Highlight:
The State of International K–12 Education (cont’d)
What might an actual K–12 AI curriculum look
like in practice? The UNESCO report includes
detailed information about a sample curriculum
that was deployed in Austria, the Austrian Data
Science and Artificial Intelligence curriculum.
As noted in the report:
“The Austrian Data Science and Artificial
Intelligence curriculum includes digital basics such
as using an operating system to store and print
files, design presentations, and use spreadsheets
and word-processing software. It also covers
design and reflection on types and social issues in
digital media, and safe digital media use. Students
in high school engage programming languages,
algorithms and simulations. They learn the basic
principles of data literacy, including collecting
data, structuring a spreadsheet, and carrying out
analyses and visualizations. They apply criteria
to evaluate the credibility and reliability of data
sources as well as digital content. Students are
expected to know about careers in ICT, including
AI, and the social applications of emerging
technologies. They create digital media and learn
about the cloud and how to connect and network
computers. They also gain an understanding of
the ethical dilemmas that are associated with
the use of such technologies, and become active
participants in social discourse on these issues.
Finally, students are tasked with using technology
to make public statements and understand how
this reflects the democratic process.”
CHAPTER 6:
Policy and
Governance
CHAPTER 6 PREVIEW:
Policy and Governance
Overview  265
Chapter Highlights  266
6.1 AI and Policymaking  267
Global Legislation Records on AI  267
By Geographic Area  269
Narrative Highlight: A Closer Look at Global AI Legislation  270
United States Federal AI Legislation  271
United States State-Level AI Legislation  272
Narrative Highlight: A Closer Look at State-Level AI Legislation  275
Global AI Mentions  276
By Geographic Area  277
Narrative Highlight: A Closer Look at Global AI Mentions  279
United States Committee Mentions  280
United States AI Policy Papers  283
By Topic  284
6.2 National AI Strategies  285
Aggregate Trends  285
By Geographic Area  285
6.3 U.S. Public Investment in AI  286
Federal Budget for Nondefense AI R&D  286
U.S. Department of Defense Budget Requests  287
U.S. Government AI-Related Contract Spending  288
Total Contract Spending  288
6.4 U.S. AI-Related Legal Cases  291
Total Cases  291
Geographic Distribution  292
Sector  293
Type of Law  294
Narrative Highlight: Three Significant AI-Related Legal Cases  295
Chapter 6: Policy and Governance
Overview
The growing popularity of AI has prompted intergovernmental, national, and
regional organizations to craft strategies around AI governance. These actors are
motivated by the realization that the societal and ethical concerns surrounding AI
must be addressed to maximize its benefits. The governance of AI technologies has
become essential for governments across the world.
This chapter examines AI governance on a global scale. It begins by highlighting the
countries leading the way in setting AI policies. Next, it considers how AI has been
discussed in legislative records internationally and in the United States. The chapter
concludes with an examination of trends in various national AI strategies, followed
by a close review of U.S. public sector investment in AI.
Chapter Highlights
Policymaker interest
in AI is on the rise.
An AI Index analysis of the legislative records
of 127 countries shows that the number of bills
containing “artificial intelligence” that were
passed into law grew from just 1 in 2016 to 37 in
2022. An analysis of the parliamentary records on
AI in 81 countries likewise shows that mentions
of AI in global legislative proceedings have
increased nearly 6.5 times since 2016.
From talk to enactment—
the U.S. passed more
AI bills than ever before.
In 2021, only 2% of all federal AI bills in the
United States were passed into law. This number
jumped to 10% in 2022. Similarly, last year 35%
of all state-level AI bills were passed into law.
The U.S. government
continues to increase
spending on AI.
Since 2017, the amount of U.S. government
AI-related contract spending has increased
roughly 2.5 times.
When it comes to AI,
policymakers have
a lot of thoughts.
A qualitative analysis of the
parliamentary proceedings of a
diverse group of nations reveals
that policymakers think about AI
from a wide range of perspectives.
For example, in 2022, legislators in
the United Kingdom discussed the
risks of AI-led automation; those
in Japan considered the necessity
of safeguarding human rights in
the face of AI; and those in Zambia
looked at the possibility of using AI
for weather forecasting.
The legal world is
waking up to AI.
In 2022, there were 110 AI-related
legal cases in United States state
and federal courts, roughly seven
times more than in 2016. The
majority of these cases originated
in California, New York, and Illinois,
and concerned issues relating to
civil, intellectual property, and
contract law.
In the last 10 years, AI governance discussions have accelerated, resulting in numerous policy proposals in various legislative bodies. This
section begins by exploring the legislative initiatives related to AI that have been suggested or enacted in different countries and regions,
followed by an in-depth examination of state-level AI legislation in the United States. The section then scrutinizes records of AI-related
discussions in parliaments and congresses worldwide and concludes with the number of AI policy papers published in the United States.
6.1 AI and Policymaking1
Global Legislative
Records on AI
The AI Index conducted an analysis of laws passed by legislative bodies in 127 countries that contain the words “artificial intelligence” from 2016 to 2022.2 Of the 127 countries analyzed, 31 have passed at least one AI-related bill since 2016, and together they have passed a total of 123 AI-related bills (Figure 6.1.1). Figure 6.1.2 shows a sharp increase over this period in the total number of AI-related bills passed into law, from just one in 2016 to 37 in 2022.
Number of AI-Related Bills Passed Into Law by Country, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[World map; countries shaded by total AI-related bills passed: 0, 1–5, 6–10, 11–15, 16–25, or no available data]
Figure 6.1.1
1 Note that the analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science
Act passed by the U.S. in 2022.
2 The full list of countries analyzed is in the Appendix. The AI Index team attempted to research the legislative bodies of every country in the world; however, publicly accessible legislative
databases were not made available for certain countries.
Number of AI-Related Bills Passed Into Law in 127 Select Countries, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Bar chart, 2016–2022; number of AI-related bills passed per year, reaching 37 in 2022]
Figure 6.1.2
By Geographic Area
Figure 6.1.3 shows the number of laws containing mentions of AI that were enacted in 2022. The United States led the list with 9 laws, followed by Spain and the Philippines, which passed 5 and 4 laws, respectively. Figure 6.1.4 shows the total number of laws passed since 2016. The United States leads the list with 22 bills, followed by Portugal, Spain, Italy, and Russia.
Number of AI-Related Bills Passed Into Law in Select Countries, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
United States: 9
Spain: 5
Philippines: 4
Andorra: 2
Belgium: 2
Italy: 2
Portugal: 2
Russia: 2
United Kingdom: 2
Austria: 1
Croatia: 1
Germany: 1
Kyrgyz Republic: 1
Latvia: 1
Liechtenstein: 1
Slovenia: 1
Figure 6.1.3
Number of AI-Related Bills Passed Into Law in Select Countries, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
United States: 22
Portugal: 13
Spain: 10
Italy: 9
Russia: 9
Belgium: 7
United Kingdom: 6
Austria: 5
Korea, Rep.: 5
Philippines: 5
France: 4
China: 3
Germany: 3
Japan: 3
Figure 6.1.4
Narrative Highlight:
A Closer Look at Global AI Legislation
The following subsection delves into some of the AI-related legislation passed into law during 2022.
Figure 6.1.5 samples five different countries’ laws covering a range of AI-related issues.
AI-Related Legislation From Select Countries, 2022
Source: AI Index, 2022 | Table: 2023 AI Index Report

Kyrgyz Republic | About the Creative Industries Park
This law determines the legal status, management, and operation procedures of the Creative Industries Park, established to accelerate the development of creative industries, including artificial intelligence.

Latvia | Amendments to the National Security Law
A provision of this act establishes restrictions on commercial companies, associations, and foundations important for national security, including a commercial company that develops artificial intelligence.

Philippines | Second Congressional Commission on Education (EDCOM II) Act
A provision of this act creates a congressional commission to review, assess, and evaluate the state of Philippine education; to recommend innovative and targeted policy reforms in education; and to appropriate funds. The act calls for reforms to meet the new challenges to education caused by the Fourth Industrial Revolution, characterized, in part, by the rapid development of artificial intelligence.

Spain | Right to equal treatment and non-discrimination
A provision of this act establishes that artificial intelligence algorithms involved in public administrations’ decision-making take into account bias-minimization criteria, transparency, and accountability, whenever technically feasible.

United States | AI Training Act
This bill requires the Office of Management and Budget to establish or otherwise provide an AI training program for the acquisition workforce of executive agencies (e.g., those responsible for program management or logistics), with exceptions. The purpose of the program is to ensure that the workforce has knowledge of the capabilities and risks associated with AI.
Figure 6.1.5
United States Federal
AI Legislation
A closer look at the U.S. federal legislative record shows a sharp increase in the total number of proposed bills that relate to AI (Figure 6.1.6). In 2015, just one federal bill was proposed, while in 2021, 134 bills were proposed. In 2022 this number fell to 88 proposed bills. While fewer bills were proposed in 2022, the number of passed bills, which had remained at 3 for each of the past four years, increased to 9.
Number of AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Line chart, 2015–2022; in 2022, 88 bills proposed and 9 passed]
Figure 6.1.6
United States State-Level
AI Legislation
Figure 6.1.7 shows the number of laws containing mentions of AI that were passed by U.S. states in 2022. California leads the list with 5, followed by Maryland with 3. Figure 6.1.8 shows the total volume of legislation passed from 2016 to 2022 for select states, with Maryland leading the list with 7 bills, followed by California, Massachusetts, and Washington. Figure 6.1.9 highlights the number of state-level AI-related bills passed by all states since 2016.
Number of AI-Related Bills Passed Into Law in Select U.S. States, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
California: 5
Maryland: 3
Colorado: 2
New Jersey: 2
Washington: 2
Alabama: 1
Hawaii: 1
Idaho: 1
Louisiana: 1
Massachusetts: 1
North Carolina: 1
Vermont: 1
Figure 6.1.7
Number of AI-Related Bills Passed Into Law in Select U.S. States, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Maryland: 7
California: 6
Massachusetts: 5
Washington: 5
Illinois: 3
Utah: 3
Vermont: 3
Alabama: 2
Colorado: 2
Michigan: 2
New Jersey: 2
New York: 2
North Carolina: 2
Ohio: 2
Figure 6.1.8
Number of State-Level AI-Related Bills Passed Into Law in the United States by State, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[U.S. map; bills per state: MD 7, CA 6, MA 5, WA 5, IL 3, UT 3, VT 3, AL 2, CO 2, MI 2, NC 2, NJ 2, NY 2, OH 2, TX 2, HI 1, ID 1, KY 1, LA 1, MS 1, ND 1, NV 1, VA 1, WV 1; all other states 0]
Figure 6.1.9
Growing policy interest in AI can also be seen at the state level, with 60 AI-related bills proposed in 2022
(Figure 6.1.10)—a dramatic increase from the 5 bills proposed in 2015. Additionally, the proportion of bills being
passed has risen over the years. In 2015, 1 bill was passed, representing 16% of the total bills proposed
that year, while in 2022, 21 bills were passed, or 35% of the total proposed.
Number of State-Level AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Line chart, 2015–2022; in 2022, 60 bills proposed and 21 passed]
Figure 6.1.10
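The federal and state pass rates quoted in this chapter follow directly from the proposed and passed counts in Figures 6.1.6 and 6.1.10. A minimal sketch of the arithmetic, with the 2022 counts hard-coded from the text:

```python
# Pass rates for AI-related bills in 2022, using counts cited in the text:
# federal (Figure 6.1.6): 88 proposed, 9 passed;
# state (Figure 6.1.10): 60 proposed, 21 passed.

def pass_rate(passed: int, proposed: int) -> float:
    """Share of proposed bills passed into law, as a percentage."""
    return 100 * passed / proposed

print(f"Federal 2022: {pass_rate(9, 88):.0f}%")   # ~10%, as cited in the chapter highlights
print(f"State 2022:   {pass_rate(21, 60):.0f}%")  # 35%
```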
Narrative Highlight:
A Closer Look at State-Level AI Legislation
The following subsection highlights some of the AI-related legislation passed into law at the state level
during 2022. Figure 6.1.11 focuses on wide-ranging AI-related laws from five states around the country.
AI-Related Legislation From Select States, 2022
Source: AI Index, 2022 | Table: 2023 AI Index Report

Alabama | Artificial Intelligence, Limit the Use of Facial Recognition, to Ensure Artificial Intelligence Is Not the Only Basis for Arrest
This bill prohibits state or local law enforcement agencies from using facial recognition match results as the sole basis for making an arrest or for establishing probable cause in a criminal investigation.

California | Budget Act of 2022
A provision of this appropriations bill for the 2022–23 fiscal year allocates $1,300,000 to California State University, Sacramento, to improve the campus childcare center, including the development of an artificial intelligence mixed-reality classroom.

Maryland | Conservation Finance Act
A provision of this act establishes that the Department of Natural Resources shall study and assess the potential for digital tools and platforms, including artificial intelligence and machine learning, to contribute to Chesapeake Bay restoration and climate solutions.

New Jersey | 21st Century Integrated Digital Experience Act
A provision of this act, which concerns the modernization of state government websites, establishes that the chief technology officer, in consultation with the chief innovation officer and the New Jersey Information Technology Project Review Board, shall evaluate on an annual basis the feasibility of state agencies using artificial intelligence and machine learning to provide public services.

Vermont | An Act Relating to the Use and Oversight of Artificial Intelligence in State Government
This act creates the Division of Artificial Intelligence within the Agency of Digital Services to review all aspects of artificial intelligence developed, employed, or procured by the state government. The act requires the Division of Artificial Intelligence to, among other things, propose a state code of ethics on the use of artificial intelligence in state government and make recommendations to the General Assembly on policies, laws, and regulations regarding artificial intelligence in state government.
Figure 6.1.11
Global AI Mentions
Another barometer of legislative interest is the number of mentions of “artificial intelligence” in governmental and parliamentary proceedings. The AI Index conducted an analysis of the minutes or proceedings of legislative sessions in 81 countries that contain the keyword “artificial intelligence” from 2016 to 2022.3 Figure 6.1.12 shows that mentions of AI in legislative proceedings in these countries registered a small decrease from 2021 to 2022, from 1,547 to 1,340.
Number of Mentions of AI in Legislative Proceedings in 81 Select Countries, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Line chart, 2016–2022; mentions fell from 1,547 in 2021 to 1,340 in 2022]
Figure 6.1.12
3 The full list of countries analyzed is in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the
world; however, publicly accessible governmental and parliamentary databases were not made available for all countries.
By Geographic Area
Figure 6.1.13 shows the number of legislative proceedings containing mentions of AI in 2022.4 From the 81
countries considered, 46 had at least one mention, and Spain topped the list with 273 mentions, followed by
Canada (211), the United Kingdom (146), and the United States (138).
Number of Mentions of AI in Legislative Proceedings by Country, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[World map; countries shaded by number of mentions: 0, 1–55, 56–110, 111–165, 166–220, 221–280, or no available data]
Figure 6.1.13
4 For mentions of AI in legislative proceedings around the world, the AI Index performed searches of the keyword “artificial intelligence,” in the respective languages, on the websites of
different countries’ congresses or parliaments, usually under sections named “minutes,” “Hansard,” etc.
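The mention counts behind Figures 6.1.12 through 6.1.14 rest on keyword searches of this kind. As an illustration only (the Index’s actual multilingual collection pipeline is not described here, and the proceedings below are invented examples), a case-insensitive keyword count might look like:

```python
import re

# Hypothetical proceedings text; the real analysis searches per-country
# parliamentary records ("minutes," "Hansard," etc.) in each language.
proceedings = {
    2021: [
        "The committee discussed artificial intelligence and privacy.",
        "Artificial intelligence was raised again: artificial intelligence in health.",
    ],
    2022: ["No relevant mentions this session."],
}

pattern = re.compile(r"artificial intelligence", re.IGNORECASE)

# Count every occurrence of the keyword, per year.
mentions_by_year = {
    year: sum(len(pattern.findall(text)) for text in texts)
    for year, texts in proceedings.items()
}
print(mentions_by_year)  # {2021: 3, 2022: 0}
```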
Figure 6.1.14 shows the total number of AI mentions in the past seven years. Of the 81 countries considered, 62 had
at least one mention, and the United Kingdom dominates the list with 1,092 mentions, followed by Spain (832), the
United States (626), Japan (511), and Hong Kong (478).
Number of Mentions of AI in Legislative Proceedings by Country, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[World map; countries shaded by number of mentions: 0, 1–220, 221–440, 441–660, 661–880, 881–1,100, or no available data]
Figure 6.1.14
Narrative Highlight:
A Closer Look at Global AI Mentions
The following subsection examines mentions of AI in government proceedings in 2022. Figure 6.1.15
quotes discussions across a geographically diverse set of countries.
AI-Related Parliamentary Mentions From Select Countries, 2022
Source: AI Index, 2022 | Table: 2023 AI Index Report

Australia | House of Representatives | Ed Husic, Australian Labor Party, Minister for Industry and Science | National Reconstruction Fund Corporation Bill 2022 - Second Reading
“Working with our international partners we can transform Australian know-how into globally recognised skills and manufacturing in defence industries. And we can build on our undeniable expertise in areas like quantum technologies, robotics and artificial intelligence. We will seek to partner with industry and state and territory governments to identify investment opportunities within priority areas. An on-ramp, if you will, of turn-key opportunities for investment to make sure the NRF is well placed for success.”

Brazil | Diary of the Chamber of the Members | Mr. Gustavo Fruet, Democratic Labor Party | Presentation of Bill No. 135, of 2022, on the amendment of the CLT - Consolidation of Labor Laws, with a view to granting telework to parents of children up to 8 years old
“There has been a lot of talk about the future of work due to technology. In the book The Fourth Industrial Revolution, Klaus Schwab even points out professions that will be extinct and professions that will demand more and more qualifications, in times of 5G, Internet of Things and Artificial Intelligence. In this sense, it is good to highlight that the pandemic, among other contradictions, ended up anticipating the use of technology, especially in the telework.”

Japan | 210th Session of the Diet, House of Councilors, Commission on the Constitution No. 2 | Kohei Otsuka, Democratic Party for the People, Shinryokufukai | The Commission on the Constitution
“In the field of human rights, we believe that it is necessary to update human rights guarantees in order to respond to changes in the times that were unpredictable when the Constitution was enacted. In particular, as the fusion of artificial intelligence and Internet technology progresses, the international community is concerned about the problems of individual scoring and discrimination, and the problem of Internet advertising that unfairly influences the voting behavior of citizens. We need a constitutional argument to guarantee the autonomous decision-making of individuals and protect basic data rights in the digital age.”

United Kingdom | House of Commons | Dame Angela Eagle, Labour | Financial Services and Markets Bill (Fourth Sitting)
“What would be the use of artificial intelligence in trying to decide how automated these things could become? Would there be worries about over-automation? How would that be looked at in terms of regulation? How open are we going to be about the way in which AI is applied and how it might evolve in ways that might embed discrimination such that we get a system where certain people may be discriminated against and excluded?”

Zambia | The House, National Assembly | Hon. Collins Nzovu, United Party for National Development, The Minister of Green Economy and Environment | Ministerial Statements; Weather and Climate Services and the 2022/2023 rainfall forecast
“Madam Speaker, in order to enhance quality and accuracy of weather forecast, the Government, with financial support from the United Nations Development Programme Strengthening Climate Resilience of Agricultural Livelihoods in Agro-Ecological (UNDP SCRALA) project is currently partnering with the University of Zambia (UNZA) to develop a seasonal weather forecasting system using artificial intelligence.”
Figure 6.1.15
Table of Contents
Chapter 6 Preview
279
Artificial Intelligence
Index Report 2023
Chapter 6: Policy and Governance
6.1 AI and Policymaking
United States
Committee Mentions
An additional indicator of legislative interest is the number of mentions of "artificial intelligence" in committee reports produced by House and Senate committees that address legislative and other policy issues, investigations, and internal committee matters. Figure 6.1.16 shows a sharp increase in the total number of mentions of AI within committee reports beginning with the 115th legislative session.
Mentions of AI in U.S. Committee Reports by Legislative Session, 2001–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Bar chart covering the 107th (2001–02) through 117th (2021–22) sessions; mentions climb sharply from the 115th session onward, peaking at 73 in the 117th.]
Figure 6.1.16
Figure 6.1.17 shows the mentions in committee reports for the 117th Congressional Session, which took place
from 2021 to 2022. The Appropriations Committee leads the House reports, while the Homeland Security and
Governmental Affairs Committee leads the Senate reports (Figure 6.1.18).
Mentions of AI in Committee Reports of the U.S. House of Representatives for the 117th Congressional Session, 2021–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Appropriations: 20
Science, Space, and Technology: 9
Rules: 5
Armed Services: 3
Transportation and Infrastructure: 3
Intelligence (Permanent Select): 2
Natural Resources: 2
Oversight and Accountability: 2
Budget: 1
Education and the Workforce: 1
Energy and Commerce: 1
Financial Services: 1
Foreign Affairs: 1
Homeland Security: 1
House Administration: 1
Small Business: 1
Ways and Means: 1
Figure 6.1.17
Mentions of AI in Committee Reports of the U.S. Senate for the 117th Congressional Session, 2021–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Homeland Security and Governmental Affairs: 8
Appropriations: 3
Commerce, Science, and Transportation: 3
Armed Services: 2
Intelligence (Select): 2
Figure 6.1.18
Figure 6.1.19 shows the total number of mentions in committee reports from the past 10 congressional sessions,
which took place from 2001 to 2022. The House and Senate Appropriations Committees, which regulate
expenditures of money by the government, lead their respective lists (Figures 6.1.19 and 6.1.20).
Mentions of AI in Committee Reports of the U.S. Senate, 2001–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Appropriations: 16
Homeland Security and Governmental Affairs: 11
Armed Services: 10
Commerce, Science, and Transportation: 9
Energy and Natural Resources: 7
Intelligence (Select): 5
Figure 6.1.19
Mentions of AI in Committee Reports of the U.S. House of Representatives, 2001–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Appropriations: 45
Science, Space, and Technology: 27
Rules: 14
Armed Services: 9
Energy and Commerce: 8
Oversight and Accountability: 8
Financial Services: 6
Intelligence (Permanent Select): 6
Education and the Workforce: 4
Transportation and Infrastructure: 4
Homeland Security: 3
Veterans' Affairs: 3
Budget: 2
Foreign Affairs: 2
Judiciary: 2
Natural Resources: 2
Small Business: 2
Ways and Means: 2
Agriculture: 1
House Administration: 1
Figure 6.1.20
United States AI Policy Papers
To estimate activities outside national governments that are also informing AI-related lawmaking, the AI Index tracked 55 U.S.-based organizations that published policy papers in the past five years. Those organizations include: think tanks and policy institutes (19); university institutes and research programs (14); civil society organizations, associations, and consortiums (9); industry and consultancy organizations (9); and government agencies (4). A policy paper in this section is defined as a research paper, research report, brief, or blog post that addresses issues related to AI and makes specific recommendations to policymakers. Topics of those papers are divided into primary and secondary categories: A primary topic is the main focus of the paper, while a secondary topic is a subtopic of the paper or an issue that is briefly explored.

Figure 6.1.21 highlights the total number of U.S.-based, AI-related policy papers published from 2018 to 2022. After a slight dip from 2020 to 2021, the total increased to 284 in 2022. Since 2018, the total number of such papers has increased 3.2 times, signaling greater interest over time.
Number of AI-Related Policy Papers by U.S.-Based Organizations, 2018–22
Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report
[Bar chart of annual totals, 2018–2022; the total reaches 284 in 2022.]
Figure 6.1.21
By Topic
In 2022, the most frequent primary topics were industry and regulation (107), innovation and technology (90), and government and public administration (82) (Figure 6.1.22). Privacy, safety, and security, which was the most reported topic in 2021, sat in fourth position as of 2022. All of these leading topics were also well represented as secondary topics. Topics that received comparatively little attention included social and behavioral sciences; humanities; and communications and media.
Number of AI-Related Policy Papers by U.S.-Based Organization by Topic, 2022
Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report
[Paired bar chart of primary- and secondary-topic counts across topics including industry and regulation; innovation and technology; government and public administration; privacy, safety, and security; ethics; equity and inclusion; international affairs and international security; workforce and labor; democracy; health and biological sciences; justice and law enforcement; physical sciences; education and skills; energy and environment; social and behavioral sciences; humanities; and communications and media.]
Figure 6.1.22
6.2 National AI Strategies
This subsection presents an overview of national AI strategies—policy plans developed by a country’s government to steer the
development and deployment of AI technologies within its borders. Tracking trends in national strategies can be an important way of
gauging the degree to which countries are prioritizing the management and regulation of AI technologies. Sources include websites of
national or regional governments, the OECD AI Policy Observatory (OECD.AI), and news coverage. “AI strategy” is defined as a policy
document that communicates the objective of supporting the development of AI while also maximizing the benefits of AI for society.5
Aggregate Trends
Canada officially launched the first national AI strategy in March of 2017; since then a total of 62 national AI strategies have been released (Figure 6.2.1). The number of released strategies peaked in 2019.

Yearly Release of AI National Strategies by Country
Source: AI Index, 2022 | Table: 2023 AI Index Report
2017: Canada, China, Finland
2018: Australia, France, Germany, India, Mauritius, Mexico, Sweden
2019: Argentina, Austria, Bangladesh, Botswana, Chile, Colombia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Japan, Kenya, Lithuania, Luxembourg, Malta, Netherlands, Portugal, Qatar, Romania, Russia, Sierra Leone, Singapore, United Arab Emirates, United States of America, Uruguay
2020: Algeria, Bulgaria, Croatia, Greece, Hungary, Indonesia, Latvia, Norway, Poland, Saudi Arabia, Serbia, South Korea, Spain, Switzerland
2021: Brazil, Ireland, Peru, Philippines, Slovenia, Tunisia, Turkey, Ukraine, United Kingdom, Vietnam
2022: Italy, Thailand
Figure 6.2.1

By Geographic Area
Figure 6.2.2 highlights the countries which, as of December 2022, have either released or developed a national AI strategy. Figure 6.2.3 enumerates the countries that, in 2021 and 2022, pledged to develop an AI strategy. The first nations to officially release national AI strategies were Canada, China, and Finland in 2017. Only two nations released national AI strategies in 2022: Italy and Thailand.
AI National Strategies in Development by Country and Year
Source: AI Index, 2022 | Table: 2023 AI Index Report
2021: Armenia, Bahrain, Cuba, Iceland, Morocco, New Zealand, Oman
2022: Azerbaijan, Belgium, Benin, Israel, Jordan, Nigeria, Uzbekistan
Figure 6.2.3

Countries With a National Strategy on AI, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[World map shading each country as Released, In Development, or Not Released.]
Figure 6.2.2
5 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world.
It is possible that some strategies were missed.
6.3 U.S. Public Investment in AI
This section examines public AI investment in the United States based on data from the U.S. government and Govini, a company that uses
AI and machine learning technologies to track U.S. public and commercial spending.
Federal Budget for Nondefense AI R&D
In December 2022, the National Science and Technology Council published a report on the public-sector AI R&D budget across departments and agencies participating in the Networking and Information Technology Research and Development (NITRD) Program and the National Artificial Intelligence Initiative. The report does not include information on classified AI R&D investment by defense and intelligence agencies.

In fiscal year (FY) 2022, nondefense U.S. government agencies allocated a total of $1.7 billion to AI R&D spending (Figure 6.3.1). The amount allocated in FY 2022 represented a slight decline from FY 2021 and a 208.9% increase from FY 2018. An even greater amount, $1.8 billion, has been requested for FY 2023.
U.S. Federal Budget for Nondefense AI R&D, FY 2018–23
Source: U.S. NITRD Program, 2022 | Chart: 2023 AI Index Report
FY18 (Enacted): $0.56B; FY19 (Enacted): $1.11B; FY20 (Enacted): $1.43B; FY21 (Enacted): $1.75B; FY22 (Enacted): $1.73B; FY23 (Requested): $1.84B
Figure 6.3.1
6 A previous report on the public-sector AI R&D budget, released in 2021, classed the FY21 spending as totaling $1.53 billion. However, the most recent report, released in 2022, upgraded the FY21 total to $1.75 billion.
U.S. Department of Defense Budget Requests
Every year the DoD releases the amount of funding they have requested for nonclassified AI-specific research, development, test, and evaluation. According to the 2022 report, the DoD requested $1.1 billion in FY 2023, a 26.4% increase from the funding they received in FY 2022 (Figure 6.3.2).
U.S. DoD Budget Request for AI-Specific Research, Development, Test, and Evaluation (RDT&E), FY 2020–23
Source: U.S. Office of the Under Secretary of Defense (Comptroller), 2022 | Chart: 2023 AI Index Report
FY20 Funding: $0.93B; FY21 Funding: $0.84B; FY22 Funding: $0.87B; FY23 Funding: $1.10B
Figure 6.3.2
U.S. Government AI-Related Contract Spending
Public investment in AI can also be measured by federal government spending on the contracts that U.S. government agencies award to private companies for the supply of goods and services. Such contracts typically occupy the largest share of an agency's budget.

Data in this section comes from Govini, which created a taxonomy of spending by the U.S. government on critical technologies including AI. Govini applied supervised machine learning and natural language processing to parse, analyze, and categorize large volumes of federal contracts data, including prime contracts, grants, and other transaction authority (OTA) awards. The use of AI models enables Govini to analyze data that is otherwise often inaccessible.

Total Contract Spending
Figure 6.3.3 highlights total U.S. government spending on AI, subdivided by various AI segments. From 2021 to 2022, total AI spending increased from $2.7 billion to $3.3 billion. Since 2017, total spending has increased nearly 2.5 times. In 2022, the AI subsegments that saw the greatest amount of government spending included decision science ($1.2 billion) and computer vision ($0.8 billion).
U.S. Government Spending by Segment, FY 2017–22
Source: Govini, 2022 | Chart: 2023 AI Index Report
[Stacked bar chart of spending across decision science, computer vision, machine learning, autonomy, and natural language processing; annual totals: 2017 $1.29B, 2018 $1.56B, 2019 $1.83B, 2020 $2.41B, 2021 $2.70B, 2022 $3.28B.]
Figure 6.3.3
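The report does not disclose Govini's categorization pipeline beyond noting its use of supervised machine learning and natural language processing. Purely as an illustrative sketch of the underlying task of bucketing contract descriptions into AI segments, here is a naive keyword matcher; the keyword lists, the `classify` helper, and the fallback bucket are invented for this example and are not Govini's taxonomy:

```python
# Toy illustration of assigning contract descriptions to AI segments.
# The keywords below are invented for this example; they are NOT
# Govini's actual (machine-learned) taxonomy.
SEGMENT_KEYWORDS = {
    "Computer Vision": ["image recognition", "object detection", "imagery"],
    "Natural Language Processing": ["text analytics", "language model"],
    "Autonomy": ["autonomous", "unmanned"],
    "Decision Science": ["decision support", "predictive analytics"],
}

def classify(description: str) -> str:
    """Return the first segment whose keywords match the description."""
    text = description.lower()
    for segment, keywords in SEGMENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return segment
    return "Machine Learning"  # fallback bucket for unmatched descriptions

print(classify("Object detection for satellite imagery analysis"))
# -> Computer Vision
```

A production system would, as the report describes, learn such mappings from large volumes of labeled contract data rather than rely on hand-written keywords.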
Figure 6.3.4 shows U.S. government spending by AI segment in FY 2021 and FY 2022. Spending increased for the decision science, computer vision, and autonomy segments, while spending on machine learning and natural language processing dropped slightly.
U.S. Government Spending by Segment, FY 2021 Vs. 2022
Source: Govini, 2022 | Chart: 2023 AI Index Report
1.19 (+18%)
Decision Science
1.01
0.82 (+55%)
Computer Vision
0.53
0.69 (+33%)
Autonomy
0.52
0.41 (-5%)
Machine Learning
Natural Language
Processing
0.00
0.43
0.17 (-19%)
2022
2021
0.21
0.20
0.40
0.60
0.80
U.S. Government Spending (in Billions of U.S. Dollars)
1.00
1.20
Figure 6.3.4
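As a quick arithmetic check, the year-over-year percentage changes annotated in Figure 6.3.4 follow directly from the segment totals; a minimal sketch (figures in billions of dollars, as reported):

```python
# Year-over-year change in U.S. government AI contract spending by
# segment, using the FY 2021 and FY 2022 totals (in billions of
# dollars) reported in Figure 6.3.4.
fy21 = {"Decision Science": 1.01, "Computer Vision": 0.53,
        "Autonomy": 0.52, "Machine Learning": 0.43,
        "Natural Language Processing": 0.21}
fy22 = {"Decision Science": 1.19, "Computer Vision": 0.82,
        "Autonomy": 0.69, "Machine Learning": 0.41,
        "Natural Language Processing": 0.17}

for segment, new in fy22.items():
    old = fy21[segment]
    change = round((new - old) / old * 100)  # percent change, rounded
    print(f"{segment}: {change:+d}%")
```

The computed values match the chart's annotations: +18%, +55%, +33%, -5%, and -19%.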
In FY 2022, the majority of federal AI contracts were prime contracts (62.5%), followed by grants (34.9%) and
other transaction authority (OTA) awards (2.6%) (Figure 6.3.5). From FY 2021 to FY 2022, the share of contracts
remained about the same, while the share of grants rose.
Total Value of Contracts, Grants, and OTAs Awarded by the U.S. Government for AI/ML and Autonomy, FY 2017–22
Source: Govini, 2022 | Chart: 2023 AI Index Report
[Line chart, FY 2017–22; 2022 values: contracts $2.05B, grants $1.15B, OTAs $0.09B.]
Figure 6.3.5
6.4 U.S. AI-Related Legal Cases
In 2022, the AI Index partnered with Elif Kiesow Cortez, a scholar of artificial intelligence law, in a research project tracking trends in
American legal cases from 2000 to 2022 that contain AI-related keywords.7
Total Cases
In the last few years, there has been a sharp spike in AI-related jurisprudence in the United States. In 2022, there were a total of 110 AI-related cases in U.S. federal and state courts, 6.5 times more than in 2016 (Figure 6.4.1).
Number of AI-Related Legal Cases in the United States, 2000–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
[Line chart, 2000–2022; cases rise sharply in recent years, reaching 110 in 2022.]
Figure 6.4.1
7 The Index analyzed both federal and state-level cases. Specific keywords in the search included “artificial intelligence,” “machine learning,” and “automated decision-making.” Some of these
cases did not directly concern issues related to AI jurisprudence. As a next step of this project, we will aim to identify the cases that most centrally concern issues of AI-related law.
Geographic Distribution
In 2022, the majority of AI-related legal cases originated in California (23), Illinois (17), and New York (11) (Figure 6.4.2). The aggregate number of AI-related cases since 2000 shows a similar geographic distribution (Figure 6.4.3). California and New York's inclusion in the top three is unsurprising given that they are home to many large businesses that have integrated AI. In recent years, there has been a greater number of AI-related legal cases originating from Illinois; this follows the state's enactment of the Biometric Information Privacy Act (BIPA), which requires that companies doing business in Illinois follow a number of regulations related to the collection and storage of biometric information.
Number of AI-Related Legal Cases in the United States by State, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
California: 23
Illinois: 17
New York: 11
Delaware: 7
Florida: 7
Washington: 5
Kansas: 4
Massachusetts: 4
Maryland: 4
District of Columbia: 3
Texas: 3
Ohio: 3
Pennsylvania: 3
Virginia: 2
Missouri: 2
Figure 6.4.2
8 Figures 6.4.2 and 6.4.3 include information for states and districts, given that cases sometimes originate from American districts like the District of Columbia or Puerto Rico.
Number of AI-Related Legal Cases in the United States by State, 2000–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
California: 127
New York: 66
Illinois: 36
Texas: 26
Delaware: 19
Massachusetts: 19
Washington: 18
Pennsylvania: 16
Michigan: 12
Virginia: 12
District of Columbia: 12
Florida: 12
Ohio: 10
Kansas: 9
Minnesota: 8
Figure 6.4.3
Sector
Figure 6.4.4 groups U.S.-based legal cases by economic sector. The predominant sector in 2022 was financial services and professional services (48 cases), followed by media, culture, graphical (18) and public service (14).
Sector at Issue in AI-Related Legal Cases in the United States, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Financial Services, Professional Services: 48
Media, Culture, Graphical: 18
Public Service: 14
Education: 6
Health Services: 6
Hotels, Catering, Tourism: 4
Postal and Telecommunications Services: 4
Transport (Including Civil Aviation, Railways, Road Transport): 3
Food, Drink, Tobacco: 2
Transport Equipment Manufacturing: 2
Basic Metal Production: 1
Mechanical and Electrical Engineering: 1
Oil and Gas Production, Oil Refining: 1
Figure 6.4.4
Type of Law
The greatest proportion of AI-related legal cases concerned civil law (29%) (Figure 6.4.5). There were also a large
number of AI-related legal cases in the domain of intellectual property (19%), as well as contract law (13.6%).
Area of Law of AI-Related Legal Cases in the United States, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Civil: 32
Intellectual Property: 21
Contract: 15
Competition: 11
Constitutional: 8
Employment and Labor: 6
Criminal: 5
Corporate: 4
Financial: 3
Human Rights and Immigration: 2
Terrorism and National Security: 2
Tort: 1
Figure 6.4.5
Narrative Highlight:
Three Significant AI-Related Legal Cases
The section below profiles three significant AI-related cases in the United States,
highlighting some of the legal issues that are at stake when AI is brought into the courts.
Duerr v. Bradley University (2022-Mar-10) – United States Court of Appeals for the Seventh Circuit
The plaintiffs, who were enrolled as undergraduates in a private university in Peoria, Illinois, during the fall 2020 semester, were told to use a third-party proctoring tool called Respondus Monitor for remote, online exams. This tool made use of artificial intelligence technologies. The plaintiffs claimed that the defendants violated Illinois' Biometric Information Privacy Act (BIPA) by not adequately following its guidelines concerning the collection of biometric information. BIPA does not apply to financial institutions. Ultimately, the court ruled that under the Gramm-Leach-Bliley Act, the defendants were a financial institution by virtue of lending functions they engaged in and therefore exempt from BIPA. As such, the plaintiff's case was dismissed.

Flores v. Stanford9 (2021-Sep-28) – United States Court of Appeals for the Second Circuit
The plaintiffs, offenders denied parole, sued the New York State Board of Parole over being refused access to information used by the board in its review of their cases. Northpointe, Inc., petitioned the court as a non-party because its Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), an AI-powered risk assessment tool, had been used by the parole board in its determinations. Northpointe wanted to prevent the disclosure of AI trade secrets to one of the plaintiff's expert witnesses. The court ruled that the confidential material in question was relevant to the plaintiff's case and posed little risk of competitive injury. As such, the material was ordered to be released under a supplemental protective order.

Dyroff v. Ultimate Software Grp., Inc (2017-Nov-26) – United States Court of Appeals for the Ninth Circuit
Plaintiff Kristanalea Dyroff sued Ultimate Software after her 29-year-old son died from an overdose of heroin laced with fentanyl, which he allegedly bought from a drug dealer that he encountered on Ultimate Software's social network site. Dyroff asserted seven claims against Ultimate Software, which included negligence, wrongful death, and civil conspiracy. At the core of these claims was the argument that Ultimate Software mined the data of users and deployed that data, alongside an algorithm, to recommend drug-related discussion groups to her son. Ultimate Software moved to dismiss the claims and claimed partial immunity under the Communications Decency Act, which protects website operators from liability for third-party content on their site. The Court ruled that Ultimate Software was immune and that its use of algorithms did not sufficiently amount to novel content creation.
9 The defendant was Tina M. Stanford, as Chairwoman of the New York State Board of Parole.
Artificial Intelligence
Index Report 2023
CHAPTER 7:
Diversity
CHAPTER 7 PREVIEW:
Diversity
Overview 298
Chapter Highlights 299
7.1 AI Conferences 300
Women in Machine Learning (WiML) NeurIPS Workshop 300
Workshop Participants 300
Demographic Breakdown 301
7.2 AI Postsecondary Education 305
CS Bachelor's Graduates 305
CS Master's Graduates 307
CS PhD Graduates 309
Narrative Highlight: Disability Status of CS, CE, and Information Students 311
New AI PhDs 312
CS, CE, and Information Faculty 313
7.3 K–12 Education 316
AP Computer Science: Gender 316
AP Computer Science: Ethnicity 318
ACCESS THE PUBLIC DATA
Overview
AI systems are increasingly deployed in the real world. However, there often exists
a disparity between the individuals who develop AI and those who use AI. North
American AI researchers and practitioners in both industry and academia are
predominantly white and male. This lack of diversity can lead to harms, among them
the reinforcement of existing societal inequalities and bias.
This chapter highlights data on diversity trends in AI, sourced primarily from academia.
It borrows information from organizations such as Women in Machine Learning
(WiML), whose mission is to improve the state of diversity in AI, as well as the
Computing Research Association (CRA), which tracks the state of diversity in North
American academic computer science. Finally, the chapter also makes use of Code.org
data on diversity trends in secondary computer science education in the United States.
Note that the data in this subsection is neither comprehensive nor conclusive. Publicly
available demographic data on trends in AI diversity is sparse. As a result, this chapter
does not cover other areas of diversity, such as sexual orientation. The AI Index hopes
that as AI becomes more ubiquitous, the amount of data on diversity in the field will
increase such that the topic can be covered more thoroughly in future reports.
Chapter Highlights

North American bachelor's, master's, and PhD-level computer science students are becoming more ethnically diverse.
Although white students are still the most represented ethnicity among new resident bachelor's, master's, and PhD-level computer science graduates, students from other ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming increasingly represented. For example, in 2011, 71.9% of new resident CS bachelor's graduates were white. In 2021, that number dropped to 46.7%.

Women make up an increasingly greater share of CS, CE, and information faculty hires.
Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from 24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male (75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary.

New AI PhDs are still overwhelmingly male.
In 2021, 78.7% of new AI PhDs were male. Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender imbalance in higher-level AI education.

American K–12 computer science education has become more diverse, in terms of both gender and ethnicity.
The share of AP computer science exams taken by female students increased from 16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and Black/African American students taking AP computer science has likewise increased.
7.1 AI Conferences
Women in Machine Learning (WiML) NeurIPS Workshop
Women in Machine Learning (WiML), founded in 2006, is an organization dedicated to supporting and increasing the impact of women in machine learning. This subsection of the AI Index report presents data from the WiML annual technical workshop, hosted at NeurIPS. Since 2020, WiML has also been hosting the Un-Workshop, which serves to advance research via collaboration and interaction among participants from diverse backgrounds at the International Conference on Machine Learning (ICML).

Workshop Participants
Figure 7.1.1 shows the number of participants that have attended the WiML workshop since 2010. In the last decade, there has been a steady increase: 1,157 individuals participated in 2022, 13 times the number in 2010. However, from 2021 to 2022, the number of workshop participants decreased from 1,486 to 1,157.1
Attendance at NeurIPS Women in Machine Learning Workshop, 2010–22
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
[Line chart, 2010–2022; attendance rises steadily over the decade before dipping from 1,486 in 2021 to 1,157 in 2022.]
Figure 7.1.1
1 The recent decrease in WiML workshop attendance may be attributable to the overall recent decrease in NeurIPS attendance. This overall decrease may in turn be a result of
NeurIPS moving away from a purely virtual format.
Demographic Breakdown
Figure 7.1.2 breaks down the continent of residence of the 2022 workshop participants. The data in the following figures comes from a survey completed by participants who consented to having such information aggregated. Among survey respondents, around 41.5% were from North America, followed by Europe (34.2%), Asia (17.1%), and Africa (3.4%). In 2022, there was greater representation from Europe, Asia, and South America.
Continent of Residence of Participants at NeurIPS Women in Machine Learning Workshop, 2022
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
North America: 41.50%
Europe: 34.20%
Asia: 17.10%
Africa: 3.40%
South America: 1.60%
Australia/Oceania: 1.40%
Antarctica: 0.20%
Figure 7.1.2
2 At the time of the survey, one of the respondents was temporarily residing in Antarctica.
The largest share of participants at the 2022 WiML workshop were female-identifying (37.0%); another 25.8% were male-identifying, and 0.5% were nonbinary-identifying (Figure 7.1.3).
Gender Breakdown of Participants at NeurIPS Women in Machine Learning Workshop, 2022
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
Female: 37.00%
Prefer Not to Say: 36.30%
Male: 25.80%
Nonbinary: 0.50%
Gender Fluid: 0.20%
Gender Non-Conforming: 0.20%
Figure 7.1.3
The most represented professional positions at the workshop were PhD students (49.4%), research scientists/
data scientists (20.8%), software engineers/data engineers (8.4%), and faculty (4.4%) (Figure 7.1.4).
Professional Positions of Participants at NeurIPS Women in Machine Learning Workshop, 2022
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
PhD Student: 49.40%
Research Scientist/Data Scientist: 20.80%
Software Engineer/Data Engineer: 8.40%
Faculty: 4.40%
CEO/Director: 3.50%
Others: 3.50%
Postdoc: 3.50%
MSc Student: 2.30%
Undergraduate Student: 2.00%
Recruiter: 1.60%
Lecturer: 1.40%
Recent Graduate: 0.20%
Figure 7.1.4
The WiML workshop participants at NeurIPS submitted papers covering a wide range of subjects (Figure 7.1.5).
The most popular submission topics were applications (32.5%), algorithms (23.4%), and deep learning (14.8%).
Primary Subject Area of Submissions at NeurIPS Women in Machine Learning Workshop, 2022
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
Applications 32.5%; Algorithms 23.4%; Deep Learning 14.8%; Social Aspects of Machine Learning 7.7%; Reinforcement Learning and Planning 7.2%; Neuroscience and Cognitive Science 5.3%; Data, Challenges, Implementations, Software 3.8%; Probabilistic Methods 3.3%; Optimization 1.0%; Theory 1.0%
Figure 7.1.5
Trends in postsecondary AI education offer another proxy for diversity in AI. The following subsection borrows data from the Computing Research Association's (CRA) annual Taulbee Survey.3
7.2 AI Postsecondary Education
CS Bachelor's Graduates
The proportion of female CS bachelor's graduates rose to 22.3% from 2020 to 2021 (Figure 7.2.1). This increase mirrors a broader trend observed in the last decade whereby an increasingly large share of CS bachelor's graduates were women. The CRA survey also included a nonbinary gender category: In 2021, the proportion of nonbinary/other-identifying CS bachelor's graduates was 0.04%.
Gender of New CS Bachelor's Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 77.66%; Female 22.30%; Nonbinary/Other 0.04%
Figure 7.2.1
3 The charts in this subsection look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens in each educational
level (i.e., Bachelor’s, Master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included. For the proportion of nonresident aliens in each category, see footnotes.
Figure 7.2.2 breaks down the ethnicity of new CS bachelor’s graduates in North America: The top ethnicity
was white (46.7%), followed by Asian (34.0%) and Hispanic (10.9%). In the last decade, the proportion of
new CS bachelor’s graduates who were Asian, Hispanic, or multiracial (not Hispanic) steadily increased.4
Ethnicity of New Resident CS Bachelor's Graduates (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: White 46.69%; Asian 33.99%; Hispanic (Any Race) 10.91%; Multiracial (Not Hispanic) 4.10%; Black or African-American 3.85%; Native Hawaiian or Pacific Islander 0.24%; American Indian or Alaska Native 0.22%
Figure 7.2.2
4 In 2021, 16.3% of new CS bachelor's graduates were nonresident aliens.
CS Master's Graduates
Figure 7.2.3 shows the gender of CS master's graduates. The proportion of female CS master's graduates has not substantially increased over time, moving to 27.8% in 2021 from 24.6% in 2011. In 2021, 0.9% of CS master's graduates identified as nonbinary/other.
Gender of New CS Master's Graduates (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 71.27%; Female 27.83%; Nonbinary/Other 0.90%
Figure 7.2.3
Among domestic students, the most represented ethnicities are white (50.3%), followed by Asian (34.8%) and Hispanic (7.3%) (Figure 7.2.4). As with CS bachelor's graduates, in the last decade white students have represented a steadily smaller proportion of new CS master's graduates.5
Ethnicity of New Resident CS Master's Graduates (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: White 50.28%; Asian 34.83%; Hispanic (Any Race) 7.25%; Black or African-American 3.82%; Multiracial (Not Hispanic) 3.45%; American Indian or Alaska Native 0.25%; Native Hawaiian or Pacific Islander 0.12%
Figure 7.2.4
5 In 2021, 65.2% of new CS master’s graduates were nonresident aliens.
CS PhD Graduates
In 2021, the proportion of new female CS PhD graduates rose to 23.3% from 19.9% (Figure 7.2.5). Despite this rise, most new CS PhD graduates continue to be male. There remains a large gap between new male and female CS PhDs.
Gender of New CS PhD Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 76.58%; Female 23.30%; Nonbinary/Other 0.12%
Figure 7.2.5
Between 2011 and 2021, the proportion of new white resident CS PhD graduates declined by 9.4 percentage points. Asians are the next most represented group (29%), followed by Hispanics (5.1%) and Black or African Americans (4%) (Figure 7.2.6).6
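Percentage-point changes like this are absolute differences between shares, not relative (percent) changes. A minimal sketch of the distinction, assuming a 2011 white share of roughly 68.0% (a hypothetical value inferred from the stated 9.4-point decline and the 2021 endpoint in Figure 7.2.6):

```python
# Share of new resident CS PhD graduates who were white (%).
white_2011 = 68.0   # assumption: inferred from the stated 9.4-point decline
white_2021 = 58.64  # 2021 endpoint shown in Figure 7.2.6

pp_change = white_2021 - white_2011        # percentage-point (absolute) change
rel_change = 100 * pp_change / white_2011  # relative (percent) change

print(f"{pp_change:+.1f} percentage points")  # -9.4 percentage points
print(f"{rel_change:+.1f}% relative change")  # -13.8% relative change
```

In other words, a 9.4-point drop in share corresponds to a roughly 13.8% relative decline under this assumption.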
Ethnicity of New Resident CS PhD Graduates (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: White 58.64%; Asian 29.00%; Hispanic (Any Race) 5.12%; Black or African-American 4.05%; Multiracial (Not Hispanic) 2.13%; American Indian or Alaska Native 0.64%; Native Hawaiian or Pacific Islander 0.43%
Figure 7.2.6
6 In 2021, 68.6% of new CS PhD graduates were nonresident aliens.
Narrative Highlight:
Disability Status of CS, CE, and Information Students
The 2021 edition of the CRA Taulbee Survey was the first to gather information about the prevalence of CS, CE, and information students with disabilities. The CRA asked departments to identify the number of students at each degree level who received disability accommodations in the last year. The number of such students was relatively small: Only 4.1% of bachelor's, 1.0% of PhD, and 0.8% of master's students reported needing accommodations (Figure 7.2.7).
CS, CE, and Information Students (% of Total) With Disability Accommodations in North America, 2021
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
Bachelor's 4.1%; PhDs 1.0%; Master's 0.8%
Figure 7.2.7
New AI PhDs
Figure 7.2.8 looks at gender trends for new PhD graduates who focus on artificial intelligence. In 2021, 78.7% of new AI PhDs were male and 21.3% were female. While the proportion of female AI PhDs marginally increased from 2020 to 2021, we find no meaningful trends in the last decade relating to the gender of new AI PhDs.
Gender of New AI PhD Graduates (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 78.70%; Female 21.30%
Figure 7.2.8
CS, CE, and Information Faculty
Data on the ethnicity and gender of CS, CE, and information faculty helps to paint a picture of diversity trends in academic AI and CS. As of 2021, CS, CE, and information faculty members are predominantly male (75.9%) (Figure 7.2.9). Women make up 23.9% of CS, CE, and information faculty, and nonbinary individuals make up 0.1%. The share of female CS, CE, and information faculty has slowly increased; since 2011, it has risen 5 percentage points.
Gender of CS, CE, and Information Faculty (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 75.94%; Female 23.94%; Nonbinary/Other 0.12%
Figure 7.2.9
Although most new CS, CE, and information faculty hires in North American universities are still male, the
proportion of women among faculty hires reached 30.2% in 2021, up about 9 percentage points from 2015
(Figure 7.2.10).
Gender of New CS, CE, and Information Faculty Hires (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: Male 69.26%; Female 30.17%; Nonbinary/Other 0.57%
Figure 7.2.10
The majority of resident CS, CE, and information faculty are white as of 2021 (58.1%), followed by Asian (29.7%) (Figure 7.2.11). However, the gap between white faculty and faculty of the next most represented ethnicity is slowly narrowing: In 2011, the gap stood at 46.1 percentage points, whereas by 2021 it had dropped to 28.4 points.7
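The 2021 figure can be checked directly from the two largest shares in Figure 7.2.11; the gap is a percentage-point difference between them:

```python
# 2021 shares of resident CS, CE, and information faculty (Figure 7.2.11).
white_2021 = 58.08  # white faculty, % of total
asian_2021 = 29.70  # next most represented ethnicity, % of total

gap_pp = white_2021 - asian_2021  # gap in percentage points
print(f"{gap_pp:.1f}")  # 28.4
```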
Ethnicity of Resident CS, CE, and Information Faculty (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
2021: White 58.08%; Asian 29.70%; Unknown 5.82%; Hispanic (Any Race) 2.80%; Black or African-American 2.54%; Multiracial (Not Hispanic) 0.67%; American Indian or Alaska Native 0.25%; Native Hawaiian or Pacific Islander 0.13%
Figure 7.2.11
7 In 2021, 6.7% of CS, CE, and information faculty in North America were nonresident aliens.
What do AI diversity trends look like at the K–12 level, before students enter university? This subsection borrows data from Code.org, an American nonprofit that aims to promote K–12 computer science education in the United States.
7.3 K–12 Education
AP Computer Science: Gender
In 2021, 69.2% of AP computer science exams were taken by male students, 30.6% by female students, and 0.3% by students who identified as neither male nor female (Figure 7.3.1). It is still the case that male students take more AP computer science exams than any other gender, but the proportion of female test-takers has almost doubled in the last decade.
AP Computer Science Exams Taken (% of Total) by Gender, 2007–21
Source: Code.org, 2022 | Chart: 2023 AI Index Report
2021: Male 69.16%; Female 30.58%; Other 0.26%
Figure 7.3.1
On a percent basis, the states with the highest proportions of female AP computer science test-takers were Alabama (36%) and Washington, D.C. (36%), followed by Nevada (35%), Louisiana (35%), Tennessee (35%), Maryland (35%), and New York (35%) (Figure 7.3.2). Other states with notable CS and AI activity include California, Texas, and Washington, where women take AP computer science exams at rates hovering around 30 percent.
AP Computer Science Exams Taken by Female Students (% of Total), 2021
Source: Code.org, 2022 | Chart: 2023 AI Index Report
AK 20%; AL 36%; AR 29%; AZ 27%; CA 31%; CO 26%; CT 30%; DC 36%; DE 22%; FL 31%; GA 29%; HI 30%; IA 24%; ID 26%; IL 32%; IN 23%; KS 15%; KY 31%; LA 35%; MA 30%; MD 35%; ME 27%; MI 30%; MN 23%; MO 22%; MS 33%; MT 21%; NC 31%; ND 16%; NE 25%; NH 24%; NJ 31%; NM 29%; NV 35%; NY 35%; OH 27%; OK 25%; OR 21%; PA 27%; RI 31%; SC 34%; SD 15%; TN 35%; TX 30%; UT 23%; VA 28%; VT 23%; WA 32%; WI 23%; WV 30%; WY 31%
Figure 7.3.2
AP Computer Science: Ethnicity
Code.org collects data that speaks to trends in the ethnicity of AP computer science test-takers. White students took the greatest proportion of the exams in 2021 (42.7%), followed by Asian (28.8%) and Hispanic/Latino/Latina students (16.5%) (Figure 7.3.3). As with most postsecondary computer science fields, the pool of AP computer science test-takers is becoming more ethnically diverse over time. White students are still the greatest test-taking group; however, over time, more Asian, Hispanic/Latino/Latina, and Black/African American students have taken AP computer science exams.
AP Computer Science Exams Taken (% of Total Responding Students) by Race/Ethnicity, 2007–21
Source: Code.org, 2022 | Chart: 2023 AI Index Report
2021: White 42.74%; Asian 28.78%; Hispanic/Latino/Latina 16.48%; Black/African American 6.32%; Two or More Races 4.92%; Native American/Alaskan 0.62%; Native Hawaiian/Pacific Islander 0.15%; Other 0.00%
Figure 7.3.3
CHAPTER 8:
Public Opinion
CHAPTER 8 PREVIEW:
Public Opinion
Overview 321
Chapter Highlights 322
8.1 Survey Data 323
Global Insights 323
AI Products and Services 323
AI: Harm or Help? 327
United States 329
Narrative Highlight: How Does the Natural Language Processing (NLP) Research Community Feel About AI? 334
8.2 Social Media Data 340
Dominant Models 340
ACCESS THE PUBLIC DATA
Overview
AI has the potential to have a transformative impact on society. As such, it has become increasingly important to monitor public attitudes toward AI. Better understanding trends in public opinion is essential to informing decisions pertaining to AI's development, regulation, and use.
This chapter examines public opinion through global, national, demographic, and ethnic lenses. Moreover, we explore the opinions of AI researchers, and conclude with a look at the social media discussion that surrounded AI in 2022. We draw on data from two global surveys, one organized by IPSOS and another by Lloyd's Register Foundation and Gallup, along with a U.S.-specific survey conducted by Pew Research.
It is worth noting that there is a paucity of longitudinal survey data related to AI that asks the same questions of the same groups of people over extended periods of time. As AI becomes more and more ubiquitous, broader efforts to understand public opinion of AI will become increasingly important.
Chapter Highlights
Chinese citizens are among those who feel the most positively about AI products and services. Americans … not so much.
In a 2022 IPSOS survey, 78% of Chinese respondents (the highest proportion of surveyed countries) agreed with the statement that products and services using AI have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the lowest of surveyed countries) agreed that products and services using AI had more benefits than drawbacks.

People across the world, and especially in America, remain unconvinced by self-driving cars.
In a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good idea for society.

Men tend to feel more positively about AI products and services than women. Men are also more likely than women to believe that AI will mostly help rather than harm.
According to the 2022 IPSOS survey, men are more likely than women to report that AI products and services make their lives easier, to trust companies that use AI, and to feel that AI products and services have more benefits than drawbacks. A 2021 survey by Gallup and Lloyd's Register Foundation likewise revealed that men are more likely than women to agree with the statement that AI will mostly help rather than harm their country in the next 20 years.

Different causes for excitement and concern.
Among a sample of surveyed Americans, those who report feeling excited about AI are most excited about its potential to make life and society better (31%) and to save time and make things more efficient (13%). Those who report feeling more concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%); and the lack of human connection (12%).

NLP researchers … have some strong opinions as well.
According to a survey widely distributed to NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence, 41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal change. These were some of the many strong opinions held by the NLP research community.
8.1 Survey Data
Global Insights
How do opinions of AI vary across the globe? The first subsection of this chapter provides a response by looking at survey data from IPSOS and Pew Research, as well as one poll that was a collaboration of Gallup and Lloyd's Register Foundation. The surveys suggest that public perceptions concerning AI differ across countries and by demographic groups.

AI Products and Services
In late 2021, IPSOS ran a survey on global attitudes toward AI products and services. The survey consisted of interviews with 19,504 adults ages 16–74 in 28 different countries.1 Figure 8.1.1 highlights global opinions (aggregated results across the entire survey subsample) for a variety of statements relating to AI products and services; it shows the percentage of respondents who agree with each statement. The majority of the survey sample, 60%, believe that AI products and services will profoundly change their daily life in the near future and make their life easier. A very slight majority, 52%, feel that products and services that use AI have more benefits than drawbacks. Only 39% of respondents report that AI products and services make them feel nervous.
Global Opinions on Products and Services Using AI (% of Total), 2022
Source: IPSOS, 2022 | Chart: 2023 AI Index Report
% of respondents who agree:
I have a good understanding of what artificial intelligence is: 64%
Products and services using artificial intelligence will profoundly change my daily life in the next 3–5 years: 60%
Products and services using artificial intelligence make my life easier: 60%
Products and services using artificial intelligence have more benefits than drawbacks: 52%
I know which types of products and services use artificial intelligence: 50%
I trust companies that use artificial intelligence as much as I trust other companies: 50%
Products and services using artificial intelligence have profoundly changed my daily life in the past 3–5 years: 49%
Products and services using artificial intelligence make me nervous: 39%
Figure 8.1.1
1 See Appendix for more details about the survey methodology.
Opinions vary widely across countries as to the relative advantages and disadvantages of AI. The IPSOS survey suggests that 78% of Chinese respondents, 76% of Saudi Arabian respondents, and 71% of Indian respondents feel that products and services using AI have more benefits than drawbacks (Figure 8.1.2). However, only 35% of American respondents share that sentiment. Among the 28 surveyed countries, France and Canada held the most negative views.
'Products and services using AI have more benefits than drawbacks,' by Country (% of Total), 2022
Source: IPSOS, 2022 | Chart: 2023 AI Index Report
China 78%; Saudi Arabia 76%; India 71%; Peru 70%; Mexico 65%; Malaysia 65%; Colombia 64%; Chile 63%; South Korea 62%; Turkey 60%; Brazil 57%; South Africa 57%; Argentina 55%; Spain 53%; Russia 53%; Italy 50%; Hungary 49%; Poland 48%; Japan 42%; Sweden 40%; Belgium 38%; Great Britain 38%; Australia 37%; Germany 37%; United States 35%; Netherlands 33%; Canada 32%; France 31%
Figure 8.1.2
Figure 8.1.3 breaks down answers to all of IPSOS' AI products and services questions by country. Generally, sentiment relating to AI products and services seems to be strongly correlated within specific countries. For example, Chinese respondents feel among the most positive about AI products and services: 87% of Chinese respondents claim that AI products and services make their lives easier, 76% report trusting companies that use AI as much as other companies, and only 30% say that products and services using AI make them nervous. Conversely, American respondents are among the most negative when it comes to AI. Only 41% claim that AI products and services make their lives easier, 35% report trusting AI companies as much as other companies, and 52% report that AI products and services make them feel nervous.
Opinions About AI by Country (% Agreeing With Statement), 2022
Source: IPSOS, 2022 | Chart: 2023 AI Index Report
Statements: (1) I have a good understanding of what artificial intelligence is; (2) Products and services using artificial intelligence will profoundly change my daily life in the next 3–5 years; (3) … make my life easier; (4) … have more benefits than drawbacks; (5) I know which types of products and services use artificial intelligence; (6) I trust companies that use artificial intelligence as much as I trust other companies; (7) … have profoundly changed my daily life in the past 3–5 years; (8) … make me nervous.

Country          (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)
Argentina         64   60   59   55   47   55   53   33
Australia         59   50   46   37   38   36   37   51
Belgium           60   52   49   38   37   40   37   42
Brazil            69   61   65   57   58   50   51   35
Canada            59   44   44   32   36   34   32   49
Chile             76   67   70   63   59   56   58   36
China             67   80   87   78   76   76   73   30
Colombia          71   65   71   64   62   57   58   39
France            50   45   39   31   34   34   32   32
Germany           50   44   45   37   37   42   31   37
Great Britain     57   46   45   38   37   35   33   50
Hungary           67   55   50   49   38   48   38   31
India             72   74   72   71   69   68   67   53
Italy             42   53   54   50   45   48   41   26
Japan             41   53   52   42   32   39   30   20
Malaysia          61   71   71   65   61   61   65   48
Mexico            74   65   73   65   62   60   62   38
Netherlands       65   53   47   33   41   38   40   36
Peru              76   71   74   70   63   60   65   35
Poland            66   56   58   48   52   51   45   30
Russia            75   60   64   53   57   52   50   28
Saudi Arabia      73   80   80   76   69   73   72   51
South Africa      78   72   67   57   57   56   56   52
South Korea       72   76   74   62   60   46   62   32
Spain             62   56   59   53   46   50   49   48
Sweden            60   50   46   40   37   39   30   37
Turkey            68   73   71   60   60   63   60   48
United States     63   46   41   35   39   35   36   52
Figure 8.1.3
Figure 8.1.4 breaks down opinions in all countries across demographic groups such as gender, age, household income, and employment status. IPSOS results suggest that men feel more positively about AI products and services than women; for example, compared to women, men are more likely to report feeling that AI products and services make their lives easier. Age-specific opinions vary. For instance, while individuals under 35 are most likely to report feeling that AI products and services make their lives easier, they are also less likely than the 35-to-49 age category to believe that AI products and services have more benefits than drawbacks. Finally, households with higher incomes are more positive, compared to those with lower incomes, about AI products and services making life easier and having more benefits than drawbacks.
Opinions About AI by Demographic Group (% Agreeing With Statement), 2022
Source: IPSOS, 2022 | Chart: 2023 AI Index Report
Statements: (1) I have a good understanding of what artificial intelligence is; (2) Products and services using artificial intelligence will profoundly change my daily life in the next 3–5 years; (3) … make my life easier; (4) … have more benefits than drawbacks; (5) I know which types of products and services use artificial intelligence; (6) I trust companies that use artificial intelligence as much as I trust other companies; (7) … have profoundly changed my daily life in the past 3–5 years; (8) … make me nervous.

Group                      (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)
Male                        69   63   62   55   55   53   51   38
Female                      60   57   58   49   46   47   46   41
Age: Under 35               66   63   64   47   54   54   54   40
Age: 35 to 49               65   61   62   53   51   51   50   40
Age: 50 to 74               61   55   54   46   45   44   41   38
Income: Low                 57   56   56   50   46   47   46   41
Income: Medium              63   58   58   51   50   48   47   41
Income: High                71   67   66   57   57   57   54   38
Education: Low              56   53   53   45   44   45   43   41
Education: Medium           64   58   58   50   48   48   46   37
Education: High             71   68   67   59   58   56   55   40
Business Owner              73   70   67   63   63   61   61   48
Sr. Exec./Decision Maker    74   72   70   64   65   62   62   46
Employed                    67   64   63   55   54   53   52   40
Non-Employed                59   54   55   47   44   45   43   38
Figure 8.1.4
AI: Harm or Help?
In 2021, Lloyd's Register Foundation, an independent global charity, collaborated with Gallup to poll 125,911 people across 121 countries about their perceptions of artificial intelligence and other digital trends. Figure 8.1.5 shows the responses to the survey question, "Do you think artificial intelligence will mostly help or mostly harm people in this country in the next 20 years?" A greater proportion of respondents believed that AI will mostly help (39%) than believed it would mostly harm (28%). Mirroring the gender disparity evident in the IPSOS survey, men in the Lloyd's-Gallup poll were more likely than women to report believing that AI will mostly help people in the next 20 years.
Views on Whether AI Will 'Mostly Help' or 'Mostly Harm' People in the Next 20 Years Overall and by Gender (% of Total), 2021
Source: Lloyd's Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
Mostly help: World 39%, Women 35%, Men 42%; Mostly harm: World 28%, Women 29%, Men 27%; Neither: 2% across groups; Don't have an opinion: World 22%, Women 24%, Men 20%; Don't know/refused: World 8%, Women 9%, Men 8%
Figure 8.1.5
Eastern Asia, Northern/Western Europe, and Southern Europe are the regions of the world where people are most likely to report believing that AI will mostly help rather than mostly harm (Figure 8.1.6). More specifically, among the Eastern Asian survey sample, for every 1 response of "mostly harm" there were 4.4 responses suggesting that AI will "mostly help." The regions whose populations are most pessimistic about the potential benefits of AI include Eastern Africa, Northern Africa, and Southern Africa.
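The regional values in Figure 8.1.6 are simple ratios of the two response shares. As a sketch, the worldwide shares from Figure 8.1.5 (39% mostly help, 28% mostly harm) give a ratio of about 1.4:

```python
def help_harm_ratio(pct_help: float, pct_harm: float) -> float:
    """Ratio of 'mostly help' to 'mostly harm' response shares."""
    return pct_help / pct_harm

# Worldwide shares reported in Figure 8.1.5
print(round(help_harm_ratio(39, 28), 1))  # 1.4
```

A ratio above 1 means optimists outnumber pessimists; the worldwide value sits between the figures for Eastern Europe (1.4) and Southern Europe (1.3).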
Views on Whether AI Will 'Mostly Help' or 'Mostly Harm' People in the Next 20 Years by Region: Ratio of 'Mostly Help'/'Mostly Harm', 2021
Source: Lloyd's Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
Eastern Asia 4.4; Northern/Western Europe 1.8; Australia and New Zealand 1.7; Eastern Europe 1.4; Southern Europe 1.3; Central Asia 1.2; Southeastern Asia 1.2; Northern America 1.0; Middle East 1.0; Southern Asia 0.9; Latin America and Caribbean 0.9; Central/Western Africa 0.8; Northern Africa 0.7; Southern Africa 0.6; Eastern Africa 0.4
Figure 8.1.6

The Lloyd's Register survey also polled respondents about their perceptions of certain AI technologies, such as self-driving cars. The majority of survey respondents reported not feeling safe in a self-driving car (65%), compared to only 27% who reported feeling safe (Figure 8.1.7).

Perceptions of the Safety of Self-Driving Cars (% of Total), 2021
Source: Lloyd's Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
Would not feel safe 65%; Would feel safe 27%; Don't know/refused 8%
Figure 8.1.7
United States
In 2022, Pew Research released one of the most comprehensive surveys to date about Americans' views on AI. The survey interviewed 10,260 panelists from a wide range of demographic groups about their broad AI-related opinions, as well as their perspectives on specific AI use cases.2 45% of Americans report feeling equally concerned and excited about the use of AI programs in daily life, while 37% report feeling more concerned than excited (Figure 8.1.8). Only 18% of Americans report feeling more excited than concerned about AI technology.

Americans' Feelings Toward Increased Use of AI Programs in Daily Life (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Equally concerned and excited 45%; More concerned than excited 37%; More excited than concerned 18%; No answer <1%
Figure 8.1.8
Which AI applications are Americans most excited about? A large proportion report feeling very or somewhat excited about AI being used to perform household chores (57%), to perform repetitive workplace tasks (46%), and to diagnose medical problems (40%) (Figure 8.1.9). Americans are very or somewhat concerned about AI being used to make important life decisions for people (74%) and to know people's thoughts and behaviors (75%).
Americans' Feelings on Potential AI Applications (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Respondents rated each application as very/somewhat excited, equally excited and concerned, or very/somewhat concerned. Applications shown: perform household chores; perform repetitive workplace tasks; diagnose medical problems; handle customer service calls; make important life decisions for people; know people's thoughts and behaviors.
Figure 8.1.9

2 See Appendix for more details about the survey methodology.
3 The numbers in Figure 8.1.9 may not sum up to 100% due to rounding.
There are two specific AI use cases that Americans are more likely to report feeling are good ideas for society rather than bad: police use of facial recognition technology, and social media companies using AI to find false information on their sites (Figure 8.1.10). More specifically, 46% of Americans believe that police using facial recognition technology is a good idea for society compared to 27% who believe it is a bad idea. However, Americans are not as excited about driverless passenger vehicles: More feel that driverless passenger vehicles are a bad idea for society than a good idea.
Americans' Perceptions of Specific AI Use Cases (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Facial recognition technology by police: 46% good idea for society, 27% bad idea, 27% not sure
Computer programs by social media companies to find false information on their sites: 38% good idea, 31% bad idea, 30% not sure
Driverless passenger vehicles: 26% good idea, 44% bad idea, 29% not sure
Figure 8.1.10

4 The numbers in Figure 8.1.10 may not sum up to 100% due to rounding.
Of the sample of Americans who reported being more concerned than excited about AI, Figure 8.1.11 outlines the main reasons for their concern. The primary reasons include loss of human jobs (19%); surveillance, hacking, and digital privacy (16%); and lack of human connection (12%). Americans reported being less concerned about the potential loss of freedom and issues relating to lack of oversight and regulation.
Main Reason Americans Are Concerned About AI (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Loss of human jobs: 19%
Surveillance, hacking, digital privacy: 16%
Lack of human connection/qualities: 12%
AI will get too powerful/outsmarting people: 8%
People misusing AI: 8%
People becoming too reliant on AI/tech: 7%
AI fails, makes mistakes: 6%
Concerns about government/tech companies using AI: 3%
Don't trust AI or people wielding it: 3%
Unforeseen consequences/effects: 2%
Loss of freedom: 2%
Human bias coded into AI: 2%
Lack of oversight and regulation: 2%
Other: 7%
Figure 8.1.11
The two leading reasons that Americans report being excited about AI relate to its potential to make life better and to save time (Figure 8.1.12). Of the respondents, 31% believe AI makes life and society better. A significant group also reported feeling excited about the potential of AI to save time and increase efficiency (13%), as well as to handle mundane, tedious tasks (7%).
Main Reason Americans Are Excited About AI (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Makes life, society better: 31%
Saves time, more efficient: 13%
Inevitable progress, is the future: 10%
Handles mundane, tedious tasks: 7%
Helps with work/labor: 6%
AI is interesting, exciting: 6%
Helps humans with difficult/dangerous tasks: 6%
More accurate than humans: 4%
Helps those who are elderly/have a disability: 4%
Personal anecdotes: 2%
Other people's fears based on sci-fi, not reality: 2%
Other: 7%
Figure 8.1.12
The Pew Research survey also asked participants which group of people had their experiences and views taken into consideration in the design of AI systems. Respondents felt AI systems most reflected the experiences and views of men and white adults (Figure 8.1.13). There was a 15 percentage point gap in the degree to which people felt that AI systems positively considered the experiences and views of men over women. Similarly, respondents felt that the experiences and views of Asian, Black, and Hispanic adults, compared to those held by white adults, were not as positively considered.
People Whose Experiences and Views Are Considered in the Design of AI Systems (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Respondents rated how well AI systems take into account the experiences and views of each group (net well, net not well, or not sure). Men were rated highest: 51% net well, 12% net not well, 37% not sure, with white adults the next most positively rated; women, Asian adults, Black adults, and Hispanic adults were all rated lower.
Figure 8.1.13

5 The numbers in Figure 8.1.13 may not sum up to 100% due to rounding.
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI?
From May to June 2022, a group of American researchers conducted a survey of the NLP research community on a diverse set of issues, including the state of the NLP field, artificial general intelligence (AGI), and ethics, among others. According to the authors, a total of 480 individuals completed the survey, 68% of whom had authored at least two Association for Computational Linguistics (ACL) publications between 2019 and 2022. The survey represents one of the most complete pictures of the attitudes AI researchers have toward AI research.6

In general, the NLP research community strongly feels that private firms have too much influence (77%) and that industry will produce the most widely cited research (86%) (Figure 8.1.14). Curiously, 67% either agreed or weakly agreed with the statement that most of NLP is dubious science. A small proportion, 30%, think an "NLP winter"—a period when the field faces a significant slowdown or stagnation in research and development—is coming in the next decade.
State of the Field According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
Private firms have too much influence: 77%
Industry will produce the most widely cited research: 86%
NLP winter is coming (10 years): 30%
NLP winter is coming (30 years): 62%
Most of NLP is dubious science: 67%
Author anonymity is worth it: 63%
Figure 8.1.14

6 More detailed information about the survey methodology and sample group can be found in the following paper.
A small majority of NLP researchers believe that specific types of AI systems can actually understand
language: 51% agreed with the statement that language models (LMs) understand language, with even
more (67%) agreeing that multimodal models understand language (Figure 8.1.15).
Language Understanding According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
LMs understand language: 51%
Multimodal models understand language: 67%
Text-only evaluation can measure language understanding: 36%
Figure 8.1.15
NLP researchers also seem to believe that NLP's past net impact has been positive (89%) and that its future impact will continue to be good (87%) (Figure 8.1.16). The community is divided on the issue of using AI to predict psychological characteristics, with 48% of respondents feeling it is unethical. Sixty percent of researchers feel that the carbon footprint of AI is a major concern; however, only 41% feel that NLP should be regulated.
Ethics According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
NLP's past net impact is good: 89%
NLP's future net impact is good: 87%
It is unethical to build easily misusable systems: 59%
Ethical and scientific considerations can conflict: 74%
Ethical concerns mostly reduce to data quality and model accuracy: 25%
It is unethical to predict psychological characteristics: 48%
Carbon footprint is a major concern: 60%
NLP should be regulated: 41%
Figure 8.1.16
Although a large majority of researchers feel that AI could soon lead to revolutionary societal change (73%), only 36% feel that AI decisions could cause nuclear-level catastrophe (Figure 8.1.17). A majority of researchers, 57%, held that recent research progress was leading the AI community toward artificial general intelligence (AGI).
Artificial General Intelligence (AGI) and Major Risks According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
AGI is an important concern: 58%
Recent progress is moving us toward AGI: 57%
AI could soon lead to revolutionary societal change: 73%
AI decisions could cause nuclear-level catastrophe: 36%
Figure 8.1.17
When asked about the direction AI research is taking, the NLP community registered the strongest
responses about the following: First, there’s too much focus on benchmarks (88%); second, more work
should be done to incorporate interdisciplinary insights (82%); and third, there’s too great a focus on
scale (72%) (Figure 8.1.18).
Promising Research Programs According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
There's too much focus on scale: 72%
There's too much focus on benchmarks: 88%
On the wrong track: model architectures: 37%
On the wrong track: language generation: 41%
On the wrong track: explainable models: 50%
On the wrong track: black-box interpretability: 42%
We should do more to incorporate interdisciplinary insights: 82%
Figure 8.1.18
A further point on the NLP community’s skepticism of scale: Only 17% of respondents agreed or weakly
agreed with the statement that scaling solves practically any important problem, with a further 50%
reaffirming the importance of linguistic structure (Figure 8.1.19).
Scale, Inductive Bias, and Adjacent Fields According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
% of respondents that "agree" or "weakly agree":
Scaling solves practically any important problem: 17%
Linguistic structure is necessary: 50%
Expert inductive biases are necessary: 51%
Linguistics/CogSci will contribute to the most-cited models: 61%
Figure 8.1.19
8.2 Social Media Data

Public attitudes toward AI can also be gauged through quantitative and qualitative analyses of posts that people make on social media. The NetBase Quid team leveraged the NetBase platform to analyze social conversation around AI models and new releases for uses across sectors from January to December 2022, looking at 2.74 million social media posts.

Dominant Models
Figure 8.2.1 shows the net sentiment score of various AI models that were released throughout the year. The net sentiment score expresses the ratio of positive to negative sentiment around a given topic. In this case, a net sentiment score of +100 means that all conversation is positive; a score of -100 means that all conversation is negative.

AlphaCode had the most consistently high sentiment over time, as well as the highest average sentiment for 2022, due to positive press coverage on social media and practical use cases of AI-driven programming. Consumers and media outlets embraced the practical use case of programming automation. Some sample social media posts relating to AlphaCode include:

"#AlphaCode—a new #AI system for developing computer code developed by @DeepMind—can achieve average human-level performance in solving programming contests." – Science Magazine, Twitter

"DeepMind's AlphaCode outperforms many human programmers in tricky software challenges." – @lunamoth

ChatGPT conversation has increasingly saturated social media conversation around AI model releases more broadly, with sentiment growing ever more mixed. Consumers question the implications of its launch as well as its underlying ethical principles. Another frequent preoccupation is the bias of the system toward certain political, ethical, or cultural beliefs.

"ChatGPT passed a Wharton MBA exam. Time to overhaul education." – @GRDecter

"Alarm: ChatGPT by @OpenAI now *expressly prohibits arguments for fossil fuels*. (It used to offer them.) Not only that, it excludes nuclear energy from its counter-suggestions. @sama, what is the reason for this policy?" – @AlexEpstein

Finally, while GLM-130B took up very little volume of the overall social media conversation, a small pocket of very negative sentiment grew over the system's ties to the Chinese government and how users are "prohibited" from using the software to "undermine" China's government in any way. Technology influencer and PhD student Jesse Wood posted a Twitter thread about GLM-130B's licensing language that gained significant traction.

"The model license for GLM-130B has a restriction: 'You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.'" – @jrhwood
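The net sentiment score described above can be sketched in code. This is a minimal illustration of one formula consistent with the stated bounds (+100 when all conversation is positive, -100 when all is negative), not NetBase's proprietary scoring; the post labels are invented:

```python
def net_sentiment(post_labels):
    """Net sentiment score on [-100, +100] from per-post sentiment labels.

    +100 means all sentiment-bearing posts are positive; -100 means all
    are negative. Labels other than "positive"/"negative" are ignored.
    """
    pos = sum(1 for label in post_labels if label == "positive")
    neg = sum(1 for label in post_labels if label == "negative")
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing posts
    return 100.0 * (pos - neg) / (pos + neg)

# 6 positive and 2 negative posts about a model
print(net_sentiment(["positive"] * 6 + ["negative"] * 2))  # 50.0
```

Neutral posts drop out of the score entirely, which is why a model with little but uniformly positive coverage can outscore one with far more total conversation.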
Net Sentiment Score of AI Models by Quarter, 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
Model: 2022/Q1, 2022/Q2, 2022/Q3, 2022/Q4 (— = no score reported that quarter)
DALL-E: 0, 42, 29, 21
LaMDA: 73, -9, -11, 44
AlphaCode: 60, 79, 71, 70
CoPilot: 29, 22, 15, 34
PaLM: —, 66, 66, 30
Gato: —, 47, 84, 65
Imagen: —, 24, 65, 56
Stable Diffusion: —, —, 35, 52
Whisper: —, —, 85, 69
Make-A-Video: —, —, 4, 9
AlphaTensor: —, —, —, 96
GLM-130B: —, —, —, 55
BLOOM: —, —, —, 0
CICERO: —, —, —, 14
ChatGPT: —, —, —, 32
Figure 8.2.1
7 The AI Index searched for sentiment surrounding the term “DALL-E,” as it was more frequently referred to on social media, rather than DALL-E 2, the official name of the text-to-image
model released by OpenAI in 2022.
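The claim that AlphaCode had the highest average sentiment for 2022 can be checked directly from the quarterly scores in Figure 8.2.1 — a quick sketch restricted to the models with scores in all four quarters:

```python
# Quarterly net sentiment scores (Q1-Q4 2022) for the models tracked all year
scores = {
    "DALL-E": [0, 42, 29, 21],
    "LaMDA": [73, -9, -11, 44],
    "AlphaCode": [60, 79, 71, 70],
    "CoPilot": [29, 22, 15, 34],
}
averages = {model: sum(qs) / len(qs) for model, qs in scores.items()}
best = max(averages, key=averages.get)
print(best, averages[best])  # AlphaCode 70.0
```

Models released late in the year (e.g., AlphaTensor at 96 in Q4 alone) are excluded here because a single-quarter score is not comparable to a full-year average.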
Figure 8.2.2 highlights the proportion of AI-related social media conversation that was dominated by the release of particular models.8 ChatGPT dominated consumer conversation with a rapid rise, making up over half of consumer conversation by the end of 2022. Despite initial excitement, sentiment was mixed by the end of the year, as some individuals became more aware of ChatGPT's limitations. OpenAI CEO Sam Altman even publicly commented on it being "incredibly limited" in certain respects.

"ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. It's a mistake to be relying on it for anything important right now. It's a preview of progress; we have lots of work to do on robustness and truthfulness." – @SamAltman

Conversation around LaMDA exploded in Q2 2022 as an ex–Google employee reported his experiences with a "sentient" system that spoke of its own emotions and thoughts. Many political and technology influencers spoke out, however, about the "deepfake" nature of the responses of systems like LaMDA that do not have a sense of "truth" and could proliferate misinformation.

"AI systems like LamDA and GPT-3 are sociopathic liars with utter indifference to truth, deepfakers with words, every day creating more compelling, more plausible misinformation on demand. It is imperative that we develop technology & policy to thwart them." – @GaryMarcus

"This story … is really sad, and I think an important window into the risks of designing systems to seem like humans, which are exacerbated by #AIhype." – @nitashataku

Stable Diffusion conversation stands out as a prominent leader in conversation volume toward the end of 2022, but it is also a symbol of how the consumer lexicon around AI models is developing. Many consumers debated the "originality" of what Stable Diffusion produces.

"I've worked on neural networks, so I understand stable diffusion pretty well. And while it can't have original thoughts, it can come up with original works." – r/TikTokCringe

"That's true of anywhere that datasets scrape without permission. The thing to actually be upset about is that their own generator is purposefully using the Stable Diffusion dataset that already contains tons of stolen work." – @Emily_Art
8 The figures in this section consider all AI-related social media conversation. The percentage associated with the model in Figure 8.2.2 represents the share of all AI-related social media
conversation that was dominated by that model.
Select Models' Share of AI Social Media Attention by Quarter, 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
Model: 2022/Q1, 2022/Q2, 2022/Q3, 2022/Q4 (— = no share reported that quarter)
DALL-E: 0%, 1%, 3%, 2%
LaMDA: 1%, 35%, 9%, <1%
AlphaCode: 2%, <1%, <1%, 1%
CoPilot: 10%, 3%, 4%, 1%
PaLM: —, <1%, <1%, <1%
Gato: —, 10%, 18%, 3%
Imagen: —, 5%, 4%, 2%
Stable Diffusion: —, —, 19%, 19%
Whisper: —, —, <1%, <1%
Make-A-Video: —, —, 33%, 15%
AlphaTensor: —, —, —, 1%
GLM-130B: —, —, —, <1%
BLOOM: —, —, —, <1%
CICERO: —, —, —, 3%
ChatGPT: —, —, —, 52%
Figure 8.2.2
Appendix

Chapter 1: Research and Development
Chapter 2: Technical Performance
Chapter 3: Technical AI Ethics
Chapter 4: The Economy
Chapter 5: Education
Chapter 6: Policy and Governance
Chapter 7: Diversity
Chapter 8: Public Opinion

Chapter 1: Research and Development
Center for Security and
Emerging Technology,
Georgetown University
Methodology
Prepared by Sara Abdulla and James Dunham
publications. To identify AI publications, CSET used an
The Center for Security and Emerging Technology
since 2010 that appear AI-relevant.4 CSET researchers
(CSET) is a policy research organization within
Georgetown University’s Walsh School of Foreign
Service that produces data-driven research at the
intersection of security and technology, providing
nonpartisan analysis to the policy community.
For more information about how CSET analyzes
bibliometric and patent data, see the Country Activity
Tracker (CAT) documentation on the Emerging
Technology Observatory’s website.1 Using CAT, users
can also interact with country bibliometric, patent,
and investment data.2
To create the merged corpus, CSET deduplicated
across the listed sources using publication metadata,
and then combined the metadata for linked
English-language subset of this corpus: publications
developed a classifier for identifying AI-related
publications by leveraging the arXiv repository, where
authors and editors tag papers by subject. Additionally,
CSET uses select Chinese AI keywords to identify
Chinese-language AI papers.5
To provide a publication’s field of study, CSET
matches each publication in the analytic corpus
with predictions from Microsoft Academic Graph’s
field-of-study model, which yields hierarchical labels
describing the published research field(s) of study and
corresponding scores.6 CSET researchers identified
Publications from CSET Merged Corpus of
Scholarly Literature
the most common fields of study in our corpus of
Source
publications in all other fields as “Other AI.” English-
CSET’s merged corpus of scholarly literature
language AI-relevant publications were then tallied by
combines distinct publications from Digital Science’s
their top-scoring field and publication year.
Dimensions, Clarivate’s Web of Science, Microsoft
Academic Graph, China National Knowledge
Infrastructure, arXiv, and Papers With Code.3
AI-relevant publications since 2010 and recorded
CSET also provided year-by-year citations for AIrelevant work associated with each country. A
publication is associated with a country if it has at
1 https://eto.tech/tool-docs/cat/
2 https://cat.eto.tech/
3 All CNKI content is furnished by East View Information Services, Minneapolis, Minnesota, USA.
4 For more information, see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv [cs.DL],
May 28, 2020, https://arxiv.org/abs/2002.07143.
5 This method was not used in CSET’s data analysis for the 2022 HAI Index report.
6 These scores are based on cosine similarities between field-of-study and paper embeddings. See Zhihong Shen, Hao Ma, and Kuansan Wang, “A Web-Scale System for Scientific
Knowledge Exploration,” arXiv [cs.CL], May 30, 2018, https://arxiv.org/abs/1805.12216.
Table of Contents
Appendix
346
Artificial Intelligence
Index Report 2023
Appendix
Chapter 1: Research and Development
available for all publications; those without counts
Epoch National
Affiliation Analysis
weren’t included in the citation analysis. Over 70% of
The AI forecasting research group Epoch maintains
English-language AI papers published between 2010
a dataset of landmark AI and ML models, along with
and 2020 have citation data available.
accompanying information about their creators and
least one author whose organizational affiliation(s)
are located in that country. Citation counts aren’t
CSET counted cross-country collaborations as
distinct pairs of countries across authors for each
publication. Collaborations are only counted once:
publications, such as the list of their (co)authors, number
of citations, type of AI task accomplished, and amount
of compute used in training.
For example, if a publication has two authors from
The nationalities of the authors of these papers have
the United States and two authors from China, it is
important implications for geopolitical AI forecasting.
counted as a single United States-China collaboration.
As various research institutions and technology
Additionally, publication counts by year and by
publication type (e.g., academic journal articles,
conference papers) were provided where available.
These publication types were disaggregated by
affiliation country as described above.
CSET also provided publication affiliation sector(s)
where, as in the country attribution analysis, sectors
were associated with publications through authors’
affiliations. Not all affiliations were characterized in
terms of sectors; CSET researchers relied primarily
companies start producing advanced ML models, the
global distribution of future AI development may shift
or concentrate in certain places, which in turn affects
the geopolitical landscape because AI is expected to
become a crucial component of economic and military
power in the near future.
To track the distribution of AI research contributions on
landmark publications by country, the Epoch dataset is
coded according to the following methodology:
1. A snapshot of the dataset was taken on
on GRID from Digital Science for this purpose, and
November 14, 2022. This includes papers about
not all organizations can be found in or linked to
landmark models, selected using the inclusion
GRID. Where the affiliation sector is available, papers
criteria of importance, relevance, and uniqueness,
were counted toward these sectors, by year. Cross-
as described in the Compute Trends dataset
sector collaborations on academic publications
documentation.8
7
were calculated using the same method as in the
cross-country collaborations analysis. We use HAI’s
standard regions mapping for geographic analysis,
and the same principles for double-counting apply for
regions as they do for countries.
2. The authors are attributed to countries based
on their affiliation credited on the paper. For
international organizations, authors are attributed
to the country where the organization is
headquartered, unless a more specific location
is indicated. The number of authors from each
country represented are added up and recorded.
7 See https://www.grid.ac/ for more information about the GRID dataset from Digital Science.
8 https://epochai.org/blog/compute-trends; see note on “milestone systems.”
Table of Contents
Appendix
347
Artificial Intelligence
Index Report 2023
Appendix
Chapter 1: Research and Development
If an author has multiple affiliations in different
Imagen
OPT-175B
countries, they are split between those
InstructGPT
PaLM (540B)
countries proportionately.
Jurassic-1-Jumbo
PanGu-alpha
Jurassic-X
Stable Diffusion (LDM-
Meena
KL-8-G)
Megatron-LM (original,
T5-3B
8.3B)
T5-11B
Megatron-Turing NLG
Turing NLG
530B
Wu Dao 2.0
Minerva (540B)
Wu Dao – Wen Yuan
9
3. E
ach paper in the dataset is normalized to
equal value by dividing the counts on each
paper from each country by the total number
of authors on that paper.10
4. All of the landmark publications are
aggregated within time periods (e.g., monthly
or yearly) with the normalized national
contributions added up to determine what
each country’s contribution to landmark AI
research was during each time period.
5. The contributions of different countries are
compared over time to identify any trends.
Large Language and
Multimodal Models Training
Cost Analysis
Cost estimates for the models were based directly
Large Language and
Multimodal Models
on the hardware and training time if these were
The following models were identified by members
training compute, and hardware utilization efficiency.11
of the AI Index Steering Committee as the large
Training time was then multiplied by the closest cost
language and multimodal models that would be
rate for the hardware the AI Index could find for the
included as part of the large language and multimodal
organization that trained the model. If price quotes
model analysis:
were available before and after the model’s training,
disclosed by the authors; otherwise, the AI Index
calculated training time from the hardware speed,
the AI Index interpolated the hardware’s cost rate
AlphaCode
GLM-130B
BLOOM
Gopher
Chinchilla
GPT-2
The AI Index classified training cost estimates as
Codex
GPT-3 175B (davinci)
high, middle, or low. The AI Index called an estimate
CogView
GPT-J-6B
high if it was an upper bound or if the true cost was
DALL-E
GPT-Neo
more likely to be lower than higher: For example,
DALL-E 2
GPT-NeoX-20B
PaLM was trained on TPU v4 chips, and the AI Index
ERNIE 3.0
Grover-Mega
estimated the cost to train the model on these chips
ERNIE-GEN (large)
HyperCLOVA
from Google’s public cloud compute prices, but the
along an exponential decay curve.
9 For example, an author employed by both a Chinese university and a Canadian technology firm would be counted as 0.5 researchers from China and 0.5 from Canada.
10 This choice is arbitrary. Other plausible alternatives include weighting papers by their number of citations, or assigning greater weight to papers with more authors.
11 Hardware utilization rates: Every paper that reported the hardware utilization efficiency during training provided values between 30% and 50%. The AI Index used the reported numbers
when available, or used 40% when values were not provided.
Table of Contents
Appendix
348
Artificial Intelligence
Index Report 2023
Appendix
Chapter 1: Research and Development
internal cost to Google is probably lower than what
each file’s revision history. The analysis of GitHub
they charge others to rent their hardware. The AI
data could shed light on relevant metrics about who
Index called an estimate low if it was a lower bound
is developing AI software, where, and how fast, and
or if the true cost was likely higher: For example,
who is using which development tools. These metrics
ERNIE was trained on NVIDIA Tesla v100 chips and
could serve as proxies for broader trends in the field
published in July 2021; the chips cost $0.55 per hour
of software development and innovation.
in January 2023, so the AI Index could get a low estimate of the cost using this rate, but the training hardware was probably more expensive two years earlier. Middle estimates are a best guess, or those that might equally well be lower or higher.

AI Conferences
The AI Index reached out to the organizers of various AI conferences in 2022 and asked them to provide information on total attendance. Some conferences posted their attendance totals online; when this was the case, the AI Index used those reported totals and did not reach out to the conference organizers.

GitHub
The GitHub data was provided to the AI Index through OECD.AI, an organization with which GitHub partners to provide data on open-source AI software. The AI Index reproduces the methodological note that OECD.AI includes on its website for the GitHub data.

Background
Since its creation in 2007, GitHub has become the main provider of internet hosting for software development and version control. Many technology organizations and software developers use GitHub as a primary place for collaboration. To enable collaboration, GitHub is structured into projects, or “repositories,” which contain a project’s files and each file’s revision history.

Identifying AI Projects
Arguably, a significant portion of AI software development takes place on GitHub. OECD.AI partners with GitHub to identify public AI projects—or “repositories”—following the methodology developed by Gonzalez et al. (2020). Using the 439 topic labels identified by Gonzalez et al.—as well as the topics “machine learning,” “deep learning,” and “artificial intelligence”—GitHub provides OECD.AI with a list of public projects containing AI code. GitHub updates the list of public AI projects on a quarterly basis, which allows OECD.AI to capture trends in AI software development over time.

Obtaining AI Projects’ Metadata
OECD.AI uses GitHub’s list of public AI projects to query GitHub’s public API and obtain more information about these projects. Project metadata may include the individual or organization that created the project; the programming language(s) (e.g., Python) and development tool(s) (e.g., Jupyter Notebooks) used in the project; as well as information about the contributions—or “commits”—made to it, which include the commit’s author and a timestamp. In practical terms, a contribution or “commit” is an individual change to a file or set of files. Additionally, GitHub automatically suggests topical tags for each project based on its content. These topical tags need to be confirmed or modified by the project owner(s) to appear in the metadata.
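As an illustration of the metadata fields described above, here is a minimal sketch that extracts the creator, primary language, and confirmed topics from a repository record shaped like the JSON returned by GitHub's public REST API (`GET /repos/{owner}/{repo}`); the sample record itself is hypothetical.

```python
# A hypothetical repository record, shaped like GitHub's REST API response.
sample_repo = {
    "full_name": "example-org/example-ai-project",
    "owner": {"login": "example-org", "type": "Organization"},
    "language": "Python",
    "topics": ["machine-learning", "deep-learning"],
}

def summarize_repo(repo: dict) -> dict:
    """Pull out the creator, primary language, and topical tags."""
    return {
        "creator": repo["owner"]["login"],
        "creator_type": repo["owner"]["type"],
        "language": repo.get("language"),
        "topics": repo.get("topics", []),
    }

print(summarize_repo(sample_repo))
```

In practice the record would come from an authenticated API call rather than a literal dict; only the field names are taken from GitHub's documented response format.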
Appendix
Chapter 1: Research and Development
Mapping Contributions to AI Projects to a Country
Contributions to public AI projects are mapped to a country based on location information at the contributor level and at the project level.

a) Location information at the contributor level:
• GitHub’s “Location” field: Contributors can provide their location in their GitHub account. Given that GitHub’s location field accepts free text, the location provided by contributors is not standardized and could belong to different levels (e.g., suburban, urban, regional, or national). To allow cross-country comparisons, Mapbox is used to standardize all available locations to the country level.
• Top-level domain: Where the location field is empty or the location is not recognized, a contributor’s location is assigned based on his or her email domain (e.g., .fr, .us, etc.).

b) Location information at the project level:
• Project information: Where no location information is available at the contributor level, information at the repository or project level is exploited. In particular, contributions from contributors with no location information to projects created or owned by a known organization are automatically assigned the organization’s country (i.e., the country where its headquarters are located). For example, contributions from a contributor with no location information to an AI project owned by Microsoft will be assigned to the United States.

If the above fails, a contributor’s location field is left blank.

As of October 2021, 71.2% of the contributions to public AI projects were mapped to a country using this methodology. However, a decreasing trend in the share of AI projects for which a location can be identified is observed over time, indicating a possible lag in location reporting.

Measuring Contributions to AI Projects
Collaboration on a given public AI project is measured by the number of contributions—or “commits”—made to it.

To obtain a fractional count of contributions by country, an AI project is divided equally by the total number of contributions made to it. A country’s total contributions to AI projects is therefore given by the sum of its contributions—in fractional counts—to each AI project. In relative terms, the share of contributions to public AI projects made by a given country is the ratio of that country’s contributions to each of the AI projects in which it participates over the total contributions to AI projects from all countries.

In future iterations, OECD.AI plans to include additional measures of contribution to AI software development, such as issues raised, comments, and pull requests.

Identifying Programming Languages and Development Tools Used in AI Projects
GitHub uses file extensions contained in a project to automatically tag it with one or more programming languages and/or development tools. This implies that more than one programming language or development tool could be used in a given AI project.
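The fractional-counting scheme described above can be sketched in a few lines; the commit records below are hypothetical.

```python
from collections import defaultdict

# Each record: (project, contributor country, number of commits). Hypothetical data.
commits = [
    ("project-a", "US", 3),
    ("project-a", "FR", 1),
    ("project-b", "FR", 2),
]

# Total commits per project, so each project can be divided equally
# among the contributions made to it.
totals = defaultdict(int)
for project, _, n in commits:
    totals[project] += n

# A country's fractional count in a project is its share of that project's
# commits; summing over projects gives the country's total.
by_country = defaultdict(float)
for project, country, n in commits:
    by_country[country] += n / totals[project]

print(dict(by_country))
```

With this toy data the US gets 3/4 of project-a, and France gets 1/4 of project-a plus all of project-b; dividing each country's total by the number of projects (here 2) yields the relative shares described above.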
Measuring the Quality of AI Projects
Two quality measures are used to classify public AI
projects:
• Project impact: The impact of an AI project is given by the number of managed copies (i.e., “forks”) made of that project.
• Project popularity: The popularity of an AI project is given by the number of followers (i.e., “stars”) received by that project.
Filtering by project impact or popularity could help identify countries that contribute the most to high-quality projects.
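A minimal sketch of filtering projects by these two quality measures; the project names and counts are hypothetical, as are the thresholds.

```python
# Hypothetical projects with fork (impact) and star (popularity) counts.
projects = [
    {"name": "proj-a", "forks": 1200, "stars": 9000},
    {"name": "proj-b", "forks": 15, "stars": 40},
    {"name": "proj-c", "forks": 300, "stars": 2500},
]

def high_quality(projects, min_forks=100, min_stars=1000):
    """Keep projects above both the impact and popularity thresholds."""
    return [p["name"] for p in projects
            if p["forks"] >= min_forks and p["stars"] >= min_stars]

print(high_quality(projects))  # ['proj-a', 'proj-c']
```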
Measuring Collaboration
Two countries are said to collaborate on a specific
public AI software development project if there is
at least one contributor from each country with at
least one contribution (i.e., “commit”) to the project.
Domestic collaboration occurs when two contributors
from the same country contribute to a project.
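The collaboration definition above can be sketched directly: two countries collaborate on a project if each has at least one commit to it. The commit records are hypothetical.

```python
from itertools import combinations

# (project, contributor country) pairs, one per commit. Hypothetical data.
commits = [
    ("proj-a", "US"), ("proj-a", "FR"), ("proj-a", "US"),
    ("proj-b", "DE"), ("proj-b", "DE"),
]

def collaborations(commits):
    """Return the set of country pairs sharing at least one project."""
    countries_by_project = {}
    for project, country in commits:
        countries_by_project.setdefault(project, set()).add(country)
    pairs = set()
    for countries in countries_by_project.values():
        pairs.update(combinations(sorted(countries), 2))
    return pairs

print(collaborations(commits))  # {('FR', 'US')}
```

proj-b has two contributors from the same country, so it counts as domestic collaboration only and produces no cross-country pair.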
Chapter 2: Technical Performance
ImageNet
Data on ImageNet accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (top-1 accuracy) correspond to the result reported in the most recent version of each paper. Learn more about the LSVRC ImageNet competition and the ImageNet dataset.

To highlight progress on top-1 accuracy without the use of extra training data, scores were taken from the following papers:
Aggregated Residual Transformations for Deep Neural Networks
ImageNet Classification With Deep Convolutional Neural Networks
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Progressive Neural Architecture Search
Rethinking the Inception Architecture for Computer Vision
Some Improvements on Deep Convolutional Neural Network Based Image Classification
Very Deep Convolutional Networks for Large-Scale Image Recognition
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

To highlight progress on top-1 accuracy with the use of extra training data, scores were taken from the following papers:
Big Transfer (BiT): General Visual Representation Learning
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoCa: Contrastive Captioners Are Image-Text Foundation Models
Exploring the Limits of Weakly Supervised Pretraining
Fixing the Train-Test Resolution Discrepancy: FixEfficientNet
Meta Pseudo Labels
Self-Training With Noisy Student Improves ImageNet Classification

National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT)
Data on NIST FRVT 1:1 verification accuracy by dataset was obtained from the FRVT 1:1 verification leaderboard.
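Many of the vision benchmarks in this chapter report top-1 accuracy; a minimal sketch of how it is computed from per-class scores, with hypothetical predictions and labels:

```python
def top1_accuracy(predictions, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = sum(
        1 for scores, label in zip(predictions, labels)
        if max(range(len(scores)), key=scores.__getitem__) == label
    )
    return correct / len(labels)

# Three examples over three classes (hypothetical scores).
preds = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 0, 0]
print(top1_accuracy(preds, labels))  # 2 of 3 correct
```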
Celeb-DF
Data on Celeb-DF AUC was retrieved through a detailed arXiv literature review. The reported dates correspond to the year in which a paper was first published to arXiv or a method was introduced. With Celeb-DF, recent researchers have tested previously existing deepfake detection methodologies. The year in which a method was introduced, even if it was subsequently tested, is the year in which it is included in the report. The reported results (AUC) correspond to the result reported in the most recent version of each paper. Details on the Celeb-DF benchmark can be found in the Celeb-DF paper.

To highlight progress on Celeb-DF, scores were taken from the following papers:
Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification
Exposing Deepfake Videos by Detecting Face Warping Artifacts
Face X-Ray for More General Face Forgery Detection
FaceForensics++: Learning to Detect Manipulated Facial Images
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

MPII
Data on MPII percentage of correct keypoints (PCK) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (PCK) correspond to the result reported in the most recent version of each paper. Details on the MPII benchmark can be found in the MPII paper and MPII dataset.

To highlight progress on percentage of correct keypoints without the use of extra training data, scores were taken from the following papers:
Bottom-Up and Top-Down Reasoning With Hierarchical Rectified Gaussians
Cascade Feature Aggregation for Human Pose Estimation
Deeply Learned Compositional Models for Human Pose Estimation
Efficient Object Localization Using Convolutional Networks
Learning Feature Pyramids for Human Pose Estimation
Stacked Hourglass Networks for Human Pose Estimation
Toward Fast and Accurate Human Pose Estimation via Soft-Gated Skip Connections
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

Cityscapes Challenge, Pixel-Level Semantic Labeling Task
Data on the Cityscapes challenge pixel-level semantic labeling task mean intersection-over-union (mIoU) was taken from the Cityscapes dataset, specifically its pixel-level semantic labeling leaderboard. More details about the Cityscapes dataset and other corresponding semantic segmentation challenges can be accessed at the Cityscapes dataset webpage.
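Cityscapes results are reported as mean intersection-over-union; a minimal sketch of per-class IoU averaged over classes, using tiny hypothetical label maps:

```python
def mean_iou(pred, truth, num_classes):
    """Average over classes of |pred ∩ truth| / |pred ∪ truth| per class."""
    ious = []
    for c in range(num_classes):
        p = {i for i, v in enumerate(pred) if v == c}
        t = {i for i, v in enumerate(truth) if v == c}
        union = p | t
        if union:  # skip classes absent from both maps
            ious.append(len(p & t) / len(union))
    return sum(ious) / len(ious)

# Flattened label maps with two classes (hypothetical).
pred = [0, 0, 1, 1, 1, 0]
truth = [0, 0, 1, 1, 0, 0]
print(mean_iou(pred, truth, num_classes=2))
```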
Kvasir-SEG
Data on Kvasir-SEG mean dice was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (mean dice) correspond to the result reported in the most recent version of each paper. Details on the Kvasir-SEG benchmark can be found in the Kvasir-SEG paper.

To highlight progress on Kvasir-SEG, scores were taken from the following papers:
GMSRF-Net: An Improved Generalizability With Global Multi-Scale Residual Fusion Network for Polyp Segmentation
PraNet: Parallel Reverse Attention Network for Polyp Segmentation
ResUNet++: An Advanced Architecture for Medical Image Segmentation
Spatially Exclusive Pasting: A General Data Augmentation for the Polyp Segmentation

Common Objects in Context (COCO)
Data on COCO mean average precision (mAP50) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (mAP50) correspond to the result reported in the most recent version of each paper. Details on the COCO benchmark can be found in the COCO paper.

To highlight progress on COCO, scores were taken from the following papers:
An Analysis of Scale Invariance in Object Detection-SNIP
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
Deformable ConvNets v2: More Deformable, Better Results
DetectoRS: Detecting Objects With Recursive Feature Pyramid and Switchable Atrous Convolution
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Grounded Language-Image Pre-training
Inside-Outside Net: Detecting Objects in Context With Skip Pooling and Recurrent Neural Networks

CIFAR-10
Data on CIFAR-10 FID scores was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (FID score) correspond to the result reported in the most recent version of each paper. Details on the CIFAR-10 benchmark can be found in the CIFAR-10 paper.

To highlight progress on CIFAR-10, scores were taken from the following papers:
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Refining Generative Process With Discriminator Guidance in Score-Based Diffusion Models
Score-Based Generative Modeling in Latent Space
Score-Based Generative Modeling Through Stochastic Differential Equations
Self-Supervised GAN: Analysis and Improvement With Multi-Class Minimax Game
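The mean dice reported for Kvasir-SEG above is the dice coefficient averaged over images; a minimal sketch for one pair of binary masks, represented here as hypothetical sets of foreground pixel coordinates:

```python
def dice(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for sets of foreground pixels."""
    if not pred and not truth:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Foreground pixels of predicted and ground-truth masks (hypothetical).
pred = {(0, 0), (0, 1), (1, 1)}
truth = {(0, 1), (1, 1), (1, 2)}
print(dice(pred, truth))  # 2*2 / (3+3) = 2/3
```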
STL-10
Data on STL-10 FID scores was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (FID score) correspond to the result reported in the most recent version of each paper. Details on the STL-10 benchmark can be found in the STL-10 paper.

To highlight progress on STL-10, scores were taken from the following papers:
DEGAS: Differentiable Efficient Generator Search
Diffusion-GAN: Training GANs With Diffusion
Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator
Dist-GAN: An Improved GAN Using Distance Constraints
Soft Truncation: A Universal Training Technique of Score-Based Diffusion Model for High Precision Score Estimation

Text-to-Image Models on MS-COCO 256 × 256 FID-30K
Data on MS-COCO 256 × 256 FID-30K for text-to-image models was retrieved from the paper Saharia et al., 2022.

Visual Question Answering (VQA)
Data on VQA accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Human-level performance is taken from the 2021 VQA challenge.

To highlight progress on VQA accuracy without the use of extra training data, scores were taken from the following papers:
Bilinear Attention Networks
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge
UNITER: UNiversal Image-TExt Representation Learning
VLMo: Unified Vision-Language Pre-training With Mixture-of-Modality-Experts

BEiT-3 vs. Previous SOTA
Data on BEiT-3 and previous SOTA was retrieved from the paper Wang et al., 2022.

Visual Commonsense Reasoning (VCR)
Data on VCR Q->AR score was taken from the VCR leaderboard; the VCR leaderboard webpage further delineates the methodology behind the VCR challenge. Human performance on VCR is taken from Zellers et al., 2018. Details on the VCR benchmark can be found in the VCR paper.
Kinetics-400, Kinetics-600, and Kinetics-700
Data on Kinetics-400, Kinetics-600, and Kinetics-700 accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code (Kinetics-400, Kinetics-600, and Kinetics-700). The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (top-1 accuracy) correspond to the result reported in the most recent version of each paper. Details on the Kinetics-400 benchmark can be found in the Kinetics-400 paper. Details on the Kinetics-600 benchmark can be found in the Kinetics-600 paper. Details on the Kinetics-700 benchmark can be found in the Kinetics-700 paper.

To highlight progress on Kinetics-400, scores were taken from the following papers:
Co-training Transformer With Videos and Images Improves Action Recognition
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Large-Scale Weakly-Supervised Pre-training for Video Action Recognition
Non-Local Neural Networks
Omni-Sourced Webly-Supervised Learning for Video Recognition
SlowFast Networks for Video Recognition
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

To highlight progress on Kinetics-600, scores were taken from the following papers:
Learning Spatio-Temporal Representation With Local and Global Diffusion
Masked Feature Prediction for Self-Supervised Visual Pre-training
PERF-Net: Pose Empowered RGB-Flow Net
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-Offs in Video Classification
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
SlowFast Networks for Video Recognition

To highlight progress on Kinetics-700, scores were taken from the following papers:
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Learn to Cycle: Time-Consistent Feature Discovery for Action Recognition
Masked Feature Prediction for Self-Supervised Visual Pre-training

Text-to-Video Models on UCF-101
Data on UCF-101 Inception Score (IS) for text-to-video models was retrieved from the papers Hong et al., 2022, and Singer et al., 2022.
SuperGLUE
The SuperGLUE benchmark data was pulled from the SuperGLUE leaderboard. Details about the
SuperGLUE benchmark are in the SuperGLUE paper and SuperGLUE software toolkit. The tasks and
evaluation metrics for SuperGLUE are:
NAME | IDENTIFIER | METRIC
Broad-coverage Diagnostics | AX-b | Matthews Corr.
CommitmentBank | CB | Avg. F1/Accuracy
Choice of Plausible Alternatives | COPA | Accuracy
Multi-Sentence Reading Comprehension | MultiRC | F1a/EM
Recognizing Textual Entailment | RTE | Accuracy
Words in Context | WiC | Accuracy
The Winograd Schema Challenge | WSC | Accuracy
BoolQ | BoolQ | Accuracy
Reading Comprehension With Commonsense Reasoning | ReCoRD | F1/Accuracy
Winogender Schema Diagnostics | AX-g | Gender Parity/Accuracy
Reading Comprehension Dataset Requiring
Logical Reasoning (ReClor)
Data on ReClor accuracy was retrieved through the ReClor leaderboard cross-referenced by technical progress
reported on Papers With Code. Details about the ReClor benchmark can be found in the ReClor paper.
Narrative Highlight:
Just How Much Better Have Language Models Become?
Complete answer outputted by GPT-2 to the AI Index prompt request.

Model prompt >>> Explain to me the major accomplishments of Theodore Roosevelt’s presidency.

(The most notable is the establishment of a national telephone and mail exchange company.)

But the only things that will change with a president who knows how to operate a business is his understanding of the American system of government. The public knows all about our problems, yet at least half of the people who need it won’t know anything about ours. We can no longer have this. Our system of government is the only one in which you really need a public relations firm to work hard.

The system of governmental control, while working, is not always effective in ensuring that the very things that our economy needs, do not always be at the top of everybody’s list. So I wanted to know how successful the system you employ for managing the public’s needs is in helping the American economy grow, how efficient the service our economy provides is being made, and how it can better serve the needs of a growing number of Americans.

I wanted to know.

This was before I ever hired my own lawyer to work on the case. This was before I ever hired my own law firm to handle government investigations. For my first major job, at least, I had to convince two top aides to get paid $90,000 for the role, and before I even started I wanted to know in advance: What percentage of his work was actually done as a private citizen, and how efficiently should he handle it for his clients? And what advice would he give me should I end up in the public eye?

On the case, I had to convince one top staffer to get paid even $100,000 to have an interview with my lead prosecutor.

At no time, my job as the lead prosecutor on this case required me to put forth any of those two or five requests. Most people on the case, though, did. One person asked me what the case’s real value was. That didn’t stop me from telling it to one of my top managers. And my top manager never even mentioned it.

In the end, it was a two-week, $40 million round-trip tour of the government’s business, with the best lawyer possible. In exchange, we got a chance to hear his last public comment in this case, his last comment to me.

There was one more point — the end point of the case. When it came up for consideration in October of 1993, I had the choice to give it three years, or six. And I chose six.
Large Language Models on the Blocksworld Domain
Data on the Blocksworld domain for large language models was retrieved from the paper Valmeekam et al., 2022.

PubMed
Data on PubMed recall-oriented understudy for gisting evaluation (ROUGE-1) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (ROUGE-1) correspond to the result reported in the most recent version of each paper. Details about the PubMed benchmark are in the PubMed paper.

To highlight progress on PubMed, scores were taken from the following papers:
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Big Bird: Transformers for Longer Sequences
Get to the Point: Summarization With Pointer-Generator Networks
Long Document Summarization With Top-Down and Bottom-Up Inference
MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes
PEGASUS: Pre-training With Extracted Gap-Sentences for Abstractive Summarization

arXiv
Data on arXiv recall-oriented understudy for gisting evaluation (ROUGE-1) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (ROUGE-1) correspond to the result reported in the most recent version of each paper. Details about the arXiv benchmark are in the arXiv dataset webpage.

To highlight progress on arXiv, scores were taken from the following papers:
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Get to the Point: Summarization With Pointer-Generator Networks
Long Document Summarization With Top-Down and Bottom-Up Inference
LongT5: Efficient Text-to-Text Transformer for Long Sequences
PEGASUS: Pre-training With Extracted Gap-Sentences for Abstractive Summarization
Sparsifying Transformer Models With Trainable Representation Pooling
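The summarization benchmarks above report ROUGE-1, which scores unigram overlap between a candidate summary and a reference; a minimal sketch of ROUGE-1 recall, with hypothetical toy texts:

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

reference = "the model summarizes long documents"
candidate = "the model summarizes documents well"
print(rouge1_recall(candidate, reference))  # 4 of 5 reference words = 0.8
```

Published scores typically report the F1 variant, which also divides the overlap by the candidate length; the counting shown here is the common core.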
Abductive Natural Language
Inference (aNLI)
Data on Abductive Natural Language Inference (aNLI)
was sourced from the Allen Institute for AI’s aNLI
leaderboard. Details on the aNLI benchmark can be
found in the aNLI paper.
SST-5 Fine-Grained
Data on SST-5 Fine-Grained accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details about the SST-5 Fine-Grained benchmark can be found in the SST paper.

To highlight progress on SST-5 Fine-Grained accuracy, scores were taken from the following papers:
An Algorithm for Routing Capsules in All Domains
An Algorithm for Routing Vectors in Sequences
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
Improved Sentence Modeling Using Suffix Bidirectional LSTM
Learned in Translation: Contextualized Word Vectors
Less Grammar, More Features
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Self-Explaining Structures Improve NLP Models

MMLU
Data on MMLU accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers With Code. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details about the MMLU benchmark can be found in the MMLU paper.

To highlight progress on MMLU accuracy, scores were taken from the following papers:
Language Models Are Few-Shot Learners
Language Models Are Unsupervised Multitask Learners
Scaling Instruction-Finetuned Language Models
Scaling Language Models: Methods, Analysis & Insights From Training Gopher

Number of Commercially Available MT Systems
Details about the number of commercially available MT systems were sourced from the Intento report The State of Machine Translation, 2022. Intento is a San Francisco–based startup that analyzes commercially available MT services.

VoxCeleb
Data on VoxCeleb equal error rate (EER) was retrieved from the VoxCeleb Speaker Recognition Challenge (VoxSRC). For the sake of consistency, the AI Index reported scores on the initial VoxCeleb dataset. Specifically, the AI Index made use of the following sources of information:
ID R&D System Description to VoxCeleb Speaker Recognition Challenge 2022
The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN-Based Speaker Verification
The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021
VoxCeleb: A Large-Scale Speaker Identification Dataset
VoxCeleb: Large-Scale Speaker Verification in the Wild
VoxCeleb2: Deep Speaker Recognition
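VoxCeleb results are reported as equal error rate, the operating point where the false acceptance rate equals the false rejection rate. A minimal sketch that approximates EER by sweeping thresholds over hypothetical verification scores:

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds drawn from the scores."""
    best = None
    for t in sorted(genuine + impostor):
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Hypothetical scores for same-speaker and different-speaker trials.
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.7, 0.4, 0.3, 0.2]
print(equal_error_rate(genuine, impostor))  # 0.25
```

Production evaluations interpolate between thresholds rather than taking the midpoint at the closest one, but the crossing point being measured is the same.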
Whisper
Data on Whisper for large-scale speech recognition
models was retrieved from the paper Radford et al.,
2022.
Procgen
Data on Procgen mean-normalized score was retrieved through a detailed arXiv literature review. The reported dates correspond to the year in which a paper was first published to arXiv, and the reported results (mean-normalized score) correspond to the result reported in the most recent version of each paper. Details on the Procgen benchmark can be found in the Procgen paper.

To highlight progress on Procgen, scores were taken from the following papers:
Automatic Data Augmentation for Generalization in Reinforcement Learning
Leveraging Procedural Generation to Benchmark Reinforcement Learning
Procedural Generalization by Planning With Self-Supervised World Models
Rethinking Value Function Learning for Generalization in Reinforcement Learning

Training Time, Number of Accelerators, and Performance
Data on training time, number of accelerators, and performance for AI systems was taken from the MLPerf Training and Inference benchmark competitions. Details on the MLPerf Training benchmark can be found in the MLPerf Training Benchmark paper, while details on MLPerf Inference can be found in the MLPerf Inference Benchmark paper. Information about the current benchmark categories as well as technical information about submission and competition subdivisions can be found on the MLPerf Training and MLPerf Inference webpages.

The AI Index made use of data from the following MLPerf Training competitions:
MLPerf Training v2.1, 2022
MLPerf Training v2.0, 2022
MLPerf Training v1.1, 2021
MLPerf Training v1.0, 2021
MLPerf Training v0.7, 2020
MLPerf Training v0.6, 2019
MLPerf Training v0.5, 2018

The AI Index made use of data from the following MLPerf Inference competitions:
MLPerf Inference v2.1, 2022
MLPerf Inference v2.0, 2022
MLPerf Inference v1.1, 2021
MLPerf Inference v1.0, 2021
MLPerf Inference v0.7, 2020
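The Procgen mean-normalized score discussed above rescales each environment's raw return by random-agent and maximum returns before averaging; a sketch with hypothetical per-environment returns (the normalization constants for the real benchmark come from the Procgen paper):

```python
# Per-environment (raw, random-agent, max) returns; values hypothetical.
results = {
    "coinrun": (8.0, 5.0, 10.0),
    "starpilot": (30.0, 2.5, 64.0),
}

def mean_normalized_score(results):
    """Average of (raw - random) / (max - random) across environments."""
    norm = [(raw - rand) / (mx - rand) for raw, rand, mx in results.values()]
    return sum(norm) / len(norm)

print(mean_normalized_score(results))
```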
GPUs’ Performance and Price
The AI Index collected data on GPUs’ performance and price, building on and extending the dataset collected from Epoch AI’s Trends in GPU Price-Performance blog post.

The AI Index compiled a list of GPUs starting from the Median Group (2018), Sun et al. (2019), and Epoch (2022) datasets. To update and extend previous analysis, the AI Index included new GPU releases for the period 2021–2023, gathering information from sources such as TechPowerUp, WikiChip, and Wikipedia entries for the product series. We also collected information about GPUs released before 2021 from the manufacturer’s catalog or Wikipedia’s list of processors.

To disambiguate duplicates of different versions of the same product with different specifications, the AI Index added the part number or difference in specification, as applicable.

To find GPU prices, the AI Index searched various sources including the manufacturer’s website, Wikipedia, and TechPowerUp. GPU prices have been adjusted for inflation using CPI-U data provided by the U.S. Bureau of Labor Statistics. Missing data for certain GPUs was completed using additional sources, such as the manufacturer’s website, Wikipedia, and TechPowerUp. This includes information such as manufacturer, type, release date, performance (double, single, and half-precision operations per second), die size, power, clock speed, process size, and number of transistors.

Carbon Footprint of Select Machine Learning Models
Data on carbon-emission estimates of select machine learning models was sourced from the paper Luccioni et al., 2022. Data on carbon-emission estimates of real-life examples was retrieved from Strubell et al., 2019.

Energy Savings Results From BCOOLER Experiment
Data on energy savings over time for the BCOOLER experiment was sourced from the paper Luo et al., 2022.
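A minimal sketch of that CPI-based inflation adjustment, converting a nominal launch price to target-year dollars; the CPI index values and the GPU price below are hypothetical placeholders, not actual BLS data.

```python
# Hypothetical annual CPI-U index values (placeholders, not BLS data).
cpi = {2018: 251.1, 2023: 304.7}

def adjust_price(price, year, target_year, cpi):
    """Convert a nominal price to target-year dollars via the CPI ratio."""
    return price * cpi[target_year] / cpi[year]

# A GPU launched at $699 in 2018, expressed in 2023 dollars.
print(round(adjust_price(699.0, 2018, 2023, cpi), 2))
```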
Chapter 3: Technical AI Ethics
Meta-Analysis of Fairness
and Bias Metrics
For the analysis conducted on fairness and bias
metrics in AI, we identify and report on benchmark
and diagnostic metrics which have been consistently
cited in the academic community, reported on a public
leaderboard, or reported for publicly available baseline
models (e.g., GPT-3, BERT, ALBERT). We note that
research paper citations are a lagging indicator of
adoption, and metrics which have been very recently
adopted may not be reflected in the data for 2022. We
include the full list of papers considered in the 2022 AI
Index as well as the following additional papers:
Beyond the Imitation Game: Quantifying and
Extrapolating the Capabilities of Language Models
BBQ: A Hand-Built Bias Benchmark for
Question Answering
VLStereoSet: A Study of Stereotypical Bias
in Pre-trained Vision-Language Models
Natural Language Processing
Bias Metrics
In Section 3.3, we track citations of the Perspective
API created by Jigsaw at Google. The Perspective API
has been adopted widely by researchers and engineers
in natural language processing. Its creators define
toxicity as “a rude, disrespectful, or unreasonable
comment that is likely to make someone leave a
discussion,” and the tool is powered by machine
learning models trained on a proprietary dataset of
comments from Wikipedia and news websites.
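Programmatic use of the Perspective API works by POSTing a JSON request to its `comments:analyze` endpoint; a minimal sketch that builds such a request body (actually sending it requires an API key and a network call, both omitted here):

```python
import json

def build_toxicity_request(text: str) -> str:
    """Build the JSON body for a Perspective API comments:analyze call."""
    return json.dumps({
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    })

body = build_toxicity_request("You are a wonderful person.")
print(body)
```

The response assigns the comment a TOXICITY score between 0 and 1; the field names follow Perspective's published request format, though the exact shape should be checked against the current API documentation.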
We include the full list of papers considered in the
2022 AI Index as well as the following additional
papers:
Discovering Language Model Behaviors With
Model-Written Evaluations
AlexaTM 20B: Few-Shot Learning Using a
Large-Scale Multilingual Seq2Seq Model
“I’m Sorry to Hear That”: Finding New Biases in
Language Models With a Holistic Descriptor Dataset
Aligning Generative Language Models With
Human Values
On Measuring Social Biases in Prompt-Based
Multi-task Learning
Challenges in Measuring Bias via Open-Ended
Language Generation
PaLM: Scaling Language Modeling With Pathways
Characteristics of Harmful Text: Towards
Rigorous Benchmarking of Language Models
Perturbation Augmentation for Fairer NLP
Scaling Instruction-Finetuned Language Models
SODAPOP: Open-Ended Discovery of Social
Biases in Social Commonsense Reasoning Models
Towards Robust NLG Bias Evaluation
With Syntactically-Diverse Prompts
Controllable Natural Language Generation With Contrastive Prefixes
DD-TIG at SemEval-2022 Task 5: Investigating the Relationships Between Multimodal and Unimodal Information in Misogynous Memes Detection and Classification
Detoxifying Language Models With a Toxic Corpus
DisCup: Discriminator Cooperative Unlikelihood Prompt-Tuning for Controllable Text Generation
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Flamingo: A Visual Language Model for Few-Shot Learning
Galactica: A Large Language Model for Science
GLaM: Efficient Scaling of Language Models With Mixture-of-Experts
GLM-130B: An Open Bilingual Pre-trained Model
Gradient-Based Constrained Sampling From Language Models
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Holistic Evaluation of Language Models
An Invariant Learning Characterization of Controlled Text Generation
LaMDA: Language Models for Dialog Applications
Leashing the Inner Demons: Self-Detoxification for Language Models
Measuring Harmful Representations in Scandinavian Language Models
Mitigating Toxic Degeneration With Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
MULTILINGUAL HATECHECK: Functional Tests for Multilingual Hate Speech Detection Models
A New Generation of Perspective API: Efficient Multilingual Character-Level Transformers
OPT: Open Pre-trained Transformer Language Models
PaLM: Scaling Language Modeling With Pathways
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense
Predictability and Surprise in Large Generative Models
Quark: Controllable Text Generation With Reinforced [Un]learning
Red Teaming Language Models With Language Models
Reward Modeling for Mitigating Toxicity in Transformer-based Language Models
Robust Conversational Agents Against Imperceptible Toxicity Triggers
Scaling Instruction-Finetuned Language Models
StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models
Training Language Models to Follow Instructions With Human Feedback
Transfer Learning From Multilingual DeBERTa for Sexism Identification
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

While the Perspective API is used widely within machine learning research and also for measuring online toxicity, toxicity in the specific domains used to train the models undergirding Perspective (e.g., news, Wikipedia) may not be broadly representative of all forms of toxicity (e.g., trolling). Other known caveats include biases against text written by minority voices: The Perspective API has been shown to disproportionately assign high toxicity scores to text that contains mentions of minority identities (e.g., “I am a gay man”). As a result, detoxification techniques built with labels sourced from the Perspective API result in models that are less capable of modeling language used by minority groups, and may avoid mentioning minority identities. New versions of the Perspective API have been deployed since its inception, and there may be subtle undocumented shifts in its behavior over time.
RealToxicityPrompts
We sourced the RealToxicityPrompts dataset of evaluations from the HELM benchmark website, as documented in v0.1.0.

AI Ethics in China
The data in this section is sourced from the 2022 paper AI Ethics With Chinese Characteristics? Concerns and Preferred Solutions in Chinese Academia. We are grateful to Junhua Zhu for clarifications and correspondence.

AI Ethics Trends at FAccT and NeurIPS
To understand trends at the ACM Conference on Fairness, Accountability, and Transparency, this section tracks FAccT papers published in conference proceedings from 2018 to 2022. We categorize author affiliations into academic, industry, nonprofit, government, and independent categories, while also tracking the location of their affiliated institution. Authors with multiple affiliations are counted once in each category (academic and industry), but multiple affiliations of the same type (i.e., authors belonging to two academic institutions) are counted once in the category.

For the analysis conducted on NeurIPS publications, we identify workshops themed around real-world impact and label papers with a single main category in “healthcare,” “climate,” “finance,” “developing world,” “science,” or “other,” where “other” denotes a paper related to a real-world use case but not in one of the other categories. The “science” category is new in 2022, but includes retroactive analysis of papers from previous years. We tally the number of papers in each category to reach the numbers found in Figure 3.7.3. Papers are not double-counted in multiple categories. We note that this data may not be as accurate for data pre-2018, as societal impacts work at NeurIPS has historically been categorized under a broad “AI for social impact” umbrella, but it has recently been split into more granular research areas. Examples include workshops dedicated to machine learning for health; climate; policy and governance; disaster response; and the developing world.

To track trends around specific technical topics at NeurIPS as in Figures 3.7.4 to 3.7.7, we count the number of papers accepted to the NeurIPS main track with titles containing keywords (e.g., “counterfactual” or “causal” for tracking papers related to causal effect), as well as papers submitted to related workshops.

TruthfulQA
We sourced the TruthfulQA dataset of evaluations from the HELM benchmark website, as documented in v0.1.0.
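The keyword-tracking step described above amounts to counting accepted titles that contain any of a topic's keywords. The sketch below illustrates the idea; the topics, keywords, and titles are invented for illustration, not the AI Index's actual tracking lists.

```python
# Hypothetical sketch of keyword-based paper tallying: a paper counts toward
# a topic if its title contains any of that topic's keywords.
TOPIC_KEYWORDS = {
    "causal effect": ["counterfactual", "causal"],
    "fairness": ["fair", "bias"],
}

def count_papers_by_topic(titles, topic_keywords):
    """Return {topic: number of titles containing any of its keywords}."""
    counts = {topic: 0 for topic in topic_keywords}
    for title in titles:
        lowered = title.lower()
        for topic, keywords in topic_keywords.items():
            if any(kw in lowered for kw in keywords):
                counts[topic] += 1
    return counts

titles = [
    "Counterfactual Explanations for Credit Models",
    "A Causal View of Recommendation",
    "Scaling Vision Transformers",
]
print(count_papers_by_topic(titles, TOPIC_KEYWORDS))
# the first two titles match "causal effect"; none match "fairness"
```

A paper matching several keywords of the same topic is still counted once per topic, which mirrors tallying by title rather than by keyword hit.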
Chapter 4: The Economy
Lightcast
Prepared by Scott Bingham, Julia Nania, Layla O’Kane, and Bledi Taska

Lightcast delivers job market analytics that empower employers, workers, and educators to make data-driven decisions. The company’s artificial intelligence technology analyzes hundreds of millions of job postings and real-life career transitions to provide insight into labor market patterns. This real-time strategic intelligence offers crucial insights, such as what jobs are most in demand, the specific skills employers need, and the career directions that offer the highest potential for workers. For more information, visit www.lightcast.io.

Job Posting Data
To support these analyses, Lightcast mined its dataset of millions of job postings collected since 2010. Lightcast collects postings from over 51,000 online job sites to develop a comprehensive, real-time portrait of labor market demand. It aggregates job postings, removes duplicates, and extracts data from job posting text. This includes information on job title, employer, industry, and region, as well as required experience, education, and skills.

Job postings are useful for understanding trends in the labor market because they allow for a detailed, real-time look at the skills employers seek. To assess the representativeness of job postings data, Lightcast conducts a number of analyses to compare the distribution of job postings to the distribution of official government and other third-party sources in the United States. The primary source of government data on U.S. job postings is the Job Openings and Labor Turnover Survey (JOLTS) program, conducted by the Bureau of Labor Statistics. Based on comparisons between JOLTS and Lightcast, the labor market demand captured by Lightcast data represents over 99% of the total labor demand. Jobs not posted online are usually in small businesses (the classic example being the “Help Wanted” sign in a restaurant window) and union hiring halls.

Measuring Demand for AI
In order to measure the demand by employers for AI skills, Lightcast uses its skills taxonomy of over 31,000 skills. The list of AI skills from Lightcast data is shown below, with associated skill clusters. While some skills are considered to be in the AI cluster specifically, for the purposes of this report, all skills below were considered AI skills. A job posting was considered an AI job if it mentioned any of these skills in the job text.

Artificial Intelligence: AIOps (Artificial Intelligence for IT Operations), Applications of Artificial Intelligence, Artificial General Intelligence, Artificial Intelligence, Artificial Intelligence Development, Artificial Intelligence Markup Language (AIML), Artificial Intelligence Systems, Azure Cognitive Services, Baidu, Cognitive Automation, Cognitive Computing, Computational Intelligence, Cortana, Expert Systems, Intelligent Control, Intelligent Systems, Interactive Kiosk, IPSoft Amelia, Knowledge-Based Configuration, Knowledge-Based Systems, Multi-Agent Systems, Open Neural Network Exchange (ONNX), OpenAI Gym, Reasoning Systems, Soft Computing, Syman, Watson Conversation, Watson Studio, Weka
Autonomous Driving: Advanced Driver Assistance Systems, Autonomous Cruise Control Systems, Autonomous System, Autonomous Vehicles, Guidance Navigation and Control Systems, Light Detection and Ranging (LiDAR), OpenCV, Path Analysis, Path Finding, Remote Sensing, Unmanned Aerial Systems (UAS)

Natural Language Processing (NLP): Amazon Textract, ANTLR, BERT (NLP Model), Chatbot, Computational Linguistics, DeepSpeech, Dialog Systems, fastText, Fuzzy Logic, Handwriting Recognition, Hugging Face (NLP Framework), HuggingFace Transformers, Intelligent Agent, Intelligent Software Assistant, Intelligent Virtual Assistant, Kaldi, Latent Dirichlet Allocation, Lexalytics, Machine Translation, Microsoft LUIS, Natural Language Generation, Natural Language Processing, Natural Language Processing Systems, Natural Language Programming, Natural Language Toolkits, Natural Language Understanding, Natural Language User Interface, Nearest Neighbour Algorithm, OpenNLP, Optical Character Recognition (OCR), Screen Reader, Semantic Analysis, Semantic Interpretation for Speech Recognition, Semantic Parsing, Semantic Search, Sentiment Analysis, Seq2Seq, Speech Recognition, Speech Recognition Software, Statistical Language Acquisition, Text Mining, Tokenization, Voice Interaction, Voice User Interface, Word Embedding, Word2Vec Models

Machine Learning: AdaBoost, Apache MADlib, Apache Mahout, Apache SINGA, Apache Spark, Association Rule Learning, Automated Machine Learning, Autonomic Computing, AWS SageMaker, Azure Machine Learning, Boosting, CHi-Squared Automatic Interaction Detection (CHAID), Classification And Regression Tree (CART), Cluster Analysis, Collaborative Filtering, Confusion Matrix, Cyber-Physical Systems, Dask (Software), Data Classification, DBSCAN, Decision Models, Decision Tree Learning, Dimensionality Reduction, Dlib (C++ Library), Ensemble Methods, Evolutionary Programming, Expectation Maximization Algorithm, Feature Engineering, Feature Extraction, Feature Learning, Feature Selection, Gaussian Process, Genetic Algorithm, Google AutoML, Google Cloud ML Engine, Gradient Boosting, H2O.ai, Hidden Markov Model, Hyperparameter Optimization, Inference Engine, K-Means Clustering, Kernel Methods, Kubeflow, LIBSVM, Machine Learning, Machine Learning Algorithms, Markov Chain, Matrix Factorization, Meta Learning, Microsoft Cognitive Toolkit (CNTK), MLflow, MLOps (Machine Learning Operations), mlpack (C++ Library), Naive Bayes, Perceptron, Predictionio, PyTorch (Machine Learning Library), Random Forest Algorithm, Recommendation Engine, Recommender Systems, Reinforcement Learning, Scikit-learn (Machine Learning Library), Semi-Supervised Learning, Soft Computing, Sorting Algorithm, Supervised Learning, Support Vector Machine, Test Datasets, Torch (Machine Learning), Training Datasets, Transfer Learning, Unsupervised Learning, Vowpal Wabbit, Xgboost

Neural Networks: Apache MXNet, Artificial Neural Networks, Autoencoders, Caffe, Caffe2, Chainer, Convolutional Neural Networks, Cudnn, Deep Learning, Deeplearning4j, Keras (Neural Network Library), Long Short-Term Memory (LSTM), OpenVINO, PaddlePaddle, Pybrain, Recurrent Neural Network (RNN), TensorFlow
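The classification rule stated in the Lightcast methodology (a posting counts as an AI job if its text mentions any skill on the list) can be sketched as a simple membership test. The skill subset and postings below are illustrative, not Lightcast's actual matching pipeline.

```python
# A tiny invented subset of the AI skill list above; parenthetical qualifiers
# are stripped before matching, a simplification of real skill extraction.
AI_SKILLS = {
    "Machine Learning",
    "Computer Vision",
    "PyTorch (Machine Learning Library)",
    "Expert Systems",
}

def is_ai_job(posting_text, ai_skills=AI_SKILLS):
    """True if the posting text mentions any AI skill (case-insensitive)."""
    text = posting_text.lower()
    # Keep only the skill name before any " (qualifier)" suffix.
    return any(skill.split(" (")[0].lower() in text for skill in ai_skills)

print(is_ai_job("Seeking an engineer with PyTorch and computer vision experience"))
print(is_ai_job("Line cook needed for evening shifts"))
```

A production matcher would tokenize and disambiguate skills rather than use raw substring search, but the any-mention rule itself is this simple.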
Robotics: Advanced Robotics, Cognitive Robotics, Motion Planning, Nvidia Jetson, Robot Framework, Robot Operating Systems, Robotic Automation Software, Robotic Liquid Handling Systems, Robotic Programming, Robotic Systems, Servomotor, SLAM Algorithms (Simultaneous Localization and Mapping)

Visual Image Recognition: 3D Reconstruction, Activity Recognition, Computer Vision, Contextual Image Classification, Digital Image Processing, Eye Tracking, Face Detection, Facial Recognition, Image Analysis, Image Matching, Image Processing, Image Recognition, Image Segmentation, Image Sensor, Imagenet, Machine Vision, Motion Analysis, Object Recognition, OmniPage, Pose Estimation, RealSense

LinkedIn
Prepared by Murat Erer and Akash Kaura

Country Sample
Included countries represent a select sample of eligible countries with at least 40% labor force coverage by LinkedIn and at least 10 AI hires in any given month. China and India were included in this sample because of their increasing importance in the global economy, but LinkedIn coverage in these countries does not reach 40% of the workforce. Insights for these countries may not provide as full a picture as other countries, and should be interpreted accordingly.

Skills (and AI Skills)
LinkedIn members self-report their skills on their LinkedIn profiles. Currently, more than 38,000 distinct, standardized skills are identified by LinkedIn. These have been coded and classified by taxonomists at LinkedIn into 249 skill groupings, which are the skill groups represented in the dataset. The top skills that make up the AI skill grouping are machine learning, natural language processing, data structures, artificial intelligence, computer vision, image processing, deep learning, TensorFlow, Pandas (software), and OpenCV, among others.

Skill groupings are derived by expert taxonomists through a similarity-index methodology that measures skill composition at the industry level. LinkedIn’s industry taxonomy and their corresponding NAICS codes can be found here.

Skills Genome
For any entity (occupation or job, country, sector, etc.), the skill genome is an ordered list (a vector) of the 50 “most characteristic skills” of that entity. These most characteristic skills are identified using a TF-IDF algorithm to identify the most representative skills of the target entity, while down-ranking ubiquitous skills that add little information about that specific entity (e.g., Microsoft Word).

TF-IDF is a statistical measure that evaluates how representative a word (in this case a skill) is to a selected entity. This is done by multiplying two metrics:
1. The term frequency of a skill in an entity (TF).
2. The logarithmic inverse entity frequency of the skill across a set of entities (IDF). This indicates how common or rare a word is in the entire entity set. The closer IDF is to 0, the more common the word.

So if the skill is very common across LinkedIn entities, and appears in many job or member descriptions, the IDF will approach 0. If, on the other hand, the skill is unique to specific entities, the IDF will approach 1. More details are available at LinkedIn’s Skills Genome and LinkedIn-World Bank Methodology.
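As a rough illustration of the genome computation described in the Skills Genome methodology, the sketch below scores each of an entity's skills by TF (the skill's share of the entity's skill mentions) times a plain log-IDF (so it is unnormalized, unlike LinkedIn's IDF, which is bounded near 1), then keeps the top-ranked skills. The entity names and skill lists are invented.

```python
import math
from collections import Counter

def skill_genome(entity_skills, entity, top_n=50):
    """Rank an entity's skills by TF-IDF; ubiquitous skills score near 0."""
    n_entities = len(entity_skills)
    # Number of entities listing each skill (document frequency).
    doc_freq = Counter()
    for skills in entity_skills.values():
        doc_freq.update(set(skills))
    tf = Counter(entity_skills[entity])
    total = sum(tf.values())
    scores = {
        skill: (count / total) * math.log(n_entities / doc_freq[skill])
        for skill, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Invented profiles: "microsoft word" appears for every entity, so its IDF is
# log(1) = 0 and it is down-ranked, exactly as the methodology intends.
profiles = {
    "data scientist": ["machine learning", "python", "microsoft word", "machine learning"],
    "accountant": ["microsoft word", "excel"],
    "civil engineer": ["microsoft word", "python"],
}
print(skill_genome(profiles, "data scientist", top_n=2))
```

With these inputs, "machine learning" ranks first (high TF, high IDF) and the ubiquitous "microsoft word" drops out of the top of the genome.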
AI Skills Penetration
The aim of this indicator is to measure the intensity of AI skills in an entity (a particular country, industry, gender, etc.) through the following methodology:
• Compute frequencies for all self-added skills by LinkedIn members in a given entity (occupation, industry, etc.) in 2015–2021.
• Re-weight skill frequencies using a TF-IDF model to get the top 50 most representative skills in that entity. These 50 skills compose the “skill genome” of that entity.
• Compute the share of skills that belong to the AI skill group out of the top skills in the selected entity.
Interpretation: The AI skill penetration rate signals the prevalence of AI skills across occupations, or the intensity with which LinkedIn members utilize AI skills in their jobs. For example, the top 50 skills for the occupation of engineer are calculated based on the weighted frequency with which they appear in LinkedIn members’ profiles. If four of the skills that engineers possess belong to the AI skill group, this measure indicates that the penetration of AI skills is estimated to be 8% among engineers (i.e., 4/50).

Jobs or Occupations
LinkedIn member titles are standardized and grouped into approximately 15,000 occupations. These are not sector- or country-specific. These occupations are further standardized into approximately 3,600 occupation representatives. Occupation representatives group occupations with a common role and specialty, regardless of seniority.

AI Jobs and Occupations
An “AI” job (technically, occupation representative) is an occupation representative that requires AI skills to perform the job. Skills penetration is used as a signal for whether AI skills are prevalent in an occupation representative in any sector where the occupation representative may exist. Examples of such occupations include (but are not limited to): machine learning engineer, artificial intelligence specialist, data scientist, computer vision engineer, etc.

AI Talent
A LinkedIn member is considered AI talent if they have explicitly added AI skills to their profile and/or they are occupied in an AI occupation representative. The counts of AI talent are used to calculate talent concentration metrics. For example, to calculate the country-level AI talent concentration, we use the counts of AI talent at the country level vis-a-vis the counts of LinkedIn members in the respective countries.

Relative AI Skills Penetration
To allow for skills penetration comparisons across countries, the skills genomes are calculated and a relevant benchmark is selected (e.g., global average). A ratio is then constructed between a country’s and the benchmark’s AI skills penetrations, controlling for occupations.
Interpretation: A country’s relative AI skills penetration of 1.5 indicates that AI skills are 1.5 times as frequent as in the benchmark, for an overlapping set of occupations.

Global Comparison
For cross-country comparison, we present the relative penetration rate of AI skills, measured as the sum of the penetration of each AI skill across occupations in a given country, divided by the average global penetration of AI skills across the overlapping occupations in a sample of countries.
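The bullet methodology above reduces to a small computation: take an entity's top-50 skill genome, count the share that falls in the AI skill group, and divide by a benchmark for the relative rate. A minimal sketch, with an invented AI skill group and genome:

```python
def skill_penetration(genome, ai_skill_group):
    """Share of a skill genome (top-50 skills) belonging to the AI skill group."""
    return sum(1 for skill in genome if skill in ai_skill_group) / len(genome)

def relative_penetration(country_rate, benchmark_rate):
    """Ratio of a country's AI skill penetration to a benchmark (e.g., global average)."""
    return country_rate / benchmark_rate

ai_group = {"machine learning", "computer vision",
            "natural language processing", "deep learning"}
# A 50-skill genome in which 4 skills are AI skills, mirroring the
# engineer example above: 4/50 = 8% penetration.
genome = ["machine learning", "computer vision",
          "natural language processing", "deep learning"] \
         + [f"skill_{i}" for i in range(46)]
print(skill_penetration(genome, ai_group))        # 0.08
print(relative_penetration(0.08, 0.04))           # 2.0: twice the benchmark
```

The relative rate is what makes cross-country comparison meaningful, since raw penetration depends on each country's occupation mix.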
Interpretation: A relative penetration rate of 2 means that the average penetration of AI skills in that country is two times the global average across the same set of occupations.

Global Comparison: By Industry
The relative AI skills penetration by country for industry provides an in-depth sectoral decomposition of AI skill penetration across industries and sample countries.
Interpretation: A country’s relative AI skill penetration rate of 2 in the education sector means that the average penetration of AI skills in that country is two times the global average across the same set of occupations in that sector.

Global Comparison: By Gender
The “Relative AI Skills Penetration by Gender” metric provides a cross-country comparison of AI skill penetrations within each gender, comparing countries’ male or female AI skill penetrations to the global average of the same gender. Since the global averages are distinct for each gender, this metric should only be used to compare country rankings within each gender, and not for cross-gender comparisons within countries.
Interpretation: A country’s AI skills penetration for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the average female member in all countries pooled together across the same set of occupations that exist in the country/gender combination.

Global Comparison: Across Gender
The “Relative AI Skills Penetration Across Genders” metric allows for cross-gender comparisons within and across countries globally, since we compare the countries’ male and female AI skill penetrations to the same global average regardless of gender.
Interpretation: A country’s “Relative AI Skills Penetration Across Genders” for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the average member in all countries pooled together across the same set of occupations that exist in the country.

Relative AI Hiring Index
LinkedIn Hiring Rate or Overall Hiring Rate is a measure of hires normalized by LinkedIn membership. It is computed as the percentage of LinkedIn members who added a new employer in the same period the job began, divided by the total number of LinkedIn members in the corresponding location.
AI Hiring Rate is computed following the overall hiring rate methodology, but only considering members classified as AI talent.
Relative AI Hiring Index is the pace of change in AI Hiring Rate normalized by the pace of change in Overall Hiring Rate, providing a picture of whether hiring of AI talent is growing at a higher, equal, or lower rate than overall hiring in a market. The Relative AI Hiring Index is equal to 1.0 when AI hiring and overall hiring are growing at the same rate year on year.
Interpretation: Relative AI Hiring Index shows how fast each country is experiencing growth in AI talent hiring relative to growth in overall hiring in the country. A ratio of 1.2 means the growth in AI talent hiring has outpaced the growth in overall hiring by 20%.
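The Relative AI Hiring Index definitions chain into one ratio of year-on-year growth rates. The sketch below computes it from invented hiring rates; the function names and numbers are illustrative only.

```python
def yoy_growth(rate_now, rate_year_ago):
    """Pace of change: this year's hiring rate over last year's."""
    return rate_now / rate_year_ago

def relative_ai_hiring_index(ai_now, ai_prev, overall_now, overall_prev):
    """AI hiring growth normalized by overall hiring growth (1.0 = same pace)."""
    return yoy_growth(ai_now, ai_prev) / yoy_growth(overall_now, overall_prev)

# Invented example: the AI hiring rate grew from 0.010 to 0.018 (1.8x)
# while overall hiring grew from 0.050 to 0.075 (1.5x), so AI hiring
# outpaced overall hiring by 20% (index = 1.8 / 1.5 = 1.2).
print(relative_ai_hiring_index(0.018, 0.010, 0.075, 0.050))
```

Because both numerator and denominator are growth rates, the index is insensitive to a country's absolute LinkedIn coverage, which is why it is used for cross-country comparison.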
Changelog From Methodology Included in Last Year’s AI Index
1. LinkedIn ramped a new version of its industry taxonomy (see details here).
a. This has resulted in changes to our top five key industries. We have made the full time series
available for each industry (as with prior years).
i. “Software & IT Services” industry evolved into a wider “Technology, Information and Media,” which
encompasses media and telecommunications as well as other sub-industries.
ii. Former “Hardware & Networking” industry does not exist in the new taxonomy, so we introduced
“Professional Services” industry as the fifth industry in scope which contains a high concentration of AI
talent.
iii. Remaining “Education,” “Manufacturing,” and “Financial Services” (formerly known as “Finance”)
also had updates in their coverage resulting from the inclusion of more granular sub-industries.
b. This also resulted in minor changes in magnitudes for some metrics since the distinct number of
industries, as well as the distinct number of AI occupations defined within each country-industry pair have
changed:
i. We define AI occupations (occupation representatives that require AI skills to perform the job) and
the respective definition of AI Talent at Country-Industry level. For example, data engineers working
in the technology, information, and media industry in Germany may be identified as holding an AI
occupation, whereas data engineers working in the construction industry in the United Arab Emirates
may not be identified as AI Talent. Following the introduction of a more granular and more accurate
industry taxonomy, our AI Talent identifications have improved, and the changes are reflected
across the entire time series for each relevant metric.
ii. The following metrics have been impacted by this change in industry taxonomy: AI Talent
Concentrations, and Relative AI Hiring Rates. No directional changes were observed, only minor
changes in magnitudes.
2. We introduced a methodology change into Relative Skills Penetration metrics:
a. In the past, the data used to calculate these metrics were limited to the five industries with the
highest AI skill penetration globally: “Software & IT Services,” “Hardware & Networking,”
“Manufacturing,” “Education,” and “Finance.” This year we updated our coverage to all industries.
NetBase Quid
Prepared by Bill Valle and Nicole Seredenko

NetBase Quid delivers AI-powered consumer and market intelligence to enable business reinvention in a noisy and unpredictable world. The software applies artificial intelligence to reveal patterns in large, unstructured datasets and to generate visualizations that enable users to make smart, data-driven decisions accurately, quickly, and efficiently. NetBase Quid uses Boolean query to search for focus areas, topics, and keywords within social media, news, forums and blogs, companies, and patents data sources, as well as other custom datasets. NetBase Quid then visualizes these data points based on their semantic similarity.

Search, Data Sources, and Scope
Over 8 million global public and private company profiles from multiple data sources are indexed in order to search across company descriptions, while filtering and including metadata ranging from investment information to firmographic information, such as founded year, HQ location, and more. Company information is updated on a weekly basis. The NetBase Quid algorithm reads a large amount of text data from each document to make links between different documents based on their similar language. This process is repeated at an immense scale, which produces a network with different clusters identifying distinct topics or focus areas. Trends are identified based on keywords, phrases, people, companies, and institutions that NetBase Quid identifies, and the other metadata that is put into the software.

Data
Companies
Organization data is embedded from Capital IQ and Crunchbase. These companies include all types of companies (private, public, operating, operating as a subsidiary, out of business) throughout the world. The investment data includes private investments, M&A, public offerings, minority stakes made by PE/VCs, corporate venture arms, governments, and institutions both within and outside the United States. Some data is simply unreachable—for instance, when investors’ names or funding amounts are undisclosed.

NetBase Quid embeds Capital IQ data as a default and adds in data from Crunchbase for the data points that are not captured in Capital IQ. This not only yields comprehensive and accurate data on all global organizations, but it also captures early-stage startups and funding events data. Company information is updated on a weekly basis.

Earnings Calls
NetBase Quid leverages earnings call transcript data embedded from Seeking Alpha. For this report, NetBase Quid has analyzed mentions of AI-related keywords across all earnings call transcripts from Fortune 500 companies from January 2018 through December 2022. New earnings call transcript data is updated in NetBase Quid on the 1st and 15th of every month.

Search Parameters
Boolean query is used to search for focus areas, topics, and keywords within the archived company database, within their business descriptions and websites. We can filter the search results by HQ regions, investment amount, operating status, organization type (private/public), and founding year. NetBase Quid then visualizes these companies by semantic similarity. If there are more than 7,000 companies in the search result, NetBase Quid selects the 7,000 most relevant companies for visualization based on the language algorithm.
Boolean Search: “artificial intelligence” or “AI” or “machine learning” or “deep learning”

Companies:
• Global AI and ML companies that have received investments (private, IPO, M&A) from January 1, 2013, to December 31, 2022.
• Global AI and ML companies that have received over $1.5M over the last 10 years (January 1, 2013, to December 31, 2022): 7,000 out of 7,500 companies have been selected through NetBase Quid’s relevance algorithm.

Target Event Definitions
• Private investments: A private placement is a private sale of newly issued securities (equity or debt) by a company to a selected investor or a selected group of investors. The stakes that buyers take in private placements are often minority stakes (under 50%), although it is possible to take control of a company through a private placement as well, in which case the private placement would be a majority stake investment.
• Minority investment: These refer to minority stake acquisitions in NetBase Quid, which take place when the buyer acquires less than 50% of the existing ownership stake in entities, asset products, and business divisions.
• M&A: This refers to a buyer acquiring more than 50% of the existing ownership stake in entities, asset products, and business divisions.

McKinsey & Company
Data used in the Corporate Activity-Industry Adoption section was sourced from the McKinsey Global Survey “The State of AI in 2022—and a Half Decade in Review.”

The online survey was in the field from May 3, 2022, to May 27, 2022, and from August 15, 2022, to August 17, 2022, and garnered responses from 1,492 participants representing a full range of regions, industries, company sizes, functional specialties, and tenures. Of those respondents, 744 said their organization had adopted AI in at least one function and were asked questions about their organization’s AI use. To adjust for differences in response rates, the data is weighted by the contribution of each respondent’s nation to global GDP.

The AI Index also considered data from previous iterations of the survey. More specifically, the AI Index made use of data from:
The State of AI in 2021
The State of AI in 2020
Global AI Survey: AI Proves Its Worth, But Few Scale Impact (2019)
AI Adoption Advances, But Foundational Barriers Remain (2018)
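The McKinsey response-rate adjustment weights each respondent by their nation's contribution to global GDP. A minimal sketch of that kind of weighting, with invented GDP shares and responses (not McKinsey's actual procedure or data):

```python
def gdp_weighted_share(responses, gdp_share):
    """Share of respondents answering 'adopted', weighting each respondent
    by their country's (invented) share of global GDP."""
    total_weight = sum(gdp_share[country] for country, _ in responses)
    adopted_weight = sum(gdp_share[country]
                         for country, adopted in responses if adopted)
    return adopted_weight / total_weight

gdp_share = {"US": 0.25, "DE": 0.05}
responses = [("US", True), ("US", False), ("DE", True)]
# Weighted share = (0.25 + 0.05) / (0.25 + 0.25 + 0.05), roughly 0.545,
# versus an unweighted share of 2/3.
print(gdp_weighted_share(responses, gdp_share))
```

Weighting by GDP share pulls the estimate toward countries that are economically larger but underrepresented among respondents.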
GitHub
Data on the effects of GitHub’s Copilot on developer productivity and happiness was sourced from the GitHub Copilot Survey conducted in 2022. The survey was emailed to 17,420 users who had opted in to receive communications and were using GitHub Copilot for their daily programming activities. Between February 10, 2022, and March 6, 2022, the authors received 2,047 responses that could be matched with usage measurements during the four-week period leading up to March 12, 2022. The survey contained multiple-choice questions on demographic information and Likert-type questions on different aspects of productivity, which were randomized in the order of appearance to the user. More details can be found in Ziegler et al., 2022.

Deloitte
To obtain a global view of how AI is transforming organizations, Deloitte surveyed 2,620 global business leaders between April 2022 and May 2022. Thirteen countries were represented: Australia (100 respondents), Brazil (115 respondents), Canada (175 respondents), China (200 respondents), France (130 respondents), Germany (150 respondents), India (200 respondents), Israel (75 respondents), Japan (100 respondents), Singapore (100 respondents), South Africa (75 respondents), the United Kingdom (200 respondents), and the United States (1,000 respondents). All participating companies have adopted AI technologies and are AI users. Respondents were required to meet one of the following criteria: responsible for AI technology spending or approval of AI investments, developing AI technology strategies, managing or overseeing AI technology implementation, serving as an AI technology subject matter specialist, or making or influencing decisions around AI technology. To complement the blind survey, Deloitte conducted qualitative telephone interviews with 15 AI specialists from various industries. More details are available on Deloitte’s website.

Data used in the Corporate Activity-Industry Motivation section was sourced from Deloitte’s “State of AI in the Enterprise” surveys. More specifically, the AI Index made use of the following sources of information:
Deloitte’s State of AI in the Enterprise, 5th Edition Report (2022)
State of AI in the Enterprise, 4th Edition (2021)
Deloitte’s State of AI in the Enterprise, 3rd Edition (2020)
State of AI in the Enterprise, 2nd Edition (2018)
The 2017 Deloitte State of Cognitive Survey (2017)

International Federation of Robotics (IFR)
Data presented in the Robot Installations section was sourced from the “World Robotics 2022” report.
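The GitHub Copilot survey described above randomized the order in which Likert-type items appeared to each respondent. A minimal sketch of that kind of per-respondent randomization follows; the item wordings and the seeding scheme are invented for illustration, not taken from Ziegler et al.

```python
# Sketch of per-respondent question-order randomization, as described for
# the Copilot survey's Likert-type items. Item texts are hypothetical.
import random

LIKERT_ITEMS = [
    "Copilot helps me stay in the flow",
    "Copilot makes me more productive",
    "Copilot's suggestions are useful",
]

def randomized_order(items, seed):
    """Return a shuffled copy of the items; seeded here for reproducibility."""
    order = list(items)          # copy so the master list is untouched
    random.Random(seed).shuffle(order)
    return order

# Each respondent (identified here by seed) sees their own ordering.
print(randomized_order(LIKERT_ITEMS, seed=42))
```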
Chapter 5: Education
Computing Research Association (CRA Taulbee Survey)
Note: This year’s AI Index reused the methodological notes that were submitted by the CRA for previous editions of the AI Index. For more complete delineations of the methodology used by the CRA, please consult the individual CRA surveys that are linked below.

Computing Research Association (CRA) members are 200-plus North American organizations active in computing research: academic departments of computer science and computer engineering; laboratories and centers in industry, government, and academia; and affiliated professional societies (AAAI, ACM, CACS/AIC, IEEE Computer Society, SIAM, USENIX). CRA’s mission is to enhance innovation by joining with industry, government, and academia to strengthen research and advanced education in computing. Learn more about CRA here.

The CRA Taulbee Survey is sent only to doctoral departments of computer science, computer engineering, and information science/systems. Historically, (a) Taulbee covers one-quarter to one-third of total BS CS recipients in the United States; (b) the percent of women earning bachelor’s degrees is lower in the Taulbee schools than overall; and (c) Taulbee tracks the trends in overall CS production.

The CRA Taulbee Survey gathers survey data during the fall of each academic year by reaching out to over 200 PhD-granting departments. Details about the Taulbee Survey can be found here. Taulbee doesn’t directly survey the students. The department identifies each new PhD’s area of specialization as well as their type of employment. Data is collected from September to January of each academic year for PhDs awarded in the previous academic year. Results are published in May after data collection closes.

The AI Index used data from the following iterations of the CRA survey:
CRA, 2021
CRA, 2020
CRA, 2019
CRA, 2018
CRA, 2017
CRA, 2016
CRA, 2015
CRA, 2014
CRA, 2013
CRA, 2012
CRA, 2011
Code.org
State Level Data
The following link includes a full description of the
methodology used by Code.org to collect its data. The
staff at Code.org also maintains a database of the state
of American K–12 education and, in this policy primer,
provides a greater amount of detail on the state of
American K–12 education in each state.
AP Computer Science Data
The AP computer science data is provided to Code.org
as per an agreement the College Board maintains with
Code.org. The AP Computer Science data comes from
the College Board’s national and state summary reports.
The State of International
K–12 Education
Data on the state of international K–12 AI education was
taken from the following UNESCO report, published in
2021. The methodology is outlined in greater detail on
pages 18 to 20 in the report and, for the sake of brevity, is
not completely reproduced in the 2023 AI Index.
Chapter 6: Policy and Governance
Global Legislation Records on AI
For AI-related bills passed into law, the AI Index performed searches of the keyword “artificial intelligence”
on the websites of 127 countries’ congresses or parliaments (in the respective languages) in the full text of bills.
Note that only laws passed by state-level legislative bodies and signed into law (i.e., by presidents or through
royal assent) from 2016 to 2022 are included. Laws that were approved but then repealed are not included in the
analysis. In some cases, there were databases that were only searchable by title, so site search functions were
deployed. Future AI Index reports hope to include analysis on other types of legal documents, such as regulations
and standards, adopted by state- or supranational-level legislative bodies, government agencies, etc. The AI Index
team surveyed the following databases:
Algeria
Andorra
Antigua and Barbuda
Argentina
Armenia
Australia
Austria
Azerbaijan
The Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Bermuda
Bhutan
Bolivia
Brazil
Brunei
Bulgaria
Burkina Faso
Cameroon
Canada
Cayman Islands
Chile
China
Colombia
Croatia
Cuba
Curacao
Cyprus
Czech Republic
Denmark
Estonia
Faroe Islands
Fiji
Finland
France
The Gambia
Georgia
Germany
Gibraltar
Greece
Greenland
Grenada
Guam
Guatemala
Guyana
Hong Kong
Hungary
Iceland
India
Iran, Islamic Republic
Iraq
Ireland
Isle of Man
Israel
Italy
Jamaica
Japan
Kazakhstan
Kenya
Kiribati
Korea, Republic
Kosovo
Kyrgyz Republic
Latvia
Lebanon
Liechtenstein
Lithuania
Luxembourg
Macao SAR, China
Malawi
Malaysia
Malta
Mauritius
Mexico
Monaco
Montenegro
Morocco
Mozambique
Nauru
The Netherlands
New Zealand
Nicaragua
Niger
Northern Mariana Islands
Norway
Panama
Papua New Guinea
Philippines
Poland
Portugal
Romania
Russia
Samoa
Saudi Arabia
Serbia
Seychelles
Sierra Leone
Singapore
Slovak Republic
Slovenia
South Africa
Spain
Sri Lanka
St. Kitts and Nevis
Suriname
Sweden
Switzerland
Tajikistan
Tanzania
Togo
Tonga
Turkey
Tuvalu
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States
Uruguay
Vietnam
Yemen
Zambia
Zimbabwe
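The counting rule described above (keyword present in the full text of a bill that was signed into law between 2016 and 2022) can be sketched as a simple filter. This is an illustrative sketch of the rule only, not the AI Index's actual pipeline; the bill record schema (`text`, `year`, `status`) is invented.

```python
# Sketch of the bill-counting rule: count a bill only if it was signed into
# law between 2016 and 2022 and its full text contains the keyword.
KEYWORD = "artificial intelligence"

def count_ai_laws(bills):
    """bills: iterable of dicts with hypothetical 'text', 'year', 'status' keys."""
    return sum(
        1
        for b in bills
        if b["status"] == "signed"        # passed and signed into law (not repealed)
        and 2016 <= b["year"] <= 2022     # within the study window
        and KEYWORD in b["text"].lower()  # keyword in full text
    )

bills = [
    {"text": "An act on Artificial Intelligence systems", "year": 2021, "status": "signed"},
    {"text": "An act on data protection", "year": 2020, "status": "signed"},
    {"text": "Artificial intelligence strategy bill", "year": 2015, "status": "signed"},
]
print(count_ai_laws(bills))  # 1 — only the first bill meets all three conditions
```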
United States State-Level AI Legislation
For AI-related bills passed into law, the AI Index performed searches of the keyword “artificial intelligence” on
the legislative websites of all 50 U.S. states in the full text of bills. Bills are only counted as passed into law if the
final version of the bill includes the keyword, not just the introduced version. Note that only laws passed from
2015 to 2022 are included. The count for proposed laws includes both laws that were proposed and eventually
passed as well as laws that were proposed that have not yet been passed, or are now inactive. In some cases,
databases were only searchable by title, so site search functions were deployed. The AI Index team surveyed
the following databases:
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
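The state-level rule described above differs from the global one in one detail: a bill counts as passed only when the keyword appears in the *final* version of the bill, not merely the introduced version. A minimal sketch of that check, with an invented function signature:

```python
# Sketch of the state-level counting rule: enacted bills count only if the
# keyword survives into the final version of the bill text.
KEYWORD = "artificial intelligence"

def counts_as_passed(introduced_text: str, final_text: str, enacted: bool) -> bool:
    """The introduced text is irrelevant to the count; only the final text matters."""
    return enacted and KEYWORD in final_text.lower()

# Keyword dropped by amendment: not counted even though the bill was enacted.
print(counts_as_passed("regulate artificial intelligence", "regulate software", True))  # False
# Keyword survives to the final version: counted.
print(counts_as_passed("a study bill", "study artificial intelligence use", True))  # True
```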
Global AI Mentions
For mentions of AI in AI-related legislative proceedings around the world, the AI Index performed searches of
the keyword “artificial intelligence” on the websites of 81 countries’ congresses or parliaments (in the respective
languages), usually under sections named “minutes,” “hansard,” etc. In some cases, databases were only
searchable by title, so site search functions were deployed. The AI Index team surveyed the following databases:
Andorra
Angola
Armenia
Australia
Azerbaijan
Barbados
Belgium
Bermuda
Bhutan
Brazil
Cabo Verde
Canada
Cayman Islands
China¹¹
Czech Republic
Denmark
Dominican Republic
Ecuador
El Salvador
Estonia
Fiji
Finland
France
The Gambia
Germany
Gibraltar
Greece
Hong Kong
Iceland
India
Ireland
Isle of Man
Israel
Italy
Japan
Kenya
Kosovo
Latvia
Lesotho
Liechtenstein
Luxembourg
Macao SAR, China
Madagascar
Malaysia
Maldives
Malta
Mauritius
Mexico
Moldova
Netherlands
New Zealand
Northern Mariana Islands
Norway
Pakistan
Panama
Papua New Guinea
Philippines
Poland
Portugal
Romania
Russia
Samoa
San Marino
Seychelles
Sierra Leone
Singapore
Slovenia
South Africa
South Korea
Spain
Sri Lanka
Sweden
Switzerland
Tanzania
Trinidad and Tobago
Ukraine
United Kingdom
United States
Uruguay
Zambia
Zimbabwe
11 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial
intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier.
United States Committee Mentions
In order to research trends on the United States’ committee mentions of AI, the following search was conducted:
Website: Congress.gov
Keyword: artificial intelligence
Filters: Committee Reports

United States AI Policy Papers
Organizations
To develop a more nuanced understanding of the thought leadership that motivates AI policy, we tracked policy papers published by 55 organizations in the United States or with a strong presence in the United States (expanded from last year’s list of 36 organizations) across the following broad categories:
• Civil Society, Associations, and Consortiums: Algorithmic Justice League, Alliance for Artificial Intelligence in Healthcare, Amnesty International, EFF, Future of Privacy Forum, Human Rights Watch, IJIS Institute, Institute of Electrical and Electronics Engineers, Partnership on AI
• Consultancy: Accenture, Bain & Company, Boston Consulting Group, Deloitte, McKinsey & Company
• Government Agencies: Congressional Research Service, Defense Technical Information Center, Government Accountability Office, Library of Congress, Pentagon Library
• Private Sector Companies: Google AI, Microsoft AI, Nvidia, OpenAI
• Think Tanks and Policy Institutes: American Enterprise Institute, Aspen Institute, Atlantic Council, Brookings Institute, Carnegie Endowment for International Peace, Cato Institute, Center for a New American Security, Center for Strategic and International Studies, Council on Foreign Relations, Heritage Foundation, Hudson Institute, MacroPolo, National Security Institute, New America Foundation, RAND Corporation, Rockefeller Foundation, Stimson Center, Urban Institute, Wilson Center
• University Institutes and Research Programs: AI and Humanity, Cornell University; AI Now Institute, New York University; AI Pulse, UCLA Law; Belfer Center for Science and International Affairs, Harvard University; Berkman Klein Center, Harvard University; Center for Information Technology Policy, Princeton University; Center for Long-Term Cybersecurity, UC Berkeley; Center for Security and Emerging Technology, Georgetown University; CITRIS Policy Lab, UC Berkeley; Hoover Institution, Stanford University; Institute for Human-Centered Artificial Intelligence, Stanford University; Internet Policy Research Initiative, Massachusetts Institute of Technology; MIT Lincoln Laboratory; Princeton School of Public and International Affairs
Methodology
Each broad topic area is based on a collection of underlying keywords that describe the content of the specific paper. We included 17 topics that represented the majority of discourse related to AI between 2018 and 2021. These topic areas and the associated keywords are listed below:
• Health and Biological Sciences: medicine, healthcare systems, drug discovery, care, biomedical research, insurance, health behaviors, COVID-19, global health
• Physical Sciences: chemistry, physics, astronomy, earth science
• Energy and Environment: energy costs, climate change, energy markets, pollution, conservation, oil and gas, alternative energy
• International Affairs and International Security: international relations, international trade, developing countries, humanitarian assistance, warfare, regional security, national security, autonomous weapons
• Justice and Law Enforcement: civil justice, criminal justice, social justice, police, public safety, courts
• Communications and Media: social media, disinformation, media markets, deepfakes
• Government and Public Administration: federal government, state government, local government, public sector efficiency, public sector effectiveness, government services, government benefits, government programs, public works, public transportation
• Industry and Regulation: economy, antitrust, M&A, competition, finance, management, supply chain, telecom, economic regulation, technical standards, autonomous vehicle industry and regulation
• Innovation and Technology: advancements and improvements in AI technology, R&D, intellectual property, patents, entrepreneurship, innovation ecosystems, startups, computer science, engineering
• Education and Skills: early childhood, K–12, higher education, STEM, schools, classrooms, reskilling
• Workforce and Labor: labor supply and demand, talent, immigration, migration, personnel economics, future of work
• Social and Behavioral Sciences: sociology, linguistics, anthropology, ethnic studies, demography, geography, psychology, cognitive science
• Humanities: arts, music, literature, language, performance, theater, classics, history, philosophy, religion, cultural studies
• Equity and Inclusion: biases, discrimination, gender, race, socioeconomic inequality, disabilities, vulnerable populations
• Privacy, Safety, and Security: anonymity, GDPR, consumer protection, physical safety, human control, cybersecurity, encryption, hacking
• Ethics: transparency, accountability, human values, human rights, sustainability, explainability, interpretability, decision-making norms
• Democracy: elections, rights, freedoms, liberties, personal freedoms
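A keyword-to-topic scheme like the one above can be sketched as a simple lookup: a paper is tagged with every topic whose keyword list it matches. This is a minimal illustration with abbreviated keyword lists; the substring-matching rule is an assumption, not the AI Index's actual classifier.

```python
# Minimal sketch of keyword-based topic tagging. The keyword lists here are
# abbreviated samples of the topic areas described above.
TOPIC_KEYWORDS = {
    "Health and Biological Sciences": ["medicine", "drug discovery", "global health"],
    "Privacy, Safety, and Security": ["gdpr", "cybersecurity", "encryption"],
    "Democracy": ["elections", "liberties"],
}

def tag_topics(paper_text: str) -> list[str]:
    """Return every topic whose keyword list matches the paper text."""
    text = paper_text.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]

print(tag_topics("A survey of GDPR compliance and encryption in medicine"))
# ['Health and Biological Sciences', 'Privacy, Safety, and Security']
```

Note that a paper can receive multiple topic tags, which matches how a single policy paper often spans several of the 17 areas.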
National AI Strategies
The AI Index did a web search to identify national strategies on AI. Below is a list of countries that were identified as having a national AI strategy, including a link to said strategy. For certain countries, noted with an asterisk (*), the actual strategy was not found, and a news article confirming the launch of the strategy was linked instead.
Countries with AI Strategies in Place
Algeria*
Argentina
Australia
Austria
Bangladesh
Botswana*
Brazil
Bulgaria
Canada
Chile
China
Colombia
Croatia
Cyprus
Czech Republic
Denmark
Egypt, Arab Republic
Estonia
Finland
France
Germany
Greece
Hungary
India
Indonesia
Ireland
Italy
Japan
Kenya
Korea, Republic
Latvia
Lithuania
Luxembourg
Malta
Mauritius
Mexico
The Netherlands
Norway
Peru
Philippines
Poland
Portugal
Qatar
Romania
Russia
Saudi Arabia
Serbia
Sierra Leone
Singapore
Slovenia
Spain
Sweden
Switzerland
Thailand
Tunisia*
Turkey
Ukraine
United Arab Emirates
United Kingdom
United States
Uruguay
Vietnam
Countries with AI Strategies in Development
Armenia
Azerbaijan
Bahrain
Belgium
Benin
Cuba
Iceland
Israel
Jordan
Morocco
New Zealand
Nigeria
Oman
Uzbekistan

Federal Budget for Nondefense AI R&D
Data on the federal U.S. budget for nondefense AI R&D was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the following National Science and Technology Council reports:
Supplement to the President’s FY 2023 Budget
Supplement to the President’s FY 2022 Budget

U.S. Department of Defense Budget Requests
Data on the DoD nonclassified AI-related budget requests was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the following reports:
Defense Budget Overview, United States Department of Defense Fiscal Year 2023 Budget Request
Defense Budget Overview, United States Department of Defense Fiscal Year 2022 Budget Request
Govini
Govini is the leading commercial data company in the defense technology space. Built by Govini, Ark.ai is used at scale across the national security sector of the U.S. federal government. This platform enables government analysts, program managers, and decision-makers to gain unprecedented visibility into the companies, capabilities, and capital in national security to solve challenges pertaining to acquisition, foreign influence and adversarial capital, nuclear modernization, procurement, science and technology, and supply chain.

Govini curated USG AI spend data from their annual Scorecard Taxonomy by applying supervised machine learning (ML) and natural language processing (NLP) to parse, analyze, and categorize large volumes of federal contracts data, including prime contracts, grants, and other transaction authority (OTA) awards. Govini’s most recent scorecard focused on critical technologies, of which AI/ML technologies was a segment consisting of six subsegments: data-at-scale, decision science, computer vision, machine learning, autonomy, and natural language processing. By initially generating search terms and then subsequently excluding specific terms that yield erroneous results, Govini delivers a comprehensive yet discriminant taxonomy of subsegments that are mutually exclusive. Repeated keyword searches and filters allow a consensus, data-driven taxonomy to come into focus. Govini SMEs conduct a final review of taxonomic structure to complement this iterative, data-driven process.

The use of AI and supervised ML models enables the analysis of large volumes of irregular data contained in federal contracts, data that is often inaccessible through regular government reporting processes or human-intensive analytical approaches. Moreover, beyond simply making usable an expansive body of data sources, Govini’s SaaS Platform and National Security Knowledge Graph establish high-fidelity standards in categorized and fused data to produce a comprehensive and accurate depiction of federal spending, and the supporting vendor ecosystem, over time.

U.S. AI-Related Legal Cases
To identify AI-related legal cases, the AI Index research team did a keyword search on the LexisNexis database, under their U.S. legal cases filter. The keywords that were searched include “artificial intelligence,” “machine learning,” and “automated decision-making.” Cases that contained one of these keywords were coded according to a variety of variables of interest.
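The keyword-screening step used for the LexisNexis legal-case search can be sketched as a simple matcher: a case is in scope if its text contains any of the three keywords, and the matched keywords can then feed into downstream coding. The function and case text below are illustrative assumptions, not the AI Index's actual tooling.

```python
# Sketch of the legal-case screening step: flag cases whose text contains
# any of the three search keywords described above.
KEYWORDS = ["artificial intelligence", "machine learning", "automated decision-making"]

def screen_case(case_text: str) -> list[str]:
    """Return the matched keywords; an empty list means the case is out of scope."""
    text = case_text.lower()
    return [kw for kw in KEYWORDS if kw in text]

matched = screen_case("Plaintiff alleges the machine learning model discriminated")
print(matched)  # ['machine learning']
```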
Chapter 7: Diversity
Computing Research
Association (CRA Taulbee
Survey)
To learn more about the diversity data from the CRA,
please read the methodological note on the CRA’s data
included in the Chapter 5 subsection of the Appendix.
Code.org
To learn more about the diversity data from Code.org, please read the methodological note on Code.org’s data included in the Chapter 5 subsection of the Appendix.
Chapter 8: Public Opinion
IPSOS
For brevity, the 2023 AI Index does not republish the methodology used by the IPSOS survey that features in the report. More details about the IPSOS survey’s methodology can be found in the actual survey.

Lloyd’s Register Foundation and Gallup
For brevity, the 2023 AI Index does not republish the methodology used by the Lloyd’s Register Foundation and Gallup survey that features in the report. More details about the Lloyd’s Register Foundation and Gallup survey methodology can be found in the actual survey.

Pew Research
For brevity, the 2023 AI Index does not republish the methodology used by the Pew Research survey that features in the report. More details on the Pew Research survey methodology can be found in the actual survey.

NetBase Quid Social Media Data
NetBase Quid collects social media data from over 500 million sources in real time and analyzes this data through AI-powered Natural Language Processing. This process parses out language and breaks out posts by filters such as drivers of positive and negative sentiment, emotions, and behaviors, allowing for deeper insights to be reached. To understand public perception of advancements in artificial intelligence, NetBase Quid analyzed social media conversation around AI and AI model releases from January 2022 to December 2022. First, the NetBase Quid team analyzed conversation around AI to understand key drivers of general sentiment around AI advancements, such as ethical, cultural, and economic concerns and perceptions among consumers. Then, the NetBase Quid team leveraged the platform for a more targeted analysis of the same conversation, understanding volume and sentiment around the major AI model updates and releases in 2022. This NetBase Quid analysis ultimately showcases the relationship between public perception and the advancement of AI, leveraging targeted analytics tools to understand both specific reactions to model releases as well as a wider consumer conversation and what drives it.
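Breaking out posts by sentiment driver, as described for the NetBase Quid analysis, can be illustrated with a toy lexicon-based counter. The driver lexicon and the example posts below are invented for illustration; NetBase Quid's actual NLP pipeline is proprietary and far more sophisticated.

```python
# Toy sketch of breaking out posts by sentiment driver (ethical, economic,
# cultural), in the spirit of the analysis described above.
from collections import Counter

DRIVER_LEXICON = {
    "ethical": ["bias", "ethics"],
    "economic": ["jobs", "economy"],
    "cultural": ["art", "creativity"],
}

def driver_counts(posts):
    """Count how many posts touch each driver (a post may match several)."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for driver, terms in DRIVER_LEXICON.items():
            if any(t in text for t in terms):
                counts[driver] += 1
    return counts

posts = ["AI bias worries me", "Will AI take our jobs?", "AI art is amazing"]
print(driver_counts(posts))  # Counter({'ethical': 1, 'economic': 1, 'cultural': 1})
```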