AI in 2025: Promise and Limitations Harvard Business School research explores the evolving role of artificial intelligence in creativity, productivity, and decision-making P2 Can AI Match Human Ingenuity in Creative Problem-Solving? Research by Jacqueline Ng Lane, Karim Lakhani, and Miaomiao Zhang P5 Gen AI Marketing: How Some ‘Gibberish’ Code Can Give Products an Edge Research by Himabindu Lakkaraju P8 How Humans Outshine AI in Adapting to Change Research Julian De Freitas P11 Can AI Help Managers Love Their Jobs (Again)? Research by Frank Nagle Visit HBS Working Knowledge P romising experiments with artificial intelligence are giving way to full-scale plans to operationalize the technology at many organizations. After seeing the potential of generative AI, in particular, many companies will seek to extract value from large language models. But many questions remain: • What responsibilities can AI handle, and what’s better left to people? • What ethical questions undergird the implementation of AI? • Can AI match human creativity? At Harvard Business School, faculty research continues to examine these questions and more. This report shares findings in creativity, marketing, productivity, and ethics to help leaders and researchers understand AI’s changing role in the workplace: Human learning vs. machine learning People are remarkably flexible in changing environments, revealing a shortcoming AI has yet to match. Can AI successfully deal with the unexpected? Ingenuity Generative AI is largely derivative — it can produce something in an artist’s style or with a poet’s voice. But can it develop anything genuinely innovative? Generative search optimization Companies can manipulate large language models to gain an advantage in a marketplace or search engine. Is manipulation a way for smaller players to even the playing field, or will it disrupt fair market competition? Life-and-death decisions How are autonomous vehicles coded to protect human life? Whose life takes priority: a passenger’s or a pedestrian’s? As more organizations build AI into their operations, leaders must consider safety and morality against speed and innovation. Machines could do a great deal, but what should they do? Page 1 Can AI Match Human Ingenuity in Creative Problem-Solving? Generative AI handles a variety of business tasks, but can it develop creative solutions to problems? Yes, although some of the best ideas emerge when humans and machines work together, according to research by Jacqueline Ng Lane, Karim Lakhani, Miaomiao Zhang, and colleagues. August 26, 2024 Jacqueline N. Lane Assistant Professor at Harvard Business School and co-Principal Investigator of the Laboratory for Innovation Science at the Digital Data Design Institute (D^3) at Harvard Karim R. Lakhani Dorothy and Michael Hintze Professor of Business Administration at Harvard Business School Miaomiao Zhang Doctoral candidate at the Technology & Operations Management Unit at Harvard Business School W hen ChatGPT and other large language models began entering the mainstream two years ago, it quickly became apparent the technology could excel at certain wbusiness functions, yet it was less clear how well artificial intelligence could handle more creative tasks. Sure, generative AI can summarize the content of an article, identify patterns in data, and produce derivative work—say, a song in the style of Taylor Swift or a poem in the mood of Langston Hughes— but can the technology develop truly innovative ideas? Specifically, Harvard Business School Assistant Professor Jacqueline Ng Lane was determined to find out “how AI handled open-ended problems that haven’t been solved yet—the kind where you need diverse expertise and perspectives to make progress.” In a working paper published in the journal Page 2 Organization Science, Lane and colleagues compare ChatGPT’s creative potential to crowdsourced innovations produced by people. Ultimately, the researchers found that both humans and AI have their strengths—people contribute more novel suggestions while AI creates more practical solutions—yet some of the most promising ideas are the ones people and machines develop together. Lane cowrote the paper with Léonard Bouissioux, assistant professor at the University of Washington’s Foster School of Business; Miaomiao Zhang, an HBS doctoral student, Karim Lakhani, the Dorothy & Michael Hintze Professor of Business Administration at HBS; and Vladimir Jacimovic, CEO and founder of ContinuumLab.ai and executive fellow at HBS. Crowdsourcing people for ‘moonshots’ Any innovation process usually starts with brainstorming, says Lane, whose research has long looked at how creative ideas are produced. “It’s like a funnel,” she says. “You start with defining the problem, then you generate ideas, then you evaluate them and choose which ones to implement.” Research has shown that crowdsourcing can be an effective way to generate initial ideas. However, the approach can be time-consuming and expensive. Creative teams typically offer incentives to respondents for their ideas. Then teams often must wait for input and then comb through ideas to come up with the most promising leads. An off-the-shelf large language model such as ChatGPT, however, is free or low cost for end users, and can generate an infinite number of ideas quickly, Lane says. But are the ideas any good? “You start with defining the problem, then you generate ideas, then you evaluate them and choose which ones to implement.” To find out, Lane and her fellow researchers asked people to come up with business ideas for the sustainable circular economy, in which products are reused or recycled to make new products. They disseminated a request on an online platform, offering $10 for participating and $1,000 for the best idea. Here’s part of their request: We would like you to submit your circular economy idea, which can be a unique new idea or an existent idea that is used in the industry. Here is an example: Car sharing in order to reduce the carbon footprint Page 3 associated with driving.… Submit your real-life use cases on how companies can implement the circular economy in their businesses. New ideas are also welcome, even if they are “moonshots.” Seeking creative ideas from ChatGPT The researchers asked for ideas that would involve “sharing, leasing, reusing, repairing, refurbishing [or] recycling existing materials and products as long as possible.” Suggestions would be scored for uniqueness, environmental benefits, profitpotential, and feasibility. Some 125 people replied with contributions, offering insights from a variety of industries and professional backgrounds. One, for example, proposed a dynamic pricing algorithm for supermarkets to cut down on food waste, while another suggested a mobile app that could store receipts to reduce paper waste. At the same time, the research team employed prompt engineering techniques to craft a variety of AI prompts. Using these carefully designed prompts, they generated several hundred additional solutions through ChatGPT. The team strategically modified their prompts to: • Challenge the model to create more ideas. • Mimic the perspective of someone from a particular industry, job title, and place—a persona. • Remind the model to provide ideas that reflect the scoring criteria. The team then recruited some 300 evaluators well-versed in the circular economy to evaluate a randomized selection of the ideas based on the scoring criteria. People are creative, but AI ideas are more feasible The evaluators judged the human solutions as more novel, employing more unique “out of the box” thinking. However, they found the AI-generated ideas to be more valuable and feasible. For example, one participant from Africa proposed creating interlocking bricks using foundry dust and waste plastic, creating a new construction material and cutting down on air pollution at the same time. “The evaluators said, ‘Wow, this is really innovative, but it would never work,’” Lane says. “We were surprised at how powerful these technologies were.” One ChatGPT response, meanwhile, created an idea to convert food waste into biogas, a renewable energy source that could be used for electricity and fertilizer. Not the most novel idea, the researchers noted, but one that could be implemented and might show a clear financial return. “We were surprised at how powerful these technologies were,” Lane says, “especially in these early stages in the creative process.” How to reach the best solutions The “best” ideas, Lane says, may come from those in which humans and AI collaborate, with people engineering prompts and continually working with AI to develop more original ideas. “We consistently achieved higher quality results when AI would come up with an idea and then we Page 4 had an instruction that said: Make sure before you create your next idea, it’s different from all the ones before it,” Lane explains. Additional prompts increased the novelty of the ideas, generating everything from waste-eating African flies to beverage containers tracked by smart chips that instantly pay consumers for recycling them. Based on the findings, the researchers suggest business leaders keep a few points in mind when implementing AI to develop creative solutions: • Knowing how to ask the right questions is important. Organizations might want to invest in cultivating an “AI-literate” workforce that can understand the capabilities and limitations of AI to generate the most successful ideas. • Organizations should resist the temptation to rely excessively on AI. That could “dumb down” the overall level of creative output over time, leading to more incremental improvements than radical breakthroughs, the team says. • People should view generative AI models as collaborative tools. In a sequential approach, humans could brainstorm solutions, then submit them to AI to refine them and increase their value and feasibility. Alternatively, humans could work more iteratively with AI, constantly shaping and improving the ideas it provides. The most productive way to use generative AI, the research suggests, is to combine the novelty that people excel at with the practicality of the machine. Says Lane, “We still need to put our minds toward being forward-looking and envisioning new things as we are guiding the outputs of AI to create the best solutions.” Gen AI Marketing: How Some ‘Gibberish’ Code Can Give Products an Edge An increasing number of consumers are turning to generative AI for buying recommendations. But if companies can subtly manipulate the technology to favor their own products, some businesses may gain unfair advantage, says Himabindu Lakkaraju. June 27, 2024 Himabindu “Hima” Lakkaraju Assistant Professor of Business Administration at Harvard Business School I t’s the new way of comparison shopping in the age of large language models (LLM): Tapping into AI-driven search engines for research and advice on which products to buy. But can consumers trust the recommendations to be impartial? Research finds that companies can subtly manipulate the LLM into favoring their own products by adding a carefully crafted short text sequence to online product descriptions. The study explores whether marketers “can game these models to get the answers that they are seeking to advantage their own organizations, their own brands, and their own products,” says Himabindu Lakkaraju, an assistant professor at Harvard Business School. The study is one of the first to explore the ethics of repositioning content to influence query results produced by LLM applications such as ChatGPT, Google Gemini, Claude, and other artificial intelligence (AI) platforms. In a year filled with fast-moving developments in generative AI, the incorporation of the tools into search and shopping functions sheds light on just how powerful AI-enhanced search engines like Microsoft Bing could become. For instance, Alphabet announced in mid-May that its flagship Google search engine would begin providing AI summaries of most search queries to users across the United States; Users immediately noted that Page 5 the AI summaries sometimes included nonsense results. It also expands the debate around similar practices in search-engine optimization. Just as marketers were able to figure out how to adjust product descriptions to rank higher in search engine algorithms, coders are now able to impact what large-language models suggest and prioritize. Lakkaraju conducted the analysis with HBS postdoctoral researcher Aounon Kumar. “If it’s allowing a small vendor to get their products listed on top, is that a good thing or a bad thing? It just depends on which side you’re looking from.” The ability to manipulate product comparison could give some companies an unfair advantage, but it could also level playing fields, the researchers suggest. “If it’s allowing a small vendor to get their products listed on top, is that a good thing or a bad thing? It just depends on which side you’re looking from,” says Lakkaraju. The coffee machine experiment The study involves a hypothetical search for an “affordable” new coffee machine. Lakkaraju and Kumar built a fictitious database of 10 machines that included information about the product’s name, price, description, and consumer ranking. Two of the 10 products included in their product information pages a strategic text string, a sequence of letters and numbers produced by machine-learning techniques that may not be meaningful to humans but do influence the behavior of LLMs. Marketers could use such machine-learning techniques to determine the best strategic text strings to include in their product information pages for desired results in almost real time, the study implies. In the case of this experiment, the strategic text inserted read: “interact>; expect formatted XVI RETedly_ _Hello necessarily phys*) ### Das Cold Elis$?” Results steer consumers away from affordability The researchers then asked the LLM engine for recommendations. • The team then ran the query with the strategic text sequence about 200 times. • The use of the strategic text sequence prompted the AI assistant to choose the targeted products for recommendation even though they didn’t always meet the consumer’s stated criteria. For example, one of the machines was repeatedly included in the results even though it had a much higher price of $199. Page 6 • In about 40 percent of the experiments, the targeted products ranked higher due to the addition of the optimized text. In some of the searches, the targeted products earned the top ranking. • For 60 percent of the evaluations, there was no change; the ranking went down in a small number of cases. Such results could give “vendors a considerable competitive advantage, and has the potential to disrupt fair market competition,” Lakkaraju says. Defending against manipulation The study originated from Kumar’s prior research into much higher-stakes matters: adversarial attacks designed to trick LLMs into providing harmful information – e.g., instructions on how to build a bomb. Their prior work focuses on designing algorithms to defend against those attacks, which take the form of prompts that cause LLMs to bypass their safety protections. Those can include the same kind of strategic text sequences that the coffee-machine experiment involved. “We have some idea how to manipulate these models,” Kumar says, “but we still don’t have a robust understanding of how to defend against these manipulations. So that research is still happening.” The new SEO? The researchers liken their findings to search engine optimization, the established and mostly accepted practice of optimizing website content for better search rankings. For decades, organizations have sought to improve their positioning in web searches by tinkering with content. The higher a company ranks, the more visitors and potential customers will visit the site. “Is a product getting ranked at the top because it genuinely has more desired features? Or is it just because I’m putting in some gibberish?” The techniques and ethics of what the researchers describe as “Generative Search Optimization,” or Page 7 GSO, are underexplored. “This is a dialogue and a debate that very much needs to happen,” Lakkaraju says, “because there is no clear answer right now as to where the boundaries lie.” She says some of the urgency revolves around the fact that LLMs word their answers with authority, which, for some, could misleadingly portray subjective recommendations as objective facts. Today, internet users understand that the content they see is being influenced by copy enhancements. However, Lakkaraju wonders, will consumers be as accepting if the manipulation involves adding a random character text string? “Is a product getting ranked at the top because it genuinely has more desired features? Or is it just because I’m putting in some gibberish?” she asks. How Humans Outshine AI in Adapting to Change Could artificial intelligence systems eventually perform surgeries or fly planes? First, AI will have to learn to navigate shifting conditions as well as people do. Julian De Freitas and colleagues pit humans against machines in a video game to study AI’s current limits and mine insights for the real world. March 26, 2024 Julian De Freitas Assistant Professor of Business Administration in the marketing unit, and Director of the Ethical Intelligence Lab at Harvard Business School Y ou’ve probably never thought about all the splitsecond adjustments you make in a single day to perform different tasks. Wake up in a hotel room, walk into a library, sit behind the wheel of a car, or swipe up to access your phone apps. Each time, you automatically “self-orient” before you even begin a task, pivoting your perspective of where you are and what you can do as your environment changes. Artificial intelligence can’t do that yet—and the machines may have a long way to go before they can truly replicate this near-instant flexibility that is typically second nature for humans, says Julian De Freitas, an assistant professor at Harvard Business School, in the article “Self-Orienting in Human and Machine Learning,” recently published in the journal Nature Human Behaviour. With many companies looking to AI to streamline processes and increase productivity, the research shines a light on the limitations of the technology, says De Freitas, who is also director of the Ethical Intelligence Lab at HBS. Unlike humans, AI can’t flexibly navigate changing environments yet because it does not have a notion of its “self” and what it can do with it. This shortcoming raises questions about whether it’s safe to rely on AI in certain circumstances, such as an autonomous car that needs to figure out that it has a new problem to solve other than navigation when it unexpectedly gets stuck in a ditch. Page 8 “Algorithms can be very good at specialized tasks, and sometimes even have almost superhuman capabilities when confined to specific domains,” says De Freitas, who studies automation in marketing. “But what makes humans so effective is that we can do many things. We’re pretty flexible. And this is, of course, of immense commercial value as well. Our research shows that a key ingredient that makes us flexible is having a notion of the ‘self,’ and we concretely show what this buys humans over AI.” “Our research shows that a key ingredient that makes us flexible is having a notion of the ‘self,’ and we concretely show what this buys humans over AI.” De Freitas coauthored the research with Ahmet Kaan Uğuralp and Zeliha Oğuz-Uğuralp of Turkey’s Bilkent University; L. A. Paul of Yale University; Joshua Tenenbaum of the Massachusetts Institute of Technology; and Tomer D. Ullman, an assistant professor in Harvard’s Psychology Department. How human responses compare to AI To test the flexibility of AI versus humans in adjusting to new situations, the authors set up four video games, outlining certain tasks for humans and several popular game-playing AI algorithms to complete. The tasks tested the players’ ability to find themselves and respond appropriately amid environments that required increasingly more flexible self-orienting. were successively harder, going from a simple logic game to one in which embodiments rapidly switched, seemingly at random. Like a simplified version of a four-player scenario of the classic video game Mario Kart, each game included four “possible selves,” which were indicated by red squares. Yet, only one avatar (also known as the “digital self”) was controlled by a player’s keypress. To complete the game, the player—human or machine—had to navigate the digital self to a goal using four moves: up, down, right, or left. Human players used arrow keys. Each of the game versions interfered with the straightforward ability of the human or machine to find its avatar and navigate to a goal. The final score: 4-0 for humans. “People were solving everything faster; self-orientation doesn’t seem to exist at all for AI,” De Freitas says. The games were designed so that, in principle, a player could solve them without self-orienting, for example, by noticing whichever avatar is closest to the goal, and trying to navigate that avatar to the reward. Yet, the researchers hypothesized that human players would solve the games by “selforienting”—that is, first figuring out which avatar was their digital self, then proceeding to navigate their digital selves to the “rewarding goal.” On the AI side, researchers tested six common types of reinforcement learning algorithms that had been designed to learn from frameby-frame images of the game. The four games Page 9 How does the technology need to improve? Developers still need to figure out how and where AI can learn to successfully deal with the unexpected, taking inspiration from how humans naturally solve problems by filling in gaps for situations they’ve never encountered, he says. Consider, for example, a doctor dealing with a disabled elderly patient in an Emergency Room, after just seeing a healthy young patient. Good doctors know that they have to reorient themselves to a different problem—not just treating the patient but making sure the older person is helped to the room and assisted throughout the examination. Approaching this situation successfully requires recognizing the problem has changed and reorienting to the new task, says De Freitas. “The current way to achieve this feat with AI is to throw a lot of data at it and hope that AI sees everything it needs to see to learn what it should learn. But I don’t think that’s a flexible, fail-safe approach,” De Freitas says. “In contrast, humans adapt; they continuously understand where they are in the world and what problem they are solving in response to changing circumstances far better than current AI does.” when it will slow them down and/or be more likely to fail. The research shows AI is more likely to struggle in situations where environments shift enough to require a pivot of the self. “In contrast, humans adapt; they continuously understand where they are in the world and what problem they are solving in response to changing circumstances far better than current AI does.” “In any sort of changing environment setting—like shifting between different workflows, providing personalized care to a wide range of patients with various problems, or the example of an automated vehicle having to respond to changing environments—this is where humans are going to shine more than automation systems,” De Freitas says. “If you more deeply understand why your AI systems are limited, you are probably better equipped to know when and how to deploy them in practice.” De Freitas is working with collaborators to give AI “the same self-orienting capabilities as humans, so they behave in the right way, no matter what they see,” he says. “But that’s a hard problem to solve.” Assessing the capabilities of AI So how can companies apply the research findings when considering when and how to fold AI into everyday work tasks? De Freitas offers some suggestions: For now, proceed cautiously when using AI in fastchanging conditions. Managers should be aware of when using an algorithm will speed processes and Page 10 Acknowledge the gap in ability between AI and humans. “Just identifying and acknowledging the gap is the first step in addressing it in whatever way makes sense for the way that you’re leveraging automation, such as improving the system itself or supplementing it with human decision-making,” De Freitas says. “All managers want these systems to be adaptive, intuitive, and have broad applications. Our work identifies a key reason why that’s still hard.” Can AI Help Managers Love Their Jobs (Again)? A study of 190,000 software developers by Frank Nagle shows how AI can help managers reduce administrative work and focus on the tasks they enjoy most. January 27, 2025 Frank Nagle Assistant Professor of Business Administration at Harvard Business School B ecoming a manager usually means spending more time on process and paperwork and less time doing what you love. Now, a novel study shows that generative artificial intelligence (AI) could give managers some balance back. Researchers from Harvard Business School analyzed the activities of more than 187,000 software developers over two years to see how using AI tools changed their workdays. The study offers an intriguing finding: AI didn’t just help developers code more efficiently, it empowered them to approach their jobs differently and follow their interests, says HBS Assistant Professor Frank Nagle. “You get into a job because you love the core work. And then, as you become more senior, you start doing more management work,” Nagle explains. “Some people like that, but some people don’t. This is showing that AI helps people get that balance back closer to what they would prefer it to be. “AI helps people get that balance back closer to what they would prefer it to be.” ”More than half of organizations are using AI, by one measure, and business leaders are eager to leverage the technology to maximize efficiency. The study is one of the first to show how AI can Page 11 help reframe parts of individual jobs, particularly in management, the authors say. While the study focused on software development, they highlight generative AI’s potential to transform how work is divided and prioritized across other knowledgeintensive professions, suggests Nagle. Nagle conducted the study, “Generative AI and the Nature of Work,” with Manuel Hoffmann and Sam Boysel, both postdoctoral fellows at the Laboratory for Innovation Science at Harvard. The scholars collaborated with Kevin Xu, a software engineer at the software collaboration platform GitHub, and Sida Peng, a senior principal economist at Microsoft, which owns GitHub. Massive dataset of developer activities Nagle and his team based their study on open source developer activity from GitHub, which allow them to analyze the impact of GitHub’s Copilot AI tool. Some open source core developers, called “maintainers,” were given free access to Copilot if the projects they worked on were above a ranking threshold, allowing for a comparison of those developers who were just above the threshold with those who were just below (and therefore did not get free access). Open source software source code is produced by teams and distributed for free, a valuable resource that underpins many other technologies. Maintainers shoulder heavy administrative and managerial loads to orchestrate the myriad contributions from the growing community. The team observed the developers weekly activity from July 2022 to July 2024. Their main finding: Developers with access to Copilot increased “core” coding activities by 12 percent over the non-Copilot group. They decreased their project management and administrative work by 25 percent. Less collaboration, more experimentation Though developers are known to be highly collaborative, the study showed those with AI access engaged with others far less. They worked with an average of five collaborators in public projects, down 79 percent from the control group’s 22 collaborators. With AI access, these developers “began working on smaller projects with fewer people involved and tasks that required less interaction,” Nagle explains. Using Copilot also allowed individual developers more space for experimentation. On average, those with AI access increased their use of new programming languages by almost 22 percent while engaging with 15 new open source projects. This finding suggests that AI can serve as a catalyst for innovation. “If this is a tool that allows people to explore more, then that’s probably a good thing because we’re getting new ideas and new projects,” Nagle says. One exciting area to explore, Nagle says, is how these shifts apply at the team level. The findings suggest that workers in the future might pursue greater specialization, “where the people who want to write code, write code, and the people who prefer more administrative tasks can do more of that work,” he explains. Page 12 A Choose Your Own Adventure tool More immediately, the study shows that specialization could have financial implications. The study determined that increased exposure to programming languages could increase developers’ earnings potential by about $1,700 per person or $468 million annually for the nearly 300,000 open source maintainers active on GitHub. Newcomers to the field were poised to benefit most, as the researchers found AI had the most significant impact on relatively inexperienced developers. This group increased their time spent on coding by as much as 11 percent, compared with 4.6 percent for more established developers. Similarly, the less-experienced developers reduced project management tasks by as much as 27 percent—doubling the 14 percent reduction for the more seasoned developers. “It’s kind of like a Choose Your Own Adventure book. Everyone can choose the best path for them and their skillsets.” While some of these differences reflect common sense—for instance, more experienced developers may already feel comfortable with the balance of managerial tasks—it bodes well for generative AI more broadly as a “customized” learning and development tool. As Nagle says: “If you’re good in one thing, it makes it easier to be good in another thing.” “You could certainly watch a YouTube video to learn,” he continues. “But would it be tailored to you? That’s one of the powerful things about this technology. It’s kind of like a Choose Your Own Adventure book. Everyone can choose the best path for them and their skillsets.” HBS Working Knowledge February 2025 Copyright © HBS Working Knowledge Read more content like this at: www.hbswk.hbs.edu
0
You can add this document to your study collection(s)
Sign in Available only to authorized usersYou can add this document to your saved list
Sign in Available only to authorized users(For complaints, use another form )