Machine Learning

ESG Data as Alternative Data and ML/AI for Integrating ESG Data into Investment Decisions

As an investment professional, how might you set about measuring the potential impact of climate change on a company’s future prospects? Negative climate outcomes in coming years may include higher temperatures, more intense storms, melting glaciers, rising sea levels, shifting agricultural patterns, pressure on food and water supplies, and new threats to human health. Assessing the likely severity of these future events and then quantifying their impact on companies is no easy task. Big Data techniques could be pivotal in generating usable information that helps investment professionals unlock long-term shareholder value.

Some fund managers, influenced by evolving investor preferences and increasing disclosure by companies on non-financial issues, have already incorporated environmental, social, and governance (ESG) analysis into their investment processes. Governance (“G”) data are generally objective: Investors are able to observe and measure corporate board actions, making governance comparable across companies and regions. Data on the Environmental (“E”) and Social (“S”) impacts of listed companies, on the other hand, are more subjective, less reliable, and less comparable.

ESG data resemble alternative data in the sense that they have generally been poorly defined, are complex and unstructured, and need considerable due diligence before being used in investment decision making. Applying Machine Learning (ML) and Artificial Intelligence (AI) techniques can transform ESG data into meaningful information that is more useful for investment analysis.

Corporate sustainability reports often suffer from haphazard data collection and missing values. Equally, when data vendors acquire and combine raw ESG data into aggregate ESG scores, potential signals may be lost.
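To see how aggregation and missing values can hide information, consider the minimal sketch below. The company names, pillar scores, and the equal-weighted averaging rule are all hypothetical assumptions chosen for illustration; they do not represent any data vendor’s actual methodology.

```python
# Hypothetical pillar scores (0-100) for three made-up companies.
# None marks a value missing from the company's sustainability report.
pillar_scores = {
    "Company A": {"E": 80.0, "S": 20.0, "G": 50.0},
    "Company B": {"E": 50.0, "S": 50.0, "G": 50.0},
    "Company C": {"E": None, "S": 50.0, "G": 50.0},
}


def composite_score(pillars):
    """Equal-weighted average of the pillars that are present.

    Missing pillars are silently skipped, which is one simple
    (hypothetical) way an aggregate score might treat data gaps.
    """
    available = [v for v in pillars.values() if v is not None]
    return sum(available) / len(available)


composites = {name: composite_score(p) for name, p in pillar_scores.items()}

for name, pillars in pillar_scores.items():
    print(f"{name}: pillars={pillars} -> composite={composites[name]:.1f}")
```

All three companies end up with the same composite of 50.0, even though Company A pairs a strong “E” profile with a weak “S” profile and Company C reports no “E” data at all. Keeping pillar-level (or raw) data alongside any summary score preserves those differences.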
ESG data and scoring across companies and data vendors can lack consistency and comparability; as a result, using simple summary scores in investment analysis is potentially flawed.

Data analysts can apply data-science methods, such as data cleansing and data wrangling, to raw ESG data to create a structured dataset. ML/AI techniques, such as natural language processing (NLP), can then be applied to text-based, video, or audio ESG data. The foundation of NLP consists of supervised machine learning algorithms, which typically include logistic regression, support vector machines (SVM), classification and regression trees (CART), random forests, or neural networks.

NLP can, for instance, search for key ESG words in corporate earnings calls. An increase in the number of mentions of, say, “human capital,” employee “health and safety,” or “flexible working” arrangements may indicate an increased focus on the “S” pillar of ESG, which could potentially raise the company’s overall ESG score. The results of one such application of NLP to corporate earnings calls are illustrated in the following exhibit:

[Exhibit not reproduced. Series: number of companies that have reported; number of companies mentioning ESG keywords; number of companies mentioning COVID-19. Y-axis: number of companies, 0 to 2,500.]

Source: “GS SUSTAIN: ESG—Neither Gone Nor Forgotten” by Evan Tylenda, Sharmini Chetwode, and Derek R. Bingham, Goldman Sachs Global Investment Research (2 April 2020).

© 2022 CFA Institute. All rights reserved.

ML/AI can help fund managers apply only those ESG factors that are relevant to a company and its sector. For example, “E” factors are important for mining and utility companies but less so for clothing manufacturers. Likewise, “S” factors are important for the global clothing manufacturing sector but less so for mining and utility companies. ML/AI techniques are not used in isolation.
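The earnings-call keyword search mentioned above can be sketched in a few lines. This is plain phrase counting with Python’s standard `re` module, not one of the supervised classifiers listed earlier; the keyword list is limited to the three phrases quoted in the text, and the two transcript excerpts are invented for illustration.

```python
import re

# "S"-pillar phrases taken from the examples in the text; a real
# keyword list would be much longer and curated by analysts.
S_KEYWORDS = ["human capital", "health and safety", "flexible working"]


def count_keyword_mentions(transcript, keywords):
    """Count case-insensitive, whole-phrase occurrences of each keyword."""
    text = transcript.lower()
    counts = {}
    for kw in keywords:
        # \b word boundaries avoid matching a phrase inside a longer word.
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text))
    return counts


# Hypothetical earnings-call excerpts from two consecutive quarters.
prior_quarter = (
    "Revenue grew 4% this quarter. We continue to invest in health and safety "
    "across our plants."
)
current_quarter = (
    "Human capital is a priority for us. We expanded flexible working "
    "arrangements, and health and safety training reached every site. "
    "Our human capital strategy also covers retention."
)

prior = count_keyword_mentions(prior_quarter, S_KEYWORDS)
current = count_keyword_mentions(current_quarter, S_KEYWORDS)

for kw in S_KEYWORDS:
    print(f"{kw!r}: {prior[kw]} -> {current[kw]}")

print("Total S-keyword mentions:", sum(prior.values()), "->", sum(current.values()))
```

In this made-up example, total “S”-keyword mentions rise from 1 to 4 between quarters, the kind of increase the text suggests may signal greater focus on the “S” pillar. Simple counting ignores synonyms, negation (“we have not yet addressed health and safety”), and speaker context, which trained NLP models generally handle more robustly.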
ESG scoring systems tend to rely on cross-functional teams, with data scientists operating in tandem with economists, fundamental analysts, and portfolio managers to identify strengths and weaknesses of companies and sectors. Fundamental analysts, for instance, typically do not need to know the details of ML algorithms to make valuable contributions to the ESG investment workflow. The industry-specific knowledge of fundamental analysts can provide nuanced viewpoints that help to: 1) identify relevant raw data; 2) enable data scientists to incorporate ESG data into appropriate investment models; and 3) interpret model outputs and investment implications.
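One way the analyst input described above might be encoded, following the earlier observation that “E” factors matter more for mining and utilities while “S” factors matter more for clothing manufacturers, is a sector materiality map. In this minimal sketch the sector names, weights, and pillar scores are hypothetical; they are not drawn from any published materiality framework.

```python
# Hypothetical sector materiality weights, standing in for the kind of
# judgment fundamental analysts contribute. Each row sums to 1.0.
SECTOR_WEIGHTS = {
    "mining":    {"E": 0.5, "S": 0.2, "G": 0.3},
    "utilities": {"E": 0.5, "S": 0.2, "G": 0.3},
    "clothing":  {"E": 0.2, "S": 0.5, "G": 0.3},
}


def sector_weighted_score(pillars, sector):
    """Weight E, S, and G pillar scores by their assumed sector relevance."""
    weights = SECTOR_WEIGHTS[sector]
    return sum(weights[p] * pillars[p] for p in ("E", "S", "G"))


# The same hypothetical pillar profile evaluated in two different sectors.
profile = {"E": 40.0, "S": 80.0, "G": 60.0}

mining_score = sector_weighted_score(profile, "mining")
clothing_score = sector_weighted_score(profile, "clothing")

print(f"As a mining company:   {mining_score:.1f}")
print(f"As a clothing company: {clothing_score:.1f}")
```

The identical pillar profile scores 54.0 as a mining company and 66.0 as a clothing manufacturer, because its relatively strong “S” score counts for more where social factors are judged material. The weights themselves remain a modeling choice that analysts and data scientists would need to justify and revisit.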