The role of News Analytics in financial engineering: a review and the road ahead
Gautam Mitra
7 December 2011, London

Outline
• Introduction: What… Why… How. A commercial
• News data: Data sources; Information contents/metadata; Summary information/views; Information/modelling architecture
• Models and Applications: Abnormal returns; News enhanced trading strategies; Risk control
• Case studies: Risk control; News Analytics Toolkit momentum study
• Summary; Conclusion

WHAT
News analytics: a working definition
News analytics refers to the measurement of the various qualitative and quantitative attributes of textual news stories. Some of these attributes are: sentiment, relevance, and novelty. Expressing news stories as numbers permits the manipulation of … information in a mathematical and statistical way. <Taken from Wiki>
A news story is about an event.

WHY
The research problem = the business problem
The world of financial analytics is concerned with three leading problems:
(i) Pricing of assets in a temporal setting
(ii) Making optimum investment decisions (low frequency) or optimum trading decisions (high frequency)
(iii) Controlling risk at different time exposures

HOW
The message
The finance industry focuses on three major applications:
> High frequency: trading strategies
> Low frequency: investment strategies
> Risk control
By increasing the information set with quantified news, the legacy models for the above applications can be enhanced.
Knowledge from three disciplines is required:
> Information engineering
> AI … knowledge engineering
> Financial engineering

Introduction
[Diagram labels: News; Market Environment; Sentiment]
[Behavioural finance <greed..fear..irrational exuberance> ……… Wall Street 1, Wall Street 2 => money never sleeps]

Introduction
[Neo-classical models for choice or decision making]
• Trading strategies/decisions
• Investment decisions
• Risk control decisions

Introduction
R & D challenge: identify the killer application
• Smart investors rapidly analyse/digest information.
• News stories/announcements.
• Stock price moves (market reactions).
• Act promptly to take trading/investment decisions.
Can a machine act intelligently (AI) to compete with or outsmart humans?

Commercial
Read The Handbook of News Analytics in Finance, by Gautam Mitra and Leela Mitra <for an instant understanding ...!>
<or look up http://www.bis.gov.uk/foresight/our-work/projects/current-projects/computer-trading (The Future of Computer Trading in Financial Markets)>
Our report: Automated analysis of news to compute market sentiment: its impact on liquidity and trading. Gautam Mitra, Dan diBartolomeo, Ashok Banerjee, Xiang Yu.

News data: Data sources
Which asset classes....?
• FX (currency)
• Commodities
• Fixed income (bonds)
• Stocks (equities)
Wall Street proverb: ‘Stocks are stories, bonds are mathematics’
[Diagram labels. Main Market Participants: Customers (Institutional Customers, Retail Customers); Broker-Dealers & Market Makers; Retail Brokers & Market Makers; ECN; Exchange. Tertiary Market Participants: News Data Feed Providers; Market Data Feed Providers]

News data: Data sources
[Diagram labels: Traders [High Frequency]; Fund Managers [Low Frequency]; Desktop: • Market Data • NewsWire • Web <blogs, twitter, message boards>; Data Warehouse; Data Mart]

News data: Data sources
Sources of news/informational flows (Leinweber)
News: mainstream media, reputable sources
• Newswires to traders' desks.
• Newspapers, radio and TV.
Pre-News: source data
• SEC reports and filings.
• Government agency reports.
• Scheduled announcements, macroeconomic news, industry stats, company earnings reports…
Web-based news
• Social media: blogs, websites and message boards
• Quality can vary significantly
• Barriers to entry are low
• Human behaviour and agendas

News data: Data sources
Financial news can be split between
• Scheduled news (synchronous)
• Unscheduled news (asynchronous, event driven)
Scheduled news (synchronous)
• Arrives at pre-scheduled times
• Much of pre-news
• Structured format <XML..XBRL>
• Often basic numerical format
• Typically macroeconomic announcements and earnings announcements

News data: Data sources
Unscheduled news (asynchronous, event driven)
• Arrives unexpectedly over time
• Mainstream news and social media
• Unstructured, qualitative, textual form
• Non-numeric
• Difficult to process quickly and quantitatively
• May contain information about effect and cause of an event
• To be applied in quant models, it needs to be converted to an input time series

Information contents/Metadata
Key attributes include:
• Entity recognition
• Relevance
• Novelty
• Event categories
• Sentiment
Pre-analysis extracts/computes/mines these attributes; using text analysis and AI classifiers, sentiment scores are created.
This is the (news) metadata.
The news flow/intensity also influences the resulting sentiment.

Information/modelling architecture
[Diagram labels]
Sources: Mainstream News; Pre-News; Web 2.0 Social Media
Pre-Analysis (Classifiers & others)
Metadata: • Entity Recognition • Relevance • Novelty • Events • Sentiment Score; News Flow/Intensity
(Numeric) financial market data
Analysis: Consolidated Data mart
Updated beliefs, ex-ante view of market environment
Quant Models: 1. Return predictions 2. Fund management / trading decisions 3. Volatility estimates and risk control
Information value chain: Data… …information… knowledge (Data analysis; Data mart; quant models)

Analysis ..synthesis ..mining: entity recognition
Identify entities such as companies in news stories using point-in-time sensitive information:
• Short names
• Long names
• Common abbreviations
• Common misspellings
• Securities identifiers
• Subsidiaries

Analysis ..synthesis ..mining: relevance
Calculate the relevance of a story to a given company:
• Mentions in the text
• Positioning in the story (headline vs. last paragraph)
• Total number of companies mentioned
• Detect roles played by companies in the story
• Represent the context numerically

Analysis ..synthesis ..mining: novelty
Is the news story "new" or novel?
• Elementize the various characteristics of a news story
• Distinguish between similar vs. duplicate stories
• Define a time window between stories
Example: Toyota’s Vehicle Recall (news flow in the first 30 minutes)
Scores shown beside the four stories, in order: 100, 75, 56, 42
• 2010-01-21 21:20:08 -- PRESS RELEASE: Toyota Files Voluntary Safety Recall on Select Toyota
• 2010-01-21 21:20:08 -- News Flash: Toyota Files Voluntary Safety Recall On Select Toyota Division Vehicles
• 2010-01-21 21:21:27 -- Toyota To Recall About 2.3M Vehicles For Sticking Accelerator Pedals>TM
• 2010-01-21 21:48:10 -- DJ Toyota Recalls 2.3 Million Vehicles For Sticking Accelerators

Analysis ..synthesis ..mining: event categories
Company news and events are categorized:
• Identify actionable events
• The more detailed the event, the better
• Differentiate between scheduled vs. unscheduled news events
• Distinguish between explanatory or predictive inputs
Categories shown: M&A Activity; Stock Price Changes; Analyst Ratings; Bankruptcy; Revenues; Credit Ratings; Regulatory; Price Targets; Dividends; Legal Issues; Earnings; Insider Trading

Analysis ..synthesis ..mining: sentiment

Summary information and views
Thomson Reuters News Analytics: equity coverage and available data
(i) Coverage (ii) Equity:
All equities ................ 34,037 (100.0%)
Active companies ............ 32,719 (96.1%)
Inactive companies .......... 1,318 (3.9%)
Equity coverage by region
Americas .................... 14,785
APAC ........................ 11,055
EMEA ........................ 8,197
Equity coverage updates: updated bi-weekly for recent changes (de-listings, M&A, IPOs).
History: available from January 2003 (history kept for delisted companies; symbology changes tracked).

RavenPack News Analytics
Equity coverage by region
All equities ................ 28,279 (100%)
Americas .................... 11,950 (42.24%)
Asia ........................ 8,858 (31.31%)
Europe ...................... 5,859 (20.71%)
Oceania ..................... 436 (5.08%)
Africa ...................... 186 (0.66%)
For the most up-to-date list of supported companies, download the companies.csv file at: https://ravenpack.com/newsscores/
Historical data:
• Data format: comma separated values (.csv) files
• Date/time info: in Coordinated Universal Time (UTC)
• Archive range: since Jan 1, 2005
• Archive packaging: monthly .csv files compressed in .zip files on a per-year basis

Summary information
Other suppliers
• Deutsche Boerse <Alpha Flash>
• Bloomberg ‘Black box newsfeed’
• Dow Jones Elementized Newsfeed

Summary information and views
Tetlock et al. event study shows “information leakage”

Summary information and views
Average Stock Price Reaction to Negative News Events
Source: Macquarie Quant Research, May 2009

Summary information and views
Average Stock Price Reaction to Positive News Events
Source: Macquarie Quant Research, May 2009

Summary information and views
Illustration of Seasonality (Hafez, RavenPack)
• RavenPack Sentiment Scores
• Reuters NewsScope Sentiment Engine

Model & Applications… (abnormal) Returns
Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha
News data can be used for:
• Stock picking and generating trading signals
• Factor models
• Exploiting behavioural biases in investor decisions

Model & Applications… (abnormal) Returns
Stock picking and generating trading signals
• Sentiment reversal as buy signal: J. Kittrell uses a sequence of P, N scores as a means of testing sentiment reversal.
• Momentum strategy enhanced by news sentiment scores: Macquarie research; Sinha also reports results with Thomson Reuters data.

Model & Applications… (abnormal) Returns
Behavioural biases
• Odean and Barber (2007) find evidence that individual investors have a tendency to buy attention-grabbing stocks.
• Professional investors are better equipped to assess a wider range of stocks, so they are less prone to buying attention-grabbing stocks.
• Da, Engelberg and Gao also consider how the amount of attention a stock receives affects the cross-section of returns. They use the frequency of Google searches for a particular company as a measure of attention, and find some evidence that changes in investor attention can predict the cross-section of returns.

Model & Applications… (abnormal) Returns
Stock picking and generating trading signals
Li (2006): simple ranking procedure … identify stocks with positive and negative sentiment
• 10-K SEC filings for non-financial firms, 1994 to 2005
• Risk sentiment measure: count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section
• Strategy: long in low risk-sentiment stocks, short in high risk-sentiment stocks … reasonable level of returns
Leinweber (2010): event studies based on the Reuters NewsScope Sentiment Engine

News Enhanced Algorithmic Trading
1. Information/modelling architecture
2. Modelling architecture: pre-trade and post-trade analysis
Characterize asset behaviour/dynamics by
i. Asset price/return
ii. Asset (price) volatility
iii. Asset (price) liquidity
Construct trading models using these measures.
Market data: bid, ask, execution price, time bucket
News metadata: time stamp, company ID, relevance, novelty, sentiment score, event category…
[Diagram labels: Predictive Analysis Model; Price/Returns; Volatility; Liquidity]
[Diagram labels: Pre-Trade Analysis; Market Data Feed; News Meta Data Feed; Predictive Analytics; Automated Algo-Strategies; (Analytic) Market Data: price, volatility, liquidity; Low Latency Execution Algorithms; Trade orders; Post Trade Analysis; Post Trade Analysis Report; Market Data; News Data; Ex-Ante Decision Model; Ex-Post Analysis Model]

Applications: Risk management
• Traditionally, historic asset price data has been used to estimate risk measures.
• When the market environment changes significantly, ex post retrospective measures fail to account for developments in the market environment, investor sentiment and knowledge.
• Traditional measures can fail to capture the true level of risk (Mitra, Mitra and diBartolomeo 2009; diBartolomeo and Warrick 2005).
• Incorporating measures or observations of the market environment in risk estimation is important.

Case study
EQUITY PORTFOLIO RISK (VOLATILITY) ESTIMATION USING MARKET INFORMATION AND SENTIMENT
Leela Mitra
Co-authors: Gautam Mitra and Dan diBartolomeo
Sponsored by:

Case study: Outline
• Problem setting
• Model description
• Updating the model using quantified news
• Study I
• Study II
• Discussion and conclusions

Introduction & background
Tetlock et al. (2007) note there are three main sources of information:
• Analyst forecasts
• Publicly disclosed accounting variables
• Linguistic descriptions of operating environments
If the first two are incomplete, the third may give us relevant information.
Tetlock et al. (2007) introduce “news” to a fundamental factor model.

Problem setting
Three main types of factor models:
• Macroeconomic: use economic variables as factors (Chen, Ross and Roll; Sharpe)
• Fundamental: based on firm-specific (cross-sectional) attributes (BARRA and Fama-French)
• Statistical: factors are unobservable and derived via calibration, often orthogonal
The three types differ on the sources of risk (uncertainty); they can be shown to be rotations of each other.
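
A minimal sketch of the statistical type of factor model, assuming a single factor taken as the first principal component of the sample covariance matrix. It is an illustration only, not the case study's model: the function names, the power-iteration routine and the one-factor restriction are all assumptions made here to keep the example dependency-free.

```python
import math


def covariance_matrix(returns):
    """Sample covariance of a T x N table of asset returns (list of rows)."""
    t, n = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / t for j in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (t - 1)
             for j in range(n)]
            for i in range(n)]


def leading_component(cov, iterations=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    Assumes the start vector is not orthogonal to the leading eigenvector
    and that the covariance matrix is not identically zero.
    """
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


def one_factor_model(returns):
    """Split each asset variance into a factor part and an asset-specific part."""
    cov = covariance_matrix(returns)
    factor_var, loadings = leading_component(cov)
    specific = [cov[i][i] - loadings[i] ** 2 * factor_var for i in range(len(cov))]
    return factor_var, loadings, specific


def portfolio_variance(weights, factor_var, loadings, specific):
    """Portfolio variance under the one-factor structure."""
    exposure = sum(w * b for w, b in zip(weights, loadings))
    return exposure ** 2 * factor_var + sum(w * w * s for w, s in zip(weights, specific))


if __name__ == "__main__":
    # Hypothetical daily returns for three assets (made-up numbers).
    sample = [[0.010, 0.012, -0.004],
              [-0.008, -0.006, 0.003],
              [0.004, 0.007, 0.001],
              [-0.011, -0.009, 0.002],
              [0.006, 0.003, -0.001]]
    fv, b, s = one_factor_model(sample)
    print(portfolio_variance([1 / 3, 1 / 3, 1 / 3], fv, b, s))
```

With the variances split this way, a change in the factor variance or in the asset-specific variances feeds directly into the portfolio variance.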

Problem setting
• Models need to update the risk structure as the environment changes
• diBartolomeo and Warrick (2005) update covariance estimates using option implied volatility
CHANGES TO MARKET ENVIRONMENT => TRADERS REACT => CHANGES IN OPTION IMPLIED VOLATILITY => CHANGES IN ASSET COVARIANCE MATRIX
Traders respond quickly in an intelligent fashion.

Model description
An extension of diBartolomeo & Warrick (2005), in two parts:
• “Basic” statistical factor model
• Factor variance estimates are updated for changes in option implied volatility

Model description
• We construct a statistical factor model using principal component analysis to find orthogonal factors
• Update the asset variances using option implied volatility data

Model description
• For each asset for which we have option implied volatility data
• We wish to identify the new factor variances and asset-specific variances implied by the updated asset variances
• Solve this set of simultaneous equations to derive the values, subject to some further conditions

Model description
Further conditions
• Allow for structure that is expected of principal component factors
• Assume factor variances do not decline substantially from one period to the next
• Similarly, assume asset-specific variances do not decline substantially from one period to the next

Study I
• Period: 17 January 2008 to 23 January 2008
• EURO STOXX 50
• Market sentiment worsened
• Option implied volatility measures surged
A few key events:
• Large interest rate cut
• George Bush announced stimulus plan
• Soc Gen hit by Jerome Kerviel rogue trader scandal

Study I
• Portfolio volatility from the option implied model is higher than from the “basic” model
• It rises significantly on 21 January

Study II
• Over 2008 markets fell
• Loss of liquidity in credit markets and banking system
• Many banks went bankrupt or were propped up
September and October 2008: volatility for financial firms particularly high
• Lehman bankruptcy
• Lloyds takeover of HBOS
• Restrictions on short selling of financials

Study II
• Period: 18 September 2008 to 24 September 2008
• Dow Jones 30
• Portfolio of three finance stocks: Bank of America, Citigroup and JP Morgan Chase; equal weight on each stock
• Portfolio of three non-finance stocks: Johnson & Johnson, Kraft Foods and Coca Cola; equal weight on each stock
• Can the model predict impact in one sector…?

News Analytics Toolkit

Momentum Study
RSI (Relative Strength Indicator) with a 15-day timeframe
$U = \text{close}_{\text{now}} - \text{close}_{\text{previous}}$ if up period, 0 otherwise
$D = \text{close}_{\text{previous}} - \text{close}_{\text{now}}$ if down period, 0 otherwise
$RS = \mathrm{EMA}(U, n) / \mathrm{EMA}(D, n)$
$RSI = 100 - 100 / (1 + RS)$
EMA = n-period Exponential Moving Average
• Asset universe: FTSE100 and CAC40
• Daily market data from Jan 2005 to Jan 2011
Portfolio selection:
• Ranked by the RSI momentum indicator
• Long only, equally weighted
• Calendar rebalancing frequency: every 60 or 90 working days
• Transaction cost: 0.2%
• Number of assets in portfolio: 10 for FTSE100, 5 for CAC40

Momentum Study
News enhanced momentum strategy
• News provided by RavenPack News Score 1.5
• Revised ranking including market data and news data
• Companies are ranked according to average sentiment score
• Only news with Relevance ≥ 75 and within the previous 15 days is considered
• Momentum ranking and news ranking are combined with equal weights between news sentiment score and RSI score
• Companies with no news in the period are considered to have an average sentiment score of 50 (neutral sentiment)

Momentum Study
FTSE 100, 90 days rebalancing

Momentum Study
CAC 40, 90 days rebalancing

Momentum Study
FTSE 100, 60 days rebalancing

Momentum Study
CAC 40, 60 days rebalancing

Summary & discussions
Applications of (semi-)automated news analytics in finance are growing in importance.
Payback can be substantial to:
• Investment managers
• Traders
• Internal risk auditors
• Regulators

Summary & discussions
Knowledge and skills from three different disciplines:
• Information systems
• Artificial intelligence
• Financial engineering & quantitative modelling (including behavioural finance)
are required in various degrees to progress the field/make substantial impact.

Thank you for your attention
Comments and questions please
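
Returning to the momentum study, the RSI definition and the news enhanced ranking rules given on the slides can be sketched as follows. This is a rough illustration under stated assumptions, not the toolkit's implementation: the EMA smoothing constant 2/(n+1), the handling of a zero denominator, the reading of "equal weights" as a simple average of the two 0 to 100 scores, the inclusive 15-day window, and all function and field names are choices made here.

```python
def ema(values, n):
    """n-period exponential moving average, seeded with the first observation."""
    alpha = 2.0 / (n + 1)  # assumed smoothing constant; the slides do not state one
    smoothed = values[0]
    for value in values[1:]:
        smoothed = alpha * value + (1.0 - alpha) * smoothed
    return smoothed


def rsi(closes, n=15):
    """RSI = 100 - 100 / (1 + RS), with RS = EMA(U, n) / EMA(D, n)."""
    if len(closes) < 2:
        raise ValueError("need at least two closing prices")
    ups, downs = [], []
    for previous, now in zip(closes, closes[1:]):
        change = now - previous
        ups.append(max(change, 0.0))     # U: gain in an up period, else 0
        downs.append(max(-change, 0.0))  # D: loss in a down period, else 0
    ema_up, ema_down = ema(ups, n), ema(downs, n)
    if ema_down == 0.0:
        # RS is undefined; assume 100 for pure gains and 50 for a flat series
        return 100.0 if ema_up > 0.0 else 50.0
    return 100.0 - 100.0 / (1.0 + ema_up / ema_down)


def average_sentiment(news_items, as_of_day, window=15, min_relevance=75, neutral=50.0):
    """Mean sentiment of sufficiently relevant news in the look-back window.

    Each item is a dict with hypothetical keys 'day', 'relevance', 'sentiment'.
    Companies with no qualifying news get the neutral score of 50.
    """
    scores = [item["sentiment"] for item in news_items
              if item["relevance"] >= min_relevance
              and 0 <= as_of_day - item["day"] <= window]
    return sum(scores) / len(scores) if scores else neutral


def select_portfolio(closes_by_ticker, news_by_ticker, as_of_day, k):
    """Rank by an equal-weight blend of RSI and news sentiment; keep the top k."""
    combined = {
        ticker: 0.5 * rsi(closes)
        + 0.5 * average_sentiment(news_by_ticker.get(ticker, []), as_of_day)
        for ticker, closes in closes_by_ticker.items()
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]
```

In the study's set-up, k would be 10 for the FTSE100 universe and 5 for the CAC40 universe, with the selection repeated every 60 or 90 working days.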