Statistical Arbitrage
Lecture 1: Overview
John Lehoczky, Mark Schervish, Giuseppe Nuti (UBS)
John Lehoczky & Mark Schervish, Statistical Arbitrage Lecture 1a: Overview, August 25, 2014

Course Administrative Details
- Homework Assignments (50%)
- Team Project (50%)
  - Teams will be assigned in two weeks
- Syllabus
  - Course Schedule
  - Course Description and Learning Objectives
  - Course Materials
  - Academic Integrity

Learning Objectives
From the Statistical Arbitrage course, students will gain:
- experience in cleaning and manipulating financial data, and an awareness of possible pitfalls in finding a good trading strategy and evaluating its performance.
- a mastery of and experience in implementing the major approaches to statistical arbitrage, including pairs trading, value, momentum and reversal strategies.
- an understanding of the fundamentals of modern electronic trading, including limit-order books and high-frequency trading, and experience with the microstructure of stock price paths.
- experience in devising, implementing, and evaluating statistical arbitrage strategies on market data.

Course Schedule
- Lectures 1-6 will occur at 5:30pm on the scheduled class day.
- Lecture 6 (October 7) will be given by Giuseppe Nuti of UBS.
- Instead of a seventh lecture, there will be a period starting at 5:30 during which the two instructors will be available to discuss team projects with team members.
- The final exam on Wednesday, October 15 will consist of team project presentations, given in parallel but separate sessions in Pittsburgh and New York City. Teams must turn in both a written report and a set of presentation slides before the start of the sessions.
- There are two milestones leading to the final project presentations:
  - Monday, September 22: project proposals are due.
  - Friday, October 10: draft executive summaries of the projects are due at noon.

Course Topics: 1
- Lecture 1: Course overview, historical background, introduction to pairs trading, discussion of data issues, and homework assignments 1 and 2.
- Lecture 2: Statistical arbitrage by creating long/short portfolios of winner (long) and loser (short) stocks based on some principle (e.g. value/contrarian and momentum strategies). Discussion of classic papers by LSV (Lakonishok, Shleifer and Vishny) and JT (Jegadeesh and Titman), as well as the new paper by AMP (Asness, Moskowitz and Pedersen).
- Lecture 3: Continuation of long/short portfolio strategies (combining value and momentum, reversals); defining and identifying statistical arbitrage.

Course Topics: 2
- Lecture 4: Time series background leading up to cointegration, pairs trading with cointegration, and the papers by Gatev et al. and by Avellaneda and Lee.
- Lecture 5: Volatility arbitrage, background on electronic trading, market microstructure, limit order books and high frequency trading. Discussion of the paper by AHS, “Equity trading in the 21st century.”
- Lecture 6: Lecture by Giuseppe Nuti of UBS on high frequency finance.
- Lecture 7: Project consultations

Eras in Stock Market Trading
- Pre 1959: Value investing and Technical Analysis
- 1959-1973: The “Golden Age of Quantitative Finance”
- 1973-1993: The era of the “efficient market hypothesis”
- 1993-2002: The rise of statistical arbitrage
- 2002-2014: The rise of electronic trading and high-frequency finance

Pre 1959: 1
- Prior to 1962, there were principally two approaches to investing: value investing and technical analysis.
- Value investing (Benjamin Graham and coauthors): finding stocks that are undervalued or overvalued relative to their own fundamentals or their industrial sector.
  - Security Analysis, 1934 (B. Graham and D. Dodd)
  - The Interpretation of Financial Statements, 1937 (B. Graham and S. Meredith)
  - The Intelligent Investor, 1949 (originally B. Graham; latest editions have W. Buffett and J. Zweig as coauthors)
- Note that company financial variables change very slowly.

Pre 1959: 2
- Technical Analysis (e.g. Edwards and Magee), perhaps the forerunner of today’s statistical machine learning approaches.
- No statistical analysis of the likelihood and degree of success derived from trading on these signals.
- Little concern with issues of portfolio construction and diversification.

The Golden Age of Mathematical Finance 1959-1973
- Markowitz Portfolio Theory (1952-1959)
- Modigliani-Miller Foundations of Corporate Finance and Capital Structure (1958, 1961, 1963)
- CAPM (Sharpe, 1964 and others)
- Samuelson: “Properly adjusted stock prices are martingales” (1965)
- The behavior of stock market prices (non-normal returns) (Fama, 1965)
- Efficient Market Hypothesis (Fama, 1970)
- Black-Scholes-Merton option pricing formula (1973)

CRSP: 1
- The availability of data and computing power is fundamental to understanding the behavior of equity prices and other financial instruments.
- CRSP was founded in 1960 at the University of Chicago, and one of its major projects was the establishment of daily stock price data (open, high, low, close) for all stocks listed on the NYSE in 1962.
- It began with the NYSE and was expanded to the AMEX and NASDAQ (and more broadly later).
- This led to the start of a “science” of equity price behavior.

CRSP: 2
- Before CRSP, the prevailing model of stock price behavior, if any, dated back to Bachelier’s Theory of Speculation (1900), namely that stock prices behave like geometric Brownian motion.
- This theory posited that the log-returns of a stock should have a normal distribution as well as an independent increments structure.
- CRSP data allowed for a careful study (on a daily time scale) of the behavior of stock prices. Just as the microscope revolutionized biology and the telescope revolutionized astronomy, and as fMRI is revolutionizing brain science, CRSP data revolutionized quantitative stock price modeling.
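
The normal, independent-increments model described above can be checked against data by computing log-returns and a tail-weight statistic. The following is a minimal sketch, not the course's method: the function names, the simulated price path, and the drift and volatility values are all illustrative assumptions. It simulates prices whose log-returns are i.i.d. normal and computes the sample excess kurtosis, which is about zero under normality and large and positive for heavy-tailed returns.

```python
import math
import random


def log_returns(prices):
    """Log-returns r_t = ln(P_t / P_{t-1}) for a list of positive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def excess_kurtosis(xs):
    """Sample excess kurtosis (population form); about 0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def simulate_gbm_prices(n_days, mu=0.0003, sigma=0.016, p0=100.0, seed=0):
    """Simulate daily prices whose log-returns are i.i.d. N(mu, sigma^2).

    mu and sigma are made-up illustrative values, not estimates from data.
    """
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_days):
        prices.append(prices[-1] * math.exp(rng.gauss(mu, sigma)))
    return prices


prices = simulate_gbm_prices(20000)
r = log_returns(prices)
print(round(excess_kurtosis(r), 2))  # close to 0 under the normal model
```

Applying `excess_kurtosis` to real daily log-returns instead of the simulated series is the natural next step; a value well above zero would indicate heavier tails than the normal model allows.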

CRSP: 3
[Figure: Normal probability plot for IBM data (x-axis: Data, y-axis: Probability). IBM daily log-returns: 1/2/62 to 3/7/12, mean = .0034, std = .0162]

Efficient Market Hypothesis: 1
- In 1965 Fama published a major empirical study of the behavior of stock prices. He demonstrated that price changes (log-returns) have much heavier tails than would be predicted from a normal distribution.
- In 1970, Fama also formulated the efficient market hypothesis, stating: “... investors can choose among the securities that represent ownership of firms’ activities under the assumption that security prices at any time ‘fully reflect’ all available information. A market in which prices always ‘fully reflect’ available information is called ‘efficient’.”

Efficient Market Hypothesis: 2
- The efficient market hypothesis became standard in academia. It was popularized in the mainstream by A Random Walk Down Wall Street, first published in 1973 by Burton Malkiel. Malkiel said:
  - “... A blindfolded chimpanzee throwing darts at the Wall Street Journal could select a portfolio that would do as well as the experts.”
- In 2003, he would modify this to encourage buying index funds with very low expense charges.

Efficient Market Hypothesis: 3
- The academic community was solidly convinced by the efficient market hypothesis until the early 1990s; however, this didn’t prevent the financial services industry from expanding its stock advisory business during those years.
- The efficient market hypothesis also had some important skeptics:
  - “I’d be a bum in the street with a tin cup if the markets were efficient.” Warren Buffett, Forbes Magazine, April 3, 1995.

1990s, Rise of Statistical Arbitrage
- The CAPM of Sharpe, coupled with the efficient market hypothesis, argued that there was one risk factor in investing: market risk. Stocks were characterized by their “beta coefficient,” which measured their risk relative to the overall market.
- Between 1985 and 1995, three major results were published that exhibited trading/investment strategies resulting in “excess profits” beyond market risk (commonly called “alpha”). These clearly showed “anomalies” and suggested trading strategies that were zero cost and yielded “excess profits.”
- These include pairs trading, value, momentum, and short term reversals.

What is Statistical Arbitrage: 1
- From investwords.com: “An attempt to profit from pricing inefficiencies that are identified through the use of mathematical models. Statistical arbitrage attempts to profit from the likelihood that prices will trend toward a historical norm. Unlike pure arbitrage, statistical arbitrage is not riskless.”
- From investopedia.com: “A profit situation arising from pricing inefficiencies between securities. Investors identify the arbitrage situation through mathematical modeling techniques. Statistical arbitrage is not without risk; it depends heavily on the ability of market prices to return to a historical or predicted normal.”

What is Statistical Arbitrage: 2
- From HJTW 2004: “We define statistical arbitrage as a long horizon trading opportunity that generates a riskless profit.
As such, statistical arbitrage is a natural extension of the trading strategies utilized in the existing empirical literature on persistent anomalies.”
- The HJTW definition contains three key ideas:
  1. Zero initial cost (initial value of the trading portfolio is 0). This leads to long-short strategies, not buy-only strategies.
  2. Long horizon trading. With riskless arbitrage there is a finite time T at which the value of the trading portfolio will be non-negative with probability 1 and positive with positive probability. For statistical arbitrage we must let T → ∞.
  3. Riskless profit (in a long horizon context).

What is Statistical Arbitrage: 3
- HJTW formalize these mathematically, and they develop statistical methodology to test whether a trading strategy can be considered to be a statistical arbitrage.
- HJTW definition of statistical arbitrage: $v(t)$ is the value of a trading portfolio at time $t$. A trading strategy with a sequence of portfolio values $\{v(t), t \ge 0\}$ is a statistical arbitrage if it satisfies:
  1. $v(0) = 0$,
  2. $\lim_{t \to \infty} E(v(t)) > 0$,
  3. $\lim_{t \to \infty} P(v(t) < 0) = 0$,
  4. $\lim_{t \to \infty} \mathrm{Var}(v(t))/t = 0$.

What is Statistical Arbitrage: 4
- Statistical arbitrage is not:
  - Having a “view” of the market and investing in some particular set of stocks.
  - Considering a universe of stocks and a benchmark return (e.g. the S&P 500 index), creating a portfolio with weights on stocks, and comparing the return on that portfolio with the benchmark return.
- Neither is zero-cost or market neutral.
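
To make the four conditions in the HJTW definition concrete, the sketch below evaluates them for a toy profit process. Everything about the process is an assumption made for illustration: the increments are taken to be independent and Gaussian with constant mean `mu` and standard deviation `sigma * i**lam` at step `i`, so that $v(t)$ is normal and its moments are available in closed form. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hjtw_diagnostics(t, mu, sigma, lam):
    """Diagnostics for v(t) = sum_{i=1..t} (mu + sigma * i**lam * z_i),
    with z_i i.i.d. N(0, 1) and v(0) = 0 by construction (toy assumption).

    Returns (E[v(t)], P(v(t) < 0), Var(v(t)) / t).
    """
    mean = mu * t
    var = sigma ** 2 * sum(i ** (2.0 * lam) for i in range(1, t + 1))
    prob_loss = normal_cdf(-mean / math.sqrt(var))
    return mean, prob_loss, var / t


# Decaying volatility (lam < 0): all three limiting conditions look satisfied.
for t in (10, 1000, 100000):
    print(t, hjtw_diagnostics(t, mu=0.01, sigma=1.0, lam=-0.5))

# Constant volatility (lam = 0): Var(v(t))/t stays at sigma^2, so condition 4 fails.
print(hjtw_diagnostics(1000, mu=0.01, sigma=1.0, lam=0.0))
```

The contrast between the two cases shows why a positive expected profit alone is not enough: with constant per-period volatility the time-averaged variance does not vanish, even though the probability of a loss still shrinks.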

Structure of Statistical Arbitrage Trading Strategies
- Determine a universe of stocks.
- Select a formation period, a time period over which stocks’ performances are measured.
- Find a pair of stocks that exhibit a long-term stationary pattern (pairs trading), or rank stocks using some performance measure (value, momentum, reversals) defining “winners” and “losers.”
- Develop a zero-cost, long-short portfolio (long the winners, short the losers) and hold it for some given time period.
- Continue to construct and hold portfolios based on the given criterion (e.g. value, momentum or reversals) and evaluate the long-run results.

Some Basic Strategies: Pairs Trading
- Using statistical methodology, identify two or more stocks whose co-movement exhibits a long-term relationship (“mean-reversion”).
- Follow these stocks until they diverge by a sufficient amount (opening signal), i.e. based on the statistical model, one is priced relatively too high and the other is priced too low. Go short in the overpriced stock and go long in the underpriced stock. Hold until convergence.
- Need to identify pairs or baskets to trade and need to specify trading strategy parameters (e.g. opening and closing signals, bailout criteria).
- Strategy is zero-cost and market neutral.
- We will consider the papers by Gatev et al. and by Avellaneda and Lee (traded ETFs versus component stocks).

Gatev et al.
[Figure: From Gatev et al., illustrating pairs trading]

Value, Momentum and Reversals: 1
- Value investing ranks all stocks in a universe according to a financial variable (e.g.
book value) divided by market value (BM).
- Lakonishok, Shleifer and Vishny (1993) created long/short portfolios every 6 months, going long in the top 10% and short in the bottom 10%, and held them for some period.
- This zero-cost strategy yielded excess profits of alpha = 10.5% per year over 25 years.

Value, Momentum and Reversals: 2
- Momentum changes the investing maxim “buy low and sell high” to
  - “Buy high, sell higher” or “Short low, cover lower”
- In 1993 Jegadeesh and Titman introduced a momentum strategy. Rank all stocks in a universe according to their returns over a formation period. Every 6 months they bought the top 10% and shorted the bottom 10%. Over a 23 year period, this zero-cost strategy yielded 1% to 1.5% per month in excess returns.

Value, Momentum and Reversals: 3
- A third strategy, short term reversals, was also documented by Lehmann (1990) and Cooper (1999) to show very significant excess returns. It is exactly the opposite of momentum strategies, with holding periods on the order of 4 or 5 days.
- These three strategies produced major “anomalies” and led to the rejection of the efficient market hypothesis.
- However, these strategies became widely known, and by 2000 it was thought that they were no longer exceptions to the EMH.
- New paper: “Value and Momentum Everywhere” by Asness, Moskowitz and Pedersen, July 2013, takes issue with this. This paper argues not only that these strategies are still viable, but also that they apply to many different asset classes.
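
The rank-then-trade recipe shared by the value, momentum and reversal strategies above (rank the universe over a formation period, go long the top 10%, short the bottom 10%) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of any of the cited papers: equal weighting within each leg, the function names, the tickers and the return numbers are all hypothetical.

```python
def long_short_weights(formation_scores, fraction=0.10):
    """Dollar-neutral weights: long the top `fraction`, short the bottom `fraction`.

    formation_scores maps ticker -> ranking variable measured over the
    formation period (e.g. past return for momentum, BM for value).
    Each leg is equally weighted; the long leg sums to +1 and the short
    leg to -1, so the initial cost of the portfolio is zero.
    """
    ranked = sorted(formation_scores, key=formation_scores.get)
    k = max(1, int(len(ranked) * fraction))
    if 2 * k > len(ranked):
        raise ValueError("universe too small for the requested fraction")
    losers, winners = ranked[:k], ranked[-k:]
    weights = {ticker: 0.0 for ticker in ranked}
    for ticker in winners:
        weights[ticker] = 1.0 / k
    for ticker in losers:
        weights[ticker] = -1.0 / k
    return weights


def holding_period_pnl(weights, holding_returns):
    """Profit per unit of capital in each leg over the holding period."""
    return sum(w * holding_returns[ticker] for ticker, w in weights.items())


# Hypothetical 20-stock universe with made-up formation-period returns.
formation = {f"S{i:02d}": (i - 10) / 100.0 for i in range(20)}
w = long_short_weights(formation)
print(sum(w.values()))                   # net initial cost of the portfolio
print(holding_period_pnl(w, formation))  # P&L if holding returns repeat formation returns
```

For a short-term reversal strategy the same function can be reused by negating the scores, so that recent losers are bought and recent winners are shorted.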

2000, The Rise of Electronic Trading: Creative Destruction of the Traditional Industry
- Just as computers have led to the creative destruction of industries like the music industry and the publishing industry (and maybe eventually education), computers have led to enormous changes in the finance industry.
- New stock exchanges (e.g. BATS, Direct Edge, dark pools) and mergers of exchanges
- Trading is dominated by electronic trading
- Co-location of traders’ computers at the exchanges to minimize latency
- Arms race among market players to get the fastest response to market conditions and news

Consolidation of Financial Markets
[Figure: Consolidation of Financial Markets]

Market Microstructure: 1
- How are prices determined? If you place an order to buy or sell some amount, what price will you ultimately pay or receive?
- On many exchanges the price of a stock is determined through the limit-order-book.
- 1871 was the beginning of the specialist system. Specialists maintained a list of buy and sell orders for a particular stock, handled order fulfillment, and were charged with maintaining an orderly market for a stock when there was a significant order imbalance.

Market Microstructure: 2
- Decimalization was introduced on August 28, 2000 and was fully implemented on January 29, 2001.
- This dramatically reduced bid-ask spreads, thus lowering the cost of trading. Volumes soared into the billions.
- Specialists began to be replaced by electronic limit-order-books and market makers.
- From 2001 to 2007 it is estimated that electronic trading in equities grew from 25% to 80%.
Much of that 80% is coming from algorithmic and high-frequency traders (computers) making trades at the microsecond level.

Limit-Order-Books: 1
- Limit-order-books are visible; consequently, placing a limit order, especially a large order, will have an impact on the entire book.
- Computers are tracking order imbalance, so the placement of a large order will move the market.
- A large sell (buy) order will cause prices to fall (rise) against the interests of the party placing the order.
- Need a sophisticated strategy to move a large block of stock without having severe market impact.

A Limit Order Book
[Figure: A Limit Order Book]

Limit-Order-Books: 2
- We need a sophisticated strategy to move a large block of stock without having severe market impact.
- This problem of market impact has led to the establishment of new exchanges called “dark pools.” These are exchanges in which the “lights are out,” i.e. the limit-order-book is not visible to market participants.
- This has led to a whole new set of issues concerning how to gather information about a market in which the limit-order-book is not visible.

New Exchanges
- The new era of electronic trading has been fostered by the SEC through two important regulatory changes:
  - Reg ATS in 1998, which permitted the establishment of ECNs (Electronic Communication Networks)
  - Reg NMS (National Market System), which unified those ECNs into a single market system and assured that investors would obtain the NBBO (National Best Bid Offer)
- One of the first and most important ECNs was Island, founded by Joshua Levine, a Carnegie Mellon student.
Island was sold to Instinet (later merged into Nasdaq) for $500M.

High Frequency Trading
- There has been an “arms race” among investment banks in acquiring the fastest possible computing and networking to 1) react to changes in the limit-order-books and 2) react to news.
- In 2007, the CTO of the NYSE reported that one firm estimated a reduction of 1 millisecond in latency was worth $100 million/year to the bottom line.
- Communications systems to minimize latency between exchanges (e.g. New York and Chicago or New York and London) are being built.
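
The market-impact point made in the limit-order-book slides (a large order moves prices against the party placing it) can be illustrated with a toy book. The sketch below is an assumption-laden illustration rather than a model of any real exchange: the book is a static list of hypothetical ask levels, the order is a market buy that consumes the cheapest displayed liquidity first, and no other participant reacts.

```python
def walk_the_book(asks, quantity):
    """Fill a market buy order against ask levels sorted by ascending price.

    asks: list of (price, shares available) tuples.
    Returns (average fill price, highest price level reached).
    """
    remaining = quantity
    cost = 0.0
    last_price = None
    for price, size in asks:
        if remaining <= 0:
            break
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        last_price = price
    if remaining > 0:
        raise ValueError("not enough displayed liquidity to fill the order")
    return cost / quantity, last_price


# Hypothetical ask side of a book: (price, shares available).
asks = [(10.00, 500), (10.01, 300), (10.02, 1000), (10.05, 2000)]
small = walk_the_book(asks, 200)    # filled entirely at the best ask
large = walk_the_book(asks, 1500)   # eats through three price levels
print(small, large)
```

The large order pays a higher average price than the small one because it exhausts the best levels, which is the simplest form of the impact that block-trading strategies and dark pools try to avoid.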