Hogeschool-Universiteit Brussel
Faculty of Economics & Management

Master Thesis

A Tale of Market Efficiency: A Methodological Digress

Tim VERHEYDEN (153947)
Master of Science Handelsingenieur
Academic year 2012-2013, defended June 2013
Promotor: Prof. Dr. Filip VAN DEN BOSSCHE
Copromotor: Prof. Dr. Lieven DE MOOR

Abstract

The efficient market hypothesis (EMH) has been subject to debate for decades and has both proponents and opponents. In fact, the field of behavioral finance was developed in response to the body of anomalous evidence with regard to the EMH. Reviewing seminal work that underlies the theory on efficient markets, we provide a historical context for the current debate. Special attention is devoted to the development of appropriate methodology to test for weak form market efficiency, the least restrictive version of the EMH. Methodologies developed in the early aftermath of the debate are explored and tested for robustness to gain a better understanding of how the debate came into existence. More recent alternative approaches are discussed and examined to distill suggestions and recommendations for future research. Additionally, we test the adaptive markets hypothesis (AMH) as a theoretical alternative to the EMH and discuss the results, together with our insights from the methodology, as a way out of a debate of many years' standing. We advance the thesis that the current debate can to some extent be explained by the sensitivity of the most widely applied methodologies to test for weak form market efficiency. Alternative methodologies like rolling estimation windows and time-varying parameter models also prove useful to test for efficiency. Finally, we find the AMH to be helpful in reconciling the views of the EMH and behavioral finance, but further research is needed to overcome a conundrum that presents itself when testing for efficiency through the predictability proxy.

Keywords: efficient market hypothesis; behavioral finance; weak form market efficiency; adaptive markets hypothesis
JEL codes: B26, G02, G14

1. Introduction

In the world of academic finance, researchers have been at odds over the informational efficiency of stock markets for more than 40 years. In fact, one could go back to the 18th century to see that even Adam Smith (1759, 1776), father of modern economics, was torn between two opinions on the efficiency and self-stabilizing nature of financial and economic markets. Are stock prices quoted on stock markets in line with the intrinsic value of the underlying financial asset? This question remains to be answered, as academics cannot seem to find common ground. Different statistical approaches have been developed to address the question of market efficiency, and researchers have widely adopted the same kind of methodologies (Lim & Brooks, 2011). Even so, no common ground has been found.
The fact that scholars continue to disagree over a certain issue, even when examining it in exactly the same way, makes us wonder: maybe the way market efficiency has been examined is biased to begin with? Instead of focusing on the question of whether stock markets are efficient, this paper looks into the far less popular question of why the debate remains unsettled. Traditional methodologies applied over the years are tested for robustness to gain a better understanding of how the debate came into existence. Alternative methodologies and an alternative theoretical framework are examined to gain valuable insights that can help to settle the debate in the future.

Before going into the debate and the methodological issue in more detail, we can start by defining efficiency based on the work of Fama (1970). When he wrote a review of earlier research on the efficiency of financial markets, he decided to bundle the evidence in a new concept called the efficient market hypothesis (EMH). According to Fama (1970, p. 383), "a market in which prices always fully reflect available information is called efficient." Although this definition provides us with a first idea of what an efficient market is, it does not really explain what is meant by available information. This is why Fama (1970) included some elaboration on this definition, making a distinction between three types of efficient markets, depending on what information is comprised in the information set:

- Weak form efficient market (information set = historical price information);
- Semi-strong form efficient market (information set = all publicly available information);
- Strong form efficient market (information set = all information, both public and private).

The original idea of making a clear distinction between different forms of market efficiency comes from Roberts (1967), but Fama (1970) was most successful in introducing the concept to the general public. With this additional knowledge on what is meant by available information, it becomes clear what efficient markets are. For example, if stock market prices fully reflect all past price information of the different stocks, the market is called weak form efficient. This also implies that it is impossible to generate excess returns based on past price information. What this actually means is that so-called technical analysis [1] of stocks is obsolete and does not generate any risk-adjusted excess returns over the return of the general market. A semi-strong form efficient market fully reflects all publicly available information. Thus, it is impossible to make excess returns based on information that is publicly available to all financial market participants. Suppose that in today's newspaper, a certain company announces news that affects its stock market price. Under the assumption of semi-strong form efficient markets, this information is not useful to make any profits, as the market would already fully reflect this publicly available information. What this also means is that fundamental analysis [2] of stocks is rendered ineffective.

[1] Technical analysis consists of looking at charts of past prices and returns of a stock, in order to derive a certain pattern that can be extended into the future to make profitable predictions of future price movements (Brown & Jennings, 1989).
[2] Fundamental analysis consists of researching all publicly available information (e.g. financial statements) about a certain stock to infer important insights that can be used to make a profit in the stock market (Kothari, 2001).

When a stock market is strong form
efficient, even inside traders with private information on a stock are not able to generate excess returns, adjusted for risk, over the general market. This implies that the prices of stocks fully incorporate all possible information, whether it is publicly available or not.

The question remains whether the EMH is a valid academic theory. More specifically, we have to investigate the different forms of the EMH to be able to conclude whether some form of the EMH is valid. The strong form efficient market theory has never been believed to be accurate. Rather than debating the flaws of this theory, practitioners should recognize that it has some theoretical value in fully explaining the concept of market efficiency. The semi-strong form efficient market theory has never been subject to serious scrutiny; until recently, most people believed it to hold true. The only question, however, is on which time scale. For example, even if we do believe that a stock market fully reflects all available information, we still have to clarify within which time frame we believe this to be true. Given current technology, it is reasonable to assume that a news announcement in the daily newspaper will already be incorporated in the stock price by the time that someone reads it. It remains unclear, however, whether it is reasonable to assume that newly announced information on a newspaper's website will also be immediately reflected in the stock market price. Therefore, when we accept the semi-strong form of the EMH, we also need to specify on what time scale we accept it. Different researchers have tried to provide an answer to this time scale question using event studies (Lim & Brooks, 2011). However, the conclusions are not consistent and seem to depend on the point in time the research was conducted. To circumvent this inconvenience, and to take into account the absolute non-believers of semi-strong form efficient markets, researchers have shifted their focus to the weak form of the EMH. The greatest benefit of this shift is that it reduces the debate to its fundamental level. If stock markets appear to be weak form efficient, then the EMH is valid in the weak form. If the opposite applies, the EMH is invalid in any form, as the weak form is the last possible form in which the EMH could possibly hold true. For this exact reason, the focus of this thesis is also on the weak form of the EMH.

Now that the debate has been properly redefined to the question of whether the weak form of the EMH holds true, we can start exploring the answer to this question. At first glance, most people would assume this definition to hold true. However, it is of crucial importance to consider every word in the definition. After proper investigation, most researchers began to zoom in on the words "fully reflect". It might seem trivial that, given the current state of information technology, past price information is reflected in the current prices of stocks. However, the question remains whether current stock prices fully reflect this prior information, and thus whether past information is correctly incorporated or not. From the 1970s onward, academics started to elaborate on this question. In later years, two different schools of thought started to form.
On the one hand, the proponents of the EMH argue that financial markets are perfectly capable of aggregating information from all investors, which in turn leads to efficient markets. If the price of a stock were to appear too high given past price information, rational investors would bid the price down to make a profit, and vice versa. On the other hand, some researchers started looking into the psychology of investors. In close collaboration with psychologists, the field of behavioral finance was established. Proponents of behavioral finance believe that investors are not always fully rational and therefore are not able to force the stock market to be efficient at all times (e.g. Shefrin, 2000). The debate between these two schools of thought is still going on today. Over the last decade, fewer researchers were interested in trying to settle the debate, but the U.S. housing bubble, which eventually triggered the current sovereign debt crisis, sparked newfound interest in this matter.

One of the reasons for the two views to collide is directly related to the dual nature of the fields of economics and finance. Finance scholars have always had a predilection for the scientific approach that is used in the exact sciences. The difference with the exact sciences, however, is that the subjects of finance are still people, who as social creatures are often not programmed according to some rational model that is always valid. This very duality about the scientific nature of finance also caused the two different schools on efficient markets to form. On the one hand, proponents of the EMH believe that investors are rational optimizers who are able to make the best possible decisions given certain information. Advocates of behavioral finance, on the other hand, argue that investors are not really able to act rationally at all times. They refer to recent bubbles and financial crises to point out that there are different psychological effects that cause human beings to stray from rational decision making.

There is still no consensus on whether the EMH is valid or not. Earlier research signaled that there might be some flaws to the existing methodology, without undertaking further empirical investigation (e.g. Campbell, Lo, & MacKinlay, 1997; Lim & Brooks, 2011). Our work, to the best of our knowledge, is the first to specifically focus on the role of methodology in the debate and the robustness of the most widely applied tests of weak form market efficiency. Clearly, methodology was developed over the years to enable researchers to test for market efficiency. However, methodology might also have been the cause of the debate, since researchers have failed to find common ground even when adopting similar methodologies. In order to evaluate the role methodology has played in the development of the debate, we study different approaches applied in the early aftermath of the conception of the EMH and examine two of them for robustness: Lo and MacKinlay's (1988) variance ratio test and the augmented Dickey-Fuller test. Next to the exploration and analysis of these more traditional tests, alternative methodologies that emerged over the last decade are considered as well to infer insights on how to help settle the debate in the future. The currently best-developed alternative test is checked for robustness. Our robustness analyses are applied to data from the Dow Jones Industrial Average (DJIA), the Standard & Poor's 500 (S&P-500) and the NASDAQ, which are three well-established U.S.
stock market indices that are often used in efficiency research. We also consider the largest Belgian stock market index – the BEL-20 – to verify whether our conclusions also hold true outside the United States of America. The results and insights from examining the different types of methodologies are discussed together with a possible reconciling theoretical framework: the adaptive markets hypothesis (Lo, 2004, 2005). From our analysis, we derive some valuable lessons from the past and formulate some suggestions for future research to come to a definitive view on efficient markets.

The importance of valid financial models cannot be overstressed, as policy makers and investors tend to act upon academic theory. In many instances, this seemed to work out for the best, as was the case with the concept of diversification from optimal portfolio theory (Markowitz, 1952). However, when academic theory is flawed, it has the potential to set the entire economy astray. One example is the housing bubble that caused the global financial crisis of 2008. While policy makers, bankers and investors were blindly following the bullish market, irrational exuberance was building up underneath (Shiller, 2000). Today, we are still trying to deal with the consequences, and even the future of an entire generation is at stake.

In the next section, we review the literature relevant to the efficient market debate. Our empirical work starts in the third section, which looks into three types of traditional tests of market efficiency. Next, we consider alternative methodologies that have emerged more recently. In the fifth section, we investigate a new theoretical framework that might provide reconciliation between both schools on efficient markets. An integrated discussion of all results obtained through our empirical work is presented in section six; section seven concludes.

2. Literature review [3]

[3] Our review of the literature significantly benefited from the earlier work of Sewell (2011), who presents a more elaborate view of the EMH literature. Contrary to Sewell (2011), our paper focuses on the role of methodology to gain a better understanding of the historical development of the debate and insights for future research.

2.1. Economic origins of market efficiency

We can trace back the intellectual debate on the efficiency of financial and economic markets to the father of modern economics, Adam Smith. When thinking about the work of Smith, many researchers refer to his seminal book on the wealth of nations (Smith, 1776). In this book, he explained the theory of the invisible hand and argued for the self-stabilizing nature of economic markets. Consequently, many researchers claim that Smith believed economic and financial markets to be efficient and any form of market intervention to be obsolete. However, what these researchers fail to recognize is that Adam Smith produced more than just his book on the wealth of nations. In fact, seventeen years earlier, he wrote a book on the theory of moral sentiments (Smith, 1759), in which he pointed to apparent behavioral biases in the human decision making process. Clearly, these observations are in contrast with the argument that he believed economic markets to be perfectly efficient. In order to prevent further intellectual abuse of Adam Smith's work, Vernon Smith (1998) wrote a paper on the apparent contradictions between both works, concluding that the beliefs held by Adam Smith were far more nuanced than one would believe when only reading The Wealth of Nations (1776). In conclusion, it would be intellectually unfair to contend that even Adam Smith believed economic and financial markets to be efficient.
2.2. Statistical foundations of market efficiency

Important building blocks for the later development of a theory on the efficiency of financial markets were provided by the theory of probability, which has its origins in the world of gambling. The first mathematical work on probability theory dates back to 1564 and was also a guide to gambling: Liber de Ludo Aleae (The Book of Games of Chance), by the Italian mathematician Girolamo Cardano. According to Hald (1990), Cardano considered different dice and card games, giving readers advice on how to gamble. More than just a guide for gamblers, the work of Cardano is also of scientific relevance, given his theoretical digressions on the possible outcomes of games of chance. In fact, Cardano defined the terms probability and odds for the first time and even presented what he believed to be the fundamental principle of gambling: equal conditions.

Next to the work of Cardano, most early research that was essential in the later development of a theory on efficient markets was conducted in the 19th century. For example, Brown (1828) observed what we now call a Brownian motion for the first time, when he was looking through the microscope and noticed the apparently random movement of particles suspended in water. In later years, Regnault (1863) proposed a theory on stock prices when he found that the deviation of the price of a stock is directly proportional to the square root of time: a relation that is still valid in the world of finance today. The first statement about the efficiency of financial markets came from Gibson (1889, p. 11) in his book about the stock markets of London, Paris and New York: "When shares become publicly known in an open market, the value which they acquire there may be regarded as the judgment of the best intelligence concerning them." Marshall (1890) transformed economics into a more exact science, drawing from the fields of mathematics, statistics and physics. He popularized the usage of demand and supply curves and marginal utility, and brought together different elements from welfare economics into a broader context. The influence of Marshall on the field of economics was significant, in particular because his book on the principles of economics became a seminal work in the field. At the very end of the 19th century, Bachelier (1900) finished a PhD thesis in which he was the first to present a mathematical model for the motion that Brown (1828) had observed. The stochastic process developed by Bachelier became one of the centerpieces of finance, as Samuelson (1965) based his explanation of the random walk, a term introduced by Pearson (1905), on Bachelier's early research.

2.3. Economic foundations of market efficiency

After this elaboration on some of the statistical origins of market efficiency, we have a look at the work of some notable economists, before and after the Great Depression of 1929. A very prominent researcher at that time was Fisher, who made multiple contributions to the field of finance (Fox, 2009).
He made great progress in the search for a general equilibrium theory and provided important insights for utility theory, which later proved useful for von Neumann and Morgenstern (1944) in their definitive book on general utility theory. Despite some of his brilliant contributions, Fisher became even more famous because of his public statements prior to the Great Depression that started in 1929 (Fox, 2009). Fisher was advocating the collection of data to approach the financial market in a much more scientific way than before. Through his revolutionary statistical analysis of stock market prices, he was able to make predictions about future price levels, which led him to publicly announce that the boom in stock prices prior to the 1929 crash was the prelude to a "permanently high plateau". When only a few days later stock prices plunged like never before, Fisher was publicly humiliated. Subsequent work of Fisher was received with great suspicion, even though it later appeared to be as brilliant as most of his pre-1929 work.

Very much like Marshall and Fisher, Cowles (1933, 1944) tried to turn economics into a more exact science and found that investors are unable to beat the market by means of price forecasting. Working (1934) reached similar conclusions, stating that stock returns exhibit behavior similar to lottery numbers. Together, the work of Cowles and Working points towards what was later called an informationally efficient stock market. In 1936, Keynes published his seminal book General Theory of Employment, Interest, and Money. In this work, which mostly impacted and shaped the field of macroeconomics, Keynes introduced the concept of animal spirits. According to him, investors base their decisions on a "spontaneous urge to action, rather than inaction, and not on the outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities" (pp. 161-162). One year later, Cowles and Jones (1937) published a paper that provided early evidence of serial correlation in time series of stock prices. Together with the more theoretical work of Keynes, this empirical evidence formed an early challenge to the existence of efficient markets. Nevertheless, a real discussion on the efficiency of financial markets only emerged after the establishment of the EMH by Fama in 1970.

As a result of their collaboration during the war, von Neumann and Morgenstern (1944) published their book on the theory of games and economic behavior. Not only was the book the starting point of game theory, it also proved to be essential in the development of a theory on efficient markets. The most important piece of theory in their book was about the maximization of what was called expected utility: a new concept for dealing with uncertainty by multiplying probabilities with the utilities of potential outcomes. After the Second World War, Markowitz (1952) published his paper on portfolio selection. Operating within the mean-variance framework, he presented a model in which it was possible to determine the optimal portfolio of securities, providing a maximum level of return given a certain level of risk. Central to his theory was the idea of diversification as a way of getting rid of all idiosyncratic or uncorrelated risk, leaving only the so-called systematic risk that is common to all securities. The approach of Markowitz in trading off risk and return was very similar to what other economists had been occupied with during the war: considering the trade-off between power and precision of bombs (Fox, 2009).
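In modern textbook notation (a formulation of ours, not Markowitz's original notation), the mean-variance problem can be summarized as choosing the vector of portfolio weights $w$ that minimizes portfolio variance for a targeted expected return $\mu_p$, with $\Sigma$ the covariance matrix of security returns and $\mu$ the vector of expected returns:

$\min_{w} \; w^{\top} \Sigma\, w \quad \text{subject to} \quad w^{\top} \mu = \mu_p, \quad w^{\top} \mathbf{1} = 1$

Solving this problem for every attainable $\mu_p$ traces out the efficient frontier. Because the objective $w^{\top} \Sigma\, w$ depends on the covariances between securities, combining imperfectly correlated securities lowers portfolio risk without necessarily lowering expected return, which is the essence of the diversification argument above.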
2.4. Asset pricing revolution

Sharpe (1964) revolutionized the world of finance by presenting the capital asset pricing model (CAPM). Building on the earlier work of Markowitz (1952), the CAPM allows for the calculation of a theoretical rate of return on an asset, given the amount of non-diversifiable risk the asset entails. The reason only non-diversifiable risk is taken into account is the assumption that the asset is added to a well-diversified portfolio that neutralizes idiosyncratic risk entirely. Asset pricing models, like the one presented by Sharpe, were very important in the debate on efficient markets that emerged in later years, as they provided researchers with the opportunity to theoretically derive the price and return of financial assets. That way, it was possible to examine whether the actual return on an asset was in line with the theoretical rate of return derived from the underlying asset pricing model.

In later years, scholars came across some interesting asset pricing anomalies and argued that the CAPM was too limited, accounting for only one factor of risk. Ross (1976) came up with an alternative: arbitrage pricing theory, which is far more flexible than the model of Sharpe and states that the expected return on an asset is a linear function of different factors of risk, each with their respective factor sensitivity. Whenever the actual return on the asset deviates from the one derived from the theoretical model, the force of arbitrage brings the actual rate of return back in line with the theoretical one. By discounting for several sources of risk instead of just non-diversifiable risk, the model addresses the major flaw of the CAPM. However, the model of Ross is very general and does not give any guidelines as to what specific factors of risk to account for. In 1993, Fama and French further improved asset pricing theory by presenting their three-factor model. Starting from their observation of pricing anomalies with respect to market capitalization and growth vs. value strategies, they found the expected rate of return to depend on the exposure of the asset to each of three factors: the market risk premium (non-diversifiable risk), market capitalization and the book-to-market ratio. With their model, they not only addressed the biggest flaw of the CAPM (only one risk factor), but also were specific in formulating their factors of risk, unlike Ross. Following the momentum puzzle pointed out by Jegadeesh and Titman (1993), Carhart (1997) extended Fama and French's model to a four-factor model, taking into account a momentum risk factor.

However important asset pricing models were in the debate on efficient markets, a conundrum presented itself. It was never entirely certain what the correct theoretical price of an asset was, as different models accounted for different factors of risk and could yield different theoretical rates of return. This conundrum has come to be known as the joint-hypothesis problem, which we address later on in this literature review.

2.5. The beginning of efficient market theory

The idea of an efficient market was first described by Samuelson (1965) when he showed that a stock market is informationally efficient when prices fluctuate randomly, given that the market contains all available information and expectations from market participants. In the same year, Fama (1965a) also defined an efficient market for the first time. Based on the empirical investigation of stock market prices, he observed that financial markets follow a random walk.
Another paper by Fama (1965b) elaborated on the random walk pattern in stock market prices to show that technical and fundamental analysis could not possibly yield risk-adjusted excess returns. Fama and Blume (1966) considered the profitability of technical trading rules like the popular filter rule [4] that was described by Alexander (1961, 1964). They concluded that no economic profits could be made using these filter rules, since trading costs would be too high even when adopting the most profitable very small-width [5] filters. This also confirmed their belief that financial markets are informationally efficient. Roberts (1967) was the first to coin the term efficient market hypothesis (EMH) and suggested a distinction between several types of efficiency.

[4] Example of an x% filter rule: buy and hold securities of which the daily closing price moves up by at least x%, until the price moves down by at least x% from the subsequent high, at which point the security is simultaneously sold and a short position is taken. The short position is then maintained until the daily closing price of the security rises at least x% above the subsequent low, after which the short position is covered and the security is bought again (Alexander, 1961, 1964). A short code sketch at the end of this subsection makes the rule concrete.
[5] A very small-width filter is a filter in which x lies between 0.5% and 1.5% (Alexander, 1961, 1964).

The definitive paper on the EMH was published by Fama (1970) in the form of the first of his three reviews of the theoretical and empirical work on efficient markets. He defined an efficient market to be a market that fully reflects all available information and introduced three different types of informational efficiency. Summarizing results from weak form, semi-strong form and strong form efficiency tests, Fama concluded that almost all of the early evidence pointed towards a financial market that was efficient in at least the weak sense. Although he found some price dependencies, they never sufficed to be used in profitable trading mechanisms, making markets weak form efficient. Fama also considered the joint-hypothesis problem. Essentially, he argued that it would be impossible to ever correctly test the EMH, because no academic consensus was found on the true underlying asset pricing model. Whenever a test of market efficiency rejected the efficiency hypothesis, there was always the possibility that this was simply due to the underlying asset pricing model finding an incorrect theoretical asset value. The only conclusion that can be drawn from efficiency tests is that a market is efficient or not with respect to a certain underlying asset pricing model. The same conclusion can never be made independently of the underlying model.

Besides Fama (1970), other researchers have attempted to formulate a clear definition of what is meant by an efficient market. Jensen (1978, p. 96) wrote that "a market is efficient with respect to information set θt if it is impossible to make economic profits by trading on the basis of information set θt." Malkiel (1992) stated that a stock market is efficient whenever the prices of stocks remain unchanged, despite information being revealed to each and every market participant. Even though there is a lot of academic merit to the definitions of Jensen and Malkiel, we adopt the definition of Fama, as explained in the introduction. More particularly, this thesis focuses on weak form market efficiency, i.e. the set of available information we consider consists only of historical price information (Fama, 1970).
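To make the filter rule of footnote [4] concrete, the following minimal Python sketch (our own illustration; the function name, the use of closing prices as input and the +1/-1 position encoding are assumptions, not part of Alexander's (1961, 1964) or Fama and Blume's (1966) original implementations) derives the position a trader following an x% filter would hold:

    def filter_positions(closing_prices, x=0.01):
        """Positions implied by an x% filter rule (sketch after Alexander,
        1961, 1964): go long after a rise of at least x% from the running
        low, switch to a short position after a fall of at least x% from
        the subsequent high. The first x% move sets the initial position."""
        position = 0                       # 0 = not yet in the market, +1 = long, -1 = short
        high = low = closing_prices[0]     # running reference extremes
        positions = []
        for price in closing_prices:
            high, low = max(high, price), min(low, price)
            if position != 1 and price >= low * (1 + x):
                position, high = 1, price    # up-move of x% from the low: cover any short and buy
            elif position != -1 and price <= high * (1 - x):
                position, low = -1, price    # down-move of x% from the high: sell and go short
            positions.append(position)
        return positions

A very small-width filter in the sense of footnote [5] would correspond to, for example, x = 0.005. Transaction costs are deliberately ignored in this sketch, which is precisely the element that made the rules unprofitable in Fama and Blume's (1966) study.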
2.6. Early aftermath of the efficient market hypothesis

In the early aftermath of Fama's work (1965a, 1965b), a lot of research was conducted to test the validity of the EMH. As Fama (1970) concluded in his first review paper, much of the empirical evidence pointed towards a weak form efficient stock market. Soon, though, different scholars found contradicting evidence as well. Kemp and Reid (1971) pointed out that much of the earlier research considered only U.S. stock market data. Using British data, they found stock price movements to deviate from what is expected under the random walk hypothesis, which contradicts what was argued by Fama. Grossman (1976) found the first evidence of an important paradox: "informationally efficient price systems aggregate diverse information perfectly, but in doing this the price system eliminates the private incentive for collecting the information" (p. 574). In his literature survey, Ball (1978) pointed out consistent excess returns after the public announcement of firms' earnings, which is a clear violation of the theory on semi-strong form efficient markets, as no excess returns should be possible when trading on public information. When looking at long-term interest rates, Shiller (1979) found that the observed volatility is in excess of that predicted by expectations models. This observation implies some extent of forecastability of long-term interest rates, which contradicts the EMH.

The most convincing piece of contradicting evidence came from the paradox that was presented by Grossman and Stiglitz (1980), following the earlier work of Grossman (1976). In order for investors to be motivated to spend resources on collecting and analyzing information to trade on, they must have some form of incentive (Grossman & Stiglitz, 1980). If a stock market were perfectly efficient, however, there would be no reward for collecting information, since that information would already be reflected in the current stock price. This simple paradox shows that financial markets can never become entirely efficient, as no investor would be motivated to collect information in the first place. Consequently, no one would trade on new information and it would become impossible for stock market prices to reflect all available information.

2.7. Establishment of behavioral finance

Following the paradox presented by Grossman and Stiglitz (1980), polarization on the topic of efficient markets became apparent. More and more researchers observed pricing anomalies, and the validity of the EMH became increasingly uncertain. De Bondt and Thaler (1985) tested the hypothesis that investors tend to overreact to unexpected and dramatic news events by looking at the performance of extreme portfolios over three years. They found that portfolios of stocks that performed poorly over the last three to five years tend to significantly outperform portfolios of stocks that performed well over this period in the next three to five years. This finding was consistent with the overreaction hypothesis and pointed to weak form inefficiencies. In later years, Lakonishok, Shleifer and Vishny (1994) conducted similar empirical research, using proxies for value instead of historical price information. The results they found also revealed market inefficiencies. Next to the contradicting evidence from empirical research in the field of finance, some psychologists started to point out behavioral biases that challenged the EMH.
Most significantly, Kahneman and Tversky's (1979) prospect theory explained how investors tend to make decisions when risk is involved, and provided an alternative to expected utility theory. People tend to be loss averse, as they hate losing more than they love winning. Rather than simply multiplying probabilities and utility, prospect theory suggests that a distinction is made between losses and gains, which transforms the expected utility function. The behavioral biases presented in prospect theory are in direct contrast with a theory of efficient markets in which investors were originally assumed to be fully rational. Together with the paper of De Bondt and Thaler (1985), the work of Kahneman and Tversky (1979) can be seen as the beginning of the field of behavioral finance, in which the traditional theory of finance is merged with concepts from other social sciences like psychology and sociology. Behavioral finance tries to formulate an alternative to the EMH by assuming that investors are not perfectly rational, which leads to anomalies in stock pricing (e.g. overreaction), which in turn causes stock markets not to behave efficiently at all times.

2.8. Development of tests for weak form market efficiency

A very interesting day, both for the world of practice and for academics, was October 19th, 1987: Black Monday. This day became infamous because stock markets around the world crashed. Starting in Hong Kong, the crash spread to Europe and eventually the United States later the same day. The DJIA dropped by 508 points or 22.61%, which to this day is the largest percentage drop ever in its value. Different phenomena came together to cause this dramatic event: program trading, market psychology, overvaluation and eventually illiquidity (Shiller, 1989). Despite its negative impact on investors and the global economy, this unique event also provided researchers with valuable new data for scientific analysis. Advocates of behavioral finance also pointed towards Black Monday to further illustrate that investors are not fully rational and overreact to information in times of market mania. Together with the valuable data from the Black Monday crash, the evolution of computing power allowed researchers to come up with new and more advanced empirical tests of market efficiency (Bodie, Kane, & Marcus, 2010). In our literature review, we focus on three particular types of statistical tests of the weak form of the EMH.

A first group of weak form market efficiency tests looks at return autocorrelations. The general philosophy behind these tests is the following: if significant autocorrelation is found among the returns on a stock, there is some degree of predictability, which contradicts the EMH (Lim & Brooks, 2011). The empirical work testing return autocorrelations can be split based on the horizon of returns. Autocorrelations in the short run (day, week, month) tend to be positive for returns on portfolios (e.g. Conrad & Kaul, 1988; Lo & MacKinlay, 1990, 1997) and negative for returns on individual stocks (e.g. Lehmann, 1990; Jegadeesh, 1990). Autocorrelations in medium horizon returns (1-12 months) tend to be positive; for the long horizon (1-5 years) return autocorrelations tend to be negative (Bodie et al., 2010). For short horizon returns, Lo and MacKinlay (1990, 1997) find significant autocorrelation among returns on S&P-500 stocks. However, the pattern of autocorrelation is weaker for weekly and monthly returns, and for large rather than small stocks.
Jegadeesh and Titman (1993, 2001) considered stock returns in the medium horizon and found significant evidence of momentum profits, which gave rise to an important puzzle in asset pricing theory. Proponents of behavioral finance used this finding of momentum profits to argue that a gradual adjustment of prices causes the predictable drifts or autocorrelation in returns, which implies that financial markets do not promptly incorporate news into prices, and hence are not weak form efficient (Bodie et al., 2010). For evidence of return autocorrelation over the longer horizon we can refer back to De Bondt and Thaler (1985), who showed that investors tend to overreact to dramatic and unexpected news events.

Some of the (momentum) puzzles found when analyzing return autocorrelations were further investigated by researchers on the behavioral finance side of the debate. Using behavioral theories of under- and overreaction to information, some researchers were able to explain the observed puzzles. De Long, Shleifer, Summers, and Waldmann (1990) showed that the long horizon negative autocorrelation in returns (reversal) can be explained by a stylized model with two types of agents: fundamentalists, who get signals about intrinsic values, and chartists, who learn indirectly about intrinsic values by looking at prices. Whenever a good signal is received by fundamentalists, prices increase. Chartists will observe this rise in prices, causing some chartists to buy, which in turn further increases prices and causes more chartists to buy. Eventually, share prices are so far beyond intrinsic values that fundamentalists start selling again. Another explanation was provided by Barberis, Shleifer and Vishny (1998). They explained underreaction to information using conservatism: investors erroneously believe that the earnings process underlying stock prices is mean-reverting, and so they underreact to news. To explain overreaction, they refer to the representativeness heuristic: investors overextrapolate from a sequence of growing earnings, overreacting to a long trend. Daniel, Hirshleifer and Subrahmanyam (1998) related overreaction to overconfidence, as traders tend to overestimate the precision of their private signals, leading to prices being pushed above the fundamental level in the case of good news.

Some researchers also developed specific linear serial correlation tests to analyze the weak form of the EMH (Lim & Brooks, 2011). These tests simply examine the third version of the random walk hypothesis, which we explain later on, and have been adopted right from the start of the debate (e.g. Granger & Morgenstern, 1963; Fama, 1965a). However, the most popular linear serial correlation test was developed several decades after the start of the debate by Lo and MacKinlay (1988), when they presented their variance ratio (VR) test. The VR test can be used to check the null hypothesis of serially uncorrelated returns, which points towards informational efficiency of stock prices, and is expressed as the ratio of the k-period return variance over k times the variance of the one-period return. According to the random walk hypothesis, stock prices are following a random walk when the variance of the k-period return is the same as k times the variance of the one-period return. So in order to test whether returns are serially uncorrelated, it suffices to test whether the variance ratio is significantly different from one.
Applying their own VR test, Lo and MacKinlay found that the random walk hypothesis does not hold for weekly stock market returns. Further on in this paper, the VR test is examined for robustness in order to gain insights into the role of methodology in the debate.

Unit root tests, which can examine the stationarity of stock returns, form a second class of weak form market efficiency tests. The basic idea is that stock returns that contain a unit root, and are hence non-stationary, are following a random walk (Lim & Brooks, 2011). The most popular approach to examine stationarity has proven to be the augmented Dickey-Fuller (ADF) test, which is also examined for robustness in the next section. Recent research has led to the development of more sophisticated tests of stationarity as well. However, it was shown that the existence of a unit root in stock returns is not a sufficient condition for the random walk hypothesis to hold (Rahman & Saadi, 2008). In addition to stationarity, returns need to be serially uncorrelated in order for those returns to be following a random walk.

The final class of weak form market efficiency tests considers non-linear serial dependence. Since linear autocorrelation tests only account for linear effects, some researchers pointed out that stock markets could exhibit inefficient behavior even when linear autocorrelation tests point towards informational efficiency in the weak sense (Granger & Andersen, 1978). Among popular tests of non-linear serial dependence are the Hinich bicorrelation test (Hinich, 1996), the Engle Lagrange multiplier test (Engle, 1982) and the Brock-Dechert-Scheinkman test (Brock, Scheinkman, Dechert, & LeBaron, 1996). Almost every empirical paper employing one or more of these tests reports significant non-linear serial dependence across worldwide stock markets (Lim & Brooks, 2011).

Despite the emergence of these different tests, a consensus on the validity of the EMH remained to be found. By the beginning of the 1990s, the debate had split researchers into two camps: believers of the EMH on the one hand, and proponents of behavioral finance on the other. As a reaction to the emergent body of anomalous evidence and the rise of behavioral finance, Fama (1991) wrote a second review covering tests of the different forms of the EMH. He concluded that the idea of efficient markets still remained valid, because the observed anomalies tended to disappear over time and because anomalous traders seemed to cancel each other out.

2.9. Alternative approach to weak form market efficiency testing

Looking for a way to settle the debate and taking into account the paradox pointed out by Grossman and Stiglitz (1980), Campbell et al. (1997) suggested a new approach to testing for market efficiency. Instead of using all-or-nothing tests that did not lead to a definitive answer, they suggested an approach in which the degree of market efficiency is tested over time. This approach would enable researchers to draw more nuanced conclusions, which could eventually help move the debate along. Other than introducing this idea, Campbell et al. did not present a concrete approach. However, they inspired other researchers to come up with several alternative tests of market efficiency. Since this paper focuses on weak form market efficiency, we only discuss three alternative types of weak form market efficiency tests. Section four considers these alternative methodologies in more detail.
A first alternative approach is the non-overlapping subperiod analysis, which looks at different separated time windows and the evolution of efficiency between those windows (Lim & Brooks, 2011). This approach is only useful when examining the impact of a specific policy from one time window to another. For example, one could investigate the effects of a short sell prohibition on market efficiency. The first subperiod would then consist of all the historical data up until the last day before the prohibition took effect, and the second subperiod would run from the moment the prohibition was adopted until today.

Another possible alternative is the use of rolling estimation windows. The idea behind this alternative is to transform a data sample of n observations into n-l+1 windows, with l being the length of the window [6] (Lim & Brooks, 2011). Here, the different time windows overlap, as they are pushed forward until the final observation is included in the last time window [7]. This rolling approach allows researchers to look at underlying changes in efficiency on a shorter time scale than is the case with the non-overlapping subperiod analysis. Furthermore, rolling estimation windows accommodate a comparison of stock market efficiency through time, since a varying degree of efficiency is measured, rather than a static binary condition of efficiency. A short code sketch at the end of this subsection illustrates the mechanics.

[6] For example, if we have 100 observations of returns on a certain stock, and a time window length of 20, we could transform the data into 81 different time windows.
[7] In the same example, we would start with the time window going from observation 1 until 20. Then, we push the time window forward to get the second window, spanning from observation 2 until 21. We continue this procedure until we reach the last time window, covering observations 81 until 100.

Time-varying parameter models constitute a final alternative approach to market efficiency testing. This approach draws from state space models [8] to allow standard regression parameters to change over time (Lim & Brooks, 2011). The greatest advantage is that this allows regression methods to be applied to more dynamic concepts like time-varying efficiency. Primarily, these models have been applied to developing stock markets, as these could not have been efficient from inception. The time-varying parameter model allows for an evolution of those markets towards efficiency, by letting regression parameters evolve through time. A static approach like one of the classic tests would not allow for this underlying shift in parameters and would thus be biased. Recently, however, different time-varying parameter models have also been applied to developed financial markets. In section four, we further study this methodology and present an extended time-varying parameter model combining properties from the classical literature on efficient markets and behavioral finance.

[8] "State space modeling provides a unified methodology for treating a wide range of problems in time series analysis. In this approach it is assumed that the development over time of the system under study is determined by an unobserved series of vectors α1, …, αn, with which are associated a series of observations y1, …, yn; the relation between the αt's and the yt's is specified by the state space model. The purpose of state space analysis is to infer the relevant properties of the αt's from a knowledge of the observations y1, …, yn." (Durbin & Koopman, 2008, p. 1).

Despite the potential of these emerging methodological approaches, the debate remains unsettled. To further address the critiques uttered against the EMH, Fama (1998) wrote a third and final review of the empirical work testing market efficiency, and concluded there is a lack of valid evidence to disprove his theory. In subsequent years, it seemed as if the discussion would never be settled and would slowly fade away into the history books. Nevertheless, advocates of behavioral finance did not rest their case and put in a lot of effort to make behavioral finance better known to a broader audience.
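To illustrate the rolling estimation window mechanics described in footnotes [6] and [7], consider the following minimal Python sketch (our own illustration; the helper name and the list-based representation are assumptions, not part of the cited literature):

    def rolling_windows(observations, l):
        """Transform a sample of n observations into the n - l + 1
        overlapping windows of length l described in the text."""
        n = len(observations)
        return [observations[i:i + l] for i in range(n - l + 1)]

    # With 100 return observations and l = 20 this yields the 81 windows of
    # footnote [6]; an efficiency statistic computed per window then traces
    # out a time-varying degree of efficiency instead of one static verdict.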
2.10. Current state of the debate

Today, there is still no definitive view on the efficiency of financial markets, even though proponents of both the EMH and behavioral finance have conducted further research. Shiller (2003) claims that theories of efficient markets should remain as a characterization of an ideal world, but not as an accurate description of global financial markets. Immediately preceding the article of Shiller in the same journal, Malkiel (2003, p. 80) argues that "if any $100 bills are lying around the stock exchanges of the world, they will not be there for long". His statement became a classic economics joke used to explain that efficiency anomalies would not persist, because someone would immediately benefit from the opportunity through the price arbitrage mechanism.

Currently, we see two important reasons why the debate on efficient markets is still not settled. The first one relates to an alternative theoretical framework. Being critical of an existing theoretical framework is somewhat straightforward. Indeed, a theory is supposed to be imperfect, as it is only a framework to describe reality. However, coming up with a new and improved theory is far less evident. Thus far, advocates of behavioral finance have failed to come up with such a new theory that could replace the EMH. While several behavioral biases have been documented in the academic literature, there is still a lack of an overarching framework that could describe the efficiency of financial markets in a behavioral way. Well aware of this problem, Lo (2004, 2005) looked at evolutionary biology to reconcile opinions from both ends of the efficiency spectrum when he formulated the adaptive markets hypothesis (AMH). In his theory, he also incorporates the concept of a varying degree of efficiency, following Campbell et al. (1997). Despite its potential, the AMH has not yet replaced the EMH as the definitive theory on the efficiency of financial markets. To further explore why this is the case, we empirically test Lo's theory in section five.

We also see a second reason why the debate remains to be settled: flawed methodology. As is clear from the literature overview, many different methodologies have been developed over the course of the last 60 years. On the one hand, there were the more traditional tests of efficiency that led to all-or-nothing conclusions. On the other hand, a new strand of methodologies was developed following the idea of a time-varying degree of efficiency. Even though methodologies were originally designed to help settle the debate, we believe the way researchers implemented these methodologies also caused the debate to remain unsettled. The most important argument for holding this belief is that conflicting results were found by researchers who adopted exactly the same methodologies. In this thesis, we further examine the role of methodology in the debate on efficient markets.
We discuss different methodological approaches and examine two of the most popular traditional efficiency tests and one alternative test for robustness. The results are discussed together with insights from our empirical test of Lo's AMH, which enables us to distill important lessons from the past and suggestions for the future to eventually help settle the debate.

3. Traditional tests of weak form market efficiency

With the market efficiency debate explained and the historical train of thought explored, we start the empirical work by focusing on the methodological approaches that were applied in the past, in order to better understand how the debate came into existence. In this section, we focus on the three different types of traditional tests of market efficiency that were introduced in the literature review. We refer to these tests as traditional, as they were applied most often in the early aftermath of the establishment of the EMH. Also, these tests make use of more traditional statistical techniques that can be characterized as static, as only one time-invariant conclusion can be drawn based on a full sample of data.

3.1. Tests based on return autocorrelation

3.1.1. Two different approaches

In the literature review, we introduced two possible approaches that rely on return autocorrelation as a proxy to test for weak form market efficiency. The principle for both of these approaches is the same: significant return autocorrelation implies some degree of predictability, which is at odds with the EMH. Because of their relatedness, tests of return autocorrelation are also a test of the applicability of technical analysis.

The first approach simply examines autocorrelation in return series over different time horizons. Over the short horizon (1 day - 1 month), mixed results have been found in the past. For individual stocks, short run return autocorrelations tend to be small but negative, which can be explained by market microstructure effects like the bid-ask bounce (e.g. Lehmann, 1990; Jegadeesh, 1990). Portfolio returns, however, exhibit large and positive autocorrelations over the short run, which could be explained by effects of non-synchronous trading [9] (e.g. Conrad & Kaul, 1988; Lo & MacKinlay, 1990, 1997). In the medium horizon (1-12 months), returns on both individual stocks and portfolios exhibit positive autocorrelation. Drawing from this observation, Jegadeesh and Titman (1993, 2001) implemented a strategy that consisted of buying a portfolio of stocks that had performed well in the three to twelve months before the investment period, and selling a portfolio of stocks that had performed badly over the same period. What they found was that the portfolio of stocks that had performed well in the last three to twelve months tended to have a positive risk-adjusted excess return (alpha) over the subsequent twelve-month investment period; alpha tended to be negative for the portfolio of stocks that had performed badly. These results were surprising and created what was called a momentum puzzle. Finally, returns over the long horizon (1-5 years) exhibit negative autocorrelation or mean reversion.

[9] A non-synchronous trading effect arises when the assumption is that asset prices are recorded at time intervals of the same length, when in fact they are recorded at time intervals of other, possibly irregular lengths (Bodie et al., 2010).
De Bondt and Thaler (1985) further investigated this finding and tested the hypothesis of overreaction to unexpected and dramatic news by looking at portfolios of stocks with extreme past performance. The extreme portfolios were constructed based on market-adjusted cumulative abnormal returns of stocks over the last three years: the top 35 stocks were allocated to a winner portfolio, and the bottom 35 stocks were allocated to a loser portfolio. Subsequently, the performance of the extreme portfolios was measured over three years, by deriving the differences in market-adjusted cumulative average returns. The results found by De Bondt and Thaler were consistent with the overreaction hypothesis: over a period of 50 years, loser portfolios had outperformed winner portfolios by about 23% in the three years following the formation of the portfolios. This apparent overreaction to new information clearly pointed to weak form inefficiencies.

The second approach drawing from return autocorrelations to test for weak form market efficiency focuses on the development of statistical tests for linear serial correlation. Historically, the primary tool to test for weak form market efficiency through the exploration of return autocorrelations was the VR test (Lo & MacKinlay, 1988). Having briefly introduced this test in the literature review, we now explore it further. Additionally, we perform a robustness analysis, which provides valuable insights into how methodology influenced the debate. Although several innovations to the original VR test have been suggested (e.g. Chow & Denning, 1993; Wright, 2000), we perform our robustness analysis on the original test by Lo and MacKinlay (1988). Given our aim to better grasp the historical development of the debate and the role of traditional methodology, our focus in this section is on those methodologies that have been most influential in the past. When looking for ways out of the debate later on in this paper, the most recent and state-of-the-art methodologies are considered.

3.1.2. Robustness analysis: Lo and MacKinlay's variance ratio test

Before explaining the VR test, we need to introduce the concept of a random walk with a drift [10], which is a statistical property of time-series data that is important in every academic field that makes use of time-series analysis. Our explanation is based on the seminal book by Campbell et al. (1997).

Random walks

There are three possible forms of the random walk. The simplest but strongest form of the random walk (version 1) is the one with independently and identically distributed (IID) increments:

$p_t = \mu + p_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$

For example, assume that $p_t$ is the (log) price of a certain stock, so that its increments are the continuously compounded returns [11]. The equation of the first version of the random walk then tells us that the price of today is equal to the price of yesterday plus an expected price change or drift $\mu$ and a certain IID random error term $\varepsilon_t$. The expected price of today will thus be equal to the price of yesterday plus the drift.

The second version of the random walk (random walk 2) is a generalization of the first one, as the increments are only assumed to be independently but not identically distributed (INID). The reason for this generalization is purely empirical, as stock returns have proven to be distributed in a non-identical way through time. This ought to be no surprise, since a lot has changed throughout the years. Stock markets have evolved in terms of economic, technological, social, institutional and regulatory aspects.
The third and final version of the random walk is a further generalization of the independent increment model (random walk 2). We now only assume that the increments are uncorrelated. The reason for this further relaxation of assumptions is again empirical. Especially for stock prices, researchers have found it implausible for today's stock return to be completely independent of yesterday's return. Because of its more realistic assumptions, the uncorrelated increment model (random walk 3) has proven to be the most popular model for testing for random walks in stock return time series. We start our analysis from this third version of the random walk as well.

We also need to point out the link between random walks and weak form efficient markets. As we saw before, a weak form efficient financial market is a market in which all past price information is fully reflected in stock prices. Therefore, it is impossible to predict future prices based on past price information. The random walk model says precisely the same as this weak form efficiency condition: tomorrow's return is unpredictable beyond the constant drift, as there is no way of predicting the random increment. This parallel between the weak form market efficiency condition and the random walk made it interesting for researchers to test weak form market efficiency indirectly, by testing whether stock returns12 follow a random walk (version 3)13. This is also the exact principle that is used in the VR test.

12 Proponents of the EMH have argued that in order for a stock market not to be weak form market efficient, returns rather than prices must not follow a random walk. The explanation of why the VR test is able to test weak form market efficiency, however, remains valid, since returns are simply a transformation of stock prices.

13 Some proponents of the EMH have recently argued that a stock market that is not following a random walk could still be efficient. However, we do not focus on this discussion. Instead, we look at methodologies that have been used abundantly by researchers in the past and how these might have influenced the fact that we still have a debate going on today.
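To make the distinction between the strongest and the weakest version tangible, the following sketch simulates version 1 and version 3 of the random walk; all parameter values are arbitrary illustrative choices. The version 3 increments below are serially uncorrelated yet not independent, because their variance depends on past shocks.

```python
import numpy as np

rng = np.random.default_rng(42)
T, mu = 1000, 0.0002

# Version 1: IID Gaussian increments around a constant drift
eps_iid = rng.normal(0.0, 0.01, T)
rw1 = np.cumsum(mu + eps_iid)

# Version 3: uncorrelated but dependent increments, generated here
# with ARCH-type volatility clustering
eps = np.empty(T)
h, e_prev = 0.01 ** 2, 0.0
for t in range(T):
    h = 1e-5 + 0.85 * h + 0.10 * e_prev ** 2  # conditional variance
    e_prev = rng.normal(0.0, np.sqrt(h))
    eps[t] = e_prev
rw3 = np.cumsum(mu + eps)
```

The sample autocorrelations of eps would look flat, even though eps is clearly not IID: exactly the property that separates random walk 3 from random walk 1.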
Introduction to the variance ratio test

Lo and MacKinlay's VR test (1988) has turned out to be extremely popular among researchers as a test for uncorrelated increments, and it can also be used to determine whether a stock market is weak form efficient over a certain period of time. The main assumption behind the test is that if stock returns follow a random walk (version 3), the variance of returns measured over an interval of length $q$ (the order of differentiation, or lag order) is $q$ times the variance of returns measured over a unit interval. Following Campbell et al. (1997), we can develop this model with more mathematical rigor. Assume that $r_t$ is the return on a certain stock at time $t$. The uncorrelated increment model then looks like the following:

$r_t = \mu + r_{t-1} + \epsilon_t$

The arbitrary drift is once again represented by $\mu$, and $\epsilon_t$ is the random increment at time $t$, now only assumed to be serially uncorrelated. If the return follows a random walk (version 3), the variance of $(r_t - r_{t-q})$ should be $q$ times the variance of $(r_t - r_{t-1})$. Statistically, this simple yet elegant relationship can be tested by checking whether the ratio

$VR(q) = \frac{\mathrm{Var}(r_t - r_{t-q})}{q \cdot \mathrm{Var}(r_t - r_{t-1})}$

significantly differs from unity. This is exactly what Lo and MacKinlay's VR test does. The null hypothesis of the test states that the time series follows an uncorrelated increment model. Whenever the ratio statistically differs from unity, the null hypothesis can be rejected and we arrive at the alternative hypothesis stating that the time series does not follow a random walk (version 3).

The VR test is typically implemented for different orders of differentiation $q$. For example, with an order of differentiation of 2, the VR becomes:

$VR(2) = \frac{\mathrm{Var}(r_t - r_{t-2})}{2 \cdot \mathrm{Var}(r_t - r_{t-1})} = 1 + \rho(1)$

with $\rho(1)$ being the first-order autocorrelation coefficient of returns. The VR test statistic with order of differentiation 2 is thus equal to one plus the first-order autocorrelation coefficient of returns, for any time series. More generally, for any order of differentiation $q$, the VR test statistic can be calculated as follows:

$VR(q) = 1 + 2 \sum_{k=1}^{q-1} \left(1 - \frac{k}{q}\right) \rho(k)$

with $\rho(k)$ being the $k$th-order autocorrelation coefficient of $(r_t - r_{t-1})$. As is clear from this definition, $VR(q)$ is a particular linear combination, with linearly declining weights, of the first $q-1$ autocorrelation coefficients of $(r_t - r_{t-1})$. Most commonly, the orders of differentiation 2, 4, 8 and 16 have been applied. In applying the test and examining its robustness, we report values for these four orders of differentiation as well.

There is still one major drawback to the VR test statistic as we have just derived it: it only applies under the assumption that the data are homoscedastic (Lo & MacKinlay, 1988). To overcome this inconvenience, one could first model heteroscedasticity separately (e.g. via GARCH models) and then apply the presented form of the VR test. However, Lo and MacKinlay also developed an integrated approach with a heteroscedasticity-consistent test statistic, drawing from earlier work on heteroscedasticity-consistent methods by White (1980) and White and Domowitz (1984). What they obtained is a robust standardized test statistic, following the standard normal distribution, that can test the third version of the random walk despite the possible presence of heteroscedasticity. When implementing the VR test and when checking for robustness, we report heteroscedasticity-consistent probability values (p-values) of the test statistic. The derivation of this heteroscedasticity-consistent form of the VR test, however, is beyond the scope of this paper.
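For concreteness, the following minimal sketch computes $VR(q)$ from the autocorrelation representation above, together with a two-sided p-value, for the homoscedastic case only; the asymptotic variance $2(2q-1)(q-1)/(3qT)$ is the one Lo and MacKinlay (1988) derive for that case, and the heteroscedasticity-consistent statistic we actually report replaces this denominator.

```python
import numpy as np
from scipy.stats import norm

def variance_ratio(returns, q):
    """VR(q) = 1 + 2 * sum_{k=1}^{q-1} (1 - k/q) * rho(k)."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    T = len(r)
    var0 = r @ r / T
    vr = 1.0
    for k in range(1, q):
        rho_k = (r[k:] @ r[:-k] / T) / var0  # kth-order autocorrelation
        vr += 2.0 * (1.0 - k / q) * rho_k
    return vr

def vr_pvalue(returns, q):
    """Two-sided p-value of the homoscedasticity-only z-statistic."""
    T = len(returns)
    phi = 2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * T)  # asymptotic variance
    z = (variance_ratio(returns, q) - 1.0) / np.sqrt(phi)
    return 2.0 * (1.0 - norm.cdf(abs(z)))
```

For q = 2 the statistic reduces to $\sqrt{T}\,\rho(1)$, which makes the link with plain autocorrelation testing explicit.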
Application of the variance ratio test

After introducing the VR test, we now apply it to the historical returns of the three major U.S. stock market indices (DJIA, S&P-500 and NASDAQ) and the main Belgian stock index (BEL-20). To perform our analysis, we use the statistical software package Gretl. The data for our tests come from the financial database Datastream of Thomson Reuters. All available data up until Tuesday, February 5th, 2013 are included14. We start with an exploration of the data, followed by an in-depth explanation of the applied methodology. Next, we present and discuss the results from our application of Lo and MacKinlay's (1988) VR test.

14 The data were collected at the library of the Faculty of Business and Economics of the KU Leuven on Wednesday, February 6th, 2013.

Table 1 summarizes the data used throughout this thesis. Plots of the data summarized in table 1 can be found in appendix A.

Table 1: Data summary statistics

To gain more insights into the debate and the role of methodology, the robustness of results obtained through the popular VR test is analyzed. The robustness check is implemented as a scenario analysis based on two decisions researchers need to make when implementing the VR test: the time interval of the data and the time window of the data. The fact that researchers typically only include one type of time interval and time window is the exact reason why we perform robustness tests. For each of the four indices, we implement the VR test for three different time intervals: daily, weekly and monthly returns. Next to the time interval, we also test for robustness to changes in the time window of the data. More specifically, for every index and time interval, we apply the VR test to 30 different random time windows grouped into three categories of constant window length (5 years, 10 years and 20 years). When implementing the VR test, researchers also need to decide on the order of differentiation (q). However, this purely statistical parameter is not examined for its robustness, as academics typically report results for different orders of differentiation to avoid sensitivity. In our analysis, we use the four values (2, 4, 8 and 16) for the order of differentiation that were originally suggested by Lo and MacKinlay (1988). Conclusions for a specific sample of data are drawn from the results using these four orders of differentiation.

Since our data comprise daily, weekly and monthly price levels for the four selected indices, we first need to implement a transformation to obtain returns. To find the return on day/week/month t, we calculate the natural logarithm of the price level on day/week/month t divided by the price level on day/week/month t-1. Next, we divide our data into time windows of different lengths for every index-time interval pair. For the robustness analysis we choose window lengths of 5, 10 and 20 years, as these represent typical time lengths of data in a large sample of historical studies. After the division of our data into windows of 5, 10 and 20 years, we randomly select 10 different time windows for each particular combination of index, time interval and window length. This way, we end up with 90 different scenarios per index15, which can be divided into 9 groups of unique combinations of time interval and time window length. Within each of these 9 groups, we end up with a random selection of 10 different scenarios with the same window length and time interval. For every scenario, we report p-values of the VR test statistic for the four orders of differentiation (2, 4, 8 and 16), which means that our analysis eventually yields 360 p-values16 per index for the 90 different scenarios. These p-values give us the necessary information to conclude whether or not we can reject the null hypothesis of stock returns following a random walk (version 3) for a certain scenario. In drawing our conclusions, we use a 95% confidence level.

15 3 different time intervals of data (daily, weekly and monthly) x 3 different window lengths of data (5, 10 and 20 years) x 10 random samples.
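A minimal sketch of this scenario construction follows; the helper names are hypothetical, and the actual computations in this thesis were carried out in Gretl.

```python
import numpy as np

def log_returns(prices):
    """Continuously compounded returns: r_t = ln(P_t / P_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1])

def random_windows(returns, window_len, n_windows=10, seed=0):
    """Draw n_windows random contiguous subsamples of length window_len."""
    rng = np.random.default_rng(seed)
    starts = rng.choice(len(returns) - window_len + 1,
                        size=n_windows, replace=False)
    return [returns[s:s + window_len] for s in sorted(starts)]

# One group of scenarios: 10 random 5-year windows of daily returns,
# each tested at q = 2, 4, 8 and 16 (vr_pvalue as sketched earlier;
# roughly 260 trading days per year)
# pvals = [[vr_pvalue(w, q) for q in (2, 4, 8, 16)]
#          for w in random_windows(r_daily, window_len=5 * 260)]
```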
As a way to explain the variance in p-values across the different scenarios, an OLS regression17 is implemented as well. The explanatory variables in this regression include dummies for time interval, time window length and stock market index. Altogether, our approach allows us to examine the robustness of the VR test as it was designed by Lo and MacKinlay (1988). From this robustness check, we infer valuable insights into how the current debate has developed over the last decades and what role specific methodology like the VR test has played. The summary of results for each of the four considered stock indices is presented in tables 2, 3, 4 and 5.

16 3 different time intervals of data x 3 different window lengths of data x 10 random samples x 4 orders of differentiation.

17 Results and output from the OLS regression are available upon request.

Table 2: DJIA VR test results

The table presents 360 p-values for the 90 scenarios of the Dow Jones Industrial Average stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination, and q represents the four orders of differentiation that were considered. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 3: S&P-500 VR test results

The table presents 360 p-values for the 90 scenarios of the Standard & Poor's 500 stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination, and q represents the four orders of differentiation that were considered. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 4: NASDAQ VR test results

The table presents 360 p-values for the 90 scenarios of the NASDAQ stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination, and q represents the four orders of differentiation that were considered. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 5: BEL-20 VR test results

The table presents 360 p-values for the 90 scenarios of the BEL-20 stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination, and q represents the four orders of differentiation that were considered.
P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

First, we notice that results are similar across stock markets, even for the BEL-20. For each of the four considered stock market indices18, daily and weekly data unambiguously lead to the rejection of the null hypothesis of returns following a random walk (version 3). Research on samples with a window length of 20 years also yields unequivocal results. Within time intervals, we notice that longer time windows are in general less prone to sensitivity. The summary tables also show that the ambiguity of results increases from daily to weekly to monthly data. From these results, we can conclude that daily data are the most robust to underlying variations in time window length and sampling scheme, closely followed by weekly data. Monthly data prove to be prone to significant degrees of sensitivity, particularly when 5- and 10-year time windows are used.

18 Only three small exceptions are found: one for daily observations of the NASDAQ, and two for weekly observations (one for the BEL-20, one for the S&P-500). Despite these exceptions, there is conclusive evidence that daily and weekly observations lead to unambiguous conclusions.

Results from the OLS regression on the total of 1 440 p-values indicate that the index at hand does not significantly explain variation in the p-values. In contrast, time window and time interval are highly significant. A negative relationship is found between time window length and p-value. The relationship between time interval and p-value is positive: longer time intervals lead to higher p-values. Orders of differentiation of 8 and 16 yield significantly higher p-values than orders of differentiation of 2 and 4.

In brief, Lo and MacKinlay's VR test appears to be robust when using daily and weekly data, regardless of the time window length and random sample of data. VR tests drawing from monthly data only prove to be robust when using time windows with a length of 20 years. Sensitivity, and thus controversy, can be spurred when implementing VR tests using monthly data with samples of 5 or 10 years in length. The OLS regression shows that the stock market index at hand cannot explain variation in p-values across scenarios. A negative relationship between time window length and p-value is found; the relationship between time interval and p-value is positive. In section 6, we present a further discussion of these results, along with results from the robustness check of the ADF unit root test, the exploration of alternative tests of weak form market efficiency and the analysis of Lo's AMH.

3.2. Tests based on stationarity

3.2.1. Development of tests using the stationarity proxy

A second possible proxy to indirectly test for weak form market efficiency is the concept of stationarity. In the past, the augmented Dickey-Fuller (ADF) unit root test proved to be the most popular way for researchers to test for weak form market efficiency by means of the stationarity proxy. Later refinements of the ADF test led to the inclusion of (multiple) structural breaks, to prevent the detection of a unit root that is merely due to an underlying structural break. Another innovation was the emergence of panel unit root tests, which overcome the low statistical power that univariate unit root tests have in small samples.
Next to these more general refinements of the ADF unit root test, other innovations were proposed as well. For an elaborate overview of the literature on weak form market efficiency tests using the stationarity proxy, we refer to Lean and Smyth (2007) and Lim and Brooks (2011). Just as for the VR test, we perform a robustness analysis for tests based on stationarity to comment on the role of methodology in the debate. To maximize the explanatory power of our robustness analysis, we focus on the ADF unit root test, as it was applied most often in the past. Before going into the exploration and the robustness check of the ADF unit root test, however, we want to make an important side note. Further statistical investigation by Rahman and Saadi (2008) showed that the existence of a unit root in a return series is merely a prerequisite, not a sufficient condition, for the third version of the random walk hypothesis. What this implies is that proof of a unit root does not suffice to imply weak form market efficiency; in addition, the returns also need to be serially uncorrelated. In the context of our research, however, we have chosen to implement the ADF unit root test despite this shortcoming. After all, we are taking a step back to understand how the debate has developed over the last decades and what the role of methodology has been in this development. Therefore, we only draw from the ADF unit root test to learn how it may have affected the debate, not to come up with a conclusion on the efficiency of financial markets.

3.2.2. Robustness analysis: Augmented Dickey-Fuller unit root test

Using stationarity as a proxy to test for weak form market efficiency, the ADF unit root test is the second traditional statistical test of weak form market efficiency we examine for robustness. We first introduce the concept of stationarity, before exploring the ADF test in more detail.

Stationarity of time series

Theoretically, a time series is stationary when the underlying data generating process can be defined as a stochastic process with a joint probability distribution that does not change through time (Hamilton, 1994). The distribution of a stationary time series is thus the same as the distribution of a lagged version or a subset of this time series. This also implies that statistical moments like the mean and variance of the stationary process do not change over time or position. This conceptual definition of a stationary process is referred to as hard stationarity. For purposes of empirical testing, however, the concept of weak stationarity or covariance-stationarity was developed. In the remainder of this paper, when we refer to stationarity, we mean covariance-stationarity. According to Hill, Griffiths and Lim (2011), a time series $r_t$ follows a weakly stationary process if for every point in time $t$:

The mean is constant: $E(r_t) = \mu$;
The variance is constant: $\mathrm{var}(r_t) = \sigma^2$;
The covariance depends on $s$, not on $t$19: $\mathrm{cov}(r_t, r_{t+s}) = \mathrm{cov}(r_t, r_{t-s}) = \gamma_s$.

19 In words: the covariance does not depend on the specific point in time, but only on the length of the time interval.

A more elaborate discussion of different types of stationarity can be found in, for example, chapter 3 of Hamilton (1994).

Introduction to the augmented Dickey-Fuller unit root test

The Dickey-Fuller tests essentially examine whether a certain time series contains a unit root or not, or equivalently, whether it is non-stationary or stationary. These tests have also been popular among researchers as an indirect way to determine whether or not a stock market is weak form efficient.
Whenever the Dickey-Fuller tests indicate that stock returns are non-stationary (the null hypothesis), researchers have inferred that stock returns follow a random walk. Consequently, it was concluded that the stock market at hand is weak form efficient. Following Dickey and Fuller (1979), three different regression equations can be used to test for stationarity:

(1) $\Delta r_t = \gamma r_{t-1} + v_t$
(2) $\Delta r_t = \alpha + \gamma r_{t-1} + v_t$
(3) $\Delta r_t = \alpha + \lambda t + \gamma r_{t-1} + v_t$

In our example, $\Delta r_t$ is the difference between the stock returns $r_t$ and $r_{t-1}$, $\alpha$ is the intercept or drift term (comparable to the arbitrary drift of the random walk), $\lambda t$ is the time trend and $v_t$ is the error term. Equation (1) is the most basic form of the Dickey-Fuller test. Equation (2) also includes a deterministic intercept or drift term; equation (3) has both a drift and a linear time trend. The key parameter in all of the above equations is $\gamma$. Under the null hypothesis, $\gamma = 0$, and the time series contains a unit root and is thus non-stationary. Note that when this is the case, we recover the expression for a random walk. Therefore, if we cannot reject the null hypothesis, the time series at hand is assumed to follow a random walk and to be weak form efficient.

An important assumption in the previous Dickey-Fuller equations is that the random errors are not autocorrelated. To accommodate testing for unit roots when autocorrelation does occur in the error term, the augmented Dickey-Fuller (ADF) test was designed. This extended version of the Dickey-Fuller equations simply includes a sufficient number of lagged terms to deal with autocorrelation. The ADF equation looks like the following (Hill et al., 2011):

$\Delta r_t = \alpha + \gamma r_{t-1} + \sum_{s=1}^{m} a_s \Delta r_{t-s} + v_t$

Again, $\Delta r_t$ is the difference between $r_t$ and $r_{t-1}$. The sum operator adds the $m$ lagged first-difference terms needed to ensure that the residuals are not autocorrelated. The number of necessary lag terms can be determined by examining the autocorrelation function of the random error term $v_t$. The parameter of interest is again $\gamma$. The null hypothesis of the ADF test states that the time series contains a unit root ($\gamma = 0$) and hence is non-stationary. The alternative hypothesis states that the time series is stationary. Statistical significance of results is discussed using p-values drawn from the test statistic, which follows a non-standard distribution20 ($\tau$ statistic).

Application of the augmented Dickey-Fuller unit root test

As a way to obtain a deeper knowledge of the development of the market efficiency debate, we analyze the ADF test in exactly the same way as the VR test, using the same methodology, software and set of (transformed) data on four stock market indices (DJIA, S&P-500, NASDAQ and BEL-20). The same two decisions drive variation in our scenario analysis: the time interval of the data and the time window of the data. Identical time intervals (daily, weekly and monthly) and time window lengths (5, 10 and 20 years) are considered. The ADF test is applied to 10 different random time windows for every index-time interval-time window length combination. Across the different indices and time intervals, the ADF test is thus applied 360 times21. Starting from the daily/weekly/monthly returns on the four selected stock market indices, the data are again divided into time windows of 5, 10 and 20 years. The motivation for these specific window lengths remains the same: it enables us to capture the variation in time lengths from a large sample of earlier research.
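As a rough sketch of how one scenario translates into a p-value, the snippet below uses the adfuller routine from statsmodels. Note two deviations from the exposition above, both assumptions of this sketch: the lag length is selected with an information criterion rather than by inspecting the residual autocorrelation function, and regression="ct" requests the constant and trend of equation (3).

```python
from statsmodels.tsa.stattools import adfuller

def adf_pvalue(returns):
    """ADF test with constant and trend; H0: the series contains a unit root."""
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(
        returns, regression="ct", autolag="AIC"
    )
    return pvalue

# Applied per scenario, reusing the random_windows helper sketched in section 3.1:
# pvals = [adf_pvalue(w) for w in random_windows(r_daily, window_len=10 * 260)]
```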
Again, we randomly select 10 different time windows per index-time interval-time window length combination. In total, this process yields 90 different scenarios per index22, representing 9 groups of unique time interval-time window pairs per index. Since the null hypothesis of the ADF test states that the return series contains a unit root and is thus non-stationary, a rejection of the null hypothesis points towards the index not being efficient in the weak sense. A significance level of 5% is again used. Following earlier research, both a trend and a constant are included in the unit root equation. To gain further insight into the variation in p-values, an OLS regression is again put in place. The explanatory variables remain the same: dummies for time interval, window length and index. This scenario analysis, based on variation in two important data preprocessing decisions, again allows us to examine the robustness of test results, which helps us to gain a deeper understanding of the historical development of the debate and the specific role of methodology like the ADF unit root test. The summary of results for each of the four selected stock market indices can be found in tables 6, 7, 8 and 9.

20 Distribution quantiles are computed by simulation or numerical approximation. Using Monte Carlo simulation, Dickey and Fuller (1979) tabulated critical values for the ADF test, which were later extended by MacKinnon (1991).

21 4 indices x 3 time intervals x 3 time window lengths x 10 random windows.

22 3 different time intervals of data x 3 different window lengths of data x 10 random samples.

Table 6: DJIA ADF unit root test results

The table presents 90 p-values for the 90 scenarios of the Dow Jones Industrial Average stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 7: S&P-500 ADF unit root test results

The table presents 90 p-values for the 90 scenarios of the S&P-500 stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 8: NASDAQ ADF unit root test results

The table presents 90 p-values for the 90 scenarios of the NASDAQ stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk.
P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

Table 9: BEL-20 ADF unit root test results

The table presents 90 p-values for the 90 scenarios of the BEL-20 stock market index. Daily, weekly and monthly refer to the time interval at which the data were recorded. The three time window lengths of five, ten and twenty years are represented by 5, 10 and 20. S1 to S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.

From our analysis, the ADF unit root test appears to be robust for daily data. Weekly observations seem to spur controversy, as unequivocal results are only found when time windows are at least 10 years in length23. Results from monthly data exhibit a high degree of sensitivity24, except when time windows of 20 years are used25. These results are again found across the different considered stock markets (incl. BEL-20). Conclusions are hence not limited to U.S. stock markets. The OLS regression indicates significantly higher p-values for the BEL-20 and the S&P-500 compared to the DJIA and NASDAQ. Additionally, the same relations between time interval, time window and p-value are found: longer time intervals lead to higher p-values; longer time windows lead to lower p-values.

23 Two exceptions for the BEL-20 can be found for this conclusion.

24 One exception can be observed: monthly returns for the S&P-500 in windows of 5 years lead to unambiguous conclusions across samples.

25 Seven conflicting results on a total of 40 samples across the four stock market indices.

In brief, we find the ADF unit root test to be robust for daily data, regardless of the time window length. For weekly data, the ADF test seems robust in combination with windows of 10 and 20 years. For monthly data, the ADF test is only robust when using windows of 20 years, given our sample of data. A positive relationship is found between time interval and p-value; a negative relationship exists between time window and p-value. P-values for the BEL-20 and the S&P-500 are also significantly larger than for the DJIA and NASDAQ. A further discussion of these results can be found in section 6.

3.3. Tests based on non-linear serial dependence

A third and last traditional way of examining weak form market efficiency is through tests based on non-linear serial dependence. Because of the overwhelming number of possible non-linear dependency tests and their level of complexity, however, we choose not to test them separately for robustness. Instead, we describe this type of test in more detail and present some important insights for the debate based on the bulk of earlier empirical research.

The first two types of traditional tests we discussed were applied numerous times in the aftermath of the establishment of the EMH. Despite their popularity, however, they were not perfect instruments for examining weak form market efficiency. As we explained before, the ADF unit root test was shown to be inconclusive as a tool to test for weak form market efficiency, as it does not take into account the sufficient condition of serial independence.
The VR test was critiqued as well, mostly by proponents of the EMH, who claimed that the test is biased towards the conclusion that stock markets are not weak form efficient. Continuing innovations on Lo and MacKinlay's original VR test nullified most of these critiques (e.g. the less popular Chow and Denning (1993) test). In later years, scholars from the behavioral finance side of the debate also started to question the original VR tests for the exact opposite reason: an apparent bias towards the conclusion that stock markets were indeed weak form efficient. The main argument presented by these behavioral researchers was that VR tests use a market efficiency proxy (i.e. linear autocorrelation) that is only able to pick up linear forms of serial dependence, while non-linear serial dependence could also spur market inefficiency. In fact, Lim and Brooks (2011) pointed out that the first statistical evidence for the argument of the behavioral finance researchers was published even before Lo and MacKinlay (1988) presented their VR test. Granger and Andersen (1978) showed that tests based on autocorrelation are ineffective in detecting processes that exhibit nonlinear rather than linear autocorrelation. Later, Hinich and Patterson (1985) pointed out that the definition of a pure white noise process like the random walk (version 1) had become blurred, mostly because of the work of Jenkins and Watts (1968) and Box and Jenkins (1970). At the time, most researchers ignored nonlinear relationships by implicitly assuming that an observed time series is generated by a Gaussian process. Consequently, only linear autocorrelation was used to test for white noise. However, researchers should be careful not to mix up the concepts of a white noise process and a pure white noise process, as their mutual relation is asymmetrical. All pure white noise series, like the first version of the random walk, are serially uncorrelated and thus white noise (Lim & Brooks, 2011). The reverse, however, is not true unless the time series at hand is normally distributed. This more technical discussion from the field of statistics had consequences for testing weak form market efficiency. Disregarding the asymmetrical relationship between white noise and pure white noise, scholars who did not find linear autocorrelation in a return series concluded that this return series follows a random walk and is thus weak form efficient. However, as was pointed out by Hinich and Patterson (1985), this conclusion only holds under the assumption that the return series at hand is normally distributed, which is untenable in practice for financial data. Consequently, new tests were necessary to take into account the possibility of nonlinear serial dependence driving weak form market inefficiency.

As revealed by Lim and Brooks (2011), it was chaos theory from the field of physics that sparked new thoughts among researchers in finance about potential nonlinear return predictabilities in the short term. Almost all of the early evidence also pointed towards the existence of such nonlinear serial dependencies. Given the complex nature of chaos theory, however, most research started focusing on the development of tests for nonlinear stochastic dependence, which can be divided into two categories. The first group of tests examines a null hypothesis of linearity, without specifically defining a nonlinear alternative in the event the null hypothesis is rejected.
However useful they are as general tests of nonlinearity, the lack of insight into the type of nonlinear dynamics makes these tests less powerful. Examples are the bispectrum test (Hinich, 1982), the Brock-Dechert-Scheinkman (BDS) test (Brock et al., 1996) and the bicorrelation test (Hinich, 1996). The second group of nonlinear tests addresses the main drawback of the first group by explicitly testing against a well-specified nonlinear alternative. Popular types of tests used in this category are the Lagrange multiplier (LM) test, the likelihood ratio test and the Wald test. Among the specific nonlinear alternatives considered in this group of tests is the autoregressive conditional heteroscedasticity (ARCH) model (Engle, 1982). From our brief literature review, it is clear that there is some validity to the concept of non-linear dependency in efficiency testing. A further digression on the results from each of the three traditional tests of weak form market efficiency is presented in section 6.

4. Alternative methodological approach

To overcome the conflicting results found when examining weak form market efficiency in the traditional way, Campbell et al. (1997) introduced the idea of relative market efficiency, which captures the degree to which a market is efficient through time. Rather than phrasing the conclusion in terms of weak form efficient or not, this approach accommodates more nuance (e.g. a market becoming more or less efficient). Following Campbell et al., three possible alternative approaches were developed. In this section, we further explore these alternatives and look for important insights from earlier empirical research to uncover opportunities for a way out of the debate.

4.1. Non-overlapping subperiod analysis

As suggested by its name, non-overlapping subperiod analysis is implemented by dividing a sample of stock market data into subsequent non-overlapping subsamples (Lim & Brooks, 2011). These subsamples are not selected randomly, but rather follow from ex ante insights about possible underlying factors that drive market efficiency. For example, if a new policy measure regulating the financial sector was implemented, it could be interesting to introduce a transition point in the dataset on the day this policy measure was enacted into law. The combination of possible underlying factors or events eventually determines the number of subsamples the dataset is divided into. If, for example, 5 underlying events were considered, the dataset would be subdivided into 6 subperiods. After the division of the data into subsamples, researchers can estimate weak form market efficiency in every subsample and observe the impact of the predetermined underlying events through possible significant changes in the test results. The estimation of weak form market efficiency per subperiod can be executed in several ways, for example by means of a traditional test of weak form market efficiency.

In their survey paper, Lim and Brooks (2011) give an overview of different types of underlying factors that have been examined in the past. For example, researchers studying the effect of the liberalization of financial markets on weak form efficiency found mixed results. We can thus not conclude that liberalization always benefits market efficiency. Another underlying factor that has been researched is the improvement of information technology. Although it seems natural that technological advances would improve overall market efficiency, results from the literature are ambiguous.
Gu and Finnerty (2002) find that autocorrelation in returns drops drastically after the 1970s, which they ascribe to continuing improvements in technology. Other research, however, concluded that the implementation of electronic trading systems did not lead to an unequivocal increase in weak form market efficiency (Lim & Brooks, 2011). The effects of regulatory and policy measures on market efficiency mainly depend on the nature of the implemented measure. Results from earlier research point towards efficiency benefits as a result of deregulation. Policies that are meant to intervene in the market seem to have detrimental effects on weak form efficiency. For example, price limits introduced to circuit-break the system in times of extreme volatility were shown to disrupt the market equilibrium and hence decreased weak form market efficiency. Obviously, non-overlapping subperiod analysis is particularly helpful for policy makers. For the purpose of broad market efficiency research, however, this approach is somewhat less well suited. Given the scope of our paper, we limit the discussion of non-overlapping subperiod analysis to this review of prior research and a further discussion of earlier results in section 6.

4.2. Rolling estimation windows

Rolling estimation windows constitute a second group of alternative approaches that have been developed in response to the concept of time-varying efficiency. We first introduce the rolling estimation windows, after which we review earlier research implementing this technique. Finally, we implement a robustness analysis.

4.2.1. Introduction to rolling estimation windows

In contrast to the non-overlapping subperiods, the rolling estimation technique transforms a full sample of data into a series of overlapping subsamples. Starting from the first observation, a subsample of a certain length is created. Next, this first time window is pushed forward one observation at a time. The process ends when a window is created that covers the last observation for the first time. More specifically, a sample of $n$ observations is divided into $n - l + 1$ windows, with $l$ being the length of the window (Lim & Brooks, 2011). Rolling estimation windows are better suited for broad market efficiency research than non-overlapping subperiod analysis because no ex ante decisions have to be made about underlying factors. Since efficiency can now be studied on a rolling basis, the approach also allows for more nuanced conclusions that take into account the possibly time-variant character of weak form market efficiency.

Between the time that Campbell et al. (1997) coined the idea of a varying degree of market efficiency and today, a lot of research involving rolling estimation windows has been done. A recent overview of this research can be found in Lim and Brooks (2011). Some of the specific models that have been developed are rolling VR tests; rolling ADF unit root tests; rolling bicorrelation tests; rolling parameters of ARCH models; and rolling Hurst exponents. As becomes immediately obvious, most of these tests are simply extensions of the traditional tests of market efficiency (e.g. the VR test and the ADF test). Instead of examining an entire data set at once, rolling windows are implemented and a time-varying degree of efficiency is measured by applying traditional tests of weak form market efficiency to every rolling window.
Plotting the results of the traditional weak form market efficiency test for each individual window then yields a time-varying measure of predictability, which can in turn be interpreted as a time-variant degree of weak form market efficiency. The straightforward nature of this extension, despite the added complexity, has made rolling estimation windows the most popular alternative methodological approach to efficiency measurement. Nevertheless, as with every methodology, certain criticisms can be raised. For instance, these tests are again subject to sensitivity in an underlying parameter: the length of the rolling time windows. After a brief review of earlier research implementing rolling estimation windows, we address this issue by implementing a robustness check.

4.2.2. Review of rolling estimation literature

The most important and popular application of rolling estimation windows has been to acquire relative measures of weak form market efficiency. Among others, Cajueiro and Tabak (2004, 2005) conducted multiple studies testing relative weak form market efficiency and were able to visualize the time-variant weak form efficiency character of different Asian stock markets. Ranking these stock markets, they found a high degree of variation in the extent of weak form efficiency. Covering both developed and emerging stock markets, Lim and Brooks (2006) confirm the high degree of variation in relative efficiency across stock markets and find that developed markets are relatively more efficient than their emerging counterparts.

An alternative application of rolling estimation windows is related to the technique of non-overlapping subperiod analysis. Instead of an ex ante specification, rolling estimation windows can be used to identify important events ex post, by looking at shifts in the measure of relative weak form efficiency. For example, Alvarez-Ramirez, Alvarez, Rodriguez and Fernandez-Anaya (2008) find that the termination of the Bretton Woods system in 1971 coincides with a shift towards weak form market efficiency of the Dow Jones. Other applications of rolling estimation windows that are less interesting for the purposes of our research can be found in Lim and Brooks (2011).

4.2.3. Robustness analysis: Rolling variance ratio tests

As we did for the two traditional tests of market efficiency, we implement a robustness analysis for rolling estimation windows. More particularly, we choose to focus on rolling VR tests, because we already explored the traditional VR test in section 3. Rolling ADF unit root tests are not considered, as they tend to be less popular for rolling window research following the critique formulated by Rahman and Saadi (2008). Gretl is again used to execute our analysis. Given that the rolling window technique simply consists of applying traditional market efficiency tests to rolling windows, the robustness check can be approached in two stages. The first stage is the robustness check of the applied traditional test of market efficiency. In the case of rolling VR tests, we already completed this first stage in section 3, where we learned that VR tests are best implemented using daily data. The second stage addresses the possible sensitivity of the rolling VR test to the length of the rolling time windows.
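Mechanically, this second stage boils down to re-running the VR test on each of the n - l + 1 overlapping windows and summarizing the resulting p-values. A minimal sketch, reusing the vr_pvalue helper sketched in section 3 and anticipating the efficiency ratio defined below:

```python
import numpy as np

def rolling_vr_pvalues(returns, q, window_len):
    """Apply the VR test to each of the n - l + 1 overlapping windows."""
    n = len(returns)
    return np.array([vr_pvalue(returns[i:i + window_len], q)
                     for i in range(n - window_len + 1)])

def efficiency_ratio(pvals, alpha=0.05):
    """Share of windows that reject the random walk null at the 95% level."""
    return np.mean(np.asarray(pvals) < alpha)
```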
Thus, we now apply VR tests to daily data organized in rolling windows of varying lengths: 50 days (2 months26), 130 days (6 months), 260 days (1 year), 520 days (2 years), 780 days (3 years), 1040 days (4 years), 1300 days (5 years), 2600 days (10 years), 3900 days (15 years) and 5200 days (20 years). For every combination of stock market index, order of differentiation (q) and rolling window length, we calculate p-values from the VR test statistics for the windows included in the rolling VR test. In order to obtain a clear idea of the relative degree of weak form efficiency, we implement an aggregate efficiency measure, the efficiency ratio: the ratio of the number of windows that reject the null hypothesis of a random walk at a confidence level of 95% over the total number of windows. From this aggregate measure across the different window lengths and orders of differentiation, we can discuss the robustness of the rolling VR test. Next to the aggregate efficiency measure, we also take into account the specific time-varying nature of these tests by looking at the obtained time series of p-values for the different window lengths. An overview of the obtained efficiency ratios can be found in table 10.

26 A rolling time window of length 25 days (1 month) is not considered, as it is too small to be used in combination with an order of differentiation of 16.

Table 10: Efficiency ratios

The table presents efficiency ratios for different combinations of stock market index, rolling window length and order of differentiation. The number of observations in the different rolling windows is indicated by l. The orders of differentiation are denoted by q. For each of the four considered stock market indices, the efficiency ratio is calculated as the ratio of the number of windows that reject the null hypothesis of a random walk at a confidence level of 95% over the total number of windows.

Note that the efficiency ratios roughly increase with the order of differentiation. The same observation was made through the OLS regression in the robustness analysis of the traditional VR test. In interpreting the results, we consider the different orders of differentiation and take this positive relationship into account. Results are again fairly similar across stock markets. We observe a negative relationship between the relative degree of efficiency and the window length. This also means that research considering only one window length could be biased. For example, if a researcher were to pick rolling windows of length 5 years (1300 observations), he or she would observe very low degrees of efficiency for the four considered stock market indices. If that same researcher had considered rolling windows of length 1 year, the conclusion would have been less extreme. Rolling windows of 2 or more years yield results that are mostly useless for further exploration or discussion, as the time-varying degree of efficiency is wiped out. Results from rolling windows of 2 months (50 observations) prove to be out of touch with the results from other rolling window lengths. Given the time-varying nature of rolling estimation windows, we also plot the VR test p-values of the rolling windows through time for every pair of index and order of differentiation. As this would take up too much space, we only include the graph for the DJIA with order of differentiation 2 in appendix B, which proved to be representative of the other cases as well.
From this graph, we gain the same insights as from the aggregate efficiency measure. The path of the p-values through time for windows of length 2 months is not in line with the paths for longer windows. Once the windows become too large (> 1 year), the time-varying degree of efficiency vanishes. All in all, we can conclude that rolling VR tests are best implemented using daily data organized in windows with a length between 6 months and 1 year. Weekly and monthly data are more prone to sensitivity. Other window lengths can lead either to p-values that are biased upward or to results that no longer incorporate the time-varying concept of efficiency.

4.3. Time-varying parameter models

The third and last alternative method for weak form market efficiency testing is the time-varying parameter model, which draws from a state space approach that allows for dynamic regression parameters that evolve through time. The origins of the time-varying parameter model lie in the work of Kalman (1960) and developments in the field of engineering (Durbin & Koopman, 2008). To this day, a lot of new research in mathematics and statistics is still being devoted to the further development of this methodology. The application of time-varying parameter models in finance is therefore still in its infancy, and a robustness check like the one we implemented for the rolling estimation windows is not sensible. We further explore the methodology based on the existing body of research and look for insights that can be useful in better grasping how the current debate came into existence. Additionally, we expand upon the test for evolving efficiency and introduce a theoretical extension that might be worthwhile to implement in future research.

The first study implementing a time-varying parameter model came from Emerson, Hall and Zalewska-Mitura (1997). Drawing from the autocorrelation proxy of weak form market efficiency, they designed a state space model with time-variant autocorrelation coefficients, which can be interpreted as a time-varying measure of efficiency. Building on this research, Zalewska-Mitura and Hall (1999) developed a formal test for evolving efficiency, in which a state space approach is applied to a GARCH-M model to let parameters evolve through time. The GARCH-M approach is used to deal with problems of non-constant error variance (heteroscedasticity) and autocorrelation of residuals, while taking into account the financial risk premium property (Hill et al., 2011). The state space modeling accommodates the time-varying approach to efficiency. The combination of GARCH and state space modeling essentially boils down to a time-varying parameter on the lagged return, with heteroscedasticity and autocorrelation of residuals being controlled for. As was mentioned in the literature review, the test for evolving efficiency has mainly been applied to developing markets, as these are not believed to be "born" efficient. In their paper, Emerson et al. (1997) found varying degrees of efficiency among Bulgarian shares. They also found variation in the time it takes for Bulgarian shares to become more efficient. Zalewska-Mitura and Hall (1999) find similar results for Hungarian stocks, using their test for evolving efficiency. Other researchers have applied this test to stock markets on other continents.
For example, Abdmoulah (2010) finds no clear evolution towards weak form market efficiency for 11 Arab stock markets, concluding that past reforms of the Arab financial markets have been ineffective in addressing informational inefficiency. The test for evolving efficiency has also been applied to examine the evolving stock market efficiency of the British FTSE-100, but there the autocorrelation parameter did not really evolve through time. Nevertheless, later studies have shown that the test for evolving efficiency can be useful for developed markets as well.

Next to introducing the test for evolving efficiency, we also propose a theoretical expansion by adopting the property of asymmetric reaction to information, i.e. people react more heavily to negative than to positive news. This is not just a behavioral observation but also a widely accepted decision bias, as stock markets prove to be more sensitive to negative than to positive news (Shefrin, 2000). For the sake of simplicity, we develop the expanded test starting from an ordinary least squares (OLS) regression, modeling the return of today as a function of the returns of the days before, with $i = 1, \ldots, n$:

$r_t = \alpha + \sum_{i=1}^{n} \beta_i r_{t-i} + \epsilon_t$

In this OLS regression, $\alpha$ is a constant, $n$ is the number of included lagged returns, $\beta_i$ is the parameter of the $i$th lagged return and $\epsilon_t$ is the random error term.

A major problem with this simple approach to weak form market efficiency testing is that the underlying assumptions of independent and homoscedastic residuals are not met with financial data. A common solution is to implement a GARCH-M model, which allows for a separate estimation of the variance of the random error term through a variance function, resolving the issue of heteroscedasticity (Engle, 2001). Autocorrelation in the variance function is addressed by adopting a geometric lag structure. Additionally, the typically positive relationship between risk and return is taken into account by incorporating the variance of the residuals in the return equation. To model the common asymmetric reaction to information, a threshold component is added as well. More specifically, an asymmetry term is added to the variance function to distinguish between the effects of good and bad news on stock markets. The model now looks like the following:

$r_t = \alpha + \sum_{i=1}^{n} \beta_i r_{t-i} + \delta h_t + \epsilon_t, \quad \epsilon_t \sim N(0, h_t)$

$h_t = \omega + \theta h_{t-1} + \phi \epsilon_{t-1}^2 + \lambda D_{t-1} \epsilon_{t-1}^2, \quad D_{t-1} = 1 \text{ if } \epsilon_{t-1} < 0, \text{ else } 0$

The $\alpha$, $\beta_i$, $n$ and $r_{t-i}$ are the same as in the OLS regression. The random error term $\epsilon_t$, however, is different, as its variance $h_t$ is now modeled separately in the second equation, which resolves the issue of heteroscedasticity. The variance of the residuals ($h_t$) is also included in the first equation with its own parameter $\delta$, representing the risk-premium property. The variance function consists of a constant $\omega$ and a geometric lag structure that deals with the issue of autocorrelation: the variance of the residuals of yesterday ($h_{t-1}$) with parameter $\theta$ and the squared residual of yesterday ($\epsilon_{t-1}^2$) with parameter $\phi$. The threshold component ($D_{t-1}\epsilon_{t-1}^2$), together with its parameter $\lambda$, is added to the variance function to account for the asymmetric investor reaction to new information.

One last problem remains with this approach: static parameters. To overcome this problem, we adopt a state space model, which allows for parameters that dynamically evolve through time. What we end up with is an adapted version of the test for evolving efficiency:

$r_t = \alpha_t + \sum_{i=1}^{n} \beta_{i,t} r_{t-i} + \delta h_t + \epsilon_t$

$\alpha_t = \alpha_{t-1} + u_t$

$\beta_{i,t} = \beta_{i,t-1} + v_t$

The first equation represents the measurement equation of the state space model; the other two equations are state equations. The variance function remains exactly the same. In the first equation, $\alpha$ and $\beta_i$ become time-variant and change into $\alpha_t$ and $\beta_{i,t}$. The last equation presents the dynamic estimation of $\beta_{i,t}$ as $\beta_{i,t-1}$ plus a separate random error term $v_t$ with its own variance $\sigma_v^2$.
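In practice, the time-varying parameters are estimated with a Kalman filter, typically inside a maximum likelihood routine. The following deliberately simplified sketch filters a single time-varying coefficient on the first lagged return under homoscedastic errors, so the GARCH-M variance function, the time-varying intercept and the asymmetry term of the full model are all omitted; the variance hyperparameters are arbitrary illustrative values.

```python
import numpy as np

def tv_beta_kalman(r, sigma_eps2=1e-4, sigma_v2=1e-6, beta0=0.0, P0=1.0):
    """Kalman filter for r_t = beta_t * r_{t-1} + eps_t,
    with state equation beta_t = beta_{t-1} + v_t."""
    y, x = r[1:], r[:-1]
    beta, P = beta0, P0
    betas = np.empty(len(y))
    for t in range(len(y)):
        P = P + sigma_v2                         # predict: state uncertainty grows
        S = x[t] ** 2 * P + sigma_eps2           # innovation variance
        K = P * x[t] / S                         # Kalman gain
        beta = beta + K * (y[t] - beta * x[t])   # update with the forecast error
        P = (1.0 - K * x[t]) * P
        betas[t] = beta
    return betas
```

A filtered path of betas hovering around zero then corresponds to weak form efficiency, while persistent departures from zero signal predictability.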
Another interesting approach combines elements of the time-varying parameter model and the rolling estimation windows. This methodology, developed by Ito and Sugiyama (2009), first uses a moving time window approach to calculate time-varying autocorrelations ($\hat{\rho}_t$) on a weekly or monthly basis. This simply consists of dividing $n$ observations into $n - l + 1$ subsamples, $l$ being the window length, and then calculating first-order autocorrelations within the different subsamples. Next, a state space model drawing from these time-varying autocorrelations ($\hat{\rho}_t$) is implemented:

$\hat{\rho}_t = \beta_t \hat{\rho}_{t-1} + \epsilon_t$

$\beta_t = \beta_{t-1} + v_t$

The first equation is again the measurement equation, but this time without a time-varying intercept and the risk-premium component, and starting from the calculated rolling autocorrelations rather than the return. The second equation is again the state equation, identical to the one in the test for evolving efficiency. Initial results from this methodology point towards varying levels of efficiency for the S&P-500 (Ito & Sugiyama, 2009).

5. Alternative theoretical framework

Focusing on the lack of a new theoretical framework on efficient markets as one of the causes of the debate, we now consider an interesting alternative proposed by Lo (2004, 2005). We first present the framework and then implement a rolling VR test to look for empirical validation.

5.1. Adaptive markets hypothesis: A possible reconciling framework

In an attempt to reconcile the theories of the EMH and behavioral finance, Lo (2004, 2005) came up with the adaptive markets hypothesis (AMH). Starting from the concepts of bounded rationality and satisficing27, and the notion of biological evolution, he argued that many of the biases found in behavioral finance follow a certain evolutionary path, in which individuals try to learn and adapt to new market conditions. This learning and adaptation process is driven by competition among investors, and natural selection determines the market ecology, with some investors being driven out of the market and some investors remaining. The process of natural selection and competition shapes the evolutionary dynamics underlying the market, which are mirrored in the degree of efficiency. As long as there is no market shock that causes the market ecology to change, stock markets are fairly efficient. Once a certain event triggers the process of competition and natural selection, markets become temporarily less efficient. When the new market ecology is formed, the efficiency of financial markets returns to pre-shock levels.

Looking at the recent financial crisis, we can recognize certain elements of Lo's theory. Financial markets had been fairly stable for some years and a reasonable degree of market efficiency had been reached. Nevertheless, investors also demonstrated some degree of irrational behavior, which eventually led to the housing bubble and markets exhibiting higher degrees of inefficiency. Since mortgages had been transformed into investment vehicles sold across the globe, the housing crisis quickly evolved into a global financial crisis. Investors had to learn from their mistakes and needed to adapt to the new market conditions. In the search for optimal investment strategies, competition started between new and incumbent investors. Those investors that did not learn quickly enough and/or did not adapt to the new circumstances lost so much money that they were driven out of the market.
The new market ecology was formed, consisting of those investors that had learned and adapted promptly, and a new evolution towards efficiency started. From the AMH theory and the example, a reconciliation between the EMH and behavioral finance becomes apparent. Markets are neither perfectly efficient nor inefficient all the time; there is a certain evolutionary aspect to the process of market efficiency. For a long time, stock markets can process information in a reasonably efficient manner, until a certain shock, crash or other event disrupts this state of efficiency. Some market participants are driven out of the market and some new participants enter the market. During this process, in which a new market ecology is formed, relative levels of inefficiency are found. Once the transformation period ends, levels of market efficiency are restored, until a new crash, shock or other event disrupts the newfound ecological equilibrium. Note that the AMH theory is also similar to the idea of the alternative methodologies in which market efficiency is measured in relative degrees over time.

27 Humans have neither the information nor the methodology to always optimize in a rational way. Consequently, they use rules of thumb or heuristics to find satisfactory results that are not necessarily rational (Simon, 1955).

5.2. Empirical validation of the adaptive markets hypothesis

After introducing Lo's AMH theory, we now look for empirical validation by employing a rolling VR test that builds on the insights of our robustness analysis. We also review earlier empirical research testing Lo's theory.

5.2.1. First attempts to empirically test the adaptive markets hypothesis

The earliest attempt to empirically investigate the AMH comes from Lo himself. Computing rolling first-order autocorrelations of monthly returns as a measure of market efficiency, Lo (2004, 2005) finds a cyclical pattern through time, which confirms the idea of underlying dynamics to the degree of market efficiency. However, Lo's estimated rolling autocorrelation measures are not in line with the idea of markets being relatively efficient for a long time, until a market crash causes a short period of relatively lower efficiency. Rather, his empirical evidence points towards the reverse.
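Such a rolling autocorrelation series is straightforward to compute. A small sketch (our own, with pandas as an assumed tool and a 60-month window chosen for illustration rather than taken from Lo's papers):

```python
import numpy as np
import pandas as pd

# Simulated monthly returns stand in for the index data Lo used.
returns = pd.Series(np.random.default_rng(1).standard_normal(600) * 0.04)

# Rolling first-order autocorrelation over 60-month (5-year) windows;
# values near zero proxy weak form efficiency, larger absolute values
# proxy predictability and hence relative inefficiency.
rho = returns.rolling(window=60).apply(lambda w: w.autocorr(lag=1), raw=False)
```

The cyclical pattern Lo reports is then simply the fluctuation of this series through time.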
In later years, researchers examined the AMH by means of trading strategies. Investigating the profitability of moving average strategies on the Asia-Pacific financial markets, Todea, Ulici and Silaghi (2009) confirm the cyclical efficiency pattern of the AMH. Neely, Weller and Ulrich (2009) study excess returns earned by various technical trading rules on foreign exchange markets. They find these returns to decline over time, but at a slower pace than expected under the EMH because of behavioral and institutional factors. These findings are consistent with the AMH view of markets being dynamic systems subject to underlying evolutionary processes. Finding a higher degree of stock market predictability in times of economic and political crises, Kim et al. (2011) confirm Lo's idea of time-varying market efficiency being driven by changing market conditions. During market bubbles and crashes, however, they find virtually no return predictability. This is at odds with Lo's AMH, which states that higher degrees of predictability, and thus lower degrees of efficiency, ought to be found in times of market mania. Implementing an OLS regression, Kim et al. (2011) find inflation, risk-free rates and stock market volatility to be important factors influencing stock return predictability over time.

The first evidence from empirical studies shows that there is some value to the idea of adaptive markets. However, some discrepancies were found as well. Given these observations and the limited number of studies empirically testing the AMH, we apply a rolling VR test to gain more insights into the validity of Lo's theory.

5.2.2. Application of a rolling variance ratio test to the adaptive markets hypothesis

Our choice of the rolling VR test to investigate the AMH follows naturally from the fact that both the test and the theory find their origin in the concept of time-varying efficiency. In contrast to earlier research, we also take underlying sensitivities to the applied methodology into account by drawing from the findings of our earlier robustness analysis of the rolling VR test. Specifically, we use daily data and rolling windows of length 6 months (130 observations) and 1 year (260 observations). The same four stock market indices as before are considered and the selected orders of differentiation are still 2, 4, 8 and 16. We calculate the p-values of the VR test statistics for the different rolling windows in the data samples and plot the results for the different orders of differentiation for every combination of stock market index and rolling window length. The obtained graphs present the evolution of VR test p-values through time and can be interpreted as a time-varying measure of efficiency, which allows us to comment on the validity of the ideas underlying the AMH. The results are obtained with the statistical package Gretl and are presented graphically in figure 1.

Figure 1: Rolling VR test p-values for the DJIA, S&P-500, NASDAQ and BEL-20 for windows of length (l) 130 and 260 observations. Every graph presents the p-values of the VR test for the different rolling windows in the data sample. Per index, two graphs are included: one for windows with length (l) 130 observations (6 months) and one for windows with length (l) 260 observations (1 year). For every combination of index and rolling window length, p-values are plotted for four different orders of differentiation (q).
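The results in figure 1 were obtained with Gretl; purely to make the mechanics concrete, the rolling VR computation can be sketched as follows (our own Python sketch under simplifying assumptions: it uses the homoscedastic Lo and MacKinlay (1988) statistic with its asymptotic normal distribution and without finite-sample corrections, whereas applied work often prefers heteroscedasticity-robust or joint variants such as Chow and Denning (1993)):

```python
import numpy as np
from scipy.stats import norm

def vr_pvalue(returns, q):
    """Two-sided p-value of the homoscedastic Lo-MacKinlay VR test.

    VR(q) compares the variance of q-period returns with q times the variance
    of 1-period returns; under a random walk, VR(q) = 1.
    """
    r = np.asarray(returns, dtype=float)
    n = len(r)
    mu = r.mean()
    var1 = np.sum((r - mu) ** 2) / n
    rq = np.convolve(r, np.ones(q), mode="valid")  # overlapping q-period returns
    varq = np.sum((rq - q * mu) ** 2) / (n * q)    # simplified overlapping estimator
    vr = varq / var1
    se = np.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * n))
    z = (vr - 1.0) / se
    return 2.0 * (1.0 - norm.cdf(abs(z)))

def rolling_vr_pvalues(returns, window, q):
    """VR test p-values over rolling windows of the given length."""
    return [vr_pvalue(returns[t:t + window], q)
            for t in range(len(returns) - window + 1)]
```

Running `rolling_vr_pvalues` on daily index returns with `window` set to 130 or 260 and `q` in {2, 4, 8, 16} yields p-value paths of the kind plotted in figure 1.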
When looking at the plots in figure 1, we observe a cyclical pattern like the one noticed by Lo (2004, 2005) when using rolling first-order autocorrelations. The patterns also exhibit some similarities across the different stock indices. Peaks in p-values can be found at the end of the 1980s and around 2010, both periods of stock market crises. Together, these observations provide additional validation for the idea of adaptive markets. Even though p-values for the second order of differentiation ought to be smaller than those for higher orders of differentiation, we detect a few observations in which this is not the case. Further examination did not yield a precise explanation, other than that these irregularities seem to coincide with periods of increased market turbulence. This, however, does not interfere with the overall interpretation of the results or the obtained conclusions.

There is also a peculiarity to the graphs that is at odds with the AMH: the peaks in p-values represent periods with a relatively high degree of weak form market efficiency, as the null hypothesis of returns following a random walk is no longer rejected. Rather than periods with a high degree of efficiency being disrupted by a crash causing a period of relative inefficiency, the rolling estimation windows point to the reverse. Stock market indices appear to be relatively inefficient for some time, until a short period with a high degree of efficiency starts. Although in contrast with the theory suggested by Lo (2004, 2005), this observation is not that different from what he observed empirically by means of his rolling autocorrelation test. Our observation is also in line with what was found by Kim et al. (2011). They provide a possible explanation, stating that in times of turbulence markets are harder to predict, causing tests of efficiency based on predictability to point towards a higher degree of efficiency. A further digression on these results, together with the results of our earlier investigation of traditional and alternative weak form market efficiency tests, is presented in the next section.

6. Discussion of results

Further discussing the results from our earlier analysis of both traditional and alternative test methodologies, and the alternative theoretical framework, we now aim to infer some important lessons from the past. Additionally, we use these results to look at the future and come up with suggestions for further research.

6.1. Traditional tests of weak form market efficiency

Our treatment of the more traditional tests of market efficiency allows us to learn something about the role methodology has played in the development of the debate. From our robustness analysis we learn that the VR test is unlikely to have complicated the debate if only daily or weekly data, or monthly data in combination with time windows of 20 years, had been used in research. However, scanning the literature, we find that this was not always the case. For example, Charles and Darné (2009) indicate that research employing Lo and MacKinlay's VR test to check for weak form market efficiency of Latin American stock markets has led to conflicting results. Furthermore, they observed that "the results are overall mixed and scattered over studies that employ different sample periods, methods and data frequencies" (p. 518). This observation is perfectly consistent with our findings on the sensitivity of the VR test to decisions on time interval (i.e. data frequency) and time window (i.e. sample period). In sum, we can conclude that methodology, and more particularly the VR test, might have complicated the debate.

From our analysis, we also learn that prior research using the ADF unit root test in combination with daily data probably did not spur controversy. However, with weekly and monthly data, depending on the sample time window length, the ADF unit root test may have led to conflicting results. Lean and Smyth (2007) reviewed the literature of studies employing ADF unit root tests to examine weak form market efficiency and concluded that "the empirical evidence on the random walk hypothesis from these studies is mixed" (p. 17). Given our findings on the sensitivity of the ADF test to decisions on time window and time interval, we can understand how these conflicting results could have been found. Since both the ADF unit root test and Lo and MacKinlay's VR test have been popular tests for weak form market efficiency, we also need to examine to what extent these tests lead to consistent results.
All in all, we see that both tests only yield consistent results for daily data, for weekly data in combination with time windows of 10 and 20 years, and for monthly data in combination with time windows of 20 years. Nonlinear serial dependence also proved to be a significant feature of stock market returns across the globe. The most important insight for the purpose of our research is that, as suggested by behavioral scholars, there is more to weak form market efficiency than the predictability uncovered by linear autocorrelation tests. This also helps explain part of the debate, as it weakens the arguments of proponents of the EMH drawing from linear autocorrelation tests. However, since few nonlinear tests provide more information on the specific form of the underlying nonlinear process, it was difficult for behaviorists to make a strong case for stock markets being generally inefficient. To this day, the emergence of new methodologies looking at nonlinear serial dependence has not led to a conclusion of the debate, but has instead further increased controversy.

Altogether, our analysis yields an important lesson from the past: traditional methodologies to test for weak form market efficiency may be considered one of the causes of the debate. Tests based on both the autocorrelation and the stationarity proxy are prone to sensitivity, which might have led to conflicting results. Also, these tests do not always lead to the same conclusions, which might have induced further controversy. Finally, the consideration of nonlinear serial dependencies proved to be useful, but eventually only added complexity to the debate, as it remains difficult to specify the underlying nonlinear process.

6.2. Alternative tests of weak form market efficiency

In response to the idea of Campbell et al. (1997) of a time-varying degree of efficiency, several alternative test methodologies were developed. These methodologies have already been applied, but most are still in development today and can be considered tools for future research.

Non-overlapping subperiod analysis has proven to be very valuable in researching the impact of specific events or factors on market efficiency. From earlier research, we learn that weak or mixed results are found on the impact of some broad categories of events. This again proves how complex and multidimensional the concept of market efficiency is. It cannot just be about information being incorporated in market prices in a timely manner, because then efficiency should have increased in response to, for example, liberalization and technological revolutions. Since non-overlapping subperiod analyses are somewhat less appropriate for commenting on the broader issue of market efficiency, academics should be careful not to draw too strong a conclusion from the evidence. Stating more than that these tests provide a better understanding of the complexity of the issue would be scientifically unjust.

The best-developed category of alternative tests is that of the rolling estimation windows. From our exploration we learn that a relative and time-varying approach can be valuable and is less likely to spur controversy. The robustness analysis indicates that rolling VR tests should be implemented on daily data organized in windows with a length between 6 months and 1 year. Rather than immediately settling the debate, we believe the rolling estimation windows can help to redefine the debate first.
Instead of trying to prove whether or not financial markets are weak form efficient, it might be wise to study market efficiency as a time-variant and relative, rather than an absolute, concept.

The time-varying parameter models represent another interesting alternative approach to the examination of weak form market efficiency, but more research is still required. On the one hand, methodological innovations are necessary to further perfect the test as a measure of weak form market efficiency. On the other hand, these kinds of tests need to be applied more often to stock market data to validate earlier findings of time-varying degrees of efficiency. Both the rolling windows and the time-varying parameter models provide a new perspective for the debate, as their time-variant nature enables reconciliation between the extreme views held by adherents of the EMH and of behavioral finance. We suggest that researchers look into the further development of these tests, as this could lead to a definitive redefinition of efficiency from an absolute to a more time-variant and relative state of financial markets. This, in turn, might further accommodate a reconciling conclusion for a debate of many years' standing.

6.3. Alternative theoretical framework on market efficiency

Another reason for the debate remaining unsettled is the lack of an alternative theoretical framework capturing the obtained empirical results. Lo (2004, 2005) tried to fill this void by positing the AMH, which reconciles behavioral finance and the EMH, drawing from insights of evolutionary biology. Both our own empirical investigation of the AMH and earlier research confirmed the dynamic character of weak form market efficiency. However, the theoretical pattern of longer periods of relatively high degrees of efficiency interrupted by shorter periods of relative inefficiency associated with crises can be empirically discarded. In fact, the opposite pattern was found.

We believe the reason for these conflicting patterns lies in the unobservable nature of efficiency, which calls for the use of proxies. Different kinds of proxies have been used in the past, all of which are related to the principle of predictability because of its interesting inverse theoretical relationship with efficiency. Lo's AMH starts from our intuition of markets being capable of efficiently incorporating information most of the time. In times of turbulence, however, this ability of markets might temporarily weaken. Inversely, predictability of stock returns is always assumed to be limited, except in times of market mania. Looking at the estimates of the efficiency pattern from testing the AMH, the inverse relationship between the predictability proxy and efficiency seems imperfect. In fact, for our intuition to be correct, the relationship between predictability and efficiency should be proportional rather than inverse, as the observed peaks in p-values represent periods with a relatively low degree of return predictability. This puzzle helps to understand why Lo's theory has not been able to settle the debate over the last decade. In order to find a way out of this conundrum, we see two possibilities. We could redefine our intuition, as our understanding of the concept of efficiency might be incorrect. This, however, tends to be very difficult and is likely to face strong resistance.
A better solution would be to look for another observable trait of financial markets that could be used as an appropriate proxy for efficiency, which presents an interesting challenge for future research.

7. Conclusion

The debate on efficient markets has come a long way. In fact, many of the most renowned 19th and 20th century economists have contributed to it to some extent. From our research, we find two reasons that help explain the current dispute. First, traditional test methodologies applied in the early aftermath of the establishment of the EMH proved to be prone to sensitivity, which might have spurred conflicting results. Additionally, opponents of the EMH have failed to come up with an improved theoretical alternative.

Next to a better understanding of the past, our research also provides perspectives for the future. Most importantly, we believe that the concept of efficiency should be redefined along the lines of Campbell et al. (1997). Considering efficiency as an absolute and binary state of financial markets has spurred controversy. The idea of efficiency as a time-variant and relative characteristic can pave the way for reconciliation between adherents of the EMH and proponents of behavioral finance. In this regard, the presented alternative methodologies enabling a more dynamic approach to efficiency can be important tools for future research. In addition to an evolution of test methodologies, a new theoretical framework on efficiency is needed. Lo's AMH can be the impetus for such a new framework, but further methodological refinement is needed in testing for market efficiency. Research opportunities remain in the continued development of alternative test methodologies, as they have the potential to become the standard tests of efficiency in the future. Ideally, robustness analyses should be implemented systematically to avoid further controversy. More research is also needed to address the conundrum observed when empirically investigating Lo's AMH. Together, alternative test methodologies and a new theoretical framework on efficiency have the potential to settle the debate by reconciling the ideas of the EMH and behavioral finance through the concept of time-varying and relative efficiency.

References

Abdmoulah, W. (2010). Testing the evolving efficiency of Arab stock markets. International Review of Financial Analysis, 19(1), 25-34.
Alexander, S. S. (1961). Price movements in speculative markets: Trends or random walks. Industrial Management Review, 2(2), 7-26.
Alexander, S. S. (1964). Price movements in speculative markets: Trends or random walks, no. 2. Industrial Management Review, 5(2), 25-46.
Alvarez-Ramirez, J., Alvarez, J., Rodriguez, E., & Fernandez-Anaya, G. (2008). Time-varying Hurst exponent for US stock markets. Physica A, 387(24), 6159-6169.
Bachelier, L. (1900). Théorie de la spéculation. Annales Scientifiques de l'Ecole Normale Supérieure, 3(17), 21-86.
Ball, R. (1978). Anomalies in relationships between securities' yields and yield-surrogates. Journal of Financial Economics, 6(2-3), 103-126.
Barberis, N., Shleifer, A., & Vishny, R. (1998). A model of investor sentiment. Journal of Financial Economics, 49(3), 307-343.
Bodie, Z., Kane, A., & Marcus, A. (2010). Investments (9th ed.). New York: McGraw-Hill Irwin.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis, forecasting and control. San Francisco: Holden-Day.
Brock, W. A., Scheinkman, J. A., Dechert, W. D., & LeBaron, B. (1996). A test for independence based on the correlation dimension. Econometric Reviews, 15(3), 197-235.
Brown, D., & Jennings, R. (1989). On technical analysis. Review of Financial Studies, 2(4), 527-551.
Brown, R. (1828). A brief account of microscopical observations. Edinburgh New Philosophical Journal, 6, 358-371.
Cajueiro, D. O., & Tabak, B. M. (2004). The Hurst exponent over time: Testing the assertion that emerging markets are becoming more efficient. Physica A, 336(3-4), 521-537.
Cajueiro, D. O., & Tabak, B. M. (2005). Ranking efficiency for emerging equity markets II. Chaos, Solitons and Fractals, 23(2), 671-675.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton: Princeton University Press.
Carhart, M. M. (1997). On persistence in mutual fund performance. Journal of Finance, 52(1), 57-82.
Charles, A., & Darné, O. (2009). Variance-ratio tests of random walk: An overview. Journal of Economic Surveys, 23(3), 503-527.
Chow, K., & Denning, K. (1993). A simple multiple variance ratio test. Journal of Econometrics, 58(3), 385-401.
Conrad, J., & Kaul, G. (1988). Time-variation in expected returns. Journal of Business, 61(4), 409-425.
Cowles, A. (1933). Can stock market forecasters forecast? Econometrica, 1(3), 309-324.
Cowles, A. (1944). Stock market forecasting. Econometrica, 12(3-4), 206-214.
Cowles, A., & Jones, H. (1937). Some a posteriori probabilities in stock market action. Econometrica, 5(3), 280-294.
Daniel, K., Hirshleifer, D., & Subrahmanyam, A. (1998). Investor psychology and security market under- and overreactions. Journal of Finance, 53(6), 1839-1885.
De Bondt, W. F., & Thaler, R. (1985). Does the stock market overreact? Journal of Finance, 40(3), 793-805.
De Long, B. J., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1990). Positive feedback investment strategies and destabilizing rational expectations. Journal of Finance, 45(2), 374-397.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427-431.
Durbin, J., & Koopman, S. (2008). Time series analysis by state space methods. Oxford: Oxford University Press.
Emerson, R., Hall, S. G., & Zalewska-Mitura, A. (1997). Evolving market efficiency with an application to some Bulgarian shares. Economics of Planning, 30(2-3), 75-90.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987-1007.
Engle, R. F. (2001). GARCH 101: The use of ARCH/GARCH models in applied econometrics. Journal of Economic Perspectives, 15(4), 157-168.
Fama, E. F. (1965a). The behavior of stock-market prices. Journal of Business, 38(1), 34-105.
Fama, E. F. (1965b). Random walks in stock market prices. Financial Analysts Journal, 21(5), 55-59.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25(2), 383-417.
Fama, E. F. (1991). Efficient capital markets: II. Journal of Finance, 46(5), 1575-1617.
Fama, E. F. (1998). Market efficiency, long-term returns and behavioral finance. Journal of Financial Economics, 49(3), 283-306.
Fama, E. F., & Blume, M. E. (1966). Filter rules and stock-market trading. Journal of Business, 39(S1), 226-241.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3-56.
Fox, J. (2009). The myth of the rational market: A history of risk, reward, and delusion on Wall Street. New York: Harper Business.
Gibson, G. (1889). The stock markets of London, Paris and New York. New York: G.P. Putnam's Sons.
Granger, C. W. J., & Morgenstern, O. (1963). Spectral analysis of New York stock market prices. Kyklos, 16(1), 1-27.
Granger, C. W. J., & Andersen, A. P. (1978). An introduction to bilinear time series models. Göttingen: Vandenhoeck and Ruprecht.
Grossman, S. (1976). On the efficiency of competitive stock markets where traders have diverse information. Journal of Finance, 31(2), 573-585.
Grossman, S., & Stiglitz, J. (1980). On the impossibility of informationally efficient markets. American Economic Review, 70(3), 393-408.
Gu, A. Y., & Finnerty, J. (2002). The evolution of market efficiency: 103 years daily data of the Dow. Review of Quantitative Finance and Accounting, 18(3), 219-237.
Hald, A. (1990). A history of probability and statistics and their applications before 1750. New York: John Wiley and Sons.
Hamilton, J. D. (1994). Time series analysis. Princeton: Princeton University Press.
Hill, R. C., Griffiths, W. E., & Lim, G. C. (2011). Principles of econometrics (4th ed.). New York: John Wiley and Sons.
Hinich, M. J. (1982). Testing for Gaussianity and linearity of a stationary time series. Journal of Time Series Analysis, 3(3), 169-176.
Hinich, M. J. (1996). Testing for dependence in the input to a linear time series model. Journal of Nonparametric Statistics, 6(2-3), 205-221.
Hinich, M. J., & Patterson, D. M. (1985). Evidence of nonlinearity in daily stock returns. Journal of Business and Economic Statistics, 3(1), 69-77.
Ito, M., & Sugiyama, S. (2009). Measuring the degree of time varying market inefficiency. Economics Letters, 103(1), 62-64.
Jegadeesh, N. (1990). Evidence of predictable behavior of security returns. Journal of Finance, 45(3), 881-898.
Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance, 48(1), 65-91.
Jegadeesh, N., & Titman, S. (2001). Profitability of momentum strategies: An evaluation of alternative explanations. Journal of Finance, 56(2), 699-720.
Jenkins, G. M., & Watts, D. (1968). Spectral analysis and its applications. San Francisco: Holden-Day.
Jensen, M. C. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6(2-3), 95-101.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-292.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35-45.
Kemp, A. G., & Reid, G. C. (1971). The random walk hypothesis and the recent behaviour of equity prices in Britain. Economica, 38(149), 28-51.
Keynes, J. M. (1936). The general theory of employment, interest and money. London: Macmillan.
Kim, J. H., Shamsuddin, A., & Lim, K. P. (2011). Stock return predictability and the adaptive markets hypothesis: Evidence from century-long U.S. data. Journal of Empirical Finance, 18(5), 868-879.
Kothari, S. (2001). Capital markets research in accounting. Journal of Accounting and Economics, 31(1-3), 105-231.
Lakonishok, J., Shleifer, A., & Vishny, R. W. (1994). Contrarian investment, extrapolation, and risk. Journal of Finance, 49(5), 1541-1578.
Lean, H. H., & Smyth, R. (2007). Do Asian stock markets follow a random walk? Evidence from LM unit root tests with one and two structural breaks. Review of Pacific Basin Financial Markets and Policies, 10(1), 15-31.
Lehmann, B. (1990). Fads, martingales and market efficiency. Quarterly Journal of Economics, 105(1), 1-28.
Lim, K. P., & Brooks, R. D. (2006). The evolving and relative efficiencies of stock markets: Empirical evidence from rolling bicorrelation test statistics (SSRN Working Paper No. 931071). Retrieved from the Social Science Research Network website: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=931071
Lim, K. P., & Brooks, R. D. (2011). The evolution of stock market efficiency over time: A survey of the empirical literature. Journal of Economic Surveys, 25(1), 69-108.
Lo, A. W. (2004). The adaptive markets hypothesis: Market efficiency from an evolutionary perspective. Journal of Portfolio Management, 30(5), 15-29.
Lo, A. W. (2005). Reconciling efficient markets with behavioral finance: The adaptive markets hypothesis. Journal of Investment Consulting, 7(2), 21-44.
Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. Review of Financial Studies, 1(1), 41-66.
Lo, A. W., & MacKinlay, A. C. (1990). When are contrarian profits due to market overreaction? Review of Financial Studies, 3(2), 175-206.
Lo, A. W., & MacKinlay, A. C. (1997). Maximizing predictability in the stock and bond markets. Macroeconomic Dynamics, 1(1), 102-134.
MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle & C. W. J. Granger (Eds.), Long-run economic relationships: Readings in cointegration. Oxford: Oxford University Press.
Malkiel, B. G. (1992). Efficient market hypothesis. In P. Newman, M. Milgate, & J. Eatwell (Eds.), New Palgrave dictionary of money and finance. London: Macmillan.
Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic Perspectives, 17(1), 59-82.
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7(1), 77-91.
Marshall, A. (1890). Principles of economics. London: Macmillan.
Neely, C. J., Weller, P. A., & Ulrich, J. (2009). The adaptive markets hypothesis: Evidence from the foreign exchange market. Journal of Financial and Quantitative Analysis, 44(2), 467-488.
Pearson, K. (1905). The problem of the random walk. Nature, 72(1865), 294.
Rahman, A., & Saadi, S. (2008). Random walk and breaking trend in financial series: An econometric critique of unit root tests. Review of Financial Economics, 17(3), 204-212.
Regnault, J. (1863). Calcul des chances et philosophie de la bourse. Paris: Mallet-Bachelier et Castel.
Roberts, H. (1967). Statistical versus clinical prediction of the stock market. Unpublished manuscript.
Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3), 341-360.
Samuelson, P. A. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial Management Review, 6(2), 41-49.
Sewell, M. (2011). History of the efficient market hypothesis (UCL Research Note No. RN/11/04). Retrieved from the University College London website: http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/images/Research_Student_Information/RN_11_04.pdf
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3), 425-442.
Shefrin, H. (2000). Beyond greed and fear: Understanding behavioral finance and the psychology of investing. Oxford: Oxford University Press.
Shiller, R. J. (1979). The volatility of long-term interest rates and expectations models of the term structure. Journal of Political Economy, 87(6), 1190-1219.
Shiller, R. J. (1989). Market volatility. Cambridge: The MIT Press.
Shiller, R. J. (2000). Irrational exuberance. Princeton: Princeton University Press.
Shiller, R. J. (2003). From efficient markets theory to behavioral finance. Journal of Economic Perspectives, 17(1), 83-104.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69(1), 99-118.
Smith, A. (1759). The theory of moral sentiments. Indianapolis: Liberty Fund.
Smith, A. (1766). The wealth of nations. New York: P. F. Collier.
Smith, V. L. (1998). The two faces of Adam Smith. Southern Economic Journal, 65(1), 2-19.
Todea, A., Ulici, M., & Silaghi, S. (2009). Adaptive markets hypothesis: Evidence from Asia-Pacific financial markets. Review of Finance and Banking, 1(1), 7-13.
von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton: Princeton University Press.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817-838.
White, H., & Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica, 52(1), 143-162.
Working, H. (1934). A random-difference series for use in the analysis of time series. Journal of the American Statistical Association, 29(185), 11-24.
Wright, J. (2000). Alternative variance-ratio tests using ranks and signs. Journal of Business and Economic Statistics, 18(1), 1-9.
Zalewska-Mitura, A., & Hall, S. G. (1999). Examining the first stages of market performance: A test for evolving market efficiency. Economics Letters, 64(1), 1-12.

Appendix A

Graphical summary table 1: Time-series plots of the daily, weekly and monthly prices of the DJIA, S&P-500, NASDAQ and BEL-20.

Appendix B

This graph presents the p-values from different rolling window lengths through time for the DJIA with order of differentiation 2.