A Tale of Market Efficiency: A Methodological Digress

Hogeschool-Universiteit Brussel
Faculty of Economics & Management
Master of Science Handelsingenieur

Master thesis submitted by Tim VERHEYDEN (153947)
for the degree of Master of Science Handelsingenieur
Promotor: Prof. Dr. Filip VAN DEN BOSSCHE
Copromotor: Prof. Dr. Lieven DE MOOR
Academic year: 2012-2013
Defended in: June 2013
Hogeschool-Universiteit Brussel, Warmoesberg 26, 1000 Brussel, www.hubrussel.be
Abstract
The efficient market hypothesis (EMH) has been subject to debate for decades and has both
proponents and opponents. In fact, the field of behavioral finance was developed in response to the
body of anomalous evidence with regard to the EMH. Reviewing seminal work that underlies the
theory on efficient markets, we provide a historical context for the current debate. Special
attention is devoted to the development of appropriate methodology to test for weak form market
efficiency, the least restrictive version of the EMH. Methodologies developed in the early aftermath
of the debate are explored and tested for robustness to gain a better understanding of how the
debate came into existence. More recent alternative approaches are discussed and examined to
distill suggestions and recommendations for future research. Additionally, we test the adaptive
markets hypothesis (AMH) as a theoretical alternative for the EMH and discuss the results together
with our insights from the methodology as a way out of a debate of many years’ standing. We
advance the thesis that the current debate can to some extent be explained by the sensitivity of the
most widely applied methodologies to test for weak form market efficiency. Alternative
methodologies like rolling estimation windows and time-varying parameter models also prove
useful to test for efficiency. Finally, we find the AMH to be helpful in reconciling the views
of the EMH and behavioral finance, but further research is needed to overcome a conundrum that
presents itself when testing for efficiency through the predictability proxy.
Keywords: efficient market hypothesis; behavioral finance; weak form market efficiency; adaptive
markets hypothesis
JEL-codes: B26, G02, G14
1. Introduction
In the world of academic finance, researchers have been divided over the informational efficiency
of stock markets for more than 40 years. In fact, one could go back to the 18th century to see that
even Adam Smith (1759, 1776), father of modern economics, was torn between two opinions on
the efficiency and self-stabilizing nature of financial and economic markets. Are stock prices quoted
on stock markets in line with the intrinsic value of the underlying financial asset? This question
remains to be answered, as academics cannot seem to find common ground.
Different statistical approaches have been developed to address the question of market efficiency,
but researchers have widely adopted the same kind of methodologies (Lim & Brooks, 2011).
However, common ground remains to be found. The fact that scholars continue to disagree over a
certain issue, even when examining it in exactly the same way, makes us wonder: maybe the way
market efficiency has been examined is biased to begin with? Instead of focusing on the question
of whether stock markets are efficient, this paper looks into the far less popular question of why
the debate has remained unsettled. Traditional methodologies applied
over the years are tested for robustness to gain a better understanding of how the debate came
into existence. Alternative methodologies and an alternative theoretical framework are examined to
gain valuable insights that can help to settle the debate in the future.
Before going into the debate and the methodological issue in more detail, we can start by defining
efficiency based on the work of Fama (1970). When he wrote a review of earlier research on the
efficiency of financial markets, he decided to bundle the evidence in a new concept called the
efficient market hypothesis (EMH). According to Fama (1970, p. 383), “a market in which prices
always fully reflect available information is called efficient.” Although this definition provides us
with a first idea of what an efficient market is, it does not really explain what is meant by available
information. This is why Fama (1970) included some elaboration on this definition, making a
distinction between three types of efficient markets, depending on what information is comprised in
the information set.
- Weak form efficient market (information set = historical price information);
- Semi-strong form efficient market (information set = all publicly available information);
- Strong form efficient market (information set = all information, both public and private).
The original idea of making a clear distinction between different forms of market efficiency comes
from Roberts (1967), but Fama (1970) was most successful in introducing the concept to the general
public. With the additional knowledge on what is meant by available information, it becomes
perfectly clear what efficient markets are. For example, if stock market prices fully reflect all past
price information of the different stocks, the market is called weak form efficient. This also implies
that it is impossible to generate excess returns based on past price information. What this actually
means is that so-called technical analysis [1] of stocks is obsolete and does not generate any risk-adjusted excess returns over the return of the general market. A semi-strong efficient market fully
reflects all publicly available information. Thus, it is impossible to make excess returns based on
information that is publicly available to all financial market participants. Suppose that in today’s
newspaper, a certain company announces news that affects its stock market price. Under the
assumption of semi-strong efficient markets, this information is not useful to make any profits, as
the market would already fully reflect this publicly available information. What this also means is
that fundamental analysis [2] of stocks is rendered ineffective. When a stock market is strong form
[1] Technical analysis consists of looking at charts of past prices and returns of a stock, in order to derive a
certain pattern that can be extended into the future to make profitable predictions of future price movements
(Brown & Jennings, 1989).
[2] Fundamental analysis consists of researching all publicly available information (e.g. financial statements)
about a certain stock to infer important insights that can be used to make a profit in the stock market (Kothari,
2001).
efficient, even inside traders with private information on a stock are not able to generate excess
returns, adjusted for risk, over the general market. This implies that the prices of stocks fully
incorporate all possible information, whether it is publicly available or not.
The question remains whether the EMH is a valid academic theory. More specifically, we have to
investigate the different forms of the EMH, to be able to conclude whether some form of the EMH is
valid. The strong form efficient market theory has never been believed to be accurate. Rather than
debating the flaws of this theory, practitioners should recognize that it has some theoretical value
in fully explaining the concept of market efficiency. The semi-strong form efficient market theory
has never been subject to serious scrutiny. Until recently, most people believed it to hold true. The
only question, however, is on which time scale. For example, even if we do believe that a stock
market fully reflects all available information, we still have to clarify within which time frame we
believe this to be true. Given current technology, it is reasonable to assume that a news
announcement in the daily newspaper will already be incorporated in the stock price by the time
that someone reads it. It remains unclear, however, whether it is reasonable to assume that newly
announced information on a newspaper’s website will also be immediately reflected in the stock
market price. Therefore, when we accept the semi-strong form of the EMH, we also need to specify
on what time scale we accept it. Different researchers have tried to provide an answer to this time
scale question using event studies (Lim & Brooks, 2011). However, the conclusions are not
consistent and seem to depend on the point in time the research was conducted. To circumvent
this inconvenience, and to take into account the absolute non-believers of semi-strong form
efficient markets, researchers have shifted their focus to the weak form of the EMH. The greatest
benefit of this shift is that it reduces the debate to its fundamental level. If stock markets appear to
be weak form efficient, then the EMH is valid in the weak form. If the opposite applies, the EMH is
invalid in any form, as the weak form is the last possible form in which the EMH could possibly hold
true. For this exact reason, the focus of this thesis is also on the weak form of the EMH.
Now that the debate has been properly redefined to the question of whether the weak form of the
EMH holds true, we can start exploring the answer to the question. At first glance, most people
would assume that this definition has to hold true. However, it is of crucial importance to
consider every word in the definition. After proper investigation, most researchers began to zoom
in on the words “fully reflect”. It might seem trivial that, given the current state of information
technology, past price information is reflected in the current prices of stocks. However, the
question remains whether current stock prices fully reflect this prior information, and thus whether
past information is correctly incorporated or not. From the 1970s onward, academics started to
elaborate on this question. In later years, two different schools of thought started to form. On the
one hand, the proponents of the EMH argue that financial markets are perfectly capable of
aggregating information from all investors, which in turn leads to efficient markets. If the price of a
stock would appear to be too high given past price information, rational investors would bid the
price down to make a profit and vice versa. On the other hand, some researchers started looking
into the psychology of investors. In close collaboration with psychologists, the field of behavioral
finance was established. Proponents of behavioral finance believe that investors are not always
fully rational and therefore are not able to force the stock market to be efficient at all times (e.g.
Shefrin, 2000). The debate between these two schools of thought is still going on today. Over the
last decade, fewer researchers were interested in trying to settle the debate, but the U.S. housing
bubble, which eventually triggered the current sovereign debt crisis, sparked newfound interest in
this matter.
One of the reasons for the two views to collide is directly related to the dual nature of the fields of
economics and finance. Finance scholars have always had a predilection for the scientific approach
that is used in the exact sciences. The difference with the exact sciences, however, is that the
subjects of finance are still people, who, as social creatures, are often not programmed according
to some rational model that is always valid. This very duality about the scientific nature of finance
also caused the two different schools on efficient markets to be formed. On the one hand,
proponents of the EMH believe that investors are rational optimizers that are able to make the best
possible decisions given certain information. Advocates of behavioral finance, on the other hand,
feel like investors are not really able to act rationally at all times. They refer to recent bubbles and
financial crises to point out that there are different psychological effects that cause human beings
to stray from rational decision making.
There is still no consensus on whether the EMH is valid or not. Earlier research signaled that there
might be some flaws to the existing methodology, without undertaking further empirical
investigation (e.g. Campbell, Lo, & MacKinlay, 1997; Lim & Brooks, 2011). Our work, to the best of
our knowledge, is the first to specifically focus on the role of methodology in the debate and the
robustness of the most widely applied tests of weak form market efficiency. Clearly, methodology
was developed over the years to enable researchers to test for market efficiency. However,
methodology might also have been the cause of the debate since researchers have failed to find
common ground, even when adopting similar methodologies. In order to evaluate the role
methodology has played in the development of the debate, we study different approaches applied
in the early aftermath of the conception of the EMH and examine two of them for robustness: Lo
and MacKinlay’s (1988) variance ratio test and the augmented Dickey-Fuller test. Next to the
exploration and analysis of these more traditional tests, alternative methodologies that emerged
over the last decade are considered as well to infer insights on how to help settle the debate in the
future. The currently best-developed alternative test is checked for robustness. Our robustness
analyses are applied to data from the Dow Jones Industrial Average (DJIA), the Standard & Poor’s
500 (S&P-500) and the NASDAQ, which are three well-established U.S. stock market indices that
are often used in efficiency research. We also consider the largest Belgian stock market index – the
BEL-20 – to verify if our conclusions also hold true outside the United States of America. The
results and insights from examining the different types of methodologies are discussed together
with a possible reconciling theoretical framework: the adaptive markets hypothesis (Lo, 2004,
2005). From our analysis, we derive some valuable lessons from the past and formulate some
suggestions for future research to come to a definitive view on efficient markets.
The importance of valid financial models cannot be overstressed, as policy makers and investors
tend to act upon academic theory. In many instances, this seemed to work out for the best, as was
the case with the concept of diversification from the optimal portfolio theory (Markowitz, 1952).
However, when academic theory is flawed, it has the potential to set the entire economy astray.
One example is the housing bubble that caused the global financial crisis of 2008. While policy
makers, bankers and investors were blindly following the bullish market, irrational exuberance was
building up underneath (Shiller, 2000). Today, we are still trying to deal with the consequences
and even the future of an entire generation is at stake.
In the next section, we review the literature relevant to the efficient market debate. Our empirical
work starts at the third section, which looks into three types of traditional tests of market
efficiency. Next, we consider alternative methodologies that have emerged more recently. In the
fifth section, we investigate a new theoretical framework that might provide reconciliation between
both schools on efficient markets. An integrated discussion of all results obtained through our
empirical work is presented in section six; section seven concludes.
2. Literature review [3]
2.1. Economic origins of market efficiency
We can trace back the intellectual debate on the efficiency of financial and economic markets to the
father of modern economics, Adam Smith. When thinking about the work of Smith, many
researchers refer to his seminal book on the wealth of nations (Smith, 1776). In this book, he
explained the theory of the invisible hand and argued for the self-stabilizing nature of economic
markets. Consequently, many researchers claim that Smith believed economic and financial
markets to be efficient and any form of market intervention to be obsolete. However, what these
researchers fail to recognize is that Adam Smith produced more than just his book on the wealth of
nations. In fact, seventeen years earlier, he wrote a book on the theory of moral sentiments (Smith,
1759), in which he pointed to apparent behavioral biases in the human decision making process.
Clearly, these observations are in contrast with the argument that he believed economic markets to
be perfectly efficient. In order to prevent further intellectual abuse of Adam Smith’s work, Vernon
Smith (1998) wrote a paper on the apparent contradictions between both works, concluding that
the beliefs held by Adam Smith were far more nuanced than one would believe when only reading
The Wealth of Nations (1776). In conclusion, it would be intellectually unfair to contend that even
Adam Smith believed that economic and financial markets were efficient.
2.2. Statistical foundations of market efficiency
Important building blocks for the later development of a theory on the efficiency of financial
markets were provided by the theory of probability, which has its origins in the world of gambling.
The first mathematical work on probability theory dates back to 1564 and was also a guide to
gambling: Liber de Ludo Aleae (The Book of Games of Chance), by the Italian mathematician
Girolamo Cardano. According to Hald (1990), Cardano considered different dice and card games,
giving readers advice on how to gamble. Other than just a guide for gamblers, the work of Cardano
is also of scientific relevance, given his theoretical digressions on the possible outcomes of games
of chance. In fact, Cardano defined the terms probability and odds for the first time and even
presented what he believed to be the fundamental principle of gambling: equal conditions.
Next to the work of Cardano, most early research that was essential in the later development of a
theory on efficient markets was conducted in the 19th century. For example, Brown (1828)
observed what we now call a Brownian motion for the first time, when he was looking through the
microscope and noticed the apparent random movement of particles suspended in water. In later
years, Regnault (1863) proposed a theory on stock prices when he found that the deviation of the
price of a stock is directly proportional to the square root of time: a relation that is still valid in
the world of finance today. The first statement about the efficiency of financial markets came from
Gibson (1889, p. 11) in his book about the stock markets of London, Paris and New York: “When
shares become publicly known in an open market, the value which they acquire there may be
regarded as the judgment of the best intelligence concerning them.” Marshall (1890) transformed
economics into a more exact science, drawing from the fields of mathematics, statistics and
physics. He popularized the usage of demand and supply curves and marginal utility, and brought
together different elements from welfare economics into a broader context. The influence of
Marshall on the field of economics was significant, in particular because his book on the principles
of economics became a seminal work in the field.
At the very end of the 19th century, Bachelier (1900) finished a PhD thesis in which he was the first
to present a mathematical model for the motion that Brown (1828) had observed. The stochastic
[3] Our review of the literature significantly benefited from the earlier work of Sewell (2011), who presents a
more elaborate view of the EMH literature. Contrary to Sewell (2011), our paper focuses on the role of
methodology to gain a better understanding of the historical development of the debate and insights for future
research.
process developed by Bachelier became one of the centerpieces of finance, as Samuelson (1965)
based his explanation of a random walk, which was introduced by Pearson (1905), on Bachelier’s
early research.
2.3. Economic foundations of market efficiency
After the elaboration on some of the statistical origins of market efficiency, we have a look at the
work of some notable economists, before and after the Great Depression of 1929. A very
prominent researcher at that time was Fisher, who made multiple contributions to the field of
finance (Fox, 2009). He made great progress on the search for a general equilibrium theory and
provided important insights for utility theory, which later proved useful for von Neumann and
Morgenstern (1944) in their definitive book on general utility theory. Despite some of his brilliant
contributions, Fisher became even more famous because of his public statements prior to the Great
Depression that started in 1929 (Fox, 2009). Fisher was advocating the collection of data to
approach the financial market in a much more scientific way than before. Through his revolutionary
statistical analysis of stock market prices, he was able to make predictions about future price
levels, which led him to publicly announce that the boom in stock prices prior to the 1929 crash
was the prelude to a “permanently high plateau”. When only a few days later stock prices plunged
like never before, Fisher was publicly humiliated. Subsequent work of Fisher was received with
great suspicion, even though it later appeared to be as brilliant as most of his pre-1929 work.
Very much like Marshall and Fisher, Cowles (1933, 1944) tried to turn economics into a more exact
science and found that investors are unable to beat the market by means of price forecasting.
Working (1934) came up with similar conclusions, stating that stock returns exhibit behavior similar
to lottery numbers. Together, the work of Cowles and Working points towards what was later called
an informationally efficient stock market. In 1936, Keynes published his seminal book General
Theory of Employment, Interest, and Money. In his work, which mostly impacted and shaped the
field of macroeconomics, Keynes introduced the concept of animal spirits. According to him,
investors base their decisions on a “spontaneous urge to action, rather than inaction, and not on
the outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities”
(pp. 161-162). One year later, Cowles and Jones (1937) published a paper that provided early
proof of serial correlation in time series of stock prices. Together with the more theoretical work of
Keynes, this empirical evidence formed an early challenge to the existence of efficient markets.
Nevertheless, a real discussion on the efficiency of financial markets only emerged after the
establishment of the EMH by Fama in 1970.
As a result of their collaboration during the war, von Neumann and Morgenstern (1944) published
their book on the theory of games and economic behavior. Not only was the book the starting point
of game theory, it also proved to be essential in the development of a theory on efficient markets.
The most important piece of theory in their book was about the maximization of what was called
expected utility: a new concept for dealing with uncertainty by multiplying probabilities with
utilities of potential outcomes. After the Second World War, Markowitz (1952) published his paper
on portfolio selection. Operating within the mean-variance framework, he presented a model in
which it was possible to determine the optimal portfolio of securities, providing a maximum level of
return given a certain level of risk. Central to his theory was the idea of diversification as a way of
getting rid of the so-called idiosyncratic risk of individual securities, leaving only the systematic or
correlated risk. The approach of Markowitz in trading off risk and return was very similar to
what other economists had been occupied with during the war: considering the trade-off between
power and precision of bombs (Fox, 2009).
2.4. Asset pricing revolution
Sharpe (1964) revolutionized the world of finance by presenting the capital asset pricing model
(CAPM). Building on the earlier work of Markowitz (1952), the CAPM allows for the calculation of a
theoretical rate of return on an asset, given the amount of non-diversifiable risk the asset entails.
The reason only non-diversifiable risk is taken into account is the assumption that the asset is
added to a well-diversified portfolio that neutralizes idiosyncratic risk entirely. Asset pricing
models, like the one presented by Sharpe, were very important in the debate on efficient markets
that emerged in later years, as they provided researchers with the opportunity to theoretically derive
the price and return of financial assets. That way, it was possible to examine whether the actual
return on an asset was in line with the theoretical rate of return derived from the underlying asset
pricing model. In later years, scholars came across some interesting asset pricing anomalies and
argued that the CAPM was too limited by only accounting for one factor of risk. Ross (1976) came
up with an alternative: arbitrage pricing theory, which is far more flexible than the model of
Sharpe, and states that the expected return on an asset is a linear function of different factors of
risk, each with their respective factor sensitivity. Whenever the actual return on the asset deviates
from the one derived from the theoretical model, the force of arbitrage brings the actual rate of
return back in line with the theoretical one. By discounting for several sources of risk instead of
just non-diversifiable risk, the model addresses the major flaw of the CAPM. However, the model of
Ross is very general and does not give any guidelines as to what specific factors of risk to account
for. In 1993, Fama and French further improved asset pricing theory by presenting their three-factor model. Starting from their observation of pricing anomalies with respect to market
capitalization and growth vs. value strategies, they found the expected rate of return to depend on
the exposure of the asset to each of three factors: market risk premium (non-diversifiable risk),
market capitalization and the book-to-market ratio. With their model, they not only addressed the
biggest flaw of the CAPM (only one risk factor), but also were specific in formulating their factors of
risk, unlike Ross. Following the momentum puzzle pointed out by Jegadeesh and Titman (1993),
Carhart (1997) extended Fama and French’s model to a four-factor model, taking into account a
momentum risk factor.
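For reference, the pricing equations behind these models can be written compactly; the notation
below is a standard textbook rendering rather than the exact formulation of the original papers:

    E[R_i] = R_f + \beta_i \, (E[R_m] - R_f)                                  (CAPM)

    E[R_i] - R_f = \beta_i \, (E[R_m] - R_f) + s_i \, E[SMB] + h_i \, E[HML]  (three-factor model)

where R_f denotes the risk-free rate, E[R_m] the expected market return, and SMB and HML the
size and value factor portfolios; the Carhart (1997) extension appends a momentum factor with its
own loading.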
However important the asset pricing models were in the debate on efficient markets, a conundrum
presented itself. It was never entirely certain what the correct theoretical price of an asset was, as
different models accounted for different factors of risk and could yield different theoretical rates of
return. This conundrum has come to be known as the joint-hypothesis problem, which we address
later on in this literature review.
2.5. The beginning of efficient market theory
The idea of an efficient market was first described by Samuelson (1965) when he showed that a
stock market is informationally efficient when prices fluctuate randomly, given that the market
contains all available information and expectations from market participants. In the same year,
Fama (1965a) also defined an efficient market for the first time. Based on the empirical
investigation of stock market prices, he observed that financial markets follow a random walk.
Another paper by Fama (1965b) elaborated on the random walk pattern in stock market prices to
show that technical and fundamental analysis could not possibly yield risk-adjusted excess returns.
Fama and Blume (1966) considered the profitability of technical trading rules like the popular filter
rule [4] that was described by Alexander (1961, 1964). They concluded that no economic profits could
be made using these filter rules, since trading costs would be too high even when adopting the
most profitable very small-width [5] filters. This also confirmed their belief that financial markets are
informationally efficient. Roberts (1967) was the first to coin the term efficient market
hypothesis (EMH) and suggested a distinction between several types of efficiency.

[4] Example of an x% filter rule: buy and hold securities whose daily closing price moves up by at least x%,
until the price moves down by at least x% from the subsequent high, at which point it is time to simultaneously
sell the security and go short. The short position is then maintained until the daily closing price of the security
rises at least x% above the subsequent low, after which the short position is covered and the security
is bought again (Alexander, 1961, 1964).
[5] A very small-width filter is a filter in which x lies between 0.5% and 1.5% (Alexander, 1961, 1964).
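Since the filter rule described in footnote [4] is fully mechanical, it is easy to sketch in code. The
following minimal illustration is not the implementation of Fama and Blume (1966): it ignores
trading costs and starts from a long position, both simplifying assumptions.

# A minimal sketch of Alexander's x% filter rule; it ignores trading costs
# and starts from a long position, both simplifying assumptions.
def filter_rule_positions(prices, x=0.01):
    """Return a +1 (long) / -1 (short) position for each daily closing price."""
    positions = []
    position = 1
    extreme = prices[0]  # running high while long, running low while short
    for price in prices:
        if position == 1:
            extreme = max(extreme, price)      # track the subsequent high
            if price <= extreme * (1 - x):     # price fell x% from the high:
                position, extreme = -1, price  # sell and go short
        else:
            extreme = min(extreme, price)      # track the subsequent low
            if price >= extreme * (1 + x):     # price rose x% above the low:
                position, extreme = 1, price   # cover the short and buy
        positions.append(position)
    return positions

# Example: a 1% filter on a toy price path.
print(filter_rule_positions([100, 102, 101, 99, 100.5, 103], x=0.01))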
The definitive paper on the EMH was published by Fama (1970) in the form of his first of three
reviews of the theoretical and empirical work on efficient markets. He defined an efficient market
to be a market that fully reflects all available information and introduced three different types of
informational efficiency. Summarizing results from weak form, semi-strong form and strong form
efficiency tests, Fama concluded that almost all of the early evidence pointed towards a financial
market that was efficient in at least the weak sense. Although he found some price dependencies,
they never sufficed to be used in profitable trading mechanisms, making markets weak form
efficient. Fama also considered the joint-hypothesis problem. Essentially, he argued that it would
be impossible to ever correctly test the EMH, because no academic consensus was found on the
true underlying asset-pricing model. Whenever a test of market efficiency would reject the
efficiency hypothesis, there was always the possibility that it was simply due to the underlying
asset pricing model finding an incorrect theoretical asset value. The only conclusion that could be
made from efficiency tests is that a market is efficient or not with respect to a certain underlying
asset pricing model. The same conclusion could never be made independently from the underlying
model.
Besides Fama (1970), other researchers have attempted to formulate a clear definition of what is
meant by an efficient market. Jensen (1978, p. 96) wrote “a market is efficient with respect to
information set θt if it is impossible to make economic profits by trading on the basis of information
set θt.” Malkiel (1992) stated that a stock market is efficient whenever the prices of stocks remain
unchanged, despite information being revealed to each and every market participant. Even though
there is a lot of academic merit to the definitions of Jensen and Malkiel, we adopt the definition of
Fama, as explained in the introduction. More particularly, this thesis focuses on weak form market
efficiency, i.e. the set of available information we consider consists only of historical price
information (Fama, 1970).
2.6. Early aftermath of efficient market hypothesis
In the early aftermath of Fama’s work (1965a, 1965b), a lot of research was conducted in order to
test the validity of the EMH. Like Fama (1970) concluded in his first review paper, a lot of the
empirical evidence was pointing towards a weak form efficient stock market. Immediately though,
different scholars found contradicting evidence as well. Kemp and Reid (1971) pointed out that a
lot of the earlier research considered only U.S. stock market data. Using British data, they found
the stock price movements to deviate from what is expected under the random walk hypothesis,
which contradicts what was argued by Fama. Grossman (1976) found the first evidence of an
important paradox: “informationally efficient price systems aggregate diverse information perfectly,
but in doing this the price system eliminates the private incentive for collecting the information” (p.
574). In his literature survey, Ball (1978) pointed out consistent excess returns after the public
announcement of firms’ earnings, which is a clear violation of the theory on semi-strong form
efficient markets, as no excess returns should be possible when trading on public information.
When looking at long-term interest rates, Shiller (1979) found that the observed volatility is in
excess of that predicted by expectations models. This observation implies some extent of
forecastability of long-term interest rates, which contradicts the EMH.
The most convincing piece of contradicting evidence came from the paradox that was presented by
Grossman and Stiglitz (1980), following the earlier work of Grossman (1976). In order for investors
to be motivated to spend resources for collecting and analyzing information to trade on, they must
have some form of incentive (Grossman & Stiglitz, 1980). If a stock market would prove to be
perfectly efficient, however, there would be no reward for collecting information, since that
information would already be reflected in the current stock price. This simple paradox shows that
financial markets can never become entirely efficient, as no investor would be motivated to collect
information in the first place. Consequently, no one would trade on new information and it would
become impossible for stock market prices to reflect all available information.
2.7. Establishment of behavioral finance
Following the paradox presented by Grossman and Stiglitz (1980), polarization on the topic of
efficient markets became apparent. More and more researchers observed pricing anomalies, hence
the validity of the EMH became uncertain. De Bondt and Thaler (1985) tested the hypothesis that
investors tend to overreact to unexpected and dramatic news events by looking at the performance
of extreme portfolios over three years. They found that portfolios of stocks that performed poorly
over the last three to five years tend to significantly outperform portfolios of stocks that performed
well over this period in the next three to five years. This finding was consistent with the
overreaction hypothesis and pointed to weak form inefficiencies. In later years, Lakonishok,
Shleifer and Vishny (1994) conducted similar empirical research, using proxies for value instead of
historical price information. The results they found also revealed market inefficiencies.
Next to the contradicting evidence from empirical research in the field of finance, some
psychologists started to point out behavioral biases that challenged the EMH. Most significantly,
Kahneman and Tversky’s (1979) prospect theory explained how investors tend to make decisions
when risk is involved, and provided an alternative to expected utility theory. People tend to be loss
averse, as they hate losing more than they love winning. Rather than simply multiplying
probabilities and utility, prospect theory suggests that a distinction is made between losses and
gains, which transforms the expected utility function. The behavioral biases presented in prospect
theory are in direct contrast with a theory of efficient markets in which investors were originally
assumed to be fully rational.
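To make loss aversion concrete, later work on the cumulative version of the theory (Tversky &
Kahneman, 1992) estimates a value function of the form

    v(x) = x^\alpha                 if x >= 0
    v(x) = -\lambda \, (-x)^\beta   if x < 0

with \alpha \approx \beta \approx 0.88 and loss aversion coefficient \lambda \approx 2.25: a loss
thus weighs roughly twice as heavily as a gain of the same size.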
Together with the paper of De Bondt and Thaler (1985), the work of Kahneman and Tversky
(1979) can be seen as the beginning of the field of behavioral finance, in which the traditional
theory of finance is merged with concepts from other social sciences like psychology and sociology.
Behavioral finance tries to formulate an alternative for the EMH by assuming that investors are not
perfectly rational, which leads to anomalies in stock pricing (e.g. overreaction), which in turn
causes stock markets not to behave efficiently at all times.
2.8. Development of tests for weak form market efficiency
A very interesting day, both for the world of practice and academics, was October 19th, 1987: Black
Monday. This day became infamous because of the crashing of stock markets around the world.
Starting in Hong Kong, the crash spread to Europe and eventually the United States later the same
day. The DJIA dropped by 508 points or 22.61%, which to this day is the largest percentage drop
ever in its value. Different phenomena came together to cause this dramatic event: program
trading, market psychology, overvaluation and eventually illiquidity (Shiller, 1989). Despite its
negative impact on investors and the global economy, this unique event also provided researchers
with valuable new data for scientific analysis. Advocates of behavioral finance also pointed towards
Black Monday to further illustrate that investors are not fully rational and overreact to information
in times of market mania. Together with the valuable data from the Black Monday crash, the
evolution of computing power allowed researchers to come up with new and more advanced
empirical tests of market efficiency (Bodie, Kane, & Marcus, 2010). In our literature review, we
focus on three particular types of statistical tests of the weak form of the EMH.
A first group of weak form market efficiency tests looks at return autocorrelations. The general
philosophy behind these tests is the following: if significant autocorrelation is found among the
returns on a stock, there is some extent of predictability, which is in contradiction with the EMH
(Lim & Brooks, 2011). The empirical work testing return autocorrelations can be split based on the
horizon of returns. Autocorrelations in the short run (day, week, month) tend to be positive for
returns on portfolios (e.g. Conrad & Kaul, 1988; Lo & MacKinlay, 1990, 1997) and negative for
returns on individual stocks (e.g. Lehmann, 1990; Jegadeesh, 1990). Autocorrelations in medium
horizon returns (1-12 months) tend to be positive; for the long horizon (1-5 years) return
autocorrelations tend to be negative (Bodie et al., 2010). For short horizon returns, Lo and
MacKinlay (1990, 1997) find significant autocorrelation among returns on S&P-500 stocks.
However, the pattern of autocorrelation is weaker for weekly and monthly returns, and for large
rather than small stocks. Jegadeesh and Titman (1993, 2001) considered stock returns in the
medium horizon and found significant evidence of momentum profits, which gave rise to an
important puzzle in asset pricing theory. Proponents of behavioral finance used this finding of
momentum profits to argue that a gradual adjustment of prices causes the predictable drifts or
autocorrelation in returns, which implies that financial markets do not promptly incorporate news
into prices, and hence are not weak form efficient (Bodie et al., 2010).
autocorrelation on the longer horizon we can refer back to De Bondt and Thaler (1985), who
showed that investors tend to overreact to dramatic and unexpected news events.
Some of the (momentum) puzzles found when analyzing return autocorrelations were further
investigated by researchers on the behavioral finance side of the debate. Using behavioral theories
of under- and overreaction to information, some researchers were able to explain the observed
puzzles. De Long, Shleifer, Summers, and Waldmann (1990) showed that the long horizon negative
autocorrelation in returns (reversal) can be explained by a stylized model with two types of agents:
fundamentalists, who get signals about intrinsic values, and chartists, who learn indirectly about
intrinsic values by looking at prices. Whenever a good signal is received by fundamentalists, prices
increase. Chartists will observe this rise in prices, causing some chartists to buy, which in turn
further increases prices and causes more chartists to buy. Eventually, share prices are so far
beyond intrinsic values that fundamentalists start selling again. Another explanation was provided
by Barberis, Shleifer and Vishny (1998). They explained underreaction to information using
conservatism: investors erroneously believe that the earnings process underlying stock prices is
mean-reverting and so they underreact to news. To explain overreaction, they refer to the
representativeness heuristic: investors overextrapolate from a sequence of growing earnings,
overreacting to a long trend. Daniel, Hirshleifer and Subrahmanyam (1998) related overreaction to
overconfidence, as traders tend to overestimate the precision of their private signals, leading to
prices being pushed above the fundamental level in the case of good news.
Some researchers also developed specific linear serial correlation tests to analyze the weak form of
the EMH (Lim & Brooks, 2011). These tests simply examine the third version of the random walk
hypothesis, which we explain later on, and have been adopted right from the start of the debate
(e.g. Granger & Morgenstern, 1963; Fama, 1965a). However, the most popular linear serial
correlation test was developed several decades after the start of the debate by Lo and MacKinlay
(1988), when they presented their variance ratio (VR) test. The VR test can be used to check the
null hypothesis of serially uncorrelated returns, which points towards informational efficiency of
stock prices, and is expressed as the ratio of the k-period return variance over k times the variance
of the one-period return. According to the random walk hypothesis, stock prices are following a
random walk when the variance of the k-period return is the same as k times the variance of the
one-period return. So in order to test whether returns are serially uncorrelated, it suffices to test
whether the variance ratio is significantly different from one. Applying their own VR test, Lo and
MacKinlay found that the random walk hypothesis does not hold for weekly stock market returns.
Further on in this paper, the VR test is examined for robustness in order to gain insights into the
role of methodology in the debate.
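In symbols, with r_t denoting the one-period return, the statistic for horizon k reads

    VR(k) = Var(r_t + r_{t-1} + ... + r_{t-k+1}) / (k · Var(r_t)),

which equals one for all k when returns are serially uncorrelated, since the variance of a sum of
uncorrelated increments is then the sum of their variances.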
Unit root tests, which can examine the stationarity of stock returns, form a second class of weak
form market efficiency tests. The basic idea is that stock returns that contain a unit root, and are
hence non-stationary, are following a random walk (Lim & Brooks, 2011). The most popular
approach to examine stationarity has proven to be the augmented Dickey-Fuller (ADF) test, which
is also examined for robustness in the next section. Recent research has led to the development of
more sophisticated tests of stationarity as well. However, it was shown that the existence of a unit
root in stock returns is not a sufficient pre-requisite for the random walk hypothesis to hold
(Rahman & Saadi, 2008). In addition to stationarity, returns need to be serially uncorrelated in
order for those returns to be following a random walk.
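To illustrate the mechanics, the sketch below runs the ADF test on a simulated log price series with
the statsmodels package; the drift and volatility used to generate the data are arbitrary
assumptions.

# ADF unit-root check on simulated log prices (a random walk with drift).
import numpy as np
from statsmodels.tsa.stattools import adfuller

log_prices = np.cumsum(np.random.default_rng(0).normal(0.0005, 0.01, 2500))

stat, pvalue = adfuller(log_prices, regression="c", autolag="AIC")[:2]
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}")
# A high p-value fails to reject the unit-root null, which is consistent with,
# but per Rahman and Saadi (2008) not sufficient for, the random walk hypothesis.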
The final class of weak form market efficiency tests considers non-linear serial dependence. Since
linear autocorrelation tests only account for linear effects, some researchers pointed out that stock
markets could exhibit inefficient behavior, even when linear autocorrelation tests point towards
informational efficiency in the weak sense (Granger & Andersen, 1978). Among popular tests of
non-linear serial dependence are the Hinich bicorrelation test (Hinich, 1996), the Engle Lagrange
multiplier test (Engle, 1982) and the Brock-Dechert-Scheinkman test (Brock, Scheinkman, Dechert,
& LeBaron, 1996). Almost every empirical paper employing one or more of these tests reports
significant nonlinear serial dependence across worldwide stock markets (Lim & Brooks, 2011).
Despite the emergence of these different tests, a consensus on the validity of the EMH remained to
be found. By the beginning of the 1990s, the debate had split researchers into two camps:
believers of the EMH on the one hand, and proponents of behavioral finance on the other. As a
reaction to the emergent body of anomalous evidence and the rise of behavioral finance, Fama
(1991) wrote a second review covering tests of the different forms of the EMH. He concluded that
the idea of efficient markets still remained valid because the observed anomalies tended to
disappear over time and because anomalous results seemed to cancel each other out.
2.9. Alternative approach to weak form market efficiency testing
Looking for a way to settle the debate and taking into account the paradox pointed out by
Grossman and Stiglitz (1980), Campbell et al. (1997) suggested a new approach in testing for
market efficiency. Instead of using all-or-nothing tests that did not lead to a definitive answer, they
suggested an approach in which the degree of market efficiency was tested over time. This
approach would enable researchers to draw more nuanced conclusions, which could eventually help
move the debate along. Other than introducing this idea, Campbell et al. did not present a concrete
approach. However, they inspired other researchers to come up with several alternative tests of
market efficiency. Since this paper focuses on weak form market efficiency, we only discuss three
alternative forms of weak form market efficiency tests. Section four considers these alternative
methodologies in more detail.
A first alternative approach is the non-overlapping subperiod analysis, which looks at different
separated time windows and the evolution of efficiency between those windows (Lim & Brooks,
2011). This approach is only useful when examining the impact of a specific policy from one time
window to another. For example, one could investigate the effects of a short sell prohibition on
market efficiency. The first subperiod would then consist of all the historical data up until the last
day before the prohibition took effect, and the second subperiod would run from the moment the
prohibition was adopted until today.
Another possible alternative is the use of rolling estimation windows. The idea behind this
alternative is to transform a data sample of n observations into n-l+1 windows, with l being the
length of the window [6] (Lim & Brooks, 2011). Here, the different time windows overlap, as they are
pushed forward until the final observation is included in the last time window [7]. This rolling
approach allows researchers to look at underlying changes in efficiency on a shorter time scale
than is the case with the non-overlapping subperiod analysis. Furthermore, rolling estimation
windows accommodate a comparison of stock market efficiency through time, since a varying
degree of efficiency is measured, rather than a static binary condition of efficiency.

[6] For example, if we have 100 observations of returns on a certain stock, and a time window length of 20, we
can transform the data into 81 different time windows.
[7] In the same example, we would start with the window spanning observations 1 through 20. Then, we push
the window forward to get the second window, spanning observations 2 through 21. We continue this
procedure until we reach the last window, covering observations 81 through 100.
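The construction in footnotes [6] and [7] is mechanical enough to sketch directly; the lag-1
autocorrelation used below as a per-window efficiency proxy is our illustrative choice, not a
prescription from Lim and Brooks (2011).

# A minimal sketch of the rolling-window construction: n observations and a
# window length l give n - l + 1 overlapping windows.
import numpy as np

def rolling_windows(returns, l):
    returns = np.asarray(returns)
    return [returns[s:s + l] for s in range(len(returns) - l + 1)]

returns = np.random.default_rng(1).normal(0.0, 0.01, 100)  # placeholder data
windows = rolling_windows(returns, l=20)
print(len(windows))  # 81, matching the example in footnote [6]

# Lag-1 autocorrelation within each window traces out a path through time,
# giving a varying degree of (in)efficiency rather than a single verdict.
rho1 = [np.corrcoef(w[:-1], w[1:])[0, 1] for w in windows]
print(round(min(rho1), 3), round(max(rho1), 3))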
Time-varying parameter models constitute a final alternative approach to market efficiency testing.
This approach draws from state space models [8] to allow standard regression parameters to change
over time (Lim & Brooks, 2011). The greatest advantage is that this allows regression methods to
be applied to more dynamic concepts like time-varying efficiency. Primarily, these models have
been applied to developing stock markets, as these could not have been efficient from inception.
The time-varying parameter model allows for an evolution of those markets towards efficiency, by
letting regression parameters evolve through time. A static approach like one of the classic tests
would not allow for this underlying shift in parameters and would thus be biased. Recently,
however, different time-varying parameter models have also been applied to developed financial
markets. In section four, we further study this methodology and present an extended time-varying
parameter model combining properties from the classical literature of efficient markets and
behavioral finance.
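One minimal illustration of such a model, a sketch rather than the exact specification used later in
this thesis, lets the first-order autocorrelation coefficient of returns follow a random walk:

    r_t = \alpha + \beta_t \, r_{t-1} + \varepsilon_t,   \varepsilon_t \sim N(0, \sigma_\varepsilon^2)
    \beta_t = \beta_{t-1} + \eta_t,                      \eta_t \sim N(0, \sigma_\eta^2)

Estimated with the Kalman filter, the path of \beta_t can be read as a time-varying degree of weak
form predictability, with efficiency corresponding to \beta_t being statistically indistinguishable
from zero.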
Despite the potential of these emerging methodological approaches, the debate remains unsettled.
To further address the critiques uttered against the EMH, Fama (1998) wrote a third and final
review on the empirical work testing market efficiency, and concluded there is a lack of valid
evidence to disprove his theory. In subsequent years, it seemed as if the discussion would never be
settled and that it would slowly fade away into the history books. Nevertheless, advocates of
behavioral finance did not rest their case and put in a lot of effort to make behavioral finance better
known to a broader audience.
2.10. Current state of the debate
Today, there is still no definitive view on the efficiency of financial markets, even though
proponents of both the EMH and behavioral finance have conducted further research. Shiller (2003)
claims that theories of efficient markets should remain as a characterization of an ideal world, but
not as an accurate description of global financial markets. Immediately preceding Shiller’s article in
the same journal, Malkiel (2003, p. 80) argues that “if any $100 bills are lying around the stock
exchanges of the world, they will not be there for long”. His statement became a classical
economics joke to explain that efficiency anomalies would not persist because someone would
benefit from the opportunity immediately, through the price arbitrage mechanism.
Currently, we see two important reasons why the debate on efficient markets is still not settled.
The first one relates to an alternative theoretical framework. Being critical of an existing theoretical
framework is somewhat straightforward. Indeed, a theory is supposed to be imperfect as it is only
a framework to describe reality. However, coming up with a new and improved theory is far less
evident. Thus far, advocates of behavioral finance have failed to come up with such a new theory
that could replace the EMH. While several behavioral biases have been documented in the
academic literature, there is still a lack of an overarching framework that could describe the
efficiency of financial markets in a behavioral way. Well aware of this problem, Lo (2004, 2005)
looked at evolutionary biology to reconcile opinions from both ends of the efficiency spectrum when
he formulated the adaptive markets hypothesis (AMH). In his theory, he also incorporates the
concept of a varying degree of efficiency following Campbell et al. (1997). Despite its potential, the
AMH has not yet replaced the EMH as the definitive theory on the efficiency of financial markets.
To further explore why this is the case, we empirically test Lo’s theory in section five.
[8] “State space modeling provides a unified methodology for treating a wide range of problems in time series
analysis. In this approach it is assumed that the development over time of the system under study is
determined by an unobserved series of vectors α_1, …, α_n, with which are associated a series of observations
y_1, …, y_n; the relation between the α_t’s and the y_t’s is specified by the state space model. The purpose of
state space analysis is to infer the relevant properties of the α_t’s from a knowledge of the observations
y_1, …, y_n.” (Durbin & Koopman, 2008, p. 1).
We also see a second reason why the debate yet remains to be settled: flawed methodology. As is
clear from the literature overview, many different methodologies have been developed over the
course of the last 60 years. On the one hand, there were the more traditional tests of efficiency
that led to all-or-nothing conclusions. On the other hand, a new strand of methodologies was
developed following the idea of a time-varying degree of efficiency. Even though methodologies
were originally designed to help settle the debate, we believe the way researchers implemented
these methodologies has also kept the debate from being settled. The most important
argument for holding this belief is that conflicting results were found by researchers who adopted
exactly the same methodologies. In this thesis, we further examine the role of methodology in the
debate on efficient markets. We discuss different methodological approaches and examine two of
the most popular traditional efficiency tests and one alternative test for robustness. The results are
discussed together with insights from our empirical test of Lo’s AMH, which enables us to distill
important lessons from the past and suggestions for the future to eventually help settle the debate.
3. Traditional tests of weak form market efficiency
With the market efficiency debate explained and the historical train of thought explored, we start
the empirical work by focusing on the methodological approaches that were applied in the past, in
order to better understand how the debate came into existence. In this section, we focus on the
three different types of traditional tests of market efficiency that were introduced in the literature
review. We refer to these tests as traditional, as they were applied most often in the early
aftermath of the establishment of the EMH. Also, these tests make use of more traditional
statistical techniques that can be characterized as static, as only one time-invariant conclusion can
be drawn based on a full sample of data.
3.1. Tests based on return autocorrelation
3.1.1. Two different approaches
In the literature review, we introduced two possible approaches when relying on return
autocorrelation as a proxy to test for weak form market efficiency. The principle for both of these
approaches is the same: significant return autocorrelation implies some degree of predictability,
which is at odds with the EMH. Because of their relatedness, tests of return autocorrelation are also
a test for the applicability of technical analysis.
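To make the principle concrete, the sketch below estimates the first few return autocorrelations
with confidence bands; the simulated returns stand in for actual index data, an assumption for
illustration only.

# Estimate the first few return autocorrelations and flag those whose
# 95% confidence band excludes zero (a sign of predictability).
import numpy as np
from statsmodels.tsa.stattools import acf

returns = np.random.default_rng(2).normal(0.0, 0.01, 1000)  # placeholder data
rho, confint = acf(returns, nlags=5, alpha=0.05)

for lag in range(1, 6):
    lo, hi = confint[lag]
    print(f"lag {lag}: rho = {rho[lag]:+.3f}, significant: {not lo <= 0 <= hi}")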
The first approach simply examines autocorrelation in return series over different time horizons.
Over the short horizon (1 day - 1 month), mixed results have been found in the past. For individual
stocks, short run return autocorrelations tend to be small but negative, which can be explained by
market microstructure effects like the bid-ask bounce (e.g. Lehmann, 1990; Jegadeesh, 1990).
Portfolio returns, however, exhibit large and positive autocorrelations over the short run, which
could be explained by effects of non-synchronous trading [9] (e.g. Conrad & Kaul, 1988; Lo &
MacKinlay, 1990, 1997). In the medium horizon (1-12 months), returns on both individual stocks
and portfolios exhibit positive autocorrelation. Drawing from this observation, Jegadeesh and
Titman (1993, 2001) implemented a strategy that consisted of buying a portfolio of stocks that
performed well in the three to twelve months before the investment period, and selling a portfolio
of stocks that performed badly over the same period. What they found was that the portfolio of
stocks that performed well in the last three to twelve months tended to have a positive riskadjusted excess return (alpha) over the subsequent twelve-month investment period; alpha tended
to be negative for the portfolio of stocks that performed badly over the same investment period.
These results were surprising and created what was called a momentum puzzle. Finally, returns
over the long horizon (1-5 years) exhibit negative autocorrelation or mean reversion. De Bondt and
Thaler (1985) further investigated this finding and tested the hypothesis of overreaction to
unexpected and dramatic news by looking at portfolios of stocks with extreme past performance.
The extreme portfolios were constructed based on market-adjusted cumulative abnormal returns of
stocks over the last three years: the top 35 stocks were allocated to a winner portfolio, and the
bottom 35 stocks were allocated to a loser portfolio. Subsequently, the performance of the extreme
portfolios was measured over three years, by deriving the differences in market-adjusted
cumulative average returns. The results found by De Bondt and Thaler were consistent with the
overreaction hypothesis: over a period of 50 years, loser portfolios had outperformed winner
portfolios by about 23% in the three years following the formation of the portfolios. This apparent
overreaction to new information clearly pointed to weak form inefficiencies.

[9] A non-synchronous trading effect arises when the assumption is that asset prices are recorded at time
intervals of the same length, when in fact they are recorded at time intervals of other, possibly irregular,
lengths (Bodie et al., 2010).
The second approach drawing from return autocorrelations to test for weak form market efficiency
focuses on the development of statistical tests for linear serial correlation. Historically, the primary
tool to test for weak form market efficiency through the exploration of return autocorrelations was
the VR test (Lo & MacKinlay, 1988). After briefly introducing this test in the literature review, we
now provide a further exploration. Additionally, we perform a robustness analysis, which provides
valuable insights on how methodology influenced the debate. Although several innovations to the
original VR test have been suggested (e.g. Chow & Denning, 1993; Wright, 2000), we perform our
robustness analysis on the original test by Lo and MacKinlay (1988). Given our aim to better grasp
the historical development of the debate and the role of traditional methodology, our focus in this
section is on those methodologies that have been most influential in the past. When looking for
ways out of the debate later on in this paper, the most recent and state-of-the-art methodologies
are considered.
3.1.2. Robustness analysis: Lo and MacKinlay’s variance ratio test
Before explaining the VR test, we need to introduce the concept of a random walk with a drift [10],
which is a statistical property of time-series data that is important in every academic field that
makes use of time-series analysis. Our explanation is based on the seminal book by Campbell et al.
(1997).
Random walks
There are three possible forms of the random walk. The simplest, but strongest form of the random
walk (version 1) is the one with independently and identically distributed (IID) increments:

$$r_t = \mu + r_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{IID}(0, \sigma^2)$$

For example, assume the time-series $r_t$ at hand is the return [11] on a certain stock. The equation of
the first version of the random walk then tells us that the return of today is equal to the return of
yesterday plus an expected return change or drift $\mu$ and a certain IID random error term $\varepsilon_t$. The
expected return of today will thus be equal to the return of yesterday plus a certain drift.
The second version of the random walk (random walk 2) is a generalization of the first one, as the
increments are only assumed to be independently but not identically distributed (INID). The reason
for this generalization is purely empirical, as stock returns have proven to be distributed in a
non-identical way through time. This ought to be no surprise since a lot has changed throughout the
years. Stock markets have evolved in terms of economic, technological, social, institutional and
regulatory aspects. As a consequence, stock prices and stock returns are not identically distributed
through time and we need a less constrained model that accounts for this statistical property.
[10] In this paper, whenever we use the term random walk, we refer to the random walk with a drift and the
three possible forms it can take (depending on the assumptions of the error term). Note, however, that there
are other random walk models as well (e.g. pure random walk and random walk with a trend and a drift).
[11] Following earlier research, we always use continuously compounded returns instead of gross returns.
The third and final version of the random walk is a further generalization of the independent
increment model (random walk 2). We now only assume that the increments are uncorrelated. The
reason for this further relaxation of assumptions is again empirical. Especially for stock prices,
researchers have found it implausible for today’s stock return to be completely independent from
yesterday’s return. The uncorrelated increment model (random walk 3), because of its more
realistic assumptions, has proven to be the most popular to test for random walks in stock return
time-series. We start our analysis from this third version of the random walk as well.
We also need to point out the link between random walks and weak form efficient markets. As we
saw before, a weak form efficient financial market is a market in which all past price information is
fully reflected in stock prices. Therefore, it is impossible to predict future prices based on past price
information. The random walk model says precisely the same as this weak form efficient market
condition. The stock price of tomorrow is unpredictable, as there is no way of predicting the
arbitrary drift term. This parallel between the weak form market efficient condition and the random
walk made it interesting for researchers to test weak form market efficiency indirectly, by testing if
stock returns [12] are following a random walk (version 3) [13]. This is also the exact principle that is
used in the VR test.
Introduction to the variance ratio test
Lo and MacKinlay’s VR test (1988) has turned out to be extremely popular among researchers to
test for an uncorrelated increment and can also be used to determine whether a stock market is
weak form efficient over a certain period of time. The main assumption behind the test is that if
stock returns follow a random walk (version 3), the variance of the stock returns over a time
interval of length $q$ (the order of differentiation, or lag order) is the same as $q$ times the variance
of the stock returns over a time interval of length 1.
Following Campbell et al. (1997) we can also develop this model with more mathematical rigor.
Assume that $r_t$ is the return on a certain stock at time $t$. The uncorrelated increment model then
looks like the following:

$$r_t = \mu + r_{t-1} + \varepsilon_t, \qquad \text{cov}(\varepsilon_t, \varepsilon_{t-k}) = 0 \;\text{ for all } k \neq 0$$

The arbitrary drift is once again represented by $\mu$ and $\varepsilon_t$ is the random increment at time $t$. If the
return $r_t$ follows a random walk (version 3), the variance of $(r_t - r_{t-q})$ should be $q$
times the variance of $(r_t - r_{t-1})$. Statistically, this simple yet elegant relationship can be tested by calculating if
the ratio of the variance of $(r_t - r_{t-q})$ over $q$ times the variance of $(r_t - r_{t-1})$ significantly differs
from unity. This is exactly what Lo and MacKinlay's VR test does. The null hypothesis of this test
states that the time-series follows an uncorrelated increment model. Whenever the ratio
statistically differs from unity, the null hypothesis can be rejected and we arrive at the alternative
hypothesis stating that the time series does not follow a random walk (version 3).
[12] Proponents of the EMH have argued that in order for a stock market not to be weak form market efficient,
returns rather than prices must not follow a random walk. The explanation of why the VR test is able to test
weak form market efficiency, however, remains valid since returns are simply a transformation of stock prices.
[13] Some proponents of the EMH have recently argued that a stock market that is not following a random walk
could still be efficient. However, we do not focus on this discussion. Instead, we look at methodologies that
have been used abundantly by researchers in the past and how these might have influenced the fact that we
still have a debate going on today.
The VR test is typically implemented for different orders of differentiation $q$. For example, with an
order of differentiation of 2, the VR would become:

$$VR(2) = 1 + \rho(1)$$

with $\rho(1)$ being the first-order autocorrelation coefficient of returns. The VR test
statistic with order of differentiation 2 is equal to one plus the first-order autocorrelation coefficient
of returns, for any time series. More generally, for any order of differentiation $q$, the VR test
statistic can be calculated as follows:

$$VR(q) = 1 + 2 \sum_{k=1}^{q-1} \left(1 - \frac{k}{q}\right) \rho(k)$$

with $\rho(k)$ being the $k$th order autocorrelation coefficient of the returns. As is
clear from the definition, $VR(q)$ is a particular linear combination with linearly declining weights of
the first $q-1$ autocorrelation coefficients of the return series. Most commonly, the following orders of
differentiation $q$ have been applied: 2, 4, 8 and 16. In applying the test and examining its
robustness, we also report values for these four different orders of differentiation.
There is still one major drawback to the VR test statistic as we have just derived it: it only applies
under the assumption that the data are homoscedastic (Lo & MacKinlay, 1988). To overcome this
inconvenience, one could first model heteroscedasticity separately (e.g. via GARCH-models), and
then apply the presented form of the VR test. However, Lo and MacKinlay also developed an
integrated approach for a heteroscedasticity-consistent test statistic, drawing from the earlier work
on heteroscedasticity-consistent methods from White (1980) and White and Domowitz (1984).
What they found was a robust standardized test statistic following the standard normal distribution
that can test the third version of the random walk, despite the possible presence of
heteroscedasticity. When implementing the VR test and when checking for robustness, we report
heteroscedasticity-consistent probability values (p-values) of the test statistic. The derivation of
this heteroscedasticity-consistent form of the VR test, however, is beyond the scope of this paper.
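To make the mechanics concrete, the Python sketch below computes $VR(q)$ and the homoscedastic form of the test statistic. It is a simplified didactic version that omits the small-sample bias corrections and the heteroscedasticity-consistent standard errors of Lo and MacKinlay (1988), so its p-values are not directly comparable with the ones reported below.

```python
import numpy as np
from scipy.stats import norm

def variance_ratio(r: np.ndarray, q: int):
    """Simplified Lo-MacKinlay VR test on a return series r.

    Returns (VR(q), z-statistic, two-sided p-value) under the
    homoscedastic random walk null; bias corrections omitted.
    """
    r = np.asarray(r, dtype=float)
    T = r.size
    mu = r.mean()
    var_1 = np.mean((r - mu) ** 2)                       # 1-period variance
    q_sums = np.convolve(r, np.ones(q), mode="valid")    # overlapping q-period returns
    var_q = np.mean((q_sums - q * mu) ** 2) / q          # per-period variance of q-sums
    vr = var_q / var_1
    # Asymptotic variance of VR(q) under IID returns (Lo & MacKinlay, 1988)
    z = (vr - 1.0) / np.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * T))
    return vr, z, 2.0 * (1.0 - norm.cdf(abs(z)))

# Example with simulated IID returns: VR should hover around 1
rng = np.random.default_rng(0)
r = rng.normal(0.0005, 0.01, 5000)
for q in (2, 4, 8, 16):
    vr, z, p = variance_ratio(r, q)
    print(f"q = {q:2d}: VR = {vr:.3f}, z = {z:+.2f}, p = {p:.3f}")
```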
Application of the variance ratio test
After introducing the VR test, we now apply it to the historical returns of the three major U.S. stock
market indices: DJIA, S&P-500, and NASDAQ; and the main Belgian stock index: BEL-20. To
perform our analysis, we use the statistical software package Gretl. The data for our tests come
from the financial database Datastream of Thomson Reuters. All available data up until Tuesday,
February 5th, 2013 [14] is included. We start with an exploration of the data, followed by an in-depth
explanation of the applied methodology. Next, we present and discuss the results from our
application of Lo and MacKinlay’s (1988) VR test.
[14] The data were collected at the library of the faculty of Business and Economics of the KU Leuven on
Wednesday, February 6th, 2013.
Table 1 summarizes the data used throughout this thesis. Plots of the data summarized in table 1
can be found in appendix A.
Table 1: Data summary statistics
To gain more insights into the debate and the role of methodology, the robustness of results
obtained through the popular VR test is analyzed. The robustness check is implemented as a
scenario analysis based on two decisions researchers need to make when implementing the VR
test: time interval of data and time window of data. The fact that researchers typically only include
one type of time interval and time window is the exact reason why we perform robustness tests.
For each of the four indices, we implement the VR test for three different time intervals: daily,
weekly and monthly returns. Next to the time interval, we also test the robustness for changes in
the time windows of data. More specifically, for every index and time interval, we apply the VR test
to 30 different random time windows grouped into three categories of constant window length (5
years, 10 years and 20 years). When implementing the VR test, researchers also need to decide on
the orders of differentiation (q). However, this purely statistical parameter is not examined for its
robustness, as academics typically report results from different orders of differentiation to avoid
sensitivity. In our analysis, we use the four different values (2, 4, 8 and 16) for the order of
differentiation that were originally suggested by Lo and MacKinlay (1988). Conclusions for a
specific sample of data are drawn from the results using these four different orders of
differentiation.
Since our data comprise daily, weekly and monthly price levels for the four selected indices, we
first need to implement a transformation to obtain returns. In order to find the return on
day/week/month t, we calculate the natural logarithm of the price level on day/week/month t
divided by the price level on day/week/month t-1. Next, we divide our data into time windows with
different time lengths for every index-time interval pair. For the robustness analysis we choose
window lengths of 5, 10 and 20 years, as these represent typical time lengths of data from a large
sample of historical studies. After the division of our data in windows of 5, 10 and 20 years, we
randomly select 10 different time windows for each particular combination of index, time interval
and window length. This way, we end up with 90 [15] different scenarios per index, which can be
divided into 9 groups of unique combinations of time interval and time window length of data.
Within each of these 9 groups, we end up with a random selection of 10 different scenarios with
the same window length and time interval. For every scenario, we report p-values of the VR test
statistic for four different orders of differentiation (2, 4, 8 and 16), which means that our analysis
eventually yields 360 [16] p-values for the 90 different scenarios per index. These p-values can be
seen as statistical thresholds and give us the necessary information to conclude whether or not we
can reject the null hypothesis of stock returns following a random walk (version 3) for a certain
scenario. In drawing our conclusions, we use a 95% confidence interval. As a way to explain the
variance in p-values across the different scenarios, an OLS regression [17] is implemented as well.
The explanatory variables in this regression include dummies for time interval, time window length
and stock market index. All together, our approach allows us to examine the robustness of the VR
test as it was designed by Lo and MacKinlay (1988). From this robustness check, we infer valuable
insights on how the current debate has developed over the last decades and what role specific
methodology like the VR test has played.
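The data preparation just described can be sketched in a few lines of code. The helpers below show one way to compute log returns and draw random fixed-length windows; the use of pandas, the 260-trading-day year and the commented file name are our own illustrative assumptions (the thesis itself uses Gretl on Datastream exports), and variance_ratio refers to the simplified function sketched earlier.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 260          # assumption: 5-day trading weeks for daily index data

def log_returns(prices: pd.Series) -> pd.Series:
    """r_t = ln(P_t / P_{t-1}): continuously compounded returns."""
    return np.log(prices / prices.shift(1)).dropna()

def random_windows(returns: pd.Series, years: int, n_windows: int, seed: int = 1):
    """Draw n_windows random contiguous windows of the given length in years."""
    rng = np.random.default_rng(seed)
    length = years * TRADING_DAYS_PER_YEAR
    starts = rng.integers(0, len(returns) - length, size=n_windows)
    return [returns.iloc[s:s + length] for s in starts]

# Hypothetical usage; prices would come from a Datastream export:
# prices = pd.read_csv("djia.csv", index_col=0, parse_dates=True)["price"]
# r = log_returns(prices)
# for years in (5, 10, 20):
#     for window in random_windows(r, years, n_windows=10):
#         for q in (2, 4, 8, 16):
#             vr, z, p = variance_ratio(window.values, q)  # defined above
```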
The summary of results for each of the four considered stock indices is presented in tables 2, 3, 4
and 5.
[15] 3 different time intervals of data (daily, weekly and monthly) x 3 different window lengths of data (5, 10 and
20 years) x 10 random samples.
[16] 3 different time intervals of data x 3 different window lengths of data x 10 random samples x 4 orders of
differentiation.
[17] Results and output from the OLS regression are available upon request.
Table 2: DJIA VR test results
The table presents 360 p-values for the 90 scenarios of the Dow Jones Industrial Average stock market index. Daily, weekly
and monthly refer to the time interval at which data was recorded. The three different time window lengths of five, ten and
twenty years are represented by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time
window-time interval combination and q represents the four different orders of differentiation that were considered. P-values
indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in
green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.
Table 3: S&P-500 VR test results
The table presents 360 p-values for the 90 scenarios of the Standard & Poor’s 500 stock market index. Daily, weekly and
monthly refer to the time interval at which data was recorded. The three different time window lengths of five, ten and twenty
years are represented by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time
interval combination and q represents the four different orders of differentiation that were considered. P-values indicated in red
are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or
higher, which does not lead to a rejection of the same null hypothesis.
Table 4: NASDAQ VR test results
The table presents 360 p-values for the 90 scenarios of the NASDAQ stock market index. Daily, weekly and monthly refer to the
time interval at which data was recorded. The three different time window lengths of five, ten and twenty years are represented
by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time interval combination
and q represents the four different orders of differentiation that were considered. P-values indicated in red are smaller than
0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which
does not lead to a rejection of the same null hypothesis.
Table 5: BEL-20 VR test results
The table presents 360 p-values for the 90 scenarios of the BEL-20 stock market index. Daily, weekly and monthly refer to the
time interval at which data was recorded. The three different time window lengths of five, ten and twenty years are represented
by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time interval combination
and q represents the four different orders of differentiation that were considered. P-values indicated in red are smaller than
0.05 and represent a rejection of the null hypothesis of a random walk. P-values indicated in green are 0.05 or higher, which
does not lead to a rejection of the same null hypothesis.
First, we notice that results are similar across stock markets, even for the BEL-20. For each of the
four considered stock market indices [18], daily and weekly data unambiguously lead to the rejection
of the null hypothesis of returns following a random walk (version 3). Research on samples with
window length of 20 years also yields unequivocal results. Within time intervals, we notice that
longer time windows are in general less prone to sensitivity. The summary tables also show that
ambiguity of results increases from daily to weekly to monthly data. From these results, we can
conclude that daily data are the most robust to underlying variations in time window length and
sampling scheme, closely followed by weekly data. Monthly data prove to be prone to significant
degrees of sensitivity, particularly when using 5- and 10-year time windows.

[18] Only three small exceptions are found: one for daily observations of the NASDAQ, and two for weekly
observations (one for the BEL-20, one for the S&P-500). Despite these exceptions, there is conclusive evidence
that daily and weekly observations lead to unambiguous conclusions.
Results from the OLS regression on the total of 1 440 p-values indicate that the index at hand does
not significantly explain variation in the p-values. In contrast, time window and time interval are
highly significant. A negative relationship is found between time window length and p-value. The
relationship between time interval and p-value is positive: longer time intervals lead to higher
p-values. Orders of differentiation of 8 and 16 yield significantly higher p-values than orders of
differentiation 2 and 4.
In brief, Lo and MacKinlay’s VR test appears to be robust when using daily and weekly data,
regardless of the time window length and random sample of data. VR tests drawing from monthly
data only prove to be robust when using time windows with a length of 20 years. Sensitivity and
thus controversy can be spurred when implementing VR tests using monthly data with samples of
length 5 or 10 years. The OLS regression shows that the stock market index at hand cannot
explain variation in p-values across scenarios. A negative relationship between time window length
and p-value is found; the relationship between time interval and p-value is positive. In section 6,
we present a further discussion of these results, along with results from the robustness check of
the ADF unit root test, the exploration of alternative tests of weak form market efficiency and the
analysis of Lo’s AMH.
3.2. Tests based on stationarity
3.2.1. Development of tests using the stationarity proxy
A second possible proxy to indirectly test for weak form market efficiency is the concept of
stationarity. In the past, the augmented Dickey-Fuller (ADF) unit root test proved to be most
popular among researchers to test for weak form market efficiency by means of the stationarity
proxy. Further refinement of the ADF test in later years led to the inclusion of (multiple) structural
breaks to prevent the spurious detection of a unit root caused by an underlying structural break.
Another innovation was the emergence of panel unit-root tests, which overcome the problem of
lower statistical power of univariate unit root tests with a small sample size. Next to these more
general refinements to the ADF unit root test, other innovations were proposed as well. For an
elaborate overview of the literature on weak form market efficiency tests using the stationarity
proxy, we refer to Lean and Smyth (2007), and Lim and Brooks (2011).
Just like for the VR test, we perform a robustness analysis for tests based on stationarity to
comment on the role of methodology in the debate. In order to optimize the explanatory power of
our robustness analysis, we focus on the ADF unit root test, as it was applied most often in the
past. But before going into the exploration and the robustness check of the ADF unit root test, we
want to make an important side note. Further statistical investigation by Rahman and Saadi (2008)
showed that the existence of a unit root in return series is merely a prerequisite, but not a
sufficient condition for the third version of the random walk hypothesis. What this implies is that
proof of a unit root does not suffice to imply weak form market efficiency. In addition, the returns
also need to be serially uncorrelated. In the context of our research, however, we have chosen to
implement the ADF unit root test despite this shortcoming. After all, we are taking a step back to
understand how the debate has developed over the last decades and what the role of methodology
has been in this development. Therefore, we only draw from the ADF unit root test to learn how it
may have affected the debate and not to come up with a conclusion on the efficiency of financial
markets.
3.2.2. Robustness analysis: Augmented Dickey-Fuller unit root test
Using stationarity as a proxy to test for weak form market efficiency, the ADF unit root test is the
second traditional statistical test of weak form market efficiency we examine for robustness. We
first introduce the concept of stationarity, before exploring the ADF test in more detail.
Stationarity of time series
Theoretically, a time-series is stationary when the underlying data generating process can be
defined as a stochastic process with a joint probability distribution that does not change in time or
space (Hamilton, 1994). The distribution of a stationary time-series is thus the same as the
distribution of a lagged version or a subset of this time-series. This also implies that statistical
moments like the mean and variance of the stationary process do not change over time or position.
This conceptual definition of a stationary process is referred to as hard stationarity.
For purposes of empirical testing, however, the concept of weak stationarity or
covariance-stationarity was developed. In the remainder of this paper, when we refer to stationarity, we mean
covariance-stationarity. According to Hill, Griffiths and Lim (2011), a time series $r_t$ is following a
weak stationary process if for every point in time $t$, it is true that:

- The mean is constant: $E(r_t) = \mu$
- The variance is constant: $\text{var}(r_t) = \sigma^2$
- The covariance depends on $s$, not on $t$ [19]: $\text{cov}(r_t, r_{t+s}) = \text{cov}(r_t, r_{t-s}) = \gamma_s$

A more elaborate discussion of different types of stationarity can for example be found in chapter
3 of Hamilton (1994).
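These conditions are easy to eyeball in a simulation. The sketch below (a purely illustrative addition with arbitrary parameters) compares sample moments of a stationary white-noise series with those of a random walk across two halves of the sample: the moments of the random walk drift apart, while those of the stationary series do not.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 4000
noise = rng.normal(0.0, 1.0, T)        # covariance-stationary: constant mean and variance
walk = np.cumsum(noise)                # unit-root process: variance grows with time

for name, x in (("white noise", noise), ("random walk", walk)):
    first, second = x[: T // 2], x[T // 2:]
    print(f"{name:12s} mean: {first.mean():8.2f} vs {second.mean():8.2f}  "
          f"variance: {first.var():8.2f} vs {second.var():8.2f}")
```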
Introduction to the augmented Dickey-Fuller unit root test
The Dickey-Fuller tests essentially examine if a certain time series contains a unit root or not, or
equivalently is non-stationary or stationary. These tests have also been popular among researchers
to indirectly determine whether or not a stock return series is weak form efficient. Whenever the
Dickey-Fuller tests indicate that stock returns are non-stationary (null hypothesis), researchers
have inferred that stock returns are following a random walk. Consequently, it was concluded that
the stock market at hand is weak form market efficient.
Following Dickey and Fuller (1979), three different regression equations can be used to test for
stationarity:
$$\Delta r_t = \gamma r_{t-1} + v_t \qquad (1)$$
$$\Delta r_t = \alpha + \gamma r_{t-1} + v_t \qquad (2)$$
$$\Delta r_t = \alpha + \lambda t + \gamma r_{t-1} + v_t \qquad (3)$$

In our example, $\Delta r_t$ would be the difference between the stock returns $r_t$ and $r_{t-1}$, $\alpha$ is the intercept
or drift term (comparable to the random walk arbitrary drift), $\lambda t$ is the time trend and $v_t$ is the error
term. Equation (1) is the most basic form of the Dickey-Fuller test. Equation (2) also includes a
deterministic intercept or drift term; equation (3) has both a drift and a linear time trend. The key
parameter in all of the above equations is $\gamma$. Under the null hypothesis, $\gamma = 0$ and the time series
contains a unit root and is thus non-stationary. Note that when this is the case, we would get the
expression for a random walk. Therefore, if we cannot reject the null hypothesis, the time series at
hand is assumed to follow a random walk and to be weak form efficient.
[19] In words: the covariance does not depend on the specific time, but only on the length of the time interval.
An important assumption in the previous Dickey-Fuller equations is that the random errors are not
autocorrelated. To accommodate testing for unit roots in the event autocorrelation does occur in
the error term, the augmented Dickey-Fuller (ADF) was designed. This extended version of the
Dickey-Fuller equations simply includes a sufficient number of lagged terms to deal with
autocorrelation. The ADF equation looks like the following (Hill et al., 2011):

$$\Delta r_t = \alpha + \gamma r_{t-1} + \sum_{s=1}^{m} a_s \Delta r_{t-s} + v_t$$

Again, $\Delta r_t$ is the difference between $r_t$ and $r_{t-1}$. The sum operator represents the $m$ lagged
first difference terms included to ensure that the residuals are not autocorrelated. The number of
necessary lag terms can be determined by examining the autocorrelation function of the random
error term $v_t$. The parameter of interest is again $\gamma$. The null hypothesis of the ADF test states that
the time series contains a unit root ($\gamma = 0$) and hence is non-stationary. The alternative hypothesis
states that the time series is stationary. Statistical significance of results is discussed using p-values
drawn from the test statistic, which follows a non-standard distribution [20] ($\tau$-statistic).
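For reference, a test of this kind is a one-liner in modern statistical software. The Python sketch below uses adfuller from statsmodels; regression="ct" includes both a constant and a trend, mirroring the specification used later in this section, while the automatic lag selection by AIC is our own choice, as the original Gretl settings are not spelled out.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
returns = rng.normal(0.0005, 0.01, 2500)   # simulated stand-in for daily index returns

# ADF test with constant and trend; number of lagged differences chosen by AIC
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(returns, regression="ct", autolag="AIC")
print(f"tau statistic = {stat:.3f}, p-value = {pvalue:.4f}, lags used = {usedlag}")
# A p-value below 0.05 rejects the unit root null, i.e. the series looks stationary
```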
Application of the augmented Dickey-Fuller unit root test
As a way to obtain a deeper knowledge of the development of the market efficiency debate, we
analyze the ADF test in the exact same way as the VR test, using the same methodology, software
and set of (transformed) data on four stock market indices (DJIA, S&P-500, NASDAQ and BEL-20).
The same two decisions drive variation in our scenario analysis: time interval of data and time
window of data. Identical time intervals (daily, weekly and monthly) and time window lengths (5,
10 and 20 years) are considered. The ADF test is applied to a total of 10 different random time
windows for every index-time interval-time window length combination. Across the different indices
and time intervals, the ADF test is applied 360 [21] times.
Starting from the daily/weekly/monthly returns on the four selected stock market indices, the data
are again divided in time windows of 5, 10 and 20 years. The motivation for these specific window
lengths remains the same: it enables us to capture the variation in time lengths from a large
sample of earlier research. Again, we randomly select 10 different time windows per index-time
interval-time window length combination. In total, this process yields 90 [22] different scenarios per
index, which represent 9 groups of unique time interval-time window pairs per index. Since the null
hypothesis of the ADF test states that the return series contains a unit root and is thus
non-stationary, a rejection of the null hypothesis points towards the index not being efficient in the
weak sense. A significance level of 5% is again used. Following earlier research, both a trend and a
constant will be included in the unit root equation. To gain further insight into the variation in
p-values, an OLS regression is again put in place. The explanatory variables remain the same:
dummies for time interval, window length and index. This scenario analysis based on variation in
two important data preprocessing decisions again allows us to examine for robustness of test
results, which helps us to gain a deeper understanding of the historical development of the debate
and the specific role of methodology like the ADF unit root test.
The summary of results for each of the four selected stock market indices can be found in tables 6,
7, 8 and 9.
[20] Distribution quantiles are computed by simulation or numerical approximation. Using Monte Carlo simulation,
Dickey and Fuller (1979) tabulated critical values for the ADF test, which were later extended by MacKinnon
(1991).
[21] 4 indices x 3 time intervals x 3 time window lengths x 10 random windows.
[22] 3 different time intervals of data x 3 different window lengths of data x 10 random samples.
Table 6: DJIA ADF unit root test results
The table presents 90 p-values for the 90 scenarios of the Dow Jones Industrial Average stock market index. Daily, weekly and
monthly refer to the time interval at which data was recorded. The three different time window lengths of five, ten and twenty
years are represented by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time
interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a
random walk. P-values indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.
Table 7: S&P-500 ADF unit root test results
The table presents 90 p-values for the 90 scenarios of the S&P-500 stock market index. Daily, weekly and monthly refer to the
time interval at which data was recorded. The three different time window lengths of five, ten and twenty years are represented
by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values
indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.
Table 8: NASDAQ ADF unit root test results
The table presents 90 p-values for the 90 scenarios of the NASDAQ stock market index. Daily, weekly and monthly refer to the
time interval at which data was recorded. The three different time window lengths of five, ten and twenty years are represented
by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values
indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.
Table 9: BEL-20 ADF unit root test results
The table presents 90 p-values for the 90 scenarios of the BEL-20 stock market index. Daily, weekly and monthly refer to the
time interval at which data was recorded. The three different time window lengths of five, ten and twenty years are represented
by 5, 10 and 20. S1 until S10 represent the 10 randomly selected samples for every time window-time interval combination. P-values indicated in red are smaller than 0.05 and represent a rejection of the null hypothesis of a random walk. P-values
indicated in green are 0.05 or higher, which does not lead to a rejection of the same null hypothesis.
From our analysis, the ADF unit root test appears to be robust for daily data. Weekly observations
seem to spur controversy, as unequivocal results are only found when time windows are at least 10
years in length [23]. Results from monthly data exhibit a high degree of sensitivity [24], except when
time windows of 20 years are used [25]. These results are again found across the different considered
stock markets (incl. BEL-20). Conclusions are hence not limited to U.S. stock markets.

[23] Two exceptions for the BEL-20 can be found for this conclusion.
[24] One exception can be observed: monthly returns for the S&P-500 in windows of 5 years lead to unambiguous
conclusions across samples.
[25] Seven conflicting results on a total of 40 samples across the four stock market indices.
The OLS regression indicates significantly higher p-values for the BEL-20 and the S&P-500
compared to the DJIA and NASDAQ. Additionally, the same relations between time interval, time
window and p-value are found: longer time intervals lead to higher p-values; longer time windows
lead to lower p-values.
In brief, we find the ADF unit root test to be robust for daily data, regardless of the time window
length. For weekly data, the ADF test seems robust in combination with windows of 10 and 20
years. When implementing monthly data, the ADF test is only robust when using windows of 20
years, given our sample of data. A positive relationship is found between time interval and p-value;
a negative relationship exists between time window and p-value. P-values for the BEL-20 and the
S&P-500 are also significantly larger than for the DJIA and NASDAQ. A further discussion of these
results can be found in section 5.
3.3. Tests based on non-linear serial dependence
A third and last traditional way of examining weak form market efficiency is through tests based on
non-linear serial dependence. Because of the overwhelming number of possible nonlinear
dependency tests and their level of complexity, however, we choose not to separately test them for
robustness. Instead, we describe this type of test in more detail and present some important
insights for the debate based on the bulk of earlier empirical research.
The first two types of traditional tests we discussed were applied extensively in the aftermath of
the establishment of the EMH. But despite their popularity, they were not perfect in examining
weak form market efficiency. As we explained before, the ADF unit root test was shown to be
inconclusive as a tool to test for weak form market efficiency, as it does not take into account the
sufficient condition of serial independence. The VR test was critiqued as well, mostly by proponents
of the EMH who claimed that the test is biased towards the conclusion that stock markets are not
weak form efficient. Continuing innovations of Lo and MacKinlay’s original VR test nullified most of
these critiques (e.g. the less popular Chow and Denning test (1993)). In later years, scholars from
the behavioral finance side of the debate also started to question the original VR tests because of
the exact opposite reason: an apparent bias towards a conclusion that stock markets were indeed
weak form efficient. The main argument presented by these behavioral researchers was the fact
that VR tests use a market efficiency proxy (i.e. linear autocorrelation) that is only able to pick up
linear forms of serial dependence, while non-linear serial dependence could also spur market
inefficiency.
In fact, Lim and Brooks (2011) pointed out that the first statistical evidence for the argument of
the behavioral finance researchers was published even before Lo and MacKinlay (1988) presented
their VR test. Granger and Andersen (1978) showed that tests based on autocorrelation are
ineffective in detecting processes that exhibit nonlinear rather than linear autocorrelation. Later,
Hinich and Patterson (1985) pointed out that the definition of a pure white noise process like the
random walk (version 1) had become blurred, mostly because of the work of Jenkins and Watts
(1968) and Box and Jenkins (1970). At the time, most researchers ignored nonlinear relationships
by implicitly assuming that an observed time series is generated by a Gaussian process.
Consequently, only linear autocorrelation was used to test for white noise. However, researchers
should be careful not to mix up the concepts of a white noise process and a pure white noise
process, as their mutual relation is asymmetrical. All pure white noise series, like the first version
of the random walk, are serially uncorrelated and thus white noise (Lim & Brooks, 2011). However,
the reverse is not true unless the time series at hand is normally distributed.
This more technical discussion from the field of statistics had its consequences for testing weak
form market efficiency. Disregarding the asymmetrical relationship between white noise and pure
white noise, scholars who did not find linear autocorrelation in a return series concluded that this
return series is following a random walk and is thus weak form efficient. However, as was pointed
out by Hinich and Patterson (1985) this conclusion only holds under the assumption that the return
series at hand is normally distributed, which is in practice untenable for financial data.
Consequently, new tests were necessary to take into account the possibility of nonlinear serial
dependence driving weak form market efficiency.
As revealed by Lim and Brooks (2011), it was chaos theory from the field of physics that sparked
new thoughts among researchers in finance about potential nonlinear return predictabilities in the
short term. Almost all of the early evidence also pointed towards the existence of such nonlinear
serial dependencies. Given the complex nature of chaos theory, however, most research started
focusing on the development of tests for nonlinear stochastic dependence, which can be divided
into two categories. The first group of tests examines a null hypothesis of linearity, without
specifically defining a nonlinear alternative in the event the null hypothesis is rejected. However
useful as a general test of nonlinearity, the lack of insight on the type of nonlinear dynamics makes
these tests less powerful. Examples are the bispectrum test (Hinich, 1982), the Brock-Dechert-Scheinkman
(BDS) test (Brock et al., 1996) and the bicorrelation test (Hinich, 1996). The second
group of nonlinear tests addresses the main drawback of the first group by explicitly testing against
a well-specified nonlinear alternative. Popular types of tests used in this category are the
Lagrange multiplier (LM) test, the likelihood ratio test and the Wald test. Among the specific
nonlinear alternatives considered in this group of tests is the autoregressive conditional
heteroscedasticity (ARCH) model (Engle, 1982).
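To give a flavor of the first group, the sketch below runs the BDS test, assuming the bds implementation in statsmodels.tsa.stattools, on simulated ARCH(1) returns: such a series has no linear autocorrelation but is serially dependent, so the BDS test should reject the IID null where an autocorrelation test would not. The simulation parameters are arbitrary.

```python
import numpy as np
from statsmodels.tsa.stattools import bds

rng = np.random.default_rng(11)
T = 2000

# ARCH(1) returns: zero linear autocorrelation, but nonlinear (volatility) dependence
r = np.zeros(T)
h = np.full(T, 0.0001)
for t in range(1, T):
    h[t] = 0.0001 + 0.5 * r[t - 1] ** 2        # conditional variance depends on the past
    r[t] = np.sqrt(h[t]) * rng.normal()

stat, pvalue = bds(r, max_dim=3)               # BDS null hypothesis: the series is IID
print("BDS statistics:", np.round(stat, 2))
print("p-values:      ", np.round(pvalue, 4))  # small p-values reject the IID null
```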
From our brief literature review, it is clear that there is some validity to the concept of non-linear
dependency in efficiency testing. A further digression on the results from each of the three
traditional tests of weak form market efficiency is presented in section 6.
4. Alternative methodological approach
To overcome the conflicting results found when examining weak form market efficiency in the
traditional way, Campbell et al. (1997) introduced the idea of relative market efficiency, which
captures the degree to which a market is efficient through time. Rather than phrasing the
conclusion in terms of weak form efficient or not, this approach accommodates more nuance (e.g.
becoming more or less efficient). Following Campbell et al., three possible alternative approaches
were developed. In this section, we further explore these alternatives and look for important
insights from earlier empirical research to uncover opportunities for a way out of the debate.
4.1. Non-overlapping subperiod analysis
As suggested by its name, non-overlapping subperiod analysis is implemented by dividing a sample
of stock market data into subsequent non-overlapping subsamples (Lim & Brooks, 2011). These
subsamples are not selected randomly, but rather come from some ex ante insights about possible
underlying factors that drive market efficiency. For example, if a new policy measure was
implemented regulating the financial sector, it could be interesting to introduce a transition point in
the dataset for the day this policy measure was enacted into law. The combination of possible
underlying factors or events will eventually determine the number of subsamples the dataset will
be divided into. If, for example, 5 underlying events were considered, the dataset would become
subdivided into 6 subperiods. After the division of the data into subsamples, researchers can estimate
weak form market efficiency in every subsample and observe the impact of the predetermined
underlying events by means of possible significant changes in the test results. The estimation of
weak form market efficiency per subperiod can be executed in several ways, for example by means
of a traditional test of weak form market efficiency.
In their survey paper, Lim and Brooks (2011) give an overview of different types of underlying
factors that have been examined in the past. For example, researchers studying the effect of
liberalization of financial markets on weak form efficiency found mixed results. We could thus not
conclude that liberalization always comes with a benefit for market efficiency. Another underlying
factor that has been researched is the improvement of information technology. Although it seems
natural that technological advances would improve overall market efficiency, results from the
literature are ambiguous. Gu and Finnerty (2002) find that autocorrelation in returns drastically
drops after the 1970s, which they ascribe to continuing improvements in technology. Other
research, however, concluded that the implementation of electronic trading systems did not lead to
an unequivocal increase of weak form market efficiency (Lim & Brooks, 2011). The effects of
regulatory and policy measures on market efficiency mainly depend on the nature of the
implemented measure. Results from earlier research point towards efficiency benefits as a result of
deregulation. Policies that are meant to intervene in the market seem to have detrimental effects
on weak form efficiency. For example, price limits introduced as circuit breakers in times of
extreme volatility were shown to disrupt the market equilibrium and hence to decrease weak form
market efficiency.
Obviously, non-overlapping subperiod analysis is particularly helpful for policy makers. For the
purpose of broad market efficiency research, however, this approach is somewhat less well suited.
Given the scope of our paper, we limit the discussion of non-overlapping subperiod analysis to this
review of prior research and a further discussion of earlier results in section 6.
4.2. Rolling estimation windows
Rolling estimation windows constitute a second group of alternative approaches that have been
developed in response to the concept of time-varying efficiency. We first introduce the rolling
estimation windows, after which we review earlier research implementing this technique. Finally,
we implement a robustness analysis.
4.2.1. Introduction to rolling estimation windows
In contrast to the non-overlapping subperiods, the rolling estimation technique transforms a full
sample of data into a series of overlapping subsamples. Starting from the first observation, a
subsample of a certain length is created. Next, this first time window is pushed forward one
observation. This process ends when a time window is created that covers the last observation for
the first time. More specifically, a sample of n observations is divided into n-l+1 windows, with l
being the length of the window (Lim & Brooks, 2011). Rolling estimation windows are more suited
for broad market efficiency research than non-overlapping subperiod analysis because no ex ante
decisions have to be made about underlying factors. Since efficiency can now be studied on a
rolling basis, the approach also allows for more nuanced conclusions that take into account the
possible time-variant character of weak form market efficiency.
Since Campbell et al. (1997) coined the idea of a varying degree of market
efficiency, a great deal of research involving rolling estimation windows has been conducted. A recent
overview of this research can be found in Lim and Brooks (2011). Some of the specific models that
have been developed are the following:

- Rolling VR tests;
- Rolling ADF unit root tests;
- Rolling bicorrelation tests;
- Rolling parameters of ARCH models;
- Rolling Hurst exponents.
As becomes immediately obvious, most of these tests are just extensions of the traditional tests of
market efficiency (e.g. VR test and ADF test). Instead of examining an entire data set at once,
rolling windows are implemented and a time-varying degree of efficiency is measured by applying
traditional tests of weak form market efficiency on every rolling window. Plotting the results from
the traditional test of weak form market efficiency of each individual window will then yield a timevarying measure of predictability, which could in turn be interpreted as a time-variant degree of
weak form market efficiency. The straightforward nature of this extension, despite the added
complexity, has made rolling estimation windows the most popular alternative methodological
approach to efficiency measurement. Nevertheless, as with every methodology, certain critiques
can be raised. For instance, these tests are again subject to sensitivity in an underlying
parameter: the length of the rolling time windows. After a brief review of earlier research
implementing rolling estimation windows, we take into account this issue by implementing a
robustness check.
4.2.2. Review of rolling estimation literature
The most important and popular application of the rolling estimation windows has been to acquire
relative measures of weak form market efficiency. Among others, Cajueiro and Tabak (2004, 2005)
conducted multiple studies to test relative weak form market efficiency and were able to visualize
the time-variant weak form efficiency character of different Asian stock markets. Ranking these
stock markets, a high extent of variation in the degree of weak form efficiency was found. Covering
both developed and emerging stock markets, Lim and Brooks (2006) confirm the high degree of
variation in relative efficiency across stock markets and find that developed markets are relatively
more efficient than their emerging counterparts.
An alternative application of the rolling estimation windows is related to the technique of
non-overlapping subperiod analysis. Instead of an ex ante specification, rolling estimation windows can
be used to identify important events ex post by looking at shifts in the measure for relative weak
form efficiency. For example, Alvarez-Ramirez, Alvarez, Rodriguez and Fernandez-Anaya (2008)
find the termination of the Bretton Woods system in 1971 to coincide with a shift towards weak
form market efficiency of the Dow Jones. Other applications of the rolling estimation windows that
are less interesting for the purposes of our research can be found in Lim and Brooks (2011).
4.2.3. Robustness analysis: Rolling variance ratio tests
As we did for the two traditional tests of market efficiency, a robustness analysis is implemented for
rolling estimation windows. More particularly, we choose to focus on rolling VR tests because we
already explored the traditional VR test in section 3. Rolling ADF unit root tests are not considered
as they tend to be less popular for rolling window research following the critique formulated by
Rahman and Saadi (2008). Gretl is again used to execute our analysis.
Given that the rolling window technique simply consists of applying traditional market efficiency
tests to rolling windows, the robustness check can be approached in two stages. The first stage is
the robustness check of the applied traditional test of market efficiency. In the case of rolling VR
tests, we already completed the first stage in section 3 when we learned that VR tests are best
implemented using daily data. The second stage of the robustness check addresses possible
sensitivity of the rolling VR test to the length of the rolling time windows. Thus, we now apply VR
tests to daily data organized in rolling windows of varying lengths: 50 days (2 months [26]), 130 days
(6 months), 260 days (1 year), 520 days (2 years), 780 days (3 years), 1040 days (4 years), 1300
days (5 years), 2600 days (10 years), 3900 days (15 years) and 5200 days (20 years). For every
combination of stock market index, order of differentiation (q) and rolling window length, we
calculate p-values from the VR test statistics for the windows included in the rolling VR test. In
order to obtain a clear idea about the relative degree of weak form efficiency, we implement an
aggregate efficiency measure using the efficiency ratio of windows, which is the ratio of the
number of windows that reject the null hypothesis of a random walk at a confidence level of 95%
over the total number of windows. From this aggregate measure across the different window
lengths and orders of differentiation we can discuss the robustness of the rolling VR test. Next to
the aggregate efficiency measure, we also take into account the specific time-varying nature of
these tests by looking at the obtained time series of p-values for the different window lengths.
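Conceptually, the rolling VR test applies the variance_ratio function sketched in section 3 to every window and then aggregates the outcomes into an efficiency ratio. The Python sketch below shows one possible implementation under those assumptions; the window lengths and the 95% confidence level follow the setup described above.

```python
import numpy as np

def rolling_vr_pvalues(r: np.ndarray, window: int, q: int) -> np.ndarray:
    """Apply the (simplified) VR test to the n - l + 1 overlapping windows."""
    return np.array([
        variance_ratio(r[start:start + window], q)[2]    # p-value from the earlier sketch
        for start in range(len(r) - window + 1)
    ])

def efficiency_ratio(pvalues: np.ndarray, alpha: float = 0.05) -> float:
    """Share of windows rejecting the random walk null at the 95% level."""
    return float(np.mean(pvalues < alpha))

# Hypothetical usage on a daily return series r:
# for window in (130, 260):                 # 6 months and 1 year of daily data
#     for q in (2, 4, 8, 16):
#         p = rolling_vr_pvalues(r, window, q)
#         print(window, q, efficiency_ratio(p))
```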
An overview of the obtained efficiency ratios can be found in table 10.
[26] A rolling time window of length 25 days (1 month) is not considered, as it is too small to be used in
combination with an order of differentiation of 16.
Table 10: Efficiency ratios
The table presents efficiency ratios for different combinations of stock market index, rolling window length and order of
differentiation. The number of observations in the different rolling windows is indicated by l. The orders of differentiation are
denoted by q. For each of the four considered stock market indices, the efficiency ratio is calculated as the ratio of the number
of windows that reject the null hypothesis of a random walk at a confidence level of 95% over the total number of windows.
Note that the efficiency ratios roughly increase with the order of differentiation. The same
observation was made through the OLS regression for the robustness analysis of the traditional VR
test. In interpreting results, we consider the different orders of differentiation and take into
account the observed positive relationship. Results are again fairly similar across stock markets.
We observe a negative relationship between the relative degree of efficiency and the window
length. This also means that research that only considers one window length could be biased. For
example, if a researcher were to randomly pick rolling windows of length 5 years (1300
observations), he or she would observe very low degrees of efficiency for the four considered stock
market indices. If that same researcher had considered rolling windows of length 1 year, the
conclusion would have been less extreme. Rolling windows of length 2 or more years yield results
that are mostly useless for further exploration or discussion, as the time-varying degree of
efficiency is wiped out. Results from rolling windows of 2 months (50 observations) prove to be out
of touch with results from other rolling window lengths.
Given the time-varying nature of rolling estimation windows, we also plot VR test p-values of
rolling windows through time for every pair of index and order of differentiation. As it would take
up too much space, we only include the graph for the DJIA and order of differentiation 2 in
appendix B, which proved to be representative of the other cases as well. From this graph, we
gain the same insights as from the aggregate efficiency measure. The path of the p-values through
time for windows of length 2 months is not in line with the paths for longer windows. Once the
windows become too large (> 1 year), the time-varying degree of efficiency vanishes.
All in all, we can conclude that rolling VR tests are best implemented using daily data organized in
windows with a length between 6 months and 1 year. Weekly and monthly data are more prone to
sensitivity. Other window lengths can either lead to p-values that are biased upward or results that
no longer incorporate the time-varying concept of efficiency.
4.3. Time-varying parameter models
The third and last alternative method for weak form market efficiency testing is the time-varying
parameter model, which draws from a state space approach that allows for dynamic regression
parameters that can evolve through time. The origins of the time-varying parameter model lie in
the work of Kalman (1960) and developments from the field of engineering (Durbin & Koopman,
2008). To this day, a lot of new research in the fields of mathematics and statistics is still being
devoted to the further development of this methodology. Therefore, the application of time-varying
parameter models in finance is still in its infancy and a robustness check like the one we
implemented for the rolling estimation windows is not sensible. We further explore the
methodology from the existent body of research and look for insights that can be useful in better
grasping how the current debate came into existence. Additionally, we expand upon the test for
evolving efficiency and introduce a theoretical extension that might be worthwhile to implement in
future research.
The first study implementing a time-varying parameter model came from Emerson, Hall and
Zalewska-Mitura (1997). Drawing from the autocorrelation proxy of weak form market efficiency,
they designed a state space model with time-variant autocorrelation coefficients, which can be
interpreted as a time-varying measure of efficiency. From this research, Zalewska-Mitura and Hall
(1999) developed a formal test for evolving efficiency, in which a state space approach is applied to
a GARCH-M model to let parameters evolve through time. The GARCH-M approach is used to deal
with problems of non-constant error variance (heteroscedasticity) and autocorrelation of residuals,
taking into account the financial risk premium property (Hill et al., 2011). The state space modeling
accommodates the time-varying approach to efficiency. The combination of GARCH and state
space modeling essentially boils down to a time-varying parameter on the lagged return, with
heteroscedasticity and autocorrelation of residuals being controlled for.
As was mentioned in the literature review, the test for evolving efficiency has mainly been applied
to developing markets, as these are not believed to be “born” efficient. In their paper, Emerson et
al. (1997) found varying degrees of efficiency among Bulgarian shares. They also found variation in
the time it takes for Bulgarian shares to become more efficient. Zalewska-Mitura and Hall (1999)
find similar results for Hungarian stocks, using their test for evolving efficiency. Other researchers
have also applied this test to stock markets on other continents. For example, Abdmoulah
(2010) finds no clear evolution towards weak form market efficiency for 11 Arab stock markets,
concluding that past reforms of the Arab financial markets have been ineffective in addressing
informational inefficiency. The test for evolving efficiency has also been applied to examine the
evolving stock market efficiency of the British FTSE-100, but the autocorrelation parameter did not
really evolve through time. Nevertheless, later studies have shown that the test for evolving
efficiency can be useful for developed markets as well.
Next to introducing the test for evolving efficiency, we also propose a theoretical expansion by
adopting the property of asymmetric reaction to information, i.e. people react more heavily to
negative than positive news. This is not just a behavioral observation but also a widely accepted
decision bias, as stock markets prove to be more sensitive to negative than positive news (Shefrin,
2000). For the sake of simplicity, we develop the expanded test starting from an ordinary least
squares (OLS) regression, modeling the return of today $r_t$ as a function of the returns of the $q$ days
before, $r_{t-i}$ with $i = 1, \ldots, q$:

$$r_t = \alpha + \sum_{i=1}^{q} \beta_i r_{t-i} + \varepsilon_t$$

In this OLS regression, $\alpha$ is a constant, $q$ is the number of included lagged returns, $\beta_i$ is the
parameter of the lagged return $r_{t-i}$ and $\varepsilon_t$ is the random error term.
A major problem with this simple approach to weak form market efficiency testing is that the
underlying assumptions of independent and homoscedastic residuals are not met with financial
data. A common solution is to implement a GARCH-M model, which allows for a separate
estimation of the variance of the random error term through a variance function, resolving the
issue of heteroscedasticity (Engle, 2001). Autocorrelation in the variance function is addressed by
adopting a geometric lag structure. Additionally, the typically positive relationship between risk and
return is taken into account by incorporating the variance of the residuals in the return equation.
To model the common asymmetric reaction to information, a threshold component is added as
well. More specifically, an asymmetry term is added to the variance function to distinguish between
the effects of good and bad news on stock markets. Now, the model looks like the following:
$$r_t = \alpha + \sum_{i=1}^{p} \beta_i r_{t-i} + \delta h_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, h_t)$$

$$h_t = \omega + \gamma_1 h_{t-1} + \gamma_2 \varepsilon_{t-1}^2 + \gamma_3 D_{t-1} \varepsilon_{t-1}^2$$

The $\alpha$, $p$, $\beta_i$ and $r_{t-i}$ are the same as for the OLS regression. The random error term $\varepsilon_t$,
however, is different as its variance is now modeled separately in the second equation ($h_t$),
which resolves the issue of heteroscedasticity. The variance of the residuals ($h_t$) is also included in
the first equation with its own parameter $\delta$, representing the risk-premium property. The variance
function consists of a constant $\omega$ and a geometric lag structure that deals with the issue of
autocorrelation: the variance of residuals of yesterday ($h_{t-1}$) with parameter $\gamma_1$ and the squared
residual of yesterday ($\varepsilon_{t-1}^2$) with parameter $\gamma_2$. The threshold component ($D_{t-1}$, a dummy equal
to one when yesterday's residual is negative and zero otherwise) together with its
parameter $\gamma_3$ is added to the variance function to account for the asymmetric investor reaction to
new information.
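As an illustrative sketch, such an asymmetric GARCH model can be estimated in Python with the third-party arch package. Note two assumptions: the variable returns is again a pandas Series of daily returns, and since arch does not estimate the in-mean risk-premium term directly, this simplified version drops it.

from arch import arch_model  # third-party package: pip install arch

# mean='AR' with lags=1 models the return as a constant plus one lagged return;
# p=1 and q=1 give the geometric lag structure of the variance function;
# o=1 adds the asymmetric (threshold) term, letting negative shocks raise
# next-period variance more than positive shocks of the same size.
model = arch_model(returns, mean='AR', lags=1, vol='GARCH',
                   p=1, o=1, q=1, dist='normal')
result = model.fit(disp='off')
print(result.summary())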
One last problem remains with this approach: static parameters. To overcome this problem, we
adopt a state space model, which allows for parameters that dynamically evolve through time.
What we end up with is an adapted version of the test for evolving efficiency:

$$r_t = \alpha_t + \sum_{i=1}^{p} \beta_{i,t} r_{t-i} + \delta h_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, h_t)$$

$$\alpha_t = \alpha_{t-1} + \nu_t, \qquad \nu_t \sim N(0, \sigma_\nu^2)$$

$$\beta_{i,t} = \beta_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim N(0, \sigma_\eta^2)$$

The first equation represents the measurement equation of the state space model; the other two
equations are state equations. The variance function remains exactly the same. In the first
equation, $\alpha$ and $\beta_i$ become time-variant and change into $\alpha_t$ and $\beta_{i,t}$. The last equation presents
the dynamic estimation of $\beta_{i,t}$ as $\beta_{i,t-1}$ plus a separate random error term with its own variance $\sigma_\eta^2$.
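To make the state space mechanics concrete, the following minimal Python sketch runs a Kalman filter for a stripped-down version of the model with a single lag, no intercept and no GARCH variance function; the fixed noise variances q and r_var are illustrative choices rather than estimated values.

import numpy as np

def kalman_tvp_beta(r, q=1e-5, r_var=1e-4):
    """Filtered path of a time-varying AR(1) coefficient beta_t.

    Measurement: r_t = beta_t * r_{t-1} + eps_t,  eps_t ~ N(0, r_var)
    State:       beta_t = beta_{t-1} + eta_t,     eta_t ~ N(0, q)
    """
    beta, P = 0.0, 1.0               # initial state mean and variance
    betas = np.zeros(len(r))
    for t in range(1, len(r)):
        P = P + q                    # time update of the random walk state
        H = r[t - 1]                 # "design" value: yesterday's return
        S = H * P * H + r_var        # innovation variance
        K = P * H / S                # Kalman gain
        beta = beta + K * (r[t] - H * beta)   # measurement update
        P = (1.0 - K * H) * P
        betas[t] = beta
    return betas

A filtered path of the lag parameter that hovers around zero signals (local) weak form efficiency, while persistent departures from zero signal predictability.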
Another interesting approach combines elements of the time-varying parameter model and the
rolling estimation windows. This methodology, developed by Ito and Sugiyama (2009), first uses a
moving time window approach to calculate time-varying autocorrelations ($\hat{\beta}_t$) on a weekly or
monthly basis. This simply consists of dividing $n$ observations into $n - l + 1$ subsamples, $l$ being
the window length, and then calculating the first-order autocorrelation within each subsample. Next, a
state space model drawing from these time-varying autocorrelations ($\hat{\beta}_t$) is implemented:

$$\hat{\beta}_t = \beta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$$

$$\beta_t = \beta_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2)$$

The first equation is again the measurement equation, but this time without a time-varying
intercept and the risk-premium component, and starting from the calculated rolling autocorrelations
rather than the return. The second equation is again the state equation, which is identical to the
one for the test of evolving efficiency. Initial results from this methodology point towards varying
levels of efficiency for the S&P-500 (Ito & Sugiyama, 2009).
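The first stage of this procedure is straightforward to sketch in Python; the Series name returns and the window length of 130 observations are again our illustrative assumptions.

import pandas as pd

def rolling_first_order_autocorr(returns: pd.Series, l: int = 130) -> pd.Series:
    """First-order autocorrelation in each of the n - l + 1 moving windows."""
    # raw=False makes each window arrive as a Series, so .autocorr() is available
    return returns.rolling(window=l).apply(lambda w: w.autocorr(lag=1), raw=False)

The resulting series of rolling autocorrelations would then feed the measurement equation above.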
5. Alternative theoretical framework
Focusing on the lack of a new theoretical framework on efficient markets as one of the causes of
the debate, we now consider an interesting alternative proposed by Lo (2004, 2005). We first
present the framework and then implement a rolling VR test to look for empirical validation.
5.1. Adaptive markets hypothesis: A possible reconciling framework
In an attempt to reconcile theories of the EMH and behavioral finance, Lo (2004, 2005) came up
with the adaptive markets hypothesis (AMH). Starting from the concepts of bounded rationality and
satisficing27, and the notion of biological evolution, he argued that many of the biases found in
behavioral finance follow a certain evolutionary path, in which individuals try to learn and adapt to
new market conditions. This learning and adaptation process is driven by competition among
investors, and natural selection determines the market ecology, with some investors being driven
out of the market and some investors remaining in the market. The process of natural selection
and competition shapes the evolutionary dynamics underlying the market, which are mirrored in
the degree of efficiency. As long as there is no market shock that causes market ecology to
change, stock markets are fairly efficient. Once a certain event triggers the process of competition
and natural selection, markets become temporarily less efficient. When the new market ecology is
formed, efficiency of financial markets returns to pre-shock levels.
Looking at the recent financial crisis, we can recognize certain elements from Lo’s theory. Financial
markets had been fairly stable for some years and a reasonable degree of market efficiency was
reached. Nevertheless, investors also demonstrated some degree of irrational behavior, which
eventually led to the housing bubble and markets exhibiting higher degrees of inefficiency. Since
mortgages had been transformed into investment vehicles sold across the globe, the housing crisis
quickly evolved into a global financial crisis. Investors had to learn from their mistakes and needed
to adapt to the new market conditions. In the search for optimal investment strategies,
competition started between new and incumbent investors. Those investors that did not learn
quickly enough and/or did not adapt to the new circumstances lost so much money that they were
driven out of the market. The new market ecology was formed consisting of those investors that
had learned and adapted promptly, and a new evolution towards efficiency was started.
From the AMH theory and the example, a reconciliation between the EMH and behavioral finance
becomes apparent. Markets are neither perfectly efficient nor inefficient all the time; there is a
certain evolutionary aspect to the process of market efficiency. For a long time, stock markets can
process information in a reasonably efficient manner until a certain shock, crash or other event
disrupts this state of efficiency. Some market participants are driven out of the market and some
new participants enter the market. During this process in which a new market ecology is formed,
relative levels of inefficiency are found. Once the transformation period ends, levels of market
efficiency are restored, until a new crash, shock or other event disrupts the new found ecological
equilibrium. Note that the AMH theory is also similar to the idea of the alternative methodologies in
which market efficiency is calculated in relative degrees over time.
27 Humans have neither the information nor the methodology to always optimize in a rational way.
Consequently, they use rules of thumb or heuristics to find satisfactory results that are not necessarily
rational (Simon, 1955).
5.2. Empirical validation of the adaptive markets hypothesis
After introducing Lo’s AMH theory, we now look for empirical validation by employing a rolling VR
test starting from the insights of our robustness analysis. We also review earlier empirical research
testing Lo’s theory.
5.2.1. First attempts to empirically test the adaptive markets hypothesis
The earliest attempt to empirically investigate the AMH comes from Lo himself. Computing rolling
first-order autocorrelations of monthly returns as a measure of market efficiency, Lo (2004, 2005)
finds a cyclical pattern through time, which confirms the idea of underlying dynamics to the degree
of market efficiency. However, Lo’s estimated rolling autocorrelation measures are not in line with
the idea of markets being relatively efficient for a long time, until a market crash causes a short
period of relatively lower efficiency. Rather, his empirical evidence points towards the reverse.
In later years, researchers examined the AMH by means of trading strategies. Investigating the
profitability of moving average strategies on the Asia-Pacific financial markets, Todea, Ulici and
Silaghi (2009) confirm the cyclical efficiency pattern of the AMH. Neely, Weller and Ulrich (2009)
study excess returns earned by various technical trading rules on foreign exchange markets. They
find these returns to decline over time, but at a slower pace than expected under the EMH because
of behavioral and institutional factors. These findings are consistent with the AMH view of markets
being dynamic systems subject to underlying evolutionary processes.
Finding a higher degree of stock market predictability in times of economic and political crises, Kim
et al. (2011) confirm Lo’s idea of time-varying market efficiency being driven by changing market
conditions. During market bubbles and crashes, virtually no return predictability is found. This,
however, is at odds with Lo’s AMH, which states that higher degrees of predictability and thus
lower degrees of efficiency ought to be found in times of market mania. Implementing an OLS
regression, Kim et al. find inflation, risk-free rates and stock market volatility to be important
factors influencing stock return predictability over time.
The first evidence from empirical studies shows that there is some value to the idea of adaptive
markets. However, some discrepancies were found as well. Given these observations and the
limited number of studies empirically testing the AMH, we apply a rolling VR test to gain more
insight into the validity of Lo's theory.
5.2.2. Application of a rolling variance ratio test to the adaptive markets hypothesis
Our choice for the rolling VR test to investigate the AMH follows naturally from the fact that both
the test and the theory find their origin in the concept of time-varying efficiency. In contrast to
earlier research, we also take underlying sensitivities to the applied methodology into account by
drawing from the findings of our earlier robustness analysis of the rolling VR test. Specifically, we
use daily data and rolling windows of length 6 months (130 observations) and 1 year (260
observations). The same four stock market indices as before are considered and the selected
orders of differentiation are still 2, 4, 8 and 16. We calculate the p-values of the VR test statistics
for the different rolling windows in the data samples and plot the results from the different
considered orders of differentiation for every combination of stock market index and rolling window
length. The obtained graphs present the evolution of VR test p-values through time and can be
interpreted as a time-varying measure of efficiency, which allows us to comment on the validity of
the ideas underlying the AMH. The results are obtained with the statistical package Gretl and are
presented graphically in figure 1.
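For transparency, the sketch below shows the computation behind such plots in Python, assuming an array of log prices. It implements only the simplified homoskedastic Lo and MacKinlay statistic; a full implementation, like the one in Gretl, would add finite-sample bias corrections and the heteroskedasticity-robust variant.

import numpy as np
from scipy import stats

def vr_pvalue(log_prices, q):
    """Two-sided p-value of a simplified Lo-MacKinlay VR test."""
    r = np.diff(log_prices)                  # one-period log returns
    n = len(r)
    mu = r.mean()
    var1 = np.sum((r - mu) ** 2) / n         # variance of 1-period returns
    rq = log_prices[q:] - log_prices[:-q]    # overlapping q-period returns
    varq = np.sum((rq - q * mu) ** 2) / len(rq)
    vr = varq / (q * var1)                   # variance ratio VR(q)
    # asymptotic variance of VR(q) - 1 under the homoskedastic i.i.d. null
    phi = 2 * (2 * q - 1) * (q - 1) / (3 * q * n)
    z = (vr - 1) / np.sqrt(phi)
    return 2 * (1 - stats.norm.cdf(abs(z)))

def rolling_vr_pvalues(log_prices, q, window=130):
    """VR test p-value in each rolling window of window + 1 prices."""
    return np.array([vr_pvalue(log_prices[s:s + window + 1], q)
                     for s in range(len(log_prices) - window)])

A p-value above conventional significance levels means the random walk null is not rejected in that window, i.e. a relatively higher degree of weak form efficiency.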
Figure 1: Rolling VR test p-values for the DJIA, S&P-500, NASDAQ and BEL-20 for windows of length (l) 130 and 260
observations.
Every graph presents the p-values of the VR test for the different rolling windows in the data sample. Per index, two graphs are
included: one for windows with length (l) 130 observations (6 months) and one for windows with length 260 observations (1
year). For every combination of index and rolling window length, p-values are plotted for four different orders of differentiation
(q).
When looking at the plots in figure 1, we observe a cyclical pattern like the one noticed by Lo
(2004, 2005) when using rolling first-order autocorrelations. The patterns also exhibit some
similarities across the different stock indices. Peaks in p-values can be found at the end of the
1980s and around 2010, which are both periods of stock market crises. Together, these
observations provide additional validation for the idea of adaptive markets. Even though p-values
for the second order of differentiation ought to be smaller than those for higher orders of
differentiation, we detect a few observations in which this is not the case. Further examination did
not yield a precise explanation, other than that these irregularities seem to coincide with periods of
increased market turbulence. This, however, does not interfere with the overall interpretation of
results or the obtained conclusions.
There is also a peculiarity to the graphs that is at odds with the AMH: the peaks in p-values
represent periods with a relatively high degree of weak form market efficiency, as the null
hypothesis of returns following a random walk is no longer rejected. Rather than periods with a
high degree of efficiency being disrupted by a crash causing some extent of inefficiency, rolling
estimation windows point to the reverse. Stock market indices appear to be relatively inefficient for
some time, until a short period with a high degree of efficiency starts. Although in contrast with the
theory suggested by Lo (2004, 2005), this observation is not that different from what he observed
empirically by means of his rolling autocorrelation test. Our observation is also in line with what
was found by Kim et al. (2011). They provide a possible explanation stating that in times of
turbulence markets are harder to predict, causing tests of efficiency based on predictability to point
towards a higher degree of efficiency. A further digression on these results together with the
results of our earlier investigation of traditional and alternative weak form market efficiency tests is
presented in the next section.
6. Discussion of results
Further discussing results from our earlier analysis of both traditional and alternative test
methodologies, and the alternative theoretical framework, we now aim to infer some important
lessons from the past. Additionally, we use these results to look at the future and come up with
suggestions for further research.
6.1. Traditional tests of weak form market efficiency
Our treatment of the more traditional tests of market efficiency allows us to learn something about
the role methodology has played in the development of the debate. From our robustness analysis
we learn that the VR test is unlikely to have complicated the debate if only daily or weekly data,
or monthly data in combination with time windows of 20 years, had been used in research. However,
scanning the literature we find that this was not always the case. For example, Charles and Darné
(2009) indicate that research employing Lo and MacKinlay’s VR test to check for weak form market
efficiency of Latin American stock markets has led to conflicting results. Furthermore, they
observed that “the results are overall mixed and scattered over studies that employ different
sample periods, methods and data frequencies” (p. 518). This observation is perfectly consistent
with our findings about sensitivity of the VR test to decisions on time interval (i.e. data frequency)
and time window (i.e. sample period). In sum, we can conclude that methodology, and more
particularly the VR test, might have complicated the debate.
From our analysis, we also learn that prior research using the ADF unit root test in combination
with daily data probably did not spur controversy. However, with weekly and monthly data,
depending on the length of the sample time window, the ADF unit root test may have led to conflicting
results. Lean and Smyth (2007) reviewed the literature of studies employing ADF unit root tests to
examine weak form market efficiency and concluded that “the empirical evidence on the random
walk hypothesis from these studies is mixed” (p. 17). Given our findings of sensitivity of the ADF
test to decisions on time window and time interval, we can understand how these conflicting results
could have been found.
Since both the ADF unit root test and Lo and MacKinlay’s VR test have been popular to test for
weak form market efficiency, we also need to examine to what extent these tests lead to consistent
results. All in all, we see that both tests only yield consistent results for daily data, weekly data in
combination with time windows of 10 and 20 years and monthly data in combination with time
windows of 20 years.
Nonlinear serial dependence also proved to be a significant feature of stock market returns across
the globe. The most important insight for the purpose of our research is that, as was suggested
by behavioral scholars, there is more to weak form market efficiency than the predictability
uncovered by linear autocorrelation tests. This also helps explain part of the debate, as it weakens
the arguments of proponents of the EMH drawing from linear autocorrelation tests. However, since
few nonlinear tests provide more information on the specific form of the underlying nonlinear
process, it was difficult for behaviorists to make a strong case for stock markets being generally
inefficient. To this day, the emergence of new methodologies examining nonlinear serial
dependence has not brought the debate to a conclusion, but has instead further increased controversy.
Altogether, our analysis yields an important lesson from the past: traditional methodologies to
test for weak form market efficiency may be considered as one of the causes of the debate. Both
tests based on the autocorrelation and the stationarity proxy are prone to sensitivity, which might
have led to conflicting results. Also, these tests do not always lead to the same conclusions, which
might have induced further controversy. Finally, the consideration of nonlinear serial dependencies
proved to be useful, but eventually only led to added complexity in the debate, as it remains
difficult to specify the underlying nonlinear process.
6.2. Alternative tests of weak form market efficiency
In response to the idea of Campbell et al. (1997) of a time-varying degree of efficiency, several
alternative test methodologies were developed. These methodologies have already been applied
but most are still in development today and can be considered a tool for future research.
Non-overlapping subperiod analysis has proven to be very valuable in researching the impact of
specific events or factors on market efficiency. From earlier research, we learn that weak or mixed
results are found on the impact of some broad categories of events. This again proves how
complex and multidimensional the concept of market efficiency is. It cannot just be about
information being incorporated in market prices in a timely manner, because then efficiency should
have increased in response to, for example, liberalization and technological revolutions. Since non-overlapping subperiod analyses are somewhat less appropriate for commenting on the broader issue of
market efficiency, academics should also be careful in drawing too strong a conclusion from the
evidence. Stating more than that these tests provide a better understanding of the complexity of
the issue would be scientifically unjustified.
The best-developed category of alternative tests is that of the rolling estimation windows. From our
exploration we learn that a relative and time-varying approach can be valuable and is less likely to
spur controversy. The robustness analysis indicates that rolling VR tests should be implemented on
daily data organized in windows with a length between 6 months and 1 year. Rather than
immediately settling the debate, we believe the rolling estimation windows can first help to redefine
it. Instead of trying to prove whether or not financial markets are weak form
efficient, it might be wiser to study market efficiency as a time-variant and relative concept,
rather than an absolute one.
The time-varying parameter models represent another interesting alternative approach to the
examination of weak form market efficiency, but more research is still required. On the one hand,
methodological innovations are necessary to further perfect the test as a measure of weak form
market efficiency. On the other hand, these kinds of tests need to be implemented more often
using stock market data to validate earlier results of time-varying degrees of efficiency.
Both the rolling windows and the time-varying parameter models provide a new perspective for the
debate as their time-variant nature enables reconciliation between the extreme views held by
adherents of both the EMH and behavioral finance. We suggest that researchers look into the
further development of these tests as this could lead to a definitive redefinition of efficiency from
an absolute to a more time-variant and relative state of financial markets. This, in turn, might
further accommodate a reconciling conclusion for a debate of many years’ standing.
6.3. Alternative theoretical framework on market efficiency
Another reason for the debate remaining unsettled is the lack of an alternative theoretical
framework capturing the obtained empirical results. Lo (2004, 2005) tried to fill this gap by positing
the AMH, which reconciles behavioral finance and the EMH, drawing on insights from
evolutionary biology. Both our own empirical investigation of the AMH and earlier research
confirmed the dynamic character of weak form market efficiency. However, the theoretical pattern
of longer periods of relatively high degrees of efficiency interrupted by shorter periods of relative
inefficiency associated with crises can be empirically discarded. In fact, the opposite pattern was
found.
We believe the reason for these conflicting patterns lies in the unobservable nature of efficiency,
which calls for the use of proxies. Different kinds of proxies have been used in the past, all of which
are related to the principle of predictability because of its interesting inverse theoretical
relationship with efficiency. Lo’s AMH starts from our intuition of markets being capable of
efficiently incorporating information most of the time. In times of turbulence, however, this ability
of markets might temporarily weaken. Inversely, predictability of stock returns is always assumed
to be limited, except in times of market mania.
Looking at the estimates for the efficiency pattern from testing the AMH, the inverse relationship
between the predictability proxy and efficiency seems imperfect. In fact, for our intuition to be
correct, the relationship between predictability and efficiency should be proportional rather than
inverse, as the observed peaks in p-values represent periods with a relatively low degree of return
predictability. This puzzle helps us understand why Lo's theory has not been able to settle the
debate over the last decade. In order to find a way out of this conundrum, we see two possibilities.
We could redefine our intuition, as our understanding of the concept of efficiency might be
incorrect. This, however, tends to be very difficult and is likely to face strong resistance. A better
solution would be to look for another observable trait of financial markets that could be used as an
appropriate proxy for efficiency, which presents an interesting challenge for future research.
7. Conclusion
The debate on efficient markets has come a long way. In fact, many of the most renowned 19th
and 20th century economists have contributed to it to some extent. From our research, we find two
reasons that help explain the current dispute. First, traditional test methodologies applied in the
early aftermath of the establishment of the EMH proved to be prone to sensitivity, which might
have spurred conflicting results. Additionally, opponents of the EMH have failed to come up with
an improved theoretical alternative.
Next to a better understanding of the past, our research also provides perspectives for the future.
Most importantly, we believe that the concept of efficiency should be redefined along the lines of
Campbell et al. (1997). Considering efficiency as an absolute and binary state of financial markets
has spurred controversy. The idea of efficiency being a time-variant and relative characteristic can
pave the way for reconciliation between adherents of the EMH and proponents of behavioral
finance. In this regard, the presented alternative methodologies enabling a more dynamic approach
to efficiency can be important tools for future research. In addition to an evolution of test
methodologies, a new theoretical framework on efficiency is needed. Lo’s AMH can be the impetus
for such a new framework, but further methodological refinement is needed in testing for market
efficiency.
Research opportunities remain in the continued development of alternative test methodologies, as
they have the potential to become the standard tests of efficiency in the future. Ideally, robustness
analyses should be implemented systematically to avoid further controversy. More research is also
needed to address the conundrum observed when empirically investigating Lo’s AMH. Together,
alternative test methodologies and a new theoretical framework on efficiency have the potential to
settle the debate by reconciling the ideas of the EMH and behavioral finance through the concept of
time-varying and relative efficiency.
References
Abdmoulah, W. (2010). Testing the evolving efficiency of Arab stock markets. International Review
of Financial Analysis, 19(1), 25-34.
Alexander, S. S. (1961). Price movements in speculative markets: Trends or random walks.
Industrial Management Review, 2(2), 7-26.
Alexander, S. S. (1964). Price movements in speculative markets: Trends or random walks, no. 2.
Industrial Management Review, 5(2), 25-46.
Alvarez-Ramirez, J., Alvarez, J., Rodriguez, E., & Fernandez-Anaya, G. (2008). Time-varying Hurst
exponent for US stock markets. Physica A, 387(24), 6159-6169.
Bachelier, L. (1900). Théorie de la spéculation. Annales Scientifiques de l’Ecole Normale
Supérieure, 3(17), 21-86.
Ball, R. (1978). Anomalies in relationships between securities' yields and yield-surrogates. Journal
of Financial Economics, 6(2-3), 103-126.
Barberis, N., Shleifer, A., & Vishny, R. (1998). A model of investor sentiment. Journal of Financial
Economics, 49(3), 307-343.
Bodie, Z., Kane, A., & Marcus, A. (2010). Investments (9th ed.). New York: McGraw-Hill Irwin.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis, forecasting and control. San Francisco:
Holden-Day.
Brock, W. A., Scheinkman, J. A., Dechert, W. D., & LeBaron, B. (1996). A test for independence
based on the correlation dimension. Econometric Reviews, 15(3), 197-235.
Brown, D., & Jennings, R. (1989). On technical analysis. Review of Financial Studies, 2(4), 527-551.
Brown, R. (1828). A brief account of microscopical observations. Edinburgh New Philosophical
Journal, 6, 358-371.
Cajueiro, D. O., & Tabak, B. M. (2004). The Hurst exponent over time: Testing the assertion that
emerging markets are becoming more efficient. Physica A, 336(3-4), 521-537.
Cajueiro, D. O., & Tabak, B. M. (2005). Ranking efficiency for emerging equity markets II. Chaos,
Solitons and Fractals, 23(2), 671-675.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets.
Princeton: Princeton University Press.
Carhart, M. M. (1997). On persistence in mutual fund performance. Journal of Finance, 52(1), 57-82.
Charles, A., & Darné, O. (2009). Variance-ratio tests of random walk: An overview. Journal of
Economic Surveys, 23(3), 503-527.
Chow, K., & Denning, K. (1993). A simple multiple variance ratio test. Journal of Econometrics, 58(3), 385-401.
Conrad, J., & Kaul, G. (1988). Time-variation in expected returns. Journal of Business, 61(4), 409-425.
Cowles, A. (1933). Can stock market forecasters forecast? Econometrica, 1(3), 309-324.
Cowles, A. (1944). Stock market forecasting. Econometrica, 12(3-4), 206-214.
Cowles, A., & Jones, H. (1937). Some a posteriori probabilities in stock market action.
Econometrica, 5(3), 280-294.
Daniel, K., Hirshleifer, D., & Subrahmanyam, A. (1998). Investor psychology and security market
under- and overreactions. Journal of Finance, 53(6), 1839-1885.
De Bondt, W. F., & Thaler, R. (1985). Does the stock market overreact? Journal of Finance, 40(3),
793-805.
De Long, B. J., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1990). Positive feedback
investment strategies and destabilizing rational expectations. Journal of Finance, 45(2),
374-397.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series
with a unit root. Journal of the American Statistical Association, 74(366), 427-431.
Durbin, J., & Koopman, S. (2008). Time series analysis by state space methods. Oxford: Oxford
University Press.
Emerson, R., Hall, S. G., & Zalewska-Mitura, A. (1997). Evolving market efficiency with an
application to some Bulgarian shares. Economics of Planning, 30(2-3), 75-90.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of
United Kingdom inflation. Econometrica, 50(4), 987-1007.
Engle, R. F. (2001). GARCH 101: The use of ARCH/GARCH models in applied econometrics. Journal
of Economic Perspectives, 15(4), 157-168.
Fama, E. F. (1965a). The behavior of stock-market prices. Journal of Business, 38(1), 34-105.
Fama, E. F. (1965b). Random walks in stock market prices. Financial Analysts Journal 21(5), 55–
59.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. Journal of
Finance, 25(2), 383-417.
Fama, E. F. (1991). Efficient capital markets: II. Journal of Finance, 46(5), 1575-1617.
Fama, E. F. (1998). Market efficiency, long-term returns and behavioral finance. Journal of
Financial Economics, 49(3), 283-306.
Fama, E. F., & Blume, M. E. (1966). Filter rules and stock-market trading. Journal of Business,
39(S1), 226-241.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds.
Journal of Financial Economics, 33(1), 3-56.
Fox, J. (2009). The myth of the rational market: A history of risk, reward, and delusion on Wall
Street. New York: Harper Business.
Gibson, G. (1889). The stock markets of London, Paris and New York. New York: G.P. Putnam's
Sons.
Granger, C. W. J., & Morgenstern, O. (1963). Spectral analysis of New York stock market prices.
Kyklos, 16(1), 1–27.
Granger, C. W. J., & Andersen, A. P. (1978). An introduction to bilinear time series models.
Gottingen: Vandenhoeck and Ruprecht.
Grossman, S. (1976). On the efficiency of competitive stock markets where traders have diverse
information. Journal of Finance, 31(2), 573-585.
Grossman, S., & Stiglitz, J. (1980). On the impossibility of informationally efficient markets.
American Economic Review, 70(3), 393-408.
Gu, A. Y., & Finnerty, J. (2002). The evolution of market efficiency: 103 years daily data of the
Dow. Review of Quantitative Finance and Accounting, 18(3), 219-237.
Hald, A. (1990). A history of probability and statistics and their applications before 1750. New
York: John Wiley and Sons.
Hamilton, J. D. (1994). Time series analysis. Princeton: Princeton University Press.
Hill, R. C., Griffiths, W. E., & Lim, G. C. (2011). Principles of econometrics (Vol. 4). New York: John
Wiley and Sons.
Hinich, M. J. (1982). Testing for Gaussianity and linearity of a stationary time series. Journal of
Time Series Analysis, 3(3), 169-176.
Hinich, M. J. (1996). Testing for dependence in the input to a linear time series model. Journal of
Nonparametric Statistics, 6(2-3), 205-221.
Hinich, M. J., & Patterson, D. M. (1985). Evidence of nonlinearity in daily stock returns. Journal of
Business and Economics Statistics, 3(1), 69-77.
Ito, M., & Sugiyama, S. (2009). Measuring the degree of time varying market inefficiency.
Economics Letters, 103(1), 62-64.
Jegadeesh, N. (1990). Evidence of predictable behavior of security returns. Journal of Finance, 45(3), 881-898.
Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for
stock market efficiency. Journal of Finance, 48(1), 65-91.
Jegadeesh, N., & Titman, S. (2001). Profitability of momentum strategies: An evaluation of
alternative explanations. Journal of Finance, 56(2), 699-720.
Jenkins, G. M., & Watts, D. (1968). Spectral analysis and its applications. San Francisco: Holden-Day.
Jensen, M. C. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial
Economics, 6(2-3), 95-101.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47(2), 263-292.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic
Engineering, 82(1), 35-45.
Kemp, A. G., & Reid, G. C. (1971). The random walk hypothesis and the recent behaviour of equity
prices in Britain. Economica, 38(149), 28-51.
Keynes, J. M. (1936). The general theory of employment, interest and money. London: Macmillan.
Kim, J. H., Shamsuddin, A., & Lim, K. P. (2011). Stock return predictability and the adaptive
markets hypothesis: Evidence from century-long U.S. data. Journal of Empirical Finance,
18(5), 868-879.
Kothari, S. (2001). Capital markets research in accounting. Journal of Accounting and Economics,
31(1-3), 105-231.
Lakonishok, J., Shleifer, A., & Vishny, R. W. (1994). Contrarian investment, extrapolation, and risk.
Journal of Finance, 49(5), 1541-1578.
Lean, H. H., & Smyth, R. (2007). Do Asian stock markets follow a random walk? Evidence from LM
unit root tests with one and two structural breaks. Review of Pacific Basin Financial Markets
and Policies, 10(1), 15-31.
Lehmann, B. (1990). Fads, martingales and market efficiency. Quarterly Journal of Economics,
105(1), 1-28.
Lim, K. P., & Brooks, R. D. (2006). The evolving and relative efficiencies of stock markets:
Empirical evidence from rolling bicorrelation test statistics (SSRN Working Paper No.
931071). Retrieved from the Social Science Research Network website:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=931071
Lim, K. P., & Brooks, R. D. (2011). The evolution of stock market efficiency over time: A survey of
the empirical literature. Journal of Economic Surveys, 25(1), 69-108.
Lo, A. W. (2004). The adaptive markets hypothesis: Market efficiency from an evolutionary
perspective. Journal of Portfolio Management, 30(5), 15-29.
Lo, A. W. (2005). Reconciling efficient markets with behavioral finance: The adaptive markets
hypothesis. Journal of Investment Consulting, 7(2), 21-44.
Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence
from a simple specification test. Review of Financial Studies, 1(1), 41-66.
Lo, A. W., & MacKinlay, A. C. (1990). When are contrarian profits due to market overreaction?
Review of Financial Studies, 3(2), 175-206.
Lo, A. W., & MacKinlay, A. C. (1997). Maximizing predictability in the stock and bond markets.
Macroeconomic Dynamics, 1(1), 102-134.
MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle & C. W. J. Granger
(Eds.). Long-run economic relationships: Readings in cointegration. Oxford: Oxford
University Press.
Malkiel, B. G. (1992). Efficient market hypothesis. In P. Newman, M. Milgate, & J. Eatwell (Eds.).
New palgrave dictionary of money and finance. London: Macmillan.
Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic
Perspectives, 17(1), 59-82.
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7(1), 77-91.
Marshall, A. (1890). Principles of economics. London: Macmillan.
Neely, C. J., Weller, P. A., & Ulrich, J. (2009). The adaptive markets hypothesis: Evidence from the
foreign exchange market. Journal of Financial and Quantitative Analysis, 44(2), 467-488.
Pearson, K. (1905). The problem of the random walk. Nature, 72(1865), 294.
Rahman, A., & Saadi, S. (2008). Random walk and breaking trend in financial series: An
econometric critique of unit root tests. Review of Financial Economics, 17(3), 204-212.
Regnault, J. (1863). Calcul des chances et philosophie de la bourse. Paris: Mallet-Bachelier et
Castel.
Roberts, H. (1967). Statistical versus clinical prediction of the stock market. Unpublished
manuscript.
Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory,
13(3), 341-360.
Samuelson, P. A. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial
Management Review, 6(2), 41-49.
Sewell, M. (2011). History of the efficient market hypothesis (UCL Research Note No. RN/11/04).
Retrieved from the University College London website: http://www.cs.ucl.ac.uk/
fileadmin/UCL-CS/images/Research_Student_Information/RN_11_04.pdf
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk.
Journal of Finance, 19(3), 425-442.
Shefrin, H. (2000). Beyond greed and fear: Understanding behavioral finance and the psychology
of investing. Oxford: Oxford University Press.
Shiller, R. J. (1979). The volatility of long-term interest rates and expectations models of the term
structure. Journal of Political Economy, 87(6), 1190-1219.
Shiller, R. J. (1989). Market volatility. Cambridge: The MIT Press.
Shiller, R. J. (2000). Irrational exuberance. Princeton: Princeton University Press.
Shiller, R. J. (2003). From efficient markets theory to behavioral finance. Journal of Economic
Perspectives, 17(1), 83-104.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69(1),
99-118.
Smith, A. (1759). The theory of moral sentiments. Indianapolis: Liberty Fund.
Smith, A. (1766). The wealth of nations. New York: P. F. Collier.
Smith, V. L. (1998). The two faces of Adam Smith. Southern Economic Journal, 65(1), 2-19.
Todea, A., Ulici, M., & Silaghi, S. (2009). Adaptive markets hypothesis: Evidence from Asia-Pacific
financial markets. Review of Finance and Banking, 1(1), 7-13.
von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton:
Princeton University Press.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica, 48(4), 817-838.
White, H., & Domowitz, I. (1984). Nonlinear regression with dependent observations.
Econometrica, 52(1), 143-162.
Working, H. (1934). A random-difference series for use in the analysis of time series. Journal of the
American Statistical Association, 83(s1), S87-S93.
Wright, J. (2000). Alternative variance-ratio tests using ranks and signs. Journal of Business and
Economic Statistics, 18(1), 1-9.
Zalewska-Mitura, A., & Hall, S. G. (1999). Examining the first stages of market performance: A test
for evolving market efficiency. Economics Letters, 64(1), 1-12.
Appendix A
Graphical summary table 1
Time-series plots for the daily, weekly and monthly prices of the DJIA, S&P-500, NASDAQ
and BEL-20.
Appendix B
This graph presents the p-values from different rolling window lengths through time for the DJIA with order of differentiation 2.