Investigation of NYSE High Frequency Financial Data
for Intraday Patterns in Jump Components of
Equity Returns
Peter Van Tassel¹
Final Report Submitted for Economics 201FS: Research Seminar
and Lab on High Frequency Financial Data Analysis
for Duke Economics Juniors
Duke University
Durham, North Carolina
2 May 2007
Academic Honesty Pledge
1. I will not lie, cheat, or steal in my academic endeavors, nor will I accept the actions of those
who do.
2. I will conduct myself responsibly and honorably in all my activities as a Duke student.
3. The assignment is in compliance with the Duke Community Standard as expressed on pp. 5-7
of “Academic Integrity at Duke: A Guide for Teachers and Undergraduates.”
Pledge: Peter Van Tassel

¹ peter.vantassel@duke.edu. Box 93481, Durham NC 27708
I. Introduction
Financial markets are complex systems in which agents interact to determine the prices of
different assets. The goal of our research is to analyze high frequency financial data to improve
our knowledge of how financial markets operate. In particular, we consider the literature on jump
components in asset prices as a starting point for this report.
Our motivation is both practical and intellectual. From a practical standpoint, there is something awry in the traditional methods for modeling stock price evolution. The well-documented smiles in the volatilities implied by the Black-Scholes option pricing formula are one of many indications that the market values volatility differently than the rudimentary academic models do. The literature on jump components in asset prices, including Andersen, Bollerslev, and Diebold (2004), Eraker (2003), and Huang and Tauchen (2005), has practical implications for modeling volatility, including important results for derivative valuation, risk management, and asset allocation. These concerns are of utmost importance to a trader or portfolio manager.
From an intellectual standpoint, the recently available high frequency financial data provide new opportunities for frontier research in econometrics. In some instances the data allow us to revisit previous findings in finance, giving a better picture of which ideas are robust to tick-by-tick data from the New York Stock Exchange. In other circumstances the data allow for new types of investigation that were never previously possible. Undoubtedly, the ability to zoom in on financial markets adds a new level of complexity to research, along with a variety of intellectual challenges, before we can decipher the workings of the real world.
In this lab report the high frequency financial data are used to investigate intraday patterns in the jump components of equity return variance. The purpose is to improve our understanding of the evolution of heavily traded stocks on the NYSE. Learning more about what drives intraday patterns in jump components has practical implications for trading and portfolio management, and it raises several interesting questions that we pose at the end of the report. Our focus will be on the S&P 500. Figures 1a & 1b present the price levels and returns over our sample period. The rest of the report proceeds as follows: Section II describes the data; Section III describes the statistics considered; Section IV presents preliminary work on SEC filings and idiosyncratic jumps, including an interesting example; Section V discusses patterns in flagged jump arrivals; Section VI continues the investigation of intraday patterns in the jump components of equity return variance; and Section VII presents our ideas for future work. All tables, figures, and references can be found at the back of the report.
II. Data
High frequency financial data from the NYSE are analyzed for the purpose of this report. We focus on the S&P 500 but also consider Pepsi Co., the Coca-Cola Company, and Bristol-Myers Squibb Co. The specific stocks considered were assigned as a starting point for research in Econ 201FS. Moving forward, the analysis will be extended to an additional 37 stocks and to an aggregate portfolio of all 40 stocks included in Law (2007). In this report the SPY data set is used as a proxy for the market portfolio. The data sets are obtained from the Trade and Quote (TAQ) database, which is available via Wharton Research Data Services (WRDS). A more comprehensive discussion of the data and of how the data sets were compiled can be found in Law (2007).²

² First and foremost, it is necessary to thank Tzuo Hann Law for sharing the data sets he compiled for his Honors Thesis. Without his help and willingness to collaborate, writing this report would have been impossible. A more detailed discussion of the method for obtaining the data sets can be found in his paper, The Elusiveness of Systematic Jumps.
Our selection of stocks is motivated by trading volume. In order for the statistics used in
this report to behave properly it is necessary that the stocks considered be heavily traded. The
stocks assigned in class and the stocks included in Law (2007) are 40 of the most actively traded
stocks on the NYSE as defined by their 10-day trading volume. For each stock there is a data set
that includes all trades from January 1, 2001 through December 31, 2005. The time period was
selected for two reasons. First, trading frequency increased significantly during the late 1990s, and by 2001 volume was high enough to justify the use of the statistics. Second, by 2001 almost all of the stocks had converted from fractional to decimal trading, which helped to reduce some of the market microstructure noise.
To convert the TAQ data into a 30-second price series, an adapted version of the previous-tick method from Dacorogna, Gençay, Müller, Olsen, and Pictet (2001) is applied. The first five minutes of the trading day are excluded to ensure uniformity of trading and information arrival. The resulting price series includes 771 observations per day, from 9:35am to 4:00pm, across 1241 days. This structure is advantageous because it allows us to implement the statistics easily across different sampling intervals.
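A minimal sketch of previous-tick sampling follows, assuming one day's trades arrive as sorted timestamps (seconds after midnight) with matching prices. The function name `previous_tick_grid` and the rule used for a grid point that precedes the first trade are illustrative assumptions, not the adapted procedure from Law (2007). The sketch also shows why a 30-second grid from 9:35am to 4:00pm has 771 points.

```python
from bisect import bisect_right

OPEN_SEC = 9 * 3600 + 35 * 60   # 9:35:00 am, after dropping the first five minutes
CLOSE_SEC = 16 * 3600           # 4:00:00 pm
STEP_SEC = 30                   # 30-second grid spacing

def previous_tick_grid(times, prices, start=OPEN_SEC, end=CLOSE_SEC, step=STEP_SEC):
    """Sample a sorted trade series onto a regular grid with the previous-tick rule.

    times  -- trade timestamps in seconds after midnight, sorted ascending
    prices -- trade prices aligned with `times`
    Returns a list of (grid_time, price). A grid point earlier than the first
    trade takes the first trade price (a hypothetical choice for this sketch).
    """
    grid = []
    for g in range(start, end + 1, step):
        idx = bisect_right(times, g) - 1      # index of last trade at or before g
        grid.append((g, prices[max(idx, 0)]))
    return grid
```

Each grid point carries the price of the most recent trade at or before it, so quiet periods repeat the last price rather than interpolating between trades.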
In this report the sampling interval is 17.5 minutes unless otherwise stated. Our primary concern in selecting a sampling interval is the effect of market microstructure noise (MMN). The literature on MMN dates back to Black (1976) and discusses a variety of sources that bias prices sampled at high frequency, including trading mechanisms and price discreteness. One approach to this problem is proposed in Andersen, Bollerslev, Diebold, and Labys (2000), who suggest plotting the realized variance against the sampling interval (a signature plot) and visually selecting an interval at which the effect of MMN appears to have stabilized. We select 17.5 minutes because it appears to be the highest sampling frequency that is relatively unaffected by MMN in the signature plots included in Law (2007). The result for our data set is that each stock has 22 returns per day (the 385 minutes from 9:35am to 4:00pm divided into 17.5-minute intervals) across 1241 days.
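The arithmetic behind the 22 returns per day, and the idea of a signature plot, can be sketched as follows. This is an illustration under stated assumptions: log prices are stored as a NumPy array with one row per day and 771 columns on the 30-second grid, and the helper names are hypothetical. A coarser interval is formed by keeping every `step`-th grid point, so 17.5 minutes corresponds to a step of 35 grid points.

```python
import numpy as np

GRID_SECONDS = 30      # spacing of the previous-tick price grid
POINTS_PER_DAY = 771   # 9:35am to 4:00pm inclusive

def grid_step(interval_minutes):
    """Number of 30-second grid steps in one sampling interval."""
    return int(round(interval_minutes * 60 / GRID_SECONDS))

def returns_per_day(interval_minutes):
    """Intraday returns obtained by sampling every grid_step() points."""
    return (POINTS_PER_DAY - 1) // grid_step(interval_minutes)

def signature_plot(log_prices, intervals_minutes):
    """Mean daily realized variance for each sampling interval.

    log_prices: array of shape (n_days, 771) of 30-second log prices.
    Returns {interval_minutes: average over days of the sum of squared returns}.
    """
    out = {}
    for minutes in intervals_minutes:
        sampled = log_prices[:, ::grid_step(minutes)]
        r = np.diff(sampled, axis=1)              # intraday returns only, no overnight
        out[minutes] = float(np.mean(np.sum(r ** 2, axis=1)))
    return out
```

Plotting these values against the interval and looking for the point where the curve flattens is the visual selection described above; for a noise-free random walk the curve would be flat apart from sampling error. Every interval in Tables 2a-2c divides the 770 grid steps evenly, so no partial interval is dropped.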
Figures 2a-2b and Tables 2a-2b also relate to the choice of sampling interval. The figures show the number of jumps flagged by the LM statistic at the 0.999 level (α = 0.001) across different sampling intervals and window sizes. The tables report the number of flagged jumps using the window sizes for the instantaneous volatility recommended by Lee and Mykland. Visually, the count seems to stabilize as the window size increases for each sampling interval. However, Tables 2a & 2b suggest that the number of flagged jumps does not stabilize across sampling intervals: at 17.5 minutes there are 306 flagged jumps for the SPY data, whereas at 55 minutes there are 120. One explanation might be Type I errors, because the number of statistics calculated differs substantially across sampling intervals. For example, a sampling interval of 17.5 minutes yields 27,152 statistics, whereas a sampling interval of 55 minutes yields only 8,609 statistics over the same data set. The null hypothesis of no jump will therefore be incorrectly rejected more often over 27,152 statistics than over 8,609. To account for this discrepancy, Table 2c reports the number of flagged jumps at each sampling interval with adjusted significance levels.³ Although it does not convincingly show that the LM statistic has stabilized, it provides more reassuring evidence than Tables 2a & 2b that 17.5 minutes is an appropriate sampling interval.
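The Type I error argument can be made concrete with a little arithmetic. The statistic counts quoted above are consistent with n = M × 1241 − K, that is, 22 × 1241 − 150 at 17.5 minutes and 7 × 1241 − 78 at 55 minutes, using the window sizes in Tables 2a & 2b; reading the first K returns as a burn-in for the volatility window is our interpretation, not a statement from the report. Under the independence assumption used for Table 2c, the number of false rejections is binomial, so its expected value at α = 0.001 is nα.

```python
def n_statistics(returns_per_day, n_days, window):
    """Number of LM statistics, treating the first `window` returns as burn-in (assumed)."""
    return returns_per_day * n_days - window

def expected_false_rejections(n, alpha):
    """Mean of a Binomial(n, alpha) count of Type I errors."""
    return n * alpha

def prob_no_false_rejection(n, alpha):
    """Pr(k = 0) = (1 - alpha)^n when the n tests are independent."""
    return (1.0 - alpha) ** n

for label, m, k in [("17.5 min", 22, 150), ("55 min", 7, 78)]:
    n = n_statistics(m, 1241, k)
    print(label, n, round(expected_false_rejections(n, 0.001), 1))
# prints:
# 17.5 min 27152 27.2
# 55 min 8609 8.6
```

So roughly 27 false rejections would be expected at the 17.5-minute interval even if no jumps occurred, compared with roughly 9 at the 55-minute interval.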
³ The different significance levels were calculated as follows. For the entire sample we set Pr(Type I error) = 0.001. Each statistic is assumed to be flagged correctly or incorrectly independently of the others, so the number of false rejections k among the n statistics follows a binomial distribution, and we require

$$\Pr(\text{no error}) = 0.999 = \Pr(k = 0) = \binom{n}{0}\alpha^{0}(1-\alpha)^{n} = (1-\alpha)^{n}.$$

The value of n is the number of statistics in the sample, and the value of α is then determined for each sampling interval and used as a significance level in Table 2c.

A final consideration concerns errors in the data set. The TAQ database is a human construction that relies on manual data entry. As such, it is inevitably subject to human error and needs to be carefully scrutinized. Errors in the data set are removed in two ways. First, a simple algorithm sets suspect prices equal to zero when consecutive thirty-second returns are each at least 1.5% in magnitude and of opposite sign. Suspect prices are removed from the price series because it seems implausible that an efficient market would move a stock 1.5% in one direction and then 1.5% back within a single minute of trading. A likely cause of this phenomenon is data entry error. However, we do not presume that data entry errors are the only possibility. One curious example appears in Figure 2c. The highlighted trade seems out of line with the rest of the price series, but further inspection reveals that its volume was 37 times the average volume per transaction over the five-year sample. Isn't it possible that a large investor, perhaps a hedge fund, needed to unload a large quantity of shares and was willing to accept a slightly lower price than the rest of the market? Surely some behavior that seems irrational is, in reality, the result of a well-functioning and efficient market. With this concern in mind, we proceed to the second method for identifying errors in the price series. A human eye is often required to remove outliers that the algorithm misses. In particular, manual inspection helps to remove returns with unreasonably large or small magnitudes. One example is discussed in Figures 2d & 2e.
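The first, algorithmic step can be sketched as below. This is a minimal illustration assuming simple (not log) 30-second returns and a single price list for one day; the function names are hypothetical, and the report does not specify how a zeroed price is subsequently dropped from the grid.

```python
def flag_bounces(prices, threshold=0.015):
    """Flag prices whose incoming and outgoing 30-second returns are both at least
    `threshold` in magnitude and of opposite sign (a spike that immediately reverses)."""
    suspect = [False] * len(prices)
    for t in range(1, len(prices) - 1):
        r_in = prices[t] / prices[t - 1] - 1.0     # return into price t
        r_out = prices[t + 1] / prices[t] - 1.0    # return out of price t
        if abs(r_in) >= threshold and abs(r_out) >= threshold and r_in * r_out < 0:
            suspect[t] = True
    return suspect

def zero_suspect_prices(prices, threshold=0.015):
    """Return a copy with suspect prices set to zero, mirroring the description above."""
    flags = flag_bounces(prices, threshold)
    return [0.0 if bad else p for p, bad in zip(prices, flags)]
```

A one-way move such as 100, 102, 102 is left alone; only a move that reverses within the next 30 seconds, such as 100, 102, 100, is flagged.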
Ultimately, we arrive at the data for this report. It includes price series for Pepsi Co. (PEP), the Coca-Cola Company (KO), Bristol-Myers Squibb Co. (BMY), and the S&P 500 (SPY). Each price series begins on January 1, 2001 and ends on December 31, 2005, with 771 observations per day across 1241 trading days.
III. Modeling Jump Components in Equity Return Variance
Two non-parametric test statistics will be considered in the analysis that follows. The first
statistic is recommended by Huang and Tauchen (2005) in response to their extensive Monte
Carlo analysis. It utilizes the realized variance discussed in Andersen, Bollerslev, and Diebold
(2002) and the bi-power variation developed in Barndorff-Nielsen and Shephard (2004) as a
method for analyzing the contribution of jumps to total price variance. Henceforth referred to as the BNS statistic or z-statistic, it tests the null hypothesis that no jumps occurred during a given trading day. The second statistic is proposed by Lee and Mykland (2006) and is henceforth referred to as the LM statistic. It is relevant because it offers practical advantages for the analysis of intraday patterns. In particular, the LM statistic allows specific returns to be flagged as statistically significant jumps. A more detailed explanation of the differences between the two statistics follows. The rationale for including both statistics is simple: while the BNS statistic has been published and rigorously tested, the statistic proposed by Lee and Mykland is still under review. Comparing the two will shed light on how the LM statistic performs relative to the BNS statistic, and it will help support our findings with different methods for analyzing the high frequency data.
The model behind the BNS statistic is a scalar log-price continuous time evolution,
$$dp(t) = \mu(t)\,dt + \sigma(t)\,dW(t) + dL_J(t). \qquad (1)$$
The first and second terms in the model date back to the assumptions of the Black-Scholes option pricing formula. To be concrete, $\mu(t)\,dt$ is a drift term and $\sigma(t)\,dW(t)$ is the diffusive term, where $\sigma(t)$ is the instantaneous volatility and $W(t)$ is a standard Brownian motion. The notation for the additional term $L_J(t)$ was first used in Basawa and Brockwell (1982). It refers to a pure jump Lévy process with increments $L_J(t) - L_J(s) = \sum_{s \le \tau \le t} \kappa(\tau)$, where $\kappa(\tau)$ is the jump size. Huang and Tauchen consider a specific class of Lévy process, the compound Poisson process (CPP), in which the jump intensity is constant and the jump sizes are independently and identically distributed.
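The realized variance, bipower variation, and tripower quarticity measures of price variation in high frequency financial data are presented below. As developed in Barndorff-Nielsen and Shephard and defined in Huang and Tauchen,

$$r_{t,j} = p\left(t-1+\frac{j}{M}\right) - p\left(t-1+\frac{j-1}{M}\right), \qquad j = 1,2,\ldots,M, \qquad (2)$$

$$RV_t = \sum_{j=1}^{M} r_{t,j}^{2}, \qquad (3)$$

$$BV_t = \mu_1^{-2}\left(\frac{M}{M-1}\right)\sum_{j=2}^{M} |r_{t,j-1}|\,|r_{t,j}| = \frac{\pi}{2}\left(\frac{M}{M-1}\right)\sum_{j=2}^{M} |r_{t,j-1}|\,|r_{t,j}|, \qquad (4)$$

$$TP_t = M\,\mu_{4/3}^{-3}\left(\frac{M}{M-2}\right)\sum_{j=3}^{M} |r_{t,j-2}|^{4/3}\,|r_{t,j-1}|^{4/3}\,|r_{t,j}|^{4/3}, \qquad (5)$$

where $\mu_a$ is an absolute moment of the standard normal distribution:⁴

$$\mu_a = E\left(|Z|^{a}\right), \qquad Z \sim N(0,1), \qquad a > 0. \qquad (6)$$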
Here M is the number of intraday returns (the within-day sampling frequency), so M = 22 at a 17.5-minute sampling interval. Combining the results of Andersen, Bollerslev, and Diebold (2002) with those of Barndorff-Nielsen and Shephard (2004), the difference between realized variance and bipower variation provides a method to investigate the jump component of equity return variance:
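$$\operatorname*{plim}_{M\to\infty}\left(RV_t - BV_t\right) = \left(\int_{t-1}^{t}\sigma^{2}(s)\,ds + \sum_{j=1}^{N_t}\kappa_{t,j}^{2}\right) - \int_{t-1}^{t}\sigma^{2}(s)\,ds = \sum_{j=1}^{N_t}\kappa_{t,j}^{2}, \qquad (7)$$

where $N_t$ is the number of jumps on day t and $\kappa_{t,j}$ is the size of the j-th of those jumps.

⁴ $\mu_p = 2^{p/2}\,\Gamma\!\left(\tfrac{p+1}{2}\right) / \Gamma\!\left(\tfrac{1}{2}\right) = E\left(|Z|^{p}\right)$. See Barndorff-Nielsen and Shephard (2004) for further explanations.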
Multiple statistics discussed in Huang and Tauchen (2005) use these results to identify days with statistically significant jumps. Their recommended statistic is used throughout this report. It is defined as

$$z_{TP,rm,t} = \frac{RJ_t}{\sqrt{\left(v_{bb} - v_{qq}\right)\dfrac{1}{M}\max\left(1, \dfrac{TP_t}{BV_t^{2}}\right)}}, \qquad (8)$$

where

$$RJ_t = \frac{RV_t - BV_t}{RV_t}, \qquad v_{bb} = \left(\frac{\pi}{2}\right)^{2} + \pi - 3, \qquad v_{qq} = 2.$$
Figures 3a & 3b plot the recommended BNS statistic applied to the SPY and PEP data sets at a 17.5-minute sampling interval. The null hypothesis of no jumps is rejected at a statistically significant level on 37 days for the SPY and on 65 days for PEP.
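Equations (2) through (8) translate directly into a few lines of code. The sketch below takes one day's M intraday log returns, builds RV, BV, and TP, and returns the $z_{TP,rm}$ statistic; it is an illustration, not the code used to produce Figures 3a & 3b. The one-sided rejection at the 0.999 level via the standard normal quantile is an assumed choice, since the level behind the BNS counts is not stated here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def mu(a):
    """mu_a = E|Z|^a for Z ~ N(0, 1), equation (6), via the gamma-function form."""
    return 2 ** (a / 2) * math.gamma((a + 1) / 2) / math.gamma(0.5)

def daily_measures(returns):
    """Realized variance (3), bipower variation (4), tripower quarticity (5)."""
    a = np.abs(np.asarray(returns, dtype=float))
    M = a.size
    rv = float(np.sum(a ** 2))
    bv = mu(1) ** -2 * (M / (M - 1)) * float(np.sum(a[1:] * a[:-1]))
    tp = M * mu(4 / 3) ** -3 * (M / (M - 2)) * float(
        np.sum((a[2:] * a[1:-1] * a[:-2]) ** (4 / 3)))
    return rv, bv, tp

V_BB = (math.pi / 2) ** 2 + math.pi - 3
V_QQ = 2.0

def z_tp_rm(returns):
    """Huang-Tauchen relative-jump statistic with tripower quarticity, equation (8)."""
    M = len(returns)
    rv, bv, tp = daily_measures(returns)
    rj = (rv - bv) / rv
    return rj / math.sqrt((V_BB - V_QQ) / M * max(1.0, tp / bv ** 2))

def is_jump_day(returns, level=0.999):
    """One-sided test: flag the day if z exceeds the normal quantile (assumed level)."""
    return z_tp_rm(returns) > NormalDist().inv_cdf(level)
```

Large positive values of the statistic indicate that RV exceeds BV by more than the diffusive component alone would explain, which is why the test is one-sided.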
The model considered by Lee and Mykland is quite similar. They define the underlying stock price evolution as

$$d\log S(t) = \mu(t)\,dt + \sigma(t)\,dW(t) + Y(t)\,dJ(t). \qquad (9)$$

The only difference from the model described above lies in the jump term. Here $J(t)$ is a non-homogeneous Poisson-type counting process and $Y(t)$ is the jump size. The model does not assume a constant jump intensity or independently and identically distributed jump sizes, as the BNS statistic does. The advantage of a more general counting process is that it allows scheduled events, such as earnings announcements, to affect jump intensity. The assumption made by Lee and Mykland is that for any $\varepsilon > 0$,
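$$\sup_{i}\,\sup_{t_i \le u \le t_{i+1}} \left|\mu(u) - \mu(t_i)\right| = O_p\left(\Delta t^{1/2-\varepsilon}\right),$$

$$\sup_{i}\,\sup_{t_i \le u \le t_{i+1}} \left|\sigma(u) - \sigma(t_i)\right| = O_p\left(\Delta t^{1/2-\varepsilon}\right).$$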
They later explain,
we use $O_p$ notation throughout this paper to mean that, for random vectors $\{X_n\}$
and non-negative random variable $\{d_n\}$, $X_n = O_p(d_n)$, if for each $\varepsilon > 0$, there exists a
finite constant $M_\varepsilon$ such that $P(|X_n| > M_\varepsilon d_n) < \varepsilon$ eventually. One can interpret
Assumption 1 as the drift and diffusion coefficients not changing dramatically over a
short time interval…This assumption also satisfies the stochastic volatility plus finite
activity jump semi-martingale class in Barndorff-Nielsen and Shephard (2004).⁵
Aside from this subtle difference in the stock price evolution, Lee and Mykland define the realized variation and bipower variation exactly as in Barndorff-Nielsen and Shephard (2004). They use the bipower variation in their statistic as a means to estimate the instantaneous volatility. The factor π/2 scales the bipower sum so that $\hat\sigma(t_i)$ estimates the local standard deviation of returns, which studentizes their statistic, defined as

$$\mathcal{L}(i) = \frac{\log\left(S(t_i)/S(t_{i-1})\right)}{\hat\sigma(t_i)}, \qquad \hat\sigma^{2}(t_i) = \frac{1}{K-2}\left(\frac{\pi}{2}\right)\sum_{j=i-K+2}^{i-1}\left|\log\frac{S(t_j)}{S(t_{j-1})}\right|\left|\log\frac{S(t_{j-1})}{S(t_{j-2})}\right|. \qquad (10)$$
The window size K determines how far back the estimate of instantaneous volatility looks. Lee and Mykland recommend window sizes of 7, 16, 78, 110, 156, and 270 for sampling intervals of 1 week, 1 day, 1 hour, 30 minutes, 15 minutes, and 5 minutes, respectively. At our 17.5-minute sampling interval, the acceptable values for K as defined in Lee and Mykland range from 75 to 5544. Our choice of window size is K = 100, a decision motivated by Figures 5a-5h. In keeping with the recommendations of Lee and Mykland, the window size is chosen as a small value of K at which the statistic has stabilized.

⁵ Lee and Mykland (2006), p. 6.
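The statistic in equation (10), the window bounds, and a rejection threshold can be sketched as follows. Several choices here are assumptions rather than the report's code: returns form one concatenated array; statistics are computed only once K returns are available, which matches the counts n = M × 1241 − K implied in Section II; and the threshold uses the Gumbel approximation of Lee and Mykland with their scaling constant c set to 1, on the reasoning that the π/2 factor already standardizes the statistic. The admissible range for K follows from Δt = 1/(252M), with K between Δt^(−1/2) and Δt^(−1). Function names are hypothetical.

```python
import math

import numpy as np

def admissible_window(M, days_per_year=252):
    """Bounds on K: Delta t^(-1/2) < K < Delta t^(-1), with Delta t = 1/(252 M)."""
    obs_per_year = days_per_year * M
    return math.ceil(math.sqrt(obs_per_year)), obs_per_year

def lm_statistics(returns, K):
    """Equation (10): L(i) = r_i / sigma_hat(t_i), where sigma_hat^2 is (pi/2) times
    the average of the K - 2 adjacent absolute-return products before return i."""
    r = np.asarray(returns, dtype=float)
    a = np.abs(r)
    prod = a[1:] * a[:-1]               # prod[m] = |r_{m+1}| * |r_m|
    L = np.full(r.size, np.nan)         # first K entries stay NaN (burn-in)
    for i in range(K, r.size):
        window = prod[i - K + 1 : i - 1]    # j = i-K+2, ..., i-1, i.e. K - 2 terms
        sigma2 = (math.pi / 2) * window.sum() / (K - 2)
        L[i] = r[i] / math.sqrt(sigma2)
    return L

def lm_threshold(n, alpha=0.001, c=1.0):
    """Critical value for |L(i)| from the Gumbel limit of the maximum of n statistics."""
    root = math.sqrt(2 * math.log(n))
    C_n = root / c - (math.log(math.pi) + math.log(math.log(n))) / (2 * c * root)
    S_n = 1 / (c * root)
    beta_star = -math.log(-math.log(1 - alpha))
    return C_n + S_n * beta_star
```

A return would then be flagged when `abs(L[i]) > lm_threshold(n)`, with n the number of non-NaN statistics; NaN entries compare as False and so are never flagged. With M = 22, `admissible_window(22)` gives the (75, 5544) range quoted above, and K = 100 lies inside it.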
IV. SEC Filings & Idiosyncratic Jumps
It is documented in the literature, including Law (2007) and Lee and Mykland (2006), that idiosyncratic jumps are related to firm-specific events. To investigate this claim, we analyze the relation between the jump days flagged by the BNS statistic and the SEC filings for Pepsi Co. A chi-square test of independence is performed to test for an association between the filings and the jumps. The results are reported below.
When the test was first performed, the BNS statistics were calculated at a 5-minute instead of the usual 17.5-minute sampling interval. Table 4a lists the matches between the flagged jump days and the SEC filings. A match is defined as an SEC filing on the day before or the day of a flagged jump. The rationale for this definition is twofold. A flagged jump on the day of the SEC filing is the trivial match. We also consider the possibility that the information in the filing reaches the market before the filing itself, in the form of an announcement or of information leaked into the market, which would constitute a possible violation of the strong form of the efficient market hypothesis. Table 4b lists the matches between the flagged jump days and the SEC filings when the statistics are calculated at a 17.5-minute sampling interval. The only match common to both tables is August 1st, 2001, which provides an interesting example discussed in Figure 4a.
The conclusion of our preliminary work is that the null hypothesis of independence between SEC filings and flagged jumps is rejected at the 0.999 level. Table 4c reports the values for the chi-square test of independence. The test supports the notion that jumps in specific stocks are related to idiosyncratic events. Further, we find that the SEC filings matched with flagged jumps at the 5-minute sampling interval are never quarterly or annual filings. Rather, unscheduled filings such as 8-Ks and 13s are matched with flagged jumps. As defined by the SEC, these filings are used to announce major events that shareholders should know about, often relating to mergers and acquisitions, changes in the ownership of a company, or forecasts of future earnings.
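The matching rule and the test of independence can be sketched as below. This is an illustration with hypothetical function names and integer day labels; it assumes that "the day before" means the previous trading day and that each trading day is classified once into a 2 × 2 table (filing on that day or the previous trading day, versus flagged jump day). The p-value uses the closed form for one degree of freedom, Pr(χ²₁ > x) = erfc(√(x/2)), and no continuity correction is applied.

```python
import math

def contingency(trading_days, jump_days, filing_days):
    """2x2 counts: rows = filing on day d or the previous trading day (yes, no),
    columns = jump flagged on day d (yes, no)."""
    jump_days, filing_days = set(jump_days), set(filing_days)
    table = [[0, 0], [0, 0]]
    prev = None
    for d in trading_days:
        filed = d in filing_days or (prev is not None and prev in filing_days)
        jumped = d in jump_days
        table[0 if filed else 1][0 if jumped else 1] += 1
        prev = d
    return table

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square(1)
    return stat, p_value
```

With only a few dozen jump days, some expected counts in the table can be small, in which case the chi-square approximation should be read with caution.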
Future work will include a regression that will detail how certain types of SEC Filings are
related to flagged jump days, and whether SEC Filings can be used to forecast flagged jump
days. To perform this regression it will be helpful to consider a wider variety of stocks than just
Pepsi Co. The intent of this analysis is to better understand the claim that jumps are correlated
with idiosyncratic changes in the value of a company. It will allow us to see a variety of events
that have a significant effect on equity returns and hopefully, better understand what is driving
jumps in specific equity price series.
V. Intraday Patterns in Flagged Jump Arrivals
One of the underlying goals in using the high frequency data is to better understand how financial markets work. To this end, the LM and BNS statistics are implemented in Sections V and VI to detect intraday patterns in the jump components of equity return variance. Our intuition is that jumps in the SPY are more likely to occur near the market's open than at other times of the day. One reason is that information flow, particularly in the form of macroeconomic announcements, is high at the beginning of the trading day.⁶ Both Lee and Mykland (2006) and Law (2007) indicate that jumps in the market portfolio correspond with macroeconomic announcements. Further, Lee and Mykland (2006) conclude that most idiosyncratic jumps occur early in the morning. They explain that scheduled announcements usually correspond with jumps in equity price series, and they also find that the majority of firm-specific jumps correspond with unscheduled news announcements. This is consistent with the analysis in the previous section, which suggests that unscheduled SEC filings are associated with jumps in equity prices. The contribution of this section is that we apply the LM statistic to a five-year rather than a one-year period. Additionally, we exclude the trades from 9:30am to 9:35am, where the majority of the jumps in Lee and Mykland (2006) are flagged. A visual representation of the flagged jump arrivals, together with the number of flagged jumps at the recommended window sizes, can be found in Figures 5a-5h.

⁶ It is well documented that information flow is high in the morning. Chaboud, Chernenko, Howorka, Krishnasami, Liu, and Wright (2004) discuss the effect of macroeconomic announcements on foreign exchange volume and volatility, particularly the announcements made at 8:30am for GDP, Nonfarm Payrolls, Business Inventories, Durable Goods Orders, Housing Starts, Initial Claims, Personal Consumption Expenditure, Personal Income, PPI, Retail Sales, and Trade Balance data, as well as the announcements made at 10:00am for Consumer Confidence, Factory Orders, the ISM Index, and New and Existing Home Sales. Additionally, Ederington and Lee (1993) suggest that a large volume of information is released after the market closes and is subsequently priced into securities at the start of the following trading day.

To obtain Figures 5a-5h, the LM statistic is applied to our data sets at a 17.5-minute sampling interval. Our choice of sampling interval is motivated by Law (2007), whose signature plots indicate that 17.5 minutes is a sufficient interval to stabilize the effect of market microstructure noise. Figures 5a-5f then show the number of flagged jumps at particular times during the trading day over the course of the five-year sample. To obtain these figures, the LM statistic was applied to each data set with varying window sizes, and the flagged jumps were counted within each of the corresponding intraday time intervals. In equation (11), $J_i$ is the number of flagged jumps in the i-th of the 22 intraday time intervals, where 0 < i ≤ 22 indexes the intraday statistics and returns calculated at a 17.5-minute sampling interval across the data set.
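$$J_i = \sum_{k=1}^{\#\text{statistics}} j_{i,k}, \qquad j_{i,k} = \begin{cases} 1 & \text{if the } k\text{th statistic in interval } i \text{ is a flagged jump},\\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$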
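Equation (11) amounts to a column sum once the flags are arranged by day and intraday interval. A minimal sketch, assuming a boolean NumPy array of shape (days, 22) in which burn-in entries are simply False (a flat array of flags can be arranged this way with `flags.reshape(-1, 22)`); `interval_end` is a hypothetical helper that maps interval i to its clock time and shows, for example, that the third interval ends at 10:27:30.

```python
import numpy as np

OPEN_MIN = 9 * 60 + 35    # first grid price at 9:35 am, in minutes after midnight
INTERVAL_MIN = 17.5       # sampling interval in minutes

def flagged_jumps_by_interval(flags):
    """flags: bool array (n_days, 22), True where the statistic flagged a jump.
    Returns J_i for the 22 intervals as a length-22 integer array (equation 11)."""
    return np.asarray(flags, dtype=bool).sum(axis=0)

def interval_end(i):
    """Clock time h:mm:ss at the end of intraday interval i (1-based)."""
    total_sec = int(round((OPEN_MIN + INTERVAL_MIN * i) * 60))
    h, rem = divmod(total_sec, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"
```

Python indexes from zero, so `J[0]` is the count for the interval ending at `interval_end(1)`.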
The resulting pattern is striking. Across all window sizes, the LM statistic flags significantly more returns in the morning than later in the day. Notably, the peak is centered at approximately 10:00am, when several macroeconomic announcements, including Consumer Confidence, Factory Orders, the ISM Index, and New and Existing Home Sales, are released. The number of flagged jumps then increases again later in the trading day, which could be a response to the FOMC announcements released at approximately 2:15pm. The result for the SPY is a “U” shape reminiscent of the well-documented patterns in intraday volatility. Interestingly, the individual stocks do not exhibit an increase in the number of flagged jumps in the afternoon. Our analysis suggests that the “U” shaped pattern in flagged jump arrivals is specific to the market index.
VI. Intraday Patterns in Volatility
To continue the investigation of flagged jump arrivals, our analysis is extended to see whether a similar pattern persists in the jump component of equity return variance. The well-documented “U” shaped pattern in intraday volatility dates back to Wood, McInish, and Ord (1985), the first paper to discuss patterns in both volatility and volume for intraday transaction data from the NYSE. There is also an extensive literature on patterns in foreign exchange volatility across different markets; a notable example is Andersen and Bollerslev (1998), who detect similar patterns in foreign exchange volatility. While their paper presents several ideas related to our future work, the relevance of the background literature on intraday trading patterns is that, a priori, it is not obvious that the jump component of equity return variance will follow the U shaped pattern documented in earlier papers. Our purpose in separating the diffusive and jump components of the variance is to identify an important difference in the evolution of stock prices. Our intuition that the jump component may follow a pattern similar to the well-documented “U” shape comes from Section V, where we find a striking pattern in the number of flagged jumps throughout the trading day. Namely, the jumps flagged by the LM statistic loosely adhere to a U shaped pattern, with significantly more jumps flagged in the morning than at other times of the trading day. In this section we conclude that a similar pattern persists in the jump component of equity return variance.
Figures 6a – 6d are the starting point for our analysis. They plot the averages of the
realized variance and bipower variation for the SPY data. To calculate the average of the realized
variance and the bipower variation throughout the trading day we use a procedure similar to the
procedure implemented in Section V. Here the averages for realized variance and bipower
variation are calculated as,
$$rv_i = \frac{1}{1241}\sum_{k=1}^{\#\text{returns}} r_{k,i}^{2}, \qquad bv_i = \frac{1}{1241}\sum_{k=2}^{\#\text{returns}} |r_{k,i}|\,|r_{k-1,i}|.$$
Again the subscript i refers to the time within the trading day. It can take on values from 1 to 22
because our sampling interval is 17.5 minutes excluding the first 5 minutes the market is open.
Figure 6e is then included for comparison. The only difference from Figure 6a is that the
definition of bipower variation is changed slightly. Specifically, the bipower variation is
redefined as,
1
bvi   
2
#ofreturns

k 2
1
 
1241
 2
rk ,i rk 1,i
#ofreturns
rk 1,i rk ,i
k 2
1241

Both plots indicate the same pattern. In particular, they suggest that the diffusive component comprises only 45% of total equity return variance in the first 17.5 minutes of the trading day. The diffusive component then increases significantly, so that by 10:27am it comprises 72% of the total variance. Throughout the rest of the trading day the share of the diffusive component generally declines, except in the last 17.5 minutes, where it reaches its maximum value of 77%. If we define the jump component of equity return variance as the difference between realized variance and bipower variation, as suggested in Barndorff-Nielsen and Shephard (2004), Figure 6b shows that the jump component also follows a U shaped pattern: it is highest in the morning and increases again in the late afternoon, except that it falls in the last 17.5 minutes of the trading day.
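The per-interval averages, the diffusive share, and the jump component can be computed along the following lines. This sketch makes two assumptions that go beyond the formulas in the text: returns are arranged as a (1241, 22) array of days by intraday intervals, and each bipower term pairs a return with the preceding return in the same day, scaled by π/2 as in equation (4) so that it sits on the same scale as realized variance. Under this same-day pairing the first interval has no bipower value, whereas the report's averages do assign one to the first 17.5 minutes, so the sketch shows the mechanics rather than reproducing Figures 6a-6h. The function name is hypothetical.

```python
import math

import numpy as np

def intraday_profiles(R):
    """R: array (n_days, n_intervals) of intraday returns.
    Returns (rv, bv, share, jump), each a per-interval average across days."""
    R = np.asarray(R, dtype=float)
    n_days, n_intervals = R.shape
    rv = (R ** 2).sum(axis=0) / n_days                 # average realized variance
    A = np.abs(R)
    bv = np.full(n_intervals, np.nan)                  # interval 1 has no same-day predecessor
    bv[1:] = (math.pi / 2) * (A[:, 1:] * A[:, :-1]).sum(axis=0) / n_days
    share = bv / rv                                    # diffusive share of realized variance
    jump = rv - bv                                     # jump component as RV minus BV
    return rv, bv, share, jump
```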
The final set of figures for the SPY makes a small addition to Figure 6a. In particular, Figures 6g & 6h include the realized variance averaged across the trading day for all of the days flagged as jump days by the BNS statistic. The difference from the previous figures is that the bipower variation makes up an even smaller percentage of the realized variance on flagged jump days. As Figure 6h shows, the average bipower variation on flagged jump days is only 35% of the realized variance in the first 17.5 minutes of the trading day. It is also interesting that the realized variance on flagged jump days is very erratic; at times the realized variance averaged over jump days even dips below the bipower variation. An explanation for this peculiarity is left for future work. What is striking is that the jump component comprises a high percentage of realized variance during the first 17.5 minutes of the trading day and appears to loosely follow a U shaped pattern.
Figures are also included for some of the individual stocks. Their primary difference from the market is that bipower variation makes up a comparatively lower proportion of realized variance at the start of the trading day. This supports our conclusion in Section V and the conclusion of Lee and Mykland (2006) that the vast majority of idiosyncratic jumps happen early in the trading day. Furthermore, the conclusions of this section pose several interesting questions for future work. First, why does the jump component follow a pattern similar to the well-documented U shaped pattern in equity return variance? Jumps are rare events: Huang and Tauchen (2005) conclude that jumps are expected to make up only 4.5 to 7% of total return variance. As a result, it seems implausible that jumps in equity prices are driving the U shaped pattern. Perhaps there are inherent characteristics of the NYSE that drive the U shaped pattern in both the diffusive and jump components of equity return variance, but if so, what are these characteristics and how do they drive volatility? Another interesting question concerns the proportion of realized variance made up by bipower variation. Figure 6d shows that bipower variation makes up a small portion of the realized variance during the first 17.5 minutes of the trading day and a large portion during the last 17.5 minutes. Does this suggest that trading at the start of the day is information driven and more related to the jump component, while trading at the end of the day is more related to the diffusive component of equity return variance, including trader activity to close positions and prepare for the next day?
VII. Conclusions and Future Work
The lab report concludes at an interesting point in our research. After beginning the
semester with no exposure to the literature on volatility modeling and no previous knowledge of
the literature on jump components in asset prices, we arrive at several questions that could
constitute the topic of a senior honors thesis. In particular, how strongly are SEC Filings related
to idiosyncratic jumps in stock prices? Can they be used to forecast or explain flagged jumps?
How robust are the patterns outlined in Sections V and VI of this report? What can we learn from
the intraday patterns in jump components of equity return variance? How do they relate to the
daily operations of financial markets?
To answer any of these questions, the analysis in this report will need to be developed
further. It will be helpful to apply some of the same procedures to all 40 of the stocks considered
in Law (2007). Of course, it will also be necessary to consult the literature to find more robust
methods to support the claims made in this report. Papers related to intraday patterns in volatility
may provide helpful methods of investigation and suggestions for future analysis. One common
thread in that literature is that patterns in volatility are often discussed alongside patterns in
volume, so analyzing volume with the high frequency data may be an interesting extension of our
current work. This work will need to be performed in preparation for the fall semester, when the
focus will be to write a paper that properly addresses the questions outlined above.
VIII. Tables
Flagged Jumps at Recommended Window Sizes for SPY Data*
Sampling Frequency   Window Size   Jumps
5 minutes            270           1193
7 minutes            250           819
11 minutes           200           522
17.5 minutes         150           306
27.5 minutes         110           212
38.5 minutes         90            135
55 minutes           78            120
77 minutes           75            82
Flagged Jumps at Recommended Window Sizes for PEP Data*
Sampling Frequency   Window Size   Jumps
5 minutes            270           1285
7 minutes            250           875
11 minutes           200           549
17.5 minutes         150           381
27.5 minutes         110           230
38.5 minutes         90            169
55 minutes           78            137
77 minutes           75            89
Tables 2a & 2b: The tables report the number of flagged jumps at the window sizes recommended by Lee and
Mykland. * indicates that approximations were made for window sizes that were not included in their paper.
Flagged Jumps at Recommended Window Sizes for SPY Data*
Sampling Frequency   Window Size   Significance Level   Jumps
5 minutes            270           0.99988              617
7 minutes            250           0.999836             436
11 minutes           200           0.999753             297
17.5 minutes         150           0.999624             214
27.5 minutes         110           0.999435             158
38.5 minutes         90            0.999236             120
55 minutes           78            0.998948             120
77 minutes           75            0.9985758            82
Tables 2a & 2b: The table reports the number of flagged jumps at the window sizes recommended by Lee and
Mykland when corrected for Type I errors. * indicates that approximations were made for window sizes that
were not included in their paper.
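The jump counts above come from the Lee and Mykland (LM) test. A minimal Python sketch of that test follows, assuming the form of the statistic as we understand it from Lee and Mykland (2006): each return is divided by a local bipower estimate of volatility built from the preceding K - 1 returns, and the standardized return is compared with an extreme-value (Gumbel) critical value. The function name `lee_mykland_flags`, taking n to be the length of the return series, and the default α = 0.01 are illustrative assumptions, not the exact settings used to build the tables.

```python
import math


def lee_mykland_flags(returns, window, alpha=0.01):
    """Return indices of returns flagged as jumps by a Lee-Mykland style test.

    Hypothetical sketch: `returns` holds consecutive intraday log returns at
    one sampling frequency, `window` is the window size K (K >= 3), and n in
    the critical value is assumed to be len(returns).
    """
    n = len(returns)
    c = math.sqrt(2.0 / math.pi)  # E|Z| for a standard normal Z
    root = math.sqrt(2.0 * math.log(n))
    # Location and scale of the maximum of |L| under the no-jump null
    c_n = root / c - (math.log(math.pi) + math.log(math.log(n))) / (2.0 * c * root)
    s_n = 1.0 / (c * root)
    beta_star = -math.log(-math.log(1.0 - alpha))  # Gumbel critical value

    flags = []
    for i in range(window - 1, n):
        # Local variance: mean of |r_j||r_{j-1}| over the K - 2 pairs before r_i
        local_var = sum(
            abs(returns[j]) * abs(returns[j - 1]) for j in range(i - window + 2, i)
        ) / (window - 2)
        if local_var <= 0.0:
            continue  # all-zero window: the statistic is undefined
        stat = returns[i] / math.sqrt(local_var)
        if (abs(stat) - c_n) / s_n > beta_star:
            flags.append(i)
    return flags
```

Running such a function over a grid of window sizes and sampling frequencies is one way to produce counts like those in Tables 2a & 2b; if the significance levels listed above are read as 1 - α, the corrected counts would correspond to setting `alpha` accordingly.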
SEC Filing / Jump Day Matches
Year   Month   Day   Type of Filing
2001   3       23    Def 14A
2001   7       18    8K
2001   8       1     8K
2001   8       1     8K
2002   2       12    8K, 13G/A
2002   10      17    8K/A
2003   10      28    8K/A
2003   11      5     8K
2003   11      5     8K
Table 4a: The table above lists the matches between flagged jump days at a 5 minute sampling interval for the
BNS statistic and SEC filings of PEP. Days were matched if an SEC filing was filed on the day of or the day before
a jump. A day may be listed twice if there was a filing both on the day of and on the day before a flagged jump. It
is interesting to note that, apart from one Def 14A proxy statement, the only types of filings that matched were 8Ks
or 13s. No quarterly or annual filings were matched, which suggests that the SEC filings related to flagged jumps
were unexpected.
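The matching rule in the Table 4a caption (a filing on the day of or the day before a flagged jump, with a day listed twice when both occur) can be written as a short Python sketch. It assumes that jump days and filing dates are `datetime.date` objects, that filings are stored in a dictionary from filing date to a list of form types, and that "the day before" means the previous calendar day; a trading-day calendar may be the better choice. The function name and the example inputs are hypothetical and are not the PEP data.

```python
from datetime import date, timedelta


def match_filings_to_jumps(jump_days, filings):
    """List (jump day, lag in days, form types) for each filing/jump match.

    Hypothetical sketch: `jump_days` is an iterable of datetime.date objects
    and `filings` maps a filing date to a list of form types. "The day
    before" is assumed to be the previous calendar day.
    """
    matches = []
    for day in sorted(jump_days):
        for lag in (0, 1):  # filing on the jump day, then on the day before
            filed = day - timedelta(days=lag)
            if filed in filings:
                # A jump day appears twice when both lags match
                matches.append((day, lag, ", ".join(filings[filed])))
    return matches


# Made-up inputs for illustration only
jumps = [date(2004, 3, 10), date(2004, 6, 15), date(2004, 9, 1)]
filings = {
    date(2004, 3, 9): ["8K"],
    date(2004, 3, 10): ["8K"],
    date(2004, 6, 15): ["8K", "13G/A"],
}
for row in match_filings_to_jumps(jumps, filings):
    print(row)
```

The number of rows returned is the "observed matches" count used in the chi-square test, so the same function could feed that calculation directly.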
SEC Filing / Jump Day Matches
Year   Month   Day
2001   7       26
2001   7       26
2001   7       27
2001   8       1
2001   8       1
2001   8       7
2001   9       17
2003   1       30
2003   4       16
2003   7       24
2004   2       6
2004   11      29
2005   2       28
2005   9       29
Table 4b: The table above lists the matches between flagged jump days at a 17.5 minute sampling interval for the
BNS statistic and SEC filings of PEP. The only day common to both Table 4a and Table 4b is August 1st, 2001.
Finding the types of filings associated with these match dates is left for future work, which should also include
applying this analysis to a wider variety of stocks.
Chi-Square Test of Independence
5 minute sampling interval
  Expected matches:     (28/1241)(84/1241)(1241) = 1.90
  Observed matches:     9
  χ²:                   Σ (O_i - E_i)²/E_i = (9 - 1.9)²/1.9 = 26.5
  Degrees of freedom:   3
17.5 minute sampling interval
  Expected matches:     (65/1241)(84/1241)(1241) = 4.40
  Observed matches:     14
  χ²:                   Σ (O_i - E_i)²/E_i = (14 - 4.4)²/4.4 = 20.9
  Degrees of freedom:   3
Table 4c: The Chi-Square Test of Independence tests for independence between the jumps flagged by the BNS
statistic and SEC filings for Pepsi Co. Expected matches are calculated by assuming that jumps and filings are
independent: the sample frequency of flagged jump days is multiplied by the sample frequency of SEC filings and
then by the number of days in the sample. The value of χ² is statistically significant at the .999 level for both
sampling intervals, which rejects the null hypothesis that SEC filings and flagged jumps are independent.
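The arithmetic in Table 4c can be reproduced with the short Python sketch below. The function names are hypothetical; the counts (28 and 65 flagged jump days, 84 filings, 1241 sample days, 9 and 14 observed matches) are taken from the table. The tail probability uses the three degrees of freedom reported in the table through the closed form P(χ²₃ > x) = erfc(√(x/2)) + √(2x/π)·exp(-x/2). With the unrounded expected count of about 1.895, the 5 minute statistic is about 26.6 rather than 26.5; the difference comes only from rounding. For a given statistic, the one-degree-of-freedom tail used in a conventional 2×2 test is smaller than the three-degree tail, so the rejection at the .999 level does not hinge on that choice.

```python
import math


def expected_matches(jump_days, filings, n_days):
    # Under independence: P(jump day) * P(filing) * number of days
    return (jump_days / n_days) * (filings / n_days) * n_days


def cell_statistic(observed, expected):
    # (O - E)^2 / E for the single "match" cell, as written in Table 4c
    return (observed - expected) ** 2 / expected


def chi2_sf_df3(x):
    # Exact upper-tail probability P(X > x) for a chi-square variable with 3 df
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Counts from Table 4c: 84 filings over 1241 sample days
for label, jump_days, observed in [("5 minute", 28, 9), ("17.5 minute", 65, 14)]:
    e = expected_matches(jump_days, 84, 1241)
    stat = cell_statistic(observed, e)
    print(f"{label}: expected = {e:.2f}, chi2 = {stat:.1f}, p = {chi2_sf_df3(stat):.1e}")
```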
IX. Figures
Figures 1a & 1b: The figures above plot the SPY level and returns throughout the sample. The level is sampled
twice per day, while returns are sampled daily. It is easy to confirm on finance.yahoo.com or Google Finance that
our high frequency data, when sampled at daily intervals, has an identical appearance to the data reported online.
Figures 2a & 2b: The plots above show the number of jumps flagged by the LM statistic across different sampling
frequencies and window sizes for both the SPY and the PEP data. The number of flagged jumps for both the SPY
and PEP at the window sizes recommended by LM can be found in Tables 2a & 2b.
Figure 2c: The figure above shows the 30 second price series for Pepsi Co. on April 26th, 2005. The arrow points
to a trade that seems inconsistent with the other prices. However, it is interesting to note that the volume on that
trade is 181,700 shares, far larger than the five-year average of 4,869 shares per transaction. A plausible
explanation for the low price associated with this trade is that an investor was willing to sell shares at the lowest
price of the day in order to unwind a large position.
Figures 2d & 2e: These plots are discussed again in Section VI: Intraday Patterns in Volatility. With respect to
Section II: Data, it is notable that both plots are created by exactly the same procedure applied to the entire data set
at a 17.5 minute sampling interval and differ only in two data points. These outliers are removed for the work in
Section VI.
Figures 3a & 3b: The plots above show the BNS statistic across the sample. The statistic recommended by Huang
and Tauchen (2005) is computed at the 17.5 minute sampling frequency recommended by Law (2007). The BNS
statistic flags 37 days for the SPY and 65 days for PEP as statistically significant, rejecting the null hypothesis that
there were no jumps on those days.
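For reference, a Python sketch of one day's BNS jump statistic follows. It assumes the ratio form with the maximum adjustment and tripower quarticity, which we take to be the variant recommended by Huang and Tauchen (2005); the function and constant names are our own, and the test day is synthetic. Under this assumption a day would be flagged when the statistic exceeds a one-sided standard normal critical value (for example 3.09 at the .999 level).

```python
import math

MU1 = math.sqrt(2.0 / math.pi)  # E|Z| for a standard normal Z
MU43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(1 / 2)  # E|Z|^(4/3)


def bns_ratio_statistic(r):
    """Ratio, max-adjusted jump statistic with tripower quarticity for one day.

    Hypothetical sketch: `r` is the list of M intraday returns for a single day
    (M >= 3). Large positive values point to a jump in that day's returns.
    """
    m = len(r)
    a = [abs(x) for x in r]
    rv = sum(x * x for x in r)  # realized variance
    # Bipower variation: robust to jumps, estimates the diffusive part
    bv = MU1 ** -2 * (m / (m - 1)) * sum(a[j - 1] * a[j] for j in range(1, m))
    # Tripower quarticity: jump-robust estimate of integrated quarticity
    tp = m * MU43 ** -3 * (m / (m - 2)) * sum(
        (a[j - 2] * a[j - 1] * a[j]) ** (4 / 3) for j in range(2, m)
    )
    theta = MU1 ** -4 + 2 * MU1 ** -2 - 5  # equals (pi/2)^2 + pi - 5
    relative_jump = (rv - bv) / rv
    return relative_jump / math.sqrt(theta / m * max(1.0, tp / bv ** 2))
```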
Figure 4a: The plot above shows the 5 minute price series for Pepsi Co. on August 1st, 2001. It is the only flagged
jump that matches an SEC filing at both the 5 minute and 17.5 minute sampling intervals, and it provides an ideal
example of a match between a flagged jump and an SEC filing. The filing was an 8K announcing Pepsi’s
unconditional clearance from the Federal Trade Commission to merge with the Quaker Oats Company. Plans to
merge had previously been announced on December 4, 2000, and as a result the market was prepared to adjust the
price of PEP quickly once the merger became official.
Figures 5a & 5b: The figures above show the number of flagged jumps for the SPY data at different times of the
trading day. The 2D plot has the same curves as the 3D plot; it simply eliminates the window-size axis by
superimposing all the curves in two dimensions. Both plots confirm that the statistic stabilizes for sufficiently large
window sizes. Although Lee and Mykland recommend a window size of approximately 150 for our sampling
interval of 17.5 minutes, the statistic seems to stabilize with a window size as small as 100. The average number of
flagged jumps from k = 100 to k = 150 is 311, while the number of flagged jumps when k = 100 is 322. Also
interesting is the “U” shaped pattern in jump arrivals. Slightly after 10:00am there is a noticeable peak in the number
of flagged jumps. During the middle of the trading day there are visibly fewer jumps. Starting at about 2:15pm, the
number of flagged jumps increases slightly from the midday lows.
Figures 5c, 5d, 5e, 5f, 5g, & 5h: The figures above contain the same plots as Figures 5a & 5b for the PEP, KO,
and BMY data. The number of jumps flagged by the LM statistic stabilizes in a similar manner. For PEP the
average number of flagged jumps from k = 100 to k = 150 is 382 and the number of flagged jumps at k = 100 is
391. The corresponding values for KO and BMY are 302 and 440 for the averages and 309 and 449 for the number
of flagged jumps at k = 100, respectively. The only visual difference in jump arrivals is that these plots seem more
peaked at the start of the day and do not appear to increase as much at the end of the day; that is to say, the
individual stocks examined here do not exhibit a “U” shaped pattern in jump arrivals.
Figures 6a – 6d: All plots are from the SPY data. In the top left, the average realized variance is plotted against the
average bipower variation throughout the trading day. The figure in the top right shows the difference between the
realized variance and the bipower variation. The bottom figures show the bipower variation as a percentage of the
realized variance and the ratio of the realized variance to the bipower variation.
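The intraday profiles in Figures 6a – 6d average each interval's contribution to realized and bipower variation across days. A Python sketch of that averaging is given below under stated assumptions: returns are arranged as one equal-length list per day, interval j contributes r_j² to realized variance, and its bipower contribution pairs |r_j| with a neighboring interval. Whether Section VI pairs with the following ("lead") or preceding ("lag") interval is an assumption here, so the sketch exposes it as a parameter; the function name is hypothetical.

```python
import math

MU1 = math.sqrt(2.0 / math.pi)  # E|Z| for a standard normal Z


def intraday_profiles(days, pair="lead"):
    """Average RV and BV contributions for each intraday interval across days.

    Hypothetical sketch: `days` is a list of equal-length lists of intraday
    returns. Interval j contributes r_j^2 to RV and mu1^-2 |r_j||r_k| to BV,
    where k = j + 1 ("lead") or k = j - 1 ("lag").
    """
    if pair not in ("lead", "lag"):
        raise ValueError("pair must be 'lead' or 'lag'")
    n, m = len(days), len(days[0])
    rv = [sum(d[j] ** 2 for d in days) / n for j in range(m)]
    bv = []
    for j in range(m):
        k = j + 1 if pair == "lead" else j - 1
        if 0 <= k < m:
            bv.append(MU1 ** -2 * sum(abs(d[j]) * abs(d[k]) for d in days) / n)
        else:
            bv.append(float("nan"))  # no neighboring interval within the day
    # BV as a share of RV (as in the bottom-left panel); RV/BV is its reciprocal
    share = [b / v if v > 0 else float("nan") for b, v in zip(bv, rv)]
    return rv, bv, share
```

Passing only the days flagged by the BNS statistic would give jump-day profiles of the kind plotted in Figures 6g & 6h.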
Figures 6e & 6f: These plots are also constructed from the SPY data. The only difference from Figures 6a – 6d is
that the bipower variation is defined in the slightly different manner detailed in Section VI. They are helpful because
they appear almost identical to Figures 6a & 6b, which helps to confirm the patterns we observe in the jump
component of equity return variance.
Figures 6g & 6h: Here similar plots are made with a minor addition. The realized variance and bipower variation
included in Figure 6g are identical to the RV and BV in Figure 6a. The addition is the average of the realized
variance for the SPY data on days flagged as jump days by the BNS statistic. In Figure 6h one can see that the
proportion of equity return variance accounted for by the diffusive bipower variation is even lower on jump days. At
the start of the day, the bipower variation accounts for only 35% of the equity return variance on average. The jump
component on flagged jump days also appears to be more erratic. At certain points in the trading day it actually falls
below the bipower variation when averaged across the sample. An explanation for this peculiarity is left for our
future work.
Figures 6i & 6j: Similar plots are included for the PEP data. Here the bipower variation is defined as in Figure 6a.
Figures 6k & 6l: Similar plots are included for the BMY data. Here the bipower variation is defined as in Figure 6a.
Figures 6m & 6n: Similar plots are included for the KO data. Here the bipower variation is defined as in Figure 6a.
X. References
Andersen, T. G., T. Bollerslev, and F. X. Diebold (2004). Some Like it Smooth, and Some Like
it Rough: Disentangling Continuous and Jump Components in Measuring, Modeling, and
Forecasting Asset Return Volatility. Working paper, Duke University.
Andersen, T. G. and T. Bollerslev (1998). Deutsche Mark-Dollar Volatility: Intraday Activity
Patterns, Macroeconomic Announcements, and Longer Run Dependencies. Journal of Finance 53,
219-265.
Barndorff-Nielsen, O. and N. Shephard (2004). Power and Bipower Variation with Stochastic
Volatility and Jumps. Journal of Financial Econometrics 2, 1-37.
Barndorff-Nielsen, O. and N. Shephard (2006b). Impact of Jumps on Returns and Realized
Variances: Econometric Analysis of Time-Deformed Levy Processes. Journal of Econometrics
131, 217-252.
Black, F. (1986). Noise. Journal of Finance 41, 529-543.
Eraker, B., M. Johannes, and N. Polson (2003). The Impact of Jumps in Volatility and Returns.
Journal of Finance 58, 1269-1300.
Huang, X. and G. Tauchen (2005). The Relative Contribution of Jumps to Total Price Variance.
Journal of Financial Econometrics 3, 456-499.
Lee, S. S. and P. A. Mykland (2006). Jumps in Financial Markets: A New Nonparametric Test and
Jump Dynamics. Working paper.
Wood, R. A., T. H. McInish, and J. K. Ord (1985). An Investigation of Transaction Data for
NYSE Stocks. Journal of Finance 40, 723-739.