Investigation of NYSE High Frequency Financial Data for Intraday Patterns in Jump Components of Equity Returns

Peter Van Tassel [1]

Final Report Submitted for Economics 201FS: Research Seminar and Lab on High Frequency Financial Data Analysis for Duke Economics Juniors

Duke University
Durham, North Carolina
2 May 2007

Academic Honesty Pledge

1. I will not lie, cheat, or steal in my academic endeavors, nor will I accept the actions of those who do.
2. I will conduct myself responsibly and honorably in all my activities as a Duke student.
3. The assignment is in compliance with the Duke Community Standard as expressed on pp. 5-7 of “Academic Integrity at Duke: A Guide for Teachers and Undergraduates.”

Peter Van Tassel
Pledge

[1] peter.vantassel@duke.edu. Box 93481, Durham NC 27708

I. Introduction

Financial markets are complex systems in which agents interact to determine the prices of different assets. The goal of our research is to analyze high frequency financial data to improve our knowledge of how financial markets operate. In particular, we consider the literature on jump components in asset prices as a starting point for this report.

Our motivation is both practical and intellectual. From a practical standpoint, there is something awry in the traditional methods for modeling stock price evolution. The well documented smiles in implied volatility from the Black-Scholes option pricing formula are one of many indications that the market is valuing volatility in a different manner than the rudimentary academic models. Papers in the literature on jump components in asset prices, such as Andersen, Bollerslev, and Diebold (2004), Eraker (2003), and Huang and Tauchen (2005), all have practical implications for modeling volatility, including important results for derivative valuation, risk management, and asset allocation. These concerns are of utmost importance for a trader or portfolio manager.
From an intellectual standpoint, the recently available high frequency financial data provide new opportunities for frontier research in econometrics. In some instances this will allow us to investigate previous literature in finance, providing a better picture as to which ideas are robust to the tick by tick data at the New York Stock Exchange. In other circumstances the data will allow for new types of investigation that were never previously considered. Undoubtedly, the ability to zoom in on financial markets will add a new level of complexity to research and a variety of intellectual challenges before we can decipher the goings-on of the real world.

In this lab report the high frequency financial data will be used to investigate intraday patterns in jump components of equity return variance. The purpose is to improve our understanding of the evolution of heavily traded stocks on the NYSE. Learning more about what drives intraday patterns in jump components will provide practical implications for trading and portfolio management, and it raises several interesting questions at the end of the report. Our focus will be on the S&P 500. Figures 1a & 1b present the level prices and returns over our sample period.

The rest of the report proceeds as follows: Section II will describe the data, Section III will describe the statistics considered, Section IV will present some preliminary work related to SEC Filings and idiosyncratic jumps, including an interesting example, Section V will discuss patterns in flagged jump arrival, Section VI will continue the investigation of intraday patterns in jump components of equity return variance, and Section VII will present our ideas for future work. All tables, figures, and references can be found at the back of the report.

II. Data

High frequency financial data from the NYSE are analyzed for the purpose of this report.
We will focus on the S&P 500 but will also consider Pepsi Co., the Coca-Cola Company, and Bristol Myers Squibb Co. The specific stocks that are considered were assigned as a starting point for research in Econ 201FS. Moving forward the analysis will be extended to an additional 37 stocks and an aggregate portfolio of all 40 stocks included in Law (2007). In this report the SPY data set will be used as a proxy for the market portfolio. The data sets are obtained from the Trade and Quote Database (TAQ) which is available via Wharton Research Data Services (WRDS). A more comprehensive discussion in regard to the data and how the data sets were compiled can be found in Law (2007). [2]

[2] First and foremost, it is necessary to thank Tzuo Hann Law for sharing the data sets he compiled for his Honors Thesis. Without his help and willingness to collaborate the writing of this report would have been impossible. A more detailed discussion of the method for obtaining the data sets can be found in his paper The Elusiveness of Systematic Jumps.

Our selection of stocks is motivated by trading volume. In order for the statistics used in this report to behave properly it is necessary that the stocks considered be heavily traded. The stocks assigned in class and the stocks included in Law (2007) are 40 of the most actively traded stocks on the NYSE as defined by their 10-day trading volume. For each stock there is a data set that includes all trades from January 1, 2001 through December 31, 2005. The time period was selected for two reasons. During the late 1990s trading frequency increased significantly and by 2001 the volume was high enough to justify the use of the statistics. Additionally, by 2001 almost all of the stocks were converted from fractional to decimal trading which helped to reduce some of the market-microstructure noise.
To convert the TAQ data into a 30 second price series an adapted version of the previous tick method from Dacorogna, Gencay, Muller, Olsen, and Pictet (2001) is applied. It excludes the first five minutes of the trading day in order to ensure uniformity of trading and information arrival. The resulting price series includes 771 observations from 9:35am to 4:00pm across 1241 days. The structure of the data set is advantageous because it allows us to easily implement the statistics across different sampling intervals. In this report the sampling frequency will be 17.5 minutes unless otherwise stated.

Our primary concern in selecting a sampling frequency is the effect of market microstructure noise. The literature on market microstructure noise (MMN) dates back to Black (1976) and discusses a variety of sources that bias prices when sampling at a high frequency, including trading mechanisms and discrete prices. One approach to account for this problem is proposed in Andersen, Bollerslev, Diebold, and Labys (2000). They suggest the creation of signature plots of the realized variance across different sampling intervals to allow for the visual selection of a sampling interval where the MMN seems to have stabilized. The selection of 17.5 minutes is made because it seems to be the highest sampling frequency that is relatively unaffected by the MMN in the signature plots included in Law (2007). The result for our data set is that each stock has 22 returns per day across 1241 days.

Figures 2a-2b and Tables 2a-2b are also related to the discussion of sampling interval. The figures portray the number of flagged jumps by the LM statistic at a .999 significance level across different sampling intervals and window sizes. The tables include the number of flagged jumps using the recommended window sizes for instantaneous volatility as defined by Lee and Mykland. Visually the statistic seems to stabilize as the window size increases for each sampling interval.
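As a rough check on the sampling arithmetic in this section, the sketch below subsamples one day of a 30 second price series at the intervals listed in Tables 2a & 2b and counts the returns per day. The function name `subsample_log_returns` and the synthetic price path are illustrative assumptions, not the code used for this report. The final lines assume that the number of LM statistics equals returns per day times 1241 days minus the window size from Table 2a; under that assumption the counts of 27,152 and 8,609 statistics quoted in this section are reproduced.

```python
import math

DAYS = 1241            # trading days in the sample, as stated in the text
PRICES_PER_DAY = 771   # 30 second prices from 9:35am to 4:00pm

def subsample_log_returns(prices, step):
    """Log returns between every `step`-th price of one day's 30 second series."""
    sampled = prices[::step]
    return [math.log(b / a) for a, b in zip(sampled, sampled[1:])]

# Synthetic, deterministic price path for one day (made up for illustration).
prices = [100.0 * math.exp(0.0005 * math.sin(0.05 * k)) for k in range(PRICES_PER_DAY)]

# Sampling interval in minutes -> window size, copied from Table 2a.
windows = {5: 270, 7: 250, 11: 200, 17.5: 150, 27.5: 110, 38.5: 90, 55: 78, 77: 75}

returns_per_day = {}
for minutes in windows:
    step = int(minutes * 2)  # number of 30 second steps per return
    returns_per_day[minutes] = len(subsample_log_returns(prices, step))

# Assumed relation: statistics = returns per day * days - window size.
n_stats = {m: returns_per_day[m] * DAYS - windows[m] for m in windows}

print(returns_per_day[17.5])        # 22
print(n_stats[17.5], n_stats[55])   # 27152 8609
```

Every interval in the tables divides the 770 thirty second steps of the trading day evenly, which is why the same 30 second series can serve all sampling intervals.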
However, Tables 2a & 2b suggest that the number of flagged jumps does not stabilize across different sampling intervals. At 17.5 minutes there are 306 flagged jumps for the SPY data whereas at 55 minutes there are 120 flagged jumps. One explanation might be Type I errors. At different sampling intervals a significantly different number of statistics is calculated. For example, a sampling interval of 17.5 minutes yields 27,152 statistics whereas a sampling interval of 55 minutes only yields 8,609 statistics over the same data set. The null hypothesis of no jump will likely be incorrectly rejected more frequently over 27,152 statistics than over 8,609 statistics. To account for this discrepancy Table 2c includes the number of flagged jumps at each sampling interval with varying significance levels. [3] Although it does not convincingly suggest that the LM statistic has stabilized, it does provide more reassuring evidence that 17.5 minutes is an appropriate sampling interval than Tables 2a & 2b.

[3] The different significance levels were calculated in the following manner. For the entire sample we decide Pr(Type I Error) = .001. Each flagged jump is assumed to be independently flagged correctly or inadvertently. Using a binomial distribution we argue Pr(No error) = .999 = Pr(k = 0), the binomial probability of k = 0 errors among the n statistics. The value of n is known to be the number of statistics in the sample and the value of alpha is then determined for each sampling interval to be used as a significance level in Table 2c.

A final consideration needs to be made for the errors in the dataset. It is important to realize that the TAQ database is a human construction that relies on manual entry of the data. As such, it is inevitably subject to human error and needs to be highly scrutinized. Errors in the data set are removed in two ways. First, a simple algorithm sets suspect prices equal to zero when thirty second returns are at least 1.5% in opposite directions.
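A minimal sketch of such a filter is given below. It is one possible reading of the rule rather than the algorithm actually used: we assume simple (not log) returns, and we assume that a price is suspect when the return into it and the return out of it are both at least 1.5% in magnitude and have opposite signs. The function name `flag_bounce_backs` and the example prices are hypothetical.

```python
def flag_bounce_backs(prices, threshold=0.015):
    """Set a price to zero when the returns into and out of it are both
    at least `threshold` in magnitude and point in opposite directions."""
    cleaned = list(prices)
    for k in range(1, len(prices) - 1):
        r_in = prices[k] / prices[k - 1] - 1.0    # return arriving at price k
        r_out = prices[k + 1] / prices[k] - 1.0   # return leaving price k
        if abs(r_in) >= threshold and abs(r_out) >= threshold and r_in * r_out < 0:
            cleaned[k] = 0.0
    return cleaned

# Made-up 30 second prices with one out-of-line print at 51.00.
example = [50.00, 50.02, 51.00, 50.03, 50.01]
print(flag_bounce_backs(example))  # [50.0, 50.02, 0.0, 50.03, 50.01]
```

The returns are computed from the original series, so zeroing one price does not affect the test applied to its neighbours.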
Suspect prices are removed from the price series because it seems illogical that an efficient market would induce a stock to move 1.5% in opposite directions in the span of one trading minute. A likely cause of this phenomenon is data entry error. However, we do not presume that errors in data entry are the only possibility. One curious example can be found in Figure 2c. The highlighted trade seems to be out of sync with the rest of the price series. However, further inspection reveals that the volume on the trade was actually 37 times greater than the average volume per transaction over the 5 year sample. Isn’t it possible that a large investor, perhaps a hedge fund, needed to unload a large quantity of shares and was willing to accept a slightly lower price than the rest of the market? Surely some behavior that seems irrational is in reality the product of a well functioning and efficient market.

With this concern taken into consideration we proceed with the second method for highlighting errors in the price series. Often a human eye is required for removing outliers that are undetected by the algorithm. In particular, manual inspection helps to remove returns that seem to have unreasonably high or low magnitudes. One example is discussed in Figures 2d & 2e.

Ultimately, we arrive at our data for this report. It includes price series for Pepsi Co. (PEP), the Coca-Cola Company (KO), Bristol Myers Squibb Co. (BMY), and the S&P 500 (SPY). Each price series begins on January 1, 2001 and ends on December 31, 2005, including 771 observations per day across 1241 trading days.

III. Modeling Jump Components in Equity Return Variance

Two non-parametric test statistics will be considered in the analysis that follows. The first statistic is recommended by Huang and Tauchen (2005) in response to their extensive Monte Carlo analysis.
It utilizes the realized variance discussed in Andersen, Bollerslev, and Diebold (2002) and the bi-power variation developed in Barndorff-Nielsen and Shephard (2004) as a method for analyzing the contribution of jumps to total price variance. Hereafter referred to as the BNS or z-statistic, it tests the null hypothesis that no jumps occurred in an entire trading day. The second statistic is recommended by Lee and Mykland (2006) and is hereafter referred to as the LM statistic. Their statistic is relevant because it presents certain practical advantages for the analysis of intraday patterns. In particular, the Lee and Mykland statistic allows for the flagging of specific returns as statistically significant jumps. A more detailed explanation of the differences between the two statistics will follow in the subsequent pages.

The rationale for including both statistics is simple. While the BNS statistic has been published and rigorously tested, the statistic proposed by Lee and Mykland is still under review. The comparison of both statistics will shed light on how the Lee and Mykland statistic compares to the BNS statistic and it will help to support our findings with different methods for analyzing the high frequency data.

The model behind the BNS statistic is a scalar log-price continuous time evolution,

$$dp(t) = \mu(t)\,dt + \sigma(t)\,dW(t) + dL_j(t). \tag{1}$$

The first and second terms in the model date back to the assumptions made in the Black-Scholes option pricing formula. To be concrete, $\mu(t)\,dt$ is a drift term and $\sigma(t)\,dW(t)$ is the instantaneous volatility with a standardized Brownian motion. The notation for the additional term $L_j(t)$ was first used in Basawa and Brockwell (1982). It refers to a pure jump Lévy process with increments $L_j(t) - L_j(s) = \sum_{s \le \tau \le t} \kappa(\tau)$, where $\kappa(\tau)$ is the jump size.
Huang and Tauchen consider a specific class of the Lévy process called the Compound Poisson Process (CPP) where jump intensity is constant and jump size is independently identically distributed.

The realized variance and bi-power variation measures for price variation in high frequency financial data are presented below. As developed in Barndorff-Nielsen and Shephard and defined in Huang and Tauchen,

$$r_{t,j} = p(t - 1 + j/M) - p(t - 1 + (j-1)/M), \quad j = 1, 2, \ldots, M \tag{2}$$

$$RV_t = \sum_{j=1}^{M} r_{t,j}^2, \tag{3}$$

$$BV_t = \mu_1^{-2}\left(\frac{M}{M-1}\right)\sum_{j=2}^{M} |r_{t,j-1}|\,|r_{t,j}| = \frac{\pi}{2}\left(\frac{M}{M-1}\right)\sum_{j=2}^{M} |r_{t,j-1}|\,|r_{t,j}|, \tag{4}$$

$$TP_t = M\,\mu_{4/3}^{-3}\left(\frac{M}{M-2}\right)\sum_{j=3}^{M} |r_{t,j-2}|^{4/3}\,|r_{t,j-1}|^{4/3}\,|r_{t,j}|^{4/3}, \tag{5}$$

where

$$\mu_a = E(|Z|^a), \quad Z \sim N(0,1), \quad a > 0. \ [4] \tag{6}$$

[4] $\mu_p = 2^{p/2}\,\Gamma\!\left(\tfrac{1}{2}(p+1)\right) / \Gamma\!\left(\tfrac{1}{2}\right) = E(|Z|^p)$.

Here M is the within day sampling frequency. Combining the results of Andersen, Bollerslev, and Diebold (2002) with those of Barndorff-Nielsen and Shephard (2004), the difference between realized variance and bi-power variation provides a method to investigate the jump component in equity return variance.

$$\lim_{M \to \infty} (RV_t - BV_t) = \int_{t-1}^{t} \sigma^2(s)\,ds + \sum_{j=1}^{N_t} \kappa_{t,j}^2 - \int_{t-1}^{t} \sigma^2(s)\,ds = \sum_{j=1}^{N_t} \kappa_{t,j}^2. \tag{7}$$

See Barndorff-Nielsen and Shephard (2004) for further explanations.

Multiple statistics discussed in Huang and Tauchen (2005) use these results as a means to measure statistically significant jump days. Their recommended statistic will be used throughout this report. It is defined as,

$$Z_{TP,rm,t} = \frac{RJ_t}{\sqrt{(v_{bb} - v_{qq})\,\dfrac{1}{M}\,\max\!\left(1, \dfrac{TP_t}{BV_t^2}\right)}}, \tag{8}$$

where

$$RJ_t = \frac{RV_t - BV_t}{RV_t}, \quad v_{bb} = \left(\frac{\pi}{2}\right)^2 + \pi - 3, \quad v_{qq} = 2.$$

Figures 3a & 3b plot the recommended BNS statistic applied to the SPY and PEP data sets at a 17.5 minute sampling interval. The number of days where the null hypothesis of no jumps is rejected at a statistically significant level is 37 for the SPY and 65 for PEP.

The model considered by Lee and Mykland is quite similar.
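Before turning to the details of that model, the daily quantities in equations (3) through (8) can be sketched in a few lines. This is a plain Python illustration under our own assumptions, not the code behind Figures 3a & 3b: the function names and the two made-up return vectors are hypothetical, and the value 3.09 used for comparison is simply the one-sided .999 quantile of the standard normal distribution.

```python
import math

MU_1 = math.sqrt(2.0 / math.pi)  # E|Z| for Z ~ N(0,1)
MU_43 = 2.0 ** (2.0 / 3.0) * math.gamma(7.0 / 6.0) / math.gamma(0.5)  # E|Z|^(4/3)

def realized_variance(r):
    return sum(x * x for x in r)

def bipower_variation(r):
    m = len(r)
    s = sum(abs(r[j - 1]) * abs(r[j]) for j in range(1, m))
    return MU_1 ** -2 * (m / (m - 1)) * s

def tripower_quarticity(r):
    m = len(r)
    s = sum((abs(r[j - 2]) * abs(r[j - 1]) * abs(r[j])) ** (4.0 / 3.0)
            for j in range(2, m))
    return m * MU_43 ** -3 * (m / (m - 2)) * s

def z_statistic(r):
    """Ratio statistic of equation (8) for one day of intraday returns."""
    m = len(r)
    rv, bv, tp = realized_variance(r), bipower_variation(r), tripower_quarticity(r)
    rj = (rv - bv) / rv
    v_bb = (math.pi / 2.0) ** 2 + math.pi - 3.0
    v_qq = 2.0
    return rj / math.sqrt((v_bb - v_qq) / m * max(1.0, tp / bv ** 2))

# Two made-up days of 22 returns: one quiet, one with a single large return.
quiet_day = [0.001 * (-1) ** j for j in range(22)]
jump_day = list(quiet_day)
jump_day[10] = 0.02

print(z_statistic(quiet_day) < 0.0)   # True: no evidence of jumps
print(z_statistic(jump_day) > 3.09)   # True: rejected at the .999 level
```

On the quiet day the bi-power variation exceeds the realized variance, so the statistic is negative; the single large return on the second day inflates the realized variance far more than the bi-power variation.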
They define the underlying stock price evolution as,

$$d \log S(t) = \mu(t)\,dt + \sigma(t)\,dW(t) + Y(t)\,dJ(t). \tag{9}$$

The only difference from the model described before is in the counting process. Here dJ(t) is a non-homogenous Poisson-type jump process. It does not make the assumption of constant jump intensity or the assumption of independent identically distributed jump size as used in the BNS statistic. The advantage of having a more general counting process is that it allows for scheduled events like earnings announcements to affect jump intensity.

The assumption made by Lee and Mykland is that for any $\epsilon > 0$,

$$\sup_i \sup_{t_i \le u \le t_{i+1}} |\mu(u) - \mu(t_i)| = O_p\!\left(\Delta t^{\frac{1}{2} - \epsilon}\right),$$

$$\sup_i \sup_{t_i \le u \le t_{i+1}} |\sigma(u) - \sigma(t_i)| = O_p\!\left(\Delta t^{\frac{1}{2} - \epsilon}\right).$$

They later explain, “we use $O_p$ notation throughout this paper to mean that, for random vectors $\{X_n\}$ and non-negative random variable $\{d_n\}$, $X_n = O_p(d_n)$, if for each $\delta > 0$, there exists a finite constant $M_\delta$ such that $P(|X_n| > M_\delta d_n) < \delta$ eventually. One can interpret Assumption 1 as the drift and diffusion coefficients not changing dramatically over a short time interval…This assumption also satisfies the stochastic volatility plus finite activity jump semi-martingale class in Barndorff-Nielsen and Shephard (2004).” [5]

Aside from the subtle difference in stock price evolution, Lee and Mykland go on to make definitions for the realized variation and bi-power variation that come directly from Barndorff-Nielsen and Shephard (2004). They use the bi-power variation in their statistic as a means to estimate the instantaneous volatility. The term π/2 is multiplied by the estimate of instantaneous volatility to studentize their statistic, defined as,

$$L(i) = \frac{\log S(t_i) - \log S(t_{i-1})}{\hat{\sigma}(t_i)}, \qquad \hat{\sigma}(t_i)^2 = \frac{1}{K-2}\left(\frac{\pi}{2}\right)\sum_{j=i-K+2}^{i-1} \left|\log \frac{S(t_j)}{S(t_{j-1})}\right| \left|\log \frac{S(t_{j-1})}{S(t_{j-2})}\right|. \tag{10}$$

The window size K determines the degree to which the instantaneous volatility is backward looking.
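To make the role of the window concrete, the following sketch computes L(i) as in equation (10) for a short synthetic price series. It is an illustration under stated assumptions rather than the implementation used in this report: the function name `lm_statistics`, the window K = 10, and the made-up returns are ours, and the rejection threshold for L(i) is deliberately left out.

```python
import math

def lm_statistics(prices, K):
    """Return {i: L(i)} for every index i with a full window of K prices behind it."""
    log_p = [math.log(p) for p in prices]
    # r[i] = log S(t_i) - log S(t_{i-1}); r[0] is a placeholder that is never used.
    r = [0.0] + [log_p[i] - log_p[i - 1] for i in range(1, len(log_p))]
    stats = {}
    for i in range(K, len(prices)):
        # Sum over j = i-K+2, ..., i-1 of |r_j| * |r_{j-1}| (K - 2 terms).
        bipower = sum(abs(r[j]) * abs(r[j - 1]) for j in range(i - K + 2, i))
        sigma_hat = math.sqrt((math.pi / 2.0) * bipower / (K - 2))
        stats[i] = r[i] / sigma_hat
    return stats

# Made-up log returns: small alternating moves with one large return at i = 25.
log_returns = [0.001 * (-1) ** k for k in range(1, 40)]
log_returns[24] = 0.02
prices = [100.0]
for x in log_returns:
    prices.append(prices[-1] * math.exp(x))

stats = lm_statistics(prices, K=10)
largest = max(stats, key=lambda i: abs(stats[i]))
print(largest)  # 25
```

Because the volatility estimate at index i only uses returns strictly before i, the large return does not contaminate its own denominator. It does inflate the estimate for the K - 2 statistics that follow, which is one reason the choice of K matters.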
Lee and Mykland recommend window sizes of 7, 16, 78, 110, 156, and 270 for sampling intervals of 1 week, 1 day, 1 hour, 30 minutes, 15 minutes, and 5 minutes, respectively. In this report a sampling interval of 17.5 minutes will be used unless otherwise stated. The acceptable values for K as defined in Lee and Mykland range from 75 to 5544. Our choice of window size will be K = 100. This decision is motivated by Figures 5a-5h. In keeping with recommendations proposed by Lee and Mykland the window size is chosen as a small value of K where the statistic has stabilized.

[5] Lee and Mykland, 6.

IV. SEC Filings & Idiosyncratic Jumps

It is well documented in the literature that idiosyncratic jumps are related to firm-specific events. To investigate this claim made in Law (2007) and Lee and Mykland (2006) we analyze the relation between flagged jumps by the BNS statistic and the SEC Filings for Pepsi Co. A Chi-Square Test of Independence is performed to test for a correlation between the filings and the jumps. The results are included below.

The first time the test was performed the BNS statistics were calculated at a 5 minute instead of the usual 17.5 minute sampling interval. Table 4a denotes the matches between the flagged jump days and the SEC Filings. A match is defined to be an SEC filing the day before or the day of a flagged jump. The rationale for this definition is twofold. A match of a flagged jump on the day of the SEC Filing is the trivial definition. We also consider the possibility that the information in the filing may precede the filing in the form of an announcement or information being leaked into the market, constituting a possible violation of the strong form of the efficient market hypothesis. Table 4b denotes the matches between the flagged jump days and the SEC Filings when calculated at a 17.5 minute sampling interval. The only common match between the two tables is August 1st, 2001, which provides an interesting example discussed in Figure 4a.
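The matching rule and the test of independence can be sketched as follows. Everything in the example is hypothetical: the day indices, the function names, and the resulting counts are made up for illustration and are not the values in Tables 4a-4c. We assume a 2x2 table (jump day or not, filing on the day of or the day before or not), for which the chi-square p-value with one degree of freedom can be written with the complementary error function.

```python
import math

def match_table(n_days, jump_days, filing_days):
    """2x2 counts: rows = jump / no jump, columns = filing match / no match.
    A day matches if a filing was made on that day or the day before."""
    jumps, filings = set(jump_days), set(filing_days)
    table = [[0, 0], [0, 0]]
    for d in range(n_days):
        matched = d in filings or (d - 1) in filings
        table[0 if d in jumps else 1][0 if matched else 1] += 1
    return table

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][c] + table[1][c] for c in range(2)]
    stat = 0.0
    for i in range(2):
        for c in range(2):
            expected = row_sums[i] * col_sums[c] / total
            stat += (table[i][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2 with 1 df > stat)
    return stat, p_value

# Hypothetical sample of 100 trading days, indexed 0..99.
table = match_table(100, jump_days=[5, 20, 41, 60, 77], filing_days=[4, 20, 40, 59, 90])
stat, p_value = chi_square_2x2(table)
print(table)            # [[4, 1], [6, 89]]
print(p_value < 0.001)  # True
```

With counts this small the chi-square approximation is rough, which is one reason a wider set of stocks would be helpful for this kind of test.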
The conclusion of our preliminary work is that the null hypothesis of independence between SEC Filings and flagged jumps is rejected at a .999 level of statistical significance. Table 4c includes the values for the Chi-Square Test of Independence. The test supports the notion that jumps in specific stocks will be related to idiosyncratic concerns. Further, we find that the types of SEC filings that matched with flagged jumps at the 5 minute sampling interval are never quarterly or annual filings. Rather, unexpected filings like 8Ks and 13s are matched with flagged jumps. As defined by the SEC these filings are used to announce major events that shareholders should know about, often relating to mergers and acquisitions, changes in the ownership of a company, or forecasts of future earnings.

Future work will include a regression that will detail how certain types of SEC Filings are related to flagged jump days, and whether SEC Filings can be used to forecast flagged jump days. To perform this regression it will be helpful to consider a wider variety of stocks than just Pepsi Co. The intent of this analysis is to better understand the claim that jumps are correlated with idiosyncratic changes in the value of a company. It will allow us to see a variety of events that have a significant effect on equity returns and, hopefully, to better understand what is driving jumps in specific equity price series.

V. Intraday Patterns in Flagged Jump Arrivals

One of the underlying goals in using the high frequency data is to better understand how financial markets work. To address this concern the LM and BNS statistics are implemented in Sections V and VI to detect intraday patterns in jump components of equity return variance. Our intuition is that jumps in the SPY are more likely to occur near the market’s open than other times of the day.
One reason is that information flow, particularly in the form of macroeconomic announcements, is high at the beginning of the trading day. [6] Both Lee and Mykland (2006) and Law (2007) indicate that jumps in the market portfolio correspond with macroeconomic announcements. Further, Lee and Mykland (2006) conclude that most idiosyncratic jumps occur early in the morning. They explain that scheduled announcements usually correspond with jumps in equity price series. They also find that the majority of firm-specific jumps correspond with unscheduled news announcements. This confirms the analysis in the previous section suggesting that unscheduled SEC Filings correlate to jumps in equity prices. The contribution of this section is that we apply the LM statistic to a five year as opposed to a one year time period. Additionally, we exclude the trades from 9:30am to 9:35am where the majority of the jumps in Lee and Mykland (2006) are flagged. A visual representation of the flagged jump arrivals can be found in Figures 5a-5h in addition to the number of flagged jumps at the recommended window sizes.

[6] It is well documented that information flow is high in the morning. Chaboud, Chernenko, Howorka, Krishnasami, Liu, Wright (2004) discusses the effect of macroeconomic announcements on foreign exchange volume and volatility, particularly the announcements made at 8:30am for GDP, Nonfarm Payrolls, Business inventory, Durable goods orders, Housing Starts, Initial Claims, Personal Consumption Expenditure, Personal Income, PPI, Retail Sales, and Trade Balance Data as well as announcements made at 10:00am for Consumer Confidence, Factory Orders, ISM Index, and New Existing Home Sales. Additionally, Ederington and Lee (1993) suggest that a large volume of information is released after the market closes, which is subsequently priced into securities at the start of the following trading day.

To obtain Figures 5a-5h the LM statistic is applied to our data sets. The sampling interval is 17.5 minutes. Our choice of sampling interval is motivated by Law (2007). The signature plots indicate that 17.5 minutes is a sufficient interval to stabilize the effect of market-microstructure noise. The different Figures 5a-5f then highlight the different number of flagged jumps at particular times during the trading day over the course of the 5 year sample. To obtain these figures the LM statistic was applied to each data set with varying window sizes. The flagged jumps were then counted across the corresponding time intervals within the trading day. In equation (11) $J_i$ is the number of flagged jumps for each of the 22 intraday time intervals, where 0 < i ≤ 22 corresponds to the intraday statistics and returns calculated at a 17.5 minute sampling interval across the data set.

$$J_i = \sum_{k=1}^{\#\text{statistics}} j_{i,k}, \qquad j_{i,k} = \text{indicator function} = \begin{cases} 1 & \text{if the } k\text{th statistic is a flagged jump} \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

The pattern that follows is striking. Across all window sizes the LM statistic flags significantly more returns in the morning than in later parts of the day. It is notable that the peak is centered approximately at 10:00am when several macroeconomic announcements including Consumer Confidence, Factory Orders, the ISM Index, and New Existing Home Sales are released. The number of flagged jumps then begins to increase later in the trading day, which could be a response to the FOMC announcements that are released at approximately 2:15pm. The result for the SPY is a “U” shape that seems reminiscent of the well documented patterns in intraday volatility. Interestingly, the individual stocks do not exhibit an increase in the number of flagged jumps in the afternoon. Our analysis suggests that the “U” shaped pattern in flagged jump arrivals is specific to the market index.

VI. Intraday Patterns in Volatility

To continue the investigation of flagged jump arrival our analysis is extended to see whether a similar pattern persists in the jump component of equity return variance. It is important to note that the well documented “U” shaped pattern in intraday volatility dates back to Wood (1985). His paper is the first to discuss the patterns in both volatility and volume for intraday transaction data from the NYSE. There is also an extensive literature that investigates the patterns in foreign exchange volatility across different markets. A notable example is Andersen and Bollerslev (1998). They detect similar patterns in foreign-exchange volatility and while their paper presents several ideas related to our future work, the relevance of the background literature on intraday trading patterns is that, a priori, it is not obvious that the jump component in equity return variance will follow the U shaped pattern that has been documented in former papers. Our purpose in separating the diffusive and jump components in the variance is to identify an important difference in the evolution of stock prices.

Our intuition that the jump component may follow a similar pattern to the well documented “U” shape is from Section V. There we find a striking pattern in the number of flagged jumps throughout the trading day. Namely, the jumps flagged by the LM statistic loosely adhere to a U shaped pattern, where there are significantly more jumps flagged in the morning than at other times of the trading day. In this section we conclude that a similar pattern persists in the jump component of equity return variance.

Figures 6a-6d are the starting point for our analysis. They plot the averages of the realized variance and bipower variation for the SPY data. To calculate the average of the realized variance and the bipower variation throughout the trading day we use a procedure similar to the procedure implemented in Section V.
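For reference, the Section V counting procedure of equation (11) can be sketched as below. The sketch assumes that the jump flags are stored as one chronological list with 22 entries per day, so that the position within the day is the list index modulo 22; the function name `count_jumps_by_interval` and the example flags are hypothetical.

```python
def count_jumps_by_interval(flags, intervals_per_day=22):
    """J_i of equation (11): flagged jumps per intraday interval.
    `flags` is a chronological list of booleans, one per return."""
    counts = [0] * intervals_per_day
    for k, flagged in enumerate(flags):
        if flagged:
            counts[k % intervals_per_day] += 1
    return counts

# Three made-up days of 22 returns with four flagged jumps.
flags = [False] * 66
for position in (0, 1, 22, 65):
    flags[position] = True

counts = count_jumps_by_interval(flags)
print(counts[0], counts[1], counts[21])  # 2 1 1
```

Here `counts[0]` corresponds to i = 1, the first 17.5 minute return of the day, and `counts[21]` to i = 22, the last.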
Here the averages for realized variance and bipower variation are calculated as,

$$rv_i = \frac{1}{1241}\sum_{k=1}^{\#\text{returns}} r_{k,i}^2, \qquad bv_i = \frac{1}{1241}\sum_{k=2}^{\#\text{returns}} |r_{k,i}|\,|r_{k-1,i}|.$$

Again the subscript i refers to the time within the trading day. It can take on values from 1 to 22 because our sampling interval is 17.5 minutes excluding the first 5 minutes the market is open. Figure 6e is then included for comparison. The only difference from Figure 6a is that the definition of bipower variation is changed slightly. Specifically, the bipower variation is redefined as,

$$bv_i = \frac{1}{2}\cdot\frac{1}{1241}\sum_{k=2}^{\#\text{returns}} |r_{k,i}|\,|r_{k-1,i}| + \frac{1}{2}\cdot\frac{1}{1241}\sum_{k=2}^{\#\text{returns}} |r_{k+1,i}|\,|r_{k,i}|.$$

Both plots indicate the same pattern. In particular, they suggest that the diffusive component of equity return variance only comprises 45% of total equity return variance in the first 17.5 minutes of the trading day. The diffusive component then increases significantly so that at 10:27am it comprises 72% of the total variance. Throughout the rest of the trading day the percent of the diffusive component generally declines with the exception of the last 17.5 minutes where it reaches its maximum value of 77%. If we define the jump component in equity return variance to be the difference between realized variance and bipower variation, as suggested in Barndorff-Nielsen and Shephard (2004), we also see in Figure 6b that the jump component follows a U shaped pattern. It is highest in the morning and then increases in the late afternoon, with the exception that it falls in the last 17.5 minutes of the trading day.

The final set of figures relating to the SPY make a small addition to Figure 6a. In particular, Figures 6g & 6h include the realized variance averaged across the trading day for all of the days flagged as jumps by the BNS statistic. The difference from the previous figures is that the bipower variation makes up an even smaller percentage of the realized variance on flagged jump days.
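The intraday averaging described in this section can be sketched as follows. The subscripts in the formulas leave some room for interpretation, so the sketch rests on assumptions of our own: returns are stored as one list per day, the neighbouring return in each bipower product is the adjacent intraday interval on the same day, and the symmetric version averages over whichever of the previous and next intervals exist. The function name `intraday_averages` and the two made-up days are hypothetical.

```python
def intraday_averages(days):
    """Average squared return (rv_i) and average product of adjacent absolute
    returns (bv_i) for each intraday interval i, taken across days."""
    n_days, m = len(days), len(days[0])
    rv = [sum(d[i] ** 2 for d in days) / n_days for i in range(m)]
    bv = []
    for i in range(m):
        # Use the previous and next interval where they exist (assumption).
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < m]
        total = sum(abs(d[i]) * abs(d[j]) for d in days for j in neighbours)
        bv.append(total / (len(neighbours) * n_days))
    return rv, bv

# Two made-up days with four intervals each; day two opens with a large return.
days = [
    [0.001, -0.001, 0.001, -0.001],
    [0.02, 0.001, -0.001, 0.001],
]
rv, bv = intraday_averages(days)
jump_component = [a - b for a, b in zip(rv, bv)]
print(jump_component[0] > jump_component[2])  # True
```

In this toy example the difference between the two averages is concentrated in the first interval, where the large opening return sits, and is essentially zero in the third interval.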
As can be noted in Figure 6h, the average of the bipower variation on flagged jump days is only 35% of the realized variance in the first 17.5 minutes of the trading day. It is also interesting to see that the realized variance on flagged jump days is very erratic. At times the realized variance averaged over jump days even dips below the bipower variation. An explanation for this peculiarity will be left for our future work. What is striking is that the jump component comprises a high percentage of realized volatility during the first 17.5 minutes of the trading day and it appears to loosely follow a U shaped pattern.

Figures are also included for some of the individual stocks. Their primary difference with respect to the market is that the bipower variation makes up a comparatively lower proportion of realized variance at the start of the trading day. This supports our conclusion in Section V and the conclusion of Lee and Mykland (2006) that the vast majority of idiosyncratic jumps happen early in the trading day.

Furthermore, the conclusions of this section pose several interesting questions for our future work. First, why does the jump component follow a similar pattern to the well documented U shaped pattern in equity return variance? Jumps are rare events. Huang and Tauchen (2005) conclude that jumps are only expected to make up 4.5 to 7% of total return variance. As a result, it seems implausible that jumps in equity prices are driving the U shaped pattern. Perhaps there are inherent characteristics in the NYSE that drive the U shaped pattern in both diffusive and jump components of equity return variance, but if so, what are the characteristics and how are they driving volatility? Another interesting question is related to the proportion of realized variance made up by bipower variation.
In Figure 6d it is clear that the bipower variation makes up a small portion of the realized variance during the first 17.5 minutes of the trading day and a large portion of the realized variance during the last 17.5 minutes of the trading day. Does this suggest that trading at the start of the day is information driven and more related to jump components while trading at the end of the day is more related to the diffusive components in equity return variance including trader activity to close positions and prepare for the next day?

VII. Conclusions and Future Work

The lab report concludes at an interesting point in our research. After beginning the semester with no exposure to the literature on volatility modeling and no previous knowledge of the literature on jump components in asset prices, we arrive at several questions that could constitute the topic of a senior honors thesis. In particular, how strongly are SEC Filings related to idiosyncratic jumps in stock prices? Can they be used to forecast or explain flagged jumps? How robust are the patterns outlined in Sections V and VI of this report? What can we learn from the intraday patterns in jump components of equity return variance? How do they relate to the daily operations of financial markets?

To answer any of these questions the analysis in this report will need to be developed. It will be helpful to apply some of the same procedures to all 40 of the stocks considered in Law (2007). Of course, it will also be necessary to consult the literature to find more robust methods to support the claims made in this report. Papers related to intraday patterns in volatility may provide helpful methods of investigation and suggestions for future analysis. One similarity in the literature is that patterns in volatility are often discussed alongside patterns in volume. Analyzing volume with the high frequency data may be an interesting extension of our current work.
This will need to be performed in anticipation of the fall semester. The focus in the fall will be to write a paper that properly addresses the questions outlined above.

VIII. Tables

Flagged Jumps at Recommended Window Sizes for SPY Data*

Sampling Frequency    Window Size    Jumps
5 minutes             270            1193
7 minutes             250            819
11 minutes            200            522
17.5 minutes          150            306
27.5 minutes          110            212
38.5 minutes          90             135
55 minutes            78             120
77 minutes            75             82

Flagged Jumps at Recommended Window Sizes for PEP Data*

Sampling Frequency    Window Size    Jumps
5 minutes             270            1285
7 minutes             250            875
11 minutes            200            549
17.5 minutes          150            381
27.5 minutes          110            230
38.5 minutes          90             169
55 minutes            78             137
77 minutes            75             89

Tables 2a & 2b: The tables report the number of flagged jumps at the window sizes recommended by Lee and Mykland. * indicates that approximations were made for window sizes that were not included in the paper.

Flagged Jumps at Recommended Window Sizes for SPY Data*

Sampling Frequency    Window Size    Significance Level    Jumps
5 minutes             270            0.99988               617
7 minutes             250            0.999836              436
11 minutes            200            0.999753              297
17.5 minutes          150            0.999624              214
27.5 minutes          110            0.999435              158
38.5 minutes          90             0.999236              120
55 minutes            78             0.998948              120
77 minutes            75             0.9985758             82

Tables 2a & 2b: The table reports the number of flagged jumps at the window sizes recommended by Lee and Mykland when corrected for Type I errors. * indicates that approximations were made for window sizes that were not included in the paper.

SEC Filing / Jump Day Matches

Year    Month    Day    Type of Filing
2001    3        23     Def 14A
2001    7        18     8K
2001    8        1      8K
2001    8        1      8K
2002    2        12     8K, 13G/A
2002    10       17     8K/A
2003    10       28     8K/A
2003    11       5      8K
2003    11       5      8K

Table 4a: The table above includes the matches between flagged jump days at a 5 minute sampling interval for the BNS statistic and SEC filings of PEP. Days were matched if an SEC filing was filed on the day of or the day before a jump. Days may be listed twice if there was a filing the day of and the day before a flagged jump.
It is interesting to note that the only types of filings that matched were 8Ks or 13s. No quarterly or annual filings were matched, which suggests that the SEC filings related to flagged jumps were unexpected.

SEC Filing / Jump Day Matches

Year    Month    Day
2001    7        26
2001    7        26
2001    7        27
2001    8        1
2001    8        1
2001    8        7
2001    9        17
2003    1        30
2003    4        16
2003    7        24
2004    2        6
2004    11       29
2005    2        28
2005    9        29

Table 4b: Above are the matches between flagged jump days at a 17.5 minute sampling interval for the BNS statistic and SEC filings of PEP. The only day that is common to both Tables 4a & 4b is August 1st, 2001. Finding the types of filings associated with these match dates is left for future work, which should also include the application of this analysis to a wider variety of stocks.

Chi-Square Test of Independence

5 minute sampling interval
Expected matches      (28/1241)(84/1241)(1241) = 1.90
Observed matches      9
χ²                    Σ (Oi – Ei)² / Ei = (9 – 1.9)² / 1.9 = 26.5
Degrees of Freedom    3

17.5 minute sampling interval
Expected matches      (65/1241)(84/1241)(1241) = 4.40
Observed matches      14
χ²                    Σ (Oi – Ei)² / Ei = (14 – 4.4)² / 4.4 = 20.9
Degrees of Freedom    3

Table 4c: The Chi-Square Test of Independence tests independence between the flagged jumps from the BNS statistic and SEC filings for Pepsi Co. Expected matches are calculated by assuming independence between the jumps and filings: the sample frequency of jump days is multiplied by the sample frequency of SEC filing days and by the number of observations in the sample to give the expected number of matches. The value of χ² is statistically significant at the 0.999 level for both sampling intervals. This rejects the null hypothesis that SEC filings and flagged jumps are independent.

IX. Figures

Figure 1a & 1b: Included above are figures that plot the SPY level and returns throughout the sample. The level sampling frequency is twice per day while the return sampling frequency is daily.
It is easy to confirm on finance.yahoo.com or Google Finance that our high frequency data, when sampled at daily intervals, has an identical appearance to the data reported online.

Figure 2a & 2b: The plots above include the number of flagged jumps by the LM statistic across different sampling frequencies and window sizes for both the SPY and the PEP data. The number of flagged jumps for each of SPY and PEP at the window sizes recommended by LM can be found in Tables 2a & 2b.

Figure 2c: The figure above includes the 30 second price series for Pepsi Co. on April 26th, 2005. The arrow is pointing to a trade that seems to be inconsistent with the other prices. However, it is interesting to note that the volume on that trade is 181,700 shares, which is significantly higher than the 5 year average of 4,869 shares per transaction. A plausible explanation for the low price associated with this trade is that an investor was willing to sell shares at the lowest price of the day in order to unwind a large position.

Figures 2d & 2e: These plots will be discussed again in Section VI: Intraday Patterns in Volatility. With respect to Section II: Data, it is notable that both plots are created by the exact same procedure across the entire data set at a 17.5 minute sampling interval, and their only difference is two data points. These outliers are removed for the work in Section VI.

Figure 3a & 3b: Included above are plots of the BNS statistic across the sample. The recommended statistic from Huang and Tauchen (2005) is used to compute the statistics at the recommended sampling frequency of 17.5 minutes from Law (2007). The BNS statistic flags 37 days for the SPY and 65 days for PEP as statistically significant, rejecting the null hypothesis that there were no jumps.

Figure 4a: The plot above includes the 5 minute price series for Pepsi Co. on August 1st, 2001.
It is the only flagged jump that matches with an SEC filing for both the 5 minute and 17.5 minute sampling intervals. It provides the ideal example of a match between a flagged jump and an SEC filing. The filing released was an 8K announcing Pepsi’s unconditional clearance from the Federal Trade Commission to merge with the Quaker Oats Company. Plans to merge were previously announced on December 4, 2000 and, as a result, the market was prepared to quickly adjust the price of PEP once the merger became official.

Figure 5a & 5b: Included in the figures above is the number of flagged jumps for the SPY data at different times of the trading day. The 2D plot has the same curves as the 3D plot. It simply eliminates the axis for window size by superimposing all the curves on a two dimensional space. Both plots confirm that the statistic stabilizes for a sufficiently large window size. Although Lee and Mykland recommend a window size of approximately 150 for our sampling interval of 17.5 minutes, the statistic seems to stabilize with a window size as small as 100. The average number of flagged jumps from k = 100 to k = 150 is 311, while the number of flagged jumps when k = 100 is 322. Also interesting is the “U” shaped pattern in jump arrivals. Slightly after 10:00am there is a noticeable peak in the number of flagged jumps. During the middle of the trading day there are visibly fewer jumps. Starting at about 2:15pm the number of flagged jumps begins to increase a little from the midday lows.

Figures 5c, 5d, 5e, 5f, 5g, & 5h: Included in the above figures are the same plots as in Figures 5a & 5b, except for the PEP, KO, and BMY data. The LM statistic stabilizes in a similar way. For PEP the average number of flagged jumps from k = 100 to k = 150 is 382 and the number of flagged jumps at k = 100 is 391. The corresponding values for KO and BMY are 302 and 440 for the averages and 309 and 449 for the number of flagged jumps at k = 100, respectively.
The only visual difference in the jump arrival is that the plots seem to be more peaked at the start of the day and do not appear to increase as significantly at the end of the day. That is to say, the individual stocks examined here do not exhibit a “U” shaped pattern in jump arrival.

Figures 6a – 6d: All plots are from the SPY data. In the top left the average of the realized variance is plotted against the average of the bipower variation throughout the trading day. The figure in the top right includes the difference between the realized variance and bipower variation. The bottom figures include the bipower variation as a percentage of the realized variance and the number of times greater the realized variance is than the bipower variation.

Figure 6e & 6f: These plots are also constructed from the SPY data. The only difference from Figures 6a – 6d is that the bipower variation is defined in the slightly different manner detailed in Section VI. They are helpful in that they appear almost identical to Figures 6a & 6b, which helps to confirm the patterns we observe in the jump component of equity return variance.

Figures 6g & 6h: Here similar plots are made with a minor addition. The realized variance and bipower variation included in Figure 6g are identical to the RV and BV in Figure 6a. The addition is the average of the realized variance for the SPY data on days flagged as jump days by the BNS statistic. In Figure 6h one can see that the proportion of equity return variance accounted for by the diffusive bipower variation is even lower on jump days. At the start of the day the bipower variation only accounts for 35% of the equity return variance on average. The jump component on flagged jump days also appears to be more erratic. At certain points in the trading day it actually falls below the bipower variation when averaged across the sample. An explanation for this peculiarity will be left for our future work.
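The averages plotted in Figures 6a through 6h can be illustrated with a short sketch. This is not the code used for this report. It assumes a hypothetical input `returns_by_day` (one list of equally spaced intraday returns per day), the standard π/2 scaling for bipower variation, and that each return is paired with the previous return of the same day, so the first interval of the day is left undefined here. The treatment of the first interval detailed in Section VI may differ. The function names are illustrative only.

```python
import math


def intraday_rv_bv(returns_by_day):
    """Average RV and BV contributions per intraday interval across days.

    returns_by_day is a hypothetical input: one list of intraday returns
    per trading day, all lists having the same length.
    """
    n_days = len(returns_by_day)
    n_intervals = len(returns_by_day[0])
    scale = math.pi / 2.0  # standard bipower scaling (assumption)

    rv = [0.0] * n_intervals
    # Assumption: no lagged return exists for the first interval of a day,
    # so its bipower contribution is left undefined (None).
    bv = [None] + [0.0] * (n_intervals - 1)

    for day in returns_by_day:
        for j, r in enumerate(day):
            rv[j] += r * r / n_days
            if j > 0:
                bv[j] += scale * abs(r) * abs(day[j - 1]) / n_days
    return rv, bv


def bv_share(rv, bv):
    """Bipower variation as a fraction of realized variance per interval."""
    share = []
    for rv_j, bv_j in zip(rv, bv):
        if bv_j is None or rv_j == 0.0:
            share.append(None)
        else:
            share.append(bv_j / rv_j)
    return share


# Toy data for illustration only, not taken from the SPY sample.
toy_returns = [
    [0.01, -0.02, 0.01],
    [-0.01, 0.02, -0.03],
]
rv, bv = intraday_rv_bv(toy_returns)
print(bv_share(rv, bv))
```

Under these assumptions, a share below one in an interval means that the average difference between realized variance and bipower variation is positive there, while a share above one corresponds to the cases in which the bipower variation exceeds the realized variance.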
Figures 6i & 6j: Similar plots are included for the PEP data. Here the bipower variation is defined as in Figure 6a.

Figures 6k & 6l: Similar plots are included for the BMY data. Here the bipower variation is defined as in Figure 6a.

Figures 6m & 6n: Similar plots are included for the KO data. Here the bipower variation is defined as in Figure 6a.

X. References

Andersen, T. G., T. Bollerslev, and F. X. Diebold (2004). Some Like it Smooth, and Some Like it Rough: Disentangling Continuous and Jump Components in Measuring, Modeling, and Forecasting Asset Return Volatility. Working paper, Duke University.

Andersen, T. G. and T. Bollerslev (1998). Deutsche mark-dollar volatility: Intraday activity patterns, macroeconomic announcements, and longer run dependencies. Journal of Finance 53, 219-265.

Barndorff-Nielsen, O. and N. Shephard (2004). Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics 2, 1-37.

Barndorff-Nielsen, O. and N. Shephard (2006b). Impact of Jumps on Returns and Realized Variances: Econometric Analysis of Time-Deformed Levy Processes. Journal of Econometrics 131, 217-252.

Black, F. (1986). Noise. Journal of Finance 41, 529-543.

Eraker, B., M. Johannes, and N. Polson (2003). The Impact of Jumps in Volatility and Returns. Journal of Finance 58, 1269-1300.

Huang, X. and G. Tauchen (2005). The Relative Contributions of Jumps to Total variance. Journal of Financial Econometrics 3, 456-499.

Lee, S. S. and P. Mykland (2006). Jumps in Financial Markets: A New Nonparametric Test and Jump Dynamics. Working paper.

Wood, R. A., T. H. McInish, and J. K. Ord (1985). An investigation of transaction data for NYSE stocks. Journal of Finance 25, 723-739.
