Trading Networks

Lada Adamic, Celso Brunetti, Jeffrey H. Harris, and Andrei Kirilenko*

Abstract

In this paper we analyze the time series of 12,000+ networks of traders in the e-mini S&P 500 stock index futures contract and empirically link network variables with financial variables more commonly used to describe market conditions. We show that network variables lead trading volume, intertrade duration, effective spreads, trade imbalances and other market liquidity measures. Network variables reflect information, information asymmetry and market liquidity and significantly presage future market conditions before volume or liquidity measures do. We also find two-way Granger-causality between network variables and both returns and volatility, highlighting strong feedback between market conditions and trading behavior.

JEL Classifications: D85 Network Formation and Analysis, G12 Trading Volume, G17 Financial Forecasting and Simulation, L14 Networks

*Adamic is with the University of Michigan, Brunetti is with the Board of Governors at the Federal Reserve Bank, Harris is with Syracuse University, and Kirilenko is with the Commodity Futures Trading Commission. We are grateful to Paul Tsyhura for invaluable assistance with the retrieval, organization, and processing of transaction-level data.
We thank Kirsten Anderson, Ana Babus, Matt Elliott, Pat Fishe, Ben Golub, Matt Harding, Matt Jackson, Pete Kyle, Shawn Mankad, Antonio Mele, Anna Obizhaeva, Han Ozsoylev, Kester Tong, Kumar Venkataraman and presentation participants at the American University, Cambridge University, the Chicago Mercantile Exchange, the Commodity Futures Trading Commission, 2009 Complexity Conference at Northwestern University, 2009 Econometric Society Summer Meetings in Barcelona, the Federal Reserve Board of Governors, NASDAQ, 2009 NBER-NSF Time Series Conference, Oxford, the Securities and Exchange Commission, Southern Methodist University, Stanford University, Syracuse University, the University of Maryland, the University of Michigan, Villanova University, and 2009 Workshop on Information in Networks at New York University for very helpful comments and suggestions. Some earlier drafts of the paper were circulated under the title “On the Informational Properties of Trading Networks.” The views expressed in this paper are our own and do not constitute an official position of the Commodity Futures Trading Commission, its Commissioners or staff.

Financial markets bring buyers and sellers together, aggregating information for price discovery and providing liquidity for uninformed, liquidity-seeking order flow. Indeed, numerous models show that trading outcomes (prices, volume, volatility, and effective spreads) depend on the complex mix of information flows and liquidity demands of individual buyers and sellers.1 In this paper we use established network analysis tools to characterize information and liquidity flows in financial markets. We find strong and significant correlations between network statistics and financial variables, evidence that quantitative network parameters capture information, information asymmetry and liquidity characteristics in the e-mini S&P 500 futures market.
Moreover, we find that the time series of network statistics significantly presage (Granger-cause) intertrade duration, trading volume, effective spreads, trade imbalances and signed volume imbalances. Since duration and volume have been shown to proxy for information and liquidity in financial markets,2 these findings suggest that network statistics serve as primitive measures of information and liquidity that emerge prior to the advent of trading volume and other market liquidity changes. In this regard, network statistics contain useful information beyond that contained in past volume and other more standard market descriptors. We also find that network variables capture complex feedback mechanisms that likely underlie high frequency trading strategies. Our network variables exhibit bi-directional Granger-causality with returns and volatility measures, highlighting the fact that trading strategies evolve dynamically in response to changing market conditions. Importantly, we replicate the contemporaneous correlations between financial and network variables in a simulated agent-based trading model with randomly generated orders, but find no Granger-causality in simulated networks. Thus, the fact that network variables lead volume, duration and liquidity metrics does not stem from some statistical artifact of network construction, but rather from real market interactions.

1 See, for instance, Copeland and Galai (1983), Glosten and Milgrom (1985), Kyle (1985), Admati and Pfleiderer (1988), Foster and Viswanathan (1990, 1993, 1994). The theoretical literature on limit order markets includes Parlour (1998), Foucault (1999), Biais, Martimort and Rochet (2000), Parlour and Seppi (2003), Foucault, Kadan and Kandel (2005), Back and Baruch (2007), Goettler, Parlour, and Rajan (2005, 2009), Large (2009), Rosu (2009), and Biais and Weill (2009), among others.
2 See Epps and Epps (1976) and Engle and Russell (1998).
Our sample consists of more than 7.22 million transactions conducted during August 2009 in the nearby e-mini S&P 500 futures contract. Utilizing proprietary data from the U.S. Commodity Futures Trading Commission (CFTC) that identifies the trading account on both sides of every trade, we reconstruct the links between buyers and sellers over short intervals of time, creating a time series of more than 12,000 networks, each comprising 600 sequential trades. Importantly, these financial networks reflect the microstructure of the U.S. equity index futures markets that serve as our laboratory. Our results are robust to different equity index futures markets (e-mini Dow Jones and Nasdaq 100 futures), different observation periods (May and August 2008), and different sampling frequencies (240 and 600 transactions). The juxtaposition between live and simulated markets suggests that these Granger-causality results stem from strategic trading behavior and not from some statistical artifact of network construction.

While graph theory dates back several centuries, more recent applications offer novel insights across a diverse range of disciplines (see Newman (2010)). Network analysis yields insight into molecular processes in cells, the stability of ecosystems, and epidemic thresholds in human contact networks. Likewise, network analysis is fruitful in modeling large scale information systems (such as the Internet) and devising algorithms for efficient information retrieval (such as Google). Applications in economics and finance range from analyses of interactions among firms to correlations between various asset price time series.3

In the network literature, most applications to human social networks model the interaction between entities that drive overall network dynamics. Since the clearinghouse is the counterparty to all futures trades, our networks represent traders matched on straightforward price and time priority rules.
In our work, traders interact only through the trade matching process in the limit order book and do not necessarily intend to trade with each other. We show that network metrics can capture many salient features of financial markets, from localized patterns of trade to market-wide characteristics.

In finance, a growing literature addresses banking networks. Leitner (2005) models bank networks where linkages both spread contagion and induce private sector bailouts. Similarly, Babus (2009) models the contagion risk that a bank failure propagates through networks. Allen, Babus and Carletti (2010) also model bank portfolio decisions as a network formation game.4

3 Other financial network papers include Caldarelli and Catanzaro (2004), who analyze networks of corporate boards; Jackson (2007, 2009), Allen and Babus (2008), Colla and Mele (2009), and Ozsoylev and Walden (2009), who summarize network applications in economics and finance; and Onnela, Kaski and Kertész (2004), Boginski, Butenko and Pardalos (2005), Castiglionesi and Navarro (2010), and Fenn, Porter, Williams, McDonald, Johnson and Jones (2010), who model financial networks more generally.

This is the first paper to empirically examine the links between network statistics and standard financial variables. We consider trading network parameters as primitive descriptors of the order execution process: not outcomes resulting from the preferences of individual traders, but rather from the simultaneous and aggregate liquidity demands and information-based trade that comprise financial markets. Our results suggest that network statistics and standard financial descriptors are strongly linked economically, with various network statistics serving primary roles in describing information and liquidity in financial markets. In this regard, network statistics might be useful for constructing profitable trading strategies. The paper proceeds as follows.
In Section I, we describe our unique high frequency data and our network and financial variables. In Section II, we show examples of different trading networks and use these examples to illustrate how our network statistics relate to financial variables. In Section III, we present contemporaneous correlations and lead-lag (Granger-causality) tests among network and financial variables. Section IV presents an agent-based simulation model of trading networks in order to compare our empirical market results to results in a controlled setting. We conclude with Section V.

4 Other recent papers on trading networks include Blume, Easley, Kleinberg and Tardos (2009) and Cohen-Cole, Kirilenko and Patacchini (2011).

1. High Frequency Data, Network Metrics, and Financial Variables

We use audit trail, transaction-level data for over 7.2 million regular transactions (executed between 9:30 a.m. ET and 4:00 p.m. ET) in the September 2009 e-mini S&P 500 futures contract during August 2009. The e-mini S&P 500 futures contract is a highly liquid, fully electronic, cash-settled contract traded on the CME GLOBEX trading platform.5 Each transaction includes the date, time (up to the second), unique transaction identifier (which enables us to order transactions sequentially within each second), executing and opposite trading accounts, executing and opposite brokers, buy/sell flag, price, and quantity.6 A total of 31,585 unique accounts trade during our August 2009 sample.

Since our goal is to explore links between financial variables and network statistics, we first determine a sampling frequency that ensures our financial variables are not contaminated by market microstructure noise. We apply both the Andersen, Bollerslev, Diebold and Labys (2000) volatility signature plot and the Bandi and Russell (2006) technique and find that the smallest acceptable sampling window must contain at least 50 transactions.
However, networks constructed over such a small number of transactions can be too sparsely connected to adequately capture the dynamic nature of the informational and liquidity forces that drive markets. Consequently, we use 600 transactions as a conservative sampling window to construct meaningful trading networks and to compute network statistics.

5 Although pro rata allocations and lead market makers exist on the CME, GLOBEX trades in the e-mini S&P contract are matched with price-time priority.
6 We apply standard filters designed to look for recording errors and outliers in the price and quantity series (see Hansen and Lunde (2006)) and find no irregularities in the data.

1.1 Networks and Network Metrics

Over the entire month of trading, we construct 12,032 sequential (nonoverlapping) trading networks, each comprising 600 consecutive trades.7 Following network terminology, a network consists of nodes and edges. We define a node as an individual trader (trading account) and an edge that connects a pair of nodes as a trade between two traders. In our work the edges are directed, with buys representing edges pointed toward a trader and sells representing edges pointed away from a trader. Multiple transactions between traders are represented by a single directed edge pointed toward the buyer and away from the seller, even if the prices and quantities of each transaction differ. In network terminology, the degree of a node is the number of edges connected to it, with indegree (outdegree) representing the number of edges pointing toward (away from) the node. For clarity, we use the terms buydegree and selldegree to represent indegree and outdegree, respectively.

Quantitative analysis of networks employs a set of standard metrics. We focus on network metrics that represent the properties of information and liquidity in markets. Centrality, for instance, quantifies the importance of a specific trader in a network.
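The network construction described above can be illustrated with a minimal sketch. It assumes each trade arrives as a (buyer, seller) pair of account identifiers in time order; the function names and the toy account labels are hypothetical and are not taken from the audit trail data. Edges are stored as (seller, buyer), so they point away from the seller and toward the buyer, and repeated trades between the same pair in the same direction collapse into one edge.

```python
from collections import defaultdict


def build_networks(trades, window=600):
    """Split time-ordered trades into consecutive, non-overlapping networks.

    trades: list of (buyer, seller) account IDs (hypothetical input layout).
    Returns a list of edge sets; each edge is (seller, buyer).
    Note: the last chunk may hold fewer than `window` trades; this sketch
    does not pad it with after-close trades.
    """
    networks = []
    for start in range(0, len(trades), window):
        chunk = trades[start:start + window]
        # A set collapses multiple trades between the same ordered pair.
        edges = {(seller, buyer) for buyer, seller in chunk}
        networks.append(edges)
    return networks


def degrees(edges):
    """Return (buydegree, selldegree) dictionaries for one network."""
    buydegree = defaultdict(int)   # indegree: distinct sellers bought from
    selldegree = defaultdict(int)  # outdegree: distinct buyers sold to
    for seller, buyer in edges:
        selldegree[seller] += 1
        buydegree[buyer] += 1
    return dict(buydegree), dict(selldegree)


# Toy example with four trades and a window of four trades.
toy_trades = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "A")]
toy_network = build_networks(toy_trades, window=4)[0]
toy_buy, toy_sell = degrees(toy_network)
```

In the toy example, account B sells to both A and C, so its selldegree is 2 even though it appears as seller in three trades.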
Although several centrality measures exist, we use degree centrality to characterize how “central” a particular trader is in terms of its trading, utilizing the buy/sell indicators in our data to construct buydegree and selldegree centrality, representing the number of bilateral purchases and sales, respectively, for each trader in the network.8 We then compose aggregate centralization from individual trader centrality measures that characterizes the inequality in degree among all traders in the network. Specifically, we compute a centralization Gini ($G_{B,S}$) for buyers (B) and sellers (S) defined as:

$$G_{B,S} = \sum_{i=1}^{N} \frac{(2 r_i - N - 1)\, k_i}{N \cdot E}$$

summing over all N traders, where $k_i$ is the ith trader's buydegree (for $G_B$) or selldegree (for $G_S$), $r_i$, i = 1, …, N, is each trader's rank order number9 and E is the number of unique connections between traders present among the 600 trades in each network. By construction, $G_B$ and $G_S$ are 0 if every trader has the same number of buy and sell connections, and positive with increasing degree inequality. Maximum centralization occurs when one trader does all the buying or selling in the network. Using the above formula, we compute network-wide centralization as the difference between $G_B$ and $G_S$. Intuitively, centralization can be interpreted as the presence of a dominant buyer (close to 1) or a dominant seller (close to -1) within the 600-trade network. The absolute value of centralization (|centralization|) therefore represents the presence of a dominant trader on either side of the market. In economic terms, centralization represents a measure of informed trading or strong net demand for liquidity in the market.

7 Since the number of trades during the trading hours is seldom precisely divisible by 600, we complete the final network for each day with the trades reported after the close. Our results are robust to excluding this final network of each day.
8 Note that buydegree and selldegree do not fully capture the role of a trader in the network. For instance, a trader that does not execute a large number of trades but serves to connect otherwise disconnected traders may be very central, but will have relatively low buydegree and selldegree measures.
9 The trader with the highest centrality ranks highest, the trader with the second highest centrality ranks second, and so on. We rank buydegree and selldegree centrality separately.

Table I presents summary statistics for the network variables across all 12,032 networks in our sample.10 Centralization is approximately symmetric and ranges from -0.74 to +0.65 with a mean near zero and standard deviation of 0.17. |Centralization| ranges from 0.0 to 0.74 with mean 0.14 and standard deviation of 0.10. The mass of this distribution is more than one standard deviation away from zero, which can be interpreted as inequality in the number of buy vs. sell matches per trader.

Assortativity in networks represents the tendency of “like to be connected with like” for any node (Newman (2002)). For our purposes, we use each trader's buydegree and selldegree to assess assortativity within our networks, examining the propensity of highly connected buyers to trade with highly connected sellers, for instance. In an assortative network, buyers/sellers with many edges (high buydegree/selldegree nodes) are more likely to connect to similar buyers/sellers. In a disassortative network, buyers/sellers with many edges tend to connect to buyers/sellers with few edges. Intuitively, assortativity represents a measure of asymmetric information or liquidity imbalance in a trading network.11 For instance, one large buyer matched with a number of small sellers exhibits high assortativity while a large buyer matched with a single large seller exhibits low assortativity. We measure assortativity by the Pearson correlations between buydegree and selldegree for all edges present in the network.
Using the notation above, over all edges $E_{i,j}$ we calculate four pairwise correlations $\rho(k_i^{buy}, k_j^{buy})$, $\rho(k_i^{buy}, k_j^{sell})$, $\rho(k_i^{sell}, k_j^{buy})$, and $\rho(k_i^{sell}, k_j^{sell})$ corresponding to the four conditional degree distributions (buydegree $k_{i,j}^{buy}$ and selldegree $k_{i,j}^{sell}$) for each connected trader pair i and j. Intuitively, the coefficient $\rho(k_i^{sell}, k_j^{buy})$ measures the correlation between the number of unique buyers to whom a seller is selling and the number of unique sellers that those buyers are buying from. A negative correlation indicates that when a seller matches with many buyers, those buyers are buying from few or no other sellers. From these four correlations, we construct an aggregate assortativity index (AI) for each network

$$AI = \frac{1}{4}\left(\rho(k_i^{buy}, k_j^{buy}) - \rho(k_i^{buy}, k_j^{sell}) - \rho(k_i^{sell}, k_j^{buy}) + \rho(k_i^{sell}, k_j^{sell})\right)$$

computed over all edges, where the scaling factor, ¼, assures that the assortativity index falls between -1 and 1. By construction, the assortativity index captures patterns in networks that feature large degree dispersion. For example, the assortativity index is high in a network that contains a trader (like a dominant intermediary) who is both a large buyer and a large seller trading with many small counterparties. Likewise, the assortativity index is also large in a network that has a large buyer and a large seller connected through many small intermediaries. In either example, trades occur between parties that differ in both connectivity and liquidity provision/removal.

10 Each network variable is stationary and exhibits persistent autocorrelations. Jarque-Bera (1980) tests reject the null of normality at standard significance levels.
11 Glosten (1987), Stoll (1989), George, Kaul and Nimalendran (1991) and Madhavan, Richardson and Roomans (1997) each model asymmetric information components in quoted bid-ask spreads.
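The centralization and assortativity measures defined in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the exact procedure used for Table I: ranks are assigned in ascending order of degree (so that the Gini is non-negative), N counts every trader that appears in the network, E is the number of distinct directed edges, i denotes the seller and j the buyer on each edge, and a correlation with zero variance on either side is set to zero. All function names are hypothetical.

```python
import math


def degree_maps(edges):
    """Buydegree and selldegree for every trader on directed (seller, buyer) edges."""
    traders = {t for edge in edges for t in edge}
    buy = {t: 0 for t in traders}
    sell = {t: 0 for t in traders}
    for seller, buyer in set(edges):
        sell[seller] += 1
        buy[buyer] += 1
    return buy, sell


def gini(degree_values, n_edges):
    """Sum of (2*r_i - N - 1) * k_i over traders, divided by N * E."""
    ks = sorted(degree_values)  # assumption: rank 1 is the smallest degree
    n = len(ks)
    total = sum((2 * r - n - 1) * k for r, k in enumerate(ks, start=1))
    return total / (n * n_edges)


def centralization(edges):
    """Buyer Gini minus seller Gini for one network."""
    buy, sell = degree_maps(edges)
    n_edges = len(set(edges))
    return gini(buy.values(), n_edges) - gini(sell.values(), n_edges)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when undefined (an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assortativity_index(edges):
    """AI = (rho_bb - rho_bs - rho_sb + rho_ss) / 4 over all edges."""
    buy, sell = degree_maps(edges)
    pairs = list(set(edges))
    i_buy = [buy[s] for s, _ in pairs]    # seller's buydegree
    i_sell = [sell[s] for s, _ in pairs]  # seller's selldegree
    j_buy = [buy[b] for _, b in pairs]    # buyer's buydegree
    j_sell = [sell[b] for _, b in pairs]  # buyer's selldegree
    return (pearson(i_buy, j_buy) - pearson(i_buy, j_sell)
            - pearson(i_sell, j_buy) + pearson(i_sell, j_sell)) / 4


# One dominant buyer "X" purchasing from four small sellers.
star = [("s1", "X"), ("s2", "X"), ("s3", "X"), ("s4", "X")]
star_centralization = centralization(star)
```

For the star example the buyer Gini is 0.8 and the seller Gini is 0.2, so centralization is positive, in line with the interpretation of a dominant buyer.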
As Table I shows, the assortativity index in our sample ranges from -0.06 to +0.34, with a mean of 0.04 and standard deviation of 0.04. This positive mean stems in part from the skewed degree distributions. Most buyers have low buydegree and, on average, buy from a seller with high selldegree. Similarly, most sellers have low selldegree and, on average, sell to buyers with high buydegree. This pattern is consistent with a market populated by a small number of highly connected intermediaries who buy and sell from many liquidity-demanding (or informed) traders.

Clustering is a measure of transitivity in the network, i.e. if i trades with j, and j trades with k, clustering measures whether i also trades directly with k. We quantify clustering using the global clustering coefficient (CC) (Newman (2002)):

$$CC = \frac{3 \cdot T_{closed}}{T}$$

where T represents the total number of “connected triples” of three traders (i, j and k) and $T_{closed}$ represents the number of “closed triples” where i trades with j, j trades with k, and i also trades directly with k.12 Economically, the clustering coefficient represents liquidity in the market. Connected triples represent the presence of at least one short-term intermediary. Intuitively, large clustering coefficients represent greater liquidity. In the extreme case where a single trader provides liquidity, no closed triples exist and the clustering coefficient is zero. At the other extreme, where all triples involve three traders connected as a “closed triple”, the clustering coefficient is 3.

12 We treat the edges as undirected in the clustering coefficient since intermediation/liquidity involves both buying and selling. Fagiolo (2007) discusses directed clustering coefficients.

As shown in Table I, the clustering coefficient in our sample ranges from 0.0 to 0.23, with an average of 0.05 and standard deviation of 0.03. On average, there is only a slight tendency for traders to cluster in this market.
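A minimal sketch of the clustering calculation on the undirected version of one network follows. It assumes the standard convention in which the numerator counts triangles (each triangle accounts for three connected triples) and the denominator counts connected triples, that is, paths of length two; under that assumption the value lies between 0 and 1. If closed triples are instead counted once per center trader, the result is three times larger. The function name is hypothetical.

```python
from itertools import combinations


def global_clustering(edges):
    """3 * (number of triangles) / (number of connected triples), undirected."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    triples = 0         # paths a - node - b with a != b
    closed_triples = 0  # such paths where a and b are also linked
    for node, adjacent in neighbors.items():
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbors[a]:
                closed_triples += 1

    triangles = closed_triples // 3  # each triangle is seen from 3 centers
    return 3 * triangles / triples if triples else 0.0


# A triangle A-B-C with one extra trader D attached to C.
example_cc = global_clustering([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
```

A star network, where one trader is the counterparty to everyone else, has connected triples but no triangles, so the sketch returns zero for it.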
Indeed, the average clustering coefficient we observe is nearly identical to the coefficient we would expect by randomly connecting traders with 600 trades.

Dispersion of information or liquidity in the market may be measured using connected components. A connected component is the set of all traders connected with each other through bilateral trading. The largest strongly connected component (LSCC) is the maximum number of traders that can be reached from any other trader by following directed edges. We compute the largest strongly connected component as:

$$LSCC = \frac{LSCC_{max}}{N}$$

where $LSCC_{max}$ is the raw count of traders in the LSCC and N is the total number of traders in the market. This ratio ranges between 1/N (one trader connects all other traders) and 1 (all traders are connected to all other traders). A larger LSCC forms when many traders are both buying and selling, that is when the supply of liquidity is dispersed among many traders. For example, a large strongly connected component is likely to emerge as a result of a large number of limit orders rather than one large market order. Since the LSCC is scaled by the total number of traders in the market, the arrival of concentrated net demand for liquidity will manifest itself in a smaller LSCC. As Table I indicates, the portion of the network occupied by the largest strongly connected component varies significantly, ranging from 0.00 to 0.52, with a mean of 0.09 and standard deviation of 0.06.

1.2 Financial Variables

For each of the 12,032 sampling periods, we also compute market returns, volatility, intertrade duration, trading volume, and more standard liquidity measures including effective bid-ask spreads, signed volume, and buy/sell trade imbalances. These variables are typically utilized in economic studies to describe financial market conditions.
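Returning briefly to the largest strongly connected component of Section 1.1, the ratio of the largest component's size to the number of traders can be sketched as follows. The sketch uses Kosaraju's two-pass depth-first search, which is one standard way to find strongly connected components and is an assumption about implementation, not a description of the software used for the paper. Edges are (seller, buyer) pairs and the function name is hypothetical.

```python
def largest_scc_share(edges):
    """Size of the largest strongly connected component divided by N."""
    nodes = set()
    forward, backward = {}, {}
    for u, v in edges:
        nodes.update((u, v))
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    finished, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward.get(root, [])))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward.get(nxt, []))))
                    break
            else:
                # All successors explored: the node is finished.
                finished.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # each sweep collects exactly one strongly connected component.
    best, assigned = 0, set()
    for root in reversed(finished):
        if root in assigned:
            continue
        assigned.add(root)
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in backward.get(node, []):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        best = max(best, size)

    return best / len(nodes) if nodes else 0.0


# A three-trader cycle plus a chain of two traders hanging off it.
cycle_share = largest_scc_share(
    [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
)
```

When one trader buys from everyone else and nobody trades in both directions, every component has size one and the sketch returns 1/N, matching the lower bound stated above.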
Lastly, given the fact that our clustering and centralization measures may be related to market concentration, we also construct a Herfindahl measure of buy/sell trade concentration during each interval. Descriptive statistics for these financial variables are shown in Table II.13

Market returns contain valuable information about the true underlying price formation process, but may also contain market microstructure noise, measurement errors, and seasonal patterns (Engle (2000)). We compute open-to-close market returns as the log difference between the last and the first transaction price for each network period. We remove the predictable intraday seasonal component from raw returns by regressing returns on a constant and a sequence of dummy variables for each half-hour during the trading day and then use the unexplained term as our measure of market returns.14 As shown in Table II, returns range from -0.15 to +0.20 and are centered on zero with standard deviation of 0.04. Returns exhibit slightly negative skewness.

13 As with our network variables, all financial variables in our sample are stationary and (other than returns) highly persistent, with significant autocorrelation coefficients at 1, 5, and 10 lags.
14 We apply the same technique and use a Fourier flexible form to remove seasonality from all variables and examine alternative close-to-close returns with similar results.

We use three measures to estimate volatility during each sampling interval: absolute returns, squared returns, and the log difference between the maximum and minimum prices (price range). For the results reported below, we use price range as an estimate of volatility.15 Importantly, however, our main results are not affected by the choice of volatility estimator. As shown in Table II, across 12,032 sampling intervals, volatility averages about 0.06 percent, corresponding to an annual volatility of 23.56 percent.
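The return and volatility calculations described above can be sketched as follows. The sketch relies on a standard property of ordinary least squares: when the regressors are a constant and a set of half-hour dummies (with one half-hour omitted to avoid collinearity), the residual for each observation equals its value minus the mean of its half-hour bucket. The function names and the bucket labels are hypothetical, and the Fourier flexible form mentioned in the footnote is not implemented here.

```python
import math
from collections import defaultdict


def open_to_close_return(prices):
    """Log difference between the last and first transaction price."""
    return math.log(prices[-1]) - math.log(prices[0])


def price_range(prices):
    """Log difference between the maximum and minimum price (range volatility)."""
    return math.log(max(prices)) - math.log(min(prices))


def remove_half_hour_means(values, half_hours):
    """Residuals from regressing values on a constant and half-hour dummies.

    half_hours: one bucket label per observation (hypothetical encoding,
    e.g. 0 for 9:30-10:00, 1 for 10:00-10:30, and so on).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, bucket in zip(values, half_hours):
        totals[bucket] += value
        counts[bucket] += 1
    return [value - totals[bucket] / counts[bucket]
            for value, bucket in zip(values, half_hours)]


# Four interval returns falling into two half-hour buckets.
adjusted = remove_half_hour_means([1.0, 3.0, 10.0, 14.0], [0, 0, 1, 1])
```

Each bucket's adjusted values sum to zero, which is what removing a purely time-of-day effect should produce.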
Volatility ranges from 0.02 to 0.29 percent, with a standard deviation of just 0.02 percent.

Trading volume contains valuable information about the underlying price formation process, because volume together with observed transaction prices may be driven by a common latent factor.16 We compute trading volume as the number of contracts both bought and sold during each network interval. Volume ranges from 1095 to 7706 with an average of 2629 and standard deviation of 636 contracts.

Intertrade duration can be interpreted as a proxy for the arrival of new information or liquidity to the market (Engle and Russell (1998) and Engle (2000)). We compute three measures of duration: total period duration, volume-weighted period duration, and the average of the 599 intertrade durations. We report results for total period duration, the time elapsed between the start and end of the network interval. Our main results are not affected by the choice of duration estimator. As shown in Table II, the intertrade duration ranges from zero to 344 seconds. In our sample, 600 transactions occur every 40.8 seconds, on average.

15 Range-based volatility estimators are more efficient than return-based volatility estimators (see, e.g., Parkinson (1980), Garman and Klass (1980), Beckers (1983), and Brunetti and Lildholdt (2006)). Christensen and Podolskij (2007) use the price range to compute realized volatility in high frequency data.
16 The vast literature on the properties of trading volume includes Clark (1973), Epps and Epps (1976), Tauchen and Pitts (1983), Admati and Pfleiderer (1988), Easley and O'Hara (1992), and Andersen (1996), among others.

Liquidity reflects the ease with which a security can be bought or sold without a significant price effect. Unlike trading volume and intertrade duration, liquidity is not directly observable and has multiple dimensions that are difficult to capture in a single measure.
Indeed, as discussed above, our network variables capture some of these dimensions. We also measure more traditional liquidity metrics, like effective spreads, signed volume and trade imbalances.17 The effective spread captures the implied cost of trading and is equal to twice the square root of the first order autocovariance of returns over each interval (Stoll (1978)). Signed volume (buy minus sell volume) and trade imbalances (number of buys minus sells) capture the propensity for buy and sell orders to match up during the interval, another dimension of liquidity.

The e-mini S&P 500 contract is extremely liquid. Table II shows that the average effective spread is less than 0.8 cents, with a maximum effective spread of just over two cents during our sample period. The raw Amihud illiquidity measure (unreported) also has mean, median, and standard deviation very close to zero; by this measure, it takes about 21 contracts (or over $1 million) to move prices by one tick (0.25 index points). While mean and median signed volume and trade imbalances are very small, the market can be imbalanced, with maximum signed volume exceeding 5000 contracts and maximum buy imbalance at 13 trades. These extreme cases suggest that market liquidity can be stressed during the short intervals we study, giving us reason to explore whether liquidity metrics might also presage returns, volatility and volume below.

Table II also displays a Herfindahl measure computed over each trading interval to compare with the clustering and centralization metrics we apply from network statistics. As shown, the Herfindahl trades statistic is less variable than most other variables, with a minimum of 0.0002, maximum of 0.0013 and standard deviation of just 0.0001.18

17 We also compute Amihud illiquidity measures using both contract volume and dollar trading volume with nearly identical results.
18 In unreported results we also construct a Herfindahl volume metric with nearly identical results.

2. Illustrative Examples

Prior to conducting the time-series statistical analysis for the full sample of the network and financial variables, we present statistics from representative trading networks to illustrate the key concepts and the intuition behind our approach. Figure 1 presents three actual trading networks from our data, accompanied by network statistics and financial variables. The three trading networks are chosen to display relatively high values of centralization (left column), assortativity (middle column), and both clustering and the largest strongly connected component (right column).

The left column of Figure 1 presents a network with one dominant seller (perhaps an informed trader or trader with strong liquidity demand) matched with many small buyers. As shown in Panel A, this trading network has a large negative centralization coefficient, intermediate levels of assortativity and LSCC, and a small clustering coefficient. As shown in Panel B, this network is also associated with a large negative market return, high volatility, intermediate trading volume, and low duration. Liquidity, volatility, duration and return statistics suggest that strong liquidity demand drains a significant portion of market liquidity and makes a considerable price impact. Indeed, the signed volume and trade imbalance are accompanied by a large negative return, high volatility, and short intertrade duration, suggesting a large market order which quickly “walks the book.”

The center column of Figure 1 presents a trading network with a relatively large buyer trading with several moderately-sized sellers through a number of intermediaries. As shown in Panel A, this network is characterized by intermediate centralization and clustering levels, suggesting that the net demand for liquidity is relatively balanced, with slightly negative signed volume but positive trade imbalance.
Moreover, the high assortativity index means that large traders are mostly matched with many small traders (rather than with each other); and the small LSCC suggests that liquidity is provided by just a few intermediaries. As shown in Panel B, this network is associated with good liquidity and relatively low effective spreads. Furthermore, this trading network is characterized by high intertrade duration and relatively low volume, so these 600 trades are relatively small and executed quickly. This network is associated with intermediate positive market returns and intermediate volatility levels. Overall, these market conditions represent a handful of intermediaries supplying liquidity to smaller traders resulting in low volatility, low effective spreads and slightly positive returns.

The right column of Figure 1 presents a trading network with greater dispersion among buyers and sellers of various sizes, reflected in the zero centralization metric (no dominant buyer or seller exists) and low assortativity (“like traders trade with like”). High levels of clustering and LSCC suggest that the net supply of liquidity is dispersed among many. Indeed, this trading network exhibits zero returns accompanied by good liquidity: low signed volume imbalance, zero trade imbalance and relatively large trading volume.

These three examples illustrate our conjecture that network statistics and standard financial variables capture different dimensions of market conditions. Indeed, since information and liquidity appear to drive both network and financial variables, the two sets of variables are likely to be statistically interrelated as well. We formally examine the statistical relation between network metrics and traditional financial variables below.

3. The Statistical Relation between Network and Financial Variables

We first examine contemporaneous correlations among network and financial variables and then test for lead-lag relations between and among these variables.
Our ultimate goal is to discover whether network analysis tools can serve to improve our understanding of financial market conditions beyond what more typical financial descriptors provide.

3.1 Correlations

Panel A in Table III reports contemporaneous correlations among network variables. Although centralization is not correlated with other network variables, the other network variables are often strongly contemporaneously correlated with each other. |Centralization| is negatively correlated with both the clustering coefficient and the LSCC. Intuitively, this means a dominant trader increases net demand for market liquidity and counterparties to that dominant trader are less likely to trade with each other. Conversely, the high positive correlation between clustering and LSCC suggests that dispersed net liquidity supply among many traders is often accompanied by strong liquidity demand. The assortativity index is relatively uncorrelated with centralization measures and clustering, but is strongly negatively correlated with the LSCC. Intuitively, both higher assortativity and lower LSCC reflect greater liquidity imbalance.

Panel B in Table III examines this economic intuition more closely with contemporaneous correlations between financial and network variables. Consistent with our illustrative examples above, centralization is strongly correlated with returns, signed volume and trade imbalances. At the same time, while |centralization| is positively correlated with volume and volatility, it is negatively correlated with duration, effective spreads and trade concentration. Intuitively, a large order (with high |centralization|) has a significant chance to walk the limit order book, resulting in higher volatility and volume with lower intertrade duration.
Higher asymmetric information or greater liquidity imbalance, as represented by the assortativity index, is negatively correlated with volume and volatility, but largely unrelated to more traditional market metrics. The former suggests that higher asymmetric information or liquidity imbalance is accompanied by a slight reduction in both volatility and trading volume. The latter, however, suggests that asymmetric information, as represented by assortativity, is not related to signed volume or trade imbalances. Clustering is positively correlated with volume and negatively correlated with effective spreads and volatility. Intuitively, since clustering captures short-term market making activities, greater intermediation improves liquidity, smooths volatility and enhances trading volume. Clustering is also highly negatively correlated with trade concentration (Herfindahl trades). The LSCC, the dispersion of net liquidity supply, is positively correlated with both volume and volatility. The LSCC, however, is not significantly correlated with other more traditional financial market variables.

3.2 Lead-Lag Relations

Correlations offer evidence that some network and financial variables are contemporaneously related and suggest that network variables may be redundant descriptors of market conditions. In this section, we examine the lead-lag relations among network and financial variables to gauge the incremental value in calculating non-standard network metrics. As noted above, we conjecture that network metrics serve as primitive measures of information and liquidity. To explore whether network variables presage or follow standard financial variables we apply standard Granger causality tests in vector autoregressive (VAR) models.19

Table IV provides the p-values of Granger-non-causality tests among the five network variables. Centralization neither Granger-causes nor is Granger-caused by other network variables (p-values equal to 0.65 and 0.37, respectively).
As a system, however, the remaining network variables are both jointly Granger-caused by (and jointly Granger-cause) each other (p-values between 0.00 and 0.05), indicating strong feedback effects. Indeed, most pair-wise tests are also significant.

19 Since the variables exhibit heteroskedasticity and serial correlation, we estimate VAR models using generalized method of moments (GMM) and Newey-West robust standard errors. Standard tests (available upon request) show that the VAR model must include all five network variables. The Akaike and Schwarz Information Criteria indicate an optimal lag-length of twelve.

Table V presents p-values for the Granger-non-causality tests between network variables and each financial variable both jointly and independently. In Panel A, we find that returns and volatility are jointly both Granger-caused by and Granger-cause network variables (p-values of 0.00 and 0.01 for returns and 0.00 and 0.00 for volatility, respectively). Centralization drives the result for returns, with strong bi-directional Granger-causality. For volatility, we find strong bidirectional Granger-causality (representing feedback effects) between volatility and each of our four network variables independently.

Panel B in Table V reports test results for intertrade duration and volume. We find strong evidence of one-way causality: duration and volume are both Granger-caused by each network variable, both individually and jointly (joint p-value = 0.00). While duration leads the LSCC and volume leads assortativity independently, neither duration nor volume leads network variables jointly (p-values = 0.20 and 0.16, respectively). These results indicate that network statistics presage both duration and trading volume. To the extent that duration and volume proxy for information arrival, network statistics reflect forthcoming changes in market conditions.
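To make the mechanics of a Granger non-causality test concrete, the following is a minimal sketch for a single pair of series. It is not the estimation used in the paper: it swaps the GMM estimation with Newey-West standard errors for a plain OLS F-test, uses two lags rather than twelve, and runs on synthetic data in which a hypothetical series `x` leads a hypothetical series `y` by one interval. The function names `lagged` and `granger_f` are illustrative.

```python
import numpy as np


def lagged(series, p):
    """Return an (n - p, p) matrix whose columns are lags 1..p of `series`."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def granger_f(cause, effect, p):
    """F statistic for H0: lags 1..p of `cause` do not help predict `effect`.

    Simplification (assumption): ordinary least squares with homoskedastic
    errors, not the GMM / Newey-West estimation described in the text.
    """
    y = effect[p:]
    const = np.ones((len(y), 1))
    restricted = np.hstack([const, lagged(effect, p)])        # own lags only
    unrestricted = np.hstack([restricted, lagged(cause, p)])  # plus lags of `cause`

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(y) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic example: x leads y by one interval, y does not lead x.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
noise = rng.standard_normal(n)
y = np.empty(n)
y[0] = noise[0]
y[1:] = 0.8 * x[:-1] + noise[1:]

f_xy = granger_f(x, y, p=2)  # tests whether x Granger-causes y
f_yx = granger_f(y, x, p=2)  # tests whether y Granger-causes x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

A large F statistic in one direction and a small one in the other corresponds to the one-way pattern reported for duration and volume; large statistics in both directions would correspond to the feedback reported for returns and volatility.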
Lastly, Panels C and D show that network variables jointly Granger-cause signed volume, trade concentration, effective spreads and trade imbalances. These strong results suggest that the demand and supply of liquidity are reflected in network variables prior to emerging in more traditional liquidity measures. While individual pair-wise tests vary significantly, we find no feedback effects here. Taken as a whole, these network statistics serve to presage future, short-term changes in both liquidity and market concentration.

For comparison, Table VI displays bi-variate VAR Granger-causality tests between our liquidity measures and the traditional market statistics of returns, volatility, volume and duration. Contrary to results obtained with network statistics, we find bi-directional causality among many of these variables, suggesting that they are jointly determined in the market during the same trading interval. Effective spreads, in particular, exhibit bi-directional relations with volatility, volume and duration. Trade imbalances (Panel B) are unrelated to these other market statistics.

Interestingly, we find no lead-lag in either direction between these traditional liquidity measures and returns, in stark contrast to the strong bidirectional effects between network statistics and returns. Centralization, in particular, is strongly related to returns (Table V) while market concentration (Table VI) is not significantly related to returns at all. These results confirm that our network metrics capture an element of market conditions that is simply not found in traditional market descriptors.

3.3 Robustness

The links between network and financial variables are remarkably robust with respect to different markets, observation periods, sampling frequencies and levels of aggregation (defining nodes as either individual accounts or executing brokers).
We replicate correlations and lead-lag results for both the e-mini Nasdaq 100 and e-mini Dow Jones stock index futures contracts. We also find qualitatively similar results for these three equity futures contracts during the individual months of May 2008, August 2008 and August 2009. The results also hold for networks constructed with sampling windows of 240, 600 and 1,200 transactions during each month, suggesting that our results are not sample-specific, market-specific, or specific to the trade construction of the network.

4. An Agent-based Simulation Model Benchmark

The bi-directional Granger-causality that we find between network and financial variables suggests strong feedback effects in short intervals. In order to explore the source of this feedback, we construct an agent-based simulation model of a limit order market that is devoid of a feedback mechanism. The simulated model allows for heterogeneous beliefs about the price process, but imparts no intentionality or memory upon the traders. Consequently, we might expect to find significant correlations among network and financial variables, but not necessarily evidence of Granger-causality.

In the simulation we endow a fixed number of traders with orders which arrive with a Poisson distributed inter-arrival time and are assigned at random. The quantity for each order is drawn from a lognormal distribution, and each order has an equal probability of being a buy or a sell order. For a sell order the price is set a small fixed number above the last transaction price plus a log-normally distributed, mean zero random variable. For a buy order, the price is set a small fixed number below the last transaction price plus a log-normally distributed, mean zero random variable. The (log-normally distributed, zero mean) random variable added to the order price is consistent with an equilibrium price function under the assumption of heterogeneous beliefs about the true price process (see Scheinkman and Xiong (2003)).
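The order-arrival rules just described can be sketched as follows. This is an illustration under stated assumptions, not the simulation code behind the reported results: the parameter values, the `Order` fields, and the function name `generate_orders` are all hypothetical; arrivals are modeled as a Poisson process (exponential gaps between orders); and the "log-normally distributed, mean zero" price term is read as a log-normal draw re-centered to have mean zero. The last transaction price is held fixed here because no matching engine is attached.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Order:
    time: float     # arrival time
    trader: int     # trader the order is randomly assigned to
    side: str       # "buy" or "sell"
    quantity: int   # number of contracts
    price: float    # limit price


def generate_orders(n_orders, n_traders=100, last_price=1000.0, rate=1.0,
                    offset=0.25, sigma_qty=1.0, sigma_price=0.5, seed=0):
    """Draw a stream of random limit orders (all parameter values are assumptions)."""
    rng = random.Random(seed)
    t = 0.0
    orders = []
    for _ in range(n_orders):
        t += rng.expovariate(rate)              # exponential gap between arrivals
        trader = rng.randrange(n_traders)       # order assigned to a trader at random
        side = "buy" if rng.random() < 0.5 else "sell"
        quantity = max(1, round(rng.lognormvariate(0.0, sigma_qty)))
        # Assumption: subtract the log-normal mean exp(sigma^2 / 2) so the
        # price disturbance has mean zero.
        noise = rng.lognormvariate(0.0, sigma_price) - math.exp(sigma_price ** 2 / 2)
        base = last_price + offset if side == "sell" else last_price - offset
        orders.append(Order(t, trader, side, quantity, base + noise))
    return orders


orders = generate_orders(1000)
buys = sum(o.side == "buy" for o in orders)
print(f"{len(orders)} orders, {buys} buys, {len(orders) - buys} sells")
```

Because sell prices are centered slightly above and buy prices slightly below the last transaction price, most orders rest in the book; only orders with a sufficiently large price disturbance cross the spread and trade immediately.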
As our simulation starts, orders begin to populate the limit order book. Each incoming order is compared to previously placed orders by price and time priority. If a match between a buy and sell order is made, a transaction takes place. If an order is only partially filled, we attempt to match the remaining quantity against other orders on the book, and if no match is found, the order remains in the limit order book. In order to avoid stale limit orders, each order expires and is withdrawn from the market after 100 subsequent transactions are executed.

We simulate orders until we attain 7.2 million transactions, segment the data into 12,000 networks of 600 consecutive transactions, and compute network and financial variables for each sampling interval using the same methods applied to the empirically observed data. We then analyze both correlations and lead-lag relations among network and financial variables to compare with actual market results.

Generally, we find that the simulated executions generate contemporaneous correlations similar to, but less significant than, those obtained from the real market data. The correlations between network variables and between network and financial variables are displayed in Table VII. As with the market data, we find strong correlations between returns and centralization. The simulation also sheds light on the possible sources of this high correlation. We find that the mechanics of the impatient order submission strategy raise the correlation: when a large buy order at a high price is matched against several existing sell orders, network centralization is high. Matching a large number of sell orders to a single buy order is more likely to reflect the large buy order walking up the book, so contemporaneous returns are also high. Generally, correlations between volatility and |centralization|, and between the Herfindahl index and |centralization|, are also relatively high in the simulated networks.
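The matching rules and the walking-the-book mechanism described above can be illustrated with a small sketch. It is a simplified illustration, not the simulation code used for the reported results: the `Order` and `Book` classes are hypothetical, time priority is approximated by an increasing `order_id`, each trade is assumed to execute at the resting order's price, and expiry is checked only when a new order arrives.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int   # assumed to increase with arrival time (time priority)
    trader: int
    side: str       # "buy" or "sell"
    quantity: int
    price: float
    born: int = 0   # number of trades executed when the order entered the book


class Book:
    def __init__(self, lifetime=100):
        self.lifetime = lifetime  # orders expire after this many subsequent trades
        self.resting = []         # unfilled orders waiting in the book
        self.trades = []          # (buyer, seller, quantity, price)

    def _expire(self):
        now = len(self.trades)
        self.resting = [o for o in self.resting if now - o.born < self.lifetime]

    def submit(self, order):
        self._expire()
        order.born = len(self.trades)
        if order.side == "buy":
            matches = [o for o in self.resting
                       if o.side == "sell" and o.price <= order.price]
            matches.sort(key=lambda o: (o.price, o.order_id))   # lowest ask first
        else:
            matches = [o for o in self.resting
                       if o.side == "buy" and o.price >= order.price]
            matches.sort(key=lambda o: (-o.price, o.order_id))  # highest bid first
        for resting in matches:
            if order.quantity == 0:
                break
            qty = min(order.quantity, resting.quantity)
            buyer, seller = (order, resting) if order.side == "buy" else (resting, order)
            # Assumption: the trade executes at the resting order's price.
            self.trades.append((buyer.trader, seller.trader, qty, resting.price))
            order.quantity -= qty
            resting.quantity -= qty
        self.resting = [o for o in self.resting if o.quantity > 0]
        if order.quantity > 0:
            self.resting.append(order)  # any unfilled remainder rests in the book


book = Book()
book.submit(Order(1, 1, "sell", 5, 100.00))
book.submit(Order(2, 2, "sell", 5, 100.25))
book.submit(Order(3, 3, "sell", 5, 100.50))
book.submit(Order(4, 9, "buy", 12, 100.50))  # one large, aggressively priced buy
for trade in book.trades:
    print(trade)
```

In this example the single buyer (trader 9) trades against three different sellers at successively higher prices, so the resulting network has one hub on the buy side and the interval return is positive, which is the mechanical link between centralization and returns noted in the text.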
These results suggest that the order matching process can generate contemporaneous correlations between financial and network variables in the absence of any economic motivation (other than the prescribed urgency to trade built in to simulate perishable information). However, although our simulations prescribe order execution rules and order cancellation times, these simulations do not allow for dynamic trading strategies that adapt to changing market conditions. As expected, Granger-causality tests among simulated network and financial variables lack the dynamic structure found in live market data.20 Indeed, Granger-causality tests among network and financial variables are generally insignificant and yield few feedback effects, indicating a very poor fit. This suggests that the Granger-causality results that we find in the actual market data arise as a result of the strategic behavior of traders and are not simply artifacts of the order matching process.

20 Reflecting the lack of dynamics in the simulated data, the AIC (SIC) selects an optimal lag-length of order one (zero) in the VAR specification. We use lag-length of order one.

This evidence further suggests that network statistics uniquely characterize information and liquidity dynamics in financial markets. While simulations generate similar correlations between network and traditional market variables, they cannot replicate the lead-lag relations between these variables. Further, the strong Granger-causality running from network to traditional market variables in real markets is not an artifact of the order matching process. Rather, these simulation results suggest that the real market lead-lag relations reflect latent order submission strategies in financial markets, strategies that manifest themselves in the composition of the network prior to the advent of volume or liquidity changes.

5. Conclusions

We use audit trail data that identifies trading accounts on both sides of every trade to construct the links between buyers and sellers in the e-mini S&P 500 stock index futures contract, the price discovery venue for the U.S. equity market. We construct a time series of more than 12,000 trading networks each comprised of 600 sequential trades and use established network analysis tools to empirically link network variables with financial variables that describe market conditions. We find that network statistics are contemporaneously correlated with these financial variables, suggesting that the underlying economics that drive financial variables also drive network parameters. Importantly, network variables capture information, information asymmetry and liquidity characteristics in the e-mini S&P 500 futures market that we study.

While network and financial market variables are contemporaneously correlated, network variables are primitive determinants of market conditions, emerging prior to many more common measures of information and liquidity. For example, we find that network statistics strongly Granger-cause trading volume, intertrade duration, effective spreads, trade imbalances and signed volume imbalances. Moreover, we document bi-directional Granger-causality between network variables and both returns and volatility, suggesting that network variables also capture complex feedback mechanisms that underlie trading strategies in financial markets.

We conjecture that market-wide patterns of order execution are informative about the price formation process in order driven markets and these dynamics are manifest in both financial and network variables. Given the fact that network variables presage volume, duration and other market liquidity measures, we conclude that trading network analysis offers a fruitful mechanism for assessing the underlying strategies that drive price formation and liquidity supply in financial markets.
Given the strong contemporaneous link between financial and network variables, we believe that network analysis tools might also offer new insights into the behavior of financial markets. The data we employ here differ from the typical social network setting where connections are made intentionally. The trade matching algorithm in these futures markets simply pairs up buyers and sellers with price-time priority. Whether our results hold in settings where orders are executed via preferencing arrangements, across competitive trade execution venues (like the U.S. equity markets), in floor-based exchanges, or in markets where internalization or payments/rebates for order flow are possible remain questions for further research.

References

Admati, Anat R., and Paul Pfleiderer, 1988, A theory of intraday patterns: Volume and price variability, Review of Financial Studies 1, 3-40.
Allen, Franklin, and Ana Babus, 2008, Networks in finance, Working Paper 08-07, Wharton Financial Institutions Center, University of Pennsylvania.
Allen, Franklin, Ana Babus, and Elena Carletti, 2010, Asset commonality, debt maturity and systemic risk, Journal of Financial Economics, forthcoming.
Amihud, Yakov, 2002, Illiquidity and stock returns: Cross-section and time-series effects, Journal of Financial Markets 5, 31-56.
Andersen, Torben G., 1996, Return volatility and trading volume: An information flow interpretation of stochastic volatility, Journal of Finance 51, 169-204.
Andersen, Torben G., Tim Bollerslev, Francis X. Diebold, and Paul Labys, 2000, Great realizations, Risk 13, 105-108.
Babus, Ana, 2009, The formation of financial networks, University of Cambridge working paper.
Back, Kerry, and Shmuel Baruch, 2007, Working orders in limit-order markets and floor exchanges, Journal of Finance 61, 1589-1621.
Bandi, Federico M., and Jeffrey R. Russell, 2006, Separating microstructure noise from volatility, Journal of Financial Economics 79, 655-692.
Beckers, Stan, 1983, Variance of security price returns based on high, low and closing prices, Journal of Business 56, 97-112.
Biais, Bruno, David Martimort, and Jean-Charles Rochet, 2000, Competing mechanisms in a common value environment, Econometrica 68, 799–837.
Biais, Bruno, and Pierre-Olivier Weill, 2009, Liquidity shocks and order book dynamics, NBER working paper #15009.
Blume, Larry, David Easley, Jon Kleinberg, and Eva Tardos, 2009, Trading networks with price-setting agents, Games and Economic Behavior 67, 36-50.
Boginski, Vladimir, Sergiy Butenko, and Panos M. Pardalos, 2005, Statistical analysis of financial networks, Computational Statistics and Data Analysis 48, 431-443.
Brunetti, Celso, and Peter M. Lildholdt, 2006, Relative efficiency of return- and range-based volatility estimators, Johns Hopkins University working paper.
Castiglionesi, Fabio, and Noeli Navarro, 2010, Optimal fragile financial networks, Tilburg University working paper.
Caldarelli, Guido, and Michele Catanzaro, 2004, The corporate boards networks, Physica A-Statistical Mechanics and its Applications 338, 98-106.
Christensen, Kim, and Mark Podolskij, 2007, Realised range-based estimation of integrated variance, Journal of Econometrics 141, 323-349.
Clark, Peter K., 1973, A subordinated stochastic process model with finite variance for speculative prices, Econometrica 41, 135-155.
Cohen-Cole, Ethan, Andrei A. Kirilenko, and Eleonora Patacchini, 2011, How Your Counterparty Matters: Using Transaction Networks to Explain Returns in CCP Marketplaces, University of Maryland working paper.
Colla, Paolo, and Antonio Mele, 2009, Information linkages and correlated trading, Review of Financial Studies 23, 203-246.
Copeland, Thomas E., and Dan Galai, 1983, Information effects on the bid-ask spread, Journal of Finance 38, 1457-1469.
Corwin, Shane A., and Paul H. Schultz, 2010, A simple way to estimate bid-ask spreads from daily high and low prices, University of Notre Dame working paper.
Easley, David, and Maureen O’Hara, 1992, Adverse selection and large trade volume: the implications for market efficiency, Journal of Financial and Quantitative Analysis 27, 185-208.
Engle, Robert F., 2000, The econometrics of ultra-high-frequency data, Econometrica 68, 1-22.
Engle, Robert F., and Jeffrey R. Russell, 1998, Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66, 1127-1162.
Epps, Thomas W., and Mary Lee Epps, 1976, The stochastic dependence of security price changes and transaction volumes: Implications for the mixture-of-distribution hypothesis, Econometrica 44, 305-321.
Fagiolo, Giorgio, 2007, Clustering in complex directed networks, Physical Review E 76, 26107.
Fenn, Daniel J., Mason A. Porter, Stacy Williams, Mark McDonald, Neil F. Johnson, and Nick S. Jones, 2010, Temporal evolution of financial market correlations, preprint: http://arxiv.org/abs/1011.3225.
Foster, F. Douglas, and S. Viswanathan, 1990, A theory of interday variations in volumes, variances and trading costs in securities markets, Review of Financial Studies 4, 595-624.
Foster, F. Douglas, and S. Viswanathan, 1993, Variations in trading volume, return volatility and trading costs: Evidence on recent price formation models, Journal of Finance 48, 187-211.
Foster, F. Douglas, and S. Viswanathan, 1994, Strategic trading with asymmetrically informed traders and long-lived information, Journal of Financial and Quantitative Analysis 29, 499-518.
Foucault, Thierry, 1999, Order flow composition and trading costs in a dynamic limit order market, Journal of Financial Markets 2, 99–134.
Foucault, Thierry, Ohad Kadan, and Eugene Kandel, 2005, The limit order book as a market for liquidity, Review of Financial Studies 18, 1171–1217.
Garman, Mark B., and Michael J. Klass, 1980, On the estimation of security price volatilities from historical data, Journal of Business 53, 67-78.
George, Thomas J., Gautam Kaul, and M. Nimalendran, 1991, Estimation of the bid-ask spread and its components: A new approach, Review of Financial Studies 4, 623-656.
Glosten, Lawrence R., 1987, Components of the bid-ask spread and the statistical properties of transaction prices, Journal of Finance 42, 1293-1307.
Glosten, Lawrence R., and Paul R. Milgrom, 1985, Bid, ask and transaction prices in a specialist market with heterogeneously informed traders, Journal of Financial Economics 14, 71-100.
Goettler, Ronald L., Christine A. Parlour, and Uday Rajan, 2005, Equilibrium in a dynamic limit order market, Journal of Finance 60, 2149–2192.
Goettler, Ronald L., Christine A. Parlour, and Uday Rajan, 2009, Informed traders and limit order markets, Journal of Financial Economics 93, 67-87.
Hansen, P. Reinhard, and Asger Lunde, 2006, Realized variance and market microstructure noise, Journal of Business and Economic Statistics 24, 127-218.
Hasbrouck, Joel, 2003, Intraday price formation in U.S. equity index markets, Journal of Finance 58, 2375-2400.
Jackson, Matthew O., 2007, The study of social networks in economics, The missing links: Formation and decay of economic networks, James E. Rauch, editor, Russell Sage Foundation: New York.
Jackson, Matthew O., 2009, Networks and economic behavior, Annual Review of Economics 1, 489-513.
Jarque, Carlos M., and Anil K. Bera, 1980, Efficient tests for normality, homoscedasticity and serial independence of regression residuals, Economics Letters 6, 255–259.
Kyle, Albert S., 1985, Continuous auctions and insider trading, Econometrica 53, 1315-1335.
Large, Jeremy, 2009, A market-clearing role for inefficiency on a limit order book, Journal of Financial Economics 91, 102-117.
Leitner, Yaron, 2005, Financial networks: Contagion, commitment, and private sector bailouts, Journal of Finance 60, 2925-2953.
Madhavan, Ananth, Matthew Richardson, and Mark Roomans, 1997, Why do security prices change?
A transaction-level analysis of NYSE stocks, Review of Financial Studies 10, 1035-1064.
Newman, Mark E. J., 2002, Assortative mixing in networks, Physical Review Letters 89, 208701.
Newman, Mark E. J., 2010, Networks: An introduction, Oxford University Press.
Onnela, Jukka-Pekka, Kimmo Kaski, and János Kertész, 2004, Clustering and information in correlation based financial networks, European Physical Journal B 38, 353-362.
Ozsoylev, Han N., and Johan Walden, 2009, Asset pricing in large information networks, University of Oxford working paper.
Parkinson, Michael, 1980, The extreme value method for estimating the variance of rate of return, Journal of Business 53, 61-65.
Parlour, Christine A., 1998, Price dynamics in limit order markets, Review of Financial Studies 11, 789-816.
Parlour, Christine A., and Duane J. Seppi, 2003, Liquidity-based competition for order flow, Review of Financial Studies 16, 301-343.
Rosu, Ioanid, 2009, A dynamic model of the limit order book, Review of Financial Studies 22, 4601-4641.
Scheinkman, Jose A., and Wei Xiong, 2003, Overconfidence and speculative bubbles, Journal of Political Economy 111, 1183-1219.
Stoll, H., 1978, The supply of dealer services in securities markets, Journal of Finance 33, 1133-1151.
Stoll, Hans R., 1989, Inferring the components of the bid-ask spread: Theory and empirical tests, Journal of Finance 44, 115-134.
Tauchen, George E., and Mark Pitts, 1983, The price variability-volume relationship on speculative markets, Econometrica 51, 485-505.

Panel A: Network Metrics
                      Left column   Center column   Right column
Centralization        -0.736        0.044           0.000
|Centralization|       0.736        0.044           0.000
Assortativity          0.097        0.297           0.011
Clustering             0.003        0.009           0.170
LSCC                   0.013        0.005           0.333

Panel B: Traditional Financial Variables
                      Left column   Center column   Right column
Returns               -0.100        0.051           0.000
Volatility             0.100        0.051           0.051
Volume                 2325         2304            3832
Duration               10           19              14
Effective Spread       0.007        0.005           0.009
Signed Volume         -2151         -557            -170
Trade Imbalance       -4.0          2.0             0.0
Herfindahl Trades      7.50         7.94            9.56

Figure 1. Trading Network Statistics
This figure displays three examples of trading networks comprised of 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Centralization, |Centralization|, assortativity, clustering and the largest strongly connected component (LSCC) are computed as defined in Section I. Returns are open-to-close returns computed as differences between log ending and beginning prices during each interval. Volatility is the range between high and low prices, volume is the total number of contracts traded and duration is the time (in seconds) elapsed between the start and end of the network interval. Effective Spread is equal to twice the square root of the first order autocovariance of returns over each interval (x10-3). Signed Volume is equal to the buy volume minus the sell volume. Trade Imbalance is equal to the number of buy trades less the number of sell trades in each interval. Herfindahl Trades is the Herfindahl index computed using the number of trades traded by each trader over each interval (x10-3).

Table I: Network Variables: Summary Statistics
The sample contains summary statistics for 12,032 networks each comprised of 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Centralization, |centralization|, assortativity, clustering and the largest strongly connected component (LSCC) are network variables computed as defined in Section I. ADF probability refers to the p-value of the ADF test for the null of unit root. Terms in [brackets] below autocorrelation coefficients refer to p-values of the Portmanteau Q-test for no serial correlation at 1, 5, and 10 lags.

                          Centralization   |Centralization|   Assortativity   Clustering   LSCC
Mean                       0.0022           0.1373             0.0424          0.0524       0.0912
Median                     0.0037           0.1177             0.0321          0.0483       0.0857
Maximum                    0.6499           0.7356             0.3385          0.2250       0.5217
Minimum                   -0.7356           0.0000            -0.0557          0.0004       0.0024
Std. Dev.                  0.1711           0.1021             0.0391          0.0282       0.0641
Skewness                  -0.0200           0.9944             1.5562          0.9605       0.7500
Kurtosis                   2.9827           4.0422             6.2788          4.4588       3.6643
ADF probability            0.0000           0.0001             0.0000          0.0000       0.0000
Autocorrelation Lag 1      0.007            0.003              0.120           0.197        0.273
                          [0.462]          [0.779]            [0.000]         [0.000]      [0.000]
Autocorrelation Lag 5      0.079            0.013              0.074           0.085        0.132
                          [0.000]          [0.000]            [0.000]         [0.000]      [0.000]
Autocorrelation Lag 10     0.060            0.029              0.058           0.055        0.078
                          [0.000]          [0.000]            [0.000]         [0.000]      [0.000]

Table II: Financial Variables: Summary Statistics
The sample contains summary statistics for 12,032 networks each comprised of 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Returns are open-to-close network returns computed as differences between log ending and beginning prices during each interval. Volatility is the range between high and low prices, volume is the total number of contracts traded and duration is the time (in seconds) elapsed between the start and end of the network interval. Effective Spread is equal to twice the square root of the first order autocovariance of returns, Stoll (1978), over each interval (x10-3). Signed Volume is equal to the buy volume minus the sell volume. When the price is constant (zero returns), buy/sell volume is equal to that of the previous interval. Trade Imbalance is equal to the number of buy trades less the number of sell trades in each interval. Herfindahl Trades is the Herfindahl index computed using the number of trades traded by each trader over each interval (x10-3). ADF probability refers to the p-value of the ADF test for the null of unit root. Terms in [brackets] below autocorrelation coefficients refer to p-values of the Portmanteau Q-test for no serial correlation at 1, 5, and 10 lags.
                          Returns    Volatility   Volume    Duration   Effective   Signed     Trade       Herfindahl
                                                                       Spread      Volume     Imbalance   Trades
Mean                       0.0002    0.0621       2628.7    40.775     7.9369      13.804     0.0145      0.0842
Median                     0.0000    0.0507       2521.0    28.000     7.9046      18.000     0.0000      0.0836
Maximum                    0.1992    0.2901       7706.0    344.00     21.256      5171.0     13.000      0.1306
Minimum                   -0.1493    0.0241       1095.0    0.0000     0.0011     -4592.0    -8.0000      0.0242
Std. Dev.                  0.0383    0.0191       636.06    37.926     2.1513      1219.3     1.5679      0.0126
Skewness                  -0.0116    0.9046       1.9566    1.9630     0.1778      0.0462     0.0615      0.1456
Kurtosis                   2.7977    7.0270       11.016    8.1385     3.7118      2.9539     3.5352      3.1088
ADF probability           <=0.0001   <=0.0001     <=0.0001  <=0.0001   <=0.0001    <=0.0001   <=0.0001    <=0.0001
Autocorrelation Lag 1     -0.0121    0.1747       0.5211    0.5483     0.1867      0.1985    -0.0113      0.3796
                          [0.185]    [0.000]      [0.000]   [0.000]    [0.000]     [0.000]    [0.213]     [0.000]
Autocorrelation Lag 5     -0.0136    0.1392       0.3378    0.3805     0.1263      0.0364    -0.0080      0.1840
                          [0.363]    [0.000]      [0.000]   [0.000]    [0.000]     [0.000]    [0.073]     [0.000]
Autocorrelation Lag 10     0.0027    0.1076       0.2351    0.3299     0.0992      0.0064     0.0054      0.1249
                          [0.289]    [0.000]      [0.000]   [0.000]    [0.000]     [0.000]    [0.115]     [0.000]

Table III: Correlations between Financial and Network Variables
This table contains pairwise correlations calculated from 12,032 networks of 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Centralization, |centralization|, assortativity, clustering and the largest strongly connected component (LSCC) are network variables computed as defined in Section I. Returns are open-to-close network returns computed as differences between log ending and beginning prices during each interval. Volatility is the range between high and low prices, volume is the total number of contracts traded and duration is the time (in seconds) elapsed between the start and end of the network interval. Effective Spread is equal to twice the square root of the first order autocovariance of returns, Stoll (1978), over each interval (x10-3). Signed Volume is equal to the buy volume minus the sell volume.
Trade Imbalance is equal to the number of buy trades less the number of sell trades in each interval. Herfindahl Trades is the Herfindahl index computed using the number of trades traded by each trader over each interval. * indicates statistical significance at the 5 percent level.

                      Centralization   |Centralization|   Assortativity   Clustering   LSCC
Panel A:
|Centralization|       0.008
Assortativity          0.001            0.085*
Clustering            -0.001           -0.235*            -0.038*
LSCC                  -0.002           -0.142*            -0.550*          0.541*
Panel B:
Returns                0.675*          -0.011              0.013           0.011        0.006
Volatility            -0.025*           0.170*            -0.072*         -0.034*       0.024*
Volume                 0.018            0.093*            -0.052*          0.101*       0.153*
Duration              -0.006           -0.193*            -0.011           0.003       -0.015
Effective Spread      -0.003           -0.157*             0.033          -0.250*      -0.318
Signed Volume          0.637*           0.007             -0.004          -0.009       -0.004
Trade Imbalance        0.669*          -0.013              0.015           0.013        0.002
Herfindahl Trades     -0.004           -0.027*             0.067*         -0.621*      -0.559

Table IV: Network Variables: p-values for the Null Hypothesis of Granger Non-causality
The sample contains p-values for the null hypothesis of Granger non-causality for vector autoregressions (VARs) using the time series of 12,032 network variables each comprised of 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Centralization, |centralization|, assortativity, clustering and the largest strongly connected component (LSCC) are network variables computed as defined in Section I. The VARs are estimated using generalized method of moments (GMM) with HAC robust standard errors and with optimal lag-length of 12 selected using Akaike Information Criterion. The upper right quadrant displays p-values for pairwise tests of the first column variable Granger-causing the first row variable, with Total representing the p-value for the first column variable jointly Granger-causing all variables in the first row.
The lower left quadrant displays p-values for pairwise tests of the first row variable Granger-causing the first column variable, with All representing the p-value for the first row variable jointly Granger-causing all variables in the first column.

                     Centralization |Centralization|    Assortativity       Clustering             LSCC    Total
Centralization                    -             0.59             0.75             0.99             0.65     0.65
|Centralization|               0.18                -             0.11             0.01*            0.04*    0.00*
Assortativity                  0.69             0.09                -             0.03*            0.00*    0.00*
Clustering                     0.54             0.06             0.00*               -             0.00*    0.00*
LSCC                           0.98             0.15             0.00*            0.06                -     0.00*
All                            0.37             0.05*            0.00*            0.01*            0.00*

Table V: Financial and Network Variables: p-values for the Null Hypothesis of Granger Non-causality

This table contains p-values for the null hypothesis of Granger non-causality for VARs using the time series of network variables from 12,032 networks, each comprising 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Returns, Volatility, Duration, Volume, Effective Spreads, Signed Volume, Trade Imbalance, and Herfindahl Trades are computed over each interval. Centralization, Assortativity, Clustering and the Largest Strongly Connected Component (LSCC) are network variables computed as defined in Section I. The VARs are estimated using GMM with HAC robust standard errors and with optimal lag-length of 12 selected using AIC. Each number represents the p-value for pairwise tests between variables in each column, with arrows indicating significance at the 5% level. The p-values in parentheses are for the four Network Variables jointly Granger-causing Financial Variables, with significance at the 5 percent level labeled *. The p-values in brackets are for Financial Variables jointly Granger-causing the four Network Variables, with significance at the 5 percent level labeled †.
Panel A:
Returns (0.01)* [0.00]†
0.00 0.00 Centralization 0.00 0.00
0.25 0.05 Assortativity 0.00 0.04
0.88 0.40 0.00 0.00 0.16 0.68 0.05 0.00 Clustering LSCC
Volatility (0.00)* [0.00]†

Panel B:
Duration (0.00)* [0.20]
0.00 0.11 Centralization 0.48 0.00
0.01 0.34 Assortativity 0.05 0.03
0.00 0.51 Clustering 0.15 0.00
0.00 0.01 LSCC 0.14 0.00
Volume (0.00)* [0.16]

Panel C:
Signed Volume (0.00)* [0.35]
0.08 0.10 0.90 0.98 0.75 0.68 0.92 0.80 Centralization Assortativity Clustering LSCC 0.84 0.20 0.19 0.01 0.58 0.00 0.37 0.00 0.20 0.02 0.14 0.21
Herfindahl Trades (0.00)* [0.22]

Panel D:
Effective Spread (0.00)* [0.21]
0.00 0.07 Centralization 0.00 0.14
0.25 0.38 Clustering 0.11 0.66
0.00 0.42 LSCC 0.44 0.17
Assortativity
Trade Imbalance (0.03)* [0.18]

Table VI: Financial and Liquidity Measures: p-values for the Null Hypothesis of Granger Non-causality

This table contains p-values for the null hypothesis of Granger non-causality for bi-variate VARs estimated with optimal lag-length 6 selected using the AIC over 12,032 sequential networks, each comprising 600 sequential trades in the nearby e-mini S&P 500 futures contract on the CME Globex platform during August 2009. Each number represents the p-value for pairwise tests between variables in each column, with arrows indicating significance at the 5% level. Returns are open-to-close network returns computed as differences between log ending and beginning prices during each interval. Volatility is the range between high and low prices, Volume is the total number of contracts traded and Duration is the time (in seconds) elapsed between the start and end of the network interval. Effective Spread is equal to twice the square root of the first order autocovariance of returns, Stoll (1978), over each interval (×10⁻³). Signed Volume is equal to the buy volume minus the sell volume. Trade Imbalance is equal to the number of buy trades less the number of sell trades in each interval.
Herfindahl Trades is the Herfindahl index computed using the number of trades traded by each trader over each interval.

Panel A:
Signed Volume
0.18 0.10 Returns 0.55 0.81
0.47 0.91 Volatility 0.00 0.71
0.48 0.02 Volume 0.00 0.01
0.02 0.37 Duration 0.00 0.00
Herfindahl Trades

Panel B:
Effective Spread
0.72 0.52 Returns 0.15 0.24
0.05 0.00 Volatility 0.60 0.93
0.00 0.00 Volume 0.29 0.10
0.00 0.00 Duration 0.13 0.07
Trade Imbalance

Table VII: Simulation Results - Correlations between Financial and Network Variables

This table contains pairwise correlations between financial and network variables, each calculated from 12,000 simulated networks of 600 sequential trades as described in Section IV. Centralization, |centralization|, assortativity, clustering and the largest strongly connected component (LSCC) are network variables computed as defined in Section I. Returns are open-to-close network returns computed as differences between log ending and beginning prices during each interval. Volatility is the range between high and low prices, volume is the total number of contracts traded and duration is the time (in seconds) elapsed between the start and end of the network interval. Effective Spread is equal to twice the square root of the first order autocovariance of returns, Stoll (1978), over each interval (×10⁻³). Signed Volume is equal to the buy volume minus the sell volume. Imbalance is equal to the number of buy trades less the number of sell trades in each interval. The Herfindahl Trades measure is the Herfindahl index computed using the number of trades traded by each trader over each interval. * indicates statistical significance at the 5 percent level.
                     Centralization |Centralization|    Assortativity       Clustering             LSCC
Returns                       0.605*          -0.024*           0.003           -0.000            0.003
Volatility                    0.008            0.219*           0.013            0.004           -0.059*
Volume                       -0.004            0.067*           0.119*           0.042*          -0.028*
Duration                      0.016            0.003            0.012           -0.002            0.003
Effective Spread              0.052*           0.039*           0.031*           0.007           -0.010
Signed Volume                 0.299*           0.009            0.043*           0.021*          -0.016
Imbalance                     0.722*           0.052*          -0.011            0.000           -0.022*
Herf. Trades                 -0.001           -0.138*           0.001           -0.050*          -0.818*
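
The interval-level financial variables defined in the table captions can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the captions do not settle: the record fields (`time`, `price`, `qty`, `side`, `buyer`, `seller`) are hypothetical; `side` is an assumed aggressor flag (+1 for a buy, -1 for a sell); volatility is taken as the high-low range in price units; the Herfindahl index counts each trade once for the buyer and once for the seller; and the effective spread follows the usual sign convention of taking the square root of the negative first-order autocovariance of trade-to-trade log returns, returning NaN when that autocovariance is positive. The ×10⁻³ scaling used in the tables is not applied.

```python
import math
from collections import Counter


def interval_measures(trades):
    """Compute interval-level variables for one block of sequential trades.

    `trades` is a time-ordered list of dicts with hypothetical keys:
    time (seconds), price, qty (contracts), side (+1 buy, -1 sell),
    buyer and seller (trader identifiers).
    """
    if len(trades) < 3:
        raise ValueError("need at least three trades for an autocovariance")

    prices = [t["price"] for t in trades]

    # Open-to-close return: log ending price minus log beginning price.
    ret = math.log(prices[-1]) - math.log(prices[0])
    # Volatility: range between high and low prices (assumed in price units).
    volatility = max(prices) - min(prices)
    # Volume: total contracts traded; duration: elapsed seconds.
    volume = sum(t["qty"] for t in trades)
    duration = trades[-1]["time"] - trades[0]["time"]
    # Signed volume: buy volume minus sell volume.
    signed_volume = sum(t["side"] * t["qty"] for t in trades)
    # Trade imbalance: number of buy trades less number of sell trades.
    trade_imbalance = sum(t["side"] for t in trades)

    # Herfindahl index of trade counts: sum of squared trader shares
    # (assumption: a trade counts once for each counterparty).
    counts = Counter()
    for t in trades:
        counts[t["buyer"]] += 1
        counts[t["seller"]] += 1
    total = sum(counts.values())
    herfindahl = sum((c / total) ** 2 for c in counts.values())

    # Effective spread from the first-order autocovariance of
    # trade-to-trade log returns (assumed sign convention: use -autocov).
    r = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]
    mean_r = sum(r) / len(r)
    dev = [x - mean_r for x in r]
    autocov = sum(x * y for x, y in zip(dev, dev[1:])) / (len(dev) - 1)
    spread = 2.0 * math.sqrt(-autocov) if autocov < 0 else float("nan")

    return {
        "return": ret,
        "volatility": volatility,
        "volume": volume,
        "duration": duration,
        "signed_volume": signed_volume,
        "trade_imbalance": trade_imbalance,
        "herfindahl_trades": herfindahl,
        "effective_spread": spread,
    }
```

Applied to each consecutive block of 600 trades, a function of this kind would yield one observation per network for each financial variable, which is the form of time series that the correlations and Granger non-causality tests in the tables operate on.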