Clustered Institutional Holdings and Stock Comovement
ZHENG SUN1
First Draft: January 2007
This Draft: April 2008

Abstract
Previous literature has found that stock returns comove more than fundamentals. More recently, researchers have also found commonalities in liquidity and trading activity. In this paper, I document the role of institutional clienteles in comovement. To define clienteles, I take an innovative approach based on applying hierarchical clustering algorithms to institutional holdings. I find that the majority of institutional investors can be stably clustered into a small number of clienteles. These clienteles seem to play an important role in explaining comovement: stocks held by the same clientele comove excessively in trading volume, return, and liquidity. Lastly, I provide a channel through which clientele effects generate comovement. Funds within the same clientele seem to suffer correlated liquidity shocks. These shocks generate correlated order flow in the underlying stocks, inducing comovement in return and liquidity for these stocks.

1 I am highly indebted to my dissertation committee: Joel Hasbrouck, Robert Whitelaw, Robert Engle and Lasse Pedersen. I also thank participants of the finance seminars at NYU, MIT, Ohio State University, the University of California, Irvine, and the University of Texas, Austin for helpful comments and suggestions. All remaining errors are mine.

Electronic copy available at: http://ssrn.com/abstract=1332201

I. Introduction
This paper studies what drives stock comovement. Previous literature has extensively documented how fundamentals drive comovement. But there is also considerable evidence that stock returns may move together even when fundamentals are not at work. The puzzling findings related to comovement do not stop at prices and returns. More recently, researchers have also found commonality in liquidity, as well as in trading behavior.
In trying to understand these comovement findings, the current literature has found that the degree of excess comovement appears to be related to the degree of institutional ownership: stocks with a higher proportion of institutional ownership seem to comove more excessively.2 This paper also looks at excessive stock comovement from the angle of institutional investment. But instead of focusing on the degree of institutional ownership, I focus on the differences among institutional holdings. Using these differences, I show that institutional investors group into distinct clienteles, and that the existence of clienteles plays an important role in explaining excessive comovement.

In contrast to traditional asset pricing theories, in which the marginal investor in every stock is the same broadly diversified representative agent, clientele theories assume that each stock has a body of holders who find it attractive. Clientele theories imply limited market participation: the marginal investor in a particular asset market is a specialized investor rather than the diversified representative agent.3 In such a world, the set of potential buyers for an asset is much smaller. When one investor sells part of its portfolio due to a liquidity shock or a preference change, those who are willing to buy without a big price concession probably also specialize in the same set of stocks. Investors who do not specialize in these stocks need time to learn about them, so these “outside” investors would ask for a bigger price concession. In this case, an asset’s price changes reflect the preferences or wealth changes of the investors who specialize in it. Defining investors and funds holding similar stocks as a clientele, different clienteles effectively define different marginal investors. Consequently, returns of two stocks that share the same marginal investors tend to comove more than returns of two stocks held by different marginal investors. Furthermore, when a liquidity shock hits a marginal investor, it likely triggers a liquidity shock for the group of stocks in that investor’s portfolio. This mechanism simultaneously increases the trading volume and decreases the liquidity of this group of stocks.

2 Pindyck and Rotemberg (1993) suggest excessive return comovement is related to institutional ownership. Kamara, Lou, and Sadka (2006) suggest the degree of institutional ownership is related to the degree of liquidity commonality.
3 Merton (1987) presents a model in which segmentation arises endogenously, and explores the implication of market segmentation for asset prices. Allen and Gale (1994) study an environment in which traders must specialize in a certain asset market ex ante. This leads to limited market participation ex post. Moreover, the wealth level of the specialized traders is critical to setting prices. Barberis and Shleifer (2003) show how style investment can generate excessive comovement.

This paper shows that the existence of distinct institutional clienteles can partially explain commonalities in trading volume, return, and liquidity. First, I show that the majority of institutional investors can be stably clustered into a small number of clienteles. Second, I show that stocks held by the same clientele comove excessively in trading volume, return, and liquidity. Here, I use “excess comovement” to mean comovement unexplained by factors commonly thought to generate commonality. Lastly, I provide a channel through which clientele effects generate comovement: funds within the same clientele suffer correlated fund withdrawals, which in turn trigger reactions that induce comovement among the stocks underlying these funds’ portfolios.

The literature has generally ignored the differences among institutional investors because these differences are difficult to quantify.
In the first part of this paper, I attack this problem by examining the differences among these investors’ portfolio holdings. Specifically, I apply hierarchical clustering techniques to institutional holdings data to group together funds that hold similar portfolios. I find that the majority of institutional investors can be clustered into a few distinct groups. The largest ten groups contain more than 50% of the institutional investors in terms of total number and more than 80% in terms of total assets under management. This clustering is stable over time: funds that group together in one year are likely to group together in the following years. As funds in each stable cluster hold similar portfolios through time, I will from here on refer to the funds that share a cluster as a clientele.

The existence of distinct clienteles is most interesting if it has asset pricing implications. In the second part of the paper, I find that the clustering of institutional investors induces commonality in stock trading volume, return, and liquidity relative to standard factor models. Taking returns as an example, a one-standard-deviation increase in the cluster-level return is associated with a 20 bps increase in individual daily return. So I essentially identify a cluster effect for stocks. Previous research has also found excessive return comovement4 and commonality in liquidity and trading activity5 in equity markets, and there is some evidence that commonality in order flow can partially explain commonality in stock returns.6 However, no study so far has tried to synthesize all the empirical evidence of excess commonality. Through clientele effects, this paper provides a simple channel that can partially account for all of these comovement-related findings.
To understand why stocks comove excessively at the clientele level, in the third and last part of this paper I test whether the fact that specialized institutional investors are wealth constrained can explain the observed clientele-level comovement effects.7 I will refer to this channel as the wealth constraint channel. I focus on testing this particular channel because of a special organizational feature of the asset management industry: retail or other uninformed investors delegate their investment decisions to fund managers, but they can often put money into or pull money out of funds freely. For example, it is well documented that mutual fund investors actively chase past performance;8 thus, fund managers may become wealth constrained after bad past performance. The wealth constraint channel can induce excessive clientele-level stock comovement if all funds in a cluster experience liquidity shocks around the same time and if funds outside the cluster demand a larger premium to hold stocks not usually in their portfolios. As pointed out by Allen and Gale (1994), limited participation by itself does not explain excessive price movement. Liquidity in the market depends not only on the number of investors who participate, but also on the amount of cash they hold. Even if the market is thin, as long as a couple of informed investors hold enough cash, they will provide liquidity at a small premium. Excessive comovement results when all the informed investors are wealth constrained.

I test the wealth constraint channel in three steps. First, I show that money flows of funds in the same clientele are highly correlated. Next, I show that funds’ investment behaviors are affected by large fund flows: when funds suffer large outflows, they dramatically increase the number of stocks they sell. Together, these first two steps show that large money withdrawals generate correlated liquidity demand. In the last step, I show that, as the wealth constraint story predicts, this correlated liquidity demand partially explains clientele-level comovement. Clientele-level comovement significantly increases when funds in the clientele face large outflows. Moreover, this effect is asymmetric: I do not find increased comovement when funds receive large inflows.

4 Both Karolyi and Stulz (1996) and Connolly and Wang (1998) find that macroeconomic announcements and other public news do not affect the comovements of the Japanese and American stock markets. King and Wadhwani (2000) also find that observable economic variables explain only a small fraction of international stock market comovements.
5 Chordia et al. (2000) find that quoted spreads, quoted depth, and effective spreads comove with market- and industry-wide liquidity; Huberman and Halka (2001) find evidence of commonality for quotes and depth.
6 Hasbrouck and Seppi (2001) find that both returns and order flow are characterized by common factors, and that commonality in order flow explains two-thirds of the commonality in returns.
7 Shleifer and Vishny (1997) were the first to emphasize inter-temporal wealth effects of financial constraints. Gromb and Vayanos (2002), Kyle and Xiong (2001), and Yuan (2005) model a large drop in prices caused by distressed liquidation of assets by hedge funds.
8 For early results concerning the relationship between fund flow and performance, readers can consult Ippolito (1992), Chevalier and Ellison (1997), and Sirri and Tufano (1998).
The rest of this paper proceeds as follows: Section II briefly reviews the related literature; Section III summarizes the data used in this paper; Section IV describes my clustering methodology and characterizes the properties of the resulting clusters; Section V shows how fund clustering induces the aforementioned stock commonalities; Section VI links cluster-level comovement to wealth constraint effects; and Section VII discusses possible future research and concludes.

II. Literature
This paper’s findings add to the fast-growing literature that emphasizes how frictions or market sentiment can de-link return comovement from comovement of news about fundamentals. For example, Pindyck and Rotemberg (1993) find excess return comovement relative to the stocks’ exposure to macroeconomic factors. They show this excess return comovement can be explained by the stocks’ underlying proportion of institutional ownership. Lee, Shleifer, and Thaler (1991) show that prices of small stocks and of stocks with low proportions of institutional ownership move together with the discounts on closed-end funds, which are held predominantly by individuals. Both these papers stress the market segmentation between individual and institutional investors. In this paper, I show that different clienteles exist among institutional investors; in effect, I argue there is also segmentation among institutional investors.

Barberis, Shleifer, and Wurgler (2005) study comovement of stocks within the S&P 500 index. They show that when a stock is added to the S&P index, its beta with respect to the S&P increases. As the authors point out, it would be hard to argue that the stock’s addition is correlated with any change in fundamentals. They suggest one possible explanation might be that certain investors prefer certain investment habitats.
As these investors’ risk aversion, sentiment, or liquidity needs change, they adjust their portfolios’ exposure to the securities in their habitats, thereby inducing a common factor in these securities’ returns. Unlike Barberis, Shleifer, and Wurgler (2005), who focus only on S&P stocks and index changes, this paper looks at almost all stocks in the investment universe. That I find a few stable fund clusters indicates that the habitat argument works for a much broader set of stocks.

The findings in this paper also contribute to the growing literature that documents and explains liquidity commonality. The mechanism proposed here is demand-driven, which differs from the supply-side explanations prevalent in the literature. Coughenour and Saad (2004) and Newman and Rierson (2004) find that bid-ask spreads of stocks supported by the same market makers move together. This spread comovement can arise from the market maker using information across stocks or from his managing the inventory risk of the combined portfolio. Brunnermeier and Pedersen (2005) propose a third mechanism through which a market maker transfers liquidity shocks between stocks: margin calls force the liquidity provider to rebalance his portfolio. In contrast, the mechanism stressed in this paper generates liquidity commonality from order flow commonality, and it does not require the existence of a common market maker for the underlying stocks. Furthermore, I find empirical evidence showing that the commonality of order flow is not solely driven by information. At least part of it is induced by the same group of marginal investors liquidating their portfolios.

Another strand of literature to which this paper contributes examines the relationship between asset management industry fund flows and stock returns. Edelen (1999) finds substantial positive cross-correlation in fund flows, possibly indicating the existence of a common factor behind fund flows.
In this paper, I provide a natural candidate for such a factor, namely the clustering of institutional holdings. Boyer and Zheng (2004) find that the quarterly contemporaneous relations between fund flows and returns are positive and significant for mutual funds, foreigners, pension funds, and insurance companies. Moreover, they find that the quarterly contemporaneous covariances are driven mainly by strong contemporaneous monthly relations. Boyer and Zheng’s results suggest that these sectors may exert price pressure on the market through their demand for stocks. The price impact appears to be temporary and is reversed in the subsequent months. Similar effects are found in Coval and Stafford (2005) and Frazzini and Lamont (2005). The former paper finds that funds experiencing large outflows (inflows) tend to decrease (increase) their existing positions, which creates price pressure in the securities held in common by these funds. The latter paper uses mutual fund flow as a measure of individual investor sentiment for different stocks, and finds that high investor sentiment predicts low future returns at long horizons. Both these studies and this paper conjecture that funds incur liquidity shocks around the same time due to their common holdings. Unlike this paper, however, they focus on individual stocks; thus, they cannot directly test the similarity of funds’ portfolios. Moreover, they do not provide evidence of correlation of fund flows among similar funds.

Finally, from a methodological point of view, this paper documents similarity among fund managers’ holdings and designs a new method to classify institutional investors based on their holdings. Because I look at holdings rather than rely on self-reported information, my method can be thought of as a cleaner classification of investment style. This classification procedure is especially useful given that a large number of institutional investors do not report investment styles.
It should prove even more useful as more information on institutional investing becomes available.

III. Data
The main questions I ask in this paper are: (1) whether several distinct clienteles exist among institutional investors in the equity market, and (2) whether such a clientele effect can help explain the commonality of market trading behavior, returns, and liquidity. The existing literature provides little help on these questions because of data issues, as well as the difficulty of quantifying differences among institutions and investors. To tackle these questions, I design a new approach based on an empirical definition of “clienteles.” In this paper, I define a clientele to be a group of investors holding similar portfolios over time. The natural implementation of this approach is to cluster together similar institutional holdings. I detail my methodology in the next section. First, I introduce the major datasets used in this paper.

To conduct my analysis, I require institutional holding, style, and return data. In addition, I need data on stock trading volume, returns, liquidity, and other related firm characteristics. In total, I use four major datasets in this paper: CDA Spectrum Institutional data, CDA Spectrum Mutual Fund data, the CRSP Survivorship Bias Free Mutual Fund Database, and the Trade and Quote (TAQ) data.

My institutional equity ownership data come from the CDA Spectrum 13F filings database. The SEC requires all institutional investors with more than $100 million in equity ownership to report their holdings via quarterly 13F filings. Little research has been done on the holding and trading characteristics of institutions other than mutual funds.9 This is unsatisfying given the large market shares of the other types of institutions. For example, it is possible that institutions other than mutual funds hold a majority of the shares of certain stocks.
When mutual funds need to trade in and out of these stocks, other institutions can easily absorb this trading need. In other words, mutual funds may not always be the marginal investors for the stocks they trade. Therefore, in this study, I incorporate all major institutions whose holding information is available. The CDA Spectrum dataset classifies the filing institutions into one of five categories: bank trust departments, insurance companies, mutual funds, independent investment advisors, and other institutional investors. Pension funds likely fall into the last group. One major type of institution, hedge funds, is missing from the data. This data limitation will generate problems if hedge funds have sufficient wealth to absorb the trading needs of all other institutions. However, the existing literature documents strong evidence of limits to arbitrage by hedge funds. It is nonetheless unfortunate that this data limitation forces me to assume hedge funds have no impact.

9 There are a few exceptions: Bennett et al. (2003) study the characteristics of stocks that are held by all classes of institutions. Boyer and Zheng (2004) document the price impact of trading by different types of institutions.

Table 1 summarizes the median number of stocks a typical institutional investor holds and trades each quarter.10 From 1980 to 2003, the median number of stocks an institutional investor holds at the end of each quarter is 262. Given the median number of stocks bought (124) and sold (111) during each quarter, I find, as does the existing literature, that institutional investors manage assets actively.

Because I also examine how correlated fund inflows and outflows affect stock characteristics, I require institutional flow data. Unfortunately, the CDA Spectrum Institutional database from 13F filings does not report total assets under management, nor does it report net return information.
Thus, when using these data to compute fund flow statistics, I have to assume that the reported equity portfolio makes up the fund’s entire investment set, and also that funds make investment choices at quarterly intervals. To soften these assumptions and to enhance the power of my tests involving fund flow, I adopt two mutual fund datasets that are commonly used in the literature. From the CDA/Spectrum holdings database, I obtain complete quarterly U.S. equity holdings for all U.S. mutual funds during the period 1980-2003. I manually merge these data with the CRSP Survivorship Bias Free Mutual Fund Database.11 The CRSP Mutual Fund Database includes fund returns, total net assets, different types of fees, investment objectives, and other fund characteristics. My merged final sample spans the period from January 1980 to December 2003. I eliminate bond and international funds from the sample. In addition, I include funds with multiple share classes only once. Finally, I eliminate from the sample all fund-quarter observations for which fewer than ten stock holdings are reported.

Lastly, to construct stock liquidity proxies, I use the TAQ database of the New York Stock Exchange. This database records transaction prices and quantities of all trades, as well as prevailing quotes, beginning in 1993. I apply standard microstructure filters to these data. Namely, using TAQ’s sale condition and correction indicator variables, I exclude transactions that take place under special conditions and those that are wrongly ordered. Also, I only use trade and quote information from the NYSE, NASDAQ, and AMEX; this turns out to be an important filter. Using this filtered sample, I derive several standard liquidity measures, such as proportional quoted bid-ask spreads and proportional effective spreads. Following previous literature, I use these spreads to proxy for transaction costs. Table 2 presents summary statistics of the liquidity and trading volume measures used in the paper.

10 I delete the first and last quarterly observations for each fund so that I do not mistakenly count purchases or sales due to funds entering or exiting the database. If a fund has a missing report during a quarter, I do not count the number of trades in the subsequent quarter, since doing so would use information from more than half a year earlier.
11 Information on how to merge the two datasets can be found in Wermers (2000), where a detailed description of the CDA/Spectrum dataset is also provided.

IV. Clustered Institutional Holdings
I first ask whether different clienteles exist among institutional investors in the U.S. equity market. Previous research on institutional ownership has mainly focused on the distinction between individual investors and institutional investors. In this paper, I find that even among institutional investors, different types and styles emerge. By clustering on portfolio holdings, I show that there exist different clienteles among institutional investors in the U.S. equity market. The following subsections detail my clustering methodology and present properties of the resulting clusters. Briefly, I find that the majority of institutional investors can be stably clustered into a small number of clienteles.

A. Cluster Methodology
For each quarter, I perform cluster analysis on fund portfolio holdings.12 As an example, suppose that in a given quarter a total of n stocks are held by all the institutional investors. I first construct an n by 1 vector for each fund, where each entry corresponds to the fund’s position in a particular stock. If a fund does not hold a particular stock, the entry corresponding to that stock is zero; otherwise, the entry is the stock’s proportion in the fund’s portfolio.
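The weight-vector construction just described can be sketched as follows. This is a minimal illustration, not the paper’s code: it assumes each fund’s quarterly holdings arrive as a mapping from a stock identifier to the dollar value held, and the function name `weight_vector` and the example funds are hypothetical.

```python
def weight_vector(holdings, universe):
    """Return a fund's n-by-1 weight vector (as a list) over `universe`.

    holdings: dict mapping stock id -> dollar value held (hypothetical format).
    universe: ordered list of all n stocks held by any institution this quarter.
    """
    total = sum(holdings.values())
    if total <= 0:
        raise ValueError("fund holds no positive positions")
    # Zero for stocks the fund does not hold; portfolio proportion otherwise.
    return [holdings.get(stock, 0.0) / total for stock in universe]


# Hypothetical example: three funds, four stocks in the quarter's universe.
funds = {
    "fund_a": {"AAA": 600.0, "BBB": 400.0},
    "fund_b": {"AAA": 300.0, "BBB": 200.0},
    "fund_c": {"CCC": 50.0, "DDD": 150.0},
}
universe = sorted({stock for h in funds.values() for stock in h})
vectors = {name: weight_vector(h, universe) for name, h in funds.items()}
print(vectors["fund_c"])  # [0.0, 0.0, 0.25, 0.75]
```

Because entries are proportions, fund_a and fund_b get identical vectors even though fund_a is twice as large; the vector captures what a fund holds, not how much money it manages.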
With the weight vectors created, I compute the pair-wise distance between the holdings of any two funds, defined as the sum of the absolute differences in their portfolio weights. Mathematically, let $v_i = [\omega_1^i, \omega_2^i, \ldots, \omega_n^i]$ and $v_k = [\omega_1^k, \omega_2^k, \ldots, \omega_n^k]$ be the weight vectors for funds i and k, respectively. The distance (or dissimilarity) between the two funds is defined to be

$$d_{ik} = \sum_{j=1}^{n} \left| \omega_j^i - \omega_j^k \right|.$$

Clearly, d is symmetric, and because each fund’s weights are nonnegative and sum to one, it is bounded between 0 and 2. If the two funds hold the same positions, then d equals 0. If the two funds hold totally different stocks, then d attains its maximum value of 2. For K funds, I obtain a pair-wise distance matrix with $K(K-1)/2$ unique pair-wise distances.

12 One of the papers that introduced cluster analysis into finance is Elton and Gruber (1970). Other finance papers that use clustering techniques include Carleton and McGee (1970) and Brown and Goetzmann (1997).

I then perform a hierarchical clustering of funds based on this pair-wise distance matrix as follows. First, I set each fund to its own cluster. Then, I find the pair of funds that are closest according to my distance measure and cluster them together. After this step, a new cluster that contains more than one fund is formed. I define the distance between any other fund and the new cluster as the furthest distance between that fund and any fund in the cluster (a rule known as complete linkage). This definition gives the most conservative measure of distance. Under my definition, the distance separating any two funds within the same cluster is always shorter than the distance between that cluster and any fund outside the cluster. Proceeding under the same principle, I define the distance between any two multiple-fund clusters as the furthest distance between funds from these two clusters.
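A minimal pure-Python sketch of the distance $d_{ik}$ and the complete-linkage agglomeration described above follows. The function names `holding_distance` and `complete_linkage_clusters` are hypothetical, and the naive loop is meant for illustration on small inputs, not as the paper’s implementation. The stopping rule merges the closest pair of clusters only while their furthest-pair distance is below a given threshold.

```python
from itertools import combinations


def holding_distance(v_i, v_k):
    """d_ik = sum_j |w_j^i - w_j^k|; lies in [0, 2] for long-only weights summing to 1."""
    return sum(abs(a - b) for a, b in zip(v_i, v_k))


def complete_linkage_clusters(vectors, threshold):
    """Merge the closest pair of clusters while their complete-linkage distance
    (furthest pair of member funds) is below `threshold`.

    vectors: dict mapping fund id -> weight vector (all of equal length).
    Returns a sorted list of clusters, each a sorted list of fund ids.
    """
    names = list(vectors)
    # K*(K-1)/2 unique pair-wise distances, stored under both key orders.
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = holding_distance(vectors[a], vectors[b])

    def linkage(c1, c2):
        # Cluster-to-cluster distance = furthest distance between their funds.
        return max(dist[(a, b)] for a in c1 for b in c2)

    clusters = [[name] for name in names]  # every fund starts as its own cluster
    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d >= threshold:  # every inter-cluster distance is at or above threshold
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical weight vectors for four funds over a four-stock universe.
vectors = {
    "f1": [0.5, 0.5, 0.0, 0.0],
    "f2": [0.6, 0.4, 0.0, 0.0],
    "f3": [0.0, 0.0, 0.5, 0.5],
    "f4": [0.0, 0.1, 0.4, 0.5],
}
print(complete_linkage_clusters(vectors, threshold=0.5))
# [['f1', 'f2'], ['f3', 'f4']]
```

For thousands of funds, SciPy’s `scipy.cluster.hierarchy.linkage` with `method='complete'` and `metric='cityblock'`, followed by `fcluster(..., criterion='distance')`, performs the same kind of computation far more efficiently; note that `fcluster` keeps merges at distances up to and including the cutoff.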
The hierarchical cluster analysis proceeds in an orderly fashion from the finest level (where every fund is its own cluster) to the coarsest level (where all funds are in one cluster). To prevent all funds from falling into the same cluster, I terminate the process when all distances between clusters are above a certain threshold. Since the number of clusters obtained depends on the imposed threshold, I must choose the threshold objectively, and I do so by simulation. Specifically, to preserve certain properties of funds, such as fund size and portfolio concentration, I first fix the funds’ portfolio weights and the number of stocks in which each fund invests. Then, using these characteristics from the real data, I use a permutation method to simulate a portfolio for each fund under the null hypothesis that funds choose their portfolios independently. After simulating funds’ holding vectors, I calculate pair-wise distances between the simulated funds, plot the distribution of these simulated distances, and use its left-tail 1% critical value as the clustering threshold. I combine two clusters if their distance is below this threshold. By construction, I can then reject, with 99% confidence, the null that funds in these two clusters invest in independent portfolios. The above procedure ensures that I cluster together funds whose holdings are close to one another.

The empirical distribution of the pair-wise distance metric from the sample simulated under the null of no holdings correlation is much more right-skewed than the corresponding distribution from the real data. Figure 1 plots a snapshot of the two distributions for the first quarter of 1994. That the majority of pair-wise distance statistics lie to the left of the 1% critical threshold suggests that the majority of institutions probably fall into a few large clusters.
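The threshold simulation can be sketched as below. The paper does not spell out its permutation scheme, so this sketch assumes one plausible version: each fund keeps its own nonzero weights (and hence its number of stocks and concentration), those weights are placed on stocks drawn uniformly at random without replacement, and draws are independent across funds. The function name `simulated_threshold` and its parameters are hypothetical.

```python
import random
import statistics
from itertools import combinations


def simulated_threshold(weight_lists, n_stocks, quantile_pct=1, seed=0):
    """Left-tail critical value of pair-wise distances under independent portfolios.

    weight_lists: for each fund, its nonzero portfolio weights (kept fixed, which
        preserves each fund's number of stocks and its concentration).
    n_stocks: number of stocks in the quarter's universe.
    quantile_pct: integer percentile of the simulated distribution to return.
    Assumption (not specified in the paper): each fund's weights are assigned to
    stocks drawn uniformly without replacement, independently across funds.
    """
    if len(weight_lists) < 3:
        raise ValueError("need at least three funds")
    rng = random.Random(seed)
    simulated = []
    for weights in weight_lists:
        stocks = rng.sample(range(n_stocks), len(weights))  # random placement
        simulated.append(dict(zip(stocks, weights)))
    distances = [
        sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in p.keys() | q.keys())
        for p, q in combinations(simulated, 2)
    ]
    # 99 percentile cut points; element quantile_pct - 1 is the requested percentile.
    cuts = statistics.quantiles(distances, n=100, method="inclusive")
    return cuts[quantile_pct - 1]


funds = [[0.05] * 20 for _ in range(30)]  # 30 hypothetical equal-weighted funds
threshold = simulated_threshold(funds, n_stocks=500)
print(threshold)
```

With independent placement in a large universe, most simulated pairs share few stocks, so simulated distances pile up near 2; this is consistent with the observation that the null distribution lies far to the right of the distribution from the real data.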
My approach to classifying institutional investors has certain advantages over existing techniques.13 First, this approach does not use style information self-reported by fund managers; thus, it does not suffer from strategic style misclassification by funds.14 Second, compared to clustering funds based on return information, looking directly at holdings allows one to measure fund strategy more closely, since fund returns can be obscured by manager skill and management fees. Moreover, clustering based on return information requires a long time series of data, so one must assume that fund style does not change during the clustering period. Since I cluster quarterly, my method does not require a constant-style assumption. Third, I need to know neither how many styles actually exist nor what these styles are. The existing literature focuses mainly on broad characteristic comparisons, such as large vs. small and growth vs. value. Although size and book-to-market are two important factors in determining styles, they alone certainly do not capture everything about style. For example, as pointed out by Brown and Goetzmann (1997), one can interpret the definition of “growth funds” in many ways. Managers who describe their funds as growth-oriented have great latitude in picking the types of stocks they hold, the timing of purchases and sales, the level of fund diversification, the industry concentration of the portfolio, as well as a host of other factors that can go into determining investment style. Ultimately, these uncaptured features of the data translate into distinct groups of stocks that institutions hold.
My clustering approach picks up styles not easily identified when only known risk factors are taken into account.

13 The existing literature classifies funds using several different methods: (1) classifying funds based on their reported styles; (2) classifying funds into growth, value, small-cap, and large-cap using their loadings on the Fama-French factors (Chan, Chen, and Lakonishok (2002)); and (3) clustering fund performance variables (Brown and Goetzmann (1997)).
14 Brown and Goetzmann (1997) find evidence consistent with the notion that mutual funds choose to report styles that minimize their relatively poor performance.

B. Cluster Characteristics
Table 3 summarizes the number of clusters produced each period. In the interest of brevity, I report annual results, which are simple averages of quarterly statistics. The total number of clusters is around one-tenth of the number of funds in the market, and cluster sizes are highly unbalanced: on one hand, there are several large clusters hosting numerous funds; on the other hand, some funds form their own clusters. The ten largest clusters cover more than 50% of institutional investors in terms of the number of funds and more than 80% in terms of dollar volume. Thus, it seems reasonable to focus exclusively on the largest clusters.

One important question is whether the few large clusters obtained pick up important fund-distinguishing characteristics. Two of the most commonly used characteristics for identifying distinct fund styles are size and book-to-market, so I look at whether stocks held by funds in different clusters differ along these two dimensions. Figure 2 shows some supporting evidence to this effect. Here I only consider the ten largest clusters obtained. Stocks are ranked into 5 by 5 size and book-to-market portfolios. Each bubble in the graph represents a particular cluster’s average rank along the two dimensions.
Its center corresponds to the mean rank for the cluster, and its widths along the two dimensions represent the standard errors around the mean. As one can see, these bubbles generally do not overlap. Thus, clusters do seem to hold stocks that differ significantly in these two characteristics, which suggests that my clustering scheme does pick up important fund-distinguishing characteristics.

The asset management industry also defines several distinct styles among institutional investors, especially for mutual funds. One commonly used style measure is the ICDI objective code. To gain further understanding of the clusters obtained, I study whether each cluster can be associated with a distinct fund style. Since the ICDI style measure is reported for mutual funds only, I perform a separate clustering analysis using mutual fund holdings data only. Table 4 summarizes the relationship between the clusters obtained and the ICDI objectives. For each cluster, I identify a dominant style, which is the style most represented within that cluster. Then, for all clusters sharing the same dominant style, I calculate the average across those clusters of the percentage of funds with that style. Panel A reports the results. Only the largest 10 clusters and clusters containing more than 10 funds are analyzed. As one can see, clusters and styles do not map perfectly onto each other. Although some clusters host exclusively one fund style, such as “aggressive growth”, “international equity”, “sector funds,” and “utility funds”, most of the dominant styles cover less than 50% of the funds in their clusters. This suggests that funds with different reported styles can still hold similar portfolios. At the same time, two funds reporting the same style may hold very different portfolios. Therefore, I also examine how many clusters can be associated with a single style. Here, I associate a cluster with its dominant style.
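The dominant-style tallies behind Table 4 can be sketched as follows. This is a hypothetical illustration: it assumes a mapping from fund to reported objective label (standing in for ICDI codes), and the function names are invented for the sketch. Ties for the dominant style are broken by `Counter.most_common`, which orders equal counts by first appearance; that is an arbitrary choice made here, not one taken from the paper.

```python
from collections import Counter


def dominant_styles(cluster_members, fund_style):
    """Map each cluster id to (dominant style, share of members with that style)."""
    result = {}
    for cid, members in cluster_members.items():
        counts = Counter(fund_style[f] for f in members)
        style, count = counts.most_common(1)[0]
        result[cid] = (style, count / len(members))
    return result


def average_dominant_share(dominant):
    """Panel A-type statistic: mean share across clusters with the same dominant style."""
    shares = {}
    for style, share in dominant.values():
        shares.setdefault(style, []).append(share)
    return {style: sum(v) / len(v) for style, v in shares.items()}


def clusters_per_style(dominant):
    """Panel B-type statistic: percentage of clusters whose dominant style is each style."""
    tally = Counter(style for style, _ in dominant.values())
    total = sum(tally.values())
    return {style: 100.0 * n / total for style, n in tally.items()}


# Hypothetical clusters and reported styles.
clusters = {1: ["a", "b", "c", "d"], 2: ["e", "f"], 3: ["g", "h", "i"]}
styles = {"a": "growth", "b": "growth", "c": "growth", "d": "income",
          "e": "sector", "f": "sector", "g": "growth", "h": "income", "i": "growth"}
dominant = dominant_styles(clusters, styles)
print(dominant)
print(average_dominant_share(dominant))
print(clusters_per_style(dominant))
```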
Panel B reports the percentage of clusters belonging to each style. On average, the “long term growth” style is associated with the largest number of clusters; however, its coverage seems to decline over time. For most styles, the percentage of clusters associated with them is stable over time. Over the past twenty years, however, the numbers of “aggressive growth” and “sector funds” clusters seem to have increased.

Since I perform the cluster analysis quarterly, my method does not guarantee that funds falling into one cluster this quarter will be grouped together the next quarter. Moreover, funds on the margin between two groups are easily misclassified. To address concerns about stability, I test the consistency of the clustering. If economic forces lie behind the clusters, the clustering should remain stable over time. I test this conjecture by looking at pair-wise connections between funds. The intuition is that, ideally, funds clustered together should stay clustered together. In each quarter, I define the “connection” between two funds to be 1 if they fall into the same cluster and 0 otherwise. I then count the percentage of pair-wise connections that remain unchanged from one quarter to the next. A higher percentage of unchanged pair-wise connections signifies a more stable clustering. Table 5 gives the clustering stability results. Column 2 counts the number of pair-wise connections that stay the same, and column 3 counts the total number of pair-wise connections for funds that are alive in both the previous and the current quarter. Column 4 gives the percentage of connections changed from the previous quarter, which I call the transition rate. The average annual transition rate is 14.9%.
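The pair-wise connection count behind Table 5 can be sketched as follows, assuming cluster labels are stored as plain dicts from fund id to cluster id (the function names are hypothetical). The sketch also includes a simplified null for comparison: it reassigns the surviving funds at random to clusters of the observed sizes. This is a label-permutation stand-in, not a reproduction of the paper's bootstrap, which draws without replacement from actual fund portfolios.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean, stdev


def transition_rate(labels_prev, labels_curr):
    """Share of pair-wise connections that change between two quarters.

    labels_*: dict fund id -> cluster id. Only funds alive in both quarters count.
    A connection is 1 if two funds share a cluster and 0 otherwise.
    Returns (unchanged, total, rate).
    """
    alive = sorted(set(labels_prev) & set(labels_curr))
    unchanged = total = 0
    for a, b in combinations(alive, 2):
        before = labels_prev[a] == labels_prev[b]
        after = labels_curr[a] == labels_curr[b]
        unchanged += before == after  # bool counts as 0 or 1
        total += 1
    rate = 1 - unchanged / total if total else float("nan")
    return unchanged, total, rate


def random_partition(funds, sizes, rng):
    """Assign funds to clusters at random, preserving the given cluster sizes."""
    funds = list(funds)
    rng.shuffle(funds)
    labels, start = {}, 0
    for k, size in enumerate(sizes):
        for fund in funds[start:start + size]:
            labels[fund] = k
        start += size
    return labels


def permutation_null(labels_prev, labels_curr, n_draws=200, seed=0):
    """Transition rates under random partitions with the observed cluster sizes.

    Returns (mean, standard deviation, 1% left-tail critical value).
    """
    rng = random.Random(seed)
    alive = sorted(set(labels_prev) & set(labels_curr))
    sizes_prev = list(Counter(labels_prev[f] for f in alive).values())
    sizes_curr = list(Counter(labels_curr[f] for f in alive).values())
    draws = sorted(
        transition_rate(random_partition(alive, sizes_prev, rng),
                        random_partition(alive, sizes_curr, rng))[2]
        for _ in range(n_draws)
    )
    critical = draws[int(0.01 * n_draws)]  # empirical 1% quantile
    return mean(draws), stdev(draws), critical
```

Looping over all pairs is quadratic in the number of funds; with thousands of funds per quarter, counting same-cluster pairs from a contingency table of (previous, current) labels gives the same totals much faster.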
To gauge the stability of the clustering through time, for each year I bootstrap the rate of change in pair-wise connections (the “switching rate”) under the null hypothesis of no cross-sectional structure. The null is constructed by forming samples via random draws without replacement from actual fund portfolios. In each bootstrap round, I set the number of clusters and the total number of funds equal to those in the real sample. Column 5 reports the average switching rate for each year. The typical rate of change under the null is 21.45%, considerably higher than the 14.9% transition rate in the true sample. Column 6 reports the standard deviation of the bootstrapped distribution. Because each year's transition rate lies below the 1% critical value in the left tail of the bootstrapped distribution, I reject the null of spurious classification. Clustering based on portfolio holdings thus gives a stable grouping of institutional investors, which supports the view that there are indeed clienteles among institutional investors.

Although the results in this paper do not depend on the economic reasons why institutional holdings cluster, the existing literature does provide some rationales for this phenomenon. Investors may choose to invest in only a subset of stocks for preference, liquidity, and information reasons. For example, according to Van Nieuwerburgh and Veldkamp (2006), when information is costly and its acquisition has increasing returns to scale, it pays to focus exclusively on the single risk factor with which the fund is most familiar.15 Thus, funds stay with their familiar risk factors as long as their investments are successful. Conversely, a fund should learn about and rely on other risk factors after it performs badly. I provide some evidence consistent with this explanation. Specifically, I test whether a fund switches to another cluster after poor performance.
To do so, I run a logistic regression of a dummy variable that captures whether a fund leaves its cluster during a given quarter. Table 6 provides the results, which are consistent with the information story: a fund's probability of switching clusters increases as its past-quarter and past-year net performance decrease. The coefficients on past fund performance are negative and significant, suggesting that a fund is more likely to change its style and move to another cluster after its current strategy fails.

15 Veldkamp (2005) proposes a theory that provides rationales for why investors may want to purchase the same information that others are purchasing.

V. Comovement and Clustered Institutional Holdings

The existence of clustered institutional holdings is most interesting if it has asset pricing implications. In this section, I study the pricing implications of the institutional clustering detailed in the previous section. Existing theories suggest that when segmented investor groups concentrate on different stocks, only the marginal investor who specializes in a stock affects that stock's price and the associated trading behavior. Consequently, stocks held by the same marginal investor would comove excessively. In the current context, since I cluster based on fund portfolio holdings, each cluster of institutions can represent a marginal investor. I test whether fund clustering can explain the observed comovement in stock returns, trading patterns, and liquidity. Economically, this is equivalent to asking whether the observed stock comovement is related to stocks sharing a common marginal investor.

A. Comovement in Trading Volume, Liquidity and Return

A.1. Trading Behavior Commonality

Since clientele effects generate pricing and liquidity commonality through trading, I first examine whether fund clustering induces comovement in stocks' turnover.
By grouping funds into clusters, I essentially create several “pseudo portfolios” and test for excessive comovement in trading behavior within them. There are a few ways of doing so. One would be to compute pair-wise correlations among all stocks and compare the average correlation of stock pairs within the same cluster against that of stock pairs not in the same cluster. This method turns out to be too computationally intensive. Instead, I construct turnover measures at the stock, cluster, and market levels, and then regress stock level turnover on cluster and market level turnover to see whether cluster level turnover helps explain stock level turnover.

My daily stock turnover measure, defined as daily trading volume divided by the number of shares outstanding, comes from CRSP. Constructing cluster level turnover turns out to be nontrivial because some stocks are held by multiple clusters. These stocks tend to be large and liquid; most of them are in fact index stocks. Diversification and liquidity concerns are the two main reasons these stocks are held by a large population of investors. If a stock is held by more than 5 clusters, I associate it with the five clusters holding the most of its shares. If a stock is held by 5 or fewer clusters, I associate it with all of the clusters that hold it. For each stock, I define its associated cluster level turnover as the average turnover among all the other stocks held by the clusters associated with that stock. I construct a stock's associated market level turnover as the average turnover of all other stocks in the market.16 To test whether clustering can explain the cross-section of individual stock turnover, for each stock I run a time-series regression of daily17 individual stock turnover on its associated daily cluster level turnover, including the stock's associated market level turnover as a control.
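A minimal sketch of the stock-level regressor construction for a single day, assuming holdings are available as a hypothetical dict from cluster id to {stock id: shares held}. It reads “the other stocks held by the associated clusters” as the equal-weighted union of those clusters' holdings, excluding the target stock; the text does not say whether the author weighted by holdings, so that choice, like the function names, is an assumption.

```python
from statistics import mean


def associated_clusters(shares_by_cluster, max_clusters=5):
    """Return the (at most) max_clusters clusters holding the most shares of a stock.

    shares_by_cluster: dict cluster id -> shares of the stock held by that cluster.
    """
    ranked = sorted(shares_by_cluster, key=shares_by_cluster.get, reverse=True)
    return ranked[:max_clusters]


def cluster_and_market_turnover(stock, holdings, turnover):
    """Leave-one-out cluster level and market level turnover for one stock on one day.

    holdings: hypothetical dict cluster id -> {stock id: shares held by the cluster}.
    turnover: dict stock id -> that day's turnover (volume / shares outstanding).
    """
    held_by = {c: h[stock] for c, h in holdings.items() if stock in h}
    peers = set()
    for cluster in associated_clusters(held_by):
        peers.update(holdings[cluster])  # every stock held by an associated cluster
    peers.discard(stock)                 # exclude the target stock itself
    peers &= set(turnover)               # keep stocks with a turnover observation
    cluster_level = mean(turnover[s] for s in peers) if peers else float("nan")
    market_level = mean(v for s, v in turnover.items() if s != stock)
    return cluster_level, market_level
```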
I also control for other common factors, such as size, book-to-market, past return, and industry, that seem to be correlated with risks involving public information. The formal specification is

$$T_{j,t} = \alpha + \beta_c CT_{j,t} + \beta_m MT_{j,t} + \beta_s ST_{j,t} + \beta_{bm} BMT_{j,t} + \beta_{lr} LRT_{j,t} + \beta_I IT_{j,t} + \varepsilon_{j,t} \qquad (1)$$

where $CT_{j,t}$ is the stock's associated cluster level turnover and $MT_{j,t}$ is the stock's associated market level turnover. $ST_{j,t}$, $BMT_{j,t}$, and $LRT_{j,t}$ represent the average turnover of portfolios matched on size, book-to-market, and past one-year return, respectively. $IT_{j,t}$ is the average turnover of the stocks in the same industry as the target stock. Since similar funds, such as sector funds, likely hold stocks with similar characteristics, commonality of turnover within the same cluster may reflect common information. I therefore construct matching size, book-to-market, past-year return, and industry portfolios to control for common turnover due to public information. Specifically, I sort stocks into quintile portfolios based on market capitalization, book-to-market, and past-year return, so each stock matches to one of the quintile portfolios for each characteristic. To control for industry effects, I use the Fama-French 48 industry portfolios; each stock matches to one of the 48 industry portfolios.

16 I also use another way to identify stocks sharing the same cluster as the target stock. I perform a second-level clustering on the universe of stocks held by institutional investors based on the similarity of the clusters that hold each stock during each period. The basic results are unchanged.
17 I replicate all the analyses using weekly and bi-weekly data, and the results are similar to those using daily data. The weekly results are slightly stronger than their daily counterparts, and the bi-weekly results are as strong as the weekly ones. These results suggest that, in my data, clientele effects generate comovement at a weekly frequency.
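Equation (1) is estimated one stock at a time, and the reported coefficients are cross-sectional averages of the stock-level slopes. The pure-Python sketch below assumes all series for a stock are aligned on common dates and that standardization means dividing each variable by its time-series standard deviation, as the discussion of Table 7 describes; the helper names (`solve`, `ols`, `scale`, `mean_slopes`) are hypothetical, and a real implementation would use a numerical library and report standard errors as well.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, regressors):
    """OLS with an intercept via the normal equations; returns [alpha, beta_1, ...]."""
    cols = [[1.0] * len(y)] + regressors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def scale(series):
    """Divide a series by its time-series standard deviation."""
    sd = stdev(series)
    return [v / sd for v in series]


def mean_slopes(panel):
    """panel: dict stock -> (T_j series, [CT_j, MT_j, ... series]).

    Runs one time-series regression per stock on standardized variables and
    returns the cross-sectional mean of each slope coefficient.
    """
    slopes = []
    for y, xs in panel.values():
        coef = ols(scale(y), [scale(x) for x in xs])
        slopes.append(coef[1:])  # drop the intercept
    return [mean(col) for col in zip(*slopes)]
```

Because every variable is scaled, each averaged slope reads as the change, in standard deviations of stock turnover, per one standard deviation change in the regressor.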
The first two columns of Table 7 report evidence of excessive turnover comovement within clusters. As a benchmark for judging the explanatory power of cluster level turnover, Column 1 shows baseline results from regressing stock level turnover on market level turnover and the characteristic controls. Since I standardize each variable by its time-series standard deviation, the economic significance of the estimates can be read off directly. Note that the reported coefficients are the cross-sectional means of stock-by-stock time-series regression coefficients. Column 2 shows the results after adding cluster level turnover to the right-hand side of the baseline regression. All coefficients are positive and significant, so stock level turnover comoves with market level turnover, public information, and cluster level turnover. The cluster coefficient $\beta_c$ is highly statistically significant: a one standard deviation increase in the stock's associated cluster turnover is associated with a 0.0230 standard deviation increase in the stock's turnover. In terms of magnitude, the cluster turnover coefficient is roughly 50% as large as the coefficient on industry turnover. Turnover of the matched book-to-market portfolio has the smallest effect. Thus, cluster turnover seems to be associated with the observed stock turnover comovement.

A.2. Return Commonality

The trading commonality documented in the previous subsection does not necessarily translate into return commonality. If trades are not information-based and the underlying stock is sufficiently liquid, trading by itself should not have a large price impact. Hence I test whether stock returns also comove at the cluster level. The test specification is similar to that for stock turnover, except that the stock, cluster, and market level turnover measures are replaced with the corresponding return measures.
I construct these return measures analogously to their turnover counterparts and, in the interest of brevity, omit the construction details. Columns 3 and 4 of Table 7 show the results from regressing stock returns on their associated cluster level returns, their associated market level returns, and their associated characteristic returns. The existing literature finds that the Fama-French three factors, the momentum factor, and the industry factor explain a large part of the cross-section of stock returns. My results concur: the coefficients on the market, size, momentum, and industry portfolio returns all load significantly in the return regressions reported in the table. However, consistent with the turnover results, in which the matched book-to-market portfolio has the smallest effect, the book-to-market return does not contribute significantly to stock return comovement. Comparing the baseline regression without cluster level return to the regression that includes it, I find a cluster effect on top of the other factors. The coefficient on cluster level return is both statistically and economically significant. For a typical stock, a one standard deviation increase in cluster level average return is associated with a 0.04 standard deviation increase in the stock's return, which translates to a 20 basis point increase in daily return. Thus, fund holding clustering is associated with the observed stock return comovement.

A.3. Liquidity Commonality

Having shown that clustering seems to be associated with commonality in stock trading and returns, I finally examine whether clustering is also connected to the observed liquidity commonality. The last four columns of Table 7 summarize regression results for the commonality of quoted spreads and effective spreads, respectively. The baseline regression replicates results found in the literature: both size and industry factors play important roles in explaining the movement of individual stock liquidity.
As with the return and trading volume results, cluster level liquidity is statistically and economically significant. Its economic significance is roughly one-third of that of industry level liquidity. Using the quoted spread as an example, a one standard deviation increase in industry level liquidity increases stock level liquidity by 11.8% of a standard deviation, whereas a one standard deviation increase in cluster level liquidity increases stock level liquidity by 4.4% of a standard deviation, which translates to a 0.1% increase in the quoted spread. The explanatory power of the regression also increases with the addition of cluster level liquidity.

VI. Correlated Budget Constraints and Cluster Level Comovement

In the previous sections, I document the existence of stable clusters among institutional investors and show how clustering promotes excessive comovement in trading behavior, return, and liquidity among stocks held by the same cluster. The current literature offers three explanations for excessive comovement, and I now look at how they fit into my clustering story. The three explanations can be neatly characterized by the mechanisms through which they operate: (1) correlated private information,18 (2) cross-market portfolio rebalancing,19 and (3) correlated liquidity demands.20 In this paper, I mainly focus on the channel based on correlated liquidity demands. First, I show that funds' investment behavior is dramatically distorted when they face a large flow shock. When facing a large money inflow, a fund tends to purchase more; when facing a large money outflow, it liquidates a larger proportion of its portfolio. Whether such trading distortions have a market-wide impact also depends on whether other investors are willing and able to provide liquidity to the constrained funds at reasonable prices.
In the second part of this section, I find that funds sharing the same cluster experience large inflows and outflows around the same time. This finding concurs with Edelen (1999), who finds substantial positive cross-correlation in fund flows, possibly indicating the existence of common factors. Here, I show that part of this flow commonality is due to the clustering of funds' portfolio holdings. The high degree of flow correlation implies that funds clustered together likely hit their budget constraints concurrently. In the last part of this section, I test whether common liquidity shocks can partially explain why stocks held by the same cluster experience excessive comovement. In particular, because funds within the same cluster experience correlated inflows and outflows, individual funds' liquidity-forced transactions aggregate into a large liquidity demand at the cluster, as well as the market, level, inducing a large price impact.

18 The correlated information channel was originally introduced by King and Wadhwani (1990). It is based on the idea that information asymmetry leads uninformed traders to incorrectly update their beliefs about the payoffs of many assets following idiosyncratic shocks to a single asset.
19 Fleming, Kirby, and Ostdiek (1998) and Kodres and Pritsker (2002) argue that the portfolio rebalancing activity of privately informed, price-taking investors, driven by risk aversion, may mislead the updating process of other, uninformed investors, eventually inducing financial contagion.
20 The importance of financial constraints for arbitrage was first emphasized by Shleifer and Vishny (1997). Kyle and Xiong (2001) study the effect of wealth constraints on arbitrageurs and use it as a spillover mechanism. Gromb and Vayanos (2002) develop an equilibrium model of arbitrage trading with margin constraints to explain contagion. Yuan (2005) shows that information asymmetry amplifies the wealth effect on price movements.
Whether common liquidity shocks have a first-order effect in the real market is ultimately an empirical question, which I answer in the following subsections.

A. Fund Trading Behavior during Large Flow Periods

As pointed out above, the CDA Spectrum institutional data from quarterly 13F filings do not contain direct fund flow information. To proxy for the flow variable, I had to make rather strong assumptions about fund investment spectrum and timing. Fortunately, the CRSP Mutual Fund Survivorship Free database allows me to calculate fund flows for mutual funds without these assumptions. Due to this data limitation, I restrict the sample in this section to mutual funds. Since I am interested in funds' trading behavior when they incur large inflows or outflows, I calculate holding statistics restricted to the sub-samples of funds with flow > 19% and flow < -7%, respectively. The 19% and -7% values are the cutoffs for the top and bottom 20% of the fund flow distribution. As a comparison, I also report the corresponding statistics for funds facing regular inflows (0 < flow < 19%) and regular outflows (-7% < flow < 0). The summary statistics are reported in Table 8. On average, 18.18% of funds incur a large inflow and 20.4% incur a large outflow each quarter.

Panel B of Table 8 shows that funds buy more stocks and sell fewer stocks when they incur large inflows. The percentage of stocks bought increases from 43% for the regular inflow sub-sample to 70% for the sub-sample of funds that incur large inflows. In the same sub-sample, the percentage of sales decreases from 26% to 24%. In contrast, when funds incur large outflows, they tend to sell more stocks: the purchase percentage decreases to 33%, and the sale percentage increases to 47%. These summary statistics suggest that large capital flows strongly influence a fund's investment decisions. When a fund experiences a large liquidity shock, it must change its portfolio to absorb the effects.
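The flow-based sorting behind Table 8 can be sketched as below. The `net_flow` formula is the conventional net-flow definition (growth in total net assets not explained by the fund's own return) and is an assumption, since this section does not spell out the author's formula; the cutoffs are the bottom and top 20% of the pooled flow distribution (about -7% and 19% in the paper). `trade_profile` uses the “new buy” and “exit” definitions given in the text and, as a further assumption, measures purchase and sale percentages over all stocks held at either quarter end. All function names are hypothetical.

```python
from statistics import quantiles


def net_flow(tna_prev, tna_curr, ret):
    """Conventional net flow: growth in TNA not explained by the fund's own return."""
    return (tna_curr - tna_prev * (1 + ret)) / tna_prev


def flow_cutoffs(flows):
    """Bottom-20% and top-20% cutoffs of the pooled flow distribution."""
    q = quantiles(flows, n=5, method="inclusive")  # 20th, 40th, 60th, 80th percentiles
    return q[0], q[-1]


def flow_bucket(flow, lo, hi):
    """Classify a fund-quarter flow into the groups compared in Table 8 (assumes lo < 0 < hi)."""
    if flow > hi:
        return "large_inflow"
    if flow < lo:
        return "large_outflow"
    if flow > 0:
        return "regular_inflow"
    if flow < 0:
        return "regular_outflow"
    return "zero_flow"


def trade_profile(prev, curr):
    """prev, curr: dict stock -> shares held at the previous and current quarter end."""
    stocks = set(prev) | set(curr)
    buys = [s for s in stocks if curr.get(s, 0) > prev.get(s, 0)]
    sells = [s for s in stocks if curr.get(s, 0) < prev.get(s, 0)]
    new_buys = [s for s in buys if prev.get(s, 0) == 0]  # stock added this quarter
    exits = [s for s in sells if curr.get(s, 0) == 0]    # stock dropped completely
    n = len(stocks)
    return {
        "pct_bought": len(buys) / n,
        "pct_sold": len(sells) / n,
        "new_buy_share": len(new_buys) / len(buys) if buys else 0.0,
        "exit_share": len(exits) / len(sells) if sells else 0.0,
    }
```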
When I further decompose purchases into increases in existing positions and “new buys,” the pattern of funds buying (selling) when they experience large inflows (outflows) is even stronger. I label a purchase a “new buy” if the fund adds the stock to its portfolio in the current quarter. A fund manager likely holds stronger positive opinions about “new buys” than about increases in existing positions. For similar reasons, I also count the number of stocks that are completely dropped from fund portfolios (labeled “exits”). Managers should hold stronger negative opinions about these dropped stocks than about stocks that are merely sold down. From Panel B of Table 8, among the stocks sold by funds that incur large inflows, 16% are completely dropped from fund portfolios, compared with only 13% for the whole universe of funds. Panel D of Table 8 shows that 16% of purchases are “new buys” for funds incurring large outflows, compared with 14% for the whole universe. Managers need stronger signals to trade a stock in the direction opposite to that of the rest of the portfolio, especially when facing tight liquidity constraints.

B. Correlated Cluster Level Flow among Institutional Investors

For mutual funds, I can calculate fund flow more precisely. I can therefore decompose total mutual fund flow into expected and unexpected flow using a third-order VAR model on [flow(t), return(t)]. To control for market level expectations, I also include the lagged market level average fund flow and the lagged market level fund return on the right-hand side. Expected flows are the forecasts of the VAR model, and unexpected flows are its residuals. I decompose flow in this manner because fund managers likely factor into their strategies how much money is expected to flow into or out of their funds. The impact of fund flows on a fund's investment strategy can be alleviated if the flows can be predicted ahead of time.
In contrast, unexpected fund flows come as shocks, and a fund manager must deal with them in ways that may affect the flow, return, and liquidity of the stocks held by the fund.

First, I apply the same clustering procedure to the mutual fund quarterly holdings. Because restricting the sample to mutual funds means that further results come from a smaller sample, I check that statistics from the mutual fund sample conform to those from the entire sample of institutional investors. Reassuringly, the properties of the clusters produced from the mutual fund sample qualitatively match those of the clusters obtained from the sample of all institutions.21 Thus, cluster analysis on the mutual fund sample should give the same inferences as analysis on the full sample of institutional investors.

Having checked that the mutual fund sample is representative, I proceed to the fund flow analysis. Before showing that fund flows are correlated at the cluster level, I offer a reason to expect this. It is well documented that fund investors actively pull money out of loser funds and put money into winner funds. Since funds within a clientele hold similar portfolios, they should also perform similarly. If fund investors chase performance, then correlated performance should lead to correlated fund flows. To verify this argument, I test whether funds sharing the same cluster have more correlated performance than funds residing in different clusters. Results in Panel A of Table 9 support this hypothesis. Column 1 reports results from a regression of individual fund performance on market level fund performance and reveals a strong market level performance correlation. However, as the second column shows, once I add a cluster level performance variable to the regression, the market level influence decreases dramatically.
In fact, the cluster level performance variable dominates and absorbs the market level variable's explanatory power, suggesting that most of the fund level performance correlation takes place at the cluster level.

Having found excess performance correlation at the cluster level, I test whether fund flows are also correlated at the cluster level by regressing individual fund flow on cluster level fund flow. For each fund, I define the cluster level fund flow, $Flow_{c,t}$, as the average flow of all funds sharing the same cluster, excluding the fund itself. Similarly, I define the market level fund flow, $Flow_{m,t}$, as the average flow of all funds in the market, excluding the fund itself. I then regress individual fund flow ($Flow_{i,t}$) on its cluster level average flow ($Flow_{c,t}$) and its associated market level average flow ($Flow_{m,t}$). If fund flows are correlated at the cluster level, cluster level average flow should provide explanatory power for individual flow beyond that provided by market level average flow alone.

21 The clustering results for the mutual fund datasets are available upon request.

I add several controls to this baseline regression. As mentioned in the data section, I do not classify institutions using self-reported styles. Among my reasons for not doing so is that, except for mutual funds, style is often not reported. Since the analysis in this section is limited to mutual funds, and mutual funds of the same style tend to have correlated portfolios, I add to the baseline regression a style flow variable, $Flow_{s,t}$, to control for correlated flows generated by funds sharing the same investment style. For each fund, I define $Flow_{s,t}$ as the average flow of all funds that share its style, excluding the fund itself. Previous literature also finds that fund investors actively chase past performance.
It also finds that fund flows are positively autocorrelated. I therefore include the current and previous month net returns, as well as the fund's previous month flow, as control variables. For each fund, I run the following time-series regression:

$$Flow_{i,t} = \alpha + \beta_c Flow_{c,t} + \beta_m Flow_{m,t} + \gamma X_t + \varepsilon_{i,t} \qquad (2)$$

where $X_t$ denotes the control variables mentioned above. I normalize all variables in this regression by their own standard deviations so that the economic significance of different regressors can be compared directly. Table 9 shows the regression results, reported with and without the cluster flow variable for comparison. I also include lagged cluster and lagged market flows in the regressions to study possible contagion effects among funds in the same cluster.

Table 9 reveals ample evidence of flow comovement within the same cluster. The coefficient on contemporaneous cluster flow is both statistically and economically significant. With raw flow as the dependent variable, the contemporaneous cluster flow coefficient $\beta_c$ is around 0.14, with an associated t-statistic of 16. Approximately 70% of the time-series $\beta_c$ coefficients are positive, and 34% of them exceed the 5% one-tailed critical value. Adding cluster level flow to the baseline regression improves the adjusted R-squared of the baseline model by 3% to 5%, depending on which component of fund flow is under study. The significance of the cluster level variable in both the expected and unexpected flow regressions implies that both components contribute to the commonality of overall fund flow. That the cluster level variable loads in the expected flow regression is not surprising, since funds sharing the same cluster hold similar portfolios and thus have correlated past performance. However, the corresponding coefficient in the unexpected flow regression is somewhat surprising.
This result suggests that even if funds are sophisticated enough to smooth out the impact of expected flows, a systematic liquidity shock will affect each cluster. Moreover, the significance of cluster level flow after controlling for individual past performance suggests that there might be a nonlinear relationship between flow and past return; contemporaneous cluster flow may pick up the residual effects left out by the linear specification. An alternative explanation is an externality effect. When investors decide which funds to invest in, they may look not only at each fund's individual performance but also at the performance of similar funds. Therefore, failures of some funds may cause investors to pull money out of other funds of the same type, even if those other funds perform relatively well. Finally, consistent with the existing literature, the style flow variable is significant even after controlling for performance, suggesting that style does generate flow commonality. However, the coefficient on the cluster level flow variable is typically twice as large as that on the style flow variable, suggesting that clustering generates even more flow commonality than style.

C. Asymmetric Comovement Induced by Fund Flow

The previous subsection shows that funds sharing the same cluster tend to have large inflows and outflows around the same time, suggesting commonality in financial constraints. Because the clustering is based on fund holdings, the funds in a cluster hold a large percentage of the shares outstanding of the stocks held by the cluster. Since few other investors stand ready to take the other side, the correlated liquidity shocks aggregate into a market-wide liquidity shock for the stocks that funds in the cluster decide to buy or sell. These market-wide liquidity shocks affect prices as well as other trading-related stock characteristics.
Thus, the correlated liquidity story predicts that comovement among stocks held by the same cluster should increase with the magnitude of fund flow.

Realistically, common liquidity shocks to stocks held by the same marginal investor are not the only story that can explain the phenomena mentioned so far. A missing risk factor, or characteristics common to the stocks in question, can also generate excessive return comovement. Although the missing factor mechanism plays at least a partial role in determining excessive return comovement, a missing risk factor is unlikely to generate the same observed comovement in liquidity or trading volume if it contains only public information. Moreover, fund flow is orthogonal to the time-series variation of the missing risk factor. Thus, if my cluster level variable provides additional explanatory power for comovement during periods of large inflows or outflows, the budget constraint mechanism must have a first-order effect.

To show that part of the explanatory power of the cluster level variable comes from funds changing their positions when facing liquidity shocks, I add two interaction terms to equation (1), my baseline comovement regression. Specifically, I interact my cluster level measure with a dummy variable, large_inflow, indicating whether the cluster incurs a large inflow, and with a dummy variable, large_outflow, indicating a large outflow. The two interactions capture not only the effects of the magnitude of the shocks but also asymmetric effects arising from their direction. Although funds subject to negative flow shocks have to liquidate some of their positions, funds do not necessarily increase their positions after a positive inflow. Thus, fund flows should affect prices and other stock characteristics asymmetrically.
To test for fund flow effects, I run the following model:

$$T_{j,t} = \alpha + \beta_c CT_{j,t} + \beta_{Inflow}\, CT_{j,t} \times D^{Inflow}_{j,t} + \beta_{Outflow}\, CT_{j,t} \times D^{Outflow}_{j,t} + \beta' X_{j,t} + \varepsilon_{j,t} \qquad (3)$$

where $D^{Inflow}_{j,t} = 1$ if the average cluster flow is greater than $\bar{D}^{Inflow}$ and $D^{Inflow}_{j,t} = 0$ otherwise; $D^{Outflow}_{j,t}$ is defined analogously, equal to 1 if the average cluster flow is below $\bar{D}^{Outflow}$. I use the upper and lower 10th percentile cutoffs of historical cluster flow as the values of $\bar{D}^{Inflow}$ and $\bar{D}^{Outflow}$, respectively.

The regression results for turnover, return, and liquidity are summarized in Table 10. The signs of the two interaction coefficients are consistent across all specifications. Using turnover as an example, both interaction coefficients are positive, suggesting higher cluster level comovement when funds in the cluster incur large flows. However, whereas the coefficient on the large-inflow interaction is generally not significant, the coefficient on the large-outflow interaction is always significant. This difference implies an asymmetric effect of fund flow, consistent with the wealth constraint story. When funds face large inflows, they have the flexibility to smooth out their investment over time so that price impact is minimized. In contrast, when funds are subject to large outflows through redemptions, they are forced to liquidate assets relatively quickly, which can generate a large price impact.

Finally, expected and unexpected fund flows may have different degrees of price impact, and it is unclear which should have the stronger effect. On the one hand, if large flows are expected, funds may prepare for them and choose investment strategies that mitigate their impact. On the other hand, as shown by the flow correlation test in the previous subsection, expected fund flows tend to be more correlated than unexpected ones. Therefore, I perform separate analyses for expected and unexpected flows.
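The dummy construction in equation (3) can be sketched as follows. It takes the “upper and lower 10th percentile cutoffs” to be the 90th and 10th percentiles of historical cluster flow, computed with `statistics.quantiles`; whether that history is pooled across clusters or specific to each cluster is not stated, so the `historical_flow` argument is an assumption, as are the function names.

```python
from statistics import quantiles


def flow_dummies(cluster_flow, historical_flow):
    """Large-inflow / large-outflow indicators from decile cutoffs of historical flow.

    cluster_flow: average cluster flow per period for the stock's cluster.
    historical_flow: flow history used to set the cutoffs (an assumption).
    """
    deciles = quantiles(historical_flow, n=10, method="inclusive")
    lower, upper = deciles[0], deciles[-1]  # 10th and 90th percentiles
    d_inflow = [1 if f > upper else 0 for f in cluster_flow]
    d_outflow = [1 if f < lower else 0 for f in cluster_flow]
    return d_inflow, d_outflow, lower, upper


def interaction_terms(ct, d_inflow, d_outflow):
    """CT x D_inflow and CT x D_outflow, the two extra regressors in equation (3)."""
    return ([c * d for c, d in zip(ct, d_inflow)],
            [c * d for c, d in zip(ct, d_outflow)])
```

The two interaction columns are simply appended to the regressors of the stock-level time-series regression, so the per-stock estimation used for equation (1) carries over unchanged.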
Comparing models that use expected flows with those that use unexpected flows, I find that unexpected outflows seem to have stronger effects. This suggests that funds do factor next period's expected flows into their investment strategies.

VII. Conclusion

Since the asset management industry is a dominant player in the U.S. equity market, institutional investment behavior should affect the trading, pricing, and liquidity of stocks. Proceeding from this intuition, this paper shows that the existence of institutional clienteles can partially account for all of the previously documented comovement findings. Using a novel approach that applies standard clustering algorithms to institutional holdings, I first find that the majority of institutional investors fall into a few distinct clienteles. This partitioning appears to be stable and seems to capture some fundamental economic characteristics. Second, I find that institutional clienteles have asset pricing implications: there appears to be excessive comovement of turnover, return, and liquidity at the clientele level. Finally, I present evidence that institutional investors are wealth constrained, which offers one possible explanation for these excessive clientele-level commonalities.

This paper uncovers a stable clustering of institutional holdings. One may argue, however, that it is equally important to uncover the economic forces behind this clustering. Future research may uncover why institutional holdings cluster together and which institutional characteristics are related to this phenomenon. Furthermore, this paper has important methodological implications for risk management. Kyle and Xiong (2001) and Ross (2001) point out that comovement arising from intrinsic features of the asset management industry implies a flaw in current risk evaluation methodology.
Currently, researchers evaluate portfolio risk based on the historical correlations of the returns of the underlying stocks. This paper suggests that when evaluating how well diversified a fund is, one should also take into account which other funds hold the same stocks and how healthy their financial conditions are.

References

Allen, Franklin, and Douglas Gale, 1994, Limited market participation and volatility of asset prices, American Economic Review 84, 933-955.
Ang, Andrew, and Joseph Chen, 2002, Asymmetric correlations of equity portfolios, Journal of Financial Economics 63, 443-494.
Barberis, Nicholas, Andrei Shleifer, and Jeffrey Wurgler, 2005, Comovement, Journal of Financial Economics 75, 283-317.
Bennett, James, Richard Sias, and Laura Starks, 2003, Greener pastures and the impact of dynamic institutional preferences, Review of Financial Studies 16, 1203-1238.
Boudoukh, Jacob, Matthew Richardson, Richard Stanton, and Robert Whitelaw, 1997, Pricing mortgage-backed securities in a multifactor interest rate environment: A multivariate density estimation approach, Review of Financial Studies 10, 405-446.
Boyer, Brian H., Tomomi Kumagai, and Kathy Zhichao Yuan, 2005, How do crises spread? Evidence from accessible and inaccessible stock indices, AFA 2003 Washington, DC Meetings.
Boyer, Brian, and Lu Zheng, 2004, Who moves the market? A study of stock prices and sector cashflows, Working paper, University of Michigan.
Brown, Stephen, and William Goetzmann, 1997, Mutual fund styles, Journal of Financial Economics 43, 373-399.
Brunnermeier, Markus, and Lasse Pedersen, 2005, Market liquidity and funding liquidity, Working paper, New York University.
Carleton, Willard, and Victor McGee, 1970, Piecewise regression, Journal of the American Statistical Association, 1109-1124.
Chan, Louis, Hsiu-Lang Chen, and Josef Lakonishok, 2002, On mutual fund investment styles, Review of Financial Studies 15, 1407-1437.
Chevalier, Judith, and Glenn Ellison, 1997, Risk taking by mutual funds as a response to incentives, Journal of Political Economy 105, 1167-1200.
Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam, 2000, Commonality in liquidity, Journal of Financial Economics 56, 3-28.
Collin-Dufresne, Pierre, Robert Goldstein, and Spencer Martin, 2001, The determinants of credit spread changes, Journal of Finance 56, 2177-2208.
Connolly, Robert, and Albert Wang, 1998, On stock market return comovements: Macroeconomic news, dispersion of beliefs, and contagion, Working paper, Rice University.
Connolly, Robert, and Albert Wang, 2003, International equity market comovements: Economic fundamentals or contagion?, Pacific-Basin Finance Journal 11, 23-43.
Coughenour, Jay, and Mohsen Saad, 2004, Common market makers and commonality in liquidity, Journal of Financial Economics 73, 37-69.
Coval, Joshua, and Erik Stafford, 2005, Asset fire sales (and purchases) in equity markets, Working paper, Harvard University.
Da, Zhi, and Pengjie Gao, 2005, Clientele change, liquidity shock, and the return on financially distressed stocks, Working paper, Northwestern University.
Edelen, Roger, 1999, Investor flows and the assessed performance of open-end mutual funds, Journal of Financial Economics 53, 439-466.
Elton, Edwin, and Martin Gruber, 1970, Improved forecasting through the design of homogeneous groupings, Journal of Business 44, 432-450.
Frazzini, Andrea, and Owen Lamont, 2005, Dumb money: Mutual fund flows and the cross-section of stock returns, Working paper, Yale University.
Gabaix, Xavier, Arvind Krishnamurthy, and Olivier Vigneron, 2006, Limits of arbitrage: Theory and evidence from the mortgage-backed securities market, Journal of Finance, forthcoming.
Gromb, Denis, and Dimitri Vayanos, 2002, Equilibrium and welfare in markets with financially constrained arbitrageurs, Journal of Financial Economics 66, 361-407.
Hasbrouck, Joel, 2005, Trading costs and returns for US equities: The evidence from daily data, Working paper, New York University.
Hasbrouck, Joel, and Duane Seppi, 2001, Common factors in prices, order flows and liquidity, Journal of Financial Economics 59, 383-411.
Huberman, Gur, and Dominika Halka, 2001, Systematic liquidity, Journal of Financial Research 24, 161-178.
Ippolito, Richard, 1992, Consumer reaction to measures of poor quality: Evidence from the mutual fund industry, Journal of Law and Economics 35, 45-70.
Kacperczyk, Marcin, Clemens Sialm, and Lu Zheng, 2005, On the industry concentration of actively managed equity mutual funds, Journal of Finance 60, 1983-2011.
Karolyi, Andrew, and Rene Stulz, 1996, Why do markets move together? An investigation of U.S.-Japan stock return comovements, Journal of Finance 51, 951-986.
King, Mervyn, and Sushil Wadhwani, 2000, Transmission of volatility between stock markets, Review of Financial Studies 3, 5-33.
Kodres, Laura E., and Matthew Pritsker, 2002, A rational expectations model of financial contagion, Journal of Finance 57, 769-799.
Kyle, Albert, and Wei Xiong, 2001, Contagion as a wealth effect, Journal of Finance 56, 1401-1440.
Lee, Charles, Andrei Shleifer, and Richard H. Thaler, 1991, Investor sentiment and the closed-end fund puzzle, Journal of Finance 46, 75-109.
Longin, Francois, and Bruno Solnik, 2001, Extreme correlation of international equity markets, Journal of Finance 56, 649-676.
Merton, Robert, 1987, A simple model of capital market equilibrium with incomplete information, Journal of Finance 42, 483-510.
Newman, Yigal, and Michael Rierson, 2004, Illiquidity spillovers: Theory and evidence from European telecom bond issuance, Working paper, Stanford University.
Nieuwerburgh, Stijn Van, and Laura Veldkamp, 2006, Information acquisition and portfolio under-diversification, Working paper, New York University.
Pasquariello, Paolo, 2006, Imperfect competition, information heterogeneity, and financial contagion, Review of Financial Studies.
Pindyck, Robert, and Julio J. Rotemberg, 1993, The comovement of stock prices, Quarterly Journal of Economics 108, 1073-1104.
Ross, Stephen, 2001, Discussion: Contagion as a wealth effect, Journal of Finance 56, 1440-1443.
Shleifer, Andrei, and Robert Vishny, 1997, The limits of arbitrage, Journal of Finance 52, 35-55.
Sirri, Erik, and Peter Tufano, 1998, Costly search and mutual fund flows, Journal of Finance 53, 1589-1622.
Veldkamp, Laura, 2005, Information markets and the comovement of asset prices, Review of Economic Studies, forthcoming.
Wermers, Russ, 2003, Mutual fund performance: An empirical decomposition into stock-picking talent, style, transaction costs, and expenses, Journal of Finance 55, 1655-1703.
Yuan, Kathy, 2005, Asymmetric price movements and borrowing constraints: A REE model of crisis, contagion, and confusion, Journal of Finance 60, 379-411.

Table 1 Institutional Trading (All Institutions)
This table reports the median number of stocks held and traded by a typical institutional investor each quarter. Institutional ownership data are obtained from CDA Spectrum. The sample period is 1980-2003. We delete the first-quarter and last-quarter observations for each fund to avoid artificially counting purchases or sales when funds enter or exit the database. If a fund has a missing report in a quarter, we do not count its trades in the following quarter. A fund is considered to buy (sell) a stock if it increases (decreases) its shares from the last quarter. A purchase is labeled a "new buy" if the stock was not in the portfolio but is newly added in the current quarter. A sale is labeled an "exit" if the stock is completely eliminated from the portfolio.
Year   # of funds   Hold   Buy   Sell   New buy   Exit   Unchanged
1980   438    190   83    75    23   16   48
1981   478    191   83    78    21   19   49
1982   502    194   85    83    25   22   48
1983   523    213   99    87    31   24   51
1984   584    216   94    92    27   26   56
1985   635    227   106   93    33   27   55
1986   687    236   112   99    34   31   55
1987   739    246   114   105   34   31   59
1988   779    247   117   112   31   28   45
1989   752    263   115   103   32   29   75
1990   841    253   104   102   26   28   76
1991   872    257   112   97    31   25   73
1992   949    266   122   101   33   27   70
1993   966    280   134   108   40   30   68
1994   997    290   129   124   40   36   73
1995   1099   298   138   123   40   36   73
1996   1083   308   153   124   47   38   68
1997   1199   312   155   130   46   39   67
1998   1323   310   155   132   50   43   66
1999   1310   313   156   146   50   50   62
2000   1514   310   162   147   54   52   54
2001   1579   287   146   129   43   40   52
2002   1601   285   143   131   38   39   49
2003   1690   288   148   129   41   36   47
Total  964    262   124   111   36   32   60

Table 2 Summary Statistics for Liquidity and Turnover
Panel A summarizes the liquidity variables. The proportional quoted spread and the proportional effective spread are used as liquidity measures. TAQ data from 1993-2003 are used to compute the spread measures. Daily spread measures are calculated as the simple average of the spreads of every transaction and quote during a day. Panel B summarizes the daily turnover measure over the sample period 1980-2003. All reported statistics are cross-sectional statistics of the stocks' time-series means. For each measure, separate summary statistics are reported for all stocks in the database and for stocks with institutional ownership.

Panel A: Daily Liquidity Measures (cross-sectional statistics of time-series means)
Measure   N   Mean   Median   Std. Deviation
Proportional Quoted Spread (Whole Sample)   15,044   0.0437   0.0259   0.0575
Proportional Quoted Spread (Held by Institutions)   15,034   0.0424   0.0257   0.0546
Proportional Effective Spread (Whole Sample)   14,722   0.0314   0.0194   0.0392
Proportional Effective Spread (Held by Institutions)   14,722   0.0306   0.0192   0.0377

Panel B: Daily Turnover (cross-sectional statistics of time-series means)
Measure   N   Mean   Median   Std. Deviation
Turnover (Whole Sample)   21,352   0.0059   0.0030   0.0848
Turnover (Held by Institutions)   20,130   0.0060   0.0031   0.0871

Table 3 Number of Clusters
Hierarchical clustering is performed each quarter based on the pair-wise distances between portfolios. All statistics are simple averages of the quarterly values within a year. Column 2 is the total number of funds per quarter. Column 3 is the average number of clusters obtained each quarter. Columns 4 and 5 report the average percentage of funds covered by the largest 10 clusters in terms of fund count and market value, respectively.

Year   # of funds   # of clusters   Largest 10/total (count)   Largest 10/total (assets)
1980   475    39    86.86%   94.73%
1981   511    45    82.79%   93.08%
1982   536    48    82.48%   93.05%
1983   574    55    79.65%   92.21%
1984   631    63    76.41%   91.58%
1985   694    68    73.70%   90.01%
1986   746    83    67.67%   87.44%
1987   809    88    65.58%   85.66%
1988   834    86    66.29%   86.58%
1989   827    86    67.42%   86.24%
1990   892    84    68.91%   86.79%
1991   932    88    68.81%   88.23%
1992   1018   107   61.76%   83.40%
1993   1035   115   57.66%   82.53%
1994   1086   124   58.55%   83.52%
1995   1181   136   58.00%   83.13%
1996   1191   136   57.71%   81.82%
1997   1320   161   57.98%   84.20%
1998   1452   163   58.70%   86.04%
1999   1465   140   63.79%   90.36%
2000   1646   151   63.22%   89.82%
2001   1673   158   62.22%   90.24%
2002   1741   175   59.42%   88.17%
2003   1800   179   57.26%   85.35%

Table 4 Mutual Fund Clusters and Styles
The table summarizes the relationship between the clusters obtained from the mutual fund holdings data and the mutual funds' ICDI objectives.
For each cluster, we identify a dominant style, defined as the style that accounts for the largest number of funds within the cluster. Panel A reports the average percentage of funds covered by the dominant style. Only the largest 10 clusters and clusters containing more than 10 funds are analyzed. Panel B reports the percentage of the total number of clusters belonging to each style. Each cluster is associated with only one dominant style. "AG", "BL", "GI", "IN", "LG", "SF", "TR", "UT", and "IE" represent aggressive growth, balanced, growth and income, income, long-term growth, sector funds, total return, utility funds, and international equity, respectively.

Panel A: Average percentage of funds within a cluster covered by the dominant style
Year   AG   BL   GI   IN   LG   SF   TR   UT   IE
1993   66.32   28.57   48.20   48.76   48.43   46.15   30.30   76.53   .
1994   79.23   40.00   44.52   54.89   55.55   69.95   33.33   95.83   50.00
1995   68.03   43.33   43.42   44.57   51.66   78.87   36.36   87.16   50.00
1996   67.30   34.84   40.54   54.10   50.96   85.52   42.48   89.63   .
1997   83.53   36.67   43.01   43.09   49.34   61.30   34.38   91.29   .
1998   75.16   40.00   41.65   55.84   52.65   75.74   35.42   86.36   .
1999   83.98   30.00   43.69   38.59   51.46   76.32   .   83.85   100.00
2000   88.63   .   41.78   43.81   47.12   76.90   .   81.83   100.00
2001   88.12   .   43.67   41.63   53.28   72.17   30.00   87.76   100.00
2002   87.30   .   40.50   .   54.47   83.28   .   89.25   100.00
2003   90.87   .   40.29   .   55.39   84.82   .   95.66   .
Whole   79.86   36.20   42.84   47.25   51.85   73.73   34.61   87.74   83.33

Panel B: Average percentage of clusters for each style
Year   AG   BL   GI   IN   LG   SF   TR   UT   IE
1993   7.5   3.8   24.3   8.8   51.8   4.3   8.7   4.6   .
1994   13.0   3.6   20.1   8.7   52.2   4.7   3.1   3.4   3.1
1995   11.7   5.7   19.5   6.3   52.5   5.0   3.0   3.1   3.6
1996   15.3   4.1   16.8   5.0   50.4   5.4   3.0   3.8   .
1997   13.7   2.6   15.8   7.5   49.1   9.4   2.5   3.1   .
1998   12.4   2.7   17.2   4.9   49.0   9.6   2.8   3.5   .
1999   21.7   3.0   17.1   3.7   40.8   10.7   .   3.0   3.0
2000   18.2   .   13.3   6.3   48.0   13.2   .   4.1   3.3
2001   17.5   .   15.9   4.4   40.1   16.6   3.3   3.3   3.3
2002   25.7   .   17.1   .   35.9   17.1   .   3.4   3.3
2003   22.3   .   19.8   .   37.8   16.1   .   4.1   .
Whole   16.3   3.7   17.9   6.2   46.2   10.2   3.8   3.6   3.3

Table 5 Transition Rate of Pair-Wise Connections Between Funds
In each quarter, we study the pair-wise connections between funds. A connection takes the value 1 if the two funds fall into the same cluster and 0 otherwise. We then compute the percentage of pair-wise connections that remain unchanged in the next quarter; the higher this percentage, the more stable the clustering. Column 2 counts the number of pair-wise connections that are the same as in the previous quarter, and column 3 counts the total number of pair-wise connections for funds that exist in both the previous and the current quarter. Column 4 is the transition rate, the percentage of connections that changed since the previous quarter. Column 5 reports the bootstrapped transition rate under the null of no cross-sectional structure. The last column reports the standard deviation of the bootstrapped null distribution.

Year   Stay   Total   Transition rate   Null transition rate   Std.
1981   64309    84993     0.2434   0.3169   0.0050
1982   71634    90354     0.2071   0.3684   0.0064
1983   69886    90112     0.2246   0.3456   0.0071
1984   86818    109039    0.2052   0.3343   0.0058
1985   104785   129784    0.1935   0.2862   0.0043
1986   128892   156826    0.1782   0.2064   0.0024
1987   139789   163463    0.1451   0.1967   0.0024
1988   172574   200615    0.1397   0.2051   0.0021
1989   180288   210005    0.1407   0.2218   0.0033
1990   180133   213918    0.1558   0.2170   0.0023
1991   239758   283542    0.1542   0.2028   0.0016
1992   266787   304078    0.1231   0.1567   0.0012
1993   303359   341548    0.1122   0.1425   0.0012
1994   273712   306200    0.1061   0.1521   0.0014
1995   366303   409872    0.1066   0.1608   0.0016
1996   376703   421103    0.1066   0.1433   0.0013
1997   357245   405887    0.1204   0.1803   0.0018
1998   464076   526623    0.1193   0.1533   0.0013
1999   448169   509174    0.1194   0.1626   0.0013
2000   593227   686772    0.1366   0.1950   0.0014
2001   751530   866017    0.1323   0.2020   0.0015
2002   761940   881229    0.1353   0.1926   0.0015
2003   914199   1040793   0.1221   0.1914   0.0016
Total                     0.1490   0.2145

Table 6 Cluster Transition Probability
Inter-temporal links are established for each cluster. A dummy transition variable is created for each fund-quarter observation. It takes the value 1 if a fund changes its associated cluster from one quarter to the next and 0 otherwise. A logistic regression of the transition variable is then estimated, with institutional type and the fund's own past net performance as explanatory variables.

Response: Transfer (1) 16141; Not transfer (0) 36691; p = 0.3055
Parameter   DF   Estimate   Chi-Square   Pr>ChiSq   Odds ratio
Intercept   1   -0.8127   2931.2696   <.0001
Bank   1   -1.0383   1700.0156   <.0001
Insurance   1   0.0157   0.2619   0.6088
Mutual fund   1   0.3910   159.7305   <.0001
Independent advisor   1   0.5120   939.1836   <.0001
Past quarter net performance   1   -2.4021   45.7134   <.0001   0.934
Past year net performance   1   -0.9236   152.3284   <.0001   0.842
Year fixed effect   22      578.8553   <.0001
Quarter fixed effect   3      13.9422   0.003
Odds ratios by institutional type: Bank vs. Other 0.314; Insurance vs. Other 0.901; Mutual vs. Other 1.312; IA vs. Other 1.481

Table 7 Clusters and Comovement
The characteristics studied are turnover, return, quoted spread, and effective spread. For each stock, its individual daily characteristic is regressed on the cluster average characteristic, the contemporaneous market characteristic, and the average characteristics of portfolios matched on size, book-to-market, past-year return, and Fama-French 48 industry, respectively. All right-hand-side variables exclude the stock under study. The reported coefficients are the cross-sectional means of the firm-by-firm time-series regressions. For the cluster variable, the percentages of positive and of positive-significant coefficients among the individual time-series regressions are also reported. The sample period is 1980-2003 for the turnover and return regressions and 1993-2003 for the quoted spread and effective spread regressions.

Cluster, coefficient (std error; % positive; % positive significant):
Turnover   0.0230 (0.0010; 56.54%; 22.12%)
Return   0.0454 (0.0008; 61.92%; 15.67%)
Quoted Spread   0.04418 (0.00371; 56.21%; 36.35%)
Effective Spread   0.0367 (0.0033; 56.32%; 31.78%)

Remaining rows report coefficient (std error), with two columns each for Turnover; Return; Quoted Spread; and Effective Spread, in that order:
Market   0.0136 (0.0021), 0.0129 (0.0021); -0.0073 (0.0014), -0.0102 (0.0014); 0.02429 (0.008693), 0.019404 (0.008491); 0.0455 (0.0073), 0.0384 (0.0072)
Size   0.0782 (0.0027), 0.0732 (0.0027); 0.1211 (0.0025), 0.1118 (0.0025); 0.262833 (0.009907), 0.255358 (0.009703); 0.2267 (0.0082), 0.2242 (0.0081)
BM   0.0023 (0.0025), 0.0011 (0.0025); -0.0064 (0.0025), -0.0071 (0.0025); -0.01243 (0.009537), -0.01409 (0.009245); -0.0065 (0.0077), -0.0090 (0.0075)
Momentum   0.0333 (0.0023), 0.0304 (0.0023); 0.0407 (0.0022), 0.0375 (0.0021); 0.056273 (0.008637), 0.052262 (0.008339); 0.0580 (0.0072), 0.0542 (0.0070)
Industry   0.0466 (0.0016), 0.0459 (0.0016); 0.0752 (0.0017), 0.0739 (0.0017); 0.126926 (0.006569), 0.117616 (0.006283); 0.1054 (0.0051), 0.0980 (0.0049)
N / Mean R-square / Mean Adjusted R-square   9253 / 0.0958 / 0.0843, 9096 / 0.1012 / 0.0879; 9944 / 0.0861 / 0.0753, 9658 / 0.0883 / 0.0762; 5510 / 0.4818 / 0.4732, 5401 / 0.4988 / 0.4891; 5247 / 0.4107 / 0.4005, 5163 / 0.4238 / 0.4120

Table 8 Institutional Trading (Mutual Funds)
The table reports the median number of stocks held and traded by a mutual fund each quarter. Mutual fund ownership data are obtained from CDA Spectrum. The sample period is 1980 to 2003. We delete the first-quarter and last-quarter observations for each fund to avoid artificially counting purchases or sales when funds enter or exit the database. If a fund has a missing report in a quarter, we do not count its trades in the immediately following quarter. A fund is considered to buy (sell) a stock if it increases (decreases) its shares from the last quarter. A purchase is labeled a "new buy" if the stock was not in the portfolio but is newly added in the current quarter. A sale is labeled an "exit" if the stock is completely eliminated from the portfolio. Panels A, B, C, and D report statistics for funds with 0% < flow < 19%, flow >= 19%, -7% < flow < 0%, and flow <= -7%, respectively.

Period   # of funds   Flow   Hold   Buy   Sell   New buy   Exit   Unchanged
Panel A: Institutions with 0% < flow < 19%
1980-1984   61    3.39%    53   21   12   8    7    26
1985-1989   96    4.01%    56   23   15   10   8    26
1990-1994   173   4.13%    64   27   15   9    8    30
1995-1999   195   4.18%    74   33   19   11   11   28
2000-2003   269   3.40%    76   38   23   11   11   22
Whole       154   3.84%    64   28   17   10   9    26
Panel B: Institutions with flow >= 19%
1980-1984   7     33.08%   54   32   13   14   9    21
1985-1989   16    31.77%   48   33   12   13   9    15
1990-1994   32    33.10%   57   42   14   15   10   17
1995-1999   44    35.77%   63   46   16   14   12   12
2000-2003   42    55.37%   63   45   17   14   11   12
Whole       28    37.09%   57   39   14   14   10   16
Panel C: Institutions with -7% < flow < 0%
1980-1984   99    -2.54%   47   12   11   5    6    27
1985-1989   147   -2.81%   51   16   15   7    8    26
1990-1994   145   -2.20%   57   18   16   8    8    29
1995-1999   179   -2.46%   66   25   24   10   11   27
2000-2003   376   -2.76%   74   28   31   12   12   21
Whole       181   -2.54%   58   20   19   8    9    26
Panel D: Institutions with flow <= -7%
1980-1984   9     -11.19%  62   16   17   8    10   31
1985-1989   30    -10.41%  53   18   23   10   13   20
1990-1994   22    -11.12%  58   18   27   10   12   22
1995-1999   58    -11.54%  62   23   33   11   13   16
Whole       37    -11.04%  61   20   28   10   12   20

Table 9 Flow Regressions
For each fund, we construct its cluster fund flow, that is, the average flow of the funds sharing the same cluster (excluding the fund itself). We run time-series regressions of individual fund flow on the market average flow (also excluding the fund itself) and the cluster average flow for each fund. The reported coefficients are the median coefficients across all funds.

Variable   flow_cluster_std   flow_style_std   lag_cluster_std   lag_style_std   flow_mkt_std   lag_mkt_std   lag_flow_std   return_std
flow_cluster_std   1.000
flow_style_std   0.318   1.000
lag_cluster_std   0.667   0.274   1.000
lag_style_std   0.108   0.272   0.131   1.000
flow_mkt_std   0.321   0.690   0.287   0.237   1.000
lag_mkt_std   0.282   0.588   0.325   0.270   0.863   1.000
lag_flow_std   0.259   0.174   0.313   0.076   0.148   0.168   1.000
return_std   0.115   0.165   0.064   0.023   0.195   0.079   0.035   1.000
lag_return_std   0.115   0.157   0.118   0.060   0.184   0.196   0.070   0.430

Panel A: Performance Correlation
Cluster Performance (t)   0.8266 (std error 0.0079; 98.47% positive; 93.55% positive significant)
Cluster Performance (t-1)   0.0091 (std error 0.0054; 52.52% positive; 7.85% positive significant)
Market Performance (t)   0.8737 (0.0026), 0.1217 (0.0080)
Market Performance (t-1)   -0.0130 (0.0041), -0.0062 (0.0042)
Self Performance (t-1)   -0.0078 (0.0043), -0.0040 (0.0040)
Log(tna)   -0.0183 (0.0039), -0.0069 (0.0028)
N / Mean R-square / Mean Adjusted R-square   1629 / 0.7903 / 0.7708, 1567 / 0.9008 / 0.8872

Panel B: Flow Correlation (dependent variable: Flow, Expected flow, Unexpected flow; where six values are reported, two columns each in that order)
Cluster Flow (t), coefficient (std error; % positive; % positive significant): Flow 0.1430 (0.0087; 70.51%; 34.88%); Expected flow 0.1472 (0.0074; 71.44%; 33.65%); Unexpected flow 0.1570 (0.0084; 74.11%; 33.58%)
Cluster Flow (t-1): -0.0506 (0.0063; 41.82%; 8.26%); 0.0539 (0.0053; 61.47%; 17.79%); 0.0139 (0.0054; 55.25%; 8.90%)
Style Flow (t)   0.0783 (0.0012), 0.0665 (0.0074); 0.0274 (0.0095), 0.0759 (0.0111); 0.0616 (0.0062), 0.0727 (0.0069)
Style Flow (t-1)   -0.0068 (0.0061), -0.0050 (0.0060); 0.0664 (0.0090), 0.0370 (0.0102); 0.0195 (0.0048), 0.0107 (0.0056)
Market Flow (t)   0.0993 (0.0080), 0.0699 (0.0079); 0.1472 (0.0094), 0.0929 (0.0124); 0.0789 (0.0063), 0.0551 (0.0070)
Market Flow (t-1)   -0.0621 (0.0060), -0.0446 (0.0068); 0.0086 (0.0076), -0.0284 (0.0112); -0.0084 (0.0051), -0.0130 (0.0054)
Flow (t-1)   0.3659 (0.0090), 0.3491 (0.0091)
Self Performance (t)   0.0069 (0.0045), 0.0025 (0.0045)
Self Performance (t-1)   0.0116 (0.0036), 0.0118 (0.0035)
Log(tna)   0.0182 (0.0117), 0.0075 (0.0119); 0.0020 (0.0101), -0.0425 (0.0097); 0.0243 (0.0065), 0.0108 (0.0059)
N / Mean R-square / Mean Adjusted R-square   1802 / 0.4818 / 0.4097, 1743 / 0.5175 / 0.4357; 1506 / 0.2958 / 0.2513, 1495 / 0.4049 / 0.3505; 1506 / 0.1183 / 0.0629, 1495 / 0.2031 / 0.1311

Table 10 Flows and Cluster-Level Comovement
The specification of the test is the same as in Table 7, except that two more variables are added to the regression. The first new variable, Cluster*Large_Inflow, is an interaction term between the cluster-level characteristic and a dummy variable indicating whether the average flow of the funds in the clusters that hold the stock is a large inflow. The second new variable is constructed likewise, except that the Large_Inflow dummy is replaced by a Large_Outflow dummy. An inflow (outflow) is classified as large if it is above (below) the top (bottom) tenth percentile of the flow distribution.
Columns: three specifications each for Turnover, Return, Quoted Spread, and Effective Spread, in that order.
Cluster (coefficient, std error, % positive, % positive significant): 0.0196 0.0264 0.0050 0.0050 53.13% 55.48% 19.82% 19.09% 0.0180 0.0025 56.04% 11.07% 0.0168 0.0024 56.58% 11.31% 0.0154 0.0024 56.20% 9.70% 0.0385 0.0287 0.0315 0.0061 0.0059 0.0057 56.49% 55.46% 52.72% 33.31% 31.33% 31.02% 0.0367 0.0311 0.0264 0.0049 0.0048 0.0045 55.54% 55.69% 54.97% 26.35% 24.54% 24.62% 0.0177 0.0052 52.24% 18.53%
Cluster*large_inflow (coefficient, std error, % positive, % positive significant): 0.0057 0.0033 -0.0074 0.0068 0.0075 0.0071 49.32% 47.61% 45.78% 20.74% 20.30% 15.95% 0.0043 0.0048 0.0051 0.0037 0.0039 0.0036 50.77% 48.06% 50.00% 6.31% 6.87% 6.92% 0.0002 0.0010 0.0005 0.0021 0.0015 0.0015 49.37% 50.17% 51.16% 28.11% 32.17% 32.66% 0.0002 0.0018 50.22% 27.10% 0.0003 0.0006 0.0011 0.0011 50.58% 50.69% 28.97% 27.13%
Cluster*large_outflow (coefficient, std error, % positive, % positive significant): 0.0194 0.0102 0.0152 0.0054 0.0059 0.0052 51.99% 52.05% 51.50% 19.29% 19.73% 19.93% 0.0066 0.0063 0.0089 0.0024 0.0024 0.0036 52.97% 68.72% 53.11% 9.55% 13.40% 9.06% 0.0044 0.0041 0.0049 0.0024 0.0014 0.0017 52.12% 51.19% 54.12% 22.35% 20.68% 23.43% 0.0021 0.0014 51.41% 27.07% 0.0018 0.0148 0.0011 0.0011 53.52% 54.25% 29.03% 28.96%

Remaining rows report coefficient (std error):
Market   -0.0132 (0.0036), -0.0145 (0.0035), -0.0169 (0.0036); -0.0457 (0.0026), -0.0445 (0.0026), -0.0422 (0.0025); -0.0322 (0.0130), -0.0302 (0.0126), -0.0376 (0.0127); -0.0238 (0.0111), -0.0240 (0.0108), -0.0358 (0.0101)
Size   0.0583 (0.0053), 0.0547 (0.0053), 0.0510 (0.0053); 0.0380 (0.0050), 0.0431 (0.0049), 0.0500 (0.0049); 0.1626 (0.0162), 0.1592 (0.0159), 0.1452 (0.0157); 0.1129 (0.0134), 0.1174 (0.0134), 0.1176 (0.0129)
BM   0.0092 (0.0045), 0.0123 (0.0046), 0.0126 (0.0046); 0.0289 (0.0047), 0.0271 (0.0047), 0.0223 (0.0047); -0.0056 (0.0165), -0.0069 (0.0158), -0.0016 (0.0158); -0.0050 (0.0126), -0.0008 (0.0129), -0.0157 (0.0129)
Momentum   0.0261 (0.0043), 0.0274 (0.0042), 0.0264 (0.0042); 0.0353 (0.0041), 0.0319 (0.0040), 0.0311 (0.0040); 0.0054 (0.0164), 0.0074 (0.0151), 0.0118 (0.0159); 0.0207 (0.0122), 0.0215 (0.0115), 0.0195 (0.0115)
Industry   0.0834 (0.0031), 0.0801 (0.0031), 0.0833 (0.0032); 0.1272 (0.0030), 0.1255 (0.0030), 0.1274 (0.0029); 0.1865 (0.0112), 0.1925 (0.0114), 0.1930 (0.0112); 0.1888 (0.0090), 0.1868 (0.0090), 0.1908 (0.0089)
N / Mean R-square / Mean Adjusted R-square   2385 / 0.1103 / 0.0942, 2396 / 0.1093 / 0.0932, 2347 / 0.1085 / 0.0919; 2250 / 0.0742 / 0.0590, 2264 / 0.0681 / 0.0537, 2267 / 0.0741 / 0.0599; 1657 / 0.4052 / 0.3934, 1749 / 0.3999 / 0.3880, 1728 / 0.3885 / 0.3760; 1651 / 0.3219 / 0.3074, 1679 / 0.3179 / 0.3031, 1661 / 0.3009 / 0.2854

[Figure 1 plot: Probability Density (y-axis) against Pairwise Distance (x-axis, 0 to 2)]
Figure 1: Empirical distribution of the pair-wise distance metric for institutional holdings (first quarter of 1994). The blue histogram represents the empirical distribution of the pair-wise distance metric in the data, and the red line plots the corresponding distribution for a sample simulated under the null of randomly chosen holdings. The pair-wise distance measure is defined as the sum of absolute deviations between two funds' portfolio weights. For the null distribution, each fund chooses its portfolio randomly while keeping fund size and portfolio concentration the same as in the data.

Figure 2: Size and book-to-market rankings for stocks held by the top 10 clusters (second quarter of 1995). Stocks are ranked into 5 by 5 size and book-to-market portfolios. Each bubble in the graph represents a particular cluster's average rank along the two dimensions. Its center corresponds to the cluster's mean rank, and its widths along the two dimensions represent the standard errors around the mean.
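The pair-wise distance in Figure 1 (the sum of absolute deviations between two funds' portfolio weights), the quarterly hierarchical clustering summarized in Table 3, and the connection-transition comparison of Table 5 can be sketched with SciPy. This is a minimal illustration under stated assumptions. The captions do not specify the linkage method or where the tree is cut, so `method="average"` and `cutoff=1.0` are hypothetical choices. The helper names are also hypothetical. The bootstrapped null distribution is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def pairwise_l1(weights: np.ndarray) -> np.ndarray:
    """Condensed vector of sum |w_i - w_j| over all pairs of fund portfolios.

    weights: (n_funds, n_stocks); each row is a weight vector summing to 1,
    so every distance lies in [0, 2], matching the range of Figure 1.
    """
    return pdist(weights, metric="cityblock")


def cluster_funds(weights: np.ndarray, cutoff: float = 1.0, method: str = "average") -> np.ndarray:
    """Agglomerative clustering of funds; cutoff and method are assumptions."""
    tree = linkage(pairwise_l1(weights), method=method)
    return fcluster(tree, t=cutoff, criterion="distance")


def transition_rate(labels_prev: np.ndarray, labels_curr: np.ndarray) -> float:
    """Share of fund pairs whose same-cluster connection (1/0) changed.

    Both arrays must list the same funds (present in both quarters) in the same order.
    """
    same_prev = labels_prev[:, None] == labels_prev[None, :]
    same_curr = labels_curr[:, None] == labels_curr[None, :]
    upper = np.triu_indices(len(labels_prev), k=1)  # count each unordered pair once
    return float(np.mean(same_prev[upper] != same_curr[upper]))


# Four toy funds: two hold stocks 1-2, two hold stocks 3-4.
w = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.4, 0.6],
])
print(pairwise_l1(w))               # pairs 0-1, 0-2, 0-3, 1-2, 1-3, 2-3
labels_q1 = cluster_funds(w)
labels_q2 = np.array([1, 1, 1, 2])  # hypothetical next-quarter labels
print(transition_rate(labels_q1, labels_q2))
```

Comparing pair-wise connections, rather than cluster labels directly, avoids having to match cluster identities across quarters, since arbitrary label numbering does not affect whether two funds share a cluster.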