Essays on Social Networks and Information Worker Productivity
by
Lynn Wu
M.Eng. in Electrical Engineering and Computer Science, MIT, 2003
B.S. in Electrical Engineering and Computer Science, MIT, 2003
B.S. in Management Science, MIT, 2002
Submitted to the Sloan School of Management in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Management at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2011
© 2011 Lynn Wu. All Rights Reserved.
The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.
Signature of the Author: MIT Sloan School of Management, May 8, 2011
Certified by: Erik Brynjolfsson, Schussel Family Professor, Thesis Supervisor
Accepted by: Roberto Fernandez, William F. Pounds Professor in Management, Chair of Ph.D. Program, MIT Sloan School of Management

Essays on Social Networks and Information Worker Productivity
Lynn Wu
Submitted to the Alfred P. Sloan School of Management in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Management Science
Abstract
In this thesis, I examine how information, information technology, and social networks affect information worker productivity. The work is divided into three essays based on tracking detailed communication patterns of information workers in the high-tech industry.
Essay 1: "Social Network Effects on Performance and Layoffs: Evidence from the Adoption of a Social Networking Tool." By studying the changes in employees' networks and performance before and after the introduction of a social networking tool, I find that a structurally diverse network (low in cohesion and rich in structural holes) has a positive effect on work performance.
The size of the effect is smaller than traditional estimates, suggesting that omitted individual characteristics may bias the estimated network effect. I consider two intermediate mechanisms by which a structurally diverse network is theorized to improve work performance, information diversity (instrumental) and social communication (expressive), and quantify their effects on two types of work outcomes: billable revenue and layoffs. Analysis shows that the information diversity derived from a structurally diverse network is more correlated with generating billable revenue than is social communication. However, the opposite is true for layoffs. Friendship, as approximated by social communication, is more correlated with reduced layoff risks than is information diversity. Field interviews suggest that friends can serve as advocates in critical situations, ensuring that favorable information is distributed to decision makers. This, in turn, suggests that having a structurally diverse network can drive both work performance and job security, but that there is a tradeoff between either mobilizing friendship or gathering diverse information. Essay 2: "Identification of Influence: An Experimental Platform for Understanding the Relationship between Social Networks and Performance." This study creates an experimental platform for identifying the relationship between social networks and performance. While a large body of literature has examined the correlations between certain network topologies and performance, little research has shown a definitive causal linkage. I address this problem through conducting three sets of randomized field experiments using an on-line experimental platform at a large information technology firm. The platform enables randomly selected employees to achieve certain network characteristics. By examining work performance before and after the experiment, I plan to show the causal relationship between networks and productivity. 
Essay 3: "Water Cooler Networks: Performance Implications of Informal Face-to-Face Interaction Structures in Information-Intensive Work." This study examines the performance characteristics of face-to-face interaction networks and finds that their structural properties are important for effective knowledge transfer and productivity. We argue that network theory should incorporate the implications of media choice, and particularly differences between face-to-face and electronic communication, when assessing how networks affect individual performance. We introduce a new methodology, using Sociometric badges, to record precise data on face-to-face interaction networks for a group of workers in a large IT manufacturing firm over a one-month period. Linking these data to detailed performance metrics, we find that 1) network cohesion is associated with higher worker productivity, in contrast to previous findings in email data; 2) cohesion in face-to-face networks is associated with even higher performance during complex tasks, suggesting that cohesion complements information-rich media for transferring the complex knowledge needed to complete such tasks; 3) while information-seeking from many colleagues creates disruptions, more interactions with a few key strong-tie informants speed up work. Face-to-face networks have more explanatory power than physical-proximity networks, suggesting that information flows in actual conversations (rather than individuals' correlated exposure to common environmental factors through physical proximity) are driving our results. These results augment our understanding of how media choice and network structure interact, shedding light on the organizational effects of face-to-face interaction. The methods and techniques we introduce are replicable, creating opportunities for new lines of research into the consequences of face-to-face interaction in organizations.
Committee:
Erik Brynjolfsson (Chair), Director, Center for Digital Business & Schussel Family Professor, MIT Sloan School of Management
Roberto Fernandez, William F. Pounds Professor in Management & Professor of Organization Studies, MIT Sloan School of Management
Ray Reagans, Alfred P. Sloan Professor of Management, MIT Sloan School of Management
Sinan Aral, Assistant Professor of Information, Operations and Management Sciences, New York University Stern School of Business

Acknowledgements
I've been incredibly fortunate to have an amazing group of advisors, peers, and friends. Without their support, the journey would have been much more intimidating. First of all, I would like to thank my fiancé, Tim Kaldewey, for his love and support. Having flown endless miles from California to Boston during my graduate study, Tim has been my biggest supporter, cheering for me when I made a breakthrough, and motivating me when I hit a roadblock. Tim, you are the reason I survived this journey. To the members of my committee, Erik Brynjolfsson, Sinan Aral, Roberto Fernandez and Ray Reagans, I am honored and proud to call you my advisors and my role models. My deepest gratitude goes to my Chair, Erik Brynjolfsson. I am indebted to your inspirations, insights, and guidance. You have deeply influenced the way I view the world and how I approach solving a problem. Thank you for believing in me, investing in me, and giving me opportunities to grow and explore. Every interaction with you has been an eye-opening experience, and the courage, excitement, and confidence you inspired each time we met were critical for pushing me forward. Sinan, you gave me the courage to pursue my passion. Your guidance and inspiration have had a profound impact on my personal growth as a researcher. You showed me how to look at the bigger picture beyond a single research project. Most importantly, you taught me the importance of communicating my ideas to others with clarity.
Roberto, you are one of the most effective educators I know. Thank you for always being honest and motivating me to do my best. Your unwavering commitment to academic rigor taught me to be a meticulous researcher. You have never failed to transmit valuable lessons and remind me what is important in my work and in the field. Your commitment to give back to the academic community has inspired me to do the same. Ray, your insightful comments were critical for my thesis. You have an incredible ability to see what is still very cloudy in my mind and help me frame my research to fit in the broader picture. Each minute spent with you has saved me hours. Thank you for believing in my work and me. Several other faculty members at MIT Sloan School of Management also provided tremendous support and guidance. Professor Stuart Madnick has always been instrumental to my growth. You have been a great mentor and a friend. Having also spent my entire education at MIT in course 6 and course 15, like you, I feel an immediate affinity to you. You helped me bridge to the world of management from my training in engineering. Wanda Orlikowski, George Westerman, Andy McAfee, and Stephanie Woerner, you have been very generous with your time and provided valuable feedback, especially during the critical stage of my job hunting. Marshall Van Alstyne, you have always been a great mentor and gave me valuable insights about research and where the field is headed. I would also like to thank my peers. Without you, the journey would have been unthinkable. Jason Abaluck, Phil Anderson, Joelle Evans, Heekyung Kim, Xitong Li, Yiftach Nagar, Adam Saunders, Jialan Wang, Yanbo Wang, and Jie Yang are the ones who made this experience tolerable and, at times, fun. I will always remember our laughter and tears. I especially owe thanks to Chuck Eesley. Not only did we grow up together as researchers, we have also grown to be great friends.
Over the past 5 years, I have been affiliated with the MIT Center for Digital Business (CDB), which has created an exciting environment, enabling me to pursue my research interests. Without its financial support and contacts with various industrial partners, my research program would not have been possible. I especially thank Ching-yung Lin from IBM Research for your indefatigable support over the past years. Your faith, guidance, and generous financial support are greatly appreciated. I look forward to our continuing collaboration. I would not have made it without our amazing PhD director, Sharon Cayley. Your support, especially during dark times, has been instrumental for my survival in the PhD program. You have always made my visits to your office feel like home. Lastly, I would like to thank my parents, Zhen Wu and Cindy Lin, and my grandparents. Your unconditional love and support are always a source of comfort. Thank you for always believing in me and standing behind all my endeavors.

Table of Contents
Introduction
Essay 1: "Social Network Effects on Performance and Layoffs: Evidence from the Adoption of a Social Networking Tool."
1. Introduction
2. Theory
3. Setting
4. Data
5. Identification
6. Empirical Methods
7. Results
8. Discussion and Conclusion
Reference
Essay 2: "Identification of Influence: An Experimental Platform for Understanding the Relationship between Social Networks and Performance."
1. Introduction
2. Data and Setting
3. Research Design
4. Surveys on Expertise-Find Usage Patterns
5. Robustness Checks
6. Conclusion and Pre-Experimental Statistics
References
Essay 3: "Water Cooler Networks: Performance Implications of Informal Face-to-Face Interaction Structures in Information-Intensive Work."
1. Introduction
2. Theory
3. Background and Data
4. Empirical Methods
5. Results
6. Discussions and Conclusion
References

Introduction
Organizations have long recognized the importance of social capital and looked for ways to effectively use the social networks of their employees. Recently, with the wide adoption of social networking tools, it has become increasingly important to understand how people derive value from their networks and whether social media plays a role in helping individuals build desired networks and, subsequently, harvest value from them.
The goal of this thesis is to examine how social networks, information, and information technology (IT) affect information worker productivity. While the field has made some progress in understanding how certain properties of social networks favor superior work performance, many questions remain, in particular those concerning the causal relationship between networks and performance and the mechanisms by which networks affect performance. In addressing these questions, the thesis is divided into three essays. The first essay, "Social Network Effects on Performance and Layoffs: Evidence from the Adoption of a Social Networking Tool," assesses the effect of social networks for a group of consultants on two outcome measures: the ability to generate billable revenue and the risk of being laid off. Not only do I examine the performance implications of network structures, but I also explore the intermediate outcomes networks generate that ultimately affect performance. In the second essay, "Identification of Influence: An Experimental Platform for Understanding the Relationship between Social Networks and Performance," I create an experimental platform to address the causal linkage between networks and performance. The third essay, entitled "Water Cooler Networks: Performance Implications of Informal Face-to-Face Interaction Structures in Information-Intensive Work," examines the performance implications of face-to-face networks. Using Sociometric badges to record precise face-to-face interaction data for a group of IT workers and linking these data to detailed performance metrics, we find that, contrary to previous findings in email networks, network cohesion (the lack of structural holes) is associated with higher worker productivity. Together, the three essays form a program dedicated to understanding how social networks, information, and social media affect information worker productivity.
These topics are of growing importance to managerial theory, practices, and policymaking as information work has become a cornerstone of production in developed economies where access to and processing of information are the keys to driving information worker productivity. The growth of information in the past 20 years is unprecedented and the advance of information technologies, such as search engines and data management tools, has greatly facilitated our ability to search for information. However, social networks remain an important, if not the predominant, way we obtain relevant and valuable information. Interestingly, during this time, there has been a similar uptick in the use of social networking tools, easing the sharing of information. Together with the advance of digital communication, they provide an exciting opportunity to explore how people use social networks to obtain information and how that affects productivity. My work draws on several rich fields for both theoretical foundations and methodologies. For understanding how information workers generate value, I draw theoretical insights from labor economics, especially the production function of labor. Examining how social networks and social media drive productivity, I draw on theories from economic sociology and information economics, specifically how certain attributes of social networks, such as structural holes, confer work advantages on individuals. I also draw from the information systems literature to form the basis of my thinking on how technology adoptions affect productivity. The prior literature on IT and firm productivity has inspired me to explore how technology adoption translates to gains in individual workers' performance. In addition to drawing from these three fields for theoretical foundations, I leverage techniques from machine learning and information retrieval to quantify various hard-to-capture properties of information derived from social networks.
Together, these fields provide ideas and tools that help me to understand the importance of information, information technology, and social networks in information worker productivity. My work differs from prior literature in four important ways: 1) by addressing the causal relationship between network structures and performance through designing and implementing a set of randomized field experiments; 2) by analyzing the actual information content transmitted inside the network rather than using network structures as proxies for information; 3) by examining the network effects on multiple work outcomes in a single setting to explore how they differ in generating these outcomes; 4) by understanding how media choice affects the use of social networks to transfer knowledge. To accomplish these goals, I captured the information content of electronic interactions of more than 8,000 employees in a large international firm as well as detailed face-to-face interactions at another medium-sized firm. The data are collected using privacy-preserving crawlers and sensors. From these data, I was able to map the electronic communication networks of these employees to understand the performance implications of a structurally diverse network, characterized by low cohesion and structural equivalence and richness in structural holes. I used both panel data and instrumental variables to eliminate factors that might confound the estimates and found that a structurally diverse network can positively affect performance. While these econometric techniques help in addressing whether there is a causal linkage between network structure and performance, they are not sufficient because they require the assumption of exogeneity. Only a randomized experiment in the field can be truly exogenous and definitively address the causal impact of networks on performance in the real world.
To address this previously intractable problem, I designed and implemented an online experimental platform for conducting randomized field experiments at a large technology firm. Measuring the network and performance change for both the treatment and control groups, I can definitively answer the question of whether certain changes in the network cause performance improvements. From these causal analyses, I will be able to suggest building features into a social networking tool that recommend optimal connections to maximize employee performance. From these experiments, I can also address many important research questions. For example, I will be able to empirically show whether social networking tools can generate long-term change in a person's social network and the time it takes for such a change to generate value for the employees and the organization. I also contribute to the literature on the economics of information. By directly observing the information content transmitted in a network, I measure the information benefits derived from a structurally diverse network and their impacts on performance. While Burt (1992) theorized about three types of information benefits (access, timing, and referrals), there is scant evidence documenting them and especially comparing them in the same setting. This is problematic, because information benefits are theorized to be the primary reason why actors with a structurally diverse network can derive rents. Thus, it is important to verify if such networks actually generate these benefits. To address this problem, I measure information diversity (a combination of the access and timing of information) and friendship (a proxy for referrals) as two types of information benefits to explain why structurally diverse networks can generate work advantage and how they differ in affecting various work outcomes.
I find that information diversity is more strongly correlated than friendship with objective performance measures such as billable revenue. However, friendship is more important than information diversity with regard to reducing the risk of layoffs. While having a large repository of information is crucial for performance in a knowledge-intensive industry, friendship is also important due to the referral process, because friends are more likely to advocate for an individual to avert a crisis, such as being laid off. Furthermore, I find that while a structurally diverse network can generate both types of information benefits, there is a fundamental tradeoff between the two. The constraint here is that a person only has limited time and resources to devote to gathering diverse information or forming friendships. While spending time to accumulate diverse information is helpful for generating billable revenue, it takes away time and energy from forming friendships, which is crucial for avoiding being laid off. Although studying networks derived from electronic media is important, face-to-face interactions, such as the proverbial "water cooler conversations," remain a significant part of intra-organizational communication. In a coauthored paper, "Water Cooler Networks: Performance Implications of Informal Face-to-Face Interaction Structures in Information-Intensive Work," I examine the performance implications of face-to-face networks. We introduce a new methodology, using Sociometric badges, to record precise data on face-to-face interaction for a group of IT workers. Combining these data with detailed performance metrics, we find that, contrary to previous findings in email networks, network cohesion (the lack of structural holes) is associated with higher worker productivity. This result augments our understanding of how media choice and network structure interact, shedding light on the organizational implications of face-to-face interaction.
The methods and techniques are replicable, creating opportunities for new lines of research into the implications of face-to-face interactions in organizations. The methods and the experimental platform designed in this thesis are replicable and portend a new frontier in understanding the mechanisms behind why information and social networks matter for information worker productivity and whether the relationship is causal. As estimated in 2006, the amount of digital information created, captured, and replicated was 161 billion gigabytes, about 3 million times the information in all the books ever written (Barnette 2006). The speed of information growth has only been increasing since then. During this time, the simultaneous explosion of social media, knowledge management, and networking tools is not a mere coincidence. These tools have made it possible to cheaply share and disseminate the vast amount of information recently created. Thus, understanding how workers acquire information through social networks and how that ultimately affects productivity is only going to grow in importance.

Reference:
Gantz, J. F., Chute, C., Manfrediz, A., Minton, S., Reinsel, D., Schlichting, D., and Toncheva, A. 2008. "The Diverse and Exploding Digital Universe", EMC White Paper. http://www.emc.com/leadership/digital-universe/expanding-digital-universe.htm

Social Network Effects on Performance and Layoffs: Evidence from the Adoption of a Social Networking Tool
Lynn Wu
Abstract
By studying the changes in employees' networks and performance before and after the introduction of a social networking tool, I find that a structurally diverse network (low in cohesion and rich in structural holes) has a positive effect on work performance. The size of the effect is smaller than traditional estimates, suggesting that omitted individual characteristics may bias the estimated network effect.
I consider two intermediate mechanisms by which a structurally diverse network is theorized to improve work performance, information diversity (instrumental) and social communication (expressive), and quantify their effects on two types of work outcomes: billable revenue and layoffs. Analysis shows that the information diversity derived from a structurally diverse network is more correlated with generating billable revenue than is social communication. However, the opposite is true for layoffs. Friendship, as approximated by social communication, is more correlated with reduced layoff risks than is information diversity. Field interviews suggest that friends can serve as advocates in critical situations, ensuring that favorable information is distributed to decision makers. This, in turn, suggests that having a structurally diverse network can drive both work performance and job security, but that there is a tradeoff between either mobilizing friendship or gathering diverse information. Keywords: Social Network, Productivity, Layoffs, Information Diversity, and Friendship Introduction Social network theory predicts a structurally diverse network that is low in cohesion and spans structural holes to be associated with higher work performance. By linking unconnected groups, the brokers, who bridge these holes, are endowed with early exposure to novel information and can act as hubs to facilitate information flow between otherwise disconnected groups. Studies have shown that people whose networks are rich in structural holes have a competitive advantage over their peers. They tend to receive superior performance ratings and higher compensation (Burt 1992; Podolny and Baron 1997; Burt 2005; Cross and Cummings 2004; Lin 2002). For example, bankers with structurally diverse networks are more likely to be recognized as top performers (Burt 2000). 
Similarly, employees in research and development positions maintaining diverse contacts outside of the team are more productive than their peers (Reagans and Zuckerman 2001). While previous research has provided important theoretical insights (e.g., Burt 1992; Coleman 1988), the question of how social network positions drive productivity gains remains open. Information benefits have been theorized to be the primary reason why a structurally diverse network endows individuals with work advantage. Often network structures are treated as a proxy for accessing more information and more diverse information (Burt 2008), and thus having a structurally diverse network is assumed to give individuals information advantage. However, information transmitted inside a network is rarely directly observed. Thus, it is difficult to verify if a structurally diverse network actually generates information benefits that ultimately affect performance. Burt has theorized that three forms of information benefits (access, timing, and referrals) are responsible for driving superior work performance (Burt 1992: 13-15). If so, it is important to separately quantify these benefits and their relationships to work outcomes. I examine whether structurally diverse networks can generate information benefits by focusing on how information diversity and social communication, two types of information benefits that emerge from structurally diverse networks, can lead to superior work performance. I define information diversity as the heterogeneity of the information content in individuals' electronic communications. As a measure, it combines access to and timing of information,[1] the first two types of information benefits in Burt's framework. Earlier access to a variety of information sources allows an individual to gather more, and more diverse, information, which can be instrumental to performance.
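To make the idea of "heterogeneity of information content" concrete, the following is a minimal sketch of one way such a score could be computed. It is an illustration under stated assumptions, not the measure used in this essay: the bag-of-words representation, the whitespace tokenizer, the function names `cosine_distance` and `information_diversity`, and the choice of averaging each message's cosine distance from the person's pooled word-count vector are all hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty vector is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def information_diversity(messages):
    """Hypothetical diversity score for one person's messages.

    Each message becomes a bag-of-words vector; the score is the mean
    cosine distance of each message from the pooled vector of all messages.
    Cosine distance ignores vector length, so the pooled sum behaves like
    the mean vector here.
    """
    vectors = [Counter(m.lower().split()) for m in messages]
    vectors = [v for v in vectors if v]  # drop empty messages
    if not vectors:
        return 0.0
    pooled = Counter()
    for v in vectors:
        pooled.update(v)
    return sum(cosine_distance(v, pooled) for v in vectors) / len(vectors)


# Illustrative, made-up messages: repeated content vs. unrelated topics.
same_topic = information_diversity(
    ["budget forecast revenue", "budget forecast revenue"])
mixed_topics = information_diversity(
    ["budget forecast revenue", "server database outage"])
```

Under these assumptions, a person whose messages all repeat the same content scores near zero, while a person whose messages cover unrelated topics scores higher. A production measure would likely need stop-word removal, term weighting, or topic modeling, none of which is attempted here.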
I also create a friendship index that measures the frequency of social communications and informal activities in a person's electronic communications. Because social communications can help generate friendships and friends are more likely to serve as advocates, the friendship index can serve as a proxy for the referral process, Burt's third type of information benefit. By examining the two types of information benefits and their instrumental and expressive nature, I attempt to bridge the literature on network structures with the literature on tie content. I find a structurally diverse network can generate both instrumental and expressive types of information benefits, with information diversity being instrumental and social communication being expressive. This finding runs contrary to the belief that, due to their contrasting natures, there is a tradeoff between having both expressive and instrumental relationships in a network (Bales and Slater 1955: 290-92; Etzioni 1965: 696-97; Slater 1965). While it is possible to have both kinds of benefits in a structurally diverse network, there is a tradeoff between the two in the relative returns on the investment from either socializing to form friendships or gathering diverse information. The decision to mobilize friendship or information diversity may depend on the work outcome one hopes to achieve. To better understand the tradeoff, I examine the impact of information diversity and social communication on two types of performance measures (billable revenue and layoffs) for a group of technology consultants at a large information technology firm.
[1] Although this approach does not capture all communications by a person, email and instant messaging represent a significant proportion of the overall communication. Furthermore, calendar events also capture some of the
I choose billable revenue as an objective measure of a worker's productivity, because it is one of the most important performance metrics for evaluating employees in the consulting industry. Because accessing diverse information is critical for solving difficult problems, information diversity derived from a structurally diverse network is more likely to be beneficial in generating billable revenue. I also explore the effect of information diversity and social communication on layoffs, a negative and traumatic experience for most workers, and one which can negatively affect the remaining employees through network destruction, especially when friends are laid off (Krackhardt and Porter 1985; Shah 2000). It is possible that the mechanism for generating billable revenue may fundamentally differ from that for determining the risk of layoff. Often, firms delegate layoff decisions to managers. A manager's favorable opinion is likely to protect a person from being laid off. Effective promotion from referrals gets the actor's name mentioned at the right place and the right time, maximizing the job retention rate. I find that information diversity derived from a structurally diverse network is more associated with improving objective performance, such as billable revenue, than is social communication, but that social communication is more positively correlated with job retention than is information diversity. To exemplify their possible tradeoff, I show whether information diversity and social communication are substitutes in generating billable revenue and avoiding layoffs. To lend a causal interpretation to the analysis, I take advantage of a variation generated by a technology that can change the network positions of its users over time. By examining the change in work outcomes before and after the adoption, it is possible to determine if this technologically induced network change can actually alter billable revenue and layoff risks. 
Similarly, by examining the network change and the changes in information diversity and social communication before and after the adoption, it is possible to determine if a structurally diverse network can actually generate different types of information benefits that may ultimately affect work outcomes.

Theory

Network Positions and Performance

The structural perspective of network studies (Coleman 1988; Burt 1992) often focuses on the configuration of ties as opposed to the content of ties in the ego-network. One of the most prominent features of social network structure that has received an enormous amount of theoretical and empirical attention is brokerage or network diversity (e.g., Burt 1992; Granovetter 1973), characterized by a network that is low in cohesion and structural equivalence and rich in structural holes. Such networks are often positively correlated with various measurements of work performance. For example, Burt (1992, 2000, 2004) shows that structural holes can create a competitive advantage for individuals in dimensions such as wages and promotion. He attributes the normalized performance differences to actors' ability to access and gather information from non-redundant social groups (Burt 1992; Ancona and Caldwell 1992; Sparrowe et al. 2001; Reagans and Zuckerman 2001; Cummings and Cross 2003; Zaheer and Bell 2005). This information advantage is particularly important in knowledge-intensive industries where the success of a project relies on identifying and assimilating existing information in order to create new knowledge and innovation (Burt 1992). Thus, a structurally diverse network is assumed to confer information benefits by providing access to novel information from loosely connected network neighborhoods (Burt 1992).
The economic value of information stems from the fact that information is distributed unevenly in a network; thus, tapping into various information sources that are distributed throughout the network is important for solving difficult problems and finding new opportunities. Structurally diverse networks can provide actors with the capability to reach out to distant information sources. A redundant network, on the other hand, tends to provide repeat information. In such a network, no one can monopolize information long enough to derive rents because the dense network of strong ties can quickly disseminate any information throughout the network. In addition to information diversity, brokers are also theorized to control the flow of information and reap rents from brokering between two disconnected parties (Burt 2004; Obstfeld 2005). Endowed with preferential access to information, brokers are in a unique position to identify arbitrage opportunities and reap benefits through strategically linking disconnected actors. However, as Reagans and Zuckerman (2008) commented, there is a fundamental tradeoff in the social-structural foundations of power and knowledge. The same mechanism that endows brokers with power as the providers of information also reduces their power as acquirers of information, because network contacts in a non-redundant network are themselves monopolists when the broker tries to acquire information from them (Reagans and Zuckerman 2008). Regardless of the control benefits, however, information benefits derived from a structurally diverse network are still greater than what is provided in a redundant network. In this paper, I focus on the information benefits and examine how they affect performance independent of whether individuals control the information flow to their advantage.
Thus, I hypothesize that a structurally diverse network can affect work outcomes such as objective work performance, measured using individual billable revenue, as well as subjective performance, as measured by the risk of being laid off.

Hypothesis 1a: Structurally diverse networks cause an increase in billable revenue.

Hypothesis 1b: Having a structurally diverse network reduces layoff risks.

Network Diversity, Information Diversity and Social Communication

While the information benefit derived from a structurally diverse network has received much scholarly attention, few have actually measured it; the vast majority of empirical work on network and information is content-agnostic (Hansen 1999). As Burt explained, network structure is often used as a proxy for information flow because structures can be measured more easily than the actual content of what is transmitted in the network (Burt 2008). He calls for the next phase of network research to investigate how individuals gather information from their network positions. While some research, especially in the connectionist perspective, has also stressed the importance of measuring the content of the network, it often characterizes the network as channels, pipes or conduits, and the content as attributes of the nodes (Podolny 2001; Rodan and Galunic 2007). Under this assumption, information flow is implicitly assumed to be proportional to the distribution of links among nodes of the network (Granovetter 1978; Schelling 1978). However, information exchange may occur strategically; individuals do not always share all available information (Reagans and McEvily 2003; Aral and Van Alstyne 2007). Hence, it is critical to open the black box of networks to investigate information that is being transferred between individuals inside a network.
Instead of using characteristics of nodes as a proxy for information content and structural topology as a proxy for information flow, the next phase of research should examine if and how network positions generate information benefits and whether these information benefits ultimately induce superior work performance (Burt 2008; Aral and Van Alstyne 2007). One notable exception in advancing the information content analysis of networks is the recent work by Aral and Van Alstyne (2007). Using encoded email content, the authors analyzed the email traffic at an executive recruiter firm and showed that brokers are more likely to have more heterogeneous information, which is also associated with higher work performance (Aral and Van Alstyne 2007). While calculating information heterogeneity is a notable breakthrough, it is also important to measure other aspects of information benefits, especially comparing them in the same setting to show how their capabilities differ in influencing work outcomes. As Burt (1992) theorized, there are three forms of information benefits: access, timing, and referrals. Access refers to receiving a valuable piece of information, while timing refers to the ability to receive a key piece of information faster than others. Referrals are a process in which personal contacts promote the actor to others. As Burt explained, "they are motors expanding the third category of people in your network, the players you don't know who are aware of you.... [They] are strong personal advocates in decision-making process...to ensure both favorable information and response to any negative information get distributed during decisions" (Burt, 1992: 14-15). However, measuring access, timing and referrals is extremely difficult, because it is hard to directly observe the information content in people's interactions. 
I address this issue by quantifying two types of information benefits, information diversity and friendship, through encoded electronic communication such as email, text messages, and calendar events. Information diversity can be viewed as a combination of information access and timing. Specifically, I compute the diversity of information content by counting the number of distinct topics in an actor's electronic communications. Obtaining information from diverse sources is the key to making better decisions, solving difficult problems, and generating innovative solutions. Thus, I hypothesize information diversity to be the primary mechanism for a structurally diverse network to generate rents and competitive advantage.

Hypothesis 2a: Structurally diverse networks can generate information diversity.

Social communication contributes to the referral process. It measures how much of an actor's communications is related to socializing and informal social activities. Through these activities, network contacts get to know an individual better and are more likely to serve as strong personal advocates, particularly in situations of crisis and uncertainty (Ibarra 1995). Having a diverse circle of friends is more likely to help the actor, trumpeting his accomplishments and advertising his work to a diverse group of people, including decision makers.

Hypothesis 2b: Structurally diverse networks can generate referrals.

Content of Ties: Expressive and Instrumental Network Relationships

Information diversity and social communication can also be viewed in the framework of expressive and instrumental network relationships where information diversity is a proxy for instrumental relationships and social communication is a proxy for expressive relationships. Research on expressive and instrumental networks often focuses on the content of relationships (Borgatti and Foster 2003), as opposed to the structural properties.
It argues that topological studies of networks often neglect the resources flowing between ties and focus exclusively on the structural perspectives (Lin 2001; Snijder 1999). This is problematic because actors are only successful when they can mobilize resources from their network contacts (Podolny and Baron 2001). One way to classify these resources is through their instrumental or expressive nature. Instrumental ties are often used to exchange work-related resources, and they typically involve actions that seek information, expertise, and professional advice (Ibarra 1993; Fombrun 1982; Lincoln and Miller 1979; Podolny and Baron 1997). Expressive ties, on the other hand, are often affective and friendship-based and involve the exchange of alliance, trust, and social support (Krackhardt 1995). Although instrumental ties and expressive ties are theorized to be distinct, they can also overlap in a dyadic relationship. Often expressive ties have instrumental values and some instrumental ties are also affective; thus the line separating the two often blurs (Scott 1996). Some have suggested that the two types of networks may interact positively: workers with overlapping instrumental and expressive ties are more effective (Ibarra 1992). On the other hand, expressive and instrumental activities often conflict because it is difficult to fill both roles at the same time (Bales and Slater 1955; Etzioni 1965; Slater 1965). For example, having friendship or expressive ties can make it difficult for a manager to enforce rules and sanctions on subordinates. Thus, expressive ties dampen the effect of instrumental actions or vice versa, leading to a tradeoff in having either one or the other (Fernandez 1991). Similarly, because people tend to distinguish between the two roles, expressive and instrumental networks can have a substitutive relationship (Homan 1974; Fernandez 1991).
Drawing upon the literature on expressive and instrumental networks, I show that information benefits derived from a structurally diverse network can have both instrumental and expressive elements. Specifically, information diversity is coupled with instrumental actions, much as social communication is to expressive actions. Instrumental actions, which generate task-related information and advice, increase the information diversity crucial to higher work performance. Expressive actions such as social communication generate friendship and active referrals that are more likely to advocate and promote an individual, helping the person avert crises and find new opportunities. However, the literature on instrumental and expressive networks focuses more on tie contents rather than on the network's structure, while the structural perspective often disregards instrumental and expressive elements in the network perhaps due to the difficulty of directly observing the information flow. I bridge the gap between the structure- and tie content-centric views by showing that structurally diverse networks can generate both instrumental and expressive properties. This is contrary to the notion that it is difficult to build both expressive and instrumental networks because they can be at odds (Bales and Slater 1955: 290-92; Etzioni 1965: 696-97; Slater 1965). However, unifying social communication and information diversity in a network comes with its own costs and tradeoffs. Because an actor's time and energy are necessarily limited, gathering information must in effect be traded off against forming friendship through social communication. Although they may overlap, social communication and information diversity are still distinct. For instance, to increase information diversity, actors can form ties with individuals whom they do not like or normally interact with for the sole purpose of gathering information. 
However, one can spend the same time and energy socializing, making friends and thus facilitating the referral process, even if this effort does not necessarily increase information diversity. Thus, because the effort to generate information diversity and social communication can be orthogonal, the tradeoff between the two lies in the return to these investments.

Network Effect on Billable Revenue and Layoffs

To understand how information diversity and social communication are used to achieve different goals, I examine their effects on two types of outcome: billable revenue and layoffs. Billable revenue is an objective measure of work performance and is one of the most salient metrics for evaluating consultants. Information workers, such as consultants, are especially valued for their ability to access valuable information, which can have two effects in enhancing work performance. First, accessing information related to the task at hand directly improves the quality of work. Second, accessing diverse information also exposes actors to new opportunities and valuable resources (Burt 1992, 2004). Consequently, these actors would be the first to learn of a new opportunity, placing them at the front of the queue to strategically seize it. In the IT consulting business, accessing information expediently is the key to performance. Since consultants' performance is largely measured by billable revenue, it is crucial both to do well in the current project and to spend time looking for future opportunities. Knowing where to obtain expertise through networks helps an individual solve difficult problems and produce high quality work, enhancing his reputation and his future prospects for finding opportunities. All else equal, a manager would prefer reputable consultants to handle important projects, because they are more likely to satisfy customers and generate repeat business.
Thus, if a structurally diverse network is to produce informational benefits, it should have a strong effect on information workers in knowledge-intensive settings. Social communication may also help improve billable revenue. By socializing informally with a diverse group of people, consultants are more likely to encounter opportunities serendipitously. Friends can also provide important information that eventually results in billable revenue. The operative factor in these situations, however, is information diversity that includes information generated by work-related interactions with friends. On the other hand, social communication, which proxies for friendship, is distinct from information diversity in that it captures the referral process. Through informal interactions, an actor's network contacts are more likely to know his expertise and can serve as his advocates to others. Although having someone to advocate for an actor is helpful, it rarely generates billable revenue directly, because having access to useful information, opinions, and perspectives is ultimately responsible for solving difficult cases and generating profits. Hence, I hypothesize that a performance improvement arises from a structurally diverse network primarily by means of information diversity but not necessarily by social communication.

Hypothesis 3a: Structurally diverse networks induce higher billable revenue primarily through information diversity, not through social communication.

While structurally diverse networks are shown to provide information diversity that directly improves work performance, they can also produce referrals who can enhance a person's prestige and reputation. Functioning as means to trumpet one's accomplishments and promote one's work, referrals ensure the actor is protected in crisis situations, such as layoffs. Thus, the same channel through which actors derive diverse information also provides them with a diverse network of potential referrals.
By using social communication to generate affective relationships, individuals can mobilize their network contacts to serve as their referrals. Thus, social communication in an employee's structurally diverse network can reduce the risk of being laid off and increase job retention. An employee is much less likely to be laid off if a wider range of people, including managers, has a favorable opinion of the person. Referrals can greatly facilitate this process by functioning as means to broadcast one's achievements to others. The advantages of the referral process also flow from the theory of recognition heuristics (Goldstein and Gigerenzer 1999, 2002); according to this theory, people place higher values on objects they recognize than on objects they don't, regardless of their actual values. Thus, when key decision-makers have heard of a person, that recognition value alone may keep the person from being laid off. In contrast, people with comparable, or even superior, work evaluations may face higher risks of being laid off if they lack a diverse group of referrals to ensure that the decision-makers are aware of their contributions. In qualitative interviews, many managers who participated in layoff decisions expressed the importance of reputation and general awareness of a person's work through the referral process.

"When we sit down at a meeting to make layoff decisions, we discuss people's work and what we think of their work, not just billable hours. Usually, when more than one person in the meeting is aware of the person or speaks on his behalf, this person is much less likely to be laid off than someone nobody has heard of."

This confirms that actors with referrals in a structurally diverse network are able to effectively advertise their work and promote themselves through referrals. Consequently, their visibility is increased, and they are less likely to be laid off.
Information diversity can also reduce the risks of layoff through generating more billable revenue, because firms are less likely to lay off their star performers who disproportionately contribute to generating profits for the firm. However, social communication plays a more important role in reducing the risk of layoff than does information diversity, possibly because a layoff affects not only the person who is laid off but also the team and other colleagues who are connected to the person. Once the person leaves the organization, he is removed from the social network of his contacts, and network destruction from layoffs can have a drastic effect on the remaining employees (Krackhardt and Porter 1985). Qualitative interviews show that when a key person is removed from the organizational network, it affects other team members (Shah 2000). One person during the interview lamented:

"We were just in the process of forming a project that involves the collaboration of several groups when layoff happened. When Bob got laid off, the project also fell apart, because Bob was the key person connecting all of us together. Once he was gone, we were not able to mobilize everyone to continue the effort."

Billable revenue, on the other hand, tends to affect the person himself rather than the group, because generating more billable revenue has less effect on other group members than layoffs. Accordingly, the mechanism for reducing the layoff risk is different from the one that generates billable revenue. Friends are likely to protect one from being laid off, because not only would they lose a potentially important information source, they may also experience the negative consequences of losing a friend (Shah 2000). Informal activities and social communication with a diverse group of people can promote friendships that, in turn, shield an actor from being laid off.
Thus, friendships can protect a person from being laid off more than information diversity, even after controlling for billable revenue. Figure 1 below captures the theory development and hypothesis testing.

Hypothesis 3b: Social communication is more correlated with protecting an actor from layoffs than is information diversity.

Figure 1: Hypotheses Testing Framework [diagram: Structurally Diverse Networks; Information Diversity (instrumental) vs. Social Communication (expressive); Billable Revenue (ego) vs. Retention (group); paths H1a, H1b, H2a, H2b, H3a, H3b]

Setting

To test these hypotheses, I have collected data at a large information technology firm. High-tech firms have been a fertile ground for researchers to understand how network characteristics play important roles in information-intensive work settings such as the search and transfer problems across organizational units (Hansen 1999), research and development productivity (Reagans and Zuckerman 2001), and mobility in the workplace (Podolny and Baron 1997). If information benefits derived from network positions matter for performance, they should matter especially in a knowledge-intensive setting, such as in the high-tech sector. To characterize the social network in the firm, I captured its internal electronic communication exchanges. Previous work has validated the benefits of using electronic communication data to understand intra-organizational networks within a firm or an institution (Wu et al. 2004; Kossinets and Watts 2006, 2009). While using digital traces left by users can construct a more accurate portrait of a network, more importantly it allows for the direct observation of the information transmitted inside the network. Examining the variation of the information content across individuals can confirm whether the information-based assumptions about the network are valid (Burt 2008).
As explained by Aral and Van Alstyne (2007), analyzing the content of communications as well as the topological structures of networks can open new avenues for answering questions at the heart of the sociology of information. Thus, by examining the content, I can capture the information heterogeneity across individuals by computing the total number of distinct topics in each person's electronic communications. Furthermore, I extend the content analysis by constructing a friendship index that quantifies informal and socializing activities in individuals' communication. Without a record of the content transmitted inside an electronic communication network, it would be difficult to measure and classify different types of information benefits. Traditionally, content analysis is often done through detailed ethnographic studies. While these studies are useful, they are also limiting because it is difficult to capture the communication content for a large group of people using ethnography. Similarly, traditional social network data is generated using self-reports such as surveys and questionnaires that require the subjects to recall their network connections. While respondents are generally good at remembering recent and frequent interactions, they are poor at recalling weak and distant ties (Marsden 1990; Krackhardt and Kilduff 1999). The recall bias as well as the general inaccuracy in memory can be problematic for constructing network relations that are socially distant, resulting in errors in measuring many network parameters (Marsden 2005; Kumbasar, Romney and Batchelder 1994). Using the archive of electronic communications directly can greatly alleviate this type of bias, because electronic records can precisely capture when and what exact information content is exchanged between actors.
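As a rough illustration of the two content measures described above, the following sketch counts the distinct topics in each person's messages (information diversity) and computes the share of each person's messages flagged as social (one possible form of the friendship index). The record schema, the `topic` and `is_social` fields, and the normalization by total message count are assumptions made for illustration only; the text does not specify how topics are encoded or how the index is scaled.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    """One encoded communication record (hypothetical schema)."""
    person: str      # hashed identifier of the volunteer
    topic: str       # encoded topic label (assumed to be given)
    is_social: bool  # True if classified as social/informal (assumed)


def information_diversity(messages: Iterable[Message]) -> dict[str, int]:
    """Count the distinct topics appearing in each person's messages."""
    topics = defaultdict(set)
    for m in messages:
        topics[m.person].add(m.topic)
    return {person: len(seen) for person, seen in topics.items()}


def friendship_index(messages: Iterable[Message]) -> dict[str, float]:
    """Share of each person's messages flagged as social.

    Dividing by the total message count is an assumed normalization.
    """
    total = defaultdict(int)
    social = defaultdict(int)
    for m in messages:
        total[m.person] += 1
        social[m.person] += int(m.is_social)
    return {person: social[person] / total[person] for person in total}
```

Keeping the two measures as separate functions mirrors the distinction drawn in the text: a person can score high on one and low on the other, which is what allows their effects on billable revenue and layoffs to be compared.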
In particular, I focus on employees in the consulting division of this firm whose primary function is to solve problems for clients and generate profits from billable revenue. Typically, consultants are involved in four broad categories of projects: IT consulting, business processes, application support, and outsourcing services. Consulting projects are in general information-intensive and require solving difficult problems for the client. According to qualitative interviews, consultants often spend a large amount of time assembling, analyzing, and assessing information gathered from various sources to fully understand clients' problems and make decisions and recommendations based on the information. To access diverse information that is critical for decision-making, consultants often need to reach out to experts in the organization. Having connections to the experts either directly or through colleagues is crucial for the consultants to gather and integrate information into viable solutions. Satisfying clients is extremely important because generating repeat business is the key to maintaining a continuing stream of revenue and avoiding bench time. In addition to working on the current project, consultants also have to look for future projects. Consulting work in this firm functions like an internal labor market. To avoid bench time, consultants constantly spend time searching for future opportunities. While an internal placement manager is assigned to each consultant, the manager has limited capacity to help individual consultants. Qualitative interviews indicate that a typical placement manager is responsible for 50-100 consultants at a time, and thus, relying on the placement manager alone is not enough to find suitable projects as needed. Accordingly, consultants need to be proactive to find opportunities, and social contacts can play an important role in this process.
Having access to information about project opportunities from social contacts is useful, because hearing about an opportunity early gives a consultant a timing advantage in applying for the job. Obtaining more information about the project and the person leading the project also helps consultants present their skills strategically to suit the needs of the project lead. Hence, they are more likely to be hired.

Data

To understand how social networks affect billable revenue and the risk of layoff, I analyze an electronic communication social network of 8,037 employees over two years. The data contains email, calendars and instant messaging activities inside a global information technology firm. To the best of my knowledge, this is the largest social network ever tapped to study the impact of social networks on information worker productivity. The data is collected using a privacy-preserving social network analysis system (Lin et al. 2008) that deploys social sensors to gather, crawl, and mine various types of data sources, including the hierarchical structure of the organization and individual role assignments, as well as the encoded content of email, instant messages, and calendars of employees who volunteered their data for the study. The system is deployed globally and has collected detailed electronic communication records of 8,037 volunteers. Although the volunteers only represent about 5% of the global population of the firm, they represent about 15% of employees in English-speaking regions and 23% of employees in the consulting services, which will be the primary focus in this study. To alleviate the potential problems arising from the missing parts of the whole company's network, I only examine the local network structure of each volunteer, because the system captures all the direct communications that the volunteers are involved in, including communication to non-volunteers.
Furthermore, because more than 50% of the direct contacts of these volunteers (1 degree away) are also volunteers themselves, I can also determine their dyadic relationships. For the case when the network contacts are not volunteers, it is possible to make some inference about their network connections by examining if they co-occur frequently in the same email, IM, or calendar event. When two people (B and C) are listed together as the correspondents of a third (A), they (B and C) are likely to be connected to each other as well. However, it is still possible to miss some connections among the non-volunteers, and the network structural parameters may be biased as a result. This is a common problem for network studies in the field that require setting a boundary on the population studied. But the missing connections among non-volunteers would not bias the content analysis that calculates the parameters of information benefits using the electronic communication records of the volunteers, which are fully captured. From these volunteers, it is also possible to derive a partial social network of everyone in the firm. However, I constrain the analysis to focus on the sub-network for the 8,037 volunteers whose complete electronic communication data is available. To check for potential bias in the volunteered data, I compare the job roles, demographics, the types of business functions, and hierarchical ranks of the volunteers with the rest of the firm. I find minimal differences between the two populations. However, the volunteers in my sample are on average less likely to be laid off than others in the firm. Perhaps these volunteers are more likely to be high performers, or they are more socially connected than the rest. After all, the fact that they donated their data for this research in exchange for access to social networking tools signals that they are more interested in social networking than others.
However, with a large sample of more than 8,000 people, there is sufficient variation to detect the local average effect from networks in this sub-population of more socially inclined people.

To construct a precise view of the network that reflects the real communication patterns among employees, I eliminate spam and mass email announcements. Since each electronic message includes a timestamp, I can map a dynamic panel of social networks from January 2007 to January 2009. Each monthly network is built using a sliding window of 6 months with a 1-month step size and includes all electronic messages in the current month, plus three months prior and two months after the current month. This construction of network panels reflects the network relationships more accurately than the network activities in only a single month. Using the communication data, I construct a network panel of 17 periods for 8,037 employees, which provides an opportunity of rare scale and scope to study how a person's social network evolves over time.

To explore how social networks are related to work performance, I obtain detailed financial performance records of more than 8,000 consultants. I focus on the 2,038 consultants in this sample who have volunteered their electronic communication data and the 2,592 projects that these consultants participated in from January 2007 to January 2009. The sheer volume of the data allows a more precise estimate of how population-level topology in a network, information diversity, and social communication affect objective performance measures and layoff risks. To protect the privacy of the volunteers, their identities are replaced with hash identifiers, and the content of their messages is encoded. Tables 1 and 2 show the summary statistics of these consultants, including their demographics and job roles as well as network characteristics.

Table 1: Summary Statistics for Person-Level Networks

Variable              Obs.    Mean      Std. Dev.   Min     Max
Direct Contacts       8071    106.15    116.584     1       1575
Network Constraint    8071    .531      .303        .052    1.735
Ties to managers      8071    17.518    18.349      0       256
Ties to divisions     8071    .642      1.096       0       11

Table 2: Summary Statistics on Consultants

Variable              Obs.    Mean      Std. Dev.   Min     Max
Layoff                8071    .054      .226        0       1
Gender (0-male)       8071    .184      .388        0       1
Job Rank              8071    7.768     1.508       1       12
Managers              8071    .161      .367        0       1

To study the network effect on the risk of being laid off, I use data from a round of layoffs in January 2009, when approximately 8% of the work force was eliminated. The firm's corporate policy allows for a two-month grace period during which the laid-off employees could retain their work privileges, including access to the corporate email system, intranet, and internal job postings. If they were able to find other positions within the firm during the grace period, they could be internally transferred and thus remain at the firm. However, due to the recession's severity, the firm simultaneously instituted a worldwide hiring freeze, making such internal transfers unlikely. Although I have no roster of exactly who was laid off, I can infer one by comparing the human resource (HR) directory shortly after the layoff announcement with the directory right after the actual layoff event. From the difference between the two HR databases, I can derive who has left. It is possible that some employees left voluntarily, although this is unlikely in light of the severe recession and the difficult labor market worldwide, especially in North America. Several regional offices were closed and everyone in them was laid off. I exclude them from the dataset.

Dependent Variables

The dependent variables are two types of work performance outcomes. First, I measure objective work performance using the monthly billable revenue generated by each consultant in a two-year period from January 2007 to January 2009.
Because billable revenue is the benchmark for gauging productivity in the consulting industry, it is a clear and objective performance measure widely adopted for evaluating the performance of information workers such as consultants, lawyers, and accountants. The second dependent variable is whether a consultant was laid off in January 2009. Measured using job retention, the variable is binary, equaling 0 if a person is laid off and 1 if a person is retained. I explore how network positions and the information benefits derived from these positions can increase the rate of job retention.

Explanatory Variables

I use Burt's measure of network constraint (Burt 1992) to measure network diversity, or brokerage positions.

$$\text{Network\_Diversity}_i = 1 - C_i, \qquad C_i = \sum_{j \neq i} \Big( p_{ij} + \sum_{q \neq i,j} p_{iq}\, p_{qj} \Big)^2$$

Network constraint $C_i$ measures the degree to which an individual's contacts are connected to each other as well as their connections to the individual. $p_{ij}$ is the proportion of actor i's network time and energy invested in communicating with actor j. Network constraint is a local property that measures the cohesiveness of a person's network (Burt 1992), and network diversity is the opposite of network constraint and is computed as $1 - C_i$. Since relationships may erode over time, I use a 6-month sliding window of electronic communication to gauge the network relationships in the current month. $p_{ij}$ is calculated from the tie strength, which is measured using the frequency of one's electronic communications. Granovetter (1982) described four identifying properties for the strength of ties: time, emotional intensity, intimacy, and reciprocity. In practice, tie strength has been measured in many ways. Some use reciprocation to represent strong ties and a lack of reciprocation to represent weak ties (Friedkin 1980). Others have included the recency of contact or the frequency of interactions as a surrogate for tie strength (Granovetter 1973).
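To make the constraint measure concrete, the following is a minimal Python sketch of Burt's constraint for a single ego. It is an illustration only, not the system used in this study: the function name `network_constraint`, the dictionary-of-pairs input format, and the toy networks are assumptions, and ties are treated as undirected with no self-loops.

```python
from collections import defaultdict


def network_constraint(ties, ego):
    """Burt's constraint C_i for `ego` from undirected tie strengths.

    `ties` maps frozenset({a, b}) -> non-negative tie strength
    (hypothetical input format; self-loops are not supported).
    """
    # Build a symmetric adjacency map of tie strengths.
    weight = defaultdict(dict)
    for pair, strength in ties.items():
        a, b = tuple(pair)
        weight[a][b] = strength
        weight[b][a] = strength

    def p(a, b):
        # Proportion of a's total tie strength that is invested in b.
        total = sum(weight[a].values())
        return weight[a].get(b, 0.0) / total if total else 0.0

    constraint = 0.0
    for j in weight[ego]:
        # Indirect investment in j through ego's other contacts q.
        indirect = sum(p(ego, q) * p(q, j) for q in weight[ego] if q != j)
        constraint += (p(ego, j) + indirect) ** 2
    return constraint


# Ego bridges three contacts who do not talk to each other.
star = {frozenset({"ego", c}): 1.0 for c in ("a", "b", "c")}
# Ego and two contacts who are all connected to one another.
closed = {
    frozenset({"ego", "a"}): 1.0,
    frozenset({"ego", "b"}): 1.0,
    frozenset({"a", "b"}): 1.0,
}

print(network_constraint(star, "ego"))    # three terms of (1/3)^2
print(network_constraint(closed, "ego"))  # two terms of (1/2 + 1/4)^2
```

In the open (star) network the constraint is lower than in the closed network, so the corresponding network diversity, 1 - C, is higher for the ego who spans disconnected contacts.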
To measure the tie strength in electronic communications, I primarily use the frequency, but with some modifications. Because a single electronic message does not constitute an actual tie, especially when it is sent to a large number of people, counting any message exchange between actors as a dyadic tie would overestimate the number of ties and the overall tie strength. Thus, I eliminated all messages that have more than 15 recipients (Lin et al. 2008). In addition, to accurately reflect the tie strength between two actors, I normalized the measure to an interval between 0 and 1, with 0 indicating no tie between the two actors and 1 indicating the maximal tie strength (Lin et al. 2008). The detailed calculation is described below.

$$\text{TieStrength}_{ij} = \begin{cases} 0 & \text{if } X_{ij} \le 3 + \log(\cdot) \\[4pt] \dfrac{\log(X_{ij})}{\max_k \log(X_{ik})} & \text{otherwise} \end{cases}$$

where $X_{ij}$ is the total number of electronic messages between actors i and j. Basically, the formula indicates that a tie exists only when the number of electronic messages between two actors reaches a certain threshold. This threshold is personalized; for active users of electronic media, the threshold to register a tie is higher than for those who seldom use electronic media. This measure of tie strength has been extensively tested and shown to accurately reflect the tie strength between actors (Lin et al. 2008).

Content Analysis

To measure information diversity and social communication, I use the content of electronic communications, after ensuring privacy is preserved. Individuals are hashed with unique identifiers, so it is impossible to determine their identities. To preserve the privacy of each message, the original textual content is also not recorded. Instead, I create a set of tokenized one-gram and two-gram keywords after eliminating stop words and stemming. Stop words are common words such as articles ("a," "an," "the") and prepositions (e.g., "from," "of," "to"). Stemming involves stripping each word to its root.
For example, the word "running" will be recorded as its root, "run." With these precautions, it is virtually impossible to reconstruct the original message from these tokenized keywords, which are further anonymized with hash identifiers to preserve privacy.

I model the diversity of information content using Latent Dirichlet Allocation (LDA) to classify the content into distinct topics. LDA is a statistical technique that is widely used in information retrieval and machine learning. It is a generative probabilistic model that extracts topics from a corpus of documents² (Wen and Lin 2010). Each topic is a vector of words that are statistically related to each other. For example, in Figure 2, LDA classifies the sample text into four specific topics. The topic "Children" has words including "women," "child," "care," and "parents." Similarly, the topic "Budget" includes words such as "tax," "federal," "state," and "spending."

[Figure 2: An Example of LDA Classification.³ The figure shows a sample news article about William Randolph Hearst Foundation grants to Lincoln Center, the Metropolitan Opera Co., the New York Philharmonic, and the Juilliard School, with its words assigned to the topics "Arts," "Budgets," "Children," and "Education."]

LDA classifies topics in two specific steps. The first step is a discovery phase that searches the entire topic space using every document in a corpus. Once words are classified into topics, LDA finds the topic space in each individual document. A document, in this setting, is an aggregate of all the electronic messages in a person's communication. LDA is used to classify 100 topics using the entire corpus of electronic communications of 8,037 volunteers from January 2007 to January 2009. Information diversity is then calculated for each person in every month as the total number of topics in the person's electronic communications during that month.

² Given a document corpus, LDA models each document d as a finite mixture over an underlying set of topics, where each topic t is characterized as a distribution over words. A posterior Dirichlet parameter g(d; t) can be associated with the document d and the topic t to indicate the strength of t in d. For details of the algorithm, please refer to D. Blei, A. Ng, and M. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3:993-1022, 2003.

³ Adapted from D. Blei, A. Ng, and M. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3:993-1022, 2003.

To measure referrals, I create a friendship index. First, I obtain a dictionary of every word ever used in the corpus of electronic messages. Each word is ranked by its TF-IDF⁴ weight, which measures how important a word is to a document. I then used this list to create a sub-list of words that are related to social communications and social activities with friends but also have relatively high TF-IDF values.
Examples of such keywords are "lunch," "coffee," "football," and "baseball." Two firm employees also verified that the words on the list are often used for social and informal activities. An employee then calculated the frequency of this set of words in each person's monthly communications. I created a friendship index as the ratio of words relating to social activities to the total number of words.

$$\text{Friendship Index} = \frac{\text{Words related to social activities}}{\text{Total number of words}}$$

⁴ TF-IDF stands for "term frequency-inverse document frequency." It is often used in information retrieval and text mining. The weight is a statistical measure used to evaluate how important a word is to a document. The importance increases if the word is rarely used. Frequently occurring words such as "have" will have relatively low TF-IDF weights, whereas a relatively exotic word such as "haematoma" has a high weight.

Control Variables

I include controls for individuals' demographics such as gender, managerial roles, and job ranks. A managerial role is a dummy variable indicating whether the person is a manager. Job ranks have an ordinal value ranging from 6 to 12, where level 6 is a junior consultant and level 12 is an executive vice president. A dummy variable is also created for each job rank, but results do not fundamentally change between using a set of dummy variables and the ordinal job rank. To control for the differences across various divisions and geographical locations, I include dummies for the four business divisions as well as a dummy for each geographical location. To control for the current workload, I include the average monthly revenue billed in the past six months. Lastly, to control for individual preferences to use electronic media, I include a person's total number of electronic messages (email, calendar, and instant messaging) in a month.
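The friendship index defined in this section is a simple ratio, which the following minimal Python sketch illustrates. The word list contains only the four example keywords quoted in the text, and the function name `friendship_index` and the sample tokens are hypothetical; the actual list was larger and was applied to hashed, stemmed tokens.

```python
# Illustrative subset of the social-word list (the four examples quoted in the text).
SOCIAL_WORDS = {"lunch", "coffee", "football", "baseball"}


def friendship_index(tokens, social_words=SOCIAL_WORDS):
    """Share of tokens that appear in the social-word list (0.0 if there are no tokens)."""
    if not tokens:
        return 0.0
    social = sum(1 for token in tokens if token in social_words)
    return social / len(tokens)


# Hypothetical tokens from one person's communications in one month.
month_tokens = [
    "project", "deadline", "lunch", "client",
    "coffee", "report", "budget", "football",
]
print(friendship_index(month_tokens))  # 3 social words out of 8 tokens -> 0.375
```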
Identification Despite the overwhelming evidence for strong correlations between network positions and work performance, the causal mechanism underlying the association is underexplored (Reagans and McEvily 2003). A plausible explanation is that people actively seek high performers for advice and collaboration opportunities, and hence high performers tend to display a structurally diverse network. Similarly, certain individual characteristics may manifest in their social networks. For example, a popular person tends to have a more diverse network, which may also enable the person to be an effective employee in an organization. In essence, individual traits are the missing variables that mediate both network positions and performance, so their observed relationship may be spurious. That there are positive correlations between certain individual characteristics and network positions suggests that individual heterogeneity may moderate the relationship between network positions and performance (Burt 2004, 2007; Hargadon and Sutton 1997). For example, Burt and Ronchi (2007) suggest that high-status individuals such as executives are more likely to occupy brokering positions in the firm because their roles as executives require them to reach out to a diverse range of people. Similarly, Burt (2007) suggests that inherent abilities, such as possessing performance-enhancing cognitive skills, are ultimately responsible for improving work performance. In short, in this view, network positions are a function of human capital. To detect a causal relationship between network positions and performance, an exogenous source of variation is needed (Munshi 2003). I exploit the adoption of a social networking tool that can exogenously change a person's network position over time. The primary function of this technology (Expertise-Find) is to allow users to search for experts using keywords. 
Because people resort to technologies only when they cannot find relevant experts in their immediate network neighborhood (Borgatti and Cross 2003), the experts in a user's search are often outside of the person's existing social circle. If users decide to reach out to these experts, the network diversity of the users is likely to increase after using this tool. Accordingly, the adoption of Expertise-Find could exogenously change a person's network position. If observable improvements in work performance are detected after the technology adoption, it is likely that the increase in network diversity induces the performance improvement, suggesting a causal relationship between network positions and performance.

Expertise-Find

Expertise-Find is similar to popular search engines on the Web, such as Google, with the only difference being that instead of URLs, it returns a list of people whose expertise is relevant to the search query. This tool aggregates as much information as it can about the employees inside the firm using the intranet. For example, the tool can crawl and mine public information on the intranet about employees using their online profiles, resumes, and online forums, as well as private communication exchanges if they decide to volunteer their data. In aggregate, these data serve as the basis for inferring individual expertise at the firm. For example, when searching for the phrase "Social Networks," Expertise-Find would return a list of people ranked by how relevant their expertise is to social networks (Figure 3). Each search result lists the name of the expert, a picture (if available in the public HR directory), the job role, and the division the expert belongs to. If one clicks on the person, the system shows more details, such as the physical work location and contact information. In order to understand how employees use the search tool and
how often they actually contact the experts from the search, I conducted an extensive survey about general usage and search behaviors. The consistent pattern from the survey reveals that the vast majority of people use Expertise-Find when they have already exhausted their existing local networks, consistent with earlier studies (Borgatti and Cross 2003). By contacting experts suggested by the tool, users are more likely to find the information they need, either directly or through further recommendations from the expert. Evidence also suggests that the relationship formed between the expert and the searcher can become more permanent with repeated interactions. One person interviewed commented that she made a friend after contacting an expert through the tool. One of the experts who had helped her earlier was transferred to the same work location as she was, and she offered to help the expert through the transition; they became friends afterward. Some experts also mentioned that they received thank-you gifts from the searchers they helped, and this helped to enhance their relationship. The firm has a program that sponsors this type of gift so that individuals can use them to thank people in the organization who are helpful to their work.

[Figure 3: Snapshot of Expertise-Find. Search result from searching for the phrase "Social Networks." Each result shows the expert's name, division, and job role.]

Overall, by contacting people from the search result, users are more likely to reach out to a distant group of people, increasing their network diversity. Because I have the historical electronic communication data of the volunteers, it is possible to measure the network change for the same person before and after the adoption.
If there is a change in the network position after the adoption, it is plausible to attribute the change to using the search tool. If we simultaneously observe a performance change, it is likely that the performance gain is due to the change in network positions. However, there may be self-selection factors that could induce both a network change and the adoption of the search tool, and it is important to address them. Selection Effect An important concern is that there is a selection bias in choosing when and why to sign up for Expertise-Find. The bias can simultaneously drive the adoption of the tool as well as any change in network positions. However, three factors help alleviate the bias. First, I examine the change in a person's network position before and after the adoption. If there are any unobserved individual characteristics, such as the propensity to use new technologies, that can drive both the adoption and the network change, I can eliminate this type of bias through a fixed-effect specification. Second, people adopted this tool at different times throughout the study, allowing me to control for any temporal shocks that can affect the adoption choice. For example, if people are more likely to sign up for the tool after their annual performance review in February, controlling for the February-effect can eliminate this bias. It is also plausible that people would choose to use Expertise-Find when they already have many consulting projects. Consequently, it may seem that a network change is affecting the change in billable revenue, but it is actually a reverse causality in which having a heavy workload induces people to use the technology and change their network positions as a result. In order to eliminate this bias, I use the average monthly billable revenue in the past 6 months to control for the existing workload. It is also possible that a person chooses to adopt the tool when a project requires different knowledge from what they had before. 
Hence, the person uses the tool to access information. I argue that the adoption of Expertise-Find can be particularly helpful because it provides a means for the person to reach experts in distant pockets of the organization. Thus, the tool can reduce the search costs of finding information and help the person complete projects and satisfy clients. One could also argue that it is not the network, but the ability to locate information quickly, that is ultimately responsible for inducing the performance change. Because Expertise-Find can effectively locate the source of information, it reduces the search cost of information, and it is this reduction that ultimately affects performance. However, as I argued earlier, a structurally diverse network is the reason for the reduced search cost, because such networks can generate information benefits that expose people to more information, and more unique information, than their peers. Hence, using Expertise-Find as an instrument for network diversity, I can directly observe whether network diversity produces information benefits in the forms of information diversity and social communication. After controlling for factors that may drive the adoption choice, it is plausible that the adoption is exogenous for changing the network position. Although I am aware that there could still be other unobserved heterogeneities that violate this assumption, interviews and surveys on user behaviors do not show any other consistent pattern that could drive both the technology adoption and the change in network positions.

Empirical Methods

I estimate the relationship between network diversity and work outcomes using the adoption of Expertise-Find as an instrument for network diversity. To understand how a structurally diverse network induces superior work outcomes, I examine whether network diversity actually generates information diversity and social communication, the two types of information benefits theorized to arise from a structurally diverse network.
Using the adoption of Expertise-Find, I hope to find evidence of causal relationships between network diversity and information diversity, between network diversity and social communication, and between network diversity and work outcomes. Furthermore, I am interested in how the two information benefits (information diversity and social communication) ultimately affect different types of work outcomes: billable revenue and layoffs. However, the instrumental variable approach is not sufficient to identify this relationship because I have two potentially endogenous variables but only one instrument. Hence, in order to control for the differences in individuals' characteristics, I incorporate attributes such as gender, demographics, and job roles that may affect both information benefits and work outcomes. If the unobserved heterogeneity in individuals' characteristics is correlated with the error terms in the model, estimates using pooled OLS will be biased. To address this issue, I examine the variation within and across individuals over time using both fixed- and random-effects models to control for the bias. However, this technique can only be applied when studying the impact of network positions on billable revenue, because I have a longitudinal panel of both. Layoffs, in contrast, are cross-sectional because being laid off is a one-time event. Thus, I can only control for observable individual characteristics, instead of exploiting the variation within individuals as I could with the analysis of billable revenue. To alleviate the endogeneity concern in analyzing layoffs, I employ the lagged measurements of network characteristics at time t-1 to predict layoffs at time t. Specifically, I use the electronic interactions six months prior to the layoff events to calculate network variables. If networks are to have an effect on layoffs, the network of communications prior to the layoff event should have an important impact.
Furthermore, I also include individuals' objective performance, billable revenue, to predict layoffs, because employees with superior performance should have a lower probability of being laid off. To mitigate the estimation problem arising from the endogenous relationship between network characteristics and billable revenue, I use the billable revenue generated 6 months prior (at time t-2) to when network characteristics are calculated (at time t-1).

Results

Network Change from the Technology Adoption

First, I examine whether the adoption of Expertise-Find can actually induce a change in network positions. It is possible that the adoption is not a random event. However, because I examine the network change for the same person over time, fixed-effect models can eliminate many individual heterogeneities, such as human capital, that might bias the estimates. I also control for temporal shocks to mitigate some biases from time-varying characteristics. By including a dummy for each calendar month, I eliminate the seasonal effects that can drive the adoption choice. However, there might still be time- and individual-varying biases. For instance, it is possible that people are more likely to adopt this technology when facing high workloads. Thus, I use the average billable revenue in the past six months to control for the general workload at the time of the adoption.

[Figure 4 plot: "Event Study: Structural Diversity and Technology Adoption." X-axis: months since signup (-10 to 10). Series: coefficients on event month, coeff high, coeff low.]

Figure 4: Event study for when people adopted Expertise-Find. Each point on the graph is the coefficient estimate calculated from regressing network diversity on each month since adoption after controlling for calendar-month dummies, past billable revenue, and individual fixed effects. A value of zero on the X-axis indicates that Expertise-Find has just been adopted.
Negative values on the X-axis indicate the number of months before the adoption occurred, and positive values indicate the number of months that have passed since the adoption. The graph shows that the effect on structural diversity gradually rises after the adoption event at X=0.

To construct the technology adoption variable, I use a binary variable that equals 1 for every month after the person has adopted Expertise-Find and 0 before the adoption has happened. Overall, there is a positive and significant correlation between the instrument and the endogenous variable in the first-stage regression. Using the fixed-effect model, I find that the correlation between network diversity and the adoption of Expertise-Find is .114 (t = 17.86) after controlling for seasonality and past performance. To assess the strength of the instrument, I calculate the concentration parameter, which is 86.7, indicating that the adoption of Expertise-Find is not a weak instrument (Hansen, Hausman, and Newey 2004).⁵ Figure 4 shows the relationship between network diversity and the timing of the adoption in an event study. Each data point on the graph shows the coefficient estimate calculated from regressing network diversity on each month before and after the adoption event in the 2-year period in my sample. After factoring out seasonality, individual fixed effects, and past performance, the coefficient estimate for months after the adoption event (X>0) is increasing over time, indicating that Expertise-Find can induce a change in network diversity (Figure 4). The reduced-form regressions in Table 3 show that the adoption is positively associated with generating billable revenue as well as reducing the risk of layoff (or increasing job retention).
After controlling for temporal shocks, individual fixed effects, and a person's past performance, Model 1 of Table 3 shows that the adoption of Expertise-Find is positively associated with generating more billable revenue. As for layoffs, the adoption is also positively correlated with job retention. However, because the layoff event has only one observation for each person, it is impossible to use the earlier instrumental variable approach that relies on the network change before and after the adoption. Instead, I use the number of months since a person signed up for Expertise-Find to instrument for network diversity. As shown in Figure 4, network diversity gradually increases after a person has started to use the tool. Thus, other things being equal, early adopters should have more structurally diverse networks than late adopters. To test the validity of this instrument, I calculate the concentration parameter in the first-stage regression, and the value is 11, slightly above the cut-off of the weak-instrument test. The reduced-form regression (Model 2) shows that the number of months passed after the adoption is positively correlated with job retention.

⁵ The test for a weak instrumental variable requires the concentration parameter to be greater than 10 (Hansen, Hausman, and Newey 2004). Any value less than 10 indicates the presence of a weak instrument.
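The logic of instrumenting network diversity with tool adoption can be illustrated with a small simulation. This is a sketch on synthetic data only, not a re-analysis of the thesis data: the data-generating process, the coefficient values, and all names (`simulate`, `trait`, `true_effect`) are assumptions chosen for illustration, and the instrument is exogenous by construction. With a single instrument and no other covariates, the IV estimate reduces to cov(z, y) / cov(z, x).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Synthetic data in which an unobserved trait drives both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        adopted = 1.0 if rng.random() < 0.5 else 0.0   # instrument (tool adoption)
        trait = rng.gauss(0, 1)                        # unobserved individual trait
        diversity = 0.5 * adopted + trait + rng.gauss(0, 0.5)
        revenue = true_effect * diversity + 3.0 * trait + rng.gauss(0, 1)
        z.append(adopted)
        x.append(diversity)
        y.append(revenue)
    return z, x, y


z, x, y = simulate()
ols_estimate = cov(x, y) / cov(x, x)  # biased upward by the omitted trait
iv_estimate = cov(z, y) / cov(z, x)   # uses only adoption-induced variation in x

print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate:  {iv_estimate:.2f} (true effect is 2.0)")
```

In this synthetic setting the OLS slope overstates the true effect because the omitted trait raises both diversity and revenue, while the IV estimate stays close to the true value. The same qualitative pattern, an IV estimate smaller than the OLS estimate, is what an upward omitted-variable bias would produce.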
Table 3: Reduced-form Regressions: Adoption on Billable Revenue and Retention

Model                      (1)                          (2)
Dep. var.                  Monthly Billable Revenue     Retention
Adoption                   584.15**                     .028**
                           (282.30)                     (.123)
Individual Fixed Effect    Yes                          No
Month Dummies              Yes                          No
Control variables          Communication Volume,        Communication Volume,
                           Past Billable Revenue,       Past Billable Revenue,
                           Work divisions,              Work divisions,
                           geographical locations       geographical locations
#people                    2,038                        1,506
Average Month Revenue      11,842                       11,842
Observations               20,373                       1,506

Clustered standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Network Effect on Billable Revenue

Next, I examine whether a technology-induced change in network positions can induce a change in performance over time. Model 1 of Table 4 shows the OLS estimate of the correlation between network diversity and billable revenue, after controlling for demographics, the work division, and the managerial and technical level of the person. This is what has traditionally been estimated in previous work on the relationship between network diversity and performance. As shown in Model 1, the coefficient estimate for network diversity is positive and the effect is relatively large. A 1% increase in network diversity is correlated with billing an additional 886 US dollars in a month. However, when a fixed-effect specification is used (Model 2), the size of the coefficient is reduced by 17% ($\beta_{\text{network diversity}} = 733.0$, p < .01). This shows that unobserved time-invariant individual characteristics could drive changes in both network diversity and work performance. In Model 3, I estimate the effect of network diversity on billable revenue using the adoption of Expertise-Find as an instrumental variable (IV) for network diversity. The coefficient from this IV regression is reduced dramatically, by 82% relative to the fixed-effect estimate ($\beta_{\text{network diversity}} = 126.5$, p < .1), demonstrating that time-varying individual heterogeneity can still bias the estimate upward.
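As a quick arithmetic check on the reported reductions, the following short Python sketch recomputes the percentage changes from the three coefficient estimates quoted in the text (886.4 for OLS, 733.0 for fixed effects, 126.5 for IV). The helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Coefficient estimates for network diversity quoted in the text.
ols, fe, iv = 886.4, 733.0, 126.5

print(round(percent_reduction(ols, fe), 1))  # OLS -> fixed effects: 17.3
print(round(percent_reduction(fe, iv), 1))   # fixed effects -> IV:  82.7
print(round(percent_reduction(ols, iv), 1))  # OLS -> IV:            85.7
```

The 17% figure compares the fixed-effect estimate with the OLS estimate, and the 82% figure matches the drop from the fixed-effect estimate to the IV estimate; measured against the OLS estimate, the IV coefficient is about 86% smaller.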
However, network diversity in the IV regression continues to be positive and statistically significant, demonstrating that it can induce a positive change in performance.

Table 4: Network Diversity and Performance

Model (1) (2) (3) (4) (5) (6) Dependent Variable Monthly Monthly Monthly Monthly Monthly Monthly Monthly Model revenue OLS revenue FE revenue IV revenue FE revenue IV revenue FE 843.6** 181.00* 882.4** 886.4*** 733.O*** 126.5 (154-11) ( ** (12.50) * (16o.28) 239.5** (18.20) * (160.55) .290*** -. 320 .253*** -.463* (.290) (.09) (.27) .110*** .121*** Log Diversity: log(1- constraint) Volume of (7) revenue IV (117.90) email/IM/calendar events I____(.09) Average billable revenue in the past 6 months I 1 1(.) (.oi) Controls Gender (0-male) Manage (dummy) -258.6 -- -- -- -- -- -- (388.9) -- -- -- -- -- -- -617.0 -- -- -- -- -- -- (692.8) -- -- -- -- -- -- -- - -- -- -- -- 1,369*** Job rank (6-12) (127.2) Business Consultant 7,367*** Division (dummy) (874.1) Technology 2,363* Consultant -- -- -- -- -- -- -- -- -- -- -- -- Division (dummy) (1,220) Sales Division (dummy) 612.1 -- -- -- -- (i,6o) 161.8 -- -- -- -- -- Headquarter (dummy) -- -- - -- -- -- (1,280) -- -- - -- -- -- -- 4,348*** -- -- -- -- -- -- (dummy) (1,088) -- -- -- -- -- -- Observations 20,373 20,373 20,373 20,373 20,373 20,373 20,373 2,038 2,038 2,038 2,038 Software Division #employees 2,038 2,038 2,038 Controls: monthly dummy for each of the 24 months

In Models 4 and 5 of Table 4, I incorporate the total number of electronic messages exchanged over a month as a control for individual differences in online media use. It is possible that tech-savvy individuals are more likely to adopt a new technology and simultaneously be high performers. After controlling for usage of electronic media, the results largely mirror the earlier results.
The parameter estimate of network diversity in the IV model is significantly smaller than the estimate in the fixed-effect model, but the coefficient is still positive and statistically significant. It is also possible that existing workload drives the adoption of Expertise-Find, as people seek to use the tool to help with a high workload. To address this potential bias, I control for the average monthly revenue in the past 6 months. As shown in Models 6 and 7, while past performance is strongly correlated with current billable revenue, the IV estimate for network diversity continues to be positive and significant. However, the size of the effect estimated in the IV regression is much smaller than the estimates from the fixed-effect and OLS models. Overall, these results largely support Hypothesis 1a.

Network Effect on Layoffs

While I show evidence of a causal relationship between network diversity and billable revenue, I also examine whether network positions can affect a person's risk of being laid off. If network positions have an impact on work outcomes, they should have an even more pronounced impact on layoffs because, unlike promotions and performance evaluations, layoffs are a more traumatic experience for most people, and network contacts should play an important role in keeping a person from being laid off. Table 5 shows the cross-sectional analysis of the network effect on layoffs using the network characteristics calculated from six months of electronic communications prior to the layoff event. The first model of Table 5 shows the effect of network diversity on job retention (1 - layoffs). Gender and job roles do not show any statistically significant effect on job retention, but geographical locations appear to have an effect. Compared to workers in the European Union, workers in the US are more likely to be laid off. This difference is probably due to stronger labor laws in Europe, which make it harder for the firm to downsize.
I also control for the usage of digital media and show that network diversity is positively correlated with job retention, but it is just short of being statistically significant (Model 1).

Table 5: Network Diversity and Layoff/Retention

Model (1) (2) (3) Dependent Variable Model ln(network diversity): ln(i- constraint) (4) (5) retention Probit retention IV Probit retention Probit retention Probit retention IV Probit .105 (.077) .157*** (.019) -150* (.085) .105* (.062) .118** (.060) .090** .096** .155*** (.043) (.044) (.042) -.050 --197** (.057) (.093) ln(Billable revenue) ln(Friends' Billable Revenue) Controls Volume of email/IM/ calendar events Gender (0-male) Job Role (level 6-12) Europe Asia Australia US (dummy) 1.58eo5 (1.48eo5) .089 -7.72e-05*** (1.81e-o5) .112 1.52eo5 (1.62eo5) .0480 2.70eo5 (1.94eo5) -8.89e-06 (3.o3e-o5) .064 -.031 (.125) (.0861) (.135) (.136) (.132) -055 .163*** .0568 .056 .079 (.0360) (.0467) (.0412) (.0413) (.096) .674*** .102 .646*** .657*** .406 (.157) (.240) (.174) (.175) (.293) .358 .510** .249 .478* .474* (.225) (.193) (.248) (.250) .421 (.350) -.419*** .423 .0829 .598* (.259) (.200) (.332) -. 362** -.194 -.445*** .619* (.334) --445*** (.141) (.134) (.157) (.158) (.160) .125 .00747 (.384) .773 (-744) .716 (.764) .663 (.666) (-575) .353 --519 (-371) -.666 .264 (.618) .530 .280 (.634) (-705) (.463) (.754) .467 .032 (-585) .087 (-773) (.729) .003 -.546 .0475 -.0295 (.620) .585 (.695) -.314 (.383) (.660) (.612) .524 .783 (.418) (.748) (.680) -724 (.764) 1,927 1,927 Technology Division (dummy) (.635) Division .133 Business (dummy) Sales Division (dummy) Software (dummy) Headquarter (dummy) Division Observations 1,927 Standard errors in parentheses 1,927 1,927 p<o.oi,** p<o.o5, * p<o-1 .902 (.651)

In Model 2, I examine whether the relationship between network diversity and the risk of being laid off could be causal using instrumental variables.
I use the number of months that have passed since a person adopted Expertise-Find as an instrument for network diversity. However, the instrument can be problematic because there might be individual characteristics that drive both the network change and the likelihood of adopting early. This is more problematic in a cross-sectional analysis, because it is impossible to use a fixed-effect model to eliminate time-invariant individual characteristics, such as the propensity to be an early or late adopter. To address this issue, I control for demographics, gender, job role, rank, and other observable individual traits. However, I am aware that there might still be unobserved factors that drive both changes in network positions and the risk of layoffs. The instrumental variable approach shows that the coefficient on network diversity is positive. Specifically, a one-percent increase in network diversity is correlated with a 15.7 percent increase in job retention, providing evidence that peripheral actors are more likely to be laid off than those who occupy more central positions in the network. However, it is possible that those with a structurally diverse network simply perform better and for that reason are less likely to be laid off. To address this issue, I control for objective work performance using billable revenue. However, billable revenue is an endogenous variable because network positions can simultaneously affect billable revenue and the risk of being laid off. Hence, including billable revenue as an independent variable is problematic (Angrist and Pischke 2009). To address this problem, I use billable revenue lagged 6 months before the network characteristics are measured. The timing difference implies that billable revenue is predetermined before network positions are calculated. Thus, it is less likely to be an outcome in the causal nexus (Angrist and Pischke 2009).
Using billable revenue from an earlier period is also beneficial because it controls for the possibility that people who are finishing their current projects face an increased risk of layoff when they have not lined up any future projects. As expected, objective performance, measured by billable revenue, is a strong predictor of job retention (or reduced risk of layoff). Similarly, network diversity continues to be positively correlated with job retention (Model 3, Table 5). If the main advantage of having a structurally diverse network is access to relevant information and expertise, the billable revenue generated should capture the performance impact of network diversity. But Model 3 shows that a structurally diverse network provides additional protection against layoff, even after controlling for objective performance. Interestingly, the effect of network diversity is actually greater than that of billable revenue (β_billable revenue = .090, β_network diversity = .150). The F-test is significant at the p < .001 level, demonstrating that, in addition to providing information diversity, a structurally diverse network can protect a worker from being laid off beyond generating more billable revenue for the person. A possible explanation for why network diversity can reduce the risk of layoff even after controlling for billable revenue is that actors with a structurally diverse network can be instrumental in helping others generate revenue for the firm. By providing key information and expertise to their network contacts, these actors can indirectly contribute to the profitability of the firm and, accordingly, are less likely to be laid off. If this is the case, we would expect the billable revenue generated by network contacts to reduce a person's risk of being laid off.
However, the average billable revenue generated by a person's network contacts (one degree away) is not statistically significantly correlated with retention (Model 4). This result provides some evidence that helping others does not reduce a person's risk of being laid off. Lastly, I examine whether the results could be causal, using the number of months that have passed since a person adopted Expertise-Find as an instrument for network diversity. Model 5 shows that a 1% increase in network diversity is associated with an 11.8 percent increase in job retention, demonstrating that network diversity has a significant impact on layoffs. However, the sizes of the effects of network diversity and billable revenue are comparable. Interestingly, the average billable revenue of network contacts increases the risk of layoffs. This is possibly because when others perform well, the person's relative performance declines, which increases his or her risk of layoff. Taken together, these results show that a structurally diverse network is positively associated with job retention, supporting Hypothesis 1b. To understand exactly how a structurally diverse network can increase the rate of retention as well as generate more billable revenue, I examine the effect of the information benefits derived from a structurally diverse network. In particular, I focus on information diversity and social communication and their impacts on work outcomes.

Information Diversity and Social Communication as a Function of Network Diversity

I explore whether a structurally diverse network, as measured by network diversity, actually generates information benefits, specifically in the forms of information diversity and social communication. Information diversity is calculated as the number of topics in a person's electronic communications.
Social communication is calculated using a friendship index that measures the frequency of words in the messages that are related to socializing and informal activities. The friendship index is a proxy for the referral process, because friends are more likely to advocate for the person, trumpeting his or her accomplishments at key junctures such as layoffs. Table 6 shows the relationship between information diversity and network diversity, and between the friendship index and network diversity. I find strong evidence that network diversity generates information diversity. Using a fixed-effect model, a one-standard-deviation increase in network diversity is associated with finding an additional 1.5 topics in one's electronic communication (Model 1). The effect continues to be positive (β_network diversity = 6.11, p < .05) when an instrumental variable is used for network diversity. This is similar to the findings of Aral and Van Alstyne (2007), who also find a structurally diverse network to be positively correlated with access to diverse information. Next, I examine whether a structurally diverse network can also facilitate the referral process, as approximated by social communication. As with information diversity, the fixed-effect model shows that a one-standard-deviation increase in network diversity is correlated with gaining .01 points in the friendship index, which is about a one percent increase (Model 3). Network diversity continues to be positively associated with the friendship index (β_network diversity = .216, p < .05) when the adoption of Expertise-Find is used as an instrumental variable. Overall, the fixed-effect and the IV regressions show a causal relationship between network diversity and social communication. Having friends in a diverse network can facilitate the referral process, in which friends serve as advocates for the person. These results support Hypotheses 2a and 2b.
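The friendship index described in this section is a word-frequency measure. A minimal sketch of one way such an index could be computed is given here. The word list, the tokenization rule, and the function name are all hypothetical, since the thesis does not list its vocabulary or preprocessing steps in this passage.

```python
import re

# Hypothetical lexicon of words related to socializing and informal activities.
# The actual vocabulary used in the study is not given in this section.
SOCIAL_WORDS = {"lunch", "dinner", "coffee", "weekend", "party", "birthday", "drinks"}

def friendship_index(messages, lexicon=SOCIAL_WORDS):
    """Share of word tokens across all messages that appear in the social lexicon."""
    tokens = [t for m in messages for t in re.findall(r"[a-z']+", m.lower())]
    if not tokens:
        return 0.0  # no text, so no measurable social communication
    return sum(t in lexicon for t in tokens) / len(tokens)

# Toy example: 10 tokens in total, 2 of which ("lunch", "birthday") are social.
msgs = ["Lunch on Friday?", "Please send the revenue report", "Happy birthday!"]
idx = friendship_index(msgs)
print(idx)
```

Normalizing by the total token count keeps the index comparable across people who send very different volumes of messages, which is one reason a share is used in this sketch rather than a raw count.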
Table 6: Relationships among Network Diversity, Information Diversity and Social Communication

Model (1 (2) Dependent Information Information Variable Diversity Diversity Model FE IV Network diversity 1.466*** 6.105** (std) (.196) (3-05) (-.003) (.105) .00207*** -.0111 (.00684) 7.39e-05*** (2.92e-o6) 4.79e-05*** (1-37e-05) .000166 4.03e-08 (4.29e-o7) 15,634 .064 1,912 9.90e-07 (6.89e-07) 15,634 (3) (4) FE IV Friendship Index 1** Friendship Index .216** Controls Volume of email /IM/calendar events ln(Billable revenue) Observations R-squared Number of people (.000267) -8.41e-o5*** (3.11e-o5) 9,666 .027 (.000169) 9,666 1,912 1,912 I I Controls: monthly dummies for each of the I 24 1,912 _I dummies and individual fixed effect Standard errors in parentheses *** p<o.oi, ** p<o.05, * P<o-1

In Table 7, I explore whether information diversity and social communication are complements or substitutes by examining their correlations. After controlling for temporal shocks, individual fixed effects, and past performance, the correlation between information diversity and the friendship index is negative, suggesting a potential substitutive relationship between the two. Though they can overlap, gathering information and socializing are two distinct activities. This shows that a structurally diverse network can generate both instrumental and expressive elements, as shown by information diversity and social communication, respectively. However, there might be a tradeoff between the two in generating the desired work outcome, as I explore in the next section.

Information Diversity, Social Communication and Their Relations to Billable Revenue and Layoffs

To examine how a structurally diverse network improves work performance, I explore how information diversity and social communication differ in generating billable revenue and reducing the risk of being laid off. Table 8 shows the effect of these factors in generating billable revenue.
Model 1 shows that, after controlling for the volume of communication, a one-standard-deviation increase in information diversity is correlated with generating an additional $187.50 of billable revenue, while the friendship index is not statistically significantly correlated with billable revenue. When both information diversity and the friendship index are included as independent variables in the same model (Model 3), I find that information diversity, but not the friendship index, is positively correlated with generating billable revenue. In Models 4-6, I control for past billable revenue, because it could be serially correlated with current billable revenue. Results in these models largely mirror the earlier results in Models 1-3: only the coefficient on information diversity is statistically significant; the coefficient on the friendship index is not. Overall, these results support Hypothesis 4a. In Model 7, I explore whether information diversity and the friendship index serve as complements or substitutes. The interaction effect (β_information diversity × friendship index = -219.18, p < .05) is negative. Together with the earlier negative correlation between information diversity and the friendship index (Table 7), these results show that the two serve as substitutes for generating billable revenue (Athey and Stern, 1998; Brynjolfsson and Milgrom, 2010).
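The substitutes test described above rests on the sign of an interaction term. The following sketch illustrates that idea on synthetic cross-sectional data with plain OLS. It is a simplification under stated assumptions: it omits the individual fixed effects and month dummies of the actual specification, and every coefficient in the data-generating process is made up for illustration.

```python
import numpy as np

# Synthetic, standardized regressors; all numbers are hypothetical.
rng = np.random.default_rng(1)
n = 4000
info = rng.normal(size=n)    # information diversity (standardized)
friend = rng.normal(size=n)  # friendship index (standardized)

# Assumed data-generating process with a negative interaction (substitutes).
revenue = (200 * info + 20 * friend - 150 * info * friend
           + rng.normal(scale=500, size=n))

# OLS with an intercept, both main effects, and their product.
X = np.column_stack([np.ones(n), info, friend, info * friend])
beta, *_ = np.linalg.lstsq(X, revenue, rcond=None)

# Conventional (homoskedastic) standard errors and the interaction t-statistic.
resid = revenue - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_interaction = beta[3] / se[3]

print(f"interaction = {beta[3]:.1f}, t = {t_interaction:.1f}")
```

A negative and significant interaction coefficient means the marginal return to one input falls as the other rises, which is the sense of "substitutes" used in the text. A positive coefficient would instead point toward complements.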
Table 8: Network Information and Performance: (2) (3) (4) 187.5* 192.4 (108.1) (108.1) 244.6** (107.0) Position Diversity and Social Communication Model Dependent Variable Model Information Diversity (standardized) Friendship Index ( Monthly revenue FE Monthly revenue FE Monthly revenue FE 5.695 (standardized) Monthly revenue FE Monthly revenue FE 42.76 (66.86) 147.3 (122.8) (65.20) (5) (6) (7) 249.5** 239.9** (107.0) (107.1) 178.3 -33-30 (163.48) Monthly revenue FE (100.3) -219.18** Information Diversity X Friendship Index 1 (115.8) Controls volume of communication Observations #employees Monthly revenue FE 20,373 2,038 20,373 2,038 volume of communication past billable revenue 20,373 20,373 2,038 2,038 20,373 2,038 20,373 20,373 2,038 2,038 Controls: monthly dummy for each of the 24 months, individual-level fixed-effect Standard errors in parentheses ***p<o.oi, ** p<o.o5

Next, in Table 9, I explore the effect of information diversity and the friendship index on job retention (reducing the risk of layoff). The first model shows the correlation between information diversity and job retention after controlling for demographics, gender, job ranks, and dummies for regions and business divisions. Contrary to the result in the performance analysis, information diversity is not correlated with retention (Model 1). However, a one-standard-deviation increase in the friendship index is associated with an 11 percent increase in job retention (Model 2). When both information diversity and the friendship index are included jointly in the model, the friendship index is still positively associated with retention but information diversity is not. The F-test shows that the effect of the friendship index is greater than that of information diversity at the p = .01 level. This set of results suggests that social communication, which approximates the referral process, is more important for avoiding layoffs than is information diversity.
This is the exact opposite of the performance analysis, where information diversity is more correlated with generating billable revenue than is social communication. Because work performance could also have a significant impact on layoffs, I control for past billable revenue, using data from 6 months prior to the layoff event (Models 4-6 of Table 9). All else being equal, high performers are more likely to be retained than low performers. It is also possible that a person contributes indirectly to firm profits by helping his or her colleagues. Thus, I control for the average billable revenue of the network contacts in Models 4-6. As in Models 1-3, I find that, compared to information diversity, social communication as measured by the friendship index is more correlated with job retention (Model 4). The F-test shows that the coefficient of the friendship index is greater than that of information diversity in maximizing job retention (p < .001). Together these results demonstrate that social communication, which approximates the referral process, is the primary channel through which a structurally diverse network drives job retention, lending support to Hypothesis 4b. In qualitative interviews, managers state that a person is less likely to be laid off when others have heard about his or her work either directly or indirectly. Because friends are more likely to advocate for friends, having a diverse group of friends is helpful in averting crises, such as layoffs. Next, I explore whether information diversity and social communication are substitutes in reducing the risk of getting laid off. Model 7 shows the interaction effect of information diversity and the friendship index to be negative and statistically significant. Together with their negative correlation (Table 7), these results demonstrate that information diversity and social communication are substitutes (Athey and Stern, 1998; Brynjolfsson and Milgrom, 2010).
Overall, these results show that a structurally diverse network can have both instrumental (information diversity) and expressive (social communication) elements, contrary to the notion that it is rare to have both because one may dampen the effect of the other. However, the tradeoff re-emerges in the ability of a person to mobilize either information diversity or referrals to achieve a desired work outcome. The substitutive relationship between the two shows the limitation in mobilizing them together. The return from investing in social communication will dampen the return to investment in information diversity. An individual could develop both kinds of social capital but could not benefit from both equally.

Table 9: Networks and Layoff Risks: Information Diversity vs. Social Communication

Model (1) (2) (3) (4) retention retention retention retention Dependent Variable Probit Probit Probit Probit Model Information .026 .031 .062 Diversity (.064) (standardized) Index Friendship (standardized) Information Diversity .110* (.o61) (.064) .112* (5) (6) retention retention Probit Probit .0683 (.073) .193** (.090) (.062) (7) retention Probit .0814 (-074) (.083) .184** (.090) .209** (.097) -.264** X Friendship Index Log(Billable .125*** .16** .120** revenue) (.047) (.048) (.048) (.048) Log(Friends' Billable -.020 (.066) 1,927 -.013 (.064) 1,927 -.027 -.020 (.067) 1,927 (.067) 1,927 Revenue) Observations 1,927 1,927 1,927 (.139) .123*** Controls: Volume of email/IM/ calendar events, gender, job rank, regional dummies, business division dummies Standard errors in parentheses *** p<0.01, ** p<0.05, * p<o-1

Discussion and Conclusion

In this study, I examine the impact of social networks on billable revenue and layoffs. Using the adoption of a social networking tool that could change a person's network position over time, I show evidence of a causal relationship between network diversity and billable revenue and between network diversity and layoffs.
However, the size of the effect is much smaller than the traditional OLS and fixed-effect estimates. Because this tool can improve a person's network position primarily through information-seeking activities, the improvement in work performance is likely to come from the information benefits derived from having a structurally diverse network. However, it is worth noting that there are different types of information benefits. I show two types of information benefits, information diversity and referrals, which could have different effects in generating billable revenue and avoiding layoffs. Using the adoption of Expertise-Find as an instrument for network diversity, I show that a structurally diverse network can generate both referrals, as approximated by social communication, and information diversity. To examine how the effect of information diversity differs from the effect of referrals in generating superior work outcomes, I take advantage of information technology that captures the digital traces of people's daily communications. I use advanced machine-learning techniques to assess the content of people's electronic communications. To measure the diversity or novelty of information content, I calculate the number of distinct topics in a person's communications. To measure referrals, I calculate a friendship index that captures the frequency of words in the electronic communications that are related to informal and social activities. Comparing the measure of information diversity with the friendship index, I show that the former is positively correlated with generating billable revenue, whereas the latter is not. However, I find that in the case of layoffs, the friendship index is positively associated with retention, while information diversity is not. Interviews with managers who participated in the layoff decisions suggest that the referral is more important for job retention because layoffs can have a dramatic effect on the remaining colleagues.
Thus, these colleagues are likely to serve as advocates in critical situations such as impending layoffs, promoting one's work and accomplishment to others. This can reduce the probability of being laid off. The two types of information benefits can also be classified as instrumental and expressive, with information diversity being the instrumental element and social communication being the expressive element. Contrary to the notion that it is difficult to have both in a network because instrumental ties may dampen the effect of expressive ties (Fernandez 1991), I show it is possible to have both in a structurally diverse network. However, information diversity and the friendship index are shown to be substitutes for generating billable revenue and reducing the risks of getting laid off, suggesting a tradeoff in the returns from investing in instrumental (information diversity) or expressive elements (social communication) in a structurally diverse network. While it is possible to have both types of information benefits from a structurally diverse network, one cannot benefit from both equally. Thus, looking at their effects on different work outcomes is important. Information diversity primarily drives billable revenue, which is an objective and contractible performance metric. On the other hand, social communication is intangible and un-contractible. For example, those with more affective relationships could be great team players, facilitating collaborations and distributing their resources to other team members when needed. Because their services are instrumental to the success of the team, they are less likely to be laid off despite having lower objective performance evaluations. However, social communication can also be viewed negatively. For example, if multiple factions and cliques have formed as the result of politics, members of the same faction are more likely to protect their own members even if their objective performance evaluations are inferior. 
Thus, the amount of time and energy devoted to generating information diversity and social communication may depend on the reward structure and the firm culture. If the reward structure is more aligned with generating billable revenue, employees would spend more time and energy gathering diverse information, which is shown to be the key to generating profits for the firm. However, when the culture of the firm is more group-focused, or when individual work outcomes, such as layoffs, also depend on the team, employees are more likely to spend time socializing, forming friendships, and lobbying supporters. In the case of layoffs, where the decision is not based purely on observable performance metrics such as billable revenue, having supporters to advocate on one's behalf can significantly reduce the layoff risk. From the firm's perspective, delegating layoff decisions to managers would be optimal if social communication improves the effectiveness of collaboration among team members and contributes to the profitability of the firm, because managers have private information about the employees, including friendships, that the firm cannot observe. But it is also possible for managers to have a different objective function from that of the firm; a manager could choose to lay off a person to maximize his or her own power inside the organization, even at the expense of the firm. My data analysis suggests that social communication does not contribute to one's own billable hours or to the billable hours of one's contacts. This suggests that the impact of social communication on layoffs is evidence that delegating layoff decisions to managers has important costs. Future work could examine more fully the costs and benefits of such delegation in order to improve our understanding of the optimal allocation of decision rights within firms.

References

Athey, S., and S.
Stern (1998) "An Empirical Framework for Testing Theories about Complementarities in Organizational Design," NBER Working Paper No. 6600.
Ancona, D.G. and Caldwell, D.F. 1992. "Demography and Design: Predictors of new Product Team Performance." Organization Science, 3(3): 321-341.
Angrist J, Pischke J. Mostly Harmless Econometrics, Princeton University Press, 2009.
Aral S, Van Alstyne M. 2007. Networks, Information & Social Capital. International Conference on Network Science 2007.
Bales, R. F. and Slater P. 1955. "Role Differentiation in Small Decision Making Groups." Chapter V in Family, Socialization and Interaction Process, edited by Talcott Parsons and Robert F. Bales. Glencoe, IL: Free Press.
Berger, J and Milkman, K.L. 2010, Social Transmission, Emotion, and the Virality of Online Content.
Borgatti S., and Cross R. 2003, "A Relational View of Information Seeking and Learning in Social Networks," Management Science 49(4): 432-445.
Borgatti, S. and Foster, P. 2003. "The network paradigm in organizational research: A review and typology." Journal of Management, 29(6): 991-1013.
Brynjolfsson, E., Milgrom, R. (2008) "Complementarities in Organizations", NBER Workshop on Economics of Organization.
Burt R. 1992. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, MA.
Burt, R. 2000. "The network structure of social capital" In B. Staw, and Sutton, R. (Ed.), Research in organizational behavior (Vol. 22). New York, NY, JAI Press.
Burt R. 2004. Structural Holes & Good Ideas. American Journal of Sociology, 110: 349-99.
Burt R. 2005. Brokerage and Closure: An Introduction to Social Capital, Oxford University Press, New York, NY.
Burt R. 2007. Secondhand Brokerage: Evidence on the Importance of Local Structure for Managers, Bankers, and Analysts, Academy of Management Journal, 2007, 50(1), pp. 119-48.
Burt R., Ronchi, D. 2007 Teaching Executives to See Social Capital: Results from a Field Experiment, Social Science Research, 2007.
Burt, R. 2008. "Information and structural holes: comment on Reagans and Zuckerman." Industrial and Corporate Change, 17(5): 953-969.
Coleman, J.S. 1988. Social Capital in the Creation of Human Capital. American Journal of Sociology, (94): S95-S120.
Cross, R. and Cummings, J. 2004. "Tie and Network Correlates of Performance in Knowledge Intensive Work." Academy of Management Journal, 47(6): pp. 928-937.
Cummings J, Cross R. 2003. Structural properties of work groups and their consequences for performance. Social Networks, 25(3): 197-210.
Etzioni, A. 1965. "Dual Leadership in Complex Organizations." American Sociological Review 30: 688-98.
Fernandez, R.B. 1991. "Structural Bases of Leadership in Intraorganizational Networks." Social Psychology Quarterly 54: 36-53.
Fombrun, C.J. 1982. Strategies for network research in organizations. Academy of Management Review, 7: 280-291.
Friedkin, N. 1982. Information Flow Through Strong and Weak Ties in Intraorganizational Social Networks. Social Networks, 3 (1982) 273-285.
Goldstein, D. G., Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109, 75-90.
Goldstein, D. G., Gigerenzer, G. 1999. The recognition heuristic: How ignorance makes us smart. In G. Gigerenzer, & P. M. Todd, (Eds.). Simple heuristics that make us smart. Oxford: Oxford University Press.
Granovetter, M. 1973. The strength of weak ties. American Journal of Sociology, 6: 1360-1380.
Granovetter, M. 1982. The strength of weak ties: A network theory revisited. In P. V. Marsden and N. Lin (eds.), Social Structure and Network Analysis: 105-130.
Hargadon, A. and R, Sutton. 1997. "Technology brokering and innovation in a product development firm." Administrative Science Quarterly, (42): 716-49.
Hansen, M. 1999. The search-transfer problem: The role of weak ties in sharing knowledge across organization subunits. Administrative Science Quarterly (44:1): 82-111.
Hansen, C., Hausman, J., and Newey, W., Many Weak Instruments and Microeconometric Practice, mimeo July 2005.
Homans, G.C. 1974. The Human Group, Social Behavior: Its Elementary Forms, Harcourt, Brace & World.
Ibarra, H. 1992. "Homophily and Differential Returns: Sex Differences in Network Structure and Access in an Advertising Firm," Administrative Science Quarterly 37(3): 422-447.
Ibarra, H. 1993. "Personal Networks of Women and Minorities in Management: A Conceptual Framework," The Academy of Management Review 18(1): 56-87.
Ibarra, H. 1995. Race, opportunity, and diversity of social circles in managerial networks. Academy of Management Journal, 38: 673-703.
Kossinets, G. and D. Watts. 2006. "Empirical Analysis of an Evolving Social Network." Science (311:5757): 88-90.
Kossinets, G. and D. Watts. 2009. "Origins of Homophily in an Evolving Social Network." American Journal of Sociology, 115(2): 405-50.
Krackhardt, D. 1992. The strength of strong ties: The importance of philos in organizations. In N. Nohria & R. G. Eccles (Eds.), Networks and organizations: Structure, form and action. Cambridge, MA: Harvard Business School Press.
Krackhardt, D., 1995. "Entrepreneurial Opportunities in an Entrepreneurial Firm: A Structural Approach," in Entrepreneurship Theory and Practice, Spring, pp. 53-69.
Krackhardt, D. and Kilduff, M. 1999. "Whether close or far: Social distance effects on perceived balance in friendship networks." Journal of personality and social psychology, (76) 770-82.
Krackhardt, D. and Porter, L. 1985. "When Friends Leave: A Structural analysis of the Relationship between Turnover and Stayer's Attitudes." Administrative Science Quarterly. 30: 242-261.
Kumbasar, E., Romney, A.K., and Batchelder, W.H. 1994. "Systematic biases in social perception." American Journal of Sociology, (100): 477-505.
Lin, C.Y., Ehrlich, K., Griffiths-Fisher, V. and Desforges, C. "SmallBlue: People Mining for Expertise Search", IEEE MultiMedia at Work 15(1).
Lin, N. 2001. Social capital: A theory of social structure and action. Cambridge: Cambridge University Press.
Lincoln, J. R., Miller, J. 1979. Work and friendship ties in organizations: A comparative analysis of relational networks. Administrative Science Quarterly, 24: 181-199.
Marsden, P. 1990. "Network Data and Measurement." Annual Review of Sociology (16): 435-463.
McGinn, K.L. and Milkman, K.L. 2010, Shall I stay or shall I go? Cooperative and competitive effects of workgroup sex and race composition on turnover.
Munshi, K., 2003 "Networks in the Modern Economy: Mexican Migrants in the U. S. Labor Market," The Quarterly Journal of Economics 118, no. 2: 549-599.
Obstfeld, D. 2005. "Social networks, the tertius iungens orientation, and involvement in innovation." Administrative Science Quarterly, 50, 100-130.
Podolny, J. 2001. "Networks as the Pipes and Prisms of the Market." American Journal of Sociology, (107:1): 33-60.
Podolny, J. and Baron, J. 1997. Resources and relationships: Social networks and mobility in the workplace. American Sociological Review (62:5): 673-693.
Reagans, R. and McEvily, B., 2003. Network Structure & Knowledge Transfer: The Effects of Cohesion & Range. Administrative Science Quarterly, (48): 240-67.
Reagans, R. and Zuckerman, E., 2001. Networks, diversity, and productivity: The social capital of corporate R&D teams. Organization Science (12:4): 502-517.
Reagans, R. and Zuckerman, E., 2008, "Why knowledge does not equal power: the network redundancy trade-off," Industrial and Corporate Change 17, no. 5 (October 1, 2008): 903-944.
Rodan, S. and Galunic, D. 2004. "More Than Network Structure: How Knowledge Heterogeneity Influences Managerial Performance and Innovativeness." Strategic Management Journal (25): 541-562.
Scott, B.D. 1996. "Shattering the Instrumental-Expressive Myth: The Power of Women's Networks in Corporate-Government Affairs," Gender and Society 10(3): 232-247.
Shah, P.P. 2000.
"Network Destruction: The Structural Implications of Downsizing," The Academy of ManagementJournal(43:1): 101-112. Slater, P.E. 1965. "Role Differentiation in Small Groups." American Sociological Review 20:30010. Snijders, T. 1999. Prologue to the measurement of social capital. The Tocqueville Review, 10(1): 27-44. Sparrowe, R., Liden, R., Wayne, S., Kraimer, M. 2001. Social networks and the performance of individuals and groups. Academy ofManagementJournal,44(2): 316-325. Wen, Z. and Lin, C.Y.,2010. "Towards Finding Valuable Topics", SIAM International Conference on Data Mining 2010: 720-731 Wu, F., Huberman, B., Adamic, L., and J. Tyler. 2004. "Information Flow in Social Groups." PhysicaA, 337: 327-335. Wu, L., Lin, C., Aral, S., Brynjolfsson, E. 2009. "Network Structure and Information Worker Productivity: New Evidence from the Global Consulting Services Industry." Winter Conference on Business Intelligence, University of Utah, Salt Lake City, UT. Zaheer, A. and Bell, G.G. 2005. "Benefiting from network position: firm capabilities, structural holes, and performance," Strategic ManagementJournal26(9): 809-825. Identification of Influence: An ExperimentalPlatformfor Understandingthe Relationshipbetween Social Networks and Performance Lynn Wu Abstract This study creates an experimental platform for identifying the relationship between social networks and performance. While a large body of literature has examined the correlations between certain network topology and performance, little research has shown a definitive causal link between social network and productivity. I address this problem through conducting three sets of randomized field experiments using an on-line experimental platform at a large information technology firm. The platform enables randomly selected employees to achieve certain network characteristics. By examining work performance before and after the experiment, I hope to tease out the causal linkage between networks and productivity. 
Furthermore, I plan to distinguish the types of employees (e.g., peripheral actors) who could benefit the most from a change in network structure.

Introduction

This study focuses on the identification of influence in social networks. A large body of literature on social networks and organizations describes the benefits of social networks for work performance in various settings. However, little research leverages the ample data created by people's interactions, such as e-mail, call logs, text messaging, document repositories, wikis, and so on. This gap is problematic, because much of the literature on organizational networks suffers from the same deficits as the social network literature: both tend to focus on small, static networks. As a result, these studies generally show a correlation between work performance and a person's position in the network. While these correlational studies are important, they do not demonstrate a causal relationship. Without detailed, large-scale longitudinal data and the capability to introduce an exogenous source of variation, it is hard to answer important questions such as: "Does the influence of social networks persist over time?" "How do social networks help people achieve better work performance?" "Is there a causal relationship between network position and productivity?"

The direction of causality is one of the most important open questions hampering the progress of network research. While ample evidence has shown strong correlations between network characteristics and productivity, the causal direction is unclear. If, for example, we find that the most productive employees in the firm are those whose networks are full of structural holes, we might conclude that these workers' positions in the social network afford them timely access to information and consequently help them to be more productive or to outperform their peers.
An equally likely explanation is a reverse causality story, in which workers are more likely to seek advice from a few high-performing individuals and as a result, high performers tend to have networks with structural holes. Essentially, they are the stars who perform well because of their personal attributes, and find themselves in certain network positions as a result of their performance. I present two ways to address the direction of causality between network characteristics and work performance. First, I use the instrumental variable approach in Chapter 1 of this thesis, finding an exogenous source of variation in a person's social network. However, any inferences from using instrumental variables require an explicit assumption of exogeneity. Unless the instruments were truly manipulated using randomization, it can be hard, at times, to justify their validity. Thus, to truly introduce a source of exogenous variation, I create a platform to conduct randomized experiments that can answer questions such as whether network characteristics have a causal effect on performance. The platform, using an online social networking tool, allows manipulations to potentially alter a person's network position or their social behaviors. For example, if structural holes are theorized to improve work performance, an experiment can be designed to expose randomly selected individuals in the treatment group to people whose network is more central than those in the control group. Examining the work performance before and after the interventions, we can explore if a change of network position can in fact alter work performance. If an improvement in work productivity is detected, it is reasonable to infer a causal relationship, in which occupying a desirable position in a network causes individuals to be more productive above and beyond their inherent abilities. 
Understanding the direction of causality between network properties and productivity would be a breakthrough in the fields of social networks, information productivity, and organizations. If the direction of causality is established, it is important to explore the types of workers who benefit the most from occupying a central position in the network. It is possible that peripheral actors are more likely to benefit from connecting to a person at the center of the network than people who are already at the center. Similarly, it is possible that a junior employee may benefit more from becoming more central in the network than someone who is senior.

To precisely capture the social networks of individuals over time, I take advantage of recent advances in information technology to collect real-time email and instant messaging (IM) communication data in a large corporation. Since email and IM archives record detailed communication logs, such as who has talked to whom, the exact time of the interaction, and the content of the exchange, constructing social networks from these archives allows researchers to eliminate errors and bias that are often introduced in self-reports. With access to detailed records of electronic data for more than 10,000 people in a large organization over three years, I am able to map social network graphs for each person over time.

In advancing our understanding of how information workers generate value, I focus on a class of information workers who have rarely been examined in the past: consultants, who represent a large population of information workers and generate revenue by logging "billable hours." To explore the relationship between social network positions and the productivity of these consultants, I collect detailed and objective performance measures for more than 1,000 consultants, including the number of billable hours, the number of projects they participated in, and the revenue generated.
To understand how consultants generate economic value, I also conducted extensive interviews with 15 consultants at various stages of their careers. Through these interviews, I find that efficient access to useful information is crucial, as timely and valuable information can facilitate fast and high-quality decision-making and is thus critical to satisfying customers. This is particularly important for consulting services, as generating repeat business from existing clients is a key performance indicator and the cornerstone of a consulting business. If expeditious access to information improves productivity, it is important to understand the mechanisms by which information workers access information through both social and technological means. Continuing the micro-analysis research on worker productivity pioneered by Ichniowski, Shaw, and Prennushi (1997), I plan to study a single industry in depth and examine how information workers obtain information through various communication channels and social networks. With the cooperation of the company and employees, I have been monitoring email and instant messaging usage to analyze the flow of information and its relationship to network structure and work performance.

Data and Setting

To understand the micro mechanisms by which social networks can help work performance, I analyze an electronic communication social network of more than 9,000 employees over 3 years. The data contains email and instant messaging activities inside a global information technology firm with more than 30 product divisions. To my knowledge, this is the largest social network ever utilized to study the impact of social networks on information worker productivity.
The data is collected using a privacy-preserving social network analysis system (Lin et al., 2008) that uses social sensors to gather, crawl, and mine various types of data sources, including the content of individual email and instant message communications and calendars, and that takes into account the hierarchical structure of the organization as well as individual role assignments. The system is deployed in more than 70 countries and has collected detailed electronic communication records of 9035 volunteer employees. In this study, I constrain the analysis to the sub-network of the 9035 volunteers for whom we have complete electronic communication data. To check for potential self-selection bias from using volunteered data, I compare the network characteristics and job roles of the volunteers with the rest of the firm. I find minimal differences between the two populations, alleviating concerns about using only the sub-population of volunteers.

Table 1: Summary Statistics for Person-Level Networks

Variable                                Obs.   Mean    Std. Dev.   Min    Max
In-degree                               7043   13.35   7.81        0      125
Out-degree                              7043   13.41   8.72        0      132
Betweenness centrality                  7043   .001    .001        0      .018
Network constraint                      7043   .564    .284        .074   1.810
Direct project contacts with managers   7043   .120    .466        0      5
Gender: male = 1                        7043   4       0.372       0      1

To construct a view of the network that reflects the real communications between employees, I eliminated spam and mass email announcements. Using the timestamp associated with each electronic communication exchange, I mapped a dynamic panel of social networks from 2006 to 2009. Each network is built using only the communications occurring in a sliding window of 6 months. I calculated standard network measures such as centrality, network size, and network constraint. The summary statistics for the network are shown in Table 1. This set of data provides a rare opportunity to study how a person's social network evolves over time.
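The sliding-window construction described above can be sketched as follows. This is a minimal illustration, not the system used in the study: the message format `(sender, recipient, date)`, the one-month step between windows, and the function names are all assumptions made for the example, and only in-degree and out-degree (counted as distinct contacts) are computed.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Map a date to a running month count so windows are easy to compare."""
    return d.year * 12 + (d.month - 1)


def windowed_degrees(messages, window_months=6):
    """Return {window_start_month: {person: (in_degree, out_degree)}}.

    `messages` is an iterable of (sender, recipient, date) tuples
    (a hypothetical format). Degrees count distinct contacts in each window.
    """
    msgs = [(s, r, month_index(d)) for s, r, d in messages]
    if not msgs:
        return {}
    first = min(m for _, _, m in msgs)
    last = max(m for _, _, m in msgs)
    panel = {}
    # Slide the window forward one month at a time (assumed step size).
    for start in range(first, max(first, last - window_months + 1) + 1):
        senders_of = defaultdict(set)     # person -> people who wrote to them
        recipients_of = defaultdict(set)  # person -> people they wrote to
        for s, r, m in msgs:
            if start <= m < start + window_months:
                recipients_of[s].add(r)
                senders_of[r].add(s)
        people = set(senders_of) | set(recipients_of)
        panel[start] = {
            p: (len(senders_of[p]), len(recipients_of[p])) for p in people
        }
    return panel


# Illustrative input only; these names and dates are made up.
example = [("a", "b", date(2006, 1, 15)), ("a", "c", date(2006, 3, 1))]
example_panel = windowed_degrees(example)
```

Measures such as betweenness centrality or network constraint would be computed per window in the same way, using a graph library of the reader's choice.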
To explore how social networks relate to work performance, I obtained detailed financial performance records of more than 10,000 consultants. I focus on 2,500 consultants in this sample who have volunteered their electronic communication data, and collected detailed records of 2,592 projects these consultants participated in from June 2007 to July 2008. The sheer volume of the data permits a more precise estimation of how population-level topology in a network contributes to information worker productivity, after controlling for human capital, work characteristics, and demographics. To protect the privacy of the participants, their identities are replaced with hash identifiers. Table 2 in the result section shows the baseline correlation results between various network measures and performance. They largely conform to theoretical predictions and are consistent with previous correlation studies, lending confidence to our data collection.

To understand how information is transmitted between consultants, we record the actual content of all their electronic exchanges over a three-year period from 2006 to 2009. To protect their privacy, all context and grammatical structure has been stripped from the content and we retain only a list of word frequencies. Using this set of words alone, it is virtually impossible to reconstruct the original messages. From the aggregated words of information exchange, we classify the content into 100 topics of expertise. Monitoring how a person's topics change over time and tracking how these topics flow in and out of one's network can provide insights on how employees obtain information through their social contacts. Using this dataset, we can evaluate both the quality and quantity of information that each person acquires through the network.

Table 2: Person-level Email Networks
Dependent variable: Personal Monthly Revenue in columns (1), (2), and (3)
Columns: (1) Cross-sectional, (2) Random Effect, (3) Fixed Effect
Controls: Project Complexity, Line of Business, Months, Regions, job level month, job role

network hours 113.7*** (0.993) 513.35** $119.84*** (2.307) Betweenness 115.2*** (0.950) 321.7*** Centrality (199.36) (217.83) (126.9) Constraint -328.49** -276.64** (106.68) (W3.88) In-degree -13.51 (34.17) -82.24* (39.65) -313.5** (125.4) -30.32 (46.56) Out-degree 8.358 9.833 99.31** (21.85) (24.38) (47.29) 1.441 (48.01) -98.48* (53.49) 368.0 (855.3) Number of strong -52.14 -56.10 -2.427* links (43.61) (48.43) (1.406) Number of strong links to managers 808.2** (365.2) 588.2* (389.4) 1320** Number Managers of in -283.5** network 6 (615.8) Total -7.418 - .777 -10.19 communications (4.588) (4.880) (7.554) -0.00948** -0.0191*** 1.66e7*** (0.00433) (0.00489) (4.02e8) -283.2** (114.7) -196.6 -254.6*** network (128.7) (95.78) Is Manager 2685*** -- 566.6 (817.5) -- (1376) 84.96 (495.6) -- 236.9 (214.5) to managers Reach in 3 steps Divisions gender in code (1=male) --

Observations   5527   5527   5499
R-squared      0.81   0.81   0.78
*p<.1, **p<.05, ***p<.001. Huber-White robust standard errors are shown in parentheses.

Research Design

Below, I describe two sets of experiments that are used to tease out the causal relationships between network structure and performance. Both experiments manipulate the social structure of the treatment group and compare it with the change in the control group. The first set of experiments makes the treatment group more likely to be connected to central actors in the network than the control group. Both experiments leverage a social networking search tool, called Expertise-Find, that searches for experts in an organization based on keywords.
Expertise-Find, an Experimental Platform

Expertise-Find is similar to popular search engines on the Web, such as Google, with the only difference being that instead of URLs, it returns a list of people whose expertise is relevant to the search query. This tool aggregates as much information as it can about the employees inside the firm using the firm's intranet. For example, the tool can crawl and mine information about employees using their online profiles, resumes, and online forums as well as communication data (if they decide to volunteer their electronic communication data). These data serve as the basis to infer individual expertise at the firm. For example, when searching for the phrase "Social Networks," Expertise-Find would return a list of people ranked by how relevant their expertise is to social networks (Figure 3). Each search result lists the name of the expert, a picture (if available in the public HR directory), the job role, and the division the expert belongs to. If one clicks on the person, the system shows more details, such as the physical work location and contact information. By contacting experts suggested by the tool, users may be able to find the information they need either directly or through further recommendations from the expert.

Evidence also suggests that the relationship formed between the expert and the searcher can become more permanent with repeated interactions. One person interviewed commented that a friendship was formed after contacting an expert through the tool. An expert who had helped her earlier was transferred to the same work location as she was; she offered to help the expert through the transition, and they have been friends since. Some experts also mentioned that they received thank-you gifts from the searchers they helped, further enhancing their relationship.
The firm has a program that sponsors this type of gift so that individuals can use them to thank people in the organization who are helpful to their work.

Figure 3: Current search result using the people search tool

The experimental platform primarily relies on changing the search algorithms of Expertise-Find. Currently, the search algorithm aggregates all the information it can find about a person in the intranet and creates an expertise index based on the content. It does not yet incorporate any social network parameters into the search. Thus, the current search algorithm ranks individuals highly when their expertise best matches the searched keywords.

The main manipulation for the randomized experiments is to change the search algorithms for a group of randomly selected individuals. These algorithms are designed such that people with certain types of network characteristics are more likely to show up at the top of the search results. For example, a manipulation can be designed to help a person become more structurally diverse. By listing in the top search results individuals who are more likely to increase the searcher's network diversity, the manipulation makes searchers more likely to connect to these individuals, and thus to increase their own structural diversity, than those who use the default algorithm, which does not rank search results by structural parameters. To verify that people are on average more likely to view the top search results and actually contact the experts they find, a user survey on search behaviors was conducted in January 2010.
Although the assumption that top search results are more likely to be viewed is common and has been verified in Human Computer Interaction research, it is nonetheless important to confirm that it holds in this setting, as it is a crucial assumption underlying the experiments.

Table 3: Surveys on User Behaviors for Expertise-Find

1. How often do you use Expertise-Find?
   o Daily   o Weekly   o Monthly   o Bi-monthly   o Rarely
2. How often do you look at the first 5 search results?
3. How often do you contact people in the first 5 search results?
4. How often do you look at the first 10 search results?
5. How often do you contact people in the first 10 search results?
6. How often do you look at results beyond the first page?
   Response options for questions 2 through 6:
   o 90% of the time or more   o 60% of the time or more   o 30% of the time or more   o 10% of the time or more   o less than 5% of the time
8. On average, how many pages of search results do you go through for a typical search?

Using this platform for conducting randomized experiments, we first explore whether changing the search algorithms of Expertise-Find and manipulating its interface could alter people's network positions. Second, if we find that the experiment could alter a person's network position, we examine the likelihood of a changed network position affecting productivity.
Lastly, we explore the conditions under which a person's network position changes the most, and how strongly the change is associated with productivity.

Surveys on Expertise-Find Usage Patterns

For the manipulation to have an effect on a person's social network structure, it is important to verify that 1) users are more likely to view the top search results, and 2) users are more likely to contact the people in the top search results from Expertise-Find. We verify this by sending a survey to existing Expertise-Find users asking questions about their search behaviors. The detailed survey questions are found in Table 3. Specifically, the questions explore: 1) How often do people use Expertise-Find? 2) When they use Expertise-Find, do they focus primarily on the top search results? 3) How often do they actually contact the people suggested by Expertise-Find?

Table 4: Summary Statistics for Survey Results

Survey item                  Obs.   Mean   Std. Dev.   Min   Max
Usage                        23     3.71   .88         1     4
Looking at Top 5 search      23     4.93   .26         3     5
Contacting Top 5 search      23     4.14   .75         3     5
Looking at Top 10 search     23     4.78   .41         3     5
Contacting Top 10 search     23     4.14   .74         3     5
Looking beyond first page    18     4.75   .43         3     5
Avg pages viewed             18     3.82   2.45        2     10

The survey was sent out to a randomly selected group of 80 users of Expertise-Find; about 25 people filled out the survey, which is a 31% response rate. The summary statistics are in Table 4. We find that people use Expertise-Find between once a month and once every two months, suggesting that the experimental period should be at least 6 months, or possibly longer, for the manipulation to take effect. Table 4 also shows that most individuals tend to view the top 5-10 results. More than half the surveyed individuals browse only the first 4 pages. These results largely conform to existing Human Computer Interaction research. Overall, results at the top of the page tend to be viewed more often, and searchers are also more likely to contact these individuals.
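The sample sizes reported in Table 5 appear consistent with a common rule-of-thumb power calculation, n = k * variance / (sensitivity * mean)^2, where k is roughly 16 for 80% power and 21 for 90% power at a two-sided 5% significance level. The sketch below is an illustration under that assumption and is not taken from the thesis: the function name is hypothetical, treating sensitivity as a fraction of the outcome mean is an inference, and the mean of 0.43 used in the example is a value backed out from the table rather than one stated in the text.

```python
# Rule-of-thumb multipliers, approximately 2 * (z_{alpha/2} + z_beta)^2
# for a two-sided 5% test: about 15.7 at 80% power and 21.0 at 90% power.
POWER_FACTOR = {80: 16, 90: 21}


def required_sample_size(variance, mean, sensitivity, power_pct=80):
    """Approximate sample size needed to detect a relative change.

    variance    -- variance of the outcome variable
    mean        -- mean of the outcome (assumed; not reported in Table 5)
    sensitivity -- smallest relative change to detect, e.g. 0.05 for 5%
    power_pct   -- desired power in percent (80 or 90)
    """
    detectable_difference = sensitivity * mean
    return POWER_FACTOR[power_pct] * variance / detectable_difference ** 2


# Variance 0.1089 and 5% sensitivity, with the assumed mean of 0.43.
n_80 = required_sample_size(0.1089, 0.43, 0.05, power_pct=80)
n_90 = required_sample_size(0.1089, 0.43, 0.05, power_pct=90)
```

Under these assumptions, halving the sensitivity quadruples the required sample, which is why the smallest detectable effects in Table 5 call for much larger samples.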
Table 5: Minimum Detectable Size Calculations

Variable     Power   tpower   Variance      Sensitivity   Sample size
constraint   80%     16       0.1089        0.05          3769.388859
constraint   90%     21       0.1089        0.05          4947.322877
constraint   80%     16       0.1089        0.04          5889.670092
constraint   90%     21       0.1089        0.04          7730.191996
constraint   80%     16       0.1089        0.03          10470.52461
constraint   90%     21       0.1089        0.03          13742.56355
revenue      80%     16       93545185.36   0.1           866.8436592
revenue      80%     16       93545185.36   0.09          1070.177357
revenue      80%     16       93545185.36   0.08          1354.443217
revenue      80%     16       93545185.36   0.07          1769.068692
revenue      80%     16       93545185.36   0.06          2407.899053
revenue      80%     16       93545185.36   0.05          3467.374637
revenue      80%     16       93545185.36   0.04          5417.77287
revenue      80%     16       93545185.36   0.03          9631.596213
revenue      90%     21       93545185.36   0.1           1137.732303
revenue      90%     21       93545185.36   0.09          1404.607781
revenue      90%     21       93545185.36   0.08          1777.706723
revenue      90%     21       93545185.36   0.07          2321.902659
revenue      90%     21       93545185.36   0.06          3160.367507
revenue      90%     21       93545185.36   0.05          4550.929211
revenue      90%     21       93545185.36   0.04          7110.826892
revenue      90%     21       93545185.36   0.03          12641.47003
managers     80%     16       10.1124       0.08          7229.546169
managers     90%     21       10.1124       0.05          24291.27513
managers     80%     16       10.1124       0.04          28918.18468
managers     90%     21       10.1124       0.04          37955.11739
managers     80%     16       10.1124       0.03          51410.10609
managers     90%     21       10.1124       0.03          67475.76425
divisions    80%     16       1.247         0.05          2287.139654
divisions    90%     21       1.247         0.05          3001.870796
divisions    80%     16       1.247         0.04          3573.655709
divisions    90%     21       1.247         0.04          4690.423119
divisions    80%     16       1.247         0.03          6353.165706
divisions    90%     21       1.247         0.03          8338.529989

We also followed up by telephoning ten individuals to investigate how they use the tool and under what conditions they find the tool useful. From this anecdotal evidence, we find that people are more likely to use the tool when they are starting a new project. They often like to search for others who are working in similar areas because they are likely to be future resources or potential competitors. Many times, they would seek these individuals when they encounter difficult problems. We find that people resort to Expertise-Find to look for potential resources when they face a new problem in a project and no one in their immediate network can help.
Typically, it is not unusual for searchers to contact multiple individuals before finding a satisfactory answer. However, this does not mean the earlier contacts were unsuccessful searches. In fact, they are often valuable for referring the questioner to other individuals who are more likely to know the answer, especially once they have taken the time to understand in detail what the questioner is asking for.

Network Structure and Performance

Three sets of experiments are implemented on the Expertise-Find experimental platform. They are used to infer the following: 1) the direction of causality between network positions and performance; 2) the optimal ways to change a person's network over time; 3) the long-term implications of a temporary shock to a person's network position.

Experiment #1: Passive Network Change through Expertise-Find

This experiment leverages an existing intra-organizational search technology, Expertise-Find, that is designed to find people in the firm whose expertise is related to the searched keywords. Currently, each search result shows a photo of the expert with a caption describing the job role and the division the person belongs to, as well as a link to contact information such as email address and work phone number. In order to generate an exogenous change to a person's social network, we create a manipulation that changes the algorithm that ranks the search results. Specifically, the new algorithm combines an expertise relevance score and a network-based score to create a new score for each search result, and the results are reordered by the new score. Currently, the weight on the search relevance score and the weight on the network-based score are the same. The network-based score can also incorporate various other network parameters, such as network size, length of longest ties, and structural equivalence, depending on the desired treatment on the network.
For this experiment, the network-based score uses the structural holes calculated from a person's local network. However, it is not the structural holes of the person in each search result, but the potential structural holes that the searcher would gain if a link were created between the searcher and the person in the search result. Thus, the network-based rank is ordered by the size of the change in structural holes when a candidate is added to the searcher's immediate network. The final score is the sum of the rank on the expertise-relevance score and the rank on the network-based score.

NewScore = Rank(search relevance score) + Rank(network-based score)

For example, if a person is ranked first in the search relevance score but ranked fifth in the potential change in the searcher's structural holes, the score for the person is one plus five, for a total score of six. The scores are then sorted in increasing order, so individuals with the lowest scores are displayed as the top search results. If we assume that people are more likely to click on the top search results, they are also more likely to be exposed to individuals whose connection to the searcher can generate the largest improvement in the searcher's network-based score (structural holes, in this case). Research in Human Computer Interaction has in general validated this assumption in various settings (Borgatti & Cross 2003). We also verified this assumption through a user survey designed to provide understanding of people's search behaviors. Figure 4 shows the search results that the treatment group would see, and Figure 3 shows the default search order that the control group would see when the same keywords are used in Expertise-Find.
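The rank-sum scoring described above can be sketched as follows. This is an illustrative example, not the deployed algorithm: the candidate tuple format, the function name, and the tie-breaking rule (ties on the combined score are broken by relevance rank) are assumptions, and the structural-hole gain for each candidate is taken as a precomputed input.

```python
def rerank(candidates):
    """Reorder search results by NewScore = Rank(relevance) + Rank(network gain).

    `candidates` is a list of (name, relevance_score, structural_hole_gain)
    tuples (a hypothetical format). Higher raw values are better, so rank 1
    goes to the largest value on each dimension.
    """
    def ranks(values):
        # Position 1 for the largest value; equal values keep input order.
        order = sorted(range(len(values)), key=lambda i: -values[i])
        result = [0] * len(values)
        for position, i in enumerate(order, start=1):
            result[i] = position
        return result

    relevance_rank = ranks([c[1] for c in candidates])
    network_rank = ranks([c[2] for c in candidates])
    scored = [
        (relevance_rank[i] + network_rank[i], relevance_rank[i], c[0])
        for i, c in enumerate(candidates)
    ]
    # Lowest combined score first; ties fall back to relevance rank (assumed).
    scored.sort()
    return [(name, score) for score, _, name in scored]


# Made-up candidates: "A" is most relevant but adds the least structural diversity.
results = rerank([
    ("A", 0.9, 0.1),
    ("B", 0.8, 0.5),
    ("C", 0.7, 0.3),
    ("D", 0.6, 0.4),
    ("E", 0.5, 0.2),
])
```

In this made-up example, candidate "A" is first on relevance and fifth on network gain, so its combined score is six, mirroring the worked example in the text, while "B" moves to the top of the list.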
Figure 4: Search results reordered by social connectedness

One potential concern is that by altering the ranking algorithm to include a network-based score, people in the treatment group would not see the results that best match the expertise they sought, which could negatively affect their search experience. However, the list of search results is the same for both the control and the treatment group; only the order shown on the screen differs. Therefore, people in the control and treatment groups see the same list of potential candidates. This may still be a problem if the list of search results is long and users rarely go through more than a few pages, which could deprive the treatment group of the best-matched results. In this case, any performance gain derived from the manipulation would actually be understated, because the gains were made despite not being able to see the search results best matched to the keywords. Another key for the manipulation to work is to ensure that the search order will actually change for a significant proportion of queries under the new ranking scheme. Ideally, after the re-ranking, it is unlikely that the top result from the original search algorithm is also the person who has the highest network-based score. To ensure that the network-based ranks are not collinear with the expertise-relevance ranks, we randomly sampled 100 queries and calculated the ranks using the modified and the default algorithm. The correlation between the two ranking schemes is 0.36, suggesting that the modified ranking order deviates significantly from the expertise-based ranking.

Control Group

The control group continues to see results from the default search algorithm, which does not incorporate any network-based score to order the search results.
However, we realize that the treatment group could potentially contaminate the results from the control group. Because both the control and the treatment group belong to the same firm, it is impossible to create two independent worlds that prevent one group from communicating with the other. As a consequence, any change in social structures in the treatment group can affect the control group, but the effect is second-order. The detailed treatment of network contamination is explained in a later section. To mitigate some of the contamination effect, we have captured the electronic communication exchanges of both the treatment and control groups for two years before the experiment starts. This will allow us to estimate the change in network structures for the control group before and after the experiment. If the change is minimal, then we can safely use the control group data during the experiment. Otherwise, we use a difference-in-differences specification in which we calculate the change in network structure for both the control and treatment groups before and after the experiment. We expect the changes in network structure to be greater in the treatment group, since connecting to the top search results should give them a more structurally diverse network.

Related Experiments

While the first experiment uses structural holes as a proxy for social connectedness to infer their effect on performance, other network measures could also be used. Using different measurements of social network positions is useful because comparing their productivity impacts allows us to formally evaluate their effects on work performance. The social network literature has long extolled the value of certain network positions. However, there is no systematic way of effectively evaluating their causal impacts.
For example, while structural holes could be beneficial for work performance, they might not be, and even if they are, cross-sectional studies do not permit evaluation of the marginal cost of attaining more structural holes. Designing a platform that can test the performance implications of each network parameter would be extremely valuable for understanding why and how networks matter for performance, as well as the costs and benefits of attaining certain network positions. Below is a list of network measures that can be tested using the Expertise-Find experimental platform.

1. Recommending high-status individuals such as project managers, directors, and partners. The ranking system places high-status individuals at the top of the search results.
2. Recommending long-range ties: high path length. The ranking system places individuals at the top of the search results when they have a longer path length to the searcher.
3. Homophilic recommendations. The ranking system places individuals at the top of the search results when they have similar traits to the searcher, such as the same gender, demographics, and job roles.
4. Heterophilic recommendations. The ranking system places individuals at the top of the search results when they are different from the searcher, such as different genders, demographics, and job roles.

Implications

This research design explores whether there is a causal relationship between social network positions and work performance. Through modifying the search algorithm of Expertise-Find, we create a mechanism to induce a change in a person's social network position in an organization over time. When a change in network positions is detected for people in the treatment group, it is possible that there is also an associated change in their performance. If the associated changes are both positive, it would indicate a causal relationship between social network positions and performance.
However, it is likely that performance returns from a change in networks, if there are any at all, will not be immediately obvious, because it may take a significant amount of time for an investment in social networks to generate a return in performance. Therefore, I expect a lag between a change in network positions and any associated performance change. How long it takes for an investment in social networks to translate into higher productivity is itself an interesting empirical question. While detecting a change in billable revenue following a change in network positions can help establish a causal relationship between social networks and performance, it is equally important to explore the mechanism behind the relationship. In Chapter 1 of this thesis, I discussed the informational advantage provided by structural holes. To detect whether more structural holes actually generate more information, we can measure the diversity of information before and after the experiment to see if a change in structural holes also generates more information, and if so, what type of information. One advantage of measuring information is that while performance improvement, as measured by billable revenue, may be hard to detect in the short run, an information advantage as measured using email communications may be more easily detected. Even if we detect a change in productivity from a change in network positions, it is important to explore whether such a change is only temporary. By monitoring people's network positions and performance over time, it is possible to trace the trajectory of how a temporary shock to network positions can affect long-term work performance. It is plausible that modifying the search algorithm can boost a person's work performance in the short run, but it is uncertain whether this type of manipulation can have a detectable longer-term effect on productivity.
It would be interesting to observe the performance differences between the treatment and the control group after the ranking algorithm is restored to the original algorithm, in which only expertise relevance is used to rank results. It is possible that performance would return to its pre-experimental values, but it is also plausible that individuals in the treatment group would experience a long-term advantage over those in the control group if the performance differences persist. This could have tremendous value for the firm, especially if a simple, short-term shock to the network position can induce a long-term improvement in performance. A similar question can also be asked about the network structure. After we return to the original ranking algorithm, it would be interesting to examine whether social network structures in the treatment group also return to their pre-experimental structure. Without constant reinforcement through recommendations of individuals who can generate a significant change to a person's network position, I suspect the network position of the person may return to its original state. This is because a contact made through Expertise-Find may often involve an arm's-length transaction. Typically, the contact is made when the searcher needs help with a difficult problem. Once the problem is resolved, the two people may cease to have future interactions. However, during interviews, we have heard anecdotal evidence that contacts made through Expertise-Find can also develop deeper working relationships with the searcher. Thus, it would be interesting to observe if there is network decay after the experiment and, if so, the rate of the decay. Lastly, measuring how fast an intervention alters network positions can be used to gauge the cost of attaining certain network positions, such as having more structural holes or having more high-status network contacts.
Combining the potential gains in performance with the costs of generating a change in network positions allows us to evaluate the net benefits of attaining certain network positions. Most past research only gauged the positive correlations between certain network positions and performance without explicitly addressing the cost associated with arriving at the network position. While the literature has shown a positive correlation between structural holes and work performance, it may still be unfavorable for a person to strive for structural holes when the costs are high. For example, consider the following hypothetical scenarios. In the first one, a 1% increase in structural holes takes 3 months to attain and is associated with a $328 increase in monthly revenue. Compare this with a second hypothetical scenario where a 1% increase in the number of strong ties to managers would take at least 12 months to attain but is associated with an $800 increase in monthly revenue. Without considering the costs, we may conclude that having strong ties to managers is more beneficial, but after taking costs into account, it may be worthwhile for an employee to pursue structural holes, especially when the employee is seeking a faster return on performance. Gauging the costs of generating a certain type of network is important because it is an underexplored issue in network research. Without a cost analysis, the performance improvement from a network change is by itself insufficient for deriving the net benefits generated from social network positions. The questions addressed by Experiment #1 are summarized in Table 7.

Table 7: Research questions addressed by Experiment #1

1. Can a simple manipulation such as an online experiment change a person's network position? If so, how long does it take to change a person's network position?
2. If a network change is generated, how long does it last, and what is the rate of decay, if any?
3. Can a network change generate access to more information?
4. Can a network change improve work performance as measured by billable revenue?
5. What is the cost-benefit analysis for changing network positions? Which network position generates the largest reward?

Network Contamination

Ideally, the treatment and the control group are two independent worlds with no interactions between them, ensuring a precise estimate of the performance effect from a network change. While separating individuals into independent worlds is easily achievable in a laboratory setting, it is nearly impossible in the field, especially for an experiment with a relatively long span of time. The contamination is further exacerbated in a single organizational setting, because a large majority of individuals are often connected through a single cluster, creating a challenge for dividing them into relatively independent groups. However, we still believe in the benefits of running randomized experiments in the field to gauge the causal effect of networks on performance in a real business setting. The results may have immediate managerial implications that might not have been evident in a lab experiment. While it is difficult to eliminate contamination entirely, we try to minimize it in two ways. First, we avoid using global network measures, for which an addition or a deletion of an edge in a person's ego network could generate a ripple effect through the rest of the network. For example, increasing one person's betweenness centrality can indirectly change the betweenness centrality of others in the network, because betweenness centrality is a global network measure that requires calculating all pathways between any two nodes in the network; any perturbation in the network can change the betweenness centrality measures for the rest.
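The ripple effect of a global measure can be made concrete with a small sketch. It computes unnormalised betweenness centrality with Brandes' algorithm on a toy undirected network (a hypothetical five-person chain, not data from the study) and shows that adding a single tie between two people changes the betweenness of a third person whose own ties are untouched. The function and variable names are illustrative only.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def betweenness(graph: Graph) -> Dict[str, float]:
    """Unnormalised betweenness (Brandes' algorithm; unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that also counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


def add_edge(graph: Graph, a: str, b: str) -> Graph:
    """Return a copy of the graph with an undirected tie a-b added."""
    new = {v: set(nbrs) for v, nbrs in graph.items()}
    new[a].add(b)
    new[b].add(a)
    return new


# Hypothetical chain A-B-C-D-E; C sits in the middle.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
closed = add_edge(chain, "A", "E")  # a new tie that does not involve C

before, after = betweenness(chain), betweenness(closed)
print(before["C"], after["C"])       # C's betweenness changes
print(chain["C"] == closed["C"])     # although C's own ties are identical
```

A measure computed only from C's own contacts, such as C's number of direct ties, is identical before and after the new tie, which is the property the text relies on when it restricts attention to local measures.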
Thus, using betweenness centrality can be problematic, since a network change for individuals in the treatment group could change the network positions of individuals in the control group, confounding the estimation of the treatment effect. To minimize this type of contamination, only local network measures should be used. For example, structural holes are calculated based on ties that are one degree and two degrees apart from the ego. Similarly, adding connections such as long-range, homophilic, and heterophilic ties should not affect the rest of the nodes in the network, and thus these measures do not suffer from network contamination. Even when using local measures that are not prone to contamination across groups, it is still important to mimic the two independent worlds as much as we can by finding the largest number of participants (N) for which pairs of participants remain, as far as possible, at least 2 steps away from each other. Thus, on average, adding an edge in the local network of a person in the treatment group would not affect the local network of individuals in the control group. Through simulations, we pick N to be 4,470 individuals, which represents a good trade-off between obtaining a large enough sample and keeping the sampled nodes far apart. The simulation algorithm is in Table 6. The intuition behind the algorithm is to count, in a set of nodes, the number of pairs that are within a distance of 2 or less from each other. The nodes are randomly selected, so that each pair has the same probability of being a given distance apart. When the set is small, the chance that two nodes are close together is relatively low, but as the set gets larger, the chance of having multiple pairs of nodes that are close together increases. For each size of the set, i, ranging from 1 to 9,035 (the total number of nodes), we randomly choose i nodes and calculate the number of pairs whose distance is less than 2 steps.
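The sampling procedure just described, with each draw repeated and averaged, might be sketched as follows. This is an illustrative reading rather than the code used in the study: it treats a pair as "close" when the two people are directly tied or share at least one contact (distance of 2 or less), and the function names, the `repeats` default, and the seed are assumptions.

```python
import random
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[int, Set[int]]


def within_two_steps(graph: Graph, a: int, b: int) -> bool:
    """True if a and b are directly tied or share at least one contact."""
    return b in graph[a] or bool(graph[a] & graph[b])


def close_pairs(graph: Graph, sample: List[int]) -> int:
    """Number of sampled pairs that are at most 2 steps apart."""
    return sum(within_two_steps(graph, a, b) for a, b in combinations(sample, 2))


def average_close_pairs(graph: Graph, size: int, repeats: int,
                        rng: random.Random) -> float:
    """Average close-pair count over `repeats` random samples of `size` nodes."""
    nodes = sorted(graph)
    total = 0
    for _ in range(repeats):
        total += close_pairs(graph, rng.sample(nodes, size))
    return total / repeats


def contamination_curve(graph: Graph, sizes: Iterable[int],
                        repeats: int = 100, seed: int = 0) -> Dict[int, float]:
    """Map each sample size to its average number of close pairs."""
    rng = random.Random(seed)
    return {size: average_close_pairs(graph, size, repeats, rng) for size in sizes}
```

Plotting the returned values against sample size gives the kind of curve from which a cutoff can be read off. Because every pair in a sample is checked, the cost grows quadratically with the sample size, so a network of several thousand nodes would likely need a coarser grid of sizes or a faster pair count than this sketch provides.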
We repeat this process 100 times for each i, so that we can find the average number of pairs that have a distance of less than 2 steps. We then plot this number as a function of i. As shown in Figure 3, the graph is nonlinear, and we pick the size of the set to be 4,470, the point at which the inflection becomes relatively large. When the network is dense, i tends to be much smaller, but because the network generated from the electronic communications of 9,035 employees is relatively sparse, 4,470 nodes is still feasible for minimizing contamination while preserving a large enough sample size.

Table 6: Algorithm to determine the basic sample size

for (n = 1 to N-1)
    for (m = 1 to M)
        for all pairs <n_i, n_j> in n
            e1 = find direct edges (or edge) between n_i and n_j
            e2 = find path less than 2 between n_i and n_j
            <E>.add(e1)
            <E>.add(e2)
        end
    end
    store <n, <E>/M>
end

To minimize future errors, we choose to gradually increase the number of subjects in the treatment group, initially starting with a relatively small number of people and then increasing gradually to the full size. Starting the experiment with a smaller group can help by detecting errors earlier in the process, thus making it possible to eliminate these errors before the full experiments are deployed (Kohavi et al. 2009). Slowly ramping up the experiments also allows us to monitor the level of network contamination and adjust the eventual sample size if necessary.

[Plot: "Edge Node Simulation"; horizontal axis: Number of Randomly Selected Nodes; vertical axis from 0 to 25,000]

Figure 3: Simulation for choosing sample size to minimize contaminations

Sample Size

The minimum requirement for any experiment to succeed is that the manipulation must alter subjects' behaviors as intended.
To assess whether network positions, such as structural holes, can causally affect work performance, it is important to ensure that the manipulation can actually alter the network positions of individuals in the treatment group. The sample size must be large enough to detect any change generated in a statistically significant way. Using a power of 80%, the minimum sample size needed is 3,770 people in order to ensure that a 5% change in structural holes can be detected. Since we have more than 9,000 individuals in the sample, this requirement is easily satisfied. After taking network contamination into account, 3,770 people is still well below 4,470, the maximum sample size that keeps network contamination low. Overall, these results give confidence that it is possible to detect a network change from the manipulation. We intend to monitor the changes throughout the experiment to trace how long it takes to reach certain measurements of structural holes. To link a network change to an increase in performance, we also need to detect a significant change in billable revenue, the primary performance measure. Using a power of 80%, we need a sample of at least 3,467 consultants in order to detect a 5% change in billable revenue. However, we only have 2,881 consultants in the sample, so it would be difficult to detect the same level of change as with structural holes. Table 5 provides the sample size needed under various power and sensitivity values. Using all 2,881 consultants allows us to detect a 6.5% change in monthly billable revenue. Using half of them allows us to detect a 7.5% change in billable revenue, which is a substantial change in a person's performance. Thus, we expect that detecting a statistically significant change in billable revenue will take much longer than detecting a corresponding change in network positions.
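For readers who want to see how sample size, power, and detectable effect trade off, the sketch below uses the textbook normal-approximation formula for a two-sided comparison of two group means. It is not the calculation behind Table 5: the text does not report the test, the significance level, or the variance of structural holes or billable revenue, so the 5% significance level and the standard deviations passed in are assumptions, and the sketch is not expected to reproduce the 3,770 or 3,467 figures.

```python
import math
from statistics import NormalDist


def n_per_group(effect: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed per group to detect a difference in means of `effect`.

    Normal approximation, two-sided test, equal group sizes:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / effect) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


def detectable_effect(n: int, sd: float,
                      alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest difference in means detectable with n subjects per group."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sd * math.sqrt(2 / n)


# Illustrative values only: an effect of half a standard deviation.
print(n_per_group(effect=0.5, sd=1.0))
# Quadrupling the group size halves the detectable effect.
print(detectable_effect(100, sd=1.0), detectable_effect(400, sd=1.0))
```

Under this approximation the detectable effect shrinks with the square root of the sample size, which is why a fixed pool of consultants puts a floor on how small a revenue change can be detected.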
Monitoring the change in billable revenue over time is important, since it may take multiple months to detect a statistically significant change in billable revenue. The time it takes to detect the change is an interesting question in itself. To ensure we capture some intermediate outcomes, we also plan to generate alternative outcome measures, such as the number of new projects a person started or the instances of repeat business. Lastly, a survey is planned to gauge the level of job satisfaction, the likelihood of finding new project opportunities, and the speed with which answers to difficult problems can be found. These intermediate outcomes are pertinent, as they may eventually lead to higher billable revenue.

Experiment #2: Active Network Change through Feedback

The second set of experiments is designed to actively alter people's network characteristics by providing them with information about their social networks. Instead of attempting to maximize a person's network position by manipulating the search algorithm that recommends potential connections, this intervention provides feedback about the individual that may influence the person to change their social networks themselves. Figure 3 provides an example of how the experimental platform is used to provide people with information about their networks. When users in the treatment group log onto Expertise-Find, a message about a certain property of their networks shows up on the page. In Figure 3, we show a message informing the person how many people are directly connected to her. We then provide the summary statistics of employees at the firm who are similar to the user in rank, job role, and demographics. For example, the message in Figure 3 has summary statistics for the average of all Level 7 employees in North America who are in management consulting.
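A minimal sketch of how such a feedback message could be assembled from the communication network is given below. It assumes an undirected network stored as adjacency sets, defines network size as the number of direct contacts, and takes the peer group (for example, employees of the same level, region, and practice) as an input chosen by the experimenter. The function names, the peer label, and the rounding of the peer average are all hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def network_size(graph: Graph, person: str) -> int:
    """Number of people directly connected to the person."""
    return len(graph[person])


def two_step_reach(graph: Graph, person: str) -> int:
    """Number of distinct people reachable within two steps."""
    reached = set(graph[person])
    for contact in graph[person]:
        reached |= graph[contact]
    reached.discard(person)  # the person does not count toward their own reach
    return len(reached)


def feedback_message(graph: Graph, person: str,
                     peers: Iterable[str], peer_label: str) -> str:
    """Compare the person's network size with the average of a peer group."""
    own = network_size(graph, person)
    peer_avg = round(mean(network_size(graph, p) for p in peers))
    return (f"You have {own} people in your network. "
            f"{peer_label} have on average {peer_avg} people in their networks.")


# Toy network with hypothetical employees.
toy = {"ann": {"bob", "cy"}, "bob": {"ann", "dee"}, "cy": {"ann"}, "dee": {"bob"}}
print(feedback_message(toy, "ann", ["bob", "cy", "dee"],
                       "Band 7 consultants in North America"))
```

Swapping `network_size` for `two_step_reach` in the message would give the reach-based variant of the feedback, and replacing the mean with the maximum would give the more competitive framing.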
In addition to network size, we can also provide other types of network information, such as 2-step network reach (Figure 5), centralities, or structural holes. However, it may be difficult for average users to understand the concepts of betweenness centrality and structural holes. Network size and range, on the other hand, are easier concepts for an average user to grasp. The goal of the experiment is only to prompt the users to think about their network when compared with the networks of others, not necessarily to provide any specific strategy or recommendation (as in Experiment #1). Observing how different types of individuals respond to the message can have important implications for feedback policies. For example, we may see a pattern of reversion to the mean, in which individuals whose network size is above the average may choose to do nothing, while individuals whose network size is below the mean may choose to actively pursue new network connections. Similarly, if we provide information about the maximum number of connections to individuals, it may prompt people, especially the most competitive individuals, to strive for more connections.

[Screenshot text: "You have 21 people in your network. Band 7 consultants in North America have on average 46 people in their networks. Please invite your colleagues to join SmallBlue. The more people who join, the better SmallBlue will be."]

Figure 5: Homepage when logged in for the treatment group. A message displays to inform about a person's network range.

Control Group

The control group uses the original Expertise-Find interface with no information about the user's network statistics.
However, in order to avoid having a significantly different homepage between the treatment and the comparison groups, we also provide a message for the control group, but it is neutral and does not contain any network information. For example, in Figure 4, two neutral messages are shown to the control group. The first simply shows the number of individuals who are using the social networking tool, and the second is an advertising message that encourages people to participate in using Expertise-Find. These informational messages are not designed to influence an individual to change their networks.

[Screenshot text: "Did you know 10,232 people have joined the SmallBlue community? Please invite your colleagues to join SmallBlue. The more people who join, the better SmallBlue will be."]

Figure 6: The default home page for the control group. The message does not include any network parameters.

[Screenshot text: "You can reach 126 individuals from your network connections. Band 7 consultants in North America can reach 300 individuals from their network connections. Please invite your colleagues to join SmallBlue. The more people who join, the better SmallBlue will be."]

Figure 7: The homepage for the treatment group, displaying the person's network reach.

Implications

Experiment #2 prompts the users to actively change their networks by providing information about their networks and how they compare to people similar to themselves.
This contrasts with the strategy in Experiment #1, which is designed to passively change a person's network through system recommendations. In the passive experiment, users are not aware that their search algorithm is modified to simultaneously maximize a network parameter such as structural holes. Thus, when searchers in the treatment group make these connections, they are not necessarily aware that these connections are enhancing their network positions. Because their intention in making these connections is no different from that of the control group, any change in the network position can be attributed to the system's manipulation, not users' intent. Experiment #2, on the other hand, tries to induce users to change their social networks of their own volition. The goal is to make the users more aware of their own social networks. Providing feedback on their network positions and how they compare to people who are in similar job roles, ranks, and demographics could potentially induce users to change their networking behaviors. The literature has shown that providing feedback can be effective for changing behaviors (Becherer, Morgan & Richard 1982). For example, providing employees with timely feedback about their work performance can help increase their organizational commitment and job satisfaction (Becherer, Morgan & Richard 1982). Similarly, by providing feedback about their social network statistics, we hope to influence individuals to alter their behaviors and change their network positions. Both the feedback and how it is presented can have an impact on how people change their behaviors. For example, providing only the mean may prompt individuals who are below average to actively pursue network connections. Providing the maximum, on the other hand, may prompt more competitive individuals to change.
By measuring network characteristics before and after the intervention, we can gauge whether there is a network change, and if so, how large the network change is when individuals are provided with information about their social networks. We can compare the results with those of Experiment #1 to evaluate their relative effectiveness in generating a network change. However, we should also take the costs of the change into account. It is possible that the costs of actively seeking beneficial connections are higher than the costs of making connections that are passively suggested by an online social network system. In Experiment #1, specific recommendations are made after each search, and individuals simply need to follow up on the recommendations. The costs of pursuing these recommendations should not be any higher than those of pursuing the recommendations generated by the default algorithm. After all, the searchers are actively pursuing information from their colleagues; they would pursue the recommendations regardless of whether the person also happens to maximize their structural holes. On the other hand, in Experiment #2, individuals need to actively seek beneficial connections themselves before attempting to make them. Therefore, they incur the extra cost of first figuring out with whom they can connect. Thus, I suspect it may cost a person more effort and time to actively seek social connections than to passively connect to individuals through a system recommendation. However, it is likely that the connections an individual actively seeks are fundamentally different from the connections suggested by Expertise-Find. For example, subjects may make a connection through Expertise-Find because they need an answer to their current problem at work, and the relationship may be short-lived. On the other hand, connections made after receiving network feedback may be more strategic and have greater potential value.
Thus, it is an empirical question to understand the quality and the type of connections made in the two sets of experiments. If there are such quality differences, the performance impact of a network change induced by providing feedback about an employee's network may also differ from that induced by manipulating Expertise-Find's rankings. While passive manipulations may induce many short-term connections for a specific goal (and the ties may dissolve shortly after), actively reaching for network connections may have long-term performance and network implications. Table 8 summarizes the research questions addressed in Experiment #2.

Table 8: Research questions addressed by Experiment #2

1. Can providing feedback change a person's network position? If so, how big is the change, and how big is the performance change resulting from the change in networks?
2. Compare the network change generated by a passive manipulation (Experiment #1) vs. a change generated by an active manipulation (Experiment #2).
3. Compare the performance change between Experiment #1 and Experiment #2.
4. Compare the information advantage generated from Experiment #1 and Experiment #2.

Network Contamination and Sample Size Calculations

In this experiment, we use all 9,035 employees in our sample to study how providing feedback could drive network change. To avoid network contamination stemming from using global network measures, only local network measures are used to gauge a network change. Since this intervention does not recommend any specific individuals to maximize a person's network measure as Experiment #1 did, we are interested to see not only whether there is a change in network positions but also the type of network change that feedback can produce. For example, a new connection could be a random person in the company, a manager, someone from a different group, or a person who is central in the organization in both the formal hierarchy and informal social networks.
To gauge the new types of ties formed by an individual, we use the network characteristics of people connected to the individual prior to the intervention in order to avoid potential network change derived from the experiment. We also need to know the minimum sample size required to detect changes in different types of network ties. With 80% power and 9,035 employees in the sample, we can easily detect a statistically significant change. Table 5 lists the sample size required for testing various network outcomes. For example, detecting a change caused by adding one extra manager to the network requires a sample of only 161 people. To identify a change in performance, we can detect a 3% change in monthly revenue with 9,035 individuals.

Experiment #3: Active vs. Passive Tie Formation

This experiment explores whether there is a complex relationship between ties formed through actively seeking social connections and ties formed passively through changing the underlying search algorithm for Expertise-Find. It is possible that ties formed passively are fundamentally different from ones formed actively. Thus, the two mechanisms may be complementary for forming new ties. However, it is also possible that they are substitutes if they generate the same underlying network structure. Thus, it is an empirical question whether the two act as complements or substitutes for forming social connections and whether there are any performance implications.

Robustness Checks

Although randomized experiments can help us tease out the causal relationship between network structures and work performance, there might be other intervening factors that modulate the relationship. For example, when opportunities are scarce and few projects are available, social networks may have a stronger effect on work performance, as people tend to activate their social networks to look for new opportunities.
Similarly, when opportunities are abundant and workloads are high, people may be less likely to leverage their ties to seek more projects. The adversity or prosperity of the economic environment may condition the overall effect from social networks. In interpreting the results obtained, it is important to take into account the economic environment while the experiment is being conducted. To address these issues, I plan to conduct several interviews after the experiment to understand how workers leverage social networks during unusual economic conditions. In particular, if the experiment occurs during special circumstances such as a recession, it is important to understand how people enact their social networks when facing distressing circumstances. Recently, a stream of work has explored how people activate their social ties in dealing with complex problems within a difficult environment (Srivastava, 2o1; Smith, Menon, & Thompson, 2010). With real-time, moment-to-moment interactions as captured by people's digital traces, we may be able to understand the intermediate mechanism of how people enact their social networks and whether the enacted social networks have different properties than people's latent social networks.

There are concerns that even within the same firm, different cultures may induce people to use their social networks differently. Several studies have shown that the role of brokerage may vary significantly based on the culture and seniority in the firm (Bian, 1997; Burt, 2007). The beauty of using randomized assignment is that it eliminates these types of bias. However, to make our estimates more precise, we choose to study consultants in the United States only, and we plan to stratify and randomize the treatment based on a group of covariates.

Conclusion and Pre-Experimental Statistics

Currently, I have analyzed the correlations between various network measures and work performance. Specifically, I have uncovered three preliminary results.
First, the structural diversity of social networks is positively correlated with performance, corroborating previous work. Second, network size is positively correlated with higher productivity. However, when we separated network size into in-degree and out-degree, we found that while in-degree is positively correlated with higher work performance, out-degree is not correlated with performance in the project network - that is, where each node is a project, not a person. Third, for both the employee and the project network, knowing powerful individuals such as executives is positively associated with work performance. However, having many managers on a project is negatively correlated with project revenues. To understand the detailed mechanisms behind these results, I plan to conduct randomized experiments to test the causality of these correlations. Establishing the causal relationship between social networks and work performance would be an important contribution to the literature on social networks and information worker productivity. I plan to explore the mechanism of how social networks enable timely access to information through mining the actual content of people's electronic communications. Through this exercise, I hope to understand how people acquire knowledge in work settings and under what conditions experts actually improve work performance.

References

Aral, S., Brynjolfsson, E., & Van Alstyne, M. 2006. "Information, Technology and Information Worker Productivity: Task Level Evidence." Proceedings of the 27th Annual International Conference on Information Systems, Milwaukee, Wisconsin.
Aral, S., & Van Alstyne, M. 2007. "Network Structure & Information Advantage." International Conference on Network Science 2007.
Becherer, R. C., Morgan, F., and Richard, L. M. 1982. "The Job Characteristics of Industrial Salespersons: Relationships to Motivation and Satisfaction." Journal of Marketing, 46: 125-135.
Burt, R. 1987. "Social Contagion & Innovation: Cohesion versus Structural Equivalence." American Journal of Sociology, 92: 1287-1335.
Burt, R. 1992. "Structural Holes: The Social Structure of Competition." Harvard University Press, Cambridge, MA.
Burt, R. 1997. "The Contingent Value of Social Capital." Administrative Science Quarterly, Vol. 42, No. 2.
Burt, R. 2004. "Structural Holes & Good Ideas." American Journal of Sociology, (110): 349-99.
Lin, C., Ehrlich, K., Griffiths-Fisher, V., and Desforges, C. 2008. "SmallBlue: People Mining for Expertise Search." IEEE Multimedia Magazine, Jan.-Mar. 2008.
Coleman, J. S. 1988. "Social Capital in the Creation of Human Capital." American Journal of Sociology, (94): S95-S120.
Freeman, L. 1979. "Centrality in social networks: Conceptual clarification." Social Networks, 1(3): 215-234.
Gargiulo, M., and Rus, A. 2002. "Access and mobilization: Social capital and top management response to market shocks." Working paper, INSEAD.
Granovetter, M. 1973. "The strength of weak ties." American Journal of Sociology, 6: 1360-1380.
Granovetter, M. 1982. "The strength of weak ties: A network theory revisited." In P. V. Marsden and N. Lin (eds.), Social Structure and Network Analysis: 105-130.
Granovetter, M. 1985. "Economic Action & Social Structure: The Problem of Embeddedness." American Journal of Sociology, (91): 1420-1443.
Granovetter, M. 1992. "Problems of Explanation in Economic Sociology." In N. Nohria & R. G. Eccles (eds.), Networks & Organizations: 25-56. Harvard Business School Press, Boston.
Hansen, M. 1999. "The search-transfer problem: The role of weak ties in sharing knowledge across organization subunits." Administrative Science Quarterly, (44:1): 82-111.
Hansen, M. 2002. "Knowledge networks: Explaining effective knowledge sharing in multiunit companies." Organization Science, (13:3): 232-248.
Pentland, A. 2006. "Automatic mapping and modeling of human networks." Physica A: Statistical Mechanics and its Applications.
Podolny, J., and Baron, J. 1997. "Resources and relationships: Social networks and mobility in the workplace." American Sociological Review, (62:5): 673-693.
Polanyi, M. 1966. "The Tacit Dimension." New York: Anchor Doubleday Books.
Reagans, R., and McEvily, B. 2003. "Network Structure & Knowledge Transfer: The Effects of Cohesion & Range." Administrative Science Quarterly, (48): 240-67.
Reagans, R., and Zuckerman, E. 2001. "Networks, diversity, and productivity: The social capital of corporate R&D teams." Organization Science, (12:4): 502-517.
Smith, E., Menon, T., & Thompson, L. 2010. "Status Differences in the Cognitive Activation of Social Networks" (September 22, 2010). Organization Science, forthcoming.
Waber, B. N., Olguin Olguin, D., Kim, T., Mohan, A., Ara, K., and Pentland, A. 2007. "Organizational Engineering using Sociometric Badges." International Conference on Network Science, New York, NY.
Wu, Waber, Aral, Brynjolfsson & Pentland. 2008. "Mining Face-to-Face Interaction Networks Using Sociometric Badges: Predicting Productivity in an IT Configuration Task." International Conference on Information Systems, Paris, France, December 14-17, 2008.
Uzzi, B. 1996. "The Sources and Consequences of Embeddedness for the Economic Performance of Organizations: The Network Effect." American Sociological Review, (61): 674-98.
Uzzi, B. 1997. "Social Structure and Competition in Interfirm Networks: The Paradox of Embeddedness." Administrative Science Quarterly, 42: 35-67.

Water Cooler Networks: Performance Implications of Informal Face-to-Face Interaction Structures in Information-Intensive Work

Lynn Wu, MIT Sloan School of Management, lynnwu@mit.edu
Benjamin N. Waber, MIT Media Laboratory, bwaber@media.mit.edu
Sinan Aral, NYU Stern School of Business & MIT Sloan School of Management, sinan@stern.nyu.edu
Erik Brynjolfsson, MIT Sloan School of Management, erikb@mit.edu
Alex (Sandy) Pentland, MIT Media Laboratory, pentland@mit.edu

This study examines the performance characteristics of face-to-face interaction networks and finds that their structural properties are important for effective knowledge transfer and productivity. We argue that network theory should incorporate the implications of media choice, and particularly differences between face-to-face and electronic communication, when assessing how networks affect individual performance. We introduce a new methodology, using Sociometric badges, to record precise data on face-to-face interaction networks for a group of workers in a large IT manufacturing firm over a one-month period. Linking these data to detailed performance metrics, we find that 1) network cohesion is associated with higher worker productivity, in contrast to previous findings in email data; 2) cohesion in face-to-face networks is associated with even higher performance during complex tasks, suggesting that cohesion complements information-rich media for transferring the complex knowledge needed to complete such tasks; 3) while information-seeking from many colleagues creates disruptions, more interactions with a few key strong-tie informants speed up work. Face-to-face networks have more explanatory power than physical-proximity networks, suggesting that information flows in actual conversations (rather than individuals' correlated exposure to common environmental factors through physical proximity) are driving our results. These results augment our understanding of how media choice and network structure interact, shedding light on the organizational effects of face-to-face interaction.
The methods and techniques we introduce are replicable, creating opportunities for new lines of research into the consequences of face-to-face interaction in organizations.

Keywords: Social Networks, Face-to-Face Communication, Information Worker Productivity, Sociometric Badge.

Introduction

Social networks are theorized to affect work performance due to their central role in the informal structure of organizations (Sparrowe et al., 2001). Numerous studies have shown that social networks can affect organizational power, innovation, creativity, and individual and team performance (Sparrowe et al., 2001; Cummings and Cross, 2003; Krackhardt, 1990). However, while most studies elicit generalized social network ties or specific types of relationships (such as friendship or advice-seeking relationships) through survey instruments or electronically recorded communication logs such as email or telephone records, almost no research examines the performance effects of face-to-face conversational networks using behavioral data. Face-to-face conversations, both formal and informal, remain a significant part of organizational communication that has to date been understudied. Valuable information passes through verbal communication at the proverbial water cooler, and face-to-face communication can, through non-verbal cues, transmit important calibrations of norms and culture and provide a medium for the informal development of trust and affect among organizational members (Csikszentmihalyi, 1996). The lack of studies on face-to-face networks represents an important gap in social networks research and in our understanding of the informal structures that facilitate work in modern organizations.
Recent advances in electronic communication and ubiquitous email use give researchers the opportunity to solicit networked interactions through moment-to-moment email communication data and thus to eliminate some of the known biases of survey instruments (Quintane and Kleinbaum, 2008). As email communication logs record who has emailed whom, the exact time of the interaction, and the content of the exchange, email archives allow researchers to study the mechanisms through which electronic communications impact organizational structure and work performance (Aral et al., 2006, 2007). Attention has also turned to other forms of remote collaboration, with researchers examining how groups coordinate in virtual environments (Huang et al., 2009). However, while email has certainly become an important communication tool over the last fifteen years, face-to-face conversations remain a critical and in many cases predominant mode of communication (Chidambaram and Jones, 1993). What is known about electronic communication networks does not necessarily inform us about the implications of face-to-face communication networks. Face-to-face conversations are likely to deliver fundamentally different types of information and advice than what is transferred via email or over the telephone. Consequently, network properties associated with improved work performance in face-to-face networks may differ dramatically from networks actualized through other modes of communication. If we utilize our face-to-face networks differently than we do our electronic networks - if we transfer different types of information in different ways across these different media - then the ubiquity of electronic communication data could lead researchers to invalid generalizations about organizational networks if electronic communication is accepted as a broad proxy for the generic social structures instantiated in organizational communication. 
Unfortunately, until now, recording precise and reliable data on face-to-face interaction has been difficult, as it requires moment-to-moment records of individuals' conversations over time and because there is no trace, electronic or otherwise, of most face-to-face interactions. While self-reports may be good instruments for recording perceived social ties, they do not typically capture accurate data about actual face-to-face conversations (Marsden, 2005). Recording precise moment-to-moment interactions between individuals is essential to understanding how workers seek and access information informally to solve complex problems at work. To fill this gap, we employ a new data collection method that uses Sociometric badges developed at the MIT Media Laboratory to record continuous face-to-face interactions among employees at a commercial IT hardware facility over time. By recording actual face-to-face interactions, we eliminate some of the known biases in self-report studies (Marsden, 2005). Thus, we are able to introduce observable characteristics of face-to-face communication into social network analysis to understand how such communication enables or constrains information transfer and work performance. Combining data on face-to-face interactions with project and accounting data on the relative performance of the same set of workers, we evaluate which face-to-face network structures best predict higher performance and whether these structures differ from those found to predict productivity in the context of electronic communication networks such as email and telephone communication. Our analyses uncover three key results. First, we find that in face-to-face networks, cohesion and strong ties are positively correlated with higher worker productivity. This contrasts with what has been found in email communication data (Aral et al., 2008; Wu et al., 2009), where diverse networks with weak ties and structural holes are correlated with higher performance.
These results imply that the mode of communication is essential to understanding the value of communication in network structures. That cohesive networks are valuable in thick, synchronous, information-rich communication channels that provide non-verbal cues (e.g., face-to-face), while structurally diverse networks are valuable in codified asynchronous communication channels (e.g., email), suggests that different network structures complement different communication channels, providing evidence of the need to incorporate theories of communication media choice into social network theory. Second, network cohesion is associated with even higher performance when workers are executing complex tasks, suggesting the need for tight clustered networks to transfer the complex information and knowledge needed to complete complex work. Third, while information seeking from many colleagues creates disruptions, more interactions with a few key strong-tie informants speed up work, implying that larger networks are costly to maintain, while high-bandwidth networks of fewer contacts provide access to relevant information most efficiently. These results are much stronger in conversation networks than in physical proximity networks, indicating that information flows in actual conversations (rather than individuals' correlated exposure to common environmental factors through physical proximity) are driving our results. Although we cannot firmly identify the direction of causality in our results, panel data estimates eliminate bias from any unobserved time-invariant factors that may confound the findings. Furthermore, on-site visits and interviews support our conclusions - employees corroborated their use of face-to-face conversations to communicate complex and embedded knowledge. These results demonstrate the importance of face-to-face social networks in predicting worker productivity even as technology-mediated communications become ubiquitous.
Differences in the types of network structures associated with performance in face-to-face networks compared to electronic communication networks suggest a need for social network theory to incorporate media choice as a significant driver of network outcomes. Such evidence is important for managers who face increasingly global and geographically dispersed work environments, as electronic communication networks alone may not be enough to transfer the complex tacit knowledge needed for the successful execution of complex tasks.

Theory

In this study, we link social network theory (e.g., Granovetter, 1973; Burt, 1992) and characteristics of face-to-face communication (e.g., Daft and Lengel, 1986; Chidambaram and Jones, 1993) to understand what types of network structure are most conducive to transferring knowledge and improving work performance in face-to-face work environments. Specifically, we contrast new evidence on face-to-face networks with prior results on email networks. Using electronic communication data, prior work has found that structurally diverse networks with many weak ties and structural holes are associated with improved individual work performance. We contend that the opposite should be true in face-to-face networks. Electronic networks such as email facilitate information sharing that is constrained by the medium to be codifiable and simple. In contrast, information exchanged in face-to-face networks is more likely to be tacit and more complex. We hypothesize that such information is transferred more effectively in cohesive networks of strong ties. By elevating face-to-face network data collection to standards of accuracy and precision comparable to those found in electronic communication data, we open new avenues for understanding how media choice interacts with social network structure to facilitate knowledge transfer and performance.
Although many different types of network data, from call logs to email to surveys, are used to examine network structure and knowledge transfer in organizations, most modern network theory remains agnostic about the communication channels through which information will flow, leaving open the possibility that different network structures complement different communication media to facilitate knowledge transfer. By hypothesizing and testing a set of predictions in face-to-face networks that contrast with those shown to hold in electronic networks, we substantiate a call to bring media choice back into considerations of the relationship between network structure and work performance.

The Effect of Network Cohesion in Face-to-Face Networks

As face-to-face communication offers the richest medium for sharing complex knowledge and cohesive networks provide the most effective structure for transferring such knowledge, we hypothesize that cohesive networks are likely to complement face-to-face communication in effectively locating and transferring complex tacit knowledge.

Face-to-Face Communication and Knowledge Exchange

A long line of research in organizational communication theory, particularly information richness theory, posits that face-to-face conversation is the richest mode of communication, providing multiple social cues through both natural language and body language and greatly reducing equivocality (Daft and Lengel, 1986; Chidambaram and Jones, 1993). Face-to-face communication has two important properties that help facilitate information transfers: the ability to transmit complex and tacit information and the ability to foster trust between actors. Face-to-face communication is thought to have the greatest capacity to transfer complex knowledge (Roberts, 2000). Complex knowledge is typically defined in terms of its codifiability and interdependence.
When knowledge is codified, it has a stable meaning and can be expressed in writing since symbols representing the knowledge (e.g., mathematical formulas, acronyms, etc.) are already widely understood throughout an organization (Brynjolfsson, 1994; Hansen, 1999; Reagans and McEvily, 2003). On the other hand, tacit knowledge that is difficult to codify cannot be precisely expressed using existing symbolic representations. Transferring tacit knowledge can be difficult, because it is hard to articulate. Furthermore, even when knowledge can be codified, it can still be difficult to transfer if its component concepts are complex or highly interdependent. This interdependency is characterized by the degree to which knowledge is part of a larger system of interrelated concepts (Teece, 1986; Winter, 1987). Transferring such knowledge can be particularly challenging, as it requires the transmission of knowledge related to the larger conceptual system in addition to the specific knowledge itself. Furthermore, understanding tacit knowledge may be much more important than understanding codified knowledge in information-intensive settings. Tacit knowledge is often heavily embedded in a system and is thus more valuable for solving local or context-dependent problems. Similarly, because codified knowledge can be easily learned, competitive advantage is more likely to arise from understanding and using tacit knowledge. Face-to-face communication provides an efficient channel through which to transfer tacit and interdependent knowledge, as it facilitates interaction across multiple levels of communication: verbal, physical, and contextual. In a face-to-face conversation, it is natural to interrupt, learn and give feedback as two people interact, increasing the information processing power of the exchange (Nohria and Eccles, 1992).
During a face-to-face conversation, one party can see how the other party is responding and can strategically alter the presentation to facilitate communication, especially if the concept is complex (Goffman, 1982). When people meet face-to-face, they are more likely to devote more energy to the other party, as face-to-face conversations often require full cognitive attention, as opposed to other forms of communication that do not require an immediate response. For example, while email can deliver the same verbal content as a conversation, informal social mechanisms that ensure recipients devote time and energy to absorbing the content of the conversation are missing. Thus, face-to-face communication can be more effective at transferring tacit and interdependent information than other communication media such as email or the telephone. Furthermore, simultaneously processing information in multiple ways is critical to creativity and problem solving (Bateson, 1973; Csikszentmihalyi, 1996). Physical, verbal and contextual cues enabled by face-to-face conversations can be complementary and mutually enriching, leading to discoveries that would not have been possible using more asynchronous communication channels. Face-to-face interactions also foster trust and provide actors the motivation they need to devote time and effort to transferring complex or tacit information. In a face-to-face conversation, people can read others' intentions, because humans are quite effective at sensing and processing non-verbal messages, particularly about emotion and trustworthiness (Putnam, 2000). Similarly, face-to-face interactions can accelerate the bonding process and foster informal friendship networks (Storper and Venables, 2003). Water-cooler conversations during breaks allow workers to develop a sense of members' expertise, competence, character and personalities outside the immediate task environment. These informal conversations create a basis to develop trust.
When people communicate frequently, especially informally, they tend to create stronger bonds upon which trust can be built (Cheepen, 1988). Once trust is developed, a source is more willing to initiate knowledge transfers and to work to ensure that recipients understand the information even if the knowledge is complex and difficult to share (Reagans and McEvily, 2003).

Network Cohesion and Knowledge Exchange

Network cohesion also supports transfers of tacit complex knowledge. To find a solution to an ambiguous and complex problem, an information seeker must be able to articulate her request clearly to others. Absorptive capacity can facilitate the expression of ideas in ways that others can understand (Cohen and Levinthal, 1990). A cohesive network can effectively increase absorptive capacity, as repeated communication from multiple perspectives allows a group of actors to develop group-specific communication heuristics that ease the expression of complex and ambiguous problems. Contacts in a cohesive network can then be more effective in identifying relevant recommendations and transferring necessary information, especially if it is tacit and context-dependent. Through frequent communication, actors are less inhibited from asking for clarification and, accordingly, they are more likely to assimilate information. Network diversity and structural holes (the lack of connection between one's contacts) have been shown to provide access to novel information (Burt, 2004) and improve performance (Burt, 2000; Sparrowe et al., 2001; Reagans and Zuckerman, 2001; Cummings and Cross, 2003). In particular, in an analysis of email communication networks, message content and employee performance, Aral & Van Alstyne (2009) demonstrate that networks with structural holes deliver diverse and novel information and that access to novel information explains a significant portion of the variance in productivity - more so, for instance, than traditional human capital.
Yet, while structurally diverse networks with an abundance of weak ties are beneficial for exposing actors to novel information, they are less effective at transferring complex knowledge (Hansen, 1999; 2002). While people with a structurally diverse social network can efficiently locate information, whether they can successfully assimilate the information depends largely on the effectiveness of the knowledge transfer. When information is simple, explicit or declarative, a structurally diverse network with many weak ties is sufficient, as the information can be easily transmitted between actors. However, a network with many structural holes may not be as effective as a cohesive network for transferring complex or tacit information for three main reasons. First, cohesive networks facilitate knowledge transfer because actors are more likely to trust each other. Trust has been shown to reduce the perceived cost of information sharing as well as increase spontaneous information sharing (Kramer, 1999). Without trust, the source may simply refuse to pass on the information to the recipient. Consequently, it is important to convince the source that the transfer would not negatively affect them or be too costly. A cohesive network with strong ties and a dense web of third-party ties can help convince the source to initiate the transfer (Reagans & McEvily, 2003). By creating cooperative motivation and removing competitive impediments to information transfer, cohesion can increase trust between parties (Granovetter, 1992; Reagans and McEvily, 2003), which is especially important for transferring complex knowledge. Second, greater absorptive capacity in a cohesive network facilitates effective knowledge transfer (Hansen, 1999). Absorptive capacity is important for recognizing the value of new information and for enabling the assimilation and application of the information (Cohen and Levinthal, 1990).
A cohesive network can increase absorptive capacity as repeated communication allows actors to develop relationship-specific communication heuristics that ease knowledge transfer (Hansen, 1999). With more frequent communication, actors are less inhibited from seeking information and asking for clarification in a cohesive network, and accordingly, they are more likely to understand how to use the information correctly and effectively. Third, the redundancy inherent in cohesive networks allows actors to receive information through multiple perspectives from multiple people, easing knowledge transfer. Although cohesive networks have been criticized for supplying redundant information, redundancy can also be a powerful instrument for effectively transferring tacit knowledge. Redundancy does not simply duplicate existing knowledge, but also often creates an intellectual common ground that can help individuals sense what others are struggling to articulate (Nonaka, 1990; 1994; Grant, 1996). Consequently, cohesive networks can facilitate tacit information transfers by allowing the same information to be repeated multiple times from different perspectives.

Complementarity between Network Cohesion and Face-to-Face Interaction

Transfers of tacit complex knowledge require both frequent, embedded interaction and high-fidelity conversation. While cohesive networks increase absorptive capacity by providing the infrastructure for frequent interactions, face-to-face conversations increase absorptive capacity by enhancing the fidelity of each conversation. Frequent interactions, although helpful, cannot replace the nonverbal cues and feedback available in a face-to-face conversation. Face-to-face communication offers the maximal information transfer in each exchange. Through verbal, physical and contextual cues, face-to-face conversations can greatly improve understanding of complex concepts.
Although email may textually deliver the same verbal content as a conversation, it lacks the rich social cues that recipients process simultaneously and multimodally to absorb the content of the conversation. When an idea is particularly ambiguous and complex, structurally diverse networks are likely to be insufficient because infrequent communications in diverse networks inhibit the development of group norms and group-specific communication heuristics. Without the repeated interactions enabled by a cohesive network, people may experience difficulty expressing their ideas and are less likely to receive relevant responses. Thus, face-to-face communication and cohesive networks should complement each other to provide both frequent and high-fidelity communication, improving the ability of a person to express complex ideas and disambiguate misconceptions. Since cohesive networks can foster similar norms, enforce sanctions when someone misbehaves and cultivate reputations among the group members, cohesive networks can also remove motivational impediments to information sharing. These properties allow members of a cohesive social network to trust each other. The dense web of third-party ties in cohesive networks also creates protection and cooperative motivation for the source to share information. When a source refuses to help, her reputation may suffer, as news of her uncooperative behavior can quickly pass through a cohesive network and others may immediately issue sanctions against her. Similarly, a cohesive network also offers protection to the source if the recipient uses the information inappropriately. News of the untrustworthy behavior would quickly spread in a cohesive network and the person would no longer receive cooperation from others. While face-to-face communications can develop trust and cooperative behaviors in general, their effectiveness in structurally diverse social networks is likely to be limited.
Without the frequent communication needed to establish trust, face-to-face communication alone is not enough to motivate cooperative behaviors in a structurally diverse network where communication may be relatively rare. Furthermore, without a dense web of third-party ties to vouch for the information seeker and protect the source, the source is less likely to be willing to initiate the transfer, especially if the information is complex or sensitive. There is also little penalty for refusing requests for information or for the seekers to misuse information since they are likely to belong to different communities than the source.

Lastly, both face-to-face communication and cohesive networks can promote serendipitous encounters where people can receive valuable information without necessarily seeking it. Proverbial water cooler conversations are often random encounters. With multiple levels of communication (verbal, physical and contextual), face-to-face conversations during these random encounters allow actors to simultaneously process information multi-modally, which is critical for creativity and problem solving (Bateson, 1973). Frequent communication enabled by cohesive networks improves the probability of having serendipitous meetings. Thus, face-to-face conversations and cohesive networks are likely to be complementary and mutually enriching, leading to discoveries that would not have been possible using more asynchronous communication channels or less cohesive networks.

In summary, we expect that face-to-face networks require network cohesion to transfer the more complex, embedded knowledge that they are typically relied upon to transfer, and that face-to-face communication complements network cohesion in facilitating complex knowledge transfers. We therefore hypothesize that network cohesion is positively associated with work performance in face-to-face networks.
Hypothesis 1a: Cohesion in face-to-face networks, measured by network constraint, is correlated with stronger information worker performance.

Furthermore, we expect the effect to be more pronounced when workers are engaged in complex tasks that require more tacit and interdependent information. Simple tasks that require relatively codified and context independent information can often be solved without the necessity of using a cohesive network or face-to-face communication. However, when workers face complex tasks that presumably require access to tacit and embedded information, manuals prove to be less useful and workers must turn to face-to-face conversations with colleagues in a cohesive network to access the desired information. As face-to-face communication is the richest medium that can most effectively transfer complex knowledge between actors, and because cohesive networks are most effective for transferring complex knowledge, a cohesive face-to-face network may be especially helpful in transferring tacit or embedded knowledge used for the execution of complex tasks. Thus, we hypothesize:

Hypothesis 1b: Cohesion in face-to-face networks, measured by network constraint, is more helpful for completing complex tasks than simple tasks.

The Effect of Tie Strength in Face-to-Face Networks

Knowledge sharing is more common among strong ties (e.g. Henderson and Cockburn 1994, Eisenhardt and Tabrizi, 1995, Hansen 1999) because individuals linked by strong ties have greater motivation to be of assistance and to make themselves available to one another (Granovetter 1982). Strong ties also facilitate more frequent two-way interactions that help assimilate information, allowing recipients to get immediate feedback from strong tie contacts while information is being transferred (Polanyi 1966, Barton and Sinha, 1993).
High frequency interactions with strong ties allow the source and recipient to develop relationship-specific heuristics that make it easier for the source to understand and use the information provided (Mergel, Lazer, and Binz-Scharf, 2008). Strong ties also tend to minimize conflict between individuals, making information sharing more likely (Hansen 1999). In his study of networks amongst business units in a national electronic and computer firm, Hansen (1999) finds that the effectiveness of weak and strong ties in transferring information is contingent on the complexity of the information being transferred. Weak ties have the strongest positive effect on information sharing when knowledge is codified and context independent, while strong ties provide the strongest effect when the knowledge being transferred is tacit and interdependent. Centola and Macy (2007) also document the contingent benefit of weak ties. While they are efficient for propagating simple contagions, weak long ranging ties are not necessarily sufficient for propagating complex contagions, in which an actor is infected only if more than one neighbor is also infected.

However, very few researchers have examined the degree to which the effect of strong or weak ties on knowledge transfer is moderated by the communication channels through which information is transmitted. We therefore recast arguments about cohesion and tie strength in the context of different channels of communication, contrasting thick, rich face-to-face interactions with asynchronous communications such as email, in which non-verbal cues and simultaneous interaction are absent. We argue these differences should change predictions about which types of network structure are likely to support information exchange and thus performance.
Strong ties facilitate complex information sharing primarily because they enable frequent interactions, which allow actors to establish effective communication mechanisms, foster trust and minimize conflicts, all of which are essential to sharing information (Hansen, 1999). In the same way that face-to-face communication should complement cohesive networks in transferring information, it should complement strong ties by improving the quality of each communication exchange. Together, tie strength and network cohesion provide both the quality and the frequency of communication that are necessary to build trust, minimize information distortion and improve information transfer. Other text-based communications, such as email, do not have the capacity to transmit the rich social cues that are necessary to transfer tacit and embedded information, and accordingly, cannot complement strong ties to transfer complex information.

However, maintaining dyadic ties is costly in face-to-face networks, which require time-consuming conversations and co-presence (Burt, 1992; Uzzi, 1997; Hansen, 1999). The cost may not justify the benefit if these ties are used to transfer simple information, which does not necessarily require a rich communication channel. On the other hand, selectively cultivating a few strong ties in face-to-face settings may be extremely beneficial if they are used to transfer complex and sensitive information. Thus, the tradeoff between using strong or weak ties in a face-to-face conversational network depends largely on the nature of the information being transferred. If the task requires complex information, strong ties can give extra power in transmitting knowledge. However, if the information is simple and can be articulated in writing, maintaining a large number of close contacts may be too expensive to justify the maintenance cost. Maintaining nonessential face-to-face contacts may take time away from task completion activities and thus hurt productivity.
Thus, we hypothesize that network size (maintaining many face-to-face contacts) has a negative average effect on work performance, as transferring simple knowledge through direct contacts is expensive. However, relying on strong ties to access information in face-to-face networks may be beneficial when workers are executing complex tasks that require more complex and tacit information.

Hypothesis 2a: On average, network size has a negative effect on work performance.

Hypothesis 2b: Strong ties have a positive effect on work performance when solving complex tasks.

The Effect of Network Reach in Face-to-Face Networks

We have argued that face-to-face communication can complement a cohesive network and strong ties to transfer complex information. However, before the transfer can occur, it is important to find the source of the information first. The ability to access multiple parts of the informal organizational network helps a worker find required information as well as to seek advice and support in solving problems and meeting client requirements (Hansen 1999). Network reach, the number of individuals one can reach within two steps in the network, has been theorized to provide access to a broader range of information and solutions to solve problems encountered during task execution (Reagans and McEvily, 2003). Broad network reach can facilitate search because it enables actors to access people in various parts of the network who could provide a given piece of knowledge either directly or through a friend. Consequently, reach should be positively correlated with performance, controlling for network size and network cohesion.

While broad network reach is desirable for the information search process, its effect is likely to be more pronounced in a face-to-face social network due to information distortion. For textual exchanges, distortion is less problematic because the original text can be passed electronically without alteration.
When information passes through a long path in a face-to-face network, it is more likely to be distorted, as people tend to misunderstand or misinterpret information through each exchange (Collins and Guetzkow, 1964; Huber and Daft, 1987; Gilovich, 1991; Hansen, 2002). Imprecise or inaccurate information can have a negative performance impact on recipients. Acting on vague information obtained indirectly, the recipient may need to use her ties to connect to the original information source, only to find it was not what she sought. Eliminating misleading information is costly, as verifying each incorrect lead wastes valuable time and effort. When an actor with high network reach can easily access other experts directly, not only is she exposed to less information distortion, she can also access knowledge more quickly. We therefore hypothesize that:

Hypothesis 3a: Greater reach in face-to-face networks is correlated with stronger information worker performance.

Transfers of complex knowledge may experience even greater distortion than transfers of simple knowledge, as complex knowledge is inherently more difficult to understand and is more likely to be misinterpreted. Broad network reach can reduce information distortion and promote knowledge transfer by influencing an actor's ability to effectively access complex ideas in the network. Workers with broad network reach are exposed to more views and perspectives, allowing them to understand information from different angles and to frame information in ways others can understand. Thus, we hypothesize that broad network reach is particularly beneficial for complex tasks that require both more information and information that is inherently more complex and therefore more difficult to transfer and absorb.

Hypothesis 3b: Broad network reach in face-to-face networks is more strongly associated with improved performance when completing complex tasks.
Background and Data Setting

We studied a data server configuration facility with 56 employees, 36 of whom participated in this study. The organizational chart of the division can be seen in Figure 1. While the job descriptions of employees at the facility are heterogeneous, we focus on a set of employees whose primary role is to guide, solicit and capture clients' IT configuration requirements, and to produce IT products according to those specifications. The group consists of 28 people, 23 of whom participated in the study. Interviews indicate that the data configuration process is information-intensive, requiring employees to quickly analyze the feasibility of specifications and build the system. Our onsite interviews with both managers and configuration specialists indicate that talking to other employees is particularly helpful for understanding how the overall system works, how requirements fit together and how interoperability constrains the set of viable specifications, as there are no existing manuals to explain all the intricacies of the system. Employees therefore engage in face-to-face communication to transfer tacit and embedded knowledge. Each configuration task is executed by a single individual and is randomly assigned given a workload constraint, much like a series of queued tasks. In this setting, everyone in the configuration division is placed in a large room with four rows of cubicles. Each row has 4 pairs of cubicles, with the cubicles in each pair facing each other. Since everyone is collocated in the same room, there are ample opportunities to meet face-to-face.
[Figure 8: Organizational Chart. Department Manager (1 person); Configuration Manager (1 person), Pricing Manager (1 person), Business Coordinator Manager; Configuration Strategists (28 people), Pricing Strategists (10 people), Business Coordinators (14 people).]

To measure worker performance, we collected data on 911 configuration tasks during the experimental period of 25 working days (more than one month's activities at the facility). For each task, we gathered data on the task duration, difficulty level, the number of follow-ups the employee conducted with the client, and information about the employee who performed the task. Although some of the tasks took less than a day to finish, tasks that took more than one day deserve special consideration as we cannot assume the worker is working on the task 24 hours a day. To better approximate the completion time of tasks that span multiple days, we assumed an 8-hour workday. Our interviews with staff indicate that employees typically follow this work schedule and rarely stay late or work on weekends to catch up. Although task completion time is only one dimension of work performance, it is an important outcome in the computing industry (Eisenhardt & Tabrizi 1995), and in this organization employees are formally evaluated on this metric.

The Sociometric Badge

To capture face-to-face interactions, employees participating in the study were instructed to wear a Sociometric badge every day from the moment they arrived at work until they left their office. In total, we collected 1,900 hours of data. All of these employees were male and, since this was a recently formed department, none had been employed for more than one year. We recorded every face-to-face conversation between workers using the Sociometric badge and continuously logged physical proximity, as well as many other behavioral features such as animation, tonal variation and the sequence and timing of interactions.
The content of the conversations was not recorded.

Figure 2: Wearable Sociometric Badge

Capabilities of Wearable Sociometric Badge
- Recognizing common daily human activities (such as sitting, standing, walking, and running) in real time using a 3-axis accelerometer (Olguin Olguin & Pentland, 2006).
- Extracting speech features in real time to capture nonlinguistic social signals such as interest and excitement, the amount of influence each person has on another in a social interaction, and unconscious back-and-forth interjections, while ignoring the words themselves in order to assuage privacy concerns (Pentland, 2005).
- Performing indoor user localization by measuring received signal strength and using triangulation algorithms that can achieve position estimation errors as low as 1.5 meters, which also allows for detection of people in close physical proximity (Sugano, Kawazoe, Ohta, & Murata, 2006; Gwon, Jain, & Kawahara, 2004).
- Communicating with Bluetooth enabled cell phones, PDAs, and other devices to study user behavior and detect people in close proximity (Eagle & Pentland, 2006).
- Capturing face-to-face interaction time using an IR sensor that can detect when two people wearing badges are facing each other within a 30° cone and one meter distance. Choudhury (2004) showed that it was possible to detect face-to-face conversations of more than one minute using an earlier version of the Sociometric badge with 87% accuracy.

The 'wearable badge' form factor is particularly useful for collecting data on face-to-face interactions in organizational contexts. First, most organizations already require individuals to wear identification badges with embedded radio frequency identification (RFID). It is not hard to extend the sensing functionality of these badges further with accelerometers, infrared (IR) transceivers, and microphones. Second, wearable badges are less obtrusive than sensors that require a long setup period to function.
The success of IT products that employ this form factor for wearable sensors, such as the nTag (http://www.ntag.com/) and Vocera systems (http://www.vocera.com/), suggests that this technology is acceptable to users in a wide variety of contexts. The capabilities of the wearable Sociometric badge are described and a picture of the badge is shown in Figure 2. Below, we describe in detail how we use the Sociometric badge to detect face-to-face interactions and physical proximity.

Infrared (IR) can be used to detect face-to-face interactions between people. In order for one badge to be detected by another through IR, two Sociometric badges must have a direct line of sight to each other. Every time a badge detects an IR signal, we say that a face-to-face interaction may have occurred. We define the total amount of face-to-face interaction time per person as the total number of consecutive IR detections per contact multiplied by the IR transmission interval, which in our experiments was two seconds (one transmission every two seconds). We find that this gives a good balance between detection accuracy and power expenditure. We can detect whether a person is speaking through the tonal variations captured through the badge. With both voice and IR detection, we can predict reasonably well whether two people are having a conversation. More details about the Sociometric badges can be found in the research note accompanying this article (Waber et al. 2010).

Measuring Physical Proximity and Location

Sociometric badges can detect other badges in close proximity (within 10 meters) in an omni-directional fashion using the badge's radio. To record the location of an individual, base stations with overlapping ranges are used to triangulate the wearable badge's position down to the sub-meter level. The detailed technical description for the location sensing capabilities can be found at http://hd.media.mit.edu/badges/.
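The interaction-time definition given earlier in this section (consecutive IR detections per contact multiplied by the two-second transmission interval) can be illustrated with a minimal sketch. It assumes, hypothetically, that one badge's IR log is available as (contact_id, timestamp_in_seconds) pairs, and that a detection counts as "consecutive" when another detection of the same contact lies within one interval plus a small tolerance. The function name, log format and tolerance are illustrative assumptions, not the badge's actual data format or firmware logic.

```python
from collections import defaultdict

IR_INTERVAL_S = 2.0  # from the text: one IR transmission every two seconds


def interaction_seconds(detections, interval=IR_INTERVAL_S, tolerance=0.5):
    """Estimate face-to-face time per contact from one badge's IR log.

    detections: iterable of (contact_id, timestamp_s) pairs (hypothetical format).
    A detection is counted only if it is adjacent in time (within one interval
    plus a tolerance) to another detection of the same contact; isolated
    detections are ignored. Both choices are assumptions of this sketch.
    """
    by_contact = defaultdict(list)
    for contact, ts in detections:
        by_contact[contact].append(ts)

    max_gap = interval + tolerance
    totals = {}
    for contact, stamps in by_contact.items():
        stamps.sort()
        consecutive = 0
        for i, ts in enumerate(stamps):
            near_prev = i > 0 and ts - stamps[i - 1] <= max_gap
            near_next = i + 1 < len(stamps) and stamps[i + 1] - ts <= max_gap
            if near_prev or near_next:
                consecutive += 1
        # Each counted detection stands for one transmission interval.
        totals[contact] = consecutive * interval
    return totals


# Three back-to-back detections of badge B, one stray detection of B a minute
# later, and a single isolated detection of badge C.
log = [("B", 0.0), ("B", 2.0), ("B", 4.0), ("C", 10.0), ("B", 60.0)]
result = interaction_seconds(log)
```

Under these assumptions, only the run of three detections of B contributes interaction time; the stray detections are treated as line-of-sight noise rather than conversation.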
The Advantage of Recording Precise Face-to-Face Social Networks

Using Sociometric badges, we can accurately detect face-to-face interactions and construct real face-to-face social networks for a group of people over time. Traditionally, self-reports such as surveys and questionnaires are used as the primary tool for generating social network data. Even then, to make survey results more accurate and reliable, subjects are often surveyed repeatedly, and questions can be difficult to answer because they require subjects to recall specific events in the past (Marsden, 1990). All this can entail a considerable burden to the subject, leading to low participation and high drop-off rates. Sociometric badges, on the other hand, can record accurate social network data while minimizing the cost and burden incurred by participants (participants simply wear a badge that is slightly larger than a typical identification card). Furthermore, the badge does not record actual conversations and data is de-identified prior to analysis, preserving the privacy of employees.

Precise measurements of networks may not be as critical for studies of social influence, attitude and opinions as are cognitive networks obtained through self-report. However, obtaining accurate knowledge of precise, time-stamped interaction data is critical for studies of contagion (Centola, 2007) and information transfer and diffusion (Marsden, 1990), which is the focus of this study. As valuable information can pass through any type of relationship, it is crucial to record all instances of information exchange instead of just those recalled in self-reports. Respondents are generally good at remembering recent, frequent, and intense interactions but poor at recalling their interactions with weak and distant ties. Thus, survey methods tend to reveal strong, close relations as opposed to interactions with distant or weak ties.
For example, Brewer (2000) reported an appreciable level of omission of weak ties in a dormitory friendship study. Brewer (2000) recommends a few steps to reduce the level of forgetting, including asking subjects to choose their friends and relations from a list rather than asking them to recall contacts directly from memory. Using multiple name generators can also be helpful, as friends forgotten in one generator can be named in response to other generators. While these methods can alleviate some of the bias introduced by self-reports, data collection may still be problematic since certain network measures can change dramatically with even a few missing ties (Borgatti, Carley and Krackhardt, 2006). Those missing ties could pass important information and failure to capture them can be problematic.

Recent developments in data collection methodologies have advanced the accuracy of network measurements to include weak and distant ties as well as strong ties. These methods include position generators (Lin, Fu and Hsung, 2001) and resource generators (Van der Gaag and Snijders, 2005) that simply ask the subject to recall the number of people she knows in a particular context instead of naming specific individuals. However, without knowing the specific alters, it is hard to construct a network beyond one degree of separation. Without a complete network, it is difficult to calculate structural properties such as structural holes and centralities, which are essential for the study of information transfer and diffusion (Marsden, 1999). Furthermore, Marsden (1990, 2005) finds that while respondents are capable of reporting on their local networks in general terms, they are typically unable to give useful data on the exact timing of interactions.
Participants tend to respond to surveys in ways that make them look as good as possible and consequently they tend to under-report behaviors deemed inappropriate and to over-report behaviors viewed as appropriate (Donaldson and Grant-Vallone, 2002). Sociometric badges can objectively measure the actual occurrence of face-to-face interactions and record precisely when, between whom and for how long two people spoke, greatly minimizing reporting biases (Olguin Olguin et al., 2009).

Network Variable Construction

We measure Network Size as the number of direct contacts one has. In face-to-face networks, a direct link between two actors exists when they engage in at least one conversation during the experimental period. Physical proximity networks, on the other hand, are a broader measure of direct links where network size counts an interaction between actors when they either engaged in a conversation or were physically within ten meters of each other. The Volume of Interactions measures the total number of interactions an actor has with anyone in the network. This differs from network size as it counts all communication incidents regardless of with whom the actor has interacted. For example, an actor who communicates 100 times with a single person in the network would have the same volume of interactions as someone who communicates with 100 different people once each. The network size in the former case is one, but in the latter case is 100. While both variables measure the number of direct interactions between actors, network size may have a stronger effect than the volume of interactions in enabling workers to access and transfer complex information. Since a high volume of interaction may involve only a small group of actors, frequent interaction with the same person may be redundant and may not add value for knowledge transfer. The definitions of all the network variables are shown in Table 1.
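The distinction between network size and volume of interactions drawn above can be made concrete with a small sketch. It assumes, hypothetically, that each recorded conversation is available as an undirected (actor, actor) pair; the function name and data layout are illustrative only.

```python
from collections import Counter


def size_and_volume(interactions, actor):
    """Return (network size, interaction volume) for one actor.

    interactions: iterable of (a, b) pairs, one per recorded conversation
    (hypothetical format; pairs are treated as undirected).
    Network size counts distinct contacts; volume counts every interaction.
    """
    partners = Counter()
    for a, b in interactions:
        if a == b:
            continue  # ignore malformed self-interactions
        if a == actor:
            partners[b] += 1
        elif b == actor:
            partners[a] += 1
    return len(partners), sum(partners.values())


# The example from the text: 100 conversations with one colleague versus
# one conversation with each of 100 colleagues.
repeat_log = [("x", "y")] * 100
broad_log = [("z", f"c{i}") for i in range(100)]

repeat_stats = size_and_volume(repeat_log, "x")
broad_stats = size_and_volume(broad_log, "z")
```

Both actors have the same volume of interactions, but their network sizes differ by two orders of magnitude, which is why the two variables are entered separately.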
Table 1: Network Characteristics and Description

Network Characteristics    Description
Network Size               The total number of contacts with whom an actor exchanges at least one message
Volume of Interactions     The total number of face-to-face interactions an actor experiences
Betweenness Centrality     The probability that an actor falls on the shortest path between any two other actors
Cohesion (Constraint)      Degree to which an actor's contacts are connected to each other
Reach                      The number of other people an actor can reach in two links or less

Tie Strength is measured using the frequency of one's face-to-face communication with other actors. Granovetter (1982) described four identifying properties of the strength of ties as time, emotional intensity, intimacy and reciprocity. In practice, tie strength has been measured in many ways. Some use reciprocation to represent strong ties and a lack of reciprocation as evidence of weak ties (Friedkin, 1980). Others have included the recency of contact (Lin et al., 1978) or the frequency of interactions as a surrogate for tie strength (Granovetter, 1973), and we adopt a similar measure here. We define actor A to have a strong tie to actor B when the frequency of communication between A and B exceeds A's average communication frequency to everyone else in the network by more than one standard deviation:

$$\mathrm{StrongTie}(A,B) = 1 \quad \text{if} \quad \mathrm{freq}(A,B) > \mathrm{average}(\mathrm{freq}(A,\cdot)) + \mathrm{std}(\mathrm{freq}(A,\cdot))$$

Table 2: Summary Statistics, F2F Network Variables

Variable          Obs.   Mean       Std. Dev.   Min    Max
Latent Network Variables
Interactions      931    526.62     421.52      156    2701
Network size      931    11.44      3.47        1      20
Betweenness       931    1.49       1.38        0      8.95
Constraint        931    0.53       0.19        0      1.33
2-step reach      931    86.73      7.52        0      94.44
3-day Network Variables
Interactions      132    30.9697    48.84059    0      287
Network size      132    3.666667   3.513695    0      13
Betweenness       132    1.648174   2.453405    0      10.729
Constraint        132    0.695031   0.268236    0.25   1.9
2-step reach      132    33.93879   28.11835    0      88.57

We use Network constraint, $C_i$, to measure the cohesion of the network in which an actor is embedded. Constraint measures the degree to which an individual's contacts are connected to each other. $p_{ij}$ is the proportion of i's network time and energy invested in communicating with j. Network constraint can be used as a proxy for measuring network cohesion (Burt, 1992). In the hypothetical network in Table 3, $C_{12}$ is much higher than $C_7$, because friends of actor 12 are more likely to be friends with each other than friends of actor 7. We construct network characteristics for both face-to-face and physical proximity. The network topologies are shown in Figures 3 and 4, and summary statistics are provided in Table 2.

Network Reach measures the degree to which any member of a network can reach everyone else in the network. We measure 2-step reach, which calculates the number of actors that an individual can reach in the network in 2 steps, because our network is small enough that all actors are able to connect to everyone else in the network in three steps or less. Actor 7, located in the center of the hypothetical network in Table 3, can reach eight other employees in two steps and therefore has a higher network reach than actor 12, who can only reach five others.
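The three measures defined above (strong ties, network constraint and 2-step reach) can be sketched on a toy network. The sketch rests on several assumptions that are not stated in the text: interaction frequencies are stored as a symmetric nested dictionary (a hypothetical layout), the population standard deviation is used for the strong-tie cutoff, and constraint follows the standard Burt (1992) form, $C_i = \sum_{j \neq i} (p_{ij} + \sum_{q \neq i,j} p_{iq} p_{qj})^2$, summed over actor i's direct contacts.

```python
from statistics import mean, pstdev


def strong_ties(freq, a):
    """Contacts b of a with freq(a, b) > mean + one std of a's frequencies.

    freq: dict mapping actor -> {contact: interaction count} (hypothetical,
    symmetric). Using the population std is an assumption of this sketch.
    """
    counts = list(freq[a].values())
    if not counts:
        return set()
    cutoff = mean(counts) + pstdev(counts)
    return {b for b, n in freq[a].items() if n > cutoff}


def constraint(freq, i):
    """Burt's network constraint for actor i (standard form, assumed here)."""
    def p(x, y):
        # Share of x's total interaction volume that goes to y.
        total = sum(freq[x].values())
        return freq[x].get(y, 0) / total if total else 0.0

    score = 0.0
    for j in freq[i]:
        indirect = sum(p(i, q) * p(q, j) for q in freq[i] if q != j)
        score += (p(i, j) + indirect) ** 2
    return score


def two_step_reach(freq, i):
    """Number of other actors reachable from i in at most two links."""
    direct = set(freq[i])
    second = {k for j in direct for k in freq[j]}
    return len((direct | second) - {i})


# Toy data: a closed triangle (a, b, c) and a hub h whose three contacts
# are not connected to one another.
freq = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1},
    "h": {"l1": 8, "l2": 1, "l3": 1},
    "l1": {"h": 8},
    "l2": {"h": 1},
    "l3": {"h": 1},
}
```

In this toy network the triangle member is more constrained than the hub, mirroring the contrast between actors 12 and 7 in the hypothetical network, and only the hub's most frequent contact qualifies as a strong tie.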
Table 3: Network Measures for a Hypothetical Network

Measure                     Actor 7    Actor 12
Size (direct contacts)      4          3
Betweenness                 33         6
Reach (indirect contacts)   67%        41%
Constraint                  0.47       0.84

Control Variable Construction

Other network characteristics may influence information sharing and work performance. Betweenness centrality $B(n_i)$ has been widely used to measure how fast a person can obtain information in a network. It measures the probability that an individual i will fall on the shortest path between any two other individuals in a network (Freeman, 1979), where $g_{jk}(n_i)$ is the number of shortest geodesic paths from j to k that pass through node $n_i$, while $g_{jk}$ is the number of shortest geodesic paths from j to k:

$$B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$$

As shown in the hypothetical network in Table 3, actor 7 is located in a relatively more central position than actor 12. As actor 7 is closer to three different groups of actors, her betweenness centrality is much higher than that of actor 12. In addition to network structure, we posit two broad categories of factors that may influence the task completion rate: characteristics of tasks and of individual workers.

Characteristics of Tasks

As harder tasks take longer to finish, task difficulty is strongly correlated with time to completion. We include two controls for task complexity: task difficulty and the number of follow-ups. Managers determine the task difficulty based on the initial request and parameters of the job and assign one of three difficulty levels to each task: basic, complex, or advanced. These difficulty ratings can be revised during task execution, although most task complexity scores are never modified. Instead, another metric, the number of follow-ups, is used to approximate the complexity level of the task during execution. When tasks are particularly complex, the number of follow-ups between the IT worker and the sales team increases.
We therefore include controls for both the assigned (and revised) task complexity and the number of follow-ups that occurred during the project. Although managers assign one of three complexity levels to all tasks, Basic (Low Complexity), Complex (Medium Complexity) and Advanced (High Complexity), the majority of tasks performed in our sample are basic tasks. Basic tasks can be completed quickly, usually in one day, since these jobs are generally straightforward and routine tasks that do not require any advanced technical skills. Often these basic configurations are components of a larger system and workers only need local knowledge about the component, rather than of the entire system, to successfully configure the product. To complete these tasks, workers can use simple off-the-shelf configurations or follow detailed instructions already created by the sales team or the client. Even if they encounter technical difficulties during the task, IT workers can find most of the solutions in existing manuals or knowledge databases. Rarely do they need to consult others to solve these problems. Thus, completing simple tasks usually only requires codified and context-independent information. The difficulty of a complex task lies between that of basic and advanced tasks. In our sample, fewer than 10% of the tasks are labeled as complex tasks, while the rest are labeled as either advanced or basic.

Advanced tasks are the most difficult tasks assigned by the manager. They are often novel and technologically complex configurations that require more advanced system knowledge. Special and customized orders to build an entire hardware system are typical examples of these tasks. To design these configurations, the IT worker needs to understand the entire system, especially the compatibility of various components within the system design. Solutions in such configuration tasks are usually new innovations that cannot be found in existing manuals or databases.
These tasks typically take longer to complete than simple routine tasks and the consultant often must confer with other team members to create a viable solution. Some tasks are classified as advanced not necessarily because of their technical difficulty but because the task description is vague. In addition, customers may impose a budget constraint as well. Putting together a system with a set of functionalities under the budget can be challenging and sometimes infeasible. When it is impossible to meet the customer demand, the configuration specialist often contacts the sales representative to clarify which of the customer's requirements are absolutely necessary. The sales team would then work with the customer to revise the requirements or the budget before the IT worker can complete the order. Sometimes, the customer's specifications may have errors and the IT worker would also need to contact sales to verify and modify the existing plan. To keep track of these exchanges between sales and the IT worker, we measured the number of follow-ups in each configuration task, which provides another proxy for task complexity as measured during task execution. Thus, completing advanced tasks and tasks that require frequent follow-ups would take longer than simple tasks, as advanced tasks require the transfer of tacit and embedded knowledge.

The correlation between the task difficulty assigned by the manager and the number of follow-ups is 0.7 and their Cronbach's alpha is 0.75. We therefore aggregate task difficulty level and the number of follow-ups into a single construct to measure task complexity. We create the task complexity variable by first demeaning each variable and then dividing each variable by its standard deviation (Norm). We then normalize the sum of these variables to construct the overall task complexity.
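The standardize-then-sum construction just described can be sketched as follows. The sketch assumes difficulty is coded numerically (1 = basic, 2 = complex, 3 = advanced) and uses the population standard deviation; the coding, the sample values and the function names are illustrative assumptions, not the study's data.

```python
from statistics import mean, pstdev


def norm(values):
    """Demean a variable and divide by its standard deviation (z-score).

    Assumes the variable is not constant, so the standard deviation is nonzero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def task_complexity(difficulty, follow_ups):
    """Norm(Norm(difficulty) + Norm(follow_ups)), one score per task."""
    summed = [d + f for d, f in zip(norm(difficulty), norm(follow_ups))]
    return norm(summed)


# Hypothetical tasks: difficulty coded 1 = basic, 2 = complex, 3 = advanced.
difficulty = [1, 1, 1, 2, 3, 3]
follow_ups = [0, 2, 3, 5, 9, 12]
scores = task_complexity(difficulty, follow_ups)
```

Standardizing each component before summing keeps the follow-up count, which has a much wider range than the three-level difficulty rating, from dominating the composite.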
Complexity = Norm(Norm(TaskDifficulty) + Norm(NumberOfFollowUps))

Characteristics of Individuals

We included controls for human capital using functional titles that classify employees into three categories: manager, pricing strategist and configuration specialist. While managers may be knowledgeable about the entire system, they are less likely to be intimately familiar with day-to-day configuration routines. The primary role of a pricing strategist is to determine whether the pricing is feasible and correct based on the requirements, but pricing strategists perform some configuration tasks as well. Among the three types of workers in our sample, the configuration specialist is the most prepared to execute the configuration, and we expect configuration specialists to complete tasks more quickly and accurately. Our interviews indicate that almost all workers have at least a Master's degree in a relevant field and all had joined this particular division less than a year before the start of the study. Thus, there is little variation in education level, experience or tenure within the group.

Table 4: Summary Statistics for Worker and Task Characteristics

Variable                          Obs.    Mean        Std. Dev.   Min   Max
Task Completion Time (minutes)    1201    515.9159    968.8949    1     12281
Functional Title                  1157    1.57822     .502736     1     3
Task Complexity                   1201    1.437968    .761374     1     3
Number of Follow-ups              1217    4.612983    3.270522    0     21
Voice Animation                   931     6703509     6056288     85    2.89E+07

Table 5: Pairwise Correlations Between Independent Variables for the F2F Network

Function Complexity Follow up Animation Volume Direct links Btw Constraint Reach
Function Complexity 1.00 -.52 1.00 Follow up Animation Interactions -.52 -.14 -.17 .42 .04 .12 1.00 .03 .13 Size -.31 .16 .26 Btw Constraint -.27 -.28 .16 .13 .17 .17 Reach -.07 .04 .16 1.00 .25 .43 .43 1.00 .62 1.00 -.17 .71 -.28 .87 -.44 1.00 -.38 1.00 .38 .15 .69 .54 -.46 1.00

To mitigate the lack of complete demographic data on workers, we infer some worker characteristics from the badge data.
By measuring the tonal variance of workers, we can infer how animated a person is (Pentland, 2006). The animation of a worker's voice may give us indications about his general enthusiasm or motivation (Basu, 2002; Pentland, 2006). We also employ fixed effects specifications (described below) to control for time-invariant characteristics of individuals. Summary statistics and correlations of all variables are listed in Tables 4 and 5.

Empirical Methods

Combining task performance and network data, we empirically test whether face-to-face and proximity networks are correlated with productivity and performance. Time to task completion measures how fast a person can finish a given task and, based on our interviews, speed is a good measure of work performance in this setting. The accuracy or quality of configurations is also an important measure, but only 20 of the 1217 tasks in our sample contain detectable errors, and 90% of those errors were due to server configuration issues that are largely outside the control of individual workers. Since the majority of the tasks are completed correctly, completion time is a good metric for work performance. Although multitasking can increase total task throughput and could confound the use of duration as the only performance measure (Aral et al., 2007), in this setting multitasking is not possible since tasks are assigned to workers one at a time. Consequently, task duration provides a good overall measure of work performance that is also relied on by the firm to evaluate employees. Since our dependent variable is the number of minutes it takes to complete a task, we specify a duration model. We use a hazard rate model of the likelihood of a project completing at time t, conditional on it not having been completed earlier.
The Cox proportional hazards model is used to examine the effect of network characteristics on the project completion rate as follows:

HazardRate(R) = f(size, betweenness, cohesion, reach, strong ties, complexity, job title)

$R(t) = r(t)^{b} e^{\beta X}$

where $R(t)$ represents the project completion rate, $t$ is project time in the risk set, and $r(t)^{b}$ is the baseline completion rate when all the independent variables are set to zero. In this model, the effects of independent variables are specified in the exponential power, where $\beta$ is a vector of estimated coefficients on a vector of independent variables $X$. The reported coefficients are hazard ratios, $e^{\beta}$, which have a straightforward interpretation: $|e^{\beta} - 1|$ represents the percentage increase (or decrease) in the project completion rate associated with a one-unit increase in the independent variable, depending on whether $e^{\beta} - 1$ is positive (or negative). We tested this specification using both face-to-face and proximity-based network characteristics (Figures 3 and 4). The thickness of the lines in the graphs indicates the number of interactions between workers. As shown in the figures, there are more interactions between workers in the physical proximity-based network than in the face-to-face network, because people engaged in conversation are by definition close to each other, whereas two people who are not talking could still be in close proximity. Therefore, the face-to-face network is a subset of the proximity network.

Figure 3: Face-to-Face Conversation Network
Figure 4: Weighted F2F and Physical Proximity Network

We test the effects of four face-to-face network attributes on the speed of task completion: size, volume, cohesion and reach, controlling for betweenness centrality and various task and worker characteristics. First, we use a single pooled cross-sectional network over the entire experimental period to compute network variables.
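As a rough illustration of how such network variables might be computed from a pooled interaction log, the sketch below derives size, volume, constraint and two-step reach for each worker. The edge list is hypothetical, the constraint formula is Burt's (1992) standard definition, and measuring reach as the share of other workers reachable within two steps is an assumption, since the exact operationalization is not given here.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected weighted adjacency map from (a, b, interaction_count) triples."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w
    return adj

def size(adj, i):
    """Number of distinct direct contacts."""
    return len(adj[i])

def volume(adj, i):
    """Total number of interactions across all contacts."""
    return sum(adj[i].values())

def constraint(adj, i):
    """Burt's constraint: sum over j of (p_ij + sum_q p_iq * p_qj) ** 2."""
    def p(a, b):
        total = sum(adj[a].values())
        return adj[a].get(b, 0) / total if total else 0.0
    result = 0.0
    for j in adj[i]:
        indirect = sum(p(i, q) * p(q, j) for q in adj[i] if q != j)
        result += (p(i, j) + indirect) ** 2
    return result

def two_step_reach(adj, i):
    """Share of other nodes reachable within two steps (assumed definition)."""
    reached = set(adj[i])
    for j in adj[i]:
        reached.update(adj[j])
    reached.discard(i)
    return len(reached) / (len(adj) - 1)

# Hypothetical pooled interaction counts between five workers.
edges = [("A", "B", 5), ("A", "C", 3), ("B", "C", 2), ("C", "D", 1), ("D", "E", 4)]
adj = build_graph(edges)
```

In this toy network, worker A sits in a closed triad and has higher constraint than worker D, whose two contacts are not connected to each other.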
Constructing a network over the entire period allows us to assess the overall social network that a worker can potentially leverage when completing a task ("Cross-Sectional Network"). In addition, we constructed a set of longitudinal social networks in three-day panels of interaction ("3-Day Networks") that measure interactions over pooled three-day periods and arrange the data as a dynamic panel over three-day cross sections. This allows us to estimate fixed effects models that eliminate omitted heterogeneity from time-invariant individual characteristics that may bias results in cross-sectional data. We estimate hazard rate models at the project level, with the duration of each project as the dependent variable and a project/worker as the observation. We also estimate fixed and random effects models on panel data, with average project duration over three-day spells as the dependent variable and independent variables measured per individual over three-day cross sections of interaction. We use the following linear model to estimate the effect of network structure on work performance in our longitudinal data:

$AvgDuration_{it} = \alpha + \beta_{1} AvgComplexity_{it} + \beta_{2} Network_{it} + \beta_{3} IndividualCharacteristics_{i} + \varepsilon_{it}$

where $i$ indexes workers and $t$ indexes three-day periods. When a coefficient is positive in this model, it means the variable whose parameter is being estimated is associated with more time to complete projects. As a robustness check, we also estimated panel data specifications using spells of different durations (1, 3 and 5 day panels), and the length of the period does not affect our results. As employees work on more than one task during the observation period, standard errors are clustered around individuals.

Results

The Effect of Network Structure on Performance

Model 1 in Table 6 shows the effect of the latent cross-sectional network on worker productivity. Unsurprisingly, complex tasks take longer to complete on average. Tonal variation, a proxy for employees' level of enthusiasm and motivation, has no effect on work performance.
However, it is possible that tonal variation will have an effect if we can disaggregate the data to measure tones in conversations during a given task, although we are currently unable to achieve this level of granularity. As predicted, network cohesion is positively correlated with work performance. Instead of reducing speed and productivity, as shown in email networks (Aral and Van Alstyne, 2009), a one-standard-deviation increase in network constraint in face-to-face networks is associated with doubling the speed of task completion, demonstrating that cohesive ties in face-to-face networks are more highly correlated with productivity than are networks with structural holes. We suspect that the information transmitted in face-to-face networks is inherently different from that which is transferred in other media. It appears that the advantages of using face-to-face communication to transmit complex knowledge are enhanced in cohesive networks, supporting Hypothesis 1a. Similarly, strong ties are also associated with improved work performance. As shown in Model 1, one additional strong tie is associated with a 22.5% increase in productivity controlling for network size, demonstrating that strong ties may be beneficial in transferring information in face-to-face social networks. Interestingly, the total number of face-to-face interactions has a minimal impact on the time to finish a task. However, an additional network contact is associated with a 9% decrease in the average speed of task completion, demonstrating the potential cost in time, effort, and energy of engaging in many face-to-face conversations. These results imply that while having more direct contacts may be harmful for performance, as face-to-face contacts are expensive to maintain, selectively cultivating strong ties may be more beneficial in a face-to-face setting, especially when one needs more tacit and embedded information. These results support Hypothesis 2a.
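The percentage effects quoted above follow directly from the Model 1 hazard ratios in Table 6. A minimal sketch of the conversion, assuming the Cox coefficients are reported as hazard ratios (values above 1 speed completion, values below 1 slow it); the function name is illustrative only:

```python
def percent_change(hazard_ratio):
    """Percent change in the completion rate implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0

# Hazard ratios as reported for Model 1 in Table 6.
model_1 = {
    "network cohesion": 2.075,
    "strong ties": 1.225,
    "network size": 0.901,
}

changes = {name: round(percent_change(hr), 1) for name, hr in model_1.items()}
```

Under this reading, a hazard ratio of 2.075 roughly doubles the completion rate, 1.225 corresponds to the 22.5% increase cited for strong ties, and 0.901 corresponds to a decrease of just under 10% per additional contact.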
Network reach is also positively correlated with the task completion rate. Specifically, a one-percent increase in network reach is associated with a 4% increase in the speed of task completion, confirming Hypothesis 3a. Having broad reach is particularly helpful for locating the source of knowledge and thus can increase the rate of task completion. Although results using the latent cross-sectional network show promising evidence supporting our hypotheses, the results may be driven by unobserved variation in individual characteristics, such as employees' inherent ability or ambition. Without comprehensive demographic data, it may be premature to attribute these performance differences to social networks alone. We therefore used panel data random effects and fixed effects specifications, the latter of which eliminates variance explained by any time-invariant characteristics, such as individual aptitude, that could affect performance. The results are shown in Table 6, Models 2 and 3 respectively. The coefficients from the random effects model are roughly the same as with the cross-sectional network. Although the coefficients in the fixed effects model diminish in magnitude for some network metrics, the signs and statistical significance of those coefficients again corroborate the results, providing further evidence supporting our hypotheses. Network cohesion, as measured by network constraint, continues to have the strongest effect on worker performance even after eliminating time-invariant factors such as individuals' inherent ability. As shown in Model 3 of Table 6, the fixed effects model demonstrates that a one-standard-deviation increase in network constraint is associated with a decrease of 338 minutes in task completion time. The evidence from both longitudinal and cross-sectional networks demonstrates that cohesion in face-to-face networks is correlated with higher productivity, supporting Hypothesis 1a.
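The reason a fixed effects specification removes time-invariant traits such as ability can be seen in a small sketch of the within transformation. The panel below is synthetic and noise-free, the coefficient values are hypothetical, and the estimator is a generic textbook version rather than the code used for Table 6; clustered standard errors are not computed.

```python
import numpy as np

def within_estimator(y, X, groups):
    """Fixed effects (within) estimate: demean y and X by group, then run OLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic panel: 3 workers observed over 4 three-day periods each.
rng = np.random.default_rng(0)
workers = np.repeat(np.arange(3), 4)
ability = np.array([300.0, 0.0, -300.0])[workers]   # unobserved, time-invariant
cohesion = rng.normal(size=12) - ability / 300.0    # correlated with ability
complexity = rng.normal(size=12)
X = np.column_stack([cohesion, complexity])

# Hypothetical true effects: -200 minutes per unit cohesion, +100 per unit complexity.
avg_duration = 500.0 + ability - 200.0 * cohesion + 100.0 * complexity

beta = within_estimator(avg_duration, X, workers)
```

Because each worker's mean is subtracted from every observation, the constant ability term cancels, and the estimator recovers the cohesion and complexity effects even though cohesion is correlated with ability.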
The number of direct contacts has either a limited or a negative effect on the task completion rate in the longitudinal models. Although the total number of face-to-face interactions continues to have minimal impact on the rate of task completion, an additional network contact in the fixed effects model is associated with an increase of 140 minutes in average task completion time. This negative impact of network size on work performance suggests a potential cost in time, effort, and energy to maintain face-to-face relationships. Disruptions during task execution can be especially distracting, as the cognitive cost of task switching can impede the rate of task completion (Aral et al., 2006). A large number of conversations during a task are likely to be unproductive, interruption-driven communication from other colleagues. Since interruptions are costly to workers, we may also expect the total number of conversations, as approximated by network size, to negatively affect work performance. The coefficients on the number of strong ties are not significantly different from zero. Thus we do not see strong evidence that having strong ties is correlated with faster task completion, as we did in the cross-sectional network. Network reach continues to be positively correlated with task completion. A one percent increase in network reach is associated with an 11.96-minute decrease in task completion time. Broader network reach in potential contacts is particularly effective in searching for information and reducing information distortion because an employee can choose the shortest path to task-relevant experts among those potential contacts while performing a particular task. Lastly, when we compare face-to-face networks with physical proximity networks, we see (in Model 4) that most of the coefficients in the physical proximity network are insignificant, demonstrating that face-to-face conversations matter more than physical proximity alone.
Although workers are more likely to turn to a person for information when they are physically proximate (Allen, 1977), we show that being physically close alone is not enough. As information transfer is far more likely to happen during a face-to-face conversation than during times of mere collocation, face-to-face interaction networks are more likely to have an impact on work performance than physical proximity networks. Conley and Udry (2005) find similar results when studying the effect of social networks on the use of fertilizer in Ghanaian pineapple farms. These results also rule out confounding effects of workers being simultaneously exposed to positive environmental factors, like being close to the sales team or sitting in the vicinity of a manager.

Table 6: The Effect of F2F Networks on Work Performance

                           Model 1              Model 2                 Model 3                 Model 4
Network Type               Cross-sectional      Panel: 3-Day            Panel: 3-Day            Physical Proximity
Dependent Variable         Completion Rate      Average Duration        Average Duration        Completion Rate
                           Hazard               Linear Random Effects   Linear Fixed Effects    Hazard
Task Complexity            .565*** (.027)       270.8*** (68.01)        280.2*** (74.81)        0.863*** (.012)
Configuration Specialist   2.328*** (.230)      -494.7** (197.1)        --                      2.047*** (.190)
Tonal Variation            1.000 (6.63e-09)     -2.65e-06 (9.13e-06)    --                      1.000 (.0001)
Interactions Volume        1.000 (.0001)        -3.163** (1.469)        -2.311 (1.692)          1.000 (.0001)
Network Size               .901*** (0.026)      142.9*** (49.87)        140.8** (54.19)         1.002 (.017)
Network Cohesion           2.075** (.603)       -264.0* (134.0)         -338.0* (196.0)         .855** (.047)
Betweenness Centrality     1.135** (.067)       -62.78** (28.49)        -68.20** (30.55)        .959 (.049)
2 Step Reach               1.037*** (0.010)     -10.84** (4.761)        -11.96** (5.181)        --
Strong Ties                1.225*** (0.092)     54.07 (47.63)           39.14 (26.36)           --
Observations               911                  93                      93                      911

Standard errors in parentheses. *** p<0.001, ** p<0.05, * p<0.1

The Effect of Network Structure on Completing Complex Tasks

As cohesive networks enable more effective transfers of complex knowledge (Reagans and McEvily, 2003; Hansen, 1999), we expect cohesion to be more effective when employees are engaged in complex tasks. Given the cost of face-to-face interactions in time, effort, energy and interruption, we also expect additional face-to-face interactions, especially those with strong ties, to increase the speed of project completion for complex tasks that require more information, advice and tacit guidance from colleagues. For complex tasks, we expect the benefits of face-to-face conversations to outweigh the costs, whereas for simple tasks we expect there to be less benefit to interaction, while still creating costs. However, we do not expect network size to be particularly helpful in completing complex tasks when the strength of ties is low. An additional weakly linked tie is costly to maintain and its ability to transfer complex information may still be limited. A pair joined by a weak tie may not have communicated frequently enough to develop trust and relationship-specific heuristics and thus may continue to have difficulty in articulating complex concepts to each other. Accordingly, we expect that while strong ties may be beneficial for completing complex tasks, network size may have a negative association with the rate at which workers complete these tasks. To test these expectations, we add interaction terms between task complexity levels and various network measures. The results in Table 7 lend broad support to our hypotheses. In the cross-sectional network, we find that network cohesion continues to have a positive correlation with the rate at which tasks are completed, and is especially beneficial for completing complex tasks.
The interaction term between task complexity and network cohesion is positive (β=1.796, p<0.1), demonstrating that cohesion in a face-to-face network may facilitate transferring and understanding complex information. We find similar results using the longitudinal network data (Model 3). Using fixed effect specifications, we find that while network cohesion alone is no longer correlated with task completion, the interaction term between cohesion and complexity remains correlated with faster completion time (β=-643, p<0.05, Model 5). Both the cross-sectional and longitudinal results lend support to Hypothesis 1b that network cohesion may be even more effective in completing a complex task that requires tacit or context-dependent information.

Table 7: The Effect of F2F Networks on Work Performance in Complex Tasks

Network type: Cross-sectional Network (Model 1); Panel: 3-Day Networks (Models 2 and 3); Physical Proximity Network (Model 4)
Dep. Var: Completion Rate, Hazard (Model 1); Average Duration, Linear Random Effects (Model 2); Average Duration, Linear Fixed Effects (Model 3); Completion Rate, Hazard (Model 4)
Task Complexity 0.102* (0.120) 515.4 451.4 0.718 (599.7) (0.380) Configuration Specialist 2.494*** -607.8*** (0.263) (211.8) Tonal Variation 1.000 1.25e-06 (7.04e-09) (9.76e-06) Interaction Volume 1.000 (0.000167) -1.257 2.035 (4.754) (5.475) Network Size 0.864*** -103.4 -158.7 (0.0276) (136.7) (153.0) Network Cohesion 2.080** (0.657) 644.5 (780.9) 636.0 Betweenness Centrality 117.7 1.224*** (0.0819) 160.9 (106.2) 2-Step Reach 1.041*** (0.0112) 13.67 Strong Ties 1.293*** (0.102) -447.1* Complexity x Interact. Vol. 1.000 (0.000138) -0.382 -1.769 (2.695) 1.000 (2.374) Complexity x Network Size 1.040 (0.0336) 141.3* 166.2* (78.59) (86.49) 0.999 Complexity x Cohesion 1.796* -642.2* -663.4** (0.623) (370.5) (341.7) 1.070 (0.066) Complexity x Betweenness 0.988 (0.00942) -110.9** -136.0** 0.982 Complexity x Reach (534.7) (91.88) (12.94) (239.5) 2.117*** (0.210) 1.000 (0.000) (904.3) 0.855 (0.100) 0.857 (0.00) -521.8* (273.3) (60.55) 1.012 (0.0136) -15.38** (7.682) -14.89* Complexity x Strong Ties 0.981 (0.0665) 233.4 261.5 (234.9) (156.8) 0.997 (0.042) 12.75 (14.57) (52.72) 1.000 (0.000) (.0001) (0.021) (0.081) (8.550)
Observations 911 93 93 911
Standard errors in parentheses. *** p<0.00, ** p<0.05, * p<0.1

Strong ties, on the other hand, show mixed results. While we expect more strong ties to be positively correlated with work performance, we expect the effect to be more pronounced when doing complex tasks. For complex tasks, the benefit of a strong tie should outweigh the costs of maintaining strong ties. However, in both cross-sectional and longitudinal networks, strong ties are positively correlated with task completion rate but the interaction between task complexity and strong ties is not statistically significant. We expect network size to have a negative association with work performance, especially for completing complex tasks, as the costs of maintaining face-to-face interactions are high but the benefits are low if these contacts are not strongly tied to the actor. In the cross-sectional network, we find that more network contacts are costly for task execution, with an additional interaction correlated with reducing the project completion rate for complex tasks by about 14% (Model 1, Table 7). While the coefficient on the longitudinal network variable is positive, it is short of being statistically significant. However, we find that the interaction of network size and task complexity is correlated with delaying project completion in the longitudinal network (Model 2 and Model 3), supporting Hypothesis 2a. Having more network contacts can actually hurt if none of them have a strong connection to the actor.
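To make the interaction terms concrete: in a proportional hazards model with an interaction, the hazard ratio for a one-unit increase in a network measure at complexity level c is exp(b + b_int * c), which equals the main-effect hazard ratio multiplied by the interaction hazard ratio raised to the power c. The sketch below applies this to the Model 1 cohesion estimates in Table 7. The function name is illustrative, and the complexity levels 0 and 1 are illustrative units only, since the scale on which complexity entered the regression is not restated here.

```python
def hazard_ratio_at(main_hr, interaction_hr, complexity):
    """Hazard ratio for a one-unit increase in a network measure at a given
    complexity level: exp(b + b_int * c) = main_hr * interaction_hr ** c."""
    return main_hr * interaction_hr ** complexity

# Model 1, Table 7: network cohesion and its interaction with task complexity.
cohesion_hr = 2.080
cohesion_x_complexity_hr = 1.796

hr_baseline = hazard_ratio_at(cohesion_hr, cohesion_x_complexity_hr, 0.0)
hr_one_unit_higher = hazard_ratio_at(cohesion_hr, cohesion_x_complexity_hr, 1.0)
```

Under these assumptions, the hazard ratio on cohesion rises from 2.080 at the reference complexity level to about 3.74 for a task one complexity unit higher, which is the sense in which cohesion is "especially beneficial" for complex tasks.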
Strong network contacts can cause unwanted interruptions but may be strong enough to transfer complex information needed for completing difficult tasks. Overall, combining the results on strong ties and network size demonstrates that more interactions with fewer strongly tied connections are the most beneficial for increasing the speed of work. We suspect that employees who seek information from a greater number of colleagues not only experience a cost to those interactions, but are also not finding the information they are looking for and thus are seeking advice from additional colleagues. Our interviews corroborate this interpretation, as employees report having to contact more people when they cannot find the guidance they are seeking. More interactions with the right colleagues are helpful on complex tasks, but seeking advice from many colleagues is not only costly but also signals an inability to find the information necessary to complete the task quickly. It could also be that more interactions with fewer strongly tied colleagues generate a higher degree of mutual understanding and conversational rapport that facilitates more efficient transfers of complex knowledge. Lastly, the physical proximity network displays no significant effect on task completion, suggesting that face-to-face conversations are more important than physical proximity when completing complex tasks.

Discussions and Conclusion

By studying face-to-face interaction networks, we can better understand how the relevant theories interact. Until now, social network theories (e.g. Granovetter, 1973; Burt, 1992) and theories of communication media choice, such as information richness theory (Daft and Lengel, 1987), have been used independently to understand knowledge transfer in information-intensive work. Social network theories explain how network structures co-vary with the diffusion and distribution of information, but largely ignore characteristics of communication channels.
Media-choice theories focus explicitly on communication channel requirements for different types of knowledge transfer but ignore the population-level topology through which information is transferred in a network. We bridge these two sets of theory to understand what types of social structures are most conducive to transferring knowledge and improving performance in face-to-face communication networks, which have rarely been studied in the past. As valuable information passes through verbal exchange, studying face-to-face interaction networks is particularly important for understanding how informal social structures can facilitate work in modern organizations. To capture face-to-face interactions, we used new tools and methodologies to collect precise real-time data on face-to-face interactions in an IT configuration facility. By matching data obtained through the use of wearable Sociometric badges with detailed performance data from the firm's accounting records, we are able to test the effects of real-time face-to-face interaction networks on individual information worker performance. Although detailed data on electronic interactions (e.g. email, phone logs, instant messaging) has become readily available in recent years, our ability to record network data for face-to-face interactions has lagged behind. The tools and methods presented in this paper give researchers important new opportunities for collecting fine-grained data about the flow of information and knowledge through informal channels such as face-to-face interaction in real organizations, opening new avenues for research into social networks, knowledge management and IT use in organizations and elevating data collection on face-to-face networks to the standards of accuracy and precision displayed in electronic communication data. Our research uncovers three main results. First, in face-to-face networks, network cohesion, rather than structural holes, is associated with higher productivity.
We suspect that information transmitted in face-to-face networks is more tacit, complex and embedded than information transferred through electronic channels, and that the advantages of using face-to-face communication to transmit complex knowledge are enhanced by cohesion, which increases norms of trust, effective communication heuristics and absorptive capacity through the provision of multiple perspectives on a problem. Second, we find that cohesion in face-to-face networks is more strongly correlated with performance when the participants are solving complex problems. This suggests that cohesion complements information-rich communication media for the effective transmission of complex tacit knowledge when conducting complex tasks. Third, we find that network size has a negative relationship with task completion, implying that the cognitive cost of interruptions is high during task execution. On the other hand, more interactions with fewer people, as measured by strong ties, speed up project completion for complex tasks, which require more complex information and guidance from colleagues. The explanatory power of cohesion is stronger in face-to-face networks than in physical proximity networks, demonstrating that information flows in actual conversations (rather than mere physical proximity) are driving our results. There are two main limitations of our work. First, we do not know the content of the conversations, so we can only theorize about the types of information transmitted in face-to-face conversation. However, our on-site interviews indicate that the information transferred in face-to-face conversations in this firm may be fundamentally different from that which is transferred in electronic media. Workers tend to seek tacit and context-dependent information when they ask for help in face-to-face conversations.
Second, although our longitudinal models allow us to control for variance explained by any time-invariant characteristics of employees, our results may still be biased by unobserved and time-varying characteristics such as media choice at different points during a task or by endogeneity. Although we do not make causal interpretations of our parameter estimates, we submit that such interpretations are plausible when viewed in light of evidence from interviews and fixed effects analyses, which control for omitted variables that could influence our results. Caveats aside, our results represent some of the first evidence measuring the effects of face-to-face communication networks on information worker performance. Using innovative technology to record face-to-face interactions, we show cohesive networks in a rich communication medium such as face-to-face interaction are associated with higher employee performance. The unique characteristics of face-to-face networks highlight the need to distinguish them from other types of communication networks, particularly when analyzing their effects on productivity and performance.

References

Ancona, D.G., and Caldwell, D.F. 1992. "Demography & Design: Predictors of new Product Team Performance." Organization Science 3(3): 321-341.
Apte, U., and Nath, H. 2004. "Size, structure and growth of the U.S. economy." Center for Management in the Information Economy, Business and Information Technologies Project (BIT) Working Paper.
Aral, S., Brynjolfsson, E., and Van Alstyne, M. 2006. "Information, Technology and Information Worker Productivity: Task Level Evidence." Proceedings of the 27th Annual International Conference on Information Systems, Milwaukee, Wisconsin.
Aral, S., Brynjolfsson, E., and Van Alstyne, M. 2007. "Productivity Effects of Information Diffusion in Networks." Proceedings of the 28th Annual International Conference on Information Systems, Montreal, CA.
Aral, S., and Van Alstyne, M. W. 2009. "Networks, Information & Brokerage: The Diversity-Bandwidth Tradeoff." Available at SSRN: http://ssrn.com/abstract=958158
Argote, L. 1999. "Organizational Learning: Creating, Retaining and Transferring Knowledge." Kluwer Academic Publishers, Boston, MA.
Basu, S. 2002. "Conversational Scene Analysis." PhD thesis, Massachusetts Institute of Technology, Media Laboratory.
Bateson, G. 1973. Steps Toward an Ecology of Mind. London: Paladin Press.
Boorman, S. A. 1975. "A combinatorial optimization model for transmission of job information through contact networks." Bell Journal of Economics 6(1): 216-249.
Brass, D. J., and M. E. Burkhardt. 1992. Centrality and power in organizations. N. Nohria and R. G. Eccles, eds. Networks and Organizations: Structure, Form and Action. HBS Press, Boston, MA, 191-215.
Burt, R. 1987. "Social Contagion & Innovation: Cohesion versus Structural Equivalence." American Journal of Sociology (92): 1287-1335.
Burt, R. 1992. "Structural Holes: The Social Structure of Competition." Harvard University Press, Cambridge, MA.
Burt, R. 1997. "The Contingent Value of Social Capital." Administrative Science Quarterly 42(2).
Burt, R. 2004. "Structural Holes & Good Ideas." American Journal of Sociology (110): 349-99.
Cheepen, C. 1988. "The predictability of informal conversation." London: Pinter.
Coleman, J.S. 1988. "Social Capital in the Creation of Human Capital." American Journal of Sociology (94): S95-S120.
Cohen, W., and Levinthal, D. 1990. "Absorptive Capacity: A New Perspective on Learning and Innovation." Administrative Science Quarterly 35(1): 128-152.
Conley, T.G., and Udry, C.R. 2005. "Learning about a New Technology: Pineapple in Ghana." Yale University Working Paper.
Contu, A., and Willmott, H. 2003. "Re-embedding situatedness: The importance of power relations in learning theory." Organization Science 14(3): 283-296.
Csikszentmihalyi, M. 1996. "Creativity: Flow and Psychology of Discovery and Invention." New York: Harper Collins.
Cummings, J., and Cross, R. 2003. "Structural properties of work groups and their consequences for performance." Social Networks 25(3): 197-210.
Donaldson, S., and Grant-Vallone, E. 2002. "Understanding Self-Report Bias in Organizational Behavior Research." Journal of Business and Psychology 17(2): 245-260.
Daft, R. L., and Lengel, R. H. 1986. "Organizational Information Requirements, Media Richness and Structural Design." Management Science 32(5): 554-571.
Eisenhardt, K., and Tabrizi, B. 1995. "Accelerating adaptive processes: Product innovation in the global computer industry." Administrative Science Quarterly (40): 84-110.
Freeman, L. 1979. "Centrality in social networks: Conceptual clarification." Social Networks 1(3): 215-234.
Gargiulo, M., and Rus, A. 2002. "Access and mobilization: Social capital and top management response to market shocks." Working paper, INSEAD.
Gilovich, T. 1991. How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life. Free Press, New York.
Goffman, E. 1959. The Presentation of Self in Everyday Life. New York: Doubleday.
Goffman, E. 1982. Interaction Rituals: Essays on Face-to-Face Behavior. New York: Pantheon Books.
Granovetter, M. 1973. "The strength of weak ties." American Journal of Sociology 78(6): 1360-1380.
Granovetter, M. 1982. "The strength of weak ties: A network theory revisited." In P. V. Marsden and N. Lin (eds.), Social Structure and Network Analysis: 105-130.
Granovetter, M. 1985. "Economic Action & Social Structure: The Problem of Embeddedness." American Journal of Sociology (91): 1420-1443.
Granovetter, M. 1992. "Problems of Explanation in Economic Sociology." In N. Nohria & R. G. Eccles (eds.), Networks & Organizations: 25-56. Harvard Business School Press, Boston.
Grant, R. 1996. "Prospering in dynamically-competitive environments: Organizational capability as knowledge integration." Organization Science 7(4): 375-387.
Hinds, P. J., and Mortensen, M. 2005. "Understanding Conflict in Geographically Distributed Teams: The Moderating Effects of Shared Identity, Shared Context, and Spontaneous Communication." Organization Science 16(3): 290-307.
Hansen, M. 1999. "The search-transfer problem: The role of weak ties in sharing knowledge across organization subunits." Administrative Science Quarterly 44(1): 82-111.
Holtgraves, T. 2002. Language as Social Action: Social Psychology and Language Use. Taylor & Francis.
Huang, Y., Zhu, M., Wang, J., Pathak, N., Shen, C., Keegan, B., Williams, D., and Contractor, N. 2009. "The Formation of Task-oriented Groups: Exploring Combat Activities in Online Games." International Conference on Computational Science and Engineering, pp. 122-127.
Kramer, R. M. 1999. "Trust and Distrust in Organizations: Emerging Perspectives, Enduring Questions." Annual Review of Psychology (50): 569-598.
Markus, M. L. 1994. "Electronic Mail as the Medium of Managerial Choice." Organization Science 5(4): 502-527.
Marsden, P. 1990. "Network Data & Measurement." Annual Review of Sociology (16): 435-463.
McCain, B., O'Reilly, C., and Pfeffer, J. 1983. "The effects of departmental demography on turnover: The case of a university." Academy of Management Journal (26): 626-641.
Mergel, I., Lazer, D., and Binz-Scharf, M. C. 2008. "Lending a helping hand: voluntary engagement in knowledge sharing." International Journal of Learning and Change 3(1): 5-22.
Nelson, R., and Winter, S. 1982. An Evolutionary Theory of Economic Change. Cambridge, MA: Belknap Press.
Nohria, N., and Eccles, R. 1992. Networks and Organizations: Structure, Form and Action. Boston: Harvard Business School Press.
Nonaka, I. 1990. "Redundant, overlapping organization: A Japanese approach to managing the innovation process." California Management Review, Spring, pp. 27-38.
Nonaka, I. 1994. "A Dynamic Theory of Organizational Knowledge Creation." Organization Science 5(1): 14-37.
O'Reilly, C.
1980. "Individuals and Information Overload in Organizations: Is More Necessarily Better?" The Academy of Management Journal 23(4): 684-696.
O'Reilly, C., Caldwell, D., and Barnett, W. 1989. "Work group demography, social integration, and turnover." Administrative Science Quarterly (34): 21-37.
Olguin-Olguin, D., Waber, B. N., Kim, T. J., Mohan, A., Ara, K., and Pentland, A. 2009. "Sensible Organizations: Technology and Methodology for Automatically Measuring Organizational Behavior." IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 39(1): 43-55.
Olguin-Olguin, D., and Pentland, A. 2006. "Human Activity Recognition: Accuracy across Common Locations for Wearable Sensors." IEEE 10th International Symposium on Wearable Computing (Student Colloquium Proceedings). Montreux, Switzerland.
Pentland, A. 2006. "Automatic mapping and modeling of human networks." Physica A: Statistical Mechanics and its Applications.
Podolny, J., and Baron, J. 1997. "Resources and relationships: Social networks and mobility in the work-place." American Sociological Review 62(5): 673-693.
Polanyi, M. 1966. The Tacit Dimension. New York: Anchor Day Books.
Quintane, E., and Kleinbaum, A. 2008. "Mind Over Matter? E-mail and Survey as Representations of Observed and Perceived Networks." International Social Network Conference, St. Pete Beach, Florida, USA.
Reagans, R., and McEvily, B. 2003. "Network Structure & Knowledge Transfer: The Effects of Cohesion & Range." Administrative Science Quarterly (48): 240-67.
Reagans, R., and Zuckerman, E. 2001. "Networks, diversity, and productivity: The social capital of corporate R&D teams." Organization Science 12(4): 502-517.
Slaughter, S., and Kirsch, L. 2006. "The Effectiveness of Knowledge Transfer Portfolios in Software Process Improvement: A Field Study." Information Systems Research 17(3): 301-320.
Sparrowe, R., Liden, R., Wayne, S., and Kraimer, M. 2001. "Social networks and the performance of individuals and groups." Academy of Management Journal 44(2): 316-325.
Sproull, L., and Kiesler, S. 1986. "Reducing Social Context Cues: Electronic Mail in Organizational Communication." Management Science 32(11): 1492-1512.
Trevino, L. K., Daft, R. L., and Lengel, R. H. 1991. "Understanding Managers' Media Choices: A Symbolic Interactionist Perspective." In J. Fulk and C. Steinfield (eds.), Organizations and Communication Technology, Newbury Park, CA: Sage, 71-94.
Teece, D. J. 1986. "Profiting from technological innovation: Implications for integration, collaboration, licensing and public policy." Research Policy (15): 285-305.
Waber, B. N., Olguin Olguin, D., Kim, T., Mohan, A., Ara, K., and Pentland, A. 2007. "Organizational Engineering using Sociometric Badges." International Conference on Network Science, New York, NY.
Walther, J. 1995. "Relational Aspects of Computer-Mediated Communication: Experimental observations over time." Organization Science 6(2).
Winter, S. 1987. "Knowledge and competence as strategic assets." In David J. Teece (ed.), The Competitive Challenge: 159-184. Cambridge, MA: Ballinger.
Uzzi, B. 1996. "The Sources and Consequences of Embeddedness for the Economic Performance of Organizations: The Network Effect." American Sociological Review (61): 674-98.
Uzzi, B. 1997. "Social Structure and Competition in Interfirm Networks: The Paradox of Embeddedness." Administrative Science Quarterly (42): 35-67.
Zenger, T., and Lawrence, B. 1989. "Organizational demography: The differential effects of age and tenure distributions on technical communication." Academy of Management Journal (32): 353-376.