Emergence of Developer Teams in the Collaboration Network
Bora Caglayan¹, Ayşe Başar Bener², Andriy Miranskyy³
¹Bogazici University, Istanbul, Turkey, bora.caglayan@boun.edu.tr
²Ryerson University, Toronto, ON, Canada, ayse.bener@ryerson.ca
³IBM Toronto Software Laboratory, Toronto, ON, Canada, andriy@ca.ibm.com
Abstract: Developer teams may emerge naturally in large software projects, independent of managerial decisions, organizational structure, or work locations. Such self-organized collaboration teams of developers can be traced from source code repositories. In this paper, we identify the developer teams in the collaboration network of a large, globally developed, commercial software product in order to present the evolution of work teams and the factors that affect team stability. Our findings indicate that: a) the number of collaboration teams does not change over time, b) the size of the collaboration teams increases over time, c) team activity is not related to team size, and d) factors related to team size, location and activity affect the stability of teams over time.
I. INTRODUCTION
Developing large software is beyond the technical capabilities of a single person or a few people, so large software is built through the collaboration of many developers. Over the course of such collaboration, the formation of distinct collaboration communities (teams) is inevitable. It is hard to track the underlying source-code-level team structure only by examining the organizational structure of the developer groups. The increasing popularity of distributed developer teams with the emergence of global software engineering has made tracking the collaboration communities among developers even more difficult [1], [2]. Developer collaboration teams may form over time through a combination of causes such as managerial decisions, software architectural constraints or employee preferences.
Therefore, the collaboration of developers may be independent of their roles in the organizational structure. For example, a subcontractor who resides in Canada may be the closest collaborator of a development team leader in India. To understand developer team formation, the activity in the software repositories should be analyzed; managers can then identify the team structures of developers using complex network analysis techniques [3], [4].
In this research, we examine the structure and evolution of developer teams during the development of an internationally developed, large-scale, commercial software project. We propose a team identification method based on collaboration on the source code, and we identify the programmer teams (communities) automatically by mining the collaboration network. We investigate the effect of factors such as distributed location, team size and team activity on team stability.
978-1-4673-6290-0/13/$31.00 ©2013 IEEE
The research questions addressed in this paper are as follows:
1) How do developer teams emerge during software evolution?
2) Which factors related to teams affect team stability over time?
By answering the first research question, we aim to help practitioners understand code-level collaboration and to guide managers in their team formation decisions. Some teams may emerge only for a short duration, while others remain stable during development. By answering the second research question, we can understand the factors that affect the near-term stability of teams during a release.
The rest of the paper is structured as follows. In Section II we discuss the background of the research and the related literature. In Section III we describe the dataset, the method used to model collaboration and identify teams, and the empirical setup used to answer the research questions. In Section IV we present the empirical results.
We continue with a discussion of the implications of the results and the threats to validity in Section V. In Section VI we answer the research questions and outline possible future work in this area.
II. BACKGROUND
Work team structures, changes in teams over time and work team effectiveness have been studied in many disciplines, such as management science [5], psychology [6], sociology [7], network science [4] and software engineering [8], [9], [10]. In this section we give an overview of the literature related to work teams and to communities in networks.
A. Work Team Formation
Work teams are small groups of interdependent individuals, usually with up to thirty members, who share responsibility for the outcomes of their organizations [6]. Forming work teams effectively is one of the most important responsibilities of managers. Management science and sociology research on work teams is usually theoretical: researchers have proposed frameworks for various team construction and management strategies [5], [6]. For example, based on his theoretical research, Manz argued that organizations should move toward self-leading teams [11]. Similarly, Marks et al. studied team processes over time in organizations and built a theoretical framework to model the various teamwork processes [12]. They proposed that their framework could be used to build a model that measures team productivity for each process.
CHASE 2013, San Francisco, CA, USA
We believe that empirical work on work team formation is relatively easier in the domain of software engineering. Software development datasets are usually rich in information about the work of developers, and in most cases all activity on the project source code is recorded indefinitely. Developer communication and development activities can easily be traced from source code management systems and issue repositories to identify work teams.
B. Community Structure in Networks
Communities can be identified in networks by finding subnetworks whose nodes are more strongly connected to each other than to the rest of the network. For large networks, identification becomes a hard computational challenge: since exhaustive graph search for the best partition is NP-complete [13], greedy algorithms and heuristics have been proposed instead.
Communities in networks can be identified with a top-down or a bottom-up approach. Bottom-up approaches identify communities through techniques similar to k-means clustering of non-linked data. In short, random nodes are selected within the network, and a community is grown like a snowball around each selected node by adding strongly connected nodes to it as the algorithm iterates. The bottom-up approach is good at identifying strongly connected communities, especially in large networks, but it may ignore the peripheral, lower-degree nodes of a community [4]. Top-down approaches, on the other hand, iteratively remove edges according to a criterion and stop when a certain condition is met. A common criterion is to remove the most central edges first. In a scale-free network, weak ties usually have higher (edge betweenness) centrality, since they connect different strongly connected components. By removing the edges with the highest centrality values, one can split a single weakly connected network into several strongly connected communities [4].
Another challenge in identifying communities in real networks is evaluating the performance of a particular community identification method. A simple performance measure is the fraction of all edges in the network that fall within communities. However, applied naïvely, this measure gives the best result for a single giant community that contains the whole network.
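To make the top-down approach and the pitfall of the naive evaluation concrete, the following is a minimal, standard-library-only Python sketch. It illustrates the idea under stated assumptions and is not the tooling used in this study (which relied on the Louvain algorithm, Section III-C): it removes the edge with the highest edge betweenness (computed with Brandes' algorithm) until the network splits, in the spirit of Girvan and Newman [4], and then scores the unsplit network and the split with both the naive within-community edge fraction and the modularity Q defined in Section III-C. The toy network of two 4-cliques joined by a single bridge and all function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def edge_betweenness(adj):
    """Edge betweenness (Brandes' algorithm) of an unweighted, undirected graph.

    adj maps each node to the set of its neighbours; edges are keyed by
    frozenset({u, v}).
    """
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: number of shortest paths (sigma) and predecessor lists.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes towards s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}  # each node pair was counted twice

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Top-down step: drop the most central edge until the graph splits."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(g))
    while len(components(g)) == start:
        eb = edge_betweenness(g)  # recomputed after every removal
        if not eb:  # no edges left to remove
            break
        u, v = tuple(max(eb, key=eb.get))
        g[u].discard(v)
        g[v].discard(u)
    return components(g)

def community_matrix(adj, communities):
    """E[i][j]: fraction of edge ends that link community i to community j."""
    label = {v: i for i, c in enumerate(communities) for v in c}
    k = len(communities)
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    E = [[Fraction(0)] * k for _ in range(k)]
    for u in adj:
        for v in adj[u]:  # every edge is visited once from each end
            E[label[u]][label[v]] += Fraction(1, two_m)
    return E

def coverage(E):
    """Naive score: fraction of within-community edges, Tr(E)."""
    return sum(E[i][i] for i in range(len(E)))

def modularity(E):
    """Q = Tr(E) - ||E^2||, where ||E^2|| equals the sum of squared row sums."""
    return coverage(E) - sum(sum(row) ** 2 for row in E)

def two_cliques():
    """Hypothetical toy network: two 4-cliques joined by the bridge (3, 4)."""
    edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
    adj = {v: set() for v in range(8)}
    for u, v in edges + [(3, 4)]:
        adj[u].add(v)
        adj[v].add(u)
    return adj

if __name__ == "__main__":
    g = two_cliques()
    for name, part in [("one community", [set(g)]),
                       ("Girvan-Newman split", girvan_newman_split(g))]:
        E = community_matrix(g, part)
        print(f"{name}: coverage={float(coverage(E)):.3f} Q={float(modularity(E)):.3f}")
```

Running the script prints a within-community fraction of 1.000 with Q = 0.000 for the single giant community, and 0.923 with Q = 0.423 for the two-clique split: the naive score prefers the trivial partition, while modularity prefers the split. Exact fractions are used only so that the trivial case yields exactly Q = 0.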
Because a single all-inclusive community trivially maximizes the fraction of within-community edges, this fraction should instead be compared with the fraction expected when the same nodes are wired randomly [4]. Ideally, the number of within-community edges should be significantly higher than under random wiring of the same nodes.
C. Software Collaboration Teams
Software collaboration teams have been analyzed previously by several research groups [8], [9], [10], [2], [14], [15], [16].
Fig. 1. Collaboration network in August 2009 for the enterprise project.
Several researchers have investigated specific community patterns in the collaboration network. One common trend among open source developers is the formation of core and periphery communities [8]: core communities work on the open source project consistently over time, while the periphery makes one-time or rare contributions. Robles et al. analyzed this trend in several open source projects and discussed the importance of core communities for open source projects [8]. Bird et al. mined the email data of open source projects to identify communities [9]. Jermakovics et al. visualized the network structure by filtering edges in the file-level collaboration network [16]. Datta et al. and Wen et al. highlighted the scale-free behavior of the developer collaboration network [14], [17]. Bird et al. built a theory on the role of branches in source code repositories in forming virtual teams [9]; they stated that branches are an important element of goal formation for developer teams in large projects.
Begel discussed the coordination challenges in large-scale software development [2]. He argued that differences in location, time zone and culture increase coordination challenges and reduce developer productivity. His research highlights the need for virtual coordination mechanisms in software engineering. To address these coordination challenges, several researchers have proposed recommendation systems for coordination.
For example, Minto and Murphy proposed a recommendation system that suggests potential collaborators to developers based on past collaboration activity [10]. Borici et al. built a real-time visualization tool that highlights developer and task dependencies to help overcome coordination challenges [18].
To the best of our knowledge, the current literature on developer collaboration teams lacks research on the emergence of work teams in software projects. Our work differs from the related research on software collaboration teams in three aspects: 1) we identify the distinct developer communities in large-scale software without human intervention, 2) we track the evolution of the communities over time, and 3) we define the characteristics of collaboration teams that make them more or less stable over time.
III. METHODOLOGY
A. Dataset
We conducted our empirical work on a large-scale commercial enterprise software product with a 20-year-old code base. As our dataset, we examined a 500 kLOC part of the product that implements a set of related architectural functionality. The project is written in C and C++, and it is developed by an international group of developers in five countries.
B. Data Extraction
We extracted the development activity for a period of 13 months, between 01 January 2009 and 01 February 2010. A major version of the product was released during this period, in summer 2009; this release was chosen because its development data has the best quality. Since the average size of the source code files exceeds one thousand lines of code, we extracted all changes at function-level granularity. Within the 13-month period, 123 developers fixed 9703 defects in the software. For each month, we identified the communities using the clustering algorithm described in Section III-C. The number of developers increased from 34 to 123 within the observed period.
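As an illustration of function-level extraction, the following Python sketch builds the monthly collaboration network defined in Section III-C from a list of change records. The record schema (a hypothetical Change tuple with developer, branch, path, function and month fields), the cumulative "up to this month" window and the sample data are assumptions made for illustration; the study's actual extraction tooling is not described here.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Change(NamedTuple):
    """One function-level change record (hypothetical schema)."""
    developer: str
    branch: str
    path: str
    function: str
    month: int  # 1-based month of the observation window


def collaboration_network(changes, up_to_month):
    """Undirected, unweighted collaboration network at the end of a month.

    Two developers are linked if they changed the same function in the same
    file on the same branch at any point up to and including up_to_month.
    """
    editors = defaultdict(set)  # (branch, path, function) -> developers
    for c in changes:
        if c.month <= up_to_month:
            editors[(c.branch, c.path, c.function)].add(c.developer)
    adj = {}
    for devs in editors.values():
        for dev in devs:
            adj.setdefault(dev, set())  # lone editors still appear as nodes
        for a, b in combinations(sorted(devs), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical change log: dev_z touches buf_flush only on another branch.
log = [
    Change("dev_x", "main", "src/buf.c", "buf_flush", 1),
    Change("dev_y", "main", "src/buf.c", "buf_flush", 2),
    Change("dev_z", "main", "src/buf.c", "buf_alloc", 2),
    Change("dev_z", "fix", "src/buf.c", "buf_flush", 3),
]
print(collaboration_network(log, up_to_month=1))  # {'dev_x': set()}
print(collaboration_network(log, up_to_month=3))
```

At month 3, dev_x and dev_y are linked, while dev_z stays isolated because the only function it shares with them was changed on a different branch. Keying the records by file instead of by function would link all three developers through src/buf.c on the main branch, which is the coarser view that function-level granularity avoids.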
The growth in the number of developers is a typical trend in the company: developers from the maintenance groups of older releases are added as the release date approaches.
C. Collaboration, Communities and Teams
Similar to the method proposed by Meneely et al. for defect prediction [19], we define collaboration as co-work on the same function in the same source code file during a release of the software. For example, if Dev X and Dev Y have previously changed the same function in the same source code file within the same development branch, we assume that they have collaborated during development. The collaboration network is an undirected, unweighted graph whose nodes are the developers and whose edges connect every pair of developers who have previously collaborated in the release. The collaboration network of the enterprise project in August 2009 is shown in Figure 1.
Many large software networks have been shown to be scale-free [14], [17]. We therefore chose a community identification technique that is considered effective for scale-free networks [4]. For community inference, we created a partition dendrogram using the Louvain algorithm and chose the best partition using the modularity measure [20].
We measured the strength of developer team cohesion with the modularity (Q) measure proposed by Newman and Girvan [4]. Modularity evaluates the quality of a particular partition of a network by comparing it with what would be expected by chance: it is the difference between the fraction of within-community edges in the proposed partition and the expected fraction of within-community edges in a network with the same nodes and random edges. Let E be a k×k symmetric matrix, where k is the number of communities and the element E_ij is the fraction of all edges in the network that connect nodes in community i to nodes in community j. The trace of this matrix, Tr E = Σ_i E_ii, gives the fraction of within-community edges.
The sum of all elements of the matrix product E·E, written ‖E²‖, gives the expected fraction of within-community edges in a randomly connected network with the same nodes and the same partition. Hence, modularity is defined as follows:
$$Q = \mathrm{Tr}\,E - \lVert E^{2} \rVert \tag{1}$$
Q is at most 1, and values near 0 indicate no more within-community edges than expected by chance; Q ≈ 0 holds for a random partition of a random network. Q ≤ 0.7 holds for most real networks, since perfect separation is usually not possible.
The structure of the collaboration network of the enterprise software in each month is given in Table I. The collaboration network consists of one giant weakly connected component, and the collaboration densifies over time: while the number of developers in the network grew roughly linearly (from 34 to 123), the number of unique collaborating pairs (edges) grew much faster (from 148 to 2455).
D. Team Factors
We identified several quantitative attributes for each team formed in a month. These attributes quantify the activity, size, centrality, location and change of the particular team.
• Clustering Coefficient (clustering): the degree to which the nodes in a graph tend to cluster together. It is computed as 3 × (number of triangles) divided by the number of connected triples of nodes, where triangles are cliques of size 3 [21].
• Average PageRank (avg pagerank): PageRank is a centrality measure widely used in web search to estimate the popularity of pages [22]. Average PageRank estimates the centrality of the team's developers within the network.
• Average Betweenness (avg bwns): the betweenness of a node is the fraction of shortest paths in the network that pass through the node. Similar to PageRank, average betweenness estimates the centrality of the team's developers within the network [21].
• Average Degrees (avg degrees): the average number of collaborators that a developer in the team has among the rest of the team.
• Node Count (size): the number of developers in the team; it describes the size of the team.
• Edges (edges): the number of within-team edges (collaborations) in the team.
• Past Work (past work): the total number of defects previously fixed by the team members. A defect fix is any source code change to a function that is related to an issue; earlier changes are counted only if the related issue is closed at the time of observation.
• Last Month's Work (months work): the total number of defects fixed by the team members in the last month. Changes made in the last month to issues that have been closed are counted.
• Unique Locations (unique locs): the number of distinct countries in which the team members work during the project.
• Relative Change (rel change): the relative difference between the team this month and the most similar team in the previous month. For example, if 10 members are added to and 3 members are removed from a team that originally had 15 members, the relative change is (10 + 3)/15 = 0.87. The formula for relative change is as follows:
$$\text{rel change} = \frac{\#\,\text{people added} + \#\,\text{people removed}}{\text{team size this month}} \tag{2}$$
• Future Relative Change (frel change): calculated in the same way as the relative change factor, but between the team this month and the most similar team in the next month:
$$\text{frel change} = \frac{\#\,\text{people added} + \#\,\text{people removed}}{\text{team size next month}} \tag{3}$$
• Stability: the normalized value of the inverse of the future relative change. It is 0 if there is no change in the team and 1 if the team changes completely during a month.
TABLE I
EVOLUTION OF THE COLLABORATION NETWORK AND THE KEY STRUCTURAL PROPERTIES OF THE COLLABORATION NETWORK FOR THE ENTERPRISE PROJECT
Month | Node count | Edge count | Diameter | Mean clustering coefficient | Mean transitivity | Mean clique size | Max clique size | Weakly connected components | Community count | Modularity (Q)
1 | 34 | 148 | 4 | 0.650469 | 0.561219 | 5.387097 | 9 | 1 | 4 | 0.22
2 | 40 | 208 | 3 | 0.730237 | 0.525645 | 6.204545 | 10 | 1 | 3 | 0.22
3 | 58 | 378 | 4 | 0.672606 | 0.535743 | 6.833333 | 13 | 1 | 4 | 0.22
4 | 68 | 559 | 4 | 0.705336 | 0.58701 | 8.717949 | 17 | 1 | 3 | 0.21
5 | 90 | 917 | 4 | 0.664292 | 0.56618 | 9.992933 | 19 | 1 | 4 | 0.21
6 | 91 | 1034 | 4 | 0.690793 | 0.600575 | 11.54902 | 22 | 1 | 4 | 0.16
7 | 93 | 1041 | 4 | 0.678065 | 0.598542 | 11.504178 | 22 | 1 | 5 | 0.15
8 | 102 | 1653 | 4 | 0.736464 | 0.652298 | 16.861998 | 27 | 1 | 4 | 0.15
9 | 106 | 1919 | 4 | 0.751094 | 0.679767 | 18.5196 | 32 | 1 | 4 | 0.15
10 | 109 | 2044 | 4 | 0.75301 | 0.669911 | 18.648556 | 32 | 1 | 4 | 0.15
11 | 114 | 2262 | 4 | 0.754242 | 0.679357 | 20.696069 | 34 | 1 | 4 | 0.15
12 | 120 | 2446 | 4 | 0.765123 | 0.672996 | 21.674272 | 36 | 1 | 4 | –
13 | 123 | 2455 | 4 | 0.76537 | 0.671355 | 21.668607 | 36 | 1 | 3 | –
E. Hypotheses
To answer research question 1, we tested several hypotheses on the team data of the enterprise software using a two-tailed Spearman's rank correlation test. The hypotheses are as follows:
Hypothesis I: The number of developer teams changes over time. As the number of developers increases threefold over 13 months, new developer communities may emerge and the number of distinct teams may increase. We tested Hypothesis I by checking whether there is a positive correlation between the number of distinct teams and time (in months). The null and alternative hypotheses for this test are:
• H0: The number of developer teams does not change significantly over time.
• H1: The number of developer teams changes over time.
Hypothesis II: Team size increases over time. Similar to Hypothesis I, team size may increase over time as the number of developers increases threefold. We tested Hypothesis II by checking whether there is a positive correlation between team size and month. The null and alternative hypotheses for this test are:
• H0: The size of a developer team does not increase over time.
• H1: The size of a developer team increases over time.
Hypothesis III: Team activity is higher for larger teams. We tested Hypothesis III by checking whether there is a positive correlation between team activity and team size, where team activity is the development activity (months work) of a team within a single period. The null and alternative hypotheses for this test are:
• H0: Team activity is not significantly higher for larger developer teams.
• H1: Team activity is higher for larger developer teams.
F. Developer Team Stability
In Figure 2 we observe that some developer teams remain stable over time while others change dramatically every month. To understand the causes of this trend, we examined the relation between team stability and the factors related to the size, activity and centrality of the teams presented in Section III-D. We measure developer team stability as the inverse of future relative change (Stability = 1/frel change), where future relative change is the relative change in the team in the next month. We assume that the lower the future relative change, the more stable the team, and vice versa. To identify the factors that affect team stability, we built a linear regression model that uses stability as the dependent variable and all the other factors in Section III-D as the independent variables.
Fig. 2. Visualisation of team evolution: the x axis is the time in months, and the size of a marker denotes the team size. The marker gets darker as the community gets more stable.
Marker labels X(+Y -Z) denote the team size X, the number of added members Y and the number of removed members Z.
TABLE II
VALUES AND SIGNIFICANCE OF THE SPEARMAN CORRELATIONS AMONG THE FACTORS (p < 0.001: ***, p < 0.01: **, p < 0.05: *). SEE SECTION III-D FOR FACTOR EXPLANATIONS.
factor | avg bwns | avg clustering | avg degrees | avg pagerank | edges | frel change | rel change | months work | past work | size
avg clustering | -0.59***
avg degrees | -0.65*** | 0.47***
avg pagerank | 0.64*** | -0.10 | -0.43**
edges | -0.48*** | 0.21 | 0.85*** | -0.31*
frel change | 0.36* | -0.17 | -0.48*** | 0.01 | -0.44**
rel change | -0.04 | 0.03 | -0.12 | -0.19 | -0.30* | 0.35*
months work | -0.13 | -0.02 | 0.19 | -0.02 | 0.23 | -0.10 | -0.16
past work | -0.43** | 0.19 | 0.80*** | -0.29 | 0.95*** | -0.42** | -0.32* | 0.40**
size | -0.41** | 0.05 | 0.70*** | -0.27 | 0.93*** | -0.46** | -0.25 | 0.29* | 0.90***
unique locs | -0.58*** | 0.26 | 0.62*** | -0.39** | 0.67*** | -0.52*** | -0.18 | 0.18 | 0.71*** | 0.71***
IV. EMPIRICAL RESULTS
A. Developer Team Emergence
Figure 2 shows the evolution of the teams over the 13 months; team evolution is traced by linking the most similar teams in successive months. After month six, two relatively large teams emerge in the project and remain relatively stable for the rest of the observed period. Together, these two teams include more than 80 percent of the developers.
The correlations among all the factors are given in Table II. Many of the factors are significantly correlated with each other. The average clustering coefficient of the nodes is significantly positively correlated with average degrees but significantly negatively correlated with average betweenness. In general, the network size and activity related factors (past work, edges and size) are inversely correlated with the network centrality related factors (average PageRank and average betweenness). This result shows that teams with more central members within the network are usually less active. Central members usually connect distant groups, and these central members may have management and coordination responsibilities. The past activity of a community is significantly positively correlated with its activity in the current month, which shows that active teams tend to remain consistently active. The number of unique locations, on the other hand, is significantly positively correlated with the activity and size related factors, while it is significantly negatively correlated with the centrality factors.
The community count does not increase significantly over time, as shown in Table I: the number of distinct communities ranges between 3 and 5 in every month of the thirteen-month period. Developers do not build more collaboration teams as their number increases threefold. Therefore, we accept the null hypothesis for Hypothesis I.
The result of Hypothesis II follows from the result of Hypothesis I: since the number of communities does not increase while the number of developers in the collaboration network increases threefold, the community size increases significantly over time. Therefore, we accept the alternative hypothesis for Hypothesis II.
TABLE III
THE LINEAR REGRESSION COEFFICIENTS AND THEIR SIGNIFICANCE VALUES (p < 0.001: ***, p < 0.01: **, p < 0.05: *)
Factor | Estimate | Std. Error | t value | Pr(>|t|)
(Intercept)*** | 2.2762 | 0.6191 | 3.68 | 0.0007
months work | -0.0001 | 0.0002 | -0.31 | 0.7545
past work* | 0.0003 | 0.0002 | 2.20 | 0.0337
avg pagerank* | -25.6846 | 12.5072 | -2.05 | 0.0464
avg betweenness | 9.5871 | 13.9609 | 0.69 | 0.4961
size | -0.0175 | 0.0116 | -1.51 | 0.1394
avg degrees* | -0.0280 | 0.0118 | -2.36 | 0.0229
unique locations* | 0.3052 | 0.1346 | -2.27 | 0.0287
min relative change* | 0.3047 | 0.1233 | 2.47 | 0.0177
We found that community size has no significant relation to the activity of the team in a given month, so we accept the null hypothesis for Hypothesis III. Considering that team sizes range between 6 and 51 developers, this finding is rather interesting.
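The hypothesis tests and Table II rely on Spearman's rank correlation. The following standard-library Python sketch computes Spearman's rho as the Pearson correlation of tie-averaged ranks and, as an assumption in place of whatever statistics package the study used, obtains a two-tailed p-value from a seeded permutation test instead of the usual t approximation. Purely for illustration, it is applied to the monthly community counts of Table I against the month number, the setting of Hypothesis I; it is not a reproduction of the study's own test.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_test(x, y, n_perm=10_000, seed=0):
    """rho and a two-tailed permutation p-value for H0: no monotonic relation."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled, extreme = ry[:], 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    return rho, (extreme + 1) / (n_perm + 1)


months = list(range(1, 14))
communities = [4, 3, 4, 3, 4, 4, 5, 4, 4, 4, 4, 4, 3]  # Table I, months 1-13
rho, p = spearman_test(months, communities)
print(f"rho = {rho:.2f}, two-tailed p = {p:.2f}")  # rho = 0.08
```

A small rho with a p-value far above 0.05 is consistent with accepting the null hypothesis of Hypothesis I. The permutation approach is used here only because it needs no distribution tables; with 13 observations and many ties, the t approximation may be rough.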
Team size in a given month does not reflect the team's activity in that month. We would like to emphasize that the monthly activity of a team does not necessarily represent its productivity.
B. Developer Team Stability
In Table II we observe that several factors are significantly correlated with the future relative change of a team. Past relative change and average betweenness are significantly positively correlated with future relative change. We can argue that teams that have been relatively stable in the past are likely to remain stable. Teams with members who are more central in the network structure (in terms of betweenness centrality) are also more likely to be stable in the future. Four factors are significantly negatively correlated with the future relative change of a team: average degrees, edges, size and unique locations.
The coefficients of the linear regression equation that fits stability with the team factors are provided in Table III. The factors past work, average PageRank, average degrees, unique locations and minimum relative change are significant in the regression equation. Surprisingly, the coefficient of unique locations in the linear regression equation is significantly negative. This trend implies that teams with members from more locations (countries) tend to have more stable structures. One possible reason for the stability of international teams is that their members have similar responsibilities in the project, such as the development of core project assets. The significance of minimum relative change and past work shows that teams that changed little in the past, or teams whose members have more past work, tend to remain stable in the future. The reason for this trend may be the emergence of similar responsibilities among the members of stable teams. On the other hand, teams with popular members in terms of degree or centrality tend to remain less stable in the future.
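The stability analysis combines two steps: matching each team to its most similar team in the adjacent month to obtain the (future) relative change, and fitting a linear regression of stability on the team factors (Table III). The following standard-library Python sketch illustrates both under explicit assumptions. The similarity measure used for matching is not specified, so Jaccard overlap is assumed; the denominator of the relative change is a parameter, since the formulas in Section III-D divide by the later team's size while the worked example there divides by the original size; and the regression is plain ordinary least squares solved through the normal equations, reporting estimates, standard errors and t values but no p-values. All function names and data are hypothetical.

```python
import math


def jaccard(a, b):
    """Overlap of two teams (assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def most_similar(team, candidates):
    """Candidate team with the largest Jaccard overlap with `team`."""
    return max(candidates, key=lambda c: jaccard(team, c))


def relative_change(old, new, base="new"):
    """(people added + people removed) / team size; base picks which size."""
    if base not in ("new", "old"):
        raise ValueError("base must be 'new' or 'old'")
    added, removed = len(new - old), len(old - new)
    return (added + removed) / len(new if base == "new" else old)


def frel_change(team, next_month_teams, base="new"):
    """Future relative change against the most similar team next month."""
    return relative_change(team, most_similar(team, next_month_teams), base)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: the factors are collinear")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(factor_rows, y):
    """OLS with an intercept; returns (estimates, std. errors, t values)."""
    X = [[1.0] + list(row) for row in factor_rows]
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # needs n > p
    # Diagonal of (X'X)^-1, obtained by solving against unit vectors.
    inv_diag = [solve(xtx, [float(i == j) for i in range(p)])[j] for j in range(p)]
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    return beta, se, [b / s for b, s in zip(beta, se)]


# Worked example of Section III-D: 15 members, 3 removed, 10 added.
old = {f"d{i}" for i in range(15)}
new = (old - {"d0", "d1", "d2"}) | {f"n{i}" for i in range(10)}
print(round(relative_change(old, new, base="old"), 2))  # 0.87
print(round(relative_change(old, new), 2))              # 0.59

# Hypothetical one-factor data for the regression step.
beta, se, t = ols([[1], [2], [3], [4], [5]], [1, 3, 2, 5, 4])
print([round(v, 2) for v in beta])  # [0.6, 0.8]
```

Defining stability as 1/frel change is undefined for a team that does not change at all in the next month, so any concrete implementation needs a convention for that case; the sketch stops at the future relative change and leaves that choice open.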
Team members are popular because they co-work with many people, and these collaborators may change every month.
V. DISCUSSION
A. Implications of the Case Study
In our case study we observed that teams among developers may emerge independent of the organizational structure and the locations of the developers. In this section we discuss some possible implications of these findings for practice.
We found that the number of teams does not change, while the number of members in teams increases significantly over time. In the long run, this trend may reduce the project's vulnerability to employee turnover, since the departure of a few team members would not disintegrate an entire team. Collaboration teams imply concentrated activity by the team members on a part of the source code, and this concentration of activity may have been caused by the internal structure of the software. In this respect, large collaboration teams indicate collective ownership of the collaborated parts of the code by many people. On the other hand, large team sizes may reduce productivity due to coordination problems, and several studies highlight organizational challenges in large teams. For example, in a study of 951 projects, Rodriguez et al. found that team sizes larger than 9 reduce productivity significantly [23]. It should be noted that our definition of developer teams differs significantly from the organization-level team definition of the ISBSG dataset used by Rodriguez et al.: we define teams based on the clusters in the collaboration network, while the ISBSG dataset reports organizational developer team sizes.
Team stability may have several long-term implications for software projects. Stable teams are known to be more productive because team cohesion increases over time [6], and cohesion is known to be an important element of work teams [6]. Pescosolido et al. state that cohesive teams develop a sense of shared responsibility and success [24].
On the other hand, according to some proponents of agile methodologies, stable team structures may be harmful in the long term, since collective ownership of the source code may be low [25], [26]. In our case study, we did not observe a trend of overspecialization among the developers. However, in an earlier study using the same dataset, we found that issue ownership among developers tends to be imbalanced, which may have been caused by overspecialization [27].
B. Threats to Validity
In this section, we discuss possible threats to the validity of our study. Threats to construct validity concern the relationship between theory and observation, i.e., whether the measured variables capture the actual factors. We measured collaboration as co-edits of the same function by two developers, and this collaboration network model may ignore some aspects of developer cooperation [28]. We used function-level granularity since the files of the project exceed several thousand lines of code. Basic file maintenance operations, such as comment edits and mass documentation changes, may introduce noise into the collaboration data; we therefore filtered out documentation edits and similar non-code edits in order to model collaboration more accurately. We assume that co-changes to the same source code files by a group of people make them a team. This assumption does not take into account any collaborative activity outside the source code, such as meetings and email messages.
We used a large-scale enterprise product to conduct the study. Even though drawing general conclusions from a single empirical study is very difficult, the results should be transferable to other settings through well-designed and controlled experiments. In addition, the analysis can be replicated on large open source software.
VI. CONCLUSIONS
We examined the collaboration activity in an internationally developed, large-scale software product to answer the following research questions:
1) How do developer teams emerge during software evolution? Developer teams naturally emerge in large software independent of the organizational structure, and team formation may be affected by different factors. We found that the number of developer teams remains the same (between three and five) over a period of 13 months, even though the number of developers increases threefold. On the other hand, the size of the teams increases significantly over time. We also found that team activity is not correlated with team size, which is especially interesting considering that team sizes range between 6 and 51 during development.
2) Which factors related to teams affect team stability over time? We found that multiple factors related to team size, activity and centrality affect the stability of a team over time. The lack of relation between the number of distinct locations and team stability was a counter-intuitive finding.
A. Future Work
Comparing the empirical results with direct observations of developer activity, and identifying the factors that may affect team formation, may be considered as future work. If we identify these factors, we can calibrate them to form the desired collaboration teams naturally, without directly changing team members or any other direct intervention. This would be an important step towards building self-leading developer teams in software projects.
ACKNOWLEDGEMENTS
This research is supported in part by the Turkish State Planning Organization (DPT) under project number 2007K120610 and by NSERC Project number 402003-2012. We would like to thank the IBM Canada Lab – Toronto site for making their development data available for research and for strategic help during all phases of this research.
The opinions expressed in this paper are those of the authors and not necessarily those of IBM Corporation.

REFERENCES

[1] J. Herbsleb and A. Mockus, “An empirical study of speed and communication in globally distributed software development,” IEEE Transactions on Software Engineering, vol. 29, no. 6, pp. 481–494, Jun. 2003. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1205177
[2] A. Begel, “Effecting change: Coordination in large-scale software development,” in Proceedings of CHASE, 2008. [Online]. Available: http://research.microsoft.com/pubs/75110/effecting-change.pdf
[3] D. Easley and J. M. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, draft ed. Cambridge University Press, 2010.
[4] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, 2003. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevE.69.026113
[5] J. Hackman, “The design of work teams,” Handbook of Organizational Behavior, 1987. [Online]. Available: http://groupbrain.wjh.harvard.edu/jrh/pub/JRH1987_1.pdf
[6] E. Sundstrom, K. P. de Meuse, and D. Futrell, “Work teams: Applications and effectiveness,” American Psychologist, vol. 45, no. 2, pp. 120–133, 1990. [Online]. Available: http://doi.apa.org/getdoi.cfm?doi=10.1037/0003-066X.45.2.120
[7] R. Breiger, “The duality of persons and groups,” Social Forces, 1974. [Online]. Available: http://sf.oxfordjournals.org/content/53/2/181.short
[8] G. Robles, J. M. González-Barahona, and I. Herraiz, “Evolution of the core team of developers in libre software projects,” in Mining Software Repositories, 2009 (MSR ’09), ICSE Workshops, pp. 167–170, 2009. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5069497
[9] C. Bird, T. Zimmermann, and A. Teterev, “A theory of branches as goals and virtual teams,” in Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE ’11), p. 53, 2011. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1984642.1984655
[10] S. Minto and G. Murphy, “Recommending emergent teams,” in ICSE Workshops (MSR ’07), 2007. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4228642
[11] C. C. Manz, “Self-Leading Work Teams: Moving Beyond Self-Management Myths,” Human Relations, vol. 45, no. 11, pp. 1119–1140, Nov. 1992. [Online]. Available: http://hum.sagepub.com/cgi/doi/10.1177/001872679204501101
[12] M. A. Marks, J. E. Mathieu, and S. J. Zaccaro, “A Temporally Based Framework and Taxonomy of Team Processes,” The Academy of Management Review, vol. 26, no. 3, p. 356, Jul. 2001. [Online]. Available: http://www.jstor.org/stable/259182?origin=crossref
[13] S. Even and G. Even, Graph Algorithms. Cambridge University Press, 2011. [Online]. Available: http://books.google.com.tr/books?id=m3QTSMYm5rkC
[14] S. Datta, R. Sindhgatta, and B. Sengupta, “Evolution of developer collaboration on the Jazz platform,” in Proceedings of the 4th India Software Engineering Conference (ISEC ’11), pp. 21–30, 2011. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1953355.1953359
[15] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Chapels in the bazaar? Latent social structure in OSS,” in 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, Atlanta, GA, 2008.
[16] A. Jermakovics, A. Sillitti, and G. Succi, “Mining and visualizing developer networks from version control systems,” in Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, ser. CHASE ’11. New York, NY, USA: ACM, 2011, pp. 24–31. [Online]. Available: http://doi.acm.org/10.1145/1984642.1984647
[17] L. Wen, R. G. Dromey, and D. Kirk, “Software engineering and scale-free networks,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 4, pp. 845–854, Aug. 2009. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/19380275
[18] A. Borici, K. Blincoe, A. Schröter, G. Valetto, and D. Damian, “Proxiscientia: Toward real-time visualization of task and developer dependencies in collaborating software development teams,” in CHASE. IEEE, 2012, pp. 5–11.
[19] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2008, pp. 13–23. [Online]. Available: http://portal.acm.org/citation.cfm?id=1453106
[20] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, Oct. 2008. [Online]. Available: http://stacks.iop.org/1742-5468/2008/i=10/a=P10008?key=crossref.46968f6ec61eb8f907a760be1c5ace52
[21] M. Newman, Networks: An Introduction. OUP Oxford, 2010. [Online]. Available: http://books.google.com.tr/books?id=q7HVtpYVfC0C
[22] S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, vol. 30, no. 1–7, pp. 107–117, Apr. 1998. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S016975529800110X
[23] D. Rodríguez, M. Sicilia, E. García, and R. Harrison, “Empirical findings on team size and productivity in software development,” Journal of Systems and Software, vol. 85, no. 3, pp. 562–570, 2012 (Novel approaches in the design and implementation of systems/software architecture). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0164121211002366
[24] A. T. Pescosolido and R. Saavedra, “Cohesion and Sports Teams: A Review,” Small Group Research, vol. 43, no. 6, pp. 744–758, Nov. 2012. [Online]. Available: http://sgr.sagepub.com/cgi/doi/10.1177/1046496412465020
[25] L. Williams, R. Kessler, W. Cunningham, and R. Jeffries, “Strengthening the case for pair programming,” IEEE Software, vol. 17, no. 4, pp. 19–25, 2000. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=854064
[26] K. Beck, Extreme Programming Explained: Embrace Change, ser. The XP Series. Addison-Wesley, 2000. [Online]. Available: http://books.google.co.uk/books?id=G8EL4H4vf7UC
[27] B. Caglayan and A. Bener, “Issue ownership activity in two large software projects,” SIGSOFT Softw. Eng. Notes, vol. 37, no. 6, pp. 1–7, Nov. 2012. [Online]. Available: http://doi.acm.org/10.1145/2382756.2382786
[28] J. Aranda and G. Venolia, “The secret life of bugs: Going past the errors and omissions in software repositories,” in Proceedings of the 31st International Conference on Software Engineering, ser. ICSE ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 298–308. [Online]. Available: http://dx.doi.org/10.1109/ICSE.2009.5070530