Emergence of Developer Teams in the
Collaboration Network
Bora Caglayan (1), Ayşe Başar Bener (2), Andriy Miranskyy (3)
(1) Bogazici University, Istanbul, Turkey, bora.caglayan@boun.edu.tr
(2) Ryerson University, Toronto, ON, Canada, ayse.bener@ryerson.ca
(3) IBM Toronto Software Laboratory, Toronto, ON, Canada, andriy@ca.ibm.com
Abstract—Developer teams may naturally emerge in large software
projects independent of managerial decisions, organizational structure,
or work locations. Such self-organized collaboration teams
of developers can be traced from the source code repositories. In
this paper, we identify the developer teams in the collaboration
network in order to present the work team evolution and the
factors that affect team stability for a large, globally developed,
commercial software product. Our findings indicate that: a) the number of
collaboration teams does not change over time, b) the size of the
collaboration teams increases over time, c) team activity is not
related to team size, d) factors related to team size, location
and activity affect the stability of teams over time.
I. INTRODUCTION
Development of a large software is beyond the technical
capabilities of a single person or a few people. For this reason,
large software is built through the collaboration of many
developers. Formation of distinct collaboration communities
(teams) is inevitable over the course of such collaboration.
It is hard to track the underlying source code level developer team structure only by examining the organizational
structure of the developer groups. The increased popularity of
distributed developer teams with the emergence of global software engineering has also increased the difficulty of tracking
the collaboration communities among the developers [1], [2].
Developer collaboration teams may be formed over time
based on the combination of different causes such as managerial decisions, software architectural constraints or employee
preferences. Therefore, the collaboration of developers may be
independent from their roles in the organizational structure.
For example, a subcontractor who resides in Canada may be
the closest collaborator of a development team leader in India.
In order to understand developer team formation, activity
on the software repositories should be analyzed. Afterwards,
using complex network analysis techniques, the manager can
identify the team structures of developers [3], [4].
In this research, we examine the structure and the evolution of the developer teams during the development of an
internationally developed, large-scale, commercial software
project. We propose a team identification method based on the
collaboration on the source code and identify the programmer
teams (communities) in an automated way by mining the
collaboration network. We investigate the effect of certain
factors such as distributed location, team size and team activity
on team stability.
978-1-4673-6290-0/13/$31.00 © 2013 IEEE
The research questions that will be addressed in this paper
are as follows:
1) How do developer teams emerge during software
evolution?
2) Which factors related to teams affect the team
stability over time?
By answering the first research question, we aim to help
practitioners to understand the code level collaboration and
guide managers in their team formation decisions. Some teams
may emerge only for a short duration, while some teams
remain stable during development. By answering the second
research question, we can understand the factors that affect
near-term stability of the teams during a release.
The rest of the paper is structured as follows: In Section
II we discuss the background for the research and the related
literature. Afterwards, in Section III we define the dataset,
the method used to model collaboration to identify teams and
the empirical setup used to answer the research questions. In
Section IV we present the empirical results. We continue with
a discussion of threats to validity and the implications of the
results in Section V. In Section VI we present the answers to
the research questions and possible future work in this area.
II. BACKGROUND
Work team structures, changes in teams over time and work
team effectiveness have been studied in many disciplines such
as management science [5], psychology [6], sociology [7],
network science [4] and software engineering [8], [9], [10]. In
this section we overview the literature related to work teams
and communities in networks.
A. Work Team Formation
Work teams are small groups of interdependent individuals
and they usually consist of up to thirty members who share
responsibility for outcomes of their organizations [6]. Forming
work teams effectively is one of the most important responsibilities of managers. Management science and sociology
research on work teams is usually theoretical. Researchers
have proposed frameworks for various team construction and
management strategies [5], [6]. For example, Manz argued that
we should move toward self-leading teams in organizations
based on his theoretical research [11]. Similarly, Marks et al.
studied the team processes over time in organizations and
built a theoretical framework to model the various teamwork
CHASE 2013, San Francisco, CA, USA
processes [12]. They proposed that, by using their teamwork
process framework, a model for measuring team productivity
for each process can be built.
We believe that empirical work regarding work team formation is relatively easier in the domain of software engineering.
Software development datasets are usually rich in terms of
work information about the developers and all the activity on
the project source code is in most cases recorded indefinitely.
Developer communication and development activities can be
traced easily from the source code management systems and
the issue repositories to identify the work teams.
B. Community Structure in Networks
Communities can be identified in networks by finding the
relatively more strongly connected components (subnetworks)
within the network. For large networks the identification
process becomes a hard computational challenge. Greedy
algorithms and heuristics have been proposed to address this challenge instead
of exhaustive searches, since exhaustive graph search is
computationally intractable (the underlying problem is NP-complete) [13].
Communities in networks can be identified by a top-down
or bottom-up approach. Bottom-up approaches identify communities through techniques similar to k-means clustering on
non-linked data. In short, in a bottom-up approach
random nodes are selected within the network and a community is grown like a snowball around each selected node
by adding the strongly connected nodes to the community
as the algorithm iterates. The bottom-up approach is good at
identifying strongly connected communities, especially in
large networks, but it may ignore the low-degree peripheral nodes
of a community [4].
On the other hand, top-down approaches iteratively remove
edges based on a criterion and stop when
a certain condition is met. One common criterion for edge
removal is removing the most central edges first. In a scale-free
network model, the weak ties usually have higher centrality
since they connect different strongly connected components.
By removing the edges with the highest centrality values,
one can split one weakly connected network into several
strongly connected communities [4].
Another challenge in identifying communities in real
networks is evaluating the performance of a particular
community identification method. A simple performance measure is
the fraction of within-community
edges among all of the edges in the network. However, using
this measure naively gives the best
result for one giant community that includes the whole
network. For this reason, the change in the likelihood of forming
the expected community in the examined network with respect
to a random network should be considered [4]. Ideally, the
fraction of within-community edges should be significantly
higher than under random wiring of the same nodes.
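The pitfall of the naive measure can be illustrated with a minimal Python sketch (standard library only). The toy network, the two partitions and the function name are hypothetical and serve only as an illustration; they are not taken from the studied project.

```python
def within_fraction(edges, community_of):
    """Fraction of edges whose two endpoints lie in the same community."""
    inside = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return inside / len(edges)

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Partition 1: each triangle is its own community.
two_teams = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
# Partition 2: the whole network is one giant community.
one_giant = {node: 0 for node in "abcdef"}

naive_two = within_fraction(edges, two_teams)  # 6 of the 7 edges are internal
naive_one = within_fraction(edges, one_giant)  # every edge is internal
print(naive_two, naive_one)
```

The single giant community scores higher than the intuitively better two-team partition, which is why a comparison against random wiring is needed.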
C. Software Collaboration Teams
Software collaboration teams have been analyzed previously
by different research groups [8], [9], [10], [2], [14], [15], [16].
Fig. 1. Collaboration Network In August 2009 For The Enterprise Project
Several researchers have investigated certain community
patterns in the collaboration network. One common trend
among open source software developers is the formation of core
and periphery communities [8]. The core communities work
on the open source project consistently over time while the
periphery group makes one-time or rare contributions. Robles
et al. analyzed this trend on several open source projects
and discussed the importance of the core communities for
the open source projects [8]. Bird et al. mined the email
data of open source projects to identify the communities [9].
Jermakovics et al. visualized the network structure by filtering
edges in the file-level collaboration network [16]. Datta et
al. and Wen et al. highlighted the scale-free behavior of the
developer collaboration network [14], [17]. Bird et al. built a
theory on the role of branches in source code repositories on
forming virtual teams [9]. They stated that branches are an
important element in goal formation for developer teams for
large projects.
Begel discussed the coordination challenges in large-scale
software development [2]. He argued that differences of location, time zone, and culture increase the coordination challenges and reduce developer productivity. His research
highlights the need for virtual coordination mechanisms in software
engineering. To address these coordination challenges, several
researchers proposed recommendation systems for coordination. For example, Minto et al. proposed a recommendation
system which suggests potential collaborators to developers
based on past collaboration activity [10]. Borici et al. built a
real-time visualization tool that highlights developer and task
dependencies to overcome the coordination challenges [18].
To the best of our knowledge, the current literature on developer
collaboration teams lacks research on the emergence of work teams
in software projects. Our work differs
from the related research on software collaboration teams in
three aspects: 1) We identify the distinct developer communities in a large-scale software product without human intervention,
2) We track the evolution of the communities over time, 3)
We define the characteristics of collaboration teams that make
them more or less stable over time.
III. METHODOLOGY
A. Dataset
We used a commercial large-scale enterprise software product to conduct our empirical work. The enterprise software
product of the company has a 20-year-old code base. We
examined a 500 kLOC part of the product that constitutes a
set of architectural functionality of the project as the dataset.
The programming languages of the project are C and C++. The
software is developed by an international group of developers
in five countries.
B. Data Extraction
We extracted the development activity for a period of 13
months between 01 January 2009 and 01 February 2010. A
major version of the product was released during this period
in summer 2009. This release was chosen since the development
data for it has the best quality. The average size of the source
code files exceeds one thousand lines of code. Therefore, we
extracted all of the changes with function level granularity.
9703 defects in the software were fixed by 123 developers
within the 13 month period. For each month we identified
the communities using the clustering algorithm described in
Section III-C. The number of developers increased from 34 to
123 within the observed period. This increase is a usual trend
in the company. The reason for this increase is the addition of
developers from the maintenance groups of older releases as
the release time gets nearer.
C. Collaboration, Communities and Teams
We define collaboration as co-work on the same function
in the same software source code file during a release of a
software product, similar to the method proposed by Meneely et al.
for defect prediction [19]. For example, if Dev X and Dev
Y have previously changed the same function in the same source code
file within the same development branch, we assume
that they have collaborated during development. The collaboration
network is an undirected, unweighted network <E, V>
where V is the set of developers and E is the set of edges
between developers who have previously collaborated in
the release. The collaboration network of the enterprise project
during August 2009 is given in Figure 1.
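The construction of the collaboration network can be sketched as follows. This is a minimal illustration under stated assumptions, not the extraction tooling used in the study: the record layout (developer, branch, file, function), the sample records and the function name are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_collaboration_network(changes):
    """Return (V, E): the developers and the undirected, unweighted edges.

    `changes` is an iterable of (developer, branch, file, function) records.
    This record layout is an assumption made for illustration.
    """
    # Group the developers by the exact function they changed.
    editors = defaultdict(set)
    for dev, branch, path, func in changes:
        editors[(branch, path, func)].add(dev)

    nodes, edges = set(), set()
    for devs in editors.values():
        nodes.update(devs)
        # Every pair of developers that touched the same function collaborates.
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))  # sorted pair, so each edge is stored once
    return nodes, edges

# Hypothetical change records.
changes = [
    ("dev_x", "main", "io.c", "read_block"),
    ("dev_y", "main", "io.c", "read_block"),
    ("dev_z", "main", "io.c", "write_block"),
    ("dev_z", "main", "net.cpp", "connect"),
    ("dev_x", "main", "net.cpp", "connect"),
    ("dev_w", "maint", "io.c", "read_block"),
]
V, E = build_collaboration_network(changes)
print(sorted(V), sorted(E))
```

In this sketch dev_w edits the same function as dev_x and dev_y but on a different branch, so dev_w appears as a node without edges.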
Many large software networks have been shown to be scale-free [17], [14]. Therefore, we chose a community identification
technique which has been considered to be effective for scale-free networks [4]. For community inference, we created a
partition dendrogram using the Louvain algorithm and chose the
best partition using the modularity measure [20]. We measured
the strength of the developer team cohesion with the modularity (Q) measure proposed by Newman et al. [4]. Modularity
estimates the quality of a particular partition of a network by
estimating how unlikely it is that the partition arises
by chance. It is the difference between the fraction of
within-community edges over all edges in the proposed partition
of the network and the expected fraction of within-community
edges in a network with the same nodes and random edges.
If we define E as a k×k matrix where k is the number
of communities and the matrix value Eij is the fraction of edges that connect
communities i and j, the trace of the matrix, Tr E, gives the fraction
of within-community edges. The sum of the elements of the matrix product
E∗E, written ‖E²‖, gives the expected fraction of within-community edges in a randomly
connected network with the same nodes and the same partition.
Hence, the formula for modularity is defined as follows:
\[ Q = \operatorname{Tr} E - \lVert E^{2} \rVert \tag{1} \]
The value of Q ranges between 0 and 1. Q ≤ 0.7 holds for most
real networks since perfect separation is usually not possible.
Q ≅ 0 holds for a random partition of a random network.
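A minimal Python sketch of the modularity computation, following the definition above (trace of E minus the sum of the elements of E∗E). The toy network, the partitions and the function name are hypothetical; an edge between two different communities is split evenly between Eij and Eji, which is the usual convention for a symmetric E and is stated here as an assumption.

```python
def modularity(edges, community_of):
    """Q = Tr(E) - ||E^2|| for an undirected, unweighted edge list."""
    labels = sorted(set(community_of.values()))
    index = {label: i for i, label in enumerate(labels)}
    k, m = len(labels), len(edges)

    # E[i][j]: fraction of edges joining communities i and j. An edge between
    # two different communities is split evenly between E[i][j] and E[j][i].
    E = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community_of[u]], index[community_of[v]]
        if i == j:
            E[i][i] += 1.0 / m
        else:
            E[i][j] += 0.5 / m
            E[j][i] += 0.5 / m

    trace = sum(E[i][i] for i in range(k))
    # ||E^2||: sum of all entries of the matrix product E * E.
    e_squared_sum = sum(E[i][l] * E[l][j]
                        for i in range(k) for j in range(k) for l in range(k))
    return trace - e_squared_sum

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_teams = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_giant = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_teams)  # 6/7 - 1/2
q_one = modularity(edges, one_giant)  # 1 - 1
print(q_two, q_one)
```

For the two-triangle partition the trace is 6/7 and the random expectation is 1/2, so Q is about 0.36. Placing all nodes in one community gives Q = 0, so modularity does not reward the trivial partition.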
The structure of the collaboration network of the enterprise
software for each month is given in Table I. The collaboration
network consists of one giant weakly connected component.
We see that the collaboration densifies over time. In other
words, while the number of developers in the collaboration network
grew about 3.6-fold (from 34 to 123), the number of unique
collaboration pairs (edges) grew about 16.6-fold (from 148 to 2455).
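The growth of collaboration relative to the number of developers can be checked directly from the node and edge counts of Table I. The sketch below uses two standard graph measures, edge density and mean degree; choosing these two measures to illustrate densification is our assumption, and the function names are illustrative.

```python
# Node and edge counts for months 1..13, copied from Table I.
NODES = [34, 40, 58, 68, 90, 91, 93, 102, 106, 109, 114, 120, 123]
EDGES = [148, 208, 378, 559, 917, 1034, 1041, 1653, 1919, 2044, 2262, 2446, 2455]

def density(n, e):
    """Share of all possible developer pairs that are connected."""
    return 2.0 * e / (n * (n - 1))

def mean_degree(n, e):
    """Average number of collaborators per developer."""
    return 2.0 * e / n

for month, (n, e) in enumerate(zip(NODES, EDGES), start=1):
    print(f"month {month:2d}: density={density(n, e):.3f} "
          f"mean degree={mean_degree(n, e):.1f}")
```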
D. Team Factors
We identified several quantitative attributes for each team
formed in a month. These attributes quantify the size,
centrality and activity of the particular team.
• Clustering Coefficient (clustering): It is the degree to
which nodes in a graph tend to cluster together. It is found by
dividing 3 × (number of triangles) by the number of connected triples
of nodes, where triangles are cliques of size 3 [21].
• Average Pagerank (avg pagerank): Pagerank is a centrality measure used widely in web search algorithms
to estimate the popularity of the pages [22]. Average
pagerank estimates the centrality of the developers of the
team within the network.
• Average Betweenness (avg bwns): Betweenness of a
node is the fraction of shortest paths in the network
that pass through the node. Average betweenness
estimates the centrality of developers in the team within
the network similar to pagerank [21].
• Average Degrees (avg degrees): Average degrees is the
average number of collaborators that a developer in a given team
has within the rest of the team.
• Node Count (size): Node count is the number of developers in the team and it describes the size of the team.
• Edges (edges): Edges denotes the number of within-team
edges (collaborations) in the team.
• Past Work (past work): Past work is the total number of
defects fixed by the team members previously. A defect fix is
any source code change made to a function in the source
code that is related to an issue. Changes related to an issue
are counted if the issue is closed at the time
of observation.
• Last Month's Work (months work): It is the total
number of defects fixed by the team members in the last
month. Changes made in the last month on issues which
have been closed are counted.
• Unique Locations (unique locs): The number of distinct
countries where the team members work during the
project.
• Relative Change (rel change): Relative change is the
relative difference between the team this month and the
team most similar to it in the last month. For example,
if 10 members are added and 3 members are removed
in a team with 15 members originally, relative change is
(10 + 3)/15 = 0.87. The formula for relative change is as
follows:
\[ \text{Relative Difference} = \frac{\#\,\text{People added} + \#\,\text{People removed}}{\text{Team Size This Month}} \tag{2} \]
• Future Relative Change (frel change): Future relative
change is calculated similarly to the relative change factor.
In this case, the relative difference between a team this month
and the team most similar to it in the next month is
calculated. The formula for the future relative change is as
follows:
\[ \text{frel change} = \frac{\#\,\text{People added} + \#\,\text{People removed}}{\text{Team Size Next Month}} \tag{3} \]
• Stability: Stability is the normalized value of the inverse
of future relative change. It is 0 if there is no change in
the team and 1 if the team is completely changed during
a month.

TABLE I
EVOLUTION OF THE COLLABORATION NETWORK AND THE KEY STRUCTURAL PROPERTIES OF THE COLLABORATION NETWORK FOR THE ENTERPRISE PROJECT
Key Network Statistics            Month 1    Month 2    Month 3    Month 4    Month 5    Month 6    Month 7
Node Count                        34         40         58         68         90         91         93
Edge Count                        148        208        378        559        917        1034       1041
Diameter                          4          3          4          4          4          4          4
Mean Clustering Coefficient       0.650469   0.730237   0.672606   0.705336   0.664292   0.690793   0.678065
Mean Transitivity                 0.561219   0.525645   0.535743   0.58701    0.56618    0.600575   0.598542
Mean Clique Size                  5.387097   6.204545   6.833333   8.717949   9.992933   11.54902   11.504178
Max Clique Size                   9          10         13         17         19         22         22
Weakly Connected Component Count  1          1          1          1          1          1          1
Community Count                   4          3          4          3          4          4          5
Modularity(Q)                     0.22       0.22       0.22       0.21       0.21       0.16       0.15

Key Network Statistics            Month 8    Month 9    Month 10   Month 11   Month 12   Month 13
Node Count                        102        106        109        114        120        123
Edge Count                        1653       1919       2044       2262       2446       2455
Diameter                          4          4          4          4          4          4
Mean Clustering Coefficient       0.736464   0.751094   0.75301    0.754242   0.765123   0.76537
Mean Transitivity                 0.652298   0.679767   0.669911   0.679357   0.672996   0.671355
Mean Clique Size                  16.861998  18.5196    18.648556  20.696069  21.674272  21.668607
Max Clique Size                   27         32         32         34         36         36
Weakly Connected Component Count  1          1          1          1          1          1
Community Count                   4          4          4          4          4          3
Modularity(Q)                     0.15       0.15       0.15       0.15

E. Hypotheses
We tested several hypotheses on the team data of the
enterprise software in order to answer research question 1.
In order to test the hypotheses we used a 2-tailed Spearman's
correlation test. The hypotheses are as follows:
Hypothesis I: The number of developer teams changes
over time:
As the number of developers increases threefold over a
duration of 13 months, new developer communities may
emerge and the number of distinct teams may increase. We tested
Hypothesis I by checking if there is a positive correlation
between distinct team count and time (in months). The null hypothesis and the alternative hypothesis for this test can be defined
as follows:
• H0: The number of developer teams does not significantly change
over time.
• H1: The number of developer teams changes over time.
Hypothesis II: The team size increases over time:
Similar to Hypothesis I, team size may increase over time
as the number of developers increases threefold. We tested
Hypothesis II by checking if there is a positive correlation
between the team size and month. The null hypothesis and the
alternative hypothesis for this test can be defined as follows:
• H0: The size of a developer team does not increase over time.
• H1: The size of a developer team increases over time.
Hypothesis III: Team activity is higher for larger teams:
We tested Hypothesis III by checking if there is a positive
correlation between team activity and team
size. Team activity is the development activity (months work)
of the particular team within a single period. The null hypothesis
and the alternative hypothesis for this test can be defined as
follows:
• H0: Team activity is not significantly higher for larger developer teams.
• H1: Team activity is higher for larger developer teams.
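The correlation statistic behind these tests can be sketched in plain Python. The sketch computes Spearman's rho as the Pearson correlation of average ranks and applies it to the monthly community counts of Table I, as in Hypothesis I. The helper names are illustrative, and the sketch stops at the coefficient: the two-tailed p-value needs a t-distribution or a statistics package, which is not shown here.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

months = list(range(1, 14))
# Community count per month, copied from Table I.
community_count = [4, 3, 4, 3, 4, 4, 5, 4, 4, 4, 4, 4, 3]

rho_teams = spearman_rho(months, community_count)
print(f"rho(month, community count) = {rho_teams:.3f}")
```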
F. Developer Team Stability
In Figure 2 we observed that some of the developer teams
remain stable over time while some teams change dramatically
every month. In order to understand the causes of this trend
we examined the relation between team stability and the
factors related to size, activity and centrality of the teams as
presented in Section III-D. We measure the developer team
stability as the inverse of future relative change (Stability =
1/frel change) where future relative change is the relative
change in the team for the next month. We assume that as the
future relative change gets lower the team gets more stable
and vice versa.
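A minimal sketch of future relative change and stability for one team, assuming the two membership sets have already been matched across months. The denominator follows the formula for future relative change (team size next month). The small floor that avoids division by zero for an unchanged team is our assumption, since that case is not specified above; all names and data are hypothetical.

```python
def future_relative_change(team_now, team_next):
    """(# people added + # people removed) / team size next month."""
    added = len(team_next - team_now)
    removed = len(team_now - team_next)
    return (added + removed) / len(team_next)

def stability(frel_change, floor=1e-9):
    """Inverse of future relative change.

    The floor guards against division by zero for an unchanged team.
    It is an assumption: the text does not say how that case is handled.
    """
    return 1.0 / max(frel_change, floor)

# Hypothetical team: 10 members now, 2 leave and 3 join before next month.
team_now = {f"dev{i}" for i in range(10)}
team_next = (team_now - {"dev0", "dev1"}) | {"new1", "new2", "new3"}

frel = future_relative_change(team_now, team_next)  # (3 + 2) / 11
print(frel, stability(frel))
```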
In order to identify the factors that affect the team stability
we built a linear regression which uses the stability as the
dependent variable and all the other factors in Section III-D
as the independent variables.
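A minimal sketch of such a linear regression, fitted by ordinary least squares through the normal equations in plain Python. The data values, the choice of only two factors (size and unique locations) and the helper names are hypothetical. The target values are generated from known coefficients so that the fit can be checked; they are not data from the project.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical team factors: (size, unique locations).
X = [(6, 1), (12, 2), (20, 2), (35, 3), (51, 4)]
# Synthetic stability values generated from known coefficients.
y = [2.0 - 0.02 * size + 0.3 * locs for size, locs in X]

coefficients = ols(X, y)
print(coefficients)
```

Because the synthetic target is exactly linear, the fit recovers the generating coefficients up to floating-point error. With real team data the residuals would also be needed to obtain standard errors and significance values.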
Fig. 2. Visualisation of Team Evolution: X axis is the time in months. Size of the marker denotes the team size. The marker gets darker as the community gets more stable. Marker labels X(+Y -Z) denote
team Size(+Added Members - Removed Members) respectively.
TABLE II
VALUES AND SIGNIFICANCE VALUES OF SPEARMAN CORRELATION AMONG THE FACTORS (p < 0.001: ***, p < 0.01: **, p < 0.05: *). (SEE SECTION III-D FOR FACTOR EXPLANATIONS.)
                avg bwns  avg clustering  avg degrees  avg pagerank  edges     frel change  rel change  months work  past work  size
avg bwns
avg clustering  -0.59***
avg degrees     -0.65***  0.47***
avg pagerank    0.64***   -0.10           -0.43**
edges           -0.48***  0.21            0.85***      -0.31*
frel change     0.36*     -0.17           -0.48***     0.01          -0.44**
rel change      -0.04     0.03            -0.12        -0.19         -0.30*    0.35*
months work     -0.13     -0.02           0.19         -0.02         0.23      -0.10        -0.16
past work       -0.43**   0.19            0.80***      -0.29         0.95***   -0.42**      -0.32*      0.40**
size            -0.41**   0.05            0.70***      -0.27         0.93***   -0.46**      -0.25       0.29*        0.90***
unique locs     -0.58***  0.26            0.62***      -0.39**       0.67***   -0.52***     -0.18       0.18         0.71***    0.71***
IV. EMPIRICAL RESULTS
A. Developer Team Emergence
Figure 2 shows the evolution of the teams over 13 months.
In the figure, team evolution is traced by linking
the most similar teams together for each successive month. We
see that after month six two relatively large teams emerge
in the project and remain relatively stable afterwards during
the observed period. These two teams make up more than 80
percent of the developers.
The correlations among all the factors are given in Table II.
We observe that many of the factors are correlated significantly
with each other. The average clustering coefficient of the
nodes is significantly positively correlated with degrees but
significantly negatively correlated with average betweenness.
In general, network size and activity related factors (past
work, edges, nodes, months work) are inversely correlated with
the network centrality related factors (average pagerank and
average betweenness). This result shows that teams with
more central members within the network are usually less
active. Central members usually connect distant groups and
these central members may have management and coordination responsibilities.
Past activity of the community is significantly positively
correlated with this month's activity. This finding shows that
active teams tend to remain active consistently. On the other
hand, the number of unique locations is significantly positively
correlated with the activity and size related factors while
significantly negatively correlated with the centrality factors.
The community count does not increase significantly over
time as observed in Table I. We observe that the number of distinct
communities ranges between 3 and 5 for every month during
the thirteen month period. Developers do not build more
collaboration teams as their number increases threefold.
Therefore, we accept the null hypothesis for Hypothesis I.
The result of Hypothesis II follows from the result of
Hypothesis I, since we have already found that the number of communities does not increase while the number of developers increases
threefold in the collaboration network. The community size
increases significantly over time. Therefore, we accept the
alternative hypothesis for Hypothesis II.

TABLE III
THE LINEAR REGRESSION COEFFICIENTS AND THEIR SIGNIFICANCE VALUES (p < 0.001: ***, p < 0.01: **, p < 0.05: *)
                      Estimate   Std. Error  t value  Pr(>|t|)
(Intercept***)        2.2762     0.6191      3.68     0.0007
months work           -0.0001    0.0002      -0.31    0.7545
past work*            0.0003     0.0002      2.20     0.0337
avg pagerank*         -25.6846   12.5072     -2.05    0.0464
avg betweenness       9.5871     13.9609     0.69     0.4961
size                  -0.0175    0.0116      -1.51    0.1394
avg degrees*          -0.0280    0.0118      -2.36    0.0229
unique locations*     0.3052     0.1346      -2.27    0.0287
min relative change*  0.3047     0.1233      2.47     0.0177

We found that community size has no significant relation
with the activity of the team for the given month. We accept
the null hypothesis for Hypothesis III. If we consider that
team sizes range between 6 and 51 developers, this finding
is rather interesting. The team size for a given month does not
represent the monthly activity of the team. We would like to
emphasize that the monthly activity of a team may not represent
its productivity.
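The tracing step used for the team evolution figure, linking each team to the most similar team of the previous month, can be sketched as follows. The similarity measure is not specified above, so the Jaccard index used here is an assumption; the team memberships and function names are hypothetical.

```python
def jaccard(a, b):
    """Shared members divided by all members of the two teams."""
    return len(a & b) / len(a | b)

def link_teams(prev_teams, curr_teams):
    """Map each current team id to the most similar team id of last month."""
    return {
        curr_id: max(prev_teams,
                     key=lambda prev_id: jaccard(prev_teams[prev_id], members))
        for curr_id, members in curr_teams.items()
    }

# Hypothetical team memberships for two successive months.
month_6 = {"A": {"d1", "d2", "d3", "d4"}, "B": {"d5", "d6", "d7"}}
month_7 = {"X": {"d1", "d2", "d3", "d8"},
           "Y": {"d5", "d6", "d7", "d9"},
           "Z": {"d4", "d10"}}

links = link_teams(month_6, month_7)
print(links)
```

In this sketch several current teams may link to the same previous team, which corresponds to a team splitting. A one-to-one matching would need an additional assignment step.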
B. Developer Team Stability
In Table II we observe that 5 parameters are correlated
significantly with the future relative change of a team. Past
relative change and average betweenness are significantly
positively correlated with future relative change. We can argue
that, the teams which have been relatively stable in the past
are likely to remain stable. Teams with members who are
more central in the network structure (in terms of betweenness
centrality measure) are also more likely to be stable in the
future. Four factors are significantly negatively correlated with
the future team relative change, including average degrees,
edges, size and unique locations.
The coefficients of the linear regression equation that fits
stability with the team factors are provided in Table III.
The factors past work, average pagerank, average degrees,
unique locations and minimum relative change are significant
in the regression equation. We observed, surprisingly, that the
coefficient of unique locations in the linear regression equation
is significantly negative. This trend implies that teams with
members from more locations (countries) tend to have more
stable structures. One possible reason for the stability of international teams is that their members have similar responsibilities
in the project, such as the development of project core assets.
The significance of minimum relative change and past work shows that teams
which changed little in the past, or teams whose members have
more past work, tend to remain stable in the future. The reason
for this trend may be the emergence of similar responsibilities
among the stable team members. On the other hand, teams
with popular members in terms of degree or centrality tend to
remain less stable in the future. Team members are popular
because they co-work with many people, and
their collaborators may change from month to month.
V. DISCUSSION
A. Implications of The Case Study
In our case study we have observed that teams among developers may emerge independent of organizational structure and
locations of the developers. In this section we discuss some
of the possible implications of these findings to the practice.
We found that the number of teams does not change and the
number of members in teams increases significantly over time.
In the long run this trend may be good for reducing employee
turnover vulnerability of the project since the removal of a
few team members would not disintegrate an entire team.
Collaboration teams imply concentrated activity on a part
of source code by the team members. The concentration of
activity may have been caused by the internal structure of
the software. In this aspect, large collaboration teams indicate
collective code ownership of the collaborated parts by many
people. On the other hand, large team sizes may reduce
productivity due to coordination problems. There are several
studies that highlight organizational challenges in large teams.
For example, Rodriguez et al. found that team sizes larger
than 9 reduce productivity significantly on 951 projects [23].
It should be noted that our definition of developer teams
is significantly different than the organization level team
definition of the ISBSG dataset used by Rodriguez et al. We
define teams based on the clusters in the collaboration network
while ISBSG dataset reports organizational developer team
sizes.
Team stability may have several implications on software
projects in the long term. Stable teams are known to be more
productive because of the increased team cohesion over time
[6]. Cohesion is known to be an important element in work
teams [6]. Pescosolido et al. state that cohesive teams develop
a sense of shared responsibility and success [24]. On the other
hand, stable team structures may be harmful in the long term
according to some proponents of agile methodologies since
collective code ownership of the source code may be low [25],
[26]. In our case study, we could not observe a trend of overspecialization among the developers. However in an earlier
study using the same dataset, we found that issue ownership
among developers tends to be imbalanced which may have
been caused by overspecialization [27].
B. Threats to Validity
In this section, we discuss possible threats to validity of our
study.
Threats to construct validity consider the relationship between theory and observation, in case the measured variables
do not measure the actual factors. We measured collaboration
as co-edits on the same function by two developers. The
collaboration network model may ignore some aspects of
developer cooperation activity [28].
We use function level granularity since the file sizes of the
project exceed several thousand lines of code. Some basic file
maintenance operations such as comment edits and documentation
mass changes may introduce noise to the collaboration data.
We filtered activity such as documentation edits and other
edits of the source code in order to model the collaboration
more accurately. We assume that co-changes in the same files in
the source code by a group of people make them a team.
This assumption does not take into account any collaborative
activity outside the source code such as meetings and email
messages.
We have used a large scale enterprise product to conduct
the study. Even though drawing general conclusions from an
empirical study is very difficult, results should be transferable
to other researchers with well-designed and controlled experiments. In addition, the analysis can be replicated on large open
source software.
VI. CONCLUSIONS
We examined the collaboration activity in an internationally developed large-scale software to answer the following
research questions:
1) How do developer teams emerge during software evolution?: Developer teams naturally emerge independent
of the organizational structure in large software. In this
case, the team formation may have been affected by
different factors. We found that the number of developer
teams remains the same (between three and five) over a
period of 13 months despite a threefold increase in the
number of developers. On the other hand, the size of the teams increases
significantly over time. We also found that the team
activity is not correlated with the team size. This finding
is especially interesting when we consider that team
sizes range between 6 and 51 during development.
2) Which factors related to teams affect the team stability over time?: We found that multiple factors about
the team size, activity and centrality affect the stability
of the team over time. That teams spread over more
distinct locations were not less stable was a
counter-intuitive finding.
A. Future Work
Comparing the empirical results with direct observations
of developer activity and identifying the factors which
may affect team formation may be considered as future
work. If we identify these factors, we can calibrate them
to form the desired collaboration teams naturally, without
directly changing team members or having any other direct
intervention. This would be an important step towards building
self-leading developer teams in software projects.
ACKNOWLEDGEMENTS
This research is supported in part by the Turkish State Planning
Organization (DPT) under the project number 2007K120610
and by NSERC Project number 402003-2012. We would like
to thank IBM Canada Lab – Toronto site for making their
development data available for research and for their strategic help
during all phases of this research. The opinions expressed in
this paper are those of the authors and not necessarily of IBM
Corporation.
REFERENCES
[1] J. Herbsleb and A. Mockus, “An empirical study of speed and
communication in globally distributed software development,” IEEE
Transactions on Software Engineering, vol. 29, no. 6, pp. 481–494,
Jun. 2003. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/
wrapper.htm?arnumber=1205177
[2] A. Begel, “Effecting change: Coordination in large-scale software
development,” Proceedings of CHASE, 2008. [Online]. Available:
http://research.microsoft.com/pubs/75110/effecting-change.pdf
[3] D. Easley and J. M. Kleinberg, Networks, Crowds, and Markets:
Reasoning about a Highly Connected World, draft ed. Cambridge
University Press, 2010.
[4] M. E. J. Newman and M. Girvan, “Finding and evaluating community
structure in networks,” Phys. Rev. E, vol. 69, no. 2, 2003. [Online].
Available: http://link.aps.org/doi/10.1103/PhysRevE.69.026113
[5] J. Hackman, “The design of work teams,” Handbook of organizational
behavior, 1987. [Online]. Available: http://groupbrain.wjh.harvard.edu/
jrh/pub/JRH1987 1.pdf
[6] E. Sundstrom, K. P. de Meuse, and D. Futrell, “Work teams:
Applications and effectiveness.” American Psychologist, vol. 45, no. 2,
pp. 120–133, 1990. [Online]. Available: http://doi.apa.org/getdoi.cfm?
doi=10.1037/0003-066X.45.2.120
[7] R. Breiger, “The duality of persons and groups,” Social forces, 1974.
[Online]. Available: http://sf.oxfordjournals.org/content/53/2/181.short
[8] G. Robles, J. M. González-Barahona, and I. Herraiz, “Evolution
of the core team of developers in libre software projects,” Mining
Software Repositories, 2009. ICSE Workshops MSR’09., pp. 167–170,
2009. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?
arnumber=5069497
[9] C. Bird, T. Zimmermann, and A. Teterev, “A theory of branches as
goals and virtual teams,” Proceeding of the 4th international workshop
on Cooperative and human aspects of software engineering - CHASE
’11, p. 53, 2011. [Online]. Available: http://portal.acm.org/citation.cfm?
doid=1984642.1984655
[10] S. Minto and G. Murphy, “Recommending emergent teams,” 2007.
ICSE Workshops MSR’07, no. Section 5, 2007. [Online]. Available:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4228642
[11] C. C. Manz, “Self-Leading Work Teams: Moving Beyond Self-Management Myths,” Human Relations, vol. 45, no. 11, pp.
1119–1140, Nov. 1992. [Online]. Available: http://hum.sagepub.com/
cgi/doi/10.1177/001872679204501101
[12] M. A. Marks, J. E. Mathieu, and S. J. Zaccaro, “A Temporally
Based Framework and Taxonomy of Team Processes,” The Academy
of Management Review, vol. 26, no. 3, p. 356, Jul. 2001. [Online].
Available: http://www.jstor.org/stable/259182?origin=crossref
[13] S. Even and G. Even, Graph Algorithms, ser. Graph Algorithms.
Cambridge University Press, 2011. [Online]. Available: http://books.
google.com.tr/books?id=m3QTSMYm5rkC
[14] S. Datta, R. Sindhgatta, and B. Sengupta, “Evolution of developer
collaboration on the jazz platform,” Proceedings of the 4th India
Software Engineering Conference on - ISEC ’11, pp. 21–30, 2011.
[Online]. Available: http://portal.acm.org/citation.cfm?doid=1953355.
1953359
[15] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Chapels
in the bazaar? latent social structure in oss,” in 16th ACM SigSoft
International Symposium on the Foundations of Software Engineering,
Atlanta, GA. Citeseer, 2008.
[16] A. Jermakovics, A. Sillitti, and G. Succi, “Mining and visualizing
developer networks from version control systems,” in Proceedings
of the 4th International Workshop on Cooperative and Human
Aspects of Software Engineering, ser. CHASE ’11. New York,
NY, USA: ACM, 2011, pp. 24–31. [Online]. Available: http:
//doi.acm.org/10.1145/1984642.1984647
[17] L. Wen, R. G. Dromey, and D. Kirk, “Software engineering and scale-free networks.” IEEE transactions on systems, man, and cybernetics.
Part B, Cybernetics : a publication of the IEEE Systems, Man, and
Cybernetics Society, vol. 39, no. 4, pp. 845–54, Aug. 2009. [Online].
Available: http://www.ncbi.nlm.nih.gov/pubmed/19380275
[18] A. Borici, K. Blincoe, A. Schröter, G. Valetto, and D. Damian, “Proxiscientia: Toward real-time visualization of task and developer dependencies
in collaborating software development teams,” in CHASE. IEEE, 2012,
pp. 5–11.
[19] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting
failures with developer networks and social network analysis,” in
Proceedings of the 16th ACM SIGSOFT International Symposium
on Foundations of software engineering. ACM, 2008, pp. 13–23.
[Online]. Available: http://portal.acm.org/citation.cfm?id=1453106
[20] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast
unfolding of communities in large networks,” Journal of Statistical
Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, Oct.
2008. [Online]. Available: http://stacks.iop.org/1742-5468/2008/i=10/a=
P10008?key=crossref.46968f6ec61eb8f907a760be1c5ace52
[21] M. Newman, Networks: An Introduction. OUP Oxford, 2010. [Online].
Available: http://books.google.com.tr/books?id=q7HVtpYVfC0C
[22] S. Brin and L. Page, “The anatomy of a large-scale hypertextual
Web search engine,” Computer Networks and ISDN Systems,
vol. 30, no. 1-7, pp. 107–117, Apr. 1998. [Online]. Available:
http://linkinghub.elsevier.com/retrieve/pii/S016975529800110X
[23] D. Rodríguez, M. Sicilia, E. García, and R. Harrison, “Empirical
findings on team size and productivity in software development,”
Journal of Systems and Software, vol. 85, no. 3, pp. 562–570, 2012,
novel approaches in the design and implementation of systems/software
architecture. [Online]. Available: http://www.sciencedirect.com/science/
article/pii/S0164121211002366
[24] A. T. Pescosolido and R. Saavedra, “Cohesion and Sports Teams:
A Review,” Small Group Research, vol. 43, no. 6, pp. 744–758,
Nov. 2012. [Online]. Available: http://sgr.sagepub.com/cgi/doi/10.1177/
1046496412465020
[25] L. Williams, R. Kessler, W. Cunningham, and R. Jeffries, “Strengthening
the case for pair programming,” Software, IEEE, vol. 17, no. 4, pp.
19–25, 2000. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.
jsp?arnumber=854064
[26] K. Beck, Extreme Programming Explained: Embrace Change, ser.
The XP Series. Addison-Wesley, 2000. [Online]. Available: http:
//books.google.co.uk/books?id=G8EL4H4vf7UC
[27] B. Caglayan and A. Bener, “Issue ownership activity in two large
software projects,” SIGSOFT Softw. Eng. Notes, vol. 37, no. 6, pp. 1–7,
Nov. 2012. [Online]. Available: http://doi.acm.org/10.1145/2382756.
2382786
[28] J. Aranda and G. Venolia, “The secret life of bugs: Going past the
errors and omissions in software repositories,” in Proceedings of the
31st International Conference on Software Engineering, ser. ICSE ’09.
Washington, DC, USA: IEEE Computer Society, 2009, pp. 298–308.
[Online]. Available: http://dx.doi.org/10.1109/ICSE.2009.5070530