1. Metadata Analysis

Software repositories such as CVS, Subversion, SourceForge, and GitHub store metadata such as commit comments, user-ids, timestamps, and other similar information. This metadata describes, respectively, the why, who, and when context of a source code change. Additional metadata about a source code change is available from other types of software repositories. Bugzilla is a defect/bug-tracking system that maintains the history of the entire lifecycle of a bug (or a feature). Each bug is maintained in the form of a record, termed a bug report. In addition to storing the description of a bug, it includes monitoring fields such as when the bug was reported, its assignment to a maintainer, its priority and severity, and its current state (open/closed). Archived communications in the form of e-mail lists capture discussions between developers over the lifetime of the project.

1.1 Bug-fixing change analysis

Delegating the responsibility of implementing change requests (e.g., bug fixes and new feature requests) to the developer(s) with the right expertise is by no means a trivial task for managers and technical leads in software maintenance. It typically requires project- and/or organization-wide knowledge and the balancing of many factors, all of which can be quite tedious if handled manually. Kagdi and Poshyvanyk [1] presented an approach that uses metadata to recommend a ranked list of developers to assist in performing software changes, given a textual change request. The approach employs a two-fold strategy. First, an information-retrieval technique locates the units of source code (e.g., files, classes, and methods) relevant to the given change request. These units of source code are then fed to a technique that recommends developers based on their source code change expertise, experience, and contributions, as derived from an analysis of previous commits. High accuracy was observed on a number of open source projects, such as Apache httpd, KOffice, and GNU gcc, across a wide variety of issues.
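To make the second step concrete, the following minimal sketch (in Python, with hypothetical commit records and an assumed recency-weighting scheme that is not part of the authors' approach) illustrates how developers could be ranked for the source code units returned by the information-retrieval step.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commit records: (author, files touched, commit date).
commits = [
    ("alice", {"net/http.c", "net/util.c"}, datetime(2009, 1, 10)),
    ("bob",   {"net/http.c"},               datetime(2009, 3, 2)),
    ("carol", {"gui/window.c"},              datetime(2009, 2, 20)),
]

def rank_developers(relevant_files, commits, today, half_life_days=180):
    """Score developers by how often, and how recently, they changed the
    files that the information-retrieval step deemed relevant."""
    scores = defaultdict(float)
    for author, files, when in commits:
        touched = files & relevant_files
        if not touched:
            continue
        age_days = (today - when).days
        recency = 0.5 ** (age_days / half_life_days)  # older work counts less
        scores[author] += len(touched) * recency
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank_developers({"net/http.c"}, commits, today=datetime(2009, 4, 1)))
# e.g. [('bob', 0.89...), ('alice', 0.73...)]
```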
1.2 Successful open-source development

When software repositories are mined, two distinct sources of information are usually explored: the history log and snapshots of the system. Results of analyses derived from these two sources are biased by the frequency with which developers commit their changes. The use of mainstream SCM systems influences the way developers work. For example, since it is tedious to resolve conflicts caused by parallel commits, developers tend to minimize conflicts by not modifying the same file at the same time. This, however, defeats one of the purposes of such systems. Widely adopted SCM systems are file-based. Combined with the checkout/checkin protocol, in which a developer checks out the code before an implementation session and checks in the changed files after an indefinite period of time, SCM systems lose precious information about source code changes, information that is impossible to recover even with elaborate mining and reverse engineering techniques. Hattori and Lanza [3] presented the Syde tool, which records every change by every developer in multi-developer projects at the exact time it happens. A change is defined as every saved file that has undergone at least one structural change since the last save action. Hence, the metadata about who changes what and when is accurate and complete. Syde's collector stores this metadata in a centrally accessible repository. The Notifier broadcasts the metadata to all members of the team, thus providing awareness of changes to all developers. Once a developer becomes aware that certain parts of the system have changed, he can request from the Syde server an update of those specific parts of the code, and the distributor updates the client's source base. Thus Syde uses metadata to augment the awareness of a team of developers by propagating changes as they happen.

For virtually every open source project there exists a developer mailing list, and for many projects this mailing list can be read through a web front-end by anyone who is interested. One important function of such a mailing list is that people can submit patches. Patches are change sets that can be applied to the software using a specific tool, the patch tool. Patches may, for example, introduce new features, fix bugs, translate strings into a different language, or restructure parts of the software (by applying refactorings). Unfortunately, not all submitted patches actually make it into the software, and analyses that ignore this may produce wrong predictions or suggestions. While there is a considerable amount of research on analyzing the change information stored in software repositories, only a few researchers have looked at software changes contained in email archives in the form of patches. Weißgerber et al. [24] looked at the email archives of two open source projects, FLAC and OpenAFS, to answer questions such as: How many emails contain patches? How long does it take for a patch to be accepted? Does the size of a patch influence its chances of being accepted, or the time until it is accepted? To answer such questions, they implemented a parser that extracts patches from e-mail archives in MBOX format; MBOX is a standardized format for storing a set of emails, e.g., all emails sent to a particular address or mailing list. Since patches of type context diff and unified diff contain the name of the modified file in their header, they searched for the application of each patch in all revisions of that file checked in after the patch was sent to the mailing list, in order to determine whether the patch was accepted.
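As a rough illustration of the patch-extraction step, the sketch below (in Python, using the standard mailbox module; the archive name dev-list.mbox and the diff-header heuristic are assumptions, not details of the parser used in the study) finds mails that appear to carry unified or context diffs.

```python
import mailbox
import re

# A unified or context diff starts with "--- ", "+++ ", or "*** " file headers;
# this regex is a simplified heuristic, not the parser used in the study.
DIFF_HEADER = re.compile(r"^(?:--- |\+\+\+ |\*\*\* )(\S+)", re.MULTILINE)

def patches_in_mbox(path):
    """Yield (message id, date, patched file names) for every mail in the
    archive that appears to carry a patch."""
    for msg in mailbox.mbox(path):
        payload = msg.get_payload(decode=True) or b""
        text = payload.decode("utf-8", errors="replace")
        files = set(DIFF_HEADER.findall(text))
        if files:
            yield msg["Message-Id"], msg["Date"], files

for msg_id, date, files in patches_in_mbox("dev-list.mbox"):
    print(date, sorted(files))
```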
SourceForge, the world's largest development and download repository of open-source code and applications, hosts more than 114,500 projects with 1.3 million registered users. These applications are downloaded and used by millions across the globe. However, there is little evidence that any of these applications have been tested against a set of rigorous requirements, and indeed, for most of them it is likely that no requirements specification exists. This makes testing such software a challenge. Elcock and Laplante [2] presented an approach for creating a set of behavioral requirements to be used to develop test cases. These behavioral requirements were developed for a small number of projects drawn from SourceForge using artifacts such as help files and software trails such as open and closed bug reports. These artifacts highlight implemented product features as well as expected behavior, and thus form the basis for behavioral requirements, which are then used to create test cases. This proved to be a viable technique for testing software when traditional requirements are unavailable or incomplete.

1.3 Software defects / faults predictors

Why is it that some programs are more failure-prone than others? This is one of the central questions of software engineering. To answer it, we must first know which programs are more failure-prone than others. One of the most abundant, widespread, and reliable sources of failure information is a bug database, which lists all the problems that occurred during the software's lifetime. Unfortunately, bug databases do not directly record how, where, and by whom the problem in question was fixed. This information is hidden in the version database, which records all changes to the software source code. In order to predict bugs, Zimmermann et al. [12] used CVS and Bugzilla. They identified fixes within commit messages by searching for references to bug reports such as "Fixed 42233" or "bug #23444". The trust level was increased when a message contained keywords such as "fixed" or "bug" or matched patterns like "# followed by a number". They then used the bug-tracking system to map bug reports to releases. Each bug report contains a field called "version" that lists the release for which the bug was reported; however, since the value of this field may change during the life cycle of a bug, only the first reported release was used. Two kinds of defects were distinguished: pre-release defects and post-release defects. Since the location of every fixed defect is known, the number of defects per location and release could be counted. The resulting Eclipse bug data set was used to build and assess models for defect prediction. To predict which files/packages have the most post-release defects, linear regression models were used: for each file/package the number of expected post-release defects was predicted, and the resulting ranking was compared to the observed ranking using Spearman correlation. The experiments showed that a combination of complexity metrics can predict defects, suggesting that the more complex the code is, the more defects it has. However, the predictions were far from perfect, and more research is being done in this area.
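The following sketch approximates such a fix-identification heuristic in Python; the exact patterns and trust levels used by Zimmermann et al. are more elaborate than these illustrative regular expressions.

```python
import re

# Heuristic patterns in the spirit of the examples quoted above; the actual
# rules and trust levels used in the study may differ.
BUG_ID   = re.compile(r"\b(?:bug|fix(?:ed|es)?)\b[^0-9]{0,10}#?(\d{3,6})", re.I)
KEYWORDS = re.compile(r"\b(fix(?:ed|es)?|bug|defect|patch)\b", re.I)

def classify_commit(message):
    """Return (bug ids referenced, trust level) for a commit message."""
    ids = BUG_ID.findall(message)
    trust = 0
    if ids:
        trust += 1                      # an explicit bug number raises confidence
    if KEYWORDS.search(message):
        trust += 1                      # a fix/bug keyword raises it further
    return ids, trust

print(classify_commit("Fixed 42233: crash when parsing empty manifest"))
# (['42233'], 2)
```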
2. Static Source Code Analysis

2.1 Bug finding and fixing

More and more software is developed across multiple sites in large, geographically distributed teams. As teams get larger and structures become more complex, finding the developers best suited for a specific task gets more difficult. Most automatic approaches use variants of the Line 10 Rule to determine experts for files. The name Line 10 Rule stems from a version control system that stores the author who performed the commit in line 10 of the commit's log message. Developers who changed a file are considered to have expertise for this file. This heuristic does not work for files and developers with little or no history. This kind of expertise is known as implementation expertise. Schuler and Zimmermann [4] introduced usage expertise, which recognizes that developers also accumulate expertise by calling (using) methods. First, CVS commits are reconstructed with a sliding-time-window approach; a reconstructed commit is a set of revisions, where each revision is the result of a single check-in. Method calls that have been inserted are then computed for each commit operation. Every commit also has a set of changed locations: method bodies, classes, or files. For every changed location, the set of added method calls is computed by comparing the abstract syntax trees before and after the commit. As a result, the set of new calls is obtained for each commit. The set of changed locations, the set of new calls, and the author of the commit serve as the main input for the construction of expertise profiles. Expertise profiles help to locate and recommend experts, to support collaboration and communication between developers, and to analyze and inform about developers' usage of APIs.
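A minimal sketch of the profile-building step is shown below, assuming that reconstructed commits with their added method calls are already available (the sliding-window reconstruction and AST comparison are abstracted away); the data layout and names are illustrative, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Reconstructed commits: each carries its author and the set of method calls
# that the AST comparison found to be newly added (extraction not shown here).
reconstructed_commits = [
    {"author": "dave", "added_calls": {"List.add", "Iterator.next"}},
    {"author": "erin", "added_calls": {"List.add"}},
    {"author": "dave", "added_calls": {"Connection.prepareStatement"}},
]

def build_profiles(commits):
    """Map each developer to a counter of the API methods they have called."""
    profiles = defaultdict(Counter)
    for commit in commits:
        profiles[commit["author"]].update(commit["added_calls"])
    return profiles

def experts_for(method, profiles):
    """Rank developers by how often they introduced calls to `method`."""
    ranking = [(dev, calls[method]) for dev, calls in profiles.items()
               if calls[method] > 0]
    return sorted(ranking, key=lambda kv: kv[1], reverse=True)

profiles = build_profiles(reconstructed_commits)
print(experts_for("List.add", profiles))  # [('dave', 1), ('erin', 1)]
```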
2.2 Function Usage Patterns

Programmers commonly reuse existing frameworks or libraries to reduce software development effort. One common problem in reusing existing frameworks or libraries is that programmers know what type of object they need, but do not know how to obtain that object through a specific method sequence. To help programmers address this issue, Thummalapenta and Xie [5] developed an approach, called PARSEWeb, that takes queries of the form "source object type → destination object type" as input and suggests relevant method-invocation sequences that can serve as solutions yielding the destination object from the source object given in the query. PARSEWeb interacts with a code search engine (CSE) to gather relevant code samples and performs static analysis over the gathered samples to extract the required sequences. Because code samples are collected on demand through the CSE, the approach is not limited to any specific set of frameworks or libraries, and it has proven effective in addressing programmers' queries.

Code Conjurer [6] is an Eclipse plug-in that extracts interface and test information from a developer's coding activities and uses this information to issue test-driven searches to a code-search engine. It presents components matching the developer's needs as reuse recommendations without disturbing the development work. Automated dependency resolution then allows selected components to be woven into the current project with minimal manual effort. Sourcerer [7] is an infrastructure for large-scale indexing and analysis of open source code. By taking full advantage of structural information, Sourcerer provides a foundation on which state-of-the-art search engines and related tools can be built easily.

Software developers spend considerable effort implementing auxiliary functionality used by the main features of a system (e.g., compressing/decompressing files, encrypting/decrypting data, scaling/rotating images). With the increasing amount of open source code available on the Internet, time and effort can be saved by reusing these utilities through informal practices of code search and reuse. However, when this type of reuse is performed in an ad hoc manner, it can be tedious and error-prone: code results have to be manually inspected and extracted into the workspace. Lemos et al. [8] introduced the use of test cases as an interface for automating code search and reuse, and evaluated its applicability and performance for the reuse of auxiliary functionality. Their approach is called Test-Driven Code Search (TDCS). Test cases serve two purposes: (1) they define the behavior of the desired functionality to be searched for, and (2) they test the matching results for suitability in the local context. They developed CodeGenie, an Eclipse plug-in that performs TDCS using Sourcerer.
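The idea of a test case as a search interface can be illustrated as follows; this sketch uses Python and unittest rather than the Java/JUnit setting of CodeGenie, and the ROT13 example and function name are invented for illustration only.

```python
import unittest
import codecs

def candidate_rot13(text):
    """Stand-in for a search result; in a TDCS tool this slot would be filled
    by code retrieved from a code search engine such as Sourcerer."""
    return codecs.encode(text, "rot13")

class Rot13SearchInterface(unittest.TestCase):
    """The test doubles as the search query: the called name and signature
    describe what we want, and the assertions check whether a candidate fits
    the local context."""

    def test_encodes_and_round_trips(self):
        self.assertEqual(candidate_rot13("attack at dawn"), "nggnpx ng qnja")
        self.assertEqual(candidate_rot13(candidate_rot13("xyz")), "xyz")

if __name__ == "__main__":
    unittest.main()
```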
2.3 Communication via source code comments

Program comments have long been a common practice for improving inter-programmer communication and code readability, by explicitly specifying programmers' intentions and assumptions. Unfortunately, comments are not used to their maximum potential: since most comments are written in natural language, they are very difficult to analyze automatically, and unlike source code, comments cannot be tested. As a result, incorrect or obsolete comments can mislead programmers and introduce new bugs later. Tan, Yuan, and Zhou [9] investigated how to exploit comments beyond their common usage. They studied the feasibility and benefits of automatically analyzing comments to detect software bugs and bad comments. Using Linux as a demonstration case, they found that a significant percentage of comments concern "hot topics" such as synchronization and memory allocation, indicating that comment analysis may first focus on hot topics instead of trying to "understand" arbitrary comments. They also examined several open source bug databases and found that bad or inconsistent comments have introduced bugs, underscoring the importance of maintaining comments and detecting inconsistent ones.

3. Source Code Differencing and Analysis

3.1 Semantic differencing

During software evolution, information about changes between different versions of a program is useful for a number of software engineering tasks. For example, configuration-management systems can use change information to assess possible conflicts among updates from different users. As another example, in regression testing, knowledge about which parts of a program are unchanged can help in identifying test cases that need not be rerun. For many of these tasks, purely syntactic differencing may not provide enough information for the task to be performed effectively. This problem is especially relevant for object-oriented software, where a syntactic change can have subtle and unforeseen effects. Apiwattanapong et al. [10] presented a technique for comparing object-oriented programs that identifies both differences and correspondences between two versions of a program. The technique is based on a representation that handles object-oriented features and can thus capture the behavior of object-oriented programs. They introduced JDiff, a tool that implements the technique for Java programs. One of the main advantages of this technique over traditional text-based differencing approaches is that it can account for the effects of changes due to object-oriented features. Four empirical studies were performed on many versions of two medium-sized subjects, Daikon and Jaba. Daikon is a tool for discovering likely program invariants developed by Ernst and colleagues, whereas Jaba is an analysis framework for Java programs. The results show that JDiff is more efficient and effective than existing tools.

3.2 Syntactic differencing for fine-grained analyses

A key issue in software evolution analysis is the identification of the particular changes that occur across several versions of a program. Change distilling [11] is a tree-differencing algorithm for fine-grained source code change extraction. It extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that transforms one tree into the other given the computed matching. As a result, fine-grained change types are identified between program versions. The change distilling algorithm was evaluated with a benchmark consisting of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. It significantly improved the extraction of source code change types over existing approaches: it approximated the minimum edit script 45 percent better than the original change extraction approach, found all occurring changes, and almost reached the minimum conforming edit script, with a mean absolute percentage error of 34 percent compared to the 79 percent reached by the original algorithm.
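The sketch below is a deliberately simplified, flat illustration of the two ingredients named above, matching elements of two versions and deriving an edit script; the actual change distilling algorithm works on full abstract syntax trees with similarity-based node matching, which this toy model does not attempt.

```python
import difflib

# Leaf statements of a method body in two versions; real change distilling
# operates on ASTs with similarity-based matching, this is a flat toy model.
old = ["int x = 0;", "if (x > 0)", "  log(x);", "return x;"]
new = ["int x = 1;", "if (x > 0)", "  log(x);", "  audit(x);", "return x;"]

def edit_script(before, after):
    """Derive a coarse edit script (insert/delete/update of statements)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=before, b=after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            ops += [("update", s, t) for s, t in zip(before[i1:i2], after[j1:j2])]
        elif tag == "delete":
            ops += [("delete", s) for s in before[i1:i2]]
        elif tag == "insert":
            ops += [("insert", s) for s in after[j1:j2]]
    return ops

for op in edit_script(old, new):
    print(op)
# ('update', 'int x = 0;', 'int x = 1;')
# ('insert', '  audit(x);')
```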
Evolizer [13] is a platform for software evolution analysis within Eclipse. It is similar to Kenyon or eROSE, but it systematically integrates change history with version and bug data. Evolizer provides a set of metamodels to represent software project data, along with importer tools to obtain this data from software project repositories. The current implementation supports importing and representing data from the version control systems CVS and SVN, the bug-tracking system Bugzilla, Java source code, and fine-grained source code changes, as well as the integration of these models. Using the Eclipse plug-in extension facilities and the Hibernate object-relational mapping framework (www.hibernate.org), extending existing metamodels and data importers in Evolizer, or adding new ones, is straightforward: models are defined by Java classes, annotated with Hibernate tags, and added to the list of model classes. Evolizer loads this list of classes and provides it to the other Evolizer plug-ins for accessing the software evolution data. Using the Eclipse plug-in mechanism with extensible metamodels is Evolizer's main advantage over existing mining tools.

3.3 Identification of refactorings in changes

Ratzinger et al. [14] analyzed the influence of evolution activities such as refactoring on software defects. In a case study of five open source projects (ArgoUML, JBoss Cache, Liferay Portal, the Spring Framework, and XDoclet), attributes of software evolution were used to predict defects in time periods of six months. Versioning and issue-tracking systems were used to extract 110 data mining features, which were separated into refactoring-related and non-refactoring-related features. These features were used as input to classification algorithms that create prediction models for software defects. They found that both refactoring-related and non-refactoring-related features lead to high-quality prediction models. Additionally, refactorings and defects show an inverse correlation: the number of software defects decreases if the number of refactorings increases in the preceding time period. They conclude that refactoring should be a significant part of both bug fixes and other evolutionary changes in order to reduce software defects.

Murphy-Hill et al. [15] observed that much of what we know about how programmers refactor in the wild is based on studies that examine just a few software projects, and that researchers have rarely taken the time to replicate these studies in other contexts or to examine the assumptions on which they are based. To help put refactoring research on a sound scientific basis, they drew conclusions from four data sets spanning more than 13,000 developers, 240,000 tool-assisted refactorings, 2,500 developer hours, and 3,400 version control commits. Using these data, they found that programmers frequently do not indicate refactoring activity in commit logs, which contradicts assumptions made by several previous researchers. They were, however, able to confirm the assumption that programmers frequently intersperse refactoring with other program changes.

4. Software Metrics

4.1 Complexity of different changes

When building software quality models, the approach often consists of training data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project whose quality we want to predict. To improve the predictive accuracy of such quality models, it is common practice to combine the predictive results of multiple learners in order to take advantage of their respective biases. Although multi-learner classifiers have proven successful in some cases, the improvement is not always significant, because the information in the fit dataset can be insufficient. Khoshgoftaar et al. [16] presented a method for building software quality models that uses majority voting to combine the predictions of multiple learners induced on multiple training datasets. Experimental results from a large-scale empirical study involving seven real-world datasets and seventeen learners show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves predictive performance compared to one learner induced on a single fit dataset. Combining multiple learners trained on a single training dataset, by contrast, does not significantly improve the average predictive accuracy compared to a single learner induced on a single fit dataset.
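A minimal sketch of the majority-voting combination is shown below; the module names and individual predictions are invented, and a real study would obtain them from learners such as C4.5 or logistic regression trained on different fit datasets.

```python
from collections import Counter

# Predictions ("fp" = fault-prone, "nfp" = not fault-prone) from the same
# learner trained on three different fit datasets; values are illustrative.
predictions = {
    "moduleA": ["fp", "fp", "nfp"],
    "moduleB": ["nfp", "nfp", "nfp"],
    "moduleC": ["fp", "nfp", "fp"],
}

def majority_vote(votes):
    """Return the label predicted by most of the base models."""
    return Counter(votes).most_common(1)[0][0]

for module, votes in predictions.items():
    print(module, "->", majority_vote(votes))
# moduleA -> fp, moduleB -> nfp, moduleC -> fp
```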
4.2 Change-prone classes and change-couplings

Source code coupling and change history are two important data sources for change coupling analysis, and the popularity of public open source projects makes both sources available. Zhou et al. [25] inspected different dimensions of software changes, including change significance and source code dependency levels, extracted a set of features from the two sources, and proposed a change propagation model based on a Bayesian network for change coupling prediction. By combining features from the co-changed entities and their dependency relations, the approach models the underlying uncertainty. An empirical case study on two medium-sized open source projects, Azureus and ArgoUML, demonstrated the feasibility and effectiveness of the approach compared to previous work.

4.3 Types of changes and origin analysis

Software evolution research is limited by the amount of information available to researchers: current version control tools do not store all the information generated by developers. They do not record every intermediate version of the system, but only the snapshots taken when a developer commits source code into the repository. Additionally, most software evolution analysis tools are not part of day-to-day programming activities, because such tools are resource intensive and not integrated in development environments. Robbes and Lanza [17] proposed to model development information as change operations retrieved directly from the programming environment the developers are using, while they are effecting changes to the system. This accurate and incremental information opens new ways for both developers and researchers to explore and evolve complex systems. They built a toolset named SpyWare [18] which, using a monitoring plug-in for integrated development environments (IDEs), tracks the changes a developer performs on a program as they happen. SpyWare stores these first-class changes in a change repository and offers a plethora of productivity-enhancing IDE extensions that exploit the recorded information.

4.4 Validation of defect detectors

Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be induced that offer software managers the insights they need to tackle these quality and budgeting problems efficiently. Vandecruys et al. [19] used the Ant Colony Optimization (ACO)-based classification technique in a tool called AntMiner+ to predict erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models produced by AntMiner+ achieve a predictive accuracy competitive with that of models induced by several other classification techniques, such as C4.5, logistic regression, and support vector machines, while the comprehensibility of the AntMiner+ models is superior to that of the latter models.

4.5 Predicting post-release failures

In software development, resources for quality assurance are limited by time and by cost. In order to allocate resources effectively, managers need to rely on their experience backed by code complexity metrics. But dependencies often exist between various pieces of code over which managers may have little knowledge; these dependencies can be viewed as a low-level graph of the entire system. Zimmermann and Nagappan [20] proposed to use network analysis on these dependency graphs, which allows managers to identify central program units that are more likely to face defects. In an evaluation on Windows Server 2003, the recall of models built from network measures was 10 percentage points higher than that of models built from complexity metrics. In addition, network measures identified 60% of the binaries that the Windows developers considered critical, twice as many as identified by complexity metrics.
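A small sketch of this idea using the networkx library (an assumed tool choice, not the one used in the study) computes centrality measures over a toy dependency graph and ranks binaries for inspection.

```python
import networkx as nx

# A toy dependency graph between binaries; an edge A -> B means A depends on B.
deps = nx.DiGraph([
    ("gui.dll", "core.dll"), ("net.dll", "core.dll"),
    ("core.dll", "crypto.dll"), ("tools.exe", "net.dll"),
])

# Network measures of the kind used alongside complexity metrics:
# degree and betweenness centrality highlight centrally placed binaries.
degree = nx.degree_centrality(deps)
betweenness = nx.betweenness_centrality(deps)

ranked = sorted(deps.nodes, key=lambda n: degree[n] + betweenness[n], reverse=True)
print("inspect first:", ranked[:2])  # central binaries such as core.dll
```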
5. Visualization

5.1 Co-changing files

Traceability link recovery has been a subject of investigation for many years within the software engineering community. Having explicit, documented traceability links between various artifacts (e.g., source code and documentation) is vital for a variety of software maintenance tasks, including impact analysis, program comprehension, and requirements assurance of high-quality systems. Such links are particularly useful for supporting source code comprehension when they exist between the source code, design, and requirements documentation, as they help explain why a particular function or class exists in the program. Kagdi et al. [21] presented an approach to recover/discover traceability links between software artifacts via the examination of a software system's version history. They applied a heuristic-based approach that uses sequential-pattern mining on the commits in software repositories to uncover highly frequent co-changing sets of artifacts. The underlying idea is that if different types of files are committed together with high frequency, then there is a high probability that a traceability link exists between them. The approach was evaluated on a number of versions of the open source system KDE (K Desktop Environment). The results show that the approach is able to uncover traceability links between various types of software artifacts (e.g., source code files, change logs, user documentation, and build files) with high accuracy. It can readily be applied, in conjunction with other link recovery methods, to produce a more complete picture of the traceability of a software system.
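As a simplified illustration, the following sketch counts unordered file pairs that are frequently committed together; the approach described above mines richer sequential patterns over ordered changesets, so this is only an approximation of the idea.

```python
from collections import Counter
from itertools import combinations

# File sets of individual commits (illustrative data).
commits = [
    {"parser.cpp", "parser.h", "ChangeLog"},
    {"parser.cpp", "parser.h"},
    {"gui.cpp", "gui.ui"},
    {"parser.cpp", "parser.h", "docs/parser.docbook"},
]

def frequent_pairs(commits, min_support=2):
    """Return file pairs that change together in at least `min_support` commits."""
    counts = Counter()
    for files in commits:
        counts.update(combinations(sorted(files), 2))
    return [(pair, n) for pair, n in counts.items() if n >= min_support]

for pair, support in frequent_pairs(commits):
    print(pair, support)
# ('parser.cpp', 'parser.h') co-change in 3 commits: a candidate traceability link
```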
5.2 Structural and architectural changes

Voinea and Telea [22] present structural and architectural changes visually as follows. First, they construct an infrastructure that allows generic querying and data mining of different types of software repositories, such as CVS and Subversion. Using this infrastructure, several models of the software source code are built at different levels of detail, ranging from project and package down to function and code line. Second, three views allow examining the architectural changes at different levels of detail: the file view shows changes at the line level across many versions of a single file or a few files; the project view shows changes at the file level across entire software projects; and the decomposition view shows changes at the subsystem level across entire projects. The toolset provides the standard basic management tools of SCM systems via its integrated CVS client. The tool easily scales to handle huge projects of thousands of files, hundreds of releases, and tens of developers (e.g., VTK, ArgoUML, PostgreSQL, the X Window System, and Mozilla).

5.3 Change smells and refactorings

Refactoring is a hot and controversial issue. Supporters claim that it helps increase the quality of the code, making it easier to understand, modify, and maintain. There are also claims that refactoring yields higher development productivity; however, there is only limited empirical evidence for this assumption. Moser et al. [23] conducted a case study to assess the impact of refactoring in a close-to-industrial environment, PROM. The results indicate that refactoring not only increases aspects of software quality but also improves productivity; it prevents an explosion of complexity and coupling metrics by driving developers toward simpler designs. They studied application development for mobile devices, but their findings should be applicable to small teams working in similar, highly volatile domains. Additional research is needed, however, to confirm this and to generalize the results to other contexts.

References:

[1] H. Kagdi and D. Poshyvanyk, "Who can help me with this change request?", in ICPC '09: Proceedings of the 17th International Conference on Program Comprehension, pages 273–277. IEEE Computer Society, May 2009.
[2] A. Elcock and P. A. Laplante, "Testing software without requirements: using development artifacts to develop test cases", Innovations in Systems and Software Engineering, 2:137–145, 2006.
[3] L. Hattori and M. Lanza, "Mining the history of synchronous changes to refine code ownership", in Proceedings of MSR 2009 (6th IEEE Working Conference on Mining Software Repositories), pages 141–150. IEEE CS Press, 2009.
[4] D. Schuler and T. Zimmermann, "Mining usage expertise from version archives", in Proceedings of the 2008 International Working Conference on Mining Software Repositories (MSR 2008), Leipzig, Germany, May 10–11, 2008.
[5] S. Thummalapenta and T. Xie, "PARSEWeb: A programmer assistant for reusing open source code on the web", in Proc. Int'l Conf. Automated Software Engineering, pages 204–213. ACM Press, 2007.
[6] O. Hummel, W. Janjic, and C. Atkinson, "Code Conjurer: Pulling reusable software out of thin air", IEEE Software, 25(5):45–52, 2008.
[7] S. Bajracharya, J. Ossher, and C. Lopes, "Sourcerer: An internet-scale software repository", in Proceedings of the 1st Intl. Workshop on Search-driven Development, Users, Interfaces, Tools, and Evaluation, 2009.
[8] O. Lemos, S. K. Bajracharya, J. Ossher, P. C. Masiero, and C. Lopes, "Applying test-driven code search to the reuse of auxiliary functionality", in 24th Annual ACM Symposium on Applied Computing (SAC 2009), 2009.
[9] L. Tan, D. Yuan, and Y. Zhou, "HotComments: How to make program comments more useful?", in Proceedings of Hot Topics in Operating Systems, 2007.
[10] T. Apiwattanapong, A. Orso, and M. J. Harrold, "JDiff: A differencing technique and tool for object-oriented programs", Automated Software Engineering, 14(1):3–36, March 2007.
[11] B. Fluri, M. Würsch, M. Pinzger, and H. C. Gall, "Change distilling: Tree differencing for fine-grained source code change extraction", IEEE Transactions on Software Engineering, 33(11):725–743, 2007.
[12] T. Zimmermann, R. Premraj, and A. Zeller, "Predicting defects for Eclipse", in Proc. 3rd International Workshop on Predictor Models in Software Engineering (PROMISE '07), Minneapolis, MN, USA, May 2007.
[13] H. C. Gall, B. Fluri, and M. Pinzger, "Change analysis with Evolizer and ChangeDistiller", IEEE Software, 26:26–33, 2009.
[14] J. Ratzinger, T. Sigmund, and H. C. Gall, "On the relation of refactorings and software defect prediction", in MSR '08: Proceedings of the 2008 International Workshop on Mining Software Repositories, pages 35–38. ACM, New York, 2008.
[15] E. Murphy-Hill, C. Parnin, and A. Black, "How we refactor, and how we know it", in IEEE 31st International Conference on Software Engineering, 2009.
[16] T. M. Khoshgoftaar, P. Rebours, and N. Seliya, "Software quality analysis by combining multiple projects and learners", Software Quality Control, 17(1):25–49, March 2009.
[17] R. Robbes and M. Lanza, "A change-based approach to software evolution", Electronic Notes in Theoretical Computer Science (ENTCS), 166:93–109, January 2007.
[18] R. Robbes and M. Lanza, "SpyWare: A change-aware development toolset", in Proceedings of ICSE 2008 (30th International Conference on Software Engineering), pages 847–850. ACM Press, New York, 2008.
[19] O. Vandecruys, D. Martens, B. Baesens, C. Mues, M. De Backer, and R. Haesen, "Mining software repositories for comprehensible software fault prediction models", Journal of Systems and Software, 81(5):823–839, 2008.
[20] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs", in 29th International Conference on Software Engineering, 2007.
[21] H. H. Kagdi, J. I. Maletic, and B. Sharif, "Mining software repositories for traceability links", in ICPC 2007, pages 145–154. IEEE Computer Society, 2007.
[22] L. Voinea and A. Telea, "Visual data mining and analysis of software repositories", Computers & Graphics, 31(3):410–428, 2007.
[23] R. Moser, P. Abrahamsson, W. Pedrycz, A. Sillitti, and G. Succi, "A case study on the impact of refactoring on quality and productivity in an agile team", in Proc. of the 2nd IFIP Central and East European Conference on Software Engineering Techniques (CEE-SET 2007), Poznan, Poland, 2007.
[24] P. Weißgerber, D. Neu, and S. Diehl, "Small patches get in!", in Proceedings of the Fifth International Workshop on Mining Software Repositories (MSR 2008). IEEE Press, Los Alamitos, 2008.
[25] Y. Zhou, M. Würsch, E. Giger, H. Gall, and J. Lü, "A Bayesian network based approach for change coupling prediction", in Proceedings of the Working Conference on Reverse Engineering (WCRE), pages 27–36. IEEE Computer Society, Washington, DC, USA, 2008.