Survey Paper - The Lack Thereof

1. Metadata Analysis
Software repositories like CVS, Subversion, SourceForge, and GitHub store metadata such as commit comments, user-ids,
timestamps, and other similar information. This metadata describes, respectively, the why, who, and when context of a source
code change. Additional metadata regarding a source code change is available from other types of software repositories.
Bugzilla is a defect/bug-tracking system that maintains the history of the entire lifecycle of a bug (or a feature). Each bug is
maintained in the form of a record, termed a bug report. In addition to storing the description of a bug, it includes monitoring
fields such as when a bug was reported, assignment to a maintainer, priority, severity, and current state (open/closed).
Archived communications in the form of e-mail lists capture discussions between developers over the lifetime of the project.
1.1 Bug-fixing change analysis
Delegating the responsibility of implementing change requests (e.g., bug fixes and new feature requests) to the developer(s)
with the right expertise is by no means a trivial task in software maintenance for managers and technical leads. Typically it
involves project- and/or organization-wide knowledge and balancing many factors, all of which can be quite tedious if handled
manually. Kagdi and Poshyvanyk [1] presented an approach that uses metadata to recommend a ranked list of developers to
assist in performing software changes given a textual change request. The approach employs a two-fold strategy. First, a
technique based on information retrieval is put to work to locate the units of source code (e.g., files, classes, and
methods) relevant to a given change request. These units of source code are then fed to a technique that recommends developers based
on their source code change expertise, experience, and contributions, as derived from the analysis of the previous commits. A
high accuracy was observed on a number of open source projects such as Apache httpd, KOffice, and GNU gcc across a wide
variety of issues.
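The two-fold strategy above can be sketched in a few lines of Python. All file names, terms, and developer names below are hypothetical toy data, a rough stand-in for the information-retrieval step and the commit-history analysis:

```python
from collections import Counter

# Hypothetical mined data: which developers changed each file...
commit_history = {
    "http_core.c":  ["alice", "alice", "bob"],
    "mod_ssl.c":    ["carol", "carol", "carol"],
    "util_cache.c": ["bob"],
}
# ...and each file's identifiers/comments flattened into a bag of words.
file_terms = {
    "http_core.c":  "request handler core dispatch response",
    "mod_ssl.c":    "ssl certificate handshake encrypt tls",
    "util_cache.c": "cache expire store lookup",
}

def recommend(change_request, top_files=2):
    """Step 1: rank files by word overlap with the request (a crude
    stand-in for information retrieval). Step 2: rank developers by
    their commit counts to the top-ranked files."""
    query = set(change_request.lower().split())
    ranked = sorted(file_terms,
                    key=lambda f: len(query & set(file_terms[f].split())),
                    reverse=True)[:top_files]
    votes = Counter()
    for f in ranked:
        votes.update(commit_history[f])
    return [dev for dev, _ in votes.most_common()]

print(recommend("fix ssl handshake bug"))  # -> ['carol', 'alice', 'bob']
```

The real approach uses a full IR engine rather than word overlap, but the pipeline shape (textual query, then ranked code units, then ranked contributors) is the same.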
1.2 Successful open-source development
When software repositories are mined, two distinct sources of information are usually explored: the history log and snapshots
of the system. Results of analyses derived from these two sources are biased by the frequency with which developers commit
their changes. The usage of mainstream SCM systems influences the way that developers work. For example, since it is tedious
to resolve conflicts due to parallel commits, developers tend to minimize conflicts by not concurrently modifying the same
file. This, however, defeats one of the purposes of such systems. Widely adopted SCM systems are file-based. Combined with
the checkout/checkin protocol, where a developer checks out the code before an implementation session, and checks in the
changed files after an indefinite period of time, SCM systems lose precious information about source code changes that are
impossible to recover even with elaborate mining and reverse engineering techniques. Hattori and Lanza [3] presented
Syde, a tool that records every change by every developer in multi-developer projects at the exact time it happens. A change
is defined as every saved file that has undergone at least one structural change since the last save action. Hence, the metadata
about who changes what and when is accurate and complete. Syde’s collector stores metadata in a centrally accessible
repository. The Notifier broadcasts the metadata to all members of the team, thus providing awareness of changes to all
developers. Once a developer has become aware that certain parts of the system have changed, they can request from the
Syde server an update of those specific parts of the code, and the distributor updates the client’s source base. Thus Syde uses
metadata to augment the awareness of a team of developers by propagating changes as they happen.
For virtually every open source project there exists a developer mailing list. For many open source projects this mailing list can
also be read by everyone who is interested using a web front-end. One important function of such a mailing list is that people
can submit patches. Patches are change sets that can be applied to the software using a specific tool: the patch tool. Patches
may, for example, introduce new features, fix bugs, translate strings to a different language, or re-structure parts of the
software (by applying refactorings). Unfortunately, not all submitted patches actually make it into the software, so treating
every patch as an accepted change may lead to wrong predictions or suggestions. While there is a considerable amount of
research on analyzing the change information stored in software repositories, only a few researchers have looked at software
changes contained in email archives in the form of patches. Weißgerber et al. [24] looked at the email archives of two open source projects, FLAC and OpenAFS, to
answer questions like the following: How many emails contain patches? How long does it take for a patch to be accepted? Does
the size of the patch influence its chances to be accepted or the duration until it gets accepted? To answer such questions, they
implemented a parser that extracts patches from e-mail archives in MBOX format. MBOX is a standardized format to store a set
of emails, e.g., all emails sent to a particular address or mailing list. As patches of type context diff and unified diff contain the
name of the modified file in their header, they identified whether a patch was accepted by looking for its application in all
revisions of that file checked in after the patch was sent to the mailing list.
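A minimal version of such a patch extractor can be written with Python's standard `mailbox` module. This is only a sketch: it detects unified-diff headers in single-part message bodies and skips the multipart handling a real parser would need; the archive path is hypothetical.

```python
import mailbox
import re

# Unified diffs start with a "--- <old file>" / "+++ <new file>" header pair.
DIFF_HEADER = re.compile(r"^---\s+\S+\n\+\+\+\s+\S+", re.MULTILINE)

def patches_in_archive(mbox_path):
    """Yield (subject, new-file name) for every email in an MBOX
    archive whose body contains a unified diff."""
    for msg in mailbox.mbox(mbox_path):
        body = msg.get_payload(decode=False)
        if not isinstance(body, str):
            continue  # sketch: skip multipart messages
        m = DIFF_HEADER.search(body)
        if m:
            # Second header line ("+++ <file>") names the modified file.
            new_file = body[m.start():].splitlines()[1].split()[1]
            yield msg["subject"], new_file
```

From here, checking acceptance means looking up later revisions of the named file in the version archive and testing whether the diff applies, as the study describes.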
SourceForge, the world’s largest development and download repository of open-source code and applications, hosts more than
114,500 projects with 1.3 million registered users. These applications are downloaded and used by millions across the globe.
However, there is little evidence that any of these applications have been tested against a set of rigorous requirements and
indeed, for most of these applications, it is likely that no requirements specification exists, which makes such software
challenging to test. Elcock and Laplante [2] presented an approach to create a set of behavioral requirements to be used to develop test
cases. These behavioral requirements were developed for a small number of projects derived from SourceForge using artifacts
such as help files and software trails such as open and closed bug reports. These artifacts highlight implemented product
features as well as expected behavior, which form the basis for behavioral requirements. The behavioral requirements are then
used to create test cases. This proved to be a viable technique for testing software when traditional requirements are
unavailable or incomplete.
1.3 Software defect / fault predictors
Why is it that some programs are more failure-prone than others? This is one of the central questions of software engineering.
To answer it, we must first know which programs are more failure-prone than others. One of the most abundant, widespread,
and reliable sources for failure information is a bug database, listing all the problems that occurred during the software
lifetime. Unfortunately, bug databases do not directly record how, where, and by whom the problem in question was fixed. This
information is hidden in the version database, recording all changes to the software source code. In order to predict bugs,
Zimmermann et al. [12] used CVS and Bugzilla. They identified fixes within the commit messages by searching for references to
bug reports such as “Fixed 42233” or “bug #23444”. The trust level was increased when a message contained keywords such as
“fixed” or “bug” or matched patterns like “# and a number”. Then they used the bug tracking system to map bug reports to
releases. Each bug report contains a field called “version” that lists the release for which the bug was reported; however, since
the values of this field may change during the life cycle of a bug, only the first reported release was used. Two different kinds of
defects were distinguished: pre-release defects and post-release defects. Since the location of every defect that has been fixed
is known, the number of defects per location and release were counted. The Eclipse bug data set was used to build and assess
models for defect prediction. In order to predict the files/packages with the most post-release defects, linear regression
models were used. For each file/package, the number of expected post-release defects was predicted, and the resulting
ranking was compared to the observed ranking using Spearman correlation. The experiments showed that a combination of
complexity metrics can predict defects, suggesting that the more complex the code is, the more defects it has.
However, the predictions were far from perfect, and more research is being done in this area.
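The commit-message heuristics described above can be sketched with two regular expressions. The exact patterns and trust levels here are illustrative, not the study's actual rules:

```python
import re

# Branch 1: a bug/fix keyword followed by a number; branch 2: "#<number>".
BUG_REF = re.compile(r"\b(?:bug|fix(?:ed|es)?)\b.*?(\d{3,})|#(\d+)",
                     re.IGNORECASE)
# Keywords that raise the trust level of a match.
KEYWORDS = re.compile(r"\b(fixed|bug)\b", re.IGNORECASE)

def link_bug(message):
    """Return (bug_id, trust) for a commit message, or None.
    Trust is 1 for a bare reference, 2 when a keyword or a
    '#number' pattern backs it up."""
    m = BUG_REF.search(message)
    if not m:
        return None
    bug_id = int(m.group(1) or m.group(2))
    trust = 2 if KEYWORDS.search(message) or m.group(2) else 1
    return bug_id, trust

print(link_bug("Fixed 42233"))       # -> (42233, 2)
print(link_bug("bug #23444"))        # -> (23444, 2)
print(link_bug("see issue 123456"))  # -> None
```

Messages linked this way give the fix locations that the defect-counting and regression steps then operate on.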
2. Static Source Code Analysis
2.1 Bug finding and fixing
More and more software is developed across multiple sites in large geographically distributed teams. As teams get larger and
structures become more complex, finding developers best suited for a specific task gets more difficult. Most automatic
approaches use variants of the Line 10 Rule to determine experts for files. The name Line 10 Rule stems from a version control
system that stores the author who did the commit, in line 10 of the commit’s log message. Developers who changed a file are
considered to have expertise for this file. This heuristic does not work for files and developers with no or little history. This kind
of expertise is known as implementation expertise. Schuler and Zimmermann [4] introduced usage expertise, which recognizes
that developers also accumulate expertise by calling (using) methods. First, CVS commits are reconstructed with a sliding time
window approach. A reconstructed commit is a set of revisions, where each revision is the result of a single check-in. Method
calls that have been inserted are computed within a commit operation. Every commit also has a set of changed locations —
method bodies, classes, or files. For every location that was changed, the set of added method calls is computed by comparing
the abstract syntax trees of the location before and after the commit. As a result, the set of new calls is obtained for a commit. The
set of changed locations, the set of new calls, and the authors of commits serve as the main input for the construction of expertise
profiles. Expertise profiles help to locate and recommend experts, to support collaboration and communication between
developers, and to analyze and inform about developers’ usage of APIs.
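The core step, comparing abstract syntax trees before and after a change to find added calls, can be illustrated with Python's `ast` module as a stand-in for the Java AST analysis described above. The snippets are hypothetical:

```python
import ast

def calls(source):
    """Collect the names of all functions/methods called in a snippet."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            f = node.func
            # Method call (obj.method) vs. plain function call (name).
            names.add(f.attr if isinstance(f, ast.Attribute)
                      else getattr(f, "id", "<dynamic>"))
    return names

def added_calls(before, after):
    """The new calls a change introduces: the raw material from which
    a usage-expertise profile is built."""
    return calls(after) - calls(before)

before = "def save(d): d.validate()"
after  = "def save(d): d.validate(); d.serialize(); log(d)"
print(sorted(added_calls(before, after)))  # -> ['log', 'serialize']
```

Attributing each added call to the commit's author, across all commits, yields the expertise profiles the approach uses for recommendations.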
2.2 Function Usage Patterns
Programmers commonly reuse existing frameworks or libraries to reduce software development efforts. One common problem
in reusing existing frameworks or libraries is that programmers know what type of object they need, but do not
know how to obtain that object via a specific method sequence. To help programmers address this issue, Thummalapenta and
Xie [5] developed an approach, called PARSEWeb, which takes queries of the form “Source object type → Destination object
type” as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object
from the source object given in the query. PARSEWeb interacts with a code search engine (CSE) to gather relevant code samples
and performs static analysis over the gathered samples to extract required sequences. As code samples are collected on
demand through CSE, this approach is not limited to queries of any specific set of frameworks or libraries. This approach is
effective in addressing programmers’ queries.
Code Conjurer [6] is an Eclipse plug-in that extracts interface and test information from a developer's coding activities and uses
this information to issue test-driven searches to a code-search engine. It presents components matching the developer's needs
as reuse recommendations without disturbing the development work. Automated dependency resolution then allows selected
components to be woven into the current project with minimal manual effort. Sourcerer [7] is an infrastructure for large-scale
indexing and analysis of open source code. By taking full advantage of the code’s structural information, Sourcerer provides a
foundation upon which state-of-the-art search engines and related tools can be built easily.
Software developers spend considerable effort implementing auxiliary functionality used by the main features of a system (e.g.
compressing/decompressing files, encryption/decryption of data, scaling / rotating images). With the increasing amount of
open source code available on the Internet, time and effort can be saved by reusing these utilities through informal practices of
code search and reuse. However, when this type of reuse is performed in an ad hoc manner, it can be tedious and error-prone:
code results have to be manually inspected and extracted into the workspace. Lemos et al. [8] introduced the use of test cases
as an interface for automating code search and reuse and evaluated its applicability and performance in the reuse of auxiliary
functionality. Their approach is called Test-Driven Code Search (TDCS). Test cases serve two purposes: (1) they define the
behavior of the desired functionality to be searched; and (2) they test the matching results for suitability in the local context.
They developed CodeGenie, an Eclipse plugin that performs TDCS using Sourcerer.
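The two roles a test case plays in TDCS can be shown in miniature. The candidate functions below are hypothetical hits from a code-search engine; the test both describes the desired behavior and filters the results:

```python
# Hypothetical search results for "reverse a list".
def candidate_a(xs):
    return sorted(xs)           # wrong: sorts instead of reversing

def candidate_b(xs):
    return list(reversed(xs))   # behaves as required

def desired_behavior(fn):
    """The test case that drives the search: a result is reusable
    only if it passes in the local context."""
    return fn([3, 1, 2]) == [2, 1, 3]

matches = [f.__name__ for f in (candidate_a, candidate_b)
           if desired_behavior(f)]
print(matches)  # -> ['candidate_b']
```

CodeGenie automates exactly this loop at scale: it derives the search query from the test's interface, fetches candidates via Sourcerer, and weaves passing results into the workspace.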
2.3 Communication via source code comments
Program comments have long been used as a common practice for improving inter-programmer communication and code
readability, by explicitly specifying programmers’ intentions and assumptions. Unfortunately, comments are not used to their
maximum potential: since most comments are written in natural language, it is very difficult to analyze them automatically.
Furthermore, unlike source code, comments cannot be tested. As a result, incorrect or obsolete comments can mislead
programmers and introduce new bugs later. Tan, Yuan, and Zhou [9] investigated how to explore comments beyond their
common usage. They studied the feasibility and benefits of automatically analyzing comments to detect software bugs and bad
comments. They used Linux as a demonstration case and found that a significant percentage of comments were about “hot
topics” such as synchronization and memory allocation, indicating that the comment analysis may first focus on hot topics
instead of trying to “understand” any arbitrary comments. They examined several open source bug databases and found that
bad or inconsistent comments have introduced bugs, indicating the importance of maintaining comments and detecting
inconsistent comments.
3. Source Code Differencing and Analysis
3.1 Semantic differencing
During software evolution, information about changes between different versions of a program is useful for a number of
software engineering tasks. For example, configuration-management systems can use change information to assess possible
conflicts among updates from different users. As another example, in regression testing, knowledge about which parts of a
program are unchanged can help identify test cases that need not be rerun. For many of these tasks, a purely syntactic
differencing may not provide enough information for the task to be performed effectively. This problem is especially relevant in
the case of object-oriented software, for which a syntactic change can have subtle and unforeseen effects. Apiwattanapong et
al. [10] presented a technique for comparing object-oriented programs that identifies both differences and correspondences
between two versions of a program. The technique is based on a representation that handles object-oriented features and,
thus, can capture the behavior of object-oriented programs. They introduced JDiff, a tool that implements the technique for
Java programs. One of the main advantages of this technique over traditional text-based differencing approaches is that it can
account for the effect of changes due to object-oriented features. Four empirical studies were performed on many versions of
two medium-sized subjects, Daikon and Jaba. Daikon is a tool for discovering likely program invariants developed by Ernst and
colleagues, whereas Jaba is a Java bytecode analysis framework. The results show that JDiff is more efficient and effective than existing tools.
3.2 Syntactic differencing for fine-grained analyses
A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a
program. Change distilling [11] is a tree differencing algorithm for fine-grained source code change extraction. It extracts
changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that
transforms one tree into the other given the computed matching. As a result, fine-grained change types are identified between
program versions. The change distilling algorithm was tested with a benchmark, which consisted of 1,064 manually classified
changes in 219 revisions of eight methods from three different open source projects. There was a significant improvement in
extracting types of source code changes over the existing approaches: This algorithm approximated the minimum edit script 45
percent better than the original change extraction approach. It found all occurring changes and almost reached the minimum
conforming edit script, that is, it reached a mean absolute percentage error of 34 percent, compared to the 79 percent reached
by the original algorithm.
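The idea of deriving an edit script from a node matching can be sketched in a few lines. This toy version matches children positionally, whereas change distilling computes an approximate best matching first; trees are nested tuples of the form `(label, child, ...)` and the node labels are made up:

```python
def diff(a, b, path="root"):
    """Emit a fine-grained edit script (update/insert/delete)
    transforming tree a into tree b, matching children by index."""
    ops = []
    la, ca = a[0], a[1:]   # label and children of the old node
    lb, cb = b[0], b[1:]   # label and children of the new node
    if la != lb:
        ops.append(("update", path, la, lb))
    for i in range(max(len(ca), len(cb))):
        p = f"{path}/{i}"
        if i >= len(ca):
            ops.append(("insert", p, cb[i][0]))   # node only in new tree
        elif i >= len(cb):
            ops.append(("delete", p, ca[i][0]))   # node only in old tree
        else:
            ops += diff(ca[i], cb[i], p)          # recurse on the pair
    return ops

v1 = ("method", ("if", ("call:log",)))
v2 = ("method", ("if", ("call:trace",)), ("return",))
print(diff(v1, v2))
# -> [('update', 'root/0/0', 'call:log', 'call:trace'),
#     ('insert', 'root/1', 'return')]
```

The quality results quoted above come precisely from replacing this naive positional matching with a better node matching, which shortens the resulting edit script.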
Evolizer [13] is a platform that enables software evolution analysis in Eclipse. It is similar to Kenyon or eROSE, but it
systematically integrates change history with version and bug data. Evolizer provides a set of metamodels to represent
software project data along with adequate importer tools to obtain this data from software project repositories. The current
implementation provides support for importing and representing data from the versioning control systems CVS and SVN, the
bug-tracking system Bugzilla, Java source code, and fine-grained source code changes, as well as the integration of these
models. Using the Eclipse plug-in extension facilities and the Hibernate object-relational mapping framework
(www.hibernate.org), extending existing metamodels and data importers in Evolizer, or adding new ones, is straightforward. Models
are defined by Java classes, annotated with Hibernate tags, and added to the list of model classes. Evolizer loads this list of
classes and provides it to the other Evolizer plugins for accessing the software evolution data. Using the Eclipse plug-in
mechanism with extensible metamodels is Evolizer’s main advantage over existing mining tools.
3.3 Identification of refactorings in changes
Ratzinger et al. [14] analyzed the influence of evolution activities such as refactoring on software defects. In a case study of five
open source projects (ArgoUML, JBoss Cache, Liferay Portal, the Spring Framework, and XDoclet), attributes of software
evolution were used to predict defects in time periods of six months. Versioning and issue tracking systems were used to
extract 110 data mining features, which were separated into refactoring and non-refactoring related features. These features
were used as input into classification algorithms that create prediction models for software defects. They found that refactoring
related features as well as non-refactoring related features lead to high-quality prediction models. Additionally, refactorings
and defects have an inverse correlation: the number of software defects decreases if the number of refactorings increased in
the preceding time period. They concluded that refactoring should be a significant part of both bug fixes and other evolutionary
changes to reduce software defects.
Murphy-Hill et al. [15] observed that much of what we know about how programmers refactor in the wild is based on studies
that examine just a few software projects. Researchers have rarely taken the time to replicate these studies in other contexts or
to examine the assumptions on which they are based. To help put refactoring research on a sound scientific basis, they drew
conclusions using four data sets spanning more than 13,000 developers, 240,000 tool-assisted refactorings, 2,500 developer
hours, and 3,400 version control commits. Using these data, they found that programmers frequently do not indicate refactoring activity in
commit logs, which contradicts assumptions made by several previous researchers. They were able to confirm the assumption
that programmers do frequently intersperse refactoring with other program changes.
4. Software Metrics
4.1 Complexity of different changes
When building software quality models, the approach often consists of training data mining learners on a single fit dataset.
Typically, this fit dataset contains software metrics collected during a past release of the software project that we want to
predict the quality of. In order to improve the predictive accuracy of such quality models, it is common practice to combine the
predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have been
proven to be successful in some cases, the improvement is not always significant because the information in the fit dataset
sometimes can be insufficient. Khoshgoftaar et al. [16] presented a method to build software quality models using majority
voting to combine the predictions of multiple learners induced on multiple training datasets. Experimental results from a
large-scale empirical study involving seven real-world datasets and seventeen learners show that, on average, combining the
predictions of one learner trained on multiple datasets significantly improves predictive performance compared to one learner
induced on a single fit dataset. Combining multiple learners trained on a single training dataset does not significantly improve
the average predictive accuracy compared to the use of a single learner induced on a single fit dataset.
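Majority voting itself is simple to state in code. The sketch below combines per-module predictions from several learners; the learners' labels are invented, and the tie-breaking rule (ties go to "defective") is an illustrative choice, not the study's:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-module class labels from several learners by
    majority vote. Ties are resolved conservatively as 'defective'."""
    combined = []
    for votes in zip(*predictions):      # one tuple of votes per module
        tally = Counter(votes).most_common()
        if len(tally) > 1 and tally[0][1] == tally[1][1]:
            combined.append("defective")  # tie -> flag for inspection
        else:
            combined.append(tally[0][0])
    return combined

# Hypothetical labels from three learners on four modules:
learner1 = ["clean", "defective", "clean", "defective"]
learner2 = ["clean", "defective", "defective", "clean"]
learner3 = ["defective", "clean", "defective", "defective"]
print(majority_vote([learner1, learner2, learner3]))
# -> ['clean', 'defective', 'defective', 'defective']
```

In the study, the voting learners come from multiple fit datasets; the combination rule is the same either way.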
4.2 Change-prone classes and change-couplings
Source code coupling and change history are two important data sources for change coupling analysis. The popularity of public
open source projects makes both sources available. Zhou et al. [25] inspected different dimensions of software changes,
including change significance and source code dependency levels, extracted a set of features from the two sources, and
proposed a change propagation model based on a Bayesian network for change coupling prediction. By combining the features
from the co-changed entities and their dependency relations, the approach models the underlying uncertainty. An empirical case study
on two medium-sized open source projects, Azureus and ArgoUML, demonstrated the feasibility and effectiveness of their
approach compared to previous work.
4.3 Types of changes and origin analysis
Software evolution research is limited by the amount of information available to researchers: Current version control tools do
not store all the information generated by developers. They do not record every intermediate version of the system issued, but
only snapshots taken when a developer commits source code into the repository. Additionally, most software evolution analysis
tools are not a part of the day-to-day programming activities, because analysis tools are resource intensive and not integrated
in development environments. Robbes and Lanza [17] proposed to model development information as change operations that
are retrieved directly from the programming environment the developers are using, while they are effecting changes to the
system. This accurate and incremental information opened new ways for both developers and researchers to explore and
evolve complex systems. They built a toolset named SpyWare [18] which, using a monitoring plug-in for integrated
development environments (IDEs), tracks the changes that a developer performs on a program as they happen. SpyWare stores
these first-class changes in a change repository and offers a plethora of productivity-enhancing IDE extensions to exploit the
recorded information.
4.4 Validation of defect detectors
Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget
and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be
induced that offer software managers the insights they need to tackle these quality and budgeting problems in an efficient way.
Vandecruys et al. [19] used the Ant Colony Optimization (ACO)-based classification technique in a tool called AntMiner+ to
predict erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models
produced by AntMiner+ are shown to achieve a predictive accuracy that is competitive to that of the models induced by several
other included classification techniques, such as C4.5, logistic regression, and support vector machines. The comprehensibility of
the AntMiner+ models is superior to that of the latter models.
4.5 Predicting post-release failures
In software development, resources for quality assurance are limited by time and by cost. In order to allocate resources
effectively, managers need to rely on their experience backed by code complexity metrics. But often dependencies exist
between various pieces of code over which managers may have little knowledge. These dependencies can be construed as a
low level graph of the entire system. Zimmermann and Nagappan [20] proposed to use network analysis on these dependency
graphs. This allows managers to identify central program units that are more likely to face defects. When their approach was
evaluated on Windows Server 2003, it was found that the recall of models built from network measures is 10 percentage points
higher than that of models built from complexity metrics. In addition, network measures could identify 60% of the binaries that
the Windows developers considered critical—twice as many as identified by complexity metrics.
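One of the simplest network measures one could rank by is degree centrality on the dependency graph. The binaries and edges below are a made-up toy system, not Windows data:

```python
from collections import defaultdict

# Hypothetical dependency graph: binary -> binaries it depends on.
deps = {
    "kernel.dll": [],
    "net.dll":    ["kernel.dll"],
    "ui.dll":     ["kernel.dll", "net.dll"],
    "app.exe":    ["ui.dll", "net.dll"],
}

def degree_centrality(graph):
    """Rank nodes by in-degree + out-degree: per the study above,
    highly central program units are more likely to face defects."""
    degree = defaultdict(int)
    for src, targets in graph.items():
        degree[src] += len(targets)       # outgoing dependencies
        for t in targets:
            degree[t] += 1                # incoming dependents
    return sorted(degree.items(), key=lambda kv: kv[1], reverse=True)

print(degree_centrality(deps))
```

The study uses richer measures (e.g., closeness and betweenness over the full graph), but the ranking-by-centrality idea is the same.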
5. Visualization
5.1 Co-changing files
Traceability link recovery has been a subject of investigation for many years within the software engineering community.
Having explicit documented traceability links between various artifacts (e.g., source code and documentation) is vital for a
variety of software maintenance tasks including impact analysis, program comprehension, and requirements assurance of high
quality systems. It is particularly useful to support source code comprehension if links exist between the source code, design,
and requirements documentation. These links help explain why a particular function or class exists in the program. Kagdi et al.
[21] presented an approach to recover/discover traceability links between software artifacts via the examination of a software
system’s version history. They applied a heuristic-based approach that used sequential-pattern mining to the commits in
software repositories for uncovering highly frequent co-changing sets of artifacts. The underlying idea is that if different types of
files are committed together with high frequency, then there is a high probability that a traceability link exists between
them. The approach was evaluated on a number of versions of the open source system KDE (K Desktop Environment). The results
show that their approach is able to uncover traceability links between various types of software artifacts (e.g., source code files,
change logs, user documentation, and build files) with high accuracy. This approach can readily be applied, in conjunction with
other link recovery methods, to produce a more complete picture of the traceability of a software system.
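The frequency-counting core of this idea fits in a few lines. The commit transactions below are hypothetical; real sequential-pattern mining also considers larger sets and ordering, while this sketch counts only unordered pairs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit transactions: the files changed in each commit.
commits = [
    {"player.cpp", "player.h", "player.docbook"},
    {"player.cpp", "player.h"},
    {"player.cpp", "player.docbook"},
    {"mixer.cpp", "mixer.h"},
]

def frequent_pairs(transactions, min_support=2):
    """Count how often each file pair is committed together and keep
    pairs meeting the support threshold: candidate traceability links."""
    counts = Counter()
    for files in transactions:
        counts.update(combinations(sorted(files), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(commits))
# -> {('player.cpp', 'player.docbook'): 2, ('player.cpp', 'player.h'): 2}
```

A pair like a source file and its documentation file surviving the support threshold is exactly the kind of heterogeneous link the approach recovers.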
5.2 Structural and architectural changes
Voinea and Telea [22] presented structural and architectural changes visually as follows: First, an infrastructure that allows
generic querying and data mining of different types of software repositories such as CVS and Subversion is constructed. Using
this infrastructure, several models of the software source code at different levels of detail are constructed, ranging from project
and package down to function and code line. Second, three views that allow examining the architectural changes at different levels
of detail are presented: the file view shows changes at line level across many versions of a single file or a few files. The project view
shows changes at file level across entire software projects. The decomposition view shows changes at subsystem level across
entire projects. The toolset provides the standard basic management tools of SCM systems via its integrated CVS client. The
tool scales to huge projects of thousands of files, hundreds of releases, and tens of developers (e.g., VTK,
ArgoUML, PostgreSQL, the X Window System, and Mozilla).
5.3 Change smells and refactorings
Refactoring is a hot and controversial issue. Supporters claim that it helps increase the quality of the code, making it easier to
understand, modify, and maintain. There are also claims that refactoring yields higher development productivity;
however, there is only limited empirical evidence for this claim. Moser et al. [23] conducted a case study to assess the
impact of refactoring in a close-to-industrial environment (PROM). Results indicate that refactoring not only increases aspects of
software quality, but also improves productivity. Refactoring prevents an explosion of complexity and coupling metrics by
driving developers to simpler design. They studied application development for mobile devices but their findings are applicable
to small teams working in similar, highly volatile domains. However, additional research is needed to ensure that this is indeed
true and to generalize it to other contexts.
References:
[1] Huzefa Kagdi and Denys Poshyvanyk, “Who can help me with this change request?”, In ICPC ’09: Proceedings of the 17th
International Conference on Program Comprehension, pages 273–277. IEEE Computer Society, May 2009.
[2] Albert Elcock, Phillip A. Laplante, “Testing software without requirements: using development artifacts to develop test
cases”, In Innovations Syst Softw Eng (2006) 2:137–145
[3] L. Hattori and M. Lanza, “Mining the history of synchronous changes to refine code ownership”, In Proceedings of MSR
2009 (6th IEEE Working Conference on Mining Software Repositories), pages 141–150. IEEE CS Press, 2009.
[4] Schuler, D. and Zimmermann, T., “Mining Usage Expertise from Version Archives”, In Proceedings of the 2008 international
working conference on Mining software repositories 2008, Leipzig, Germany May 10 - 11, 2008.
[5] S. Thummalapenta and T. Xie, "PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web", In Proc.
Int'l Conf. Automated Software Eng., ACM Press, 2007, pp. 204–213.
[6] O. Hummel, W. Janjic, and C. Atkinson, “Code conjurer: Pulling reusable software out of thin air”, IEEE Softw., 25(5):45–52,
2008.
[7] S. Bajracharya, J. Ossher, and C. Lopes, “Sourcerer: An internet-scale software repository”, In Proceedings of 1st
Intl. Workshop on Search-driven Development, Users, Interfaces, Tools, and Evaluation, 2009.
[8] O. Lemos, S. K. Bajracharya, J. Ossher, P. C. Masiero, and C. Lopes, “Applying test-driven code search to the reuse of auxiliary
functionality”, In 24th Annual ACM Symposium on Applied Computing (SAC 2009), 2009.
[9] Lin Tan, Ding Yuan and Yuanyuan Zhou, “HotComments: How to Make Program Comments More Useful?”, In Proceedings of
Hot Topics in Operating Systems, 2007.
[10] T. Apiwattanapong, A. Orso, and M.J. Harrold, “JDiff: A Differencing Technique and Tool for Object-Oriented Programs,”
Automated Software Eng., vol. 14, no. 1, pp. 3-36, Mar. 2007.
[11] B. Fluri, M. Würsch, M. Pinzger, and H. C. Gall, “Change distilling: Tree differencing for fine-grained source code change
extraction”, IEEE Trans. Softw. Eng., 33(11):725–743, 2007.
[12] Zimmermann, T., Premraj, R., Zeller, A., “Predicting Defects for Eclipse”, In Proc. 3rd International Workshop on Predictor
Models in Software Engineering (Minneapolis, MN, USA, May, 2007), PROMISE’07.
[13] Gall HC, Fluri B, Pinzger M, “Change analysis with evolizer and change distiller”, IEEE Software 2009, 26: 26–33.
[14] J. Ratzinger, T. Sigmund, and H. C. Gall, “On the relation of refactorings and software defect prediction”, In MSR ’08:
Proceedings of the 2008 International Workshop on Mining Software Repositories, pages 35–38, New York, 2008. ACM.
[15] E. Murphy-Hill, C. Parnin, and A. Black, “How We Refactor, and How We Know It”, In IEEE 31st International Conference on
Software Engineering, 2009.
[16] Khoshgoftaar, T. M., Rebours, P., and Seliya, N, “Software quality analysis by combining multiple projects and learners”,
Software Quality Control 17, 1 (Mar.2009), 25-49.
[17] R. Robbes and M. Lanza, “A change-based approach to software evolution”, Electronic Notes in Theoretical Computer
Science (ENTCS), 166:93–109, Jan. 2007.
[18] Robbes, R., Lanza, M, “Spyware: A change-aware development toolset”, In Proceedings of ICSE (30th International
Conference in Software Engineering), pp. 847–850. ACM Press, New York (2008).
[19] O. Vandecruys, D. Martens, B. Baesens, C. Mues, M.D. Backer, and R. Haesen, “Mining Software Repositories for
Comprehensible Software Fault Prediction Models,” J. Systems and Software, vol. 81, no. 5, pp. 823-839, 2008.
[20] Zimmermann, T. and Nagappan, N., "Predicting Defects using Network Analysis on Dependency Graphs," in 29th
International Conference on Software Engineering, 2007.
[21] H. H. Kagdi, J. I. Maletic, and B. Sharif, “Mining software repositories for traceability links”, In ICPC, pages 145–154. IEEE
Computer Society, 2007.
[22] L. Voinea and A. Telea, “Visual data mining and analysis of software repositories”, Computers & Graphics, 31(3):410–428,
2007.
[23] Moser, R., Abrahamsson, P., Pedrycz, W., Sillitti, A., Succi, G., “A case study on the impact of refactoring on quality and
productivity in an agile team”, In Proc. of the 2nd IFIP Central and East European Conference on Software Engineering
Techniques CEE-SET 2007, Poznan, Poland (2007).
[24] Weißgerber, P., Neu, D., Diehl, S., “Small Patches Get In!”, In Proceedings of the Fifth International Workshop on Mining
Software Repositories, 67 pages. IEEE Press, Los Alamitos (2008).
[25] Zhou, Y., Würsch, M., Giger, E., Gall, H., Lü, J., “A Bayesian network based approach for change coupling prediction”, In
Proceedings of the Working Conference on Reverse Engineering (WCRE), IEEE Computer Society: Washington, DC, USA, 2008;
27–36.