Transformational Abstraction for Java (TAJ) Advisor: Student: Dr. Spencer Rugaber Sergio Berzosa González Motivation A major problem in software maintenance and reverse engineering is the lack of documentation that represents the actual state of the application source code. When a software product needs to be updated to fix bugs, add new functionality, or ported to other system, usually developers find themselves with documentation that does not reflect the actual state of the application (and the source code) making it hard to comprehend how the code accomplishes the functionality stated in the different high level documentation elements. Developers then must read and understand the source code, using code comments as the low level documentation, which may themselves be out of date. TAJ’s functionality is based on the idea that the source code is created to simulate elements pertaining to a well define universe (the problem domain). The source code is driven by the problem domain, and thus source code constructions are related to elements contained in the problem domain. Existing tools provide the ability to document programs at low levels (source code comments) or at high levels of abstraction (architecture diagrams), but the creation of the source code is affected by low level design decisions that obscure the higher level abstractions. TAJ fills this gap by explicitly linking source code constructions to elements in the problem domain. Traditional documentation allows developers to get information about different levels of abstraction, from the simple source code comments, to class diagrams, state charts, sequence diagrams, etc. Those different types of documentation are not directly related to each other, so making use of those documents means having to manually fill the gap between the different representations. In a sense TAJ offers an integral solution by allowing users to create and view documentations at multiple abstraction levels while maintaining a relation between the different levels. By offering these levels of abstraction different kind of users can obtain just the necessary information they need, from a view where the low level is represented to a high level where the main interactions between the different components can be easily observed. Moreover, users can interactively drill down or move up in the abstraction levels at different parts of the source code file, so they can always get the most appropriate representation for each part of the source code. All these elements constitute a system that helps developers in two different scenarios: Building up an understanding from undocumented code: By following the process of examining the code looking for domain elements’ implementations, and by modifying the domain to more precisely represent the code, developers can build up understanding of the system. Using TAJ to understand previously annotated code: When a new developer needs to start working on code created by other developer, TAJ can be used as a documentation element to detect which sections of the source code are used to implement a certain domain element. This would allow a quick navigation to the source code points of interests to build a general understanding of the application. Architecture TAJ can be viewed as two different components that interact with each other to create a mapping between source code and the domain. TAJ editor: the editor allows developers to group continuous lines of code, giving it a brief description of the intent of the grouped code; these elements receive the name of chunks. The chunks can then be folded or unfolded to toggle between the source code and the description associated with the chunk. The chunk structure also allows grouping a contiguous list of source code lines or chunks, creating a tree-like structure. TAJ domain editor: the domain editor component allows developers to create and modify a domain model. The different elements in the domain model can them be linked to different chunks created in the editor component of TAJ. The domain representation allows the creation of associations and specializations between the elements of the domain as well as the definition of operations and attributes for the different domain elements. These elements are developed under the Eclipse framework to be integrated into the Eclipse application as a plug-in. The use of this popular IDE allows developers to easily integrate TAJ into their workflow. Initial project status Last version available of TAJ previous to release 2.0 was designed using the Eclipse framework 2.x. The release only included the TAJ editor as part of the plug-in, while the TAJ modeler was at a conceptual stage. The TAJ editor enabled users to collapse contiguous lines of text into chunks. The structure generated by those folding could be saved in order to be later recovered for further modifications. Along with the java file being edited, TAJ created an addition file on the file system containing information of the different chunks created by the user. That filename was created using the filename of the java file appended with the string “.taj” The process of saving the chunks could be considered destructive since the creation of chunks modified the original content of the java file being edited. Chunks that were collapsed, replaced the actual portions of the java code with the description associated with each of the chunks. This made it impossible to manipulate the java file without first unfolding the collapsed chunks to recover the original source code file. Along the editor, an outline associated with it was created to show the different elements of the document being edited as a series of lines and chunks. This outline gives visual feedback of the hierarchy of the different chunks in the document. Conversion to Eclipse Framework 3.x After understanding how the last version of the plug-in was constructed and how it interacted with the Eclipse Framework, the next task was to be developed was porting the latest version of TAJ at that time, to the current version of the Eclipse Framework. This task involved the study of the modifications suffered by the Eclipse Framework from release 2 to release 3, how the TAJ source code could be modified to be compatible with it, and how the new features could be used to enhance the TAJ project. Eclipse 3 includes a tool that allows developers to automatically modify an existing plugcode to run in the new Eclipse 3 framework. This is done by including a compatibility layer that enables legacy plug-ins to continue to use the same interfaces and classes by maintaining binary compatibility. After we were able to execute the plug-in using Eclipse 3, the task of modifying the TAJ source code so that it runs natively started. There are some exceptions were the API changes could not be done in any compatibility way. Fortunately, TAJ did not use any of those features, and thus we were able to easily get the code running using the previously mentioned compatibility layer. Removing the compatibility layer The changes needed for removing the compatibility layer and running using native Eclipse 3 methodologies. Some of the changes suffered by the framework included the creation of new packages that. Some packages were moved between releases while some others were divided between various plug-ins. For being able to compile the source code removing the compatibility layer new packages had to be imported so that the elements contained in the moved and split packages could be referenced correctly. Porting the code also involved revising the plug-in lifecycle model used by Eclipse, moving from a proprietary technology to a model driven by the OSGi specification. OSGi specification and Eclipse Starting with Eclipse 3.0 the runtime is based on the Open Services Gateway Initiative (OSGi). Previous to Eclipse 3.0 the framework used its own proprietary plug-in system to manage the installed plug-ins. As more plug-ins and requirement were incorporated into Eclipse IDE, the developers decided to substitute the proprietary model with OSGi. The OSGi defines the concept of bundle as collection of types and resources and associated inter-bundle prerequisite information that contributes to the system. OSGi also defines an infrastructure for a bundle’s lifecycle and how bundles interact with each other. Interaction of the layers in the host O.S, Java and OSGi Eclipse implements a subset of OSGi centered in the modularization and lifecycle portion of the specification. However, it makes minimal use of the service support provided by OSGi. Instead, Eclipse keeps its own extension points that enable the bundle interaction. One of the key benefits of OSGi over the previous technology used by Eclipse is the ability to discover, load and upload bundles at run-time without the need of restarting the Eclipse IDE, and also to broadcast related events (eg: install, stop, uninstall…) to interested parties. In previous releases plug-ins were discovered during the Eclipse initialization and any modification to the plug-ins needed a restart Eclipse to take effect. The new model also encourages doing initialization in a classic lazy style, where the data structures and models initialization is differed until it is actually needed, and not when the plug-in/bundle is activated. A problem encountered when adapting the plug-in was importing a package in the source code without defining the import in the corresponding area of the manifest file. The source code compiled without error, but when executing the plug-in it would fail when opening a file because the source could not be able to load a certain class. By explicitly importing the package in the OSGi manifest file the class was available for use by the plug-in. The current OSGi implementation used in Eclipse is called Equinox and is based on the OSGi R4 core framework specification. Folding The TAJ model created for the different code abstractions constructed during the tool usage create a tree structure that contains, lines of code as well as chunks. The diagram below shows the classes used to create this structure. Class diagram of the TAJ model The diagram shows that we have implemented the Composite design pattern. There is a class called Clip from which the classes Line and Chunk inherit. The class Line represents each one of the lines of the source code document. When a section of code is collapsed by the user a new Chunk is created, since the class Chunk is an aggregation of clips, a chunk is allowed to contain not only lines, but also other chunks. To represent the whole structure there is a model that contains an attribute that references the main chunk. Since we only need one reference to the whole structure, the design is simplified. When a new model is created the root element of the model is a chunk that includes all the lines of the document. Eclipse integration When it was time to implement the folding in Eclipse we had the choice to create our own infrastructure using functionality provided by Eclipse or to make use of the folding included in the Eclipse framework. The folding, as it is implemented by eclipse, is used to hide/show portions of text based on an analysis of the text, creating different sections that can then manipulated making use of UI elements provided by the platform. The decision taken was to make use of these folding capabilities of the framework instead of implementing our own solution. This way we could use an already existing component that provided the basic functionality that we needed. The folding in eclipse makes use of three main classes, those classes create a pipeline structure than transforms the original text of the document into the view that is shown to the user. Eclipse folding implementation Since this implementation only support the display of text contained in the master document, the descriptions associated with each chunk have to be inserted in the master document so that they can be displayed in the text viewer. It is important to keep the chunks unaware of this issue, so that the initial and final line numbers of the chunks refer to the original document. In order to do this, the editor has to compute the difference between the number of the displayed lines and that same line in the original document. This value is not constant and varies from chunk to chunk depending on the chunk position in the model structure. The decision of keeping the chunks unaware of this offset would allow us to modify the implementation without altering the Chunk code. In order to create the different sections that can be collapse we have to make use of the projection annotations, these annotations tell eclipse which sections of the document are collapsible. Here, the Observer pattern was used. The TajEditor object subscribes to the document model. When the user request an action that modifies the number of chunks in the document (operations of collapse or expand), the editor calls the appropriate method in the model object. Once the new chunks have been created, the model notifies to all the observer of that action so they can take the appropriate measures. When the editor is notified of a change on the structure of the document, it creates or removes the appropriate projection annotation so that the view shown to the users is synchronized with the TAJ structure of the document. The framework automatically creates all the UI components and adds the functionality based on the annotations. TAJ Domain Editor The domain editor implemented allows uses to create a domain to be linked to the different chunks created in the TAJ editor. A domain consists of a series of classes, identified by their name, which can contain attributes and operations. Those different classes can be related by using associations or specializations. Domain model GMF The model editor makes use of the Graphical Modeling Framework (GMF) in order to create a graphical editor for an underlying model definition. GMF provides a runtime infrastructure for developing graphical editors based on the Eclipse Modeling Framework Project (EMF) and the Graphical Editing Framework (GEF). By making use of GMF we are able to create a complete editor that supports all the elements defined by the underlying model defined by EMF. At the same time, GMF makes use of GEF to generate a rich graphical editor. The framework requires the creation of different elements that once combined contribute to the creation of the final editor. Graphical Definition A graphical definition model is needed for defining how the different elements of the domain will be represented in the editor. In the case of TAJ this graphical definition includes rectangular figures for the classes, compartments for the attributes and operations of the classes, lines for the associations and specializations, etc. Tooling Definition The tooling definition is needed so GMF knows which tools it has to place in the palette situated that will be situated at the tight side of the domain model being created. Usually this definition includes the different elements that can me added to the model. By default this palette also includes some common elements not associated with a particular model, but can be useful for the users (eg: create a note) In its tooling definition TAJ includes tools for creating classes, attributes, operations, associations and specializations. Mapping definition This mapping is used by GMF to know the relations between the different definitions previously mentioned and the underlying ecore model. In this mapping we can specify which and how elements of the ecore model will be created and displayed by the editor. The tooling definition is used by the mapping definition to associate an element in the editor palette to an element of the ecore model. At the same time the graphical definition is used to graphically display the different elements on the editor. For example the mapping definition specifies that the compartments for attributes and operations defined in the graphical tool have to be drawn inside the Class element defined in the ecore model. Similarly here is were we define that the name attribute of a class object (defined in the ecore model) has to be displayed inside the representation of a class (specified in the graphical definition) using a label (also specified in the graphical definition), that the associations have to be displayed using a particular line style (previously defined in the graphical definition)… GMF overview It is important to understand the difference between the different elements involved in the creation of the editor. It is especially important to understand the difference between the ecore model and the graphical definition. Since the graphical editor stores information about the graphical model (eg: position of element, typographies, colors…), an element included in the graphical definition that does not have a mapping to the underlying model would be saved as part of the graphical information, while ignored by the model. Thus, a tool that uses the model as its input would not be able to access the information stored as part of the graphical definition. At one point this problem appeared during the development process. At that point the model definition had to be recreated from scratch including the elements that were previously included as part of the graphical definition that were needed in the model. This forced us to recreate the subsequent elements implicated in the GMF creation too. EMF The EMF Eclipse project is a modeling framework and code generation facility for building applications based on a structured data model. Starting from a model specification EMF is able to generate a set of Java classes for the model. Those classes can be then used to generate new elements of the model specification. EMF also provides a based editor to manipulate the model. GEF The Graphical Editing Framework of Eclipse is the underlying component used by GMF to create the domain editor. GEF is an application neutral framework that provides the groundwork to build applications that need to make use of some graphical components. This framework is divided into 2 packages to support the needs for graphical editors like the document layout, rendering support, figures, connectors, printing, etc. Link between code and domain When the application domain has been created and the code has been annotated, the users can link those two sources of knowledge to reinforce their understanding. The creation of those links implies the necessity to correctly identify the different elements. The chunks will be identified by a sequential number, while the domain elements are identifies by a Uniform Resource Identifier (URI). The domain model includes references to objects that store the information of the links created for each source code file with the domain model. Those links are merely a pair <chunked, elementID> that allow us to create the mapping “many to many” needed by our tool. Linking structure Conclusion Program understanding happens at various levels of abstraction, but the different documentation artifacts are not clearly connected to source code constructions. By creating these two different elements we hope to achieve documentation that is constantly update with the source code that can be easily browsed by the developer. With the previous version of TAJ ax experiment was conducted to test the effectiveness of the tool in relation with Javadoc or ordinary documentation. The results showed that TAJ performed better in both the lower level and the higher level of abstraction. The inclusion of the domain editor should only benefit this understanding. The obvious hypothesis would be that as the source code increases in size, the domain component becomes increasingly important to minimize the time needed to gain understanding of the application. Eclipse is a very powerful platform to develop set of tools. Eclipse tries to create a low coupled environment where components can be added or removed to create a platform that can be adapted to different usages. At the same time, the core framework is surrounded by other components that try to ease the work of developer in different areas. Unfortunately this creates a very steep learning curve; this is especially true when not only you need to learn the main platform, but also some of the surrounding technologies like EMF and GMF. Moreover the documentation available for the different component not always refers to the latest versions of the components, or when the documentation is simply not there. For instance the help available for GMF include a list of topics to learn the use of the platform, but some of those topics are not even written. Future work There are certain areas of TAJ that could be explored for further releases. These areas could range from user interface changes to create easier navigation between model and code, to some short of code generation to allow translation of source code to an object oriented design (independently of the design used in the previous code). Editor capable of handling Java code Currently, the TAJ editor is capable of creating chunks for a contiguous group of lines of code. Since sometimes more that one code construction is defined in a single line it would be useful to create chunks with smaller granularity, where chunk could contain only a portion of a given line. Data storage method In the actual version each java file annotated with TAJ generates a file that contains data generated by TAJ stored using serializable objects. The creation of a unique file that contains that information could reduce number of files to store and the disk space needed to store the data, thus improving maintainability. The substitution of the serialized objects by a XML should also be studied, the use od the format would allow interoperability with other tools that could offer different views of the data created with TAJ. Multiple domain definition Current TAJ supports the creation of a single domain model. Since an application is affected by more than one domain, ranging from the main domain/s, to OS interaction, programming language knowledge, etc. it will be interesting to include the ability to create more than one domain for the same project. The benefit of the ability to create more that one domain is not limited to the understanding created by the fact that more information is available for the user, but could also be used to eliminate undesired domains. Users could be allowed to define different views that would hide/show complete domains, change the text color in the source code so that users can easily identify code pertaining to different domains, etc. Code generation/refactoring New features could be added to the tool to allow refactoring of the source code by using some code generation technique. It would be possible to construct concrete classes for the abstract classes defined in the domain; these classes could be automatically filled with the different source code pertaining to chunks related to the classes. The code generated using this technique would create an object oriented structure even if the original source code was not creating this methodology. This feature could also be used to translate between different programming languages. Queries and statistics Another interesting feature would be the ability to query the tool to obtain more complex information. For example user could request which code chunks are related to more that one domain, if the domains selected have little connection with each other but there are some chunks that are linked to both, those chunks could be candidates to code refactoring to reduce the coupling. Better integration with Eclipse Certain elements create using the Eclipse framework could benefit from a better integration with the platform. For example the outline view for TAJ documents could be used to quickly navigate though the code or to interact with the TAJ document structure (collapse, expand, change chunk description, etc.) Papers JRipples is a tool for during Incremental Change J Buckner, J Buchta, M Petrenko, V Rajlich - Program Comprehension, 2005. IWPC 2005. JRipples is a tool for assisting programmers during Incremental Change. JRipples automatically analyzes an application Java source files and extracts dependencies between classes. When a change has to be made to the application, the programmer has to select the initial class of the impact set, and JRipples is able to identify classes that might need revision based on the dependencies previously identified. Once those classes have been identifies the tool assists the developer by keeping track of classes that have to be checked, classes that have been already modified by the developers to incorporate the new functionality and classes that do not need to be modified. Similarly to TAJ, JRipples helps developers during the implementation of new functionality. The process followed by TAJ relies in a previously manually created annotated tree that can be used to easily identify which sections of code need to be modified to obtain a new version of the application with the new requirements implemented. By using per-line granularity programmers are able to identify more precisely sections of code that need revision. Another benefit of the TAJ application is that it can be used to reverse engineer an application source code and gain understanding about the application’s high level architecture, this knowledge can be used as a reference when changes need to be made. Design Fragments Make Using Frameworks Easier G Fairbanks, D Garlan, W Scherlis - ACM SIGPLAN Notices, 2006 Given that programming frameworks impose certain burdens to programmers because the lifecycle of the application is affected by the framework, there is the need for programmers to comprehend how the framework and their code cooperate to complete a task. A way for programmers to learn how to use programming frameworks is to refer to examples, and copy code from those examples that provide the needed functionality. This paper proposes the creation of design fragments databases, these fragments include relevant information about how the framework has to be used to implement certain functionality in the application being built. An IDE that support these design fragments can help developers checking for the correct implementation of the fragments, notify the users when a fragment has been deprecated or updated to solve certain bugs. Unfortunately the exampled provided in the paper is related to a relatively small framework like is Java Applets construction. From what I have experienced during this project I have learned that Eclipse framework is very complex, and even with examples and wizards, when you need to implement something slightly different, you find yourself “lost” in a sea of packages, interfaces, classes, and inheritance hierarchies. Designing and keeping updated fragments with new versions for all possible implementation needs seem an unaffordable task. At the same time it seems clear that there is a need for specific documentation that can be used to comprehend frameworks behavior. Introduction and Overview Domain Analysis Concepts and Research Directions G Arango, R Prieto-Diaz - Domain Analysis and Software Systems Modeling, 1991 This paper proposes domain analysis in the context of software reusability. The application of reusability techniques aims to help developers in the task of software development and maintenance. By making possible to reuse elements (modules, software architectures, formal transformations, test cases) software development complexity can be eased. This paper highlights the difficulty of domain modeling in practice, where even small domains reveal themselves to be complex when they are under analysis. Domains are an important part of the Eclipse plug-in we are developing. A domain is used to identify constructions in the code that are related to certain elements of the domain, in order to ease the process of locating those code constructions, there is the need to correctly identify the domain model used by the application at a appropriate granularity level. The difficulty of identifying a correct domain during the domain analysis must be taken into consideration when generation the domain used in the plug-in. Modifying the domain after finding certain code construction has to be done in such a way, that the element introduced in the domain model is not a simple translation from the source code, but a real element of the domain. The use of domain knowledge in program understanding S Rugaber - Annals of Software Engineering, 2000 The different documentation elements created during a software processes give usually a high level description of the solution to be implemented and the final requirements to fulfill. In contrast the source code is a very low level representation that is affected by many different elements like the programming language used, programming experience of the developers, non-functional requirements, etc. This gap between both representations can make it hard to relate the different elements between both representations. By understanding the different domains that affect the creation of the source code, developers know what to expect in terms of constructions in the source code, and how these constructions are related to elements in the domain. Domains are an integral part of the TAJ tool. Software is created in order to solve a problem that is based in the real world, and thus those elements will affect somehow the source code. These relations, even if they are usually not explicitly represented, are needed in order to comprehend how the code achieves its results. By explicitly showing these relationships we believe that program understanding can be eased. Improving the Quality of Requirements Specifications via Automatically Created Object-Oriented Models D. Popescu, S. Rugaber, N. Medvidovic, D. M. Berry Software requirements specifications (SRS) are a fundamental part of software construction. Even when the final version of the requirements can be written using in a formal language, the first draft is usually written in natural language (NL). The use of natural language can introduce ambiguities in the requirements specification, which can be hard to detect even after a review process. The authors propose an approach to transform the requirements specification into a graphical model that can be reviewed more easily than the original requirement specification. Since the document has to be analyzed to create the diagram, the correct identification of elements used in the natural language is a key aspect for this approach to work. In order to accomplish this task, the authors limit the natural language used to a constraining grammar in order to reduce the complexity of the analysis procedure. The paper examines how a NL SRS correctly identifies desired pieces of information when the NL is modified to the constraining grammar, but it fails to mention how much and how the text was altered in order to achieve the result shown. Also there is no study of how the review of the model performs against a review of the original NL SRS, given that the creation of the model could introduced errors that didn’t exist in the original document, it would be interesting to compare results of these reviews. SRS are used as a base to write the application code, and thus code is strongly related to the SRS. Since SRS are used to describe elements of a higher level domain it seems natural to use a SRS as a layer between of abstraction between the problem domain and the source code. TAJ is able to link source constructions with domain elements, the ability of generating an initial model based on the analysis of the SRS could aid in the workflow of working with TAJ. On the Knowledge Required to Understand a Program R Clayton, S Rugaber, L Wills - Working Conference on Reverse Engineering, 1998 This paper reflects the complexity of understanding a program when the main source of knowledge of the program is the source code. Since applications are created to solve certain real life problem, the source code relates to elements in the domain of the problem, but the problem domain is not the only existing knowledge needed to create an application. When an application is created there is the need to know about the programming language used during the coding phase, general concepts about programming and how it relates to the underplaying elements (hardware, virtual machine, interpreter, etc.). Those different knowledge domains affect the ideal design concept and add complexity when trying to move from the low level source code to a higher level of understanding. The paper explores these issues with the source code of an application written in FORTRAN, which finds a root of a function of a single, real variable in an interval. The results of this study show that all the different domains that take part in the application source code are highly interleaved and equally important when trying to understand the application source code. Allowing more than a single domain to be represented in TAJ is something that will have to be explored in future versions of the tool. This is not a trivial issue since certain code constructions can be, as shown in this paper, related to multiple domains and there is the need to keep all the different domains up to date in order for them to be useful. Software Maintenance by Transformation G. Arango, I. Baxter, .P Freeman, C. Pidgeon IEEE Software 3:33, 27-39, 1986 In this paper the authors explore the possibility of modifying existing operational software by transformation. They use a software model based in a tree structure were the root node is the initial system specification. Leaves are executable applications, and the intermediate nodes represent specifications at varying levels of abstraction. The arcs represent possible design choices. A specific software implementation of a specification is defined by a set of arcs that go from the root node to a leaf. When a software product has to be modified, the model is traversed from the current leaf towards the root until a node that encompasses both the current and desired implementations is found, and then a new child node is explored until a new leaf node that implements all the needed requirements is reached. Using this sort of techniques would be a very useful way of modifying existing software applications, but it seems to be based on an ideal model, where the different design decisions do not intersect with each other. For example a decision of changing certain software structure could invalidate certain states previously assumed as correct. This is something confirmed by the authors in their example, were they port an application from UCI Lisp to Franz Lisp and they found the semantic gap between I/O concepts in the two dialects to be to large. TAJ is based on the same concept that the final source code is an implementation of an ideal abstract representation (a domain), and that is affected by the different functional and non-functional requirements. The current version of TAJ is useful to gain knowledge of the application by abstracting software constructions to the domain concepts, but some of the concepts presented in the paper could be later added to TAJ in order to, for example, support a history of the evolution of the software, or the ability of generating portions of code automatically. JIRiSS - an Eclipse plug-in for Source Code Exploration D Poshyvanyk, A Marcus, Y Dong - Proc. of 14th IEEE International Conference on Program …, 2006 JIRiSS is an eclipse plug-in for source code exploration, the plug-in allows users to search Java projects (source code and documentation) using natural language. The results of the queries are presented back to the user sorted by relevance, so that the most relevant results can be easily identified by the user. Source code and documentation analysis and comprehension are a fundamental part of the process of software development, maintenance, reverse engineering, and software evolution, JIRiSS allows users to query those sources of information in order to extract knowledge from them. By integrating this process in the development IDE, users can more easily access the source of knowledge. Unfortunately the paper does not include technical or implementation details, this fact makes it impossible for the reader to determinate the usefulness of the plug-in compared to the search function included in all IDEs. In the paper they mention the ability to query the system using natural language, but the lack of information about the processing of the queries, makes it impossible to determinate if there is some kind of semantic analysis of the queries or is just a simple use of keywords. The ability to query a source of knowledge that includes both source code and documentation could be useful to determinate associations between the low level implementation of the application and a higher representation of the problem that the application tries to solve. This could be beneficial in a project like TAJ, where the user has extract knowledge from the code and link it to a domain for a better understanding of a program. Supporting Document and Data Views of Source Code ML Collard, JI Maletic, A Marcus - Proceedings of the 2002 ACM symposium on Document …, 2002 This paper describes a representation of source code using the XML format. The XML stores the original source code and adds structural information to it, so various views can be selected from a single document. This XML document can be used by other applications to manipulate the document, and allows developers to query the document using XPath to extract information about it. It is not clear how the use of this tool could be useful to developers since it seems to simply structural information that does not really add any real knowledge. Useful features like different highlight colors in the source code are available on any modern IDE and the creation of things like call graphs can be done directly from the source code. Also the authors affirm that srcML has higher visibility than AST & Symbol Table, but due to the tree-based structure of XML documents, there is limited expressiveness unless internal mapping are created between different elements of the document as a workaround. Since no document schema is shown is impossible to evaluate the expressiveness that can be archived with this representation. For some reason they consider it important to maintain shallow information like whitespaces and tabulations, but it seems that it would be more useful to generate them when reading the document, offering the ability for developers to create they own formatting settings. Also there is no actual implementation of the srcML tool so the authors could not show any tests results that can be used to evaluate their solution. The use of an XML document to store the information generated by TAJ would be useful to allow interoperability with other tools. Since there are many tools and libraries to manipulate XML documents this addition would make possible to offer different views of the same information. This could lead to views for certain users that hide non-relevant data, so the relevant information is more visible to them. References - Eclipse 3.0 Porting Guide, Generic Workbench http://dev.eclipse.org/viewcvs/index.cgi/platform-uihome/rcp/generic_workbench_porting_guide.html?view=co - Folding in Eclipse Text Editors http://www.eclipse.org/articles/Article-Folding-in-Eclipse-TextEditors/folding.html - OSGi http://www.osgi.org/ - The Official Eclipse FAQs http://wiki.eclipse.org/index.php/Eclipse_FAQs - GMF Tutorial http://wiki.eclipse.org/index.php/GMF_Tutorial - GMF New and Noteworthy http://wiki.eclipse.org/index.php/GMF_New_and_Noteworthy - Generating an EMF Model - The Eclipse Modeling Framework (EMF) Overview