CPSC 507 Software Engineering
Project Report

GNU Chess under Acacia/CIAO view

by Ha Hoang
hvha@cs.ubc.ca

1. Introduction

It is well known that programmers and software maintainers are frequently faced with the task of maintaining complex software systems with inadequate documentation. A tool that helps them regenerate software blueprints and trace architecture evolution is a great assistance. The topics on software architecture, visualization and reverse engineering that we discussed in the course inspired me to experiment with a real tool on a real system. This was quite a challenge, as I had never used any analysis tool before (nor even known about them before the course).

This project aims to apply a program analysis tool to perform a task on an existing (large) system. Acacia/CIAO, a source analyzer and visualizer, is the analysis tool chosen. The system to be analyzed is GNU Chess (http://www.gnu.org/software/chess/chess.html), free software provided by GNU, which consists of more than 10,000 lines of C code.

The report is arranged as follows: section 2 briefly introduces GNU Chess and discusses what I have learnt about the system. Section 3 focuses on Acacia/CIAO and presents my own experience with the tool. Section 4 concludes the report.

2. GNU Chess

GNU Chess version 5.0 is a software package provided by GNU that lets most modern computers play a full game of chess. It has a plain terminal interface but supports visual interfaces such as X-Windows "xboard" and Windows-for-PC "winboard", as well as a full 3-dimensional wooden chess-board protocol for the Novag Chess board, which leaves the player relatively free of the computer itself. It consists of 31 source files containing 10,532 lines of C code. Compared with older versions, the code has been simplified and restructured, which makes it easier to understand and to modify.

The goal of this project is to use Acacia/CIAO to analyze the architecture of GNU Chess, in terms of the modules that the system consists of and the interrelationships between them. Several tasks are carried out in this project: extracting a database from the source code, learning the interrelationships between modules, and detecting dead functions (i.e. those that have no reference path from the main function) and unnecessary include files. The result of each task is described below.

2.1 Extracting structure information from the source

As the program has a small number of source files, it is quite easy to have CIA extract the information and build a relational database. Table 1 compares the complexity of the resulting program database with that of two other programs analyzed by the authors of Ciao [1], using several complexity metrics. GNU Chess falls between these two programs in size, which makes it a reasonable choice.

Program      Lines    Source size   DB size   Entities   Relationships
incl         1,957    49,963        26,416    392        517
GNU Chess    10,532   -             165,000   2,008      2,981
xgremlin     24,582   620,726       620,726   4,842      6,634

Table 1. Complexity metrics for the GNU Chess, incl and xgremlin program databases

2.2 Interrelationship between modules

The system is analyzed at the granularity of functions, and each source file is considered a module. The interrelationship between modules is studied through the function calls between modules and the variables shared among them. Using Acacia/CIAO, we can find all function calls from other modules (files) into one particular module, or all function calls from that particular module to others. Figures 1 and 2 show examples of this relationship. Applying this procedure to all modules in the system, we can build up a module-to-module relationship in terms of function calls. This is shown in Table 2.

Figure 1. A Reference Graph that shows all the functions that are called by functions in module test.c

Figure 2.
A Reference Graph that shows all the functions that call functions in module test.c

Based on this information, I have drawn a roughly hierarchical architecture of the system. As the modules call each other very often, the hierarchy looks quite spaghetti-like. However, it is easy to see that there is a root (main) that represents level 0. Level 1 consists of those modules that connect directly with the root (cmd, book, init, iterate and version). The rest of the modules are arranged somewhat randomly. We can see that most of the modules are tightly coupled.

Analyzing the global variables shared between related modules (modules that have functions calling, or called from, other modules) shows that only a few variables are declarations; the rest are definitions. Hence, shared variables are not a big problem for the system in terms of coupling. Furthermore, with the help of Ciao, it is not difficult to trace all entities that might be affected by a change to an entity.

Ciao does provide customized visualizations of the different relationships among entities. But it is the analysts who have to find out the structure of the program, i.e. which modules the program has and how they are connected. With only the information extracted by Ciao, as shown in Table 2, and without even modest documentation, analysts have to rely on their own experience to determine the system modules. This decision is very important, as it affects the assessment of the coupling or cohesion of the system, but it is not easy to make. In fact, I found this the most painful task in the project.

2.3 Dead functions

The term "dead function" is used for functions that never get exercised [1]; they are therefore useless and should be removed from the system. This can happen when a new version of the software is built on previous versions: modifications to the system may leave some old functions unused.

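As an illustration of the definition above, finding the functions that lie on no reference path from main amounts to a reachability check over a call graph. The following Python sketch assumes the call relationships have already been exported as caller/callee pairs. The function names, the edge list and the helper find_unreachable are hypothetical; they are not part of Acacia/CIAO and are not taken from the GNU Chess database. The sketch ignores functions that are invoked indirectly through variables, which would have to be supplied as extra edges.

```python
from collections import deque


def find_unreachable(calls, all_functions, root="main"):
    """Return, sorted, the functions that have no reference path from root."""
    # Build an adjacency map: caller -> set of callees.
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)

    # Breadth-first traversal starting at the root function.
    reachable = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, ()):
            if callee not in reachable:
                reachable.add(callee)
                queue.append(callee)

    return sorted(set(all_functions) - reachable)


# Hypothetical call pairs, invented for illustration only.
example_calls = [
    ("main", "setup"),
    ("main", "loop"),
    ("loop", "parse"),
    ("old_helper", "parse"),
]
example_functions = ["main", "setup", "loop", "parse", "old_helper", "unused_dump"]

print(find_unreachable(example_calls, example_functions))
```

Note that a function such as old_helper is reported even though it calls other functions: what matters is whether anything reachable from the root refers to it, not whether it has outgoing edges.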
In order to detect those functions, we first extract all the functions that are not on any reference path from the root function (main). Based on the relational database extracted from the source code (using cia), Acacia/CIAO allows us to build a function-to-function reference graph. The resulting graph is quite complex and incomprehensible in printed form. However, we can refine the graph in Ciao by deleting unimportant relationships. In this particular case, I deleted all the functions that come from the system include files in order to concentrate on the program's own functions. The refined graph is shown in Figure 3. Six functions stand alone without any reference from the root function: BookCmd, ShowSmallBoard, ShowHashKey, ShowCBoard, ShowMvBoard and InitFICS.

This is not the final answer, though. There are two possible reasons for a function not to be on any reference path from the root. The first is that it is a dead function (which is what we are looking for). The second is that it is invoked indirectly through variables. To rule this out, we also need the variable-to-function relationship. The query for this relationship shows that no variables invoke the six functions listed above. Now we can be sure that they are dead functions and can be removed.

2.4 Unnecessary include files

C and C++ software systems typically share data types, macros, and declarations of global variables by including common header files. The header files and their interdependencies form an include hierarchy. As with any other part of the system, the include hierarchy grows with the project as features are added, deleted or modified. It can become very large and complex, which makes it very difficult for programmers to decide when a file must be included. Since including a file that contributes no useful information is usually harmless, programmers tend to include enough files so that the program will compile.
This causes extra compilation overhead due to the processing of unneeded files. It is therefore useful to find out whether an include file is needed or not. This information can be used to rework the code or refine the include hierarchy.

GNU Chess has six program include files and many built-ins. I am only interested in the program include files, though. Surprisingly, two of those six files are unnecessary. The first one (univ.h) is included by univ.c, a file that is never compiled. In fact, this is the only file in GNU Chess that cannot be compiled; it is probably an experimental part. It does not affect the system's performance, though. The second one (eval.h) is more interesting. It is included in a file that really plays a role in the system. Figure 4 shows the include hierarchy and reference graph of cmd.c (a module that provides commands to the system). The dotted edge between cmd.c and eval.h means that cmd.c includes the header file but does not directly use any information contained in it. I did a small test in which I removed the include from cmd.c and then compiled it. The system compiled without any problem.

Figure 4. A reference graph that shows all include files in cmd.c

2.5 Summary

All the tasks of learning the system structure and detecting dead functions and unnecessary include files were achieved. The most valuable lesson I learnt from GNU Chess is how to deal with a nontrivial program without any documentation. The approach applied here is bottom-up: from all the complex relationships between functions, we extract higher-level structures. Now that the project is done, it seems that another approach is possible, especially for GNU Chess. As the main function is clearly specified and has few directly linked functions, it is possible to analyze the system in a top-down manner as well.
In this second approach, we can take each function that is directly linked with main as a module of the first level. All the functions that are called directly from these functions could then form the second level of the system architecture.

3. Acacia/CIAO

Acacia/CIAO is a set of tools for analyzing C, C++ and HTML source. It consists of CIA and Ciao. CIA is a static extractor that extracts information about a system and stores it in a source model. Ciao is a customizable graphical navigator for software and document repositories that helps large software projects regenerate their software blueprints and trace architecture evolution [3]. Such a set of tools is very useful for software maintainers and programmers, because software documentation rarely reflects a system's features and architecture precisely, and it is almost impossible to read through the code of a large project.

Getting started with Acacia/CIAO is not easy, although installing it is not a big deal. Everything is straightforward until one tries to run the demo. The demo is too simple. The graphs that pop up are nice, but we cannot do much with them. My first impression was "can it really do what is claimed?" [3]. There are very few guidelines on how to exploit the tool, and they are incomplete. The only way is to learn as you go.

With GNU Chess, I could not try all the features of Acacia/CIAO. There is only one version of GNU Chess available, which makes a real comparison of structural differences with Acacia/CIAO impossible. The other features were learnt and studied as the tasks were done.

3.1 Database extraction

Obtaining a program database that works with Acacia/CIAO is not difficult. If we have a system that compiles, then we can be fairly sure that a database can be extracted. Only one simple 'make' file is needed, and Acacia takes good care of the rest.
However, as specified in the manual, there are many other options that users can apply when extracting information from source code, for example to save disk space for the database or to customize it. I did not use them because I believe that GNU Chess is not large enough to encounter such problems.

3.2 Query and graph viewer

All queries are issued from the Main View window or from a Graph View window. Almost all items in the submenus of the Main View and Graph View windows were used, but with different frequency. I found that the most often used query is the relationship query. Combining the relationship submenu with the entity and focus submenus gives very interesting results when querying the interrelationships among entities such as variables, functions, and modules.

The #include graph is very good for finding unnecessary include files. However, it is quite tedious to query every single source file to check for unnecessary include files. GNU Chess has only 25 .c files; for a much larger system, this work would be very time-consuming.

In Acacia/CIAO, there is no direct query or graph viewer that shows the interrelationship between modules (files). I combined the relationship submenu with the reachable set and focus submenus. This yields the complete relationship among functions, or between a module (file) and other functions. Checking the attributes of the functions then gives all the modules that contain these functions. But there is still no graphical visualization for these particular references. Analysts have to make up the visualization that suits them best. I find that a two-dimensional table is a good start for understanding the interrelationships among modules. Rows and columns that have very few check marks (x) belong to modules that have little interconnection with other modules and can be considered independent.
In contrast, modules that have many check marks in their rows or columns seem to be strongly interconnected with others and are thus worth a deeper dependency check. In the latter case, we can again use Acacia/CIAO to query for all the variables shared among the interconnected functions, to see whether they are indeed strongly coupled.

I tried several ways to detect dead functions using Ciao. The most straightforward method is to use a CIA tool called deadobj. Unfortunately, this tool is not included in Acacia/CIAO and thus could not be used. An alternative is to view the function-to-function relationship in a Graph View window. The graph is rather complex, and one can hardly spot any function that has no reference path from the main function. To refine the graph, I deleted all the functions that come from the C include files. The resulting graph is clear enough to find all the stand-alone functions mentioned in section 2.3. The refinement is quite tedious and time-consuming, too. I believe that for large projects the CIA tool deadobj is clearly necessary.

As seen in Figure 3, large program graphs are usually very complex. A good feature of Ciao is that it allows the user to manage this complexity by concentrating on individual nodes of interest. We can:

- Find all program entities that an entity depends on, directly or indirectly. For instance, we can extract all functions that call one particular function.
- Display a few layers of relationships centered on a particular node, using the focus submenu.
- Display the attributes of a particular node, or query backward from a certain node for different information.

3.3 Database and text view

Both database and source views are directed to Text View windows. This mode of view is appropriate for comparing structural differences. However, as this is not the goal of the project, it was rarely used.

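The two-dimensional module table described in section 3.2 can be derived mechanically once each function is mapped to the file that contains it. The sketch below is a minimal Python illustration under that assumption. The function-to-file mapping, the call pairs, the file names and both helper names are hypothetical, not output from Acacia/CIAO; the choice of rows as calling modules and columns as called modules is also an assumption.

```python
def module_pairs(calls, function_to_module):
    """Collapse function-level call pairs into (caller module, callee module) pairs."""
    pairs = set()
    for caller, callee in calls:
        src = function_to_module.get(caller)
        dst = function_to_module.get(callee)
        # Skip functions with no known module (e.g. library functions)
        # and calls that stay inside a single module.
        if src is None or dst is None or src == dst:
            continue
        pairs.add((src, dst))
    return pairs


def format_table(pairs, modules):
    """Render the module pairs as a plain-text table: x = calls, . = no calls."""
    width = max(len(m) for m in modules) + 2
    header = " " * width + "".join(m.ljust(width) for m in modules)
    rows = [header.rstrip()]
    for src in modules:
        cells = "".join(
            ("x" if (src, dst) in pairs else ".").ljust(width) for dst in modules
        )
        rows.append((src.ljust(width) + cells).rstrip())
    return "\n".join(rows)


# Hypothetical data, invented for illustration only.
example_modules = ["ui.c", "engine.c", "log.c"]
example_mapping = {
    "start": "ui.c",
    "run": "engine.c",
    "step": "engine.c",
    "write_log": "log.c",
}
example_calls = [
    ("start", "run"),
    ("run", "step"),        # same module, ignored
    ("step", "write_log"),
    ("run", "printf"),      # library function, ignored
]

example_pairs = module_pairs(example_calls, example_mapping)
print(format_table(example_pairs, example_modules))
```

Counting the marks per row and per column then gives a rough first indication of which modules deserve the closer shared-variable check.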
3.4 Other remarks

ADVANTAGES

The graphical interface of Ciao is friendly and easy to navigate. The user can choose to extend the current graph (using "inplace" mode) or create a new graph. With many graph windows open at the same time, the navigation graph helps the user handle them efficiently.

The logic of the queries implemented in Ciao is easy to understand. This is very helpful for users learning the tool, as the help documentation of Ciao is very inadequate. I found learning Ciao as I went along very interesting. The cql language used in Ciao is quite similar to other database query languages such as SQL. Furthermore, users can always view a query result in graph or text mode to check whether it is what they want or to find out what needs to be changed in their queries.

Experimenting with Ciao makes me believe that other analysis tools built on top of Ciao will work well. Several such tools have been built. Dagger [1] is a tool that generates program graphs and allows users to manage the complexity of the graphs. TestTube [2] is another system, which combines static and dynamic analysis to perform selective re-testing of software systems.

DISADVANTAGES

The popup menu is not very pleasant: it can be hidden under a pile of windows, and the user has to move all of them around to find that small window.

It would be more convenient for users if Ciao could redo or checkpoint "inplace" queries. When running a sequence of queries (which happens very often), users might want to analyze a subset of entities incrementally. An "inplace" query may turn out to be wrong, and users will probably want to roll back. However, this is not possible in Ciao. Users have to restart the whole sequence of queries and wait while Ciao performs all of them again.

Some tasks are tedious and time-consuming, for example detecting dead functions and unnecessary include files.

4. Conclusion

Ciao, as I learnt from the project, is an interesting tool for system analysis.
As a graphical navigator, it helps programmers navigate through an information database of the program. Although a few aspects of Ciao are not completely automated, it does a good job in general. Together with other tools built on top of Ciao, such as Dagger or TestTube, a future tool could be one that depicts the architecture of a system at a high level. From what I have seen of Ciao and its related tools so far, why not!

I really enjoyed doing this project, as I learnt and did a lot of things that I had only read about in books and papers. Gail, thank you very much.

References

[1] Yih-Farn R. Chen. Dagger: A tool to generate program graphs. In Proceedings of the USENIX Unix Applications Development Symposium, pages 19-35, 1994.

[2] Yih-Farn R. Chen, David Rosenblum, and Kiem-Phong Vo. TestTube: A System for Selective Regression Testing. In The 16th International Conference on Software Engineering, pages 211-220, 1994.

[3] Yih-Farn R. Chen, Glenn S. Fowler, Eleftherios Koutsofios, and Ryan S. Wallach. Ciao: A Graphical Navigator for Software and Document Repositories. AT&T Bell Laboratories.

[4] http://www.research.att.com/~ciao/help/