CPSC 507 Software Engineering
Project Report
GNU Chess under Acacia/CIAO view
by Ha Hoang
hvha@cs.ubc.ca
1. Introduction
It is well known that programmers and software maintainers frequently have to
maintain complex software systems with inadequate documentation. A tool that helps
them regenerate software blueprints and trace architecture evolution is therefore of
great assistance.
The topics on software architecture, visualization and reverse engineering that we
discussed in the course inspired me to experiment with a real tool on a real system.
This was quite a challenge, as I had never used any analysis tool before (I did not
even know about them before the course).
This project applies a program analysis tool to a task on an existing (large) system.
Acacia/CIAO, a source analyzer and visualizer, was chosen as the analysis tool. The
system analyzed is GNU Chess¹, free software provided by GNU, which consists of more
than 10,000 lines of C code. The report is arranged as follows: section 2 briefly
introduces GNU Chess and discusses what I have learnt about the system. Section 3
focuses on Acacia/CIAO and presents my own experience with the tool. Section 4
concludes the report.
¹ GNU Chess: http://www.gnu.org/software/chess/chess.html
2. GNU Chess
GNU Chess version 5.0 is a software package provided by GNU that lets most
modern computers play a full game of chess. It has a plain terminal interface but also
supports visual interfaces such as X-Windows "xboard" and Windows-for-PC "winboard", as
well as a full 3-dimensional wooden chess-board protocol for the Novag Chess board,
enabling one to be relatively free of the computer itself. It consists of 31 source files
containing 10,532 lines of C code. The code has been simplified and restructured compared
with older versions, which makes it easier to understand and to modify.
The goal of this project is to use Acacia/CIAO to analyze the architecture of GNU
Chess in terms of the modules that make up the system and the interrelationships between
them. Several tasks were carried out: extracting a database from the source code,
learning the interrelationships between modules, and detecting dead functions (i.e. those
that have no reference path from the main function) and unnecessary include files. The
result of each task is described below.
2.1 Extracting structure information from the source
As the program has a small number of source files, it is quite easy to have CIA
extract the information and build a relational database. Table 1 compares the complexity
of the resulting program database with that of two other programs analyzed by the
authors of Ciao [1]. In size, GNU Chess lies between these two programs, which makes it
a reasonable choice.
Program     Lines    Source size   DB size    Entities   Relationships
incl        1,957    49,963        26,416     392        517
GNU Chess   10,532   n/a           165,000    2,008      2,981
xgremlin    24,582   620,726       620,726    4,842      6,634
Table 1. Complexity metrics for GNU Chess, incl and xgremlin program databases
2.2 Interrelationship between modules
The system is analyzed at the granularity of functions, and each source file is
considered a module. The interrelationship between modules is studied through the
function calls between modules and the variables they share. Using Acacia/CIAO, we can
find all function calls from other modules (files) into one particular module, or all
function calls from that module to others. Figures 1 and 2 show examples of this
relationship. Applying this procedure to all modules in the system, we can build up a
module-to-module relationship in terms of function calls. This is shown in Table 2.
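The aggregation from function-level calls to a module-to-module table can be sketched as follows. This is only a minimal Python illustration, not the query mechanism of Acacia/CIAO itself: the function names, their assignment to files and the call pairs are made-up sample data, and in practice they would be exported from the CIA database.

```python
from collections import defaultdict

# Hypothetical sample data: the file (module) that defines each function.
DEFINED_IN = {
    "main": "main.c",
    "parse_input": "cmd.c",
    "book_lookup": "book.c",
    "init_board": "init.c",
    "search_root": "iterate.c",
}

# Hypothetical function-level call pairs (caller, callee).
CALLS = [
    ("main", "parse_input"),
    ("main", "init_board"),
    ("main", "search_root"),
    ("parse_input", "book_lookup"),
    ("search_root", "book_lookup"),
    ("init_board", "init_board"),  # call inside one module, ignored below
]


def module_calls(defined_in, calls):
    """Collapse function calls into a map: module -> set of modules it calls."""
    table = defaultdict(set)
    for caller, callee in calls:
        src, dst = defined_in[caller], defined_in[callee]
        if src != dst:  # only calls that cross a file boundary count
            table[src].add(dst)
    return dict(table)


def print_table(table, modules):
    """Print a two-dimensional table with an x where the row calls the column."""
    width = max(len(m) for m in modules)
    print(" " * width, *(m.ljust(width) for m in modules))
    for row in modules:
        cells = [
            ("x" if col in table.get(row, set()) else ".").ljust(width)
            for col in modules
        ]
        print(row.ljust(width), *cells)


TABLE = module_calls(DEFINED_IN, CALLS)
print_table(TABLE, sorted(set(DEFINED_IN.values())))
```

Each row of the printed table lists the modules that the row module calls, so reading down a column gives the callers of that module.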
Figure 1. A Reference Graph that shows all the functions that are
called by functions in module test.c
Figure 2. A Reference Graph that shows all the functions that call
functions in module test.c
Based on this information, I have drawn a roughly hierarchical architecture of the
system. As the modules call each other very often, the hierarchy looks quite
spaghetti-like. However, it is easy to see that there is a root (main), which represents
level 0. Level 1 consists of the modules that are connected directly to the root (cmd,
book, init, iterate and version). The rest of the modules are arranged somewhat randomly.
We can see that most of the modules are tightly coupled. Analyzing the global
variables shared between related modules (modules whose functions call, or are called
from, other modules) shows that only a few variables are declarations; the rest are
definitions. Hence, shared variables are not a big problem for the system in terms of
coupling. Furthermore, with the help of Ciao, it is not difficult to trace all entities
that might be affected by a change to an entity.
Ciao does provide customized visualizations of different relationships among
entities. But it is the analyzers who have to work out the structure of the program, i.e.
which modules the program has and how they are connected. With only the information
extracted by Ciao, as shown in Table 2, and without even modest documentation, analyzers
have to rely on their own experience to determine the system modules. This decision is
very important, as it affects the assessment of the coupling or cohesion of the system,
but it is not easy to make. In fact, I found this the most painful task in the project.
2.3 Dead functions
The term “dead function” is used for functions that never get exercised [1] and are
therefore useless and should be removed from the system. This can happen when a new
version of the software is built on previous versions: modifications to the system may
leave some old functions unused.
To detect these functions, we first extract all the functions that are not on any
reference path from the root function (main). Based on the relational database
extracted from the source code (using cia), Acacia/CIAO allows us to build a
function-to-function reference graph. The resulting graph is quite complex and
incomprehensible in printed form. However, we can refine the graph in Ciao by deleting
unimportant relationships. In this particular case, I deleted all the functions that come
from the system include files in order to concentrate on the program’s functions only.
The refined graph is shown in Figure 3. Six functions stand alone without any reference
path from the root function: BookCmd, ShowSmallBoard, ShowHashKey, ShowCBoard,
ShowMvBoard and InitFICS. This is not the final answer, though. There are two possible
reasons for a function not to be on any reference path from the root. The first is that
it is a dead function (which is what we are looking for). The second is that it is
invoked indirectly through a variable. To rule this out, we also need the
variable-to-function relationship. The query for this relationship shows that no
variables refer to the six functions listed above. Now we can be sure that they are dead
functions and can be removed.
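The two-step check described above (reachability from main, then the variable-to-function relationship) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the Ciao procedure itself: all function and variable names are hypothetical, and a function referenced by any variable is conservatively treated as a further root.

```python
from collections import deque


def reachable(graph, roots):
    """Return every node reachable from any of the given roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dead_functions(functions, calls, var_refs, root="main"):
    """Functions on no reference path from root, directly or via a variable."""
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
    # Conservative assumption: a function stored in a variable may be invoked
    # indirectly, so it is treated as an additional root.
    indirect_roots = {func for _var, func in var_refs}
    live = reachable(graph, [root, *sorted(indirect_roots)])
    return sorted(f for f in functions if f not in live)


# Hypothetical sample data, not taken from GNU Chess.
FUNCTIONS = ["main", "play", "search", "evaluate",
             "old_report", "old_helper", "on_signal"]
CALLS = [("main", "play"), ("play", "search"), ("search", "evaluate"),
         ("old_report", "old_helper"), ("on_signal", "evaluate")]
VAR_REFS = [("handlers", "on_signal")]  # (variable, function it refers to)

print(dead_functions(FUNCTIONS, CALLS, VAR_REFS))
```

Note that old_helper has a caller yet is still reported, because its only caller is itself unreachable; this is why the check works on reference paths from main rather than on functions without callers.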
2.4 Unnecessary include files
C and C++ software systems typically share data types, macros, and declarations
of global variables by including common header files. The header files and their
interdependencies form an include hierarchy. As with any other part of the system, the
include hierarchy grows with the project as features are added, deleted or modified. It
can become very large and complex, which makes it difficult for programmers to decide
when a file must be included. Since including a file that contributes nothing useful is
usually harmless, programmers tend to include enough files for the program to compile.
This adds compilation overhead for processing unneeded files. It is therefore useful to
find out whether an include file is needed. This information can be used to rework the
code or refine the include hierarchy.
GNU Chess has six program include files and many built-ins. I am interested only
in the program include files. Surprisingly, two of those six files are unnecessary. The
first one (univ.h) is included by univ.c, a file that never compiles. In fact, this is
the only file in GNU Chess that cannot be compiled; it is probably an experimental part.
It does not affect the system's performance, though. The second one (eval.h) is more
interesting. It is included in a file that really plays a role in the system. Figure 4
shows the include hierarchy and reference graph of cmd.c (a module that provides commands
to the system). The dotted edge between cmd.c and eval.h means that cmd.c includes the
header file but does not directly use any information contained in it. I did a small
test in which I removed the include from cmd.c and recompiled. The system compiled
without any problem.
Figure 4. A reference graph that shows all include files in cmd.c
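The dotted-edge criterion (a file includes a header but uses nothing declared in it) can be expressed as a small set computation. The sketch below is a simplified Python illustration, not how Ciao computes its include graph: the entity names are hypothetical, and nested includes (a header needed only because it pulls in another header) are deliberately ignored.

```python
def unneeded_includes(includes, declares, uses):
    """Map each source file to the included headers it uses nothing from.

    includes: source file -> set of headers it includes
    declares: header -> set of entity names declared in that header
    uses:     source file -> set of entity names the file refers to
    """
    result = {}
    for source, headers in includes.items():
        used = uses.get(source, set())
        # A header is unneeded if none of its declarations is used.
        unused = sorted(h for h in headers
                        if not (declares.get(h, set()) & used))
        if unused:
            result[source] = unused
    return result


# Hypothetical sample data; only the file names echo the report.
INCLUDES = {
    "cmd.c": {"common.h", "eval.h"},
    "eval.c": {"common.h", "eval.h"},
}
DECLARES = {
    "common.h": {"board_t", "MAX_DEPTH"},
    "eval.h": {"piece_value", "evaluate"},
}
USES = {
    "cmd.c": {"board_t", "MAX_DEPTH"},
    "eval.c": {"board_t", "piece_value", "evaluate"},
}

print(unneeded_includes(INCLUDES, DECLARES, USES))
```

A header flagged this way is only a candidate for removal; as in the small test above, recompiling without the include remains the final check.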
2.5 Summary
All the tasks were achieved: learning the system structure, detecting dead functions
and detecting unnecessary include files.
The most valuable lesson that I learnt from GNU Chess is how to deal with a nontrivial
program without any documentation. The approach applied here is bottom-up: from all the
complex relationships between functions, we extract higher-level structures. Now that
the project is done, it seems that another approach is possible, especially for GNU
Chess. As the main function is clearly identified and has few directly linked functions,
it is possible to analyze the system in a top-down manner as well. In this second
approach, we can take each function that is directly linked with main as a module of
the first level. The functions called directly from these functions could then form the
second level of the system architecture.
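The top-down idea amounts to a breadth-first traversal of the call graph starting at main. The following Python sketch shows one way to assign levels; it was not part of the project, and the call pairs are hypothetical sample data rather than the real GNU Chess call graph.

```python
from collections import deque


def levels_from_root(calls, root="main"):
    """Assign each reachable function its shortest call distance from root."""
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, []).append(callee)
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in level:  # first visit gives the shallowest level
                level[callee] = level[node] + 1
                queue.append(callee)
    return level


# Hypothetical call pairs (caller, callee), including one cycle.
CALLS = [
    ("main", "cmd_loop"), ("main", "init_all"), ("main", "iterate"),
    ("cmd_loop", "book_probe"), ("iterate", "search"),
    ("search", "evaluate"), ("book_probe", "evaluate"),
    ("search", "iterate"),
]

for name, depth in sorted(levels_from_root(CALLS).items(),
                          key=lambda item: (item[1], item[0])):
    print(depth, name)
```

A function called from several levels is placed at the shallowest one, and cycles do not matter because each function is assigned a level only once.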
3. Acacia/CIAO
Acacia/CIAO is a set of tools for analyzing C, C++ and HTML source, consisting of
CIA and Ciao. CIA is a static extractor that extracts information about a system and
stores it in a source model. Ciao is a customizable graphical navigator for software
and document repositories that helps large software projects regenerate their software
blueprints and trace architecture evolution [3]. Such a set of tools is very useful for
software maintainers and programmers, because software documentation rarely reflects
the features and architecture precisely and it is almost impossible to read through the
code of large projects.
Getting started with Acacia/CIAO is not easy, although installing it is not a big deal.
Everything is straightforward until one tries to run the demo. The demo is too simple.
The graphs that pop up are nice, but we cannot do much with them. My first impression was
“can it really do what is said?” [3]. There are very few guidelines on how to exploit
the tool, and they are incomplete. The only way is to learn as you go.
With GNU Chess, I could not try all the features of Acacia/CIAO. Only one version of
GNU Chess was available, which made a real comparison of structural differences with
Acacia/CIAO impossible. The other features were learnt and studied as the tasks were done.
3.1 Database extraction
Building a program database that works with Acacia/CIAO is not difficult. If we
have a system that compiles, then we can be fairly sure that we will be able to extract
a database. Only one simple ‘make’ file is needed, and Acacia takes good care of the
rest. However, as specified in the manual, there are many other options that users can
apply when extracting information from source code, either to save disk space for the
database or to customize it. I did not use them because I believe that GNU Chess is not
large enough to run into such problems.
3.2 Query and graph viewer
All queries are issued from the Main View window or from a Graph View window.
Almost all items in the submenus of the Main View and Graph View windows were used, but
with different frequency. I found the relationship query to be the one used most often.
Combining the relationship submenu with the entity and focus submenus gives very
interesting results when querying the interrelationships among entities such as
variables, functions, and modules.
The #include graph is very good for finding unnecessary include files. However, it
is quite tedious to query every single source file to check for unnecessary include
files. GNU Chess has only 25 .c files; for a much larger system, this work would be very
time-consuming.
Acacia/CIAO has no direct query or graph viewer that shows the interrelationship
between modules (files). I combined the relationship submenu with the reachable set and
focus submenus. This yields the complete relationship among functions, or between a
module (file) and other functions. Checking the attributes of the functions gives us all
the modules that contain them. But there is still no graphical visualization for these
particular references. Analyzers have to make up a visualization that suits them best. I
find that a two-dimensional table is a good start in understanding the interrelationship
among modules. Rows and columns that have very few check marks (x) belong to modules
that have little interconnection with other modules and can be regarded as independent.
In contrast, modules that have many check marks in their rows or columns seem to have
strong interconnections with others and are thus worth a deeper dependency check. In the
latter case, we can again use Acacia/CIAO to query for all the shared variables among
the interconnected functions, to see whether they are indeed strongly coupled.
I tried several ways to detect dead functions using Ciao. The most
straightforward method is to use a CIA tool called deadobj. Unfortunately, this tool is
not included in Acacia/CIAO and thus could not be used. An alternative is to view the
function-to-function relationship in a Graph View window. The graph is rather complex,
and one can hardly detect any function that has no reference path from the main
function. To refine the graph, I deleted all the functions that come from C include
files. The resulting graph is clear enough to find all the stand-alone functions
mentioned in section 2.3. The refinement is rather tedious and time-consuming, too. I
believe that for large projects the CIA tool deadobj is clearly necessary.
As seen in Figure 3, large program graphs are usually very complex. A good
feature of Ciao is that it allows the user to manage this complexity by concentrating on
individual nodes of interest. We can:
- Find all program entities that an entity depends on, or that depend on it, directly or
  indirectly. For instance, we can extract all functions that call one particular function.
- Display a few layers of relationships centered on a particular node, using the focus
  submenu.
- Display the attributes of a particular node, or query backward from a certain node for
  different information.
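The first two operations in the list are both bounded or unbounded traversals of the reference graph, forward or backward. The Python sketch below shows the idea only; it does not reproduce Ciao's reachable set or focus submenus, and the edge list is hypothetical sample data.

```python
from collections import deque


def closure(edges, start, reverse=False, max_depth=None):
    """Entities reachable from start, excluding start itself.

    edges:     (source, target) pairs meaning "source refers to target"
    reverse:   follow the edges backward (who refers to start?)
    max_depth: number of layers to follow, or None for no limit
    """
    adjacency = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, set()).add(dst)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] == max_depth:
            continue  # do not expand beyond the requested layer
        for nxt in adjacency.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]
    return set(depth)


# Hypothetical reference edges.
EDGES = [("main", "play"), ("play", "search"), ("search", "evaluate"),
         ("play", "show_board"), ("evaluate", "piece_table")]

# Everything that calls evaluate, directly or indirectly.
print(sorted(closure(EDGES, "evaluate", reverse=True)))
# One layer of relationships around play, in both directions.
print(sorted(closure(EDGES, "play", max_depth=1)
             | closure(EDGES, "play", reverse=True, max_depth=1)))
```

The backward closure is also what is needed to list the entities that may be affected when a given entity changes.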
3.3 Database and text view
Both database and source views are directed to Text View windows. This mode of
view is appropriate for comparing structural differences. However, as that was not the
goal of this project, it was rarely used.
3.4 Other remarks
ADVANTAGES
- The graphical interface of Ciao is friendly and easy to navigate. The user can choose
  to extend the current graph (using “inplace” mode) or create a new graph. With many
  graph windows open at the same time, the navigation graph helps the user handle
  them efficiently.
- The logic of the queries implemented in Ciao is easy to understand. This is very
  helpful for users learning the tool, as Ciao's help documentation is very inadequate.
  I found learning Ciao as I went very interesting. The cql language used in Ciao is
  quite similar to other database query languages such as sql. Furthermore, users can
  always view the query result in graph or text mode to check whether it is what they
  want or to find out what needs to be changed in their queries.
- Experimenting with Ciao makes me believe that other analysis tools built on top of
  Ciao will work well. Several such tools have been built. Dagger [1] is a tool that
  generates program graphs and allows users to manage the complexity of the graphs.
  TestTube [2] is another system that combines static and dynamic analysis to perform
  selective retesting of software systems.
DISADVANTAGES
- The popup menu is not very pleasant: it can be hidden under a pile of windows, and the
  user has to move them all around to find that small window.
- It would be more convenient if Ciao could undo or checkpoint "inplace" queries. When
  running a sequence of queries (which happens very often), users may want to analyze a
  subset of entities incrementally. An "inplace" query may turn out to be wrong, and
  users will then want to roll back. However, this is not possible in Ciao. Users have
  to restart all the queries in the sequence and wait while Ciao performs them again.
- Some tasks are tedious and time-consuming, for example detecting dead functions and
  unnecessary include files.
4. Conclusion
Ciao, as I learnt from the project, is an interesting tool for system analysis. As a
graphical navigator, it helps programmers navigate through an information database of the
program. Although a few aspects of Ciao are not completely automated, it does a good job
in general. Together with other tools built on top of Ciao, such as Dagger or TestTube,
a future tool could be one that depicts the architecture of a system at a high level.
From what I have seen of Ciao and its related tools so far, why not!
I really enjoyed doing this project, as I learnt and did a lot of things that I had read
about in books and papers. Gail, thank you very much.
References
[1] Yih-Farn R. Chen. Dagger: A tool to generate program graphs. In Proceedings of the
USENIX Unix Applications Development Symposium, pages 19-35, 1994.
[2] Yih-Farn R. Chen, David Rosenblum, and Kiem-Phong Vo. TestTube: A System for
Selective Regression Testing. In The 16th International Conference on Software
Engineering, pages 211-220, 1994.
[3] Yih-Farn R. Chen, Glenn S. Fowler, Eleftherios Koutsofios, and Ryan S. Wallach.
Ciao: A Graphical Navigator for Software and Document Repositories. AT&T Bell
Laboratories.
[4] http://www.research.att.com/~ciao/help/