Ambrosia 15-612
Autonomous Agent Execution Environment (AXE)
A secure, distributed agent execution system for the Java Virtual Machine.

Ambrosia
Joshua Baer, Bob Dean, John Huebner, Jed Pickel
15-612 Distributed Systems, Dr. Raj Rajkumar, TA Andrew Berry
Carnegie Mellon University, Spring 1998

Chapter 1
Introduction
The Agent eXecution Environment (AXE)

Purpose

Ambrosia has developed an agent execution environment and demonstrated its practical use by implementing several agents. An Agent eXecution Environment (AXE) is more flexible than a client/server design model because it allows arbitrary code to be executed on a remote machine. An agent system can also localize computations near data, reducing network transmissions and increasing perceived and actual performance.

For example, a traditional client/server implementation of a mechanism to find and retrieve files on the network ("find file") would require a specific server on every machine from which files were to be found. This "find file" server would actively wait for find file requests, wasting local resources. In a more flexible design, the server might provide remote directory operations as a service. This would allow the client to download the available directory structures from the server, search the directories locally, and then send a request to the server for a file transfer. The client would be more efficient if it made a set of concurrent requests for directories, but this would also make the client more complex.

In contrast, an agent system would allow the search routines to be sent to multiple servers for execution. The file search would occur local to each server, and concurrently on multiple machines. This would reduce both latency and network bandwidth consumption. Furthermore, each instance of the algorithm used by the client could be tailored to the search being performed.

Overview

An agent system offers high availability and fault tolerance using a fail-stop model.
Availability is increased because a user can obtain an agent from multiple sources and execute it on multiple hosts. If an agent is replicated to a node, and that node fails before or during the execution of the agent, the parent agent can create and replicate a new agent to another node in the system. If the system semantics are designed to allow multiple copies of an agent, and/or agent cooperation, then a shared state method of active replication is easily supported. If an agent sends a copy of itself to another node and does not receive a timely reply, it can redirect that task to another node.

Performance can be increased by using long-term caching. An execution node saves the code and static data of an agent so that they need not be sent to the node the next time the agent is invoked.

Terms

This section defines a number of terms used to describe the Agent eXecution Environment. Some of these terms have been changed or updated since the submission of the original design proposal.

Term: Definition
Agent: A self-contained execution unit including code, data, and security information.
AXEKey: An agent object which stores the public and private key, and also provides methods for manipulating and authenticating them.
Agent eXecution Environment (AXE): The distributed collection of applications that accept, execute, and transmit agents. It mediates between the agent and the operating system to acquire resources for the agent.
Execution Node: A single application in the execution environment. It will only send agents that it will execute itself.
Data Object: The portion of an agent containing the state of the agent, that is, the variables needed by the agent to run. Note: constants are stored in the code object at compilation time.
Code Object: The portion of an agent that contains the execution instructions. It also includes static data, since static data shares the code's properties of being read-only and fixed after compilation.
Node: A single instance of an Agent eXecution Environment on a single host.
Security Manager (sandbox, SecMan): The internal AXE system which controls an agent's access to resources.
Resources: CPU cycles, file system, memory, network, display.
Sync: An agent entry point called by the system before transporting the agent to a different node. It allows the agent to save its state so that it is not lost during replication.
Short Term Caching: The execution node storing the entire agent to disk, temporarily, for use in transmitting or re-transmitting to another node.
Long Term Caching: The execution node storing the code and static data of an agent between invocations, to increase performance the next time that agent is invoked.

Chapter 2
Architecture
The Agent eXecution Environment (AXE)

Overall Design

An Agent eXecution Environment (AXE) is best defined as a distributed collection of synchronized execution nodes. An execution node is a machine that provides, as a service, the ability to accept and execute objects (agents) from the network. An agent is a combination of code, data, and log information that has the ability to travel through the Agent eXecution Environment, under the constraints imposed by individual execution nodes. The details of agent transport will be explained later in this paper.

[Figure 2.1: The Architecture of a Single AXE Node. Components shown: Agent (object with methods), Log, Data, Privilege List, GUI, Security Manager, AXE Node, JVM.]

An agent can only be introduced into the system from a node which itself is participating in the AXE. In order to provide an incentive to share local resources, a future restriction would only allow a node to introduce agents which would have permission to execute on the local node (the Golden Rule).

To join the system, a newly launched node contacts an existing member and authenticates, after which the member contacts all existing nodes to add the new member to their list of nodes. The AXE is locked down during this information update using a two-phase commit to maintain integrity.
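The join procedure above can be illustrated with a small in-memory sketch of a two-phase commit over the shared node list. This is not the AXE implementation: the class names `MemberNode` and `JoinCoordinator`, the `refusePrepare` switch, and the use of plain method calls instead of network messages are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for one node's copy of the shared node list. */
class MemberNode {
    final String name;
    final Set<String> knownNodes = new TreeSet<>();
    private boolean locked = false;
    boolean refusePrepare = false; // illustration only: simulates a node that cannot lock

    MemberNode(String name) {
        this.name = name;
        knownNodes.add(name);
    }

    /** Phase 1: lock the node list and vote. Returning false is a vote to abort. */
    boolean prepare() {
        if (locked || refusePrepare) {
            return false;
        }
        locked = true;
        return true;
    }

    /** Phase 2 (commit): apply the update and release the lock. */
    void commit(String newNode) {
        knownNodes.add(newNode);
        locked = false;
    }

    /** Phase 2 (abort): release the lock without changing the list. */
    void abort() {
        locked = false;
    }
}

class JoinCoordinator {
    /** Adds newNode to every member's list, or to none of them. */
    static boolean join(List<MemberNode> members, String newNode) {
        List<MemberNode> prepared = new ArrayList<>();
        for (MemberNode member : members) {
            if (member.prepare()) {
                prepared.add(member);
            } else {
                // A single "no" vote aborts the whole update.
                for (MemberNode p : prepared) {
                    p.abort();
                }
                return false;
            }
        }
        for (MemberNode member : members) {
            member.commit(newNode);
        }
        return true;
    }
}
```

Calling `JoinCoordinator.join(members, "c")` either leaves "c" in every member's list or in none, which is the integrity property the lock-down is meant to preserve.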
During the initialization of an AXE, the AXE registers its own security manager with the JVM. This gives control of the Java security checks to the AXE Agent Security Manager. Thus, the AXE has the ability to accept or deny every possible security request made within the JVM. See figure 2.1.

[Figure 2.2: Agent Transport. A master creates an agent (code and data) and sends it to an Agent eXecution Environment, which receives it and can send it on to another Agent eXecution Environment. Legend: Machine, Process, Data Structure; arrows: Creates, Sends Agent to, Receives.]

An agent can be introduced into the system by sending it to any node. Usually a user launches an agent by executing a master program that initializes the agent's data and sends it to a known node. One future addition would be to allow a node to launch agents without the aid of a master. Then, as an additional security measure, we could add the requirement that an agent be introduced into the system only from a node which itself is participating in the AXE. In order to provide an incentive to share local resources, another future restriction would only allow a node to introduce agents which would have permission to execute on the local node (the Golden Rule).

Each node keeps a list of all functioning nodes. In the future a node would be able to store public keys for agents and other nodes. The node would have the ability to function as a trusted source from which other nodes obtain agents, and to authenticate checksums or hashes for known agents.

The administrator at each individual node has the ability to configure the environment according to local policies and procedures, such as a default security policy for unknown agents. At the present time, all agents have the same privileges, but the system has been designed so that it could easily be enhanced to support per-agent security restrictions.

The distributed nature of this project takes place on two levels. The AXE itself is a distributed system that must maintain state, availability, and security.
On top of that, the AXE provides a framework for individual agents to build their own distributed systems.

Each node is multi-threaded, has the ability to process multiple agents simultaneously, implements a GUI for configuration, and maintains its own log file. Any change of state is recorded in the log file.

Logging

Whenever a problem occurs within a complex computer system, administrators need the ability to immediately and effectively isolate the problem and fix it. The System Log is invaluable in this task. For this reason, every aspect of an AXE node is logged. The log allows the administrator to trace and plug holes in their security model. The AXE system design tries to ease the administrator's work by incorporating log viewing into the graphical user interface, and by having each system module perform logging of its events.

The GUI aids with log analysis by the inclusion of two elements: the Quick Log Window and the Log Viewer. The Quick Log Window resides in the lower right side of the GUI's main window. It tracks the last one hundred messages logged by the system, allowing a quick reference for when the administrator becomes aware of a problem. If the Quick Log is not enough, the GUI also includes a Log Viewer. The Log Viewer is a separate window from the main GUI window. It also includes basic text manipulation features such as searching.

The AXE System also aids with logging by allowing the administrator to control the amount of logging that takes place. At the lowest logging level, basic system events having to do with the life cycle of the agent are logged, as well as events having to do with the addition/removal of nodes from the Environment. The second logging level is Trace. At this level the Agent Interface logs all API calls, and the Security Manager begins to log every security check, including the agent which necessitated the security check, and whether or not the check succeeded. The third and final logging level is Verbose.
At this level every aspect of the system is logged. When an agent makes a system access, the state of every agent is written to the log for comparison. When the Security Manager is called, a failed authentication results in a dump of the full execution stack. Such measures aid the administrator by giving a baseline to compare an event with, separating an error caused by a rogue agent from possible system glitches.

Agent Transportation

Agent transport is one of the major elements of the Agent eXecution Environment. The transportation system is what allows an agent to be sent around the network from node to node. There are three main parts to the transport system design: the agent object, the agent replication process, and agent caching.

The Agent as an Object

The agent object has three main objects within it: the code object, the data object, and the security/authentication object. The agent has been designed in this way to limit the executable segment's access to corruptible data. This design allows the agent to be easily transported across the network as a single object, and also allows the execution environment access to vital security information about the agent before the agent is executed.

Code Object: contains the actual Java byte code for the agent. The code object also contains constants required for execution, but not large static data structures. This includes such items as final variables and predefined strings. As far as the agent is concerned, this object is execute only.

Data Object: contains the current state of the agent. If an agent is to be transferred and restarted on another execution environment at the current point of execution, all necessary data for this restart is saved by calling for a Sync. If no state is needed at the new location, this object will be "empty." This object is only available to the agent through Execution Environment system calls such as GetData and SetData. No other access is allowed to the agent.
Between these calls, a local copy of the object can be manipulated and saved. This object also includes separate static data such as graphics files, agent-specific help, and other static data which the agent may wish to bring with it.

Security/Authentication Object: contains all security data needed by the AXE for authentication and tracking of the agent. For example, this object contains a public key to allow for agent authentication, and methods for manipulating and authenticating keys and data. The agent, through the use of AXE system calls, can read this object without restriction, but has no write access. The authentication portion of this object contains a checksum to ensure that the object is intact, version information, and the author's name.

The Agent Transport and Replication Process

The Agent eXecution Environment supports two methods of agent transport and replication. The first of these methods is manual control by the user. Each node has the ability to send an agent directly to another execution node of the AXE from the command line or (not fully implemented) within the GUI.

The second method of agent transport is Agent Replication. This is the process whereby an agent will send a copy of itself to one or more execution nodes. The agent achieves this through the use of execution node system calls.

Currently, there are two options for agent replication. The first is a straight transfer of the agent currently residing in the client's cache. This means that when the agent is executed on the new node, the execution will be independent of the parent agent's current state at time of transfer. The second transfer method is where the agent requests that its current state be sent along with the cached agent to the new node. In this method, the node calls a Sync in the agent before transport, to allow it to update the data object with any relevant state information.
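The agent layout and the two replication options described above can be sketched as follows. This is a minimal illustration, not the AXE code: `SketchAgent`, `Transport`, the `progress` field, and the use of a byte array in place of a socket are hypothetical, and the security/authentication object is reduced to an author name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical agent layout: code, data, and security parts in one object. */
class SketchAgent implements Serializable {
    private static final long serialVersionUID = 1L;

    final byte[] code;               // code object: byte code plus constants
    final Map<String, String> data;  // data object: the state that travels
    final String author;             // stand-in for the security/authentication object

    // Live state. Transient fields are not serialized, so this value is
    // lost in transport unless sync() has copied it into the data object.
    transient int progress;

    SketchAgent(byte[] code, String author) {
        this.code = code;
        this.author = author;
        this.data = new HashMap<>();
    }

    /** Entry point called before transport so the agent can save its state. */
    void sync() {
        data.put("progress", Integer.toString(progress));
    }

    /** Called on the receiving node: restart from the top, reloading saved state. */
    void resume() {
        progress = Integer.parseInt(data.getOrDefault("progress", "0"));
    }
}

class Transport {
    /** Marshals the agent as it would be written to a socket. */
    static byte[] marshal(SketchAgent agent, boolean withState) throws IOException {
        if (withState) {
            agent.sync(); // second option: carry the current state along
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(agent);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the agent on the receiving side and lets it reload its state. */
    static SketchAgent unmarshal(byte[] wire) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            SketchAgent agent = (SketchAgent) in.readObject();
            agent.resume();
            return agent;
        }
    }
}
```

With `withState` set to false the copy starts from whatever the data object last held, independent of the parent's live state; with it set to true the copy starts from the state saved by `sync()`.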
The first method could be used, for example, to upgrade a common utility agent such as a global find file. Such an agent does not require knowledge of any execution node for its own operation and therefore can be transferred without updating its cached Execution State.

The second method is used by the Agent Interface (the AXE agent API) to send running agents from one node to another. The replicated agents will resume execution at the start of their code block, but their data will be preserved in the state saved by the sync() method. It is possible for a restarted agent to return to its pre-transport execution state by efficient use of its Data Object.

Under the current system design, it is the responsibility of the agent to update and store its own state by calling its setData() method before it is replicated. Each node of the execution environment contains a setData() method for the agent to complete this task if it so desires.

Caching

The level and complexity of the caching system used by each node is completely under the control of the node administrator. At the minimum level, all agents that are executed on a node should be placed into the node's agent cache. This allows the node to start and stop an agent as needed without having to download the agent from the network each time. At this level, the administrator can decide to allow only hand-picked agents to run on the node, and for the node to refuse replication requests from other nodes. At the most complex level, the node accepts all replication requests from other nodes, and caches any agent that is sent to it, independent of the agents executing on the node. Agents are cached regardless of whether or not they are ever executed.

Security Manager

The first security issue to address is which agents will be permitted by a node. Administrators should be able to decide whether to accept anonymous agents and choose in particular which agents to accept, while rejecting others.
The use of the Agent's Security/Authentication Object eases this task greatly and also allows it to be automated.

Sandbox

This environment is designed such that administrators at individual nodes have the ability to configure a default security policy for access to selected resources by anonymous agents. Anonymous agents are agents that are not known by the local server. Known agents eventually will have a custom security policy based on the administrator's level of trust for that agent. In the current implementation, all agents are treated as Anonymous. The security policy provides access control to:

Network Resources
  - Accept socket connections
  - Open socket connections
  - Listen for a network connection
  - Use IP multicast
  - Set the socket factory

Local File System
  - Delete files
  - Read files
  - Write files

Process Control
  - Modify thread arguments
  - Modify thread group arguments
  - Know about the thread group for new threads
  - Create subprocesses

System Resources
  - Use the printer
  - Set system properties
  - Access the clipboard
  - Bring windows to the foreground

Java Specific
  - Dynamically load and link code libraries
  - Access the AWT event queue
  - Manipulate class loaders
  - Halt Java VM
  - Access members
  - Access Java packages
  - Define classes in packages
  - Use the Security API
  - Examine the stack depth of a class

Authentication

Upon receipt of an Agent, the Agent eXecution Environment must perform a number of functions to authenticate that agent. Fundamentally, the two primary authentication requirements are: knowledge of where the agent came from, and assurance that the agent code is not modified from the known version.

The AXE includes a public key infrastructure such that each node has a unique public/private key pair and each instance of an agent has the option of having a public/private key pair. Ideally, outgoing agents would be signed, and incoming agents would be verified by checking the signature. This functionality would be implemented at the node and could not be altered by an agent.
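The sign-on-send, verify-on-receive idea can be sketched with the standard `java.security` classes. The report does not name a signature algorithm or key size, so the use of RSA with SHA-256 and 2048-bit keys here is an assumption, as is the class name `AgentSigner`; the agent is represented simply by its serialized bytes.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class AgentSigner {
    // Algorithm choice is an assumption; the report does not specify one.
    private static final String ALGORITHM = "SHA256withRSA";

    /** Generates a public/private key pair such as a node might hold. */
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Sending node: sign the serialized agent with the private key. */
    static byte[] sign(byte[] agentBytes, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance(ALGORITHM);
        signer.initSign(key);
        signer.update(agentBytes);
        return signer.sign();
    }

    /** Receiving node: check the signature against the sender's public key. */
    static boolean verify(byte[] agentBytes, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance(ALGORITHM);
        verifier.initVerify(key);
        verifier.update(agentBytes);
        return verifier.verify(signature);
    }
}
```

A receiving node would reject any agent for which `verify` returns false, since either the bytes changed in transit or the signature was not produced by the claimed sender.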
Signature checking of this form proves the true source of an agent, and that the agent was not modified in transit.

In order to assure that agent code is not modified from a known version by a malicious node, a built-in Java one-way hash function is used. Hashes of running agents are stored on every node. Upon receipt of an agent, a hash of the agent code is computed and compared with the hash stored on the node. To reduce the chance of man-in-the-middle attacks, this comparison could be encrypted using the public key infrastructure in place.

Encryption

With a public key infrastructure already implemented, we could implement the option of encrypting all data transmitted between servers. This would add significant processing and network overhead, and might be best implemented as an optional AXE service which agents could take advantage of if necessary.

Shared Distributed State

Each node stores a list of names, IP addresses, and public keys for the other primary nodes of the AXE. Each node is able to:
  - List all machine names in the AXE
  - Return the most idle node in the AXE (least number of agents, but ideally the largest number of free cycles)
  - Return the public key for a given machine name
  - Return the public key or hash for a given agent
  - Verify a signed piece of data

Removing Nodes from the AXE

Nodes can be removed from the AXE on demand or in response to an error. Choosing "Exit" allows a node to gracefully remove itself from the AXE. Additionally, if any errors are encountered communicating with an existing member of the AXE, the member who discovers the error notifies all other members and each node removes the offending member from the AXE.

Chapter 3
Detailed Design and Implementation
Problem areas, tradeoffs, and design decisions

RMI vs. Object Serialization vs. Applets

Java has a well-developed mechanism for running untrusted code, called the Applet class. Existing Java Virtual Machines (JVMs) already implement a sandbox for this class.
The advantage of using applets for our agents is that we could exploit the existing sandbox. The disadvantage of using applets for agents is that we have limited control over the existing sandbox.

For the development of the AXE, Ambrosia chose to use Object Serialization. "java.io.ObjectOutputStream" marshals objects for sending over a socket. "java.io.ObjectInputStream" unmarshals the stream into an object again. Java's Remote Method Invocation also has facilities to load a class locally. These classes provide the foundation for building a very rich execution environment, although they are at a lower level than applets. In combination with the Agent Security Manager, the advantages of the Applet approach are achieved without the disadvantages.

Additionally, RMI is used for maintaining state across all nodes. After initially implementing agent transport with standard object serialization, it was realized that RMI could have been used for this process. This would have given the uniformity of RMI for all network communication. Strict object serialization has a small performance advantage over RMI. If the AXE were to be re-engineered, RMI would be used for agent transport as well as maintaining shared state.

Primary Backup vs. Shared State

There were two design choices for implementing the shared state of the AXE. One approach was to use a central server with one or more primary backup servers. This is efficient for updates, but limited in backup capabilities to the number of backup machines. Another approach was a shared state system where each node maintains an identical data structure with all pertinent AXE information. This is more processor and network intensive, but maintains higher availability and better matches the peer-focused agent environment.

The shared state system was decided to be the best implementation for the AXE. This turned out to be simpler to implement since there is no primary server or list of backup servers to keep track of or allocate.
It also simplifies adding new nodes to the system and controlling updates.

Scalability

The downside of choosing the shared state approach is that it does not scale well. Each node locks down the entire AXE every time it joins the AXE, exits the AXE, launches an agent, or terminates an agent. With hundreds of nodes, this would result in excessive network traffic and reduced processing time due to the locks.

Ideally, in future implementations a method of linking multiple AXEs together without sharing state between different AXEs would be developed. This would allow agents to be authenticated and transported between AXEs based on geography, network topology, or human resources.

Agent Security

The AXE overrides the Java Security Manager in order to implement the above-mentioned security model. The Security Policy for a node is defined by a Privilege Setting List. This list defines the agent security context for the node by containing information on each aspect of the Agent Security Manager. When the Agent Security Manager receives a specific security check request, the appropriate context within the Privilege List is checked. If the Privilege Setting allows the action that prompted the security check, then the check is a success and the calling thread is allowed to continue executing.

There is one major difficulty when dealing with the Java Security Manager: there can only be one Security Manager per Java Virtual Machine. To explain this, take into account the above-mentioned security implementation and the following scenario. An administrator identifies an agent that is abusing the node's networking capabilities. Instead of killing the agent, the administrator changes the default security policy so agents can no longer accept socket connections. The difficulty is that with the general JVM, this now means that the AXE can no longer accept socket connections either. This problem is solved by authentication.
If a security check fails the initial check by the Privilege Setting List, then the Agent Security Manager checks the authentication level of the calling thread. Where an agent may have restricted system access, the threads involved in running the AXE do not. This authentication is accomplished in two ways:

Thread Groups: Within the node each agent is executed as a thread. When the thread is created it is assigned to the Agent Thread Group. Part of the Security Authentication Process is to check the Thread Group of the calling thread. If the Thread Group is the Agent Thread Group, access is denied.

Execution Stack: The Security Manager has the ability to access the Class Execution Stack of the Java Virtual Machine. Authentication is also accomplished by searching the class names on the stack for a name containing the "agent" keyword. If such a match occurs, then an agent was the source of the security check and access is denied. This requires each agent to have the keyword within its class name. Since all agents are verified by the Node before they are executed, it is possible to deny access to agents that do not comply with the naming requirement.

Both authentication processes are used in the current implementation of the Agent Security Manager. Although each is reasonably safe on its own, the double level of security is safer than either alone.

There is another advantage to overriding the Java Security Manager. The system can maintain statistical information about the execution of an agent external to the log. This information can then be displayed to the User/Administrator in a simple quick reference format. Since each agent is given an Agent Control Block to maintain information on the agent as it executes, it is possible for resource tracking to be accomplished. This is done by maintaining flags within the Agent Control Block for each type of system resource.
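The two authentication checks described above (thread group and execution stack) can be sketched without a security manager at all. The class `OriginCheck` and its method names are hypothetical, and the stack check inspects a `StackTraceElement` array rather than the class execution stack a security manager would see, so this only approximates the mechanism.

```java
import java.util.Locale;

class OriginCheck {
    /** True if the thread belongs to agentGroup or to one of its subgroups. */
    static boolean inAgentGroup(Thread thread, ThreadGroup agentGroup) {
        ThreadGroup group = thread.getThreadGroup(); // null once the thread has terminated
        return group != null && agentGroup.parentOf(group);
    }

    /** True if any class name on the given stack contains the "agent" keyword. */
    static boolean agentOnStack(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().toLowerCase(Locale.ROOT).contains("agent")) {
                return true;
            }
        }
        return false;
    }

    /** Deny when either check attributes the request to an agent. */
    static boolean shouldDeny(Thread caller, ThreadGroup agentGroup) {
        return inAgentGroup(caller, agentGroup) || agentOnStack(caller.getStackTrace());
    }
}
```

Note that the helper class itself must not have "agent" in its name, or its own frames would match the keyword search whenever the calling thread's stack is inspected; the same constraint applies to any AXE class that may appear on the stack of a trusted thread.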
When the security manager determines that an agent is making a File System call, the File System flag is set to true. This is advantageous to the Administrator since it allows the Administrator to determine which agents are accessing the various aspects of the system without having to resort to the master log.

Tracking statistical data for each agent also allows the AXE to identify the Stability of the agent. Stability is classified as being one of the following three settings: Stable, Unstable, and Hostile. A Stable agent is one that is executing without error. An Unstable agent is one that has made at least one violation of the node's Security Policy. Such an agent is more likely to make further violations as well. The Unstable setting flags this to the Administrator. The final setting is the Hostile setting. An agent is deemed Hostile if it has made multiple general security violations, and/or has tried to access parts of the node system that the Administrator has decided to "watch". For example, suppose an Administrator decides that agents should not access the Java System for the Node. An agent that tries to access the Java System would be deemed Hostile, even though it may not have made the multiple security violations required to be generally defined as a Hostile Agent.

There is only one major way to improve upon the Agent Security Manager. With the addition of public key agent verification, it would be possible to have multiple privilege levels as opposed to the single level currently implemented. This would also be easy to implement. Each privilege level would require its own Privilege Setting List, and the code for this already exists. Additional logic would need to be added to the Authentication method. This logic would be responsible for mapping an agent to the relevant Privilege Setting List based on its hashcode and AXEKey.
A better implementation would be to add a reference to the Privilege Setting List to the Agent Control Block when the agent is initially added to the node. When authentication is performed, the Authentication function would merely need to extract this reference from the Agent Control Block for the current agent.

The Graphical User Interface

The Graphical User Interface of the AXE has been designed to give the maximum amount of information to the user/administrator quickly. This way the administrator can react quickly when a problem arises. For this reason the AXE GUI contains an Agent List, Active Node List, and the Recent Log. The purposes of these elements have been introduced above. The Recent Log was discussed in the Logging section. The main object in the GUI is the Agent List. This lists all agents that are currently on the node. In addition, the statistical data collected within the security manager is also displayed here.

The only improvement currently under consideration is to have the Active AXE Node list control the main display. Thus the GUI could be used to administer any node of the AXE.

Chapter 4
Agents
Specific Agent Implementations and Ideas

Distributed Processing

We implemented a distributed processing agent to show the increase in performance that harnessing multiple CPUs for a calculation delivers. In order to distribute a computation, one must first devise a way to break the computation into independent parts that can be processed individually and return results that can be easily combined to yield the final result. The calculation chosen was numerical integration, because the integral can be broken into many sections that can later be summed.

[Figures 4.1 and 4.2: plots over the range x = 0.0 to x = 2.0.]

This follows the process of Riemann sums. Riemann sums work on the principle that an integral can be represented as a sum of rectangles of a height equal to the function at the rectangle's location.
The Riemann sum process can introduce considerable error, depending on the width of the rectangles used. Rectangle width is varied depending on the slope of the function to reduce error. The function used by the agent is f(x) = x^2. This method of approximation introduces some error where the corners of the rectangle fall below or above the true area, as shown in figure 4.3. If we assume the function is monotonic, then the worst case scenario can be calculated, where the function has a discontinuity immediately after the lower bound, as shown in figure 4.4. This comes out to a maximum error of (f(max) - f(min)) * (max - min).

[Figure 4.3: rectangle approximation over x = 0 to x = 2.0, showing the calculated area and the error regions.]
[Figure 4.4: worst-case error over x = 0 to x = 2.0.]

The distributed processing agent is executed by running a master application written in Java. It creates an agent interface, an agent, and a special data object used by the distributed processing agent. It initializes the data with zero for a lower bound and its local host as the site to send results to. The master sets the upper bound and threshold according to the command line parameters. Once the data object is initialized, the agent and its data are sent to the AXE on the local host.

When the distributed processing agent first starts executing, it calculates the worst case error and checks to see if it is less than the threshold value. If the error is greater than the threshold value, then the agent sends itself to the next two hosts and terminates. The first copy of itself is sent with bounds equal to the parent's lower bound and mean. The second copy of itself is sent with bounds equal to the parent's mean and upper bound. In this way, it splits the area to be summed into two smaller areas.

The initial design called for an agent to tell the master that it was splitting. The master would then reserve space for the children and wait for their results.
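The error bound and the splitting rule described above can be sketched as a local recursion, with each recursive call standing in for a copy of the agent sent to another host. The class `SplitIntegrator` is hypothetical, and taking the rectangle height at the lower bound is an assumption, since the report does not say where in the range the height is sampled.

```java
import java.util.function.DoubleUnaryOperator;

class SplitIntegrator {
    /** Worst-case error bound from the text: (f(max) - f(min)) * (max - min). */
    static double worstCaseError(DoubleUnaryOperator f, double min, double max) {
        return (f.applyAsDouble(max) - f.applyAsDouble(min)) * (max - min);
    }

    /**
     * Integrates a monotonically increasing f over [min, max]. A range whose
     * error bound exceeds the threshold is split at its mean, mirroring the
     * agent sending two copies of itself with narrower bounds.
     */
    static double integrate(DoubleUnaryOperator f, double min, double max, double threshold) {
        if (worstCaseError(f, min, max) <= threshold) {
            // One rectangle; the height is taken at the lower bound (assumption).
            return f.applyAsDouble(min) * (max - min);
        }
        double mean = (min + max) / 2.0;
        return integrate(f, min, mean, threshold) + integrate(f, mean, max, threshold);
    }
}
```

For f(x) = x^2 on the range 0 to 2 the bound for the whole range is (4 - 0) * (2 - 0) = 8, so any threshold below 8 forces at least one split. The threshold applies to each leaf range separately, so the total error can be as large as the threshold times the number of leaves.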
That design made resending an agent that timed out simple, since the master only needed to check the list for children that had not reported in. The design was flawed, however, because the children's results often preceded the parent's notification that it was splitting. As a result the child's response would be lost and the parent's splitting notice would create two empty entries.

This was fixed by modifying the protocol to be time independent. Instead of trying to predict what messages were expected, we sent only partial sums with boundaries. The master is then able to compare the boundaries of the sums that it has so far to determine what ranges need to be recalculated. To assist this process the master creates a binary tree of results. Each result has a value for the sum, lower and upper bounds, and a flag indicating whether it is partially or totally complete.

[Figure: binary tree of partial results. The root covers 0.0-2.0, with children 0.0-1.0 and 1.0-2.0; 1.0-2.0 is divided into 1.0-1.5 and 1.5-2.0, which are divided in turn into 1.0-1.25, 1.25-1.5, 1.5-1.75, and 1.75-2.0.]

Fault tolerance is implemented by using Java's ServerSocket.setSoTimeout() to specify the number of milliseconds for the master to wait between agents reporting partial sums. Then we catch the java.io.InterruptedIOException and send out a new agent to fill in the empty nodes in the tree.

FindFile

The purpose of the Find File agent is to demonstrate the ability of the Agent eXecution Environment to share global resources. The resource in this case is long term storage media. The agent gives the client user the ability to search for data concurrently on multiple execution nodes, and then retrieve that data. The agent can be viewed as being similar to the Find File utility found with Windows 95 and NT, but on a global scale as opposed to a local one. The Agent consists of three parts: the application interface dialog, the slave agent, and the file transfer agent.
The process is as follows:

1) The user fills in the interface dialog. The information entered can be an exact filename (foobar.doc) or a partial name with a wildcard (foo*). This data is placed within the Data Object of a slave agent. The slave agent is then replicated to all the nodes of the AXE.
2) The slave agent searches the shared directory tree of each node for a match to the criteria housed in its Data Object. The slave compiles all matches and sends them back to the interface dialog.
3) The interface dialog collates all returned data and displays it in a graphical list to the user. The user then selects the file(s) that they wish to download. At this point the interface dialog places the file name in the Data Object of a file transfer agent. The file transfer agent is then sent to the appropriate node, where it opens the file and sends it back across the network to the interface dialog.

Administrative Reporter

The administrative agent will allow one machine to monitor other machines on the network. We will attempt to track as much information as possible; however, Java turns out to be a major limitation in this area. In achieving cross-platform execution, Java gives up access to detailed system information. We would ideally like to track:

Idle CPU cycles
Free disk space
Free RAM
Network traffic (kb/s)
Currently running processes
Currently running agents
Percentage user/agent processing time

Unfortunately, the only information which is readily available from Java is:

Java VM version
Java machine architecture
Resources accessed
Number of agents running on the node

To use the Admin agent, one execution node will launch the agent, which will send a copy of itself to a single, arbitrarily chosen node. The agent will then bounce from node to node, sending status reports back to the master. The agent can be configured to do a single pass through the AXE or to keep rotating through all available nodes on a continuous basis.
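The two Java-level items in the list of readily available information can be read from standard system properties, as the following minimal sketch illustrates. The class NodeStatusSketch, its collect method, and the "agents.running" key are hypothetical names for illustration; the agent count is passed in because this report does not show how a node exposes it, and the "resources accessed" item is left out for the same reason.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical status collector; not part of the AXE code base.
class NodeStatusSketch {

    // Builds one status report from standard JVM system properties.
    // runningAgents is supplied by the caller (assumed to come from the node).
    static Map<String, String> collect(int runningAgents) {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("java.version", System.getProperty("java.version"));       // Java runtime version
        report.put("java.vm.version", System.getProperty("java.vm.version")); // VM implementation version
        report.put("os.arch", System.getProperty("os.arch"));                 // machine architecture
        report.put("agents.running", Integer.toString(runningAgents));
        return report;
    }

    public static void main(String[] args) {
        collect(0).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Values such as idle CPU cycles or the process list have no counterpart among these properties, which is the limitation described above.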
The master would watch the slaves for extreme values or known patterns. Upon detecting a possible problem, a human would be notified via email or possibly a numeric pager. Humans could check AXE status at any time by viewing a web page which summarized the current statistics.

We decided not to implement the Administrative Reporter agent for a few reasons. First, some of the functionality originally intended for Admin was brought into the node itself, such as tracking the agents running on each node. Second, Java turned out to be extremely limited in the information available about the local system.

Intrusion Response Agent

While there are any number of examples where distributed agent technology can be applied, we have elected to include a short discussion of how this technology can be applied to the network security field.

After discovery of a network intrusion, determining the full extent of the intrusion often requires a coordination effort across multiple machines and multiple sites. Among other issues, this coordination requires detailed analysis of audit information that often extends well beyond the realm of a single network. This long and painful process requires a significant amount of human effort.

Agent technology can be leveraged to solve this problem quickly and to reduce the large overhead of human effort involved in responding to network intrusions. A single agent could follow audit trail information across multiple machines and multiple networks to the extent permitted by individual nodes.

Of course, there is a whole set of other security issues introduced by applying agent technology across untrusted networks. Agent technology, a piece of mobile code performing some function in an automated fashion, could easily be applied for insidious applications. There are also a number of issues about access to system resources as well as potential denial of service attacks. Although we have thought through these problems to some extent, we will avoid getting into great detail in order to preserve the brevity of this report.

Chapter 5 Discussion
The process, the results, the experience

Features Not Implemented

There were a number of features we were not able to implement due to time constraints. They are listed in rough order of priority.

Persistent agent trail
Sign all nodes, agents, and data (code for signing developed but not deployed)
Distinction between “Allowed Agents” and “Anonymous Agents”
Individual security policies for known agents
Caching (developed but not deployed)
Encourage sharing of resources with “The Golden Rule” policy
Link together more than one AXE for scalability

Coordinating a Team of Programmers

Many of the beneficial lessons learned from this project revolved around coordinating a team of four programmers. One significant technical lesson for all of us was using CVS to maintain a shared state for our code development; however, most of our lessons came from human interaction issues. These included agreeing on meeting times, differences in schedules, differences in coding style, and differences in work habits. In retrospect, we could have improved our process by maintaining regular meeting times, providing regular status updates, and establishing milestones, goals, and deadlines for integration. Overall, we all learned some very valuable lessons from this project that we can apply the next time we work on a group project.

One tool which proved invaluable for the development of this project was a group email discussion list. Besides providing basic group communication facilities, it also archived all messages on a secure web site. This allowed us to hash out many of the details of our project.

Chapter 6 Summary
A few concluding comments

What was accomplished

This turned out to be a very interesting project for the four of us.
We were a small group relative to the others, and composed of people with very diverse backgrounds. Unlike many other groups, none of us knew each other prior to taking this class and forming our group. Needless to say, this created some obstacles to teamwork and communication which took us a while to overcome. The first few weeks were spent getting a feel for each team member’s capabilities, work style, and level of commitment.

Once we agreed on the autonomous agents theme, we gained excitement for the project and its possibilities. We were intrigued by the idea of a useful, cross-platform FindFile solution, as well as a cross-platform server monitoring system. Two of us had never programmed in Java before, and were excited to gain experience with it.

We were able to construct the basic Agent eXecution Environment and get it to run on three platforms: Solaris, Linux, and Windows NT. We could not get a Macintosh node up and running due to lack of support for rmiregistry. Agents can be introduced into the environment and replicate themselves to multiple nodes. All nodes are aware of all other nodes and agents in the system at all times. All agents are monitored by a node-specific security policy with fine control over system resources. Detailed logging is present at multiple levels, with a variety of ways to access and view the log.

Of the three initially planned agents, we were able to get two functional prototypes operational. DistProc is an interesting example of an agent replicating itself many times for distributed processing. FindFile is a usable implementation for network searching. The administrative reporting agent turned out to be quite limited, since most of the information we would be interested in is still platform specific at this time (idle CPU cycles, process list, system resources).

Appendix A Code Summary
The nitty-gritty

Overview

All parts of the AXE lie within the ambrosia package.
The package is divided into two sections: ambrosia.axe for code relating to the environment and nodes, and ambrosia.agent for specific agent code and generic agent support code.

ambrosia.axe

ambrosia.axe.core
This contains fundamental code for the AXE.

ambrosia.axe.gui
This contains all of the code for the AXE graphical user interface.

ambrosia.axe.server
This contains the RMI stubs and other communication code.

ambrosia.axe.util
This contains utilities and tools used by the server, including authentication tools.

ambrosia.agent

ambrosia.agent.(agent)
This is where specific agents must be located. For security reasons, only agents with “ambrosia.agent” in their package name will be allowed to execute on the AXE.

ambrosia.agent.core
This contains fundamental code required by all agents.

ambrosia.agent.sample
This is a sample agent useful for testing purposes.

ambrosia.agent.util
This includes utilities and tools which would be useful to agent developers.
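The package-name restriction on agents noted above can be illustrated with a short check. This is only a sketch under stated assumptions: the class AgentPackageCheckSketch and the method isAllowed are hypothetical, and the check is assumed to be a prefix test on the fully qualified class name, which is stricter than merely finding “ambrosia.agent” somewhere in the name. The actual AXE check is not shown in this report.

```java
// Hypothetical admission check; the real AXE implementation may differ.
class AgentPackageCheckSketch {

    // Assumption: agents must live under this package prefix.
    static final String REQUIRED_PREFIX = "ambrosia.agent.";

    // Returns true only when the fully qualified class name starts with the prefix.
    static boolean isAllowed(String className) {
        return className != null && className.startsWith(REQUIRED_PREFIX);
    }

    public static void main(String[] args) {
        // SampleAgent and Intruder are made-up class names for illustration.
        System.out.println(isAllowed("ambrosia.agent.sample.SampleAgent"));
        System.out.println(isAllowed("evil.ambrosia.agent.Intruder"));
    }
}
```

Including the trailing dot in the prefix keeps a look-alike package such as ambrosia.agentx from passing the test.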