Ambrosia Autonomous Agent Group
(AAAG)
by
Jed Pickel, John Huebner, Robert Dean, Joshua Baer
System Design Document
Sunday, February 15, 1998
1. Introduction
   1.1 Purpose
   1.2 Overview
   1.3 Terms
2. Agent Execution Environment
   2.1 Overall Design
   2.2 Agent Transportation
       2.2.1 The Agent as an Object
       2.2.2 Agent Transport and Replication
       2.2.3 Caching
   2.3 Security Model
   2.4 Sandbox
   2.5 Authentication
   2.6 Encryption
   2.7 Location and Authentication Agent (LAA)
       2.7.1 Primary Name/Key Server
       2.7.2 Backup Name/Key Server
3. Agents
   3.1 Find File
       Security Issues
   3.2 Administrative
   3.3 Distributed Processing
4. Unresolved Design Issues
   4.1 Object Serialization vs. Applets
   4.2 Is static data part of the code object or the data object?
   4.3 Will nodes use network broadcasts to find administrative servers?
1. Introduction
1.1 Purpose
Ambrosia will develop an agent execution environment and prove that it has practical use by implementing
several agents. An Agent Execution Environment (AEE) is more flexible than a client/server design model
because it allows arbitrary code to be run on the remote machine. An agent system can localize
computations near data.
For example, a traditional client/server implementation of a find-file mechanism would require a server
on every machine whose files were to be searched. Each server would do nothing but wait for find-file
requests; it would serve no other purpose. In a more flexible design, the server might provide remote
directory operations as a service, allowing the client to implement find file by requesting all the
available directory structures and then, once it had located the file, requesting the file itself. The
client would be more efficient if it made a set of concurrent requests for directories, but this would
also make the client more complex.
In contrast, an agent system could allow the browsing of directories to occur on the same machine that held
the disk, while copies of the agent searched the disks of other machines. This would improve latency and
reduce network bandwidth use. Furthermore, the algorithm that the client used could be tailored to the type
of search being performed. In fact, the Execution Node server could be used for arbitrary agents.
1.2 Overview
An agent system offers high availability and fault tolerance using a fail-stop model. Availability is
increased because a user can obtain an agent from multiple sources and execute it on multiple hosts. If the
node an agent is heading for fails, the node that it was departing from can re-invoke it or re-direct the
replication to another node. If the semantics are designed to allow multiple copies of an agent, and/or agent
cooperation, then the state machine method of active replication is easily supported. If an agent sends a
copy of itself to another node and does not hear a timely reply, it can re-direct that task to another node.
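The re-direction behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class FailoverDispatch, the method dispatch, and the Predicate standing in for "send the agent and wait for a timely reply" are hypothetical names that do not come from the design.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: try candidate nodes in order until one acknowledges the agent.
class FailoverDispatch {
    // 'send' stands in for "transmit the agent and wait for a timely reply".
    // It returns false when the node has failed or the reply timed out (fail-stop).
    static Optional<String> dispatch(List<String> nodes, Predicate<String> send) {
        for (String node : nodes) {
            if (send.test(node)) {
                return Optional.of(node);   // this node accepted the agent
            }
            // Otherwise fall through and re-direct the task to the next candidate.
        }
        return Optional.empty();            // every candidate failed
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("nodeA", "nodeB", "nodeC");
        // Simulate nodeA being down: only the other nodes answer in time.
        Optional<String> chosen = dispatch(nodes, n -> !n.equals("nodeA"));
        System.out.println(chosen.orElse("no node available"));
    }
}
```

A real node would replace the predicate with a network send guarded by a timeout; the loop structure is the only point being illustrated.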
Performance can be increased by using long-term caching. An execution node saves the code and static
data of an agent so that they need not be sent to the node the next time the agent is invoked.
1.3 Terms
Agent: A self-contained execution including code, data, and security.
Security Object: The portion of an agent containing the Audit Trail and Public Key (Private?).
Agent Execution Environment (AEE): The distributed collection of applications that accept, execute, and
transmit agents. It mediates between the agent and the operating system to acquire resources for the agent.
Execution Node: A single application in the execution environment. It will only send agents that it
will execute itself.
Data Object: The portion of an agent containing the state of an agent: the variables needed by an agent
to run. Note: Constants are stored in the code object at compilation time.
Administrative Server (“server”): The application that keeps track of participating nodes and
authenticates agents.
Code Object: The portion of an agent that contains the execution instructions. It also includes static
data, since static data shares its properties of being read-only and fixed after compilation.
Sandbox: The Execution Node system that controls an agent’s access to resources.
Resources: CPU cycles, disk space, memory, network, display.
Short Term Caching: The execution node storing the entire agent to disk, temporarily, for use in
transmitting or re-transmitting to another node.
Long Term Caching: The execution node storing the code and static data of an agent between invocations,
to increase performance the next time that agent is invoked.
2. Agent Execution Environment
2.1 Overall Design
An Agent Execution Environment (AEE) is best defined as a distributed collection of execution nodes. An
execution node is a machine that provides, as a service, the ability to accept and execute objects (agents)
from the network.
An agent is a combination of code, data, and log information that has the ability to travel through the
Agent Execution Environment, under the constraints imposed by individual execution nodes. Agent transport
is covered in detail later in this paper.
An agent can only be introduced into the system from a node which itself is participating in the AEE. A
node will only have permission to introduce agents which would have permission to execute on the local
node. This mechanism will provide incentive to permit local resources to be devoted to the AEE.
Activity between nodes within the Agent Execution Environment will be coordinated by an administrative
server. The purpose of this server is to maintain the state of the AEE. This server will: keep a list of all
functioning nodes, act as a key server to store public keys for agents and nodes, function as a trusted
source for obtaining agents, and let nodes query for the trusted checksums or hashes of known agents. The
server will be implemented as an agent, which removes the requirement of being bound to an individual
node. A separate server is planned for each network segment. This will allow nodes to locate their local
server by broadcasting. Local servers will have the ability to communicate with servers in other networks,
so that the system will scale. Details of this server are described later in this document.
The administrator at each individual node has the ability to configure that environment according to their
local policies and procedures. Some of the configuration options are a default security policy (for unknown
agents), and the option of configuring individual security policies for known agents.
Details of the security model are included below.
The distributed nature of this project takes place at two levels. The AEE itself is a distributed system that
must maintain state, availability, and security. On top of that, the AEE provides a framework for individual
agents to build their own distributed systems.
Each node will be multithreaded, have the ability to process multiple agents simultaneously, implement a
GUI for configuration, and maintain its own audit trail. Any change of state will be recorded in the audit
trail.
2.2 Agent Transportation
Agent transport is one of the major elements of the Agent Execution Environment. The transportation
system is what allows an agent to be sent around the network from node to node. There are three main
parts to the transport system design: the agent object, the agent replication process, and agent caching.
2.2.1 The Agent as an Object
The agent object has three main objects within it: the code object, the data object, and the
security/authentication object. The agent is designed in this way to limit the executable’s access to
corruptible data. This design also allows the agent to be easily transported across the network as a single
object, and gives the execution environment access to vital security information about the agent
before the agent is executed.
[Figure: the Agent Object and its parts: Code Object, Constants, Data Object, Dynamic Data, Static Data,
Security/Authentication Object]

Code Object: contains the actual Java byte code for the agent. The code object also contains constants
required for execution, but not large static data structures. This includes such items as final variables
and predefined strings. As far as the agent is concerned, this object is execute only.

Data Object: contains the current state of the agent. If an agent is to be transferred and restarted on
another execution environment at the current point of execution, all necessary data for this restart is
saved here. If no state is needed at the new location, this object will be “empty.” This object is only
available to the agent by Execution Environment system calls such as “Save_State” and
“Restore_State.” No other access is allowed to the agent. However, between these calls, a local copy
may be manipulated and later saved. The system call an agent uses to transport itself may also take a
flag that causes the state to travel with it. The data object also includes separate static data such as
graphics files, agent-specific help, and other static data which the agent may wish to bring with it.

Security/Authentication Object: contains all security data needed by the AEE for authentication and
tracking of the agent. For example, this object will contain a public key to allow for agent
authentication, and also an audit trail, which can be used to track the agent’s path through the network.
The agent, through the use of AEE system calls, can read this object without restriction, but has no
write access. The authentication portion of this object contains a checksum to ensure that the object is
intact along with version information, and the author’s name. This object is more fully explained in
the Security Model section of this document.
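The three-part layout and the access rules above might be sketched as follows. This is an illustration only, assuming a simple string map for state and a list of node names for the audit trail. AgentSketch and its method names are hypothetical; only Save_State and Restore_State are named in the design.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-part agent object; every name here is illustrative.
class AgentSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] code;                                   // code object: execute-only for the agent
    private Map<String, String> savedState = new HashMap<>();    // data object
    private final List<String> auditTrail = new ArrayList<>();   // part of the security object

    AgentSketch(byte[] code) {
        this.code = code.clone();
    }

    // Save_State: the agent hands the environment a copy of its working state.
    void saveState(Map<String, String> workingCopy) {
        savedState = new HashMap<>(workingCopy);
    }

    // Restore_State: the agent receives a local copy that it may manipulate freely.
    Map<String, String> restoreState() {
        return new HashMap<>(savedState);
    }

    // Only the execution environment appends to the audit trail.
    void recordHop(String nodeName) {
        auditTrail.add(nodeName);
    }

    // The agent may read the security information but has no write access.
    List<String> auditTrail() {
        return Collections.unmodifiableList(auditTrail);
    }

    int codeLength() {
        return code.length;
    }

    public static void main(String[] args) {
        AgentSketch agent = new AgentSketch(new byte[] {1, 2, 3});
        Map<String, String> working = new HashMap<>();
        working.put("lastDirectory", "/shared/docs");
        agent.saveState(working);
        agent.recordHop("nodeA");
        System.out.println(agent.restoreState() + " " + agent.auditTrail());
    }
}
```

Copying on both save and restore is one way to honour the rule that the agent only ever manipulates a local copy between the two calls.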
2.2.2 Agent Transport and Replication
The Agent Execution Environment will support two methods of agent transport and replication. The first of
these methods is manual control by the user. For example, the user will be able to contact the primary
server and request a specific agent from its long-term cache. The primary server will then update the
agent’s audit trail and send a copy of the agent to the user’s client. Each client will also have the ability to
send an agent directly to another execution node of the AEE.
The second method of agent transport is Agent Replication. This is the process whereby an agent will
send a copy of itself to one or more execution nodes. The agent achieves this using execution node system
calls. Currently, there are two options for agent replication. The first is a straight transfer of the agent
currently residing in the client’s cache. This means that when the agent is executed on the new node, the
execution will be independent of the parent agent’s current state at time of transfer. The second transfer
method is where the agent requests that its current state be sent along with the cached agent to the new
node. In this method, when the agent is executed on the new node its state will be the same as the parent’s
state, and the two instances of the agent will be indistinguishable. The first method could be used, for
example, to upgrade a common utility agent such as a global find file. Such an agent does not require
knowledge of any execution node for its own operation and therefore can be transferred without updating
its stored Execution State. The second method could be useful for a system monitor agent that is
gathering data on the system as a whole. When the monitor leaves a node it would require the ability to
take whatever data it collects with it.
Under the current system design, it is the responsibility of the agent to update and store its own state
before it is replicated. Each node of the execution environment will contain whatever system calls are
necessary for the agent to complete this task if it so desires.
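The two replication options can be illustrated with a small sketch. The names ReplicationSketch, Replica, and replicate, and the boolean flag, are assumptions made for illustration; the design does not yet specify the actual system call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two replication options described in this section.
class ReplicationSketch {
    // A replica is the cached agent plus whatever state travels with it.
    static final class Replica {
        final String agentName;
        final Map<String, String> state;

        Replica(String agentName, Map<String, String> state) {
            this.agentName = agentName;
            this.state = state;
        }
    }

    // withState == false: straight transfer of the cached agent; the copy starts with empty state.
    // withState == true:  the agent's last saved state is copied and travels with it.
    static Replica replicate(String agentName, Map<String, String> savedState, boolean withState) {
        Map<String, String> state = withState ? new HashMap<>(savedState) : new HashMap<>();
        return new Replica(agentName, state);
    }

    public static void main(String[] args) {
        Map<String, String> saved = new HashMap<>();
        saved.put("samplesCollected", "17");
        System.out.println(replicate("findFile", saved, false).state);   // fresh copy
        System.out.println(replicate("monitor", saved, true).state);     // state travels along
    }
}
```

Because the state is whatever the agent last saved, an agent that wants its current state to travel has to save it before requesting replication, as the paragraph above notes.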
2.2.3 Caching
The level and complexity of the caching system used by each node is completely at the control of the node
administrator. At the minimum level, all agents that are executed on a node are placed into the node’s
agent cache. This allows the node to start and stop an agent as needed without having to download the
agent from the network each time. At this level, the administrator can decide to only allow handpicked
agents to run on the node, and for the node to refuse replication requests from other nodes. At the most
complex level, the node accepts all replication requests from other nodes, and caches any agent that is sent
to it independent of the agents executing on the node. Agents are cached regardless of whether or not they
are ever executed.
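The two ends of the caching spectrum could be modelled as follows. This is a sketch under assumptions: the policy names LOCAL_ONLY and ACCEPT_ALL and the class AgentCacheSketch are hypothetical, and intermediate levels an administrator might configure are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an execution node's agent cache under two policies.
class AgentCacheSketch {
    enum Policy {
        LOCAL_ONLY,   // only agents installed by the administrator; replication requests refused
        ACCEPT_ALL    // every agent sent by another node is cached, whether or not it ever runs
    }

    private final Policy policy;
    private final Map<String, byte[]> cache = new HashMap<>();

    AgentCacheSketch(Policy policy) {
        this.policy = policy;
    }

    // The administrator hand-picks an agent; this is always allowed.
    void install(String agentName, byte[] agentBytes) {
        cache.put(agentName, agentBytes.clone());
    }

    // Another node asks to replicate an agent here; returns true if it was cached.
    boolean offerFromNetwork(String agentName, byte[] agentBytes) {
        if (policy == Policy.LOCAL_ONLY) {
            return false;
        }
        cache.put(agentName, agentBytes.clone());
        return true;
    }

    boolean isCached(String agentName) {
        return cache.containsKey(agentName);
    }

    public static void main(String[] args) {
        AgentCacheSketch strict = new AgentCacheSketch(Policy.LOCAL_ONLY);
        strict.install("findFile", new byte[] {1});
        System.out.println(strict.offerFromNetwork("monitor", new byte[] {2}));
        System.out.println(strict.isCached("findFile"));
    }
}
```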
2.3 Security Model
Designed without security in mind, an Agent Execution Environment is simply a server: it has a listening
socket that will accept connections from arbitrary hosts, then download and execute arbitrary code. A
significant portion of this project therefore deals with the tradeoff between security and usability.
The first security issue to address is which agents will be permitted by a node. Administrators can decide
whether to accept anonymous agents and can choose particular agents to accept, while rejecting others.
2.4 Sandbox
This environment is designed such that administrators at individual nodes have the ability to configure a
default security policy for access to selected resources by the anonymous agents. Anonymous agents are
agents that are not known by the local server. Known agents, on the other hand, each have a custom
security policy based on the administrator’s level of trust for that agent.
The security policy will provide access control to:
• file system
• network
  - initiate outgoing connections
  - accept incoming connections
• interaction with other locally executing agents
• memory
• processing time
• other local resources (to be determined)
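A default policy for anonymous agents plus per-agent overrides for known agents might look like the sketch below. The Resource names mirror the list above, but the class SandboxPolicySketch and its methods are hypothetical, and quantitative limits (how much memory or processing time) are not modelled.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-agent access control; all names are illustrative.
class SandboxPolicySketch {
    enum Resource { FILE_SYSTEM, NET_CONNECT, NET_ACCEPT, AGENT_INTERACTION, MEMORY, CPU_TIME }

    private final EnumSet<Resource> defaultPolicy;   // applies to anonymous agents
    private final Map<String, EnumSet<Resource>> knownAgents = new HashMap<>();

    SandboxPolicySketch(EnumSet<Resource> defaultPolicy) {
        this.defaultPolicy = EnumSet.copyOf(defaultPolicy);
    }

    // The administrator assigns a custom policy to a known agent.
    void trust(String agentName, EnumSet<Resource> allowed) {
        knownAgents.put(agentName, EnumSet.copyOf(allowed));
    }

    // Known agents use their own policy; everything else falls back to the default.
    boolean isAllowed(String agentName, Resource resource) {
        return knownAgents.getOrDefault(agentName, defaultPolicy).contains(resource);
    }

    public static void main(String[] args) {
        SandboxPolicySketch policy =
                new SandboxPolicySketch(EnumSet.of(Resource.MEMORY, Resource.CPU_TIME));
        policy.trust("findFile", EnumSet.of(Resource.FILE_SYSTEM, Resource.NET_CONNECT));
        System.out.println(policy.isAllowed("findFile", Resource.FILE_SYSTEM));
        System.out.println(policy.isAllowed("stranger", Resource.FILE_SYSTEM));
    }
}
```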
2.5 Authentication
Upon receipt of an Agent, the Agent Execution Environment (AEE) must perform a number of functions to
authenticate that agent. Fundamentally, the two primary authentication requirements are: knowledge of
where the agent came from, and assurance that the agent code is not modified from the known version.
The server will include a public key infrastructure such that each node has a unique public/private key pair
and each instance of an agent has the option of having a public/private key pair. Outgoing agents will be
signed, and incoming agents will be verified by checking the signature. This functionality will be
implemented at the node and cannot be altered by an agent. This form of authentication proves the true
source of an agent, and that the agent was not modified in transit.
In order to assure that agent code is not modified from a known version by a malicious node, a one-way
hash mechanism will be used. Hashes of known versions of agent code will be stored on the server. Upon
receipt of an agent, a hash of the agent code will be computed and compared with the hash stored on the
server. The one-way hash function will be implemented with MD5; Blowfish, the other candidate considered,
is a block cipher rather than a hash function, so it could only serve to encrypt the exchange. To reduce
the chance of man-in-the-middle attacks, this comparison will have to be encrypted using the public key
infrastructure in place.
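The two checks described in this section, a signature on the agent and a hash comparison against a trusted value, can be sketched with the standard java.security classes. The choice of RSA keys and the "SHA256withRSA" signature algorithm is an assumption made for illustration; the design only names MD5 for the hash. AgentAuthSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical sketch: sign outgoing agents, verify incoming ones, compare code hashes.
class AgentAuthSketch {
    // Each node would hold one such key pair (algorithm and key size are assumptions).
    static KeyPair newNodeKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The sending node signs the agent bytes with its private key.
    static byte[] sign(byte[] agentBytes, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(agentBytes);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The receiving node checks the signature with the sender's public key.
    static boolean verify(byte[] agentBytes, byte[] signature, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(agentBytes);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // MD5 digest of the agent code, as named in the design.
    static byte[] md5(byte[] agentCode) {
        try {
            return MessageDigest.getInstance("MD5").digest(agentCode);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compare the locally computed hash with the one obtained from the server.
    static boolean matchesKnownHash(byte[] agentCode, byte[] trustedHash) {
        return MessageDigest.isEqual(md5(agentCode), trustedHash);
    }

    public static void main(String[] args) {
        KeyPair node = newNodeKeyPair();
        byte[] agent = "agent byte code".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(agent, node.getPrivate());
        System.out.println(verify(agent, sig, node.getPublic()));
        System.out.println(matchesKnownHash(agent, md5(agent)));
    }
}
```

The sketch leaves out how the trusted hash and the sender's public key reach the receiving node; in the design both come from the server.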
2.6 Encryption
With a public key infrastructure already implemented, we may implement the option of encrypting all data
when in transit between servers.
2.7 Location and Authentication Agent (LAA)
The Location and Authentication Agent (LAA) is the only agent required by the AEE. The LAA running
on each execution node can have one of three possible states:
• Primary Name/Key Server
• Backup Name/Key Server
• Name/Key Client
In a healthy AEE, there would be one primary name/key server, a few backup name/key servers, and many
name/key clients. The primary name/key server would typically serve a local network segment, although
there are no practical limitations to the AEE topology. Primary name/key servers group other primary
name/key servers and execution nodes into logical, geographic, or other groupings.
2.7.1 Primary Name/Key Server
Primary name/key servers act like folders or directories in a tree-structured file system. Each primary
name/key server stores a list of names, IP addresses, and public keys for the other primary name/key
servers and execution nodes below it in the tree. Optionally, it can define a ‘parent’ server, allowing reverse
traversal of the tree.
Primary name/key servers will typically service local network segments, for optimal performance. Also,
since machines which are physically close to one another often work together and know each other, this
will likely be the most useful scenario.
Primary name/key servers will provide a number of services and computations:
• List all machine names in AEE
• Return parent server name
• Return most idle node in AEE (largest number of free cycles)
• Return public key for given machine name
• Return public key or hash for given agent
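The node-related services listed above could be backed by a simple in-memory directory, as in the sketch below. NameKeyServerSketch, NodeRecord, and the method names are hypothetical, keys are shown as plain strings, and the agent key/hash lookup is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a primary name/key server's directory.
class NameKeyServerSketch {
    private static final class NodeRecord {
        final String ipAddress;
        final String publicKey;
        final long freeCycles;

        NodeRecord(String ipAddress, String publicKey, long freeCycles) {
            this.ipAddress = ipAddress;
            this.publicKey = publicKey;
            this.freeCycles = freeCycles;
        }
    }

    private final String parent;   // null when no 'parent' server is defined
    private final Map<String, NodeRecord> nodes = new LinkedHashMap<>();

    NameKeyServerSketch(String parent) {
        this.parent = parent;
    }

    void register(String name, String ipAddress, String publicKey, long freeCycles) {
        nodes.put(name, new NodeRecord(ipAddress, publicKey, freeCycles));
    }

    // List all machine names in the AEE.
    List<String> listMachineNames() {
        return new ArrayList<>(nodes.keySet());
    }

    // Return the parent server name, if one is defined.
    Optional<String> parentServerName() {
        return Optional.ofNullable(parent);
    }

    // Return the most idle node: the one reporting the largest number of free cycles.
    Optional<String> mostIdleNode() {
        String best = null;
        long bestCycles = Long.MIN_VALUE;
        for (Map.Entry<String, NodeRecord> e : nodes.entrySet()) {
            if (best == null || e.getValue().freeCycles > bestCycles) {
                best = e.getKey();
                bestCycles = e.getValue().freeCycles;
            }
        }
        return Optional.ofNullable(best);
    }

    // Return the public key for a given machine name.
    Optional<String> publicKeyFor(String name) {
        NodeRecord r = nodes.get(name);
        return r == null ? Optional.<String>empty() : Optional.of(r.publicKey);
    }

    public static void main(String[] args) {
        NameKeyServerSketch server = new NameKeyServerSketch("campus-root");
        server.register("lab1", "10.0.0.1", "KEY-1", 500);
        server.register("lab2", "10.0.0.2", "KEY-2", 900);
        System.out.println(server.listMachineNames() + " idle=" + server.mostIdleNode().orElse("none"));
    }
}
```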
2.7.2 Backup Name/Key Server
When an LAA is set to backup name/key server mode, it is not always actually used as an active backup
server. It is made available as a backup server from the local execution node, but before it will be used as
one it must appear in the ‘backup group’ list of a primary name server. The state information which must
be kept synchronized between the primary and backup servers consists of the public keys and agent hash
results. Cached data does not need to be synchronized; the caching mechanism will keep its data current
independently.
2.7.3 Name/Key Client
Most execution nodes on a given network segment will be name/key clients. A backup server which is not
in the ‘backup group’ list of a primary server will also function as a name/key client until it is added to a
primary server’s backup group.
Name/key clients talk to primary servers to obtain the list of all machine names in that AEE and the name
of the primary server’s ‘parent’ server, and for help in choosing execution nodes to work with. They also
get public keys and agent hash results from the primary server, for authenticating agents and machines.
3. Agents
3.1 Find File
The purpose of the Find File agent is to demonstrate the ability of the Agent Execution
Environment to share global resources. The resource in this case is long-term storage media. The agent
will give the client user the ability to search for data in parallel on multiple execution nodes, and then
retrieve that data. The agent can be viewed as being similar to the Find File utility found with Windows 95
and NT, but on a global scale as opposed to a local one. The Agent will consist of two parts: an interactive
user interface dialog, and a multi-threaded request listener. The user process is as follows:
1) The user fills in the interface dialog. The information entered can be an exact filename
(foobar.doc), or a substring of possible file names (foo*). This data is then sent to all execution nodes that
are running the Find File agent.
2) The request listeners on the execution nodes receive the search request. Each listener then
retrieves the shared directory tree from its host execution node and searches it based on the request. Any
matches found are returned to the proper interface dialog along with any information necessary for
retrieving the data over the network.
3) The interface dialog collates all return data and displays it in a graphical list to the user. The
user then selects the file(s) that they wish to download, and the agent downloads the data to the user’s
machine.
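Step 2, the search a request listener performs on its own node, might look roughly like this. The sketch uses the standard java.nio.file glob matcher as a stand-in for whatever pattern syntax the agent finally adopts; FindFileSketch, search, and createSampleTree are hypothetical names, and the network exchange of steps 1 and 3 is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a request listener searching its node's shared tree.
class FindFileSketch {
    // Returns paths, relative to the shared root, of files whose names match the
    // pattern. The pattern may be an exact name ("foobar.doc") or a glob ("foo*").
    static List<String> search(Path sharedRoot, String pattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        try (Stream<Path> walk = Files.walk(sharedRoot)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> matcher.matches(p.getFileName()))
                       .map(p -> sharedRoot.relativize(p).toString())
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small throwaway tree so that the sketch can be run on its own.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("aee-shared");
            Path docs = Files.createDirectories(root.resolve("docs"));
            Files.createFile(docs.resolve("foobar.doc"));
            Files.createFile(root.resolve("foo.txt"));
            Files.createFile(root.resolve("other.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        System.out.println(search(root, "foo*"));
    }
}
```

Restricting the walk to a shared root also matches the sandboxing approach described under Security Issues, where agents only see a user-defined tree of shared files.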
Security Issues
The main security issue is that the agent would require both read and write file system access. If
no sandboxing is performed by the execution node, then a rogue agent could corrupt the file system.
Our current sandboxing design resolves this issue by only allowing agents to access a file tree of
user-defined shared files. Thus, the user has full control over the segments of his file system that the
agent can access.
3.2 Administrative
The administrative agent will allow one machine to monitor other machines on the network. We will
attempt to track as much information as possible; however, we expect Java to be a major limitation in this
area, because in achieving cross-platform execution it often sacrifices access to detailed system
information. We will attempt to track:
• Idle CPU cycles
• Free disk space
• Free RAM
• Network traffic (kb/s)
• Currently running processes
• Currently running agents
• Percentage user/agent processing time
To use the administrative agent, one execution node will launch the agent, which will send copies of itself
to all execution nodes it is authorized to access (unless a subset is specified). Once at the ‘slave’ nodes, the
administrative agents begin sending status reports back to the ‘master’ node at regular intervals.
The master will watch the slaves for extreme values or known patterns. Upon detecting a possible problem,
a human will be notified via email or possibly numeric pager. Humans could check AEE status at any time
by viewing a web page which summarized the current statistics.
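A status report limited to what the standard Java runtime exposes might be sketched as below. It illustrates the limitation noted above: Runtime.freeMemory() reports free memory inside the JVM, not free system RAM, and several of the listed statistics have no portable equivalent. StatusReportSketch and the threshold check are hypothetical.

```java
import java.io.File;

// Hypothetical sketch of the status report a 'slave' copy might send to the 'master'.
class StatusReportSketch {
    final long freeMemoryBytes;   // free memory inside this JVM only
    final long freeDiskBytes;     // free space on the partition holding diskRoot
    final int processors;         // processors available to the JVM

    StatusReportSketch(long freeMemoryBytes, long freeDiskBytes, int processors) {
        this.freeMemoryBytes = freeMemoryBytes;
        this.freeDiskBytes = freeDiskBytes;
        this.processors = processors;
    }

    // Collect the figures that plain Java can report on any platform.
    static StatusReportSketch sample(File diskRoot) {
        Runtime rt = Runtime.getRuntime();
        return new StatusReportSketch(rt.freeMemory(), diskRoot.getFreeSpace(), rt.availableProcessors());
    }

    // The master watches for extreme values; this is one example threshold.
    boolean diskLow(long minFreeBytes) {
        return freeDiskBytes < minFreeBytes;
    }

    public static void main(String[] args) {
        StatusReportSketch report = sample(new File("."));
        System.out.println("free disk bytes: " + report.freeDiskBytes
                + ", low: " + report.diskLow(100L * 1024 * 1024));
    }
}
```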
3.3 Distributed Processing
Our planned demonstration for distributed processing is the generation of fractals. Fractals are convenient
because they are complex, iterative mathematical formulas with a high degree of locality. Because of this
locality, it is easy to separate the task into smaller tasks. The agent for fractal generation will send
itself to many nodes. Each copy of the agent will calculate a portion of the fractal using local CPU and
memory, and will then return the result to the parent, which will re-assemble it for the user who launched it.
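The split-and-reassemble idea can be tried on a single machine, with worker threads standing in for remote nodes. The sketch assumes the Mandelbrot set as the fractal and one image row as the unit of work; neither choice comes from the design, and FractalSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each worker thread plays the role of one agent copy.
class FractalSketch {
    // Iterations before z = z^2 + c leaves the radius-2 circle, capped at maxIter.
    static int escapeTime(double cr, double ci, int maxIter) {
        double zr = 0, zi = 0;
        int i = 0;
        while (i < maxIter && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = t;
            i++;
        }
        return i;
    }

    // One independent task: a single row needs no data from any other row.
    // Assumes width > 1 and height > 1.
    static int[] computeRow(int y, int width, int height, int maxIter) {
        int[] row = new int[width];
        double ci = -1.0 + 2.0 * y / (height - 1);
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / (width - 1);
            row[x] = escapeTime(cr, ci, maxIter);
        }
        return row;
    }

    // The 'parent' hands out rows and re-assembles the results in order.
    static int[][] render(int width, int height, int maxIter, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<int[]>> pending = new ArrayList<>();
            for (int y = 0; y < height; y++) {
                final int row = y;
                pending.add(pool.submit(() -> computeRow(row, width, height, maxIter)));
            }
            int[][] image = new int[height][];
            for (int y = 0; y < height; y++) {
                image[y] = pending.get(y).get();
            }
            return image;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] image = render(60, 20, 50, 4);
        for (int[] row : image) {
            StringBuilder line = new StringBuilder();
            for (int v : row) {
                line.append(v == 50 ? '#' : '.');
            }
            System.out.println(line);
        }
    }
}
```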
4. Unresolved Design Issues
4.1 Object Serialization vs. Applets
Java has a well-developed mechanism for running untrusted code, called the Applet class. Existing Java
Virtual Machines (JVMs) already implement a sandbox for this class. The advantage of using applets for
our agents is that we could exploit the existing sandbox. The disadvantage of using applets for agents is
that we have limited control over the existing sandbox.
There are other Java classes that support transporting code. “java.io.ObjectOutputStream” marshalls
objects for sending over a socket. “java.io.ObjectInputStream” unmarshalls the stream into an object again.
Java’s Remote Method Invocation also has facilities to load a class locally. These classes provide the
foundation for building a very rich execution environment, although they are at a lower level than applets.
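The serialization route can be illustrated with a round trip through the two classes named above. A byte array stands in for the socket, and SerializationSketch and AgentState are hypothetical names. Note that object serialization carries an object's state, not its byte code, so the receiving side must already have (or separately load) the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: marshal an agent's state to bytes and back again.
class SerializationSketch {
    static final class AgentState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String agentName;
        final int filesFound;

        AgentState(String agentName, int filesFound) {
            this.agentName = agentName;
            this.filesFound = filesFound;
        }
    }

    // Marshal: write the object graph into a byte array (a socket stream would work the same way).
    static byte[] marshal(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Unmarshal: rebuild the object; the class itself must be available locally.
    static Object unmarshal(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = marshal(new AgentState("findFile", 3));
        AgentState copy = (AgentState) unmarshal(wire);
        System.out.println(copy.agentName + " " + copy.filesFound);
    }
}
```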
4.2 Is static data part of the code object or the data object?
When constants or strings are part of an agent, are they stored in the data object or the code object? It
is important to ensure that static data is preserved in a Long Term Cache, while dynamic data is not.
4.3 Will nodes use network broadcasts to find administrative servers?
This seems convenient but implementation has not been explored and there may be hazards to this
approach.
4.4 Can two execution nodes communicate without a server?
Ideally, any AEE will be able to act as a name/directory server unto itself. This would allow two execution
nodes to communicate without a server to mediate the transaction. One execution node would point to the
other as its name/directory server, and the other execution node would act as a server with only those two
execution nodes on the network. If the server execution node was already part of another AEE group, it
would form a ‘virtual AEE’ with just those two execution nodes in it. No data or agent processes would be
able to transmit between the two AEEs.
One possible solution is to have the server be selected and configured automatically. A new execution
node would broadcast to the network looking for servers. If none respond, it would declare itself a primary
server.