OCTOBER 1981
LIDS-R-1158

COMMUNICATION, DATA BASES & DECISION SUPPORT

Edited By
Michael Athans
Wilbur B. Davenport, Jr.
Elizabeth R. Ducot
Robert R. Tenney

Proceedings of the Fourth MIT/ONR Workshop on
Distributed Information and Decision Systems
Motivated by Command-Control-Communication (C3) Problems

Volume III

June 15 - June 26, 1981
San Diego, California
ONR Contract No. N00014-77-C-0532
PREFACE

This volume is one of a series of four reports containing contributions from the speakers at the Fourth MIT/ONR Workshop on Distributed Information and Decision Systems Motivated by Command-Control-Communication (C3) Problems. Held from June 15 through June 26, 1981 in San Diego, California, the Workshop was supported by the Office of Naval Research under contract ONR/N00014-77-C-0532 with MIT.

The purpose of this annual Workshop is to encourage informal interactions between university, government, and industry researchers on basic issues in future military command and control problems. It is felt that the inherent complexity of the C3 system requires novel and imaginative thinking, theoretical advances and the development of new basic methodologies in order to arrive at realistic, reliable and cost-effective designs for future C3 systems. Toward these objectives, the speakers, in presenting current and future needs and work in progress, addressed the following broad topics:
1) Surveillance and Target Tracking
2) Systems Architecture and Evaluation
3) Communication, Data Bases & Decision Support
4) C3 Theory
In addition to the Workshop speakers and participants, we would like to thank Dr. Stuart Brodsky of the Office of Naval Research, and Ms. Barbara Peacock-Coady and Ms. Lisa Babine of the MIT Laboratory for Information and Decision Systems for their help in making the Workshop a success.

Cambridge, Massachusetts
October 1981

Michael Athans
Wilbur B. Davenport, Jr.
Elizabeth R. Ducot
Robert R. Tenney
COMMUNICATION, DATA BASES & DECISION SUPPORT

FOREWORD

RELIABLE BROADCAST ALGORITHMS IN COMMUNICATION NETWORKS
    Professor Adrian Segall

THE HF INTRA-TASK FORCE COMMUNICATION NETWORK DESIGN STUDY
    Drs. Dennis Baker, Jeffrey E. Wieselthier, and Anthony Ephremides

FAIRNESS IN FLOW CONTROLLED NETWORKS
    Professors Mario Gerla and Mark Staskauskas

PERFORMANCE MODELS OF DISTRIBUTED DATABASES
    Professor Victor O.-K. Li

ISSUES IN DATABASE MANAGEMENT SYSTEM COMMUNICATION
    Mr. Kuan-Tsae Huang and Professor Wilbur B. Davenport, Jr.

MEASUREMENT OF INTER-NODAL DATA BASE COMMONALITY
    Dr. David E. Corman

MULTITERMINAL RELIABILITY ANALYSIS OF DISTRIBUTED PROCESSING SYSTEMS
    Professors Aksenti Grnarov and Mario Gerla

FAULT TOLERANCE IMPLEMENTATION ISSUES USING CONTEMPORARY TECHNOLOGY
    Professor David Rennels

APPLICATION OF CURRENT AI TECHNOLOGIES TO C2
    Dr. Robert Bechtel

A PROTOCOL LEARNING SYSTEM FOR CAPTURING DECISION-MAKER LOGIC
    Dr. Robert Bechtel

ON USING THE AVAILABLE GENERAL-PURPOSE EXPERT-SYSTEMS PROGRAMS
    Dr. Carroll K. Johnson
COMMUNICATION, DATA BASES AND DECISION SUPPORT

FOREWORD
As in the companion volumes, the papers included in this volume are not in the order of their presentation at the Workshop, but rather are grouped according to theme. The corresponding talks were, in fact, scattered over five different days of the ten-day Workshop.

The first three papers in this volume concentrate on C3 communication issues: Segall's paper first reviews the present status of broadcast routing algorithms for communication networks from the standpoint of reliability and efficiency, and then presents an adaptive tree broadcast algorithm designed to be both reliable and efficient. The next paper by Baker, et al., first describes the requirements for, and design constraints placed upon, the HF Intra-Task Force Communication Network, and then gives a distributed algorithm that is designed to allow Task Force nodes to organize themselves into an efficient network structure. Finally, a paper by Gerla and Staskauskas addresses the issue of fairness in routing and flow control algorithms.
The next five papers concern database and distributed processing issues in C2 systems: Li's paper discusses a five-step approach to the modeling of the performance of concurrency control algorithms used in distributed databases. Next, the Huang and Davenport paper discusses the communication problems associated with nonintegrated, heterogeneous and distributed database management systems. Corman's paper presents the results of an analysis designed to establish the general requirements for internodal database commonality so as to obtain effective coordination of Over-the-Horizon Targeting tactical operations. The Grnarov and Gerla paper presents a novel multiterminal reliability measure that reflects the connections between subsets of resources in a system composed of distributed processors, a distributed database and communications. Finally, Rennels' paper discusses fault-tolerance problems concerning systems using contemporary LSI and VLSI technology, both from the viewpoint of fault-detection, recovery and redundancy within component computers and from the viewpoint of the protection of sets of computers against faults.
The last three papers in this volume relate to decision support systems: First, Bechtel reviews the application of current artificial intelligence technologies to C2 systems and then, in his second paper, discusses the early stages of work directed towards the development of a protocol learning system that attempts to capture the logic used by human decision makers. Lastly, Johnson's paper gives a comparison and evaluation of the various presently available general-purpose expert-systems programs.
RELIABLE BROADCAST ALGORITHMS IN COMMUNICATION NETWORKS

BY

Adrian Segall
Department of Electrical Engineering
Technion, Israel Institute of Technology
Haifa, Israel

This work was conducted on a consulting agreement with the Laboratory for Information and Decision Systems at MIT with support provided by the Office of Naval Research under Contract ONR/N00014-77-C-0532.
1. Introduction

Broadcast multipoint communication is the delivery of copies of a message to all nodes of a communication network. C3-oriented communication networks, as well as civilian networks, often require broadcasts of messages, and the purpose of this paper is to first survey the existing broadcast algorithms and second to introduce a new algorithm that has the advantage of combining reliability and efficiency, properties that are of major importance to C3 systems.
Many subscribers of a C3 communication network, whether they are ships, aircraft, Marine Corps units, etc., are mobile, and their location and connectivity to the network may change frequently. Whenever such a change occurs and the user needs to connect to a new network node, this information must be broadcast to all nodes, so that the corresponding directory list entry can be updated. Broadcast messages are used in many other situations, like locating subscribers or services whose current location is unknown (possibly for security reasons), updating distributed data bases, transmitting battle information and commands to all units connected to the communication network, and in fact in all cases when certain information must reach all network nodes.
There are certain basic properties that a good broadcast algorithm must have, and the most important, especially for C3 systems, are:

a) reliability,
b) low communication cost,
c) low delay,
d) low memory requirements.

Reliability means that every message must indeed reach each node, duplicates, if they arrive at a node, should be recognizable and only one copy accepted, and messages should arrive in the same order as transmitted. The reliability requirement is important in both civilian and military networks, but in the latter type it is much more difficult to satisfy, since changes in network topology are more likely to occur because of enemy actions and mobility of network nodes. Communication cost is the amount of communication necessary to achieve the broadcast and consists of, first, the number of messages carried by the network per broadcast message, second, the number of control messages necessary to establish the broadcast paths and, third, the overhead carried by each message. Low delay and memory are basic requirements for any communication algorithm, and broadcasts are no exception.
2. Broadcast algorithms - a brief survey

A detailed survey of the existing broadcast algorithms appears in [1]; we shall give here a brief description of the most important ones and of their main properties.
(i) Separately addressed packets is the simplest method, whereby the source node makes copies of the broadcast packet, one copy for each destination node, and it sends each copy according to the normal routing procedure. The communication cost of this procedure is of course extremely high.
(ii) Hot potato (flooding). Whenever a node receives a broadcast packet for the first time, it sends copies of the packet to all neighbors, except to the one the packet was received from. All copies of the packet received later by the node are discarded. This is achieved by using sequence numbers for the broadcast packets and each node remembering the sequence numbers of the packets received so far. This method is fast and reliable, but quite expensive in terms of communication cost, overhead and memory requirements. The number of messages carried by the network per broadcast message is 2E, where E is the number of network links.
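The flooding rule just described can be sketched in a few lines. The following Python fragment is our own illustrative model, not part of the paper: `flood` and its adjacency-list representation are assumed names, and the routine also counts the messages carried per broadcast.

```python
# Flooding broadcast: forward each new (source, seq) packet to all
# neighbors except the one it arrived from; discard duplicates.

def flood(adjacency, source, seq):
    """Simulate one flooded broadcast; return per-node delivery
    counts and the total number of messages carried."""
    delivered = {n: 0 for n in adjacency}
    messages = 0
    seen = set()                      # (node, source, seq) already accepted
    queue = [(source, None)]          # (node, neighbor it arrived from)
    while queue:
        node, came_from = queue.pop(0)
        if (node, source, seq) in seen:
            continue                  # duplicate copy: discard
        seen.add((node, source, seq))
        delivered[node] += 1
        for nbr in adjacency[node]:
            if nbr != came_from:
                messages += 1         # one copy per outgoing link
                queue.append((nbr, node))
    return delivered, messages
```

On a connected graph every node accepts exactly one copy, and the message count is close to the 2E figure quoted in the text (each link carries the packet in one or both directions).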
(iii) Spanning tree. An (undirected) spanning tree is a graph superimposed on the network such that there exists exactly one path on the graph between each pair of network nodes. Broadcast on a spanning tree is achieved by each node sending a copy of an incoming message on each tree branch except along the branch on which it arrived. This method is the cheapest possible in terms of communication cost because it requires N-1 messages per broadcast message, where N is the number of network nodes. On the other hand, it is hard to make it reliable and adaptive, because it is not clear how to coordinate the tree construction, in case of network topological or load changes, and the actual sending of broadcast information. It also requires a large number of control messages in order to build the tree.
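The spanning-tree forwarding rule admits an equally short sketch. Again this is our own illustration (the `tree_broadcast` name and representation are assumptions): given the tree branches as adjacency lists, each node copies an incoming message onto every branch except the one it arrived on.

```python
# Broadcast on a spanning tree: each node copies an incoming message
# onto every tree branch except the branch it arrived on.

def tree_broadcast(tree_adj, source):
    """Return (set of nodes reached, number of messages carried)."""
    reached = {source}
    messages = 0
    stack = [(source, None)]          # (node, branch the copy arrived on)
    while stack:
        node, came_from = stack.pop()
        for nbr in tree_adj[node]:
            if nbr != came_from:
                messages += 1         # one copy per remaining branch
                reached.add(nbr)
                stack.append((nbr, node))
    return reached, messages
```

On any tree with N nodes this carries exactly N-1 messages, the minimum claimed in the text.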
(iv) Source associated spanning trees requires building N directed spanning trees, one associated with each source node, and broadcasting information originating at a node on the corresponding spanning tree. This method provides in general lower delays than method (iii), but has the same drawbacks as that method, compounded by the fact that one has to deal with N trees.
3. The adaptive tree broadcast algorithm (ATBA)

Any routing algorithm constructs paths between each pair of network nodes, and if all paths to a given destination have no loops, they form a spanning tree in the network (one spanning tree for each destination). The routing algorithm proposed in [2], and extended in [3] to take into account topological changes, has the property that it maintains at any instant of time a spanning tree for each destination, and the broadcast algorithm proposed here, named the adaptive tree broadcast algorithm (ATBA), exploits exactly this fact. The idea is to save the construction of the source associated broadcast trees by using instead the routing trees as built by the routing algorithm of [2]. In this way the trees are used for broadcast as well as routing purposes.

The major addition to the algorithm of [2] to allow the use of the trees for broadcast purposes stems from the fact that broadcasts propagate uptree, whereas in the distributed routing algorithm each node knows only the node that is downtree from itself. Consequently, the algorithm must provide to each node the knowledge of its uptree nodes. This information is obtained as described below.

At the time a node i changes its preferred neighbor (i.e. the downtree node), it sends a message named DCL (declare) to its new preferred neighbor and a message named CNCL (cancel) to the old one. Node i is then allowed to enter a new update cycle of the routing algorithm of [2] only after DCL has been confirmed, thus ensuring node i that the new preferred neighbor is indeed aware of the fact that i is uptree from it and also that all broadcast messages belonging to the previous cycle have already arrived at node i.
To summarize, the required additions to the routing algorithm are:

a) DCL and CNCL messages,
b) slowing down of the update cycle, in extreme situations only (when confirmation of DCL arrives after the propagation of the new update cycle),
c) memory at each node to store broadcast messages that arrive during the present and previous cycles.

The latter is needed at a node j, say, in case a node sends DCL and node j must forward broadcast messages that have not been received by the other node from its previous preferred neighbor.

The detailed algorithm is given in a paper in preparation, where reliability is also proved. Each broadcast message propagates on a tree, so that the communication cost is minimal, no overhead is necessary in the broadcast messages, and the extra load is N short control messages per cycle, while the routing protocol uses 2E messages per cycle, where N, E are the number of nodes and links in the network, respectively.
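Since the detailed algorithm appears only in the paper in preparation, the following is no more than a toy model of the DCL/CNCL bookkeeping described above: it shows how the uptree sets are kept consistent as a node re-homes to a new preferred neighbor, but it ignores DCL confirmation, cycle timing, and message loss entirely. The class and function names are our own.

```python
# Toy model of ATBA bookkeeping: each node keeps one preferred
# (downtree) neighbor; DCL/CNCL messages maintain each node's set of
# uptree neighbors, so a broadcast can be forwarded uptree.

class AtbaNode:
    def __init__(self, name):
        self.name = name
        self.preferred = None   # downtree neighbor on the routing tree
        self.uptree = set()     # neighbors that declared us preferred
        self.received = []      # broadcast messages accepted so far

def change_preferred(node, new_pref):
    """CNCL to the old preferred neighbor, DCL to the new one."""
    if node.preferred is not None:
        node.preferred.uptree.discard(node)   # CNCL
    new_pref.uptree.add(node)                 # DCL
    node.preferred = new_pref

def broadcast(root, message):
    """Propagate a message uptree from the destination (tree root)."""
    root.received.append(message)
    for child in root.uptree:
        broadcast(child, message)
```

Because every node except the root holds exactly one preferred neighbor, each broadcast traverses a tree and carries the minimal N-1 copies, matching the communication-cost claim above.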
References

[1] Y.K. Dalal and R.M. Metcalfe, "Reverse path forwarding of broadcast packets," Comm. ACM, Dec. 1978.
[2] P.M. Merlin and A. Segall, "A failsafe distributed routing protocol," IEEE Trans. Comm., Sept. 1979.
[3] A. Segall, "Advances in verifiable failsafe routing procedures," IEEE Trans. Comm., April 1981.
THE HF INTRA-TASK FORCE COMMUNICATION NETWORK DESIGN STUDY

BY

Dennis J. Baker, Jeffrey E. Wieselthier, and Anthony Ephremides
Naval Research Laboratory
Washington, D.C. 20375
I. INTRODUCTION
The trend towards more sophisticated, highly computerized Naval platforms
is generating new communication requirements that cannot be met by existing
communication systems. At the Naval Research Laboratory (NRL) a new communication network is being developed which will form the primary Extended Line of
Sight (ELOS; 50 to 1000 km) communication system for ships, submarines, and
aircraft that form a Task Force. This new system, called the HF Intra-Task
Force (ITF) Communication Network, will use HF radio waves (2 to 30 MHz) to
interconnect the various Task Force platforms.
The HF ITF Network can be characterized as a general purpose military
communication network with time varying connectivities that uses broadcast HF
radio waves to link nodes. This combination of characteristics makes the HF
ITF network unique. There are a few existing networks that share some of
these characteristics but none that possess all of them. The HF ITF Network
must handle both voice and data, bursty and non-bursty traffic. Some of the
communication traffic may require delivery to the destination within a few
seconds while other traffic may tolerate delays of hours. Communication
traffic classification levels will vary from unclassified to highly classified. The Network must support several modes of communication including
point-to-point, broadcast, and conferencing. Network survivability is an
important consideration since the Network may be subject to communication
jamming, physical attack, spoofing, etc.
We envision the HF ITF network to employ modern networking techniques such
as automated switching, adaptive routing, channel sharing, and distributed
network control to overcome the deficiencies of existing HF ELOS networks. We
begin by reviewing the operational requirements for the HF ITF network and by
describing briefly some of the environmental and equipment related constraints
that affect its design. This is done in Section I. We then focus the paper
upon the identification of the special networking issues that result from
these requirements and constraints. Specifically, in Section II we address
the architecture of the network. The areas covered in the first section of
this paper are discussed in greater detail in [WIES 81].
I.1 OPERATIONAL REQUIREMENTS

First we consider the operational requirements imposed upon the HF ITF network in its role as the primary ELOS communication system in the intra-task force environment:
o Number of Nodes - This number is variable, generally ranging from two
to one hundred. The nodes are usually located within a circle 500 km in
diameter.
o Mobility - All of the nodes in the task force are mobile. Their
maximum speeds are approximately 100 knots for ships, 50 knots for submarines,
and 1000 knots for aircraft.
o Variable Topology - The HF radio connectivity of the nodes in the net
may change due to varying radio wave propagation conditions, noise levels,
hostile jamming, interference from other members of the task force and other
sources, node destruction, platform mobility, and channel switching, thereby
affecting the network's configuration.
o Internetting - The HF ITF network should have automated internetwork
communication capabilities with the Defense Communications System (DCS), with
the Joint Tactical Information Distribution System (JTIDS) [SCHO 79], and with
the World-Wide Military Command and Control System (WWMCCS).
o Adaptability - The network must be adaptable in near real-time to a considerable variety of operating modes and scenarios. In particular, the HF ITF network must be designed for robust performance in the presence of severe jamming. The dual role of HF as the primary ELOS system and also as backup for long haul ultra high frequency/super high frequency (UHF/SHF) satellite links requires HF ITF network flexibility.
o Communication Modes - The HF ITF network should provide for point-to-point and broadcast modes of operation; in the broadcast mode of operation the transmitted information may be received by more than one station.
o Precedence Levels - The system should provide for at least the following
standard military precedence levels for record traffic: FLASH, IMMEDIATE,
PRIORITY, and ROUTINE.
o Traffic and Speed of Service - The network must handle both voice and data traffic. Speed of service requirements vary from approximately 5 seconds for tactical voice/data traffic to several hours for ROUTINE record traffic.
o Freedom From Errors - Acceptable bit error rates (BER) vary from 10^-3 to 10^-5 depending upon whether voice or data is being transmitted. Forward error correction (FEC) and/or automatic repeat request (ARQ) methods must be used to achieve such error control levels.
o Graceful Degradation (Survivability) - One of the most important
requirements of the HF ITF network is survivability; the network should degrade
gracefully under stress conditions. Network degradation will usually be caused
by loss of nodes or jamming; however, it may also arise as a result of the
increased traffic during times of crisis. Consequently, during periods of
stress only essential traffic should be permitted access to the network.
o Security - The HF ITF Network must provide for Communications Security (COMSEC). Communication threats include jamming, spoofing, and communication interception. To combat the latter threat, the network must provide Low Probability of Intercept (LPI) and Limited Range Intercept (LRI) modes of operation.
I.2 ENVIRONMENTAL CONSTRAINTS
The physical and military aspects of the HF ITF environment lead to the
following additional network constraints.
o Military Operational Environment - The military aspects of the HF ITF
network play a dominant role in shaping its design. Thus the network must be
designed for optimum performance in stressed (i.e., jamming and/or physical
attack) environments in addition to providing good performance under
non-stressed conditions.
o HF Radio Wave Propagation - The HF radio channel is both fading and
dispersive with propagation occurring via both groundwaves and skywaves. The
HF medium has been designated as the primary medium for ELOS intra task force
communication largely because of the ELOS propagation ranges of HF groundwaves. Groundwave attenuation varies with frequency and sea state conditions
but is generally more predictable than skywave attenuation [CREP 76]. Skywave
radiation is partially absorbed by the lower ionosphere during the daytime,
making skywave paths difficult to use especially at the lower end of the HF
spectrum. The HF ITF network will rely primarily on the use of HF ground
waves to connect nodes. Skywave signals will typically be considered as a
source of multipath interference. Useful references on HF communication
include [KRIL 79, WAGN 77 and WATT 79].
o Noise - Noise contamination is especially severe in the HF band in the
ITF environment.
The noise sources that must be considered include both atmospheric and man-made. For the HF ITF network, the potentially significant
sources of man-made noise are jammers, locally generated platform
interference, and interference from other HF ITF network users and other
non-hostile users of the HF medium.
o Transmitter Power Levels - Transmitter power limitations are an
important equipment related constraint imposed upon the ITF network. Radiated
power level is an important factor in determining the communication range and
thereby the connectivity of the network. Large radiated power levels are
often desirable to combat jamming. On the other hand, local electromagnetic
interference (EMI) rises as transmitter power increases. This may result in
communication link outages due to excessive noise in the collocated receiver.
Reduced power levels are useful for LPI operations. Thus, transmitter power
levels should be variable and under network control.
o Number of Signals - There are also constraints on the number of
different signals that may be transmitted simultaneously from a single
platform as well as constraints on the number that may be received and
demodulated simultaneously. Currently these limitations vary among Navy
platforms depending upon a particular platform's missions, size, and relative
value. For the Navy's HF narrowband system in use today, these limitations
arise primarily from constraints on the number of separate transmitters,
receivers, antennas, and multicouplers available on each platform. Today, a
large Naval vessel is, typically, capable of transmitting or receiving at 2400
bps on 7-9 narrowband circuits simultaneously. A new HF wideband
spread-spectrum AJ system is presently under development [DAVI 80, HOBB 80].
The maximum number of signals that will be able to be simultaneously
transmitted or received at a platform using the new HF system is unknown at
this time. In fact it is anticipated that the results of the network design
study may influence the hardware capabilities of the new spread-spectrum
system.
The contemporary Naval narrowband HF systems architecture lacks the
flexibility and responsiveness required to implement modern networking
techniques such as packet switching, adaptive routing, distributed network
management, and the integration of voice and data traffic. Present HF systems
(with the exception of LINK 11 [SCHO 79]) are basically designed as manually
operated systems. The new wideband architecture, however, will permit the
realization of the network design concepts discussed in this report.
I.3 SWITCHING, ROUTING, AND SIGNALING CONSIDERATIONS
There is considerable controversy about the merits and demerits of the
three basic switching modes that may be used in communication networks; these
modes are circuit (or line) switching, message switching, and packet
switching. Generally, if the messages are long, then circuit-switching is
preferable, while if they are short, message- (or packet-) switching is better
[KLEI 76, 78, KUO 81, ROBE 78, SCHW 77].
The ITF network traffic tends to be of the "short message" type, although
circuit connections might be needed for voice conversations. Thus a hybrid
solution may be preferable, consisting of circuit switching for voice and
message- or packet-switching for data. Of course the situation is not that
clear-cut, since there may be short voice messages and long file transfers as
well. However, packet voice techniques are not yet sufficiently developed for
the ITF or similar networks. One of the major difficulties associated with
packet voice is the requirement of a nearly constant packet delay (throughout
each voice transmission) to ensure the intelligibility of the speech [COVI 79].
The integration of different types of communication traffic in the ITF
network will be an important subject for future investigation. Recent journal
articles on integrated switching in military communication networks include
[BIAL 80, COVI 80, MOWA 80].
Routing under changing topology is an open problem, characteristic of the ITF network and of great interest in the field of networks in general. Distributed algorithms to handle routing under changing topology in a failsafe fashion are discussed in [MERL 79, SEGA 81a, b]. Routing in packet radio networks is addressed in [GAFN 80, GITM 76, KAHN 78].
The ITF Network should use spread spectrum signaling techniques [DIXO 76a,
b] in order to provide protection from jamming and interception of messages,
as discussed in [WIES 81]. The use of spread spectrum signaling leads
naturally to the use of Code Division Multiple Access (CDMA) techniques, since
under CDMA the dual purpose of providing multiple access capability as well as
jam resistance can be achieved. We use the term CDMA to include all forms of
spread spectrum multiple access, i.e., direct sequence (DS), frequency hopping
(FH), and hybrid FH-DS signaling.
Under any CDMA technique the source transmits to the destination using a
particular code. (For example, in the case of FH signaling the code
corresponds to the FH pattern.) Division by code is analogous to division by
time (as in TDMA) or frequency (as in FDMA). Under CDMA, however, there is
only quasi-orthogonality as opposed to full orthogonality among the codes of
the different users. Therefore, signals transmitted using different codes can
interfere with each other. (In the case of FH signaling, a "hit" occurs when
two or more signals are simultaneously transmitted in the same frequency slot;
forward error correcting coding techniques can be used to handle the resulting
loss of data, as long as the number of bits lost is not too great.) As the
number of simultaneous transmissions (using different codes) increases, the
interference level increases gradually, typically resulting in graceful
degradation. The basic principles of CDMA, and considerations relating to its
use in the ITF Network environment were outlined in [WIES 81]; the
relationship between CDMA techniques and the proposed ITF network architecture
was addressed in [BAKE 81c].
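The graceful degradation of FH signaling noted above can be quantified under a simple model. Assuming each of the other k-1 users hops independently and uniformly over q frequency slots (a modeling assumption of ours, not a claim of the paper), the chance that a given user suffers a "hit" in one hop is:

```python
# Probability that a given FH-CDMA user is "hit" in one hop, assuming
# the other k-1 simultaneous users each hop independently and
# uniformly over q frequency slots (a simplifying model assumption).

def hit_probability(q, k):
    return 1.0 - (1.0 - 1.0 / q) ** (k - 1)

# For q = 100 slots, P(hit) is 0.01 with 2 simultaneous users and
# only about 0.096 with 11, so interference builds up gradually
# rather than causing an abrupt loss of the channel.
```

This gradual growth with the number of simultaneous transmissions is exactly the behavior the text describes, with FEC absorbing the occasional hits.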
II. NETWORK ORGANIZATION
Because of the variable connectivity of the HF ITF Network and the need to
provide network survivability, the ITF Network will require techniques that
adaptively organize the mobile units into efficient network structures and
maintain these structures in spite of node and link losses. We have
developed, and present here, the organizational architecture for an intra-task
force (ITF) network that provides a network structure and a set of procedures
for dealing with the variable-connectivity problem, regardless of its source.
In the proposed architecture, the nodes self-organize into an efficient
network structure, and then they continually modify this structure, if
necessary, to compensate for connectivity changes. To perform these tasks,
each node runs an identical algorithm and updates the algorithm's data base by
exchanging control messages with neighboring nodes over the "control
channel."
In this sense, the algorithm is fully distributed.
Under the proposed architecture, the ITF Network would be continually organizing itself into a hierarchical structure consisting of node clusters such as shown in Figure 1. Each cluster has a local controller ("cluster head"), which can hear and be heard by all other nodes in the cluster. The
cluster structure is well suited to the use of certain multiple access
protocols [HEIT 76, LAM 79, MASS 80, TOBA 75, 80, WIES 80a, b] for
communication between cluster members and their heads. Cluster heads are
linked together using gateway nodes, if necessary, to form the "backbone"
network which consists of dedicated links. The set of links between each node
and its cluster head plus the links of the backbone network comprise the
"primary" links of the network.
The Linked Cluster Architecture possesses several desirable features, including:

1) The cluster heads can control and coordinate access to the communication channel.
2) The backbone network provides a convenient path for inter-cluster communication.
3) By having each cluster head broadcast a message, the message is broadcast to all nodes.
4) A connected network is formed using a reduced subset of the total number of potential links.
5) Vulnerability is reduced since the network does not rely on a central control node; any node can assume the role of cluster head.
When the communicating nodes are within direct range of each other, it may
be preferable to set up auxiliary channels to handle some of the communication
traffic (especially voice traffic and long data transfers). For example, with
appropriate signaling, the Network might set up separate dedicated circuits
between pairs of ordinary nodes for the purpose of voice communications. The
channels used to form these voice circuits might be entirely distinct from the
channels used to form the primary links, and thus we call them auxiliary
channels.
II.1 Network Structuring Algorithms
We consider two methods for establishing node clusters. These two cluster
head selection rules comprise the main difference between our two network
structuring algorithms, which we refer to as the Linked Cluster Algorithm
(LCA) and the Alternative Linked Cluster Algorithm (ALCA). It is instructive
first to describe the two algorithms in a fictitious, centralized mode. That
is, we shall temporarily assume that a central controller has full
connectivity information about the entire network and proceeds to form the
clusters and to designate the cluster heads. In fact, to facilitate the
description, we shall further assume, for the moment, that the communication
range is fixed and common for all nodes. After describing the centralized
versions of these cluster head selection rules, we shall proceed to explain
their more interesting distributed implementation in which no node possesses
any prior knowledge about the other nodes, no coordinating node is present,
and the communication range is variable and unknown.
Before proceeding with the description of these algorithms, however, we
introduce the following terminology for describing the linked cluster
structure:
(1) Two nodes are said to be neighbors if they can communicate with each other via a half-duplex HF channel. Thus, we do not consider one-way communication capability sufficient for connectivity.
(2) Two clusters are said to be directly linked if their cluster heads are neighbors.
(3) A node is a member of a cluster if the node is a neighbor of the cluster head or if the node is itself the cluster head.
(4) A cluster covers another cluster if each member of the second cluster is also a member of the first cluster.
Cluster Head Selection Rule (LCA) - Centralized Version
This method produces the node clusters shown in Figure 2a. The nodes are
first numbered from 1 to N. The central controller starts with the highest
numbered node, node N, and declares it a cluster head. Then it draws a circle
around that node N with radius equal to the range of communication. The nodes
inside the circle form the first cluster. It then considers whether there are
nodes outside this circle. If there are, it tentatively considers drawing a
circle about N-1.
Should any nodes lie within this circle that were not
already within the first circle, node N-1 becomes a cluster head and a circle
is drawn about it. Then consideration of tentative cluster head status for
nodes N-2, N-3, etc. follows, until all nodes lie within at least one circle.
The resulting arrangement provides every node with a cluster head. The
clusters may be directly linked, they may even cover one another, they may
simply overlap, or they may be disconnected. In the last two cases, selected
nodes must serve as gateways for the interconnection of the cluster heads.
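The centralized LCA selection rule above can be illustrated with a short sketch. The coordinate positions, the Euclidean range test, and the return format are illustrative assumptions for the sketch, not details taken from the original report.

```python
import math

def lca_centralized(positions, comm_range):
    """Centralized LCA cluster head selection (sketch).

    positions: dict mapping node number (1..N) to an (x, y) coordinate
    (illustrative representation). Nodes are scanned from highest to
    lowest number; a node becomes a cluster head if its circle contains
    at least one node not already covered by a previous head's circle.
    Returns the list of cluster heads and each head's cluster members.
    """
    def in_range(a, b):
        # Fixed, common communication range assumed, as in the text.
        return math.dist(positions[a], positions[b]) <= comm_range

    heads, covered, clusters = [], set(), {}
    for node in sorted(positions, reverse=True):   # N, N-1, N-2, ...
        members = {m for m in positions if in_range(node, m)}
        if members - covered:                      # covers a new node?
            heads.append(node)
            clusters[node] = members
            covered |= members
        if covered == set(positions):              # all nodes covered
            break
    return heads, clusters
```

Note that the highest numbered node always becomes a head, since it at least covers itself, matching the rule's starting step.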
Cluster Head Selection Rule (ALCA) - Centralized Version
In the alternative method the procedure is a slight variation of the one
just described. To facilitate comparisons with the LCA, the nodes are
numbered in reverse order from that shown for the LCA.
Thus, nodes 1, 2, 3,
etc. of the LCA examples are nodes N, N-1, N-2, etc. of the corresponding
ALCA examples. The central controller starts with the lowest numbered node,
node 1, declares it a cluster head, and draws a circle around it with radius
equal to the fixed communication range, thus forming the first cluster. If
node 2 lies in this circle it does not become a cluster head. If not, it does
become a head and the controller draws a circle around it. Proceeding in this
manner, node i becomes a cluster head unless it lies in one of the circles
drawn around lower numbered nodes. The resulting arrangement is shown in
Figure 2b. Unlike the previously described case for the LCA, with this method
no cluster can cover another nor can two clusters be directly linked.
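The centralized ALCA rule admits an even shorter sketch; as before, the node coordinates and the Euclidean range test are illustrative assumptions only.

```python
import math

def alca_centralized(positions, comm_range):
    """Centralized ALCA cluster head selection (sketch).

    Nodes are scanned from lowest to highest number; node i becomes a
    cluster head unless it already lies inside the circle drawn around
    a lower numbered head. Returns the list of cluster heads.
    """
    def in_range(a, b):
        return math.dist(positions[a], positions[b]) <= comm_range

    heads = []
    for node in sorted(positions):                 # 1, 2, 3, ...
        if not any(in_range(node, h) for h in heads):
            heads.append(node)
    return heads
```

Because a node inside any head's circle never becomes a head itself, no two heads can be neighbors, which is why no two ALCA clusters can be directly linked or cover one another.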
We now briefly describe the distributed versions of our two algorithms.
The Linked Cluster Algorithm (LCA) - Distributed Version
The network architecture shown in Figure 1 is not, by itself, adequate for
the HF ITF Network. The structure shown in Figure 1 is based on a single
connectivity map for the network. However, over the entire HF band, there may
be several different connectivity maps due to variations in the HF
communication range with frequency. Consequently, we have considered a
network architecture that consists of several overlayed sets of linked
clusters, each set being similar to the one shown in Figure 1 and being based
on a particular connectivity map. Moreover, these connectivity maps are
continually being reformed in order to adapt to the time variation of the HF
ITF Network connectivities. The HF band is partitioned for this reason into M
subbands, for each of which a separate run of the algorithm is required in
order to produce the corresponding sets of clusters. These separate runs take
place consecutively during M epochs. During epoch i the algorithm is run for
the ith subband of the HF channel. A connectivity map is formed based on the
connectivities that exist within that subband. The algorithm then provides
for the selection of cluster heads and gateway nodes. When the M runs are
completed, the epochs repeat in a cyclic fashion providing a continual
updating process. Note that during any epoch only one set of linked clusters
is being reorganized - the remaining M-1 sets are unaffected. To prevent
disruptions in communication traffic flow, the network should route traffic so
as to avoid the subband in which the network is being reorganized.
Appropriate message framing provisions must be made of course in order to
avoid interruption of message transmissions at the beginning of the
corresponding reorganization epochs.
The schedule of events in the algorithm is shown in Figure 3. Each epoch
of the control channel is divided into two frames of N time slots each, where
N is the number of nodes. Each node transmits a control message during its
assigned time slot in each frame of an epoch. During the first frame, a node
broadcasts the identities of the nodes it has heard from during previous slots
in the frame. Thus, by the time it is ready to transmit in Frame 2, each node
knows all its neighbors.
The Linked Cluster Algorithm provides a deterministic rule by which each
node can ascertain, just prior to its Frame 2 transmission, whether it should
become a cluster head. According to this rule, the cluster head for node i is
the highest numbered node connected to node i, including node i itself. Each
node then broadcasts this determination in its assigned Frame 2 slot along
with its list of neighbors. Thus, by the end of Frame 2, each node knows:
its neighbors' neighbors, one hop away heads, and some of the two hops away
heads. This information is needed to determine which nodes must become
gateways for linking the clusters. After the M epochs have occurred, the
network has been organized into a distinct structure for each of the M
subbands. The algorithm is then repeated, recognizing that the connectivity
is time varying.
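The Frame 2 head-selection rule just described is simple enough to sketch directly; the neighbor-set representation below is an illustrative assumption.

```python
def lca_head_of(i, neighbors):
    """Distributed LCA head rule (sketch).

    The cluster head for node i is the highest numbered node connected
    to node i, including node i itself. `neighbors` maps each node to
    the set of neighbors it learned from the Frame 1 broadcasts.
    A node declares itself a cluster head iff lca_head_of(i, ...) == i.
    """
    return max(neighbors[i] | {i})
```

For example, a node with no higher numbered neighbor elects itself head, which is how cluster heads emerge without any central coordination.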
The Alternative Linked Cluster Algorithm (ALCA) - Distributed Version
The two algorithms have nearly identical implementations. Both use the
same data structures, and both follow the same transmission schedule shown in
Figure 3. Also, the formats for the control messages are nearly the same for
the two algorithms; the only differences are that the rule for determining
cluster heads differs and that, instead of announcing in Frame 2 whether it is
a cluster head, each node announces the identity of its own head. In the distributed
implementation, a node determines if it should become a cluster head as
follows. First, node 1 always becomes a cluster head and announces, in slot 1
of Frame 2, that it is its own head. Other nodes determine, just prior to
their own Frame 2 transmissions, whether they should become cluster heads.
The rule is that a node becomes a cluster head if it has no lower numbered
head as a neighbor. If a node is not a head but is connected to more than
one head, the lowest numbered head is this node's own head. An additional,
arbitrary difference between the two algorithms is that, in the ALCA, the
lowest numbered nodes (instead of the highest numbered) are preferred for
gateway status. Thus, in the ALCA, the lower numbered nodes are more likely
to become cluster heads and gateways whereas the LCA favors the higher
numbered nodes for these roles. Details relating to the formation of gateways
under the two algorithms are given in [BAKE 81 a,b,c].
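The sequential Frame 2 decisions of the distributed ALCA can be simulated as follows; the data representation is an illustrative assumption, and gateway formation (covered in [BAKE 81 a,b,c]) is omitted.

```python
def alca_assign_heads(nodes, neighbors):
    """Distributed ALCA head rule (sketch), simulated in Frame 2 slot order.

    Node 1 always becomes a cluster head. Each later node becomes a
    cluster head iff no lower numbered head is among its neighbors;
    otherwise it adopts the lowest numbered neighboring head as its own.
    Returns a dict mapping each node to its cluster head.
    """
    heads, head_of = [], {}
    for i in sorted(nodes):                        # slot order 1, 2, ...
        lower_heads = [h for h in heads if h in neighbors[i]]
        if lower_heads:
            head_of[i] = min(lower_heads)          # lowest numbered head
        else:
            heads.append(i)                        # declare itself a head
            head_of[i] = i
    return head_of
```

The sequential scan mirrors the transmission schedule: by the time node i transmits in Frame 2, all lower numbered nodes have already announced their head status.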
III. SIMULATION RESULTS
A simulator model was constructed to provide examples of network
structures obtained with both algorithms. In our simulation model the
determination of whether two nodes are within communication range is based on
the HF groundwave range model shown in Figure 4. The actual communication
range will differ, of course, from that given by the model. However, this
model is representative of the variation of the groundwave communication range
with frequency over the HF band.
Frequency Dependence - Since the communication range varies significantly
across the HF band, we envision the network as consisting of the overlay of
several sets of linked clusters, each set derived from a connectivity map
formed using a different frequency. The frames of Figure 5 show the resulting
network structures for six epoch frequencies. These particular frequencies
were chosen because they provide examples corresponding to a wide variation in
the communication range. In this example, connected nets are formed at all
but the highest frequency.
The technique of overlaying several sets of linked clusters provides
alternative communication paths. If a backbone network link is lost at one
frequency due to jamming, other backbone networks at other frequencies can be
used. When the net is reorganized in the subband in which the jamming occurs,
a new backbone network will be set up that will not contain the jammed link.
Unfortunately, because the same nodes (i.e. the higher numbered ones, in
the case of the LCA) are more likely to become heads and gateways for each
epoch in our example, these nodes will be overburdened with network management
and traffic direction responsibilities. Moreover, the appearance of the same
nodes in several different backbone networks makes the network too dependent
on these nodes. For example, in Figure 5, the loss of node 9 would sever the
backbone network at all frequencies. Although the network would begin to
compensate for the loss of this node by restructuring the backbone network,
epoch by epoch, parts of the network might remain disconnected until a full
cycle of epochs occurred. This problem can be avoided by introducing a
dynamic node numbering strategy.
Node Numbering - Given the simple strategy used in deciding the identity
of a cluster head or a gateway node among a group of candidates, it is clear
that number assignment to the nodes is a very important part of the proposed
organization. For example, in the ALCA the lower numbered nodes simply have a
greater tendency to become heads or acquire gateway status than higher
numbered nodes while in the LCA the opposite is true. One possible way to
alleviate problems associated with having the same nodes become heads and
gateways is to assign to each node a different number for each epoch. A
simple strategy that tends to produce orthogonal backbone networks is to
invert the numbering on successive epochs. That is, nodes 1, 2, 3, etc.
become nodes N, N-1, N-2, etc. An example of such a strategy is shown in
Figure 6 for both the LCA and ALCA. The results show some separation of the
backbone networks; however, the nodes numbered 3 and 9 in Frames (a) and (d)
and 2 and 8 in Frames (b) and (c) still appear in each of the backbone
networks. That this is unavoidable can be seen by considering the complete
connectivity map for this set of nodes, which is shown in Figure 7. Since
nodes 2 and 8 are cut set nodes, they must necessarily be part of the backbone
network. In general, a strategy of node number inversion followed by number
randomizing on alternate epochs should produce well separated backbone
networks if there are no cut set nodes. Thus, if a node becomes a head or
gateway in several networks when node renumbering is used, then this node is
likely to be a "critical" node in the sense that its loss may split the
network.
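The inversion strategy is a one-line mapping, sketched below; the function name is ours, not from the original.

```python
def invert_numbering(n_total, node):
    """Epoch-to-epoch node number inversion (sketch).

    Node i becomes node N + 1 - i on the next epoch, so nodes favored
    for head or gateway status in one epoch are disfavored in the next,
    tending to produce well separated (orthogonal) backbone networks.
    """
    return n_total + 1 - node
```

Applying the mapping twice returns the original numbering, which is consistent with alternating the inversion on successive epochs.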
Loss of Nodes - Since the ITF Network is a military network, its nodes may
be disabled or destroyed by physical attack. Consequently, both of our
network structuring algorithms provide for sensing node losses and for
reconfiguring the network, if necessary. An example of this, using the ALCA,
is shown in Figure 8. Frame (a) shows the resulting backbone network for an
initial network of 20 nodes. Each subsequent frame corresponds to a network
obtained by deleting five nodes from the network shown in the preceding
frame. The nodes that are "lost" are chosen from among the nodes that are
most likely to become cluster heads or gateways. Since the ALCA favors the
selection of the lower numbered nodes as heads and gateways, the five lowest
numbered nodes were deleted in successive frames.
The network adapts to the loss of nodes 1 through 5 as follows. The roles
of heads 1 and 4 are taken over by nodes 10 and 7. The role of gateway node
2, which links clusters 1 to 8, 6 to 8, and 1 to 6, is assumed by node 15,
which links clusters 8 to 10 and 6 to 10. The loss of node 2 also results in
the creation of the new gateway at 9, which links clusters 6 and 8. The loss
of nodes 3 and 5 has no immediate effects.
The additional loss of nodes 6 through 10 has the effect of disconnecting the
backbone network. This is unavoidable since, for example, the loss of node 7
results in the isolation of node 20. Likewise, nodes 12, 13, and 19 are also
isolated from the rest of the network once nodes 7 and 8 are lost. The
cluster head roles of nodes 8 and 10 (Frame (b)) are taken over by nodes
15 and 11 (Frame (c)). The effects of the loss of head 6 are borne by new
heads 11 and 15. That is, all the nodes within cluster 6 (Frame (b)) are now
contained within the combination of new clusters 11 and 15 (Frame (c)). Also,
the disappearance of 6 negates the need for a gateway node at 9. Thus the
gateway role of 9 does not have to be taken over by any other node.
The additional loss of nodes 11 through 15 causes no further partitioning
of the network; it still comprises three isolated parts. However, the roles
of heads 11 and 15 and gateway 16 (Frame (c)) are now taken over by the single
head at node 16 (Frame (d)). Again, we emphasize that, at other frequencies,
the network may still be connected.
IV. CONCLUSIONS
We have described the requirements and constraints imposed on the HF
Intra-Task Force (ITF) Communication Network. Guided by these requirements
and constraints, we have developed distributed algorithms that allow the Task
Force nodes to self organize into an efficient network structure. Our network
structuring algorithms provide the HF ITF Network with a survivable
architecture that adapts the network structure to connectivity changes arising
for any reason. Path redundancies resulting from the distinct structures within
each subband provide robustness to the ITF Network. The architecture
provides a framework for investigating other networking issues, including
routing, flow control, and protocol design.
BIBLIOGRAPHY
BAKE 81a Baker, D. J. and A. Ephremides, "A Distributed Algorithm for
Organizing Mobile Radio Telecommunication Networks," Proceedings of
the Second International Conference on Distributed Computing Systems,
pp. 476-483, April 1981.
BAKE 81b Baker, D. J. and A. Ephremides, "The Architectural Organization of a
Mobile Radio Network via a Distributed Algorithm," to appear in IEEE
Transactions on Communications.
BAKE 81c Baker, D. J., A. Ephremides, and J. E. Wieselthier, "An Architecture
for the HF Intra-Task Force (ITF) Communication Network," submitted
for publication as an NRL report.
BELL 80
Bell, C. R. and R. E. Conley, "Navy Communications Overview," IEEE
Transactions on Communications, Vol. COM-28, 1573-1579, September
1980.
BIAL 80
Bially, T., A. J. McLaughlin, and C. J. Weinstein, "Voice
Communication in Integrated Digital Voice and Data Networks," IEEE
Transactions on Communications, Vol. COM-28, 1478-1490, September
1980.
COVI 79
Coviello, G. J., "Comparative Discussion of Circuit- vs.
Packet-Switched Voice," IEEE Transactions on Communications, Vol.
COM-27, 1153-1160, August 1979.
COVI 80
Coviello, G. J. and R. E. Lyons, "Conceptual Approaches to Switching
in Future Military Networks," IEEE Transactions on Communications,
Vol. COM-28, 1491-1498, September 1980.
CREP 76
Crepeau, P. J., "Topics in Naval Telecommunications Media Analysis,"
NRL Report 8080, December 1976.
DAVI 80
Davis, J. R., C. E. Hobbis, and R. K. Royce, "A New Wide-Band System
Architecture for Mobile High Frequency Communication Networks," IEEE
Transactions on Communications, Vol. COM-28, 1580-1590, September
1980.
DIXO 76a Dixon, R. C., Spread Spectrum Systems, John Wiley and Sons (New York),
1976.
DIXO 76b Dixon, R. C., ed., Spread Spectrum Techniques, IEEE Press (New York),
1976.
EPHR 81
Ephremides, A. and D. J. Baker, "An Alternative Algorithm for the
Distributed Organization of Mobile Users into Connected Networks,"
Proceedings of the 1981 Conference on Information Sciences and
Systems (CISS), held 25-27 March 1981 at Johns Hopkins University.
GAFN 80
Gafni, E. and D. P. Bertsekas, "Distributed Algorithms for Generating
Loopfree Routes in Networks with Frequently Changing Topology,"
Proceedings of the Fifth International Conference on computer
communication, pp. 219-224, October 1980.
GITM 76
Gitman, I., R. M. VanSlyke, and H. Frank, "Routing in Packet-Switching
Broadcast Radio Networks," IEEE Transactions on Communications, Vol.
COM-24, pp. 926-930, August 1976.
GOOD 76
Goodbody, R. L. et al, "Navy Command Control and Communications
System Design Principles and Concepts," (8 volume set), NELC TD 504,
15 August 1976.
Vol. VI. Appendix E - Networking Principles and Features, Appendix
F - Data Base Management Considerations, Appendix G - NC 3 N User
Data and Information Exchange Network Requirements,
HEIT 76
Heitmeyer, C., J. Kullback, and J. Shore, "A Survey of Packet
Switching Techniques for Broadcast Media," Naval Research Laboratory,
Washington, D.C., NRL Report 8035, October 12, 1976.
HLUC 79
Hluchyj, M. G., "Connectivity Monitoring in Mobile Packet Radio
Networks," Technical Report LIDS-TH-875, Laboratory for Information
and Decision Systems, MIT, January 1979.
HOBB 80
Hobbis, C. E., R. M. Bauman, R. K. Royce, and J. R. Davis, "Design
and Risk Analysis of a Wideband Architecture for Shipboard HF
Communication," NRL Report 8408.
KAHN 78
Kahn, R. E., et al., "Advances in Packet Radio Technology,"
Proceedings of the IEEE, Vol. 66, No. 11, Nov. 1978.
KLEI 76
Kleinrock, L., Queueing Systems, Vol. 2: Computer Applications, Wiley
Interscience (New York), 1976.
KLEI 78
Kleinrock, L., "Principles and Lessons in Packet Communications,"
Proceedings of the IEEE, Vol. 66, pp. 1320-1329, 1978.
KRIL 79
Krill, J. A., "Methods for Computing HF Band Link Parameters and
Propagation Characteristics," BGAAWC Data Linking Series Volume 7,
The Johns Hopkins University Applied Physics Laboratory, December
1979.
KUO 81
Kuo, F. F., ed., "Protocols and Techniques for Data Communication
Networks," Prentice Hall, (Englewood Cliffs, NJ), 1981.
LAM 79
Lam, S. S., "Satellite Packet Communication - Multiple Access
Protocols and Performance," IEEE Transactions on Communications, Vol.
COM-27, pp. 1456-1466, October 1979.
MASS 80
Massey, J. L., "Collision-Resolution Algorithms and Random-Access
Communications," Technical Report, UCLA-ENG-8016, April 1980.
MERL 79
Merlin, P. M. and A. Segall, "A Failsafe Distributed Routing
Protocol," IEEE Transactions on Communications, Vol. COM-27, pp.
1280-1287, September 1979.
MOWA 80
Mowafi, O. A. and W. J. Kelly, "Integrated Voice/Data Packet
Switching Techniques for Future Military Networks," IEEE Transactions
on Communications, Vol. COM-28, 1655-1662, September 1980.
ROBE 78
Roberts, L. G., "The Evolution of Packet Switching," Proceedings of
the IEEE, Vol. 66, pp. 1307-1313, 1978.
SCHO 79
Schoppe, W. J., "The Navy's Use of Digital Radio," IEEE Transactions
on Communications, Vol. COM-27, No. 12, pp. 1938-1945, 1979.
SCHW 77
Schwartz, M., Computer-Communication Network Design and Analysis,
Prentice-Hall, (Englewood Cliffs, New Jersey), 1977.
SEGA 81a Segall, A., "Advances in Verifiable Fail-Safe Routing Procedures,"
IEEE Transactions on Communications, Vol. COM-29, No. 4, pp. 491-497,
April 1981.
SEGA 81b Segall, A., and M. Sidi "A Failsafe Distributed Protocol for Minimum
Delay Routing," IEEE Transactions on Communications, Vol. COM-29, No.
5, pp. 689-695, May 1981.
TOBA 75
Tobagi, F. A. and L. Kleinrock, "Packet Switching in Radio Channels:
Part II - The Hidden Terminal Problem in Carrier Sense
Multiple-Access and The Busy Tone Solution," IEEE Transactions on
Communications Vol. COM-23, pp. 1417-1433 (1975).
TOBA 80
Tobagi, F. A., "Multiaccess Protocols in Packet Communication
Systems," IEEE Transactions on Communication, Vol. COM-28, pp.
468-488, 1980.
WAGN 77
Wagner, L. S., "Communications Media Analysis - HF," NRL Memorandum
Report 3428, March 1977.
WATT 79
Watterson, C. C., "Methods of Improving the Performance of HF Digital
Radio Systems," Institute for Telecommunication Sciences, NTIA Report
79-29, October 1979.
WIES 80a Wieselthier, J. E. and A. Ephremides, "A New Class of Protocols
for Multiple Access in Satellite Networks," IEEE Transactions on
Automatic Control, Vol. AC-25, pp. 865-879, October 1980.
WIES 80b Wieselthier, J. E., and A. Ephremides, "Protocols of Multiple Access
(A Survey) and the IFFO Protocols (A Case Study)," to appear in the
Proceedings of the NATO Advanced Study Institute on "New Concepts in
Multi-User Communication," August 1980.
WIES 81
Wieselthier, J. E., D. J. Baker, and A. Ephremides, "Survey of
Problems in the Design of an HF Intra Task Force Communication
Network," to be published as NRL Report 8501.
Fig. 1 - Example of network organized into linked, node clusters. The squares
represent cluster heads, triangles represent gateways, and solid dots represent
ordinary nodes. The backbone network consists of cluster heads, gateway nodes,
and the links that join them together. The communication range for each cluster
head is indicated by a circle.
22
OI
cv,
4o-
--
-o
a-
4-
ci4-4-)
23
4
Id
Y
23~~~~~
~~~~~~~~~~~~~~~~~~~~~~~
LL
Li
ii
C)
I-C
CL
LLi
Z
O~~~~~~~~
uui
15
CL
LAL
Li
ci~~~~~c
Li1
I-
0
CE
z
CL.
LiCD
C
(\Jr
U)~~~
a
-4
137 U
LL~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~L
CI
U2
ICL~~~~~
24t
L]CO
CI
EL]
CE
2:1
o
C/,
FI1
I
iC)
I
3I
W
(NW) 23NNU
25
NOI.OINnwwu
25I~~~~~~~~~t~~~*
EL-
I
3Hz
4
)
.Hz
.
7
.
7 .2
:7
*107~~
~10
Fig. 5 - Network structures obtained using six different epoch
frequencies.
26
OInc
9~~~~~~~~~%
v,
co
cc
a
0
or-.
1%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~C
E
c-
a,
.4~~~~~~~~~~~~~~~~~~~~~~~~"
27
v
co
~~~~~~~~~~~~~~J~~~~~~~~~~~.
10~~~~~~~~
4-rDI
E
orO3
O
27~~~~~~~
L-
7
6
4
3
9
10
Fig. 7 - Connectivity map for examples shown in Figure 6.
[Figure 8: caption illegible in this reproduction.]
FAIRNESS
IN
FLOW CONTROLLED NETWORKS
BY
Mario Gerla
Mark Staskauskas
University of California
Los Angeles
Los Angeles, California 90024
© 1981 IEEE. Reprinted, with permission, from ICC '81, International
Conference on Communications, June 14-18, 1981, Denver, CO.
FAIRNESS IN FLOW CONTROLLED NETWORKS
M. Gerla and M. Staskauskas
University of California, Los Angeles
ABSTRACT
In this paper, we investigate the problem of fair sharing of bandwidth in
packet switched networks. We show that conventional flow control schemes
succeed in preventing congestion at the cost of unfairness. We then provide a
general definition of fairness for two important cases: the fixed route case
and the optimal multi-path case. Fairness algorithms are presented for input
rate flow controlled networks as well as window flow controlled networks.
Experimental results show that simple flow control and routing parameter
adjustments can considerably improve fairness.
I. INTRODUCTION
Resource sharing is probably the most fundamental and most
advertised feature of modern data communications networks. In a
packet switched network, buffers and trunks are dynamically shared
by packets in order to improve throughput. In a satellite random
access channel, bandwidth efficiency is increased by allowing stations to dynamically share the channel and transmit whenever they
need to, eliminating the rigid preallocation of slots of more traditional time division multiplexed schemes.
Unrestricted sharing, however, can create problems. One type
of problem which has been extensively studied is congestion. If too
many users are attempting to use the same resource, then the
resource may go to waste. The typical example is the random
access satellite channel. When several users of the satellite channel
attempt to transmit a packet in the same slot, a "collision" occurs
which forces them to retransmit at a later time. Retransmissions
have the effect of increasing the load on the channel, and, if proper
conditions are met, may trigger a positive feedback process which
ultimately leads to system congestion. Examples of throughput
degradation and congestion due to unrestricted sharing also abound
in terrestrial packet networks, as shown in [GERL 80a].
In order to avoid the drawbacks of uncontrolled sharing, some
restrictions are built into the communications network under the
name of '"flow control", or "channel access control". The objective
of these control schemes is to protect network performance from
overload. Performance is generally defined as a global network
parameter averaged over all users, e.g., overall average delay, total
throughput, or average power, where power is the ratio of
throughput over delay [GIES 78].
At this point, a question immediately comes to mind: since
restrictions are placed on the users for the benefit of a global performance measure, are we to expect that the performance of each
individual user is optimized at the same time as the overall performance is optimized? Unfortunately, this is not always the case, as
demonstrated by the simple examples shown below.
In satellite multiple access schemes the Reservation ALOHA
protocol was proposed to reduce channel wastage due to conflicts
[CROW 73]. The basic principle consists of preallocating slots in
future frames for the users who had a successful transmission in
the current frame. The motivation was to gradually move from a
completely random access scheme to a more orderly scheme (somewhat similar to
TDMA) which could provide better throughput performance at heavy load.
Indeed, in heavy load situations, the few
fortunate users who can "capture" the slots tend to keep them for a
long time, denying access to the remaining users. Throughput is
certainly maximized, but so is the discontent of the users that are
excluded!
In terrestrial packet networks, a popular approach to congestion prevention, known as input buffer limit control, consists of
refusing incoming traffic from local hosts (but not from network
trunks) when the packet switched processor is short of buffers
([LAM 77], [KAMO 79]). The goal is to try and deliver the existing (transit) traffic (which has entered the network at a remote
source and therefore has already consumed some network
resources) before accepting new traffic. This practice was shown
very effective for congestion prevention; unfortunately, it also has
the effect of unfairly penalizing local users connected to
congestion-prone nodes, while favoring remote users.
The above examples are meant to show that fairness in
resource sharing is not a natural by-product of congestion protection or performance optimization. Rather, it is an independent
(sometimes antithetical) criterion which must be specifically cared
for during network and protocol design. This is particularly important for public data networks which generally have the policy of
charging customers uniformly, on a per-packet basis. This policy
would be untenable if the service provided (measured in terms of
individual throughput and delay) was widely different from user to
user. Of course, fairness has a price and, in some applications,
must be traded off with overall network efficiency.
The purpose of this paper is to investigate fairness in a fairly
simplified, yet representative, packet network environment.
Namely, we assume that the network is operated with multi-path
routing, and is controlled by either end-to-end window flow control
or input rate flow control. Unlimited buffering is assumed in the
nodal processors. A number of source-destination pairs are active,
and are transmitting packets to each other at a constant rate. In the
window control mode, we achieve fairness by adjusting the window
parameters for the various source-destination pairs. In the input
rate control mode, we achieve fairness by adjusting user input rates.
2. PREVIOUS WORK
Surprisingly little attention has been dedicated in the past to
the study of fairness and to the investigation of mechanisms which
can enforce fairness. We are aware only of two contributions which
(directly or indirectly) address this issue.
Jaffe in [JAFF 80] is motivated by the search for a "uniform
throughput" solution. In this model, users are assumed to have
unrestricted demands, which are accommodated through the network on preestablished single paths called virtual circuits. The network exercises flow control on individual input rates, in an attempt
to provide fair sharing of link capacities and, at the same time,
optimize a "delay-throughput tradeoff" measure. Fairness is
achieved by regulating input rates. Routes, however, are assumed
fixed, i.e. no optimization on routes is allowed in order to improve
fairness. A set of throughputs {y_r, r = 1,...,R}, where R = number of
source-destination pairs, is said to be fair if the following condition is
satisfied for each link k in the network:
y_r <= X_k (C_k - f_k)    (2)

where: C_k = capacity of the k-th link
f_k = data flow on the k-th link
X_k = a constant coefficient a priori assigned to the link.
In fact, based on our assumption of very severe penalty functions,
the only links that contribute (significantly) to the delay term in (4)
are the bottleneck links. However, if bottlenecks are the only constraints on the increase of the y_r, then all the users sharing the same
bottleneck must have the same throughput in order to minimize
the sum of penalty functions. This derives immediately from the
fact that the sum of identical, convex functions of variables, whose
sum is in turn constrained by the bottleneck capacity, is minimized
when all the variables are identical. This equal sharing condition is
exactly the same as the fairness condition stated by Jaffe.
The link flow f_k is given by the sum of all contributions y_r
traversing link k. If no other restrictions are present, each user r
(say) will attempt to maximize his throughput y_r on link k while
yet meeting constraint (2). If N users are present, competitive
equilibrium is reached when:

y_r = X_k C_k / (1 + X_k N),    r = 1,...,N    (3)

From (3) we note that if X_k -> infinity, the entire capacity is equally
subdivided among the users. If X_k < infinity, some residual capacity is
left after the subdivision.
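A quick numerical check of the equilibrium throughput in (3) can be sketched as follows; the capacity and coefficient values are illustrative, not taken from the paper.

```python
def equilibrium_throughput(X_k, C_k, N):
    """Per-user throughput at competitive equilibrium (sketch).

    With N users sharing link k under constraint (2), each receives
    X_k * C_k / (1 + X_k * N). As X_k grows large this tends to
    C_k / N (equal division of the full capacity); for finite X_k
    some residual capacity is left after the subdivision.
    """
    return X_k * C_k / (1 + X_k * N)
```

Substituting back confirms consistency with (2): with f_k = N * y_r, one finds X_k (C_k - f_k) = X_k C_k / (1 + X_k N) = y_r, so the constraint is met with equality at equilibrium.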
Both the above formulations suffer from certain drawbacks.
Jaffe's formulation includes throughput fairness, but it assumes
predefined, single-path routing. In our investigations, we have
discovered that changes in routing pattern can have a strong impact
on fairness. The formulation of Gallager and Golestaani allows
optimization over both individual throughputs and traffic pattern,
but it does not include a rigorous notion of fairness.
Since a typical virtual circuit involves several hops, a user
throughput y_r is subjected to a constraint of type (2) for each hop
along the path. The most restrictive constraint clearly will prevail;
the corresponding link is called "bottleneck" for user r. Since
different users have different bottlenecks, individual user
throughputs generally are not uniform throughout the network.
However, the following property is verified:
"each user's throughput is at least as large as that of all the
other users that share his bottleneck link."
It appears, therefore, that a rigorous definition of fairness
which applies to both single- and multi-path routing is needed. In
the next section, we provide such a definition and relate it to previous formulations.
This property satisfies our intuitive notion of fairness that
users who have no other restrictions should all be granted an equal
amount of the resource, while users who have other restrictions
should be granted as much of the resource as these restrictions
allow, up to the full amount obtained by the unrestricted users.
Gallager and Golestaani propose in [GALL 80] a combined
routing and flow control scheme aimed at optimizing the
throughput/delay tradeoff in the network. The objective function
(to be minimized) is the sum of average network delay plus a set of
penalty functions (one per user) reflecting the throughput reductions suffered by each user with respect to his initial demand.
More precisely, the objective function F is defined as follows:
F = T + Σ_r P_r(γ_r)     (4)
3. OPTIMAL FAIRNESS DEFINITION
We start by stating a general principle for an efficient and fair
throughput allocation:
Optimal Fairness Principle: Total network throughput is maximized subject to the constraint that network capacity is fairly distributed among the users.

This principle gives a qualitative definition of fairness. We need,
however, to be more specific about the way capacity is distributed
among users. For example, we need to specify how users sharing
the same bottleneck are allocated shares of its capacity. We distinguish two cases:
(1) fixed path case: each end-to-end user session is carried on a
single, predefined path. No attempt is made to optimize the
path so as to improve efficiency and/or fairness (Jaffe's
model.)
(2) multi-path case: the traffic in each user session can be distributed simultaneously over several routes. Routing is chosen
so as to maximize throughput and, at the same time, guarantee fairness (Gallager-Golestaani model.)
where: (a) T = total average delay;
(b) P_r(γ_r) is a penalty function defined for 0 ≤ γ_r ≤ γ°_r, with P_r(γ°_r) = 0, where γ°_r = initial demand;
(c) P_r is convex and decreasing in γ_r.

For the sake of clarity, we will introduce separate definitions
of fairness for the two cases, although we will later show that the
fixed path definition is a special case of the multi-path definition.

First, we introduce some terminology:

constrained user: a user who cannot get all the
throughput he originally requested because of network
capacity constraints.
The variables in this optimization problem are the input rates
{γ_r, r = 1,2,...,R} (flow control problem) and the network paths
chosen for such input rates (routing problem).
user throughput: total throughput obtained by the user,
possibly the sum of throughputs on separate paths.
Although the Gallager-Golestaani approach does not explicitly
address fairness, it does attempt to evenly distribute throughput
among the users, within the limitations posed by network capacity
(which is reflected in the objective function by the network delay
term). Moreover, the Gallager-Golestaani solution reduces to the
above-mentioned Jaffe solution in the following limiting case:
(a) fixed, single-path routing
(b) penalty functions convex and identical for all users

saturated trunk: a trunk with very high utilization (in
the limit, utilization = 1).
saturated cut: a "minimal" set of saturated trunks which
partitions the network into two components. The set is
minimal in the sense that no proper subset of it would
partition the network into two components.
We assume that the network is connected, and that trunks are
full duplex. In order to simplify the definitions, we initially assume
actual network operation.
If rate control is assumed, the computation of the optimal
solution is rather straightforward. We can identify the bottlenecks
by inspection and select the optimal input rates in the fixed path
case. For the multi-path case, we can use the flow deviation
method [FRAT 73] with the joint objective function of Gallager
and Golestaani in Eq. (4) to find optimal routing and input rates.
When windows are used, the problem of finding optimal input
rates becomes substantially more difficult, since we can control
input rates only indirectly through windows. Given the windows,
the input rates can be computed only by solving a very cumbersome network-of-queues problem. The approach that we have
adopted is an iterative method based on heuristics. We repeatedly
adjust windows, compute flows, evaluate fairness, and make proper
corrections to the windows until we satisfy the fairness conditions
or we reach a local optimum. The critical step in this procedure is
the evaluation of throughput for a given set of window sizes. In
the fixed path case we use an approximate solution technique for
closed networks of queues called mean value analysis [REIS 79]. In
the optimal routing case we use an approach that combines the flow
deviation method and mean value analysis to maximize individual
throughputs in a fixed-window network [GERL 80b]. We then
evaluate the fairness of the solution using the fairness measure F
defined in Eq. (4) above.
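The mean value analysis of [REIS 79] admits a compact recursion. Below is a minimal single-chain sketch (the fixed-window, multi-chain setting used in the paper is more involved); the service times are illustrative:

```python
# Exact MVA for one closed chain of M queues: throughput as a function
# of the window (number of circulating packets).

def mva(service_times, window):
    M = len(service_times)
    n = [0.0] * M                        # mean queue lengths, population k-1
    throughput = 0.0
    for k in range(1, window + 1):
        # arrival theorem: residence time at queue i with k customers
        r = [service_times[i] * (1 + n[i]) for i in range(M)]
        throughput = k / sum(r)          # Little's law over the whole chain
        n = [throughput * r[i] for i in range(M)]
    return throughput, n

# Throughput grows with the window but with diminishing returns,
# matching the concavity noted for Table 5.4.
X1, _ = mva([1.0, 0.5, 0.5], window=1)
X4, n4 = mva([1.0, 0.5, 0.5], window=4)
assert X1 < X4
assert abs(sum(n4) - 4) < 1e-9           # all packets accounted for
```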
The next section reports the experimental results we obtained
with both the rate and window control modes, for both fixed and
optimal routing.
that trunks can be utilized up to capacity, i.e. we ignore delays. We
will later relax this assumption.
Next, we introduce two definitions of fairness, for fixed and
multi-path routing, respectively:
(1) Optimal fairness, fixed path case: A solution is optimally fair
(for a given route selection) if any constrained user traverses
at least one saturated trunk in which his throughput is larger
than or equal to the throughput of all other users (if any)
sharing the same trunk. This trunk is called the bottleneck for
that user.
(2) Optimal fairness, multi-path case: A solution is optimally fair if
any constrained user traverses at least one saturated cut
separating source from destination, such that his throughput
is larger than or equal to the throughput of all other users (if
any) sharing the same cut. This cut is called the bottleneck for
the user.
We note that the two definitions are formally very similar: to
make them equivalent, we need only substitute the concept of
saturated trunk with that of saturated cut. In particular, definition
(2) reduces to definition (1) for tree network topologies, where
each cut consists of only one trunk.
The optimal fairness definitions are consistent with the
aforementioned principle of optimality. First, efficient use of
resources is ensured by the fact that each constrained user traverses
at least one saturated section. In fact, the existence of a saturated
section implies that no further increase in the aggregate throughput
of the users sharing the section is possible. Secondly, fairness is
ensured by the fact that all constrained users sharing the same
bottleneck have the same throughput.
It is worth noting that both the Jaffe and Gallager optimality
criteria satisfy the above conditions, at least in the limiting case.
Jaffe's criterion is identical to condition (1) when X → ∞ (i.e.
trunk utilization → 1). Of course, our condition allows also for
non-saturated users, which are not considered in Jaffe's formulation. Likewise, Gallager's solution satisfies condition (2) when P
>> T, i.e. the throughput penalty functions become very large,
and all are identical. Under these conditions, each constrained user
in Gallager's optimal solution must traverse a saturated cut, or else
the value of the objective function could be reduced (and the solution improved) by increasing the throughput of such a user. Furthermore, all constrained users sharing the same bottleneck must
have equal throughput; otherwise, we could always reduce the
objective function by equalizing the throughputs, while keeping the
sum of the throughputs constant.
These last observations are important, because they allow us
to use the Jaffe and G-G algorithms to find fair and efficient solutions, as we show in the next section.
We have established fairness conditions under the assumption
that trunks can be fully saturated. In reality, trunk utilization must
be less than unity; otherwise, network delays become infinite. One
way to overcome this problem is to find the optimally fair solution
at saturation, and then scale down the constrained users first, until
the constrained users' throughput is identical to the throughput of
some unconstrained user sharing the same bottleneck. At this
point, the unconstrained user is reclassified as a constrained user.
The constrained-user scale-down procedure is then reiterated, picking up and reclassifying other unconstrained users as appropriate,
until the acceptable delay or trunk utilization is reached.
5. EXPERIMENTAL RESULTS
In this section, we apply the optimal fairness algorithms
presented in section 4 to some medium-size network examples.
Experiments were performed using both input rate control and window control. For each case, both fixed and optimal routing solutions were investigated.
5.1 Input Rate Control Experiments
We start by considering the topology in Fig. 5.1. Initially, we
assume that only links 1 through 7 are present (i.e. links 8 and 9
are removed). This renders the topology a tree and forces a single-path routing solution. Five chains (i.e. user sessions, or virtual circuits) are considered, as shown by the dotted lines in Fig. 5.1.
FIG. 5.1  INPUT RATE FLOW CONTROL EXAMPLE
4. FAIRNESS IN FLOW CONTROLLED NETWORKS
In the previous section, we presented the conditions for
optimal fairness. In this section, we are concerned with finding
flow control and routing solutions which satisfy such conditions.
We start by distinguishing between two types of flow control. In
the rate control mode of operation, we assume that input rates can
be directly set during optimization. This is the basic assumption
made in both [JAFF 80] and [GALL 80]. In window control mode,
we can manipulate input rates only indirectly, by changing the windows of the various user sessions. This is a more realistic model of actual network operation.
Link capacities are assumed to be all 1 pkt/sec. Initial chain input
rates are also 1 pkt/sec. The optimally fair solution is shown by the
first column in Table 5.2.
until no reduction in the standard deviation of user throughputs is
noted.
The results are presented in Table 5.4, where we give the
throughputs, window sizes and standard deviations before and after
balancing.
Table 5.2. Input Rate Flow Control Results

        WITH LINKS   WITH     WITH     WITH LINKS
CHAIN   1-7 ONLY     LINK 8   LINK 9   8 & 9
  1        0.50       0.50     1.00      1.00
  2        0.50       0.50     1.00      1.00
  3        0.33       0.50     0.33      0.67
  4        0.33       0.75     0.33      0.67
  5        0.33       0.75     0.33      0.67
This solution was first obtained by inspection, and was later
verified by minimizing the function F (see Eq. (4)) using very large
penalty functions. Clearly, for tree topologies the fixed path solution corresponds to the optimal routing solution. Note that link 6
is the bottleneck for chains 1 and 2, and therefore its capacity is
equally subdivided between them. Likewise, link 5 is the
bottleneck for chains 3,4, and 5.
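The first column of Table 5.2 can be reproduced by a progressive-filling (max-min) computation. The chain-to-link paths below are assumptions consistent with the text (chains 1 and 2 share bottleneck link 6; chains 3, 4, and 5 share link 5; chain 3 also traverses link 2), not the full paths of Fig. 5.1:

```python
# Progressive filling: raise all unbottlenecked chains at the same pace;
# freeze a chain once one of its links saturates.

def max_min_rates(paths, capacity):
    rates = {c: 0.0 for c in paths}
    residual = dict(capacity)
    active = set(paths)                  # chains not yet bottlenecked
    while active:
        # largest equal increment before some link saturates
        step = min(residual[l] / sum(1 for c in active if l in paths[c])
                   for c in active for l in paths[c])
        for c in active:
            rates[c] += step
            for l in paths[c]:
                residual[l] -= step
        saturated = {l for l, res in residual.items() if res < 1e-9}
        active = {c for c in active if not (set(paths[c]) & saturated)}
    return rates

paths = {1: [6], 2: [6], 3: [2, 5], 4: [5], 5: [5]}
cap = {l: 1.0 for l in range(1, 8)}      # all links 1 pkt/sec
rates = max_min_rates(paths, cap)
print({c: round(r, 2) for c, r in sorted(rates.items())})
# -> {1: 0.5, 2: 0.5, 3: 0.33, 4: 0.33, 5: 0.33}
```

The output matches the first column of Table 5.2.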
Next, we add links to the network to introduce multi-path
routing. Adding link 8 relieves the bottleneck at link 5; chain 3 is
now bottlenecked at link 2, and the residual capacity of the cut
(5,8) is divided evenly among chains 4 and 5. Similarly, adding
link 9 relieves the bottleneck of chains 1 and 2, doubling their
throughputs; finally, adding both links 8 and 9 doubles all
throughputs from what they were with links 1 through 7 alone.
We also considered the more highly connected topology of
Fig. 5.3.
Table 5.4 Window Flow Control Results-Fixed Paths

         Before Balancing           After Balancing
chain    W      γ     Std.Dev.      W      γ     Std.Dev.
  2      1    .196                  2    .296
  5      1    .215     .085         2    .316     .032
  7      1    .351                  1    .254
  2      4    .313                  4    .324
  5      4    .270     .064         5    .331     .007
  7      4    .396                  3    .317
  2     10    .345                 10    .335
  5     10    .266     .060        13    .324     .007
  7     10    .384                  9    .336
Note how the reduction in standard deviation is greatest for large
window sizes; this is because the throughputs are concave functions
of the window sizes, i.e., ∂γ/∂W decreases with W, allowing a
greater degree of "fine tuning" at larger window sizes.
Next, we introduce optimal routing, and seek a set of windows which optimize the objective function F.
To solve this problem, we first let W_r = W for r = 1,2,...,8,
and postulate that there is a unique value of W, W_opt, for which F
is minimum; this hypothesis has been verified experimentally. W_opt
can be determined by using standard bisection techniques requiring
5-10 applications of the window control routing algorithm of
[GERL 80b]. Once W_opt has been determined, we attempt to vary
individual windows to further reduce F. As in the previous section,
the goal is to equalize user throughputs, causing a reduction in the
sum of convex penalty functions.
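The search for W_opt can be sketched as an interval-reduction search over integer windows, valid under the unimodality hypothesis. The objective below is an illustrative stand-in; in the paper each evaluation of F requires one run of the window control routing algorithm of [GERL 80b]:

```python
# Integer interval-reduction (ternary) search for the minimizer of a
# unimodal objective F(W).

def find_w_opt(F, lo, hi):
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if F(m1) < F(m2):
            hi = m2                      # minimizer cannot lie above m2
        else:
            lo = m1                      # minimizer cannot lie below m1
    return min(range(lo, hi + 1), key=F)

F = lambda w: (w - 6) ** 2 + 8.0         # illustrative unimodal objective
assert find_w_opt(F, 1, 30) == 6
```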
Following [GALL 80], we use penalty functions of the form

−P′_r(γ_r) = (a γ°_r / γ_r)^b     (4.1)
chain   source   dest.
  1        1       6
  2        6       3
  3        2       4
  4        5       3
  5        5       2
  6        4       3
  7        3       2
  8        4       5

FIG. 5.3  A HIGHLY CONNECTED NETWORK EXAMPLE
Minimizing F with large penalties, we find that γ_r = 0.5 for r =
1,2,...,8. This result indicates that the topology, connectivity, and
source-destination pair selection in this example are sufficiently uniform to allow all throughputs to be equalized. In general, then,
fairness optimization tends to maximize total network throughput
while minimizing the differences among individual throughputs. The
degree to which the latter can be accomplished depends on the
source-destination pair placements, link capacities and network
topology.
5.2 Window flow control
In our window flow control experiments, we again consider
the network of Fig. 5.3. Initially, we assume fixed, single-path
routing. We choose the routes for chains 2,5, and 7 in such a way
that they all traverse link (2,3) (which then becomes the
bottleneck.) In order to obtain a fair set of windows, we start by
assigning the same window size to each chain, and then increase
(decrease) the window of the user whose throughput is most below
(above) the average of link (2,3) users. We continue this process
Here, P′_r(γ_r) is the first derivative of the penalty function for chain
r, and a and b are coefficients which determine its severity. Using
the network of Fig. 5.3, we performed experiments for three
different sets of penalty functions, and, for each one, evaluated F at
the following stages of optimization:
--with fixed routing;
--after routing optimization;
--after determining W_opt;
--after adjusting individual windows.
Finally, we computed the optimal solution for input rate control.
This solution clearly gives us the lower bound on F, to be compared with window control results.
The results are given in Table 5.5.
Table 5.5 Window Flow Control Results-Optimal Routing

        Fixed      Routing    Window     Indiv.     Rate
RUN     Routing    Opt.       Opt.       Wdws.      Control
 1        18.49       9.97       8.02       8.00       5.58
 2        71.08      29.54      28.50      28.33      27.95
 3      22316.33    5360.62    3319.44    3194.91    3094.37

Penalty functions F1, F2, and F3, each of the form (4.1) with coefficients of increasing severity, were used in runs 1, 2, and 3, respectively.
[KAMO 79]  Kamoun, F., "A Drop and Throttle Flow Control
(DTFC) Policy for Computer Networks," presented at
the 9th Int. Teletraffic Congress, Spain, October 1979.
[LAM 77]  Lam, S.S., and M. Reiser, "Congestion Control of
Store and Forward Networks by Input Buffer Limits,"
Proc. Nat. Telecommun. Conf., Los Angeles, Calif.,
December 1977.
[REIS 79]  Reiser, M., "A Queueing Network Analysis of Computer Communication Networks with Window Flow
Control," IEEE Transactions on Communications, August
1979, pp. 1199-1209.
Windows after individual adjustment:
F1: W_opt = {2,2,1,1,1,1,1,1}
F2: W_opt = {7,7,6,6,6,6,6,6}
F3: W_opt = {24,18,18,18,18,24,18,17}
The largest reduction in F occurs after routing optimization,
which both increases throughput and decreases delay, thereby
reducing both components of the joint objective function. The
decrease in F obtained by varying individual windows is once again
greatest for large window sizes. Note that for medium and large
window sizes, we are able to come quite close to the lower bound.
This research was supported by ONR Grant
N00014-79-C-086
6. CONCLUSIONS
In this paper, we have defined general conditions for fairness
in a packet switched network, and have presented and demonstrated
algorithms for the implementation of fair routing and flow control
policies. The algorithms adjust input rates directly (input rate flow
control case) or indirectly (window flow control case) so as to
satisfy the aforementioned conditions. Simple network examples
show that substantial improvements in fairness are possible by
adjusting the routing and flow control parameters. These results
may find practical applications in the following areas: selective window size assignment in multi-user networks; design of input rate
flow control schemes; evaluation of adaptive routing schemes.
BIBLIOGRAPHY
[CROW 73] Crowther, W., et al., "A System for Broadcast Communication: Reservation-ALOHA," Proc. 6th HICSS,
University of Hawaii, Honolulu, January, 1973.
[FRAT 73] Fratta, L., M. Gerla, and L. Kleinrock, "The Flow
Deviation Method: An Approach to Store-and-Forward
Communication Network Design," Networks, vol. 3, no.
2, April 1973, pp. 97-133.
[GALL 80] Gallager, R. G., and S. J. Golestaani, "Flow Control
and Routing Algorithms for Data Networks," Proc. Intl.
Conf. on Computer Comm., Atlanta, Georgia, October
1980, pp. 779-784.
[GERL 80a] Gerla, M., and L. Kleinrock, "Flow Control: A Comparative Survey," IEEE Transactions on Communications,
April 1980, pp. 553-574.
[GERL 80b] Gerla, M., and P. O. Nilsson, "Routing and Flow
Control Interplay in Computer Networks," Proc. Intl.
Conf. on Computer Comm., Atlanta, Georgia, October
1980, pp. 84-89.
[GIES 78]  Giessler, A., J. Hanle, A. Konig, and F. Pade, "Free
Buffer Allocation - An Investigation by Simulation,"
Computer Networks, Vol. 2, 1978.
[JAFF 80]  Jaffe, J. M., "A Decentralized, 'Optimal,' Multiple-User, Flow Control Algorithm," Proc. Intl. Conf. on
Computer Comm., Atlanta, Georgia, October 1980, pp.
839-844.
PERFORMANCE MODELS
OF
DISTRIBUTED DATABASES
BY
Victor O.K. Li
PHE 526
Department of Electrical Engineering-Systems
University of Southern California
Los Angeles, CA 90007

This research was supported in part by the Office of Naval Research under
Contract N00014-77-C-0532.
1. Introduction

A distributed database (DDB) consists of copies of datafiles (often redundant) distributed on a network of computers.
Some enterprises, such as military Command, Control and Communications systems, are distributed in nature; since command posts
and sensory gathering points are geographically dispersed,
users are necessarily dispersed. Other potential users are
airline reservation systems, and electronic funds transfer
systems. A typical user is an enterprise which maintains
operations at several geographically dispersed sites, and whose
activities necessitate inter-site communication of data. The
distribution of data in a network also offers advantages over
the centralization of data at one computer. These advantages
include:
improved throughput via parallel processing, sharing
of data and equipment, and modular expansion of data management
capacity.
In addition, when redundant data is maintained, one
also achieves increased data reliability and improved response
time (see [12], [16]).
There are two major implementation problems associated
with distributed databases. The first problem is that communication channels between sites are often very slow compared to
the storage devices at the local computer sites.
For example, the ARPANET can move data at about 25 kbps (kilobits/sec) while
standard disks can move data at about 1 Mbps (megabits/sec), a
40-fold difference in rate. In addition, networks have relatively
long access times, corresponding to the propagation delay for
one message to go from one computer site to another. (This
propagation delay is about 0.1 sec for the ARPANET.) The
other problem is that communication channels and computer sites
are susceptible to failures, giving rise to networks that may
have constantly changing topologies.
2. Key Technical Problems
Some of the problems associated with distributed databases
are the same as those for centralized databases and can therefore
use the same solutions.
Such problems include*: choosing a
good data model, designing a schema, etc.
However, mainly
because of the two implementation problems associated with the
distributed database, the following problems require significantly
different approaches:
*The reader is referred to Date [5] for a definition of these terms.
(1) query processing-a query accessing data stored at different
sites requires that data be moved around in the network.
The communication delay, and hence the response time,
depends strongly on the choice of a particular data
storage and transfer strategy.

(2) concurrency control-in centralized databases, locking
is the standard method used to maintain consistency among
redundant copies of data. The distributed nature of the
data in a DDB means that setting locks produces long message
delays.

(3) reliability/survivability-the network introduces new components (communication links, computers) where failures can
occur, and hence the associated problems of failure detection
and failure recovery.

(4) file allocation-the problem of how many copies of each data
file to maintain and where to locate them. The use of
additional redundant copies generally means reduced communication delay associated with data retrieval. Unfortunately,
it also means increased delay associated with update
synchronization. The problem is difficult not only because
of varying file request rates due to the users, but also
because of the dynamic nature of the network topology.
The majority of research reported in the literature has been
on the development of concurrency control algorithms.
However, little has been done to compare the performance of the
different proposals. Bernstein and Goodman [1] analyzed the
performance of principal concurrency control methods in qualitative terms. The analysis considers four cost factors:
communication overhead, local processing overhead, transaction
restarts and transaction blocking.
The assumption is that
the dominant cost component is the number of messages transmitted.
Thus distance between database sites, topology of network and
queueing effects are ignored. A quantitative comparison is
described in Garcia-Molina [8]. He compared several variants
of the centralized locking algorithm with Thomas' Distributed
Voting Algorithm [18] and the Ring Algorithm of Ellis [6]. The
major assumptions are (1) a fully redundant database, and (2) that the
transmission delay between each pair of sites is constant.
The first assumption requires that the whole database is fully
replicated at each node. This is necessary because Garcia-Molina
did not want to model query processing, which would have been
necessary for a general (not fully redundant) database.
The
second assumption means that the topology, message volume and
queueing effects of the communication subnetwork will be ignored.
In addition, although Garcia-Molina was primarily interested
in locking algorithms, he did not analyze the effect of deadlocks
on their performance.
This paper describes the development of
a performance model which attempts to remedy the shortcomings
associated with previous work in this area.
3. The Performance Model
The basic architecture of a DDB consists of database sites
connected to each other via a communication subnetwork. At each
database site is a computer running one or both of the software
modules:
Transaction Module (TM) and Data Module (DM). The
TM supervises user interactions with the database while the
DM manages the data at each site.
We propose a 5-step approach to model the performance of
concurrency control algorithms:
(1) Input Data Collection-Given a DDB managed on an arbitrary
communication network, we have to determine the following:
(a) topology of the network, i.e. the connectivity and
capacity of links between computer sites
(b)
locations of all copies of files
(c)
arrival rates of the different transactions.
(2) Transaction Processing Model-Consider transaction T arriving
at database site i and processed by TM_i. Suppose T reads
data items X,Y and writes U,V, where U=f(X,Y), V=g(X,Y). This
update will be performed in two steps.
(a) Query Processing-TM_i will devise a query processing
strategy to access X and Y and to produce the values
of U and V at database site i. To model concurrency
control algorithms accurately, we have to model query
processing. Previous researchers got around the query
processing problem by assuming a fully redundant
database, in which case all queries will be addressed
to the local site and incur zero communication delay.
We do not believe this is a realistic assumption, and
are confronted with the problem of modelling query
processing. This is the object of Li [13].
(b) Write-The new values of U and V will be written into
the database. This is accomplished by the two-phase
commit algorithm (see [t] and [9]):

(i) Pre-commits-TM_i sends the new values of U and V to
all DM's having copies of U and V, respectively.
The DM's then copy the new values to secure storage
and acknowledge receipt.

(ii) Commits-After all DM's have acknowledged, TM_i
sends commit messages, requesting the DM's
to copy the new values of U and V from secure
storage into the database.
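The two steps can be sketched in code; the classes and message passing below are simplified stand-ins for the TM and DM modules, not an implementation of them:

```python
# Two-phase commit sketch: pre-commit writes new values to secure
# storage at every DM holding a copy; commit then installs them.

class DataModule:
    def __init__(self):
        self.secure = {}                 # stable (secure) storage
        self.database = {}

    def pre_commit(self, item, value):
        self.secure[item] = value        # save to secure storage ...
        return "ack"                     # ... and acknowledge receipt

    def commit(self, item):
        self.database[item] = self.secure.pop(item)

def two_phase_commit(updates, copies):
    # copies: item -> list of DMs holding a copy of that item
    # Phase 1: pre-commit to every copy; proceed only if all acknowledge.
    acks = [dm.pre_commit(item, value)
            for item, value in updates.items()
            for dm in copies[item]]
    if any(a != "ack" for a in acks):
        return False
    # Phase 2: request every DM to install the values from secure storage.
    for item in updates:
        for dm in copies[item]:
            dm.commit(item)
    return True

dm1, dm2 = DataModule(), DataModule()
assert two_phase_commit({"U": 3, "V": 7}, {"U": [dm1, dm2], "V": [dm2]})
assert dm1.database["U"] == 3 and dm2.database["V"] == 7
```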
Using our Transaction Processing Model, we can determine,
for each particular transaction, the file transfer, read and
write messages that are necessary.
This information, together
with the transaction arrival rates and the file locations,
lets us generate estimates for f_ij, the arrival rate of messages
at site i destined for site j.
(3) Communication Subnetwork Model-Using the message flow
requirements between database sites, f_ij, and the network
topology as input to a routing strategy, such as Gallager's
Minimum Delay Routing Strategy [7], we can determine the
total traffic on each channel of the network. Kleinrock
[11] developed a network of queues model to analyze the
message delay in a communication network.
The major
assumption of Kleinrock's model is the Independence
Assumption, which says that the lengths of a message at
successive channels in its path through the network are
independent.
In Li [12], we have pointed out some of the
inadequacies of the Independence Assumption and have proposed
a new Independent Queues Assumption.
This assumption is
somewhat stronger than the Independence Assumption, but
has more flexibility in modeling a communication subnetwork.
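Under the Independence Assumption each channel behaves as an independent M/M/1 queue, and Kleinrock's average delay takes the form T = (1/γ) Σ_i λ_i/(μC_i − λ_i), where γ is the total message arrival rate, λ_i the flow on channel i (messages/sec), C_i its capacity (bits/sec), and 1/μ the mean message length (bits). A sketch with illustrative traffic values:

```python
# Average network message delay under Kleinrock's independence assumption.

def average_delay(link_flows, link_caps, mean_msg_len, total_arrivals):
    mu = 1.0 / mean_msg_len              # messages per bit
    mean_in_system = 0.0
    for lam, cap in zip(link_flows, link_caps):
        assert lam < mu * cap, "channel utilization must stay below 1"
        mean_in_system += lam / (mu * cap - lam)   # M/M/1 occupancy
    return mean_in_system / total_arrivals         # Little's law

# Two 25 kbps channels, 1000-bit messages (ARPANET-like numbers).
T = average_delay(link_flows=[8.0, 5.0], link_caps=[25000.0, 25000.0],
                  mean_msg_len=1000.0, total_arrivals=10.0)
assert 0.07 < T < 0.08
```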
(4) Conflict Model-The conflict model lets us determine the
probability of conflicts between transactions and the delay
due to conflicts.
This is probably the most important
component of the performance model.
Each concurrency
control algorithm has a distinct conflict model.
Fortunately,
although the literature abounds in concurrency control
methods,
they can be classified into two major approaches,
namely, timestamp ordering and two-phase locking.
In Li [12], we have developed the conflict model for SDD-1
[15], a timestamp ordering algorithm.
The performance of a locking algorithm depends very much
on the deadlock resolution technique associated with it.
In Li [12], we have analyzed locking algorithms using the
Prioritized Transactions and Ordered Queues technique
for deadlock resolution.
In addition, we estimated the
probability of deadlocks for a simple locking algorithm.
(5) Performance Measures-We emphasize the performance measure
most visible to the users, namely response time, which is
the sum of local processing delay at the database sites,
transmission delay and delay due to conflicts.
4. Suggestions for Further Research

We propose to investigate the following related research topics:
(1) Message Delay in a Computer Network with Failing Nodes and Links
Since the DDB is managed on a computer network, analysis of
the message delay in the underlying communication network is
an important problem. Kleinrock [11] has determined the average
message delay for all messages in a communication network. In
Li [12], we developed a model for finding the end-to-end message
delay, i.e. the message delay between any pair of nodes.
Both studies, however, have assumed that the communication
channels and the computers are perfectly reliable. This is
unrealistic in general, and especially so in a military
C3 environment. We therefore propose to analyze the effect of
node and link failures on the message delay.
(2) Develop conflict models for concurrency control algorithms
In Li [12], we have developed conflict models for four
concurrency control methods. We plan to develop conflict models
for other concurrency control algorithms. In addition, we
would like to improve on existing models by relaxing some of the
assumptions.
(3) Develop new query processing strategies amenable to distributed implementation
Existing query processing algorithms (see [3], [10], [19])
assume that the algorithms will be implemented in a centralized
fashion. One of the computer sites, the central node, gets
information from all other nodes on the delay on the communication links and uses this information to compute the optimal
query processing strategy. This necessitates the transmission
of information from all nodes in the network to the central
node, plus the transmission of instructions from the central
node to the file nodes to coordinate the query processing. There
is also the problem of what to do when communication links and
computers fail. The central node may not be able to get all
the information it requires. In particular, if the central
node fails, the whole system will fail. In Li [13] we have
developed the MST and the MDT algorithms. The MST Algorithm
minimizes the total communication costs associated with a query
while the MDT Algorithm minimizes the response time. While
these algorithms can also be implemented using a centralized
algorithm, they are significantly different from previous work
in that they are particularly suited for distributed implementation,
in which each node in the network bases all its decisions on
information received only from its neighbors and it is not
necessary to have a central node.
In addition, previous research ([3], [10], and [19])
addresses itself only to non-redundant databases, i.e. only
one copy of each file is maintained in the database. The MST
and MDT Algorithms can be easily generalized to redundant
databases by employing the artificial file node technique
developed in Li [13].
The present versions of the MST and the MDT Algorithms,
however, do suffer from strict assumptions, namely (1) each
file accessed by the query has the same size, and (2) the
selectivity parameters have size one. We would like to develop
a practical query processing algorithm by relaxing these assumptions.
(4) Develop new file allocation algorithms
Existing research efforts on file allocation have concentrated
on variants of the following problem:
Given a description of user demand for service stated as
the volume of retrievals and updates from each node of the network to each file, and a description of the resources available
to supply this demand stated as the network topology,
link
capacities and the node capacities; determine an assignment of
files to nodes which does not violate any capacity constraints
and which minimizes total costs.
Our literature research efforts reveal that there are
currently three basic approaches to this problem:
(1) Static File Allocation-Assume that the rate of
request for service is time-invariant and formulate the
problem into a nonlinear zero-one integer programming
problem. This is the approach taken by Chu [4] and
Casey [2].
(2)
Dynamic File Allocation-Segall [17] and Ros Peran
[14] assumed that the rate of request for service is
varying, but that each file can be allocated independently
of other files in the network.
The problem is formulated
into a dynamic programming problem.
(3)
Heuristic File Allocation-The use of heuristics to
reduce the computational complexity of finding an
acceptable solution.
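As a toy illustration of the static zero-one formulation (approach (1) above), the following sketch exhaustively enumerates assignments of files to nodes and keeps the cheapest one that respects node storage capacities. The cost model (per-node demand volumes multiplied by a node-to-node transfer cost) and all numbers are hypothetical; realistic formulations are solved with integer programming techniques rather than enumeration.

```python
from itertools import product

def allocate(files, nodes, demand, transfer, capacity, size):
    # demand[n][f]:   request volume from node n for file f
    # transfer[a][b]: cost per unit of traffic between nodes a and b
    # One copy per file (non-redundant database), zero-one assignment.
    best_cost, best_assign = float("inf"), None
    for assign in product(range(nodes), repeat=files):
        # capacity check: total size of the files placed at each node
        used = [0] * nodes
        for f, n in enumerate(assign):
            used[n] += size[f]
        if any(used[n] > capacity[n] for n in range(nodes)):
            continue
        cost = sum(demand[n][f] * transfer[n][assign[f]]
                   for n in range(nodes) for f in range(files))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# 2 files, 3 nodes; hypothetical demand and transfer costs
demand = [[5, 0], [1, 4], [0, 2]]
transfer = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
best_cost, best_assign = allocate(2, 3, demand, transfer,
                                  capacity=[1, 1, 1], size=[1, 1])
# best assignment places file 0 at node 0 and file 1 at node 1
```

Note that the enumeration grows as nodes**files, which is exactly why the cited work resorts to integer programming and heuristics.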
There are three major shortcomings that limit the usefulness of existing algorithms for file allocation:

(1) Existing models assume that each query accesses a single site, while in reality query processing usually involves retrievals from several geographically distributed database sites. Since communication delays are substantial, the distinction between single-site and multiple-site data retrieval is important. Moreover, different query processing schemes will incur substantially different transmission delays.
(2) Existing models neglect synchronization costs. When redundant copies of a file are updated, existing algorithms assume that the only costs incurred are the transmission costs from the site performing the update to all sites containing copies of the file. The cost of synchronization, which varies with the synchronization scheme (e.g. locking, timestamp ordering) and with the file allocation, is completely neglected.
(3) Existing algorithms assume a reliable communication network and reliable computer sites, except for Ros Peran's work [14], which allowed for node (computer site) failures and recoveries.
We propose to develop a file allocation algorithm that will account for the costs of multiple-site queries and update synchronization.

To quantify the costs of multiple-site queries, we have to assume a query processing strategy. Suitable candidates for this strategy are the MST and MDT Query Processing Strategies that we shall develop in Research Task (3). Given the identities and locations of the files accessed by a query, these strategies will let us estimate the response time of the query. To quantify the costs of update synchronization, we need to calculate the probability of conflicts, and the delay due to conflicts, for updates. The conflict models that we shall study in Research Task (2) will furnish these parameters.

We shall also study the effect of node and link failures.
REFERENCES

[1] P.A. Bernstein and N. Goodman, "Fundamental Algorithms for Concurrency Control in Distributed Database Systems," Computer Corporation of America, February 15, 1980.

[2] R.G. Casey, "Allocation of Copies of Files in an Information Network," Proceedings AFIPS 1972 Spring Joint Computer Conference, AFIPS Press, Vol. 40, 1972, pp. 617-625.

[3] W.D.M. Chiu, "Optimal Query Interpretation for Distributed Databases," Ph.D. Dissertation, Division of Applied Sciences, Harvard University, December 1979.

[4] W.W. Chu, "Optimal File Allocation in a Computer Network," in Computer Communication Networks, F.F. Kuo, editor, Prentice-Hall Computer Applications in Electrical Engineering Series, Prentice-Hall Inc., Englewood Cliffs, N.J., 1973.

[5] C. Date, An Introduction to Database Systems, 2nd Ed., Addison-Wesley, 1977.

[6] C.A. Ellis, "A Robust Algorithm for Updating Duplicate Databases," Proc. 2nd Berkeley Workshop on Distributed Databases and Computer Networks, May 1977.

[7] R.G. Gallager, "A Minimum Delay Routing Algorithm Using Distributed Computation," IEEE Trans. on Comm., Vol. COM-25, No. 1, January 1977, pp. 73-85.

[8] H. Garcia-Molina, "Performance of Update Algorithms for Replicated Data in a Distributed Database," Ph.D. Dissertation, Computer Science Department, Stanford University, June 1979.

[9] J. Gray, "Notes on Database Operating Systems," Report RJ2188, IBM Research Lab., San Jose, CA, February 1978.

[10] A.R. Hevner and S.B. Yao, "Query Processing in Distributed Databases," IEEE Trans. on Software Eng., Vol. SE-5, No. 3, May 1979.

[11] L. Kleinrock, Communication Nets: Stochastic Message Flow and Delay, McGraw-Hill, New York, 1964.

[12] V. Li, "Performance Models of Distributed Database Systems," Report LIDS-TH-1066, MIT Lab. for Information and Decision Systems, Cambridge, Mass., February 1981.

[13] V. Li, "Query Processing in Distributed Databases," submitted for publication.

[14] F. Ros Peran, "Dynamic File Allocation in a Computer Network," Electronic Systems Laboratory Report ESL-R-667, MIT, June 1976.

[15] J.B. Rothnie, P.A. Bernstein, S.A. Fox, N. Goodman, M.M. Hammer, T.A. Landers, C.L. Reeve, D.W. Shipman and E. Wong, "Introduction to a System for Distributed Databases," ACM Trans. on Database Systems, Vol. 5, No. 1, March 1980.

[16] J.B. Rothnie and N. Goodman, "A Survey of Research and Development in Distributed Database Management," Proc. 3rd Intl. Conf. on Very Large Data Bases, IEEE, 1977, pp. 48-62.

[17] A. Segall, "Dynamic File Assignment in a Computer Network," IEEE Trans. on Automatic Control, Vol. AC-21, April 1976, pp. 161-173.

[18] R.H. Thomas, "A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases," ACM Trans. on Database Systems, Vol. 4, No. 2, June 1979, pp. 180-209.

[19] E. Wong, "Retrieving Dispersed Data from SDD-1: A System for Distributed Databases," Rep. CCA-77-03, Computer Corp. of America, March 15, 1977.
ISSUES IN DATABASE MANAGEMENT SYSTEM COMMUNICATIONS

by

K.T. Huang
W.B. Davenport, Jr.

Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA 02139

The research was conducted at the MIT Laboratory for Information and Decision Systems, with support provided by the Office of Naval Research under Contract ONR/N00014-77-C-0532.
I. Introduction

Database management systems are among the most important and successful software developments of this decade. They have already had a significant impact in the field of data processing and information retrieval. Many organizations within the military have independently developed their own databases, on their own computers and database management systems, to support planning and decision making in C3 operations. Each DBMS has its own intended schema, access control, degree of efficiency, security classification, operational requirements, etc. Often, different database systems may contain data relevant to the same problem, although their structure and representation may differ. Bringing together all of these databases in several locations, in order to integrate information resources and build new kinds of applications to help C3 operations, will be beneficial.

One of the main problems in using these databases is the communication between them when we need to retrieve and update information. Existing data communication technology for computer networks does not yet provide a solution for the communication between these DBMSs.

This paper is dedicated to the study of the communications between nonintegrated, heterogeneous and distributed DBMSs. The concept of a database communication system is proposed to provide a way to integrate and share information in a heterogeneous database environment. The database communication system is a front-end software system of a DBMS. It presents to users the environment of a single system and allows them to access the data using a high-level data manipulation language, without requiring that the database be physically integrated and controlled.
In Section 2, we describe the motivations and difficulties of heterogeneous DBMSs and specify the goals of this system design. In Section 3, a relational data model is chosen as the global data model to support the communication, and several reasons for this choice are described. In Section 4, we describe the architecture of a database communication system and the functional characteristics of each of its components. In Section 5, some network configurations are described for integrating heterogeneous DBMSs by using database communication systems. Lastly, several problems requiring further research are discussed.
II. MOTIVATION AND OBJECTIVES

The Heterogeneous World of DBMSs

In the "real" world, resources are heterogeneous in nature (e.g. in size, shape, color, structure, etc.), and this is particularly true in the world of DBMSs. There are at least several dozen heterogeneous DBMSs commercially available today, e.g. IMS, S2000, TOTAL, IDMS, etc. We can distinguish heterogeneous DBMSs from several points of view.
1. Conceptual Model Approach

Traditionally, database models may be classified into three categories: hierarchical, network, and relational. Most commercially available systems implement some variant of one of the three models. For example, IMS is hierarchical, System 2000 is inverted hierarchical, TOTAL follows the CODASYL DBTG architecture, ADABAS is inverted network, and INGRES is relational.
2. Physical Model Approach

Although two DBMSs may have the same conceptual model, or may even be the same type of DBMS, they may have different data structures. For example, the storing of information about courses offered and the students taking them may well use different physical data structures.

[Figure: three alternative physical data structures (S1, S2, S3) for course and student records.]
With different data structures, the access paths will be different.
3. Data Manipulation Language Approach

The data manipulation language can be record-at-a-time or set-at-a-time; in other words, it can be low-level procedural or high-level non-procedural. This depends on the conceptual model and physical model the system has adopted, and also on the system itself. For example, in the relational system System R, the language can be SEQUEL or Query-by-Example.
4. Application Approach

From an application point of view, a DBMS can be classified as either a general purpose system or a special purpose system. TOTAL is a general purpose DBMS which is used for many different applications; the PARS (Programmed Airline Reservation) System is a special purpose system which serves only a specialized application. Systems used for different purposes support different facilities.
5. Machine Approach

The same DBMS can be implemented on different computers. The ARPANET-Datacomputer system is a typical heterogeneous system in which quite different types of computers are tied together, each implementing its own DBMS. Different computers may differ in their speed, memory size, storage management, etc.
6. System Control Approach

Viewed from the system control aspect, there are two types of systems: centralized and decentralized control systems. A centralized control system assumes the existence of one central control function to handle all systemwide global control; the LADDER-FAM (Language Access to Distributed Data with Error Recovery - File Access Manager) system developed at SRI [1,2] is an example. A distributed control system, in which control is completely distributed to each subsystem, is more reliable; the SDD-1 system of the Computer Corporation of America [3] is an example of this type.
Difficulties and Approaches

The large bulk of local data is produced at a variety of locations in many fields. Throughout business, scientific research and government, data exchange is very important in decision making, experimentation, management and control. The difficulties of communication between heterogeneous DBMSs can be identified as follows.

1. Data model - the conceptual models of different DBMSs may be different. A user having knowledge of one system may not be familiar with another system. Selecting a data model for every system, to provide a uniform view to the end user, is essential.

2. Data definition language - in addition to selecting a data model, a data definition language to support the description of the conceptual schema is also essential.

3. Data manipulation language - the user's query language cannot be the one for the local host schemas. It must be a query language that supports the global uniform schema. Because the end users don't know what data model the query will have to deal with, they are obviously unable to specify how something must be done, and so must instead specify what is to be done, i.e. the language must be nonprocedural.
4. Data integration - most of the databases set up by independent organizations are hard to integrate. It is also possible that inconsistencies exist between copies of the same information stored in different databases. Combining all local schemas to form a global schema is needed in order to provide an integration schema for them.

5. Data incompatibilities - the same objects in different DBMSs may be represented with different types, different schema names, different scales, etc. When integrating the DBMSs, we need to recognize these incompatibilities between data sources and identify them in the integration schema.

6. Processing results - once a result is obtained for a query, it is expressed in the form of the original data model, and it must be translated into the uniform data model. Can this result be saved and operated on later?

7. Data dictionary and directory schema - we must provide each end user with a unified directory such that he is able to see easily what data is available, where it is, and how to get it.

8. Access planning - with a high-level query language, the system should provide an optimizing strategy for each query in a distributed system.

9. Multiple-systems access - each query may reference data in two or more different systems. The system must coordinate their transactions.
10. Multiple view support - if the system is to support multiple schemas for each DBMS, so that users have the freedom to choose their own preferred query language and global schema, then the system must add more schema translators and query translators.

11. Control system - after integrating different DBMSs, the system has to have a system controller so as to control the network of DBMSs. The data manager must decide whether to use centralized or distributed control.
Design Objectives

Before we settle on the design approach, it is important to decide what goals we want to achieve.

1. Central view for users - all users' views are defined upon a global conceptual schema which is the union of the local schemata and the integration schema. It is hoped that, from the user's point of view, the system behaves in the same way as a centralized system, and the user is unaware that he may be dealing with heterogeneous local databases.

2. Generality - we wish the database communication system to be general, so that it can be used to integrate various database systems for various applications. In addition, we want to minimize the cost and effort and maximize the overall performance.

3. Flexibility for future extension - the volume and complexity of databases are growing very rapidly. We want the system to be flexible for future expansion at minimum cost.

4. Reliability - we hope that the communication between heterogeneous DBMSs does not fully rely on a centralized system. The communication capability should be distributed among all of the heterogeneous DBMSs.

5. Distributed control - based on the reliability and parallel processing issues, we want the communication between DBMSs to have distributed controls.

6. Security - when combining heterogeneous DBMSs, some confidential data in one system should often not be accessible to users in another system. The security facility must be reliable when checking access rights and protecting the data.
III. DATA MODEL

Because we are dealing with communications between different DBMSs supported by different data models, e.g. hierarchical, relational, etc., our approach is to select one data model to support a uniform conceptual schema for each DBMS, in order to provide users with a homogeneous view of the conceptual schema and also to serve as a bridge between the underlying models.

Many logical data models have been proposed which model the real world in terms of the objects of interest and the interrelations between them. In [4], the authors study 23 data models and attempt to establish the similarities and differences among them according to data model structure, logical access type, semantics and terminology. Recent research has focused in two directions. One is to enhance the conventional data models: the notion of "normal form theory" has led to a refinement of the relational model which attempts to capture more semantic information by explicitly expressing functional dependencies among data, and many authors have worked along this direction and built various semantic data models. The second approach has been to emphasize the identification of a basic, simple construct with clean semantics; such constructs may be easily combined in a meaningful fashion to represent complex varieties of semantic structure. It is clear that no one model is so superior that it is good for all users.

In view of the state of the art, we chose a relational data model as the global data model to provide the user with a central view of the databases, for the following reasons:
1. The relational data model shields the user from data formats, access methods and the complexity of storage structures.

2. It supports a high-level, non-procedural query language.

3. The storage and data structures are very simple; all data is represented in the form of records.

4. Access paths do not have to be predefined, and a number of powerful operators, e.g. select, project and join, are supported in the relational model for data retrieval.

5. Because of the decline of hardware costs and the rise of manpower costs, a high-level nonprocedural manipulation language is necessary to minimize the user's workload.

6. The relational model provides a simple and powerful interface to the data.

7. The relational model gives fast response to ad hoc queries, which often form a high percentage of all queries.

8. Advances in associative storage devices offer the potential of greatly improving the efficiency, and therefore the performance, of a relational system.
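To illustrate why these few operators make such a simple bridge language, select, project and (natural) join can be sketched over tables represented as lists of dictionaries. This is a toy in-memory model for exposition only, not the implementation of any particular DBMS.

```python
def select(table, pred):
    # SELECT: keep the rows satisfying a predicate
    return [row for row in table if pred(row)]

def project(table, attrs):
    # PROJECT: keep only the named attributes, dropping duplicate rows
    seen, out = set(), []
    for row in table:
        t = tuple((a, row[a]) for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(t))
    return out

def join(left, right):
    # NATURAL JOIN: combine rows agreeing on all shared attributes
    shared = set(left[0]) & set(right[0]) if left and right else set()
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]

# hypothetical course/student data, echoing the earlier example
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [{"sid": 1, "course": "C3"}, {"sid": 1, "course": "DB"},
            {"sid": 2, "course": "DB"}]
db_students = project(select(join(students, enrolled),
                             lambda r: r["course"] == "DB"),
                      ["name"])
# names of students enrolled in "DB": Ann and Bob
```

The same three calls express the query regardless of how each underlying DBMS physically stores the records, which is the property the global model needs.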
Based on this choice, we propose a database communication system which incorporates distributed heterogeneous systems into a unified entity and shares the information resources in a distributed manner.
IV. ARCHITECTURE OF DATABASE COMMUNICATION SYSTEMS

Although heterogeneous database management systems are geographically distributed, the existing approach to communication between heterogeneous DBMSs builds a single control system which cooperates and communicates between the different DBMSs over the computer network. One asks: why shouldn't the database control also be spread through each cooperating DBMS? Hopefully, doing so will provide better use of data resources and improve performance and reliability.

Our approach is to define a database communication system which serves as a front-end processor for the local DBMS(s) and as an interface to the computer network. It is a software system intended to link geographically distributed heterogeneous DBMSs together and to act as a bridge for communication between local DBMSs (see Fig. 1).

The basic underlying assumptions are:

1. It is possible to exchange information amongst the various systems, and they are willing to maintain information.

2. Each DBMS is considered able to execute a given local transaction.

3. There exists a communication network which connects the various DBMSs.

4. Access to a local DBMS is not affected by the operation of the database communication system, which should be transparent to the local user.
[Fig. 1. General Architecture of Heterogeneous DBMSs: each local DBMS, with its external schemas and views, is fronted by a database communication system, and the database communication systems are interconnected through the network.]
Functional Characteristics

The database communication system consists of three major units (Fig. 2):

* schema unit
* query unit
* control unit

The functional characteristics of each component within a unit are described separately in order to maintain modularity.
[Fig. 2. Architecture of the Database Communication System: a query unit (query translator, query optimizer, query recomposer), a schema unit (schema translator, local schema, global schema, integration schema, data dictionary/directory), and a control unit (concurrency control, integrity control, security control), interposed between the local DBMS and the network.]
a. Schema Unit:

The schema unit maintains the local schema and the integration schema. It consists of three components.

(i) schema translator

* reads a schema description of the local DBMS and translates it into a schema description in the global data model, and vice versa.
* this is done by a mapping of the data definition language and the structure of the data model.
* the schema translator may be different for different target DBMSs; a schema unit can have several different kinds of schema translators.

(ii) local schema and global schema

* the local schema is the schema translated by the schema translator from the local host schema.
* the global schema is the union of all local schemas and the integration schema of the database communication system.

(iii) integration schema

* consists of information about integrity constraints, data incompatibility and data redundancy.
* is set up at the time a DBMS joins the heterogeneous network.
* can be viewed as a small database.
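To make the schema translator's job concrete, the sketch below maps a toy hierarchical schema description into relational form by turning each record type into a relation and propagating the parent's key into each child as a foreign key. The dictionary-based schema description is invented for illustration and does not correspond to any particular DBMS's data definition language.

```python
def hierarchical_to_relational(record, parent_key=None, out=None):
    # record: {"name", "key", "fields", "children"}  (hypothetical DDL form)
    # Each record type becomes a relation; the parent's key is repeated
    # in each child as a foreign key to preserve the hierarchy.
    if out is None:
        out = {}
    attrs = list(record["fields"])
    if parent_key:
        attrs = [parent_key] + attrs
    out[record["name"]] = attrs
    for child in record.get("children", []):
        hierarchical_to_relational(child, record["key"], out)
    return out

# hypothetical course database: STUDENT records nested under COURSE
course_db = {
    "name": "COURSE", "key": "course_no",
    "fields": ["course_no", "title"],
    "children": [{"name": "STUDENT", "key": "sid",
                  "fields": ["sid", "sname"], "children": []}],
}
schema = hierarchical_to_relational(course_db)
# COURSE(course_no, title); STUDENT(course_no, sid, sname)
```

A real translator must of course also map the reverse direction and carry the local DDL's type and constraint information, which this sketch omits.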
b. Query Unit:

The query unit takes care of query processing, optimization and access strategy. It consists of three components.

(i) query translator

* translates a query in the global query language into a query accepted by the local DBMS.
* this is done by a mapping of the data manipulation language.
* the query is parsed and simplified.
(ii) query optimizer

* the query is decomposed into local subqueries which reference only local schemas, and queries which reference only the integration schema.
* the distributed query algorithm must provide an execution strategy which minimizes both the amount of data moved from site to site and the number of messages sent between sites. In addition, the algorithm should take advantage of the computing power available at all of the sites involved in processing the query.
* the algorithm must also take the query recomposer into account within the optimization strategy.

[Figure: Schema architecture - the global schema comprises the integration schema together with the local schemas, each local schema being translated from a local host schema.]
(iii) query recomposer

* the access strategies are then executed; the results of the execution are represented in the local host schemas, while the final answer must be described in terms of the global schema.
* the results of the local queries must be sent to the answer site, so that they can be put together and reformatted into the answer expected by the query.
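The decompose-execute-recompose cycle can be sketched as follows for the simple case of a selection over horizontally fragmented data. The site names and data are hypothetical, and a real optimizer would also choose data-shipping strategies between sites rather than simply unioning partial results.

```python
def decompose_and_run(sites, pred, attrs):
    # sites: {site_name: local table as a list of dicts} -- hypothetical
    # Each site runs the same selection/projection on its own fragment
    # (a local subquery); the answer site then unions the partial results.
    partial = {}
    for name, table in sites.items():
        partial[name] = [{a: row[a] for a in attrs}
                         for row in table if pred(row)]
    answer = []
    for rows in partial.values():
        for row in rows:
            if row not in answer:      # remove inter-site duplicates
                answer.append(row)
    return answer

sites = {
    "SITE1": [{"ship": "A", "speed": 12}, {"ship": "B", "speed": 30}],
    "SITE2": [{"ship": "B", "speed": 30}, {"ship": "C", "speed": 25}],
}
fast = decompose_and_run(sites, lambda r: r["speed"] > 20, ["ship"])
# ships faster than 20 knots, duplicates merged at the answer site
```

Only the (small) selected rows travel to the answer site, which is the data-movement saving the optimizer discussion above is after.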
c. Control Unit:

(i) Concurrency Control

* the concurrency control algorithm must have a synchronization protocol to preserve consistency in a distributed environment.
* it processes distributed, interleaved transactions by guaranteeing that all nodes in the system process the accepted updates in the same order.
* deadlock detection or prevention mechanisms must be provided.
* when system failures occur, the other nodes must be able to continue to operate, and the crashed nodes must be able to restore themselves to correct operation.

(ii) Integrity Control

There are two levels of consistency. Strong mutual consistency has all copies of data in the system updated at the same time. Weak mutual consistency allows the various copies of the data to converge to the same update status over time, but at any instant some copies may be more up-to-date than others. In a C3 operational system, we may want to adopt weak mutual consistency so as to use less processing time.
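As one concrete example of the kind of synchronization protocol the concurrency control component needs, the sketch below applies the basic timestamp-ordering rules to a single data item. This is a standard textbook mechanism shown for illustration, not the specific algorithm of any system discussed here.

```python
class TimestampOrdering:
    # Basic timestamp-ordering check for one data item (a sketch):
    # a read is rejected if a younger transaction has already written
    # the item; a write is rejected if a younger transaction has
    # already read or written it. A rejected transaction would
    # restart with a new, larger timestamp.
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item

    def read(self, ts):
        if ts < self.write_ts:
            return False               # too late: value already overwritten
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return False               # conflicts with a younger transaction
        self.write_ts = ts
        return True

item = TimestampOrdering()
assert item.write(1)        # transaction T1 writes the item
assert item.read(3)         # T3 reads it afterwards
assert not item.write(2)    # T2's late write is rejected (T3 already read)
```

Because every node applies the same timestamp comparison, all nodes accept and reject the same operations, giving the common update order required above without a central lock manager.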
[Figure: Query processing - a query against the global schema passes through the parser, query simplifier, query translator, query optimizer/decomposer and access planner; after execution, the query recomposer produces the final results.]
(iii) Security Control

* all data, dictionaries, programs and services must be protected from unauthorized access.
* all authorization information is kept locally and checked locally.
* a feedback encryption and decryption system must be provided to each node across the communication network.

(iv) Data Dictionary/Directory Schema

* indicates the storage nodes at which the various data are stored.
* consists of one central master directory and local subsets of the central directory.
* dictionaries and directories are the "bread and butter" software for a successful database administration function.
V. Heterogeneous DBMS Networks

By using a database communication system (DCS), heterogeneous DBMSs can be interconnected in several different configurations, according to the desired criteria. For example, several versions of the same type of system could be grouped together under a local database communication system, so that they can easily communicate without needing translation when a query references only the local DBMSs. Systems which store similar data can be grouped together at a first level, so that they can retrieve data and exchange information more efficiently. Systems which store confidential data can be put together, so that management and security control can be done more effectively.

A heterogeneous DBMS network using database communication systems as bridges for interconnection may have one of the following configurations:

1. Star architecture (centralized system)
2. Hierarchical architecture

3. Ring architecture

4. General architecture

5. Partition architecture

[Figures: in each configuration, local DBMSs (e.g. IMS, IDMS, TOTAL) attach to database communication systems, which are interconnected in a star, hierarchy, ring, general graph, or partitioned topology.]
VI. CONCLUSION

The database communication system is an approach to integrating heterogeneous database management systems. By integrating many independent, distributed information resources, we believe it should be helpful in information retrieval and decision making for solving C3 problems. Several problems need further study in order to make the system successful:

* query optimization
* distributed concurrency control
* translation rules
* security control

In the environment of C3 operations, time is a very important factor. The user should be able to form a query easily, and the system should retrieve the data and recompose it for presentation to the user quickly; query optimization is therefore a first priority. Concurrency control problems have been widely studied, mostly for centrally controlled systems; we need to study and develop algorithms which are suitable for a distributed environment. For the translation between global schema and local schema, and between a query in the global language and a query in a local language, we need to study the rules of translation for the different data models and manipulation languages. Security control is one of the most important problems in C3 systems: because the data are often integrated together, the security control of classified information is essential, and the mechanisms for checking access rights and for encrypting the information flowing throughout the network deserve further study. It is hoped that such a database communication system will increase the efficient usage and management of the information and data of C3 systems.
VII. REFERENCES

1. P. Morris and D. Sagalowicz, "Managing Network Access to a Distributed Database," Proc. 2nd Berkeley Workshop, pp. 58-67, 1977.

2. E.D. Sacerdoti, "Language Access to Distributed Data with Error Recovery," SRI Tech. Note 140, 1977.

3. J.B. Rothnie and N. Goodman, "An Overview of the Preliminary Design of SDD-1," Proc. 2nd Berkeley Workshop, pp. 39-57, 1977.

4. L. Kerschberg, et al., "A Taxonomy of Data Models," in Systems for Large Data Bases, P.C. Lockemann and E.J. Neuhold, eds., North-Holland, 1976.
MEASUREMENT OF INTER-NODAL DATA BASE COMMONALITY

by

D.E. Corman

The Applied Physics Laboratory
The Johns Hopkins University
Laurel, MD 20810

This work was supported by the Naval Electronic Systems Command under Task C3AO of Contract N00024-81-C-5301 with the Department of the Navy.
ABSTRACT

This paper presents the results of an analysis to establish general requirements for the inter-nodal data base commonality necessary for effective coordination of Over-the-Horizon Targeting (OTH-T) tactical operations. The technical approach has been first to develop a characterization of data base commonality, then to define measures of effectiveness (MOEs) which reflect this characterization, and finally to set requirements (i.e., a threshold on the MOE selected).
1.0 Introduction

The Over-the-Horizon Targeting (OTH-T) System provides a means for distributing targeting information originating from multiple sensors and sources to multiple users (nodes). These nodes include one or more central collection/processing nodes and one or more attack nodes equipped with anti-ship cruise missiles (ASCM). Each node maintains a data base containing dynamic information on shipping in an area of interest. When two or more nodes have overlapping areas of interest, it is essential that their respective data bases provide similar information in the overlapping areas.

The purpose of this paper is to provide a definition for, and means for measuring, data base commonality within the Over-the-Horizon Targeting System. The approach is first to determine what information must be extracted from a data base to support a targeting mission. It is this processed information, the targeting file, which should be compared for commonality between nodes pursuing the same mission in an overlapping area of interest. Measures of effectiveness (MOEs) are then developed that summarize the agreement between targeting files. The MOEs are shown to be simple to compute and use, and are related to estimates of acquisition probability provided by the fire control system.
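The MOEs themselves are developed later in the paper. Purely to illustrate the idea of setting a threshold on an MOE, the sketch below computes one naive commonality measure: the fraction of one node's targeting tracks that have a counterpart in another node's file within a positional tolerance. The measure, the track positions, the tolerance and the threshold below are all hypothetical and are not the paper's MOEs.

```python
import math

def commonality(file_a, file_b, tol):
    # Fraction of tracks in file_a with a counterpart in file_b
    # closer than `tol` (an illustrative measure, not the paper's MOE).
    if not file_a:
        return 1.0
    matched = sum(
        1 for (xa, ya) in file_a
        if any(math.hypot(xa - xb, ya - yb) <= tol for (xb, yb) in file_b))
    return matched / len(file_a)

# hypothetical targeting-file positions (nmi) at two nodes
node_a = [(0.0, 0.0), (10.0, 10.0), (50.0, 0.0)]
node_b = [(0.5, 0.0), (10.0, 9.0)]
moe = commonality(node_a, node_b, tol=2.0)
# 2 of node A's 3 tracks are matched in node B's file -> MOE = 2/3
meets_requirement = moe >= 0.5    # hypothetical threshold
```

A requirement is then stated exactly as in the paper's approach: the chosen MOE must exceed a selected threshold for the overlapping area of interest.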
2.0 Background

In order to discuss data base commonality, it is first necessary to review the OTH-T System: how it receives, stores and processes data, and how this data is used in the fire control system to produce a targeting solution for an ASCM launch.
2.1 OTH-T System Description

The Over-the-Horizon Targeting system is responsible for the collection, dissemination and processing of information utilized in the targeting of anti-ship cruise missiles. As developed in References 1-4, the OTH-T System includes one or more central collection nodes (Regional Processing Groups - RPG) and one or more attack nodes (Attack Targeting Groups - ATG). Figure 1 shows a sample system configuration in which three sensor sources (Information Collection Groups - ICG) supply raw sensor data to one RPG. This RPG (either ashore or afloat) then provides processed data, in the form of correlated tracks, to two ATGs. Also illustrated in the figure is a direct path from the sensor to the attack node for operation in an augmented mode, when timeliness of data arrival is the critical issue. References 2 and 3 provide additional details on an investigation conducted at the Johns Hopkins University Applied Physics Laboratory (JHU/APL) to determine the future Navy Command and Control System Architecture. Reference 4 provides a description of the Over-the-Horizon Targeting System as implemented during a recent targeting demonstration.
A data unit collected from an ICG may consist of any combination
of observables including position, course, speed and/or line-of-bearing, as
well as an estimate of report uncertainty. The report may also include contact
identification or classification when available. At each of the processing
nodes, received reports are correlated with track histories resident within the
data base. A track history is a collection of one or more reports, arranged
in chronological order, which represents a unique ship moving within the
area of interest. If correlated, a report is added to the appropriate track
history. Otherwise, the report is either used to initiate a new track or set
aside as an ambiguity. References 5 and 6 provide details on an Area Tracking
and Correlation (ATAC) model utilized at JHU/APL to investigate multi-sensor,
multi-source correlation requirements.
The collection of track histories at a targeting node constitutes the
track file. Each track file is constantly changing with time as new information
is received. Track files held by different nodes will, in general, be different
even for the same area of interest, since each node may receive information
from different sensors/sources or the same information at different times.
From the track file is created the targeting file. This file
consists of the set of unique ship tracks to be used by the fire control system
in developing a targeting solution. Each track includes a position and
positional uncertainty estimate at a particular time and the rate of growth
of the uncertainty with time. These estimates are derived through application
of a Kalman filter algorithm to a subset of the contact reports resident in a
given track history. Reference 7 reports on a survey of available ship tracking
algorithms conducted at JHU/APL to assist in the selection of a tracking
algorithm for the OTH-T system. Reference 8 provides a type C specification
of the recommended ship tracking algorithm. This is the algorithm currently
specified for implementation at the processing centers and on ASCM equipped
targeting nodes.
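The projection step described above can be sketched as a simple dead-reckoning update. The snippet below is an illustration only, not the Reference 8 algorithm: it carries a track's mean position and positional covariance forward in time, with the uncertainty growing at an assumed linear rate.

```python
import numpy as np

def project_track(pos, vel, cov, growth_rate, dt):
    """Project a track's mean position and positional covariance forward
    by dt. Illustrative sketch: straight-line motion plus linear variance
    growth; the actual OTH-T tracker is specified in Reference 8."""
    projected_pos = pos + vel * dt                        # dead reckoning
    projected_cov = cov + growth_rate * dt * np.eye(2)    # growing uncertainty
    return projected_pos, projected_cov

# Example: ship at (10, 20) nmi moving east at 15 kt, projected 1 hour,
# with a (hypothetical) variance growth rate of 2 nmi^2 per hour
pos, cov = project_track(np.array([10.0, 20.0]), np.array([15.0, 0.0]),
                         4.0 * np.eye(2), growth_rate=2.0, dt=1.0)
```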
The targeting file, when displayed to an operator or when used
by the fire control system, can be thought of as a targeting picture, i.e., a
map showing the locations, motions, probability densities, and possible
identifications of a collection of ships in an area of interest at a particular
time. In what follows, data base commonality will be defined as the similarity
of targeting pictures at various nodes. Future paragraphs will provide more
details on what at first blush appears to be an imprecise statement.
[Figure: three sensor/sources feeding one RPG, which supplies correlated tracks to ATG Node 1 and ATG Node 2.]

Fig. 1  Sample OTH-T system configuration showing data flow (U).
2.2  Fire Control System
A fire control system, such as that employed on Anti-Ship Tomahawk
cruise missile equipped ships, utilizes the targeting file as supplied by the
OTH-T system to perform mission planning. The desired result of mission
planning is to develop fire control solutions which produce acceptable levels
of mission success given the known capability of the weapon system. Conversely,
mission planning should reject solutions which have a poor chance of success.
As applicable to the Navy Anti-Surface Warfare (ASUW) Mission, these targeting
effectiveness predictions are expressed by the probability that an ASCM
successfully acquires the target (P_acq). Two probabilities are relevant. The
first measure, P_acq_i (isolated probability of acquisition), gives the probability
that the missile will acquire the target assuming no other ships are present.
The second measure, P_acq_c (conditional probability of acquisition), gives the
probability of acquisition conditioned on the presence of other shipping and
the fact that they may be acquired before the target.
Data base commonality in the context of this paper is then defined
as the ability of two or more data bases maintained at separate targeting nodes
to support similar weapons employment decisions. Specifically, it is required
that, at a given time, separate data bases produce similar targeting pictures
which result in similar targeting effectiveness predictions (i.e., P_acq).
In the next section more direct measures of data base commonality are defined
and characterized. It will be demonstrated that these measures are closely
related to the similarity of targeting effectiveness predictions for two data
bases.
3.0  Measure of Effectiveness Specification
A measure of data base commonality can be developed using an
analogy with terrain correlation techniques utilized for missile guidance
(Reference 9). For that problem, measured terrain heights are compared with
a stored reference map of terrain heights in order to find the displacement
of the measured map. Mathematically, the displacement vector (x_d, y_d) is found
which maximizes the correlation function INT_A f(x,y) g(x-x_d, y-y_d) dA, where f(x,y)
represents the height of the measured terrain, g(x,y) represents the height of
the reference terrain, and A is the area over which the two maps are to be
compared.

A similar equation can be used to compare targeting pictures in a
common area of interest. Specifically, we define a normalized correlation
coefficient p, 0 <= p <= 1, given by

    p = INT_A f(x,y) g(x,y) dA / ( [INT_A f^2(x,y) dA]^(1/2) [INT_A g^2(x,y) dA]^(1/2) )        (1)
where f and g are "heights" proportional to the local shipping density in
each respective data base. More specifically, f and g are given by the sum
of the probability densities of the individual ships, i.e.,

    f(x,y) = SUM_{i=1}^{m} f_i(x,y),     g(x,y) = SUM_{i=1}^{n} g_i(x,y).

Here the probability density functions (PDFs) are {f_i}_{i=1}^{m} and {g_i}_{i=1}^{n} for
m ships in data base 1 and n ships in data base 2. In what follows, it is
assumed that f_i and g_i are given by bivariate normal random variables. In
this case, each density function is completely specified by a mean position
and the associated uncertainty ellipse. This assumption is satisfied when
a Kalman filter tracking algorithm is used for track projection. Reference 10
provides an analytical expression for evaluation of the integral in (1).
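Equation (1) can also be evaluated numerically for any pair of targeting pictures. The sketch below is an illustration, not part of the OTH-T implementation: it builds two single-ship circular normal density maps on a grid and computes their normalized correlation. The grid extent, spacing, and sigma values are arbitrary choices for the example.

```python
import numpy as np

def gaussian_density(xx, yy, mean, sigma):
    """Circular bivariate normal PDF evaluated on a grid."""
    r2 = (xx - mean[0]) ** 2 + (yy - mean[1]) ** 2
    return np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def correlation_coefficient(f, g, dA):
    """Normalized inner product of two density maps, per equation (1)."""
    num = np.sum(f * g) * dA
    den = np.sqrt(np.sum(f ** 2) * dA) * np.sqrt(np.sum(g ** 2) * dA)
    return num / den

# Two single-ship "targeting pictures": means 5 nmi apart, sigma = 5 nmi
x = np.linspace(-50.0, 50.0, 501)
xx, yy = np.meshgrid(x, x)
dA = (x[1] - x[0]) ** 2
f = gaussian_density(xx, yy, (0.0, 0.0), 5.0)
g = gaussian_density(xx, yy, (5.0, 0.0), 5.0)
rho = correlation_coefficient(f, g, dA)
```

For equal sigmas this numerical value can be checked against the closed form for two circular Gaussians, 2*s1*s2/(s1^2+s2^2) * exp(-d^2 / (2*(s1^2+s2^2))).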
The expression for the correlation coefficient p given in equation
(1) does not account for identification or classification information that
may accompany ship track data. Instead, it measures only the positional
similarity of two targeting pictures at a specified time. When a correspondence
between ships in two data bases has been established on the basis of identities
or partial identities (i.e., classification), a slightly different formulation
is required. First suppose two ships have been correlated on the basis of
identifying information. The target correlation coefficient p_T is given by
equation (1) where f and g are now the individual density functions of the two
correlated ships. In cases where partial identification is available, a
correlation algorithm can be applied to provide a most likely correlation
candidate. For this candidate the target correlation coefficient p_T is then
degraded by a scale factor K, 0 <= K <= 1, to account for missing or incomplete
identities. The following table provides a reasonable set of values of K
for different threat categories. In this table a large penalty is assessed
for matching a friendly with a neutral or unknown, and a small penalty for
matching a hostile with a neutral or unknown.
TABLE 1
Identification Degradation Factor, K

             Friendly   Hostile   Unknown   Neutral
Friendly       1.00       0.00      0.10      0.10
Hostile        0.00       1.00      0.50      0.50
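The degradation can be applied as a simple lookup. This sketch assumes a K table of the kind shown above; the pair ordering, category names, and values are illustrative.

```python
# Hypothetical identification degradation factors K, keyed by the
# (assessed identity 1, assessed identity 2) pair of a candidate match
K_TABLE = {
    ("Friendly", "Friendly"): 1.00, ("Friendly", "Hostile"): 0.00,
    ("Friendly", "Unknown"): 0.10,  ("Friendly", "Neutral"): 0.10,
    ("Hostile", "Friendly"): 0.00,  ("Hostile", "Hostile"): 1.00,
    ("Hostile", "Unknown"): 0.50,   ("Hostile", "Neutral"): 0.50,
}

def degraded_target_correlation(rho_t, id1, id2):
    """Scale the target correlation coefficient by the identification
    degradation factor K for a candidate track pair."""
    return K_TABLE[(id1, id2)] * rho_t
```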
Commonality which includes background shipping is important primarily
in the vicinity of targets. For this reason, the computation of p_D (from equation
1), the data base correlation coefficient, will normally be restricted to an
area about the target or other high interest ship.
Summarizing briefly, two measures of data base commonality have been
introduced. The first measure, p_T, quantifies the correlation between track
projections for high interest targets. The second, p_D, quantifies the correlation
between data bases for all shipping within a target centered area. Following
sections provide details on the properties of the measures p_T and p_D. Primary
emphasis is given, however, to analysis of p_T due to its simpler form.
4.0  Properties of Data Base Commonality Measures p_T, p_D
Figure 2 indicates the effects of various geometries on p_T, the
single target correlation coefficient. In this figure the track projections
for the two data bases are assumed to be circularly normally distributed
with mean m_i and covariance matrix Q_i = sigma_i^2 I (i = 1,2) respectively. As
shown in Reference 10, under these conditions the correlation coefficient p_T
takes a particularly simple form involving two dimensionless parameters: the
normalized distance between projections, n_1, and the ratio of standard deviations,
n_2. Accordingly,

    p_T = [2 n_2 / (1 + n_2^2)] exp[ -n_1^2 / (2 (1 + n_2^2)) ],

with

    n_1 = d / sigma_1,     n_2 = sigma_2 / sigma_1,

and d being the radial distance between the two projected positions.
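A minimal sketch of this closed form, assuming the reconstruction p_T = [2 n_2/(1+n_2^2)] exp[-n_1^2/(2(1+n_2^2))] for two circular normal track projections:

```python
import math

def rho_T(d, sigma1, sigma2):
    """Single-target correlation for two circular normal track
    projections separated by radial distance d. Sketch of the closed
    form stated above; parameterization n1 = d/sigma1, n2 = sigma2/sigma1."""
    n1 = d / sigma1        # normalized separation distance
    n2 = sigma2 / sigma1   # ratio of standard deviations
    return (2 * n2 / (1 + n2 ** 2)) * math.exp(-n1 ** 2 / (2 * (1 + n2 ** 2)))
```

With coincident projections and equal sigmas the correlation is exactly 1, and it decays as the separation d grows, consistent with Figure 2.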
[Figure: curves of single target correlation (0 to 1.0) versus non-dimensional distance (0 to 1.25), one curve per value of n_2.]

Fig. 2  Single target correlation versus non-dimensional distance.
As the figure illustrates, for small n_1, target correlation is chiefly
dependent upon the ratio of standard deviations n_2. As the non-dimensional
distance n_1 increases over 1.25, this dependence becomes much weaker. To
achieve a correlation coefficient greater than 0.8, note that n_1 must be
less than 1.0 and n_2 must be less than 2.0.

Figure 3 shows a case where the probability density functions
have the same means and the same uncertainty area (equal ellipse areas)
but different eccentricities. Correlation is plotted as a function of the
ratio of the major and minor ellipse axes lengths for the elliptical
density function. This figure, taken together with figure 2, shows that
p_T depends only weakly on geometric shape. It is chiefly a function of
separation distance and area size. Following this consideration one step
further, we can think of p_T as the approximate percentage overlap of the
two uncertainty regions. Figure 4 shows this relationship for two
circular normal PDFs with different means but identical standard deviations.
The percentage overlap is computed for the 90% uncertainty circles, the
typical containment percentage specified by the OTH-T System. As the
figure illustrates, there is an approximately linear relationship between p_T
and percentage area overlap.
For multiple ships in both targeting files, p_D can be thought of
as the percentage of ships common to both data bases. Assuming n_1 and n_2
ships in data bases 1 and 2 respectively, suppose that n_3 ships are identically
distributed in both files. It is easy to show that

    p_D = n_3 / (n_1 n_2)^(1/2),

so that if n_1 = n_2, then p_D is exactly the percent of ships in common.
Figures 5 and 6 show examples of overlays of two targeting files
projected to the same time. Ninety percent uncertainty ellipses are shown
(radius = 10 nmi) and the common area is cross-hatched. In Figure 5 all
four ships are in both files and p_D = 0.84; in Figure 6 two of the four ships
are held in common and p_D = 0.49.
[Figure: correlation (0 to 1.0) versus ratio of ellipse axes a/b (1 to 3), for coincident means and equal uncertainty areas.]

Fig. 3  Target correlation vs uncertainty area shape.
[Figure: correlation versus percentage overlap of 90% uncertainty areas (0 to 100%), for circular normal densities with equal standard deviations; the relationship is approximately linear.]

Fig. 4  Target correlation vs uncertainty area overlap.
Fig. 5  Overlay of targeting pictures, p_D = 0.84.

Fig. 6  Overlay of targeting pictures, p_D = 0.49.
5.0  Relationship of p_T to P_acq_i and p_D to P_acq_c
In this section the relationships between the data base commonality
measures (p_T and p_D) and acquisition probabilities are established. It will
be shown that p_T and p_D are closely related to isolated and conditional P_acq's
respectively and can hence be utilized to measure data base commonality.

In what follows, Tomahawk fire control system algorithms were
exercised in order to compute estimated acquisition probabilities. A previous
study (Reference 1) has confirmed the validity of these P_acq estimates. An
ASCM mission was designed using the targeting file obtained from one data base.
This same mission* was flown against the targeting file provided by the second
data base using techniques developed in Reference 11. The results are estimates
of both conditional and isolated P_acq for the same ASCM mission based on
different targeting data bases. It is then possible to compare differences
in P_acq estimates with correlation coefficients computed for the two data bases.
Variation of launch parameters over a sufficiently large sample space (different
launch ranges, targeting geometries, search modes and targeting data quality)
provides statistically meaningful results.
5.1  Comparison of P_acq_i and p_T

The above technique has been used to demonstrate the relationship
between p_T and P_acq_i. Figure 7 plots the absolute difference |P_acq_i2 - P_acq_i1|
as a function of the target correlation coefficient, p_T. For this figure two mission
types were compared. The first mission type utilized a single pass search
mode at short range using good quality targeting data. The second mission
utilized an area search mode at long range with moderate quality targeting
data. Two data base geometry variations were considered to provide different
values for p_T. The first variation involved specification of different mean
target locations with equal circular uncertainty area size (the geometry as
utilized in development of figure 2). The second geometry was identical to
that utilized in figure 3, i.e., equal mean position and uncertainty area but
different ellipse eccentricity. These mission types and geometry variations
were selected to exercise the fire control software over a broad spectrum of
engagement types while limiting the number of scenarios to an acceptable level
within the time constraints of this study. As illustrated in figure 7,
specification of p_T > 0.9 ensures that isolated P_acq predictions differ by
less than 10%.
* i.e., identical launch range, search mode, fly-out waypoints, launch point and environmental data
[Figure: difference in predicted isolated acquisition probability versus single target correlation p_T (0 to 1.0). Legend: single pass search - short range; area search mode - medium range; circular normal densities, equal standard deviations, different means; coincident means, equal uncertainty areas, different area shapes.]

Fig. 7  Variation of predicted performance (P_acq_i) vs p_T.
5.2  Comparison of P_acq_c and p_D

A comparison of P_acq_c and p_D was also made for different targeting
scenarios. Mission types utilized were identical to those selected in section
5.1. Background shipping was generated for each scenario assuming a Poisson
distribution of ships about the target position. Targeting data bases for each
scenario were then constructed with random detection errors and random location
errors added. Random errors were not added to target position, however,
since the effect of these was observed in section 5.1. For each scenario
it was possible to compare the data base correlation coefficient, p_D, with
|P_acq_c2 - P_acq_c1|, the absolute difference in conditional probability of
target acquisition.
Figure 8 shows the results for 40 scenarios of background shipping
(0-4 ships in the missile attack boundary - the region susceptible to ASCM
attack with probability > .01). For a given value of p_D, the percent difference
in P_acq_c varies from near zero up to some maximum value. The reason for this
variation is that in many scenarios background shipping only marginally affects
target acquisition probability, and a low value of p_D does not force a disagreement.
Conversely, a given value for p_D does imply a maximum error in P_acq_c prediction.
Specification of p_D > 0.85 ensures that P_acq_c predictions differ by less than 10%.
5.3  Validation Using OTH-T Exercise Data

The preceding two sections have demonstrated a relationship between
the two data base correlation coefficients and variations in P_acq estimates for
the two targeting data bases. This relationship can be exploited to provide
a threshold on the commonality measures in order to ensure a specified level
of acquisition probability difference. For example, in order to ensure that the
acquisition probability difference between data bases does not exceed 10%, values
for the data base commonality measures of p_T > 0.9 and p_D > 0.85 are required.

Reference 12 reports on an analysis of data base commonality taken from
on-going Anti-Ship Tomahawk Cruise Missile testing. Targeting data bases were
provided at both the attack target node (an ASCM equipped SSN) and at the
ashore regional processing center. Data collected during this exercise
permitted analysis of P_acq's for missions designed by the SSN using data
provided by the RPG and conversely. The results are summarized in Tables 2
and 3 below.
[Figure: contours of % difference |P_ACQ_c2 - P_ACQ_c1| (0.05 to 0.25) versus targeting mission correlation p_D (0.5 to 1.0).]

Fig. 8  Variation of predicted performance (P_acq_c) vs p_D.
TABLE 2

                          Mission Designed by
Mission Flown Against     RPG                    SSN
RPG Data Base             P_acq_i = 0.86         P_acq_i = 0.82
                          P_acq_c = 0.60         P_acq_c = 0.51
SSN Data Base             P_acq_i = 0.79         P_acq_i = 0.86
                          P_acq_c = 0.62         P_acq_c = 0.65

Note: p_T = 0.9; p_D = 0.94

TABLE 3

                          Mission Designed by
Mission Flown Against     RPG                    SSN
RPG Data Base             P_acq_i = 0.89         P_acq_i = 0.19
SSN Data Base             P_acq_i = 0.44         P_acq_i = 0.88

Note: p_T = 0.05
Although the number of examples treated in this operational
analysis was quite small, the results presented here provide additional
credibility to the requirements on data base commonality provided in
Sections 5.1 and 5.2.
6.0  MOE Prospects

The measures of effectiveness developed herein suggest several
possible future applications. A first application concerns the usage
of p_T as a pre-processor for track-to-track correlation. Here the value
of p_T can be used as a coarse test statistic in the selection of preliminary
correlation candidates. Evidently, by the Cauchy-Schwarz inequality, we
observe that p_T = 1 if and only if the two tracks have the same probability
density functions, i.e., represent the same target.

An additional application of the MOE includes assessment of the
quality of a targeting data base relative to ground truth. Here, in assessment
of data base quality, we select the value for system accuracy (sigma) which
maximizes the value of the data base correlation coefficient. This measure
takes into account both the possible incompleteness of the targeting data
base in addition to its positional uncertainty.
7.0  Summary

In this paper data base commonality has been defined as the ability
of two or more data bases maintained at separate targeting nodes to support
similar weapons employment decisions. Quantitatively this is measured by
the difference in acquisition probabilities for identical ASCM missions
developed using each data base. Two measures have been developed which
are functionally related to the similarity of targeting effectiveness
predictions. These measures are expressible as normalized inner products of
bivariate normal probability density functions. The first measure, p_T,
represents a correlation between PDFs for a pair of identified tracks and
has been demonstrated to be closely related to the similarity between
isolated P_acq predictions for the two data bases. The second measure, p_D,
computes the correlation between track projections for all shipping within
a target-centered circular region. This measure has been related to
differences in conditional P_acq predictions for the data bases.

Requirements on data base commonality have been developed using
a combination analytical and Monte Carlo simulation approach to determine
values of p_T and p_D which produce acceptable levels of agreement in
P_acq predictions. These requirements have been demonstrated using data
collected during on-going fleet exercises conducted in conjunction with
Anti-Ship Tomahawk testing.
REFERENCES

1. JHU/APL SECRET Report FS-80-076, dated March 1980, "Over-the-Horizon/Detection, Classification and Targeting (OTH/DC&T) Engineering Analysis, Volume 9 - System Requirements"

2. JHU/APL CONFIDENTIAL Report FS-79-057, dated December 1979, "Over-the-Horizon/Detection, Classification and Targeting (OTH/DC&T) Engineering Analysis, Volume 4 - System Concept Development"

3. JHU/APL CONFIDENTIAL Report FS-80-064, dated March 1980, "Over-the-Horizon/Detection, Classification and Targeting (OTH/DC&T) Engineering Analysis, Volume 10 - Detailed System Description (Type A Specification)"

4. JHU/APL SECRET Report FS-79-166, dated February 1980, "OTH Targeting for Anti-Ship Tomahawk During CNO Project 310-1"

5. JHU/APL UNCLASSIFIED Report FS-80-170, dated August 1980, "Over-the-Horizon/Detection, Classification and Targeting (OTH/DC&T) Engineering Analysis, Volume 11 - Area Tracking and Correlation Model"

6. JHU/APL CONFIDENTIAL Memorandum CLA-1608, dated 8 January 1981, "ATAC Correlator Upgrade and Results Using Real-World Data"

7. JHU/APL CONFIDENTIAL Report FS-79-276, dated December 1979, "OTH/DC&T Engineering Analysis, Volume 8 - Evaluation of Surface Ship Tracking Algorithms"

8. Naval Electronics System Command Report PME-108-S-00454, dated December 1980, "Over-the-Horizon/Detection, Classification and Targeting (OTH/DC&T) Ship Tracking Algorithm Computer Program Specification"

9. Joint Cruise Missiles Project Office SECRET Report, dated September 1980, "TERCOM for Cruise Missiles, Volume 1: Status and Prospects"

10. JHU/APL UNCLASSIFIED Memorandum F1A80U-091, dated 16 October 1980, "Measures of Effectiveness for Data Base Commonality"

11. JHU/APL CONFIDENTIAL Memorandum F1A80C-003, dated 15 September 1980, "Rapid Reconstruction of Tomahawk Engagements Using Fire Control Software"

12. JHU/APL CONFIDENTIAL Memorandum F1A80C-040, dated 5 November 1980, "Validation of Requirements on Data Base Commonality"
MULTITERMINAL RELIABILITY ANALYSIS
OF
DISTRIBUTED PROCESSING
SYSTEMS
BY

Aksenti Grnarov
Mario Gerla

University of California at Los Angeles
Computer Science Dept.
Los Angeles, California 90024

This research was supported by the Office of Naval Research under Contract
N00014-79-C-0866. Aksenti Grnarov is currently on leave from the University
of Skopje, Yugoslavia.
MULTITERMINAL RELIABILITY ANALYSIS OF DISTRIBUTED
PROCESSING SYSTEMS
Aksenti Grnarov and Mario Gerla
UCLA, Computer Science Dept.
Los Angeles, Ca 90024, USA
Tel. (213) 825-2660
Telex (910) 342-7597
ABSTRACT -- Distributed processing system reliability has been measured in the
past in terms of point to point terminal reliability or, more recently, in
terms of a "survivability index" or "team behaviour". While the first approach
leads to oversimplified models, the latter approaches imply excessive
computational effort. A novel, computationally more attractive measure based on
multiterminal reliability is here proposed. The measure is the probability of the
true value of a Boolean expression whose terms denote the existence of
connections between subsets of resources. The expression is relatively
straightforward to derive, and reflects fairly accurately the survivability of
distributed systems with redundant processor, data base and communications
resources. Moreover, the probability of such a Boolean expression being true
can be computed using a very efficient algorithm. This paper describes the
algorithm in some detail, and applies it to the reliability evaluation of a
simple distributed file system.
I.  INTRODUCTION
Distributed processing has become increasingly popular in recent
years, mainly because of the advancement in computer network technology and the
falling cost of hardware, particularly of microprocessors. Intrinsic advantages
of distributed processing include high performance due to parallel operation,
modular growth, fault resilience and load leveling.

In a distributed processing system (DPS), computing facilities and
communications subnetwork are interdependent. Therefore, a failure
of a particular DPS computer site will have a negative effect on the
overall DP system. Similarly, failure of the communication subsystem will lead to
overall performance degradation.
Recently, there have been considerable attempts at systematically
investigating the survival attributes of distributed processing systems subject to
failures or losses of processing or communication components. Examples include
[HIL 80]. Two main approaches to DPS survivability evaluation have emerged:

a) In [MER 80] the term survivability index is used as a performance
parameter of a DDP (distributed data processing) system. An objective function
is defined to provide a measure of survivability in terms of node and link
failure probabilities, data file distribution, and weighting factors for
network nodes and computer programs. This objective function allows the
comparison of alternative data file distributions and network architectures.
Criteria can be included such as addition or deletion of communication links,
movement of programs among nodes, duplication of data sets, etc. Constraints
can be introduced which limit the number and size of files and programs that
can be stored at a node. One of the main disadvantages of the algorithm
presented in [MER 80] is its computational complexity. The algorithm is
practically applicable only to DDP systems in which the sum of nodes and links
is, say, less than 20.
b) The second approach is a "team" approach in which the overall
system performance is related to both the operability and the communication
connectivity of its "member" components [HIL 80]. The team survivability
performance index, defined axiomatically on the connected/disconnected state
space of the graph, captures the essentials of the "team effect" and allows
cost/performance trade-offs of alternate network architectures. The basic
advantage of the team approach is that performance degradation beyond the
connectivity state is measured. One disadvantage of the approach is that of
being restricted to the homogeneous case and of ignoring other important
details of real DPS.

In this paper we propose a novel measure of DPS survivability, namely
"multiterminal reliability".
Definition 1. The multiterminal reliability of a DPS consisting of a
set of nodes V = {1,2,...,N} is defined as

    P_s = Prob[ C_{I1,J1} o_1 C_{I2,J2} o_2 ... o_{k-1} C_{Ik,Jk} ]        (1)

where:

I1, J1, I2, J2, ..., Ik, Jk are subsets of V;

C_{Ij,Jj} denotes the existence of connections between all the nodes of
subset Ij and all the nodes of subset Jj;

o_j has the meaning of OR or AND.

The subsets I1, J1, ..., Ik, Jk, as well as the meaning of o_j, depend on the
event (task) whose survivability is being evaluated. Priority between
o-operations is determined by parentheses in the same way as in standard
logical expressions.
As an example, let us assume that the successful completion of a given
task requires node A to communicate with node B or node C, and nodes D and E
to communicate with nodes F and G. The multiterminal reliability of such a task
is given by

    P_s = Prob[ (C_{I1,J1} OR C_{I2,J2}) AND C_{I3,J3} ]

where I1 = I2 = {A}, J1 = {B}, J2 = {C}, I3 = {D,E} and J3 = {F,G}.

The multiterminal reliability measure can be used to characterize the
survivability of the following systems:

(A) Distributed Data Base Systems

Given: link and computer center reliability

Determine:
(1) how to assign files to computer centers for the best reliability of the distributed data base operation.

(2) where to place extra copies of one or more files in order to improve reliability.
(B) Team Work

(1) Given link and processing node reliability, determine what distribution of the members will result in the highest probability of a connection.

(2) Given the distribution of the team members and network topology, how many links and/or nodes should fail before members disconnection.

(3) Given the distribution of the team members, which topology offers the highest probability of connection.
(C) Distributed Data Processing Systems

Given link and processing node reliability and distribution of programs and data, determine the probability of performing some specified task.

(D) Computer-Communication Networks

Given link and node reliability, what is the probability of the network becoming partitioned.
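Definition 1 can be made concrete with a brute-force computation. The sketch below assumes a small hypothetical topology (the node set A-G and the link list are invented for illustration; the paper does not specify one) and evaluates P_s for the earlier example task by enumerating every elementary link state.

```python
import itertools

# Hypothetical 7-node topology (assumed for illustration only)
NODES = "ABCDEFG"
LINKS = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"),
         ("D", "F"), ("E", "F"), ("F", "G"), ("E", "G")]
P_LINK = 0.9   # each link operational with probability 0.9; nodes perfect

def connected(up_links, u, v):
    """True if u can reach v over the surviving links (depth-first search)."""
    adj = {n: [] for n in NODES}
    for a, b in up_links:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {u}, [u]
    while stack:
        n = stack.pop()
        if n == v:
            return True
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False

def task_survives(up_links):
    """(C_{A,B} OR C_{A,C}) AND C_{{D,E},{F,G}} from the example task."""
    c1 = connected(up_links, "A", "B") or connected(up_links, "A", "C")
    c2 = all(connected(up_links, s, t) for s in "DE" for t in "FG")
    return c1 and c2

def multiterminal_reliability():
    """Exact P_s by enumerating all 2^|E| elementary link states."""
    total = 0.0
    for states in itertools.product((0, 1), repeat=len(LINKS)):
        up = [link for link, s in zip(LINKS, states) if s]
        prob = 1.0
        for s in states:
            prob *= P_LINK if s else 1.0 - P_LINK
        if task_survives(up):
            total += prob
    return total
```

This exhaustive enumeration is exactly the 2^n elementary-event approach discussed in Section II, which is why the efficient algorithm of Section III is needed for realistic sizes.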
In all the above applications, system survivability is best characterized
by some multiterminal reliability measure. In this paper, an efficient
algorithm for multiterminal reliability analysis of DPS is presented. The
algorithm can be applied to oriented and non-oriented graphs (representing DPS)
and can produce numerical results as well as symbolic reliability expressions.

The paper is organized in five sections. In Section 2, the application
of Boolean algebra to multiterminal reliability is considered. Derivation of
the algorithm is presented in Section 3. An example for determination of the
multiterminal reliability is given in Section 4. Some comments and concluding
remarks are presented in the final section.
II.  BOOLEAN ALGEBRA APPROACH
For reliability analysis, a DPS is usually represented by a probabilistic
graph G(V,E), where V = {1,2,...,N} and E = {a1, a2, ..., aE} are respectively the
set of nodes (representing the processing nodes) and the set of directed or
undirected arcs representing the communication links. To every DPS element i
(processing node or link), a stochastic variable y_i can be associated. The
weight assigned to the ith element represents the element reliability

    p_i = Pr(y_i = 1),     q_i = 1 - p_i,

i.e., the probability of the existence of the ith element. All stochastic
variables are supposed to be statistically independent.
There are two basic approaches for computing terminal reliability [FRA
74]. The first approach considers elementary events; the terminal reliability,
by definition, is given by

    P_st = SUM_{F(e)=1} P_e

where P_e is the probability which corresponds to the event e and F(e) = 1 means
that the event is favorable.
The second approach considers larger events corresponding to the simple
paths between terminal nodes. These events however are no longer disjoint, and
the terminal reliability is given by the probability of the union of the
events corresponding to the existence of the paths.

The complexity of these approaches is caused in the first case by the
large number of elementary events (of the order 2^n, where n = the number of
elements which can fail) and in the second case by the difficult computation
of the sum of probabilities of nondisjoint events (the number of joint
probabilities to be computed is of the order 2^m, where m = the number of paths
between node pairs).
Fratta and Montanari [FRA 74] chose to represent the connection
nodes,
say
s and t, by a Boolean function.
This Boolean function is defined
in such a way that a value of 0 or 1 is associated with each
to
whether
between
event
according
or not it is favorable (i.e., the connection Cs, t exists).
the Boolean function corresponding to the
the connection Cs
connection
Cs
t
is
Since
unique,
means
that
tion.
Representing a connection by its Boolean function, the problem of
t
this
can be completely defined by its Bolean functer-
minal reliability can be stated as follows: Given a Boolean function FST, find
a minimal covering consisting of nonoverlapping implicants.
Once the desired Boolean form is obtained, the arithmetic expression giving
the terminal reliability is computed by means of the following
correspondences:

    xi -> pi
    x̄i -> qi
    Boolean sum -> arithmetic sum
    Boolean product -> arithmetic product
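Given a covering by nonoverlapping implicants, these correspondences turn the
Boolean form into an arithmetic one term by term. A minimal sketch (the data
representation and names are illustrative assumptions):

```python
def reliability_from_disjoint_form(terms, p):
    """Boolean sum of nonoverlapping implicants -> arithmetic sum.
    Each implicant maps x_i -> p_i and a complemented x_i -> q_i.
    A term is a dict {element index: True for x_i, False for its
    complement}; the terms are assumed mutually disjoint."""
    total = 0.0
    for term in terms:
        w = 1.0
        for i, positive in term.items():
            w *= p[i] if positive else 1.0 - p[i]
        total += w
    return total

# F = x1 + x1'x2 (a disjoint form), with p1 = p2 = 0.9:
r = reliability_from_disjoint_form([{0: True}, {0: False, 1: True}],
                                   [0.9, 0.9])
```

Here the disjoint form x1 + x̄1x2 evaluates to p1 + q1p2 = 0.99.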
A drawback of the algorithms based on the manipulation of implicants is the
iterative application of certain Boolean operations and the fact that the
Boolean function changes at every step (and may become clumsy). The Boolean
function may be simplified using one of the following techniques: absorption
law, prime implicant form, irredundant form, or minimal form. Any one of
these procedures, however, requires considerable computational effort.
Therefore, it can be concluded that these algorithms are applicable only to
networks of small size.
Recently, efficient algorithms based on the application of Boolean algebra
to terminal reliability computation and symbolic reliability analysis were
proposed in [GRN 79] and [GRN 80a] respectively. The algorithms are based on
the representation of simple paths by "cubes" (instead of prime implicants),
on the definition of a new operation for manipulating the cubes, and on the
interpretation of the resulting cubes in such a way that Boolean and
arithmetic reduction are combined. The proposed algorithm for multiterminal
reliability analysis is based on the derivation of a Boolean function for
multiterminal connectivity and the extension of the algorithm presented in
[GRN 80b] to handle both multiterminal reliability computation and symbolic
multiterminal reliability analysis.
III DERIVATION OF THE ALGORITHM
Before presenting the algorithm for multiterminal reliability analysis, it
is useful to recall the definition of the path identifier from [GRN 79]:

Definition 2. The path identifier IPk for the path wk is defined as a string
of n binary variables

    IPk = x1 x2 ... xi ... xn

where

    xi = 1    if the ith element of the DPS is included in the path wk
    xi = x    otherwise

and n is the number of DPS elements that can fail, i.e.:

    n = N      in the case of perfect links and imperfect nodes
    n = E      in the case of perfect nodes and imperfect links
    n = N+E    in the case of imperfect links and imperfect nodes.
As an example, let us consider the 4 node, 5 link DPS given in Figure 1, in
which nodes are perfectly reliable and links are subject to failures. The
sets of path identifiers for the connections CS,A and CS,T are given in
Table 1 and Table 2 respectively. The Boolean functions corresponding to
CS,A and CS,T, given by their Karnaugh maps, are shown in Figure 2.
TABLE 1

    IP       PATH
    1xxxx    S X1 A
    xx1x1    S X3 B X5 A
    x111x    S X3 B X4 T X2 A

TABLE 2

    IP       PATH
    11xxx    S X1 A X2 T
    xx11x    S X3 B X4 T
    1xx11    S X1 A X5 B X4 T
    x11x1    S X3 B X5 A X2 T

Figure 1. Example of DPS (nodes S, A, B, T; links X1: S-A, X2: A-T,
X3: S-B, X4: B-T, X5: A-B)
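The path identifiers of Tables 1 and 2 can be regenerated mechanically from
the link incidences of Figure 1. The sketch below assumes the incidences
X1: S-A, X2: A-T, X3: S-B, X4: B-T, X5: A-B, as recovered from the worked
examples later in this section; the code itself is illustrative and not part
of the original algorithms:

```python
# Assumed link incidences of the Figure 1 DPS (nodes are perfect,
# so n = E = 5 and each identifier has one symbol per link).
LINKS = {1: ("S", "A"), 2: ("A", "T"), 3: ("S", "B"),
         4: ("B", "T"), 5: ("A", "B")}

def path_identifiers(src, dst):
    """Enumerate the simple paths from src to dst and build
    IP_k = x1...x5 with '1' for links on the path, 'x' otherwise."""
    ids = set()

    def walk(node, used, visited):
        if node == dst:
            ids.add("".join("1" if i in used else "x"
                            for i in sorted(LINKS)))
            return
        for i, (a, b) in LINKS.items():
            if node in (a, b):
                nxt = b if node == a else a
                if nxt not in visited:
                    walk(nxt, used | {i}, visited | {nxt})

    walk(src, frozenset(), {src})
    return ids
```

With these incidences the enumeration yields the three identifiers of
Table 1 and the four identifiers of Table 2.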
Instead of the cumbersome determination of elementary (or composite) events
which correspond to a multiterminal connection, the multiterminal
reliability can be determined from the Boolean function representing the
connection. Moreover, the corresponding Boolean function can be obtained
from the path identifiers (Boolean functions) representing terminal
connections. For example, the Boolean function corresponding to the
multiterminal connection
Figure 2. Karnaugh Map Representation of the Connections CS,A and CS,T
    Cmor = CS,A OR CS,T

can be obtained as

    Fmor = FS,A U FS,T

where U is the logical operation union. The Karnaugh map of Fmor is shown in
Figure 3.
Figure 3. Karnaugh map representation of the connection Cmor = CS,A OR CS,T
Covering the Karnaugh map with disjoint cubes, we can obtain Fmor as

    Fmor = x1 + x̄1 x3 x5 + x̄1 x3 x4 x̄5

i.e., the multiterminal reliability is given by

    Pmor = p1 + q1 p3 p5 + q1 p3 p4 q5

Analogously, the Boolean function
corresponding to the multiterminal connection

    Cmand = CS,A AND CS,T

can be obtained as

    Fmand = FS,A /\ FS,T

where /\ is the logical operation intersection. According to the Karnaugh
map representation (Figure 4), Fmand is given by the following set of cubes
Figure 4. Karnaugh Map Representation of the connection
Cmand = CS,A AND CS,T
    IP = (11xxx, xx111, 1xx11, x11x1, x111x, 1x11x)
Applying the algorithm REL [GRN 80b] we obtain that the multiterminal
reliability is given by

    Pmand = p1p2 + p3p4p5(1 - p1p2) + p1p4p5q2q3 + q1p2p3q4p5
            + q1p2p3p4q5 + p1q2p3p4q5
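Because the terms of such a REL output are mutually disjoint, the result can
be checked against exhaustive enumeration of the 2^5 elementary events. The
following is an illustrative verification sketch, not the REL-Algorithm
itself:

```python
from itertools import product

# Cubes of F_mand over the elements X1..X5 (left to right).
CUBES = ["11xxx", "xx111", "1xx11", "x11x1", "x111x", "1x11x"]

def exact_reliability(cubes, p):
    """Probability that at least one cube is satisfied, obtained by
    summing the probabilities of all favorable elementary events."""
    total = 0.0
    for state in product((0, 1), repeat=len(p)):
        if any(all(state[i] == 1 for i, s in enumerate(c) if s == "1")
               for c in cubes):
            w = 1.0
            for bit, pi in zip(state, p):
                w *= pi if bit else 1.0 - pi
            total += w
    return total

p = [0.95] * 5
q = [1.0 - x for x in p]
# Sum of the disjoint terms produced by REL:
p_mand = (p[0]*p[1]
          + p[2]*p[3]*p[4] * (1 - p[0]*p[1])
          + p[0]*p[3]*p[4]*q[1]*q[2]
          + q[0]*p[1]*p[2]*q[3]*p[4]
          + q[0]*p[1]*p[2]*p[3]*q[4]
          + p[0]*q[1]*p[2]*p[3]*q[4])
```

Both evaluations agree, which confirms that the terms cover the union of the
cubes without overlap.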
Since the logical operations union and intersection satisfy the commutative
and associative laws, the previous results can be generalized as follows.

1) Multiterminal connection of OR type Cs,T (T = {t1, t2, ..., tk}) is
equal to

    Cs,T = Cs,t1 OR Cs,t2 OR ... OR Cs,tk

and the corresponding Boolean function Fs,T can be obtained as

    Fs,T = Fs,t1 U Fs,t2 U ... U Fs,tk
2) Multiterminal connection of AND type Cs,T (T = {t1, t2, ..., tk}) is
equal to

    Cs,T = Cs,t1 AND Cs,t2 AND ... AND Cs,tk

and the corresponding Boolean function Fs,T can be obtained as

    Fs,T = Fs,t1 /\ Fs,t2 /\ ... /\ Fs,tk
In the case when all nodes from the set S have connections of the same type
with all nodes from the set T, the multiterminal connection can be written
as CS,T. Determination of FS,T by Boolean expression manipulation or by
determination of elementary events is a cumbersome and time consuming task.
Hence, their application is limited to DPS of very small size.
Since the path identifiers can be interpreted as cubes, the Boolean function
FS,T can be efficiently obtained simply by manipulating path identifiers. In
the sequel we present the OR-Algorithm and the AND-Algorithm for
determination of the Fs,T of type OR and AND respectively. Both algorithms
are based on the application of the intersection operation [MIL 65]. Since
the path identifiers have only symbols x and 1 as components, the
intersection operation can be modified as follows:
Definition 3. The /\ operation between two cubes, say
cr = a1 a2 ... ai ... an and cs = b1 b2 ... bi ... bn, is defined as

    cr /\ cs = [(a1 /\ b1), (a2 /\ b2), ..., (ai /\ bi), ..., (an /\ bn)]

where the coordinate /\ operation is given by

    /\ | 1   x
    ---+-------
    1  | 1   1
    x  | 1   x

It can be seen that the intersection operation between two cubes produces a
cube which is common to both cr and cs. If cr /\ cs = cr, this means that
the cube cr is completely included in the cube cs. The modified intersection
operation produces a cube which has only symbols x and 1 as coordinates, so
the modified intersection operation can be applied again and again on the
set of cubes obtained by the application of the modified intersection
operation.
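A direct transcription of the coordinate /\ operation of Definition 3, with
cubes written as strings over {1, x} (an illustrative sketch):

```python
def cube_and(cr, cs):
    """Coordinate /\ operation of Definition 3 applied to cubes
    written as strings over {1, x}:
    1^1 = 1, 1^x = 1, x^1 = 1, x^x = x."""
    return "".join("1" if (a == "1" or b == "1") else "x"
                   for a, b in zip(cr, cs))

def included_in(cr, cs):
    # cr /\ cs = cr  <=>  the cube cr is completely included in cs.
    return cube_and(cr, cs) == cr
```

For example, cube_and("1xxxx", "11xxx") yields "11xxx", i.e., the cube
11xxx is included in 1xxxx.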
Let us suppose that the cubes corresponding to the connections Cs,T1 and
Cs,T2 are in the lists L1 and L2 respectively. Also, let us denote the
lengths (numbers of cubes) of the lists by k1 and k2, and with ci^j we
denote the jth element in the list Li.
Now we can introduce the OR-Algorithm.

OR - Algorithm

STEP 1.
    for i from 1 to k1 do
        for j from 1 to k2 do
        begin
            c = c1^i /\ c2^j;
            if c = c1^i then delete c1^i from list L1
            else if c = c2^j then delete c2^j from list L2
        end

STEP 2.  Store the undeleted elements from the lists L1 and L2 as a new
         list L1.

STEP 3.  END
As an example, an application of the OR-Algorithm is shown on the
determination of Fs,T = Fs,a U Fs,t for the DPS given in Figure 1. The lists
L1 and L2 are
    L1                 L2
    c1^1 = 1xxxx       c2^1 = 11xxx
    c1^2 = xx1x1       c2^2 = 1xx11
    c1^3 = x111x       c2^3 = xx11x
                       c2^4 = x11x1

STEP 1:
    c1^1 /\ c2^1 = c2^1    delete c2^1
    c1^1 /\ c2^2 = c2^2    delete c2^2
    c1^2 /\ c2^4 = c2^4    delete c2^4
    c1^3 /\ c2^3 = c1^3    delete c1^3

STEP 2:
    L1
    c1^1 = 1xxxx
    c1^2 = xx1x1
    c1^3 = xx11x

STEP 3: END
It can be seen that the OR-Algorithm produces a list with a minimal number
of elements which are cubes of the largest possible size. This is the same
result as obtained from the identification of disjoint cubes in Figure 3.
This property allows fast generation of the set of disjoint cubes necessary
for reliability analysis [GRN 79].
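A sketch of the OR-Algorithm in this string representation (illustrative:
the list handling follows STEP 1 and STEP 2 above, but the implementation
details are assumptions):

```python
def or_algorithm(l1, l2):
    """OR-Algorithm sketch: a cube covered by a cube of the other
    list is deleted (STEP 1); the undeleted cubes of both lists
    form the new list (STEP 2)."""
    def c_and(a, b):  # modified intersection of Definition 3
        return "".join("1" if (x == "1" or y == "1") else "x"
                       for x, y in zip(a, b))

    keep1, keep2 = list(l1), list(l2)
    for c1 in l1:
        for c2 in l2:
            c = c_and(c1, c2)
            if c == c1 and c1 in keep1:
                keep1.remove(c1)      # c1 is included in c2
            elif c == c2 and c2 in keep2:
                keep2.remove(c2)      # c2 is included in c1
    return keep1 + keep2

# Lists for F_{s,a} and F_{s,t} of the example DPS:
FSA = ["1xxxx", "xx1x1", "x111x"]
FST = ["11xxx", "1xx11", "xx11x", "x11x1"]
```

On the example lists it returns the three cubes 1xxxx, xx1x1, xx11x, in
agreement with the worked example above.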
AND - Algorithm

STEP 1.
    for i from 1 to k1 do
    begin
        for j from 1 to k2 do
            c(i+2)^j = c1^i /\ c2^j;
        for k from 1 to k2-1 do
        begin
            m = k+1;
            while c(i+2)^k /\ c(i+2)^m ≠ c(i+2)^k and m < k2 do
            begin
                c = c(i+2)^k /\ c(i+2)^m;
                if c = c(i+2)^m then delete c(i+2)^m from list L(i+2);
                m = m+1
            end;
            if m < k2 then delete c(i+2)^k from list L(i+2)
        end
    end

STEP 2.  Store the undeleted elements from the lists L3, ..., L(k1+2) as a
         new list L1.

STEP 3.  END
As an example, the application of the AND-Algorithm is shown on the
determination of Fs,T = Fs,a /\ Fs,t for the same DPS.
STEP 1.

i = 1:
    Step 1.1   c3^j = c1^1 /\ c2^j:
        c3^1 = 11xxx
        c3^2 = 1xx11
        c3^3 = 1x11x
        c3^4 = 111x1
    Step 1.2   c3^4 /\ c3^1 = c3^4    delete c3^4
        L3 = (11xxx, 1xx11, 1x11x)

i = 2:
    Step 1.1   c4^j = c1^2 /\ c2^j:
        c4^1 = 111x1
        c4^2 = 1x111
        c4^3 = xx111
        c4^4 = x11x1
    Step 1.2   c4^1 /\ c4^4 = c4^1    delete c4^1
               c4^2 /\ c4^3 = c4^2    delete c4^2
        L4 = (xx111, x11x1)

i = 3:
    Step 1.1   c5^j = c1^3 /\ c2^j:
        c5^1 = 1111x
        c5^2 = 11111
        c5^3 = x111x
        c5^4 = x1111
    Step 1.2   c5^1 /\ c5^3 = c5^1    delete c5^1
               c5^2 /\ c5^3 = c5^2    delete c5^2
               c5^4 /\ c5^3 = c5^4    delete c5^4
        L5 = (x111x)

STEP 2.
    L1 = (11xxx, 1xx11, 1x11x, xx111, x11x1, x111x)
It can be seen that the AND-Algorithm also produces a list with a minimal
number of elements which are cubes of the largest possible size.
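The AND-Algorithm can be sketched in the same representation: form all
pairwise intersections and discard every product cube absorbed by another
(an illustrative rendering of STEP 1 and STEP 2, not the original code):

```python
def and_algorithm(l1, l2):
    """AND-Algorithm sketch: intersect every pair of cubes, then
    discard each product cube absorbed by (included in) another,
    leaving a minimal list of maximal cubes."""
    def c_and(a, b):  # modified intersection of Definition 3
        return "".join("1" if (x == "1" or y == "1") else "x"
                       for x, y in zip(a, b))

    products = [c_and(c1, c2) for c1 in l1 for c2 in l2]
    result = []
    for c in products:
        # c is absorbed if some different product o satisfies c /\ o = c
        if any(o != c and c_and(c, o) == c for o in products):
            continue
        if c not in result:
            result.append(c)
    return result

FSA = ["1xxxx", "xx1x1", "x111x"]
FST = ["11xxx", "1xx11", "xx11x", "x11x1"]
```

Applied to the lists of the example above, it reproduces the six cubes of
the connection Cmand.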
The Boolean function corresponding to the connection CS,T, where
S = {s1, s2, ..., sk} and T = {t1, t2, ..., tm}, can be obtained using the
following:

(x) - Algorithm

Step 1:  Find the path identifiers for the terminal connection s1,t1 and
         store them in the list L1; i <- 1.

Step 2:  Sort the path identifiers in L1 according to increasing number of
         symbols 1 (i.e., increasing path length).

Step 3:  If i <= k continue. Otherwise go to Step 5.

Step 4:  for j = j1, ..., m  (j1 = 2 if i = 1, otherwise j1 = 1)

    Step 4.1  Find the path identifiers for the terminal connection si,tj
              and store them in the list L2.

    Step 4.2  Sort the path identifiers in L2 according to increasing
              number of symbols "1".

    Step 4.3  Perform the OR-Algorithm or the AND-Algorithm (denoted (x))
              on the lists L1 and L2.

Step 4.4  i <- i + 1; go to Step 2.

Step 5:  END
In the algorithm, (x) denotes OR or AND depending on the connection type.
Sorting of the lists allows faster execution of the algorithm (starting with
the largest cubes results in earlier deletion of covered cubes, i.e., faster
reduction of the list lengths during the execution of Step 4.3).
According to the previous, we can propose the following algorithm for
multiterminal reliability analysis:

MUREL - Algorithm

STEP 1:  Derive the multiterminal connection expression corresponding to
         the event which has to be analyzed.

STEP 2:  Determine the Boolean function corresponding to the multiterminal
         connection by repetitive application of the (x)-Algorithm.

STEP 3:  Apply the REL-Algorithm to obtain the multiterminal reliability
         expression or value.

STEP 4:  END
For the computational complexity of the MUREL-Algorithm we can say the
following:

i)   The (x)-Algorithm can be realized using only logical operations, which
     belong to the class of the fastest instructions in a computer system.

ii)  The (x)-Algorithm produces a minimal set of maximal possible cubes
     (i.e., a minimal irredundant form of the Boolean function).

iii) The REL-Algorithm is the fastest one for determination of the
     reliability expression or reliability computation from the set of
     cubes (path identifiers).

From the above considerations we can conclude that the proposed algorithm
can be applied to DPS of significantly larger size than is possible using
other existing techniques.
In the following section, the algorithm is illustrated with an application
to a small distributed system. A program based on the MUREL-Algorithm is in
the final phase of implementation. Experimental results based on medium and
large scale systems will be included in the final version of the paper.
IV EXAMPLE OF APPLICATION OF THE ALGORITHM
As an example of the application of the algorithm we compute the
survivability index for the simple DDP system shown in Figure 5 (the example
is taken from [MER 80]). The assignment of files and programs to nodes is
shown in the figure. FA denotes the set of files available at a given node,
FNi denotes the files needed to execute program i, and PMS designates the
set of programs to be executed at that node.
Let us assume that for a given application we are interested in the
survivability of program PM3. Likewise, for another application, we need
both programs PM3 and PM8 to be operational. We separately analyze these two
cases using as a measure for survivability the multiterminal reliability
(probability of program execution). The two problems can be stated as
follows:

Given:  node and link reliability, and file and program assignments to
        nodes.
Find:   the survivability of:
        1) program PM3
        2) both programs PM3 and PM8
1) Survivability of PM3

The survivability of PM3 is equal to the multiterminal reliability of the
connection

    Cm3 = C2,I1 OR C2,I2

where I1 = {1,3} and I2 = {1,4}. The connections C2,I1 and C2,I2 are equal
to

    C2,I1 = C2,1 AND C2,3
    C2,I2 = C2,1 AND C2,4
Figure 5. Four Node DDP

    NODE X1:  FA: 1,2     PMS: PM1, PM2        FN1: 1,2,3   FN2: 2,3
    NODE X2:  FA: 3,5,7   PMS: PM3, PM4        FN3: 2,4     FN4: 3,4
    NODE X3:  FA: 4,6,7   PMS: PM5, PM6, PM7   FN5: 1,5,4   FN6: 6,2   FN7: 7,1
    NODE X4:  FA: 5,3,4   PMS: PM8             FN8: 1,2,6,7

    Links: X5: X1-X2, X6: X1-X3, X7: X1-X4, X8: X3-X4
Paths and corresponding path identifiers for the connections C2,1, C2,3 and
C2,4 are shown in Figure 6.

    C2,1   paths                     F2,1
           X2 X5 X1                  11xx1xxx

    C2,3   paths                     F2,3
           X2 X5 X1 X6 X3            111x11xx
           X2 X5 X1 X7 X4 X8 X3      11111x11

    C2,4   paths                     F2,4
           X2 X5 X1 X7 X4            11x11x1x
           X2 X5 X1 X6 X3 X8 X4      111111x1

Figure 6. Paths and Path Identifiers Representing Connections C2,1, C2,3,
and C2,4
Applying the AND-Algorithm on F2,1 and F2,3, and on F2,1 and F2,4, we obtain

    F2,I1 = 111x11xx        F2,I2 = 11x11x1x
            11111x11                111111x1

Applying the OR-Algorithm on F2,I1 and F2,I2 we obtain

    Fm3 = 111x11xx
          11x11x1x
Applying the REL-Algorithm on Fm3 we obtain

    Pm3 = p1p2p3p5p6 + p1p2p4p5p7(1 - p3p6)

Assuming pi = 0.95 for all i, we have:

    Pm3 = 0.85
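The value above follows from evaluating the two disjoint terms at
pi = 0.95 (a simple numeric check):

```python
# Two disjoint terms of Pm3, evaluated at p_i = 0.95:
term1 = 0.95 ** 5                     # p1 p2 p3 p5 p6
term2 = 0.95 ** 5 * (1 - 0.95 ** 2)   # p1 p2 p4 p5 p7 (1 - p3 p6)
pm3 = term1 + term2                   # about 0.8492
```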
2) Survivability of both PM3 and PM8
The survivability of PM8 is equal to the multiterminal reliability of the
connection

    Cm8 = C4,I3

where I3 = {1,3}. The connection C4,I3 is equal to

    C4,I3 = C4,1 AND C4,3
Paths and corresponding path identifiers for the connections C4,1 and C4,3
are shown in Figure 7.

    C4,1   paths                     F4,1
           X4 X7 X1                  1xx1xx1x
           X4 X8 X3 X6 X1            1x11x1x1

    C4,3   paths                     F4,3
           X4 X8 X3                  xx11xxx1
           X4 X7 X1 X6 X3            1x11x11x

Figure 7. Paths and Path Identifiers for Connections C4,1 and C4,3

Applying the AND-Algorithm on F4,1 and F4,3 we obtain

    Fm8 = 1x11xx11
          1x11x11x
          1x11x1x1
Applying the AND-Algorithm on Fm3 and Fm8 we obtain

    Fm = 1111111x
         111111x1
         11111x11

Applying the REL-Algorithm on Fm we obtain

    Pm = p1p2p3p4p5p6p7 + p1p2p3p4p5p6q7p8 + p1p2p3p4p5q6p7p8

Assuming pi = 0.95 for all i, we have:

    Pm = 0.768
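As an independent check, Pm can be recomputed by enumerating the 2^8
elementary events against the three cubes of Fm (a verification sketch, not
part of the MUREL program):

```python
from itertools import product

# Cubes of F_m over the eight elements X1..X8:
FM = ["1111111x", "111111x1", "11111x11"]

def reliability(cubes, p):
    """Probability that at least one cube is satisfied, by summing
    the probabilities of all favorable elementary events."""
    total = 0.0
    for state in product((0, 1), repeat=len(p)):
        if any(all(state[i] == 1 for i, s in enumerate(c) if s == "1")
               for c in cubes):
            w = 1.0
            for bit, pi in zip(state, p):
                w *= pi if bit else 1.0 - pi
            total += w
    return total

pm = reliability(FM, [0.95] * 8)
```

The enumeration agrees with the closed form
p^5 (p^3 + 3 p^2 q), i.e., elements X1..X5 up together with at least two of
X6, X7, X8.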
V CONCLUDING REMARKS

In the paper, the multiterminal reliability is introduced as a measure of
DPS survivability, and the MUREL-Algorithm for multiterminal reliability
analysis of DPS is proposed. First, the event under study is expressed in
terms of its multiterminal connection. Then the (x)-Algorithm is used to
translate the multiterminal connection into a Boolean function involving all
the relevant system components. Finally, the multiterminal reliability is
obtained from the Boolean function by application of the REL-Algorithm.
Preliminary computational complexity considerations show that the
MUREL-Algorithm permits the survivability analysis of DPS of considerably
larger size than using currently available techniques.
[GRNA79]  A. Grnarov, L. Kleinrock, M. Gerla, "A New Algorithm for Network
          Reliability Computation", Computer Networking Symposium,
          Gaithersburg, Maryland, December 1979.

[GRNA80a] A. Grnarov, L. Kleinrock, M. Gerla, "A New Algorithm for Symbolic
          Reliability Analysis of Computer Communication Networks", Pacific
          Telecommunications Conference, Honolulu, Hawaii, January 1980.

[GRNA80b] A. Grnarov, L. Kleinrock, M. Gerla, "A New Algorithm for
          Reliability Analysis of Computer Communication Networks", UCLA
          Computer Science Quarterly, Spring 1980.

[HILB80]  G. Hilborn, "Measures for Distributed Processing Network
          Survivability", Proceedings of the 1980 National Computer
          Conference, May 1980.

[MERW80]  R. E. Merwin, M. Mirhakak, "Derivation and Use of a Survivability
          Criterion for DDP Systems", Proceedings of the 1980 National
          Computer Conference, May 1980.

[MILL65]  R. Miller, Switching Theory, Volume I: Combinational Circuits,
          New York, Wiley, 1965.
FAULT TOLERANCE IMPLEMENTATION ISSUES USING CONTEMPORARY TECHNOLOGY

BY

David A. Rennels
University of California at Los Angeles
Computer Science Dept.
Los Angeles, California 90024

This research was sponsored jointly by the Aerospace Corporation under
Contract 23107N, and the Office of Naval Research under Contract
N00014-79-C-0866. The building block development at the Jet Propulsion
Laboratory was initiated by the Naval Ocean System Center, Code 925.
FAULT TOLERANCE IMPLEMENTATION ISSUES USING
CONTEMPORARY TECHNOLOGY

David A. Rennels
University of California
Los Angeles

One of the most striking features of contemporary technology is the
proliferation of small computers within complex systems. With processors on
an inexpensive single chip and reasonably large memories on just a few
chips, computer systems design will largely be done at the PMS (Processor,
Memory, Switch) level, replacing much of conventional logic design
[BELL 71]. PMS design reflects the use of distributed computing based upon
LSI and VLSI technology.
This paper discusses techniques to achieve fault tolerance based on this
level of technology. A model of a fault-tolerant heterogeneous network of
distributed computers is presented which was chosen to be sufficiently
general to represent a wide range of systems. Fault tolerance is discussed
on two levels: (1) inter-computer fault tolerance issues of redundant
communications and protection of sets of computers against faults, and
(2) implementation issues of fault detection, recovery and redundancy within
various component computers.
II A Distributed System Model
A complex system is usually composed of a set of subsystems, many of which
contain electromechanical sensors and actuators to perform specialized
tasks. The associated computing system exists to control and provide data
processing services within the system, and is often structured to match that
host system. It is likely that the system and its associated computers will
be heterogeneous. Different types of computers may be inherited with the
purchase of subsystems, and the intercommunications structure between
computers is likely to differ in various areas that have divergent bandwidth
requirements. Fault tolerance is likely to be selectively employed.
Non-critical subsystems may contain little or no redundancy in their
associated computers; other subsystems may contain spare sensors, actuators,
and computers. Critical functions may employ three computers with voted
outputs to provide instantaneous recovery from single faults.
In formulating fault tolerance techniques for distributed systems, it is
necessary to account for the facts that real systems sometimes grow in an
ad-hoc fashion and are seldom homogeneous. In order to do this, we view a
distributed computing system as a collection of homogeneous sets (HSETs) of
computers as shown in Figure 1. An HSET consists of a set of computers which
(1) utilize identical hardware, (2) use identical I/O and intercommunications
interfaces, and (3) if spare computers are employed, a spare can be used to
replace any machine which fails in the HSET. The collection of HSETs is
heterogeneous, in that one HSET may employ a different type of computer than
another HSET.
Figure 1 shows three different types of HSETs, described below:

a) Embedded Computers are responsible for control and data handling for a
subsystem. These computers are embedded within the subsystem and may have
dozens of subsystem-specific I/O lines to sensors and actuators. Usually,
the services of one computer is required, with any other computers in the
subsystem serving as homogeneous backup spares for fault recovery.

b) Special Purpose Processing Arrays are used for signal processing or as a
shared resource for high speed execution of specialized tasks. Typically,
specialized computers and high bandwidth interconnections make these
implementations quite different from the other HSETs. They must contain
redundant elements in order to recover from internal faults, and fault
detection mechanisms to locate a faulty element. The other HSETs in the
system can be used to command reconfigurations and effect recovery.

c) Command and Service Networks (CSN) provide high level computations to
control and provide various computing services for the different subsystems.
Every computer in the CSN has identical connections to an intercommunications
system over which commands are sent and information is exchanged with other
HSETs. Typically, several computers operate concurrently in the CSN, each
providing a different system-level computing function. Among these are
(1) the system executive, which commands and coordinates the operation of
the various subsystems; (2) redundancy and fault recovery management, to
detect faults not handled locally in HSETs, and effect reconfiguration and
reinitialization as required; (3) collection and distribution of common
data; and (4) service computations such as encryption or navigation.
Critical computations, such as the system executive, may be executed
redundantly in two or three computers so that they will continue if one
machine should fail.
The remaining portion of this network model is the intercommunications
system. Before describing its needed characteristics, it is useful to
examine the fault tolerance techniques which can be used in the various
HSETs. These fault tolerance mechanizations influence the nature of
intercommunications.
III Fault Tolerant Design Techniques within HSETs
The following redundancy techniques are used to protect a uniprocessor.
Redundant computers are employed either as unpowered spares, or running
identical computations to enhance fault recovery. These techniques are
applicable in all the HSETs. For most embedded computers these techniques
apply directly to protect a single computer. In the CSN, several computers
carry out different functions concurrently, and each active computer may use
one of these techniques, selected by the requirements of its function.

It is assumed that one of two types of computer is used in the
configurations described below:

a) Off the Shelf Computers (OTSC), which have limited internal fault
detection capability.

b) Self Checking Computers (SCC), which are specially constructed with
internal fault monitoring circuits which can detect faults concurrent with
normal operation.

The principal requirement of fault tolerance in an HSET is that a fault be
detected and satisfactory computation be restored. There are other
requirements which vary with the function being carried out by a particular
computer, e.g.:
a) Recovery Time - This is the length of time computing can be disrupted
between occurrence of a fault and recovery of computations.

b) Fail Safe Outputs - Some applications prohibit incorrect outputs, though
no outputs during the recovery time may be acceptable.

c) Computational Integrity - Specific computations may be required to
continue without error. (Other computations may require less computational
integrity, and be recovered by re-initialization and restart.)
There are three basic approaches to implementing fault tolerance in
uniprocessors which have been studied and implemented: Standby Redundancy
(SR), Duplex Redundancy (DR), and Voting or Hybrid Redundancy (HR). The
following paragraphs describe each when applied to whole computers.

A Standby Redundant (SR) configuration employs one active computer to carry
out the computations. The other computers serve as unpowered backup spares.
A fault detection mechanism (FDM) is employed to detect faults in the active
machine. If a fault is detected, a program restart or rollback may be
attempted to determine if the fault was permanent or transient in nature.
Recurrence of the fault indicates that it was permanent, and the active
computer is replaced with a spare [AVIZ 71].
The principal design problem is implementation of the fault detection
mechanism. Two approaches have been used. The first is ad-hoc addition of
error checks to off the shelf (OTS) computers. Software reasonableness
checks, periodic diagnostics, and time out counters are typically employed.
The second is to use specially designed self-checking computers (SCC) which
contain internal monitoring circuits to detect faults concurrently with
normal program execution. Replacement of a faulty computer with a spare is
done by power switching. Replacement and loading of the spare's memory can
be controlled by other computers in the hierarchic network, or by a heavily
protected logic circuit within the HSET. The use of self-checking computers
provides a high degree of fault detection, and allows faults to be detected
before faulty outputs are generated and propagated throughout the network.
(After propagation of errors, fault recovery is made much more difficult.)
We feel that off the shelf computers are unsuitable for general application
in SR configurations due to their limited fault detection capability, and
the fact that faulty outputs can be generated. With OTS computers,
considerable information damage may occur before a fault is detected, making
recovery from transient faults very difficult. However, provisions should be
made in a distributed system for incorporating OTS machines which may be
inherited with existing subsystem equipment.
When self checking computers are used, a special form of standby redundancy
may be employed, designated Standby Redundancy with Hot Spare (SRHS). As in
SR, one machine is responsible for ongoing computations and communications.
It is also responsible for maintaining status updates in a second powered
machine designated the Hot Spare. The primary machine and hot spare look for
faults in the other machine by periodically checking their internal fault
monitors. If a fault occurs and cannot be corrected with a program rollback,
the checking logic in the faulty computer causes it to shut itself down. The
remaining good computer continues the computations and configures a new hot
spare by activating one of the unpowered spares [RENN 78].
SR configurations impose the following constraints on the intercommunications
system and higher level computers in the network:

(1) Communications through the network only take place with the one active
computer in the HSET. This is an advantage because redundant use of
bandwidth, consistency problems, and synchronization problems of
communicating with multiple machines (e.g., DR and HR configurations) are
avoided. A disadvantage of simpler communication is the potential for
millisecond delays for re-transmission if an error causes a message to be
lost. Applications and executive software must be explicitly designed to
accommodate such delays. Status messages must be included in the
intercommunications protocols to verify proper receipt of messages, and
allow retransmission if one is lost.

(2) If inadequate fault detection is employed in individual computers, it is
probable that a computer fault will generate erroneous messages to the
network. In order to accommodate machines which cannot guarantee fail safe
outputs, fault containment mechanisms must be built into the
intercommunications structure. The destination of messages from each machine
must be carefully controlled and protected so that other HSETs can defend
themselves against faulty messages.

(3) In the case of transient faults which cannot be recovered locally, and
permanent faults in an SR configuration, external intervention is required
from outside the faulty HSET. The redundancy and fault recovery management
process must activate a spare machine and load and initialize its memory
before normal functioning can be resumed. A SRHS configuration is designed
to avoid the need for this outside help, but if its recovery mechanisms
should be befuddled and confounded by an unusual fault, similar support is
required.
A Duplex Redundant (DR) configuration employs two computers which perform
identical computations, and compare outputs to provide fault detection. Upon
detecting a disagreement, a rollback or restart of the program is attempted
to effect recovery from transients. If the fault persists, one of two
approaches is taken. The first approach is to isolate the fault by finding
the faulty machine. For off the shelf computers, diagnostic programs are run
on both machines to identify the miscreant. With self checking computers,
the faulty computer should identify itself by a fault indicator [TOY 78].
The second approach, used in the USAF FTSC machine, is to try all
combinations of two machines until two are found which agree [STIF 76].
Reconfiguration, by power switching, can be controlled by other computers in
the network or a special carefully protected logic circuit in the subsystem.
A Hybrid Redundant (HR) configuration employs majority voting to mask the
outputs of a faulty machine. Three machines perform identical computations
and their outputs are voted. Spare machines are employed to replace a
disagreeing member of the three active machines. Off the shelf or self
checking computers may be employed. Reconfiguration may be initiated and
controlled by the two agreeing machines in a triplet or by external
computers in the network.

Duplex configurations using self checking computers and hybrid
configurations employ identical computations in different computers so that
instantaneous recovery will occur if one of the computers fails. The
remaining "good" computation is readily identified and used. The hybrid
configuration offers very high fault "coverage"; nearly any conceivable
fault in one machine will be masked by the other two.
There are several options for implementing duplex and hybrid configurations
which profoundly affect the architecture of the host network. These are:
(1) Internal vs. External Comparison Voting - Comparison and voting logic
can be implemented within an HSET so that the intercommunications system
"sees" a single computer. Using this (internal) approach, the internal logic
in the HSET distributes incoming data to the two or three active computers
and selects a single correct result for output. This approach has two
disadvantages. A principal reason for using duplex self checking computers
or hybrid computers is the ability to provide "instantaneous" recovery by
fault masking. If simplex intercommunications are used, a transmission error
will require retransmission and considerable delay for recovery, negating
the delay free recovery in the HSET. Secondly, internal voting and
comparison logic provide a potential single point failure within the HSET.
Therefore we have pursued the other (external) alternative. Each active
computer receives redundant messages from the intercommunications system,
and each computer delivers one of a redundant set of outputs to the
intercommunications network. Voting or selection of a correct message occurs
at the receiving modules in other HSETs. This is done to mask communications
errors using redundant transmissions.
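The external voting performed at a receiving module amounts to a simple
majority selection over the redundant copies of a message. The following is
an illustrative sketch only, not the voter of any of the referenced designs:

```python
from collections import Counter

def vote(copies):
    """Pick the value delivered by a strict majority of the redundant
    buses; return None when no majority exists (error detected, and a
    retransmission would be requested)."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count > len(copies) // 2 else None
```

With three copies a single corrupted transmission is masked; with two copies
(the duplex case) the voter degenerates to a comparator that can only detect
disagreement.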
(2) Synchronous vs. Independent Clocks - Some fault tolerant distributed
systems have been successfully built with clocks synchronized in all
computers, and others have been implemented with independent unsynchronized
clocks throughout the system. When unsynchronized clocks are used, it is
necessary to synchronize programs at intervals on the order of a few
milliseconds using software synchronization techniques or a common Real Time
Interrupt [WENS 78].

If a common synchronized clock is used, it must be specially designed and
protected with a high degree of redundancy. We have chosen the independent
clock approach, in an attempt to minimize the use of specialized highly
protected hardware. If the distributed system is spread over a large area,
it becomes difficult to prevent noise and phase differences in the
synchronized common clock.
In light of the options that we have selected for investigation, the following constraints are imposed on the intercommunications system and high-level computers in the network:
(1) The network must support redundant message transmission to and from TMR
and Duplex configurations (along with simplex transmissions between SR
machines), since several critical HSETs may use one of these forms of
redundancy.
(2) If OTS computers are employed in DR or HR configurations an individual
machine may generate erroneous messages, so fault containment mechanisms
must be built into the intercommunications system in the same manner as
for SR configurations.
IV The Redundant Intercommunications Structure
Based on the distributed system model of section II and the assumptions on the use of redundancy in HSETs, we are currently investigating intercommunication structures which best support this type of system. A communication system based on MIL-STD-1553A buses was designed but found to have limitations in supporting systems which include hybrid redundant HSETs [RENN 80]. Current research is based on the use of redundant buses which are similar to Ethernet [CRAN 80]. The following are preliminary results.
The intercommunication system should have the following properties:
(1) There must be redundant communications paths to support redundant messages, and these should include one or more spares. Typically four buses would be used in the intercommunication system if TMR HSETs are employed.
(2) In typical command, control, and data handling systems, criticality of messages and their bandwidth are often inversely proportional. Raw data often requires a preponderance of bandwidth, but it can be processed in single (SR) computers and passed through the intercommunications network on one bus only. It can be delayed or interrupted by higher priority messages. High priority command messages typically require low bandwidth. Therefore they can be computed redundantly in HR or DR computers and sent over multiple buses without seriously degrading total available bandwidth.
There are two ways of handling the interleaving of critical low rate and non-critical high rate data. One way is to reserve short time intervals on all buses periodically, during which redundant communications can be sent. The second is to give priority of access to the critical messages to guarantee their arrival within an acceptable time interval. This is facilitated in an Ethernet type of bus in two ways: (1) giving these messages a shorter wait interval before transmission, thus allowing them to always win during contention, and (2) allowing these messages to force a collision to remove less critical messages from the bus if their acceptable transmission delay is about to run out.
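The first of these two mechanisms, a shorter wait interval for critical traffic, can be illustrated by a minimal sketch (the names and the uniform backoff bands are illustrative assumptions, not part of the design described above):

```python
import random

def contend(ready_messages, slot_time=1.0):
    """One contention round on an Ethernet-style bus (sketch).

    Each ready message draws a random backoff wait; the shortest wait
    wins the bus.  Critical messages draw from a band strictly below
    the band used by non-critical traffic, so a pending critical
    message always wins contention.
    """
    def backoff(msg):
        if msg["critical"]:
            return random.uniform(0.0, slot_time)              # short-wait band
        return slot_time + random.uniform(0.0, 4 * slot_time)  # longer band
    return min(ready_messages, key=backoff)
```

Because the two backoff bands do not overlap, no amount of non-critical traffic can delay a critical message past one contention round.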
(3) In the redundant communications we must enforce data consistency. By this we mean that unless there is a fault in one of several redundantly operating computers, their messages will be identical. In order to do this, sample and hold techniques must be applied at all sensors and in internal message buffering. Data is collected and held in one or more registers; the computers initiate data gathering and are synchronized to wait before using the data, so that the held data is the same for all computers. Since the computers are not clock synchronized, they might otherwise receive different data by sampling a changing measurement at slightly different times. A typical sample and hold interval is on the order of a few milliseconds [RENN 78].
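The sample-and-hold discipline can be sketched as follows (a minimal Python sketch with hypothetical names; the actual mechanism is latch hardware plus synchronized software, not code):

```python
class SampledSensor:
    """Sample-and-hold sketch: the live measurement is latched into a
    holding register once per interval, and every redundant computer
    then reads the identical held value, even though their
    unsynchronized clocks read it at slightly different times."""

    def __init__(self, read_live):
        self._read_live = read_live   # callable returning the changing measurement
        self._held = None             # the holding register

    def sample(self):
        """Latch the measurement once, at the start of each hold interval."""
        self._held = self._read_live()

    def read(self):
        """All computers read the held value, never the live one."""
        return self._held
```

Without the latch, two computers polling a counter a microsecond apart would compute on different inputs and their (fault-free) outputs would no longer vote as identical.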
(4) The intercommunications interfaces should be designed with self-checking logic. A methodology has been developed which allows implementation of internal check circuits on chips which also detects faults in the check circuits. These design techniques have been shown to be relatively inexpensive on VLSI devices [CART 77]. The failure of a self-checking interface will result in that interface disabling itself, and will thus prevent the faulty interface from disabling a bus or generating erroneous messages. This is especially important in an Ethernet type of bus in which a faulty interface can easily disrupt message traffic.
(5) Each interface should enforce fault containment on messages generated by its host processor to limit the effects in the network of fault-induced messages. This containment takes two forms:

(i) Message names and destinations are restricted. Each interface is loaded with the names and destinations of messages its host computer is allowed to send over its associated bus. This approach is akin to capabilities addressing in that the communications interface will refuse to send any message for which it has not been enabled.

(ii) Bandwidth and length limitations associated with each message are stored in the terminal and enforced.
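Both containment rules amount to a table lookup on the send path; a minimal sketch (class and table format are illustrative assumptions):

```python
class BusInterfacePort:
    """Send-side fault containment sketch.  The interface refuses any
    message whose name/destination pair has not been enabled, and
    enforces the stored length limit, so a faulty host cannot
    misaddress or flood its bus."""

    def __init__(self, send_table):
        # send_table: message name -> (allowed destination, max length)
        self.send_table = send_table

    def send(self, name, dest, payload):
        entry = self.send_table.get(name)
        if entry is None or entry[0] != dest:
            return False               # akin to capability addressing: refuse
        if len(payload) > entry[1]:
            return False               # length limit exceeded: refuse
        # ...physical transmission would occur here...
        return True
```

Since the table is loaded externally (see the configuration discussion below), the host cannot grant itself new send rights even when it is faulty.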
Figure 2 shows a portion of a distributed system containing three HSETs: a command and service network (CSN) and two sets of embedded computers. Each HSET may have additional I/O lines and buses between computers, but the intercommunication system between HSETs is composed of four buses. Each computer contains four bus interfaces (one to each bus), and a subset of the send and receive message names is shown as columns in each interface. Various redundancy types are employed. Three computers in the CSN (C1, C2, C3) run the system executive in a voted configuration. The data handler (C4) is operated as a standby redundant machine, and one spare (C5) backs up both the system executive and data handler.

One subsystem-embedded HSET (C6, C7, C8) operates in a triply redundant (TMR) voted configuration, while the other one (C8, C9) executes in a SRHS configuration. The following are examples of the various forms of communications.
(1) The system executive trio can send message A to the TMR HSET. Each executive program commands an output of message A, and it is carried out over buses 1, 2, and 3 for computers 1, 2, and 3 respectively. Each receiving computer (C6, C7, C8) receives all three messages. Low level I/O software in each machine votes the incoming messages and delivers either a single "correct" message A to the applications programs or notifies the executive that all three disagree. Similarly, the embedded TMR HSET can deliver messages B to the system executive. A fault in any single computer or on a single bus is effectively masked.
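The receive-side vote over the three copies reduces to a few comparisons; a minimal sketch of what the low level I/O software does (function name is illustrative):

```python
def vote_messages(copies):
    """Majority vote over the three copies of one message received on
    buses 1-3.  Returns the agreed message, or None when all three
    copies disagree so that the executive can be notified."""
    a, b, c = copies
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None        # three-way disagreement
```

Any single corrupted copy, whether from a faulty sender or a faulty bus, is outvoted by the two good copies.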
(2) The system executive trio sends message C to the single computer C9 over three buses. That computer receives all three messages and votes for a corrected result. When the single computer C9 wishes to send a message D to the system executive, it is sent over all three buses. This is carried out automatically by the three bus terminals in C9 which recognize D as a legal output message. Redundant transmission of the message from a single computer is necessary to prevent a single bus error from causing the triplicated computers to receive different information and go out of synchronization.
[Figure 2: a distributed system with three HSETs interconnected by four redundant buses (figure not legible in this reproduction)]
(3) Finally, single computers only exchange one copy of messages. For example, messages E, F, G, and H are sent between C4 and C9. Simplex messages should employ automatic status returns to verify correct reception. A lost or jammed message can then be retransmitted.

In summary, the intercommunications system makes use of "soft" names loaded into communications terminals to define and enable various redundant and non-redundant bus transmissions.
There are a variety of problems associated with this type of implementation. One problem is the opposing requirements of a large message name space vs. interface complexity. The table of receivable message (soft) names must reside in each bus interface, since an associative search is required to match each arriving message. Sixty-four "receive" message names cost the equivalent of about 1000 gates in each interface. This appears to be the maximum number of receive names that could be implemented on a single-chip interface in current technology (at an acceptable cost). The table of "send" soft names can be stored in the host computer's memory in an encoded form to prevent the computer from accidentally modifying the table in an undetectable fashion. Therefore the send soft name list for each bus interface does not represent a major size constraint on interface implementation.
The system name space consists of a hard name (which identifies an HSET) concatenated with a soft name which identifies a single or multiple (redundant) destination within the HSET. Thus, while a single HSET is limited in its number of entry names, the system as a whole can have a very large address space, and this problem appears tractable.
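The concatenation scheme can be sketched in a few lines (the 6-bit soft field follows from the 64-name limit above; the function names are illustrative):

```python
SOFT_BITS = 6   # 64 soft (receive) names per HSET, per the gate-count limit

def system_name(hard, soft):
    """Full system address: hard name (identifies the HSET)
    concatenated with a soft name (destination within the HSET)."""
    assert 0 <= soft < (1 << SOFT_BITS)
    return (hard << SOFT_BITS) | soft

def split_name(addr):
    """Recover the (hard, soft) pair from a full address."""
    return addr >> SOFT_BITS, addr & ((1 << SOFT_BITS) - 1)
```

Only the small soft field must be matched associatively in the interface; the hard field can grow without adding gates to the per-interface name table.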
Configuration of the bus system (i.e., loading all the name tables) is another difficult design problem. Each interface is loaded externally via the bus system to prevent the interface's host computer from having the ability to erroneously modify its interface. Currently we require each interface to receive two independent commands before its table can be loaded. The TMR set of system executive computers is responsible for reassigning access rights to circumvent faults.

We are currently examining more elaborate configurations which allow interfaces to be switched to different buses for enhanced reliability.
V VLSI Implementation of Distributed Systems
The use of LSI and VLSI is essential for the efficient implementation of distributed systems with more than a few computers. If small and medium scale integrated circuits are used, the overhead of replicating computers and communications interfaces may become excessive. Recent research indicates that a wide range of systems can be implemented using existing processors and memories and about a half dozen different types of LSI or VLSI building block circuits [RENN 81].
Figure 3 shows a typical computer. An internal bus connects the processor, memory, and communications circuits and consists of a three-state address bus (AB), data bus (DB), and control bus (CB). The internal bus is essentially the same for a variety of different microprocessor chips. Slight differences exist in the definition of control bus signals from different processors, but most can be made to produce a standard set with the addition of a few external gates. Therefore we define a standard internal bus for 16-bit computers with augmentation for error detection as:
AB/ADDRESS BUS <0:17>
    AB <0:15>: = 16-bit address
    AB <16>: reserved for error detecting code
    AB <17>: reserved for error detecting code
DB/DATA BUS <0:17>
    DB <0:15>: = 16-bit data
    DB <16>: reserved for error detecting code
    DB <17>: reserved for error detecting code
CB/CONTROL BUS <0:3>
    RWL/READ-WRITE LEVEL: = CB <0>
    RWL/NOT READ-WRITE LEVEL: = CB <1>
    MST/NOT MEMORY START: = CB <2>
(Clearly the same sort of definition can be made for computers of different word length, but for the systems we are currently considering, 16-bit
machines are most likely for near term use.)
The address and data buses are the same as in the "typical" computer except that two additional lines have been added to each to allow implementation of an error detecting code. A computer may use zero, one, or both of these lines for internal fault detection. If one line is used (AB <16> or DB <16>), odd parity is implemented to detect any single (and any odd number of) erroneous bits. Two parity bits may be used on each bus to separately check odd parity over the even and odd numbered bits of their respective bus, to provide detection of single errors and any two adjacent errors in a word. The double adjacent detection was added because area-related faults in VLSI devices connected to the bus are likely to damage either single or adjacent circuits. Stronger checking can be employed by adding additional check bits in a straightforward fashion.
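The two-check-bit scheme can be sketched directly (illustrative function names; hardware would compute these with XOR trees, not code). Adjacent bit positions always fall in different parity groups, which is why a double adjacent error trips both checks:

```python
def double_parity(word, width=16):
    """Two check bits for one bus word: odd parity computed separately
    over the even-numbered and the odd-numbered bit positions."""
    ones_even = sum((word >> i) & 1 for i in range(0, width, 2))
    ones_odd = sum((word >> i) & 1 for i in range(1, width, 2))
    # odd parity: each check bit makes its group's total count of ones odd
    return (ones_even + 1) & 1, (ones_odd + 1) & 1

def parity_ok(word, checks, width=16):
    """Receiver-side check against the transmitted check bits."""
    return double_parity(word, width) == checks
```

A single-bit error flips exactly one group's parity; two adjacent errors flip one bit in each group; both cases are detected.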
The control bus contains a true and complement representation of a Read/Write signal to memory, and memory start and completion signals for handshaking for asynchronous operation. By using the well known technique of memory-mapped I/O (i.e., referencing I/O devices as reserved memory addresses), communication with peripheral devices is simple. All buses are three-state so that direct memory access devices can capture the buses and execute memory cycles.
With a standard internal bus interface of the type described above, it is now possible to define a set of building block circuits which interface memory, processors, I/O, and communications interfaces to the internal bus. The building block interfaces provide the functionality needed to implement a variety of computers with selectable fault tolerance features and the interfaces necessary to connect them into SR, SRHS, or HR configurations. Several building block circuits of this type have been designed and reported elsewhere [RENN 81].
A brief description of a set of building block circuits is given below and shown in figure 4. These have somewhat more general applicability but include the functions of the set referenced above. All building block circuits should be designed in such a way that they will detect and signal internal faults in the circuits that they interface to the computer, or in their own internal logic, concurrently with normal operation. This fault detection can be implemented relatively inexpensively using self-checking "morphic" logic [CART 77].
a) Memory Interface Building Block (MIBB)
A MIBB interfaces RAMs to the internal bus of a computer to form a memory module, and can provide several optional modes of operation. One of three memory codes can be employed: (1) single parity error detection, (2) double parity error detection, and (3) Hamming single error correction and double error detection. In all cases, spare bit planes can be connected and substituted for any active bit plane which fails. Three optional internal bus codes can also be selected, depending upon what is provided in the computer: (1) no parity, (2) single parity check, and (3) double parity check on both the AB and DB as described above. Error status and the position of erroneous bits can be read out using reserved addresses, and the internal configuration of the MIBB (e.g., substitution of spare bit planes and "soft" address range assignments) is accomplished by storing commands into reserved addresses.
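Memory code (3), Hamming single error correction with double error detection (SEC-DED), can be illustrated for an 8-bit data slice. This is a generic textbook construction (parity bits at power-of-two positions plus an overall parity bit), not the MIBB's actual circuit:

```python
DATA_POS = [p for p in range(1, 13) if p & (p - 1)]  # non-power-of-two slots

def secded_encode(data8):
    """Encode 8 data bits into a 13-bit SEC-DED word: Hamming parity
    bits at positions 1, 2, 4, 8, plus an overall parity bit at index 0."""
    bits = {p: (data8 >> i) & 1 for i, p in enumerate(DATA_POS)}
    for p in (1, 2, 4, 8):
        bits[p] = sum(v for q, v in bits.items() if q & p) & 1
    word = [bits[p] for p in range(1, 13)]
    return [sum(word) & 1] + word          # prepend overall parity

def secded_decode(word):
    """Return (data, status): correct any single-bit error, flag any
    double-bit error as uncorrectable."""
    word = list(word)
    overall = sum(word) & 1                # parity over all 13 bits
    syndrome = 0
    for p in range(1, 13):
        if word[p]:
            syndrome ^= p
    if syndrome and not overall:
        return None, "double error"        # detected, not correctable
    if overall:                            # odd error count: assume one, fix it
        word[syndrome if syndrome else 0] ^= 1
        status = "corrected"
    else:
        status = "ok"
    data = 0
    for i, p in enumerate(DATA_POS):
        data |= word[p] << i
    return data, status
```

The syndrome names the erroneous bit position directly, which is what lets the MIBB report "the position of erroneous bits" through its reserved status addresses.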
b) Processor Interface Building Blocks (PIBB)
Two types of PIBBs have been considered. The first type interfaces with the standard internal bus and provides three additional "local" internal buses to which three processors are connected. It operates the three processors synchronously and provides majority voting on their outputs for fault masking. The PIBB can be externally configured to use only one processor if the other two have failed. Double parity (over odd/even bits on the Address and Data buses described above) is employed on the internal buses, and the PIBB checks incoming data and encodes outgoing addresses and data to allow fault detection on internal bus transmissions.
Due to the very large number of pins required by the voting PIBB, a second type of PIBB has been defined. This PIBB operates two processors and compares their outputs for fault detection. A spare processor can be substituted for either active processor and, if two have failed, the PIBB can be commanded to operate with a single processor (without processor fault detection). The double parity code is used on the internal bus as above. This PIBB is shown in figure 4.
A single processor may be used in a computer module without a PIBB and without concurrent fault detection. When a PIBB and two or three processors are employed, processor fault detection is provided.
c) I/O Building Blocks
[Figures 3 and 4: a typical computer module built on the standard internal bus, and the building block circuits (figures not legible in this reproduction)]
I/O building blocks provide an interface between the computer's standard internal bus and the wires which connect a computer to the sensors and actuators of its host subsystem. Typical standard I/O functions are: a) parallel data in and out, b) serial data in and out, c) pulse counting and pulse generation, d) analog multiplexor and A-to-D converter, e) adjustable frequency output, and f) DMA channels. Since most of these functions are rather simple, it is possible to implement several on a building block LSI device and select one or more by wiring special connecting pins. (An exception is the analog circuitry.) Control of all I/O operations is memory mapped, using the standard internal bus described above; i.e., special memory addresses are reserved for each I/O operation. An input is initiated by a read from the associated address, and an output or interface command is initiated by a write to an associated out-of-range address.
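The read-initiates-input, write-initiates-output convention can be sketched as follows (the reserved addresses and class name are hypothetical; a real system would assign them in its memory map):

```python
# Hypothetical reserved addresses for one I/O building block.
SERIAL_IN, SERIAL_OUT, PULSE_COUNT = 0xFF00, 0xFF01, 0xFF02

class IOBuildingBlock:
    """Memory-mapped I/O sketch: a read from a reserved address
    initiates an input operation; a write to a reserved address
    initiates an output or an interface command."""

    def __init__(self, serial_line=lambda: 0):
        self._serial_line = serial_line   # placeholder for the physical input
        self.pulse_count = 0
        self.serial_out = []

    def read(self, addr):
        if addr == SERIAL_IN:
            return self._serial_line()    # the read itself triggers the input
        if addr == PULSE_COUNT:
            return self.pulse_count
        raise ValueError("not a reserved I/O address")

    def write(self, addr, value):
        if addr == SERIAL_OUT:
            self.serial_out.append(value)  # the write itself triggers the output
        else:
            raise ValueError("not a reserved I/O address")
```

Because I/O is just memory traffic, no special processor instructions are needed and DMA devices can drive the same operations.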
When error detecting codes are employed on the internal bus, the I/O building block can detect internal faults in much of its logic by passing the coded data through to the output lines and checking them for proper coding. This is the case with parallel and serial outputs. Pulse circuits, frequency generators, DMA channels, and internal control logic are duplicated on-chip and compared for fault detection.
Inputs (if not encoded externally) must be encoded before being conveyed
to the internal bus. Duplication with comparison can be used to detect faults
in input circuitry, but care must be taken to prevent synchronization failure
(e.g. resampling upon disagreement).
Finally, discrete isolation circuitry should be provided for outputs. If several redundant computers are connected to the same output lines, a short represents a catastrophic failure. Similarly, inputs should be short-protected.
d) Intercommunications Interface Building Blocks (IIBB)
The IIBB is used to interface a computer to the intercommunications network of a distributed system. Typically several IIBBs are used in each computer to provide redundant access to the communications system. As discussed in section IV above, the IIBBs should be implemented with internal fault detection circuitry based on self-checking logic, to provide fail-safe shutdown on detection of an internal fault. Access protection for fault containment was described above.
The definition of the IIBBs profoundly affects the functionality of the distributed system. Therefore the IIBBs should act as fairly powerful "front end" processors which can autonomously handle most of the details of moving data between machines. One desirable feature is the ability to move data directly between the memories of the computers in the network with a minimum of distraction to on-going software. By implementing direct memory access in intercommunications, one computer can diagnose and reconfigure another using memory-mapped commands, and can easily load another computer's memory for initialization during fault recovery.
A Bus Interface Building Block was designed at the Jet Propulsion Laboratory to serve as an interface to MIL-STD-1553A buses. The design is self-checking and can be microprogrammed to act as either a terminal or controller. One controller and several terminals were to be implemented in each computer, with each connected to separate buses to provide redundant communications. These interfaces have moderately high capabilities and are able to identify messages and load them (by direct memory access) into the host computer's memory as specified by local control tables.
The 1553A bus will work adequately in systems which employ SRHS redundancy in computer sets, but it has several undesirable features for more general application. It cannot address groups of three computers (in a TMR configuration) with a single message. Another problem is its central control: fault conditions in terminal computers can only be identified by polling. Therefore we are currently examining alternative bus structures as discussed in section IV.
Building Block Retrofitting
The memory, intercommunications, and I/O building blocks described above can be used individually with existing computers. For example, a conventional computer might be fitted with IIBBs to allow it to communicate in the distributed system. The MIBB might be employed to provide memory transient fault protection, or I/O BBs might serve as convenient devices for local connections. In these cases, however, the host computer does not have complete internal fault detection and can be expected to generate faulty outputs between the time that a fault occurs and is detected. Similarly, many local transients will go undetected and require external intervention for correction.
In order to provide thorough concurrent fault detection and transient correction within the computer, it is necessary to employ all the building blocks, including the PIBB (with a redundant processor) and the final building block, the Fault Handler. With a full complement of building blocks, self-checking computers are constructed.
The Fault Handler (FH)
The final building block is the fault handler. It is responsible for actions when a fault is detected in one of the other building block circuits. Note that the building blocks, as defined, are self-checking. Each building block generates a two-wire Uncorrectable Fault signal when it detects a fault in itself or its associated (memory, intercommunications, or I/O) circuitry.
Whenever the FH receives a fault signal, it disables all outputs from the computer by disabling the IIBB and I/O BB circuits. It may optionally initiate a program rollback by resetting and restarting all the processors. If the fault recurs, the processors are halted and the FH waits for external intervention. If the rollback is successful, the program can command that the computer's outputs be re-activated. The FH can be built using self-checking logic and internal duplication circuits so that its own failures will also shut down the computer.
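The FH reaction sequence (contain, roll back once, halt on recurrence) can be sketched as a small state machine; the class and method names are illustrative, and the real FH is of course hardware:

```python
class Computer:
    """Minimal stand-in for the rest of the computer module."""
    def __init__(self):
        self.outputs_enabled = True
        self.halted = False
    def disable_outputs(self): self.outputs_enabled = False
    def enable_outputs(self): self.outputs_enabled = True
    def rollback(self): pass            # reset and restart the processors
    def halt(self): self.halted = True

class FaultHandler:
    """On an Uncorrectable Fault signal: disable all outputs and
    attempt one rollback; if the fault recurs before the rollback is
    declared successful, halt and await external intervention."""
    def __init__(self, computer):
        self.c = computer
        self.pending_rollback = False
    def fault_signal(self):
        self.c.disable_outputs()        # contain the fault immediately
        if self.pending_rollback:
            self.c.halt()               # fault recurred: external help needed
        else:
            self.c.rollback()
            self.pending_rollback = True
    def rollback_successful(self):
        self.pending_rollback = False
        self.c.enable_outputs()         # program re-activates the outputs
```

Disabling the outputs first is what keeps a transient from propagating into the network while the rollback decides whether the fault was permanent.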
VI Summary
By implementing a set of LSI and VLSI building block circuits, it will be possible to construct a variety of fault tolerant distributed computing systems for C3 applications. Two building blocks have already been developed to the breadboard stage: a MIBB has been built along with a Core building block which combines the functions of the PIBB for duplex computers and the Fault Handler [RENN 81]. Our current research centers on intercommunications building blocks. All the building block circuits, with the exception of the IIBB, have a complexity equivalent to fewer than 5,000 gates and can be readily implemented as LSI. The intercommunications interface already designed (BIBB) has a complexity of about 10,000 gates, and the more advanced implementations under consideration are expected to about double that complexity. Thus its implementation will be more difficult but still feasible. We view such building blocks as a key enabling technology for future distributed systems.
Acknowledgment

This research was sponsored jointly by the Aerospace Corporation under Contract 23107N, and the Office of Naval Research under contract N00014-79-C-0t66. The building block development at the Jet Propulsion Laboratory was initiated by the Naval Ocean Systems Center, Code 925.
References

[BELL 71] Bell, C. G., and A. Newell, Computer Structures: Readings and Examples, New York, McGraw-Hill, 1971.

[AVIZ 71] A. Avizienis, et al., "The STAR (Self-Testing-And-Repairing) Computer: An Investigation of the Theory and Practice of Fault-Tolerant Computer Design," IEEE Trans. Computers, Vol. C-20, No. 11, Nov. 1971, pp. 1312-1321.

[RENN 78] D. Rennels, "Architectures for Fault-Tolerant Spacecraft Computers," Proc. IEEE, Vol. 66, No. 10, October 1978, pp. 1255-1268.

[TOY 78] W. N. Toy, "Fault Tolerant Design of Local ESS Processors," Proc. IEEE, Vol. 66, No. 10, October 1978, pp. 1126-1145.

[STIF 76] J. J. Stiffler, "Architectural Design for Near 100% Fault Coverage," Proc. 1976 IEEE Int. Symp. on Fault Tolerant Computing, June 21-23, 1976, Pittsburgh, PA.

[WENS 78] J. H. Wensley, et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proc. IEEE, Vol. 66, No. 10, Oct. 1978, pp. 1240-1255.

[RENN 80] D. Rennels, et al., "Selective Redundancy in a Building Block Distributed Computing System," Dig. Government Microcircuit Applications Conference, Houston, TX, November 1980.

[CRAN 80] R. Crane, et al., "Practical Considerations in Ethernet Local Network Design," Proc. Hawaii Int. Conf. on System Science, January 1980.

[RENN 81] D. Rennels, et al., Fault Tolerant Computer Study Final Report, JPL Publication 80-73, Jet Propulsion Laboratory, Pasadena, CA, February 1, 1981.

[CART 77] W. C. Carter, et al., "Cost Effectiveness of Self-Checking Computer Design," Dig. 1977 Int. Symp. on Fault Tolerant Computing, Los Angeles, CA, June 1977, pp. 117-123.
APPLICATION OF CURRENT AI TECHNOLOGIES TO C2

By
Robert J. Bechtel
Naval Ocean Systems Center
San Diego, California 92152
APPLICATION OF CURRENT AI TECHNOLOGIES TO C2
Robert J. Bechtel
Naval Ocean Systems Center
San Diego, CA
Artificial intelligence (AI) shows promise of providing new tools
to tackle some of the problems faced by C2 system designers.
Unfortunately, few C2 experts are also conversant with the
current state-of-the-art in artificial intelligence. We will
present an overview of the information processing needed to
support command and control, then examine a number of areas of
artificial intelligence for both general relevance and specific
system availability in the identified processing areas.
Since artificial intelligence is the less well known field, we will briefly describe here why the areas of artificial intelligence surveyed are important to C2 applications.
Knowledge Representation Systems
Intelligent systems or programs rely on two sources for their
power. The first is well designed algorithms and heuristics,
which specify the processes to be undertaken in pursuit of a
task. The second is accurate, well organized information about
the task domain, which provides material for the algorithms and
heuristics to work with. The command and control task domain has
a wealth of information associated with it, but to date very
little of this information has been made available in a useful
form for artificial intelligence programs. Organization of the
information is as important as its capture, because poorly
organized data will either (at best) slow the system or (at worst) be inaccessible.
Knowledge representation systems seek to provide frameworks for organizing and storing information. Designers of different systems perceive different problem areas that need work, and thus different systems do different things well.
Knowledge Presentation Systems
Artificial intelligence programs usually require large amounts of information about the domain in which they operate. In addition to this domain knowledge, there is usually also a large amount of information within the domain which the program will process to perform its function. For example, a tactical situation assessment program's domain knowledge may include both information about formats for storing information about platforms, sensors, and sightings, and information about platforms, sensors, and sightings themselves, stored as the formats specify.
Users should have access to the information within the domain
that is used by the program, both because it may be useful in raw
form, and as a check on the program's operation. Managing the
presentation of such information is a complex task which has not
been as well explored as the problems of information acquisition.
The most widely used technique of knowledge presentation to date
has been ordinary text.
Occasionally the presentation is
organized in a question-answering form, but more commonly it is
not under user control at all. It has been especially difficult
for users to tailor the information presented to match their
needs, concerns, and preferences. Presentation modes other than
text (such as graphics or voice) have been extremely limited.
Inference Systems
Inference is the process of drawing conclusions, of adding information to a knowledge (data) base on the basis of information that is already there. Inference systems may operate in many different ways. One of the most useful forms is that of rule-based systems. Here, knowledge is structured in rules which are applied to facts to reach conclusions. The method of rule application forms the process base for inference, while the rules are the knowledge structuring base.
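The rule-application process described above can be illustrated with a minimal forward-chaining sketch (the rule format and the tactical facts are invented for illustration; real systems of the period, such as STAMMER2, are far richer):

```python
def forward_chain(facts, rules):
    """Naive forward chaining over (premises, conclusion) rules:
    repeatedly apply every rule whose premises are all present in the
    fact base, until no new conclusions can be drawn."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts
```

Here the loop is the "process base" (the method of rule application), while the rule list is the "knowledge structuring base"; either can be changed independently of the other.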
Natural Language Processing
The ability to use a natural language such as English to communicate with a computer has long been a goal of artificial intelligence researchers. A language understanding and generating capability could conceivably remove many obstacles that presently obstruct the man-machine interface. We restrict our examination to printed input and output, ignoring speech.
Planning and Problem Solving
Planning and problem solving, the process of determining, examining, and deciding among alternatives, is at the heart of the command and control domain. Knowledge representation and presentation, natural language interfaces, and inference systems are all useful as components to support the assessment and decision processes.
Current artificial intelligence planning systems combine aspects
of the preceding areas to propose action sequences to accomplish
goals. Most of the existing systems also have some ability to
monitor the execution of proposed action sequences to insure
satisfactory achievement of the goal.
REFERENCES
[Anderson 771 Anderson, R. H., Gallegos, M., Gillogly, J. J.,
Greenberg, R. B., and Villanueva, R., RIT Referenc e Manual,
The Rand Corporation, Technical Report R-1808-ARPA, 1977.
[Brachman 78a] Brachman, R. J., Qn the Epistemological Status Df
Semantic Networks, Bolt Beranek and Newman Inc., BBN
Report 3807, April 1978.
[Brachman 78b] Brachman, R. J., Ciccarelli, E., Greenfeld,
N. R., and Yonke, M. D., KL-ONE Reference Manual, Bolt Beranek
and Newman Inc., BBN Report 3848, July 1978.
[Brachman 80]
Brachman, R. J. and Smith, B. C., "Special Issue
on Knowledge Representation," SIGART Newsletter, (70),
February 1980, 1-138.
[Charniak 80] Charniak, E., Riesbeck, C. K., and McDermott,
D. V., Artificial Intellience Programming, Lawrence Erlbaum
Associates, Hillsdale, New Jersey, 1980.
[Davis 78] Davis, R., "Knowledge Acquisition in Rule-Based
Systems: Knowledge about Representation as a Basis for System
Construction and Maintenance," in Waterman, D. A. and
Hayes-Roth, F. (ed.), Pattern-Directed Inference Systems,
Academic Press, 1978.
[Engelman 79] Engelman, C., Berg, C. H., and Bischoff, M.,
"KNOBS: An Experimental Knowledge Based Tactical Air Mission
Planning System and a Rule Based Aircraft Identification
Simulation Facility,n in Proceedings Qf the Si£xth
International Joint Conference on Artificial Intelligence,
pp. 247-249, International Joint Conferences on Artificial
Intelligence, 1979.
[Engelman 80]
Engelman, C., Scarl, E. A., and Berg, C. H.,
"Interactive Frame Instantiation," in Proceedings af the
First Annual National Conference on ArAtificial Intelligence,
pp. 184-186, 1980.
[Fikes 71]
Fikes, R. E. and Nilsson, N. J., "STRIPS: A new
approach to the application of theorem proving to problem
solving," Artifiial Intelliaence 2, 1971, 189-208.
[Gershman 77] Gershman, A. V., Analyzing English Noun Groups for
their Conceptual Content, Department of Computer Science,
Yale University, Research Report 110, May 1977.
[Granger 80] Granger, R. H., Adaptive Understanding: Correcting
erroneous inferences, Department of Computer Science, Yale
University, Research Report 171, January 1980.
[Greenfeld 79] Greenfeld, N. R. and Yonke, M. D., AIPS: An
Information Presentation System for Decision Makers, Bolt
Beranek and Newman Inc., BBN Report 4228, December 1979.
[Herot 80] Herot, C. F., Carling, R., Friedell, M., and
Kramlich, D., "A Prototype Spatial Data Management System,"
in SIGGRAPH 80 Conference Proceedings, pp. 63-70, 1980.
[Konolige 80] Konolige, K. and Nilsson, N., "Multiple-Agent
Planning Systems," in NCAI, pp. 138-141, 1980.
[McCall 79] McCall, D. C., Morris, P. H., Kibler, D. F., and
Bechtel, R. J., STAMMER2 Production System for Tactical
Situation Assessment, Naval Ocean Systems Center, San Diego,
CA, Technical Document 298, October 1979.
[Meehan 76] Meehan, J. R., The Metanovel: Telling Stories by
Computer, Ph.D. thesis, Yale University, December 1976.
[Riesbeck 74] Riesbeck, C. K., Computational Understanding:
Analysis of sentences and context, Fondazione Dalle Molle per
gli studi linguistici e di communicazione internazionale,
Castagnola, Switzerland, Working Paper, 1974.
[Robinson 80] Robinson, A. E. and Wilkins, D. E., "Representing
Knowledge in an Interactive Planner," in NCAI, pp. 148-150,
1980.
[Sacerdoti 74] Sacerdoti, E. D., "Planning in a Hierarchy of
Abstraction Spaces," Artificial Intelligence 5, 1974,
115-135.
[Sacerdoti 77] Sacerdoti, E. D., A Structure for Plans and
Behavior, Elsevier, 1977.
[Schank 77] Schank, R. and Abelson, R., Scripts, Plans, Goals,
and Understanding: An Inquiry into Human Knowledge
Structures, Lawrence Erlbaum Associates, Hillsdale, New
Jersey, 1977.
[Shortliffe 76] Shortliffe, E. H., Computer-based medical
consultations: MYCIN, American Elsevier, 1976.
[Stefik 79] M. Stefik, "An Examination of a Frame-Structured
Representation System," in Proceedings of the Sixth
International Joint Conference on Artificial Intelligence,
pp. 845-852, International Joint Conferences on Artificial
Intelligence, Tokyo, August 1979.
[van Melle 79] van Melle, W., "A Domain-Independent
Production-Rule System for Consultation Programs," in
Proceedings of the Sixth International Joint Conference on
Artificial Intelligence, pp. 923-925, 1979.
[Waterman 79] Waterman, D. A., Anderson, R. H., Hayes-Roth, F.,
Klahr, P., Martins, G., and Rosenschein, S. J., Design of a
Rule-Oriented System for Implementing Expertise, The Rand
Corporation, Rand Note N-1158-1-ARPA, May 1979.
[Wilensky 78] Wilensky, R., Understanding Goal-Based Stories,
Department of Computer Science, Yale University, Research
Report 140, September 1978.
[Woods 70] Woods, W., "Transition network grammars for natural
language analysis," Communications of the ACM 13, 1970,
591-606.
[Woods 72] Woods, W. A., Kaplan, R. M., and Nash-Webber, B. L.,
The LUNAR Sciences Natural Language Information System: Final
Report, Bolt Beranek and Newman Inc., BBN Report 2378, 1972.
[Zdybel 79] Zdybel, F., Yonke, M. D., and Greenfeld, N. R.,
Application of Symbolic Processing to Command and Control,
Final Report, Bolt Beranek and Newman Inc., BBN Report 3849,
November 1979.
[Zdybel 80] Zdybel, F., Greenfeld, N., and Yonke, M.,
Application of Symbolic Processing to Command and Control:
An Advanced Information Presentation System, Annual Technical
Report, Bolt Beranek and Newman Inc., BBN Report 4371, April
1980.
A PROTOCOL LEARNING
SYSTEM FOR CAPTURING
DECISION-MAKER LOGIC
BY
Robert J. Bechtel
Naval Ocean Systems Center
San Diego, California 92152
A PROTOCOL LEARNING SYSTEM FOR CAPTURING DECISION-MAKER LOGIC
Robert J. Bechtel
Naval Ocean Systems Center
San Diego, CA
A current primary effort underway in the artificial intelligence
group at NOSC is the development of a protocol learning system
(PLS), that is, a system which learns from protocols, or records
of behavior. We intend to use the PLS as a tool to acquire
knowledge about naval domains from experts in those domains so
that the knowledge can be used by computer systems to perform
useful tasks in the domains.
Our knowledge acquisition effort has two domain foci: the
existing rule-based tactical situation assessment (TSA) system,
and a mission planning support system under current development.
Most of our present efforts are directed to the TSA application,
since there is an existing software system to enhance.
Knowledge acquisition in the TSA domain
In the TSA context, we have interpreted knowledge acquisition as
the modification and enhancement of a rule collection through
interaction with a domain expert during the actual situation
assessment process. The computer system will present the input
information (from reports and sensors) to the user, along with
system conclusions. The user will then be prompted for agreement
or disagreement with the conclusions. In cases where the user
disagrees with the system conclusions, a dialogue will ensue to
modify the system's reasoning to be more acceptable to the
expert. Previous work in this area has been done by Davis,
resulting in a system called TEIRESIAS. We anticipate drawing
heavily on this work.
During modification of the system's reasoning, existing rules may
be deleted or modified. New rules may be added to the rule set
which supplement or supplant existing rules. With such wholesale
changes underway, it would be easy to lose track of the current
capabilities of the system, especially of those conclusions
reached through a long chain of rule applications. As a part of
a feedback mechanism to let the expert know about the effects of
the changes he has made, we are developing a rule-merging
subsystem which will combine rules and present the combinations
to the user for his approval. This user approval forms a sort of
"sensibility check."
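The rule-merging idea can be illustrated with a small sketch. This is a hypothetical rule format chosen for exposition, not the actual TSA implementation: when one rule's conclusion feeds another rule's premises, the two are composed into a single summary rule that is shown to the expert for approval.

```python
# Hypothetical sketch of rule merging for a "sensibility check".
# Rules are (premises, conclusion) pairs; this is NOT the actual
# TSA rule format, only an illustration of composing a rule chain.

def merge_rules(r1, r2):
    """If r1's conclusion appears among r2's premises, compose the two
    into one rule whose premises bypass the intermediate conclusion."""
    premises1, conclusion1 = r1
    premises2, conclusion2 = r2
    if conclusion1 not in premises2:
        return None  # the rules do not chain; nothing to merge
    merged_premises = premises1 + [p for p in premises2 if p != conclusion1]
    return (merged_premises, conclusion2)

# Toy rules (invented for illustration):
r1 = (["contact is fast", "contact emits radar X"], "contact is hostile aircraft")
r2 = (["contact is hostile aircraft", "contact closing"], "raise alert level")

merged = merge_rules(r1, r2)
# The merged rule would be presented to the expert for approval:
# IF contact is fast AND contact emits radar X AND contact closing
# THEN raise alert level
```

Showing the expert the composed rule, rather than the individual rules, exposes the net effect of a long chain of rule applications in one step.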
Rule merging and sensibility checking are techniques that are
applied after changes have been made to a rule set. These
changes will be made by an expert user to cause the system's
reasoning to conform more closely to his own. Detecting the
user's desire to make a change and understanding the nature of
the change desired fall within the scope of what we call dialogue
control.
Dialogue control is the problem of deciding what to do next in an
interaction. At some points in the interaction the options are
clear and a decision point is well defined. For example,
immediately after presenting any conclusions, the system should
solicit the user's agreement or disagreement with them.
(Disagreement here includes the possibility that additional
conclusions could have been reached.) If the system's reasoning
is sound, no additional interaction is required, though it may be
allowed. However, if the expert disagrees, the system must
initiate and control (or direct) a dialogue with the user to
- localize the cause of the disagreement
- explore alternatives to resolve the disagreement
- select and implement one of the alternatives as a
  change to the rule set.
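The dialogue-control decision point described above can be sketched as a small control loop. All of the interfaces here are hypothetical stand-ins (the PLS itself is not structured this way); the sketch only shows the localize/explore/apply sequence triggered by expert disagreement.

```python
# Minimal sketch of the dialogue-control decision point: present
# conclusions, solicit agreement, and on disagreement localize the
# cause, explore alternatives, and apply one as a rule-set change.
# All callables are hypothetical placeholders, not real PLS interfaces.

def review_conclusions(conclusions, expert_agrees, localize, explore, apply_change):
    """Run one pass of the agree/disagree dialogue over a set of conclusions."""
    if expert_agrees(conclusions):
        return "no change"          # reasoning is sound; no dialogue needed
    cause = localize(conclusions)   # narrow disagreement to a cause
    alternatives = explore(cause)   # candidate resolutions
    return apply_change(alternatives[0])  # implement one alternative

# Toy usage: the expert rejects a conclusion traced to a single rule.
result = review_conclusions(
    ["track 7 is hostile"],
    expert_agrees=lambda c: False,
    localize=lambda c: "rule R12 over-weights speed",
    explore=lambda cause: [f"weaken premise in {cause.split()[1]}"],
    apply_change=lambda alt: f"applied: {alt}",
)
```

In a real system each of these steps is itself an interactive sub-dialogue rather than a single function call; the point of the sketch is only the ordering of the three steps.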
Implementing changes that the user desires requires some form of
editing capability. Controlling the editor is then a problem.
Clearly, the user should not be burdened with learning the
intricacies of either the internal representation of the rules or
of a special language for changing those representations. The
editor (whether LISP or special purpose) must be under the
control of the dialogue controller, which will interpret the
user's desires and translate them into editor commands.
Knowledge acquisition in the mission planning domain
Many of the same considerations are in effect in the mission
planning support system. However, there are some unique aspects
to the mission planning acquisition problem.
The mission
planning task is concerned with developing a plan to achieve a
particular mission or goal.
By relying on a model of the
relations between goals, plans, actions, and states, we have
implemented a very rudimentary dialogue control system which
interacts with a user to collect methods for achieving goals, and
to collect goals for which methods may be applied. Since the
focus so far has been on the dialogue control, the interface (in
terms of language capability) is still primitive, and no
permanent record is kept of the knowledge elicited.

Another difficulty with this preliminary system is that it
concentrates exclusively on plans and goals. There is no
mechanism for describing objects, properties of objects, or
relations between objects, even though they form an important
part of the domain.
Reference
Knowledge acquisition in rule-based systems -- knowledge about
representations as a basis for system construction and
maintenance. R. Davis. In D.A. Waterman and F. Hayes-Roth (eds.),
Pattern-Directed Inference Systems, Academic Press, 1978.
ON USING THE AVAILABLE GENERAL-PURPOSE
EXPERT-SYSTEMS PROGRAMS
BY
Carroll K. Johnson
Naval Research Laboratory and
Oak Ridge National Laboratory
Washington, D.C. 20375
ON USING THE AVAILABLE GENERAL-PURPOSE
EXPERT-SYSTEMS PROGRAMS
Carroll K. Johnson
Naval Research Laboratory(1) and
Oak Ridge National Laboratory(2)
Many research groups have become interested in trying Artificial
Intelligence (AI) programming techniques on their own research problems.
The most promising developments in applied AI for most groups are the expert
systems programs. Evaluation projects on expert systems are being carried out
by several potential users. This paper summarizes an informal evaluation
project in progress at Oak Ridge National Laboratory (ORNL), and the beginning
of a much larger scale effort in the Navy Center for Applied Research in
Artificial Intelligence at the Naval Research Laboratory (NRL).
The approach used at ORNL was to organize a group called the programmed
reasoning methodology panel with nine computer scientists, physical scientists
and engineers participating part time. The principal activity is to gain
familiarity with available expert systems programs by applying them to a broad
range of practical problems.
Network access to the SUMEX-AIM facility provided an opportunity to
use the systems developed by the Stanford Heuristic Programming Project,
particularly EMYCIN and AGE which are written in the language INTERLISP.
The ORNL computing facilities include a DEC KL-10, but its TOPS-10 operating
system will not support INTERLISP. However, other LISP dialects such as
MACLISP can be run under TOPS-10, and the Carnegie Mellon expert system OPS-5
written in MACLISP was chosen as the principal LISP-based expert system to be
used at ORNL.
Another expert systems program suitable for local implementation was the
FORTRAN coded program EXPERT written at Rutgers. EXPERT is an excellent
program to use for trying out production rule programming in a non-LISP
computing environment. The program also is reasonably economical in computer
run time requirements.
For problems which can be formulated into a tree-structured consulting
dialog, either EMYCIN or EXPERT can be used advantageously. Rule sets were
developed on both systems for a chemical spectroscopy problem involving
analytical chemistry interpretation of joint infra-red, nuclear magnetic
resonance, and mass spectral data. That application is more closely related
to signal interpretation than to consulting, thus more automatic data entry
would be required for a useful real-time implementation. Another project
underway is a more traditional consulting project involving assistance to
users setting up IBM job control language (JCL).
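The tree-structured consulting dialog that EMYCIN and EXPERT support can be sketched as a tiny backward chainer. The rules and goal names below are invented for illustration (loosely echoing the spectroscopy application), and the Python form is a generic stand-in, not EMYCIN's or EXPERT's actual rule syntax.

```python
# Generic backward-chaining sketch of a tree-structured consulting
# dialog. Rule content is invented; this is not EMYCIN/EXPERT syntax.

RULES = {
    # goal: list of premise lists (an OR of ANDs)
    "compound has carbonyl": [["IR peak near 1700 cm-1"]],
    "compound is a ketone": [["compound has carbonyl", "no O-H stretch"]],
}

def prove(goal, facts):
    """True if the goal is a known fact or derivable via the rules.
    In a consulting system, an unprovable leaf goal would instead
    trigger a question to the user -- that is the 'dialog tree'."""
    if goal in facts:
        return True
    for premises in RULES.get(goal, []):
        if all(prove(p, facts) for p in premises):
            return True
    return False

facts = {"IR peak near 1700 cm-1", "no O-H stretch"}
print(prove("compound is a ketone", facts))  # True
```

The recursion over subgoals is what gives the consultation its tree shape: each unanswered premise becomes the next question put to the user.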
The OPS-5 expert systems program is being used in collaboration with
the system's developers from Carnegie Mellon. The two problem areas being
developed with OPS-5 are safeguards for nuclear fuel reprocessing, and countermeasures for oil and hazardous chemical spills. The spills countermeasures
problem was utilized as the "mystery problem" for an Expert Systems Workshop
held in August 1980. During the week-long workshop, eight different expert
systems (AGE, EMYCIN, EXPERT, HEARSAY III, KAS, OPS-5, ROSIE and RLL) were
applied to the problem.
At the present time (February 1981), the Navy AI Center at NRL is just
starting to be staffed and equipped. The author is a visiting scientist at
the Center and plans to develop an expert system for aiding technicians
troubleshooting electronic equipment. Control of semi-automatic test equipment, a large knowledge base, and a convenient man-machine interface are
required. OPS-5 and ROSIE are the first expert systems to be tried in this
application.
(1) Work being done is sponsored by the Office of Naval Research and the Naval
Research Laboratory.
(2) Research at Oak Ridge sponsored by the Division of Materials Sciences,
U.S. Department of Energy, under Contract W-7405-eng-26 with the Union
Carbide Corporation.
APPENDIX
FOURTH MIT/ONR WORKSHOP ON
DISTRIBUTED INFORMATION AND DECISION SYSTEMS
MOTIVATED BY COMMAND-CONTROL-COMMUNICATIONS
(C3) PROBLEMS
June 15, 1981 through June 26, 1981
San Diego, California
List of Attendees
Table of Contents Volumes I-IV
MIT/ONR WORKSHOP ON DISTRIBUTED INFORMATION AND DECISION SYSTEMS
MOTIVATED BY COMMAND-CONTROL-COMMUNICATIONS (C3) PROBLEMS
JUNE 15, 1981 - JUNE 26, 1981
ATTENDEES
David S. Alberts
Special Asst. to Vice President
& General Manager
The MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102
Tel: (703) 827-6528
Vidyadhana Raj Avilla
Electronics Engineer
Naval Ocean Systems Center
Code 8241
San Diego, CA 92152
Tel: (714) 225-6258
Dennis J. Baker
Research Staff
Naval Research Laboratory
Code 7558
Washington DC 20375
Tel: (202) 767-2586
Glen Allgaier
Electronics Engineer
Naval Ocean Systems Center
Code 8242
San Diego, CA 92152
Tel: (714) 225-7777
Ami Arbel
Senior Research Engineer
Advanced Information & Decision Systems
201 San Antonio Circle #201
Mountain View, CA 94040
Tel: (415) 941-3912
Jay K. Beam
Senior Engineer
Johns Hopkins University
Applied Physics Laboratory
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100 x3265
Michael Athans
Professor of Electrical Engineering
& Computer Science
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-406
Cambridge, MA 02139
Tel: (617) 253-6173
Daniel A. Atkinson
Executive Analyst
CTEC, Inc.
7777 Leesburg Pike
Falls Church, VA 22043
Tel: (703) 827-2769
Allan R. Barnum
Technical Director
Information Sciences Division
Rome Air Development Center
Griffiss AFB, NY 13441
Tel: (315) 330-2204
Robert Bechtel
Scientist
Naval Ocean Systems Center
Code 8242
San Diego, CA 92152
Tel: (714) 225-7778
Vitalius Benokraitis
Mathematician
US ARMY Material Systems Analysis
ATTN: DRXSY - AAG
Aberdeen Proving Ground, MD 21005
Tel: (301) 278-3476

Alfred Brandstein
Systems Analysis Branch
CDSA MCDEC USMC
Quantico, VA 22134
Tel: (703) 640-3236
Patricia A. Billingsley
Research Psychologist
Navy Personnel R&D Center
Code 17
San Diego, CA 92152
Tel: (714) 225-2081

James V. Bronson
Lieutenant Colonel USMC
Naval Ocean Systems Center
MCLNO Code 033
San Diego, CA 92152
Tel: (714) 225-2383
Rudolph C. Brown, Sr.
Westinghouse Electric Corporation
P. O. Box 746, MS-434
Baltimore, MD 21203
Tel:
William B. Bohan
Operations Research Analyst
Naval Ocean Systems Center
Code 722
San Diego, CA 92152
Tel: (714) 225-7778
Thomas G. Bugenhagen
Group Supervisor
Applied Physics Laboratory
Johns Hopkins University
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100
James Bond
Senior Scientist
Naval Ocean Systems Center
Code 721
San Diego, CA 92152
Tel: (714) 225-2384
Paul L. Bongiovanni
Research Engineer
Naval Underwater Systems Center
Code 3521 Bldg. 1171-2
Newport, RI 02840
Tel: (401) 841-4872

James R. Callan
Research Psychologist
Navy Personnel R&D Center
Code 302
San Diego, CA 92152
Tel: (714) 225-2081
Christopher Bowman
Member, Technical Staff
VERAC, Inc.
10975 Torreyana Road
Suite 300
San Diego, CA 92121
Tel: (714) 457-5550

David Castanon
Research Associate
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-331
Cambridge, MA 02139
Tel: (617) 253-2125
S. I. Chou
Engineer
Naval Ocean Systems Center
Code 713B
San Diego, CA 92152
Tel: (714) 225-2391

Robin Dillard
Mathematician
Naval Ocean Systems Center
Code 824
San Diego, CA 92152
Tel: (714) 225-7778
Gerald A. Clapp
Physicist
Naval Ocean Systems Center
Code 8105
San Diego, CA 92152
Tel: (714) 225-2044

Elizabeth R. Ducot
Research Staff
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-410
Cambridge, MA 02139
Tel: (617) 253-7277
Douglas Cochran
Scientist
Bolt Beranek & Newman Inc.
50 Moulton Street
Cambridge, MA 02138
Tel: (415) 968-9061
Donald R. Edmonds
Group Leader
MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102
Tel: (702) 827-6808
A. Brinton Cooper, III
Chief, C3 Analysis
US ARMY Material Systems Analysis
ATTN: DRXSY-CC
Aberdeen Proving Ground, MD 21005
Tel: (301) 278-5478

Martin Einhorn
Scientist
Systems Development Corporation
4025 Hancock Street
San Diego, CA 92110
Tel: (714) 225-1980
David E. Corman
Engineer
Johns Hopkins University
Applied Physics Laboratory
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100 x521

Wilbur B. Davenport, Jr.
Professor of Communications Sciences
& Engineering
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-214
Cambridge, MA 02139
Tel: (617) 253-2150

Leon Ekchian
Graduate Student
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-409
Cambridge, MA 02139
Tel: (617) 253-5992
Thomas Fortmann
Senior Scientist
Bolt, Beranek & Newman, Inc.
50 Moulton Street
Cambridge, MA 02138
Tel: (617) 497-3521
Clarence J. Funk
Scientist
Naval Ocean Systems Center
Code 7211, Bldg. 146
San Diego, CA 92152
Tel: (714) 225-2386

Peter P. Groumpos
Professor of Electrical Eng.
Cleveland State University
Cleveland, OH 44115
Tel: (216) 687-2592
Mario Gerla
Professor of Electrical Engineering
& Computer Science
University of California, Los Angeles
Boelter Hall 3732H
Los Angeles, CA 90024
Tel: (213) 825-4367

George D. Halushynsky
Member of Senior Staff
Johns Hopkins University
Applied Physics Laboratory
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100 x2714
Donald T. Giles, Jr.
Technical Group
The MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102
Tel: (703) 827-6311

Scott Harmon
Electronics Engineer
Naval Ocean Systems Center
Code 8321
San Diego, CA 92152
Tel: (714) 225-2083
David Haut
Research Staff
Naval Ocean Systems Center
Code 722
San Diego, CA 92152
Tel: (714) 225-2014
Irwin R. Goodman
Scientist
Naval Ocean Systems Center
Code 7232
Bayside Bldg. 128 Room 122
San Diego, CA 92152
Tel: (714) 225-2718

Frank Greitzer
Research Psychologist
Navy Personnel R&D Center
San Diego, CA 92152
Tel: (714) 225-2081

C. W. Helstrom
Professor of Electrical Eng.
& Computer Science
University of California, San Diego
La Jolla, CA 92093
Tel: (714) 452-3816
Leonard S. Gross
Member of Technical Staff
VERAC, Inc.
10975 Torreyana Road
Suite 300
San Diego, CA 92121
Tel: (714) 457-5550

Ray L. Hershman
Research Psychologist
Navy Personnel R&D Center
Code P305
San Diego, CA 92152
Tel: (714) 225-2081
Sam R. Hollingsworth
Senior Research Scientist
Honeywell Systems & Research Center
2600 Ridgway Parkway
Minneapolis, MN 55413
Tel: (612) 378-4125

Carroll K. Johnson
Visiting Scientist
Naval Research Laboratory
Code 7510
Washington DC 20375
Tel: (202) 767-2110
Kuan-Tsae Huang
Graduate Student
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-329
Cambridge, MA 02139
Tel: (617) 253-

Jesse Kasler
Electronics Engineer
Naval Ocean Systems Center
Code 9258, Bldg. 33
San Diego, CA 92152
Tel: (714) 225-2752
Richard T. Kelley
Research Psychologist
Navy Personnel R&D Center
Code 17 (Command Systems)
San Diego, CA 92152
Tel: (714) 225-2081

James R. Hughes
Major, USMC
Concepts, Doctrine, and Studies
Development Center
Marine Corps Development & Education
Quantico, VA 22134
Tel: (703) 640-3235

Kent S. Hull
Commander, USN
Deputy Director,
Mathematical & Information Sciences
Office of Naval Research
Code 430B
800 N. Quincy
Arlington, VA 22217
Tel: (202) 696-4319

Carolyn Hutchinson
Systems Engineer
Comptek Research Inc.
10731 Treena Street
Suite 200
San Diego, CA 92131
Tel: (714) 566-3831

David Kleinman
Professor of Electrical Eng.
& Computer Science
University of Connecticut
Box U-157
Storrs, CT 06268
Tel: (203) 486-3066
Robert C. Kolb
Head Tactical Command
& Control Division
Naval Ocean Systems Center
Code 824
San Diego, CA 92152
Tel: (714) 225-2753
Michael Kovacich
Systems Engineer
Comptek Research Inc.
Mare Island Department
P.O. Box 2194
Vallejo, CA 94592
Tel: (707) 552-3538
Timothy Kraft
Systems Engineer
Comptek Research, Inc.
10731 Treena Street
Suite 200
San Diego, CA 92131
Tel: (714) 566-3831

Alexander H. Levis
Senior Research Scientist
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-410
Cambridge, MA 02139
Tel: (617) 253-7262
Manfred Kraft
Diplom-Informatiker
Hochschule der Bundeswehr
Fachbereich Informatik
Werner-Heisenbergweg 39
8014 Neubiberg, West Germany
Tel: (0049) 6004-3351

Victor O.-K. Li
Professor of Electrical Eng.
& Systems
PHE
University of Southern California
Los Angeles, CA 90007
Tel: (213) 743-5543
Leslie Kramer
Senior Engineer
ALPHATECH, Inc.
3 New England Executive Park
Burlington, MA 01803
Tel: (617) 273-3388

Glenn R. Linsenmayer
Westinghouse Electric Corporation
P. O. Box 746 - M.S. 434
Baltimore, MD 21203
Tel: (301) 765-2243
Ronald W. Larsen
Division Head
Naval Ocean Systems Center
Code 721
San Diego, CA 92152
Tel: (714) 225-2384

Pan-Tai Liu
Professor of Mathematics
University of Rhode Island
Kingston, RI 02881
Tel: (401) 792-1000
Joel S. Lawson, Jr.
Chief Scientist C3I
Naval Electronic Systems Command
Washington DC 20360
Tel: (202) 692-6410

Robin Magonet-Neray
Graduate Student
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-403
Cambridge, MA 02139
Tel: (617) 253-2163
Dan Leonard
Electronics Engineer
Naval Ocean Systems Center
Code 8105
San Diego, CA 92152
Tel: (714) 225-7093

Kevin Malloy
SCICON Consultancy
Sanderson House
49, Berners Street
London W1P 4AQ, United Kingdom
Tel: (01) 580-5599
Dennis C. McCall
Mathematician
Naval Ocean Systems Center
Code 8242
San Diego, CA 92152
Tel: (714) 225-7778

Charles L. Morefield
Board Chairman
VERAC, Inc.
10975 Torreyana Road
Suite 300
San Diego, CA 92121
Tel: (714) 457-5550
Marvin Medina
Scientist
Naval Ocean Systems Center
San Diego, CA 92152
Tel: (714) 225-2772
Peter Morgan
SCICON Consultancy
49-57, Berners Street
London W1P 4AQ, United Kingdom
Tel: (01) 580-5599
Michael Melich
Head, Command Information
Systems Laboratory
Naval Research Laboratory
Code 7577
Washington DC 20375
Tel: (202) 767-3959

John S. Morrison
Captain, USAF
TAFIG/IICJ
Langley AFB, VA 23665
Tel: (804) 764-4975

John Melville
Member of Technical Staff
Naval Ocean Systems Center
Code 6322
San Diego, CA 92152
Tel: (714) 225-7459

Michael S. Murphy
VERAC, Inc.
10975 Torreyana Road
Suite 300
San Diego, CA 92121
Tel: (714) 357-5550
Glenn E. Mitzel
Engineer
Johns Hopkins University
Applied Physics Laboratory
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100 x2638
Jim Pack
Naval Ocean Systems Center
Code 6322
San Diego, CA 92152
Tel: (714) 225-7459
Michael H. Moore
Senior Control System Engineer
Systems Development Corporation
4025 Hancock Street
San Diego, CA 92037
Tel: (714) 225-1980

Bruce Patyk
Naval Ocean Systems Center
Code 9258, Bldg. 33
San Diego, CA 92152
Tel: (714) 225-2752
Roland Payne
Vice President
Advanced Information & Decision Systems
201 San Antonio Circle #286
Mountain View, CA 94040
Tel: (415) 941-3912

Barry L. Reichard
Field Artillery Coordinator
US Army Ballistic Research
Laboratory
ATTN: DRDAR-BLB
Aberdeen Proving Ground, MD 21014
Tel: (301) 278-3467
Anastassios Perakis
Graduate Student
Ocean Engineering
Massachusetts Institute of Technology
Room 5-426
Cambridge, MA 02139
Tel: (617) 253-6762

David Rennels
Professor of Computer Science
University of California, LA
3732 Boelter Hall
Los Angeles, CA 90024
Tel: (213) 825-2660
Lloyd S. Peters
Associate Director
Center for Defense Analysis
SRI International
EJ352
333 Ravenswood Avenue
Menlo Park, CA 94025
Tel: (415) 859-3650

Thomas P. Rona
Staff Scientist
Boeing Aerospace Company
MS 84-56
P. O. Box 3999
Seattle, WA 98124
Tel: (206) 773-2435
Harilaos N. Psaraftis
Professor of Marine Systems
Massachusetts Institute of Technology
Room 5-213
Cambridge, MA 02139
Tel: (617) 253-7639

Nils R. Sandell, Jr.
President & Treasurer
ALPHATECH, Inc.
3 New England Executive Park
Burlington, MA 01803
Tel: (617) 273-3388
Paul M. Reeves
Electronics Engineer
Naval Ocean Systems Center
Code 632
San Diego, CA 92152
Tel: (714) 225-2365

Daniel Schutzer
Technical Director
Naval Intelligence
Chief of Naval Operations
NOP 009T
Washington DC 20350
Tel: (202) 697-3299
Adrian Segall
Professor of Electrical Engineering
Technion IIT
Haifa, Israel
Tel: (617) 253-2533

T. Tao
Professor
Naval Postgraduate School
Code 62 TV
Monterey, CA 93940
Tel: (408) 646-2393 or 2421
Prodip Sen
Polysystems Analysis Corporation
P. O. Box 846
Huntington, NY 11743
Tel: (516) 427-9888

H. Gregory Tornatore
Johns Hopkins University
Applied Physics Laboratory
Johns Hopkins Road
Laurel, MD 20810
Tel: (301) 953-7100 x2978
Harlan Sexton
Naval Ocean Systems Center
Code 6322
San Diego, CA 92152
Tel: (714) 225-2502

Edison Tse
Professor of Engineering
Economic Systems
Stanford University
Stanford, CA 94305
Tel: (415) 497-2300
Mark J. Shensa
Naval Ocean Systems Center
Code 6322
San Diego, CA 92152
Tel: (714) 225-2349 or 2501

J. R. Simpson
Office of Naval Research
800 N. Quincy
Arlington, VA 22217
Tel: (202) 696-4321

E. B. Turnstall
Head, Ocean Surveillance
Systems Department
Naval Ocean Systems Center
Code 72
San Diego, CA 92152
Tel: (714) 225-7900

Lena Valavani
Research Scientist
Laboratory for Information and
Decision Systems
Massachusetts Institute of Technology
Room 35-437
Cambridge, MA 02139
Tel: (617) 253-2157

Stuart H. Starr
Director Systems Evaluation
DUSD (C3I), OSD
The Pentagon
Room 3E182
Washington DC 20301
Tel: (202) 695-9229
Maniel Vineberg
Electronics Engineer
Naval Ocean Systems Center
Code 9258
San Diego, CA 92152
Tel: (714) 225-2752

Richard P. Wishner
President
Advanced Information & Decision
Systems
201 San Antonio Circle
Suite 286
Mountain View, CA 94040
Tel: (415) 941-3912
Joseph H. Wack
Advisory Staff
Westinghouse Electric Corporation
P. O. Box 746 MS-237
Baltimore, MD 21203
Tel: (301) 765-3098

Joseph G. Wohl
V. P. Research & Development
ALPHATECH, Inc.
3 New England Executive Park
Burlington, MA 01803
Tel: (617) 273-3388
Jan D. Wald
Senior Research Scientist
Honeywell Inc.
Systems & Research Center
MN 17-2307
P. O. Box 312
Minneapolis, MN 55440
Tel: (612) 378-5018
John M. Wozencraft
Head of C3 Curriculum
Naval Postgraduate School
Code 74
Monterey, CA 93940
Tel: (408) 646-2535
Bruce K. Walker
Professor of Systems Engineering
Case Western Reserve University
Cleveland, OH 44106
Tel: (216) 368-4053

Lotfi A. Zadeh
Professor of Computer Science
University of California
Berkeley, CA 94720
Tel: (415) 526-2569
David White
Advanced Technology, Inc.
2120 San Diego Avenue
Suite 105
San Diego, CA 92110
Tel: (714) 981-9883
Jeffrey E. Wieselthier
Naval Research Laboratory
Code 7521
Washington DC 20375
Tel: (202) 767-2586
I
SURVEILLANCE AND TARGET TRACKING
FOREWORD .....................................................
DATA DEPENDENT ISSUES IN SURVEILLANCE PRODUCT INTEGRATION
Dr. Daniel A. Atkinson ..........................................
MEMORY DETECTION MODELS FOR PHASE-RANDOM OCEAN ACOUSTIC FLUCTUATIONS
Professor Harilaos N. Psaraftis, Mr. Anastassios Perakis, and
Professor Peter N. Mikhalevsky .................................
DETECTION THRESHOLDS FOR MULTI-TARGET TRACKING IN CLUTTER
Dr. Thomas Fortmann, Professor Yaakov Bar-Shalom, and
Dr. Molly Scheffe ...............................................
MULTISENSOR MULTITARGET TRACKING FOR INTERNETTED FIGHTERS
Dr. Christopher L. Bowman .......................................
MARCY: A DATA CLUSTERING AND FUSION ALGORITHM FOR MULTI-TARGET
TRACKING IN OCEAN SURVEILLANCE
TRACKING IN OCEAN SURVEILLANCE
Dr. Michael H. Moore ............................................
AN APOSTERIORI APPROACH TO THE MULTISENSOR CORRELATION OF DISSIMILAR
SOURCES
Dr. Michael M. Kovacich .........................................
A UNIFIED VIEW OF MULTI-OBJECT TRACKING
Drs. Krishna R. Pattipati, Nils R. Sandell, Jr., and
Leslie C. Kramer ................................................
OVERVIEW OF SURVEILLANCE RESEARCH AT M.I.T.
Professor Robert R. Tenney ......................................
A DIFFERENTIAL GAME APPROACH TO DETERMINE PASSIVE TRACKING MANEUVERS
Dr. Paul L. Bongiovanni and Professor Pan-Tai Liu ...............
DESCRIPTION OF AND RESULTS FROM A SURFACE OCEAN SURVEILLANCE
SIMULATION
Drs. Thomas G. Bugenhagen, Bruce Bundsen, and
Lane B. Carpenter .............................................
AN OTH SURVEILLANCE CONCEPT
Drs. Leslie C. Kramer and Nils R. Sandell, Jr. ..................
APPLICATION OF AI METHODOLOGIES TO THE OCEAN SURVEILLANCE
PROBLEM
Drs. Leonard S. Gross, Michael S. Murphy, and
Charles L. Morefield ..........................................
A PLATFORM-TRACK ASSOCIATION PRODUCTION
SUBSYSTEM
Ms. Robin Dillard .............................................
II
SYSTEM ARCHITECTURE AND EVALUATION
FOREWORD ................................................
C3I SYSTEMS EVALUATION PROGRAM
Dr. Stuart H. Starr .........................................
C3 SYSTEM RESEARCH AND EVALUATION: A SURVEY AND ANALYSIS
Dr. David S. Alberts ........................................
THE INTELLIGENCE ANALYST PROBLEM
Dr. Daniel Schutzer .........................................
DERIVATION OF AN INFORMATION PROCESSING SYSTEMS (C3/MIS)
ARCHITECTURAL MODEL -- A MARINE CORPS PERSPECTIVE
Lieutenant Colonel James V. Bronson .........................
A CONCEPTUAL CONTROL MODEL FOR DISCUSSING COMBAT DIRECTION
SYSTEM (C2) ARCHITECTURAL ISSUES
Dr. Timothy Kraft and Mr. Thomas Murphy .....................
EVALUATING THE UTILITY OF JINTACCS MESSAGES
Captain John S. Morrison ....................................
FIRE SUPPORT CONTROL AT THE FIGHTING LEVEL
Mr. Barry L. Reichard .......................................
A PRACTICAL APPLICATION OF MAU IN PROGRAM DEVELOPMENT
Major James R. Hughes ........................................
HIERARCHICAL VALUE ASSESSMENT IN A TASK FORCE DECISION ENVIRONMENT
Dr. Ami Arbel ...............................................
OVER-THE-HORIZON DETECTION, CLASSIFICATION AND TARGETING
(OTH/DC&T) SYSTEM CONCEPT SELECTION USING FUNCTIONAL FLOW
DIAGRAMS
Dr. Glenn E. Mitzel .........................................
A SYSTEMS APPROACH TO COMMAND, CONTROL AND COMMUNICATIONS
SYSTEM DESIGN
Dr. Jay K. Beam and Mr. George D. Halushynsky ...............
MEASURES OF EFFECTIVENESS AND PERFORMANCE FOR YEAR 2000
TACTICAL C3 SYSTEMS
Dr. Djimitri Wiggert ........................................
AN END USER FACILITY (EUF) FOR COMMAND, CONTROL, AND
COMMUNICATIONS (C3)
Drs. Jan D. Wald and Sam R. Hollingsworth ...................
III
COMMUNICATION, DATA BASES & DECISION SUPPORT
FOREWORD ....................................................
RELIABLE BROADCAST ALGORITHMS IN COMMUNICATIONS NETWORK
Professor Adrian Segall .....................................
THE HF INTRA TASK FORCE COMMUNICATION NETWORK DESIGN STUDY
Drs. Dennis Baker, Jeffrey E. Wieselthier, and
Anthony Ephremides .....................................
FAIRNESS IN FLOW CONTROLLED NETWORKS
Professors Mario Gerla and Mark Staskauskas ..................
PERFORMANCE MODELS OF DISTRIBUTED DATABASES
Professor Victor O.-K. Li ...................................
ISSUES IN DATABASE MANAGEMENT SYSTEM COMMUNICATION
Mr. Kuan-Tsae Huang and Professor Wilbur B. Davenport, Jr...
MEASUREMENT OF INTER-NODAL DATA BASE COMMONALITY
Dr. David E. Corman .........................................
MULTITERMINAL RELIABILITY ANALYSIS OF DISTRIBUTED PROCESSING
SYSTEMS
Professors Aksenti Grnarov and Mario Gerla ..................
FAULT TOLERANCE IMPLEMENTATION ISSUES USING CONTEMPORARY
TECHNOLOGY
Professor David Rennels .....................................
APPLICATION OF CURRENT AI TECHNOLOGIES TO C2
Dr. Robert Bechtal ..........................................
A PROTOCOL LEARNING SYSTEM FOR CAPTURING DECISION-MAKER LOGIC
Dr. Robert Bechtal ...........................................
ON USING THE AVAILABLE GENERAL-PURPOSE EXPERT-SYSTEMS PROGRAMS
Dr. Carroll K. Johnson .......................................
C3 THEORY
FOREWORD ....................................................
RATE OF CHANGE OF UNCERTAINTY AS AN INDICATOR OF COMMAND
AND CONTROL EFFECTIVENESS
Mr. Joseph G. Wohl ...........................................
THE ROLE OF TIME IN A COMMAND CONTROL SYSTEM
Dr. Joel S. Lawson, Jr. ......................................
GAMES WITH UNCERTAIN MODELS
Dr. David Castanon .........................................
INFORMATION PROCESSING IN MAN-MACHINE SYSTEMS
Dr. Prodip Sen and Professor Rudolph F. Drenick ..............
MODELING THE INTERACTING DECISION MAKER WITH BOUNDED RATIONALITY
Mr. Kevin L. Boettcher and Dr. Alexander H. Levis ............
DECISION AIDING -- AN ANALYTIC AND EXPERIMENTAL STUDY IN A
MULTI-TASK SELECTION PARADIGM
Professor David L. Kleinman and Drs. Eric P. Soulsby and
Krishna R. Pattipati ...............................
FUZZY PROBABILITIES AND THEIR ROLE IN DECISION ANALYSIS
Professor Lotfi A. Zadeh ....................................
COMMAND, CONTROL AND COMMUNICATIONS (C3) SYSTEMS MODEL
AND MEASURES OF EFFECTIVENESS (MOE's)
Drs. Scot Harmon and Robert Brandenburg ......................
THE EXPERT TEAM OF EXPERTS APPROACH TO C2 ORGANIZATIONS
Professor Michael Athans ........................
A CASE STUDY OF DISTRIBUTED DECISION MAKING
Professor Robert R. Tenney ...................................
ANALYSIS OF NAVAL COMMAND STRUCTURES
Drs. John R. Delaney, Nils R. Sandell, Jr., Leslie C. Kramer,
and Professors Robert R. Tenney and Michael Athans ...........
MODELING OF AIR FORCE COMMAND AND CONTROL SYSTEMS
Dr. Gregory S. Lauer, Professor Robert R. Tenney, and
Dr. Nils R. Sandell, Jr. .....................................
A FRAMEWORK FOR THE DESIGN OF SURVIVABLE DISTRIBUTED SYSTEMS
-- PART I: COMMUNICATION SYSTEMS
Professors Marc Buchner and Victor Matula; presented
by Professor Kenneth Loparo ..................................
A FRAMEWORK FOR THE DESIGN OF SURVIVABLE DISTRIBUTED SYSTEMS
-- PART II: CONTROL AND INFORMATION STRUCTURE
Professors Kenneth Loparo, Bruce Walker and Bruce Griffiths ..
CONTROL SYSTEM FUNCTIONALIZATION OF C3 SYSTEMS VIA TWO-LEVEL
DYNAMICAL HIERARCHICAL SYSTEMS (DYHIS)
Professor Peter P. Groumpos ..................................
SEQUENTIAL LINEAR OPTIMIZATION & THE REDISTRIBUTION OF ASSETS
Lt. Colonel Anthony Feit and Professor John M. Wozencraft ....
C3 AND WAR GAMES -- A NEW APPROACH
Dr. Alfred G. Brandstein .....................................