XXII. SENSORY AIDS RESEARCH
Prof. S. J. Mason
D. A. Cahlander
E. S. Davis
J. Dupress
W. G. Kellner
D. G. Kocher
W. B. Macurdy
M. A. Pilla
M. M. Schiffman
D. E. Troxel
A. AN INFORMATIONAL ANALYSIS OF DISCRETE HUMAN COMMUNICATION SYSTEMS
1. Introduction
An important problem confronting the designer of a sensory aid concerns the comparison of one sensory aid with another. One reasonable procedure is to separate the problem of extracting information from the real world from that of transmitting this information to the handicapped person. In this report we shall focus our attention on the second problem. In our analysis we shall consider a system that is capable of presenting L distinct stimuli, x_i, to which there correspond uniquely L distinct responses, y_j. Such a system, consisting of a stimulator and a human receiver, will be termed a discrete human communication system. It is assumed that we know the first-order stimulus probabilities, P(x_i), and that successive stimuli are statistically independent.
To proceed we must know something about the relation of the response ensemble to the stimulus ensemble. One way of obtaining this information is to conduct an experiment in which successive stimuli are drawn from a known probability distribution and the responses that are elicited from the human receiver are tabulated in a confusion matrix.
The rows of this matrix correspond to the stimuli presented, and the columns, to the responses that are evoked. A response of y_j to a stimulus of x_i is recorded by indexing the ij term of the confusion matrix up by one. Then, at the conclusion of our experiment, the ij term of the confusion matrix divided by the total number of stimuli presented is an estimate of the joint probability P(x_i, y_j). Thus we can construct an estimated conditional probability matrix, P(y_j|x_i), from the confusion matrix. We shall take this conditional probability matrix to be a description of the channel consisting of the stimulator system and the human receiver. It is important to note that this channel matrix is not necessarily independent of the input stimulus probability distribution. However, for the present, let us fix the input distribution and consider the resultant channel matrix.
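As an illustration of this tabulation, the following sketch (not from the original report; the tallies shown are hypothetical) indicates how a confusion matrix of counts yields the estimated joint probabilities and the estimated channel matrix P(y_j|x_i):

```python
import numpy as np

# Hypothetical tallies: confusion[i, j] counts responses y_j to stimulus x_i.
confusion = np.array([
    [90,  0,  5,  5],
    [ 5, 85,  5,  5],
    [ 0, 10, 90,  0],
    [ 5,  5,  0, 90],
], dtype=float)

n_trials = confusion.sum()                        # total stimuli presented
joint_est = confusion / n_trials                  # estimate of P(x_i, y_j)
row_totals = confusion.sum(axis=1, keepdims=True)
channel_est = confusion / row_totals              # estimate of P(y_j | x_i)
input_dist = row_totals.flatten() / n_trials      # estimate of P(x_i)
```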
The channels with which we are concerned differ from those ordinarily treated by the methods of information theory in that they contain human receivers.
Obvious complications spring to mind: humans have memory, have time-variant characteristics, and so forth. But all is not lost. Some significant simplifications are made possible by the inclusion of humans in the system. Perhaps the most important simplification is that a workable system must be characterized by a fairly low over-all probability of error (say, less than 20 per cent). Humans characteristically tend to arrange things, in this case stimuli, in some natural grouping. Thus, if there are L-1 possible errors that can be made when a particular stimulus is presented, a human is not very likely to make a large number of different erroneous choices. Instead, he is more likely to cluster his responses within a group that corresponds to stimuli that are somewhat similar to the presented stimulus.

This work was supported in part by the National Institutes of Health (Grant MH-04737-02); in part by the National Science Foundation (Grant G-16526); and in part by the U.S. Air Force (Electronic Systems Division) under Contract AF19(628)-258.
The channel matrix and associated input distribution represent a description of the performance of the communication system. However, it is a rather unwieldy description if one is interested in evaluating performance under variations of stimulator parameters or for different human receivers. A much more manageable situation results if one reduces the channel description to a single number. There are many ways to accomplish this data reduction, but here we choose to calculate the average mutual information shared between the stimulus and response ensembles.¹ Unfortunately, it is very difficult to evaluate mutual information for the large channel sizes (for example, L = 64) that are common in discrete human communication systems.
A straightforward way of getting around the computational difficulties is to use modern computers.
However, it is both inconvenient and expensive to run computer programs for day-to-day plotting of learning curves. Also, the large number of significant figures resulting from computer calculations tends to be somewhat misleading. One of the dangers of using a number is the propensity to put too much faith in its accuracy.
It should be remembered at this point that the channel matrix, on which the information calculation is based, is but an estimate of the "true" channel matrix. Accordingly, it makes little sense to grind out the mutual information to 10 decimal places, or (in most cases) even to 3.
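For reference, the straightforward computation that such a program would carry out is sketched below (a minimal Python illustration, not code from the report); it evaluates I(X;Y) in bits directly from the channel matrix and the input distribution:

```python
import numpy as np

def mutual_information(channel, p_x):
    """Exact I(X;Y) in bits for a channel matrix channel[i, j] = P(y_j | x_i)
    (rows = stimuli) and an input distribution p_x[i] = P(x_i)."""
    p_x = np.asarray(p_x, dtype=float)
    joint = p_x[:, None] * np.asarray(channel, dtype=float)   # P(x_i, y_j)
    p_y = joint.sum(axis=0)                                    # P(y_j)
    denom = p_x[:, None] * p_y[None, :]                        # P(x_i) P(y_j)
    nz = joint > 0                                             # skip 0 log 0 terms
    return float(np.sum(joint[nz] * np.log2(joint[nz] / denom[nz])))
```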
With this in mind, it is then reasonable to try to develop a simplified computational procedure that will enable one to arrive at an approximate information transfer (mutual information) with an error in the neighborhood of that inherent in the test data.
We have derived the following upper and lower bounds, which use the simplifications afforded by the inclusion of humans in the system. Following this discussion of the bounds is a proposed model channel that is used to estimate the information transfer of a discrete human communication system.
2. Derivation of a Lower Bound
We can express the information transfer as
I(X;Y) = H(X) - H(X|Y).    (1)
We wish to consider the probability of error in the bounds; thus it is necessary to define a decoding scheme. We shall consider only maximum mutual-information decoding, which is equivalent to maximum-likelihood decoding:

I(x;y) = \log \frac{P(y|x)}{P(y)}.    (2)
We can now define a probability of error

P(e) = \sum_k P(x_k) \sum_{j \neq k} P(y_j|x_k) = \sum_j P(y_j) P(e|y_j),    (3)

where

P(e|y_j) = 1 - P(x_j|y_j) = \sum_{k \neq j} P(x_k|y_j).    (4)
It is convenient to define the entropy

H(e) = -P(e) \log P(e) - [1-P(e)] \log [1-P(e)]    (5)

for the binary choice, error and no error, and the corresponding conditional entropy

H(e|Y) = \sum_j P(y_j) H(e|y_j),    (6)

with

H(e|y_j) = -P(e|y_j) \log P(e|y_j) - [1-P(e|y_j)] \log [1-P(e|y_j)].    (7)
We shall also define C_j as the number of nonzero off-diagonal conditional probabilities, P(y_j|x_i), i \neq j, in column j, and C_max as the maximum of C_j over all columns.
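These definitions translate directly into a short computation; the sketch below (illustrative code with hypothetical names, not from the report; logs base 2, so entropies are in bits) returns P(e), H(e), H(e|Y), and C_max as defined in Eqs. 3-7 and above, assuming every response occurs with nonzero probability:

```python
import numpy as np

def error_quantities(channel, p_x):
    """channel[i, j] = P(y_j | x_i); p_x[i] = P(x_i)."""
    channel = np.asarray(channel, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    joint = p_x[:, None] * channel                  # P(x_i, y_j)
    p_y = joint.sum(axis=0)                         # P(y_j), assumed > 0
    post = joint / p_y                              # P(x_i | y_j); columns sum to 1
    p_err_given_y = 1.0 - np.diag(post)             # P(e | y_j), Eq. 4
    p_err = float(np.sum(p_y * p_err_given_y))      # P(e), Eq. 3

    def h2(q):                                      # binary entropy, Eqs. 5 and 7
        q = np.clip(q, 1e-15, 1 - 1e-15)
        return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    h_e = float(h2(p_err))                                   # H(e), Eq. 5
    h_e_given_Y = float(np.sum(p_y * h2(p_err_given_y)))     # H(e|Y), Eq. 6
    off_diag = channel - np.diag(np.diag(channel))
    c_j = (off_diag > 0).sum(axis=0)                # nonzero off-diagonal terms per column
    return p_err, h_e, h_e_given_Y, int(c_j.max())  # last value is C_max
```

These quantities are exactly the ingredients of the bounds derived below.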
THEOREM 1: The mutual information satisfies the inequality

I(X;Y) \geq H(X) - H(e|Y) - P(e) \log C_{max} \geq H(X) - H(e) - P(e) \log C_{max}.    (8)
PROOF 1: H(X|Y) is, by definition,

H(X|Y) = \sum_j P(y_j) H(X|y_j),    (9)

with

H(X|y_j) = -\sum_k P(x_k|y_j) \log P(x_k|y_j),    (10)

the equivocation when the point y_j of the space Y is given. This equivocation can be rewritten as
H(X|y_j) = -P(x_j|y_j) \log P(x_j|y_j) - \sum_{k \neq j} P(x_k|y_j) \log P(x_k|y_j)
         = -P(x_j|y_j) \log P(x_j|y_j) - [1-P(x_j|y_j)] \log [1-P(x_j|y_j)]
           - \sum_{k \neq j} P(x_k|y_j) \log \frac{P(x_k|y_j)}{1-P(x_j|y_j)}.    (11)

Applying Eqs. 4 and 7, we have

H(X|y_j) = H(e|y_j) - \sum_{k \neq j} P(x_k|y_j) \log \frac{P(x_k|y_j)}{1-P(x_j|y_j)}
         = H(e|y_j) + P(e|y_j) \sum_{k \neq j} \frac{P(x_k|y_j)}{P(e|y_j)} \log \frac{P(e|y_j)}{P(x_k|y_j)}.    (12)
The summation on the right-hand side of Eq. 12 is recognized as the entropy of an ensemble consisting of C_j points, and thus its value cannot exceed \log C_j. It follows that
H(X|y_j) \leq H(e|y_j) + P(e|y_j) \log C_j.    (13)
Since

\log C_{max} \geq \log C_j,    (14)

we have

H(X|y_j) \leq H(e|y_j) + P(e|y_j) \log C_{max}.    (15)

Averaging over the ensemble Y, we have

H(X|Y) \leq H(e|Y) + P(e) \log C_{max}.    (16)
Substitution of inequality (16) in Eq. 1 and use of the fact that

H(e|Y) \leq H(e)    (17)

yields the inequality (8), which was to be proved.
Q.E.D.
In particular, if the probability distribution of the L inputs is uniform, we have an easy bound to calculate:

I(X;Y) \geq \log L - H(e) - P(e) \log C_{max}.    (18)
The equality sign in (13) holds only when

P(x_k|y_j) = \frac{P(e|y_j)}{C_j} \text{ or zero, for all } k \neq j.    (19)

For the equality sign to hold in (15) and (16) we must have C_j = C_{max} for all j, in addition to (19).
EXAMPLE 1: Take P(x_k) = 1/L for all k and P(y_j|x_k) = p for j = k. Let C_j = C_{max} for all j, and let P(y_j|x_k) = (1-p)/C_{max} wherever it is nonzero for j \neq k. Now

H(X) = \log L

and

P(y_j) = \frac{1}{L} \left[ p + C_{max} \frac{1-p}{C_{max}} \right] = \frac{1}{L},

thus H(Y) = \log L. Also,

H(XY) = -\sum_{j=1}^{L} \left[ \frac{p}{L} \log \frac{p}{L} + C_{max} \frac{1-p}{L C_{max}} \log \frac{1-p}{L C_{max}} \right]
      = -p \log p - (1-p) \log (1-p) + \log L + (1-p) \log C_{max}.

Since P(e) = 1 - p, we have

H(XY) = H(e) + \log L + P(e) \log C_{max}.

Thus

I(X;Y) = H(X) + H(Y) - H(XY) = \log L - H(e) - P(e) \log C_{max},

which is the lower bound. A sample channel matrix of order L = 4 is shown below. Note that this is a special case of a doubly uniform channel.
                 Y
        .9    0    .05   .05
   X   .05   .9    .05    0         Uniform input distribution
        0    .05   .9    .05
       .05   .05    0    .9

I(X;Y) = \log 4 - H(.1) - .1 \log 2 = 1.431 bits.
EXAMPLE 2: Consider the following channel.

                 Y
        .9    0    .05   .05
   X   .05   .85   .05   .05        Uniform input distribution
        0    .1    .9     0
       .05   .05    0    .9

Here C_{max} = 2. Also, P(e) = 1 - \frac{1}{4} [3(.9) + .85] = 0.1125, so that

I \geq \log 4 - H(.1125) - .1125 \log 2,

or

I \geq 1.380 bits.

Calculation of the actual mutual information yields I = 1.386 bits.
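The numbers quoted in this example can be checked with a few lines of code (an illustrative sketch, not part of the report), which evaluates the lower bound of Eq. 18 and the exact mutual information for this channel under a uniform input, with logs base 2:

```python
import numpy as np

channel = np.array([[.90, .00, .05, .05],
                    [.05, .85, .05, .05],
                    [.00, .10, .90, .00],
                    [.05, .05, .00, .90]])
L = 4
p_x = np.full(L, 1.0 / L)

# Exact mutual information.
joint = p_x[:, None] * channel
p_y = joint.sum(axis=0)
nz = joint > 0
I_exact = np.sum(joint[nz] * np.log2(joint[nz] / np.outer(p_x, p_y)[nz]))

# Lower bound of Eq. 18.
p_err = 1.0 - np.sum(p_x * np.diag(channel))                      # 0.1125
H_e = -p_err * np.log2(p_err) - (1 - p_err) * np.log2(1 - p_err)
off = channel - np.diag(np.diag(channel))
C_max = (off > 0).sum(axis=0).max()                               # 2
lower = np.log2(L) - H_e - p_err * np.log2(C_max)

print(round(lower, 3), round(I_exact, 3))   # approximately 1.380 and 1.386 bits
```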
3. Derivation of an Upper Bound
THEOREM 2: The mutual information satisfies the inequality

I(X;Y) \leq \log L - \sum_i P(x_i) H(p_{ii}),    (20)

where L is the order of the channel and

H(p_{ii}) = -P(y_i|x_i) \log P(y_i|x_i) - [1-P(y_i|x_i)] \log [1-P(y_i|x_i)].    (21)
PROOF 2: We can express the information transfer as

I(X;Y) = H(Y) - H(Y|X).

We always have, of course, the fact that

H(Y) \leq \log L.    (22)

Now,

H(Y|X) = \sum_i P(x_i) H(Y|x_i)

and

H(Y|x_i) = -\sum_j P(y_j|x_i) \log P(y_j|x_i)
         \geq -P(y_i|x_i) \log P(y_i|x_i) - [1-P(y_i|x_i)] \log [1-P(y_i|x_i)] = H(p_{ii}).    (23)
Thus we have
H(Y|X) \geq \sum_i P(x_i) H(p_{ii}).    (24)
Substitution of (22) and (24) in the relation I(X;Y) = H(Y) - H(Y|X) yields the inequality (20), which was to be proved.
Q.E.D.
The equality signs in (23) and (24) hold only when there is only one nonzero off-diagonal conditional probability in each row. Obviously, the equality sign in (22) holds only when the output distribution P(y_j) is uniform.
EXAMPLE 3: Consider the channel treated in Example 1. The upper bound is

I \leq \log L - \frac{1}{L} \sum_{i=1}^{L} H(p)

or

I \leq \log L - H(p) = \log L - H(e).

The difference between this upper bound and the actual mutual information is simply the term P(e) \log C_{max} = .1 bit for the sample 4 X 4 channel, so that

I \leq \log 4 - H(.1),

or

I \leq 1.531 bits.
EXAMPLE 4: Consider the channel treated in Example 2. The upper bound is

I \leq \log 4 - \frac{3}{4} H(.1) - \frac{1}{4} H(.15),

or

I \leq 1.496 bits.
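The upper bound of Eq. 20 is equally easy to evaluate; the following sketch (illustrative only, not code from the report) reproduces the figures quoted in Examples 3 and 4 for the two sample channels tabulated above, assuming a uniform input and logs base 2:

```python
import numpy as np

def upper_bound(channel, p_x):
    """Upper bound of Eq. 20: log L - sum_i P(x_i) H(p_ii), in bits."""
    diag = np.diag(channel)
    h_pii = -diag * np.log2(diag) - (1 - diag) * np.log2(1 - diag)
    return np.log2(len(p_x)) - np.sum(p_x * h_pii)

example1 = np.array([[.90, .00, .05, .05],
                     [.05, .90, .05, .00],
                     [.00, .05, .90, .05],
                     [.05, .05, .00, .90]])
example2 = np.array([[.90, .00, .05, .05],
                     [.05, .85, .05, .05],
                     [.00, .10, .90, .00],
                     [.05, .05, .00, .90]])
uniform = np.full(4, 0.25)
print(round(upper_bound(example1, uniform), 3))   # approximately 1.531 bits
print(round(upper_bound(example2, uniform), 3))   # approximately 1.496 bits
```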
4. Proposed Model Channel
A first-order approximation to a channel describing a discrete human communication system is shown in Fig. XXII-1. Each of the major diagonal terms of this conditional probability matrix is taken to be equal to p. The off-diagonal terms in each row may be either zero or a constant. If r_i is defined as the number of nonzero off-diagonal terms in the ith row, then this constant is (1-p)/r_i. The order of the matrix is L, so that we have 1 \leq r_i \leq L - 1. No assumptions have been made about the probability of any particular off-diagonal term being zero. The zeros inserted in Fig. XXII-1 serve solely as an example. The probability of error for this model channel is 1 - p when maximum-likelihood decoding is used.

Fig. XXII-1. Model channel. [Figure: the L x L conditional probability matrix, with stimulus X along the rows and response Y along the columns; diagonal entries are p and off-diagonal entries are either 0 or (1-p)/r_i.]
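For concreteness, a model channel of this form can be generated from p and a pattern of allowed confusions, as in the following sketch (illustrative code, not from the report; the function name model_channel and the mask argument are hypothetical):

```python
import numpy as np

def model_channel(p, allowed):
    """Build the model channel of Fig. XXII-1.  p is the common diagonal
    probability; allowed[i, j] (i != j) marks off-diagonal entries permitted
    to be nonzero.  Each such entry in row i is (1 - p) / r_i, where r_i is
    the count of allowed entries in row i (1 <= r_i <= L - 1 assumed)."""
    allowed = np.asarray(allowed, dtype=bool).copy()
    np.fill_diagonal(allowed, False)          # r_i counts off-diagonal terms only
    r = allowed.sum(axis=1)
    channel = np.where(allowed, ((1.0 - p) / r)[:, None], 0.0)
    np.fill_diagonal(channel, p)
    return channel                            # each row sums to 1

# Hypothetical use: the zero pattern of the Example 2 channel with p = 0.9.
mask = np.array([[0, 0, 1, 1],
                 [1, 0, 1, 1],
                 [0, 1, 0, 0],
                 [1, 1, 0, 0]], dtype=bool)
print(model_channel(0.9, mask))
```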
A simple estimate for the information transfer with uniform input probability distribution will be based on the following upper bound.
THEOREM 3: The information transfer for the model channel shown in Fig. XXII-1, with uniform input probability distribution, satisfies the following inequality:

I(X;Y) \leq \log L - H(e) - \frac{P(e)}{L} \sum_i \log r_i.    (25)

PROOF 3: The mutual information can be expressed as

I(X;Y) = H(X) + H(Y) - H(XY).    (26)

The input distribution is uniform. We have

H(X) = \log L    (27)

and, of course, we always have

H(Y) \leq \log L.

H(XY) will now be evaluated as the negative sum over the whole matrix of

\frac{p_{ij}}{L} \log \frac{p_{ij}}{L}.
Consider, first, the sum on row i:

S_i = -\sum_j \frac{p_{ij}}{L} \log \frac{p_{ij}}{L}
    = -\frac{p}{L} \log \frac{p}{L} - r_i \frac{1-p}{L r_i} \log \frac{1-p}{L r_i}
    = -\frac{p}{L} \log \frac{p}{L} - \frac{1-p}{L} \log \frac{1-p}{L} + \frac{1-p}{L} \log r_i.

Now

H(XY) = \sum_i S_i = -p \log \frac{p}{L} - (1-p) \log \frac{1-p}{L} + \frac{1-p}{L} \sum_i \log r_i.

As P(e) = 1 - p, we have

H(XY) = \log L + H(e) + \frac{P(e)}{L} \sum_i \log r_i.    (28)
Substitution of (27), (22), and (28) in (26) yields the desired inequality (25), which was to be proved.
Q.E.D.
The inequality sign is necessary solely because of Eq. 22. If the total probability of error is small (say less than 10-15 per cent), then one can reasonably expect that p(y) will be fairly independent of y. H(Y) does not vary rapidly as p(y) departs from a uniform distribution; thus one can reasonably expect that the bound (Eq. 25) is rather tight.
On the basis of Theorem 3, the following formula is proposed as an estimate of the information transfer for a channel describing a discrete human communication system when the input stimulus distribution is uniform:

\hat{I}(X;Y) = \log L - H(e) - \frac{P(e)}{L} \sum_i \log r_i.    (29)
EXAMPLE 5: Consider the channel treated in Example 1, for which r_i = C_{max}. The estimate is

\hat{I}(X;Y) = \log L - H(e) - P(e) \log C_{max}.

This is identical with the lower bound, which is the actual information transferred. Thus, for the sample channel of order four, we have \hat{I} = 1.43 bits per stimulus.
EXAMPLE 6: Consider the channel treated in Example 2. The estimate is

\hat{I}(X;Y) = \log 4 - H(.1125) - \frac{.1125}{4} (\log 2 + \log 3 + \log 1 + \log 2) = 1.39 bits per stimulus.
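The estimate of Eq. 29 is easily programmed; the sketch below (illustrative only, not from the report) applies it to the Example 2 channel, obtaining P(e) from the diagonal under a uniform input, and reproduces the 1.39-bit figure:

```python
import numpy as np

channel = np.array([[.90, .00, .05, .05],
                    [.05, .85, .05, .05],
                    [.00, .10, .90, .00],
                    [.05, .05, .00, .90]])
L = 4
p_err = 1.0 - np.mean(np.diag(channel))                    # 0.1125 for uniform input
H_e = -p_err * np.log2(p_err) - (1 - p_err) * np.log2(1 - p_err)
off = channel - np.diag(np.diag(channel))
r = (off > 0).sum(axis=1)                                  # nonzero off-diagonal terms per row
I_hat = np.log2(L) - H_e - (p_err / L) * np.log2(r).sum()  # Eq. 29
print(round(I_hat, 2))                                     # approximately 1.39 bits per stimulus
```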
5. Experimental Results
As we have shown, the primary purpose of the bounds that have been derived, and of the proposed estimate, is to circumvent the laborious computations that are inherent in the calculation of mutual information. The price to be paid for a simplified estimate (Eq. 29) is uncertainty about its accuracy. One indication of accuracy is obtained by calculating the upper and lower bounds. The worst possible percentage of error of the estimate is given by

E = 100 \frac{\text{estimate} - \text{lower bound}}{\text{lower bound}},

or

E = 100 \frac{P(e) \left[ \log C_{max} - \frac{1}{L} \sum_i \log r_i \right]}{\log L - H(e) - P(e) \log C_{max}}.    (31)

It follows that

E \leq E_{max} = 100 \frac{P(e) \log C_{max}}{\log L - H(e) - P(e) \log C_{max}}.    (32)

This represents an easily calculable upper bound to the error. When a nonuniform input distribution is used, a reasonable estimate of the mutual information is obtained by choosing the mid-point between the bounds.
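As a numerical illustration (a sketch, not from the report), E_max of Eq. 32 for the Example 2 channel under a uniform input works out as follows:

```python
import numpy as np

L, p_err, C_max = 4, 0.1125, 2                          # Example 2 quantities
H_e = -p_err * np.log2(p_err) - (1 - p_err) * np.log2(1 - p_err)
lower = np.log2(L) - H_e - p_err * np.log2(C_max)       # lower bound, about 1.380 bits
E_max = 100 * p_err * np.log2(C_max) / lower            # Eq. 32
print(round(E_max, 1))                                  # roughly 8.2 per cent
```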
Some feeling for the "tightness" of the bounds and the accuracy of the estimate can be achieved by inspection of Figs. XXII-2 and XXII-3, in which the bounds, estimates, and actual mutual information are compared for various error probabilities. The stimuli for the smaller channel (L=5) are pulsed sine waves of different frequencies in a background of white noise, while the stimuli for the larger channel (L=16) consist of patterns of poke probes presented to the right index finger.
Results for a channel of order L = 64, for which the stimuli consist of poke probes to the right index finger, are: measured probability of error = 0.095; upper bound = 5.58 bits; actual information = 5.36 bits; estimate = 5.36 bits; and lower bound = 5.23 bits.
The percentage of error of the estimate with respect to the actual information is -0.15 per cent. The confusion matrix for this channel is a composite one formed by adding confusion matrices for 4 different human receivers.

Fig. XXII-2. Uniform input distribution (L=5). [Figure: upper bound, actual information, estimate, and lower bound plotted against measured probability of error.]

Fig. XXII-3. Uniform input distribution (L=16). [Figure: upper bound, actual information, estimate, and lower bound plotted against measured probability of error.]
D. E. Troxel
References
1. For a discussion of the applicability of the mutual-information measure to various psychophysical test situations, see F. Attneave, Applications of Information Theory to Psychology (Henry Holt and Company, New York, 1959), Chapter 4.