Towards a Cognitively-Based Analytic Model of Human Control of Swarms

Formal Verification and Modeling in Human-Machine Systems: Papers from the AAAI Spring Symposium
Behzad Tabibian and Michael Lewis
University of Pittsburgh, 4200 Fifth Ave., Pittsburgh, Pennsylvania, US

Christian Lebiere
Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, Pennsylvania, US

Nilanjan Chakraborty, Katia Sycara, and Stefano Bennati
Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, Pennsylvania, US

Meeko Oishi
University of New Mexico, 1 University of New Mexico, Albuquerque, New Mexico, US

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract
Robotic swarms are nonlinear dynamical systems that benefit from the presence of human operators in realistic missions with changing goals and constraints. There has been recent interest in safe operation of robotic swarms under human control. Verification and validation techniques for these human-machine systems could be deployed to provide formal guarantees of safe performance. A current limitation that hinders practically significant applications of verification to human-machine systems is the lack of analytic models of the human operator that include realistic cognitive constraints. Our research aims to develop high fidelity analytic human models via abstraction of cognitive models based on ACT-R. In this paper, we report results from the first step in this process. We designed a 2-choice control task of a robotic swarm, obtained data from human operators, and compared the fit of these data to two analytic models and an ACT-R based cognitive model. We present the experimental results and discuss our future plans for how the analytic model will be derived from the cognitive model so that the whole human-swarm system can be amenable to verification techniques.

Introduction
This paper describes initial research towards developing high fidelity models of humans. Such models could be abstracted to provide analytic human models so that the whole human-automation system can be amenable to formal verification techniques. Formal verification, in turn, can provide guarantees of safe system behavior.

Current computational cognitive architectures such as ACT-R (Anderson and Lebiere 1998) and Soar (Newell 1992) are complex but able to replicate the details of human cognitive performance. As implemented theories of cognition, these programs instantiate details of our cognitive architecture that allow prediction of behavior in novel situations and for untested parameters. This generality, however, comes at the cost of complex and expensive computations. A saying attributed to Einstein is that everything should be as simple as it can be, but not simpler. The complexity of today's cognitive models reflects this dictum. That complexity, however, makes cognitive models unsuitable for many applications that require fidelity only in overt behavior but need analytic descriptions to integrate with other models. Engineering models of human performance provide such lightweight analytical models that roughly approximate human behavior. However, current engineering models are typically restricted to the task for which they were designed and do not perform well even for very similar tasks. The Optimal Control Model (OCM) (Kleinman, Baron, and Levison 1970) is a good example of an engineering model. The OCM is based on the premise of bounded rationality and contends that an operator performing a manual control task, such as stabilizing a helicopter, will perform optimally subject to lags in perception and action and noise in observations and muscular responses. The OCM fits compensatory tracking data quite well (Kleinman, Baron, and Levison 1970) but is markedly poorer at describing pursuit tracking, where preview leading to human anticipation can contaminate responses. Rouse (1980) catalogs engineering models describing a variety of human tasks using observer, control, queuing, and production system models. While modeling the task, as engineering models do, provides a good first-order approximation of what a human (or any other decision maker) must do to satisfy the constraints of a task, engineering models fail to capture the idiosyncrasies and limitations of human information processing that cause it to depart from optimality. Engineering models, for example, are typically static, while human performance reflects changing levels of activation, memory decay, learning from experience, fatigue, etc.

This paper describes initial steps to develop engineering models of human performance that reflect both task constraints and cognitive architecture constraints, based on the ACT-R architecture. We evaluate the validity of an ACT-R cognitive model in a task of human control of robot swarms. This work is the first step towards developing analytic models of humans that are based on high fidelity cognitive models, so as to enable the overall human-machine system to be mathematically verifiable while also including, by construction, realistic cognitive constraints and abilities.

In the reported study, human data from a 2-choice control task were collected using Mechanical Turk. Two engineering models, one based on a classifier (static) and the other on reinforcement learning (dynamic), were fitted to the human data. Another model was developed in ACT-R. The engineering models and the cognitive model are compared in their fit to the human data. Our results show that the ACT-R model outperformed both engineering models in terms of its ability to fit the human data.
Experiment Design
We studied the learning and performance behavior of human
participants in a human-swarm interaction task in which a
subject controls simulated robots through a web interface.
The interface provided information about the current state
of the swarm and two strategies, rendezvous and deploy, implemented as in (Bullo, Cortés, and Martinez 2009). The human subjects could not control individual robots directly but
could control the swarm as a whole (and thus control individual swarm members indirectly) to achieve the performance goal.
The performance objective of the task was to maximize
the explored area of the simulated environment using the
robots. In each trial a subject chose one possible strategy,
rendezvous or deploy, to cover as much of the map as possible. The deploy strategy caused the robots to distribute themselves to cover the environment, whereas rendezvous caused
the robots to converge to a single location. At the end of each
trial the coverage score of the swarm was reported visually
and numerically. The coverage score is the total area covered by the robots in the swarm. Figures 1 and 2 show screenshots of the experiment before and after the chosen control strategy was selected and executed.
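For concreteness, the following is a minimal sketch of how a coverage score of this kind could be computed, assuming each robot covers a fixed-radius disk and the map is discretized into a grid; the sensor radius, grid resolution, and robot count are illustrative assumptions, not values taken from the experiment.

```python
import numpy as np

def coverage_score(robot_positions, sensor_radius=20.0, bound=140, resolution=2.0):
    """Approximate covered area as the union of fixed-radius disks around each
    robot, evaluated on a discretized grid. Radius and resolution are
    illustrative assumptions, not values from the experiment."""
    axis = np.arange(-bound, bound + resolution, resolution)
    xs, ys = np.meshgrid(axis, axis)               # grid cell centers
    covered = np.zeros_like(xs, dtype=bool)
    for rx, ry in robot_positions:
        covered |= (xs - rx) ** 2 + (ys - ry) ** 2 <= sensor_radius ** 2
    return covered.sum() * resolution ** 2         # covered area in map units

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    robots = rng.uniform(-140, 140, size=(20, 2))  # 20 robots at random positions
    print(f"covered area: {coverage_score(robots):.0f}")
```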
Parameter                Value
Environment size         [-140, 140]
Variance Range           [10, 210]
Training Trials Count    10
Repeated Trials Count    10
Main Trials Count        40

Table 1: Summary of experiment design parameters
Robots operated in an environment with obstacles, and obstacles and robots were distributed differently in each trial. The presence of obstacles in the environment made the task of predicting swarm behavior difficult for participants.

The experiment was conducted on a web site. Participants from Mechanical Turk were presented with the instructions on the homepage. After starting the experiment they went through 10 training trials in which they became familiar with the environment, the task, and the methods of controlling the swarm. During these 10 trials, the participants picked a strategy and observed its effect in terms of the covered area. Additionally, for each training trial, the subjects were also shown the effect of the other strategy (the one they did not choose) from the same initial state of the swarm. After these 10 training trials, and for the rest of the experiment, the participants were allowed to pick only one of the strategies for each trial and were shown the effects of only the strategy they picked after each trial. At the end of the experiment they were presented with the total coverage across all trials and then redirected back to Mechanical Turk with a code to submit in order to redeem their fee.
Robots and obstacles were sampled from two different bivariate Gaussian distributions. The means of these two distributions were uniformly sampled from the range [-140, 140], which is the size of the environment, and the variances were uniformly sampled from [10, 210]. Finally, the training trials were presented again at the end of the experiment to observe whether subjects changed their choices after interacting with the experiment long enough to improve their performance. A summary of the experiment design is presented in Table 1. Figure 3 shows the distribution of means across all trials.
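As a rough illustration of this trial-generation procedure, the sketch below samples one trial configuration using the parameter ranges in Table 1; the use of a diagonal covariance, the clipping to the map bounds, and the robot and obstacle counts are assumptions for illustration only.

```python
import numpy as np

def generate_trial(n_robots=20, n_obstacles=20, rng=None):
    """Sample one trial: cluster means uniform in [-140, 140], per-axis variances
    uniform in [10, 210] (Table 1); counts and diagonal covariance are assumptions."""
    rng = rng or np.random.default_rng()

    def sample_cluster(n):
        mean = rng.uniform(-140, 140, size=2)      # bivariate Gaussian mean
        var = rng.uniform(10, 210, size=2)         # per-axis variance
        pts = rng.normal(mean, np.sqrt(var), size=(n, 2))
        return mean, np.clip(pts, -140, 140)       # keep points on the map (assumption)

    robot_mean, robots = sample_cluster(n_robots)
    obstacle_mean, obstacles = sample_cluster(n_obstacles)
    return {"robots": robots, "obstacles": obstacles,
            "robot_mean": robot_mean, "obstacle_mean": obstacle_mean}

if __name__ == "__main__":
    trial = generate_trial(rng=np.random.default_rng(1))
    print(trial["robot_mean"], trial["obstacle_mean"])
```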
Figure 1: Initial condition of a trial with two choices to control the swarm.
Engineering Models
Reinforcement Learning
We first modeled our setup as a reinforcement learning problem (Sutton and Barto 1998). Unlike the static classifier described in the next subsection, the reinforcement learning model is dynamic: it adjusts its choice between the rendezvous and deploy strategies trial by trial based on the coverage outcomes it has observed so far. Results of the performance of the reinforcement learning model are given in Figure 5a; as we show later, this model also could not produce results better than or similar to the human data.
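Since the precise reinforcement learning formulation is not detailed here, the following is only an illustrative sketch of a simple, context-free value-learning scheme over the two strategies; the learning rate, initial values, and greedy choice rule are assumptions rather than the model actually used.

```python
class StrategyLearner:
    """Running value estimate for each strategy with a greedy choice rule.
    Learning rate and initial values are illustrative assumptions."""

    def __init__(self, alpha=0.1, init_value=0.5):
        self.values = {"deploy": init_value, "rendezvous": init_value}
        self.alpha = alpha

    def choose(self):
        # deterministic, greedy choice between the two strategies
        return max(self.values, key=self.values.get)

    def update(self, strategy, coverage):
        # move the estimate toward the observed coverage (fraction of the map)
        self.values[strategy] += self.alpha * (coverage - self.values[strategy])


if __name__ == "__main__":
    learner = StrategyLearner()
    for coverage in (0.62, 0.55, 0.70):   # coverage feedback from three trials
        s = learner.choose()
        learner.update(s, coverage)
        print(s, learner.values)
```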
Figure 2: The shaded area indicates the covered area after the chosen strategy has been executed.
Figure 3: Xs show the distribution of swarm means and Os show the distribution of obstacle means.
Logistic Regression
We also modeled our setup as an instance of a classification problem and used a logistic regression model (Bishop and Nasrabadi 2006) to build a classifier. The logistic regression model requires complete data, i.e. coverage scores for both choices at each trial. The original experiment does not provide complete information to subjects after each trial, and thus it is not possible to directly compare the results of this model and the human subject data, since the model has more information than the humans. Logistic regression was used in an incremental fashion: the model tried to predict the choice of strategy for the next trial based on the history of trials it had already seen. Results of the performance of the logistic regression model are given in Figure 5b. However, as we show later, even with this extra information this model cannot produce results better than or similar to the human data.
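A minimal sketch of this incremental use of logistic regression is shown below; the per-trial features and the data are synthetic placeholders, and the label is simply which strategy achieved the higher coverage, which is the complete data the model requires.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def incremental_predictions(features, deploy_cov, rendezvous_cov, warmup=10):
    """Refit a logistic regression on all trials seen so far before predicting the
    next one. Labels need 'complete data': 1 when deploy out-covered rendezvous."""
    labels = (deploy_cov > rendezvous_cov).astype(int)
    preds = []
    for t in range(warmup, len(labels)):
        history = labels[:t]
        if len(set(history)) < 2:                         # guard: need both classes to fit
            preds.append(int(history.mean() >= 0.5))
            continue
        model = LogisticRegression().fit(features[:t], history)
        preds.append(int(model.predict(features[t:t + 1])[0]))
    return np.array(preds), labels[warmup:]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(60, 2))                          # e.g. robot centrality and dispersion
    d, r = rng.uniform(0, 1, 60), rng.uniform(0, 1, 60)   # synthetic coverage scores
    pred, true = incremental_predictions(X, d, r)
    print("accuracy:", float((pred == true).mean()))
```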
Figure 4: ACT-R Cognitive Architecture
ACT-R model
ACT-R (Anderson and Lebiere 1998) is a neurally-inspired
cognitive architecture that aims to reproduce both human capabilities and human performance limitations. Its structure,
Figure 4, reflects the organization of the human brain and
a commitment to mapping architecture modules onto brain
areas using neural imaging techniques (Anderson 2007).
Its modular, distributed nature facilitates parallel processing, with synchronization accomplished through a central
procedural module that coordinates information flow between the other modules. Communication between modules is accomplished through limited-capacity buffers that
can only hold one chunk of information at a time. Similarly, each module is itself massively parallel but can only
perform a single operation at a time. For instance, only a single declarative memory retrieval can be performed at a time, based on a pattern request received from the procedural module, but all memory chunks are matched in parallel, with the best-matching one returned into the retrieval buffer. These operations combine symbolic (propositional) representations with subsymbolic (statistical) processes. For instance, a chunk is retrieved from memory that not only matches the requested symbolic description but also reflects factors such as recency and frequency of reinforcement, priming from related items, and semantic similarity to the requested pattern. Learning mechanisms operate both to create symbolic structures such as declarative chunks and production rules and to constantly tune the subsymbolic parameters controlling their selection according to statistical factors reflecting their pattern of use and presence in the environment.
The cognitive model for this task follows the instance-based learning (IBL) methodology (Gonzalez, Lerch, and Lebiere 2003). The main precept of IBL modeling is that people base their decisions primarily and directly on their experience of the task. In IBL, experience is stored in the form of decision instances that directly reflect the interaction with the task environment. Decision instances typically take the form of a memory chunk including a context component, which encodes the characteristics and features of a trial that are taken into account when making the decision, the possible choice(s) or action(s) that can be taken, and the outcome(s) resulting from the decision(s). Decisions are then made by encoding the current situation, matching it to the context of past decision chunks in declarative memory to retrieve the most similar past experience(s), and then using those experiences to evaluate the current choices and make the best decision. This key process of retrieving relevant experience(s) from memory is directly constrained and supported by the cognitive architecture.
While the central process of memory retrieval is constrained and enabled by cognitive primitives, a number of additional specifications have to be included in the model. Those include which features to include in the representation of the decision context, which action(s) can be taken, and how to represent the outcome of the action. We attempted to make these choices in as straightforward a manner as possible according to the no-magic doctrine (Anderson and Lebiere 1998), which recommends avoiding complex model assumptions unsupported by direct evidence. The actions that can be taken are directly specified in the experiment in the form of the deploy and rendezvous strategies. While one could simply represent the outcome in terms of which strategy performed best on each training trial, that information is not available for the testing trials since the participants can only try one of the two strategies and can never be certain whether the other might have been better. Instead, what is directly available, and what is encoded in the decision instances, is the percentage of coverage of the option taken (both options in the case of the training trials).

Finally, the most difficult issue is how to represent the decision context. While ACT-R includes a vision module used to mediate perception of the environment, it mostly embodies a theory of attention that does not strongly constrain how to process a display such as the one used in this task, and in particular how to represent aggregate perceptual information such as the relative distribution of robots and obstacles. We computed five features that together specify the task-relevant characteristics of those distributions: the centrality and dispersion of the robot and obstacle distributions, and the distance between the two distribution centers. We computed correlations between those features and the coverage of each strategy. The dominant feature was the dispersion of the robot distribution, which correlated strongly (greater than 0.6) with the success of the rendezvous strategy (and negatively with the deploy strategy). The second most significant feature (correlation about 0.3) was the centrality of the robot distribution. To keep the representation in accordance with traditional working memory constraints, we limited the context representation to these two features. This decision is compatible with informal reports indicating that participants primarily focused on the robot distribution and did not pay much attention to the role of obstacles.

A final decision was whether to represent both actions and their outcomes together or as part of separate chunks. To avoid duplication of context, we decided to represent them together in a single chunk. A decision instance was therefore composed of four slots: the centrality and dispersion of the robot distribution, and the percent coverage of the rendezvous and deploy strategies (the strategies themselves did not need to be represented explicitly). A new decision instance chunk was created for each training trial. For the test trials, after encoding the context features, a memory retrieval was requested, matching those features and requesting values for the missing slots, namely the percentage of coverage of the rendezvous and deploy strategies. The degree of match of a chunk is a function of the semantic similarity between the supplied pattern, i.e. the context features, and the actual corresponding content of the chunks. We encoded both centrality and dispersion in a [0,1] interval and used a linear similarity with a scaling factor (mismatch penalty) of 2.5. A traditional memory retrieval would retrieve a single decision instance and use the specific coverage achieved in that instance. Instead, we used the blending retrieval mechanism (Lebiere 1999), which retrieves an aggregate value that minimizes the discrepancy with the values retrieved for each chunk, weighted by the probability of retrieving that chunk as specified by a Boltzmann (softmax) distribution (noise value of 0.5, corresponding to a temperature parameter of 0.7). Using linear similarities for the coverage values as well, this is equivalent to a weighted averaging procedure over the various decision instances according to their degree of match. The decision procedure is simply to choose the strategy with the best predicted coverage.
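The following is a minimal sketch of the partial matching and blending computation described above, using the reported mismatch penalty (2.5), activation noise (0.5), and temperature (0.7); base-level activation, spreading activation, and other ACT-R mechanisms are omitted, so this is an approximation of the mechanism rather than the actual ACT-R implementation.

```python
import numpy as np

def blended_coverage(instances, probe, mismatch_penalty=2.5, noise_s=0.5,
                     temperature=0.7, rng=None):
    """Similarity-based partial matching plus blending over decision instances.
    Each instance: {'context': (centrality, dispersion), 'deploy': c1, 'rendezvous': c2}.
    Base-level activation and other ACT-R mechanisms are omitted (simplification)."""
    rng = rng or np.random.default_rng()
    probe = np.asarray(probe, dtype=float)
    contexts = np.array([inst["context"] for inst in instances], dtype=float)
    # linear similarity: 0 when features match exactly, more negative as they differ
    similarity = -np.abs(contexts - probe).sum(axis=1)
    activation = mismatch_penalty * similarity + rng.logistic(scale=noise_s, size=len(instances))
    probs = np.exp(activation / temperature)
    probs /= probs.sum()                               # Boltzmann retrieval probabilities
    return {s: float(np.dot(probs, [inst[s] for inst in instances]))
            for s in ("deploy", "rendezvous")}

def choose_strategy(instances, probe):
    estimates = blended_coverage(instances, probe)     # blended coverage expectations
    return max(estimates, key=estimates.get), estimates

if __name__ == "__main__":
    memory = [{"context": (0.2, 0.8), "deploy": 0.55, "rendezvous": 0.30},
              {"context": (0.7, 0.2), "deploy": 0.35, "rendezvous": 0.60}]
    print(choose_strategy(memory, probe=(0.6, 0.3)))
```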
Results
Fifty participants were recruited through Mechanical Turk and performed the experiment; 48 subjects finished all trials. The solid line in Figure 6a shows the trend in the number of correct answers of the human subjects over 50 trials (the first 10 training trials are discarded). Figures 5a and 5b show the performance of the reinforcement learning and logistic regression models, respectively. It can be seen that these two models could not make a good prediction of human behavior. In particular, Figure 5b shows that even with the extra information given to the logistic regression model, it could not produce results that were better than or similar to those of the humans.
Since memory retrieval in the ACT-R model is stochastic because of activation noise, 1000 Monte Carlo runs of the model were collected and averaged. For actual strategy
choice decisions, the performance of the model matches well
that of the human participants. Figure 6a displays the percentage of correct answers of ACT-R against human subject
data as a function of trial number, excluding the initial ten
training trials. Model performance for most trials increases
with practice from about 70% to slightly over 80%. This reflects the accumulation of instances to gradually cover the
decision space and make better decisions from more similar
examples. Meanwhile, some trials result in much lower correct performance, some as low as about 20%, but most in the
40-60% range. All these trends can also be observed in the
human data.
A strong determinant of correct decisions is the difference
in coverage between the two control strategies. If the strategy choices lead to widely different results, then the choice
decision is usually a pretty easy one. Conversely, if the coverage expectations for the two choices are very close, then
even small differences in estimation can lead to a different,
and possibly wrong, choice. Figure 6b displays the percent
of correct answers as a function of the difference in coverage expressed as percentage of the map, binned in intervals
of 3% from 0% to 18%. Again, the same trends present in
human data are reproduced. The main trend is a significant but limited correlation between the difference in coverage and correct choice. For barely noticeable differences in coverage (less than 1.5%), performance is scarcely better than random, at slightly over 50%. It rises to about 60% correct for about a 5% difference, then fluctuates between 70% and 80% for differences greater than 7.5%. As for humans, while larger coverage differences translate to more correct choices, they do not lead to perfect performance or anything close, presumably because of unmodeled factors.

Figure 5: (a) Reinforcement learning algorithm for the human-swarm interaction task. (b) Logistic regression model. Since the algorithm is fully deterministic, trials have a success of 0 or 1; to show the trend, trials are binned in steps of 5 and performance is averaged within each bin.

Figure 6: (a) The solid line represents the performance of 48 participants in the experiment over 50 trials. The dashed line represents ACT-R model predictions. Each point indicates the average number of correct answers for that trial. (b) Percentage of correct answers as a function of the difference in performance of the two strategies.
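The binned analysis behind Figure 6b can be reproduced along the following lines; the data generated here are synthetic placeholders, not the experimental data.

```python
import numpy as np

def accuracy_by_coverage_gap(correct, coverage_gap, bin_width=3.0, max_gap=18.0):
    """Bin trials by the difference in coverage between the two strategies
    (expressed as a percentage of the map) and compute percent correct per bin,
    mirroring the analysis behind Figure 6b."""
    edges = np.arange(0.0, max_gap + bin_width, bin_width)
    bins = np.digitize(coverage_gap, edges) - 1
    return [float(np.mean(correct[bins == b])) if np.any(bins == b) else np.nan
            for b in range(len(edges) - 1)]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    gap = rng.uniform(0, 18, 500)                # synthetic coverage differences (%)
    # synthetic choices: easier trials (larger gaps) are answered correctly more often
    correct = rng.random(500) < 0.5 + gap / 40.0
    print(np.round(accuracy_by_coverage_gap(correct, gap), 2))
```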
Conclusion and Future Work
In this paper we conducted a set of human experiments in control of robotic swarms. We compared the fit to the human data obtained from the experiment of two well-known engineering models of data analysis and classification, namely reinforcement learning and logistic regression. We found that the two engineering models not only failed to capture important features of the human data, e.g. performance improvement due to learning, but for the most part had worse performance than the humans. We also compared the human performance data to a cognitive model based on ACT-R and found that this cognitive model captured the human data very well.

Our future work has two broad thrusts. First, we will work to further improve the fit of the ACT-R model to the human data, and second, we will use data obtained from simulations with the cognitive model to develop analytic models that have high fidelity with respect to human performance. This will enable the integration of high fidelity human models into human-automation systems so that the integrated system is amenable to formal verification techniques that can guarantee safe performance.

With respect to the first thrust, future work involves exploring additional features of the ACT-R model in order to improve the detailed fit on a trial-by-trial basis. The base-level learning mechanism varies the activation of memory chunks according to the power law of practice and the power law of recency. The latter in particular would have the effect of making recent decision instances more salient in the retrieval process, and thus increase their effect on coverage expectations. For instance, a particularly successful use of a strategy could lead to its increased selection in following trials. Also, rather than having the modeler select the decision features, it would be possible to include a feature selection mechanism that relies on the production utility learning mechanism to decide which features to encode when perceiving a trial, according to how much each feature contributes to successful decisions. This process might lead to poorer performance in the initial trials as the model searches through the set of possible features.
With respect to the second thrust, we plan to use the cognitive model as a proxy for the human operator and run simulations to produce the decisions made by the model as a function of operator cognitive state and cognitive limitations. The evolution of the cognitive state can be thought of as a k-dimensional discrete-time signal (the time-trajectory of the activation levels of the different memory chunks through various decision cycles) in response to particular inputs. Figure 7 is a schematic illustration of our approach. The interface shows the multirobot system to be controlled with two actions, "Deploy" and "Rendezvous". The interface also shows the percentage coverage obtained by the action. As stated before, an IBL memory chunk in ACT-R consists of a representation of the context and the quality of the control actions. In this setting, the context involves the centrality and dispersion of the robots, variables x1 and x2 in Figure 7. The memory chunk also contains a representation of the decisions made or actions taken and the resulting outcome, the variables x3 and x4, which denote the coverage for Rendezvous and Deploy, respectively.
Environment and system observations change the activation levels of the memory chunks in the cognitive model, thus reflecting the system state as observed by the operator. Figure 7 shows three memory chunks, A1, A2, and A3, corresponding to three different instances of the problem. The time-trajectory of the activation levels of the memory chunks is clustered to produce the Markov model. In the figure, an event corresponds to a pattern of memory chunk activation levels. In this case the clustering generates three events (S1, S2, and S3) and three corresponding states in a Markov model. In general there can be k events that correspond to the consistent states of the system as observed by the operator.
Figure 7: Process of abstraction of the cognitive model into an analytical model

The proposed method has the advantage that it is easy to generate data with cognitive models (rather than using exclusively human experiments). Although the data will be generated for particular scenarios, since the Markov models are built from abstract cognitive states, the model will be a generalization of the particular scenarios seen. Since cognitive constraints like attentional limitations and working memory are already part of the ACT-R model and the traces generated by the cognitive model respect these constraints, the Markov model that will be generated conforms to human cognitive constraints by construction. Furthermore, the number of states in our approach is relatively insensitive to the problem size (in terms of the number of robots, environment complexity, etc.).
Acknowledgments
This work is funded by NSF awards CNS1329986,
CNS1329762 and CNS1329878.
References
Anderson, J. R., and Lebiere, C. 1998. The atomic components of thought. Psychology Press.
Anderson, J. R. 2007. How can the human mind occur in the physical universe? Oxford University Press.
Bishop, C. M., and Nasrabadi, N. M. 2006. Pattern recognition and machine learning, volume 1. Springer New York.
Bullo, F.; Cortés, J.; and Martinez, S. 2009. Distributed control of robotic networks: a mathematical approach to motion coordination algorithms. Princeton University Press.
Gonzalez, C.; Lerch, J. F.; and Lebiere, C. 2003. Instance-based learning in dynamic decision making. Cognitive Science 27(4):591-635.
Kleinman, D.; Baron, S.; and Levison, W. 1970. An optimal control model of human response part I: Theory and validation. Automatica 6(3):357-369.
Newell, A. 1992. Unified theories of cognition and the role of Soar. In Soar: A cognitive architecture in perspective. Springer. 25-79.
Rouse, W. B. 1980. Systems engineering models of human-machine interaction. North Holland, Oxford.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement learning: An introduction (adaptive computation and machine learning). The MIT Press.