Formal Verification and Modeling in Human-Machine Systems: Papers from the AAAI Spring Symposium

Towards a Cognitively-Based Analytic Model of Human Control of Swarms

Behzad Tabibian, Michael Lewis (University of Pittsburgh, 4200 Fifth Ave., Pittsburgh, Pennsylvania, US); Christian Lebiere, Nilanjan Chakraborty, Katia Sycara, Stefano Bennati (Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, Pennsylvania, US); Meeko Oishi (University of New Mexico, 1 University of New Mexico, Albuquerque, New Mexico, US)

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Robotic swarms are nonlinear dynamical systems that benefit from the presence of human operators in realistic missions with changing goals and constraints. There has been recent interest in safe operation of robotic swarms under human control.
Verification and validation techniques for these human-machine systems could be deployed to provide formal guarantees of safe performance. A current limitation that hinders practically significant applications of verification to human-machine systems is the lack of analytic models of the human operator that include realistic cognitive constraints. Our research aims to develop high-fidelity analytic human models via abstraction of cognitive models based on ACT-R. In this paper, we report results from the first step in this process. We designed a 2-choice control task for a robotic swarm, obtained data from human operators, and compared the fit of these data to two analytic models and an ACT-R based cognitive model. We present the experimental results and discuss our future plans for deriving the analytic model from the cognitive model so that the whole human-swarm system becomes amenable to verification techniques.

Introduction

This paper describes initial research towards developing high-fidelity models of humans. Such models could be abstracted to provide analytic human models so that the whole human-automation system becomes amenable to formal verification techniques. Formal verification, in turn, can provide guarantees of safe system behavior. Current computational cognitive architectures such as ACT-R (Anderson and Lebiere 1998) and SOAR (Newell 1992) are complex but able to replicate the details of human cognitive performance. As implemented theories of cognition, these programs instantiate details of our cognitive architecture that allow prediction of behavior in novel situations and for untested parameters. This generality, however, comes at the cost of complex and expensive computations. A saying attributed to Einstein is that everything should be as simple as it can be, but not simpler. The complexity of today's cognitive models reflects this dictum. That complexity, however, makes cognitive models unsuitable for many applications that require fidelity only in overt behavior but need analytic descriptions to integrate with other models.

Engineering models of human performance provide such lightweight analytical models that roughly approximate human behavior. However, current engineering models are typically restricted to the task for which they were designed and do not perform well even for very similar tasks. The Optimal Control Model (OCM) (Kleinman, Baron, and Levison 1970) is a good example of an engineering model. The OCM is based on the premise of bounded rationality and contends that an operator performing a manual control task, such as stabilizing a helicopter, will perform optimally subject to lags in perception and action and noise in observations and muscular responses. The OCM fits compensatory tracking data quite well (Kleinman, Baron, and Levison 1970) but is markedly poorer at describing pursuit tracking, where preview leading to human anticipation can contaminate responses. Rouse (1980) catalogs engineering models describing a variety of human tasks using observer, control, queuing, and production system models. While modeling the task, as engineering models do, provides a good first-order approximation of what a human (or any other decision maker) must do to satisfy the constraints of a task, such models fail to capture the idiosyncrasies and limitations of human information processing that cause it to depart from optimality. Engineering models, for example, are typically static, while human performance reflects changing levels of activation, memory decay, learning from experience, fatigue, and so on.

This paper describes initial steps to develop engineering models of human performance that reflect both task constraints and cognitive architecture constraints based on the ACT-R architecture. We evaluate the validity of an ACT-R cognitive model in a task of human control of robot swarms. This work is a first step towards developing analytic models of humans that are based on high-fidelity cognitive models, so that the overall human-machine system can be mathematically verifiable while including, by construction, realistic cognitive constraints and abilities. In the reported study, human data from a 2-choice control task were collected using Mechanical Turk. Two engineering models, one based on a classifier (static) and the other on reinforcement learning (dynamic), were fitted to the human data. Another model was developed in ACT-R. The engineering models and the cognitive model are compared in their fit to the human data. Our results show that the ACT-R model outperformed both engineering models in terms of its ability to fit the human data.

Experiment Design

We studied the learning and performance behavior of human participants in a human-swarm interaction task in which a subject controls simulated robots through a web interface. The interface provided information about the current state of the swarm and two strategies, rendezvous and deploy, implemented as in (Bullo, Cortés, and Martinez 2009). The human subjects could not control individual robots directly but could control the swarm as a whole (and thus control individual swarm members indirectly) to achieve the performance goal. The performance objective of the task was to maximize the explored area of the simulated environment using the robots. In each trial a subject chose one possible strategy, rendezvous or deploy, to cover as much of the map as possible. The deploy strategy caused the robots to distribute themselves to cover the environment, whereas rendezvous caused the robots to converge to a single location. At the end of each trial the coverage score of the swarm was reported visually and numerically. The coverage score is the total area covered by the robots in the swarm. Figures 1 and 2 show screen shots of the experiment before and after the choice and execution of the selected control strategy.

Figure 1: Initial condition of a trial with two choices to control the swarm.

Figure 2: The shaded area indicates the area covered after the chosen strategy has been executed.
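The paper does not specify how the covered area is computed. As a rough, hypothetical illustration, the sketch below approximates a coverage score by discretizing the map and counting cells within sensing range of at least one robot; the disk-shaped sensing footprint and its radius are assumptions, not details from the paper.

```python
import numpy as np

def coverage_score(robot_xy, env_half=140.0, sensing_radius=20.0, cells=280):
    """Fraction of grid cells within sensing range of at least one robot.
    The sensing model and radius are illustrative assumptions."""
    xs = np.linspace(-env_half, env_half, cells)
    grid_x, grid_y = np.meshgrid(xs, xs)              # (cells, cells) map grid
    covered = np.zeros_like(grid_x, dtype=bool)
    for rx, ry in robot_xy:
        covered |= (grid_x - rx) ** 2 + (grid_y - ry) ** 2 <= sensing_radius ** 2
    return covered.mean()                              # fraction of the map covered

# A dispersed swarm typically covers more of the map than a clustered one.
rng = np.random.default_rng(0)
deployed = rng.uniform(-120, 120, size=(20, 2))
clustered = rng.normal(0, 10, size=(20, 2))
print(coverage_score(deployed), coverage_score(clustered))
```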
Robots operate in an environment with obstacles. Obstacles and robots were distributed differently in each trial. The presence of obstacles in the environment made the task of predicting swarm behavior difficult for participants.

The experiment was conducted on a web site. Participants from Mechanical Turk were presented with the instructions on the homepage. After starting the experiment they went through 10 training trials in which they became familiar with the environment, the task, and the methods of controlling the swarm. During these 10 trials, the participants picked a strategy and observed its effect in terms of the covered area. Additionally, for each training trial, the subjects were also shown the effect of the other strategy (the one they did not choose) from the same initial state of the swarm. After these 10 training trials, and for the rest of the experiment, the participants were allowed to pick only one of the strategies for each trial and were shown the effect of only the strategy they picked after each trial. At the end of the experiment they were presented with the total coverage across all trials and then redirected back to Mechanical Turk with a code to submit in order to redeem their fee.

Robots and obstacles were sampled from two different bivariate Gaussian distributions. The means of these two distributions were uniformly sampled from the range [-140, 140], which is the size of the environment. The variances were also uniformly sampled from [10, 210]. Finally, the training trials were presented again at the end of the experiment to observe whether subjects changed their choices after interacting with the experiment long enough to improve their performance. A summary of the experiment design is presented in Table 1. Figure 3 shows the distribution of means across all trials.

Table 1: Summary of experiment design parameters

Parameter               Value
Environment size        [-140, 140]
Variance range          [10, 210]
Training trials count   10
Repeated trials count   10
Main trials count       40

Figure 3: Xs show the distribution of swarm means and Os show the distribution of obstacle means.
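As a concrete illustration of this sampling scheme, the sketch below generates one trial's robot and obstacle positions using the ranges from Table 1; the numbers of robots and obstacles are placeholders, since the paper does not report them.

```python
import numpy as np

def sample_trial(n_robots=20, n_obstacles=15, seed=None):
    """Sample robot and obstacle positions for one trial.
    Means are drawn uniformly from [-140, 140] and variances from [10, 210],
    as in Table 1; the counts of robots and obstacles are assumed."""
    rng = np.random.default_rng(seed)

    def sample_cluster(n):
        mean = rng.uniform(-140, 140, size=2)   # cluster centre
        var = rng.uniform(10, 210, size=2)      # per-axis variance
        cov = np.diag(var)                      # independent x/y spread
        return rng.multivariate_normal(mean, cov, size=n), mean

    robots, robot_mean = sample_cluster(n_robots)
    obstacles, obstacle_mean = sample_cluster(n_obstacles)
    return robots, obstacles, robot_mean, obstacle_mean

robots, obstacles, r_mean, o_mean = sample_trial(seed=1)
print(r_mean, o_mean)
```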
Engineering Models

Reinforcement Learning

The first engineering model treats strategy selection as a reinforcement learning problem (Sutton and Barto 1998): a dynamic model whose strategy preferences are updated from the coverage outcomes of the trials it has experienced. Its performance on the task is shown in Figure 5a.

Logistic Regression

We also modeled our setup as an instance of a classification problem, using a logistic regression model (Bishop and Nasrabadi 2006) as the classifier. The logistic regression model requires complete data, i.e., coverage scores for both choices at each trial. The original experiment does not provide complete information to the subjects after each trial, so it is not possible to compare the results of this model directly with the human subject data, since the model has more information than the humans. Logistic regression was used in an incremental fashion; in other words, the model tried to predict the choice of strategy for the next trial based on the history of trials it had already seen. Results for the logistic regression model are given in Figure 5b. As we show later, even with this extra information the model could not produce results better than or similar to the human data.
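The following is a minimal sketch of the incremental scheme described above, using scikit-learn; the feature representation of each trial and the refit-per-trial loop are assumptions about details the text leaves open.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def incremental_accuracy(features, best_choice):
    """Predict each trial's better strategy from the trials seen so far.

    features    : (n_trials, n_features) context descriptors per trial
    best_choice : (n_trials,) 0 = rendezvous better, 1 = deploy better
                  (complete information, which human subjects did not have)
    """
    correct = []
    for t in range(1, len(best_choice)):
        history_X, history_y = features[:t], best_choice[:t]
        if len(np.unique(history_y)) < 2:
            # Cannot fit a classifier on a single class; fall back to the
            # only choice seen so far.
            pred = history_y[-1]
        else:
            clf = LogisticRegression().fit(history_X, history_y)
            pred = clf.predict(features[t:t + 1])[0]
        correct.append(int(pred == best_choice[t]))
    return np.array(correct)   # 0/1 success per trial, as plotted in Figure 5b
```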
ACT-R model

ACT-R (Anderson and Lebiere 1998) is a neurally-inspired cognitive architecture that aims to reproduce both human capabilities and human performance limitations. Its structure (Figure 4) reflects the organization of the human brain and a commitment to mapping architecture modules onto brain areas using neural imaging techniques (Anderson 2007). Its modular, distributed nature facilitates parallel processing, with synchronization accomplished through a central procedural module that coordinates information flow between the other modules. Communication between modules is accomplished through limited-capacity buffers that can hold only one chunk of information at a time. Similarly, each module is itself massively parallel but can perform only a single operation at a time. For instance, only a single declarative memory retrieval can be performed at a time, based on a pattern request received from the procedural module, but all memory chunks are matched in parallel, with the best-matching one returned into the retrieval buffer. These operations combine symbolic (propositional) representations with subsymbolic (statistical) processes. For instance, the chunk retrieved from memory not only matches the requested symbolic description but also reflects factors such as recency and frequency of reinforcement, priming from related items, and semantic similarity to the requested pattern. Learning mechanisms operate both to create symbolic structures such as declarative chunks and production rules and to constantly tune the subsymbolic parameters controlling their selection according to statistical factors reflecting their pattern of use and presence in the environment.

Figure 4: ACT-R Cognitive Architecture

The cognitive model for this task follows the instance-based learning (IBL) methodology (Gonzalez, Lerch, and Lebiere 2003). The main precept of IBL modeling is that people primarily base their decisions directly on their experience of the task. In IBL, experience is stored in the form of decision instances that directly reflect the interaction with the task environment. Decision instances typically take the form of a memory chunk including a context component, which encodes the characteristics and features of a trial that are taken into account when making the decision, the possible choice(s) or action(s) that can be taken, and the outcome(s) resulting from the decision(s). Decisions are then made by encoding the current situation, matching it to the context of past decision chunks in declarative memory to retrieve the most similar past experience(s), and then using these to evaluate the current choices and make the best decision. This key process of retrieving relevant experience(s) from memory is directly constrained and supported by the cognitive architecture.

While the central process of memory retrieval is constrained and enabled by cognitive primitives, a number of additional specifications have to be included in the model. These include which features to include in the representation of the decision context, which action(s) can be taken, and how to represent the outcome of an action. We attempted to make these choices in as straightforward a manner as possible, according to the no-magic doctrine (Anderson and Lebiere 1998), which recommends avoiding complex model assumptions unsupported by direct evidence. The actions that can be taken are directly specified in the experiment in the form of the deploy and rendezvous strategies. While one could simply represent the outcome in terms of which strategy performed best on each training trial, that information is not available for the testing trials, since the participants can try only one of the two strategies and can never be certain whether the other might have been better. Instead, what is directly available, and what is encoded in the decision instances, is the percentage of coverage of the option taken (both options in the case of the training trials).

Finally, the most difficult issue is how to represent the decision context. While ACT-R includes a vision module used to mediate perception of the environment, it mostly embodies a theory of attention that does not strongly constrain how to process a display such as the one used in this task, and in particular how to represent aggregate perceptual information such as the relative distribution of robots and obstacles. We computed five features that together specify the task-relevant characteristics of those distributions: the centrality and dispersion of the robot and obstacle distributions, and the distance between the two distribution centers. We computed correlations between those features and the coverage of each strategy. The dominant feature was the dispersion of the robot distribution, which correlated strongly (greater than 0.6) with the success of the rendezvous strategy (and negatively with the deploy strategy). The second most significant feature (correlation about 0.3) was the centrality of the robot distribution. To keep the representation in accordance with traditional working memory constraints, we limited the context representation to these two features. This decision is compatible with informal reports indicating that participants primarily focused on the robot distribution and did not pay much attention to the role of obstacles.
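The paper does not give formulas for centrality and dispersion. The sketch below shows one plausible reading, under our own assumptions: centrality as the distance of a cluster's centroid from the map centre and dispersion as the mean distance of members from their centroid, both rescaled towards the [0, 1] interval used by the model.

```python
import numpy as np

ENV_HALF = 140.0                        # environment spans [-140, 140]
MAX_DIST = np.sqrt(2) * 2 * ENV_HALF    # largest distance between two corners

def context_features(robots, obstacles):
    """Five candidate features of a trial; the model keeps only the
    robot centrality and robot dispersion in the decision context."""
    def centrality(points):
        # distance of the cluster centroid from the map centre, roughly in [0, 1]
        return np.linalg.norm(points.mean(axis=0)) / (np.sqrt(2) * ENV_HALF)

    def dispersion(points):
        # mean distance of points from their centroid, roughly in [0, 1]
        centroid = points.mean(axis=0)
        return np.linalg.norm(points - centroid, axis=1).mean() / (np.sqrt(2) * ENV_HALF)

    centre_distance = np.linalg.norm(robots.mean(axis=0) - obstacles.mean(axis=0)) / MAX_DIST
    return {
        "robot_centrality": centrality(robots),
        "robot_dispersion": dispersion(robots),
        "obstacle_centrality": centrality(obstacles),
        "obstacle_dispersion": dispersion(obstacles),
        "centre_distance": centre_distance,
    }
```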
A final decision was whether to represent both actions and their outcomes together or as parts of separate chunks. To avoid duplication of context, we decided to represent them together in a single chunk. A decision instance was therefore composed of four slots: the centrality and dispersion of the robot distribution, and the percent coverage of the rendezvous and deploy strategies (the strategies themselves did not need to be represented explicitly). A new decision instance chunk was created for each training trial. For the test trials, after encoding the context features, a memory retrieval was requested, matching those features and requesting values for the missing slots, namely the percentage of coverage of the rendezvous and deploy strategies. The degree of match of a chunk is a function of the semantic similarity between the pattern supplied, i.e., the context features, and the actual corresponding content of the chunk. We encoded both centrality and dispersion on a [0, 1] interval and used a linear similarity with a scaling factor (mismatch penalty) of 2.5. A traditional memory retrieval would retrieve a single decision instance and use the specific coverage achieved in that instance. Instead, we used the blending retrieval mechanism (Lebiere, 1999), which retrieves an aggregate value that minimizes the discrepancy with the values stored in each chunk, weighted by the probability of retrieving that chunk as specified by a Boltzmann (softmax) distribution (noise value of 0.5, corresponding to a temperature parameter of 0.7). Using linear similarities for the coverage values as well, this is equivalent to a weighted averaging procedure over the various decision instances according to their degree of match. The decision procedure is simply to choose the strategy with the best predicted coverage.
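Below is a minimal sketch of the blending computation just described, assuming activations that reflect only partial-match penalties plus noise (base-level learning is switched off here, and Gaussian noise stands in for ACT-R's logistic noise). With linear similarities over coverage values, the blended value reduces to a probability-weighted average, and the choice rule picks the strategy with the higher blended coverage.

```python
import numpy as np

MISMATCH_PENALTY = 2.5    # similarity scaling factor reported in the text
NOISE_S = 0.5             # activation noise; temperature ~ 0.7 (sqrt(2) * s)
TEMPERATURE = np.sqrt(2) * NOISE_S

def blended_coverage(instances, context, rng):
    """Blend stored coverage values of past decision instances.

    instances : list of dicts with keys 'centrality', 'dispersion',
                'rendezvous_cov', 'deploy_cov' (all in [0, 1])
    context   : dict with keys 'centrality', 'dispersion' for the current trial
    Returns the blended (expected) coverage for each strategy.
    """
    # Linear similarity: 0 for a perfect match, increasingly negative with mismatch.
    def match_score(inst):
        return -MISMATCH_PENALTY * (abs(inst["centrality"] - context["centrality"])
                                    + abs(inst["dispersion"] - context["dispersion"]))

    activations = np.array([match_score(i) for i in instances], dtype=float)
    activations += rng.normal(0.0, NOISE_S, size=len(instances))   # activation noise

    weights = np.exp(activations / TEMPERATURE)                    # Boltzmann weights
    weights /= weights.sum()

    # With linear similarities over coverage values, blending reduces to a
    # probability-weighted average of the stored outcomes.
    return {
        "rendezvous": float(np.dot(weights, [i["rendezvous_cov"] for i in instances])),
        "deploy": float(np.dot(weights, [i["deploy_cov"] for i in instances])),
    }

# Decision rule: choose the strategy with the higher blended coverage.
```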
Results

Fifty participants were recruited through Mechanical Turk and performed the experiment; 48 subjects finished all trials. The solid line in Figure 6a shows the trend in the number of correct answers of the human subjects over 50 trials (the initial ten training trials are discarded).

Figures 5a and 5b show the performance of the reinforcement learning and logistic regression models, respectively. It can be seen that these two models could not make good predictions of human behavior. In particular, Figure 5b shows that even with the extra information given to the logistic regression model, it could not produce results that were better than or similar to those of the humans.

Figure 5: (a) Reinforcement learning algorithm for the human-swarm interaction task. (b) Logistic regression model. Since the algorithm is fully deterministic, trials have a success of 0 or 1; to demonstrate the trend, trials are binned in steps of 5 and performance is averaged within each bin.

Since memory retrieval in the ACT-R model is stochastic because of the activation noise, 1000 Monte Carlo runs of the model were collected and averaged. For actual strategy choice decisions, the performance of the model matches that of the human participants well. Figure 6a displays the percentage of correct answers of ACT-R against the human subject data as a function of trial number, excluding the initial ten training trials. Model performance for most trials increases with practice from about 70% to slightly over 80%. This reflects the accumulation of instances that gradually cover the decision space and support better decisions from more similar examples. Meanwhile, some trials result in much lower correct performance, some as low as about 20%, but most in the 40-60% range. All these trends can also be observed in the human data.

A strong determinant of correct decisions is the difference in coverage between the two control strategies. If the strategy choices lead to widely different results, then the choice decision is usually a fairly easy one. Conversely, if the coverage expectations for the two choices are very close, then even small differences in estimation can lead to a different, and possibly wrong, choice. Figure 6b displays the percentage of correct answers as a function of the difference in coverage expressed as a percentage of the map, binned in intervals of 3% from 0% to 18%. Again, the same trends present in the human data are reproduced. The main trend is a significant, but limited, correlation between difference in coverage and correct choice. For barely noticeable differences in coverage (less than 1.5%), performance is barely better than random at slightly over 50%. It rises to about 60% correct for about a 5% difference, then fluctuates between 70% and 80% for differences greater than 7.5%. As for humans, while larger coverage differences translate to more correct choices, they do not lead to perfect performance or anything close to it, presumably because of unmodeled factors.

Figure 6: (a) The solid line represents the performance of 48 participants in the experiment over 50 trials; the dashed line represents ACT-R model predictions. Each point indicates the average number of correct answers for that trial. (b) Percentage of correct answers as a function of the difference in performance of the two strategies.

Conclusion and Future Work

In this paper we conducted a set of human experiments in control of robotic swarms. We compared the human data obtained from the experiment with the fits of two well-known engineering models for data analysis and classification, namely reinforcement learning and logistic regression. We found that the two engineering models not only failed to capture important features of the human data, e.g., performance improvement due to learning, but for the most part performed worse than the humans. We also compared the human performance data to a cognitive model based on ACT-R and found that this cognitive model captured the human data very well.

Our future work has two broad thrusts. First, we will work to further improve the ACT-R model's fit to the human data, and second, we will use data obtained from simulations with the cognitive model to develop analytic models that have high fidelity with respect to human performance. This will enable the integration of high-fidelity human models into human-automation systems so that the integrated system becomes amenable to formal verification techniques that guarantee safe performance.

With respect to the first thrust, future work involves exploring additional features of the ACT-R model in order to improve the detailed fit on a trial-by-trial basis. The base-level learning mechanism varies the activation of memory chunks according to the power law of practice and the power law of recency. The latter in particular would have the effect of making recent decision instances more salient in the retrieval process, and thus increase their effect on coverage expectations. For instance, a particularly successful use of a strategy could lead to its increased selection on following trials. Also, rather than having the modeler select the decision features, it would be possible to include a feature selection mechanism that relies on the production utility learning mechanism to decide which features to encode when perceiving a trial, according to how much each feature contributes to successful decisions. This process might lead to poorer performance in the initial trials while the model searches through the set of possible features.

With respect to the second thrust, we plan to use the cognitive model as a proxy for the human operator and run simulations to produce the decisions made by the model as a function of the operator's cognitive state and cognitive limitations. The evolution of the cognitive state can be thought of as a k-dimensional discrete-time signal (the time-trajectory of the activation levels of the different memory chunks through successive decision cycles) in response to particular inputs. Figure 7 is a schematic illustration of our approach. The interface shows the multirobot system to be controlled with two actions, "Deploy" and "Rendezvous". The interface also shows the percentage coverage obtained by the action. As stated before, an IBL memory chunk in ACT-R consists of a representation of the context and the quality of the control actions. In this setting, the context involves the centrality and dispersion of the robots, variables x1 and x2 in Figure 7. The memory chunk also contains a representation of the decisions made or actions taken and the resulting outcomes, the variables x3 and x4, which denote the coverage for rendezvous and deploy, respectively. Environment and system observations change the activation levels of the memory chunks in the cognitive model, thus reflecting the system state as observed by the operator. Figure 7 shows three memory chunks, A1, A2, and A3, corresponding to three different instances of the problem. The time-trajectory of the activation levels of the memory chunks is clustered to produce the Markov model. In the figure, an event corresponds to a pattern of memory chunk activation levels. In this case the clustering generates three events (S1, S2, and S3) and three corresponding states in a Markov model. In general there can be k events that correspond to the consistent states of the system as observed by the operator.

Figure 7: Process of abstraction of the cognitive model into an analytical model.
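The following is a rough sketch, under our own assumptions about the data format, of the abstraction step just outlined: activation-level trajectories of the memory chunks are clustered into a small set of events, and a Markov transition matrix is estimated from the resulting event sequence. The k-means clustering and the maximum-likelihood transition counts are illustrative choices; the paper does not commit to a particular clustering method.

```python
import numpy as np
from sklearn.cluster import KMeans

def abstract_to_markov(activation_traces, k=3, seed=0):
    """Cluster chunk-activation vectors into k events and estimate a
    Markov chain over those events.

    activation_traces : (n_steps, n_chunks) activation levels of the
                        memory chunks over successive decision cycles
    Returns (event_sequence, transition_matrix).
    """
    # Each time step's activation pattern is assigned to one of k events
    # (S1 ... Sk in Figure 7).
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(activation_traces)

    # Maximum-likelihood transition probabilities from the event sequence.
    counts = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    transition_matrix = np.divide(counts, row_sums,
                                  out=np.full_like(counts, 1.0 / k),
                                  where=row_sums > 0)
    return labels, transition_matrix
```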
The proposed method has the advantage that it is easy to generate data with cognitive models (rather than using exclusively human experiments). Although the data will be generated for particular scenarios, since the Markov models are built from abstract cognitive states, the model will be a generalization of the particular scenarios seen. Since cognitive constraints such as attentional limitations and working memory are already part of the ACT-R model, and the traces generated by the cognitive model respect these constraints, the Markov model that will be generated conforms to human cognitive constraints by construction. Furthermore, the number of states in our approach is relatively insensitive to the problem size (in terms of the number of robots, environment complexity, etc.).

Acknowledgments

This work is funded by NSF awards CNS1329986, CNS1329762 and CNS1329878.

References

Anderson, J. R., and Lebiere, C. 1998. The Atomic Components of Thought. Psychology Press.

Anderson, J. R. 2007. How Can the Human Mind Occur in the Physical Universe? Oxford University Press.

Bishop, C. M., and Nasrabadi, N. M. 2006. Pattern Recognition and Machine Learning, volume 1. Springer, New York.

Bullo, F.; Cortés, J.; and Martinez, S. 2009. Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Princeton University Press.

Gonzalez, C.; Lerch, J. F.; and Lebiere, C. 2003. Instance-based learning in dynamic decision making. Cognitive Science 27(4):591-635.

Kleinman, D.; Baron, S.; and Levison, W. 1970. An optimal control model of human response part I: Theory and validation. Automatica 6(3):357-369.

Newell, A. 1992. Unified theories of cognition and the role of Soar. In Soar: A Cognitive Architecture in Perspective. Springer. 25-79.

Rouse, W. B. 1980. Systems Engineering Models of Human-Machine Interaction. North Holland, Oxford.

Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press.