SpringerBriefs in Applied Sciences
and Technology
Computational Intelligence
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute,
Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
More information about this series at http://www.springer.com/series/10618
Rodrick Wallace
Carl von Clausewitz,
the Fog-of-War, and the AI
Revolution
The Real World Is Not A Game Of Go
Rodrick Wallace
Division of Epidemiology
The New York State Psychiatric Institute
New York, NY
USA
ISSN 2191-530X
ISSN 2191-5318 (electronic)
SpringerBriefs in Applied Sciences and Technology
ISSN 2520-8551
ISSN 2520-856X (electronic)
SpringerBriefs in Computational Intelligence
ISBN 978-3-319-74632-6
ISBN 978-3-319-74633-3 (eBook)
https://doi.org/10.1007/978-3-319-74633-3
Library of Congress Control Number: 2017964243
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer
Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by the registered company Springer International Publishing AG part
of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Corporate interests and their academic clients now claim that artificial intelligence,
via recent advances in deep learning and related technologies, is ready to take on
management of critical real-time processes ranging from driverless cars on intelligent roads to the conduct of war. In the past, corporate interests have also claimed
that smoking is harmless, environmental contamination unimportant, faulty airbags
safe, polyvinyl chloride furnishings and finishings in fires no more dangerous than wood, and have made any number of other assertions that, in the long run,
have caused massive human suffering. In many cases, aggressive marketing by
those interests was able to build edifices “too big to fail, too big to jail”, privatizing
profits while socializing costs. Corporate AI advocates for driverless cars and
autonomous weapons stand on the verge of creating such conditions for their
products. Absent intervention, others will follow.
The central thesis of this monograph is that cognitive algorithmic entities tasked
with the real-time management of critical processes under rapidly shifting “roadway” conditions will face many of the same conundrums and constraints that
confront the conduct of warfare and other forms of conflict. As with conventional
traffic flow, such roadways need not be passive, but may engage or employ entities
having their own purposes, mechanisms, and cognitive abilities. These may range
across biological, individual, social, institutional, machine, and/or hybrid manifestations and dynamics, from cancer, murder, and neoliberal capitalism, to Centaur
or autonomous battlespaces.
From the Somme and Passchendaele, to Blitzkrieg madness and Cold War
preparations for human extinction, Vietnam, and the current Middle Eastern
bludgeonings, the art and science of warfare has been singularly unable to cope
with what the military theorist Carl von Clausewitz characterized as the
“fog-of-war” and “friction” inevitable to human conflict. We argue here that, in the
real world, Artificial Intelligence will face similar challenges with similar or greater
ineptitude. The biblical injunction not to put trust in the chariots of Egypt is likely
to take new meaning over the next century.
More specifically, the monograph’s first chapter shows how straightforward
arguments from control and information theories imply that emergence of the AI
revolution from games of Chess and Go into the real world will fatally encounter
the central matters of the Clausewitz analysis. Promises of graceful degradation
under stress for large numbers of driverless vehicles on intelligent roads, of precision targeting that avoids civilian collateral damage for autonomous or so-called
man/machine centaur weapons, of precision medicine under even normal living
conditions, let alone during the current slow disasters of climate change and social
decay, of the ability to manage financial crises in real time with agent-based
models, and so on, are delusive groupthink or marketing hype that will be
beta-tested on human populations, a gross contravention of fundamental moral and
legal norms.
The second chapter extends the model to nonergodic cognitive systems, a parallel to the nonparametric extension of more familiar statistical models.
This requires some comment.
Cognition—biological, social, institutional, machine, or composite—most singularly involves choice that reduces uncertainty. Reduction of uncertainty implies
the existence of an information source dual to the cognitive process under study.
However, information source uncertainty for path-dependent nonergodic systems
cannot be described as a conventional Shannon entropy since time averages are not
ensemble averages. Nonetheless, the essential nature of information as a form of
free energy allows study of nonergodic cognitive systems having complex dynamic
topologies whose algebraic expression is in terms of directed homotopy groupoids
rather than groups. This permits a significant extension of the data rate theorem
linking control and information theories via an analog to the spontaneous symmetry
breaking arguments fundamental to modern physics. In addition, the identification
of information as a form of free energy enables construction of dynamic empirical
Onsager models in the gradient of a classic entropy that can be built from the
Legendre transform of even path-dependent information source uncertainties. The
methodology provides new analytic tools that should prove useful in understanding
failure modes and their dynamics across a broad spectrum of cognitive phenomena,
ranging from physiological processes at different scales and levels of organization
to critical system automata and institutional economics.
The third chapter provides a worked-out example, making a schematic application of the theory to passenger crowding on vehicle-to-infrastructure (V2I) public
transit systems in which buses or subways become so crowded that they are ordered
by a central control to begin a degraded “skip-stop” service. D. Wallace and R.
Wallace (1998) examine such “skip-stop” dynamics for fire service in New York
City, a policy called “fallback” in which increasing demand was met by a programmed decline in the dispatch of emergency equipment. The results, for the
Bronx, Central Harlem, and so on, were spectacularly catastrophic during the
1970s.
The fourth chapter provides another case history, examining how failure of the
dynamics of crosstalk between “tactical” and “strategic” levels of organization will
lead to another version of the John Boyd mechanism of command failure: the rules
of the game change faster than executive systems can respond.
The fifth chapter comes full circle, applying the theory explicitly to military
systems. Here, the powerful asymptotic limit theorems of control and information
theories particularly illuminate target discrimination failures afflicting autonomous
weapon, man/machine centaur or cockpit, and more traditional structures under
increasing fog-of-war and friction burdens. The argument indicates that degradation in
targeting precision by high-level cognitive entities under escalating uncertainty,
operational difficulty, attrition, and real-time demands will almost always involve
sudden collapse to an all too familiar pathological state in which “all possible
targets are enemies”, otherwise known as “kill everyone and let God sort them out”.
The sixth chapter examines real-time critical processes on a longer timescale,
through an evolutionary lens. The basic finding is that protracted conflict between
cognitive entities can trigger a self-referential, coevolutionary bootstrap dynamic, in
essence a “language that speaks itself”. Such phenomena do not permit simple
command-loop interventions in John Boyd’s sense and are very hard to contain. An
example might be found in the evolutionary transformation of the Soviet Union’s
military forces, tactics, and strategy in the face of German Bewegungskrieg from
the battles of Moscow to Stalingrad, and then Kursk, and in the “insurgency” that
followed the 2003 tactical successes of the US in Iraq. Another example can be
found in the systematic resistance of the defeated Confederate states after the US
Civil War that resulted in the withdrawal of US troops and the end of
Reconstruction in 1877, permitting imposition of the Jim Crow system of racial
segregation and voter suppression that lasted well into the latter half of the twentieth
century.
The final chapter sums up the argument: Caveat Venditor, Caveat Emptor.
Some explicit comment on methodology is in order. The basic approach is
through the asymptotic limit theorems of information and control theories, leading
to statistical models that, like regression equations, are to be fitted to observational
or experimental data. The essential questions do not, then, revolve around the
pseudoscientific manipulation of metaphors abducted from “nonlinear science”, as
devastatingly critiqued by Lawson (2014), but rather on how well these statistical
models work in practice. Mathematical models that surround, or arise from, the
development of these tools should be viewed in the sense of the theoretical ecologist E. C. Pielou (1977) as generating conjectures that are to be tested by the
analysis of observational and experimental data: the word is never the thing.
The author thanks Barry Watts and a number of anonymous commentators for
suggestions and differences of opinion useful in revision.
New York, USA
Rodrick Wallace
References
Lawson, S., 2014. Non-Linear Science and Warfare: Chaos, Complexity and the US Military in the Information Age. New York: Routledge.
Pielou, E.C., 1977. Mathematical Ecology. New York: John Wiley and Sons.
Wallace, D., Wallace, R., 1998. A Plague on Your Houses. New York: Verso.
Contents

1 AI in the Real World
   1.1 Introduction
   1.2 The Data Rate Theorem
   1.3 The ‘Clausewitz Temperature’
   1.4 A Bottleneck Model
   1.5 Dynamics of Control Failure
   1.6 The Failure of Cognition
   1.7 No Free Lunch
   1.8 The ‘Boyd Temperature’
   1.9 Flash-Crash Market Pathologies
   1.10 Network Fragmentation
   1.11 The Ratchet
   1.12 Operational and Strategic Failure
   1.13 Failure, Remediation, and Culture
   1.14 The Synergism of Phase Transitions in Real-Time Critical Systems
   1.15 Discussion
   References

2 Extending the Model
   2.1 Introduction
   2.2 Generalizing the Data Rate Theorem
   2.3 The Transitive Cognitive Decomposition
   2.4 Environmental Insult and Developmental Dysfunction
   2.5 Other Complexity Measures
   2.6 Discussion
   References

3 An Example: Passenger Crowding Instabilities of V2I Public Transit Systems
   3.1 Introduction
   3.2 The Data Rate Theorem for Traffic Flow
   3.3 Multimodal Transport Systems
   3.4 Simplified Dynamics of System Failure
   3.5 Discussion and Conclusions
   References

4 An Example: Fighting the Last War
   4.1 Introduction
   4.2 A Crosstalk Model: Mutual Information Dynamics
   4.3 Discussion
   References

5 Coming Full Circle: Autonomous Weapons
   5.1 Introduction
   5.2 The Topology of Target Space
   References

6 An Evolutionary Approach to Real-Time Conflict: Beware the ‘Language that Speaks Itself’
   6.1 Introduction
   6.2 An Iterated Coevolutionary Ratchet
   6.3 Dynamics of Large Deviations
   6.4 Cambrian Events: Spawning Hydras
   References

7 Summary
   Reference

Appendix A: Mathematical Appendix
About the Author
Rodrick Wallace received an undergraduate degree in mathematics and a Ph.D. in
physics from Columbia University. He took postdoctoral training in the epidemiology of mental disorders at Rutgers University and is a Research Scientist in the
Division of Epidemiology of the New York State Psychiatric Institute. A past
recipient of an Investigator Award in Health Policy Research from the Robert
Wood Johnson Foundation, he spent a decade as a public interest lobbyist, with
particular emphasis on empirical studies of urban fire service deployment, before
returning to full-time research and is the author of many peer-reviewed papers and
books across a variety of disciplines primarily relating to public health and public
order.
Chapter 1
AI in the Real World
Abstract Straightforward arguments from control and information theories imply
that emergence of the AI revolution from games of Chess and Go into the real
world will fatally encounter the central matters of Carl von Clausewitz’ analysis of
Zweikampf warfare. Promises of graceful degradation under stress for large numbers
of driverless vehicles on intelligent roads, of precision targeting that avoids civilian
collateral damage for autonomous or so-called man/machine centaur weapons, of
precision medicine under even normal living conditions, let alone during the current
slow disasters of climate change and social decay, of the ability to manage financial
crises in real time with agent-based models, and so on, are delusive groupthink or
marketing hype that will be beta-tested on human populations, a gross contravention
of fundamental moral and legal norms.
1.1 Introduction
Critical systems operating on complex, rapidly-shifting real-time ‘roadway’ topologies are inherently unstable precisely because of those topologies. Think of driving
a fast car on a twisting, pot-holed road at night, a matter that requires not only quick
reflexes and a reliable vehicle, but really good headlights. Combat operations against
a skilled adversary face similar inherent ‘roadway’ instability. At a different scale,
virtually all important physiological processes are also inherently unstable in the
control theory sense: development must be closely regulated to activate and deactivate a relatively small number of genes in exactly the right sequence and at the right
stage, in concert with and response to powerful, often rapidly-changing, epigenetic
and environmental signals. Failure of control—onset of instability—produces serious developmental disorders. Cognitive immune function (Atlan and Cohen 1998)
engages in pathogen attack and routine maintenance according to highly irregular patterns of need, but must be closely regulated to avoid autoimmune dysfunction. Similarly, the stream of animal consciousness must be held within contextual
‘riverbanks’ to restrict its meanderings to useful realms. Such things as inattentional
blindness—overfocus on a detail—can lead to individual predation, while, at a larger
scale, group survival requires that social animals must conform to group norms.
Thus for important biological and social processes, instability and its draconian
regulation are always implicit. Similar considerations apply to large-scale human
institutions that respond to rapidly-changing patterns of demand and opportunity.
Driverless cars on intelligent roads—V2V/V2I systems—will operate quite literally
on rapidly-shifting roadway environments, as, currently, do financial, communications, and power networks of any size, and, of course, autonomous, man-machine
‘centaur’ and more familiar ‘cockpit’ weapon systems of varying levels of complexity
and human control.
One example that has engendered particular attention is Richard Bookstaber’s
(2017) elegant sales pitch for agent-based modeling in economics and finance. He
proposes yet another ‘revolution in military affairs’ (Neuneck 2008), this time guided
by an array of cognitive modules that, acting together, via a kind of swarm intelligence
in the presence of Big Data, are supposed to permit us to manage financial crises in
real time. Agent-based models rely on ad-hoc (and possibly evolutionarily derived)
heuristics rather than a full-scale, detailed underlying dynamic picture of the world.
Agent-based modeling per se has been the subject of trenchant criticism. In one
example, Conte and Paolucci (2014), who go on to cite a considerable skeptical
literature, write
[Agent Based Models (ABM)] can only provide a sufficient explanation of the phenomenon
of interest, not a necessary one. This... is also known as multi-realizability... and is an outstanding property of multilevel systems. A macro-level phenomenon in whatever domain...
is multirealizable when it can be implemented in different ways on the lower levels... Even
if as many models as generating paths were actually implemented, it would still be difficult,
if not impossible, to assess which one among them is effectively implemented in the real
world...
Under the pressure of complex systems science... agent-based simulation is increasingly
expected to meet a further... requirement, i.e., to be fed by massive data in real-time...
Unlike laws of nature, Agent Based models of socio-economic phenomena are countless and
not always consistent...
...[T]he variety of equivalent agent models in part depends on a property inherent to [complex]
multi-level systems... [i.e.,]... multirealizability... [i]n part... a consequence of the shaky
foundations, the poor theoretical justification at the basis of many agent models...
They particularly note that the consensus surrounding ABM directs that one
seeks the rules that are minimally necessary to obtain the macroscopic effect to
be described, and emphasize, by contrast, that ‘Entities and properties emerge from
the bottom up and retro-act on the systems that have generated them. Current agentbased models instead simulate only emergent properties’.
Clearly, real-world, real-time systems are not necessarily minimal and are almost
always engaged in feedback with their effects. One need only think of the multiplicities and variations of parasite and pathogen life cycles that have evolved under
shifting selection pressures. Institutional systems suffer similar feedbacks and selections, and neither minimality nor linearity can be assumed.
Indeed, unlike the linear case, out-of-sample dynamics of nonlinear systems cannot be estimated by ABMs necessarily constructed on the sample. The Ptolemaic
solar system involves circular cycles-on-cycles of different radii about a fixed Earth
to emulate the progression of the planets. Similar to a Fourier series approximation to
a real scalar function on a fixed range, the approximation can be made to any desired
accuracy. Outside the initial data range, however, the approximation rapidly fails.
Indeed, extending the Ptolemaic analysis via Copernican solar centrality and Keplerian ellipses, having major and minor axes rather than a single well-defined radius,
does indeed seem to fix the matter, but multi-object gravitational interactions create perturbation dynamics that, to this day, are difficult to treat using even the most
powerful methods available from General Relativity, perhaps the most successful
physical theory.
This being said, unlike Fourier series, agent-based models are cognitive in the
same sense as many of the dynamic phenomena they are supposed to emulate: they
must, in the end, compare incoming ‘sensory’ information with some internal ‘picture of the world’, and then choose some action from a set of those open to them
(Atlan and Cohen 1998). The Atlan/Cohen criterion is not a very high standard, as
the simplest thermostat is ‘cognitive’ in their sense. Even in the absence of some
internal picture of the world, any active choice necessarily decreases uncertainty in a
formal manner, and this implies the existence of an information source ‘dual’ to the
cognitive process of interest. Information sources are subject to constraints imposed
by the asymptotic limit theorems of information theory, and the role of information
in system control is constrained by the Data Rate Theorem that links control and
information theories. These constraints are similar to, but different from, the Central
Limit Theorem that conforms sufficiently long sums of stochastic variables, having
any inherent probability distribution with finite variance, to the Normal distribution.
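As a minimal illustration of the Atlan/Cohen criterion (a sketch added here for concreteness; the class and its names are hypothetical, not from the text), even a thermostat-like chooser qualifies:

```python
class CognitiveModule:
    """A minimal 'cognitive' system in the Atlan/Cohen sense: compare an
    incoming signal with an internal picture of the world, then actively
    choose one response from the set available."""

    def __init__(self, picture, actions):
        self.picture = picture    # learned or inherited reference state
        self.actions = actions    # full repertoire of possible responses

    def act(self, sensory_input):
        # The active choice: pick the single action best correcting the
        # deviation from the internal picture. Choosing one action from
        # many reduces uncertainty, implicitly defining the information
        # source 'dual' to the cognitive process.
        deviation = sensory_input - self.picture
        return min(self.actions, key=lambda a: abs(deviation + a))

# Even a thermostat qualifies: setpoint 20, actions cool/hold/heat.
thermostat = CognitiveModule(picture=20.0, actions=(-1.0, 0.0, 1.0))
print(thermostat.act(22.3))   # -1.0: cool
print(thermostat.act(19.8))   #  0.0: hold
```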
There is no way around such statistical constraints, although there may be ‘parametric’ and ‘nonparametric’ versions of them, representing different degrees of regularity.
We begin with a reconsideration of the linkage between control and information
theories.
1.2 The Data Rate Theorem
The Data Rate Theorem (DRT) (Nair et al. 2007) relates control and information
theories in the study of regulation and its failure. That is, the DRT tells how good the
headlights must be for driving on a particular twisting, potholed road at night. More
specifically, the DRT establishes the minimum rate at which externally-supplied
control information must be provided for an inherently unstable system to maintain
stability.
At first approximation, it is usual to assume a linear expansion near a nonequilibrium steady state, so that an n-dimensional vector of system parameters at time t, x_t, determines the state at time t + 1 according to the model of Fig. 1.1 and the expression

$$x_{t+1} = A x_t + B u_t + W_t \tag{1.1}$$
Fig. 1.1 A linear expansion near a nonequilibrium steady state of an inherently unstable control system, for which x_{t+1} = Ax_t + Bu_t + W_t. A, B are square matrices, x_t the vector of system parameters at time t, u_t the control vector at time t, and W_t a white noise vector. The Data Rate Theorem states that the minimum rate at which control information must be provided for system stability is H > log[|det(A_m)|], where A_m is the subcomponent of A having eigenvalues ≥ 1. This is characterized as saying that the rate of control information must exceed the rate at which the unstable system generates topological information. The US military strategist John Boyd has observed that driving conflict at a rate more rapid than an adversary can respond causes fatal destabilization, in this context making the rate of topological information greater than the rate at which an opponent can exert control. All cognitive systems will be vulnerable to such challenge
A, B are fixed n × n matrices, u_t is the vector of control information, and W_t is an n-dimensional vector of white noise. The DRT under such conditions states that the minimum control information rate H necessary for system stability is

$$H > \log[|\det(A_m)|] \equiv a_0 \tag{1.2}$$

where, for m ≤ n, A_m is the subcomponent of A having eigenvalues ≥ 1. The right-hand side of Eq. (1.2) is interpreted as the rate at which the system generates ‘topological information’.
According to the DRT, stability will be lost if the inequality of Eq. (1.2) is violated.
For the night-driving example, if the headlights go out, the twisting road cannot be
navigated. Here we will examine in more detail the dynamics of control failure under
‘Clausewitz constraints’. A more comprehensive derivation of the DRT is given in
Wallace (2017, Sect. 7.10), based on an application of the Rate Distortion Theorem
that will vary according to the nature of the control channel, but is ultimately based
on the inherent convexity of all Rate Distortion Functions.
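A small numerical sketch may help fix ideas (the plant matrices, noise level, and feedback rule below are invented for the example, not taken from the text):

```python
import numpy as np

# Hypothetical unstable plant for Eq. (1.1): eigenvalues 1.5 and 0.5, so the
# unstable subcomponent A_m consists of the single eigenvalue 1.5.
A = np.array([[1.5, 0.0],
              [0.0, 0.5]])
B = np.eye(2)

eigs = np.linalg.eigvals(A)
a0 = np.sum(np.log(np.abs(eigs[np.abs(eigs) >= 1.0])))
print(f"topological information rate a0 = log|det(A_m)| = {a0:.3f}")

rng = np.random.default_rng(42)
x_ctrl = np.ones(2)   # controlled state
x_free = np.ones(2)   # uncontrolled state
for _ in range(50):
    w = 0.01 * rng.standard_normal(2)
    # A control channel satisfying H > a0 can carry stabilizing feedback;
    # here u_t = -A x_t cancels the unstable dynamics exactly.
    x_ctrl = A @ x_ctrl + B @ (-A @ x_ctrl) + w
    # With no control information at all (H = 0 < a0) the state explodes.
    x_free = A @ x_free + w

print("with control:    |x| =", np.linalg.norm(x_ctrl))
print("without control: |x| =", np.linalg.norm(x_free))
```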
For those familiar with the works of the US military strategist John Boyd, Eqs. (1.1)
and (1.2) and Fig. 1.1 instantiate something close to his vision of a necessary continuous cycle of interaction with the environment, assessing and responding to its constant
changes. Boyd asserts that victory in conflict is assured by the ability to ‘get inside’
the decision/correction control loop time frame of an opponent. That is, driving circumstances more rapidly than an adversary can respond triggers fatal destabilization
by making the rate at which topological information is generated greater than the
rate at which the adversary can counter with useful control information.
No cognitive system—biological, machine, organizational, or hybrid—is immune
to such attack.
1.3 The ‘Clausewitz Temperature’
How do elaborate control systems fail? The military strategist Carl von Clausewitz
emphasized two particular constraints leading to failure: ‘fog-of-war’ and ‘friction’.
The first term refers to the inevitability of limited intelligence regarding battlefield
conditions, and the second to the difficulty of imposing control, due to weather,
terrain, time lags, attrition, difficulty in resupply and logistics, and so on. Again,
for a night driving example, this might be represented as a synergism between poor
headlights and unresponsive steering.
Perhaps obviously, each critical real-time AI system will have several, perhaps
many, such constraints acting synergistically. We then envision, for each system, a
nonsymmetric n × n ‘correlation matrix’ ρ having elements ρ_{i,j} representing those constraints and their pattern of interaction. Such matrices will have n invariants, r_i, i = 1, ..., n, that remain fixed when ‘principal component’ transformations are applied to data, and we construct an invariant scalar measure from them, based on the well-known polynomial relation

$$p(\lambda) = \det(\rho - \lambda I) = \lambda^n + r_1 \lambda^{n-1} + \cdots + r_{n-1}\lambda + r_n \tag{1.3}$$

det is the determinant, λ a parameter, and I the n × n identity matrix. The first invariant will be the trace of the matrix, and the last ± the determinant. Using these n invariants we define an appropriate composite scalar index Γ = Γ(r_1, ..., r_n) as a monotonic increasing real function. This is similar to the Rate Distortion Manifold
of Glazebrook and Wallace (2009) or the Generalized Retina of Wallace and Wallace
(2016).
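Schematically, and assuming for illustration the simplest admissible choice of Γ as an equally weighted sum of the invariants (the text leaves the functional form open):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = rng.uniform(0.0, 1.0, size=(4, 4))   # nonsymmetric 'correlation matrix'

# Coefficients of p(lambda) = det(rho - lambda I): np.poly returns the
# characteristic polynomial of rho, whose coefficients carry the n
# invariants r_1..r_n (trace, ..., +/- determinant), fixed under
# similarity ('principal component') transformations.
coeffs = np.poly(rho)
invariants = np.abs(coeffs[1:])

# One admissible composite index: any monotonic increasing real function
# of the invariants will do; an equally weighted sum is the simplest.
Gamma = float(np.sum(invariants))
print("invariants:", np.round(invariants, 3), " Gamma =", round(Gamma, 3))
```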
Taking the one-dimensional projection Γ as the ‘Clausewitz parameter’, we heuristically extend the condition of Eq. (1.2) as

$$H(\Gamma) > f(\Gamma)\, a_0 \tag{1.4}$$
The Mathematical Appendix, following Wallace (2017, Sect. 7.10), uses a Black-Scholes approximation to find that H(Γ) will have, in first order, the unsurprising form H ≈ κ_1Γ + κ_2. Taking f(Γ) to similar order, so that f(Γ) = κ_3Γ + κ_4, the limit condition becomes

$$T \equiv \frac{\kappa_1 \Gamma + \kappa_2}{\kappa_3 \Gamma + \kappa_4} > a_0 \tag{1.5}$$

where we will characterize T as the ‘Clausewitz temperature’ of the system. For Γ = 0 the stability condition is κ_2/κ_4 > a_0. At large Γ this becomes κ_1/κ_3 > a_0. If κ_2/κ_4 ≫ κ_1/κ_3, the stability condition may be violated at high Γ. Figure 1.2 shows the pattern.

Fig. 1.2 The horizontal line represents the critical limit a_0. For κ_2/κ_4 ≫ κ_1/κ_3, at an intermediate value of the index Γ the ‘Clausewitz temperature’ T falls below that limit, and control fails
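With constants invented purely to exhibit the failure mode of Fig. 1.2, the crossing point can be computed directly:

```python
import numpy as np

k1, k2, k3, k4 = 1.0, 10.0, 2.0, 1.0    # so k2/k4 = 10 >> k1/k3 = 0.5
a0 = 2.0                                 # critical limit of Eq. (1.2)

def T(G):
    """Clausewitz temperature of Eq. (1.5)."""
    return (k1 * G + k2) / (k3 * G + k4)

# T(0) = k2/k4 > a0 while T(inf) -> k1/k3 < a0, so T crosses a0 at the
# finite Gamma solving k1*G + k2 = a0*(k3*G + k4):
G_crit = (a0 * k4 - k2) / (k1 - a0 * k3)
print(f"control fails at Gamma = {G_crit:.3f}")
print(f"T just before: {T(0.9 * G_crit):.3f}, just after: {T(1.1 * G_crit):.3f}")
```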
1.4 A Bottleneck Model
A second approach to Eq. (1.5) is via the information bottleneck method of Tishby
et al. (1999), adapted here from Wallace (2017, Sect. 9.5). The basic idea is to view
the control information H of Eq. (1.2) as the distortion measure in a Rate Distortion Theorem argument. We examine a sequence of actual system outputs and, in a deterministic manner, infer from it a sequence of control signals Û^i = û_0^i, û_1^i, ... that we compare with the actual sequence of control signals U^i = u_0^i, u_1^i, ... having a probability p(U^n). The RDT distortion measure is then the minimum necessary control information for system stability H(Û^i, U^i), and we write an ‘average distortion’ as

$$\hat{H} \equiv \sum_{U^n} p(U^n)\, H(\hat{U}^i, U^i) \ge 0 \tag{1.6}$$
Using standard methods (Cover and Thomas 2006), we can then define a convex ‘Rate Distortion Function’ in the ‘distortion’ Ĥ. For illustration we take the RDF as the standard Gaussian, although the essential result depends only on the function’s inherent convexity (Cover and Thomas 2006). Then

$$R(\hat{H}) = \tfrac{1}{2}\log[\sigma^2/\hat{H}],\ \hat{H} < \sigma^2; \qquad R(\hat{H}) = 0,\ \hat{H} \ge \sigma^2 \tag{1.7}$$
Following Feynman (2000), information can be taken as a form of free energy and the RDF can be used to define an ‘entropy’ via the Legendre transform

$$S \equiv R(\hat{H}) - \hat{H}\, dR/d\hat{H} \tag{1.8}$$
The next step is to apply a stochastic extension of the Onsager approximation of nonequilibrium thermodynamics (de Groot and Mazur 1984) so that dynamics are driven by the gradient of S in Ĥ:

$$d\hat{H}_t = \left[-\mu\, dS/d\hat{H} - G(T)\right]dt + \beta \hat{H}_t\, dW_t = \left[\frac{\mu}{2\hat{H}_t} - G(T)\right]dt + \beta \hat{H}_t\, dW_t \tag{1.9}$$
dW_t is volatility white noise for Ĥ, and β its magnitude, independent of the σ process. G(T) is a monotonic increasing real positive function of the Clausewitz temperature, the only possible determinant of the rate of topological information generated by the inherently unstable system under control.
The stability of this relation can be studied via the Ito chain rule expansion for Ĥ². Direct calculation shows that the expectation of Ĥ² will not be a real number unless

$$G(T) \ge \beta\sqrt{\mu}, \qquad T \ge G^{-1}(\beta\sqrt{\mu}) \equiv a_0 \tag{1.10}$$
which recovers Eq. (1.5). Other—convex—forms of RDF give the same result.
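The steps behind Eqs. (1.9) and (1.10), elided above as ‘direct calculation’, are worth making explicit. From the Gaussian form (1.7), dR/dĤ = −1/(2Ĥ), so the Legendre transform (1.8) gives S = (1/2)log[σ²/Ĥ] + 1/2, with dS/dĤ = −1/(2Ĥ) and hence the drift −μ dS/dĤ = μ/(2Ĥ_t) of Eq. (1.9). Applying the Ito chain rule to Ĥ²,

$$d(\hat{H}_t^2) = \left[\mu - 2G(T)\hat{H}_t + \beta^2 \hat{H}_t^2\right]dt + 2\beta \hat{H}_t^2\, dW_t$$

and a real nonequilibrium steady state for the expectation requires the bracketed quadratic in Ĥ_t to have real roots, i.e., G(T)² ≥ β²μ, which is Eq. (1.10).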
1.5 Dynamics of Control Failure
We next examine control failure, focusing on the dynamics of T itself, using a variant
of the bottleneck approach.
Again the central interest is on how a control signal u_t in Fig. 1.1 is expressed in the system response x_{t+1}, but here with a focus on T rather than on H.
Again the idea is to deterministically retranslate an observed sequence of system outputs X^i = x_1^i, x_2^i, ... into a sequence of possible control signals Û^i = û_0^i, û_1^i, ... and compare that sequence with the original control sequence U^i = u_0^i, u_1^i, ..., with the difference between them having a particular value under some chosen distortion measure and hence having an average distortion
$$D \equiv \sum_i p(U^i)\, d(U^i, \hat{U}^i) \tag{1.11}$$
where p(U^i) is the probability of the sequence U^i and d(U^i, Û^i) measures the distortion between U^i and the sequence of control signals that has been deterministically reconstructed from the system output.
Again, a classic Rate Distortion argument. According to the Rate Distortion Theorem, there exists a Rate Distortion Function, R(D), that determines the minimum channel capacity necessary to keep the average distortion below some fixed limit D (Cover and Thomas 2006). Based on Feynman’s (2000) interpretation of information as a form of free energy, it becomes possible to construct a Boltzmann-like pseudoprobability in the Clausewitz temperature T as

$$dP(R, T) = \frac{\exp[-R/T]\, dR}{\int_0^\infty \exp[-R/T]\, dR} \tag{1.12}$$
since higher T must necessarily be associated with greater channel capacity.
The denominator can be interpreted as a statistical mechanical partition function, and it becomes possible to define another ‘free energy’ Morse Function (Pettini 2007) $\mathcal{F}$ as

$$\exp[-\mathcal{F}/T] \equiv \int_0^\infty \exp[-R/T]\, dR = T \tag{1.13}$$
Defining an entropy in the free energy measure $\mathcal{F}$ as the Legendre transform $\mathcal{S} \equiv \mathcal{F}(T) - T\, d\mathcal{F}/dT = T$ allows use of a stochastic Onsager approximation for the dynamics of T in the gradient $d\mathcal{S}/dT$ (de Groot and Mazur 1984). The resulting stochastic differential equation is

$$dT_t = \mu\, dt + \sigma T_t\, dW_t \tag{1.14}$$

where μ is a ‘diffusion coefficient’ representing the attempts of the system to meet demand, dW_t is Brownian white noise, and σ determines the magnitude of the volatility.
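Explicitly, Eq. (1.13) gives $\mathcal{F}(T) = -T\log T$, so

$$\mathcal{S} = \mathcal{F} - T\, d\mathcal{F}/dT = -T\log T - T(-\log T - 1) = T$$

and the Onsager drift $\mu\, d\mathcal{S}/dT = \mu$ is constant, which is why Eq. (1.14) takes so simple a form.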
The base equation

$$dT/dt = \mu > 0 \tag{1.15}$$
‘explodes’ with increasing time. By the Stochastic Stabilization Theorem (Mao 2007; Appleby et al. 2008), an ‘exploding’ function for which

$$|f(x, t)| \le |x|\,\omega,\ \omega > 0 \tag{1.16}$$

can be stabilized by a volatility term σx_t dW_t in the sense that

$$\limsup_{t \to \infty} \frac{\log[|x(t)|]}{t} \to -\frac{\sigma^2}{2} + \omega \tag{1.17}$$

almost surely. If σ²/2 > ω, x(t) → 0.
Thus, for fixed μ, rising ‘volatility’—increasing σ —can trigger a downward
ratchet leading to violation of the DRT condition for T in a highly punctuated
manner.
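A minimal Euler–Maruyama sketch of Eq. (1.14), with parameter values invented for illustration, exhibits the downward ratchet numerically:

```python
import numpy as np

def simulate_T(mu=1.0, sigma=0.1, T0=5.0, dt=1e-3, steps=200_000, seed=0):
    """Euler-Maruyama integration of Eq. (1.14): dT = mu dt + sigma T dW."""
    rng = np.random.default_rng(seed)
    T = np.empty(steps)
    T[0] = T0
    for i in range(1, steps):
        dW = np.sqrt(dt) * rng.standard_normal()
        T[i] = max(T[i - 1] + mu * dt + sigma * T[i - 1] * dW, 1e-12)
    return T

for sigma in (0.1, 1.0, 3.0):
    path = simulate_T(sigma=sigma)
    # Median over the second half of the run: despite the fixed positive
    # drift mu, rising volatility ratchets typical values of T downward
    # in a punctuated way.
    print(f"sigma = {sigma}: median T = {np.median(path[len(path) // 2:]):.3f}")
```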
A variant of this model assumes a volatility expression of the form

$$\sigma\sqrt{T^2 + \beta^2}\, dW_t,\ \beta > 0$$

in Eq. (1.14), so that there is an internal source of variation, β, independent of T.
Then expansion of log[T] via the Ito relation, using Jensen’s inequality for a concave function, leads to the expectation condition

$$E(T) \ge E(\log[T]) = \frac{\mu \pm \sqrt{\mu^2 - \beta^2 \sigma^4}}{\sigma^2} \tag{1.18}$$
Some exploration shows the upper limit is stable, while the lower either rises to the upper or collapses to zero. An evident necessary condition for any stability is μ > βσ², independent of, and in addition to, the DRT stability requirement that T > a_0.
Another model introduces a ‘system’ parameter, φ, in Eq. (1.12), making the replacement T → φT. In a military setting this might, for example, be a measure of ‘force capacity’, or, following McQuire (1987), an ‘index of resolve’. Then the entropy expression becomes

$$\mathcal{S} = \mathcal{F}(T, \phi) - T\, \partial\mathcal{F}/\partial T - \phi\, \partial\mathcal{F}/\partial\phi = \phi T(\log[\phi T] + 2) \tag{1.19}$$

Considering φ to be a fixed external index, this produces the dynamic equation

$$dT_t = (\mu\, \partial\mathcal{S}/\partial T)\, dt + \sigma T_t\, dW_t = \mu\phi(\log[\phi T_t] + 3)\, dt + \sigma T_t\, dW_t \tag{1.20}$$
Expanding log[T_t] using the Ito Chain Rule gives the differential equation

$$dT/dt = \mu\phi \log[T(t)\phi] + 2\mu\phi - \tfrac{1}{2}T(t)\sigma^2 \tag{1.21}$$
As above, there are two nonequilibrium steady state solutions, constrained by
Jensen’s inequality, with the larger stable and the smaller either collapsing to zero or
increasing toward the larger. The relations are
$$E(T_L) \ge -\frac{2\phi}{\sigma^2}\, W\!\left[-1,\ \frac{-\sigma^2 \exp(-3)}{\mu\phi^2}\right], \qquad E(T_S) \ge -\frac{2\phi}{\sigma^2}\, W\!\left[0,\ \frac{-\sigma^2 \exp(-3)}{\mu\phi^2}\right] \tag{1.22}$$
where W[−1, x], W[0, x] are the −1 and 0 branches of the Lambert W-function. As above, large enough σ coalesces the upper and lower limits, causing T to collapse to zero. Figure 1.3 shows that coalescence with increasing σ for the relation

$$-\frac{1}{\sigma^2}\, W[j, -\sigma^2],\ j = -1, 0$$
Fig. 1.3 Coalescence of stable and unstable nonequilibrium steady state modes of −(1/σ²)W[j, −σ²], j = −1, 0, with increasing σ. If σ increases sufficiently, then at some point essential regulatory mechanisms must fail catastrophically
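The coalescence of Fig. 1.3 can be reproduced directly from scipy’s Lambert W implementation (a sketch; the branch point of W at argument −exp(−1) places the merge at σ = exp(−1/2) ≈ 0.61):

```python
import numpy as np
from scipy.special import lambertw

# Stable (j = -1) and unstable (j = 0) nonequilibrium steady states of
# -(1/sigma^2) W[j, -sigma^2]; both are real only while sigma^2 <= exp(-1).
for sigma in (0.3, 0.5, 0.6, np.exp(-0.5)):
    z = -sigma**2
    stable = -lambertw(z, -1).real / sigma**2
    unstable = -lambertw(z, 0).real / sigma**2
    print(f"sigma = {sigma:.4f}: stable = {stable:.3f}, unstable = {unstable:.3f}")
# At sigma = exp(-1/2) ~ 0.6065 the argument -sigma^2 reaches the Lambert-W
# branch point -exp(-1): the two modes coalesce (both equal e), and for any
# larger sigma no real steady state survives -- T collapses.
```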
Setting the two different expressions for W in Eq. (1.22) equal to each other and solving for φ gives a stability condition in terms of σ and μ. The trick is to recognize that W[−1, −x] = W[0, −x] at a branch point x = exp[−1]. This gives the stability condition on the force capacity/resolve φ as

$$\phi > \frac{\sigma}{\sqrt{2\mu \exp(2)}} \tag{1.23}$$
Loss of force capacity remains a difficult conundrum for models of combat operations. Ormrod and Turnbull (2017), for example, write that
The practical relationship between attrition and combat remains uncertain, with a host of
variables influencing the outcome of battle [for example leadership, fire support, morale,
training, mobility, infiltration etc.]... Comprehensive assessment models of military forces
and combat skill is a difficult and unsolved proposition... [D]ata are far from convincing
that [available] simulations provide robust macro-attrition models that align with military
doctrine.
Again, McQuire focuses on ‘force resolve’ rather than attrition per se, although most battles have been broken off at casualty rates less than 10%. Nonetheless, the inference of Eq. (1.23), in consonance with much observation, is that sufficiently lowered force capacity φ—from either loss of resources or resolve—can be expected to trigger tactical, operational, or strategic failure, depending on the scale of observation. Details vary, for this model, in proportion to the ratio σ/√μ.
It is probably necessary to make the same kind of expansion for φ as was done for Γ in Sect. 1.3 so as to include factors of resolve as well as of material resource.
McQuire explicitly identifies a high level of enemy maneuverability as an essential
determinant of defeat in combat, and we can model the interaction between T and
φ from that perspective.
We first normalize variates as $\hat{T} \equiv T/T_{max}$, $\hat{\phi} \equiv \phi/\phi_{max}$. The interaction between them is then taken as

$$d\hat{T}/dt = \mu_1 \hat{\phi}(t)[1 - \hat{T}(t)] - \gamma_1 \hat{T}(t), \qquad d\hat{\phi}/dt = \mu_2 \hat{T}(t)[1 - \hat{\phi}(t)] - \gamma_2 \hat{\phi}(t) \tag{1.24}$$

The μ_i indicate positive feedback and the γ_i represent the rate of ‘entropy’ effects that decrease the indices of interest, respectively the rates of attrition of situational awareness and capability/force resolve.
Elementary calculation finds equilibrium values for this system as

$$\hat{T} \to \frac{\mu_1\mu_2 - \gamma_1\gamma_2}{\mu_2(\mu_1 + \gamma_1)}, \qquad \hat{\phi} \to \frac{\mu_1\mu_2 - \gamma_1\gamma_2}{\mu_1(\mu_2 + \gamma_2)} \tag{1.25}$$
Clearly, no equilibrium is possible unless μ1 μ2 > γ1 γ2 . That is, the system
collapses unless positive reinforcement effects are greater than entropy effects, in
this model.
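A quick numerical check of Eq. (1.25), with rate constants invented for the example:

```python
import numpy as np

mu1, mu2, g1, g2 = 1.0, 0.8, 0.3, 0.2    # chosen so mu1*mu2 > g1*g2

# Forward-Euler integration of Eq. (1.24) from arbitrary initial values.
T, phi = 0.5, 0.5
dt = 1e-3
for _ in range(200_000):
    dT = mu1 * phi * (1.0 - T) - g1 * T
    dphi = mu2 * T * (1.0 - phi) - g2 * phi
    T, phi = T + dt * dT, phi + dt * dphi

num = mu1 * mu2 - g1 * g2
print(f"integrated:  T = {T:.4f}, phi = {phi:.4f}")
print(f"closed form: T = {num / (mu2 * (mu1 + g1)):.4f}, "
      f"phi = {num / (mu1 * (mu2 + g2)):.4f}")
```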
We suppose that enemy maneuverability is measured by an inverse index R, a
composite, projected inverse mobility index, a retina-like scalar compiled in much
the same manner as Γ , instantiating the armored warfare mantra ‘don’t move to
fight, fight to move’. The reason for the choice of an inverse measure will become
apparent.
The simplest conjecture for the effect of enemy maneuverability on a combat force is that μ_i ∝ R, γ_i ∝ 1/R. Then, for example,

$$\hat{T} \propto \frac{R^2 - 1/R^2}{R(R + 1/R)} \tag{1.26}$$
This relation is plotted in Fig. 1.4.
It is striking that this figure is closely similar to the growth of the giant component
in a random network (e.g., Wallace 2017, Fig. 3.1; Wallace 1993, Fig. 3b), suggesting
that opponent maneuverability can serve to weaken essential network linkages across
an embattled agent. This is consonant with John Boyd’s assertion that victory in
combat is more likely if it is possible to ‘get inside’ the decision loop of the adversary,
in essence generating topological information at a rate greater than can be met by
the adversary’s rate of control information.
Fig. 1.4 Normalized Clausewitz temperature as a function of an inverse index of ‘enemy maneuverability’. The relation is closely similar to the growth of the giant component in a random network (Wallace 2017, Fig. 3.1; Wallace 1993, Fig. 3b), suggesting that increasing opponent maneuverability acts to weaken essential linkages within a force or an enterprise, consonant with John Boyd’s conjecture
See Wallace (1993) for an application of the network-failure method to the recurrent collapse of fire service in New York City that began after 1972, triggered by
political ‘planned shrinkage’ fire service reductions focused in high population, high
density minority voting blocs. These reductions persist to the present, in spite of the
reoccupation of formerly minority communities by an affluent majority population.
That analysis centers on cascading hierarchical disintegration.
It is interesting to note that justification for these fire service reductions was by
means of ‘deployment algorithms’ developed by the Rand Corporation that have
since been institutionalized in a highly automated system that is a recognizable
precursor of the AI which will be given control of V2V/V2I and similar critical
infrastructure (Wallace and Wallace 1998). In essence, New York City’s housing
stock collapsed to a level that could be supported by the reduced fire extinguishment
services, resulting in the loss of hundreds of thousands of primarily low income units.
Analogous evolutionary selection pressures can be expected to follow widespread
deployments of AI control for other critical real-time systems.
A different approach is to expand Eq. (1.24) by letting X(t) represent the vector $\langle \hat{T}, \hat{\phi} \rangle$ and assuming that a ‘mobility function’, redesignated R̂(X(t)), acts directly—as opposed to inversely with R above. Then Eq. (1.24) can be expressed as a stochastic differential equation vector system

$$dX_t = f(X_t)\, dt + \hat{R}(X_t)\, dW_t^2 \tag{1.27}$$
where f(X) is the system of Eq. (1.24) and dW_t^2 is a two-dimensional vector of white noise. The Stochastic Stabilization Theorem of Mao (2007) shows that, for any function f(X), there will be a vector function R̂(X) that collapses f in the sense that

$$\limsup_{t \to \infty} \frac{1}{t} \log[|X(t)|] < 0 \tag{1.28}$$
almost surely.
That is, sufficient ‘enemy maneuverability’, in this model, if maintained long
enough, drives any levels of Clausewitz temperature and capacity/resolve to extinction.
One can, of course, imagine both this and the mechanism of Eq. (1.26) at work
together.
The challenge to an agent or agency is then to deny an ‘opponent’ the necessary
scale and pattern of maneuverability.
A simplified stochastic variant of Eq. (1.24) would involve fixing the value of φ, analogous to the development of Eqs. (1.20)–(1.23). Then

$$d\hat{T}_t = [\mu\phi(1 - \hat{T}_t) - \gamma \hat{T}_t]\, dt + \sigma \hat{T}_t\, dW_t = [\mu\phi - (\mu\phi + \gamma)\hat{T}_t]\, dt + \sigma \hat{T}_t\, dW_t \tag{1.29}$$
This has the mean

$$E(\hat{T}) = \frac{\mu\phi}{\mu\phi + \gamma} \tag{1.30}$$
Two points are evident. First, since this is an expectation, there will always be
some probability that the system falls below the critical value for $\hat{T}$ determined by the DRT. Second, as the rate of attrition of situational awareness, γ, rises, this
probability significantly increases.
However, applying the Ito Chain Rule to $\hat{T}^2$, after some calculation, finds

$$E(\hat{T}^2) = \left[\frac{\mu\phi}{\mu\phi + \gamma - \sigma^2/2}\right]^2 \tag{1.31}$$
an expression that explodes if σ becomes large enough.
That is, the condition for stable variance in this model is

$$\sigma < \sqrt{2(\mu\phi + \gamma)}, \qquad \phi > \frac{1}{\mu}\left[\sigma^2/2 - \gamma\right] \tag{1.32}$$
Rising σ can thus trigger a particular instability leading to rapid violation of the
DRT condition.
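The variance condition is easy to probe by simulation (again a sketch, with rates invented for the example):

```python
import numpy as np

def moments(mu=1.0, phi=0.5, gamma=0.5, sigma=0.5,
            dt=1e-3, steps=50_000, n_paths=2_000, seed=0):
    """Euler-Maruyama ensemble for Eq. (1.29); returns E(T), E(T^2)."""
    rng = np.random.default_rng(seed)
    a = mu * phi
    T = np.full(n_paths, 0.5)
    for _ in range(steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        T = T + (a - (a + gamma) * T) * dt + sigma * T * dW
    return T.mean(), (T ** 2).mean()

# Critical volatility of Eq. (1.32): sigma_c = sqrt(2(mu*phi + gamma)) ~ 1.414
for sigma in (0.5, 1.2, 1.45):
    m1, m2 = moments(sigma=sigma)
    print(f"sigma = {sigma}: E(T) = {m1:.3f}, E(T^2) = {m2:.3f}")
# E(T) stays near mu*phi/(mu*phi + gamma) = 0.5 in every case, but the
# second moment grows with run length once sigma exceeds sigma_c.
```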
1.6 The Failure of Cognition
A more comprehensive ‘cognitive’ argument can be made for less regular circumstances if it is possible to identify equivalence classes of a system’s developmental
pathways, e.g., ‘healthy’ versus ‘pathological’, permitting definition of a ‘developmental symmetry groupoid’ (Wallace 2017; Weinstein 1996; Golubitsky and Stewart
2006). A groupoid is a generalization of the idea of a symmetry group in which a
product is not necessarily defined between each element. The simplest example might
be a disjoint union of separate symmetry groups, but sets of equivalence classes also
define a groupoid. See the Mathematical Appendix for an introduction to standard
material on groupoids.
We will show that a new ‘free energy’ can then be defined that is liable to an analog
of Landau’s classical spontaneous symmetry breaking, in the Morse Theory sense
(Pettini 2007). Under symmetry breaking, higher ‘temperatures’ are associated with
more symmetric higher energy states in physical systems. Cosmological theories
make much of such matters in the first moments after the ‘big bang’, where different
physical phenomena began to break out as the universe rapidly cooled. Here, for
cognitive processes controlled by AI systems a decline in the Clausewitz temperature
T can result in sharply punctuated collapse from higher to lower symmetry states,
often resulting in serious failures analogous to developmental disorders across a
broad spectrum of control processes (Wallace 2017).
More specifically, we extend the perspective of the previous sections via the
‘cognitive paradigm’ of Atlan and Cohen (1998), viewing a system as cognitive
if it compares incoming signals with a learned or inherited picture of the world,
then actively chooses a response from a larger set of those possible to it. Intuitively,
choice implies the existence of an information source, since it reduces uncertainty
in a formal way. Wallace (2012, 2015, 2017) provide details.
Given a ‘dual’ information source associated with the inherently unstable cognitive system of interest, an equivalence class algebra can be constructed by choosing
different system origin states and defining the equivalence of subsequent states at a
later time by the existence of a high probability path connecting them to the same
origin state. Disjoint partition by equivalence class, analogous to orbit equivalence
classes in dynamical systems, defines a symmetry groupoid associated with the cognitive process. Again, groupoids are extensions of group symmetries in which there
is not necessarily a product defined for each possible element pair (Wallace 2017;
Weinstein 1996; Golubitsky and Stewart 2006).
The equivalence classes across possible origin states define a set of information
sources dual to different cognitive states available to the inherently unstable cognitive
system. These create a large groupoid, with each orbit corresponding to an elementary
‘transitive’ groupoid whose disjoint union is the full groupoid. Each subgroupoid is
associated with its own dual information source, and larger groupoids must have
richer dual information sources than smaller.
Let $X_{G_i}$ be the system’s dual information source associated with groupoid element $G_i$. We next construct a Morse Function using the Clausewitz temperature T as the temperature analog.

Let $H(X_{G_i}) \equiv H_{G_i}$ be the Shannon uncertainty of the information source associated with the groupoid element $G_i$. Define a Boltzmann-like pseudoprobability as

$$P[H_{G_i}] \equiv \frac{\exp[-H_{G_i}/T]}{\sum_j \exp[-H_{G_j}/T]} \tag{1.33}$$
where the sum is over the different possible cognitive modes of the full system.
A ‘free energy’ Morse Function F can then be defined as

$$\exp[-F/T] \equiv \sum_j \exp[-H_{G_j}/T], \qquad F = -T \log\left[\sum_j \exp[-H_{G_j}/T]\right] \tag{1.34}$$
Given the underlying groupoid generalized symmetries associated with high-order
cognition, as opposed to simple control theory, it is possible to apply a version
of Landau’s symmetry-breaking approach to phase transition (Pettini 2007). The
shift between such symmetries should remain highly punctuated in the Clausewitz
temperature T , but in the context of what are likely to be far more complicated
groupoid rather than group symmetries.
As above, it is possible to invoke an index of resolve/capability by the mapping
T → φT in Eqs. (1.33) and (1.34).
Based on the analogy with physical systems, there should be only a few possible
phases, with sharp and sudden transitions between them as the Clausewitz temperature T decreases.
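Even a toy computation shows the punctuated character of the transition (the module uncertainties H_{G_i} below are invented for illustration):

```python
import numpy as np

# Invented dual-source uncertainties for three cognitive modes, ordered from
# a rich high-symmetry mode down to an impoverished pathological one.
H = np.array([3.0, 2.0, 0.5])

def pseudo_probs(T):
    """Boltzmann-like pseudoprobabilities of Eq. (1.33)."""
    w = np.exp(-H / T)
    return w / w.sum()

for T in (10.0, 2.0, 1.0, 0.3):
    P = pseudo_probs(T)
    F = -T * np.log(np.exp(-H / T).sum())    # free energy of Eq. (1.34)
    print(f"T = {T:>4}: P = {np.round(P, 3)}, F = {F:.3f}")
# As T falls, the probability mass collapses onto the lowest-H mode: the
# analog of spontaneous symmetry breaking described in the text.
```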
It is possible to examine sufficient conditions for the intractable stability of the pathological ‘ground state’ via the Stochastic Stabilization Theorem (Appleby et al. 2008; Mao 2007). Suppose there is a multidimensional vector of parameters associated with that phase, X, that measures deviations from the pathological state. The free energy measure from Eq. (1.34) allows definition of another entropy in terms of a Legendre transform

$$\hat{S} \equiv F(X) - X \cdot \nabla_X F \tag{1.35}$$
It is then possible to write another first-order ‘Onsager’ dynamic equation in the gradients of $\hat{S}$ that will have the general form

$$dX_t = f(X_t, t)\, dt + \sigma g(X_t, t)\, dW_t \tag{1.36}$$

where dW_t is multidimensional white noise.
Again, f (X t , t) is a first-order ‘diffusion’ equation in the gradients of Ŝ by X .
Typically, the base equation d X/dt = f (X, t) will have a solution |X (t)| → ∞.
The multidimensional version of the Stochastic Stabilization Theorem (Mao 2007)
ensures that, under very broad conditions, sufficiently large noise, that is, great
enough σ , will drive |X (t)| logarithmically to zero for very general forms of g(X, t),
stabilizing the pathological mode. Colored noise can be treated using the Doléans-Dade exponential to give much the same result (Protter 1990).
For nonergodic systems, where time averages are not the same as ensemble averages, the groupoid symmetries become ‘trivial’, associated with the individual high
probability paths for which an H -value may be defined, although it cannot be represented in the form of the usual Shannon ‘entropy’ (Khinchin 1957, p. 72). Then
equivalence classes must be defined in terms of other similarity measures for different developmental pathways. The ‘lock-in’ of the pathological mode then follows
much the same argument. These matters will be more fully examined in the following
chapter.
1.7 No Free Lunch
Algorithmic systems—cognitive in the Atlan/Cohen sense or otherwise—are constrained by the ‘no free lunch’ theorem, the NFLT (e.g., Wallace 2017, Sect. 4.11; Wolpert and Macready 1995, 1997). Algorithmic approaches to real-time problems usually center on optimization of some particular objective value function.
The NFLT, as developed by Wolpert and Macready, implies there is no generally
superior function optimizer. That is, an optimizer, in their development, pays for
superior performance on some functions with inferior performance on others. In general, gains and losses balance and all optimizers have identical average performance:
superiority on one subset of functions implies inferiority on the complementary subset. Any version of this result implies the necessity of tuning in the face of dynamic
challenge, i.e., of the necessary existence of a higher executive, or hierarchy of them,
able to sense when to ‘change gears’ in a shifting ‘roadway’ environment.
A parallel approach follows the arguments of Wallace (2017, Sect. 4.10). It is surprisingly straightforward to invert the Shannon Coding Theorem, fixing the distribution of the message sent along some channel, but tuning the probability distribution
of that channel so as to maximize the information transmitted at the fixed message
distribution. In a sense, the message is (formally) taken as transmitting the channel,
and, according to the Shannon Coding Theorem, there will then be a channel distribution that will maximize such a dual channel capacity. Channel maximization for
one set of control signals is thus highly specific to the message transmitted, and must
be retuned for a different message.
Shannon (1959) described something of this in terms of a duality between the
properties of an information source with a distortion measure and those of a channel,
particularly for channels in which there is a cost associated with the different ‘letters’
transmitted. Solving the problem, he states, corresponds to finding a source that is
right for the channel and the desired cost. In a dual way, evaluating the rate distortion
function for a source corresponds to finding a channel that is just right for the source
and the allowed distortion level.
The implication is that, under dynamic circumstances, there must be a higher executive, an overriding cognitive system, able to retune the underlying control system
of Fig. 1.1 according to shifting demands, i.e., a dynamic ‘optimizer of optimizers’.
There may even be higher levels of command. Such hierarchical cognition will be
particularly susceptible to frictional and fog-of-war impediments, and to the effects
of events ‘getting inside the command loop’ to adapt John Boyd’s terminology. The
result is an inevitable violation of the Data Rate Theorem, and onset of debilitating
instability.
We model this dynamic as follows.
1.8 The ‘Boyd Temperature’
Chains of command involve the nesting of cognitive processes according to some
intricate—and often dynamic—topology. Here, we use an upper limit argument to
estimate the demand of the full system for resources, taken as a ‘free energy’ per
unit time for each ‘cognitive module’ 1 ≤ j ≤ n.
Each module then requires M_j > 0 units of resource per unit time. These are ‘cyberphysical’ systems that, in biological terms, would be characterized as instantiating ‘embodied cognition’. As a consequence, $M_j = M_j(\mathcal{M}_j, C_j)$ is a monotonic increasing function of both the rate of material supply $\mathcal{M}_j$ and of information supply, the latter characterized by a local channel capacity $C_j$. The overall capability of the system is seen as limited to some total maximum rate $M = \sum_j M_j(\mathcal{M}_j, C_j)$.
Wallace (2016) uses an Arrhenius reaction rate model to argue that the rate of individual module cognition is then given as exp[−K_j/M_j] for an appropriate K_j > 0. We focus first on optimization under real-time ‘roadway’ constraints in which
tactical rather than strategic considerations predominate.
Taking a Lagrange multiplier approach to efficiency optimization under the constraint $\sum_j M_j = M > 0$, we use the simplest possible equally-weighted multiobjective scalarization, producing the Lagrangian

$$L \equiv \sum_j \exp[-K_j/M_j] - \lambda\left[\sum_j M_j - M\right] \qquad (1.37)$$
More sophisticated approaches are possible, leading to Pareto-optimal surfaces
for distributed multi-agent optimization (e.g., Lobel et al. 2011; Hwang and Masud
1979), understanding, however, that Pareto-optimal strategies may be highly pathological.
The corresponding gradient equations are

$$\frac{K_j}{M_j^2}\exp[-K_j/M_j] = \lambda, \qquad M = \sum_j M_j, \qquad \partial L/\partial M = \lambda \qquad (1.38)$$
where, abducting arguments from physical theory, $\lambda$ is taken as the ‘inverse Boyd temperature’ of the full system. Any good statistical thermodynamics text will go through the argument (e.g., Schrodinger 1989, Chap. II). The calculation is based on maximizing a probability distribution using Lagrange multipliers. Then $\log(P)$ in

$$P = \frac{N!}{n_1! n_2! \cdots}$$

is maximized subject to the constraints $\sum_i n_i = N$ and $\sum_i \varepsilon_i n_i = E$, where $n_i$ is the number in state $i$ and $\varepsilon_i$ its energy. One then applies the Stirling formula $n! \approx n(\log(n) - 1)$ and some hand-waving to identify the energy multiplier as an inverse temperature.
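For concreteness, a minimal numerical sketch of the allocation implied by Eqs. (1.37) and (1.38) follows, assuming SciPy: marginal effectiveness is equalized across modules at a common $\lambda$, which is then tuned so the allocations exhaust a fixed total $M$. The module constants $K_j$, the total $M$, and all names here are illustrative assumptions, not values taken from the text.

```python
# Minimal sketch of Eqs. (1.37)-(1.38): equalize marginal effectiveness across
# modules at a common lambda, tuned so allocations sum to a fixed total M.
import numpy as np
from scipy.optimize import brentq

K = np.array([0.3, 0.5, 0.9])  # hypothetical module constants K_j
M_total = 4.0                  # assumed total resource constraint M

def marginal(Mj, Kj):
    """d/dM_j of exp(-K_j/M_j): the left-hand side of Eq. (1.38)."""
    return (Kj / Mj**2) * np.exp(-Kj / Mj)

def allocation(lam):
    """Per-module M_j at a given lambda, on the decreasing branch of Fig. 1.5."""
    # marginal() peaks at M_j = K_j/2; search to the right of the peak.
    return np.array([brentq(lambda M: marginal(M, k) - lam, k / 2, 1e6)
                     for k in K])

lam_max = min((4.0 / k) * np.exp(-2.0) for k in K)  # smallest peak height
lam = brentq(lambda l: allocation(l).sum() - M_total, 1e-9, 0.999 * lam_max)
print("lambda =", round(lam, 4), " M_j =", allocation(lam).round(3))
```

As $M_{\text{total}}$ is raised, the tuned $\lambda$ falls toward zero, consistent with the high-Boyd-temperature behavior discussed next.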
Figure 1.5 shows a single term for $K_j = 0.5$ over a range $0 \le M_j \le 2$. It is important to recognize that, for small $\lambda$, i.e., high Boyd temperature, the required $M_j$ may become arbitrarily large, a demand that cannot be met: the system then fails catastrophically.
Fig. 1.5 Optimization of the response rate term $\exp[-K_j/M_j]$, $K = 0.5$. Small values of $\lambda$—high Boyd temperature—imply resource demands that simply cannot be met
Clearly, then, sufficient ‘cognitive challenge’ creates the conditions for sudden, punctuated collapse. This follows directly from the inference that, for a given cognitive module, we will most likely have something much like $M_j \propto \phi_j T_j$, i.e., the rate of resource consumption is determined by the synergism between the force capacity/resolve index and the Clausewitz temperature index. At the very least, $M_j(\mathcal{M}_j, C_j)$, where $C_j$ represents the required information channel capacity, must itself be some positive, monotonic increasing function of them both.
Although we have used a ‘simple’ deterministic model, the real world is seldom
deterministic: the essential parameters of Eqs. (1.37) and (1.38) can themselves be
stochastic variates, and we enter the complicated realm of stochastic programming,
following closely the presentation of Cornuejols and Tutuncu (2006, Chap. 16).
Many optimization problems are described by uncertain parameters, and one form
of approach, stochastic programming, assumes that these uncertain parameters are
random variables with known probability distributions. This information is then used
to transform the stochastic program into a so-called deterministic equivalent which
might be a linear program, a nonlinear program, or an integer program. As Cornuejols
and Tutuncu put it,
While stochastic programming models have existed for several decades, computational technology has only recently allowed the solution of realistic size problems... It is a popular
modeling tool for problems in a variety of disciplines including financial engineering...
Stochastic programming models can include anticipative and/or adaptive decision variables.
Anticipative variables correspond to those decisions that must be made here-and-now and
cannot depend on the future observations/partial realizations of the random parameters...
Evidently, real-time critical systems will often fall heavily into the anticipative
category.
We provide a relatively simple example, explicitly reconsidering the effectiveness
reaction rate index exp[−K j /M j ].
The scalarization function is then to be replaced by its expectation before the
optimization calculation is carried out.
We assume, first, that the $K_j$ have exponential distribution density functions, i.e., $\rho(K_j) = \omega_j \exp[-\omega_j K_j]$, so that

$$E(K_j) = \int_0^\infty K_j\,\rho(K_j)\,dK_j = 1/\omega_j \qquad (1.39)$$
As a consequence,

$$E(\exp[-K_j/M_j]) = \int_0^\infty \omega_j \exp[-\omega_j K_j]\exp[-K_j/M_j]\,dK_j = \frac{M_j\omega_j}{M_j\omega_j + 1} \qquad (1.40)$$
The first part of Eq. (1.38) then becomes

$$\frac{\omega_j}{M_j\omega_j + 1} - \frac{M_j\omega_j^2}{(M_j\omega_j + 1)^2} = \lambda \qquad (1.41)$$
Figure 1.6 shows this relation for $\langle K_j \rangle = 1/\omega_j = 0.5$. Again, small $\lambda$, equivalent to a high Boyd temperature, is to be associated with exploding demand for resources, but without a zero state like that at the left of the peak in Fig. 1.5. In this case, noise precludes such a state.
A second approach is to take the $M_j$ themselves as stochastic variables having exponential distributions and the $K_j$ as fixed parameters, so that

$$\langle M \rangle_j \equiv E(M_j) = \int_0^\infty M_j\,\omega_j \exp[-\omega_j M_j]\,dM_j = 1/\omega_j \qquad (1.42)$$
The Lagrangian in the effectiveness measure is then

$$L = \sum_j E(\exp[-K_j/M_j]) - \lambda\left[\sum_j E(M_j) - M\right] = \sum_j 2\sqrt{\omega_j K_j}\,\mathrm{BesselK}\!\left(1, 2\sqrt{\omega_j K_j}\right) - \lambda\left[\sum_j \frac{1}{\omega_j} - M\right] \qquad (1.43)$$
Fig. 1.6 Stochastic optimization of the do-or-die effectiveness measure under an exponential distribution for the rate parameters $K_j$. Here, $\langle K_j \rangle = 1/\omega_j = 0.5$. Under these stochastic circumstances there is no point ‘behind the peak’, and a small $\lambda$, corresponding to a high Boyd temperature, leads directly to resource demands that cannot be met
The first term in the gradient equation analogous to that of Eq. (1.38), but now replacing $\omega_j$ with $1/\langle M \rangle_j$, is

$$2\,\mathrm{BesselK}\!\left(0, 2\sqrt{K_j/\langle M \rangle_j}\right)\frac{K_j}{\langle M \rangle_j^2} = \lambda \qquad (1.44)$$
Figure 1.7 plots that relation for $K_j = 0.5$. The average demand for resources, $\langle M_j \rangle = 1/\omega_j$, grows very rapidly with declining $\lambda$ under this model, again reaching impossible levels under real-time frictional constraints.
If both $K_j$ and $M_j$ obey exponential distributions,

$$E(\exp[-K_j/M_j]) = \frac{\langle M \rangle_j}{\langle K \rangle_j}\left(1 - 2\exp[\langle K \rangle_j/\langle M \rangle_j]\,\mathrm{Ei}_3(\langle K \rangle_j/\langle M \rangle_j)\right) \qquad (1.45)$$

where $\mathrm{Ei}_n$ is the exponential integral of order $n$.
The gradient equation of the resulting Lagrangian in $\langle M \rangle_j$ becomes

$$\lambda = \frac{1}{KM}\Big(2\exp[K/M](K - M)\,\mathrm{Ei}_3(K/M) - 2K\exp[K/M]\,\mathrm{Ei}_2(K/M) + M\Big) \qquad (1.46)$$

where we have suppressed the $j$ index and both $M$ and $K$ are their expectation values.
Fig. 1.7 Stochastic optimization for the effectiveness measure under an exponential distribution in the resource supply $M_j$ at fixed values of $K_j$. Then $\langle M_j \rangle = 1/\omega_j$, and the average resource demand rapidly becomes extreme for declining $\lambda$
Fig. 1.8 Stochastic optimization under resource constraint assuming $K_j$ and $M_j$ both follow exponential distributions. $\langle K \rangle_j = 0.5$
Figure 1.8 shows this relation for $\langle K \rangle_j = 0.5$, and is similar in form to Fig. 1.6.
It is of some interest to carry through this program for the efficiency measure $\exp[-K/M]/M$, which would become important on strategic scales of analysis, that is, long-term conflict beyond do-or-die exigencies. The equations below, for which we have suppressed the $j$ index, are each equivalent to the first of Eq. (1.38): (1) fully deterministic; then, for exponential distributions, (2) deterministic $K$, stochastic $M$; (3) stochastic $K$, deterministic $M$; (4) both $K$ and $M$ stochastic.
$$\lambda = \frac{K\exp[-K/M]}{M^3} - \frac{\exp[-K/M]}{M^2}$$
$$\lambda = -\frac{2\,\mathrm{BesselK}(0, 2\sqrt{K/M})}{M^2} + \frac{2\,\mathrm{BesselK}(1, 2\sqrt{K/M})\sqrt{KM}}{M^3}$$
$$\lambda = \frac{1/K^2}{(M/K + 1)^2}$$
$$\lambda = \frac{\exp[K/M](K + M)\,\mathrm{Ei}_1(K/M) + M}{M^3} \qquad (1.47)$$
$\mathrm{Ei}_1$ is the exponential integral of order 1. In the second equation, $M$ is actually the expectation $\langle M \rangle$. For the third, $K$ is the expectation, and for the fourth, both are expectations.
Figure 1.9 shows the detail of the deterministic result, which admits of negative
Boyd temperatures. Such values are analogous to negative temperatures in unstable
‘pumped’ physical systems, like lasers. Evidently, a negative Boyd temperature for
the deterministic efficiency measure implies extraordinary demands for resources
over the strategic, as opposed to short-time tactical, time scale.
Fig. 1.9 Deterministic model for optimization of the efficiency index $\sum_j \exp[-K_j/M_j]/M_j$. Negative $\lambda$ implies extraordinary demand for resources over the strategic time scale. $K$ is taken as 0.5
Fig. 1.10 Term-by-term stochastic optimization for the efficiency index $\sum_j \exp[-K_j/M_j]/M_j$. (a) fixed $K$, stochastic $M$; (b) stochastic $K$, fixed $M$; (c) both stochastic. Exponential distributions assumed. $K$ and $\langle K \rangle$ are taken as 0.5
Figure 1.10 shows the pattern for the different stochastic optimizations: (a) fixed
K , stochastic M, (b) stochastic K , fixed M, (c) both stochastic. In all cases, the
demand for resources, either directly or on average, becomes explosive with declining λ.
These stochastic optimization calculations are not completely trivial, and required a sophisticated computer algebra program for their solution.
One-parameter distributions, in general, can be explored using a variant of the method applied here. Under such a condition, $\langle M \rangle_j \equiv E(M_j)$ can be expressed as a function of the distribution’s characteristic parameter, say $\alpha_j$, recalling that the distribution function is $\rho(\alpha_j, M_j)$. Then
$$\langle M \rangle_j \equiv E(M_j) = \int_0^\infty M_j\,\rho(\alpha_j, M_j)\,dM_j = Q_j(\alpha_j)$$
This can be back-solved as $\alpha_j = Q_j^{-1}(\langle M \rangle_j)$, which can then be used to calculate $E(\exp[-K_j/M_j])$ or the expectation of the efficiency measure. Differentiating under the integral in $\langle M \rangle_j$ gives the gradient expression of Eq. (1.38), which can be relatively easily evaluated by numerical means. The same argument applies to the $K_j$. A good computer algebra program will, in fact, sometimes generate functions that can be explicitly plotted. For example, taking the Rayleigh distribution for $M_j$, i.e.,

$$\rho(M_j) = \frac{M_j}{\sigma_j^2}\exp[-M_j^2/(2\sigma_j^2)]$$

so that $\langle M \rangle_j = \sigma_j\sqrt{\pi/2}$, leads to an equation like Eq. (1.44), but far more complicated and in terms of several MeijerG functions instead of a single BesselK function. For $K = 0.5$, however, the graph against $\langle M \rangle_j$ is very similar to Fig. 1.7.
Weibull and Levy distributions for K give recognizably similar results to Figs. 1.7,
1.8, 1.9 and 1.10 for both effectiveness and efficiency measures when M is deterministic, with details depending on the distribution parameter values. The Wald
distribution also gives similar results in both M and K separately.
Many other distributions are less algebraically tractable and need numerical exploration. And, as Cornuejols and Tutuncu (2006) point out, complex real-world applications, involving often highly dynamic empirical distributions, are likely to be computationally challenging.
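As an indication of what such numerical exploration might look like, the following sketch uses SciPy quadrature and root-finding in place of computer algebra, applied to the exponential case of Eqs. (1.40) and (1.41) only because the closed form there permits a check. All parameter values are assumptions.

```python
# Numerical exploration of the stochastic effectiveness optimization: compute
# E(exp[-K/M]) by quadrature, check Eq. (1.40), and back-solve Eq. (1.41).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

omega = 2.0  # so <K_j> = 1/omega = 0.5, as in Fig. 1.6

def expected_effectiveness(M):
    """E(exp[-K/M]) for K ~ Exponential(omega), by direct quadrature."""
    val, _ = quad(lambda K: omega * np.exp(-omega * K - K / M), 0, np.inf)
    return val

# Check quadrature against the closed form of Eq. (1.40):
M = 1.3
assert abs(expected_effectiveness(M) - M * omega / (M * omega + 1)) < 1e-8

def gradient(M):
    """Left-hand side of Eq. (1.41)."""
    return omega / (M * omega + 1) - M * omega**2 / (M * omega + 1) ** 2

# Resource rate demanded at a given lambda; demand diverges as lambda -> 0
# (high Boyd temperature), as in the discussion of Fig. 1.6:
for lam in (0.2, 0.1, 0.05, 0.01):
    M_star = brentq(lambda M: gradient(M) - lam, 1e-9, 1e9)
    print(f"lambda = {lam:5.2f} -> M_j ~ {M_star:.2f}")
```

For a less tractable distribution, only the quadrature step changes; the back-solving step is identical.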
1.9 Flash-Crash Market Pathologies
Many years ago, Huberman and Hogg (1987) examined the punctuated onset of
collective phenomena across interacting algorithmic systems:
We predict that large-scale artificial intelligence systems and cognitive models will undergo
sudden phase transitions from disjointed parts into coherent structures as their topological
connectivity increases beyond a critical value. These situations, ranging from production
systems to semantic net computations, are characterized by event horizons in space-time
that determine the range of causal connections between processes. At transition, these event
horizons undergo explosive changes in size. This phenomenon, analogous to phase transitions in nature, provides a new paradigm with which to analyze the behavior of large-scale
computation and determine its generic features.
Recent work on ‘flash crash’ stock market collapses bears out these predictions,
implying indeed dynamics roughly analogous to the Boyd mechanism of the previous
section (Parker 2016a, b; Johnson et al. 2013; Zook and Grote 2017).
Fig. 1.11 From Zook and Grote 2017. Flash Crash of May 6, 2010: percent change in Standard and Poor’s 500 index at one-minute intervals during the trading day. Chapter 6 will examine such phenomena as an example of the punctuated onset of a coevolutionary ‘language that speaks itself’. Military coevolutionary catastrophes can play out over much longer time scales
Figure 1.11, from Zook and Grote (2017), shows the flash crash of May 6, 2010,
in which Standard and Poor’s 500 index declined by about 5 percent in only a few
minutes.
Zook and Grote (2017) remark
In the days of HFT [high frequency trading] with its enormous technological infrastructure,
public information is transformed into orders that are brought to the market extremely fast
so that they resemble private information—at least with regard to other, slower, market
participants. Forefronting the processes and strategies contained in the assemblages of
HFT is essential in recognizing that the recreation of capital exchanges is not simply an
exercise in efficiency but a calculated strategy. The human traders directing the efforts of
HFT assemblages rely upon space-based strategies of information inequality to extract profits
while simultaneously introduce new and unknown risks into the market.
Parker (2016a) writes:
While a wide variety of causes have been offered to explain the anomalous market phenomena
known as a ‘Flash Crash’, there is as of yet no consensus among financial experts as to the
sources of these sudden market collapses. In contrast to the behavior expected from standard
financial theory, both the equity and bond markets have been thrown into freefall in the
absence of any significant news event. The author posits that information theory offers a
relatively simple explanation of the causes of some of these dramatic events. This... suggests
new policies or measures to lower the probability of occurrence and to mitigate the effects
of these extreme events. [It is possible to develop] equations modeling the adjusted volatility
for equity markets and the information theory derived yield term for treasury markets. These
equations both take as inputs the information production ($CC_A$) and processing rates ($CC_L$) of the market and market participants respectively. The value of the ratio ($CC_A/CC_L$) of these rates determines different regimes of normal and ‘anomalous’ behaviors for equity and bond markets. As this ratio evolves over a continuum of values, these markets can be shown to go through phase transitions between different types of behavior...
Thus the ratio $CC_A/CC_L$ acts as a temperature analog in the Parker model.
Johnson et al. (2013) put it somewhat differently, invoking their own version of
the Boyd analysis:
Society’s techno-social systems are becoming ever faster and more computer-oriented. However, far from simply generating faster versions of existing behavior... this speed-up can generate a new behavioral regime as humans lose the ability to intervene in real time. Analyzing
millisecond-scale data for... the global financial market, we uncover an abrupt transition to
a new all-machine phase characterized by large numbers of subsecond extreme events.
Their ‘temperature’ analog is the real-time ratio of the number of available strategies to the number of active agents. If this is greater than 1, the system remains
stable in their model. Below 1, the system undergoes a phase transition to an unstable dynamic regime. See their Fig. 6 for details, and their reference list for some of
the other phase transition studies of the flash-crash pathology.
Something similar to Parker’s analysis emerges from the arguments of the previous section—although only indirectly as a temperature—by letting the constraint $M$, via its components $M_j$, be given wholly in terms of the available information channel capacity $C_j$, replacing the resolve-and-information constraint above. That is, $M = \sum_j M_j(\mathcal{M}_j, C_j)$ becomes a purely informational constraint in a multichannel complex, i.e., $C = \sum_j C_j$. The system’s inverse Boyd temperature index $\lambda$
then determines whether there is enough channel capacity available to permit stability.
Unlike Parker’s single component result, for larger, interactive systems, under certain
Boyd temperature regimes there may never be enough channel capacity available.
That is, for the flash crash example, if the rate of challenge ‘gets inside the command loop’ of the market system, the $CC_L$ of individual components can never be made large enough for stabilization: the response rate calculations leading to Figs. 1.6 and 1.7 suggest that a high enough Boyd temperature—sufficiently small $\lambda$—leads to channel capacity demands for individual modules that cannot be met.
These mechanisms have been recognized as sources of instability in AI-driven
military confrontations (e.g., Baumard 2016). As Kania (2017) put it in the context
of the inevitably different design ‘cultures’ for Western and Chinese military AI
systems,
Against the backdrop of intensifying strategic competition, great powers are unlikely to
accept constraints upon capabilities considered critical to their future military power. At
this point, despite recurrent concerns over the risks of ‘killer robots,’ an outright ban would
likely be infeasible. At best, militaries would vary in their respective adherence to potential
norms. The military applications of AI will enable new capabilities for militaries but also will
create new vulnerabilities. This militarization of AI could prove destabilizing, potentially
intensifying the risks of uncontrollable or even unintended escalation. There will likely be
major asymmetries between different militaries’ approaches to and employment of AI in
warfare, which could exacerbate the potential for misperception or unexpected algorithmic
interactions.
Turchin and Denkenberger (2018), in their long chapter on military AI, devote
only a single paragraph to such dynamics:
Nuclear weapons lessened the time of global war to half an hour. In the case of war between
two military AIs it could be even less.... A war between two military AIs may be similar to
the flash-crash: two AIs competing with each other in a stable mode, could, in a very short
time (from minutes to milliseconds), lose that stability. They could start acting hostilely to
each other...
Altmann and Sauer (2017) provide a more comprehensive analysis, explicitly
citing ‘flash-crash’ examples, including an April 2011 ‘combat’ on Amazon between
vendor algorithms that escalated the price offered for an out-of-print biology book
to $23.7 million. As they put it,
With the goal of improved military effectiveness providing a strong incentive to increase
operational speeds, and thus allow [autonomous weapon systems] to operate without further
human intervention, tried and tested mechanisms for double-checking and reconsideration
that allow humans to function as fail-safes or circuit-breakers are discarded. This, in combination with unforeseeable algorithm interactions producing unforseeable military outcomes,
increases crisis instability and is unpleasantly reminiscent of Cold War scenarios of accidental war... [Autonomous weapons systems] are also bound to introduce stronger incentives
for premeditated (including surprise) attacks...
It seems clear that the risk of such pathological interaction is inherent to AI control
of real-time critical systems across many venues.
In Chap. 6 we will reexamine algorithmic flash-crash processes and similar phenomena from the more general perspective of evolutionary theory, suggesting that
they represent the punctuated onset of rapid coevolutionary dynamics, in effect,
of a ‘language that speaks itself’, creating instabilities far beyond those resulting
from John Boyd’s command loop challenge. Indeed, quite perversely, command
loop robustness under dynamic challenge is the keystone to the instability.
1.10 Network Fragmentation
Many AI systems, like driverless cars on intelligent roads or agent-based models of complex financial or other phenomena, will involve networks of cognitive entities that exchange information and/or affect each other directly in a characteristic ‘real time’. Contending military hierarchies, of course, provide a central paradigm. Consideration suggests that ‘phase transitions’ in such systems are dependent, not only on such temperature analogs as $T\phi$, but on their rate-of-change. The argument is direct, centering on the ‘free energy’ measure $F$ of Eq. (1.34).
Following standard argument, we take $K \equiv 1/T\phi$ as an inverse temperature. The essential idea is to define a metric on the network structure representing some inherent distance measure, $\mathcal{L}$, between interacting nodes. Typically, this will be some monotonic increasing positive inverse measure of their probability of interaction: smaller probability, larger ‘distance’.
Let $J$ be a dummy variable that will be set to zero in the limit. The central question regards the dynamics of the system as $K \to K_C$, where $K_C$ is the critical value at which a phase transition occurs. Interest focuses on both $F(J, K)$ and on the correlation length of the system across the network, $\chi(J, K)$.
Abducting the basic physical model of Wilson (1971), we impose a renormalization symmetry as (Wallace 2005)

$$F(J_{\mathcal{L}}, K_{\mathcal{L}}) = \mathcal{L}^D F(J, K)$$
$$\chi(J_{\mathcal{L}}, K_{\mathcal{L}}) = \chi(J, K)/\mathcal{L} \qquad (1.48)$$
where $J_{\mathcal{L}}$ and $K_{\mathcal{L}}$ are the transformed values after the clumping renormalization, and we take $J_1, K_1 \equiv J, K$. $D$ is a real positive number characteristic of the network, here most likely a fractal dimension. In physical systems $D$ is integral and determined by the underlying dimensionality of the object under study (Wilson 1971). As shown in the Mathematical Appendix, many different such renormalization relations are possible for cognitive systems. These relations are presumed to hold in the neighborhood of the critical value of the transition index, $K_C$.
Differentiating with respect to $\mathcal{L}$ gives expressions of the form

$$dK_{\mathcal{L}}/d\mathcal{L} = w(J_{\mathcal{L}}, K_{\mathcal{L}})/\mathcal{L}$$
$$dJ_{\mathcal{L}}/d\mathcal{L} = v(J_{\mathcal{L}}, K_{\mathcal{L}})J_{\mathcal{L}}/\mathcal{L} \qquad (1.49)$$
These equations are solved for $J_{\mathcal{L}}$ and $K_{\mathcal{L}}$ in terms of $\mathcal{L}$, $J$, and $K$. Substituting back and expanding in a first-order Taylor series near the critical value $K_C$ gives an analog to the Widom-Kadanoff relations of physical systems (Wilson 1971). In particular, letting $J \to 0$ and taking $\omega = (K_C - K)/K_C$ gives, in first order near $K_C$,

$$F = \omega^{D/y} F_0$$
$$\chi = \omega^{1/y} \chi_0 \qquad (1.50)$$
where $y > 0$ and $F_0$, $\chi_0$ are constants. In standard form, at the critical point a Taylor expansion of the renormalization equations gives a first-order matrix of derivatives whose eigenstructure defines system dynamics (Wilson 1971; Binney et al. 1986).
Next, assume that the rate of change of $\omega = (K_C - K)/K_C$ remains constant at some rate $|d\omega/dt| = 1/\tau_K$. Arguing by abduction from physical theory suggests there is a characteristic time constant for the phase transition, $\tau \equiv \tau_0/\omega$, such that if changes in $\omega$ take place on a timescale longer than $\tau$ for any given $\omega$, the correlation length $\chi = \chi_0 \omega^{-s}$, $s = 1/y$, will be in equilibrium with internal changes and result in very large fragments in $\mathcal{L}$-space.

Zurek (1985, 1996) argues that the ‘critical’ time will occur for a system time $\hat{t} = \chi/|d\chi/dt|$ such that $\hat{t} = \tau$. Taking the derivative $d\chi/dt$, remembering that $d\omega/dt \equiv 1/\tau_K$, gives

$$\frac{\chi}{|d\chi/dt|} = \frac{\omega\tau_K}{s} = \frac{\tau_0}{\omega} \qquad (1.51)$$
so that

$$\omega = \sqrt{s\tau_0/\tau_K} \qquad (1.52)$$
Substituting this into the relation for the correlation length gives the expected fragment size in $\mathcal{L}$-space, $d(\hat{t})$, as

$$d \approx \chi_0 \left(\frac{\tau_K}{s\tau_0}\right)^{s/2} \qquad (1.53)$$

with $s = 1/y > 0$.
The more rapidly $K$ approaches $K_C$, the smaller $\tau_K$, and the smaller and more numerous are the resulting fragments in $\mathcal{L}$-space. Under real-time combat or combat-like conditions, such fragments will have lost essential economies of both scale and command cohesion.
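A quick numerical reading of Eq. (1.53), under assumed values of $\chi_0$, $\tau_0$, and $s$, makes the point:

```python
# Fragment-size scaling of Eq. (1.53); chi0, tau0, and s are assumed values
# chosen purely for illustration, not derived from any particular network.
chi0, tau0, s = 1.0, 1.0, 0.5

for tau_K in (100.0, 10.0, 1.0, 0.1, 0.01):
    d = chi0 * (tau_K / (s * tau0)) ** (s / 2)
    print(f"tau_K = {tau_K:6.2f} -> expected fragment size d ~ {d:.3f}")
# Faster approach to the critical point (smaller tau_K) yields smaller and
# more numerous fragments, with loss of scale economies and command cohesion.
```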
A more detailed examination of the phase transitions associated with fragmentation is given in the Mathematical Appendix under the heading ‘Cognitive renormalization’. The exact form of Eq. (1.51) depends critically on the renormalization model adopted, which, in turn, is dependent on the rate of growth of $F$ with increasing $\mathcal{L}$.
1.11 The Ratchet
What are the limits on $T$ (or on $T\phi$), the temperature analog that determines cognitive function in elaborate AI (and other) cognitive systems? We reconsider the argument leading to Eq. (1.13). First, assume that $T \to T + \Delta$, $\Delta \ll T$.
This leads to an expression for the free energy index $\mathcal{F}$ of the form

$$\exp\left[-\frac{\mathcal{F}}{T + \Delta}\right] = \int_0^\infty \exp[-R/(T + \Delta)]\,dR = T + \Delta \qquad (1.54)$$
Defining another entropy in the free energy measure $\mathcal{F}$ as $\mathcal{S} \equiv \mathcal{F}(\Delta) - \Delta\, d\mathcal{F}/d\Delta$ allows use of an iterated stochastic Onsager approximation for the dynamics of $\Delta$ in the gradient $d\mathcal{S}/d\Delta$ (de Groot and Mazur 1984). The resulting stochastic differential equation is

$$d\Delta_t = \frac{\mu\Delta_t}{T + \Delta_t}\,dt + \sigma\Delta_t\,dW_t \approx \frac{\mu}{T}\Delta_t\,dt + \sigma\Delta_t\,dW_t \qquad (1.55)$$

where $\mu$ is an appropriate ‘diffusion coefficient’, $dW_t$ represents Brownian white noise, $\sigma$ determines the magnitude of the volatility, and we use the condition that $\Delta \ll T$.
Applying the Ito Chain Rule (Protter 1990) to $\log[\Delta]$ produces the SDE

$$d\log[\Delta_t] = \left(\frac{\mu}{T} - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t \qquad (1.56)$$
Invoking the Stochastic Stabilization Theorem (Mao 2007; Appleby et al. 2008),

$$\lim_{t\to\infty} \frac{\log[|\Delta_t|]}{t} < 0$$

almost surely unless

$$\frac{\mu}{T} > \frac{1}{2}\sigma^2 \;\Longrightarrow\; T < \frac{2\mu}{\sigma^2} \qquad (1.57)$$
The essential point is that there will be an upper limit to $T$ in this version of the ratchet. Above that ceiling, other things being equal, $\Delta_t \to 0$. This mechanism might constrain the maximum possible $T$. Conversely, a sudden increase in $\sigma$ might trigger a decline in $T$ that in turn causes a subsequent increase in $\sigma$, leading to a downward ratchet and system collapse.
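The threshold of Eq. (1.57) is easy to exhibit numerically. The following Euler-Maruyama sketch of Eq. (1.55) is a toy illustration under assumed $\mu$, $\sigma$, and step sizes, not a calibrated model:

```python
# Euler-Maruyama sketch of the ratchet SDE of Eq. (1.55); a toy illustration
# of the stabilization threshold T < 2*mu/sigma**2 under assumed parameters.
import numpy as np

rng = np.random.default_rng(42)

def final_delta(T, mu=1.0, sigma=1.0, delta0=0.1, dt=1e-3, steps=100_000):
    """Integrate d(Delta) = (mu/T) Delta dt + sigma Delta dW; return |Delta|."""
    d = delta0
    sqdt = np.sqrt(dt)
    for _ in range(steps):
        d += (mu / T) * d * dt + sigma * d * sqdt * rng.standard_normal()
    return abs(d)

# Threshold here is T = 2*mu/sigma**2 = 2.0: below it Delta grows (the ratchet
# can climb), above it noise drives Delta -> 0 almost surely, capping T.
for T in (1.0, 1.5, 3.0, 10.0):
    print(f"T = {T:5.1f} -> |Delta| ~ {final_delta(T):.3e}")
```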
1.12 Operational and Strategic Failure
Typically, for real-time systems, local entities are engaged in what the military call
immediate do-or-die ‘tactical’ challenges, for example a single driverless car in a
rapidly varying traffic stream. Two subsequent layers of cognition, however, are
imposed on the tactical level. The highest involves the ‘strategic’ aims in which
tactical decisions are embedded. For driverless cars on intelligent roads, so-called
V2V/V2I systems, the ultimate aim is unimpeded, rapid traffic flow over some preexisting street network. Connecting strategy to tactics is done through the operational
level of command, the necessary interface between local and ultimate cognitive
intent. While ‘tactical’ problems usually have relatively straightforward engineering
solutions—lidar, radar, V2V crosstalk, and so on for driverless cars—operational
and strategic levels do not.
As Watts (2008), in a military setting, puts it
The cognitive skills demanded of operational artists and competent strategists appear to
differ fundamentally from those underlying tactical expertise in do-or-die situations... Tactical competence does not necessarily translate into operational competence... Operational
problems, being wicked [in a precise technical sense] are characterized by complexity and
uncertainty embedded in a turbulent environment riddled with uncertainties.
Rose (2001) explores critical US strategic intelligence failures during the Korean
War. The tactical brilliance of the US amphibious landing at Inchon, on South Korea’s
Northwest coast, on September 15, 1950 was matched by a stunning blindness to persistent and accurate intelligence reports of a massive Chinese buildup in Manchuria.
Indeed, the Chinese had already sent numerous diplomatic signals that they viewed
US presence north of the 38th Parallel as a strategic threat. US Cold War doctrine, however, dictated that the Soviet Union controlled all Communist entities and that, fearing war with the US, the Soviets would simply rein in the Chinese. US China scholars, who would have known better and might have entered into policy discussions, had all been silenced by the McCarthy-era smears about who had ‘lost China’.
US commanding general Douglas MacArthur and his hand-picked, sycophantic staff
argued that, in spite of the evident massive military buildup, the Chinese would not
intervene. On October 13, they began doing so, in two distinct stages.
As Rose puts it,
By mid-November [1950], FEC reported that 12 PLA divisions had been identified in Korea.
On 24 November, however, National Intelligence Estimate 2/1 stated that China had the
capability for large-scale offensive operations but that there were no indications such an
offensive was in the offing. That same day, the second Chinese offensive started, leaving the
8th Army fighting for its life and most of the 1st Marine Division surrounded and threatened
with annihilation.
It took several days for MacArthur and his staff to face the fact that his ‘end of the war’
offensive toward the Yalu was over and victory was not near. Finally, on 28 November,
MacArthur reported that he faced 200,000 PLA troops and a completely new war. MacArthur
again had the numbers significantly wrong, but he got the ‘new war’ part right.
Similarly, Bowden (2017) describes in some detail US operational and strategic
failures associated with the occupation of the Vietnamese city of Hue during the
1968 Tet offensive by a highly-disciplined and well-equipped North Vietnamese and
Viet Cong force of some 10,000. US operational and strategic command assumed
that, as had been previous experience, the Vietnamese could not possibly field such
a force and that whatever groups occupied Hue would, as was their previous custom,
withdraw at the first US counterattack. In consequence, US operational command, in
the face of frequent and accurate intelligence and field reports of the real strength of
enemy opposition, repeatedly ordered some few hundred Marines to advance under
fire and ‘clear Hue’.
As one commentator put it, in the context of current US military operations, ‘we
have learned nothing since Vietnam’.
The military writings of Mao Tse-Tung (1963, pp. 79–80), in a piece from December of 1936, put it thus:
Why is it necessary for the commander of a campaign or a tactical operation to understand
the laws of strategy to some degree? Because an understanding of the whole facilitates the
handling of the part, and because the part is subordinate to the whole. The view that strategic
victory is determined by tactical successes alone is wrong because it overlooks the fact that
victory or defeat in a war is first and foremost a question of whether the situation as a whole
and its various stages are properly taken into account. If there are serious defects or mistakes
in taking the situation as a whole and its various stages into account, the war is sure to be
lost.
In this regard, Nisbett and Miyamoto (2005) find
There is recent evidence that perceptual processes are influenced by culture. Westerners tend
to engage in context-independent and analytic perceptual processes by focusing on a salient
object independently of its context, whereas Asians tend to engage in context-dependent
and holistic perceptual processes by attending to the relationship between the object and the
context in which the object is located. Recent research has explored mechanisms underlying
such cultural differences, which indicate that participating in different social practices leads
to both chronic as well as temporary shifts in perception. These findings establish a dynamic
relationship between the cultural context and perceptual processes. We suggest that perception can no longer be regarded as consisting of processes that are universal across all people
at all times.
Operational and strategic incompetence can often be characterized as a particular
kind of communication failure, i.e., the failure of the diffusion of essential information across a network weighted by an index of the level of command. This may,
of course, originate from any number of sources, including command stupidity or
tunnel vision imposed by more local learned practices, as well as inherent cultural
blindness.
The diffusion of information on a network of cognitive entities is a contagious
process in the sense of the analytic geographers (Abler et al. 1971; Gould and Wallace
1994). That is, fads, rumors, and epidemics can be characterized in terms of ‘signal’
(in a large sense) per unit entity—population, area, biomass, or, in this case, command responsibility. Gould and Wallace (1994) and Wallace et al. (1997) examine in
some detail diffusion on a ‘commuting field’ defined by self-to-self and self-to-other
transmissions between rapidly-interacting network nodes.
The perspective describes the propagation of a signal via a Markov ‘network
dynamics’ method in terms of a network probability-of-contact matrix (POCM)
defined by observation of signal exchange and its equilibrium distribution.
Following Gould and Wallace (1994), the spread of a ‘signal’ on a particular network of interacting sites—between and within—is characterized at nonequilibrium steady state in terms of an equilibrium distribution $\varepsilon_i$ ‘per unit area’ $A_i$ of a Markov process, where $A_i$ scales with the different ‘size’ of each node, taken as distinguishable by a scale variable $A$—here an index of the level of command—as well as by its ‘position’ $i$ in the associated POCM. The POCM—again, determined empirically from observation—is then normalized to a stochastic matrix $Q$ having unit row sums, and the vector $\varepsilon$ calculated as $\varepsilon = \varepsilon Q$.
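As a toy illustration of this construction, the following sketch normalizes a fabricated POCM to a stochastic matrix $Q$ and solves $\varepsilon = \varepsilon Q$ as a left-eigenvector problem; the matrix entries are invented for the example:

```python
# Sketch of the network-dynamics step: normalize an observed probability-of-
# contact matrix (POCM) to a stochastic matrix Q with unit row sums and solve
# eps = eps Q as a left-eigenvector problem. The 4-node POCM is fabricated.
import numpy as np

pocm = np.array([[5., 2., 1., 0.],
                 [2., 6., 2., 1.],
                 [1., 2., 4., 2.],
                 [0., 1., 2., 3.]])  # hypothetical observed signal exchanges
Q = pocm / pocm.sum(axis=1, keepdims=True)

w, v = np.linalg.eig(Q.T)              # left eigenvectors of Q
eps = np.real(v[:, np.argmin(np.abs(w - 1.0))])
eps /= eps.sum()                        # normalize to a probability vector
print("equilibrium distribution eps =", eps)
assert np.allclose(eps @ Q, eps)        # eps = eps Q
```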
There is a vector set of dimensionless network flows $X_t^i$, $i = 1, \ldots, n$ at time $t$. These are each determined by some relation

$$X_t^i = g(t, \varepsilon_i/A_i) \qquad (1.58)$$

Here, $i$ is the index of the node of interest, $X_t^i$ is the corresponding dimensionless scaled $i$-th signal, $t$ the time, and $g$ an appropriate function. Again, $\varepsilon_i$ is defined by the relation $\varepsilon = \varepsilon Q$ for a stochastic matrix $Q$, calculated as the network probability-of-contact matrix between regions, normalized to unit row sums.
Using $Q$, we have broken out the underlying network topology, a fixed between-and-within communication configuration weighted by ‘command weight’ $A_i$ that is assumed to change relatively slowly on the timescale of observation, compared to the time needed to approach the nonequilibrium steady state distribution.
Since the $X$ are expressed in dimensionless form, $g$, $t$, and $A$ must be rewritten as dimensionless as well, giving, for the monotonic increasing (or threshold-triggered) function $G$,

$$X_\tau^i = G\left[\tau, \frac{\varepsilon_i}{A_i} \times \mathcal{A}_\tau\right] \qquad (1.59)$$

where $\mathcal{A}_\tau$ is the value of a ‘characteristic area’ variate that represents the spread of the essential signal at (dimensionless) characteristic time $\tau = t/T_0$.
$G$ may be quite complicated, including dimensionless ‘structural’ variates for each individual geographic node $i$. The idea is that the characteristic ‘area’ $\mathcal{A}_\tau$—the level of command that recognizes the importance of essential incoming information—grows according to a stochastic process, even though $G$ may be a deterministic mixmaster driven by systematic local probability-of-contact or other information flow patterns.
An example.
A characteristic area cannot grow indefinitely, and we invoke a ‘carrying capacity’ for command level on the network under study, say $K > 0$. An appropriate SDE is then

$$d\mathcal{A}_\tau = \mu\rho\,\mathcal{A}_\tau(1 - \mathcal{A}_\tau/K)\,d\tau + \sigma\mathcal{A}_\tau\,dW_\tau \qquad (1.60)$$
where we take ‘ρ’ as representing a composite index of operational and/or strategic
competence.
Using the Ito chain rule on $\log(\mathcal{A})$, as a consequence of the added Ito correction factor and the Jensen inequality for a concave function,

$$E(\mathcal{A}) \to 0, \quad \mu\rho < \sigma^2/2$$
$$E(\mathcal{A}) \ge K\left(1 - \frac{\sigma^2}{2\mu\rho}\right), \quad \mu\rho \ge \sigma^2/2 \qquad (1.61)$$
Figure 1.12 shows the form of this relation. To the left of a critical value of the
competence index ρ, given the usual stochastic variabilities and excursions, there is
a high probability that critical information will not propagate to higher command
from the tactical level.
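The threshold $\mu\rho = \sigma^2/2$ of Eq. (1.61) can be exhibited directly. The sketch below integrates the logistic SDE of Eq. (1.60) by Euler-Maruyama under assumed parameter values:

```python
# Euler-Maruyama sketch of the logistic SDE of Eq. (1.60), exhibiting the
# competence threshold mu*rho = sigma**2/2; all parameter values are assumed.
import numpy as np

rng = np.random.default_rng(7)

def mean_area(rho, mu=1.0, sigma=1.0, K=1.0, a0=0.05, dt=1e-3,
              steps=20_000, n_paths=200):
    """Average the characteristic 'area' A over sample paths at the final time."""
    a = np.full(n_paths, a0)
    sqdt = np.sqrt(dt)
    for _ in range(steps):
        dW = sqdt * rng.standard_normal(n_paths)
        a += mu * rho * a * (1.0 - a / K) * dt + sigma * a * dW
        a = np.maximum(a, 0.0)  # guard against rare discretization overshoot
    return a.mean()

# Threshold at rho = sigma**2/(2*mu) = 0.5: below it E(A) collapses toward
# zero, i.e., essential information fails to climb the command hierarchy.
for rho in (0.2, 0.4, 0.6, 1.0, 2.0):
    print(f"rho = {rho:3.1f} -> E(A) ~ {mean_area(rho):.4f}")
```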
The effect of more general noise forms—colored, Levy, etc.—can be explored using the Doleans-Dade exponential (DDE) (Protter 1990). We suppose Eq. (1.60) can be rewritten in the form

$$d\mathcal{A}_\tau = \mathcal{A}_\tau\,dY_\tau \qquad (1.62)$$

where $Y_\tau$ is an appropriate stochastic process. The DDE of $\mathcal{A}$ is then given as

$$\mathcal{E}(\mathcal{A}) \propto \exp\left(Y_\tau - \frac{1}{2}[Y_\tau, Y_\tau]\right) \qquad (1.63)$$
Fig. 1.12 Lower limit of the expectation for the command level recognizing essential information as a function of an overall operational/strategic competence index $\rho$ at a fixed level of noise. To the left of a critical competence, essential information is unlikely to propagate to higher levels of command: if zero is attainable, stochastic variation ensures that value will be attained
where $[Y_\tau, Y_\tau]$ is the quadratic variation of the stochastic process $Y_\tau$. Heuristically, invoking the Mean Value Theorem, if

$$\frac{1}{2}\,d[Y_\tau, Y_\tau]/dt > dY_\tau/dt \qquad (1.64)$$
then the pathological ground state is stable and information will not flow across the
network system. A version of the formalism does indeed extend to Levy noise, which
has a long tail and relatively large jumps, in comparison with the usual Brownian
noise (Protter 1990).
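A small numerical illustration of the criterion of Eq. (1.64), for the simple case $Y_t = at + \sigma W_t$ where the quadratic variation is $\sigma^2 t$, might run as follows; discretization values are arbitrary:

```python
# Numerical illustration of Eq. (1.64) for Y_t = a*t + sigma*W_t, whose
# quadratic variation is [Y,Y]_t = sigma**2 * t; parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
a, sigma, T, n = 0.3, 1.0, 10.0, 100_000
dt = T / n
dY = a * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
qv = np.sum(dY**2)  # realized quadratic variation, ~ sigma**2 * T
print("realized [Y,Y]_T =", round(qv, 3), " vs sigma^2 T =", sigma**2 * T)
# Time-averaged form of Eq. (1.64): the pathological ground state is stable
# when (1/2) sigma**2 > a, i.e., when noise overwhelms remedial drift.
print("pathological ground state stable:", 0.5 * qv / T > a)
```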
In one dimension, for sufficiently powerful noise, similar results arise directly via
the stochastic stabilization theorems explored by Mao (2007), Appleby et al. (2008).
Matters are more complicated in two or more dimensions, where the noise structure
can determine more complicated dynamic effects.
Similar themes have been explored using Kuramoto’s (1984) model of frequency
synchronization across a coupled network (e.g., Acebron et al. 2005; Kalloniatis
and Roberts 2017). A central result of Kalloniatis and Roberts is the difference
between random and scale-free networks in their response to Levy noise. Their Fig. 6
is essentially our Fig. 1.12, but rephrased in terms of the Kuramoto order parameter
representing the degree of synchronization across the network as a function of the
strength of the noise, so the figures are mirror images.
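A minimal noisy mean-field Kuramoto sketch along these lines, with all-to-all coupling and assumed parameters, shows the order parameter falling as noise strength grows:

```python
# Mean-field Kuramoto model with additive white noise (cf. Acebron et al.
# 2005): the order parameter r falls as noise grows, mirroring Fig. 1.12.
# All-to-all coupling and all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def order_parameter(noise, N=200, K=2.0, dt=0.01, steps=5000):
    """Return time-averaged r = |mean(exp(i*theta))| after a transient."""
    omega = rng.standard_normal(N)        # natural frequencies
    theta = rng.uniform(0, 2 * np.pi, N)  # initial phases
    rs = []
    for step in range(steps):
        z = np.exp(1j * theta).mean()
        r, psi = np.abs(z), np.angle(z)
        theta += (omega + K * r * np.sin(psi - theta)) * dt \
                 + noise * np.sqrt(dt) * rng.standard_normal(N)
        if step > steps // 2:
            rs.append(r)
    return np.mean(rs)

for noise in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"noise = {noise:3.1f} -> order parameter r ~ {order_parameter(noise):.3f}")
```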
1.13 Failure, Remediation, and Culture
Human minds, small working teams, larger institutions, and the machines—cognitive
and otherwise—that become synergistic with them, are all cultural artifacts. Indeed,
culture, as the evolutionary anthropologist Robert Boyd has commented, ‘is as much
a part of human biology as the enamel on our teeth’.
The failure of critical real-time cognition at tactical, operational, and strategic
scales—and the correction of failure—can and must be reexamined from the perspective of the dynamics imposed by embedding culture. US operational and strategic
misfeasance, malfeasance and nonfeasance in Korea, Vietnam, and Afghanistan have
deep cultural roots, as the military writings of Mao Tse-Tung and other East Asian
practitioners suggest.
Artificial Intelligence systems are, then, also cultural artifacts, and the dynamics
of critical systems under their influence must inevitably reflect something of the
embedding culture, if only through the dynamics of the rapidly-shifting roadway
topologies they must ride on, adapt to, or attempt to control. A simple, if somewhat
static, example can be seen in the differential successes of Japanese and American
automobile manufacturers.
Extension of the preceding theoretical development is surprisingly direct, at least
in a purely formal sense. The devil, as always, will be in the details.
The symmetry-breaking model of Sect. 1.6 can be extended to include the effects
of an embedding cultural environment—via an information source Z —on global
broadcast mechanisms at the different scales and levels of organization that link
across the tactical, operational, and strategic scales of organization. A single dual
information source $X_{G_i}$ then becomes a large-scale joint information source whose individual components are linked by crosstalk, having a joint source uncertainty $H(X_{G_1}^i, X_{G_2}^j, \ldots, X_{G_m}^q)$.

Given the embedding cultural information source $Z$, the splitting criterion between high and low probability dynamic system trajectories is given by network information theory as the complicated sum

$$I(X_{G_1}^i, X_{G_2}^j, \ldots, X_{G_m}^q \mid Z) = H(Z) + \sum_n H(X_{G_n} \mid Z) - H(X_{G_1}^i, X_{G_2}^j, \ldots, X_{G_m}^q \mid Z) \qquad (1.65)$$
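For a discrete toy case, the splitting criterion of Eq. (1.65) can be evaluated directly from a joint distribution; the distribution below is fabricated purely for illustration:

```python
# Toy evaluation of the splitting criterion of Eq. (1.65) for three binary
# sources and a binary 'culture' source Z; the joint distribution is invented.
import numpy as np

rng = np.random.default_rng(3)
p = rng.random((2, 2, 2, 2))  # joint distribution over (X1, X2, X3, Z)
p /= p.sum()

def H(joint):
    """Shannon entropy (bits) of a joint probability array."""
    q = joint[joint > 0]
    return -np.sum(q * np.log2(q))

H_Z = H(p.sum(axis=(0, 1, 2)))
# H(X_i | Z) = H(X_i, Z) - H(Z), marginalizing over the other sources:
H_Xi_given_Z = [H(p.sum(axis=tuple(a for a in (0, 1, 2) if a != i))) - H_Z
                for i in range(3)]
H_all_given_Z = H(p) - H_Z  # H(X1, X2, X3 | Z)
I = H_Z + sum(H_Xi_given_Z) - H_all_given_Z
print("splitting criterion I =", round(I, 4), "bits")
```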
Equations (1.33) and (1.34) are then rewritten in terms of the splitting criterion $I(X_{G_1}^i, X_{G_2}^j, \ldots, X_{G_m}^q \mid Z)$.
We will call the new ‘free energy’ index, now influenced by embedding culture, $\mathcal{F}$.
We have, in essence, extended to complex man/work group/institution/machine
composites a kind of ‘Mathematical Kleinman Theory’, representing in a formal way
something of the observations of Kleinman (1991), Kleinman and Cohen (1997),
and their colleagues who studied the profound differences in the expression and
experience of mental disorders across cultures.
It is possible to reexamine sufficient conditions for the intractable stability of a
pathological ‘ground state’ condensation representing control system collapse via
the Stochastic Stabilization Theorem (Appleby et al. 2008; Mao 2007), but now in a
particular embedding cultural milieu. Recall that, for military systems, that ground
state is usually something like ‘kill them all and let God sort them out’, or other
forms of ‘target discrimination failure’.
We assume a multidimensional vector of parameters associated with that phase, $J$, that measures deviations from the pathological ground state. The free energy measure from the generalization in Eq. (1.34) allows definition of another ‘entropy’ in terms of the Legendre transform

$$\hat{S} \equiv \mathcal{F}(J) - J \cdot \nabla_J \mathcal{F} \qquad (1.66)$$
We write another first-order ‘Onsager’ dynamic equation in the gradients of $\hat{S}$:

$$dJ_t = f(J_t, t)\,dt + \sigma(J_t, t)\,dW_t \qquad (1.67)$$

where $dW_t$ is a multidimensional white noise vector, $\sigma(J_t, t)$ is a multidimensional matrix function, and $f(J_t, t)$ is a first-order ‘diffusion’ term in the gradients of $\hat{S}$ by $J$.
The base equation $dJ/dt = f(J, t)$, after some delay, under normal conditions of recovery from a pathological state, will have a solution $|J(t)| \to \infty$, implying that there must be a transition to a more healthy nonequilibrium steady state. Successful organisms, species, and/or more complex colonial systems all have long-evolved remedial mechanisms akin to the immune system, cancer suppression, wound healing, and suchlike.
However, as with Eq. (1.36), the multidimensional version of the Stochastic Stabilization Theorem (Appleby et al. 2008) ensures that, under very broad conditions, sufficient noise—a big enough ‘symmetric’ form of the multidimensional noise matrix
σ —will drive |J (t)| logarithmically to zero, stabilizing the pathological mode in
spite of internal remedial efforts. Damage accumulation, aging, starvation and so
on come to mind. Institutional and machine system equivalents, particularly under
military stresses, seem obvious.
Conversely, however, Appleby et al. (2008) also show that, for a system of dimension ≥ 2, a noise matrix can always be found that destabilizes an inherently stable
system, i.e., one for which |Jt | → 0, in this context, a persistent pathological condition for the organism or colony. That is, a ‘treatment’ can be found that causes a
transition to a different nonequilibrium steady state. Iatrogenic intervention makes
the individual sicker, proper treatment heals. In many cases, of course, successful
treatment is simply not realistic, and the malfunctioning system must be withdrawn,
deactivated, abandoned, or destroyed.
What should be evident is that culture will become inherently convoluted not only
with patterns of cognitive system failure at different scales and levels of organization, but with successful modalities of treatment. Treatment of cognitive failure—for
individual minds, small groups, institutions, real-time AI critical systems, and so on,
in the sense of Kleinman, will itself always be ‘culture-bound’.
For nonergodic systems addressed in the next chapter, where time averages are
not the same as ensemble averages, the groupoid symmetries become ‘trivial’, associated with the individual high probability paths for which an H -value may be
defined, although it cannot be represented in the form of the usual Shannon ‘entropy’
(Khinchin 1957, p. 72). Then equivalence classes must be defined in terms of other
similarity measures for different developmental pathways. The arguments of this
section regarding pathological modes and their treatment then follow through.
1.14 The Synergism of Phase Transitions in Real-Time Critical Systems
Matters are far more complicated than we have examined so far. That is, while
this work has studied particular mechanisms and their dynamics at various scales
and levels of organization, in real systems the individual ‘factoids’ will influence
each other, consequently acting collectively and emergently, becoming, in the usual
sense, greater than the sum of their parts. This implies the existence of a ‘free energy’
splitting criterion that must be a specific and appropriate generalization of Eq. (1.65).
The argument is, yet again, surprisingly direct.
1. Cultural, cognitive, and communication processes can all be characterized by
information sources subject to phase transitions analogous to those of physical systems, if only by the identification of information as a form of free energy. The
Mathematical Appendix provides several examples of ‘biological’ renormalizations.
2. Behavioral ‘traffic flow’ for real-time critical systems, in a very large sense,
is itself subject to phase transitions, via directed homotopy groupoids, building into
shifting aggregations of these simpler transitive groupoids. That is, the system ‘traffic’
involves one-way paths from an ‘origin’ state to a ‘destination’ state. Equivalence
classes of such paths form the transitive groupoids that combine into the larger
groupoid of interest, subject to ‘symmetry’ making/breaking associated with system
and time-specific extensions. Wallace (2018), in the context of driverless cars on intelligent roads, so-called V2V/V2I systems, puts it thus:
Traffic flow can be rephrased in terms of ‘directed homotopy’ – dihomotopy – groupoids
on an underlying road network, again parameterized by [a particular] ‘temperature’ index
T . Classical homotopy characterizes topological structures in terms of the number of ways
a loop within the object can be continuously reduced to a base point... For a sphere, all
loops can be reduced [to a single point]. For a toroid – a donut shape – there is a hole so
that two classes of loops cannot be reduced to a point. One then composes loops to create
the ‘fundamental group’ of the topological object. The construction is standard. Vehicles
on a road network, however, are generally traveling from some initial point So to a final
destination S1, and directed paths, not loops are the ‘natural’ objects, at least over a short
time period, as in commuting.
Given some ‘hole’ in the road network, there will usually be more than one way to reach S1
from So. An equivalence class of directed paths is defined by paths that can be deformed
into one another without crossing barrier zones [such as holes]... At high values of [some
appropriate index] T , many different sets of paths will be possible allowing unobstructed
travel from one given point to another, defining equivalence classes creating a large groupoid.
As [the critical index] T declines, roadways and junctions become increasingly jammed,
eliminating entire equivalence classes of open pathways, and lowering the groupoid symmetry: phase transitions via classic symmetry breaking on a network. The ‘order parameter’
that disappears at high T is then simply the number of jammed roadways.
These results extend to higher dihomotopy groupoids via introduction of cylindrical paths
rather than one-dimensional lines...
Most fundamentally... the traffic flow groupoid and the groupoid associated with cognition across the V2V/V2I system will inevitably be intimately intertwined, synergistically
compounding symmetry breaking traffic jams as-we-know-them with symmetry breaking
cognitive collapse of the control system automata, creating conditions of monumental chaos.
3. Sufficiently rapid challenge can always ‘get inside the command loop’ of a
real-time critical system in the sense of John Boyd, and/or can trigger network fragmentation by the Zurek mechanism(s) of Sect. 1.10.
These considerations lead to a particular inference:
4. The dynamics of critical real-time systems will almost always involve the synergism of several of the mechanisms studied above, leading to highly counterintuitive,
unexpected, and often deliberately triggered, groupoid symmetry breaking phase
transitions that can, and most certainly will, seriously compromise the health and
welfare of large populations.
The devil, of course, will be in the particular details of each system studied.
1.15 Discussion
The first two models examined the effect of a declining ‘Clausewitz temperature’ T
constructed from indices of the fog-of-war, friction, and related constraints on the
stability of an inherently unstable control system, using adaptations of the Data Rate
Theorem.
A third approximation modeled the dynamics of control breakdown for a simple
class of inherently unstable systems in terms of the dynamics of the Clausewitz
temperature itself and a parallel ‘resolve/capability’ index. The outcome, after some
algebra, also implies the inevitability of highly punctuated collapse under sufficient
stress, and it appears possible, using relatively direct methods, to calculate explicit
system limits that may be empirically tested.
These models were followed by an extension of the DRT to more highly ‘cognitive’ systems via the recognition that cognition represents choice, choice reduces
uncertainty, and the reduction in uncertainty implies the existence of an information
source ‘dual’ to the cognitive process under study. The dynamics of the most complex
and capable cognitive systems—including but not limited to AI—appear governed
by punctuated phase transitions driven by a fall in Clausewitz temperature. This is
closely similar to the spontaneous symmetry breaking of physical systems, but in
terms of the far more involved groupoid symmetries that so distinctly characterize cognition (Wallace 2015, 2017). Examination suggests the pathological ‘ground
state’ condensation can be highly resistant to correction. For military systems this
translates as something like ‘Kill everyone and let God sort them out’.
Another model suggests that such ground state collapse will be characteristic of
any hierarchical cognitive topology and introduces a ‘Boyd temperature’ that can
index catastrophically high demands for material and information resources.
The next model examines the role of rate-of-challenge in determining the fragmentation of networked AI (or other cognitive) systems. High rates lead to small fragments that may be unable to achieve command goals.
We then study an economic-like ratchet mechanism, exploring how decline in T
can increase a ‘noise’ that lowers T further, triggering a race-to-the-bottom system
collapse.
The later models explore command and operational competence, and the influence of both culture and stress on competence. This determines the level of higher cognition, in a large sense, to which critical information diffuses upward from on-the-ground ‘tactical’ experience under real-time constraints. Below a competence limit, complete failure of diffusion of essential information from local systems is likely. Cultural embedding determines the dynamics of both failure and its remediation.
Overall, the emerging AI revolution will be relentlessly confronted by essentially
the same factors that challenged proponents of the ‘revolution in military affairs’
who hoped to eliminate the unpredictability of the battlefield with networks of
sensors. As Neuneck (2008) put it, ‘War is a complex, nonlinear process of violent interactions where technological edge is not a guarantee for success’. Much
the same challenge faces AI entities ceded responsibility for critical real-time systems on complex, dynamic, and inherently unstable ‘roadway’ topologies. Indeed,
Ingber et al. (1991) and Ingber and Sworder (1991) describe a statistical mechanics
approach to combat simulation that uses nonlinear stochastic differential equations
recognizably similar to the models of Sect. 1.5. Ingber and Sworder’s 1991 remarks
on the limitations of algorithmic approaches are prescient:
Military [Command, Control and Communications], while supported by an imposing array
of sensors, computers, displays, communications and weapons is in its essence a human
decision making activity. A trained individual has a unique talent for recognizing changes
in situation, and for allocating resources appropriately in a dynamically varying encounter
which is subject to significant uncertainty... Indeed, the ability of a human to employ powers
of analogical reasoning and problem reconstruction in response to sudden and unexpected
events contrasts sharply with algorithmic approaches to decision making problems... [A]n
algorithmic surrogate could perform certain observation and decision making functions without requiring direct human action. Unfortunately, such autonomous systems have not fulfilled
much of their initial promise... At present only the most modest and precisely focused tasks
are capably dealt with autonomously.
This central problem remains. Although human failures of perception are manifold (e.g., Kahneman 2011), they have been pared down by an evolutionary selection process that has yet to act full-scale on human systems controlled by automata.
More recently Watts (2011), consonant with Neuneck’s view, has commented
on the assertion that near-perfect information availability combined with precision
targeting in combat situations will allow large-scale man/machine ‘cockpit’ systems
to profoundly alter the conduct of war. He states that
These assumptions obviously fly in the face of the view that the fundamental nature of war
is essentially an interactive clash - a Zweikampf or two-sided ‘duel,’ as Carl von Clausewitz characterized it - between independent, hostile, sentient wills dominated by friction,
uncertainty, disorder, and highly nonlinear interactions. Can sensory and network technologies eliminate the frictions, uncertainties, disorder, and nonlinearities of interactive clashes
between opposing polities? As of this writing, the answer appears to be ‘No.’
Again, Watts (2004):
..[F]riction has been a consistent, recurring feature of wars, not only in our own time but also
as far back as the wars of Greek city states and the Persian empire. The further realization that
every actor in war - from polities and nations to individual combatants and military forces are complex adaptive systems only underscores my central argument: friction is unlikely to
be eliminated from future war regardless of technological advances.
Schrage (2003) comments
...[A]n... unvarnished view of how individuals and institutions actually behave in information
rich environments – as opposed to how we might like them to behave – does not assure that
greater quantities of data will lead to better quality results... Capacity is not the same as
capability.
As John Boyd put it (1976)
[Combat] is dialectic in nature generating both disorder and order that emerges as a changing
and expanding universe of mental concepts matched to a changing and expanding universe
of observed reality.
Schrage (2003) again makes Boyd’s case about ‘getting inside’ an opponent’s
decision loop:
...[C]omparative information advantage accrues as the rate of information acquisition and
analysis changes over time. The ability to detect and respond to battlespace changes faster,
better, cheaper and more pervasively than the opposing force inherently places a premium
on better ‘improvisation’ than better planning. Indeed, the critical combat competency for
commanders shifts from rigorous planning – that stochastically evaporates on contact with
the enemy – to improvisational responsiveness...
The rapidly-varying ‘roadway’ topologies confronting the AI revolution present
essentially similar decision loop rate conundrums—the threat of destabilizing Boyd
temperatures—that both challenge ‘planning’/training and may be emergently structured and highly punctuated. Such matters as traffic jams on V2V/V2I systems, power blackouts, $C^3$ network outages, and the like, come to mind. Indeed, all the models
studied pretty much drop out, and this is not happenstance. An anonymous reviewer
of an earlier version of this analysis remarked that
The ‘roadway’ example used at the beginning... is akin to the well-known midcourse guidance
problem of stochastic optimal control that has yet to be solved. In the midcourse guidance
example, many authors have tried to deal with the problem that as the space voyage progresses, information gained about the state vector is at the cost of increased control energy
[e.g., Mortensen’s seminal studies 1966a, b].
For the record, spacecraft trajectories are relatively simple geodesics in a gravitational manifold characterized by Newtonian mechanics. In the context of air traffic control, Hu et al. (2001) show that finding collision-free maneuvers for multiple agents on a two-dimensional Euclidean plane R^2 is the same as finding the shortest geodesic in a particular manifold with nonsmooth boundary. Given n vehicles, the geodesic is calculated for the quotient space R^{2n}/W(r), where W(r) is defined by the requirement that no vehicles are closer together than some critical Euclidean distance r.
For autonomous ground vehicles, R^2 must be replaced by a far more topologically complex 'roadmap space' M^2 subject to traffic jams and similar conditions. Geodesics for n such vehicles are then in a quotient space M^{2n}/W(r) whose dynamics are subject to phase transitions driven by changes in vehicle and/or passenger density that represent cognitive groupoid symmetry breaking (Wallace 2017, Sect. 9.6).
Fifty years after Mortensen’s work, on the eve of the ‘AI revolution’ that will
place inherently unstable critical infrastructure fully under machine control, essential
questions have yet to be solved, or—in large part—even properly stated.
In consequence, and in summary, promises of ‘graceful degradation under stress’
for driverless vehicles on intelligent roads, of ‘precision targeting’ for autonomous
or ‘centaur’ weapons that avoids civilian casualties, of ‘precision medicine’ under
collapsing living and working conditions, of agent-based models that manage financial crises in real time, and so on, at best represent species of wishful thinking across
many different modalities, scales, and levels of organization.
It is difficult (but not impossible with the help of self-deception, groupthink, or
outright prostitution and a good PR firm) to escape the inference that the forthcoming
beta testing of large-scale AI systems on unsuspecting human populations violates
fundamental norms.
The USA’s tactical and operational level ‘Revolution in Military Affairs’ of the
1990’s, the networked information system designed to lift the fog-of-war from armed
conflict, died a hard death in the protracted insurgencies that evolved against it in
Iraq and Afghanistan, abetted by a strategic incompetence that is a major recurring
manifestation of friction (e.g., Bowden 2017). (For a more trenchant analysis, see
Stephenson 2010). Now the AI revolution is about to meet Carl von Clausewitz.
New York City’s early use of algorithms to supervise critical service deployment,
the resulting catastrophic loss of housing and community, and a consequent massive
rise in premature mortality, provides a cogent case history (Wallace 1993; Wallace
and Wallace 1998).
The real world is not a game of Go. In the real world, nothing is actually too big
to fail, and in the real world, the evolution of one’s competitors is an ever-present
selection pressure: Caveat Emptor, Caveat Venditor.
References
Abler, R., J. Adams, and P. Gould. 1971. Spatial organization: The geographer’s view of the world.
New York: Prentice Hall.
Acebron, J., L. Bonilla, C. Perez Vicente, F. Ritort, and R. Spigler. 2005. The Kuramoto model: A
simple paradigm for synchronization phenomena. Reviews of Modern Physics 77: 137–185.
Altmann, J., and F. Sauer. 2017. Autonomous weapon systems and strategic stability. Survival 59:
117–142.
Appleby, J., X. Mao, and A. Rodkina. 2008. Stabilization and destabilization of nonlinear differential
equations by noise. IEEE Transactions on Automatic Control 53: 126–132.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Baumard, P. 2016. Deterrence and escalation in an artificial intelligence dominant paradigm: Determinants and outputs. In MIT international conference on military cyber stability. Boston, MA: MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
Binney, J., N. Dowrick, A. Fisher, and M. Newman. 1986. The theory of critical phenomena. Oxford,
UK: Clarendon Press.
Bookstaber, R. 2017. The end of theory: Financial crises, the failure of economics, and the sweep
of human interactions. Princeton NJ: Princeton University Press.
Bowden, M. 2017. Hue 1968: A turning point of the American war in Vietnam. New York: Atlantic
Monthly Press.
Boyd, J. 1976. Destruction and creation. Available online from various sources.
Conte, R., and M. Paolucci. 2014. On agent-based modeling and computational social science.
Frontiers in Psychology 5: 668.
Cornuejols, G., and R. Tutuncu. 2006. Optimization methods in finance. New York: Cambridge
University Press.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
de Groot, S., and P. Mazur. 1984. Non-equilibrium thermodynamics. New York: Dover.
Feynman, R. 2000. Lectures in computation. Boulder CO: Westview Press.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Golubitsky, M., and I. Stewart. 2006. Nonlinear dynamics and networks: The groupoid formalism.
Bulletin of the American Mathematical Society 43: 305–364.
Gould, P., and R. Wallace. 1994. Spatial structures and scientific paradoxes in the AIDS pandemic.
Geografiska Annaler 76B: 105–116.
Hu, J., M. Prandini, K. Johansson, and S. Sastry. 2001. Hybrid geodesics as optimal solutions to the collision-free motion planning problem. In HSCC 2001, ed. M. Di Benedetto and A. Sangiovanni-Vincentelli, LNCS 2034: 305–318.
Huberman, B., and T. Hogg. 1987. Phase transitions in artificial intelligence systems. Artificial
Intelligence 33: 155–171.
Hwang, C., and A. Masud. 1979. Multiple objective decision making, methods and applications.
New York: Springer.
Ingber, L., and D. Sworder. 1991. Statistical mechanics of combat with human factors. Mathematical
Computational Modeling 15: 99–127.
Ingber, L., H. Fujio, and M. Wehner. 1991. Mathematical comparison of combat computer models
to exercise data. Mathematical Computational Modeling 15: 65–90.
Johnson, N., G. Zhao, E. Hunsader, H. Qi, N. Johnson, J. Meng, et al. 2013. Abrupt rise of new
machine ecology beyond human response time. Scientific Reports 3: 2627.
Kahneman, D. 2011. Thinking fast and slow. New York: Farrar, Straus and Giroux.
Kalloniatis, A., and D. Roberts. 2017. Synchronization of networked Kuramoto oscillators under
stable Levy noise. Physica A 466: 476–491.
Kania, E. 2017. Battlefield singularity: Artificial intelligence, military revolution, and China's future military power. Retrieved from https://www.cnas.org/.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
Kleinman, A. 1991. Rethinking psychiatry: From cultural category to personal experience. New
York: Free Press.
Kleinman, A., and A. Cohen. 1997. Psychiatry’s global challenge. Scientific American 276 (3):
86–89.
Kuramoto, Y. 1984. Chemical oscillations, waves, and turbulence. Berlin: Springer.
Lobel, I., A. Ozdaglar, and D. Feijer. 2011. Distributed multi-agent optimization with statedependent communication. Mathematical Programming B 129: 255–284.
Mao, X. 2007. Stochastic differential equations and applications, 2nd ed. Philadelphia: Woodhead
Publishing.
McQuie, R. 1987. Battle outcomes: Casualty rates as a measure of defeat. Army, November: 30–34.
Mortensen, R. 1966a. A priori open loop optimal control of continuous time stochastic systems.
International Journal of Control 3: 113–127.
Mortensen, R. 1966b. Stochastic optimal control with noisy observations. International Journal of
Control 4: 455–464.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Neuneck, G. 2008. The revolution in military affairs: Its driving forces, elements, and complexity.
Complexity 14: 50–60.
Nisbett, R., and Y. Miyamoto. 2005. The influence of culture: Holistic versus analytic perception.
TRENDS in Cognitive Sciences 10: 467–473.
Ormrod, D., and B. Turnbull. 2017. Attrition rates and maneuver in agent-based simulation models.
Journal of Defense Modelling and Simulation: Applications, Methodology, Technology. https://
doi.org/10.1177/1548512917692693.
Parker, E. 2016a. Flash crashes, information processing limits, and phase transitions. http://ssrn.
com/author=2119861.
Parker, E. 2016b. Flash crashes: The role of information processing based subordination and the
Cauchy distribution in market instability. Journal of Insurance and Financial Management 2:
90–103.
Pettini, M. 2007. Geometry and topology in hamiltonian dynamics. New York: Springer.
Protter, P. 1990. Stochastic integration and differential equations. New York: Springer.
Rose, P. 2001. Two strategic intelligence mistakes in Korea, 1950. https://www.cia.gov/library/
center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/fall_winter_2001/
article06.html
Schrage, M. 2003. Perfect information and perverse incentives: Costs and consequences of transformation and transparency. SPP Working Paper WP 03-1, MIT Center for International Studies.
Schrodinger, E. 1989. Statistical thermodynamics. New York: Dover Publications.
Shannon, C. 1959. Coding theorems for a discrete source with a fidelity criterion. Institute of Radio
Engineers International Convention Record 7: 142–163.
Stephenson, S. 2010. The revolution in military affairs: 12 observations on an out-of-fashion idea.
In Military review, 38–46, May–June.
Tishby, N., F. Pereira, and W. Bialek. 1999. The information bottleneck method. In 37th annual
conference on communication, control and computing, 368–377.
Tse-Tung, Mao. 1963. Selected military writings of Mao Tse-Tung. Peking, PRC: Foreign Languages Press.
Turchin, A., and D. Denkenberger. 2018. Military AI as a convergent goal of self-improving AI. In
AI Safety and Security, ed. R. Yampolskiy. CRC Press.
Wallace, R. 1993. Recurrent collapse of the fire service in New York City: The failure of paramilitary
systems as a phase change. Environment and Planning A 25: 233–244.
Wallace, R. 2005. Consciousness: A mathematical treatment of the global neuronal workspace
model. New York: Springer.
Wallace, R. 2012. Consciousness, crosstalk, and the mereological fallacy: An evolutionary perspective. Physics of Life Reviews 9: 426–453.
Wallace, R. 2015. An ecosystem approach to economic stabilization: Escaping the neoliberal wilderness. London: Routledge.
Wallace, R. 2016. High metabolic demand in neural tissues: Information and control theory perspectives on the synergism between rate and stability. Journal of Theoretical Biology 409: 86–96.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Wallace, R. 2018. Canonical instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Wallace, D., and R. Wallace. 1998. A plague on your houses. New York: Verso.
Wallace, R., and D. Wallace. 2016. Gene expression and its discontents: The social production of
chronic disease, 2nd ed. New York: Springer.
Wallace, R., D. Wallace, and H. Andrews. 1997. AIDS, tuberculosis, violent crime and low birthweight in eight US metropolitan areas: Public policy, stochastic resonance, and the regional
diffusion of inner city markers. Environment and Planning A 29: 525–555.
Watts, B. 2004. Clausewitzian friction and future war, revised edition, McNair Paper 68. Washington, DC: Institute for National Strategic Studies, National Defense University.
Watts, B. 2008. US Combat training, operational art, and strategic competence: Problems and
opportunities. Washington, D.C.: Center for Strategic and Budgetary Assessments.
Watts, B. 2011. The maturing revolution in military affairs. Washington, DC: Center for Strategic and Budgetary Assessments.
Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American
Mathematical Society 43: 744–752.
Wilson, K. 1971. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Physical Review B 4: 3174–3183.
Wolpert, D., and W. MacReady. 1995. No free lunch theorems for search. SFI-TR-02-010, Santa
Fe Institute.
Wolpert, D., and W. MacReady. 1997. No free lunch theorems and optimization. IEEE Transactions
on Evolutionary Computation 1: 67–82.
Zook, M., and M. Grote. 2017. The microgeographies of global finance: High-frequency trading
and the construction of information inequality. Environment and Planning A 49: 121–140.
Zurek, W. 1985. Cosmological experiments in superfluid helium? Nature 317: 505–508.
Zurek, W. 1996. The shards of broken symmetry. Nature 382: 296–298.
Chapter 2
Extending the Model
Abstract It is possible to extend the model to nonergodic cognitive systems, a parallel to the nonparametric extension of more familiar statistical models. Cognition
of any nature involves choice that reduces uncertainty. Reduction of uncertainty
implies the existence of an information source dual to the cognitive process under
study. Information source uncertainty for path-dependent nonergodic systems cannot
be described as a conventional Shannon entropy since time averages are not ensemble averages. The characterization of information as a form of free energy, however, allows
study of nonergodic cognitive systems having complex dynamic topologies whose
algebraic expression is in terms of directed homotopy groupoids rather than groups.
This permits a significant extension of the Data Rate Theorem linking control and
information theories via an analog to the spontaneous symmetry breaking arguments
fundamental to modern physics.
2.1 Introduction
Cognitive systems can, at least in first order, be described in terms of the ‘grammar’
and ‘syntax’ of appropriate information sources. This is because cognition implies
choice, choice reduces uncertainty, and the reduction of uncertainty implies the existence of an information source (Atlan and Cohen 1998; Wallace 2012, 2015a, b,
2016a, b, c, 2017). Conventional ‘parametric’ theory focuses, however, on adiabatically piecewise stationary ergodic (APSE) sources, i.e., those that are parameterized
in time but remain as close as necessary to ergodic and stationary for the theory to
work. ‘Stationary’ implies that probabilities are not time dependent, and ‘ergodic’
roughly means that time averages are well represented as ensemble averages. Transitions between ‘pieces’ can then be described using an adaptation of standard renormalization methods, as described in the Mathematical Appendix.
The Wallace references provide details of the ‘adiabatic’ approximation, much like
the Born-Oppenheimer approach to molecular dynamics where nuclear oscillations
are taken as very slow in comparison with electron dynamics that equilibrate about
the nuclear motions. Here, we extend the theory to nonergodic cognitive systems that,
as in the case of nonparametric statistics, may encompass more real-world examples
than are covered by the ‘parametric’ models.
Something similar has been the focus of attention in economics (e.g., Durlauf
1993). Economic agents are quintessentially cognitive, and the approach can be
applied across many scales and levels of biological and other forms of organization.
In particular it is possible to describe the dynamics of pathology in such systems
using fairly direct methods. For example, Wallace (2015a) applies a ‘locally ergodic’
formalism to economic problems that is similar to the standard ergodic decomposition
methods (Von Neumann 1932; Gray and Davisson 1974; Gray and Saadat 1984;
Schonhuth 2008; Gray 2011, Lemma 1.5; Coudene 2016) and produces multiple
nonequilibrium steady states (nss). These are characterized by assignment of an
APSE source to equivalence classes of developmental paths that are represented by
groupoid symmetries, leading to groupoid symmetry breaking via an analog of group
symmetry breaking in physical systems.
Standard extensions of classic information theory theorems to nonergodic stationary processes, and to asymptotically mean stationary processes, have been in
terms of the decomposition of sources into their ergodic components, with averaging
across them, a development with a long tradition. Coudene (2016, Sect. 14.1) puts it
When a system is not ergodic, it is possible to decompose the underlying space into several
pieces, so that the transformation is ergodic on each of these pieces. We call this a partition
into ergodic components. The number of components may be uncountable, but the resulting
partition still satisfies a certain regularity property: it is possible to approximate it with
partitions having finitely many pieces.
As Hoyrup (2013) notes, however, while every non-ergodic measure has a unique
decomposition into ergodic ones, this decomposition is not always computable. Such
expansions—in terms of the usual ergodic decomposition or the groupoid/directed
homotopy equivalents—both explain everything and explain nothing, in the same
sense that, over some limited domain, almost any real function can be written as
a Fourier series or integral that retains the essential character of the function itself.
Sometimes this helps if there are basic underlying periodicities leading to a meaningful spectrum, otherwise not. A good analogy is the contrast between the Ptolemaic
expansion of planetary orbits in circular components around a fixed Earth versus
the Newtonian/Keplerian gravitational model in terms of ellipses with the Sun at one
focus. While the Ptolemaic expansion converges to any required accuracy, it conceals
the essential dynamics.
Here, we show that the very general approach adapted from nonequilibrium thermodynamics and used above can apply to both nonergodic systems and their ergodic
components, if such exist. Again, this is in terms of inherent groupoid symmetries
associated with equivalence classes of directed homotopy developmental pathways.
To reiterate, the attack is based on the counterintuitive recognition of information
as a form of free energy (Feynman 2000), rather than an ‘entropy’ in the physical
sense. A central constraint is that, in the extreme case which will be the starting point,
only individual developmental paths can be associated with an information-theoretic
source function that cannot be represented in terms of a Shannon entropy-like uncertainty value across a probability distribution.
Equivalence classes then must arise via a metric distance measure for which the
developmental trajectories of one kind of ‘game’ are closer together than for a significantly different ‘game’. Averaging occurs according to such equivalence classes,
and is marked by groupoid symmetries, and by characteristic dynamics of symmetry
breaking according to appropriate ‘temperature’ changes indexing the influence of
embedding regulatory mechanisms. We will, however, recover the standard decomposition by noting that larger equivalence classes across which uncertainty measures
are constant can be collapsed to single paths on an appropriate quotient manifold.
Recall that, for a stationary, ergodic information source X, as Khinchin (1957)
indicates, it is possible to divide statements of length n—written as x^n = {X(0) = x_0, X(1) = x_1, ..., X(n) = x_n}—into two sets. The first, and largest, is not consonant
with the ‘grammar’ and ‘syntax’ of the information source, and consequently has
vanishingly small probability in the limit of large n. The second, much smaller
set that is consonant and characterized as ‘meaningful’, has the following essential
properties.
If N(n) is the number of meaningful statements of length n, then limits exist satisfying the conditions

$$H[\mathbf{X}] = \lim_{n \to \infty} \frac{\log[N(n)]}{n} = \lim_{n \to \infty} H(X_n \mid X_0, \ldots, X_{n-1}) = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n} \qquad (2.1)$$
H(X_n|X_0, ..., X_{n−1}) and H(X_0, ..., X_n) are conditional and joint Shannon uncertainties having the familiar pseudo-entropy form

$$H = -\sum_i P_i \log[P_i], \qquad 0 \le P_i \le 1, \qquad \sum_i P_i = 1 \qquad (2.2)$$
in the appropriate joint and conditional probabilities (Cover and Thomas 2006). This
limit is called the source uncertainty.
Nonergodic information sources cannot be directly represented in terms of Shannon uncertainties resembling entropies. For such sources, however, a function H(x^n) of each path x^n → x may still be defined, such that lim_{n→∞} H(x^n) = H(x) holds (Khinchin 1957, p. 72). However, H will not, in general, be given by the simple cross-sectional laws-of-large-numbers analog having the (deceptive) entropy-like form of Eq. (2.2).
2.2 Generalizing the Data Rate Theorem
Cognitive information sources are characterized by equivalence classes of states and
developmental paths in the topological spaces defined by those states (Wallace 2012,
2015a, b, 2017; Wallace and Fullilove 2008). Under ‘ergodic’ conditions, for each
of these classes a ‘dual’ APSE information source can be assigned. Perhaps the
simplest example of such an equivalence class would be the set of high probability
‘developmental’ trajectories from an initial phenotype a0 to some final phenotype a∞ .
Variation in a0 and a∞ then produces the set of classes, defining a groupoid (Weinstein
1996), as opposed to the group symmetries more familiar from standard algebraic
topology (e.g., Lee 2000). Consequently, products may not necessarily be defined
between groupoid members (Weinstein 1996). As discussed elsewhere (e.g., Wallace
2015a, b, 2017), phase transitions for ergodic cognitive systems are associated with
necessary (but not sufficient) changes in underlying groupoid symmetries that are
analogous to the spontaneous symmetry breaking of simpler physical systems (e.g.,
Pettini 2007).
Consideration of these matters for fully path-dependent nonergodic information
sources leads quickly to an analog of the Data Rate Theorem (DRT) that mandates
a minimum rate of control information for an inherently unstable system (Nair et al.
2007). A principal tool is directed homotopy, or dihomotopy—the study of topological structure using nonreversible paths rather than complete loops (Fajstrup et al.
2016; Grandis 2009).
Cognitive systems are embodied: there is no cognition without sensory input,
following the basic model of Atlan and Cohen (1998). Sensory information is the
tool by which choice-of-action is made, and such choice is the defining characteristic
of cognition, reducing uncertainty and implying the existence of a dual information
source. For a relatively simple but inherently unstable linear ‘plant’, clever application of the classic Bode integral theorem implies that the rate of control information
must exceed the rate at which that system generates ‘topological information’, in
a particular sense (Nair et al. 2007). Ergodic cognitive processes may be expected
to show more complex patterns of behavior, and we will extend the argument to
nonergodic cognition.
Again, the central focus is on paths x^n → x that are consonant with the 'grammar' and 'syntax' of the information source dual to the cognitive process. For these, a fully path-dependent information source function H(x) can be defined, i.e., its value, in general, changes from path to path. For an ergodic source, there is only one value
possible across an equivalence class of developmental pathways, and it is given by
the usual Shannon uncertainty across a probability distribution.
Suppose the nonergodic cognitive system is placed in some initial reference state
x0 , and is then confronted with different sets of environmental challenges. Each
challenge can be addressed by relatively similar subsequent sets of developmental
pathways. Two of these, say indexed by i and j and both originating at x0 , since they
address the same challenge, will be closer together according to any reasonable metric
M(x_i, x_j) than will be paths addressing fundamentally different cognitive tasks: two
baseball games will usually be played in recognizably similar ways, but a baseball and
a football game are played quite differently. This permits identification of directed
homotopy equivalence classes of paths associated with different ‘fundamental tasks’
carried out by the cognitive system under study. Again, equivalence classes of paths
define groupoids, and groupoids represent an extension of the idea of a symmetry
group (Weinstein 1996). For example, the simplest groupoid might be seen as a
disjoint union of groups, for which there is no single universal product.
See the Mathematical Appendix for formal characterization of the metric M , a
somewhat nontrivial matter that conceals much of the underlying machinery.
Suppose the data rate of the incoming control information—again, this is via another information source—is a real number U. H(x) is the path-dependent information source uncertainty associated with the consonant cognitive path x, and we can construct a Morse Function (Pettini 2007) using a pseudoprobability

$$P(x) \equiv \frac{\exp[-H(x)/\kappa U]}{\sum_{\hat{x}} \exp[-H(\hat{x})/\kappa U]} \qquad (2.3)$$
where the sum is over all possible consonant paths x̂ originating from some base
point. κ is a measure of the effectiveness of the control signal and might parameterize
processes of aging or environmental insult.
A Morse Function F, analogous to free energy in a physical system, is then defined as

$$\exp[-F/\kappa U] \equiv \sum_{\hat{x}} \exp[-H(\hat{x})/\kappa U] \qquad (2.4)$$
where, again, the sum is over all possible consonant paths originating from some fixed initial system state.
The extension of the Data Rate Theorem emerges via a spontaneous symmetry
breaking driven by changes in κU . These changes affect the groupoid structure
underlying the ‘free energy’ Morse Function F associated with different dihomotopy
classes defined in terms of the metric M. Generally, higher values of κU will permit richer cognitive behaviors—higher values of H(x). The analogy is with
spontaneous group symmetry breaking in physical systems, first characterized by
Landau, that has since become a foundation of much of modern physics (Pettini
2007). We argue that extension of the perspective to cognition is via dihomotopy
groupoid rather than group symmetries. Previous work in this direction was restricted
to ergodic sources and their spectral constructs and averages. Here, we have attempted
to lift that restriction without invoking an ergodic decomposition that may not actually
be computable (Hoyrup 2013) and in a manner that permits a variant of the symmetrybreaking arguments now central to modern physical theory.
It seems clear that the extended DRT for cognitive systems is not confined to
dichotomy between stable and unstable operation, but can encompass a broad range
of qualitative behavioral dynamics, some of which may be adaptive to selection
pressures, but many of which will not, and might be characterized as pathological, particularly as the embedding control information U, or its effectiveness as parameterized by κ, declines.
2.3 The Transitive Cognitive Decomposition
We have explored equivalence classes of dihomotopy developmental paths associated with a highly nonergodic cognitive system defined in terms of only single-path
source uncertainties, requiring imposition of structure via the metric M , leading to
groupoid symmetry-breaking transitions driven by changes in the temperature analog κU . There is an intermediate case under circumstances in which the standard
ergodic decomposition of a stationary process is both reasonable and computable.
Then there is an obvious natural directed homotopy partition in terms of the transitive
components of the path-equivalence class groupoid (Weinstein 1996). It seems reasonable to assume that this decomposition is equivalent to, and maps on, the ergodic
decomposition of the overall stationary cognitive process. Then it becomes possible to define a constant source uncertainty on each transitive subcomponent, fully
indexed by the embedding groupoid.
That is, on each ergodic/transitive groupoid component of the ergodic decomposition, one recovers a constant value of the source uncertainty dual to the cognitive
process, presumably given by the standard Shannon ‘entropy’ expression. Since one
can envision the components themselves as constituting single paths in an appropriate quotient space, this leads to the development of the previous section.
These arguments seem much in the direction of George W. Mackey’s theory of
‘virtual groups’, otherwise known as ‘ergodic groupoids’ (e.g., Mackey 1963; Series
1977; Hahn 1978).
A complication, however, arises via the imposition of a double symmetry involving M -defined equivalence classes of this quotient space: there are different possible
strategies for any two teams playing a particular baseball game.
In any event, groupoid symmetry-breaking in the free energy construct of Eq. (2.4)
will still be driven by changes in κU .
2.4 Environmental Insult and Developmental Dysfunction
The formalism allows restatement of a result from Chap. 1, but in more general terms.
The regulation and control of a developmental trajectory is almost certainly a high
dimensional process, involving a number of interacting signals at different critical
branch points. We can model the dynamics of this, in first order, via an analog to
Onsager’s approach to nonequilibrium thermodynamics. The general approach is
well-studied (e.g., Groot and Mazur 1984). The first step is to use the free energy
Morse Function F of Eq. (2.4) to construct an entropy scalar via the Legendre transform in the vector of essential driving parameters K as

$$S \equiv F(K) - K \cdot \nabla_K F \qquad (2.5)$$
The Onsager approximation then makes a linear expansion for the rate of change
of the vector K in the gradients of S by the components of K , which we write in the
more general, and not necessarily linear multidimensional form,
$$dK_t = f(K_t)\,dt + g(K_t)\,dW_t \qquad (2.6)$$
where dW_t is multidimensional white noise and f is taken as locally Lipschitz, in the sense that, for f: Q ⊂ R^n → R^m, there is a constant C such that ||f(y) − f(x)|| ≤ C||y − x|| for all y ∈ Q sufficiently near to x.
Then Appleby et al. (2008) show that, for any such f, a function g can always be found such that the noise term stabilizes an inherently unstable system—one whose solutions otherwise diverge—or else, in two or more dimensions, destabilizes an inherently stable equilibrium of f.
This result, which carries through to nonergodic systems, has deep implications for developmental processes across a variety of modalities.
Successful development involves repeatedly shifting—destabilizing—a sequence
of quasi-stable states, each at the right time in the right manner, according to a highly
regulated template that must respond to a variety of internal and external signals.
Environmental ‘noise’, characterized by the function g in Eq. (2.6), depending on its
form, can interfere with development by triggering an unstable transition to a pathological state—destabilizing a quasi-equilibrium. A different character of noise, or the
same noise at a different developmental stage involving different regulatory machinery, can then freeze the pathological state—stabilize what might be an unstable mode
in the face of corrective regulatory actions by the embedding control system—in a
kind of one-two punch initiating an irreversible pathological developmental pathway.
2.5 Other Complexity Measures
Much of the basic argument can be redone using the Kolmogorov algorithmic complexity K (X ) of a stochastic process X , since the expectation of K converges to
the Shannon uncertainty, i.e.,
1
E[ K (X n |n)] → H (X )
n
(2.7)
Cover and Thomas (2006) provide details.
However, Zvonkin and Levin (1970) argue that, if the ensemble is stationary but not
ergodic, the limit varies over the ensemble, as is the case with Shannon uncertainty.
This permits a redefinition of the entropy measure of Eq. (2.5) in terms of K and
may provide a different perspective on system dynamics. Indeed, there may well
be a considerable set of such complexity measures that converge in expectation to
Shannon uncertainty in a similar manner. These could perhaps be crafted to particular circumstances for cleaving specific Gordian knots, much as does reexpressing
electrodynamics according to the underlying symmetries of the system under study:
Maxwell’s equations for spherical systems are more easily solved in spherical coordinates, and so on. However, this is not at all straightforward. For example, Teixeira
et al. (2011) demonstrate that analogs to the above expression apply exactly to Renyi
and Tsallis entropies of order α only in the limit α → 1, for which they are not
defined. However, for the Tsallis entropy they do show that, for every ε > 0 and
0 < ε̂ < 1, given a probability distribution P, T_{1+ε}(P) ≤ H(P) ≤ T_{1−ε̂}(P), where T represents the Tsallis and H the Shannon measure.
2.6 Discussion
Critical system automatons operating on complex, rapidly-shifting ‘roadway’ topologies are inherently unstable as a consequence of those topologies. Driverless cars on
intelligent roads, so-called V2V/V2I systems, come to mind, as do financial, communications, and power networks of any size, and, of course, weapon systems of
varying levels of complexity and human control.
A symmetry-driven extension of the Data Rate Theorem for nonergodic cognitive
systems via directed homotopy identifies possibly large sets of complex adaptive
versus pathological behaviors associated with phase transitions between them as
measures of control information, or its effectiveness, change. These symmetry-driven
phase transitions are analogous to the effects of temperature variation in a physical
system, but are associated with groupoid rather than group algebras.
The essential nature of information as a kind of free energy allows construction
of empirical Onsager dynamic models in the gradient of an entropy built from the
Legendre transform of path-dependent information source uncertainty. From this
‘dynamic model’ there emerges a regulated sequence of quasi-stable nonequilibrium
steady states similar to the DRT phase transition analysis. Indeed, Van den Broeck
et al. (1994) describe how, using a similar stochastic differential equation approach,
...[A] simple model of a... [two dimensional] system... subject to multiplicative [white]
noise... can undergo a nonequilibrium phase transition to a symmetry-breaking state, while
no such transition exists in the absence of the noise term. The transition possesses features
similar to those observed at second order phase transition...
The existence of a path-dependent source uncertainty H(x^n) → H(x) as x^n → x
permits extension of much found in the ergodic version of the theory, at the cost of
losing identification of source uncertainty with Shannon ‘entropy’. This is not a
catastrophic loss, since the essential characteristic of information lies in Feynman’s
and Bennett’s identification of it as a kind of free energy, permitting imposition of
regularities from Onsager theory without the ‘reciprocity relations’ associated with
microreversibility, and a spontaneous symmetry breaking using groupoid rather than
group symmetries that extends the Data Rate Theorem. The underlying one-way
topological perspective of directed homotopy for cognitive/information processes
holds through the loss of the ergodic property and the consequent disappearance of
any simple expression for information source uncertainty.
These results provide a different perspective on the mechanisms of punctuated
failure across a broad spectrum of cognitive phenomena, ranging from cellular, neurological, and other physiological and psychosocial processes, to critical systems
automata, institutional economics, and sociocultural dynamics.
References
Appleby, J., X. Mao, and A. Rodkina. 2008. Stabilization and destabilization of nonlinear differential
equations by noise. IEEE Transactions on Automatic Control 53: 126–132.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Coudene, Y. 2016. Ergodic theory and dynamical systems. New York: Springer Universitext.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
de Groot, S., and P. Mazur. 1984. Non-equilibrium thermodynamics. New York: Dover.
Durlauf, S. 1993. Nonergodic economic growth. Reviews of Economic Studies 60: 349–366.
Fajstrup, L., E. Goubault, A. Mourgues, S. Mimram, and M. Raussen. 2016. Directed algebraic
topology and concurrency. New York: Springer.
Feynman, R. 2000. Lectures in computation. Boulder CO: Westview Press.
Grandis, M. 2009. Directed algebraic topology: Models of non-reversible worlds. New York: Cambridge University Press.
Gray, R., and L. Davisson. 1974. The ergodic decomposition of stationary discrete random processes. IEEE Transactions on Information Theory 20: 625–636.
Gray, R. 2011. Entropy and information theory, 2nd ed. New York: Springer.
Gray, R., and F. Saadat. 1984. Block source coding theory for asymptotically mean stationary
measures. IEEE Transactions on Information Theory 30: 54–68.
Hahn, P. 1978. The regular representations of measure groupoids. Transactions of the American
Mathematical Society 242: 35–53.
Hoyrup, M. 2013. Computability of the ergodic decomposition. Annals of Pure and Applied Logic
164: 542–549.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
Lee, J. 2000. Introduction to topological manifolds. New York: Springer.
Mackey, G.W. 1963. Ergodic theory, group theory, and differential geometry. Proceedings of the
National Academy of Sciences USA 50: 1184–1191.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Schonhuth, A. 2008. The ergodic decomposition of asymptotically mean stationary random sources.
arXiv: 0804.2487v1 [cs.IT].
Series, C. 1977. Ergodic actions of product groups. Pacific Journal of Mathematics 70: 519–534.
Teixeira, A., A. Matos, A. Souto, and L. Antunes. 2011. Entropy measures vs. Kolmogorov complexity. Entropy 13: 595–611.
Van den Broeck, C., J. Parrondo, and R. Toral. 1994. Noise-induced nonequilibrium phase transition.
Physical Review Letters 73: 3395–3398.
Von Neumann, J. 1932. Zur Operatorenmethode in der klassischen Mechanik. Annals of Mathematics 33: 587–642.
Wallace, R. 2012. Consciousness, crosstalk, and the mereological fallacy: An evolutionary perspective. Physics of Life Reviews 9: 426–453.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2015b. An information approach to mitochondrial dysfunction: Extending Swerdlow's
hypothesis. Singapore: World Scientific.
Wallace, R. 2016a. High metabolic demand in neural tissues: Information and control theory perspectives on the synergism between rate and stability. Journal of Theoretical Biology 409: 86–96.
Wallace, R. 2016b. Subtle noise structures as control signals in high-order biocognition. Physics Letters A 380: 726–729.
Wallace, R. 2016c. Environmental induction of neurodevelopmental disorders. Bulletin of Mathematical Biology 78: 2408–2426.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Wallace, R., and M. Fullilove. 2008. Collective consciousness and its discontents. New York:
Springer.
Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American
Mathematical Society 43: 744–752.
Zvonkin, A., and L. Levin. 1970. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys 25: 83–124.
Chapter 3
An Example: Passenger Crowding
Instabilities of V2I Public Transit Systems
Abstract We apply the theory to passenger crowding on vehicle-to-infrastructure
(V2I) public transit systems in which buses or subways become so crowded that they
are ordered by a central control to begin a degraded ‘skip-stop’ service. Application of
the Data Rate Theorem shows there is no coding or other strategy that can compensate
for inadequate service levels that produce passenger crowding of either stops or
vehicles.
3.1 Introduction
Urban public transportation, particularly bus service, necessarily interacts with a
more general traffic flow on congested roadways that may include trucks, passenger cars, taxis, and emergency vehicles, forming a complex transit ecosystem that
is inherently difficult to manage (Chiabaut 2015; Geroliminis et al. 2014). Recent
advances make it possible for an embedding regulatory infrastructure—the ‘I’ of
the chapter title—to communicate with bus and light rail vehicles in real time—the
‘V’ of the chapter title. Such V2I systems fall under the necessary constraints of the
asymptotic limit theorems of control and information theories. Indeed, public transit
has been the subject of much mathematical modeling, and the effects of passenger
crowding on service continue to receive central focus (Tirachini et al. 2013, 2014;
Ivanchev et al. 2014, and references therein).
Tirachini et al. (2013) describe the classic passenger crowding conundrum as
follows:
When buses and trains circulate with a low number of passengers, everyone is able to find
a seat, transfer of passengers at stations is smooth, and passenger-related disruptions that
impose unexpected delays are rare. As the number of passengers increases, a threshold
is reached at which not everyone is able to find a seat and some users need to stand inside
vehicles. In turn, this may make more difficult the movement of other passengers that need to
board to or alight from a vehicle: therefore, riding time increases due to friction or crowding
effects among passengers... [Study] finds that dwell time increases with the square of the
number of standees inside a bus, multiplied by the total number of passengers boarding and
alighting at a bus stop... A formal treatment [shows]... that average waiting time is related not
only to the headway (the inverse of bus frequency) but also to the occupancy rate or crowding
level in an additive or multiplicative way... A second effect of high occupancy levels on
waiting times is the possibility of triggering bus bunching [by a number of mechanisms] ...
[T]he negative impacts of crowding on the reliability of public transport services should be
carefully analysed...
The seduction of real-time V2I systems using GPS positioning of individual transit
vehicles is the assumption that sufficient control of vehicle headway will smooth out
passenger and vehicle congestion, avoiding bunching, mitigating overcrowding, and
so on. Here, via the Data Rate Theorem that links control and information theories,
we show that assumption to be an illusion, and that there will always be a critical
value of passenger density at which a public transit system suffers the functional
equivalent of a massive traffic jam.
The phenomenological model we develop will, in fact, link larger-scale vehicles/mile traffic density with passengers/bus density and roadway quality.
The underlying conceit of V2I systems is that the infrastructure can control individual vehicles to regulate traffic flow. An essential constraint on such systems,
however, is that they are inherently unstable, and require a constant flow of control
information to stay on the road or, if on a track, to avoid collisions. As discussed
above, aircraft can be designed to be inherently stable, in the sense that, for a short
time at least, they can be allowed to proceed ‘hands off’, as long as the center of
pressure of the vehicle is behind the center of gravity. Then small perturbations from
steady state rapidly die out. Ground vehicles in heavy traffic on twisting roads must,
by contrast, always be under real-time direction by a cognitive entity—driver or AI
automaton.
The first stage of modeling the V2I public transit system is the usual linear expansion around a nonequilibrium steady state in which control information is sufficient
to keep the system ‘on track’.
3.2 The Data Rate Theorem for Traffic Flow
Recall that the Data Rate Theorem (Nair et al. 2007) establishes the minimum rate
at which externally-supplied control information must be provided for an inherently
unstable system to maintain stability. Given the linear expansion near a nonequilibrium steady state, an n-dimensional vector of system parameters at time t, x_t, determines the state at time t + 1 according to the model of Fig. 1.1, so that

$$x_{t+1} = A x_t + B u_t + W_t \qquad (3.1)$$
where A, B are fixed n × n matrices, u_t is the vector of control information, and W_t is an n-dimensional vector of white noise. Again, the Data Rate Theorem (DRT) under
such conditions states that the minimum control information rate H is determined by the relation

$$H > \log[|\det(A_m)|] \equiv a_0 \qquad (3.2)$$

where, for m ≤ n, A_m is the subcomponent of A having eigenvalues ≥ 1. Again, the right-hand side of Eq. (3.2) is to be interpreted as the rate at which the system generates 'topological information'.
For a simple traffic flow system on a fixed road segment, the only source of 'topological information' is the average linear vehicle density ρ, leading to a characteristic derivation of a 'Clausewitz temperature', as follows.
The ‘fundamental diagram’ of traffic flow studies relates the total vehicle flow
to the linear vehicle density, shown in Fig. 3.1 for a Rome street (Blandin et al. 2011),
one month on a Japanese freeway (Sugiyama et al. 2008), and 49 Mondays on a
Flanders freeway (Maerivoet and De Moor 2006). Behavior shifts from smooth flow
to traffic jams at about 40 vehicles/mile, at which value the system ‘crystallizes’
out into discrete ‘chunks’ that interfere with each other, and similarly at about 10%
occupancy (Kerner and Klenov 2009). Analogous dynamics can be expected from
‘macroscopic passenger fundamental diagrams’ that examine multimodal travel networks, but focusing on passenger rather than vehicle flows and densities (Chiabaut
2015; Geroliminis et al. 2014).
Given ρ as the fundamental traffic density parameter, we can extend Eq. (3.2) as was done for Eq. (1.4):

$$H(\rho) > f(\rho) a_0 \qquad (3.3)$$
Here, however, a_0 is a road network constant and f(ρ) is a positive, monotonically increasing function. Again, the Mathematical Appendix uses a Black-Scholes model to approximate the 'cost' of H as a function of the 'investment' ρ. Recall that the first approximation is linear, i.e., H ≈ κ_1 ρ + κ_2. Taking f(ρ) to similar order, as in the case of Eq. (1.5),

$$f(\rho) \approx \kappa_3 \rho + \kappa_4 \qquad (3.4)$$
Again, a Clausewitz temperature can be defined, and, as before, the limit condition for stability becomes

$$T \equiv \frac{\kappa_1 \rho + \kappa_2}{\kappa_3 \rho + \kappa_4} > a_0 \qquad (3.5)$$
And as before, for small ρ, the stability condition is κ_2/κ_4 > a_0. At large ρ this again becomes κ_1/κ_3 > a_0. If κ_2/κ_4 ≫ κ_1/κ_3, the stability condition may be violated at high traffic densities, and instability becomes manifest, as at the higher ranges of Fig. 3.1. See Fig. 1.2 for the canonical form.
Fig. 3.1 (a) Vehicles per hour as a function of vehicle density per mile for a street in Rome. Both streamline geodesic flow and the phase transition to 'crystallized' turbulent flow at critical traffic density are evident at about 40 v/mi. Some of the states may be 'supercooled', i.e., showing delayed 'crystallization' in spite of high traffic density. 'Fine structure' can be expected within both geodesic and turbulent modes. (b) One month of data at a single point on a Japanese freeway, flow per five minutes versus vehicles per kilometer. The critical value is about 25 v/km = 39.1 v/mi. (c) 49 Mondays on a Flanders freeway. The ellipses contain 97.5% of data points for the free-flow and congested regimes. Breakdown begins just shy of 10% occupancy. Public transit systems should show recognizably similar relations for individual routes at fixed times, plotting passengers/hour versus passengers/vehicle
3.3 Multimodal Transport Systems
For buses embedded in a larger traffic stream we recapitulate something of Sect. 1.3,
as there are at least three critical densities that must interact: vehicles per linear mile,
passengers per bus, and an inverse index of roadway quality that might be called
‘potholes per mile’. There is, then, a characteristic density matrix for the system,
which we write as ρ̂:
$$\hat{\rho} = \begin{pmatrix} \rho_{11} & \rho_{12} & \rho_{13} \\ \rho_{21} & \rho_{22} & \rho_{23} \\ \rho_{31} & \rho_{32} & \rho_{33} \end{pmatrix}$$
ρ_{11} is the number of passengers per bus, ρ_{22} vehicles per mile, ρ_{33} 'potholes per
mile’, and the off-diagonal terms are measures of interaction between them since,
at the least, buses are part of the traffic stream, roadway quality affects vehicles per
mile, and so on.
One might extend the model to even higher dimensions by including, for example,
passenger densities of a subway or light rail system feeding into a transit ‘hot spot’.
Again, we apply the arguments of Sect. 1.3. An n × n matrix ρ̂ has n invariants r_i, i = 1, ..., n, that remain fixed when 'principal component analysis' transformations are applied to data, and these can be used to construct an invariant scalar measure, using the polynomial relation

$$p(\lambda) = \det(\hat{\rho} - \lambda I) = \lambda^n + r_1 \lambda^{n-1} + \cdots + r_{n-1} \lambda + r_n \qquad (3.6)$$
det is the determinant, λ is a parameter, and I is the n × n identity matrix. The invariants are the coefficients of λ in p(λ), normalized so that the coefficient of λ^n is 1. As
described in Sect. 1.3, typically, the first invariant will be the matrix trace and the
last ± the matrix determinant.
For an n × n ρ-matrix it again becomes possible to define a composite scalar index Γ as a monotonic increasing function of the matrix invariants

$$\Gamma = f(r_1, \ldots, r_n) \qquad (3.7)$$
Again, Γ replaces ρ in Eq. (3.5).
The simplest example, for a 2 × 2 matrix, would be

$$\Gamma = \alpha_1 \mathrm{Tr}[\hat{\rho}] + \alpha_2 |\det[\hat{\rho}]| \qquad (3.8)$$

for positive α_i. Recall that, for n = 2, Tr[ρ̂] = ρ_{11} + ρ_{22} and det[ρ̂] = ρ_{11}ρ_{22} − ρ_{12}ρ_{21}.
Again, an n × n matrix will have n such invariants from which the scalar index
Γ can be constructed.
This method can be seen as a variant of the ‘Rate Distortion Manifold’ of Glazebrook and Wallace (2009) or the ‘Generalized Retina’ of Wallace and Wallace
(2013, Sect. 10.1) in which high dimensional data flows can be projected down onto
lower dimensional, shifting, tunable ‘tangent spaces’ with minimal loss of essential
information.
3.4 Simplified Dynamics of System Failure
The DRT argument implies a raised probability of a transition between stable and
unstable behavior if the Clausewitz temperature analog
$$T \equiv \frac{\kappa_1 \Gamma + \kappa_2}{\kappa_3 \Gamma + \kappa_4}$$
falls below a critical value, as in Fig. 1.2. Kerner and Klenov (2009), however, argue
that traffic flow can be subject to more than two phases. We can recover something
similar for V2I public transit systems driven by passenger density etc. via a ‘cognitive
paradigm’ similar to that of Sect. 1.6. Recall that Atlan and Cohen (1998) view a
system as cognitive if it must compare incoming signals with a learned or inherited
picture of the world, then actively chooses a response from a larger set of those
possible to it. V2I systems are clearly cognitive in that sense. Such choice, however,
implies the existence of an information source, since it reduces uncertainty in a formal
way.
Given the ‘dual’ information source associated with the inherently unstable cognitive V2I public transit system, an equivalence class algebra can again be constructed
by selecting different system origin states and defining the equivalence of subsequent
states at a later time by the existence of a high probability path connecting them to the
same origin state. Disjoint partition by equivalence class, analogous to orbit equivalence classes in dynamical systems, defines a symmetry groupoid associated with
the cognitive process. Groupoids are ‘weak’ generalizations of group symmetries in
which there is not necessarily a product defined for each possible element pair, for
example in the disjoint union of different groups.
The equivalence classes across possible origin states define a set of information
sources dual to different cognitive states available to the inherently unstable V2I
public transit system. These create a large groupoid, with each orbit corresponding
to a transitive groupoid whose disjoint union is the full groupoid. Each subgroupoid
is associated with its own dual information source, and larger groupoids must have
richer dual information sources than smaller.
Let X_{G_i} be the V2I system's dual information source associated with groupoid element G_i. Given the argument leading to Eqs. (3.5)–(3.7), it is again possible to construct a Morse Function in the manner of Sect. 1.6.
Let H(X_{G_i}) ≡ H_{G_i} be the Shannon uncertainty of the information source associated with the groupoid element G_i. We can define another pseudoprobability as
$$P[H_{G_i}] \equiv \frac{\exp[-H_{G_i}/\omega T]}{\sum_j \exp[-H_{G_j}/\omega T]} \qquad (3.9)$$
where T has again been constructed using a composite index, Γ , and the sum is over
the different possible cognitive modes of the full system. ω is a scaling parameter
representing the rate at which changes in T affect system cognition.
A ‘free energy’ Morse Function F can again be defined as
exp[−F/ωT ] ≡
exp[−HG j /ωT ]
(3.10)
exp[−HG j /ωT ]]
(3.11)
j
or, more explicitly,
F = −ωT log[
j
As a consequence of the inherent groupoid structure associated with complicated
cognition, as opposed to a ‘simple’ stable-unstable control system, we can again
apply an extension of Landau’s version of phase transition. Recall that Landau saw
spontaneous symmetry breaking as representing phase change in physical systems,
with the higher energies available at higher temperatures being more symmetric.
The shift between symmetries is highly punctuated in the temperature index, here
the ‘temperature’ analog of Eq. (3.5), in terms of the scalar construct Γ , but in the
context of groupoid rather than group symmetries. Usually, for physical systems,
there are only a few phases possible. Kerner and Klenov (2009) recognize three
phases in ordinary traffic flow, but V2I transit systems embedded in a larger traffic
network may have relatively complex stages of dysfunction, with highly punctuated
transitions between them as passenger density increases and/or interacts with traffic
density and roadway quality.
The arguments leading to Eqs. (1.36) and (1.67) examine the stability of pathological ‘ground states’ in such a system, suggesting onset of hysteresis effects. That
is, under many circumstances, ground state jam or other condensations may not be able to clear themselves rapidly, and may need to be actively ameliorated.
A different perspective on system failure is that of the network whose nodes are
the stops and whose edges are the connecting routes. These are taken as ‘closed’
with the probability that the vehicles on it are overcrowded. Following Corless et al.
(1996), when a network with M vertices has m = (1/2)a M connecting edges chosen
‘closed’ at random, for a > 1, it almost surely has a giant connected component—
here a geographic passenger/vehicle/roadway ‘transit jam’—with approximately g M
vertices, where
$$g(a) = 1 + W[-a \exp(-a)]/a \qquad (3.12)$$

with W being the Lambert W-function, defined by W(x) exp[W(x)] = x. We take a as an index of the proportion of routes with overcrowded transit vehicles.
Figure 3.2 shows the relation, which is strikingly similar to the 'two population' model of Fig. 1.4. Indeed, treating the proportions of overcrowded buses and passengers as separate interacting variates leads to essentially the same model as in Fig. 1.4.

Fig. 3.2 Relative size of the largest network connected component—the multimodal 'transit jam'—for random connections. a is taken as an index of the proportion of transit vehicles that are overcrowded, and W is the Lambert W-function. Tuning the topology of the network leads to a family of broadly similar curves with different thresholds and topping-out levels
Decline in a below threshold leads to fragmentation of crowding and the collapse
of the complex transit jam.
As Albert and Barabasi (2002) indicate, tuning the topology—making the networks less random—produces similar forms, differing largely in threshold and
topping-out level. The interaction of two populations can almost always be reexpressed as an abstract network model.
More generally, multiple population systems can be characterized in terms of
sets of different ‘nodes’ associated with sets of different ‘edges’ in various ways.
Connected subcomponents can be defined for such network-analogs and then used
to construct equivalence class groupoids, with the largest ‘connected component(s)’
defining the ‘richest’ such symmetry or symmetries, subject to symmetry-breaking.
3.5 Discussion and Conclusions
The essential content of the Data Rate Theorem is, of course, that, if the rate at which
control information can be provided to an unstable system is below the critical limit
defined by the rate at which the system generates ‘topological information’, there
is no coding strategy, no timing strategy, no control scheme of any form, that can
provide stability. Generalization, based on the inherently cognitive nature of V2I
systems—human or AI controlled—suggests that there may be a sequence of stages
of increasing transit jam dysfunction for public transit under the burden of rising
per-bus passenger densities.
Thus, for a bus system necessarily embedded in a larger traffic flow, no matter
what V2I headway manipulations are applied, there will always be a critical per-bus
passenger density that creates the public transit equivalent of a traffic jam, i.e., transit
jams that include bunching, long headways, extended on/off delays, buses too full to
pick up passengers, and so on, all synergistic with gross overcrowding. The arguments
of Kerner and Klenov (2009) on phase transitions carry over into public transit systems whose dynamics are driven by multiple density measures and their interaction.
For a given route at a fixed time, there should be a ‘passenger density macroscopic
fundamental diagram’ (PDMFD) much like Fig. 3.1 showing passengers/hour as a
function of passengers/vehicle. The previous sections, describing service instability,
imply the inevitability of ‘explosive’ deviations from regularity in the PDMFD with
increasing passenger load.
The essential solution to traffic jam analogs—to transit jams in public
transportation—is to provide adequate numbers of vehicles so that critical passenger
densities are not exceeded.
In sum, there can be no cheap tech fix for inadequate public transit service.
References
Albert, R., and A. Barabasi. 2002. Statistical mechanics of complex networks. Reviews of Modern
Physics 74: 47–97.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Blandin, S., et al. 2011. A general phase transition model for vehicular traffic. SIAM Journal of
Applied Mathematics 71: 107–127.
Chiabaut, N. 2015. Evaluation of a multimodal urban arterial: The passenger macroscopic fundamental diagram. Transportation Research Part B 81: 410–420.
Corless, R., G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth. 1996. On the Lambert W function.
Advances in Computational Mathematics 4: 329–359.
Geroliminis, N., N. Zheng, and K. Ampountolas. 2014. A three-dimensional macroscopic fundamental diagram for mixed bi-modal urban networks. Transportation Research Part C 42:
168–181.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
66
3 An Example: Passenger Crowding Instabilities of V2I Public Transit Systems
Ivanchev, J., H. Aydt, and A. Knoll. 2014. Stochastic bus traffic modeling and validation using smart
card fare collection data. In IEEE 17th Conference on ITSC, 2954–2961.
Kerner, B., and S. Klenov. 2009. Phase transitions in traffic flow on multilane roads. Physical Review
E 80: 056101.
Maerivoet, S., and B. De Moor. 2006. Data quality, travel time estimation and reliability. Katholieke
Universiteit Leuven 06-030.
Nair, G., F. Fagnani, S. Zampieri, and R. Evans. 2007. Feedback control under data rate constraints:
An overview. Proceedings of the IEEE 95: 108–137.
Sugiyama, Y., M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, et al. 2008. Traffic jams without
bottlenecks—experimental evidence for the physical mechanisms of the formation of a jam. New
Journal of Physics 10: 033001.
Tirachini, A., D. Hensher, and J. Rose. 2013. Crowding in public transport systems: Effects on
users, operation and implications for the estimation of demand. Transportation Research Part A
53: 36–52.
Tirachini, A., D. Hensher, and J. Rose. 2014. Multimodal pricing and optimal design of urban public
transport: The interplay between traffic congestion and bus crowding. Transportation Research
Part B 61: 33–54.
Wallace, R., and D. Wallace. 2013. A mathematical approach to multilevel, multiscale health interventions: Pharmaceutical Industry decline and policy response. London: Imperial College Press.
Chapter 4
An Example: Fighting the Last War
Abstract We examine how inadequate crosstalk between ‘tactical’ and ‘strategic’
levels of organization will lead to another version of the John Boyd mechanism of
command failure: the rules of the game change faster than executive systems can
respond. Adequate levels of crosstalk take work.
4.1 Introduction
A healthy small child, given ten or so different pictures of an elephant over a few
days, when first taken to the zoo or the circus (at least during the author’s childhood)
or to the appropriate Disney movie, has no trouble identifying a newly-seen elephant,
in-the-flesh, or on screen. AI systems, deep learning or otherwise, must be confronted
with innumerable elephant pictures in an enormous variety of situations to be able to
identify an elephant in some previously unexperienced context. Human institutions,
which are cognitive entities, do not fare much better.
The canonical example of institutional failure is, perhaps, the inevitability of
military structures almost always ‘fighting the last war’ (or the last battle). Although
one might prefer to focus on the fall of France in 1940, such considerations apply as
much to Erwin Rommel’s armored sideshow of Bewegungskrieg a la 1940 France
in the face of an ultimately overwhelming English/US strategic superiority in North
Africa that harked back to U.S. Grant’s Civil War strategy. Indeed, Bewegungskrieg
suffered a similar, if slower, grinding collapse on the Eastern Front of WWII from
Moscow to Stalingrad and Kursk, for analogous reasons associated with differences
in both manufacturing and manpower capacity and approach. Vergeltungswaffen
and a handful of jet interceptors and Tiger tanks didn’t count for much in the face of
massive supply chains and the evolution of sophisticated combined arms tactics. As
they say, the mills of the Gods grind slowly, but they grind exceeding fine.
Grant’s autobiography remains of some interest.
Another cogent example can be found in the aftermath of the US victory in the
Gulf wars of 1991 and 2003, and in the career of General H.R. McMaster, as of this
writing, the U.S. National Security Advisor.
During the first Gulf War, in February 1991, H.R. McMaster was a Captain commanding Eagle Troop of the Second Armored Cavalry Regiment in the battle known
as 73 Easting. As the result of an all too typical US operational failure, McMaster’s
unit of 9 M1A1 tanks and 12 Bradley reconnaissance vehicles (each armed with a
brace of reloadable anti-tank missiles), was ordered to rapidly advance toward Iraqi defense lines in a sandstorm, without air support, and without any intelligence
regarding actual enemy deployment.
In the sandstorm, not knowing the whereabouts of the enemy, McMaster ordered the
lightly-armored Bradley vehicles to form up behind the line of tanks.
Topping a ridge, Eagle Troop’s 9 M1A1 tanks were unexpectedly confronted
with a fully dug-in Iraqi T-72 tank company. Relying on the tactical superiority of
the M1A1 over the T-72, and on the relentless US live-fire training that permitted a
fire rate of 12 shots per minute, in 23 min Eagle Troop destroyed 28 T-72 tanks, 16
armored personnel carriers, and 39 trucks, eliminating the entrenched Iraqi company,
without taking a casualty.
Other US armored units in the same offensive thrust faced similar operational lacunae, again forced to engage in unexpected large-scale combat with highly motivated,
modestly well-trained, deeply entrenched Iraqi armor. Again, only vastly superior
equipment and training permitted US forces to carry through the confrontations with
minimal casualties and with the destruction of almost all enemy units.
In some 90 min, in spite of a characteristic operational level incompetence, US
tactical advantage resulted in the elimination of an entire elite Iraqi armored brigade.
The spirit of Erwin Rommel, and of a resurrected Prussian Bewegungskrieg, seemed
to have completely won the day.
Fast forward to the 2003 occupation of Iraq, the invasion of Afghanistan, and
nearly fifteen years of grinding insurgency: somebody changed the rules of the game
from armored Bewegungskrieg to another style of US Grant’s sociocultural grind.
The Gods are still deciding how small the pieces are going to be on that.
Indeed, in 2005 then-Col. McMaster was tasked with the pacification of the city of
Tal Afar in Iraq, under the rubric of ‘Operation Restore Rights’, a modestly successful
effort whose central innovation was the active presence of many US troops within
the city 24/7, usually involving forcible evictions of Iraqi families to house them
overnight (Finer 2017). The US soon ‘declared victory’ and withdrew. By mid-2006,
in Finer’s words, Tal Afar “was awash in the sectarian violence that had engulfed
much of Iraq”. In June 2014 Tal Afar was one of the first cities taken by the Islamic
State.
McMaster went on to hold a series of staff positions in the US Central Command.
It is, however, of some interest that, in 2006–7, he was passed over for promotion to
general. In 2007 the Secretary of the Army requested that General David Petraeus
return from Iraq to take charge of the promotion board as a way to ensure that the
best performers in combat received every consideration for advancement, resulting
in McMaster's promotion. As a three-star general, in 2014 he began duties as Deputy
Commanding General of the Training and Doctrine Command (Wikipedia 2017).
As Watts (2008) argues at some length, skills needed at the tactical do-or-die level
do not translate well into a corresponding degree of skill at the operational and strategic
levels, where the US remains severely challenged. As Watts puts it, tactical problems
are usually subject to relatively simple engineering solutions—better equipment and
training than the opposition—while operational and strategic problems are, in a
formal sense, ‘wickedly hard’, involving subtleties and synergisms that are difficult
to understand and to address. Different kinds of thinking, training, and talent are
required for each level.
It is the contention of this work that AI systems tasked with the control of critical real-time systems will face many of the same problems that have routinely
crippled their institutional counterparts, particularly under fog-of-war and frictional
constraints.
Here, we will model how cognitive systems, including AI entities, are dependent
on continuing crosstalk between strategic and operational ‘algorithms’, in a large
sense, and an appropriate reading of real-time field experience. This will prove to
generate a different version of John Boyd’s ‘command loop’ dynamic failure mechanism.
In short, if you don’t know if, when, or how the rules have changed, you can’t
win the game.
4.2 A Crosstalk Model: Mutual Information Dynamics
Here, the approach to dynamic process is via the mutual information generated by
crosstalk between channels. The essential point is again that there must be continual
communication between tactical and higher—operational and strategic—levels of
cognition. The tactical level is tasked with response to real-time ‘roadway’ shifts,
while the operational level must coordinate larger-scale tactical challenges and the
strategic must focus on monitoring the inevitably-changing rules-of-the-game. As
they say, don’t take a knife to a gunfight.
The focus is on the mutual information between information channels representing
the tactical and higher levels of control.
Mutual information between information sources X and Y is defined as
$$I(X;Y) = H(Y) - H(Y|X) = \sum_{x,y} p(x,y)\log\!\left[\frac{p(x,y)}{p(x)p(y)}\right] = \int\!\!\int p(x,y)\log\!\left[\frac{p(x,y)}{p(x)p(y)}\right]dx\,dy \tag{4.1}$$
where the last expression is for continuous variates. It is a convex function of the
conditional probability p(y|x) = p(x, y)/ p(x) for fixed probabilities p(x) (Cover
and Thomas 2006), and this would permit a complicated construction something like
that of previous chapters, taking the x-channel as the control signal. We will treat a
simplified example.
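Before specializing, Eq. (4.1) can be checked on a toy discrete case. The sketch below, using an arbitrary invented 2×2 joint distribution, computes I(X;Y) directly from the definition.

```python
# A small worked instance of Eq. (4.1): mutual information of a 2x2
# joint distribution. The probabilities are illustrative only.
import numpy as np

p_xy = np.array([[0.30, 0.10],    # hypothetical joint distribution p(x, y)
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]
I = float(np.sum(p_xy * np.log(p_xy / (p_x * p_y))))
print(f"I(X;Y) = {I:.4f} nats")   # > 0 here, since X and Y are dependent
```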
Given two interacting channels where the p's have normal distributions, mutual
information is related to correlation as
$$M \equiv I(X;Y) = -(1/2)\log[1-\rho^2] \tag{4.2}$$
where ρ is the standard correlation coefficient.
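The Gaussian relation of Eq. (4.2) is easily verified by simulation. The following sketch, assuming numpy and an arbitrary true correlation of 0.7, compares a plug-in estimate from sampled channels with the closed form.

```python
# Numerical check of Eq. (4.2) for bivariate normal channels.
import numpy as np

rng = np.random.default_rng(1)
rho = 0.7                                    # hypothetical true correlation
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0, 0], cov, size=200_000).T

r = np.corrcoef(x, y)[0, 1]                  # sample correlation
M_hat = -0.5 * np.log(1 - r**2)              # plug-in estimate of Eq. (4.2)
M_true = -0.5 * np.log(1 - rho**2)
print(f"estimated M = {M_hat:.4f}, closed form M = {M_true:.4f}")
```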
Mutual information is usually taken as simply a measure of correlation, but this
is not strictly true, since information is a form of free energy, and causal correlation
between phenomena cannot occur without the transfer of free energy. Taking ρ² ≡ Z,
we can, as in the earlier chapters, define an entropy-analog as
$$S_M \equiv M(Z) - Z\,dM(Z)/dZ \tag{4.3}$$
The simplest deterministic empirical model is then
$$dZ(t)/dt = \mu\,dS_M/dZ = -\frac{\mu}{2}\frac{Z(t)}{(1-Z(t))^2} \tag{4.4}$$
Assuming Z(0) = 1, the implied integral gives
$$-Z^2 + 4Z - 2\log(Z) = \mu t + 3 \tag{4.5}$$
The implicitplot function of the computer algebra program Maple, or ContourPlot
in Mathematica, gives Fig. 4.1. For small Z and large t this is just Z ≈ exp[−μt/2],
from the relation dZ/dt ≈ −(μ/2)Z(t).
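For readers without Maple or Mathematica, the implicit relation of Eq. (4.5) can be traced by ordinary root-finding. The sketch below, assuming scipy, brackets the root Z(μt) numerically; the printed ratio approaches a constant, confirming the asymptotic decay rate exp[−μt/2].

```python
# Numerical trace of Fig. 4.1: solve Eq. (4.5) for Z at each mu*t.
import numpy as np
from scipy.optimize import brentq

def implicit(Z, mu_t):
    # Eq. (4.5): -Z^2 + 4Z - 2 log(Z) - (mu*t + 3) = 0, with Z(0) = 1
    return -Z**2 + 4*Z - 2*np.log(Z) - (mu_t + 3)

for mu_t in (0.5, 1.0, 2.0, 4.0, 8.0):
    Z = brentq(implicit, 1e-12, 1.0, args=(mu_t,))
    print(f"mu*t = {mu_t:4.1f}  Z = {Z:.4f}  "
          f"Z/exp(-mu*t/2) = {Z / np.exp(-mu_t / 2):.3f}")
# The ratio settles toward a constant (~exp(-3/2)), confirming that Z
# decays at the asymptotic rate exp(-mu*t/2) noted in the text.
```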
Whatever the initial squared correlation Z (0), using the gradient of SM , the
nonequilibrium steady state, when d Z /dt = 0, is always exactly zero. That is, under
dynamic conditions, the final state is an uncorrelated pair of signals, in the absence
of free energy exchange from crosstalk linking them.
If tactical experience is isolated from cognitive strategy based on ‘doctrine’, one
attempts Bewegungskrieg at Kursk. Or one occupies Iraq and invades Afghanistan.
The simplest SDE extension of the model is just
$$dZ_t = \left[-\frac{\mu}{2}\frac{Z_t}{(1-Z_t)^2} + K\right]dt + bZ_t\,dW_t \tag{4.6}$$
where K is a measure of free energy exchange, μ is a diffusion rate and, again, dWt
represents white noise.
This has the steady state expectation
$$\rho^2 = \frac{4K + \mu - \sqrt{8K\mu + \mu^2}}{4K} \tag{4.7}$$
The limit of this expression, as K → ∞, is 1.
Fig. 4.1 Square of correlation coefficient between two channels characterized by normally distributed variates versus normalized time μt, for mutual information without free energy crosstalk. At t = 0, ρ² = 1

Fig. 4.2 For μ = 1, the steady state expectation of the squared correlation coefficient between two linked channels having normal distributions of numerical variates, shown as a function of the crosstalk free energy index linking them in the model of Eq. (4.6). The rate of convergence of ρ² to 1 decreases with increasing μ
Figure 4.2 shows a graph of ρ² versus K for μ = 1. The greater the diffusion
coefficient μ, the slower the rate of convergence.
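A sketch of Eq. (4.7), assuming numpy and matplotlib are available, reproduces the qualitative curve of Fig. 4.2; the parameter values are illustrative.

```python
# Steady state squared correlation of Eq. (4.7) versus the crosstalk
# free energy index K, for mu = 1 (the curve of Fig. 4.2).
import numpy as np
import matplotlib.pyplot as plt

def rho_sq(K, mu=1.0):
    return (4*K + mu - np.sqrt(8*K*mu + mu**2)) / (4*K)

K = np.linspace(0.01, 20, 400)
plt.plot(K, rho_sq(K))
plt.xlabel("crosstalk free energy index K")
plt.ylabel(r"steady state $\rho^2$")
plt.title("Eq. (4.7): more crosstalk, more correlation")
plt.show()
# rho_sq rises from near 0 toward 1 as K grows; larger mu slows the rise.
```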
It is possible to determine the standard deviation of the squared correlation (i.e.,
of the fraction of the joint variance) by calculating the difference of steady state
expectations E(Z²) − E(Z)² for the model of Eq. (4.6), again using the Ito chain
rule. Taking b = μ = 1, the real result declines monotonically with increasing K:
more crosstalk, less scatter.
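That variance calculation can be checked by brute force. The following Euler–Maruyama sketch of Eq. (4.6), with ad hoc step size, horizon, and clipping choices, estimates both the steady state mean of Z = ρ² and the scatter E(Z²) − E(Z)².

```python
# Euler-Maruyama ensemble for Eq. (4.6); step size, horizon, and the
# clip keeping Z = rho^2 inside (0, 1) are ad hoc numerical choices.
import numpy as np

rng = np.random.default_rng(0)
mu = b = 1.0
K = 1.0                                   # crosstalk free energy index
dt, n_steps, n_paths = 1e-3, 40_000, 2_000

Z = np.full(n_paths, 0.5)
for _ in range(n_steps):
    drift = -(mu / 2) * Z / (1 - Z)**2 + K
    Z += drift * dt + b * Z * np.sqrt(dt) * rng.standard_normal(n_paths)
    Z = np.clip(Z, 1e-9, 1 - 1e-9)

print(f"ensemble mean Z = {Z.mean():.3f}")   # Eq. (4.7) fixed point: 0.5
print(f"scatter E(Z^2) - E(Z)^2 = {Z.var():.4f}")
# Re-running with larger K should show the monotone decline in scatter
# asserted in the text: more crosstalk, less scatter.
```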
For non-normally distributed variates, one can expand the mutual information
about a normal distribution using the Gram-Charlier method (e.g., Stuart and Ord
1994).
4.3 Discussion
The K index in Eq. (4.6) is a free energy measure representing the degree of crosstalk
between the channels X and Y , indexing here different levels of command. Free energy is a measure of actual work. That is, it takes active work to effectively crosslink
strategic and tactical levels of organization in a cognitive system. It is not enough
to ‘accept information’ from below, but that information must be winnowed out to
look for patterns of changing challenge and/or opportunity. Winnowing data involves
choice, choice involves cognition and the reduction of uncertainty. The reduction of
uncertainty implies the existence of an information source. Information is a form
of free energy, and the exercise of cognition implies the expenditure of, often considerable, free energy. The argument is exactly circular, and illustrated by Fig. 4.2.
Fog-of-war and frictional constraints can probably be added to this model, perhaps
via parameterization of the variance measure E(Z²) − E(Z)².
AI systems that take millions of exposures to pictures of elephants in different
contexts to recognize one elephant in an unfamiliar context will not do well when
told they must search for tigers. ‘Big Data’ cannot train AI unless some executive
function has already recognized the need to winnow down and choose the appropriate
data subset for retraining, even for advanced algorithms that are easily trained. If
embedding reality changes the game faster than the AI (or institutional) system can
respond, then some version of John Boyd’s trap will have been sprung on it, either
through happenstance or deliberation.
Crosstalk takes work.
References
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
Finer, J. 2017. H.R. McMaster is hailed as the hero of Iraq’s Tal Afar. Here’s what that operation
looked like, Washington Post, 2/24/2017.
Stuart, A., and J. Ord. 1994. Kendall’s advanced theory of statistics, 6th ed. London: Hodder Arnold.
Watts, B. 2008. US combat training, operational art, and strategic competence: Problems and
opportunities. Washington, D.C.: Center for Strategic and Budgetary Assessments.
Wikipedia. 2017. https://en.wikipedia.org/wiki/H._R._McMaster.
Chapter 5
Coming Full Circle: Autonomous Weapons
Abstract The powerful asymptotic limit theorems of control and information
theories illuminate target discrimination failures afflicting autonomous weapon,
man/machine centaur or cockpit, and more traditional structures under increasing
fog-of-war and friction burdens. Degradation in targeting precision by high level
cognitive entities under escalating uncertainty, operational difficulty, attrition, and
real-time demands, will almost always involve sudden collapse to the familiar pathological ground state in which all possible targets are enemies, historically known as
‘kill everyone and let God sort them out’.
5.1 Introduction
Unfortunately... adjusting the sensor threshold to increase the number of target attacks also
increases the number of false target attacks. Thus the operator’s objectives are competing,
and a trade-off situation arises. (Kish et al. 2009)
Failure to differentiate between combatants and non combatants haunts military
enterprise under the best of circumstances. While the Wikipedia entry for ‘WW II
massacres’ lists 57 deliberate incidents of state terrorism—ranging from Babi Yar
to Zywocice—even US forces in Korea saw themselves confronted by a fog-of-war
challenge involving a ‘threat of infiltration’ via displaced civilians streaming South.
US command response to this uncertainty was to order troops to relentlessly fire on
refugees. The infamous No Gun Ri Bridge incident in which perhaps 300 unarmed
men, women, and children were killed represents the tip of an iceberg involving
hundreds of similar mass killings of civilians (Hanley et al. 2001).
The My Lai massacre of some 500 Vietnamese villagers by an out-of-control
US unit would seem to be a similar failure under fog-of-war pressures: ‘Who is the
enemy? Everybody is the enemy’ (Hersh 1972).
This cognitive ‘ground state collapse’ mechanism, which we have explored in
some detail above, extends to autonomous, and hence also cognitive, weapons and
other military systems. Indeed, the American drone war in the Middle East and
Africa is already a political catastrophe (Columbia 2012; Stanford/NYU 2012) that
will haunt the United States well into the next century, much as the miscalculations
that created and followed World War I—including the European colonial ‘country
building’ producing Iraq and Syria—haunt us today. At present, the USA—and other
nations—are poised to move beyond current man/machine ‘cockpit’ drone systems
to autonomous weapons.
As Archbishop Silvano Tomasi (2014) put it,
...[T]he development of complex autonomous weapon systems which remove the human
actor from lethal decision-making is short-sighted and may irreversibly alter the nature of
warfare in a less humane direction, leading to consequences we cannot possibly foresee, but
that will in any case increase the dehumanization of warfare.
‘Centaur warfighting’—enhanced cockpits—that ‘keep the man in the loop’, it is
asserted, will both outperform automatons and constrain, somewhat, the horrors of
war. However, as Scharre (2016) describes, the 2003 Patriot missile fratricides of Operation Iraqi Freedom (Hawley 2006) raise significant questions regarding the operational reliability
of such systems under fog-of-war constraints. The Patriot can be seen as an early
example of forthcoming centaur man/machine composites.
Trsek (2014) studies the 1988 US AEGIS system downing of a civilian airliner,
concluding that
[Command responsibility] is already several steps removed from the operator in practice – it
is naive to believe that we are relying on biological sensing to fulfill [rules-of-engagement]
criteria, where the majority of information is electronically derived.
The collapse dynamics we have explored in the previous chapters move the argument beyond Scharre’s ‘operational risk’ into violations of the Laws of Land Warfare
that require distinction between combatants and non combatants.
To reiterate, unlike an aircraft that can remain in stable flight as long as the center
of pressure is sufficiently behind the center of gravity, high-order cognitive systems
like human sports and combat teams, man-machine ‘cockpits’, self-driving vehicles,
autonomous weapon systems, and modern fighter aircraft—built to be maneuverable
rather than stable—operate in real-time on rapidly-shifting topological ‘highways’
of complex multimodal demand. Facing these turbulent topologies, according to the
Data Rate Theorem, the cognitive system must receive a constant flow of sufficiently
detailed information describing them.
5.2 The Topology of Target Space
Matters are, of course, even more complex. The underlying ‘roadway topology’
of combat operations becomes exceedingly rich under conditions of necessary discrimination between combatant and noncombatant. Again, the problem of air traffic
control (ATC) provides an entry point. In ATC, locally stable vehicle paths are seen
as thick braid geodesics in a simpler Euclidean quotient space (Hu et al. 2001). These
are generalizations of the streamline characteristics of hydrodynamic flow (Landau
and Lifshitz 1987). As described above, in the context of ATC, Hu et al. demonstrate
that finding collision-free maneuvers for multiple agents on a Euclidean plane surface R² is the same as finding the shortest geodesic in a particular manifold with
nonsmooth boundary. Given n vehicles, that geodesic is calculated for the topological quotient space R^{2n}/W(r), where W(r) is defined by the requirement that no
vehicles are closer together than some critical Euclidean distance r.
For autonomous or other weapons under targeting constraints, r is, crudely, the
minimum acceptable distance to possible noncombatants in the target zone. R² must
again be replaced by a far more topologically complex and extraordinarily dynamic
roadway space M² (or even M³) that incorporates evasive maneuvers of potential
targets within and around 'no-go' zones for the weapon. Geodesics for n possible
targets are then in a highly irregular and rapidly-shifting quotient space M^{αn}/W(r),
whose dynamics are subject to phase transitions driven by the convolution of fog-of-war and friction indices characterized in the previous chapters. The different phases
are analogous to the different 'traffic jam' conformations identified by Kerner and
Klenov (2009), who apply insights from statistical physics to traffic flow.
Needless to say, navigating under such restraints will always be far more difficult
than in the case of air traffic control. The ‘ground state’ fallback will obviously be
to simply collapse r to zero and thus greatly simplify target space topology.
According to the Data Rate Theorem, if the rate at which control information
can be provided to an unstable system is below the critical limit defined by the rate
at which the system generates ‘topological information’, there is no coding strategy, no timing strategy, no control scheme of any form, that can ensure stability.
Generalization to the rate of incoming information from the rapidly-changing multimodal ‘roadway’ environments in which a real-time cognitive system must operate
suggests that there will be sharp onset of serious dysfunction under the burden of
rising demand. In Sect. 1.3 we analyzed that multimodal demand in terms of the
crosstalk-like fog-of-war matrix ρ_{i,j} that can be characterized by situation-specific
statistical models leading to the scalar temperature analog T and a similar argument
leading to the friction/resolve index φ. More complicated ‘tangent space’ reductions
are possible, at the expense of greater mathematical overhead (e.g., Glazebrook and
Wallace 2009).
There will not be graceful degradation under falling fog-of-war ‘temperatures’ or
increasing ‘friction’, but rather punctuated functional decline that, for autonomous,
centaur, or man-machine cockpit weapon systems, deteriorates into a frozen state
in which ‘all possible targets are enemies’, as in the case of the Patriot missile
fratricides (Hawley 2006). Other cognitive systems will display analogous patterns of
punctuated collapse into simplistic dysfunctional phenotypes or behaviors (Wallace
2015a, b, 2017): the underlying dynamic is ubiquitous and, apparently, inescapable.
As Neuneck (2008) puts it,
[Proponents of the ‘Revolution in Military Affairs’ seek] to eliminate Clausewitz’s ‘fog of
war’... to eliminate unpredictability on the battlefield. War is a complex, nonlinear process
of violent interactions where technological edge is not a guarantee for success.
Fig. 5.1 Adapted from Venkataraman et al. (2011). Under real-time fog-of-war constraints it
becomes difficult for automated systems to differentiate between military and civilian vehicles.
Ground state collapse identifies everything as a tank
The problem, in its many and varied intractable forms, has been considered and
reconsidered across a number of venues. In addition to the opening remarks of Kish
et al. (2009), Venkataraman et al. (2011), for example, review a relevant sector of
the signal processing literature, and Fig. 5.1, adapted from their paper, encapsulates,
in reverse, something of the conundrum. Under sufficient real-time fog-of-war constraint, a cognitive system collapses into a ground state that does not differentiate
between an SUV, a van, and a tank.
In 2017 the Pentagon’s elite advisory panel, the infamous JASON group of Vietnam war ‘automated battlefield’ fame, released an unclassified overview of possible
uses for artificial intelligence by the US Department of Defense (JASON 2017).
At the very end of a Statement of Work appendix to the report is the following
‘Scope’ Q&A exchange:
4. Many leading AI researchers and scientists from other disciplines expressed their concerns
of potential pitfalls of AI development in the “Open Letter on Artificial Intelligence.” As the
letter suggests, can we trust these agents to perform correctly? Can we verify and validate
these agents with sufficient level of built-in security and control to ensure that these systems
do what we want them to do?
JASON response: Verification and validation of AI agents is, at present, immature. There is
considerable opportunity for DoD to participate in the program of advancing the state of the
art of AI to become a true engineering discipline, in which V&V, as well as other engineering
“ilities”[reliability, maintainability, accountability, verifiability, etc.], will be appropriately
controlled.
Recognizing, perhaps, a classic lawyer’s tapdance, Carl von Clausewitz might well
disagree that matters will be this simple. Indeed, it is interesting to note that John Boyd
himself had directly ordered the closing of JASON’s automated battlefield project—
a mix of electronic sensors and quick-response air strikes aimed at closing North
Vietnam's 'Ho Chi Minh trail' supply line—as an ineffective waste of resources
(Lawson 2014, Chap. 5).
In sum, as with other real-time AI, there is no free lunch for cognitive weapon systems, with or without hands-on human control. All such systems—including conventional military command structures at different scales and levels of organization—are
inherently susceptible to serious operational instabilities under complex fog-of-war
and frictional environments. Policy based on the business dreams of military contractors and their academic or think-tank clients—promises of precision targeting—will
be confronted by nightmare realities of martyred civilian populations, recurring generations of new ‘terrorists’, and the persistent stench of war crime.
References
Columbia University Law School Human Rights Clinic. 2012. Counting drone strike deaths. http://
web.law.columbia.edu/human-rights-institute.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Hanley, C., M. Mendoza, and S. Choe. 2001. The bridge at No Gun Ri: A hidden nightmare from
the Korean War. New York: Henry Holt and Company.
Hawley, J. 2006. Patriot fratricides: The human dimension lessons of Operation Iraqi Freedom.
Field Artillery, January-February.
Hersh, S. 1972. Cover-up: The Army’s secret investigation of the massacre at My Lai 4. New York:
Random House.
Hu, J., M. Prandini, K. Johansson, and S. Sastry. 2001. Hybrid geodesics as optimal solutions to
the collision-free motion planning problem. In HSCC 2001. LNCS, vol. 2034, eds. Di Benedetto,
M., and A. Sangiovanni-Vincentelli, 305–318.
JASON. 2017. Perspectives on research in artificial intelligence and artificial general intelligence
relevant to DoD, JSR-16-Task-003. McLean, VA: The MITRE Corporation.
Kerner, B., and S. Klenov. 2009. Phase transitions in traffic flow on multilane roads. Physical Review
E 80: 056101.
Kish, B., M. Pachter, and D. Jacques. 2009. Effectiveness measures for operations in uncertain
environments. In UAV Cooperative Decision and Control: Challenges and Practical Applications,
eds. Shima, T., and S. Rasmussen, Chap. 7. Philadelphia: SIAM Publications.
Landau, L., and E. Lifshitz. 1987. Fluid mechanics, 2nd ed. New York: Pergamon.
Lawson, S. 2014. Non-linear science and warfare: Chaos, complexity and the US military in the
information age. New York: Routledge.
Neuneck, G. 2008. The revolution in military affairs: Its driving forces, elements, and complexity.
Complexity 14: 50–60.
Scharre, P. 2016. Autonomous weapons and operational risk. Washington, DC: Center for New
American Security. http://www.cnas.org/autonomous-weapons-and-operational-risk.
Stanford/NYU. 2012. Living under drones: Death, injury and trauma to civilians from US drone
practices in Pakistan. http://livingunderdrones.org/.
Tomasi, S. 2014. Catholic Herald. http://www.catholicherald.co.uk/news/2014/05/15/vatican-official-voices-opposition-to-automated-weapons-systems/.
Trsek, R. (Lt. Col., USAF). 2014. Cutting the cord: Discrimination and command responsibility
in autonomous lethal weapons. USA: Air War College of Air University.
Venkataraman, V., G. Fan, L. Yu, X. Zhang, W. Liu, and J. Havlick. 2011. Automated target tracking
and recognition using coupled view and identity manifolds for shape recognition. EURASIP
Journal of Advances in Signal Processing 2011: 124.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2015b. An information approach to mitochondrial dysfunction: Extending Swerdlow’s
hypothesis. Singapore: World Scientific.
Wallace, R. 2017. Information theory models of instabilities in critical systems. Singapore: World
Scientific.
Chapter 6
An Evolutionary Approach to Real-Time
Conflict: Beware the ‘Language that Speaks
Itself’
Abstract We examine real-time critical processes through an evolutionary lens,
finding that protracted conflict between cognitive entities can trigger a self-referential,
coevolutionary bootstrap dynamic, virtually a ‘language that speaks itself’. Such phenomena do not permit simple command-loop interventions in John Boyd’s sense and
are very hard to contain.
6.1 Introduction
Evolutionary perspectives, focusing on institutional interactions and dynamics, are
an attractive alternative to simplistic ‘atomistic self-interest’ approaches in economic
theory (e.g., Hodgson and Knudsen 2010; Wallace 2015a, Chap. 1, and references
therein). In addition, coevolutionary arguments provide other insights into the flash-crash algorithmic pathologies examined in Sect. 1.9. The presentation is (much)
adapted from Wallace (2017b, Chap. 10) and Wallace (2011).
We attempt to explicitly formalize dynamics that, in evolutionary terms, are usually characterized as ‘self-referential’.
Goldenfeld and Woese (2010) describe the mechanism for biological evolution.
They see the genome as encoding the information which governs the response of
an organism to its physical and biological environment. At the same time, they
argue, this environment actually shapes genomes through gene transfer processes
and phenotype selection. This inevitably produces a situation where the dynamics
must be self-referential: the update rules change during the time evolution of the
system, and the way in which they change is a function of the state and thus the
history of the system. Self-referential dynamics is an inherent and probably defining
feature of evolutionary dynamics.
Others, of course, have observed the recursive, self-referential nature of evolution and language, and have postulated something approaching a 'language of evolution'
(e.g., Langton 1992; Sereno 1991; Von Neumann 1966). Here we explore such self-referential dynamics from the perspectives of Wallace (2010, 2011), recognizing
that the representation of fundamental biological, cognitive, and other processes
in terms of information sources significantly restrains the inherent nonequilibrium
nature of those processes. That is, although the operation of information sources is
both nonequilibrium and irreversible in the most fundamental sense (e.g., few and
short palindromes), the asymptotic limit theorems of information theory beat back
the mathematical thicket surrounding such phenomena. The theorems permit, in
some measure, a non-equilibrium steady state approximation to inherently nonequilibrium processes under proper circumstances, and allow the stochastic differential
equation models inherent to nonequilibrium statistical mechanics to penetrate a full
step deeper.
Two dynamics dominate evolutionary process: punctuated equilibrium, in the
sense of Eldredge and Gould (1972), and path dependence (Gould 2002). Punctuated equilibrium implies periods of relative stasis followed by sudden 'extinction'
and/or ‘speciation’ events where entities undergo fundamental reorganization under
selection pressures that may involve competition. Path dependence implies that what
comes next depends heavily on, and is largely driven by, what has come before.
Western ‘market economies’ quintessentially involve persistent, grinding conflict
between cognitive entities, i.e., competition between institutions. The model can be
applied, with some modification, to the kind of de-facto combat operations likely to
confront AI control of real-time critical systems. The basic argument is that conflict
will always act as a powerful selection pressure on interacting cognitive entities, of
any nature, leading to punctuated speciation/extinction events on appropriate time
scales, depending on the exact nature of the contending systems.
Changes in Soviet military organization, leadership, and doctrine under German
‘selection pressure’ in WWII provide something of a case history, albeit on a different
timescale than the stock market flash-crash. The emergence of an ‘insurgency’ after
the botched US occupation of Iraq in 2003 provides another example, as does the
ultimately successful resistance of the defeated Confederate states after the US Civil
War that forced the removal of Federal troops after 1877, leading to imposition of a
draconian ‘Jim Crow’ system of racial apartheid and voter disenfranchisement that
lasted well into the latter half of the 20th Century. Indeed, after 1980, the Jim Crow
system evolved into current nation-wide programs of mass incarceration afflicting
racial minorities with much the same effect.
Interacting cognitive enterprises can be seen as undergoing evolutionary process
according to a modified version of the traditional biological mode (Wallace 2010,
2011, 2013, 2015a):
1. Variation. Among individual cognitive entities—AI systems, individuals, institutions, and their composites—there is considerable variation in structure and
behavior.
2. Inheritance of culture. Along its developmental path, which can be seen as a
kind of reproductive process, a machine/entity/institution (MEI) will resemble
its own history more than that of others, as ‘corporate’ strategies, resources, and
perspectives are passed on in time.
3. Change. Learned or enforced variation in structure, policy, and ‘doctrine’, in a
large sense, is constantly occurring in surviving MEI’s.
4. Environmental interaction. Individual MEI’s and related groups engage in powerful, often punctuated, dynamic mutual relations with their embedding environments that may include the exchange of ‘heritage material’ between markedly
different entities through learning, or the abduction or diffusion of ideas and
opinions.
Many of the essential processes within this kind of structure and sets of such
structures can be represented in terms of interacting information sources, constrained
by the asymptotic limit theorems of information and control theories. Following the
arguments of Wallace (2010, 2011, 2013, 2015a), it can be shown that:
1. An embedding ecosystem—in a large sense—must have regularities of ‘grammar’
and ‘syntax’ that allow it to be represented as an information source, say X .
2. Like genetic heritage, MEI heritage is also characterized as a ‘language’, and
hence an information source Y .
3. As described above, cognition involves a dual information source, Z. Further,
cognition is always associated with groupoids that generalize the idea of a symmetry group.
4. Large deviations in dynamical systems occur with very high probability only
along certain developmental pathways, allowing definition of an information
source we will call LD . See Wallace (2010, 2011, 2013, 2015a) for details that
follow the arguments of Champagnat et al. (2006).
Somewhat more specifically, as Champagnat et al. (2006) note, shifts between the
nonequilibrium steady states of an evolutionary system can be addressed by the large
deviations formalism. They find that the issue of evolutionary dynamics drifting away
from trajectories predicted by their canonical defining equations can be investigated
by considering the asymptotics of the probability of 'rare events' for the sample paths
of the diffusion.
By rare events they mean diffusion paths drifting far away from the canonical
equation. The probability of such rare events is governed by a large deviation principle: when a critical parameter (designated ε) goes to zero, the probability that
the sample path of the diffusion is close to a given rare path decreases exponentially
to 0 with rate I(φ), where the rate function I can be expressed in terms of the
parameters of the diffusion. This result, in their view, can be used to study long-time
behavior of the diffusion process when there are multiple attractive evolutionary singularities. Under proper conditions the most likely path followed by the diffusion
when exiting a basin of attraction is the one minimizing the rate function I over all
the appropriate trajectories. The time needed to exit the basin is of the order exp(V /ε)
where V is a quasipotential representing the minimum of the rate function I over
all possible trajectories.
An essential fact of large deviations theory is that the rate function I can be
expressed in the familiar canonical form of an information source, i.e.,
$$I = -\sum_j P_j \log(P_j) \tag{6.1}$$
for some probability distribution. This result goes under a number of names: Sanov's
Theorem, Cramer's Theorem, the Gartner-Ellis Theorem, the Shannon-McMillan
Theorem, and so forth (Dembo and Zeitouni 1998). Thus a large deviation can itself
be described in terms of an information source, here designated LD .
As a consequence of these considerations, we can define a joint Shannon uncertainty representing the interaction of these information sources as
$$H(X, Y, Z, L_D) \tag{6.2}$$
6.2 An Iterated Coevolutionary Ratchet
Defining (yet) another 'entropy' across a vector of system parameters K as
$$\hat{S} \equiv H(\mathbf{K}) - \mathbf{K}\cdot\nabla_{\mathbf{K}} H \tag{6.3}$$
we can apply, in first order, an analog to the now-standard Onsager approximation
involving time dynamics driven by a linear SDE model in the gradients of Ŝ by the
components of K. Then
$$dK^i_t \approx \Bigl(\sum_k \mu_{i,k}\,\partial\hat{S}/\partial K^k_t\Bigr)dt + \sigma_i K^i_t\,dB_t \tag{6.4}$$
where μ_{i,k} is a diffusion matrix analog, and the last term represents volatility in a noise
process dB_t that may not be Brownian.
Setting the expectation of this set of relations to zero, we find a relatively large
set of nonequilibrium steady states, indexed as j = 1, 2, ..., j_max ≫ 1, each characterized by an uncertainty value H_j.
Importing the Clausewitz temperature T, we again write a pseudoprobability for
state q as
$$P_q = \frac{\exp(-H_q/T)}{\sum_j \exp(-H_j/T)} \tag{6.5}$$
and define a new 'free energy' Morse Function F̂ in terms of the denominator sum,
$$\exp(-\hat{F}/T) \equiv \sum_j \exp(-H_j/T) \tag{6.6}$$
Arguing by abduction from previous sections, change in T (that is inverse in
the index of fog-of-war and/or friction, ρ or Γ ) will be associated with profound—
and highly punctuated—evolutionary transitions (Eldredge and Gould 1972; Gould
2002; Wallace 2010, 2011). These transitions, involving cognitive groupoid analogs
to physical ‘symmetry breaking’ (Pettini 2007), then define entirely new pathways
Fig. 6.1 Adapted from Fig. 10.3 of Wallace (2017b). The vertical axis indexes MEI capacity. The
horizontal one represents the degree of Clausewitz challenge Γ . At low values the system drifts
about a nonequilibrium steady state with significant capacity. Γ burden exceeding some critical level
triggers a punctuated phase change via a large deviation, leading to a less organized nonequilibrium
steady state. Such disintegration will likely, in itself, constitute a serious environmental insult,
leading to ‘self-referential’ ratchet dynamics: a positive feedback-driven race to the bottom. Similar
mechanisms may act during the rapid flash-crashes studied in Sect. 1.9
along which systems of conflicting MEI’s develop. There is never, ever, a ‘return to
normal after perturbation’ in path-dependent evolutionary process.
The evolutionary dynamic we propose for conflicting MEI's under Clausewitzian
stress is illustrated by Fig. 6.1 (adapted from Fig. 10.3 of Wallace 2017b). The vertical axis represents an index of system capacity—the ability to carry out designated
duties. The horizontal axis is taken as a measure of the Clausewitz stress Γ . At low
levels of stress the system drifts about some nonequilibrium steady state having relatively high degrees of capacity. When stress exceeds a threshold, there is a punctuated
phase change associated with a large deviation, leading to a less organized nonequilibrium steady state, as indicated. Thus onset of disintegration may itself constitute a
significant environmental insult, leading to a fully self-referential downward ratchet,
similar to the argument in Sect. 1.11.
A relatively simple deterministic mathematical description of such a binary switch
might be as follows. Assume Γ, the stress index, is initially at some nonequilibrium
steady state, and that Γ → Γ + ε. Then ε can be assumed, at least in first order, to
follow an approximate relation
$$d\epsilon/dt = \mu\epsilon - C/\epsilon, \quad C, \mu > 0 \tag{6.7}$$
If ε ≤ √(C/μ), then dε/dt ≤ 0, and the system remains at or near the initial value
of Γ. Otherwise dε/dt becomes positive, and the switch is triggered, according to
Fig. 6.1.
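The switch of Eq. (6.7) can be exercised directly. The sketch below, assuming scipy and arbitrary illustrative values of C and μ, integrates the deterministic dynamics from initial perturbations just below and just above √(C/μ).

```python
# Deterministic switch of Eq. (6.7): sub-threshold decay vs. runaway.
import numpy as np
from scipy.integrate import solve_ivp

mu, C = 1.0, 0.25                      # threshold sqrt(C/mu) = 0.5
rhs = lambda t, y: [mu * y[0] - C / y[0]]

def floor(t, y):                       # stop before eps reaches zero,
    return y[0] - 0.05                 # where the C/eps term is singular
floor.terminal = True

for eps0 in (0.45, 0.55):              # just below / just above threshold
    sol = solve_ivp(rhs, (0.0, 3.0), [eps0], events=floor, max_step=0.01)
    print(f"eps(0) = {eps0}: final eps = {sol.y[0, -1]:.3f} "
          f"at t = {sol.t[-1]:.2f}")
# Below threshold eps collapses toward zero (the run halts at the floor);
# above threshold eps grows without bound: the switch of Fig. 6.1.
```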
The standard stochastic extension has the SDE dynamics
$$d\epsilon_t = (\mu\epsilon_t - C/\epsilon_t)dt + \sigma\epsilon_t\,dW_t \tag{6.8}$$
where σ is an index of the magnitude of an impinging white noise dW_t. Then,
applying the Ito chain rule to log[ε_t], the relation of Eq. (6.7) becomes
$$d\epsilon/dt = \mu\epsilon - \frac{C}{\epsilon} - \frac{1}{2}\sigma^2\epsilon \tag{6.9}$$
The last term is the added 'Ito correction factor' due to noise. ε has the nonequilibrium steady state expectation, again via the Jensen inequality for a concave function,
$$E(\epsilon) \ge \sqrt{\frac{C}{\mu - \frac{1}{2}\sigma^2}} \tag{6.10}$$
Below this level, the system collapses to zero. Above it, the system 'explodes' to
higher values.
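A corresponding Euler–Maruyama sketch of Eq. (6.8), again with arbitrary parameters, illustrates the stochastic self-stabilization discussed next: weak noise leaves a super-threshold perturbation free to run away, while noise with σ²/2 > μ locks in collapse.

```python
# Euler-Maruyama ensemble for Eq. (6.8); the floor clip is purely a
# numerical guard against the C/eps singularity.
import numpy as np

rng = np.random.default_rng(3)
mu, C, dt, n_steps, n_paths = 1.0, 0.25, 1e-3, 10_000, 500

for sigma in (0.5, 2.0):                 # sigma^2/2 below / above mu
    eps = np.full(n_paths, 1.0)          # start above sqrt(C/mu) = 0.5
    for _ in range(n_steps):
        drift = mu * eps - C / eps
        eps += drift * dt + sigma * eps * np.sqrt(dt) * \
            rng.standard_normal(n_paths)
        eps = np.maximum(eps, 1e-6)
    print(f"sigma = {sigma}: median eps(t=10) = {np.median(eps):.3g}, "
          f"collapsed fraction = {(eps < 0.01).mean():.2f}")
# Weak noise leaves the runaway of Fig. 6.1 intact; sigma^2/2 > mu
# stochastically self-stabilizes the system into the collapsed state.
```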
Sufficient noise creates the ‘stochastic self-stabilization’ of Mao (2007), locking
in the collapsed ratchet state. In addition, since Eq. (6.10) represents an expectation
across a probability distribution, even at relatively low mean values there may well be
much larger stochastic excursions—large deviations—that can trigger a destabilizing
transition, following Fig. 6.1. For example, Wallace (2015a, Chap. 7) examines the
impact of the diversion of technological resources from civilian to military industrial
enterprise during the Cold War—equivalent to increasing σ in Eq. (6.9)—locking in
the massive ‘rust belt’ industrial collapse in the US.
Of course, given sufficient ‘available free energy’, in a large sense, upward ratchets
in levels of organization—analogous to the famous aerobic transition or, in human
social systems, to the Renaissance, the Industrial Revolution, the many post Victorian
Great Urban Reforms, and the US Labor and Civil Rights Movements—are also
possible, but these cannot at all be described as a ‘return to normal’ after perturbation.
Under such circumstances, decline in σ in Eq. (6.9) can lower the collapse-to-zero
threshold, triggering an upward transition that is a monotonically increasing function of the 'free energy' index.
If σ²/2 ≥ μ, however, the needed reinvestment may become very large indeed,
leading to the collapse of the MEI.
It is important to realize that, although we have couched evolutionary dynamics
in terms of interacting information sources, evolutionary process, per se, is not cognitive. Variation and selection will operate in the presence of any heritage system,
Lamarckian, cultural, and so on. Russian success over Prussian Bewegungskrieg in
WWII owes much (but not everything) to this dynamic, which, in the long term, can
undercut the John Boyd mechanism associated with the Data Rate Theorem that
applies to real-time tactics and operations. Recall also the ultimate defeat of the
US ‘revolution in military affairs’ of the 1990s by the grinding ‘insurgencies’ that
evolved against it in Iraq and Afghanistan.
In sum, Wallace (2011), Goldenfeld and Woese (2010), and others emphasize the
point that evolutionary process is, at base, a self-dynamic, self-referential, continually-bootstrapping phenomenon, one that, in essence, becomes 'a language that speaks
itself'. Once triggered, such evolutionary ratchets can take on a life of their own,
entraining constituent cognitive subprocesses into a larger, embedding, but basically
non-cognitive, process in which there is no command loop to short-circuit, in the
sense of John Boyd. The German experience on the Eastern Front of WW II, the
US experiences in Vietnam, Iraq and Afghanistan, and the market flash-crashes of
Sect. 1.9 seem to provide examples, albeit on different time scales.
6.3 Dynamics of Large Deviations
A somewhat different view emerges from explicitly considering the dynamics of the
large deviations characterized by the information source L_D (Wallace 2011). This
can be done using the metric M from Chap. 2, as described in the Mathematical
Appendix. Recall that M characterizes the 'distance' between different essential
behaviors and/or other 'phenotypes'. We then express the large deviation in terms
of the dynamics of M, using the entropy of Eq. (6.3) to define another first-order
stochastic Onsager equation having the form
$$\frac{d\mathcal{M}}{dt} = \mu\frac{d\hat{S}}{d\mathcal{M}} + \sigma W_t \tag{6.11}$$
Here, dM/dt represents the 'flow' from system A to Â in the underlying manifold,
and W_t represents Brownian noise. Again, see the Mathematical Appendix for details.
More generally, this must be expressed as the SDE
$$d\mathcal{M}_t = \mu(\mathcal{M}_t, t)dt + \sigma(\mathcal{M}_t, t)dB_t \tag{6.12}$$
where B_t is not necessarily Brownian white noise and μ and σ are now appropriate
functions of M_t and t. Here we enter deep realms of stochastic differential geometry
in the sense of Emery (1989). We do this by making an explicit parameterization of
M(A, Â) in terms of a vector K and an associated metric tensor g_{i,j}(K) as
$$\mathcal{M}(A, \hat{A}) = \int_A^{\hat{A}} \Bigl[\sum_{i,j} g_{i,j}(\mathbf{K})\frac{dK_i}{dt}\frac{dK_j}{dt}\Bigr]^{1/2} dt \tag{6.13}$$
where the integral is taken over some parameterized curve from A to Â in the embedding manifold. Substituting Eq. (6.13) into Eq. (6.12) produces a very complicated
expression in the components of K.
A first order iteration would apply the calculus of variations to minimize Eq. (6.13),
producing a starting expression having the form
$$\frac{d^2K_i}{dt^2} + \sum_{j,m}\Gamma^i_{j,m}\frac{dK_j}{dt}\frac{dK_m}{dt} = 0 \tag{6.14}$$
where the Γ terms are the famous Christoffel symbols involving sums and products
of g_{i,j} and ∂g_{i,j}/∂K_m. For the second iteration, this must be extended by introduction
of noise terms to produce Emery’s stochastic differential geometry. The formalism
provides a means of introducing essential factors such as geographic, social, and
other structures as the necessary ‘riverbanks’ constraining the ‘flow’ of self-dynamic
evolutionary process that act in addition to historical path-dependence.
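The machinery behind Eqs. (6.13) and (6.14) can be made concrete with a computer algebra sketch. The example below, assuming sympy and a toy two-parameter metric (the Poincaré half-plane, chosen purely for illustration), computes the Christoffel symbols that enter the geodesic equation.

```python
# Christoffel symbols for a toy metric g_{i,j}(K), the ingredients of
# the geodesic equation (6.14). The metric choice is illustrative only.
import sympy as sp

K1, K2 = sp.symbols('K1 K2', positive=True)
coords = [K1, K2]
g = sp.Matrix([[1/K2**2, 0], [0, 1/K2**2]])   # toy metric tensor
g_inv = g.inv()

def christoffel(i, j, m):
    # Gamma^i_{j,m} = (1/2) g^{i,l} (d_j g_{l,m} + d_m g_{l,j} - d_l g_{j,m})
    return sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[i, l] *
        (sp.diff(g[l, m], coords[j]) + sp.diff(g[l, j], coords[m])
         - sp.diff(g[j, m], coords[l]))
        for l in range(2)))

for i in range(2):
    for j in range(2):
        for m in range(2):
            G = christoffel(i, j, m)
            if G != 0:
                print(f"Gamma^{i}_{{{j},{m}}} =", G)
# These Gamma^i_{j,m} are exactly the coefficients entering Eq. (6.14);
# noise terms would then be added in Emery's stochastic setting.
```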
In the Mathematical Appendix we contrast this empirical Onsager approach,
where equations must actually fit data, with the supposedly necessary and sufficient
methodology of evolutionary game theory.
6.4 Cambrian Events: Spawning Hydras
A particular implication of an evolutionary perspective is the possibility of the coevolutionary ‘spawning’ of independent entities that subsequently battle for limited
resources. For example, one can envision the network fragmentation mechanism of
Sect. 1.10 resulting in a number of subgroups that communicate within each other
but then compete, for example autonomous vehicles or other AI entities seeking
bandwidth or, in an Afghanistan context, local warlords seeking extortion monies or
access to opium poppy supplies. Wallace and Fullilove (2014) examine the fragmentation of drug cartels in Northern Mexico from a similar perspective. The creation
of fragments is seen as a ‘nucleation event’ within a susceptible population, leading
to the growth of analogs to traffic jams (e.g., Wallace 2018 and references therein),
or of crystals suddenly solidifying across a supercooled fluid.
Let N_t ≥ 0 represent the number of cooperating individuals in a particular fragmentary subgroup at time t. The simplest dynamic model is then something like
$$dN_t = \alpha N_t(1 - N_t/K)dt + \sigma N_t\,dW_t \tag{6.15}$$
where K is the 'ecological carrying capacity' for the 'species', α is a characteristic
growth rate constant, σ a noise strength parameter, and dW_t again a white noise
process.
Applying the Ito chain rule to log(N_t) invokes the stochastic stabilization mechanisms of Mao (2007), via the added 'correction factor', leading to the long-time
endemic limits
$$N_t \to 0, \quad \alpha < \sigma^2/2$$
$$N_t \to K\Bigl(1 - \frac{\sigma^2}{2\alpha}\Bigr), \quad \alpha \ge \sigma^2/2 \tag{6.16}$$
If the rate of growth of the initial fragment, α, is large enough, noise-driven
fluctuations are not sufficient to collapse it to zero: a ‘traffic jam’ or ‘Cambrian
event’ (Wallace 2014) analog grows.
Figure 6.2 shows two simulations, with σ below and above criticality.
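A minimal Euler–Maruyama counterpart to the Mathematica ItoProcess simulations of Fig. 6.2, assuming numpy, is sketched below; the horizon is taken longer than the figure's 2000 steps so the slow super-critical collapse is unambiguous.

```python
# Euler-Maruyama simulation of the logistic SDE (6.15); parameters
# follow the caption of Fig. 6.2, with a longer horizon.
import numpy as np

rng = np.random.default_rng(42)
alpha, K, N0 = 1.0, 100.0, 100.0
dt, n_steps = 0.01, 20_000               # total time t = 200

for sigma in (0.5, 1.5):                 # below / above sqrt(2*alpha)
    N, tail = N0, []
    for step in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()
        N += alpha * N * (1 - N / K) * dt + sigma * N * dW
        N = max(N, 0.0)                  # N_t >= 0 by construction
        if step >= n_steps - 2_000:
            tail.append(N)
    print(f"sigma = {sigma}: mean over final 2000 steps = {np.mean(tail):.2f}")
# sigma = 0.5 fluctuates about the endemic limit K(1 - sigma^2/(2 alpha))
# = 87.5 of Eq. (6.16); sigma = 1.5 > sqrt(2) collapses toward zero.
```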
Taking the potential carrying capacity K as very large, so that Nt /K → 0 in
Eq. (6.15), the model suggests that improper management of conflict between cognitive entities can lead to Cambrian events akin to Hercules’ battle with the Hydra:
cut off one head, and two will grow in its place.
Wallace and Fullilove (2014) describe the Hydra mechanism of the latter dynamic
as follows:
Atomistic, individual-oriented economic models of criminal behavior fail to capture critical scale-dependent behaviors that characterize criminal enterprises as cultural artifacts.
Public policies based on such models have contributed materially to the practice of mass
incarceration in the USA. A survey of similar policing strategies in other venues suggests
that such policies almost inevitably lead to exacerbation of organized violence. Adapting a
Black-Scholes methodology, it is possible to characterize the ‘regulatory investment’ needed
to manage criminal enterprise under conditions of uncertainty at a scale and level of organization that avoids an atomistic fallacy. The model illuminates how public policy that
might seem rational on an individual scale can trigger ecosystem resilience transitions to
long-lasting or permanent modes of institutionalized hyperviolence. The homicide waves
associated with the planned shrinkage program in New York City that was directed at dispersing minority voting blocks carry implications for national patterns of social disruption in
which mass incarceration is an ecological keystone. Continuing large-scale socioeconomic
decay, in the specific context of that keystone, greatly increases the probability of persistent,
large-scale, organized hyperviolence, as has been the experience in Naples, Sicily, Mexico,
and elsewhere.
One is indeed led to another quotation, this by Charles Dickens, from the 1853
novel Bleak House, describing the social diffusion of the pathologies of the generic
London slum he called Tom-All-Alone’s:
Even the winds are his messengers, and they serve him in these hours of darkness... There
is not an atom of Tom’s slime, not a cubic inch of any pestilential gas in which he lives, not
one obscenity or degradation about him, not an ignorance, not a wickedness, not a brutality
of his committing, but shall work its retribution through every order of society up to the
proudest of the proud and to the highest of the high.
Welcome to Iraq, Afghanistan, and the Drone Wars: sow chaos, reap chaos.
88
6 An Evolutionary Approach to Real-Time Conflict …
Fig. 6.2 Simulating N_t based on the Ito chain rule expansion of log(N_t) using Eq. (6.15). The simulations apply the ItoProcess function in Mathematica 10 for white noise. N_0 = 100, K = 100, α = 1, σ = 0.5, 1.5. The critical value for σ is √2. 2000 time steps. While the upper trace fluctuates about K, the lower collapses to zero. If K becomes large, then the upper trace explodes in the Hydra mechanism: cut off one head, and two grow in its place. This is the Drone War dynamic.
References
Champagnat, N., R. Ferriere, and S. Meleard. 2006. Unifying evolutionary dynamics: From individual stochastic process to macroscopic models. Theoretical Population Biology 69: 297–321.
Dembo, A., and O. Zeitouni. 1998. Large deviations and applications, 2nd ed. New York: Springer.
Eldredge, N., and S. Gould. 1972. Punctuated equilibrium: An alternative to phyletic gradualism.
In Models in Paleobiology, ed. T. Schopf, 82–115. San Francisco: Cooper and Co.
Emery, M. 1989. Stochastic calculus in manifolds, Universitext Series. New York: Springer.
Goldenfeld, N., and C. Woese. 2010. Life is physics: Evolution as a collective phenomenon far from
equilibrium. arXiv: 1011.4125v1 [q-bio.PE]
Gould, S.J. 2002. The structure of evolutionary theory. Cambridge, MA: Harvard University Press.
Hodgson, G., and T. Knudsen. 2010. Darwin’s Conjecture: The search for general principles of
social and economic evolution. Chicago, IL: University of Chicago Press.
Langton, C. 1992. Life at the edge of chaos. In Artificial Life II, ed. C. Langton, C. Taylor, J. Farmer,
and S. Rasmussen. Reading MA: Addison-Wesley.
Mao, X. 2007. Stochastic differential equations and applications, 2nd ed. Philadelphia: Woodhead
Publishing.
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Sereno, M. 1991. Four analogies between biological and cultural/linguistic evolution. Journal of
Theoretical Biology 151: 467–507.
Von Neumann, J. 1966. Theory of self-reproducing automata. University of Illinois Press.
Wallace, R. 2010. Expanding the modern synthesis. Comptes Rendus Biologies 333: 701–709.
Wallace, R. 2011. A formal approach to evolution as self-referential language. BioSystems 106:
36–44.
Wallace, R. 2013. A new formal approach to evolutionary processes in socioeconomic systems.
Journal of Evolutionary Economics 23: 1–15.
Wallace, R. 2014. A new formal perspective on ‘Cambrian explosions’. Comptes Rendus Biologies
337: 1–5.
Wallace, R. 2015a. An ecosystem approach to economic stabilization: Escaping the neoliberal
wilderness. London: Routledge.
Wallace, R. 2017b. Computational Psychiatry: A systems biology approach to the epigenetics of
mental disorders. New York: Springer.
Wallace, R. 2018. Canonical instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Wallace, R., and R. Fullilove. 2014. State policy and the political economy of criminal enterprise:
Mass incarceration and persistent organized hyperviolence in the USA. Structural Change and
Economic Dynamics 31: 17–31.
Chapter 7
Summary
The language of business is the language of dreams, but the language of war is the
language of nightmare made real. Yet business dreams of driverless cars on intelligent roads, and of other real-time critical systems under the control of algorithmic
entities, have much of war about them. Critical real-time systems, including military
institutions at the tactical, operational and strategic scales, act on rapidly-shifting
roadway topologies whose ‘traffic rules’ can themselves rapidly change. Indeed,
combat rules-of-the-game usually morph in direct response to an entity’s ‘driving
pattern’, in a large sense. ‘Defensive driving’ is something more than an oxymoron.
The conduct of war is never without both casualty and collateral damage. Real-time critical systems of any nature will inevitably partake of fog-of-war and frictional challenges nearly identical to those that have made warfare increasingly
intractable for modern states. Indeed, the destabilization of essential algorithmic
entities has become a new tool of war.
Into the world of Carl von Clausewitz, John Boyd, Mao Tse-Tung, Vo Nguyen
Giap and Genghis Khan, come the brash, bright-eyed techies of Waymo, Alphabet,
Microsoft, Amazon, Uber, and all the wannabes. They will forthrightly step in where a literal phalanx of angels has not feared to tread, but has already trodden very badly indeed.
For systems facing Clausewitz challenges, everybody always eventually screws
up, and there are always very many dead bodies. Nobody navigates, or can navigate,
such landscapes unscathed.
Something of this is, of course, already known within the tech industries, much as
the risks of tobacco, of PVC furnishings and finishings, and so on, have always been
well understood by the corporations that market them and deliberately obscure their
dangers. At best, heuristic measures such as ‘anytime algorithms’ or multi-subsystem
voting strategies are deemed sufficient to meet real-world conditions. What is not well
appreciated by the tech industries, however, is the utterly unforgiving nature of the
Clausewitz Zweikampf. A taste of these matters has been presented here in a number
of narrative military vignettes: the ‘best and the brightest’ have always screwed up.
Even the Russians lost some 4 million men in WWII before they learned how to
systematically overcome Prussian Bewegungskrieg, as at Stalingrad and Kursk.
If you are doing AI and all this seems irrelevant, you are fucking clueless.
Where fog-of-war and frictional challenges are both infrequent and small, AI systems will be relatively reliable, perhaps as reliable as the already highly-automated
power grids. Introduce those challenges, and AI will fail as badly as has military enterprise. Internet-of-X real-time systems—V2V/V2I, etc.—will be particularly susceptible to large-scale blackout analogs (e.g., Wallace 2018, and references therein).
If these are critical systems, then considerable morbidity and mortality must be
expected.
Deep learning and reinforcement learning AI, when confronted with novel, and
often highly cognitive, challenges that 'get inside the command decision loop', can
be expected to fail. Under heavy load, the command decision loop time constant
will relentlessly increase, providing opportunity for inadvertent or deliberate short-circuiting leading to failure. Indeed, minor perturbations of any nature at the functional equivalent of 'rush hour' will have increased probability of amplification to
debilitating meso-, and often macro-scale, phase transitions. This is pretty much
written in stone, as are the associated coevolutionary ‘flash-crash’ and extended
self-referential dynamics that take matters beyond John Boyd’s OODA loop.
The persistent and characteristic failures of military enterprises confronted by
Clausewitz challenges raise a red flag for tech industries hell-bent on marketing AI
for the control of real-time critical systems. The current trajectory of both policy
and practice suggests that, at best, the liability lawyers are going to get rich beyond
dreams of avarice. Worst-case scenarios involve large-scale 'flash crashes' among contending military AI systems.
Reference
Wallace, R. 2018. Canonical instabilities of autonomous vehicle systems: The unsettling reality
behind the dreams of greed. New York: Springer.
Appendix A
Mathematical Appendix
A.1 The Black-Scholes Model
Take H(Γ) as the control information rate 'cost' of stability at the index level Γ. What is the mathematical form of H(Γ) under conditions of volatility, i.e., variability in Γ proportional to it? Let
dΓ_t = g(t, Γ_t)dt + bΓ_t dW_t    (A.1)
where dWt is taken as white noise and the function g(t, Γ ) will ‘fall out’ of the
calculation on the assumption of certain regularities.
Let H (Γt , t) be the minimum needed incoming rate of control information under
the Data Rate Theorem, and expand in Γ using the Ito chain rule (Protter 1990)
dH_t = [∂H/∂t + g(Γ_t, t)∂H/∂Γ + (1/2)b²Γ_t²∂²H/∂Γ²]dt + [bΓ_t∂H/∂Γ]dW_t    (A.2)
Define a quantity L as a Legendre transform of the rate H by convention having
the form
L = −H + Γ∂H/∂Γ    (A.3)
Since H is an information index, it is a kind of free energy in the sense of
Feynman (2000) and L is a classic entropy measure.
Heuristically, replacing dX with ΔX in these expressions and applying Eq. (A.2),

ΔL = (−∂H/∂t − (1/2)b²Γ²∂²H/∂Γ²)Δt    (A.4)
As in the classical Black-Scholes model (Black and Scholes 1973), the terms
in g and dWt ‘cancel out’, and the effects of noise are subsumed into the Ito correction factor, a regularity assumption making this an exactly solvable but highly
approximate model.
The conventional Black-Scholes calculation takes ΔL/Δt ∝ L. Here, at nonequilibrium steady state, we assume ΔL/Δt = ∂H/∂t = 0, so that
−(1/2)b²Γ²∂²H/∂Γ² = 0    (A.5)

By inspection,

H = κ_1Γ + κ_2    (A.6)

where the κ_i are nonnegative constants.
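The 'by inspection' step can be checked symbolically. A minimal Python sketch with sympy, using our own variable names, confirms that the general solution of Eq. (A.5) on Γ > 0 is linear in Γ:

import sympy as sp

Gamma, b = sp.symbols('Gamma b', positive=True)
H = sp.Function('H')

# Eq. (A.5): -(1/2) * b**2 * Gamma**2 * d2H/dGamma2 = 0
ode = sp.Eq(-sp.Rational(1, 2) * b**2 * Gamma**2 * H(Gamma).diff(Gamma, 2), 0)
print(sp.dsolve(ode, H(Gamma)))
# -> Eq(H(Gamma), C1 + C2*Gamma), i.e., H = kappa_1*Gamma + kappa_2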
A.2 Groupoids
Given a pairing, connection by a meaningful path to the same basepoint, it is
possible to define ‘natural’ end-point maps α(g) = a j , β(g) = ak from the set of
morphisms G into A, and a formally associative product in the groupoid g1 g2
provided α(g1 g2 ) = α(g1 ), β(g1 g2 ) = β(g2 ), and β(g1 ) = α(g2 ). Then the product is defined, and associative, i.e., (g1 g2 )g3 = g1 (g2 g3 ), with inverse defined by
g = (a j , ak ), g −1 ≡ (ak , a j ).
In addition there are natural left and right identity elements λg , ρg such that
λg g = g = gρg .
An orbit of the groupoid G over A is an equivalence class for the relation a_j ∼_G a_k if and only if there is a groupoid element g with α(g) = a_j and β(g) = a_k. A groupoid
is called transitive if it has just one orbit. The transitive groupoids are the building
blocks of groupoids in that there is a natural decomposition of the base space of a
general groupoid into orbits. Over each orbit there is a transitive groupoid, and the
disjoint union of these transitive groupoids is the original groupoid. Conversely, the
disjoint union of groupoids is itself a groupoid.
The isotropy group of a ∈ X consists of those g in G with α(g) = a = β(g).
These groups prove fundamental to classifying groupoids.
If G is any groupoid over A, the map (α, β) : G → A × A is a morphism from
G to the pair groupoid of A. The image of (α, β) is the orbit equivalence relation
∼_G, and the functional kernel is the union of the isotropy groups. If f : X → Y
is a function, then the kernel of f , ker ( f ) = [(x1 , x2 ) ∈ X × X : f (x1 ) = f (x2 )]
defines an equivalence relation.
Groupoids may have additional structure. As Weinstein (1996) explains, a
groupoid G is a topological groupoid over a base space X if G and X are topological spaces and α, β and multiplication are continuous maps. A criticism sometimes
applied to groupoid theory is that their classification up to isomorphism is nothing
other than the classification of equivalence relations via the orbit equivalence relation and groups via the isotropy groups. The imposition of a compatible topological
structure produces a nontrivial interaction between the two structures. Below we will
introduce a metric structure on manifolds of related information sources, producing
such interaction.
In essence a groupoid is a category in which all morphisms have an inverse, here
defined in terms of connection by a meaningful path of an information source dual
to a cognitive process.
As Weinstein (1996) points out, the morphism (α, β) suggests another way of
looking at groupoids. A groupoid over A identifies not only which elements of A
are equivalent to one another (isomorphic), but it also parameterizes the different
ways (isomorphisms) in which two elements can be equivalent, i.e., all possible
information sources dual to some cognitive process. Given the information theoretic
characterization of cognition presented above, this produces a full modular cognitive
network in a highly natural manner.
Brown (1987) describes the basic structure as follows:
A groupoid should be thought of as a group with many objects, or with many identities... A
groupoid with one object is essentially just a group. So the notion of groupoid is an extension
of that of groups. It gives an additional convenience, flexibility and range of applications...
EXAMPLE 1. A disjoint union [of groups] G = ∪λ G λ , λ ∈ Λ, is a groupoid: the product
ab is defined if and only if a, b belong to the same G λ , and ab is then just the product in the
group G λ . There is an identity 1λ for each λ ∈ Λ. The maps α, β coincide and map G λ to
λ, λ ∈ Λ.
EXAMPLE 2. An equivalence relation R on [a set] X becomes a groupoid with α, β :
R → X the two projections, and product (x, y)(y, z) = (x, z) whenever (x, y), (y, z) ∈ R.
There is an identity, namely (x, x), for each x ∈ X ...
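Purely as an illustration, Brown's Example 2 can be realized directly in code. The following Python sketch (class and variable names are ours) builds the groupoid of an equivalence relation, with source and target maps α and β, the partial product, inverses, and the orbit decomposition:

from itertools import product

class EquivalenceGroupoid:
    """Groupoid of an equivalence relation R on a set X (Brown's Example 2).

    Morphisms are ordered pairs (x, y) in R; the product
    (x, y)(y, z) = (x, z) is defined only when endpoints match.
    """
    def __init__(self, X, classes):
        self.X = set(X)
        self.R = set()
        for cls in classes:          # R is the union of cls x cls
            self.R |= set(product(cls, cls))

    def alpha(self, g):              # source map
        return g[0]

    def beta(self, g):               # target map
        return g[1]

    def compose(self, g1, g2):
        if self.beta(g1) != self.alpha(g2):
            return None              # product undefined
        return (self.alpha(g1), self.beta(g2))

    def inverse(self, g):
        return (g[1], g[0])

    def orbits(self):
        """The orbit decomposition recovers the equivalence classes."""
        seen, orbs = set(), []
        for x in self.X:
            if x not in seen:
                orb = {y for y in self.X if (x, y) in self.R}
                seen |= orb
                orbs.append(orb)
        return orbs

G = EquivalenceGroupoid({'a', 'b', 'c'}, [{'a', 'b'}, {'c'}])
print(G.compose(('a', 'b'), ('b', 'a')))  # ('a', 'a'): an identity
print(G.orbits())                         # [{'a', 'b'}, {'c'}] (order may vary)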
Weinstein (1996) makes the following fundamental point:
Almost every interesting equivalence relation on a space B arises in a natural way as the
orbit equivalence relation of some groupoid G over B. Instead of dealing directly with the
orbit space B/G as an object in the category Smap of sets and mappings, one should consider instead the groupoid G itself as an object in the category Ghtp of groupoids and homotopy classes of morphisms.
It is, in fact, possible to explore homotopy in paths generated by information
sources.
A.3 Morse Theory
Morse theory examines relations between analytic behavior of a function—the location and character of its critical points—and the underlying topology of the manifold
on which the function is defined. We are interested in a number of such functions,
for example information source uncertainty on a parameter space and ‘second order’
iterations involving parameter manifolds determining critical behavior, for example
sudden onset of a giant component in a network model. We follow Pettini (2007).
The central argument of Morse theory is to examine an n-dimensional manifold
M as decomposed into level sets of some function f : M → R where R is the set of
real numbers. The a-level set of f is defined as
f −1 (a) = {x ∈ M : f (x) = a},
the set of all points in M with f (x) = a. If M is compact, then the whole manifold
can be decomposed into such slices in a canonical fashion between two limits, defined
by the minimum and maximum of f on M. Let the part of M below a be defined as
Ma = f −1 (−∞, a] = {x ∈ M : f (x) ≤ a}.
These sets describe the whole manifold as a varies between the minimum and
maximum of f .
Morse functions are defined as a particular set of smooth functions f : M → R as
follows. Suppose a function f has a critical point xc , so that the derivative d f (xc ) =
0, with critical value f (xc ). Then f is a Morse function if its critical points are
nondegenerate in the sense that the Hessian matrix J of second derivatives at xc ,
whose elements, in terms of local coordinates, are

J_ij = ∂²f/∂x_i∂x_j,
has rank n, which means that it has only nonzero eigenvalues, so that there are no
lines or surfaces of critical points and, ultimately, critical points are isolated.
The index of the critical point is the number of negative eigenvalues of J at xc .
A level set f −1 (a) of f is called a critical level if a is a critical value of f , that
is, if there is at least one critical point xc ∈ f −1 (a).
Again following Pettini (2007), the essential results of Morse theory are as follows:
1. If an interval [a, b] contains no critical values of f , then the topology of f −1 [a, v] does not
change for any v ∈ (a, b]. Importantly, the result is valid even if f is not a Morse function,
but only a smooth function.
2. If the interval [a, b] contains critical values, the topology of f −1 [a, v] changes in a manner
determined by the properties of the matrix J at the critical points.
3. If f : M → R is a Morse function, the set of all the critical points of f is a discrete subset
of M, i.e., critical points are isolated. This is Sard’s Theorem.
4. If f : M → R is a Morse function, with M compact, then on a finite interval [a, b] ⊂ R,
there is only a finite number of critical points p of f such that f ( p) ∈ [a, b]. The set of
critical values of f is a discrete set of R.
5. For any differentiable manifold M, the set of Morse functions on M is an open dense set
in the set of real functions of M of differentiability class r for 0 ≤ r ≤ ∞.
6. Some topological invariants of M, that is, quantities that are the same for all the manifolds
that have the same topology as M, can be estimated and sometimes computed exactly once
all the critical points of f are known: let the Morse numbers μi (i = 0, ..., m) of a function
f on M be the number of critical points of f of index i, (the number of negative eigenvalues
of J). The Euler characteristic of the complicated manifold M can be expressed as the
alternating sum of the Morse numbers of any Morse function on M,
χ = Σ_{i=0}^{m} (−1)^i μ_i.
The Euler characteristic reduces, in the case of a simple polyhedron, to
χ=V−E+F
where V, E, and F are the numbers of vertices, edges, and faces in the polyhedron.
7. Another important theorem states that, if the interval [a, b] contains a critical value of f
with a single critical point xc , then the topology of the set Mb defined above differs from
that of Ma in a way which is determined by the index, i, of the critical point. Then Mb is
homeomorphic to the manifold obtained from attaching to Ma an i-handle, i.e., the direct
product of an i-disk and an (m − i)-disk.
Matsumoto (2002) and Pettini (2007) provide details and further references.
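A small computational illustration of the machinery (the function f is our own toy example, not from the text): classify the critical points of a candidate Morse function on R² by the eigenvalue signs of its Hessian, in Python with sympy:

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2   # toy Morse function on R^2

grad = [sp.diff(f, v) for v in (x, y)]
hess = sp.hessian(f, (x, y))

for pt in sp.solve(grad, (x, y), dict=True):
    J = hess.subs(pt)                       # Hessian at the critical point
    eigs = list(J.eigenvals().keys())
    index = sum(1 for e in eigs if e < 0)   # number of negative eigenvalues
    nondeg = all(e != 0 for e in eigs)      # Morse nondegeneracy check
    print(pt, 'index =', index, 'nondegenerate =', nondeg)

# Output: a saddle (index 1) at (-1, 0) and a minimum (index 0) at (1, 0).
# For a Morse function on the 2-sphere with one minimum and one maximum,
# the alternating sum gives chi = (-1)**0 + (-1)**2 = 2, as expected.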
A.4 The Metric M
To reiterate, cognition involves choice that reduces uncertainty, and reduction of
uncertainty implies existence of an information source ‘dual’ to the cognitive process
studied (e.g., Atlan and Cohen 1998; Wallace and Fullilove 2008, Sect. 3.1). That
information source need not be ergodic in the formal sense of information theory,
and this introduces serious difficulties.
Again, for stationary non-ergodic information sources, a function, H(x^n), of each path x^n → x, may still be defined, such that lim_{n→∞} H(x^n) = H(x) holds (Khinchin 1957, p. 72). However, H will not be given by the simple cross-sectional law-of-large-numbers analog with the entropy-like form of Eq. (2.2).
It is possible to extend the theory to information sources supporting a standard
atlas/manifold topology (Glazebrook and Wallace 2009).
Let s ≡ d(x, x̂) ≥ 0 be a real number assigned to pairs of high probability paths
x and x̂ by an appropriate distortion measure d, as described in Cover and Thomas
(2006). Heuristically, for ‘nearly’ ergodic systems, one might expect something like
H(x̂) ≈ H(x) + s dH/ds|_{s=0}    (A.7)
to hold for s sufficiently small. The idea is to take a distortion measure as a kind of
Finsler metric, imposing resulting ‘global’ geometric structures for an appropriate
class of non-ergodic information sources. Possible interesting theorems, then, revolve
around what properties are metric-independent, in much the same manner as the Rate
Distortion Theorem is independent of the exact distortion measure chosen.
This sketch can be made more precise.
Take a set of ‘consonant’ paths x n → x, that is, paths consistent with the ‘grammar’ and ‘syntax’ of the information source dual to the cognitive process of interest.
Suppose, for all such x, there is an open set, U , containing x, on which the
following conditions hold:
(i) For all paths x̂ n → x̂ ∈ U , a distortion measure s n ≡ dU (x n , x̂ n ) exists.
(ii) For each path x n → x in U there exists a pathwise invariant function H (x n ) →
H (x), in the sense of Khinchin (1957, p.72). While such a function will almost
always exist, only in the case of an ergodic information source does it have the
mathematical form of an ‘entropy’ (Khinchin 1957). It can, however, in the sense
of Feynman (2000), still be characterized as homologous to free energy, since
Bennett’s elegant little machine (Feynman 2000) can still turn the information
in a message from a nonergodic information source into work.
(iii) A function M_U(s^n, n) ≡ M_n → M exists, for example,

M_n = s^n, log[s^n]/n, s^n/n    (A.8)

and so on.
(iv) The limit

lim_{n→∞} [H(x^n) − H(x̂^n)]/M_n ≡ dH/dM    (A.9)

exists and is finite.
Another approach approximates ergodicity on a local open ‘tangent set’ of paths,
much as a topological manifold can be locally approximated on an open set by a
mapping to a simple tangent plane. Different cognitive phenomena have, according
to our development, dual information sources, and we are interested in the local
properties of the system near a particular reference state.
Impose a topology on the system, so that, near a particular ‘language’ A, dual to
an underlying cognitive process, there is an open set U of closely similar languages
Â, such that A, Â ⊂ U . It may be necessary to coarse-grain the system’s responses
to define these information sources. The problem is to proceed in such a way as to
preserve the underlying essential topology, while eliminating ‘high frequency noise’.
Since the information sources dual to the cognitive processes are similar, for all
pairs of languages A, Â in U , it is possible to:
1. Create an embedding alphabet which includes all symbols allowed to both of
them.
2. Define an information-theoretic distortion measure in that extended, joint alphabet between any high probability (i.e., grammatical and syntactical) paths in A
and Â, which we write as d(Ax, Âx) (Cover and Thomas 2006). These languages
do not interact, in this approximation.
3. Define a metric on U (Glazebrook and Wallace 2009):

M(A, Â) = |∫_{A,Â} d(Ax, Âx) − ∫_{A,A} d(Ax, Ax̂)|    (A.10)
using an appropriate integration limit argument. The second integration is over different paths within A itself, while the first is between different paths in A and Â.
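Purely to fix ideas, a discrete analog of Eq. (A.10) can be sketched in Python, with finite sums standing in for the integrals, Hamming distortion standing in for a general distortion measure d, and all names ours:

import itertools
import random

def hamming(p, q):
    """Average Hamming distortion between two equal-length symbol paths."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def mean_distortion(paths1, paths2):
    """Distortion averaged over all pairs of paths from two languages."""
    pairs = list(itertools.product(paths1, paths2))
    return sum(hamming(p, q) for p, q in pairs) / len(pairs)

def metric_M(A_paths, Ahat_paths):
    """Discrete analog of Eq. (A.10): |between-language - within-A distortion|."""
    return abs(mean_distortion(A_paths, Ahat_paths)
               - mean_distortion(A_paths, A_paths))

# Hypothetical 'high probability paths' from two similar sources A and A-hat.
random.seed(0)
A    = [[random.choice('01') for _ in range(20)] for _ in range(10)]
Ahat = [[random.choice('01') for _ in range(20)] for _ in range(10)]
print(metric_M(A, Ahat))   # small for closely similar sources
print(metric_M(A, A))      # 0 by construction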
A.5 Cognitive Renormalization
Equation (1.48) states that the 'free energy' F and the correlation length, the degree of coherence on the underlying network, scale under renormalization clustering in chunks of size L as

F[K_L, J_L]/f(L) = F[K, J]
χ[K_L, J_L]L = χ(K, J)

with f(1) = 1, K_1 = K, J_1 = J, and we have slightly rearranged terms.
Differentiating these two equations with respect to L, so that the right hand sides are zero, and solving for dK_L/dL and dJ_L/dL gives, after some consolidation, expressions of the form

dK_L/dL = u_1 d log(f)/dL + u_2/L
dJ_L/dL = v_1 J_L d log(f)/dL + (v_2/L) J_L    (A.11)
The u_i, v_i, i = 1, 2 are functions of K_L, J_L, but not explicitly of L itself.
We expand these equations about the critical value K_L = K_C and about J_L = 0, obtaining

dK_L/dL = (K_L − K_C) y d log(f)/dL + (K_L − K_C) z/L
dJ_L/dL = w J_L d log(f)/dL + x J_L/L    (A.12)

The terms y = du_1/dK_L|_{K_L=K_C}, z = du_2/dK_L|_{K_L=K_C}, w = v_1(K_C, 0), x = v_2(K_C, 0) are constants.
Solving the first of these equations gives

K_L = K_C + (K − K_C)L^z f(L)^y    (A.13)

again remembering that K_1 = K, J_1 = J, f(1) = 1.
Wilson’s (1971) essential trick is to iterate on this relation, which is supposed to
converge rapidly near the critical point, assuming that for K L near K C , we have
K C /2 ≈ K C + (K − K C )L z f (L ) y
(A.14)
We iterate in two steps, first solving this for f(L) in terms of known values, and then solving for L, finding a value L_C that we then substitute into the first of Eq. (1.48) to obtain an expression for F[K, 0] in terms of known functions and parameter values.
The first step gives the general result

f(L_C) ≈ [K_C/(K_C − K)]^{1/y} / (2^{1/y} L_C^{z/y})    (A.15)
Solving this for L_C and substituting into the first expression of Eq. (1.48) gives, as a first iteration of a far more general procedure (Shirkov and Kovalev 2001), the result

F[K, 0] ≈ F[K_C/2, 0]/f(L_C) = F_0/f(L_C)
χ(K, 0) ≈ χ(K_C/2, 0)L_C = χ_0 L_C    (A.16)

which are the essential relationships.
Note that a power law of the form f(L) = L^m, m = 3, which is the direct physical analog, may not be cognitively reasonable, since it says that 'language
physical analog, may not be cognitively reasonable, since it says that ‘language
richness’ can grow very rapidly as a function of increased network size. Such rapid
growth is simply not observed.
Taking the more realistic example of non-integral 'fractal' exponential growth,

f(L) = L^δ    (A.17)

where δ > 0 is a real number which may be quite small, we can solve for L_C, obtaining

L_C = [K_C/(K_C − K)]^{1/(δy+z)} / 2^{1/(δy+z)}    (A.18)

for K near K_C. Note that, for a given value of y, one might characterize the relation
α ≡ δy + z = constant as a ‘tunable universality class relation’ in the sense of Albert
and Barabasi (2002).
Substituting this value for L_C back gives a complex expression for F, having three parameters: δ, y, and z.
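Equation (A.18) is easily checked against the fixed-point condition of Eq. (A.14). A short Python sketch with hypothetical parameter values:

import numpy as np

# Hypothetical parameters; f(L) = L**delta as in Eq. (A.17).
KC, K, y, z, delta = 1.0, 0.8, 1.5, 0.7, 0.25

LC = (KC / (KC - K))**(1 / (delta*y + z)) / 2**(1 / (delta*y + z))  # Eq. (A.18)

# Check K_C/2 = K_C + (K - K_C) * L**z * f(L)**y at L = L_C:
print(KC + (K - KC) * LC**z * (LC**delta)**y, KC / 2)  # both 0.5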
A more interesting choice for f (L ) is a logarithmic curve that ‘tops out’, for
example
f(L) = m log(L) + 1    (A.19)
Again f (1) = 1.
A late version of the computer algebra program Mathematica solves for L_C as

L_C = [Q/LambertW(Q exp(z/(my)))]^{y/z}    (A.20)
where

Q ≡ (z/my) 2^{−1/y} [K_C/(K_C − K)]^{1/y}
The transcendental function LambertW(x) is defined by the relation

LambertW(x) exp(LambertW(x)) = x
The function arises in the theory of random networks and in renormalization strategies
for quantum field theories.
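Since Eq. (A.20) is exact for this choice of f, it can be verified numerically. A Python sketch with scipy, using hypothetical parameter values:

import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

KC, K, y, z, m = 1.0, 0.8, 1.5, 0.7, 2.0

f = lambda L: m * np.log(L) + 1.0                        # Eq. (A.19)
g = lambda L: KC / 2 - (KC + (K - KC) * L**z * f(L)**y)  # Eq. (A.14), rearranged

# Closed form, Eq. (A.20):
Q = (z / (m * y)) * 2**(-1 / y) * (KC / (KC - K))**(1 / y)
LC_closed = (Q / lambertw(Q * np.exp(z / (m * y))).real)**(y / z)

# Direct numerical root of the same condition:
LC_numeric = brentq(g, 1.0, 1e6)
print(LC_closed, LC_numeric)  # agree to numerical precision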
An asymptotic relation for f (L ) would be of particular interest, implying that
‘computational richness’ increases to a limiting value with system growth. Taking
f(L) = exp[m(L − 1)/L]    (A.21)
gives a system which begins at 1 when L = 1, and approaches the asymptotic limit
exp(m) as L → ∞. Mathematica finds
L_C = (my/z)/LambertW(A)    (A.22)

where

A ≡ (my/z) exp(my/z) [2^{1/y} [K_C/(K_C − K)]^{−1/y}]^{y/z}
Applying these latter results to the Zurek calculation on fragment size, Eq. (1.53),
has yet to be done.
A.6 On Evolutionary Game Theory
In contrast to the empirical methodology of Chap. 6, where equations must fit data,
evolutionary game theory is supposed to provide both a necessary and sufficient
model of evolutionary dynamics. The underlying formalism is the replicator equation
of Taylor and Jonker (1978). We follow the presentation of Roca et al. (2009).
Given an evolutionary game with a payoff matrix W , the dynamics of the distribution of strategy frequencies, xi , as elements of a vector x, follow the relation
dx_i/dt = x_i[(Wx)_i − x^T W x]    (A.23)

The term x^T W x ensures that Σ_i x_i = 1. The implications are then derived by
recourse to dynamical systems theory. An appropriate change of variables converts
the equation to a system of the Lotka-Volterra type.
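A minimal Python sketch of the replicator dynamic of Eq. (A.23), integrated by a forward-Euler step (the payoff matrix and step size are our own illustrative choices):

import numpy as np

def replicator_step(x, W, dt=0.01):
    """One Euler step of dx_i/dt = x_i * [(W x)_i - x^T W x]."""
    fitness = W @ x               # (W x)_i: payoff to strategy i
    mean_fit = x @ fitness        # x^T W x: population-average payoff
    x = x + dt * x * (fitness - mean_fit)
    return x / x.sum()            # guard against numerical drift

# Hypothetical 2x2 anti-coordination payoff matrix.
W = np.array([[0.0, 3.0],
              [1.0, 2.0]])
x = np.array([0.9, 0.1])          # initial strategy frequencies
for _ in range(5000):
    x = replicator_step(x, W)
print(x)  # converges to the interior equilibrium (0.5, 0.5)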
Evolutionary game theory makes several assumptions.
1. The population is infinitely large.
2. Individuals meet randomly or play against every other individual, such that the payoff to a strategy is proportional to the payoff averaged over the current population state.
3. There are no mutations, so that strategies increase or decrease in frequency only
due to reproduction. In other words, adversaries do not learn from conflict.
4. The variation of the population is linear in the payoff difference.
Roca et al. (2009) find the approach lacking, particularly noting that non-mean-field effects may arise from temporal fluctuations or spatial correlations, along with questions of nonlinearity.
References
Albert, R., and A. Barabasi. 2002. Statistical mechanics of complex networks. Reviews of Modern
Physics 74: 47–97.
Atlan, H., and I. Cohen. 1998. Immune information, self-organization and meaning. International
Immunology 10: 711–717.
Black, F., and M. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political
Economy 81: 637–654.
Brown, R. 1987. From groups to groupoids: A brief survey. Bulletin of the London Mathematical
Society 19: 113–134.
Cover, T., and J. Thomas. 2006. Elements of information theory, 2nd ed. New York: Wiley.
Feynman, R. 2000. Lectures on computation. Boulder, CO: Westview Press.
Glazebrook, J.F., and R. Wallace. 2009. Rate distortion manifolds as model spaces for cognitive
information. Informatica 33: 309–346.
Khinchin, A. 1957. Mathematical foundations of information theory. New York: Dover Publications.
Matsumoto, Y. 2002. An introduction to Morse theory. Providence, RI: American Mathematical Society.
Pettini, M. 2007. Geometry and topology in Hamiltonian dynamics. New York: Springer.
Protter, P. 1990. Stochastic integration and differential equations. New York: Springer.
Roca, C., J. Cuesta, and A. Sanchez. 2009. Evolutionary game theory: Temporal and spatial effects
beyond replicator dynamics. Physics of Life Reviews 6: 208–249.
Shirkov, D., and V. Kovalev. 2001. The Bogoliubov renormalization group and solution symmetry in mathematical physics. Physics Reports 352: 219–249.
Taylor, P., and L. Jonker. 1978. Evolutionarily stable strategies and game dynamics. Mathematical
Biosciences 40: 145–156.
Wallace, R., and M. Fullilove. 2008. Collective consciousness and its discontents. New York:
Springer.
Weinstein, A. 1996. Groupoids: Unifying internal and external symmetry. Notices of the American Mathematical Society 43: 744–752.
Wilson, K. 1971. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Physical Review B 4: 3174–3183.