Information Theory in Intelligent Decision Making

Information Theory
in Intelligent Decision Making
Daniel Polani
Adaptive Systems and Algorithms Research Groups
School of Computer Science
University of Hertfordshire, United Kingdom
March 5, 2015
The Theory
Motivation
Artificial Intelligence
modelling cognition in humans
realizing human-level “intelligent” behaviour in machines (just performance: not necessarily imitating the biological substrate)
a jumble of various ideas to get the above points working
Question
Is there a joint way of understanding cognition?
Probability
we have probability theory as a theory of uncertainty
we have information theory for endowing probability with a sense of “metrics”
Random Variables
Def.: Event Space
Consider an event space Ω = {ω1, ω2, . . . }, finite or countably infinite, with a (probability) measure PΩ : Ω → [0, 1] s.t. ∑ω PΩ(ω) = 1. The ω are called events.
Def.: Random Variable
A random variable X is a map X : Ω → X with some outcome space X = {x1, x2, . . . } and induced probability measure PX(x) = PΩ(X⁻¹(x)).
We also write instead
PX(x) ≡ P(X = x) ≡ p(x) .
Neyman-Pearson Lemma I
Lemma
Consider observations x1, x2, . . . , xn of a random variable X and two potential hypotheses (distributions) p1 and p2 they could have been based upon.
Consider the test for hypothesis p1 to be given as (x1, x2, . . . , xn) ∈ A, where
A = { x′ = (x′1, x′2, . . . , x′n) | p1(x′1, x′2, . . . , x′n)/p2(x′1, x′2, . . . , x′n) ≥ C }
with some C ∈ R+.
Assume the rate α of false negatives p1(Ā) to be given (generated by p1, but not in A).
Let β be the rate of false positives p2(A).
Then: any test with false negative rate α′ ≤ α has false positive rate β′ ≥ β.
(Cover and Thomas, 2006)
Neyman-Pearson Lemma II
Proof
(Cover and Thomas, 2006)
Let A be as above and B some other acceptance region; let χA and χB be the respective indicator functions. Then for all x:
[χA(x) − χB(x)] [p1(x) − C p2(x)] ≥ 0 .
Multiplying out & summing:
0 ≤ ∑A (p1 − C p2) − ∑B (p1 − C p2)
  = (1 − α) − Cβ − (1 − α′) + Cβ′
  = C(β′ − β) − (α − α′)
Neyman-Pearson Lemma V
Consideration
assume the events xi are i.i.d.
the test becomes:
∏i p1(xi)/p2(xi) ≥ C
logarithmize:
∑i log p1(xi)/p2(xi) ≥ κ (:= log C)
Note: Kullback-Leibler Divergence
Average “evidence” growth per sample:
DKL(p1 || p2) = E_p1 [ log p1(X)/p2(X) ] = ∑_{x∈X} p1(x) log ( p1(x)/p2(x) )
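The growth of the accumulated log-likelihood ratio at rate DKL per sample can be checked empirically. A minimal sketch in Python, assuming (as in the plots that follow) Bernoulli hypotheses such as 0.40 vs. 0.60; the helper name kl_bernoulli is ours, not from the lecture:

```python
import math
import random

def kl_bernoulli(p1, p2):
    # D_KL(Bernoulli(p1) || Bernoulli(p2)) in bits
    return (p1 * math.log2(p1 / p2)
            + (1 - p1) * math.log2((1 - p1) / (1 - p2)))

random.seed(0)
p1, p2 = 0.40, 0.60    # the two hypotheses; data actually drawn from p1
n = 10000
llr = 0.0
for _ in range(n):
    x = 1 if random.random() < p1 else 0
    # per-sample evidence log2( p1(x) / p2(x) )
    llr += math.log2(p1 / p2) if x else math.log2((1 - p1) / (1 - p2))

print(llr / n, kl_bernoulli(p1, p2))  # empirical growth rate vs. D_KL
```

The empirical rate matches DKL(p1||p2) up to sampling noise, which is exactly the slope of the straight lines in the figures below.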
Neyman-Pearson Lemma VI
[Figure: cumulative log-likelihood-ratio sum vs. number of samples (0–10000) for the Bernoulli tests “0.40_vs_0.60”, “0.50_vs_0.60”, “0.55_vs_0.60”]
Neyman-Pearson Lemma VII
[Figure: the same data overlaid with the straight lines n · DKL for each test pair, showing that the sums grow at rate DKL]
Part I
Information Theory — Motivation
Structural Motivation
Intrinsic Pathways to Information Theory
[Diagram: Information Theory at the centre, reachable along several intrinsic pathways: optimal communication, Shannon axioms, physical entropy, Laplace’s principle, typicality theory, optimal Bayes, Rate Distortion, information geometry — and from there towards AI]
Optimal Communication
Codes
task: send messages (disambiguate states) from sender to receiver
consider self-delimiting codes (without an extra delimiting character)
simple example: prefix codes
Def.: Prefix Codes
codes where no codeword is a prefix of another codeword
Prefix Codes
[Figure: binary code tree with branches labelled 0 and 1; codewords sit at the leaves, so no codeword is a prefix of another]
Kraft Inequality
Theorem
Assume events x ∈ X = {x1, x2, . . . , xk} are coded using prefix codewords over an alphabet of size b = |B|, with lengths l1, l2, . . . , lk for the respective events. Then one has
∑_{i=1}^{k} b^(−li) ≤ 1 .
Proof Sketch
(Cover and Thomas, 2006)
Let lmax be the length of the longest codeword. Expand the tree fully to level lmax. Fully expanded leaves are either: 1. codewords; 2. descendants of codewords; 3. neither.
A codeword of length li has b^(lmax − li) full-tree descendants; these must be different for the different codewords, and there cannot be more than b^lmax leaves in total. Hence
∑_i b^(lmax − li) ≤ b^lmax .
Remark
The converse also holds: for any lengths satisfying the inequality there exists a prefix code with these lengths.
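The inequality is easy to verify mechanically. A small sketch, assuming binary codewords given as strings; the helper names kraft_sum and is_prefix_free are ours:

```python
def kraft_sum(lengths, b=2):
    # sum of b^(-l_i) over the codeword lengths; <= 1 for any prefix code
    return sum(b ** (-l) for l in lengths)

def is_prefix_free(codewords):
    # prefix code: no codeword may be a prefix of another
    return not any(c1 != c2 and c2.startswith(c1)
                   for c1 in codewords for c2 in codewords)

code = ["0", "10", "110", "111"]          # a binary prefix code
print(is_prefix_free(code))               # True
print(kraft_sum([len(c) for c in code]))  # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
```

Here the Kraft sum attains the bound with equality, i.e. the code tree is fully used.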
Considerations — Most Compact Code
Assume
Want to code a stream of events x ∈ X appearing with probability p(x).
Minimize
Average code length E[L] = ∑i p(xi) li under the constraint ∑i b^(−li) = 1.
Note
1. try to make the li as small as possible
2. i.e. make the b^(−li) as large as possible
3. limited by the Kraft inequality; ideally it becomes an equality
Result
Differentiating the Lagrangian
∑i p(xi) li + λ ∑i b^(−li)
w.r.t. the li gives the codeword lengths for the “shortest” code:
li = −log_b p(xi)
as the li are integers, that is typically not exactly attainable
Average Codeword Length
∑i p(xi) · li = −∑x p(x) log_b p(x)
In the following, assume the binary log.
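For a dyadic distribution the ideal lengths −log2 p(xi) are integers and the average length attains the entropy exactly. A minimal sketch (using the integer rounding li = ⌈−log2 p(xi)⌉; the helper names are ours):

```python
import math

def entropy(p):
    # H(p) in bits, with the convention 0 log 0 = 0
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def shannon_lengths(p):
    # integer codeword lengths l_i = ceil(-log2 p_i)
    return [math.ceil(-math.log2(pi)) for pi in p]

p = [0.5, 0.25, 0.125, 0.125]         # a dyadic distribution
ls = shannon_lengths(p)
avg = sum(pi * li for pi, li in zip(p, ls))
print(ls, avg, entropy(p))            # lengths [1, 2, 3, 3]; average = H(X) = 1.75
```

For non-dyadic p the rounded lengths cost at most one extra bit per symbol on average.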
Entropy
Def.: Entropy
Consider the random variable X. Then the entropy H(X) of X is defined as
H(X) [≡ H(p)] := −∑x p(x) log p(x)
with the convention 0 log 0 ≡ 0.
Interpretations
average optimal codeword length
uncertainty (about the next sample of X)
physical entropy
much more . . .
Quote
“Why don’t you call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.”
John von Neumann
Meditation
Probability/Code Mismatch
Consider events x following a probability p(x), but a modeler mistakenly assuming probability q(x), with optimal code lengths −log q(x). Then the “code length waste per symbol” is given by
−∑x p(x) log q(x) + ∑x p(x) log p(x) = ∑x p(x) log ( p(x)/q(x) ) = DKL(p || q)
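This identity (cross-entropy minus entropy equals the Kullback-Leibler divergence) can be verified directly; a small numeric sketch with an arbitrary example distribution of our choosing:

```python
import math

def entropy(p):
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # average code length when coding p-distributed symbols
    # with the lengths -log2 q(x) optimal for q
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl(p, q):
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.7, 0.2, 0.1]      # true distribution
q = [1/3, 1/3, 1/3]      # mistaken model
waste = cross_entropy(p, q) - entropy(p)
print(waste, kl(p, q))   # the two agree: waste per symbol = D_KL(p||q)
```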
Part II
Types
A Tip of Types
(Cover and Thomas, 2006)
Method of Types: Motivation
consider sequences with the same empirical distribution
how many sequences have a particular empirical distribution?
what is the probability of such a sequence?
Sketch of the Method
consider, w.l.o.g., the binary event set X = {0, 1}
consider a sample x^(n) = (x1, . . . , xn) ∈ X^n
the type p_{x^(n)} is the empirical distribution of symbols y ∈ X in the sample x^(n). I.e. p_{x^(n)}(y) counts how often symbol y appears in x^(n), divided by n. Let Pn be the set of types with denominator n.
for p ∈ Pn, call the set of all sequences x^(n) ∈ X^n with type p the type class C(p) = { x^(n) | p_{x^(n)} = p }.
Type Theorem
Type Count
If |X| = 2, one has |Pn| = n + 1 different types for sequences of length n (easy to generalize).
Important
Pn grows only polynomially, but X^n grows exponentially with n. It follows that at least one type must contain exponentially many sequences. This corresponds to the “macrostate” in physics.
Theorem
(Cover and Thomas, 2006)
If x1, x2, . . . , xn is an i.i.d. sample sequence drawn from q, then the probability of x^(n) depends only on its type and is given by
2^(−n[ H(p_{x^(n)}) + DKL(p_{x^(n)} || q) ])
Corollary
If x^(n) has type q (here, we interpret the probability q as a type), then its probability is given by
2^(−n H(q))
A large value of H(q) indicates many possible candidates x^(n) and high uncertainty, a small value few candidates and low uncertainty.
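For the binary case the theorem is an exact identity, which a short sketch can confirm. We assume a hypothetical sample sequence of length 12 drawn under a Bernoulli parameter q = 0.3 (helper names h2, kl2, seq_prob are ours):

```python
import math

def h2(p):
    # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def kl2(p, q):
    # D_KL(Bernoulli(p) || Bernoulli(q)) in bits, with 0 log 0 = 0
    if p == 0.0:
        return -math.log2(1 - q)
    if p == 1.0:
        return -math.log2(q)
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def seq_prob(x, q):
    # probability of the i.i.d. binary sequence x under Bernoulli parameter q
    k = sum(x)
    return q ** k * (1 - q) ** (len(x) - k)

n, q = 12, 0.3
x = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]   # one sample sequence; its type is p = 4/12
p = sum(x) / n
direct = seq_prob(x, q)
via_type = 2 ** (-n * (h2(p) + kl2(p, q)))
print(n + 1, 2 ** n)       # only n+1 types, but 2^n sequences
print(direct, via_type)    # the two probabilities agree
```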
Part III
Laplace’s Principle and Friends
Laplace’s Principle of Insufficient Reason I
Scenario
Consider X. A probability distribution is assumed on X, but it is unknown.
Laplace’s principle of insufficient reason states that, in the absence of any reason to assume that the outcomes are inequivalent, the probability distribution on X is assumed to be the equidistribution.
Question
How to generalize when something is known?
Answer: Types
Dominant Sample Sequence
Remember: the probability of sequences in the type class C(q) is
2^(−n H(q))
A priori, a probability q maximizing H(q) will generate sequence types dominating all others.
Maximum Entropy Principle
Maximize: H(q) with respect to q
Result: equidistribution q(x) = 1/|X|
Sanov’s Theorem I
Theorem
Consider an i.i.d. sequence X1, X2, . . . , Xn of random variables, distributed according to q(X). Let further E be a set of probability distributions.
Then (amongst others), if E is closed and with p* = arg min_{p∈E} D(p || q), one has
(1/n) log q^(n)(E) −→ −D(p* || q)
[Figure: the set E of distributions, with p* the element of E closest to q in Kullback-Leibler divergence]
Sanov’s Theorem II
Interpretation
p is unknown, but one knows constraints on p (e.g. some condition, such as some mean value Ū = ∑x p(x) U(x), must be attained, i.e. the set E is given); then the dominating types are those close to p*.
Special Case
If the prior q is the equidistribution (indifference), then minimizing D(p || q) under the constraints E is equivalent to maximizing H(p) under these constraints.
Jaynes’ Maximum Entropy Principle
Sanov’s Theorem III
Jaynes’ Principle
generalization of Laplace’s Principle
maximally uncommitted distribution
Maximum Entropy Distributions I
No constraints
We are interested in maximizing
H(X) = −∑x p(x) log p(x)
over all probabilities p. The probability p lives in the simplex
∆ = { q ∈ R^|X| | ∑i qi = 1, qi ≥ 0 }
The maximization requires respecting constraints, of which we for now consider only ∑x p(x) = 1. (The edge constraints qi ≥ 0 happen not to be invoked here.)
Maximum Entropy Distributions II
No constraints
Unconstrained maximization via Lagrange:
max_p [ −∑x p(x) log p(x) + λ ∑x p(x) ]
Taking the derivative ∇_{p(x)} gives
−log p(x) − 1 + λ = 0 .
Thus p(x) = e^(λ−1) ≡ 1/|X| — the equidistribution.
Maximum Entropy Distributions
Linear Constraints
Constraints are now
∑x p(x) = 1
∑x p(x) f(x) = f̄ .
Derive the Lagrangian condition
0 = ∇_{p(x)} [ −∑x p(x) log p(x) + λ ∑x p(x) + µ ∑x p(x) f(x) ]
i.e.
−log p(x) − 1 + λ + µ f(x) = 0
so that one has the
Boltzmann/Gibbs Distribution
p(x) = e^(λ−1+µ f(x)) = (1/Z) e^(µ f(x))
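The multiplier µ is fixed by the mean constraint. A minimal sketch, assuming a hypothetical four-state system with "energies" f = (1, 2, 3, 4) and solving for µ by bisection (E_p[f] is monotonically increasing in µ; function names are ours):

```python
import math

def boltzmann(f, mu):
    # maximum-entropy distribution for a linear constraint: p(x) ~ e^(mu f(x))
    w = [math.exp(mu * fx) for fx in f]
    z = sum(w)
    return [wi / z for wi in w]

def maxent_for_mean(f, fbar, lo=-50.0, hi=50.0, iters=200):
    # bisect mu so that E_p[f] = fbar, assuming min(f) < fbar < max(f)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean = sum(p * fx for p, fx in zip(boltzmann(f, mid), f))
        if mean < fbar:
            lo = mid
        else:
            hi = mid
    return boltzmann(f, (lo + hi) / 2.0)

f = [1.0, 2.0, 3.0, 4.0]      # hypothetical "energies" of four states
p = maxent_for_mean(f, 2.0)   # maximum-entropy p with E[f] = 2
print(p, sum(pi * fi for pi, fi in zip(p, f)))
```

If f̄ equals the mean of f under the equidistribution (here 2.5), the solution has µ = 0 and reduces to the uniform distribution, recovering Laplace's principle.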
Part IV
Kullback-Leibler and Friends
Conditional Kullback-Leibler
DKL can be conditional:
DKL[ p(Y|x) || q(Y|x) ]   (for a given x)
DKL[ p(Y|X) || q(Y|X) ] = ∑x p(x) DKL[ p(Y|x) || q(Y|x) ]
Kullback-Leibler and Bayes
(Biehl, 2013)
Want to estimate p(x|θ), where θ is the parameter. Observe y. Seek the “best” q(x|y) for this y in the following sense:
1. minimize the DKL from the true distribution to the model distribution q
2. averaged over the possible observations y
3. averaged over θ:
min_q ∫ dθ p(θ) ∑y p(y|θ) DKL[ p(x|θ) || q(x|y) ]
Result
q(x|y) is the Bayesian inference obtained from p(y|x) and p(x)
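This result can be checked numerically on a small discrete model. A sketch under assumed toy distributions of our own choosing (binary θ, y, x, with x and y conditionally independent given θ): the Bayesian predictive q*(x|y) = ∑_θ p(θ|y) p(x|θ) should not be beaten by any perturbation of it.

```python
import math
import random

random.seed(1)

# hypothetical discrete model: parameter th, observation y, prediction x
p_th   = [0.5, 0.5]                  # p(theta)
p_y_th = [[0.8, 0.2], [0.3, 0.7]]    # p(y|theta)
p_x_th = [[0.9, 0.1], [0.2, 0.8]]    # p(x|theta)

def objective(q):
    # E_theta E_{y|theta} D_KL( p(x|theta) || q(x|y) )
    return sum(p_th[t] * p_y_th[t][y] * p_x_th[t][x]
               * math.log(p_x_th[t][x] / q[y][x])
               for t in range(2) for y in range(2) for x in range(2))

# Bayes: posterior p(theta|y) ~ p(theta) p(y|theta),
# predictive q*(x|y) = sum_theta p(theta|y) p(x|theta)
q_star = []
for y in range(2):
    w = [p_th[t] * p_y_th[t][y] for t in range(2)]
    post = [wi / sum(w) for wi in w]
    q_star.append([sum(post[t] * p_x_th[t][x] for t in range(2))
                   for x in range(2)])

base = objective(q_star)
# no perturbed q(x|y) does better than the Bayesian answer
worse = all(
    objective([[q_star[0][0] + e0, q_star[0][1] - e0],
               [q_star[1][0] + e1, q_star[1][1] - e1]]) >= base
    for e0, e1 in [(random.uniform(-0.05, 0.05),
                    random.uniform(-0.05, 0.05)) for _ in range(100)]
)
print(base, worse)
```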
Conditional Entropies
Special Case: Conditional Entropy
H(Y | X = x) := −∑y p(y|x) log p(y|x)
H(Y | X) := −∑x p(x) ∑y p(y|x) log p(y|x)
Information
Reduction of entropy (uncertainty) by knowing another variable:
I(X; Y) := H(Y) − H(Y|X)
        = H(X) − H(X|Y)
        = H(X) + H(Y) − H(X, Y)
        = DKL[ p(x, y) || p(x) p(y) ]
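The last identity gives a direct way to compute mutual information from a joint table. A minimal sketch (function name ours), illustrated on two extreme cases:

```python
import math

def mutual_information(pxy):
    # I(X;Y) = D_KL( p(x,y) || p(x)p(y) ), in bits
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(pxy[i][j] * math.log2(pxy[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if pxy[i][j] > 0)

# independent variables carry zero information
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# a noiseless binary channel carries one full bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```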
Part V
Towards Reality
Rate/Distortion Theory
Coding below specifications
Reminder
Information is about sending messages. We considered the most compact codes over a given noiseless channel. Now consider the situation where either:
1. the channel is not noiseless but has noisy characteristics p(x̂|x), or
2. we cannot afford to spend an average of H(X) bits per symbol to transmit.
Question
What happens? Total collapse of transmission?
Rate/Distortion Theory I
Distortion
“Compromise”
no longer insist on perfect transmission
accept a compromise, measure the distortion d(x, x̂) between the original x and the transmitted x̂
small distortion good, large distortion “baaad”
Theorem: Rate Distortion Function
Given p(x) for the generation of symbols X,
R(D) := min_{p(x̂|x) : E[d(X,X̂)] = D} I(X; X̂)
where the mean is over p(x, x̂) = p(x̂|x) p(x).
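R(D) can be computed numerically with the Blahut-Arimoto alternation (not described in the slides, but the standard algorithm for this minimization). A sketch for a fair binary source with Hamming distortion, where the curve R(D) = 1 − H2(D) is known in closed form and serves as a check; the slope parameter s < 0 selects one point on the curve:

```python
import math

def blahut_arimoto(px, d, s, iters=200):
    # for slope parameter s < 0, alternate between the optimal test channel
    # p(xhat|x) ~ q(xhat) e^(s d(x,xhat)) and its output marginal q(xhat),
    # then read off one point (D, R) on the rate-distortion curve
    n, m = len(px), len(d[0])
    q = [1.0 / m] * m
    for _ in range(iters):
        cond = []
        for i in range(n):
            w = [q[j] * math.exp(s * d[i][j]) for j in range(m)]
            z = sum(w)
            cond.append([wi / z for wi in w])
        q = [sum(px[i] * cond[i][j] for i in range(n)) for j in range(m)]
    D = sum(px[i] * cond[i][j] * d[i][j]
            for i in range(n) for j in range(m))
    R = sum(px[i] * cond[i][j] * math.log2(cond[i][j] / q[j])
            for i in range(n) for j in range(m) if cond[i][j] > 0)
    return D, R

def h2(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], s=-2.0)
print(D, R)  # R agrees with 1 - h2(D)
```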
Rate/Distortion Theory II
Distortion
[Figure: plot of a rate-distortion curve r(x) over [0, 1], convex and decreasing from about 1.8 at x = 0 towards 0]
First Example: Infotaxis
(Vergassola et al., 2007)
Information Theory
in Intelligent Decision Making
Applications
Thank You
Topics: informationtheoretic PA-loop invariants, empowerment (collective, continuous, in games), relevant information, digested information, world structure and graphs, sensor evolution, information distribution over the PA-loop, information flow, PA-loop models, collective systems, and further contributions
Collaborators: Alexander Klyubin, Christoph Salge, Cornelius Glackin, Chrystopher Nehaniv, Naftali Tishby, Thomas Martinetz, Philippe Capdepuy, Jan Kim, Malte Harder, Tobias Jung, Tom Anthony, Peter Stone, Sander van Dijk, Alexandra Mark, Achim Liese, Nihat Ay, Mikhail Prokopenko, Lars Olsson, Simon McGregor
This work was partially supported by the EC (FEELIX GROWING, FP6; FP7 ICT-270219), NSF, ONR, DARPA, and FHA.
Part VI
Crash Introduction
Modelling Cognition: Motivation from Biology
Question
Why/how did cognition evolve in biology?
Observations in biology
sensors are often highly optimized:
detection of a few molecules (moths) (Dusenbery, 1992)
detection of few or individual photons (humans/toads) (Hecht et al., 1942; Baylor et al., 1979)
the auditory sense operates close to thermal noise (Denk and Webb, 1989)
cognitive processing is very expensive (Laughlin et al., 1998; Laughlin, 2001)
Conclusions
Evidence
sensors often operate at physical limits
evolutionary pressure for high cognitive functions
But What For?
close the cycle
actions matter
“Entscheidend ist, was hinten rauskommt.” (“What matters is what comes out at the end.”)
Trade-Offs
sharpening sensors
improving processing
boosting actuators
“Was man nicht im Kopf hat, muss man in den Beinen haben.” (“What you don’t have in your head, you have to have in your legs.”)
Part VII
Information
Decisions, Decisions
Challenge
Linking sensors, processing and actuators
The Physical and the Biological
Physics:
given dynamical equations etc.
known (in principle)
Biological Cognition:
no established unique model
complex, difficult to untangle
Robotic Cognition:
many near-equivalent incompatible solutions and architectures
often specific and hand-designed
Problem
Considerable arbitrariness in the treatment of cognition
Idea
Issues
uniform treatment of cognition
distinguish:
essential
incidental
aspects of computation
Proposal: “Covariant” Modeling of Computation
Physics:
observations may depend on “coordinate system”
for same underlying phenomenon
Cognition:
computation may depend on architecture
but essentially computes “the same concepts”
Bottom Line
“coordinate-” (mechanism-)free view of cognition?
Landauer’s Principle
Fundamental Limits for Information Processing
On the lowest level: one cannot fully separate physics and information processing
Consequence: erasure of information from a “memory” creates heat
Connection: of energy and information
[Diagram: joint state (Wt, Mt) of world Wt and memory Mt evolving to (Wt+1, Mt+1)]
Informational Invariants: Beyond Physics
Law of Requisite Variety
(Ashby, 1956; Touchette and Lloyd, 2000, 2004)
Ashby: “only variety can destroy variety”
extension by Touchette/Lloyd
Open-Loop Controller: max. entropy reduction ∆H*_open
Closed-Loop Controller: max. entropy reduction ∆H*_closed ≤ ∆H*_open + I(Wt; At)
[Diagram: chain of world states . . . Wt−3 → Wt−2 → Wt−1 → Wt → Wt+1 → Wt+2 . . . , with actions At−3, . . . , At+1 acting on the transitions]
Informational Invariants: Scenario
Core Statement
Task: consider e.g. a navigational task
Informationally: reduction of the entropy of the initial (arbitrary) state
Example:
[Figure: trajectories in the plane (x and y from −10 to 10) contracting towards the centre]
Information Bookkeeping
Bayesian Network
[Diagram: perception-action loop unrolled in time: world states . . . Wt−3 → . . . → Wt+2 . . . , with sensor states St feeding actions At that act back on the world]
Informational “Conservation Laws”
Total Sensor History: S^(t) = (S0, S1, . . . , St−1)
Result:
lim_{t→∞} I(S^(t); W0) = H(W0)
(Klyubin et al., 2007), and see also (Ashby, 1956; Touchette and Lloyd, 2000, 2004)
Observations
Key Motto
There is no perpetuum mobile of the 3rd kind.
Actually, rather: there may be no free lunch, but sometimes there is free beer.
Information Balance Sheet
Task Invariant: H(W0) determines the minimum information required to get to the centre
Task Variant: but it can be spread/concentrated differently over:
time
environment and agents (“stigmergy”)
sensors and memory
(Klyubin et al., 2004a,b, 2007; van Dijk et al., 2010)
Note: the invariance is purely entropic: indifferent to the task
Next Step
refine towards specific tasks
Information for Decision Making
Replace the gradient follower by a general policy π
Dynamics
[Diagram: state chain . . . St−2 → St−1 → St → St+1 → St+2 . . . , with actions At−2, . . . , At+1 generated by the policy π]
Utility
V^π(s) := E_π[ Rt + Rt+1 + · · · | s ]
        = ∑a π(a|s) ∑_{s′} P^a_{ss′} [ R^a_{ss′} + V^π(s′) ]
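The recursion for V^π can be solved numerically by fixed-point iteration. A minimal sketch in its optimal-policy (max) variant, on a hypothetical deterministic 1-D corridor with reward −1 per step, a toy analogue of the grid worlds used below:

```python
# value iteration: V(s) = max_a sum_s' P^a_{ss'} [ R^a_{ss'} + V(s') ]
# toy corridor: states 0..4, goal at state 4 (absorbing, no further cost),
# actions left/right, deterministic moves, reward -1 per step
n = 5
V = [0.0] * n

def step(s, a):
    # deterministic transition, clipped at the corridor ends
    return max(0, min(n - 1, s + a))

for _ in range(50):
    V = [0.0 if s == n - 1 else
         max(-1 + V[step(s, a)] for a in (-1, +1))
         for s in range(n)]

print(V)  # [-4.0, -3.0, -2.0, -1.0, 0.0]: minus the distance to the goal
```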
A Parsimony Principle
Traditional MDP
Task: find the best policy π*
Traditional RL: does not consider decision costs
Credo: information processing is expensive in biology! (Laughlin et al., 1998; Laughlin, 2001; Polani, 2009)
Hypothesis: organisms trade off information-processing costs with task payoff (Tishby and Polani, 2011; Polani, 2009; Laughlin, 2001)
Therefore: include the information cost and expand to the I-MDP (Polani et al., 2006; Tishby and Polani, 2011)
Principle of Information Parsimony
minimize I(S; A) (relevant information) at a fixed utility level
Motto
It is a very sad thing that nowadays there is so little useless
information.
Oscar Wilde
Relevant Information and its Policies
Computation
Via the Lagrangian formalism: (Stratonovich, 1965; Polani et al., 2006; Belavkin, 2008, 2009; Still and Precup, 2012; Saerens et al., 2009; Tishby and Polani, 2011)
find:
min_π I(S; A) − β E[V^π(S)]
β → ∞: the policy is optimal while informationally parsimonious!
β finite: the policy is suboptimal at the fixed level E[V^π(S)] while informationally parsimonious
note that I(S; A) as well as V^π depend on π
Expectation
for higher utility, more relevant information is required — and vice versa
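This Lagrangian can be explored with a Blahut-Arimoto-style alternation, iterating π(a|s) ∝ p(a) e^(β Q(s,a)) and p(a) = ∑_s p(s) π(a|s). A sketch under the simplifying assumption of a fixed, hypothetical 2-state/2-action Q-table (in the full I-MDP treatment Q itself depends on π and is updated alongside it):

```python
import math

Q = {0: {"a": 1.0, "b": 0.0},   # hypothetical state-action utilities
     1: {"a": 0.0, "b": 1.0}}
ps = {0: 0.5, 1: 0.5}           # state distribution
acts = ["a", "b"]

def parsimonious_policy(beta, iters=200):
    pa = {a: 1.0 / len(acts) for a in acts}      # action marginal p(a)
    for _ in range(iters):
        pi = {}
        for s in ps:
            # pi(a|s) ~ p(a) e^(beta Q(s,a))
            w = {a: pa[a] * math.exp(beta * Q[s][a]) for a in acts}
            z = sum(w.values())
            pi[s] = {a: w[a] / z for a in acts}
        pa = {a: sum(ps[s] * pi[s][a] for s in ps) for a in acts}
    info = sum(ps[s] * pi[s][a] * math.log2(pi[s][a] / pa[a])
               for s in ps for a in acts)        # I(S;A) in bits
    util = sum(ps[s] * pi[s][a] * Q[s][a] for s in ps for a in acts)
    return info, util

lo_i, lo_u = parsimonious_policy(beta=0.1)
hi_i, hi_u = parsimonious_policy(beta=5.0)
print((lo_i, lo_u), (hi_i, hi_u))  # higher beta: more utility, but more I(S;A)
```

Sweeping β traces out exactly the kind of utility/information trade-off curve shown in the experiments that follow.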
Experiments
Scenario
Define (P^a_{ss′}, R^a_{ss′}) by:
States: grid world (with goal locations A and B)
Actions: north, east, south, west
Reward: each action produces a “reward” of −1 until the goal is reached
Experiment
Trade off utility and relevant information
Question
What form does the expected trade-off take?
Experiment — Find the Corner
[Figure: trade-off curve of E[Q(S, A)] (from −50 to 0) against I(S; A) (from 0 to about 1.2 bits)]
Optimal Case
goal B has higher utility than A
but needs a lot more information per step
Suboptimal Case
goal B much worse than goal A for the same information cost
Experiment — With a Twist I
Experiment Revisited
grid world again
consider only goal A
cost as before
The “Twist”
(Polani, 2011)
permute the directions north, east, south, west!
a random fixed permutation σs of the directions for each state s
replace (P^a_{ss′}, R^a_{ss′}) by (P̃^a_{ss′}, R̃^a_{ss′}) where
P̃^a_{ss′} := P^{σs(a)}_{ss′}
R̃^a_{ss′} := R^{σs(a)}_{ss′}
Experiment — With a Twist II
Expectation
as a traditional MDP, the “twisted” MDP (P̃^a_{ss′}, R̃^a_{ss′}) remains exactly equivalent:
same optimal values V*(s) = Ṽ*(s), s ∈ S
same optimal policy after undoing the twist
pre-/post-twist policies equivalent via
Q^π(s, a) = Q̃^{π̃}(s, σs(a))
π(s, a) = π̃(s, σs(a))
Experiment With a Twist: Uh-Oh!
[Figure: trade-off curve of E[V(S)] (from −50 to 0) against I(S; A) (from 0 to about 1.2 bits) for the original and the twisted MDP]
Optimal Case
sanity check: the utility is the same for the original and the twisted MDP
but the latter needs a lot more information per step
Suboptimal Case
the twisted MDP becomes much worse than the original at the same information cost
Intermediate Conclusions
Insights
as a traditional MDP, both experiments are fully equivalent
as an I-MDP, however . . .
there is a significant difference between
the agent “taking its actions with it” and
having a “realigned” set of actions at each step
embodiment allows offloading informational effort (e.g. Paul, 2006; Pfeifer and Bongard, 2007)
Part VIII
Goal-Relevant Information
Towards Multiple Goals
Extension
assume a family of tasks (e.g. multiple goals)
the action now depends on both state and goal
[Diagram: state chain St−1 → St → St+1 with actions At−1, At, each also depending on the goal G]
Goal-Relevant Information
I(G; At | st) = H(At | st) − H(At | G, st)
Goal-Relevant Information (Regularized)
min_{π(at|st,g)} I(G; At | St) − β E[V^π(St, G, At)]
Goal-Relevant and Sensor Information Trade-Offs
[Figure: trade-off curve of I(St; At | G) (0–0.6) against I(G; At | St) (0–1.6), traced out as α varies from 0 to 1]
Lagrangian
min_{π(at|st,g)} (1 − α) I(G; At | St) + α I(St; At | G) − β E[V^π(St, G, At)]
Information “Caching”

Note
not only how much goal-relevant information matters, but also which

Consider
Accessible History (Context): e.g. the action history
At−1 = (A0, A1, . . . , At−1)
“Cache Fetch”: new goal-relevant information not already used
I(At; G | At−1) = H(At | At−1) − H(At | G, At−1)
Subgoals

I(At; G | At−1, s): new goal information
I(At−1; G | At, s): discarded goal information
(van Dijk and Polani, 2011; van Dijk and Polani, 2013)

Psychological Connections?
Crossing doors causes forgetting
(see also Radvansky et al., 2011)
Efficient Relevant Goal Information
(van Dijk and Polani, 2013)

“Most Efficient” Goal
G −→ G̃1 −→ A ←− S

min_{I(G̃1; A|S) ≥ C} I(G; G̃1)

[Fig. 8 of van Dijk and Polani (2013): goal clusters induced by the bottleneck G̃1 on the primary goal-information pathway in a 6-room grid world navigation task, for increasing bottleneck cardinality |G̃1| = 3, 4, 5, 6]
Making State Predictive for Actions

“Most Enhancive” Goal
G −→ G̃2 −→ A ←− S

min_{I(S; A|G̃2) ≥ C} I(G; G̃2)

Insights
“spillover” ignoring local boundaries
action information induces global “frame of reference”
depends on action consistency

[Fig. 10 of van Dijk and Polani (2013): goal clusters induced by the bottleneck G̃2 on the secondary, state-information-modulating goal-information pathway in a 9-room grid world navigation task, for |G̃2| = 4, 5, 6, 7]
Part IX
Empowerment: Motivation
Universal Utilities

Problems
in biology, the success criterion is survival
the concepts of a “task” and a “reward” are not sharp
“search space” too large for full-fledged success feedback
pure Darwinism: feedback by death, and this is very sparse

Notes
Homeostasis: provides dense feedback networks to guide living beings
Problem:
specific to particular organisms
designed on a case-by-case basis for artificial agents
is there a more generalizable perspective, in view of the success of evolution?
Idea
Universal Drives and Utilities
Core Idea: adaptational feedback should be dense and rich
artificial curiosity, learning progress, autotelic principle,
intrinsic reward
(Schmidhuber, 1991; Kaplan and Oudeyer, 2004; Steels, 2004; Singh et al., 2005)
homeokinesis, and predictive information
(Der, 2001; Ay et al., 2008)
Physical Principle:
causal entropic forcing
(Wissner-Gross and Freer, 2013)
Present Ansatz
Use Embodiment
optimize informational fit into the sensorimotor niche
maximization of potential
to inject information into the environment (via actuators)
and recapture it from the environment (via sensors)
(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Here: Empowerment
Motto
“Being in control of one’s destiny
and knowing it
is good.”
(Jung et al., 2011)
More Precisely
information-theoretic version of
controllability (being in control of destiny)
observability (knowing about it)
combined
Formalism

Bayesian Network
[Diagram: perception-action loop with memory, unrolled in time: world states . . . Wt−3 → Wt−2 → . . . → Wt+2 . . ., each Wt emitting a sensor state St; memory states . . . Mt−3, . . . , Mt+1 . . . integrate the sensor stream, and actions At−3, . . . , At+1 are selected from memory and fed back into the world]

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Formalism

Bayesian Network
[Diagram: memoryless perception-action loop unrolled in time: world states . . . Wt−3 → Wt−2 → . . . → Wt+2 . . ., each Wt emitting a sensor state St, with action At selected from St and fed back into Wt+1; the actions At−k, . . . , At−1 are set as “free will” interventions]

Empowerment: Formal Definition

E(k) := max_{p(at−k, at−k+1, ..., at−1)} I(At−k, At−k+1, . . . , At−1; St)

conditioned on a concrete initial state (or context) wt−k, this becomes

E(k)(wt−k) := max_{p(at−k, ..., at−1 | wt−k)} I(At−k, . . . , At−1; St | wt−k)

or, abbreviating the k-step action sequence as A(k)t−k,

E(k)(wt−k) := max_{p(a(k)t−k | wt−k)} I(A(k)t−k; St | wt−k)

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
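The maximization in this definition is a channel-capacity computation over the channel p(st | a(k), wt−k) from action sequences to the resulting sensor state; for finite alphabets the standard tool is the Blahut-Arimoto algorithm. A minimal sketch (the function name and interface are illustrative, not taken from the cited implementations):

```python
import numpy as np

def empowerment_bits(p_s_given_a, tol=1e-10, max_iter=500):
    """Channel capacity max_{p(a)} I(A; S_t) in bits, via Blahut-Arimoto.

    p_s_given_a: array (n_sequences, n_states); row i is p(s_t | a_i, w),
    the distribution over the final sensor state under the i-th k-step
    action sequence from context w.
    """
    n_a = p_s_given_a.shape[0]
    p_a = np.full(n_a, 1.0 / n_a)          # start from the uniform distribution
    for _ in range(max_iter):
        p_s = p_a @ p_s_given_a            # induced marginal p(s_t)
        ratio = np.divide(p_s_given_a, p_s,
                          out=np.ones_like(p_s_given_a),
                          where=p_s_given_a > 0)
        # d[a] = KL( p(s|a) || p(s) ), in nats
        d = (p_s_given_a * np.log(ratio)).sum(axis=1)
        p_new = p_a * np.exp(d)
        p_new /= p_new.sum()
        if np.abs(p_new - p_a).max() < tol:
            p_a = p_new
            break
        p_a = p_new
    # mutual information at the (approximate) capacity-achieving p(a), in bits
    p_s = p_a @ p_s_given_a
    ratio = np.divide(p_s_given_a, p_s,
                      out=np.ones_like(p_s_given_a),
                      where=p_s_given_a > 0)
    return float(p_a @ (p_s_given_a * np.log2(ratio)).sum(axis=1))
```

For a deterministic world where 4 action sequences reach 4 distinct states, this returns 2 bits; when two sequences collapse onto the same state, the optimal p(a) redistributes mass accordingly.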
Empowerment — a Universal Utility
Notes
Empowerment E(k) (w) defined
given horizon k, i.e. local
given starting state w (or context, for POMDPs)
i.e. empowerment is a function of state, a “utility”
However
defined by the world dynamics only
no reward function assumed
Empowerment — Notes
Properties of Empowerment
want to maximize potential information flow
could be injected through the actuators
into the environment
and recaptured by sensors in the future
potential influence on the environment
which is detectable through agent sensors
determined only by the embodiment P^a_{ss'}
no external reward R^a_{ss'}
Bottom Line
information-theoretic controllability/observability
informational efficiency of the sensorimotor niche
Other Interpretations
Related Concepts
mobility
money
affordance
graph centrality
(Anthony et al., 2008)
antithesis to “helplessness”
(Seligman and Maier, 1967; Overmier and Seligman, 1967)
Think Strategic
Tactics is what you do when you have a plan
Strategy is what you do when you haven’t
Part X
First Examples
Maze Empowerment

[Figure: four mazes, each shown with its empowerment map and its average-distance map; empowerment ranges E ∈ [1; 2.32], E ∈ [1.58; 3.70], E ∈ [3.46; 5.52], E ∈ [4.50; 6.41]]
Empowerment vs. Average Distance

[Scatter plot: empowerment E (approx. 4.5 to 6.0) against average distance d (6 to 16) over maze positions]
Box Pushing

[Figure: empowerment maps for four scenarios, crossing stationary vs. pushable box with box invisible vs. visible to the agent; values E ∈ [5.86, 5.93], E = log2 61 ≈ 5.93 bit, E ∈ [5.86, 5.93], and E ∈ [5.93, 7.79]]
In the Continuum:
Pendulum Swing-up Task w/o Reward
(Jung et al., 2011)

Dynamics
pendulum (length l = 1, mass m = 1, gravity g = 9.81, friction µ = 0.05)

ϕ̈(t) = ( −µ ϕ̇(t) + mgl sin(ϕ(t)) + u(t) ) / (ml²)

with state st = (ϕ(t), ϕ̇(t)) and continuous control u ∈ [−5, 5].
system time discretized to ∆ = 0.05 sec
actions discretized to u ∈ {−5, −2.5, 0, +2.5, +5}

Goal
to provide this system with some matching purpose, consider the pendulum swing-up task

Comparison
empowerment-based control
traditional optimal control
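The time discretization above can be sketched with a simple Euler step (Jung et al. may use a more accurate integrator; this is only an illustrative toy):

```python
import math

# pendulum parameters from the slide: l = 1, m = 1, g = 9.81, mu = 0.05
L, M, G, MU = 1.0, 1.0, 9.81, 0.05
DT = 0.05                                  # time discretization, 0.05 sec
ACTIONS = [-5.0, -2.5, 0.0, 2.5, 5.0]      # discretized controls u

def step(state, u):
    """One Euler step of  phi'' = (-mu*phi' + m*g*l*sin(phi) + u) / (m*l^2)."""
    phi, phidot = state
    phiddot = (-MU * phidot + M * G * L * math.sin(phi) + u) / (M * L ** 2)
    return (phi + DT * phidot, phidot + DT * phiddot)
```

Enumerating all |ACTIONS|^k sequences through `step` yields the k-step successor table that the empowerment computation needs.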
Results: Performance

[Two plots of ϕ and ϕ̇ over time (0 to 10 sec) when following the respective greedy policy: left, the optimal policy (FVI+KNN on a 1000×1000 grid); right, the maximally empowered policy (3-step). For ϕ, the y-axis shows the height of the pendulum (+1 means upright, the goal state).]
Results: “Explored” Space

[Plot: empowerment-based exploration in the (ϕ, ϕ̇) phase plane, ϕ ∈ [−π, π], ϕ̇ ∈ [−6, 6] rad/s; visited states are marked by the greedily chosen action (Action 0 to Action 4)]
Empowerment: Acrobot
(Jung et al., 2011)
Setting
two-linked pendulum
actuation in hip only
Idea
Add LQR control to bang-bang control
Acrobot: Demo
Block’s World
(Salge, 2013)
Properties
scenario with modifiable world
deterministic (i.e. empowerment is log n where n is the
number of reachable states in horizon k)
agent can incorporate, place, destroy blocks and move
estimated via (highly incomplete) sampling
Empowered “Minecrafter”
(Salge, 2013)
Explorer Accompanying Robot
(Glackin et al., 2015)
[Figures: the project consortium and the Demonstrator II robot]
Part XI
References
Anthony, T., Polani, D., and Nehaniv, C. L. (2008). On preferred
states of agents: how global structure is reflected in local
structure. In Bullock, S., Noble, J., Watson, R., and Bedau,
M. A., editors, Artificial Life XI: Proceedings of the Eleventh
International Conference on the Simulation and Synthesis of
Living Systems, Winchester 5–8. Aug., pages 25–32. MIT
Press, Cambridge, MA.
Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman &
Hall Ltd.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E.
(2008). Predictive information and explorative behavior of
autonomous robots. European Journal of Physics B,
63:329–339.
Baylor, D., Lamb, T., and Yau, K. (1979). Response of retinal
rods to single photons. Journal of Physiology, London,
288:613–634.
Belavkin, R. (2008). The duality of utility and information in
optimally learning systems. In Proc. 7th IEEE International
Conference on ’Cybernetic Intelligent Systems’. IEEE Press.
Belavkin, R. V. (2009). Bounds of optimal learning. In Adaptive
Dynamic Programming and Reinforcement Learning, 2009.
ADPRL’09. IEEE Symposium on, pages 199–204. IEEE.
Biehl, M. (2013). Kullback-leibler and bayes. Internal Memo.
Cover, T. M. and Thomas, J. A. (2006). Elements of Information
Theory. Wiley, 2nd edition.
Denk, W. and Webb, W. W. (1989). Thermal-noise-limited
transduction observed in mechanosensory receptors of the
inner ear. Phys. Rev. Lett., 63(2):207–210.
Der, R. (2001). Self-organized acquisition of situated behavior.
Theory Biosci., 120:1–9.
Dusenbery, D. B. (1992). Sensory Ecology. W. H. Freeman and
Company, New York.
Glackin, C., Salge, C., Trendafilov, D., Greaves, M., Polani, D.,
Leu, A., Haque, S. J. U., Slavnić, S., and Ristić-Durrant, D.
(2015). An information-theoretic intrinsic motivation model
for robot navigation and path planning.
Hecht, S., Schlaer, S., and Pirenne, M. (1942). Energy, quanta and
vision. Journal of the Optical Society of America, 38:196–208.
Jung, T., Polani, D., and Stone, P. (2011). Empowerment for
continuous agent-environment systems. Adaptive Behaviour,
19(1):16–39. Published online 13. January 2011.
Kaplan, F. and Oudeyer, P.-Y. (2004). Maximizing learning
progress: an internal reward system for development. In Iida,
F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors,
Embodied Artificial Intelligence, volume 3139 of LNAI, pages
259–270. Springer.
Klyubin, A., Polani, D., and Nehaniv, C. (2007). Representations
of space and time in the maximization of information flow in
the perception-action loop. Neural Computation,
19(9):2387–2432.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004a).
Organization of the information flow in the perception-action
loop of evolved agents. In Proceedings of 2004 NASA/DoD
Conference on Evolvable Hardware, pages 177–180. IEEE
Computer Society.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004b). Tracking
information flow through the environment: Simple cases of
stigmergy. In Pollack, J., Bedau, M., Husbands, P., Ikegami,
T., and Watson, R. A., editors, Artificial Life IX: Proceedings
of the Ninth International Conference on Artificial Life, pages
563–568. MIT Press.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005a). All else
being equal be empowered. In Advances in Artificial Life,
European Conference on Artificial Life (ECAL 2005), volume
3630 of LNAI, pages 744–753. Springer.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005b).
Empowerment: A universal agent-centric measure of control.
In Proc. IEEE Congress on Evolutionary Computation, 2-5
September 2005, Edinburgh, Scotland (CEC 2005), pages
128–135. IEEE.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2008). Keep your
options open: An information-based driving principle for
sensorimotor systems. PLoS ONE, 3(12):e4018.
Laughlin, S. B. (2001). Energy as a constraint on the coding and
processing of sensory information. Current Opinion in
Neurobiology, 11:475–480.
Laughlin, S. B., de Ruyter van Steveninck, R. R., and Anderson,
J. C. (1998). The metabolic cost of neural information.
Nature Neuroscience, 1(1):36–41.
Nehaniv, C. L., Polani, D., Olsson, L. A., and Klyubin, A. S.
(2007). Information-theoretic modeling of sensory ecology:
Channels of organism-specific meaningful information. In
Laubichler, M. D. and Müller, G. B., editors, Modeling
Biology: Structures, Behaviour, Evolution, The Vienna Series
in Theoretical Biology, pages 241–281. MIT press.
Overmier, J. B. and Seligman, M. E. P. (1967). Effects of
inescapable shock upon subsequent escape and avoidance
responding. Journal of Comparative and Physiological
Psychology, 63:28–33.
Paul, C. (2006). Morphological computation: A basis for the
analysis of morphology and control requirements. Robotics
and Autonomous Systems, 54(8):619–630.
Pfeifer, R. and Bongard, J. (2007). How the Body Shapes the Way
We think: A New View of Intelligence. Bradford Books.
Polani, D. (2009). Information: Currency of life? HFSP Journal,
3(5):307–316.
Polani, D. (2011). An informational perspective on how the
embodiment can relieve cognitive burden. In Proc. IEEE
Symposium Series in Computational Intelligence 2011 —
Symposium on Artificial Life, pages 78–85. IEEE.
Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T. (2006).
Relevant information in optimized persistence vs. progeny
strategies. In Rocha, L. M., Bedau, M., Floreano, D.,
Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proc.
Artificial Life X, pages 337–343.
Radvansky, G. A., Krawietz, S. A., and Tamplin, A. K. (2011).
Walking through doorways causes forgetting: Further
explorations. The Quarterly Journal of Experimental
Psychology, 64(8):1632–1645.
Saerens, M., Achbany, Y., Fuss, F., and Yen, L. (2009).
Randomized shortest-path problems: Two related models.
Neural Computation, 21:2363–2404.
Salge, C. (2013). Block’s world. Presented at GSO 2013.
Schmidhuber, J. (1991). A possibility for implementing curiosity
and boredom in model-building neural controllers. In Meyer,
J. A. and Wilson, S. W., editors, Proc. of the International
Conference on Simulation of Adaptive Behavior: From
Animals to Animats, pages 222–227. MIT Press/Bradford
Books.
Seligman, M. E. P. and Maier, S. F. (1967). Failure to escape
traumatic shock. Journal of Experimental Psychology, 74:1–9.
Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically
motivated reinforcement learning. In Proceedings of the 18th
Annual Conference on Neural Information Processing Systems
(NIPS), Vancouver, B.C., Canada.
Steels, L. (2004). The autotelic principle. In Iida, F., Pfeifer, R.,
Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial
Intelligence: Dagstuhl Castle, Germany, July 7-11, 2003,
volume 3139 of Lecture Notes in AI, pages 231–242. Springer
Verlag, Berlin.
Still, S. and Precup, D. (2012). An information-theoretic approach
to curiosity-driven reinforcement learning. Theory in
Biosciences, 131(3):139–148.
Stratonovich, R. (1965). On value of information. Izvestiya of
USSR Academy of Sciences, Technical Cybernetics, 5:3–12.
Tishby, N. and Polani, D. (2011). Information theory of decisions
and actions. In Cutsuridis, V., Hussain, A., and Taylor, J.,
editors, Perception-Action Cycle: Models, Architecture and
Hardware, pages 601–636. Springer.
Touchette, H. and Lloyd, S. (2000). Information-theoretic limits of
control. Phys. Rev. Lett., 84:1156.
Touchette, H. and Lloyd, S. (2004). Information-theoretic approach
to the study of control systems. Physica A, 331:140–172.
van Dijk, S. and Polani, D. (2011). Grounding subgoals in
information transitions. In Proc. IEEE Symposium Series in
Computational Intelligence 2011 — Symposium on Adaptive
Dynamic Programming and Reinforcement Learning, pages
105–111. IEEE.
van Dijk, S. and Polani, D. (2013). Informational
constraints-driven organization in goal-directed behavior.
Advances in Complex Systems, 16(2-3). Published online, 30.
April 2013, DOI:10.1142/S0219525913500161.
van Dijk, S. G., Polani, D., and Nehaniv, C. L. (2010). What do
you want to do today? relevant-information bookkeeping in
goal-oriented behaviour. In Proc. Artificial Life, Odense,
Denmark, pages 176–183.
Vergassola, M., Villermaux, E., and Shraiman, B. I. (2007).
’infotaxis’ as a strategy for searching without gradients.
Nature, 445:406–409.
Wissner-Gross, A. D. and Freer, C. E. (2013). Causal entropic
forcing. Physical Review Letters, 110:168702.