
Learning to Satisfy
Actuator Networks
Mark Coates
National Science and
Engineering Research
Council of Canada (NSERC)
A Journey

“And what is a journey? Is it just… distance
traveled? Time spent? No. It's what
happens on the way, the things that happen
to you. At the end of the journey you're not
the same.”
Plan your Journey, Learn

“It's what happens on the way, the things that
happen to you.”

Sensor Networks

Local Actuation:


SANETs
Control sensors, control objects.
Modify the environment.

Learn causal relationships between actuations and
environmental (model) variables.

Plan behaviour to optimize performance
Sensor/Actuator Network

Set of actuators + associated sensors.

Actuators perform a
physical (modifying)
action.

Sensors monitor the
response of the system.

Quantify the net effect on
the system (positive or
negative)

Design actuation strategy
to optimize response
Causal Analysis

How can we infer the impact of an actuation
based on a set of observations?

In particular, how do we derive:
Manipulated Probability P(Y | X := x, Z=z)
From (observations based on)
Unmanipulated Probability P(Y | X = x, Z=z)
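To see why the manipulated and unmanipulated probabilities can differ, here is a minimal sketch with a single latent background variable U that influences both X and Y. The model and all numbers are illustrative assumptions, not taken from the talk, and the observed covariate Z is left out to keep the example small.

```python
# Hypothetical toy model (all numbers are illustrative assumptions):
# U is a latent background variable that influences both X and Y.
P_U = {0: 0.5, 1: 0.5}                 # P(U = u)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}        # P(X = 1 | U = u)


def p_y1(x, u):
    """P(Y = 1 | X = x, U = u) in the toy model."""
    return 0.1 + 0.3 * x + 0.5 * u


def observational(x):
    """P(Y = 1 | X = x): X = x was merely observed, so it carries information about U."""
    num = den = 0.0
    for u, pu in P_U.items():
        px = P_X1_GIVEN_U[u] if x == 1 else 1.0 - P_X1_GIVEN_U[u]
        num += pu * px * p_y1(x, u)
        den += pu * px
    return num / den


def manipulated(x):
    """P(Y = 1 | X := x): X is forced to x, so U keeps its prior distribution."""
    return sum(pu * p_y1(x, u) for u, pu in P_U.items())


observed_gap = observational(1) - observational(0)
causal_gap = manipulated(1) - manipulated(0)
```

In this toy model the observed gap is twice the causal gap, because units with U = 1 are both more likely to have X = 1 and more likely to respond.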
Example Problem

We wish to evaluate the average
effectiveness of a fertilizer

Local background variables (for example soil
moisture, temperature, salinity, weed density)
affect:


The successful reception of the fertilizer
The impact of the fertilizer on the crop
Causal Graph
u_j: local realizations of background variables with global distribution g
z_j: action by actuator (0/1 = off/on) [known or measured]
d_j: actuation received (0/1 = no/yes) [unobserved]
y_j: response (0/1 = negative/positive) [unobserved]
x_j, w_j: observed measurements, dependent on d_j and y_j.
Average Causal Effect (ACE)

Expectation (over latent variables) of:
[ Prob. of positive response given fertilizer −
Prob. of positive response without fertilizer ]

$$\mathrm{ACE}(D \to Y) = E_u\left[\, p(y_1 \mid d_1, u) - p(y_1 \mid d_0, u) \,\right]$$
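As a worked instance of the expectation over latent variables, the sketch below evaluates the ACE for a background variable with two states. The state names, the distribution g, and the response probabilities are hypothetical values chosen for illustration.

```python
def average_causal_effect(g, p_y1):
    """ACE(D -> Y) = sum over u of g(u) * [p(y1 | d1, u) - p(y1 | d0, u)].

    g:    dict mapping each latent state u to its probability g(u)
    p_y1: dict mapping (d, u) to P(Y = 1 | D = d, U = u)
    """
    return sum(prob * (p_y1[(1, u)] - p_y1[(0, u)]) for u, prob in g.items())


# Hypothetical example: soil is either "dry" or "moist".
g = {"dry": 0.3, "moist": 0.7}
p_y1 = {
    (1, "dry"): 0.4, (0, "dry"): 0.3,      # fertilizer helps little on dry soil
    (1, "moist"): 0.9, (0, "moist"): 0.5,  # and a lot on moist soil
}
ace = average_causal_effect(g, p_y1)       # 0.3 * 0.1 + 0.7 * 0.4
```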
Model the mapping, not the variable

Problem:



Latent variables u can be high dimensional;
Probability distribution g(u) can have complex
structure.
Approach:



We have binary variables Z, D, Y
We don’t care about the value of u_j and how that
directly influences d_j
What we do care about is how u impacts the
mappings Z → D and D → Y
New Causal Graph

cr: sixteen states

c = 0: inhibit
    1: pass
    2: flip
    3: activate

Much easier to estimate this distribution

[Figure: causal graph with nodes g, cr_j, z_j, d_j, x_j, y_j, w_j]

$$\mathrm{ACE}(D \to Y) = \sum_{s} \left[\, p(cr(s, 1)) - p(cr(s, 2)) \,\right]$$
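The ACE formula over the sixteen states can be checked numerically. The sketch below assumes one reading of the slide's notation: cr is a pair (s, r), where s is the state of the Z → D mapping and r the state of the D → Y mapping, each taking the values inhibit, pass, flip, activate. The example distribution is hypothetical.

```python
# The four binary-to-binary mappings, indexed as on the slide.
MAPPINGS = {
    0: lambda b: 0,        # inhibit: output is always 0
    1: lambda b: b,        # pass: output equals input
    2: lambda b: 1 - b,    # flip: output is the opposite of the input
    3: lambda b: 1,        # activate: output is always 1
}


def ace_from_types(g):
    """ACE = sum over s of [g(s, pass) - g(s, flip)], with g keyed by (s, r)."""
    return sum(g.get((s, 1), 0.0) - g.get((s, 2), 0.0) for s in range(4))


def ace_by_definition(g):
    """ACE = E[ Y under D := 1 ] - E[ Y under D := 0 ], using the D -> Y mapping r."""
    return sum(p * (MAPPINGS[r](1) - MAPPINGS[r](0)) for (_, r), p in g.items())


# Hypothetical distribution over a few of the sixteen states.
g = {(1, 1): 0.4, (0, 2): 0.1, (3, 3): 0.3, (2, 0): 0.2}
```

Only the pass and flip states of the D → Y mapping contribute: inhibit and activate give the same response with or without the actuation.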
Evaluating ACE

Estimate ACE by applying distributed EM algorithm
across the graphical model (model g(cr) as
multinomial)

Locally maximize the likelihood function:
$$L\left(z, d, w, y \mid \mathrm{ACE}(D \to Y)\right)$$

Expectation: calculate expected cr_j at each node

Maximization: average the expected cr_j to estimate
g(cr).
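The two EM steps can be sketched in a centralized, simplified form. This sketch assumes that each node observes (z, d, y) directly, whereas in the talk d and y are only seen through the measurements x and w, and that cr is a pair of mapping states for Z → D and D → Y. It illustrates the E and M steps for a multinomial g(cr) and is not the distributed algorithm itself.

```python
from itertools import product

# inhibit, pass, flip, activate
MAPPINGS = (lambda b: 0, lambda b: b, lambda b: 1 - b, lambda b: 1)
TYPES = list(product(range(4), range(4)))      # the 16 joint states (c, r)


def consistent(cr, z, d, y):
    """True if state (c, r) reproduces the observed d from z and y from d."""
    c, r = cr
    return MAPPINGS[c](z) == d and MAPPINGS[r](d) == y


def em_multinomial(data, n_iter=50):
    """Estimate g(cr) from a list of (z, d, y) triples, starting from uniform."""
    g = {cr: 1.0 / len(TYPES) for cr in TYPES}
    for _ in range(n_iter):
        totals = {cr: 0.0 for cr in TYPES}
        for z, d, y in data:
            # E-step: posterior over the states consistent with this node's data.
            weights = {cr: g[cr] for cr in TYPES if consistent(cr, z, d, y)}
            norm = sum(weights.values())
            for cr, w in weights.items():
                totals[cr] += w / norm
        # M-step: average the expected state indicators over all nodes.
        g = {cr: totals[cr] / len(data) for cr in TYPES}
    return g
```

Each observed triple is consistent with four of the sixteen states, so the data alone do not pin down g(cr); EM converges to a local maximizer of the likelihood that depends on the starting point.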
Sensor Network Evaluation

Tree network topology: An efficient mechanism for
data aggregation and dissemination.

Data aggregation (bottom-up)
Leaf nodes: Transmit E[cr_j] to parent node
Parent node: Performs aggregation and relays result to its own parent
Root node: Performs maximization


Result dissemination (top-down)

Each node broadcasts result to its children nodes
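The bottom-up and top-down passes can be sketched as two recursive traversals of the tree. The tree layout, node names, and the two-component vectors are hypothetical; a real deployment would exchange messages between nodes rather than call functions.

```python
def aggregate(tree, local, node):
    """Bottom-up pass: return (summed expectation vector, node count) for a subtree."""
    total = list(local[node])
    count = 1
    for child in tree.get(node, []):
        child_total, child_count = aggregate(tree, local, child)
        total = [a + b for a, b in zip(total, child_total)]
        count += child_count
    return total, count


def disseminate(tree, node, estimate, received):
    """Top-down pass: each node stores the result and forwards it to its children."""
    received[node] = estimate
    for child in tree.get(node, []):
        disseminate(tree, child, estimate, received)


def distributed_m_step(tree, local, root):
    """The root averages the aggregated expectations, then the result flows back down."""
    total, count = aggregate(tree, local, root)
    estimate = [t / count for t in total]
    received = {}
    disseminate(tree, root, estimate, received)
    return received


# Hypothetical four-node tree; each node holds a local expectation vector.
tree = {"root": ["a", "b"], "a": ["c"]}
local = {"root": [1.0, 0.0], "a": [0.0, 1.0], "b": [0.5, 0.5], "c": [0.5, 0.5]}
result = distributed_m_step(tree, local, "root")
```

Each parent forwards only a running sum and a count, so the message size does not grow with the size of the subtree.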
Influencing the Environment

Design an actuation strategy
Set of decision rules
Map from (current and past) sensor measurements to an actuation

Possible Objectives:
Maximize expected response of system
Provide probabilistic bounds on worst-case behaviour
Possible Scenarios


Accurate models of probability distributions → Bayesian networks
Uninformative models → Learning approaches
Problem Formulation

Epoch of T discrete time intervals

At times t = 0,…,T, node i measures a set of
environmental variables $V_t^{(i)}$

Chooses an actuation $A_t^{(i)}$ belonging to a
discrete set of actuations

At the end of the epoch, measure a local
response variable $Y_T^{(i)}$
Maximization Approach

Single binary actuation decision without a
good model of p(y|v,a)

Consider p(y|v,a) = f(y|v,a) + n(v,a)

We have a set of points (v_i, a_i, y_i) and want to
learn the best actuation strategy A(v), i.e.,
that which maximizes f(y|v,a).

Approach: regression + subsequent
maximization
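A minimal sketch of regression followed by maximization, using per-cell averaging over a discrete environment variable as a stand-in for the regression step. The talk does not specify a regressor, so this choice, the function names, and the sample data are assumptions.

```python
from collections import defaultdict


def fit_mean_response(samples):
    """Regression step: estimate the mean response for each observed (v, a) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, a, y in samples:
        sums[(v, a)] += y
        counts[(v, a)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def best_actuation(f_hat, v, actuations=(0, 1)):
    """Maximization step: pick the actuation with the largest estimated response.

    Returns None when no actuation has been observed under condition v.
    """
    candidates = [a for a in actuations if (v, a) in f_hat]
    if not candidates:
        return None
    return max(candidates, key=lambda a: f_hat[(v, a)])


# Hypothetical samples (v, a, y): soil condition, actuation off/on, response.
samples = [
    ("dry", 0, 0.2), ("dry", 1, 0.6), ("dry", 1, 0.8),
    ("wet", 0, 0.9), ("wet", 1, 0.5),
]
f_hat = fit_mean_response(samples)
strategy = {v: best_actuation(f_hat, v) for v in ("dry", "wet")}
```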
Robustness concerns

Maximization amplifies regression errors.

Multi-stage planning implies repeated
regression + maximization
Proliferation of error
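The amplification effect can be illustrated by exact enumeration. In the hypothetical setup below, every actuation has the same true value and each estimate is off by +0.1 or -0.1 with equal probability; the expected value of the maximum estimate is then biased upward, and the bias grows with the number of actuations compared.

```python
from itertools import product


def expected_max_estimate(true_values, errors):
    """Average, over all equally likely error patterns, of the largest noisy estimate."""
    outcomes = list(product(errors, repeat=len(true_values)))
    total = sum(
        max(t + e for t, e in zip(true_values, errs))
        for errs in outcomes
    )
    return total / len(outcomes)


# Hypothetical numbers: all actuations are truly worth 0.5.
two_way = expected_max_estimate((0.5, 0.5), (-0.1, 0.1))
four_way = expected_max_estimate((0.5, 0.5, 0.5, 0.5), (-0.1, 0.1))
```

The regression errors are unbiased here, yet the maximized estimate overstates the achievable response; repeating the step at each planning stage compounds the overstatement.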
Relaxed Problem

Identify the largest set of environmental
conditions and actuations such that:


Expected response exceeds threshold
Probability of terrible response is very low
LSAT Formulation


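Learning to Satisfy (LSAT)

Given points $\{X_i, Y_i\}_{i=1}^{N}$ drawn from $P$, find the set $G$ that solves:

$$\max_G \; \mu(G) \quad \text{subject to} \quad C_i(G, P) > 0, \quad i = 1, \dots, K$$

where $\mu(G)$ measures the size of $G$ and $C_i(G, P)$ are constraints.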
Two types of constraints

Point-wise constraints: C(G,P) = C(x,G,P) is a
function of the input variable x and the
constraint takes the form
$$C_i(x, G, P) > 0 \quad \forall x \in G$$

Example:
$$\min_{x \in G} E[Y \mid X = x] > U$$

Set-average constraints: C(G,P) > 0 is only
satisfied on average over the entire set.

Example:
$$P(Y < L \mid X \in G) < p$$
Solution
Derive equivalent empirical constraints $\hat{C}_i(G)$

Consider solution to empirical constraints
$$\hat{G} = \arg\max_G \; \mu(G)$$
subject to the empirical constraints on $\hat{C}_i(G)$, $i = 1, \dots, K$

If empirical constraints are close to ideal constraints
then solution is satisfactory.

Algorithm: extension of support vector machine

Lagrangian formulation allocating different penalties to
violations of individual constraints.
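To make the empirical formulation concrete, the sketch below replaces the support-vector-machine algorithm with a brute-force search over intervals of a one-dimensional input. It assumes two set-average constraints, "mean response inside G exceeds U" and "fraction of responses below L inside G stays under p", and takes the interval length as the size of G. The thresholds, data, and function names are hypothetical.

```python
def empirical_constraints_ok(points, lo, hi, U, L, p):
    """Check both empirical constraints on the candidate set G = [lo, hi]."""
    inside = [y for x, y in points if lo <= x <= hi]
    if not inside:
        return False
    mean_y = sum(inside) / len(inside)
    frac_bad = sum(1 for y in inside if y < L) / len(inside)
    return mean_y > U and frac_bad < p


def largest_satisfying_interval(points, U, L, p):
    """Brute force: the longest interval between sample inputs that satisfies the constraints."""
    xs = sorted({x for x, _ in points})
    best = None
    for i, lo in enumerate(xs):
        for hi in xs[i:]:
            if empirical_constraints_ok(points, lo, hi, U, L, p):
                if best is None or hi - lo > best[1] - best[0]:
                    best = (lo, hi)
    return best


# Hypothetical (x, y) samples: the response is good only for mid-range inputs.
points = [(0, 0.1), (1, 0.9), (2, 0.8), (3, 0.95), (4, 0.2)]
G_hat = largest_satisfying_interval(points, U=0.7, L=0.5, p=0.1)
```

The search is quadratic in the number of distinct inputs and does not extend beyond one dimension, which is one reason a learned set representation is attractive in practice.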
Comments

Actuator networks present a host of problems



Assessment of whether causal relationships exist
and evaluation of their strength
Design of actuation strategies that yield a
satisfactory (optimal?) environmental response
These problems are difficult in a centralized
setting – the extension to distributed
algorithms poses an even greater challenge.