Optimizing Sensing
from Water to the Web
Andreas Krause
Cornell University, March 19, 2010
rsrg@caltech
…where theory and practice collide
Monitoring algal blooms
Algal blooms threaten freshwater
4 million people without water
1300 factories shut down
$14.5 billion to clean up
Other occurrences in Australia, Japan, Canada,
Brazil, Mexico, Great Britain, Portugal, Germany …
Growth processes still unclear
Need to characterize growth in the lakes, not in the lab!
[Image: Tai Lake, China, 10/2007; source: MSNBC]
Monitoring rivers and lakes
[Singh, K, Guestrin, Kaiser, Journal of AI Research ‘08]
Need to monitor large spatial phenomena
Temperature, nutrient distribution, fluorescence, …
Can only make a limited number of measurements!
Use robotic sensors (NIMS; Kaiser et al., UCLA) to cover large areas
Predict at unobserved locations
[Figure: predicted temperature by depth and location across the lake; color indicates actual temperature]
Where should we sense to get the most accurate predictions?
Monitoring water networks
[K, Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt ‘08]
Contamination of drinking water could affect millions of people
Place sensors to detect contaminations
(Water flow simulator from EPA; Hach sensors, ~$14K each)
“Battle of the Water Sensor Networks” competition
Where should we place sensors to quickly detect contamination?
Sensing problems
Want to learn something about the state of the world
Estimate water quality in a geographic region, detect outbreaks, …
We can choose (partial) observations…
Make measurements, place sensors,
choose experimental parameters …
… but they are expensive / limited
hardware cost, power consumption, …
Want to cost-effectively get most useful information!
Fundamental problem in AI:
How do we automate curiosity & serendipity?
Related work
Sensing problems considered in
Experimental design (Lindley ’56, Robbins ’52…), Spatial statistics
(Cressie ’91, …), Machine Learning (MacKay ’92, …), Robotics
(Sim&Roy ’05, …), Sensor Networks (Zhao et al ’04, …),
Operations Research (Nemhauser ’78, …), …
Existing algorithms typically either
use heuristics: no guarantees! Can do arbitrarily badly; or
find optimal solutions (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
Research in my group
Theoretical:
Approximation algorithms that have
theoretical guarantees and scale to large problems
Applied:
Empirical studies with real deployments
and large datasets
Running example: Detecting fires
Want to place sensors to detect fires in buildings
A Bayesian’s view of sensor networks
[Figure: graphical model linking hidden variables X1, …, X6 to sensor values Y1, …, Y6]
Xs: temperature at location s
Ys: sensor value at location s
Ys = Xs + noise
Joint probability distribution:
P(X1,…,Xn, Y1,…,Yn) = P(X1,…,Xn) · P(Y1,…,Yn | X1,…,Xn)
                    = Prior · Likelihood
Why is this useful?
[Figure: the same graphical model over X1, …, X6 and Y1, …, Y6]
Robust reasoning: integrate measurements from multiple sensors. E.g., P(X2 | y1, y2, y3) is likely more accurate than P(X2 | y2).
Exploiting correlation: can predict P(X1, X3 | y2) ⇒ can turn some sensors off to save battery life.
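To make the robust-reasoning point concrete, here is a minimal sketch of Gaussian conditioning (the prior covariance and noise variance below are illustrative assumptions, not values from any deployment): the posterior variance of X2 shrinks as more noisy sensors are conditioned on.

```python
import numpy as np

# Assumed jointly Gaussian prior over temperatures X1, X2, X3,
# with neighboring locations correlated; sensors Ys = Xs + noise.
Sigma_X = np.array([[1.0, 0.6, 0.3],
                    [0.6, 1.0, 0.6],
                    [0.3, 0.6, 1.0]])
NOISE = 0.25  # assumed sensor noise variance

def posterior_var_x2(observed):
    """Posterior variance of X2 given noisy readings {ys : s in observed}."""
    idx = list(observed)
    Sigma_YY = Sigma_X[np.ix_(idx, idx)] + NOISE * np.eye(len(idx))
    Sigma_x2Y = Sigma_X[1, idx]
    return Sigma_X[1, 1] - Sigma_x2Y @ np.linalg.solve(Sigma_YY, Sigma_x2Y)

print(posterior_var_x2([1]))        # P(X2 | y2): variance 0.20
print(posterior_var_x2([0, 1, 2]))  # P(X2 | y1, y2, y3): strictly smaller
```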
Making observations
[Figure: each hidden variable Xs ranges over {cold, normal, hot}; we observe Y1 = hot]
X is now less uncertain ⇒ Reward[ P(X | Y1 = hot) ] = 0.2
Making observations
[Figure: instead we observe Y3 = hot]
Reward[ P(X | Y3 = hot) ] = 0.4
A different outcome…
[Figure: this time Y3 = cold]
Reward[ P(X | Y3 = cold) ] = 0.1
Example reward functions
Should we raise a fire alert?
Utility U(x, a):

Action        X = fiery hot   X = normal/cold
No alarm      -$$$            0
Raise alarm   $               -$

Only have a belief about the temperature, P(X = hot | obs)
⇒ choose a* = argmax_a Σ_x P(x | obs) U(x, a)
Decision-theoretic value of information:
Reward[ P(X | obs) ] = max_a Σ_x P(x | obs) U(x, a)
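A tiny sketch of this decision rule (the belief values and dollar payoffs below are made-up stand-ins for the slide's -$$$ / $ / -$ entries):

```python
# Assumed utilities U(x, a) echoing the slide's payoff table.
U = {("no_alarm", "hot"): -1000, ("no_alarm", "cold"): 0,
     ("alarm",    "hot"):   100, ("alarm",    "cold"): -50}

def reward(belief):
    """Reward[P(X | obs)] = max_a sum_x P(x | obs) U(x, a)."""
    return max(sum(p * U[(a, x)] for x, p in belief.items())
               for a in ("no_alarm", "alarm"))

print(reward({"hot": 0.01, "cold": 0.99}))  # low fire risk: no alarm is best
print(reward({"hot": 0.60, "cold": 0.40}))  # high risk: raising the alarm pays
```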
Other example reward functions
Entropy:
Reward[ P(X) ] = -H(X) = Σ_x P(x) log2 P(x)
Expected mean squared prediction error (EMSE):
Reward[ P(X) ] = -(1/n) Σ_s Var(Xs)
Many other objectives are possible and useful…
Value of information [Lindley ’56, Howard ’64]
For any set A of sensors, its sensing quality is
F(A) = Σ_{yA} P(yA) · Reward[ P(X | yA) ]
i.e., the expected reward over the observations YA = yA made by sensors A.
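For a small discrete model, F(A) can be computed by brute-force enumeration of the possible readings. A sketch under assumed prior and likelihood values (real water or temperature models would be far too large for this; in this toy, all sensors are identical, so only |A| matters):

```python
import itertools
from math import prod

P_X = {"hot": 0.3, "cold": 0.7}  # assumed prior over the hidden state
ACC = 0.8                        # assumed probability a sensor reads correctly

def sensing_quality(A, reward):
    """F(A) = sum over readings yA of P(yA) * Reward[P(X | yA)]."""
    total = 0.0
    for ys in itertools.product(P_X, repeat=len(A)):
        # P(x, yA): sensors are conditionally independent given X.
        joint = {x: P_X[x] * prod(ACC if y == x else 1 - ACC for y in ys)
                 for x in P_X}
        p_y = sum(joint.values())
        total += p_y * reward({x: q / p_y for x, q in joint.items()})
    return total

confidence = lambda belief: max(belief.values())  # one possible reward function
print(sensing_quality([], confidence))            # 0.7: prior confidence only
print(sensing_quality(["s1", "s2"], confidence))  # higher: observations help
```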
Optimizing sensing / Outline
Ingredients: sensing locations, sensing quality, sensing cost, sensing budget
Outline:
Sensor placement
Robust sensing
Complex constraints
Adaptive sensing
Maximizing value of information
[Krause, Guestrin, Journal of AI Research ’09]
Want to find a set A* ⊆ V, |A*| ≤ k, s.t.
A* = argmax_{|A| ≤ k} F(A)
Theorem (complexity of optimizing value of information):
For chains (HMMs, etc.): optimally solvable in polynomial time ☺
For trees: NP^PP-complete ☹
Approximating Value of Information
Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A)
Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k:
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well can this simple heuristic do?
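The same greedy rule as a minimal, runnable Python sketch (F is any set-function oracle supplied by the caller; this interface is an assumption of the sketch, not code from the talk):

```python
def greedy(V, F, k):
    """Pick k elements of V, each maximizing the marginal gain of F."""
    A = set()
    for _ in range(k):
        s_star = max((s for s in V if s not in A),
                     key=lambda s: F(A | {s}) - F(A))
        A.add(s_star)
    return A
```

It can be run directly against the toy sensing_quality oracle above, e.g. greedy(["s1", "s2", "s3"], lambda A: sensing_quality(A, confidence), 2).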
Performance of greedy
[Figure: optimal vs. greedy sensor placements on an office floor plan; both achieve similar sensing quality]
Temperature data from a sensor network
Greedy is empirically close to optimal. Why?
Key observation: Diminishing returns
Selection A = {Y1, Y2}    Selection B = {Y1, …, Y5}
Adding a new sensor Y’ to the small selection A will help a lot; adding Y’ to the large selection B doesn’t help much.
Submodularity: for A ⊆ B,
F(A ∪ {Y’}) − F(A) ≥ F(B ∪ {Y’}) − F(B)
(large improvement)     (small improvement)
Theorem [Krause and Guestrin, UAI ‘05]: if the Ys are conditionally independent given X, the information gain F(A) = H(X) – H(X | YA) is submodular!
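A quick numeric illustration of the diminishing-returns inequality using a coverage function, a standard submodular example (the sets below are arbitrary assumptions):

```python
# F(A) = number of points covered by the chosen sensors; coverage is submodular.
covers = {"s1": {1, 2}, "s2": {2, 3}, "s3": {3, 4, 5}}
F = lambda A: len(set().union(*(covers[s] for s in A)))

gain = lambda S, y: F(S | {y}) - F(S)
A, B = {"s1"}, {"s1", "s2"}                # A is a subset of B
print(gain(A, "s3"), ">=", gain(B, "s3"))  # 3 >= 2: diminishing returns
```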
One reason submodularity is useful
Theorem [Nemhauser et al. ‘78]: the greedy algorithm gives a constant-factor approximation:
F(A_greedy) ≥ (1 − 1/e) F(A_opt)    (~63%)
Greedy gives a near-optimal solution!
For information gain: this guarantee is the best possible unless P = NP! [Krause & Guestrin ’05]
Many more reasons; sit back and relax…
Building a Sensing Chair
[Mutlu, K, Forlizzi, Guestrin, Hodgins, UIST ‘07]
People sit a lot
Activity recognition in assistive technologies
Seating pressure as user interface
Equipped with 1 sensor per cm²; costs $6,000! ☹
82% accuracy on 10 postures (lean left, lean forward, slouch, …)! [Zhu et al.]
Can we get similar accuracy with fewer, cheaper sensors?
How to place sensors on a chair?
Sensor readings at locations V as random variables
Predict posture Y using probabilistic model P(Y, V)
Pick sensor locations A* ⊆ V to minimize entropy: A* = argmin_{|A| ≤ k} H(Y | A)
Placed sensors, did a user study:

           Before      After
Accuracy   82%         79%
Cost       $6,000 ☹    $100 ☺

Similar accuracy at <2% of the cost!
Battle of the Water Sensor Networks Competition
[K, Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt 2008]
Real metropolitan area network (12,527 nodes)
Water flow simulator provided by EPA
3.6 million contamination events
Multiple objectives: Detection time, affected population, …
Place sensors that detect well “on average”
Sensor placement in water networks
Simulator predicts utility of placing sensors
Water flow dynamics, demands of households, …
For each subset A ⊆ V, compute utility F(A)
[Figure: the model predicts contamination impact; a sensor reduces impact through early detection. High-, medium-, and low-impact contamination locations; one placement has high sensing quality F(A) = 0.9, another low sensing quality F(A) = 0.01]
Theorem: F(A) is submodular! (V = set of all junctions, F = impact reduction)
BWSN Competition results
13 participants; performance measured on 30 different criteria
[Figure: total score per entry (higher is better); G: genetic algorithm, D: domain knowledge, E: “exact” method (MIP), H: other heuristic]
24% better performance than runner-up! ☺
What was the trick?
Simulated all 3.6M contaminations on 40 processors for 2 weeks
152 GB data on disk, 16 GB in main memory (compressed)
Very accurate sensing quality ☺, but very slow evaluation of F(A) ☹
Naive greedy: 30 hours / 20 sensors; 6 weeks for all 30 settings ☹
[Figure: running time in minutes (lower is better) vs. number of sensors selected; exhaustive search (all subsets) ≫ naive greedy ≫ fast greedy]
Submodularity to the rescue: using “lazy evaluations”, 1 hour / 20 sensors; done after 2 days! ☺
Advantage through theory and engineering!
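The “lazy evaluations” trick exploits submodularity: marginal gains can only shrink as A grows, so stale gains stored in a priority queue remain valid upper bounds, and most re-evaluations can be skipped. A minimal sketch (same oracle interface assumption as before):

```python
import heapq

def lazy_greedy(V, F, k):
    """Greedy with lazy evaluations: stale gains are upper bounds, so only
    the current front-runner ever needs to be re-evaluated."""
    A, base = set(), F(set())
    heap = [(-(F({s}) - base), s) for s in V]  # max-heap via negated gains
    heapq.heapify(heap)
    while len(A) < k and heap:
        _, s = heapq.heappop(heap)
        fresh = F(A | {s}) - F(A)              # refresh the stale bound
        if heap and fresh < -heap[0][0]:       # another element may now win
            heapq.heappush(heap, (-fresh, s))
        else:                                  # still the best: take it
            A.add(s)
    return A
```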
What about the worst case?
Knowing the sensor locations, an adversary contaminates here!
[Figure: two placements with very different average-case scores but the same worst-case score]
A placement may detect (accidental) contamination well “on average” yet fail in the worst case.
Where should we place sensors to quickly detect contamination in the worst case?
Optimizing for the worst case
Separate utility function Fi for each contamination i
Fi(A) = impact reduction by sensors A for contamination i
Want to solve: A* = argmax_{|A| ≤ k} min_i Fi(A)
[Figure: sensors A score high on contamination at node s (Fs(A) high) but low at node r (Fr(A) low); sensors B the other way around]
Each of the Fi is submodular
Unfortunately, min_i Fi is not submodular!
How can we solve this robust sensing problem?
Outline
Sensor placement
Robust sensing
Complex constraints
Adaptive sensing
How does the greedy algorithm do?
Example: V = {s1, s2, s3}; buy k = 2 sensors; Fi = impact reduction for intrusion at si

Set A      F1   F2   min_i Fi
{s1}       1    0    0
{s2}       0    1    0
{s3}       ε    ε    ε
{s1,s3}    1    ε    ε
{s2,s3}    ε    1    ε
{s1,s2}    1    1    1

Optimal solution: {s1, s2}, score 1
Greedy picks s3 first (the only positive gain, ε); then it can choose only s1 or s2
Greedy score: ε ⇒ greedy does arbitrarily badly! Can we do better?
Theorem [NIPS ’07]: the problem max_{|A| ≤ k} min_i Fi(A) does not admit any approximation unless P = NP
Hence we can’t find any approximation algorithm. Or can we?
Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*?
Need to find the smallest A with min_i Fi(A) ≥ c
Is this any easier? Yes, if we relax the constraint |A| ≤ k.
Solving the alternative problem
Trick: for each Fi and c, define the truncation F’i,c(A) = min{ Fi(A), c }
Problem 1 (last slide): find small A with min_i Fi(A) ≥ c. Non-submodular ☹; don’t know how to solve.
Problem 2: find small A with F’avg,c(A) = (1/m) Σ_i F’i,c(A) = c. Submodular! Can use greedy ☺
Same optimal solutions: solving one solves the other!
Back to our example
Guess c = 1:

Set A      F1   F2   min_i Fi   F’avg,1
{s1}       1    0    0          ½
{s2}       0    1    0          ½
{s3}       ε    ε    ε          ε
{s1,s3}    1    ε    ε          (1+ε)/2
{s2,s3}    ε    1    ε          (1+ε)/2
{s1,s2}    1    1    1          1

Greedy first picks s1, then picks s2 ⇒ optimal solution!
How do we find c? Do binary search!
SATURATE Algorithm
[K, McMahan, Guestrin, Gupta, JMLR ‘08]
Given: set V, integer k, and submodular functions F1, …, Fm
Initialize cmin = 0, cmax = min_i Fi(V)
Do binary search: c = (cmin + cmax)/2
  Greedily find AG such that F’avg,c(AG) = c
  If |AG| ≤ αk: increase cmin
  If |AG| > αk: decrease cmax
until convergence
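A compact sketch of Saturate built on the greedy routine above (the binary-search tolerance and the saturation check are implementation assumptions; α = 1 reproduces the plain variant):

```python
def saturate(V, Fs, k, alpha=1.0, tol=1e-3):
    """Binary search over c; greedily saturate the truncated average objective."""
    F_avg = lambda A, c: sum(min(F(A), c) for F in Fs) / len(Fs)
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2
        A = set()
        while F_avg(A, c) < c - 1e-12:          # greedy until F'_avg,c hits c
            gain = lambda s: F_avg(A | {s}, c) - F_avg(A, c)
            candidates = [s for s in V if s not in A]
            if not candidates:
                break
            s_star = max(candidates, key=gain)
            if gain(s_star) <= 0:
                break
            A.add(s_star)
        if F_avg(A, c) >= c - 1e-12 and len(A) <= alpha * k:
            c_min, best = c, A                  # feasible: try a higher c
        else:
            c_max = c                           # too many sensors: lower c
    return best
```

On the two-intrusion example above, this recovers {s1, s2} for k = 2.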
Theoretical guarantees
Theorem: the problem max_{|A| ≤ k} min_i Fi(A) does not admit any approximation unless P = NP ☹
Theorem: Saturate finds a solution AS such that
min_i Fi(AS) ≥ OPTk and |AS| ≤ αk,
where OPTk = max_{|A| ≤ k} min_i Fi(A) and α = 1 + log( max_s Σ_i Fi({s}) )
Theorem: if there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^(log log n))
Example: Lake monitoring
Monitor pH values along a transect using a robotic sensor
[Figure: true (hidden) pH values, observations A, predictions at unobserved locations, and predictive variance Var(s | A) against position s along the transect]
Use a probabilistic model (Gaussian processes) to estimate the prediction error; (often) submodular [Das & Kempe ’08]
Where should we sense to minimize our maximum error?
⇒ A robust sensing problem!
Comparison with state of the art
Algorithm used in geostatistics: simulated annealing
[Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05, …]
7 parameters that need to be fine-tuned
[Figure: maximum marginal variance (lower is better) vs. number of sensors, on precipitation data and an environmental monitoring dataset; Saturate matches or beats simulated annealing, and both beat greedy]
Saturate is competitive, 10x faster, and has no parameters to tune!
Results on water networks
[Figure: maximum detection time in minutes (lower is better) vs. number of sensors; greedy shows no decrease until all contaminations are detected; Saturate clearly below greedy and simulated annealing]
60% lower worst-case detection time!
Summary so far
Submodularity in sensing optimization: greedy is near-optimal [UAI ’05, JMLR ’07, KDD ’07]
Robust sensing: greedy fails badly; Saturate is near-optimal [JMLR ’08, IPSN ‘08]
Complex constraints (path planning, communication constraints): greedy fails badly; pSPIEL gives strong guarantees [IJCAI ’07, IPSN ’06, AAAI ‘07]
Adaptive sensing: adaptive submodularity [’10]; regret bounds for GP optimization [’10]
All these applications involve physical sensing. Now for something completely different; let’s jump from water…
… to the Web!
You have 10 minutes each day for reading blogs / news.
Which of the million blogs should you read?
Cascades in the Blogosphere
[Leskovec, K, Guestrin, Faloutsos, VanBriesen, Glance, KDD 07 – Best Paper]
[Figure: an information cascade spreading across blogs over time; blogs far down the cascade learn about the story after us]
Which blogs should we read to learn about big cascades early?
Water vs. Web
Placing sensors in water networks vs. selecting informative blogs
In both problems we are given:
a graph with nodes (junctions / blogs) and edges (pipes / links)
cascades spreading dynamically over the graph (contamination / citations)
Want to pick nodes to detect big cascades early
In both applications, the utility functions are submodular ☺
[Generalizes Kempe, Kleinberg, Tardos, KDD ’03]
Performance on Blog selection (~45k blogs)
[Figure, left: fraction of cascades captured (higher is better) vs. number of blogs; greedy clearly beats the in-links, all out-links, # posts, and random baselines]
[Figure, right: running time in seconds (lower is better) vs. number of blogs selected; exhaustive search (all subsets) ≫ naive greedy ≫ fast greedy]
Outperforms state-of-the-art heuristics
700x speedup using submodularity!
Predicting the “hot” blogs
Want blogs that will be informative in the future
Split the data set: train on historic data, test on future data
[Figure: cascades captured vs. number of posts (time) allowed; “cheating” (greedy trained and tested on the future) far above greedy trained on historic data and tested on the future]
Let’s see what goes wrong here.
[Figure: #detections per month, Jan–May, for the greedy and Saturate selections; the overfit selection detects well during the training period but poorly afterwards]
Blog selection “overfits” to the training data: poor generalization! Why’s that?
Want blogs that continue to do well!
Online optimization
Fi(A) = detections in interval i
[Figure: for the “overfit” blog selection A, the per-interval scores are F1(A) = .5, F2(A) = .8, F3(A) = .6, F4(A) = .01, F5(A) = .02 over Jan–May, shown for the greedy and Saturate selections]
Online optimization: re-select as the intervals arrive, learning from past intervals.
Online maximization of submodular rankings
[Streeter, Golovin, Krause, NIPS ‘09]
Over rounds t = 1, …, T: pick a set At, a submodular function Ft arrives, and we receive reward rt = Ft(At)
Goal: maximize the total reward Σt rt
Theorem: we can efficiently choose A1, …, AT such that, in expectation,
(1/T) Σt Ft(At) ≥ (1 − 1/e) · OPT − o(1), where OPT = max_{|A| ≤ k} (1/T) Σt Ft(A),
for any sequence Fi, as T → ∞
“Can asymptotically get ‘no-regret’ over the clairvoyant greedy”
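A rough sketch of the per-slot experts idea behind this result (assuming full-information feedback and marginal gains scaled to [0, 1]; the learning rate is an arbitrary choice): each of the k ranking slots runs a multiplicative-weights learner over elements and is rewarded with the marginal gain observed at its slot.

```python
import math, random

def online_greedy(V, F_stream, k, eta=0.1):
    """Per-slot multiplicative-weights learners mimicking the greedy ranking."""
    V = list(V)
    weights = [[1.0] * len(V) for _ in range(k)]  # one expert set per slot
    total = 0.0
    for F in F_stream:                            # round t: Ft revealed after
        A, prefixes = set(), []
        for i in range(k):                        # sample slot i's element
            j = random.choices(range(len(V)), weights=weights[i])[0]
            prefixes.append(set(A))
            A.add(V[j])
        total += F(A)
        for i, prefix in enumerate(prefixes):     # full-information update
            for j, s in enumerate(V):
                gain = F(prefix | {s}) - F(prefix)
                weights[i][j] *= math.exp(eta * gain)
    return total
```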
Results on blogs
[Figure: average normalized performance vs. time in days (T = 47); the online algorithm’s curve rises quickly toward 1]
Performance of the online algorithm converges quickly to the clairvoyant (“cheating”) offline greedy algorithm!
Current work
At the interface of AI/ML and optimization with sensor and information networks:
How can we infer a model of a complex system from data in a principled manner?
How can we learn to adaptively optimize the performance of a complex, distributed system?
Current work: Inferring Networks of Diffusion
[with Leskovec, Gomez-Rodriguez]
Want to detect information cascades
Often, we only know the time of occurrence, not the links
Examples: information propagation, epidemics, neural activation?
Want to infer the underlying network (edges and directions)
[Figure: observed occurrence times at nodes; inferred diffusion edges]
Current work: Inferring Networks of Diffusion
[with Leskovec, Gomez-Rodriguez]
Actual network inferred from 172 million articles from 1 million news sources
Theoretical performance guarantees for GM structure learning!
Current work: Community Sense & Response
[with Chandy, Clayton, Faulkner, Golovin]
Privately-held sensors contribute sensor data toward a common goal
Estimate spatial phenomena (traffic, weather, …); detect earthquakes (w/ Chandy, Clayton); …
Can’t continuously monitor (bandwidth / power / privacy / …)
Can’t keep track of all sensors
⇒ Let sensors decide when their information is useful!
Distributed online sensor selection
[Golovin, Faulkner, K IPSN ’10]
Centralized greedy algorithm:
Guarantees (1 − 1/e) of the optimal value! ☺
Needs to know the submodular function F in advance ☹
Searches through all possible sensors for activation ⇒ large communication overhead ☹
Distributed online greedy (DOG):
Sensors learn to independently decide whether to activate based on local observations
No need to know F in advance
Small (constant) communication overhead
Guaranteed to quickly converge to the same performance as the centralized algorithm! ☺
Structure in AI problems
AI/ML, last 10 years: convexity (kernel machines, SVMs, GPs, MLE, …)
AI/ML, “next 10 years”: submodularity ☺ and new structural properties
Structural insights help us solve challenging problems
Shameless plug: www.submodularity.org
MATLAB toolbox for optimizing submodular functions (JMLR MLOSS ’10)
Tutorial slides (ICML ’08 and IJCAI ’09), references & video
DISCML ‘09: NIPS Workshop on Discrete Optimization in ML
ROBOPAL ’10: RSS Workshop on Active Learning in Robotics
Conclusions
Sensing and information acquisition problems are
important and ubiquitous
Can exploit structure to find provably good solutions
Presented algorithms with strong guarantees
Perform well on real-world problems
Thanks: