Optimizing Sensing from Water to the Web
Andreas Krause
Cornell University, March 19, 2010
rsrg@caltech: "..where theory and practice collide"

Monitoring algal blooms
Algal blooms threaten freshwater. Tai Lake, China, 10/07 [MSNBC]:
- 4 million people without water
- 1,300 factories shut down
- $14.5 billion to clean up
Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany, ...
Growth processes are still unclear. We need to characterize growth in the lakes, not in the lab!

Monitoring rivers and lakes [Singh, K, Guestrin, Kaiser, Journal of AI Research '08]
Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, ...
NIMS robotic sensor, Kaiser et al. (UCLA). We can only make a limited number of measurements!
[Figure: actual vs. predicted temperature over depth and location across the lake; color indicates temperature]
Use robotic sensors to cover large areas and predict at unobserved locations.
Where should we sense to get the most accurate predictions?

Monitoring water networks [K, Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt '08]
Contamination of drinking water could affect millions of people.
Water flow simulator from the EPA; Hach sensors (~$14K each).
Place sensors to detect contaminations: the "Battle of the Water Sensor Networks" competition.
Where should we place sensors to quickly detect contamination?

Sensing problems
We want to learn something about the state of the world: estimate water quality in a geographic region, detect outbreaks, ...
We can choose (partial) observations: make measurements, place sensors, choose experimental parameters, ...
... but observations are expensive and limited: hardware cost, power consumption, ...
We want to cost-effectively get the most useful information!
A fundamental problem in AI: how do we automate curiosity & serendipity?

Related work
Sensing problems have been considered in experimental design (Lindley '56, Robbins '52, ...), spatial statistics (Cressie '91, ...), machine learning (MacKay '92, ...), robotics (Sim & Roy '05, ...), sensor networks (Zhao et al. '04, ...), operations research (Nemhauser '78, ...), and more.
Existing algorithms are typically either
- heuristics: no guarantees, can do arbitrarily badly; or
- exact methods (mixed integer programming, POMDPs): very difficult to scale to bigger problems.

Research in my group
Theoretical: approximation algorithms that have theoretical guarantees and scale to large problems.
Applied: empirical studies with real deployments and large datasets.

Running example: Detecting fires
We want to place sensors to detect fires in buildings.

A Bayesian's view of sensor networks
X_s: temperature at location s. Y_s: sensor value at location s, with Y_s = X_s + noise.
Joint probability distribution:
P(X_1,...,X_n, Y_1,...,Y_n) = P(X_1,...,X_n) P(Y_1,...,Y_n | X_1,...,X_n)
(prior times likelihood).

Why is this useful?
Robust reasoning: integrate measurements from multiple sensors. E.g., P(X_2 | y_1, y_2, y_3) is likely more accurate than P(X_2 | y_2).
Exploiting correlation: we can predict P(X_1, X_3 | y_2), so some sensors can be turned off to save battery life.

Making observations
[Figure: belief histograms over temperature states (cold / normal / hot) at each location, before and after an observation]
Observing Y_1 = hot makes the belief over X less uncertain: Reward[P(X | Y_1 = hot)] = 0.2.
Observing Y_3 = hot is more informative: Reward[P(X | Y_3 = hot)] = 0.4.
A different outcome: observing Y_3 = cold gives Reward[P(X | Y_3 = cold)] = 0.1.
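To make the reward computations above concrete, here is a minimal Python sketch (not from the talk): a toy discrete version of the fire example, with a made-up prior over X in {cold, normal, hot}, a made-up sensor noise model, and negative entropy as the reward. All numbers are illustrative assumptions.

```python
# Toy posterior update and reward for the fire-sensing example.
# The prior, the noise model, and the numbers are illustrative
# assumptions, not the values used in the talk.

import math

STATES = ["cold", "normal", "hot"]
prior = {"cold": 0.3, "normal": 0.5, "hot": 0.2}   # P(X)

def likelihood(y, x):
    """P(Y = y | X = x): the sensor reports the true state w.p. 0.8
    and each other state w.p. 0.1 (assumed noise model)."""
    return 0.8 if y == x else 0.1

def posterior(y):
    """Bayes' rule: P(X | Y = y) is proportional to P(Y = y | X) P(X)."""
    unnorm = {x: likelihood(y, x) * prior[x] for x in STATES}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

def reward(dist):
    """Reward[P(X)] = -H(X): higher when the belief is less uncertain."""
    return sum(p * math.log2(p) for p in dist.values() if p > 0)

# Sensing quality of one sensor, cf. F(A) = sum_y P(y_A) Reward[P(X | y_A)]:
p_y = {y: sum(likelihood(y, x) * prior[x] for x in STATES) for y in STATES}
F = sum(p_y[y] * reward(posterior(y)) for y in STATES)
print("Reward[P(X | Y=hot)] =", round(reward(posterior("hot")), 3))
print("Expected reward F({Y}) =", round(F, 3))
```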
Example reward functions
Should we raise a fire alert? We only have a belief about the temperature, P(X = hot | obs).

                   X = fiery hot    X = normal/cold
  No alarm         -$$$             0
  Raise alarm      $                -$

Choose the action a* = argmax_a Σ_x P(x | obs) U(x, a).
Decision-theoretic value of information: Reward[P(X | obs)] = max_a Σ_x P(x | obs) U(x, a).

Other example reward functions
Entropy: Reward[P(X)] = -H(X) = Σ_x P(x) log2 P(x).
Expected mean squared prediction error (EMSE): Reward[P(X)] = -(1/n) Σ_s Var(X_s).
Many other objectives are possible and useful.

Value of information [Lindley '56, Howard '64]
For any set A of sensors, its sensing quality is
F(A) = Σ_{y_A} P(y_A) Reward[P(X | y_A)],
the expected reward when the sensors in A observe Y_A = y_A.

Optimizing sensing / Outline
Ingredients: sensing locations, sensing quality, sensing cost, sensing budget.
Parts: sensor placement, robust sensing, complex constraints, adaptive sensing.

Maximizing value of information [Krause, Guestrin, Journal of AI Research '09]
Want to find a set A* ⊆ V, |A*| ≤ k, such that A* = argmax_{|A| ≤ k} F(A).
Theorem (complexity of optimizing value of information):
- For chains (HMMs, etc.): optimally solvable in polynomial time.
- For trees: NP^PP-complete.

Approximating value of information
Given a finite set V of locations, we want A* ⊆ V maximizing F(A). Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k:
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well can this simple heuristic do?

Performance of greedy
[Figure: office floor plan with 54 sensor locations; optimal vs. greedy placements]
On temperature data from a sensor network, greedy is empirically close to optimal. Why?

Key observation: Diminishing returns
Compare a small selection A = {Y_1, Y_2} with a large selection B = {Y_1, ..., Y_5}, and consider a new sensor Y'. Adding Y' to A will help a lot; adding Y' to B doesn't help much.
Submodularity: for A ⊆ B, F(A ∪ {Y'}) - F(A) ≥ F(B ∪ {Y'}) - F(B).
Theorem [Krause and Guestrin, UAI '05]: if the Y's are conditionally independent given X, the information gain F(A) = H(X) - H(X | Y_A) is submodular!

One reason submodularity is useful
Theorem [Nemhauser et al. '78]: the greedy algorithm gives a constant-factor approximation,
F(A_greedy) ≥ (1 - 1/e) F(A_opt), i.e., ~63% of optimal.
The greedy algorithm gives a near-optimal solution! For information gain, this guarantee is the best possible unless P = NP [Krause & Guestrin '05].
Many more reasons; sit back and relax...

Building a sensing chair [Mutlu, K, Forlizzi, Guestrin, Hodgins, UIST '07]
People sit a lot: activity recognition in assistive technologies; seating pressure as a user interface.
The prototype chair [Zhu et al.] is equipped with 1 sensor per cm² and costs $6,000; it achieves 82% accuracy on 10 postures (lean left, lean forward, slouch, ...).
Can we get similar accuracy with fewer, cheaper sensors?

How to place sensors on a chair?
Treat the sensor readings at locations V as random variables and predict the posture Y using a probabilistic model P(Y, V).
Pick sensor locations A* ⊆ V to minimize the entropy of Y given the readings.
Placed the sensors and ran a user study:
  Accuracy: 82% before, 79% after.
  Cost: $6,000 before, $100 after.
Similar accuracy at <2% of the cost!
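The greedy rule above is a few lines of code. Here is a generic Python sketch (my illustration, with a made-up coverage objective as the demo F), together with the "lazy" variant that exploits submodularity, since marginal gains can only shrink as the set grows; lazy evaluation is what produces the large speedups reported later in the talk.

```python
# Greedy maximization of a set function F under a cardinality constraint,
# as in the pseudocode above, plus the lazy variant. The coverage
# objective in the demo is an illustrative assumption.

import heapq

def greedy(F, V, k):
    A = []
    for _ in range(k):
        s_star = max((s for s in V if s not in A),
                     key=lambda s: F(A + [s]) - F(A))
        A.append(s_star)
    return A

def lazy_greedy(F, V, k):
    A, base = [], F([])
    # Max-heap of (negated) upper bounds on marginal gains.
    heap = [(-(F([s]) - base), s) for s in V]
    heapq.heapify(heap)
    while len(A) < k and heap:
        neg_gain, s = heapq.heappop(heap)
        fresh = F(A + [s]) - F(A)        # recompute the stale bound
        if not heap or fresh >= -heap[0][0]:
            A.append(s)                  # still the best: take it
        else:
            heapq.heappush(heap, (-fresh, s))
    return A

if __name__ == "__main__":
    # Toy submodular objective: coverage of a ground set.
    covers = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}, "s4": {1, 7}}
    F = lambda A: len(set().union(*(covers[s] for s in A)))
    print(greedy(F, list(covers), 2))        # ['s3', 's1']
    print(lazy_greedy(F, list(covers), 2))   # same selection, fewer F calls
```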
Battle of the Water Sensor Networks competition [K, Leskovec, Guestrin, VanBriesen, Faloutsos, J Wat Res Mgt 2008]
Real metropolitan-area network (12,527 nodes); water flow simulator provided by the EPA; 3.6 million contamination events; multiple objectives (detection time, affected population, ...).
Goal: place sensors that detect well "on average".

Sensor placement in water networks
The simulator predicts the utility of placing sensors from the water flow dynamics, the demands of households, etc. For each subset A ⊆ V of junctions we can compute the sensing quality F(A).
A sensor reduces impact through early detection: contamination near a sensor is caught quickly (high sensing quality, e.g. F(A) = 0.9), while contamination far from every sensor does its damage first (low sensing quality, e.g. F(A) = 0.01).
Theorem: the impact reduction F(A) is submodular!

BWSN competition results
13 participants; performance measured on 30 different criteria.
[Figure: total score per entry, higher is better; G: genetic algorithm, D: domain knowledge, E: "exact" method (MIP), H: other heuristic]
24% better performance than the runner-up!

What was the trick?
Simulated all 3.6M contaminations: 2 weeks on 40 processors, 152 GB of data on disk, 16 GB in main memory (compressed). Evaluating F(A) is very slow, but the sensing quality is very accurate.
[Figure: running time (minutes) vs. number of sensors selected, lower is better; exhaustive search (all subsets) vs. naive greedy vs. fast greedy]
Naive greedy needs 30 hours per 20-sensor placement: 6 weeks for all 30 settings. Submodularity to the rescue: using "lazy evaluations", 1 hour per 20 sensors, and everything is done after 2 days!
Advantage through theory and engineering!

What about the worst case?
Knowing the sensor locations, an adversary contaminates at the spot we cover worst.
[Figure: two placements with very different average-case scores but the same worst-case score]
A placement that detects well "on average" protects against accidental contamination. Where should we place sensors to quickly detect contamination in the worst case?

Optimizing for the worst case
Associate a separate utility function F_i with each contamination i: F_i(A) = impact reduction by sensors A for contamination i.
[Figure: two placements A and B; for a contamination at node s, F_s(A) is high but F_s(B) is low; for a contamination at node r, F_r(B) is high but F_r(A) is low]
Want to solve max_{|A| ≤ k} min_i F_i(A).
Each of the F_i is submodular, but unfortunately min_i F_i is not submodular! How can we solve this robust sensing problem?

Outline
Sensor placement -> Robust sensing -> Complex constraints -> Adaptive sensing

How does the greedy algorithm do?
Example: V = {s1, s2, s3}, buy k = 2 sensors, F_i = impact reduction for an intrusion at s_i. Sensor s_i fully covers intrusion i; s3 covers both intrusions a tiny bit (ε).

  Set A      F1   F2   min_i F_i
  {s1}       1    0    0
  {s2}       0    1    0
  {s3}       ε    ε    ε
  {s1,s3}    1    ε    ε
  {s2,s3}    ε    1    ε
  {s1,s2}    1    1    1

Greedy picks s3 first (the only singleton with nonzero min), and can then add only s1 or s2: greedy score ε. The optimal solution {s1, s2} scores 1. Greedy does arbitrarily badly! Can we do better?
Theorem [NIPS '07]: the problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Hence we can't find any approximation algorithm. Or can we?

Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*? We would need to find a set A with min_i F_i(A) ≥ c.
Is this any easier? Yes, if we relax the constraint |A| ≤ k.

Solving the alternative problem
Trick: for each F_i and c, define the truncation F'_{i,c}(A) = min{F_i(A), c}.
Problem 1 (last slide): require min_i F_i(A) ≥ c. Non-submodular; we don't know how to solve it.
Problem 2: require the average of truncations F'_{avg,c}(A) = (1/m) Σ_i F'_{i,c}(A) ≥ c. Submodular; we can use greedy!
The two problems have the same optimal solutions, so solving one solves the other.

Back to our example
Guess c = 1 and run greedy on F'_{avg,1}:

  Set A      F1   F2   min_i F_i   F'_{avg,1}
  {s1}       1    0    0           1/2
  {s2}       0    1    0           1/2
  {s3}       ε    ε    ε           ε
  {s1,s3}    1    ε    ε           (1+ε)/2
  {s2,s3}    ε    1    ε           (1+ε)/2
  {s1,s2}    1    1    1           1

Greedy first picks s1, then picks s2: the optimal solution!
How do we find c? Do binary search!
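As a quick sanity check of the truncation trick, here is a short Python sketch (my illustration, not from the talk) that reproduces the table above: greedy on the truncated average F'_{avg,1} recovers the optimal robust placement that plain greedy on min_i F_i misses.

```python
# Verify the worked example: greedy on the truncated average F'_{avg,c}
# recovers the optimal robust placement {s1, s2}. eps is the small
# coverage that s3 provides for both intrusions.

eps = 0.01
V = ["s1", "s2", "s3"]

def F1(A):  # impact reduction for intrusion 1
    return 1.0 if "s1" in A else (eps if "s3" in A else 0.0)

def F2(A):  # impact reduction for intrusion 2
    return 1.0 if "s2" in A else (eps if "s3" in A else 0.0)

def F_avg_trunc(A, c):
    """F'_{avg,c}(A) = (1/m) sum_i min{F_i(A), c} -- submodular."""
    return (min(F1(A), c) + min(F2(A), c)) / 2

def greedy(F, V, k):
    A = []
    for _ in range(k):
        A.append(max((s for s in V if s not in A),
                     key=lambda s: F(A + [s]) - F(A)))
    return A

c = 1.0  # guessed optimal value; Saturate finds c by binary search
A = greedy(lambda A: F_avg_trunc(A, c), V, 2)
print(A, "min_i F_i =", min(F1(A), F2(A)))  # ['s1', 's2'] min_i F_i = 1.0
# Plain greedy on min_i F_i instead picks s3 first and only reaches eps.
```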
The Saturate algorithm [K, McMahan, Guestrin, Gupta, JMLR '08]
Given: a set V, an integer k, and submodular functions F_1, ..., F_m.
Initialize c_min = 0, c_max = min_i F_i(V).
Do binary search: c = (c_min + c_max) / 2.
  Greedily find A_G such that F'_{avg,c}(A_G) = c.
  If |A_G| ≤ α k: increase c_min.
  If |A_G| > α k: decrease c_max.
Repeat until convergence.
[Figure: placements for increasing truncation thresholds c (color-coded)]

Theoretical guarantees
Theorem: the problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Theorem: Saturate finds a solution A_S such that
min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ α k,
where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).
Theorem: if there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^{log log n}).

Example: Lake monitoring
Monitor pH values along a transect using a robotic sensor.
[Figure: true (hidden) pH values along the transect, predictions from observations A, and the predictive variance Var(s | A) at each position s]
Use a probabilistic model (Gaussian processes) to estimate the prediction error.
Where should we sense to minimize our maximum error? A robust sensing problem, and (often) submodular [Das & Kempe '08]!

Comparison with the state of the art
The standard algorithm in geostatistics is simulated annealing [Sacks & Schiller '88, van Groeningen & Stein '98, Wiens '05, ...], with 7 parameters that need to be fine-tuned.
[Figure: maximum marginal variance vs. number of sensors on precipitation and environmental-monitoring data, lower is better; greedy vs. simulated annealing vs. Saturate]
Saturate is competitive with simulated annealing, 10x faster, and has no parameters to tune!

Results on water networks
[Figure: maximum detection time (minutes) vs. number of sensors, lower is better; greedy and simulated annealing show no decrease until all contaminations are detected, Saturate drops much earlier]
On water networks, Saturate achieves a 60% lower worst-case detection time!

Summary so far
- Sensor placement: greedy is near-optimal [UAI '05, JMLR '07, KDD '07].
- Robust sensing: greedy fails badly; Saturate is near-optimal [JMLR '08, IPSN '08].
- Complex constraints (path planning, communication constraints): greedy fails badly; pSPIEL gives strong guarantees [IJCAI '07, IPSN '06, AAAI '07].
- Adaptive sensing: adaptive submodularity ['10]; regret bounds for GP optimization ['10].
All these applications involve physical sensing. Now for something completely different: let's jump from water...

... to the Web!
You have 10 minutes each day for reading blogs / news. Which of the million blogs should you read?

Cascades in the blogosphere [Leskovec, K, Guestrin, Faloutsos, VanBriesen, Glance, KDD '07 -- Best Paper]
Stories propagate through links between posts over time: an information cascade. Blogs that link late learn about the story after us.
Which blogs should we read to learn about big cascades early?
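Here is a toy Python sketch of "pick blogs to catch cascades early" (my illustration only: the cascade data and the linear earliness reward are made-up assumptions; the KDD '07 paper works with objectives such as detection likelihood, detection time, and affected population on real blog data).

```python
# Toy blog selection for early cascade detection. A cascade is a list of
# (blog, time) posts; the reward is 1 at time 0 and decays to 0 at the
# cascade's end. This "earliness" objective is a max over selected blogs
# per cascade, hence submodular, so greedy applies.

def detection_reward(cascade, A):
    times = [t for blog, t in cascade if blog in A]
    if not times:
        return 0.0                     # cascade never detected
    horizon = max(t for _, t in cascade)
    return 1.0 - min(times) / (horizon + 1)

def F(A, cascades):
    """Average earliness over all cascades -- submodular in A."""
    return sum(detection_reward(c, A) for c in cascades) / len(cascades)

def greedy(cascades, blogs, k):
    A = set()
    for _ in range(k):
        A.add(max(blogs - A, key=lambda b: F(A | {b}, cascades)))
    return A

cascades = [
    [("a", 0), ("b", 1), ("c", 2), ("d", 3)],
    [("b", 0), ("c", 1), ("d", 2)],
    [("d", 0), ("a", 1)],
]
blogs = {"a", "b", "c", "d"}
print(greedy(cascades, blogs, 2))      # {'b', 'd'}
```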
Water vs. Web
Placing sensors in water networks vs. selecting informative blogs. In both problems we are given
- a graph with nodes (junctions / blogs) and edges (pipes / links), and
- cascades spreading dynamically over the graph (contamination / citations),
and we want to pick nodes to detect big cascades early.
In both applications the utility functions are submodular [generalizes Kempe, Kleinberg, Tardos, KDD '03].

Performance on blog selection
[Figure, left: cascades captured vs. number of blogs, higher is better; greedy outperforms in-links, all out-links, # posts, and random. Right: running time (seconds) vs. number of blogs selected, lower is better; exhaustive search (all subsets) and naive greedy vs. fast greedy]
On ~45k blogs, greedy outperforms state-of-the-art heuristics, with a 700x speedup using submodularity!

Predicting the "hot" blogs
We want blogs that will be informative in the future. Split the data set: train on the historic part, test on the future part.
[Figure, left: cascades captured vs. number of posts (time) allowed; "cheating" greedy trained and tested on the future vs. greedy trained on historic data and tested on the future. Right: #detections per month (Jan-May) for greedy vs. Saturate; greedy detects well on the training period and poorly afterwards]
Blog selection "overfits" to the training data: poor generalization! We want blogs that continue to do well. Why does this happen, and what can we do about it? Let's see what goes wrong here.

Online optimization
Let F_i(A) = detections in time interval i. For the "overfit" blog selection A:
F_1(A) = .5, F_2(A) = .8, F_3(A) = .6, F_4(A) = .01, F_5(A) = .02.
The selection does well on the early intervals and fails on the later ones: an online optimization problem.

Online maximization of submodular rankings [Streeter, Golovin, Krause, NIPS '09]
At each time t, pick a set A_t; then F_t is revealed and we receive the reward r_t = F_t(A_t). Goal: maximize the total reward Σ_t r_t.
Theorem: we can efficiently choose A_1, ..., A_T such that, in expectation, for any sequence F_i, as T → ∞,
(1/T) Σ_t F_t(A_t) ≥ (1 - 1/e) max_{|A| ≤ k} (1/T) Σ_t F_t(A) - o(1).
"We can asymptotically get 'no regret' over the clairvoyant greedy algorithm."

Results on blogs
[Figure: average normalized performance vs. time (days), T = 47]
The performance of the online algorithm converges quickly to the clairvoyant ("cheating") offline greedy algorithm!

Current work
At the intersection of AI/ML, optimization, and sensor/information networks:
- How can we infer a model of a complex system from data in a principled manner?
- How can we learn to adaptively optimize the performance of a complex, distributed system?

Current work: Inferring networks of diffusion [with Leskovec, Gomez-Rodriguez]
We want to detect information cascades, but often we only know the time of occurrence, not the links. Examples: information propagation, epidemics, neural activation?
Goal: infer the underlying network (edges and directions).
An actual network inferred from 172 million articles from 1 million news sources, with theoretical performance guarantees for graphical-model structure learning!

Current work: Community Sense & Response [with Chandy, Clayton, Faulkner, Golovin]
Privately-held sensors with a common goal: contribute sensor data to estimate a spatial phenomenon (traffic, weather, ...) or to detect earthquakes (w/ Chandy, Clayton).
We can't continuously monitor every sensor (bandwidth / power / privacy / ...), and we can't keep track of all sensors.
Idea: let the sensors decide when their information is useful!
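The no-regret machinery above (and the distributed scheme on the next slide) builds on learning from expert advice. For k = 1, picking one blog per interval is exactly that problem; below is a minimal Hedge (multiplicative weights) sketch with made-up rewards. The actual NIPS '09 algorithm runs one such no-regret learner per greedy slot to select sets of size k with the (1 - 1/e) guarantee.

```python
# Minimal Hedge sketch for the k = 1 case of online blog selection:
# one no-regret learner picks a single blog each interval.
# The reward table is a made-up example.

import math, random

def hedge(rewards_per_round, blogs, eta=0.5):
    w = {b: 1.0 for b in blogs}
    total = 0.0
    for rewards in rewards_per_round:    # rewards: blog -> F_t({blog})
        z = sum(w.values())
        pick = random.choices(list(w), [w[b] / z for b in w])[0]
        total += rewards[pick]
        for b in w:                      # full-information update
            w[b] *= math.exp(eta * rewards[b])
    return total

blogs = ["blogA", "blogB", "blogC"]
# blogA is good early, blogB is good late; Hedge tracks the better one.
rounds = [{"blogA": .8, "blogB": .1, "blogC": .2}] * 10 \
       + [{"blogA": .1, "blogB": .9, "blogC": .2}] * 10
print("online reward:", round(hedge(rounds, blogs), 2))
print("best fixed blog in hindsight:",
      max(blogs, key=lambda b: sum(r[b] for r in rounds)))
```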
Distributed online sensor selection [Golovin, Faulkner, K, IPSN '10]
The centralized greedy algorithm guarantees (1 - 1/e) of the optimal value, but it
- needs to know the submodular function F in advance, and
- searches through all possible sensors for activation: a large communication overhead.
Distributed Online Greedy (DOG):
- sensors learn to independently decide whether to activate, based on local observations;
- no need to know F in advance;
- small (constant) communication overhead;
- guaranteed to quickly converge to the same performance as the centralized algorithm!

Structure in AI problems
AI/ML in the last 10 years: convexity (kernel machines, SVMs, GPs, MLE, ...).
AI/ML in the "next 10 years": submodularity, among other new structural properties. Structural insights help us solve challenging problems.
Shameless plug: www.submodularity.org
- MATLAB toolbox for optimizing submodular functions (JMLR MLOSS '10)
- Tutorial slides (ICML '08 and IJCAI '09), references & video
- DISCML '09: NIPS Workshop on Discrete Optimization in ML
- ROBOPAL '10: RSS Workshop on Active Learning in Robotics

Conclusions
Sensing and information acquisition problems are important and ubiquitous.
We can exploit structure to find provably good solutions: the algorithms presented have strong guarantees and perform well on real-world problems.
Thanks!