Abdelzaher (UIUC)

Research Milestones

Due   Description
Q1    Estimation-theoretic QoI analysis. Formulation of analytic models for quantifying accuracy of prediction/estimation results.
Q2    Extended analysis of semantic links in information networks. Formulation of information network abstractions that are amenable to analysis as new sensors in a data fusion framework.
Q3    Data pool quality metrics and impact of data fusion. Formulation of metrics for data selection when all data cannot be used/sent.
Q4    Validation of QoI theory. Documentation and publications.

This Talk: Towards a QoI Theory for Data Fusion from Sensors + Information Network Links

Sensors, reports, and human sources

Fusion of hard sources (signal data fusion)
Methods:
• Bayesian analysis
• Maximum likelihood estimation
• etc.

Fusion of soft sources (information network analysis)
Methods:
• Ranking
• Clustering
• etc.

Fusion of text and images (machine learning)
Methods:
• Transfer knowledge
• CCM
• etc.

Fusion from human sources (trust, social networks)
Methods:
• Fact-finding
• Influence analysis
• etc.

Sensor Fusion Example: Target Classification

[Figure: a target observed by vibration sensors, acoustic sensors, and an infrared motion sensor]

• Different sensors (of known reliability, false alarm rates, etc.) are used to classify targets
• Well-developed theory exists to combine possibly conflicting sensor measurements to accurately estimate target attributes:
  • Bayesian analysis
  • Maximum likelihood
  • Kalman filters
  • etc.
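As a concrete illustration of the Bayesian analysis named on the target classification slide, the following is a minimal sketch of naive Bayesian fusion of binary sensor readings. It assumes the sensors are conditionally independent given the target state. The function name, the sensor names, the detection and false alarm rates, and the prior are all invented here for illustration and do not come from the slides.

```python
from math import prod


def fuse_binary_sensors(prior, readings, detect_rate, false_alarm_rate):
    """Posterior P(target present | readings) under a naive Bayes assumption.

    prior            -- P(target present) before any reading (assumed known)
    readings         -- dict: sensor name -> True if the sensor fired
    detect_rate      -- dict: sensor name -> P(fires | target present)
    false_alarm_rate -- dict: sensor name -> P(fires | target absent)
    """
    def likelihood(fire_prob):
        # Product of per-sensor likelihoods (conditional independence).
        return prod(
            fire_prob[name] if fired else 1.0 - fire_prob[name]
            for name, fired in readings.items()
        )

    present = prior * likelihood(detect_rate)
    absent = (1.0 - prior) * likelihood(false_alarm_rate)
    # present + absent plays the role of the normalizing constant Z.
    return present / (present + absent)


if __name__ == "__main__":
    # Hypothetical sensor characteristics, chosen only for this example.
    detect = {"vibration": 0.9, "acoustic": 0.8, "infrared": 0.7}
    false_alarm = {"vibration": 0.2, "acoustic": 0.3, "infrared": 0.1}
    # Conflicting measurements: two sensors fire, one does not.
    observed = {"vibration": True, "acoustic": True, "infrared": False}
    print(round(fuse_binary_sensors(0.5, observed, detect, false_alarm), 3))
```

With these made-up numbers, two firing sensors outweigh one silent sensor and the posterior rises from the prior of 0.5 to about 0.8. The point of the sketch is only that known reliabilities let conflicting readings be combined into a single correctness probability.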
Information Network Mining Example: Fact-finding

Example 1: Consider a graph of who published where (but no prior knowledge of these individuals and conferences)
• Rank conferences and authors by importance in their field
[Figure: graph linking authors Han, Roth, and Abdelzaher to venues WWW, KDD, Fusion, and Sensys]

Example 2: Consider a graph of who said what (sources and assertions, but no prior knowledge of their credibility)
• Rank sources and assertions by credibility
[Figure: graph linking sources John, Mike, and Sally to Claim1 through Claim4]

The Challenge

How to combine information from sensors and information network links to offer a rigorous quantification of QoI (e.g., correctness probability) with minimal prior knowledge?

[Figure: an information network (sources John, Mike, Sally; Claim1 through Claim4) combined with vibration sensors, acoustic sensors, and an infrared motion sensor observing a target; P(armed convoy) = ?]

Applications

Understand Civil Unrest
• Remote situation assessment
• Use Twitter feeds, news, cameras, …

Expedite Disaster Recovery
• Damage assessment and first response
• Use sensor feeds, eyewitness reports, …

Reduce Traffic Congestion
• Mapping traffic congestion in a city
• Use crowd-sourcing (of cell-phone GPS measurements), speed sensor readings, eyewitness reports, …

Approach: Back to the Basics
• Interpret the simplest fact-finder as a classical (Bayesian) sensor fusion problem
• Identify the duality between information link analysis and Bayesian sensor fusion (links = sensor readings)
• Use that duality to quantify the probability of correctness of fusion (i.e., information link analysis) results
• Incrementally extend the analysis to more complex information network models and mining algorithms

An Interdisciplinary Team

Tasks: QoI Mining Task I3.1, Fusion Task I1.1, QoI Task I1.2
• Abdelzaher (QoI, sensor fusion)
• Roth (fact-finders, machine learning)
• Aggarwal, Han (data mining, veracity analysis)

The Bayesian Interpretation

[Figure: sources John, Mike, and Sally linked to Claim1 through Claim4]

The Simplest Fact-finder:

$$\mathrm{Rank}(\mathrm{Claim}_j) = \frac{1}{\alpha_j} \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$

$$\mathrm{Rank}(\mathrm{Source}_i) = \frac{1}{\beta_i} \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$

where $\alpha_j$ and $\beta_i$ are normalization factors.

The Simplest Bayesian Classifier (Naïve Bayesian):

$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$

The Equivalence Condition

$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$

We know that for sufficiently small $x_k$:

$$\prod_k (1 + x_k) \approx 1 + \sum_k x_k$$

Consider individually unreliable sensors:

$$\frac{P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{P(\mathrm{Sensor}_k)} = 1 + x_{jk}, \quad x_{jk} \ll 1$$

Under this condition the product in the classifier reduces, to first order, to a sum over sensors, which matches the additive form of the fact-finder.

A Bayesian Fact-finder

By duality, if:
• Sensors ↔ Sources
• Measured States ↔ Claims

Then, Bayes' Theorem eventually leads to:

$$\mathrm{Rank}(\mathrm{Claim}_j) = \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$

$$\mathrm{Rank}(\mathrm{Source}_i) = \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$

and:

$$P(\mathrm{Claim}_j \mid \mathrm{network}) \propto \mathrm{Rank}(\mathrm{Claim}_j) + 1$$

$$P(\mathrm{Source}_i \mid \mathrm{network}) \propto \mathrm{Rank}(\mathrm{Source}_i) + 1$$

Fusion of Sensors and Information Networks

[Figure: measurements from Sensor1, Sensor2, and Sensor3 and measurements from an information network (Source1 through Source3, Claim1 through Claim4) feed a common fusion result]

Putting fusion of sensors and information network link analysis on a common analytic foundation:
• Can quantify probability of correctness of results
• Can leverage existing theory to derive accuracy bounds

Simulation-based Evaluation
• Generate thousands of “assertions” (some true, some false; which is which is unknown to the fact-finder)
• Generate tens of sources (each source has a different probability of being correct, also unknown to the fact-finder)
• Sources make true/false assertions consistently with their probability of correctness
• A link is created between each source and each assertion it makes
• Analyze the resulting network to determine:
  • The set of true and false assertions
  • The probability that a source is correct
• No prior knowledge of individual sources and assertions is assumed

Evaluation Results

Comparison to 4 fact-finders from the literature:
• Significantly improved prediction accuracy of source correctness probability (from 20% error to 4% error)
• (Almost) no false positives for larger networks (> 30 sources)
• Below 1% false negatives for larger networks (> 30 sources)

Coming up: The Apollo FactFinder

Abdelzaher, Adali, Han, Huang, Roth, Szymanski

Apollo: Improves fusion QoI from noisy human and sensor data.
• Demo at IPSN 2011 (in April)
• Collects data from cell-phones
• Interfaced to Twitter
• Can use sensors and human text
• Analysis on several data sets: what really happened?

Apollo Architecture

H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, “Apollo: Towards Factfinding in Participatory Sensing,” demo session at IPSN 2011, the 10th International Conference on Information Processing in Sensor Networks, April 2011, Chicago, IL, USA.

Apollo Datasets
• Track data from cell-phones in a controlled experiment
• 2 million tweets from the Egypt unrest
• Tweets on the Japan earthquake, tsunami, and nuclear emergency

Immediate Extensions

Non-independent sources
• Sources that have a common bias, sources where one influences another, etc.
• Collaboration opportunities with SCNARC and Trust

Non-independent claims
• Claims that cannot be simultaneously true
• Claims that increase or decrease each other’s probability

Mixture of reliable and unreliable sources
• More reliable sources can help calibrate correctness of less reliable sources

Road Ahead

Develop a unifying QoI-assurance theory for fact-finding/fusion from hard and soft sources

Sources
• Use different media: signals, text, images, …
• Feature different authors: physical sensors, humans

Capabilities
• Computes accurate best estimates of probabilities of correctness
• Computes accurate confidence bounds in results
• Enhances QoI/cost trade-offs in data fusion systems
• Integrates sensor and information network link analysis into a unified analytic framework for QoI assessment
• Accounts for data dependencies, constraints, context, and prior knowledge
• Accounts for the effect of social factors such as trust, influence, and homophily on opinion formation, propagation, and perception (in human sensing)

Impact: Enhanced warfighter ability to assess information

Collaborations

QoI/cost analysis (unified theory for estimation/prediction and information network link analysis)

Tasks: QoI Task I1.2, Fusion Task I1.1, QoI Mining Task I3.1, Community Modeling S2.2, Decisions under Stress S3.1, OICC Task C1.2, Sister QoI Task C1.1
• (w/Dan Roth) Account for prior knowledge and constraints
• (w/Jiawei Han) Consider new link analysis algorithms
• (w/Boleslaw Szymanski and Sibel Adali) Model humans in the loop
• (w/Aylin Yener) Increase OICC
• (w/Ramesh Govindan) Improve communication resource efficiency

Collaborations

Collaborative (multi-institution):
• Q2 (UIUC+IBM): Tarek Abdelzaher, Dong Wang, Hossein Ahmadi, Jeff Pasternack, Dan Roth, Omid Fatemieh, Hieu Le, and Charu Aggarwal, “On Bayesian Interpretation of Fact-finding in Information Networks,” submitted to Fusion 2011

Collaborative (inter-center):
• Q2 (I+SC): H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, “Apollo: Towards Factfinding in Participatory Sensing,” IPSN Demo, April 2011
• Q2 (I+SC): Mani Srivastava, Tarek Abdelzaher, and Boleslaw Szymanski, “Human-centric Sensing,” Philosophical Transactions of the Royal Society, special issue on Wireless Sensor Networks, expected in 2011 (invited)
• Invited Session on QoI at Fusion 2011 (co-chaired with Ramesh Govindan, CNARC)

Military Relevance
• Enhanced warfighter decision-making ability based on better quality assessment of fusion outputs
• A unified QoI assurance theory for fusion systems that utilize both sensors and information networks
• Offers a quantitative understanding of the benefits of exploiting information network links in data fusion
• Enhances result accuracy and provides confidence bounds in result correctness
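
Backup: a sketch of the simplest fact-finder on simulated data

The simplest fact-finder from the Bayesian Interpretation slide and the setup from the Simulation-based Evaluation slide can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the code behind the reported results. The function names, the network sizes, the range of source reliabilities, the number of assertions per source, the fixed iteration count, and the max-normalization (standing in for the normalization factors in the rank equations) are all choices made here for illustration.

```python
import random


def simulate_network(num_sources=20, num_claims=200, claims_per_source=50, seed=0):
    """Build a source-claim network in the spirit of the simulation slide.

    Hypothetical setup: the first half of the claims are true, the rest false,
    and each source has its own probability of asserting a true claim.
    """
    rng = random.Random(seed)
    truth = [j < num_claims // 2 for j in range(num_claims)]
    true_ids = [j for j, is_true in enumerate(truth) if is_true]
    false_ids = [j for j, is_true in enumerate(truth) if not is_true]
    reliability = [rng.uniform(0.5, 0.95) for _ in range(num_sources)]
    links = []  # links[i] is the set of claim ids asserted by source i
    for i in range(num_sources):
        asserted = set()
        for _ in range(claims_per_source):
            pool = true_ids if rng.random() < reliability[i] else false_ids
            asserted.add(rng.choice(pool))
        links.append(asserted)
    return truth, reliability, links


def simplest_fact_finder(links, num_claims, iterations=20):
    """Iterate the two rank equations using only the links (no priors)."""
    source_rank = [1.0] * len(links)
    claim_rank = [0.0] * num_claims
    for _ in range(iterations):
        # Rank(Claim_j): sum of the ranks of the sources asserting claim j.
        claim_rank = [0.0] * num_claims
        for i, claims in enumerate(links):
            for j in claims:
                claim_rank[j] += source_rank[i]
        top = max(claim_rank) or 1.0
        claim_rank = [r / top for r in claim_rank]  # assumed normalization
        # Rank(Source_i): sum of the ranks of the claims made by source i.
        source_rank = [sum(claim_rank[j] for j in claims) for claims in links]
        top = max(source_rank) or 1.0
        source_rank = [r / top for r in source_rank]  # assumed normalization
    return claim_rank, source_rank


if __name__ == "__main__":
    truth, reliability, links = simulate_network()
    claim_rank, source_rank = simplest_fact_finder(links, len(truth))
    true_ranks = [r for r, t in zip(claim_rank, truth) if t]
    false_ranks = [r for r, t in zip(claim_rank, truth) if not t]
    print("mean rank of true claims: ", sum(true_ranks) / len(true_ranks))
    print("mean rank of false claims:", sum(false_ranks) / len(false_ranks))
```

The fact-finder sees only the links, never `truth` or `reliability`; those are kept aside to score the result afterwards, as on the evaluation slide. The output of this sketch is a ranking only. Turning ranks into correctness probabilities is the step that the Bayesian interpretation in the talk addresses and is not attempted here.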