UNCLASSIFIED Searching for the Quantifiable, Scalable, Verifiable, and Understandable Dewey Murdick, Ph.D. Program Manager Quantitative Methods in Defense of National Security, 25 May 2010 25 May 2010 UNCLASSIFIED 1 UNCLASSIFIED Intelligence Advanced Research Projects Activity (IARPA) 25 May 2010 UNCLASSIFIED 2 UNCLASSIFIED Overview IARPA’s mission is to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries This is about taking real risk. – CAVEAT: HIGH-RISK/HIGH-PAYOFF IS NOT A FREE PASS FOR STUPIDITY. – This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc. Competent failure is acceptable; incompetence is not. “Best and brightest”. – World-class PMs. o – IARPA will not start a program without a good idea and an exceptional person to lead its execution. Full and open competition to the greatest possible extent. Cross-community focus. – Address cross-community challenges – Leverage agency expertise (both operational and R&D) – Work transition strategies and plans 25 May 2010 UNCLASSIFIED 3 UNCLASSIFIED The “P” in IARPA is very important Technical and programmatic excellence are required Each Program will have a clearly defined and measurable end-goal, typically 3-5 years out. – Intermediate milestones to measure progress are also required – Every Program has a beginning and an end – A new program may be started that builds upon what has been accomplished in a previous program, but that new program must compete against all other new programs This approach, coupled with rotational PM positions, ensures that… – IARPA does not “institutionalize” programs – Fresh ideas and perspectives are always coming in – Status quo is always questioned – Only the best ideas are pursued, and only the best performers are funded. 25 May 2010 UNCLASSIFIED 4 UNCLASSIFIED The “Heilmeier Questions” 1. What are you trying to do? 2. How does this get done at present? Who does it? What are the limitations of the present approaches? – Are you aware of the state-of-the-art and have you thoroughly thought through all the options? 3. What is new about your approach? Why do you think you can be successful at this time? – Given that you’ve provided clear answers to 1 & 2, have you created a compelling option? – What does first-order analysis of your approach reveal? 4. If you succeed, what difference will it make? – Why should we care? 5. How long will it take? How much will it cost? What are your mid-term and final exams? – What is your program plan? How will you measure progress? What are your milestones/metrics? What is your transition strategy? 25 May 2010 UNCLASSIFIED 5 UNCLASSIFIED The Three Strategic Thrusts (Offices) Smart Collection: dramatically improve the value of collected data – Innovative modeling and analysis approaches to identify where to look and what to collect. – Novel approaches to access. – Innovative methods to ensure the veracity of data collected from a variety of sources. Incisive Analysis: maximizing insight from the information we collect, in a timely fashion – Advanced tools and techniques that will enable effective use of large volumes of multiple and disparate sources of information. – Innovative approaches (e.g., using virtual worlds, shared workspaces) that dramatically enhance insight and productivity. – Methods that incorporate socio-cultural and linguistic factors into the analytic process. – Estimation and communication of uncertainty and risk. Safe and Secure Operations: countering new capabilities of our adversaries that could threaten our ability to operate effectively in a networked world – Cybersecurity o o Focus on future vulnerabilities Approaches to advancing the "science" of cybersecurity, to include the development of fundamental laws and metrics – Quantum information science & technology 25 May 2010 UNCLASSIFIED 6 UNCLASSIFIED Program Manager Interest Areas by Office safe and secure operations incisive analysis smart collection 25 20 May April2010 2010 UNCLASSIFIED 7 UNCLASSIFIED Concluding Thoughts on IARPA Technical Excellence & Technical Truth – Scientific Method – Peer/independent review – Full and open competition We are looking for outstanding PMs. How to find out more about IARPA: www.iarpa.gov 25 May 2010 UNCLASSIFIED 8 UNCLASSIFIED Conference on Technical Information Discovery, Extraction & Organization – – – Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration Held October 28-29, 2008, consisted of talks, breakout sessions, and open discussion Attended by 30+ researchers, business intelligence, and government participants Facilitated an open and active discussion on current methods, challenges, and opportunities in: – Information Retrieval This talk is a personal summary of – Text Processing the materials presented and – Knowledge Discovery – Information Extraction discussed at the conference. – Social Network Analysis – Scientometrics – Information Visualization and – Closely related research domains Goal: Drive technical innovation and explore novel applications in the area of systematically mining the global technical literature for useful and nonobvious information and insights 25 May 2010 UNCLASSIFIED 9 UNCLASSIFIED M2 Information Content Formal Presentations – Mile-wide, Mile-deep, Mark Heiligman, IARPA – Information Retrieval, Scientometrics/Text Mining,and Literature-related Discovery and Innovation, Ron Kostoff, MITRE – From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of Arizona – Machine Learning for Extraction, Integration and Mining of Research Literature, Andrew McCallum, University of Massachusetts Amherst – Information Retrieval:The Path Ahead, Jamie Callan, Carnegie Mellon University – Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University – The Accuracy of a Map of Science: Measurement & Implications, Richard Klavans, SciTech Strategies, Inc – Document Classification Using Nonnegative Matrix Factorization, Michael W. Berry, University of Tennessee, Knoxville Breakout Sessions & Open Discussion – richest idea content, and biggest contribution to what follows MITRE Summary: – A Two-step Analytic-workshop Process For Identifying Promising Research Opportunities, by Ronald Kostoff et al. 25 May 2010 UNCLASSIFIED 10 UNCLASSIFIED Problems Too Much Data / Diversity – Scale – Textual / Multimedia – Multilingual – Multiple Sources Too Complex – Motivation (Create / Disseminate) – Topics / Domains (# / Connectedness) – Shared Intentionally or Not Too Fast – Streaming Example for Technical Topics: Scientific Literature, Patents, Conference Proceedings, Talks, Technical Blogs, S&T News, Social Media, Experimental Data, Computational Models / Code, Forecasts, Corporate Filings, Government Funding, Policy, Public Opinion, etc. 25 May 2010 UNCLASSIFIED 11 UNCLASSIFIED Weak Signals in Context Find weak signals Use weak signals within context for – Finding connections – Anomaly detection/rare events – Cultural meaning / implications Manage uncertainty Development new standards for “ground truth” 25 May 2010 UNCLASSIFIED 12 UNCLASSIFIED Connecting Weak Signals Automated Connection Making / Knowledge Discovery Iterative information retrieval (IR), extraction (IE), and linkages identification Leveraging previous relevancy judgments and feedback Probabilistic linking of subjective qualities within text Goal: find high-value, low-signature information in context Material processing method X may be interesting for property Y ! Intriguing Rumors, Uncertain Source Analyst 25 May 2010 Analyst w/ Quantitative System UNCLASSIFIED Analyst 13 UNCLASSIFIED Enhancing Contextual Awareness Automatically – Leverage element characteristics in connection building process – Focused information augmentation from secondary sources – Characterize and apply to analogous situations o Network Behaviors and Features o Assessments of subjectivity (e.g., theme, sentiment) Goal: rapidly inform non-experts with context about a given area/issue Context Analyst 25 May 2010 UNCLASSIFIED 14 UNCLASSIFIED Identifying Outliers, Rare Events Automatically – Measuring and analyzing low-frequency indicators in group trends – Systematically identifying anomalies from records of interest and early-stage emerging technologies – Identifying rare events based on non-technical phrase association patterns – Extracting technical phrases of interest by targeting non-technical phrases such as sentiment, analysis, stylistics, etc. – Intelligent clustering techniques Goal: Identify significant rare events Is Jim doing something illegal? Bank statements Analyst 25 May 2010 UNCLASSIFIED 15 UNCLASSIFIED Collaboration (Two Different Kinds) Common playground facilitating: – Large-scale data sharing – Data discovery annotation – Error corrections – Multi-source integration – Recall of what has been done in the past Measure collaboration – Recognize cultural differences – Discover key players – Process changes over time 25 May 2010 UNCLASSIFIED 16 UNCLASSIFIED Multilingual Methods Need algorithms that can process, filter, and analyze multilingual data Leverage domain-specific machine translation Compare and contrast translated and multilingual data for improvements in queries, trends, etc. Language translation is high cost Translation is not enough to understand meaning in non-English text Cultural information helps to understand social landscape, motivation, and production of scientists in S&T 25 May 2010 UNCLASSIFIED 17 UNCLASSIFIED No Black Boxes No Algorithm black boxes – Shared environment for algorithm development – Success verifiable through indicator metrics – Output must be humanly comprehensible Human comprehension metrics: o Number of potential associations o Number of dimensions simultaneously analyzed o Steps to finding information o Amount of time to digest information o Amount of information at time o Efficiency of user-driven tuning of level-of-detail Algorithmic output exportable to interactive tools 25 May 2010 UNCLASSIFIED 18 UNCLASSIFIED User-Friendly Displays for Data Analysis Interactive and multifaceted views of scientific landscape – Geo-location – Entity Networks – Topical Networks Environments that provide both contextual awareness and visualizations – Contextual information (Wikipedia style) provided when user encounters unfamiliar term or concept Interactive interfaces to pull out information 25 May 2010 UNCLASSIFIED 19 UNCLASSIFIED Metric Validation Processes User studies and human labeling to verify data in information extraction(IE) and NLP is costly Use hybrid methods (e.g., boosting) Leverage automatically processed information from a external source to validate output Automating identification of trusted sources to help validation process Validate results with historical studies, knowledge of current state, and forecasts 25 May 2010 UNCLASSIFIED Serious Need for Novel Thinking 20 UNCLASSIFIED Things to Remember Track Uncertainty – Indicator metrics – Weak signals No black boxes – Human comprehensible output Provide clear view of evaluation metrics – Gold standards – Ground truth 25 May 2010 UNCLASSIFIED 21 UNCLASSIFIED Take Action Respond to an open BAA Chat with a Program Manager (PM) Come up with new ideas for programs, become a PM Provide information to open RFIs 25 May 2010 UNCLASSIFIED 22