Visual Analytics CS 4460/7450 - Information Visualization March 16, 2010 John Stasko Agenda • Overview of what the term means and how it relates to information visualization • Some example VA research projects • Specific example, Jigsaw, helping investigative analysis Spring 2010 CS 4460/7450 2 1 Acknowledgment Slides looking like this provided courtesy of Jim Thomas CS 4460/7450 3 Visual Analytics • A new term for something that is familiar to all of us • Informal description: Using visual representations to help make decisions Spring 2010 CS 4460/7450 4 2 VA vs. InfoVis • Compare & contrast Spring 2010 CS 4460/7450 5 Visual Analytics Definition Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. People use visual analytics tools and techniques to Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data Detect the expected and discover the unexpected Provide timely, defensible, and understandable assessments Communicate assessment effectively for action. CS 4460/7450 Thomas & Cook 2005 “The beginning of knowledge is the discovery of something we do not understand.” ~Frank Herbert (1920 - 1986) # 3 Visual Analytics • Not really an “area” per se More of an “umbrella” notion • Combines multiple areas or disciplines • Ultimately about using data to improve our knowledge and help make decisions Spring 2010 CS 4460/7450 7 Alternate Definition • Visual analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets Keim et al, chapter in Information Visualization: Human-Centered Issues and Perspectives, 2008 Spring 2010 CS 4460/7450 8 4 Synergy • Combine strengths of both human and electronic data processing Gives a semi-automated analytical process Use strengths from each From Keim Spring 2010 CS 4460/7450 9 Main Components Interactive visualization Analytical reasoning Spring 2010 Computational analysis CS 4460/7450 10 5 Back to InfoVis Comparison • Clearly much overlap • Perhaps fair to say that infovis hasn‟t always focused on analysis tasks so much and that it doesn‟t always include advanced data analysis algorithms Not a criticism, just not focus InfoVis has a more narrow scope (Some of us actually believe that infovis has/should include those topics) Spring 2010 CS 4460/7450 11 Academic Context Visual Analytics ~2005 Art-y casual infovis Spring 2010 Analytical reasoning Computational analysis Information Visualization ~1990 CS 4460/7450 12 6 Visual Analytics • Encompassing, integrated approach to data analysis Use computational algorithms where helpful Use human-directed visual exploration where helpful Not just “Apply A, then apply B” though Integrate the two tightly Spring 2010 CS 4460/7450 13 VA-related Areas • Visualization InfoVis & SciVis • Data management Databases, information retrieval • Data Analysis Knowledge discovery, data mining • Perception & cognition • Human-computer interaction Spring 2010 CS 4460/7450 14 7 Visual Analytics Partnership Disciplines • • • • Statistics, data representation and statistical graphics Geospatial and Temporal Sciences Applied Mathematics Knowledge representation, management and discovery – Ontology, semantics, NLP, extraction, synthesis, … • Cognitive and Perceptual Sciences • Comunications: Capture, Illustrate and present a message • Decision sciences • Information and Scientific Visualization And far more than homeland security CS 4460/7450 15 Multiple Techniques Contribute to Threat Assessment Prediction Visual Analytics Cognition Graph Matching Analysis Pattern Analysis Evidence Extraction Organization Synthesis Content Management Link Discovery Integration Connect the Dots Extraction Aggregation Data CS 4460/7450 Information Knowledge 16 8 Uses Today • • • • • • • Scientific Research Regulatory and Legal Communities Intelligence Analysis DOE and DOD Market Assessments Capability Analysis - Resumes Medical and Pharmaceutical Communities • National Security and Law Enforcement • Information Assurance, Web Analytics • Technology Scanning, Asset and Intellectual Property Management CS 4460/7450 17 Capabilities Desired • Reduce the threat of terrorism through the invention, development, evaluation, and deployment of technology to analyze masses of data in different formats and types, from different sources, with highly varying degrees of confidence levels, within time frames required for rapid decision making. • Better understand the risks and vulnerabilities of our critical infrastructures, trade, ports, and immigration by combining sensor, computational and visual analytics technologies for in-the-field and strategic decision making. • Enable rapid visual communication technology for response teams for clear understanding of the situation assessment and alternate options for response with geospatial, and multi-jurisdictional situations for WME and natural disasters. • Ensure effective information communication methods and technologies throughout DHS missions of analysis, risk, levels of alerts, and response, in unwrappable levels of assessment with evidence and communication styles aimed within audience-centric applications for rapid understanding and action. • Provide an enduring talent base of educated professionals supporting future developments requiring visual communication of integrated information and operational support missions. CS 4460/7450 18 9 Research Agenda • Available at http://nvac.pnl.gov/ in PDF form • At IEEE Press in book form • Special thanks to IEEE Technical Committee on Visualization and Graphics CS 4460/7450 19 Overview of the R&D Agenda • Challenges • Science of Analytical Reasoning • Science of Visual Representations and Interactions • Data Representations and Transformations • Production, Presentation, and Dissemination • Moving Research Into Practice • Positioning for an Enduring Success CS 4460/7450 20 10 Projects • Let‟s look at some of the research projects in this area Spring 2010 CS 4460/7450 21 IN-SPIRE™ Visual Document Analysis A “Thinking Aid” for advanced investigation of unstructured text Uncovers Common Topics in Large Document Collections Enganging Displays for Exploration Multiple Query and Search Tools Supports Real-Time Streaming Data Compatible with Foreign Languages Shows Trends over Time CS 4460/7450 http://in-spire.pnl.gov Video 22 11 HARVEST Combined user-system synthesis of knowledge Users annotate and refine existing knowledge Video Spring 2010 Gotz, Zhou & Aggarwal VAST „06 CS 4460/7450 D-DUPE 23 Video System for entity resolution in large networks such as bibliographic collections System does computational analysis and provides suggestions and user can augment and correct Spring 2010 CS 4460/7450 Bilgic et al VAST „06 24 12 Other Examples CS 4460/7450 25 Many Others • A number of nice examples shown earlier on Graph & Network visualization day Wong: Graph Signatures Perer: Social Action etc. Spring 2010 CS 4460/7450 26 13 Application Area • Investigative & Intelligence Analysis Gather information from various sources then analyze and reason about what you find and know Analyze situations, understand the particulars, anticipate what may happen Spring 2010 CS 4460/7450 27 Definitions • Thinking1 - or reasoning - involves objectively connecting present beliefs with evidence in order to believe something else • Critical Thinking1 is a deliberate meta-cognitive(thinking about thinking) thinking act whereby a person reflects on the quality of the reasoning process simultaneously while reasoning to a conclusion. • Intelligence1 is a specialized form of knowledge, an activity, and an organization. As knowledge, intelligence informs leaders, uniquely aiding their judgment and decision-making. … 1. Critical Thinking and Intelligence Analysis: David Moore CS 4460/7450 28 14 Critical Thinking* “…the quality of our life and that of what we produce, make, or build depends precisely on the quality of our thoughts.” Elements of thought: Points of View Implications & Consequences Purpose of the Thinking Question at Issue Assumptions Concepts Information Interpretation And Inference * Foundations of Critical Thinking www.criticalthinking.org CS 4460/7450 29 Example: Heuer’s Central Ideas • “Tools and techniques that gear the analyst’s mind to apply higher levels of critical thinking can substantially improve analysis… structuring information, challenging assumptions, and exploring alternative interpretations.” CS 4460/7450 30 15 Investigative Process Spring 2010 CS 4460/7450 Pirolli & Card Intl Conf Intelligence Analysis „05 31 Pain Points • Cost structure of scanning and selecting items for further attention • Analysts‟ span of attention for evidence and hypotheses Spring 2010 CS 4460/7450 32 16 HW 5 – Investigative Analysis • Did you encounter those pain points? • Discuss How did you work on the problem? What were the main challenges? Spring 2010 CS 4460/7450 33 Jigsaw • Visualization for Investigative Analysis across Document Collections Targeting analysis Law enforcement Fraud detection Intelligence community “Putting the pieces together” Spring 2010 CS 4460/7450 34 17 Problem Addressed • Help investigative analysts discover plans, plots and threats embedded across the individual documents in large document collections DBs Spring 2010 Documents/ case reports Blogs CS 4460/7450 35 Document Focus • Unstructured (plain) text documents • Semi-structured documents such as spreadsheets Spring 2010 CS 4460/7450 Examples 36 18 Our Focus • Entities within the documents Person, place, organization, phone number, date, license plate, etc. • Thesis: A plot/threat within the documents will involve a set of entities in coordination Spring 2010 CS 4460/7450 Doc 1 John Smith 37 Doc 2 Doc 3 June 1, 2009 Mary Wilson Atlanta Boston PETA Bill Jones Spring 2010 CS 4460/7450 38 19 Entity Identification • Must identify and extract entities from plain text documents Crucial for our work • Not our main research focus – Collaborate with or use tools from others Spring 2010 CS 4460/7450 39 Example Document Spring 2010 CS 4460/7450 40 20 Entities Identified Spring 2010 CS 4460/7450 41 Connections • Entities relate/connect to each other to make a larger “story” • Connection definition: Two entities are connected if they appear in a document together The more documents they appear in together, the stronger the connection Spring 2010 CS 4460/7450 42 21 Jigsaw “Putting the pieces together” • Multiple visualizations (views) of documents, entities, & their connections • Views are highly interactive and coordinated • User actions generate events that are transmitted to and (possibly) reflected in other views Spring 2010 CS 4460/7450 43 CS 4460/7450 44 System Views Spring 2010 22 The Need for Pixels 45 Even More! Spring 2010 CS 4460/7450 46 23 Even More! Spring 2010 CS 4460/7450 47 Demo Spring 2010 CS 4460/7450 48 24 Console Spring 2010 CS 4460/7450 49 Document View Spring 2010 CS 4460/7450 50 25 List View Spring 2010 CS 4460/7450 51 CS 4460/7450 52 Graph View Spring 2010 26 Report Cluster View Spring 2010 CS 4460/7450 53 Entity Connection View Spring 2010 CS 4460/7450 54 27 WordTree View Spring 2010 CS 4460/7450 55 Calendar View Spring 2010 CS 4460/7450 56 28 Document Grid View Spring 2010 CS 4460/7450 57 Scatterplot View Spring 2010 CS 4460/7450 58 29 Timeline View Spring 2010 CS 4460/7450 59 Room to Improve • What Jigsaw doesn‟t do so well now The end-part of the Pirolli-Card model Helping the analyst take notes, organize evidence, generate hypotheses, etc. (but we‟re working on it) Sometimes called “evidence marshalling” Others have focused more on that aspect… Spring 2010 CS 4460/7450 60 30 i2’s Analyst Notebook Spring 2010 CS 4460/7450 61 Analyst’s Notebook • Leading commercial tool in this space (law enforcement and intelligence agencies) • Large zooming workspace where analyst creates networks of entities and notes • Often used to produce presentation or story of analysis done Spring 2010 CS 4460/7450 62 31 Oculus’ Sandbox Video Wright et al CHI „06 Spring 2010 CS 4460/7450 63 Sandbox • Flexible space for inserting text and graphics • Objects can be dragged-and-dropped from their other analysis tools • Flexible level of detail • Flexible gestures for making space, inserting, etc. • Assertions with evidence gates • Reasoning templates Spring 2010 CS 4460/7450 64 32 PARC’s Entity Workspace Video Bier, Card & Bodnar VAST „08 Spring 2010 CS 4460/7450 65 Entity Workspace • Tools for rapid ingest of entities from documents • Can snap together entities into groups • Can indicate level of interest in objects • Four main view panels, with zooming UI Spring 2010 CS 4460/7450 66 33 Jigsaw Future Work • Collaborative capabilities • Improved evidence marshalling • Present/browse investigation history • Scalability upward • Web document ingest • Implement network algorithms • Document similarity • Sentiment analysis Spring 2010 • • • • • • • • • Wikipedia & Intellipedia Geospatial view Better timeline capabilities Document clustering by theme/topic Reliability/uncertainty Other types of data Active crawling/RSS ingest Try it on display wall Deployment to real clients CS 4460/7450 67 Related Area of Interest • Sensemaking • A general term that has been used in a number of different contexts E.g., How large corporations make decisions • To me, ultimately about people working with data and information to understand it better Spring 2010 CS 4460/7450 68 34 Sensemaking Nice definition: “A motivated , continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively.” – Klein, Moon and Hoffman IEEE Intelligent Systems „06 Spring 2010 CS 4460/7450 69 Alternate Definition “The process of creating situation awareness in situations of uncertainty” – D. Leedom, ‟01 SM Symp. Report Situation awareness: “It‟s knowing what‟s going on so you know what to do” – B. McGuinness, quoting an Air Force pilot Spring 2010 CS 4460/7450 70 35 Chang et al WireVis Information Visualization „08 • Another VA investigative analysis project • Helping Bank of America examine wire transfers of money • Want to detect fraud and illegal actions Thanks to R. Chang for some slide content Spring 2010 CS 4460/7450 71 Particulars • Who – Bank analysts • Problem – Detect money laundering and fraud in wire transfers of money • Data – Electronic records of wire transactions and information associated with each Spring 2010 CS 4460/7450 72 36 Background • Wire transfers of money can be complex Have a “from” and “to” but often many “middlemen May not know who intermediaries are • Millions of transfers per day occur Vast majority are legal • Bank has legal responsibility to report suspicious activities Spring 2010 CS 4460/7450 73 Data • Each transaction: Money amount Payer (could be third party) Payee (could be an agent) Potential intermediaries Addresses of payer and payee, instructions, additional comments are optional Spring 2010 CS 4460/7450 74 37 Challenges London Singapore Charlotte, NC Indonesia No Standard Form… When a wire leaves Bank of America in Charlotte… The recipient can appear as if receiving at London, Indonesia or Singapore Vice versa, if receiving from Indonesia to Charlotte The sender can appear as if originating from London, Singapore, or Indonesia Spring 2010 CS 4460/7450 75 Challenges • Scale: BoA may do 200k transfers per day • No international standard: loosely structured data • Bad guys are smart and one step ahead Detection tools are always reactive Spring 2010 CS 4460/7450 76 38 Existing Detection • Examine for certain temporal patterns of activity • Look for keywords in free text Filter transactions based on these highly secretive words Typically a few hundred Updated based on intelligence reports Spring 2010 CS 4460/7450 77 Current Practices • Load transactions into large relational DB • Download some amount to spreadsheets via filtering based on keywords, amounts, dates, … • Can only look at a week or two this way • Difficult to notice temporal patterns Spring 2010 CS 4460/7450 78 39 System Overview Heatmap View (Accounts to Keywords Relationship) Strings and Beads (Relationships over Time) Spring 2010 Search by Example (Find Similar Accounts) Keyword Network (Keyword Relationships) CS 4460/7450 79 Heatmap View List of Keywords Sorted by frequency from high to low (left to right) Hierarchical clusters of accounts Sorted by activities from big companies to individuals (top to bottom) Fast “binning” that takes O(3n) Spring 2010 Number of occurrences of keywords Light color indicates few occurrences CS 4460/7450 80 40 Strings and Beads Each string corresponds to a cluster of accounts in the Heatmap view Each bead represents a day Y-axis can be amounts, number of transactions, etc. Fixed or logarithmic scale Spring 2010 CS 4460/7450 Time 81 Keyword Network • Each dot is a keyword • Position of the keyword is based on their relationships Keywords close to each other appear together more frequently Using a spring network, keywords in the center are the most frequently occurring keyword • Link between keywords denote co-occurrence Spring 2010 CS 4460/7450 82 41 Search By Example Target Account Histogram depicts the occurrences of keywords User interactively selects features within the histogram used in comparison Accounts that are within the similarity threshold appear ranked (most similar on top) Similarity threshold slider Spring 2010 CS 4460/7450 83 CS 4460/7450 84 Video Spring 2010 42 This Topic • It‟s what I do now :^) • A number of the group projects are really in this space • Interested in getting more work in this area started Spring 2010 CS 4460/7450 85 Upcoming • Time Series Data Reading: Aigner et al • Spring Break – woo woo • Interaction Reading: Yi et al Spring 2010 CS 4460/7450 86 43