Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology
Cody Dunne
Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland
cdunne@cs.umd.edu
Links from this talk: bit.ly/stmwant
OECD KNOWINNO Workshop, November 14-15, 2011, Alexandria, VA, USA

Outline
1. Academic literature exploration
2. Case study: Tree visualization techniques
3. Case study: Business intelligence news
4. Case study: Pennsylvania innovations
5. STICK approach

1. Academic literature exploration
Users are looking for:
1. Foundations
2. Emerging research topics
3. State of the art/open problems
4. Collaborations & relationships between communities
5. Field evolution
6. Easily understandable surveys

Action Science Explorer

User requirements
• Control over the paper collection
– Choose custom subset via query, then iteratively drill down, filter, & refine
• Overview either as visualization or text statistics
– Orient within subset
• Easy to understand metrics for identifying interesting papers
– Ranking & filtering
• Create groups & annotate with findings
– Organize discovery process
– Share results

Action Science Explorer
• Bibliometric lexical link mining to create a citation network and citation context
• Network clustering and multi-document summarization to extract key points
• Potent network analysis and visualization tools
www.cs.umd.edu/hcil/ase

2. Case study: Tree visualization
• Problem: Traditional 2D node-link diagrams of trees become too large
• Solutions:
– Treemaps: Nested Rectangles
– Cone Trees: 3D Interactive Animations
– Hyperbolic Trees: Focus + Context
• Measures:
– Papers, articles, patents, citations,…
– Press releases, blog posts, tweets,…
– Users, downloads, sales,…

Treemaps: nested rectangles
www.cs.umd.edu/hcil/treemap-history

Smartmoney MarketMap, Feb 27, 2007
smartmoney.com/marketmap

Cone trees: 3D interactive animations
Robertson, G. G., Card, S.
K., and Mackinlay, J. D., Information visualization using 3D interactive animation, Communications of the ACM, 36, 4 (1993), 51-71.
Robertson, G. G., Mackinlay, J. D., and Card, S. K., Cone trees: Animated 3D visualizations of hierarchical information, Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York, (April 1991), 189-194.

Hyperbolic trees: focus & context
Lamping, J. and Rao, R., Laying out and visualizing large trees using a hyperbolic space, Proc. 7th Annual ACM Symposium on User Interface Software and Technology, ACM Press, New York (1994), 13-14.
Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies, Proc. SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (1995), 401-408.

Tree visualization publishing
[Figure: patents, academic papers, and trade press articles. TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees]

Tree visualization citations
[Figure: patents and academic papers. TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees]

Insights
• Emerging ideas may benefit from open access
• Compelling demonstrations with familiar applications help
• Many components to commercial success
• 2D visualizations w/spatial stability successful
• Term disambiguation & data cleaning are hard
Shneiderman, B., Dunne, C., Sharma, P. & Wang, P. (2011), "Innovation trajectories for information visualizations: Comparing treemaps, cone trees, and hyperbolic trees", Information Visualization.
http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-16

3. Case study: Business intelligence news
Proquest 2000-2009

Term                               Frequency
hyperion                           3122
data mining                        889
business intelligence              434
knowledge mgmt.                    221
data warehouse                     207
data warehousing                   139
cognos                             112
competitive intelligence           86
electronic data interchange        69
meta data                          69
decision support system            39
business process reengineering     36
data mart                          29
business analytics                 21
text mining                        19
predictive analytics               18
business performance mgmt          6
online analytical processing       5
knowledge discovery in database    1
ad hoc query                       1

PQ Business Intelligence 2000-2009
Co-occurrence of concepts with organizations: Data Mining
[Figure: line chart of Frequency by Year, one line per organization]
• National Security Agency
• NSA
• White House
• FBI
• AT&T
• American Civil Liberties Union
• Electronic Frontier Foundation
• Dept. of Homeland Security
• CIA

Business Intelligence 2000-2009
Matrix showing co-occurrence of concepts and orgs.

Business Intelligence 2000-2009: (subset)

Business Intelligence 2000-2009: Data mining
• NSA
• CIA
• FBI
• White House
• Pentagon
• DOD
• DHS
• AT&T
• ACLU
• EFF
• Senate Judiciary Committee

Business Intelligence 2000-2009:
Tech1
• Google
• Yahoo
• Stanford
• Apple
Tech2
• IBM, Cognos
• Microsoft
• Oracle
Finance
• NASDAQ
• NYSE
• SEC
• NCR
• MicroStrategy

Business Intelligence 2000-2009:
• Air Force
• Army
• Navy
• GSA
• UMD*

Insights
• Useful groupings in PQ BI terms based on events and long-term collaborators
• Interactive line charts useful for looking at co-occurrence relationships over time
• Clustered heatmaps useful for overall co-occurrence relationships
stick.ischool.umd.edu

4.
Case study: Pennsylvania innovations
• Innovation relationships during 1990
– State & federal funding
– Patents (both strong and weak ties)
– Location
• Connecting
– State & federal agencies
– Universities
– Firms
– Inventors

[Figure: innovation network diagram, shown in four successive views]
Legend: Patent Tech; SBIR (federal); PA DCED (state); Related patent
Groups: 2: Federal agency; 3: Enterprise; 5: Inventors; 9: Universities; 10: PA DCED; 11/12: Phil/Pitt metro cnty; 13-15: Semi-rural/rural cnty; 17: Foreign countries; 19: Other states
Labels added in the later views: No Location, Philadelphia, Navy, Pharmaceutical/Medical, Pittsburgh Metro, Westinghouse Electric

Insights
• Meta-layouts useful for showing:
– Groups (clusters, attributes, manual)
– Relationships between them
• User comments
– "We've never been able to see anything like this"
– "This is going to be huge"
www.terpconnect.umd.edu/~dempy/

5. STICK approach
• NSF SciSIP Program
– Science of Science & Innovation Policy
– Goal: Scientific approach to science policy
• The STICK Project
– Science & Technology Innovation Concept Knowledge-base
– Goal: Monitoring, Understanding, and Advancing the (R)Evolution of Science & Technology Innovations

STICK approach cont…
• Scientific, data-driven way to track innovations
– Vs. current expert-based, time-consuming approaches (e.g., Gartner's Hype Cycle, tire track diagrams)
• Includes both concept and product forms
– Study relationships between
• Study the innovation ecosystem
– Organizations & people
– Both those producing & using innovations
stick.ischool.umd.edu

STICK Process (overview)
• Identify concepts
– Business intelligence, cloud computing, customer relationship management, health IT, web 2.0, electronic health records, biotech
• Query data sources
– News
– Dissertation
– Academic
– Patent
– Blogs
• Processing
– Automatic entity recognition
– Crowd-sourced verification
– Co-occurrence networks
• Visualizing & analyzing
– Overall statistics
– Co-occurrence networks
– Network evolution
• Sharing results

Process
1. Collecting
2. Processing
3. Visualizing & Analyzing
4. Collaborating
[Diagram label: Cleaning]

Collecting
Identify Concepts
• Begin with target concepts
– Business Intelligence
– Health IT
– Cloud Computing
– Customer Relationship Management
– Web 2.0
– Personal Health Records
– Nanotechnology
• Develop 20-30 sub concepts from domain experts, wikis
Data Sources
• News
• Dissertation
• Academic
• Patent
• Blogs

Collecting (2)
• Form & Expand Queries
ABS( "customer relationship management" OR "customers relationship management" OR "customer relation management" ) OR TEXT(…) OR SUB(…) OR TI(…)
• Scrape Results

Processing
Automatic Entity Recognition
• BBN IdentiFinder
Crowd-Sourced Verification
• Extract most frequent 25%
• Assign to CrowdFlower
– Workers check organization names and sample sentences

Processing (2)
• Compute Co-Occurrence Networks
– Overall edge weights
– Slice by time to see network evolution
• Output: CSV, GraphML

Visualizing & Analyzing
Spotfire
• Import CSV, Database
• Standard charts
• Multiple coordinated views
• Highly scalable
NodeXL
• CSV, Spigots, GraphML
• Automate feature
– Batch analysis & visualization
• Excel 2007/2010 template

Shared data & analysis repositories
• Online Research Community
• Share data, tools, results
– Data & analysis downloads
– Spotfire Web Player
• Communication
• Co-creation, co-authoring
stick.ischool.umd.edu/community

Ongoing Work
Collecting: Additional data sources and queries
Processing: Improving entity recognition accuracy
Visualizing & Analyzing: Visualizing network evolution
• Co-occurrence network sliced by time
Collaborating: Develop the STICK Open Community site
• Motivate user participation
• Improve the resources available
• Invitation-only testing

Outline
1. Academic literature exploration
– Citation networks and text summarization
2. Case study: Tree visualization techniques
– Papers, patents, and trade press articles
3. Case study: Business intelligence news
– News term co-occurrence
4. Case study: Pennsylvania innovations
– Patents, funding, and locations
5. STICK approach
– Tracking innovations across papers, patents, news articles, and blog posts

Take Away Messages
• Easier scientific, data-driven innovation analysis:
– Automatic collection & processing of innovation data
– Easy access to visual analytic tools for finding clusters, trends, outliers
– Communities for sharing data, tools, & results

This work has been partially supported by NSF grants IIS 0705832 (ASE) and SBE 0915645 (STICK)
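
Appendix sketch: co-occurrence networks sliced by time
The "Compute Co-Occurrence Networks" step on the Processing (2) slide (overall edge weights, time slices, CSV output) might look roughly like the following. This is a minimal sketch under stated assumptions, not the STICK implementation: the input records, the field names (year, entities), the function names, and the CSV column layout are all hypothetical, and the weight used here is simply the number of documents in which two entities appear together.

```python
from collections import Counter, defaultdict
from itertools import combinations
import csv
import io

# Hypothetical input: one record per document, holding a publication year and
# the entities recognized in it (illustrative values only, not STICK data).
documents = [
    {"year": 2005, "entities": ["data mining", "NSA", "AT&T"]},
    {"year": 2006, "entities": ["data mining", "NSA", "FBI"]},
    {"year": 2006, "entities": ["data mining", "NSA"]},
]


def cooccurrence_by_year(docs):
    """Count, per year, how many documents mention each unordered entity pair."""
    slices = defaultdict(Counter)
    for doc in docs:
        # De-duplicate and sort so every pair has one canonical (a, b) order.
        names = sorted(set(doc["entities"]))
        for a, b in combinations(names, 2):
            slices[doc["year"]][(a, b)] += 1
    return slices


def overall_weights(slices):
    """Sum the yearly slices into a single edge-weight table."""
    total = Counter()
    for counts in slices.values():
        total.update(counts)
    return total


def to_csv(slices):
    """Serialize the yearly slices as an edge list (assumed column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["year", "source", "target", "weight"])
    for year in sorted(slices):
        for (a, b), weight in sorted(slices[year].items()):
            writer.writerow([year, a, b, weight])
    return buf.getvalue()


slices = cooccurrence_by_year(documents)
print(to_csv(slices))
```

An edge list of this shape is the kind of CSV that tools such as Spotfire or NodeXL can import; filtering the rows by year gives one network per time slice, while overall_weights(slices) gives the aggregate network.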