NSF/DHS FODAVA-LEAD: Missions and Plans
Haesun Park
Computational Science and Engineering Division
Georgia Institute of Technology
FODAVA Kick-off Meeting, September 2008

Data and Visual Analytics (DAVA)
• Analytical Reasoning
• Visual Representation and Interaction
• Data Representation and Transformation
• Production, Presentation, Dissemination

Data and Visual Analytics (DAVA)
Analytical Reasoning
• Apply human judgment to reach conclusions
• Methods to maximally utilize human capacity to derive deep understanding and insight into complex situations in a minimum amount of time
Data Representation and Transformation
• Representing dynamic, incomplete, conflicting data to convey important content in a form and level of abstraction appropriate to the analytical task to enable understanding
• Transforming data among possible representations to support analysis and discovery
Visual Representation and Interaction
• Visual presentation of information in ways that instantly convey important content taking advantage of human vision
• Interaction techniques (e.g., search) between the analyst and data to facilitate the analytical reasoning process
Production, Presentation, Dissemination
• Seamless integration of data acquisition, analysis, decision making, and action

A Discipline in Data & Visual Analytics
I think, therefore I am.
Foundations
• Data Representation and Transformation
• Analytical Reasoning
• Visual Representation and Interaction
• Production, Presentation, and Dissemination
“Solving a problem simply means representing it so that the solution is obvious.” Herbert Simon, 96
FODAVA is concerned with defining the mathematical and computational foundations for the Data and Visual Analytics Discipline

Applications
• Epidemiology
• Medical Informatics: theory and practice of knowledge integration, management and use in healthcare delivery, med, public health
• Bioinformatics
• Astrophysics
• Homeland Security
• Text Analysis
• Biometric Recognition
• Social Networks
• FODAVA team will perform foundational research that can be applied to many different fields
  – Common end objective is to apply knowledge in the decision-making process, at the time and place that a decision is needed
  – Common challenges across applications as well as application-specific challenges

VISION: Establishing DAVA as a Distinct Discipline
Data and Visual Analytics: Data Analytics; Visualization; Production, Presentation, Dissemination; Analytic Reasoning; Mathematical and Computational Foundations
• Establish Body of Knowledge
  – Foundations, subareas, applications
  – Curriculum
  – Education programs
• Develop FODAVA community, engage larger DAVA field
  – Researchers
  – Educators
  – Practitioners

Data and Visual Analytics Communities
• National Visualization and Analytics Center (NVAC)/VAC Consortium
• FODAVA
  – FODAVA lead
  – FODAVA partners (08, 09, …)
• RVAC / DHS Science & Technology Center of Excellence
“This partnership with NSF is the most important event since the creation of NVAC in March 2004. It brings to the front stage efforts by folks within DHS, NVAC and NSF to jointly fund the development of basic research in visual analytics supporting DHS applied mission needs.” ~Jim Thomas, NVAC Director
FODAVA will interact with several communities of researchers & practitioners

FODAVA-Lead Mission
• Research and Education: Serve as a central facility that will involve all FODAVA awardees in a common effort to develop the scientific foundations for data and visual analytics
• Effective Liaison between FODAVA Researchers and NVAC: Interface with DHS NVAC/RVAC and DHS S&T Center of Excellence in research and educational opportunities
• Community Building: Integrate diverse DAVA communities and reach out for broader participation

FODAVA-Lead Challenges
Research and Collaboration
• Creation of the Mathematical and Computational Sciences Foundations required to represent and transform all types of digital data in ways to enable efficient and effective Visualization and Analytic Reasoning
• Intrinsic Challenges: Data sets massive, heterogeneous, multi-dimensional, dirty, incomplete, time-varying; solutions must be produced with time and space constraints, …
• Understanding fundamental issues/needs in VA and communicating results
  – Isolated theoretical research is not enough
  – Problem-driven foundational research is needed

FODAVA-Lead Challenges (cont’d)
• Education and Research
  – Defining Foundations of Data and Visual Analytics
  – Undergraduate and Graduate Curriculum (core body of knowledge) for Data and Visual Analytics
• Community Building/Integration
  – A community of researchers who claim DAVA as their own discipline and FODAVA an essential part
  – Conferences, journals, books, professional society engagement
  – Industry, tech transfer, …

FODAVA-Lead PIs at GAtech
• Alex Gray, CSE: Machine Learning; Fast Algorithms for Massive DA
• Vladimir Koltchinskii, Mathematics: Machine Learning Theory; Computational Statistics
• Haesun Park (Director), CSE, Associate Chair: Numerical Computing; Data Analysis; Research, FODAVA Community Building
• Renato Monteiro, ISyE: Continuous Optimization; Statistical Computing
• John Stasko (Associate Director), IC, Associate Chair; SRVAC Co-Director: Information Vis.; Collaboration with NVAC and RVACs; Liaison with Vis. community

FODAVA-Lead Senior Personnel
• James Foley, Associate Dean CoC: Graphics and Visualization, HCI; Visual Analytics Digital Library
• Alexander Shapiro, ISyE: Stochastic Programming; Optimization; Multivariate Stat. Analysis
• Richard Fujimoto (Associate Director), CSE, Chair: Modeling and Simulation; Education and Outreach
• Santosh Vempala, CS: Theory of Computing; Director of ARC
• Guy Lebanon, CSE: Machine Learning; Computational Statistics
• Arkadi Nemirovski, ISyE: Optimization; Non-parametric Stat.
• Hongyuan Zha, CSE: Numerical Computing; Data Analysis; Director of Graduate Studies
• Hao-Min Zhou, Mathematics: Wavelet and PDE; Image Processing

2008 FODAVA Partners
• Global Structure Discovery on Sampled Spaces: Leonidas Guibas and Gunnar Carlsson (Stanford University)
• Visualizing Audio for Anomaly Detection: Mark Hasegawa-Johnson, Thomas Huang, Hank Kaczmarski, Camille Goudeseune (University of Illinois Urbana-Champaign)
• Principles for Scalable Dynamic Visual Analytics: H. Jagadish and George Michailidis (University of Michigan)
• Efficient Data Reduction and Summarization: Ping Li (Cornell University)
• Uncertainty-Aware Data Transformations for Collaborative Reasoning: Kwan-Liu Ma (UC Davis)
• Mathematical Foundations of Multiscale Graph Representations and Interactive Learning: Mauro Maggioni, Rachael Brady, Eric Monson (Duke University)
• Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces: Leland Wilkinson and Robert Grossman (University of Illinois Chicago), Adilson Motter (Northwestern University)

Expertise of FODAVA team
• Computational Math & Statistics
• Human Computer Interaction
• Information Visualization
• Database
• Real-time Systems
• Machine Learning
• Numeric & Geometric Computing
• Optimization
• Simulation
• Gaming
• Graphics and Vis.
• Information Retrieval
• High Performance Computing
• Discrete/Graph Algorithms
• Speech Recognition

FODAVA Activities
• Body of Knowledge
  – Curriculum development
  – Repository for education materials
  – Distinguished lecture series
  – Outreach to underrepresented groups
• Community Development
  – Communications: project description and results
  – FODAVA web site
    • Repository of FODAVA data sets and results
  – Conferences and meetings
    • Annual FODAVA Workshop
    • NVAC Consortium meetings
    • Activities at established meetings
    • Meetings to establish new research directions

Curriculum Development
• Goals
  – Identify and catalog curriculum development efforts in Data and Visual Analytics
    • Individual courses, minors, degree programs
    • Undergraduate and graduate level
  – Leverage existing efforts (e.g., RVAC)
  – Share experiences, develop best practices
  – Develop curriculum recommendations
• Curriculum workshop
  – POCs: Cook (NVAC), Fujimoto (FODAVA), Stasko (RVAC and FODAVA)
  – December 2008, Atlanta, Georgia

Visual Analytics Digital Library (http://vadl.cc.gatech.edu)
• Developed by Georgia Tech (Foley et al.) in Southeast Regional Visual Analytics Center
• Repository for curriculum and education materials
  – Lecture notes
  – Homeworks, projects
  – Reference materials, videos, etc.
• Includes evolving taxonomy for Data and Visual Analytics
• FODAVA will build upon this resource to
  – Provide a library and web portal of FODAVA educational materials
  – Expand support to DAVA community to include FODAVA areas
  – Document curriculum development efforts

Distinguished Lecture Series
• Goal: Provide forum for leaders in DAVA community to articulate vision and DAVA-related research and education activities and applications
• Plans (2009)
  – Lecture series featuring leaders in the data and visual analytics community
  – Develop in collaboration with FODAVA partners, NVAC, RVAC, DHS/S&T CoE
  – Webcast
Photo: Joe Kielman, VAC Consortium meeting, 2008

Outreach to Underrepresented Groups
Example: GT CRUISE Program
• CRUISE: CSE Research Undergraduate Intern Summer Experience
• Encourage students to consider PhD studies
• Diverse student participation
  – Multicultural, emphasizing minorities, women
  – U.S. and international students
• Ten-week summer research projects in areas such as data and visual analytics, high performance computing, modeling & simulation
• Interdisciplinary individual and group projects
  – Year-long collaboration with North Carolina A&T University
• CRUISE-wide events
  – Weekly seminars (technical, grad studies)
  – Social events
  – Symposium: conference-style presentations

FODAVA Website
http://fodava.gatech.edu
• Forum for FODAVA Community
• Maintain close collaboration with NVAC
• Functionality
  – Dissemination of results to user communities
  – DAVA community events and meeting information depot
  – Repository of data sets for FODAVA community

FODAVA Annual Workshop (from Fall 2009)
• Annual Theme
  – Initially more mathematically/computationally oriented
  – Increasing emphasis over time on visualization, human-computer interaction, cognitive science, …
• Organizers
  – Co-organized in collaboration among FODAVA-Lead, FODAVA-Partners, NVAC, and DHS S&T Center of Excellence
• Time
  – Co-locate with NVAC Fall Consortium meeting
• Location
  – PNNL/NVAC, Richland, WA

FODAVA Annual Workshop 2009
• Theme: Machine Learning & Geometric Computing in Visual Analytics
• Organizers: Vladimir Koltchinskii (GATech) and Mauro Maggioni (Duke)
• Time: November, 2009
• Location: PNNL/NVAC, Richland, WA

VAC Consortium Meetings
• Provides broader exposure of work to DHS and NVAC communities
• Semi-annual; next meeting: Nov 11-13, 2008, PNNL
  – Nov. 11: University Technical Exchange Day
  – FODAVA Panel session
  – FODAVA Demo/Poster session
• Please participate!

Additional Workshops
• FODAVA workshops at major conferences and meetings
• IEEE VAST Conference
  – Birds of a Feather session at VAST Oct., 2008
• Workshop on Temporal Analytics
• Other potential venues
  – International Conference on Machine Learning
  – Neural Information Processing Systems (NIPS)
  – SIAM CSE / SIAM Optimization / SIAM ALA Conferences
  – ACM Knowledge Discovery and Data Mining (KDD)
  – AAAS meeting
  – Others?

Calendar of Events
• Sept 2008: FODAVA Kick-Off Meeting
• Oct 2008: VAST 2008 BoF Session
• Nov 2008: VAC Consortium meeting, FODAVA Panel and Poster/Demo Session
• Dec 2008: DAVA Curriculum Workshop
• May 2009: VAC Consortium Meeting
• Oct 2009: VAST Conference
• Nov 2009: VAC Consortium and FODAVA Annual Workshop
• Temporal Analytics Workshop under consideration

Project Materials
• Goal: Articulate contributions being made by the FODAVA community
• Benefits
  – Potential collaborators
  – Foster technology transition opportunities
  – Broader exposure to potential sponsors
• Materials requested
  – Project brochures and other collateral material
  – Videos especially welcome
• Tell us what you’re doing!
• POC: Richard Fujimoto

Concluding Remarks
• DAVA represents a new, exciting discipline that brings together diverse communities
• Research is motivated and driven by real-world problems
• FODAVA will play a key role in developing and defining the foundations for DAVA
• Communication and collaboration with other elements of DAVA (e.g., NVAC, RVAC, DHS/S&T CoE) is essential
  – We need to educate ourselves!

Thank you!

Extra slides

Student Interns
• Support deep research collaboration between FODAVA lead, FODAVA partners, and PNNL / NVAC
  – Fundamental research driven by real-world applications
• Leverage existing intern programs at PNNL
  – Summer interns
• Leverage GT distance learning capability for academic year interns
• Details to be determined

Undergraduate Education
• Georgia Tech Threads curriculum
  – Undergraduate program defined as a set of 8 threads
  – A thread is a body of coursework targeting a certain career path, e.g., modeling and simulation, human computer interaction, embedded systems, etc.
  – Students take two threads to complete BS in CS degree
• Existing threads
  – Modeling and Simulation: representing processes/systems
  – Devices: embedded computing
  – Theory: theoretical foundations of computing
  – Information Networks: information communication
  – Intelligence: human-level intelligence
  – Media: systems for creative expression
  – People: human-centric computing
  – Platforms: computing systems, architecture, languages

Modeling & Simulation Thread
• Many students come to Georgia Tech with an inherent love for math and science
• Computation provides a framework to view, understand, analyze, and design systems
• Computational modeling is about going from [image] to [image]
• Involves developing mathematical / conceptual abstractions of systems that can be represented by efficient software
[Figures: Fluid flow model; Cellular Automata; Queueing Model]

A Data and Visual Analytics Thread?
[Diagram, top to bottom]
• Application Discipline (pick one): Aero, Civil, Elect., EAS, Biology, Chemistry, Math, Physics, Industrial Eng.
• Computational Methods for Data Analysis and Visualization (?)
• Foundations
  – Math: Discrete Math, Continuous Math
  – Computing: Theory, Software, Hardware, Algorithms
  – Science: Physics, Biology, Chemistry
• Curriculum
  – Foundational mathematics, computing, science
  – Data analytics, information visualization
  – Application-oriented specialization
• Integrated approach with capstone design project
• Natural complement to modeling and simulation thread

Application Domains
• DHS: Intelligence analysis, Law Enforcement, Emergency response, Intrusion and fraud detection, …
• BioMedical Informatics
• Bioinformatics/Systems Biology
• Astronomy
• Text Analysis: Documents, e-mails, …
• Cybersecurity
• Transportation
• …

Vladimir Koltchinskii, School of Mathematics
• Machine Learning
  – Learning Theory
  – Feature Selection
  – Theory of Sparse Recovery
  – Empirical Risk Minimization
• Computational Statistics
Sparse Recovery: for automatic determination of relevant features (basis pursuit, soft thresholding, LASSO, …). Comprehensive theory is only starting to be developed.
Penalized Empirical Risk Minimization: basis for many solutions in basic problems of learning theory, e.g., regression, classification, density estimation.
Challenge: extend the theory of sparse recovery to the broader framework of learning theory, e.g., infinite classes of functions.

Renato Monteiro, School of Industrial & Sys. Eng.
• Continuous Optimization
  – Interior-point methods
  – Semidefinite programming
  – Cone programming
  – Algorithms for large-scale optimization
• Computational Statistics and Graph Theory
Dimension Reduction and Semi-definite Programming
• Higher level of reduction with more difficult objective function
• Learning manifolds which preserve ordering of distances
• Off-the-shelf SDP software does not scale
• Design of efficient algorithms based on the first-order method, convex-concave saddle point problem

Alexander Gray, Computational Sci. & Eng.
• Goal: make machine learning efficient
  – For massive datasets, e.g., for astronomy, Large Hadron Collider, network traffic
  – For fast visualization, e.g., our new manifold learning methods
• Developed fastest practical algorithms for many learning methods
• Coming in Dec 2008: MLPACK library

John Stasko, School of Interactive Computing and GVU Center
• Information Visualization
• Human Computer Interaction
• Visualization for Investigative Analysis
Putting the Pieces Together with Jigsaw
• Help investigative analysts discover plans, plots and threats embedded across large document collections
• Multiple visualizations (views) of the documents, entities, & their connections
• Views are highly interactive and coordinated
• Analysts explore the documents and entities through the views
• Building a collaborative version
• Representing reliability and uncertainty
• Entity aliasing and hierarchy support
• Visualizing the investigative process

Haesun Park, Computational Sci. & Eng.
• Numerical Computing
• Algorithms for Massive Data Analysis
  – Dimension Reduction
  – Clustering and Classification
• Bioinformatics
  – Microarray analysis
  – Protein structure prediction
Effective Dimension Reduction with Prior Knowledge
• Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD), Orthogonal Centroid Method (OCM)
• Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF)
• Applications: Text Classification, Face Recognition, Fingerprint Classification, Gene Clustering in Microarray Analysis, …

Education and Outreach Goals
FODAVA lead will
• Encourage and coordinate development of FODAVA Curriculum
• Encourage and coordinate knowledge exchange toward creating a workforce pipeline
  – Undergraduate education
  – Graduate education
  – Lifelong learning
• Facilitate research collaboration
• Facilitate outreach to underrepresented groups

Engaging FODAVA Community
• FODAVA program provides a platform to bring together community of researchers, educators and practitioners
• Activities might include
  – Education workshops to share experiences, develop best practices
  – Curriculum development
  – Repository of information and teaching materials (e.g., SRVAC, VADL)