Institute for Visualization and Perception Research
University of Massachusetts Lowell
Computer Science Department
Lowell MA 01854
Revision date: 10 February 2016
Copyright UMass Lowell

Weave
Information Visualization for Social Responsibility

Figure 1 – Obesity in the United States. CDC county-level data shows the percentage of the adult population that is characterized as obese. Red indicates the highest levels, blue the lowest.

Table of Contents
1. Vision
2. Social Responsibility
3. First Open Source Product: Weave
4. Data
5. Analysis
6. Visualization
7. Session State
8. User Experience
9. The Near Future: High-Impact Activities
9.1. SimpleWeave
9.2. STEM (Science, Technology, Engineering and Mathematics) Learning
9.3. Large Group Collaboration
9.4. Data Commons
9.5. Sharing Visualizations
9.6. InfoMaps
9.7. ADA-compliant Weave
10. Summary: Looking to the Future

1. Vision

Our mission is to change the world by democratizing data: providing access to data anywhere, by anyone, for any purpose. By equalizing the ability of organizations and individuals to use data, we will all be better able to make informed decisions and to influence the social, political, economic and environmental distribution of resources.

Challenges to Democratizing Data

Access to data is limited. Our world faces significant problems, challenges that cannot be resolved without access to data. Yet we keep information locked away from the very people, organizations and institutions that can best help address those problems.

Data that is accessible is not easy to understand. Most data is simply not available in clear, usable formats.

It is expensive to access and analyze data. Most data remains unintelligible except to a privileged population: those with the resources and tools to manipulate, analyze and interpret it.

We will approach a true data democracy when everyone, regardless of position or power in society, is able to understand, explore and interpret data in depth; to discover, generate and test hypotheses; and to make informed decisions.
To make the best use of today's veritable explosion of data, we need methods and tools that allow every individual, organization and institution to become aware of data and to gain access to it. We also need ways to make analyzing and sharing that data easier. Such tools can effect large-scale social change by revealing trends, causes, correlations and structures in that data.

Several specific steps must be taken to attain a true data democracy.

We must improve access to data. The first step is to provide universal web-based access to all available public data. To achieve this, we need software tools that facilitate secure distribution and access.

We must provide education and training so that anyone can analyze and use data for decision-making. True, lasting change cannot occur without knowledge and understanding, and data is the foundation of such knowledge.

Data → Information → Knowledge

Data becomes information once it is structured. Information becomes knowledge once it is analyzed and visualized. That step, analysis and visualization, allows knowledge to inform decisions. A well-informed public holds decision makers accountable and enables all to make better decisions.

We must provide free, modern, web-based tools and training to all. To become truly informed, the public must have access to data analysis tools that are free, up to date, powerful, flexible, well-maintained and easy to learn and use. These tools must provide universal web-based access to all available public data and must provide methods to share discovered knowledge.

To fulfill our mission, our goal is to provide technology that will make it easy for everyone to access, analyze and share data. Though not a short-term goal, technology for democratizing data is a realizable one. In late 2011, the IVPR, in collaboration with the Open Indicators Consortium, released Weave Version 1.0 (see iWeave.org), which enables access to public data.
Weave Version 2 (anticipated release: Spring 2013) will include SimpleWeave, a streamlined version for users and small organizations with little or no IT support. SimpleWeave will allow users to easily upload, present and share their data. Weave Version 2 will also include document retrieval (InfoMaps), which will automatically link data to associated documents and reduce the time and effort spent on separate online searches. Weave Version 2 will also support collaboration: real-time sharing of data by up to hundreds of remote users. Weave Version 3 will include an Internet data engine that will enable search and access of any and all publicly available data, much as a Google search does for documents today. The final anticipated version, Weave Version 4, will include access to large national and international data sets.

These are exciting possibilities, and we're on our way. Weave will empower small and large organizations, especially non-profits, to access, analyze, visualize and use data, and to effect change through the power of data and knowledge.

Data Questions + Citizen Analyses + Visualizations => Insights and Informed Decisions

The IVPR, its faculty, staff and students, are committed to providing the public with free tools to access, visualize and analyze data. Here's how you can help us reach our goal.

Financial
Fund collaborative research or applied projects.
Fund one of the high-impact activities (see Section 9).
Donate to the University of Massachusetts Lowell Foundation, Weave Project.

Actions
Download Weave and use it to analyze data or make data available.
Work with us to make national, international and global data sets available.
Encourage the widespread use of Weave.
Promote Weave: blog about it, post about it, download it, and bring it up in conversation whenever possible.
Promote the concept that data exists as a resource for all.

Figure 2 – Trees in Boston, MA (each tree is labeled and probeable).

2. Social Responsibility

The Institute for Visualization and Perception Research (IVPR) will continue to develop free and open-source software specifically designed to support not-for-profit organizations and the general public in the exploration and presentation of data. We are using the latest software technologies to develop tools that support all aspects of data visualization and analysis. These tools can also help disseminate web-based visualizations.

Free open-source software provides four freedoms, or rights (as described by the Free Software Foundation):
The freedom to run the program for any purpose.
The freedom to study how the program works and to change it to make it do what you wish.
The freedom to redistribute copies so you can help your neighbor.
The freedom to improve the program and release your improvements (and modified versions in general) to the public so that the whole community benefits.

Figure 3 – CDC obesity data visualized by state

3. First Open Source Product: Weave

Weave is an open-source, state-of-the-art Web-based Analysis and Visualization Environment. It provides software tools that researchers, educators, analysts, trainers, students and the general public can use to analyze and visualize remote, local or distributed data (iWeave.org). Weave is the IVPR's fifth-generation visualization system and incorporates 20 years of research, including embedded patented algorithms. The previous generations were desktop versions designed and used to solve complex problems in a variety of application areas, including drug discovery, medicine, economics and national security.
Weave is being developed in conjunction with the Open Indicators Consortium (OIC). It is currently in use by the Massachusetts Department of Early Education and Care, the Massachusetts Department of Higher Education and many organizations and government agencies, including those in Boston, Chicago, Columbus, Grand Rapids, Kansas City, Seattle, Arizona, Connecticut and Rhode Island. People are using Weave to solve complex problems and to further the goal of data democratization. (See iWeave.org for examples.)

Figure 4 – MBTA bus routes in Boston, Massachusetts. Red points indicate bus stops.

4. Data

Providing universal, secure, web-based access to all available public data involves:
1. Raising public awareness that relevant data exists.
2. Finding the data.
3. Using accepted access standards.

We've addressed all three. We use standard query systems to access data, whether that data is in a local spreadsheet, a database on a server or distributed databases. We are developing the National Data Commons (NDC) to make available very large databases that are of interest to the public and to organizations. We are developing SimpleWeave (see Section 9) to facilitate Weave installations and further broaden data access. SimpleWeave will have numerous visualization examples and wizards to demystify the visualization process. It will enable the average person to become data literate, make informed decisions, and produce and share their own visualizations. We and the OIC are committed to the concept of open data and to making publicly available data accessible to all.

Figure 5 – Weave-based website of OIC member Connecticut Data Collaborative

5. Analysis

To gain knowledge, raw data must first be converted to information. This is accomplished by curating the data, which involves examining the data for errors or omissions and adding metadata (data about the data, such as information about its source, its collection and its structure).
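As a concrete illustration of curation, the Python sketch below scans a small table for omissions (missing values) and obvious errors (values outside a plausible range) and then wraps the result with descriptive metadata. The `curate` function, the `CuratedDataset` structure, the range rule and the metadata fields are hypothetical choices for this example. This is not Weave's actual import code, and the numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedDataset:
    """A table plus the metadata and issues found while curating it (hypothetical structure)."""
    rows: list[dict]
    metadata: dict
    issues: list[str] = field(default_factory=list)


def curate(rows, source, collected, valid_ranges):
    """Flag missing and out-of-range values, then attach metadata.

    valid_ranges maps a column name to an inclusive (low, high) range.
    """
    issues = []
    columns = sorted({key for row in rows for key in row})
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)  # absent keys count as missing too
            if value is None:
                issues.append(f"row {i}: missing {col}")
            elif col in valid_ranges:
                low, high = valid_ranges[col]
                if not low <= value <= high:
                    issues.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    metadata = {
        "source": source,
        "collected": collected,
        "columns": columns,
        "row_count": len(rows),
    }
    return CuratedDataset(rows=rows, metadata=metadata, issues=issues)


# Illustrative county obesity percentages (made-up numbers, not real data).
counties = [
    {"county": "A", "obesity_pct": 31.2},
    {"county": "B", "obesity_pct": None},
    {"county": "C", "obesity_pct": 131.0},  # likely a data-entry error
]
result = curate(counties, source="example survey", collected="2012",
                valid_ranges={"obesity_pct": (0, 100)})
for issue in result.issues:
    print(issue)
```

Running the sketch reports the missing value in row 1 and the impossible percentage in row 2. A curator would then correct or annotate those records before any analysis begins.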
Computational tools can be used to handle missing data efficiently; to identify correlations, outliers and patterns in data; to generate and validate hypotheses; and to convert these into actionable decisions. Weave provides significant support for data analysis, from clustering to trend analysis. Weave supports both server-based computation, using engines such as R (the R Project for Statistical Computing), and client-based computation. An equation editor allows a user to define any function on the data and generate new computed columns, titles or labels.

R can perform various analyses, including exploring trends and identifying clusters in the data. The resulting clusters or orderings of data or attributes can be used as controls for other high-dimensional visualizations or highlighted as selections in any visualization. Missing data can be imputed, correlations computed (Pearson and others) and multiple clusterings compared, and all of these results can be visualized. Weave also includes other advanced analytic tools, including Principal Component Analysis, Support Vector Machines and Clustering, Bayesian Networks, Multidimensional Scaling and Association Rules. In addition, classifiers can be generated, tested and validated.

Once results have been identified (classes, trends, outliers, messages to convey), the analyst can use Weave tools to print high-quality visualizations, to generate interactive visualizations on the web and, shortly, to make these web pages ADA-compliant.

6. Visualization

Visualizations are critical in the decision-making process. We can easily determine the largest value within a column, but determining its distribution is very difficult without visualization. Visualizations support decision makers as well as the discovery process.
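The contrast between a column's maximum and its distribution can be made concrete. In the hypothetical Python sketch below, finding the largest value takes a single call to `max`. Seeing how the values are spread, however, requires grouping them into bins first. The text histogram stands in for the kind of view a visualization tool would draw. The bin width and sample values are illustrative assumptions.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Group values into equal-width bins and draw one row of '#' per bin."""
    bins = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(bins), max(bins) + 1):
        low = b * bin_width
        # Empty bins are kept so that gaps in the data remain visible.
        lines.append(f"{low:>5}-{low + bin_width:<5} {'#' * bins.get(b, 0)}")
    return lines


# Illustrative values only, e.g. percentages for a set of regions.
values = [12, 14, 15, 15, 16, 18, 21, 22, 22, 23, 24, 38]

print("largest value:", max(values))  # one number, no sense of shape
for line in text_histogram(values, bin_width=5):
    print(line)
```

The maximum alone (38) hides the fact that most values fall between 10 and 25, followed by a gap and one isolated high value. The binned view shows both features at once. This text version is only a stand-in for a graphical histogram.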
Figure 5 – Lowell foreclosure data, Lowell, Massachusetts, showing census tracts and individual lots

Visualization Tools
All Weave visualizations can be embedded in web pages or used within Weave's integrated visualization and analysis system. The visualizations are rich, broad and flexible. They support exploratory analysis and provide interactive experiences that are also aesthetically controllable.

Analysis
Weave tightly couples analysis and visualization, thereby improving the efficiency and effectiveness of analysis. The user can generate a self-organizing map, a multidimensional scaling visualization or any other analysis-integrated visualization.

Advanced Visual Analytics
We have developed powerful new visualization and analysis technologies not just for simple data but also for high-dimensional data (hundreds of thousands of variables and millions of records). We have validated these high-dimensional visualizations in many application areas, including drug discovery, health records monitoring, economics and national security. One of these tools, RadViz, has been extended to a multiple-clustering visualization called vectorized RadViz. Vectorized RadViz shows stable sub-clusters within data and is a remarkably powerful visual and analytic tool. We have also enhanced RadViz dramatically for Weave to provide strong user support.

Graphs and Networks
With Weave, graphs can be visualized and then linked with other visualizations to greatly enhance visual analytics. To provide stable views of graphs, we have defined anchor points (user-defined, context-defined or computed from a clustering or MDS-like algorithm). The anchor point locations stay the same throughout the algorithm's iterations, thereby extending and speeding up current layout algorithms.

Merging Text and Visualizations
We have made our patented InfoMaps available within Weave.
This provides an integration of Weave with document collections, whether they are locally available, in databases or in content management systems. For example, when a user selects a subset within a visualization, relevant documents in a collection can be identified and highlighted.

7. Session State

The Session State architecture on which Weave is based provides a mechanism that tracks each activity performed within Weave. This mechanism can be used to address two issues central to modern data analysis: the need to continually evolve new analysis tools, and the ability to validate and replicate computational results.
1. To continually improve, expand and update data analysis tools, it is imperative that we understand how researchers and data explorers analyze data, discover patterns and gain insights.
2. To validate computational or visual results, analysts must be able to replicate discoveries or analyses.

Weave tracks every user action and produces a session history. These single-user or multiple-user session histories can be visualized and interpreted. They can also be used to build user profiles, construct a recommendation system, and support training and maintenance. Usage patterns may be identified and studied, sessions compared, user groups discovered and anomalous users identified.

Sessions can be shared and used as templates. Thus, any user can view a visualization and the process used to generate it, including all the analytic steps. A session history becomes an object that can be distributed for others to see and explore.

Figure 6 – Nitrogen load in New England streams displayed using river basin shape files. A snapshot of this view would be stored in the session state.

8. User Experience

Commercial Grade Software
Usable, robust, maintainable and reliable software is expected from any software vendor.
The IVPR has developed commercial-grade software for the past 20 years. This work includes graphics software for Lockheed and Intel, risk assessment software for the Avon Breast Center at Massachusetts General Hospital, and surgical simulations with a haptic (tactile feedback) system for Sensable Technologies.

Figure 7 – The Rhode Island Data Hub website is based on Weave visualizations. Other live Weave-based websites include the Metropolitan Area Planning Council (Boston), Michigan Data Collaborative (Grand Rapids) and Mid-Ohio Regional Planning Commission.

Agile Software Development Process
Weave was developed using the Agile software development process, with continual feedback from Weave users and developers. The current base of users in the OIC provides a strong usability evaluation group. Internally, our students and faculty continually test all new capabilities in academic classes, in research projects and in response to OIC members. Bugs are reported and quickly addressed.

Learning Community
The OIC supports a learning community whose members share deployment experiences, support each other via group email and hold an annual user group conference. A feature list is maintained and open to the public. It collects suggestions for long-range features and also helps identify usability issues. Weave is being extended to be ADA-compliant, and interfaces for collaboration among large groups of users are being designed and implemented.

Future Capabilities
Weave has proven to be a welcome and timely system. Its future capabilities, discussed in the next section, are broad. Our design continues to focus on providing a high-performance, usable system for everyone to analyze and visualize data.

Figure 8 – Lowell, Massachusetts: data displayed at parcel level

9. The Near Future: High-Impact Activities

Future releases of Weave will include several high-impact tools that will support our goal of providing access to data anywhere, by anyone.

9.1. SimpleWeave

As mentioned above, we are developing SimpleWeave, a release specifically designed for the average person with no IT background. SimpleWeave will not require the user to run a server. A small not-for-profit community without an IT department will be able to install Weave easily. The user will download the SimpleWeave file and execute it. They will then answer a few questions, such as which maps they would like to use (county, street, neighborhood, ...) and where their data resides (Excel file, database), and select a page template. SimpleWeave will read the appropriate shape files from the UMass Lowell/OIC server, set up the hosting pages and open a number of preset visualizations computed from the user's data. A wizard will then lead the user through the selection of visualizations and interactions. This simple setup solves a persistent problem for individuals and small nonprofits that do not have or cannot afford IT expertise, thus addressing a major impediment to data democratization.

9.2. STEM (Science, Technology, Engineering and Mathematics) Learning

Weave's session history tracks the actions taken while using Weave, and we can support the analysis and classification of this session data. This information can be used to study how students approach learning, particularly in the STEM fields. Students could be asked to use Weave to complete a STEM-focused task (for example, "study the migration of birds"). As each student proceeds in his or her own way, the system collects the student's session history. By examining these session histories, teachers can observe an individual student's problem-solving approach and monitor progress. They can also determine where a student may have gone astray or may have used an unconventional but effective strategy. Since session histories can be clustered and analyzed, patterns may appear that show common approaches to the exercise. These include patterns of steps and, even more interesting, a classification of approaches into a very small number of groups.
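To make the idea of clustering session histories concrete, here is a minimal Python sketch. It assumes that each session history can be reduced to a list of action names (the names, such as `add_scatterplot`, are hypothetical). Each session becomes a vector of action counts, and a small k-means groups the sessions. Both the representation and the choice of k-means are illustrative assumptions; they do not describe how Weave actually analyzes sessions.

```python
import math
from collections import Counter


def to_vector(session, vocabulary):
    """Count how often each action in the vocabulary occurs in a session."""
    counts = Counter(session)
    return [counts[action] for action in vocabulary]


def kmeans(vectors, k, iterations=20):
    """Plain k-means seeded with the first k vectors (deterministic, for illustration)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each session vector to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical student sessions: sequences of recorded actions.
sessions = {
    "s1": ["load_data", "add_map", "select_region", "add_map"],
    "s2": ["load_data", "add_map", "select_region"],
    "s3": ["load_data", "add_scatterplot", "compute_correlation", "add_scatterplot"],
    "s4": ["load_data", "add_scatterplot", "compute_correlation"],
}
vocabulary = sorted({a for actions in sessions.values() for a in actions})
vectors = [to_vector(actions, vocabulary) for actions in sessions.values()]
labels = kmeans(vectors, k=2)
for name, label in zip(sessions, labels):
    print(name, "-> approach", label)
```

With these toy sessions, the sketch separates the two map-based sessions (s1, s2) from the two scatterplot-and-correlation sessions (s3, s4). Each resulting group is a candidate common approach. Seeding with the first k sessions keeps the example deterministic. A real study would likely use more careful seeding (such as k-means++), a sequence-aware distance between sessions, or both.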
The resulting patterns can be analyzed together with demographic and other data to identify correlations. Such STEM tools will extend the classroom beyond current physical boundaries. They will also allow researchers to explore learning theories and can be used to provide STEM experience in non-STEM areas.

9.3. Large Group Collaboration

Weave will support groups of users collaborating across the world. Weave's current collaboration feature allows a student at home to participate in class by following what the teacher is doing and by using Weave to interact while the teacher and class watch. By extending collaboration to hundreds of users, Weave will improve the user experience in a variety of scenarios. These range from large-scale distributed classrooms to groups that wish to explore complex problems with experts from many different disciplines or viewpoints. Such tools dramatically enhance webinars by providing users with a shared interactive system for education and group discovery.

9.4. Data Commons

We will provide new and open access to the growing body of national and international data (data sets and databases) now available on the Internet from both government and non-government sources. Each data set contains metadata, information that describes the data, to facilitate access and comparison. A user will be able to search for databases that answer a query such as "availability of Meals-on-Wheels in cities with low crime and fewer than 100,000 people." The Data Commons will search its own databases as well as publicly indexed databases and return links to those databases that satisfy the query. That data will then be viewable in Weave. Such a tool will allow any organization to compare itself with others, to see how others are evolving, to perform queries across the collection of data and to enrich its research with these external data sets.

9.5. Sharing Visualizations

We will provide tools to exchange and share visualization sessions.
Weave's session history allows a user to save both a visualization and the process (the sequence of steps taken by the user) that generated it. A user will be able to send the visualization session (session history) to a colleague, who can examine not just the visualized data but also the steps that generated it. When the session is made public, others can see the steps that were taken and the results that were obtained, and can begin to understand and improve the process. This is very similar to the STEM application of Weave described above. This tool will allow sites and organizations to let their members or clients upload and share visualizations and sessions. The session history can be replayed at any time and can be used as a teaching or training tool.

9.6. InfoMaps

As mentioned above, future versions of Weave will include InfoMaps technology to help users visually manage and monitor information from multiple sources. InfoMaps allows the user to pre-select subject areas or sources to monitor. These could include patents, publications, newsletters, web pages or other text collections, whether private, public or subscribed. Once these selections are made, InfoMaps automatically updates and presents the information in a user-defined visual format on the desktop.

InfoMaps technology has been integrated with Weave. Any user data selection in Weave automatically filters the documents on the InfoMaps display for those most relevant. InfoMaps links Weave-selected data and provides immediate access to relevant documents, news and website information in real time, all within a single display. For example, a user may select a subset of a scatterplot that might include population, obesity and education. Relevant documents or news reports would then be made available within a window using an InfoMaps layout.
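The coupling between a data selection and a document collection can be sketched as follows. This is not the patented InfoMaps algorithm, which the text does not describe. It is a hypothetical stand-in that scores each document by how many terms from the current selection (attribute names and selected place names) it mentions, and then keeps the best matches. The document texts and the scoring rule are illustrative assumptions.

```python
import re


def tokenize(text):
    """Return the set of lower-case word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_documents(selection_terms, documents, top_n=2):
    """Rank documents by how many selection terms they contain; drop non-matches.

    Selection terms are assumed to be single words.
    """
    query = {term.lower() for term in selection_terms}
    scored = []
    for title, body in documents.items():
        score = len(query & tokenize(body))
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# A hypothetical selection from a scatterplot of obesity and education.
selection = ["obesity", "education", "Lowell"]
documents = {
    "School nutrition report": "Obesity rates and education outcomes in Lowell schools.",
    "Transit plan": "New bus routes for Boston and Lowell.",
    "Weather summary": "Snowfall totals for the winter season.",
}
print(relevant_documents(selection, documents))
```

Here the selection terms match three words in the nutrition report and one in the transit plan, so the report ranks first and the weather summary is filtered out. A production system would use richer relevance measures than raw term overlap.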
InfoMaps addresses the problem of dealing with the massive quantity of information that individuals need by making that information accessible automatically as the data is explored. Without InfoMaps, users must continually perform web search queries and cut and paste between the Weave and information retrieval displays and pages. InfoMaps is easy to configure and use: it takes only five minutes to set up and one minute to learn.

9.7. ADA-compliant Weave

According to the Americans with Disabilities Act of 1990 (ADA), any information made available online by a publicly funded entity must also be made available to people with disabilities. This poses a challenge to agencies that wish to use visualizations to share information, since visually impaired individuals may not have access to the information in a visualization in the same way that others do.

The main goal of visualization is to take data, most often in amounts too large to grasp easily by reading a table or text, and communicate a message at a glance. Blind or visually impaired people do not have the benefit of gaining a message at a visual glance. The challenge is to convey that message to visually impaired users so that they comprehend the same message as a person with no visual impairment. Another challenge is to create an automated process that detects the messages contained within a visualization.

Current methods to make visualizations accessible to visually impaired individuals use "tags", descriptors that are read to the user via commonly used screen readers such as Window-Eyes or JAWS. These tags must be created manually for each visualization, an approach that does not scale. To make visualizations accessible to the visually impaired, we propose to use the session history feature of Weave, which would allow us to share specific descriptions of each visualization.
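Here is a minimal Python sketch of the proposed idea. It assumes a simplified, hypothetical session-state record for a single bar chart containing the tool type, title, axis attributes and plotted values. Weave's real session state is far richer, and its format is not shown here. The function reads the record and produces an English sentence that a screen reader could speak. The sentence includes the largest and smallest categories, which is the kind of message a sighted user might take in at a glance.

```python
def describe_bar_chart(state):
    """Turn a (hypothetical) bar-chart session-state record into spoken English."""
    if state["tool"] != "BarChart":
        raise ValueError("this sketch only describes bar charts")
    data = state["data"]  # mapping of category -> value
    highest = max(data, key=data.get)
    lowest = min(data, key=data.get)
    return (
        f"{state['title']}. A bar chart of {state['y_attribute']} "
        f"by {state['x_attribute']} with {len(data)} bars. "
        f"The highest is {highest} at {data[highest]}; "
        f"the lowest is {lowest} at {data[lowest]}."
    )


# Hypothetical state record; the values are illustrative, not real statistics.
state = {
    "tool": "BarChart",
    "title": "Adult obesity by region",
    "x_attribute": "region",
    "y_attribute": "percent obese",
    "data": {"North": 24.0, "South": 31.5, "East": 27.2, "West": 22.8},
}
print(describe_bar_chart(state))
```

For the illustrative record above, the sketch produces "Adult obesity by region. A bar chart of percent obese by region with 4 bars. The highest is South at 31.5; the lowest is West at 22.8." Deciding which facts to include (here, the extremes) is exactly the modeling question the proposal raises. The sentence template is only a placeholder for that model.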
The proposed method would use Weave session history to automatically detect certain features of the visualization and provide that information to the user via a screen reader. As described above, the Weave architecture is based on the concept of session states. All significant actions made within the system, including visual parameter settings, tool properties and user interactions, are recorded, and this "snapshot of history" is stored as a session state. The session state contains all of the information required to restore a given Weave visualization or instance. To create a visualization that is accessible to the visually impaired, the data contained within the session state can be automatically passed to a screen reader to provide an audible description of the visualization.

Although the Weave session state contains all of the information necessary to re-create a visualization automatically, we must determine which part of this information is needed to describe the visualization accurately. We will develop a model of the syntax required to describe a visualization accurately in English. Once designed, the model will be tested on both sighted and visually impaired individuals. Then, with an understanding of the descriptors needed to communicate the message of a visualization, we will apply the model to Weave, using the session state data to present the visualization in spoken English. This new ADA-compliant visualization will then be tested on both sighted and visually impaired individuals.

Although the primary focus of this project is to address ADA compliance needs, the ability to provide audible information about visualizations is also applicable to the general population. This is particularly true for users whose eyesight is compromised by age or by a short-term illness or injury.

Figure 10 – Demographics of Atlanta, Georgia metropolitan area by county

10. Summary: Looking to the Future

Looking to the future, the problems we face with data require new approaches, methods and tools that will allow every individual, group and organization to easily access, analyze and share data.

Weave provides these tools. Weave is free, powerful, modern, flexible, well-supported and easy to learn and use. Weave provides universal web-based access to available public data. This allows the public to make informed decisions and thus to effect change.

Our mission is to democratize data. We will change the world by making data available through Weave, providing visualizations for everyone and thereby enabling informed decision-making. Weave and its future releases will bridge the current data divide. Having data access is a start. Having tools that support and encourage continued access and analysis will ensure progress toward our goal of equal access to data.

That is Weave: technology that will enable everyone, organizations and individuals alike, to make informed decisions and to influence the social, political, economic and environmental distribution of resources.