Weave - Community Science

Institute for Visualization and Perception Research
University of Massachusetts Lowell
Computer Science Department
Lowell MA 01854
Revision date – 10 February 2016 – Copyright UMass Lowell
Weave
Information Visualization for Social Responsibility
Figure 1 – Obesity in the United States. CDC county-level data shows the percentage of the adult population that is
characterized as obese. Red indicates the highest levels, blue the lowest.
Table of Contents
1. Vision
2. Social Responsibility
3. First Open Source Product: Weave
4. Data
5. Analysis
6. Visualization
7. Session State
8. User Experience
9. The Near Future: High-Impact Activities
9.1. SimpleWeave
9.2. STEM (Science, Technology, Engineering and Mathematics) Learning
9.3. Large Group Collaboration
9.4. Data Commons
9.5. Sharing Visualizations
9.6. InfoMaps
9.7. ADA-compliant Weave
10. Summary: Looking to the Future
1. Vision
Our mission is to change the world by democratizing data -- by providing access to
data anywhere by anyone for any purpose.
By equalizing the ability of organizations and individuals to use data, we all will be
better able to make informed decisions and to influence the social, political,
economic and environmental distribution of resources.
Challenges to Democratizing Data:
• Access to data is limited. Our world faces significant problems, challenges that
cannot be resolved without access to data. Yet we keep information locked away
from the very people, organizations and institutions that can best help address
those problems.
• Data that is accessible is not easy to understand. Most data is simply not
available in clear, usable formats.
• It is expensive to access and analyze data. Most data remains unintelligible
except to a privileged population – those with the resources and tools to
manipulate, analyze and interpret it.
We will approach a true data democracy when everyone – regardless of position or power
in society – is able to understand, explore and interpret data in depth; to discover,
generate and test hypotheses; and to make informed decisions.
To make the best use of today’s veritable explosion of data, we need methods and tools
that allow every individual, organization and institution to raise their awareness of and
increase access to data. We also need ways to make analysis and sharing of that data
easier. Such tools will effect large-scale social change by determining trends, causes,
correlations and structures in that data.
Several specific steps must be taken to attain a true data democracy.

• We must improve access to data. The first step is to provide universal web-based
access to all available public data. To achieve this, we need software tools
that facilitate secure distribution and access.
• We must provide education and training so that anyone can analyze and use
data for decision-making. True lasting change cannot occur without knowledge
and understanding. Data is the foundation of such knowledge.
Data → Information → Knowledge

Data becomes information once it is structured. Information becomes knowledge
once it is analyzed and visualized. That step – analysis and visualization – allows
knowledge to inform decisions. A well-informed public holds decision makers
accountable and enables all to make better decisions.

• We must provide free, modern, web-based tools and training to all. To
become truly informed, the public must have access to data analysis tools that are
free, up to date, powerful, flexible, well-maintained and easy to learn and use.
These tools must provide universal web-based access to all available public data
and must provide methods to share discovered knowledge.
To fulfill our mission, our goal is to provide technology that will make it easy for
everyone to access, analyze and share data.
Though not a short-term goal, technology for democratizing data is a realizable one.
In late 2011, the IVPR, in collaboration with the Open Indicators Consortium,
released Weave Version 1.0 (see iWeave.org) which enables access to public
data.
Weave Version 2 (anticipated release: Spring 2013) will include SimpleWeave, a
streamlined version, for users and small organizations with little or no IT support.
SimpleWeave will allow users to easily upload, present and share their data.
Weave Version 2 will also include document retrieval (InfoMaps), which will
automatically link data to associated documents, reducing the time and effort for
separate online searches. Weave Version 2 will support collaboration: real-time
sharing of data by up to hundreds of remote users.
Weave Version 3 will include an internet data engine that will enable search and
access of any and all publicly available data much like a Google search for
documents today.
The final anticipated version, Weave Version 4, will include access to large
national and international data sets.
These are exciting possibilities and we’re on our way.
Weave will empower small and large organizations, especially non-profits, to access,
analyze, visualize, and use data, and effect change through the power of data and
knowledge.
Data Questions + Citizen Analyses + Visualizations => Insights and Informed Decisions
The IVPR, its faculty, staff and students, are committed to providing the public with
free tools to access, visualize, and analyze data.
Here’s how you can help us support our goal.
Financial
• Fund collaborative research or applied projects
• Fund one of the high-impact activities (see Section 9)
• Donate to the University of Massachusetts Lowell Foundation -- Weave Project
Actions
• Download Weave and use it to analyze data or make data available
• Work with us to make national, international, and global data sets available
• Encourage the widespread use of Weave
• Promote Weave, blog about it, post about it, download it, and bring it up in
conversation whenever possible
• Promote the concept that data exists as a resource for all
Figure 2 – Showing trees in Boston MA (each is labeled and probeable).
2. Social Responsibility
The Institute for Visualization and Perception Research (IVPR) will continue to develop
free and open-source software specifically designed to support not-for-profit
organizations and the general public in the exploration and presentation of data.
We are using the latest software technologies to develop software tools that support all
aspects of data visualization and analysis. These tools can also help disseminate
web-based visualizations.
Free open-source software provides four freedoms or rights (as described by the Free
Software Foundation):
• The freedom to run the program for any purpose.
• The freedom to study how the program works and to change it to make it do what
you wish.
• The freedom to redistribute copies so you can help your neighbor.
• The freedom to improve the program and release your improvements (and
modified versions in general) to the public so that the whole community benefits.
Figure 3 - CDC Obesity data visualized by state
3. First Open Source Product: Weave
Weave is an open-source state-of-the-art Web-based Analysis and Visualization
Environment which provides software tools that researchers, educators, analysts, trainers,
students and the general public can use to analyze and visualize remote, local or
distributed data (iWeave.org).
Weave is the IVPR’s fifth-generation visualization system and incorporates 20 years of
research with embedded patented algorithms. The previous generations were desktop
versions designed and used to solve complex problems in a variety of application areas
including drug discovery, medicine, economics and national security.
Weave is being developed in conjunction with the Open Indicators Consortium (OIC) and
is currently in use by the Massachusetts Department of Early Education and Care, the
Massachusetts Department of Higher Education and many organizations and government
agencies including those in Boston, Chicago, Columbus, Grand Rapids, Kansas City,
Seattle, Arizona, Connecticut and Rhode Island.
People are using Weave to solve complex problems and to further the goal of data
democratization. (See iWeave.org for examples.)
Figure 4 - MBTA bus routes in Boston, Massachusetts. Red points indicate bus stops.
4. Data
Providing universal secure web-based access to all available public data involves:
1. Raising public awareness that relevant data exists
2. Finding the data.
3. Using accepted access standards.
We’ve addressed all three.
• We use standard query systems to access data, whether that data is in a local
spreadsheet, a database on a server or distributed databases.
• We are developing the National Data Commons (NDC) to make available very large
databases that are of interest to the public and organizations.
• We are developing SimpleWeave (see Section 9) to facilitate Weave installations and
further broaden data access. SimpleWeave will have numerous
visualization examples and wizards to demystify the visualization process. It will
enable the average person to become data literate, make informed decisions, and
produce and share their own visualizations.
We and the OIC are committed to the concept of open data and making publicly
available data accessible to all.
Figure 5 -- Weave-based website of OIC member, Connecticut Data Collaborative
5. Analysis
To gain knowledge, raw data must first be converted to information. This is
accomplished by curating the data which involves examining the data for errors or
omissions and adding metadata (data about the data such as information about its source,
collection and its structure). Computational tools can be used to efficiently handle
missing data, to identify correlations, outliers and patterns in data, to generate and
validate hypotheses and to convert these into actionable decisions.
Weave provides significant support for data analysis—from clustering to trend analysis.
Weave has both server-based computational engines such as the R-project for statistics or
analysis and client-based computation. An equation editor allows a user to define any
function on the data and generate new computed columns, titles or labeling.
R-project can perform various analyses including exploration of trends or identification
of clusters of data. The resulting clusters or ordering of data or attributes can be used as
controls for other high-dimensional visualizations or highlighted as selections in any
visualization. Missing data can be imputed, correlations computed (as per Pearson and
others) and multiple clusterings compared and all visualized. Weave also includes other
advanced analytic tools including Principal Component Analysis, Support Vector
Machines and Clustering, Bayesian Networks, Multidimensional Scaling and Association
Rules.
In addition, classifiers can be generated, tested and validated.
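As a rough illustration of two of the steps named above (imputing missing data and computing a Pearson correlation), the following Python sketch works on two small columns. It is not Weave's or R's interface: the function names, the mean-imputation strategy and the sample values are all assumptions chosen for the example.

```python
from math import sqrt

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical columns (made-up values); None marks a missing measurement.
column_a = [30.0, None, 25.0, 35.0]
column_b = [28.0, 26.0, 22.0, 32.0]

column_a_filled = impute_mean(column_a)   # the gap is filled with the mean of 30, 25, 35
r = pearson(column_a_filled, column_b)    # strength of the linear relationship
```

Mean imputation is only the simplest possible choice here; a real analysis would pick an imputation method suited to the data.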
Once results have been identified (classes, trends, outliers, messages to convey), the
analyst can use Weave tools to print high quality visualizations, to generate interactive
visualizations on the web, and shortly, to make these web-pages ADA compliant.
6. Visualization
Visualizations are critical in the decision-making process. We can easily determine the
largest value within a column, but to determine its distribution is very difficult without
visualization. Visualizations support decision makers as well as the discovery process.
Figure 5 - Lowell Foreclosure data, Lowell, Massachusetts, showing census tracts and individual lots
Visualization Tools
All Weave visualizations can be embedded in web pages or used within Weave’s
integrated visualization and analysis system. The visualizations are rich, broad and
flexible. They support exploratory analysis and provide interactive experiences whose
appearance the user can control.
Analysis
Weave tightly couples analysis and visualization thereby improving the efficiency and
effectiveness of analysis. The user can generate a self-organizing map, a multidimensional scaling visualization or any other analysis-integrated visualization.
Advanced Visual Analytics
We have developed new powerful visualization and analysis technologies not just for
simple data but also for high-dimensional data (hundreds of thousands of variables and
millions of records). We have validated these high-dimensional visualizations in many
application areas including drug discovery, health records monitoring, economics and
national security. One of these tools, RadViz, has been extended to a multiple clustering
visualization called vectorized RadViz. Vectorized RadViz shows stable sub-clusters
within data and is a remarkably powerful visual and analytic tool. We have also enhanced
RadViz dramatically for Weave to provide strong user support.
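For readers unfamiliar with RadViz, the basic (textbook) projection can be sketched in a few lines of Python. This is the standard formulation only, not the vectorized RadViz or the enhanced Weave version described above; the function names and sample records are assumptions made for illustration. Each dimension gets an anchor on the unit circle, and each record is drawn at the average of the anchor positions weighted by its normalized values.

```python
from math import cos, sin, pi

def radviz_anchors(n_dims):
    """Place one anchor per dimension, evenly spaced on the unit circle."""
    return [(cos(2 * pi * j / n_dims), sin(2 * pi * j / n_dims))
            for j in range(n_dims)]

def normalize_columns(rows):
    """Min-max scale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

def radviz_project(rows):
    """Map each record to the weighted average of the anchor positions."""
    anchors = radviz_anchors(len(rows[0]))
    points = []
    for weights in normalize_columns(rows):
        total = sum(weights)
        if total == 0:
            points.append((0.0, 0.0))  # an all-zero record sits at the center
            continue
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points

# Hypothetical four-dimensional records.
records = [
    [1.0, 0.0, 0.0, 0.0],   # dominated by dimension 0
    [0.0, 1.0, 0.0, 0.0],   # dominated by dimension 1
    [1.0, 1.0, 1.0, 1.0],   # balanced across all dimensions
]
points = radviz_project(records)
```

A record dominated by one dimension lands on that dimension's anchor, while a balanced record is pulled equally in all directions and lands at the center.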
Graphs and Networks
With Weave, graphs can be visualized and then linked with other visualizations to greatly
enhance visual analytics. In order to provide stable views of graphs, we have defined
anchor points (user defined, context defined or computed from some clustering or MDS-like algorithm). The anchor point locations stay the same throughout the algorithm
iterations thereby extending and speeding up current layout algorithms.
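The idea of keeping anchor points fixed while the rest of the layout is iterated can be sketched as follows. The sketch uses a simple barycentric relaxation (each free node moves to the mean position of its neighbors), which is swapped in here purely for illustration; it is not claimed to be the layout algorithm Weave uses, and the function and node names are hypothetical.

```python
def layout_with_anchors(edges, anchors, iterations=100):
    """Iteratively place free nodes at the mean of their neighbors' positions.

    `anchors` maps node -> fixed (x, y); those positions are never updated.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Free nodes start at the origin; anchors start (and stay) where given.
    pos = {node: anchors.get(node, (0.0, 0.0)) for node in neighbors}
    for _ in range(iterations):
        new_pos = dict(pos)
        for node, nbrs in neighbors.items():
            if node in anchors:
                continue  # anchor points keep their location
            new_pos[node] = (
                sum(pos[n][0] for n in nbrs) / len(nbrs),
                sum(pos[n][1] for n in nbrs) / len(nbrs),
            )
        pos = new_pos
    return pos

# Hypothetical three-node path with both end points anchored.
edges = [("a", "m"), ("m", "b")]
anchors = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
positions = layout_with_anchors(edges, anchors)
```

Because the anchors never move, repeated runs give the same picture, which is the stability property the text describes.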
Merging Text and Visualizations
We have made our patented InfoMaps available within Weave. This provides an
integration of Weave with document collections, whether locally available, in databases
or in content management systems. For example, when selecting a subset within a
visualization, relevant documents in a collection can be identified and highlighted.
7. Session State
The Session State architecture on which Weave is based provides a mechanism that
tracks each activity performed within Weave. This mechanism can be used to address two
issues central to modern data analysis: the need to continually evolve new analysis tools
and the ability to validate and replicate computational results.
1. To continually improve, expand and update data analysis tools, it is imperative
that we understand how researchers and data explorers analyze data, discover
patterns, and gain insights.
2. To validate computational or visual results, analysts must be able to replicate
discoveries or analyses.
Weave tracks every user action and produces a session history. These single-user or
multiple-user session histories can be visualized and interpreted, used to build user
profiles, to construct a recommendation system, and to support training and maintenance.
Usage patterns may be identified and studied, sessions compared, user groups discovered
and anomalous users identified. Sessions can be shared and used as templates. Thus, any
user can view the visualization and the process including all the analytic steps used to
generate it. A session history becomes an object which can be distributed for others to
see and explore.
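A minimal Python sketch of the session-history idea follows: every change is logged, and the log can be replayed to reconstruct the state at any step. The class and its methods are hypothetical and far simpler than Weave's actual Session State architecture; they only illustrate why a recorded history makes results reproducible and shareable.

```python
import copy

class SessionHistory:
    """Record state changes as (key, value) entries that can be replayed."""

    def __init__(self, initial_state=None):
        self.initial = copy.deepcopy(initial_state or {})
        self.state = copy.deepcopy(self.initial)
        self.log = []

    def set(self, key, value):
        """Apply a change and append it to the history log."""
        self.log.append((key, copy.deepcopy(value)))
        self.state[key] = copy.deepcopy(value)

    def replay(self, steps=None):
        """Rebuild the state after the first `steps` changes (all if None)."""
        state = copy.deepcopy(self.initial)
        for key, value in self.log[:steps]:
            state[key] = copy.deepcopy(value)
        return state

# Hypothetical user actions; the keys and values are made up.
session = SessionHistory({"tool": "map"})
session.set("color_column", "obesity_rate")
session.set("selection", ["MA", "CT"])
session.set("tool", "scatterplot")

after_two_steps = session.replay(2)   # state before the switch to a scatterplot
final_state = session.replay()        # identical to the live state
```

Sending the `log` together with the initial state to another user is enough for that user to reproduce every intermediate view.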
Figure 6 - Nitrogen load in New England streams displayed using river basin shape files. Snapshot
would be stored in the session state
8. User Experience
Commercial Grade Software
Usable, robust, maintainable and reliable software is expected from any software vendor.
The IVPR has developed commercial-grade software for the past 20 years, including
graphics software for Lockheed and Intel, risk assessment software for the Avon Breast
Center at Massachusetts General Hospital and surgical simulations with a haptic (tactile)
feedback system for Sensable Technologies.
Figure 7 – The Rhode Island Data Hub website is based on Weave visualizations. Other live Weave-based websites
include: Metropolitan Area Planning Council (Boston), Michigan Data Collaborative (Grand Rapids) and Mid-Ohio
Regional Planning Commission.
Agile Software Development Process
Weave was developed using the Agile Software Development Process with continual
feedback from Weave users and developers. The current base of users in the OIC
provides a strong usability evaluation group. Internally our students and faculty
continually test all new capabilities in academic classes, in research projects and in
response to OIC members. Bugs are reported and quickly addressed.
Learning Community
The OIC supports a learning community that shares deployment experiences, supports
each other via group emails and holds an annual user group conference. A feature list is
maintained and open to the public; it is used to suggest desired long-range features and
also serves to identify usability issues. Weave is being extended to be ADA compliant,
and large multiple-user collaboration interfaces are being designed and implemented.
Future Capabilities
Weave has proven to be a welcome and timely system. Its future capabilities, discussed in
the next section, are broad, and our design continues to focus on providing a
high-performance and usable system for everyone to analyze and visualize data.
Figure 8 - Lowell, Massachusetts – data displayed at parcel level
9. The Near Future: High-Impact Activities
Future releases of Weave will include several high-impact tools that will support our goal
of providing access to data anywhere by anyone.
9.1. SimpleWeave
As mentioned above, we are developing SimpleWeave, a release specifically designed for
the average person with no IT background. SimpleWeave will not require use of a server.
A small not-for-profit community without an IT department will be able to easily install
Weave. The user will download the SimpleWeave file, execute it, answer a few questions
such as what maps they would like to use (county, street, neighborhood, ...), where their
data resides (Excel file, database), and select a page from a template.
SimpleWeave will read the appropriate shape files from the UMass Lowell/OIC server,
set up the hosting pages and open a number of preset visualizations computed from the
user’s data. A wizard will then lead the user through the selection of visualizations and
interactions. This simple set-up solves a persistent problem for individuals and small non-profits that do not have or cannot afford IT expertise, thus addressing a major
impediment to data democratization.
9.2. STEM (Science, Technology, Engineering and Mathematics) Learning
Weave session history tracks the actions taken while using Weave. We can support the
analysis and classification of Weave session data. This information can be used to study
how students approach learning, particularly in the STEM fields. Students could be asked
to use Weave to complete a STEM-focused task (for example, “study the migration of
birds”). As each student proceeds in his/her own way the system collects the student’s
session history. By examining these session histories, teachers can observe the individual
student’s problem-solving approach, monitor progress and determine where a student
may have gone astray or may have used an unconventional but effective strategy. Since
session histories can be clustered and analyzed, patterns may appear which show
common approaches to that exercise (patterns of steps and, even more interesting, a
classification of those approaches into a very small number of groups). These can be analyzed with demographic and
other data to identify correlations.
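One way to compare session histories before clustering them is to measure how many steps must be inserted, deleted or replaced to turn one student's sequence of actions into another's. The Python sketch below uses the Levenshtein edit distance for this; the choice of distance, the action names and the student labels are all assumptions for illustration, not a description of how Weave analyzes sessions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of action names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]

# Hypothetical session histories reduced to action names.
sessions = {
    "student_1": ["load_data", "open_map", "filter", "open_scatterplot"],
    "student_2": ["load_data", "open_map", "open_scatterplot"],
    "student_3": ["load_data", "open_table", "sort", "export"],
}

d12 = edit_distance(sessions["student_1"], sessions["student_2"])
d13 = edit_distance(sessions["student_1"], sessions["student_3"])
```

Here students 1 and 2 differ by a single step while student 3 took a clearly different route, so a clustering built on these distances would group the first two together.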
Such STEM tools will extend the classroom beyond current physical boundaries. They
also will allow researchers to explore learning theories and can be used to provide STEM
experience in non-STEM areas.
9.3. Large Group Collaboration
Weave will support groups of users collaborating across the world. Weave's current
collaboration feature allows a student who is at home to participate in class by following
what the teacher is doing and using Weave to interact while the teacher and class watch.
By extending collaboration to hundreds of users, Weave will improve user experience in
a variety of scenarios from large-scale distributed classrooms to groups that wish to
explore complex problems with experts from many different disciplines or viewpoints.
Such tools dramatically enhance webinars by providing users with a shared interactive
system for education and group discovery.
9.4. Data Commons
We will provide new and open access to the growing body of national and international
data (data sets and databases) now available on the Internet from both government and
non-government sources. Each data set contains metadata, information that describes that
data, in order to facilitate access and comparison. A user will be able to search for
databases that satisfy a query such as “availability of Meals-on-Wheels in cities with low
crime and fewer than 100,000 people.” The Data Commons will search
its databases as well as the public indexed databases and return links to those databases
that satisfy the query. That data will then be viewable in Weave.
Such a tool will allow any organization to compare itself with others, to see how others
are evolving, to perform queries across the collection of data and to enrich their research
with these external datasets.
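The role of metadata in such a search can be sketched in Python as follows. The catalog entries, field names and the `find_datasets` function are hypothetical and do not reflect the Data Commons schema; the sketch only shows how a query can be answered from metadata alone, returning the data sets worth opening in Weave.

```python
# Hypothetical catalog; every name and field below is an assumption.
catalog = [
    {
        "name": "senior_meal_programs",
        "topics": {"meals-on-wheels", "seniors"},
        "columns": {"city", "population", "crime_rate", "meal_sites"},
    },
    {
        "name": "city_crime_statistics",
        "topics": {"crime"},
        "columns": {"city", "population", "crime_rate"},
    },
    {
        "name": "school_lunch_programs",
        "topics": {"school-meals"},
        "columns": {"district", "enrollment"},
    },
]

def find_datasets(catalog, topic, required_columns):
    """Return names of data sets tagged with `topic` that expose all required columns."""
    required = set(required_columns)
    return [
        entry["name"]
        for entry in catalog
        if topic in entry["topics"] and required <= entry["columns"]
    ]

# Which data sets could answer the Meals-on-Wheels query from the text?
matches = find_datasets(catalog, "meals-on-wheels", ["population", "crime_rate"])
```

Filtering on the actual thresholds (low crime, fewer than 100,000 people) would then happen on the rows of the matching data sets.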
9.5. Sharing Visualizations
We will provide tools to exchange and share visualization sessions. Weave's session
history allows a user to save both a visualization and the process (the sequence of steps
taken by the user) that generated that visualization. A user will be able to send the
visualization session (session history) to a colleague who can examine not just the
visualized data but also the steps that generated it. By making the session public, others
can see the steps that were taken, the results that were obtained and can also begin to
understand and improve the process. This is very similar to the STEM application of
Weave described above.
This tool will allow sites and organizations to permit their members or clients to upload
and share visualizations and sessions. The session history can be replayed at any time
and can be used as a teaching or training tool.
9.6. InfoMaps
As mentioned above, future versions of Weave will include InfoMaps technology to help
users visually manage and monitor information from multiple sources. InfoMaps allows
the user to pre-select subject areas or sources to monitor. This could include patents,
publications, newsletters, pages, or other text collections, whether private, public or
subscribed. Once these selections are made, InfoMaps automatically updates and presents
the information in a user-defined visual format on the desktop.
InfoMaps’ technology has been integrated with Weave. Any user data selection in Weave
automatically filters documents on the InfoMaps display for those most relevant.
InfoMaps links Weave-selected data and provides immediate access to relevant
documents, news and web site information in real time all within a single display. For
example, a user may select a subset of a scatterplot that might include population, obesity
and education. Relevant documents or news reports would then be made available within
a window using an InfoMaps layout. This addresses the problem of dealing with the
massive quantity of information available and needed by individuals by making that
information accessible automatically as the data is explored. Without InfoMaps, users
must continually perform web search queries and cut and paste between the Weave and
Information Retrieval displays and pages. InfoMaps is easy to configure and use. It takes
only five minutes to set up and one minute to learn.
9.7. ADA-compliant Weave
According to the Americans with Disabilities Act of 1990 (ADA), any information made
available online by a publicly-funded entity must also be made available to people with
disabilities. This poses a challenge to these agencies that wish to use visualizations to
share information, since visually impaired individuals may not have access to the
information contained in a visualization in the same way that others have. The main goal
of visualization is to take data, most often amounts that are too large to easily grasp by
reading a table or text, and communicate a message at a glance. The blind or visually
impaired do not have the benefit of gaining a message at a visual glance. The challenge is
to make that message accessible and convey it to visually-impaired users so that they
comprehend the same message as a person with no visual impairments. Another
challenge is to create an automated process that detects the messages contained within a
visualization.
Current methods to make visualization accessible to visually impaired individuals utilize
“tags”, descriptors that are read to the user via commonly used screen-reading devices
such as Window-Eyes or JAWS. These tags must be manually created for each
visualization, resulting in a scalability issue. To make visualizations accessible to the
visually impaired, we propose to use the session history feature of Weave, which would
allow us to share specific descriptions of each visualization. This method would use
Weave session history to automatically detect certain features of the visualization and
provide that information to the user via a screen-reading device.
As described above, the Weave architecture is based on the concept of session states. All
significant actions made within the system, including the visual parameter settings, tool
properties and user interactions are recorded and this “snapshot of history” is stored as a
session state. The session state contains all of the information required to restore a given
Weave visualization or instance. To create a visualization that is accessible to the
visually impaired, the data contained within the session state can be automatically passed
to a screen-reading device to provide an audible description of the visualization to the
user.
Although the Weave session state contains all of the necessary information to
automatically re-create a visualization, we must determine what part of this information
is needed to accurately describe the visualization. We will develop a model of the syntax
that is required to accurately describe a visualization in English. Once the model is
designed it will be tested on both sighted and visually impaired individuals. Then, with
an understanding of the descriptors that are needed to communicate the message of a
visualization, we will apply the model to Weave using the session state data to present the
visualization data in spoken English. This new ADA-compliant visualization will then be
tested on both sighted and visually impaired individuals.
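To make the proposal concrete, the following Python sketch turns a small, hypothetical session-state dictionary into an English sentence that a screen reader could speak. The state layout, the field names and the wording of the description are assumptions; the descriptive model discussed above has yet to be designed and tested, so this is only an indication of the kind of output intended.

```python
def describe_visualization(state):
    """Build a short English description from a session-state dictionary."""
    data = state["data"]
    tool = state["tool"].capitalize()
    title = state["title"]
    highest = max(data, key=data.get)  # label with the largest value
    lowest = min(data, key=data.get)   # label with the smallest value
    return (
        f"{tool} titled '{title}' "
        f"showing {state['y']} by {state['x']} for {len(data)} items. "
        f"Highest: {highest} at {data[highest]}. "
        f"Lowest: {lowest} at {data[lowest]}."
    )

# Hypothetical session state with made-up values.
state = {
    "tool": "bar chart",
    "title": "Percent obese by region",
    "x": "region",
    "y": "percent obese",
    "data": {"Region A": 23.0, "Region B": 35.0, "Region C": 21.0},
}

description = describe_visualization(state)
```

The point of the sketch is that the description is generated from stored state rather than written by hand for each visualization, which is what removes the scalability problem of manual tags.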
Although the primary focus of this project is to address ADA compliance needs, the
ability to provide audible information about visualizations is also applicable to the
general population, particularly those users whose eyesight is compromised due to age or
from a short-term illness or injury.
Figure 10 – Demographics of Atlanta, Georgia metropolitan area by county
10. Summary: Looking to the Future
Looking to the future, the problems we face with data require new approaches, methods
and tools that will allow every individual, group and organization to easily access,
analyze and share data.
Weave provides these tools. Weave is free, powerful, modern, flexible, well-supported
and easy to learn and use. Weave provides universal web-based access to available public
data. This allows the public to make informed decisions and thus to effect change.
Our mission is to democratize data. We will change the world by making data available
through Weave — providing visualizations for everyone — thereby enabling informed
decision-making.
Weave and its future releases will bridge the current data divide. Having data access is a start. Having tools that
support and encourage continued access and analysis will ensure progress toward our goal of equal access to
data. That is Weave -- technology that will enable everyone, organizations and individuals, to make informed
decisions to influence the social, political, economic and environmental distribution of resources.