The Design & Implementation of Multiple-Linked-Views across
Multiple Geospatial Datasets
Ivo Widjaja (a,*), Patrizia Russo (b), Chris Pettit (b), Richard Sinnott (a), Martin
Tomko (a)
(a) Department of Computing and Information Science, The University of Melbourne,
Melbourne, Australia
(b) Faculty of Architecture, Building and Planning, The University of Melbourne,
Melbourne, Australia
Ivo Widjaja, Department of Computing and Information Science, The University of
Melbourne, Melbourne, VIC 3010, Australia. ivow@unimelb.edu.au
Keywords: linked visualizations,
Abstract
TBW: Martin, Chris
Introduction
In our age of exponential growth of available data collected about all facets of our
society (Huberman and Adamic, 1999), effective and efficient data visualization
techniques are of utmost importance to enable (a) initial visual exploration of the
patterns detectable in the datasets (Tukey, 1980), and (b) efficient communication of
the results of analytical (scientific) enquiry. Web-based applications now provide
efficient access to very large datasets that are beyond the processing
capacity of single computers, and allow the mashing of data drawn from diverse
sources. Facilitating the visual exploration of the patterns contained in and across such
disparate datasets is a challenge that motivates this paper.
While many Web-based applications provide more than one visualisation form or
modality, most are designed with a focus on a particular visualisation
paradigm (e.g., mapping environments enabling 2D mapping – Google Maps,
OpenStreetMap; 3D spatial data visualization in virtual earth environments –
VirtualEarth, GoogleEarth; or statistical graphics – Gapminder). Some applications
enable the combination of statistical graphics (bar chart, histogram, line chart) and
thematic mapping. Only a few, such as the OIC Weave (http://www.oicweave.org/),
link these various visualisation forms through a technique called multiple-linked
views (MLV). GeoViz Toolkit (Hardisty and Robinson, 2011), while not
a Web-based system, enables users to highlight relationships across 2D maps, charts, and
tabular data.
Providing multiple linked view modalities has been shown to support user cognition in
data exploration tasks in various contexts (Buja et al., 1991; Buja et al., 1996;
Shneiderman, 1996), including in the geospatial context (Anselin, 2002; Roberts,
2004). It is, however, a challenge to support multiple linked views of data in the context
of heterogeneous, often federated data sources, where visualisation is required in a single
user-oriented environment that allows ad-hoc combinations of data sourced from
distributed, autonomous data providers.
In this paper, we first systematically explore, in the spirit of the human factors
discipline, how MLV can lead to better understanding of information. Human-computer
interaction literature has well established that multiple modalities in user interaction and
the multimedia presentation of data can increase the usability of a software system
aimed at data presentation (Vetere and Howard, 1999). This is the theoretical starting
point of this paper, which is grounded in a systematic literature survey of various
MLV designs and implementations, in particular in geospatial applications. We critically
analyse their context-of-use, and identify the issues and challenges around the use
of MLV.
Second, we describe an approach enabling the implementation of an environment where
diverse visualization views (exposing different dimensionalities of explored data, such
as 2D and 3D spatial data visualization, raw data display and statistical graphics) can be
linked and visually correlated in the context of dynamically, ad-hoc sourced datasets.
The approach is demonstrated through the visualisation of Twitter data as a rich source
of ad-hoc, streamed, spatio-temporal data. The visualisation is intended to highlight the
spatial trajectories of individual (anonymised) users as well as the general pattern of a
Twitter crowd within some geospatial boundaries. To achieve this, we use a Space Time
Cube (STC), a spatio-temporal visualization inspired by the space-time analytical
approach proposed by Hagerstrand (1970) and further developed by Kraak (2003) and
others. We demonstrate how the 3D STC can be linked to a slice of the data
rendered over a 2D map as a point-and-path visualization, and to the
corresponding records in a tabular view. This provides the ability to access the
data records’ details on demand (Shneiderman, 1996). We further introduce
additional contextual data that can support the 3D and 2D views. In this case, an
on-the-fly linguistic analysis of the tweets is performed to explore the relationship between
space, the tweets and their linguistic properties. These contextual data are also visualised
as part of the existing views, which can be linked to other visualisation forms (e.g., bar
chart showing frequencies of individual languages) of the contextual data and their
aggregates. This approach brings a richness of presentation, which can simultaneously
provide “zoom-in” (detail) version and “zoom-out” (overview) version of the overall
data, and thus facilitates the exploration of the rich phenomena contained in related
datasets.
The visualisation environment is investigated as part of the Australian Urban Research
Infrastructure Network project (www.aurin.org.au). AURIN is tasked with developing
e-infrastructure providing access to diverse, federated data, modeling and visualisation
tools to support the urban researcher community in Australia. Such an infrastructure
model provides the opportunity to link a particular dataset (e.g. demographic, public
health or even Twitter data) with, for example, socio-economic indicators of the relevant
geospatial context. We believe that providing multiple linked views across a myriad of
spatial datasets increases both the usability and the overall utility of the
data.
Background
Theory of Multiple Linked Views
Shneiderman (1996) identified seven elementary tasks humans perform when
exploring data: overview, zoom, filter, details-on-demand, relate, history, extract. Data
need not be tied to a single representation, and different representations may indeed be
applicable and appropriate to discover different patterns in the data. Multi-linked views
(MLV), also known as Coordinated Multiple Views (CMV), are terms referring to an
interactive visualization construct where two or more data views (each a different data
representation form) of identical or related data are made interdependent in order to
enhance the data exploration and information seeking behaviour of the user (Buja et al.,
1991; Buja et al., 1996). In Shneiderman’s scheme, MLVs can be viewed as supporting the
relate and, to a lesser extent, zoom tasks in data exploration.
Context-of-use of Multi-linked views
MLV are used to depict data from different perspectives and in multidimensional
spaces. They have been explored in a number of application contexts, including
geospatial data visualization in 2D (Anselin, 2002; Roberts, 2004; Weaver, 2004).
Some, such as LinkWinds (Jacobson et al. 1994) and Visage (Roth et al. 1996), also
include views for three-dimensional spatial visualizations. Here multiple views are used
to show data attributes, attribute relationships and related information, but also to control
and display objects (Boukhelifa et al., 2003). In most cases, each view contains data of
the same type and exposes a visualisation form most appropriate for the specific
attribute or facet of the data (Boukhelifa et al., 2003). In short, the design possibilities of
MLV are numerous, depending on the realised variety of view composition, design and
interaction means.
The claimed benefits of MLV are that the separation of data attributes into different
views decreases clutter, extends the analysis scope and facilitates data comparison
(Butkiewicz et al. 2008; Boukhelifa et al. 2003). A common notion is that MLV
integrates the benefits of each view in one environment allowing simultaneous
application of views with no need for users to conduct data inspection separately and
multiple times.
Interactive Aspects of Multi View Linkage
How the views are linked together in order to communicate related information differs
considerably. Coordination, as view linkage is also called, can take many forms
(Boukhelifa et al. 2003). Views may be displayed side-by-side or individually, and
linked by graphical objects (e.g. lines). Accordingly, coordinated views may be
displayed permanently and simultaneously, or asynchronously
(Boukhelifa et al. 2003).
Interactive techniques that support coordination and regulate how views interact
with each other are an essential element of MLV for exploratory visualisation.
Interactive techniques commonly used in MLV are linking, brushing and navigational
slaving (Pillat & Freitas 2006; Wang Baldonado et al. 2000). These
techniques allow actions and manipulations performed in one view (e.g. selection,
highlighting, filtering, query) to be executed in all other views. The interactivity is often
expressed by a change in the visual encoding (visual attributes) used to display each data
record in a given view (Mackinlay, 1986), in particular during interactions where data
are selected. Simultaneous communication between views is particularly beneficial for
facilitating information discovery and recognition of attribute relationships. Examples of
MLV that illustrate brushing techniques are XmdvTool, Spotfire, Cdv and IVEE
(Boukhelifa et al. 2003).
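As a minimal illustration of how brushing in one view can be propagated to the others, the
following Python sketch registers each view with a coordinator that forwards a selection
to every view except its source. The class and view names (BrushCoordinator, "map",
"table", "bar_chart") are hypothetical and are not taken from any of the systems cited
above.

```python
from typing import Callable, Dict, Set


class BrushCoordinator:
    """Forwards a selection made in one view to every other registered view."""

    def __init__(self) -> None:
        # Maps a view name to the callback that highlights records in that view.
        self._views: Dict[str, Callable[[Set[str]], None]] = {}

    def register(self, name: str, on_highlight: Callable[[Set[str]], None]) -> None:
        self._views[name] = on_highlight

    def brush(self, source: str, keys: Set[str]) -> None:
        # The source view already shows its own selection, so it is skipped.
        for name, on_highlight in self._views.items():
            if name != source:
                on_highlight(set(keys))


# Hypothetical views: each one simply records which keys it was asked to highlight.
highlighted: Dict[str, Set[str]] = {}
coordinator = BrushCoordinator()
for view_name in ("map", "table", "bar_chart"):
    coordinator.register(
        view_name, lambda keys, n=view_name: highlighted.__setitem__(n, keys)
    )

# Selecting two records on the map highlights them in the table and the bar chart.
coordinator.brush("map", {"tweet-1", "tweet-7"})
```

Keeping the propagation logic in a single coordinator, rather than in each view, is one
way to let views be added or removed without changing the others.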
The coordination configuration can be either imposed by the system, or can be, to a
certain extent, defined by the user. Applications such as GeoVista Studio (Hardisty and
Robinson, 2011), Snap-Together (North and Shneiderman, 2000a), InfoVis (Pillat &
Freitas 2006) and Improvise (Weaver, 2004) allow users to (partly) design and compose
their own MLV environments according to their requirements and to select fields of
view, compositions and linkages of views. Such configuration is, however, often
relatively complicated and geared towards expert visual data miners.
Usability of Multiple Linked View Displays
North and Shneiderman (2000) show how coordinated displays can significantly
improve user performance in visual data navigation and interpretation. The
simultaneous display of multiple views is particularly useful in the case of linked
overview-and-detail views, a use-case in which a hierarchical coordination exists between
an overview and a detailed view of the data. This can be best
illustrated on spatial data visualizations, where an overview 2D map shows the total
area of map coverage and highlights the location (or geographic area) of the detailed
view. The detail-view is usually displayed in the foreground as the main view, and
allows a field of focus into spatial micro-scale processes (Butkiewicz et al. 2008). To
preserve the focus and context features, past research suggests a relative scale factor of
up to 25-30 (Plumlee & Ware 2006).
By showing both views, users’ awareness of spatial context and scale difference is
enhanced. Further benefits of the overview-and-detail view are that it allows comparison
of the same spatial information at two different scales (micro and macro), facilitates the
identification of possible scale-based pattern differences, and supports the reduction
of occlusion when dealing with large amounts of data. While this use-case supports the
need for coordinated linked views, other research emphasizes that MLVs may not
always be appropriate, as they may be overwhelming and unnecessarily complex for users
(Wang Baldonado et al. 2000).
The usability of MLV for completing typical tasks such as identifying attribute
relationships and patterns can be systematically evaluated by measuring the
effectiveness (success of responses) and efficiency (required time) of the MLV
interfaces, as well as by collecting qualitative assessments of users’ satisfaction
(North & Shneiderman 2000).
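The two quantitative measures can be computed directly from logged task trials. The
sketch below assumes a hypothetical trial record holding a correctness flag and a
completion time; the values are placeholders chosen for illustration and are not
experimental results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One user attempt at a task (hypothetical record layout)."""
    correct: bool
    seconds: float


def effectiveness(trials: Sequence[Trial]) -> float:
    """Fraction of trials answered correctly."""
    return sum(1 for t in trials if t.correct) / len(trials)


def efficiency(trials: Sequence[Trial]) -> float:
    """Mean completion time in seconds."""
    return mean([t.seconds for t in trials])


# Placeholder trials, for illustration only.
trials = [Trial(True, 12.0), Trial(False, 20.0), Trial(True, 16.0), Trial(True, 8.0)]
success_rate = effectiveness(trials)
mean_time = efficiency(trials)
```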
User-friendliness of MLVs is often associated with the amount of users’ working
memory required (Wang Baldonado et al. 2000; Convertino et al. 2003; Vetere &
Howard 2000). Working memory is where users store short-term information directly
related to the task at hand (Plumlee & Ware 2006). The amount of information stored
determines a user’s mental workload, which in turn indicates how
cognitively overwhelmed the user is (Wang Baldonado et al. 2000; Convertino et al.
2003; Vetere & Howard 2000). Any additional views, perceptual techniques and cues,
as well as redundant information in MLV, occupy the already limited working memory.
Accordingly, cognitive research suggests the reduction of unnecessary and redundant
interface elements and information. The resulting trade-off between increased
complexity and information provision is captured by Nielsen (1993: 115):
“Every additional feature or item of information on a screen is one more thing to search
through when looking for the thing you want”. In other words, any additional interface
element uses limited working memory and increases mental workload. On the other
hand, the added information could be relevant for task solving.
Design Considerations for Multiple Linked View Displays
As shown, careful consideration for the applicability and design of MLVs is paramount.
A better understanding of how users’ cognitive processes and abilities influence task
completion in MLV is an important step towards designing less overwhelming MLVs
(Convertino et al. 2003). Given the lack of design guidelines specific for MLV, Wang
Baldonado et al. (2000) formalised a list of eight guidelines for the application and
design of MLV. These eight guidelines are: diversity, complementarity, decomposition,
parsimony, space/time resource optimization, self-evidence, consistency and attention
management.
The first two guidelines (diversity and complementarity) support the use of MLV where
a diverse (multiple attributes, abstraction levels, etc.) and complementary
(non-redundant) dataset exists. Guidelines 3 and 4 (decomposition and parsimony) deal with
the number of views to include in MLV. While decomposition guides the splitting of
complex views into multiple simpler ones that are easier for users to interpret,
parsimony suggests reducing the number of views to a minimum to decrease the
cognitive effort required to mentally switch between views (Wang Baldonado et al.
2000). Balancing these two requirements well in an MLV application is of primary
concern.
The last four guidelines (space/time resource optimization, self-evidence, consistency
and attention management) are more design-oriented, yet no less important. The
space/time resource optimization guideline indicates that the time required to develop
MLV is likely to be higher than for independent views. Therefore, it is worth
considering whether the benefits of using MLV compensate for the time required to
develop it. Another aspect to consider when developing MLV is the screen space
required. MLV usually require more space on the screen, especially when the views
are displayed side-by-side. A possible solution to this problem is to display the views
sequentially or, even better, on demand, so that fewer views are on screen at a
time. This solution may also address problems related to guideline 4 (parsimony) in that it
reduces the risk of cognitively overwhelming users without reducing information. The
attention management and self-evidence guidelines use perceptual techniques (e.g.
brushing, movement, sound) and cues (e.g. highlighting, selecting) to lead the user
through the MLV environment and enhance relationship recognition, respectively. To
facilitate user interaction with MLV environments, the consistency guideline suggests
that all views use the same presentation and interaction design.
Besides the data to be analysed, the next significant determinant of the applicability of
MLV is the set of analytical tasks that have to be supported by the data-viewing
environment. Previous research has shown that MLV increases users’ performance and
satisfaction, especially when completing difficult tasks which require switching between
views (Pillat & Freitas 2006; North & Shneiderman 2000). Compared to users of
independent views, MLV users performed better for tasks such as searching for details,
identifying data at another scale and recognising non-evident patterns. For simple
look-up tasks where the inspection of an individual view was sufficient, no
performance difference between users of independent views and MLV users was measured.
In relation to user satisfaction with the system, MLV scored higher than independent
views, and MLV users stated that the absence of perceived coordination is disturbing.
Application Context: A flexible, Web-based environment for ad-hoc urban data
access, analysis and visualisation
AURIN is an initiative of the Australian government aiming at facilitating the access to,
and analysis of data of relevance to urban researchers across the country. AURIN’s
main deliverable is the AURIN Portal, a fully Web-based, eResearch environment that
exposes to researchers a range of functionalities that allow for the discovery and
acquisition of diverse, spatially-referenced data (such as demographic datasets, public
health datasets, fundamental GIS layers, and many others), and their interactive
visualization and exploration in a rich user environment (Sinnott et al., 2012). The
users’ client-side requirements are minimal, and restricted to a modern Web browser.
Users can interact with the system, which facilitates all steps of the
shop-explore-analyse-create-collaborate and share cycle (Tomko et al., 2012). The environment allows
exploratory data analysis linking map, tabular, and statistical graphics views, and
further supports the researchers with a confirmatory data analysis capability based on a
workflow environment.
A fundamental property of AURIN is that all data are directly sourced, on demand, from
autonomous, federated data providers through targeted data services. A user is guided
through a process of selection of an area of interest (through an MLV interface exposing the
various geographical regionalisations of Australia on a map as well as in a form similar
to the Mac OS Finder interface), and a subsequent step of data search and discovery
(Tomko et al., 2012). The selected data are then acquired, and can be visualized
through a coordinated MLV interface linking tabular, map and graph views based on a
lightweight set of user-oriented metadata supplied by the data providers. In the following
Sections, we illustrate how this mechanism works and how it addresses some of the
guidelines from Wang Baldonado et al. (2000).
Linked views across Single vs Multiple Data Sources
Most applications that support multiple-linked views only allow for linking across
multiple forms of visualisation of a single dataset. This is not sufficient in situations
where a phenomenon is described by multiple datasets, often sourced from different,
federated data providers. The ability to link views of data pertaining to different datasets
is a highly desirable feature for advanced exploratory data environments such as
AURIN, as discussed later.
The main challenge in providing this capability is the variability in the ways data
providers describe their datasets. Typically, within a single dataset, brushing
relies on a particular data column – its key – to link multiple views of the dataset. With
multiple datasets, normalisation of keys needs to be performed before the actual linking
of datasets can be done (Figure 1).
Figure 1 Linking views across multiple datasources – link normalization. The
records (features) in each dataset can be linked based on their feature keys, as long
as they belong to the same geographical aggregation. Each key can be normalised
based on the application of a regular expression recorded in a metadata entry.
Dataset key normalisation
Dataset key normalization is a procedure that associates the key of one dataset
A (key_A) with that of dataset B (key_B), applying a mapping f to the literal value of
key_A if required:
\[
\mathit{key}_A \xrightarrow{\,f\,} \mathit{key}_B \tag{1}
\]
The normalisation is enabled through an advanced data registration system maintaining
information about the primary keys of federated datasets. These primary keys relate to
the unique identifiers of regions mapped in the various regionalisations for which the
Australian census data are reported by the Australian Bureau of Statistics (ABS). We
note that beyond the key parameter, the geographic aggregation of the dataset also needs to
be specified in the metadata (whether the data relate to suburbs, statistical local
areas or any other geographies). Only data pertaining to the same level of aggregation
should be compared.
The mapping is performed on-the-fly by applying a regular expression based on the key
of a dataset, if and only if such a normalization is required (e.g., if a federated data
provider has altered a primary key, for instance by prefixing an integer identifier by a
string, such as SLA – short for statistical local area, one of the regionalizations of
Australia). This regular expression is stored in the metadata store together with
information about the aggregation of the data (in this case, SLA). This mechanism also
allows for additional safeguards that enable linked views only between datasets that
relate to the same data aggregation level and thereby guide users through correct
research practice.
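The normalisation step and the aggregation-level safeguard can be sketched in Python as
follows. This is a minimal sketch under stated assumptions: the metadata fields
(key_attribute, aggregation_level, key_pattern), the example dataset descriptions and the
pattern used for the SLA prefix are hypothetical, since the actual layout of the metadata
store is not described here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata entry for one federated dataset."""
    key_attribute: str                 # column holding the record key
    aggregation_level: str             # e.g. "SLA" or "suburb"
    key_pattern: Optional[str] = None  # regex with one capture group; None = no mapping


def normalise_key(raw_key: str, meta: DatasetMetadata) -> str:
    """Apply the mapping f to a key only if the metadata asks for it."""
    if meta.key_pattern is None:
        return raw_key
    match = re.fullmatch(meta.key_pattern, raw_key)
    if match is None:
        raise ValueError(f"key {raw_key!r} does not match {meta.key_pattern!r}")
    return match.group(1)


def can_link(a: DatasetMetadata, b: DatasetMetadata) -> bool:
    """Safeguard: only datasets at the same aggregation level may be linked."""
    return a.aggregation_level == b.aggregation_level


# Assumed example: one provider prefixes the integer identifier with "SLA".
census = DatasetMetadata("sla_code", "SLA")
health = DatasetMetadata("region_id", "SLA", key_pattern=r"SLA(\d+)")
suburbs = DatasetMetadata("suburb_id", "suburb")

normalised = normalise_key("SLA205051234", health)  # "205051234"
```

Storing the pattern alongside the aggregation level means that both checks can be made
from metadata alone, before any records are compared.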
One-to-One and One-to-Many Linking
One-to-one linking, the most common type of MLV linkage, relies on a bijective
relationship between records in one or multiple datasets and/or across multiple views,
realized by linking over a unique primary key. The key of a record in
dataset A is then directly mappable to a record in dataset B, as are any of the views
of these datasets. Linking through direct correspondence is used to emphasize focus on
a single data record relating to a single real-world entity and provides visual isolation,
slicing a dataset and providing a view of the entity in diverse contexts. This is the
default behavior in many MLV applications such as GeoVista.
While most shopped datasets represent a single snapshot in time of a particular measure
across a particular geospatial aggregation level, some datasets might contain categorical
data (such as a result from a data classification process). Examples of data classification
processes abound in scientific applications, for instance as a preparatory step for
choropleth mapping (e.g., binning into Jenks classes (Jenks, 1967)), or as a result of
other classification (in our use-case, a language classification of the tweets’ text in a
Twitter dataset).
Linking the results of a classification process (e.g., counts of records belonging to a
given category) in an MLV display is therefore highly desirable in an analytical interface.
This requires the linking of multiple views beyond a unique record and its key, and may
require a one-to-many (surjective) mapping. This approach is developed to highlight the
association of a record from a particular dataset with as many features as possible from
other datasets. We call the brushing interaction resulting from such surjective mappings
greedy brushing.
The linking may be done by association of a categorical attribute of a dataset, other than
the primary key. This approach works best with the entire data set (cube) or its
substantial sub-cube instead of a single slice of a cube-like dataset. The one-to-one and
one-to-many linking can be depicted as shown in Figure 2.
Figure 2 One-to-one (top) and one-to-many (bottom) linking
Consequently, we investigate whether multiple views linked by non-unique attributes in
greedy brushing could provide additional benefits in data exploration and thus support
the acquisition of insights from multiple visualisations.
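The difference between the two linking modes can be sketched as a lookup over different
attributes: a unique key yields exactly one record, whereas a categorical attribute yields
every record sharing that value. The records and the helper name build_index below are
hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def build_index(records: List[Record], attribute: str) -> Dict[str, List[str]]:
    """Map each value of `attribute` to the ids of the records that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        index[record[attribute]].append(record["tweet_id"])
    return dict(index)


# Hypothetical records, for illustration only.
tweets = [
    {"tweet_id": "t1", "user_id": "u1", "language": "en"},
    {"tweet_id": "t2", "user_id": "u1", "language": "fr"},
    {"tweet_id": "t3", "user_id": "u2", "language": "en"},
]

by_key = build_index(tweets, "tweet_id")       # unique primary key
by_language = build_index(tweets, "language")  # non-unique categorical attribute

one_to_one = by_key["t2"]    # ['t2']
greedy = by_language["en"]   # ['t1', 't3']
```

Building the index once, rather than scanning all records on every brush, is one way to
keep greedy brushing responsive as the number of records grows.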
Implementation
MLV-based scientific data visualization and exploratory analysis can greatly benefit
from enhancements in Web infrastructures, in particular the recent progress in browser
capabilities (HTML5) and the increased availability of data sourced from a variety of
Web-enabled services. In this Section, we first present the implementation details of the
ordinary and greedy brushing record linkage in the ad-hoc, MLV environment
implemented in AURIN. We then evaluate the implementation, and demonstrate how
AURIN’s federated data access philosophy and flexible user interface capabilities
address some of the guidelines introduced by Wang Baldonado et al. (2000).
Data
The demonstration is based on a multiple-linked-view system using a dataset harvested
from Twitter. This dataset is a convenient and accessible source of spatio-temporal data
(tweets are positioned through geographic coordinates and time is recorded as the tweet
timestamp), with secondary attributes that can be used in an automated classification
process, such as language detection of the Tweet’s text. The spatio-temporal aspect of
the dataset is visualized using a 3D Space-Time Cube (latitude, longitude and time)
visualisation as one of the data views. The Twitter dataset used for our experiments was
anonymised and the location of the tweets shifted by about 100 m to preserve privacy.
The dataset has not been released to the users of the AURIN Portal and has only been
used for development and concept testing purposes.
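The method used to shift the tweet locations is not specified here. One possible approach,
sketched purely as an assumption, displaces each point by a fixed distance in a random
direction using a local flat-earth approximation, which is adequate for offsets of about
100 m. The function names and the example coordinates are hypothetical.

```python
import math
import random
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate for ~100 m offsets


def shift_location(lat: float, lon: float, distance_m: float = 100.0,
                   rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Move a point by `distance_m` metres in a uniformly random direction."""
    rng = rng or random.Random()
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    d_north = distance_m * math.cos(bearing)
    d_east = distance_m * math.sin(bearing)
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + math.degrees(
        d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    )
    return new_lat, new_lon


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres, used here to check the offset."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


original = (-37.8136, 144.9631)  # example point only
shifted = shift_location(*original, rng=random.Random(42))
offset_m = haversine_m(*original, *shifted)
```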
The initial, primary dataset contains numerous tweets from multiple users. A second
dataset is generated through an automatic classification process from the first dataset,
analyzing the tweets and detecting the language used in each message. This is a
second-derivation dataset. A third dataset is also derived from the primary dataset, and
contains the list of users (restricted to the users that have contributed tweets in the
primary dataset) and the corresponding number of tweets. In this sense, it is also a
second-derivation dataset. Lastly, we have a third-derivation dataset that lists the
languages detected in the tweets and their associated counts.
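The derivation chain described above can be sketched in Python as follows. The records
are hypothetical, and detect_language is a toy placeholder standing in for whatever
classifier is actually used; it is not a real language detector.

```python
from collections import Counter
from typing import Dict, List


def detect_language(text: str) -> str:
    """Toy placeholder classifier (assumption): NOT a real language detector."""
    french_markers = {"bonjour", "merci"}
    words = set(text.lower().split())
    return "fr" if words & french_markers else "en"


# Primary dataset: hypothetical raw tweets.
primary: List[Dict[str, str]] = [
    {"tweet_id": "t1", "user_id": "u1", "text": "Good morning Melbourne"},
    {"tweet_id": "t2", "user_id": "u1", "text": "Bonjour Melbourne"},
    {"tweet_id": "t3", "user_id": "u2", "text": "Coffee time"},
]

# Second derivation: the tweets with a language attribute attached.
with_language = [{**t, "language": detect_language(t["text"])} for t in primary]

# Second derivation: number of tweets per user.
user_counts = Counter(t["user_id"] for t in primary)

# Third derivation: number of tweets per detected language.
language_counts = Counter(t["language"] for t in with_language)
```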
Use Cases
[Figure 3: diagram of the datasets and their linked views. Raw Twitter dataset (tweet_id,
user_id, text, lat, lon, created_at); user count (Z) (user_id, count); Twitter dataset with
language identified (X) (tweet_id, user_id, text, lat, lon, created_at, language);
language count (Y) (language, count). Views: bar chart of user count (A), bar chart of
language count (B), point plot with user path (C), space time cube (D), point plot of
lat, long (E).]
Figure 3 Datasets and visualizations used in the use case. The raw Twitter
dataset (top-left) has been processed and visualized in data grid views of the user
counts (Z), a grid view of languages of Tweets (X), and the grid view of
language frequencies (Y). The visualizations explored are (A) – the bar chart of
user counts, (B) – the bar chart of language counts, (C) – the point plot of user
paths, (D) – the space-time cube of individual users, and (E) – the point plot of
user locations.
Over these datasets, our system allows for various views and visualisations including: a
data grid (a tabular, spreadsheet-like representation of raw data), barchart, scatter plot,
2D map and the space-time cube. Across these views the Web-based application
provides linking and brushing capabilities. The linking and brushing behavior can be
described in the following examples:
Use Case 1 - 1:n brushing: Consider the visualisation views described in Figure 3.
This example demonstrates a 1:n brushing, where the highlighting of a single record’s
representation in a primary or secondary dataset (a tweet point) on the 2D map
highlights the corresponding entry in the primary data table (data grid view), the
record’s visualization in the space time cube (point in 3D coordinates), as well as the
corresponding representations of the aggregated records in the language and user count
bar charts and the grid views of the third-derivation datasets (not shown, for legibility)
(see Figure 4). For all purposes, these derived datasets are independent datasets linked
through normalized keys, and the behavior of the system is identical to that where
ad-hoc datasets are linked.
Figure 4 Use-case 1: Brushing across various data view modalities.
Use Case 2 – 1:n greedy brushing: The selection of an aggregated data record from a
third-derivation dataset representing m raw records links to n raw data records in the
primary or secondary datasets. Highlighting a particular language in the bar chart
highlights all the tweets of the given language in the map (Figure 5). Similarly, the
selection of a single user’s bar in the bar chart of tweet frequency leads to the
highlighting of all of the tweets from this user, as well as the highlighting of the user’s
path on the map (Figure 6).
Figure 5 One-to-many brushing by language.
Figure 6 One-to-many brushing by user.
Use-case Summary
The use-cases illustrating our approach demonstrate how complex patterns contained
in a number of closely related, but nevertheless disjoint datasets can be visualized and
explored through multiple linked views operating in a dynamic Web-based environment
where ad-hoc datasets can be linked through a relatively simple key normalization
process.
The following table summarizes some of the technical issues and potential benefits that
emerged during our implementation of MLV (Table 1).
Table 1. Summary of greedy brushing in MLV Implementation
Aspect | Single Dataset Multiple-Linked-Views | Multiple Geospatial Dataset Multiple-Linked-Views: One-to-one Linking | Multiple Geospatial Dataset Multiple-Linked-Views: One-to-many Linking
Metadata | An attribute key needs to be provided | Other than attribute key, a regular expression may be needed to allow for normalisation of keys across matching datasets, together with a specification of geographical aggregation level | (as for one-to-one linking)
Attribute processing | No processing needed. | Normalisation needed across keys | Matching strategy needs to be defined (e.g. specifying additional categorical attributes)
Implementation Challenges | Simple | Simple, but normalisation needs to be done. Dependency checking needs to ensure the same geospatial level linking | Complex normalisation and matching. Computationally expensive when indexing is not managed
Visualisation Strength | Allow focus (highlights through brushing) and zoom in (detail through tooltip) | Allow focus (highlights through brushing) and zoom in (detail through tooltip) | Allow focus (highlights through brushing) and zoom in (detail through tooltip). Additionally, users may see distribution patterns in other views
We further evaluate our approach from the perspective of the eight considerations
outlined by Wang Baldonado et al. (2000), against the AURIN Portal’s implementation
of the MLV concept. We address the first four considerations at a coarse level as
follows:
• Diversity and complementarity: diverse, ad-hoc datasets can be linked together
on demand by users. The configuration of the linkage is based purely on
metadata and relates to the identification of the dataset’s key, its normalization
mapping, and its aggregation level;
• Decomposition and parsimony: in our approach, each view is atomic and
presents a single visualization modality. The addition of views to the graphical
interface is done on demand by the user, and is neither predefined nor limited.
The user is free to open and close any visualization view that is applicable to the
data explored.
The detailed evaluation applied to the use cases is summarised in Table 2.
Table 2. Summary of our MLV Evaluation using Wang Baldonado et al.'s (2000).
| No. | Guidelines for MLV design (Wang Baldonado et al., 2000) | MLV in AURIN portal |
|---|---|---|
| 1 | Diversity: Attributes | Posting date and location and language of tweet |
| | Diversity: Models | 2D vs. 3D displays |
| | Diversity: User profiles | Spatial vs. non-spatial perceptual capabilities |
| | Diversity: Levels of abstraction | More detailed 2D map vs. overview in 3D |
| | Diversity: Genres | Bar chart vs. tabular view |
| 2 | Complementarity | Relations between geographical and temporal distribution and language of tweets |
| 3 | Decomposition | Geographical and temporal views, language count, number of tweets per sender |
| 4 | Parsimony | User defined |
| 5 | Space/time resource optimization | User defined |
| 6 | Self-evidence (perceptual cues) | Brushing, highlighting |
| 7 | Consistency: Interface of views | Same line and circle colours in 2D map and STC view |
| | Consistency: State of views | Provided by brushing functionality |
| 8 | Attention management (perceptual techniques) | Highlighting |
The evaluation demonstrates that the MLV approach implemented in AURIN supports a
significant diversity of data and visualisation forms. In particular, the linked views
emphasise the relations between the geographical and temporal distribution and the
language of tweets, which supports a more comprehensive analysis. The content of the
summary table is decomposed into two geographical views and two bar charts, which
supports the user's working memory. In relation to guidelines 3, 4, and 5, great freedom
is provided for arranging the views: the user can decide whether and where to display
each view and can determine its size. This freedom in view arrangement and composition
can be both an advantage and a disadvantage, as it requires deliberate user action.
We assume that users have some experience in data analysis with MLV, which is needed to
define an efficient composition and arrangement of views. For users who are unfamiliar
with MLV, a default interface providing a dashboard-like environment may be useful.
The creation of such an interface is, however, complex: dashboards are often
tailored to specific use cases, whilst AURIN allows users to identify the datasets of
interest and gives them full freedom to analyse and visualise the data on the fly.
Brushing and highlighting, respectively, ensure that the user is led between the multiple
views and that important features are highlighted. While the analysis state is shown in
all views simultaneously (brushing), the lack of interface consistency (i.e., bar chart and
choropleth map colours) between the multiple views is the major limitation of
this MLV at present.
The evaluation revealed that the provision of MLV in this case is highly appropriate
(guidelines 1-4). However, providing a default composition and arrangement of the
views, as well as more interface consistency, could improve the presentation and use of
the MLV.
Discussion
We now discuss the costs and benefits of our MLV approach with respect to the
computational and cognitive aspects of the implementation.
Computational overheads
On-the-fly normalisation and one-to-many matching are computationally
expensive. One possible way to reduce this cost is to build an index in real time
while the data is prefetched, or to attach the categorical index while the view is
built. This indexing is necessary to enable the brushing functionality.
While some of these steps can be performed on the client side, it is likely to be more
efficient to offload the computationally heavy transformations (such as the application
of regular expressions) to a server-side process that is performed once, following the
data discovery and acquisition process.
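The indexing idea above can be illustrated with a minimal sketch, assuming that records are plain dictionaries and that the index is built once after prefetching. The field names (`id`, `user`, `lang`) and the function names are hypothetical and are not taken from the AURIN code base.

```python
from collections import defaultdict


def build_category_index(records, attribute):
    """Map each categorical value to the ids of the records that carry it."""
    index = defaultdict(set)
    for record in records:
        index[record[attribute]].add(record["id"])
    return index


def brush(index, value):
    """Return the ids of the records to highlight in every linked view."""
    return index.get(value, set())


# Hypothetical tweet records; "id", "user" and "lang" are assumed field names.
tweets = [
    {"id": 1, "user": "a", "lang": "en"},
    {"id": 2, "user": "a", "lang": "th"},
    {"id": 3, "user": "b", "lang": "en"},
]

# Built once (for example, after prefetching) and reused for every brush event.
lang_index = build_category_index(tweets, "lang")
highlighted = brush(lang_index, "en")
```

With such an index in place, each brushing interaction becomes a dictionary lookup instead of a scan over all records, which is where the cost of unmanaged indexing would otherwise arise.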
The brushing and linking in our system are performed automatically, based on metadata
records configured by data providers. As such, users need not worry about setting up
the linking or selecting the attributes for linking. While this is convenient, we
feel that an option should be provided for advanced users who want explicit
control over how the linking is done. This is also an opportunity to provide users
with the ability to specify how the one-to-many transformation should be done.
Cognitive benefits
We have observed a number of benefits for pattern discovery that are facilitated by our
approach to one-to-many linking over ad-hoc datasets. We highlight these benefits using
the following two scenarios:
Exploring the change of behavioural patterns in space.
Consider the point and path visualisations of a Twitter user's timeline (Figure 4). The
vector paths are coloured by user identity, which facilitates the exploration of users'
behaviour in space. To explore whether the language usage of the users is
correlated with their spatial behaviour, we can brush a particular
language in the language count bar chart to highlight the distribution of that
language in the map. A language that spreads over multiple paths is used by several
users, while the presence of non-highlighted nodes within a single path indicates that
the user tweets in more than one language. This is the case for some tweets from the
international community of Melbourne, which hosts a large number of international
students and visitors who use a number of languages interchangeably (e.g. Thai and English).
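The visual cue described above (a path that mixes highlighted and non-highlighted nodes) can also be expressed as a small check over the data. The sketch below is illustrative only; the record layout and the language codes are assumptions.

```python
from collections import defaultdict


def users_with_mixed_highlighting(tweets, brushed_language):
    """Return users whose path has both highlighted and non-highlighted tweets."""
    languages_by_user = defaultdict(set)
    for tweet in tweets:
        languages_by_user[tweet["user"]].add(tweet["lang"])
    return {
        user
        for user, languages in languages_by_user.items()
        # Highlighted nodes exist (brushed language used) and so do others.
        if brushed_language in languages and len(languages) > 1
    }


# Hypothetical timeline records; "user" and "lang" are assumed field names.
timeline = [
    {"user": "u1", "lang": "th"},
    {"user": "u1", "lang": "en"},
    {"user": "u2", "lang": "en"},
    {"user": "u3", "lang": "th"},
]

mixed = users_with_mixed_highlighting(timeline, "th")
```

Here only `u1` is returned: `u2` never uses the brushed language, and the path of `u3` would be fully highlighted.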
Exploring the linear distribution of values over a data grid.
The ability to interactively sort datasets in a natural order (for instance, by time) can be
used to explore temporal change in a dataset. The data grid view can provide an
indication of the distribution, emergence or spread of particular patterns in the data, for
instance when exploring categorical values. This can be used to answer queries such as
how the use of particular languages varies over time in a given spatial region or for a
given user, or whether a user communicates through Twitter at regular intervals or
rather tweets only in bursts (in the context of other users).
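The last question (regular intervals versus bursts) can be made concrete by sorting a user's tweets by time and inspecting the gaps between consecutive tweets. The sketch below is one possible reading, not part of the described system: the coefficient of variation is our own choice of summary measure, and the timestamps are invented for illustration.

```python
from datetime import datetime
from statistics import mean, pstdev


def inter_tweet_gaps(timestamps):
    """Sort timestamps and return the gaps between consecutive tweets in seconds."""
    ordered = sorted(timestamps)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]


def gap_variation(timestamps):
    """Coefficient of variation of the gaps; values near 0 mean regular posting."""
    gaps = inter_tweet_gaps(timestamps)
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough information to judge regularity
    return pstdev(gaps) / mean(gaps)


# Invented timestamps: one user posts every two hours, the other in a burst.
regular = [datetime(2012, 5, 1, hour) for hour in (8, 10, 12, 14)]
bursty = [datetime(2012, 5, 1, 8, minute) for minute in (0, 1, 2)]
bursty.append(datetime(2012, 5, 1, 20, 0))

regular_score = gap_variation(regular)
bursty_score = gap_variation(bursty)
```

A sorted data grid shows the same information visually; the numeric score is only a convenience when many users need to be compared.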
Conclusions and Future Work
We have described the AURIN approach to the implementation of MLV interfaces in an
ad-hoc, federated data environment. The issues, challenges, and solutions encountered when
implementing this approach have been illustrated through two use cases based on a Twitter
dataset containing spatio-temporal data. Some distinctive themes surfaced: enabling
MLV in a distributed, heterogeneous database system requires more
complex data normalisation and metadata handling. Such an environment has opened up
the possibility of exploring aggressive view linking (such as greedy brushing of
one-to-many records), where the link is derived from primary keys and relevant categorical
data attributes. We showed how this approach provides a novel feature of the AURIN
environment, demonstrated across multidimensional views of the data (table, charts, 2D
map, and 3D digital earth). We have discussed the potential benefits of better
support for the user's cognitive workload when detecting insightful patterns, while
highlighting the need to manage the resulting complexity.
Our future work will focus on the improvement of the key normalization approach and
the exploration of its behavior when handling temporal visualizations. We are also
exploring the handling of explicitly grouped data (such as temporal slices of a larger
data cube), in particular with respect to facilitating the automated construction of
one-to-many MLV environments for temporal pattern exploration.
References
Anselin, L., Syabri, I., & Smirnov, O. (2002). Visualizing multivariate spatial
correlation with dynamically linked windows. Urbana, 51, 61801.
Buja, A., Cook, D., & Swayne, D. F. (1996). Interactive high-dimensional data
visualization. Journal of Computational and Graphical Statistics, 5(1), 78-99.
Buja, A., McDonald, J. A., Michalak, J., & Stuetzle, W. (1991). Interactive data
visualization using focusing and linking. In Visualization, 1991.
Visualization'91, Proceedings., IEEE Conference on (pp. 156-163). IEEE.
Butkiewicz, T. et al., 2008. Multi-focused geospatial analysis using probes.
IEEE Transactions on Visualization and Computer Graphics, 14(6),
pp.1165–72. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/18988960.
Boukhelifa, N., J. C. Roberts, and P. J. Rodgers (2003). A Coordination Model
for Exploratory Multiview Visualization. In Proceedings of the
International Conference on Coordinated and Multiple Views in
Exploratory Visualization, 2003. pp. 76-85. IEEE.
Convertino, G., Chen, J., Yost, B., Ryu, Y. S., and North, C. 2003. Exploring
Context Switching and Cognition in Dual-View Coordinated
visualizations. In Proceedings of the International Conference on
Coordinated and Multiple Views in Exploratory Visualization, 2003. pp.
55-62. IEEE.
Hardisty, F., and Robinson, A. C. 2011. "The geoviz toolkit: using component-oriented
coordination methods for geographic visualization and analysis."
International Journal of Geographical Information Science 25 (2). pp.191-210.
Hägerstrand T (1970) What about people in regional science? Paper Reg Sci
Assoc 24:7–21
Huberman, B. and Adamic, L. Growth dynamics of the World Wide Web. Nature
401(6749), 131 (1999).
Jacobson, A. S., Berkin, A. L., and Orton, M. N. (1994). LinkWinds: interactive
scientific data analysis and visualization. Communications of the ACM,
37(4), 42-52.
Jenks, G. F. (1967). The data model concept in statistical mapping.
International yearbook of cartography, 7(1), 186-190.
Kraak, M. J. (2003). The space-time cube revisited from a geovisualization
perspective. In Proc. 21st International Cartographic Conference (pp.
1988-1996).
Mackinlay, J. (1986). Automating the design of graphical presentations of
relational information. ACM Transactions on Graphics (TOG), 5(2), 110-141.
Nielsen, J. (1993). Usability Engineering. Chestnut Hill, MA: AP Professional.
North, C. & Shneiderman, B. (2000a). Snap-Together Visualization : A User
Interface for Coordinating Visualizations via Relational Schemata. Proc.
Advanced Visual Interfaces 2000.
North, C., & Shneiderman, B. (2000b). Snap-together visualization: can users
construct and operate coordinated visualizations?. International Journal
of Human-Computer Studies, 53(5), 715-739.
Pettit, C., I. Widjaja, et al. (2012). Visualisation Support for Exploring Urban
Space and Place. ISPRS Annals of Photogrammetry, Remote Sensing
and Spatial Information Science, Melbourne, Australia, ISPRS.
Pillat, R.M. and Freitas, C.M.D.S., 2006. Coordinating views in the InfoVis
toolkit. Proceedings of the working conference on Advanced visual
interfaces - AVI ’06, p.496.
Plumlee, M.D. & Ware, C., 2006. Zooming Versus Multiple Window Interfaces :
Cognitive Costs of Visual Comparisons. , 13(2), pp.1–31.
Roberts, Jonathan C. (2004) Exploratory Visualization with Multiple Linked
Views. In: MacEachren, Alan and Kraak, M.J. and Dykes, J., eds.
Exploring Geovisualization. Amsterdam: Elsevier. ISBN 9780080445311.
Roth, S. F., et al 1996. Visage: “A user interface environment for exploring
information”. In Proceedings Information Visualization, pp 3-12. San
Francisco, IEEE.
Shneiderman, B., “The eyes have it: A task by data type taxonomy for
information visualization”. Visual Languages, 1996.
Sinnott, R., Bayliss, C., Galang, G., Greenwood, P., Koetsier, G., Mannix, D.,
Morandini, L., Nino-Ruiz, M., Pettit, C., Tomko, M., Sarwar, M., Stimson,
R., Voorsluys, W., Widjaja, I. (2012). A Data-driven Urban Research
Environment for Australia. 8th IEEE International Conference on
eScience 2012, Chicago, IL, USA, IEEE.
Tomko, M., Bayliss, C., Widjaja, I., Greenwood, P., Galang, G. G., Koetsier, G.,
Sarwar, M., Nino-Ruiz, M., Mannix, D., Morandini, L., Voorsluys, W.,
Pettit, C., Stimson, R., Sinnott, R. (2012). The Design of a Flexible Web-based
Analytical Platform for Urban Research. ACM SIGSPATIAL '12
Proceedings of the 20th International Conference on Advances in
Geographic Information Systems, Redondo Beach, California, USA,
ACM.
Tukey, J. W. (1980). We need both exploratory and confirmatory. The American
Statistician, 34(1), 23-25.
Vetere, F. & Howard, S., 2000. Prior Knowledge And Redundant Multimedia.,
IEEE International Conference on Multimedia and Expo, 2000. ICME
2000.
Vetere, F. and Howard, S. (1999) Redundancy Effects in Instructional
Multimedia Systems. INTERACT '99: Proceedings of the 7th IFIP TC13
International Conference on Human-Computer Interaction, 589-596.
Wang Baldonado, M.Q., Woodruff, A. & Kuchinsky, A., 2000. Guidelines for
using multiple views in information visualization. Proceedings of the
working conference on Advanced visual interfaces - AVI ’00, pp.110–
119. ACM.
Weaver, C. (2004). Building Highly-Coordinated Visualizations in Improvise.
IEEE Symposium on Information Visualization, 2004. INFOVIS 2004,
Austin, Texas, IEEE.