
The Design & Implementation of Multiple-Linked-Views across Multiple Geospatial Datasets

Ivo Widjaja a,*, Patrizia Russo b, Chris Pettit b, Richard Sinnott a, Martin Tomko a

a Department of Computing and Information Science, The University of Melbourne, Melbourne, Australia
b Faculty of Architecture, Building and Planning, The University of Melbourne, Melbourne, Australia

* Ivo Widjaja, Department of Computing and Information Science, The University of Melbourne, Melbourne, VIC 3010, Australia. ivow@unimelb.edu.au

Keywords: linked visualizations

Abstract

TBW: Martin, Chris

Introduction

In our age of exponential growth of available data collected about all facets of our society (Huberman and Adamic, 1999), effective and efficient data visualization techniques are of utmost importance to enable (a) initial visual exploration of the patterns detectable in the datasets (Tukey, 1980), and (b) efficient communication of the results of analytical (scientific) enquiry. Web-based applications now provide efficient access to very large datasets that are beyond the processing capacity of single computers, and allow the mashing up of data from diverse sources. Facilitating the visual exploration of the patterns contained in and across such disparate datasets is the challenge that motivates this paper.

While many Web-based applications provide more than one visualisation form and modality, they are often designed with a focus on a particular visualisation paradigm (e.g., 2D mapping environments such as Google Maps and OpenStreetMap; 3D spatial data visualization in virtual earth environments such as VirtualEarth and GoogleEarth; or statistical graphics such as Gapminder).
Some applications enable the combination of statistical graphics (bar chart, histogram, line chart) and thematic mapping. Only a few, such as the OIC Weave (http://www.oicweave.org/), link these various visualisation forms through a technique called multiple-linked views (MLV). GeoViz Toolkit (Hardisty and Robinson, 2011), while not a Web-based system, enables users to highlight relationships across 2D maps, charts, and tabular data. Providing multiple linked view modalities has been shown to support user cognition in data exploration tasks in various contexts (Buja et al., 1991; Buja et al., 1996; Shneiderman, 1996), including in the geospatial context (Anselin, 2002; Roberts, 2004). It is, however, a challenge to support multiple linked views of data in the context of heterogeneous, often federated data sources, where visualisation in a single user-oriented environment must allow ad-hoc combinations of data sourced from distributed, autonomous data providers.

In this paper, we first systematically explore, in the spirit of the human factors discipline, how MLV can lead to better understanding of information. Human-computer interaction literature has well established that multi-modalities in user interaction and the multi-media presentation of data can increase the usability of a software system aimed at data presentation (Vetere and Howard, 1999). This is the theoretical starting point of this paper. The paper is grounded in a systematic literature survey of various MLV designs and implementations in (particularly) geospatial applications. We critically analyse their context-of-use, and identify the issues and challenges around the use of MLV.
Second, we describe an approach enabling the implementation of an environment where diverse visualization views (exposing different dimensionalities of the explored data, such as 2D and 3D spatial data visualization, raw data display and statistical graphics) can be linked and visually correlated in the context of dynamically sourced, ad-hoc datasets. The approach is demonstrated through the visualisation of Twitter data as a rich source of ad-hoc, streamed, spatio-temporal data. The visualisation is intended to highlight the spatial trajectories of individual (anonymised) users as well as the general pattern of a Twitter crowd within some geospatial boundaries. To achieve this, we use a Space Time Cube (STC), a spatio-temporal visualization inspired by the space-time analytical approach proposed by Hagerstrand (1970) and further developed by Kraak (2003) and others. We demonstrate how the 3D STC can be linked to a slice of the data rendered over a 2D map as a point-and-path visualization, and to the corresponding records in a tabular view. This provides the ability to access the data records’ details on demand (Shneiderman, 1996). We further introduce additional contextual data that can support the 3D and 2D views. In this case, an on-the-fly linguistic analysis of the tweets is performed to explore the relationship between space, the tweets and their linguistic properties. These contextual data are also visualised as part of the existing views, which can be linked to other visualisation forms (e.g., a bar chart showing frequencies of individual languages) of the contextual data and their aggregates. This approach brings a richness of presentation that can simultaneously provide a “zoom-in” (detail) version and a “zoom-out” (overview) version of the overall data, and thus facilitates the exploration of the rich phenomena contained in related datasets.
The visualisation environment is investigated as part of the Australian Urban Research Infrastructure Network (AURIN) project (www.aurin.org.au). AURIN is tasked with developing e-infrastructure providing access to diverse, federated data, modeling and visualisation tools to support the urban research community in Australia. Such an infrastructure model provides the opportunity to link a particular dataset (e.g. demographic, public health or even Twitter data) with, for example, socio-economic indicators of the relevant geospatial context. We believe that providing multiple-linked-views across a myriad of spatial datasets increases the usability and overall utility of the data.

Background

Theory of Multiple Linked Views

Shneiderman (1996) identified seven elementary tasks humans perform when exploring data: overview, zoom, filter, details-on-demand, relate, history, and extract. Data need not be tied to a single representation, and different representations may indeed be applicable and appropriate to discover different patterns in the data. Multi-linked views (MLV), also known as Coordinated Multiple Views (CMV), are terms referring to an interactive visualization construct where two or more data views (each a different data representation form) of identical or related data are made interdependent in order to enhance the data exploration and information seeking behaviour of the user (Buja et al., 1991; Buja et al., 1996). In Shneiderman’s scheme, MLVs can be viewed as supporting the relate and, to a lesser extent, zoom tasks in data exploration.

Context-of-use of Multi-linked views

MLV are used to depict data from different perspectives and in multidimensional spaces. They have been explored in a number of application contexts, including geospatial data visualization in 2D (Anselin, 2002; Roberts, 2004; Weaver, 2004). Some, such as LinkWinds (Jacobson et al. 1994) and Visage (Roth et al. 1996), also include views for three-dimensional spatial visualizations. Here multiple views are used to show data attributes, attribute relationships and related information, but also to control and display objects (Boukhelifa et al., 2003). In most cases, each view contains data of the same type and exposes a visualisation form most appropriate for the specific attribute or facet of the data (Boukhelifa et al., 2003). In short, the design possibilities of MLV are numerous, depending on the realised variety of view composition, design and interaction means. The claimed benefits of MLV are that the separation of data attributes into different views decreases clustering, extends the analysis scope and facilitates data comparison (Butkiewicz et al. 2008; Boukhelifa et al. 2003). A common notion is that MLV integrates the benefits of each view in one environment, allowing simultaneous application of views with no need for users to conduct data inspection separately and multiple times.

Interactive Aspects of Multi View Linkage

How the views are linked together in order to communicate related information differs considerably. Coordination, as view linkage is also called, can take many forms (Boukhelifa et al. 2003). Views may be displayed side-by-side or individually and linked by graphical objects (e.g. lines). If multiple views are used, coordination can display them either permanently and simultaneously, or asynchronously (Boukhelifa et al. 2003). Interactive techniques that support coordination and regulate how views interact with each other are an essential element of MLV for exploratory visualisation. Interactive techniques commonly used in MLV are linking, brushing and navigational slaving (Pillat & Freitas 2006; Wang Baldonado et al. 2000). These techniques allow actions and manipulations in one view (e.g. selection, highlighting, filtering, query) to be executed in all other views.
The interactivity is often expressed by a change in the visual encoding (change in visual attributes) used to display each data record in a given view (Mackinlay, 1986), in particular during interactions where data are selected. Simultaneous communication between views is particularly beneficial for facilitating information discovery and the recognition of attribute relationships. Examples of MLV that illustrate brushing techniques are Xmdvtool, Spotfire, Cdv and IVEE (Boukhelifa et al. 2003). The coordination configuration can be either imposed by the system or, to a certain extent, defined by the user. Applications such as GeoVista Studio (Hardisty and Robinson, 2011), Snap-Together (North and Shneiderman, 2000a), InfoVis (Pillat & Freitas 2006) and Improvise (Weaver, 2004) allow users to (partly) design and compose their own MLV environments according to their requirements, and to select fields of view, compositions and linkages of views. Such configuration is, however, often relatively complicated and geared towards expert visual data miners.

Usability of Multiple Linked View Displays

North and Shneiderman (2000) show how coordinated displays can significantly improve user performance in visual data navigation and interpretation. The simultaneous display of multiple views is particularly useful in the case of linked overview-and-detail views. This use-case relates to a hierarchical coordination between overview and detailed views of data (overview-and-detail-view). It is best illustrated with spatial data visualizations, where an overview 2D map shows the total area of map coverage and highlights the location (or geographic area) of the detailed view. The detail-view is usually displayed in the foreground as the main view, and allows a field of focus into spatial micro-scale processes (Butkiewicz et al. 2008). To preserve the focus and context features, past research suggests a relative scale factor of up to 25-30 (Plumlee & Ware 2006).
By showing both views, the user’s awareness of spatial context and scale difference is enhanced. Further benefits of overview-and-detail-view are: it allows comparison of the same spatial information at two different scales (micro and macro); it facilitates the identification of possible scale-based pattern differences; and it supports the reduction of occlusion when dealing with large amounts of data. While this use-case supports the need for coordinated linked views, other research emphasizes that MLVs may not always be appropriate, as they may be overwhelming and unnecessarily complex for users (Wang Baldonado et al. 2000). The usability of MLV for completing typical tasks, such as identifying attribute relationships and patterns, can be systematically evaluated by measuring the effectiveness (success of responses) and efficiency (required time) of the MLV interfaces, as well as by collecting qualitative assessments of users’ satisfaction (North & Shneiderman 2000). User-friendliness of MLVs is often associated with the amount of users’ working memory required (Wang Baldonado et al. 2000; Convertino et al. 2003; Vetere & Howard 2000). Working memory is where users store short-term information directly related to the task at hand (Plumlee & Ware 2006). The amount of information stored determines a user’s mental workload, which in turn indicates how cognitively overwhelmed the user is (Wang Baldonado et al. 2000; Convertino et al. 2003; Vetere & Howard 2000). Any additional views, perceptual techniques and cues, but also redundant information in MLV, occupy the already limited working memory. Accordingly, cognitive research suggests the reduction of unnecessary and redundant interface elements and information.
However, a trade-off between increased complexity and information provision is likely to occur, as stated by Nielsen (1993: 115): “Every additional feature or item of information on a screen is one more thing to search through when looking for the thing you want”. In other words, any additional interface element uses limited working memory and increases mental workload. On the other hand, the added information could be relevant for task solving.

Design Considerations for Multiple Linked View Displays

As shown, careful consideration of the applicability and design of MLVs is paramount. A better understanding of how users’ cognitive processes and abilities influence task completion in MLV is an important step towards designing less overwhelming MLVs (Convertino et al. 2003). Given the lack of design guidelines specific to MLV, Wang Baldonado et al. (2000) formalised a list of eight guidelines for the application and design of MLV. These eight guidelines are: diversity, complementarity, decomposition, parsimony, space/time resource optimization, self-evidence, consistency and attention management.

The first two guidelines (diversity and complementarity) support the use of MLV where a diverse (multiple attributes, abstraction levels, etc.) and complementary (non-redundant) dataset exists. Guidelines 3 and 4 (decomposition and parsimony) deal with the number of views to include in MLV. While decomposition guides the splitting of complex views into multiple simpler ones that are easier for users to interpret, parsimony suggests reducing the number of views to a minimum to decrease the cognitive effort required to mentally switch between views (Wang Baldonado et al. 2000). Balancing these two requirements well in an MLV application is of primary concern. The last four guidelines (space/time resource optimization, self-evidence, consistency and attention management) are more design-oriented, yet no less important.
The space/time resource optimization guideline indicates that the time required to develop MLV is likely to be higher than for independent views. Therefore, it is worth considering whether the benefits of using MLV compensate for the time required to develop it. Another aspect to consider when developing MLV is the screen space requirement. MLV usually require more space on the screen, especially when the views are displayed side-by-side. A possible solution to this problem is to display the views sequentially or, even better, on demand, so that a smaller number of views is on screen at a time. This solution may also address problems related to guideline 4 (parsimony), in that it reduces the risk of cognitively overwhelming users without reducing information. The attention management and self-evidence guidelines use perceptual techniques (e.g. brushing, movement, sound) and cues (e.g. highlighting, selecting) to lead the user through the MLV environment and enhance relationship recognition respectively. To facilitate user interaction with MLV environments, the consistency guideline suggests that all views use the same presentation and interaction design.

Besides the data to be analysed, the next significant determinant of the applicability of MLV is the set of analytical tasks that have to be supported by the data-viewing environment. Previous research has shown that MLV increases users’ performance and satisfaction, especially when completing difficult tasks that require switching between views (Pillat & Freitas 2006; North & Shneiderman 2000). Compared to users of independent views, MLV users performed better on tasks such as searching for details, identifying data at another scale and recognising non-evident patterns. For simple look-up tasks where the inspection of an individual view was sufficient, no performance difference between users of independent views and MLV users was measured.
In relation to user satisfaction with the system, MLV scored higher than independent views, and MLV users stated that a lack of perceived coordination is disturbing.

Application Context: A flexible, Web-based environment for ad-hoc urban data access, analysis and visualisation

AURIN is an initiative of the Australian government that aims to facilitate access to, and analysis of, data of relevance to urban researchers across the country. AURIN’s main deliverable is the AURIN Portal, a fully Web-based eResearch environment that exposes to researchers a range of functionalities for the discovery and acquisition of diverse, spatially-referenced data (such as demographic datasets, public health datasets, fundamental GIS layers, and many others), and their interactive visualization and exploration in a rich user environment (Sinnott et al., 2012). The users’ client-side requirements are minimal, and restricted to a modern Web browser. Users can interact with the system through all steps of the shop-explore-analyse-create-collaborate and share cycle (Tomko et al., 2012). The environment allows exploratory data analysis linking map, tabular, and statistical graphics views, and further supports researchers with a confirmatory data analysis capability based on a workflow environment.

A fundamental property of AURIN is that all data are directly sourced, on demand, from autonomous, federated data providers through targeted data services. A user is guided through a process of selecting an area of interest (through an MLV interface exposing the various geographical regionalisations of Australia on a map as well as in a form similar to the Mac OS Finder interface), and a subsequent step of data search and discovery (Tomko et al., 2012). The data are then acquired, and can be visualized through a coordinated MLV interface linking tabular, map and graph views based on a lightweight set of user-oriented metadata supplied by the data providers themselves.
In the following Sections, we illustrate how this mechanism works and how it addresses some of the guidelines from Wang Baldonado et al. (2000).

Linked views across Single vs Multiple Data Sources

Most applications that support multiple-linked views only allow for linking across multiple forms of visualisation of a single dataset. This is not sufficient in situations where a phenomenon is described by multiple datasets, often sourced from different, federated data providers. The ability to link views of data pertaining to different datasets is a highly desirable feature for advanced exploratory data environments such as AURIN, as discussed later. The challenge in providing this capability is the variability in the ways various data providers describe their datasets. Typically, within a single dataset, brushing techniques rely on a particular data column, its key, to link multiple views of the dataset. With multiple datasets, normalisation of keys needs to be performed before the actual linking of datasets can be done (Figure 1).

Figure 1 Linking views across multiple data sources: link normalization. The records (features) in each dataset can be linked based on their feature keys, as long as they belong to the same geographical aggregation. Each key can be normalised based on the application of a regular expression recorded in a metadata entry.

Dataset key normalisation

Dataset key normalization is a procedure that associates the key of one dataset A (keyA) with that of dataset B (keyB), applying a mapping f to the literal value of keyA if required:

$$f : \mathit{key}_A \rightarrow \mathit{key}_B \qquad (1)$$

The normalisation is enabled through an advanced data registration system maintaining information about the primary keys of federated datasets. These primary keys relate to the unique identifiers of regions mapped in the various regionalisations for which the Australian census data are reported by the Australian Bureau of Statistics (ABS).
We note that beyond the key parameter, the geographic aggregation of the dataset also needs to be specified in the metadata (whether the data relate to suburbs, statistical local areas or any other geographies). Only data pertaining to the same level of aggregation should be compared. The mapping is performed on-the-fly by applying a regular expression to the key of a dataset, if and only if such a normalization is required (e.g., if a federated data provider has altered a primary key, for instance by prefixing an integer identifier with a string such as SLA, short for statistical local area, one of the regionalizations of Australia). This regular expression is stored in the metadata store together with information about the aggregation of the data (in this case, SLA). This mechanism also allows for additional safeguards that enable linked views only between datasets that relate to the same data aggregation level, and thereby guide users towards correct research practice.

One-to-One and One-to-Many Linking

One-to-one linking, where a bijective relationship between records in one or multiple datasets and/or across multiple views is realized by linking over a unique primary key, is the most common type of MLV linkage. The key of a record in dataset A is then directly mappable to a record in dataset B, as are any of the views of these datasets. Linking through direct correspondence is used to emphasize focus on a single data record relating to a single real-world entity; it provides visual isolation, slicing a dataset and providing a view of the entity in diverse contexts. This is the default behavior in many MLV applications such as GeoVista. While most shopped datasets represent a single snapshot in time of a particular measure across a particular geospatial aggregation level, some datasets might contain categorical data (such as a result from a data classification process).
Examples of data classification processes abound in scientific applications, for instance as a preparatory step for choropleth mapping (e.g., binning into Jenks classes (Jenks, 1967)), or as a result of other classification (in our use-case, a language classification of the tweets’ text in a Twitter dataset). Linking the results of a classification process (e.g., counts of records belonging to a given category) in an MLV display is therefore highly desirable in an analytical interface. This requires the linking of multiple views beyond a unique record and its key, and may require a one-to-many (surjective) mapping. This approach is developed to highlight the association of a record from a particular dataset with as many features as possible from other datasets. We call the brushing interaction resulting from such surjective mappings greedy brushing. The linking may be done by association over a categorical attribute of a dataset, other than the primary key. This approach works best with the entire dataset (cube) or a substantial sub-cube of it, rather than a single slice of a cube-like dataset. The one-to-one and one-to-many linking can be depicted as shown in Figure 2.

Figure 2 One-to-one (top) and one-to-many (bottom) linking

Consequently, we investigate whether multiple views linked by non-unique attributes in greedy brushing could provide additional benefits in data exploration and thus support the acquisition of insights from multiple visualisations.

Implementation

MLV-based scientific data visualization and exploratory analysis can greatly benefit from enhancements in Web infrastructures, in particular the recent progress in browser capabilities (HTML5) and the increased availability of data sourced from a variety of Web-enabled services. In this Section, we first present the implementation details of the ordinary and greedy brushing record linkage in the ad-hoc MLV environment implemented in AURIN.
We then evaluate the implementation, and demonstrate how AURIN’s federated data access philosophy and flexible user interface capabilities address some of the guidelines introduced by Wang Baldonado et al. (2000).

Data

The demonstration is based on a multiple-linked-view system using a dataset harvested from Twitter. This dataset is a convenient and accessible source of spatio-temporal data (tweets are positioned through geographic coordinates and time is recorded as the tweet timestamp), with secondary attributes that can be used in an automated classification process, such as language detection on the tweet’s text. The spatio-temporal aspect of the dataset is visualized using a 3D Space-Time Cube (latitude, longitude and time) visualisation as one of the data views. The Twitter dataset used for our experiments was anonymised and the location of the tweets shifted by about 100 m to preserve privacy. The dataset has not been released to the users of the AURIN Portal and has only been used for development and concept testing purposes.

The initial, primary dataset contains numerous tweets from multiple users. A second dataset is generated from the first through an automatic classification process that analyses the tweets and detects the language used in each message. This is a second-derivation dataset. A third dataset is also derived from the primary dataset, and contains the list of users (restricted to the users that have contributed tweets in the primary dataset) and the corresponding number of tweets. In this sense, it is also a second-derivation dataset. Lastly, we have a third-derivation dataset that lists the languages detected in the tweets and their associated counts.

Use Cases

Over these datasets, our system allows for various views and visualisations including: a data grid (a tabular, spreadsheet-like representation of raw data), bar chart, scatter plot, 2D map and the space-time cube.
Across these views the Web-based application provides linking and brushing capabilities. The linking and brushing behavior can be described in the following examples.

[Figure 3 diagram. Datasets: raw Twitter dataset (tweet_id, user_id, text, lat, lon, created_at); user count (Z) (user_id, count); Twitter dataset with language identified (X) (tweet_id, user_id, text, lat, lon, created_at, language); language count (Y) (language, count). Views: bar chart of user count (A); bar chart of language count (B); point plot with user path (C); space time cube (D); point plot of lat, lon (E).]

Figure 3 Datasets and visualizations used in the use case. The raw Twitter dataset (top-left) has been processed and visualized in data grid views of the user counts (Z), a grid view of languages of tweets (X), and the grid view of language frequencies (Y). The visualizations explored are (A) the bar chart of user counts, (B) the bar chart of language counts, (C) the point plot of user paths, (D) the space-time cube of individual users, and (E) the point plot of user locations.

Use Case 1 - 1:n brushing: Consider the visualisation views described in Figure 3. This example demonstrates a 1:n brushing, where highlighting a single record’s representation in a primary or secondary dataset (a tweet point) on the 2D map highlights the corresponding entry in the primary data table (data grid view), the record’s visualization in the space time cube (a point in 3D coordinates), as well as the corresponding representations of the aggregated records in the language and user count bar charts and the grid views of the third-derivation datasets (not shown, for legibility) (see Figure 4). For all purposes, these derived datasets are independent datasets linked through normalized keys, and the behavior of the system is identical to that where ad-hoc datasets are linked.

Figure 4 Use-case 1: Brushing across various data view modalities.
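The 1:n brushing of Use Case 1 can be illustrated with a minimal sketch. It is not the AURIN implementation: the dataset names, the sample rows, the `USER-` prefix and the functions `normalise` and `brush_tweet` are all hypothetical, and the check that linked datasets share an aggregation level is omitted. The sketch only assumes, as described above, that each dataset's metadata records a regular expression that normalises its key before views are linked.

```python
import re

# Hypothetical metadata: for each (dataset, column), a regular expression
# whose first group yields the normalised key. "(.+)" leaves it unchanged.
KEY_PATTERNS = {
    ("tweets", "tweet_id"): r"(.+)",
    ("tweets", "user_id"): r"(.+)",
    ("tweets", "language"): r"(.+)",
    ("user_counts", "user_id"): r"USER-(.+)",  # assumed provider prefix
    ("language_counts", "language"): r"(.+)",
}

# Invented sample rows standing in for datasets X, Z and Y of Figure 3.
DATASETS = {
    "tweets": [
        {"tweet_id": "t1", "user_id": "u1", "language": "en"},
        {"tweet_id": "t2", "user_id": "u2", "language": "id"},
        {"tweet_id": "t3", "user_id": "u1", "language": "en"},
    ],
    "user_counts": [
        {"user_id": "USER-u1", "count": 2},
        {"user_id": "USER-u2", "count": 1},
    ],
    "language_counts": [
        {"language": "en", "count": 2},
        {"language": "id", "count": 1},
    ],
}

# For each linked dataset: (column of the brushed tweet, column to match).
LINKS = {
    "tweets": ("tweet_id", "tweet_id"),
    "user_counts": ("user_id", "user_id"),
    "language_counts": ("language", "language"),
}


def normalise(dataset, column, value):
    """Apply the mapping f to a key value using the dataset's regex."""
    match = re.fullmatch(KEY_PATTERNS[(dataset, column)], str(value))
    if match is None:
        raise ValueError(f"{value!r} does not match the pattern for {dataset}.{column}")
    return match.group(1)


def brush_tweet(tweet_id):
    """Return, per dataset, the rows to highlight when one tweet is brushed."""
    selected = next(
        (row for row in DATASETS["tweets"]
         if normalise("tweets", "tweet_id", row["tweet_id"]) == tweet_id),
        None,
    )
    if selected is None:
        raise KeyError(tweet_id)
    highlighted = {}
    for name, (source_col, target_col) in LINKS.items():
        wanted = normalise("tweets", source_col, selected[source_col])
        highlighted[name] = [
            row for row in DATASETS[name]
            if normalise(name, target_col, row[target_col]) == wanted
        ]
    return highlighted
```

Calling `brush_tweet("t2")` in this sketch selects the tweet's own row, the `USER-u2` row of the user counts and the `id` row of the language counts, which a view layer could then highlight in the data grid, the bar charts and the space-time cube. Normalising keys on every comparison keeps the example short; precomputing normalised keys once per dataset would avoid the repeated regular expression matching.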
Use Case 2 - 1:n greedy brushing: The selection of an aggregated data record from a third-derivation dataset representing m raw records links to n raw data records in the primary or secondary datasets. Highlighting a particular language in the bar chart highlights all the tweets of the given language on the map (Figure 5). Similarly, the selection of a single user’s bar in the bar chart of tweet frequency leads to the highlighting of all of the tweets from this user, as well as the highlighting of the user’s path on the map (Figure 6).

Figure 5 One-to-many brushing by language.

Figure 6 One-to-many brushing by user.

Use-case Summary

The use-cases illustrating our approach demonstrate how complex patterns contained in a number of closely related, but nevertheless disjoint datasets can be visualized and explored through multiple linked views operating in a dynamic Web-based environment, where ad-hoc datasets can be linked through a relatively simple key normalization process. The following table summarizes some of the technical issues and potential benefits that emerged during our implementation of MLV (Table 1).

Table 1. Summary of greedy brushing in MLV Implementation

| | Single Dataset Multiple-Linked-Views | Multiple Geospatial Dataset Multiple-Linked-Views: One-to-one Linking | Multiple Geospatial Dataset Multiple-Linked-Views: One-to-many Linking |
|---|---|---|---|
| Metadata | An attribute key needs to be provided | Other than attribute key, a regular expression may be needed to allow for normalisation of keys across matching datasets, together with a specification of geographical aggregation level | Other than attribute key, a regular expression may be needed to allow for normalisation of keys across matching datasets, together with a specification of geographical aggregation level |
| Attribute processing | No processing needed. | Normalisation needed across keys | Matching strategy needs to be defined (e.g. specifying additional categorical attributes) |
| Implementation Challenges | Simple | Simple, but normalisation needs to be done. Dependency checking needs to ensure the same geospatial level linking | Complex normalisation and matching. Computationally expensive when indexing is not managed |
| Visualisation Strength | Allow focus (highlights through brushing) and zoom in (detail through tooltip) | Allow focus (highlights through brushing) and zoom in (detail through tooltip) | Allow focus (highlights through brushing) and zoom in (detail through tooltip). Additionally, users may see distribution patterns in other views |

We further evaluate the AURIN Portal’s implementation of the MLV concept from the perspective of the eight considerations outlined by Wang Baldonado et al. (2000). We address the first four considerations at a coarse level as follows:

• Diversity and complementarity: diverse, ad-hoc datasets can be linked together on demand by users. The configuration of the linkage is based purely on metadata and relates to the identification of the dataset’s key, its normalization mapping, and its aggregation level;
• Decomposition and parsimony: in our approach, each view is atomic and presents a single visualization modality. Views are added to the graphical interface on demand by the user, and their number is neither predefined nor limited. The user is free to open and close any visualization view that is applicable to the data explored.

The detailed evaluation applied to the use cases is summarised in Table 2.

Table 2. Summary of our MLV Evaluation using Wang Baldonado et al.'s (2000) guidelines.

| # | Guidelines for MLV design (Wang Baldonado et al., 2000) | MLV in AURIN portal |
|---|---|---|
| 1 | Diversity: Attributes | Posting date and location and language of tweet |
| | Diversity: Models | 2D vs. 3D displays |
| | Diversity: User profiles | Spatial vs. non-spatial perceptual capabilities |
| | Diversity: Levels of abstraction | More detailed 2D map vs. overview in 3D |
| | Diversity: Genres | Bar chart vs. tabular view |
| 2 | Complementarity | Relations between geographical and temporal distribution and language of tweets |
| 3 | Decomposition | Geographical and temporal views, language count, number of tweets per sender |
| 4 | Parsimony | User defined |
| 5 | Space/time resource optimization | User defined |
| 6 | Self-evidence (perceptual cues) | Brushing, highlighting |
| 7 | Consistency: Interface of views | Same line and circle colours in 2D map and STC view |
| | Consistency: State of views | Provided by brushing functionality |
| 8 | Attention management (perceptual techniques) | Highlighting |

The evaluation has demonstrated that the MLV approach implemented in AURIN supports a significant diversity of data and visualisation forms. In this context, the relations between the geographical and temporal distribution and the language of tweets are emphasised for more comprehensive analysis. The content of the summary table is decomposed into two geographical views and two bar charts, supporting the user’s working memory. In relation to guidelines 3, 4 and 5, great freedom is provided for arranging the views. The user can decide whether and where to display the individual views, as well as determine the size of each view. This freedom of view arrangement and composition can be both an advantage and a disadvantage, and requires deliberate user action. We assume that users have some level of experience in data analysis with MLV, which is needed to define an efficient composition and arrangement of views. Especially for users who are unfamiliar with MLV, a default interface providing a dashboard-like environment may be useful. The creation of such an interface is, however, complex, as dashboards are often tailored to specific use-cases, whilst AURIN allows users to identify the datasets of interest and gives them full freedom to analyse and visualise the data on-the-fly. Brushing and highlighting respectively ensure that the user is led between the multiple views and that important features are highlighted.
While the analysis state is shown in all views simultaneously (brushing), the lack of interface consistency (i.e., bar chart and choropleth map colours) between the multiple views is currently the major limitation of this MLV. The evaluation revealed that the provision of MLV in this case is highly appropriate (guidelines 1-4). However, providing a default composition and arrangement of the views, as well as more interface consistency, could improve the presentation and use of the MLV.

Discussion

We now discuss our MLV approach through a cost-benefit analysis of some computational and cognitive aspects associated with the implementation.

Computational overheads

On-the-fly normalisation and one-to-many matching are computationally expensive. One possible way to overcome this is to use real-time indexing during data prefetching, or to attach the categorical indexing while the view is built. This is necessary to enable the brushing functionality. While some of these steps can be performed on the client side, it is likely to be more efficient to offload the computationally heavy transformations (such as the application of regular expressions) to a server-side process performed once, following the data discovery and acquisition process.

The brushing and linking in our system are done automatically, based on metadata records configured by data providers. As such, users need not worry about setting up the linking or selecting the attributes for linking. While this is convenient, we feel that an option should be provided for advanced users who want explicit control over how the linking is done. This is also an opportunity to provide users with the ability to specify how the one-to-many transformation should be done.

Cognitive benefits

We have observed a number of benefits for pattern discovery that are facilitated by our approach to one-to-many linking over ad-hoc datasets.
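The one-to-many linking referred to here, together with the index precomputation suggested under Computational overheads, can be illustrated with a minimal sketch. It is not the AURIN implementation: the function names (normalise_key, build_index, greedy_brush), the sample tweet records, and the whitespace-trimming regular expression are all assumptions made for illustration. The index is built once, so that each brushing event reduces to a dictionary lookup rather than a scan over all records.

```python
import re
from collections import defaultdict


def normalise_key(raw, pattern=r"\s*(\S+)\s*", template="{0}"):
    """Normalise a raw key value with a regular expression.

    The pattern and template are hypothetical defaults: they strip
    surrounding whitespace and lower-case the captured value.
    Returns None when the value does not match the pattern.
    """
    match = re.fullmatch(pattern, raw)
    if match is None:
        return None
    return template.format(*match.groups()).lower()


def build_index(records, attribute, id_field="id"):
    """Build a one-to-many index: normalised key -> set of record ids.

    Intended to run once (e.g. after data acquisition), so that later
    brushing does not repeat the regular-expression work.
    """
    index = defaultdict(set)
    for record in records:
        key = normalise_key(str(record[attribute]))
        if key is not None:
            index[key].add(record[id_field])
    return index


def greedy_brush(index, selected_value):
    """Return the ids of all raw records linked to the selected category."""
    key = normalise_key(str(selected_value))
    return set(index.get(key, set()))


# Hypothetical raw records; the language codes are deliberately inconsistent.
tweets = [
    {"id": 1, "user": "a", "lang": "en"},
    {"id": 2, "user": "a", "lang": "th"},
    {"id": 3, "user": "b", "lang": "EN "},
]

lang_index = build_index(tweets, "lang")
highlighted = greedy_brush(lang_index, "en")
```

In this sketch, selecting the "en" bar yields the identifiers of tweets 1 and 3, which a map view could then highlight. The same index construction applied to the "user" attribute would support brushing by user.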
We highlight these benefits in the following two scenarios:

Exploring the change of behavioural patterns in space. Consider the point and path visualisations of a Twitter user’s timeline (Figure 4). The vector paths are colored based on user identity. This facilitates the exploration of users’ behavior in space. If we want to explore whether the language usage of the users is correlated with their spatial behaviour, we can brush a particular language in the language count bar chart to highlight the distribution of that language on the map. Finding that a language spreads over multiple paths, together with the presence of non-highlighted nodes within a single path, indicates that users used multiple languages. This is the case with some tweets from the international community of Melbourne, which also hosts a large number of international students and visitors using, interchangeably, a number of languages (e.g. Thai and English).

Exploring the linear distribution of values over a data grid. The ability to interactively sort datasets in a natural order (for instance, by time) can be used to explore the temporal change in a dataset. The data grid view can provide an indication of the distribution, emergence or spread of particular patterns in the data, for instance when exploring categorical values. This can be used to answer queries such as how the use of particular languages varies over time in a given spatial region or for a given user, or whether a user communicates through Twitter at regular intervals or rather tweets only in bursts (in the context of other users).

Conclusions and Future Work

We have described the AURIN approach to the implementation of MLV interfaces in an ad-hoc, federated data environment. The issues, challenges, and solutions encountered when implementing this approach have been illustrated with two use-cases using a Twitter dataset containing spatio-temporal data.
Some distinctive themes surfaced: the enablement of MLV in a distributed, heterogeneous database system requires more complex data normalisation and metadata handling. Such an environment has opened up the possibility of exploring aggressive view linking (such as greedy brushing of one-to-many records), where the link is derived from primary keys and relevant categorical data attributes. We showed how this approach provides a novel feature of the AURIN environment, demonstrated across multidimensional views of the data (table, charts, 2D map, and 3D digital earth). We have discussed the potential benefits of improved support for users’ cognitive workload when detecting insightful patterns, while highlighting the need to manage the accompanying complexity.

Our future work will focus on the improvement of the key normalization approach and the exploration of its behavior when handling temporal visualizations. We are also exploring the handling of explicitly grouped data (such as temporal slices of a larger data cube), in particular with respect to facilitating the automated construction of one-to-many MLV environments for temporal pattern exploration.

References

Anselin, L., Syabri, I., & Smirnov, O. (2002). Visualizing multivariate spatial correlation with dynamically linked windows. Urbana, 51, 61801.
Boukhelifa, N., Roberts, J. C., & Rodgers, P. J. (2003). A Coordination Model for Exploratory Multiview Visualization. In Proceedings of the International Conference on Coordinated and Multiple Views in Exploratory Visualization, 2003 (pp. 76-85). IEEE.
Buja, A., Cook, D., & Swayne, D. F. (1996). Interactive high-dimensional data visualization. Journal of Computational and Graphical Statistics, 5(1), 78-99.
Buja, A., McDonald, J. A., Michalak, J., & Stuetzle, W. (1991). Interactive data visualization using focusing and linking. In Visualization, 1991. Visualization'91, Proceedings., IEEE Conference on (pp. 156-163). IEEE.
Butkiewicz, T. et al. (2008). Multi-focused geospatial analysis using probes. IEEE Transactions on Visualization and Computer Graphics, 14(6), 1165-72. Available at: http://www.ncbi.nlm.nih.gov/pubmed/18988960.
Convertino, G., Chen, J., Yost, B., Ryu, Y. S., & North, C. (2003). Exploring Context Switching and Cognition in Dual-View Coordinated Visualizations. In Proceedings of the International Conference on Coordinated and Multiple Views in Exploratory Visualization, 2003 (pp. 55-62). IEEE.
Hardisty, F., & Robinson, A. C. (2011). The geoviz toolkit: using component-oriented coordination methods for geographic visualization and analysis. International Journal of Geographical Information Science, 25(2), 191-210.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24, 7-21.
Huberman, B., & Adamic, L. (1999). Growth dynamics of the World Wide Web. Nature, 401(6749), 131.
Jacobson, A. S., Berkin, A. L., & Orton, M. N. (1994). LinkWinds: interactive scientific data analysis and visualization. Communications of the ACM, 37(4), 42-52.
Jenks, G. F. (1967). The data model concept in statistical mapping. International Yearbook of Cartography, 7(1), 186-190.
Kraak, M. J. (2003). The space-time cube revisited from a geovisualization perspective. In Proc. 21st International Cartographic Conference (pp. 1988-1996).
Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics (TOG), 5(2), 110-141.
Nielsen, J. (1993). Usability Engineering. Chestnut Hill, MA: AP Professional.
North, C., & Shneiderman, B. (2000a). Snap-Together Visualization: A User Interface for Coordinating Visualizations via Relational Schemata. Proc. Advanced Visual Interfaces 2000.
North, C., & Shneiderman, B. (2000b). Snap-together visualization: can users construct and operate coordinated visualizations? International Journal of Human-Computer Studies, 53(5), 715-739.
Pettit, C., Widjaja, I., et al. (2012). Visualisation Support for Exploring Urban Space and Place. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Science, Melbourne, Australia, ISPRS.
Pillat, R. M., & Freitas, C. M. D. S. (2006). Coordinating views in the InfoVis toolkit. Proceedings of the working conference on Advanced visual interfaces - AVI ’06, p. 496.
Plumlee, M. D., & Ware, C. (2006). Zooming Versus Multiple Window Interfaces: Cognitive Costs of Visual Comparisons. 13(2), pp. 1-31.
Roberts, J. C. (2004). Exploratory Visualization with Multiple Linked Views. In: MacEachren, A., Kraak, M. J., & Dykes, J., eds. Exploring Geovisualization. Amsterdam: Elsevier. ISBN 9780080445311.
Roth, S. F., et al. (1996). Visage: A user interface environment for exploring information. In Proceedings Information Visualization (pp. 3-12). San Francisco, IEEE.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualization. Visual Languages, 1996.
Sinnott, R., Bayliss, C., Galang, G., Greenwood, P., Koetsier, G., Mannix, D., Morandini, L., Nino-Ruiz, M., Pettit, C., Tomko, M., Sarwar, M., Stimson, R., Voorsluys, W., & Widjaja, I. (2012). A Data-driven Urban Research Environment for Australia. 8th IEEE International Conference on eScience 2012, Chicago, IL, USA, IEEE.
Tomko, M., Bayliss, C., Widjaja, I., Greenwood, P., Galang, G. G., Koetsier, G., Sarwar, M., Nino-Ruiz, M., Mannix, D., Morandini, L., Voorsluys, W., Pettit, C., Stimson, R., & Sinnott, R. (2012). The Design of a Flexible Web-based Analytical Platform for Urban Research. ACM SIGSPATIAL '12 Proceedings of the 20th International Conference on Advances in Geographic Information Systems, Redondo Beach, California, USA, ACM.
Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), 23-25.
Vetere, F., & Howard, S. (1999). Redundancy Effects in Instructional Multimedia Systems. INTERACT '99: Proceedings of the 7th IFIP TC13 International Conference on Human-Computer Interaction, 589-596.
Vetere, F., & Howard, S. (2000). Prior Knowledge And Redundant Multimedia. IEEE International Conference on Multimedia and Expo, 2000. ICME 2000.
Wang Baldonado, M. Q., Woodruff, A., & Kuchinsky, A. (2000). Guidelines for using multiple views in information visualization. Proceedings of the working conference on Advanced visual interfaces - AVI ’00, pp. 110-119. ACM.
Weaver, C. (2004). Building Highly-Coordinated Visualizations in Improvise. IEEE Symposium on Information Visualization, 2004. INFOVIS 2004, Austin, Texas, IEEE.
