Using Map-Based Visual Interfaces to Facilitate Knowledge Discovery in Digital Libraries

Olha Buchel
Faculty of Information and Media Studies
University of Western Ontario

Kamran Sedig
Department of Computer Science
Faculty of Information and Media Studies
University of Western Ontario

ASIST 2011, October 9-13, 2011, New Orleans, LA, USA.

ABSTRACT
In recent years there has been growing interest in supporting knowledge discovery activities using map-based visual interfaces. The goal is promising and ambitious, but not easy to achieve because the cognitive factors involved in transforming information into knowledge are poorly understood. In this paper we present a map-based visual interface, VICOLEX (VIsual COLlection Explorer), aimed at facilitating and supporting knowledge discovery and users’ cognitive activities by means of integrated visual representations coupled with interactions.

Keywords
Map-based visual interfaces, design, knowledge discovery, visual representation of georeferenced collections, digital libraries.

INTRODUCTION
MacEachren et al. (1999) suggested the possibility of integrating geographic visualizations with knowledge discovery (KD). Bringing these two research areas together sounds ambitious, yet promising, and both disciplines can contribute insight to the joint venture. One possible area of investigation that can emerge from it is the development of map-based visual interfaces (MVIs) that can support KD activities.

At the outset of such an investigation, the main concern of both research areas seems to be the same: discovering useful knowledge within given information (in this paper, we use the terms data and information interchangeably). This concern involves such tasks as discovering patterns in large volumes of data, identifying new patterns of data distribution and dispersion, formulating hypotheses based on observed patterns and trends, and finding new, unsuspected correlations and relationships (Fayyad, Grinstein, & Wierse, 2002; MacEachren et al., 1999). This observation, however, does not result in a simple borrowing of ideas and methods from the two areas, and does not translate into ready-made, easy design choices.

Fayyad et al. (2002) and MacEachren et al. (1999) emphasize the interactive and iterative nature of KD. In KD, humans need to interpret information and make many decisions in the process of refining knowledge. From the point of view of cognition, information interpretation and decision making are complex activities in their own right. In feature interpretation, for example, users have to link abstract representations of data with their own prior knowledge. Although a great portion of interpretation takes place in the human mind, people often help their thinking by performing small external actions with information, such as selecting, filtering, rearranging, reformulating, and simplifying representations (Kirsh, 2009). At first glance, such actions might seem superfluous, but their value can be better understood when they are considered in the context of some activity (e.g., in the context of performing KD activities using highly cluttered maps).

To cope with complexity and to interpret encoded information, users’ visual systems sample visual information on maps through inherently selective perceptual acts that direct attention to restricted regions of the visual field. By processing maps selectively, people visually extract a pool of hot spots (Amit & Geman, 1999; Yang, Yuan, & Wu, 2007), suppress the distracters, apply spatial filters, group similar items, and perform operations on entire groups (e.g., reject, classify) (Luck & Hillyard, 1994). Computing all these operations in the mind by relying solely on vision is difficult.
For this reason, when people work with paper maps they often perform many external actions: they fold maps in order to better focus on hot spots, annotate them, mark locations of interest, and carry out other actions (Knapp, 1995). This example demonstrates that with external actions people help their vision and prepare information for higher-level cognitive activities such as interpretation, decision making, and KD. Therefore, MVIs that are intended to facilitate KD activities should provide users with mechanisms by which they can act upon visual representations (MacEachren et al., 1999).

The goal of this paper is to examine the role of interactions with visual representations in KD activities. In particular, we explain the value of interactions in MVIs as front ends to complex digital libraries (DLs).

BACKGROUND
An MVI is made of interactive representations that provide access to information and facilitate and support a KD activity. A representation here refers to an integrated set of visual encodings of entities (such as documents and locations) and their properties. Such representations can take various forms: maps, graphs, tables, and so on. The main representation in a georeferenced collection is a map. However, as an information space can be very complex, a map can only encode a subset of the space’s entities and relationships. As a result, other information elements and relationships can be encoded and communicated using different representations. These representations can then be integrated to work as a unit at the interface level. For example, in georeferenced DLs, other representations, such as tables and graphs, can be placed on top of a map to communicate other aspects and properties of information.

Even though representations encode information elements and their relationships and properties, all static representations have limitations: they can support only certain tasks and can provide answers only to certain questions.
Finally, due to the amount and complexity of encoded information, a representation may become cluttered and dense, and hence ineffective at communicating information. This is certainly true of maps representing complex DL collections.

To compensate for some of the limitations of static representations and to increase their utility, MVIs should support users’ actions by means of computer interactions. Computer interactions have two components: actions and reactions (Fast & Sedig, Accepted). A user acts upon a representation, and the representation reacts and gives a response. An interface can reduce its complexity and density by making certain representations of information latent. Interactions, then, allow users to perform physical actions on the interface that bring latent information to a more observable level, which simplifies the mental unpacking and elaboration associated with representations (Kirsh, 2003). More specifically, interactions enable different properties, relations, and layers of static visual representations to be probed and made available on demand, thereby making the information representations better suited to the individual and contextual needs of users. This can potentially enhance users’ ability to explore, navigate, and transform different elements and features of map-based visual representations, all of which are important cognitive tasks involved in KD activities.

Besides information latency, information context also plays an important role in KD. Any particular object, document, data, or event can be informative only under certain circumstances, depending on the inquiry and on the expertise of the inquirer (Buckland, 1991). It follows that the designers of visual representations whose goal is to facilitate KD have to surmise situations of information use. In this paper we take the position that situations can be predicted for particular contexts, and that interactions can support situational use of information.
Interactions serve as a glue that binds a series of low-level actions to support different situational tasks that can be performed with representations. In this sense, interactions allow information to behave dynamically and situationally so that it can serve users’ needs more effectively. This in turn plays an important role in transforming information into personal knowledge in KD situations.

PROTOTYPE COLLECTION
Our testbed collection is about the local history of Ukraine. It comprises 349 MAchine Readable Cataloguing (MARC) book records from the Library of Congress Catalogue. All these records have call numbers that belong to the DK508 class of the Library of Congress Classification. This class contains placenames and has many MARC records linked to them via call numbers. Among the selected MARC records are the entire collections of records for 32 Ukrainian cities, which we treat as sub-collections of the whole collection. This collection is highly contextual: its documents are interconnected by subjects and have similarities in bibliographic descriptions, forms/genres, languages, and places of publication. Context in this collection is inferred from the ontological properties of its documents, such as physical descriptions, languages, subjects, and authors. All of the above-mentioned properties were chosen to be visually represented.

VICOLEX
In this section we present our prototype MVI, VICOLEX (VIsual COLlection Explorer). VICOLEX is designed to allow users to explore georeferenced collections. It is designed with close attention to representations and interactions, with the purpose of making the collection structure more salient, providing users with multiple perspectives on the data, and thereby facilitating KD and sense making.

Representations
We chose a variety of representations, each of which shows a collection from a different perspective.
More specifically, all metadata records were mapped onto Google Maps (GM) (see Figure 1). Each GM marker represents the number of metadata records in one sub-collection. Since some collections for individual locations have quite a large number of records (e.g., Lviv – 78, Kyyiv – 92), additional graphical representations were used to represent ontological properties of sub-collections. An example is shown in Figure 1. The scatter plot is used for showing book heights, number of pages, and languages (Figure 1.a); the pie chart, for displaying languages (Figure 1.b); the histogram, for showing years of publication (Figure 1.c); the embedded map, for visualizing places of publication (Figure 1.d); the Kohonen map, for representing subjects (Figure 1.e); and the tag cloud, for displaying authors (Figure 1.f).

Figure 1. Representing collections on Google Maps.

Overall, VICOLEX has 193 representations. These representations encode the entities in the prototype collection and their properties, and they help users gain insight into aspects of the collection that are hidden from view on the main map. Each representation encodes only a small portion of the information about the collection and supports only specific tasks, hence making the main map in VICOLEX less cluttered. Some representations assign additional meaning to data (e.g., the histogram of the years explains years of publication in terms of historical periods). The set of representations for each location works as a storybook about that sub-collection, covering subjects, years of publication, languages, book sizes, authors, and places of publication.

Interactions
Despite its obvious advantages, the above approach of using different representations to communicate properties and entities in the collection still has shortcomings. In particular, it is difficult to understand how properties are related to each other; how they are distributed spatially and temporally in the collection; how properties of the collections can be combined and viewed together; and how people can adapt VICOLEX’s MVI to their own needs. To overcome these shortcomings in VICOLEX, we augment representations with interactions, particularly linking, filtering, selecting, and grouping, which we discuss next.

Filtering
Filtering allows users to sift out document properties. Users can query the ontological properties of a collection by ticking checkboxes and by setting limits on timelines that show time of acquisition and publication (shown in Figure 1). Property-based filtering reduces the complexity of high-dimensional data, reduces clutter, gives users flexibility in selecting properties, and generates a number of easy-to-understand displays, each focused clearly on a particular aspect of the underlying data. In general, filtering helps inhibit the processing of task-irrelevant information, increases the speed and accuracy of information processing, and reduces the cognitive effort required to complete property-related tasks (Enns & Akhtar, 1989). In VICOLEX, the results of filtering can be observed not only on the surface of the map, but also on the representations of ontological properties of the individual sub-collections that are linked to markers. Because of filtering on the map, the representations of properties in sub-collections become more legible and easier to understand. Such filtering allows users to complete tasks not only at the level of information entities, but also at the level of properties.

Selecting
Variable selection and feature extraction are regarded as crucial steps in KD (Fayyad, Grinstein, & Wierse, 2002). Selecting objects with certain properties from unnamed geographic areas (e.g., north or south of some region) in MVIs can be quite challenging because such regions are rarely described explicitly in systems.
To facilitate this type of selection, VICOLEX lets users select a region by drawing a bounding box around its markers, dragging a corner of the rectangle to size it (Figure 2.b). This selection is intended to give the MVI a sandbox-like feel, with the capability to dynamically adjust properties of objects, since such a selection can be performed on an entire collection as well as on a filtered collection. For example, a user can make visible only books about history and then select only those from Western Ukraine using the bounding box (Figure 2.a and b). Properties that are suppressed by filtering cannot be selected with the bounding box. Moreover, the area selection mechanism in VICOLEX is coupled with a grouping interaction, so that the selected documents are represented with the same set of additional representations as documents that are linked to individual markers (Figure 2.c). Such selections with groupings can be useful for answering questions such as the following: a) In which area of Ukraine do collections have more illustrations? b) Are places of publication in collections about small locations different from places of publication in collections about large locations? c) Is there a difference in subjects in collections about different parts of Ukraine?

Figure 2. Example of selection with filtering and grouping.

KNOWLEDGE DISCOVERY USING VICOLEX
In this section, we briefly discuss how representations of the entities in the prototype collection, along with the interactions implemented in VICOLEX, support KD. We report a number of discoveries that we made using VICOLEX, particularly with regard to changes in the collection during the 1980s and 1990s. In general, the discoveries can be classified as quantitative and qualitative.

Quantitative
First, we discovered that more than half of the entire collection was published after 1991.
This is evident from filtering the main map by the years of publication: before 1990 and after 1991. Second, we found that books published before 1990 rarely included maps. Moreover, books with maps published before 1990 are about large cities only, whereas books with maps published after 1991 are about both small and large places. Third, the number of publications in Ukrainian increased significantly after 1991. Fourth, books in Polish about Ukraine were nonexistent before 1981, but beginning with 1981 the number of publications in Polish started to increase gradually, especially about Lviv. Fifth, after 1991 certain subjects started to show significant growth (e.g., Biographies, Archaeological Excavations).

Qualitative
The majority of qualitative changes are associated with subjects. Just as the number of published books increased after 1991, the variety of subjects became greater after 1991 too. Subjects that emerged after 1991 are “Ethnic Relations,” “Ukrainian, Nationalism,” “Minorities,” “Jews,” “Economic conditions,” “International Executive Service Corps,” “Vinnytsia Massacre, Vinnytsia, Ukraine, 1937-1938,” “Political Prisoners,” “Rehabilitation,” “Political Prosecutions,” “Prisoners of War,” “Massacres,” and others. Many of these subjects were banned during the period when Ukraine was part of the Soviet Union, and therefore they do not appear in books published before 1990. Second, books about locations with populations smaller than 200,000 people appear to be smaller and fewer in total than books about larger locations. Third, Russian-language books are concentrated more in the East and South than in the West. In addition, we were able to discover a few sub-collections with unusual language distributions that involve languages other than Ukrainian and Russian.
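The discoveries above rest on three interactions described earlier: property-based filtering, bounding-box selection, and grouping. The Python sketch below illustrates how these could be chained on a handful of records. It is only a sketch under stated assumptions, not VICOLEX's implementation: the `Record` fields, the function names, the bounding-box coordinates, and the sample records are all hypothetical and invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for a georeferenced MARC record."""
    place: str
    lat: float
    lon: float
    year: int
    language: str
    subject: str


def filter_records(
    records: Iterable[Record],
    languages: Optional[Set[str]] = None,
    subjects: Optional[Set[str]] = None,
    years: Optional[Tuple[int, int]] = None,
) -> List[Record]:
    """Property-based filtering: keep records that satisfy every constraint.

    A constraint left as None is ignored, like an unticked checkbox group
    or an unrestricted timeline.
    """
    kept = []
    for r in records:
        if languages is not None and r.language not in languages:
            continue
        if subjects is not None and r.subject not in subjects:
            continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        kept.append(r)
    return kept


def select_in_bbox(
    records: Iterable[Record],
    south: float, west: float, north: float, east: float,
) -> List[Record]:
    """Bounding-box selection: keep records whose place lies inside the box."""
    return [
        r for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    ]


def group_summary(records: List[Record]) -> Dict[str, object]:
    """Grouping: summarize a selection the way a single marker is summarized."""
    return {
        "count": len(records),
        "languages": Counter(r.language for r in records),
        "subjects": Counter(r.subject for r in records),
    }


# Invented sample data; these are not records from the prototype collection.
SAMPLE = [
    Record("Lviv", 49.84, 24.03, 1985, "Polish", "History"),
    Record("Lviv", 49.84, 24.03, 1996, "Ukrainian", "History"),
    Record("Lviv", 49.84, 24.03, 1998, "Ukrainian", "Biographies"),
    Record("Kyiv", 50.45, 30.52, 1978, "Russian", "History"),
    Record("Kharkiv", 49.99, 36.23, 1994, "Russian", "History"),
]

# Filter first, then select: records suppressed by the filter
# can no longer be picked up by the bounding box.
history = filter_records(SAMPLE, subjects={"History"})
western = select_in_bbox(history, south=48.0, west=22.0, north=51.5, east=27.0)
summary = group_summary(western)
```

Applying the filter before the selection mirrors the behaviour described in the Selecting section, where properties suppressed by filtering cannot be selected with the bounding box. The summary of the selected group could then feed the same kinds of charts that are attached to individual markers.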

CONCLUSIONS
In this paper, we have presented VICOLEX, a prototype front-end interface that supports users in performing KD by means of interacting with MVIs of library collections. The representations used in VICOLEX include maps, pie charts, scatter plots, and tag clouds. The multiplicity of representations is intended to keep information latent, so as not to overwhelm users with too much information at once. The latent information remains hidden, waiting for users’ interactions. The interactions presented in this paper include linking, filtering, and selecting. These interactions are intended to support KD activities. As such, they simplify interpretation and understanding of information in various situations, facilitate transformation of information into personal knowledge for users, and ultimately support higher-level KD activities. The VICOLEX conceptualization can be utilized in the design of front ends to complex DLs with georeferenced collections. With numerous representations coupled with interactions, DLs will become more suitable for KD.

REFERENCES
Amit, Y., and Geman, D. (1999). A Computational Model for Visual Selection. Neural Computation, 11(7), 1691-1715.
Buckland, M. (1991). Information and information systems. New York, NY: Greenwood Publishing Group, Inc.
Enns, J. T., and Akhtar, N. (1989). A Developmental Study of Filtering in Visual Attention. Child Development, 60, 1188-1199.
Fast, K., and Sedig, K. (Accepted). Interaction and the epistemic potential of digital libraries. International Journal of Digital Libraries.
Fayyad, U., Grinstein, G., and Wierse, A. (2002). Information visualization in data mining and knowledge discovery. London, UK: Academic Press.
Kirsh, D. (2009). Interaction, External Representations and Sense Making. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Knapp, L. (1995). A task analysis approach to the visualization of geographic data. In T. L. Nyerges et al. (Eds.), Cognitive aspects of human-computer interaction for geographic information systems (pp. 355-371). Springer Verlag.
Luck, S. J., and Hillyard, S. A. (1994). Spatial Filtering During Visual Search: Evidence From Human Electrophysiology. Journal of Experimental Psychology, 20(5), 1000-1014.
MacEachren, A. et al. (1999). Constructing Knowledge From Multivariate Spatiotemporal Data: Integrating Geographic Visualization (GVis) with Knowledge Discovery in Database (KDD) Methods. International Journal of Geographical Information Science, 13(4), 311-334.
Swanson, L. (1986). Organization of mammalian neuroendocrine system. In V. Mountcastle, F. E. Bloom, & S. Geinger (Eds.), Handbook of physiology. Sec. 1, The nervous system, Vol. IV, Intrinsic regulatory systems of the brain (pp. 317-363). Bethesda, MD: American Physiological Society.
Yang, M., Yuan, J., and Wu, Y. (2007). Spatial selection for attentional visual tracking. Computer Vision and Pattern Recognition, 1-8.