Self-organizing map browser for database retrieval

Presenting data as similarity clusters instead of lists. Data from local
politics as an example
Mauri Kaipainen, Timo Koskenniemi, Antti Kerminen, Antti Raike, Antti Ellonen
Media Laboratory, University of Art and Design Helsinki, Soft Computing Interfaces Group (SCIG),
mauri.kaipainen@uiah.fi
Similarity clustering by means of the self-organizing map (SOM) is suggested as an alternative or
complement to the conventional list-based manner of organizing large data sets characterized by significant
patterns across multiple criteria. A sketch of a general-purpose browser interface is introduced as a tool to
visualize and manage data using the SOM as the starting point. A data corpus of city council candidates’
political opinions is used as an example.
1. Motivation and background
Immense reservoirs of information should already be open to everyone with a computer connected to the
Internet. However, only a fraction of this potential is effectively in the layman’s use. The usefulness of the
information found depends critically not only on the search methods but also on the tools for managing it.
The standard way of presenting complex information is a list format that allows
data to be ordered and viewed along only one dimension at a time, such as alphabetical order or a measure of
goodness. This is handy when the information can be filtered down to a relatively small number of items. However,
sometimes it is necessary to view large numbers of items to figure out holistic patterns. Examples of such
cases include polls of qualitative opinions, complex system states consisting of multiple technical
measurements, or combinations of economic indices. Conventional list presentations fail to describe such
data optimally, because significant patterns tend to be multi-dimensional in character. From the users'
viewpoint, finding coherent patterns in complicated data is often time-consuming and requires a relatively
high level of academic skill, because the significant information has to be collected from several lists and
integrated in a relatively complex cognitive process. The aim of this report of work in progress is to
demonstrate an alternative to the conventional way of presenting data.
The cognitive load in such multi-criteria matching tasks can be reduced by presenting the user with data
pre-ordered into similarity clusters, which take full advantage of internal correlations within the data. In our
implementation, such order is automatically generated by the self-organizing map (SOM), introduced by
Kohonen (1982), more concisely described e.g. in Kohonen (1990). 1 It is an artificial neural network based on
a grid of abstracted neural-like units. It accepts lists of descriptive features, i.e. feature vectors consisting of
numeric values of primitive feature descriptions, as its input. The output of the SOM for each input vector is a
location on a two-dimensional map, surrounded by a distributed field of graded responses. The particularly
useful feature of the map is that inputs that are similar, defined in terms of proximity of their descriptive
vectors in the multidimensional input space, tend to be mapped close to each other on the map.
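The mapping from an input vector to a map location can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used in this work: the function names, the toy 2 x 2 map and the Gaussian form of the graded response are assumptions made for the example.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose prototype is closest to x.

    weights: array of shape (rows, cols, dim), one prototype vector per unit.
    x: feature vector of shape (dim,).
    """
    dists = np.linalg.norm(weights - x, axis=2)  # Euclidean distance per unit
    return np.unravel_index(np.argmin(dists), dists.shape)

def response_field(weights, x):
    """Graded response of every unit: 1 at a perfect match, falling toward 0.

    The Gaussian shape is an assumption chosen for illustration.
    """
    dists = np.linalg.norm(weights - x, axis=2)
    return np.exp(-dists ** 2)

# Toy 2 x 2 map over 2-dimensional inputs (illustrative values only).
toy_weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [1.0, 1.0]]])
bmu_a = best_matching_unit(toy_weights, np.array([0.9, 0.1]))
bmu_b = best_matching_unit(toy_weights, np.array([0.8, 0.2]))
field = response_field(toy_weights, np.array([0.9, 0.1]))
```

The two similar inputs land on the same unit, and the response field peaks at that unit while the other units respond more weakly.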
From the point of view of matching users’ cognitive capabilities, it is worthwhile to point out that the fields
of neural activity on the cortex respond to perceptual or behavioral stimulus patterns in a manner similar to the
SOM, i.e., with the point of maximal response surrounded by graded degrees of activity with a tendency to
weaken as a function of distance. In such a mapping that can be legitimately called natural, relatively similar
stimuli correspond to relatively similar activity patterns on the cortex. Tonotopies (Hood 1977), somatotopies
(Merzenich 1988; Wall 1988) and spatial representations (Olton 1977) are well-known examples of such
cortical mappings. They suggest that categories, groups and clusters emerge in the brain in some process of
neurodynamical self-organization, whether or not it is comparable to the SOM algorithm.
2. The SOM Browser
Our contribution is to suggest a computational solution combined with an interface design concept, in
order to increase accessibility and readability for the future use of such services. Practically, our interface, a
further development of (Kerminen, Raike & Kaipainen 2000), lets the user browse the map and retrieve
information in an intuitive similarity-based manner which we claim is native to human cognition. The
interface, implemented as a Java applet, consists of 1) the map panel, 2) the questionnaire panel and 3) the
data panel. In addition, there is 4) a separate display window for the data that is retrieved.
1 For an explication of the algorithm, see http://www.cis.hut.fi/research/som-research/som.shtml
3. Example corpus and demonstration
As an example from politics, it is often difficult for a layman to make sense of the candidates' agendas and in
particular to compare them with each other, because there are too many issues to take into account. In the last
local elections in Finland 2, all candidates for the city council of Helsinki (we exclude the other communes of the
country, originally included in the data) were asked to fill in a form indicating their stances on a list of specific
political claims by choosing a degree of agreement with each. An Internet site 3 was established to facilitate
the comparison of the user’s opinion with those of the candidates. The user filled in his/her opinion using a
standard radio button interface (Figure 1). The output of the service was a list of best matching candidates
(Figure 2), resolved using a simple point-counting method.
Figure 1. The users of the original web site defined their opinions with respect to specific issues of local politics
using a radio button interface. This example is about whether wind power should be built on the islands of the
Helsinki archipelago. (Original web site.)
Figure 2. The original service output, a list of 10 candidates in the order of matching with the voter’s opinion given
above. (Original web site.)
Before the election date the service was accessed more than 70 000 times for all 15 communes included
and more than 33 000 times for the city of Helsinki.
2 In Finland, citizens choose their local council candidate from a relatively large group of candidates. In the
last local elections, several media companies built their own case-specific search and evaluation engines to help
citizens find the most suitable candidates. The first such application was built by the Finnish Broadcasting Company
in 1996 for the European Parliament elections.
3 http://www.vaalikone.net/kysymyslomake.jsp?Kunta=HEL
The data consisted of questionnaire responses collected from city council candidates in the Helsinki area in
2000. Out of a total of 903 candidates, 565 answered some or all of the 20 questions. Each answer was a
selection from three or four choices, with the possibility of giving the topic a weight of 1-3. The data was
converted for the self-organizing map as follows. The answers were on either a nominal or an ordinal scale,
depending on the question. In the nominal case, each choice was mapped to an individual component and marked with
either 1 or 0. For each weight and each answer on an ordinal scale, only one component was allocated, and the
numerical value of the choice or weight was used after normalizing it to the range between 0 and 1. The
procedure above resulted in 565 vectors, each with 58 dimensions.
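As a rough illustration of the conversion described above, the following Python sketch encodes one candidate's answers into a feature vector. The schema format, the function name encode_candidate, the 0-based answer indices and the min-max scaling are assumptions made for the example; the original data format and the handling of unanswered questions are not specified here, so the sketch assumes every question was answered.

```python
def encode_candidate(answers, weights, schema):
    """Turn one candidate's answers into a flat feature vector.

    schema: list of (kind, n_choices) per question; kind is "nominal" or "ordinal".
    answers: chosen option per question, as a 0-based index (assumed format).
    weights: importance given to each question, an integer from 1 to 3.
    """
    vector = []
    for (kind, n_choices), answer, weight in zip(schema, answers, weights):
        if kind == "nominal":
            # One component per choice: 1 for the chosen option, 0 elsewhere.
            vector.extend(1.0 if i == answer else 0.0 for i in range(n_choices))
        else:
            # One component: position of the choice scaled to [0, 1].
            vector.append(answer / (n_choices - 1))
        # One component for the weight, scaled from 1..3 to [0, 1].
        vector.append((weight - 1) / 2)
    return vector

# Hypothetical two-question schema: one nominal question with three choices,
# one ordinal question with four choices.
schema = [("nominal", 3), ("ordinal", 4)]
vec = encode_candidate(answers=[2, 1], weights=[3, 1], schema=schema)
```

With this toy schema the vector has six components: three for the nominal answer, one for its weight, one for the ordinal answer and one for its weight.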
A SOM for the data was calculated using a rectangular topology and a Gaussian neighborhood. The
map measured 16 by 20 units. During training, the map was exposed to 26000 steps of
data altogether.
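A training run with the reported map size, topology and neighborhood might look like the sketch below. It is a minimal NumPy illustration under stated assumptions, not the code used for this work: one randomly drawn vector per step, random initialization, and exponentially decaying learning rate and neighborhood radius are all choices made for the example, and the data is a random stand-in with the reported shape.

```python
import numpy as np

def train_som(data, rows=16, cols=20, steps=26000,
              lr_start=0.5, lr_end=0.01, sigma_end=1.0, seed=0):
    """Train a rectangular-grid SOM with a Gaussian neighborhood, one sample per step."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))  # random initial prototypes
    # Grid coordinates of every unit, used to measure distances on the map.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=2)
    sigma_start = max(rows, cols) / 2.0
    for t in range(steps):
        frac = t / max(steps - 1, 1)
        lr = lr_start * (lr_end / lr_start) ** frac              # decaying learning rate
        sigma = sigma_start * (sigma_end / sigma_start) ** frac  # shrinking radius
        x = data[rng.integers(len(data))]
        # Best-matching unit: the prototype closest to the sample.
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood centred on the BMU, measured on the grid.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull every prototype toward the sample, weighted by the neighborhood.
        weights += lr * h[:, :, None] * (x - weights)
    return weights

# Stand-in data with the reported shape (565 vectors, 58 dimensions).
demo_data = np.random.default_rng(1).random((565, 58))
som = train_som(demo_data, steps=2000)  # fewer than 26000 steps to keep the demo quick
```

The returned array holds one 58-dimensional prototype per map unit; passing steps=26000 would match the step count reported above, if a step is taken to mean one presented vector.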
4. Demonstrations
How can I find my candidate? The user can set his/her own preferences using the questionnaire panel, and
the map responds by indicating the best matching node (Figure 3).
Figure 3. The user’s preferences selected in the questionnaire panel. The best fitting node of the map is indicated,
and the information corresponding to the candidates mapped to that area is displayed in a separate window.
Show all candidates that are for, e.g., improving communal services in circumstances of positive economic
growth. The questionnaire panel lets the user choose a single question or a set of questions to be highlighted
in the map panel. Graded degrees of a dedicated color’s darkness indicate the degree of agreement of the
candidates with the questions, in terms of generalized prototypes. An example is depicted in Figure 4.
Figure 4. Highlighted map with labels corresponding to candidates for improving communal services in
circumstances of positive economic growth. Dark color corresponds to strong agreement.
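The highlighting can be thought of as reading one or more components out of each unit's prototype vector. The sketch below is an assumption-laden illustration rather than the applet's code: the function name agreement_shading is hypothetical, and averaging over several selected components is one possible way of combining a set of questions.

```python
import numpy as np

def agreement_shading(weights, components):
    """Darkness in [0, 1] per map unit: mean prototype value over the chosen components.

    weights: array of shape (rows, cols, dim) holding the trained prototypes.
    components: list of component indices belonging to the selected question(s).
    """
    plane = weights[:, :, components].mean(axis=2)
    return np.clip(plane, 0.0, 1.0)

# Toy 2 x 2 map with 3-dimensional prototypes (illustrative values only).
toy = np.array([[[0.0, 1.0, 0.2], [0.5, 0.5, 0.4]],
                [[1.0, 0.0, 0.6], [1.0, 1.0, 0.8]]])
single = agreement_shading(toy, [0])       # highlight one question
combined = agreement_shading(toy, [0, 1])  # highlight a set of questions
```

Each value can then be mapped to a shade of the dedicated color, with values near 1 drawn darkest.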
How can I find out facts about these candidates? The user can take advantage of the visual indications by
selecting items on the map for further viewing, either by clicking a point or dragging over a rectangular area.
The display panel (on the right) allows the user to choose from among the available information associated
with the selected items, e.g., fields of the database. The chosen information will then be shown in the display
window, as shown in Figure 5.
Figure 5. A map area is chosen by mouse-dragging, corresponding to the candidates with the strongest position against
the planned tunnel under downtown Helsinki. The candidates and associated information are listed in the display
window.
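Selecting items by clicking or dragging amounts to a lookup of the items whose map node falls inside the chosen area. The following sketch shows one way to do this; the function name, the dictionary of placements and the candidate identifiers are hypothetical and serve only to illustrate the idea.

```python
def select_in_rectangle(bmu_of_item, corner_a, corner_b):
    """Return ids of items whose map node lies inside the dragged rectangle.

    bmu_of_item: dict mapping item id -> (row, col) of its best-matching node.
    corner_a, corner_b: (row, col) where the drag started and ended, inclusive.
    """
    row_lo, row_hi = sorted((corner_a[0], corner_b[0]))
    col_lo, col_hi = sorted((corner_a[1], corner_b[1]))
    return sorted(
        item for item, (row, col) in bmu_of_item.items()
        if row_lo <= row <= row_hi and col_lo <= col <= col_hi
    )

# Hypothetical placements of three candidates on a 16 x 20 map.
placements = {"cand_17": (2, 3), "cand_42": (2, 4), "cand_99": (10, 15)}
dragged = select_in_rectangle(placements, (3, 5), (1, 2))      # drag in any direction
clicked = select_in_rectangle(placements, (10, 15), (10, 15))  # a click is a 1 x 1 rectangle
```

The returned identifiers would then be used to fetch the chosen database fields for the display window.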
5. Aspects of accessibility
The SOM browser makes complicated information easier to read for ordinary people, in particular for those not
familiar with scientific presentations. Moreover, people with a degree of dyslexia may find this kind of
visualized data presentation more accessible than strictly text-based representations. From the point of view of
the visually impaired, similarity grouping is a way to focus on topologically laid-out clusters controllable by
mouse-hand movements. Combined with voice synthesis, such a presentation makes it easier to focus on the
desired topic and makes it unnecessary to read out loud the extensive lists typical of hierarchical representations
and search engine outputs.
6. Conclusion
We have introduced an approach and a sketch for a browser interface for information in which significant
patterns are defined by multiple dimensions. Our intention is to make such information more accessible and
graspable for all users. The idea is to organize large numbers of data items by content similarity using the
self-organizing map paradigm, and to present the map so achieved with a set of easy-to-use online visualization and
navigation tools allowing the users to explore the material starting from their individual interests. These online
tools give a possibility to explore data from the users’ individual points of view in terms of highlighting map
areas, responding immediately to their choices of descriptive feature dimensions. Such a hands-on approach to
data is most likely to support grasping and learning more effectively than fixed presentations.
The approach is particularly well suited for internet or database searches with a large number of arguments.
For example, it can be applied in conjunction with the WEBSOM technology (Honkela et al. 1996 etc.) for
analysing massive numbers of free-format text documents, such as archives of patent applications and popular
newsgroup articles.
We demonstrated the browser interface with data collected from the candidates for Helsinki city council,
each describing his/her stance toward a list of specific political issues. With this choice we intended to suggest
that in the future accessible knowledge management designs may have large socio-political value. We have
also hinted at the advantages of the knowledge presentation method for special groups such as the dyslectics
and the visually impaired. This work is an example of soft computation used as a method of knowledge
management and visualization. It can be considered to be “softer” than conventional solutions in the sense that
it serves the information to the user in a preclustered but not a priori indexed manner, yet offering tools to
visualize and manage the information in a holistic manner.
Acknowledgements
We thank Helsingin Sanomat 4 for collaboration and access to the data.
4 The data was collected by Helsingin Sanomat, the biggest daily newspaper in Scandinavia with a circulation
of 4469 729. The internet service, provided for 15 communes, was used more than 70 000 times before the
election date (33 000 for Helsinki).
Bibliography
Honkela, T.; Kaski, S.; Lagus, K.; Kohonen, T. (1996). Newsgroup Exploration with WEBSOM
Method and Browsing Interface, Helsinki University of Technology, Lab. of Computer and
Information Science. Report A 32.
Hood, J. (1977). Psychological and Physiological Aspects of Hearing, -Critchley, M.; Henson, R.
1980. Music and the brain. London: Heinemann.
Kaipainen, M.; Karhu, P. (2000). Bringing Knowing-When and Knowing-What Together.
Periodically Tuned Categorization and Category-Based Timing Modeled with the Recurrent
Oscillatory Self-Organizing Map (ROSOM), Minds and Machines 10: 203-229, 2000.
Kerminen, A.; Raike, A.; Kaipainen, M. (2000). Self-organizing map browser for database
retrieval, -Emiliani, P.L.; Stephanidis, C. (2000). User Interfaces for All. Special Theme:
Information Society for All, Consiglio Nazionale delle Ricerche, Istituto di Ricerca sulle Onde
Elettromagnetiche "Nello Carrara", Firenze. 6th ERCIM Workshop, Florence, Italy 25-26 October 2000.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps, Biological
Cybernetics 43:59-69.
Kohonen, T. (1990). The Self-Organizing Map, Proceedings of the IEEE, Vol. 78, No.9, Sept.
1990.
Merzenich, M. M.; Recanzone, G.; Jenkins, W. M.; Allard, T. T.; Nudo, R. J. (1988). Cortical
Representational Plasticity, -Rakic, P.; Singer, W. (Eds.) 1988. Neurobiology of Neocortex. John
Wiley & Sons Limited.
Olton, D. S. (1977). Spatial memory, Scientific American.
Wall, J. T. (1988). Variable organization in cortical maps of the skin as an indication of the lifelong
adaptive capabilities of circuits in the mammalian brain, Trends in Neurosciences, Vol. 11, No. 12.