CMSC 838B Information Visualization

Visualizing Mailbox

Yoo Ah Kim (ykim@cs.umd.edu)
Min-ho Shin (mhshin@cs.umd.edu)
Department of Computer Science
University of Maryland

Abstract

Electronic mail is one of the most popular computer applications. As the number of emails we exchange grows rapidly, managing a huge volume of electronic messages becomes more and more important. In addition, patterns in email data may reveal useful information, including the mailbox owner's personal history. We propose two visualizations of an email dataset: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are received/sent times. To scale to a large volume of data, we use dynamic queries and zooming. The thread-based view shows emails that belong to the same thread. It shows all senders who participated in a thread and the messages in time order, with the relations among those messages.

Keywords: Electronic mail, time-based view, thread-based view, scalability

Introduction

Electronic mail is now one of the most popular computer applications. As the number of emails we exchange grows rapidly, managing a huge volume of electronic messages becomes more and more important. Although email was invented for asynchronous communication, it is also used for other purposes such as task management and personal archiving. In addition, patterns in email data may reveal useful information, including the mailbox owner's personal history. However, there is no adequate visualization that serves these purposes.

In this paper, we propose two visualizations of an email dataset to help users perform these tasks: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are received/sent times. To scale to a large volume of data, we use dynamic queries and zooming. The view also has sort, filter, and aggregate functions to help users find the information they need.
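
The table underlying the time-based view can be illustrated with a short sketch. This is not the implementation used in this work. It is a minimal Python example that assumes each message is reduced to a hypothetical (address, date, direction) record, and the names `MESSAGES`, `bucket`, `build_grid`, and `sorted_rows` are made up for illustration. It counts messages per (person, time bucket) cell at a chosen aggregation level, with optional filtering by direction or address substring and sorting of rows.

```python
from collections import Counter
from datetime import date

# Hypothetical message records: (correspondent address, date, direction).
MESSAGES = [
    ("alice@cs.umd.edu", date(2001, 3, 5), "in"),
    ("alice@cs.umd.edu", date(2001, 3, 5), "out"),
    ("alice@cs.umd.edu", date(2001, 4, 1), "in"),
    ("bob@example.org", date(2001, 3, 7), "in"),
]


def bucket(day, level):
    """Map a date to its column key at the chosen aggregation level."""
    if level == "year":
        return (day.year,)
    if level == "month":
        return (day.year, day.month)
    if level == "date":
        return (day.year, day.month, day.day)
    raise ValueError(f"unknown level: {level}")


def build_grid(messages, level="date", direction=None, substring=""):
    """Count messages per (person, time bucket) cell.

    direction=None keeps both incoming and outgoing messages;
    substring keeps only addresses containing it (e.g. a domain name).
    """
    grid = Counter()
    for person, day, msg_direction in messages:
        if direction is not None and msg_direction != direction:
            continue
        if substring not in person:
            continue
        grid[(person, bucket(day, level))] += 1
    return grid


def sorted_rows(grid, key="address"):
    """Order the rows (people) by address, domain name, or message count."""
    totals = Counter()
    for (person, _), count in grid.items():
        totals[person] += count
    if key == "address":
        return sorted(totals)
    if key == "domain":
        return sorted(totals, key=lambda p: (p.split("@")[-1], p))
    if key == "count":
        return sorted(totals, key=lambda p: (-totals[p], p))
    raise ValueError(f"unknown sort key: {key}")
```

For example, `build_grid(MESSAGES, level="month")` yields one cell per person and month, and the cell counts could then be drawn as bar heights or spot gradations.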
The thread-based view shows emails that belong to the same thread. Threads are created when users send mail using the "reply" menu. The thread-based view shows all senders who participated in a thread and the messages in time order, with the relationships among those messages.

Design Goals

View sent/received email patterns

With an email dataset, users may want to see mail patterns over time. Interesting questions include who sent the most emails in a certain period, or when a person sent emails most frequently. To show patterns in a large volume of messages, the scalability problem must be solved. We use dynamic queries, zooming, aggregation, filtering, and gradation to cope with this problem.

Find people and emails related to each other

Emails can be threaded using "reply", and several users may participate in a mail thread. It is useful to see all participating users and who sent or received each email in the thread, together with the relations among them.

Search information in the mailbox

Emails are used as personal archives for finding information in the future. Several studies [8] showed that semantic hierarchies using folders, currently the predominant scheme, are not suitable for this task because it is difficult for users to organize mail folders properly and to figure out which folder holds the mail they need. Because people can usually recall the sender and the approximate sent/received time of a message, the time-based view can help users find a mail they need. The thread-based view also makes it easy to extract related information by providing all messages in the same thread.

Related Work

Timestore [1][9] organizes messages by time and sender in a two-dimensional grid, as shown in Figure 1. Messages are displayed as dots whose size encodes the number of messages. It allows narrowing of the search space using full-text searching. Its authors also merged it with a task and calendar management system. Timestore focused on time-based archiving and retrieval of emails.

Figure 1. Timestore

Outlook 2000 also has a time-based view (Figure 2). It displays all messages with their subjects at their received times, without aggregating by date or considering senders. Because it uses a fixed width for a day and shows every message with its subject, the view may become cluttered and hard to understand when there are too many messages. When many emails arrive within a short time period, it expands the y-axis to list them.

Figure 2. Outlook 2000

Threading is necessary to help manage conversation history and track the status of conversations in email [8]. Many systems have been developed to visualize conversations in chat programs and instant messaging services [2][3][4][5][7]. Netscan thread trees display conversation threads for newsgroups. Visualizing email threads is more difficult, however, because both senders and receivers are important and, unlike newsgroups, there are two kinds of messages: incoming and outgoing.

Figure 3. Netscan Thread Tree

Time-based Visualization

Features

In this view, we display messages in a two-dimensional grid whose rows are the email addresses of people and whose columns are dates, as shown in Figure 4. Each cell holds the messages that the corresponding person sent or received at the given time. We encode the number of messages as the height of a bar in a bar chart or as the gradation of a spot.

The first section shows the email addresses of people who sent or received mails. The second section shows the total number of mails each person sent/received, using a bar chart. Users can choose whether to see incoming mails, outgoing mails, or both. Users can choose the level (date, month, or year) at which messages are aggregated. When messages are aggregated by date, vertical lines appear at week boundaries to help users see weekly patterns. Sorting can be done by email address, domain name, or message count. The view can filter people whose email address contains a certain substring; filtering by domain name is an especially interesting query. It is also possible to search messages by email address or subject.

Scalability

- Bar chart vs. Gradation

To see the number of messages in each grid cell more accurately and compare it with others, a bar chart may be more helpful. But if many people are on screen and the time range is very long, it is difficult to show the patterns using bar charts. For the case of many people and a long period, we provide another view using gradation. Each cell has a spot, and the gradation of the spot represents the number of messages. This view gives a good overview of messages in terms of people and date. While incoming and outgoing messages can be shown simultaneously in the bar chart using color coding, spots show only the total number of messages for the chosen option (incoming, outgoing, or both). Figure 5 shows the views using bar chart.

- Dynamic Query

To manage large datasets, we also use the dynamic query method for people and date. It dynamically filters and zooms the range of data so that users can easily find the data they want to see (Figure 6). If users change a range, the data in the range is fit to the screen and data outside the range is hidden. By moving the slider bar, users can see the hidden data, too. Labels such as addresses and dates adapt dynamically to the chosen range, displaying more detailed information as the view is zoomed in.

Message Selection

Placing the mouse over a cell shows the information of the cell: person and date. Users can see detailed information by clicking the right mouse button on the cell. A pop-up window appears with a list of the messages in the cell. Each message is shown with its subject and the number of messages in the thread to which it belongs. To see the thread view related to a message, users choose an individual message in the list. Figure 7 shows the pop-up window for message selection.

Thread-based Visualization

The thread view shows the relations of messages, as shown in Figure 8.
For a chosen message, we find all messages related to it and display them with all the people who participated in the thread. The rows are people, and messages are listed in order of received/sent time. Note that, unlike newsgroup data, both senders and receivers are important. We represent senders as big red rectangles and receivers as small blue circles. Arrows appear between the senders and receivers of the same mail. If a mail is a reply to another, a second kind of link connects the two mails, drawn as thick red lines in Figure 8. We divide the time axis by date to help users understand the time information of messages.

Problems in Visualization

For outgoing mails, receivers are important because the sender is always the mailbox owner. A message may have more than one receiver, so the same message may appear several times in the time-based view. This may show more messages in the visualization than really exist. In some sense, however, we can regard this as several messages with the same contents being sent to the individual receivers.

Threads can be detected for our thread view only if users write messages using "reply", which adds reply information to the email headers. But users sometimes send emails without using it even though they are replies to other mails. In this case, we would have to consider subjects, contents, and sender/receiver groups, but it is much more difficult to find the correct information this way. "Forward" information could also be useful for constructing threads, but it is not available in our implementation because it is not part of the standard email headers.

When the same person uses several email addresses, we cannot detect this. In particular, if users are on a mailing list, we cannot discover this from the mailboxes alone. In this case, users should be able to specify which email addresses actually belong to the same person and merge the data related to them.

Future Work

In our visualization, users can see data in many ways using filter, sort, search, etc.
But they may also want to edit or annotate messages for future use. This function can be especially useful for an email dataset. For example, users may want to mark a message as needing a reply or as a reminder for a future task.

Search is currently supported only for subjects and senders/receivers, but it would be useful to search message contents as well. Specifically, we might want to find a message that contains a URL, an email address, or attached files.

In the time-based visualization, we can aggregate or filter people based on the domain names of their email addresses. Other aggregation and filtering becomes possible if we define groups of people in various ways. For example, we can form a group based on a thread, or users may define groups such as family, friends, and colleagues. More generally, it would be good to connect this visualization with databases that hold information about people, and to filter/aggregate people based on those databases.

We can think of another useful view of emails: group-based visualization. Email exchange patterns give useful information about relations between people. We may group people based on how frequently they appear in the same thread and visualize those groups as graphs.

Conclusion

We proposed two visualizations of an email dataset: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are received/sent times; each cell holds a list of messages for that person and time. To manage a large volume of data, we used dynamic queries, zooming, and gradation in this view. This view shows users the temporal email exchange patterns of their correspondents.

The thread-based view shows emails exchanged using "reply". It displays all senders who participated in the thread and the messages in time order, with the relations among those messages. This view is helpful for viewing the history and tracking the status of a conversation about the same topic.
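
As an illustration of the reply-based threading summarized above, the following is a minimal Python sketch, not the implementation used in this work. It assumes that the "reply information" in the headers is the standard `In-Reply-To` field referring to another message's `Message-ID`. The sample messages and the function name `build_thread` are hypothetical. Starting from a chosen message, it collects the messages sharing the same thread root, orders them by date, and lists the participants and reply links.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical raw messages; only the headers matter for threading.
RAW_MESSAGES = [
    "Message-ID: <1@a>\nFrom: ann@example.org\nTo: me@example.org\n"
    "Date: Mon, 05 Mar 2001 10:00:00 +0000\nSubject: Meeting\n\nWhen?",
    "Message-ID: <2@a>\nIn-Reply-To: <1@a>\nFrom: me@example.org\n"
    "To: ann@example.org, bob@example.org\n"
    "Date: Mon, 05 Mar 2001 11:00:00 +0000\nSubject: Re: Meeting\n\nFriday.",
    "Message-ID: <3@a>\nFrom: carl@example.org\nTo: me@example.org\n"
    "Date: Tue, 06 Mar 2001 09:00:00 +0000\nSubject: Lunch\n\nHungry?",
]


def build_thread(raw_messages, chosen_id):
    """Return (message ids in time order, participants, reply links)."""
    messages, parent = {}, {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        messages[msg_id] = msg
        # Assumption: a reply carries its parent's id in In-Reply-To.
        if msg["In-Reply-To"]:
            parent[msg_id] = msg["In-Reply-To"].strip()

    def root(msg_id):
        # Follow reply links upward; 'seen' only prevents an endless loop
        # on malformed, cyclic headers.
        seen = set()
        while msg_id in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    target = root(chosen_id)
    members = [m for m in messages if root(m) == target]
    members.sort(key=lambda m: parsedate_to_datetime(messages[m]["Date"]))

    people = set()
    for m in members:
        fields = messages[m].get_all("From", []) + messages[m].get_all("To", [])
        people.update(addr for _, addr in getaddresses(fields))

    links = [(parent[m], m) for m in members if m in parent]
    return members, sorted(people), links
```

Calling `build_thread(RAW_MESSAGES, "<2@a>")` groups the first two sample messages into one thread and leaves the unrelated third message out. As noted under Problems in Visualization, replies written without the "reply" function carry no such header and would not be found by this approach.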

Acknowledgements

We would like to thank Jihwang Yeo and Hyunmo Kang for their valuable comments.

References

[1] Baecker, R., Booth, K., Jovicic, S., McGrenere, J., and Moore, G., "Reducing the Gap Between What Users Know and What They Need to Know".
[2] Donath, J., Karahalios, K., and Viegas, F., "Visualizing Conversations", In Proceedings of HICSS 32, January 5-8, 1999.
[3] Rodenstein, R. and Donath, J. S., "Talking in Circles: Designing a Spatially-Grounded Audioconferencing Environment", In Proceedings of CHI 2000, pp. 81-88, 2000.
[4] Smith, M. A., Cadiz, J. J., and Burkhalter, B., "Conversation Trees and Threaded Chats", In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work.
[5] Smith, M. A. and Fiore, A., "Visualization Components for Persistent Conversations", ACM SIGCHI 2001.
[6] Shneiderman, B., "Dynamic Queries for Visual Information Seeking", IEEE Software, 11(6), pp. 70-77.
[7] Viegas, F. B. and Donath, J. S., "Chat Circles", In Proceedings of CHI '99, 1999.
[8] Whittaker, S. and Sidner, C., "Email Overload: Exploring Personal Information Management of Email", In Proceedings of the Conference on Human Factors in Computing Systems '96.
[9] Yiu, K., Baecker, R. M., Silver, N., and Long, B., "A Time-based Interface for Electronic Mail and Task Management", In Design of Computing Systems: Proceedings of HCI International '97, Volume 2, Elsevier, 1997, pp. 19-22.